How does AWS NLB work when it comes to routing packets? - amazon-web-services

I am trying to understand how AWS NLB routes traffic between a client and targets. I ran a simple test with an NLB, running tcpdump on both the client and the server to see who sends packets where. This is what I found from tcpdump:
NLB IP:172.31.29.2 Mac:02:4d:8f:d9:22:e2
Client IP:172.31.20.174 Mac: 02:7e:b1:06:aa:42
02:7e:b1:06:aa:42 > 02:4d:8f:d9:22:e2, ethertype IPv4 (0x0800), length 74: 172.31.20.174.47682 > 172.31.29.2.80: Flags [S]
02:4d:8f:d9:22:e2 > 02:7e:b1:06:aa:42, ethertype IPv4 (0x0800), length 74: 172.31.29.2.80 > 172.31.20.174.47682: Flags [S.]
Server IP:172.31.24.59 Mac: 02:0d:2c:74:be:88
02:4d:8f:d9:22:e2 > 02:0d:2c:74:be:88, ethertype IPv4 (0x0800), length 74: 172.31.20.174.47682 > 172.31.24.59.http: Flags [S]
02:0d:2c:74:be:88 > 02:7e:b1:06:aa:42, ethertype IPv4 (0x0800), length 74: 172.31.24.59.http > 172.31.20.174.47682: Flags [S.]
From line #4 above, the server responded with an acknowledgement packet directly to the client, which makes me think that it is doing Direct Routing instead of sending response packets via the NLB.
But in line #2, I expected to see the acknowledgement packet coming from the server IP/MAC address, as seen in line #4, rather than from the NLB IP/MAC address. I do understand that in line #3 the MAC address is that of the NLB, but since NLB preserves the client IP, the client IP is intact.
I would appreciate it if someone could explain how this routing happens.

The short version: Network Load Balancer isn't really a device, and VPC isn't really Ethernet, and what you see is an artifact of this otherwise-convincing illusion.
NLB is provided by an internal service called AWS Hyperplane, which is integrated with the VPC network infrastructure. It manipulates traffic in the network at the flow level, rewriting source or destination IP addresses as the traffic passes from machine to machine.
Take two machines on the same subnet, without an NLB in the mix, and who don't have each other in their ARP tables... and ping from one to the other. On the instance where you run the ping, you'll see ARP traffic going out to discover the other instance, and you'll see the ARP response come back from the other instance. But on the other instance, you'll sniff... nothing, because that ARP negotiation never actually occurred end-to-end. It only looks like it does. The ARP response is forged by the network, including the source MAC address of the second machine.
Something similar is happening here. The network is essentially forging the source MAC at the same time it rewrites the source IP... so the server responds "directly" to the client IP, but then the network translates both the source address and the source MAC to appear to be "from" the NLB ENI. It gives the impression of asymmetry but that's an illusion, because the traffic isn't really going "through" the NLB in either direction.
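To make the rewriting described above concrete, here is a toy model in Python. It is only an illustration of the idea, not how Hyperplane is actually implemented: the class name ToyFlowRewriter and its methods are made up, and MAC rewriting is left out for brevity. The addresses are the ones from the capture in the question.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class ToyFlowRewriter:
    """Hypothetical stand-in for the VPC fabric; purely illustrative."""

    def __init__(self, nlb_ip, target_ip):
        self.nlb_ip = nlb_ip
        self.target_ip = target_ip
        self.flows = set()  # (client_ip, client_port, service_port)

    def to_target(self, pkt):
        # Client -> NLB address: remember the flow, rewrite only the
        # destination, and leave the client source address intact.
        self.flows.add((pkt.src_ip, pkt.src_port, pkt.dst_port))
        return replace(pkt, dst_ip=self.target_ip)

    def to_client(self, pkt):
        # Target -> client: for a known flow, rewrite the source so the
        # reply appears to come from the NLB address.
        key = (pkt.dst_ip, pkt.dst_port, pkt.src_port)
        if pkt.src_ip == self.target_ip and key in self.flows:
            return replace(pkt, src_ip=self.nlb_ip)
        return pkt


fabric = ToyFlowRewriter(nlb_ip="172.31.29.2", target_ip="172.31.24.59")

# What the client sends, and what the server's tcpdump shows.
syn = Packet("172.31.20.174", 47682, "172.31.29.2", 80)
seen_by_server = fabric.to_target(syn)

# What the server sends, and what the client's tcpdump shows.
syn_ack = Packet("172.31.24.59", 80, "172.31.20.174", 47682)
seen_by_client = fabric.to_client(syn_ack)
```

In this model the server sees the client IP as the source and replies to it "directly", while the client only ever sees the NLB address, which matches the two captures.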

Related

How TCP Load-balancer and AWS ELB works?

I am a bit puzzled about how a TCP load balancer works, in particular AWS ELB. Looking at the AWS ELB docs:
For TCP traffic, the load balancer selects a target using a flow hash algorithm based on the protocol, source IP address, source port, destination IP address, destination port, and TCP sequence number. The TCP connections from a client have different source ports and sequence numbers, and can be routed to different targets. Each individual TCP connection is routed to a single target for the life of the connection.
This confuses me. I assume/expect that a TCP connection stays on the same target for its whole duration. However (!), if the hash algorithm also takes into account the TCP sequence number, which changes with every TCP packet round-trip, then the connection would be (wrongly) routed to other targets after every round-trip. Please help.
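One reading that is consistent with the quoted documentation (but is not confirmed by it) is that the hash is computed once, when the connection is first seen, and the result is then remembered per connection. The sketch below shows that idea in Python. Everything in it is an assumption made for illustration: the target list, the use of SHA-256, and the flow_table dictionary are not taken from AWS.

```python
import hashlib

# Hypothetical target addresses, for illustration only.
TARGETS = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]

# Remembers the chosen target per connection (protocol + 4-tuple).
flow_table = {}


def pick_target(proto, src_ip, src_port, dst_ip, dst_port, initial_seq):
    """Hash the six fields named in the docs to choose a target."""
    key = f"{proto}|{src_ip}|{src_port}|{dst_ip}|{dst_port}|{initial_seq}"
    digest = hashlib.sha256(key.encode()).digest()
    return TARGETS[int.from_bytes(digest[:8], "big") % len(TARGETS)]


def route(proto, src_ip, src_port, dst_ip, dst_port, seq):
    """Hash only for the first packet of a connection; then reuse the entry."""
    conn = (proto, src_ip, src_port, dst_ip, dst_port)
    if conn not in flow_table:
        flow_table[conn] = pick_target(*conn, seq)
    return flow_table[conn]


# The first packet (SYN) selects the target using its sequence number.
first = route("tcp", "198.51.100.7", 47682, "203.0.113.10", 80, seq=1000)

# Later packets carry different sequence numbers but hit the flow table.
later = [
    route("tcp", "198.51.100.7", 47682, "203.0.113.10", 80, seq=s)
    for s in (1001, 5000, 90000)
]
```

Under this assumption the sequence number only adds entropy when the connection is created, so a new connection from the same client may land on a different target while an existing one never moves.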

IPv6 Network is unreachable (os error 101)

When I try to create a TCP socket connection via an IPv6 address, I get Network is unreachable (os error 101).
The local address used for binding is fe80::850***.
This is probably because fe80*** is a link-local IPv6 address generated by the OS. Is there a way to configure the system correctly so that it can make a call via IPv6?
Short answer:
Your IPv6 connectivity is not set up correctly. This is most probably not a configuration problem on your machine, but on the gateway router.
Long answer:
You get a "Network is unreachable" error because your operating system does not know how to reach the destination address. It normally maintains a table called the routing table, where it looks up which network path leads to which IP subnet. It seems that in your case, several things are missing:
Your interface probably has only a link-local (fe80::/64) address and no other configured address
There is no routing table entry for the destination address range
In the IPv4 world these missing pieces are usually supplied by a DHCP server. In the IPv6 world, there are two possible configuration options: either DHCPv6 or IPv6 Stateless Address Autoconfiguration. It seems that neither is set up in your network. Note that correctly setting up either of these requires administrative access to your gateway router.
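The routing table lookup described above can be sketched in a few lines of Python. This is a simplified model of what the OS does, not its real implementation: the function name lookup_route and the example tables are made up, and 2001:db8::1 is just a documentation address standing in for the real destination.

```python
import errno
import ipaddress


def lookup_route(routes, destination):
    """Return the most specific (network, via) entry covering destination.

    Raises OSError(ENETUNREACH) when no entry matches, which is the
    "Network is unreachable" case.
    """
    dest = ipaddress.ip_address(destination)
    matches = [
        (net, via)
        for net, via in routes
        if net.version == dest.version and dest in net
    ]
    if not matches:
        raise OSError(errno.ENETUNREACH, "Network is unreachable")
    # Longest prefix wins, as in a real routing table.
    return max(matches, key=lambda entry: entry[0].prefixlen)


# Only the link-local route exists: nothing covers a global destination.
link_local_only = [(ipaddress.ip_network("fe80::/64"), "eth0")]

# After DHCPv6 or SLAAC has run, a default route is typically present too.
with_default = link_local_only + [(ipaddress.ip_network("::/0"), "gateway")]

chosen = lookup_route(with_default, "2001:db8::1")

try:
    lookup_route(link_local_only, "2001:db8::1")
    unreachable = False
except OSError as exc:
    unreachable = exc.errno == errno.ENETUNREACH
```

With only the fe80::/64 entry the lookup fails in the same way as the error in the question; once a default route is added, the same destination resolves.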
Since it took quite a long time to find the root cause of the problem, I will share the solution/investigation steps:
To confirm that IPv6 is not configured properly, run 'ping6 ipv6.google.com'. It should return an error message like 'connect: Network is unreachable' or just hang.
By default, AWS EC2 instances are not assigned globally routable IPv6 addresses (only IPv4 is assigned). IPv6 addresses like 'fe80:*' are link-local and can't be used for global requests. To list the IP addresses, check the eth0 config with /sbin/ifconfig. AWS provides a way to assign a globally routable IPv6 address to an EC2 instance (see the VPC/EC2 console pages and https://docs.aws.amazon.com/vpc/latest/userguide/vpc-migrate-ipv6.html).
Check the security group of the EC2 instance and the route table, and make sure that IPv6 traffic is allowed. Specifically, the inbound and outbound rules should include '::/0' for IPv6 (or just one of them, depending on the need).
Try 'ping6 ipv6.google.com' again
Run the following command to check whether IPv6 is enabled:
ip addr
If only an IPv4 address is displayed, IPv6 is disabled. Enable it by referring to this tutorial.
If a link-local address (starting with fe80) is displayed, IPv6 is enabled but dynamic assignment of IPv6 addresses is not enabled.
If an IPv6 address other than fe80 is displayed, IPv6 is enabled and an IPv6 address has been assigned.
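The three cases above can also be checked programmatically. The following Python sketch classifies a list of addresses as they would appear in the ip addr output. The function name ipv6_status is made up for this example, and the loopback address ::1 is ignored on the assumption that it says nothing about connectivity.

```python
import ipaddress


def ipv6_status(addresses):
    """Classify addresses (optionally with /prefix) into the three cases."""
    ips = [ipaddress.ip_interface(a).ip for a in addresses]
    v6 = [ip for ip in ips if ip.version == 6 and not ip.is_loopback]
    if not v6:
        return "no IPv6 address"
    if all(ip.is_link_local for ip in v6):
        return "link-local only"
    return "IPv6 address assigned"


# 2001:db8::10 is a documentation address used as a stand-in for a real one.
only_v4 = ipv6_status(["172.31.20.174/20"])
link_local = ipv6_status(["172.31.20.174/20", "fe80::1/64"])
assigned = ipv6_status(["172.31.20.174/20", "fe80::1/64", "2001:db8::10/64"])
```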

How to configure l3ACL dpdk application for gateway

I am trying to configure the l3fwd-acl application for a gateway. The l3fwd ACL DPDK application is running on a Mellanox NIC, using DPDK (dpdk-stable-20.11) as a shared library.
Edit:
Earlier scenario of Connection setup of l3fwd acl testing using Trex Traffic generator
In this scenario the packets are forwarded by the L3fwd ACL application of DPDK in the direction from Port 1 to Port 0 of the Trex traffic generator. This was made possible by passing the MAC address indicated by Trex at initialization in the --eth-dest flag. With this MAC address included, the packets were detected by the Rx side of the traffic generator, i.e. Port 0.
Current scenario
This setup was modified to mimic gateway level deployment to test L3fwd ACL as shown in the connection diagram attached below.
Connection Diagram attached
In this setup, the ports connected to the traffic generator are replaced by two machines that mimic the external network and the internal LAN, as shown. Without the DPDK application running, ping between the external and internal networks works. We then started the L3fwd ACL application with the command given below, passing the physical MAC address of the machine that replaced the traffic generator on the Port 0 side. (In the traffic generator setup, Port 0 received the traffic from the L3fwd ACL application.) The main difference is that in the gateway scenario we pass a physical MAC address with the --eth-dest flag, whereas in the working loopback setup we passed the emulated MAC address indicated by the traffic generator.
The Rx interface of the external network machine connected at Port 0, identified by its physical MAC address, does not receive the traffic sent out by the L3fwd ACL application. The configured route entry in rule_ipv4.db is R0.0.0.0/0 0.0.0.0/0 0 : 65535 0 : 65535 0/0xff 0.
We are not able to trace the packets at the interface with the MAC address given in the --eth-dest parameter. After starting l3fwd-acl, the destination host is unreachable.
Command used for L3FWD ACL
./dpdk-l3fwd-acl -l 1-7 -n 4 -- -p 0x3 \
--config="(0,0,1),(1,0,2),(0,1,3),(1,1,4),(0,2,5),(1,2,6),(0,3,7)" --rule_ipv4="/root/rule_ipv4.db" --rule_ipv6="/root/rule_ipv6.db" --eth-dest=0,next-hop-MAC-here
How to configure the l3fwd-acl DPDK sample application for a gateway?
The DPDK example application l3fwd-acl works on IP packets only. Non-IP packets are dropped in the function process_one_packet. With an external packet generator such as TRex, DPDK pktgen, Scapy, packETH, or a pcap replay, the IP packets it sends are not dropped and the ACL lookup is performed. Packets that match a rule are forwarded through the DPDK ports, while packets with no match are dropped.
In your current scenario, connecting the interface to an actual network could lead to
ARP or RARP packets to be generated
LLDP packets to be generated
VLAN packets to be generated, if connected over a managed switch
In all of the above cases, the packets are treated as non-IP and dropped. Hence the recommended approach is to add static ARP entries on the end machines or the switch. This eliminates the ARP and RARP packets.
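To see which frames fall into the "non-IP" bucket, it helps to look at the EtherType field of the Ethernet header. The Python sketch below is only a simplified illustration of that split. It is not the check that l3fwd-acl itself performs, and the function name classify_frame and the sample frames are made up.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_IPV6 = 0x86DD

# A few well-known EtherType values that are not plain IP.
NON_IP_NAMES = {
    0x0806: "ARP",
    0x8035: "RARP",
    0x8100: "VLAN (802.1Q)",
}


def classify_frame(frame: bytes) -> str:
    """Return "ip" or "non-ip: <name>" based on the outer EtherType."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    # Bytes 12-13 hold the EtherType, in network byte order.
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype in (ETHERTYPE_IPV4, ETHERTYPE_IPV6):
        return "ip"
    return "non-ip: " + NON_IP_NAMES.get(ethertype, hex(ethertype))


def make_frame(ethertype: int) -> bytes:
    """Build a dummy frame: zeroed MAC addresses, EtherType, empty payload."""
    return bytes(12) + struct.pack("!H", ethertype) + bytes(46)


ipv4_result = classify_frame(make_frame(0x0800))
arp_result = classify_frame(make_frame(0x0806))
vlan_result = classify_frame(make_frame(0x8100))
```

A VLAN-tagged frame is reported as non-IP here because only the outer EtherType is inspected, even if the payload behind the tag is an IP packet.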
Note: If the external devices are not configured in promiscuous mode, please use --eth-macaddress so that l3fwd-acl updates the MAC address too.

How does AWS NLB preserve the client source IP address

I am playing with the NLB. One feature is that it can preserve the client source IP. I tested it and it works. However, has anybody been wondering how it works?
Let's say that my home PC is the client for the HTTP request and it is behind the public IP 1.1.1.1
The NLB has an IP of 2.2.2.2 on the public side. The real webserver in the target group is an instance with private IP 192.168.0.10. The instance is also in the public subnet and it has an elastic IP of 2.2.2.10.
I confirmed with my packet capture (tcpdump) on the server that I see requests coming in from 1.1.1.1. I see the response going back to 1.1.1.1 as well. However, my home PC's Wireshark would show traffic to and from 2.2.2.2, and not 2.2.2.10... How's that possible?
From the routing perspective, the server would receive the request from 1.1.1.1 and will send a response back to it. The response would traverse through the IGW, instead of the NLB, and therefore will have 2.2.2.10 when on the Internet. The connection would be rejected by my PC because the response came back from a different IP (2.2.2.10) rather than the original one (2.2.2.2).
Is the NLB somehow tied to the IGW and in this case, the IGW would know to SNAT the response to 2.2.2.2 instead of 2.2.2.10?
Thanks,
Difan

Tcp level Information on Ec2

I'm trying to get TCP timestamp from the packets for clock skewing purposes on my application which is hosted on EC2. In my network I have an ALB.
So my question is: how do I get TCP-level packet information in my app, given that the ALB filters out all the OSI layers except the application level (HTTP)?
If the only reason to access TCP packets is to read the timestamp and correct clock drift, I would suggest configuring your EC2 instance to use an NTP time server instead.
https://aws.amazon.com/blogs/aws/keeping-time-with-amazon-time-sync-service/
That being said, the ALB is not "removing" TCP information from network packets. HTTP connections made to your application are still transported over IP and TCP. If you need low-level access to network packets from an app, I would suggest looking at the pcap library (libpcap), which is used by tcpdump and many other tools to capture network traffic on an interface.
https://www.tcpdump.org/
[UPDATED to include comments]
It is important to understand that the TCP connection between your client and the ALB is terminated at the ALB. The ALB creates a second TCP connection to forward HTTP requests to your EC2 instance. The ALB does not remove information from TCP/IP; it just creates a second, new and independent connection. Usually the only information you want to propagate from the initial TCP connection is the source IP address. The ALB, like most load balancers and proxies, captures this information from the original connection (the one received from the client) and embeds it in an HTTP header called X-Forwarded-For.
This is documented at https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/x-forwarded-headers.html
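On the application side, reading the header could look like the Python sketch below. It assumes a single ALB directly in front of the instance that appends the address it saw to the end of the header, without a port. The function name client_ip_from_xff is made up, and the sample addresses are documentation addresses.

```python
import ipaddress


def client_ip_from_xff(header_value):
    """Return the last address in an X-Forwarded-For value, or None.

    Only the right-most entry is used because it is the one appended by
    the load balancer; earlier entries are supplied by the client and
    cannot be trusted.
    """
    last = header_value.split(",")[-1].strip()
    try:
        return str(ipaddress.ip_address(last))
    except ValueError:
        return None


single = client_ip_from_xff("203.0.113.7")
chained = client_ip_from_xff("10.0.0.1, 203.0.113.7")
invalid = client_ip_from_xff("not-an-address")
```

If more proxies sit between the client and the application, the position of the trustworthy entry changes, so the "last entry" rule would need adjusting.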
If you want to capture other information from the original connection, I am afraid it will not be possible using ALB. (but I also would be very curious about the use case, i.e. WHAT you're trying to achieve)