How do TCP load balancers and AWS ELB work?

I am a bit puzzled about how a TCP load balancer works, in particular AWS ELB. Looking at the AWS ELB documentation:
For TCP traffic, the load balancer selects a target using a flow hash algorithm based on the protocol, source IP address, source port, destination IP address, destination port, and TCP sequence number. The TCP connections from a client have different source ports and sequence numbers, and can be routed to different targets. Each individual TCP connection is routed to a single target for the life of the connection.
This confuses me. I assume/expect that a TCP connection persists to the same target for its entire duration. However (!) if the hash algorithm also takes the TCP sequence number into account, and the sequence number changes with every TCP segment, then such a connection would (wrongly) be routed to a different target after every round trip. Please help.
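One way to reconcile the two statements in the quoted docs is that the hash is evaluated once per flow (for example on the SYN, with its initial sequence number) and the chosen target is then cached for the life of the connection. The following is a purely conceptual Python sketch of that idea, not AWS's actual implementation; all addresses and names are made up.

# Conceptual sketch only: hash the 5-tuple plus the *initial* sequence number
# once, cache the chosen target per flow, and reuse it for later segments.
import hashlib

targets = ["10.0.1.10", "10.0.1.11", "10.0.1.12"]
flow_table = {}  # (proto, src_ip, src_port, dst_ip, dst_port) -> target

def pick_target(proto, src_ip, src_port, dst_ip, dst_port, seq, is_syn):
    flow = (proto, src_ip, src_port, dst_ip, dst_port)
    if is_syn or flow not in flow_table:
        key = f"{proto}|{src_ip}|{src_port}|{dst_ip}|{dst_port}|{seq}".encode()
        digest = int(hashlib.sha256(key).hexdigest(), 16)
        flow_table[flow] = targets[digest % len(targets)]
    return flow_table[flow]  # later segments hit the cached entry

# The SYN (seq=1000) selects a target; subsequent segments stick to it even
# though their sequence numbers differ.
print(pick_target("tcp", "198.51.100.7", 47682, "203.0.113.5", 80, 1000, True))
print(pick_target("tcp", "198.51.100.7", 47682, "203.0.113.5", 80, 4242, False))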

Related

Uniform balancing with AWS network load balancer

We have several servers behind an AWS network load balancer.
The algorithm used for balancing traffic is the flow hash, described as follows:
"With Network Load Balancers, the load balancer node that receives the connection uses the following process:
Selects a target from the target group for the default rule using a flow hash algorithm. It bases the algorithm on:
The protocol
The source IP address and source port
The destination IP address and destination port
The TCP sequence number
Routes each individual TCP connection to a single target for the life of the connection.
The TCP connections from a client have different source ports and sequence numbers, and can be routed to different targets."
Due to the persistence of connections, server load may become unbalanced and cause problems.
How can the Network Load Balancer be configured to route new connections to the server with the least load?
ALBs now support Least Outstanding Request routing. NLB does not appear to support this (yet?)
Is there any possibility of adapting your LB strategy to use ALBs instead of NLBs?
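If moving to an ALB is an option, the routing algorithm is a target group attribute. A hedged boto3 sketch (the target group ARN is a placeholder; NLB target groups do not accept this attribute):

# Sketch: switch an ALB target group to least-outstanding-requests routing.
import boto3

elbv2 = boto3.client("elbv2")
elbv2.modify_target_group_attributes(
    TargetGroupArn="arn:aws:elasticloadbalancing:region:123456789012:targetgroup/my-alb-tg/abc123",  # placeholder
    Attributes=[
        {"Key": "load_balancing.algorithm.type", "Value": "least_outstanding_requests"},
    ],
)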

Kubernetes nginx ingress path-based routing of HTTPS in AWS

Question: Within Kubernetes, how do I configure the nginx ingress to treat traffic from an elastic load balancer as HTTPS, when it is defined as TCP?
I am working with a Kubernetes cluster in an AWS environment. I want to use an nginx ingress to do path-based routing of the HTTPS traffic; however, I do not want to do SSL termination or reencryption on the AWS elastic load balancer.
The desired setup is:
client -> elastic load balancer -> nginx ingress -> pod
Requirements:
1. The traffic must be end-to-end encrypted.
2. An AWS ELB must be used (the traffic cannot go directly into Kubernetes from the outside world).
The problem I have is that to do SSL passthrough on the ELB, I must configure the ELB for TCP traffic. However, when the ELB is defined as TCP, all traffic bypasses nginx.
As far as I can tell, I can set up a TCP passthrough via a ConfigMap, but that is merely another passthrough; it does not allow me to do path-based routing within nginx.
I am looking for a way to define the ELB as TCP (for passthrough) while still having the ingress treat the traffic as HTTPS.
I can define the ELB as HTTPS, but then there is a second, unnecessary negotiate/break/reencrypt step in the process that I want to avoid if at all possible.
To make it clearer I'll start from the OSI model, which tells us that TCP is a layer 4 protocol and HTTP/HTTPS is a layer 7 protocol. So, roughly speaking, HTTP/HTTPS data is encapsulated in TCP data before the remaining layers of encapsulation are applied to transfer the packet to another network device.
If you set up a Classic (TCP) LoadBalancer, it stops reading packet data after the TCP part, which is enough to decide (according to the LB configuration) to which IP address and port the packet should be delivered. After that the LB takes the TCP payload, wraps it in new TCP layer data, and sends it to the destination (which in turn causes all the other OSI layers to be applied).
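To make the layering concrete, here is a tiny illustrative scapy snippet (the addresses and request are examples only) that builds an HTTP request as the payload of a TCP segment inside an IP packet:

# Illustration of the encapsulation described above: HTTP is just bytes riding
# inside TCP, which rides inside IP.
from scapy.all import IP, TCP, Raw

pkt = IP(dst="203.0.113.10") / TCP(sport=47682, dport=80) / Raw(
    load=b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
)
pkt.show()  # prints the IP, TCP and Raw (HTTP) layers separately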
To make your configuration work as expected, you need to expose the nginx-ingress-controller Pod using a NodePort Service. Then the Classic ELB can be configured to deliver traffic to any cluster node on the port selected for that NodePort Service (usually in the range 30000-32767). So your LB pool will look like the following:
Let's imagine the cluster nodes have IP addresses 10.132.10.1...10 and the NodePort is 30276.
ELB Endpoint 1: 10.132.10.1:30276
ELB Endpoint 2: 10.132.10.2:30276
...
ELB Endpoint 10: 10.132.10.10:30276
Note: in the case of AWS ELB, I guess the nodes' DNS names should be used instead of IP addresses.
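As a rough sketch of the NodePort part (not the official ingress-nginx manifests; the namespace, labels and node port below are assumptions), exposing the controller with the Python kubernetes client looks roughly like this:

# Sketch: a NodePort Service in front of a hypothetical nginx-ingress-controller
# Deployment, so a TCP ELB can target <node-ip>:30276 on every cluster node.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config()

service = client.V1Service(
    metadata=client.V1ObjectMeta(name="nginx-ingress-nodeport", namespace="ingress-nginx"),
    spec=client.V1ServiceSpec(
        type="NodePort",
        selector={"app": "nginx-ingress-controller"},  # assumed Pod label
        ports=[client.V1ServicePort(
            name="https",
            port=443,
            target_port=443,
            node_port=30276,  # must fall inside the 30000-32767 default range
        )],
    ),
)
client.CoreV1Api().create_namespaced_service(namespace="ingress-nginx", body=service)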
This results in the following sequence of traffic flow from a client to a Kubernetes application Pod:
1. The client sends a TCP packet with the HTTP/HTTPS request in its payload to ELB_IP:ELB_port (a.b.c.d:80).
2. The ELB receives the IP packet, analyzes its TCP data, finds the appropriate endpoint in the backend pool (the full list of Kubernetes cluster nodes), creates another TCP packet with the same HTTP/HTTPS data inside, replaces the destination IP and destination TCP port with the cluster node IP and the Service NodePort (l.m.n.k:30xxx), and sends it to the selected destination.
3. The Kubernetes node receives the TCP packet and, using iptables rules, changes the destination IP and destination port of the TCP packet again, then forwards the packet (according to the NodePort Service configuration) to the destination Pod. In this case it is the nginx-ingress-controller Pod.
4. The nginx-ingress-controller Pod receives the TCP packet and, since the TCP data says it should be delivered locally, extracts the HTTP/HTTPS data from it and hands that data (the HTTP/HTTPS request) to the Nginx process inside the Nginx container in the Pod.
5. The Nginx process in the container receives the HTTP/HTTPS request, decrypts it (in the case of HTTPS), and analyzes all HTTP headers.
6. According to the nginx.conf settings, the Nginx process rewrites the HTTP request and delivers it to the cluster Service specified for the configured host and URL path.
7. The Nginx process sends the rewritten HTTP request to the backend application.
8. A TCP header is then added to the HTTP request and it is sent to the backend Service's IP_address:TCP_port.
9. The iptables rules defined for the backend Service deliver the packet to one of the Service endpoints (application Pods).
Note: to terminate SSL on the ingress controller, you have to create SSL certificates that include the ELB IP and ELB FQDN in the SAN section.
Note: if you want to terminate SSL on the application Pod to have end-to-end SSL encryption, you may want to configure Nginx to pass the SSL traffic through.
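For that passthrough case, a hedged sketch with the Python kubernetes client: the nginx-ingress controller has to run with the --enable-ssl-passthrough flag, and the Ingress carries the ssl-passthrough annotation (the host, Service name and namespace below are made up). Bear in mind that with passthrough Nginx only sees the TLS SNI, so path-based routing is not available for that host.

# Sketch: an Ingress asking nginx-ingress to pass TLS straight through to the
# backend Pod instead of terminating it. Names are placeholders.
from kubernetes import client, config

config.load_kube_config()

ingress = client.V1Ingress(
    metadata=client.V1ObjectMeta(
        name="app-passthrough",
        namespace="default",
        annotations={"nginx.ingress.kubernetes.io/ssl-passthrough": "true"},
    ),
    spec=client.V1IngressSpec(
        ingress_class_name="nginx",
        rules=[client.V1IngressRule(
            host="app.example.com",  # routing happens on this SNI, not on paths
            http=client.V1HTTPIngressRuleValue(paths=[client.V1HTTPIngressPath(
                path="/",
                path_type="Prefix",
                backend=client.V1IngressBackend(
                    service=client.V1IngressServiceBackend(
                        name="app-svc",
                        port=client.V1ServiceBackendPort(number=443),
                    )
                ),
            )]),
        )],
    ),
)
client.NetworkingV1Api().create_namespaced_ingress(namespace="default", body=ingress)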
Bottom line: an ELB configured to deliver TCP traffic to a Kubernetes cluster works perfectly with the nginx-ingress controller if you configure it correctly.
In GKE (Google Kubernetes Engine), if you create a Service with type: LoadBalancer, it creates exactly this kind of TCP LB, which forwards traffic to a Service NodePort, and Kubernetes is then responsible for delivering it to the Pod. EKS (Elastic Kubernetes Service) from AWS works in much the same way.

TCP-level information on EC2

I'm trying to get the TCP timestamp from packets for clock-skew purposes in my application, which is hosted on EC2. In my network I have an ALB.
So my question is: how do I get TCP-level packet information in my app, given that the ALB filters out all OSI layers except the application level (HTTP)?
If the only reason to access the TCP packets is to read timestamps and correct clock drift, I would suggest configuring your EC2 instance to use an NTP time server instead.
https://aws.amazon.com/blogs/aws/keeping-time-with-amazon-time-sync-service/
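For example (a hedged sketch using the third-party ntplib package; 169.254.169.123 is the link-local address of the Amazon Time Sync Service described in that post), checking the instance's clock offset looks roughly like this:

# Sketch: measure local clock offset against the Amazon Time Sync Service.
import ntplib

ntp = ntplib.NTPClient()
response = ntp.request("169.254.169.123", version=4)
print(f"clock offset vs. Time Sync Service: {response.offset:.6f} s")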
That being said, the ALB is not "removing" TCP information from network packets. HTTP connections made to your application are still transported over IP and TCP. If you need low-level access to network packets from an app, I would suggest looking at the libpcap library, which is used by tcpdump and many other tools to capture network traffic on an interface.
https://www.tcpdump.org/
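As a rough sketch of that approach (scapy rides on top of libpcap; it requires root, and the filter and port are assumptions), something like this prints the TCP timestamp option of segments reaching the instance. Keep in mind that, as explained below, these timestamps belong to the ALB-to-instance connection, not to the original client connection.

# Sketch: sniff inbound TCP segments and print the TCP timestamp option
# (TSval/TSecr) whenever it is present.
from scapy.all import IP, TCP, sniff

def show_tcp_timestamps(pkt):
    if pkt.haslayer(IP) and pkt.haslayer(TCP):
        for name, value in pkt[TCP].options:
            if name == "Timestamp":
                tsval, tsecr = value
                print(f"{pkt[IP].src}:{pkt[TCP].sport} TSval={tsval} TSecr={tsecr}")

sniff(filter="tcp port 80", prn=show_tcp_timestamps, count=20)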
[UPDATED to include comments]
It is important to understand that the TCP connection between your client and the ALB is terminated at the ALB. The ALB creates a second TCP connection to forward HTTP requests to your EC2 instance. The ALB does not remove information from TCP/IP; it just creates a second, independent, new connection. Usually the only information you want to propagate from the initial TCP connection is the source IP address. The ALB, like most load balancers and proxies, captures this information from the original connection (the one received from the client) and embeds it in an HTTP header called X-Forwarded-For.
This is documented at https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/x-forwarded-headers.html
If you want to capture other information from the original connection, I am afraid it will not be possible using an ALB (but I would also be very curious about the use case, i.e. WHAT you're trying to achieve).
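A minimal stdlib sketch of consuming that header on the instance (everything except the header name is illustrative):

# Sketch: recover the original client IP from X-Forwarded-For behind an ALB.
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Left-most entry is the client as seen by the load balancer;
        # later entries are intermediate proxies, if any.
        xff = self.headers.get("X-Forwarded-For", "")
        client_ip = xff.split(",")[0].strip() if xff else self.client_address[0]
        self.send_response(200)
        self.end_headers()
        self.wfile.write(f"client ip: {client_ip}\n".encode())

HTTPServer(("0.0.0.0", 8080), Handler).serve_forever()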

How does AWS NLB work when it comes to routing packets?

I am trying to understand how an AWS NLB routes traffic between a client and targets. I ran a simple test with an NLB, using tcpdump on both the client and the server, to see who sends packets where. This is what I found from tcpdump:
NLB IP:172.31.29.2 Mac:02:4d:8f:d9:22:e2
Client IP:172.31.20.174 Mac: 02:7e:b1:06:aa:42
02:7e:b1:06:aa:42 > 02:4d:8f:d9:22:e2, ethertype IPv4 (0x0800), length 74: 172.31.20.174.47682 > 172.31.29.2.80: Flags [S]
02:4d:8f:d9:22:e2 > 02:7e:b1:06:aa:42, ethertype IPv4 (0x0800), length 74: 172.31.29.2.80 > 172.31.20.174.47682: Flags [S.]
Server IP:172.31.24.59 Mac: 02:0d:2c:74:be:88
02:4d:8f:d9:22:e2 > 02:0d:2c:74:be:88, ethertype IPv4 (0x0800), length 74: 172.31.20.174.47682 > 172.31.24.59.http: Flags [S]
02:0d:2c:74:be:88 > 02:7e:b1:06:aa:42, ethertype IPv4 (0x0800), length 74: 172.31.24.59.http > 172.31.20.174.47682: Flags [S.]
From line #4 above, the server responded with an acknowledgement packet directly to the client, which makes me think it is doing direct routing instead of sending response packets via the NLB.
But when I look at line #2, I expected the acknowledgement packet to come from the server's IP/MAC address (as in line #4), rather than from the NLB's IP/MAC address. I do understand that in line #3 the MAC address is the NLB's, but since the NLB preserves the client IP, we can see the client IP intact.
I would appreciate it if someone could explain how this routing is happening.
The short version: Network Load Balancer isn't really a device, and VPC isn't really Ethernet, and what you see is an artifact of this otherwise-convincing illusion.
NLB is provided by an internal service called AWS Hyperplane, which is integrated with the VPC network infrastructure. It manipulates traffic in the network at the flow level, rewriting source or destination IP addresses as the traffic passes from machine to machine.
Take two machines on the same subnet, without an NLB in the mix, that don't have each other in their ARP tables... and ping from one to the other. On the instance where you run the ping, you'll see ARP traffic going out to discover the other instance, and you'll see the ARP response come back from the other instance. But on the other instance, you'll sniff... nothing, because that ARP negotiation never actually occurred end-to-end. It only looks like it did. The ARP response is forged by the network, including the source MAC address of the second machine.
Something similar is happening here. The network is essentially forging the source MAC at the same time it rewrites the source IP... so the server responds "directly" to the client IP, but then the network translates both the source address and the source MAC to appear to be "from" the NLB ENI. It gives the impression of asymmetry but that's an illusion, because the traffic isn't really going "through" the NLB in either direction.
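A purely conceptual Python model of that flow-level rewriting, using the addresses from the question (this is not how Hyperplane is implemented; it only shows why each capture looks the way it does):

# Conceptual model: rewrite client->NLB into client->target on the way in,
# and rewrite the target's replies so they appear to come from the NLB.
NLB_IP, TARGET_IP = "172.31.29.2", "172.31.24.59"
flows = {}  # (client_ip, client_port) -> target_ip

def to_target(src_ip, src_port, dst_ip, dst_port):
    """Client -> NLB packet: keep the client as source, swap in the target."""
    flows[(src_ip, src_port)] = TARGET_IP
    return src_ip, src_port, TARGET_IP, dst_port

def to_client(src_ip, src_port, dst_ip, dst_port):
    """Target -> client packet: make the source look like the NLB again."""
    assert flows.get((dst_ip, dst_port)) == src_ip
    return NLB_IP, src_port, dst_ip, dst_port

print(to_target("172.31.20.174", 47682, NLB_IP, 80))     # what the server captures
print(to_client(TARGET_IP, 80, "172.31.20.174", 47682))  # what the client captures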

How to distinguish request and response packets in AWS CloudWatch logs?

I hope for your help.
My CloudWatch example is below.
(image: SSH connection logs with 172.0.0.10)
As you can see, CloudWatch is logging both request and response packets.
In this case, everyone knows that packets showing 22 as the destination port are response packets, because port 22 is the well-known SSH server port.
However, if it is not a well-known port number, you will not be able to distinguish between request and response packets. How do you distinguish them in that case? The CloudWatch log alone does not show me how. No matter how I google it, I cannot find a way. Please advise.
In this case, everyone knows that packets showing 22 as the destination port are response packets, because port 22 is the well-known SSH server port.
That's not actually correct. It's the opposite.
The server side of a TCP connection uses the well-known port, not the client;¹ thus the well-known port is the destination of a request and the source of a response.
Packets with a source port of 22 would be the SSH "response" (server → client) packets. Packets with a destination port of 22 would be the SSH "request" (client → server) packets.
When I make a request to a web server, my source port is ephemeral but the destination port is 80. Responses come from source port 80.
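Applying that port logic programmatically: a hedged Python sketch that labels flow log records by whichever side uses the known service port (22 here). Field positions assume the default VPC Flow Log format, and the sample record is made up.

# Sketch: classify default-format VPC Flow Log records by the known service port.
SERVICE_PORT = 22

def direction(record: str) -> str:
    fields = record.split()
    srcport, dstport = int(fields[5]), int(fields[6])  # srcport, dstport fields
    if dstport == SERVICE_PORT:
        return "client -> server (request side)"
    if srcport == SERVICE_PORT:
        return "server -> client (response side)"
    return "unknown (neither side uses the known service port)"

sample = "2 123456789012 eni-0a1b2c3d 203.0.113.12 172.0.0.10 54321 22 6 10 840 1600000000 1600000060 ACCEPT OK"
print(direction(sample))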
But of course, the argument can be made that the terms "request" and "response" don't properly apply to packets at all;
rather, they apply to what the packet contains -- and that is protocol-specific. In many cases, the client does the requesting and the server does the responding, but that correlation does not cleanly map down to the lower layers of the protocol stack.
In the case of TCP, one side is always listening for connections, usually on a specific port, and that port is usually known to you, if not a "well-known" port, because you are the one who created the service and configured it to listen there.
As these flow log records do not capture the flags that are needed to discern the source and dest of the SYN... SYN+ACK... ACK sequence, you can't ascertain who originated the connection.
With no knowledge of the well-known-ed-ness or other significance of "port 22," it is still easy to conclude from your logs that 172.0.0.10 has a TCP socket listening on that port and that numerous other clients are connecting to it from their ephemeral ports... and we can confirm that this is still listening by running netstat -tln on that machine.
¹ not the client most of the time. There are cases where a server daemon is also a client and will use the well-known port as its source port for outgoing connections, so source and dest might be the same in such a case. I believe Sendmail might be an example of this, at least in some cases, but these are exceptions.