I have the application sending a lot of UDP traffic to AWS NLB.
As per documentation:
https://docs.aws.amazon.com/elasticloadbalancing/latest/network/network-load-balancers.html
While UDP is connectionless, the load balancer maintains UDP flow state based on the source and destination IP addresses and ports, ensuring that packets that belong to the same flow are consistently sent to the same target. After the idle timeout period elapses, the load balancer considers the incoming UDP packet as a new flow and routes it to a new target. Elastic Load Balancing sets the idle timeout value for UDP flows to 120 seconds.
Can we somehow change this timer ? I would like to go down to 1 second....
Thanks,
Related
I am bit puzzled about how TCP load balancer works, in particular AWS ELB. By looking at AWS ELB doc:
For TCP traffic, the load balancer selects a target using a flow hash algorithm based on the protocol, source IP address, source port, destination IP address, destination port, and TCP sequence number. The TCP connections from a client have different source ports and sequence numbers, and can be routed to different targets. Each individual TCP connection is routed to a single target for the life of the connection.
This confuses me. I assume/expect that a TCP connection persists to the same target for the duration of it. However (!) if the hash algorithm takes in account also the TCP sequence number - which is going to change for every tcp packet round-trip - then such connection is (wrongly) routed to other targets after every tcp packet round-trip.. Please help.
I'm trying to get TCP timestamp from the packets for clock skewing purposes on my application which is hosted on EC2. In my network I have an ALB.
So my question is how do I get TCP level packet information in my app ? Since ALB filters out all the OSI Layers except application level (HTTP)
If the only reason to get access to TCP packet is to detect timestamp and correct clock drift, I would suggest to configure your EC2 instance to use NTP time server instead.
https://aws.amazon.com/blogs/aws/keeping-time-with-amazon-time-sync-service/
That being said, the ALB is not "removing" TCP information from network packets. HTTP connections made to your application are still transported over IP and TCP. If you need low level access to network packets from an app, I would suggest to look at the pCAP library which is used by TCPDUMP and many other tool to capture network traffic on an interface.
https://www.tcpdump.org/
[UPDATED to include comments]
It is important to understand the TCP connection between your client and the ALB is terminated at the ALB level. The ALB creates a second TCP connection to forward HTTP requests to your EC2 instance. The ALB does not remove information from TCP/IP, it just creates a second, independent and new connection. Usually the only information you want to propagate from the initial TCP connection is the source IP address. The ALB, like most load balancers and proxies, captures this information from the original connection (the one received from the client) and embed the information in an HTTP header called X-Forwarded-For.
This is documented at https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/x-forwarded-headers.html
If you want to capture other information from the original connection, I am afraid it will not be possible using ALB. (but I also would be very curious about the use case, i.e. WHAT you're trying to achieve)
I run a rabbit HA cluster with 3 nodes and a classic AWS load-balancer(LB) in front of them. There are two apps, one that publishes and the other one that consumes through the LB.
When publisher app starts sending 3 million messages, after short period of time its connection is put into Flow Control state. After the publishing is finished, in publisher app logs I can see that all 3 million messages are sent. On the other hand in consumer app log I can only see 500K - 1M messages (varies between runs), which means that the large number of messages is lost.
So what is happening is that in the middle of a run, classic LB decides to change its IP address or drop connections, thus loosing a lot of messages (see my update for more details).
The issue does not occur if I skip LB and hit the nodes directly, doing load-balancing on app side. Of course in this case I lose all the benefits of ELB.
My question are:
Why is LB changing IP addresses and dropping connections, is that related to high message rate from publisher or Flow Control state?
How to configure LB, so that this issue doesn't occur?
UPDATE:
This is my understanding what is happening:
I use AMQP 0-9-1 and publish without 'publish confirms', so message is considered sent as soon as it's put on a wire. Also, the connection on rabbitmq node is between LB and a node, not Publisher app and a node.
Before the communication enters Flow Control, messages are passed from LB to a node immediately
Then the connection between LB and a node enters Flow Control, Publisher App connection is not blocked and thus it continues to publish at the same rate. That causes messages to pile up on LB.
Then LB decides to change IP(s) or drop the connection for whatever reasons and create a new one, causing all the piled messages to be lost. This is clearly visible from the RabbitMQ logs:
=WARNING REPORT==== 6-Jan-2018::10:35:50 ===
closing AMQP connection <0.30342.375> (10.1.1.250:29564 -> 10.1.1.223:5672):
client unexpectedly closed TCP connection
=INFO REPORT==== 6-Jan-2018::10:35:51 ===
accepting AMQP connection <0.29123.375> (10.1.1.22:1886 -> 10.1.1.223:5672)
The solution is to use AWS network LB. The network LB is going to create a connection between Publisher App and rabbitmq node. So if the connection is blocked or dropped Publisher is going to be aware of that and act accordingly. I have run the same test with 3M messages and not the single message is lost.
In the AWS docs, there's this line which explains the behaviour:
Preserve source IP address Network Load Balancer preserves the client side source IP allowing the back-end to see the IP address of
the client. This can then be used by applications for further
processing.
From: https://aws.amazon.com/elasticloadbalancing/details/
ELBs will change their addresses when they scale in reaction to traffic. New nodes come up, and appear in DNS, and then old nodes may go away eventually, or they may stay online.
It increases capacity by utilizing either larger resources (resources with higher performance characteristics) or more individual resources. The Elastic Load Balancing service will update the Domain Name System (DNS) record of the load balancer when it scales so that the new resources have their respective IP addresses registered in DNS. The DNS record that is created includes a Time-to-Live (TTL) setting of 60 seconds, with the expectation that clients will re-lookup the DNS at least every 60 seconds. (emphasis added)
— from “Best Practices in Evaluating Elastic Load Balancing”
You may find more useful information in that "best practices" guide, including the concept of pre-warming a balancer with the help of AWS support, and how to ramp up your test traffic in a way that the balancer's scaling can keep up.
The behavior of a classic ELB is automatic, and not configurable by the user.
But it also sounds as if you have configuration issues with your queue, because it seems like it should be more resilient to dropped connections.
Note also that an AWS Network Load Balancer does not change its IP addresses and does not need to scale by replacing resources the way ELB does, because unlike ELB, it doesn't appear to run on hidden instances -- it's part of the network infrastructure, or at least appears that way. This might be a viable alternative.
I read the aws documentation inside the elastic beanstalk program where aws is responsible for scaling the servers and auto managing it. In the same documentation there is an option for changing and configuring the load balancer. In my case I want to change it to balance the requests that come to the servers on the IP network layer (L3), but it says that only HTTP and TCP can be listened and balanced.
I am developing a chat application backend that need to be developed with scaling in considerations. how can I configure the load balancer to listen on L3 ?
the chat application in order to work it must make the tcp connection with the server not the load balancer so that's why I must load the packets on the IP layer to the server so the server can establish a tcp connection with the app ( if I am wrong and I can do it on the tcp layer tell me ).
If I can't, does that give me another option or I will just be forced on using ec2 and handle all the system management overhead myself and create my own load balancer ?
ELB Classic operates at either Layer 4 or Layer 7. Those are the options.
the chat application in order to work it must make the tcp connection with the server not the load balancer so that's why I must load the packets on the IP layer to the server so the server can establish a tcp connection with the app.
You're actually incorrect about this. If you need to know the client's source IP address, you can enable the Proxy Protocol on your ELB, and support this in your server code.
When the ELB establishes each new connection to the instance, with the Proxy Protocol enabled, the ELB emits a single line preamble containing the 5-way tuple describing the external connection, which your application can interpret. Then it opens up the L4 connection's payload streams and is transparent for the remainder of the connection.
http://docs.aws.amazon.com/elasticloadbalancing/latest/classic/enable-proxy-protocol.html
So, we are kinda kinda lost using the AWS ELB connection draining feature.
We have an Auto Scaling Group and we have an application that has independent sessions (A session on every instance). We configured the ELB listener over HTTP on port 80, forwarding to port 8080 (this is of course the port where the application is deployed) and we created a LBCookieStickinessPolicy. We also enabled the connection draining for 120 seconds.
The behavior we want:
We want to scale down an instance but since the session is sticked to each instance, we want to "maintain" that session during 120 seconds (Or the connection draining configuration).
The behavior we have:
We have tried to deregister, set to stanby, terminate, stop, set to unhealthy an instance. But no matter what we do, the instance shut downs immediately causing the session to end abruptly. Also, we changed the ELB listener configuration to work over TCP with no luck.
Thoughts?
Connection draining refers to open tcp connections with the client it has nothing to do with sessions on your instance. You may be able to do something with keep-alives if you do a TCP passthrough instead of http listener.
The best route to go is set up sessions to be shared between your instances and then disable stickyness on the load balancer.