Fixing AWS Application Load Balancer connection timeouts under small loads

I'm currently using an application load balancer to split traffic between 3 instances.
Testing the individual instances (connecting via IP), they are able to handle about ~200 req/second without any connection timeouts (the timeout being set at 5 seconds).
As such, I'd assume that load balancing over all 3 of them would scale this up to ~600req/second (there's no bottleneck further down the pipe to stop this).
However, when sending the exact same type of test requests to an application load balancer, the connections start to time out before I even hit 100req/second.
I already eliminated the possibility of a slowdown due to HTTPS (by just sending the requests over HTTP); the instances themselves are healthy and not under heavy use, and the load balancer reports no "errors".
I've also configured IP stickiness for ~20 minutes to try and improve the situation but it hasn't helped one bit.
What could be the cause of this problem? I found no information about increasing the network capacity of an LB on AWS, and no similar questions, so I'm bound to be doing something wrong, but I'm quite unsure what that something is.

Related

AWS classic LB changing IPs/dropping connections results in lost messages on RabbitMQ

I run a RabbitMQ HA cluster with 3 nodes and a classic AWS load balancer (LB) in front of them. There are two apps, one that publishes and one that consumes through the LB.
When the publisher app starts sending 3 million messages, after a short period of time its connection is put into the Flow Control state. After the publishing is finished, in the publisher app logs I can see that all 3 million messages are sent. On the other hand, in the consumer app log I can only see 500K - 1M messages (varies between runs), which means that a large number of messages are lost.
So what is happening is that in the middle of a run, the classic LB decides to change its IP address or drop connections, thus losing a lot of messages (see my update for more details).
The issue does not occur if I skip LB and hit the nodes directly, doing load-balancing on app side. Of course in this case I lose all the benefits of ELB.
My questions are:
Why is LB changing IP addresses and dropping connections, is that related to high message rate from publisher or Flow Control state?
How to configure LB, so that this issue doesn't occur?
UPDATE:
This is my understanding what is happening:
I use AMQP 0-9-1 and publish without 'publish confirms', so a message is considered sent as soon as it's put on the wire. Also, the connection on the rabbitmq node is between the LB and the node, not between the Publisher app and the node.
Before the communication enters Flow Control, messages are passed from the LB to a node immediately.
Then the connection between the LB and a node enters Flow Control, the Publisher App connection is not blocked, and thus it continues to publish at the same rate. That causes messages to pile up on the LB.
Then the LB decides to change IP(s) or drop the connection for whatever reason and create a new one, causing all the piled-up messages to be lost. This is clearly visible from the RabbitMQ logs:
=WARNING REPORT==== 6-Jan-2018::10:35:50 ===
closing AMQP connection <0.30342.375> (10.1.1.250:29564 -> 10.1.1.223:5672):
client unexpectedly closed TCP connection
=INFO REPORT==== 6-Jan-2018::10:35:51 ===
accepting AMQP connection <0.29123.375> (10.1.1.22:1886 -> 10.1.1.223:5672)
The solution is to use an AWS network LB. The network LB is going to create a connection between the Publisher App and the rabbitmq node. So if the connection is blocked or dropped, the Publisher is going to be aware of that and act accordingly. I have run the same test with 3M messages and not a single message was lost.
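For completeness, publishing with 'publish confirms' enabled would also surface dropped messages, whatever load balancer sits in front. A minimal pika sketch of what that looks like (the host and queue names are placeholders, not my actual setup):

import pika

# Hedged sketch: with publisher confirms on, an unconfirmed or returned
# message raises instead of being silently lost on the wire.
params = pika.ConnectionParameters(host="rabbit-lb.example.com")  # placeholder host
connection = pika.BlockingConnection(params)
channel = connection.channel()
channel.queue_declare(queue="events", durable=True)  # placeholder queue
channel.confirm_delivery()  # enable publisher confirms on this channel

try:
    channel.basic_publish(
        exchange="",
        routing_key="events",
        body=b"hello",
        properties=pika.BasicProperties(delivery_mode=2),  # persistent message
        mandatory=True,
    )
except (pika.exceptions.UnroutableError, pika.exceptions.NackError):
    # the broker did not accept the message; log it or retry
    pass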
In the AWS docs, there's this line which explains the behaviour:
Preserve source IP address: Network Load Balancer preserves the client side source IP, allowing the back-end to see the IP address of the client. This can then be used by applications for further processing.
From: https://aws.amazon.com/elasticloadbalancing/details/
ELBs will change their addresses when they scale in reaction to traffic. New nodes come up, and appear in DNS, and then old nodes may go away eventually, or they may stay online.
It increases capacity by utilizing either larger resources (resources with higher performance characteristics) or more individual resources. The Elastic Load Balancing service will update the Domain Name System (DNS) record of the load balancer when it scales so that the new resources have their respective IP addresses registered in DNS. The DNS record that is created includes a Time-to-Live (TTL) setting of 60 seconds, with the expectation that clients will re-lookup the DNS at least every 60 seconds. (emphasis added)
— from “Best Practices in Evaluating Elastic Load Balancing”
You may find more useful information in that "best practices" guide, including the concept of pre-warming a balancer with the help of AWS support, and how to ramp up your test traffic in a way that the balancer's scaling can keep up.
The behavior of a classic ELB is automatic, and not configurable by the user.
But it also sounds as if you have configuration issues with your queue, because it seems like it should be more resilient to dropped connections.
Note also that an AWS Network Load Balancer does not change its IP addresses and does not need to scale by replacing resources the way ELB does, because unlike ELB, it doesn't appear to run on hidden instances -- it's part of the network infrastructure, or at least appears that way. This might be a viable alternative.

Elastic Beanstalk reports 5xx errors even though instances are in perfect health

I need to set up an api application for gathering event data to be used in a recommendation engine. This is my setup:
Elastic Beanstalk env with a load balancer and autoscaling group.
I have 2x t2.medium instances running behind a load balancer.
EBS configuration is 64bit Amazon Linux 2016.03 v2.1.1 running Tomcat 8 Java 8
Additionally I have 8x t2.micro instances that I use for high-load testing the api, sending thousands of requests/sec to be handled by the api.
I'm using Locust (http://locust.io/) as my load testing tool.
Each t2.micro instance that is run by Locust can send up to about 500req/sec
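For reference, a minimal locustfile sketch along the lines of this setup (Locust 1.x+ API; the endpoint path and payload are placeholders, not my actual API):

from locust import HttpUser, task, constant

class EventApiUser(HttpUser):
    wait_time = constant(1)  # roughly one request per second per simulated user

    @task
    def post_event(self):
        # placeholder endpoint and payload for the event-collection API
        self.client.post("/events", json={"user_id": 1, "item_id": 42})

Run with something like: locust -f locustfile.py --host http://<elb-dns-name>.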
Everything works fine while the reqs/sec are below 1000, maybe 1200. Once over that, my load balancer reports that some of the instances behind it are reporting 5xx errors (attached). I've also tried with 4 instances behind the load balancer, and although things start out well with up to 3000req/sec, soon after, the ebs health tool and Locust both report 503s and 504s, while all of the instances are in perfect health according to the actual numbers in the ebs Health Overview, showing only 10%-20% CPU utilization.
Is there something I'm missing in configuring the env? It seems like no matter how many machines I have behind the load balancer, the env handles no more than 1000-2000 requests per second.
EDIT:
Now I know for sure that it's the ELB that is causing the problems, not the instances.
I ran a load test with 10 simulated users. Each user sends about 1 req/sec and the load increases by 10 users/sec up to 4000 users, which should equal about 4000 req/sec. Still, it doesn't seem to like any request rate over 3.5k req/sec (attachment1).
As you can see from attachment2, the 4 instances behind the load balancer are in perfect health, but I still keep getting 503 errors. It's just the load balancer itself causing problems. Look how SurgeQueueLength and SpilloverCount increase rapidly at some point. (attachment3) I'm trying to figure out why.
Also I completely removed the load balancer and tested with just one instance alone. It can handle up to about 3k req/sec. (attachment4 and attachment5), so it's definitely the load balancer.
Maybe I'm missing some crucial limit that load balancers have by default, like the queue size of 1024? What is normal handle rate for 1 load balancer? Should I be adding more load balancers? Could it be related to availability zones? ELB listeners from one zone are trying to route to instances from a different zone?
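For what it's worth, the surge queue can be watched directly in CloudWatch; here is a hedged boto3 sketch (the load balancer name and region are placeholders):

import boto3
from datetime import datetime, timedelta, timezone

# Pull the classic-ELB metrics discussed above for the last hour.
cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")
end = datetime.now(timezone.utc)
start = end - timedelta(hours=1)

for metric, stat in [("SurgeQueueLength", "Maximum"),
                     ("SpilloverCount", "Sum"),
                     ("HTTPCode_ELB_5XX", "Sum")]:
    resp = cloudwatch.get_metric_statistics(
        Namespace="AWS/ELB",
        MetricName=metric,
        Dimensions=[{"Name": "LoadBalancerName", "Value": "my-elb-name"}],  # placeholder
        StartTime=start, EndTime=end,
        Period=60,  # classic ELB metrics are published at 1-minute granularity
        Statistics=[stat],
    )
    points = sorted(resp["Datapoints"], key=lambda p: p["Timestamp"])
    print(metric, [p[stat] for p in points])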
(attachments 1-5: load test output and Elastic Beanstalk/ELB health screenshots, not reproduced here)
UPDATE:
Cross zone load balancing is enabled
UPDATE:
maybe this helps more:
The message says that "9.8 % of the requests to the ELB are failing with HTTP 5xx (6 minutes ago)". This does not mean that your instances are returning HTTP 5xx responses. The requests are failing at the ELB itself. This can happen when your backend instances are at capacity (e.g. connections are saturated and they are rejecting connections from the ELB).
Your requests are spilling over at the ELB. They never make it to the instance. If they were failing at the EC2 instances then the cause would be different and data for the environment would match the data for the instances.
Also note that the cause says that this was the state "6 minutes ago". Elastic Beanstalk uses multiple data sources: one is the data coming from the instance, which shows the requests per second and HTTP status codes in the table shown. Another data source is the CloudWatch metrics for your ELB. Since CloudWatch metrics for the ELB have 1-minute granularity, this data is slightly delayed, and the cause tells you how old the information is.

Do load balancers flood?

I am reading about load balancing.
I understand the idea that load balancers transfer the load among several slave servers of any given app. However, very little of the literature I can find talks about what happens when the load balancers themselves start struggling with the huge amount of requests, to the point that the "simple" task of load balancing (distributing requests among slaves) becomes an impossible undertaking.
Take for example this picture where you see 3 Load Balancers (LB) and some slave servers.
Figure 1: Clients know one IP to which they connect, one load balancer is behind that IP and will have to handle all those requests, thus that first load balancer is the bottleneck (and the internet connection).
What happens when the first load balancer starts struggling? If I add a new load balancer alongside the first one, I must add yet another one so that the clients only need to know one IP. So the dilemma continues: I still have only one load balancer receiving all my requests...!
Figure 2: I added one load balancer, but for having clients to know just one IP I had to add another one to centralize the incoming connections, thus ending up with the same bottleneck.
Moreover, my internet connection will also reach the limit of clients it can handle, so I will probably want to have my load balancers in remote places to avoid flooding my internet connection. However, if I distribute my load balancers and want my clients to know just one single IP to connect to, I still need to have one central load balancer behind that IP carrying all the traffic once again...
How do real-world companies like Google and Facebook handle these issues? Can this be done without giving the clients multiple IPs and expecting them to choose one at random, so that not every client connects to the same load balancer and floods us?
Your question doesn't sound AWS specific, so here's a generic answer (elastic LB in AWS auto-scales depending on traffic):
You're right, you can overwhelm a load balancer with the number of requests coming in. If you deploy an LB on a standard build machine, you're likely to first exhaust/overload the network stack, including the maximum number of open connections and the rate at which incoming connections can be handled.
As a first step, you would fine-tune the network stack of your LB machine. If that still does not provide you the required throughput, there are special-purpose load balancer appliances on the market that are built from the ground up and highly optimized to handle a large number of incoming connections and route them to several servers. Examples of these are F5 and NetScaler.
You can also design your application in ways that help you split traffic to different sub domains, thereby reducing the number of requests 1 LB has to handle.
It is also possible to implement a round-robin DNS, where you would have 1 DNS entry point to several client facing LBs instead of just one as you've depicted.
Advanced load balancers like NetScaler and similar also do GSLB with DNS, not simple DNS round-robin (to explain further scaling).
If you are to connect to, e.g., service.domain.com, you let the load balancers become the authoritative DNS for the zone and you add all the load balancers as valid name servers.
When a client looks up "service.domain.com", any of your load balancers will answer the DNS request and reply with the IP of the correct data center for your client. You can then further make the load balancer reply to the DNS request based on the geolocation of your client, the latency between the client's DNS server and the NetScaler, or the load of the different data centers.
In each datacenter you typically set up one node or several nodes in a cluster. You can scale quite high using such a design.
Since you tagged Amazon: they have load balancers built into their system, so you don't need to run your own. Just use ELB and Amazon will direct the traffic to your correct system.
If you are doing it yourself, load balancers typically have a very light processing load. They typically do little more than redirect a connection from one machine to another based on a shallow inspection (or no inspection) of the data. It is possible for them to be overwhelmed, but typically that requires a load that would saturate most connections.
If you are running it yourself, and if your load balancer is doing more work or your connection is getting saturated, the next step is to use Round-Robin DNS for looking up your load balancers, generally using a combination of NS and CNAME records so different name lookups give different IP addresses.
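As a client-side illustration of round-robin DNS, something like this Python sketch shows the multiple A records that a single name can return (the hostname is a placeholder):

import random
import socket

# One name resolves to several A records; the client picks one of them.
_, _, addresses = socket.gethostbyname_ex("api.example.com")  # placeholder name
print(addresses)                   # e.g. ['203.0.113.10', '203.0.113.11', ...]
target = random.choice(addresses)  # connect to one of the balancer IPs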
If you plan to use the Amazon Elastic Load Balancer, they claim that
Elastic Load Balancing automatically scales its request handling capacity to meet the demands of application traffic. Additionally, Elastic Load Balancing offers integration with Auto Scaling to ensure that you have back-end capacity to meet varying levels of traffic without requiring manual intervention.
so you can go with that and do not need to handle the load balancer using your own instance/product.

How can I configure an automatic timeout for an Elastic Load Balancer?

Does anyone know of a way to make Amazon's Elastic Load Balancers timeout if an HTTP response has not been received from upstream in a set timeframe?
Occasionally Amazon's Elastic Beanstalk will fail an update, and any requests to the specified resource (running Nginx + Node, if that's any use) will hang whilst the resource attempts to load.
I'd like to keep the request timeout under 2s, and if the upstream server has no response by then, to automatically fail over to a default 503 response.
Is this possible with ELB?
Cheers
You can Configure Health Check Settings for Elastic Load Balancing to achieve this:
Elastic Load Balancing routinely checks the health of each registered Amazon EC2 instance based on the configurations that you specify. If Elastic Load Balancing finds an unhealthy instance, it stops sending traffic to the instance and reroutes traffic to healthy instances. For more information on configuring health check, see Health Check.
For example, you simply need to specify an appropriate Ping Path for the HTTP health check, a Response Timeout of 2 seconds and an UnhealthyThreshold of 1 to approximate your specification.
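For illustration, a hedged boto3 sketch approximating those settings on a classic ELB (the load balancer name and ping path are placeholders):

import boto3

elb = boto3.client("elb", region_name="us-east-1")
elb.configure_health_check(
    LoadBalancerName="my-classic-elb",        # placeholder name
    HealthCheck={
        "Target": "HTTP:80/healthcheck",      # ping protocol:port/path
        "Interval": 5,                        # seconds between checks
        "Timeout": 2,                         # response timeout of 2 seconds
        "UnhealthyThreshold": 2,              # failed checks before marking unhealthy
        "HealthyThreshold": 3,                # passed checks before marking healthy
    },
)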
See my answer to What does the Amazon ELB automatic health check do and what does it expect? for more details on how the ELB health check system works.
TLDR - Set your timeout in Nginx.
Let's see if we can walk through the issues.
Problem:
The client should be presented with something quickly. It's okay if it's a 500 page. However, the ELB currently waits 60 seconds until giving up (https://forums.aws.amazon.com/thread.jspa?messageID=382182) which means it takes a minute before the user is shown anything.
Solutions:
Change the timeout of the ELB
Looks like AWS support will help increase the timeout (https://forums.aws.amazon.com/thread.jspa?messageID=382182) so I imagine that you'll be able to ask for the reverse. Thus, we can see that it's not user/api tunable and requires you to interact with support. This takes a bit of lead time and more importantly, seems like an odd dial to tune when future developers working on this project will be surprised by such a short timeout.
Change the timeout of the nginx server
This seems like the right level of change. You can use proxy_read_timeout (http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_read_timeout) to do what you're looking for. Tune it to something small (and in particular, you can set it for a particular location if you would like).
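For illustration, a minimal nginx location block along those lines (the upstream address and paths here are assumptions, not your actual config):

location /api/ {
    proxy_pass http://127.0.0.1:8080;    # placeholder Node upstream
    proxy_connect_timeout 2s;            # bound the time spent connecting upstream
    proxy_read_timeout 2s;               # give up if upstream sends nothing for 2 seconds
    error_page 504 =503 /503.html;       # surface the timeout as a default 503 page
}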
Change the way the request happens.
It may be beneficial to change how your client code works. You could imagine shipping a really simple HTML/JS page that 1. pings to see if the job is done and 2. keeps the user updated on the progress. This takes a bit more work than just throwing the 500 page.
Recently, AWS added a way to configure timeouts for ELB. See this blog post:
http://aws.amazon.com/blogs/aws/elb-idle-timeout-control/
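For reference, the same idle-timeout knob can also be set through the classic ELB API; a hedged boto3 sketch (the load balancer name is a placeholder):

import boto3

elb = boto3.client("elb", region_name="us-east-1")
elb.modify_load_balancer_attributes(
    LoadBalancerName="my-classic-elb",  # placeholder name
    LoadBalancerAttributes={"ConnectionSettings": {"IdleTimeout": 2}},  # seconds
)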

How can I make an ELB forward requests from a single client to multiple nodes?

I'm currently running a REST API app on two EC2 nodes under a single load balancer. Rather than the standard load-balancing scenario of small amounts of traffic coming from many IPs, I get huge amounts of traffic from only a few IPs. Therefore, I'd like requests from each individual IP to be spread among all available nodes.
Even with session stickiness turned off, however, this doesn't appear to be the case. Looking at my logs, almost all requests are going to one server, with my smallest client going to the secondary node. This is detrimental, as requests to my service can last up to 30 seconds and losing that primary node would mean a disproportionate amount of requests get killed.
How can I instruct my ELB to round-robin for each client's individual requests?
You cannot. ELB uses a non-configurable round-robin algorithm. What you can do to mitigate (and not solve) this problem is adding additional servers to your ELB and/or making the health check requests initiated by your ELB more frequent.
I understand where you're coming from. However, I think you should approach the problem from a different angle. Your problem, it appears, isn't specifically related to the fact that the load is not balanced. Let's say you do get this balancing problem solved. You're still going to lose a large number of requests. I don't know how your clients connect to your services, so I can't go into details on how you might fix the problem, but you may want to look at improving the code to be more robust and plan for the connection to get dropped. No service that has connections of 30+ seconds should rely on the connection not getting dropped. Back in the days of TCP/UDP sockets there was a lot more work done on building for failures; somehow that's gotten lost in today's HTTP world.
What I'm trying to say is: if you write the code your clients are using to connect, build the code to be more robust and handle failures with retries. Once you start performing retries you'll need to make sure that your API calls are atomic and use transactions where necessary.
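As an illustration of that retry idea, a hedged Python sketch (the URL is a placeholder, and the call is assumed to be safe to repeat):

import time
import requests

def call_api(url, attempts=3):
    # Retry with exponential backoff; assumes the server makes the call idempotent.
    for attempt in range(attempts):
        try:
            resp = requests.post(url, json={"op": "example"}, timeout=35)
            resp.raise_for_status()
            return resp.json()
        except requests.RequestException:
            if attempt == attempts - 1:
                raise
            time.sleep(2 ** attempt)  # back off before retrying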
Lastly, I'll answer your original question. Amazon's ELBs are round robin even from the same computer / IP address. If your clients are always connecting to the same server, it's most likely the browser or code that is caching the response. If they're not directly accessing your REST API from a browser, most languages allow you to get a list of IPs for a given host name. Those IPs will be the IPs of the load balancers, and you can just shuffle the list and then use the top entry each time. For example, you could use the following PHP code to randomly send requests to a different load balancer.
public function getHostByName($domain) {
    $ips = gethostbynamel($domain);
    shuffle($ips);
    return $ips[0];
}
I have had similar issues with Amazon ELB; however, for me it turned out that the HTTP client used Connection: keep-alive. In other words, the requests from the same client were served over the same connection, and for that reason it did not switch between the servers.
I don't know which server you use, but it is probably possible to turn off keep-alive, forcing the client to make a new connection for every request. This might be a good solution for requests with a lot of data. If you have a large number of requests with small payloads it might affect performance negatively.
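If you control the client rather than the server, a similar effect can be had from the client side by disabling keep-alive per request; a minimal Python sketch (the URL is a placeholder):

import requests

# Each call sends "Connection: close", so the connection is not reused and
# every request can be balanced onto a different backend connection.
for _ in range(10):
    requests.get(
        "http://my-elb-1234567890.us-east-1.elb.amazonaws.com/resource",  # placeholder
        headers={"Connection": "close"},
    )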
This may happen when you have the two instances in different availability zones.
When one ELB is working with multiple instances in a single availability zone, it will round-robin the requests between the instances.
When two instances are in two different availability zones, the way ELB works is to create two servers (ELB nodes), each with its own IP, and balance the load between them with DNS.
When your client asks the DNS for the IP address of your server, it receives two (or more) responses. Then the client chooses one IP and caches that (the OS usually does). Not much you can do about this, unless you control the clients.
If your problem is that the two instances are in different availability zones, the solution might be to have at least two instances in each availability zone. Then one single ELB server will handle the round-robin across two servers and will have just one IP, so when a server fails it will be transparent to the clients.
PS: Another case where ELBs create more servers with unique IPs is when you have a lot of servers in a single availability zone and one single ELB server can't handle all the load it needs to distribute to the connected servers. Then again, a new ELB server is created to connect the extra instances and the load is distributed using DNS and multiple IPs.