GCP loadbalancer not distributing traffic evenly/properly - google-cloud-platform

I have 3 instances attached to a gcp http loadbalancer. I have webservice running on all three of the instances. When I send requests , for example 3 request concurrently to loadbalancer, sometimes it routes all of the 3 requests to one instance, sometimes the loadbalancer routes the requests amongst the instances, but that is not even. What I mean by even is, when there is already load on a instance, still it would send request to that instance instead of sending to a instance which has no load. I would like to know how the loadbalancer distributes the traffic? And if there is any specific algorithm for distribution of traffic?
Loadbalancer has healthcheck on, which checks if the webservice is live, I have also tested with CPU usage and I get the same results.

You can fin here about how the traffic is load in a HTTP load balancer. You should also have in mind the instances are physically in some cells, so might the instance itself isn't in use, but other instances from other projects in the same cell are using much resources and the cell is actually with less resources than the one where your instance is getting all the test traffic.

Related

ELB cross-AZ balancing DNS resolution with Sticky sessions

I am preparing for AWS certification and came across a question about ELB with sticky session enabled for instances in 2 AZs. The problem is that requests from a software-based load tester in one of the AZs end up in the instances in that AZ only instead of being distributed across AZs. At the same time regular requests from customers are evenly distributed across AZs.
The correct answers to fix the load tester issue are:
Forced the software-based load tester to re-resolve DNS before every
request;
Use third party load-testing service to send requests from
globally distributed clients.
I'm not sure I can understand this scenario. What is the default behaviour of Route 53 when it comes to ELB IPs resolution? In any case, those DNS records have 60 seconds TTL. Isn't it redundant to re-resolve DNS on every request? Besides, DNS resolution is a responsibility of DNS service itself, not load-testing software, isn't it?
I can understand that requests from the same instance, with load testing software on it, will go to the same LBed EC2, but why does it have to be an instance in the same AZ? It can be achieved only by Geolocation- or Latency-based routing, but I can't find anything in the specs whether those are the default ones.
When an ELB is in more than one availability zone, it always has more than one public IP address -- at least one per zone.
When you request these records in a DNS lookup, you get all of these records (assuming there are not very many) or a subset of them (if there are a large number, which would be the case in an active cluster with significant traffic) but they are unordered.
If the load testing software resolves the IP address of the endpoint and holds onto exactly one of the IP addresses -- as it a likely outcome -- then all of the traffic will go to one node of the balancer, which is in one zone, and will send traffic to instances in that zone.
But what about...
Cross-Zone Load Balancing
The nodes for your load balancer distribute requests from clients to registered targets. When cross-zone load balancing is enabled, each load balancer node distributes traffic across the registered targets in all enabled Availability Zones. When cross-zone load balancing is disabled, each load balancer node distributes traffic across the registered targets in its Availability Zone only.
https://docs.aws.amazon.com/elasticloadbalancing/latest/userguide/how-elastic-load-balancing-works.html
If stickiness is configured, those sessions will initially land in one AZ and then stick to that AZ because they stick to the initial instance where they landed. If cross-zone is enabled, the outcome is not quite as clear, but either balancer nodes may prefer instances in their own zone in that scenario (when first establishing stickiness), or this wasn't really the point of the question. Stickiness requires coordination, and cross-AZ traffic takes a non-zero amount of time due to distance (typically <10 ms) but it would make sense for a balancer to prefer to select instances its local zone for sessions with no established affinity.
In fact, configuring the load test software to re-resolve the endpoint for each request is not really the focus of the solution -- the point is to ensure that (1) the load test software uses all of them and does not latch onto exactly one and (2) that if more addresses become available due to the balancer scaling out under load, that the load test software expands its pool of targets.
In any case, those DNS records have 60 seconds TTL. Isn't it redundant to re-resolve DNS on every request?
The software may not see the TTL, may not honor the TTL and, as noted above, may stick to one answer even if multiple are available, because it only needs one in order to make the connection. Every request is not strictly necessary, but it does solve the problem.
Besides, DNS resolution is a responsibility of DNS service itself, not load-testing software, isn't it?
To "resolve DNS" in this context simply means to do a DNS lookup, whatever that means in the specific instance, whether using the OS's DNS resolver or making a direct query to a recursive DNS server. When software establishes a connection to a hostname, it "resolves" (looks up) the associated IP address.
The other solution, "use third party load-testing service to send requests from globally distributed clients," solves the problem by accident, since the distributed clients -- even if they stick to the first address they see -- are more likely to see all of the available addresses. The "global" distribution aspect is a distraction.
ELB relies on random arrival of requests across its external-facing nodes as part of the balancing strategy. Load testing software whose design overlooks this is not properly testing the ELB. Both solutions mitigate the problem in different ways.
The sticky is the issue , see here : https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/elb-sticky-sessions.html
The load balancer uses a special cookie to associate the session with
the instance that handled the initial request, but follows the
lifetime of the application cookie specified in the policy
configuration. The load balancer only inserts a new stickiness cookie
if the application response includes a new application cookie. The
load balancer stickiness cookie does not update with each request. If
the application cookie is explicitly removed or expires, the session
stops being sticky until a new application cookie is issued.
The first solution, to re-resolve DNS will create new sessions and with this will break the stickiness of the ELB . The second solution is to use multiple clients , stickiness is not an issue if the number of globally distributed clients is large.
PART 2 : could not add as comment , is to long :
Yes, my answer was to simple and incomplete.
What we know is that ELB is 2 AZ and will have 2 nodes with different IP. Not clear how many IP , depends on the number of requests and the number of servers on each AZ. Route 53 is rotating the IP for every new request , first time in NodeA-IP , NodeB-IP , second time is NodeB-IP, NodeA-IP. The load testing application will take with every new request the first IP , balancing between the 2 AZ. Because a Node can route only inside his AZ , if the sticky cookie is for NodeA and the request arrives to NodeB , NodeB will send it to one of his servers in AZ2 ignoring the cookie for a server in AZ 1.
I need to run some tests, quickly tested with Route53 with classic ELB and 2 AZ and is rotating every time the IP's. What I want to test if I have a sticky cookie for AZ 1 and I reach the Node 2 will not forward me to Node 1 ( In case of no available servers, is described in the doc this interesting flow ). Hope to have updates in short time.
Just found another piece of evidence that Route 53 returns multiple IPs and rotate them for ELB scaling scenarios:
By default, Elastic Load Balancing will return multiple IP addresses when clients perform a DNS resolution, with the records being randomly ordered on each DNS resolution request. As the traffic profile changes, the controller service will scale the load balancers to handle more requests, scaling equally in all Availability Zones.
And then:
To ensure that clients are taking advantage of the increased capacity, Elastic Load Balancing uses a TTL setting on the DNS record of 60 seconds. It is critical that you factor this changing DNS record into your tests. If you do not ensure that DNS is re-resolved or use multiple test clients to simulate increased load, the test may continue to hit a single IP address when Elastic Load Balancing has actually allocated many more IP addresses.
What I didn't realize at first is that even if regular traffic is distributed evenly across AZs, it doesn't mean that Cross-Zone Load Balancing is enabled. As Michael pointed out regular traffic will naturally come through various locations and end up in different AZs.
And as it was not explicitly mentioned in the test, Cross-AZ balancing might not have been in place.
https://aws.amazon.com/articles/best-practices-in-evaluating-elastic-load-balancing/

Elastic Beanstalk reports 5xx errors even though instances are in perfect health

I need to set up an api application for gathering event data to be used in a recommendation engine. This is my setup:
Elastic Beanstalk env with a load balancer and autoscaling group.
I have 2x t2.medium instances running behind a load balancer.
EBS configuration is 64bit Amazon Linux 2016.03 v2.1.1 running Tomcat 8 Java 8
Additionally I have 8x t2.micro instances that I use for high-load testing the api, sending thousands of requests/sec to be handled by the api.
Im using Locust (http://locust.io/) as my load testing tool.
Each t2.micro instance that is run by Locust can send up to about 500req/sec
Everything works fine while the reqs/sec are below 1000, maybe 1200. Once over that, my load balancer reports that some of the instances behind it are reporting 5xx errors (attached). I've also tried with 4 instances behind the load balancer, and although things start out well with up to 3000req/sec, soon after, the ebs health tool and Locust both report 503s and 504s, while all of the instances are in perfect health according to the actual numbers in the ebs Health Overview, showing only 10%-20% CPU utilization.
Is there smth I'm missing in configuring the env? It seems like no matter how many machines I have behind the load balancer, the env handles no more than 1000-2000 requests per second.
EDIT:
Now I know for sure that it's the ELB that is causing the problems, not the instances.
I ran a load test with 10 simulated users. Each user sends about 1req/sec and the load increases by 10 users/sec to 4000 users, which should equal to about 4000req/sec. Still it doesn't seem to like any request rate over 3.5k req/sec (attachment1).
As you can see from attachment2, the 4 instances behind the load balancer are in perfect health, but I still keep getting 503 errors. It's just the load balancer itself causing problems. Look how SurgeQueueLength and SpilloverCount increase rapidly at some point. (attachment3) I'm trying to figure out why.
Also I completely removed the load balancer and tested with just one instance alone. It can handle up to about 3k req/sec. (attachment4 and attachment5), so it's definitely the load balancer.
Maybe I'm missing some crucial limit that load balancers have by default, like the queue size of 1024? What is normal handle rate for 1 load balancer? Should I be adding more load balancers? Could it be related to availability zones? ELB listeners from one zone are trying to route to instances from a different zone?
attachment1:
attachment2:
attachment3:
attachment4:
attachment5:
UPDATE:
Cross zone load balancing is enabled
UPDATE:
maybe this helps more:
The message says that "9.8 % of the requests to the ELB are failing with HTTP 5xx (6 minutes ago)". This does not mean that your instances are not returning HTTP 5xx responses. The requests are failing at the ELB itself. This can happen when your backend instances are at capacity (e.g. connections are saturated and they are rejecting connections to the ELB).
Your requests are spilling over at the ELB. They never make it to the instance. If they were failing at the EC2 instances then the cause would be different and data for the environment would match the data for the instances.
Also note that the cause says that this was the state "6 minutes ago". Elastic Beanstalk multiple data sources - one is the data coming from the instance which shows the requests per second and HTTP status codes in the table shown. Another data source is cloudwatch metrics for your ELB. Since cloudwatch metrics for ELB are 1 minute, this data is slightly delayed and the cause tells you how old the information is.

Do load balancers flood?

I am reading about load balancing.
I understand the idea that load balancers transfer the load among several slave servers of any given app. However very few literature that I can find talks about what happens when the load balancers themselves start struggling with the huge amount of requests, to the point that the "simple" task of load balancing (distribute requests among slaves) becomes an impossible undertaking.
Take for example this picture where you see 3 Load Balancers (LB) and some slave servers.
Figure 1: Clients know one IP to which they connect, one load balancer is behind that IP and will have to handle all those requests, thus that first load balancer is the bottleneck (and the internet connection).
What happens when the first load balancer starts struggling? If I add a new load balancer to side with the first one, I must add even another one so that the clients only need to know one IP. So the dilema continues: I still have only one load balancer receiving all my requests...!
Figure 2: I added one load balancer, but for having clients to know just one IP I had to add another one to centralize the incoming connections, thus ending up with the same bottleneck.
Moreover, my internet connection will also reach its limit of clients it can handle so I probably will want to have my load balancers in remote places to avoid flooding my internet connection. However if I distribute my load balancers, and want to keep my clients knowing just one single IP they have to connect, I still need to have one central load balancer behind that IP carrying all the traffic once again...
How do real world companies like Google and Facebook handle these issues? Can this be done without giving the clients multiple IPs and expect them to choose one at random avoiding every client to connect to the same load balancer, thus flooding us?
Your question doesn't sound AWS specific, so here's a generic answer (elastic LB in AWS auto-scales depending on traffic):
You're right, you can overwhelm a loadbalancer with the number of requests coming in. If you deploy a LB on a standard build machine, you're likely to first exhaust/overload the network stack including max number of open connections and handling rate of incoming connections.
As a first step, you would fine tune the network stack of your LB machine. If that still does not provide you the required throughput, there are special purpose loadbalancer appliances on the market, that are built ground-up and highly optimized to handle a large number of incoming connections and routing them to several servers. Examples of these are F5 and netscaler
You can also design your application in ways that help you split traffic to different sub domains, thereby reducing the number of requests 1 LB has to handle.
It is also possible to implement a round-robin DNS, where you would have 1 DNS entry point to several client facing LBs instead of just one as you've depicted.
Advanced load balancers like Netscaler and similar also does GSLB with DNS not simple DNS-RR (to explain further scaling)
if you are to connect to i.e service.domain.com, you let the load balancers become Authorative DNS for the zone and you add all the load balancers as valid name servers.
When a client looks up "service.domain.com" any of your loadbalancers will answer the DNS request and reply with the IP of the correct data center for your client. You can then further make the loadbalancer reply on the DNS request based of geo location of your client, latency between clients dns server and netscaler, or you can answer based on the different data centers load.
In each datacenter you typically set up one node or several nodes in cluster. You can scale quite high using such a design.
Since you tagged Amazon, they have load balancers built in to their system so you don't need to. Just use ELB and Amazon will direct the traffic to your correct system.
If you are doing it yourself, load balancers typically have a very light processing load. They typically do little more than redirect a connection from one machine to another based on a shallow inspection (or no inspection) of the data. It is possible for them to be overwhelmed, but typically that requires a load that would saturate most connections.
If you are running it yourself, and if your load balancer is doing more work or your connection is getting saturated, the next step is to use Round-Robin DNS for looking up your load balancers, generally using a combination of NS and CNAME records so different name lookups give different IP addresses.
If you plan to use amazon elastic load balancer they claim that
Elastic Load Balancing automatically scales its request handling
capacity to meet the demands of application traffic. Additionally,
Elastic Load Balancing offers integration with Auto Scaling to ensure
that you have back-end capacity to meet varying levels of traffic
levels without requiring manual intervention.
so you can go with them and do not need to handle the Load Balancer using your own instance/product

AWS Autoscalling Rolling over Connections to new Instances

Is it possible to automatically 'roll over' a connection between auto scaled instances?
Given instances which provide a compute intensive service, we would like to
Autoscale a new instance after CPU reachs say 90%
Have requests for service handled by the new instance.
It does not appear that there is a way with the AWS Dashboard to set this up, or have I missed something?
What you're looking for is a load balancer. If you're using HTTP, this works pretty much out of the box. Clients open connections to the load balancer, which then distributes individual HTTP requests from the connection evenly across instances in your auto scaling group. When a new instance joins the group, the load balancer automatically shifts a portion of the incoming requests over to the new instance.
Things get a bit trickier if you're speaking a protocol other than HTTP(S). A generic TCP load balancer can't tell where one "request" ends and the next begins (or if that even makes sense for your protocol), so incoming TCP connections get mapped directly to a particular backend host. The load balancer will route new connections to the new instance when it spins up, but it can't migrate existing connections over.
Typically what you'll want to do in this scenario is to have clients periodically close their connections to the service and create new ones - especially if they're seeing increased latencies or other evidence that the instance they're talking to is overworked.

Load balancer for php application

Questions about load balancers if you have time.
So I've been using AWS for some time now. Super basic instances, using them to do some tasks whenever I needed something done.
I have a task that needs to be load balanced now. It's not a public service though. It's pretty much a giant cron job that I don't want running on the same servers as my website.
I set up an AWS load balancer, but it doesn't do what I expected it to do.
It get's stuck on one server, and doesn't load balance at all. I've read why it does this, and that's all fine and well, but I need it to be a serious round-robin load balancer.
edit:
I've set up the instances on different zones, but no matter how many instances I add to the ELB, it just uses one. If I take that instance down, it switches to a different one, so I know it's working. But I really would like it to always use a different one under every circumstance.
I know there are alternatives. Here's my question(s):
Would a custom php load balancer be an ok option for now?
IE: Have a list of servers, and have php randomly select a ec2 instance. Wouldn't be scalable at all, bu atleast I could set this up in 2 mins and it can work for now.
or
Should I take the time to learn how HAProxy works, and set that up in place of the AWS ELB?
or
Am I doing it wrong, and AWS's ELB does do round-robin. I just have something configured wrong?
edit:
Structure:
1) Web server finds a task to do.
2) If it's too large it sends it off to AWS (to load balancer).
3) Do the job on EC2
4) Report back via curl to an API
5) Rinse and repeat
Everything works great. But because the connection always comes from my server (one IP) it get's sticky'd to a single EC2 machine.
ELB works well for sites whose loads increase gradually. If you are expecting an uncommon and sudden increase on the load, you can ask AWS to pre-warm it for you.
I can tell you I used ELB in different scenarios and it always worked well for me. As you didn't provide too much information about your architecture, I would bet that ELB works for you, and the case that all connections are hitting only one server, I would ask you:
1) Did you check the ELB to see how many instances are behind it?
2) The instances that you have behind the ELB, are all alive?
3) Are you accessing your application through the ELB DNS?
Anyway, I took an excerpt from the excellent article that does a very good comparison between ELB and HAProxy. http://harish11g.blogspot.com.br/2012/11/amazon-elb-vs-haproxy-ec2-analysis.html
ELB provides Round Robin and Session Sticky algorithms based on EC2
instance health status. HAProxy provides variety of algorithms like
Round Robin, Static-RR, Least connection, source, uri, url_param etc.
Hope this helps.
This point comes as a surprise to many users using Amazon ELB. Amazon
ELB behaves little strange when incoming traffic is originated from
Single or Specific IP ranges, it does not efficiently do round robin
and sticks the request. Amazon ELB starts favoring a single EC2 or
EC2’s in Single Availability zones alone in Multi-AZ deployments
during such conditions. For example: If you have application
A(customer company) and Application B, and Application B is deployed
inside AWS infrastructure with ELB front end. All the traffic
generated from Application A(single host) is sent to Application B in
AWS, in this case ELB of Application B will not efficiently Round
Robin the traffic to Web/App EC2 instances deployed under it. This is
because the entire incoming traffic from application A will be from a
Single Firewall/ NAT or Specific IP range servers and ELB will start
unevenly sticking the requests to Single EC2 or EC2’s in Single AZ.
Note: Users encounter this usually during load test, so it is ideal to
load test AWS Infra from multiple distributed agents.
More info at the Point 9 in the following article http://harish11g.blogspot.in/2012/07/aws-elastic-load-balancing-elb-amazon.html
HAProxy is not hard to learn and is tremendously lightweight yet flexible. I actually use HAProxy behind ELB for the best of both worlds -- the hardened, managed, hands-off reliability of ELB facing the Internet and unwrapping SSL, and the flexible configuration of HAProxy to allow me to fine tune how things hit my servers. I've never lost an HAProxy instance yet, but it I do, ELB will just take that one out of rotation... as I have seen happen when the back-end servers have all become inaccessible, which (because of the way it's configured) makes ELB think the HAProxy is unhealthy, but that's by design in my setup.