On AWS, you can create an auto scaling policy which scales based on "Application Load Balancer Request Count Per Target".
For example, one with a min of 1 instance and a max of 5, aiming to achieve 10 "Request count per target" for my ElbTargetGroup.
My question is, what is "Application Load Balancer Request Count Per Target"?
Is this:
Number of active connections to targets from the load balancer divided by number of targets?
Number of requests per 5 minutes divided by number of targets?
Number of requests per 1 minute divided by number of targets?
The documentation here just says:
The average number of requests received by each target in a target group. You must specify the target group using the TargetGroup dimension.
Also, how long does it have to be over that target for it to start creating new instances? The main reason I ask is that I have sent many requests to this load balancer, but scaling events aren't being triggered.
The answer is your third choice:
"Number of requests per 1 minute divided by number of targets"
RequestCountPerTarget is based on requests, not active connections, and the ELB metrics are all reported in 1-minute intervals, as the documentation quote in the other answer confirms.
You can see all the metric definitions for load balancers in the AWS ALB doc.
Note, that there is both a RequestCount and RequestCountPerTarget where the latter is the former divided by the number of active targets.
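As a sketch of how these fit together (the proportional-scaling formula below is an approximation of how target tracking behaves, not AWS's exact implementation, and the function names are mine):

```python
import math

def request_count_per_target(request_count, healthy_target_count):
    """RequestCountPerTarget is RequestCount for the interval divided
    by the number of targets in the target group."""
    return request_count / healthy_target_count

def desired_capacity(current_capacity, actual_per_target, target_per_target,
                     min_size=1, max_size=5):
    """Rough sketch of target tracking: grow capacity in proportion to
    how far the actual metric is above the target, clamped to the
    group's min/max size (1 and 5, as in the question)."""
    desired = math.ceil(current_capacity * actual_per_target / target_per_target)
    return max(min_size, min(max_size, desired))

# 60 requests in a minute across 2 targets -> 30 requests per target
per_target = request_count_per_target(60, 2)   # 30.0
# Against a target of 10 requests/target, 2 instances would want to
# grow to 6, clamped to the max of 5
print(desired_capacity(2, per_target, 10))     # 5
```

Note the clamping: no matter how far the metric overshoots, the group never exceeds its configured maximum.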
You can see both these metrics in the CloudWatch console, but more simply in the EC2 console. Select Target Groups on the left pane and then the Monitoring tab. (Note that there is a lot of overlap between the monitoring tab of Target Groups and Monitoring in the Load Balancer screen)
Although the load balancer metrics are reported every minute, EC2 metrics (such as CPU utilization) are only reported every 5 minutes by default; you have to turn on detailed monitoring in your CloudWatch settings to get them every minute, and you pay extra for detailed metrics.
RequestCountPerTarget is a load balancer metric. The ELB metrics are always over 1 minute, as outlined in the documentation:
Elastic Load Balancing reports metrics to CloudWatch only when requests are flowing through the load balancer. If there are requests flowing through the load balancer, Elastic Load Balancing measures and sends its metrics in 60-second intervals. If there are no requests flowing through the load balancer or no data for a metric, the metric is not reported.
So if you stick to this metric, there is no need to pay for detailed EC2 instance metrics. This is only relevant if you need to use something like the CPU utilization on the instances.
Can someone help me with GCP autoscaling? I want to achieve autoscaling without using a load balancer in GCP, because the service running on the VM does not need any endpoint. It is more like a Kafka consumer: it only fetches data from the cluster and sends it to a DB, so there is no load balancing.
So far I have successfully created an instance template and defined the minimum and maximum size there, but that only maintains the running state; it does not perform autoscaling.
You can use instance groups, which are collections of virtual machine (VM) instances that you can manage as a single entity.
Managed instance groups support autoscaling and will grow or shrink the group as required, using the following policies:
CPU Usage: The size of the group is controlled to keep the average processor load of the virtual machines in the group at the required level
HTTP Load Balancing Usage: The size of the group is controlled to keep the load of the HTTP traffic balancer at the required level
Stackdriver Monitoring Metric: The size of the group is controlled to keep the selected metric from the Stackdriver Monitoring instrument at the required level.
Multiple Metrics: The decision to change the size of the group is made on the basis of multiple metrics.
Select the policy you need and create a managed instance group, which will autoscale your VMs. This document describes the steps to create scaling based on CPU usage; you can create a group for the other policies in the same way.
For more background, refer to the attached blog post.
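As a rough illustration of the CPU Usage policy (the ceiling-based proportional formula is an assumption about the autoscaler's behavior, not its actual code; CPU values are in percent):

```python
import math

def recommended_size(current_size, avg_cpu, target_cpu, min_size, max_size):
    """Sketch of the CPU-based policy: keep average utilization near
    the target by growing the group when utilization is above it and
    shrinking when it is below, clamped to the configured bounds."""
    size = math.ceil(current_size * avg_cpu / target_cpu)
    return max(min_size, min(max_size, size))

# 3 VMs averaging 90% CPU against a 60% target -> grow to 5 VMs
print(recommended_size(3, 90, 60, min_size=1, max_size=10))   # 5
# 5 VMs averaging 20% CPU against a 60% target -> shrink to 2 VMs
print(recommended_size(5, 20, 60, min_size=1, max_size=10))   # 2
```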
I'm learning about load balancers and managed instance group (MIG) autoscaling. I do not understand how a MIG autoscales when using HTTP load balancing utilization.
In the MIG autoscaling settings, I set the target HTTP load balancing utilization to 10%. And when configuring the external HTTP load balancer, I have the following two balancing-mode options: utilization and rate.
I can understand CPU-based MIG autoscaling: if the average CPU usage is greater than the number I entered, the MIG adds more VMs to lower it. That is simple and straightforward.
But I do not know when the MIG will autoscale when using HTTP load balancing utilization.
GCP managed instance groups offer three types of autoscaling. You can choose to scale using the following policies:
Average CPU utilization.
HTTP load balancing serving capacity, which can be based on either utilization or requests per second.
Cloud Monitoring metrics (not supported by regional autoscalers).
The first one, as you said yourself, is pretty self-explanatory.
And this is what the official documentation says about requests per second (RPS) based autoscaling:
With RATE, you must specify a target number of requests per second on a per-instance basis or a per-group basis. (Only zonal instance groups support specifying a maximum rate for the whole group.)
But there is a limitation to the RPS based autoscaling:
Autoscaling does not work with maximum requests per group because this setting is independent of the number of instances in the instance group. The load balancer continuously sends the maximum number of requests per group to the instance group, regardless of how many instances are in the group.
For example, if you set the backend to handle 100 maximum requests per group per second, the load balancer sends 100 requests per second to the group, whether the group has two instances or 100 instances. Because this value cannot be adjusted, autoscaling does not work with a load balancing configuration that uses the maximum number of requests per second per group.
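To make the contrast concrete, here is a small Python sketch (function names and the ceiling formula are illustrative, not the autoscaler's actual code) of per-instance RATE scaling versus a per-group maximum:

```python
import math

def size_for_rate(total_rps, target_rps_per_instance, min_size, max_size):
    """Per-instance RATE sketch: provision enough instances that each
    one receives roughly the target requests per second."""
    size = math.ceil(total_rps / target_rps_per_instance)
    return max(min_size, min(max_size, size))

def per_instance_rate(group_max_rate, instance_count):
    """Why a per-group maximum defeats autoscaling: the balancer sends
    the same total to the group regardless of its size, so only the
    per-instance share changes, never the group-level signal."""
    return group_max_rate / instance_count

# 250 req/s against a target of 100 req/s per instance -> 3 instances
print(size_for_rate(250, 100, min_size=1, max_size=10))  # 3
# A 100 req/s per-group cap spread over 2 vs 100 instances:
print(per_instance_rate(100, 2))    # 50.0
print(per_instance_rate(100, 100))  # 1.0
```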
You may also find it useful to have a look at the types of GCP load balancing supported in various scenarios.
This document also describes when it's best not to use some types of load balancing.
I have created an instance group with 3 instances and a GCP load balancer of type HTTP/2.
When I hit the load balancer's IP, the requests get randomly distributed. Say I send 12 requests: since there are 3 instances, the distribution should have been 4 per VM, but it doesn't happen in a round-robin way. Is there any way I can achieve this in GCP?
The algorithm used by GCP load balancing is intended to distribute load according to the geographic location of the clients. If more than one zone in a region is configured with backends, the traffic is distributed across the instance groups in each zone according to each group's capacity.
The Round-robin algorithm is present only when you create a backend made of instances within the same zone; in that case the requests are spread evenly over the instances.
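The single-zone round-robin case can be sketched like this (a toy model, not GCP's implementation; instance names are made up):

```python
from collections import Counter
from itertools import cycle

def distribute_round_robin(requests, instances):
    """Round-robin sketch: requests to a single-zone backend are
    handed to its instances in turn, so the counts come out even."""
    assignment = Counter()
    for _, instance in zip(range(requests), cycle(instances)):
        assignment[instance] += 1
    return dict(assignment)

# 12 requests over 3 instances in one zone -> 4 requests each
print(distribute_round_robin(12, ["vm-a", "vm-b", "vm-c"]))
# {'vm-a': 4, 'vm-b': 4, 'vm-c': 4}
```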
I have a question about health check configuration for a target group behind a network load balancer.
When I try to set different healthy/unhealthy thresholds, I cannot change the unhealthy threshold, and the info box for this field says it must be the same as the healthy threshold, but I'm not able to find any reason for this limitation.
Why is it not possible to set different healthy/unhealthy thresholds?
I've seen strange behaviour in autoscaling activities where all instances behind the target group were declared unhealthy at the same time, including one that had been added recently, taking the system out of service for a few minutes until the scaling activity solved the problem.
Thanks in advance,
I'm trying to do performance testing for my AWS auto scaling group using jmeter.
First, I did scale-in/out testing. I set the threshold to 70% CPU utilization for 2 periods of 2 minutes each. The ELB works fine, and after the system scaled out, requests were distributed to all EC2 instances in the auto scaling group, albeit unevenly.
Next, I wanted to test whether two instances can handle twice the load of one instance.
I fixed the instance count of the auto scaling group by setting the min/max/desired instance count to 2. When I push load from a single JMeter, only one instance ever does any work, and its CPU utilization reaches almost 100 percent while the CPU utilization of the other instance stays at zero. If I push load from a JMeter cluster with several slaves, all instances take load.
Somebody said the load may not be heavy enough, so the ELB decided that one instance could handle it and didn't dispatch requests to the other instance. I don't think so, because even when I increase the load from just one slave of this JMeter cluster, still only one instance handles requests.
I found a blog post which said ELB is great for HA but not for load balancing.
https://www.stackdriver.com/elb-affinity-problems
But I don't think this behavior, where only one instance handles all the requests, is normal.
What is going on in the ELB load balancing mechanism? I'm confused.
Elastic Load Balancing mechanism
The DNS server uses DNS round robin to determine which load balancer node in a specific Availability Zone will receive the request.
The selected load balancer node checks for a "sticky session" cookie.
The selected load balancer node sends the request to the least loaded instance.
And in greater details:
Availability Zones (unlikely your case)
By default, the load balancer node routes traffic to back-end instances within the same Availability Zone. To ensure that your back-end instances are able to handle the request load in each Availability Zone, it is important to have approximately equivalent numbers of instances in each zone. For example, if you have ten instances in Availability Zone us-east-1a and two instances in us-east-1b, the traffic will still be equally distributed between the two Availability Zones. As a result, the two instances in us-east-1b will have to serve the same amount of traffic as the ten instances in us-east-1a.
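The arithmetic in that example can be sketched as follows (a simplified model assuming perfectly even per-zone and per-instance splits, with no cross-zone load balancing):

```python
def per_instance_share(zone_instance_counts, total_requests):
    """Sketch of default ELB behavior without cross-zone load
    balancing: traffic is split evenly across zones first, then
    across the instances inside each zone."""
    per_zone = total_requests / len(zone_instance_counts)
    return {zone: per_zone / count
            for zone, count in zone_instance_counts.items()}

# 10 instances in us-east-1a vs 2 in us-east-1b, 1000 requests:
# each us-east-1b instance serves 5x the load of a us-east-1a one.
print(per_instance_share({"us-east-1a": 10, "us-east-1b": 2}, 1000))
# {'us-east-1a': 50.0, 'us-east-1b': 250.0}
```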
Sessions (most likely your case)
By default a load balancer routes each request independently to the server instance with the smallest load. By comparison, a sticky session binds a user's session to a specific server instance so that all requests coming from the user during the session will be sent to the same server instance.
AWS Elastic Beanstalk uses load balancer-generated HTTP cookies when sticky sessions are enabled for an application. The load balancer uses a special load balancer–generated cookie to track the application instance for each request. When the load balancer receives a request, it first checks to see if this cookie is present in the request. If so, the request is sent to the application instance specified in the cookie. If there is no cookie, the load balancer chooses an application instance based on the existing load balancing algorithm. A cookie is inserted into the response for binding subsequent requests from the same user to that application instance. The policy configuration defines a cookie expiry, which establishes the duration of validity for each cookie.
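The cookie check described above can be sketched roughly like this (the AWSELB cookie name and the function names are illustrative, and cookie expiry is omitted):

```python
def route(request_cookies, choose_least_loaded, cookie_name="AWSELB"):
    """Sticky-session sketch: if the load balancer's cookie is
    present, honor it; otherwise fall back to the normal balancing
    algorithm (passed in as a callable)."""
    instance = request_cookies.get(cookie_name)
    if instance is None:
        instance = choose_least_loaded()
    return instance

# No cookie -> the algorithm picks; cookie -> pinned to that instance
print(route({}, lambda: "i-aaa"))                   # i-aaa
print(route({"AWSELB": "i-bbb"}, lambda: "i-aaa"))  # i-bbb
```

This is why a single JMeter client with cookies enabled can end up hammering one instance: every request after the first carries the cookie.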
Routing Algorithm (less likely your case)
Load balancer node sends the request to healthy instances within the same Availability Zone using the leastconns routing algorithm. The leastconns routing algorithm favors back-end instances with the fewest connections or outstanding requests.
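A toy model of leastconns (illustrative only, not the ELB's actual code):

```python
def least_conns(outstanding):
    """leastconns sketch: pick the instance with the fewest
    outstanding requests; min() breaks ties by dict order."""
    return min(outstanding, key=outstanding.get)

# Instance i-b has the fewest outstanding requests, so it is chosen
print(least_conns({"i-a": 7, "i-b": 2, "i-c": 5}))  # i-b
```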
Source: Elastic Load Balancing Terminology And Key Concepts
Hope it helps.
I had this issue with unbalanced ELB traffic when back-end instances were in different availability zones and the ELB was receiving requests from a small number of clients. In our case we were using an internal ELB within the application tiers. In your case, "push load from single JMeter" likely means a small number of clients as seen by the ELB. The solution was to enable cross-zone load balancing using the API, similar to this fragment:
elb-modify-lb-attributes ${ELB} --region ${REGION} --crosszoneloadbalancing "enabled=true"
http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/enable-disable-crosszone-lb.html