Configure AWS CPU Utilisation metric for Load Balancer

I have an AWS ELB instance up and running. I have enabled the Classic Load Balancer with a minimum of one instance.
What I want to verify is that, if the load on the instance increases, an additional instance is created. To verify this I wanted to configure the scaling triggers.
Can you guide me on how to configure the scaling triggers for the CPUUtilization metric? What should the upper and lower thresholds be?

I would recommend that you not use the Classic Load Balancer. These days, you should use the Application Load Balancer or Network Load Balancer. (Anything with the name 'classic' basically means it is outdated, but still available for legacy use.)
There are many ways to create scaling triggers. The easiest method is to use Target Tracking Scaling Policies for Amazon EC2 Auto Scaling. This allows you to provide a target (eg "CPU Utilization of 75%") and Auto Scaling will handle the details.
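With the AWS SDK this is a single API call. A minimal boto3 sketch, assuming an existing Auto Scaling group (the group name and target value are placeholders):

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Target tracking: Auto Scaling adds or removes instances to keep the
# group's average CPU near the target value. "my-asg" is a placeholder.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="my-asg",
    PolicyName="cpu-75-target-tracking",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 75.0,
    },
)
```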
However, I note that you tagged this question as using Elastic Beanstalk. I don't think it supports Target Tracking, so instead you can specify scale-out and scale-in thresholds.
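On Elastic Beanstalk those thresholds live in the aws:autoscaling:trigger option namespace. A sketch of setting them with boto3 (the environment name and threshold values are placeholders; the same options can be set via the console or .ebextensions):

```python
import boto3

eb = boto3.client("elasticbeanstalk")

# Switch the environment's scaling trigger to CPUUtilization with
# explicit scale-out/scale-in thresholds. "my-env" is a placeholder.
eb.update_environment(
    EnvironmentName="my-env",
    OptionSettings=[
        {"Namespace": "aws:autoscaling:trigger", "OptionName": "MeasureName", "Value": "CPUUtilization"},
        {"Namespace": "aws:autoscaling:trigger", "OptionName": "Statistic", "Value": "Average"},
        {"Namespace": "aws:autoscaling:trigger", "OptionName": "Unit", "Value": "Percent"},
        {"Namespace": "aws:autoscaling:trigger", "OptionName": "UpperThreshold", "Value": "75"},  # scale out above this
        {"Namespace": "aws:autoscaling:trigger", "OptionName": "LowerThreshold", "Value": "25"},  # scale in below this
    ],
)
```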
As to what number you should put in... this depends totally on your application and its typical usage patterns. You can only determine the 'correct' setting by observing your normal traffic, or by creating a test system and simulating typical usage.
CPU Utilization might be a good metric to use for scaling, but this depends on what the application is doing. For example, if it is doing heavy calculations (eg video encoding), it is a good metric. However, there might be other indications of heavy usage, such as the amount of free memory or the number of users. You can only figure out which is the 'right' metric by observing what your system does when it is under load.

Related

How to scale AWS application load balancer for unexpected spikes in traffic?

I have a question regarding load balancers which relates to one of our systems. We have a system that sits behind an ALB with a fleet of EC2 instances handling requests; depending on the type of request, these are forwarded downstream to other components and eventually stored in DynamoDB. This system is essentially the entry point into our platform, and if we know an event is coming we can scale up our instances ahead of time to deal with the spike. The problem arises when we have an unexpected spike in traffic, normally within 60 seconds, which taxes the instances; the Auto Scaling group is unable to scale up in time, and what we find is that new instances come into play well after the incident is over.
Currently we scale based on a CPU threshold. My question is: is there any other metric or method we could use to scale quickly when a large spike in traffic arrives? The easiest solution I can see is to throw more instances into play permanently, but that isn't the most cost-effective option.
Thanks in advance for any guidance you can provide.

AWS T2 Micro Autoscaling Network Out

I have a T2 Micro instance on AWS Beanstalk with Auto Scaling set up. The scaling policy uses the NetworkOut parameter, and currently I have it set at 6 MB. However, this results in a lot of instances being created and terminated (as NetworkOut goes above 6 MB). My question is: what is an appropriate NetworkOut auto-scaling policy for a Micro instance? I understand that a Micro instance should support network bandwidth of about 70 Mbit/s, so perhaps the NetworkOut trigger can safely be set to about 20 Mbit?
EC2 instance types' exact network performance?
Determining a scale-out trigger for an Auto Scaling group is always difficult.
It needs to be something that identifies that the instance is "busy", to know when to add/remove instances. This varies greatly depending upon the application.
The specific issue with T2 instances is that they have CPU credits. If these credits are exhausted, then there is an artificial maximum level of CPU available. Thus, T2 instances should never have a scaling policy based on CPU.
In your case, you are using networking as the scaling trigger. This is good if network usage is an indication of the instance being "busy", resulting in a bottleneck. If, on the other hand, networking is not the bottleneck then this is not a good scaling trigger.
Traditionally, busy computers are either limited in CPU, Network or Disk access. You will need to study a "busy" instance to discover which of these dimensions is the best indicator that the instance is "busy" such that it cannot handle any additional load.
Alternatively, you might want the application to generate its own metrics, such as the number of messages being simultaneously processed. These can be pushed to Amazon CloudWatch as a custom metric, which can then be used for scaling in/out.
You can even get fancy and use information from a database to trigger scaling events: AWS Autoscaling Based On Database Query Custom Metrics - powerupcloud
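As a rough illustration of that approach, here is a sketch that reads a backlog count from a database and publishes it as a custom CloudWatch metric, which an alarm can then use for scaling. The namespace, metric name, table and query are all placeholders, and sqlite3 merely stands in for whatever database you actually use:

```python
import sqlite3

import boto3

cloudwatch = boto3.client("cloudwatch")

# Read a backlog count from a database (placeholder table and query).
conn = sqlite3.connect("app.db")
(pending,) = conn.execute("SELECT COUNT(*) FROM jobs WHERE done = 0").fetchone()

# Publish it as a custom metric; an alarm on MyApp/PendingJobs can then
# drive the Auto Scaling group's scale-out and scale-in policies.
cloudwatch.put_metric_data(
    Namespace="MyApp",
    MetricData=[{
        "MetricName": "PendingJobs",
        "Value": float(pending),
        "Unit": "Count",
    }],
)
```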

AWS Network out

Our web application has 5 pages (Signin, Dashboard, Map, Devices, Notification)
We have done a load test for this application, and the load test script does the following:
Signin and go to Dashboard page
Click Map
Click Devices
Click Notification
We have the basic free plan in AWS.
While performing the load test, up to about 100 users we didn't get any errors (please see the image below). NetworkIn and CPUUtilization seemed normal, but NetworkOut showed 846K.
But at around 114 users we started getting errors on the Map page (highlighted in red). During that time, it seems only NetworkOut was high. Please see the image below.
We want to know what an acceptable value for NetworkOut is. If this number is high, is there any way to reduce it?
Please let me know if you need more information. Thanks in advance for your help.
You are using a t2.micro instance.
This instance type has CPU limitations that make it good for bursty workloads, but sustained loads will consume all the available CPU credits. Thus, it might perform poorly under sustained load over long periods.
The instance also has limited network bandwidth that might impact the throughput of the server. While all Amazon EC2 instances have limited allocations of bandwidth, the t2.micro and t2.nano have particularly low bandwidth allocations. You can see this when copying data to/from the instance and it might be impacting your workloads during testing.
The t2 family, especially at the low-end, is not a good choice for production workloads. It is great for workloads that are sometimes high, but not consistently high. It is also particularly low-cost, but please realise that there are trade-offs for such a low cost.
See:
Amazon EC2 T2 Instances – Amazon Web Services (AWS)
CPU Credits and Baseline Performance for Burstable Performance Instances - Amazon Elastic Compute Cloud
Unlimited Mode for Burstable Performance Instances - Amazon Elastic Compute Cloud
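If you want to confirm whether credit exhaustion is what is throttling the instance, you can read its CPUCreditBalance metric from CloudWatch. A minimal boto3 sketch (the instance ID is a placeholder):

```python
from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client("cloudwatch")

# Fetch the instance's remaining CPU credits over the past hour. A balance
# near zero while CPU sits at the baseline suggests throttling.
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUCreditBalance",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=["Average"],
)
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"])
```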
That said, the network throughput showing on the graphs is a result of your application. While the t2 might be limiting the throughput, it is not responsible for the spike on the graph. For that, you will need to investigate the resources being used by the application(s) themselves.
NetworkOut simply refers to the volume of outgoing traffic from the instance. To reduce NetworkOut, reduce the data you are sending from this instance. So you may need to see which of "Click Map", "Click Devices" and "Click Notification" sends the most traffic out of the instance. It may not be related only to the number of users, but to a combination of the number of users and the application module.

Start a second AWS instance when the first reaches 85% of memory or CPU

I have the following scenario:
I have two Windows servers on AWS that run an application via IIS. Due to particularities of the application, they use HTTP load balancing on IIS.
To reduce costs, I was asked to start the second instance only when the first one reaches 90% CPU usage or 85% memory usage.
In my region (sa-east-1), there are still no Auto Scaling groups.
Initially, I created a CloudWatch event to start the second instance when it detected high CPU usage on the first. The problem is that CloudWatch natively still does not monitor memory, and so far I'm having trouble setting up that kind of custom monitoring.
Is there any other way to start the second instance based on the above conditions?
Since the first instance is always running, it could be something at the Windows level: some PowerShell that detects high memory usage and starts the second? I already have the PowerShell script to start instances; I just need help detecting the high-memory event so I can start the second instance from it.
Or some third-party application that does this...
Thanks!
Auto Scaling groups are available in sa-east-1, so use them.
Pick one metric upon which to scale (memory OR CPU). Do not pick both; otherwise it is unclear how to scale when one metric is high and the other is low.
If you wish to monitor Windows memory in CloudWatch, see: Sending Logs, Events, and Performance Counters to Amazon CloudWatch - Amazon Elastic Compute Cloud
Also, be careful using a metric such as "memory usage" to measure the need to launch more instances. Some systems use garbage collection to free up memory, but only when available memory is low (rather than continuously).
Plus, make sure your application is capable of running across multiple instances, such as putting it behind a load balancer (depending on what the application actually does).
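That said, if you do stay with the single always-on instance plus a watchdog as you described, the detection side can be a small script on the first instance. A sketch, assuming Python with psutil and boto3 installed (the instance ID and threshold are placeholders, and you could equally call your existing PowerShell start script instead):

```python
import time

import boto3
import psutil  # cross-platform system stats; works on Windows

SECOND_INSTANCE_ID = "i-0123456789abcdef0"  # placeholder
MEMORY_THRESHOLD = 85.0                     # percent, per the question

ec2 = boto3.client("ec2", region_name="sa-east-1")

while True:
    if psutil.virtual_memory().percent >= MEMORY_THRESHOLD:
        # start_instances is harmless if the instance is already running,
        # so the loop does not need to track state.
        ec2.start_instances(InstanceIds=[SECOND_INSTANCE_ID])
    time.sleep(60)
```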

AWS CloudWatch custom metrics on AWS Auto Scaling

I have Auto Scaling set up, currently tied to CPU usage for scaling in and out. There are scenarios where our servers go out of service due to running out of memory, so I published custom metrics from the instances using the Perl monitoring scripts. Is it possible to have a scaling policy that listens to those custom metrics?
Yes!
Just create an Alarm (eg Memory-Alarm) on the Custom Metric and then adjust the Auto Scaling group to scale based on the Memory-Alarm.
You should pick one metric to trigger the scaling (CPU or Memory) -- attempting to scale with both could cause problems where one alarm is high and another is low.
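As a sketch of the two pieces in boto3 (the group name, namespace, metric name and threshold are placeholders; the namespace shown is the one the classic Perl monitoring scripts typically publish to, so check what yours actually use):

```python
import boto3

autoscaling = boto3.client("autoscaling")
cloudwatch = boto3.client("cloudwatch")

# 1. A simple scaling policy that adds one instance when triggered.
policy = autoscaling.put_scaling_policy(
    AutoScalingGroupName="my-asg",        # placeholder
    PolicyName="scale-out-on-memory",
    PolicyType="SimpleScaling",
    AdjustmentType="ChangeInCapacity",
    ScalingAdjustment=1,
    Cooldown=300,
)

# 2. An alarm on the custom memory metric that fires that policy.
cloudwatch.put_metric_alarm(
    AlarmName="Memory-Alarm",
    Namespace="System/Linux",             # where the Perl scripts publish
    MetricName="MemoryUtilization",
    Dimensions=[{"Name": "AutoScalingGroupName", "Value": "my-asg"}],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=2,
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[policy["PolicyARN"]],
)
```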
Update:
When creating an Alarm on an Auto Scaling group, it uses only one alarm and the alarm uses an aggregated metric across all instances. For example, it might be Average CPU Utilization. So, if one instance is at 50% and another is at 100%, the metric will be 75%. This way, it won't add instances just because one instance is too busy.
This will probably cause a problem for your memory metric, because aggregating memory across the group makes little sense. If one machine has no free memory but another has plenty, the average won't trigger a scale-out. That can be fine, since the other machine can handle more load, but it means the aggregate won't really be a good measure of "how busy" the servers are.
If you are experiencing servers going out of service due to running out of memory, the best thing you can do is configure the health check on the load balancer so that it detects whether an instance can actually handle requests. If the health check fails on an instance, the load balancer will stop sending requests to that server until the health check succeeds again (and, if the Auto Scaling group uses the ELB health check, the unhealthy instance will be replaced). This is the correct way to identify specific instances that are having problems, rather than trying to scale out.
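Switching the group's health check from plain EC2 status checks to the load balancer's health check is one API call. A minimal boto3 sketch ("my-asg" is a placeholder):

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Use the load balancer's health check instead of plain EC2 status checks,
# so an instance that stops answering requests is marked unhealthy and
# replaced. "my-asg" is a placeholder.
autoscaling.update_auto_scaling_group(
    AutoScalingGroupName="my-asg",
    HealthCheckType="ELB",
    HealthCheckGracePeriod=300,  # seconds to let a new instance boot first
)
```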
At any rate, you should investigate your memory issues and determine whether it is actually related to load (how many requests are being handled) or whether it's a memory leak in the application.