AWS T2 Micro Autoscaling Network Out - amazon-web-services

I have a T2 Micro instance on AWS Beanstalk with Autoscaling set up. The autoscaling policy uses the Network Out parameter and currently I have it set at 6mb. However, this results in a lot of instances being created and terminated (as the Net Out goes above 6mb). My question is what is an appropriate auto-scaling Net Out policy for a Micro Instance. I understand that a Micro instance should support a Network bandwidth of about 70 Mbit so perhaps the Net Out auto scale can safely be set to about 20 Mbit?
EC2 instance types's exact network performance?

Determining a scale-out trigger for an Auto Scaling group is always difficult.
It needs to be something that identifies that the instance is "busy", to know when to add/remove instances. This varies greatly depending upon the application.
The specific issue with T2 instances is that they have CPU credits. If these credits are exhausted, then there is an artificial maximum level of CPU available. Thus, T2 instances should never have a scaling policy based on CPU.
In your case, you are using networking as the scaling trigger. This is good if network usage is an indication of the instance being "busy", resulting in a bottleneck. If, on the other hand, networking is not the bottleneck then this is not a good scaling trigger.
Traditionally, busy computers are either limited in CPU, Network or Disk access. You will need to study a "busy" instance to discover which of these dimensions is the best indicator that the instance is "busy" such that it cannot handle any additional load.
Alternatively, you might want the application to generate its own metrics, such as the number of messages being simultaneously processed. These can be pushed to Amazon CloudWatch as a custom metric, which can then be used for scaling in/out.
You can even get fancy and use information from a database to trigger scaling events: AWS Autoscaling Based On Database Query Custom Metrics - powerupcloud

Related

Cloudwatch Period time

CPU metrics cannot be selected below 1 minute in Cloudwatch service. For example, how can I lower this period time to trigger the Autoscale scale faster? I just need to trigger the AutoScale instances in short time. (By the way, datapoints value 1 to 1)
the minimum granularity for the metrics that EC2 provides is 1 minute.
Source: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/viewing_metrics_with_cloudwatch.html
Would also say that if you need to scale that quickly, wouldn't the startup time be an issue anyway?
You are correct -- basic monitoring of an Amazon EC2 instance provides metrics over 5-minute periods. If you activate EC2 Detailed Monitoring, metrics are provided over 1-minute periods. Extra charges apply for Detailed Monitoring.
When launching a new instance via Amazon EC2 Auto-Scaling, it can take a few minutes for the new instance to launch and for the User Data script (if any) to run. Linux instances are quite fast, but Windows instances take a while on their first boot due to sysprep operations.
You mention that you want to react to a metric in less than one minute. I would suggest that this would not be an ideal way to trigger Auto-scaling. Sometimes a computer can be busy for a while, then can drop down again. Reacting too quickly to a high CPU load would cause the Auto-Scaling group to flap between adding instances and terminating instances. It is better to provision enough capacity for a reasonable amount of extra load and then gradually add more capacity as it is required over time.
If you have a need to react so quickly, then perhaps you should investigate using AWS Lambda to perform small amounts of work in a highly-parallel fashion rather than relying on Amazon EC2 instances.

How does AWS autoscaling groups recognize that EC2 is idle and it should be terminated?

I am running a flask python program on EC2 which is under Load Balancer and autoscaling. In a scenario where is load increases on one Ec2 it creates another and if newly scaled Ec2 has been idle or not utilized it scales in or terminates it. The problem here is if a single user is accessing newly scaled instance which hardly takes any CPU utilization how autoscaling group will realize that it idle and if it doesn't it will terminate it leaving downtime for that user.
I have two scenarios in mind that it checks for a particular program for a amount of time in EC2 if it is running then don't, otherwise terminate it.
I see Step scaling policy but there option is only for CPU utilization that is hardly consumed if there is a single user, not even 0.1 %.
Can someone please tell me whats the best option for me and if these two options are possible then how to do it? I have been trying to ask developers since many days but could not get reliable answers in my case.
Amazon EC2 Auto-scaling does not know which of your instances are 'in use'.
Also, the decision to terminate an instance is typically made on a metric across all instances (eg CPU Utilization), rather than a metric on a specific instance.
When Auto Scaling decides to remove an instance from the Auto Scaling group, it picks an instance as follows:
It picks an Availability Zone with the most instances (to keep them balanced)
It then selects an instance based on the Termination Policy
See also: Control which Auto Scaling instances terminate during scale in - Amazon EC2 Auto Scaling
When using a Load Balancer with Auto Scaling, traffic going to the instance that will be terminated is 'drained', allowing a chance for the instance to complete existing requests.
You can further prevent an instance from terminating while it is still "in use"by implementing Amazon EC2 Auto Scaling lifecycle hooks that allow your own code to delay the Termination.
Or, if all of this is unsatisfactory, you can disable the automatic selection of an instance to terminate and instance have your own code call TerminateInstanceInAutoScalingGroup - Amazon EC2 Auto Scaling to terminate a specific instance of your choosing.
For an overview of Auto Scaling, I recommend this video from the AWS Reinvent conference: AWS re:Invent 2019: Capacity management made easy with Amazon EC2 Auto Scaling (CMP326-R1) - YouTube

Configure AWS CPU Utilisation metric for Load Balancer

I have a AWS ELB instance up and running. I have enabled the Classic Load Balancer with minimum number of instances as 1.
What I want to test/verify is if the load on the instance increases an additional instance should be created. To verify this I wanted to configure the Scaling triggers.
Can you guide me on how to configure the Scaling triggers for Metric CPUUtilization? What should be the Upper threshold or Lower threshold?
I would recommend that you not use the Classic Load Balancer. These days, you should use the Application Load Balancer or Network Load Balancer. (Anything with the name 'classic' basically means it is outdated, but still available for legacy use.)
There are many ways to create scaling triggers. The easiest method is to use Target Tracking Scaling Policies for Amazon EC2 Auto Scaling. This allows you to provide a target (eg "CPU Utilization of 75%") and Auto Scaling will handle the details.
However, I note that you tagged this question as using Elastic Beanstalk. I don't think it supports Target Tracking, so instead you can specify a "Scale-out" and "Scale-In" threshold.
As to what number you should put in... this depends totally on your application and its typical usage patterns. You can only determine the 'correct' setting by observing your normal traffic, or by creating a test system and simulating typical usage.
CPU Utilization might be a good metric to use for scaling, but this depends on what the application is doing. For example, if it is doing heavy calculations (eg video encoding), it is a good metric. However, there might be other indications of heavy usage, such as the amount of free memory or the number of users. You can only figure out which is the 'right' metric by observing what your system does when it is under load.

AWS cloudwatch custom metrics on AWS-Auto Scaling

I have auto-scaling setup currently listed to the CPU usage on scaling in & out. Now there are scenarios that our servers got out of service due to out of memory, I applied custom metrics to get those data on the instance using the Perl scripts. Is it possible to have a scaling policy that listed to those custom metrics?
Yes!
Just create an Alarm (eg Memory-Alarm) on the Custom Metric and then adjust the Auto Scaling group to scale based on the Memory-Alarm.
You should pick one metric to trigger the scaling (CPU or Memory) -- attempting to scale with both could cause problems where one alarm is high and another is low.
Update:
When creating an Alarm on an Auto Scaling group, it uses only one alarm and the alarm uses an aggregated metric across all instances. For example, it might be Average CPU Utilization. So, if one instance is at 50% and another is at 100%, the metric will be 75%. This way, it won't add instances just because one instance is too busy.
This will probably cause a problem for your memory metric because aggregating memory across the group makes no sense. If one machine has zero memory but another has plenty of memory, it won't add more instances. This is fine because one machine can handle more load, but it won't really be a good measure of 'how busy' the servers are.
If you are experiencing "servers got out of service due to out of memory", the best thing you should do is to configure the Health Check on the load balancer such that it can detect whether an instance can handle requests. If the Auto Scaling health check fails on an instance, then it will stop sending requests to that server until the Health Check is successful. This is the correct way to identify specific instances that are having problems, rather than trying to scale-out.
At any rate, you should investigate your memory issues and determine whether it is actually related to load (how many requests are being handled) or whether it's a memory leak in the application.

AWS - how to prevent load balancer from terminating instances under load?

I'm writing a web-service that packs up customer data into zip-files, then uploads them to S3 for download. It is an on-demand process, and the amount of data can range from a few Megabytes to multiple Gigabytes, depending on what data the customer orders.
Needless to say, scalability is essential for such a service. But I'm having trouble with it. Packaging the data into zip-files has to be done on the local harddrive of a server instance.
But the load balancer is prone to terminating instances that are still working. I have taken a look at scaling policies:
http://docs.aws.amazon.com/autoscaling/latest/userguide/as-instance-termination.html
But what I need doesn't seem to be there. The issue shouldn't be so difficult: I set the scale metric to CPU load, and scale down when it goes under 1%. But I need a guarantee that the exact instance will be terminated that breached the threshold, not another one that's still hard at work, and the available policies don't seem to present me with that option. Right now, I am at a loss how to achieve this. Can anybody give me some advice?
You can use Auto Scaling Lifecycle Hooks to perform actions before an instance is terminated. You could use this to wait for the processing to finish before proceeding with the instance termination.
It appears that you have configured an Auto Scaling group with scaling policies based upon CPU Utilization.
Please note that an Elastic Load Balancer will never terminate an Amazon EC2 instance -- if a Load Balancer health check fails, it will merely stop serving traffic to that EC2 instance until it again passes the health checks. It is possible to configure Auto Scaling to use ELB health checks, in which case Auto Scaling will terminate any instances that ELB marks as unhealthy.
Therefore, it would appear that Auto Scaling is responsible for terminating your instances, as a result of your scaling policies. You say that you wish to terminate specific instances that are unused. However, this is not the general intention of Auto Scaling. Rather, Auto Scaling is used to provide a pool of resources that can be scaled by launching new instances and terminating unwanted instances. Metrics that trigger Auto Scaling are typically based upon aggregate metrics across the whole Auto Scaling group (eg average CPU Utilization).
Given that Amazon EC2 instances are charged by the hour, it is often a good idea to keep instance running longer -- "Scale Out quickly, Scale In slowly".
Once Auto Scaling decides to terminate an instance (which it selects via a termination policy), use an Auto Scaling lifecycle hook to delay the termination until ready (eg, copying log files to S3, or waiting for a long process to complete).
If you do wish to terminate an instance after it has completed a particular workload, there is no need to use Auto Scaling -- just have the instance Shutdown when it is finished, and set the Shutdown Behavior to terminate to automatically terminate the instance upon shutdown. (This assumes that you have a process to launch new instances when you have work you wish to perform.)
Stepping back and looking at your total architecture, it would appear that you have a Load Balancer in front of web servers, and you are performing the Zip operations on the web servers? This is not a scalable solution. It would be better if your web servers pushed a message into an Amazon Simple Queue Service (SQS) queue, and then your fleet of back-end servers processed messages from the queue. This way, your front-end can continue receiving requests regardless of the amount of processing underway.
It sounds like what you need is Instance Protection, which is actually mentioned a bit more towards the bottom of the document that you linked to. As long as you have work being performed on a particular instance, it should not be automatically terminated by the Auto-Scaling Group (ASG).
Check out this blog post, on the official AWS blog, that conceptually talks about how you can use Instance Protection to prevent work from being prematurely terminated.