How to use AWS Autoscaling Effectively - amazon-web-services

Please let me know the answer to the question below:
In reviewing the Auto Scaling events for your application, you notice that the application is scaling up and down multiple times in the same hour. What design change would you make to optimize for cost while preserving elasticity?
A. Modify the Auto Scaling group termination policy to terminate the oldest instance first
B. Modify the Auto Scaling group termination policy to terminate the newest instance first
C. Modify the CloudWatch alarm period that triggers your Auto Scaling scale-down policy
D. Modify the Auto Scaling group cooldown timers
E. Modify the Auto Scaling policy to use scheduled scaling actions
I'm guessing D & E. Please suggest!

This is a question from one of the many "Become AWS certified!" websites. The purpose of such questions is to determine whether you understand AWS well enough to be officially recognised via certification. If you are merely asking people for the correct answer, then you are only learning the answer... not the actual knowledge!
If you truly researched Auto Scaling and thought about it, here are some of the things you should be thinking about. I present this information hoping that you'll actually learn about AWS rather than just memorising answers (which won't help you in the real world).
Scaling In/Out vs Up/Down
Auto Scaling is all about launching additional Amazon EC2 instances when they are required (eg during times of peak load) and terminating them when they are no longer needed, thereby saving money.
Since instances are being added and removed, this is referred to as Scaling Out and Scaling In. Try to avoid using terms such as Scaling Up and Scaling Down, since they suggest that the instances are being made bigger and smaller (which is not the case).
Scaling Out & In multiple times per hour
The assumption in this statement is that such scaling is not desired, which is true. Amazon EC2 is charged per-hour, so adding instances and then removing them within a short period of time wastes money. This is known as thrashing.
In general, it is a good idea to Scale Out quickly and Scale In slowly. When a system needs extra capacity (Scale Out), it will want it fairly quickly to satisfy demand. When it no longer needs as much capacity, it might be worthwhile waiting before Scaling In because demand might increase again very soon thereafter.
Therefore, it is important to get the right alarm to trigger a scaling action and to wait a while before trying to scale again.
Optimize for cost while preserving elasticity
When an exam question makes a statement about optimizing, it's giving you a hint that the primary goal should be cost minimization, even if other choices might make more sense. Therefore, you want the solution to Scale In when possible, while avoiding thrashing.
Termination Policies
When an Auto Scaling Policy is triggered to remove instances, Auto Scaling uses the termination policy to determine which instance(s) to remove. This is, therefore, irrelevant to the question because optimizing for cost while preserving elasticity is only impacted by the number of instances, not which instances are actually terminated.
CloudWatch Alarms
Auto Scaling actions can be triggered by CloudWatch alarms, such as "average CPU < 70% for 15 minutes". A rule with a longer time period means that it will react to longer-term changes rather than temporary changes, which certainly helps avoid thrashing. However, it also means that Auto Scaling will take longer to respond to changes in demand.
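To see why a longer alarm period avoids thrashing, here is a minimal sketch (illustrative logic only, not the CloudWatch API): an alarm fires only when every datapoint in the evaluation window breaches the threshold, so a transient dip never triggers a longer window.

```python
# Simplified model of a CloudWatch alarm: it fires only if the metric
# breaches the threshold for ALL datapoints in the evaluation window.

def alarm_fires(cpu_datapoints, threshold, evaluation_periods):
    """True if the last `evaluation_periods` datapoints are all below threshold."""
    window = cpu_datapoints[-evaluation_periods:]
    return len(window) == evaluation_periods and all(d < threshold for d in window)

# CPU dips briefly, then recovers (one datapoint per 5 minutes).
cpu = [80, 75, 30, 78, 82, 79]

# A short window (1 period) reacts to the transient dip at index 2...
short_window_fired = any(alarm_fires(cpu[:i], 70, 1) for i in range(1, len(cpu) + 1))

# ...but a longer window (3 periods) never fires, so no scale-in thrash.
long_window_fired = any(alarm_fires(cpu[:i], 70, 3) for i in range(1, len(cpu) + 1))

print(short_window_fired, long_window_fired)  # True False
```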
Cooldowns
From the Auto Scaling documentation:
The Auto Scaling cooldown period is a configurable setting for your Auto Scaling group that helps to ensure that Auto Scaling doesn't launch or terminate additional instances before the previous scaling activity takes effect. After the Auto Scaling group dynamically scales using a simple scaling policy, Auto Scaling waits for the cooldown period to complete before resuming scaling activities.
This is very useful, because newly-launched instances take some time (eg for booting, configuring) before they can take some of the application workload. If the cooldown is too short, then Auto Scaling might launch additional instances before the first one is ready. The result is that too many instances will be launched, meaning that some will need to Scale In soon after, leading to more thrashing.
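The cooldown behaviour described above boils down to a simple rule, sketched here (illustrative logic, not the AWS API): refuse a new scaling activity until the cooldown has elapsed since the previous one, so freshly launched instances have time to boot and start absorbing load.

```python
# Minimal sketch of a cooldown check: no further scaling until
# `cooldown` seconds have passed since the last scaling activity.

def can_scale(now, last_scaling_time, cooldown=300):
    """True once the cooldown period has fully elapsed."""
    return (now - last_scaling_time) >= cooldown

print(can_scale(now=1000, last_scaling_time=800))  # False: only 200s elapsed
print(can_scale(now=1200, last_scaling_time=800))  # True: 400s >= 300s cooldown
```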
Scheduled Actions
Instead of triggering Scale In and Scale Out actions based upon a metric, Auto Scaling can be configured to use Scheduled Actions. For example, increasing the minimum number of instances at 8am before an expected rush, and decreasing the minimum number at 6pm when usage starts to drop off.
Scheduled Actions are unlikely to cause thrashing, since scaling is based on a schedule rather than metrics that frequently change.
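The 8am/6pm example above could be set up with boto3's put_scheduled_update_group_action. A hedged sketch follows: the group and action names are made up, and the actual API call is shown but not executed here.

```python
# Two scheduled actions matching the example above: raise the minimum
# before the morning rush, lower it again in the evening.

morning_rush = {
    "AutoScalingGroupName": "my-asg",            # hypothetical group name
    "ScheduledActionName": "scale-out-for-rush",
    "Recurrence": "0 8 * * *",                   # 8am UTC daily (cron syntax)
    "MinSize": 4,
}
evening_wind_down = {
    "AutoScalingGroupName": "my-asg",
    "ScheduledActionName": "scale-in-after-rush",
    "Recurrence": "0 18 * * *",                  # 6pm UTC daily
    "MinSize": 1,
}

# import boto3
# autoscaling = boto3.client("autoscaling")
# for action in (morning_rush, evening_wind_down):
#     autoscaling.put_scheduled_update_group_action(**action)
```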
The Correct Answer
The correct answer is... I'm not going to tell you! However, by reading the above information and trying to grok how Auto Scaling works, you will hopefully come to a better understanding of the question and arrive at a suitable answer.
This way, you will have learned something rather than merely memorizing the answers.

Related

AWS ECS Scaling based on memoryreservation

I've been given an AWS environment to look after. It runs ECS on EC2 instances and has scaling configured using ECS Memory Reservation. The system was originally running before Cluster Auto Scaling was made generally available, so it's just using a CloudWatch metric to scale out and scale in. As far as I can work out, it is following a basic AWS design.
The EC2 instances are in an Auto Scaling group that allows scaling from 1 to 5 instances, with 1 being the desired state.
There is 1 cluster service running with 6 tasks configured.
Five of those tasks are configured to run a maximum of 2 copies with 1 desired; the other is set to a maximum of 1.
The tasks have MemoryReservation (soft limit) figures configured but not Memory (hard limit).
The tasks are primarily running Java.
The highest memory reservation is set at about 200MB and most are around this figure.
The scale out rule is based on MemoryReservation at 85%.
Docker stats shows most of the tasks are running about 300MB and some exceed 600MB.
The instance size has 4GB of RAM.
If the maximum reservation is 2GB, even if the tasks are consuming more like 3GB in reality, am I right in believing that the scale out rule will NEVER be invoked because 2GB is 50% of available RAM? Do I need to increase the memory reservations to something more realistic?
Also, if it is only running a single EC2 instance, am I right in thinking that even if I increased the MemoryReservation figures to something more realistic, it won't spin up a second EC2 instance automatically just because there's no theoretical room to start another task? I just picked this up from different articles I've been reading while searching.
Thanks
After the May 2022 update to Capacity Providers, they still have a gap to fill in memory scaling.
As per the OP, "ECS Memory Reservation" no longer even seems to be an option (at least in the web console).
And when creating the Capacity Provider, only the target value is configurable.
There are more details on how this capacity is calculated in this blog. While it mentions:
This calculation accounts for vCPU, memory, ENI, ports, and GPUs of the tasks and the instances
if you have tasks whose memory consumption does not necessarily grow, but a service with scheduled actions configured to scale tasks (eg a higher minimum task count at different times of day), that case will not trigger a scale-out: the memory on the instances never gets used, because the tasks simply do not fit given their configuration, and you will see errors (in the service events) like:
service myservice was unable to place a task because no container
instance met all of its requirements. The closest matching
container-instance abc123xxxx has insufficient memory available.
This basically means a scheduled task-scaling change may not happen if the task memory setting is just big enough that it doesn't fit on the running instances, and the CapacityProviderReservation does not change, because the calculation is only done when tasks are in the Provisioned state, which does not happen in this case.
Possible workarounds
Decrease the target capacity (the Capacity Provider Reservation target). This basically means keeping spare capacity: by default the target is 100 (%), so the Capacity Provider tries to use the ASG's cluster resources as fully as possible. A value below 100 means it scales out once the cluster reaches that level of use, leaving a margin of spare resources at all times, so new scheduled tasks will fit as long as the spare is enough (eg calculate the per-task memory reservation against the cluster memory reservation of all expected running tasks).
Set up ASG scaling rules that match the service's scaling rules.
While possible, this may run into timing problems, and into auto scaling driven by other triggers.
A few things:
1. Cluster Auto Scaling is usually just the term ECS uses for "an Auto Scaling group that launches instances into the cluster", and it sounds like that's what you are currently using. Capacity Providers are a newer feature where ECS more directly manages the ASG, which might be the newer feature you're thinking of.
2. 'Desired capacity' isn't a state that you set for where you want the group to be; it's the current amount of capacity that Auto Scaling wants there to be in the group. So if a scaling policy fires and says +1, the desired will change to 2, and Auto Scaling will try to launch an instance, since you presumably only had 1 before (because the desired was 1 before).
3. Memory reservation is based on that 2GB being reserved, so it doesn't matter how much is actually in use for scaling purposes. This is important because even if you had 6/8GB reserved (from three 2GB tasks) but 7.5GB in use, ECS would still allow another task to be launched, since there are still 2 reservable GBs.
4. Because of 3), you should probably increase the reservation values; you wouldn't want an instance to get overloaded, and Java can be nasty about RAM issues. This would also help with your scale-out threshold issue.
For your second question: scaling will only happen after the CloudWatch alarm is triggered, so if the metric never goes above the threshold, the alarm can't trigger the scaling policy. There are a whole host of cases where scaling won't happen even though the alarm triggers (more of them for scaling in than scaling out, but it can happen on scale-out too), but the alarm going into the ALARM state is definitely a required step.
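To sanity-check the reservation arithmetic from the question: with a 4GB instance, ~200MB soft reservation per task, and at most 11 task copies (5 services x 2 maximum copies + 1 service x 1), the reserved percentage never approaches the 85% trigger. All figures come from the question above.

```python
# Why the 85% MemoryReservation alarm never fires on this cluster.

instance_memory_mb = 4096          # 4GB instance
max_task_copies = 11               # 5 services x 2 max copies + 1 x 1
reservation_per_task_mb = 200      # MemoryReservation (soft limit) per task

reserved_mb = max_task_copies * reservation_per_task_mb        # 2200 MB
reservation_pct = 100 * reserved_mb / instance_memory_mb

print(f"{reservation_pct:.1f}% reserved")  # ~53.7%, far below the 85% trigger
```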

AWS T2 Micro Autoscaling Network Out

I have a T2 Micro instance on AWS Beanstalk with Auto Scaling set up. The auto scaling policy uses the NetworkOut parameter, and currently I have it set at 6 MB. However, this results in a lot of instances being created and terminated (as the NetworkOut goes above 6 MB). My question is: what is an appropriate NetworkOut auto scaling policy for a Micro instance? I understand that a Micro instance should support network bandwidth of about 70 Mbit, so perhaps the NetworkOut scale-out threshold can safely be set to about 20 Mbit?
EC2 instance types' exact network performance?
Determining a scale-out trigger for an Auto Scaling group is always difficult.
It needs to be something that identifies that the instance is "busy", to know when to add/remove instances. This varies greatly depending upon the application.
The specific issue with T2 instances is that they have CPU credits. If these credits are exhausted, then there is an artificial maximum level of CPU available. Thus, T2 instances should never have a scaling policy based on CPU.
In your case, you are using networking as the scaling trigger. This is good if network usage is an indication of the instance being "busy", resulting in a bottleneck. If, on the other hand, networking is not the bottleneck then this is not a good scaling trigger.
Traditionally, busy computers are either limited in CPU, Network or Disk access. You will need to study a "busy" instance to discover which of these dimensions is the best indicator that the instance is "busy" such that it cannot handle any additional load.
Alternatively, you might want the application to generate its own metrics, such as the number of messages being simultaneously processed. These can be pushed to Amazon CloudWatch as a custom metric, which can then be used for scaling in/out.
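Pushing such an application-level metric can be done with boto3's put_metric_data. A minimal hedged sketch follows: the namespace, metric name, and value are made up for illustration, and the API call itself is shown but not executed here.

```python
# Request parameters for publishing a custom "messages in flight" metric,
# which a scaling policy's alarm could then watch.

metric = {
    "Namespace": "MyApp",                    # hypothetical custom namespace
    "MetricData": [{
        "MetricName": "MessagesInFlight",    # e.g. messages being processed now
        "Value": 12,
        "Unit": "Count",
    }],
}

# import boto3
# boto3.client("cloudwatch").put_metric_data(**metric)
```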
You can even get fancy and use information from a database to trigger scaling events: AWS Autoscaling Based On Database Query Custom Metrics - powerupcloud

AWS cloudwatch custom metrics on AWS-Auto Scaling

I have auto scaling set up, currently listening to CPU usage for scaling in and out. Now, there are scenarios where our servers went out of service due to running out of memory, so I applied custom metrics to collect that data on the instances using the Perl scripts. Is it possible to have a scaling policy that listens to those custom metrics?
Yes!
Just create an Alarm (eg Memory-Alarm) on the Custom Metric and then adjust the Auto Scaling group to scale based on the Memory-Alarm.
You should pick one metric to trigger the scaling (CPU or Memory) -- attempting to scale with both could cause problems where one alarm is high and another is low.
Update:
When creating an alarm for an Auto Scaling group, only one alarm is used, and the alarm uses a metric aggregated across all instances. For example, it might be average CPU utilization: if one instance is at 50% and another is at 100%, the metric will be 75%. This way, it won't add instances just because one instance is too busy.
This will probably cause a problem for your memory metric, because aggregating memory across the group makes little sense. If one machine has no free memory but another has plenty, the average won't trigger adding more instances. That is fine insofar as the other machine can handle more load, but it won't really be a good measure of how busy the servers are.
If you are experiencing servers going out of service due to out-of-memory errors, the best thing to do is configure the Health Check on the load balancer so that it can detect whether an instance can handle requests. If the health check fails on an instance, the load balancer will stop sending requests to that server until the health check passes again. This is the correct way to identify specific instances that are having problems, rather than trying to scale out.
At any rate, you should investigate your memory issues and determine whether it is actually related to load (how many requests are being handled) or whether it's a memory leak in the application.
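The Memory-Alarm wiring described above could look like the following with boto3: put_scaling_policy returns a policy ARN, which then becomes the alarm's action. This is a hedged sketch; the group name, alarm name, and threshold are made up, and the API calls are shown but not executed here.

```python
# Request parameters for a scale-out policy plus the alarm that triggers it.

policy = {
    "AutoScalingGroupName": "my-asg",       # hypothetical group name
    "PolicyName": "scale-out-on-memory",
    "AdjustmentType": "ChangeInCapacity",
    "ScalingAdjustment": 1,                 # add one instance per trigger
}
alarm = {
    "AlarmName": "Memory-Alarm",
    "Namespace": "System/Linux",            # namespace the old Perl scripts publish to
    "MetricName": "MemoryUtilization",
    "Statistic": "Average",                 # aggregated across the group
    "Period": 300,
    "EvaluationPeriods": 2,
    "Threshold": 80.0,
    "ComparisonOperator": "GreaterThanThreshold",
    # "AlarmActions": [policy_arn],         # ARN returned by put_scaling_policy
}

# import boto3
# asg = boto3.client("autoscaling")
# policy_arn = asg.put_scaling_policy(**policy)["PolicyARN"]
# alarm["AlarmActions"] = [policy_arn]
# boto3.client("cloudwatch").put_metric_alarm(**alarm)
```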

AWS Scale Out, Scale Up

In AWS, we come across scaling up (moving to a bigger instance type, eg from t1.small to t2.medium or t2.large) and scaling out (adding more instances, eg additional EC2 instances). How are these related to horizontal scaling and vertical scaling? Also, which is preferred for recovery, backups and volume management, when the goal is to minimize the cost of infrastructure maintenance?
Scaling up is when you change the instance types within your Auto Scaling group to a larger type (for example, changing an instance from an m4.large to an m4.xlarge); scaling down is the reverse.
Scaling out is when you add more instances to your Auto Scaling Group and scaling in is when you reduce the number of instances in your Auto Scaling Group.
When you scale out, you distribute your load and risk, which in turn provides a more resilient solution. Here is an example:
Let's say you have an ASG with 4x m4.xlarge instances. If one fails, you lose 25% of your processing capability. It doesn't matter that these are sizeable instances with a good amount of CPU and RAM; the fact is that by having bigger instance types, but fewer of them, you increase the impact of a failure.
However, if you had 8x m4.large instead, your total compute is the same as 4x m4.xlarge, but if one instance dies you only lose 12.5% of your resources.
Typically it's better to use more, smaller instances than fewer, larger ones, so you will see that it's more common to "scale out" to meet demand than it is to "scale up".
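The failure-impact arithmetic above is just one over the instance count, as a quick sketch:

```python
# Losing one instance out of n costs 1/n of the group's capacity.

def capacity_lost_pct(instance_count):
    return 100 / instance_count

print(capacity_lost_pct(4))  # 25.0  (4x m4.xlarge)
print(capacity_lost_pct(8))  # 12.5  (8x m4.large, same total compute)
```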
One last consideration: in order to scale up or scale down you have to restart the instance, so there is a service impact. There is no such impact when you scale in or scale out.
I hope this helps!
This might help you get a better picture of scaling in AWS.
Any application loaded with a considerable amount of business logic typically follows a three-tier architecture (client, server and data storage) with multiple TSL. The right combination of AWS services can help to achieve the scalability goal. Let's focus on each layer individually and come up with an infrastructure plan for scalability.
Full Article is Here

AWS SQS + autoscale

Assuming that I have a queue and multiple instances in autoscaling group.
For the scaling-up case, it's quite easy to determine: if the length of the queue grows, the autoscaling group will spawn new instances.
For the scaling-down case, it's a bit tricky. If the length of the queue shrinks, the autoscaling group will terminate instances. That sounds obvious, but the question is: what happens if an instance that is still processing messages gets terminated?
Of course, we can check metrics like CPU utilisation, disk read/write, etc., but I don't think that's a good idea. I'm thinking about a central place where instances register whether they are processing or not, so that the free ones can be identified and terminated safely.
Any thoughts for this? Thanks.
The accepted answer on this thread:
Amazon Auto Scaling API for Job Servers
gives you two possibilities for handling your situation. One of them should work for you. Also keep in mind that you don't necessarily want to kill an instance as soon as there is no work: when instances spin up, you pay for the whole hour whether you use 59 minutes or 1 minute, so you may want to build that into your solution: spin up instances fast, turn them off slowly.
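One newer pattern for the "don't terminate a busy worker" problem is instance scale-in protection: each worker toggles protection on itself while it holds a message, and clears it when idle. A hedged sketch using boto3's set_instance_protection follows; the instance id and group name are placeholders, and the API calls are shown but not executed here.

```python
# Build the request a worker would send to protect (or unprotect) itself
# from scale-in while processing a queue message.

def build_protection_request(instance_id, asg_name, busy):
    return {
        "InstanceIds": [instance_id],
        "AutoScalingGroupName": asg_name,
        "ProtectedFromScaleIn": busy,
    }

req = build_protection_request("i-0123456789abcdef0", "worker-asg", busy=True)

# import boto3
# asg = boto3.client("autoscaling")
# asg.set_instance_protection(**req)           # protect before processing
# ...process the SQS message, then clear protection:
# asg.set_instance_protection(
#     **build_protection_request("i-0123456789abcdef0", "worker-asg", busy=False))
```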