Choosing specific instances for scaling in and scaling out in AWS

Using Auto Scaling with a Load Balancer in AWS, we can do the following, according to my understanding:
we can scale up and scale down according to load;
all instances have the same image.
But I have a different problem: if load is low, we should terminate a big machine and start a small machine, and vice versa. The small machine and the big machine have different images, but I am not finding any way to do this in the AWS console.
Can anyone help me with this issue?

Amazon EC2 Auto Scaling can launch new instances and can terminate instances. It only ever adds or removes instances -- it never changes the size of an instance. This is why you'll often see it referred to as "scale-out and scale-in" rather than "scale-up and scale-down".
When a scaling policy is triggered and Auto Scaling needs to launch a new instance, it uses the provided Launch Configuration or Launch Template to determine what type of instance to launch, which network to use, etc.
Therefore, an Auto Scaling group typically consists of all the same size instances since they are all launched from the same Launch Configuration. This is actually a good thing because it makes it easier for the scaling alarms to know when to add/remove instances and it also helps Load Balancers distribute load amongst instances since they assume that all instances are sized the same.
So, rather than "terminate a big machine and start a small machine and vice versa", Auto Scaling simply launches another instance of the same size or terminates an existing one.
Also, all instances should use the same AMI since load balancers will send traffic to every instance, expecting them to behave the same way.
You could, if you wish, modify the Launch Configuration associated with the Auto Scaling group so that, when it next launches an instance, it launches a different-sized instance. However, Auto Scaling and Load Balancers will not 'know' that it is a different-sized instance.

Basically, John answered this question.
As an alternative, you can implement more sophisticated scaling logic in any compute resource. For example, a CloudWatch alarm can send an SNS notification that a Lambda function reads; the function then scales in or out using whatever sophisticated logic you have (big or small instances, etc.).
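That alarm-to-Lambda flow could be sketched as below in Python (boto3 is available in the Lambda runtime). Everything here is a hypothetical placeholder -- the alarm naming convention, the instance types, and the AMI IDs -- the point is only the shape of the handler:

```python
import json

# Hypothetical tiers: each size has its own image, as in the question above.
TIERS = {
    "small": ("t3.small", "ami-0123456789abcdef0"),
    "big": ("m5.large", "ami-0fedcba9876543210"),
}

def choose_tier(alarm_name):
    """The 'sophisticated logic' slot: here just a name check, but it could
    inspect queue depth, time of day, current fleet composition, etc."""
    return "big" if "HighLoad" in alarm_name else "small"

def lambda_handler(event, context):
    # SNS delivers the CloudWatch alarm as a JSON string in the Message field.
    alarm = json.loads(event["Records"][0]["Sns"]["Message"])
    instance_type, ami = TIERS[choose_tier(alarm.get("AlarmName", ""))]

    import boto3  # provided by the Lambda runtime
    boto3.client("ec2").run_instances(
        ImageId=ami,
        InstanceType=instance_type,
        MinCount=1,
        MaxCount=1,
    )
    return instance_type
```

Terminating the machine being replaced (and registering the new one with the load balancer) would be additional calls in the same handler; the sketch only shows the size-selection idea.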

Related

How does AWS autoscaling groups recognize that EC2 is idle and it should be terminated?

I am running a Flask (Python) application on EC2 instances behind a Load Balancer with Auto Scaling. When load increases on one EC2 instance, another is created; when a newly scaled instance is idle or under-utilized, Auto Scaling scales in and terminates it. The problem: if a single user is accessing the newly scaled instance, which uses hardly any CPU, how will the Auto Scaling group realize that the instance is not actually idle? If it can't, it will terminate the instance, causing downtime for that user.
The approach I have in mind is to check whether a particular program has been running on the EC2 instance for some amount of time: if it is running, don't terminate the instance; otherwise, terminate it.
I see the step scaling policy, but the only option there is CPU utilization, which a single user hardly consumes -- not even 0.1%.
Can someone please tell me the best option for my case, and if these approaches are possible, how to implement them? I have been asking developers for days but could not get a reliable answer.
Amazon EC2 Auto Scaling does not know which of your instances are 'in use'.
Also, the decision to terminate an instance is typically made on a metric across all instances (eg CPU Utilization), rather than a metric on a specific instance.
When Auto Scaling decides to remove an instance from the Auto Scaling group, it picks an instance as follows:
It picks an Availability Zone with the most instances (to keep them balanced)
It then selects an instance based on the Termination Policy
See also: Control which Auto Scaling instances terminate during scale in - Amazon EC2 Auto Scaling
When using a Load Balancer with Auto Scaling, traffic going to the instance that will be terminated is 'drained', allowing a chance for the instance to complete existing requests.
You can further prevent an instance from terminating while it is still "in use" by implementing Amazon EC2 Auto Scaling lifecycle hooks, which allow your own code to delay the termination.
Or, if all of this is unsatisfactory, you can disable the automatic selection of an instance to terminate and instead have your own code call TerminateInstanceInAutoScalingGroup - Amazon EC2 Auto Scaling to terminate a specific instance of your choosing.
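As a rough sketch of that call in Python with boto3 (the function name and the choice to shrink the group are illustrative, not prescribed):

```python
def terminate_chosen_instance(instance_id, shrink_group=True):
    """Terminate a specific instance that our own logic selected, instead of
    letting Auto Scaling pick one via its termination policy."""
    import boto3  # assumed available where this code runs
    autoscaling = boto3.client("autoscaling")
    # ShouldDecrementDesiredCapacity=True shrinks the group; False would make
    # Auto Scaling launch a replacement for the terminated instance.
    return autoscaling.terminate_instance_in_auto_scaling_group(
        InstanceId=instance_id,
        ShouldDecrementDesiredCapacity=shrink_group,
    )
```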
For an overview of Auto Scaling, I recommend this video from the AWS re:Invent conference: AWS re:Invent 2019: Capacity management made easy with Amazon EC2 Auto Scaling (CMP326-R1) - YouTube

AWS - how to prevent load balancer from terminating instances under load?

I'm writing a web-service that packs up customer data into zip-files, then uploads them to S3 for download. It is an on-demand process, and the amount of data can range from a few Megabytes to multiple Gigabytes, depending on what data the customer orders.
Needless to say, scalability is essential for such a service, but I'm having trouble with it. Packaging the data into zip files has to be done on the local hard drive of a server instance.
But the load balancer is prone to terminating instances that are still working. I have taken a look at scaling policies:
http://docs.aws.amazon.com/autoscaling/latest/userguide/as-instance-termination.html
But what I need doesn't seem to be there. The issue shouldn't be so difficult: I set the scaling metric to CPU load, and scale down when it goes under 1%. But I need a guarantee that the instance terminated is exactly the one that breached the threshold, not another one that's still hard at work, and the available policies don't seem to offer that option. Right now, I am at a loss how to achieve this. Can anybody give me some advice?
You can use Auto Scaling Lifecycle Hooks to perform actions before an instance is terminated. You could use this to wait for the processing to finish before proceeding with the instance termination.
It appears that you have configured an Auto Scaling group with scaling policies based upon CPU Utilization.
Please note that an Elastic Load Balancer will never terminate an Amazon EC2 instance -- if a Load Balancer health check fails, it will merely stop serving traffic to that EC2 instance until it again passes the health checks. It is possible to configure Auto Scaling to use ELB health checks, in which case Auto Scaling will terminate any instances that ELB marks as unhealthy.
Therefore, it would appear that Auto Scaling is responsible for terminating your instances, as a result of your scaling policies. You say that you wish to terminate specific instances that are unused. However, this is not the general intention of Auto Scaling. Rather, Auto Scaling is used to provide a pool of resources that can be scaled by launching new instances and terminating unwanted instances. Metrics that trigger Auto Scaling are typically based upon aggregate metrics across the whole Auto Scaling group (eg average CPU Utilization).
Given that Amazon EC2 instances are charged by the hour, it is often a good idea to keep instances running longer -- "Scale Out quickly, Scale In slowly".
Once Auto Scaling decides to terminate an instance (which it selects via a termination policy), use an Auto Scaling lifecycle hook to delay the termination until ready (eg, copying log files to S3, or waiting for a long process to complete).
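A minimal sketch of the hook side in Python (boto3), with the hook and group names as placeholders; this would run on, or on behalf of, the instance while it sits in the Terminating:Wait lifecycle state:

```python
def finish_then_allow_termination(hook_name, asg_name, instance_id):
    """Finish in-flight work, then tell Auto Scaling to proceed with the
    termination it has been holding open via the lifecycle hook."""
    import boto3  # assumed available in this environment
    autoscaling = boto3.client("autoscaling")

    # ... copy log files to S3 / wait for the long-running job here ...

    # If the work outlives the hook's timeout, extend it with heartbeats:
    autoscaling.record_lifecycle_action_heartbeat(
        LifecycleHookName=hook_name,
        AutoScalingGroupName=asg_name,
        InstanceId=instance_id,
    )

    # Signal that the instance may now be terminated.
    autoscaling.complete_lifecycle_action(
        LifecycleHookName=hook_name,
        AutoScalingGroupName=asg_name,
        InstanceId=instance_id,
        LifecycleActionResult="CONTINUE",
    )
```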
If you do wish to terminate an instance after it has completed a particular workload, there is no need to use Auto Scaling -- just have the instance Shutdown when it is finished, and set the Shutdown Behavior to terminate to automatically terminate the instance upon shutdown. (This assumes that you have a process to launch new instances when you have work you wish to perform.)
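That launch-and-self-terminate pattern might look like the following (Python/boto3; the AMI ID and instance type are placeholders). The launcher sets the shutdown behavior; the worker simply shuts the OS down when it finishes:

```python
def launch_self_terminating_worker(ami_id):
    """Launch a worker instance that disappears when it shuts itself down."""
    import boto3  # assumed available where the launcher runs
    boto3.client("ec2").run_instances(
        ImageId=ami_id,
        InstanceType="t3.small",  # placeholder size
        MinCount=1,
        MaxCount=1,
        # With 'terminate', an OS-level shutdown terminates the instance
        # instead of merely stopping it.
        InstanceInitiatedShutdownBehavior="terminate",
    )

# On the instance itself, the worker's last step after finishing its job
# would be an OS shutdown, e.g. on Linux:
#   subprocess.run(["sudo", "shutdown", "-h", "now"])
```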
Stepping back and looking at your total architecture, it would appear that you have a Load Balancer in front of web servers, and you are performing the Zip operations on the web servers? This is not a scalable solution. It would be better if your web servers pushed a message into an Amazon Simple Queue Service (SQS) queue, and then your fleet of back-end servers processed messages from the queue. This way, your front-end can continue receiving requests regardless of the amount of processing underway.
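A hedged sketch of that queue-based split in Python (boto3); the message schema and all names are invented for illustration:

```python
import json

def build_zip_job(customer_id, object_keys):
    """Serialize a zip-job request for the queue (hypothetical schema)."""
    return json.dumps({"customer_id": customer_id, "object_keys": object_keys})

def enqueue_job(queue_url, customer_id, object_keys):
    """Front-end web server: hand the work off and return immediately."""
    import boto3  # assumed available
    boto3.client("sqs").send_message(
        QueueUrl=queue_url,
        MessageBody=build_zip_job(customer_id, object_keys),
    )

def worker_loop(queue_url):
    """Back-end worker fleet: pull jobs, zip the data, upload to S3."""
    import boto3
    sqs = boto3.client("sqs")
    while True:
        # Long polling: wait up to 20s for a message to arrive.
        resp = sqs.receive_message(QueueUrl=queue_url, WaitTimeSeconds=20)
        for msg in resp.get("Messages", []):
            job = json.loads(msg["Body"])
            # ... fetch job["object_keys"], build the zip, upload to S3 ...
            sqs.delete_message(
                QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"]
            )
```

A nice side effect is that the queue depth itself becomes a natural scaling metric for the worker fleet, which addresses the CPU-metric problem described above.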
It sounds like what you need is Instance Protection, which is actually mentioned a bit more towards the bottom of the document that you linked to. As long as you have work being performed on a particular instance, it should not be automatically terminated by the Auto-Scaling Group (ASG).
Check out this blog post, on the official AWS blog, that conceptually talks about how you can use Instance Protection to prevent work from being prematurely terminated.
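Instance protection can also be toggled from code running on the instance itself; a minimal sketch in Python (boto3), with the group and instance identifiers assumed known:

```python
def set_busy(asg_name, instance_id, busy):
    """Protect this instance from scale-in while it has work in progress,
    and release the protection when the work is done."""
    import boto3  # assumed available on the instance
    boto3.client("autoscaling").set_instance_protection(
        InstanceIds=[instance_id],
        AutoScalingGroupName=asg_name,
        ProtectedFromScaleIn=busy,
    )
```

The instance would call set_busy(..., True) before starting a zip job and set_busy(..., False) afterwards; the group then only considers unprotected instances when scaling in.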

How exactly does Auto Scaling in AWS work?

I read the docs but still didn't understand anything, and the AWS forum takes time to answer. Here is my scenario:
I had code on an EC2 instance but needed to join the instance to a scaling group. One was already created, so I just joined mine to it since it was "unused". Moments later my instance terminated and I lost all the code on it. Only the original instance from the scaling group survived.
Now my question: is scaling supposed to help the performance of the same service? Because:
the dynamically created instances DO NOT have the same address. This means I can SSH to one, but if I SSH to another one, will it have the same code?
the dynamically created instances DO NOT have the same security group.
the original instance (the one that held the code) ALSO TERMINATES if replaced by a new one (I don't know the criteria), so the one holding the code can always disappear...
So, between trying to recover my terminated EC2 instance and trying to understand this whole process, I must admit I'm quite lost: nothing here really inspires confidence that this is "merely incrementing processing power", so I really don't know what to do.
Sorry to hear that you lost valuable information. One thing to learn about Auto Scaling is that instances can be terminated, so they should be designed with this in mind.
Purpose of Auto Scaling
Auto Scaling is designed to dynamically adjust the number of instances in an auto scaling group to handle a given compute/processing requirement. Instances can be automatically added when things are busy, and terminated when things are less busy.
Auto Scaling automatically creates new instances by using a Launch Configuration, which describes the type of instance to launch, which disk image (AMI) to use, security group, network settings, startup script, etc. Each new instance would, of course, receive its own private IP address and (if configured) its own public IP address.
The idea is that a new instance can be started with no manual intervention, and all instances within an Auto Scaling group would normally be configured identically (since they are doing an identical job).
When an instance is no longer required, the group is "scaled-in", meaning that an instance is terminated. All data on the instance is lost. (Actually, disks can be kept around after termination, but they are a pain to handle later, so it is best not to do this.)
It's okay for instances to be terminated because they can easily be recreated when Auto Scaling launches a new instance. Unfortunately, in your case, you added an instance with your own configuration to an existing Auto Scaling group. It was therefore your responsibility to make sure you could handle the loss of that instance. It's quite unusual to manually add an instance to an existing Auto Scaling group -- usually it's done to send some traffic to a test instance or for A/B trials.
Why did it terminate?
The scaling policies attached to your Auto Scaling group probably decided that you had too much capacity (as a result of increasing the number of instances), and therefore decided to terminate an instance.
Recovering your terminated instance
Your terminated instance cannot be "recovered", but there is a small possibility that your disk is still available. By default, when launching an Amazon EC2 instance, boot disks are marked as Delete on Termination, but additional disks default to not deleting on termination. Therefore, if your code was on the non-boot disk, it may still be available in your Volumes tab.
If, however, your code was on a disk volume that was deleted, then the contents cannot be recovered. (Well, if you have Support, you could ask them but it's unlikely to be easy, especially if time has passed since deletion.)
See: AWS Auto Scaling documentation
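To check for such a surviving volume from code rather than the console, a short sketch in Python (boto3):

```python
def list_unattached_volumes():
    """List EBS volumes in the 'available' state -- a non-boot volume that
    survived its instance's termination would appear here."""
    import boto3  # assumed available
    resp = boto3.client("ec2").describe_volumes(
        Filters=[{"Name": "status", "Values": ["available"]}]
    )
    # Return (volume ID, size in GiB) pairs for inspection.
    return [(v["VolumeId"], v["Size"]) for v in resp["Volumes"]]
```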
The short answer: Auto Scaling uses horizontal scaling (adding more instances), not vertical scaling (increasing CPU/memory allotment).
To use Auto Scaling successfully, you need to design your application around the principle of a shared-nothing architecture. No persistent data can be stored on the instance itself; instead, store it on S3, in a database, or on other instances that are not part of the Auto Scaling group.
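In practice that means writing results somewhere durable instead of to the instance's own disk; a one-function sketch in Python (boto3), with the bucket and key as placeholders:

```python
def persist_result(bucket, key, data):
    """Store output in S3 so nothing is lost when Auto Scaling terminates
    the instance that produced it."""
    import boto3  # assumed available
    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=data)
```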

Duplicate and destroy an EC2 instance in AWS via .NET code

We have an application with long-running processes, which prevents us from using Elastic Beanstalk to scale the environment properly. In fact, no metric-based scaling would be useful for us; what we really need to be able to do is the following:
On demand, programmatically, create a new EC2 instance that is a duplicate of a specific EC2 "template" instance (that template instance would be an EC2 instance running IIS with specific code deployed to it, probably via Beanstalk).
On demand, programmatically, destroy a specific instance.
Based on specific events, we would need to perform the above actions from our .NET code base.
I get the feeling we should be able to do this with CloudFormation templates, but I don't see any clear documentation covering it.
Any advice or direction would be greatly appreciated.
Not sure about doing this through .NET code, but you can create an Auto Scaling group in the AWS console and that will take care of the scaling requirements in a maintainable way.
Log on to AWS and navigate to Management Console
First create an EC2 Instance with proper instance type (say, t2.large) or whatever your sizing is.
Then have IIS installed and get your app running on this instance.
Now create a new Image from the above running EC2 Instance
Create a new Auto Scaling Group and add the above new Image
Create a Launch Configuration for the Auto Scaling Group (AWS Console will redirect you to do this step when you try to setup Auto Scaling Group).
Once the Auto Scaling Group is setup, navigate to Auto Scaling Policy and add a policy there. For example, you can create a policy to 'add 1 instance on CPU utilization of 80% or more for 2 consecutive times'.
Also make sure Min is 0 and Max is, say, 2. This is up to you to decide based on your scaling requirements. If Min is 1, the group automatically creates the first instance. If Min is 0, no new instances will be created until the threshold in the Auto Scaling policy is met.
Also create a Scale Down Policy to remove instances when there is a low CPU utilization.
Note: I took CPU utilization as an example to describe the scenario. You can have any metrics as per your choice and architectural needs.
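The console steps above map directly onto API calls, so they can also be scripted. A hedged sketch in Python (boto3) -- the AWS SDK for .NET exposes the same operations -- with all names, the AMI ID, and the Availability Zone as placeholders:

```python
def create_scaling_stack():
    import boto3  # assumed available
    autoscaling = boto3.client("autoscaling")

    # Launch configuration: which image and size to launch (placeholders).
    autoscaling.create_launch_configuration(
        LaunchConfigurationName="app-lc",
        ImageId="ami-0123456789abcdef0",  # image baked from the IIS instance
        InstanceType="t2.large",
    )

    # The group itself: Min 0 / Max 2, as in the steps above.
    autoscaling.create_auto_scaling_group(
        AutoScalingGroupName="app-asg",
        LaunchConfigurationName="app-lc",
        MinSize=0,
        MaxSize=2,
        AvailabilityZones=["us-east-1a"],  # placeholder
    )

    # Simple scaling policy: add one instance when its alarm fires.
    autoscaling.put_scaling_policy(
        AutoScalingGroupName="app-asg",
        PolicyName="scale-out",
        AdjustmentType="ChangeInCapacity",
        ScalingAdjustment=1,
        Cooldown=300,
    )
```

The "CPU at 80% for 2 consecutive periods" rule would be a separate CloudWatch alarm (put_metric_alarm) pointing at the policy's ARN; that wiring is omitted here.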

Standard & Spot Instance Load balancing between each other

I've been reading up on Spot Instances but still can't find an approach that is safe enough, with no downtime. Would it be possible to launch a normal instance and a Spot Instance and then load balance between them both?
Any other suggestions? :) I don't mind using Spot Instances, but the issue is that they don't give a warning before shutting down.
It is absolutely possible to load balance between Spot Instances and On-Demand Instances. I'm doing it right now, using two Auto Scaling (AS) groups: one for Spot and the other for On-Demand. The load balancer is specified when you create an AS group. You can specify the number of instances you want, or choose to scale based on load. Auto Scaling is not available in the AWS Console at this time, so there may be a learning curve to get yourself up and running with the API.
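A sketch of the Spot half of that setup in Python (boto3), using the classic launch-configuration API this answer refers to; the names, AMI ID, bid price, and Availability Zone are placeholders. The On-Demand twin group would be identical but without SpotPrice:

```python
def create_spot_group(load_balancer_name):
    import boto3  # assumed available
    autoscaling = boto3.client("autoscaling")

    # A launch configuration with SpotPrice makes the group request Spot
    # capacity at up to that bid; omit it for the On-Demand group.
    autoscaling.create_launch_configuration(
        LaunchConfigurationName="web-spot-lc",
        ImageId="ami-0123456789abcdef0",  # shared web-server image (placeholder)
        InstanceType="m5.large",
        SpotPrice="0.05",
    )

    autoscaling.create_auto_scaling_group(
        AutoScalingGroupName="web-spot-asg",
        LaunchConfigurationName="web-spot-lc",
        MinSize=1,
        MaxSize=4,
        AvailabilityZones=["us-east-1a"],  # placeholder
        # Both groups register with the same load balancer, so traffic is
        # spread across Spot and On-Demand instances together.
        LoadBalancerNames=[load_balancer_name],
    )
```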