I have auto-scaling set up, currently tied to CPU usage for scaling in and out. There are now scenarios where our servers go out of service because they run out of memory. I applied custom metrics to collect that data on the instances using the Perl scripts. Is it possible to have a scaling policy that listens to those custom metrics?
Yes!
Just create an Alarm (eg Memory-Alarm) on the Custom Metric and then adjust the Auto Scaling group to scale based on the Memory-Alarm.
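A minimal sketch of that wiring in Python with boto3. It assumes the memory metric is published as MemoryUtilization in the System/Linux namespace with an AutoScalingGroupName dimension; check what your Perl scripts actually publish. The policy name, the 80% threshold and the periods are placeholders, not recommendations.

```python
def scale_out_policy_params(group_name):
    """Arguments for autoscaling.put_scaling_policy: add one instance per trigger."""
    return {
        "AutoScalingGroupName": group_name,
        "PolicyName": "memory-scale-out",       # placeholder name
        "PolicyType": "SimpleScaling",
        "AdjustmentType": "ChangeInCapacity",
        "ScalingAdjustment": 1,
        "Cooldown": 300,
    }


def memory_alarm_params(group_name, policy_arn, threshold=80.0):
    """Arguments for cloudwatch.put_metric_alarm on the custom memory metric."""
    return {
        "AlarmName": "Memory-Alarm",
        "Namespace": "System/Linux",            # assumption: namespace of the custom metric
        "MetricName": "MemoryUtilization",      # assumption: name of the custom metric
        "Dimensions": [{"Name": "AutoScalingGroupName", "Value": group_name}],
        "Statistic": "Average",
        "Period": 300,
        "EvaluationPeriods": 2,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [policy_arn],           # the alarm triggers the scaling policy
    }


def create_memory_scaling(group_name):
    """Create the policy, then the alarm that fires it. Requires AWS credentials."""
    import boto3  # imported lazily so the helpers above work without boto3
    autoscaling = boto3.client("autoscaling")
    cloudwatch = boto3.client("cloudwatch")
    policy = autoscaling.put_scaling_policy(**scale_out_policy_params(group_name))
    cloudwatch.put_metric_alarm(**memory_alarm_params(group_name, policy["PolicyARN"]))
```

A matching scale-in policy and low-memory alarm would be built the same way, with a negative ScalingAdjustment and a LessThanThreshold comparison.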
You should pick one metric to trigger the scaling (CPU or Memory) -- attempting to scale with both could cause problems where one alarm is high and another is low.
Update:
When creating an Alarm on an Auto Scaling group, it uses only one alarm and the alarm uses an aggregated metric across all instances. For example, it might be Average CPU Utilization. So, if one instance is at 50% and another is at 100%, the metric will be 75%. This way, it won't add instances just because one instance is too busy.
This will probably cause a problem for your memory metric because aggregating memory across the group makes little sense. If one machine has no free memory but another has plenty, the average stays moderate and it won't add more instances. This is fine in the sense that the other machine can handle more load, but it won't really be a good measure of 'how busy' the servers are.
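The effect of averaging can be shown with a few lines of Python. The numbers and the 80% threshold are made up for illustration.

```python
from statistics import mean


def alarm_fires(per_instance_values, threshold):
    """True when the group-wide average breaches the threshold."""
    return mean(per_instance_values) > threshold


cpu = [50.0, 100.0]          # the CPU example: average is 75.0
memory_used = [99.0, 20.0]   # one instance nearly out of memory, one idle

print(mean(cpu))                        # 75.0
print(mean(memory_used))                # 59.5
print(alarm_fires(memory_used, 80.0))   # False: the exhausted instance is hidden
```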
If you are experiencing "servers got out of service due to out of memory", the best thing you should do is to configure the Health Check on the load balancer such that it can detect whether an instance can handle requests. If the load balancer health check fails on an instance, then the load balancer will stop sending requests to that server until the Health Check is successful. This is the correct way to identify specific instances that are having problems, rather than trying to scale-out.
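One way to let a health check see memory trouble is to point it at an endpoint that fails when the instance is low on memory. The following is only a sketch under assumptions: a Linux instance (it reads /proc/meminfo), a hypothetical /health path on port 8080, and an arbitrary 100 MB threshold.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

MIN_AVAILABLE_KB = 100 * 1024  # hypothetical threshold: 100 MB


def available_kb(meminfo_text):
    """Return the MemAvailable value (in kB) from /proc/meminfo text, or None."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1])
    return None


def health_status(meminfo_text, min_kb=MIN_AVAILABLE_KB):
    """200 when enough memory is available, 503 otherwise (or if unknown)."""
    kb = available_kb(meminfo_text)
    return 200 if kb is not None and kb >= min_kb else 503


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        with open("/proc/meminfo") as f:
            status = health_status(f.read())
        self.send_response(status)
        self.end_headers()


def serve(port=8080):
    """Run the health endpoint; the load balancer health check targets /health."""
    HTTPServer(("", port), HealthHandler).serve_forever()
```

In a real application the check would more likely live inside the application itself, so that it also fails when the application cannot serve requests for other reasons.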
At any rate, you should investigate your memory issues and determine whether it is actually related to load (how many requests are being handled) or whether it's a memory leak in the application.
Related
I have an AWS ELB instance up and running. I have enabled the Classic Load Balancer with the minimum number of instances set to 1.
What I want to test/verify is if the load on the instance increases an additional instance should be created. To verify this I wanted to configure the Scaling triggers.
Can you guide me on how to configure the Scaling triggers for Metric CPUUtilization? What should be the Upper threshold or Lower threshold?
I would recommend that you not use the Classic Load Balancer. These days, you should use the Application Load Balancer or Network Load Balancer. (Anything with the name 'classic' basically means it is outdated, but still available for legacy use.)
There are many ways to create scaling triggers. The easiest method is to use Target Tracking Scaling Policies for Amazon EC2 Auto Scaling. This allows you to provide a target (eg "CPU Utilization of 75%") and Auto Scaling will handle the details.
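As a sketch, a target tracking policy for "CPU Utilization of 75%" could be created with boto3 as follows. The group and policy names are placeholders.

```python
def cpu_target_tracking_params(group_name, target_percent=75.0):
    """Arguments for autoscaling.put_scaling_policy with target tracking."""
    return {
        "AutoScalingGroupName": group_name,
        "PolicyName": "cpu-target-tracking",    # placeholder name
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization",
            },
            "TargetValue": target_percent,
        },
    }


def apply_cpu_target_tracking(group_name, target_percent=75.0):
    """Create or update the policy. Requires AWS credentials."""
    import boto3  # imported lazily so the helper above works without boto3
    client = boto3.client("autoscaling")
    client.put_scaling_policy(**cpu_target_tracking_params(group_name, target_percent))
```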
However, I note that you tagged this question as using Elastic Beanstalk. I don't think it supports Target Tracking, so instead you can specify a "Scale-out" and "Scale-In" threshold.
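In Elastic Beanstalk these thresholds are option settings in the aws:autoscaling:trigger namespace. A sketch with boto3 follows; the environment name and the 75/25 thresholds are placeholders only, not recommended values.

```python
TRIGGER_NAMESPACE = "aws:autoscaling:trigger"


def cpu_trigger_options(upper=75, lower=25):
    """Option settings for a CPUUtilization scale-out / scale-in trigger."""
    values = {
        "MeasureName": "CPUUtilization",
        "Statistic": "Average",
        "Unit": "Percent",
        "UpperThreshold": str(upper),   # scale out above this value
        "LowerThreshold": str(lower),   # scale in below this value
    }
    return [
        {"Namespace": TRIGGER_NAMESPACE, "OptionName": name, "Value": value}
        for name, value in values.items()
    ]


def apply_cpu_trigger(environment_name, upper=75, lower=25):
    """Update the environment. Requires AWS credentials."""
    import boto3  # imported lazily so the helper above works without boto3
    client = boto3.client("elasticbeanstalk")
    client.update_environment(
        EnvironmentName=environment_name,
        OptionSettings=cpu_trigger_options(upper, lower),
    )
```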
As to what number you should put in... this depends totally on your application and its typical usage patterns. You can only determine the 'correct' setting by observing your normal traffic, or by creating a test system and simulating typical usage.
CPU Utilization might be a good metric to use for scaling, but this depends on what the application is doing. For example, if it is doing heavy calculations (eg video encoding), it is a good metric. However, there might be other indications of heavy usage, such as the amount of free memory or the number of users. You can only figure out which is the 'right' metric by observing what your system does when it is under load.
I have a T2 Micro instance on AWS Beanstalk with Autoscaling set up. The autoscaling policy uses the Network Out parameter and currently I have it set at 6mb. However, this results in a lot of instances being created and terminated (as the Net Out goes above 6mb). My question is what is an appropriate auto-scaling Net Out policy for a Micro Instance. I understand that a Micro instance should support a Network bandwidth of about 70 Mbit so perhaps the Net Out auto scale can safely be set to about 20 Mbit?
EC2 instance types's exact network performance?
Determining a scale-out trigger for an Auto Scaling group is always difficult.
It needs to be something that identifies that the instance is "busy", to know when to add/remove instances. This varies greatly depending upon the application.
The specific issue with T2 instances is that they have CPU credits. If these credits are exhausted, then there is an artificial maximum level of CPU available. Thus, T2 instances should never have a scaling policy based on CPU.
In your case, you are using networking as the scaling trigger. This is good if network usage is an indication of the instance being "busy", resulting in a bottleneck. If, on the other hand, networking is not the bottleneck then this is not a good scaling trigger.
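It may help to put the threshold and the bandwidth figure in the same units. The sketch below assumes the trigger threshold is a number of bytes per measurement period and that the period is 5 minutes; both should be checked against the actual trigger configuration.

```python
def bytes_per_period_to_mbit_per_s(num_bytes, period_s=300):
    """Convert bytes transferred in one period to an average rate in Mbit/s."""
    return num_bytes * 8 / period_s / 1_000_000


def mbit_per_s_to_bytes_per_period(mbit_per_s, period_s=300):
    """Convert an average rate in Mbit/s to bytes transferred in one period."""
    return mbit_per_s * 1_000_000 * period_s / 8


print(bytes_per_period_to_mbit_per_s(6_000_000))   # 0.16
print(mbit_per_s_to_bytes_per_period(20))          # 750000000.0
```

Under those assumptions, a 6 MB threshold corresponds to an average of only about 0.16 Mbit/s, while a sustained 20 Mbit/s would be roughly 750 MB per period, which would explain why the current threshold is breached so often.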
Traditionally, busy computers are either limited in CPU, Network or Disk access. You will need to study a "busy" instance to discover which of these dimensions is the best indicator that the instance is "busy" such that it cannot handle any additional load.
Alternatively, you might want the application to generate its own metrics, such as the number of messages being simultaneously processed. These can be pushed to Amazon CloudWatch as a custom metric, which can then be used for scaling in/out.
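A sketch of publishing such a metric with boto3. The namespace MyApp and the metric name MessagesInFlight are hypothetical; the AutoScalingGroupName dimension is one reasonable choice so that an alarm can aggregate across the group.

```python
def in_flight_metric_params(group_name, in_flight_count):
    """Arguments for cloudwatch.put_metric_data for one data point."""
    return {
        "Namespace": "MyApp",                       # hypothetical namespace
        "MetricData": [
            {
                "MetricName": "MessagesInFlight",   # hypothetical metric name
                "Dimensions": [
                    {"Name": "AutoScalingGroupName", "Value": group_name},
                ],
                "Value": float(in_flight_count),
                "Unit": "Count",
            }
        ],
    }


def publish_in_flight(group_name, in_flight_count):
    """Send the data point to CloudWatch. Requires AWS credentials."""
    import boto3  # imported lazily so the helper above works without boto3
    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_data(**in_flight_metric_params(group_name, in_flight_count))
```

The application would call publish_in_flight on a timer (for example once a minute) from each instance.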
You can even get fancy and use information from a database to trigger scaling events: AWS Autoscaling Based On Database Query Custom Metrics - powerupcloud
I have the following scenario:
I have two Windows servers on AWS that run an application via IIS. Due to particularities of the application, they work with HTTP load balancing on IIS.
To reduce costs, I was asked that the second instance be started only when the first one reaches 90% CPU usage or 85% memory usage.
In my zone (sa-east-1), there are still no Auto Scaling Groups.
Initially, I created a CloudWatch event to start the second instance when it detected high CPU usage on the first. The problem is that CloudWatch still does not natively monitor memory, and so far I'm having trouble customizing this type of monitoring.
Is there any other way for me to be able to start the second instance based on the above conditions?
Since the first instance is always running, might it be something at the Windows level, such as a PowerShell script that detects the high memory usage and starts the second? I already have the script to start instances via PowerShell; I just need help with how to detect the high memory usage event so I can start the second instance from it.
or some third-party application that does so...
Thanks!
Auto Scaling groups are available in sa-east-1, so use them.
Pick one metric upon which to scale (memory OR CPU). Do not pick both; otherwise it would be unclear how to scale when one metric is high and the other is low.
If you wish to monitor Windows memory in CloudWatch, see: Sending Logs, Events, and Performance Counters to Amazon CloudWatch - Amazon Elastic Compute Cloud
Also, be careful using a metric such as "memory usage" to measure the need to launch more instances. Some systems use garbage collection to free-up memory, but only when available memory is low (rather than continuously).
Plus, make sure your application is capable of running across multiple instances, such as putting it behind a load balancer (depending on what the application actually does).
I'm writing a web-service that packs up customer data into zip-files, then uploads them to S3 for download. It is an on-demand process, and the amount of data can range from a few Megabytes to multiple Gigabytes, depending on what data the customer orders.
Needless to say, scalability is essential for such a service. But I'm having trouble with it. Packaging the data into zip-files has to be done on the local harddrive of a server instance.
But the load balancer is prone to terminating instances that are still working. I have taken a look at scaling policies:
http://docs.aws.amazon.com/autoscaling/latest/userguide/as-instance-termination.html
But what I need doesn't seem to be there. The issue shouldn't be so difficult: I set the scale metric to CPU load, and scale down when it goes under 1%. But I need a guarantee that the exact instance will be terminated that breached the threshold, not another one that's still hard at work, and the available policies don't seem to present me with that option. Right now, I am at a loss how to achieve this. Can anybody give me some advice?
You can use Auto Scaling Lifecycle Hooks to perform actions before an instance is terminated. You could use this to wait for the processing to finish before proceeding with the instance termination.
It appears that you have configured an Auto Scaling group with scaling policies based upon CPU Utilization.
Please note that an Elastic Load Balancer will never terminate an Amazon EC2 instance -- if a Load Balancer health check fails, it will merely stop serving traffic to that EC2 instance until it again passes the health checks. It is possible to configure Auto Scaling to use ELB health checks, in which case Auto Scaling will terminate any instances that ELB marks as unhealthy.
Therefore, it would appear that Auto Scaling is responsible for terminating your instances, as a result of your scaling policies. You say that you wish to terminate specific instances that are unused. However, this is not the general intention of Auto Scaling. Rather, Auto Scaling is used to provide a pool of resources that can be scaled by launching new instances and terminating unwanted instances. Metrics that trigger Auto Scaling are typically based upon aggregate metrics across the whole Auto Scaling group (eg average CPU Utilization).
Given that Amazon EC2 instances are charged by the hour, it is often a good idea to keep instances running longer -- "Scale Out quickly, Scale In slowly".
Once Auto Scaling decides to terminate an instance (which it selects via a termination policy), use an Auto Scaling lifecycle hook to delay the termination until ready (eg, copying log files to S3, or waiting for a long process to complete).
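A sketch of a termination lifecycle hook with boto3. The hook name and the one-hour timeout are placeholders; the `autoscaling` argument is a boto3 Auto Scaling client that the caller creates.

```python
HOOK_NAME = "wait-for-zip-jobs"  # placeholder hook name


def termination_hook_params(group_name, timeout_s=3600):
    """Arguments for autoscaling.put_lifecycle_hook on instance termination."""
    return {
        "LifecycleHookName": HOOK_NAME,
        "AutoScalingGroupName": group_name,
        "LifecycleTransition": "autoscaling:EC2_INSTANCE_TERMINATING",
        "HeartbeatTimeout": timeout_s,   # how long the instance may stay in the wait state
        "DefaultResult": "CONTINUE",     # proceed with termination if nobody answers
    }


def release_instance(autoscaling, group_name, instance_id):
    """Tell Auto Scaling that the instance has finished and may be terminated."""
    autoscaling.complete_lifecycle_action(
        LifecycleHookName=HOOK_NAME,
        AutoScalingGroupName=group_name,
        LifecycleActionResult="CONTINUE",
        InstanceId=instance_id,
    )
```

The hook is registered once with autoscaling.put_lifecycle_hook(**termination_hook_params("my-asg")). Something then has to notice that the instance has entered the terminating wait state, let the running job finish, and call release_instance.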
If you do wish to terminate an instance after it has completed a particular workload, there is no need to use Auto Scaling -- just have the instance Shutdown when it is finished, and set the Shutdown Behavior to terminate to automatically terminate the instance upon shutdown. (This assumes that you have a process to launch new instances when you have work you wish to perform.)
Stepping back and looking at your total architecture, it would appear that you have a Load Balancer in front of web servers, and you are performing the Zip operations on the web servers? This is not a scalable solution. It would be better if your web servers pushed a message into an Amazon Simple Queue Service (SQS) queue, and then your fleet of back-end servers processed messages from the queue. This way, your front-end can continue receiving requests regardless of the amount of processing underway.
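The back-end side of such a queue could look roughly like this. It is a sketch only: `sqs` is a boto3 SQS client created by the caller, and `handle` is a hypothetical function that builds and uploads the zip file for one message body.

```python
def process_next(sqs, queue_url, handle):
    """Receive at most one message, process it, then delete it from the queue.

    Returns the number of messages processed (0 or 1).
    """
    response = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=1,
        WaitTimeSeconds=20,   # long polling
    )
    messages = response.get("Messages", [])
    for message in messages:
        handle(message["Body"])
        # Delete only after successful processing, so a failed job is retried
        # once the visibility timeout expires.
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
    return len(messages)
```

A worker would call process_next in a loop. The queue's visibility timeout needs to be longer than the slowest zip job, otherwise another worker may pick up the same message.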
It sounds like what you need is Instance Protection, which is actually mentioned a bit more towards the bottom of the document that you linked to. As long as you have work being performed on a particular instance, it should not be automatically terminated by the Auto-Scaling Group (ASG).
Check out this blog post, on the official AWS blog, that conceptually talks about how you can use Instance Protection to prevent work from being prematurely terminated.
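A sketch of how an instance might protect itself while it works, using boto3's set_instance_protection. The `autoscaling` argument is a boto3 Auto Scaling client created by the caller; how the instance learns its own instance ID and group name is left out.

```python
from contextlib import contextmanager


@contextmanager
def scale_in_protection(autoscaling, group_name, instance_id):
    """Protect the instance from scale-in for the duration of the with-block."""

    def set_protected(flag):
        autoscaling.set_instance_protection(
            InstanceIds=[instance_id],
            AutoScalingGroupName=group_name,
            ProtectedFromScaleIn=flag,
        )

    set_protected(True)
    try:
        yield
    finally:
        # Always remove protection, even if the job raised an exception.
        set_protected(False)
```

Usage would be along the lines of `with scale_in_protection(client, "my-asg", my_instance_id): build_and_upload_zip(order)`, where build_and_upload_zip is a hypothetical job function.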
I manage a group of integration projects, and for many of them we provide an Amazon instance with our product for development and/or demo purposes. In order to be economical with IT budgets, I wonder if there is proxy software that can measure the traffic to those servers, start the instance on the first request, and shut it down if there is no request for a set time (e.g. 60 min).
Ideally the first request would trigger a page informing the user about the delay and keep reloading automatically until the instance is up.
I'd also love to see usage statistics by IP, so I can measure the spread of users, how many different IPs, and the time they kept up the instance. But that is secondary.
Is there any such software/service out there? Preferably in FOSS?
If you utilize Auto Scaling and Custom CloudWatch Metrics you can potentially use any data you want to decide how to Auto Scale. If you have other log sources or application level code, you won't need a proxy, just something to interpret that data, pass it to your CloudWatch metric and then the auto scaling will occur as needed.
Utilizing t1.micro, you can have one instance handling requests and scale out from there with your Auto Scaling group. 1- or 3-year reserved instances cost extremely little. You need something to understand incoming volume, so having one instance would be required anyway. Using t1.micro, your operating costs are low and you can scale very granularly.