Cloudwatch Period time - amazon-web-services

CPU metrics cannot be selected with a period below 1 minute in the CloudWatch service. How can I lower this period so that Auto Scaling reacts faster? I just need the Auto Scaling group to launch instances within a short time. (By the way, the alarm is set to 1 datapoint out of 1.)

The minimum granularity for the metrics that EC2 provides is 1 minute.
Source: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/viewing_metrics_with_cloudwatch.html
I would also say that if you need to scale that quickly, wouldn't the instance startup time be an issue anyway?

You are correct -- basic monitoring of an Amazon EC2 instance provides metrics over 5-minute periods. If you activate EC2 Detailed Monitoring, metrics are provided over 1-minute periods. Extra charges apply for Detailed Monitoring.
When launching a new instance via Amazon EC2 Auto-Scaling, it can take a few minutes for the new instance to launch and for the User Data script (if any) to run. Linux instances are quite fast, but Windows instances take a while on their first boot due to sysprep operations.
You mention that you want to react to a metric in less than one minute. I would suggest that this is not an ideal way to trigger Auto Scaling. A computer can be busy for a short while and then drop back down again. Reacting too quickly to a high CPU load would cause the Auto Scaling group to flap between adding instances and terminating instances. It is better to provision enough capacity for a reasonable amount of extra load and then gradually add more capacity as it is required over time.
If you have a need to react so quickly, then perhaps you should investigate using AWS Lambda to perform small amounts of work in a highly-parallel fashion rather than relying on Amazon EC2 instances.
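If one-minute reaction is acceptable, here is a minimal boto3 sketch of the setup described above: enabling Detailed Monitoring and creating a 60-second, single-datapoint alarm. The instance ID, alarm name, Auto Scaling group name, and scaling policy ARN are placeholders.

```python
import boto3

ec2 = boto3.client("ec2")
cloudwatch = boto3.client("cloudwatch")

# Enable Detailed Monitoring (1-minute metrics, extra charges apply).
# For an Auto Scaling group this can also be set in the launch configuration.
ec2.monitor_instances(InstanceIds=["i-0123456789abcdef0"])  # placeholder instance ID

# Alarm that fires after a single 60-second datapoint above the threshold
cloudwatch.put_metric_alarm(
    AlarmName="high-cpu-1min",                      # placeholder alarm name
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "AutoScalingGroupName", "Value": "my-asg"}],  # placeholder group
    Statistic="Average",
    Period=60,                                      # 1-minute period needs Detailed Monitoring
    EvaluationPeriods=1,                            # "1 datapoint out of 1"
    Threshold=70.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:autoscaling:..."],       # scale-out policy ARN (placeholder)
)
```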

Related

AWS T2 Micro Autoscaling Network Out

I have a T2 Micro instance on AWS Elastic Beanstalk with Auto Scaling set up. The scaling policy uses the NetworkOut metric and I currently have the threshold set at 6 MB. However, this results in a lot of instances being created and terminated (as NetworkOut goes above 6 MB). My question is: what is an appropriate NetworkOut scaling threshold for a Micro instance? I understand that a Micro instance should support network bandwidth of about 70 Mbit/s, so perhaps the NetworkOut trigger can safely be set to about 20 Mbit?
EC2 instance types' exact network performance?
Determining a scale-out trigger for an Auto Scaling group is always difficult.
It needs to be something that identifies that the instance is "busy", to know when to add/remove instances. This varies greatly depending upon the application.
The specific issue with T2 instances is that they have CPU credits. If these credits are exhausted, then there is an artificial maximum level of CPU available. Thus, T2 instances should never have a scaling policy based on CPU.
In your case, you are using networking as the scaling trigger. This is good if network usage is an indication of the instance being "busy", resulting in a bottleneck. If, on the other hand, networking is not the bottleneck then this is not a good scaling trigger.
Traditionally, busy computers are either limited in CPU, Network or Disk access. You will need to study a "busy" instance to discover which of these dimensions is the best indicator that the instance is "busy" such that it cannot handle any additional load.
Alternatively, you might want the application to generate its own metrics, such as the number of messages being simultaneously processed. These can be pushed to Amazon CloudWatch as a custom metric, which can then be used for scaling in/out.
You can even get fancy and use information from a database to trigger scaling events: AWS Autoscaling Based On Database Query Custom Metrics - powerupcloud
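As a rough illustration of the custom-metric idea, here is a boto3 sketch that publishes a hypothetical "messages in flight" count that a scaling policy could then act on. The namespace, metric name, group name, and the stubbed helper are all placeholders, not part of any real API.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

def get_in_flight_count():
    # Placeholder: return however many messages the application is currently processing
    return 0

def publish_in_flight(count, asg_name):
    """Push an application-level metric that a CloudWatch alarm / scaling policy can use."""
    cloudwatch.put_metric_data(
        Namespace="MyApp",                      # hypothetical custom namespace
        MetricData=[{
            "MetricName": "MessagesInFlight",   # hypothetical metric name
            "Dimensions": [{"Name": "AutoScalingGroupName", "Value": asg_name}],
            "Value": count,
            "Unit": "Count",
        }],
    )

# e.g. called once a minute from the application or a cron job
publish_in_flight(get_in_flight_count(), "my-asg")   # "my-asg" is a placeholder group name
```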

Reduce price on AWS (EC2 and spot instances)

I have a queue of jobs and running AWS EC2 instances which process the jobs. We have an Auto Scaling group for each c4.* instance type, in both spot and on-demand versions.
Each instance has a power rating equal to its number of vCPUs (for example, c4.large has power=2 since it has 2 vCPUs).
The exact power we need is calculated simply from the number of jobs in the queue.
I would like to implement an algorithm which periodically checks the number of jobs in the queue and changes the desired capacity of the particular Auto Scaling groups via the AWS SDK, to save as much money as possible while maintaining enough total instance power to keep the jobs processed.
Especially:
I prefer spot instances to on-demand since they are cheaper
EC2 instances are charged per hour, so we would like to terminate an instance only in the very last minutes of its one-hour billing period.
We would like to replace on-demand instances with spot instances when possible. So, at 55 minutes increase the spot group, at 58 minutes check that the new spot instance is running and, if so, decrease the on-demand group.
We would like to replace spot instances with on-demand instances if the bid price becomes too high: turn off the spot one and turn on an on-demand one.
The problem seems really difficult to handle. Does anybody have experience with this, or a similar solution already implemented?
You could certainly write your own code to do this, effectively telling your Auto Scaling groups when to add/remove instances.
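For example, a periodic job could map the queue length to a desired capacity and push it with the SDK. Here is a minimal boto3 sketch, assuming a hypothetical spot group name and the c4.large "power" of 2 from the question; splitting the remainder across the on-demand group is left out.

```python
import boto3

autoscaling = boto3.client("autoscaling")

CPUS_PER_INSTANCE = 2  # e.g. c4.large provides 2 vCPUs ("power" = 2)

def scale_for_queue(jobs_in_queue, group_name="spot-c4-large"):
    """Set the desired capacity of an Auto Scaling group from the current queue length.

    group_name is a hypothetical spot group; a fuller implementation would also
    top up the matching on-demand group when spot capacity cannot be obtained.
    """
    desired = -(-jobs_in_queue // CPUS_PER_INSTANCE)  # ceiling division
    autoscaling.set_desired_capacity(
        AutoScalingGroupName=group_name,
        DesiredCapacity=desired,
        HonorCooldown=False,
    )

# usage: scale_for_queue(jobs_in_queue=17)
```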
Also, please note that a good strategy for lowering costs with Spot Instances is to appreciate that the price for a spot instance varies by:
Region
Availability Zone
Instance Type
So, if the spot price for a c4.xlarge goes up in one AZ, it might still be the same cost in another AZ. Also, the price of a c4.2xlarge might then be lower than a c4.xlarge, with twice the power.
Therefore, you should aim to diversify your spot instances across multiple AZs and multiple instance types. This means that spot price changes will impact only a small portion of your fleet rather than all of it at once.
You could use Spot Fleet to assist with this, or even third-party products such as SpotInst.
It's also worth looking at AWS Batch (not currently available in every region), which is designed to intelligently provide capacity for batch jobs.
Autoscaling groups allow you to use alarms and metrics that are defined outside of the autoscaling group.
If you are using SQS for the job queue, you should be able to set up an alarm on the queue depth and use that to scale your group out and in.
If you are using a custom queue system, you can push metrics to CloudWatch to create a similar alarm.
You can control how often scaling actions occur, but it may be difficult to time terminations to exactly the one-hour mark.
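A minimal sketch of such an alarm on an SQS backlog, assuming a hypothetical queue named "jobs" and an existing scale-out policy ARN:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Scale out when the backlog of the (hypothetical) "jobs" queue stays high
cloudwatch.put_metric_alarm(
    AlarmName="jobs-queue-backlog",                     # placeholder alarm name
    Namespace="AWS/SQS",
    MetricName="ApproximateNumberOfMessagesVisible",
    Dimensions=[{"Name": "QueueName", "Value": "jobs"}],  # placeholder queue name
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=100,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:autoscaling:..."],           # scale-out policy ARN (placeholder)
)
```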

AWS site down issue because CPU utilization reaches 100%

I am using an Amazon EC2 instance with instance type m3.medium and an Amazon RDS database instance.
During my working hours the website goes down because CPU utilization reaches 100%; at night (outside working hours) CPU utilization is around 60%.
Please suggest the right solution for this downtime issue. I am not sure why I am experiencing this problem.
I once set up a cron job to run every minute, but I removed it because of the slowdown it caused; however, the site still goes down.
When I run the top command, the screenshots below show the CPU usage; the httpd process consumes the most CPU, so I would appreciate any suggestions for settings that reduce httpd's CPU usage.
Without the website being used by any user (two images below):
http://screencast.com/t/1jV98WqhCLvV
http://screencast.com/t/PbXF5EYI
After the website is accessed by 5 users simultaneously:
http://screencast.com/t/QZgZsiNgdCUl
If your CPU utilization is reaching 100%, you have two options:
Increase your EC2 instance type to a larger size.
Use Auto Scaling to launch one more EC2 instance of the same instance type.
It looks like you need some scheduled actions, as you do not need the same capacity during non-working hours.
The best option is to use AWS Auto Scaling with scheduled actions.
http://docs.aws.amazon.com/autoscaling/latest/userguide/schedule_time.html
AWS AutoScaling can launch new EC2 instances based on your CPU Utilization (or other metrics like Network Load, Disk read/write etc). This way you can always keep your site alive.
Using Auto Scaling scheduled actions you can configure things so that you run fewer instances during non-working hours and scale out during working hours according to CPU utilization (or other metrics).
You can even stop your servers if you do not need them at some point in time.
If you are not familiar with AWS AutoScaling you can follow the Documentation which is very precise and easy.
http://docs.aws.amazon.com/autoscaling/latest/userguide/GettingStartedTutorial.html
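For the scheduled-action part, a minimal boto3 sketch; the group name, times, and capacities are illustrative placeholders, not values from the question.

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Scale up for working hours (09:00 UTC), back down again at night (21:00 UTC)
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="web-asg",          # placeholder group name
    ScheduledActionName="working-hours",
    Recurrence="0 9 * * 1-5",                # cron format, evaluated in UTC
    MinSize=2, MaxSize=6, DesiredCapacity=2,
)
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="web-asg",
    ScheduledActionName="night",
    Recurrence="0 21 * * 1-5",
    MinSize=1, MaxSize=2, DesiredCapacity=1,
)
```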
If the CPU utilization reaches 100% because of the number of visitors your site has, you should consider changing the instance type, using Auto Scaling, or putting AWS CloudFront in front of the site in order to cache as many HTTP requests as possible (static and dynamic content).
If visitors are not the problem and there are other scheduled tasks on the EC2 instance, I strongly recommend decoupling that workload via AWS SQS and an AWS Elastic Beanstalk worker environment.

How to autoscale EMR task instances

I am using EMR with task instance groups as spot instances. I want to always maintain a minimum number of task instances.
That means whenever EMR terminates task instances because the spot price goes higher than what we set, my application should launch another task instance with a slightly higher bid price.
My research:
Use CloudWatch to notify when a threshold is breached, and auto-scale the task instances. But from what I have read, there is no concept of auto-scaling in EMR.
Use CloudWatch to notify SQS when the threshold is breached, and have a service that is always consuming the queue and expanding the task instances.
Questions:
Is there any auto-scaling available in EMR? If so, my effort reduces to just setting a threshold and the corresponding action to expand the task instances.
If you have any other approach to solve this problem, please suggest it.
How Spot Prices Work
When an Amazon EC2 instance is launched with a spot price (including when launched from Amazon EMR), the instance will start if the current spot price is below the provided bid price. If the spot price rises above the bid price, the instance is terminated. Instances are only charged the current spot price.
Therefore, the logic of launching a new spot instance with a "little higher bid price" is not necessary. The instance will always be charged the current spot price, so simply bid as high as you are willing to pay for a spot instance. You will either pay less than the spot price (great!) or your instance will be terminated because the price has gone higher than you are willing to pay (in which case you don't want to pay a "little higher" for the instance).
If you wish to "maintain minimum number of task instances" at all times, then either pay the normal EMR charge (which means the instances won't be terminated) or bid a particularly large price for the spot instances, such as 2 x the normal price. Yes, you might occasionally pay more for instances, but on average your price will be quite low.
If you wish to be particularly sneaky, you could bid up to the normal price for the EC2 instances then, if instances are terminated, launch more task nodes without using spot pricing. That way, your instances won't be terminated and you won't pay more than the normal EC2 price. However, you would have to terminate and replace those instances when the spot price drops, otherwise you are paying too much. That's why it might be better just to provide a high bid price on your spot instances.
Bottom line: Use spot pricing, but bid a high price. You'll get a good price most of the time.
AWS EMR does not have an autoscaling option available. But you can use a workaround and integrate autoscaling using AWS SQS. This is a rough picture of what you can integrate:
Launch your EMR cluster using spot instances.
Set up an SQS queue and create three triggers: one for the CPU threshold, a second for the EC2 spot instance termination notice, and a third for changing the spot instance bid prices.
So if CPU usage increases, SQS will trigger an event to launch a new instance into the cluster; if there is a spot instance termination notice, SQS will trigger the launch of another instance to balance the load and send an event to change the bid price so that another spot instance is launched. (This is just a rough sketch, but I think you will understand the logic.)
Here is a guide to Auto Scaling with AWS SQS:
https://docs.aws.amazon.com/autoscaling/latest/userguide/as-using-sqs-queue.html
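One part of that rough sketch, detecting the spot termination notice, can be done by polling the instance metadata endpoint from the node itself. A minimal sketch, assuming IMDSv1-style metadata access and a hypothetical SQS queue URL:

```python
import time
import urllib.request

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/emr-scaling"  # placeholder URL

TERMINATION_URL = "http://169.254.169.254/latest/meta-data/spot/termination-time"

def check_termination_notice():
    """Return the termination time if this spot node is marked for reclaim, else None."""
    try:
        with urllib.request.urlopen(TERMINATION_URL, timeout=1) as resp:
            return resp.read().decode()
    except OSError:
        return None  # the path returns 404 until a termination is actually scheduled

while True:
    notice = check_termination_notice()
    if notice:
        # Tell the scaling service to launch a replacement task node
        sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=f"spot-termination {notice}")
        break
    time.sleep(5)
```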
As has been correctly pointed, the EMR API provides all necessary ingredients to 1) collect monitoring data, and 2) programmatically scale the cluster up and down.
Basically, there are two main options to implement autoscaling for EMR clusters:
Autoscaling Loop: A process that is running on a server and continuously monitors the cluster for its current load. Performance metrics (memory, CPU, I/O, etc) can be collected in regular intervals and stored in a database. Autoscaling rules are evaluated against the performance metrics, and the cluster's task nodes are scaled up or down if required.
Event-Based Autoscaling: Using CloudWatch metrics (e.g., metrics for EMR or EC2), you can programmatically define triggers that are fired under certain conditions (for instance, add nodes if average CPUUtilization of all nodes exceeds 80%).
Both options have their pros and cons. The main advantage of option 2 is that it is a serverless approach (it does not require you to run your own server). Option 1, on the other hand, does require a server, but in return gives you more control to customize the logic of your scaling rules. It also allows you to keep searchable records of the history of scaling decisions.
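Whichever option you choose, the resize itself is a single EMR API call. A minimal boto3 sketch, assuming you have already looked up the task instance group ID (for example via list_instance_groups); the group ID in the usage comment is a placeholder.

```python
import boto3

emr = boto3.client("emr")

def resize_task_group(instance_group_id, target_count):
    """Grow or shrink an EMR task instance group to the target node count."""
    emr.modify_instance_groups(
        InstanceGroups=[{
            "InstanceGroupId": instance_group_id,   # e.g. found via emr.list_instance_groups()
            "InstanceCount": target_count,
        }]
    )

# usage: resize_task_group("ig-XXXXXXXXXXXX", 10)   # placeholder group ID
```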
You could take a look at Themis, an EMR autoscaling framework developed at Atlassian. Themis implements the autoscaling loop as discussed in option 1 above. Current features include proactive as well as reactive autoscaling, support for spot/on-demand task nodes, it comes with a Web UI, and the tool is very easy to configure.
I have had a similar problem, and I wanted to share one possible alternative. I have written a Java tool to dynamically resize an EMR cluster during the processing. It might help you. Check it out at:
http://www.lopakalogic.com/articles/hadoop-articles/dynamically-resize-emr/
The source code is available on Github

Amazon EC2 Instance Monitoring?

I am in need of a fairly short/simple script to monitor my EC2 instances for Memory and CPU (for now).
After using Get-EC2Instance -Region, it lists all of the instances. Where can I go from here?
CloudWatch is the monitoring tool for AWS instances. While it can support custom metrics, by default it only measures what the hypervisor can see for your instance.
CPU utilization is supported by default; this is often a more accurate way to see your true CPU utilization, since the value comes from the hypervisor.
Memory utilization, however, is not. It depends largely on your OS and is not visible to the hypervisor. However, you can set up a script that reports this metric to CloudWatch as a custom metric. Some scripts to help you do this are here: http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/mon-scripts-perl.html
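The linked Perl scripts handle this for you, but as a rough illustration of the same idea, sketched here in Python (boto3) rather than PowerShell, a script on a Linux instance could read /proc/meminfo and push the value as a custom metric. The namespace and instance ID are placeholders, and MemAvailable assumes a reasonably recent kernel.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

def memory_used_percent():
    """Parse /proc/meminfo (Linux) and return used memory as a percentage."""
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":")
            info[key] = int(value.split()[0])  # values are reported in kB
    return 100.0 * (info["MemTotal"] - info["MemAvailable"]) / info["MemTotal"]

cloudwatch.put_metric_data(
    Namespace="Custom/System",                     # hypothetical custom namespace
    MetricData=[{
        "MetricName": "MemoryUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],  # placeholder
        "Value": memory_used_percent(),
        "Unit": "Percent",
    }],
)
```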
There are a few possibilities for monitoring EC2 instances.
Nagios - http://www.nagios.com/solutions/aws-monitoring
StackDriver - http://www.stackdriver.com/
CopperEgg - http://copperegg.com/aws/
But my favorite is Datadog - http://www.datadoghq.com/ - (not just because I work here, but it's important to disclose that I do work for Datadog). Five hosts or fewer is free, and I bet you can be up and running in less than 5 minutes.
It depends on what your requirements are for service availability of the monitoring solution itself, as well as how you want to be alerted about host/service notifications.
Nagios, Icinga etc... will allow you to customise an extremely large number of parameters that can be passed to your EC2 hosts, specifying exactly what you want to monitor or check up on. You can run any of the default (or custom) scripts which then feed data back to a central system, then handle those notifications however you want (i.e. send an email, SMS, execute an arbitrary script). Downside of this approach is that you need to self-manage your backend for all of the aggregated monitoring data.
The CloudWatch approach means your instances can push metric data into AWS, then define custom policies around thresholds. For example, 90% CPU usage for more than 5 minutes on an instance or ASG, which might then push a message out to your email via SNS (Simple Notification Service). This method reduces the amount of backend components to manage/maintain, but lacks the extreme customisation abilities of self-hosted monitoring platforms.
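The 90%-for-5-minutes example above might look roughly like this with boto3; the alarm name, Auto Scaling group name, and SNS topic ARN are placeholders.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="asg-cpu-90pct-5min",                 # placeholder alarm name
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "AutoScalingGroupName", "Value": "my-asg"}],  # placeholder group
    Statistic="Average",
    Period=300,                                     # one 5-minute evaluation window
    EvaluationPeriods=1,
    Threshold=90.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],   # placeholder SNS topic
)
```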