Sometimes if there are conditions that prevent the app from starting, say a bad config, the auto scaler will continue to start up instances one after the other.
Anybody know of a good way to alert on this?
Most of our servers receive network traffic so we put a CloudWatch monitor on the NetworkIn metric.
I would suggest configuring the start-up script to Terminate/Shutdown the instance upon failure and sending an alert using CloudWatch custom metrics or any other service like NewRelic.
I don't think that there is a way to alert auto-scaling-group to stop spanning up instances. You could set the max instances limit and have an alert upon reaching this number.
You could alert based on the CloudWatch metric:
Auto Scaling / Group Metrics / GroupTerminatingInstances
See the doc page for more details
Related
Can we stop an AWS windows server EC2 instance of a development environment when there is no activity in it, say after 2 hours of inactivity? I am having trouble identifying whether any user is connected to the server virtually.
I can easily start/stop the EC2 at a fixed time, programmatically, but in order to cut the cost of my server, I am trying to stop the EC2 when it is not being used.
My intent(or use case) is: If no user is using the EC2 till a specified amount of time, it will automatically stop. Developers can restart it as and when needed.
Easiest solution probably would be to set up an Alert with CloudWatch.
Have a read at the documentation, which basically describes your use case perfectly:
You can create an alarm that stops an Amazon EC2 instance when a
certain threshold has been met
A condition could be the average CPU utilisation, e.g. CPU utilisation is below a certain point (which most probably correlates with no logged in users / no developer actually utilising the machine).
This is not a simple task.
The Amazon EC2 service provides a virtual computer that has RAM, CPU and Disk. It can view the amount of activity on the CPU, Network traffic and disk access but it cannot see into the Operating System.
So, the problem becomes how to detect 'inactivity'. This really comes down to the operating system and making some hard decisions. For example, your home computer screen turns off after a defined time of no mouse/keyboard input but the operating system is still doing activity in the background. If the system is running an application such as a web server, and there are no web requests, it is hard to know whether this is 'inactive' because there are no requests, or 'active' because the web server is running.
Bottom line: There is no out-of-the-box feature to do this. You would need to find your own definition of 'inactivity' and then trigger a shutdown in the Operating System.
If you wish to do it via schedule, this might help: Auto-Stop EC2 instances when they finish a task - DEV Community
UPDATE: Lambda's aren't needed anymore, see tpschmidt's answer.
Create a Lambda to turn off the EC2 that will be triggered by a Cloud Watch Alarm when for example the CPU goes under 20% average for an hour. This is fine when you're coding as you will be using more than 20%, and when you have a break for over an hour that's when you want it turned off.
Be sure to set auto save in your IDE's.
Example Python Lambda:
import boto3
region = 'eu-west-3'
instances = ['i-05be5c0c4039881ed']
ec2 = boto3.client('ec2', region_name=region)
def lambda_handler(event, context):
#TODO getInstanceIDFromCloudWatch = event["instanceid"]
ec2.stop_instances(InstanceIds=instances)
print('stopped your instances: ' + str(instances))
Ref: https://www.howtoforge.com/aws-lambda-function-to-start-and-stop-ec2-instance/
In AWS Console:
Goto EC2, select the EC2 instance and copy the Instance ID
Goto Cloud Watch and select Metrics
Under AWS Namespaces click EC2
Paste the Instance ID to find it
Select EC2 > Per-Instance Metrics
Choose the first metric CPU utilisation
Select the second tab called Graphed Metric
Click the Bell icon under Actions
Set a threshold, also this is the hard part, leave the default of Statistic: Average over 1 hour
Set the Condition Lower/Equal and put the value as 20% (you'll need to use the machine more than 1/5th of the hour over 20% CPU otherwise it'll turn off).
Next create an alarm, setup a notification if you like or remove it
Once the Alarm is created
In Cloud Watch select Event > Rules
Add a Rule
Select EC2 as the Service Name and All Event
Click Target and select your Lambda.
When the Alarm goes off the Lambda will turn off the instance ID
You can set up an AWS Cloudwatch alarm that monitors activity. Different parameters like ComparisonOperator, Period, and Threshold can be modified according to how you want to monitor your Ec2 instance.
Then, you can set up an SQS queue and set a Python Lambda function as its target. Within the lambda function, you can use boto3 to turn off the ec2 instance. You can read more details here: https://medium.com/geekculture/automatically-turn-off-ec2-instances-upon-inactivity-31fedd363cad
Terraform setup:
https://medium.com/geekculture/terraform-setup-for-automatically-turning-off-ec2-instances-upon-inactivity-d7f414390800
You are looking for adding stop action to your ec2 instance, this can be easily achieved using CloudWatch alarms.
You can do this from the console using the following steps:
Open the Amazon EC2 console
In the navigation pane, choose Instances.
Select the instance and choose Actions, Monitor and troubleshoot,
Manage CloudWatch alarms.
Alternatively, you can choose the plus sign ( ) in the Alarm status
column.
On the Manage CloudWatch alarms page, do the following:
Choose to Create an alarm.
To receive an email when the alarm is triggered, for Alarm
notification, choose an existing Amazon SNS topic. You first need to
create an Amazon SNS topic using the Amazon SNS console. For more
information, see Using Amazon SNS for application-to-person (A2P)
messaging in the Amazon Simple Notification Service Developer Guide.
Toggle on the Alarm action, and choose Stop.
For Group samples by and Type of data to sample, choose a statistic
and a metric. In this example, choose Average and CPU utilization.
For Alarm When and Percent, specify the metric threshold. In this
example, specify <= and 10 percent.
For the Consecutive period and Period, specify the evaluation period for
the alarm. In this example, specify 1 consecutive period of 5
Minutes.
Amazon CloudWatch automatically creates an alarm name for you. To
change the name, for the Alarm name, enter a new name. Alarm names must
contain only ASCII characters.
Choose to Create.
Note You can adjust the alarm configuration based on your own
requirements before creating the alarm, or you can edit them later.
This includes the metric, threshold, duration, action, and
notification settings. However, after you create an alarm, you
cannot edit its name later.
Check this link from the documentation for terminating the instance using the same way.
You are looking for adding stop action to your ec2 instance, this can be easily achieved using CloudWatch alarms.
Here, I will show how to do that using Terraform:
resource "aws_cloudwatch_metric_alarm" "ec2_cpu" {
alarm_name = "StopTheInstanceAfterInactivity"
metric_name = "CPUUtilization"
comparison_operator = "LessThanOrEqualToThreshold"
statistic = "Average"
threshold = var.threshold
evaluation_periods = var.evaluation_periods # The number of periods over which data is compared to the specified threshold
period = var.period # Evaluation Period (seconds)
namespace = "AWS/EC2"
alarm_description = "This metric monitors ec2 cpu utilization and stop the instance if it is inactive"
actions_enabled = "true"
alarm_actions = ["arn:aws:automate:${var.region}:ec2:stop"]
ok_actions = [] # do nothing
insufficient_data_actions = [] # do nothing
dimensions = {InstanceId = aws_instance.ec2_instance.id}
}
We have a service running in aws ecs that we want to scale in and out based on 2 metrics.
Scale out when: cpu > 80% or connection_count > 9500
Scale in when: cpu < 50% and connection_count < 5000
We have access to both the cpu and connection count metrics and alarms in cloud watch. However, we can't figure out how to setup a dynamic scaling policy like this based on both of them.
Using the standard aws console interface for creating the auto scaling rules I don't see any options for multiple. Any links to a tutorial or aws docs on this would be appreciated.
Based on the responses posted in the support aws forums, nothing can be done for AND/OR/IF conditions. (https://forums.aws.amazon.com/thread.jspa?threadID=94984)
It does mention however that they already put a feature request to the cloudwatch team.
The following is mentioned as a workaround:
"In the meantime, a possible workaround can be to create a custom metric using a custom script which would run after every five minutes and get the data points from the CloudWatch metrics, then perform the AND or OR operation and then push the output to a custom metric. You can then create a CloudWatch alarm which would monitor this custom metric and then trigger actions accordingly."
I've got my web servers set up to autoscale at certain times of the day.
I can measure the load on the box using scripts executed by Consul - and this can trigger events at certain thresholds.
I want to push these two together and trigger autoscaling at certain load levels. (Assume CPU load at 75% is the threshold).
My question is: What is the mechanism to get load on a box to trigger an autoscaling group in AWS?
Assumptions:
I was not planning to use AWS Cloudwatch - but am interested if this is the solution.
I'm more interested in the autoscale triggering interface. Is it a queue or a rest endpoint?
As #mahdi said, you can easily use AWS Cloudwatch to do this.
However, if you want Consul (or anything outside the scope of an AWS "service") to do it, you can use lambda.
You would create a lambda function that scales your instance up or down (or both). Lambda can have many triggers, such as an HTTP endpoint through API Gateway. If you already have Consul set up to do it (sounds like you do since you said can trigger events at certain thresholds.) just make it issue the HTTP request to API Gateway to scale up or down.
You can create a CloudWatch alarm with a CPUUtilization metric and set it to change state when your instance has CPU utilization more than 75%. Then in the Auto Scaling Group, you use this alarm for scaling (in/out) policy. You can also control the number of instances in an Auto Scaling Group by manually (e.g. through your application running on one the instances) changing the Desired value. This documentation can be helpful.
AWS auto scaling works based on the load (number of concurrent requests). It works perfectly for web sites and web APIs. However there are situations in which the number of required EC2 instances is not related to the requests but it depends on something else such as number of items in a queue.
For example an order processing system which pulls the orders from a custom queue (and not SQS) might need to scale out to process the order quicker. How can we make this happpen?
Auto scaling groups can be configured to scale in or out by linking their scaling policies to Cloud Watch alarms. Many people use CPU utilization as a scaling trigger but you can use any Cloud Watch metric you like. In your case you could use your queue's ApproximateNumberOfMessagesVisible metric.
For example, if you create an alarm that fires when the ApproximateNumberOfMessagesVisible > 500 and link that to the scale out policy of your auto scaling group, the group will create new instances whenever the queue has more that 500 messages in it.
I've got an app running on AWS. How do I set up Amazon CloudWatch to notify me when the EC2 instance fails or is no longer responsive?
I went through the CloudWatch screens, and it appears that you can monitor certain statistics, like CPU or disk utilization, but I didn't see a way to monitor an event like "the instance got an http request and took more than X seconds to respond."
Amazon's Route 53 Health Check is the right tool for the job.
Route 53 can monitor the health and performance of your application as well as your web servers and other resources.
You can set up HTTP resource checks in Route 53 that will trigger an e-mail notification if the server is down or responding with an error.
http://eladnava.com/monitoring-http-health-email-alerts-aws/
To monitor an event in CloudWatch you create an Alarm, which monitors a metric against a given threshold.
When creating an alarm you can add an "action" for sending a notification. AWS handles notifications through SNS (Simple Notification Service). You can subscribe to a notification topic and then you'll receive an email for you alarm.
For EC2 metrics like CPU or disk utilization this is the guide from the AWS docs: http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/US_AlarmAtThresholdEC2.html
As answered already, use an ELB to monitor HTTP.
This is the list of available metrics for ELB:
http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/US_MonitoringLoadBalancerWithCW.html#available_metrics
To answer your specific question, for monitoring X seconds for the http response, you would set up an alarm to monitor the ELB "Latency".
CloudWatch monitoring is just like you have discovered. You will be able to infer that one of your instances is frozen by taking a look at the metrics, but CloudWatch won't e.g. send you an email when your app is down or too slow, for example.
If you are looking for some sort of notification when your app or instance is down, I suggest you to use a monitoring service. Pingdom is a good option. You can also set up a new instance on AWS and install a monitoring tool, like Nagios, which would be my preferred option.
Good practices that are always worth, in the long road: using load balancing (Amazon ELB), more than one instance running your app, Autoscaling (when an instance is down, Amazon will automatically start a new one and maintain your SLA), and custom monitoring.
My team has used a custom monitoring script for a long time, and we always knew of failures as soon as they occurred. Basically, if we had two nodes running our app, node 1 sent HTTP requests to node 2 and node 2 to 1. If any request took more than expected, or returned an unexpected HTTP status or response body, the script sent an email to the system admins. Nowadays, we rely on more robust approaches, like Nagios, which can even monitor operating system stuff (threads, etc), application servers (connection pools health, etc) and so on. It's worth every cent invested in setting it up.
CloudWatch recently added "status check" metrics that will answer one of your questions on whether an instance is down or not. It will not do a request to your Web server but rather a system check. As previous answer suggest, use ELB for HTTP health checks.
You could always have another instance for tools/testing, that instance would try the http request based on a schedule and measure the response time, then you could publish that response time with CloudWatch and set an alarm when it goes over a certain threshold.
You could even do that from the instance itself.
As Kurst Ursan mentioned above, using "Status Check" metrics is the way to go. In some cases you won't be able to browse that metrics (i.e if you;re using AWS OpsWorks), so you're going to have to report that custom metric on your own. However, you can set up an alarm built on a metric that always matches (in an OK sate) and have the alarm trigger when the state changes to "INSUFFICIENT DATA" state, this technically means CloudWatch can't tell whether the state is OK or ALARM because it can't reach your instance, AKA your instance is offline.
There are a bunch of ways to get instance health info. Here are a couple.
Watch for instance status checks and EC2 events (planned downtime) in the EC2 API. You can poll those and send to Cloudwatch to create an alarm.
Create a simple daemon on the server which writes to DynamoDB every second (has better granularity than Cloudwatch). Have a second process query the heartbeats and alert when missing.
Put all instances in a load balancer with a dummy port open that that gives a TCP response. Setup TCP health checks on the ELB, and alert on unhealthy instances.
Unless you use a product like Blue Matador (automatically notifies you of production issues), it's actually quite heinous to set something like this up - let alone maintain it. That said, if you're going down the road, and want some help getting started using Cloudwatch (terminology, alerts, logs, etc), start with this blog: How to Monitor Amazon EC2 with CloudWatch
You can use CloudWatch Event Rule to Monitor whenever any EC2 instance goes down. You can create an Event rule from CloudWatch console as following :
In the CLoudWatch Console choose Events -> rule
For Event Pattern, In service Name Choose EC2
For Event Type, Choose EC2 Instance State-change Notification
For Specific States, Choose Stopped
In targets Choose any previously created SNS topic for sending a notification!
Source : Create a Rule - https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/CloudWatch-Events-Input-Transformer-Tutorial.html#input-transformer-create-rule
This is not exactly a CloudWatch alarm, however this serves the purpose of monitoring/notification.