I am using a CPUCreditBalance CloudWatch alarm to alert me whenever CPUCreditBalance goes below a threshold (e.g. < 50).
However, the problem is that when an instance has just launched, it starts with base credits (~20) or sometimes even 0.
Is there a way I can prevent the alarm from alerting when the instance has just launched?
Update: One possible solution I can think of is writing a custom Lambda function that checks whether the instance has just started and suppresses the alarms in such cases.
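The core of that Lambda is just a time comparison. A minimal sketch of the decision logic, assuming a 15-minute warm-up window (the window length is an assumption; tune it to how long your instance takes to accrue credits):

```python
from datetime import datetime, timedelta, timezone

# Assumed warm-up window during which low-credit readings are expected.
WARMUP = timedelta(minutes=15)

def should_suppress(launch_time: datetime, now: datetime,
                    warmup: timedelta = WARMUP) -> bool:
    """Return True if the instance launched recently enough that a low
    CPUCreditBalance reading is expected and the alarm should be ignored."""
    return now - launch_time < warmup
```

Inside the Lambda you could fetch the launch time from `ec2.describe_instances()` (each instance record carries a `LaunchTime` field) and, when `should_suppress` is true, temporarily mute the alarm with `cloudwatch.disable_alarm_actions(AlarmNames=[...])`, re-enabling it after the warm-up.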
I'm setting up an AWS EC2 Auto Scaling group (ASG) that uses TargetTrackingScaling to track a custom metric. This metric is published by each instance in the ASG every 30 seconds.
It's working fine, but I'd like the scale-out action to happen more quickly. The cause appears to be the alarm that the ASG auto-generates: it looks like it waits for at least 3 datapoints over 3 minutes before firing.
Is there a way I can configure the ASG/scaling policy so that the alarm only needs 1 datapoint (or less time) before firing? Or, if that's not possible, can I create a custom alarm and use it instead of the one the ASG auto-generated?
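The alarms created for a target-tracking policy are managed by the ASG, so the usual workaround is the second option: attach a step-scaling policy and create your own alarm with `EvaluationPeriods` set to 1. A sketch of the alarm parameters as they would be passed to boto3's `put_metric_alarm` (the alarm name, namespace, metric name, and threshold are placeholders, not values from the question):

```python
# Placeholder names; substitute your custom metric and a real threshold.
alarm_params = {
    "AlarmName": "my-asg-fast-scale-out",
    "Namespace": "MyApp",             # assumption: your custom namespace
    "MetricName": "MyCustomMetric",   # assumption: your custom metric
    "Statistic": "Average",
    "Period": 60,                # one 60-second period...
    "EvaluationPeriods": 1,      # ...and only 1 datapoint needed to alarm
    "Threshold": 100.0,
    "ComparisonOperator": "GreaterThanThreshold",
    # "AlarmActions": [scale_out_policy_arn],  # ARN of a step-scaling policy
}
```

This dict would be passed as `boto3.client("cloudwatch").put_metric_alarm(**alarm_params)`; since the metric is published every 30 seconds, a 60-second period should always contain a datapoint.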
Cases to consider:
Given: An instance attached to an ELB with state InService
When: Its state changes from InService to OutOfService
Then: An AWS Lambda Function is invoked with the InstanceId as part of the event
Given: A fresh new instance, registered to an ELB for the first time
When: It is still in the process of starting up (and therefore not yet reporting positive in health checks)
Then: No AWS Lambda Function is invoked, since the desired state change has not occurred
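The two cases reduce to a single transition rule, which can be sketched as a pure function (a minimal sketch of the desired trigger condition, not an AWS API):

```python
def should_invoke(previous_state: str, current_state: str) -> bool:
    """Fire only on an InService -> OutOfService transition, so a fresh
    instance that has never been InService does not trigger the function."""
    return previous_state == "InService" and current_state == "OutOfService"
```

Whatever mechanism delivers the state change, this is the predicate that separates a genuine failure from an instance that is still starting up.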
Solutions I have attempted
I set up a Lambda Function to poll a hard-coded list of ELBs every minute. It was successfully invoked for every instance with state: OutOfService. However this was not desirable, since it does not support the second case in the list of cases above.
I modified the health check function that reports back to the ELB so that, if the health check failed, it looked up the ELB instance state of the instance from which it was called. But with this, too, it was difficult to establish whether the instance was still starting up.
There are further options to explore in the second solution, but ideally I would prefer not to poll the ELB for information in order to trigger the Lambda Function. I would instead like the Lambda Function to receive an event for such a transition (through CloudWatch, SNS, or something else if available).
Any insight into options I have not yet considered?
I agree that polling for the status is not the right answer. One option to explore would be using CloudWatch alarms to trigger your Lambda function.
I'm not 100% sure whether there is a built-in metric or you would need to create a custom one, but from CloudWatch you can trigger an SNS notification, which in turn can trigger your Lambda function.
I am trying to set up an EC2 Auto Scaling group that scales depending on how many items are in an SQS queue.
When the SQS queue has visible items I need the scaling group to have 1 instance available, and when the SQS queue is empty (i.e. there are no visible or non-visible messages) I want there to be 0 instances.
Desired capacity is set to 0, min is set to 0 and max is set to 1.
I have set up CloudWatch alarms on my SQS queue that trigger when visible messages are greater than zero, and also when non-visible messages are less than one (i.e. no more work to do).
Currently the CloudWatch alarm triggers and creates an instance, but then the scaling group automatically kills the instance to meet the desired setting. I expected the alarm to adjust the desired instance count within the min and max settings, but this seems not to be the case.
Yes, you can certainly have an Auto Scaling group with:
Minimum = 0
Maximum = 1
Alarm: When ApproximateNumberOfMessagesVisible > 0 for 1 minute, Add 1 Instance
This will cause Auto Scaling to launch an instance when there are messages waiting in the queue. It will keep trying to launch more instances, but the Maximum setting will limit it to 1 instance.
Scaling-in when there are no messages is a little bit trickier.
Firstly, it can be difficult to know when to scale-in. If there are messages waiting to be processed, then ApproximateNumberOfMessagesVisible will be greater than zero. However, if there are no messages waiting, that doesn't necessarily mean you wish to scale-in, because messages might currently be processing ("in flight"), as indicated by ApproximateNumberOfMessagesNotVisible. So, you only want to scale-in if both of these are zero. Unfortunately, a CloudWatch alarm can only reference one metric, not two.
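The scale-in condition itself is simple; it is only expressing it in a single alarm that is hard. A sketch of the rule, as you would implement it if your worker polled both metrics itself:

```python
def should_scale_in(visible: int, not_visible: int) -> bool:
    """Scale in only when the queue has no waiting messages (visible)
    and no in-flight messages (not visible)."""
    return visible == 0 and not_visible == 0
```

Both values are available per queue from CloudWatch's `AWS/SQS` namespace (or directly from SQS queue attributes).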
Secondly, when an Amazon SQS queue is empty, it does not send metrics to Amazon CloudWatch. This sort of makes sense, because queues are mostly empty, so it would be continually sending a zero metric. However, this causes a problem: CloudWatch does not receive a metric when the queue is empty, and the alarm will instead enter the INSUFFICIENT_DATA state.
Therefore, you could create your alarm as:
When ApproximateNumberOfMessagesVisible = 0 for 15 minutes, Remove 1 instance but set the action to trigger on INSUFFICIENT_DATA rather than ALARM
Note the suggested "15 minutes" delay to avoid thrashing instances. This is where instances are added and removed in rapid succession because messages are coming in regularly, but infrequently. Therefore, it is better to wait a while before deciding to scale-in.
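Sketching that alarm as boto3 `put_metric_alarm` parameters (the alarm name and queue name are placeholders; the 15-minute delay is expressed as three 5-minute periods):

```python
scale_in_alarm = {
    "AlarmName": "sqs-queue-empty-scale-in",       # placeholder name
    "Namespace": "AWS/SQS",
    "MetricName": "ApproximateNumberOfMessagesVisible",
    "Dimensions": [{"Name": "QueueName", "Value": "my-queue"}],  # placeholder
    "Statistic": "Sum",
    "Period": 300,
    "EvaluationPeriods": 3,   # 3 x 5 minutes = the suggested 15-minute delay
    "Threshold": 0.0,
    "ComparisonOperator": "LessThanOrEqualToThreshold",
    # Empty queues stop publishing metrics, so the scale-in policy is
    # attached to the INSUFFICIENT_DATA state rather than ALARM:
    "InsufficientDataActions": [],  # put the scale-in policy ARN here
}
```

Note that the scale-in policy ARN goes in `InsufficientDataActions`, not `AlarmActions`, for exactly the reason described above.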
This leaves the problem of having instances terminated while they are still processing messages. This can be avoided by taking advantage of Auto Scaling Lifecycle Hooks, which send a signal when an instance is about to be terminated, giving the application the opportunity to delay the termination until work is complete. Your application should then signal that it is ready for termination only when message processing has finished.
Bottom line
Much of the above depends upon:
How often your application receives messages
How long it takes to process a message
The cost savings involved
If your messages are infrequent and simple to process, it might be worthwhile to continuously run a t2.micro instance. At 2c/hour, the benefit of scaling-in is minor. Also, there is always the risk when adding and removing instances that you might actually pay more, because instances are charged by the hour -- running an instance for 30 minutes, terminating it, then launching another instance for 30 minutes will actually be charged as two hours.
Finally, you could consider using AWS Lambda instead of an Amazon EC2 instance. Lambda is ideal for short-lived code execution without requiring a server. It could totally remove the need to use Amazon EC2 instances, and you only pay while the Lambda function is actually running.
For a simple configuration, with per-second billing on AWS AMI/Ubuntu instances, don't worry about wasted startup/shutdown time: just terminate the EC2 instance yourself, without any ASG scale-down policy. Add a little bash to the instance startup code (or preinstall it in cron) that polls for process presence or CPU load, and terminate or shut down the instance after processing is done (termination is better if you attach volumes and need them to auto-destruct). There's also an annoying quirk with an ASG defined as 0/0/1 (min/desired/max) with defaults and ApproximateNumberOfMessagesNotVisible on SQS: after an EC2 instance is launched, the group somehow switches to 1/0/1 and starts launching instances in a loop even when there's nothing in SQS. (I'm doing video transcoding, queueing jobs to SNS/SQS and launching ffmpeg nodes with an ASG triggered on a non-empty SQS queue.)
I'm trying to find a way to make an Amazon EC2 instance stop automatically when a certain custom metric on CloudWatch passes a limit. So far if I've understood correctly based on these articles:
Discussion Forum: Custom Metric EC2 Action
CloudWatch Documentation: Create Alarms to Stop, Terminate, Reboot, or Recover an Instance
This will only work if the metric is defined as follows:
Tied to certain instance
With type of System/Linux
However in my case I have a custom metric that is actually not instance-related but "global" and if a certain limit is passed, I would need to stop all instances, no matter from which instance the limiting log is received.
Does anybody know if there is way to make this work? What I'd need is some way to make CloudWatch work like this:
If arbitrary custom metric value passes a certain limit -> stop defined instances not tied to the metric itself.
The main problem is that the EC2 action option is greyed out because the metric is not tied to a certain EC2 instance, and I'm not sure if there's any way to do this without making the metric itself instance-related.
Have the custom CloudWatch metric post alerts to an SNS topic.
Have the SNS topic trigger a Lambda function that shuts down your EC2 instances via a call to the AWS API.
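A minimal sketch of that Lambda function, assuming the instance IDs to stop are configured in the function (they are placeholders here; they cannot come from the metric, since the metric is global):

```python
import json

# Placeholder: the instances to stop are configuration, not derived
# from the alarm, because the metric is not tied to any instance.
INSTANCE_IDS = ["i-0123456789abcdef0"]

def alarm_state_from_sns(event: dict) -> str:
    """Extract the alarm's new state from the SNS-wrapped CloudWatch message."""
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    return message["NewStateValue"]

def handler(event, context):
    if alarm_state_from_sns(event) == "ALARM":
        import boto3  # imported here; needs AWS credentials at runtime
        boto3.client("ec2").stop_instances(InstanceIds=INSTANCE_IDS)
```

Checking `NewStateValue` matters because SNS also delivers OK and INSUFFICIENT_DATA transitions if you subscribe the topic to those actions.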
We have an opsworks stack with two 24x7 instances. Four time-based instances. Two load-based instances.
Our issue is with the load-based instances. We've spent a great deal of time creating CloudWatch alarms that are meaningful to our service. Thus, we want the load-based instances in our stack to come UP when a particular CloudWatch latency alarm is in an ALARM state. I see that in the load-based instance configuration, you can define one CloudWatch alarm for bringing the instance(s) UP and another for bringing the instance(s) DOWN.
Thing is, when I select the specific CloudWatch alarm I want to use to trigger the UP, it can no longer be selected as the trigger for DOWN. Why?
Specifically, we want our latency alarm (we'll call it the "oh crap things are slowing down" CloudWatch alarm) to trigger the load-based instances to START when in an ALARM state. Then, we want the same alarm to trigger the load-based instances to SHUT DOWN when in an OK state. It would be rad if the load-based instances waited 15 minutes after the alarm's OK state before shutting down.
The "oh crap things are slowing down" threshold is Latency > 2 for 3 minutes
Do I just need to create a new "oh nice things are ok" alarm with a threshold of Latency < 2 for 3 minutes to use as the DOWN alarm in the load-based instance configuration?
Sorry for the newbie question, just feel stuck.
From what I can tell, you have to add a second alarm that triggers only when the latency is below 2 for three minutes. If someone else comes up with a cleaner solution, I'd love to hear about it. As it is, you'll always have one of the two alarms in a continuous state of ALARM.
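The two complementary alarms can be sketched as `put_metric_alarm` parameter sets (the alarm names follow the question's nicknames; the 15-period DOWN delay is the asker's wished-for 15-minute wait, not something OpsWorks requires):

```python
# UP alarm: Latency > 2 for 3 minutes (as in the question).
up_alarm = {
    "AlarmName": "oh-crap-things-are-slowing-down",
    "MetricName": "Latency",
    "Threshold": 2.0,
    "ComparisonOperator": "GreaterThanThreshold",
    "Period": 60,
    "EvaluationPeriods": 3,
}
# DOWN alarm: same metric and threshold, opposite comparison, and a
# longer evaluation window to get the desired 15-minute cool-down.
down_alarm = {
    "AlarmName": "oh-nice-things-are-ok",
    "MetricName": "Latency",
    "Threshold": 2.0,
    "ComparisonOperator": "LessThanThreshold",
    "Period": 60,
    "EvaluationPeriods": 15,
}
```

Because the two comparisons partition the metric's range, one of the two alarms will indeed be in ALARM state at almost all times; that is expected with this setup.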