Set AWS Cloudwatch Alarm datapoint timespan and action to shut it down - amazon-web-services

Follwing case:
We want an Alarm in AWS that reads the EstimatedCharges Metric of AmazonCloudWatch every 5 minutes (for potential log overflow). But the only timespan I can set are 6 hours, else it gives me "Insufficient" as Status. How can I change the metric so that I can use it with 5 minutes between each check?
And how can I make an action that will stop the Cloudwatch Logs when over X?

According to documentation.
CloudWatch Billing metrics are updated every 6 hours.
Thus, your Alert may change the status only every 6 hours.

Just set Treat Missing Data to notBreaching
notBreaching – Missing data points are treated as "good" and within the threshold,
More info: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html

Related

AWS CloudWatch Composite Alarms: Send Alert When 1 Alarm Has been "In Alarm" For Over a Certain Amount Of Time

So I'm trying to setup composite alarms on AWS. So far, I have most of it set up. At the moment, I have a composite alarm set up with 3 alarms. If any 2 of these 3 alarms trigger, then the composite alarm also triggers. This part works fine.
However, I am having trouble with part of my use case. I'd also like to make it so that if one of these alarms within the composite alarm stays in alarm for over a certain period of time, then an alert is also sent out.
Here's an example of the situation:
2 out of the 3 alarms turn on in any time period: Alert should be sent
1 out of the 3 alarms turn on for under a certain time period: Alert should not be sent
1 out of the 3 alarms turn on for over a certain time period: Alert should be sent
I've tried looking into the settings available on the alarms themselves, and there doesn't seem to be an option for what I'm trying to do.
I'm wondering if this would require a lambda function? Is it possible for a lambda function to keep track of how long an alarm has been in alarm?
As talked in the comment section above, I am providing you with a possible solution to your problem. The only blocker is that you can't have different time frame for the alarms, both should be the same.
So you will have (example)- Alarm 1(cpu) if for 15 min it's over 60%. Alarm 2(EFS connections) if for 15 min there are more than 10 connections.
Now the alarm will go off when both the statements are true. Also the alarm will go off when only Alarm 1 goes off.
This is how you are going to make this alarm.
As for testing, it depends on what type of alarms you are making. For example cpu and ram increment methods are widely available on stackoverflow.
Also with aws cli you can change state of an alarm. It's usually for a very small amount of time, maybe 10 seconds.
aws cloudwatch set-alarm-state --alarm-name "myalarm" --state-value ALARM --state-reason "testing purposes"
You need to find a method which can suite your needs better.

AWS Cloudwatch monitoring alarm is triggered when instance is off

I have some alarms to check when an instance is left idle. The conditions are when 12 consecutive datapoints (at 5 min each) are found to have an average of <1% CPU usage, the instance should be stopped and a notification email sent out.
The alarm I created reads:
Whenever _Average_ of _CPU Utilization_
is _<_ +1+ Percent
For at least _12_ consecutive periods of _5 minutes_
Alarm
The alarm gets triggered in the use case of the instance being up and running for 1 hour with <1% CPU usage.
However, the alarm is also triggered when the instance is shut off. For instance, if the the instance is turned on, has 30 minutes of data points <1% CPU, and then is turned off, the alarm will be triggered in 30 minutes.
CPU metrics
How can I set this alarm so it is either:
only triggered when the instance is running, or
only triggered when a full set of 12 consecutive data points is actually collected, and not missing points that register as <1%?
The answer to this was actually quite simple. If you go to Cloudwatch, select the alarm and scroll down to Additional Configuration. For Missing Data Treatment, select "Treat missing data as good (not breaching alarm)".
Well as AWS says:
For each alarm you can specify CloudWatch to treat missing data points
as any of the following :
missing: the alarm does not consider missing data points when evaluating whether to change state (default)
notBreaching: missing data points are treated as begin within the threshold
breaching: missing data points are treated as breaching the threshold
ignore: the current alarm state is maintained

AWS Cloudwatch Alarm status

I have set cloudwatch alarm to trigger SNS mail whenever some keywords are found in cloudwatch logs. (using metric filter)
When those keywords are detected, Alarm state gets changed from insufficient data to alarm & triggers SNS topic
Now, to move from Alarm state alarm to insufficient data it takes time randomly.
Is there any specific way it works, I expect it to come back to Alarm state insufficient data immediately after alarm state.
Any help would be appreciated. Thanks
The alarm has a metric period of 60 seconds and some evaluation period (let suppose 3; total equal 3 * 60 = 3 mints evaluation window).
The alarm will be in Alarm state if all the last 3 datapoints at 60 seconds interval are in Alarm State (above the threshold).
If any 1 in last 3 datapoint is below threshold then the Alarm will transition to OK.
BUT, if the latest all 3 datapoints are missing (say your metric filter did not match and as a result no metric was pushed), the Alarm waits longer than 3 periods to transition to InsufficientData and this is by design to accommodate network delays or processing delay.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html
Came across the same situation, used a period of 1 min and some x > threshold.
The state changes to Alarm immediately whenever the metric exceeds the threshold. But to change back to OK/ Insufficient data takes 6 mins. This happens only for missing data.
As per AWS Support this is the expected behavior of Cloudwatch Alarms, clear explanation can be found here https://forums.aws.amazon.com/thread.jspa?threadID=284182

AWS CloudWatch Zero Queue Size For One Week alarm

I am wondering if there is a way to set up a CloudWatch alarm that will alarm if an SQS queue has not received any traffic for 7 days. I currently have a job that runs on my host once a week that is guaranteed to add message to my SQS queue, I already have a way of alarming if the job doesn't run but I would also like to alarm if for some reason the job does run but does not send any messages to my queue. I understand that the longest alarm period you can set is 1 day. Is there another way to create an alarm that will do what I am looking for?
Edit:
Since my job runs once a week is there a way to have an alarm that will monitor metrics every 7th day, seeing if any traffic hits the queue within a 24 hour time frame? This is more accurate seeing as the 6 days in between I don't expect or care if there is any traffic only that there is traffic on that 7th day.
CloudWatch Alarms set a limit that period * number_of_datapoints_to_watch must be less than 24 hours. As far as I know, there is no way around that.
To get the behavior you want, you can calculate days since last activity yourself, publish that as a custom metric and alarm on that.
One way to do it would be:
Create a lambda function and have it trigger every hour for example.
In the lambda, call CloudWatch GetMetricStatistics for the SQS metric you want to monitor.
Get the latest datapoint returned that has value greater than 0 and calculate the difference between now and the timestamp on that datapoint.
Use CloudWatch PutMetricData to publish this value to your new metric days-since-last-activity.
Now you can alarm when the value of your new metric goes above 7 days.

Amazon Cloudwatch alarm not triggered

I have a cloudwatch alarm configured :
Threshold : "GreaterThan 0" for 1 consecutive period,
Period : 1 minute,
Statistic : Sum
The alarm is configured on top of AWS SQS NumberOfMessagesSent. The queue was empty and no messages were being published to it. I sent a message manually. I could see the spike in metric but state of alarm was still OK. I am a bit confused why this alarm is not changing its state even though all the conditions to trigger this are met.
I just overcame this problem with the help of AWS support. You need to set the period on your alarm to ~15 minutes. It's got to do with how SQS marks the event's timestamps as it pushes them to CloudWatch.
Don't worry, as setting the period to a greater number will not affect how quickly you are alerted of an alarm. It will still get data from SQS every 5 minutes.
It could be that the interval time is set to less than 300 seconds. The free CloudWatch checks every 5 minutes so if you set an alarm for less than that it you will sometimes get INSUFFICIENT_DATA.
Sometimes they suffer something calling "Delayed Metric delivery", it's something more usual when the alarm period is around narrow times, like 1 minute.
When the delayed timestamp arrive, is too late for the alarm, but not for the graph, because it finally print it nicely without gap.
Play with Evalution Periods and Datapoints to Alarm, not 1/1, maybe 3/2 or 3/1 would work fine.