How do I schedule alarms in AWS CloudWatch?

I have a few alarms set up with an evaluation period of 5 minutes.
The problem is that they trigger so often that I get too many alerts throughout the day. Is there a way to schedule those alarms to run only once or twice a day?

CloudWatch alarms only trigger when they cross the threshold. They will not send another notification until they return to the OK state and then cross into ALARM again.
So, if you are receiving multiple notifications, it is because the alarms are often going into, and out of, the ALARM state.
If this is too sensitive for your needs, increase the evaluation period or the number of datapoints required to trigger the alarm.
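As a rough illustration of that last suggestion, the sketch below builds the arguments for CloudWatch's PutMetricAlarm call with a wider evaluation window. The alarm name, the EC2 CPU metric, the 90% threshold, and the 4-out-of-6 setting are all assumptions chosen for the example, not values from the question.

```python
def less_sensitive_alarm_params(alarm_name, evaluation_periods=6, datapoints_to_alarm=4):
    """Build keyword arguments for a PutMetricAlarm call with a wider window."""
    if not 1 <= datapoints_to_alarm <= evaluation_periods:
        raise ValueError("datapoints_to_alarm must be between 1 and evaluation_periods")
    return {
        "AlarmName": alarm_name,
        "Namespace": "AWS/EC2",            # assumed metric, for illustration only
        "MetricName": "CPUUtilization",    # assumed metric, for illustration only
        "Statistic": "Average",
        "Period": 300,                     # 5-minute periods, as in the question
        "EvaluationPeriods": evaluation_periods,
        "DatapointsToAlarm": datapoints_to_alarm,
        "Threshold": 90.0,                 # assumed threshold
        "ComparisonOperator": "GreaterThanThreshold",
    }


params = less_sensitive_alarm_params("high-cpu")  # "high-cpu" is a hypothetical name

# With boto3 installed and credentials configured, this could be applied with:
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```

With these values the alarm needs 4 breaching datapoints within the last 30 minutes before it changes state, so short spikes no longer flip it in and out of ALARM.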

Related

AWS CloudWatch Composite Alarms: Send Alert When 1 Alarm Has been "In Alarm" For Over a Certain Amount Of Time

So I'm trying to set up composite alarms on AWS. So far, I have most of it working. At the moment, I have a composite alarm made up of 3 alarms. If any 2 of these 3 alarms trigger, then the composite alarm also triggers. This part works fine.
However, I am having trouble with part of my use case. I'd also like to make it so that if one of these alarms within the composite alarm stays in alarm for over a certain period of time, then an alert is also sent out.
Here's an example of the situation:
2 out of the 3 alarms turn on for any length of time: Alert should be sent
1 out of the 3 alarms turns on for under a certain time period: Alert should not be sent
1 out of the 3 alarms turns on for over a certain time period: Alert should be sent
I've tried looking into the settings available on the alarms themselves, and there doesn't seem to be an option for what I'm trying to do.
I'm wondering if this would require a lambda function? Is it possible for a lambda function to keep track of how long an alarm has been in alarm?
As discussed in the comments, here is a possible solution to your problem. The only limitation is that the alarms can't use different time frames; both must use the same one.
For example: Alarm 1 (CPU) fires if CPU stays over 60% for 15 minutes, and Alarm 2 (EFS connections) fires if there are more than 10 connections for 15 minutes.
The composite alarm then goes off when both conditions are true, and it also goes off when only Alarm 1 goes off.
This is how you are going to make this alarm.
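A minimal sketch of how such a rule expression could be assembled, assuming the composite alarm rule syntax of `ALARM("name")` terms combined with AND and OR. The alarm names are hypothetical, and the idea of adding separate "sustained" alarms (the same metrics with a longer evaluation window) is one reading of the approach above, not something the answer spells out.

```python
from itertools import combinations


def any_k_in_alarm(alarm_names, k):
    """Rule that is true when at least k of the named alarms are in ALARM."""
    groups = []
    for combo in combinations(alarm_names, k):
        terms = " AND ".join(f'ALARM("{name}")' for name in combo)
        groups.append(f"({terms})")
    return " OR ".join(groups)


def rule_with_sustained_alarms(short_alarms, sustained_alarms, k=2):
    """k-of-n over the short alarms, OR any single long-window alarm."""
    quorum = any_k_in_alarm(short_alarms, k)
    sustained = " OR ".join(f'ALARM("{name}")' for name in sustained_alarms)
    return f"{quorum} OR {sustained}"


# Hypothetical alarm names: three short-window alarms plus one long-window twin.
rule = rule_with_sustained_alarms(["cpu", "efs", "mem"], ["cpu-15min"])
print(rule)

# With boto3 this string could be passed as AlarmRule to put_composite_alarm.
```

Because every AND group is parenthesized and the groups are joined only by OR, the resulting expression does not depend on operator precedence.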
As for testing, it depends on what type of alarms you are making. For example, methods for driving up CPU and RAM usage are widely available on Stack Overflow.
You can also change the state of an alarm with the AWS CLI. The forced state usually lasts only a very short time, maybe 10 seconds.
aws cloudwatch set-alarm-state --alarm-name "myalarm" --state-value ALARM --state-reason "testing purposes"
You need to find the method that suits your needs best.

AWS Cloudwatch Alarm status

I have set up a CloudWatch alarm to send an SNS email whenever certain keywords are found in CloudWatch Logs (using a metric filter).
When those keywords are detected, the alarm state changes from INSUFFICIENT_DATA to ALARM and triggers the SNS topic.
However, the time it takes to move from ALARM back to INSUFFICIENT_DATA seems random.
Is there a specific way this works? I expected it to return to INSUFFICIENT_DATA immediately after the ALARM state.
Any help would be appreciated. Thanks
The alarm has a metric period of 60 seconds and some number of evaluation periods (suppose 3, for a total evaluation window of 3 * 60 seconds = 3 minutes).
The alarm will be in the ALARM state if all of the last 3 datapoints, at 60-second intervals, are breaching (above the threshold).
If any one of the last 3 datapoints is below the threshold, the alarm transitions to OK.
But if all of the latest 3 datapoints are missing (say your metric filter did not match, so no metric was pushed), the alarm waits longer than 3 periods before transitioning to INSUFFICIENT_DATA. This is by design, to accommodate network or processing delays.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html
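The rules described in this answer can be sketched as a small function. This is a simplified model of the description above, not CloudWatch's actual evaluator: it ignores the extra waiting time before INSUFFICIENT_DATA and the alarm's missing-data treatment setting, and the `UNDECIDED` result is a placeholder invented for this example.

```python
def alarm_state(recent, threshold):
    """Classify the last N datapoints (oldest first); None marks a missing one."""
    present = [value for value in recent if value is not None]
    if not present:
        # Every datapoint in the window is missing.
        return "INSUFFICIENT_DATA"
    if len(present) == len(recent) and all(value > threshold for value in present):
        # All N datapoints are above the threshold.
        return "ALARM"
    if any(value <= threshold for value in present):
        # At least one datapoint is not breaching.
        return "OK"
    # Some datapoints missing, the rest breaching: the real service applies its
    # missing-data treatment here, which this sketch does not model.
    return "UNDECIDED"


print(alarm_state([5, 7, 9], threshold=3))
print(alarm_state([5, 1, 9], threshold=3))
print(alarm_state([None, None, None], threshold=3))
```

In the real service the third case is not reported immediately, which is the delay the question is asking about.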
I came across the same situation, using a period of 1 minute and a condition of x > threshold.
The state changes to ALARM immediately whenever the metric exceeds the threshold, but changing back to OK or INSUFFICIENT_DATA takes 6 minutes. This happens only for missing data.
According to AWS Support, this is the expected behavior of CloudWatch alarms. A clear explanation can be found here: https://forums.aws.amazon.com/thread.jspa?threadID=284182

Continuous alerts in Cloudwatch

I have an instance in AWS whose CPU crosses the 90% threshold from time to time.
I created an alarm for this, but I received only one notification, during the first 5 minutes, even though the CPU was at 100% for 2 hours.
How do I set up the alarm so that I keep getting notifications the whole time?
CloudWatch does not send notifications continuously while the threshold is breached. It sends a notification only when the state changes.
Alarms invoke actions for sustained state changes only. CloudWatch alarms do not invoke actions simply because they are in a particular state, the state must have changed and been maintained for a specified number of periods.
Ref: AWS Cloudwatch Documentation
One possible solution I can think of is to create multiple CloudWatch alarms with different thresholds.
As the above answer already says, the alarm is not triggered again while it stays in ALARM. One thing you can do is change the alarm threshold to a very large value and then back to the original value, so that the state change occurs again.

AWS CloudWatch Zero Queue Size For One Week alarm

I am wondering if there is a way to set up a CloudWatch alarm that will fire if an SQS queue has not received any traffic for 7 days. I currently have a job that runs on my host once a week and is guaranteed to add a message to my SQS queue. I already have a way of alarming if the job doesn't run, but I would also like to alarm if the job does run but, for some reason, does not send any messages to my queue. I understand that the longest alarm period you can set is 1 day. Is there another way to create an alarm that will do what I am looking for?
Edit:
Since my job runs once a week, is there a way to have an alarm that monitors metrics every 7th day and checks whether any traffic hits the queue within a 24-hour time frame? This would be more accurate, since I don't expect or care about traffic on the 6 days in between, only that there is traffic on that 7th day.
CloudWatch Alarms set a limit that period * number_of_datapoints_to_watch must be less than 24 hours. As far as I know, there is no way around that.
To get the behavior you want, you can calculate days since last activity yourself, publish that as a custom metric and alarm on that.
One way to do it would be:
Create a lambda function and have it trigger every hour for example.
In the lambda, call CloudWatch GetMetricStatistics for the SQS metric you want to monitor.
Get the latest datapoint returned that has value greater than 0 and calculate the difference between now and the timestamp on that datapoint.
Use CloudWatch PutMetricData to publish this value to your new metric days-since-last-activity.
Now you can alarm when the value of your new metric goes above 7 days.
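The calculation in the middle of those steps can be sketched as follows. The datapoint shape (dicts with `Timestamp` and `Sum` keys) matches what GetMetricStatistics returns when the Sum statistic is requested; the function name and the sample data are made up for illustration, and the surrounding Lambda handler and API calls are left out.

```python
from datetime import datetime, timedelta, timezone


def days_since_last_activity(datapoints, now):
    """Days between `now` and the newest datapoint whose Sum is greater than 0.

    Returns None when no datapoint in the queried range shows activity.
    """
    active = [dp["Timestamp"] for dp in datapoints if dp["Sum"] > 0]
    if not active:
        return None
    return (now - max(active)).total_seconds() / 86400


# Sample data standing in for a GetMetricStatistics response on an SQS metric.
now = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)
datapoints = [
    {"Timestamp": now - timedelta(days=9), "Sum": 4.0},
    {"Timestamp": now - timedelta(days=8), "Sum": 2.0},
    {"Timestamp": now - timedelta(days=1), "Sum": 0.0},
]
value = days_since_last_activity(datapoints, now)
print(value)

# In the Lambda, `value` would then be published with PutMetricData under the
# new metric name, and the alarm would watch for it exceeding 7.
```

When the function returns None, the caller has to decide what to publish; one option is the length of the queried range, so that the alarm still fires when the queue has been silent for the whole window.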

Amazon Cloudwatch alarm not triggered

I have a cloudwatch alarm configured :
Threshold : "GreaterThan 0" for 1 consecutive period,
Period : 1 minute,
Statistic : Sum
The alarm is configured on top of AWS SQS NumberOfMessagesSent. The queue was empty and no messages were being published to it. I sent a message manually. I could see the spike in metric but state of alarm was still OK. I am a bit confused why this alarm is not changing its state even though all the conditions to trigger this are met.
I just overcame this problem with the help of AWS Support. You need to set the period on your alarm to about 15 minutes. It has to do with how SQS timestamps the events it pushes to CloudWatch.
Don't worry: setting the period to a larger value will not affect how quickly you are alerted. The alarm will still get data from SQS every 5 minutes.
It could be that the interval is set to less than 300 seconds. The free tier of CloudWatch checks every 5 minutes, so if you set an alarm period shorter than that, you will sometimes get INSUFFICIENT_DATA.
Sometimes metrics suffer from what is called "delayed metric delivery", which is more common when the alarm period is narrow, such as 1 minute.
When the delayed datapoint arrives, it is too late for the alarm but not for the graph, which eventually plots it with no gap.
Experiment with Evaluation Periods and Datapoints to Alarm: instead of 1/1, something like 3/2 or 3/1 may work fine.
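To see why a wider window helps with a late datapoint, here is a simplified model of the "M out of N" check. It is only a sketch of the idea under the assumptions stated in the comments, not CloudWatch's actual evaluation logic, and the function name is invented for the example.

```python
def should_alarm(datapoints, threshold, evaluation_periods, datapoints_to_alarm):
    """True when enough of the most recent datapoints are above the threshold.

    `datapoints` is ordered oldest first; None marks a missing datapoint.
    """
    window = datapoints[-evaluation_periods:]
    breaching = sum(1 for value in window if value is not None and value > threshold)
    return breaching >= datapoints_to_alarm


# Assumed scenario: one message was sent a minute ago, but its datapoint only
# became visible after the alarm had already moved on to the newest minute.
per_minute_sum = [0, 1, 0]

print(should_alarm(per_minute_sum, 0, evaluation_periods=1, datapoints_to_alarm=1))
print(should_alarm(per_minute_sum, 0, evaluation_periods=3, datapoints_to_alarm=1))
```

With 1/1 only the newest minute is checked, so the late spike is missed; with 3 evaluation periods and 1 datapoint to alarm, the spike is still inside the window when it finally shows up.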