AWS Cloudwatch Alarm status - alarm

I have set up a CloudWatch alarm that triggers an SNS email whenever certain keywords are found in CloudWatch Logs (using a metric filter).
When those keywords are detected, the alarm state changes from INSUFFICIENT_DATA to ALARM and triggers the SNS topic.
However, the time it takes to move from ALARM back to INSUFFICIENT_DATA seems random.
Is there a specific way this works? I expected the alarm to return to INSUFFICIENT_DATA immediately after the ALARM state.
Any help would be appreciated. Thanks

The alarm has a metric period of 60 seconds and some number of evaluation periods (suppose 3, giving an evaluation window of 3 * 60 seconds = 3 minutes).
The alarm will be in the ALARM state if all of the last 3 datapoints, at 60-second intervals, are breaching (above the threshold).
If any one of the last 3 datapoints is below the threshold, the alarm will transition to OK.
But if all of the latest 3 datapoints are missing (say your metric filter did not match, so no metric was pushed), the alarm waits longer than 3 periods before transitioning to INSUFFICIENT_DATA. This is by design, to accommodate network or processing delays.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html
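The rules described above can be sketched as a small Python function. This is only a simplified model of the answer's description, not CloudWatch's actual evaluation logic: the function name `evaluate_alarm` and its parameters are made up for illustration, `None` stands for a missing datapoint, and the extra waiting time before INSUFFICIENT_DATA is not modelled.

```python
from typing import Optional, Sequence


def evaluate_alarm(
    datapoints: Sequence[Optional[float]],
    threshold: float,
    evaluation_periods: int = 3,
    previous_state: str = "INSUFFICIENT_DATA",
) -> str:
    """Return the state implied by the most recent datapoints (simplified model)."""
    window = list(datapoints[-evaluation_periods:])
    present = [v for v in window if v is not None]

    # Every datapoint in the window is missing.
    if not present:
        return "INSUFFICIENT_DATA"
    # A full window in which every datapoint is above the threshold.
    if len(present) == evaluation_periods and all(v > threshold for v in present):
        return "ALARM"
    # At least one datapoint is at or below the threshold.
    if any(v <= threshold for v in present):
        return "OK"
    # Mixed breaching/missing data: this sketch simply keeps the previous state.
    return previous_state


print(evaluate_alarm([5, 5, 5], threshold=3))
print(evaluate_alarm([5, 1, 5], threshold=3))
print(evaluate_alarm([None, None, None], threshold=3))
```

In the real service the transition to INSUFFICIENT_DATA is delayed, as the answer explains, so this model only shows which state the window eventually implies.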

I came across the same situation, using a period of 1 minute and a condition of the form x > threshold.
The state changes to ALARM immediately whenever the metric exceeds the threshold, but changing back to OK or INSUFFICIENT_DATA takes 6 minutes. This happens only for missing data.
According to AWS Support, this is the expected behavior of CloudWatch alarms; a clear explanation can be found here: https://forums.aws.amazon.com/thread.jspa?threadID=284182

Related

AWS Cloudwatch monitoring alarm is triggered when instance is off

I have some alarms to check when an instance is left idle. The condition is that when 12 consecutive datapoints (at 5 minutes each) have an average CPU usage of <1%, the instance should be stopped and a notification email sent out.
The alarm I created reads:
Whenever Average of CPU Utilization
is < 1 Percent
For at least 12 consecutive periods of 5 minutes
Alarm
The alarm gets triggered in the use case of the instance being up and running for 1 hour with <1% CPU usage.
However, the alarm is also triggered when the instance is shut off. For instance, if the instance is turned on, has 30 minutes of datapoints with <1% CPU, and is then turned off, the alarm will be triggered 30 minutes later.
CPU metrics
How can I set this alarm so it is either:
only triggered when the instance is running, or
only triggered when a full set of 12 consecutive data points is actually collected, and not missing points that register as <1%?
The answer to this was actually quite simple. Go to CloudWatch, select the alarm, and scroll down to Additional Configuration. For Missing Data Treatment, select "Treat missing data as good (not breaching alarm)".
Well as AWS says:
For each alarm you can specify CloudWatch to treat missing data points
as any of the following :
missing: the alarm does not consider missing data points when evaluating whether to change state (default)
notBreaching: missing data points are treated as being within the threshold
breaching: missing data points are treated as breaching the threshold
ignore: the current alarm state is maintained
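The same setting can also be applied through the API. The following is a minimal sketch, assuming boto3 is installed and credentials are configured; the helper names `idle_alarm_params` and `create_alarm` and the alarm name are hypothetical, and the stop and notification actions from the question are left out for brevity.

```python
def idle_alarm_params(instance_id: str) -> dict:
    """Build put_metric_alarm arguments for the idle-CPU alarm in the question."""
    return {
        "AlarmName": f"idle-cpu-{instance_id}",  # hypothetical name
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,            # 5 minutes per datapoint
        "EvaluationPeriods": 12,  # 12 consecutive periods = 1 hour
        "Threshold": 1.0,
        "ComparisonOperator": "LessThanThreshold",
        # Missing datapoints (for example while the instance is stopped)
        # count as within the threshold, so they cannot trigger the alarm.
        "TreatMissingData": "notBreaching",
    }


def create_alarm(params: dict) -> None:
    """Create or update the alarm; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the parameter builder works without it

    boto3.client("cloudwatch").put_metric_alarm(**params)
```

Calling `create_alarm(idle_alarm_params("i-0123456789abcdef0"))` with a real instance ID would create the alarm; the ID shown here is a placeholder.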

How do I schedule alarms in AWS CloudWatch

I have a few alarms set up with an evaluation period of 5 minutes.
The problem is that I get too many alerts throughout the day because of them getting triggered. Is there a way to schedule those alarms once a day or twice a day?
CloudWatch alarms only trigger when they cross the threshold. They will not send another notification until they return to the OK state and then cross into ALARM again.
So, if you are receiving multiple notifications, it is because the alarms are frequently going into, and out of, the ALARM state.
If this is too sensitive for your needs, increase the evaluation period or the number of datapoints required to trigger the alarm.
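To see why a longer evaluation window reduces the number of notifications, here is a toy simulation. It is not CloudWatch's real algorithm: the function name and the two-state OK/ALARM model are assumptions made for illustration, and each input value just says whether that period's datapoint breached the threshold.

```python
from typing import Sequence


def count_alarm_notifications(
    breaching: Sequence[bool],
    evaluation_periods: int,
    datapoints_to_alarm: int,
) -> int:
    """Count transitions into ALARM for a series of per-period breach flags."""
    state = "OK"
    notifications = 0
    for end in range(1, len(breaching) + 1):
        # Look at the most recent `evaluation_periods` datapoints.
        window = breaching[max(0, end - evaluation_periods):end]
        new_state = "ALARM" if sum(window) >= datapoints_to_alarm else "OK"
        # A notification is sent only when the state changes to ALARM.
        if new_state == "ALARM" and state != "ALARM":
            notifications += 1
        state = new_state
    return notifications


flapping = [True, False] * 6  # metric crosses the threshold every other period
print(count_alarm_notifications(flapping, 1, 1))  # 6
print(count_alarm_notifications(flapping, 3, 3))  # 0
```

With a 1-of-1 setting every brief spike produces a notification, while requiring 3 of 3 breaching datapoints ignores the flapping and still fires once for a sustained breach.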

AWS CloudWatch alarm for SQS Number of Messages Visible

I am trying to capture the event of a new message arriving in my FIFO queue (as I want to avoid polling the queue indefinitely).
For this purpose I am evaluating the CloudWatch alarm option with the ApproximateNumberOfMessagesVisible metric.
The following is my alarm description:
Threshold (the condition in which the alarm will go to the ALARM state): ApproximateNumberOfMessagesVisible >= 0 for 1 minute
Actions (the actions that will occur when the alarm changes state):
In ALARM:
Send message to topic "topic_for_events_generated_bycloudwatch" (xyz#xyz)
Send message to topic "topic_for_events_generated_bycloudwatch"
Period (the granularity of the datapoints for the monitored metric): 1 minute
My questions are:
Assuming there are more than 0 messages in the given queue, will this alarm be raised only once when the condition is met, or every minute?
During a quick test I saw the alarm moving between the INSUFFICIENT_DATA and ALARM states in random order without any configuration changes. What could be the reason?
Screenshot of ApproximateNumberOfMessagesVisible metric graph
Screenshot of the log activity
Thanks in advance.
Regards,
Rohan K
CloudWatch invokes the alarm action once, when a threshold breach causes a state transition.
From the Docs
Alarms invoke actions for sustained state changes only. CloudWatch alarms do not invoke actions simply because they are in a particular state, the state must have changed and been maintained for a specified number of periods.
But
After an alarm invokes an action due to a change in state, its
subsequent behavior depends on the type of action that you have
associated with the alarm. For Amazon EC2 and Auto Scaling actions,
the alarm continues to invoke the action for every period that the
alarm remains in the new state. For Amazon SNS notifications, no additional actions are invoked.
An Example:
In the following figure, the alarm threshold is set to 3 units and the
alarm is evaluated over 3 periods. That is, the alarm goes to ALARM
state if the oldest of the 3 periods being evaluated is breaching, and
the 2 subsequent periods are either breaching or missing. In the
figure, this happens with the third through fifth time periods, and
the alarm's state is set to ALARM. At period six, the value dips below
the threshold, and the state reverts to OK. Later, during the ninth
time period, the threshold is breached again, but for only one period.
Consequently, the alarm state remains OK.

AWS CloudWatch Zero Queue Size For One Week alarm

I am wondering if there is a way to set up a CloudWatch alarm that fires if an SQS queue has not received any traffic for 7 days. I currently have a job that runs on my host once a week and is guaranteed to add a message to my SQS queue. I already have a way of alarming if the job doesn't run, but I would also like to alarm if for some reason the job does run but does not send any messages to my queue. I understand that the longest alarm period you can set is 1 day. Is there another way to create an alarm that will do what I am looking for?
Edit:
Since my job runs once a week, is there a way to have an alarm that monitors metrics every 7th day, checking whether any traffic hits the queue within a 24-hour time frame? This would be more accurate, since I don't expect or care about traffic during the 6 days in between, only that there is traffic on that 7th day.
CloudWatch alarms impose a limit: period * number_of_datapoints_to_watch must not exceed 24 hours. As far as I know, there is no way around that.
To get the behavior you want, you can calculate the days since the last activity yourself, publish that as a custom metric, and alarm on it.
One way to do it would be:
Create a lambda function and have it trigger every hour for example.
In the lambda, call CloudWatch GetMetricStatistics for the SQS metric you want to monitor.
Get the latest datapoint returned that has value greater than 0 and calculate the difference between now and the timestamp on that datapoint.
Use CloudWatch PutMetricData to publish this value to your new metric days-since-last-activity.
Now you can alarm when the value of your new metric goes above 7 days.
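The core of the Lambda described in these steps could look roughly like the sketch below. It assumes the datapoints have the shape returned by GetMetricStatistics (dicts with a `Timestamp` and a statistic such as `Sum`); the function names and the `Custom/SQS` namespace are hypothetical, and the publishing step needs boto3 and AWS credentials.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Mapping, Optional


def days_since_last_activity(
    datapoints: Iterable[Mapping],
    now: datetime,
    statistic: str = "Sum",
) -> Optional[float]:
    """Days between `now` and the latest datapoint whose value is above 0."""
    active = [dp["Timestamp"] for dp in datapoints if dp.get(statistic, 0) > 0]
    if not active:
        return None  # no activity in the queried range
    return (now - max(active)).total_seconds() / 86400


def publish(value: float, namespace: str = "Custom/SQS") -> None:
    """Publish the value as a custom metric; namespace is a made-up example."""
    import boto3  # imported lazily so the calculation works without it

    boto3.client("cloudwatch").put_metric_data(
        Namespace=namespace,
        MetricData=[{"MetricName": "days-since-last-activity", "Value": value}],
    )


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
sample = [
    {"Timestamp": now - timedelta(days=3), "Sum": 2.0},
    {"Timestamp": now - timedelta(days=1), "Sum": 0.0},
]
print(days_since_last_activity(sample, now))  # 3.0
```

How to handle the `None` case is a design choice: publishing the length of the queried range, or a value above the alarm threshold, would make the alarm fire when no activity is found at all.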

Amazon Cloudwatch alarm not triggered

I have a cloudwatch alarm configured :
Threshold : "GreaterThan 0" for 1 consecutive period,
Period : 1 minute,
Statistic : Sum
The alarm is configured on top of AWS SQS NumberOfMessagesSent. The queue was empty and no messages were being published to it. I sent a message manually. I could see the spike in metric but state of alarm was still OK. I am a bit confused why this alarm is not changing its state even though all the conditions to trigger this are met.
I just overcame this problem with the help of AWS support. You need to set the period on your alarm to ~15 minutes. It's got to do with how SQS marks the event's timestamps as it pushes them to CloudWatch.
Don't worry, as setting the period to a greater number will not affect how quickly you are alerted of an alarm. It will still get data from SQS every 5 minutes.
It could be that the interval is set to less than 300 seconds. Free CloudWatch monitoring checks every 5 minutes, so if you set an alarm for less than that you will sometimes get INSUFFICIENT_DATA.
Sometimes metrics suffer from something called "delayed metric delivery", which is more common when the alarm period is short, such as 1 minute.
When the delayed datapoint arrives, it is too late for the alarm but not for the graph, which eventually draws it without a gap.
Experiment with Evaluation Periods and Datapoints to Alarm: instead of 1/1, something like 3/2 or 3/1 may work fine.
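As a rough illustration of the 3/1 suggestion, the alarm from the question could be defined with parameters like the ones below. This is a sketch rather than a verified fix: the helper name and alarm name are hypothetical, and the resulting dict would be passed to boto3's `put_metric_alarm`.

```python
def sqs_sent_alarm_params(
    queue_name: str,
    evaluation_periods: int = 3,
    datapoints_to_alarm: int = 1,
) -> dict:
    """Build put_metric_alarm arguments using an M-out-of-N evaluation."""
    if not 1 <= datapoints_to_alarm <= evaluation_periods:
        raise ValueError("datapoints_to_alarm must be between 1 and evaluation_periods")
    return {
        "AlarmName": f"{queue_name}-messages-sent",  # hypothetical name
        "Namespace": "AWS/SQS",
        "MetricName": "NumberOfMessagesSent",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Sum",
        "Period": 60,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        # ALARM when at least `datapoints_to_alarm` of the last
        # `evaluation_periods` datapoints are breaching, so a datapoint that
        # arrives a minute or two late still falls inside the window.
        "EvaluationPeriods": evaluation_periods,
        "DatapointsToAlarm": datapoints_to_alarm,
    }


params = sqs_sent_alarm_params("my-queue")  # placeholder queue name
print(params["EvaluationPeriods"], params["DatapointsToAlarm"])  # 3 1
```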