AWS CloudWatch alarm for SQS Number of Messages Visible

I am trying to capture the event of a new message arriving in my FIFO queue (I want to avoid polling the queue indefinitely).
For this purpose I am evaluating a CloudWatch alarm on the ApproximateNumberOfMessagesVisible metric.
Following is my alarm description:
Threshold (the condition in which the alarm will go to the ALARM state): ApproximateNumberOfMessagesVisible >= 0 for 1 minute
Actions (the actions that will occur when the alarm changes state):
In ALARM:
Send message to topic "topic_for_events_generated_bycloudwatch" (xyz#xyz)
Send message to topic "topic_for_events_generated_bycloudwatch"
Period (the granularity of the datapoints for the monitored metric): 1 minute
Following are my questions:
Assuming there are more than 0 messages in the queue, will this alarm be raised only once when the condition is met, or every minute?
During a quick test I saw the alarm keep moving between the INSUFFICIENT_DATA and ALARM states in random order, without any configuration changes. What could be the reason?
Screenshot of ApproximateNumberOfMessagesVisible metric graph
Screenshot of the log activity
Thanks in advance.
Regards,
Rohan K

CloudWatch invokes the alarm action once, when the threshold breach causes a state transition.
From the docs:
Alarms invoke actions for sustained state changes only. CloudWatch alarms do not invoke actions simply because they are in a particular state, the state must have changed and been maintained for a specified number of periods.
But
After an alarm invokes an action due to a change in state, its
subsequent behavior depends on the type of action that you have
associated with the alarm. For Amazon EC2 and Auto Scaling actions,
the alarm continues to invoke the action for every period that the
alarm remains in the new state. For Amazon SNS notifications, no additional actions are invoked.
An Example:
In the following figure, the alarm threshold is set to 3 units and the
alarm is evaluated over 3 periods. That is, the alarm goes to ALARM
state if the oldest of the 3 periods being evaluated is breaching, and
the 2 subsequent periods are either breaching or missing. In the
figure, this happens with the third through fifth time periods, and
the alarm's state is set to ALARM. At period six, the value dips below
the threshold, and the state reverts to OK. Later, during the ninth
time period, the threshold is breached again, but for only one period.
Consequently, the alarm state remains OK.
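The example above can be replayed with a small simulation. This is only a simplified sketch, not CloudWatch's actual evaluation logic: it assumes no missing datapoints, treats "breaching" as strictly greater than the threshold, and starts in the OK state. The datapoint values and the function name simulate_alarm are made up for illustration.

```python
def simulate_alarm(values, threshold=3, evaluation_periods=3):
    """Return (states, notifications) for a simplified alarm.

    Assumptions (not CloudWatch's real algorithm): no missing data,
    breaching means value > threshold, and the alarm starts in OK.
    """
    states = []
    notifications = []  # (period number, new state), one per state change
    state = "OK"
    for i in range(len(values)):
        # The last `evaluation_periods` datapoints, ending at period i + 1.
        window = values[max(0, i - evaluation_periods + 1): i + 1]
        breaching = (
            len(window) == evaluation_periods
            and all(v > threshold for v in window)
        )
        new_state = "ALARM" if breaching else "OK"
        if new_state != state:
            # An SNS action would fire only here, on the transition.
            notifications.append((i + 1, new_state))
        state = new_state
        states.append(state)
    return states, notifications


# Hypothetical datapoints: periods 3-5 breach, period 9 breaches alone.
values = [1, 2, 4, 5, 4, 2, 1, 2, 4, 2]
states, notifications = simulate_alarm(values)
print(notifications)  # [(5, 'ALARM'), (6, 'OK')]
```

Under these assumptions only two notifications go out, one at period five and one at period six; the lone breach at period nine changes nothing.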

Related

How do I schedule alarms in AWS CloudWatch

I have a few alarms set up with an evaluation period of 5 minutes.
The problem is that I get too many alerts throughout the day because of them getting triggered. Is there a way to schedule those alarms once a day or twice a day?
CloudWatch alarms only trigger when they cross the threshold. They will not send another notification until they return to the OK state and then cross into ALARM again.
So, if you are receiving multiple notifications, it is because the alarms are often going into, and out of, the ALARM state.
If this is too sensitive for your needs, increase the evaluation period or the number of datapoints required to trigger the alarm.
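As a rough sketch of that last suggestion, the parameters below require 3 breaching datapoints out of the last 3 periods before the alarm fires. The dictionary keys are parameters of boto3's put_metric_alarm; the alarm name, metric, threshold, and instance ID are hypothetical placeholders, and the 300-second period is an assumption based on the 5 minutes mentioned in the question.

```python
def build_alarm_params(evaluation_periods=3, datapoints_to_alarm=3):
    """Build keyword arguments for CloudWatch put_metric_alarm.

    All concrete values (name, metric, threshold, instance ID) are
    placeholders for illustration, not taken from the question.
    """
    if not 1 <= datapoints_to_alarm <= evaluation_periods:
        raise ValueError("need 1 <= datapoints_to_alarm <= evaluation_periods")
    return {
        "AlarmName": "example-less-sensitive-alarm",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
        "Statistic": "Average",
        "Period": 300,  # seconds; assumed 5-minute period
        "EvaluationPeriods": evaluation_periods,
        "DatapointsToAlarm": datapoints_to_alarm,
        "Threshold": 80.0,
        "ComparisonOperator": "GreaterThanThreshold",
    }


def create_alarm(params):
    """Send the alarm definition to CloudWatch (needs boto3 and credentials)."""
    import boto3  # imported lazily so the builder works without boto3
    boto3.client("cloudwatch").put_metric_alarm(**params)


params = build_alarm_params()
print(params["EvaluationPeriods"], params["DatapointsToAlarm"])
```

Raising both numbers makes the alarm slower to react but less likely to flap on short spikes.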

AWS Cloudwatch Alarm status

I have set up a CloudWatch alarm (using a metric filter) to send an SNS email whenever certain keywords are found in CloudWatch Logs.
When those keywords are detected, the alarm state changes from INSUFFICIENT_DATA to ALARM and triggers the SNS topic.
However, moving back from ALARM to INSUFFICIENT_DATA takes a seemingly random amount of time.
Is there a specific way this works? I expected the alarm to return to INSUFFICIENT_DATA immediately after being in the ALARM state.
Any help would be appreciated. Thanks
The alarm has a metric period of 60 seconds and some number of evaluation periods (suppose 3, giving an evaluation window of 3 * 60 seconds = 3 minutes).
The alarm will be in the ALARM state if all of the last 3 datapoints, at 60-second intervals, are breaching (above the threshold).
If any 1 of the last 3 datapoints is below the threshold, the alarm will transition to OK.
BUT, if all of the latest 3 datapoints are missing (say your metric filter did not match and as a result no metric was pushed), the alarm waits longer than 3 periods before transitioning to INSUFFICIENT_DATA. This is by design, to accommodate network or processing delays.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html
I came across the same situation, using a period of 1 minute and a condition of the form x > threshold.
The state changes to ALARM immediately whenever the metric exceeds the threshold, but changing back to OK or INSUFFICIENT_DATA takes 6 minutes. This happens only for missing data.
As per AWS Support, this is the expected behavior of CloudWatch alarms. A clear explanation can be found here: https://forums.aws.amazon.com/thread.jspa?threadID=284182

Continuous alerts in Cloudwatch

I have an instance in AWS whose CPU crosses the 90% threshold from time to time.
I have created an alert for this. However, I received only one notification, during the first 5 minutes, even though the CPU stayed at 100% for 2 hours.
How do I set the metric so I will keep getting notifications all the time?
CloudWatch does not send notifications continuously while the threshold is breached. It sends a notification only when the alarm state changes.
Alarms invoke actions for sustained state changes only. CloudWatch alarms do not invoke actions simply because they are in a particular state, the state must have changed and been maintained for a specified number of periods.
Ref: AWS Cloudwatch Documentation
One possible solution that I can think of is to create multiple CloudWatch alarms with different thresholds.
As the answer above already says, the alarm is not triggered again while it stays in the same state. One thing you can do is change the alarm threshold to a very large value and then back to the original value, so that the state change occurs again.

AWS Cloudwatch Heartbeat Alarm

I have an app that puts a custom Cloudwatch metric to AWS every minute. This is supposed to act as a heartbeat so I know the app is alive.
Now I want to put an alarm on this metric to notify me if the heartbeat stops. I have tried to accomplish this using different cloudwatch alarm statistics including "average" and "data samples" and setting an alarm threshold less than 1 over a given period. However, in all cases, if my app dies and stops reporting the heartbeat, the alarm will only go into an "Insufficient Data" state and never into an "Alarm" state.
I understand I can put a notification on the "Insufficient Data" state, but I want this to show up as an alarm. Is this possible in Cloudwatch?
Thanks,
Matt
I think that the alarm going into "Insufficient Data" state has to do with how missing data is being handled. As the doc states:
Similar to how each alarm is always in one of three states, each specific data point reported to CloudWatch falls under one of three categories:
Not breaching (within the threshold)
Breaching (violating the threshold)
Missing
You can specify how alarms handle missing data points. Choose whether to treat missing data points as:
missing (The alarm looks back farther in time to find additional data points)
notBreaching (Treated as a data point that is within the threshold)
breaching (Treated as a data point that is breaching the threshold)
ignore (The current alarm state is maintained)
The default behavior is missing.
So I guess that treating missing data points as breaching would do the trick :)
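A minimal sketch of such a heartbeat alarm, expressed as parameters for boto3's put_metric_alarm. The key point is TreatMissingData set to "breaching". The function name, the namespace and metric name, the topic ARN, and the choice of 3 evaluation periods are hypothetical values for illustration.

```python
def heartbeat_alarm_params(namespace, metric_name, topic_arn):
    """Parameters for an alarm that fires when heartbeats stop arriving."""
    return {
        "AlarmName": f"{metric_name}-heartbeat-missing",
        "Namespace": namespace,
        "MetricName": metric_name,
        "Statistic": "SampleCount",
        "Period": 60,            # the app reports once per minute
        "EvaluationPeriods": 3,  # assumed: tolerate up to 3 silent minutes
        "Threshold": 1.0,
        "ComparisonOperator": "LessThanThreshold",
        # Without this, missing heartbeats lead to INSUFFICIENT_DATA
        # instead of ALARM.
        "TreatMissingData": "breaching",
        "AlarmActions": [topic_arn],
    }


params = heartbeat_alarm_params(
    namespace="MyApp",                # hypothetical custom namespace
    metric_name="Heartbeat",          # hypothetical metric name
    topic_arn="arn:aws:sns:us-east-1:123456789012:heartbeat-alerts",
)
print(params["TreatMissingData"])
```

The dictionary could then be passed as boto3.client("cloudwatch").put_metric_alarm(**params), given boto3 and valid credentials.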
Instead of pushing in a custom metric to Cloudwatch, consider:
Push a message onto an SNS topic, on the same periodic basis as you were doing, and set up a CloudWatch alarm on the SNS topic's NumberOfMessagesPublished metric. If the number of heartbeats falls below the expected value for the time period you specify, whether it's because the app crashed or the server crashed, the alarm will go into the ALARM state.
Treat missing data as breaching threshold (step 4)
Check this: https://cloudonaut.io/dead-mans-switch-with-cloudwatch/

Amazon Cloudwatch alarm not triggered

I have a CloudWatch alarm configured:
Threshold : "GreaterThan 0" for 1 consecutive period,
Period : 1 minute,
Statistic : Sum
The alarm is configured on top of AWS SQS NumberOfMessagesSent. The queue was empty and no messages were being published to it. I sent a message manually. I could see the spike in metric but state of alarm was still OK. I am a bit confused why this alarm is not changing its state even though all the conditions to trigger this are met.
I just overcame this problem with the help of AWS support. You need to set the period on your alarm to ~15 minutes. It's got to do with how SQS marks the event's timestamps as it pushes them to CloudWatch.
Don't worry, as setting the period to a greater number will not affect how quickly you are alerted of an alarm. It will still get data from SQS every 5 minutes.
It could be that the period is set to less than 300 seconds. The free CloudWatch tier checks every 5 minutes, so if you set an alarm period shorter than that you will sometimes get INSUFFICIENT_DATA.
Sometimes metrics suffer from something called "delayed metric delivery", which is more common when the alarm period is narrow, such as 1 minute.
When the delayed datapoint arrives, it is too late for the alarm evaluation but not for the graph, which eventually shows the data without a gap.
Play with Evaluation Periods and Datapoints to Alarm: instead of 1/1, maybe 3/2 or 3/1 would work fine.
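To illustrate why a wider window helps, here is a simplified model, not CloudWatch's real evaluation code. It reads "3/1" as Evaluation Periods = 3 and Datapoints to Alarm = 1, which is an assumption about the notation above, and it counts a missing datapoint as not breaching. The function name is_alarming and the sample data are made up.

```python
def is_alarming(window, threshold, datapoints_to_alarm):
    """True if enough datapoints in the window exceed the threshold.

    `window` holds the most recent datapoints, oldest first; None marks
    one that has not arrived. Simplified: None counts as not breaching.
    """
    breaching = sum(1 for v in window if v is not None and v > threshold)
    return breaching >= datapoints_to_alarm


# One message was sent during the previous minute, but its datapoint was
# delivered late, after that minute had already been evaluated. The newest
# minute has no data yet.
last_three = [0, 1, None]

one_of_one = is_alarming(last_three[-1:], threshold=0, datapoints_to_alarm=1)
one_of_three = is_alarming(last_three, threshold=0, datapoints_to_alarm=1)
two_of_three = is_alarming(last_three, threshold=0, datapoints_to_alarm=2)
print(one_of_one, one_of_three, two_of_three)  # False True False
```

In this model a 1/1 alarm never sees the late datapoint, 3/1 still catches it because it remains inside the window, and 3/2 would need a second breaching minute.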