Usefulness of publishing the value zero for CloudWatch

According to this doc I should consider publishing the value zero instead of no data because I "can set a CloudWatch alarm to notify you if your application fails to publish metrics every five minutes".
But I can set a CloudWatch alarm to notify on INSUFFICIENT_DATA too. Is using 0 a more reliable way of doing this? Is using 0 over INSUFFICIENT_DATA recommended by Amazon because it's more reliable?

You can set an alarm via either method.
However, there is a difference between publishing a value of zero and an alarm state of INSUFFICIENT_DATA.
If your service is running, publish a zero value rather than publishing nothing and letting the alarm go into the INSUFFICIENT_DATA state. In the first case you know your service is running and has nothing to report. In the second case you have no data, so you cannot tell an idle service from a dead one. This may or may not be valuable to you, but at least your logs will not have gaps in time.
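As a rough illustration of publishing a zero instead of nothing, here is a minimal Python sketch using boto3's `put_metric_data`. The namespace `MyApp`, the metric name `ErrorsProcessed`, and the helper names are assumptions made up for this example, not anything from the question.

```python
from datetime import datetime, timezone


def build_metric_datum(name, value, when=None):
    """Build one MetricData entry. A value of 0 is still a real datapoint."""
    return {
        "MetricName": name,
        "Timestamp": when or datetime.now(timezone.utc),
        "Value": float(value),
        "Unit": "Count",
    }


def publish(namespace, name, value, client=None):
    """Send the datapoint to CloudWatch, even when the value is zero."""
    if client is None:
        import boto3  # imported lazily so the helper above works without AWS access
        client = boto3.client("cloudwatch")
    datum = build_metric_datum(name, value)
    client.put_metric_data(Namespace=namespace, MetricData=[datum])
    return datum


# Hypothetical usage, once per reporting interval:
# publish("MyApp", "ErrorsProcessed", 0)
```

With this in place, an alarm on the metric sees a steady stream of datapoints while the service is alive, and a gap only when the service really stopped reporting.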

Related

CloudWatch alarm action only triggers SNS once

I have a CloudWatch alarm created for testing purposes. It checks the number of bytes read on a Kinesis stream. If less than 1 byte is received within 1 minute, it triggers the alarm and sends an email via SNS. So I get an email after 1 minute, but after that I don't get any further emails. Is it right that the email notification is sent only once? In my test, data is not flowing all the time, so ideally it should send an email every minute. Correct?
Whether or not the action keeps firing depends on the type of action.
SNS actions only trigger once, when the state changes to ALARM. Other actions, such as EC2 Auto Scaling, keep triggering as long as the alarm is in the ALARM state.
If your alarm reverts to OK state and then back again to ALARM, SNS will get triggered again.
AWS docs
If the CloudWatch alarm switched to the ALARM state after the specified period (1 minute), then in your case it means the stream received less than 1 byte. When it switches, it triggers the configured actions (in your case, the email notification). As long as the alarm remains in the ALARM state without switching its state back to OK, nothing will be triggered again.
If the alarm returns to the OK state and then, after the specified period, switches to ALARM again, the configured action will be triggered again.
For more information, refer to documentation.

Want SNS alert repeatedly

I set up an alarm for CPU utilization on an EC2 instance and created an SNS topic to send alerts by mail. It sends me an alert when CPU utilization goes into the ALARM state, but I want repeated alerts until the ALARM state is resolved. Please help me... I'm a newbie to AWS.
What you can do is set up a Lambda function with a CloudWatch Events trigger so that it runs periodically, and inside it call the CloudWatch GetMetricStatistics API. Then simply check whether the value is above or below your preferred threshold (or, if you prefer, whether the alarm is in the ALARM state) and publish a message to SNS. The SDK documentation covers how to use these APIs in your preferred language.
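The periodic check described above might look roughly like the following Python sketch. It takes the second option from the answer (checking the alarm state with `describe_alarms`) rather than calling GetMetricStatistics. The function name `remind_if_in_alarm` and the environment variables `ALARM_NAME` and `TOPIC_ARN` are assumptions for illustration only.

```python
import os


def remind_if_in_alarm(cloudwatch, sns, alarm_name, topic_arn):
    """Publish a reminder to SNS if the named alarm is currently in ALARM.

    Returns True when a reminder was sent, False otherwise.
    """
    response = cloudwatch.describe_alarms(AlarmNames=[alarm_name])
    alarms = response.get("MetricAlarms", [])
    if not alarms or alarms[0]["StateValue"] != "ALARM":
        return False
    sns.publish(
        TopicArn=topic_arn,
        Subject=f"Still in ALARM: {alarm_name}"[:100],  # SNS subjects are capped at 100 chars
        Message=alarms[0].get("StateReason") or "Alarm is still in the ALARM state.",
    )
    return True


def lambda_handler(event, context):
    """Entry point for a Lambda run on a schedule (for example every 5 minutes)."""
    import boto3  # available in the Lambda Python runtime
    return remind_if_in_alarm(
        boto3.client("cloudwatch"),
        boto3.client("sns"),
        os.environ["ALARM_NAME"],  # hypothetical configuration
        os.environ["TOPIC_ARN"],   # hypothetical configuration
    )
```

The schedule interval then decides how often the reminder repeats while the alarm stays in ALARM.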
It is not possible to get repeated notifications from the alarm itself once it is in the ALARM state. The alarm enters the ALARM state only once, so the notification via Amazon SNS is sent only once.
An Auto Scaling policy attached to the same alarm will keep being triggered, but the mail will be sent only once.

Continuous alerts in Cloudwatch

I have an instance in AWS whose CPU crosses the 90% threshold from time to time.
I created an alarm for this, but I received only one notification, during the first 5 minutes, even though the CPU was at 100% for 2 hours.
How do I set the metric so that I keep getting notifications the whole time?
CloudWatch does not send notifications continuously while the threshold is breached. CloudWatch sends a notification only when the state changes.
Alarms invoke actions for sustained state changes only. CloudWatch alarms do not invoke actions simply because they are in a particular state; the state must have changed and been maintained for a specified number of periods.
Ref: AWS Cloudwatch Documentation
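To make the "only on state change" behaviour concrete, here is a small self-contained Python simulation. It is a toy model of notification actions only, not CloudWatch's actual evaluation logic, and the function name is made up for this example.

```python
def alarm_notifications(states, initial="INSUFFICIENT_DATA"):
    """Return the period indexes at which a notification would be sent.

    A notification fires only on a transition into ALARM, not on every
    period the alarm spends in ALARM.
    """
    sent = []
    previous = initial
    for period, state in enumerate(states):
        if state == "ALARM" and previous != "ALARM":
            sent.append(period)
        previous = state
    return sent


# CPU stays high for three periods, recovers, then goes high again.
history = ["OK", "ALARM", "ALARM", "ALARM", "OK", "ALARM"]
print(alarm_notifications(history))  # [1, 5]
```

Two notifications are sent for four periods in ALARM, which matches what the question observed: one mail for a two-hour breach.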
One possible solution that I can think of is to create multiple CloudWatch alarms with different thresholds.
As the answer above already says, the action is not triggered again. One thing you can do is change the alarm threshold to a very large value and then back to the original value, so that the state change occurs again.

AWS Cloudwatch Heartbeat Alarm

I have an app that puts a custom Cloudwatch metric to AWS every minute. This is supposed to act as a heartbeat so I know the app is alive.
Now I want to put an alarm on this metric to notify me if the heartbeat stops. I have tried to accomplish this using different cloudwatch alarm statistics including "average" and "data samples" and setting an alarm threshold less than 1 over a given period. However, in all cases, if my app dies and stops reporting the heartbeat, the alarm will only go into an "Insufficient Data" state and never into an "Alarm" state.
I understand I can put a notification on the "Insufficient Data" state, but I want this to show up as an alarm. Is this possible in Cloudwatch?
Thanks,
Matt
I think that the alarm going into "Insufficient Data" state has to do with how missing data is being handled. As the doc states:
Similar to how each alarm is always in one of three states, each specific data point reported to CloudWatch falls under one of three categories:
Not breaching (within the threshold)
Breaching (violating the threshold)
Missing
You can specify how alarms handle missing data points. Choose whether to treat missing data points as:
missing (The alarm looks back farther in time to find additional data points)
notBreaching (Treated as a data point that is within the threshold)
breaching (Treated as a data point that is breaching the threshold)
ignore (The current alarm state is maintained)
The default behavior is missing.
So I guess that treating missing data points as breaching would do the trick :)
Instead of pushing a custom metric to CloudWatch, consider:
Push a message onto an SNS topic, on the same periodic basis as you were doing, and set up a CloudWatch alarm on the SNS topic's NumberOfMessagesPublished metric. If the number of heartbeats falls below the expected value for the time period you specify, whether it's because the app crashed or the server crashed, the alarm will go into the ALARM state.
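A heartbeat alarm configured this way could be sketched in Python with boto3's `put_metric_alarm`. The helper names, the default period and evaluation count, and the choice of the `SampleCount` statistic are assumptions for illustration. The part that matters is `TreatMissingData="breaching"`.

```python
def heartbeat_alarm_params(alarm_name, namespace, metric_name, topic_arn,
                           period_seconds=60, evaluation_periods=2):
    """Build put_metric_alarm arguments for a dead man's switch.

    With TreatMissingData set to "breaching", periods with no heartbeat
    count against the threshold, so the alarm goes to ALARM instead of
    INSUFFICIENT_DATA when the app stops reporting.
    """
    return {
        "AlarmName": alarm_name,
        "Namespace": namespace,
        "MetricName": metric_name,
        "Statistic": "SampleCount",
        "Period": period_seconds,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": 1.0,
        "ComparisonOperator": "LessThanThreshold",
        "TreatMissingData": "breaching",
        "AlarmActions": [topic_arn],
    }


def create_heartbeat_alarm(params, client=None):
    """Create or update the alarm in CloudWatch."""
    if client is None:
        import boto3  # imported lazily so the builder above works without AWS access
        client = boto3.client("cloudwatch")
    client.put_metric_alarm(**params)


# Hypothetical usage:
# create_heartbeat_alarm(heartbeat_alarm_params(
#     "app-heartbeat", "MyApp", "Heartbeat",
#     "arn:aws:sns:us-east-1:123456789012:alerts"))
```

Using more than one evaluation period gives a little tolerance for a single late datapoint before the alarm fires.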
Treat missing data as breaching threshold (step 4)
Check this: https://cloudonaut.io/dead-mans-switch-with-cloudwatch/

What is Amazon AWS CloudWatch alarm 1 datapoint (1577523.0) was less than the threshold

Amazon AWS CloudWatch shows the following alarm in the ALARM state.
What caused it to get into this state?
Why is it still in this state when my application is not currently being used?
CloudWatch alarms have three possible states:
ALARM: This means the condition is TRUE. It is typically associated with a condition that should trigger an alert or an auto-scaling action.
OK: This means the condition is FALSE. It typically means "don't worry, everything's fine".
INSUFFICIENT_DATA: This means there is not enough data for the state to be determined. Typically caused by an alarm configured over a period of time (e.g. Average over 5 minutes) where there is insufficient data (e.g. less than 5 minutes of data).
The ALARM condition can look scary when associated with a scale-down alarm because it doesn't mean anything is 'wrong'. Rather, it just means TRUE. Sometimes I wish they'd call it something other than 'ALARM' since people sometimes get worried when this state is perfectly OK.
Your alarm triggers if the amount of outgoing network usage is less than the configured threshold. Given that you say that your application is not currently being used it sounds normal for it to be in this state.
When using alarms to trigger scale up/down behaviour, it's normal that the scale down alarm is active when usage is low. It won't actually do anything in general since it can't make the number of instances less than the minimum you've allowed.