Can you set up CloudWatch to fire an event (which can lead to a Lambda function being called) on every change to a metric? I can see how to fire an event when it meets a boundary via alerts, but I'd like an event on every change.
No.
A CloudWatch Alarm is triggered when a calculated metric goes outside a given bound over a desired time period. It is always a calculated value (eg average, sum, min, max) and is not based upon an individual metric.
That said, if you have very sparse metrics (that don't trigger very often), using COUNT or SUM might be sufficient but it isn't specifically what you are requesting.
If you have a metric that behaves in a predictable way then you can indeed achieve this kind of behaviour.
This works when the metric has a small set of possible values. For example, consider a metric where the value can be 0 or 1.
You could then create a CloudWatch alarm where the threshold is 0 for 1 period and then a second alarm where the threshold is 1 for 1 period.
So basically, for each possible value that your metric can take, you would have an alarm. Each of these alarms would trigger an action of your choice, e.g. an SNS notification.
As I said, this would only work if you have a metric with a known set of possible values, not with a metric that can have unpredictable values.
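A minimal sketch of the two-alarm idea for a 0/1 metric, using boto3's put_metric_alarm. The namespace, metric name, alarm names and topic ARN are placeholders, not values from the question. CloudWatch has no "equals" comparison, so the sketch assumes "<= 0" and ">= 1" are good enough for a metric that only takes those two values.

```python
def binary_metric_alarms(namespace, metric_name, sns_topic_arn, period=60):
    """Build two put_metric_alarm requests, one per value of a 0/1 metric."""
    base = {
        "Namespace": namespace,
        "MetricName": metric_name,
        "Statistic": "Maximum",  # assumption: one datapoint per period
        "Period": period,
        "EvaluationPeriods": 1,
        "AlarmActions": [sns_topic_arn],
    }
    # Alarm that is in ALARM while the metric reads 0.
    is_zero = dict(base, AlarmName=f"{metric_name}-is-0", Threshold=0.0,
                   ComparisonOperator="LessThanOrEqualToThreshold")
    # Alarm that is in ALARM while the metric reads 1.
    is_one = dict(base, AlarmName=f"{metric_name}-is-1", Threshold=1.0,
                  ComparisonOperator="GreaterThanOrEqualToThreshold")
    return [is_zero, is_one]


def create_alarms(requests):
    """Send the requests to CloudWatch (needs boto3 and AWS credentials)."""
    import boto3
    cloudwatch = boto3.client("cloudwatch")
    for request in requests:
        cloudwatch.put_metric_alarm(**request)


# Placeholder names; nothing is sent to AWS until create_alarms() is called.
for request in binary_metric_alarms(
        "ExampleApp", "ExampleFlag",
        "arn:aws:sns:us-east-1:123456789012:example-topic"):
    print(request["AlarmName"], request["ComparisonOperator"], request["Threshold"])
```

Note that alarm actions generally run when the alarm changes state, so publishing the same value twice in a row would not produce a second notification.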
Related
Looking to list all EC2 servers/instances which have crossed a certain threshold using AWS CloudWatch
I want to view all my ec2 instances or servers which have reached or crossed some threshold i.e triggered some alarm, in any time in the last one month. I have been looking for a solution for the past two days but to no avail. I would really appreciate any help regarding the matter.
You can create an alarm in CloudWatch for an EC2 metric (e.g., CPUUtilization) for all instances by selecting a metric like this: EC2 -> Across All Instances -> CPUUtilization. Then you can select a value for Statistic (e.g., Maximum) and specify a period over which the alarm should evaluate the metric (e.g., 1 Minute). Under Conditions, select the Threshold type (e.g., Static), choose the condition operator (e.g., Greater > threshold), and define the threshold value (e.g., 75.0). Under Notification, select the value for "Whenever this alarm state is" (e.g., In alarm) and select an SNS topic. Finally, specify other values like name, description, etc. and create the alarm.
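The same console steps can be sketched with boto3. The alarm name and topic ARN below are placeholders, and leaving out Dimensions is assumed here to correspond to the "Across All Instances" aggregate in the console.

```python
def cpu_alarm_request(sns_topic_arn, threshold=75.0, period=60):
    """Build a put_metric_alarm request mirroring the console steps."""
    return {
        "AlarmName": "ec2-cpu-across-all-instances",  # placeholder name
        "AlarmDescription": "Maximum CPUUtilization above threshold",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        # No "Dimensions" key: assumed to select the all-instances aggregate.
        "Statistic": "Maximum",
        "Period": period,                      # 60 seconds = 1 minute
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],       # notified when state is ALARM
    }


def create_alarm(request):
    """Send the request to CloudWatch (needs boto3 and AWS credentials)."""
    import boto3
    boto3.client("cloudwatch").put_metric_alarm(**request)


request = cpu_alarm_request("arn:aws:sns:us-east-1:123456789012:example-topic")
print(request["MetricName"], request["Statistic"], request["Threshold"])
```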
Need to generate CloudWatch Alarm for API Gateway and Lambda functions.
For API Gateway, a CloudWatch alarm should be generated if the 5XX error count is 10% of the total request count in a given period (e.g. 5 minutes).
Similarly, we need an alarm if the latency of 6% of total requests exceeds 4 seconds.
For this type of metric, it looks like we need to combine metrics, such as the sum of total requests, and then calculate the percentage of errors.
We have Math Expressions/Metric math which might be used. Is there any other way to achieve this?
Any help is appreciated!
EDIT: It's now possible to create alarms on Metric expressions from the CloudWatch Console.
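As a rough sketch of such an alarm through the API rather than the console, the request below uses a metric math expression for the 5XX error percentage. The API name, alarm name and topic ARN are placeholders, and the ApiName dimension assumes a REST API in API Gateway.

```python
def error_rate_alarm_request(api_name, sns_topic_arn, percent=10.0, period=300):
    """Build a put_metric_alarm request that alarms on 5XX errors as a % of requests."""
    dimensions = [{"Name": "ApiName", "Value": api_name}]

    def summed(metric_id, metric_name):
        # Input metric for the expression; not returned as the alarm's value.
        return {
            "Id": metric_id,
            "ReturnData": False,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/ApiGateway",
                    "MetricName": metric_name,
                    "Dimensions": dimensions,
                },
                "Period": period,
                "Stat": "Sum",
            },
        }

    return {
        "AlarmName": f"{api_name}-5xx-rate",  # placeholder naming scheme
        "Metrics": [
            summed("errors", "5XXError"),
            summed("requests", "Count"),
            # The expression is the only entry with ReturnData=True.
            {"Id": "rate", "Expression": "100 * errors / requests",
             "Label": "5XX error %", "ReturnData": True},
        ],
        "EvaluationPeriods": 1,
        "Threshold": percent,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [sns_topic_arn],
    }


request = error_rate_alarm_request(
    "example-api", "arn:aws:sns:us-east-1:123456789012:example-topic")
print([m["Id"] for m in request["Metrics"]])
```

The dictionary can be passed to boto3 as cloudwatch.put_metric_alarm(**request).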
Original Answer below:
Unfortunately it's not possible to create alarms based on Metric expressions in CloudWatch.
Your best bet may be the M of N thresholds in alarms, for example:
If num_of_errors > 5 for 3 datapoints in 5 minutes.
It's not exactly what you're asking for, but may be a good start.
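To make the M of N rule concrete, here is a simplified model of it in plain Python. It ignores missing data and is not CloudWatch's actual evaluation logic; in put_metric_alarm, M and N would correspond to DatapointsToAlarm and EvaluationPeriods.

```python
def m_of_n_breaching(datapoints, threshold, m, n):
    """Return True if at least m of the last n datapoints exceed the threshold."""
    recent = datapoints[-n:]
    breaching = sum(1 for value in recent if value > threshold)
    return breaching >= m


# One error count per minute over five minutes (made-up numbers).
print(m_of_n_breaching([0, 7, 2, 9, 6], threshold=5, m=3, n=5))  # True: 7, 9 and 6 breach
print(m_of_n_breaching([0, 7, 2, 9, 1], threshold=5, m=3, n=5))  # False: only 7 and 9 breach
```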
Best of luck!
I have a CloudWatch alarm that is watching a somewhat sparse metric (manually published at unpredictable intervals).
I didn't think this would be an issue if I used: Treat missing data as "ignore", but it looks like this is not working.
Basically I have a lambda function that is triggered at unpredictable intervals (might not get called for days, or get called 5 times an hour, etc. - it's triggered based on a human-controlled action).
This lambda function records a metric (ex. # of example metric). If the value is > 0, I want the alarm to go into ALARM. If the value is 0, I want the alarm to go into OK.
But for some reason, not sure why, the alarm isn't clearing automatically whenever I record a 0 metric from the lambda function. And when I record the metric w/ value of 1, it should go into alarm, but often doesn't.
Am I misunderstanding how these sparse metrics work?
I have an app that puts a custom Cloudwatch metric to AWS every minute. This is supposed to act as a heartbeat so I know the app is alive.
Now I want to put an alarm on this metric to notify me if the heartbeat stops. I have tried to accomplish this using different cloudwatch alarm statistics including "average" and "data samples" and setting an alarm threshold less than 1 over a given period. However, in all cases, if my app dies and stops reporting the heartbeat, the alarm will only go into an "Insufficient Data" state and never into an "Alarm" state.
I understand I can put a notification on the "Insufficient Data" state, but I want this to show up as an alarm. Is this possible in Cloudwatch?
Thanks,
Matt
I think that the alarm going into "Insufficient Data" state has to do with how missing data is being handled. As the doc states:
Similar to how each alarm is always in one of three states, each specific data point reported to CloudWatch falls under one of three categories:
Not breaching (within the threshold)
Breaching (violating the threshold)
Missing
You can specify how alarms handle missing data points. Choose whether to treat missing data points as:
missing (The alarm looks back farther in time to find additional data points)
notBreaching (Treated as a data point that is within the threshold)
breaching (Treated as a data point that is breaching the threshold)
ignore (The current alarm state is maintained)
The default behavior is missing.
So I guess that specifying missing data points as breaching would do the trick :)
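A minimal sketch of a heartbeat alarm with that setting, using boto3's TreatMissingData parameter. The namespace, metric name and topic ARN are placeholders, and three one-minute periods is only an assumed tolerance.

```python
def heartbeat_alarm_request(namespace, metric_name, sns_topic_arn,
                            period=60, evaluation_periods=3):
    """Build a put_metric_alarm request that goes to ALARM when heartbeats stop."""
    return {
        "AlarmName": f"{metric_name}-missing",  # placeholder naming scheme
        "Namespace": namespace,
        "MetricName": metric_name,
        "Statistic": "SampleCount",             # number of heartbeats per period
        "Period": period,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": 1.0,
        "ComparisonOperator": "LessThanThreshold",
        # Periods with no datapoints count as breaching instead of
        # pushing the alarm into INSUFFICIENT_DATA.
        "TreatMissingData": "breaching",
        "AlarmActions": [sns_topic_arn],
    }


def create_alarm(request):
    """Send the request to CloudWatch (needs boto3 and AWS credentials)."""
    import boto3
    boto3.client("cloudwatch").put_metric_alarm(**request)


request = heartbeat_alarm_request(
    "ExampleApp", "Heartbeat",
    "arn:aws:sns:us-east-1:123456789012:example-topic")
print(request["AlarmName"], request["TreatMissingData"])
```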
Instead of pushing in a custom metric to Cloudwatch, consider:
Push a message onto an SNS topic, on the same periodic basis as you were doing, and set up a CloudWatch alarm on the SNS topic's NumberOfMessagesPublished metric. If the number of heartbeats falls below the expected value for the time period you specify, whether it's because the app crashed or the server crashed, the alarm will go into an Alarm state.
Treat missing data as breaching threshold (step 4)
Check this: https://cloudonaut.io/dead-mans-switch-with-cloudwatch/
Amazon AWS CloudWatch has the following Alarm in an alarmed state
What caused it to get into this state?
Why is it still in this state, as my application is not currently being used?
CloudWatch alarms have three possible states:
ALARM: This means the condition is TRUE. It is typically associated with a condition that should trigger an alert or an auto-scaling action.
OK: This means the condition is FALSE. It typically means "don't worry, everything's fine".
INSUFFICIENT DATA: This means there is not enough data for the state to be determined. Typically caused by an alarm configured for a period of time (eg Average over 5 minutes) where there is insufficient data (eg less than 5 minutes of data).
The ALARM condition can look scary when associated with a scale-down alarm because it doesn't mean anything is 'wrong'. Rather, it just means TRUE. Sometimes I wish they'd call it something other than 'ALARM' since people sometimes get worried when this state is perfectly OK.
Your alarm triggers if the amount of outgoing network usage is less than the configured threshold. Given that you say that your application is not currently being used it sounds normal for it to be in this state.
When using alarms to trigger scale up/down behaviour, it's normal that the scale down alarm is active when usage is low. It won't actually do anything in general since it can't make the number of instances less than the minimum you've allowed.
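As a toy illustration of why the active scale-down alarm has no effect at the minimum, the function below models a scale-in step clamped to the group's minimum size. It is a simplified model with made-up numbers, not the Auto Scaling service's actual algorithm.

```python
def capacity_after_scale_in(current, min_size, remove=1):
    """Return the instance count after a scale-in step, never below min_size."""
    return max(min_size, current - remove)


print(capacity_after_scale_in(current=3, min_size=1))  # 2: one instance removed
print(capacity_after_scale_in(current=1, min_size=1))  # 1: already at the minimum
```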