AWS: Alarming on Metric Math alternative - amazon-web-services

I am currently in the process of migrating some services to AWS and have hit a bit of a roadblock. I would like to monitor the error percentage of a Lambda and create an alarm if a certain threshold is breached. The percentage error rate can currently be calculated with Metric Math; however, alarms cannot be created from it.
I was wondering if anyone knows a way that I could push the metrics required to calculate the percentage (Errors and Invocations) to a Lambda, and have the Lambda perform the calculation and raise the alarm through SNS?
Thanks!

CloudWatch has since released alarms on Metric Math expressions (announced in November 2018):
https://aws.amazon.com/about-aws/whats-new/2018/11/amazon-cloudwatch-launches-ability-to-add-alarms-on-metric-math-expressions/
So basically you just need to:
Go to CloudWatch
Go to Alarms
Create Alarm
Add your metrics
Add a MetricMath expression
Optionally, add other properties for the alarm
Add the actions that you want to be executed
More information is available in the CloudWatch documentation.
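The same alarm can be created programmatically. Below is a minimal sketch, assuming boto3 and CloudWatch's `PutMetricAlarm` with a `Metrics` list: two raw Lambda metrics feed one Metric Math expression, and only the expression has `ReturnData=True`, so it is the single time series the alarm watches. The function name, SNS topic ARN, 5% threshold and 5-minute period are hypothetical placeholders, as are the helper names `build_error_rate_alarm` and `create_alarm`.

```python
"""Sketch: CloudWatch alarm on a Lambda error-rate Metric Math expression.

Assumptions (hypothetical): function name, SNS topic ARN, 5% threshold and
300-second period are placeholders; adjust them to your setup.
"""


def build_error_rate_alarm(function_name, sns_topic_arn,
                           threshold_percent=5.0, period=300):
    """Return keyword arguments for CloudWatch PutMetricAlarm."""

    def lambda_metric(query_id, metric_name):
        # One raw AWS/Lambda metric, summed per period; an input only.
        return {
            "Id": query_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/Lambda",
                    "MetricName": metric_name,
                    "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
                },
                "Period": period,
                "Stat": "Sum",
            },
            "ReturnData": False,
        }

    return {
        "AlarmName": f"{function_name}-error-rate",
        "AlarmActions": [sns_topic_arn],
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": threshold_percent,
        "EvaluationPeriods": 1,
        # If a period produces no data (e.g. no invocations), don't alarm.
        "TreatMissingData": "notBreaching",
        "Metrics": [
            lambda_metric("errors", "Errors"),
            lambda_metric("invocations", "Invocations"),
            {
                # The single time series the alarm evaluates.
                "Id": "error_rate",
                "Expression": "100 * errors / invocations",
                "Label": "Error rate (%)",
                "ReturnData": True,
            },
        ],
    }


def create_alarm(alarm_kwargs):
    # Imported lazily so the builder above works without boto3 installed.
    import boto3

    boto3.client("cloudwatch").put_metric_alarm(**alarm_kwargs)
```

Usage would look like `create_alarm(build_error_rate_alarm("my-function", "arn:aws:sns:us-east-1:123456789012:alerts"))`. Keeping the request construction separate from the API call makes it easy to inspect the expression before creating anything.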

Related

Is there a way to ignore cloudwatch alarms when they are triggered at night?

I have alarms in AWS CloudWatch, but at night I keep getting false positives due to low volumes. How can I set up an alarm so that it only triggers at certain times of the day? Or how would you suggest approaching this problem?
Using the AWS CLI, you can disable a CloudWatch alarm's actions with the following command:
aws cloudwatch disable-alarm-actions --alarm-names "alarm name"
And then enable it again using this command:
aws cloudwatch enable-alarm-actions --alarm-names "alarm name"
You can schedule this disable/enable with a cron job, for example.
You can automate this by creating an EventBridge rule where you specify a cron or schedule expression that runs a lambda function.
Then, you can use your Lambda function to enable or disable an alarm (or even multiple alarms together) according to your desired schedules.
disable_alarm = client.disable_alarm_actions(AlarmNames=alarm_names)
Here's a good tutorial: https://medium.com/geekculture/terraform-structure-for-enabling-disabling-alarms-in-batches-5c4f165a8db7
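A scheduled Lambda along these lines could look like the minimal sketch below. It assumes (hypothetically) that each EventBridge rule passes a constant JSON input such as `{"action": "disable", "alarm_names": ["my-alarm"]}`, with one rule firing in the evening and another sending `"enable"` in the morning. The event shape and the optional `client` parameter (added so the handler can be exercised without AWS) are assumptions, not part of any AWS contract.

```python
"""Sketch: Lambda invoked on a schedule to mute/unmute CloudWatch alarm actions.

Assumption (hypothetical): the EventBridge rule's constant input looks like
{"action": "disable" | "enable", "alarm_names": [...]}.
"""


def lambda_handler(event, context, client=None):
    if client is None:
        import boto3  # bundled with the Lambda Python runtime

        client = boto3.client("cloudwatch")

    alarm_names = event["alarm_names"]
    action = event["action"]
    if action == "disable":
        # Alarms keep evaluating; only their actions (e.g. SNS) are suppressed.
        client.disable_alarm_actions(AlarmNames=alarm_names)
    elif action == "enable":
        client.enable_alarm_actions(AlarmNames=alarm_names)
    else:
        raise ValueError(f"unknown action: {action!r}")
    return {"action": action, "alarm_names": alarm_names}
```

For the schedule itself, an EventBridge rule expression such as `cron(0 22 * * ? *)` fires daily at 22:00; note that EventBridge rule cron expressions are evaluated in UTC.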
Alternatively, I found that it is possible to create a metric based on a Math expression where I could say for example:
IF(Invocations > threshold, metric, 0)
And this will output 0 at night where the Invocations volume is less than the threshold.
Then I could create an alarm on top of this new metric.
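As a sketch of that idea, assuming boto3's `put_metric_alarm(Metrics=...)` and a Lambda as the monitored resource: the helper below (hypothetical name `build_gated_error_metrics`) returns the Metric Math queries, gating the function's `Errors` sum on its `Invocations`. Using `Errors` as the gated metric and 100 invocations per period as the cutoff are illustrative assumptions.

```python
"""Sketch: Metric Math queries that ignore errors during low-traffic periods.

Assumptions (hypothetical): a Lambda function is monitored, Errors is the
gated metric, and 100 invocations per 5-minute period is the cutoff.
"""


def build_gated_error_metrics(function_name, min_invocations=100, period=300):
    """Return a Metrics list for put_metric_alarm(Metrics=...)."""

    def lambda_metric(query_id, metric_name):
        return {
            "Id": query_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/Lambda",
                    "MetricName": metric_name,
                    "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
                },
                "Period": period,
                "Stat": "Sum",
            },
            "ReturnData": False,  # inputs only; the alarm watches the IF() result
        }

    return [
        lambda_metric("errors", "Errors"),
        lambda_metric("invocations", "Invocations"),
        {
            # Errors when traffic exceeds the cutoff, 0 otherwise (e.g. at night).
            "Id": "gated_errors",
            "Expression": f"IF(invocations > {min_invocations}, errors, 0)",
            "Label": "Errors when traffic is high enough",
            "ReturnData": True,
        },
    ]
```

The alarm then uses this list as `Metrics`, with an ordinary `Threshold` and `ComparisonOperator` applied to `gated_errors`.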

Cloudwatch alarm for a time range

I want to create an alarm for a particular time window. The use case: if we see customers/traffic drop between 6:00 AM and 10:00 PM, we should get an alarm so we can find out why customers are not using our service and take action. Is this scenario possible with a CloudWatch alarm? We already have a number-of-requests metric in place.
Amazon CloudWatch alarms cannot be restricted to a time range, but since you want to know whether something "unusual" is happening, I would recommend you look at Using CloudWatch Anomaly Detection - Amazon CloudWatch:
When you enable anomaly detection for a metric, CloudWatch applies statistical and machine learning algorithms. These algorithms continuously analyze metrics of systems and applications, determine normal baselines, and surface anomalies with minimal user intervention.
See: New – Amazon CloudWatch Anomaly Detection | AWS News Blog
It should be able to notice if a metric goes outside of its "normal" range, and trigger an alarm.
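An anomaly detection alarm can be sketched with boto3's `put_metric_alarm`: instead of a fixed `Threshold`, the `ANOMALY_DETECTION_BAND` Metric Math function produces the expected range and `ThresholdMetricId` points at it, so `LessThanLowerThreshold` fires when traffic drops below the band. The `MyApp` namespace, `RequestCount` metric name, band width of 2 and three 5-minute evaluation periods are hypothetical placeholders for your own request metric.

```python
"""Sketch: alarm when a request-count metric drops below its expected band.

Assumptions (hypothetical): the custom metric lives in namespace "MyApp" as
"RequestCount"; band width 2 and 3 x 5-minute periods are example values.
"""


def build_traffic_drop_alarm(namespace, metric_name, sns_topic_arn,
                             band_width=2, period=300, evaluation_periods=3):
    """Return keyword arguments for CloudWatch PutMetricAlarm."""
    return {
        "AlarmName": f"{metric_name}-below-expected",
        "AlarmActions": [sns_topic_arn],
        # Compare the metric against the lower edge of the band.
        "ComparisonOperator": "LessThanLowerThreshold",
        "ThresholdMetricId": "band",
        "EvaluationPeriods": evaluation_periods,
        # No requests at all is also a traffic drop.
        "TreatMissingData": "breaching",
        "Metrics": [
            {
                "Id": "requests",
                "MetricStat": {
                    "Metric": {"Namespace": namespace, "MetricName": metric_name},
                    "Period": period,
                    "Stat": "Sum",
                },
                "ReturnData": True,
            },
            {
                "Id": "band",
                "Expression": f"ANOMALY_DETECTION_BAND(requests, {band_width})",
                "Label": "Expected range",
                "ReturnData": True,
            },
        ],
    }
```

A wider band (larger `band_width`) means fewer, more significant alarms; a narrower one reacts to smaller dips.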

Creating a CloudWatch alarm based on a search expression

I'm attempting to do the following:
I have a DynamoDB global table which publishes the ReplicationLatency metric. I want to create an alarm on the aggregate of the ReplicationLatency metric published for each region.
The DDB table replicas exist in us-east-1, us-west-2 and us-west-1. In defining the CW alarm for each receiving region, I was under the assumption that I could use a search expression. For example, here's the expression I see in the CloudWatch console:
SEARCH('{AWS/DynamoDB,ReceivingRegion,TableName} MetricName="ReplicationLatency"', 'Average', 300)
I'd like to create a metric math alarm which is the avg of the metrics of the above search result. I was attempting to create a metric math expression of the format:
AVG(METRICS())
I then get the following error - The expression for an alarm must include at least one metric. Has anybody attempted to create an alarm from a search expression before? If yes, could you shed some light on how it can be done?
The only other way I can think of solving this problem is to enumerate/add the ReplicationLatency metric for each receiving region and then create a metric math expression out of those. That seems to completely defeat the purpose of having a search expression and creating an alarm from all those metrics.
You cannot do alarms on search expressions at the moment.
You have to manually add all the metrics you want to alarm on and then use the math function you specified above.
Edit: here is the link to the official documentation:
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Create-alarm-on-metric-math-expression.html
"You can't create an alarm based on the SEARCH expression. This is because search expressions return multiple time series, and an alarm based on a math expression can watch only one time series."
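The enumeration workaround can be generated rather than typed by hand. Below is a minimal sketch, assuming boto3's `put_metric_alarm` and that the `ReplicationLatency` metrics (dimensions `TableName` and `ReceivingRegion`) are published in the region where you create the alarm: one query per receiving region, averaged by an `AVG([...])` expression that returns a single time series. The table name, region list, threshold and helper name are hypothetical.

```python
"""Sketch: enumerate ReplicationLatency per receiving region, alarm on the AVG.

Assumptions (hypothetical): table name, receiving regions and threshold are
examples; run the request in the region that publishes these metrics.
"""


def build_replication_latency_alarm(table_name, receiving_regions,
                                    threshold_ms, sns_topic_arn, period=300):
    """Return keyword arguments for CloudWatch PutMetricAlarm."""
    metrics = []
    ids = []
    for i, region in enumerate(receiving_regions):
        query_id = f"lat{i}"  # Metric Math ids must start with a lowercase letter
        ids.append(query_id)
        metrics.append({
            "Id": query_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/DynamoDB",
                    "MetricName": "ReplicationLatency",
                    "Dimensions": [
                        {"Name": "TableName", "Value": table_name},
                        {"Name": "ReceivingRegion", "Value": region},
                    ],
                },
                "Period": period,
                "Stat": "Average",
            },
            "ReturnData": False,
        })

    # AVG over an array of time series yields one time series, as alarms require.
    metrics.append({
        "Id": "avg_latency",
        "Expression": f"AVG([{', '.join(ids)}])",
        "Label": "Average ReplicationLatency (ms)",
        "ReturnData": True,
    })

    return {
        "AlarmName": f"{table_name}-replication-latency",
        "AlarmActions": [sns_topic_arn],
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": threshold_ms,
        "EvaluationPeriods": 1,
        "Metrics": metrics,
    }
```

For the table in the question, the call from us-east-1 might be `build_replication_latency_alarm("my-table", ["us-west-2", "us-west-1"], 1000, topic_arn)`.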

Continuous alerts in Cloudwatch

I have an instance in AWS whose CPU crosses the 90% threshold from time to time.
I have created an alarm for this, but I received only one notification, during the first 5 minutes, even though the CPU stayed at 100% for 2 hours.
How do I configure the alarm so that I keep getting notifications for as long as the CPU stays high?
CloudWatch does not send notifications continuously while the threshold is breached; it sends a notification only when the alarm state changes.
Alarms invoke actions for sustained state changes only. CloudWatch alarms do not invoke actions simply because they are in a particular state, the state must have changed and been maintained for a specified number of periods.
Ref: AWS Cloudwatch Documentation
One possible solution I can think of is to create multiple CloudWatch alarms with different thresholds.
As the answer above says, the alarm is not triggered again. One thing you can do is change the alarm threshold to a very large value and then back to the original value; the state change will then occur again.
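A related way to force that state change, swapped in here rather than editing the threshold, is CloudWatch's `SetAlarmState` API: set an alarm that is currently in `ALARM` back to `OK`, and if the breach persists, the next evaluation moves it to `ALARM` again and its actions fire. The sketch below assumes boto3 and would be run on a schedule (for example, a Lambda every 30 minutes); the helper name and the optional `client` parameter are hypothetical. If the alarm has `OKActions` configured, those fire on the reset too.

```python
"""Sketch: re-arm alarms that are stuck in ALARM so their actions fire again.

Technique: SetAlarmState to OK; the next evaluation flips back to ALARM if the
breach persists. Helper name and scheduling are hypothetical.
"""


def reset_alarms_in_alarm_state(alarm_names, client=None):
    if client is None:
        import boto3

        client = boto3.client("cloudwatch")

    # Only touch alarms that are currently in the ALARM state.
    response = client.describe_alarms(AlarmNames=alarm_names, StateValue="ALARM")
    reset = []
    for alarm in response["MetricAlarms"]:
        client.set_alarm_state(
            AlarmName=alarm["AlarmName"],
            StateValue="OK",
            StateReason="Reset so the alarm re-fires while the breach persists",
        )
        reset.append(alarm["AlarmName"])
    return reset
```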

Use cloudwatch to determine if linux service is running

Suppose I have an ec2 instance with service /etc/init/my_service.conf with contents
script
exec my_exec
end script
How can I monitor that ec2 instance such that if my_service stopped running I can act on it?
You can publish a custom metric to CloudWatch in the form of a "heart beat".
Have a small script running via cron on your server that checks the process list to see whether my_service is running and, if it is, makes a put-metric-data call to CloudWatch.
The metric could be as simple as pushing the number "1" to your custom metric in CloudWatch.
Set up a CloudWatch alarm that triggers if the average for the metric falls below 1
Make the period of the alarm >= the period at which the cron job runs. For example, if cron runs every 5 minutes, configure the alarm to fire if it sees the average below 1 for two 5-minute periods.
Make sure you also handle the situation in which the metric is not published (e.g. cron fails to run or the whole machine dies). You would want to set up an alert in case the metric is missing (see here: AWS Cloudwatch Heartbeat Alarm).
Be aware that the custom metric adds an additional cost of 50c per month to your AWS bill. That is not a big deal for one metric, but the equation changes drastically if you want to push hundreds or thousands of metrics, so it is good to know it is not free, as one might expect.
See here for how to publish a custom metric: http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/publishingMetrics.html
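The cron-side heartbeat can be sketched in Python with boto3's `put_metric_data`. This assumes a Linux host with `pgrep` available, and that a process named `my_exec` is what indicates my_service is up; the `Custom/Heartbeat` namespace, `ServiceRunning` metric, `Service` dimension and helper names are hypothetical. The `client` and `is_running` parameters exist only so the logic can be exercised without AWS.

```python
"""Sketch: cron-driven heartbeat that publishes 1 while a process is running.

Assumptions (hypothetical): Linux host with pgrep; namespace, metric name and
dimension are example values. Nothing is published when the process is down,
so the alarm should treat missing data as breaching.
"""
import subprocess


def service_is_running(process_name):
    # pgrep exits with status 0 when at least one process matches the exact name.
    result = subprocess.run(["pgrep", "-x", process_name], stdout=subprocess.DEVNULL)
    return result.returncode == 0


def publish_heartbeat(process_name, client=None, is_running=service_is_running):
    if not is_running(process_name):
        return False  # publish nothing; missing data is what should alarm
    if client is None:
        import boto3

        client = boto3.client("cloudwatch")
    client.put_metric_data(
        Namespace="Custom/Heartbeat",
        MetricData=[{
            "MetricName": "ServiceRunning",
            "Dimensions": [{"Name": "Service", "Value": process_name}],
            "Value": 1,
            "Unit": "Count",
        }],
    )
    return True
```

A script that calls `publish_heartbeat("my_exec")` could then be scheduled from cron every 5 minutes, matching the alarm period described above.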
I am not sure CloudWatch is the right route for checking whether the service is running; it would be easier with a Nagios-style solution.
Nevertheless, you can try the CloudWatch custom metrics approach. You add a few lines of code that publish, say, the integer 1 to a CloudWatch custom metric every 5 minutes. You can then configure CloudWatch alarms to send an SNS/email notification when a statistic such as SampleCount or Sum deviates from your anticipated value.
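script
  # Publish before exec: anything after "exec" never runs, because exec
  # replaces this shell with my_exec. "|| true" keeps a failed publish from
  # aborting the (sh -e) script. This fires once per start; a periodic
  # heartbeat still needs cron. Namespace/metric names are examples.
  aws cloudwatch put-metric-data --namespace "Custom/MyService" --metric-name "ServiceStarted" --value 1 || true
  exec my_exec
end script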
More Info
Publish Custom Metrics - http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/publishingMetrics.html
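The alarm side of this approach could be sketched with boto3's `put_metric_alarm` on a plain (non-math) metric: `SampleCount` below 1 means no heartbeat arrived in the period, and `TreatMissingData="breaching"` covers the case where nothing is published at all (dead cron or dead machine). It assumes a heartbeat metric named `ServiceRunning` in a hypothetical `Custom/Heartbeat` namespace with a `Service` dimension, published every 5 minutes; the helper name and defaults are also hypothetical.

```python
"""Sketch: alarm when the heartbeat metric stops arriving.

Assumptions (hypothetical): metric "ServiceRunning" in namespace
"Custom/Heartbeat" with a "Service" dimension, published every 5 minutes.
"""


def build_heartbeat_alarm(process_name, sns_topic_arn, period=300,
                          evaluation_periods=2):
    """Return keyword arguments for CloudWatch PutMetricAlarm."""
    return {
        "AlarmName": f"{process_name}-heartbeat-missing",
        "AlarmActions": [sns_topic_arn],
        "Namespace": "Custom/Heartbeat",
        "MetricName": "ServiceRunning",
        "Dimensions": [{"Name": "Service", "Value": process_name}],
        # Number of heartbeats received in each period.
        "Statistic": "SampleCount",
        "Period": period,  # should be >= the cron interval
        "EvaluationPeriods": evaluation_periods,
        "Threshold": 1,
        "ComparisonOperator": "LessThanThreshold",
        # No data at all (cron or machine down) also counts as a failure.
        "TreatMissingData": "breaching",
    }
```

With the defaults, the alarm fires after two consecutive 5-minute periods without a heartbeat.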