I want to create an alarm for a particular time window. So, the use case is if we see customer/traffic drop from 6:00 AM to 10 PM then we should get an alarm to know why customers are not using our service and to take some action. is this scenario possible through cloudwatch alarm? we have the number of request metric in place.
Amazon CloudWatch cannot specify time ranges, but since you want to know whether something "unusual" is happening, I would recommend you look at Using CloudWatch Anomaly Detection - Amazon CloudWatch:
When you enable anomaly detection for a metric, CloudWatch applies statistical and machine learning algorithms. These algorithms continuously analyze metrics of systems and applications, determine normal baselines, and surface anomalies with minimal user intervention.
See: New – Amazon CloudWatch Anomaly Detection | AWS News Blog
It should be able to notice if a metric goes outside of its "normal" range, and trigger an alarm.
Related
I'm trying to combine certain number of similar metrics into a single alarm in aws cloud watch. For example lets say for data quality monitoring in sagemaker, one among the metrics that are emitted from data quality monitoring job is feature baseline drift distance for each column so let say I've 600 columns so each column will have this metric. Is there a possible way to compress these metrics into a single cloud watch alarm ?
If not, Is there anyway to send the violation report as message via AWS SNS?
While I am not sure exactly on what out come you want when you refer to "compress the metrics into a single alarm." You can look at using metric math
I have some ECS tasks running in AWS Fargate which in very rare cases may "die" internally, but will still show as "RUNNING" and not fail and trigger the task to restart.
What I would like to do, if possible is check for the absence of logs, e.g. if logs haven't been written in 30 minutes, trigger a lambda to kill the ECS task which will cause it to start back up.
The health check functionality isn't sufficient.
If this isn't possible, are there any other approaches I could consider?
you can have metric and anomaly detection but it may cost for metric to process logs + alarm may cost too. Would rather do lambda run every 30min which would check if logs are there and then would kill ECS as needed. you can run lambda on interval with cloudwatch events bridge.
Logs are probably sent to cloudwatch logs group from your ECS, if you have static name of the logs group, you can use SDK to describe streams inside the group. This api call will tell you timestamp of the last data in stream.
inside lambda nodejs context aws-sdk v2 is already present, so you can require w/o install. here is doc for v2:
https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/CloudWatchLogs.html#describeLogStreams-property
pick to orderBy: "LastEventTime" and to save networking time, set limit from default 50 to 1 limit: 1 and in result you will have lastEventTimestamp
anomaly detection:
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Anomaly_Detection.html
alarms:
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html
check pricing for these, there is free tier, so maybe it won't cost you anything, yet it's easy to build up real $ spend with cloudwatch. https://aws.amazon.com/cloudwatch/pricing/
To run lambda on interval:
I am currently in the process of migrating some services to AWS and have hit a bit of a road block. I would like to be able to monitor the error percentage of a Lambda and create an Alarm if a certain threshold is breached. Currently the percentage error rate can be calculated with Metric Math, however alarms cannot be generated from this.
I was wondering if anyone know a way in that I could push the metrics require to calculate the percentage, Error and Invocation, to a Lambda and have the Lambda perform the calculation and create the SNS alarm?
Thanks!
CloudWatch just released the Alarms on MetricMath expressions.
https://aws.amazon.com/about-aws/whats-new/2018/11/amazon-cloudwatch-launches-ability-to-add-alarms-on-metric-math-expressions/
So basically you just need to:
Go to CloudWatch
Go to Alarms
Create Alarm
Add your metrics
Add a MetricMath expression
Optionally, add other properties for the alarm
Add the actions that you want to be executed
More information in their documentation
We have a service running in aws ecs that we want to scale in and out based on 2 metrics.
Scale out when: cpu > 80% or connection_count > 9500
Scale in when: cpu < 50% and connection_count < 5000
We have access to both the cpu and connection count metrics and alarms in cloud watch. However, we can't figure out how to setup a dynamic scaling policy like this based on both of them.
Using the standard aws console interface for creating the auto scaling rules I don't see any options for multiple. Any links to a tutorial or aws docs on this would be appreciated.
Based on the responses posted in the support aws forums, nothing can be done for AND/OR/IF conditions. (https://forums.aws.amazon.com/thread.jspa?threadID=94984)
It does mention however that they already put a feature request to the cloudwatch team.
The following is mentioned as a workaround:
"In the meantime, a possible workaround can be to create a custom metric using a custom script which would run after every five minutes and get the data points from the CloudWatch metrics, then perform the AND or OR operation and then push the output to a custom metric. You can then create a CloudWatch alarm which would monitor this custom metric and then trigger actions accordingly."
I'm researching the AWS CloudWatch SDK for Java and I see there's a limit of 5,000 alarms per account per region for PutMetricAlarm:
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html
My situation is such that the number of alarms could potentially surpass this limit (i.e. transaction fails for a particular product). I wouldn't need to configure thresholds for a predetermined set of alarms. Rather, the alarm would be fired off ad hoc programmatically at the time failure is detected, with different failure possibilities that could reach well over 5,000.
Does CloudWatch support this scenario, either through PutMetricAlarm or otherwise?
You could:
Increase your account limits (contact the AWS support)
Use SetAlarmState API