I have started an EC2 instance (with standard monitoring).
From my understanding, the EC2 service publishes 1 datapoint for CPUUtilization to CloudWatch every 5 minutes.
Hence my question is: why do the graphs for a 5-minute visualization differ between statistics (Min, Max, Avg, ...)?
Since there is only 1 datapoint per 5 minutes, the Min, Max or Average of a single datapoint should be the same, right?
Example:
Just by changing the "average" statistic to the "max", the graph changes (I don't understand why).
Thanks
Just to add on to #jccampanero's answer, I'd like to explain it in a bit more detail.
From my understanding, the EC2 service will publish 1 datapoint every 5 minutes for the CPUUtilization to CloudWatch.
Yes, your understanding is correct, but there are two types of datapoint. One type is called "raw data", and the other type is called "statistic set". Both types use the same PutMetricData API to publish metrics to CloudWatch, but they use different options.
Since there is only 1 datapoint per 5 minutes, the Min, Max or Average of a single datapoint should be the same right?
Not quite. This is only true when all datapoints are of the "raw data" type; a raw datapoint is basically just a single number. If you have statistic sets, then the Min, Max and Average of a single datapoint can be different, which is exactly what happens here.
If you choose the SampleCount statistic, you can see that one datapoint here is an aggregation of 5 samples. To give you a concrete example, let's take the one in #jccampanero's answer:
In this period of time, on average the CPU utilization was 40%, with a maximum of 90% and a minimum of 20%. I hope you get the idea.
Translated to code (e.g. AWS CLI), it's something like
aws cloudwatch put-metric-data \
--metric-name CPUUtilization \
--namespace AWS/EC2 \
--unit Percent \
--statistic-values Sum=200,Minimum=20,Maximum=90,SampleCount=5 \
--dimensions InstanceId=i-123456789
If EC2 were using AWS CLI to push the metrics to CloudWatch, this would be it. I think you get the idea now, and it's quite common to aggregate the data to save some money on the CloudWatch bill.
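For comparison, if the metrics were published as "raw data" instead, each call would carry a single value. A minimal sketch of that form (the instance ID is just a placeholder, and note that you cannot actually publish into the reserved AWS/EC2 namespace yourself; this only illustrates the shape of the call):
aws cloudwatch put-metric-data \
--metric-name CPUUtilization \
--namespace AWS/EC2 \
--unit Percent \
--value 40 \
--dimensions InstanceId=i-123456789
With this form, each datapoint really is just one number, so its Min, Max and Average are all the same, which matches the intuition in your question.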
Honestly, I have never thought about it carefully, but from my understanding the following is going on.
Amazon EC2 sends metric data to CloudWatch for the configured period of time, five minutes in this case, unless you enable detailed monitoring for the instance (which reduces the period to one minute).
This metric data will not consist only of the average, but will also include the maximum and minimum CPU utilization percentage observed during that period of time. I mean, it will tell CloudWatch: in this period of time, on average the CPU utilization was 40%, with a maximum of 90% and a minimum of 20%. I hope you get the idea.
That explains why your graphs look different depending on the statistic chosen.
Please consider reading this entry in the AWS documentation, which explains how the CloudWatch statistics definitions work.
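If you want to verify this yourself, you can retrieve several statistics for the same 5-minute periods and compare them. A rough sketch with the AWS CLI (the instance ID and the time range are placeholders):
aws cloudwatch get-metric-statistics \
--namespace AWS/EC2 \
--metric-name CPUUtilization \
--dimensions Name=InstanceId,Value=i-123456789 \
--statistics Average Minimum Maximum SampleCount \
--period 300 \
--start-time 2023-01-01T00:00:00Z \
--end-time 2023-01-01T01:00:00Z
Each returned datapoint covers one 5-minute period, and you will typically see different Minimum, Maximum and Average values for the same timestamp, plus a SampleCount greater than 1.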
So I'm trying to set up composite alarms on AWS. So far, I have most of it set up. At the moment, I have a composite alarm made up of 3 alarms. If any 2 of these 3 alarms trigger, then the composite alarm also triggers. This part works fine.
However, I am having trouble with part of my use case. I'd also like to make it so that if one of these alarms within the composite alarm stays in alarm for over a certain period of time, then an alert is also sent out.
Here's an example of the situation:
2 out of the 3 alarms turn on in any time period: Alert should be sent
1 out of the 3 alarms turns on for under a certain time period: Alert should not be sent
1 out of the 3 alarms turns on for over a certain time period: Alert should be sent
I've tried looking into the settings available on the alarms themselves, and there doesn't seem to be an option for what I'm trying to do.
I'm wondering if this would require a lambda function? Is it possible for a lambda function to keep track of how long an alarm has been in alarm?
As discussed in the comment section above, I am providing you with a possible solution to your problem. The only blocker is that you can't have different time frames for the alarms; they should all be the same.
So you will have, for example: Alarm 1 (CPU) if it's over 60% for 15 minutes, and Alarm 2 (EFS connections) if there are more than 10 connections for 15 minutes.
Now the composite alarm will go off when both statements are true, and also when only Alarm 1 goes off.
This is how you are going to make this alarm.
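If you prefer the AWS CLI over the console, here is a rough sketch of creating the composite alarm (the alarm names and the SNS topic ARN are placeholders; the rule expression supports AND, OR and NOT, so you can adapt it to your exact logic):
aws cloudwatch put-composite-alarm \
--alarm-name "my-composite-alarm" \
--alarm-rule 'ALARM("alarm-1") AND ALARM("alarm-2")' \
--alarm-actions arn:aws:sns:us-east-1:123456789012:my-topic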
As for testing, it depends on what type of alarms you are making. For example, CPU and RAM increment methods are widely available on Stack Overflow.
You can also change the state of an alarm with the AWS CLI. The forced state usually only lasts for a very small amount of time, maybe 10 seconds.
aws cloudwatch set-alarm-state --alarm-name "myalarm" --state-value ALARM --state-reason "testing purposes"
You need to find the method that suits your needs best.
I am trying to create an alarm in CloudWatch. I have a metric where I emit 1.0 for success and 0.0 for failure. The SUM statistic is supposed to give me all the successful requests, while SAMPLE COUNT should give all the requests (including failed ones). I want to create an alarm that triggers if SUM(metric)/SAMPLE COUNT(metric) <= threshold. I am not able to do this using a single metric; the SAMPLE COUNT option does not show up. Am I expected to create two metrics to achieve this in CloudWatch if it is not possible via a single metric?
SUM(metric)/SAMPLE COUNT(metric) is the definition of average. Can you use the Average statistic?
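For example, a sketch of an alarm on the Average statistic with the AWS CLI (the namespace, metric name, threshold and SNS topic ARN are placeholders):
aws cloudwatch put-metric-alarm \
--alarm-name "success-rate-too-low" \
--namespace "MyApp" \
--metric-name "RequestSuccess" \
--statistic Average \
--period 300 \
--evaluation-periods 1 \
--threshold 0.95 \
--comparison-operator LessThanOrEqualToThreshold \
--alarm-actions arn:aws:sns:us-east-1:123456789012:my-topic
Since each datapoint is 1.0 or 0.0, the Average over a period is exactly Sum divided by SampleCount, i.e. the success rate.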
I have one AWS load balancer going to one EC2 instance. According to the AWS documentation, and what I would expect it to mean, the CloudWatch metric for RequestCount on the ELB should show total number of requests. However, I get a graph mapped to a scale of 0-1, with 1 being the peak.
Is this correct? This is not useful for me. Is there a way to see the actual number of requests?
Okay, answering my own question for future searchers:
You need to go to the Graph metrics tab and change the Statistic option to Sum (thanks #Dejan Peretin). I previously had it set to Average.
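Equivalently, if you want the totals outside the console, you can query the metric with the Sum statistic. A minimal sketch, assuming a Classic Load Balancer (for an Application Load Balancer the namespace is AWS/ApplicationELB and the dimension differs; the load balancer name and time range below are placeholders):
aws cloudwatch get-metric-statistics \
--namespace AWS/ELB \
--metric-name RequestCount \
--dimensions Name=LoadBalancerName,Value=my-load-balancer \
--statistics Sum \
--period 300 \
--start-time 2023-01-01T00:00:00Z \
--end-time 2023-01-01T01:00:00Z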
I am currently in the process of migrating some services to AWS and have hit a bit of a roadblock. I would like to be able to monitor the error percentage of a Lambda and create an alarm if a certain threshold is breached. Currently the percentage error rate can be calculated with Metric Math; however, alarms cannot be generated from it.
I was wondering if anyone knows a way I could push the metrics required to calculate the percentage, Errors and Invocations, to a Lambda and have the Lambda perform the calculation and send the SNS alarm?
Thanks!
CloudWatch has just released the ability to create alarms on metric math expressions.
https://aws.amazon.com/about-aws/whats-new/2018/11/amazon-cloudwatch-launches-ability-to-add-alarms-on-metric-math-expressions/
So basically you just need to:
Go to CloudWatch
Go to Alarms
Create Alarm
Add your metrics
Add a MetricMath expression
Optionally, add other properties for the alarm
Add the actions that you want to be executed
More information in their documentation
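For reference, a rough sketch of such an alarm with the AWS CLI, computing the Lambda error rate as a metric math expression (the function name, threshold and SNS topic ARN are placeholders):
aws cloudwatch put-metric-alarm \
--alarm-name "lambda-error-rate" \
--comparison-operator GreaterThanThreshold \
--threshold 5 \
--evaluation-periods 1 \
--alarm-actions arn:aws:sns:us-east-1:123456789012:my-topic \
--metrics '[
  {"Id": "errors", "MetricStat": {"Metric": {"Namespace": "AWS/Lambda", "MetricName": "Errors", "Dimensions": [{"Name": "FunctionName", "Value": "my-function"}]}, "Period": 300, "Stat": "Sum"}, "ReturnData": false},
  {"Id": "invocations", "MetricStat": {"Metric": {"Namespace": "AWS/Lambda", "MetricName": "Invocations", "Dimensions": [{"Name": "FunctionName", "Value": "my-function"}]}, "Period": 300, "Stat": "Sum"}, "ReturnData": false},
  {"Id": "errorRate", "Expression": "100 * errors / invocations", "Label": "Error rate (%)", "ReturnData": true}
]'
The alarm evaluates only the expression marked with ReturnData true, so it triggers when the computed error percentage exceeds the threshold.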