I've created a dashboard for a service deployed to Fargate using the following:
Namespace: AWS/ECS
Metric name: CPUUtilization
Both Average and Max can be viewed but neither of these is that useful. Max is really high and Average is really low.
When I choose p99 from the drop-down selector, no data is returned and nothing is plotted on the chart. Is it simply that p99 isn't supported by the CPUUtilization metric on Fargate?
Is there a way to get this statistic onto the dashboard manually, and to use it as the threshold for an alarm?
I have started an EC2 instance (with standard monitoring).
From my understanding, the EC2 service will publish 1 datapoint every 5 minutes for CPUUtilization to CloudWatch.
Hence my question: why do the graphs differ between statistics (Min, Max, Avg, ...) when visualized with a 5-minute period?
Since there is only 1 datapoint per 5 minutes, the Min, Max or Average of a single datapoint should be the same, right?
Example:
Just by changing the "average" statistic to the "max", the graph changes (I don't understand why).
Thanks
Just to add on to @jccampanero's answer, I'd like to explain it in a bit more detail.
"From my understanding, the EC2 service will publish 1 datapoint every 5 minutes for the CPUUtilization to CloudWatch."
Yes, your understanding is correct, but there are two types of datapoints. One type is called "raw data", and the other is called a "statistic set". Both are published to CloudWatch through the same PutMetricData API, but with different options.
"Since there is only 1 datapoint per 5 minutes, the Min, Max or Average of a single datapoint should be the same right?"
Not quite. This is only true when all datapoints are of type "raw data", where each datapoint is just a single number. With statistic sets, the Min, Max and Average of a single datapoint can differ, which is exactly what happens here.
If you choose the SampleCount statistic, you can see that one datapoint here is an aggregation of 5 samples. For a concrete example, let's take the one in @jccampanero's answer:
"In this period of time on average the CPU utilization was 40%, with a maximum of 90%, and a minimum of 20%."
Translated to code (e.g. AWS CLI), it's something like:
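# Illustrative only: this is what EC2 itself would send; namespaces beginning with AWS/ are reserved for AWS services.
aws cloudwatch put-metric-data \
    --metric-name CPUUtilization \
    --namespace AWS/EC2 \
    --unit Percent \
    --statistic-values Sum=200,Minimum=20,Maximum=90,SampleCount=5 \
    --dimensions InstanceId=i-123456789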
If EC2 were using the AWS CLI to push its metrics to CloudWatch, this is what the call would look like. It's quite common to aggregate data this way to save some money on the CloudWatch bill.
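To make the arithmetic behind a statistic set explicit, here is a minimal Python sketch using the same numbers as the command above. The StatisticSet class is a hypothetical illustration, not part of any AWS SDK; it only shows that the average is derived as Sum / SampleCount, so it can differ from the stored minimum and maximum.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatisticSet:
    """One pre-aggregated datapoint, mirroring the --statistic-values fields."""

    sum: float
    minimum: float
    maximum: float
    sample_count: int

    @property
    def average(self) -> float:
        # The average is not stored; it is derived as Sum / SampleCount.
        return self.sum / self.sample_count


# Same values as in the put-metric-data example: 5 samples summing to 200.
datapoint = StatisticSet(sum=200, minimum=20, maximum=90, sample_count=5)

if __name__ == "__main__":
    print(f"Average={datapoint.average} Min={datapoint.minimum} Max={datapoint.maximum}")
```

So a single datapoint carries three different answers (40, 20 and 90), and the graph changes depending on which statistic you ask for.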
Honestly, I have never thought about it carefully, but from my understanding the following is going on.
Amazon EC2 sends metric data to CloudWatch at the configured interval: every five minutes in this case, or every minute if you enable detailed monitoring for the instance.
This metric data does not consist only of the average; it also includes the maximum and minimum CPU utilization percentage observed during that period. In other words, it tells CloudWatch: in this period of time on average the CPU utilization was 40%, with a maximum of 90%, and a minimum of 20%. I hope you get the idea.
That explains why your graphs look different depending on the statistic chosen.
Please consider reading the entry in the AWS documentation that explains how the CloudWatch statistics definitions work.
Is there a quick way to check how much data (by volume: GBs, TBs, etc.) a specific DMS task of mine has transferred, for example within the last month?
I can't find anything on this in the documentation. I could probably try with boto3, but I want to double-check first. Thanks for your help!
Even with Boto3, the relevant API is DescribeReplicationTasks, and it most likely does not return any information about the volume of data transferred.
Reference: https://docs.aws.amazon.com/dms/latest/APIReference/API_DescribeReplicationTasks.html
If you have only 1 replication task, associated with only 1 replication instance, you can check that replication instance's network metrics in CloudWatch. Under the AWS/DMS namespace in CloudWatch metrics there are several network metrics, such as NetworkTransmitThroughput and NetworkReceiveThroughput. You can choose one and try the following:
Statistic: Sum
Period: 30 days (or up to you)
That gives you the throughput figure for the 30 days (30DAYS_THROUGHPUT).
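One caveat when reading that number: assuming these throughput metrics are reported in bytes per second, a plain sum of rate samples is not itself a byte count. A rough way to estimate volume is to take the Average statistic over the period and multiply it by the number of seconds in that period. The sketch below shows the conversion; the function names and the 1 MiB/s input are hypothetical, and the result is only an estimate of all traffic on the instance, not of one task.

```python
def estimate_transferred_bytes(avg_bytes_per_second: float, period_seconds: int) -> float:
    """Estimate total volume from an average rate over a period."""
    return avg_bytes_per_second * period_seconds


def human_readable(num_bytes: float) -> str:
    """Format a byte count using binary units (KiB, MiB, GiB, TiB)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit where the value fits, or at the largest unit.
        if value < 1024 or unit == units[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024


THIRTY_DAYS_SECONDS = 30 * 24 * 3600

if __name__ == "__main__":
    # Hypothetical input: an average of 1 MiB/s read from the CloudWatch graph.
    total = estimate_transferred_bytes(1024 * 1024, THIRTY_DAYS_SECONDS)
    print(human_readable(total))
```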
I have one AWS load balancer in front of one EC2 instance. According to the AWS documentation, and what I would expect it to mean, the CloudWatch metric RequestCount on the ELB should show the total number of requests. However, I get a graph mapped to a scale of 0-1, with 1 being the peak.
Is this correct? This is not useful for me. Is there a way to see the actual number of requests?
Okay, answering my own question for future searchers:
You need to go to the Graphed metrics tab and change the Statistic option to Sum (thanks @Dejan Peretin). I previously had it set to Average.
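Why the statistic matters here can be sketched in a few lines of Python, assuming (as the 0-1 scale suggests) that the load balancer records each request as a sample with the value 1. The sample list is hypothetical.

```python
from statistics import mean

# Hypothetical period in which the load balancer saw 7 requests,
# each recorded as a sample with the value 1.
samples = [1] * 7

average = mean(samples)  # stays at 1 no matter how many requests arrive
total = sum(samples)     # the actual number of requests in the period

if __name__ == "__main__":
    print(f"Average={average} Sum={total}")
```

Under that assumption, Average can never rise above 1, while Sum gives the request count per period.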
I have a Master-Slave configuration on AWS RDS MySQL.
I want to set an alert when the replication lag goes above a certain threshold (e.g. 10 seconds).
How can it be done?
If it is not possible, is there another way to achieve a similar result (without using 3rd party tools / custom scripting)?
You can track replica lag using the ReplicaLag metric on your slave instance. Note that this metric is measured in seconds and is reported automatically by RDS every minute.
You can create a CloudWatch alarm to monitor the ReplicaLag metric. Set the alarm to breach when the sum of ReplicaLag over an evaluation period of 1 minute is greater than your threshold: 0 to catch any lag at all, or 10 for the 10-second example in your question.
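If you prefer to define the alarm outside the console, the settings above map onto the parameters of the CloudWatch PutMetricAlarm API. The sketch below only builds the parameter dictionary; the helper name, the alarm name and the instance identifier "my-replica" are hypothetical. It uses Maximum rather than Sum, which should give the same value when there is one datapoint per minute, and it attaches no notification action (that would need AlarmActions with an SNS topic ARN).

```python
def replica_lag_alarm_params(db_instance_id: str, threshold_seconds: float) -> dict:
    """Build keyword arguments for CloudWatch put_metric_alarm (hypothetical helper)."""
    return {
        "AlarmName": f"{db_instance_id}-replica-lag",
        "Namespace": "AWS/RDS",
        "MetricName": "ReplicaLag",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": db_instance_id}],
        "Statistic": "Maximum",   # swapped in for Sum; equal with one datapoint per period
        "Period": 60,             # seconds; matches the one-minute reporting interval
        "EvaluationPeriods": 1,
        "Threshold": threshold_seconds,
        "ComparisonOperator": "GreaterThanThreshold",
    }


# Hypothetical replica identifier and the 10-second threshold from the question.
params = replica_lag_alarm_params("my-replica", 10)

# To actually create the alarm (requires boto3 and AWS credentials):
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```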