I'm trying to push data into a custom metric on AWS CloudWatch, but I wanted to find out more about Dimensions and how they are used. I've already read the AWS documentation, but it doesn't really explain what they are used for or how they affect the graphing UI in the AWS Management Console.
Are Dimensions a way to break down the metric value further?
To give a fictitious example, say I have a metric which counts the number of people in a room. The metric is named "Population". I report the count once a minute. The Metric Count is set to the number of people. The Dimension field is just a list of Name and Value pairs. Assuming I report a data point with a value of 90, can I add two Dimensions as follows:
1. Name: Male, Count: 50
2. Name: Female, Count: 40
Any help will be greatly appreciated.
Yes, you can add dimensions such as you described to your custom metrics.
However, CloudWatch is NOT able to aggregate across these dimensions, as it doesn't know how these dimensions are grouped. As the documentation puts it:
Amazon CloudWatch treats each unique combination of dimensions as a separate metric. For example, each call to mon-put-data in the following figure creates a separate metric because each call uses a different set of dimensions. This is true even though all four calls use the same metric name (ServerStats).
See the CloudWatch documentation on dimensions for more information.
Note that you can retrieve aggregated values from the API, as well as plot them in a CloudWatch graph, by using a math expression. See "Using metric math" in the CloudWatch documentation.
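To make the "unique combination" rule concrete for the Population example, here is a toy in-memory model of how data points end up keyed. It is a sketch under assumptions, not how CloudWatch is implemented: the namespace "Room" and the dimension name "Gender" are hypothetical. Since a dimension is a Name/Value pair of strings, the head count goes in the data point's value and the label goes in the dimension.

```python
from collections import defaultdict

# Toy stand-in for CloudWatch's metric store (illustration only).
# A metric's identity is (namespace, metric name, set of dimensions).
store = defaultdict(list)


def put(namespace, name, dimensions, value):
    """Record a data point under the metric identified by its dimensions."""
    key = (namespace, name, frozenset(dimensions.items()))
    store[key].append(value)


# "Room" and "Gender" are hypothetical names for the Population example.
put("Room", "Population", {}, 90)
put("Room", "Population", {"Gender": "Male"}, 50)
put("Room", "Population", {"Gender": "Female"}, 40)

# Three calls with three different dimension sets -> three separate metrics.
for (ns, name, dims), values in store.items():
    print(ns, name, dict(dims), values)

# Nothing adds Male and Female together automatically; you combine them yourself.
male = store[("Room", "Population", frozenset({"Gender": "Male"}.items()))]
female = store[("Room", "Population", frozenset({"Gender": "Female"}.items()))]
print(sum(male) + sum(female))
```

Each distinct dimension set becomes its own series, which is why the Male and Female counts have to be combined explicitly.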
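A minimal sketch of the metric math approach through the API, assuming boto3 and the hypothetical Room/Population/Gender names from the example: the two dimensioned metrics are fetched with ReturnData set to False, and only the expression that adds them is returned. The block only builds the request, so it runs without AWS credentials.

```python
from datetime import datetime, timedelta, timezone


def population_total_queries(namespace="Room", period=60):
    """Build GetMetricData queries that add two dimensioned metrics with metric math.

    Namespace, metric name and dimension name are hypothetical placeholders.
    """
    def stat(query_id, gender):
        return {
            "Id": query_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": namespace,
                    "MetricName": "Population",
                    "Dimensions": [{"Name": "Gender", "Value": gender}],
                },
                "Period": period,
                "Stat": "Sum",
            },
            "ReturnData": False,  # inputs only; not returned themselves
        }

    return [
        stat("male", "Male"),
        stat("female", "Female"),
        {"Id": "total", "Expression": "male + female", "Label": "Total", "ReturnData": True},
    ]


queries = population_total_queries()
end = datetime.now(timezone.utc)
start = end - timedelta(hours=1)
# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("cloudwatch").get_metric_data(
#       MetricDataQueries=queries, StartTime=start, EndTime=end)
```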
I should probably also add that you can NOT use metric math in alarms.
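Update: as @Brooks pointed out, CloudWatch now supports alarms on metric math expressions (see the announcement "Amazon CloudWatch Launches Ability to Add Alarms on Metric Math Expressions").
All in all, it is pretty restricted and user-unfriendly compared to, e.g., DataDog.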
Let's lay out some definitions before getting to the question itself.
These definitions are based on the CloudWatch concepts page in the docs.
So we have a metric, which is a time-ordered set of data points. A metric is uniquely identified by its namespace, name, and set of dimensions.
A dimension is a key=value pair which is part of the identity of a metric. So for example a metric called ServerStats with the dimensions Domain=Frankfurt,Server=Prod is not the same metric as the metric called ServerStats with the dimensions Domain=Rio,Server=Beta.
Now let's move to an example, and from there to the question:
Let's build on the example given in the docs. Say I have 2 servers (Prod and Beta) in Frankfurt and 2 servers (Prod and Beta) in Rio, which regularly publish data points (representing some count) to CloudWatch, as follows:
Dimensions: Server=Prod, Domain=Frankfurt, Unit: Count, Timestamp: 2016-10-31T12:30:00Z, Value: 105
Dimensions: Server=Beta, Domain=Frankfurt, Unit: Count, Timestamp: 2016-10-31T12:31:00Z, Value: 115
Dimensions: Server=Prod, Domain=Rio, Unit: Count, Timestamp: 2016-10-31T12:32:00Z, Value: 95
Dimensions: Server=Beta, Domain=Rio, Unit: Count, Timestamp: 2016-10-31T12:33:00Z, Value: 97
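The simulation script isn't shown, so here is a minimal sketch of how these four data points could be shaped for boto3's put_metric_data. The namespace DataCenter is an assumption taken from the query used later. Also note that CloudWatch only accepts timestamps up to two weeks in the past, so a real run would use current times rather than the 2016 ones copied from the example.

```python
from datetime import datetime, timezone

# The four data points from the example: (server, domain, timestamp, value).
POINTS = [
    ("Prod", "Frankfurt", "2016-10-31T12:30:00", 105),
    ("Beta", "Frankfurt", "2016-10-31T12:31:00", 115),
    ("Prod", "Rio", "2016-10-31T12:32:00", 95),
    ("Beta", "Rio", "2016-10-31T12:33:00", 97),
]


def build_metric_data(points):
    """Turn the tuples into PutMetricData entries for the ServerStats metric."""
    return [
        {
            "MetricName": "ServerStats",
            "Dimensions": [
                {"Name": "Server", "Value": server},
                {"Name": "Domain", "Value": domain},
            ],
            "Timestamp": datetime.fromisoformat(ts).replace(tzinfo=timezone.utc),
            "Value": float(value),
            "Unit": "Count",
        }
        for server, domain, ts, value in points
    ]


metric_data = build_metric_data(POINTS)
# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("cloudwatch").put_metric_data(
#       Namespace="DataCenter", MetricData=metric_data)
```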
I've created a script to simulate this situation. If I go to my CloudWatch console, I can see my metrics:
So now that we have our example set up, I want to understand the statement in the docs:
For metrics produced by certain AWS services, such as Amazon EC2, CloudWatch can aggregate data across dimensions. For example, if you search for metrics in the AWS/EC2 namespace but do not specify any dimensions, CloudWatch aggregates all data for the specified metric to create the statistic that you requested. CloudWatch does not aggregate across dimensions for your custom metrics.
If I understand correctly, aggregating across dimensions means specifying a namespace, or a metric name, without any dimensions, and getting an aggregate of all the metrics with that name. So, for example, in the AWS/EC2 namespace there are metrics called CPUUtilization. Some of them have the dimension InstanceId, and some have the dimension ImageId, and CloudWatch can aggregate those metrics to give us an overall CPUUtilization across all of them.
Now, in our example, it is possible to get an aggregate of all the Server=Prod metrics. If I run the following query:
SELECT SUM(ServerStats) FROM DataCenter WHERE Server = 'Prod'
I get an aggregate of 2 metrics: Server=Prod,Domain=Frankfurt and Server=Prod,Domain=Rio:
You can see that the Query1 metric's value is the sum of the other two values (Prod Frankfurt and Prod Rio).
So I don't quite understand what is meant by the paragraph I cited above, which says that CloudWatch cannot aggregate data across dimensions for custom metrics.
Can someone clarify this?
It is poorly worded, out-of-date documentation on the CloudWatch side.
What they mean is that CloudWatch won't aggregate your data and store it as a new metric. There will be no single metric that contains the aggregation you need.
You can, however, use metric math and the Metrics Insights feature to calculate the aggregation on the fly, as you did above. These methods have limits, though. Metric math is limited to 500 metrics. Metrics Insights can aggregate up to 10,000 metrics, but it's limited to the latest 3 hours of data.
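The on-the-fly Metrics Insights aggregation from the question can also be requested through the API by passing the query as the Expression of a GetMetricData query. A minimal sketch, assuming boto3: the lookback is set to the 3-hour window mentioned above, and Period is set explicitly since the query itself carries no period. The block only builds the request.

```python
from datetime import datetime, timedelta, timezone

# The Metrics Insights query from the question.
INSIGHTS_QUERY = "SELECT SUM(ServerStats) FROM DataCenter WHERE Server = 'Prod'"


def insights_request(query, period=60, lookback=timedelta(hours=3)):
    """Build GetMetricData arguments that run a Metrics Insights query."""
    end = datetime.now(timezone.utc)
    return {
        "MetricDataQueries": [
            {"Id": "q1", "Expression": query, "Period": period, "ReturnData": True}
        ],
        "StartTime": end - lookback,
        "EndTime": end,
    }


request = insights_request(INSIGHTS_QUERY)
# With boto3 and credentials configured:
#   boto3.client("cloudwatch").get_metric_data(**request)
```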
I'm trying to combine a number of similar metrics into a single alarm in AWS CloudWatch. For example, for data quality monitoring in SageMaker, one of the metrics emitted by the data quality monitoring job is the feature baseline drift distance for each column. Say I have 600 columns; then each column has this metric. Is there a way to compress these metrics into a single CloudWatch alarm?
If not, is there any way to send the violation report as a message via AWS SNS?
While I am not sure exactly what outcome you want when you refer to "compress the metrics into a single alarm", you can look at using metric math.
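One way to read "a single alarm" is an alarm that fires when any column drifts past a threshold. A minimal sketch, assuming boto3: a metric math MAX over the per-column metrics, with AlarmActions pointing at an SNS topic (this sends the alarm notification, not the monitoring job's violation report itself). The namespace, the feature_baseline_drift_<column> naming, the column names and the topic ARN are hypothetical placeholders. Check the current CloudWatch quotas too: a metric math alarm can only reference a limited number of metrics, so 600 columns would likely need several alarms or a different approach.

```python
# Hypothetical placeholders: use the namespace, metric names and Dimensions your
# monitoring job actually emits, and your own SNS topic ARN.
NAMESPACE = "MyModelMonitor"
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:drift-alerts"
COLUMNS = ["age", "income", "tenure"]


def drift_alarm_params(columns, threshold=0.1, period=3600):
    """Build PutMetricAlarm arguments that alarm on the maximum drift across columns."""
    metrics = [
        {
            "Id": f"m{i}",
            "MetricStat": {
                "Metric": {
                    "Namespace": NAMESPACE,
                    "MetricName": f"feature_baseline_drift_{col}",
                    # Real metrics will likely also need "Dimensions" here.
                },
                "Period": period,
                "Stat": "Maximum",
            },
            "ReturnData": False,  # inputs only; the alarm evaluates the expression
        }
        for i, col in enumerate(columns)
    ]
    ids = ", ".join(m["Id"] for m in metrics)
    # MAX over an array of series yields one series: the largest drift per period.
    metrics.append(
        {"Id": "maxdrift", "Expression": f"MAX([{ids}])", "Label": "Max drift", "ReturnData": True}
    )
    return {
        "AlarmName": "feature-drift-any-column",
        "Metrics": metrics,
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": threshold,
        "EvaluationPeriods": 1,
        "AlarmActions": [TOPIC_ARN],  # notify the SNS topic when the alarm fires
    }


params = drift_alarm_params(COLUMNS)
# With boto3: boto3.client("cloudwatch").put_metric_alarm(**params)
```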
I plotted a CloudWatch custom metric using Lambda but sent the wrong dimension: in the API call I mistakenly swapped the metric value and the dimension value. Now I am getting a lot of metrics with dimensions such as 0.896, 0.345, etc. How do I delete them? They are creating garbage in the metric list. See the screenshot for details.
A dimension is part of a metric's identity:
A dimension is a name/value pair that is part of the identity of a metric.
Since it's not possible to delete metrics, you can't remove or change the dimensions of metrics already in CloudWatch. You have to wait until they expire after 15 months:
CloudWatch does not support metric deletion. Metrics expire based on the retention schedules described above.
In your case, you have to create new metrics with the correct dimensions and use those in your plots and alarms.
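Since the bad metrics can't be deleted, it can at least help to list them so you know which ones to ignore. A minimal sketch, assuming the response pages come from boto3's list_metrics paginator (for example client.get_paginator("list_metrics").paginate(Namespace=...)); the "looks like a number" check is a heuristic and would also flag legitimate numeric dimension values. As far as I know, ListMetrics only returns metrics that received data in roughly the past two weeks, so the garbage should drop out of that listing once nothing more is published to it.

```python
def looks_numeric(text):
    """True if a dimension value parses as a number, e.g. "0.896"."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def suspicious_metrics(pages):
    """Yield metrics from ListMetrics response pages whose dimension values look numeric."""
    for page in pages:
        for metric in page.get("Metrics", []):
            if any(looks_numeric(d["Value"]) for d in metric.get("Dimensions", [])):
                yield metric


# Hand-written page in the ListMetrics response shape (names are hypothetical).
example_pages = [
    {
        "Metrics": [
            {"Namespace": "MyApp", "MetricName": "Latency",
             "Dimensions": [{"Name": "Function", "Value": "0.896"}]},
            {"Namespace": "MyApp", "MetricName": "Latency",
             "Dimensions": [{"Name": "Function", "Value": "checkout"}]},
        ]
    }
]
flagged = list(suspicious_metrics(example_pages))
```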
I'm trying to identify the initial creation date of a metric on CloudWatch using the AWS CLI, but I don't see any way of doing so in the documentation. I can roughly identify the start date if there is a large block of missing data, but that doesn't work for metrics that have large gaps in data.
CloudWatch metrics are "created" with the first PutMetricData call that includes the metric. I use quotes around created because the metric doesn't have an independent existence; it's simply an entry in the time-series database. If there's a gap in time with no entries, the metric effectively does not exist for that gap.
Another caveat to CloudWatch metrics is that they only have a lifetime of 455 days, and individual metric values are aggregated as they age (see the retention specifics in the CloudWatch documentation).
All of which raises the question: what's the real problem you're trying to solve?
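To make "aggregated as they age" concrete, here is a small sketch of the retention schedule as summarized from the CloudWatch documentation: data with a period under 60 seconds (high-resolution custom metrics) is kept for 3 hours, 60-second data for 15 days, 5-minute data for 63 days and 1-hour data for 455 days. The function returns the finest period still available for data of a given age; it is a summary to check against the docs, not an API.

```python
from datetime import timedelta

# (resolution in seconds, how long data at that resolution is kept)
RETENTION = [
    (1, timedelta(hours=3)),      # only for high-resolution custom metrics
    (60, timedelta(days=15)),
    (300, timedelta(days=63)),
    (3600, timedelta(days=455)),
]


def finest_period(age):
    """Return the finest period (seconds) still available for data of this age, or None."""
    for period, kept_for in RETENTION:
        if age <= kept_for:
            return period
    return None  # older than 455 days: the data has expired


print(finest_period(timedelta(days=30)))  # 300
```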
Recently I configured an alarm in CloudWatch to track a VPN tunnel connection. It is well known that 0 indicates the tunnel is DOWN and 1 indicates the tunnel is UP. When the connection is down, I have seen some data points on the graph shown as 0.66 or 0.75.
So what does that mean: is the connection DOWN or UP?
The correct statistic for each metric depends on your use case and the underlying metric.
From CloudWatch Concepts - Statistics
Statistics are metric data aggregations over specified periods of time. CloudWatch provides statistics based on the metric data points provided by your custom data or provided by other AWS services to CloudWatch. Aggregations are made using the namespace, metric name, dimensions, and the data point unit of measure, within the time period you specify. The following table describes the available statistics.
Given the VPN metric above, try using the Maximum or Minimum statistics for the alarm. You are using the Average statistic, which, as you noted, will not produce meaningful data for your use case.
Minimum
The lowest value observed during the specified period. You can use this value to determine low volumes of activity for your application.
Maximum
The highest value observed during the specified period. You can use this value to determine high volumes of activity for your application.
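The fractional values come straight from averaging 0s and 1s. A small sketch with hypothetical TunnelState samples: one down sample out of three averages to about 0.67, and one out of four to 0.75, while Minimum reports 0 in both cases. The alarm dictionary shows the Minimum statistic in PutMetricAlarm form, assuming the AWS/VPN namespace's TunnelState metric and its VpnId dimension; the VPN ID is a placeholder.

```python
from statistics import mean

# Hypothetical TunnelState samples within one period (1 = up, 0 = down).
three = [1, 1, 0]
four = [1, 1, 1, 0]

print(round(mean(three), 2), min(three))  # 0.67 0
print(mean(four), min(four))              # 0.75 0

# PutMetricAlarm arguments using Minimum: any 0 in the period trips the alarm.
alarm = {
    "AlarmName": "vpn-tunnel-down",
    "Namespace": "AWS/VPN",
    "MetricName": "TunnelState",
    "Dimensions": [{"Name": "VpnId", "Value": "vpn-0123456789abcdef0"}],  # placeholder ID
    "Statistic": "Minimum",
    "Period": 300,
    "EvaluationPeriods": 1,
    "Threshold": 1.0,
    "ComparisonOperator": "LessThanThreshold",
}
# With boto3: boto3.client("cloudwatch").put_metric_alarm(**alarm)
```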
That happens if your graph shows averages (that is why both of your values are between 0 and 1). In the CloudWatch console, select your metric and then click on the Graphed metrics tab. There you will see the Statistic column, which is most likely set to Average right now.