Amazon CloudWatch Metric Math - Rolling Average - amazon-web-services

So I am trying to visualize a AWS CloudWatch Metric with Tolerance via AWS Managed Grafana.
For example, this is my current graph (and I want to add tolerance lines to it):
I want to add some tolerance lines too see which spikes go outside of the expected range.
I could technically do this by enabling CloudWatch anomaly detection and using ANOMALY_DETECTION_BAND(a) as my metric math but I am trying to replicate a dashboard we currently have which uses a 6 week rolling average with a simple multiplier as the upper and lower thresholds.
My thought was that I can accomplish this using metric math by leveraging a combination of SLICE, RUNNING_SUM, and DATAPOINT_COUNT but no matter what combination I try I can't seem to find the right mix.
Does anyone know how I can using Metric Math to create a time series where each data point is either:
The average of the last x amount of time data points (Ex: the last 6 days worth of data points)
The average of the last x data points.
If I can figure out either of these solutions I can do the rest but I am having a hard time just getting "the last x data points" instead of referencing the entire query when doing any metric math operation.
I could maybe find a way to do this with built in Grafana functionality as well but I couldn't find a great way to do it. (Still new to Grafana).

Related

AWS CloudWatch - Creating a metric from different datapoints in time

Is it possible to define CW metric as the difference between the same metrics, but in two consecutive data points in time?
I need to measure how many objects has been put in an S3 bucket for a given time period, so I would use make the difference NumberOfObjects in this time window. PS: I couldn't find any "New objects" metric (which is not the same of PutRequests).
You can use 'DIFF' function.
It returns the difference between each value in the time series and the preceding value from that time series.
Expression:
DIFF(m1)
I do not have much data to test it but in this example I had added 2 new objects and using that expression shows the new objects added that day.
Reference:
Functions supported for metric math

CloudWatch log filter count metric values are < 1

I followed the instructions here:
https://docs.amazonaws.cn/en_us/AmazonCloudWatch/latest/logs/CountOccurrencesExample.html
and created a log filter metric to count occurrences of a particular logged term
But when I graph the metric I get:
I don't see how a value of < 1 is possible for a count metric.
It seems like it is calculating something else, perhaps the ratio of hits for the log filter query vs total number of log entries. But that's a meaningless stat because these are application logs so it's not even the ratio of hits vs no of requests.
The shape of the graph looks right, but the units don't make sense.
How do I get a meaningful count from a log filter metric?
After thinking about this further I realised what maybe should have been obvious already... I was graphing the average rate of a count.
This can very easily be < 1
One option would be to instead graph the sum (per time bucket) of the count, so that is an easy way to get "occurrences per minute" or per second or whatever.
I realised eventually what I really wanted was the percentage of a specific subset of logs lines (potential matches) where the log term matched.
I achieved this by creating another metric which counted both the matching and non-matching instances of this log term, for a specific log path (e.g. requests to a particular endpoint, or calls to a specific function)
Then I can then hide both those metric lines from the graph and instead add a 'math expression' like m1 / m2 * 100 and show that to graph the percentage of requests which feature the log term of interest.

Stackdriver Logs-Based Metrics - need sum over alignment period

We have some stackdriver log entries that look something like this:
{
insertId: "xyz"
jsonPayload: {
countOfApples: 100
// other stuff
}
// other stuff
}
We would like to be able to set up a log-based metric that tells us the total number of apples seen in the past 10 mins (or any alignment period) but I have, thus far, been unable to find a means of doing so despite reading through the documentation.
Attempt 1:
Filter for those log-entries where countOfApples is specified and create a Counter metric with countOfApples as a label.
having done this, I can filter based on the countOfApples being above or below a certain value. I cannot see a means of aggregating based on this value. All the aggregation options seem to apply to the number of log entries matching the filter over the alignment period
Attempt 2:
Filter for those log-entries where countOfApples is specified and create a distribution metric with the Field Name set to jsonPayload.CountOfApples
This seems to get closer because I can now see the apple count in the metrics explorer but I cannot find the correct combination of Aligner/Reducers to just give me the total number of apples over the period? Selecting Aligner:delta & Reducer:sum results in an error message:
This aggregation does not produce a valid data type for a Line plot
type. Click here to switch the aligner to sum and the reducer to 99th
percentile
Is it possible to just monitor the total sum of all these values over each alignment period?
As of 2019/05/03, it is not possible to create a counter metric based on the values stored in the logs. Putting the values into a label simply exposes them as strings, which lets you filter but not perform aggregations based on those values. According to the documentation, a counter metric counts log entries, not the values in those log entries. As you've noticed, there aren't enough operations available on distribution metrics to do what you want.
For now, your best bet is to write your own custom metric based on those log values. You can do this by exporting your logs to Cloud Pub/Sub and writing some code to process the logs from Pub/Sub and send custom metrics. Alternatively, you could try to configure the Stackdriver monitoring agent to extract the values using the tail plugin, and send them as custom metrics.
If you just need to graph and explore the values (rather than, e.g., use them for alerting), you could try Cloud Datalab.
If anyone still looking how to solve it, it seems that now it's possible to do sum aggregation on distribution metric using sum_from function. Example:
fetch k8s_container
| metric 'logging.googleapis.com/user/tracking-data-len'
| group_by [], sum(sum_from(value))

Cumulative sum of AWS Cloudwatch Metric

AWS Cloudwatch receives a count of 1 every time I start an image download. I am downloading 1,000s of images (on a cluster of EC2 instances) and would like to track the total progress.
I can't find any documentation on how to plot the cumulative sum of a metric. The AWS Cloudwatch Math Expressions looked promising, but they do not have an integrate function.
Currently, I can plot the sum of the started image downloads but only for periods, as seen below. Ideally, I'd like to plot the integral of this plot:
You can get a cumulative sum over the current range by using the SUM() function that is operated over the original range containing only the number One (1). Remember, you're looking for a single number in the end, so it's not much of a graph, but you need to turn the single value sum back into a time-series.
Define m1 as your metric. This is the metric you will want to use SUM() on.
Define an expression e1 as m1/m1. This results in a time-series with every value equal to 1. This is what will allow you convert that SUM back to a time-series.
Define an expression e2 as SUM(m1) / e1. This is, effectively, the cumulative sum of m1 divided by one for every data-point in the original time-series. It will be a horizontal line on the graph, which will have every point on that horizontal line being the cumulative sum of metric m1. This is required because Cloudwatch can only plot a time-series on the chart, not a single value.
Make m1 and e1 invisible. You need them, but you don't need to see them.
Finally, change the chart type from Line to Number, since you only wanted the cumulative sum anyway.
The reason you can't use SUM() directly is because it is a single value. By dividing by a time-series containing all 1's, the entire graph is the result of the SUM(). Then, changing the chart to a Number effectively hides all the math and presents only the "final result".
Looks like RUNNING_SUM() has been added that does what your need:
Graph with RUNNING_SUM
You can find RUNNING_SUM() under "Add math"->"All functions"
You are correct. All Amazon CloudWatch metrics are for a defined period.
The maximum period for a metric is one day, so this is not suitable for a cumulative counter that you wish to continue beyond one day.
You would need to find an alternate method of storing the count, such as an Amazon DynamoDB table. Use an atomic counter via UpdateItem to increment the count.
You can also use a very long period.
Change your stat to SUM, and set your metric's period to 7 days. You'll get a time series of 1 point with the cumulative sum of all the downloads.
If you give each download a unique dimension value, you can keep your queries separate.

Show CloudWatch metric with unit Seconds in Hours

I have a custom cloud watch metric with unit Seconds. (representing the age of a cache)
As usual values are around 125,000 I'd like to convert them into Hours - for better readability.
Is that possible?
This has changed with the addition of Metrics Math. You can do all sorts of transformations on your data, both manually (from the console) and from CloudFormation dashboard templates.
From the console: see the link above, which says:
To add a math expression to a graph
Open the CloudWatch console at https://console.aws.amazon.com/cloudwatch/.
Create or edit a graph or line widget.
Choose Graphed metrics.
Choose Add a math expression. A new line appears for the expression.
For the Details column, type the math expression. The tables in the following section list the functions you can use in the
expression.
To use a metric or the result of another expression as part of the formula for this expression, use the value shown in the Id column. For
example, m1+m2 or e1-MIN(e1).
From a CloudFormation Template
You can add new metrics which are Metrics Math expressions, transforming existing metrics. You can add, subtract, multiply, etc. metrics and scalars. In your case, you probably just want to use divide, like in this example:
Say you have the following bucket request latency metrics object in your template:
"metrics":[
["AWS/S3","TotalRequestLatency","BucketName","MyBucketName"]
]
The latency default is in milliseconds. Let's plot it in seconds, just for fun. 1s = 1,000ms so we'll add the following:
"metrics":[
["AWS/S3","TotalRequestLatency","BucketName","MyBucketName",{"id": "timeInMillis"}],
[{"expression":"timeInMillis / 1000", "label":"LatencyInSeconds","id":"timeInSeconds"}]
]
Note that the expression has access to the ID of the other metrics. Helpful naming can be useful when things get more complicated, but the key thing is just to match the variables you put in the expression to the ID you assign to the corresponding metric.
This leaves us with a graph with two metrics on it: one milliseconds, the other seconds. If we want to lose the milliseconds, we can, but we need to keep the metric values around to compute the math expression, so we use the following work-around:
"metrics":[
["AWS/S3","TotalRequestLatency","BucketName","MyBucketName",{"id": "timeInMillis","visible":false}],
[{"expression":"timeInMillis / 1000", "label":"LatencyInSeconds","id":"timeInSeconds"}]
]
Making the metric invisible takes it off the graph while still allowing us to compute our expression off of it.
Cloudwatch does not do any Unit conversion (i.e seconds into hours etc). So you cannot use the AWS console to display your 'Seconds' datapoint values converted to Hours.
You could either publish your metric values as 'Hours' (leaving the Unit field blank or set it to 'None').
Otherwise if you still want to provide the datapoints with units 'Seconds' you could retrieve the datapoints (using the GetMetricStatistics API) and graph the values using some other dashboard/graphing solution.