We have some stackdriver log entries that look something like this:
{
  insertId: "xyz"
  jsonPayload: {
    countOfApples: 100
    // other stuff
  }
  // other stuff
}
We would like to set up a log-based metric that tells us the total number of apples seen in the past 10 minutes (or any other alignment period), but I have so far been unable to find a way to do so, despite reading through the documentation.
Attempt 1:
Filter for those log entries where countOfApples is specified and create a Counter metric with countOfApples as a label.
Having done this, I can filter based on countOfApples being above or below a certain value, but I cannot see a way to aggregate based on this value. All the aggregation options seem to apply to the number of log entries matching the filter over the alignment period.
Attempt 2:
Filter for those log entries where countOfApples is specified and create a Distribution metric with the Field Name set to jsonPayload.countOfApples.
This seems to get closer, because I can now see the apple count in Metrics Explorer, but I cannot find the combination of aligner/reducer that gives me just the total number of apples over the period. Selecting Aligner: delta and Reducer: sum results in an error message:
This aggregation does not produce a valid data type for a Line plot type. Click here to switch the aligner to sum and the reducer to 99th percentile
Is it possible to just monitor the total sum of all these values over each alignment period?
As of 2019/05/03, it is not possible to create a counter metric based on the values stored in the logs. Putting the values into a label simply exposes them as strings, which lets you filter but not perform aggregations based on those values. According to the documentation, a counter metric counts log entries, not the values in those log entries. As you've noticed, there aren't enough operations available on distribution metrics to do what you want.
For now, your best bet is to write your own custom metric based on those log values. You can do this by exporting your logs to Cloud Pub/Sub and writing some code to process the logs from Pub/Sub and send custom metrics. Alternatively, you could try to configure the Stackdriver monitoring agent to extract the values using the tail plugin, and send them as custom metrics.
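A minimal sketch of the Pub/Sub approach in Python might look like the following; the project ID, subscription name, and custom metric type are all placeholders, and error handling is omitted:

import json
import time

from google.cloud import monitoring_v3, pubsub_v1

PROJECT_ID = "my-project"        # placeholder
SUBSCRIPTION = "apple-logs-sub"  # placeholder: subscription fed by the log sink

metric_client = monitoring_v3.MetricServiceClient()

def write_apple_count(count):
    # Re-emit the value from the log entry as a point on a custom metric.
    series = monitoring_v3.TimeSeries()
    series.metric.type = "custom.googleapis.com/apples/count"  # placeholder type
    series.resource.type = "global"
    point = monitoring_v3.Point({
        "interval": {"end_time": {"seconds": int(time.time())}},
        "value": {"int64_value": count},
    })
    series.points = [point]
    metric_client.create_time_series(
        name=f"projects/{PROJECT_ID}", time_series=[series]
    )

def callback(message):
    # Each Pub/Sub message contains one exported log entry as JSON.
    entry = json.loads(message.data)
    payload = entry.get("jsonPayload", {})
    if "countOfApples" in payload:
        write_apple_count(int(payload["countOfApples"]))
    message.ack()

subscriber = pubsub_v1.SubscriberClient()
path = subscriber.subscription_path(PROJECT_ID, SUBSCRIPTION)
subscriber.subscribe(path, callback=callback).result()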
If you just need to graph and explore the values (rather than, e.g., use them for alerting), you could try Cloud Datalab.
If anyone is still looking for how to solve this: it now seems possible to do a sum aggregation on a distribution metric using the sum_from function. Example:
fetch k8s_container
| metric 'logging.googleapis.com/user/tracking-data-len'
| group_by [], sum(sum_from(value))
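Here sum_from sums the values recorded in each distribution point, and the empty group_by [] collapses all streams into one, so the result is the total of the logged values per alignment period.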
TL;DR:
What is the proper way to create a metric so that it generates reliable information about the log insights?
What is desired
The current log insights look similar to the following:
However, it becomes easier to analyse these logs using metrics (mostly because you can have multiple sources of data in the same plot and even perform math operations between them).
Solution according to docs
Allegedly, a log can be converted to a metric by following a guide like this. However, this approach does not seem to work entirely right (I guess because of the time frames that have to be imposed on the metric plots), and it provides incorrect information, for example:
Issue with solution
In the previous image, I've created a dashboard containing the metric count (the number 7), corresponding to the sum of events every 5 minutes. I've also added a preview of the log insight corresponding to the information used to create the event.
However, as can be seen, the number of logs is 4, but the event count displays 7. Changing the time frame in the metric generates other kinds of issues (e.g., selecting a very small time frame like 1 sec won't retrieve any data, and a slightly larger time frame provides another wrong number: 3, when there are 4 logs).
P.S.
I've also tried converting the log insights to metrics using this lambda function, as suggested by Danil Smirnov, to no avail: it seems to generate the same issues.
I'm emitting metrics in Google Cloud counting how often certain jobs are executed and succeed vs. fail. These are cumulative metrics, so the actual values emitted should be monotonically increasing.
When I graph these metrics, I would expect to get 3 lines for each job, each generally increasing. But instead of graphing what's being emitted, MQL seems to be destroying the cumulative nature of these metrics.
[Image: visualized metrics, not cumulative for some reason]
I would expect to see more of a step function, where the last value charted is 6, not 2.
The relevant MQL is:
fetch_cumulative gce_instance
| metric 'custom.googleapis.com/opencensus/foo_job.counter'
| group_by [metric.eventName],
    [sum(value.counter)]
My understanding is that fetch would convert cumulative metrics to delta metrics and thus result in this behavior. But fetch_cumulative should avoid that.
I believe fetch_cumulative should do the job. However, according to the footnotes, the charting in the UI might convert it automatically into delta streams. Maybe try the API call directly to confirm this behaviour.
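A minimal sketch of such a direct API call with the Python client could look like this (the project ID is a placeholder):

from google.cloud import monitoring_v3

# Run the MQL query via the Monitoring API directly, bypassing the
# chart UI and whatever alignment/conversion it applies.
client = monitoring_v3.QueryServiceClient()
mql = """
fetch_cumulative gce_instance
| metric 'custom.googleapis.com/opencensus/foo_job.counter'
| group_by [metric.eventName], [sum(value.counter)]
"""
results = client.query_time_series(
    request={"name": "projects/my-project", "query": mql}  # placeholder project
)
for series in results:
    print(series)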
Is it possible to define a CloudWatch metric as the difference between the same metric at two consecutive data points in time?
I need to measure how many objects have been put in an S3 bucket over a given time period, so I would take the difference of NumberOfObjects over this time window. PS: I couldn't find any "New objects" metric (which is not the same as PutRequests).
You can use the 'DIFF' function.
It returns the difference between each value in the time series and the preceding value from that time series.
Expression:
DIFF(m1)
I do not have much data to test it, but in this example I had added 2 new objects, and using that expression shows the new objects added that day.
Reference:
Functions supported for metric math
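If you want the same thing outside the console, a sketch using boto3's get_metric_data (the bucket name is a placeholder) might look like:

import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch")
response = cloudwatch.get_metric_data(
    MetricDataQueries=[
        {
            "Id": "m1",
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/S3",
                    "MetricName": "NumberOfObjects",
                    "Dimensions": [
                        {"Name": "BucketName", "Value": "my-bucket"},
                        {"Name": "StorageType", "Value": "AllStorageTypes"},
                    ],
                },
                "Period": 86400,  # NumberOfObjects is reported once per day
                "Stat": "Average",
            },
            "ReturnData": False,  # hide the raw series; only return the DIFF
        },
        {"Id": "e1", "Expression": "DIFF(m1)", "Label": "New objects"},
    ],
    StartTime=datetime.utcnow() - timedelta(days=7),
    EndTime=datetime.utcnow(),
)
print(response["MetricDataResults"])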
I followed the instructions here:
https://docs.amazonaws.cn/en_us/AmazonCloudWatch/latest/logs/CountOccurrencesExample.html
and created a log filter metric to count occurrences of a particular logged term
But when I graph the metric I get:
I don't see how a value of < 1 is possible for a count metric.
It seems like it is calculating something else, perhaps the ratio of hits for the log filter query vs. the total number of log entries. But that's a meaningless stat, because these are application logs, so it's not even the ratio of hits to the number of requests.
The shape of the graph looks right, but the units don't make sense.
How do I get a meaningful count from a log filter metric?
After thinking about this further, I realised what maybe should have been obvious already... I was graphing the average rate of a count.
This can very easily be < 1.
One option would be to instead graph the sum (per time bucket) of the count; that is an easy way to get "occurrences per minute" (or per second, or whatever).
I realised eventually that what I really wanted was the percentage of a specific subset of log lines (potential matches) where the log term matched.
I achieved this by creating another metric which counted both the matching and non-matching instances of this log term, for a specific log path (e.g. requests to a particular endpoint, or calls to a specific function)
Then I can hide both of those metric lines from the graph and instead add a 'math expression' like m1 / m2 * 100, and show that to graph the percentage of requests which feature the log term of interest.
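The same combination can be scripted with get_metric_data; here is a sketch, with a hypothetical namespace and hypothetical metric names standing in for the two metric filters:

import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch")
response = cloudwatch.get_metric_data(
    MetricDataQueries=[
        # m1: count of log lines matching the term (hypothetical metric filter)
        {
            "Id": "m1",
            "MetricStat": {
                "Metric": {"Namespace": "MyApp", "MetricName": "TermMatches"},
                "Period": 60,
                "Stat": "Sum",
            },
            "ReturnData": False,
        },
        # m2: count of all candidate log lines for the same path
        {
            "Id": "m2",
            "MetricStat": {
                "Metric": {"Namespace": "MyApp", "MetricName": "TermTotal"},
                "Period": 60,
                "Stat": "Sum",
            },
            "ReturnData": False,
        },
        # Only the percentage is returned for graphing.
        {"Id": "e1", "Expression": "m1 / m2 * 100", "Label": "Percent matching"},
    ],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
)
print(response["MetricDataResults"])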
I have a custom CloudWatch metric with unit Seconds (representing the age of a cache).
As typical values are around 125,000, I'd like to convert them into hours for better readability.
Is that possible?
This has changed with the addition of Metrics Math. You can do all sorts of transformations on your data, both manually (from the console) and from CloudFormation dashboard templates.
From the console: see the link above, which says:
To add a math expression to a graph:
1. Open the CloudWatch console at https://console.aws.amazon.com/cloudwatch/.
2. Create or edit a graph or line widget.
3. Choose Graphed metrics.
4. Choose Add a math expression. A new line appears for the expression.
5. For the Details column, type the math expression. The tables in the following section list the functions you can use in the expression.
6. To use a metric or the result of another expression as part of the formula for this expression, use the value shown in the Id column. For example, m1+m2 or e1-MIN(e1).
From a CloudFormation Template
You can add new metrics which are Metrics Math expressions, transforming existing metrics. You can add, subtract, multiply, etc., using metrics and scalars. In your case, you probably just want to divide, like in this example:
Say you have the following bucket request latency metrics object in your template:
"metrics":[
["AWS/S3","TotalRequestLatency","BucketName","MyBucketName"]
]
The latency default is in milliseconds. Let's plot it in seconds, just for fun. 1s = 1,000ms so we'll add the following:
"metrics":[
["AWS/S3","TotalRequestLatency","BucketName","MyBucketName",{"id": "timeInMillis"}],
[{"expression":"timeInMillis / 1000", "label":"LatencyInSeconds","id":"timeInSeconds"}]
]
Note that the expression has access to the ID of the other metrics. Helpful naming can be useful when things get more complicated, but the key thing is just to match the variables you put in the expression to the ID you assign to the corresponding metric.
This leaves us with a graph with two metrics on it: one in milliseconds, the other in seconds. If we want to lose the milliseconds, we can, but we need to keep the metric values around to compute the math expression, so we use the following workaround:
"metrics":[
["AWS/S3","TotalRequestLatency","BucketName","MyBucketName",{"id": "timeInMillis","visible":false}],
[{"expression":"timeInMillis / 1000", "label":"LatencyInSeconds","id":"timeInSeconds"}]
]
Making the metric invisible takes it off the graph while still allowing us to compute our expression off of it.
CloudWatch does not do any unit conversion (i.e., seconds into hours, etc.), so you cannot use the AWS console to display your 'Seconds' datapoint values converted to hours.
You could publish your metric values already converted to hours (leaving the Unit field blank or setting it to 'None').
Otherwise, if you still want to publish the datapoints with the unit 'Seconds', you could retrieve the datapoints (using the GetMetricStatistics API) and graph the values using some other dashboard/graphing solution.
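For instance, a sketch with boto3, assuming a hypothetical namespace and metric name, converting each datapoint to hours before plotting:

import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch")
stats = cloudwatch.get_metric_statistics(
    Namespace="MyApp",             # hypothetical namespace
    MetricName="CacheAgeSeconds",  # hypothetical metric name
    StartTime=datetime.utcnow() - timedelta(hours=6),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=["Average"],
)
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"] / 3600.0)  # seconds -> hours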