Arithmetic operations for Stackdriver monitoring charts - google-cloud-platform

I'm trying to set up a Stackdriver dashboard for the custom metrics that my services provide.
In particular, I'm starting with a general custom/grpc/time_ms metric, which is a gauge and has a status label. I'd love to be able to set up a chart and an alert for the success rate of the metric (something like count:custom/grpc/time_ms{status:OK} / count:custom/grpc/time_ms{*}).
In my previous project I used Datadog, and it was pretty easy to do there. But I don't see any similar functionality in either the UI or the Stackdriver documentation. So I was wondering: is it undocumented, or simply not supported?

This question is quite old, but the answer might still be useful for new users of Google Cloud.
In 'Metrics Explorer' in the Google Cloud Console there is an option to write a query in MQL (click the Query Editor button).
MQL supports expressions, which are described in detail in the MQL documentation.
The simplest example of dividing one metric by another looks like this:
{ fetch your_resource_type :: your_metric_1
; fetch your_resource_type :: your_metric_2
}
| join
| div
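If you want to generate this query from a script (for example, to paste it into the Query Editor for several metric pairs), a minimal Python sketch could look like the following. The function name build_ratio_mql is my own invention, and the resource type and metric names are the same placeholders as above; it only assembles the query text and does not call any Google API.

```python
def build_ratio_mql(resource_type: str, numerator_metric: str, denominator_metric: str) -> str:
    """Assemble the 'fetch two metrics, join, div' MQL query shown above.

    This only builds the query string; it does not validate the names
    against Cloud Monitoring.
    """
    lines = [
        f"{{ fetch {resource_type} :: {numerator_metric}",
        f"; fetch {resource_type} :: {denominator_metric}",
        "}",
        "| join",
        "| div",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder names, exactly as in the answer above.
    print(build_ratio_mql("your_resource_type", "your_metric_1", "your_metric_2"))
```

Keeping the query as plain text makes it easy to check against what the Query Editor shows before relying on it in a chart or an alert.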

Related

Get google cloud uptime history to a third party application

I am trying to show the uptime history of my application (which is hosted on Google Cloud) on my own page. Is there an API or something similar on Google Cloud for this? I only need the date and the up/down percentage or time.
I have already configured the uptime checks in the Google Cloud console, but I need to integrate this data into my application.
Yes, you can but it's not obvious and it may be easier to use something other than Cloud Monitoring to export uptime data to a non-GCP site :-)
If you do want to use Cloud Monitoring to source this data into an off-GCP page, one of the Cloud Monitoring SDKs may be best. You can create a URL too (see below) but you'll need to authenticate this URL and that may make it too complex.
By way of an example, here's an Uptime check I created against my blog:
I recommend Google APIs Explorer as it's an excellent way to understand Google's services (via the REST APIs) and to test an approach.
First: List|Get Uptime Check(s)
https://cloud.google.com/monitoring/api/ref_v3/rest/v3/projects.uptimeCheckConfigs/list
In the form on the right-hand side, set parent to projects/${PROJECT}
If your Project ID is freddie-210224-66311747 then you'd type projects/freddie-210224-66311747.
https://cloud.google.com/monitoring/api/ref_v3/rest/v3/projects.uptimeCheckConfigs/get
For this one, you need to provide name, the value of projects/${PROJECT}/uptimeCheckConfigs/${UPTIME_CHECK}
If your Uptime check is called test, then you'd type projects/freddie-210224-66311747/uptimeCheckConfigs/test
NOTE In my case, I used an Uptime check name that included periods (my.blog.com) and this was converted (to my-blog-com). So, you may want to list first to check the name.
Click "Execute" (You don't need to have API Key checked but it makes no difference).
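Outside of APIs Explorer, the same two calls are plain GET requests against the Monitoring REST API. As a rough sketch (the helper names are mine, and I'm assuming the standard https://monitoring.googleapis.com/v3 root that the reference pages above use), the resource names and URLs can be built like this. You would still need to send an OAuth access token with the request.

```python
API_ROOT = "https://monitoring.googleapis.com/v3"


def project_parent(project_id: str) -> str:
    """Value for the `parent` field: note the plural 'projects/'."""
    return f"projects/{project_id}"


def uptime_check_name(project_id: str, check_id: str) -> str:
    """Value for the `name` field of uptimeCheckConfigs.get."""
    return f"{project_parent(project_id)}/uptimeCheckConfigs/{check_id}"


def list_uptime_checks_url(project_id: str) -> str:
    """URL for projects.uptimeCheckConfigs.list (HTTP GET)."""
    return f"{API_ROOT}/{project_parent(project_id)}/uptimeCheckConfigs"


def get_uptime_check_url(project_id: str, check_id: str) -> str:
    """URL for projects.uptimeCheckConfigs.get (HTTP GET)."""
    return f"{API_ROOT}/{uptime_check_name(project_id, check_id)}"


if __name__ == "__main__":
    print(list_uptime_checks_url("freddie-210224-66311747"))
    print(get_uptime_check_url("freddie-210224-66311747", "test"))
```

Listing first and copying the returned name avoids guessing how the check ID was normalized.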
What I learned is that Uptime checks are Metrics like all others. I confirmed this by watching the Chrome Dev Tools while I was watching Uptime checks.
Ensure that you use the correct metric name. You can use Monitoring's Metrics Explorer to confirm this:
The Resource Type is Uptime Check URL (uptime_url)
One (!) of the Metrics you may use is Request Latency (monitoring.googleapis.com/uptime_check/request_latency)
If you populate the Metrics Explorer, you should see the same data plotted as with the Uptime Check page.
Click Query Editor to get your Uptime Metric represented as Cloud Monitoring Query Language (MQL), remove any line-feeds. You can use:
fetch uptime_url | metric 'monitoring.googleapis.com/uptime_check/request_latency' | group_by 1m, [value_request_latency_mean: mean(value.request_latency)] | every 1m
So, now we want to query Monitoring metric time-series:
https://cloud.google.com/monitoring/api/ref_v3/rest/v3/projects.timeSeries/query
The value for name is projects/${PROJECT}
For query, paste in the MQL from above and retain the quotes, i.e. "fetch uptime_url ..."
Hit EXECUTE
You should receive a snapshot of the time-series data underlying your Uptime URL. You can revise the MQL to reflect exactly the subset that you need. At 2021-02-24T20:55:38 the latency was 20.869:
So, to get e.g. request latencies for your uptime checks, you can use the Monitoring API's TimeSeries Query method and, with a suitable Query, this will yield JSON data including an array of Point (values). These values could then be transformed and surfaced into your external page.
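To make that last step concrete, here is a minimal sketch using only the Python standard library. It builds the POST request for the timeSeries.query method and flattens the JSON response into (end time, value) pairs. The function names are mine, access_token is assumed to be an OAuth token you obtained elsewhere, and the sample response in the demo is a hand-written illustration of the response shape, not real output.

```python
import json
import urllib.request


def build_query_request(project_id: str, mql: str, access_token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for projects.timeSeries.query."""
    url = f"https://monitoring.googleapis.com/v3/projects/{project_id}/timeSeries:query"
    body = json.dumps({"query": mql}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def extract_points(response: dict) -> list:
    """Flatten a timeSeries.query response into (end_time, value) tuples."""
    points = []
    for series in response.get("timeSeriesData", []):
        for point in series.get("pointData", []):
            end_time = point["timeInterval"]["endTime"]
            for value in point.get("values", []):
                if "doubleValue" in value:
                    points.append((end_time, float(value["doubleValue"])))
                elif "int64Value" in value:
                    # int64 values are serialized as strings in JSON.
                    points.append((end_time, int(value["int64Value"])))
    return points


if __name__ == "__main__":
    # Illustrative response shape only; a real call would be
    # json.load(urllib.request.urlopen(build_query_request(...))).
    sample = {
        "timeSeriesData": [
            {"pointData": [{
                "values": [{"doubleValue": 20.869}],
                "timeInterval": {"startTime": "2021-02-24T20:55:38Z",
                                 "endTime": "2021-02-24T20:55:38Z"},
            }]}
        ]
    }
    print(extract_points(sample))
```

The list of tuples is then easy to serialize for whatever your external page expects.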

Stackdriver log-based metrics do not display the values as reported by logging

My goal is to base my metrics directly on log values. The problem is that when I display them as a graph, it looks like they are distributed. How can I change it so that it displays the values from the logs?
Unfortunately Stackdriver doesn't work in that way, you shouldn't expect that Stackdriver shows you "52" in this case. Have a look at the official documentation where "logs-based metrics can be one of two metric types: counter or distribution" and "counter metrics count the number of log entries matching" and "distribution metrics is to track latencies". You have to choose another tool for this task.
Assuming you created this as a distribution metric, I would expect this to work. Please take a look at this blog post to make sure you're using aligners and aggregators correctly.

AWS Pinpoint: How to view custom metrics

It is clear from the documentation that I can add custom metrics for a custom event.
How do I view these metrics in the Pinpoint console? From the Pinpoint console, it is obvious how to view attributes. I can go to Analytics > Events, select my custom event, and narrow down the events to whatever attributes I desire. I am asking about how to view metrics. To be clear, these differ by being continuous values whereas attributes are discrete. The documentation says that I can do this. See below how I can filter by attributes manually: (attribute is circled)
See the docs on custom events here: https://docs.aws.amazon.com/pinpoint/latest/developerguide/integrate-events.html
Similarly, creating a funnel only allows filtering for attributes. How can I filter for metrics?
Thank you for your time!
When I first asked this question, AWS had the ability to record metrics with the Swift SDK, but not view them in the Pinpoint API, which is absurd, because then you can only record metrics. What's the point? I asked in the AWS forums, and a couple months later, they responded something along the lines of "Please wait - coming soon."
This feature is now available, whereas before it simply wasn't.
Go to Pinpoint, your project, then click the Analytics drop-down menu, then click events. You can see that you can sort by metric. If you look at my outdated screenshot above, you'll see that this was not an option.

Is there any way to track a job across services in Stackdriver?

We use lots of components in Google Cloud, for example a job may start on App Engine, then do some work in Apache Airflow, then do some Dataflow work which will run a BigQuery insert.
Is there any way we can track the status of a job across all components using Stackdriver? For example, could we somehow tell Stackdriver a custom job ID and then query for it?
You can use advanced logs filters [1] to include log entries from various products. On the Logging page, search for your BigQuery job ID. Click the job ID and select "Show matching entries". This opens the advanced filter text box with the proper syntax. You can then add more queries with an OR in between.
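If you end up with several IDs for the same job (one per product), joining them by hand gets tedious. A small sketch of building such a filter in Python, assuming each ID is searched for as quoted text; the function name and the IDs in the demo are hypothetical:

```python
def build_or_filter(terms):
    """Join search terms into one logs filter of the form "a" OR "b" OR ...

    Each term is wrapped in double quotes; backslashes and embedded
    double quotes are escaped so the quoting stays balanced.
    """
    quoted = []
    for term in terms:
        escaped = term.replace("\\", "\\\\").replace('"', '\\"')
        quoted.append(f'"{escaped}"')
    return " OR ".join(quoted)


if __name__ == "__main__":
    # Hypothetical IDs for the same job in different products.
    print(build_or_filter(["bq_job_123", "airflow_run_456", "dataflow_job_789"]))
```

The resulting string can be pasted into the advanced filter text box as-is.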

Count number of GCP log entries during a specified time

Is it possible to count number of occurrences of a specific log message over a specific period of time from GCP Stackdriver logging? To answer the question "How many times did this event occur during this time period." Basically I would like the integral of the curve in the chart below.
It doesn't have to be a moving window, this time it's more of a one-time-task. A count-aggregator or similar on the advanced log query would also work if that would be available.
The query looks like this:
(resource.type="container"
logName="projects/xyz-142842/logs/drs"
"Publish Message for updated entity"
) AND (timestamp>="2018-04-25T06:20:53Z" timestamp<="2018-04-26T06:20:53Z")
My log based metric for the graph above looks like this:
My Dashboard is setup like this:
I ended up building stacked bars.
With correct zoom level I can sum up the number of occurrences easy enough. It would have been a nice feature to get the count directly from a graph (the integral), but this works for now.
There are multiple ways to do so, the two that I saw actually working and that can apply to your situation are the following:
Making use of Logs-based Metrics. They can, for example, record the number of log entries containing particular error messages, or they can extract latency information reported in log entries.
Stackdriver Logging logs-based metrics can be one of two metric types: counter or distribution. [...] Counter metrics count the number of log entries matching an advanced logs filter. [...] Distribution metrics accumulate numeric data from log entries matching a filter.
I would advise you to go through the documentation to check whether this feature completely covers your use case.
You can export your logs to BigQuery. Once you have them there, you can use classic SQL tools such as GROUP BY and SELECT, along with everything else that BigQuery offers.
Here you can find a very minimal step-by-step guide on how to export the logs and on Analyzing Audit Logs Using BigQuery, but I am sure you can find many more resources online.
The products and the approaches are really different. I would say that BigQuery is more flexible, but also more complex to configure and to use properly. If you find a third, better way, please update your question with that information.
First, you have to create a metric:
Go to Logs Explorer.
Type your query.
Go to Actions >> Create Metric.
Then, in the Monitoring dashboard:
Create a chart.
Select the resource and metric.
Go to "Advanced" and provide the details as given below:
Preprocessing step: Rate
Alignment function: count
Alignment period: 1
Alignment unit: minutes
Group by: log
Group by function: count
This will give you the visualisation in a bar chart with count of the desired events.
There is one more option.
You can read your custom metric using the Stackdriver Monitoring API ( https://cloud.google.com/monitoring/api/v3/ ) and process it in a script with whatever aggregation you need.
If you are working with Python, you may want to look into the google-cloud-python library: https://github.com/GoogleCloudPlatform/google-cloud-python/tree/master/monitoring
It will be a very simple script, and you can stream the results of the calculation into a BigQuery table and use it in your dashboard.
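As a rough illustration of such a script, here is a sketch that uses only the Python standard library against the REST API rather than the client library. It builds the URL for the timeSeries.list method and sums the integer points of the response, which gives the total over the interval if the logs-based counter metric reports per-interval (delta) counts; please verify the metric kind for your metric. The function names and the metric name drs_publish are hypothetical, and the sample response is hand-written to show the shape.

```python
from urllib.parse import urlencode


def list_time_series_url(project_id: str, metric_name: str, start_time: str, end_time: str) -> str:
    """URL for projects.timeSeries.list for a user-defined logs-based metric."""
    params = {
        "filter": f'metric.type="logging.googleapis.com/user/{metric_name}"',
        "interval.startTime": start_time,
        "interval.endTime": end_time,
    }
    return (f"https://monitoring.googleapis.com/v3/projects/{project_id}/timeSeries?"
            + urlencode(params))


def total_count(response: dict) -> int:
    """Sum all int64 point values across every returned time series."""
    total = 0
    for series in response.get("timeSeries", []):
        for point in series.get("points", []):
            # int64 values arrive as strings in the JSON response.
            total += int(point["value"].get("int64Value", 0))
    return total


if __name__ == "__main__":
    print(list_time_series_url("xyz-142842", "drs_publish",
                               "2018-04-25T06:20:53Z", "2018-04-26T06:20:53Z"))
    # Illustrative response shape only, not real data.
    sample = {"timeSeries": [{"points": [
        {"value": {"int64Value": "3"}},
        {"value": {"int64Value": "5"}},
    ]}]}
    print(total_count(sample))
```

The GET request itself needs an OAuth access token in the Authorization header, and a long interval may be paginated, so a real script would follow nextPageToken before summing.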
With PacketAI, you can send logs of arbitrary formats, including logs from GCP. The logs dashboard will then automatically parse them and group them into patterns, as shown in this video: https://streamable.com/n50kr8
Counts and trends of the different log patterns are also displayed.
Disclaimer: I work for PacketAI