Monitor GC Dataflow events per minute in Stackdriver - google-cloud-platform

I run a Google Cloud Dataflow job and know how to monitor its elementCount metric. That metric, however, shows the total number of events the job has processed since it started. How can I monitor the rate instead, for example events per minute, in Stackdriver?
Ideally, I would like to apply a simple transformation to the elementCount metric inside Stackdriver. But I'm afraid I would need to send a separate metric computed in the Dataflow job...

You can access all the Stackdriver metrics via the API (although elementCount is a gauge, you can still fetch its time series). All the Dataflow metrics in Stackdriver are listed here:
https://cloud.google.com/monitoring/api/metrics_gcp#gcp-dataflow
You will probably need to do some calculations on the time series yourself to get the correct rate per time window.
The API timeseries documentation is here:
https://cloud.google.com/monitoring/api/ref_v3/rpc/google.monitoring.v3
You can even call the API from within your Dataflow jobs. Note that, given how the metric is used, I think it should have been a counter.
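The "calculations on the time series" mentioned above could look roughly like the sketch below. It assumes you have already fetched the elementCount points from the API and turned them into (timestamp, cumulative value) pairs sorted by time. The function name `per_minute_rates` and the sample data are made up for illustration and are not part of any Google library.

```python
from datetime import datetime, timedelta


def per_minute_rates(samples):
    """Turn cumulative samples into per-minute rates.

    samples: list of (datetime, cumulative_count) pairs sorted by time.
    Returns a list of (interval_end_time, events_per_minute) pairs.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        minutes = (t1 - t0).total_seconds() / 60.0
        if minutes <= 0:
            # Skip duplicate or out-of-order timestamps.
            continue
        rates.append((t1, (v1 - v0) / minutes))
    return rates


# Hypothetical data points, standing in for a fetched time series.
start = datetime(2020, 1, 1, 12, 0, 0)
samples = [
    (start, 1000),
    (start + timedelta(minutes=1), 1600),
    (start + timedelta(minutes=3), 2200),
]
rates = per_minute_rates(samples)
for end_time, rate in rates:
    print(end_time.isoformat(), rate)
```

Dividing by the actual interval length, rather than assuming one point per minute, keeps the rate meaningful when points are missing or unevenly spaced.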

Related

How to compress multiple metrics into a single cloud watch alarm using boto3 AWS

I'm trying to combine a number of similar metrics into a single alarm in AWS CloudWatch. For example, a data quality monitoring job in SageMaker emits a feature baseline drift distance metric for each column, so with 600 columns I have 600 such metrics. Is there a way to compress these metrics into a single CloudWatch alarm?
If not, is there any way to send the violation report as a message via AWS SNS?
I am not sure exactly what outcome you want when you refer to "compress the metrics into a single alarm", but you can look at using metric math.

Counter metrics in GCP metrics explorer

I have a Dataflow job with a counter metric. On every restart the metric is reset to zero, as expected. The problem is that when I use the counter in GCP Metrics Explorer, I cannot get an accumulated value for the metric that disregards restarts. Prometheus has a function called increase() that does this. Is there a similar function in GCP Metrics Explorer?
One approach to tracking metrics across runs would be to use Cloud Monitoring. There is a good how-to on the features and usage of custom metrics.
If you use job names that a regular expression can match, you can use filters to aggregate them into a graph.
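If you end up pulling the raw counter values out and accumulating them yourself, the reset handling can be sketched as below. This is only similar in spirit to Prometheus's increase(): it treats any drop in value as a restart from zero and does no extrapolation at the window edges. The function name and the sample values are hypothetical.

```python
def increase(values):
    """Total growth of a counter over a series, ignoring resets.

    values: counter readings in time order. A reading lower than the
    previous one is assumed to mean the counter restarted from zero.
    """
    total = 0
    for prev, cur in zip(values, values[1:]):
        if cur >= prev:
            total += cur - prev
        else:
            # Reset detected: everything counted since the restart is new.
            total += cur
    return total


# Hypothetical readings with two restarts (12 -> 3 and 9 -> 1).
readings = [0, 5, 12, 3, 9, 1, 4]
print(increase(readings))
```

Any events counted between the last reading before a restart and the restart itself are lost, so the result is a lower bound on the true total.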

GCP Alerting policy: Increase by 50% on dataflow elapsed time

Currently I am trying to set up an alerting policy in GCP that compares the elapsed time of my current Dataflow job with the elapsed time of the previous Dataflow job and fires an alert if the current job's elapsed time is 50% longer.
Is it possible to do?
Thank you
You can create alerting policies based on metric absence and metric threshold conditions. You can take a look at this documentation for the types of alerting policies you can create. It seems the feature you are looking for is not currently supported. However, you can file a feature request using this documentation.
One option would be to use a rate of change condition.
https://cloud.google.com/monitoring/alerts/types-of-conditions#metric-threshold
I don't know if it's exactly what you're looking for, but it should let you get alerts on big changes between runs.

Can I get the uptime check percentage for a particular uptime check via stackdriver monitoring api?

I have uptime checks configured for my instances in Google Cloud Stackdriver. Now I need to programmatically retrieve the uptime percentage for a given uptime check. Is there an API available for this?
I checked the documentation and didn't find any API to do so.
There appear to be a couple of metrics of interest:
monitoring.googleapis.com/uptime_check/check_passed - True if check passed
monitoring.googleapis.com/uptime_check/request_latency - Latency in msecs
See also: Creating Uptime Charts
One can use these metrics for charts and alerts. In addition, since all metric data is retrievable as time series via APIs (at least the REST APIs), you can periodically retrieve the data and perform calculations on it.
Also, distinguish these from the metric compute.googleapis.com/instance/uptime, which measures how long a VM has been running.
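As a rough sketch of the "perform calculations" step, assuming you have already retrieved the check_passed points for one uptime check over some period and reduced them to a list of booleans, the percentage is just the share of passing points. The function name and the sample data here are illustrative only.

```python
def uptime_percentage(check_results):
    """Percentage of checks that passed.

    check_results: iterable of booleans, one per check_passed data point.
    Returns None when there are no data points, since a percentage
    would be meaningless in that case.
    """
    results = list(check_results)
    if not results:
        return None
    passed = sum(1 for ok in results if ok)
    return 100.0 * passed / len(results)


# Hypothetical data: nine passing checks and one failure.
results = [True] * 9 + [False]
print(uptime_percentage(results))
```

This weights every data point equally, so it only matches a time-based uptime figure if the checks ran at a regular interval.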

Custom metric in CloudWatch

How many requests per minute can I send for each custom metric in CloudWatch?
P.S. I know that for standard metrics it is 1 per minute.
Here's a quote from the section Publishing Single Data Points in the Amazon CloudWatch Developer Guide:
Although you can publish data points with time stamps as granular as one-thousandth of a second, Amazon CloudWatch aggregates the data to a minimum granularity of one minute