I have a DataFlow job with a counter metric. On every restart the metric is reset to zero, as expected. The problem is that when using the counter in gcp Metrics explorer, I cannot get an accumulated value for the metric, disregarding restarts. Prometheus has a function called increase() that does this. Is there a similar function for gcp metrics explorer?
One approuch to metrics across runs would be to make use of Cloud Monitoring. There is a good how to on the features and usage of custom metrics.
If you use job names that you can apply a regexp to then you can make use of the filters to aggregate them into a graph.
Related
I'm trying to combine certain number of similar metrics into a single alarm in aws cloud watch. For example lets say for data quality monitoring in sagemaker, one among the metrics that are emitted from data quality monitoring job is feature baseline drift distance for each column so let say I've 600 columns so each column will have this metric. Is there a possible way to compress these metrics into a single cloud watch alarm ?
If not, Is there anyway to send the violation report as message via AWS SNS?
While I am not sure exactly on what out come you want when you refer to "compress the metrics into a single alarm." You can look at using metric math
I wanted to get notified if/when there is/are any VM creation in my infra on GCP.
I see a google library that can give me list of VM.
I can create a function to use this code (probably)
Schedule the above function. And check for difference.
But do storage like triggers available for Compute.
Also if there is any other solution.
You have a third solution. You can use Cloud Run instead of Cloud Functions (the migration is very easy, let me know if you have issues).
With Cloud Run, you can use the trigger (eventArc feature), a new feature (still in preview) based on the auditLog logs. It's very similar to the first solution proposed by LundinCast, but it's automatically set up by Cloud Run Trigger feature.
So, deploy your service on Cloud Run. Then configure a trigger on v1.compute.instancs.insert API, select your region or make the trigger global and that's all!! Your service will be triggered when a new instance will be created.
As you can see in my screenshot, you will be asked to activate the auditLog to be able to use this feature. Because it's built-in, it's done automatically for you!
Using Logging sink and a PubSub-triggered Cloud Function
First, export the relevant logs to a PubSub topic of your choice by creating a Logging sink. Include the logs created automatically during VM creation with the following log filter:
resource.type="gce_instance"
protoPayload.methodName="beta.compute.instances.insert"
protoPayload.methodName="compute.instances.insert"
Next, create a Cloud Function that'll trigger every time a new log is set to the PubSub topic. You can process this new message as per your needs.
Note that with this option you'll have to handle to notification yourself (for example, by sending an email). It is useful though if you want to send different notification based on some condition or if you want to perform additional actions apart from the notification.
Using a log-based metric and a Cloud Monitoring alert
You can use a Log-based metric filtering logs for Compute Engine VM creation and set an alert on that metric to get notified.
First create a counter log-based metric with a log filter similar to the one in the previous method, which will report a data point to Cloud monitoring every time a new VM instance is created.
Then go to Cloud Monitoring and create an alert based on that metric that trigger every time a metric is reported.
This option is the easiest to set up and supports various notification channels out-of-the-box.
Going along with LudninCast's answer.
Cloud Run --
Would have used it if it had not been zone issue for me. Though I conclude this from POC I did
Easy setup.
Containerised Apps. Probably more code to maintain.
Public URL for app.
Out of box support for the requirements like mine.
Cloud Function --
Sink setups for triggers can be time consuming for first timer
Easy coding and maintainance.
I have uptime checks configured for my instances in google cloud stackdriver. Now I need to programatically check the uptime percentage through that uptime check. Is there any api available for the same ?
I checked the documentation and didn't find any api to do so.
There appears to be a couple of metrics of interest:
monitoring.googleapis.com/uptime_check/check_passed - True if check passed
monitoring.googleapis.com/uptime_check/request_latency - Latency in msecs
see also: Creating Uptime Charts
Once can use these metrics for charts and alerts. In addition, since all metric data is retrievable as time-series information using APIs (at least REST APIs) then you can periodically retrieve the data and perform calculations upon it.
Also distinguish this from the metric called compute.googleapis.com/instance/uptime which is how long a VM has been running.
I can find a graph of "Group size" in the page of the instance group.
However, when I try to find this metric in Stackdriver, it doesn't exist.
I tried looking in the metricDescriptors API, but it doesn't seem to be there either.
Where can I find this metric?
I'm particularly interested in sending alerts when this metrics goes to 0.
There is not a Stackdriver Monitoring metric for this data yet. You can fetch the size using the instanceGroups.get API call. You could create a system that polls this data and posts it back to Stackdriver Monitoring as a custom metric and then you will be able to access it from Stackdriver.
I run a Google Cloud dataflow job. I know how to monitor elementCount metric coming from it. But that metric shows me the total number of events processed by the job from its start. But how to monitor the rate? Like events per timespan, per minute in Stackdriver?
Ideally, I would like to apply a simple transformation on the elementCount metric inside the Stackdriver. But I'm afraid I would need to send a separate metric computed in the Dataflow job...
You can access all the stackdriver metrics via the API (although the elementCount is a gauge, you can fetch the time series). Here are all the dataflow metric in StackDriver:
https://cloud.google.com/monitoring/api/metrics_gcp#gcp-dataflow
Probably you need todo some calculations on the timeseries if you want to have the correct rate per time windows.
The API timeseries documentation is here:
https://cloud.google.com/monitoring/api/ref_v3/rpc/google.monitoring.v3
You can even access the API's in your dataflows. Note, that I think the way the metrics is used it should have been a counter.