Dataflow Error Monitoring and Alerting - google-cloud-platform

What is the best way to set up Dataflow resource monitoring and alerting on Dataflow errors?
Is it custom log based metrics only?
Checked Cloud Monitoring - Dataflow is not listed there - no metrics available.
Checked Error Reporting - it is empty too, despite a few of my flows failing.
What am I missing?

Update March 2017: Stackdriver Monitoring integration with Dataflow is now in beta. Review the user docs, and listen to Dataflow engineers talk about it at GCP Next.
For the time being you could set up alerts based on Dataflow logs (go to Stackdriver Logging and set up alerts there). We are also working on better alerting using Stackdriver Monitoring and will post an announcement to our Big Data blog when it's in beta.
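While waiting for the Monitoring integration, the log-based route above can be sketched with a logs-based metric that counts Dataflow errors. This is a minimal sketch: it only builds the request body for the Logging API v2 `projects.metrics.create` call (the metric name is a placeholder of my choosing), and assumes Dataflow job logs appear under the `dataflow_step` resource type.

```python
import json


def dataflow_error_metric(name="dataflow-errors"):
    """Build a Logging API v2 projects.metrics.create request body for a
    counter metric that matches ERROR-level Dataflow job log entries."""
    return {
        "name": name,
        "description": "Count of Dataflow job log entries at severity ERROR or above",
        "filter": 'resource.type="dataflow_step" severity>=ERROR',
    }


if __name__ == "__main__":
    print(json.dumps(dataflow_error_metric(), indent=2))
```

Once the metric exists, an alerting policy with a threshold on its count gives you the notification.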

Related

Manually triggering alert in Google Cloud Monitoring

I want to be able to programmatically trigger alerts in Google Cloud Monitoring. Basically I have a watchdog that I want to execute certain actions based on multiple criteria. One of those actions is that I want it to trigger a new alert in Google Cloud Monitoring.
Is there a smooth way to do this?
So far my best guesses are:
Setup an alert policy on a custom metric (like isTriggeringAlert>0)
Write the actions to a log ("[ALERT]: ....") and use Cloud Monitoring to catch that log
Both work, but I was wondering whether there is a programmatic way to trigger an alert instead. I haven't found anything in the Python SDK for Cloud Monitoring (only how to create monitoring policies).
Regards,
Niklas
This feature has been requested, but isn't available yet.
As a workaround, you can try writing appropriate data into timeSeries using this API.
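That workaround can be sketched as follows. The helper only builds the request body for the Monitoring v3 `projects.timeSeries.create` call; the custom metric type `custom.googleapis.com/watchdog/is_triggering_alert` is a hypothetical name matching the `isTriggeringAlert > 0` idea above, and you would still need to POST the body with authenticated credentials.

```python
import time


def alert_time_series(project, value=1, end_time=None,
                      metric_type="custom.googleapis.com/watchdog/is_triggering_alert"):
    """Build a Monitoring v3 projects.timeSeries.create request body that
    writes one point to a custom metric. An alerting policy with a "> 0"
    threshold on this metric then delivers the actual notification."""
    if end_time is None:
        # RFC 3339 timestamp for "now", as required by the interval field.
        end_time = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
    return {
        "timeSeries": [{
            "metric": {"type": metric_type},
            # Custom metrics on the "global" resource need the project_id label.
            "resource": {"type": "global", "labels": {"project_id": project}},
            "points": [{
                "interval": {"endTime": end_time},
                "value": {"int64Value": str(value)},
            }],
        }]
    }
```

The watchdog would call this whenever its criteria are met and write the point, letting the alerting policy do the triggering.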

GCP Logging log-based metric from folder-level logs?

In Google Cloud Logging (née Stackdriver), can I create a log-based metric for logs aggregated at folder/organization level? I want to have a log-based metric for a certain audit event across many projects.
This isn't currently supported. You can export logs to a different project, but can't have metrics encapsulating more than one project.
If you think that functionality should be available, you can file a feature request in the Public Issue Tracker.

How can I get notifications for all stderr logs sent to Google Cloud Logging?

I'd like to get notifications for all standard error logs sent to Google Cloud Logging. Preferably, I'd like to get the notifications through Google Cloud Error Reporting, so I can easily get notifications on my phone through the GCP mobile app.
I've deployed applications to Google Kubernetes Engine that are writing logs to standard error, and GKE is nicely forwarding all the stderr logs to Google Cloud Logging with logName: "projects/projectName/logs/stderr"
I see the logs show up in Google Cloud Logging, but Error Reporting does not pick up on them.
I've tried troubleshooting as described here: https://cloud.google.com/error-reporting/docs/troubleshooting. But the proposed solutions revolve around formatting the logs in a certain way. What if I've deployed applications for which I can't control the log messages?
A (totally ridiculous) option could be to create a "logs-based metric" based on any log sent to stderr, then get notified whenever that metric exceeds 1.
What's the recommended way to get notified for stderr logs?
If Error Reporting is not recognizing the stderr logs from your container, it means they are not formatted in a way the API can detect.
Take a look at this guide on how to set up Error Reporting for GKE.
There are other ways to do this with third-party products like gSlack, where you export the Stackdriver logs to Pub/Sub and then send them into a Slack channel with Cloud Functions.
You can also try doing it with Cloud Build, integrating it with the GKE container logs.
Still, I think the best and easiest option is to use a Monitoring alert.
You can force the error by setting the @type field as shown in the docs. For some reason, even though this is the Google library, and it has code to detect that an exception was thrown, with its stack trace, it won't recognize it as an error worth reporting.
I also added the serviceContext field to be able to identify my service in Error Reporting.
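For cases where you do control the log output, the @type trick above can be sketched like this. The sketch wraps a plain message in the structure Error Reporting scans for; the service name and version are placeholder values.

```python
import json
import sys

# The @type value documented for forcing Error Reporting to treat a log
# entry as an error event, even without a parseable stack trace.
ERROR_EVENT_TYPE = ("type.googleapis.com/google.devtools.clouderrorreporting."
                    "v1beta1.ReportedErrorEvent")


def error_report_entry(message, service="my-service", version="1.0.0"):
    """Wrap a log message in the structure Error Reporting detects.
    serviceContext groups the resulting errors under a named service."""
    return {
        "@type": ERROR_EVENT_TYPE,
        "message": message,
        "serviceContext": {"service": service, "version": version},
    }


if __name__ == "__main__":
    # GKE forwards stderr into Cloud Logging; a JSON line becomes the
    # entry's jsonPayload, where Error Reporting can pick it up.
    print(json.dumps(error_report_entry("something went wrong")), file=sys.stderr)
```

This only helps for your own applications; for third-party containers whose output you can't change, the logs-based-metric or Monitoring-alert routes above remain the fallback.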

Monitoring Alert for Cloud Build failure on master

I would like to receive a notification on my Notification Channel every time in Cloud Build a Build on master fails.
There were mentions of using the Logs Viewer, but it seems there is no immediate way of accessing the branch.
Is there another way where I can create a Monitoring Alert/a Metric which is specific to master?
An easy solution might be to define a logs-based metric and link an alerting trigger to it.
Configure Slack alerting in Notification channels of GCP.
Define your logging metric trigger in Logs-based Metrics. Create a Counter with unit 1 and filter using the Logging query language:
resource.type="build"
severity=ERROR
or
resource.type="build"
textPayload=~"^ERROR:"
Create an Alerting Policy with the metric you've just defined and link the trigger to the Slack notification channel you configured in step 1.
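The metric-creation step can be sketched as a request body for the Logging API v2 `projects.metrics.create` call, using either of the two filters above. The metric name is a placeholder; note that neither filter restricts by branch, which is the limitation the question is about.

```python
def build_failure_metric(use_text_payload=False):
    """Build a Logging API v2 projects.metrics.create request body for a
    counter matching failed Cloud Build runs, with either of the two
    filters suggested above."""
    flt = ('resource.type="build" textPayload=~"^ERROR:"'
           if use_text_payload
           else 'resource.type="build" severity=ERROR')
    return {
        "name": "cloud-build-failures",
        "description": "Failed Cloud Build runs",
        "filter": flt,
    }
```

An alerting policy on this counter then notifies on any failed build, regardless of branch.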
You can create Cloud Build notifications that send updates to a desired channel, such as Slack or your SMTP server. Cloud Build also publishes to a Pub/Sub topic when your build's state changes, such as when your build is created or transitions to a working state.
I just went through the pain of trying to get the official GCP slack integration via Cloud Run working. It was too cumbersome and didn't let me customize what I wanted.
The best solution I see is to get Cloud Build set up to send Pub/Sub messages to the cloud-builds topic. With that, you can use the repo below, which I just made public, to filter on the specific branch you want by looking at the data_json['substitutions']['BRANCH_NAME'] field.
https://github.com/Ucnt/gcp-cloud-build-slack-notifier
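The branch-filtering core of that approach can be sketched as follows. This is not the repo's actual code: it is a minimal handler for a Pub/Sub push message from the cloud-builds topic, assuming the message data is a base64-encoded JSON build resource with the branch under `substitutions.BRANCH_NAME`.

```python
import base64
import json


def should_notify(pubsub_message, branch="master"):
    """Decode a Pub/Sub message from the cloud-builds topic and decide
    whether it describes a failed build on the watched branch."""
    build = json.loads(base64.b64decode(pubsub_message["data"]))
    return (build.get("status") == "FAILURE"
            and build.get("substitutions", {}).get("BRANCH_NAME") == branch)
```

A Cloud Function subscribed to the topic would call this and, when it returns True, post the build details to a Slack incoming webhook (or any other channel).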

Stackdriver Metrics through Pub/Sub?

I was curious whether Stackdriver metrics are only available via the API, or if there is a way to send them through Pub/Sub. I'm currently not seeing any of the metrics listed here for Compute Engine in my Pub/Sub output.
I did create a sink to export logs for all GCE VM instances from Stackdriver Logging to Pub/Sub, and I'm not seeing any of the metrics there.
There are a few different types of signals that Stackdriver organizes: metrics, logs, traces, and errors, plus derived signals like incidents and error groups. Logs can be exported via Pub/Sub using sinks. Metrics, traces, and errors can only be pulled via the API today.
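If you need metrics in Pub/Sub anyway, one workaround is a poller that reads them from the API on a schedule and republishes the points itself. A minimal sketch of the read side: this only builds the query parameters for the Monitoring v3 `projects.timeSeries.list` call (the CPU-utilization metric type is just an example), leaving the authenticated request and the Pub/Sub publish to the caller.

```python
import time


def time_series_query(project,
                      metric_type="compute.googleapis.com/instance/cpu/utilization",
                      minutes=5, now=None):
    """Build query parameters for Monitoring v3 projects.timeSeries.list,
    covering the last `minutes` minutes of one metric type."""
    if now is None:
        now = time.time()
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # RFC 3339 timestamps, as the API expects
    return {
        "name": "projects/%s" % project,
        "filter": 'metric.type="%s"' % metric_type,
        "interval.startTime": time.strftime(fmt, time.gmtime(now - 60 * minutes)),
        "interval.endTime": time.strftime(fmt, time.gmtime(now)),
    }
```

A cron-triggered job (or Cloud Function) would issue this request, then publish each returned point to its own Pub/Sub topic.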