Monitoring Alert for Cloud Build failure on master

I would like to receive a notification on my Notification Channel every time a Cloud Build build on master fails.
There were mentions of using the Logs Viewer, but there seems to be no immediate way of accessing the branch there.
Is there another way to create a Monitoring alert or a metric that is specific to master?

An easy solution might be to define a logs-based metric and link an alerting policy to it.
1. Configure Slack alerting in Notification channels of GCP.
2. Define your logging metric in Logs-based Metrics. Make a Counter with Units 1 and filter using the logging query language:
resource.type="build"
severity=ERROR
or
resource.type="build"
textPayload=~"^ERROR:"
3. Create an Alerting Policy with the metric you've just defined and link it to the Slack notification channel you configured in step 1.
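
As a rough illustration of the counter metric described above, here is a minimal sketch that builds a request body for a logs-based metric. The field names follow my understanding of the Cloud Logging LogMetric REST resource; the metric name `cloud_build_errors` and the helper `build_log_metric` are hypothetical. It only constructs the body and does not call any API.

```python
import json

# Hypothetical metric name, chosen for illustration only.
METRIC_NAME = "cloud_build_errors"


def build_log_metric(name: str = METRIC_NAME) -> dict:
    """Return a LogMetric-style body for a counter of Cloud Build error logs."""
    # Same filter as in the answer, written on one line with an explicit AND.
    log_filter = 'resource.type="build" AND severity=ERROR'
    return {
        "name": name,
        "description": "Counts Cloud Build log entries with severity ERROR",
        "filter": log_filter,
        # A counter metric: DELTA kind, integer values, unit "1".
        "metricDescriptor": {
            "metricKind": "DELTA",
            "valueType": "INT64",
            "unit": "1",
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_log_metric(), indent=2))
```

Note that this filter, like the one in the answer, counts errors from all builds; it does not by itself restrict the metric to master.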

You can configure Cloud Build notifications that send updates to desired channels, such as Slack, your SMTP server, or an HTTP endpoint. Cloud Build also publishes a message to a Pub/Sub topic whenever your build's state changes, such as when your build is created or when it transitions to a working state.

I just went through the pain of trying to get the official GCP slack integration via Cloud Run working. It was too cumbersome and didn't let me customize what I wanted.
The best solution I see is to set up Cloud Build to send Pub/Sub messages to the cloud-builds topic. With that, you can use the repo below, which I just made public, to filter on the specific branch you want by looking at the data_json['substitutions']['BRANCH_NAME'] field.
https://github.com/Ucnt/gcp-cloud-build-slack-notifier
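
The branch check described above can be sketched roughly as follows. This is not the code from the linked repo; it is a minimal sketch that assumes the Pub/Sub message carries the build as base64-encoded JSON in its `data` field, with `status` and `substitutions` keys. The function name and the set of statuses treated as failures are my own choices.

```python
import base64
import json

# Build statuses treated as failures in this sketch (an assumption; adjust as needed).
FAILED_STATUSES = {"FAILURE", "INTERNAL_ERROR", "TIMEOUT"}


def is_failed_build_on_branch(pubsub_message: dict, branch: str = "master") -> bool:
    """Return True if the message describes a failed build on the given branch."""
    data_json = json.loads(base64.b64decode(pubsub_message["data"]).decode("utf-8"))
    # Manually started builds may have no substitutions at all.
    substitutions = data_json.get("substitutions") or {}
    return (
        data_json.get("status") in FAILED_STATUSES
        and substitutions.get("BRANCH_NAME") == branch
    )


def encode_build(build: dict) -> dict:
    """Helper for local testing: wrap a build dict like a Pub/Sub message."""
    raw = json.dumps(build).encode("utf-8")
    return {"data": base64.b64encode(raw).decode("ascii")}
```

A Pub/Sub-triggered function could call `is_failed_build_on_branch` and post to Slack only when it returns True.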

Related

MediaLive (AWS) how to view channel alerts from php SDK

Question
I have set up a Laravel project that connects to AWS MediaLive for streaming.
Everything is working fine, and I am able to stream, but I couldn't find a way to see if a channel that was running had anyone connected to it.
What I need
I want to be able to see if a running channel has anyone connected to it via the php SDK.
Why
I want to show a stream on the user's side only if there is someone connected to it.
I want to stop a channel that has had no one connected to it for too long (like an hour?)
Other
I tried looking at the docs but the closest thing I could find was the DescribeChannel command.
This, however, does not return any information about the alerts. I also tried comparing the output of DescribeChannel when someone was connected and when no one was connected, but there was no difference.
On the AWS site I can see the alerts on the channel page, but I cannot find how to view that from my laravel application.
Update
I tried running these from the SDK:
CloudWatch->DescribeAlarms();
CloudWatchLogs->GetLogEvents(['logGroupName'=>'ElementalMediaLive', 'logStreamName'=>'channel-log-stream-name']);
But it seems to me that their output didn't change after a channel started running without anyone connected to it.
I went on the console's CloudWatch and it was the same.
Do I need to first set up Egress Points for alerts to show here?
I looked into SNS Topics and Lambda functions, but it seems they are for sending messages and notifications? Can I also use this to stop/delete a channel that has been disconnected for over an hour? Are there any docs that could help me?
I'm using AWS MediaStore, but I'm guessing I can do the same as AWS MediaPackage? How can the threshold tell me if, and for how long no-one has been connected to a MediaLive channel?
Overall
After looking here and there in the docs I am assuming I have to:
1. set up a metric alarm that detects when a channel had no input for over an hour
2. Send the alarm message to the CloudWatchLogs
3. retrieve the alarm message from the SDK and/or the SNS Topic
4. stop/delete the channel that sent the alarm message
Did I understand this correctly?

Thanks for your post.
Channel alerts will go to your AWS CloudWatch logs. You can poll these alarms from the SDK or CLI using a command of the form 'aws cloudwatch describe-alarms'. Related log events may be retrieved with a command of the form 'aws logs get-log-events'.
You can also configure a CloudWatch rule to propagate selected service alerts to an SNS Topic, which can be polled by various clients including a Lambda function, which can then take various actions on your behalf. This approach works well to aggregate the alerts from multiple channels or services.
Measuring the connected sessions is possible for MediaPackage endpoints, using the 2xx Egress Request Count metric. You can set a metric alarm on this metric such that when its value drops below a given threshold, an alarm message will be sent to the CloudWatch logs mentioned above.

With regard to your list:
set up a metric alarm that detects when a channel had no input for over an hour
----->CORRECT.
Send the alarm message to the CloudWatchLogs
----->The alarm message goes directly to an SNS Topic, and will be echoed to your CloudWatch logs. See: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html
A Lambda function will need to be created to process new entries arriving in the SNS topic (queue) mentioned above and take the desired action. This Lambda function can send API or CLI calls to stop/delete the channel that sent the alarm message. You can also have email alerts or other actions triggered from the SNS Topic (queue); refer to https://docs.aws.amazon.com/sns/latest/dg/sns-common-scenarios.html
Alternatively, you could do everything in one lambda function that queries the same MediaPackage metric (EgressRequestCount), evaluates the response and takes a yes/no action WRT shutting down a specified channel. This lambda function could be scheduled to run in a recurring fashion every 5 minutes to achieve the desired result. This approach would be simpler to implement, but is limited in scope to the metrics and actions coded into the Lambda Function. The Channel Alert->SNS->LAMBDA approach would allow you to take multiple actions based on any one Alert hitting the SNS Topic (queue).
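
The single-function alternative could look roughly like the sketch below. It assumes the clients passed in behave like boto3's `cloudwatch` and `medialive` clients (`get_metric_statistics` and `stop_channel`); the function names, the use of the `Channel` dimension, and the choice to treat "no datapoints" as "no viewers" are assumptions on my part. The clients are injected so the logic can be exercised without AWS access.

```python
from datetime import datetime, timedelta, timezone


def total_egress_requests(cloudwatch, mediapackage_channel_id, window_minutes=60, now=None):
    """Sum EgressRequestCount for a MediaPackage channel over the last window."""
    now = now or datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/MediaPackage",
        MetricName="EgressRequestCount",
        # Assumption: the metric is queried by the "Channel" dimension.
        Dimensions=[{"Name": "Channel", "Value": mediapackage_channel_id}],
        StartTime=now - timedelta(minutes=window_minutes),
        EndTime=now,
        Period=300,  # 5-minute buckets
        Statistics=["Sum"],
    )
    return sum(point["Sum"] for point in response.get("Datapoints", []))


def stop_if_idle(cloudwatch, medialive, mediapackage_channel_id,
                 medialive_channel_id, window_minutes=60):
    """Stop the MediaLive channel if nothing was requested during the window.

    Returns True if a stop was requested, False otherwise.
    """
    requests = total_egress_requests(cloudwatch, mediapackage_channel_id, window_minutes)
    if requests > 0:
        return False
    # No datapoints is treated as "idle" here (an assumption of this sketch).
    medialive.stop_channel(ChannelId=medialive_channel_id)
    return True
```

In a Lambda handler you would typically create the clients with `boto3.client("cloudwatch")` and `boto3.client("medialive")` and call `stop_if_idle` on a schedule.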

Manually triggering alert in Google Cloud Monitoring

I want to be able to programmatically trigger alerts in Google Cloud Monitoring. Basically I have a watchdog that I want to execute certain actions based on multiple criteria. One of those actions is that I want it to trigger a new alert in Google Cloud Monitoring.
Is there a smooth way to do this?
So far my best guess is:
Setup an alert policy on a custom metric (like isTriggeringAlert>0)
Write the actions to a log ("[ALERT]: ....") and use Cloud Monitoring to catch that log
Both work, but I was wondering if there is a programmatic way to trigger an alert instead. I haven't found anything in the Python SDK for Cloud Monitoring (just how to create monitoring policies).
Regards,
Niklas

This feature has been requested, but isn't available yet.
As a workaround, you can try writing appropriate data into timeSeries using this API.
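
To make the workaround more concrete, here is a minimal sketch that builds a request body for writing one data point to a custom metric, in the shape I understand the Cloud Monitoring `projects.timeSeries.create` method to expect. The metric type `custom.googleapis.com/watchdog/is_triggering_alert` and the helper name are hypothetical. The sketch only builds the body; sending it (with authentication) is left out.

```python
from datetime import datetime, timezone

# Hypothetical custom metric type, chosen for illustration only.
DEFAULT_METRIC_TYPE = "custom.googleapis.com/watchdog/is_triggering_alert"


def build_time_series_body(project_id, value, metric_type=DEFAULT_METRIC_TYPE, now=None):
    """Build a timeSeries.create-style body holding a single integer point."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    end_time = now.strftime("%Y-%m-%dT%H:%M:%SZ")  # RFC 3339 timestamp in UTC
    return {
        "timeSeries": [
            {
                "metric": {"type": metric_type},
                # The "global" monitored resource is used when nothing more specific fits.
                "resource": {"type": "global", "labels": {"project_id": project_id}},
                "points": [
                    {
                        "interval": {"endTime": end_time},
                        # 64-bit integers are sent as strings in JSON.
                        "value": {"int64Value": str(value)},
                    }
                ],
            }
        ]
    }
```

An alert policy with a condition such as "value above 0" on this metric would then fire whenever the watchdog writes a 1.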

Compute Engine VM Creation Notification

I want to be notified whenever a VM is created in my infrastructure on GCP.
I see a Google library that can give me a list of VMs.
I could (probably) create a function that uses this code,
schedule that function, and check for differences.
But are Storage-like triggers available for Compute Engine?
I am also open to any other solution.

You have a third solution. You can use Cloud Run instead of Cloud Functions (the migration is very easy, let me know if you have issues).
With Cloud Run, you can use triggers (the Eventarc feature), a new feature (still in preview) based on the audit logs. It's very similar to the first solution proposed by LundinCast, but it's set up automatically by the Cloud Run trigger feature.
So, deploy your service on Cloud Run. Then configure a trigger on the v1.compute.instances.insert API, select your region or make the trigger global, and that's all! Your service will be triggered when a new instance is created.
As you can see in my screenshot, you will be asked to activate the auditLog to be able to use this feature. Because it's built-in, it's done automatically for you!

Using Logging sink and a PubSub-triggered Cloud Function
First, export the relevant logs to a PubSub topic of your choice by creating a Logging sink. Include the logs created automatically during VM creation with the following log filter:
resource.type="gce_instance"
(protoPayload.methodName="beta.compute.instances.insert" OR
protoPayload.methodName="compute.instances.insert")
Next, create a Cloud Function that'll trigger every time a new log is sent to the PubSub topic. You can process this new message as per your needs.
Note that with this option you'll have to handle the notification yourself (for example, by sending an email). It is useful, though, if you want to send different notifications based on some condition or if you want to perform additional actions apart from the notification.
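
A Pub/Sub-triggered function for this option might look like the sketch below. It assumes the sink delivers each log entry as base64-encoded JSON in `event["data"]` and that the entry has the usual audit-log fields (`protoPayload.methodName`, `resourceName`, `authenticationInfo.principalEmail`); the function names are hypothetical, and the `print` call stands in for whatever notification you send.

```python
import base64
import json


def describe_vm_creation(event: dict):
    """Return a short message for a VM-creation log entry, or None otherwise."""
    entry = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    payload = entry.get("protoPayload", {})
    # Matches "compute.instances.insert" with or without an API-version prefix.
    if not payload.get("methodName", "").endswith("compute.instances.insert"):
        return None
    who = payload.get("authenticationInfo", {}).get("principalEmail", "unknown")
    resource = payload.get("resourceName", "unknown")
    return f"VM created: {resource} by {who}"


def notify_vm_created(event, context=None):
    """Entry point sketch for a Pub/Sub-triggered function."""
    message = describe_vm_creation(event)
    if message:
        print(message)  # Replace with an email or Slack call of your choice.
    return message
```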
Using a log-based metric and a Cloud Monitoring alert
You can use a Log-based metric filtering logs for Compute Engine VM creation and set an alert on that metric to get notified.
First create a counter log-based metric with a log filter similar to the one in the previous method, which will report a data point to Cloud monitoring every time a new VM instance is created.
Then go to Cloud Monitoring and create an alert based on that metric that triggers every time a data point is reported.
This option is the easiest to set up and supports various notification channels out-of-the-box.

Going along with LundinCast's answer.
Cloud Run --
I would have used it if not for a zone issue in my case, although I only concluded this from a POC I did.
Easy setup.
Containerised Apps. Probably more code to maintain.
Public URL for app.
Out of box support for the requirements like mine.
Cloud Function --
Sink setup for triggers can be time-consuming for a first-timer.
Easy coding and maintenance.

How do I send a notification to Slack from AWS CloudWatch on a specific error?

I'm trying to setup notifications to be sent from our AWS Lambda instance to a Slack channel. I'm following along in this guide:
https://medium.com/analytics-vidhya/generate-slack-notifications-for-aws-cloudwatch-alarms-e46b68540133
I get stuck on step 4 however because the type of alarm I want to setup does not involve thresholds or anomalies. It involves a specific error in our code. We want to be notified when users encounter errors when attempting to login in or sign up. We have try/catch blocks in our Node.js backend to log errors to CloudWatch at various points in the login/signup flow where we think the errors are most likely happening. We would like to identify when those SPECIFIC errors are occurring and send a notification to a Slack channel built for this purpose.
So in step 4 of the article, what would I have to do to set this up? Or is the approach in this article simply the wrong one for my purposes?
Thanks.

Step 4, titled "Create a CloudWatch Alarm", uses the CPUUtilization metric to trigger an alarm.
In your case, since you want to use CloudWatch Logs, you would create CloudWatch Metric Filters based on the log entries of interest. This would produce custom metrics based on your error string. Subsequently, you would create a CloudWatch Alarm on this metric, as shown in the linked tutorial for CPUUtilization.
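
As a sketch of what such a metric filter could look like, the helper below builds the keyword arguments for boto3's CloudWatch Logs `put_metric_filter` call. The log group, error marker, metric name, and namespace are all hypothetical placeholders; the helper itself only builds a dict, so nothing is created until you pass it to a real client.

```python
def build_metric_filter_args(log_group, error_marker,
                             metric_name="LoginSignupErrors",
                             namespace="MyApp/Auth"):
    """Build keyword arguments for a CloudWatch Logs put_metric_filter call."""
    return {
        "logGroupName": log_group,
        "filterName": f"{metric_name}-filter",
        # A double-quoted pattern matches log events containing that exact phrase.
        "filterPattern": f'"{error_marker}"',
        "metricTransformations": [
            {
                "metricName": metric_name,
                "metricNamespace": namespace,
                "metricValue": "1",   # count one per matching log event
                "defaultValue": 0,    # report 0 when nothing matches
            }
        ],
    }


# Usage sketch (requires boto3 and AWS credentials, so it is left commented out):
# import boto3
# args = build_metric_filter_args("/aws/lambda/my-backend", "LOGIN_ERROR")
# boto3.client("logs").put_metric_filter(**args)
```

The alarm from the tutorial would then be created on `LoginSignupErrors` in the `MyApp/Auth` namespace instead of on CPUUtilization.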

How/Where to log, audit, and alert on changes to Google Cloud Function code?

How would our organization log, audit, and alert on any code changes (add, change, delete) to Google Cloud Functions to survive an external audit? We've figured out how to do so on AWS (combination of CloudTrail and CloudWatch Events/Amazon EventBridge) and Azure (Audit log and Alerts under the Monitor service, although this is not as reliable as the AWS solution because some events do not seem to be picked up. Azure even has this nice new service in preview called Application Change Analysis, but it does not alert, and it goes away when a function is deleted instead of reporting that it has been deleted.)
But how do we do the same thing with Google Cloud Functions? How would we log and audit the creation/update/deletion of Cloud Functions and Cloud Function code? How would we go even further and receive an alert whenever any of those conditions occur, just like we have proven can happen with AWS and (kind of, at least) with Azure? Thank you!

You can use the Cloud Functions audit logs. You can export the logs to PubSub, and then do what you want with each event:
Store them in BigQuery for the history
Send an alert (email, slack message,...)
Act: for example, perform a rollback to the previous code stored in the source repository
...
It all depends on your security process and what you want to do with the events.
