How to be notified on Google Cloud Function timeout? - google-cloud-platform

I've enabled the notifications in Stackdriver and I'm getting notification e-mails for exceptions just fine.
The problem is that I don't get any notification for timeouts.
Is there any way to be notified when a Google Cloud Function is killed by timeout?

Even though a timeout is not reported as an error, you can still set up a metric for timeout log entries, and then an alert on the metric exceeding a zero threshold.
From the GCP console, go to the Stackdriver Logging viewer (/logs/viewer), and build a filter like this:
resource.type="cloud_function"
resource.labels.function_name="[YOUR_FUNCTION_NAME_HERE]"
"finished with status: timeout"
The third line is a "contains" text filter. Timeout messages consistently contain this text. You can add other things or modify as needed.
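The filter semantics can be approximated in plain JavaScript. This is a rough sketch, not how Cloud Logging actually evaluates queries: it does exact matches on the two resource fields plus a substring ("contains") match on the log text, and it assumes simplified entry objects of the shape `{ resource: { type, labels }, textPayload }`. A counter metric built from this filter then amounts to counting matching entries per interval, which is why an alert on "above 0" fires on the first timeout.

```javascript
// Builds the three-line filter shown above for a given function name.
function buildTimeoutFilter(functionName) {
  return [
    'resource.type="cloud_function"',
    `resource.labels.function_name="${functionName}"`,
    '"finished with status: timeout"',
  ].join("\n");
}

// Rough local approximation of the filter (assumed, simplified entry shape).
function isTimeoutEntry(entry, functionName) {
  const labels = entry.resource.labels || {};
  return (
    entry.resource.type === "cloud_function" &&
    labels.function_name === functionName &&
    (entry.textPayload || "").includes("finished with status: timeout")
  );
}

// What a counter metric boils down to: the number of matching entries.
function countTimeouts(entries, functionName) {
  return entries.filter((e) => isTimeoutEntry(e, functionName)).length;
}

console.log(buildTimeoutFilter("myFunction"));
```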
Click Create Metric. Give the metric a name like "Function timeouts", and make sure the type is counter. You can leave the optional fields blank. Submit the form, and you should be redirected to /logs/metrics.
Under User-defined Metrics, you should see your new metric. Click the three-dot button on the right and select Create alert from metric.
Give the alert policy a meaningful name. Under target, you may also get some red text about being unable to produce a line plot. Click the helpful link to switch the aligner to mean and the aggregator to none. Then under Configuration, set the condition to "is above," threshold to "0", and for "most recent value."
Proceed with building the notification and documentation as desired. Make sure you add a notification channel so you get alerted. The UI should include hints on each field.
More detail is in the official documentation.

Navigate to "Create alerting policy" using the search box at the top of the dashboard.
Under "What do you want to track?" click "Add Condition."
Configure the new condition like so:
Click "Add."
Click "Next."
Select a notification channel or create a new one.
I unchecked "Notify on incident resolution."
Click "Next."
Provide a descriptive alert name and optional documentation.
Click "Save."
Ensure that at the top of the policy you see the word "Enabled" along with a green checkmark.

Came up with a workaround for this by forcing an error before the Cloud Function times out. In terms of workflow, I find this much easier to control, and it lets me consolidate all the errors in one place rather than having to configure settings elsewhere.
Basically something like the code snippet below:
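exports.cloudFunction = async (event, context) => {
  let timer;
  // Reject 2 s before the default 60 s timeout. A throw inside a setTimeout
  // callback would escape the try/catch below, so reject a Promise instead.
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => {
      reject(new Error(`Timeout: ${JSON.stringify(event)}`));
    }, 58000);
  });
  try {
    const work = (async () => {
      // DO SOMETHING
    })();
    await Promise.race([work, timeout]);
  } catch (e) {
    // HANDLE ERROR, then rethrow so the invocation is reported as failed
    throw e;
  } finally {
    clearTimeout(timer); // stop the timer once the work (or the timeout) settles
  }
};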

Related

GCP Alert Filters Don't Affect Open Incidents

I have an alert that I have configured to send email when the sum of executions of cloud functions that have finished in status other than 'error' or 'ok' is above 0 (grouped by the function name).
The way I defined the alert is:
And the secondary aggregator is delta.
The problem is that once the alert is open, the filters no longer seem to matter, and the alert stays open because it sees that the cloud function is triggered and finishes with any status (even an 'ok' status keeps it open as long as it's triggered often enough).
At the moment, the only solution I can think of is to define a log-based metric that does the counting itself, and then base the alert on that custom metric instead of the built-in one.
Is there something that I'm missing?
Edit:
Adding another image to show what I think might be the problem:
From the image above, we can see that the graph won't go down to 0 but stays at 1, which is not how other incidents normally behave.
According to the official documentation:
"Monitoring automatically closes an incident when it observes that the condition is no longer met or when 7 days have passed without an observation that the condition is still being met."
That suggests there are cases where the incident is not closed even though the condition looks like it should no longer apply, which is confirmed here:
"If measurements are missing (for example, if there are no HTTP requests for a couple of minutes), the policy uses the last recorded value to evaluate conditions."
A lack of HTTP requests is not a reason to close the incident, because the policy keeps using the last recorded value (the one that triggered the alert).
So using alerts on HTTP requests is fine, but you need to close the incidents yourself. If you want them to close automatically, I think a custom metric is the better option.
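The quoted "last recorded value" rule explains the flat line at 1. Below is a small simulation under a simplified model (an assumption for illustration, not Monitoring's real evaluator): each evaluation step either has a measurement or `null`, and a missing measurement falls back to the last one seen.

```javascript
// Simplified model of the documented rule: when a measurement is missing, the
// condition is evaluated against the last recorded value.
// Assumption: `series` holds one entry per evaluation step, null = no measurement.
function evaluateWithCarryForward(series, threshold) {
  let last = null;
  return series.map((value) => {
    if (value !== null) last = value;
    // true means "condition still met", so the incident stays open
    return last !== null && last > threshold;
  });
}

// One failing execution (value 1), then no more traffic for two steps.
console.log(evaluateWithCarryForward([0, 1, null, null], 0));
// [ false, true, true, true ]
```

Once the value 1 is recorded, nothing newer replaces it while the function is idle, so the condition keeps evaluating as met.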

Google cloud alert for not receiving message during an hour?

I have several alerts in GCP for specific causes/actions.
For example, for myFunction:
I get an alert (Slack/mail) if it fails (msg: "failed!"). The alert matches the specific text message "failed!".
But how can I create an alert if my function has not started within an hour (msg: "started!")?
Any suggestions?
Create a custom log-based metric that looks for msg: "started!", then create an alerting policy on it. In the Configuration section, set the condition to "Is absent" and select a duration of 1 hr.
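The same policy can also be described as a request body for the Cloud Monitoring API (`projects.alertPolicies.create`), where an absence condition is a `conditionAbsent` with a `duration`. This is a sketch: the metric name `myfunction_started` and the notification channel path are hypothetical placeholders, and the metric is assumed to be a user-defined log-based counter for entries containing "started!".

```javascript
// Builds an AlertPolicy body that fires when a log-based metric gets no data
// for `absentSeconds`. Metric name and channel paths are hypothetical.
function buildAbsencePolicy(metricName, functionName, absentSeconds, channels) {
  return {
    displayName: `${functionName} has not started`,
    combiner: "OR",
    conditions: [
      {
        displayName: `"started!" absent for ${absentSeconds / 60} min`,
        conditionAbsent: {
          // User-defined log-based metrics live under logging.googleapis.com/user/.
          filter:
            `metric.type="logging.googleapis.com/user/${metricName}" ` +
            `AND resource.type="cloud_function"`,
          duration: `${absentSeconds}s`,
        },
      },
    ],
    notificationChannels: channels,
  };
}

const policy = buildAbsencePolicy("myfunction_started", "myFunction", 3600, [
  "projects/my-project/notificationChannels/1234567890",
]);
console.log(JSON.stringify(policy, null, 2));
```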

GCP Logs-Based Monitoring: Trigger an alert when no logs are received

I have an application that I'm setting up logs-based monitoring for. The application will log whenever it completes a certain task. I want to ensure that the application completes this at least once every 6 hours.
I have tried to replicate this rule by configuring monitoring to fire an alert when the metric stays below 1 for the given amount of time.
Unfortunately, when the logs-based metric doesn't receive any logs, it appears to treat that as "no data" instead of a value of 0.
Is it possible to treat segments when no logs are received as a 0 so that the alert will fire?
Screenshot of my metric graph:
Screenshot of alert definition:
You can see that we receive a log for one time frame, but right afterwards the line disappears and an alert isn't triggered.
Try using absent_for in an MQL-based alert.
The absent_for table operation generates a table with two value columns, active and signal. The active column is true when there is data missing from the table input and false otherwise. This is useful for creating a condition query to be used to alert on the absence of inputs.
Example:
fetch gce_instance :: compute.googleapis.com/instance/cpu/usage_time
| absent_for 8h
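As the question observed, a "below 1" threshold has nothing to compare when no points arrive; absent_for instead turns the missing data itself into the signal. A rough model of its `active` column (only that column is modelled, and the window boundaries are an assumption for illustration):

```javascript
// Rough model of MQL's absent_for: at each evaluation time t, `active` is true
// when no input point arrived in the window (t - windowMs, t].
// Only the `active` column is modelled here; times are in milliseconds.
function absentFor(pointTimes, windowMs, evalTimes) {
  return evalTimes.map((t) => ({
    time: t,
    active: !pointTimes.some((p) => p > t - windowMs && p <= t),
  }));
}

const HOUR = 60 * 60 * 1000;
// One "task completed" log at t = 0, evaluated with the 6-hour rule from the question.
const rows = absentFor([0], 6 * HOUR, [1 * HOUR, 6 * HOUR, 7 * HOUR]);
console.log(rows.map((r) => r.active)); // [ false, true, true ]
```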

How can I customize the entire email notification in Stackdriver Alerting?

Currently, the message specified in the Document field while creating alerting policy appears in the Document field of the Stackdriver alert email.
I would like to overwrite the entire email message body with my custom content.
How can I overwrite the message body of Stackdriver Alert email with my custom message?
Is there any other workaround to do this?
You should be able to send the notification to a webhook, and this could directly be an HTTP-triggered Cloud Function.
This Cloud Function would receive all the information from the alert, and you can follow this tutorial to use SendGrid to send your alerts.
This is a lot more complex than just setting the email notifications, but also provides you with an amazing flexibility regarding alerts, as you'll be able to not just write the message however you want, but you could process the data in any way you want:
You have low priority alerts? Then store them and just send a digest once in a while instead of spamming.
Want to change who is sent the alert depending on a calendar rotation? Use the function to look up who should be notified.
And those are just some random quick ideas I got while writing this message.
The information provided in the POST body looks like this (just a sample):
{
  "incident": {
    "incident_id": "f2e08c333dc64cb09f75eaab355393bz",
    "resource_id": "i-4a266a2d",
    "resource_name": "webserver-85",
    "state": "open",
    "started_at": 1385085727,
    "ended_at": null,
    "policy_name": "Webserver Health",
    "condition_name": "CPU usage",
    "url": "https://app.google.stackdriver.com/incidents/f333dc64z",
    "summary": "CPU for webserver-85 is above the threshold of 1% with a value of 28.5%"
  },
  "version": 1.1
}
You can create a single webhook that handles all the alerts, or you can create a webhook on a per-policy basis to handle things separately.

AWS Lex Simple Bot Statement / Intent

I was hoping to create an intent that only makes a statement to the user. I was thinking I could do this by having some utterances that trigger the intent with no slots, but with a "goodbye message". When I try to do that, when I save the intent, it deletes the goodbye message and reverts it to None. I also picked a "no-op" lambda function to call since it doesn't make sense for what I'm doing.
At this point, I'm not sure about how to do this, but it seems like the claudia-bot-builder has support for something like this, but I can't get it to deploy to my AWS account to see how it might do it.
Does anyone else here have an idea about how to have the bot just give information in response to an utterance instead of starting a dialog to retrieve information?
You can return simple messages to the user using only Amazon Lex Console without any Lambda Function selected.
Open your Lex Chat Bot in console.aws.amazon.com/lex/home....
1. Click the "Editor" Tab.
2. On the left, click on the Intent you want to return a message on.
3. On the right, scroll down to the "Fulfillment" section.
4. Click "Return parameters to client"
5. In "Response" section, type in the message(s) you want to respond with.
Note: Make sure the section "Lambda initialization and validation" is unchecked so that the intent goes directly to the "Fulfillment" response.