Google Cloud alerting condition on missing infrequent event - google-cloud-platform

I'm trying to create an alert condition where if an infrequent event (e.g. cron job running once a week) does not occur, it will trigger.
The metric is log-based. I've had success with smaller windows by using the alignment period, but there is a limitation: the alignment period cannot be longer than 1 day.
Alignment periods longer than 86400 seconds are not supported.
(Not working) sample of what I'm trying to do:
- conditionThreshold:
    aggregations:
    - alignmentPeriod: 604800s # 1 week NOT possible
      perSeriesAligner: ALIGN_SUM
    comparison: COMPARISON_LT
    thresholdValue: 1.0
    duration: 0s
    filter: metric.type="logging.googleapis.com/user/my_infrequent_event_count"
    trigger:
      count: 1
  displayName: Infrequent event did not occur
Any idea on how this is possible?

Currently this is not possible, as the duration can't exceed 24 hours.
As a workaround, you might find Cloud Monitoring metric export for long-term metrics analysis useful. Please also refer to this doc.
I found this public thread, which might be helpful too.

Related

AWS Auto Scaling Group: is it possible to schedule an increase of capacity every first Sunday of each month?

I'm using Auto Scaling groups, which work fine with some custom rules I've set, but I also need to raise the minimum number of servers to a higher value every first Sunday of each month (we see an increase in requests around then).
I saw the Scheduled actions in the "Automatic scaling" tab, but it does not seem possible to set a recurrence longer than weekly, let alone something like "every first Sunday of each month".
Is this possible some other way? Maybe via CloudWatch settings with custom "cron" tasks? I'm not sure.
You don't need to use an external service or a Lambda to do it. You can do it in the advanced tab of the Auto Scaling options in AWS.
For your particular case, the cron expression would be 0 14 ? * 1#1 *, which triggers on the first Sunday of every month.
Edit: This only works as an EventBridge cron expression.
You can create a different timer (e.g. for a Lambda called every morning/hour/whatever) and scale up the Auto Scaling group from there if it's the first Sunday of the month.
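The "is it the first Sunday of the month" check that such a timer-driven Lambda would need can be sketched in a few lines of Python. This is only an illustration: the function name is made up here, and the actual call that raises the group's minimum size is left out.

```python
from datetime import date


def is_first_sunday(day: date) -> bool:
    """Return True when `day` is the first Sunday of its month."""
    # date.weekday() numbers Monday as 0, so Sunday is 6.
    # The first Sunday of a month always falls on day 1 through 7.
    return day.weekday() == 6 and day.day <= 7


if is_first_sunday(date.today()):
    print("First Sunday of the month: scale up here")
```

A handler scheduled daily would run this check first and return early on every other day.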

How can I get AWS lambda usage for the last hour?

I would like to know if there is a way to get all of my Lambda invocation usage for the last hour (ideally in 5-minute intervals).
It would also be nice to get the cost (but from what I've read, it only updates once a day).
From the documentation, it seems I can use GetMetricData (CloudWatch). Is there a better option for my use case?
You can get this information by region within CloudWatch metrics.
The AWS/Lambda namespace contains a metric named Invocations, which can be viewed for the entire region or per Lambda function.
If you look at the Sum per whichever period you want to use (you can get down to per 1 minute values for this metric), you will be able to get these values in near real-time.
You can get these values from within the console or by using the get-metric-data command within the CLI or SDK.
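As a rough sketch of what such a request could look like from the SDK, the Python below only builds the arguments for GetMetricData: the Sum of Invocations in 5-minute periods over the last hour. The helper name and the function name "my-function" are placeholders, and the actual boto3 call is shown as a comment because it needs credentials.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def build_invocations_request(end: datetime, function_name: Optional[str] = None) -> dict:
    """Build GetMetricData arguments for Lambda invocations over the last hour."""
    metric = {"Namespace": "AWS/Lambda", "MetricName": "Invocations"}
    if function_name is not None:
        # Restrict to one function; without dimensions the metric covers the region.
        metric["Dimensions"] = [{"Name": "FunctionName", "Value": function_name}]
    return {
        "MetricDataQueries": [
            {
                "Id": "invocations",
                # Period is in seconds: 300 gives one datapoint per 5 minutes.
                "MetricStat": {"Metric": metric, "Period": 300, "Stat": "Sum"},
                "ReturnData": True,
            }
        ],
        "StartTime": end - timedelta(hours=1),
        "EndTime": end,
    }


request = build_invocations_request(datetime.now(timezone.utc), "my-function")
# With boto3 installed and credentials configured, this would be sent as:
#   boto3.client("cloudwatch").get_metric_data(**request)
```

Leaving out the function name queries the region-wide metric instead of a single function.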
There are many tools to get metrics on your lambda, so it really depends on your needs.
What do you mean by "is there a better one for my use case"?
If you prefer, you can check it through the console: go to CloudWatch -> Metrics and navigate to your Lambda function. You can aggregate the data in different ways (for example, average per 5 minutes or total per day).
Here's a great doc: https://docs.aws.amazon.com/lambda/latest/dg/monitoring-metrics.html#monitoring-metrics-invocation
Moreover, here's a solution that I gave that surveys different approaches to monitor lambda resources: Best Way to Monitor Customer Usage of AWS Lambda
Disclosure: I work for Lumigo, a company that does exactly that.

Scheduling one time tasks with AWS

I have to implement functionality that requires delayed sending of a message to a user once, on a specific date, which can be anytime from tomorrow to a few months from now.
All our code is so far implemented as lambda functions.
I'm considering three options on how to implement this:
1. Create an entry in DynamoDB with the hash key being the date and the range key being a unique ID. Schedule a Lambda to run once a day, pick up all entries/tasks scheduled for that day, and send a message for each of them.
2. Using the SDK, create a CloudWatch Events rule with a cron expression indicating a single execution and make it invoke a Lambda function (the target) with the ID of the user/message. The Lambda would be invoked on a specific schedule with a specific user/message to be delivered.
3. Create a Step Functions execution and configure it to sleep, then invoke a step with the logic to send the message when the right moment comes.
Do you have perhaps any recommendation on what would be best practice to implement this kind of business requirement? Perhaps an entirely different approach?
It largely depends on scale. If you'll only have a few scheduled at any point in time, then I'd use the CloudWatch Events approach. It's very low overhead and doesn't involve running code that does nothing.
If you expect a LOT of schedules then the DynamoDB approach is very possibly the best approach. Run the lambda on a fixed schedule, see what records have not yet been run, and are past/equal to current time. In this model you'll want to delete the records that you've already processed (or mark them in some way) so that you don't process them again. Don't rely on the schedule running at certain intervals and checking for records between the last time and the current time unless you are recording when the last time was (i.e. don't assume you ran a minute ago because you scheduled it to run every minute).
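The selection logic described above can be sketched in Python with an in-memory list standing in for the DynamoDB table. The ScheduledMessage class and its field names are assumptions made for this illustration, not a real schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ScheduledMessage:
    message_id: str
    due_at: datetime
    processed: bool = False


def claim_due_messages(messages, now):
    """Return messages that are due and unprocessed, marking each as processed."""
    # Compare against the current time rather than assuming when the last run was.
    due = [m for m in messages if not m.processed and m.due_at <= now]
    for m in due:
        # Marking (or deleting) handled records prevents sending them twice.
        m.processed = True
    return due


table = [
    ScheduledMessage("a", datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)),
    ScheduledMessage("b", datetime(2024, 3, 1, 9, 0, tzinfo=timezone.utc)),
]
for message in claim_due_messages(table, datetime(2024, 1, 2, tzinfo=timezone.utc)):
    print("send", message.message_id)
```

Because records are picked by "due and not yet processed", a skipped or delayed run simply catches up on the next invocation.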
Step functions could work if the time isn't too far out. You can include a delay in the step that causes it to just sit and wait. The delays in step functions are just that, delays, not scheduled times, so you'd have to figure out that delay yourself, and hope it fires close enough to the time you expect it. This one isn't a bad option for mid to low volume.
Edit:
Step Functions Wait states now support waiting until an absolute time (the Timestamp and TimestampPath fields). This is a really good option for what you are describing.
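A minimal sketch of such a state machine definition, built as a Python dictionary and printed as JSON, might look like this. The state names, the "$.sendAt" input field, and the Lambda ARN are placeholders chosen for the example.

```python
import json


def build_delayed_send_definition(send_lambda_arn: str) -> dict:
    """Build a definition that waits until a timestamp, then runs one task."""
    return {
        "Comment": "Wait until the requested time, then send the message",
        "StartAt": "WaitUntilDue",
        "States": {
            "WaitUntilDue": {
                "Type": "Wait",
                # Read the absolute ISO 8601 timestamp from the execution input.
                "TimestampPath": "$.sendAt",
                "Next": "SendMessage",
            },
            "SendMessage": {"Type": "Task", "Resource": send_lambda_arn, "End": True},
        },
    }


definition = build_delayed_send_definition(
    "arn:aws:lambda:REGION:ACCOUNT_ID:function:send-message"  # placeholder ARN
)
print(json.dumps(definition, indent=2))
```

Each execution would then be started with an input such as {"sendAt": "2022-11-30T13:00:00Z"}, so no delay has to be computed by hand.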
As of November 2022, the cleanest approach would be to use EventBridge Scheduler's one-time schedule.
A one-time schedule will invoke a target only once at the date and time that you specify using a valid date, and a timestamp. EventBridge Scheduler supports scheduling in Universal Coordinated Time (UTC), or in the time zone that you specify when you create your schedule. You configure a one-time schedule using an at expression.
Here is an example using the AWS CLI:
aws scheduler create-schedule --schedule-expression "at(2022-11-30T13:00:00)" --name schedule-name \
--target '{"RoleArn": "role-arn", "Arn": "QUEUE_ARN", "Input": "TEST_PAYLOAD" }' \
--schedule-expression-timezone "America/Los_Angeles" \
--flexible-time-window '{ "Mode": "OFF"}'
Reference: Schedule types on EventBridge Scheduler - EventBridge Scheduler User Guide
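If the schedule is created from code rather than the CLI, the at expression has to be produced from a date value. A small Python helper for that, with a name invented for this sketch, could be:

```python
from datetime import datetime


def at_expression(when: datetime) -> str:
    """Format a datetime as an at(yyyy-mm-ddThh:mm:ss) schedule expression."""
    # The expression carries no UTC offset; the time zone is supplied
    # separately through the schedule's time zone setting.
    stamp = when.strftime("%Y-%m-%dT%H:%M:%S")
    return f"at({stamp})"


print(at_expression(datetime(2022, 11, 30, 13, 0, 0)))  # at(2022-11-30T13:00:00)
```

The resulting string is what the CLI example above passes as --schedule-expression.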
Instead of using DynamoDB, I would suggest using S3. Store the message and the time to trigger as key-value pairs.
Use S3 to store the date and time as a key-value store.
Use an S3 Lambda trigger to create the CloudWatch rules that target the specific Lambdas, etc.
You can even schedule a cron for a Lambda that reads the files from S3 and updates the required cron for the message to be sent.
I hope this is in line with your requirements.

Is it possible to set up CloudWatch Alarm for 3 or 4 mins period?

I need to receive a notification each time a certain message does not appear in logs for 3-4 minutes. It is a clear sign that the system is not working properly.
But it is only possible to choose 1 min or 5 mins. Is there any workaround?
"does not appear in logs for 3-4 minutes. It is a clear sign that the system is not working properly."
I know what you mean. A CloudWatch alarm on a metric that is not pushed continuously might behave a bit differently.
You should consider using the alarm's M out of N option, set to 3 out of 4.
https://aws.amazon.com/about-aws/whats-new/2017/12/amazon-cloudwatch-alarms-now-alerts-you-when-any-m-out-of-n-metric-datapoints-in-an-interval-are-above-your-threshold/
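To make the M out of N idea concrete, here is a simplified Python sketch of the evaluation for this case, where a 1-minute datapoint below 1 means the message did not appear. It is an illustration only and ignores how CloudWatch itself treats missing datapoints.

```python
def breaches_m_of_n(datapoints, threshold, m, n):
    """Return True when at least m of the latest n datapoints are below threshold."""
    recent = datapoints[-n:]
    if len(recent) < n:
        # Not enough history yet to evaluate the full window.
        return False
    return sum(1 for value in recent if value < threshold) >= m


# One datapoint per minute: count of matching log messages.
counts = [1, 0, 0, 0]
print(breaches_m_of_n(counts, threshold=1, m=3, n=4))
```

With 1-minute periods and 3 out of 4, the alarm fires once the message has been missing in three of the last four minutes.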
Also, if the metric you are referring to was created using a metric filter on a CloudWatch Log Group, you should edit the metric to include a default value. That way, each time a log is pushed and the metric filter expression does not match, it still pushes a default value (say, 0), which gives the metric more continuous datapoints.
If you describe a CloudWatch alarm using the AWS CLI, it is possible to input the period in seconds. Only the web interface limits the period to a fixed set of values.
https://docs.aws.amazon.com/cli/latest/reference/cloudwatch/describe-alarms.html

Can I post an incrementing number to CloudWatch and have it compute the delta?

We have a statistic coming from a third-party tool that is running on our servers. We want to post this statistic to CloudWatch every 5 minutes. The stat is an incrementing number. We have no control over getting this number or the fact that it is incrementing.
The stat is basically, "number of dropped messages".
We want to alarm whenever, over a period of 15 minutes, the number of dropped messages is greater than a certain threshold.
In order to do this with CloudWatch, we have been maintaining state for the previous stat and subtracting it from the current stat to compute the difference (the number of dropped messages since the last time we posted the metric), and then posting the difference to CloudWatch.
Is there a way to post the raw numbers to CloudWatch and have CloudWatch figure out the difference?
So let's say these are our metrics:
12:00 - 2000 -> post to CloudWatch "0"
12:05 - 2225 -> post to CloudWatch "225"
12:10 - 3350 -> post to CloudWatch "1125"
12:15 - 7700 -> post to CloudWatch "4350"
Instead of computing the difference since the last metric, can we just post 2000, 2225, 3350 and 7700, and be able to place an alarm on the difference between two periods?
You can achieve this through CloudWatch Metric Math (released in April 2018). See documentation.
In your particular case, you could use the RATE or STDEV functions.
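For intuition, the Python sketch below computes by hand what a difference or rate over the raw counter looks like, using the raw values from the question. It assumes, as I understand the RATE function, that the rate is the change between consecutive datapoints divided by the seconds between them. The function names are made up for the example.

```python
def deltas(samples):
    """Differences between consecutive values of a cumulative counter."""
    return [curr - prev for prev, curr in zip(samples, samples[1:])]


def rates_per_second(samples, period_seconds):
    """Per-second rate of change between consecutive samples."""
    return [delta / period_seconds for delta in deltas(samples)]


raw = [2000, 2225, 3350, 7700]  # raw counter posted every 5 minutes
print(deltas(raw))                 # [225, 1125, 4350]
print(rates_per_second(raw, 300))  # [0.75, 3.75, 14.5]
```

An alarm threshold expressed as "dropped messages per 5 minutes" would need to be divided by 300 to compare against a per-second rate.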