GCP Alert notification is sending only once

I set up alert monitoring for a Pub/Sub subscription as shown below:
I was expecting this to fire every 2 minutes, since the condition is met throughout.
But I got the notification only one time. I also tried with a duration of 1 minute. Still no luck.
What am I doing wrong here?
Or is my understanding of these terms wrong?
What I want is:
Every 2 minutes, when the count of un-acked messages is > x, trigger an alert.
Note: I just masked the filter field here, which is a subscription_id.

Your current monitoring setup is working as intended, since you only have a single time series and a single condition. As per the alert notification docs:
You can only receive multiple notifications if any of the following are true:
All conditions are met: When all conditions are met, then for each time series that results in a condition being met, the policy sends a notification and creates an incident. For example, if you have a policy with two conditions and each condition is monitoring one time series, then when the policy is triggered, you receive two notifications and you see two incidents.
Any condition is met: The policy sends a notification each time a new combination of conditions is met. For example, assume that ConditionA is met, that an incident opens, and that a notification is sent. If the incident is still open when a subsequent measurement meets both ConditionA and ConditionB, then another notification is sent.
If you create the policy by using the Google Cloud Console, then the default behavior is to send a notification when the condition is met.
Lastly, the purpose of "Period" is just to increase the number of data points in the chart; it is not related to triggering the notification multiple times while the value stays above the threshold. Thus it is not possible to send continuous notifications until the monitored data drops below the threshold.
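For context, a policy like the one described, with a single threshold condition on the subscription's unacked-message count, could be created with the Cloud Monitoring Python client roughly as below; the project ID, threshold, and duration are placeholders rather than the asker's actual values:

# Sketch only: one condition over one time series, so at most one open
# incident and one notification, per the behavior described above.
from google.cloud import monitoring_v3

client = monitoring_v3.AlertPolicyServiceClient()

policy = monitoring_v3.AlertPolicy(
    display_name="Pub/Sub unacked messages",
    combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.OR,
    conditions=[
        monitoring_v3.AlertPolicy.Condition(
            display_name="num_undelivered_messages > 100",
            condition_threshold=monitoring_v3.AlertPolicy.Condition.MetricThreshold(
                filter=(
                    'resource.type = "pubsub_subscription" AND '
                    'metric.type = "pubsub.googleapis.com/subscription/num_undelivered_messages"'
                ),
                comparison=monitoring_v3.ComparisonType.COMPARISON_GT,
                threshold_value=100,
                # The duration ("for" period) controls how long the condition must
                # hold before triggering, not how often notifications are re-sent.
                duration={"seconds": 120},
            ),
        )
    ],
)

client.create_alert_policy(name="projects/my-project", alert_policy=policy)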

Related

How to tell when Lambdas complete processing of all messages in SQS

Currently I have a process where a Lambda (A) is triggered and works out which customers another Lambda (B) needs to be run for (via a queue). For any run there could be 3k to 4k messages placed on the SQS queue by Lambda A to be picked up by Lambda B for processing. As Lambda B communicates with an external API, its concurrency is set to 10 so as not to overload the API. The whole process completes in 35 to 45 minutes.
My problem is how to tell when all the processing is complete?
If you don't need timely information, you could check out the CloudWatch Metrics that SQS offers, e.g.:
ApproximateNumberOfMessagesVisible
The number of messages available for retrieval from the queue.
Reporting Criteria: A non-negative value is reported if the queue is active.
and
ApproximateNumberOfMessagesNotVisible
The number of messages that are in flight. Messages are considered to be in flight if they have been sent to a client but have not yet been deleted or have not yet reached the end of their visibility window.
Reporting Criteria: A non-negative value is reported if the queue is active.
If the sum of these two metrics hits zero, no messages are in the Queue, and processing should be done.
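As a rough sketch of that check (assuming boto3 and a queue named my-queue, both illustrative), you could poll the two metrics from CloudWatch and treat a sum of zero as "processing done":

import datetime

import boto3

cloudwatch = boto3.client("cloudwatch")

def queue_backlog(queue_name: str) -> float:
    """Sum of the most recent visible and in-flight message counts."""
    total = 0.0
    end = datetime.datetime.utcnow()
    start = end - datetime.timedelta(minutes=10)
    for metric_name in ("ApproximateNumberOfMessagesVisible",
                        "ApproximateNumberOfMessagesNotVisible"):
        resp = cloudwatch.get_metric_statistics(
            Namespace="AWS/SQS",
            MetricName=metric_name,
            Dimensions=[{"Name": "QueueName", "Value": queue_name}],
            StartTime=start,
            EndTime=end,
            Period=60,
            Statistics=["Maximum"],
        )
        datapoints = sorted(resp["Datapoints"], key=lambda d: d["Timestamp"])
        if datapoints:
            total += datapoints[-1]["Maximum"]  # latest datapoint for this metric
    return total

if queue_backlog("my-queue") == 0:
    print("Queue is drained; processing should be done.")

Keep in mind that SQS publishes these metrics roughly once a minute and the values are approximate, so this is an eventually-consistent signal, in line with the "not timely" caveat above.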
If you need more timely information, the producer of the messages could increment a counter item in DynamoDB with the number of messages added, and each Lambda decrements that counter once it's done. You could then add a Lambda to the DynamoDB Stream of that table with a filter and do something when the value changes to zero again. This is, however, much more complex.
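A minimal sketch of that counter idea, with an invented table name (job_progress) and invented attribute names:

import boto3

table = boto3.resource("dynamodb").Table("job_progress")

def add_pending(run_id: str, count: int) -> None:
    # Lambda A: add the number of queued messages to the run's counter.
    table.update_item(
        Key={"run_id": run_id},
        UpdateExpression="ADD pending :n",
        ExpressionAttributeValues={":n": count},
    )

def mark_done(run_id: str) -> None:
    # Lambda B: atomically decrement once a message has been processed.
    result = table.update_item(
        Key={"run_id": run_id},
        UpdateExpression="ADD pending :n",
        ExpressionAttributeValues={":n": -1},
        ReturnValues="UPDATED_NEW",
    )
    if result["Attributes"]["pending"] == 0:
        # Alternatively, react to this in a Lambda on the table's DynamoDB Stream.
        print(f"run {run_id} complete")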
A third option could be to transform the whole thing into a Step Function and use a Map state with a parallelization factor to work on the tasks. The drawback is that the length of the list it can work on is limited, as far as I know.

DynamoDB sends multiple update event and my lambda is called multiple times

I've got a DynamoDB table in AWS with a trigger connected to an AWS Lambda.
I need to send an email when a status reaches a particular value; this is done by checking EventName == "MODIFY" and the newImage["Status"] value.
What currently happens is that the event is fired 3 times and so 3 emails are sent...
I thought about setting a value in DynamoDB marking that I've already sent the email, but doing so fires another trigger, and I don't know if the time between events is enough to perform this update... has anyone run into this issue before? How did you handle it?
Thanks
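One way to make the "already sent" flag from the question safe against duplicate invocations is to claim it with a conditional write; a sketch, assuming boto3, an invented table name (orders) and an invented target status (READY):

import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("orders")

def try_claim_email(order_id: str) -> bool:
    """Returns True only for the first invocation that claims the flag."""
    try:
        table.update_item(
            Key={"order_id": order_id},
            UpdateExpression="SET email_sent = :yes",
            ConditionExpression="attribute_not_exists(email_sent)",
            ExpressionAttributeValues={":yes": True},
        )
        return True
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # another invocation already claimed the flag
        raise

def handler(event, context):
    for record in event["Records"]:
        if record["eventName"] != "MODIFY":
            continue
        new_image = record["dynamodb"]["NewImage"]
        if new_image["Status"]["S"] != "READY":
            continue
        if try_claim_email(new_image["order_id"]["S"]):
            pass  # send the email here

Setting email_sent does produce one more MODIFY stream event, but the attribute_not_exists condition fails for it, so the email is only ever sent once.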

GCP Alert Filters Don't Affect Open Incidents

I have an alert that I have configured to send an email when the sum of executions of cloud functions that have finished in a status other than 'error' or 'ok' is above 0 (grouped by the function name).
The way I defined the alert is:
And the secondary aggregator is delta.
The problem is that once the alert is open, it looks like the filters don't matter any more, and the alert stays open because it sees that the cloud function is triggered and finishes with any status (even an 'ok' status keeps it open as long as it's triggered enough).
At the moment the only solution I can think of is to define a log-based metric that will do the counting itself, and then base the alert on that custom metric instead of the built-in one.
Is there something that I'm missing?
Edit:
Adding another image to show what I think might be the problem:
From the image above we see that the graph won't go down to 0 but will stay at 1, which is not the way other normal incidents work.
According to the official documentation:
"Monitoring automatically closes an incident when it observes that the condition is no longer met or when 7 days have passed without an observation that the condition is still being met."
That made me think that there are times where the condition is not relevant to make it close the incident. Which is confirmed here:
"If measurements are missing (for example, if there are no HTTP requests for a couple of minutes), the policy uses the last recorded value to evaluate conditions."
The lack of HTTP requests isn't a reason to close the incident, as the policy keeps using the last recorded value (the one that triggered the alert).
So, using alerts on HTTP requests is fine, but you need to close them yourself. Although I think it would be better to use a custom metric instead if you want them to be closed automatically.
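If you go the custom-metric route, the log-based metric can be created with the google-cloud-logging Python client; a sketch, where the metric name and the filter are assumptions to be adapted to your function's actual log entries:

from google.cloud import logging

client = logging.Client()

# Placeholder filter: it should match only executions that finished in a
# status other than 'ok' or 'error'.
metric = client.metric(
    "function_unexpected_status_count",
    filter_='resource.type="cloud_function" AND severity>=WARNING',
    description="Cloud Functions executions finishing with an unexpected status",
)
metric.create()

The alert would then be based on logging.googleapis.com/user/function_unexpected_status_count instead of the built-in execution-count metric.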

GCP Logs-Based Monitoring: Trigger an alert when no logs are received

I have an application that I'm setting up logs-based monitoring for. The application will log whenever it completes a certain task. I want to ensure that the application completes this at least once every 6 hours.
I have tried to replicate this rule by configuring monitoring to fire an alert when the metric stays below 1 for the given amount of time.
Unfortunately, when the logs-based metric doesn't receive any logs, it appears to be treated as "no data" instead of a value of 0.
Is it possible to treat periods when no logs are received as 0 so that the alert will fire?
Screenshot of my metric graph:
Screenshot of alert definition:
You can see that we receive a log for one time frame, but right afterwards the line disappears and an alert isn't triggered.
Try using absent_for with an MQL-based alert.
The absent_for table operation generates a table with two value columns, active and signal. The active column is true when there is data missing from the table input and false otherwise. This is useful for creating a condition query to be used to alert on the absence of inputs.
Example:
fetch gce_instance :: compute.googleapis.com/instance/cpu/usage_time
| absent_for 8h
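To have that query actually open incidents (rather than just run it in the query editor), it can be attached to an alerting policy as an MQL condition, for example via the Cloud Monitoring Python client; a sketch with a placeholder project ID, reusing the generic example query above, which you would swap for your logs-based metric:

from google.cloud import monitoring_v3

client = monitoring_v3.AlertPolicyServiceClient()

policy = monitoring_v3.AlertPolicy(
    display_name="No data received",
    combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.OR,
    conditions=[
        monitoring_v3.AlertPolicy.Condition(
            display_name="absent_for 8h",
            # MQL-based condition: fires while absent_for reports missing input.
            condition_monitoring_query_language=
                monitoring_v3.AlertPolicy.Condition.MonitoringQueryLanguageCondition(
                    query=("fetch gce_instance :: compute.googleapis.com/instance/cpu/usage_time"
                           " | absent_for 8h"),
                ),
        )
    ],
)

client.create_alert_policy(name="projects/my-project", alert_policy=policy)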

Google Cloud PubSub Message Delivered More than Once before reaching the acknowledgement deadline

Background:
We configured a Cloud Pub/Sub topic to interact with multiple App Engine services.
There we have configured push-based subscribers, with the acknowledgement deadline set to 600 seconds.
Issue:
We have observed that Pub/Sub has pushed the same message twice (more than twice from some other topics) to its subscribers. Looking at the logs, I can see these pushes happened with a gap of just 1 second. Ideally, as we have configured the ackDeadline to 600 seconds, Pub/Sub should re-attempt message delivery only after 600 seconds.
I need answers to the following:
Why was the same message delivered more than once within just 1 second?
Does Pub/Sub not honor the ackDeadline configuration before reattempting message delivery?
References:
- https://cloud.google.com/pubsub/docs/subscriber
Message redelivery can happen for a couple of reasons. First of all, it is possible that a message got published twice. Sometimes the publisher will get back an error like a deadline exceeded, meaning the publish took longer than anticipated. The message may or may not have actually been published in this situation. Often, the correct action is for the publisher to retry the publish and in fact that is what the Google-provided client libraries do by default. Consequently, there may be two copies of the message that were successfully published, even though the client only got confirmation for one of them.
Secondly, Google Cloud Pub/Sub guarantees at-least-once delivery. This means that occasionally, messages can be redelivered, even if the ackDeadline has not yet passed or an ack was sent back to the service. Acknowledgements are best effort and most of the time, they are successfully processed by the service. However, due to network glitches, server restarts, and other regular occurrences of that nature, sometimes the acknowledgements sent by the subscriber will not be processed, resulting in message redelivery.
A subscriber should be designed to be resilient to these occasional redeliveries, generally by ensuring that operations are idempotent, i.e., that the results of processing the message multiple times are the same, or by tracking and catching duplicates. Alternatively, one can use Cloud Dataflow as a subscriber to remove duplicates.
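For the push subscribers described in the question, one simple form of that de-duplication is to key off the messageId in the push envelope; a sketch using Flask and an in-memory set, both illustrative only, since a real service would need a shared, durable store (duplicates can arrive at different instances):

import base64

from flask import Flask, request

app = Flask(__name__)
seen_message_ids = set()  # replace with Firestore/Redis/etc. in production

@app.route("/pubsub/push", methods=["POST"])
def pubsub_push():
    envelope = request.get_json()
    message = envelope["message"]
    message_id = message["messageId"]

    if message_id in seen_message_ids:
        # Duplicate delivery: acknowledge with a 2xx without reprocessing.
        return ("", 204)

    payload = base64.b64decode(message.get("data", "")).decode("utf-8")
    # ... process the payload idempotently here ...
    seen_message_ids.add(message_id)

    return ("", 204)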