Metric for number of unacknowledged messages older than 20 minutes

I am trying to set up alerts on Pub/Sub in GCP that monitor the number of old messages in a queue, specifically the number of unacknowledged messages older than 20 minutes.
I want this particular alert because the number of unacknowledged messages alone could shoot up on a sudden push of a huge number of messages, while using only the oldest unacknowledged message age would trigger the alert for outlier messages that might be stuck in the queue (e.g. badly formatted messages).
I've tried to combine both metrics but couldn't figure out how to filter on one of them:
fetch pubsub_subscription
| {
    t_0: metric 'pubsub.googleapis.com/subscription/num_undelivered_messages';
    t_1: metric 'pubsub.googleapis.com/subscription/oldest_unacked_message_age'
  }
| outer_join 0
# how to filter now on oldest_unacked_message_age > 20 minutes and select num_undelivered_messages?
Also, as I understand Cloud Pub/Sub metrics, I think this won't work anyway, because each metric is a single number per time series; it carries no information about individual messages (correct me if I am wrong).
I've also looked for a single metric that covers both, but couldn't find one.

You can create an alert on undelivered messages in Google Cloud Monitoring. Select the Pub/Sub Subscription resource type, and then you can set a filter based on the response_code. You can also build a new chart based on your needs.
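As a concrete sketch, such an alert can also be created programmatically. The following assumes the google-cloud-monitoring Python client; the project ID is illustrative, and it alerts on the oldest_unacked_message_age metric from the question, since Monitoring only exposes per-subscription aggregates rather than per-message counts:

from google.cloud import monitoring_v3

client = monitoring_v3.AlertPolicyServiceClient()
project_name = "projects/my-project-id"  # illustrative project ID

# Fire when the oldest unacked message in a subscription is older
# than 20 minutes (the metric is reported in seconds).
condition = monitoring_v3.AlertPolicy.Condition(
    display_name="Oldest unacked message age > 20 min",
    condition_threshold=monitoring_v3.AlertPolicy.Condition.MetricThreshold(
        filter=(
            'metric.type = '
            '"pubsub.googleapis.com/subscription/oldest_unacked_message_age" '
            'AND resource.type = "pubsub_subscription"'
        ),
        comparison=monitoring_v3.ComparisonType.COMPARISON_GT,
        threshold_value=20 * 60,
        duration={"seconds": 0},
    ),
)

policy = monitoring_v3.AlertPolicy(
    display_name="Pub/Sub backlog too old",
    combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.OR,
    conditions=[condition],
)

created = client.create_alert_policy(name=project_name, alert_policy=policy)
print(f"Created alert policy: {created.name}")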

Related

GCP PubSub understanding filters

In my understanding, Pub/Sub filters are supposed to reduce the number of messages sent to a specific subscription. We currently observe behaviour that we didn't expect.
Assuming there is a PubSub Topic "XYZ" and a subscription to that topic "XYZ-Sub" with a filter attributes.someHeader = "x"
There are 2 messages published to that topic:
The first one has attributes.someHeader = "a", the second one attributes.someHeader = "x".
I expect that only message 2 will be delivered to the subscription, as message 1 does not match the filter.
If that is not the case and both messages still get delivered (which is what we currently observe):
The GCP console shows a rising number of unacked messages on the subscription when no client is connected. Pulling these messages in the GCP console removes them without showing any received messages, which makes me assume that the filters are applied when pulling messages.
Are filters evaluated at the Pub/Sub client rather than at the topic level?
What is the point in using filters with pub/sub?
Will the delivery of the unwanted message (the bytes of the message) be billed?
Filtering in Cloud Pub/Sub only delivers messages that match the filter to subscribers. The filters are applied in the Pub/Sub service itself, not in the client. They allow you to limit the set of messages delivered to subscribers when the subscriber only wants to process a subset of the messages.
In your example, only the message with attributes.someHeader = "x" should be delivered. However, note that, as the documentation states, the backlog metrics might include messages that don't match the filter. Such messages will not be delivered to subscribers, but may still show up in the backlog metrics for a time.
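For reference, a filter can only be attached when the subscription is created (at the time of writing it cannot be added to an existing subscription). A minimal sketch with the google-cloud-pubsub Python client, using the names from the question (the project ID is illustrative):

from google.cloud import pubsub_v1

project_id = "my-project"  # illustrative
publisher = pubsub_v1.PublisherClient()
subscriber = pubsub_v1.SubscriberClient()

topic_path = publisher.topic_path(project_id, "XYZ")
subscription_path = subscriber.subscription_path(project_id, "XYZ-Sub")

# Only messages whose attributes match the filter are delivered;
# non-matching messages are acknowledged by the service and never
# reach the subscriber.
with subscriber:
    subscription = subscriber.create_subscription(
        request={
            "name": subscription_path,
            "topic": topic_path,
            "filter": 'attributes.someHeader = "x"',
        }
    )
    print(f"Created subscription with filter: {subscription.name}")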
You do get charged the Pub/Sub message delivery price for messages that were not delivered. However, you do not pay any network fees for them, nor do you end up paying for any compute to process messages you do not receive.

Is there a way to retrieve the count of messages in a PubSub subscription (in realtime)?

I want to achieve batch consuming of a Pub/Sub subscription, retrieving all the messages that were in the subscription at the beginning of my process. To do so, I use Pub/Sub's asynchronous pulling for Java and the consumer.ack() and consumer.nack() functions to process exactly the number of messages that I want, and to make the subscription redeliver the messages that I have received but not yet processed. My problem is that I have not managed to find a way to retrieve the real-time count of messages in my subscription.
I have started to request the pubsub.googleapis.com/subscription/num_undelivered_messages metric from Google Cloud Monitoring, but unfortunately the metric lags the real count of undelivered messages in the subscription by about 3 minutes.
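For reference, that kind of metric query looks roughly like the following; a sketch assuming the google-cloud-monitoring Python client, with illustrative project and subscription names:

import time

from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()
project_name = "projects/my-project-id"  # illustrative

now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {
        "end_time": {"seconds": now},
        "start_time": {"seconds": now - 600},  # look back 10 minutes
    }
)

results = client.list_time_series(
    request={
        "name": project_name,
        "filter": (
            'metric.type = '
            '"pubsub.googleapis.com/subscription/num_undelivered_messages" '
            'AND resource.labels.subscription_id = "my-subscription"'
        ),
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)

for series in results:
    # Points come back most recent first; this is the latest
    # (roughly 3-minute-delayed) undelivered-message count.
    latest = series.points[0]
    print(latest.interval.end_time, latest.value.int64_value)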
Is there any way to retrieve this message count in real time?
There is no way to retrieve the message count in real time, no. Also keep in mind that such a number would not be sufficient to retrieve all of the messages that were in the subscription at the beginning of the process unless you can guarantee that no publishing is happening at the same time.
If there is publishing, then your subscriber could receive those messages before messages published earlier, unless you are using ordered message delivery; and even then, the delivery guarantees are per ordering key, not a total ordering guarantee. If you can guarantee that there are no publishes during this time, and/or you are only bringing the subscriber up periodically, then it sounds more like a batch case, which means you may want to consider a database or a GCS file as an alternative place to store the messages for processing.

Throttle down GCP DataFlow?

I'm using the standard GCP-provided Storage text file to Pub/Sub Dataflow template, but although I have set the number of worker nodes to 1, the throughput of processed messages is too high for downstream components.
A Cloud Function that runs on each Pub/Sub message event hits GCP quotas, and with Cloud Run I get a bunch of 500, 429, and 503 errors in the beginning (due to the steep burst rate).
Is there any way to control the processing rate of Dataflow? I need a softer/slower start so downstream components have time to scale up.
Anyone?
You can use stateful ParDos to achieve this, wherein you buffer events in batches and make an API call with all the keys at once. This is very nicely explained with code snippets here.
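As a rough sketch of that idea with the Apache Beam Python SDK (the batch size and key fan-out are illustrative, and a production version would also set a timer to flush incomplete batches):

import apache_beam as beam
from apache_beam.coders import StrUtf8Coder
from apache_beam.transforms.userstate import BagStateSpec


class BufferBatchDoFn(beam.DoFn):
    """Buffers elements per key and emits them in fixed-size batches."""

    BUFFER = BagStateSpec("buffer", StrUtf8Coder())

    def __init__(self, batch_size=100):
        self.batch_size = batch_size

    def process(self, element, buffer=beam.DoFn.StateParam(BUFFER)):
        key, value = element  # stateful DoFns require a keyed PCollection
        buffer.add(value)
        batch = list(buffer.read())
        if len(batch) >= self.batch_size:
            buffer.clear()
            yield batch  # downstream makes one call per batch, not per message


# Usage: spread messages over a small number of keys to bound
# parallelism, then batch:
# messages | beam.Map(lambda m: (hash(m) % 4, m))
#          | beam.ParDo(BufferBatchDoFn(batch_size=100))

Bounding the number of distinct keys bounds the parallelism of the downstream calls, which gives the softer start the question asks for.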

How to modify/check google cloud run's retry limit on failure?

I have a topic that, on publish, pushes the event to a Cloud Run endpoint, and I have a trigger on a storage bucket that publishes to this topic. The container in Cloud Run fails to process the event and has been restarted hundreds of times, and I don't want to waste money on this. How can I limit the retries on failure for a Cloud Run container?
A possible answer to the puzzle might be the following notion.
If we read the documentation on PUSH subscriptions found here, we find the following:
... Pub/Sub retries delivery until the message expires after the subscription's message retention period.
What this means is that if Pub/Sub pushes a message to Cloud Run and Cloud Run does not acknowledge the message by returning a 200 response code, then the message will be re-pushed for the duration of the "message retention period". By default this is 7 days but, according to the documentation, it can be set to a minimum value of 10 minutes. What this seems to say is that we can stop a poison message after 10 minutes (minimum) of retries.
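Shrinking the retention period is an update on the existing subscription; a minimal sketch, assuming the google-cloud-pubsub Python client (project and subscription names are illustrative):

from google.cloud import pubsub_v1
from google.protobuf import duration_pb2, field_mask_pb2

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "my-run-sub")

# 600 seconds (10 minutes) is the documented minimum retention period.
subscription = pubsub_v1.types.Subscription(
    name=subscription_path,
    message_retention_duration=duration_pb2.Duration(seconds=600),
)
update_mask = field_mask_pb2.FieldMask(paths=["message_retention_duration"])

with subscriber:
    result = subscriber.update_subscription(
        request={"subscription": subscription, "update_mask": update_mask}
    )
    print(f"Retention updated: {result.message_retention_duration}")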
If a message is pushed and not acked, then it won't be pushed again immediately but instead be pushed as a function of a back-off algorithm described here.
If we look at the gcloud documentation, we find reference to the concept of a maximum number of delivery attempts (--max-delivery-attempts). Associated with this is a topic called the dead letter topic (--dead-letter-topic). What this appears to define is that if a Pub/Sub message fails to be delivered more than the maximum number of times, it will be removed from the queue of messages associated with the subscription and moved to the dead letter topic. If you define this for your environment, then your Cloud Run will only execute a finite number of times, after which the poison messages will be moved elsewhere.
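A sketch of wiring that up with the google-cloud-pubsub Python client (all names are illustrative; note also that the Pub/Sub service account needs permission to publish to the dead letter topic):

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
subscriber = pubsub_v1.SubscriberClient()

project_id = "my-project"  # illustrative
topic_path = publisher.topic_path(project_id, "my-topic")
dead_letter_topic_path = publisher.topic_path(project_id, "my-dead-letter-topic")
subscription_path = subscriber.subscription_path(project_id, "my-run-sub")

dead_letter_policy = pubsub_v1.types.DeadLetterPolicy(
    dead_letter_topic=dead_letter_topic_path,
    max_delivery_attempts=5,  # allowed range is 5 to 100
)

with subscriber:
    subscription = subscriber.create_subscription(
        request={
            "name": subscription_path,
            "topic": topic_path,
            "push_config": pubsub_v1.types.PushConfig(
                push_endpoint="https://my-service.example.run.app/"  # illustrative
            ),
            "dead_letter_policy": dead_letter_policy,
        }
    )
    print(f"Created subscription with dead letter policy: {subscription.name}")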

GCloud Pub/Sub Push Subscription: Limit max outstanding messages

Is there a way in a push subscription configuration to limit the maximum number of outstanding messages? In the high-level subscriber docs (https://cloud.google.com/pubsub/docs/push) it says "With slow-start, Google Cloud Pub/Sub starts by sending a single message at a time, and doubles up with each successful delivery, until it reaches the maximum number of concurrent messages outstanding." I want to be able to limit the maximum number of messages being processed; can this be done through the Pub/Sub config?
I've also thought of a number of other ways to effectively achieve this, but none seem great:
Have some semaphore-type system implemented in my push endpoint that returns a 429 once my max concurrency level is hit?
Similarly, have it deregister the push endpoint (turning it into a pull subscription) until the current messages have been processed?
My push endpoints are all on gae, so there could also be something in the gae configs to limit the simultaneous push subscription requests?
Push subscriptions do not offer any way to limit the number of outstanding messages. If one wants that level of control, then it is necessary to use pull subscriptions and flow control.
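A minimal sketch of that pull-based setup with flow control, assuming the google-cloud-pubsub Python client (the subscription name and the limit are illustrative):

from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "my-sub")

def callback(message):
    # ... process the message ...
    message.ack()

# The client library stops pulling once 10 messages are outstanding
# (received but not yet acked/nacked), giving a hard concurrency cap.
flow_control = pubsub_v1.types.FlowControl(max_messages=10)

streaming_pull_future = subscriber.subscribe(
    subscription_path, callback=callback, flow_control=flow_control
)

with subscriber:
    try:
        streaming_pull_future.result()  # blocks; pass timeout=... to bound
    except KeyboardInterrupt:
        streaming_pull_future.cancel()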
Returning 429 errors as a means to limit outstanding messages may have undesirable side effects. On errors, Cloud Pub/Sub will reduce the rate of sending messages to a push subscriber. If a sufficient number of 429 errors are returned, it is entirely possible that the subscriber will receive a smaller number of messages than it can handle for a time while Cloud Pub/Sub ramps the delivery rate back up.
Switching from push to pull is a possibility, though still may not be a good solution. It would really depend on the frequency with which the push subscriber exceeds the desired number of outstanding messages. The change between push and pull and back may not take place instantaneously, meaning the subscriber could still exceed the desired limit for some period of time and may also experience a delay in receiving new messages when switching back to a push subscriber.