What happens to Pub/Sub messages when they are not consumed? - google-cloud-platform

According to the Cloud Pub/Sub documentation, messages cease to be stored if not consumed within 7 days:
Retains unacknowledged messages in persistent storage for 7 days from the moment of publication.
What happens to Pub/Sub messages when the retention period is over? Are they simply deleted?
Is there a log entry generated for a lost message?
Is there a way to know how many messages were lost?

When the seven-day retention expires, the messages are deleted. No log entry is generated for these deletions. There is no metric to determine how many messages are deleted, though you could set up alerting on the subscription/oldest_unacked_message_age metric in Stackdriver to know when you have messages that are close to the seven-day retention limit.
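As a rough illustration of watching that metric yourself, here is a sketch using the google-cloud-monitoring Python client; the project and subscription names are placeholders, and in practice you would more likely define an alerting policy than poll it like this.

```python
# Sketch: read subscription/oldest_unacked_message_age for one subscription.
# "my-project" and "my-subscription" are placeholders, not values from the question.
import time

from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()
project_name = "projects/my-project"

now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {
        "end_time": {"seconds": now},
        "start_time": {"seconds": now - 600},  # look at the last 10 minutes
    }
)

results = client.list_time_series(
    request={
        "name": project_name,
        "filter": (
            'metric.type = "pubsub.googleapis.com/subscription/oldest_unacked_message_age" '
            'AND resource.labels.subscription_id = "my-subscription"'
        ),
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)

for series in results:
    for point in series.points:
        # The value is the age, in seconds, of the oldest unacked message;
        # anything approaching 604800 (7 days) is about to be dropped.
        print(point.value.int64_value)
```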

Related

Pub/Sub messages from snapshot not processed in a Dataflow streaming pipeline

We have a Dataflow pipeline consuming from Pub/Sub and writing into BigQuery in streaming mode. Due to a permissions issue the pipeline got stuck and the messages were not consumed. We restarted the pipeline, saved the unacked messages in a snapshot, and replayed the messages, but they are discarded.
We fixed the problem and re-deployed the pipeline with a new subscription to the topic, and all the events are consumed in streaming without a problem.
For all the unacked messages accumulated (20M) in the first subscription, we created a snapshot.
This snapshot was then connected to the new subscription via the UI using the Replay messages dialog.
In the metrics dashboard we see that the unacked messages spike to 20M and then they get consumed.
[screenshot: subscription spike]
But then the events are not sent to BigQuery. Checking the Dataflow job metrics, we can see a spike in the Duplicate message count within the read-from-Pub/Sub step.
[screenshot: Dataflow Duplicate counter]
The messages are < 3 days old. Does anybody know why this happens? Thanks in advance.
The pipeline is using Apache Beam SDK 2.39.0 and Python 3.9 with Streaming Engine and Runner v2 enabled.
How long does it take to process a Pub/Sub message? Is it a long process?
In that case, Pub/Sub may redeliver messages, according to the subscription configuration/delays. See Subscription retry policy.
Dataflow can work around that, as it acknowledges messages to the source after a successful shuffle. If you add a GroupByKey (or, artificially, a Reshuffle) transform, it may resolve the source duplications.
More information at https://beam.apache.org/contribute/ptransform-style-guide/#performance
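A minimal sketch of what that could look like in a Beam Python pipeline; the subscription, table, and parsing step are placeholders, not the asker's actual job.

```python
# Sketch: force a shuffle right after the Pub/Sub read, so the source stage can
# commit and ack messages before the downstream BigQuery write runs.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
            subscription="projects/my-project/subscriptions/my-subscription")
        | "Reshuffle" >> beam.Reshuffle()  # the GroupByKey-style barrier mentioned above
        | "Parse" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            "my-project:my_dataset.my_table",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
        )
    )
```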

Rate limit GCP Cloud Function triggers from pub/sub topic

I have a Cloud Function that is being triggered from a Pub/Sub topic.
I want to rate limit my Cloud Function, so I set max instances to 5. In my case, there will be a lot more produced messages than Cloud Function instances (and I want to limit the number of running Cloud Functions).
I expected this to behave like Kafka/a queue: the topic messages would accumulate, and the Cloud Function would slowly consume them until the topic is empty.
But it seems that all the messages that did not trigger the Cloud Function (and get acked) were simply nacked and left behind. My subscription details:
The maximum ack deadline value is too low for me (it may take a few hours until the Cloud Function gets to the messages because of the rate limiting).
Anything I can change in Pub/Sub to fit my needs? Or will I need to add a queue? (Pub/Sub sending to a task queue, with a Cloud Function consuming the task queue?)
BTW, the Pub/Sub data is actually GCS events.
If this were AWS, I would simply send S3 file-created events to SQS and have Lambdas on the other side of the queue consume them.
Any help would be appreciated.
The ideal solution is simply to change the retry policy.
When using "Retry after exponential backoff delay", Pub/Sub will keep retrying even after the maximum exponential delay (600 seconds).
This way, you can have a lot of messages in Pub/Sub and take care of them slowly with a few Cloud Functions, which fits our need for rate limiting.
Basically, everything stays the same except this configuration, and the result is exactly what I was looking for :)
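For reference, a sketch of switching an existing subscription to exponential backoff with the google-cloud-pubsub Python client; the project/subscription IDs and the backoff bounds are placeholders.

```python
# Sketch: enable "Retry after exponential backoff delay" on an existing subscription.
from google.cloud import pubsub_v1
from google.protobuf import duration_pb2, field_mask_pb2

project_id = "my-project"            # placeholder
subscription_id = "my-subscription"  # placeholder

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(project_id, subscription_id)

subscription = pubsub_v1.types.Subscription(
    name=subscription_path,
    retry_policy=pubsub_v1.types.RetryPolicy(
        minimum_backoff=duration_pb2.Duration(seconds=10),   # lower bound between retries
        maximum_backoff=duration_pb2.Duration(seconds=600),  # Pub/Sub's maximum backoff
    ),
)

update_mask = field_mask_pb2.FieldMask(paths=["retry_policy"])

with subscriber:
    result = subscriber.update_subscription(
        request={"subscription": subscription, "update_mask": update_mask}
    )
    print(f"Updated: {result.name}")
```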
You cannot compare this to Kafka, because your Kafka consumer pulls messages at its convenience, while Cloud Functions (CF) creates a push subscription that pushes messages to your CF.
So, some alternatives:
Create an HTTP CF triggered by Cloud Scheduler that pulls messages from your PULL subscription (see the sketch after these alternatives). Max retention of unacked messages is 7 days (hope that's enough).
Use Cloud Run, for which you can increase max concurrency (max concurrent requests), with proper sizing for CPU and RAM. Of course, you can control the max number of Cloud Run instances (different from max concurrency), and use a PUSH subscription pushing to Cloud Run. But here also you will be limited by the 10-minute ack deadline.
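For the first alternative, a rough sketch of an HTTP-triggered function, invoked periodically by Cloud Scheduler, that drains a pull subscription in small batches; the project, subscription, batch size, and function name are made up for illustration.

```python
# Sketch: HTTP-triggered function that pulls and acks a small batch per invocation.
from google.cloud import pubsub_v1

PROJECT_ID = "my-project"                 # placeholder
SUBSCRIPTION_ID = "my-pull-subscription"  # placeholder
BATCH_SIZE = 10                           # tune to your desired rate limit


def pull_batch(request):  # hypothetical entry point, wired to Cloud Scheduler over HTTP
    subscriber = pubsub_v1.SubscriberClient()
    subscription_path = subscriber.subscription_path(PROJECT_ID, SUBSCRIPTION_ID)

    response = subscriber.pull(
        request={"subscription": subscription_path, "max_messages": BATCH_SIZE}
    )

    ack_ids = []
    for received in response.received_messages:
        # ... process received.message.data (e.g. the GCS event payload) ...
        ack_ids.append(received.ack_id)

    if ack_ids:
        subscriber.acknowledge(
            request={"subscription": subscription_path, "ack_ids": ack_ids}
        )

    return f"processed {len(ack_ids)} messages", 200
```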

What happens to SNS' self throttled messages which are not delivered for a long time?

Say that I have an SNS topic which I am self-throttling using the attribute maxReceivesPerSecond. Let's say we have a very high production rate, but due to the throttling, consumption is very slow. This can lead to some messages staying in SNS for a long time.
I saw this SO answer, where it's mentioned that such messages will be deleted after 1 hour. But the quote doesn't exist in the documentation anymore.
So what is the current policy for deletion of undelivered messages in SNS?
Amazon SNS now allows you to set a TTL (Time to Live) value of up to two weeks for each message. Messages that remain undelivered for the given period of time (expressed as a number of seconds since the message was published) will expire and will not be delivered.
You should refer to this link for more details:
https://aws.amazon.com/blogs/aws/sns-ttl-control/
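If this is the mobile-push use case that blog post covers, the TTL is passed as a per-message attribute when publishing. A rough boto3 sketch follows; the endpoint ARN, the APNS-specific attribute key, and the one-hour TTL are illustrative assumptions, not details from the question.

```python
# Sketch: publish to SNS with a per-message TTL, assuming the APNS mobile-push
# platform; other platforms use their own AWS.SNS.MOBILE.*.TTL attribute keys.
import boto3

sns = boto3.client("sns")

sns.publish(
    TargetArn="arn:aws:sns:us-east-1:123456789012:endpoint/APNS/my-app/abc123",  # placeholder
    Message="example payload",
    MessageAttributes={
        "AWS.SNS.MOBILE.APNS.TTL": {
            "DataType": "String",
            "StringValue": "3600",  # expire the message after one hour if undelivered
        }
    },
)
```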

AWS SQS Queue declining message count

The host that my SQS app runs on recently experienced some external DNS resolution issues. This meant that suddenly, I couldn't hit the SQS API endpoints. As a part of figuring out what was going on, I logged into the AWS console only to find the messages count slowly declining.
If the messages could not have been consumed by my app, how could the number of messages in the queue be declining?
Amazon SQS automatically deletes messages that have been in a queue for more than the maximum message retention period.
By default, the message retention period is 4 days. However, you can set the message retention period to any value from 60 seconds to 1,209,600 seconds (14 days) in the AWS console.
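A small boto3 sketch of changing that retention; the queue URL is a placeholder, and the value shown is the 14-day maximum.

```python
# Sketch: raise an SQS queue's message retention period to the 14-day maximum.
import boto3

sqs = boto3.client("sqs")

sqs.set_queue_attributes(
    QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/my-queue",  # placeholder
    Attributes={
        "MessageRetentionPeriod": "1209600"  # in seconds; 1,209,600 s = 14 days
    },
)
```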

Amazon SQS DLQ: Are SQS messages older than 14 days moved to the DLQ?

How does Amazon SQS's DLQ work when it comes to old messages?
Do messages older than 14 days get moved to the DLQ instead of being deleted?
I don't see any documentation relating to how older messages are handled.
From the documentation it looks like just the errored messages are moved to the DLQ. Is my assumption right?
Your understanding is correct; messages that are older than the retention period you have set (max of 14 days) will be deleted, not moved to the DLQ.
SQS automatically deletes messages that have been in a queue for more than the maximum message retention period. The default message retention period is 4 days. However, you can set the message retention period to a value from 60 seconds to 1,209,600 seconds (14 days) with SetQueueAttributes.
http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/MessageLifecycle.html
All the messages that are not processed/consumed successfully will get pushed to the DLQ.
Amazon SQS supports dead-letter queues (DLQ), which other queues (source queues) can target for messages that can't be processed (consumed) successfully.
Reference - https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-dead-letter-queues.html
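To make the "can't be processed successfully" condition concrete: the DLQ is attached to the source queue via a redrive policy, and a message is moved only after it has been received more than maxReceiveCount times without being deleted. A hedged boto3 sketch; the queue URL, DLQ ARN, and the count of 5 are placeholders.

```python
# Sketch: attach a DLQ to a source queue. Messages move to the DLQ only after
# being received (and not deleted) maxReceiveCount times; messages that simply
# exceed the retention period are deleted, as described above.
import json

import boto3

sqs = boto3.client("sqs")

sqs.set_queue_attributes(
    QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/my-source-queue",  # placeholder
    Attributes={
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": "arn:aws:sqs:us-east-1:123456789012:my-dlq",  # placeholder
            "maxReceiveCount": "5",
        })
    },
)
```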