I'm developing a message retry mechanism using Cloud Pub/Sub and a Cloud Function with a Pub/Sub trigger. Can I set a visibility timeout (as in RabbitMQ) on a message in Cloud Pub/Sub, so that it becomes visible to my Cloud Function, and gets processed, only after a certain time? If not, what is the workaround?
Yes, Cloud Pub/Sub has a feature called acknowledgement deadline which works similarly to visibility timeout.
According to the documentation comparing Pub/Sub to Amazon SQS:
Similarly, Pub/Sub has an acknowledgement deadline. By default, this deadline is 10 seconds, but it can be extended up to 10 minutes. For a pull subscription, subscribers can also modify the deadline on the fly on a per-message basis to allow for shorter or longer time to process a given message.
Related
I have a Google Cloud Function subscribed to a topic. A message is published to the topic every 5 minutes, when Cloud Scheduler invokes Pub/Sub. The problem is that the Cloud Function is sometimes invoked a second time, 90 seconds after the first invocation.
The acknowledgement deadline on the subscription is 600 seconds.
So I can't figure out why Pub/Sub invokes the function a second time after 90 seconds.
Is the second invocation after 90s related to some setting?
Your duplicate could either be on the publish side or on the subscribe side. If the duplicate messages have different message IDs, then your duplicates are generated on the publish side. This could be caused by retries on the publish side in response to retryable errors. If the messages have the same message ID, then the duplication is on the subscribe side within Pub/Sub.
Cloud Pub/Sub offers at-least-once delivery semantics. That means it is possible for duplicates to occur, even if you acknowledge the message and even if the acknowledgement deadline has not passed. If you want stronger guarantees around delivery, you can use Pub/Sub's exactly once feature, which is currently in public preview. However, this will require you to set up your Cloud Function with an HTTP trigger and to create a push subscription in Pub/Sub that points to the address of the function because there is no way to set the exactly once setting on a subscription created by Cloud Functions.
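To tell the two cases apart, you could collect the message ID and payload of each invocation from your function logs and group them. The following is only a minimal sketch: `classify_duplicates` is a made-up helper, and it assumes the payload (for example the scheduler job name plus the scheduled time) identifies the logical message.

```python
from collections import defaultdict


def classify_duplicates(deliveries):
    """Group deliveries by payload and report where the duplicates come from.

    deliveries: iterable of (message_id, payload) tuples, e.g. taken from logs.
    Assumption: equal payloads mean "the same logical message".
    """
    ids_by_payload = defaultdict(list)
    for message_id, payload in deliveries:
        ids_by_payload[payload].append(message_id)

    report = {}
    for payload, ids in ids_by_payload.items():
        if len(ids) < 2:
            continue  # delivered once, nothing to report
        distinct = len(set(ids))
        if distinct == 1:
            # Same message ID delivered more than once: redelivery by Pub/Sub.
            report[payload] = "subscribe-side"
        elif distinct == len(ids):
            # Every copy has its own message ID: it was published more than once.
            report[payload] = "publish-side"
        else:
            report[payload] = "both"
    return report
```

If the report says "subscribe-side", the function itself has to tolerate the redelivery, since at-least-once delivery allows it.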
I have a Cloud Run service that can run for up to 60 minutes. A Pub/Sub message is the trigger for the Cloud Run service.
The Pub/Sub retry policy is set to the maximum (600s).
When a message is published, Cloud Run starts executing, and the complete execution takes around 60 minutes. After 600s, however, Pub/Sub retries the message because it has not received an acknowledgement from Cloud Run, which causes the Cloud Run service to execute again and again.
How can I handle the Pub/Sub retry so that Cloud Run does not execute again and again?
I was thinking of using Cloud Tasks or Cloud Workflows as a proxy for your long-running Cloud Run service. Unfortunately, both services have a maximum timeout of 1800s (30 minutes). By the way, the upcoming callback feature of Cloud Workflows will have a 12h timeout. In the meantime, I would create a proxy: a Cloud Function triggered by the Pub/Sub message, which is acknowledged immediately. The function calls your Cloud Run service asynchronously with the Pub/Sub message and returns right away.
With push subscriptions, such as what you'd use with a Cloud Run service, the maximum ack deadline for a message is indeed 600s. If using pull, one can call ModifyAckDeadline to extend the deadline for a message. In fact, the client libraries for Cloud Pub/Sub do this automatically for up to a configured amount of time (default is 60m).
There is not going to be a way to extend the deadline if using a push subscription. Therefore, your options are:
Switch to a pull subscription. You could potentially do this via Cloud Run, though it would not be the best fit. More likely, you want to spin up a job in an environment that can keep it running without any kind of trigger, e.g., GKE. If you switch to pull, you can extend the ack deadline, though note that duplicates are still possible, even if the ack deadline has not expired or the message has already been acknowledged. They should be rare, but you still have to account for it.
When you receive the message, persist it somewhere, either on disk or in a database, and then acknowledge the message once persisted. Once you are actually done processing the message an hour later, you remove it from this persistent storage. Of course, you could just persist the message instead of publishing it via Pub/Sub and rely on the persistence layer's notifications mechanisms to learn of the new message. For example, if you write to GCS, you could use Cloud Storage notifications via Pub/Sub. In this case, you probably want to have some periodic read from your storage to see if there are any messages that have not been processed for some period of time and if so, reprocess them. For example, if you write with the message the time at which processing started and if more than some amount of time has passed since then and the message is still present, you could start the processing over again.
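The persist-then-acknowledge option could look roughly like the sketch below. It uses SQLite only to keep the example self-contained; the table name `pending` and the helpers `open_store`, `persist`, `mark_done` and `stale_messages` are all made up for illustration, and a real deployment would more likely use a shared database.

```python
import sqlite3
import time


def open_store(path=":memory:"):
    """Open (or create) the store that holds messages still being processed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pending ("
        "message_id TEXT PRIMARY KEY, "
        "payload TEXT NOT NULL, "
        "started_at REAL NOT NULL)"
    )
    return conn


def persist(conn, message_id, payload, now=None):
    """Store the message before acknowledging it. Returns False for a duplicate."""
    now = time.time() if now is None else now
    cur = conn.execute(
        "INSERT OR IGNORE INTO pending VALUES (?, ?, ?)",
        (message_id, payload, now),
    )
    conn.commit()
    return cur.rowcount == 1


def mark_done(conn, message_id):
    """Remove the message once the long-running processing has finished."""
    conn.execute("DELETE FROM pending WHERE message_id = ?", (message_id,))
    conn.commit()


def stale_messages(conn, max_age_seconds, now=None):
    """Messages whose processing started too long ago and may need a restart."""
    now = time.time() if now is None else now
    rows = conn.execute(
        "SELECT message_id, payload FROM pending "
        "WHERE started_at < ? ORDER BY started_at",
        (now - max_age_seconds,),
    )
    return rows.fetchall()
```

The push endpoint would call `persist` and return success right away, the worker would call `mark_done` at the end of the hour-long job, and a periodic task would call `stale_messages` to find work that never finished.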
I have a Cloud Function that is being triggered from a Pub/Sub topic.
I want to rate limit my Cloud Function, so I set the max instances to 5. In my case, there will be a lot more produced messages than Cloud Functions (and I want to limit the number of running Cloud Functions).
I expected this process to behave like Kafka or a queue: the topic messages would accumulate, and the Cloud Function would slowly consume messages until the topic is empty.
But it seems that all the messages that did not trigger a Cloud Function (and so were not acked) were simply nacked and left behind. My subscription details:
The ack deadline max value is too low for me (it may take a few hours until the Cloud Function will get to messages due to the rate-limiting).
Anything I can change in the Pub/Sub to fit my needs? Or I'll need to add a queue? (Pub/Sub to send to a Task Queue, and Cloud Function consumes the Task Queue?).
BTW, The pub/sub data is actually GCS events.
If this was AWS, I would simply send S3 file-created events to SQS and have Lambdas on the other side of the queue to consume.
Any help would be appreciated.
The ideal solution is simply to change the retrying policy.
When using "Retry after exponential backoff delay", Pub/Sub will keep retrying even after the delay has reached the maximum exponential delay (600 seconds).
This way, you can have a lot of messages in the Pub/Sub, and take care of them slowly with a few Cloud Functions - which fits our need of rate-limiting.
Basically, everything is the same but this configuration changed, and the result is:
Which is exactly what I was looking for :)
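To get a feel for how the delays between retries grow, here is a rough sketch. The 600 second cap is the maximum mentioned above; the 10 second starting delay and the doubling factor are assumptions made for illustration, since the exact curve Pub/Sub uses is not something I can vouch for.

```python
def backoff_schedule(attempts, minimum=10.0, maximum=600.0, factor=2.0):
    """Delays (in seconds) before each retry, capped at `maximum`.

    minimum and factor are illustrative assumptions, not Pub/Sub guarantees.
    """
    delays = []
    delay = minimum
    for _ in range(attempts):
        delays.append(min(delay, maximum))
        delay *= factor
    return delays


print(backoff_schedule(8))
# [10.0, 20.0, 40.0, 80.0, 160.0, 320.0, 600.0, 600.0]
```

The point is that once the cap is reached the delay stays at the cap, and the message keeps being retried at that interval instead of being dropped.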
You cannot compare this to Kafka, because a Kafka consumer pulls messages at its convenience, while a Cloud Function (CF) creates a push subscription that pushes messages to your CF.
So some alternatives:
Create an HTTP CF, triggered by Cloud Scheduler, that pulls messages from your PULL subscription. The maximum retention of unacked messages is 7 days (hopefully that's enough).
Use Cloud Run, for which you can increase max concurrency (max concurrent requests), with proper sizing for CPU and RAM. Of course, you can also control the max number of Cloud Run instances (different from max concurrency). Then use a PUSH subscription that pushes to Cloud Run. But here, too, you will be limited by the 10-minute ack deadline.
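For the first alternative (a scheduled function that pulls), the body of the function could be something like this sketch. It assumes `subscriber` is a `SubscriberClient` from the google-cloud-pubsub Python library, whose `pull` and `acknowledge` calls accept a request dict; the client is passed in as an argument so the sketch itself does not import the library. `drain_batch` and `handler` are made-up names.

```python
def drain_batch(subscriber, subscription_path, handler, max_messages=10):
    """Pull one batch, process it, and ack only what was processed.

    subscriber: assumed to be a google-cloud-pubsub SubscriberClient.
    subscription_path: "projects/<project>/subscriptions/<name>".
    handler: hypothetical callable that processes one message.
    Returns the number of acknowledged messages.
    """
    response = subscriber.pull(
        request={"subscription": subscription_path, "max_messages": max_messages}
    )

    ack_ids = []
    for received in response.received_messages:
        try:
            handler(received.message)
        except Exception:
            # Leave it unacked: Pub/Sub redelivers it after the ack deadline.
            continue
        ack_ids.append(received.ack_id)

    if ack_ids:
        subscriber.acknowledge(
            request={"subscription": subscription_path, "ack_ids": ack_ids}
        )
    return len(ack_ids)
```

Since each scheduled run pulls at most `max_messages`, the batch size and the schedule together act as the rate limit.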
I currently have a Pub/Sub push subscription that pushes to an HTTP endpoint. This endpoint then triggers my Cloud Function. I am running into an issue where events that have already been sent to my Cloud Function are being resent by the Pub/Sub subscription. I increased my subscription's ack deadline to 3 minutes, but about a minute into my Cloud Function's execution, Pub/Sub resends the same event that has already been processed. This leads to multiple invocations of my Cloud Function and further issues. I haven't seen any way to disable Pub/Sub retries, but I'm wondering if there are any suggestions as to a root cause or any workarounds?
Current set-up:
cloud function timeout limit: 120 seconds
pub/sub subscription ack deadline: 180 seconds
dead-lettering after 5 retries
You will need to consider idempotency and flag any recent retries to prevent them from firing again. This could be a timestamp stored in a database, which you then filter on by time and by any metadata the message contains. Another important thing is to return a successful result.
Doug covers this concept in a video; while it doesn't reference Pub/Sub, it is still just as valid: https://www.youtube.com/watch?v=Pwsy8XR7HNE
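A rough sketch of that idea: remember when each message ID was first seen, skip the work for a recent repeat, and still answer with a success status so the push subscription stops retrying. The in-memory dict below only stands in for the database, and `RecentlySeen`, `handle` and `process` are made-up names.

```python
import time


class RecentlySeen:
    """Remembers message IDs for ttl_seconds (a dict stands in for a database)."""

    def __init__(self, ttl_seconds=600.0):
        self.ttl_seconds = ttl_seconds
        self._seen = {}

    def first_time(self, message_id, now=None):
        now = time.monotonic() if now is None else now
        # Drop entries older than the TTL so the store does not grow forever.
        self._seen = {
            mid: seen_at
            for mid, seen_at in self._seen.items()
            if now - seen_at < self.ttl_seconds
        }
        if message_id in self._seen:
            return False
        self._seen[message_id] = now
        return True

    def forget(self, message_id):
        self._seen.pop(message_id, None)


def handle(message_id, payload, seen, process, now=None):
    """Return an HTTP-style status; a 2xx response acknowledges a pushed message."""
    if not seen.first_time(message_id, now=now):
        return 204  # recent duplicate: acknowledge it without redoing the work
    try:
        process(payload)
    except Exception:
        seen.forget(message_id)  # allow a later retry to process it
        return 500
    return 204
```

Marking the message before processing it is a design choice: it blocks a retry that arrives while the first attempt is still running, which matches the situation in the question.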
My system runs on an Amazon Auto Scaling group, and one feature allows user-to-user messaging. I have the following use case to resolve.
A new message is sent in my application between users.
A message to notify the user by e-mail is dropped into a queue with a 60 second delay. This delay allows time for a realtime chat client (faye/angularjs) to see the message and mark it as viewed.
After the delay the message is picked up, the "read" status is checked and if it has not been read by the client an e-mail is dispatched.
Originally I was going to use a cron job on each application server to poll the message queue; however, it occurs to me that it would be more efficient to use SNS to call some kind of e-mail sending endpoint (perhaps in Lambda).
I can't see any way to have SNS poll SQS, however. Can anybody suggest how this could be done? Essentially I want SNS with a delay so that I don't spam somebody in a "live" chat with e-mail alerts.
Thanks
Unfortunately this is not yet available out of the box. The missing part is the generation of Amazon SNS notifications on message arrival/visibility by an Amazon SQS queue, be it via push (similar to Amazon S3 notifications) or via poll (similar to Amazon Kinesis subscriptions; see The Pull/Push Event Models for more on the difference). Either would allow you to directly connect an AWS Lambda function to the respective SQS delay queue events, see e.g.:
Lambda with SQS
That being said, you can work around this limitation in a few ways, for example:
trigger your Lambda function on schedule (e.g. once per minute), and poll your SQS delay queue from there
scheduled Lambda functions are themselves an eagerly awaited missing Lambda feature, but this is more easily worked around, either with a cron job of your own or with Eric Hammond's Unreliable Town Clock (UTC), for example
The AWS Lambda team has delivered many/most similar feature requests over recent months, by the way, so I would expect them to offer both SQS event handling and scheduled Lambda functions over the course of the year.
In early 2019, this problem can be solved in a few different ways:
SQS as an Event Source to Lambda (finally announced 2018-06-28), similar to the OP's original design.
AWS Step Functions (announced 2016-12-01), using a wait step for the delay.
DynamoDB Streams with Lambda triggers (announced 2017-02-17), using TTL expiration on items to fire the Lambda trigger.
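With the first option (SQS as an event source, with the 60 second delay set on the queue or on the message), the Lambda handler for the OP's use case might look like the sketch below. The body fields `message_id` and `recipient`, and the `is_read` and `send_email` callables, are assumptions made up for illustration; only the `Records`/`body` layout of the event comes from the SQS integration.

```python
import json


def make_handler(is_read, send_email):
    """Build a Lambda-style handler for delayed "new message" notices.

    is_read(message_id) -> bool and send_email(recipient, message_id) are
    hypothetical callables supplied by the application.
    """

    def handler(event, context=None):
        sent = 0
        for record in event.get("Records", []):
            notice = json.loads(record["body"])
            if is_read(notice["message_id"]):
                continue  # the chat client already showed it, so no e-mail
            send_email(notice["recipient"], notice["message_id"])
            sent += 1
        return {"emails_sent": sent}

    return handler
```

By the time the handler runs, the delay has already passed, so the read check reflects whether the realtime client picked the message up in the meantime.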
As SNS has a topic limit of 100,000 per account, I would recommend using Amazon SES to send the emails (62,000 free emails/month could help with implementation cost decisions).