Cloud Functions with Cloud Pub/Sub trigger queuing the messages - google-cloud-platform

We have a Cloud Function with a Pub/Sub trigger that invokes an application's HTTP endpoint based on the message. When we need to update the backend application, we want the Cloud Function to pause, queue up the messages, and start again once the application is back up.
Currently we are logging all the failed messages in Stackdriver and resubmitting them after the release. Is there a better way to do this?

Your system must be resilient to outages, whether planned (when you perform an update) or unexpected. Therefore, I recommend handling both cases the same way.
Set the retry parameter when you deploy your Cloud Function (a background function bound to a Pub/Sub topic). When your Cloud Function can't process the message (because the application isn't available), raise an error; the message will be retried later.
That way, you have nothing to worry about: when the message can be delivered, it is, and when it can't, an error is raised and the message is retried, as in the sketch below.
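A minimal sketch of that pattern, assuming a 1st gen Python background function and a placeholder backend URL (deploy with the --retry flag so failed executions are redelivered):

    # Minimal sketch. Deploy with retry enabled, for example:
    #   gcloud functions deploy handle_message --trigger-topic my-topic \
    #     --runtime python310 --retry
    import base64
    import requests

    def handle_message(event, context):
        payload = base64.b64decode(event.get("data", b"")).decode("utf-8")
        # Placeholder endpoint: replace with your application's URL.
        resp = requests.post("https://backend.example.com/process", data=payload)
        # If the backend is down (e.g. during a release), raising here makes
        # Pub/Sub redeliver the message later instead of dropping it.
        resp.raise_for_status()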

Related

Why does GCP Pub/Sub publish a message twice?

I have a Google Cloud Function subscribed to a topic. Cloud Scheduler publishes a message to the topic every 5 minutes. The problem is that the Cloud Function sometimes gets invoked a second time 90s after the first invocation.
The acknowledgement deadline on the subscription is 600 seconds.
So I can't figure out why GCF is invoked twice within 90s by GCP Pub/Sub.
Is the 90s gap between the two invocations related to anything?
Your duplicate could either be on the publish side or on the subscribe side. If the duplicate messages have different message IDs, then your duplicates are generated on the publish side. This could be caused by retries on the publish side in response to retryable errors. If the messages have the same message ID, then the duplication is on the subscribe side within Pub/Sub.
Cloud Pub/Sub offers at-least-once delivery semantics. That means it is possible for duplicates to occur, even if you acknowledge the message and even if the acknowledgement deadline has not passed. If you want stronger guarantees around delivery, you can use Pub/Sub's exactly once feature, which is currently in public preview. However, this will require you to set up your Cloud Function with an HTTP trigger and to create a push subscription in Pub/Sub that points to the address of the function because there is no way to set the exactly once setting on a subscription created by Cloud Functions.
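To tell the two cases apart, one option is to log the Pub/Sub message ID on the subscriber side and compare the IDs across the duplicate invocations; a minimal sketch, assuming a Python background function:

    # Log the message ID so duplicates can be compared. For Pub/Sub-triggered
    # background functions, context.event_id carries the Pub/Sub message ID
    # (assumption based on the 1st gen event format).
    import base64
    import logging

    def handle_message(event, context):
        message_id = context.event_id
        data = base64.b64decode(event.get("data", b"")).decode("utf-8")
        logging.info("Received message %s: %s", message_id, data)
        # ... your processing here ...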

Is it possible to send Pub/Sub messages to Stackdriver logs?

I have seen examples of using log sinks to send Stackdriver logs to Pub/Sub, but I want to do the opposite.
Is it possible to configure Stackdriver to subscribe to a topic and just dump the Pub/Sub messages to Stackdriver as logs? Or is there some way to have all events sent to a topic go to Stackdriver logs?
This is instead of having to write a custom application that has to read them and write them as logs.
No, you can't sink Pub/Sub messages into Cloud Logging. You have to create a small custom app (a Cloud Function, using 2nd gen if you have a lot of messages to reduce cost, or a Cloud Run service) that gets the messages, logs them, and acks them. It's less than 10 lines of code, but you have to do it.
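A minimal sketch of such a logger, assuming a Pub/Sub-triggered Python Cloud Function (the function name is a placeholder):

    # Log each Pub/Sub message to Cloud Logging, then ack by returning.
    import base64
    import logging

    def log_pubsub_message(event, context):
        payload = base64.b64decode(event.get("data", b"")).decode("utf-8")
        logging.info("Pub/Sub message %s: %s (attributes=%s)",
                     context.event_id, payload, event.get("attributes"))
        # Returning without raising acknowledges the message.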

Long-running Cloud Run process and Pub/Sub message retry

I have a Cloud Run service whose execution takes up to 60 minutes. A Pub/Sub message is the trigger point for the Cloud Run service's execution.
The Pub/Sub retry policy is set to the maximum (600s).
Now, when a message is published from Pub/Sub, Cloud Run starts executing. Since the complete execution takes around 60 minutes, Pub/Sub does not receive an acknowledgement within 600s and starts retrying the message, which causes the Cloud Run service to execute again and again.
How can the Pub/Sub retry be handled here so that Cloud Run does not execute over and over because of retries?
I was thinking of using Cloud Tasks or Cloud Workflows as a proxy for your long-running Cloud Run service. Unfortunately, both services have a max timeout of 1800s (30 minutes). The upcoming callback feature of Cloud Workflows will have a 12h timeout. In the meantime, I would create a proxy as a Cloud Function triggered by the Pub/Sub message: the message is immediately acknowledged, and the function calls your Cloud Run service asynchronously with the Pub/Sub message and returns right away, roughly as sketched below.
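A rough, imperfect sketch of that proxy, assuming a Python background function; the Cloud Run URL is a placeholder, and the short client timeout is only a crude way to fire the request without waiting for the hour-long job (client disconnects and Cloud Run's own request timeout still need care):

    import base64
    import requests

    CLOUD_RUN_URL = "https://my-long-job-xxxxx-uc.a.run.app/start"  # placeholder

    def forward_to_cloud_run(event, context):
        payload = base64.b64decode(event.get("data", b"")).decode("utf-8")
        try:
            # We only care that the request was sent, not that the job finished.
            requests.post(CLOUD_RUN_URL, data=payload, timeout=5)
        except requests.exceptions.Timeout:
            pass  # expected: the job runs far longer than 5 seconds
        # Returning without raising acknowledges the Pub/Sub message.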
With push subscriptions, such as what you'd use with a Cloud Run service, the maximum ack deadline for a message is indeed 600s. If using pull, one can call ModifyAckDeadline to extend the deadline for a message. In fact, the client libraries for Cloud Pub/Sub do this automatically for up to a configured amount of time (default is 60m).
There is not going to be a way to extend the deadline if using a push subscription. Therefore, your options are:
1. Switch to a pull subscription. You could potentially do this via Cloud Run, though it would not be the best fit. More likely, you want to spin up a job in an environment that can keep it running without any kind of trigger, e.g. GKE. If you switch to pull, you can extend the ack deadline, though note that duplicates are still possible, even if the ack deadline has not expired or the message has already been acknowledged. They should be rare, but you still have to account for them.
2. When you receive the message, persist it somewhere, either on disk or in a database, and then acknowledge the message once it is persisted (see the sketch after this list). Once you are actually done processing the message an hour later, you remove it from this persistent storage. Of course, you could just persist the message instead of publishing it via Pub/Sub and rely on the persistence layer's notification mechanisms to learn of the new message. For example, if you write to GCS, you could use Cloud Storage notifications via Pub/Sub. In this case, you probably want some periodic read from your storage to see if there are any messages that have not been processed for some period of time and, if so, reprocess them. For example, if you record with the message the time at which processing started, and more than some amount of time has passed while the message is still present, you could start the processing over again.
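A minimal sketch of the persist-then-ack idea, assuming a Python push handler running on Cloud Run and a placeholder GCS bucket:

    import base64
    import json
    import time

    from flask import Flask, request
    from google.cloud import storage

    app = Flask(__name__)
    BUCKET = "my-pending-messages"  # placeholder bucket name

    @app.route("/", methods=["POST"])
    def receive():
        msg = request.get_json()["message"]
        record = {
            "message_id": msg["messageId"],
            "data": base64.b64decode(msg.get("data", "")).decode("utf-8"),
            "received_at": time.time(),
        }
        # Persist the message first...
        blob = storage.Client().bucket(BUCKET).blob("pending/%s.json" % msg["messageId"])
        blob.upload_from_string(json.dumps(record))
        # ...then return 2xx so Pub/Sub considers the message acknowledged.
        # The hour-long processing is kicked off elsewhere (e.g. via a Cloud
        # Storage notification) and deletes the blob when it finishes.
        return ("", 204)

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=8080)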

How to deploy new code when GCE is working on a job?

I started a GCE VM with a Docker image that runs a Pub/Sub subscriber, which handles the messages and starts some big computational work (long running).
When we are ready to deploy new code, how do we ensure all currently running jobs are finished (i.e., make the deployment block until the tasks finish)? What's the best practice here?
I believe you can look at Google Cloud Functions. With it, you can create a function that responds to specific events without the need to manage a server or runtime environment.
In particular, you can subscribe a Cloud Function to a Pub/Sub topic, and every message published to this topic will trigger your custom code, with the message contents passed as input data (the google.pubsub.topic.publish event type).
So, presumably, you could write a function subscribed to the same Pub/Sub topic as the message consumer from your example that checks the status of the long-running job and triggers the desired deployment once the condition is met, roughly as sketched below.
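A very rough sketch of that idea; everything here is a placeholder (the message contents, the job-status check, and how the deployment is kicked off are all application-specific):

    import base64
    import logging

    def maybe_deploy(event, context):
        command = base64.b64decode(event.get("data", b"")).decode("utf-8")
        if command != "deploy-requested":   # hypothetical message contents
            return
        if jobs_still_running():            # placeholder: query your own job-status store
            logging.info("Jobs still running; deferring deployment.")
            return
        trigger_deployment()                # placeholder: e.g. start your CI/CD pipeline

    def jobs_still_running():
        raise NotImplementedError("check your job-status store (DB, GCS flag, etc.)")

    def trigger_deployment():
        raise NotImplementedError("kick off your deployment")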

Is there a way to invoke an AWS Step Function or Lambda in response to a websocket message?

TLDR: Is there a way to trigger an AWS lambda or step function based on an external system's websocket message?
I'm building a synchronization service which connects to a system which supports websockets. I can use timers in step functions to wake periodically and call lambda functions to perform the sync, but I would prefer to subscribe to the websocket and perform the sync only when a message is received.
There are plenty of ways to expose websockets in AWS, but I haven't found a way to consume them short of something like an EC2 instance with a custom service running on it. I'm trying to stay in the serverless ecosystem.
It seems like consuming a websocket is a fairly common requirement; have I overlooked something?
Lambdas are ephemeral. They can't be sitting there waiting for a websocket message.
However, I think what you can do is use an Activity task. Once the step function gets to that state it will wait. The activity worker will run on an EC2 instance and subscribe to a websocket. When a message is received it will poll the State Machine for an activity token and call SendTaskSuccess. The state machine will then continue execution and call the lambda that performs the sync.
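A rough sketch of such an activity worker, assuming Python with boto3 and the websocket-client package on an EC2 instance; the activity ARN and websocket URL are placeholders:

    import json
    import boto3
    import websocket  # pip install websocket-client

    sfn = boto3.client("stepfunctions")
    ACTIVITY_ARN = "arn:aws:states:us-east-1:123456789012:activity:sync-trigger"  # placeholder

    def on_message(ws, message):
        # A message arrived on the external websocket: fetch the waiting
        # activity task and let the state machine continue.
        task = sfn.get_activity_task(activityArn=ACTIVITY_ARN, workerName="ws-worker")
        token = task.get("taskToken")
        if token:
            sfn.send_task_success(taskToken=token, output=json.dumps({"event": message}))

    ws_app = websocket.WebSocketApp("wss://external-system.example.com/feed",  # placeholder
                                    on_message=on_message)
    ws_app.run_forever()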
You can use the AWS API Gateway service and Lambda. It supports WebSockets and can trigger a Lambda on request.