I have a Cloud Function which publishes a message to Pub/Sub, and that triggers a Cloud Run service to perform an archive file process. When there are large files, my Cloud Run Python code takes some time to process the data, and it looks like Pub/Sub is retrying the message after 20 seconds (the default acknowledgement deadline), which triggers another instance of my Cloud Run service. I've increased the acknowledgement deadline to 600s and redeployed everything, but it's still retrying the message after 20 seconds. Am I missing anything?
Cloud Function publishing the message code:
# Publishes a message
try:
    publish_future = publisher.publish(topic_path, data=message_bytes)
    publish_future.result()  # Verify the publish succeeded
    return 'Message published.'
except Exception as e:
    print(e)
    return (e, 500)
Here is the PubSub subscription config:
Logging showing a second instance being triggered after 20s:
Cloud Run code:
import base64
import json

from flask import Flask, request

app = Flask(__name__)

@app.route("/", methods=["POST"])
def index():
    envelope = request.get_json()
    if not envelope:
        msg = "no Pub/Sub message received"
        print(f"error: {msg}")
        return f"Bad Request: {msg}", 400

    if not isinstance(envelope, dict) or "message" not in envelope:
        msg = "invalid Pub/Sub message format"
        print(f"error: {msg}")
        return f"Bad Request: {msg}", 400

    pubsub_message = envelope["message"]

    if isinstance(pubsub_message, dict) and "data" in pubsub_message:
        # Decode base64 event['data']
        event_data = base64.b64decode(pubsub_message['data']).decode('utf-8')
        message = json.loads(event_data)
        # logic to process data/archive

    return ("", 204)
You should be able to control the retries by setting the minimumBackoff retry policy on the subscription. You can set the minimumBackoff time to the maximum of 600 seconds, like your ack deadline, so that redelivered messages will be more than 600 seconds old. This should lower the number of occurrences you see.
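For illustration, a minimal sketch of setting that retry policy with the Python client library, assuming the google-cloud-pubsub v2 API and placeholder project/subscription names:

from google.cloud import pubsub_v1
from google.protobuf import duration_pb2, field_mask_pb2

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "my-subscription")

# Make redeliveries wait at least 600s (minimumBackoff capped at 600s).
subscription = pubsub_v1.types.Subscription(
    name=subscription_path,
    retry_policy=pubsub_v1.types.RetryPolicy(
        minimum_backoff=duration_pb2.Duration(seconds=600),
        maximum_backoff=duration_pb2.Duration(seconds=600),
    ),
)
update_mask = field_mask_pb2.FieldMask(paths=["retry_policy"])

with subscriber:
    subscriber.update_subscription(
        request={"subscription": subscription, "update_mask": update_mask}
    )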
To handle duplicates, making your subscriber idempotent is recommended. You need some kind of check in your code to see whether the messageId has already been processed.
From the documentation on at-least-once delivery:
Typically, Pub/Sub delivers each message once and in the order in which it was published. However, messages may sometimes be delivered out of order or more than once. In general, accommodating more-than-once delivery requires your subscriber to be idempotent when processing messages. You can achieve exactly once processing of Pub/Sub message streams using the Apache Beam programming model. The Apache Beam I/O connectors let you interact with Cloud Dataflow via controlled sources and sinks. You can use the Apache Beam PubSubIO connector (for Java and Python) to read from Cloud Pub/Sub. You can also achieve ordered processing with Cloud Dataflow by using the standard sorting APIs of the service. Alternatively, to achieve ordering, the publisher of the topic to which you subscribe can include a sequence token in the message.
Related
I'm testing Cloud Pub/Sub. According to the Google documentation, the ack_deadline of a pull subscription can be set between 10s and 600s, i.e. the message will be redelivered by Pub/Sub if the ack_deadline is exceeded.
I'm processing the Pub/Sub message in the subscriber client before acking it. This processing can take ~700s, which exceeds the maximum limit of 600s.
reproduction:
create a topic and subscription (by default Acknowledgement deadline is set to 10s)
run the subscriber code (which acks the messages), see below
publish some messages to the topic from the web UI
subscriber code:
import time
import datetime
from concurrent.futures import TimeoutError
from google.cloud import pubsub_v1

project_id = "my-project"
subscription_id = "test-sub"

def sub():
    subscriber = pubsub_v1.SubscriberClient()
    subscription_path = subscriber.subscription_path(project_id, subscription_id)

    def callback(message: pubsub_v1.subscriber.message.Message) -> None:
        # My processing code, which takes 700s
        time.sleep(700)  # sleep function to demonstrate processing
        print(f"Received {message}." + str(datetime.datetime.now()))
        message.ack()
        print("msg acked")

    streaming_pull_future = subscriber.subscribe(subscription_path, callback=callback)
    print(f"Listening for messages on {subscription_path}..\n")

    try:
        streaming_pull_future.result()
    except:
        streaming_pull_future.cancel()  # Trigger the shutdown.
        streaming_pull_future.result()  # Block until the shutdown is complete.
    subscriber.close()

if __name__ == "__main__":
    sub()
Even though the ack_deadline is exceeded, the message is still getting acked, which is weird. According to my understanding, Pub/Sub should redeliver the message again and eventually this code should go into an infinite loop.
Am I missing something here?
The reason that the message is getting acked and not getting redelivered even after the ack deadline specified in the subscription is reached is that the Pub/Sub client libraries internally extend ack deadlines up to a time specified when instantiating the subscriber client. By default, the time is 1 hour. You can change this amount of time by changing the max_lease_duration parameter in the FlowControl object (search for "FlowControl" in the Types page) passed into the subscribe method.
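As a hedged sketch, you could cap that internal extension by passing a FlowControl with max_lease_duration into subscribe (project and subscription names reuse the example above):

from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path("my-project", "test-sub")

# Stop extending the ack deadline after 700 seconds instead of the default 1 hour.
flow_control = pubsub_v1.types.FlowControl(max_lease_duration=700)

def callback(message: pubsub_v1.subscriber.message.Message) -> None:
    message.ack()

streaming_pull_future = subscriber.subscribe(
    subscription_path, callback=callback, flow_control=flow_control
)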
That's correct. There are several solutions, each with its own tradeoffs:
Ack the message immediately and then process it. The problem: if your system has an outage, you lose the message.
Save the message ID state in a database (Firestore, for instance):
If the message ID is new, start the processing; at the end of the processing, update the message ID status in the database.
If the message ID already exists, sleep for a while (about 90s), then check the status of the message ID in the database. If DONE, ack the message. If not, sleep again (at most 6 times), then NACK and start that process again. To break the loop, repeat the process only until the message timestamp is more than 1h old.
Save the message in the database, ack the message, and start the processing. In case of an outage, on startup, check the not-yet-done messages and restart the process for each of them. At the end of the process, mark them as DONE.
You can also imagine other patterns; nothing is perfect, it depends on your needs. A simplified sketch of the second option is shown below.
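As an illustration only, here is a simplified sketch of that second option (the processed_messages collection, the status values, and the process() helper are made up for the example, and the sleep/recheck loop described above is reduced to a simple nack):

from google.cloud import firestore

db = firestore.Client()

def handle_message(message):
    doc_ref = db.collection("processed_messages").document(message.message_id)
    snapshot = doc_ref.get()

    if snapshot.exists and snapshot.get("status") == "DONE":
        # Already fully processed by a previous delivery: just ack.
        message.ack()
        return

    if not snapshot.exists:
        # First delivery of this message ID: claim it, process, then mark DONE.
        doc_ref.set({"status": "PROCESSING"})
        process(message)  # hypothetical placeholder for your real processing
        doc_ref.update({"status": "DONE"})
        message.ack()
    else:
        # Another delivery is still processing it: let Pub/Sub redeliver later.
        message.nack()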
Situation:
I'm trying to have a single message in Pub/Sub processed by exactly 1 instance of Cloud Run. Additional messages will be processed by another instance of Cloud Run. Each message triggers a heavy computation that runs for around 100s in the Cloud Run instance.
Currently, Cloud Run is configured with max concurrency requests = 1, and min/max instances of 0/5. Subscription is set to allow for 600s Ack deadline.
Issue:
Each message seems to be triggering multiple Cloud Run instances to be spun up. I believe this is due to high CPU utilization causing Cloud Run to spin up additional instances to help process. Unfortunately, these new instances attempt to process the exact same message, causing unintended results.
Question:
Is there a way to force Cloud Run to only have 1 instance process a single message, regardless of CPU utilization and other potential factors?
Relevant Code Snippet:
import base64
import json
from fastapi import FastAPI, Request, Response
app = FastAPI()
@app.post("/")
async def handleMessage(request: Request):
    envelope = await request.json()

    # Basic data validation
    if not envelope:
        msg = "no Pub/Sub message received"
        print(f"error: {msg}")
        return Response(content=msg, status_code=400)
    if not isinstance(envelope, dict) or "message" not in envelope:
        msg = "invalid Pub/Sub message format"
        print(f"error: {msg}")
        return Response(content=msg, status_code=400)

    message = envelope["message"]
    if isinstance(message, dict) and "data" in message:
        data = json.loads(base64.b64decode(message["data"]).decode("utf-8").strip())

    try:
        # Do computationally heavy operations here
        # Will run for about 100s
        return Response(status_code=204)
    except Exception as e:
        print(e)
Thanks!
I've found the issue.
Apparently, Pub/Sub guarantees "at least once" delivery, which means it is possible for it to deliver a message to a subscriber more than once. The onus is therefore on the subscriber, which in my case is Cloud Run, to handle such scenarios (idempotency) gracefully.
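As a rough illustration of what handling duplicates could look like in the handler above, you could record the push envelope's messageId before processing. The in-memory set here is a stand-in only; separate Cloud Run instances don't share memory, so a real check would use shared storage such as Firestore or a database:

processed_ids = set()  # demo only: not shared between Cloud Run instances

@app.post("/")
async def handleMessage(request: Request):
    envelope = await request.json()
    message = envelope.get("message", {})
    message_id = message.get("messageId")

    if message_id in processed_ids:
        # Duplicate delivery: acknowledge without reprocessing.
        return Response(status_code=204)

    processed_ids.add(message_id)
    # ... computationally heavy operations here ...
    return Response(status_code=204)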
I am using the following GCP Pub/Sub REST APIs for pulling and acknowledging messages.
For pulling messages:
POST https://pubsub.googleapis.com/v1/projects/myproject/subscriptions/mysubscription:pull
{
  "returnImmediately": "false",
  "maxMessages": "10"
}
To acknowledge messages:
POST https://pubsub.googleapis.com/v1/projects/myproject/subscriptions/mysubscription:acknowledge
{
  "ackIds": [
    "dQNNHlAbEGEIBERNK0EPKVgUWQYyODM2LwgRHFEZDDsLRk1SK..."
  ]
}
I am using the Postman tool for calling the above APIs. But I can see the same message, with the same messageId and a different ackId, even after the acknowledgement when I pull messages the next time. Is there any mechanism available to exclude the acknowledged messages in the GCP pull (subscriptions/mysubscription:pull)?
Cloud Pub/Sub is an at-least-once delivery system, so some duplicates are expected. However, if you are always seeing duplicates, it is likely that you are not acknowledging the message before the ack deadline passes. The default ack deadline is 10 seconds. If you do not call ack within that time period, then the message will be redelivered. You can set the ack deadline on a subscription to up to 600 seconds.
If all of your messages are expected to take a longer time to process, then it is best to increase the ack deadline. If only a couple of messages will be slow and most will be processed quickly, then it's better to use the modifyAckDeadline call to increase the ack deadline on a per-message basis.
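For reference, a per-message extension in the same REST style as the calls above might look roughly like this (the ackId is whatever ackId the pull response returned):

POST https://pubsub.googleapis.com/v1/projects/myproject/subscriptions/mysubscription:modifyAckDeadline
{
  "ackIds": [
    "<ackId returned by the pull call>"
  ],
  "ackDeadlineSeconds": 600
}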
I have this simple Python function where I am just taking the input from a Pub/Sub topic and then printing it.
import base64, json

def hello_pubsub(event, context):
    """Triggered from a message on a Cloud Pub/Sub topic.
    Args:
        event (dict): Event payload.
        context (google.cloud.functions.Context): Metadata for the event.
    """
    pubsub_message = base64.b64decode(event['data'])
    data = json.loads(pubsub_message)
    for i in data:
        for k, v in i.items():
            print(k, v)
If I had used the pubsub_v1 library, I could have done the following:
subscriber = pubsub_v1.SubscriberClient()

def callback(message):
    message.ack()

subscriber.subscribe(subscription_path, callback=callback)
How do I ack the message in a Pub/Sub-triggered function?
Following your latest message, I understood the (common) mistake. With Pub/Sub, you have:
a topic, to which publishers can publish messages
(push or pull) subscriptions. All the messages published to the topic are duplicated into each subscription. The message queue belongs to each subscription.
Now, if you look closely at the subscriptions on your topic, you will see at least 2:
The pull subscription that you have created.
A push subscription created automatically when you deployed your Cloud Function on the topic.
The messages of the push subscription are correctly processed and acknowledged. However, those of the pull subscription aren't, because the Cloud Function doesn't consume and acknowledge them; the subscriptions are independent.
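If it helps to verify this, here is a small sketch with the Python client that lists every subscription attached to a topic (project and topic names are placeholders, google-cloud-pubsub v2 API assumed):

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "my-topic")

# Each subscription printed here keeps its own independent message queue.
for subscription_name in publisher.list_topic_subscriptions(request={"topic": topic_path}):
    print(subscription_name)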
So, your Cloud Function code is correct!
The message should be ack'd automatically if the function terminates normally without an error.
How can I bulk move messages from one topic to another in GCP Pub/Sub?
I am aware of the Dataflow templates that provide this, however unfortunately restrictions do not allow me to use Dataflow API.
Any suggestions on ad-hoc movement of messages between topics (besides one-by-one copy and pasting?)
Specifically, the use case is for moving messages in a deadletter topic back into the original topic for reprocessing.
You can't use snapshots, because snapshots can only be applied to subscriptions of the same topic (to avoid message ID overlapping).
The easiest way is to write a function that pulls from your subscription. Here is how I would do it:
Create a topic (named, for example, "transfer-topic") with a push subscription. Set the acknowledgement deadline to 10 minutes.
Create an HTTP Cloud Function triggered by the Pub/Sub push subscription (or a Cloud Run service). When you deploy it, set the timeout to 9 minutes for Cloud Functions and to 10 minutes for Cloud Run. The processing is the following:
Read a chunk of messages (for example, 1000) from the dead-letter pull subscription.
Publish the messages (in bulk mode) to the initial topic.
Acknowledge the messages on the dead-letter subscription.
Repeat this until the pull subscription is empty.
Return code 200.
The global process:
Publish a message to the transfer-topic.
The message triggers the function/Cloud Run service with an HTTP push.
The process pulls the messages and republishes them to the initial topic.
If the timeout is reached, the function crashes and Pub/Sub retries the HTTP request (with exponential backoff).
If all the messages are processed, an HTTP 200 response code is returned and the process stops (and the message in the transfer-topic subscription is acked).
This process allows you to handle a very large number of messages without worrying about the timeout. A rough sketch of the pull/republish/ack loop is shown below.
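As a rough sketch only (placeholder project, subscription, and topic names; google-cloud-pubsub v2 request syntax assumed), the pull/republish/ack loop could look like this, to be called from your HTTP handler before returning 200:

from google.cloud import pubsub_v1

PROJECT = "my-project"
DEADLETTER_SUBSCRIPTION = "deadletter-sub"
TARGET_TOPIC = "projects/my-project/topics/original-topic"

subscriber = pubsub_v1.SubscriberClient()
publisher = pubsub_v1.PublisherClient()
subscription_path = subscriber.subscription_path(PROJECT, DEADLETTER_SUBSCRIPTION)

def drain_deadletter():
    """Pull, republish, and ack in chunks until the dead-letter subscription is empty."""
    while True:
        response = subscriber.pull(
            request={"subscription": subscription_path, "max_messages": 1000}
        )
        if not response.received_messages:
            break  # nothing left: the HTTP handler can return 200 now

        # Republish each message to the original topic and wait for the publishes.
        futures = [
            publisher.publish(TARGET_TOPIC, msg.message.data)
            for msg in response.received_messages
        ]
        for future in futures:
            future.result()

        # Ack only after the republish succeeded, so nothing is lost on a crash.
        ack_ids = [msg.ack_id for msg in response.received_messages]
        subscriber.acknowledge(
            request={"subscription": subscription_path, "ack_ids": ack_ids}
        )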
I suggest that you use a Python script for that.
You can use the Pub/Sub client library to read the messages and publish them to another topic, like below:
from google.cloud import pubsub
from google.cloud.pubsub import types

# Defining parameters
PROJECT = "<your_project_id>"
SUBSCRIPTION = "<your_current_subscription_name>"
NEW_TOPIC = "projects/<your_project_id>/topics/<your_new_topic_name>"

# Creating clients for publishing and subscribing. Adjust the max_messages for your purpose
subscriber = pubsub.SubscriberClient()
publisher = pubsub.PublisherClient(
    batch_settings=types.BatchSettings(max_messages=500),
)

# Get your messages. Adjust the max_messages for your purpose
subscription_path = subscriber.subscription_path(PROJECT, SUBSCRIPTION)
response = subscriber.pull(subscription_path, max_messages=500)

# Publish your messages to the new topic
for msg in response.received_messages:
    publisher.publish(NEW_TOPIC, msg.message.data)

# Ack the old subscription if necessary
ack_ids = [msg.ack_id for msg in response.received_messages]
subscriber.acknowledge(subscription_path, ack_ids)
Before running this code you will need to install the Pub/Sub client library in your Python environment. You can do that by running pip install google-cloud-pubsub.
An approach to executing your code is using Cloud Functions. If you decide to use it, pay attention to two points:
The maximum time that your function can take to run is 9 minutes. If this timeout gets exceeded, your function will terminate without finishing the job.
In Cloud Functions, you can just put google-cloud-pubsub on a new line of your requirements file instead of running a pip command.