Cloud Pub/Sub - Single message delaying others (HoL blocking?) - google-cloud-platform

Today I experienced something I found rather interesting.
I had a batch of unacknowledged messages that were all published within the same second, and for an expected reason, one of these messages was not being acknowledged. However, the remaining messages kept being redelivered, even though they were being processed and acknowledged successfully.
Why does this happen? Is this expected behavior? The messages did not have an ordering key, nor was message ordering enabled on the given subscription.
Also, I even attempted to ACK these messages manually in Google Cloud, but it did not seem to do anything. When I pulled after ACKing, the same messages showed up.

You are probably running into the case described in the note in the "dealing with duplicates" section of the documentation. If messages are batched together, all messages in the batch must be acknowledged or the entire batch of messages may be redelivered. This means that if 100 messages were batched together in a single publish request and 99 of them are acked, but 1 is not acked, all 100 may be redelivered. There are some efforts to avoid this duplicate delivery as much as possible in the service, but it is not guaranteed.
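The batch behaviour described above can be illustrated with a toy model. This is only a sketch of the worst case the answer describes, not the service's actual implementation; the class name `PublishBatch` and its methods are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PublishBatch:
    """Toy model of messages that arrived in a single publish request."""
    message_ids: list
    acked: set = field(default_factory=set)

    def ack(self, message_id):
        # Record an ack for one message of the batch.
        if message_id not in self.message_ids:
            raise KeyError(message_id)
        self.acked.add(message_id)

    def redelivery_candidates(self):
        # Worst case from the answer: one unacked message is enough
        # for the whole batch to be eligible for redelivery.
        if len(self.acked) == len(self.message_ids):
            return []
        return list(self.message_ids)


batch = PublishBatch([f"msg-{i}" for i in range(100)])
for message_id in batch.message_ids[:99]:
    batch.ack(message_id)
print(len(batch.redelivery_candidates()))  # all 100 may come back
```

In the real service the redelivery is a possibility rather than a certainty, so the model shows the upper bound of what a subscriber should be prepared for.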

Related

Pub/Sub - unable to pull undelivered messages

There is an issue with my company's Pub/Sub. Some of our messages are stuck and the oldest unacked message age is increasing over time.
1 day charts:
and when I go to metrics explorer and select Expired ack deadlines count this is the one week chart.
I decided to find out why these messages are stuck, but when I ran the pull command (below), I got a "Listed 0 items" response, so it is not possible to see them.
Is there a way to figure out why some of the messages are displayed as unacknowledged?
Also, the Unacked message count shows the same number of messages (around 2k) for the whole month, even though new messages are published every day.
Here are the parameters we use for this subscription:
I tried to fix this error by setting the deadline to 600 seconds, but it didn't help.
Additionally, I want to mention that we use the Node.js Pub/Sub client library to handle the messages.
The most common causes of messages not being able to be pulled are:
1. The subscriber client already received the messages and "forgot" about them, perhaps due to an exception being thrown and not handled. In this case, the message will continue to be leased by the client until the deadline passes. The client libraries all extend the lease automatically until the maxExtension time is reached. If these are messages that are always forgotten, then it could be that they are redelivered to the subscriber and forgotten again, resulting in them not being pullable via the gcloud command-line tool or UI.
2. There could be a rogue subscriber. It could be that another subscriber is running somewhere for the same subscription and is "stealing" these messages. Sometimes this can be a test job or something that was used early on to see if the subscription works as expected and wasn't turned down.
3. You could be falling into the case of a large backlog of small messages. This should be fixed in more recent versions of the client library (v2.3.0 of the Node client has the fix).
4. The gcloud pubsub subscriptions pull command and UI are not guaranteed to return messages, even if there are some available to pull. Sometimes, rerunning the command multiple times in quick succession helps to pull messages.
The fact that you see expired ack deadlines likely points to 1, 2, or 3, so it is worth checking for those things. Otherwise, you should open a support case so the engineers can look more specifically at the backlog and determine where the messages are.
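One way to guard against the first cause ("forgotten" messages) is to make sure every code path in the handler ends in an explicit ack or nack. The sketch below is in Python for illustration, although the question uses the Node.js client; `make_safe_callback` and `process` are hypothetical names, and the only assumption about the message object is that it exposes `ack()` and `nack()`, as the client library's message type does.

```python
import logging

logger = logging.getLogger(__name__)


def make_safe_callback(process):
    """Wrap a processing function so a message is never silently dropped."""
    def callback(message):
        try:
            process(message)
        except Exception:
            # Without this branch the message would stay leased by the
            # client until the lease can no longer be extended.
            logger.exception("Processing failed; nacking for redelivery")
            message.nack()
        else:
            message.ack()
    return callback
```

The wrapped function would be passed as the subscriber callback in place of the raw handler. Whether to nack or to ack and record the failure elsewhere is a design choice that depends on whether a retry can succeed.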

Verify the data sent has reached GCP Pub/Sub

We have a project that receives data from sensors and sends it to GCP using Pub/Sub. The issue is that when we pull the messages, they do not arrive in order, so we are not able to verify whether the data we sent has reached GCP.
GCP also states that the order of messages is not guaranteed: https://cloud.google.com/pubsub/docs/ordering
Is there a better way to verify these messages, other than the solutions recommended by GCP?
Ordering is not guaranteed in general in Pub/Sub, it is true. However, when using ordering keys as described in the ordering documentation to which you link, ordering is guaranteed. You would need to set an ordering key on published messages and enable message ordering on your subscription. Right now, the documentation only shows how to do this in Java, though other language examples will be coming soon.
Without using ordering, you could potentially monitor the backlog to see when num_undelivered_messages is 0. However, this has some drawbacks:
You would have to continuously query the metric to see its value.
The delay in computing the metric is O(minutes) and so it may be stale, resulting in either not tracking messages that were very recently published (resulting in it showing a value less than the actual size of the backlog) or not recording the fact that some messages were delivered and acked (resulting in it showing a value greater than the actual size of the backlog).
In general, it is preferred with Pub/Sub that your subscribers are always running and ready to receive data when it is published. Cloud Pub/Sub guarantees that messages successfully published will be received by subscribers, assuming subscribers are able to receive the messages within the message retention duration, which defaults to seven days.

The payload from my subscription doesn't show up in Nifi flow

After I sent a message to my GCP subscription, it takes a minute or two (should be instant) to appear in my Nifi flow. At this point, I see a bunch of XML and my payload isn't there. Does anyone know what's possibly happening?
If your push messages are not acknowledged, delivery of the remaining messages may slow down significantly.
Your use case looks like one where the endpoint doesn't acknowledge delivery immediately (or the acknowledgement is late for some other reason). If a message is not acknowledged promptly, the system retries delivery (with some delay) and keeps trying until the message is acknowledged.
Also look at the Message Flow Control documentation, which may also point you to a solution.
A similar topic was also discussed here on Stack Overflow, which might help you.
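If the subscription really is a push subscription, as this answer assumes, the endpoint acknowledges a message by returning a success status code. A minimal sketch of the "acknowledge first, process later" idea follows; `handle_push` and the queue are hypothetical, and the function is written independently of any web framework. It assumes the default push envelope, a JSON body whose `message.data` field is base64-encoded.

```python
import base64
import json
import queue


def handle_push(body, work_queue):
    """Decode a push envelope, enqueue the payload, and ack immediately.

    Returns the HTTP status code the endpoint should send back.
    """
    envelope = json.loads(body)
    message = envelope["message"]
    payload = base64.b64decode(message.get("data", ""))
    # Hand the work to a background consumer so that the response
    # (and therefore the acknowledgement) is not delayed by processing.
    work_queue.put((message.get("messageId"), payload))
    return 204  # a success status acknowledges the message


pending = queue.Queue()
example_body = json.dumps({
    "message": {"data": base64.b64encode(b"sensor reading").decode(),
                "messageId": "1"},
    "subscription": "projects/my-project/subscriptions/my-sub",
}).encode()
status = handle_push(example_body, pending)
```

The trade-off is that a message acknowledged before processing is lost if the worker crashes, so this only fits workloads that tolerate that or persist the queue.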

Google PubSub Python multiple subscriber clients receiving duplicate messages

I have a pretty straightforward app that starts a PubSub subscriber StreamingPull client. I have this deployed on Kubernetes so I can scale. When I have a single pod deployed, everything works as expected. When I scale to 2 containers, I start getting duplicate messages. I know that a small number of duplicate messages is to be expected, but almost half the messages, sometimes more, are received multiple times.
My process takes about 600ms to process a message. The subscription acknowledgement deadline is set to 600s. I published 1000 messages, and the subscription was emptied in less than a minute, but the acknowledge_message_operation metric shows ~1500 calls, with a small amount with response_code expired. There were no failures in my process and all messages were acked upon processing. Logs show that the same message was received by the two containers at the exact same time. The minute to process all the messages was well below the acknowledgement deadline of the subscription, and the Python client is supposed to handle lease management, so I'm not sure why there were any expired messages at all. I also don't understand why the same message is sent to multiple subscriber clients at the same time.
Minimal working example:
import time

from google.cloud import pubsub_v1

PROJECT_ID = 'my-project'
PUBSUB_TOPIC_ID = 'duplicate-test'
PUBSUB_SUBSCRIPTION_ID = 'duplicate-test'


def subscribe(sleep_time=None):
    subscriber = pubsub_v1.SubscriberClient()
    subscription_path = subscriber.subscription_path(
        PROJECT_ID, PUBSUB_SUBSCRIPTION_ID)

    def callback(message):
        print(message.data.decode())
        if sleep_time:
            time.sleep(sleep_time)
        print(f'acking {message.data.decode()}')
        message.ack()

    future = subscriber.subscribe(
        subscription_path, callback=callback)
    print(f'Listening for messages on {subscription_path}')
    future.result()


def publish(num_messages):
    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(PROJECT_ID, PUBSUB_TOPIC_ID)
    for i in range(num_messages):
        publisher.publish(topic_path, str(i).encode())
In two terminals, run subscribe(1). In a third terminal, run publish(200). For me, this will give duplicates in the two subscriber terminals.
It is unusual for two subscribers to get the same message at the same time unless:
The message got published twice due to a retry (and therefore as far as Cloud Pub/Sub is concerned, there are two messages). In this case, the content of the two messages would be the same, but their message IDs would be different. Therefore, it might be worth ensuring that you are looking at the service-provided message ID to ensure the messages are indeed duplicates.
The subscribers are on different subscriptions, which means each of the subscribers would receive all of the messages.
If neither of these is the case, then duplicates should be relatively rare. There is an edge case in dealing with large backlogs of small messages with streaming pull (which is what the Python client library uses). Basically, if messages that are very small are published in a burst and subscribers then consume that burst, it is possible to see the behavior you are seeing. All of the messages would end up being sent to one of the two subscribers and would be buffered behind the flow control limits of the number of outstanding messages. These messages may exceed their ack deadline, resulting in redelivery, likely to the other subscriber. The first subscriber still has these messages in its buffer and will see these messages, too.
However, if you are consistently seeing two subscribers freshly started immediately receive the same messages with the same message IDs, then you should contact Google Cloud support with your project name, subscription name, and a sample of the message IDs. They will better be able to investigate why this immediate duplication is happening.
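To check which of these cases applies, it can help to log the service-provided message ID next to the payload in each subscriber and then compare the logs. The helper below is a sketch under that assumption; `classify_duplicates` is a hypothetical name, and its input is simply a list of `(message_id, data)` pairs collected from all subscribers.

```python
from collections import Counter, defaultdict


def classify_duplicates(deliveries):
    """Split duplicates into redeliveries and repeated publishes.

    deliveries: iterable of (message_id, data) pairs from all subscribers.
    Returns (redelivered_ids, republished), where republished maps a
    payload to the distinct message IDs that carried it.
    """
    deliveries = list(deliveries)
    id_counts = Counter(message_id for message_id, _ in deliveries)
    # Same message ID seen more than once: a true redelivery.
    redelivered_ids = sorted(m for m, n in id_counts.items() if n > 1)

    ids_by_payload = defaultdict(set)
    for message_id, data in deliveries:
        ids_by_payload[data].add(message_id)
    # Same payload under different IDs: published more than once.
    republished = {data: sorted(ids)
                   for data, ids in ids_by_payload.items() if len(ids) > 1}
    return redelivered_ids, republished
```

Note that identical payloads under different IDs only indicate a publish retry if the application never sends the same content twice on purpose, which is the case in the example above, where each message carries a unique number.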
(Edited as I misread the deadlines)
Looking at the Streaming Pull docs, this seems like an expected behavior:
The gRPC StreamingPull stack is optimized for high throughput and therefore
buffers messages. This can have some consequences if you are attempting to
process large backlogs of small messages (rather than a steady stream of new
messages). Under these conditions, you may see messages delivered multiple times
and they may not be load balanced effectively across clients.
From: https://cloud.google.com/pubsub/docs/pull#streamingpull

Google Cloud PubSub Message Delivered More than Once before reaching deadline acknowledgement time

Background:
We configured a Cloud Pub/Sub topic for communication between multiple App Engine services, with push-based subscribers. The acknowledgement deadline is set to 600 seconds.
Issue:
We have observed that Pub/Sub pushed the same message twice (and more than twice for some other topics) to its subscribers. The logs show that these pushes happened just 1 second apart. Since we configured ackDeadline to 600 seconds, we expected Pub/Sub to reattempt delivery only after 600 seconds.
We need answers to the following:
Why was the same message delivered more than once within 1 second?
Does Pub/Sub not honor the ackDeadline configuration before reattempting message delivery?
References:
- https://cloud.google.com/pubsub/docs/subscriber
Message redelivery can happen for a couple of reasons. First of all, it is possible that a message got published twice. Sometimes the publisher will get back an error like a deadline exceeded, meaning the publish took longer than anticipated. The message may or may not have actually been published in this situation. Often, the correct action is for the publisher to retry the publish and in fact that is what the Google-provided client libraries do by default. Consequently, there may be two copies of the message that were successfully published, even though the client only got confirmation for one of them.
Secondly, Google Cloud Pub/Sub guarantees at-least-once delivery. This means that occasionally, messages can be redelivered, even if the ackDeadline has not yet passed or an ack was sent back to the service. Acknowledgements are best effort and most of the time, they are successfully processed by the service. However, due to network glitches, server restarts, and other regular occurrences of that nature, sometimes the acknowledgements sent by the subscriber will not be processed, resulting in message redelivery.
A subscriber should be designed to be resilient to these occasional redeliveries, generally by ensuring that operations are idempotent, i.e., that the results of processing the message multiple times are the same, or by tracking and catching duplicates. Alternatively, one can use Cloud Dataflow as a subscriber to remove duplicates.
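The "tracking and catching duplicates" option could look roughly like the following sketch. `DedupingProcessor` is a hypothetical helper that remembers recently seen message IDs in memory; this only works within a single process, so with several subscriber instances a shared store would be needed instead, which is not shown here.

```python
from collections import OrderedDict


class DedupingProcessor:
    """Run a handler at most once per message ID, within a bounded window."""

    def __init__(self, handler, max_tracked=10000):
        self._handler = handler
        self._max_tracked = max_tracked
        self._seen = OrderedDict()  # message_id -> True, oldest first

    def process(self, message_id, data):
        """Return True if the handler ran, False for a known duplicate."""
        if message_id in self._seen:
            self._seen.move_to_end(message_id)
            return False
        self._handler(data)
        # Mark as seen only after the handler succeeded, so a failed
        # attempt can still be retried on redelivery.
        self._seen[message_id] = True
        if len(self._seen) > self._max_tracked:
            self._seen.popitem(last=False)  # forget the oldest ID
        return True


handled = []
processor = DedupingProcessor(handled.append)
processor.process("id-1", b"payload")
processor.process("id-1", b"payload")  # redelivery, skipped
```

The message should still be acknowledged when a duplicate is skipped, otherwise it will keep being redelivered.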