Cloud pubsub slow poll rate

I have a pubsub topic, with one subscription, and two different subscribers are pulling from it.
Using Stackdriver, I can see that the subscription has ~1000 messages.
Each subscriber runs the following poll loop:
from google.cloud import pubsub  # legacy google-cloud-pubsub API
client = pubsub.Client()
topic = client.topic(topic_name)
subscription = pubsub.Subscription(subscription_name)
while True:
    messages = subscription.pull(return_immediately=True, max_messages=100, client=client)
    print(len(messages))
    # put messages in a local queue for later processing; those workers ack the messages
My issue is a slow poll rate: even though plenty of messages are waiting to be pulled, I get only a few messages each time, and many responses come back without any messages at all. According to Stackdriver, my pulled-message rate is ~1.5 messages/sec.
I tried return_immediately=False, which helped a bit (the pull rate increased to ~2.5 messages/sec), but that is still not the rate I would expect.
Any ideas on how to increase the pull rate? Are there any Pub/Sub polling best practices?

In order to increase your pull rate, you need to have more than one outstanding pull request at a time. How many depends on how fast and from how many places you publish. You'll need at least a few outstanding at all times. As soon as one of them returns, create another pull request. That way, whenever Cloud Pub/Sub is ready to deliver messages to your subscriber, you have requests waiting to receive messages.
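A minimal sketch of the "several outstanding pull requests" idea, using only the standard library. `pull_fn` is a hypothetical placeholder for whatever blocking pull call your client library exposes, and `fake_pull` only simulates one; `total_requests` exists just to bound the demo, whereas a real subscriber would keep reissuing requests indefinitely.

```python
import concurrent.futures
import time


def pull_with_outstanding_requests(pull_fn, outstanding=4, total_requests=20, max_messages=100):
    """Keep up to `outstanding` pull requests in flight, reissuing one as soon as any returns.

    pull_fn(max_messages) is a hypothetical blocking pull call that returns a
    (possibly empty) list of messages. Returns every message received.
    """
    received = []
    issued = 0
    with concurrent.futures.ThreadPoolExecutor(max_workers=outstanding) as pool:
        in_flight = set()
        # Prime the pipeline so several requests are waiting on the service at once.
        while issued < min(outstanding, total_requests):
            in_flight.add(pool.submit(pull_fn, max_messages))
            issued += 1
        while in_flight:
            done, in_flight = concurrent.futures.wait(
                in_flight, return_when=concurrent.futures.FIRST_COMPLETED)
            for future in done:
                received.extend(future.result())
                # Replace each completed request immediately to keep the pipeline full.
                if issued < total_requests:
                    in_flight.add(pool.submit(pull_fn, max_messages))
                    issued += 1
    return received


def fake_pull(max_messages):
    """Simulated pull: waits briefly, then returns three placeholder messages."""
    time.sleep(0.01)
    return ["msg"] * min(3, max_messages)


if __name__ == "__main__":
    messages = pull_with_outstanding_requests(fake_pull, outstanding=4, total_requests=10)
    print(len(messages))  # 10 requests x 3 messages each -> 30
```

The design choice is to wait on FIRST_COMPLETED rather than on all requests, so a slow or empty response never leaves the subscriber with zero requests waiting on the service.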


Is there a way to retrieve the count of messages in a PubSub subscription (in realtime)?

I want to achieve batch consumption of a PubSub subscription, retrieving all the messages that were in the subscription at the beginning of my process. To do so, I use PubSub's asynchronous pulling for Java, and the consumer.ack() and consumer.nack() functions to process exactly the number of messages that I want and make the subscription redeliver the messages that I have received but not yet processed. My problem is that I have not managed to find a way to retrieve the real-time count of messages in my subscription.
I have started requesting the metric from Google Cloud Monitoring, but unfortunately it lags the real count of undelivered messages in the subscription by ~3 minutes.
Is there any way to retrieve this message count in real time?
There is no way to retrieve the message count in real time, no. Also keep in mind that such a number would not be sufficient to retrieve all of the messages that were in the subscription at the beginning of the process unless you can guarantee that no publishing is happening at the same time.
If there is publishing, your subscriber could receive those new messages before messages published earlier. Ordered message delivery does not fully solve this either: its guarantees are per ordering key, not a total ordering across the subscription. If you can guarantee that there are no publishes during this time and/or you are only bringing the subscriber up periodically, then it sounds more like a batch case, which means you may want to consider a database or a GCS file as an alternative place to store the messages for processing.

Google PubSub Python multiple subscriber clients receiving duplicate messages

I have a pretty straightforward app that starts a PubSub subscriber StreamingPull client. I have this deployed on Kubernetes so I can scale. When I have a single pod deployed, everything works as expected. When I scale to 2 containers, I start getting duplicate messages. I know that a small number of duplicate messages is to be expected, but almost half the messages, sometimes more, are received multiple times.
My process takes about 600ms to process a message. The subscription acknowledgement deadline is set to 600s. I published 1000 messages, and the subscription was emptied in less than a minute, but the acknowledge_message_operation metric shows ~1500 calls, a small number of them with response_code expired. There were no failures in my process, and all messages were acked upon processing. Logs show that the same message was received by the two containers at the exact same time. The minute it took to process all the messages was well below the subscription's acknowledgement deadline, and the Python client is supposed to handle lease management, so I'm not sure why there were any expired messages at all. I also don't understand why the same message is sent to multiple subscriber clients at the same time.
Minimal working example:
import time
from google.cloud import pubsub_v1
PROJECT_ID = 'my-project'
PUBSUB_TOPIC_ID = 'duplicate-test'
PUBSUB_SUBSCRIPTION_ID = 'duplicate-test'
def subscribe(sleep_time=None):
    subscriber = pubsub_v1.SubscriberClient()
    subscription_path = subscriber.subscription_path(
        PROJECT_ID, PUBSUB_SUBSCRIPTION_ID)
    def callback(message):
        # Simulate per-message processing time before acking.
        if sleep_time:
            time.sleep(sleep_time)
        print(f'acking {message.message_id}')
        message.ack()
    future = subscriber.subscribe(
        subscription_path, callback=callback)
    print(f'Listening for messages on {subscription_path}')
    future.result()  # block so the streaming pull keeps running
def publish(num_messages):
    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(PROJECT_ID, PUBSUB_TOPIC_ID)
    for i in range(num_messages):
        publisher.publish(topic_path, str(i).encode())
In two terminals, run subscribe(1). In a third terminal, run publish(200). For me, this will give duplicates in the two subscriber terminals.
It is unusual for two subscribers to get the same message at the same time unless:
The message got published twice due to a retry (and therefore, as far as Cloud Pub/Sub is concerned, there are two messages). In this case, the content of the two messages would be the same, but their message IDs would be different. Therefore, it is worth checking the service-provided message ID to confirm that the messages are indeed duplicates.
The subscribers are on different subscriptions, which means each of the subscribers would receive all of the messages.
If neither of these is the case, then duplicates should be relatively rare. There is an edge case in dealing with large backlogs of small messages with streaming pull (which is what the Python client library uses). Basically, if messages that are very small are published in a burst and subscribers then consume that burst, it is possible to see the behavior you are seeing. All of the messages would end up being sent to one of the two subscribers and would be buffered behind the flow control limits of the number of outstanding messages. These messages may exceed their ack deadline, resulting in redelivery, likely to the other subscriber. The first subscriber still has these messages in its buffer and will see these messages, too.
However, if you are consistently seeing two freshly started subscribers immediately receive the same messages with the same message IDs, then you should contact Google Cloud support with your project name, subscription name, and a sample of the message IDs. They will be better able to investigate why this immediate duplication is happening.
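To tell the two cases above apart, a minimal sketch, assuming each subscriber logs the service-assigned message ID and the payload of every message it receives. The function name and log format are hypothetical. Identical payloads under different IDs only suggest a publish retry: in general two distinct events can legitimately carry the same payload, although in a test like publish(200), where every payload is unique, they would point to retries.

```python
from collections import defaultdict


def classify_duplicates(received):
    """Classify duplicates in a log of (message_id, data) pairs seen by subscribers.

    Returns (redelivered, republished):
    - redelivered: message IDs received more than once (a true redelivery by Pub/Sub).
    - republished: payloads seen under more than one message ID (likely a publish
      retry; to Pub/Sub these are distinct messages).
    """
    count_by_id = defaultdict(int)
    ids_by_data = defaultdict(set)
    for message_id, data in received:
        count_by_id[message_id] += 1
        ids_by_data[data].add(message_id)
    redelivered = sorted(mid for mid, n in count_by_id.items() if n > 1)
    republished = sorted(d for d, ids in ids_by_data.items() if len(ids) > 1)
    return redelivered, republished


if __name__ == "__main__":
    log = [("101", b"0"), ("102", b"1"), ("101", b"0"), ("103", b"1")]
    print(classify_duplicates(log))  # (['101'], [b'1'])
```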
(Edited as I misread the deadlines)
Looking at the Streaming Pull docs, this seems to be expected behavior:
The gRPC StreamingPull stack is optimized for high throughput and therefore buffers messages. This can have some consequences if you are attempting to process large backlogs of small messages (rather than a steady stream of new messages). Under these conditions, you may see messages delivered multiple times and they may not be load balanced effectively across clients.

Google Cloud PubSub message delivered more than once before the acknowledgement deadline

We configured a Cloud Pub/Sub topic to communicate among multiple App Engine services.
We configured push-based subscribers for it, with an acknowledgement deadline of 600 seconds.
We have observed that Pub/Sub pushed the same message twice (more than twice for some other topics) to its subscribers. Looking at the logs, I can see these pushes happened just 1 second apart. Since we configured ackDeadline to 600 seconds, we expected Pub/Sub to reattempt message delivery only after 600 seconds.
We need answers to the following:
Why was the same message delivered more than once within only 1 second?
Does Pub/Sub not honor the ackDeadline configuration before reattempting message delivery?
Message redelivery can happen for a couple of reasons. First of all, it is possible that a message got published twice. Sometimes the publisher will get back an error like a deadline exceeded, meaning the publish took longer than anticipated. The message may or may not have actually been published in this situation. Often, the correct action is for the publisher to retry the publish and in fact that is what the Google-provided client libraries do by default. Consequently, there may be two copies of the message that were successfully published, even though the client only got confirmation for one of them.
Secondly, Google Cloud Pub/Sub guarantees at-least-once delivery. This means that occasionally, messages can be redelivered, even if the ackDeadline has not yet passed or an ack was sent back to the service. Acknowledgements are best effort and most of the time, they are successfully processed by the service. However, due to network glitches, server restarts, and other regular occurrences of that nature, sometimes the acknowledgements sent by the subscriber will not be processed, resulting in message redelivery.
A subscriber should be designed to be resilient to these occasional redeliveries, generally by ensuring that operations are idempotent, i.e., that the results of processing the message multiple times are the same, or by tracking and catching duplicates. Alternatively, one can use Cloud Dataflow as a subscriber to remove duplicates.
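A minimal sketch of "tracking and catching duplicates" in a subscriber callback. `Message` is a hypothetical stand-in for a received message, and the `event_id` attribute is a hypothetical application-level ID that the publisher would have to set. Keying on the service-assigned message ID catches redeliveries only; copies created by publish retries get new message IDs, so catching those needs an application-level key. The seen-set is in memory, so it only deduplicates within one process; several subscribers would need a shared, durable store.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, Dict, Hashable


@dataclass
class Message:
    """Minimal stand-in for a received message (hypothetical, for illustration only)."""
    message_id: str
    data: bytes
    attributes: Dict[str, str] = field(default_factory=dict)


class DedupingHandler:
    """Wraps a handler so each deduplication key is processed at most once per process."""

    def __init__(self, handler: Callable[[Message], None], key_fn: Callable[[Message], Hashable]):
        self._handler = handler
        self._key_fn = key_fn
        self._seen = set()  # in-memory only; multiple subscribers would need a shared store
        self._lock = threading.Lock()  # callbacks may run on several threads

    def __call__(self, message: Message) -> bool:
        key = self._key_fn(message)
        with self._lock:
            if key in self._seen:
                return False  # duplicate: skip the work (the caller should still ack it)
            self._seen.add(key)  # reserve the key so a concurrent copy is skipped
        try:
            self._handler(message)
        except Exception:
            with self._lock:
                self._seen.discard(key)  # let a later redelivery retry the work
            raise
        return True


if __name__ == "__main__":
    processed = []
    by_message_id = DedupingHandler(lambda m: processed.append(m.data), lambda m: m.message_id)
    for m in [Message("1", b"a"), Message("1", b"a"), Message("2", b"a")]:
        by_message_id(m)
    print(processed)  # [b'a', b'a']: redelivered "1" skipped, republished "2" is not
```

Swapping the key function to `lambda m: m.attributes.get("event_id")` would also drop the republished copy, at the cost of requiring publishers to set that attribute.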

GCloud Pub/Sub Push Subscription: Limit max outstanding messages

Is there a way in a push subscription configuration to limit the maximum number of outstanding messages? The high-level subscriber docs say: "With slow-start, Google Cloud Pub/Sub starts by sending a single message at a time, and doubles up with each successful delivery, until it reaches the maximum number of concurrent messages outstanding." I want to be able to limit the maximum number of messages being processed. Can this be done through the Pub/Sub config?
I've also thought of a number of other ways to effectively achieve this, but none seem great:
Have some semaphore type system implemented in my push endpoint that returns a 429 once my max concurrency level is hit?
Similar, but have it deregister the push endpoint (turning it into a pull subscription) until the current messages have been processed
My push endpoints are all on GAE, so could there also be something in the GAE configs to limit simultaneous push subscription requests?
Push subscriptions do not offer any way to limit the number of outstanding messages. If one wants that level of control, then it is necessary to use pull subscriptions and flow control.
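Client-side flow control on a pull subscriber boils down to "stop taking new messages while N are being processed". The official Python client exposes this as `pubsub_v1.types.FlowControl(max_messages=...)` passed via `flow_control=` to `SubscriberClient.subscribe`; the sketch below is only a standard-library illustration of the same idea, with a hypothetical `FlowControlledDispatcher` class and handler, not the library's implementation.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class FlowControlledDispatcher:
    """Caps how many messages are being processed at once (client-side flow control sketch)."""

    def __init__(self, max_outstanding, handler):
        self._slots = threading.BoundedSemaphore(max_outstanding)
        self._handler = handler
        self._pool = ThreadPoolExecutor(max_workers=max_outstanding)
        self._lock = threading.Lock()
        self._in_flight = 0
        self.peak_in_flight = 0  # observed maximum, for demonstration

    def dispatch(self, message):
        # Blocks the pulling thread once the limit is reached, so nothing more is pulled.
        self._slots.acquire()
        with self._lock:
            self._in_flight += 1
            self.peak_in_flight = max(self.peak_in_flight, self._in_flight)
        self._pool.submit(self._run, message)

    def _run(self, message):
        try:
            self._handler(message)  # a real handler would ack the message when done
        finally:
            with self._lock:
                self._in_flight -= 1
            self._slots.release()  # free a slot so the puller can hand over another message

    def close(self):
        self._pool.shutdown(wait=True)


if __name__ == "__main__":
    done = []
    dispatcher = FlowControlledDispatcher(3, lambda m: (time.sleep(0.005), done.append(m)))
    for i in range(20):  # stands in for messages arriving from a pull loop
        dispatcher.dispatch(i)
    dispatcher.close()
    print(len(done), dispatcher.peak_in_flight <= 3)  # 20 True
```

Unlike returning 429s from a push endpoint, the limit here is enforced before messages are taken, so the service never sees errors and has no reason to slow delivery down.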
Returning 429 errors as a means to limit outstanding messages may have undesirable side effects. On errors, Cloud Pub/Sub will reduce the rate of sending messages to a push subscriber. If a sufficient number of 429 errors are returned, it is entirely possible that the subscriber will receive a smaller number of messages than it can handle for a time while Cloud Pub/Sub ramps the delivery rate back up.
Switching from push to pull is a possibility, though it still may not be a good solution. It would really depend on the frequency with which the push subscriber exceeds the desired number of outstanding messages. The change between push and pull and back may not take place instantaneously, meaning the subscriber could still exceed the desired limit for some period of time and may also experience a delay in receiving new messages when switching back to a push subscriber.

Subscribing to AWS SQS Messages

I have a large number of messages in an AWS SQS queue. Other sources push messages to it constantly, with no predictable pattern in how often they arrive. Currently, I poll SQS every second to check whether any messages are available. Is there a better way of handling this, like receiving a notification from SQS or SNS that messages are available, so that I only call SQS when needed instead of polling constantly?
The way to do what you want is to use long polling: rather than polling every second, you open a request that stays open until it either times out or a message arrives in the queue. Take a look at the documentation for ReceiveMessageRequest.
ReceiveMessageRequest req = new ReceiveMessageRequest()
.withWaitTimeSeconds(Integer.valueOf(20)); // set long poll timeout to 20 sec
// set other properties on the request as well
ReceiveMessageResult result = amazonSQS.receiveMessage(req);
A common usage pattern for this is to have a background thread running the long poll and pushing the results into an internal queue (such as LinkedBlockingQueue or an ExecutorService) for a worker thread to read from.
PS. Don't forget to call deleteMessage once you're done processing the result so you don't end up receiving it again.
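The background-poller pattern above, sketched in Python with `queue.Queue` in place of LinkedBlockingQueue. `receive_fn`, `process_fn`, and `delete_fn` are hypothetical injected callables: with boto3, `receive_fn` could wrap `sqs.receive_message(QueueUrl=..., WaitTimeSeconds=20, MaxNumberOfMessages=10)` and `delete_fn` could wrap `sqs.delete_message(QueueUrl=..., ReceiptHandle=...)`. The demo uses a fake receive function, so it runs without AWS.

```python
import queue
import threading
import time


def start_long_poller(receive_fn, inbox, stop_event):
    """Run a background thread that long-polls with receive_fn and feeds `inbox`."""
    def loop():
        while not stop_event.is_set():
            # receive_fn should block until messages arrive or the long-poll timeout
            # expires, so an idle queue costs one request per timeout, not one per second.
            for msg in receive_fn():
                inbox.put(msg)
    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread


def run_worker(inbox, process_fn, delete_fn):
    """Process messages from `inbox` until a None sentinel arrives."""
    while True:
        msg = inbox.get()
        if msg is None:
            break
        process_fn(msg)
        delete_fn(msg)  # delete only after processing succeeds; otherwise it is redelivered


def run_demo(timeout=2.0):
    batches = [["m1", "m2"], [], ["m3"]]

    def fake_receive():
        # Hypothetical stand-in for a long poll: next batch, or empty after a short wait.
        if batches:
            return batches.pop(0)
        time.sleep(0.01)
        return []

    inbox, stop = queue.Queue(), threading.Event()
    processed, deleted = [], []
    start_long_poller(fake_receive, inbox, stop)
    worker = threading.Thread(target=run_worker, args=(inbox, processed.append, deleted.append))
    worker.start()
    deadline = time.monotonic() + timeout
    while len(deleted) < 3 and time.monotonic() < deadline:
        time.sleep(0.01)
    stop.set()
    inbox.put(None)  # tell the worker to exit
    worker.join()
    return processed, deleted


if __name__ == "__main__":
    print(run_demo())  # (['m1', 'm2', 'm3'], ['m1', 'm2', 'm3'])
```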
You can also use the worker functionality in AWS Elastic Beanstalk. It allows you to build a worker to process each message, and when you use Elastic Beanstalk to deploy it to an EC2 instance, you can define it as subscribed to a specific queue. Then each message will be POSTed to the worker, without you needing to call receive-message on the queue yourself.
It makes your system wiring much easier, as you can also have auto scaling rules that spawn multiple workers to handle more messages at times of peak load and scale back down to a single worker when the load is low. It will also delete the message automatically if your worker responds with OK.
You could also have a look at Shoryuken and the property delay:
delay: 25 # The delay in seconds to pause a queue when it's empty
To be honest, though, we use delay: 0 here, because SQS is inexpensive:
First 1 million Amazon SQS Requests per month are free
$0.50 per 1 million Amazon SQS Requests per month thereafter ($0.00000050 per SQS Request)
A single request can have from 1 to 10 messages, up to a maximum total payload of 256KB.
Each 64KB ‘chunk’ of payload is billed as 1 request. For example, a single API call with a 256KB payload will be billed as four requests.
You will probably spend less than 10 dollars a month polling for messages every second, 24x7, on a single host.
One of the advantages of Shoryuken is that it fetches in batches, which saves some money compared with fetch-per-message solutions.
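The cost estimate can be checked with a little arithmetic. This sketch uses the prices quoted above (which may have changed since; check current AWS pricing) and assumes a 30-day month with one receive call per second; the function names are illustrative only.

```python
import math

FREE_REQUESTS_PER_MONTH = 1_000_000   # quoted free tier
PRICE_PER_MILLION_USD = 0.50          # quoted price after the free tier
CHUNK_BYTES = 64 * 1024               # each 64 KB chunk of payload is billed as one request


def billed_requests(payload_bytes: int) -> int:
    """Billable requests for one API call carrying `payload_bytes` (an empty poll still counts as 1)."""
    return max(1, math.ceil(payload_bytes / CHUNK_BYTES))


def monthly_cost_usd(requests_per_month: int) -> float:
    """Monthly cost after the free tier, using the quoted per-request price."""
    billable = max(0, requests_per_month - FREE_REQUESTS_PER_MONTH)
    return billable / 1_000_000 * PRICE_PER_MILLION_USD


if __name__ == "__main__":
    polls = 60 * 60 * 24 * 30  # one receive call per second for a 30-day month
    print(polls)                              # 2592000
    print(billed_requests(256 * 1024))        # 4
    print(f"${monthly_cost_usd(polls):.2f}")  # $0.80
```

So one poll per second works out to well under the 10-dollar bound, and fetching up to 10 messages per request (as Shoryuken does) keeps the request count, and therefore the bill, from growing with message volume.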