Google pub/sub ERROR com.google.cloud.pubsub.v1.StreamingSubscriberConnection - google-cloud-platform

I have a Snowplow enricher application hosted in GKE consuming messages from a Google Pub/Sub subscription, and the enricher application is throwing the error below.
I can see the num_undelivered_messages count spiking (going above 50,000) on the Pub/Sub subscription 3-4 times a day, and I presume these errors occur because the enricher application is unable to fetch messages from the mentioned subscription.
Why is the application unable to connect to the Pub/Sub subscription at times?
Any help is really appreciated.
Apr 12, 2022 12:30:32 PM com.google.cloud.pubsub.v1.StreamingSubscriberConnection$2 onFailure
WARNING: failed to send operations
com.google.api.gax.rpc.UnavailableException: io.grpc.StatusRuntimeException: UNAVAILABLE: 502:Bad Gateway
at com.google.api.gax.rpc.ApiExceptionFactory.createException(ApiExceptionFactory.java:69)
at com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:72)
at com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:60)
at com.google.api.gax.grpc.GrpcExceptionCallable$ExceptionTransformingFuture.onFailure(GrpcExceptionCallable.java:97)
at com.google.api.core.ApiFutures$1.onFailure(ApiFutures.java:68)
at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1050)
at com.google.common.util.concurrent.DirectExecutor.execute(DirectExecutor.java:30)
at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:1176)
at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:969)
at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:760)
at io.grpc.stub.ClientCalls$GrpcFuture.setException(ClientCalls.java:545)
at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:515)
at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:426)
at io.grpc.internal.ClientCallImpl.access$500(ClientCallImpl.java:66)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:689)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$900(ClientCallImpl.java:577)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInternal(ClientCallImpl.java:751)
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:740)
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: io.grpc.StatusRuntimeException: UNAVAILABLE: 502:Bad Gateway
at io.grpc.Status.asRuntimeException(Status.java:533)
... 15 more

The accumulation of messages in the subscriptions suggests that your subscribers are not keeping up with the flow of messages.
To monitor your subscribers, you can create a dashboard that contains the backlog metrics num_undelivered_messages and oldest_unacked_message_age (the age of the oldest unacked message in the subscription's backlog), aggregated by resource across all your subscriptions.
If both oldest_unacked_message_age and num_undelivered_messages are growing, the subscribers are not keeping up with the message volume.
Solution: Add more subscriber threads or machines (see the sketch below), and look for any bugs in your code that might prevent messages from being acknowledged.
If the backlog stays small but oldest_unacked_message_age grows steadily, a small number of messages may be stuck and cannot be processed.
Solution: Check your application logs to understand whether some messages are causing your code to crash. It's unlikely, but possible, that the offending messages are stuck in Pub/Sub rather than in your client.
If oldest_unacked_message_age exceeds the subscription's message retention duration, there is a high risk of data loss; in that case, the best option is to set up alerts that fire before the subscription's message retention duration lapses.
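For the first case (subscribers not keeping up), a rough sketch of scaling a Java streaming subscriber is shown below. The project/subscription names, thread count, parallel pull count, and flow-control limits are hypothetical and would need tuning for your workload:

import com.google.api.gax.batching.FlowControlSettings;
import com.google.api.gax.core.InstantiatingExecutorProvider;
import com.google.cloud.pubsub.v1.AckReplyConsumer;
import com.google.cloud.pubsub.v1.MessageReceiver;
import com.google.cloud.pubsub.v1.Subscriber;
import com.google.pubsub.v1.ProjectSubscriptionName;
import com.google.pubsub.v1.PubsubMessage;

public class ScaledSubscriber {
  public static void main(String[] args) {
    // Hypothetical project and subscription names.
    ProjectSubscriptionName subscription =
        ProjectSubscriptionName.of("my-project", "enriched-good-sub");

    MessageReceiver receiver = (PubsubMessage message, AckReplyConsumer consumer) -> {
      // Process the message, then ack promptly so it is not redelivered.
      consumer.ack();
    };

    Subscriber subscriber = Subscriber.newBuilder(subscription, receiver)
        // Open several streaming pull connections from this client.
        .setParallelPullCount(4)
        // More executor threads to run message callbacks concurrently.
        .setExecutorProvider(
            InstantiatingExecutorProvider.newBuilder().setExecutorThreadCount(16).build())
        // Bound the number/size of outstanding messages so the client is not overwhelmed.
        .setFlowControlSettings(FlowControlSettings.newBuilder()
            .setMaxOutstandingElementCount(1_000L)
            .setMaxOutstandingRequestBytes(100L * 1024 * 1024)
            .build())
        .build();

    subscriber.startAsync().awaitRunning();
  }
}

Beyond tuning a single client, you can also run more replicas of the subscriber (e.g. scale the GKE deployment horizontally), since multiple subscribers on the same subscription share the backlog.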

Related

GCP PubSub understanding filters

In my understanding, Pub/Sub filters are supposed to reduce the number of messages sent to a specific subscription. We currently observe behaviour that we didn't expect.
Assume there is a Pub/Sub topic "XYZ" and a subscription to that topic, "XYZ-Sub", with a filter attributes.someHeader = "x".
Two messages are published to that topic:
the first with attributes.someHeader = "a", the second with attributes.someHeader = "x".
I expect that only message 2 will be delivered to the subscription, as message 1 does not match the filter.
If that is not the case and both messages still get delivered (which is what we currently observe):
The GCP console shows a rising number of unacked messages on the subscription when no client is connected. Pulling these messages in the GCP console removes them without showing any received messages, which makes me assume that the filters are applied when pulling messages.
Are filters evaluated on PubSub client and not topic level?
What is the point in using filters with pub/sub?
Will the delivery of the unwanted message (the bytes of the message) be billed?
Filtering in Cloud Pub/Sub only delivers messages that match the filter to subscribers. The filters are applied in the Pub/Sub service itself, not in the client. They allow you to limit the set of messages delivered to subscribers when the subscriber only wants to process a subset of the messages.
In your example, only the message with attributes.someHeader = "x" should be delivered. However, note that, as the documentation describes, the backlog metrics might include messages that don't match the filter. Such messages will not be delivered to subscribers, but may still show up in the backlog metrics for a time.
You do get charged the Pub/Sub message delivery price for messages that are filtered out and not delivered. However, you do not pay any network fees for them, nor do you end up paying for any compute to process messages you do not receive.
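For reference, a filter is attached when the subscription is created and cannot be changed afterwards. A minimal sketch with the Java admin client, assuming a hypothetical project ID my-project and the topic/subscription names from the question:

import com.google.cloud.pubsub.v1.SubscriptionAdminClient;
import com.google.pubsub.v1.ProjectSubscriptionName;
import com.google.pubsub.v1.Subscription;
import com.google.pubsub.v1.TopicName;

public class CreateFilteredSubscription {
  public static void main(String[] args) throws Exception {
    try (SubscriptionAdminClient client = SubscriptionAdminClient.create()) {
      Subscription request = Subscription.newBuilder()
          .setName(ProjectSubscriptionName.format("my-project", "XYZ-Sub"))
          .setTopic(TopicName.format("my-project", "XYZ"))
          // Only messages whose attribute matches this expression are delivered.
          .setFilter("attributes.someHeader = \"x\"")
          .setAckDeadlineSeconds(60)
          .build();
      client.createSubscription(request);
    }
  }
}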

Message stays in GCP Pub/Sub even after acknowledgement using the GCP REST API

I am using the following GCP Pub/Sub REST APIs for pulling and acknowledging messages.
To pull messages:
POST https://pubsub.googleapis.com/v1/projects/myproject/subscriptions/mysubscription:pull
{
  "returnImmediately": false,
  "maxMessages": 10
}
To acknowledge a message:
POST https://pubsub.googleapis.com/v1/projects/myproject/subscriptions/mysubscription:acknowledge
{
  "ackIds": [
    "dQNNHlAbEGEIBERNK0EPKVgUWQYyODM2LwgRHFEZDDsLRk1SK..."
  ]
}
I am using the Postman tool to call the above APIs. However, when I pull messages the next time, I can see the same message with the same messageId and a different ackId, even after the acknowledgement. Is there any mechanism available to exclude acknowledged messages from a GCP pull (subscriptions/mysubscription:pull)?
Cloud Pub/Sub is an at-least-once delivery system, so some duplicates are expected. However, if you are always seeing duplicates, it is likely that you are not acknowledging the message before the ack deadline passes. The default ack deadline is 10 seconds. If you do not call ack within that time period, then the message will be redelivered. You can set the ack deadline on a subscription to up to 600 seconds.
If all of your messages are expected to take a longer time to process, then it is best to increase the ack deadline. If only a couple of messages will be slow and most will be processed quickly, then it's better to use the modifyAckDeadline call to increase the ack deadline on a per-message basis.
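For completeness, a per-message deadline extension with the same REST API looks like the request below (same hypothetical project and subscription as in the question; the ackId comes from the pull response):

POST https://pubsub.googleapis.com/v1/projects/myproject/subscriptions/mysubscription:modifyAckDeadline
{
  "ackIds": [
    "dQNNHlAbEGEIBERNK0EPKVgUWQYyODM2LwgRHFEZDDsLRk1SK..."
  ],
  "ackDeadlineSeconds": 600
}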

How to modify/check Google Cloud Run's retry limit on failure?

I have a topic that, when published to, pushes the event to a Cloud Run endpoint, and a trigger on a storage bucket that publishes to this topic. The container in Cloud Run fails to process the event and has been restarted hundreds of times, and I don't want to waste money on this. How can I limit the retries on failure for a Cloud Run container?
A possible answer to the puzzle is the following.
If we read the documentation on PUSH subscriptions found here, we find the following:
... Pub/Sub retries delivery until the message expires after the
subscription's message retention period.
What this means is that if Pub/Sub pushes a message to Cloud Run and Cloud Run does not acknowledge the message by returning a 200 response code, then the message will be re-pushed for the duration of the "message retention period". By default this is 7 days, but according to the documentation it can be set to a minimum of 10 minutes. What this seems to say is that we can stop a poison message after a minimum of 10 minutes of retries.
If a message is pushed and not acked, then it won't be pushed again immediately but instead be pushed as a function of a back-off algorithm described here.
If we look at the gcloud documentation we find reference to the concept of a maximum number of delivery attempts (--max-delivery-attempts). Associated with this is a topic called the dead letter topic (--dead-letter-topic). What this appears to define is that if an attempt is made to deliver a Pub/Sub message more than the maximum number of times, the message is removed from the queue of messages associated with the subscription and moved to the dead letter topic. If you define this for your environment, then your Cloud Run service will only execute a finite number of times, after which the poison messages will be moved elsewhere.
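As a rough illustration, the same dead-letter configuration can also be applied from the Java admin client; the project, subscription, and topic names below are hypothetical, and the policy can equally be set with the gcloud flags mentioned above:

import com.google.cloud.pubsub.v1.SubscriptionAdminClient;
import com.google.protobuf.FieldMask;
import com.google.pubsub.v1.DeadLetterPolicy;
import com.google.pubsub.v1.ProjectSubscriptionName;
import com.google.pubsub.v1.Subscription;
import com.google.pubsub.v1.TopicName;
import com.google.pubsub.v1.UpdateSubscriptionRequest;

public class AttachDeadLetterPolicy {
  public static void main(String[] args) throws Exception {
    try (SubscriptionAdminClient client = SubscriptionAdminClient.create()) {
      Subscription subscription = Subscription.newBuilder()
          .setName(ProjectSubscriptionName.format("my-project", "my-push-sub"))
          // Stop retrying after a bounded number of delivery attempts (allowed range 5-100).
          .setDeadLetterPolicy(DeadLetterPolicy.newBuilder()
              .setDeadLetterTopic(TopicName.format("my-project", "my-dead-letter-topic"))
              .setMaxDeliveryAttempts(5)
              .build())
          .build();
      client.updateSubscription(UpdateSubscriptionRequest.newBuilder()
          .setSubscription(subscription)
          .setUpdateMask(FieldMask.newBuilder().addPaths("dead_letter_policy").build())
          .build());
    }
  }
}

Per the dead-letter documentation, the Pub/Sub service account also needs permission to publish to the dead letter topic and to subscribe to the original subscription for forwarding to work.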

Google Cloud PubSub Message Delivered More than Once Before Reaching the Acknowledgement Deadline

Background:
We configured a Cloud Pub/Sub topic to interact with multiple App Engine services.
There we have configured push-based subscribers, with the acknowledgement deadline set to 600 seconds.
Issue:
We have observed that Pub/Sub has pushed the same message twice (more than twice from some other topics) to its subscribers. Looking at the logs, I can see these pushes happened with a gap of just 1 second. Ideally, as we have configured the ackDeadline to 600 seconds, Pub/Sub should re-attempt message delivery only after 600 seconds.
I need answers to the following:
Why was the same message delivered more than once within 1 second?
Does Pub/Sub not honor the ackDeadline configuration before re-attempting message delivery?
References:
- https://cloud.google.com/pubsub/docs/subscriber
Message redelivery can happen for a couple of reasons. First of all, it is possible that a message got published twice. Sometimes the publisher will get back an error like a deadline exceeded, meaning the publish took longer than anticipated. The message may or may not have actually been published in this situation. Often, the correct action is for the publisher to retry the publish and in fact that is what the Google-provided client libraries do by default. Consequently, there may be two copies of the message that were successfully published, even though the client only got confirmation for one of them.
Secondly, Google Cloud Pub/Sub guarantees at-least-once delivery. This means that occasionally, messages can be redelivered, even if the ackDeadline has not yet passed or an ack was sent back to the service. Acknowledgements are best effort and most of the time, they are successfully processed by the service. However, due to network glitches, server restarts, and other regular occurrences of that nature, sometimes the acknowledgements sent by the subscriber will not be processed, resulting in message redelivery.
A subscriber should be designed to be resilient to these occasional redeliveries, generally by ensuring that operations are idempotent, i.e., that the results of processing the message multiple times are the same, or by tracking and catching duplicates. Alternatively, one can use Cloud Dataflow as a subscriber to remove duplicates.
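As a rough sketch of the "tracking and catching duplicates" idea, keyed on the Pub/Sub message ID (shown with the Java pull client for brevity; a push endpoint would key on the messageId in the push request body, and a real deduplication store would be shared across instances and expire entries):

import com.google.cloud.pubsub.v1.AckReplyConsumer;
import com.google.cloud.pubsub.v1.MessageReceiver;
import com.google.pubsub.v1.PubsubMessage;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class DedupingReceiver implements MessageReceiver {
  // In-process set of already-processed message IDs; a production system would use a
  // shared store (e.g. a database or cache) with expiry instead of in-memory state.
  private final Set<String> processed = ConcurrentHashMap.newKeySet();

  @Override
  public void receiveMessage(PubsubMessage message, AckReplyConsumer consumer) {
    if (!processed.add(message.getMessageId())) {
      // Already handled this message: ack the redelivery and skip processing.
      consumer.ack();
      return;
    }
    handle(message); // application-specific, ideally idempotent, processing
    consumer.ack();
  }

  private void handle(PubsubMessage message) {
    // process the message payload here
  }
}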

What is the Message Broker retry interval and how to configure it

I have used a WSO2 Message Broker MB300 server to communicate between microservices. It uses a topic connection.
In the dashboard, in the "Durable Topic Subscriptions" section, the "Number Of Messages Delivery Pending" column shows messages as pending, and that count keeps increasing. Is there any configuration for message delivery delay or retry interval?
A redelivery delay was introduced in MB 3.2.0 and can be set as a system property:
System.setProperty("AndesRedeliveryDelay", "10000");
The maximum number of redelivery attempts can be set in broker.xml as follows:
<maximumRedeliveryAttempts>10</maximumRedeliveryAttempts>