Google Cloud Pub/Sub retrieve message by ID - google-cloud-platform

Problem: My use case is that I want to publish thousands of messages to Google Cloud Pub/Sub with a 5-minute retention period, but only retrieve specific messages by their ID. A Cloud Function would retrieve one message by ID using the Node.js SDK, and all untreated messages would be deleted by the retention policy. All the examples I have found handle arbitrary messages from the subscriber.
Is it possible to pull just one message by ID (or any other metadata) and close the connection?

There is no way to retrieve individual messages by ID, no. It doesn't really fit into the expected use cases for Cloud Pub/Sub where the publishers and subscribers are meant to be decoupled, meaning the subscriber inherently doesn't know the message IDs prior to receiving the messages.
You may instead want to transmit the message IDs to the subscribers via whatever other mechanism you are using. Or, if you know at publish time which messages will ultimately need to be retrieved, you could add an attribute to the message to indicate this and use filtering.
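As a sketch of the second suggestion, the filtered subscription and the attribute could be set up with the gcloud CLI like this (the topic, subscription, and attribute names are hypothetical, and this requires an existing GCP project):

```shell
# Create a subscription that only receives messages marked for retrieval.
# "my-topic", "retrieve-sub", and the "retrieve" attribute are hypothetical names.
gcloud pubsub subscriptions create retrieve-sub \
  --topic=my-topic \
  --message-filter='attributes.retrieve = "true"'

# Publish a message with the attribute set; only messages carrying
# retrieve=true are delivered to retrieve-sub.
gcloud pubsub topics publish my-topic \
  --message='payload of a message that must be retrieved' \
  --attribute=retrieve=true
```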

Related

GCP PubSub understanding filters

In my understanding, Pub/Sub filters are supposed to reduce the number of messages sent to a specific subscription. We currently observe behaviour that we didn't expect.
Assuming there is a PubSub Topic "XYZ" and a subscription to that topic "XYZ-Sub" with a filter attributes.someHeader = "x"
There are 2 messages published to that topic:
The first with attributes.someHeader = "a", the second with attributes.someHeader = "x".
I expect that only message 2 will be delivered to the subscription, as message 1 does not match the filter.
If that is not the case and both messages still get delivered (which is what we currently observe):
The GCP console shows a rising number of unacked messages on a subscription when no client is connected. Pulling these messages in the GCP console removes them without showing any received messages, which makes me assume that the filters are applied when pulling messages.
Are filters evaluated on PubSub client and not topic level?
What is the point in using filters with pub/sub?
Will the delivery of the unwanted message (the bytes of the message) be billed?
Filtering in Cloud Pub/Sub only delivers messages that match the filter to subscribers. The filters are applied in the Pub/Sub service itself, not in the client. They allow you to limit the set of messages delivered to subscribers when the subscriber only wants to process a subset of the messages.
In your example, only the message with attributes.someHeader = "x" should be delivered. However, note that, as the documentation states, the backlog metrics might include messages that don't match the filter. Such messages will not be delivered to subscribers, but may still show up in the backlog metrics for a time.
You do get charged the Pub/Sub message delivery price for messages that were not delivered. However, you do not pay any network fees for them, nor do you end up paying for any compute to process messages you do not receive.

Is there a way to retrieve the count of messages in a PubSub subscription (in realtime)?

I want to achieve batch consumption of a Pub/Sub subscription, retrieving all the messages that were in the subscription at the beginning of my process. To do so, I use Pub/Sub's asynchronous pulling for Java, and the consumer.ack() and consumer.nack() functions to process exactly the number of messages that I want, and to make the subscription redeliver the messages that I have received but not yet processed. My problem is that I have not managed to find a way to retrieve the real-time count of messages in my subscription.
I have started to request the pubsub.googleapis.com/subscription/num_undelivered_messages metric from Google Cloud Monitoring, but unfortunately the metric has roughly a 3-minute latency relative to the real count of undelivered messages in the subscription.
Is there any way to retrieve this message count in real time?
There is no way to retrieve the message count in real time, no. Also keep in mind that such a number would not be sufficient to retrieve all of the messages that were in the subscription at the beginning of the process unless you can guarantee that no publishing is happening at the same time.
If there is publishing, then your subscriber could get those messages before messages published earlier, unless you are using ordered message delivery; and even then, those delivery guarantees are per ordering key, not a total ordering guarantee. If you can guarantee that there are no publishes during this time, and/or you are only bringing the subscriber up periodically, then it sounds more like a batch case, which means you may want to consider a database or a GCS file as an alternative place to store the messages for processing.

Is it possible to kick off two different cloud build which are based on subscription to same topic?

Currently I have a Cloud Build application that is kicked off by a Pub/Sub trigger subscribing to, e.g., topic1.
I would like to know if I can kick off another Cloud Build application by subscribing to the same topic. Is there a way to configure the message (or the trigger) so that if message1 is published to topic1, then cloudbuild1 is kicked off, and if message2 is published to topic1, then cloudbuild2 is kicked off?
When you create a subscription on a topic, every message published to the topic is delivered to each subscription.
Therefore, if you have TOPIC with subscriptions Sub1 and Sub2, and you publish one message to TOPIC, that message will appear in both Sub1 and Sub2.
However, you can set up a filter on messages when you create a subscription. You can set this filter only at creation time and you can't update it later; you need to delete and recreate the subscription if you want to change the filter.
In addition, you can filter only on message attributes, not on the message body content.
Therefore, design your filters wisely from the beginning, and when you publish a message to TOPIC, add attributes that allow you to route the messages to the correct subscription.
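For example, assuming the two Cloud Build pipelines are distinguished by a hypothetical 'build' attribute, the routing could be set up with the gcloud CLI like this (all names are hypothetical):

```shell
# One subscription per Cloud Build trigger, each with its own filter.
# "topic1", the subscription names, and the "build" attribute are hypothetical.
gcloud pubsub subscriptions create cloudbuild1-sub \
  --topic=topic1 \
  --message-filter='attributes.build = "cloudbuild1"'

gcloud pubsub subscriptions create cloudbuild2-sub \
  --topic=topic1 \
  --message-filter='attributes.build = "cloudbuild2"'

# Route a message to the first build by setting the attribute at publish time.
gcloud pubsub topics publish topic1 \
  --message='trigger payload' \
  --attribute=build=cloudbuild1
```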

Propagating Error messages in Google Cloud Platform (GCP)

I am building a near real time service. The input is a cloud storage bucket and blob path to a photo image. This horizontally-scalable service is made up of multiple components including ML models running on k8s and Google Cloud Functions, each of which has a chance of failing for a variety of reasons. The ML models are independent and run in parallel. Each component is triggered by a PubSub push message topic unique to the component. Running the entire flow for one photo may take 15 seconds.
I want to return a meaningful error message to the service requester, telling it which component failed if there is a failure. Essentially, I want to report which image failed and where it failed.
What is the recommended practice for returning an error back to the requester?
There is no built-in service for this. But because you already use Pub/Sub for asynchronous calls, I propose using it to push back errors as well.
You can do this in two flavors.
First, create a Pub/Sub topic for the errors, say 'error_topic'.
1. Without message customization
In the Pub/Sub message, the requester puts its identity in an attribute (say, a 'requester' attribute).
In the consumer service, if an error occurs, return an error code (500, for example) for a push subscription, or a nack for a pull subscription.
Configure the Pub/Sub subscription's retry policy and dead-letter topic (with 'error_topic' as the dead-letter topic).
Then, create one subscription per requester on 'error_topic' (use the filter capability for this) and consume the messages in the requester services.
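The retry and dead-letter configuration in the steps above might look like this with the gcloud CLI (names are hypothetical; granting the Pub/Sub service account permission to publish to the dead-letter topic is omitted):

```shell
# Work subscription: after 5 failed delivery attempts, the message is
# forwarded to the dead-letter topic 'error_topic'.
# "work-sub" and "work-topic" are hypothetical names.
gcloud pubsub subscriptions create work-sub \
  --topic=work-topic \
  --dead-letter-topic=error_topic \
  --max-delivery-attempts=5

# One subscription per requester on 'error_topic', filtering on the
# 'requester' attribute set at publish time.
gcloud pubsub subscriptions create error-sub-requester-a \
  --topic=error_topic \
  --message-filter='attributes.requester = "requester-a"'
```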
2. With message customization
In the Pub/Sub message, the requester puts its identity in an attribute (say, a 'requester' attribute).
The consumer service that raises the error creates a new message with custom information, copies the 'requester' attribute value into an attribute of the new message (say, an 'original_requester' attribute), and publishes it to 'error_topic'.
Then, create one subscription per requester on 'error_topic' (use the filter capability for this) and consume the messages in the requester services.
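The customized error message could then be published like this (the payload, attribute value, and names are hypothetical):

```shell
# The failing component publishes an enriched error message to 'error_topic',
# copying the original 'requester' attribute into 'original_requester'.
gcloud pubsub topics publish error_topic \
  --message='{"image": "gs://my-bucket/photo.jpg", "component": "ml-model-1", "error": "inference timeout"}' \
  --attribute=original_requester=requester-a
```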

REST API for monitoring undelivered message in google cloud pubsub

I want to implement a service that monitors undelivered messages and sends a notification when the count reaches a threshold, or processes them further.
I have already looked through Stackdriver. It provides monitoring and alerting, but as far as I can see in the Stackdriver Monitoring API, it only provides an API to get the metricDescriptor; it does not provide an API to get the undelivered message count.
Is there actually an API to get the metric values?
You can get the values via the projects.timeSeries.list method. You would set the name to projects/<your project>, filter to metric.type = "pubsub.googleapis.com/subscription/num_undelivered_messages", and end time (and if a range of values is desired, the start time as well) to a string representing a time in RFC3339 UTC "Zulu" format, e.g., 2018-10-04T14:00:00Z. If you want to look at a specific subscription, set the filter to metric.type = "pubsub.googleapis.com/subscription/num_undelivered_messages" AND resource.label.subscription_id = "<subscription name>".
The result will be one or more TimeSeries types (depending on whether or not you specified a specific subscription) with the points field including the data points for the specified time range, each of which will have the value's int64Value set to the number of messages that have not been acknowledged by subscribers.
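As a sketch, the call could look like this with curl (the project and subscription names are hypothetical, and this requires gcloud credentials with monitoring read access):

```shell
PROJECT=my-project            # hypothetical project ID
SUBSCRIPTION=my-subscription  # hypothetical subscription name

# projects.timeSeries.list: filter on the num_undelivered_messages metric
# for one subscription, at a single point in time (end time only).
curl -s -G \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  --data-urlencode 'filter=metric.type = "pubsub.googleapis.com/subscription/num_undelivered_messages" AND resource.label.subscription_id = "'"$SUBSCRIPTION"'"' \
  --data-urlencode 'interval.endTime=2018-10-04T14:00:00Z' \
  "https://monitoring.googleapis.com/v3/projects/$PROJECT/timeSeries"
```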