How to deploy pubsub-triggered cloud function with message ordering? - google-cloud-platform

I want to deploy a Pubsub-triggered Cloud Function with message ordering:
https://cloud.google.com/pubsub/docs/ordering
gcloud functions deploy doesn't have an --enable-message-ordering option:
https://cloud.google.com/sdk/gcloud/reference/functions/deploy
Should I pre-create the subscription before deploying the function? If so, does Cloud Functions have a well-known format for how it matches to a subscription name? It seems maybe the format is: gcf-{function-name}-{region}-{topic-name}, but it also looks like the name format has changed over time, e.g. older deployed functions don't have the region name in the subscription. Is there a stable way to do this?

You must create the message-ordering subscription and the Cloud Function manually.
First, create a Pub/Sub topic, then create a subscription to that topic with --enable-message-ordering.
Second, create a Cloud Function that will serve the ordered Pub/Sub messages.
Last, go back to the Pub/Sub subscription, change the delivery type to push, and specify your Cloud Function's endpoint.
So the final diagram looks like this:
Publisher -> Pub/Sub topic -> Pub/Sub subscription -> Cloud Function
You tried to connect the Pub/Sub topic to the Cloud Function directly.
But for message ordering, Pub/Sub needs the topic -> subscription connection.
So only the Pub/Sub topic -> Pub/Sub subscription -> Cloud Function chain can deliver ordered messages to your function.
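A minimal gcloud sketch of that manual wiring, assuming placeholder names (my-topic, my-ordered-sub, my-func) and region us-central1; use the HTTPS URL printed by the deploy command as the push endpoint:
# Topic that the publisher writes to (all names are placeholders)
gcloud pubsub topics create my-topic
# Deploy the function with an HTTP trigger instead of --trigger-topic,
# so that you control the subscription yourself (depending on your setup
# you may also need --allow-unauthenticated or an authenticated push subscription)
gcloud functions deploy my-func \
  --runtime=nodejs18 \
  --trigger-http \
  --region=us-central1
# Push subscription with message ordering, pointing at the function's URL
gcloud pubsub subscriptions create my-ordered-sub \
  --topic=my-topic \
  --enable-message-ordering \
  --push-endpoint="https://us-central1-PROJECT_ID.cloudfunctions.net/my-func"
Note that ordering only applies to messages published with the same ordering key, so the publisher has to set one when publishing.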

When declaring a Pub/Sub-triggered Cloud Function like this:
exports.pubsub = functions.pubsub.topic('some-topic').onPublish((message, context) => { /* ... */ });
the problem is that message ordering is only available on the subscription, not on the topic. On deploy, a push subscription is automatically created at cloudpubsub/subscription/list (the one which starts with gcf-*). Message ordering only appears to work when manually subscribing to a topic: Enabling message ordering. I haven't yet tried whether it would pick up a pre-existing subscription with the same name; if everything fails, one could still record the messages and then order them by timestamp or a "sequence token": https://www.youtube.com/watch?v=nQ9_Xur2aM4.
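If you want to check what the auto-created subscription looks like, a small sketch (the gcf-* name below follows the format guessed in the question and is only an assumption):
# List the subscriptions Cloud Functions created for --trigger-topic deployments
gcloud pubsub subscriptions list --filter="name:gcf-" --format="value(name)"
# Inspect one of them; enableMessageOrdering is not set on these, and the
# ordering property can only be chosen when a subscription is created,
# not added later with an update
gcloud pubsub subscriptions describe gcf-__function_name__-us-central1-__topic_name__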

Related

Why does GCP Pub/Sub publish a message twice?

I have a Google Cloud Function subscribed to a topic. Cloud Scheduler publishes a message to that topic every 5 minutes. The problem is that the Cloud Function sometimes gets invoked a second time, about 90 seconds after the first invocation.
The acknowledgement deadline on the subscription is 600 seconds.
So I can't figure out why GCF is invoked twice within 90 seconds by Pub/Sub.
Is the 90-second gap related to something?
Your duplicate could either be on the publish side or on the subscribe side. If the duplicate messages have different message IDs, then your duplicates are generated on the publish side. This could be caused by retries on the publish side in response to retryable errors. If the messages have the same message ID, then the duplication is on the subscribe side within Pub/Sub.
Cloud Pub/Sub offers at-least-once delivery semantics. That means it is possible for duplicates to occur, even if you acknowledge the message and even if the acknowledgement deadline has not passed. If you want stronger guarantees around delivery, you can use Pub/Sub's exactly once feature, which is currently in public preview. However, this will require you to set up your Cloud Function with an HTTP trigger and to create a push subscription in Pub/Sub that points to the address of the function because there is no way to set the exactly once setting on a subscription created by Cloud Functions.

Access to the Google Cloud Storage Trigger Events "Pub/Sub"?

I have a Google Cloud Storage Trigger set up on a Cloud Function with max instances of 5, to fire on the google.storage.object.finalize event of a Cloud Storage bucket. The docs state that these events are "based on" Cloud Pub/Sub.
Does anyone know:
Is there any way to see configuration of the topic or subscription in the console, or through the CLI?
Is there any way to get the queue depth (or equivalent?)
Is there any way to clear events?
No, no, and no. When you plug a Cloud Function into a Cloud Storage event, everything is handled behind the scenes by Google: you see nothing and you can't interact with anything.
However, you can change the notification mechanism. Instead of plugging your Cloud Function directly into the Cloud Storage event, plug a Pub/Sub topic into your Cloud Storage event.
From there, you have access to YOUR Pub/Sub: monitor the queue, purge it, create the subscriptions that you want, and so on.
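A hedged sketch of that wiring, with placeholder names (my-bucket, gcs-events, my-func); gsutil notification create routes the bucket's finalize events to a topic you own:
# Route the bucket's OBJECT_FINALIZE events to your own topic
gcloud pubsub topics create gcs-events
gsutil notification create -t gcs-events -f json -e OBJECT_FINALIZE gs://my-bucket
# Trigger the function from the topic instead of directly from the bucket
gcloud functions deploy my-func \
  --runtime=nodejs18 \
  --trigger-topic=gcs-events \
  --region=us-central1
# An extra subscription you fully control, e.g. for monitoring or replay
gcloud pubsub subscriptions create gcs-events-monitor --topic=gcs-events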
The recommended way to work with storage notifications is using Pub/Sub.
Legacy storage notifications still work, but with Pub/Sub you can "peek" into the message queue and clear it if you need to.
Also, you can process Pub/Sub events with Cloud Run, which is easier to develop and test (just a web service), easier to deploy (just a container), and it can process several requests in parallel without having to pay more (great when you have a lot of requests arriving together).
Where do Pub/Sub storage notifications go?
You can see where a bucket's notifications go with the gsutil command:
% gsutil notification list gs://__bucket_name__
projects/_/buckets/__bucket_name__/notificationConfigs/1
Cloud Pub/Sub topic: projects/__project_name__/topics/__topic_name__
Filters:
Event Types: OBJECT_FINALIZE
Is there any way to get the queue depth (or equivalent?)
In Pub/Sub you can have many subscriptions on a topic.
If there is no subscription, messages get lost.
To send data to a Cloud Function or Cloud Run you set up a push subscription.
In my experience, you won't be able to see what happened because it's faster than you can click: you'll find this empty 99.9999% of the time.
You can check the "queue" depth in the console (Pub/Sub -> choose your topic -> choose the subscription).
If you need to troubleshoot this, set up a second subscription with a time to live low enough that it does not use a lot of space (you'll be billed for it).
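A possible sketch of such a troubleshooting subscription; the names are placeholders and the retention/expiration values are just examples chosen to keep the cost low:
# Second subscription on the same topic, kept small and short-lived
gcloud pubsub subscriptions create debug-sub \
  --topic=__topic_name__ \
  --message-retention-duration=1h \
  --expiration-period=1d
# Peek at what is flowing through; without --auto-ack nothing is acknowledged
gcloud pubsub subscriptions pull debug-sub --limit=10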
Is there any way to clear events?
You can empty the messages from the Pub/Sub subscription, but...
... if you're using a push subscription against a Cloud Function, the messages are consumed much faster than you can "click".
If you need it, it is in the web console (open the Pub/Sub subscription and click the vertical "..." at the top right).
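The same purge can also be done from the CLI by seeking the subscription to the current time, which marks everything published before that instant as acknowledged (subscription name is a placeholder):
gcloud pubsub subscriptions seek debug-sub \
  --time="$(date -u +%Y-%m-%dT%H:%M:%SZ)"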

GCP Dataflow Pub/Sub to Text Files on Cloud Storage

I'm referring to the Google-provided Dataflow template Pub/Sub to Text Files on Cloud Storage.
The messages read by Dataflow don't seem to get acknowledged. How do we ensure that a message consumed by Dataflow is acknowledged and not available to any other subscriber?
To reproduce and test it, create 2 jobs from the same template and you will see both jobs processing the same messages.
Firstly, the messages are correctly acknowledged.
Then, to demonstrate this, and why your reproduction is misleading, I would like to focus on Pub/Sub behavior (a small gcloud sketch after the list below illustrates it):
One or several publishers publish messages in a topic
One or several subscriptions can be created on a topic
All the messages published in a topic are copied in each subscription
Subscription can have one or several subscribers.
Each subscriber receives a subset of the messages in the subscription.
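As a small illustration of that fan-out (placeholder names; each subscription receives its own copy of the published message):
gcloud pubsub topics create demo-topic
gcloud pubsub subscriptions create demo-sub-a --topic=demo-topic
gcloud pubsub subscriptions create demo-sub-b --topic=demo-topic
gcloud pubsub topics publish demo-topic --message="hello"
# Both pulls return the same message, independently of each other
gcloud pubsub subscriptions pull demo-sub-a --auto-ack
gcloud pubsub subscriptions pull demo-sub-b --auto-ack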
Go back to your template: you specify only a topic, not a subscription. When your Dataflow job is running, go to the subscriptions list and you will see that a new subscription has been created.
-> When you start a Pub/Sub to Text Files template, a subscription is automatically created on the provided topic.
Therefore, if you create 2 jobs, you will have 2 subscriptions, and thus all the messages published in the topic are copied into each subscription. That's why you see the same messages twice.
Now, keep your job up and go to its subscription. Here you can see the number of messages in the queue and the unacked messages. You should see 0 in the unacked message graph.
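To see this with the template itself, a hedged sketch (the template path and parameter names are taken from the provided-templates docs and may differ by version; project, topic, and bucket names are placeholders):
# Each run of the topic-based template creates its own subscription on the topic
gcloud dataflow jobs run pubsub-to-gcs-1 \
  --gcs-location=gs://dataflow-templates/latest/Cloud_PubSub_to_GCS_Text \
  --region=us-central1 \
  --parameters=inputTopic=projects/__project_name__/topics/__topic_name__,outputDirectory=gs://__bucket_name__/output/,outputFilenamePrefix=messages-
# List the subscriptions attached to the topic; launching a second job adds a second one
gcloud pubsub subscriptions list --filter="topic:__topic_name__"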

How to listen to GCE events

I would like to be able to listen for when a GCE instance is started, stopped, or deleted. This is so that I can build a dashboard for users to view the status of machines. How can I do this?
You can use Cloud Function to implement such a workflow. Cloud Functions can't "listen" to GCE events directly but they can be triggered when a message is published to a specific PubSub topic.
Now, GCE VM events are actually logged in Cloud Logging, and logs matching a particular filter can be exported to a PubSub topic.
So in Cloud Logging, you could set an advanced log filter like so:
resource.type="gce_instance"
jsonPayload.event_subtype="compute.instances.stop" OR jsonPayload.event_subtype="compute.instances.start"
This filter will filter stop and start events from all VMs in your project. You can see a list of available events here.
Once you've defined the log filter, you can "create sink" and set it to send the filtered logs to a PubSub topic of your choice. More info on how to set up an export sink here.
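A hedged gcloud sketch of that export sink (project, topic, and sink names are placeholders; the sink's writer identity printed by the create command needs permission to publish to the topic):
# Export the filtered GCE start/stop logs to a Pub/Sub topic
gcloud pubsub topics create gce-instance-events
gcloud logging sinks create gce-instance-events-sink \
  pubsub.googleapis.com/projects/__project_name__/topics/gce-instance-events \
  --log-filter='resource.type="gce_instance" AND (jsonPayload.event_subtype="compute.instances.stop" OR jsonPayload.event_subtype="compute.instances.start")'
# Grant the sink's writer identity (printed above) the publisher role on the topic
gcloud pubsub topics add-iam-policy-binding gce-instance-events \
  --member="serviceAccount:SINK_WRITER_IDENTITY" \
  --role="roles/pubsub.publisher"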
Now that your event logs are sent to the PubSub topic, you can go to your PubSub topic list, select your topic and click the "Trigger Cloud Function" button. You'll be guided through setting up the Cloud Function that'll be triggered for every new message in that topic. The suggested function code (in nodejs 8 for example):
exports.helloPubSub = (event, context) => {
  // For Pub/Sub-triggered background functions, event.data holds the
  // base64-encoded message payload; decode it to get the log entry JSON
  const pubsubMessage = event.data;
  console.log(Buffer.from(pubsubMessage, 'base64').toString());
};
will log the message data where you'll find the event log info. You can then write your Cloud Function to perform whichever process you want, for example updating a Firestore database with the VM instance status.

Should Dataflow consume events from a Pub/Sub topic or subscription? [duplicate]

This question already has an answer here:
Dataflow Template Cloud Pub/Sub Topic vs Subscription to BigQuery
(1 answer)
Closed 3 years ago.
I am looking to stream events from Pub/Sub into BigQuery using Dataflow. I see that there are two templates for doing this in GCP: one where Dataflow reads messages from a topic, and one where it reads from a subscription.
What are the advantages of using a subscription here, rather than just consuming the events from the topic?
Core concepts
Topic: A named resource to which messages are sent by publishers.
Subscription: A named resource representing the stream of messages from a single, specific topic, to be delivered to the subscribing application.
According to the core concepts, the difference is rather simple:
Use a Topic when you would like to publish messages from Dataflow to Pub/Sub (i.e., to a given topic).
Use a Subscription when you would like to consume messages coming from Pub/Sub in Dataflow.
Thus, in your case, go for a subscription.
More info:
Keep in mind that Pub/Sub manages topics using its own message store. However, the Cloud Pub/Sub Topic to BigQuery template is particularly useful when you would like to move these messages into BigQuery as well (and eventually perform your own analysis there).
The Cloud Pub/Sub Topic to BigQuery template is a streaming pipeline that reads JSON-formatted messages from a Cloud Pub/Sub topic and writes them to a BigQuery table. You can use the template as a quick solution to move Cloud Pub/Sub data to BigQuery. The template reads JSON-formatted messages from Cloud Pub/Sub and converts them to BigQuery elements.
https://cloud.google.com/dataflow/docs/guides/templates/provided-streaming#cloudpubsubtobigquery
Disclaimer: Comments and opinions are my own and not the views of my employer.
Both the Topic to BigQuery and Subscription to BigQuery templates consume messages from Pub/Sub and stream them into BigQuery.
If you use the Topic to BigQuery template, Dataflow will create a subscription behind the scenes for you that reads from the specified topic. If you use the Subscription to BigQuery template, you will need to provide your own subscription.
You can use Subscription to BigQuery templates to emulate the behavior of a Topic to BigQuery template by creating multiple subscription-connected BigQuery pipelines reading from the same topic.
For new deployments, using the Subscription to BigQuery template is preferred. If you stop and restart a pipeline using the Topic to BigQuery template, a new subscription will be created, which may cause you to miss some messages that were published while the pipeline was down. The Subscription to BigQuery template doesn't have this disadvantage, since it uses the same subscription even after the pipeline is restarted.
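For completeness, a hedged sketch of launching the Subscription to BigQuery template (subscription, dataset, and table names are placeholders; the template path and parameter names may differ by version):
# Create your own subscription so a restarted pipeline keeps reading from the same place
gcloud pubsub subscriptions create my-bq-sub --topic=__topic_name__
gcloud dataflow jobs run pubsub-to-bq \
  --gcs-location=gs://dataflow-templates/latest/PubSub_Subscription_to_BigQuery \
  --region=us-central1 \
  --parameters=inputSubscription=projects/__project_name__/subscriptions/my-bq-sub,outputTableSpec=__project_name__:my_dataset.my_table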