GCP Pub/Sub: Life of a Message

I'm trying to learn about GCP Pub/Sub and I have a question about the life of a message in Pub/Sub. I'm using this article as my reference, and it says:
Once at least one subscriber for each subscription has acknowledged the message, Pub/Sub deletes the message from storage.
So my first question: suppose I have a Subscription A that is connected to Subscriber X and Subscriber Y. According to the docs, when Subscriber X receives the message and sends an ACK to Subscription A, Pub/Sub will delete the message from storage regardless of whether Subscriber Y has received it. In other words, Pub/Sub doesn't care whether all subscribers have received the message; once just one subscriber gets it, Pub/Sub deletes it from storage. Is that right?
Later, the article says:
Once all subscriptions on a topic have acknowledged a message, the message is asynchronously deleted from the publish message source and from storage.
This is where I get a little confused. My understanding is that if, for instance, I have a topic with N subscriptions and each subscription has M subscribers, Pub/Sub only needs to know that, for each subscription, at least one subscriber has acknowledged the message; then it deletes the message from storage. Is that right?
I also found two concepts in the documentation: the Publishing Forwarder and the Subscribing Forwarder. So I have a few final questions:
What is the relationship between a Subscription, a Publishing Forwarder and a Subscribing Forwarder? (For example, does a Subscription consist of exactly one Publishing Forwarder and one Subscribing Forwarder?)
Is the relationship between Publishing Forwarders and Subscribing Forwarders one-to-one, one-to-many, many-to-one, or many-to-many?
Can a Subscriber be associated with multiple Subscriptions?
Once a Subscriber consumes a message (assume the message is not duplicated: it has no copy and is unique), is it possible for this Subscriber to re-consume/re-read exactly the same message?
If I've misunderstood something, please point it out; I'd really appreciate it.
Thank you guys!

Quite a bit to unpack here. It is best not to think of a subscription as being attached to subscribers, and to understand that the two are different things. A subscription is a named entity that wants to receive all messages published to a topic. A subscriber is an actual client running to receive and process messages on behalf of a subscription. A topic can have many subscriptions, and a subscription can have many subscribers. If there are multiple subscribers in a subscription then, assuming there are no duplicate deliveries and the subscribers ack all messages they receive, each message published to the topic will be delivered to one subscriber of that subscription. This is called load balancing: the processing of messages is spread out over many subscribers. If a topic has multiple subscriptions, each with one subscriber, then every subscriber will receive all messages. This is called fan out: each subscriber receives the complete set of messages published. Of course, it is possible to combine the two and have more than one subscriber for each subscription, in which case each message will be delivered to one subscriber for each subscription.
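The load-balancing and fan-out behaviour described above can be sketched as a toy in-memory model. The sketch also covers the article's point that a message leaves storage once every subscription has acked it. This is not the google-cloud-pubsub client API. `ToyTopic`, its methods, and the subscription and subscriber names are hypothetical. Strict round-robin is also a simplification, because real Pub/Sub promises no particular distribution among a subscription's subscribers.

```python
from collections import defaultdict
from itertools import cycle


class ToyTopic:
    """Hypothetical in-memory model of a topic; not the real Pub/Sub API."""

    def __init__(self):
        self.storage = {}               # message_id -> payload
        self.unacked_by = {}            # message_id -> subscriptions that still owe an ack
        self._round_robin = {}          # subscription -> iterator over its subscribers
        self.inbox = defaultdict(list)  # subscriber -> [(subscription, message_id)]

    def add_subscription(self, subscription, subscribers):
        self._round_robin[subscription] = cycle(subscribers)

    def publish(self, message_id, payload):
        self.storage[message_id] = payload
        self.unacked_by[message_id] = set(self._round_robin)
        for subscription, round_robin in self._round_robin.items():
            # Fan out: every subscription gets the message.
            # Load balancing: only one of that subscription's subscribers gets it.
            subscriber = next(round_robin)
            self.inbox[subscriber].append((subscription, message_id))

    def ack(self, subscription, message_id):
        waiting = self.unacked_by[message_id]
        waiting.discard(subscription)
        if not waiting:
            # Every subscription has acked, so the message leaves storage.
            del self.storage[message_id]
            del self.unacked_by[message_id]


topic = ToyTopic()
topic.add_subscription("sub-A", ["X", "Y"])  # X and Y share sub-A
topic.add_subscription("sub-B", ["Z"])       # Z is alone on sub-B
for i in range(4):
    topic.publish(f"m{i}", b"payload")

print(topic.inbox["X"])       # [('sub-A', 'm0'), ('sub-A', 'm2')]
print(topic.inbox["Y"])       # [('sub-A', 'm1'), ('sub-A', 'm3')]
print(len(topic.inbox["Z"]))  # 4
topic.ack("sub-A", "m0")
print("m0" in topic.storage)  # True: sub-B has not acked yet
topic.ack("sub-B", "m0")
print("m0" in topic.storage)  # False
```

X and Y split sub-A's messages between them, while Z sees all four. m0 stays in storage until both subscriptions have acked it, which matches the article passages quoted in the question.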
Forwarders are just the servers that are responsible for delivering messages. A publishing forwarder receives messages from publishers and a subscribing forwarder sends messages to subscribers. All of the relationships along the path of delivering a message, from publisher to publishing forwarder, publishing forwarder to subscribing forwarder, and subscribing forwarder to subscriber, can be many-to-many relationships.
A subscriber is associated with a single subscription. However, a single running job could host multiple subscribers, e.g., one could instantiate the subscriber client library several times on different subscriptions.
All of the above relies on an important caveat: assuming there are no duplicate deliveries. In general, Cloud Pub/Sub guarantees at-least-once delivery. That means that even a message that was properly acked by a subscriber could be redelivered (either to the same subscriber or to a different one), in which case the subscriber needs to ack the message again on the subsequent delivery. Generally, duplicate rates should be very low, in the 0.1% range, for a well-behaved subscriber that acks messages before the ack deadline expires.
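One common way a redelivery arises is an ack that arrives after the ack deadline. The toy subscription below, driven by a simulated clock, shows that case. `ToySubscription` and its rule that late acks are dropped are hypothetical simplifications, not how the service is implemented. As the answer says, real Pub/Sub can also redeliver messages whose acks succeeded, so consumers must tolerate duplicates either way.

```python
from dataclasses import dataclass, field


@dataclass
class ToySubscription:
    """Hypothetical model of at-least-once delivery with an ack deadline."""
    ack_deadline: float = 10.0
    backlog: dict = field(default_factory=dict)       # message_id -> payload, not yet acked
    lease_expiry: dict = field(default_factory=dict)  # message_id -> time the lease runs out
    deliveries: dict = field(default_factory=dict)    # message_id -> delivery count

    def add(self, message_id, payload):
        self.backlog[message_id] = payload

    def pull(self, now):
        """Deliver every unacked message whose lease is free or has expired."""
        out = []
        for message_id, payload in self.backlog.items():
            if self.lease_expiry.get(message_id, float("-inf")) <= now:
                self.lease_expiry[message_id] = now + self.ack_deadline
                self.deliveries[message_id] = self.deliveries.get(message_id, 0) + 1
                out.append((message_id, payload))
        return out

    def ack(self, message_id, now):
        """Toy rule: an ack that arrives after the lease expired is ignored."""
        if now < self.lease_expiry.get(message_id, float("-inf")):
            self.backlog.pop(message_id, None)
            return True
        return False


sub = ToySubscription(ack_deadline=10)
sub.add("m1", b"hello")
sub.add("m2", b"world")
sub.pull(now=0)                # both delivered, leases run until t=10
print(sub.ack("m1", now=3))    # True: acked within the deadline
print(sub.ack("m2", now=12))   # False: too late, the lease already expired
again = sub.pull(now=12)       # so m2 is delivered a second time
print([m for m, _ in again])   # ['m2']
print(sub.deliveries)          # {'m1': 1, 'm2': 2}
```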

Related

Difference between best-effort delivery and at least once delivery in Google Cloud Pub/Sub

What is the difference between best-effort delivery and at least once delivery in Pub/Sub?
"Best-effort delivery" refers to sending messages to the dead-letter topic, while "at-least-once delivery" refers to sending messages to subscribers. The latter is the primary delivery guarantee offered by Pub/Sub: messages that are successfully published to a topic will be delivered at least once to a subscriber of each subscription attached to the topic (unless the message exceeds its message retention duration and expires). A message could be delivered to a subscriber more than once, even if an acknowledgement request for the message returns successfully. Note that exactly-once delivery offers some stronger guarantees.
"Best-effort delivery" indicates that there are no strong guarantees around delivering messages to the dead-letter topic. In general, messages are meant to be sent to the dead-letter topic once their delivery count exceeds the configured threshold, and usually are. However, the delivery count may reset and/or the publish to the dead-letter topic could fail, in which case the message continues to be redelivered to subscribers.
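The best-effort dead-letter behaviour can be illustrated with a toy model. `ToyDeadLetterSubscription`, `threshold` and the `dead_letter_publish_ok` flag are hypothetical. The flag stands in for a failed publish to the dead-letter topic, which leaves the message in ordinary redelivery.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDeadLetterSubscription:
    """Hypothetical model of dead-lettering; not the real Pub/Sub API."""
    threshold: int
    dead_letter_topic: list = field(default_factory=list)
    delivery_count: dict = field(default_factory=dict)

    def on_failed_delivery(self, message_id, dead_letter_publish_ok=True):
        """Record one unacked delivery and decide what happens next."""
        count = self.delivery_count.get(message_id, 0) + 1
        self.delivery_count[message_id] = count
        if count <= self.threshold:
            return "redeliver"
        if dead_letter_publish_ok:
            self.dead_letter_topic.append(message_id)
            return "dead-lettered"
        # Best effort: forwarding failed, so the message is simply redelivered.
        return "redeliver"


sub = ToyDeadLetterSubscription(threshold=3)
print([sub.on_failed_delivery("m1") for _ in range(4)])
# ['redeliver', 'redeliver', 'redeliver', 'dead-lettered']
print([sub.on_failed_delivery("m2", dead_letter_publish_ok=False) for _ in range(5)])
# ['redeliver', 'redeliver', 'redeliver', 'redeliver', 'redeliver']
print(sub.dead_letter_topic)  # ['m1']
```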

Akka - When a topic actor has no subscribers

I have created one actor that subscribes to one topic to get messages. This is the only actor that subscribes to this topic.
I'm wondering what happens if, for some reason, the actor unsubscribes and then subscribes again to the same topic (for example, if the actor restarts).
From the Akka Typed documentation (https://doc.akka.io/docs/akka/2.6.19//typed/distributed-pub-sub.html):
When a topic actor has no subscribers for a topic it will deregister itself from the receptionist meaning published messages for the topic will not be sent to it.
What does that mean? Does it mean that after a restart, no one can send messages to this actor through this topic?
If the only actor subscribing to a topic unsubscribes, messages sent to that topic will no longer be delivered. When that topic has a subscriber again, messages sent after that point may be delivered (messages sent between the subscriptions will not be delivered).
So if you have an actor which is subscribing to a Distributed Pub Sub topic, it should resubscribe on every restart.
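The behaviour described in the answer can be modelled with a toy topic. It pushes each message only to the subscribers registered at publish time and buffers nothing. This is a language-neutral Python sketch, not Akka's API, and `ToyDistributedTopic` and the `"worker"` name are hypothetical. The toy also delivers instantly and ignores propagation delays inside a real cluster.

```python
class ToyDistributedTopic:
    """Hypothetical model of the quoted behaviour; not Akka's API."""

    def __init__(self):
        self._subscribers = {}  # subscriber name -> inbox (a list)

    def subscribe(self, name, inbox):
        self._subscribers[name] = inbox

    def unsubscribe(self, name):
        self._subscribers.pop(name, None)

    def publish(self, message):
        # Delivered only to current subscribers; with none, the message is dropped.
        for inbox in self._subscribers.values():
            inbox.append(message)


inbox = []
topic = ToyDistributedTopic()
topic.subscribe("worker", inbox)
topic.publish("a")
topic.unsubscribe("worker")       # e.g. the actor stops while restarting
topic.publish("b")                # lost: nobody is subscribed
topic.subscribe("worker", inbox)  # the restarted actor resubscribes
topic.publish("c")
print(inbox)  # ['a', 'c']
```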

Acknowledgement Behaviour of GCP Pub/sub messages

I'm working on a microservice that contains subscriptions to a topic in GCP Pub/Sub. As multiple instances of the microservice run on more than one host (multiple clusters in the cloud), I want to understand how message acknowledgement works for subscriptions. When the subscription on one instance receives, processes and acknowledges a message, does the same subscription on the other hosts also receive that message?
I expect that once a subscriber acknowledges a message, Pub/Sub doesn't send it again, but what if two subscribers on the same subscription on different hosts receive the message at the same time? Does that cause duplication?
Pub/Sub delivers each published message at least once for every subscription.
https://cloud.google.com/pubsub/docs/subscriber#at-least-once-delivery
If you don't want multiple "workers" to receive copies of the same message, you need to use a single subscription for all of them.
Multiple subscriptions are meant for events: several systems can listen on the same topic, each through its own subscription, so that every system receives the event that something has happened.
For commands, you usually want a single system to handle them (even if split between multiple workers) so you would need a single subscription that is shared among all the workers.
By the way, your system should be idempotent when processing events/commands from a topic. The general rule of thumb is that each message is guaranteed to be received by a subscriber at least once, which means the same system could receive the same command twice.
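A common way to get the idempotency mentioned above is to skip messages whose ID has already been processed. This minimal sketch assumes each message carries an ID that stays the same across redeliveries. `IdempotentHandler` is a hypothetical name. The in-memory set only protects a single process, so workers sharing one subscription would need the seen-set in shared storage.

```python
from typing import Callable


class IdempotentHandler:
    """Hypothetical wrapper that processes each message ID at most once."""

    def __init__(self, process: Callable[[str, bytes], None]):
        self._process = process
        self._seen: set[str] = set()

    def handle(self, message_id: str, data: bytes) -> bool:
        """Process a message; return False if it is a duplicate delivery."""
        if message_id in self._seen:
            return False
        self._process(message_id, data)
        # Record the ID only after processing succeeds, so a crash mid-way
        # leads to reprocessing (at least once) rather than a lost command.
        self._seen.add(message_id)
        return True


applied = []
handler = IdempotentHandler(lambda mid, data: applied.append(mid))
for mid in ["cmd-1", "cmd-2", "cmd-1"]:  # cmd-1 is delivered twice
    handler.handle(mid, b"...")
print(applied)  # ['cmd-1', 'cmd-2']
```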

GCP Dataflow Pub/Sub to Text Files on Cloud Storage

I'm referring to the Google-provided Dataflow template Pub/Sub to Text Files on Cloud Storage.
Messages read by Dataflow don't get acknowledged. How do we ensure that a message, once consumed by Dataflow, is acknowledged and is not available to any other subscriber?
To reproduce and test this, create 2 jobs from the same template and you will see both jobs processing the same message.
First, the messages are correctly acknowledged.
To demonstrate this, and to show why your reproduction is wrong, let's focus on Pub/Sub behavior:
One or several publishers publish messages to a topic.
One or several subscriptions can be created on a topic.
All the messages published to a topic are copied into each subscription.
A subscription can have one or several subscribers.
Each subscriber receives a subset of the messages in the subscription.
Now go back to your template. You specify only a topic, not a subscription. While your Dataflow job is running, look at the subscriptions on that topic and you will see that a new one has been created.
In other words, when you start a Pub/Sub to Text Files template, a subscription is automatically created on the provided topic.
Therefore, if you create 2 jobs, you will have 2 subscriptions, and all the messages published to the topic are copied into each subscription. That's why you get every message twice.
Now, keep your job running and go to the subscription. There you can see the number of messages in the queue and the number of unacked messages. You should see 0 in the unacked messages graph.

Broadcast or Multicast Pattern (SQS and SNS)

The following is a messaging design pattern:
Step 1 - The application sends a message to an SNS topic.
Step 2 - SNS publishes the message to the subscribed SQS queues.
Given the following definitions:
Broadcast : Message is published to all end points.
Multicast : Message is published to selected endpoints.
The diagram above can be interpreted as either:
Option 1 - The message is published to selected SQS queues subscribed to the SNS topic (multicast pattern)
OR
Option 2 - The message is published to all subscribed endpoints (broadcast pattern)
How should this design pattern be interpreted?
In the absence of clarification, this is likely to be broadcast.
The publisher can't select the queues that will receive the messages -- it will go to all of them, by default.
Historically, SNS fanout to SQS was always broadcast.
However, recent enhancements of SNS provide a capability for the subscriptions of each queue to the SNS topic to be "filtered" -- in which case, the publisher still can't directly select the queues that will receive the message (they're not explicitly addressable), but SNS makes decisions on where to deliver the messages based on the subscription filters... which might fit the multicast label, depending on the circumstances.
https://docs.aws.amazon.com/sns/latest/dg/message-filtering.html
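The filtering described above can be sketched for the simplest case: exact string matches on message attributes. In that case all attribute names in a policy must match (AND), and any value in a name's list is accepted (OR). Real SNS filter policies support further operators that this toy does not model. `matches`, `fan_out` and the queue names are hypothetical. A subscription without a policy receives everything, which is the broadcast case.

```python
from typing import Optional


def matches(filter_policy: Optional[dict[str, list[str]]], attributes: dict[str, str]) -> bool:
    """Exact-match subset of SNS filter-policy semantics (hypothetical helper)."""
    if not filter_policy:
        return True  # no filter: the subscription receives everything (broadcast)
    # Every attribute named in the policy must be present with an allowed value.
    return all(attributes.get(key) in allowed for key, allowed in filter_policy.items())


def fan_out(subscriptions: dict, attributes: dict[str, str]) -> list[str]:
    """Return the queues whose subscription filter accepts the message."""
    return [queue for queue, policy in subscriptions.items() if matches(policy, attributes)]


subscriptions = {
    "orders-queue": {"event_type": ["order_placed", "order_cancelled"]},
    "billing-queue": {"event_type": ["order_placed"]},
    "audit-queue": None,  # unfiltered subscription
}
print(fan_out(subscriptions, {"event_type": "order_cancelled"}))
# ['orders-queue', 'audit-queue']
```

The publisher still only sets attributes and never names a queue. The choice of recipients comes entirely from the subscription filters, which is why this arrangement sits between broadcast and multicast.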