The following is a messaging design pattern:
Step 1 - The application sends a message to an SNS topic.
Step 2 - SNS publishes the message to the subscribed SQS queue.
As per following definitions :
Broadcast : Message is published to all end points.
Multicast : Message is published to selected endpoints.
The above diagram can be interpreted as:
Option 1 - The message is published to selected SQS queues that are subscribed to the SNS topic (multicast pattern)
OR
Option 2 - The message is published to all subscribed endpoints (broadcast pattern)
How should this design pattern be interpreted?
In the absence of clarification, this is likely to be broadcast.
The publisher can't select the queues that will receive the messages -- it will go to all of them, by default.
Historically, SNS fanout to SQS was always broadcast.
However, recent enhancements of SNS provide a capability for the subscriptions of each queue to the SNS topic to be "filtered" -- in which case, the publisher still can't directly select the queues that will receive the message (they're not explicitly addressable), but SNS makes decisions on where to deliver the messages based on the subscription filters... which might fit the multicast label, depending on the circumstances.
https://docs.aws.amazon.com/sns/latest/dg/message-filtering.html
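To make the difference concrete, here is a minimal sketch of the matching idea in plain Python. It is not the SNS implementation and it does not call AWS: it only models exact string matching, where a subscription with no policy accepts everything. The queue names and the `event_type` attribute are made up for illustration; real filter policies support more operators than this.

```python
from typing import Dict, List

# Simplified model: a filter policy maps an attribute name to a list of
# accepted string values. An empty policy accepts every message.
FilterPolicy = Dict[str, List[str]]


def matches(policy: FilterPolicy, attributes: Dict[str, str]) -> bool:
    """Return True if every key in the policy is satisfied by the attributes."""
    return all(attributes.get(key) in allowed for key, allowed in policy.items())


def deliver(subscriptions: Dict[str, FilterPolicy], attributes: Dict[str, str]) -> List[str]:
    """Return the names of the queues that would receive the message."""
    return [name for name, policy in subscriptions.items() if matches(policy, attributes)]


# Hypothetical queue names and attribute values, for illustration only.
unfiltered: Dict[str, FilterPolicy] = {"orders-queue": {}, "audit-queue": {}}
filtered: Dict[str, FilterPolicy] = {
    "orders-queue": {"event_type": ["order_placed"]},
    "audit-queue": {},
}

message_attributes = {"event_type": "order_cancelled"}
broadcast_result = deliver(unfiltered, message_attributes)
filtered_result = deliver(filtered, message_attributes)
print(broadcast_result)
print(filtered_result)
```

With no policies, both queues get the message (broadcast). With the policy on `orders-queue`, only `audit-queue` gets it, even though the publisher did not address any queue directly.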
Related
I'm currently facing a problem when thinking about an event-driven architecture using SNS to decouple some applications.
Imagine an SNS topic, where application A produces messages to it and application B listens to and consumes messages from this topic.
Application B has an autoscaling group attached to it, so it can scale to more than one instance. How will SNS handle it when application B scales? If I now have 2 instances of application B, will SNS send the message to all of them, or can it realize that they are the same application and send the message to just one of them?
Think of SNS as a radio broadcast: everyone who is listening will get your message. This means that every single one of your subscribed servers will get notified.
SQS, on the other hand, is more like a to-do list. Many consumers can poll the same queue, but each message is delivered to at least one of them, which means that usually only one server will be triggered.
If that suits you better, then you might consider using SQS instead of SNS.
I'm not sure what your desired outcome is here, so I'm splitting the answer into two parts:
a) You only want to process each message once:
A common pattern in this case is to subscribe an SQS queue to the SNS topic, and then have N application servers polling from this queue. That way, you can make sure that you process each message only once.
b) You want to process each message once on each server:
In this case, you can create one subscription for each server to the SNS topic. Each message published to the topic will be delivered once to each subscription.
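As a rough illustration of the two options, here is a toy in-memory model (no AWS calls). The `Topic` class, the `drain` helper and the instance names `b-1` and `b-2` are all made up for this sketch, and the turn-taking between workers is just an assumption to keep the output deterministic. It also ignores redelivery, which a real standard queue can do.

```python
from collections import deque
from typing import Deque, Dict, List


class Topic:
    """Toy stand-in for a topic: copies each message to every subscribed queue."""

    def __init__(self) -> None:
        self.queues: List[Deque[str]] = []

    def subscribe(self, queue: Deque[str]) -> None:
        self.queues.append(queue)

    def publish(self, message: str) -> None:
        for queue in self.queues:
            queue.append(message)


def drain(queue: Deque[str], workers: List[str]) -> Dict[str, List[str]]:
    """Let workers take turns pulling from one queue; each message is taken once."""
    handled: Dict[str, List[str]] = {worker: [] for worker in workers}
    turn = 0
    while queue:
        handled[workers[turn % len(workers)]].append(queue.popleft())
        turn += 1
    return handled


# Case a) one shared queue, two instances of application B polling it.
topic_a = Topic()
shared: Deque[str] = deque()
topic_a.subscribe(shared)
for msg in ["m1", "m2", "m3", "m4"]:
    topic_a.publish(msg)
case_a = drain(shared, ["b-1", "b-2"])

# Case b) one queue (one subscription) per instance.
topic_b = Topic()
per_instance: Dict[str, Deque[str]] = {"b-1": deque(), "b-2": deque()}
for q in per_instance.values():
    topic_b.subscribe(q)
for msg in ["m1", "m2"]:
    topic_b.publish(msg)
case_b = {name: list(q) for name, q in per_instance.items()}

print(case_a)
print(case_b)
```

In case a) the four messages are split between the two instances; in case b) each instance sees every message.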
I'm trying to learn about GCP Pub/Sub and I have a problem about the life of a message in Pub/Sub. In fact, I used this article as my reference. And in this article, they said:
Once at least one subscriber for each subscription has acknowledged the message, Pub/Sub deletes the message from storage.
So my first question is: for example, I have a Subscription A which connects to Subscriber X and Subscriber Y. According to the docs, when Subscriber X receives the message and sends an ACK to Subscription A, Pub/Sub will delete the message from storage without considering whether Subscriber Y received the message or not. In other words, Pub/Sub doesn't care whether all subscribers have received the message; as soon as one subscriber gets the message, Pub/Sub will delete it from storage? Am I right, please?
Then, in the following part of the article, the article said:
Once all subscriptions on a topic have acknowledged a message, the message is asynchronously deleted from the publish message source and from storage.
And I feel a little bit confused here. What I understood is that, for instance, if I have a topic that has N subscriptions and each subscription has M subscribers, Pub/Sub just needs to know that, for each subscription, at least one subscriber has acknowledged the message, and then it will delete the message from storage. Am I right, please?
I also found that in the documentation, we have two concepts: Publishing Forwarder and Subscribing Forwarder. So may I ask some last questions:
What is the relationship between Subscription, Publishing Forwarder and Subscribing Forwarder? (For example, does a Subscription consist of only one Publishing Forwarder and one Subscribing Forwarder?)
Is the relationship between Publishing Forwarder and Subscribing Forwarder one-to-one, one-to-many, many-to-one, or many-to-many, please?
Can a Subscriber be associated with many Subscriptions or not, please?
Once a Subscriber consumes a message (here I mean a message that is not duplicated, has no copy, and is unique), is it possible for this Subscriber to re-consume/re-read exactly this message?
If I misunderstand something, please, point it out for me, I really appreciate that.
Thank you guys !!!
Quite a bit to unpack here. It is best not to think of a subscription as attaching to subscribers and also to understand that these two things are different. A subscription is a named entity that wants to receive all messages published to a topic. A subscriber is an actual client running to receive and process messages on behalf of a subscription. A topic can have many subscriptions. A subscription can have many subscribers. If there are multiple subscribers in a subscription, then, assuming there are no duplicate deliveries and the subscribers ack all messages received, each message published to a topic will be delivered to one subscriber for the subscription. This is called load balancing: the processing of messages is spread out over many subscribers. If a topic has multiple subscriptions, each with one subscriber, then every subscriber will receive all messages. This is called fan out: each subscriber receives the complete set of messages published. Of course, it is possible to combine these two and have more than one subscriber for each subscription, in which case each message will be delivered to one subscriber for each subscription.
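A small sketch may help with the combined case. This is a plain Python model, not the Pub/Sub client library: `deliver_all`, the subscription names and the subscriber names are hypothetical, and picking subscribers round-robin is an assumption of the sketch rather than how the service chooses one. It also assumes no duplicate deliveries.

```python
from itertools import cycle
from typing import Dict, List


def deliver_all(messages: List[str], subscriptions: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Deliver each message to exactly one subscriber of every subscription.

    Subscriber names are assumed to be unique across subscriptions.
    """
    received: Dict[str, List[str]] = {
        subscriber: [] for members in subscriptions.values() for subscriber in members
    }
    # One round-robin picker per subscription (an assumption of this sketch).
    pickers = {name: cycle(members) for name, members in subscriptions.items()}
    for message in messages:
        for name in subscriptions:
            received[next(pickers[name])].append(message)
    return received


# "sub-a" load balances across x and y; "sub-b" has a single subscriber z.
result = deliver_all(
    ["m1", "m2", "m3", "m4"],
    {"sub-a": ["x", "y"], "sub-b": ["z"]},
)
print(result)
```

Here x and y split the messages between them (load balancing within `sub-a`), while z receives all four (fan out across subscriptions).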
Forwarders are just the servers that are responsible for delivering messages. A publishing forwarder receives messages from publishers and a subscribing forwarder sends messages to subscribers. All of the relationships along the path of delivering a message, from publisher to publishing forwarder, publishing forwarder to subscribing forwarder, and subscribing forwarder to subscriber, can be many-to-many relationships.
A subscriber is associated with a single subscription. However, a job running could have multiple subscribers running within it, e.g., one could instantiate the subscriber client library several times on different subscriptions.
All of the above assumed an important caveat: assuming there are no duplicate deliveries. In general, Cloud Pub/Sub guarantees at least once delivery. That means that even a message that was properly acked by a subscriber could be redelivered--either to the same subscriber or a different subscriber--in which case the subscriber needs to ack the message on the subsequent delivery. Generally, duplicate rates should be very low, in the 0.1% range for a well-behaved subscriber that is acking messages before the ack deadline expires.
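Because of at-least-once delivery, a subscriber usually wants its processing to be idempotent. Below is a minimal sketch of one way to do that, assuming a redelivered message carries the same message ID as the original delivery. `IdempotentHandler` is a hypothetical helper, not part of any client library, and the in-memory set would need to be replaced by durable storage in a real system.

```python
from typing import Callable, List, Set


class IdempotentHandler:
    """Skips payloads whose message ID has already been processed."""

    def __init__(self, process: Callable[[str], None]) -> None:
        self._process = process
        self._seen: Set[str] = set()  # in-memory only; an assumption of this sketch

    def handle(self, message_id: str, payload: str) -> bool:
        """Process the payload unless this ID was seen before.

        Returns True if the payload was processed, False if it was a duplicate.
        Either way, the caller should still ack the delivery.
        """
        if message_id in self._seen:
            return False
        self._process(payload)
        self._seen.add(message_id)
        return True


processed: List[str] = []
handler = IdempotentHandler(processed.append)

# The third delivery repeats message ID "1", as a redelivery might.
outcomes = [handler.handle(mid, body) for mid, body in [("1", "a"), ("2", "b"), ("1", "a")]]
print(outcomes, processed)
```

The duplicate is detected and skipped, so the payload "a" is processed only once even though it was delivered twice.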
Let's say I have set up an AWS SNS topic with 3 subscribers. I'd like to know when all of the subscribers have received/processed the message, in order to mark that message as processed by all 3 and to generate some metrics.
Is there a way to do this?
You can log delivery status for SNS topics to CloudWatch, but only for certain types of messages (AWS has no reliable way of knowing if some messages were received or not, such as with SMS or email).
The types of messages you can log are:
HTTP
Lambda
SQS
Custom Application (must be configured to tell AWS that the message is received)
To set up logging in SNS:
In the SNS console, click "Edit Topic"
Expand "delivery status logging"
Then you can configure which protocols to log and the necessary permissions to do so.
Once you're logging to CloudWatch, you can draw metrics from there.
If you need to be notified when the subscribers have received the messages, you could set up a subscription filter within CloudWatch to send the relevant log events to a Lambda function, in which you would implement custom logic to notify you appropriately.
I mean successful processing by the consumer
Usually your consumers would have to indicate this somehow. This is use-case specific, so it's difficult to speculate on exact solutions.
But just to give an example, a popular pattern is the request-response messaging pattern. Here, your consumers would use an SQS queue to publish the outcome of the message processing. The producer(s) would poll the queue to get these messages and, subsequently, know which messages were processed correctly and which were not.
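On the producer side, the bookkeeping for that pattern could look roughly like the sketch below. It is plain Python with no AWS calls: `CompletionTracker` and the consumer names are hypothetical, and it assumes each outcome message read from the response queue tells you the original message ID and which consumer processed it.

```python
from typing import Dict, Set


class CompletionTracker:
    """Tracks which of the expected consumers reported success for each message."""

    def __init__(self, expected: Set[str]) -> None:
        self._expected = frozenset(expected)
        self._acks: Dict[str, Set[str]] = {}

    def record(self, message_id: str, consumer: str) -> bool:
        """Record a success report; return True once every expected consumer has reported."""
        if consumer not in self._expected:
            raise ValueError(f"unexpected consumer: {consumer}")
        self._acks.setdefault(message_id, set()).add(consumer)
        return self._acks[message_id] == self._expected

    def pending(self, message_id: str) -> Set[str]:
        """Return the consumers that have not reported for this message yet."""
        return set(self._expected - self._acks.get(message_id, set()))


# Hypothetical consumer names for the 3 subscribers.
tracker = CompletionTracker({"billing", "email", "analytics"})
first = tracker.record("msg-1", "billing")
second = tracker.record("msg-1", "email")
still_pending = tracker.pending("msg-1")
third = tracker.record("msg-1", "analytics")
print(first, second, still_pending, third)
```

Only the third report flips the message to "processed by all 3", which is the point where you could emit your metric.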
Basically, I have an SNS topic with multiple SQS subscribers. By default, it's sending out the same message to all subscribers, but what I need is for each message to be sent to exactly one subscriber.
One solution comes to mind, but I want to know if I'm leaving out any possible solutions.
Use `Message Group Id`
A) Get all SQS subscribers for SNS topic
B) When sending out list of messages, we randomly choose a SQS subscriber
id, and use that as `Message Group Id`
C) When SNS message is received, it will go the appropriate SQS subscriber.
Are there any other solutions? For instance, if I create a FIFO queue and attach subscribers, do they all get the same messages?
There is a limitation on SQS in supporting multiple consumers that process messages in parallel, i.e. m1 to m10 picked by process 1, m11 to m20 picked by process 2, and so on, without duplication. Since this is not supported by SQS, I am thinking of using SNS + SQS (a list of subscribed queues), where each process listens to its specific queue and processes its records.
Is there an option to set between SNS and SQS, such as round-robin, so that SNS distributes messages to the queues in a round-robin fashion and each queue has unique messages without duplication across queues?
Thanks in advance!
Regards,
Kumar
If you don't want your SNS publish to go to all subscribers (queues), look into SNS Message Filtering. Message filtering allows you to define logic controlling which subscribers receive a given message.
By default, a subscriber of an Amazon SNS topic receives every message
published to the topic. To receive only a subset of the messages, a
subscriber assigns a filter policy to the topic subscription.
A filter policy is a simple JSON object. The policy contains
attributes that define which messages the subscriber receives. When
you publish a message to a topic, Amazon SNS compares the message
attributes to the attributes in the filter policy for each of the
topic's subscriptions. If there is a match between the attributes,
Amazon SNS sends the message to the subscriber. Otherwise, Amazon SNS
skips the subscriber without sending the message to it. If a
subscription lacks a filter policy, the subscription receives every
message published to its topic.
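SNS has no round-robin setting as far as the quoted text goes, but filtering can approximate it if the publisher tags each message. The sketch below is one possible approach, simulated in plain Python without AWS calls: the publisher sets a `target` attribute round-robin, and each queue's subscription gets a policy that accepts only its own name. The attribute name `target`, the queue names and the helper functions are all hypothetical.

```python
import json
from itertools import cycle
from typing import Dict, List, Tuple

QUEUES = ["queue-1", "queue-2", "queue-3"]  # hypothetical queue names


def filter_policy_for(queue: str) -> str:
    """JSON policy accepting only messages whose 'target' attribute names this queue."""
    return json.dumps({"target": [queue]})


def assign_targets(messages: List[str], queues: List[str]) -> List[Tuple[str, Dict[str, str]]]:
    """Publisher side: tag each message with the next queue name, round-robin."""
    targets = cycle(queues)
    return [(body, {"target": next(targets)}) for body in messages]


def route(messages: List[str], queues: List[str]) -> Dict[str, List[str]]:
    """Simulate the match: a queue receives a message only if its policy accepts it."""
    inbox: Dict[str, List[str]] = {queue: [] for queue in queues}
    for body, attributes in assign_targets(messages, queues):
        for queue in queues:
            policy = json.loads(filter_policy_for(queue))
            if attributes["target"] in policy["target"]:
                inbox[queue].append(body)
    return inbox


inbox = route(["m1", "m2", "m3", "m4"], QUEUES)
print(filter_policy_for("queue-1"))
print(inbox)
```

Each message lands in exactly one queue. Note that this moves the routing decision into the publisher, so a single queue with several consumers is often the simpler choice.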
Unless you are using SQS FIFO queues, your assumption that SQS does not support multiple parallel consumers is not correct.
Standard SQS queues do support multiple parallel consumers.
As for SQS FIFO queues, they don't serve messages from the same message group to more than one consumer at a time. However, if your FIFO queue has multiple message groups, you can take advantage of parallel consumers, allowing Amazon SQS to serve messages from different message groups to different consumers.
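The message group behaviour can be pictured with a very simplified model. This is not how SQS is implemented and it leaves out visibility timeouts and batching entirely; `FifoQueueModel` is a hypothetical class that only captures the rule that a group with a message in flight is not served to anyone else until that message is deleted.

```python
from typing import List, Optional, Set, Tuple

Message = Tuple[str, str]  # (message_group_id, body)


class FifoQueueModel:
    """Toy model: at most one in-flight message per message group."""

    def __init__(self, messages: List[Message]) -> None:
        self._pending: List[Message] = list(messages)
        self._in_flight: Set[str] = set()

    def receive(self) -> Optional[Message]:
        """Return the oldest message whose group has nothing in flight, else None."""
        for index, (group, _body) in enumerate(self._pending):
            if group not in self._in_flight:
                self._in_flight.add(group)
                return self._pending.pop(index)
        return None

    def delete(self, message: Message) -> None:
        """Finish processing a message and release its group."""
        self._in_flight.discard(message[0])


queue = FifoQueueModel([("A", "a1"), ("A", "a2"), ("B", "b1")])
consumer_1 = queue.receive()  # takes group A
consumer_2 = queue.receive()  # group A is busy, so this gets group B
consumer_3 = queue.receive()  # only a2 is left and group A is still busy
queue.delete(consumer_1)
after_delete = queue.receive()  # group A is free again
print(consumer_1, consumer_2, consumer_3, after_delete)
```

Two consumers work in parallel because there are two groups; the third gets nothing until a message from group A has been deleted.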