Acknowledgement Behaviour of GCP Pub/Sub messages

I'm working on a microservice that contains subscriptions to a topic in GCP Pub/Sub. Because multiple instances of the microservice run on more than one host (multiple clusters in the cloud), I wanted to know how acknowledgement works for messages from subscriptions. When a subscription on one instance receives, processes, and acknowledges a message, does the same subscription on the other hosts receive the message too?
I expect that once the subscriber acknowledges, Pub/Sub doesn't send the message again. But what if two subscribers on the same subscription on different hosts receive the message at the same time? Does that cause duplication?

Pub/Sub delivers each published message at least once for every subscription.
https://cloud.google.com/pubsub/docs/subscriber#at-least-once-delivery
If you don't want multiple "workers" to each receive a copy of the message, they all need to share a single subscription.
This is because, for events, you can have multiple systems listening on the same topic through different subscriptions, so that every system receives the event that something has happened.
For commands, you usually want a single system to handle them (even if split between multiple workers) so you would need a single subscription that is shared among all the workers.
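To make these subscription semantics concrete, here is a small in-memory simulation. It is not the google-cloud-pubsub client API: the `Topic` class and the subscription and worker names are hypothetical, and the round-robin choice of worker is a simplification (Pub/Sub balances messages between a subscription's subscribers on its own terms).

```python
from collections import defaultdict
from itertools import cycle


class Topic:
    """In-memory stand-in for a Pub/Sub topic (not the google-cloud-pubsub API)."""

    def __init__(self):
        # subscription name -> endless round-robin iterator over its workers
        self.subscriptions = {}

    def subscribe(self, subscription, workers):
        self.subscriptions[subscription] = cycle(workers)

    def publish(self, message, inbox):
        # Every subscription gets its own copy of the message...
        for workers in self.subscriptions.values():
            # ...but only one worker attached to that subscription receives it.
            inbox[next(workers)].append(message)


inbox = defaultdict(list)  # worker name -> messages it received
topic = Topic()
topic.subscribe("billing-sub", ["billing-1", "billing-2", "billing-3"])  # one system, 3 workers
topic.subscribe("audit-sub", ["audit-1"])  # a second, independent system
for i in range(6):
    topic.publish(f"order-{i}", inbox)

print(dict(inbox))
```

Each billing worker ends up with two of the six orders (the command-style, shared subscription), while `audit-1` receives all six (the event-style, separate subscription).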
By the way, your system should process events/commands from a topic idempotently. The general rule of thumb is that each message is guaranteed to be received by a subscriber at least once, meaning the same system could receive the same command twice.
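One common way to get idempotency is to remember which message IDs have already been handled and skip repeats. Pub/Sub assigns every published message an ID; in this minimal sketch messages are plain `(id, payload)` tuples and the `IdempotentHandler` class is a hypothetical helper, with an in-memory set standing in for what would have to be a store shared by all instances.

```python
class IdempotentHandler:
    """Skips messages whose ID has already been processed (hypothetical helper)."""

    def __init__(self, process):
        self.process = process
        # In production this would live in a store shared by all instances
        # (e.g. a database table keyed by message ID), not in process memory.
        self.seen_ids = set()

    def handle(self, message_id, payload):
        if message_id in self.seen_ids:
            return False  # duplicate delivery: just acknowledge it again, do no work
        self.process(payload)
        # Recorded only after processing succeeds, so a crash in between leads
        # to reprocessing (at-least-once) rather than to a lost message.
        self.seen_ids.add(message_id)
        return True


applied = []
handler = IdempotentHandler(applied.append)
deliveries = [("m1", "charge 10"), ("m2", "charge 5"), ("m1", "charge 10")]  # m1 arrives twice
results = [handler.handle(message_id, payload) for message_id, payload in deliveries]
print(results)  # [True, True, False]
print(applied)  # ['charge 10', 'charge 5']
```

Because the ID is recorded after processing, a crash between the two steps still allows a repeat; a real implementation would make "do the work" and "record the ID" a single atomic step where possible.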

Related

Under what conditions can an AWS SNS Standard topic deliver a message more than once, which is not allowed in a FIFO topic?

Can anybody explain, in distributed computing terms, why an AWS SNS Standard topic can deliver a message more than once, while this is strictly not allowed in SNS FIFO topics?
What's the architecture behind it?
The documentation says:
**Best-effort deduplication**:
A message is delivered at least once, but occasionally more than one copy of a message is delivered.
I think short polling in SQS illustrates why distributed systems sometimes behave differently from fully centralized ones:
When you consume messages from a queue using short polling, Amazon SQS samples a subset of its servers (based on a weighted random distribution) and returns messages from only those servers. Thus, a particular ReceiveMessage request might not return all of your messages.
AWS does not share the exact details of its internal architecture, but I think a similar situation may explain why SNS can deliver duplicate messages. Namely, some of the servers that support SNS may not learn in time that a given message has already been delivered, and re-send it.
SNS FIFO probably requires a more centralized system to manage the messages, but at the same time it has lower throughput than standard (non-FIFO) topics.
For SNS Standard (at-least-once delivery), the client's ACK may occasionally fail (because of a network failure or partition, among other causes). If the server doesn't receive the ACK, it retries delivery of the same message. In these rare cases, the client may receive another copy of the same message.
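The lost-ACK scenario can be illustrated with a toy retry loop. This is a sketch of the general at-least-once idea under stated assumptions, not how SNS is actually built; the function name, the `ack_lost` callback and the `max_attempts` limit are all hypothetical.

```python
def deliver_at_least_once(message, ack_lost, max_attempts=5):
    """Toy push-delivery loop: retry until an ACK gets back to the server.

    ack_lost(attempt) says whether the consumer's ACK for that attempt is lost
    (e.g. to a network partition). Returns the copies the consumer received.
    """
    received = []
    for attempt in range(1, max_attempts + 1):
        received.append(message)  # consumer gets and processes this copy
        if not ack_lost(attempt):
            break  # server saw the ACK, so it stops retrying
    return received


# First ACK is lost, the retry's ACK gets through: the consumer sees the message twice.
copies = deliver_at_least_once("msg-1", ack_lost=lambda attempt: attempt == 1)
print(len(copies))  # 2
```

The consumer did nothing wrong here: it processed the first copy, but from the server's point of view that delivery never succeeded, so the only safe choice is to send it again.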
When AWS SNS FIFO guarantees that a message is delivered at most once, AWS simply has a very "available" set of hardware for FIFO SNS; otherwise, in a distributed system, it would not be correct to guarantee at-most-once delivery. It must also maintain strong consistency to confirm messages in order.

Amazon SNS topic with multiple instances of the same application

I'm currently facing a problem while designing an event-driven architecture that uses SNS to decouple some applications.
Imagine an SNS topic: application A produces messages to it, and application B listens to and consumes messages from it.
Application B has an autoscaling group attached to it, so it can scale to more than one instance. How will SNS handle it when application B scales? If I now have 2 instances of application B, will SNS send the message to all of them, or can it recognize that they are the same application and send the message to just one of them?
Think of SNS as a radio broadcast: everyone who is listening will get your message, meaning that every one of your subscribed servers will be notified.
SQS, on the other hand, is more like a to-do list. Many consumers can read from it too, but each message is handed out to at least one of them, meaning that usually only one server will be triggered.
If that suits you better, then you might consider using SQS instead of SNS.
I'm not sure what your desired outcome is here, so I'm splitting the answer into two parts:
a) You only want to process each message once:
A common pattern in this case is to subscribe an SQS queue to the SNS topic, and then have N application servers polling from this queue. That way, you can make sure that you process each message only once.
b) You want to process each message once on each server:
In this case, you can create one subscription for each server to the SNS topic. Each message published to the topic will be delivered once to each subscription.

How to send notification to multiple sns topics at once

I have 9000 AWS SNS topics with more than 1M subscribers in each topic. At the moment I am looping over each topic to send a push message, which consumes a lot of my system resources. Is there a way to send a message to all the topics at once? What is the best approach to handle this scenario?
It is not possible to subscribe an Amazon SNS topic to another Amazon SNS topic, so there is no out-of-the-box method for sending one message to multiple topics.
I would recommend creating an AWS Lambda function that will:
Retrieve a list of all relevant topics (based on tags, perhaps)
Loop through them and send a message to each topic
Thus, you would just trigger the Lambda function with one message and it would go to all the topics. It would not consume your own system's resources, but it is charged based on run duration. Lambda functions can run for a maximum of 15 minutes, so as long as the function sends at least 10 messages per second (600 per minute), it can reach all 9000 topics in one invocation.
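The timing budget is simple arithmetic: 9000 topics in 15 × 60 = 900 seconds means 10 publishes per second. Below is a minimal sketch of the loop with a few publishes in flight at once. The threaded fan-out is one option, not something the answer prescribes; `publish` is a hypothetical callable injected by the caller (with boto3 it could wrap `sns_client.publish(TopicArn=..., Message=...)`), and the topic list is assumed to have been retrieved already, for example by tag.

```python
from concurrent.futures import ThreadPoolExecutor

LAMBDA_MAX_SECONDS = 15 * 60  # maximum Lambda run time: 15 minutes


def required_rate_per_second(topic_count, budget_seconds=LAMBDA_MAX_SECONDS):
    """Minimum publish rate needed to reach every topic within one invocation."""
    return topic_count / budget_seconds


def publish_to_all(topic_arns, message, publish, max_workers=16):
    """Publish `message` to every topic, keeping several requests in flight.

    `publish(topic_arn, message)` is a hypothetical callable; with boto3 it could
    wrap sns_client.publish(TopicArn=topic_arn, Message=message).
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() collects results in input order; if any publish raised,
        # its exception is re-raised here
        return list(pool.map(lambda arn: publish(arn, message), topic_arns))


print(required_rate_per_second(9000))  # 10.0 messages per second
```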
Depending upon your use-case, you might also consider using Amazon Pinpoint:
Amazon Pinpoint is an AWS service that you can use to engage with your customers across multiple messaging channels. You can use Amazon Pinpoint to send push notifications, emails, SMS text messages, and voice messages.

Amazon SQS: only get messages with a specific message attribute and value

I'm building a set of systems that will communicate with each other via SQS. The issue I am having is that every message on the queue gets read, not just the ones meant for that service, so now messages for other services are in flight and inaccessible to their proper destination. Any idea how I can use the MessageAttributes of a message to retrieve only the messages with a particular destination?
You basically cannot do that with SQS.
You can either create separate queues per service, or look at using something else (RabbitMQ, etc.).
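The separate-queues option moves the routing decision to the producer: it picks the queue from the destination, so each service only ever polls its own queue. A minimal sketch, with in-memory deques standing in for SQS queues; the class and the service names are hypothetical.

```python
from collections import defaultdict, deque


class PerServiceQueues:
    """One queue per destination service (in-memory stand-in for separate SQS queues)."""

    def __init__(self):
        self.queues = defaultdict(deque)  # destination service -> its queue

    def send(self, destination, body):
        # The producer chooses the queue, so no filtering is needed on receive.
        self.queues[destination].append(body)

    def receive(self, service):
        queue = self.queues[service]
        return queue.popleft() if queue else None


bus = PerServiceQueues()
bus.send("billing", "invoice #1")
bus.send("shipping", "ship order #7")
bus.send("billing", "invoice #2")
first = bus.receive("billing")  # "invoice #1"; the shipping message is never seen here
```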

What is the difference between Amazon SNS and Amazon SQS?

When would I use SNS versus SQS, and why are they always coupled together?
SNS is a distributed publish-subscribe system. Messages are pushed to subscribers as and when they are sent by publishers to SNS.
SQS is a distributed queuing system. Messages are not pushed to receivers. Receivers have to poll or pull messages from SQS. Messages can't be received by multiple receivers at the same time. Any one receiver can receive a message, process it, and delete it; other receivers do not receive the same message later. Polling inherently introduces some latency in message delivery in SQS, unlike SNS, where messages are immediately pushed to subscribers. SNS supports several endpoints, such as email, SMS, HTTP endpoints, and SQS. If you want an unknown number and type of subscribers to receive messages, you need SNS.
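What keeps other receivers from seeing a message that is being processed is the visibility timeout: a received message is hidden until it is deleted or the timeout expires. The toy model below illustrates that receive/delete cycle; `ToyQueue` and its methods are hypothetical and only mirror the ideas behind SQS's ReceiveMessage, DeleteMessage and VisibilityTimeout, not the boto3 API.

```python
class ToyQueue:
    """Toy model of SQS receive/delete semantics with a visibility timeout."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self.messages = {}  # message id -> (body, time at which it becomes visible)
        self.next_id = 0

    def send(self, body):
        self.next_id += 1
        self.messages[self.next_id] = (body, 0)

    def receive(self, now):
        for message_id, (body, visible_at) in self.messages.items():
            if visible_at <= now:
                # hide the message from other receivers until the timeout expires
                self.messages[message_id] = (body, now + self.visibility_timeout)
                return message_id, body
        return None

    def delete(self, message_id):
        self.messages.pop(message_id, None)


queue = ToyQueue(visibility_timeout=30)
queue.send("resize image")
first = queue.receive(now=0)    # (1, 'resize image'): now hidden for 30 s
second = queue.receive(now=10)  # None: another receiver can't see it yet
third = queue.receive(now=31)   # (1, 'resize image'): never deleted, so visible again
queue.delete(third[0])          # processing finished: remove it for good
```

The third call shows why the delete step matters: "other receivers do not receive the same message later" holds only if the message is deleted before its visibility timeout runs out.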
You don't always have to couple SNS and SQS. You can have SNS send messages to email, SMS, or an HTTP endpoint apart from SQS. There are advantages to coupling SNS with SQS, though. You may not want an external service to make connections to your hosts (a firewall may block all incoming connections to your host from outside).
Your endpoint may simply fail under a heavy volume of messages. Email and SMS may not be your choice for processing messages quickly. By coupling SNS with SQS, you can receive messages at your own pace. It allows clients to be offline and makes them tolerant of network and host failures. You also achieve guaranteed delivery. If you configure SNS to send messages to an HTTP endpoint, email, or SMS, several failed delivery attempts may result in messages being dropped.
SQS is mainly used to decouple applications or integrate applications. Messages can be stored in SQS for a short duration of time (maximum 14 days). SNS distributes several copies of messages to several subscribers. For example, let’s say you want to replicate data generated by an application to several storage systems. You could use SNS and send this data to multiple subscribers, each replicating the messages it receives to different storage systems (S3, hard disk on your host, database, etc.).
Here's a comparison of the two:
Entity Type
SQS: Queue (Similar to JMS)
SNS: Topic (Pub/Sub system)
Message consumption
SQS: Pull Mechanism - Consumers poll and pull messages from SQS
SNS: Push Mechanism - SNS Pushes messages to consumers
Use Case
SQS: Decoupling two applications and allowing parallel asynchronous processing
SNS: Fanout - Processing the same message in multiple ways
Persistence
SQS: Messages are persisted for some (configurable) duration if no consumer is available (maximum two weeks), so the consumer does not have to be up when messages are added to queue.
SNS: No persistence. Whichever consumer is present at the time of message arrival gets the message and the message is deleted. If no consumers are available then the message is lost after a few retries.
Consumer Type
SQS: All the consumers are typically identical and hence process the messages in the exact same way (each message is processed once by one consumer, though in rare cases messages may be resent)
SNS: The consumers might process the messages in different ways
Sample applications
SQS: Jobs framework: The Jobs are submitted to SQS and the consumers at the other end can process the jobs asynchronously. If the job frequency increases, the number of consumers can simply be increased to achieve better throughput.
SNS: Image processing. Suppose that when someone uploads an image to S3, you want to watermark it, create a thumbnail, and send a Thank You email. In that case S3 can publish notifications to an SNS topic with three consumers listening to it. The first one watermarks the image, the second one creates a thumbnail, and the third one sends a Thank You email. All of them receive the same message (image URL) and do their processing in parallel.
You can see SNS as a traditional topic, which can have multiple subscribers. You can have heterogeneous subscribers for one given SNS topic, including Lambda and SQS, for example. You can also send SMS messages or even emails out of the box using SNS. One thing to consider with SNS is that only one message (notification) is received at a time, so you cannot take advantage of batching.
SQS, on the other hand, is nothing but a queue, where you store messages and subscribe one consumer to process them. (Yes, you can have N consumers on one SQS queue, but if every consumer needs to read each message, it gets messy very quickly and is much harder to manage. For that use case you are better off combining SNS with SQS: SNS pushes notifications to N SQS queues, and every queue has only one subscriber.) As of June 28, 2018, AWS supports Lambda triggers for SQS, meaning you don't have to poll for messages any more.
Furthermore, you can configure a DLQ on your source SQS queue to send messages to in case of failure. In case of success, messages are automatically deleted (this is another great improvement), so you don't have to worry about the already processed messages being read again in case you forgot to delete them manually. I suggest taking a look at Lambda Retry Behaviour to better understand how it works.
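The DLQ behaviour is controlled by the queue's redrive policy, whose `maxReceiveCount` setting decides how many receives a message gets before it is moved aside. The loop below is a simplified toy model, not AWS code: in SQS a message is moved once its receive count exceeds `maxReceiveCount`, which this sketch approximates by dead-lettering after `max_receive_count` failed attempts. The `drain` function and the "poison" message are hypothetical.

```python
from collections import deque


def drain(messages, handler, max_receive_count=3):
    """Toy redrive policy: retry failures, dead-letter after max_receive_count receives."""
    source = deque((body, 0) for body in messages)  # (body, times received so far)
    processed, dead_letter = [], []
    while source:
        body, receives = source.popleft()
        receives += 1
        try:
            handler(body)
        except Exception:
            if receives >= max_receive_count:
                dead_letter.append(body)  # moved to the DLQ instead of retried again
            else:
                source.append((body, receives))  # becomes visible again for a retry
        else:
            processed.append(body)  # success: the message is deleted from the queue
    return processed, dead_letter


attempts = []


def handler(body):
    attempts.append(body)
    if body == "poison":
        raise ValueError("cannot process this message")


processed, dead_letter = drain(["a", "poison", "b"], handler)
print(processed, dead_letter)  # ['a', 'b'] ['poison']
```

Without a DLQ, the "poison" message would keep coming back indefinitely; with one, it is tried a bounded number of times and then set aside for inspection.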
One great benefit of using SQS is that it enables batch processing. By default, each batch can contain up to 10 messages, so if 100 messages arrive at once in your SQS queue, then 10 Lambda functions will spin up (assuming the default auto-scaling behaviour for Lambda) and process these 100 messages. Keep in mind this is the happy path; in practice, a few more Lambda functions could spin up, each reading fewer than 10 messages in its batch, but you get the idea. If you posted these same 100 messages to SNS, however, 100 Lambda functions would spin up, unnecessarily increasing costs and using up your Lambda concurrency.
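The invocation counts in the happy path above come from splitting the messages into batches. A minimal sketch of that arithmetic; `batches` is a hypothetical helper, and real Lambda batching can produce partially filled batches, as the answer notes.

```python
def batches(messages, batch_size=10):
    """Split messages into consecutive batches of at most batch_size."""
    return [messages[i:i + batch_size] for i in range(0, len(messages), batch_size)]


messages = [f"msg-{i}" for i in range(100)]
print(len(batches(messages, batch_size=10)))  # 10 invocations: SQS, full batches of 10
print(len(batches(messages, batch_size=1)))   # 100 invocations: SNS, one message each
```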
However, if you are still running traditional servers (like EC2 instances), you will still need to poll for messages and manage them manually.
You also have FIFO SQS queues, which guarantee the delivery order of the messages. SQS FIFO has also been supported as an event source for Lambda since November 2019.
Even though there's some overlap in their use cases, both SQS and SNS have their own spotlight.
Use SNS if:
multiple subscribers is a requirement
sending SMS/E-mail out of the box is handy
Use SQS if:
only one subscriber is needed
batching is important
AWS SNS is a publisher subscriber network, where subscribers can subscribe to topics and will receive messages whenever a publisher publishes to that topic.
AWS SQS is a queue service, which stores messages in a queue. SQS cannot push messages to consumers; an external service (Lambda, EC2, etc.) is needed to poll SQS and retrieve messages from it.
SNS and SQS can be used together for multiple reasons.
There may be different kinds of subscribers: some need immediate delivery of messages, while others require the message to persist for later retrieval via polling.
The "Fanout Pattern." This is for the asynchronous processing of messages. When a message is published to SNS, it can distribute it to multiple SQS queues in parallel. This can be great for generating thumbnails in an application in parallel as images are published.
Persistent storage, for when the service that is going to process a message is not reliable. In a case like this, if SNS pushes a notification to a service and that service is unavailable, the notification will be lost. Therefore, we can use SQS as persistent storage and process the message afterwards.
From the AWS documentation:
Amazon SNS allows applications to send time-critical messages to multiple subscribers through a “push” mechanism, eliminating the need to periodically check or “poll” for updates.
Amazon SQS is a message queue service used by distributed applications to exchange messages through a polling model, and can be used to decouple sending and receiving components—without requiring each component to be concurrently available.
Fanout to Amazon SQS queues
Following are the major differences between the main messaging technologies on AWS (SQS, SNS, and EventBridge). To choose a particular AWS service, we should know the functionality it provides as well as how it compares with other services.
The diagram below summarizes the main similarities and differences between these services.
In simple terms,
SNS - sends messages to subscribers using a push mechanism, with no need to pull.
SQS - a message queue service used by distributed applications to exchange messages through a polling model; it can be used to decouple sending and receiving components.
A common pattern is to use SNS to publish messages to Amazon SQS queues to reliably send messages to one or many system components asynchronously.
Reference from Amazon SNS FAQs.
One reason for coupling SQS and SNS would be for data processing pipelines.
Let's say you are generating three kinds of product, and that products B & C are both derived from the same intermediate product A. For each kind of product (i.e., for each segment of the pipeline) you set up:
a compute resource (maybe a lambda function, or a cluster of virtual machines, or an autoscaling kubernetes job) to generate the product.
a queue (describing units of work that need to be performed) to partition the work across the compute resource (so that each unit of work is processed once, barring occasional redelivery, but separate units of work can be processed in parallel and asynchronously with each other).
a news feed (announcing outputs that have been produced).
Then arrange so that the input queues for B & C are both subscribing to the output announcements of A.
This makes the pipeline modular on the level of infrastructure. Rather than having a monolithic server application that generates all three products together, different stages of the pipeline can utilise different hardware resources (for example, perhaps stage B is very memory intensive, but the two other stages can be performed with cheaper hardware/services). This also makes it easier to iterate on the development of one pipeline segment without disrupting delivery of the other products.
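The wiring described above can be sketched in-process: each stage has a work queue, and each stage's outputs are announced to the input queues of the stages subscribed to it. This is only a model under stated assumptions (the `Pipeline` class and the stage functions are hypothetical placeholders); the point of the real design is that each stage runs on its own compute resource with real queues and topics between them.

```python
from collections import defaultdict, deque


class Pipeline:
    """In-process stand-in for queues (work) plus topics (announcements)."""

    def __init__(self):
        self.stages = {}                      # stage name -> compute function
        self.queues = defaultdict(deque)      # stage name -> pending units of work
        self.subscribers = defaultdict(list)  # stage name -> downstream stages

    def add_stage(self, name, compute, inputs_from=()):
        self.stages[name] = compute
        for upstream in inputs_from:
            # subscribe this stage's input queue to the upstream stage's announcements
            self.subscribers[upstream].append(name)

    def submit(self, stage, item):
        self.queues[stage].append(item)

    def run(self):
        outputs = defaultdict(list)
        progress = True
        while progress:  # keep going until every queue is empty
            progress = False
            for name, compute in self.stages.items():
                while self.queues[name]:
                    product = compute(self.queues[name].popleft())
                    outputs[name].append(product)
                    for downstream in self.subscribers[name]:  # announce: fan out
                        self.queues[downstream].append(product)
                    progress = True
        return outputs


pipeline = Pipeline()
pipeline.add_stage("A", lambda raw: f"A({raw})")
pipeline.add_stage("B", lambda a: f"B({a})", inputs_from=["A"])
pipeline.add_stage("C", lambda a: f"C({a})", inputs_from=["A"])
pipeline.submit("A", "input-1")
outputs = pipeline.run()
print(outputs["B"], outputs["C"])  # ['B(A(input-1))'] ['C(A(input-1))']
```

Adding a fourth product derived from A would only mean one more `add_stage(..., inputs_from=["A"])` call; stages A, B and C are untouched, which is the modularity the answer describes.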
There are some key distinctions between SNS and SQS:
SNS supports A2A and A2P communication, while SQS supports only A2A communication.
SNS is a pub/sub system, while SQS is a queuing system. You'd typically use SNS to send the same message to multiple consumers via topics. In comparison, in most scenarios, each message in an SQS queue is processed by only one consumer. With SQS, messages are delivered through a long polling (pull) mechanism, while SNS uses a push mechanism to immediately deliver messages to subscribed endpoints.
SNS is typically used for applications that need real-time notifications, while SQS is more suited for message processing use cases.
SNS does not persist messages - it delivers them to subscribers that are present, and then deletes them. In comparison, SQS can persist messages (from 1 minute to 14 days).
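The 1-minute-to-14-days range corresponds to SQS's `MessageRetentionPeriod` queue attribute, which is expressed in seconds (60 to 1,209,600). The helper below is a hypothetical name that validates a value before it is used; with boto3, the returned dict could be passed as `Attributes` to `create_queue` or `set_queue_attributes`.

```python
MIN_RETENTION_SECONDS = 60                  # 1 minute
MAX_RETENTION_SECONDS = 14 * 24 * 60 * 60   # 14 days = 1,209,600 seconds


def retention_attribute(days=0, hours=0, minutes=0):
    """Build the MessageRetentionPeriod queue attribute, rejecting values SQS won't accept."""
    seconds = ((days * 24 + hours) * 60 + minutes) * 60
    if not MIN_RETENTION_SECONDS <= seconds <= MAX_RETENTION_SECONDS:
        raise ValueError(f"retention must be between 1 minute and 14 days, got {seconds} s")
    return {"MessageRetentionPeriod": str(seconds)}  # SQS attribute values are strings


print(retention_attribute(days=4))  # {'MessageRetentionPeriod': '345600'}
```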
Individually, Amazon SQS and SNS are used for different use cases. You can, however, use them together in some scenarios.