Fault Tolerant Clustered Queues - SQS

Fault Tolerant Clustered Queues - SQS - amazon-web-services

I would like to create SQS queues in 2 different AWS regions. Is there a way to setup synchronization between both queues? When data is read off a queue in either region , message must not be available for consumption. If one the region goes down , then consumer must start reading from the next message in the available region? Does AWS support this out of the box or are there any patterns available to support this use case?

No, this is not a feature of Amazon SQS.
It would be quite difficult to implement because you cannot request a specific message off a queue. So, if a message is retrieved in one region, there is no way to delete that message in a different region. You would need to operate a database to keep track of the messages, which sort of defeats the whole purpose of the queue.
Amazon SQS is a multi-AZ service that can survive failure of an Availability Zone, but resides in a single region.

You can use Amazon SNS to fan out messages to multi SQS queues, even in multiple different regions. Details here: Sending Amazon SNS messages to an Amazon SQS queue or AWS Lambda function in a different Region.
However this results in duplicate messages across those regions, this does not satisfy your requirement
When data is read off a queue in either region , message must not be available for consumption

Related

How does ECS Fargate distribute traffic across the available tasks when it receive messages from SQS?

I have a multi-region ECS Fargate, running 2 tasks in 1 cluster per region. Totally I have 4 tasks, 2 in us-east-1 and 2 in us-west-1.
The purpose of the ECS consumer tasks is to process messages as and when messages are available in SQS.
SQS will be configured in just a single region. The SQS arn will be configured in the container running the tasks.
With this setup, when there are messages from SQS, how does the traffic gets distributed across all available ECS tasks across multi-region? Is it random ? Someone please clarify.
I am not configuring load balancers for the ECS task since I do not have external calls. The source is always the messages from SQS.

With this setup, when there are messages from SQS, how does the traffic gets distributed across all available ECS tasks across multi-region? Is it random ? Someone please clarify
It's not random, but it is arbitrary. Here is what the docs say:
Standard queues provide best-effort ordering which ensures that messages are generally delivered in the same order as they're sent.
The reason that it's arbitrary is because SQS queues are distributed across multiple nodes and you have no idea how many nodes there are. So if SQS decides that you need 20 nodes to handle the rate that messages are added to the queue, and you retrieve 10 messages at a time (the limit), clearly you're going to get messages from some subset of those nodes.
Going into the realm of complete speculation, long polling might improve your chances of getting messages in the order that they were sent, because it is documented to "quer[y] all of the servers for messages." Of course, that could only apply when you can't fill your response from a single server. I would expect it to grab all messages that it can from each server and return as soon as it hits the maximum number of messages, even if it hasn't actually queried all servers.
SQS will be configured in just a single region. The SQS arn will be configured in the container running the tasks.
Beware that you need the queue URL, not its ARN, in order to retrieve messages.
Beware also that -- at least with the Python SDK -- you need to configure your SQS client's region to match the region where the queue exists (even though you pass the URL, which contains the region).

Does AWS SQS replicate messages across regions?

As SQS is distribute queue, so does it replicate messages in the same region or different region? Looking at architecture at the AWS
docs, it shows the message being replicated, but does it replicate in the same region or different regions?
Use case:
I'm setting up queue in region X, but it might be accessed in a region at other end of world. So if there are two workers running one in region X and one in region Y, does both get data from same region X queue or can it be region X and region Y got data from region near to them.
Like X got a message from region X and before the time this info reaches region Y to update queue, then another worker take from replicated region Y queue and reads same message.
P.S :- I know SQS in at least once semantics. But I want to know semantics in the above use case.

SQS is a regional service, that is highly available within a single region. There is no cross-region replication capability. You can definitely access the queue from different regions, just initialize the sqs client with the correct destination region.

As a standard practice for AWS services, the data resides within the region that you create the service in.
There are exceptions, but these will require you as the user to perform an action to allow such as copying an AMI, or enabling S3 replication.
If the queue is being consumed in multiple regions, it will always access the regional endpoint of the SQS queue rather than that of the current region.
As SQS is a queueing service, if you have workers distributed across regions the likelihood is that the item is removed from the queue and processed in a single region (although the exact definition would be it is delivered at least once).
If you're trying to have the message consumed in multiple regions, it would be better to consider a fanout based approach whereby each regions workers would consume from their own SQS queue as opposed to sharing one.
For more information take a look at the https://aws.amazon.com/getting-started/hands-on/send-fanout-event-notifications/ documentation.

Does lambdas execute operations in sequence.?

We are contemplating using Amazon web services for our project. Wherein the upstream flow will push the messages into the kinesis and later those messages will be fed into the lambdas, those messages before processing are going to be in order. As per my understanding, the AWS lambdas will scale out horizontally based on the volume of the messages. We have a volume of 400 messages per second, which means AWS lambda will respond to message volume and will instantiate new processes on separate containers to leverage parallelism and in order to achieve parallelism, ordering has to be compromised. So in case of 10 messages, which were ordered, hit the lambda functions and one function takes more time than another, a new function will be provisioned in some container by the AWS to serve the request.
Is the final output going to be in order after all of this processes?
Any help is appreciated.
Thanks.

If you are using Amazon Kinesis, then you can use a Data Transformation to trigger an AWS Lambda function on each incoming record.
This allows the record to be transformed or deleted, before continuing through the Firehose. Thus, records can be processed by Lambda while remaining in the same order. The final data can be delivered to Amazon S3, Amazon Redshift, Amazon Elasticsearch Service or Splunk.
If your application is consuming records from Amazon Kinesis directly (instead of via Firehose), then records will be consumed in order by your application.

Can I limit concurrent invocations of an AWS Lambda?

I have a Lambda function that’s triggered by a PUT to an S3 bucket.
I want to limit this Lambda function so that it’s only running one instance at a time – I don’t want two instances running concurrently.
I’ve had a look through the Lambda configuration and docs, but I can’t see anything obvious. I can about writing my own locking system, but it would be nice if this was already a solved problem.
How can I limit the number of concurrent invocations of a Lambda?

AWS Lambda now supports concurrency limits on individual functions:
https://aws.amazon.com/about-aws/whats-new/2017/11/set-concurrency-limits-on-individual-aws-lambda-functions/

I would suggest you to use Kinesis Streams (or alternatively DynamoDB + DynamoDB Streams, which essentially have the same behavior).
You can see Kinesis Streams as as queue. The good part is that you can use a Kinesis Stream as a Trigger to you Lambda function. So anything that gets inserted into this queue will automatically be passed over to your function, in order. So you will be able to process those S3 events one by one, one Lambda execution after the other (one instance at a time).
In order to do that, you'll need to create a Lambda function with the simple purpose of getting S3 Events and putting them into a Kinesis Stream. Then you'll configure that Kinesis Stream as your Lambda Trigger.
When you configure the Kinesis Stream as your Lambda Trigger I suggest you to use the following configuration:
Batch size: 1
This means that your Lambda will be called with only one event from Kinesis. You can select a higher number and you'll get a list of events of that size (for example, if you want to process the last 10 events in one Lambda execution instead of 10 consecutive Lambda executions).
Starting position: Trim horizon
This means it'll behave as a queue (FIFO)
A bit more info on AWS May Webinar Series - Streaming Data Processing with Amazon Kinesis and AWS Lambda.
I hope this helps anyone with a similar problem.
P.S. Bear in mind that Kinesis Streams have their own pricing. Using DynamoDB + DynamoDB Streams might be cheaper (or even free due to the non-expiring Free Tier of DynamoDB).

No, this is one of the things I'd really like to see Lambda support, but currently it does not. One of the problems is that if there were a lot of S3 PUT operations happening AWS would have to queue up all the Lambda invocations somehow, and there is currently no support for that.
If you built a locking mechanism into your Lambda function, what would you do with the requests you don't process due to a lock? Would you just throw those S3 notifications away?
The solution most people recommend is to have S3 send the notifications to an SQS queue, and then have your Lambda function scheduled to run periodically, like once a minute, and check if there is an item in the queue that needs to be processed.
Alternatively, have S3 send the notifications to SQS and just have a t2.nano EC2 instance with a single-threaded service polling the queue.

I know this is an old thread, but I ran across it trying to figure out how to make sure my time sequenced SQS messages were processed in order coming out of a FIFO queue and not getting processed simultaneously/out-of-order via multiple Lambda threads running.
Per the documentation:
For FIFO queues, Lambda sends messages to your function in the order
that it receives them. When you send a message to a FIFO queue, you
specify a message group ID. Amazon SQS ensures that messages in the
same group are delivered to Lambda in order. Lambda sorts the messages
into groups and sends only one batch at a time for a group. If your
function returns an error, the function attempts all retries on the
affected messages before Lambda receives additional messages from the
same group.
Your function can scale in concurrency to the number of active message
groups.
Link: https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html
So essentially, as long as you use a FIFO queue and submit your messages that need to stay in sequence with the same MessageGroupID, SQS/Lambda automatically handles the sequencing without any additional settings necessary.

Have the S3 "Put events" cause a message to be placed on the queue (instead of involving a lambda function). The message should contain a reference to the S3 object. Then SCHEDULE a lambda to "SHORT POLL the entire queue".
PS: S3 events can not trigger a Kinesis Stream... only SQS, SMS, Lambda (see http://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html#supported-notification-destinations). Kinesis Stream are expensive and used for real-time event handling.

What is the difference between Amazon SNS and Amazon SQS?

When would I use SNS versus SQS, and why are they always coupled together?

SNS is a distributed publish-subscribe system. Messages are pushed to subscribers as and when they are sent by publishers to SNS.
SQS is distributed queuing system. Messages are not pushed to receivers. Receivers have to poll or pull messages from SQS. Messages can't be received by multiple receivers at the same time. Any one receiver can receive a message, process and delete it. Other receivers do not receive the same message later. Polling inherently introduces some latency in message delivery in SQS unlike SNS where messages are immediately pushed to subscribers. SNS supports several end points such as email, SMS, HTTP end point and SQS. If you want unknown number and type of subscribers to receive messages, you need SNS.
You don't have to couple SNS and SQS always. You can have SNS send messages to email, SMS or HTTP end point apart from SQS. There are advantages to coupling SNS with SQS. You may not want an external service to make connections to your hosts (a firewall may block all incoming connections to your host from outside).
Your end point may just die because of heavy volume of messages. Email and SMS maybe not your choice of processing messages quickly. By coupling SNS with SQS, you can receive messages at your pace. It allows clients to be offline, tolerant to network and host failures. You also achieve guaranteed delivery. If you configure SNS to send messages to an HTTP end point or email or SMS, several failures to send message may result in messages being dropped.
SQS is mainly used to decouple applications or integrate applications. Messages can be stored in SQS for a short duration of time (maximum 14 days). SNS distributes several copies of messages to several subscribers. For example, let’s say you want to replicate data generated by an application to several storage systems. You could use SNS and send this data to multiple subscribers, each replicating the messages it receives to different storage systems (S3, hard disk on your host, database, etc.).

Here's a comparison of the two:
Entity Type
SQS: Queue (Similar to JMS)
SNS: Topic (Pub/Sub system)
Message consumption
SQS: Pull Mechanism - Consumers poll and pull messages from SQS
SNS: Push Mechanism - SNS Pushes messages to consumers
Use Case
SQS: Decoupling two applications and allowing parallel asynchronous processing
SNS: Fanout - Processing the same message in multiple ways
Persistence
SQS: Messages are persisted for some (configurable) duration if no consumer is available (maximum two weeks), so the consumer does not have to be up when messages are added to queue.
SNS: No persistence. Whichever consumer is present at the time of message arrival gets the message and the message is deleted. If no consumers are available then the message is lost after a few retries.
Consumer Type
SQS: All the consumers are typically identical and hence process the messages in the exact same way (each message is processed once by one consumer, though in rare cases messages may be resent)
SNS: The consumers might process the messages in different ways
Sample applications
SQS: Jobs framework: The Jobs are submitted to SQS and the consumers at the other end can process the jobs asynchronously. If the job frequency increases, the number of consumers can simply be increased to achieve better throughput.
SNS: Image processing. If someone uploads an image to S3 then watermark that image, create a thumbnail and also send a Thank You email. In that case S3 can publish notifications to an SNS topic with three consumers listening to it. The first one watermarks the image, the second one creates a thumbnail and the third one sends a Thank You email. All of them receive the same message (image URL) and do their processing in parallel.

You can see SNS as a traditional topic which you can have multiple Subscribers. You can have heterogeneous subscribers for one given SNS topic, including Lambda and SQS, for example. You can also send SMS messages or even e-mails out of the box using SNS. One thing to consider in SNS is only one message (notification) is received at once, so you cannot take advantage from batching.
SQS, on the other hand, is nothing but a queue, where you store messages and subscribe one consumer (yes, you can have N consumers to one SQS queue, but it would get messy very quickly and way harder to manage considering all consumers would need to read the message at least once, so one is better off with SNS combined with SQS for this use case, where SNS would push notifications to N SQS queues and every queue would have one subscriber, only) to process these messages. As of Jun 28, 2018, AWS Supports Lambda Triggers for SQS, meaning you don't have to poll for messages any more.
Furthermore, you can configure a DLQ on your source SQS queue to send messages to in case of failure. In case of success, messages are automatically deleted (this is another great improvement), so you don't have to worry about the already processed messages being read again in case you forgot to delete them manually. I suggest taking a look at Lambda Retry Behaviour to better understand how it works.
One great benefit of using SQS is that it enables batch processing. Each batch can contain up to 10 messages, so if 100 messages arrive at once in your SQS queue, then 10 Lambda functions will spin up (considering the default auto-scaling behaviour for Lambda) and they'll process these 100 messages (keep in mind this is the happy path as in practice, a few more Lambda functions could spin up reading less than the 10 messages in the batch, but you get the idea). If you posted these same 100 messages to SNS, however, 100 Lambda functions would spin up, unnecessarily increasing costs and using up your Lambda concurrency.
However, if you are still running traditional servers (like EC2 instances), you will still need to poll for messages and manage them manually.
You also have FIFO SQS queues, which guarantee the delivery order of the messages. SQS FIFO is also supported as an event source for Lambda as of November 2019
Even though there's some overlap in their use cases, both SQS and SNS have their own spotlight.
Use SNS if:
multiple subscribers is a requirement
sending SMS/E-mail out of the box is handy
Use SQS if:
only one subscriber is needed
batching is important

AWS SNS is a publisher subscriber network, where subscribers can subscribe to topics and will receive messages whenever a publisher publishes to that topic.
AWS SQS is a queue service, which stores messages in a queue. SQS cannot deliver any messages, where an external service (lambda, EC2, etc.) is needed to poll SQS and grab messages from SQS.
SNS and SQS can be used together for multiple reasons.
There may be different kinds of subscribers where some need the
immediate delivery of messages, where some would require the message
to persist, for later usage via polling. See this link.
The "Fanout Pattern." This is for the asynchronous processing of
messages. When a message is published to SNS, it can distribute it
to multiple SQS queues in parallel. This can be great when loading
thumbnails in an application in parallel, when images are being
published. See this link.
Persistent storage. When a service that is going to process a message is not reliable. In a case like this, if SNS pushes a
notification to a Service, and that service is unavailable, then the
notification will be lost. Therefore we can use SQS as a persistent
storage and then process it afterwards.

From the AWS documentation:
Amazon SNS allows applications to send time-critical messages to
multiple subscribers through a “push” mechanism, eliminating the need
to periodically check or “poll” for updates.
Amazon SQS is a message queue service used by distributed applications
to exchange messages through a polling model, and can be used to
decouple sending and receiving components—without requiring each
component to be concurrently available.
Fanout to Amazon SQS queues

Following are the major differences between the main messaging technologies on AWS (SQS, SNS, +EventBridge). In order to choose a particular AWS service, we should know the functionalities a service provides as well as its comparison with other services.
The below diagram summarizes the main similarities as well as differences between this service.

In simple terms,
SNS - sends messages to the subscriber using push mechanism and no need of pull.
SQS - it is a message queue service used by distributed applications to exchange messages through a polling model, and can be used to decouple sending and receiving components.
A common pattern is to use SNS to publish messages to Amazon SQS queues to reliably send messages to one or many system components asynchronously.
Reference from Amazon SNS FAQs.

One reason for coupling SQS and SNS would be for data processing pipelines.
Let's say you are generating three kinds of product, and that products B & C are both derived from the same intermediate product A. For each kind of product (i.e., for each segment of the pipeline) you set up:
a compute resource (maybe a lambda function, or a cluster of virtual machines, or an autoscaling kubernetes job) to generate the product.
a queue (describing units of work that need to be performed) to partition the work across the compute resource (so that each unit of work is processed exactly once, but separate units of work can be processed separately in parallel and asynchronously with each other).
a news feed (announcing outputs that have been produced).
Then arrange so that the input queues for B & C are both subscribing to the output announcements of A.
This makes the pipeline modular on the level of infrastructure. Rather than having a monolithic server application that generates all three products together, different stages of the pipeline can utilise different hardware resources (for example, perhaps stage B is very memory intensive, but the two other stages can be performed with cheaper hardware/services). This also makes it easier to iterate on the development of one pipeline segment without disrupting delivery of the other products.

There are some key distinctions between SNS and SQS:
SNS supports A2A and A2P communication, while SQS supports only A2A
communication.
SNS is a pub/sub system, while SQS is a queuing system. You'd
typically use SNS to send the same message to multiple consumers via
topics. In comparison, in most scenarios, each message in an SQS
queue is processed by only one consumer. With SQS, messages are
delivered through a long polling (pull) mechanism, while SNS uses a
push mechanism to immediately deliver messages to subscribed
endpoints.
SNS is typically used for applications that need real time
notifications, while SQS is more suited for message processing use
cases.
SNS does not persist messages - it delivers them to subscribers that
are present, and then deletes them. In comparison, SQS can persist
messages (from 1 minute to 14 days).
Individually, Amazon SQS and SNS are used for different use cases. You can, however, use them together in some scenarios.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js