Can you add SQS message attributes to SQS messages generated by S3 events?

I'd like to use AWS S3 to allow some users to add files to an S3 bucket.
Then, I'd like to generate an SQS message when a new file has been added.
Last, I'd like to consume the SQS message and process it with a background job worker of a particular class.
I'd like to use SQS message attributes to determine which background job worker class I should use for processing. As the SQS message attribute documentation states:
Message attributes [...] can be used by the consumer of the message to help decide how to handle the message without having to first process the message body.
(via the SQS Developer Guide)
However, under the S3 bucket's Properties, under Advanced Settings, the Events settings do not appear to expose a way to set message attributes.
Is there a way to specify message attributes on an event-by-event basis for events generated from S3?

There is not a way to inject custom message attributes into S3 event notifications... but also note that you may be misinterpreting what message attributes can be used for.
Message attributes [...] can be used by the consumer of the message
This means they provide a mechanism for the consumer to triage a message after the consumer has already received it from the queue.
You can't selectively consume messages based on message attributes. Queue consumers receive the next available message(s) when they poll the queue. They don't get to select which messages they consume.
If you want to divide the messages up by class, you'll need an intermediate process that selectively distributes messages to the appropriate (separate) downstream queues. Better, if your structure allows it, would be separate S3 event notification configurations, each matching a specific key pattern and delivering to its own queue.
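As a rough illustration of that second approach, here's what separate notification configurations might look like with boto3 (the bucket name, queue ARNs, and key prefixes are hypothetical); S3 routes each event to the queue whose filter matches the object key:

import boto3

s3 = boto3.client("s3")

# Hypothetical bucket, ARNs, and prefixes -- substitute your own resources.
s3.put_bucket_notification_configuration(
    Bucket="my-upload-bucket",
    NotificationConfiguration={
        "QueueConfigurations": [
            {
                "QueueArn": "arn:aws:sqs:us-east-1:123456789012:image-jobs",
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {"Key": {"FilterRules": [
                    {"Name": "prefix", "Value": "images/"},
                ]}},
            },
            {
                "QueueArn": "arn:aws:sqs:us-east-1:123456789012:document-jobs",
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {"Key": {"FilterRules": [
                    {"Name": "prefix", "Value": "documents/"},
                ]}},
            },
        ]
    },
)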

You could use a Lambda function to add messages to the queue, for example with Python and the boto3 API. Map this Lambda to the S3 event. http://docs.aws.amazon.com/lambda/latest/dg/with-s3.html
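Here's a minimal sketch of that idea, assuming a hypothetical QUEUE_URL environment variable and a made-up routing rule based on the object key; the Lambda is subscribed to the bucket's ObjectCreated events and republishes each record to SQS with a custom message attribute:

import json
import os

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = os.environ["QUEUE_URL"]  # hypothetical: set in the Lambda configuration

def handler(event, context):
    for record in event["Records"]:
        key = record["s3"]["object"]["key"]
        # Hypothetical routing rule: pick the worker class from the key prefix.
        job_class = "ImageWorker" if key.startswith("images/") else "DefaultWorker"
        sqs.send_message(
            QueueUrl=QUEUE_URL,
            MessageBody=json.dumps(record),
            MessageAttributes={
                "job_class": {"DataType": "String", "StringValue": job_class},
            },
        )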

Related

AWS SQS Selective Polling Pattern

I have a system where I publish updates to a shared topic meant for specific consumers.
I noticed messages getting stuck in the queue: SQS consumers can't listen selectively, so messages are being hijacked by the wrong consumers.
Example:
Given: Message{destination: A, payload: 1234}
Given: ConsumerA and ConsumerB
I expect the Message to be processed by ConsumerA. However, it gets hijacked by ConsumerB continuously: ConsumerB receives the message, then refuses to process it since the destination field doesn't match, which lets the visibility timeout expire and puts the message back on the queue. But due to the nature of SQS, ConsumerB has an equal chance of picking the message up again.
My question is, what patterns are used to solve this type of issue?
I'm considering creating a queue per consumer, but that has drawbacks specific to the system I'm working on.
If I could only listen for messages with matching attributes, problem solved, but that's seemingly not the case.
Is there any other way?
Sharing a single Amazon SQS queue is not an appropriate architecture for your use-case.
If you want your consumers to be able to 'request' a message from a particular subset, you should either use separate SQS queues or use a database. You could even store objects in Amazon S3 as a form of noSQL database.
Having consumers grab messages and then 'send them back' to the queue is not compatible with the design of the Amazon SQS service.

SQS Lambda Trigger with Visibility Timeout extension

I'm working on a solution where I have an SQS queue with a Lambda trigger. My understanding is that Lambda receives messages in batches to be processed, and once the Lambda function succeeds, the messages in the SQS queue are automatically deleted. However, how do I allow only some of those messages to be deleted?
Let's assume this use case:
The Lambda function receives a batch of 10 messages; only 7 are valid and can be processed, and the other 3 need to be reprocessed at a later point.
My initial thought was that I could update the visibility timeout via boto3's change_message_visibility for each of the 3 messages to have them reprocessed after the timeout. However, since the overall Lambda execution is successful, all 10 messages are deleted from the SQS queue.
Any suggestions?
Yes, by default, the Lambda function deletes all the messages upon success. You would need to handle this in your code, but not by changing the visibility timeout of the messages.
Add a DLQ (dead-letter queue) that will handle the messages that actually fail (messages go to the DLQ after a certain number of failed processing attempts, depending on how you set it up).
You have a few options here:
You can handle each item yourself and delete the messages that are processed successfully; in the case of a message that isn't successful, you can throw an error and it won't be deleted automatically by the Lambda function (see the sketch after this list)
If you use JavaScript, you can try Middy
If you use Python, you can use Lambda Powertools Python
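Here's a minimal sketch of the first option in Python, assuming the queue URL is available in a hypothetical QUEUE_URL environment variable; the receipt handle on each record is what delete_message needs:

import os

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = os.environ["QUEUE_URL"]  # hypothetical: URL of the triggering queue

def process(body):
    ...  # placeholder for your business logic

def handler(event, context):
    failures = []
    for record in event["Records"]:
        try:
            process(record["body"])
            # Delete successful messages ourselves so a later error
            # doesn't put them back on the queue.
            sqs.delete_message(
                QueueUrl=QUEUE_URL,
                ReceiptHandle=record["receiptHandle"],
            )
        except Exception:
            failures.append(record["messageId"])
    if failures:
        # Failing the invocation returns the remaining (undeleted)
        # messages to the queue for another attempt.
        raise RuntimeError(f"failed to process messages: {failures}")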
For AWS Lambdas with an SQS trigger, by default, when your function encounters an error processing one or more messages in a given batch, the entire batch is marked as a failure. All of the messages in the batch are made visible again in the queue. Depending on your redrive policy, you can end up repeatedly processing successful messages along with the failures.
Rather than change the visibility timeout, the simplest way to specify which messages should be retried later and which can safely be deleted from the queue is to change the function response type to ReportBatchItemFailures. This allows you to return a list of failed message ids, indicating that only those messages in the batch should be made visible again in the queue.
Here's what the reporting syntax looks like for a handler function in Node.js:
exports.handler = async (event) => {
  // Process the event
  const batchItemFailureResponse = {
    batchItemFailures: [
      {
        itemIdentifier: "idFailedMessage1"
      },
      {
        itemIdentifier: "idFailedMessage2"
      },
      {
        itemIdentifier: "idFailedMessage3"
      }
    ]
  };
  return batchItemFailureResponse;
};
There is more information to be found in the official documentation.
This response type is configured when setting a queue as an event source for the Lambda. If you're configuring from the console, navigate to the Lambda function page, select the Configuration tab, and then choose Triggers. Then choose Add Trigger and choose the SQS trigger type. In addition to providing the standard parameters, be sure to check the box under Report batch item failures after expanding Additional Settings. It should look something like this:
[Screenshot: Add trigger with batch failure reporting]
This parameter must be set when first creating the trigger.
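The same can be done programmatically; here's a sketch with boto3, where the queue ARN and function name are hypothetical:

import boto3

lambda_client = boto3.client("lambda")

# Hypothetical queue ARN and function name.
lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:sqs:us-east-1:123456789012:my-queue",
    FunctionName="my-function",
    BatchSize=10,
    FunctionResponseTypes=["ReportBatchItemFailures"],
)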
This response type can also be defined if you use CloudFormation templates to provision your resources; see the AWS documentation for more information. Note that if you use AWS SAM event source mappings, the documentation suggests that adding FunctionResponseTypes to the YAML with ReportBatchItemFailures in the type list isn't supported. That is incorrect; the documentation is simply outdated. There is an open issue around addressing this oversight.
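For reference, a SAM event source mapping with batch item failure reporting enabled might look like this (function, handler, and queue names are hypothetical):

MyFunction:
  Type: AWS::Serverless::Function
  Properties:
    Handler: index.handler
    Runtime: nodejs18.x
    Events:
      MyQueueEvent:
        Type: SQS
        Properties:
          Queue: !GetAtt MyQueue.Arn
          BatchSize: 10
          FunctionResponseTypes:
            - ReportBatchItemFailures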
Finally, in addition to reporting batch item failures, you should provision a target DLQ (dead-letter queue) and determine a reasonable maximum receive count so that action can be taken on messages that fail repeatedly.

AWS SQS Dead Letter Queue notifications

I'm trying to design a small message processing system based on SQS, Lambda, and SNS. In case of failure, I'd like for the message to be enqueued in a Dead Letter Queue (DLQ) and for a webhook to be called.
I'd like to know what the most canonical or reasonable way of achieving that would look like.
Currently, if everything goes well, the process should be as follows:
SQS (in place to handle retries) enqueues a message
Lambda gets invoked by SQS and processes the message
Lambda sends a webhook and finishes normally
If something in the lambda goes wrong (success webhook cannot be called, task at hand cannot be processed), the easiest way to achieve what I want seems to be to set up a DLQ1 that SQS would put the failed messages in. An auxiliary lambda would then be called to process this message, pass it to SNS, which would call the failure webhook, and also forward the message to DLQ2, the final/true DLQ.
Is that the best approach?
One alternative I know of is Alarms, though I've been warned that they are quite tricky. Another one would be to have lambda call the error reporting webhook if there's a failure on the last retry, although that somehow seems inappropriate.
Thanks!
Your architecture looks good enough for the success case, but I personally find it quite confusing for the failure case, as I don't see why you need two DLQs to begin with.
Here's what I would do in case of failure:
1. Define a DLQ on your source SQS queue and set maxReceiveCount to e.g. 3, meaning messages that fail three times will be redirected to the configured DLQ (see the configuration sketch after this list).
2. Create a Lambda that listens to this DLQ.
3. Execute the webhook inside this Lambda.
4. Since step 3 automatically deletes the message from the queue once it has been processed, and apparently you want the messages to be persisted somewhere, store the content of the message in a file on S3 and store the file metadata (bucket and key) in a DynamoDB table, so you can always query for failed messages.
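A minimal sketch of step 1 with boto3, where the queue URL and DLQ ARN are hypothetical:

import json

import boto3

sqs = boto3.client("sqs")

# Hypothetical resources -- substitute your own queue URL and DLQ ARN.
sqs.set_queue_attributes(
    QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/source-queue",
    Attributes={
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": "arn:aws:sqs:us-east-1:123456789012:source-queue-dlq",
            "maxReceiveCount": "3",
        })
    },
)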
I don't see any role for SNS here unless you want multiple subscribers for a given message, but as I understand it, that is not the case.
This way, you need to maintain only one DLQ, and you can get rid of SNS, which only adds an extra layer of complexity to your architecture.
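And a rough sketch of steps 2-4, assuming hypothetical WEBHOOK_URL, BUCKET, and TABLE environment variables; the function is triggered by the DLQ, persists each message to S3 with its metadata in DynamoDB, and calls the failure webhook:

import os
import urllib.request

import boto3

s3 = boto3.client("s3")
table = boto3.resource("dynamodb").Table(os.environ["TABLE"])  # hypothetical
BUCKET = os.environ["BUCKET"]            # hypothetical
WEBHOOK_URL = os.environ["WEBHOOK_URL"]  # hypothetical

def handler(event, context):
    for record in event["Records"]:
        message_id = record["messageId"]
        key = f"failed-messages/{message_id}.json"
        # Persist the raw message body so it survives deletion from the DLQ.
        s3.put_object(Bucket=BUCKET, Key=key, Body=record["body"].encode())
        # Store metadata so failed messages can be queried later.
        table.put_item(Item={"messageId": message_id, "bucket": BUCKET, "key": key})
        # Notify via the failure webhook.
        req = urllib.request.Request(
            WEBHOOK_URL,
            data=record["body"].encode(),
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(req)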

Use AWS SQS for different message types

I am using AWS SQS and Spring JMS in my project. I have a method annotated with @JmsListener(destination = "queue_name"). I want to use this queue for two different types of messages.
Since this listener is configured on this queue, it receives both types of messages. What I am trying to achieve is to ignore messages of one type (the sender adds a MessageAttribute when sending a message to the queue). So, is there a way to ignore messages coming from sender 2 so this method won't process them?
Also, I have a DLQ set on this queue with max receives of 5, so if a message is not processed within the first 5 attempts, it gets moved to the DLQ.
Please do share your suggestion.
Thanks.
The correct solution is to use two different queues; SQS can't filter the messages it delivers by any property, so as you are seeing, when the client reads a message and doesn't process it, the message ends up in your DLQ that much quicker.
Queues are free, so having multiple won't cost any more.

Using Amazon SQS with multiple consumers

I have a service-based application that uses Amazon SQS with multiple queues and multiple consumers. I am doing this so that I can implement an event-based architecture and decouple all the services, where the different services react to changes in state of other systems. For example:
Registration Service:
Emits event 'registration-new' when a new user registers.
User Service:
Emits event 'user-updated' when user is updated.
Search Service:
Reads from queue 'registration-new' and indexes user in search.
Reads from queue 'user-updated' and updates user in search.
Metrics Service:
Reads from 'registration-new' queue and sends to Mixpanel.
Reads from queue 'user-updated' and sends to Mixpanel.
I'm having a number of issues:
A message can be received multiple times when doing polling. I can design a lot of the systems to be idempotent, but for some services (such as the metrics service) that would be much more difficult.
A message needs to be manually deleted from the queue in SQS. I have thought of implementing a "message-handling-service" that handles the deletion of messages when all the services have received them (each service would emit a 'message-acknowledged' event after handling a message).
I guess my question is this: what patterns should I use to ensure that I can have multiple consumers of a single SQS queue while also ensuring that messages get delivered and deleted reliably? Thank you for your help.
I think you are doing it wrong.
It looks to me like you are using the same queue to do multiple different things. You are better off using a single queue for a single purpose.
Instead of putting an event into the 'registration-new' queue and then having two different services poll that queue, with BOTH needing to read that message, each doing something different with it (and then needing a third process that is supposed to delete the message after the other two have processed it), one queue should be used for one purpose.
Create an 'index-user-search' queue and a 'send-to-mixpanel' queue, so the search service reads from the search queue, indexes the user, and immediately deletes the message. The mixpanel service reads from the mixpanel queue, processes the message, and deletes it.
The registration service, instead of emitting a 'registration-new' event to a single queue, now emits it to two queues.
To take it one step further, add SNS into the mix: have the registration service publish an SNS message to a 'registration-new' topic (not queue), and then subscribe both of the queues I mentioned above to that topic in a 'fan-out' pattern.
https://aws.amazon.com/blogs/aws/queues-and-notifications-now-best-friends/
Both queues will receive the message, but you only publish it to SNS once. If, down the road, a third unrelated service also needs to process 'registration-new' events, you create another queue and subscribe it to the topic as well; it can run with no dependencies on, or knowledge of, what the other services are doing. That is the goal.
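A minimal sketch of that fan-out wiring with boto3 (topic and queue names are hypothetical, and the SQS queue policies that allow SNS to deliver messages are omitted for brevity):

import boto3

sns = boto3.client("sns")
sqs = boto3.client("sqs")

# Hypothetical names; queue access policies permitting SNS delivery are omitted.
topic_arn = sns.create_topic(Name="registration-new")["TopicArn"]

for queue_name in ("index-user-search", "send-to-mixpanel"):
    queue_url = sqs.create_queue(QueueName=queue_name)["QueueUrl"]
    queue_arn = sqs.get_queue_attributes(
        QueueUrl=queue_url, AttributeNames=["QueueArn"]
    )["Attributes"]["QueueArn"]
    sns.subscribe(TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn)

# The registration service then publishes once, and both queues get a copy:
sns.publish(TopicArn=topic_arn, Message='{"event": "registration-new", "userId": 123}')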
The primary use-case for multiple consumers of a queue is scaling-out.
The mechanism that allows for multiple consumers is the Visibility Timeout, which gives a consumer time to process and delete a message without it being consumed concurrently by another consumer.
To address the "At-Least-Once Delivery" property of Standard Queues, the consuming service should be idempotent.
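As a sketch of what idempotency can look like in practice, here's a hypothetical DynamoDB-based deduplication guard: the conditional put fails if the message ID has already been recorded, so duplicate deliveries are skipped.

import boto3
from botocore.exceptions import ClientError

# Hypothetical table with messageId as its partition key.
table = boto3.resource("dynamodb").Table("processed-messages")

def process(body):
    ...  # placeholder for your business logic

def handle_once(record):
    try:
        # Succeeds only the first time this messageId is seen.
        table.put_item(
            Item={"messageId": record["messageId"]},
            ConditionExpression="attribute_not_exists(messageId)",
        )
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return  # duplicate delivery; already handled
        raise
    process(record["body"])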
If that isn't possible, one option is to use FIFO queues, but this mode has a limited message delivery rate and is not compatible with SNS subscription.
They even have a tutorial on how to create a fanout scenario using the combo SNS+SQS.
https://aws.amazon.com/getting-started/tutorials/send-fanout-event-notifications/
Too bad it does not support FIFO queues, so you have to be careful to handle out-of-order messages.
It would be nice if they had a consistent hashing solution to have multiple competing consumers while respecting the message order.