SQS - Delivery Delay of 30 minutes - amazon-web-services

From the documentation of SQS, Max time delay we can configure for a message to hide from its consumers is 15 minutes - http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-delay-queues.html
Suppose if I need to hide the messages for a day, what is the pattern?
For eg. I want to mimic a daily cron for doing some action.
Thanks

The simplest way to do this is as follows:
SQS.push_to_queue({perform_message_at : "Thursday November 2022"},delay: 15 mins)
Inside your worker
message = SQS.poll_messages
if message.perform_message_at > Time.now
SQS.push_to_queue({perform_message_at : "Thursday November
2022"},delay:15 mins)
else
process_message(message)
end
Basically push the message back to the queue with the maximum delay and only process it when its processing time is less than the current time.
HTH.

Visibility timeout can do up to 12 hours. I think you can hack something together where you process a message but don't delete it and next time it is processed its been 12 hours. So a queue with one message and visibility timeout of 12 hours. That gets you a 12 hour cron.

Cloudwatch is likely a better way to do it. You can use a createEvent API with the timer, and have it trigger either a lambda function or an API call to whatever comes next.
Another way to do is to use the "wait" utility in an AWS step function.
https://docs.aws.amazon.com/step-functions/latest/dg/amazon-states-language-wait-state.html
In any case, unless you are extremely sure you will never need anything more than 15 minutes, the SQS backdoor to add the delay seems hacky.

You can do this by adding a DLQ with MaxReceives set to 1 on the first queue.
Add a simple Lambda on the first queue and fail the message vi Lambda. So message will be moved to DLQ automatically and then you can consume from DLQ.
Both primary queue and DLQ can have max 15 min delay, so finally you get 30 min delay.
So your consumer app receives the message after 30 minutes, without adding any custom logic on it.

Two thoughts.
Untested. Perhaps publish to and SNS topic that has no SQS queues. When delivery needs to happen, subscribe the queue to the topic. (I've not done this, I'm not sure if this would work as expected)
Push messages as files to a central store (like S3). Create a worker that looks at the time created timestamp and decides whether to publish them to a queue or not. If created >= 1d ago, publish.

This was a challenge for us as well and I never found a perfect solution so I ended up building a service to address it. Obviously self promotion here but the system allows you to work around the DelaySeconds limitation and set arbitrary date/times at scale.
https://anticipated.io
Some of the challenges working with Step Functions are scale of registered machines (if your system had that requirement). If you use EventBridge to fire them you run out of allowable rulesets (limit is 200 as of this posting). Example: if you need to set 150,000 arbitrary events a month you run into limits quickly.

Related

AWS Scheduling Mechanism?

I want to delay an email being sent for X days. This could be days, weeks, or months.
Is there anything out there other than Step Functions?
Some concerns I had with Step Functions, which could be debunked:
The longest wait cannot exceed a year
If the stack is deleted during long running waits. Does that in turn delete the wait task?
You can define an event rule in either Cloudwatch or EventBridge that triggers on a schedule (fixed or based on cron expression). The event target can be either SNS Topic or a Lambda function to send the email.
Please refer here for more details
Check this solution out. It should be what you need. It is a combination of Step Function and the DB solutions mentioned above which addresses the issues being raised (max wait time and inefficient scanning).
https://github.com/aws-samples/step-functions-workflows-collection/tree/main/smart-cron-job/

SQS batching for Lambda trigger doesn't work as expected

I have 2 Lambda Functions and an SQS queue inbetween.
The first Lambda sends the messages to the Queue.
Then second Lambda has a trigger for this Queue with a batch size of 250 and a batch window of 65 seconds.
I expect the second Lambda to be triggered in batches of 250 messages after about every 65 seconds. In the second Lambda I'm calling a 3rd party API that is limited to 250 API calls per minute (I get 250 tokens per minute).
I tested this setup with for 32.000 messages being added to the queue and the second Lambda didn't pick up the messages in batches as expected. At first it got executed for 15k messages and then there were not enough tokens so it did not process those messages.
The 3rd party API is based on a token bucket with a fill rate of 250 per minute and a maximum capacity of 15.000. It managed to process the first 15.000 messages due to the bucket capacity and then didn't have enough capacity to handle the rest.
I don't understand what went wrong.
The misunderstanding is probably related to how Lambda handles scaling.
Whenever there are more events than a single Lambda execution context/instance can handle, Lambda just creates more execution contexts/instances to process these events.
What probably happened is that Lambda saw there are a bunch of messages in the queue and it tries to work on these as fast as possible. It created a Lambda instance to handle the first event and then talked to SQS and asked for more work. When it got the next batch of messages, the first instance was still busy, so it scaled out and created a second one that worked on the second batch in parallel, etc. etc.
That's how you ended up going through your token budget in a few minutes.
You can limit how many functions Lambda is allowed to execute in parallel by using reserved concurrency - here are the docs for reference. If you set the reserved concurrency to 1, there will be no parallelization and only one Lambda is allowed to work on the messages.
This however opens you up to another issue. If that single Lambda takes less than 60 seconds to process the messages, Lambda will call it again with another batch ASAP and you might go over your budget again.
At this point a relatively simple approach would be to make sure that your lambda function always takes about 60 seconds by adding a sleep for the remaining time at the end.

Processing AWS SQS messages with separate Lambda at a time

Like the title suggests, I have a scenario that I would like to explore but do not know how to go about it.
I have a lambda function processCSVFile. I also have a SQS queue that at a set time everyday, it gets populated with link of csv files from S3, let's say about 2000 messages. Now I want to process 25 messages at a time once the SQS queue has the messages.
The scenario I am looking for is to process 25 messages concurrently, I want the 25 messages to be processed by 25 lambda invocations separately. I thought I could use SendMessageBatch function in SQS but this only delivers messages to the queue, it does not seem to apply to my use case.
My question is, am I able to perform the action explained above and if it is possible, what documentation or use cases can explain what I am looking for.
Also, if this use case is impossible, what do you recommend as an alternative way to do the processing I want done concurrently.
To process 25 messages from Amazon SQS with 25 concurrent Lambda functions (1 message per running Lambda function), you would need:
A maximum concurrency of 25 configured for the Lambda function (otherwise it might go higher than this when more messages are available)
A batch size of 1 configured on the Lambda trigger so that SQS only passes it one message at a time
See:
AWS Lambda Function Scaling (Maximum concurrency)
Configuring a Queue as an Event Source (Batch size)
I think that combination of lambda's event source mapping for sqs
and setting reserved concurrency to 25 could be the way do go.
The lambda uses long pooling to prepare message batches for concurrent processing by lambda. Thus each invocation of your function could get more than 1 message at a time.
I don't think there is a way to set event source mapping to serve just one message per batch. If you absolute must ensure only one message is processed by lambda, then you process one and disregards others (put them back to queue).
The reserved concurrency of 25 guarantees that you wont be running more than 25 functions in parallel. If you leave it at its default value, you can run up to whatever free concurrency you have in your account.
Edit:
#JohnRotenstein already confirmed that there is a way to set lambda to pass message a time to your function.
Hope this helps.

Optimizing SQS Lambda configuration for single concurrency microservice composition

Apologies for the title. It's hard to summarise what I'm trying to accomplish.
Basically let me define a service to be an SQS Queue + A Lambda function.
A service (represented by square brackets below) performs a given task, where the queue is the input interface, processes the input, and outputs on to the queue of the subsequent service.
Service 1 Service 2 Service 3
[(APIG) -> (Lambda)] -> [(SQS) -> (Lambda)] -> [(SQS) -> (Lambda)] -> ...
Service 1: Consumes the request and payload, splits it into messages and passes on to the queue of the next service.
Service 2: This service does not have a reserved concurrency. It validates each message on the queue, and if valid, passes on to the next service.
Service 3: Processes each message in the queue (ideally in batches of approximately 100). The lambda here must have a reserved concurrency of 1 (as it hits an API that can't process multiple requests concurrently).
Currently I have the following configuration on Service 3.
Default visibility timeout of queue = 5 minutes
Lambda timeout = 5 minutes
Lambda reserved concurrency = 1
Problem 1: Service 3 consumes x items off the queue and if it finishes processing them within 30 seconds I expect the queue to process the next x items off the queue immediately (ideally x=100). Instead, it seems to always wait 5 minutes before taking the next batch of messages off the queue, even if the lambda completes in 30 seconds.
Problem 2: Service 3 typically consumes a few messages at a time (inconsistent) rather than batches of 100.
A couple of more notes:
In service 3 I do not explicitly delete messages off the queue using the lambda. AWS seems to do this itself when the lambda successfully finishes processing the messages
In service 2 I have one item per message. And so when I send messages to Service 3 I can only send 10 items at a time, which is kind of annoying. Because queue.send_messages(Entries=x), len(x) cannot exceed 10.
Does anyone know how I solve Problem 1 and 2? Is it an issue with my configuration? If you require any further information please ask in comments.
Thanks
Both your problems and notes indicate misconfigured SQS and/or Lambda function.
In service 3 I do not explicitly delete messages off the queue using
the lambda. AWS seems to do this itself when the lambda successfully
finishes processing the messages.
This is definitely not the case here as it would go agains the reliability of SQS. How would SQS know that the message was successfully processed by your Lambda function? SQS doesn't care about consumers and doesn't really communicate with them and that is exactly the reason why there is a thing such as visibility timeout. SQS deletes message in two cases, either it receives DeleteMessage API call specifying which message to be deleted via ReceiptHandle or you have set up redrive policy with maximum receive count set to 1. In such case, SQS will automatically send message to dead letter queue when if it receives it more than 1 time which means that every message that was returned to the queue will be send there instead of staying in the queue. Last thing that can cause this is a low value of Message Retention Period (min 60 seconds) which will drop the message after x seconds.
Problem 1: Service 3 consumes x items off the queue and if it finishes
processing them within 30 seconds I expect the queue to process the
next x items off the queue immediately (ideally x=100). Instead, it
seems to always wait 5 minutes before taking the next batch of
messages off the queue, even if the lambda completes in 30 seconds.
This simply doesn't happen if everything is working as it should. If the lambda function finishes in 30 seconds, if there is reserved concurrency for the function and if there are messages in the queue then it will start processing the message right away.
The only thing that could cause is that your lambda (together with concurrency limit) is timing out which would explain those 5 minutes. Make sure that it really finishes in 30 seconds, you can monitor this via CloudWatch. The fact that the message has been successfully processed doesn't necessarily mean that the function has returned. Also make sure that there are messages to be processed when the function ends.
Problem 2: Service 3 typically consumes a few messages at a time
(inconsistent) rather than batches of 100.
It can never consume 100 messages since the limit is 10 (messages in the sense of SQS message not the actual data that is stored within the message which can be anywhere up to 256 KB, possibly "more" using extended SQS library or similar custom solution). Moreover, there is no guarantee that the Lambda will receive 10 messages in each batch. It depends on the Receive Message Wait Time setting. If you are using short polling (1 second) then only subset of servers which are storing the messages will be polled and a single message is stored only on a subset of those servers. If those two subsets do not match when the message is polled, the message is not received in that batch. You can control this by increasing polling interval, Receive Message Wait Time, (max 20 seconds) but even if there are not enough messages in the queue when the timer finishes, the batch will still be received with fewer messages, possibly zero.
And as it was mentioned in the comments, using this strategy with concurrency set to low number can lead to some problems. Another thing is that you need to ensure that rate at which messages are produced is somehow consistent with the time it takes for one instance of lambda function to process the message otherwise you will end up with constantly growing queue, possibly losing messages after they outlive the Message Retention Period.

When to use delay queue feature of Amazon SQS?

I understand the concept of delay queue of Amazon SQS, but I wonder why it is useful.
What's the usage of SQS delay queue?
Thanks
One use case which i can think of is usage in distributed applications which have eventual consistency semantics. The system consuming the message may have an dependency like a co-relation identifier to be available and hence may need to wait for certain guaranteed duration of time before seeing the co-relation data. In this case, it makes sense for the message to be delayed for certain duration of time.
Like you I was confused as to a use-case for delay queues, until I stumbled across one in my own work. My application needs to have an internal queue with each item waiting at least one minute between each check for completion.
So instead of having to manage a "last-checked-time" on every object, I just shove the object's ID into an SQS queue messagewith a delay time of 60 seconds, and my main loop then becomes a simple long-poll against the queue.
A few off the top of my head:
Emails - Let's say you have a service that sends reminder emails triggered from queue messages. You'd have to delay enqueueing the message in that case.
Race conditions - Delivery delays can be used to overcome race conditions in distributed systems. For example, a service could insert a row into a table, and sends a message about its availability to other services. They can't use the new entry just yet, so you have to delay publishing the SQS message.
Handling retries - Sometimes if a message fails you want to retry with exponential backoffs. This requires re-enqueuing the message with longer delays.
I've built a suite of API's to make queue message scheduling easy. You can call our API's to schedule queue messages, cancel, edit, and check on the status of such messages. Think of it like a scheduler microservice.
www.schedulerapi.com
If you are looking for a solution, let me know. I've built these schedulers before at work for delivering emails at high scale, so I have experience with similar use cases.
One use-case can be:
Think of a time critical expression like a scheduled equity trade order.
If one of your system is fetching all the order scheduled in next 60 minutes and putting them in queue (which will be fetched by another sub system).
If you send these order directly, then they will be visible immediately to process in queue and will be processed depending upon their order.
But most likely, they will not execute in exact time (Hour:Minute:Seconds) in which Customer wanted and this will impact the outcome.
So to solve this, what first sub system will do, it will add delay seconds (difference between current and execution time) so message will only be visible after that much delay or at exact time when user wanted.