Will a standard SQS that I configured to invoke a lambda when it receives message invoke "many lambdas" or only 1 lambda at a time?
From Using AWS Lambda with Amazon SQS:
Lambda polls the queue and invokes your function synchronously with an event that contains queue messages.
It will invoke as many as required, depending on your reserved concurrency limits:
Lambda increases the number of processes that are reading batches by up to 60 more instances per minute. The maximum number of batches that can be processed simultaneously by an event source mapping is 1000.
Related
My requirement is like this.
Read from a SQS every 2 hours, take all the messages available and then process it.
Processing includes creating a file with details from SQS messages and sending it to an sftp server.
I implemented a AWS Lambda to achieve point 1. I have a Lambda which has an sqs trigger. I have set batch size as 50 and then batch window as 2 hours. My assumption was that Lambda will get triggered every 2 hours and 50 messages will be delivered to the lambda function in one go and I will create a file for every 50 records.
But I observed that my lambda function is triggered with varied number of messages(sometimes 50 sometimes 20, sometimes 5 etc) even though I have configured batch size as 50.
After reading some documentation I got to know(I am not sure) that there are 5 long polling connections which lambda spawns to read from SQS and this is causing this behaviour of lambda function being triggered with varied number of messages.
My question is
Is my assumption on 5 parallel connections being established correct? If yes, is there a way I can control it? I want this to happen in a single thread / connection
If 1 is not possible, what other alternative do I have here. I do not want to have one file created for every few records. I want one file to be generated every two hours with all the messages in sqs.
A "SQS Trigger" for Lambda is implemented with the so-called Event Source Mapping integration, which polls, batches and deletes messages from the queue on your behalf. It's designed for continuous polling, although you can disable it. You can set a maximum batch size of up to 10,000 records a function receives (BatchSize) and a maximum of 300s long polling time (MaximumBatchingWindowInSeconds). That doesn't meet your once-every-two-hours requirement.
Two alternatives:
Remove the Event Source Mapping. Instead, trigger the Lambda every two hours on a schedule with an EventBridge rule. Your Lambda is responsible for the SQS ReceiveMessage and DeleteMessageBatch operations. This approach ensures your Lambda will be invoked only once per cron event.
Keep the Event Source Mapping. Process messages as they arrive, accumulating the partial results in S3. Once every two hours, run a second, EventBridge-triggered Lambda, which bundles the partial results from S3 and sends them to the SFTP server. You don't control the number of Lambda invocations.
Note on scaling:
<Edit (mid-Jan 2023): AWS Lambda now supports SQS Maximum Concurrency>
AWS Lambda now supports setting Maximum Concurrency to the Amazon SQS event source, a more direct and less fiddly way to control concurrency than with reserved concurrency. The Maximum Concurrency setting limits the number of concurrent instances of the function that an Amazon SQS event source can invoke. The valid range is 2-1000 concurrent instances.
The create and update Event Source Mapping APIs now have a ScalingConfig option for SQS:
aws lambda update-event-source-mapping \
--uuid "a1b2c3d4-5678-90ab-cdef-11111EXAMPLE" \
--scaling-config '{"MaximumConcurrency":2}' # valid range is 2-1000
</Edit>
With the SQS Event Source Mapping integration you can tweak the batch settings, but ultimately the Lambda service is in charge of Lambda scaling. As the AWS Blog Understanding how AWS Lambda scales with Amazon SQS standard queues says:
Lambda consumes messages in batches, starting at five concurrent batches with five functions at a time. If there are more messages in the queue, Lambda adds up to 60 functions per minute, up to 1,000 functions, to consume those messages.
You could theoretically restrict the number of concurrent Lambda executions with reserved concurrency, but you would risk dropped messages due to throttling errors.
You could try to set the ReservedConcurrency of the function to 1. That may help. See the docs for reference.
A simple solution would be to create a CloudWatch Event Trigger (similar to a Cronjob) that triggers your Lambda function every two hours. In the Lambda function, you call ReceiveMessage on the Queue until you get all messages, process them and afterward delete them from the Queue. The drawback is that there may be too many messages to process within 15 minutes so that's something you'd have to manage.
in the AWS doc, it is written
Lambda reads up to five batches and sends them to your function.
(https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html#events-sqs-scaling)
I am a bit confused about that part
"reads up to five batches".
Does it mean:
5 SQS ReceiveMessage API calls are made in parallel at the same time ?
5 SQS ReceiveMessage API calls are made one by one (each one creating a new lambda environment)
Lambda polls 5 batches in parallel.
AWS Lambda, in python for example, uses the queue.receive_messages function, to receive messages. This function is able to receive a batch of messages in a single request from an SQS queue.
The default is 10 messages per batch as seen here and may range to 10000 for standard queues. But there is a limit for simultaneous batches and that's 5 batches, sent to the same lambda.
If there are still messages in the Queue, lambda launches up to 60 more lambdas per minute to consume them.
Finally, event source mapping (lambda's link to the SQS queue) can handle up to 1000 batches of messages simultaneously.
Why are some Amazon Web Services configured such that they can make direct calls to Lambda handlers with appropriate permissions, while for others like SQS, lambda needs to poll repeatedly? Why can't we have a provision for invoking Lambda as soon as a message is added to an SQS, instead of polling repeatedly?
I think this is related to scaling.
From Understanding Scaling Behavior - AWS Lambda:
Poll-based event sources that are not stream-based: For Lambda functions that process Amazon SQS queues, AWS Lambda will automatically scale the polling on the queue until the maximum concurrency level is reached, where each message batch can be considered a single concurrent unit. AWS Lambda's automatic scaling behavior is designed to keep polling costs low when a queue is empty while simultaneously enabling you to achieve high throughput when the queue is being used heavily.
When an Amazon SQS event source mapping is initially enabled, Lambda begins long-polling the Amazon SQS queue. Long polling helps reduce the cost of polling Amazon Simple Queue Service by reducing the number of empty responses, while providing optimal processing latency when messages arrive.
When messages are available, Lambda initially launches up to 5 instances of your function, to handle 5 batches simultaneously. Then, Lambda launches up to 60 more instances per minute, up to 1000 total, as long as you have concurrency available at the account and function level.
I have a AWS Lambda function which is triggered by SQS. This function is triggered approximately 100 times daily, but request count to the SQS queue is approximately 20.000 times daily. I don't understand why the number of requests made to the SQS is too high. My expectation is that the number of requests made to the SQS should be same with the Lambda invocation.
I have only one Lambda function and one SQS queue in my account.
Can be related with polling of SQS queue? I tried to change the polling interval of SQS from the queue configuration but nothing changed. Another possibility is to change polling interval from Lambda function configuration. However, I cannot find any related parameter.
Briefy, I want to reduce number of SQS request, how can i do that while invoking Lmabda function with SQS?
When using SQS as an event source for AWS Lambda, AWS Lambda regularly polls the configured SQS queue to fetch new messages. While the official documentation isn't clear really about that, the blog post announcing that feature goes into the details:
When an SQS event source mapping is initially created and enabled, or when messages first appear after a period with no traffic, then the Lambda service will begin polling the SQS queue using five parallel long-polling connections.
According to the AWS documentation, the default duration for a long poll from AWS Lambda to SQS is 20 seconds.
That results in five requests to SQS every 20 seconds for AWS Lambda functions without significant load, which sums up to the ~21600 per day, which is close to the 20000 you're experiencing.
While increasing the long poll duration seems like an easy way to decrease the number of requests, that's not possible, as the 20 seconds AWS Lambda is using by default is already the maximum possible duration for an SQS queue. I'm afraid there is no easy way to decrease the requests to SQS, when using it as event source for AWS Lambda. Instead depending it could be worth evaluating if another event source, like SNS, would fit your use case as well.
Here is how we originally implemented when there is no SQS trigger.
Create a SNS trigger with the SQS Cloudwatch Metric
ApproximateNumberOfMessagesVisible > 0
Trigger a Lambda from SNS, Read Messages from SQS and deliver it to whichever the lambda needs the message.
Alternatively, you can use Kinesis to deliver it to Lambda.
SQS --> Cloudwatch (Trigger Lambda) --> Lambda(Reads Messages) -->
Kinesis (Set Batch Size) --> Lambda (Handle Actual Message)
You can also use Kinesis directly but there is no delayed delivery.
Hope it helps.
I'm using an AWS Lambda function that is triggered from an SNS event trigger to consume from an SQS queue. When the Lambda function executes, it pulls 10 messages from the queue, processes them, pulls another 10, and so on and so forth - up to a certain time limit that's coded into the Lambda function (less than the max of 5 minutes, obviously).
It's my understanding that a Lambda function triggered by an SNS event is one-to-one, is that correct? In other words, one SNS event won't trigger multiple Lambda functions (up to the maximum concurrent execution limit). There's no scaling based on load.
Are there any other potential solutions, leveraging Lambda, that would let me consume from SQS as frequently/fast as possible? I had considered trying to auto-scale my Lambda functions by leveraging CloudWatch alarms (and SNS event triggers) based on SQS queue size, but it seems like those alarms can fire, at most, every 5 minutes. I've also considered developing a master Lambda function that can automatically execute (many) slave Lambdas based on querying the queue size.
I understand that the more optimal design may be to leverage Kinesis instead of SNS. I may consider incorporating Kinesis in the future, but let's just pretend that Kinesis is not an option at this time.
There is no best way to do this. One approach (which you've kind of already mentioned) is to use CloudWatch and schedule a Lambda function to run every minute (that's the minimum schedule time for Lambda). This Lambda function will then look for new SQS messages and invoke other Lambda functions to handle new message(s). Here is a very good article for that use case: https://cloudonaut.io/integrate-sqs-and-lambda-serverless-architecture-for-asynchronous-workloads/
Personally, I do not recommend triggering your Lambda by SNS for this use case, because SNS doesn't give a full guarantee for delivery and recommend sending the SNS notifications to SQS - which does not solve your problem. From the FAQ's:
[...] If it is critical that all published messages be successfully processed, developers should have notifications delivered to an SQS queue (in addition to notifications over other transports).
Source: https://aws.amazon.com/sns/faqs/
For this kind of processing, instead of SQS if you push messages to Kinesis Stream you should be able to flexibly process(In batches of needed size) the messages.
Note: If you use SQS, after triggering a Lambda function through SNS (or using a Scheduled Lambda), it can invoke inner Lambda functions to check the queue where multiple concurrent inner Lambdas are spawned. However the problem is that its not practical to process SQS items in batches.