How do AWS Lambda's internal pollers manage SQS API calls? - amazon-web-services

The AWS documentation states:
Lambda reads up to five batches and sends them to your function.
(https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html#events-sqs-scaling)
I am a bit confused by the phrase
"reads up to five batches".
Does it mean:
5 SQS ReceiveMessage API calls are made in parallel at the same time?
5 SQS ReceiveMessage API calls are made one by one (each one creating a new Lambda environment)?

Lambda polls 5 batches in parallel.

Lambda's internal pollers receive messages through the SQS ReceiveMessage API, the same operation that boto3 exposes in Python as queue.receive_messages. A single request can return a batch of messages from an SQS queue.
The default batch size is 10 messages, as seen here, and it can be raised to 10,000 for standard queues. Lambda initially reads up to 5 batches concurrently and sends them to the same function.
If there are still messages in the queue, Lambda adds up to 60 more function instances per minute to consume them.
Finally, the event source mapping (Lambda's link to the SQS queue) can process up to 1,000 batches of messages simultaneously.
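To make the numbers above concrete, here is a minimal sketch that models the scaling described in this answer: 5 concurrent batches at the start, up to 60 more per minute, capped at 1,000. The function names are made up for illustration, and the model assumes Lambda always adds the full 60 per minute, which is an upper bound rather than a guarantee.

```python
def concurrent_batches(minutes_elapsed: int,
                       initial: int = 5,
                       added_per_minute: int = 60,
                       ceiling: int = 1000) -> int:
    """Upper bound on concurrent batches after a number of whole minutes.

    Assumes the queue stays backlogged and Lambda adds the maximum of
    60 instances every minute (a simplification for illustration).
    """
    return min(ceiling, initial + added_per_minute * minutes_elapsed)


def minutes_to_reach(target: int, ceiling: int = 1000) -> int:
    """Smallest number of whole minutes needed to reach `target` batches."""
    if target > ceiling:
        raise ValueError("target exceeds the event source mapping ceiling")
    minutes = 0
    while concurrent_batches(minutes, ceiling=ceiling) < target:
        minutes += 1
    return minutes


if __name__ == "__main__":
    for minute in (0, 1, 5, 17):
        print(minute, concurrent_batches(minute))
    print("minutes to reach 1000:", minutes_to_reach(1000))
```

Under these assumptions a fully backlogged queue takes about 17 minutes to reach the 1,000-batch ceiling.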

Related

AWS Lambda read from SQS without concurrency

My requirements are as follows:
1. Read from an SQS queue every 2 hours, take all the messages available, and then process them.
2. Processing includes creating a file with details from the SQS messages and sending it to an SFTP server.
I implemented an AWS Lambda function with an SQS trigger to achieve point 1. I set the batch size to 50 and the batch window to 2 hours. My assumption was that Lambda would be triggered every 2 hours, that 50 messages would be delivered to the function in one go, and that I would create a file for every 50 records.
But I observed that my Lambda function is triggered with a varying number of messages (sometimes 50, sometimes 20, sometimes 5, etc.) even though I configured a batch size of 50.
After reading some documentation, I learned (though I am not sure) that Lambda opens 5 long-polling connections to read from SQS, and that this causes the function to be triggered with a varying number of messages.
My questions are:
1. Is my assumption that 5 parallel connections are established correct? If yes, is there a way I can control it? I want this to happen in a single thread / connection.
2. If 1 is not possible, what other alternatives do I have here? I do not want one file created for every few records. I want one file to be generated every two hours with all the messages in SQS.
An "SQS trigger" for Lambda is implemented with the so-called Event Source Mapping integration, which polls, batches and deletes messages from the queue on your behalf. It's designed for continuous polling, although you can disable it. You can set a maximum batch size of up to 10,000 records per invocation (BatchSize) and a maximum batching window of up to 300 seconds (MaximumBatchingWindowInSeconds). Neither setting can meet your once-every-two-hours requirement.
Two alternatives:
Remove the Event Source Mapping. Instead, trigger the Lambda every two hours on a schedule with an EventBridge rule. Your Lambda is responsible for the SQS ReceiveMessage and DeleteMessageBatch operations. This approach ensures your Lambda will be invoked only once per cron event.
Keep the Event Source Mapping. Process messages as they arrive, accumulating the partial results in S3. Once every two hours, run a second, EventBridge-triggered Lambda, which bundles the partial results from S3 and sends them to the SFTP server. You don't control the number of Lambda invocations.
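As a rough illustration of the first alternative, the sketch below drains a queue with repeated ReceiveMessage calls and then removes the messages with DeleteMessageBatch. It assumes a boto3 SQS client is passed in; the QUEUE_URL environment variable, the function names, and the max_calls safety cap are made up for this example. It also assumes the queue's visibility timeout is long enough to cover the whole run, and it treats one empty long-poll response as "queue drained", which is a simplification.

```python
import os


def drain_queue(sqs, queue_url, max_calls=1000):
    """Call ReceiveMessage until a response comes back empty."""
    messages = []
    for _ in range(max_calls):  # safety cap (assumption, tune as needed)
        response = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=10,  # SQS returns at most 10 per call
            WaitTimeSeconds=1,       # long polling
        )
        batch = response.get("Messages", [])
        if not batch:
            break
        messages.extend(batch)
    return messages


def delete_messages(sqs, queue_url, messages):
    """Delete messages in chunks of 10 and return any failed entries."""
    failed = []
    for start in range(0, len(messages), 10):
        chunk = messages[start:start + 10]
        entries = [
            {"Id": str(index), "ReceiptHandle": message["ReceiptHandle"]}
            for index, message in enumerate(chunk)
        ]
        response = sqs.delete_message_batch(QueueUrl=queue_url, Entries=entries)
        failed.extend(response.get("Failed", []))
    return failed


def handler(event, context):
    """Entry point for the scheduled (EventBridge) invocation."""
    import boto3  # imported here so the helpers above work without it

    sqs = boto3.client("sqs")
    queue_url = os.environ["QUEUE_URL"]  # hypothetical configuration
    messages = drain_queue(sqs, queue_url)
    # Build the file from the message bodies and send it via SFTP here.
    failed = delete_messages(sqs, queue_url, messages)
    return {"received": len(messages), "failed_deletes": len(failed)}
```

Deleting only after the file has been sent means that a failed run leaves the messages in the queue for the next scheduled invocation.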
Note on scaling:
<Edit (mid-Jan 2023): AWS Lambda now supports SQS Maximum Concurrency>
AWS Lambda now supports setting Maximum Concurrency to the Amazon SQS event source, a more direct and less fiddly way to control concurrency than with reserved concurrency. The Maximum Concurrency setting limits the number of concurrent instances of the function that an Amazon SQS event source can invoke. The valid range is 2-1000 concurrent instances.
The create and update Event Source Mapping APIs now have a ScalingConfig option for SQS:
aws lambda update-event-source-mapping \
--uuid "a1b2c3d4-5678-90ab-cdef-11111EXAMPLE" \
--scaling-config '{"MaximumConcurrency":2}' # valid range is 2-1000
</Edit>
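The same Maximum Concurrency setting can also be applied from Python. The sketch below only builds and validates the request parameters; the helper name is made up for illustration, while UUID and ScalingConfig are the parameter names boto3 uses for update_event_source_mapping.

```python
def scaling_config_params(mapping_uuid: str, maximum_concurrency: int) -> dict:
    """Build keyword arguments for update_event_source_mapping."""
    # The valid range quoted above is 2-1000.
    if not 2 <= maximum_concurrency <= 1000:
        raise ValueError("MaximumConcurrency must be between 2 and 1000")
    return {
        "UUID": mapping_uuid,
        "ScalingConfig": {"MaximumConcurrency": maximum_concurrency},
    }


if __name__ == "__main__":
    params = scaling_config_params("a1b2c3d4-5678-90ab-cdef-11111EXAMPLE", 2)
    print(params)
    # With boto3 installed and credentials configured, the call would be:
    #   import boto3
    #   boto3.client("lambda").update_event_source_mapping(**params)
```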
With the SQS Event Source Mapping integration you can tweak the batch settings, but ultimately the Lambda service is in charge of Lambda scaling. As the AWS Blog Understanding how AWS Lambda scales with Amazon SQS standard queues says:
Lambda consumes messages in batches, starting at five concurrent batches with five functions at a time. If there are more messages in the queue, Lambda adds up to 60 functions per minute, up to 1,000 functions, to consume those messages.
You could theoretically restrict the number of concurrent Lambda executions with reserved concurrency, but you would risk dropped messages due to throttling errors.
You could try setting the function's reserved concurrency to 1. That may help. See the docs for reference.
A simple solution would be to create a CloudWatch Event Trigger (similar to a cron job) that triggers your Lambda function every two hours. In the Lambda function, you call ReceiveMessage on the queue until you have received all messages, process them, and afterward delete them from the queue. The drawback is that there may be too many messages to process within Lambda's 15-minute maximum timeout, so that's something you'd have to manage.

SQS invoke multiple lambda at same time

I am new to AWS SQS. As I understand it, SQS has a queue that stores request messages (parameters), and the attached Lambda fetches a number of messages based on the batch size set on the Lambda.
So if the SQS queue has 10,000 messages and the Lambda batch size is set to 100, does each poll fetch 100 messages, process them all, and only then fetch the next 100, and so on until all requests are processed?
So, as I understand it, Lambda waits to start the next poll until the previous one has finished processing.
I hope I am correct; if not, please correct me.
My requirement is that Lambda should not wait for the previous poll to finish. Instead, it should pull the next 100 messages and process them in parallel. For example, Lambda should create separate instances, with each instance pulling 100 messages and processing them in parallel.
In the situation you describe, the AWS Lambda service will automatically run multiple instances of your Lambda function in parallel, based upon the concurrency settings of your function.
See: Lambda function scaling - AWS Lambda
The default quota permits up to 1,000 concurrent Lambda executions per account in each Region.
Therefore, you do not need to change anything. It will automatically create multiple instances of the Lambda function in parallel and pass (up to) 100 messages to each execution.
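For illustration, here is a minimal sketch of what each parallel execution sees. The SQS event carries the batch in its Records list, with the message text in each record's body field. The process_message helper and the JSON payload format are assumptions made for this example.

```python
import json
import math


def batches_needed(total_messages: int, batch_size: int) -> int:
    """Number of batches required to consume a backlog."""
    return math.ceil(total_messages / batch_size)


def process_message(body: str) -> dict:
    """Hypothetical per-message work; here it only parses a JSON body."""
    return json.loads(body)


def handler(event, context):
    """Each concurrent execution receives its own batch of records."""
    records = event.get("Records", [])
    results = [process_message(record["body"]) for record in records]
    return {"processed": len(results)}


if __name__ == "__main__":
    # 10,000 messages with a batch size of 100, as in the question.
    print(batches_needed(10_000, 100))
    sample_event = {"Records": [{"messageId": "1", "body": '{"id": 1}'}]}
    print(handler(sample_event, None))
```

The handler itself contains no concurrency logic: the parallelism comes from the Lambda service invoking several executions at once, each with a different batch.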
For a really good series of articles to understand how AWS Lambda operates, see: Operating Lambda: Performance optimization – Part 1 | AWS Compute Blog

AWS SQS message concurrent processing

I have connected an SQS queue to a Lambda (Lambda 1). Lambda 1 makes API calls to a service.
I have to seed 1000 messages at once. For that, I send 1000 messages to SQS with the help of another Lambda (Lambda 2). The problem is that SQS triggers 1000 Lambda invocations, which make 1000 API calls to my service and cause it to become unavailable.
Can I configure the SQS processing rate so that, for example, only 10 or 100 messages are processed per second?
Limiting a lambda function's allowed concurrency
In short, you set how many concurrent executions this Lambda can have.
Intro: https://aws.amazon.com/about-aws/whats-new/2017/11/set-concurrency-limits-on-individual-aws-lambda-functions/
Docs: https://docs.aws.amazon.com/lambda/latest/dg/invocation-scaling.html
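A minimal sketch of setting that limit from Python, assuming a boto3 Lambda client is passed in. The helper name and the function name "lambda-1" are made up for illustration; put_function_concurrency and ReservedConcurrentExecutions are the boto3 names for this setting. Note that this caps concurrent executions, not messages per second.

```python
def limit_concurrency(lambda_client, function_name: str, max_concurrent: int) -> int:
    """Reserve (and thereby cap) concurrency for a single function."""
    if max_concurrent < 0:
        raise ValueError("max_concurrent must not be negative")
    response = lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=max_concurrent,
    )
    # The service echoes back the value that is now in effect.
    return response["ReservedConcurrentExecutions"]


# Example usage (requires boto3 and AWS credentials):
#   import boto3
#   limit_concurrency(boto3.client("lambda"), "lambda-1", 10)
```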

Does SQS trigger lambda in an async / sync manner?

Will a standard SQS queue that I configured to invoke a Lambda when it receives a message invoke "many Lambdas" or only 1 Lambda at a time?
From Using AWS Lambda with Amazon SQS:
Lambda polls the queue and invokes your function synchronously with an event that contains queue messages.
It will invoke as many as required, depending on your reserved concurrency limits:
Lambda increases the number of processes that are reading batches by up to 60 more instances per minute. The maximum number of batches that can be processed simultaneously by an event source mapping is 1000.

Map Reduce AWS Lambda & SQS

I have a Lambda function which spawns a number of worker Lambda functions, and each worker function posts to an SQS queue if there is any error.
There is a UI which long-polls the SQS queue for any errors. My problem is: how do I know when the processing is completed?
The first Lambda function (which spawns the worker Lambda functions) runs asynchronously; that is, it splits the data across the worker Lambda functions and then returns. I need to be able to figure out when the processing is completed.
The reason why I'm only posting the errors to the SQS queue and not the successes is that if I have 10,000 objects to process and there are 9,000 successes, I would have to make quite a lot of ReceiveMessage calls (an SQS API call to retrieve items from the SQS queue) on the UI/client side: probably around 900 calls if I specify the maximum number of messages to receive as 10 per call. (You cannot retrieve more than 10 messages from the queue per call.)
How can I overcome this design issue?
I'm using API Gateway, AWS Lambda and Dynamo DB (feel free to suggest any other Amazon/AWS service that could make this easier to get the job done.)