What is the best practice to architect tasks processing using AWS? - django

I am wondering how to configure AWS Lambda, SNS, and SQS for processing background tasks.
I have thought of three options.
Option 1. A function called consumer can execute workers by receiving tasks from the queue.
Option 2. Send all tasks to SNS. One worker and one SQS receive and work from SNS.
Option 3. Directly forward the task to one SQS and one lambda from the APP.
The biggest concern is whether to invoke Lambda directly from the app or to use a task consumer behind SQS or SNS.
My idea comes from "Triggering multiple lambda functions from one SQS trigger".

It depends on your current and future requirements:
Option 1: A consumer Lambda lets you add validation and manipulate the event before handing it to the workers.
However, the consumer Lambda keeps running for as long as its worker Lambdas are running.
Option 2: SNS gives you the flexibility to add new event types and new subscribers in the future, and your app only has to deal with SNS.
Option 3: This fits if you are sure that no other Lambdas will be added in the future. In this case your app needs configuration that maps each type of event to its SQS queue.
You can choose any option based on your requirements, but I suggest option 2, because your app only needs to publish notifications to SNS (a single integration). In SNS you can add filters for different types of events.
SNS can also trigger Lambda directly.
If your app does not need the output of the Lambda functions, you should use SNS/SQS for asynchronous processing.

The typical pattern is:
Push jobs/tasks to an Amazon SQS queue
Configure an AWS Lambda function to subscribe to the SQS queue
The Lambda service will automatically invoke the function with the messages arriving in the SQS queue (in batches, depending on the configured batch size)
I guess this matches your Option 1, but with the AWS Lambda service acting as the "consumer" that triggers the individual Lambda functions.
If there are different types of inputs (eg three different tasks) that each require a different Lambda function, then create 3 separate queues each linked to its own Lambda function (your Option 3).
Inserting Amazon SNS in-between (shown in your Option 2) makes it easier to 'fork' information, such as adding another subscriber to each message in case they need to be processed in parallel. Otherwise, it is not necessary.
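The "typical pattern" above can be sketched as a Python handler, shown here as a minimal illustration rather than a definitive implementation. The event shape (a "Records" list whose items carry "messageId" and "body") is what Lambda passes for an SQS trigger. The process_task function and the JSON task format are hypothetical. Returning "batchItemFailures" only has an effect if partial batch responses (ReportBatchItemFailures) are enabled on the event source mapping.

```python
import json


def process_task(task):
    """Hypothetical business logic; raises when the task is malformed."""
    if "type" not in task:
        raise ValueError("task has no type")
    return task["type"]


def handler(event, context=None):
    """Entry point that Lambda invokes with a batch of SQS messages."""
    failures = []
    for record in event.get("Records", []):
        try:
            # Assumption: the producer put a JSON document in the message body.
            process_task(json.loads(record["body"]))
        except Exception:
            # Report only this message as failed so the rest of the batch
            # is not redelivered.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Messages that are not reported as failures are deleted from the queue by the Lambda service once the handler returns.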

Related

How to use AWS SQS as event source and invoke different Lambda functions based on event attributes

I need a little help with a use case where I want to set up a communication service using SQS.
The queue is going to receive different types of events to be communicated. Each Lambda function we have handles a single kind of communication: say, one email Lambda, one Slack Lambda, and so on.
How can I invoke a different Lambda based on queue attributes? I was planning to use SQS as an event source, with something like this architecture: link to sample architecture
With the above, we can handle rate limiting and concurrency at the Lambda service level.
Simplified, it works like this: if the event type is A, invoke Lambda A; if the event type is B, invoke Lambda B.
Both event types are in the same SQS queue.
All suggestions are welcome.
Your problem is that an SQS message can only be read by one consumer at a time. While it is being read, it is invisible to everyone else. A queue effectively feeds one Lambda consumer, and SQS has no partitioning or routing other than setting up another SQS queue. Multiple independent consumers are supported by Kinesis or Amazon MSK (Kafka).
What you are trying to accomplish is called a fan-out, which is a common cloud architecture. What you probably want to do is publish to SNS first. With SNS you can then filter and route to a separate SQS queue for each message type, and each SQS queue is then consumed by its own Lambda.
Check out a tutorial here:
https://docs.aws.amazon.com/sns/latest/dg/sns-common-scenarios.html
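To make the filtering idea concrete, here is a simplified model of how an SNS subscription filter policy selects messages by attribute. It only covers exact string matching. The queue names and the "event_type" attribute are assumptions for illustration, and real filter policies support more operators than this sketch does.

```python
def matches(filter_policy, attributes):
    """True when every policy key is present and its value is in the allowed list."""
    return all(
        attributes.get(key) in allowed
        for key, allowed in filter_policy.items()
    )


# Hypothetical subscriptions: one queue per message type, plus an
# unfiltered queue that receives everything.
SUBSCRIPTIONS = {
    "email-queue": {"event_type": ["email"]},
    "slack-queue": {"event_type": ["slack"]},
    "audit-queue": {},
}


def route(attributes):
    """Names of the subscriptions that would receive a message with these attributes."""
    return sorted(
        name
        for name, policy in SUBSCRIPTIONS.items()
        if matches(policy, attributes)
    )
```

In this model, route({"event_type": "email"}) selects the email queue and the unfiltered audit queue, so the publisher never needs to know which queue handles which event type.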

AWS Lambda to read from SQS queue

I have an AWS Lambda function to read from an SQS queue. The lambda logic is basically to read off one message from SQS and then it processes and deletes the message. Code to read the message being something like.
ReceiveMessageRequest messageRequest =
    new ReceiveMessageRequest(queueUrl)
        .withWaitTimeSeconds(5)
        .withMaxNumberOfMessages(1);
My question is: what is the best way to trigger this Lambda, and how does it scale? For instance, if there are 1000 messages in the queue, will there be 1000 Lambdas running together, since in my case one Lambda can read only one message off the queue?
Any pointers on best practices around this kind of design would be appreciated.
Right now your best option is probably to set up an AWS CloudWatch event rule that calls the Lambda function at the interval you need.
Here is a sample app from AWS to do just that:
https://github.com/awslabs/aws-serverless-sqs-event-source
I do believe that AWS will eventually support SQS as an event type for AWS Lambda, which should make this even easier, but for now your best choice is probably a version of the code I linked above.
We can now use SQS messages to trigger AWS Lambda functions. It is no longer necessary to run a message polling service or create an SQS to SNS mapping.
Further details:
https://aws.amazon.com/blogs/aws/aws-lambda-adds-amazon-simple-queue-service-to-supported-event-sources/
https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html
AWS added native support in June 2018: https://aws.amazon.com/blogs/aws/aws-lambda-adds-amazon-simple-queue-service-to-supported-event-sources/
There are probably a few ways to do this, but I found this guide to be fairly helpful when I tried to implement the same sort of functionality you are describing in Node.js. One downside to this strategy is that you can only poll the queue every 60s.
The basic workflow would look something like this:
Set up a CloudWatch Alarm that gets triggered when the queue has a certain number of messages.
The CloudWatch alarm then posts to SNS
The SNS message triggers a Lambda scale() function
The scale() function updates a configuration record in a DynamoDB table that sets the number of worker processes needed
You then have a main CloudWatch Schedule that invokes a worker() function every 60s
The worker() function reads configuration from DynamoDB to determine how many concurrent processes are needed, based on the queue size.
Worker() then invokes the appropriate number of process() functions
Process() function consumes messages from SQS, performs your main application logic, and then removes the item from the queue.
You can find an example of what the scaling functions would look like in Node.js here
I have used this solution in a production environment for almost a year without any issues, even with thousands of messages in the queue. If you cut out the scaling portion, it will only process one message at a time.
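The sizing decision made by the scaling step in the workflow above can be sketched as a small pure function. The name workers_needed, the per-worker capacity, and the cap are all assumptions for illustration; the guide referenced in the answer may compute this differently.

```python
import math


def workers_needed(queue_depth, messages_per_worker=10, max_workers=50):
    """How many process() invocations to start for the current queue depth.

    Assumptions: each worker handles about `messages_per_worker` messages
    per run, and `max_workers` keeps the fan-out below the account's
    concurrency limit.
    """
    if queue_depth <= 0:
        return 0
    return min(max_workers, math.ceil(queue_depth / messages_per_worker))
```

With these defaults, 25 queued messages would start 3 workers, and a very deep queue would be capped at 50.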

Using AWS Lambda Functions to Consume AWS SQS Queues

I'm using an AWS Lambda function that is triggered from an SNS event trigger to consume from an SQS queue. When the Lambda function executes, it pulls 10 messages from the queue, processes them, pulls another 10, and so on and so forth - up to a certain time limit that's coded into the Lambda function (less than the max of 5 minutes, obviously).
It's my understanding that a Lambda function triggered by an SNS event is one-to-one, is that correct? In other words, one SNS event won't trigger multiple Lambda functions (up to the maximum concurrent execution limit). There's no scaling based on load.
Are there any other potential solutions, leveraging Lambda, that would let me consume from SQS as frequently/fast as possible? I had considered trying to auto-scale my Lambda functions by leveraging CloudWatch alarms (and SNS event triggers) based on SQS queue size, but it seems like those alarms can fire, at most, every 5 minutes. I've also considered developing a master Lambda function that can automatically execute (many) slave Lambdas based on querying the queue size.
I understand that the more optimal design may be to leverage Kinesis instead of SNS. I may consider incorporating Kinesis in the future, but let's just pretend that Kinesis is not an option at this time.
There is no best way to do this. One approach (which you've kind of already mentioned) is to use CloudWatch and schedule a Lambda function to run every minute (that's the minimum schedule time for Lambda). This Lambda function will then look for new SQS messages and invoke other Lambda functions to handle new message(s). Here is a very good article for that use case: https://cloudonaut.io/integrate-sqs-and-lambda-serverless-architecture-for-asynchronous-workloads/
Personally, I do not recommend triggering your Lambda from SNS for this use case, because SNS does not fully guarantee delivery, and AWS recommends sending the SNS notifications to SQS, which does not solve your problem. From the FAQs:
[...] If it is critical that all published messages be successfully processed, developers should have notifications delivered to an SQS queue (in addition to notifications over other transports).
Source: https://aws.amazon.com/sns/faqs/
For this kind of processing, if you push messages to a Kinesis stream instead of SQS, you should be able to process the messages flexibly (in batches of whatever size you need).
Note: If you use SQS, a Lambda function triggered through SNS (or a scheduled Lambda) can invoke inner Lambda functions that check the queue, so that multiple inner Lambdas run concurrently. However, the problem is that it's not practical to process SQS items in batches.
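The time-boxed polling loop described in the question (pull a batch, process it, repeat until a time limit) can be sketched as follows. The receive, process, and delete callables are placeholders: in a real function, receive might wrap boto3's receive_message with MaxNumberOfMessages=10, and delete might wrap delete_message. That wiring is an assumption and is not shown here.

```python
import time


def drain(receive, process, delete, budget_seconds, clock=time.monotonic):
    """Pull batches until the queue looks empty or the time budget runs out.

    `receive` returns a list of messages (possibly empty), `process` handles
    one message, and `delete` removes it from the queue afterwards.
    Returns the number of messages handled.
    """
    deadline = clock() + budget_seconds
    handled = 0
    while clock() < deadline:
        batch = receive()
        if not batch:
            break  # nothing left (or an empty poll); stop early
        for message in batch:
            process(message)
            delete(message)  # delete only after successful processing
            handled += 1
    return handled
```

The deadline is only checked between batches, so the budget should leave enough headroom below the function timeout to finish one full batch.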

lambda calling another lambda and hitting concurrency threshold of 1000

Scenario: a master Lambda splits the work and hands it off to multiple other Lambdas (workers). The master iterates over the work and invokes the other Lambdas asynchronously.
If more than 1000 Lambdas are spawned, will it fail?
Should there be an SNS between the two lambdas... so that the SNS will retry?
Or a more complicated approach of putting the messages into a queue and then sending notification of 'X' number of worker lambdas to start polling the queue?
Is there a better way?
Yes, there should be some kind of decoupling between the producer and consumers. The most obvious way is to have the producer create 1000 messages on an SNS topic and let AWS handle how many consumers are needed (perhaps it can reuse consumer lambdas). Other ways include triggering from records inserted into DynamoDB.
If you want the listeners to pull messages from an SQS queue you'll need to trigger them yourself, which you can do with CloudWatch (maximum 1 trigger per minute)
Question: "If the number of lambdas which are getting spawned are more than 1000, will it fail?"
Yes, it will fail. However, the default limit (1000 per region) can be raised by requesting an increase from AWS Customer Support.
http://docs.aws.amazon.com/lambda/latest/dg/limits.html
http://docs.aws.amazon.com/lambda/latest/dg/concurrent-executions.html
I am not sure about your exact requirements. Based on your requirements you need to make sure spawning 1000 lambdas is required in your design.
My suggestions are below
Suggestion 1:-
AWS Step Functions
With AWS Step Functions you can invoke your child Lambdas from the master using the state machine language. How you start it depends on how your master Lambda is invoked (e.g. CloudWatch rules, triggers). For more information, go to:
https://aws.amazon.com/step-functions/
https://states-language.net/spec.html
Suggestion 2:-
AWS ECS Container
From the master Lambda, send messages to SQS. Launch your child program as an ECS service. In that program, you can consume the SQS messages and run your business logic.
http://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs_services.html
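For the "send messages to SQS" step in Suggestion 2, the master would typically enqueue the work in batches. The SQS SendMessageBatch API accepts at most 10 entries per call, each with an "Id" that is unique within the batch and a "MessageBody". A minimal sketch of preparing those batches is below; the to_batches name and the JSON task format are assumptions, and the actual send call is left out.

```python
import json


def to_batches(tasks, batch_size=10):
    """Group tasks into lists of SendMessageBatch-style entries.

    Each entry has an "Id" (unique within its batch) and a "MessageBody".
    Assumption: tasks are JSON-serialisable dictionaries.
    """
    entries = [
        {"Id": str(index), "MessageBody": json.dumps(task)}
        for index, task in enumerate(tasks)
    ]
    return [
        entries[start:start + batch_size]
        for start in range(0, len(entries), batch_size)
    ]
```

Each inner list could then be passed as the Entries argument of one batch send, so 1000 tasks become 100 API calls instead of 1000 direct invocations.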

Read SQS queue from AWS Lambda

I have the following infrastructure:
I have an EC2 instance with a NodeJS+Express process listening on a port for messages (process 1). Every time the process receives a message it sends it to an SQS queue. Then I have another process in the same machine reading the queue using long polling (process 2). When it finds a message in the queue it inserts the data in a MariaDB database sitting on an RDS instance.
(Just to clarify, messages are generated by users, they send a chunk of data which can contain arbitrary information to the endpoint where the process 1 is listening)
Now I want to put the process that reads the SQS (process 2) in a Lambda function so that the process that writes to the queue and the one that reads from the queue are completely independent. The problem is that I don't know if this is possible.
I know that Lambda functions are invoked in response to an event, and the events supported at the moment are S3, SNS, SES, DynamoDB, Kinesis, Cognito, CloudWatch and CloudFormation, but NOT SQS.
I was thinking of using SNS notifications to invoke the Lambda function, so that every time a message is pushed to the queue an SNS notification is fired and invokes the Lambda function. After playing with it a bit, though, I've realised that it is not possible to create an SNS notification from SQS; it's only possible to write SNS notifications to the queue.
Right now I'm a bit stuck because I don't know how to continue. I have the feeling that is not possible to create this infrastructure due to the current limitations in the AWS services. Is there another way to do what I want or am I in a dead-end?
To extend my question with some research I've done: this GitHub repo shows how to read an SQS queue from a Lambda function, but the Lambda function works only if it is fired from the command line:
https://github.com/robinjmurphy/sqs-to-lambda
In the readme, the author mentions the following:
Update: Lambda now supports SNS notifications as an event source,
which makes this hack entirely unneccessary for SNS notifcations. You
might still find it useful if you like the idea of using a Lambda
function to process jobs on an SQS queue.
But I think this doesn't solve my problem, an SNS notification can invoke the Lambda function but I don't see how I can create a notification when a message is received in the SQS queue.
Thanks
There are a couple of strategies that can be used to connect the dots, either (a)synchronously or run-sleep-run, to keep the data flowing between SNS, SQS, and Lambda.
Strategy 1: Have a Lambda function listen to SNS and process messages in real time. [Please note that an SQS queue can subscribe to an SNS topic, which may be helpful for logging / auditing / retry handling.]
Strategy 2: Given that your data is sourced into an SQS queue, you can try two Lambda functions [Feeder & Worker].
Feeder would be a scheduled Lambda function whose job is to take items from SQS (if any) and publish them to an SNS topic (and continue doing so forever).
Worker would be subscribed to the SNS topic and would do the actual data processing.
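The Worker side of Strategy 2 can be sketched as below. For SNS triggers, Lambda delivers each notification under record["Sns"], with the published text in the "Message" field. The assumption here is that the Feeder publishes JSON documents; worker_handler is a hypothetical name.

```python
import json


def worker_handler(event, context=None):
    """Lambda entry point for SNS notifications; returns the decoded payloads."""
    payloads = []
    for record in event.get("Records", []):
        # Assumption: the Feeder published a JSON string as the message.
        payloads.append(json.loads(record["Sns"]["Message"]))
    return payloads
```

The actual data processing would replace the simple collection of payloads shown here.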
We can now use SQS messages to trigger AWS Lambda functions. It is no longer necessary to run a message polling service or create an SQS to SNS mapping.
Further details:
https://aws.amazon.com/blogs/aws/aws-lambda-adds-amazon-simple-queue-service-to-supported-event-sources/
AWS SQS is one of the oldest products of Amazon, which only supported polling (long and short) up until June 2018. As mentioned in this answer, AWS SQS now supports the feature of triggering lambda functions on new message arrival in SQS. A complete tutorial for this is provided in this document.
I used to tackle this problem using different mechanisms, and given below are some approaches you can use.
You can develop a simple polling application in Lambda, and use AWS CloudWatch to invoke it every 5 mins or so. You can make this near real-time by using CloudWatch events to invoke lambda with short downtimes. Use this tutorial or this tutorial for this purpose. (This could cost more on Lambdas)
You can consider SQS redundant if you need neither to persist the messages nor to guarantee the order of delivery. You can use AWS SNS (Simple Notification Service) to invoke a Lambda function directly and do whatever processing is required. Use this tutorial for this purpose. This will happen in real time. The main drawback is the limit on the number of Lambdas that can run per region at a given time. Please read this and understand the limitation before following this approach. Note that standard SNS topics do not guarantee the order of delivery. SNS can also deliver directly to an HTTP endpoint, which can then store the message in your DB.
I had a similar situation (and now have a working solution deployed). I addressed it in the following manner:
i.e. publishing events to SNS; which then get fanned-out to Lambda and SQS.
NOTE: This is not applicable to the events that have to be processed in a certain order.
There are some gotchas (with possible solutions), such as:
race condition: the Lambda might get invoked before the message is deposited into the queue
the distributed nature of the SQS queue may lead to no messages being returned even though there is a message in the queue (see note1).
The solution to both cases would be to do long-polling of SQS queue; but this does make your lambda bill more expensive.
note1
Short poll is the default behavior where a weighted random set of machines is sampled on a ReceiveMessage call. This means only the messages on the sampled machines are returned. If the number of messages in the queue is small (less than 1000), it is likely you will get fewer messages than you requested per ReceiveMessage call. If the number of messages in the queue is extremely small, you might not receive any messages in a particular ReceiveMessage response; in which case you should repeat the request.
http://docs.aws.amazon.com/AWSSimpleQueueService/latest/APIReference/API_ReceiveMessage.html
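The "repeat the request" advice in note1 can be sketched as a bounded retry around a short poll. The receive callable is a placeholder for whatever performs the ReceiveMessage call, and the attempt count is an arbitrary assumption. Long polling, as suggested above, is the alternative that avoids most empty responses.

```python
def receive_with_retries(receive, attempts=3):
    """Call `receive` until it returns messages or the attempts run out."""
    for _ in range(attempts):
        messages = receive()
        if messages:
            return messages
    return []
```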
We had some similar requirements so we ended up building a library and open sourcing it to help with SQS to Lambda async. I'm not sure if this fills your particular set of requirements, but thought it might be worth a look: https://read.iopipe.com/sqs-lambda-teaming-up-92c4096be49c