Polling SQS from Step function - amazon-web-services

I have two AWS Glue jobs in my application.
The first glue job produced some data and it passes to a lambda and lambda errors goes into a DLQ (SQS)
If there is no error messages in DLQ, I want to run the second Glue job.
I want to automate this logic with Step function. So the steps of step function will be:
run first glue job.
poll DLQ
if DLQ has no message for some time, run second glue job.
Is there any way to poll SQS for messages from step function ?

Answering the main question: SFN cannot poll SQS directly. If you really need this kind of integration, SFN can invoke a Lambda to poll a queue and return the result back to your SFN.
[Edit: The recently introduced Step Function Service Integrations now allow calling the SQS SDK ReceiveMessage and DeleteMessage APIs on a Queue as a Step Function task. Although SQS polling is now available in SFN, Queues are arguably not the best fit for the scenario in the OP].
More generally, it's not a typical pattern to use queues as part of a StepFunction architecture. SFN orchestration is really an alternative to queue-based orchestration. SFN's core function is to orchestrate tasks and it has its own try-catch-error handling logic. Not that it can't be done with queues, just not sure what the benefit is.
A typical SFN integration pattern starts with an EventBridge cron that triggers a SFN execution. Glue Job 1 is the first task - SFN calls Glue's StartJobRun as a sync-type task and waits for the result. Chain together multiple lambda, glue event stages. Add Error branches to handle failure states. The job would end with a notification task.

Related

AWS Lambda schedule a delayed execution to run once

I have an API Gateway with Lambdas behind, for some of the endpoints I want to schedule an execution in the future, to run once, for example the REST call was made at T time, I want that lambda to schedule an execution ONCE at T+20min.
The only solution I found to achieve this is to use boto3 and Cloudwatch to setup a cron at the moment the REST call was made, send an event with the payload, then when the delayed lambda runs, it removes the rule.
I found this very heavy, is there any other way to achieve such pattern ?
Edit: It is NOT A RECURRING Lambda, just to run ONCE.
One option is to use AWS Step Functions to trigger the AWS Lambda function after a given delay.
Step Functions has a Wait state that can schedule or delay execution, so you can can implement a fairly simple Step Functions state machine that puts a delay in front of calling a Lambda function. No database required!
For an example of the concept (slightly different, but close enough), see:
Using AWS Step Functions To Schedule Or Delay SNS Message Publication - Alestic.com
Task Timer - AWS Step Functions

AWS Lambda Scheduled One Time Tasks

I’m working on figuring out the best way to have Lambda run one time tasks at a given time.
The system I’m envisioning will basically have events that will need to be sent out, either as soon as the event is received/created, at a specific time, or as a recurring action. And I’d like to use AWS as much as possible for this, due to the scalable nature.
My original idea was to have a AWS SQS queue for events to send. Then I’d have a DynamoDB table for future events. I’d also have two AWS Lambda functions, one setup to run on a cron job every few minutes to take the events that are scheduled in the next 15 minutes or so from the DynamoDB table and put them into that AWS SQS queue with a Message Timer setup to delay the message from being visible for that given time. The second Lambda function would be setup and have a trigger to be run from that AWS SQS queue. This function would be responsible for actually sending the event out.
From there I could either add the event to the SQS queue (with or without a message timer) if it’s gonna need to be sent out within the next 15 minutes. Or add it to the DynamoDB table if it’s gonna need to be sent out in the future (beyond 15 minutes).
The biggest problem I just figured out is that AWS SQS FIFO queues doesn’t support Message Timers on individual messages. I need a FIFO queue because I need to prevent these events from being sent out multiple times, or triggering my second Lambda function twice.
I've also looked into the AWS Lambda cron jobs, and although you can schedule invocations every say 5 minutes, I don't think this is what I'm looking for because I'm looking more for scheduling a 1 time invocation in the future, and having that be scalable. So I don't think this is what I'm looking for.
Any ideas on how I can achieve this, since it doesn’t look like Amazon SQS Message Timers will work for what I'm trying to do?
Have you considered Step Function? You could create a wait state before invoking the lambda.

how should i architect aws lambda to support parallel process in batch model?

i have an aws lambda function to do some statistics on over 1k of stock tickers after market close. i have an option like below.
setup a cron job in ec2 instance and trigger a cron job to submit 1k http request asyn (e.g. http://xxxxx.lambdafunction.xxxx?ticker= to trigger the aws lambda function (or submit 1k request to SNS and let lambda to pickup.
i think it should run fine, but much appreciate if there is any serverless/PaaS approach to trigger task
On top of my head, Here are a couple of ways to achieve what you need:
Option 1: [Cost-Effective]
Post all the ticks to AWS FIFO SQS queue.
Define triggers on this queue to invoke lambda function.
Result: Since you are posting all the events in FIFO queue that maintains the order, all the events will be polled sequentially. More-over SQS to lambda trigger will help you scale automatically based on the number of message in the queue.
Option 2: [Costly and can easily scale for real-time processing]
Same as above, but instead of posting to FIFO queue, post to Kinesis Stream.
Enable Kinesis stream to trigger lambda function.
Result: Kinesis will ensure the order of event arriving in the stream and lambda function invocation will be invoked based on the number of shards in the stream. This implementation scales significantly. If you have any future use-case for real-time processing of tickers, this could be a great solution.
Option 3: [Cost Effective, alternate to Option:1]
Collect all ticker events(1k or whatever) and put it into a file.
Upload this file to AWS S3 bucket.
Enable S3 event notification to trigger proxy lambda function.
This proxy lambda function reads the s3 file and based on the total number of events in the file, it will spawn n parallel actor lambda function.
Actor lambda function will process each event.
Result: Easy to implement, cost-effective and provides easy scaling based on your custom algorithm to distribute the load in the proxy lambda function.
Option 4: [All-serverless]
Write a lambda function that gets the list of tickers from some web-server.
Define an AWS cloud watch rule for generating events based on cron/frequency.
Add a trigger to this cloudwatch rule to invoke proxy lambda function.
Proxy lambda function will use any combination of above options[1, 2 or 3] to trigger the actor lambda function for processing the records.
Result: Everything can be configured via AWS console and easy to use. Alternatively, you can also write your AWS cloud formation template to generate all the required resources in a single go.
Having said that, now I will leave this up to you to choose the right solution based on your business/cost requirements.
You can use lambda fanout option.
You can follow these steps to process 1k or more using serverless aproach.
1.Store all the stock tickers in a S3 file.
2.Create a master lambda which will read the s3 file and split the stocks in groups of 10.
3. Create a child lambda which will make the async call to external http service and fetch the details.
4. In the master lambda Loop through these groups and invoke 100 child lambdas passing in each group and return the results to the
Master lambda
5. Collect all the information returned from the child lambdas and continue with your processing here.
Now you can trigger this master lambda at the end of markets everyday using CloudWatch time based rule scheduler.
This is a complete serverless approach.

AWS Lambda to read from SQS queue

I have an AWS Lambda function to read from an SQS queue. The lambda logic is basically to read off one message from SQS and then it processes and deletes the message. Code to read the message being something like.
ReceiveMessageRequest messageRequest =
new ReceiveMessageRequest(queueUrl).withWaitTimeSeconds(5).withMaxNumberOfMessages(1);
Now my question is what is the best way to trigger this lambda and how does this lambda scale for instance, if there are let's say 1000 messages in the queue so will there be a 1000 lambdas running together, since in my case one lambda can read only one message off the queue.
Any pointers on best practices around this kind of design.
Right now you best option is probably to setup an AWS Cloudwatch event rule that calls the lambda function on the interval that you need.
Here is a sample app from AWS to do just that:
https://github.com/awslabs/aws-serverless-sqs-event-source
I do believe that AWS will eventually support SQS as a event type for AWS lambda, which should make this even easier, but for now you best choice is probably a version of the code I linked above.
We can now use SQS messages to trigger AWS Lambda Functions. Moreover, no longer required to run a message polling service or create an SQS to SNS mapping.
Further details:
https://aws.amazon.com/blogs/aws/aws-lambda-adds-amazon-simple-queue-service-to-supported-event-sources/
https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html
AWS added native support in June 2018: https://aws.amazon.com/blogs/aws/aws-lambda-adds-amazon-simple-queue-service-to-supported-event-sources/
There are probably a few ways to do this, but I found this guide to be fairly helpful when I tried to implement the same sort of functionality you are describing in Node.js. One downside to this strategy is that you can only poll the queue every 60s.
The basic workflow would look something like this:
Set up a CloudWatch Alarm that gets triggered when the queue has a certain number of messages.
The Cloudwatch alarm then posts to SNS
The SNS message triggers a Lambda scale() function
The scale() function updates a configuration record in a DynamoDB table that sets the number of worker processes needed
You then have a main CloudWatch Schedule that invokes a worker() function every 60s
The worker() function reads configuration from DynamoDB to determine how many concurrent processes are needed, based on the queue size.
Worker() then invokes the appropriate number of process() functions
Process() function consumes messages from SQS, performs your main application logic, and then removes the item from the queue.
You can find an example of what the scaling functions would look like in Node.js here
I have used this solution in a production environment for almost a year without any issues, even with thousands of messages in the queue. If you cut out the scaling portion it is only going to do one message a time.

Scheduling SQS messages

My use case is as follows: I need to be able to schedule SQS messages in such a way that scheduled messages can be added to a queue on a specific date/time, and also on a recurring basis as needed.
At the implementation level, what I'm basically be looking to do is have some function I can call where I pass in the SQS queue, message, and schedule I want it to run on, without having to build the actual scheduler logic.
I haven't seen anything in AWS itself that seems to allow for that, I also didn't get the impression Lambda functions would do exactly what I need unless I'm missing something.
Is there any other third party cloud service for scheduled processes I should look into, or am I better off in the end just running a scheduling machine at AWS and have some REST API that can add cron jobs/windows scheduled tasks to it that will handle the scheduling of SQS messages?
I could see two slightly different ways of accomplishing this, both based on Cloudwatch scheduled events. The first would be to have Cloudwatch fire off a Lambda. The Lambda would either have the needed parameters or would get them from somewhere else - for example, a DynamoDB table. Otherwise, the rule target allows you to specify a SQS queue - skipping the Lambda. But I'm not sure if that would have the configuration ability you'd want.
Either way, checkout Cloudwatch -> Events -> Create Rule in the AWS console to see your choices.