How can I track the progress/status of an asynchronous AWS Lambda invocation? - amazon-web-services

I have an API which I use to trigger AWS Lambda jobs. Upon request, the API invokes an AWS Lambda job with InvocationType='Event'. Afterwards, I want to periodically poll to check whether the AWS Lambda job has finished.
The approach that would fit best into my architecture is to store an identifier of the Lambda job in a database and periodically check whether the job is finished and what its output is. However, I was not able to find out how to do this.
How can I periodically poll for the result of an AWS Lambda job, and view the output once it has finished?
I have looked into using InvocationType='RequestResponse', but this requires me to store a future, which I cannot do in a database.

There's no built-in way to check for the status of an asynchronous Lambda invocation.
Asynchronous Lambda invocation, using the event invocation type, is meant to be a fire-and-forget job. As such, there's no 'progress' or 'status' to get or poll for.
As you don't want to wait for the Lambda to complete, synchronous Lambda invocation is out of the picture. In this case, you need to write your own logic to keep track of the status.
One way you could do this is to store a (job) item in a DynamoDB jobs table with 2 attributes:
jobId UUID (String attribute, set as the partition key)
completed boolean flag (Boolean attribute)
Workflow is then as follows:
Within your API, create & store a new job with completed defaulting to 'false'
Pass the newly-created jobId to the Lambda being invoked in the payload
When the Lambda finishes, look up the job associated with the passed-in jobId within the jobs table & set the completed attribute of the job to true
You can then periodically poll for the result of the job within the DynamoDB table.
Or take a look at using DynamoDB Streams as a way to know when a job finishes in near-real time without polling.
As to viewing the 'output': an asynchronous invocation just returns a success response without additional information. There is no 'output'. Store any output you might need in persistent storage - maybe an extra output attribute as a String on each job? - & later retrieve it.
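A minimal sketch of this workflow with boto3 (table and function names are hypothetical, and error handling is omitted):

import json
import uuid
import boto3

dynamodb = boto3.resource('dynamodb')
lambda_client = boto3.client('lambda')
jobs_table = dynamodb.Table('jobs')  # hypothetical table name

def start_job():
    # 1. Create & store a new job with completed defaulting to False
    job_id = str(uuid.uuid4())
    jobs_table.put_item(Item={'jobId': job_id, 'completed': False})
    # 2. Pass the newly-created jobId to the Lambda being invoked
    lambda_client.invoke(
        FunctionName='my-job-function',  # hypothetical worker function
        InvocationType='Event',
        Payload=json.dumps({'jobId': job_id}),
    )
    return job_id

# 3. Inside the worker Lambda: mark the job as completed & store any output
def lambda_handler(event, context):
    output = 'result of the job'  # replace with your actual job logic
    jobs_table.update_item(
        Key={'jobId': event['jobId']},
        UpdateExpression='SET #c = :c, #o = :o',
        ExpressionAttributeNames={'#c': 'completed', '#o': 'output'},
        ExpressionAttributeValues={':c': True, ':o': output},
    )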

@Ermiya Eskandary's answer is absolutely right.
I am a DynamoDB subject-matter expert and have implemented this status-tracking pattern (along with error handling, retries, and error logging) for many of my customers.
You could check out the pynamodb_mate library; it has the status tracker pattern implemented, and you can enable it with around 15 lines of code.
In general, when you say you want status tracking, you are talking about the following:
Each task should be handled by only one worker; you want a concurrency lock mechanism to avoid double consumption. (Many people aren't aware of this; it is called idempotency.)
For succeeded tasks, store additional information such as the output of the task, and log the success time.
For failed tasks, log the error message for debugging, so you can fix the bug and rerun the task.
For failed tasks, you want to be able to retrieve all of them with one simple query and rerun them with the updated business logic.
For tasks that have failed too many times, you don't want to retry them anymore and want to ignore them. (Many people run into an endless loop when they deploy to production and then realize this is a necessary feature.)
Run custom queries based on task status for analytics purposes.
You can read this Jupyter notebook example.
Basically, with pynamodb_mate your Lambda job application code becomes:
# this is your lambda application code
def lambda_handler(event, context):
    ...

# your new code should be:
with tracker.start_job():
    lambda_handler(event, context)
If your application code is not Python, then you have two options:
Create another Lambda function that invokes the original one synchronously; however, you pay more money to run the "caller" Lambda function.
Suppose your Lambda code is in Node.js: add an additional Lambda runtime as a layer and wrap your Node.js handler in a Python function. In short, you are using Python to call Node.js.

Related

How to limit number of concurrent workflows running?

The title is pretty much the question. Is there some way to limit the number of concurrent workflows running at any given time?
Some background:
I'm using Eventarc to dispatch a workflow once a message has been sent to a Pub/Sub topic. The workflow will be used to start some long-running operation (LRO), but for reasons I won't go into, I don't want more than 3 instances of this workflow running at a given time.
Is there some way to do this - ideally through some type of configuration rather than using another compute resource?
There is no configuration that limits running processes specifically for sessions executed by a Workflow enabled for concurrent execution.
The existing process limit applies to all sessions, without differentiating between those from non-concurrent and concurrency-enabled Workflows.
Synchronization enables users to limit the parallel execution of certain workflows or templates within a workflow without having to restrict others.
Users can create multiple synchronization configurations in the ConfigMap that can be referred to from a workflow or template within a workflow. Alternatively, users can configure a mutex to prevent concurrent execution of templates or workflows using the same mutex.
Refer to this link for more information.
Summarizing your requirements:
Trigger workflow executions with Pub/Sub messages
Execute at most 3 workflow executions concurrently
Queue up waiting Pub/Sub messages
(Unspecified) Do you need messages processed in the order delivered?
There is no out-of-the-box capability to achieve this. For fun, below is a solution that doesn't need secondary compute (and is therefore still fully managed).
The key to making this work is likely starting new executions for every message, but waiting in that execution if needed. Workflows does not provide a global concurrency construct, so you'll need to use some external storage, such as Firestore. An algorithm like this could work:
Create a callback
Push the callback into a FIFO queue
Atomically increment a counter (which returns the new value)
If the returned value is <= 3, pop the oldest callback from the queue and call it
Wait on the callback
-- MAIN WORKFLOW HERE --
Atomically decrement the counter
If the returned value is < 3, pop the oldest callback from the queue and call it
To keep things cleaner, you could put the above steps in the triggered workflow and the main logic in a separate workflow that is called as needed.
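As a rough illustration of the atomic counter steps above, here's a Python sketch using a Firestore transaction (Workflows itself would do these reads and writes through its Firestore connector; the document path and field name are assumptions):

from google.cloud import firestore

db = firestore.Client()
counter_ref = db.document('concurrency/workflow-slots')  # hypothetical doc

@firestore.transactional
def change_counter(transaction, ref, delta):
    # Atomically read, adjust, and write the counter; return the new value.
    snapshot = ref.get(transaction=transaction)
    data = snapshot.to_dict() if snapshot.exists else {}
    new_value = data.get('value', 0) + delta
    transaction.set(ref, {'value': new_value})
    return new_value

# On arrival: take a slot; run only if at most 3 slots are in use.
if change_counter(db.transaction(), counter_ref, +1) <= 3:
    pass  # pop the oldest callback from the queue and call it

# On completion: release the slot; wake a waiter if one exists.
if change_counter(db.transaction(), counter_ref, -1) < 3:
    pass  # pop the oldest callback from the queue and call it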

Have Lambda function dispatch a task and return response right away

I'm a little confused, since AWS has a lot of features and I don't know what to use.
I was creating a Lambda function that does a lot of work with a remote website; the process could take at least a minute.
My idea was to create an API that calls this Lambda, have the Lambda create a unique ID, return a response with a token to the client right away, and save this token to a DB.
Then the Lambda would do all this processing with the remote website and, when it finishes, save the results to the DB and to a bucket (a file), so the result is ready to deliver when the client makes another call to a second API that queries the DB for the status of this process.
The thing is, it seems that once a response is sent from the handler, Lambda terminates the execution, so I'm afraid the processing with the remote website would never finish.
I have read that Step Functions is the way to go, but I can't figure out which service would do the processing - maybe another Lambda?
Is there another service that is more suitable for this type of work? The process involves scraping a page and downloading files, and is written in Python.
I have read that Step Functions is the way to go, but I can't figure out which service would do the processing - maybe another Lambda?
Yes, another Lambda function would do the background processing. You could also just have the first Lambda invoke a second Lambda directly, without using Step Functions. Or the first Lambda could place the task info in an SQS queue that another Lambda polls.
Is there another service that is more suitable for this type of work?
Lambda functions are fine for this. You just need to split your task into multiple functions somehow, either by one Lambda calling the other directly, or by using Step Functions, or an SQS queue or something.
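As a rough sketch (function names are hypothetical), the first Lambda can return the token immediately and hand the slow work to a second Lambda via an asynchronous invoke:

import json
import uuid
import boto3

lambda_client = boto3.client('lambda')

def api_handler(event, context):
    token = str(uuid.uuid4())
    # ... save the token to the DB with a 'pending' status here ...
    lambda_client.invoke(
        FunctionName='scraper-worker',  # hypothetical second function
        InvocationType='Event',         # asynchronous: returns immediately
        Payload=json.dumps({'token': token, 'url': event.get('url')}),
    )
    # The client polls your other API with this token for the result.
    return {'statusCode': 202, 'body': json.dumps({'token': token})}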

How can I set an AWS Lambda to be triggered 24 hours after the beginning of a process?

I have the following schema (using AWS IoT Core and AWS Lambda) that starts upon some MQTT event. At the end of the main process a success flag is saved in the database. This process takes around 5 minutes. I would like to call a Lambda function 24 hours after the beginning of the process to check whether the flag exists in the database. How can I set this timeout or delay on the Lambda function without Step Functions?
There are two possible paths to "wait 24 hours" until execution of an AWS Lambda function.
Using a 'wait' step in AWS Step Functions
AWS Step Functions is ideal for orchestrating multi-step Lambda functions. There is a Wait state that can pause the flow for 24 hours, after which it can trigger another Lambda function. Step Functions tracks each individual request, and you can see each execution and its current state. Sounds good!
Schedule your own check
Use Amazon CloudWatch Events to trigger an AWS Lambda function at regular intervals. The Lambda function could query the database to locate records for processes that did not complete successfully. However, since your code needs to find "non-successful" processes, each process would first need to be added to the database (e.g. at the start of the 'Process' function).
Alternate approach: Only track failures
The architecture in the above diagram stores 'success' in the database, which is then checked 24 hours later. Based on this diagram, it appears as though the only purpose of the database is to track successful processes.
Instead, I would recommend:
Changing the top line to only store 'failed' processes, and the time of their failure (assuming failure can be detected)
Trigger an AWS Lambda function at regular intervals that will:
Check the database
Retrieve failed processes that are older than 24 hours
Submit them for re-processing
Delete the records from the database, or mark them as 'retried' to avoid future retrievals
Basically, the database is used to track failures rather than successes. The "wait 24 hours" step is replaced with a regular database check for failed processes.
I'm not sure why the system wants to wait 24 hours before sending a 'success' email, but I presume that is an intentional part of your design.
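A rough sketch of that scheduled check with boto3 (table and attribute names are assumptions, not a definitive implementation):

import time
import boto3
from boto3.dynamodb.conditions import Attr

table = boto3.resource('dynamodb').Table('failed-processes')  # hypothetical

def lambda_handler(event, context):
    cutoff = int(time.time()) - 24 * 60 * 60
    # Retrieve failed processes recorded more than 24 hours ago
    # that have not yet been retried.
    response = table.scan(
        FilterExpression=Attr('failed_at').lt(cutoff) & Attr('retried').not_exists()
    )
    for item in response['Items']:
        # ... submit the process for re-processing here ...
        table.update_item(
            Key={'processId': item['processId']},
            UpdateExpression='SET retried = :t',
            ExpressionAttributeValues={':t': True},
        )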
You can use CloudWatch Events (or EventBridge) to schedule a Lambda function to run once, at a set time.
To create a CloudWatch Event rule, you can use the PutRule API operation to create the rule and then add the Lambda function as the target using PutTargets. The Lambda function should also allow CloudWatch Events to invoke the function in its function policy.
I also tested this out with the following cron expression: cron(0 0 1 1 ? 2021)
This would trigger the target Lambda function once, at Fri, 01 Jan 2021 00:00:00 GMT. Also note that the function may be invoked more than once, because CloudWatch Events invokes the function asynchronously. After the Lambda function has been invoked, you can delete the rule from within the same function for clean-up.
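A sketch of that flow with boto3 (rule and statement names are hypothetical):

import boto3

events = boto3.client('events')
lambda_client = boto3.client('lambda')

def schedule_one_time_check(target_lambda_arn):
    # Create a rule that fires once at a specific future time (UTC).
    rule = events.put_rule(
        Name='check-flag-once',
        ScheduleExpression='cron(0 0 1 1 ? 2021)',
        State='ENABLED',
    )
    # Point the rule at the target Lambda function.
    events.put_targets(
        Rule='check-flag-once',
        Targets=[{'Id': 'check-flag-target', 'Arn': target_lambda_arn}],
    )
    # Allow CloudWatch Events to invoke the function.
    lambda_client.add_permission(
        FunctionName=target_lambda_arn,
        StatementId='check-flag-once-permission',
        Action='lambda:InvokeFunction',
        Principal='events.amazonaws.com',
        SourceArn=rule['RuleArn'],
    )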

Using AWS Lambda to periodically monitor state of a remote resource

I need to implement the following feature in the backend on AWS:
- API endpoint which allows a user to start a particular long running "process" in a remote system
- the process status in this remote system should be monitored periodically (every few-several seconds) for status and when status == complete, trigger an action (the remote system does not support sending/triggering notifications or callbacks)
We primarily use Lambda functions, so I'm thinking about approaching it in the following way:
- my endpoint, triggered by the user, would call the remote system to start the process, store a record in the internal DB, and generate a message to SQS (with a delivery delay of X seconds)
- a second Lambda would read messages from SQS and check the status of the process in the remote system. When status == complete, it triggers the action; when status != complete, it generates another message to SQS, which the same Lambda picks up after another X seconds of delay, repeats the check, and so on
I'm wondering if there is a better solution/tooling to implement this kind of monitoring/notification pattern in AWS, since I'm not that familiar with all the services AWS provides.
Would anyone comment on this approach and perhaps suggest an alternative if there is one?
Take a look at AWS Step Functions, which I think is the best fit for your use case.
All you need to do is, instead of generating an SQS message, initiate an execution of a state machine in Step Functions.
The following tutorial explains an iterator loop with a counter, but you can use the same logic to check the status and keep looping until status == complete:
https://docs.aws.amazon.com/step-functions/latest/dg/tutorial-create-iterate-pattern-section.html
Another useful resource, which I think is very close to your use case:
https://docs.aws.amazon.com/step-functions/latest/dg/sample-project-job-poller.html
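Starting the state machine from your endpoint Lambda is then a one-liner with boto3 (the ARN below is a placeholder for your own state machine):

import json
import boto3

sfn = boto3.client('stepfunctions')

def lambda_handler(event, context):
    # ... call the remote system to start the process, store a DB record ...
    # then hand the periodic status check off to Step Functions:
    sfn.start_execution(
        stateMachineArn='arn:aws:states:us-east-1:123456789012:stateMachine:StatusPoller',
        input=json.dumps({'processId': event['processId']}),
    )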

Make Lambda function execute now, and/or in an hour

I'm trying to implement an AWS Lambda function that should send an HTTP request. If that request fails (the response is anything but status 200), I should wait another hour before retrying (longer than the Lambda stays warm). What's the best way to implement this?
What comes to mind is to persist my HTTP request in some way and be able to trigger the Lambda function again after a specified amount of time if there is a persisted HTTP request. But I'm not completely sure which AWS service would provide that functionality for me. Is SQS an option that can help here?
Or, can I dynamically schedule Lambda execution for this? Note that the retried request should be identical to the first one.
Any other suggestions? What's the best practice for this?
(A Lambda function is my only option. No EC2 or similar is possible.)
You can't directly trigger Lambda functions from SQS (at the time of writing, anyhow).
You could potentially handle the non-200 errors by writing the request data (with appropriate timestamp) to a DynamoDB table that's configured for TTL. You can use DynamoDB Streams to detect when DynamoDB deletes a record and that can trigger a Lambda function from the stream.
This is obviously a roundabout way to achieve what you want but it should be simple to test.
As jarmod mentioned, you cannot trigger Lambda functions directly by SQS. But a workaround (one I've used personally) would be to do the following:
If the request fails, push an item to an SQS delay queue (docs)
This SQS message will only become visible on the queue after a certain delay (you mentioned an hour; note that SQS caps DelaySeconds at 15 minutes, so to cover a full hour the message should also carry a timestamp for the polling function to check)
Then have a second, scheduled Lambda function that is triggered by a cron expression on a smaller timeframe (I used a minute)
This second function would then check the SQS queue and, if an item is on the queue, call your first Lambda function (either via SNS or with the AWS SDK) to retry it
PS: Note that you can put data in an SQS message. Since you mentioned you need the Lambda invocations to be identical, you can store your first function's input in the message to be reused after an hour.
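A minimal sketch of the re-queueing step with boto3 (the queue URL is hypothetical), keeping the 15-minute DelaySeconds cap in mind:

import json
import time
import boto3

sqs = boto3.client('sqs')
QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/retry-queue'  # hypothetical

def on_request_failed(original_event):
    # Persist the original request so the scheduled function can retry it.
    # DelaySeconds is capped at 900 (15 minutes), so the retrying function
    # should also compare 'failed_at' against the desired one-hour delay.
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        DelaySeconds=900,
        MessageBody=json.dumps({
            'original_request': original_event,
            'failed_at': int(time.time()),
        }),
    )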
I suggest that you take a closer look at AWS Step Functions for this. Basically, Step Functions is a state machine that allows you to execute a Lambda function, i.e. a task, in each step.
More information can be found if you log in to the AWS Console and choose "Step Functions" from the "Services" menu. By pressing the Get Started button, several example implementations of different Step Functions are presented. First, I would take a closer look at the "Choice state" example (to determine whether or not the HTTP request was successful). If not, proceed with the "Wait state" example.