Trigger another Lambda a week after the first Lambda's execution - amazon-web-services

I am working on code where Lambda Function 1 (call it L1) executes on messages from an SQS queue. I want to execute another Lambda (call it L2) exactly a week after L1 completes, and I want to pass L1's output to L2.
Execution Environment: Java
For my application, we are expecting around 10k requests to L1 per day, and the same number of requests to L2. Since each deferred execution waits a week, we can have around 70k active executions at peak.
Things that I have tried:
Cloudwatch events with cron: I can schedule a cron with a specified time or date which will trigger L2, but I couldn't find a way to pass input with a scheduled Cloudwatch event.
Cloudwatch events with new rules: At the end of the first Lambda I can create a new Cloudwatch rule with a specified time and specified input. But that will create as many rules as there are requests (in my case, around 10k new Cloudwatch rules every day). Not sure if that is a good practice or even supported.
Step function: There are two types of Step Functions workflows in play today.
Standard: Supports waits of up to a year, but only 25k active executions at any time. Won't scale, since my application will already have around 70k active executions at the end of the first week.
https://docs.aws.amazon.com/step-functions/latest/dg/limits.html
Express: Doesn't have a limit on the number of active executions, but supports a maximum execution duration of 5 minutes. It will time out after that.
https://docs.aws.amazon.com/step-functions/latest/dg/express-limits.html

It would be easy to create a new Cloudwatch Rule with the "week later" Lambda as a target as the last step in the first Lambda. You would set a Rule with a cron that runs one time, one week out. The Target then has an input field where you supply the payload for that Lambda.
You didn't indicate your programming environment, but you can do something similar to this (pseudo-code, based on the Java SDK v2):
CloudWatchEventsClient client = CloudWatchEventsClient.create();
String lambdaArn = "the one week from today lambda arn";
// CloudWatch cron format: cron(minute hour day-of-month month day-of-week year)
String ruleArn = client.putRule(PutRuleRequest.builder()
        .name("myRule").scheduleExpression("cron(17 20 23 7 ? *)").build()).ruleArn();
Target target = Target.builder().id("myRuleTarget").arn(lambdaArn)
        .input("{\"message\": \"blah\"}").build();
client.putTargets(PutTargetsRequest.builder().rule("myRule").targets(target).build());
This will create a Cloudwatch Event Rule that runs one time, 1 week from today with the input as shown.
Major Edit
With your new requirements (at least 1 week later, tens of thousands of events) I would not use the method I described above, as there are just too many things happening. Instead, I would have a database of events that acts as a queue. Either a DynamoDB or RDS database will suffice. At the end of each "primary" Lambda run, insert an event with the date and time of the next run. For example, today, July 18, I would insert July 25. The table would be something like (PostgreSQL syntax):
create table event_queue (
    run_time timestamp not null,
    lambda_input varchar(8192)
);
create index on event_queue (run_time);
Where the lambda_input column has whatever data you want to pass to the "week later" Lambda. In PostgreSQL you would do something like:
insert into event_queue (run_time, lambda_input)
values ((current_timestamp + interval '1 week'), '{"value":"hello"}');
Every database has date/time functions similar to those shown, and where it doesn't, the code to compute the timestamp yourself isn't terrible.
Now, in CloudWatch create a rule that runs once an hour (the resolution can be tuned). It will trigger a Lambda that "feeds" an SQS queue. The Lambda will query the database:
select * from event_queue where run_time < current_timestamp
and, for each row, put a message into an SQS queue. The last thing it does is delete these "old" rows using the same where clause. A sketch of such a feeder is below.
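Here is a minimal sketch of that feeder, assuming a JDBC connection to the event_queue table above, the AWS SDK for Java v2 SQS client, and a hypothetical queue URL supplied via configuration:

import java.sql.*;
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.SendMessageRequest;

public class FeederLambda {
    public void feed(Connection conn, SqsClient sqs, String queueUrl) throws SQLException {
        // Pin the cutoff once so the delete below removes exactly the rows we sent.
        Timestamp cutoff = new Timestamp(System.currentTimeMillis());
        try (PreparedStatement select = conn.prepareStatement(
                "select lambda_input from event_queue where run_time < ?")) {
            select.setTimestamp(1, cutoff);
            try (ResultSet rs = select.executeQuery()) {
                while (rs.next()) {
                    // One SQS message per due event; the body is the stored Lambda input.
                    sqs.sendMessage(SendMessageRequest.builder()
                            .queueUrl(queueUrl)
                            .messageBody(rs.getString("lambda_input"))
                            .build());
                }
            }
        }
        try (PreparedStatement delete = conn.prepareStatement(
                "delete from event_queue where run_time < ?")) {
            delete.setTimestamp(1, cutoff);
            delete.executeUpdate();
        }
    }
}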
On the other side you have your "week later" Lambdas that are getting events from the SQS queue. These Lambdas are idle until a set of messages are put into the queue. At that time they fire up and empty the queue, doing whatever the "week later" Lambda is supposed to do.
By running the "feeder" Lambda hourly you basically capture everything that is 1 week plus up to 1 hour old. The less often you run it, the more work your "week later" Lambdas have to do per run; conversely, running it every minute adds load to the database but removes it from the "week later" Lambdas.
This should scale well, assuming the "feeder" Lambda can keep up. 10k transactions / 24 hours is only ~416 transactions per hour, and reading the DB and creating the messages should be very quick. Even scaling that by 10, to 100k/day, is still only ~4,200 rows and messages per hour, which, again, should be very doable.

Cloudwatch is more for cron jobs. To trigger something at a specific timestamp or after X amount of time, I would recommend using Step Functions instead.
You can achieve your use case with a State Machine that has a Wait State (you can tell it how long to wait based on your input) followed by your Lambda Task State, similar to the sketch below.
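For illustration, a minimal state-machine sketch in Amazon States Language, assuming the execution input carries an executeAt timestamp and a hypothetical ARN for the second Lambda:

{
  "StartAt": "WaitUntil",
  "States": {
    "WaitUntil": {
      "Type": "Wait",
      "TimestampPath": "$.executeAt",
      "Next": "InvokeL2"
    },
    "InvokeL2": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:us-east-1:123456789012:function:L2",
      "End": true
    }
  }
}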

Related

Recursive AWS lambda with updating state

I have an AWS Lambda that polls an external server for new events every 6 hours. On every call, if there are any new events, it publishes the updated total number of events polled to an SNS topic. So I essentially need to call the Lambda at fixed intervals but also pass a counter state across calls.
I'm currently considering the following options:
- Store the counter somewhere on EFS/S3, but it seems overkill for a simple number
- EventBridge, which would be OK to schedule the execution, but doesn't store state across calls
- A Step Function with a loop + wait on the Lambda would do it, but it doesn't seem to be the most efficient/cost-effective way to do it
- Use SQS with a delay so that the Lambda essentially triggers itself, passing the updated state. Again, I don't think this is the most effective, and to actually get to the 6-hour delay I would have to implement some checks/delays within the Lambda, as the max delay for SQS is 15 minutes
What would be the best way to do it?
For scheduling Lambda at intervals, you can use CloudWatch Events. Scheduling Lambda using the Serverless framework is a breeze: a cron-type statement can schedule your Lambda call. Here's a guide on scheduling: https://www.serverless.com/framework/docs/providers/aws/events/schedule
As for saving data, you can use AWS Systems Manager Parameter Store. It's a simple key-value store for such small amounts of data.
https://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-parameter-store.html
Or you can save it in DynamoDB. Since the data is small and the frequency is low, you won't be charged much, and there's no hassle of reading or parsing files.
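For example, a minimal Parameter Store sketch with the AWS SDK for Java v2, assuming a hypothetical parameter named /myapp/event-counter:

import software.amazon.awssdk.services.ssm.SsmClient;
import software.amazon.awssdk.services.ssm.model.*;

SsmClient ssm = SsmClient.create();
int counter = 0;
try {
    // Read the current counter; the very first run won't find the parameter.
    counter = Integer.parseInt(ssm.getParameter(
            GetParameterRequest.builder().name("/myapp/event-counter").build())
            .parameter().value());
} catch (ParameterNotFoundException e) {
    // First run: start from 0.
}
// Write the updated counter back for the next invocation.
ssm.putParameter(PutParameterRequest.builder()
        .name("/myapp/event-counter")
        .value(Integer.toString(counter + 1))
        .type(ParameterType.STRING)
        .overwrite(true)
        .build());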

How can I set an AWS lambda to be triggered 24 hours after the beginning of a process?

I have the following schema (using AWS IoT Core and AWS Lambda) that starts upon some MQTT event. At the end of the main process a success flag is saved in the database. This process takes around 5 minutes. I would like to call a Lambda function 24 hours after the beginning of the process to check whether the flag exists in the database. How can I set this timeout or delay on the Lambda function without Step Functions?
There are two possible paths to "wait 24 hours" until execution of an AWS Lambda function.
Using a 'wait' step in AWS Step Functions
AWS Step Functions is ideal for orchestrating multi-step Lambda functions. There is a Wait state that can pause the flow, which can then trigger another Lambda function. Step Functions tracks each individual request, so you can see each execution and its current state. Sounds good!
Schedule your own check
Use Amazon CloudWatch Events to trigger an AWS Lambda function at regular intervals. The Lambda function could query the database to locate records for processes that did not complete successfully. However, since your code needs to find "non-successful" processes, each process would first need to be added to the database (e.g. at the start of the 'Process' function).
Alternate approach: Only track failures
The architecture in the above diagram stores 'success' in the database, which is then checked 24 hours later. Based on this diagram, it appears as though the only purpose of the database is to track successful processes.
Instead, I would recommend:
- Changing the top line to only store 'failed' processes, and the time of their failure (assuming failure can be detected)
- Triggering an AWS Lambda function at regular intervals that will:
  - Check the database
  - Retrieve failed processes that are older than 24 hours
  - Submit them for re-processing
  - Delete the records from the database, or mark them as 'retried' to avoid future retrievals
Basically, the database is used to track failures rather than successes. The "wait 24 hours" step is replaced with a regular database check for failed processes, along the lines of the sketch below.
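A minimal sketch of that periodic check in SQL, assuming a hypothetical failed_processes table with process_id, failed_at and retried columns:

select process_id from failed_processes
where failed_at < current_timestamp - interval '24 hours'
  and retried = false;
-- after re-submitting a process for re-processing:
update failed_processes set retried = true where process_id = :id;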
I'm not sure why the system wants to wait 24 hours before sending a 'success' email, but I presume that is an intentional part of your design.
You can use CloudWatch Events (or EventBridge) to schedule a Lambda function to run once, at a set time.
To create a CloudWatch Event rule, you can use the PutRule API operation to create the rule and then add the Lambda function as the target using PutTargets. The Lambda function should also allow CloudWatch Events to invoke the function in its function policy.
I also tested this out with the following cron expression: cron(0 0 1 1 ? 2021)
This would trigger the target Lambda function once, at Fri, 01 Jan 2021 00:00:00 GMT. Also note that the function may be invoked more than once, since CloudWatch Events invokes the function asynchronously. After the Lambda function has been invoked, you can delete the rule from the same function for clean-up.
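A minimal sketch of that clean-up with the AWS SDK for Java v2, assuming the target Lambda knows the rule and target names (here hypothetical); the targets must be removed before the rule can be deleted:

import software.amazon.awssdk.services.cloudwatchevents.CloudWatchEventsClient;
import software.amazon.awssdk.services.cloudwatchevents.model.DeleteRuleRequest;
import software.amazon.awssdk.services.cloudwatchevents.model.RemoveTargetsRequest;

CloudWatchEventsClient client = CloudWatchEventsClient.create();
// Remove the Lambda target first, then delete the now-empty rule.
client.removeTargets(RemoveTargetsRequest.builder()
        .rule("myOneShotRule").ids("myOneShotTarget").build());
client.deleteRule(DeleteRuleRequest.builder().name("myOneShotRule").build());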

Is there a way to set a walltime on AWS Batch jobs?

Is there a way to set a maximum running time for AWS Batch jobs (or queues)? This is a standard setting in most batch managers, which avoids wasting resources when a job hangs for whatever reason.
As of April 2018, AWS Batch supports setting a Job Timeout when submitting a job, or in the job definition.
https://aws.amazon.com/about-aws/whats-new/2018/04/aws-batch-adds-support-for-automatic-termination-with-job-execution-timeout/
You specify an attemptDurationSeconds parameter, which must be at least 60 seconds, either in your job definition, or when you submit the job. When this number of seconds has passed following the job attempt's startedAt timestamp, AWS Batch terminates the job. On the compute resource, your job's container receives a SIGTERM signal to give your application a chance to shut down gracefully; if the container is still running after 30 seconds, a SIGKILL signal is sent to forcefully shut down the container.
Source: https://docs.aws.amazon.com/batch/latest/userguide/job_timeouts.html
POST /v1/submitjob HTTP/1.1
Content-type: application/json

{
   ...
   "timeout": {
      "attemptDurationSeconds": number
   }
}
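The same call via the AWS SDK for Java v2 might look like this sketch (job, queue and definition names are hypothetical):

import software.amazon.awssdk.services.batch.BatchClient;
import software.amazon.awssdk.services.batch.model.JobTimeout;
import software.amazon.awssdk.services.batch.model.SubmitJobRequest;

BatchClient batch = BatchClient.create();
batch.submitJob(SubmitJobRequest.builder()
        .jobName("my-job")
        .jobQueue("my-queue")
        .jobDefinition("my-job-def")
        // Terminate the job (SIGTERM, then SIGKILL after 30s) one hour after it starts.
        .timeout(JobTimeout.builder().attemptDurationSeconds(3600).build())
        .build());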
AFAIK there is no feature to do this. However, a workaround was suggested in the forum for a similar question.
One idea is to call Batch as an Activity from Step Functions and ping back on a schedule (e.g. every minute) from that job. If it stops responding then you can detect that situation as a Timeout in the activity and act accordingly (terminate the job etc.). Not an ideal solution (especially if the job continues to ping back as a "zombie"), but it's a start. You'd also likely have to store activity tokens in a database to trace them to the Batch job id.
Alternatively, you split that setup into 2 steps, and schedule a Batch job from a Lambda in the first state, then pass the Batch job id to the second step, which then polls Batch (from another Lambda) for its state with Retry and IntervalSeconds (e.g. once every minute, or even with exponential backoff), and MaxAttempts calculated based on your timeout. This way, you don't need any external state storage mechanism, long polling or even a "ping back" from the job (it CAN be a zombie), but the downside is more steps.
There is no option to set a timeout on a Batch job, but you can set up a Lambda function that triggers every hour or so and terminates jobs created more than, say, 24 hours ago.
I have been working with AWS for some time now and could not find a way to set a maximum running time for Batch jobs.
However, there are some alternative ways you could utilize.
AWS Forum
Sadly there is no way to set an execution time limit on AWS Batch.
One solution may be to edit the Docker entry point to enforce the execution time limit itself.

Can I schedule a lambda function execution with a lambda function?

I'm looking for the ability to programmatically schedule a lambda function to run a single time with another lambda function. For example, I make a request to myFirstFunction with date and time parameters, and then at that date and time, have mySecondFunction execute. Is that possible only with stateless AWS services? I'm trying to avoid an always-on EC2 instance.
Most of the results I'm finding for scheduling a lambda functions have to do with cloudwatch and regularly scheduled events, not ad-hoc events.
This is a perfect use case for AWS Step Functions.
Use a Wait state with SecondsPath or TimestampPath to add the required delay before executing the next state, as in the sketch below.
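A minimal sketch of kicking this off from myFirstFunction with the AWS SDK for Java v2, assuming a pre-created state machine whose Wait state reads $.executeAt via TimestampPath (the ARN and field names are hypothetical):

import software.amazon.awssdk.services.sfn.SfnClient;
import software.amazon.awssdk.services.sfn.model.StartExecutionRequest;

SfnClient sfn = SfnClient.create();
sfn.startExecution(StartExecutionRequest.builder()
        .stateMachineArn("arn:aws:states:us-east-1:123456789012:stateMachine:delayedInvoke")
        // The Wait state sleeps until executeAt, then a Task state runs mySecondFunction.
        .input("{\"executeAt\": \"2020-12-24T00:00:00Z\", \"payload\": {\"greeting\": \"hello\"}}")
        .build());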
What you're trying to do (scheduling a Lambda from a Lambda) is not possible with the current AWS services.
So, in order to avoid an always-on EC2 instance, there are other options:
1) Use AWS default or custom metrics. You can use, for example, ApproximateNumberOfMessagesVisible or CPUUtilization (if your app fires a big CPU utilization when processing a request). You can also create a custom metric and fire it when your instance is idle (depending on the app that's running on your instance).
The problem with this option is that you'll waste already-paid minutes (AWS always charges a full hour, no matter if you used your instance for 15 minutes).
2) A better option, in my opinion, would be to run a Lambda function once per minute to check whether your instances are idle and shut them down only if they are close to the full hour.
import boto3
from datetime import datetime

def lambda_handler(event, context):
    print('ManageInstances function executed.')
    environments = [['instance-id-1', 'SQS-queue-url-1'], ['instance-id-2', 'SQS-queue-url-2'], ...]
    ec2_client = boto3.client('ec2')
    for environment in environments:
        instance_id = environment[0]
        queue_url = environment[1]
        print('Instance:', instance_id)
        print('Queue:', queue_url)
        rsp = ec2_client.describe_instances(InstanceIds=[instance_id])
        if rsp:
            status = rsp['Reservations'][0]['Instances'][0]
            if status['State']['Name'] == 'running':
                current_time = datetime.now()
                diff = current_time - status['LaunchTime'].replace(tzinfo=None)
                total_minutes = divmod(diff.total_seconds(), 60)[0]
                minutes_to_complete_hour = 60 - divmod(total_minutes, 60)[1]
                print('Started time:', status['LaunchTime'])
                print('Current time:', str(current_time))
                print('Minutes passed:', total_minutes)
                print('Minutes to reach a full hour:', minutes_to_complete_hour)
                if minutes_to_complete_hour <= 2:
                    sqs_client = boto3.client('sqs')
                    response = sqs_client.get_queue_attributes(QueueUrl=queue_url, AttributeNames=['All'])
                    messages_in_flight = int(response['Attributes']['ApproximateNumberOfMessagesNotVisible'])
                    messages_available = int(response['Attributes']['ApproximateNumberOfMessages'])
                    print('Messages in flight:', messages_in_flight)
                    print('Messages available:', messages_available)
                    if messages_in_flight + messages_available == 0:
                        ec2_resource = boto3.resource('ec2')
                        instance = ec2_resource.Instance(instance_id)
                        instance.stop()
                        print('Stopping instance.')
            else:
                print('Status was not running. Nothing is done.')
        else:
            print('Problem while describing instance.')
UPDATE - I wouldn't recommend using this approach. Things have changed in when TTL deletions happen, and they are not close to the TTL time. The only guarantee is that the item will be deleted after the TTL. Thanks @Mentor for highlighting this.
2 months ago AWS announced DynamoDB item TTL, which allows you to insert an item and mark when you wish for it to be deleted. It will be deleted automatically when the time comes.
You can use this feature in conjunction with DynamoDB Streams to achieve your goal: your first function inserts an item into a DynamoDB table, with the record's TTL set to when you want the second Lambda triggered. Set up a stream that triggers your second Lambda. In this Lambda you will identify deletion events, and if it's a delete, run your logic.
Bonus point - you can use the table item as a mechanism for the first lambda to pass parameters to the second lambda.
About DynamoDB TTL:
https://aws.amazon.com/blogs/aws/new-manage-dynamodb-items-using-time-to-live-ttl/
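A minimal sketch of the first Lambda's insert with the AWS SDK for Java v2, assuming a hypothetical deferred_events table whose ttl attribute is configured as the table's TTL attribute and whose stream includes old images:

import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.PutItemRequest;

DynamoDbClient ddb = DynamoDbClient.create();
// TTL attributes hold epoch seconds; here, roughly one week from now.
long fireAt = Instant.now().plus(Duration.ofDays(7)).getEpochSecond();
Map<String, AttributeValue> item = new HashMap<>();
item.put("id", AttributeValue.builder().s(UUID.randomUUID().toString()).build());
item.put("ttl", AttributeValue.builder().n(Long.toString(fireAt)).build());
// The payload doubles as the parameters passed to the second Lambda.
item.put("payload", AttributeValue.builder().s("{\"value\":\"hello\"}").build());
ddb.putItem(PutItemRequest.builder().tableName("deferred_events").item(item).build());
// The stream-handling (second) Lambda should act only on records whose
// eventName is "REMOVE", reading the payload from the record's old image.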
It does depend on your use case, but the idea of triggering something at a later date is a common pattern. The way I do it serverless: I have a React application that triggers an action to store a date in the future. I take a date format like 24-12-2020 and convert it using date(), having researched that the date format mentioned is correct (so I might try 12-24-2020 and see what I get!). When I am happy, I convert it to a Unix number; in React I use this code:
new Date(action.data).getTime() / 1000
where action.data is the date and maybe the time for the action.
I run React in Amplify (serverless) and store that in DynamoDB (serverless). I then run a Lambda function (serverless) to check my DynamoDB for any dates (I actually use the Unix time for now) and compare the two Unix timestamps, now and stored, which are both numbers, so the comparison is easy. This seems to me to be super easy and very reliable.
I just set the cron schedule on the Lambda to whatever is needed depending on the approximate frequency required; in most cases running a Lambda every five minutes is pretty good, although if I were only operating this in a certain time zone for a business weekday application I would control the Lambda a little more. Lambda is free for the first 1M invocations per month, and running it every few minutes will cost nothing. Obviously things change, so you will need to look that up in your area.
You will never get perfect timing in this scenario. It will, however, for the vast majority of use cases be close enough, depending on the timing settings of the Lambda function; you could set it up to check every minute or just once per day, it all depends on your application.
Alternatively, if I wanted an instant reaction to an event I might use SNS, SQS, or Kinesis to instantly stream a message; it all depends on your use case.
I'd opt for enqueuing deferred work to SQS using message timers in myFirstFunction, along the lines of the sketch below.
Currently, you can't use SQS as a Lambda event source, but you can either periodically schedule mySecondFunction to check the queue via scheduled CloudWatch Events (somewhat of a variant of the other options you've found) or use a CloudWatch alarm on ApproximateNumberOfMessagesVisible to fire an SNS message to a Lambda, avoiding constant polling for queues that are frequently inactive for long periods.
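A minimal sketch of the message timer with the AWS SDK for Java v2, assuming a hypothetical queue URL; note that delaySeconds is capped at 900 (15 minutes):

import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.SendMessageRequest;

SqsClient sqs = SqsClient.create();
sqs.sendMessage(SendMessageRequest.builder()
        .queueUrl("https://sqs.us-east-1.amazonaws.com/123456789012/deferred-work")
        .messageBody("{\"runAt\": \"2020-12-24T00:00:00Z\"}")
        // Per-message timer: the message stays invisible to consumers for 15 minutes.
        .delaySeconds(900)
        .build());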

SQS - Delivery Delay of 30 minutes

From the SQS documentation, the maximum delay we can configure for a message to be hidden from its consumers is 15 minutes - http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-delay-queues.html
Suppose I need to hide the messages for a day; what is the pattern?
For example, I want to mimic a daily cron for doing some action.
Thanks
The simplest way to do this is as follows:
SQS.push_to_queue({perform_message_at: "Thursday November 2022"}, delay: 15 mins)
Inside your worker:
message = SQS.poll_messages
if message.perform_message_at > Time.now
  SQS.push_to_queue({perform_message_at: "Thursday November 2022"}, delay: 15 mins)
else
  process_message(message)
end
Basically, push the message back to the queue with the maximum delay and only process it when its processing time is less than the current time.
HTH.
Visibility timeout can go up to 12 hours. I think you can hack something together where you process a message but don't delete it, and the next time it is processed it's been 12 hours. So a queue with one message and a visibility timeout of 12 hours gets you a 12-hour cron.
Cloudwatch is likely a better way to do it. You can create a scheduled rule (e.g. via the PutRule API) with the timer, and have it trigger either a Lambda function or an API call to whatever comes next.
Another way to do it is to use the "Wait" state in an AWS Step Function.
https://docs.aws.amazon.com/step-functions/latest/dg/amazon-states-language-wait-state.html
In any case, unless you are extremely sure you will never need anything more than 15 minutes, the SQS backdoor to add the delay seems hacky.
You can do this by adding a DLQ with MaxReceives set to 1 on the first queue.
Add a simple Lambda on the first queue and fail the message via the Lambda, so the message is moved to the DLQ automatically; you can then consume from the DLQ.
Both the primary queue and the DLQ can have a max 15-minute delay, so in total you get a 30-minute delay.
Your consumer app then receives the message after 30 minutes, without any custom logic on its side. A sketch of the queue setup is below.
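A minimal sketch of that wiring with the AWS SDK for Java v2, implementing the setup this answer describes and assuming the DLQ already exists with a known ARN (queue names are hypothetical):

import java.util.Map;
import software.amazon.awssdk.services.sqs.SqsClient;
import software.amazon.awssdk.services.sqs.model.CreateQueueRequest;
import software.amazon.awssdk.services.sqs.model.QueueAttributeName;

SqsClient sqs = SqsClient.create();
String dlqArn = "arn:aws:sqs:us-east-1:123456789012:delay-stage-two";
sqs.createQueue(CreateQueueRequest.builder()
        .queueName("delay-stage-one")
        .attributes(Map.of(
                // First 15-minute delay, applied to every message sent to this queue.
                QueueAttributeName.DELAY_SECONDS, "900",
                // After one failed receive, the message moves to the (also delayed) DLQ.
                QueueAttributeName.REDRIVE_POLICY,
                "{\"deadLetterTargetArn\":\"" + dlqArn + "\",\"maxReceiveCount\":\"1\"}"))
        .build());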
Two thoughts.
Untested: perhaps publish to an SNS topic that has no SQS queues subscribed. When delivery needs to happen, subscribe the queue to the topic. (I've not done this; I'm not sure if this would work as expected.)
Push messages as files to a central store (like S3). Create a worker that looks at the time-created timestamp and decides whether to publish them to a queue or not. If created >= 1 day ago, publish.
This was a challenge for us as well, and I never found a perfect solution, so I ended up building a service to address it. Obviously this is self-promotion, but the system allows you to work around the DelaySeconds limitation and set arbitrary dates/times at scale.
https://anticipated.io
Some of the challenges working with Step Functions are the scale of registered state machines (if your system has that requirement). If you use EventBridge to fire them, you can run out of allowable rules (the limit is 200 as of this posting). For example: if you need to set 150,000 arbitrary events a month, you run into limits quickly.