I am using Amazon CloudWatch to trigger 4 different Lambda functions every twelve hours. The Lambda functions pull some data from an API and save it to my database. I want to make sure that the timestamp matches for the data across all my Lambda functions. Initially I used the PostgreSQL default timestamp; however, this records time to the millisecond, which introduces small discrepancies in time.
It seems like the CloudWatch rule which invokes my Lambda functions might be able to pass along an identical timestamp, but I haven't been able to figure out how to do this, or even verify whether it is possible.
I really don't need the timestamp to go to the minute. Mostly I am concerned with the date and whether it was the AM or PM batch, so knowing the time to the nearest hour is good enough.
If any AWS experts could lend me some advice it would be appreciated.
The scheduled CloudWatch (CW) Event rule passes the following event object to the lambda function, e.g.:
{
    "version": "0",
    "id": "a75ba59d-81d6-8363-8e68-593f7de30b09",
    "detail-type": "Scheduled Event",
    "source": "aws.events",
    "account": "32323232",
    "time": "2021-02-21T06:29:27Z",
    "region": "us-east-1",
    "resources": [
        "arn:aws:events:us-east-1:32323232:rule/test"
    ],
    "detail": {}
}
As you can see, time is measured to the second. Also, CW does not guarantee exact execution of its events. They can be off by 1 minute:
Your scheduled rule is triggered within that minute, but not on the precise 0th second
So your four functions will see slightly different times. Thus, you have to manage that in your code, for example by rounding the time to the nearest hour.
The alternative is to use your Lambda environment's built-in tools for getting a timestamp instead of using the time from the event. This can be easier, as you can generate a timestamp with a precision of one hour directly, rather than parsing the time from the event and then post-processing it to get the desired precision.
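A minimal Python sketch of the rounding approach, assuming the handler receives the scheduled event shown above; the commented line shows the alternative of using the function's own clock:

from datetime import datetime, timedelta, timezone

def lambda_handler(event, context):
    # Parse the 'time' field from the scheduled event (second precision)
    event_time = datetime.strptime(event['time'], '%Y-%m-%dT%H:%M:%SZ').replace(tzinfo=timezone.utc)
    # Alternatively, ignore the event entirely and use the function's own clock:
    # event_time = datetime.now(timezone.utc)

    # Round to the nearest hour so all four functions agree on the batch timestamp
    batch_time = (event_time + timedelta(minutes=30)).replace(minute=0, second=0, microsecond=0)
    # ... write batch_time to the database instead of the PostgreSQL default ...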
When you run a long-running Step Functions execution that includes many steps (sometimes thousands of events in the history), you may get a failure that is several pages down in the console. I have to keep clicking "Load more" to be able to see the actual error.
There has to be a better way. How do you solve this?
To make this easier, we need to use the AWS CLI.
First, make sure we can list the specific execution by copying the execution ARN from the web console and using that to show the execution details using the CLI:
aws stepfunctions describe-execution --execution-arn <EXECUTION ARN>
The reply should look something like this:
{
    "executionArn": "arn:aws:states:us-east-1:123456789012:execution:my-execution-id",
    "stateMachineArn": "arn:aws:states:us-east-1:123456789012:stateMachine:my-state-machine-name",
    "status": "FAILED",
    "startDate": "2021-10-28T08:31:04.138000+02:00",
    "stopDate": "2021-10-28T08:33:37.471000+02:00",
    "name": "my-execution-name",
    "input": "{\"foo\":\"bar\"}"
}
To show the steps in the execution, we can use the CLI to list the execution history. As we're dealing with an execution that has many steps, it's smart to reverse the list and limit the number of results to something small, like 5.
aws stepfunctions get-execution-history --reverse-order --max-items 5 --execution-arn <EXECUTION ARN>
This will very likely show you the failing step, since it's normally at the end of the execution steps.
You can enable logging on the step function, then load the step function logs into CloudWatch Logs Insights, and write a query like:
fields @timestamp, @message
| filter type like "TaskFailed"
which will return all the TaskFailed-type events in the time period of your search.
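The same query can also be run programmatically with boto3 if you prefer; a rough sketch (the log group name is a placeholder for whatever you configured on the state machine):

import time
import boto3

logs = boto3.client('logs')

query = logs.start_query(
    logGroupName='/aws/vendedlogs/states/my-state-machine',  # placeholder
    startTime=int(time.time()) - 3600,  # last hour
    endTime=int(time.time()),
    queryString='fields @timestamp, @message | filter type like "TaskFailed"',
)

# Poll until the query finishes, then print the matching events
results = logs.get_query_results(queryId=query['queryId'])
while results['status'] in ('Scheduled', 'Running'):
    time.sleep(1)
    results = logs.get_query_results(queryId=query['queryId'])
for row in results['results']:
    print(row)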
I have a scenario where I will have a set of tasks to be executed at specific times.
For example:
task1: 28-06-2020 1:00 AM
task2: 30-06-2020 2:00 AM
task3: 01-07-2020 12:00 PM
.
.
.
n
I want to trigger my Lambda (where my logic is defined) at these specified times.
I would probably be storing my execution times in a database.
Can someone tell me a way to execute a Lambda at a specified time?
I know we have the TTL mechanism in DynamoDB which can trigger a Lambda, but it can delay the execution by up to 48 hours. I want my Lambda to execute at the precise times.
You can use CloudWatch Events cron expressions for specific dates to execute only once. You would have to create a rule for each date in question. This is based on the assumption that there is no regular pattern to the repeatability of the dates.
The rules would trigger your lambda on these specific dates.
For example, for the dates in the question, you could use:
cron(0 1 28 6 ? 2020) for task1 (28-06-2020 1:00 AM)
cron(0 2 30 6 ? 2020) for task2 (30-06-2020 2:00 AM)
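A rough boto3 sketch of creating one such one-shot rule (the rule name and Lambda ARN are placeholders):

import boto3

events = boto3.client('events')

# cron(Minutes Hours Day-of-month Month Day-of-week Year)
events.put_rule(
    Name='task1-2020-06-28',  # hypothetical rule name
    ScheduleExpression='cron(0 1 28 6 ? 2020)',
    State='ENABLED',
)
events.put_targets(
    Rule='task1-2020-06-28',
    Targets=[{'Id': 'task1', 'Arn': 'arn:aws:lambda:us-east-1:123456789012:function:my-task'}],
)

Note that the Lambda also needs a resource-based permission (lambda add-permission) allowing events.amazonaws.com to invoke it.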
Given that you will have potentially 1000+ events at various times of day, you will need to implement your own solution. I would recommend:
Store events in a database (eg date, time, repetition pattern)
Use AWS CloudWatch Events to trigger an AWS Lambda function every minute
Code the Lambda function to:
Query the database for unprocessed events that are due (or past-due)
Invoke the appropriate Lambda function
Delete the event from the database, or mark it as processed (for repeating events, store a 'last processed' time)
Functions will potentially be invoked a few seconds late due to these processing steps, but that should be fine unless you need high-precision timing.
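A minimal sketch of that polling function, assuming a hypothetical DynamoDB table named scheduled_events with an ISO-8601 due_at attribute:

import json
from datetime import datetime, timezone

import boto3

table = boto3.resource('dynamodb').Table('scheduled_events')  # hypothetical table
lambda_client = boto3.client('lambda')

def lambda_handler(event, context):
    now = datetime.now(timezone.utc).isoformat()
    # Find unprocessed events that are due or past-due.
    # (At scale, an index on due_at would be better than a scan.)
    due = table.scan(
        FilterExpression='due_at <= :now AND attribute_not_exists(processed_at)',
        ExpressionAttributeValues={':now': now},
    )['Items']
    for item in due:
        lambda_client.invoke(
            FunctionName=item['function_name'],
            InvocationType='Event',  # asynchronous invocation
            Payload=json.dumps(item.get('payload', {})),
        )
        # Mark as processed (for repeating events, store a 'last processed' time instead)
        table.update_item(
            Key={'id': item['id']},
            UpdateExpression='SET processed_at = :now',
            ExpressionAttributeValues={':now': now},
        )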
Step Functions as an ad-hoc scheduler could be a good option for this use case:
Query the database and schedule an execution for the specific date/time in a Step Functions state machine
In the Step Functions execution, map the Lambda that needs to be triggered
Once the Lambda is triggered at the desired time, the required business functionality can be implemented (see the sketch below)
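A rough boto3 sketch of the idea, using a Wait state with TimestampPath so each execution sleeps until its own due time (all ARNs and names below are placeholders):

import json
import boto3

sfn = boto3.client('stepfunctions')

# Sketch of the state machine definition: wait until the timestamp in the
# execution input, then invoke the target Lambda.
definition = {
    "StartAt": "WaitUntilDue",
    "States": {
        "WaitUntilDue": {"Type": "Wait", "TimestampPath": "$.due_at", "Next": "InvokeTask"},
        "InvokeTask": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:my-task",
            "End": True,
        },
    },
}

# One-time setup (the role must be able to invoke the Lambda):
# sfn.create_state_machine(name='ad-hoc-scheduler',
#                          definition=json.dumps(definition),
#                          roleArn='arn:aws:iam::123456789012:role/sfn-role')

# Per scheduled task: start an execution carrying its due time
sfn.start_execution(
    stateMachineArn='arn:aws:states:us-east-1:123456789012:stateMachine:ad-hoc-scheduler',
    input=json.dumps({'due_at': '2020-06-28T01:00:00Z'}),
)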
References:
https://medium.com/serverless-transformation/serverless-event-scheduling-using-aws-step-functions-b4f24997c8e2
https://meetrix.io/blog/aws/06-using-step-functions-to-schedule-your-lambda.html
https://blog.smirnov.la/step-functions-as-an-ad-hoc-scheduling-mechanism-ed1787e44bb1
I have to implement functionality that requires delayed sending of a message to a user, once, on a specific date, which can be anytime from tomorrow to a few months from now.
All our code is so far implemented as lambda functions.
I'm considering three options on how to implement this:
Create an entry in DynamoDB with the hash key being the date and the range key being a unique ID. Schedule a Lambda to run once a day and pick up all entries/tasks scheduled for that day, sending a message for each of them.
Using the SDK, create a CloudWatch Events rule with a cron expression indicating a single execution and make it invoke the Lambda function (target) with the ID of the user/message. The Lambda would be invoked on the specific schedule with the specific user/message to be delivered.
Create a Step Functions instance and configure it to sleep, then invoke a step with the logic to send the message when the right moment comes.
Do you have perhaps any recommendation on what would be best practice to implement this kind of business requirement? Perhaps an entirely different approach?
It largely depends on scale. If you'll only have a few scheduled at any point in time, then I'd use the CloudWatch Events approach. It's very low overhead and doesn't involve running code just to sit and do nothing.
If you expect a LOT of schedules, then the DynamoDB approach is very possibly the best one. Run the Lambda on a fixed schedule and see which records have not yet been run and are at or past the current time. In this model you'll want to delete the records that you've already processed (or mark them in some way) so that you don't process them again. Don't rely on the schedule running at certain intervals and checking for records between the last run time and the current time unless you are recording when the last run was (i.e. don't assume you ran a minute ago just because you scheduled it to run every minute).
Step Functions could work if the time isn't too far out. You can include a delay in the step that causes it to just sit and wait. The delays in Step Functions are just that, delays, not scheduled times, so you'd have to figure out that delay yourself and hope it fires close enough to the time you expect. This one isn't a bad option for mid to low volume.
Edit:
Step Functions Wait states now support waiting until an absolute time (the Timestamp and TimestampPath fields). This is a really good option for what you are describing.
As of November 2022, the cleanest approach would be to use EventBridge Scheduler's one-time schedule.
A one-time schedule will invoke a target only once at the date and time that you specify using a valid date, and a timestamp. EventBridge Scheduler supports scheduling in Universal Coordinated Time (UTC), or in the time zone that you specify when you create your schedule. You configure a one-time schedule using an at expression.
Here is an example using the AWS CLI:
aws scheduler create-schedule --schedule-expression "at(2022-11-30T13:00:00)" --name schedule-name \
--target '{"RoleArn": "role-arn", "Arn": "QUEUE_ARN", "Input": "TEST_PAYLOAD" }' \
--schedule-expression-timezone "America/Los_Angeles" \
--flexible-time-window '{ "Mode": "OFF"}'
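If you'd rather create the schedule from code, here is a boto3 sketch of the same call (placeholders as in the CLI example):

import boto3

scheduler = boto3.client('scheduler')

scheduler.create_schedule(
    Name='schedule-name',
    ScheduleExpression='at(2022-11-30T13:00:00)',
    ScheduleExpressionTimezone='America/Los_Angeles',
    FlexibleTimeWindow={'Mode': 'OFF'},
    Target={
        'RoleArn': 'role-arn',  # IAM role that EventBridge Scheduler assumes
        'Arn': 'QUEUE_ARN',     # the target, e.g. an SQS queue or Lambda function
        'Input': 'TEST_PAYLOAD',
    },
)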
Reference: Schedule types on EventBridge Scheduler - EventBridge Scheduler User Guide
Instead of using DynamoDB, I would suggest using S3: store the message and the time to trigger as key-value pairs.
Use an S3 Lambda trigger to create the CloudWatch rules that would target specific Lambdas, etc.
You can even schedule a cron for a Lambda that reads the files from S3 and updates the required cron for the message to be sent.
Hope this is in line with your requirements.
I am currently using the boto3 SDK from a Lambda function in order to retrieve various information about the Sagemaker Notebook Instances deployed in my account (almost 70 so not that many...)
One of the operations I am trying to perform is listing the tags for each instance.
However, from time to time it takes ages to return the tags: my Lambda either gets stopped (I could increase the timeout, but still...) or a ThrottlingException is raised by the sagemaker.list_tags function (which can be avoided by increasing the number of retries upon sagemaker boto3 client creation):
import time
import boto3
from botocore.config import Config

sagemaker = boto3.client("sagemaker", config=Config(retries=dict(max_attempts=10)))
instances_dict = sagemaker.list_notebook_instances()

if not instances_dict['NotebookInstances']:
    return "No Notebook Instances"  # (this snippet lives inside the Lambda handler)

while instances_dict:
    for instance in instances_dict['NotebookInstances']:
        print(instance['NotebookInstanceArn'])
        start = time.time()
        tags_notebook_instance = sagemaker.list_tags(ResourceArn=instance['NotebookInstanceArn'])['Tags']
        print(time.time() - start)
    instances_dict = sagemaker.list_notebook_instances(NextToken=instances_dict['NextToken']) if 'NextToken' in instances_dict else None
If you guys have any ideas for avoiding such delays :)
TY
As you've noted, you're getting throttled. Rather than increasing the number of retries, you might try changing the delay (i.e. increasing the growth_factor). It seems to be configurable, looking at https://github.com/boto/botocore/blob/develop/botocore/data/_retry.json#L83
Note that buckets (and refill rates) are usually at the second granularity. So with 70 ARNs you're looking at some number of seconds; double digits does not surprise me.
You might want to consider breaking up the work differently since adding retries/larger growth_factor will just increase the length of time the function will run.
I've had pretty good success at breaking things up so that the Lambda function only processes a single ARN per invocation. The Lambda is processing work (I'll typically use a SQS queue to manage what needs to be processed) and the rate of work is configurable via a combination of configuring the Lambda and the SQS message visibility.
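A minimal sketch of that pattern, assuming each SQS message body carries a single notebook instance ARN:

import boto3

sagemaker = boto3.client('sagemaker')

def lambda_handler(event, context):
    # With an SQS event source mapping and a batch size of 1,
    # each invocation handles exactly one ARN.
    for record in event['Records']:
        arn = record['body']
        tags = sagemaker.list_tags(ResourceArn=arn)['Tags']
        print(arn, tags)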
Not knowing what you're trying to accomplish outside of your original Lambda, I realize that breaking up the work this way might (or will) add challenges to what you're doing overall.
It's also worth noting that if you have CloudTrail enabled the tags will be part of the event data (request data) for the "EventName" (which matches the method called, i.e. CreateTrainingJob, AddTags, etc.).
A third option: if you are trying to find all of the notebook instances with a specific tag, you can use Resource Groups to create a query and find the ARNs with those tags fairly quickly.
CloudTrail: https://docs.aws.amazon.com/awscloudtrail/latest/APIReference/Welcome.html
Resource Groups: https://docs.aws.amazon.com/ARG/latest/APIReference/Welcome.html
Lambda with SQS: https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html
I want to be able to set a time to invoke an AWS Lambda function, then have that function be invoked then and only then. For example, I want my Lambda function to run at 9:00pm on December 19th, 2017. I don't want it to repeat, I don't want it to invoke now, just at 9:00pm on the 19th.
I understand that CloudWatch provides Scheduled Events, and I was thinking that when a time to schedule this reminder for is inputted, a CloudWatch Scheduled Event is created to fire that amount of time from now (so if you schedule it at 8:22pm to run at 9pm, it'll be 38 minutes); it then invokes the Lambda function at 9pm, which deletes the CloudWatch Scheduled Event. My issue with this is that when a CloudWatch Scheduled Event is created, it executes right then, and then at the specified interval.
Any other ideas would be appreciated, as I can't think of another solution. Thanks in advance!
You can schedule a Lambda event using the following syntax:
cron(Minutes Hours Day-of-month Month Day-of-week Year)
Note: All fields are required and the time zone is UTC only.
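For example, the one-off 9:00pm December 19th, 2017 invocation from the question would be (assuming 9:00pm is meant as UTC; otherwise convert first):

cron(0 21 19 12 ? 2017)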
Please refer to the AWS documentation for details.
Thanks
You can use the DynamoDB TTL feature to implement this easily; simply do the following:
1- Put an item with TTL set to the exact time you want to execute or invoke the lambda function.
2- Configure DynamoDB Streams to trigger a lambda function on the item's remove event.
Once the item/record expires, your lambda will be invoked. You don't have to delete or clean anything up, as the item in DynamoDB is already gone.
NOTE: While the approach is easy to implement and scales very well, there's one precaution to mention: using DynamoDB TTL as a scheduling mechanism cannot guarantee exact time precision, as there might be a delay. The scheduled tasks are executed a couple of minutes behind.
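A rough Python sketch of both steps (the table name and schema are hypothetical, and the TTL attribute must be enabled on the table):

import time
import boto3

table = boto3.resource('dynamodb').Table('scheduled_jobs')  # hypothetical table

# 1- Put an item whose TTL is the desired execution time (epoch seconds)
table.put_item(Item={
    'id': 'job-123',
    'ttl': int(time.time()) + 3600,  # fire in roughly one hour
    'payload': '{"foo": "bar"}',
})

# 2- Lambda attached to the table's stream: act only on expirations.
# (TTL deletions can be told apart from manual deletes by checking
# record['userIdentity'], whose principalId is dynamodb.amazonaws.com.)
def lambda_handler(event, context):
    for record in event['Records']:
        if record['eventName'] == 'REMOVE':
            old_image = record['dynamodb'].get('OldImage', {})
            print('expired item:', old_image)  # ... do the scheduled work here ...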
You can schedule a step function which can wait until a specific point in time before invoking the lambda with an arbitrary payload.
https://docs.aws.amazon.com/step-functions/latest/dg/amazon-states-language-wait-state.html
Something like this
const AWS = require('aws-sdk')
const stepFunctions = new AWS.StepFunctions()

// email, timestamp and initiatedBy come from the caller's context;
// base64 is an assumed encoding helper
const payload = {
  stateMachineArn: process.env.SCHEDULED_LAMBDA_SF_ARN,
  name: `${base64.encode(email)}-${base64.encode(timestamp)}`, // Dedupe key
  input: JSON.stringify({
    timestamp,
    lambdaName: 'myLambdaName',
    lambdaPayload: {
      email,
      initiatedBy
    },
  }),
}
await stepFunctions.startExecution(payload).promise()
I understand it's quite late to answer this question, but anyone who wants to use a CRON expression to trigger an event (or call an API) only once can use the following example:
This event will be triggered only once, on January 1, 2025 at 12:00:00 GMT:
00 12 01 01 ? 2025
For those who do not have much knowledge of cron syntax:
Minutes Hours DayOfMonth Month DayOfWeek Year
I am using this with AWS CloudWatch Events.
Note: I did not have to specify the day of the week, since I have given it a fixed date and that's obvious.
Invoking a Lambda function via events is an asynchronous invocation option. When using a CloudWatch Events cron schedule to trigger a Lambda function, the function can be triggered multiple times for the same cron schedule. See:
https://cloudonaut.io/your-lambda-function-might-execute-twice-deal-with-it/
But the workaround there needs DynamoDB to be implemented in your account, and you then make your Lambda function idempotent.
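A sketch of the idempotency guard that article describes, using a hypothetical DynamoDB table and a conditional write keyed on the event id:

import boto3
from botocore.exceptions import ClientError

table = boto3.resource('dynamodb').Table('processed_events')  # hypothetical table

def lambda_handler(event, context):
    try:
        # Succeeds only for the first invocation carrying this event id
        table.put_item(
            Item={'event_id': event['id']},
            ConditionExpression='attribute_not_exists(event_id)',
        )
    except ClientError as e:
        if e.response['Error']['Code'] == 'ConditionalCheckFailedException':
            return  # duplicate invocation; skip
        raise
    # ... the actual work goes here, executed at most once per event id ...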