In a specific RDS column I keep, as a date, the information about when each user's trial ends.
I'm going to check these dates in the database every day, and when only a few days are left until the end of a trial, I want to send an email message (with SES).
How can I run a periodic task in AWS to check the database? I know that I can use:
Lambda
EC2 (or Elastic Beanstalk)
Is there any other solution which I've missed?
You can also use AWS Batch for this. It is a better fit if the job is heavy and takes a long time to complete.
How long does it take to run your check? If it takes less than 300 seconds and is well within the limits of Lambda (AWS Lambda Limits), then schedule the task with Lambda: Schedule Expressions Using Rate or Cron
Otherwise, the best option is AWS Data Pipeline. It is very easy to schedule and run your custom script periodically. Note that it charges for at least one hour of instance time.
Go with Lambda here.
You can create a Lambda function and direct AWS Lambda to execute it on a regular schedule. You can specify a fixed rate (for example, execute a Lambda function every hour or 15 minutes), or you can specify a Cron expression.
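For concreteness, here is a minimal sketch of what the scheduled check could look like, assuming a MySQL RDS instance, a hypothetical users table with email and trial_ends_at columns, and an SES-verified sender address; the schedule itself lives on the CloudWatch Events rule (e.g. rate(1 day)):

import datetime
import os

import boto3
import pymysql  # not in the Lambda runtime; bundle it with the deployment package

ses = boto3.client("ses")

def handler(event, context):
    conn = pymysql.connect(
        host=os.environ["DB_HOST"],
        user=os.environ["DB_USER"],
        password=os.environ["DB_PASSWORD"],
        db=os.environ["DB_NAME"],
    )
    # Find users whose trial ends within the next three days.
    cutoff = datetime.date.today() + datetime.timedelta(days=3)
    with conn.cursor() as cur:
        cur.execute("SELECT email FROM users WHERE trial_ends_at <= %s", (cutoff,))
        rows = cur.fetchall()
    for (email,) in rows:
        ses.send_email(
            Source="noreply@example.com",  # must be verified in SES
            Destination={"ToAddresses": [email]},
            Message={
                "Subject": {"Data": "Your trial is ending soon"},
                "Body": {"Text": {"Data": "Your trial ends in a few days."}},
            },
        )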
Related
We are using a task scheduler to run a series (sequence) of tasks/jobs (SOAP calls) on our site every night. However, we want to use AWS services such as Step Functions and Lambda to achieve the above requirement, as the task scheduler seems less reliable.
The challenge with Lambda is its 15-minute maximum timeout. As some of our jobs take more than 1 hour to process, I am having trouble figuring out which service could satisfy the requirement.
I am also looking into AWS Fargate, as an alternative.
Any suggestions/edits are welcome on which AWS services I could use to run jobs that take up to 1 hour or more.
You could look into AWS Batch, which is meant for long-running jobs. You can store your PowerShell scripts in a Docker container and schedule them to run. They will run until completion.
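As a rough illustration, and assuming a job queue and a job definition (wrapping the Docker image with your scripts) are already registered in AWS Batch, kicking off a job from Python is a single call; the names below are placeholders:

import boto3

batch = boto3.client("batch")

response = batch.submit_job(
    jobName="nightly-soap-calls",    # placeholder names
    jobQueue="nightly-job-queue",
    jobDefinition="soap-jobs:1",
)
print(response["jobId"])

A CloudWatch Events rule (or a Step Functions state machine, if you need the sequencing) can make that call on your nightly schedule.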
I have a batch job that I need to run on AWS, and I'm wondering what the best service to use is. The job needs to run once a day, so I think that naturally AWS Lambda with a CloudWatch Rule triggering it would do it. However, I'm starting to think that AWS Lambda is intended to be used as a service to handle requests. The official AWS library for integrating Spring Boot is very oriented toward handling HTTP requests, and when creating a lambda via the AWS Console, only test cases that send an input to the lambda can be written.
So, is this a use case for AWS Lambda? Also, these functions can run for up to 15 minutes. What should I use if my job needs to run longer?
The purpose of Lambda, as compared to AWS EC2, is to simplify building smaller, on-demand applications that are responsive to events and new information.
If your batch job runs within the 15-minute limit, then you can go with a Lambda function.
But if you want longer batch processing, you should check AWS Batch.
Here is a nice article which demonstrates the usage of AWS Batch.
If you are already using a batch framework like spring-batch, you can also take a look at ECS scheduled tasks with Fargate.
With ECS Fargate you can launch and stop container services that you need to run only at certain times.
Here are some related articles on Fargate events and scheduled tasks, and on Scheduled Tasks.
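As a hedged sketch of the scheduled-task wiring (all ARNs, subnet IDs, and names below are placeholders you would substitute), a CloudWatch Events rule can launch the Fargate task directly:

import boto3

events = boto3.client("events")

events.put_rule(Name="nightly-batch", ScheduleExpression="cron(0 2 * * ? *)")
events.put_targets(
    Rule="nightly-batch",
    Targets=[{
        "Id": "fargate-batch-task",
        "Arn": "arn:aws:ecs:us-east-1:123456789012:cluster/batch-cluster",
        "RoleArn": "arn:aws:iam::123456789012:role/ecsEventsRole",
        "EcsParameters": {
            "TaskDefinitionArn": "arn:aws:ecs:us-east-1:123456789012:task-definition/spring-batch-job:1",
            "TaskCount": 1,
            "LaunchType": "FARGATE",
            "NetworkConfiguration": {
                "awsvpcConfiguration": {
                    "Subnets": ["subnet-0123456789abcdef0"],
                    "AssignPublicIp": "ENABLED",
                }
            },
        },
    }],
)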
If you're confident that your function will run for a maximum of 15 minutes, AWS Lambda could be the solution. Here are the AWS Lambda limits that could help you decide on that.
Also note that Lambda has cold starts: it runs slower at first but eventually picks up the pace. Here are some good reads about it that could help you decide on the Lambda direction, but feel free to check any other articles that explain it better.
This one shows a brief list of things you would want to consider and the factors affecting cold starts.
This one might have a deeper explanation of the cold start with regard to how it works internally.
What should I use if my job needs to run longer?
Depending on your infrastructure, you could maybe explore Scheduled Tasks.
I am trying to come up with a way to have pieces of data processed at specific time intervals by invoking AWS Lambda every N hours.
For example: parse a page at a specific URL every 6 hours and store the result in an S3 bucket.
I have many (~100k) URLs, each processed that way.
Of course, you could have a VM that hosts some scheduler that triggers the lambdas, as described in this answer, but that breaks the "serverless" approach.
So, is there a way to do this using AWS services only?
Things I tried that do not work:
SQS can delay messages, but only for a maximum of 15 minutes (I need hours), and there is no built-in integration between SQS and Lambda, so you need some polling agent (a lambda?) that polls the queue all the time and sends new messages to a worker lambda, which again defeats the point of executing only at the scheduled time;
CloudWatch Alarms can send messages to SNS, which triggers Lambda. You can implement periodic lambda calls that way by using a future metric timestamp; however, an alarm message cannot carry custom data (think the url from the example above), so that does not work either;
I could create Lambda CloudWatch scheduled triggers programmatically, but they also cannot pass any data to the Lambda.
The only way I could think of is to have a DynamoDB table with "url" records, each with the timestamp of its last "processing", and a periodic lambda that queries the table and sends "old" records as jobs to another "worker" lambda (directly or via SNS).
That would work; however, you still need a "polling" lambda, which could become a bottleneck as the number of items to process grows.
Any other ideas?
100k jobs every 6 hours doesn't sound like a great use case for serverless, IMO. Personally, I would set up a CloudWatch event with a relevant cron expression that triggers a Lambda to start an EC2 instance that processes all the URLs (stored in DynamoDB), and script the EC2 instance to shut down after processing the last URL.
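If it helps, a sketch of that trigger lambda is tiny (the instance ID is a placeholder; the instance's own startup script would process the URLs and shut the machine down):

import boto3

ec2 = boto3.client("ec2")

def handler(event, context):
    # Start the pre-built worker instance; it shuts itself down when done.
    ec2.start_instances(InstanceIds=["i-0123456789abcdef0"])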
But that's not what you asked.
You could set up a CloudWatch event with a relevant cron expression that spawns a lambda (the orchestrator), which reads the URLs from DynamoDB, or even an S3 file, and then invokes a second lambda (the worker) for each URL to actually parse the pages.
Using this pattern you will start hitting concurrency issues at 1000 lambdas (1 orchestrator & 999 workers), less if you have other lambdas running in the same region. You can ask AWS to increase this limit, but I don't know under what scenarios they will do this, or how high they will increase the limit.
From here you have three choices (option 1 is sketched below):
1. Split out the payload so each worker lambda receives multiple URLs to process.
2. Add another column to your list of URLs and group the URLs by this column (e.g. the first 500 are marked with a 1, the second 500 with a 2, etc.). Your orchestrator lambda could then take URLs off the list in batches. This would require you to run the CloudWatch event at a greater frequency and manage state so that the orchestrator lambda, when invoked, knows which batch is next (I've done this at a smaller scale, just storing a variable in an S3 file).
3. Use some combination of options 1 and 2.
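Here is a minimal sketch of option 1, assuming a hypothetical DynamoDB table named urls and a worker function named worker; asynchronous invocation ("Event") lets the orchestrator return without waiting on the workers:

import json

import boto3

lambda_client = boto3.client("lambda")
dynamodb = boto3.resource("dynamodb")

BATCH_SIZE = 100

def handler(event, context):
    table = dynamodb.Table("urls")
    # Scan pagination is omitted for brevity.
    urls = [item["url"] for item in table.scan()["Items"]]
    for i in range(0, len(urls), BATCH_SIZE):
        lambda_client.invoke(
            FunctionName="worker",
            InvocationType="Event",  # asynchronous, fire-and-forget
            Payload=json.dumps({"urls": urls[i:i + BATCH_SIZE]}),
        )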
This looks like a fit for a batch processing scenario, with an AWS Lambda function as the job. It's serverless, but obviously adds a dependency on another AWS service.
At the same time, it has a dashboard, processing statuses, retries, and all the perks of a job scheduling service.
I am writing a server with a serverless model, currently AWS Lambda, and have a requirement to run a job at an exact datetime.
Right now I run a cron job with AWS CloudWatch that executes my server every minute, finds all tasks with a timestamp older than the present, and then performs those tasks. This is both wasteful and sometimes delayed (or early) by a minute relative to the actual time needed, because CloudWatch's maximum frequency is one ping per minute. Not a desirable approach.
And the work is not the same every day: the datetime can be set dynamically by a client pinging my server.
I wish there were a service like a message queue that could actively call a target URL at a scheduled timestamp. Is there something like that? It could be any service outside AWS, as long as I can give it a URL to request.
Thank you very much
Have you considered getting a small EC2 instance and setting up cron jobs there? It can then publish events to SNS or directly call the required tasks. And you should be able to schedule new jobs dynamically as well.
You can use DynamoDB with TTL, DynamoDB Streams and AWS Lambda for this.
Since the schedule is dynamic and coming from the user, you can save those items in a DynamoDB table with its TTL set to the scheduled execution time.
When the TTL is reached for an item, it is deleted and a record is written to the table's DynamoDB stream, which you can then use to trigger a Lambda function.
References:
DynamoDB Streams and Time To Live
DynamoDB Streams and AWS Lambda Triggers
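A minimal sketch of the stream-triggered function, assuming the stream is configured with the OLD_IMAGE view type: TTL expiry arrives as a REMOVE record whose userIdentity principal is the DynamoDB service, which distinguishes it from an ordinary delete.

def handler(event, context):
    for record in event["Records"]:
        if record["eventName"] != "REMOVE":
            continue
        identity = record.get("userIdentity", {})
        if identity.get("principalId") != "dynamodb.amazonaws.com":
            continue  # a normal user delete, not a TTL expiry
        item = record["dynamodb"]["OldImage"]
        # Run the scheduled work for this item here.
        print("Executing scheduled task for:", item)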
As a workaround, why not have the lambda wake on a CloudWatch alarm, then check for tasks every 5 seconds until 55 seconds have elapsed?
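That workaround amounts to something like this (run_due_tasks is a hypothetical stand-in for your own task lookup):

import time

def run_due_tasks():
    # Hypothetical: query your store for tasks whose timestamp has passed
    # and execute them.
    pass

def handler(event, context):
    deadline = time.time() + 55
    while time.time() < deadline:
        run_due_tasks()
        time.sleep(5)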
You have likely already found a solution to this, but my service, https://posthook.io, may be a good fit for your use case. It lets you schedule 'hooks' with an API call like this:
curl https://api.posthook.io/v1/hooks \
-H 'X-API-Key: ${POSTHOOK_API_KEY}' \
-H 'Content-Type: application/json' \
-d '{
"path": "/webhooks/ph/event_reminder",
"postAt": "2018-07-03T01:11:55Z",
"data": {
"eventID": 25
}
}'
Then, from your lambda function, you can either use the data you passed in (data) or the hook's unique ID to look something up in your database and do the needed work. A free account allows you to schedule 500 of these requests a month.
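On the receiving side, assuming the hook is delivered through an API Gateway Lambda proxy integration, the handler is just a matter of unpacking the body (a sketch, not Posthook's official example):

import json

def handler(event, context):
    body = json.loads(event["body"])
    event_id = body["data"]["eventID"]
    # Look up event_id in your database and do the needed work here.
    return {"statusCode": 200, "body": "ok"}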
The other solutions seem promising, but there is another solution I found: using the Step Functions Wait state.
http://docs.aws.amazon.com/step-functions/latest/dg/amazon-states-language-wait-state.html
I cannot use it in my region yet, because my region is Singapore and it cannot be used across regions. For now I will try the DynamoDB solution above.
As of 2018, Step Functions is generally available and works as expected.
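For illustration, a state machine definition using the Wait state could look like the following sketch; TimestampPath reads the target datetime from the execution input, so a client can schedule a dynamic time (the Lambda ARN is a placeholder):

import json

definition = {
    "StartAt": "WaitUntil",
    "States": {
        "WaitUntil": {
            "Type": "Wait",
            "TimestampPath": "$.triggerTime",  # e.g. "2018-07-03T01:11:55Z"
            "Next": "RunTask",
        },
        "RunTask": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:ap-southeast-1:123456789012:function:worker",
            "End": True,
        },
    },
}
print(json.dumps(definition, indent=2))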
As of 2018 there is also Azure Logic Apps, an equivalent service to AWS Step Functions on Azure. It contains a delay connector that can schedule a delay time:
https://learn.microsoft.com/en-us/azure/connectors/connectors-native-delay
What would be the best way to run a Python script on the first of every month?
My situation is: I want some data sent to a HipChat room, using the Python API, on the first of every month from AWS. The data I want sent is in a text file in an S3 bucket.
If your script can execute in under 5 minutes, you could do this by creating a Python Lambda function and running it monthly via Lambda scheduled tasks. Running it once a month would stay well within the free Lambda usage limits, so your costs would be almost nothing.
If your script takes longer than 5 minutes to execute, then you would probably need to schedule it as a cron job on an EC2 instance.
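A minimal sketch of the Lambda variant, assuming a bucket and key for the text file and a HipChat v2 room token in an environment variable (names are placeholders); the rule's schedule expression would be something like cron(0 8 1 * ? *), i.e. 08:00 UTC on the 1st of each month:

import json
import os
import urllib.request

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    # Fetch the text file from S3.
    obj = s3.get_object(Bucket="my-bucket", Key="monthly-report.txt")
    message = obj["Body"].read().decode("utf-8")
    # Post it to the HipChat room as a notification.
    req = urllib.request.Request(
        "https://api.hipchat.com/v2/room/MyRoom/notification",
        data=json.dumps({"message": message}).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + os.environ["HIPCHAT_TOKEN"],
            "Content-Type": "application/json",
        },
    )
    urllib.request.urlopen(req)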
Create a Lambda function and configure it under CloudWatch ==> Events ==> Rules, using either:
1. AWS built-in timers (fixed rate)
2. Cron expressions
In your case, a cron expression is the better option.