Need to call a function at specific times without having a server up and running all the time
In particular, the challenge I'm facing is that we only use AWS Lambda and DynamoDB to - among other things - send a reminder to users at a time of their choice. That means we have to call a lambda function at the time the user wants to be reminded.
The time changes dynamically (depending on each user's choice) so the question is, what is a good way to set this up?
We are considering setting up a server if there's no way around it but even if we go for this solution, I lack the experience to see a good way to set this up. Any suggestions are greatly appreciated.
You can use AWS DynamoDB TTL event stream to trigger Lambda to achieve this. The approach is as follows.
Create a DynamoDB table to store User alarms.
When user setup an alarm, calculate the difference between the alarm timestamp and current timestamp.
Then store the difference as the TTL value of the alarm record, along with alarm information.
Configure DynamoDB streams to trigger a Lambda when TTL exceeds
You can call your Lambda function on a scheduled event:
http://docs.aws.amazon.com/lambda/latest/dg/with-scheduled-events.html
So set up your Lambda function with cron like event to wake on any interval you need, retrieve the list of alarms you need to send next, send them, mark completed alarms so they won't be triggered again.
Related
I have a workflow where I need to pass in some information that would be stored for a period of time, and then sends off a trigger after a scheduled period of time with the same information.
I considered using a TTL on a dynamo db table, but I was wondering if I could use cloudwatch events for this since it seems ideal as it has cron expressions for cloudwatch rules.
I know I can setup a cloudwatch rule to trigger say every 15 minutes, but how do I setup cloudwatch such that only my custom information gets picked up by this rule and I can pass some information into this event so that when the trigger gets sent to the target, my custom information is sent to the target as well?
DynamoDB TTL is a bad fit as you don’t get a ton of granularity.
You can use the CloudWatch events PutEvent api to put a custom event. That should get you where you want.
https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/AddEventsPutEvents.html
I have an application which needs to read data from AWS dynamodb table every 5 seconds.
Currently I fetch the data using lambda, and then getting the data from dynamodb back to the user.
The problem with querying the table every 5 seconds is that it can have performance affect and moreover there is a pricing issue. (Most of the time the data might not even be changed at all but when it is changed I want to be notified it immediately).
An important clarification is that my app sits outsite of AWS, and only access the AWS dynamodb to get data (using simple http request built with c#).
Is there any way I can get a notification to my app when a new data is inserted into dynamodb?
Just to add something on top of #john-rotenstein answer:
Once you have properly configured a Lambda function to be triggered by an event from a DynamoDB Stream, you could have your Lambda function notify your Web Application via an HTTP Request.
Another option is to use Lambda to put this notification in a Queue you may be using outside AWS and then have your C# code be a consumer of this Queue. There are several possibilities to notify your application, you just need to see which one is the best / most cost effective for your current scenario.
A data update in DynamoDB can trigger a DynamoDB Stream, which can trigger an AWS Lambda function.
The Lambda function could notify your application in some way.
See: DynamoDB Streams and AWS Lambda Triggers
Streams is the right answer in terms of engineering, but just to say your concern about the polling option being expensive is unfounded. Therefore if you have a working solution I would be tempted to leave it.
If you queried a table every 5 seconds, it would cost you $0.25 every 2 months.
This assumes your table has on-demand pricing, and the query returns less than 4KB of data.
https://aws.amazon.com/dynamodb/pricing/on-demand/
Is there a way to run a Lambda on every DynamoDb table record?
I have a Dynamo table with name, last name, email and a Lambda that takes name, last name, email as parameters. I am trying to configure the environment such that, every day, the Lambda runs automatically for every value it finds within Dynamo; can't do all the records in one Lambda as it won't scale (will timeout once more users are added).
I currently have a CloudWatch rule set up that triggers the lambda on schedule but I had to manually add the parameters to the trigger from Dynamo - It's not automatic and not dynamic/not connected to dynamo.
--
Another option would be to run a lambda every time a DynamoDb record is updated... I could update all the records weekly and then upon updating them the Lambda would be triggered but I don't know if that's possible either.
Some more insight on either one of these approaches would be appreciated!
Is there a way to run a Lambda on every DynamoDb table record?
For your specific case where all you want to do is process each row of a DynamoDB table in a scalable fashion, I'd try going with a Lambda -> SQS -> Lambdas fanout like this:
Set up a CloudWatch Events Rule that triggers on a schedule. Have this trigger a dispatch Lambda function.
The dispatch Lambda function's job is to read all of the entries in your DynamoDB table and write messages to a jobs SQS queue, one per DynamoDB item.
Create a worker Lambda function that does whatever you want it to do with any given item from your DynamoDB table.
Connect the worker Lambda to the jobs SQS queue so that an instance of it will dispatch whenever something is put on the queue.
Since the limiting factor is lambda timeouts, run multiple lambdas using step functions. Perform a paginated scan of the table; each lambda will return the LastEvaluatedKey and pass it to the next invocation for the next page.
I think your best option is, just as you pointed out, to run a Lambda every time a DynamoDB record is updated. This is possible thanks to DynamoDB streams.
Streams are a ordered record of changes that happen to a table. These can invoke a Lambda, so it's automatic (however beware that the change appears only once in the stream, set up a DLQ in case your Lambda fails). This approach scales well and is also pretty evolvable. If need be, you can either push the events from the stream to an SQS or Kinesis, fan out, etc., depending on the requirements.
I am trying to come up with a way to have pieces of data processed at specific time intervals by invoking aws lambda every N hours.
For example, parse a page at specific url every 6 hours and store result in s3 bucket.
Have many (~100k) urls each processed that way.
Of course, you can have a VM that hosts some scheduler that would trigger lambdas, as described in this answer, but that breaks the "serverless" approach.
So, is there a way to do this using aws services only?
Things I tried that does not work:
SQS can delay messages, but only for maximum of 15 min (I need hours) and there is no built-in integration between SQS and Lambda so you need to have some polling agent (lambda?) that would poll the qeueu all the time and send new messages to worker lambda, which again breaks the point of only executing at scheduled time;
CloudWatch Alarms can send messages to SNS that triggers Lambda. You can have periodic lambda calls implemented like that by using future metric timestamp, however alarm message cannot have a custom data (think url from example above) connected to it, so that does not work too;
I could create Lambda CloudWatch scheduled triggers programmatically but they also cannot pass any data to Lambda.
The only way I could think of, is to have a dynamo DB table with "url" records, each with the timestamp of last "processing" and have periodic lambda that would query the table and send "old" records as jobs to another "worker" lambda (directly or via SNS).
That would work, however you still need to have a "polling" lambda, which could become a bottleneck as number of items to process grows.
Any other ideas?
100k jobs every 6 hours, doesn't sound like a great use case for Serverless IMO. Personally, I would set up a CloudWatch event with a relevant cron expression that triggered a Lambda to start an EC2 instance that processed all the URLs (stored in DynamoDB) and script the EC2 instance to shutdown after processing the last url.
But that's not what you asked.
You could set up a CloudWatch event with a relevant cron expression that spawns a lambda (orchestrator) reads the urls from DynamoDB or even an S3 file then invokes a second lambda (worker) for each url to actually parse the pages.
Using this pattern you will start hitting concurrency issues at 1000 lambdas (1 orchestrator & 999 workers), less if you have other lambdas running in the same region. You can ask AWS to increase this limit, but I don't know under what scenarios they will do this, or how high they will increase the limit.
From here you have three choices.
Split out the payload to each worker lambda so each instance receives multiple urls to process.
Add an another column to your list of urls and group urls with this column (e.g. first 500 are marked with a 1, second 500 are marked with a 2, etc). Then your orchestrator lambda could take urls off the list in batches. This would require you to run the CloudWatch event at a greater frequency and manage the state so the orchestrator lambda when invoked knows which is the next batch (I've done this at a smaller scale just storing a variable in a S2 file).
Would be to use some combination of options 1 and 2.
Looks like, it's fitting Batch processing scenario with AWS lambda function as a job. It's serverless but obviously adds dependency on another AWS service.
In the same time, it has dashboard, processing status, retries and all perks from job scheduling service.
Is there a way to dynamically create scheduled lambda calls in AWS? I have to create many scheduled lambda calls that. I am aware of CloudWatch rules, but they have a limit on the amount you can get. I also heard about Cronally, but they are not launched yet, and I'd rather do something like this on my own. I do not see an obvious solution without trade offs, but does the 'easy way' exist, or it all depends on the particular application?
The cloudwatch events docs say the limit of 50 rules per account can be raised on request so maybe they might be able to raise it high enough for your needs.
Alternatively you could just do one rule that fires a single "scheduler"lambda function every minute. that scheduler can contain a
Schedule of which functions get fired at which times and invoke the other lambda functions according to that schedule. You could even store the schedule in a dynamoDB table or s3 bucket so don't need to update the lambda function itself to change the schedule.