AWS Lambda function data consistency with DynamoDb

I am triggering a Lambda function from a DynamoDB stream. In the function, I retrieve data from DynamoDB by primary key: if the key matches, the row needs to be updated; if not, a new entry is created in DynamoDB.
The Lambda function can scale out according to the number of shards created in the stream.
If requests with the same primary key arrive in different shards, multiple instances of the Lambda function will try to read and update the same row at the same time, which can eventually produce wrong data in the database (one write overwriting another).
I am considering adding a UUID column and using conditional writes, so that a write fails if another instance has updated the row in the meantime. But then I would need to execute all the steps again for the failed data.
Another solution is to set the Lambda function's "reservedConcurrentExecutions" property to 1 so that the function does not scale. I am not sure whether it throws an exception when more than one shard is created in the DynamoDB stream.
I would like to know how I can implement this scenario.
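One way to realise the conditional-write idea from the question is optimistic locking: every write stores a fresh random token in a version attribute and is conditioned on the token that was read, so a concurrent overwrite fails and the read-modify-write is simply retried. This is a minimal sketch assuming a boto3 `Table` resource is passed in; the `version` attribute, the `pk` key name used in the usage, the `merge` callback, and the retry budget are hypothetical names for illustration.

```python
import uuid

MAX_ATTEMPTS = 5  # hypothetical retry budget


def is_conditional_check_failure(err):
    """True if err looks like botocore's ClientError for a failed ConditionExpression."""
    response = getattr(err, "response", None) or {}
    return response.get("Error", {}).get("Code") == "ConditionalCheckFailedException"


def upsert_with_version(table, key, merge):
    """Read-modify-write one item, retrying when another writer got there first.

    table: a boto3 DynamoDB Table resource (or any object with get_item/put_item).
    key:   the item's primary key, e.g. {"pk": "customer#42"} (hypothetical).
    merge: hypothetical callback taking the current item (or None), returning new attributes.
    """
    for _ in range(MAX_ATTEMPTS):
        current = table.get_item(Key=key, ConsistentRead=True).get("Item")
        new_item = dict(merge(current))
        new_item.update(key)
        new_item["version"] = str(uuid.uuid4())  # fresh token for every write

        kwargs = {"ExpressionAttributeNames": {"#v": "version"}}
        if current is None or "version" not in current:
            # Nobody has versioned this item yet: fail if someone else does first.
            kwargs["ConditionExpression"] = "attribute_not_exists(#v)"
        else:
            # Only overwrite the exact version we read above.
            kwargs["ConditionExpression"] = "#v = :expected"
            kwargs["ExpressionAttributeValues"] = {":expected": current["version"]}
        try:
            table.put_item(Item=new_item, **kwargs)
            return new_item
        except Exception as err:  # botocore.exceptions.ClientError in practice
            if not is_conditional_check_failure(err):
                raise
            # Lost the race: loop around, re-read and re-apply merge().
    raise RuntimeError(f"gave up on {key!r} after {MAX_ATTEMPTS} conflicting writes")
```

The retry only repeats the read, merge, and write for the one conflicting item, not the whole batch. If the change can be expressed directly by DynamoDB (a counter, setting fields), a single `update_item` with an `UpdateExpression` such as `ADD` or `SET` applies it atomically and avoids the read-modify-write race without a retry loop.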

Related

AWS Real time data fetching

I have an application which needs to read data from an AWS DynamoDB table every 5 seconds.
Currently I fetch the data using Lambda and return the data from DynamoDB to the user.
The problem with querying the table every 5 seconds is that it can affect performance, and there is also a pricing issue (most of the time the data might not have changed at all, but when it does change I want to be notified immediately).
An important clarification is that my app sits outside of AWS and only accesses DynamoDB to get data (using simple HTTP requests built with C#).
Is there any way I can get a notification to my app when new data is inserted into DynamoDB?
Just to add something on top of @john-rotenstein's answer:
Once you have properly configured a Lambda function to be triggered by events from a DynamoDB Stream, your Lambda function could notify your web application via an HTTP request.
Another option is to have Lambda put the notification in a queue that you use outside AWS, with your C# code as the consumer of that queue. There are several ways to notify your application; you just need to see which one is the best or most cost-effective for your current scenario.
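A minimal sketch of the stream-triggered Lambda that pushes newly inserted items to the application over HTTP. It assumes the stream view type includes new images (`NEW_IMAGE` or `NEW_AND_OLD_IMAGES`); `WEBHOOK_URL` is a hypothetical endpoint exposed by the C# application, and the images are forwarded in DynamoDB's typed JSON format (for example `{"S": "..."}`).

```python
import json
import urllib.request

# Hypothetical endpoint exposed by the application outside AWS.
WEBHOOK_URL = "https://example.com/dynamodb-changes"


def extract_new_items(event):
    """Return the NewImage of every INSERT record in a DynamoDB Streams event."""
    return [
        record["dynamodb"]["NewImage"]
        for record in event.get("Records", [])
        if record.get("eventName") == "INSERT" and "NewImage" in record.get("dynamodb", {})
    ]


def handler(event, context):
    new_items = extract_new_items(event)
    if not new_items:
        return {"notified": 0}  # MODIFY/REMOVE only: nothing to send
    body = json.dumps({"items": new_items}).encode("utf-8")
    request = urllib.request.Request(
        WEBHOOK_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Raising on an HTTP error fails the invocation, so Lambda retries the batch.
    with urllib.request.urlopen(request, timeout=5) as response:
        response.read()
    return {"notified": len(new_items)}
```

The same handler could write to a queue instead of calling the web application directly; only the last few lines would change.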
A data update in DynamoDB can trigger a DynamoDB Stream, which can trigger an AWS Lambda function.
The Lambda function could notify your application in some way.
See: DynamoDB Streams and AWS Lambda Triggers
Streams is the right answer in terms of engineering, but your concern about the polling option being expensive is unfounded, so if you already have a working solution I would be tempted to leave it.
If you queried a table every 5 seconds, it would cost you about $0.25 every 2 months.
This assumes your table uses on-demand pricing and the query returns less than 4 KB of data.
https://aws.amazon.com/dynamodb/pricing/on-demand/
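The arithmetic behind that estimate, as a quick check. It assumes the on-demand read price the answer was based on ($0.25 per million read request units; prices vary by region and have changed over time) and one read request unit per query, which is what a strongly consistent query returning at most 4 KB consumes (an eventually consistent one consumes half).

```python
SECONDS_PER_DAY = 24 * 60 * 60
POLL_INTERVAL_SECONDS = 5
DAYS = 60  # "2 months", approximated as 60 days
PRICE_PER_MILLION_RRU = 0.25  # USD, assumed on-demand read price; check the pricing page
RRU_PER_QUERY = 1  # strongly consistent query returning <= 4 KB; 0.5 if eventually consistent

queries = SECONDS_PER_DAY // POLL_INTERVAL_SECONDS * DAYS  # 17,280 queries per day
cost = queries * RRU_PER_QUERY / 1_000_000 * PRICE_PER_MILLION_RRU
print(f"{queries:,} queries, about ${cost:.2f}")  # prints: 1,036,800 queries, about $0.26
```

This only covers DynamoDB read charges; the Lambda invocations behind each poll are billed separately.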

Run a lambda on every DynamoDb entry on schedule?

Is there a way to run a Lambda on every DynamoDb table record?
I have a DynamoDB table with name, last name, and email, and a Lambda that takes name, last name, and email as parameters. I am trying to configure the environment so that every day the Lambda runs automatically for every record it finds in DynamoDB. I can't process all the records in one Lambda invocation because it won't scale (it will time out once more users are added).
I currently have a CloudWatch rule that triggers the Lambda on a schedule, but I had to add the parameters to the trigger manually from DynamoDB. It's not automatic and not dynamic or connected to DynamoDB.
--
Another option would be to run a Lambda every time a DynamoDB record is updated. I could update all the records weekly, and updating them would trigger the Lambda, but I don't know if that's possible either.
Some more insight on either one of these approaches would be appreciated!
For your specific case where all you want to do is process each row of a DynamoDB table in a scalable fashion, I'd try going with a Lambda -> SQS -> Lambdas fanout like this:
Set up a CloudWatch Events Rule that triggers on a schedule. Have this trigger a dispatch Lambda function.
The dispatch Lambda function's job is to read all of the entries in your DynamoDB table and write messages to a jobs SQS queue, one per DynamoDB item.
Create a worker Lambda function that does whatever you want it to do with any given item from your DynamoDB table.
Connect the worker Lambda to the jobs SQS queue so that an instance of it is invoked whenever a message is put on the queue.
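The dispatch and worker steps above could look roughly like this. It is a sketch assuming boto3 `Table` and SQS client objects; the `TABLE_NAME` and `QUEUE_URL` environment variables and the `process_item` callback (standing in for the per-user work) are hypothetical.

```python
import json
import os


def scan_all_items(table):
    """Yield every item in the table, following LastEvaluatedKey across pages."""
    kwargs = {}
    while True:
        page = table.scan(**kwargs)
        yield from page.get("Items", [])
        if "LastEvaluatedKey" not in page:
            return
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]


def dispatch(table, sqs, queue_url):
    """One SQS message per DynamoDB item, sent in batches of up to 10 entries."""
    batch, sent = [], 0

    def flush():
        nonlocal batch, sent
        if not batch:
            return
        response = sqs.send_message_batch(QueueUrl=queue_url, Entries=batch)
        if response.get("Failed"):
            raise RuntimeError(f"SQS rejected {len(response['Failed'])} messages")
        sent += len(batch)
        batch = []

    for item in scan_all_items(table):
        # default=str turns boto3's Decimal numbers (and other non-JSON types) into strings.
        batch.append({"Id": str(len(batch)), "MessageBody": json.dumps(item, default=str)})
        if len(batch) == 10:  # SQS accepts at most 10 entries per batch call
            flush()
    flush()
    return sent


def dispatch_handler(event, context):
    """Entry point for the scheduled CloudWatch Events rule."""
    import boto3  # bundled with the Lambda Python runtime

    table = boto3.resource("dynamodb").Table(os.environ["TABLE_NAME"])
    return {"sent": dispatch(table, boto3.client("sqs"), os.environ["QUEUE_URL"])}


def worker_handler(event, context, process_item=print):
    """Invoked by the SQS event source mapping with a batch of messages."""
    for record in event["Records"]:
        process_item(json.loads(record["body"]))
```

Because each message carries one item, the worker's run time no longer grows with the table size; only the dispatch scan does.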
Since the limiting factor is Lambda timeouts, run multiple Lambdas using Step Functions. Perform a paginated scan of the table: each Lambda returns the LastEvaluatedKey, which is passed to the next invocation for the next page.
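A sketch of one step of that Step Functions loop, assuming a boto3 `Table` resource, a hypothetical `TABLE_NAME` environment variable, and a state machine whose Choice state loops back to this task until `done` is true; `print` stands in for the real per-item work. Passing `LastEvaluatedKey` through the state machine only works if it is JSON-serializable, for example when the key attributes are strings (boto3 returns numbers as `Decimal`).

```python
import os


def process_page(table, start_key, process_item, page_size=500):
    """Scan one page starting at start_key, process it, return the next start key."""
    kwargs = {"Limit": page_size}  # hypothetical page size, tune to stay under the timeout
    if start_key:
        kwargs["ExclusiveStartKey"] = start_key
    page = table.scan(**kwargs)
    for item in page.get("Items", []):
        process_item(item)
    return page.get("LastEvaluatedKey")  # None once the whole table has been scanned


def handler(event, context):
    """Lambda task: input/output shape {"LastEvaluatedKey": {...} or null, "done": bool}."""
    import boto3  # bundled with the Lambda Python runtime

    table = boto3.resource("dynamodb").Table(os.environ["TABLE_NAME"])
    next_key = process_page(table, event.get("LastEvaluatedKey"), process_item=print)
    return {"LastEvaluatedKey": next_key, "done": next_key is None}
```

Each invocation handles one bounded page, so no single Lambda has to survive the whole scan.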
I think your best option is, just as you pointed out, to run a Lambda every time a DynamoDB record is updated. This is possible thanks to DynamoDB streams.
Streams are an ordered record of changes that happen to a table. They can invoke a Lambda, so it's automatic (but beware that each change appears only once in the stream, so set up a DLQ in case your Lambda fails). This approach scales well and is also pretty evolvable. If need be, you can push the events from the stream to SQS or Kinesis, fan out, etc., depending on the requirements.
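A sketch of the stream-triggered handler for that approach. It assumes a stream view type that includes new images, hypothetical attribute names `name`, `lastName`, and `email`, a hypothetical `process_user` callback, and `ReportBatchItemFailures` enabled on the event source mapping, so that returning the first failed record's sequence number makes Lambda retry from that record instead of the whole batch.

```python
def handler(event, context, process_user=print):
    """Process INSERT/MODIFY records; report the first failure so Lambda retries from it."""
    for record in event["Records"]:
        if record["eventName"] not in ("INSERT", "MODIFY"):
            continue  # ignore REMOVE events
        try:
            # Stream images use DynamoDB's typed JSON, e.g. {"email": {"S": "a@example.com"}}.
            image = record["dynamodb"]["NewImage"]
            process_user(
                image["name"]["S"],
                image["lastName"]["S"],
                image["email"]["S"],
            )
        except Exception:
            # Stop here: this record and everything after it are retried,
            # which keeps the per-shard ordering intact.
            return {"batchItemFailures": [{"itemIdentifier": record["dynamodb"]["SequenceNumber"]}]}
    return {"batchItemFailures": []}
```

Records that keep failing until the retry limit or record age is reached can then go to the on-failure destination (the DLQ mentioned above) rather than being silently lost.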

What event can be triggered to fire a lambda function in DynamoDB?

I can't seem to find documentation on what kinds of events DynamoDB can trigger a Lambda function on. All I can find are mentions of a new record being added to a table or a record being updated. Are those the only two actions/events available? Or could I also trigger a Lambda function when I request a record that does not exist (which is what I need in my case, where I will be using DynamoDB as a cache)?
Triggering AWS Lambda through events happening in DynamoDB is done by utilizing DynamoDB Streams.
As stated in the documentation:
DynamoDB Streams captures a time-ordered sequence of item-level modifications in any DynamoDB table, and stores this information in a log for up to 24 hours.
So streams only capture operations that modify data, which excludes read operations.
Triggering a Lambda function automatically because somebody queried for a key that doesn't exist is not supported by DynamoDB. You would have to handle that in your querying code.
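Handling the miss in the querying code usually means the cache-aside pattern: read the key, and on a miss load the value from the real source and write it into the table. A sketch assuming a boto3 `Table` resource; `load_from_source` is a hypothetical callback for whatever produces the data (it could itself invoke a Lambda), and `expiresAt` is a hypothetical attribute that only expires items if DynamoDB TTL is enabled on it.

```python
import time


def get_or_load(table, key, load_from_source, ttl_seconds=3600):
    """Cache-aside read: return the cached item, filling the cache on a miss."""
    item = table.get_item(Key=key).get("Item")
    if item is not None:
        return item  # cache hit
    # Cache miss: DynamoDB emits no event for reads, so the caller handles it here.
    item = dict(load_from_source(key))
    item.update(key)
    # Epoch-seconds expiry; only acted on if TTL is enabled for this attribute.
    item["expiresAt"] = int(time.time()) + ttl_seconds
    table.put_item(Item=item)
    return item
```

TTL deletes expired items in the background rather than instantly, so a strict cache would also compare `expiresAt` with the current time on read.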

DynamoDB stream trigger invoke for all records

I'm trying to set up an Elasticsearch import process from a DynamoDB table. I have already created an AWS Lambda function and enabled a DynamoDB stream with a trigger that invokes my Lambda for every added or updated record. Now I want to perform an initial seed operation (import all records that are currently in my DynamoDB table into Elasticsearch). How do I do that? Is there any way to make all records in a table be "reprocessed" and added to the stream (so they can be processed by my Lambda)? Or is it better to write a separate function that manually reads all data from the table and sends it to Elasticsearch, so basically have two Lambdas: one for the initial data migration (executed only once and triggered manually by me), and another for syncing new records (triggered by DynamoDB stream events)?
Thanks for all the help :)
Depending on how large your dataset is, you won't be able to seed your database in Lambda, as there is a maximum timeout of 300 seconds (EDIT: This is now 15 minutes, thanks @matchish).
You could spin up an EC2 instance and use the SDK to perform a DynamoDB scan operation and batch write to your Elasticsearch instance.
You could also use Amazon EMR to run a MapReduce job that exports to S3, and process all your data from there.
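For the EC2 option, the scan-and-bulk-write script could be sketched like this, assuming a boto3 `Table` resource and an elasticsearch-py client whose `helpers.bulk` does the batching; the `users` index name and `id` document-ID attribute are hypothetical.

```python
from decimal import Decimal


def scan_all_items(table):
    """Yield every item in the table, following LastEvaluatedKey across pages."""
    kwargs = {}
    while True:
        page = table.scan(**kwargs)
        yield from page.get("Items", [])
        if "LastEvaluatedKey" not in page:
            return
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]


def to_plain(value):
    """Convert boto3 types (Decimal, set) into JSON-friendly Python values."""
    if isinstance(value, Decimal):
        return int(value) if value == value.to_integral_value() else float(value)
    if isinstance(value, dict):
        return {k: to_plain(v) for k, v in value.items()}
    if isinstance(value, (list, set)):
        return [to_plain(v) for v in value]
    return value


def bulk_actions(items, index, id_attribute):
    """Turn DynamoDB items into actions for elasticsearch.helpers.bulk."""
    for item in items:
        yield {"_index": index, "_id": str(item[id_attribute]), "_source": to_plain(item)}


def seed(table, es_client, index="users", id_attribute="id"):
    from elasticsearch import helpers  # elasticsearch-py, installed on the EC2 instance

    return helpers.bulk(es_client, bulk_actions(scan_all_items(table), index, id_attribute))
```

Using the DynamoDB key as `_id` makes the seed idempotent: re-running it, or the stream Lambda indexing the same item later, overwrites the document instead of duplicating it.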
I would write a script that touches each record in DynamoDB. For each item in your table, add a new property called migratedAt (or whatever you wish). Adding this property will trigger the DynamoDB stream, which in turn will trigger your Lambda handler. Based on your question, your Lambda handler already handles updates, so nothing changes there.
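A sketch of that touch script, assuming a boto3 `Table` resource and that you pass the table's key attribute names; `migratedAt` is the example name from the answer. Writing a new timestamp matters because DynamoDB Streams does not emit a record for a write that leaves the item unchanged. The condition stops the script from re-creating an item that was deleted after the scan read it.

```python
from datetime import datetime, timezone


def is_conditional_check_failure(err):
    """True if err looks like botocore's ClientError for a failed ConditionExpression."""
    response = getattr(err, "response", None) or {}
    return response.get("Error", {}).get("Code") == "ConditionalCheckFailedException"


def touch_all(table, key_attributes):
    """Set migratedAt on every item so each one produces a MODIFY stream record."""
    stamp = datetime.now(timezone.utc).isoformat()
    touched = 0
    kwargs = {}
    while True:
        page = table.scan(**kwargs)
        for item in page.get("Items", []):
            key = {name: item[name] for name in key_attributes}
            try:
                table.update_item(
                    Key=key,
                    UpdateExpression="SET #m = :now",
                    ConditionExpression="attribute_exists(#k)",
                    ExpressionAttributeNames={"#m": "migratedAt", "#k": key_attributes[0]},
                    ExpressionAttributeValues={":now": stamp},
                )
                touched += 1
            except Exception as err:  # botocore.exceptions.ClientError in practice
                if not is_conditional_check_failure(err):
                    raise
                # Otherwise the item was deleted since the scan: skip it.
        if "LastEvaluatedKey" not in page:
            return touched
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
```

On a large table this consumes write capacity for every item and invokes the stream Lambda once per touched item, so it is worth running at a quiet time or throttling the loop.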

Multiple DynamoDB triggers for Lambda - Separate invocation per table?

If I have a Lambda function that has multiple DynamoDB Stream triggers, is it guaranteed that each Lambda invocation only contains records from one table?
Yes. Each Lambda invocation will get records from one table.
Refer to Using AWS Lambda with Amazon DynamoDB.
The following is an extract from that web page:
The event your Lambda function receives is the table update information AWS Lambda reads from your stream. When you configure event source mapping, the batch size you specify is the maximum number of records that you want your Lambda function to receive per invocation.
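Since each invocation carries records from a single table, one shared function can route on the table name found in the records' `eventSourceARN`. A sketch under that assumption; the `Orders` and `Customers` handlers are hypothetical placeholders, and the check raises if a batch ever mixes tables.

```python
def table_name(record):
    """Extract the table name from a stream record's eventSourceARN.

    ARN shape: arn:aws:dynamodb:<region>:<account>:table/<TableName>/stream/<label>.
    The stream label is a timestamp that itself contains colons, hence maxsplit=5.
    """
    resource = record["eventSourceARN"].split(":", 5)[5]
    return resource.split("/")[1]


HANDLERS = {
    # Hypothetical per-table handlers; replace with your own functions.
    "Orders": lambda records: print("orders", len(records)),
    "Customers": lambda records: print("customers", len(records)),
}


def handler(event, context):
    names = {table_name(record) for record in event["Records"]}
    # Each invocation is expected to carry records from exactly one table.
    if len(names) != 1:
        raise ValueError(f"expected records from one table, got {sorted(names)}")
    (name,) = names
    return HANDLERS[name](event["Records"])
```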