Multiple DynamoDB triggers for Lambda - Separate invocation per table? - amazon-web-services

If I have a Lambda function that has multiple DynamoDB Stream triggers, is it guaranteed that each Lambda invocation only contains records from one table?

Yes. Each Lambda invocation will only contain records from one table.
See Using AWS Lambda with Amazon DynamoDB; the following is an extract from that page:
The event your Lambda function receives is the table update information AWS Lambda reads from your stream. When you configure event source mapping, the batch size you specify is the maximum number of records that you want your Lambda function to receive per invocation.
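Because every record in one invocation comes from the same stream, a handler can recover the source table from any record's eventSourceARN. A minimal sketch (the ARN shape is the documented `arn:aws:dynamodb:REGION:ACCOUNT:table/NAME/stream/TIMESTAMP` form; the handler body is illustrative):

```python
def table_name_from_event(event):
    """Extract the source table name from a DynamoDB Streams event.

    All records in a single invocation come from one stream, so the first
    record's eventSourceARN identifies the table for the whole batch.
    ARN shape: arn:aws:dynamodb:REGION:ACCOUNT:table/NAME/stream/TIMESTAMP
    """
    arn = event["Records"][0]["eventSourceARN"]
    return arn.split(":table/")[1].split("/")[0]


def handler(event, context):
    table = table_name_from_event(event)
    for record in event["Records"]:
        # Illustrative only: log which table each change came from.
        print(f"{table}: {record['eventName']}")
    return {"table": table, "count": len(event["Records"])}
```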

Related

Redshift Lambda UDF not batching as expected

How do I configure a redshift lambda UDF to batch requests?
On this page Creating a scalar Lambda UDF - Amazon Redshift it says in the note section:
You can configure batching of multiple invocations of your Lambda function to improve performance and lower costs.
I'm testing with a hello world lambda that simply returns the input given. Here is the SQL ddl I'm using:
CREATE OR REPLACE EXTERNAL FUNCTION hello_world (varchar)
RETURNS varchar IMMUTABLE
LAMBDA 'redshift_udf_testy'
IAM_ROLE '<redacted>';
My UDF works fine, however it doesn't seem to batch requests. I would expect the following query:
select hello_world(generate_series(1, 500)::text);
to pass multiple rows at a time to hello_world (since the Lambda UDF JSON API specifies that the function must be able to handle arrays of arguments). Instead it performs 500 separate invocations of my Lambda function, each with a single row passed in, which seems wrong.
Any idea how I can configure it to batch? The docs mention it in passing, but I can't find anything concrete.
You can set MAX_BATCH_ROWS (the maximum number of rows that Amazon Redshift sends in a single batch request for one Lambda invocation) and MAX_BATCH_SIZE (the maximum size of the data payload per batch request) as parameters of the CREATE EXTERNAL FUNCTION statement. Public documentation can be found at: https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_EXTERNAL_FUNCTION.html
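Once batching kicks in, the Lambda receives many rows per invocation in the Lambda UDF JSON API shape (a payload with an "arguments" list of rows, expecting one result per row in the same order). A minimal batch-aware handler sketch for the hello_world function above (the greeting logic is illustrative):

```python
import json


def handler(event, context):
    """Batch-aware Redshift Lambda UDF handler.

    Redshift sends {"arguments": [[arg1, ...], ...], "num_records": N, ...}
    and expects {"success": true, "results": [...]} with one result per
    input row, in order.
    """
    try:
        rows = event["arguments"]
        # One result per row; here we simply echo the first argument back.
        results = [f"hello {row[0]}" for row in rows]
        return json.dumps({"success": True, "results": results})
    except Exception as exc:
        # Report the failure back to Redshift instead of raising.
        return json.dumps({"success": False, "error_msg": str(exc)})
```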

AWS Lambda function data consistency with DynamoDb

Using DynamoDB Streams I am triggering a Lambda function. In the function I retrieve data from DynamoDB based on the primary key; if the key matches, the row is updated, otherwise a new entry is created.
Lambda can scale out according to the number of shards in the stream.
If requests with the same primary key arrive in different shards, multiple instances of the Lambda function will try to get and update the same row at the same time, which will eventually produce wrong data in the database (risk of overwrites).
One solution I'm considering is a UUID column and condition-based writes, so a write fails if the row was already updated by another instance. But then I would need to re-run all the steps for the failed data.
Another option is setting the Lambda function's "reservedConcurrentExecutions" property to 1 so it does not scale; I'm not sure whether that throws an exception when the stream has more than one shard.
I would like to know how I can implement this scenario.
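The conditional-write idea from the question can be sketched with optimistic locking: keep a numeric version attribute on each item and only update when the version you read is still current. A minimal sketch that builds the arguments for a boto3 `Table.update_item` call (the `payload` and `version` attribute names are assumptions of this example, not DynamoDB built-ins):

```python
def conditional_update_kwargs(key, new_data, expected_version):
    """Build arguments for boto3's Table.update_item so the write only
    succeeds if the item's version hasn't changed (optimistic locking).

    If another Lambda instance updated the row first, DynamoDB raises
    ConditionalCheckFailedException and the caller can re-read and retry
    just that item. The "payload"/"version" attributes are illustrative.
    """
    return {
        "Key": key,
        "UpdateExpression": "SET payload = :p, version = :new",
        "ConditionExpression": "attribute_not_exists(version) OR version = :old",
        "ExpressionAttributeValues": {
            ":p": new_data,
            ":new": expected_version + 1,
            ":old": expected_version,
        },
    }
```

A caller would do `table.update_item(**conditional_update_kwargs(...))` and catch `ClientError` with code `ConditionalCheckFailedException` to trigger a re-read-and-retry loop, so only the conflicting item is reprocessed rather than the whole batch.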

Lambda function synchronously invoked using dynamoDB stream trigger understanding

I am using a Lambda function to read data from DynamoDB Streams. Lambda reads items from the stream and invokes the function once for each batch, synchronously, via an event source mapping.
From what I understand from the AWS docs, Lambda invokes the function once per batch in the stream. Suppose 1000 items land in the stream at once and I configure my Lambda function to read 100 items per batch.
Will it invoke 10 Lambda functions concurrently to process 10 batches of 100 items each?
I am learning AWS. Is my understanding correct? If yes, what does synchronously invoked mean?
DynamoDB uses shards* to partition the data inside a table. The data stored in each shard is determined by the table's HashKey. DynamoDB Streams triggers AWS Lambda for each shard that was updated and aggregates that shard's records into a batch, so the number of records in each batch depends on how many records were updated in each shard. Batches can of course differ in size.
Synchronously invoked means that the service that triggered the function waits until the function finishes before completing its own execution. With an asynchronous trigger, you send a request and forget about it; whether the downstream function successfully processes the stream is out of the upstream service's control. DynamoDB invokes the Lambda function synchronously and waits while it works. If it finishes successfully, the batch is marked as processed; if it fails, it is retried a few more times. This is what enables at-least-once processing of DynamoDB streams.
*Shards are different partitions of the database. They allow DynamoDB to process queries and updates in parallel. Normally they reside in different storages/availability zones.
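Because a failed batch is retried, handlers for stream batches benefit from reporting partial failures. A sketch using Lambda's documented ReportBatchItemFailures response format (must be enabled on the event source mapping; on failure, Lambda retries from the lowest failed sequence number reported). The `process` function is a hypothetical placeholder for business logic:

```python
def process(record):
    """Hypothetical business logic; raises to simulate a failure."""
    if record["eventName"] == "REMOVE":
        raise ValueError("simulated failure")


def handler(event, context):
    """Process a DynamoDB Streams batch, reporting only failed records.

    With ReportBatchItemFailures enabled on the event source mapping,
    Lambda checkpoints past the successes and retries from the first
    reported failure instead of replaying the whole batch.
    """
    failures = []
    for record in event["Records"]:
        try:
            process(record)
        except Exception:
            failures.append(
                {"itemIdentifier": record["dynamodb"]["SequenceNumber"]}
            )
    return {"batchItemFailures": failures}
```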

Run a lambda on every DynamoDb entry on schedule?

Is there a way to run a Lambda on every DynamoDb table record?
I have a Dynamo table with name, last name, and email, and a Lambda that takes name, last name, and email as parameters. I am trying to configure the environment so that, every day, the Lambda runs automatically for every value it finds in Dynamo; I can't process all the records in a single Lambda as that won't scale (it will time out once more users are added).
I currently have a CloudWatch rule set up that triggers the Lambda on a schedule, but I had to manually copy the parameters from Dynamo into the trigger - it's not automatic and not dynamically connected to Dynamo.
--
Another option would be to run a lambda every time a DynamoDb record is updated... I could update all the records weekly and then upon updating them the Lambda would be triggered but I don't know if that's possible either.
Some more insight on either one of these approaches would be appreciated!
Is there a way to run a Lambda on every DynamoDb table record?
For your specific case where all you want to do is process each row of a DynamoDB table in a scalable fashion, I'd try going with a Lambda -> SQS -> Lambdas fanout like this:
Set up a CloudWatch Events Rule that triggers on a schedule. Have this trigger a dispatch Lambda function.
The dispatch Lambda function's job is to read all of the entries in your DynamoDB table and write messages to a jobs SQS queue, one per DynamoDB item.
Create a worker Lambda function that does whatever you want it to do with any given item from your DynamoDB table.
Connect the worker Lambda to the jobs SQS queue so that an instance of it will dispatch whenever something is put on the queue.
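The dispatch step above can be sketched with plain Python: SQS's SendMessageBatch accepts at most 10 entries per call, so scanned items are chunked accordingly. This is a minimal sketch of the chunking only; the surrounding boto3 calls (a paginated `table.scan` feeding this function, and `sqs.send_message_batch(QueueUrl=..., Entries=batch)` consuming its output) are assumed:

```python
import json


def sqs_batch_entries(items, batch_size=10):
    """Turn scanned DynamoDB items into SendMessageBatch entry lists.

    SQS SendMessageBatch accepts at most 10 entries per call, so items
    are chunked into lists of up to batch_size. Each entry's body carries
    one item as JSON for the worker Lambda's SQS trigger to receive.
    """
    for start in range(0, len(items), batch_size):
        chunk = items[start:start + batch_size]
        yield [
            {"Id": str(start + i), "MessageBody": json.dumps(item)}
            for i, item in enumerate(chunk)
        ]
```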
Since the limiting factor is Lambda timeouts, you can run multiple Lambdas using Step Functions. Perform a paginated Scan of the table; each Lambda returns the LastEvaluatedKey and passes it to the next invocation, which uses it as the starting point for the next page.
I think your best option is, just as you pointed out, to run a Lambda every time a DynamoDB record is updated. This is possible thanks to DynamoDB Streams.
A stream is an ordered record of changes that happen to a table. It can invoke a Lambda, so it's automatic (however, beware that each change appears only once in the stream, so set up a DLQ in case your Lambda fails). This approach scales well and is also quite evolvable: if need be, you can push the events from the stream to SQS or Kinesis, fan out, etc., depending on the requirements.

What event can be triggered to fire a lambda function in DynamoDB?

I can't seem to find documentation on what kinds of events DynamoDB can use to trigger a Lambda function. All I can find are mentions of a new record being added to a table or a record being updated. Are those the only actions/events available? Or could I also trigger a Lambda function when I request a record that does not exist (which is what I need in my case, where I will be using DynamoDB as a cache)?
Triggering AWS Lambda through events happening in DynamoDB is done by utilizing DynamoDB Streams.
As stated in the documentation:
DynamoDB Streams captures a time-ordered sequence of item-level modifications in any DynamoDB table, and stores this information in a log for up to 24 hours.
So streams only capture operations that modify data; read operations are not captured.
Triggering a Lambda function automatically because somebody queried for a key that doesn't exist is not supported by DynamoDB. You would have to handle that in your querying code.