Invoke Lambda based on Kinesis records Partition Key - amazon-web-services

I have different types of data flowing in through my Kinesis streams. Each record has a different partition key. I need to invoke a Lambda function only if a record with a certain partition key is added to the stream. Is there a way to specify that the Lambda should be triggered only when partition key "a" is encountered, rather than invoking the Lambda and then checking the partition key?

It's more of a design question; I don't know if you can configure this. But you can always use Lambda chaining with SNS.
Lambda Functions with SNS
Create an SNS topic for each of your partition keys and subscribe the corresponding Lambda to each topic. Then create one parent Lambda function that reads the partition key and object from each record and publishes the object to the respective SNS topic.
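A minimal sketch of such a parent function, assuming a Python Lambda attached to the Kinesis stream. The `topic_arns` mapping, the placeholder ARN, and the `route_records` helper are made up for illustration; the SNS call is passed in as a callable so the routing logic can be tried without AWS.

```python
import base64


def route_records(event, topic_arns, publish):
    """Publish each Kinesis record to the SNS topic mapped to its partition key.

    topic_arns: dict of partition key -> topic ARN (hypothetical mapping).
    publish: callable taking (topic_arn, message), e.g. a thin wrapper
             around boto3's sns.publish(TopicArn=..., Message=...).
    Returns the number of records that were published.
    """
    published = 0
    for record in event.get("Records", []):
        kinesis = record["kinesis"]
        topic_arn = topic_arns.get(kinesis["partitionKey"])
        if topic_arn is None:
            continue  # no downstream Lambda is interested in this key
        # Kinesis delivers the record payload base64-encoded.
        payload = base64.b64decode(kinesis["data"]).decode("utf-8")
        publish(topic_arn, payload)
        published += 1
    return published


def handler(event, context):
    import boto3  # imported lazily so route_records can be tested offline

    sns = boto3.client("sns")
    # Placeholder ARN: replace with one topic per partition key you care about.
    topic_arns = {"a": "arn:aws:sns:us-east-1:123456789012:records-a"}
    return route_records(
        event,
        topic_arns,
        lambda arn, msg: sns.publish(TopicArn=arn, Message=msg),
    )
```

Note that the parent Lambda is still invoked for every batch; only the downstream Lambdas are restricted to their own key.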

You can either use an SNS topic where you push only records that have a certain partition key and then connect another lambda function to that topic that processes the records.
Another option would be to asynchronously re-invoke your lambda function. The benefit is that you don't need another component (i.e., no SNS topic).
The idea is to check in your function whether the partition key is "a" and, if it is, re-invoke the function asynchronously with the record and a specific parameter indicating that the record should be processed.
You can read more about how this can be done here:
https://engineering.dubsmash.com/implementing-real-time-analytics-using-aws-kinesis-lambda-1ea56f10e473
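A rough sketch of the re-invoke approach, not taken from the linked article. The `process_record` flag and the `handle` helper are invented names; the only AWS call is boto3's `lambda` client `invoke` with `InvocationType="Event"`, which makes the invocation asynchronous.

```python
import json

TARGET_KEY = "a"  # partition key that should trigger processing


def handle(event, invoke_async, process):
    """Either process a re-invoked record or filter a Kinesis batch.

    invoke_async(payload) and process(record) are injected callables.
    "process_record" is a made-up marker, not an AWS field.
    """
    if event.get("process_record"):
        process(event["record"])
        return "processed"
    for record in event.get("Records", []):
        if record["kinesis"]["partitionKey"] == TARGET_KEY:
            invoke_async(json.dumps({"process_record": True, "record": record}))
    return "filtered"


def handler(event, context):
    import boto3  # imported lazily so handle() can be tested offline

    client = boto3.client("lambda")
    return handle(
        event,
        lambda payload: client.invoke(
            FunctionName=context.function_name,  # re-invoke this same function
            InvocationType="Event",  # asynchronous invocation
            Payload=payload.encode("utf-8"),
        ),
        process=print,  # stand-in for the real processing logic
    )
```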

Related

AWS Lambda function data consistency with DynamoDb

Using DynamoDB Streams, I am triggering a Lambda function. In the function I retrieve data from DynamoDB based on the primary key; if the key matches, the row needs to be updated. If it does not match, a new entry is created in DynamoDB.
The Lambda function can scale out according to the number of shards in the stream.
If requests with the same primary key arrive in different shards, multiple instances of the Lambda function will try to read and update the same row at the same time, which will eventually produce wrong data in the database (data may be overwritten).
One solution I am considering is to add a UUID column and use conditional writes in DynamoDB, so that a write fails if the row was updated by another instance. But then all steps have to be executed again for the failed data.
Another solution is to set the "reservedConcurrentExecutions" property of the Lambda function to 1 so that the function does not scale. I am not sure whether it throws an exception when more than one shard is created in the DynamoDB stream.
I would like to know how I can implement this scenario.
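The conditional-write idea could look roughly like the sketch below. It swaps the UUID column for an integer `version` attribute (an assumption, chosen to keep the example short) and builds keyword arguments in the shape boto3's `Table.update_item` accepts. `guarded_write` and `update_with_retry` are hypothetical helpers; reading and writing are injected so the retry logic can be exercised without a table.

```python
def guarded_write(key, attribute, value, current_version):
    """Build update_item kwargs that only succeed if the version is unchanged.

    current_version is None when the row does not exist yet.
    The "version" attribute name is an assumption for this sketch.
    """
    names = {"#attr": attribute, "#ver": "version"}
    values = {":value": value, ":next": (current_version or 0) + 1}
    if current_version is None:
        condition = "attribute_not_exists(#ver)"  # create only if still absent
    else:
        condition = "#ver = :expected"  # update only if nobody else wrote
        values[":expected"] = current_version
    return {
        "Key": key,
        "UpdateExpression": "SET #attr = :value, #ver = :next",
        "ConditionExpression": condition,
        "ExpressionAttributeNames": names,
        "ExpressionAttributeValues": values,
    }


def update_with_retry(read_item, write_item, key, attribute, compute,
                      conflict_error, attempts=3):
    """Read, compute, and write; re-run all steps when the condition fails."""
    for _ in range(attempts):
        item = read_item(key)  # None when the row does not exist yet
        version = item["version"] if item else None
        try:
            write_item(**guarded_write(key, attribute, compute(item), version))
            return True
        except conflict_error:
            continue  # another instance won the race; re-read and try again
    return False
```

With boto3, `conflict_error` would be the client's `ConditionalCheckFailedException`; here it is a parameter so the sketch does not depend on AWS.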

Run a lambda on every DynamoDb entry on schedule?

Is there a way to run a Lambda on every DynamoDb table record?
I have a Dynamo table with name, last name, email and a Lambda that takes name, last name, email as parameters. I am trying to configure the environment such that, every day, the Lambda runs automatically for every value it finds within Dynamo; can't do all the records in one Lambda as it won't scale (will timeout once more users are added).
I currently have a CloudWatch rule set up that triggers the lambda on schedule but I had to manually add the parameters to the trigger from Dynamo - It's not automatic and not dynamic/not connected to dynamo.
--
Another option would be to run a lambda every time a DynamoDb record is updated... I could update all the records weekly and then upon updating them the Lambda would be triggered but I don't know if that's possible either.
Some more insight on either one of these approaches would be appreciated!
Is there a way to run a Lambda on every DynamoDb table record?
For your specific case where all you want to do is process each row of a DynamoDB table in a scalable fashion, I'd try going with a Lambda -> SQS -> Lambdas fanout like this:
Set up a CloudWatch Events Rule that triggers on a schedule. Have this trigger a dispatch Lambda function.
The dispatch Lambda function's job is to read all of the entries in your DynamoDB table and write messages to a jobs SQS queue, one per DynamoDB item.
Create a worker Lambda function that does whatever you want it to do with any given item from your DynamoDB table.
Connect the worker Lambda to the jobs SQS queue so that it is invoked whenever something is put on the queue.
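The steps above can be sketched as follows. The table name `users`, the queue URL, and the helper names are placeholders; the scan and the SQS send are injected so the fan-out logic can be run locally.

```python
import json


def dispatch(scan_page, send_message):
    """Walk the whole table page by page and enqueue one job per item."""
    sent = 0
    start_key = None
    while True:
        page = scan_page(start_key)
        for item in page.get("Items", []):
            # default=str covers Decimal values returned for numeric attributes.
            send_message(json.dumps(item, default=str))
            sent += 1
        start_key = page.get("LastEvaluatedKey")
        if not start_key:
            return sent  # no more pages


def worker_handler(event, context, process=print):
    """SQS-triggered worker: each record body is one table item."""
    for record in event["Records"]:
        process(json.loads(record["body"]))


def dispatch_handler(event, context):
    import boto3  # imported lazily so dispatch() can be tested offline

    table = boto3.resource("dynamodb").Table("users")  # hypothetical table name
    sqs = boto3.client("sqs")
    queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/jobs"  # placeholder

    def scan_page(start_key):
        kwargs = {"ExclusiveStartKey": start_key} if start_key else {}
        return table.scan(**kwargs)

    return dispatch(
        scan_page,
        lambda body: sqs.send_message(QueueUrl=queue_url, MessageBody=body),
    )
```

The dispatcher only reads and enqueues, so it stays fast; the per-item work is spread across worker invocations.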
Since the limiting factor is lambda timeouts, run multiple lambdas using step functions. Perform a paginated scan of the table; each lambda will return the LastEvaluatedKey and pass it to the next invocation for the next page.
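One way to write the per-page Lambda, as a sketch: each invocation scans a single page and returns the key for the next one, and a Choice state in the state machine could loop until `done` is true. The `done` field, the page size of 100, and the table name are assumptions.

```python
def process_page(event, scan_page, process):
    """Handle one page of a scan and return the state for the next invocation."""
    start_key = event.get("LastEvaluatedKey")  # absent or None on the first run
    page = scan_page(start_key)
    for item in page.get("Items", []):
        process(item)
    next_key = page.get("LastEvaluatedKey")
    return {"LastEvaluatedKey": next_key, "done": next_key is None}


def handler(event, context):
    import boto3  # imported lazily so process_page() can be tested offline

    table = boto3.resource("dynamodb").Table("users")  # hypothetical table name

    def scan_page(start_key):
        kwargs = {"Limit": 100}  # page size is an arbitrary choice
        if start_key:
            kwargs["ExclusiveStartKey"] = start_key
        return table.scan(**kwargs)

    # Assumes string key attributes so the returned key is JSON-serializable.
    return process_page(event, scan_page, process=print)
```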
I think your best option is, just as you pointed out, to run a Lambda every time a DynamoDB record is updated. This is possible thanks to DynamoDB streams.
Streams are an ordered record of the changes that happen to a table. They can invoke a Lambda, so it's automatic (however, beware that each change appears only once in the stream, so set up a DLQ in case your Lambda fails). This approach scales well and is also easy to evolve. If need be, you can push the events from the stream to SQS or Kinesis, fan out, etc., depending on the requirements.
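For illustration, a stream-triggered handler might look like the sketch below. It assumes the stream view type includes new images and that the table has a string attribute called `email`; `changed_items` is a hypothetical helper.

```python
def changed_items(event):
    """Return (event_name, new_image) for inserts and updates in a stream batch."""
    changes = []
    for record in event.get("Records", []):
        if record["eventName"] in ("INSERT", "MODIFY"):
            # NewImage is only present when the stream view type includes it.
            image = record["dynamodb"].get("NewImage", {})
            changes.append((record["eventName"], image))
    return changes


def handler(event, context):
    for name, image in changed_items(event):
        # Attribute values arrive in DynamoDB JSON, e.g. {"S": "alice@example.com"}.
        email = image.get("email", {}).get("S")
        print(name, email)
```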

What event can be triggered to fire a lambda function in DynamoDB?

I can't seem to find the documentation on what kinds of events in DynamoDB can trigger a Lambda function. All I can find are mentions of when a new record is added to a table or a record is updated. Are those the "only" two actions/events available? Or could I also trigger a Lambda function when I request a record that does not exist (which is what I need in my case, where I will be using DynamoDB as a cache)?
Triggering AWS Lambda through events happening in DynamoDB is done by utilizing DynamoDB Streams.
As stated in the documentation:
DynamoDB Streams captures a time-ordered sequence of item-level modifications in any DynamoDB table, and stores this information in a log for up to 24 hours.
So they only capture operations which modify data, which isn't the case for read operations.
Triggering a Lambda function automatically because somebody queried for a key that doesn't exist is not supported by DynamoDB. You would have to handle that in your querying code.
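Handling the miss in the querying code could look like this sketch. It relies on the fact that a `get_item` response has no `Item` key when nothing matches; `get_or_load`, `load`, and the injected callables are hypothetical.

```python
def get_or_load(key, get_item, load, put_item):
    """Return (item, was_miss); on a cache miss, load the value and store it.

    get_item(key) returns a response dict like boto3's get_item,
    which omits "Item" when the key does not exist.
    """
    response = get_item(key)
    item = response.get("Item")
    if item is not None:
        return item, False
    item = load(key)  # e.g. call the slow backend, or invoke a Lambda here
    put_item(item)    # populate the cache for the next reader
    return item, True
```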

Multiple DynamoDB triggers for Lambda - Separate invocation per table?

If I have a Lambda function that has multiple DynamoDB Stream triggers, is it guaranteed that each Lambda invocation only contains records from one table?
Yes. Each Lambda invocation will get records from only one table.
Refer to Using AWS Lambda with Amazon DynamoDB.
The following is an extract from that page:
The event your Lambda function receives is the table update information AWS Lambda reads from your stream. When you configure event source mapping, the batch size you specify is the maximum number of records that you want your Lambda function to receive per invocation.
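If the function needs to know which table a batch came from, one option is to read it from the `eventSourceARN` field of each stream record. A small sketch, with hypothetical helper names:

```python
def table_name(record):
    """Extract the table name from a DynamoDB stream record's eventSourceARN."""
    # ARN shape: arn:aws:dynamodb:<region>:<account>:table/<name>/stream/<label>
    # The stream label contains colons, so split at most five times.
    resource = record["eventSourceARN"].split(":", 5)[5]
    return resource.split("/")[1]


def tables_in_batch(event):
    """Return the set of table names seen in one invocation's records."""
    return {table_name(record) for record in event.get("Records", [])}
```

Logging `tables_in_batch(event)` is a cheap way to confirm that each invocation only sees one table.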

Can I ensure that AWS DynamoDB triggers are NOT handled in parallel by an AWS Lambda function?

I have a scenario where I have a DynamoDB table with a trigger (a stream) to an AWS Lambda function.
I want to use DynamoDB as an Event Store and use the Lambda function to maintain a projection / aggregate view / read view of the data.
I need to make sure that when I save a CreateEntity event in DynamoDB and then, perhaps right after, an UpdateEntity event, the Lambda function will process the CreateEntity event before the UpdateEntity event.
My understanding is that the parallelism of the triggers to Lambda depends on the number of shards the DynamoDB stream consists of. So if the DynamoDB stream that the Lambda function uses has 2 shards, and one event goes to Shard1 and the other goes to Shard2, then they can be processed in parallel by two instances of the Lambda function.
So if the CreateEntity event is on Shard1 and UpdateEntity is on Shard2 then if Shard1 or the Lambda function instance for some reason is slow then the UpdateEntity event in Shard2 might be processed first. Meaning that it cannot be added to the projection because there is no entity created first.
Is my understanding correct?
Is there a way to ensure that the events are only processed by one instance of the Lambda function so that I can ensure the ordering of processing of the messages?
Or do I have to use something other than Lambda for this? For example, a DynamoDB stream to Kinesis with my own application, where I can ensure that only one instance of the application is running and guarantee the ordering that way.
This is partly correct.
If you CreateEntity X and then UpdateEntity X, then in almost all cases both will land on the same shard (entities are distributed across shards according to their composite key).
The only case where this won't work is when a single entity is split across shards, which can happen only if you have a small number of unique entities with many records each. If you are in that situation, you are doing something wrong.
So in your case, the ordering is ensured.
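As a sketch of what the projection Lambda could do with an ordered batch: apply the records in the order they arrive, and fail loudly if an update shows up for an unknown entity so the batch gets retried. The attribute names `entityId` and `eventType`, and the shape of the projection, are assumptions about the event store.

```python
def apply_batch(projection, event):
    """Apply event-store inserts to an in-memory projection, in batch order."""
    for record in event.get("Records", []):
        if record["eventName"] != "INSERT":
            continue  # an append-only event store only produces inserts
        image = record["dynamodb"]["NewImage"]  # needs a view type with new images
        entity_id = image["entityId"]["S"]
        event_type = image["eventType"]["S"]
        if event_type == "CreateEntity":
            projection[entity_id] = {"updates": 0}
        elif event_type == "UpdateEntity":
            if entity_id not in projection:
                # Raising fails the invocation so the batch can be retried.
                raise LookupError(f"update before create for {entity_id}")
            projection[entity_id]["updates"] += 1
    return projection
```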