Lambda Function Inconsistently Missing DynamoDB Triggers - amazon-web-services

Perhaps I am misunderstanding something, but I have a Lambda function that is triggered by new items being added to a DynamoDB table. My trigger is configured like so:
DynamoDB Table Name: My Table
Batch Size: 100
Starting Position: Latest
My function's code filters out any events that are not INSERT, and for the most part this works well. I am noticing, however, that some of my new records occasionally do not trigger the Lambda function (I update the record with a completed tag when the function has run). I cannot find any rhyme or reason as to why, but I wonder if I am misunderstanding what Batch Size does (I want every new record to trigger the function, as my users will be publishing individual records to the table).
Is this common behavior or is there more I could share to learn what could be causing this?
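One thing that may be worth checking: with a batch size of 100, a single invocation can receive up to 100 stream records, so the handler has to loop over every entry in the event rather than looking only at the first one. The following is a minimal sketch, assuming a Python handler; `process_new_item` is a hypothetical placeholder for the real work, and `NewImage` is only present if the stream view type includes new images.

```python
def process_new_item(new_image):
    """Hypothetical placeholder for the real per-record work."""
    return new_image


def handler(event, context=None):
    """Process every INSERT record in a DynamoDB stream batch."""
    processed = []
    for record in event.get("Records", []):
        # Skip MODIFY and REMOVE events; only new items are of interest.
        if record.get("eventName") != "INSERT":
            continue
        # NewImage is absent if the stream view type is KEYS_ONLY or OLD_IMAGE.
        new_image = record["dynamodb"].get("NewImage", {})
        processed.append(process_new_item(new_image))
    return {"processed": len(processed)}
```

A handler that reads only the first record of the batch, or returns early, would be one possible cause of new records that never get their completed tag.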

Related

AWS Lambda - trigger synchronously repeatedly until a condition has been met

I have a use case where I want a scheduled Lambda to read from a DynamoDB table until there are no records left to process from its DynamoDB query. I don't want to run lots of instances of the Lambda, as it will hit a REST endpoint each time and I don't want to overload this external service.
The reason I think I can't use DynamoDB Streams (please correct me if I am wrong here) is that this DDB table is where messages will be sent when a legacy service is down. The scheduled error-handler Lambda that reads them should not try to process them as soon as they are inserted, as it is likely the legacy service is still down. (Is it possible with streams to update one row in the DB, say legacy_service = alive, and then trigger a Lambda ONLY for the rows where processed_status = false?)
I also don't want multiple instances of the Lambda running at one time, as I don't want to overload the legacy service.
I would like a scheduled Lambda that queries the DynamoDB table for all records that have processed_status = false. The query has a limit so that it only retrieves a small batch (1 or 2 messages) and processes them (I have this part implemented already). When this Lambda is finished, I would like it to trigger again and again until there are no records in the DDB table with processed_status = false.
This can be done with recursive functions; there is a good tutorial here: https://labs.ebury.rocks/2016/11/22/recursive-amazon-lambda-functions/
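As a rough illustration of the idea (not the tutorial's code), the function can keep pulling small batches within one invocation and only re-invoke itself when it is about to run out of time. This is a sketch in Python; all four callables are hypothetical stand-ins that you would wire to your own query, processing code, the Lambda context's `get_remaining_time_in_millis`, and an asynchronous self-invoke.

```python
def drain(fetch_batch, process, remaining_ms, reinvoke, min_ms=10_000):
    """Process small batches until none remain or time is nearly up.

    Hypothetical stand-ins:
      fetch_batch()  -> list of unprocessed items (e.g. a limited query)
      process(item)  -> handles one item and marks it as processed
      remaining_ms() -> milliseconds left in this invocation
      reinvoke()     -> asynchronously invokes this same function again
    """
    handled = 0
    while True:
        batch = fetch_batch()
        if not batch:
            # Nothing left with processed_status = false: stop the chain.
            return {"handled": handled, "reinvoked": False}
        for item in batch:
            process(item)
            handled += 1
        if remaining_ms() < min_ms:
            # Hand over to a fresh invocation before the timeout hits.
            reinvoke()
            return {"handled": handled, "reinvoked": True}
```

Because items are processed one after another and the next invocation starts only when this one is finishing, the external service should see roughly one caller at a time, assuming the schedule itself does not fire while a chain is still running.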

Triggering AWS Lambda when a DynamoDB table grows to a certain size

I'm interested in seeing whether I can invoke an AWS Lambda when one of my DynamoDB tables grows to a certain size. Nothing in the DynamoDB Events/Triggers docs nor the Lambda Developer Guide suggests this is possible, but I find that hard to believe. Anyone ever deal with anything like this before?
You will have to do it manually.
I see two out-of-the-box ways to achieve this though:
1) You can create a CloudWatch Event that runs every X minutes (replace X with whatever is necessary for your business case) to trigger your Lambda function. Your function then needs to invoke the describeTable API and check the reported table size against your limit. Once the limit has been reached, you can disable the event, since your table has reached the size you wanted to be notified about. This is the easiest and most cost-effective option, since most of the time your table's size will be lower than your predefined limit.
2) You could also use DynamoDB Streams and invoke the describeTable API, but then your function would be triggered upon every new event in your table. This is not cost-effective and, in my opinion, overkill.
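The scheduled check in option 1 could look roughly like the sketch below, written in Python with boto3. `TABLE_NAME` and `MAX_BYTES` are hypothetical values, and the client is passed in as a parameter only so the size check can be exercised without AWS access.

```python
TABLE_NAME = "MyTable"       # hypothetical table name
MAX_BYTES = 1_000_000_000    # hypothetical size limit (about 1 GB)


def table_exceeds(client, table_name, max_bytes):
    """Return True if the table's reported size is at or above max_bytes.

    `client` is expected to behave like boto3.client("dynamodb").
    """
    table = client.describe_table(TableName=table_name)["Table"]
    return table["TableSizeBytes"] >= max_bytes


def handler(event, context=None, client=None):
    if client is None:
        import boto3  # imported lazily so table_exceeds is testable offline
        client = boto3.client("dynamodb")
    if table_exceeds(client, TABLE_NAME, MAX_BYTES):
        # React here: send a notification, then disable the schedule.
        return {"limit_reached": True}
    return {"limit_reached": False}
```

Keep in mind that DynamoDB refreshes the size and item count reported by describeTable only about every six hours, so there is little benefit in running the check much more often than that.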

How do you run functions in parallel?

My desire is to retrieve x number of records from a database based on some custom select statement, the output will be an array of json data. I then want to pass each element in the array into another lambda function in parallel.
So if 1000 records are returned, 1000 lambda functions need to be executed in parallel (I increase my account limit to what I need). If 30 out of 1000 fail, the main task that was retrieving the records needs to know about it.
I'm struggling to put together this simple flow.
I currently use JavaScript and AWS Aurora. I'm not looking for Node.js/JavaScript code that retrieves the data, just the AWS Step Functions configuration and how to build an array within each function.
Thank you.
"if 1000 records are returned, 1000 lambda functions need to be executed in parallel"
What you are trying to achieve is not supported by Step Functions. A State Machine task cannot be modified based on the input it received. So for instance, a Parallel task cannot be configured to add/remove functions based on the number of items it received in an array input.
You should probably consider using an SQS Lambda trigger. The records retrieved from the DB can be added to an SQS queue, which will then trigger a Lambda function for the items received.
"If 30 out of 1000 fail, the main task that was retrieving the records needs to know about it."
There are various ways to achieve this. SQS won't delete an item from the queue if Lambda returns an error. You can configure a DLQ and RedrivePolicy based on your requirements. Or you may want to come up with a custom solution that keeps count of the failing Lambdas and invokes the service that fetches records from the DB.
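The enqueueing side of the SQS suggestion might be sketched as follows, in Python with a boto3-style SQS client. SendMessageBatch accepts at most 10 entries per call, so the records are chunked first; the function names and the use of the record's position as the entry Id are assumptions made for illustration.

```python
import json


def to_sqs_batches(records, batch_size=10):
    """Yield lists of SendMessageBatch entries, at most batch_size each."""
    for start in range(0, len(records), batch_size):
        chunk = records[start:start + batch_size]
        yield [
            # Id must be unique within a batch; the record's index is used here.
            {"Id": str(start + offset), "MessageBody": json.dumps(record)}
            for offset, record in enumerate(chunk)
        ]


def enqueue(client, queue_url, records):
    """Send all records; return the Ids SQS reported as failed to enqueue."""
    failed = []
    for entries in to_sqs_batches(records):
        response = client.send_message_batch(QueueUrl=queue_url, Entries=entries)
        failed.extend(item["Id"] for item in response.get("Failed", []))
    return failed
```

Note that the returned list only covers messages that could not be enqueued. Failures inside the consuming Lambda would still surface through the retry and DLQ behaviour described above.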

DynamoDB not triggering lambda

I'm experimenting with DynamoDB and Lambda and am having trouble with the following flow:
Lambda A is triggered by an S3 put event. It takes the object, an audio file, calculates its duration and writes a record in DynamoDB for each 30-second segment.
Lambda B is triggered by DynamoDB, downloads the file from S3 and operates on the 30-second segment defined in the DynamoDB row.
My trouble is that when I run this flow, function A writes all of the required rows to DynamoDB, but function B:
Does not seem to be triggered for each row in DynamoDB
Times out after 5 minutes
Configuration
Function B is set with the highest memory and a 5-minute timeout
The trigger is set with a batch size of 1 and starting position Latest
Things I've confirmed
When function B is triggered, the download from S3 happens fast. This does not seem to be the blocker
When I trigger function B with a test event it executes perfectly.
When I look at the CloudWatch metrics, function B has a nearly 100% error rate in invocation. I can't tell if this means the function was invoked and had an error or could not be invoked at all.
Has anyone had similar issues? Any idea what to check next?
Thanks
I had the same problem. The solution was to create a version of the Lambda and use that 'fixed' version instead of $LATEST.
It is not possible to use the latest ever-changing version to build a trigger upon.
Place to do that:
Lambda / Functions / YourLambdaName / Qualifiers Dropdown on the page / Switch versions/aliases / Version Tab -> check that you have a version
If not -> Actions / Publish new version
Check that the DynamoDB stream is enabled on the table.
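Also note that 5 minutes is not Lambda's default timeout (the default is 3 seconds); it was the maximum allowed when this was asked. You can find this mentioned in the forums.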

AWS - want to upload multiple files to S3 and only when all are uploaded trigger a lambda function

I am seeking advice on what's the best way to design this -
Use Case
I want to put multiple files into S3. Once all files are successfully saved, I want to trigger a lambda function to do some other work.
Naive Approach
The way I am approaching this is by saving a record in Dynamo that contains a unique identifier and the total number of records I will be uploading along with the keys that should exist in S3.
A basic implementation would be to take my existing Lambda function, which is invoked any time my S3 bucket is written to, and have it check manually whether all the other files have been saved.
The Lambda function would look in Dynamo to determine which files we're expecting and query S3 to see whether the other files are in. If so, it would use SNS to trigger my other Lambda that will do the other work.
Edit: Another approach is have my client program that puts the files in S3 be responsible for directly invoking the other lambda function, since technically it knows when all the files have been uploaded. The issue with this approach is that I do not want this to be the responsibility of the client program... I want the client program to not care. As soon as it has uploaded the files, it should be able to just exit out.
Thoughts
I don't think this is a good idea, mainly because Lambda functions should be lightweight. Polling the database from within the Lambda function to get the S3 keys of all the uploaded files, and then checking in S3 whether they are there, seems hacky and very repetitive when done on every invocation.
What's the better approach? I was thinking something like using SWF but am not sure if that's overkill for my solution or if it will even let me do what I want. The documentation doesn't show real "examples" either. It's just a discussion without much of a step by step guide (perhaps I'm looking in the wrong spot).
Edit: In response to mbaird's suggestions below:
Option 1 (SNS) This is what I will go with. It's simple and doesn't really violate the Single Responsibility Principle. That is, the client uploads the files and sends a notification (via SNS) that its work is done.
Option 2 (Dynamo streams) So this is essentially another "implementation" of Option 1. The client makes a service call, which in this case results in a table update vs. an SNS notification (Option 1). This update would trigger the Lambda function, as opposed to a notification. Not a bad solution, but I prefer using SNS for communication rather than relying on a database's capability (in this case Dynamo streams) to call a Lambda function.
In any case, I'm using AWS technologies and have coupling with their offering (Lambda functions, SNS, etc.) but I feel relying on something like Dynamo streams is making it an even tighter coupling. Not really a huge concern for my use case but still feels dirty ;D
Option 3 (S3 triggers) My concern here is the possibility of race conditions. For example, if multiple files are being uploaded by the client simultaneously (think of several async uploads fired off at once with varying file sizes), what if two files happen to finish uploading at around the same time, and two or more Lambda functions (or whatever implementations we use) query Dynamo and both get back N as the completed uploads (instead of N and N+1)? Now even though the final result should be N+2, each one would add 1 to N. Nooooooooooo!
So Option 1 wins.
If you don't want the client program responsible for invoking the Lambda function directly, then would it be OK if it did something a bit more generic?
Option 1: (SNS) What if it simply notified an SNS topic that it had completed a batch of S3 uploads? You could subscribe your Lambda function to that SNS topic.
Option 2: (DynamoDB Streams) What if it simply updated the DynamoDB record with an attribute such as record.allFilesUploaded = true? You could have your Lambda function trigger off the DynamoDB stream. Since you are already creating a DynamoDB record via the client, this seems like a very simple way to mark the batch of uploads as complete without having to code in knowledge about what needs to happen next. The Lambda function could then check the "allFilesUploaded" attribute instead of having to go to S3 for a file listing every time it is called.
Alternatively, don't insert the DynamoDB record until all files have finished uploading, then your Lambda function could just trigger off new records being created.
Option 3: (continuing to use S3 triggers) If the client program can't be changed from how it works today, then instead of listing all the S3 files and comparing them to the list in DynamoDB each time a new file appears, simply update the DynamoDB record via an atomic counter. Then compare the resulting value against the size of the file list. Once the values are the same, you know all the files have been uploaded. The downside is that you need to provision enough capacity on your DynamoDB table to handle all the updates, which is going to increase your costs.
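To make the atomic counter in Option 3 concrete, here is a minimal sketch in Python using a boto3-style DynamoDB client. The attribute names `batchId` and `uploadedCount` are hypothetical, and the client is a parameter so the logic can be tried without AWS access.

```python
def record_upload(client, table_name, batch_id, expected_count):
    """Atomically bump the upload counter; return True when the batch is complete."""
    response = client.update_item(
        TableName=table_name,
        Key={"batchId": {"S": batch_id}},
        # ADD increments the number in place, creating it at 0 if absent.
        UpdateExpression="ADD uploadedCount :one",
        ExpressionAttributeValues={":one": {"N": "1"}},
        # Ask for the value as it is after this update was applied.
        ReturnValues="UPDATED_NEW",
    )
    new_count = int(response["Attributes"]["uploadedCount"]["N"])
    return new_count == expected_count
```

Since each call increments and reads the counter in a single update rather than a separate read followed by a write, two uploads finishing at the same moment should see N+1 and N+2 instead of both seeing N, which is the race described in the question. This assumes each file produces exactly one call.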
Also, I agree with you that SWF is overkill for this task.