How to wait for all s3 files to be uploaded - amazon-web-services

I have a Step Function State Machine that executes some lambdas in a flow.
I only want to run the State Machine after I upload the 10 required files for my flow, which is the tricky part.
10 files are manually uploaded to an S3 bucket
S3 upload triggers a notification to EventBridge (need to wait for all new CSVs to be uploaded)
EventBridge starts the State Machine
I think my current flow will not work because it will call the state machine 10 times, once for each file upload...
I know how to use S3 file upload to trigger a state machine like in this example but I don't see how to make sure that ALL the files are uploaded before triggering the state machine ONCE.
Is this possible to achieve? Any ideas?

I think my current flow will not work because it will call the state machine 10 times for each file upload
That's right. Every time an object is uploaded, the S3 event will trigger EventBridge, which will invoke the Step Function.
To achieve your desired workflow, swap out the EventBridge rule for a Lambda function. Have the Lambda function check the number of objects in the bucket, and if the condition is met (all 10 files have been uploaded) invoke the Step Function directly.
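A minimal sketch of such a counting Lambda, assuming a hypothetical state machine ARN and that the bucket holds nothing but the 10 flow files. The AWS clients are injectable so the counting logic can be exercised without AWS:

```python
import json

REQUIRED_FILES = 10  # the flow needs all 10 CSVs

def count_objects(s3, bucket):
    """Count every object in the bucket, paginating past 1000 keys."""
    count = 0
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        count += page.get("KeyCount", 0)
    return count

def lambda_handler(event, context, s3=None, sfn=None):
    # Clients are injectable for testing; inside Lambda they default to boto3.
    if s3 is None or sfn is None:
        import boto3
        s3 = s3 or boto3.client("s3")
        sfn = sfn or boto3.client("stepfunctions")

    bucket = event["Records"][0]["s3"]["bucket"]["name"]
    count = count_objects(s3, bucket)
    if count < REQUIRED_FILES:
        # Not all files are present yet; wait for the next upload event.
        return {"status": "waiting", "count": count}

    # All 10 files are present: start the state machine.
    sfn.start_execution(
        stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:MyFlow",  # hypothetical ARN
        input=json.dumps({"bucket": bucket}),
    )
    return {"status": "started", "count": count}
```

Note one caveat with this approach: if the 10th upload and the count race, you may want a lock (e.g. a DynamoDB conditional write) to guarantee the state machine starts exactly once.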

Related

S3 Upload Trigger ( Multiple Files )

I have a foo lambda that executes some code by reading some files.
I only want to run the lambda after I upload the 10 required files, which is the tricky part.
10 files are uploaded in S3 bucket via bitbucket pipeline
??? (need to wait for all new CSVs to be uploaded)
Execute foo lambda
If I use an S3 upload trigger it will not work because it will call the lambda 10 times, once for each file upload...
The 10 files already exist in the S3 bucket; I just replace them.
Any ideas how to run only the foo lambda ONCE after the 10 files are uploaded?
The AWS Lambda function will be triggered for every object created in the Amazon S3 bucket.
There is no capability to ask for the Lambda function to run only after 10 files are uploaded.
You will need to add custom code to the Lambda function to determine whether it is 'ready' to trigger your work (and the definition of 'ready' is up to you!).
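One possible definition of "ready" is a fixed set of expected keys. A hedged sketch, assuming hypothetical file names under an `incoming/` prefix (the S3 client is injectable for testing):

```python
# Hypothetical expected keys; replace with your 10 real file names.
REQUIRED_KEYS = {f"incoming/file_{i}.csv" for i in range(1, 11)}

def missing_keys(existing_keys, required=REQUIRED_KEYS):
    """Return the required keys that have not been uploaded yet."""
    return sorted(required - set(existing_keys))

def lambda_handler(event, context, s3=None):
    if s3 is None:
        import boto3
        s3 = boto3.client("s3")

    bucket = event["Records"][0]["s3"]["bucket"]["name"]
    resp = s3.list_objects_v2(Bucket=bucket, Prefix="incoming/")
    existing = [obj["Key"] for obj in resp.get("Contents", [])]

    still_missing = missing_keys(existing)
    if still_missing:
        # Not ready yet: exit quickly and wait for the next trigger.
        return {"ready": False, "missing": still_missing}

    # All 10 files are present: do the real work here.
    return {"ready": True}
```

Because replaced objects still fire ObjectCreated events, this works even when the 10 files already exist and are merely overwritten; the function simply becomes "ready" on whichever upload completes the set last.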

Sending S3 Trigger to AWS Lambda

I have a lambda function that has a trigger on PUT in the root of an S3 bucket. My function processes any file put in the bucket, then it moves the file to either a processed or a failed sub-folder depending on success. I want to be able to move files in bulk back to the root (easy to do via Management Console), but how do I simulate sending an S3 trigger for each file I move so I can re-process the file? I see how to do it manually via the console, but I would rather write a script to just take a list of files and simulate a trigger for each file. Can I do this using serverless invoke?
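There is no API to replay an S3 notification, but you can build an event of the same shape and invoke the function yourself. A sketch, assuming a hypothetical function name `my-processor` (the Lambda client is injectable for testing):

```python
import json

def make_s3_event(bucket, key):
    """Build a minimal payload in the shape of a real S3 put notification."""
    return {
        "Records": [
            {
                "eventSource": "aws:s3",
                "eventName": "ObjectCreated:Put",
                "s3": {
                    "bucket": {"name": bucket},
                    "object": {"key": key},
                },
            }
        ]
    }

def reprocess(keys, bucket, lam=None, function_name="my-processor"):
    """Invoke the processing Lambda once per key, as an S3 trigger would."""
    if lam is None:
        import boto3
        lam = boto3.client("lambda")
    for key in keys:
        lam.invoke(
            FunctionName=function_name,   # hypothetical name
            InvocationType="Event",       # async, like a real S3 trigger
            Payload=json.dumps(make_s3_event(bucket, key)),
        )
```

If the function reads only the bucket name and key from the event, this synthetic payload is indistinguishable from a real trigger; add any extra fields your handler inspects.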

Is there a way to add delay to trigger a lambda from S3 upload?

I have a Lambda function which is triggered after put/post event of S3 bucket. This works fine if there is only one file uploaded to S3 bucket.
However, at times multiple files are uploaded, and the upload process can take up to 7 minutes to complete. This triggers my Lambda function multiple times, which adds the overhead of handling this in the code.
Is there any way to either trigger the lambda only once for the complete upload or add delay in the function and avoid multiple execution of Lambda function?
There is no specific interval when the files will be uploaded to S3 hence could not use scheduler.
A delay (batching window) was recently added for Lambda functions with Kinesis or DynamoDB event sources, but it is not supported for S3 events.
You can send events from S3 to SQS and have your Lambda consume the SQS messages instead; it consumes them in batches by default.
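With S3 → SQS → Lambda, each SQS message body carries an S3 notification, and Lambda hands your function a batch of messages at once. A minimal handler sketch under that assumption:

```python
import json

def lambda_handler(event, context):
    """Consume a batch of SQS messages, each wrapping an S3 notification."""
    keys = []
    for message in event["Records"]:          # one entry per SQS message in the batch
        body = json.loads(message["body"])    # the S3 notification is the message body
        for record in body.get("Records", []):
            keys.append(record["s3"]["object"]["key"])
    # Process all keys from this batch together instead of one at a time.
    return {"processed": keys}
```

Combined with an SQS batching window, one invocation can cover many uploads, though a single 7-minute upload run may still span more than one batch.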
It seems multipart upload is being used here by the client.
Maybe a duplicate of this? - AWS Lambda and Multipart Upload to/from S3
An alternative might be to have your Lambda function check for existence of all required files before moving on to the action you need to take. The Lambda function would still fire each time, but would exit quickly if not all files have been received yet.

How to Process Multiple Related Files uploaded to s3 as one group using aws Lambda

I have multiple related files being uploaded to an S3 bucket as a group, which I want to process using an AWS Lambda. For example, inventory.txt, orders.txt, and order_details.txt are received externally into one folder in the S3 bucket. These are part of one batch. Someone else will send the same files in another folder in the same bucket.
I want to process these files (cleanse, combine, etc.) at the same time (so 3 files at once) as a batch.
I have dabbled with Lambda on the S3 create-object event, but it gets triggered for each file being uploaded. I want the Lambda to trigger once for the 3 files (and for the additional 3 files in another directory, if applicable).
After the upload process completes, have it create a trigger file (a dummy marker file). For example, if your process uploads orders.txt and inventory.txt to s3://my-bucket/today_date/,
have it create s3://my-bucket/today_date/today_date.complete once the copy is complete.
Add an S3 event trigger to your Lambda function so it executes when a .complete file is uploaded to S3, then process the rest of the files in the Lambda and delete the .complete file. The process repeats the next day.
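A sketch of such a marker-triggered Lambda, assuming the S3 trigger is filtered on the `.complete` suffix (folder names here are illustrative, and the client is injectable for testing):

```python
def lambda_handler(event, context, s3=None):
    if s3 is None:
        import boto3
        s3 = boto3.client("s3")

    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    marker_key = record["s3"]["object"]["key"]     # e.g. 2024-01-15/2024-01-15.complete
    prefix = marker_key.rsplit("/", 1)[0] + "/"    # the batch's folder

    # List the batch folder and keep everything except the marker itself.
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    data_keys = [
        obj["Key"] for obj in resp.get("Contents", [])
        if not obj["Key"].endswith(".complete")
    ]

    # ... process data_keys here (cleanse, combine, etc.) ...

    # Delete the marker so the next batch in this folder starts clean.
    s3.delete_object(Bucket=bucket, Key=marker_key)
    return {"processed": data_keys}
```

Configuring the S3 notification with a `.complete` suffix filter means the Lambda fires exactly once per batch, no matter how many data files each batch contains.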

AWS Lambda and Multipart Upload to/from S3

Using Lambda to move files from an S3 to our Redshift.
The data is placed in the S3 using an UNLOAD command directly from the data provider's Redshift. It comes in 10 different parts that, due to running in parallel, sometimes complete at different times.
I want the Lambda trigger to wait until all the data is completely uploaded before firing the trigger to import the data to my Redshift.
There is an event option in Lambda called "Complete Multipart Upload." Does the UNLOAD function count as a multipart upload within Lambda? Or would the simple "POST" event not fire until all the parts are completely uploaded by the provider?
There is no explicit documentation confirming that Redshift's UNLOAD command counts as a Multipart upload, or any confirming that the trigger will not fire until the data provider's entire upload is complete.
For Amazon S3, a multi-part upload is a single file, uploaded to S3 in multiple parts. When all parts have been uploaded, the client calls CompleteMultipartUpload. Only after the client calls CompleteMultipartUpload will the file appear in S3.
And only after the file is complete will the Lambda function be triggered. You will not get a Lambda trigger for each part.
If your UNLOAD operation is generating multiple objects/files in S3, then it is NOT an S3 "multi-part upload".
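The distinction shows up in the boto3 multipart API sequence: parts are uploaded against an UploadId, and the key becomes a visible object only when complete_multipart_upload is called, which is when the single S3 event fires. A sketch of one such upload (note that with real S3, every part except the last must be at least 5 MB; the client is injectable for testing):

```python
def upload_one_file_in_parts(s3, bucket, key, parts_data):
    """Upload a single S3 object in parts; it is invisible until completion."""
    mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
    upload_id = mpu["UploadId"]

    parts = []
    for number, data in enumerate(parts_data, start=1):
        resp = s3.upload_part(
            Bucket=bucket, Key=key, UploadId=upload_id,
            PartNumber=number, Body=data,
        )
        parts.append({"PartNumber": number, "ETag": resp["ETag"]})

    # Until this call, list_objects_v2 will not show the key at all,
    # and no ObjectCreated event is emitted.
    s3.complete_multipart_upload(
        Bucket=bucket, Key=key, UploadId=upload_id,
        MultipartUpload={"Parts": parts},
    )
```

By contrast, an UNLOAD that writes 10 separate objects performs this sequence (or a plain PUT) 10 times, so you get 10 separate events, one per completed object.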