S3 Lambda Trigger not triggering for EVERY file upload - django

In Python Django, I save multiple video files.
Save 1: Long Video, Short Video
Save 2: Long Video, Short Video
Save 3: Long Video, Short Video
I have a Lambda trigger that uses MediaConvert to add HLS formats to these videos and to generate thumbnails. These three saves happen within a very short time of each other, since they are assets of a single Social Media Post object.
For some reason the S3 trigger fires for only some of the files.
Save 1 triggers the S3 Lambda but Save 2 does not.
Save 3 also triggers the S3 Lambda.
My assumption is that the S3 trigger has some sort of downtime between identifying new file uploads (in which case, the period between these file uploads is near instant).
Is this assumption correct, and how can I circumvent it?

It should fire for all objects.
When Amazon S3 triggers an AWS Lambda function, information about the object that caused the trigger is passed in the event:
{
    "Records": [
        {
            "eventSource": "aws:s3",
            "awsRegion": "us-west-2",
            "eventTime": "1970-01-01T00:00:00.000Z",
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {
                    "name": "my-s3-bucket",
                    "arn": "arn:aws:s3:::example-bucket"
                },
                "object": {
                    "key": "HappyFace.jpg",
                    "size": 1024,
                    ...
                }
            }
        }
    ]
}
Note that Records is an array, so it is possible that multiple objects could be passed to one Lambda invocation. I have never definitively seen this happen, but the sample code from AWS certainly assumes it can:
from urllib.parse import unquote_plus

def lambda_handler(event, context):
    for record in event['Records']:  # <-- Looping here
        bucket = record['s3']['bucket']['name']
        key = unquote_plus(record['s3']['object']['key'])
        ...
Therefore, I would recommend:
Print the event at the start of the function to put it into the log for later examination
Use a loop to go through all records that might be passed
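For example, a minimal handler applying both suggestions (a sketch that only logs; your MediaConvert call would go inside the loop):
import json
from urllib.parse import unquote_plus

def lambda_handler(event, context):
    # 1. Log the full event for later examination
    print(json.dumps(event))

    # 2. Handle every record that might be passed
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = unquote_plus(record['s3']['object']['key'])
        print(f"New object: s3://{bucket}/{key}")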
Let us know what you found!

Related

How to retrieve list of encoded files and paths after a done job in MediaConvert?

As stated in the title, nothing in their API seems to provide a list of encoded files after a job is complete, which is crucial in the case of HLS encoding since I need to move them from S3 to another cloud provider.
MediaConvert emits CloudWatch Events [1] for job status changes. You can implement this workflow by capturing jobs that go into a COMPLETE status and triggering a Lambda function to gather the required S3 paths. The COMPLETE CloudWatch event provides the playlistFilePaths and outputFilePaths, which contain the S3 paths of your main and variant playlists.
Here is a CloudWatch event pattern to capture all completed jobs:
{
    "source": [
        "aws.mediaconvert"
    ],
    "detail": {
        "status": [
            "COMPLETE"
        ]
    }
}
An example of the CloudWatch event payload can be found in the documentation [1].
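A minimal sketch of a Lambda handler behind such a rule, assuming the COMPLETE event detail contains outputGroupDetails as shown in the documented payload [1]:
import json

def lambda_handler(event, context):
    # The MediaConvert job result arrives in event['detail']
    detail = event.get('detail', {})
    paths = []
    for group in detail.get('outputGroupDetails', []):
        # HLS output groups expose the main/variant playlists here
        paths.extend(group.get('playlistFilePaths', []))
        for output in group.get('outputDetails', []):
            paths.extend(output.get('outputFilePaths', []))
    print(json.dumps(paths))
    return paths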
== Resources ==
[1] https://docs.aws.amazon.com/mediaconvert/latest/ug/apple-hls-group.html

Using CloudWatch Event : How to Pass JSON Object to CodeBuild as an Environment Variable

Summary: I can't specify a JSON object using CloudWatch target Input Transformer, in order to pass the object contents as an environment variable to a CodeBuild project.
Background:
I trigger an AWS CodeBuild job when an S3 bucket receives any new object. I have enabled CloudTrail for S3 operations so that I can use a CloudWatch rule that has my S3 bucket as an Event Source, with the CodeBuild project as a Target.
If I set up the 'Configure input' part of the Target using Input Transformer, I can get single 'primitive' values from the event using the format below:
Input path textbox:
{"zip_file":"$.detail.requestParameters.key"}
Input template textbox:
{"environmentVariablesOverride": [ {"name":"ZIP_FILE", "value":<zip_file>}]}
And this works fine if I use 'simple' single strings.
However, if I wish to obtain the entire 'resources' key, which is a JSON object, I need to know each of the keys within it and the object structure, and manually recreate that structure for each key/value pair.
For example, the resources element in the Event is:
"resources": [
{
"type": "AWS::S3::Object",
"ARN": "arn:aws:s3:::mybucket/myfile.zip"
},
{
"accountId": "1122334455667799",
"type": "AWS::S3::Bucket",
"ARN": "arn:aws:s3:::mybucket"
}
],
I want the code in the buildspec in CodeBuild to do the heavy lifting and parse the JSON data.
If I specify in the input path textbox:
{"zip_file":"$.detail.resources"}
Then the CodeBuild project never gets triggered.
Is there a way to get the entire JSON object, identified by a specific key, as an environment variable?
Check this... CodeBuild targets support all the parameters allowed by the StartBuild API. You need to use environmentVariablesOverride in your JSON string.
{"environmentVariablesOverride": [ {"name":"ZIPFILE", "value":<zip_file>}]}
Please avoid using '_' in the environment variable name.
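If the object does come through as a string, the "heavy lifting" in the buildspec can be a short script. A minimal sketch, assuming the transformer maps $.detail.resources into a variable named RESOURCES:
import json
import os

# Read the raw JSON string passed in via environmentVariablesOverride
resources = json.loads(os.environ.get('RESOURCES', '[]'))

# Pull out the ARNs, e.g. to locate the uploaded object and its bucket
arns = [item['ARN'] for item in resources if 'ARN' in item]
print(arns)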

Triggering a Lambda function with multiple required sources

I'm currently running a CloudFormation stack with a number of elements to process video, including a call to Rekognition. I have most of it working but have a question about properly storing information as I go... so I can, in the end, write the Rekognition data for a video to a DynamoDB table.
Below I have the relevant parts of the stack, which are mostly inside of a Step Function passing this input event along:
sample_event = {
    "guid": "1234",
    "video": "video.mp4",
    "bucket": "my-bucket"
}
Current setup:
1. Write sample_event to a DynamoDB table, keyed by the primary key 'guid', then pass sample_event along to the next step.
2. Rekognition-Trigger Lambda: a Lambda function that runs start_label_detection() on 'video.mp4' in 'my-bucket' and sets the notification channel to an SNS topic.
3. Rekognition-Collect Lambda: a Lambda function (outside the Step Function) that is triggered by the SNS topic (several minutes later, for example), collects the JobId from the SNS message, and runs get_label_detection() with that JobId.
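For reference, a rough sketch of steps 2 and 3 as described (the topic and role ARNs are placeholders):
import json
import boto3

rekognition = boto3.client('rekognition')

# Step 2: Rekognition-Trigger Lambda
def trigger_handler(event, context):
    response = rekognition.start_label_detection(
        Video={'S3Object': {'Bucket': event['bucket'], 'Name': event['video']}},
        NotificationChannel={
            'SNSTopicArn': 'arn:aws:sns:us-east-1:111122223333:rekognition-complete',
            'RoleArn': 'arn:aws:iam::111122223333:role/rekognition-sns-publish'
        }
    )
    return {**event, 'JobId': response['JobId']}

# Step 3: Rekognition-Collect Lambda, triggered by the SNS topic
def collect_handler(event, context):
    message = json.loads(event['Records'][0]['Sns']['Message'])
    return rekognition.get_label_detection(JobId=message['JobId'])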
The above is working fine. I want to add step 4:
Write the Rekognition response to my DynamoDB table for the entry at "guid" = "1234", so that my Dynamo item is updated to:
{
    "guid": "1234",
    "video": "video.mp4",
    "bucket": "my-bucket",
    "rek_data": {"Labels": [...]}
}
So it seems to me that I essentially can't pass any other data through Rekognition besides the SNS topic. It also seems that, in the second Lambda, I shouldn't be querying by a non-primary key such as the JobId.
Is there a way to set up the second lambda function so that it is triggered by two (and only the correct two) SNS topics? Such as one to send the 'guid' and one to send the Rekognition data?
Or would it be efficient to use two Dynamo tables, one to temporarily store the JobID and guid for later referencing? Or a better way to do all of this?
Thanks!

Why is my S3 lifecycle policy not taking effect?

I have an S3 lifecycle policy to delete objects after 3 days, and I am using a prefix. My problem is that the policy works for all but one sub-directory. For example, let's say my bucket looks like this:
s3://my-bucket/myPrefix/env=dev/
s3://my-bucket/myPrefix/env=stg/
s3://my-bucket/myPrefix/env=prod/
When I check the stg and prod directories, there are no objects older than 3 days. However, when I check the dev directory, there are objects a lot older than that.
Note: there is a huge difference in the volume of data in dev compared to the other two; dev holds a lot more logs than the others.
My initial thought was that it was taking longer for eventual consistency to reflect what had been deleted, but that theory is gone considering the time that has passed.
The issue seems related to the amount of data in this location compared to the others, but I'm not sure what I can do to resolve it. Should I have another policy specific to this location, or is there somewhere I can check to see what is causing the failure? I did not see anything in CloudTrail for this event.
Here is my policy:
{
    "Rules": [
        {
            "Expiration": {
                "Days": 3
            },
            "ID": "Delete Object When Stale",
            "Prefix": "myPrefix/",
            "Status": "Enabled"
        }
    ]
}
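One way to see what is still sitting there is to list objects under the dev prefix that are older than the expiration window (a sketch using boto3, with the bucket and prefix from the example above):
from datetime import datetime, timedelta, timezone
import boto3

s3 = boto3.client('s3')
cutoff = datetime.now(timezone.utc) - timedelta(days=3)

# List objects under the dev prefix older than the 3-day expiration window
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket='my-bucket', Prefix='myPrefix/env=dev/'):
    for obj in page.get('Contents', []):
        if obj['LastModified'] < cutoff:
            print(obj['Key'], obj['LastModified'])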

AWS S3 trigger on object created, function gets invoked continuously

I've created a Lambda function to read a file (input.csv) from an S3 bucket, make some changes to it, and save that file (output.csv) in the same bucket.
Note: I have not deleted the input.csv file from the bucket.
The Lambda function is triggered by the ObjectCreated (All) event. But the function is called continuously, seemingly an infinite number of times, as long as the input file is present in the bucket.
Is it supposed to happen like this? Or is this a fault?
This is your fault :)
You have set up a recursive trigger - each time you update the file, you're actually writing a new copy of it, which triggers the event, etc.
This was a key warning in the initial demo when Lambda was released (an image is uploaded to S3, lambda is triggered to create a thumbnail - if that thumbnail is written to the same bucket, it will trigger again, etc)
As @chris has pointed out, you've triggered a recursive loop by having events triggered by an S3 PUT event, which in turn performs another PUT, calling the trigger again and again.
To avoid this problem, the simplest method is to use two S3 buckets - one for files to be placed prior to processing, and another for files to be placed post-processing.
If you don't want to use two S3 buckets, you can modify your trigger condition to include FilterRules (docs). This allows you to control the trigger so that it only executes when an object is placed in a certain "folder" in S3 (of course, folders don't really exist in S3; they're just key prefixes).
Here's an example:
{
    "LambdaFunctionConfigurations": [
        {
            "Filter": {
                "Key": {
                    "FilterRules": [
                        {
                            "Name": "Prefix",
                            "Value": "queue/"
                        }
                    ]
                }
            },
            "LambdaFunctionArn": <lambda_func_arn>,
            "Id": "<lambda_func_name>:app.lambda_handler",
            "Events": [
                "s3:ObjectCreated:*"
            ]
        }
    ]
}
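With a filter like this in place, the function can safely write its output under a different prefix so the PUT does not match the trigger again. A minimal sketch (the queue/ and processed/ prefixes are just examples):
import boto3
from urllib.parse import unquote_plus

s3 = boto3.client('s3')

def lambda_handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = unquote_plus(record['s3']['object']['key'])  # e.g. queue/input.csv

        # ... process the file here ...

        # Write results under a prefix the trigger does not watch,
        # so this PUT does not fire the trigger again.
        out_key = key.replace('queue/', 'processed/', 1)
        s3.put_object(Bucket=bucket, Key=out_key, Body=b'processed output')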