AWS -a Configure Trigger to detect only directory creation and not files creation - amazon-web-services

I am setting up a lambda function to get triggered only when a directory gets created in s3 and not the file
Example: {bucket-name}/a/b/c/d/
a , b, c, d are directories inside bucket.
I want to get a lambda function triggered when a key "d" (d is not a file, it is a directory) gets created.
Based on my research ,
Only Definite prefixes can be mentioned instead of mentioning {bucket-name}/*/
There is no specific filter in triggers to check for a directory creation. Files and directory creation are considered same as put object
operation. I want to trigger only during directory creation at certain depth, here in this example - i do not want to trigger
during directory/s3 key creation of a,b or c. I need to trigger only during directory creation of d (at deeper level). can this be done any ways while setting up a lambda trigger?

S3 isn't a file system - it is an object store. However, keys that end with a trailing "/" are generally treated as folders, so perhaps that is a way to check.
So I would have my lambda check to see if the object key had a trailing "/", and treat that as the folder creation.
Note that you can create file objects with a trailing "/", you just can't do that via the console, but if you have control over key creation you should be able to avoid that.
Edit:
To address the comment that you want the lambda to only trigger when a "folder" is created, not for every file added, this is not currently supported. Unless you are dealing with billions of files, I would not worry too much about the lambda costs. A function that takes 250ms to run with 256MB of RAM will cost you less than $5 per million objects.
Edit, July 2022:
You can accomplish this by adding an event notification on the bucket and putting "/" for the suffix. You will only get notified when a "folder" is created. (And I should also note that the console for S3 now allows creation of "folders")

Related

Automating folder creation in S3

I have an S3 bucket into which clients drop data files (CSV files) each month. I was wondering there was a way that I could automatically create a new "folder" (object) every time the files are dropped each month and put the newest files into that "folder". I need the CSV files separated by month so that AWS Glue can create new partitions when I run incremental crawlers on this bucket.
For example, let's say I have a S3 bucket called "client." On December 1st, a new CSV file ("DecClientData") will be dropped into that "client" bucket. I want to know if there is a way to automate the following two processes:
Create a "folder" (let's call it "dec") within "client".
Place the "DecClientData" file in the "dec" "folder".
Thanks in advance for any assistance you can provide!
S3 doesn't have the notion of folders commonly found in file systems but instead has a flat structure, more details can be found here.
Instead, the full path of an object is stored in its Key (filename). For example, an object can be stored in Amazon S3 with a Key of files/2020-12/data.txt regardless of the existence of files and 2020-12 directories (they are not really directories but zero-length objects).
In your case, to solve both points you are mentioning, you should leverage S3 event notifications and use them as a Lambda Trigger. When the Lambda function is triggered, it is passed the name of the object (Key) as an argument, at that point you can simply change its Key.
I.e. Object is uploaded in s3://my_bucket/uploads/file.txt, this creates an event notification that triggers a Lambda function. The functions gets the object and re-uploads it to s3://my_bucket/files/dec/file.txt (and deletes the original one).
Write an AWS Lambda function to create a folder in the client bucket and move the most recent .csv file (or files) in the new folder.
Then, configure the client S3 bucket to trigger the AWS Lambda function on new uploads through the event notification settings.

Configure s3 event for alternate PUT operation

I have a Lambda function that gets triggered whenever an object is created in s3 bucket.
Now, I need to trigger the Lambda for alternate object creation.
Lambda should not be triggered when object is created for the first, third , fifth and so on time. But, Lambda should be triggered for the second, fourth, sixth and so on time.
For this, I created an s3 event for 'PUT' operation.
The first time I used the PUT API. The second time I uploaded the file using -
s3_res.meta.client.upload_file
I thought that it would not trigger lambda since this was upload and not PUT. But this also triggered the Lambda.
Is there any way for this?
The reason that meta.client.upload_file is triggering your PUT event lambda is because it is actually using PUT.
upload_file (docs) uses the TransferManager client, which uses PUT under-the-hood (you can see this in the code: https://github.com/boto/s3transfer/blob/develop/s3transfer/upload.py)
Looking at the AWS-SDK you'll see that POST'ing to S3 is pretty much limited to when you want to give a browser/client a pre-signed URL for them to upload a file to. (https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPOST.html)
If you want to count the number of times PUT has been called, in order to take action on every even call, then the easiest thing to do is to use something like DynamoDB to create a table of 'file-name' vs 'put-count' which you update with every PUT and action accordingly.
Alternatively you could enable bucket file versioning. Then you could use list_object_versions to see how many times the file has been updated. Although you should be aware that S3 is eventually consistent, so this may not be accurate if the file is being rapidly updated.

AWS S3 folder put event notification

I've written a function in Python that uploads a folder and its content to S3. Now I would like S3 to generate an event (so I can send it to a lambda function). S3 allows to generate events only at file level, in fact folders on s3 are just a visualization layer, which means that S3 has no internal representation for folders, keys with the same root are simply grouped together. That said, as for now I've come up with three approaches that revolves around the idea of a 'poison pill'.
Send a special file at the end of the folder upload process, the creation of which sends an event to lambda that can open the file to read custom directives to act on. Seems that this approach is quite flexible, however it poses serious concerns security-wise (I know that ACLs are in place for this reason but I'm not quite sure if it's enough), and generates some overhead while downloading/uploading/deleting the file from/to local memory.
Map an event to the target lambdas and fire it directly. The difference in approaches is simply that in this case I'm not really creating a file on S3, I'm just making S3 believe so. I would use CloudWatch to fire custom S3-object-created events with the name of the folder for lambda to pick up. This approach feels a little more hacky than the other two, plus when I did my research on the matter it seemed like it shouldn't be possible to generate "mock" events on AWS (i.e. Trigger S3 create event). To my understanding however, the function put_events should do the trick.
Using SQS would allow to put the folder name into an SQS task that can be later consumed by lambda. This has some advantages over the other two approaches, since SQS has now a LIFO variant that allows for exactly-once-delivery, failures reprocessing (via dead letters queue), etc, however this generates a non-trivial amount of complexity compared to the other approaches.
At this point I'm trying to opt for the most 'correct' approach, and
in order to do so I'm trying to weight pros and cons to make an informed decision, which led me to some questions:
Is there another way I'm missing out to proceed that does not involve client notification ? (all the aforementioned approaches rely on the client sending the notification in one way or another, which is not very "cloudy")?
Is there a substantial difference between approaches 2 and 3, considering that both rely on sending the information in and out of a stream (CloudWatch and SQS respectively)?
Have you consider using the prefix option of S3 bucket event, I tested it and it worked fine. In my S3 bucket I created two folder test1 and test2. On s3 event I added prefix test1 with that in place every time put/copy operation happen on bucket lambda is trigger.
I think your question nets down to "how can I trigger a Lambda function after I have uploaded a folder full of files to S3?"
Unless you have some information a priori server-side that you can use to determine when the folder upload has completed, the client is going to have to tell you.
Options I would consider:
change your client to publish a message to SNS or to SQS upon the completion of uploading to S3. That message can then trigger your Lambda function.
after the last file has been uploaded to folder images/dogs/, upload a zero-sized object whose key is the same as the folder (images/dogs/). This is a 'sentinel file'. Use an S3 event trigger with suffix of / to detect the upload of that 'folder' object and trigger your Lambda.
I prefer the 1st option. It achieves the end goal without resulting in extraneous S3 objects. With SNS you can also configure multiple downstream processes in response to the ‘finished upload’ message (a fan out) if needed.

AWS Lambda function getting called repeatedly

I have written a Lambda function which gets invoked automatically when a file comes into my S3 bucket.
I perform certain validations on this file, modify the particular and put the file at the same location.
Due to this "put", my lambda is called again and the process goes on till my lambda execution times out.
Is there any way to trigger this lambda only once?
I found an approach where I can store the file name in DynamoDB and can apply a check in lambda function, but can there be any other approach where DynamoDB's use can be avoided?
You have a couple options:
You can put the file to a different location in s3 and delete the original
You can add a metadata field to the s3 object when you update it. Then check for the presence of that field in s3 so you know if you have processed it already. Now this might not work perfectly since s3 does not always provide the most recent data on reads after updates.
AWS allows different type of s3 event triggers. You can try playing s3:ObjectCreated:Put vs s3:ObjectCreated:Post.
You can upload your files in a folder, say
s3://bucket-name/notvalidated
and store the validated in another folder, say
s3://bucket-name/validated.
Update your S3 Event notification to invoke your lambda function whenever there is a ObjectCreate(All) event in the /notvalidated prefix.
The second answer does not seem to be correct (put vs post) - there is not really a concept of update in S3 in terms of POST or PUT. The request to update an object will be the same as the initial POST of the object. See here for details on the available S3 events.
I had this exact problem last year - I was doing an image resize on PUT and every time a file was overwritten, it would be triggered again. My recommended solution would be to have two folders in your s3 bucket - one for the original file and one for the finalized file. You could then create the lambda trigger with the lambda prefix so it only checks the files in the original folder
The events are triggered in S3 based on if the object is put/post/copy/complete Multipart Upload - All these operations corresponds to ObjectCreate as per AWS documentation .
https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
The best solution is to restrict your S3 object create event to particular bucket location. So that any change in that bucket location will trigger lambda function.
You can do the modification in some other bucket location which is not configured to trigger lambda function when object is created in that location.
Hope it helps!

Cost of renaming a folder in AWS S3 bucket

I want to rename a folder in S3 bucket, I understand that rename will run a PUT request which costs 1 cent per 1000 request.
However, the PUT request is defined as a COPY and involves with also a GET
My question is, when we rename a folder in S3 bucket, does it involve copying all sub-folders and files to a new folder with the name I want (which costs more than 1 PUT request), or it just simply 1 PUT request to change the name without touching all the items within the folder.
In case you've missed it... there are no folders in S3.
The object /pics/funny/cat.jpg is not a file called cat.jpg inside a folder called funny inside another folder called pics.
In fact, it is a file with an 18-character name: pics/funny/cat.jpg. The hierarchy shown in the console is largely for human convenience, and the ability to create new folders in the console is an illusion, also.
So, yes, renaming a "folder" actually means making a new copy of each object in the "folder," with a change to the object names to look like their are in the path.
This can be done with a PUT/COPY request ($0.005 per 1000 depending on the region) followed by a DELETE request of the old object (free). There is no corresponding GET request, because PUT/COPY is an atomic operation inside S3, so actually downloading and re-uploading the data is avoided.