Negation of a folder in S3 Event Notification - amazon-web-services

I have an S3 bucket with an event notification set up. The trigger of the event is All object create events and it basically runs a lambda function.
Inside the bucket, I have a bunch of folders, one for each operator in our system.
Inside each operator's folder, there might be a folder called /exports/.
Objective
For objects created within the /exports/ folder (under the operator folder), I do not want to trigger the event.
Is that possible with S3?

No, not supported afaik. You have three options:
Use two buckets, one for the dropped items and one for the exports, where each bucket would have a BLAC/ prefix, a CUCC/ prefix etc.
Restructure your prefixes within this one bucket to have: import/BLACC/ , import/CUCC/, export/BLACC/, export/CUCC/ and configure the upload trigger on the prefix import/
Modify your Lambda function to treat objects with exports/ in their key as a no-op.

Related

Copy data from one folder to another inside a AWS bucket automatically

I want to copy files from one folder to another inside the same bucket.
I have two folders Actual and Backup
As soon as new files comes to actual folder i want a way so that it immediately gets copied to Backup folder.
What you need are S3 Event Notifications. With these you can trigger a lambda function when a new item is put, then if it is put with one prefix, write the same object to the other prefix.
It is also worth noting that, though it is functionally as it seems, S3 doesn't really have directories; just objects. So you are just creating the same object as /Actual/some-file with key /Backup/some-file . It just looks like there is a directory because files /Actual/some-file and /Actual/other-file share a prefix /Actual/.

Automating folder creation in S3

I have an S3 bucket into which clients drop data files (CSV files) each month. I was wondering there was a way that I could automatically create a new "folder" (object) every time the files are dropped each month and put the newest files into that "folder". I need the CSV files separated by month so that AWS Glue can create new partitions when I run incremental crawlers on this bucket.
For example, let's say I have a S3 bucket called "client." On December 1st, a new CSV file ("DecClientData") will be dropped into that "client" bucket. I want to know if there is a way to automate the following two processes:
Create a "folder" (let's call it "dec") within "client".
Place the "DecClientData" file in the "dec" "folder".
Thanks in advance for any assistance you can provide!
S3 doesn't have the notion of folders commonly found in file systems but instead has a flat structure, more details can be found here.
Instead, the full path of an object is stored in its Key (filename). For example, an object can be stored in Amazon S3 with a Key of files/2020-12/data.txt regardless of the existence of files and 2020-12 directories (they are not really directories but zero-length objects).
In your case, to solve both points you are mentioning, you should leverage S3 event notifications and use them as a Lambda Trigger. When the Lambda function is triggered, it is passed the name of the object (Key) as an argument, at that point you can simply change its Key.
I.e. Object is uploaded in s3://my_bucket/uploads/file.txt, this creates an event notification that triggers a Lambda function. The functions gets the object and re-uploads it to s3://my_bucket/files/dec/file.txt (and deletes the original one).
Write an AWS Lambda function to create a folder in the client bucket and move the most recent .csv file (or files) in the new folder.
Then, configure the client S3 bucket to trigger the AWS Lambda function on new uploads through the event notification settings.

AWS -a Configure Trigger to detect only directory creation and not files creation

I am setting up a lambda function to get triggered only when a directory gets created in s3 and not the file
Example: {bucket-name}/a/b/c/d/
a , b, c, d are directories inside bucket.
I want to get a lambda function triggered when a key "d" (d is not a file, it is a directory) gets created.
Based on my research ,
Only Definite prefixes can be mentioned instead of mentioning {bucket-name}/*/
There is no specific filter in triggers to check for a directory creation. Files and directory creation are considered same as put object
operation. I want to trigger only during directory creation at certain depth, here in this example - i do not want to trigger
during directory/s3 key creation of a,b or c. I need to trigger only during directory creation of d (at deeper level). can this be done any ways while setting up a lambda trigger?
S3 isn't a file system - it is an object store. However, keys that end with a trailing "/" are generally treated as folders, so perhaps that is a way to check.
So I would have my lambda check to see if the object key had a trailing "/", and treat that as the folder creation.
Note that you can create file objects with a trailing "/", you just can't do that via the console, but if you have control over key creation you should be able to avoid that.
Edit:
To address the comment that you want the lambda to only trigger when a "folder" is created, not for every file added, this is not currently supported. Unless you are dealing with billions of files, I would not worry too much about the lambda costs. A function that takes 250ms to run with 256MB of RAM will cost you less than $5 per million objects.
Edit, July 2022:
You can accomplish this by adding an event notification on the bucket and putting "/" for the suffix. You will only get notified when a "folder" is created. (And I should also note that the console for S3 now allows creation of "folders")

AWS S3 folder put event notification

I've written a function in Python that uploads a folder and its content to S3. Now I would like S3 to generate an event (so I can send it to a lambda function). S3 allows to generate events only at file level, in fact folders on s3 are just a visualization layer, which means that S3 has no internal representation for folders, keys with the same root are simply grouped together. That said, as for now I've come up with three approaches that revolves around the idea of a 'poison pill'.
Send a special file at the end of the folder upload process, the creation of which sends an event to lambda that can open the file to read custom directives to act on. Seems that this approach is quite flexible, however it poses serious concerns security-wise (I know that ACLs are in place for this reason but I'm not quite sure if it's enough), and generates some overhead while downloading/uploading/deleting the file from/to local memory.
Map an event to the target lambdas and fire it directly. The difference in approaches is simply that in this case I'm not really creating a file on S3, I'm just making S3 believe so. I would use CloudWatch to fire custom S3-object-created events with the name of the folder for lambda to pick up. This approach feels a little more hacky than the other two, plus when I did my research on the matter it seemed like it shouldn't be possible to generate "mock" events on AWS (i.e. Trigger S3 create event). To my understanding however, the function put_events should do the trick.
Using SQS would allow to put the folder name into an SQS task that can be later consumed by lambda. This has some advantages over the other two approaches, since SQS has now a LIFO variant that allows for exactly-once-delivery, failures reprocessing (via dead letters queue), etc, however this generates a non-trivial amount of complexity compared to the other approaches.
At this point I'm trying to opt for the most 'correct' approach, and
in order to do so I'm trying to weight pros and cons to make an informed decision, which led me to some questions:
Is there another way I'm missing out to proceed that does not involve client notification ? (all the aforementioned approaches rely on the client sending the notification in one way or another, which is not very "cloudy")?
Is there a substantial difference between approaches 2 and 3, considering that both rely on sending the information in and out of a stream (CloudWatch and SQS respectively)?
Have you consider using the prefix option of S3 bucket event, I tested it and it worked fine. In my S3 bucket I created two folder test1 and test2. On s3 event I added prefix test1 with that in place every time put/copy operation happen on bucket lambda is trigger.
I think your question nets down to "how can I trigger a Lambda function after I have uploaded a folder full of files to S3?"
Unless you have some information a priori server-side that you can use to determine when the folder upload has completed, the client is going to have to tell you.
Options I would consider:
change your client to publish a message to SNS or to SQS upon the completion of uploading to S3. That message can then trigger your Lambda function.
after the last file has been uploaded to folder images/dogs/, upload a zero-sized object whose key is the same as the folder (images/dogs/). This is a 'sentinel file'. Use an S3 event trigger with suffix of / to detect the upload of that 'folder' object and trigger your Lambda.
I prefer the 1st option. It achieves the end goal without resulting in extraneous S3 objects. With SNS you can also configure multiple downstream processes in response to the ‘finished upload’ message (a fan out) if needed.

AWS Lambda function getting called repeatedly

I have written a Lambda function which gets invoked automatically when a file comes into my S3 bucket.
I perform certain validations on this file, modify the particular and put the file at the same location.
Due to this "put", my lambda is called again and the process goes on till my lambda execution times out.
Is there any way to trigger this lambda only once?
I found an approach where I can store the file name in DynamoDB and can apply a check in lambda function, but can there be any other approach where DynamoDB's use can be avoided?
You have a couple options:
You can put the file to a different location in s3 and delete the original
You can add a metadata field to the s3 object when you update it. Then check for the presence of that field in s3 so you know if you have processed it already. Now this might not work perfectly since s3 does not always provide the most recent data on reads after updates.
AWS allows different type of s3 event triggers. You can try playing s3:ObjectCreated:Put vs s3:ObjectCreated:Post.
You can upload your files in a folder, say
s3://bucket-name/notvalidated
and store the validated in another folder, say
s3://bucket-name/validated.
Update your S3 Event notification to invoke your lambda function whenever there is a ObjectCreate(All) event in the /notvalidated prefix.
The second answer does not seem to be correct (put vs post) - there is not really a concept of update in S3 in terms of POST or PUT. The request to update an object will be the same as the initial POST of the object. See here for details on the available S3 events.
I had this exact problem last year - I was doing an image resize on PUT and every time a file was overwritten, it would be triggered again. My recommended solution would be to have two folders in your s3 bucket - one for the original file and one for the finalized file. You could then create the lambda trigger with the lambda prefix so it only checks the files in the original folder
The events are triggered in S3 based on if the object is put/post/copy/complete Multipart Upload - All these operations corresponds to ObjectCreate as per AWS documentation .
https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
The best solution is to restrict your S3 object create event to particular bucket location. So that any change in that bucket location will trigger lambda function.
You can do the modification in some other bucket location which is not configured to trigger lambda function when object is created in that location.
Hope it helps!