Skip Certain folders in S3 Lifecycle Policy - amazon-web-services

I want to apply a deletion rule to my whole bucket to delete all objects that are older than 2 weeks, but my bucket has certain folders that need to be skipped. Is it possible via S3 lifecycle management to skip certain folders and delete everything else that is older than 2 weeks?
Here is my current bucket structure:
- example-bucket.com
  - folder 1
    - images
      - 1.jpg
    - videos
      - 1.mp4
      - 2.flv
  - folder 2
    - images
      - 1.jpg
  - folder 3
    - videos
      - 1.mp4
  - folder 4 (Should not be deleted)
    - content
    - folders
  - folder 5
    - images
      - 1.jpg
    - videos
      - 1.mp4
      - 2.flv
  - folder 6 (Should not be deleted)
    - content
    - folders
I want to skip folders 4 and 6 and delete the contents of all other folders once they are 14 days old.
Can someone tell me if this is possible via AWS S3 Lifecycle management?

Yes, you can. A Lifecycle rule supports a prefix/tag filter to scope which objects the rule applies to. You need to define rules for the prefixes you do want to delete, rather than trying to exclude the others.
Reference:
To apply this lifecycle rule to all objects with a specified name prefix (i.e., objects whose name begins with a common string), type in a prefix. You can also limit the lifecycle rule scope to one or more object tags. You can combine a prefix and tags. For more information about object name prefixes, see Object Keys in the Amazon Simple Storage Service Developer Guide. For more information about object tags, see Object Tagging in the Amazon Simple Storage Service Developer Guide.
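As a rough sketch of what that looks like with boto3 (assuming the folder names from the listing above, and noting that put_bucket_lifecycle_configuration replaces any lifecycle configuration already on the bucket):

```python
import boto3

s3 = boto3.client('s3')

# One expiration rule per prefix you DO want cleaned up; "folder 4" and
# "folder 6" simply get no rule, so their objects are never expired.
prefixes_to_expire = ['folder 1/', 'folder 2/', 'folder 3/', 'folder 5/']

s3.put_bucket_lifecycle_configuration(
    Bucket='example-bucket.com',
    LifecycleConfiguration={
        'Rules': [
            {
                'ID': 'expire-' + prefix.strip('/').replace(' ', '-'),
                'Filter': {'Prefix': prefix},
                'Status': 'Enabled',
                'Expiration': {'Days': 14},   # delete objects older than 2 weeks
            }
            for prefix in prefixes_to_expire
        ]
    },
)
```

Any new top-level folder you add later will need its own rule (or a tag covered by a tag-filtered rule), as the next answer points out.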

It's really frustrating, but it isn't possible without adding every single prefix in the bucket that you want archived or deleted, or tagging everything you want deleted. So if you are adding new folders to the bucket, you will have to make sure they conform to the prefix or are tagged, which makes for a maintenance nightmare. It would be much simpler if you could add an exception tag, particularly if you want a readme in the bucket describing what is in it that isn't archived.

Related

AWS S3 Lifecycle

I've been exploring AWS S3 Lifecycle techniques and found the best way to delete S3 files > 60 days old is to configure this through the GUI.
However, I don't want to delete ALL files older than 60 days. For example, I'd like to at least keep all HTML files inside the bucket that are older than 60 days.
I've found that a prefix can be entered to limit the scope of the lifecycle to a specific file; however, this requires me to enter ALL files EXCEPT HTMLs. We have hundreds of files, so this will take forever.
I was wondering if anyone knew of an easier way? For example, I would like to just exclude all *.html from the lifecycle.
There is no way to exclude objects from rules.
You can rearrange the objects in your bucket so that a rule can be applied to objects under a specified prefix ("folder").
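If rearranging the keys isn't practical, another workaround, sketched below with boto3 (the bucket name and the expire tag are hypothetical), is to tag everything except the .html files and scope a single expiration rule to that tag:

```python
import boto3

s3 = boto3.client('s3')
bucket = 'my-bucket'  # hypothetical bucket name

# Tag every non-HTML object so a tag-filtered rule can expire it
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket=bucket):
    for obj in page.get('Contents', []):
        if obj['Key'].endswith('.html'):
            continue  # leave HTML files untagged so they are never expired
        s3.put_object_tagging(
            Bucket=bucket,
            Key=obj['Key'],
            Tagging={'TagSet': [{'Key': 'expire', 'Value': 'true'}]},
        )

# One rule scoped to the tag (this call replaces the existing lifecycle configuration)
s3.put_bucket_lifecycle_configuration(
    Bucket=bucket,
    LifecycleConfiguration={
        'Rules': [{
            'ID': 'expire-tagged-after-60-days',
            'Filter': {'Tag': {'Key': 'expire', 'Value': 'true'}},
            'Status': 'Enabled',
            'Expiration': {'Days': 60},
        }]
    },
)
```

New uploads would also need the tag, so this still carries the maintenance burden described in the earlier answer.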

Delete all versions of S3 objects using a lifecycle rule

I have an S3 bucket with multiple folders and versioning enabled.
Out of these folders I want to completely delete one, as it has multiple delete markers.
I am using a Lifecycle rule to delete the objects, but I'm not sure whether it will work for a specific folder.
In the Lifecycle rule, if I specify folder_name/ as the prefix and an expiration rule of 1 day after creation for current and noncurrent versions, will it delete all the objects and their versions?
Can someone please confirm?
The other folders are quite critical, so I can't mess with the rule to test.
I can confirm that you can delete at folder level instead of entire bucket. We have a rule that does the exact same thing (although 7 days instead of 1). I will echo John's point that after initial setup, it will take time to do the deletion. You should see progress STARTING within 1 hour, but actual completion may take a while.
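For reference, a rule scoped to just that folder might look roughly like this in boto3; the bucket name and rule ID are made up, and the prefix is the folder_name/ from the question. Current versions get delete markers after a day, and noncurrent versions are permanently removed a day after they become noncurrent. Also note that put_bucket_lifecycle_configuration replaces the whole lifecycle configuration, so merge in any rules the critical folders already rely on:

```python
import boto3

s3 = boto3.client('s3')

s3.put_bucket_lifecycle_configuration(
    Bucket='my-versioned-bucket',  # hypothetical bucket name
    LifecycleConfiguration={
        'Rules': [{
            'ID': 'purge-folder_name',                             # made-up rule ID
            'Filter': {'Prefix': 'folder_name/'},                  # only this folder
            'Status': 'Enabled',
            'Expiration': {'Days': 1},                             # current versions -> delete markers
            'NoncurrentVersionExpiration': {'NoncurrentDays': 1},  # old versions removed for good
        }]
    },
)
```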

What does prefix mean in S3

The docs say:
For example, your application can achieve at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix in a bucket. There are no limits to the number of prefixes in a bucket. You can increase your read or write performance by parallelizing reads. For example, if you create 10 prefixes in an Amazon S3 bucket to parallelize reads, you could scale your read performance to 55,000 read requests per second.
But it doesn't clearly explain the concept of prefixes.
For example, let's say I have 3 files and their corresponding keys are:
a/a1.txt
b/b1.txt
2.txt
As per my understanding, there is no concept of folders in S3. So, S3 will create something like this on my bucket.
|- a/
|  |- a1.txt
|- 2.txt
|- b/
|  |- b1.txt
I did come across this blog, but it made things more confusing for me.
My questions:-
Is every object created in S3 whose key ends with '/' a prefix?
In other words, is every folder that we see in the S3 web console a prefix?
Although S3 is theoretically a flat store, many of its operations have special handling for prefixes with a set delimiter, usually /. For instance this help page discusses how the "folders" on the S3 console web interface are built by looking at the prefixes you've used.
An important point to remember here is that these folders are not objects themselves, so in your example, there is no key of a or b stored in the bucket.
If you create a bucket and immediately add an object with a key of a/b/c/d/e.txt then:
- the bucket will contain exactly one object, with key a/b/c/d/e.txt
- some APIs and UIs will infer a prefix for that key of a/b/c/d, as a way of grouping related keys
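A quick way to see this for yourself is to list the bucket with a / delimiter; a minimal boto3 sketch (the bucket name is hypothetical, and the keys are the three from the question):

```python
import boto3

s3 = boto3.client('s3')
resp = s3.list_objects_v2(Bucket='my-bucket', Delimiter='/')  # hypothetical bucket name

# Inferred top-level "folders" come back as CommonPrefixes, e.g. 'a/' and 'b/'
for cp in resp.get('CommonPrefixes', []):
    print('prefix:', cp['Prefix'])

# Real objects at the top level come back under Contents, e.g. '2.txt';
# there is no object named 'a' or 'b'
for obj in resp.get('Contents', []):
    print('object:', obj['Key'])
```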

AWS S3 bucket clean-up but keep a certain number of folders

So, currently inside an S3 bucket, I store the JavaScript bundle files output by webpack. Here is a sample folder structure:
- s3_bucket_name
  - javascript_bundle
    - 2018_10_11
    - 2018_10_09
    - 2018_10_08
    - 2018_10_07
    - 2018_10_06
    - 2018_10_05
I want to clean up the folders and only keep 5 of them (the folder names are the dates of deployment). I am unable to clean up by date alone, since it's possible we may not deploy for a long time.
Because of this, I am unable to use lifecycle rules on their own.
For example, if I set the expiration to 30 days and we don't deploy for 30 days, S3 will automatically remove all the folders; then all the JavaScript files will be gone and the site won't work.
Is there a way to accomplish this using AWS CLI?
The requirements are
Delete folder by date
Keep a minimum of 5 folders
For example, given the following folders, we want to delete folders older than 30 days while keeping at least 5 folders:
- 2018_10_11
- 2018_09_09
- 2018_08_08
- 2018_07_07
- 2018_06_06
- 2018_05_05
The only one that will be deleted is 2018_05_05.
I don't see any option to do this via the aws s3 rm command.
You can specify which folders to delete, but there is no option in the AWS CLI to specify which folders you do not want to delete.
This requirement would best be solved by writing a script (e.g. in Python) that can retrieve a list of the bucket's contents and then apply some logic to decide which objects should be deleted.
In Python, using boto3, list_objects_v2() can return a list of CommonPrefixes, which is effectively a list of folders. You could then determine which folders should be kept and then delete the objects in all other paths.
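A minimal sketch of that approach (the bucket name, base prefix, and thresholds are taken from the question; treat it as a starting point rather than a finished tool):

```python
from datetime import datetime, timedelta

import boto3

s3 = boto3.client('s3')
bucket = 's3_bucket_name'
base = 'javascript_bundle/'
keep = 5                                        # always retain at least 5 deployments
cutoff = datetime.utcnow() - timedelta(days=30)

# Date-named deployment "folders" come back as CommonPrefixes
resp = s3.list_objects_v2(Bucket=bucket, Prefix=base, Delimiter='/')
prefixes = sorted(cp['Prefix'] for cp in resp.get('CommonPrefixes', []))

# YYYY_MM_DD names sort chronologically, so everything except the newest
# `keep` prefixes is a candidate; of those, delete only folders older than 30 days.
for prefix in prefixes[:-keep]:
    folder_date = datetime.strptime(prefix[len(base):].rstrip('/'), '%Y_%m_%d')
    if folder_date >= cutoff:
        continue  # younger than 30 days, keep it
    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        objects = [{'Key': o['Key']} for o in page.get('Contents', [])]
        if objects:
            s3.delete_objects(Bucket=bucket, Delete={'Objects': objects})
```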

AWS S3 : Do Lifecycle rules accept regex?

I have an S3 bucket with "folders" folder1, folder2, folder3, folder4. In folder2 and folder3 there is a "new" folder. I need to delete everything in "new" that is older than 1 day. Can I do that with a rule like /*/new/ ? Some people say they have seen such rules work in the past, but that particular definition does nothing.
(In the real bucket there are folder1, folder2 ... folder3001 so I can't make rules for every folder, so please don't suggest that. The above example is for simplicity only.)
The PUT lifecycle API takes a "Prefix", which, as the name says, is a prefix, not a regex.
http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketPUTlifecycle.html
There is also a limit of 1000 rules per bucket.
http://docs.aws.amazon.com/AmazonS3/latest/dev/object-lifecycle-mgmt.html
You could change your folder structure so that keys look like "new/folderN".
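If you do restructure the keys under a common new/ prefix, one rule covers all of them; a rough boto3 sketch (the bucket name is hypothetical, and this call replaces the bucket's existing lifecycle configuration):

```python
import boto3

s3 = boto3.client('s3')

s3.put_bucket_lifecycle_configuration(
    Bucket='my-bucket',  # hypothetical bucket name
    LifecycleConfiguration={
        'Rules': [{
            'ID': 'expire-new-after-1-day',
            'Filter': {'Prefix': 'new/'},   # matches new/folder1/..., new/folder2/..., etc.
            'Status': 'Enabled',
            'Expiration': {'Days': 1},
        }]
    },
)
```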