AWS S3 bucket cleanup while keeping a certain number of folders

Currently, inside an S3 bucket, I store the JavaScript bundle files output by webpack. Here is a sample folder structure:
- s3_bucket_name
  - javascript_bundle
    - 2018_10_11
    - 2018_10_09
    - 2018_10_08
    - 2018_10_07
    - 2018_10_06
    - 2018_10_05
So I want to clean up these folders and keep only 5 of them (the folder names are deployment dates). I am unable to clean up by date alone, since it's possible we may not deploy for a long time.
Because of this, I am unable to use lifecycle rules.
For example, if I set the expiration to 30 days and we don't deploy for 30 days, S3 will automatically remove all the folders; all the JavaScript files would then be removed and the site would stop working.
Is there a way to accomplish this using the AWS CLI?
The requirements are:
- Delete folders by date
- Keep a minimum of 5 folders
For example, given the following folders, we want to delete the folders older than 30 days while keeping at least 5 folders:
- 2018_10_11
- 2018_09_09
- 2018_08_08
- 2018_07_07
- 2018_06_06
- 2018_05_05
The only one that will be deleted is 2018_05_05.
I don't see any option to do this via the aws s3 rm command.

You can specify which folders to delete, but there is no option in the AWS CLI to specify which folders you do not want to delete.
This requirement would best be solved by writing a script (e.g. in Python) that retrieves a list of the bucket's contents and then applies some logic to decide which objects should be deleted.
In Python, using boto3, list_objects_v2() can return a list of CommonPrefixes, which is effectively a list of folders. You could then determine which folders should be kept and delete the objects under all other paths.
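Here is a minimal sketch of that approach. It assumes the layout from the question (bucket s3_bucket_name, prefix javascript_bundle/, folder names like 2018_10_11) and the stated keep-5 / 30-day rules; adjust the names before running.

import boto3
from datetime import datetime, timedelta, timezone

BUCKET = "s3_bucket_name"        # assumption: bucket name from the question
PREFIX = "javascript_bundle/"    # assumption: parent "folder" holding the deployments
KEEP = 5                         # always keep at least this many folders
MAX_AGE_DAYS = 30                # only delete folders older than this

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

# List the deployment "folders" via CommonPrefixes.
folders = []
for page in paginator.paginate(Bucket=BUCKET, Prefix=PREFIX, Delimiter="/"):
    folders.extend(cp["Prefix"] for cp in page.get("CommonPrefixes", []))

# Folder names are YYYY_MM_DD, so a reverse lexical sort is newest-first.
folders.sort(reverse=True)
cutoff = datetime.now(timezone.utc) - timedelta(days=MAX_AGE_DAYS)

for folder in folders[KEEP:]:                 # never touch the newest KEEP folders
    name = folder.rstrip("/").split("/")[-1]  # e.g. "2018_05_05"
    if datetime.strptime(name, "%Y_%m_%d").replace(tzinfo=timezone.utc) > cutoff:
        continue                              # newer than 30 days: keep it
    # Delete every object under this prefix (up to 1000 keys per request).
    for page in paginator.paginate(Bucket=BUCKET, Prefix=folder):
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            s3.delete_objects(Bucket=BUCKET, Delete={"Objects": keys})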

Related

Is it feasible to maintain directory structure when backing up to AWS S3 Glacier classes?

I am trying to back up 2 TB from a Windows Server shared drive to S3 Glacier.
There are maybe 100 folders (some may be nested) and perhaps 5000 files (some small, like spreadsheets and photos; others larger, like server images). My first question is: what counts as an object here?
Let's say I have Folder 1, which has 10 folders inside it. Each of the 10 folders has 100 files.
Would the number of objects be 1 folder + (10 folders * 100 files) = 1001 objects?
I am trying to understand how folder nesting is treated in S3. Do I have to manually create each folder as a prefix and then upload each file inside it using the AWS CLI? I am trying to recreate the shared-drive experience in the cloud, where I can browse the folders and download the files I need.
Amazon S3 does not actually support folders. It might look like it does, but it actually doesn't.
For example, you could upload an object to invoices/january.txt and the invoices directory will just magically 'appear'. Then, if you deleted that object, the invoices folder would magically 'disappear' (because it never actually existed).
So, feel free to upload objects to any location without creating the directories first.
However, if you click the Create folder button in the Amazon S3 management console, it will create a zero-length object with the name of the directory. This will make the directory 'appear' and it would be counted as an object.
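A quick way to see this for yourself is the following sketch (the bucket name is a placeholder): upload an object to a nested key, list the bucket with a delimiter, then delete the object.

import boto3

s3 = boto3.client("s3")
bucket = "my-example-bucket"  # placeholder bucket name

# No "invoices/" folder exists yet; uploading the object is enough.
s3.put_object(Bucket=bucket, Key="invoices/january.txt", Body=b"hello")

# The console (and a delimiter listing) now shows an "invoices/" folder.
resp = s3.list_objects_v2(Bucket=bucket, Delimiter="/")
print(resp.get("CommonPrefixes"))   # [{'Prefix': 'invoices/'}]

# Delete the object and the "folder" disappears again.
s3.delete_object(Bucket=bucket, Key="invoices/january.txt")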
The easiest way to copy the files from your Windows computer to an Amazon S3 bucket would be:
aws s3 sync directoryname s3://bucket-name/ --storage-class DEEP_ARCHIVE
It will upload all files, including files in subdirectories. It will not create the folders, since they aren't necessary; however, the folders will still 'appear' in S3.

AWS configuration to delete files

I have a folder "Execution" folder in s3 bucket.
It has folders and files like
Execution
Exec_06-06-2022/
file1.json
file2.json
Exec_07-06-2022/
file3.json
file4.json
I need a configuration that deletes the Exec_datestamp folders and the files inside them after X days.
I tried this using an AWS lifecycle config with the prefix "Execution/".
But it deletes the Execution/ folder itself after X days (I set this to 1 day to test).
Is there any other way to achieve this?
There are no folders in S3. Execution is part of the object's name, specifically its key prefix. The S3 console only makes Execution appear to be a folder, but there is no such thing in S3. So your lifecycle rule deletes the Execution/ object (not a folder), because it matches your filter.
You can try with an Execution/Exec* filter.
Got it, https://stackoverflow.com/a/41459761/5010582. The prefix has to be "Execution/Exec" without any wildcard.
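For reference, here is a minimal sketch of setting that rule with boto3 (the bucket name, rule ID, and the 30-day value are placeholders):

import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",  # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-exec-folders",             # placeholder rule name
                "Filter": {"Prefix": "Execution/Exec"},  # no wildcard needed
                "Status": "Enabled",
                "Expiration": {"Days": 30},              # X days (placeholder)
            }
        ]
    },
)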

How can I search for unknown folders in an S3 bucket? I have millions of objects in my bucket and only want the folder list

I have a bucket with 3 million objects. I don't even know how many folders are in my S3 bucket, or what their names are. I want to show only the list of folders in AWS S3. Is there any way to get a list of all the folders?
I would use the AWS CLI for this. To get started, have a look here.
Then it is a matter of an almost-standard Linux command (ls):
aws s3 ls s3://<bucket_name>/path/to/search/folder/ --recursive | grep '/$' > folders.txt
where:
the grep command reads what the aws s3 ls command returned and keeps only the entries ending in /.
the trailing > folders.txt saves the output to a file.
Note: grep (if I'm not wrong) is a Unix-only utility command, but I believe you can achieve this on Windows as well.
Note 2: depending on the number of files, this operation might (will) take a while.
Note 3: in systems like Amazon S3, the term "folder" exists only to give users a visual similarity to standard file systems; internally a folder is just part of the key. You can see this in the (web) console when you filter by "prefix".
Amazon S3 buckets with large quantities of objects are very difficult to use. The API calls that list bucket contents are limited to returning 1000 objects per API call. While it is possible to request 'folders' (by using Delimiter='/' and looking at CommonPrefixes), this would take repeated calls to obtain the hierarchy.
Instead, I would recommend using Amazon S3 Inventory, which can provide a daily or weekly CSV file listing all objects. You can then play with that CSV file from code (or possibly Excel? Might be too big?) to obtain your desired listings.
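As a rough sketch of the code side (the file name is a placeholder, and the column layout depends on how the inventory is configured; here I assume a CSV inventory whose first two columns are bucket and key), you could extract the distinct "folder" prefixes like this:

import csv
import gzip

folders = set()

# S3 Inventory delivers gzip-compressed CSV files; assumption: the first
# two columns are bucket and key (this depends on the inventory config).
with gzip.open("inventory-data.csv.gz", "rt", newline="") as f:
    for row in csv.reader(f):
        key = row[1]
        # Record every intermediate "folder" level in the key.
        parts = key.split("/")[:-1]
        for i in range(1, len(parts) + 1):
            folders.add("/".join(parts[:i]) + "/")

for folder in sorted(folders):
    print(folder)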
Just be aware that doing anything on that bucket will not be fast.

AWS S3 bucket shows one file less

I have used the AWS CLI tool to move a couple of folders named 2014, 2015, 2016, etc. from the root directory:
/2015/
into:
/images/2015/
When I moved them, it seems there is one file less in each bucket:
Before copying:
After copying:
Could you help me understand this phenomenon?
The count is probably including/excluding the 'folder object'.
Normally, there is no need to create folders in Amazon S3. Simply putting an object in a particular path (e.g. /images/2014) will "create" the images and 2014 folders -- they 'appear' to exist, but they actually do not. Deleting the objects will make the folders disappear.
However, it is possible to create a folder by clicking Create folder. This will create a zero-length object with the same name as the folder. This will force the folder to appear, even when there are no objects inside the folder.
Therefore, it is likely that the "off by 1" count of objects is related to a folder that was/wasn't created via the Create folder command. I have previously seen exactly this behaviour.
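If you want to confirm this, a small sketch like the following (bucket name and prefix are placeholders) lists any zero-length "folder marker" objects, i.e. keys ending in / with a size of 0:

import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

# Placeholders: adjust the bucket name and prefix to the path you moved.
for page in paginator.paginate(Bucket="my-bucket", Prefix="images/"):
    for obj in page.get("Contents", []):
        if obj["Key"].endswith("/") and obj["Size"] == 0:
            print("Folder marker:", obj["Key"])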

Skip Certain folders in S3 Lifecycle Policy

I want to apply a deletion rule to my whole bucket to delete all objects that are older than 2 weeks. But my bucket has certain folders that need to be skipped. So is it possible, via S3 lifecycle management, to skip certain folders and delete the rest of the stuff that is older than 2 weeks?
Here is my current bucket structure:
- example-bucket.com
  - folder 1
    - images
      - 1.jpg
    - videos
      - 1.mp4
      - 2.flv
  - folder 2
    - images
      - 1.jpg
  - folder 3
    - videos
      - 1.mp4
  - folder 4 (Should not be deleted)
    - content
    - folders
  - folder 5
    - images
      - 1.jpg
    - videos
      - 1.mp4
      - 2.flv
  - folder 6 (Should not be deleted)
    - content
    - folders
I want to skip folders 4 and 6 and delete the contents of all other folders that are older than 14 days.
Can someone tell me if this is possible via AWS S3 lifecycle management?
Yes, you can. A lifecycle rule supports a prefix/tag filter to choose what the rule applies to.
You need to define the prefixes you want to delete; everything else is left alone.
Reference:
To apply this lifecycle rule to all objects with a specified name prefix (i.e., objects whose name begins with a common string), type in a prefix. You can also limit the lifecycle rule scope to one or more object tags. You can combine a prefix and tags. For more information about object name prefixes, see Object Keys in the Amazon Simple Storage Service Developer Guide. For more information about object tags, see Object Tagging in the Amazon Simple Storage Service Developer Guide.
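As a rough sketch of what that looks like in practice (the bucket name comes from the question; the rule IDs and prefix list are assumptions), you would create one expiration rule per prefix you do want cleaned up and simply leave folder 4 and folder 6 out:

import boto3

s3 = boto3.client("s3")

# Assumption: one expiration rule per folder that should be cleaned up;
# "folder 4/" and "folder 6/" are not listed, so they are untouched.
prefixes_to_expire = ["folder 1/", "folder 2/", "folder 3/", "folder 5/"]

s3.put_bucket_lifecycle_configuration(
    Bucket="example-bucket.com",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-" + prefix.strip("/").replace(" ", "-"),  # placeholder rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": 14},
            }
            for prefix in prefixes_to_expire
        ]
    },
)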
It's really frustrating, but it isn't possible without adding every single prefix in the bucket that you want archived or deleted, or tagging everything you want deleted. So if you are adding new folders to the bucket, you have to make sure they conform to a prefix or are tagged, which makes for a maintenance nightmare. It would be much simpler if you could add an exception tag, particularly if you want a readme in the bucket describing what in it is not archived.