Remove empty folder in S3 bucket via CLI - amazon-web-services

I have about 8K folders in an S3 bucket. Some of them are "empty" (does not have objects with its name prefix) and some are "not empty".
How I can programmatically detect such "empty" folder in the bucket and remove it.
Yes, I know there is no concept of a folder in a bucket - it just names.

An empty folder in the context of S3 is a zero-sized S3 object whose key ends with your folder separator, typically /, for example images/cats/.
If the applications that use this S3 bucket don't strictly need these folder objects but instead can infer the presence of a folder structure from the presence of file objects e.g. infer the folder images/dogs/ when they see the file images/dogs/terrier.png, then one solution to remove all empty folders would be to simply enumerate all objects that end in / and then delete all of those that are zero-sized. That would remove all folder objects.
If the applications do need these folder objects to remain for non-empty folders, then you'd do something different. For example, enumerate all S3 objects in the bucket, pull out those that represent folders (zero-sized, ending in /) and then see if that same prefix is present in any other, non-folder object.
Also, if you find that enumerating the entire bucket's contents becomes problematic (for example, if you have millions of objects) then you might consider using an S3 Inventory report to drive your process.

Related

Copy data from one folder to another inside a AWS bucket automatically

I want to copy files from one folder to another inside the same bucket.
I have two folders Actual and Backup
As soon as new files comes to actual folder i want a way so that it immediately gets copied to Backup folder.
What you need are S3 Event Notifications. With these you can trigger a lambda function when a new item is put, then if it is put with one prefix, write the same object to the other prefix.
It is also worth noting that, though it is functionally as it seems, S3 doesn't really have directories; just objects. So you are just creating the same object as /Actual/some-file with key /Backup/some-file . It just looks like there is a directory because files /Actual/some-file and /Actual/other-file share a prefix /Actual/.

Deleting a file that doesn't exist on S3 deletes the folder if that is the last file. How do we prevent this?

So, I have the following structure on S3:
mainbucket
DataFeeds/
Statement/
We had incidents where the DataFeeds/ folder was being deleted! So, I tested with following:
aws s3api put-object --bucket mainbucket --key DataFeeds/.donotdelete
But, if I execute this (deleting blah.txt even if it does not exists) the DataFeeds/ folder gets deleted too:
aws s3 rm s3://mainbucket/DataFeeds/blah.txt
So, how do we prevent a folder from being deleted on S3?
Versions used:
aws-cli/2.2.46 Python/3.9.7 Darwin/20.6.0 source/x86_64 prompt/off
Folders do not exist in Amazon S3. It is a 'flat' storage service where the Key (filename) of an object includes the full path, including directories.
Amazon S3 will 'simulate' folders for you. For example, if you upload a file to invoices/january.txt, then the invoices directory will 'magically' appear. If you then delete that object, the directory will then 'disappear'. This is because it never actually existed.
If you use the Create folder button in the S3 management console, it will create a zero-length object with the same name as the directory. This will 'force' the directory to appear in the bucket listing. Deleting the zero-length object will cause the directory to disappear if there are no objects with that same Prefix (path).
The best advice for using S3 is to pretend that folders exist. You can place an object in any path and the (pretend) directories will magically appear. Do not worry about directories disappearing, since they never actually existed!
If you really need empty directories to stay around, use that Create folder button to create the zero-length object. It will stay around until you delete the zero-length object.

How to prevent Amazon S3 object expiration from deleting the directory

We want to delete temp files from the S3 bucket from one of the folder on daily basis. I have tried with s3 lifecycle policy. For example my folder name is Test, I have set prefix for expiration is Test/ . But the issue here is along with all the files, Test folder is also getting deleted. I want to keep the folder as is and only delete the files in that . Is there any way i can do this?
Don't worry about the folder.
As soon as you have at least one file "in" the folder (object key beginning with Test/) it will appear again and when there are no objects beginning with that key prefix it will disappear -- all of this is normal/expected behavior because folders in S3 are not containers like they are on a filesystem and they do not actually need to exist before putting files "in" them.

AWS S3 Listing API - How to list everything inside S3 Bucket with specific prefix

I am trying to list all items with specific prefix in S3 bucket. Here is directory structure that I have:
Item1/
Item2/
Item3/
Item4/
image_1.jpg
Item5/
image_1.jpg
image_2.jpg
When I set prefex to be Item1/Item2, I get as a result following keys:
Item1/Item2/
Item1/Item2/Item3/Item4/image_1.jpg
Item1/Item2/Item3/Item5/image_1.jpg
Item1/Item2/Item3/Item5/image_2.jpg
What I would like to get is:
Item1/Item2/
Item1/Item2/Item3
Item1/Item2/Item3/Item4
Item1/Item2/Item3/Item5
Item1/Item2/Item3/Item4/image_1.jpg
Item1/Item2/Item3/Item5/image_1.jpg
Item1/Item2/Item3/Item5/image_2.jpg
Is there anyway to achieve this in golang?
Folders do not actually exist in Amazon S3. It is a flat object storage system.
For example, using the AWS Command-Line Interface (CLI) I could copy a command to an Amazon S3 bucket:
aws s3 cp foo.txt s3://my-bucket/folder1/folder2/foo.txt
This work just fine, even though folder1 and folder2 do not exist. This is because objects are stored with a Key (filename) that includes the full path of the object. So, the above object actually has a Key (filename) of:
folder1/folder2/foo.txt
However, to make things easier for humans, the Amazon S3 management console makes it appear as though there are folders. In S3, these are called Common Prefixes rather than folders.
So, when you make an API call to list the contents of the bucket while specifying a Prefix, it simply says "List all objects whose Key starts with this string".
Your listing doesn't show any folders because they don't actually exist.
Now, just to contradict myself, it actually is possible to create a folder (eg by clicking Create folder in the management console). This actually creates a zero-length object with the same name as the folder. The folder will then appear in listings because it is actually listing the zero-length object rather than the folder.
This is probably why Item1/Item2/ appears in your listing, but Item1/Item2/Item3 does not. Somebody, at some stage, must have "created a folder" called Item1/Item2/, which actually created a zero-length object with that Key.

Folders in S3 Bucket not visible in Web Console

After deleting a few folders in our S3 bucket, I am not able to see any of my folders through the web console. We had around 10 folders and ended up deleting 6 of them. The remaining four show up when I do an 'ls' on that S3 bucket through the CLI but the bucket shows up empty on the web console.
When I turn on 'Versions' I see everything (including the 6 folders that were deleted). Am I overlooking something extremely simple?
Folders do not actually exist in Amazon S3.
For example, you could create an object like this:
aws s3 cp foo.txt s3://my-bucket/folder1/folder2/foo.txt
This would instantly 'create' folder1 and folder2. Or, to be more accurate, the folders would 'appear' but they don't actually exist because the full filename (Key) of the object is folder1/folder2/foo.txt.
If you were then to delete that object, the folders would 'disappear' because they never actually existed.
Sometimes, if a system wants to forcefully make a folder 'appear', it can create a zero-length object with the same name as the folder. This makes the folder 'appear', but it is really the empty file that is appearing.
Bottom line: Don't worry about creating and deleting folders. They will appear when necessary and disappear when not being used. Do not try to map normal filesystem behaviour to Amazon S3.