Deleting a file that doesn't exist on S3 deletes the folder if that is the last file. How do we prevent this?

So, I have the following structure on S3:
mainbucket
  DataFeeds/
  Statement/
We had incidents where the DataFeeds/ folder was being deleted! So, I tested with the following:
aws s3api put-object --bucket mainbucket --key DataFeeds/.donotdelete
But if I execute this (deleting blah.txt even though it does not exist), the DataFeeds/ folder gets deleted too:
aws s3 rm s3://mainbucket/DataFeeds/blah.txt
So, how do we prevent a folder from being deleted on S3?
Versions used:
aws-cli/2.2.46 Python/3.9.7 Darwin/20.6.0 source/x86_64 prompt/off

Folders do not exist in Amazon S3. It is a 'flat' storage service where the Key (filename) of an object includes the full path, including directories.
Amazon S3 will 'simulate' folders for you. For example, if you upload a file to invoices/january.txt, then the invoices directory will 'magically' appear. If you then delete that object, the directory will then 'disappear'. This is because it never actually existed.
If you use the Create folder button in the S3 management console, it will create a zero-length object whose Key is the directory path ending with a slash (e.g. DataFeeds/). This will 'force' the directory to appear in the bucket listing. Deleting the zero-length object will cause the directory to disappear if there are no other objects with that same Prefix (path).
The best advice for using S3 is to pretend that folders exist. You can place an object in any path and the (pretend) directories will magically appear. Do not worry about directories disappearing, since they never actually existed!
If you really need empty directories to stay around, use that Create folder button to create the zero-length object. It will stay around until you delete the zero-length object.
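If you would rather create that zero-length marker programmatically than through the console, here is a minimal sketch using the AWS SDK for Go (v1). The bucket and key come from the question above; credentials and region are assumed to be configured in the environment:

package main

import (
	"bytes"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	// Uses the default credential/region chain (env vars, ~/.aws/config).
	sess := session.Must(session.NewSession())
	svc := s3.New(sess)

	// Put a zero-length object whose Key ends with "/": the same thing the
	// console's Create folder button does. The "folder" then keeps appearing
	// in listings even when it contains no other objects.
	_, err := svc.PutObject(&s3.PutObjectInput{
		Bucket: aws.String("mainbucket"), // bucket name from the question
		Key:    aws.String("DataFeeds/"), // trailing slash marks the folder
		Body:   bytes.NewReader(nil),     // zero bytes
	})
	if err != nil {
		log.Fatal(err)
	}
}

This is equivalent to the put-object command in the question, except that the Key ends with / so the console treats it as the folder itself rather than as a hidden file inside it.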

Related

AWS S3 - Use PowerShell to delete all files but keep the folders

I have a PowerShell script that downloads all files from an S3 bucket and then removes the files from the bucket. All the files I'm removing are stored in a subfolder in the S3 bucket, and I just want to delete the files but maintain the subfolders.
I'm currently using the following command to delete the files in S3 once the file has been downloaded from S3.
Remove-S3Object -BucketName $S3Bucket -Key $key -Force
My problem is that if it removes all the files in the subfolder, the subfolder is removed as well. Is there a way to remove the file but keep the subfolder present using PowerShell? I believe I can do something like this,
aws s3 rm s3://<key_to_be_removed> --exclude "<subfolder_key>"
but not quite sure if that'll work.
I'm looking for the best way to accomplish this; at the moment, my only option is to recreate the subfolder via the script if the subfolder no longer exists.
The only way to keep an empty folder around is to create a zero-length object with the same name as the folder you want to keep. This is actually how the S3 console enables you to create an empty folder.
You can check this by running aws s3 ls s3://your-bucket/folderfoo/ and observing an object of zero bytes in the output.
As already commented, S3 does not really have folders the way file systems do. The folders presented by most S3 browsers are just generated from the paths of the files/objects. If you upload an object/file named folder/file, the browsers will present folder as a folder containing the file file. But technically, all that exists is the object folder/file. The folder does not exist on its own.
You can explicitly create a folder by creating an empty object whose name is the folder path ending with a slash: folder/. If you do that, it will appear that the folder exists even if there are no files in it. But if you do not, the virtual folder disappears once you remove all objects in the folder.
Now the question is whether your command also removes the empty object representing the folder; I cannot tell that.
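If such a marker object exists, one way to "delete the files but keep the folders" is to walk the prefix and skip every key that ends with /. Here is a rough sketch using the AWS SDK for Go (v1); the bucket and prefix names are placeholders:

package main

import (
	"log"
	"strings"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	sess := session.Must(session.NewSession())
	svc := s3.New(sess)

	bucket := "your-bucket" // placeholder
	prefix := "folderfoo/"  // placeholder subfolder

	// Page through every object under the prefix and delete the files,
	// skipping keys that end with "/" (the zero-length folder markers).
	err := svc.ListObjectsV2Pages(&s3.ListObjectsV2Input{
		Bucket: aws.String(bucket),
		Prefix: aws.String(prefix),
	}, func(page *s3.ListObjectsV2Output, lastPage bool) bool {
		for _, obj := range page.Contents {
			if strings.HasSuffix(*obj.Key, "/") {
				continue // keep the folder marker
			}
			if _, err := svc.DeleteObject(&s3.DeleteObjectInput{
				Bucket: aws.String(bucket),
				Key:    obj.Key,
			}); err != nil {
				log.Printf("delete %s: %v", *obj.Key, err)
			}
		}
		return true // keep paging
	})
	if err != nil {
		log.Fatal(err)
	}
}

The same skip-trailing-slash test can be applied in the PowerShell loop around Remove-S3Object.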

How to prevent Amazon S3 object expiration from deleting the directory

We want to delete temp files from one of the folders in the S3 bucket on a daily basis. I have tried an S3 lifecycle policy. For example, my folder name is Test, and I have set the expiration prefix to Test/. But the issue here is that, along with all the files, the Test folder is also getting deleted. I want to keep the folder as is and only delete the files in it. Is there any way I can do this?
Don't worry about the folder.
As soon as you have at least one file "in" the folder (an object key beginning with Test/), it will appear again; when there are no objects beginning with that key prefix, it will disappear. All of this is normal, expected behavior, because folders in S3 are not containers like they are on a filesystem, and they do not actually need to exist before you put files "in" them.
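For reference, here is roughly what that lifecycle rule looks like when set via the AWS SDK for Go (v1); the bucket name is a placeholder and the rule ID is hypothetical. The rule matches every object whose key begins with Test/, including a zero-length Test/ marker if one exists, which is why the "folder" vanishes along with the files:

package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	sess := session.Must(session.NewSession())
	svc := s3.New(sess)

	// Expire everything under Test/ after one day.
	_, err := svc.PutBucketLifecycleConfiguration(&s3.PutBucketLifecycleConfigurationInput{
		Bucket: aws.String("your-bucket"), // placeholder
		LifecycleConfiguration: &s3.BucketLifecycleConfiguration{
			Rules: []*s3.LifecycleRule{{
				ID:         aws.String("expire-test-temp-files"), // hypothetical rule name
				Status:     aws.String("Enabled"),
				Filter:     &s3.LifecycleRuleFilter{Prefix: aws.String("Test/")},
				Expiration: &s3.LifecycleExpiration{Days: aws.Int64(1)},
			}},
		},
	})
	if err != nil {
		log.Fatal(err)
	}
}

If you really want the folder entry to survive, recreate the marker object after expiration runs, or simply ignore its absence as described above.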

AWS S3 Listing API - How to list everything inside S3 Bucket with specific prefix

I am trying to list all items with specific prefix in S3 bucket. Here is directory structure that I have:
Item1/
  Item2/
    Item3/
      Item4/
        image_1.jpg
      Item5/
        image_1.jpg
        image_2.jpg
When I set the prefix to Item1/Item2, I get the following keys as a result:
Item1/Item2/
Item1/Item2/Item3/Item4/image_1.jpg
Item1/Item2/Item3/Item5/image_1.jpg
Item1/Item2/Item3/Item5/image_2.jpg
What I would like to get is:
Item1/Item2/
Item1/Item2/Item3
Item1/Item2/Item3/Item4
Item1/Item2/Item3/Item5
Item1/Item2/Item3/Item4/image_1.jpg
Item1/Item2/Item3/Item5/image_1.jpg
Item1/Item2/Item3/Item5/image_2.jpg
Is there any way to achieve this in golang?
Folders do not actually exist in Amazon S3. It is a flat object storage system.
For example, using the AWS Command-Line Interface (CLI) I could copy a file to an Amazon S3 bucket:
aws s3 cp foo.txt s3://my-bucket/folder1/folder2/foo.txt
This works just fine, even though folder1 and folder2 do not exist. This is because objects are stored with a Key (filename) that includes the full path of the object. So, the above object actually has a Key (filename) of:
folder1/folder2/foo.txt
However, to make things easier for humans, the Amazon S3 management console makes it appear as though there are folders. In S3, these are called Common Prefixes rather than folders.
So, when you make an API call to list the contents of the bucket while specifying a Prefix, it simply says "List all objects whose Key starts with this string".
Your listing doesn't show any folders because they don't actually exist.
Now, just to contradict myself, it actually is possible to create a folder (e.g. by clicking Create folder in the management console). This actually creates a zero-length object with the same name as the folder. The folder will then appear in listings because what is being listed is the zero-length object rather than the folder.
This is probably why Item1/Item2/ appears in your listing, but Item1/Item2/Item3 does not. Somebody, at some stage, must have "created a folder" called Item1/Item2/, which actually created a zero-length object with that Key.
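Since the question asks about Go: to get the folder-like entries as well, pass a Delimiter so that S3 returns Common Prefixes alongside the objects. A minimal sketch with the AWS SDK for Go (v1); the bucket name is a placeholder:

package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	sess := session.Must(session.NewSession())
	svc := s3.New(sess)

	// With Delimiter "/", keys below the prefix are grouped into
	// CommonPrefixes ("folders") instead of being returned flat.
	out, err := svc.ListObjectsV2(&s3.ListObjectsV2Input{
		Bucket:    aws.String("your-bucket"), // placeholder
		Prefix:    aws.String("Item1/Item2/"),
		Delimiter: aws.String("/"),
	})
	if err != nil {
		log.Fatal(err)
	}
	for _, cp := range out.CommonPrefixes {
		fmt.Println("folder:", *cp.Prefix) // e.g. Item1/Item2/Item3/
	}
	for _, obj := range out.Contents {
		fmt.Println("object:", *obj.Key)
	}
}

Each call returns only one level of the hierarchy, so to build the full tree you would recurse into each returned CommonPrefix, or simply derive the intermediate folder names by splitting the flat keys on /.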

Folders in S3 Bucket not visible in Web Console

After deleting a few folders in our S3 bucket, I am not able to see any of my folders through the web console. We had around 10 folders and ended up deleting 6 of them. The remaining four show up when I do an 'ls' on that S3 bucket through the CLI but the bucket shows up empty on the web console.
When I turn on 'Versions' I see everything (including the 6 folders that were deleted). Am I overlooking something extremely simple?
Folders do not actually exist in Amazon S3.
For example, you could create an object like this:
aws s3 cp foo.txt s3://my-bucket/folder1/folder2/foo.txt
This would instantly 'create' folder1 and folder2. Or, to be more accurate, the folders would 'appear' but they don't actually exist because the full filename (Key) of the object is folder1/folder2/foo.txt.
If you were then to delete that object, the folders would 'disappear' because they never actually existed.
Sometimes, if a system wants to forcefully make a folder 'appear', it can create a zero-length object with the same name as the folder. This makes the folder 'appear', but it is really the empty file that is appearing.
Bottom line: Don't worry about creating and deleting folders. They will appear when necessary and disappear when not being used. Do not try to map normal filesystem behaviour to Amazon S3.

Inconsistent behavior in Amazon S3 for folder/key upload, delete through console

There seems to be some inconsistency in Amazon S3's behavior.
If, in a bucket "Bucket1", I create a folder "Folder1" and upload a file, say "sample.txt", into it, and next delete this file, at the bucket level I can still see "Folder1" in the S3 console.
Now, in the same bucket, if I upload a file "Folder2/sample.txt" and just delete the sample.txt file, why does Folder2 also disappear from the console?
Why this inconsistency? AFAIK, we do not have any API to create/delete a folder at the SDK level.
Am I missing something here or is this an actual issue?
Thanks in advance for any help.
A "Folder" in S3 is simply a 0-byte object with a / character at the end of the key name.
So, using the AWS CLI or SDKs, you can "create a folder" by "putting" an object that matches those criteria.
The AWS Management Console also does something extra: it simulates folders, even if they were not explicitly created. So, if you uploaded your object as "Folder2/sample.txt", it extrapolates and simulates "Folder2/" at the parent folder level. You can do this yourself with the CLI/SDKs using the delimiter parameter.
When you delete that object, since "Folder2" did not actually exist as a 0-byte object ending with / (see first paragraph), then "Folder2/" disappears from the management console.
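One way to observe this yourself is to call HeadObject on the folder key: if the console's Create folder button created a real zero-byte object, the call succeeds; for a folder the console is merely simulating, it fails with a 404. A rough sketch with the AWS SDK for Go (v1), using the names from the question:

package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/awserr"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	sess := session.Must(session.NewSession())
	svc := s3.New(sess)

	// HeadObject on the folder-marker key ("Folder2/").
	_, err := svc.HeadObject(&s3.HeadObjectInput{
		Bucket: aws.String("Bucket1"),  // bucket from the question
		Key:    aws.String("Folder2/"), // trailing slash = folder marker
	})
	// A 404 means no marker object exists and the folder is only simulated.
	// (Without s3:ListBucket permission, S3 returns 403 instead of 404.)
	if reqErr, ok := err.(awserr.RequestFailure); ok && reqErr.StatusCode() == 404 {
		fmt.Println("Folder2/ is only simulated; no marker object exists")
	} else if err != nil {
		log.Fatal(err)
	} else {
		fmt.Println("Folder2/ exists as a real zero-byte object")
	}
}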