AWS S3 Listing API - How to list everything inside S3 Bucket with specific prefix - amazon-web-services

I am trying to list all items with specific prefix in S3 bucket. Here is directory structure that I have:
Item1/
Item2/
Item3/
Item4/
image_1.jpg
Item5/
image_1.jpg
image_2.jpg
When I set prefex to be Item1/Item2, I get as a result following keys:
Item1/Item2/
Item1/Item2/Item3/Item4/image_1.jpg
Item1/Item2/Item3/Item5/image_1.jpg
Item1/Item2/Item3/Item5/image_2.jpg
What I would like to get is:
Item1/Item2/
Item1/Item2/Item3
Item1/Item2/Item3/Item4
Item1/Item2/Item3/Item5
Item1/Item2/Item3/Item4/image_1.jpg
Item1/Item2/Item3/Item5/image_1.jpg
Item1/Item2/Item3/Item5/image_2.jpg
Is there anyway to achieve this in golang?

Folders do not actually exist in Amazon S3. It is a flat object storage system.
For example, using the AWS Command-Line Interface (CLI) I could copy a command to an Amazon S3 bucket:
aws s3 cp foo.txt s3://my-bucket/folder1/folder2/foo.txt
This work just fine, even though folder1 and folder2 do not exist. This is because objects are stored with a Key (filename) that includes the full path of the object. So, the above object actually has a Key (filename) of:
folder1/folder2/foo.txt
However, to make things easier for humans, the Amazon S3 management console makes it appear as though there are folders. In S3, these are called Common Prefixes rather than folders.
So, when you make an API call to list the contents of the bucket while specifying a Prefix, it simply says "List all objects whose Key starts with this string".
Your listing doesn't show any folders because they don't actually exist.
Now, just to contradict myself, it actually is possible to create a folder (eg by clicking Create folder in the management console). This actually creates a zero-length object with the same name as the folder. The folder will then appear in listings because it is actually listing the zero-length object rather than the folder.
This is probably why Item1/Item2/ appears in your listing, but Item1/Item2/Item3 does not. Somebody, at some stage, must have "created a folder" called Item1/Item2/, which actually created a zero-length object with that Key.

Related

Deleting a file that doesn't exist on S3 deletes the folder if that is the last file. How do we prevent this?

So, I have the following structure on S3:
mainbucket
DataFeeds/
Statement/
We had incidents where the DataFeeds/ folder was being deleted! So, I tested with following:
aws s3api put-object --bucket mainbucket --key DataFeeds/.donotdelete
But, if I execute this (deleting blah.txt even if it does not exists) the DataFeeds/ folder gets deleted too:
aws s3 rm s3://mainbucket/DataFeeds/blah.txt
So, how do we prevent a folder from being deleted on S3?
Versions used:
aws-cli/2.2.46 Python/3.9.7 Darwin/20.6.0 source/x86_64 prompt/off
Folders do not exist in Amazon S3. It is a 'flat' storage service where the Key (filename) of an object includes the full path, including directories.
Amazon S3 will 'simulate' folders for you. For example, if you upload a file to invoices/january.txt, then the invoices directory will 'magically' appear. If you then delete that object, the directory will then 'disappear'. This is because it never actually existed.
If you use the Create folder button in the S3 management console, it will create a zero-length object with the same name as the directory. This will 'force' the directory to appear in the bucket listing. Deleting the zero-length object will cause the directory to disappear if there are no objects with that same Prefix (path).
The best advice for using S3 is to pretend that folders exist. You can place an object in any path and the (pretend) directories will magically appear. Do not worry about directories disappearing, since they never actually existed!
If you really need empty directories to stay around, use that Create folder button to create the zero-length object. It will stay around until you delete the zero-length object.

uploading file to specific folder in S3 bucket using boto3

My code is working. The only issue I'm facing is that I cannot specify the folder within the S3 bucket that I would like to place my file in. Here is what I have:
with open("/hadoop/prodtest/tips/ut/s3/test_audit_log.txt", "rb") as f:
s3.upload_fileobj(f, "us-east-1-tip-s3-stage", "BlueConnect/test_audit_log.txt")
Explanation from #danimal captures pretty much everything. If you wanted to just create a folder-like object in s3, you can simply specify that folder-name and end it with "/", so that when you look at it from the console, it will look like a folder.
It's rather useless, an empty object, without a body (consider it as a key with null value) just for eye-candy but if you really want to do it, you can.
1) You can create it on the console interactively, as it gives you that option
2_ You can use aws sdk. boto3 has put_object method for s3 client, where you specify the key as "your_folder_name/", see example below:
import boto3
session = boto3.Session() # I assume you know how to provide credentials etc.
s3 = session.client('s3', 'us-east-1')
bucket = s3.create_bucket('my-test-bucket')
response = s3.put_object(Bucket='my-test-bucket', Key='my_pretty_folder/' # note the ending "/"
And there you have your bucket.
Again, when you are uploading a file you specify "my-test-bucket/my_file" and what you did there is create a "key" with name "my-test-bucket/my_file" and put the content of your file as its "value".
In this case you have 2 objects in the bucket. First object has null body and looks like a folder, while the second one looks like it is inside that but as #danimal pointed out in reality you created 2 keys in the same flat hierarchy, it just "looks-like" what we are used to seeing in a file system.
If you delete the file, you still have the other objects, so on the aws console, it looks like folder is still there but no files inside.
If you skipped creating the folder and simply uploaded the file like you did, you would still see the folder structure in AWS Console but you have a single object at that point.
When you however list the objects from command line, you would see a single object and if you delete it on the console it looks like folder is gone too.
Files ('objects') in S3 are actually stored by their 'Key' (~folders+filename) in a flat structure in a bucket. If you place slashes (/) in your key then S3 represents this to the user as though it is a marker for a folder structure, but those folders don't actually exist in S3, they are just a convenience for the user and allow for the usual folder navigation familiar from most file systems.
So, as your code stands, although it appears you are putting a file called test_audit_log.txt in a folder called BlueConnect, you are actually just placing an object, representing your file, in the us-east-1-tip-s3-stage bucket with a key of BlueConnect/test_audit_log.txt. In order then to (seem to) put it in a new folder, simply make the key whatever the full path to the file should be, for example:
# upload_fileobj(file, bucket, key)
s3.upload_fileobj(f, "us-east-1-tip-s3-stage", "folder1/folder2/test_audit_log.txt")
In this example, the 'key' of the object is folder1/folder2/test_audit_log.txt which you can think of as the file test_audit_log.txt, inside the folder folder1 which is inside the folder folder2 - this is how it will appear on S3, in a folder structure, which will generally be different and separate from your local machine's folder structure.

Remove empty folder in S3 bucket via CLI

I have about 8K folders in an S3 bucket. Some of them are "empty" (does not have objects with its name prefix) and some are "not empty".
How I can programmatically detect such "empty" folder in the bucket and remove it.
Yes, I know there is no concept of a folder in a bucket - it just names.
An empty folder in the context of S3 is a zero-sized S3 object whose key ends with your folder separator, typically /, for example images/cats/.
If the applications that use this S3 bucket don't strictly need these folder objects but instead can infer the presence of a folder structure from the presence of file objects e.g. infer the folder images/dogs/ when they see the file images/dogs/terrier.png, then one solution to remove all empty folders would be to simply enumerate all objects that end in / and then delete all of those that are zero-sized. That would remove all folder objects.
If the applications do need these folder objects to remain for non-empty folders, then you'd do something different. For example, enumerate all S3 objects in the bucket, pull out those that represent folders (zero-sized, ending in /) and then see if that same prefix is present in any other, non-folder object.
Also, if you find that enumerating the entire bucket's contents becomes problematic (for example, if you have millions of objects) then you might consider using an S3 Inventory report to drive your process.

How to create a new folder in S3 using AWS PowerShell

I have a bucket already created but I want to create new folders inside this bucket, not upload data or anything else, just create new folders. How can I do this ?
Thanks
AWS S3 doesn't really have a first class concept of a "folder" or "directory". S3 objects have prefixes, which are segmented by slashes, so there is certainly the appearance of folders, but it is not possible to have a truly empty directory structure.
However, their AWS Console user experience does present content as such, and provides a button to "Create Folder". When using that button, the UI provides the below message:
When you create a folder, S3 console creates an object with the above
name appended by suffix "/" and that object is displayed as a folder
in the S3 console.
You could try using PowerShell's Put Object API/cmdlet to create empty objects named per that instruction. For example, you could create a folder named "my-new-folder" by creating an object named "my-new-folder/".
S3 is object storage; it's not a regular file system. Generally speaking, there is no need to attempt to create folders. Simply upload objects, for example teams/east/falcons/logo.png.
If you really, really want to give the impression that there are folders then you can create zero-sized objects whose names ends in / (or whatever your preferred folder delimiter is). The simplest way to do this is with the AWS S3 console but any SDK will let you do it too (simply issue a PutObject with no body).
I was seaching for this myself and found this.
use -content where content = key then -file or -folder are not needed
$s3Path = "/folder/" + 'subfolder/'
Write-S3Object -BucketName $s3Bucket -Key $s3Path -Content $s3Path

Folders in S3 Bucket not visible in Web Console

After deleting a few folders in our S3 bucket, I am not able to see any of my folders through the web console. We had around 10 folders and ended up deleting 6 of them. The remaining four show up when I do an 'ls' on that S3 bucket through the CLI but the bucket shows up empty on the web console.
When I turn on 'Versions' I see everything (including the 6 folders that were deleted). Am I overlooking something extremely simple?
Folders do not actually exist in Amazon S3.
For example, you could create an object like this:
aws s3 cp foo.txt s3://my-bucket/folder1/folder2/foo.txt
This would instantly 'create' folder1 and folder2. Or, to be more accurate, the folders would 'appear' but they don't actually exist because the full filename (Key) of the object is folder1/folder2/foo.txt.
If you were then to delete that object, the folders would 'disappear' because they never actually existed.
Sometimes, if a system wants to forcefully make a folder 'appear', it can create a zero-length object with the same name as the folder. This makes the folder 'appear', but it is really the empty file that is appearing.
Bottom line: Don't worry about creating and deleting folders. They will appear when necessary and disappear when not being used. Do not try to map normal filesystem behaviour to Amazon S3.