Copy Folder Structure from one bucket to another without copying actual data - amazon-web-services

We have a requirement to create a bucket with the same directory structure as another S3 bucket. Since Amazon S3 is object-based storage, I was not able to find any directory listing or copying command.
Any solution or workaround would be very helpful.

The most straightforward way to do this is to use an S3 replication rule on the bucket to copy the source to a destination bucket.

As you know, S3 does not have "folders" in the same way that a filesystem does. However, the AWS console simulates them with zero-length objects (doc):
The Amazon S3 console implements folder object creation by creating a zero-byte object with the folder prefix and delimiter value as the key
So if you want to replicate the "folder" structure, you will need to do the following:
1. Iterate all objects in the bucket.
2. For each object, strip off everything after the last slash (/). For example, if you have the object /foo/bar/baz, you want to end up with the key /foo/bar/ (the trailing slash is important).
3. Call PutObject in your language of choice, using that key and a zero-length body.
4. Repeat steps 2 and 3, removing successive trailing components (so /foo/bar/baz becomes /foo/bar/, and that becomes /foo/).
You should use whatever "set" implementation your language of choice supports, to avoid attempting to create the same "folder" multiple times.
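The stripping-and-repeating steps above amount to expanding each key into its chain of parent prefixes. A minimal helper could look like this (illustrative only; keys are shown without a leading slash, as S3 stores them):

```python
def parent_prefixes(key):
    """Return every 'folder' prefix of an S3 key, deepest first:
    'foo/bar/baz' -> ['foo/bar/', 'foo/']"""
    prefixes = []
    while '/' in key.rstrip('/'):
        # drop the last path component, keep the trailing slash
        key = key.rstrip('/').rsplit('/', 1)[0]
        prefixes.append(key + '/')
    return prefixes
```

Collecting the results for every key into a set gives exactly the folder keys to PutObject into the destination bucket.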

The Python code below recreates the empty folder structure of the source bucket in the destination bucket:
import boto3

src_bucketName = 'source_bucket'
dest_bucketName = 'dest_bucket'
uniq_paths = set()

s3 = boto3.resource('s3')
s3c = boto3.client('s3')

for my_bucket_object in s3.Bucket(src_bucketName).objects.all():
    path = my_bucket_object.key.rsplit('/', 1)
    if len(path) > 1:  # skip objects at the bucket root (no folder part)
        uniq_paths.add(path[0])

for prefix in uniq_paths:
    s3c.put_object(Bucket=dest_bucketName, Key=prefix + '/')

Related

AWS-s3 : How to copy files from s3 bucket to another s3 bucket based on filename

Using the code below, I'm able to list the files in an S3 bucket. I would like to know how to copy/move files from one S3 bucket (s3-dev) to another S3 bucket (s3-prod) based on file names. For example, if a file named "abc-21-04-2021.csv" is placed in the s3-dev bucket, how do I find file names starting with "abc" and copy/move them to the other S3 bucket?
Consider the files in the s3-dev bucket as 1) abc-21-04-2021.csv, 2) abc-19-04-2021.csv, 3) def-18-04-2021.csv; I need to move the files starting with "abc" into the other S3 bucket (s3-prod).
Please suggest and share your inputs.
my_bucket = s3.Bucket('s3-dev')
for my_bucket_object in my_bucket.objects.all():
    print(my_bucket_object.key)
I guess your code is written in Python; you just need to handle the object key as a string.
my_bucket = s3.Bucket('s3-dev')
for my_bucket_object in my_bucket.objects.all():
    file_name = my_bucket_object.key.split('/')[-1]  # in case a folder name contains the pattern
    if file_name.startswith('abc'):
        print(my_bucket_object.key)
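Printing the matches is only half the job; to actually copy or move them, the same file-name filter can drive a server-side copy followed by a delete. A sketch, with bucket names taken from the question (`should_move` is an illustrative helper, and boto3 is imported lazily so the filter itself can be tried without AWS credentials):

```python
def should_move(key, pattern='abc'):
    """True when the object's file name (the last path segment) starts with pattern."""
    return key.split('/')[-1].startswith(pattern)

def move_matching(src_bucket='s3-dev', dst_bucket='s3-prod', pattern='abc'):
    import boto3  # imported here so should_move stays testable offline
    s3 = boto3.resource('s3')
    for obj in s3.Bucket(src_bucket).objects.all():
        if should_move(obj.key, pattern):
            # server-side copy into the destination bucket under the same key
            s3.meta.client.copy({'Bucket': src_bucket, 'Key': obj.key},
                                dst_bucket, obj.key)
            obj.delete()  # drop this line to copy instead of move
```

Note that S3 has no native "move": a move is always a copy plus a delete of the source object.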

uploading file to specific folder in S3 bucket using boto3

My code is working. The only issue I'm facing is that I cannot specify the folder within the S3 bucket that I would like to place my file in. Here is what I have:
with open("/hadoop/prodtest/tips/ut/s3/test_audit_log.txt", "rb") as f:
    s3.upload_fileobj(f, "us-east-1-tip-s3-stage", "BlueConnect/test_audit_log.txt")
Explanation from @danimal captures pretty much everything. If you want to create a folder-like object in S3, you can simply specify that folder name and end it with "/", so that when you look at it from the console it will look like a folder.
It's rather useless: an empty object without a body (consider it a key with a null value), just for eye candy, but if you really want to do it, you can.
1) You can create it interactively on the console, which gives you that option.
2) You can use the AWS SDK. boto3 has a put_object method on the S3 client, where you specify the key as "your_folder_name/"; see the example below:
import boto3

session = boto3.Session()  # assumes credentials are configured elsewhere
s3 = session.client('s3', region_name='us-east-1')

s3.create_bucket(Bucket='my-test-bucket')
response = s3.put_object(Bucket='my-test-bucket',
                         Key='my_pretty_folder/')  # note the trailing "/"
And there you have your bucket.
Again, when you upload a file you specify a key such as "my_pretty_folder/my_file", and what you did there is create a "key" with that name and put the content of your file as its "value".
In this case you have 2 objects in the bucket. The first object has a null body and looks like a folder, while the second one looks like it is inside that folder; but as @danimal pointed out, in reality you created 2 keys in the same flat hierarchy, it just looks like what we are used to seeing in a file system.
If you delete the file, you still have the other object, so on the AWS console it looks like the folder is still there but has no files inside.
If you skipped creating the folder and simply uploaded the file like you did, you would still see the folder structure in the AWS console, but you would have a single object at that point.
However, when you list the objects from the command line, you will see a single object, and if you delete it on the console, the folder appears to be gone too.
Files ('objects') in S3 are actually stored by their 'Key' (~folders+filename) in a flat structure in a bucket. If you place slashes (/) in your key then S3 represents this to the user as though it is a marker for a folder structure, but those folders don't actually exist in S3, they are just a convenience for the user and allow for the usual folder navigation familiar from most file systems.
So, as your code stands, although it appears you are putting a file called test_audit_log.txt in a folder called BlueConnect, you are actually just placing an object, representing your file, in the us-east-1-tip-s3-stage bucket with a key of BlueConnect/test_audit_log.txt. In order then to (seem to) put it in a new folder, simply make the key whatever the full path to the file should be, for example:
# upload_fileobj(file, bucket, key)
s3.upload_fileobj(f, "us-east-1-tip-s3-stage", "folder1/folder2/test_audit_log.txt")
In this example, the 'key' of the object is folder1/folder2/test_audit_log.txt, which you can think of as the file test_audit_log.txt inside the folder folder2, which is inside the folder folder1. This is how it will appear on S3, in a folder structure, which will generally be different and separate from your local machine's folder structure.

AWS S3 Listing API - How to list everything inside S3 Bucket with specific prefix

I am trying to list all items with specific prefix in S3 bucket. Here is directory structure that I have:
Item1/
Item2/
Item3/
Item4/
image_1.jpg
Item5/
image_1.jpg
image_2.jpg
When I set the prefix to Item1/Item2, I get the following keys as a result:
Item1/Item2/
Item1/Item2/Item3/Item4/image_1.jpg
Item1/Item2/Item3/Item5/image_1.jpg
Item1/Item2/Item3/Item5/image_2.jpg
What I would like to get is:
Item1/Item2/
Item1/Item2/Item3
Item1/Item2/Item3/Item4
Item1/Item2/Item3/Item5
Item1/Item2/Item3/Item4/image_1.jpg
Item1/Item2/Item3/Item5/image_1.jpg
Item1/Item2/Item3/Item5/image_2.jpg
Is there any way to achieve this in golang?
Folders do not actually exist in Amazon S3. It is a flat object storage system.
For example, using the AWS Command-Line Interface (CLI), I could copy a file to an Amazon S3 bucket:
aws s3 cp foo.txt s3://my-bucket/folder1/folder2/foo.txt
This works just fine, even though folder1 and folder2 do not exist. That is because objects are stored with a Key (filename) that includes the full path of the object. So, the above object actually has a Key (filename) of:
folder1/folder2/foo.txt
However, to make things easier for humans, the Amazon S3 management console makes it appear as though there are folders. In S3, these are called Common Prefixes rather than folders.
So, when you make an API call to list the contents of the bucket while specifying a Prefix, it simply says "List all objects whose Key starts with this string".
Your listing doesn't show any folders because they don't actually exist.
Now, just to contradict myself, it actually is possible to create a folder (e.g. by clicking Create folder in the management console). This actually creates a zero-length object with the same name as the folder. The folder will then appear in listings because it is actually listing the zero-length object rather than the folder.
This is probably why Item1/Item2/ appears in your listing, but Item1/Item2/Item3 does not. Somebody, at some stage, must have "created a folder" called Item1/Item2/, which actually created a zero-length object with that Key.
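The intermediate prefixes the asker wants can be obtained one level at a time via the listing API's Delimiter parameter; the Go SDK exposes the same Prefix/Delimiter options as every other SDK. The helper below mimics what S3 returns when a Delimiter is supplied, written in Python for consistency with the other snippets on this page (names are illustrative):

```python
def list_with_delimiter(keys, prefix='', delimiter='/'):
    """Mimic S3's Prefix + Delimiter listing: keys sharing the next path
    segment are collapsed into "common prefixes" (the folder-like entries),
    while the rest are returned as plain objects."""
    common, contents = set(), []
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter in rest:
            # everything up to and including the next delimiter is one "folder"
            common.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            contents.append(key)
    return sorted(common), contents
```

With the real API you would pass the same parameters to ListObjectsV2 (Prefix="Item1/Item2/", Delimiter="/") and read the CommonPrefixes field of the response; repeating the call for each returned prefix walks the hierarchy one level at a time.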

Remove empty folder in S3 bucket via CLI

I have about 8K folders in an S3 bucket. Some of them are "empty" (do not have objects with their name as a prefix) and some are "not empty".
How can I programmatically detect such "empty" folders in the bucket and remove them?
Yes, I know there is no concept of a folder in a bucket - it just names.
An empty folder in the context of S3 is a zero-sized S3 object whose key ends with your folder separator, typically /, for example images/cats/.
If the applications that use this S3 bucket don't strictly need these folder objects but instead can infer the presence of a folder structure from the presence of file objects e.g. infer the folder images/dogs/ when they see the file images/dogs/terrier.png, then one solution to remove all empty folders would be to simply enumerate all objects that end in / and then delete all of those that are zero-sized. That would remove all folder objects.
If the applications do need these folder objects to remain for non-empty folders, then you'd do something different. For example, enumerate all S3 objects in the bucket, pull out those that represent folders (zero-sized, ending in /) and then see if that same prefix is present in any other, non-folder object.
Also, if you find that enumerating the entire bucket's contents becomes problematic (for example, if you have millions of objects) then you might consider using an S3 Inventory report to drive your process.
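As a sketch of the second approach above, the zero-sized folder objects can be separated from real objects and checked for children. `find_empty_folders` is a hypothetical helper working on (key, size) pairs, and boto3 is only imported when the deletion actually runs, so the detection logic can be tried offline:

```python
def find_empty_folders(objects):
    """objects: iterable of (key, size) pairs for every object in the bucket.
    Returns folder keys (zero-sized, ending in '/') with no real objects
    underneath them."""
    objects = list(objects)  # may be consumed twice below
    folders = {key for key, size in objects if key.endswith('/') and size == 0}
    files = [key for key, size in objects if key not in folders]
    return sorted(f for f in folders
                  if not any(k.startswith(f) for k in files))

def delete_empty_folders(bucket_name):
    import boto3  # imported here so find_empty_folders stays testable offline
    bucket = boto3.resource('s3').Bucket(bucket_name)
    listing = [(o.key, o.size) for o in bucket.objects.all()]
    for key in find_empty_folders(listing):
        bucket.Object(key).delete()
```

For a bucket with millions of objects, the listing step is where an S3 Inventory report would replace the live enumeration.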

How to create a new folder in S3 using AWS PowerShell

I have a bucket already created, but I want to create new folders inside this bucket, not upload data or anything else, just create new folders. How can I do this?
Thanks
AWS S3 doesn't really have a first-class concept of a "folder" or "directory". S3 objects have prefixes, which are segmented by slashes, so there is certainly the appearance of folders, but it is not possible to have a truly empty directory structure.
However, their AWS Console user experience does present content as such, and provides a button to "Create Folder". When using that button, the UI provides the below message:
"When you create a folder, S3 console creates an object with the above name appended by suffix "/" and that object is displayed as a folder in the S3 console."
You could try using PowerShell's Put Object API/cmdlet to create empty objects named per that instruction. For example, you could create a folder named "my-new-folder" by creating an object named "my-new-folder/".
S3 is object storage; it's not a regular file system. Generally speaking, there is no need to attempt to create folders. Simply upload objects, for example teams/east/falcons/logo.png.
If you really, really want to give the impression that there are folders, then you can create zero-sized objects whose names end in / (or whatever your preferred folder delimiter is). The simplest way to do this is with the AWS S3 console, but any SDK will let you do it too (simply issue a PutObject with no body).
I was searching for this myself and found this.
Use -Content (here just set to the key string); then -File or -Folder are not needed.
$s3Path = "/folder/" + 'subfolder/'
Write-S3Object -BucketName $s3Bucket -Key $s3Path -Content $s3Path