Rename parent folder in S3 - amazon-web-services

We know that in S3 objects, the folder path is just a part of the object key itself.
Given an object structure similar to the following:
/files/user/09874/01/
/files/user/09875/01/
/files/user/09875/02/
/files/user/09876/01/
/files/user/09876/02/
What kind of operation would you recommend for renaming the parent /files/ to /something/, bearing in mind that there are thousands of files and that the number of requests should be kept to a minimum?
(with the following docs under consideration)

As you said, prefixes are part of the object key itself. Unfortunately, the only way I can think of is to iterate over the list of objects and move each one to the new prefix.
In the CLI:
aws s3 mv --recursive s3://bucket/files/ s3://bucket/something/
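If you want to keep the request count down, note that mv is (as far as I know) a copy plus a delete behind the scenes, since S3 has no server-side rename. Here is a rough boto3 sketch of the same copy-then-delete idea (bucket and prefix names taken from the question): each object still costs one COPY request, but the deletes can be batched up to 1000 keys per DeleteObjects call.

import boto3

s3 = boto3.client("s3")
bucket = "bucket"  # substitute your bucket name

paginator = s3.get_paginator("list_objects_v2")
to_delete = []

for page in paginator.paginate(Bucket=bucket, Prefix="files/"):
    for obj in page.get("Contents", []):
        old_key = obj["Key"]
        new_key = "something/" + old_key[len("files/"):]
        # Server-side copy: the object data never leaves S3
        s3.copy_object(Bucket=bucket,
                       CopySource={"Bucket": bucket, "Key": old_key},
                       Key=new_key)
        to_delete.append({"Key": old_key})

# delete_objects accepts up to 1000 keys per request,
# which keeps the delete-side request count minimal
for i in range(0, len(to_delete), 1000):
    s3.delete_objects(Bucket=bucket, Delete={"Objects": to_delete[i:i + 1000]})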

Related

AWS S3 - Use powershell to delete all files but keep the folders

I have a PowerShell script that downloads all files from an S3 bucket and then removes them from the bucket. All the files I'm removing are stored in a subfolder in the S3 bucket, and I just want to delete the files but maintain the subfolders.
I'm currently using the following command to delete each file in S3 once it has been downloaded:
Remove-S3Object -BucketName $S3Bucket -Key $key -Force
My problem is that if it removes all the files in the subfolder, the subfolder is removed as well. Is there a way to remove the files but keep the subfolder present using PowerShell? I believe I can do something like this,
aws s3 rm s3://<key_to_be_removed> --exclude "<subfolder_key>"
but I'm not quite sure that will work.
I'm looking for the best way to accomplish this; at the moment, my only option is to recreate the subfolder via the script if it no longer exists.
The only way to accomplish having an empty folder is to create a zero-length object which has the same name as the folder you want to keep. This is actually how the S3 console enables you to create an empty folder.
You can check this by running $ aws s3 ls s3://your-bucket/folderfoo/ and observing an object of zero bytes in the output.
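For illustration, here is a minimal boto3 sketch of the same idea (bucket and prefix names are hypothetical): delete everything under the prefix but skip the zero-length placeholder, so the folder keeps showing up in the console. The same Key check would work inside the existing PowerShell loop before calling Remove-S3Object.

import boto3

s3 = boto3.client("s3")
bucket = "your-bucket"  # hypothetical names
prefix = "folderfoo/"

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        # The zero-length object named exactly like the folder is
        # what keeps it visible, so leave that one alone
        if obj["Key"] == prefix:
            continue
        s3.delete_object(Bucket=bucket, Key=obj["Key"])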
As already commented, S3 does not really have folders the way file systems do. The folders presented by most S3 browsers are just generated from the paths of the files/objects. If you upload an object/file named folder/file, the browsers will present folder as a folder containing a file named file. But technically, all that exists is the object folder/file. The folder does not exist on its own.
You can explicitly create a folder by creating an empty object whose name is the folder path with a trailing slash: folder/. If you do that, it will appear that the folder exists even if there are no files in it. But if you do not, the virtual folder disappears once you remove all objects in the folder.
Now the question is whether your command also removes the empty object representing the folder. I cannot say.

AWS S3 Listing API - How to list everything inside S3 Bucket with specific prefix

I am trying to list all items with a specific prefix in an S3 bucket. Here is the directory structure that I have:
Item1/
    Item2/
        Item3/
            Item4/
                image_1.jpg
            Item5/
                image_1.jpg
                image_2.jpg
When I set the prefix to Item1/Item2, I get the following keys as a result:
Item1/Item2/
Item1/Item2/Item3/Item4/image_1.jpg
Item1/Item2/Item3/Item5/image_1.jpg
Item1/Item2/Item3/Item5/image_2.jpg
What I would like to get is:
Item1/Item2/
Item1/Item2/Item3
Item1/Item2/Item3/Item4
Item1/Item2/Item3/Item5
Item1/Item2/Item3/Item4/image_1.jpg
Item1/Item2/Item3/Item5/image_1.jpg
Item1/Item2/Item3/Item5/image_2.jpg
Is there any way to achieve this in golang?
Folders do not actually exist in Amazon S3. It is a flat object storage system.
For example, using the AWS Command-Line Interface (CLI) I could copy a file to an Amazon S3 bucket:
aws s3 cp foo.txt s3://my-bucket/folder1/folder2/foo.txt
This works just fine, even though folder1 and folder2 do not exist. This is because objects are stored with a Key (filename) that includes the full path of the object. So, the above object actually has a Key (filename) of:
folder1/folder2/foo.txt
However, to make things easier for humans, the Amazon S3 management console makes it appear as though there are folders. In S3, these are called Common Prefixes rather than folders.
So, when you make an API call to list the contents of the bucket while specifying a Prefix, it simply says "List all objects whose Key starts with this string".
Your listing doesn't show any folders because they don't actually exist.
Now, just to contradict myself, it actually is possible to create a folder (e.g. by clicking Create folder in the management console). This actually creates a zero-length object with the same name as the folder. The folder will then appear in listings because it is actually listing the zero-length object rather than the folder.
This is probably why Item1/Item2/ appears in your listing, but Item1/Item2/Item3 does not. Somebody, at some stage, must have "created a folder" called Item1/Item2/, which actually created a zero-length object with that Key.
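Since the folders exist only as key prefixes, one way to get the listing the question asks for is to synthesize the intermediate prefixes from the keys themselves. Here is a rough sketch in Python with boto3 (the bucket name is hypothetical; the same logic ports directly to the AWS SDK for Go):

import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"  # hypothetical

# Collect every key under the prefix
keys = []
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix="Item1/Item2"):
    keys += [obj["Key"] for obj in page.get("Contents", [])]

# Derive the virtual folders from the keys
folders = set()
for key in keys:
    parts = key.rstrip("/").split("/")
    for i in range(1, len(parts)):
        folders.add("/".join(parts[:i]) + "/")

for entry in sorted(folders.union(keys)):
    print(entry)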

Rename and Move S3 files based on their folders name in pyspark

I am writing some dataframes to S3 using partitionBy. The folder structure that gets created is as below:
root/
    date=2018-01-01/
    date=2018-01-02/
I want to move these files to another directory in S3 and rename the folders as:
root1/
    20180101/
    20180102/
Is there a way that I can achieve this from pyspark?
Also, I need the files inside the directories to be renamed sequentially, e.g.:
root1/
    20180101/FILE_1.csv
    20180101/FILE_2.csv
You can't directly rename S3 objects.
So one way to achieve this is to copy the objects under the desired names and then delete the originals.
Also, S3 buckets do not have a directory structure; the "directory structure" is just prefixes in the objects' keys.
You have two options: either call the AWS CLI from Python using subprocess, or use the boto3 library to copy all the files from one "directory" to another.
Solution using subprocess:
import subprocess
subprocess.check_call("aws s3 sync s3://bucket/root/date=2018-01-01/ s3://bucket/root1/20180101/".split())
The sync command copies recursively. Then you can remove the originals with aws s3 rm --recursive "somepath", calling it through subprocess again.
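If you would rather stay inside Python, here is a rough boto3 sketch (the bucket name is hypothetical, and it assumes the layout from the question) that moves each date= partition and numbers the files sequentially:

import boto3

s3 = boto3.client("s3")
bucket = "bucket"  # hypothetical

paginator = s3.get_paginator("list_objects_v2")

# Find the date= partitions, then copy each file to the new
# prefix with a sequential name and delete the original
result = s3.list_objects_v2(Bucket=bucket, Prefix="root/", Delimiter="/")
for cp in result.get("CommonPrefixes", []):
    partition = cp["Prefix"]  # e.g. root/date=2018-01-01/
    date_part = partition.split("=")[1].rstrip("/").replace("-", "")
    n = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=partition):
        for obj in page.get("Contents", []):
            n += 1
            new_key = f"root1/{date_part}/FILE_{n}.csv"
            s3.copy_object(Bucket=bucket,
                           CopySource={"Bucket": bucket, "Key": obj["Key"]},
                           Key=new_key)
            s3.delete_object(Bucket=bucket, Key=obj["Key"])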

How to create a new folder in S3 using AWS PowerShell

I have a bucket already created, but I want to create new folders inside this bucket, not upload data or anything else, just create new folders. How can I do this?
Thanks
AWS S3 doesn't really have a first-class concept of a "folder" or "directory". S3 objects have prefixes, which are segmented by slashes, so there is certainly the appearance of folders, but it is not possible to have a truly empty directory structure.
However, the AWS Console user experience does present content as such, and provides a button to "Create Folder". When using that button, the UI displays the following message:
When you create a folder, S3 console creates an object with the above name appended by suffix "/" and that object is displayed as a folder in the S3 console.
You could try using PowerShell's Put Object API/cmdlet to create empty objects named per that instruction. For example, you could create a folder named "my-new-folder" by creating an object named "my-new-folder/".
S3 is object storage; it's not a regular file system. Generally speaking, there is no need to attempt to create folders. Simply upload objects, for example teams/east/falcons/logo.png.
If you really, really want to give the impression that there are folders, then you can create zero-sized objects whose names end in / (or whatever your preferred folder delimiter is). The simplest way to do this is with the AWS S3 console, but any SDK will let you do it too (simply issue a PutObject with no body).
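Outside the console, that PutObject-with-no-body is a one-liner; for example, a minimal boto3 sketch (bucket and key names are hypothetical):

import boto3

s3 = boto3.client("s3")
# A zero-byte object whose key ends in "/" is displayed as a folder
s3.put_object(Bucket="my-bucket", Key="teams/east/falcons/", Body=b"")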
I was searching for this myself and found this.
Use -Content with the key as the content; -File or -Folder are then not needed:
$s3Path = "/folder/" + 'subfolder/'
Write-S3Object -BucketName $s3Bucket -Key $s3Path -Content $s3Path

Replicate local directory in S3 bucket

I have to replicate my local folder structure in an S3 bucket. I am able to do so, but it is not creating the folders which are empty. My local folder structure is as follows, and the command used is:
aws-exec s3 sync ./inbound s3://msit.xxwmm.supplychain.relex.eeeeeeeeee/
It only creates inbound/procurement/pending/test.txt; masterdata and transaction are not created, but if I put some file in each directory it will be created.
As answered by @SabeenMalik in this StackOverflow thread:
S3 doesn't have the concept of directories; the whole folder/file.jpg is the file name. If, using a GUI tool or something, you delete the file.jpg from inside the folder, you will most probably see that the folder is gone too. The visual representation in terms of directories is for user convenience.
You do not need to pre-create the directory structure. Just pretend that the structure is there and everything will be okay.
Amazon S3 will automatically create the structure as objects are written to paths. For example, creating an object called s3://bucketname/inbound/procurement/foo will automatically create the directories.
(This isn't strictly true because Amazon S3 doesn't use directories, but it will appear that the directories are there.)
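If the empty directories matter, one workaround is to create the zero-byte placeholders yourself after the sync. Here is a rough sketch with boto3, using the paths from the question (os.walk finds the directories that ended up empty locally):

import os
import boto3

s3 = boto3.client("s3")
bucket = "msit.xxwmm.supplychain.relex.eeeeeeeeee"
local_root = "./inbound"

# Create a zero-byte placeholder for every local directory that
# contains nothing, so it also shows up in the bucket
for dirpath, dirnames, filenames in os.walk(local_root):
    if not dirnames and not filenames:
        key = os.path.relpath(dirpath, ".").replace(os.sep, "/") + "/"
        s3.put_object(Bucket=bucket, Key=key, Body=b"")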