Why AWS S3 uses objects and not files & directories

Why does AWS S3 use objects and not files & directories? Is there any specific reason not to have directories/folders in S3?

You are welcome to use directories/folders in Amazon S3. However, please realise that they do not actually exist.
Amazon S3 is not a filesystem. It is an object storage service that is highly scalable, stores trillions of objects and serves millions of objects per second. To meet the demands of such scale, it has been designed as a Key-Value store. The name of the file is the Key and the contents of the file are the Object.
When a file is uploaded to a directory (e.g. cat.jpg is stored in the images directory), it is actually stored with a filename of images/cat.jpg. This makes it appear to be in the images directory, but the reality is that the directory does not exist -- rather, the name of the object includes the full path.
This will not impact your normal usage of Amazon S3. However, it is not possible to rename a directory, because the directory does not exist. Instead, to 'rename' a directory, rename each file within it. For example:
aws s3 mv s3://my-bucket/images/cat.jpg s3://my-bucket/pictures/cat.jpg
This will cause the pictures directory to magically appear, with cat.jpg inside it. There is no need to create the directory first, because it doesn't actually exist -- the user interface merely makes it appear as though there are directories.
Bottom line: Feel free to use directories, but be aware that they do not actually exist and can't be renamed.
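
To make the rename behaviour concrete, here is a minimal boto3 sketch (bucket and prefix names follow the example above) that emulates renaming a directory by copying every object under the old prefix and deleting the originals -- effectively what aws s3 mv does for each object:

import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"      # bucket name from the example above
old_prefix = "images/"    # the "directory" being renamed
new_prefix = "pictures/"

# A paginator handles buckets with more than 1,000 matching objects.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=old_prefix):
    for obj in page.get("Contents", []):
        old_key = obj["Key"]
        new_key = new_prefix + old_key[len(old_prefix):]
        # S3 has no rename operation: copy to the new key, then delete the original.
        s3.copy_object(Bucket=bucket, Key=new_key,
                       CopySource={"Bucket": bucket, "Key": old_key})
        s3.delete_object(Bucket=bucket, Key=old_key)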

Related

Is it feasible to maintain directory structure when backing up to AWS S3 Glacier classes?

I am trying to back up 2 TB from a shared drive on a Windows Server to S3 Glacier.
There are maybe 100 folders (some may be nested) and perhaps 5000 files (some small, like spreadsheets and photos; others larger, like server images). My first question is: what counts as an object here?
Let's say I have Folder 1, which has 10 folders inside it. Each of the 10 folders has 100 files.
Would the number of objects be 1 folder + (10 folders * 100 files) = 1001 objects?
I am trying to understand how folder nesting is treated in S3. Do I have to manually create each folder as a prefix and then upload each file inside that using AWS CLI? I am trying to recreate the shared drive experience on the cloud where I can browse the folders and download the files I need.
Amazon S3 does not actually support folders. It might look like it does, but it actually doesn't.
For example, you could upload an object to invoices/january.txt and the invoices directory will just magically 'appear'. Then, if you deleted that object, the invoices folder would magically 'disappear' (because it never actually existed).
So, feel free to upload objects to any location without creating the directories first.
However, if you click the Create folder button in the Amazon S3 management console, it will create a zero-length object with the name of the directory. This will make the directory 'appear' and it would be counted as an object.
The easiest way to copy the files from your Windows computer to an Amazon S3 bucket would be:
aws s3 sync directoryname s3://bucket-name/ --storage-class DEEP_ARCHIVE
It will upload all files, including files in subdirectories. It will not create the folders, since they aren't necessary; however, the folders will still 'appear' in S3.
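
To answer the object-count question directly: only the files become objects, so the example above yields 1000 objects, not 1001 (unless zero-length folder markers are created via the console). As a sketch, assuming a hypothetical bucket name, you can verify the count with boto3:

import boto3

s3 = boto3.client("s3")
count = 0
# A paginator handles listings of more than 1,000 keys per response.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="bucket-name"):  # hypothetical bucket name
    count += len(page.get("Contents", []))
print("Total objects:", count)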

Move large number of folders and files inside a GCS bucket

I have a bucket on GCP and at the top level of this bucket, I have a bunch of folders.
I want to create a new folder and move all of the other ones into it.
However, I've mounted my bucket with gcsfuse and tried traditional Linux mv commands. This is not allowed, apparently.
Likewise, I have also tried gsutil -m mv gs://mybucket/* gs://mybucket/new_folder/ and have received the command error that wildcards are not allowed in this operation.
What's the best option to get this large number of files moved into a new directory?
Posting this as a Community Wiki answer, based on the comments provided by @JohnHanley.
A few concepts to note for Cloud Storage.
Objects are immutable, which means you cannot rename them. You must copy an object and delete the original to emulate changing its name.
Directories/folders do not exist. The namespace is flat; all objects are in the root directory. The appearance of folders is just part of the object name.
Cloud Storage supports internal object copy. Be careful not to use a feature which first downloads the object and then uploads it.
Considering this information, you will need a tool such as gsutil to copy and delete the objects, which lets you rename and move the files as you would like.
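
As an illustration of the copy-then-delete approach, here is a minimal sketch using the google-cloud-storage Python client (the bucket and folder names follow the question, but are otherwise assumptions):

from google.cloud import storage

client = storage.Client()
bucket = client.bucket("mybucket")  # bucket name from the question
new_prefix = "new_folder/"

# The namespace is flat, so "moving a folder" means rewriting the name of
# every object inside it.
for blob in client.list_blobs("mybucket"):
    if blob.name.startswith(new_prefix):
        continue  # skip objects that have already been moved
    # rename_blob does a server-side copy followed by a delete, so the
    # object data is never downloaded.
    bucket.rename_blob(blob, new_prefix + blob.name)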

Replace content in all files inside s3 bucket

I have an S3 bucket which is mapped to a domain, say xyz.com. Whenever a user registers on xyz.com, a file is created and stored in the S3 bucket. Now I have 1000s of files in S3 and I want to replace some text in those files. All the files have a common name at the start, e.g. abc-{rand}.txt
The safest way of doing this would be to regenerate them again through the same process you originally used.
Personally I would try to avoid find and replace as it could lead to modifying parts that you did not intend.
Run multiple generations in parallel and overwrite the existing files. This will ensure the files you generate match your expectations and will not need to be modified again.
As a suggestion, enable versioning before any of these interactions if you want the ability to roll back quickly in a scenario where the change needs to be reverted.
Sadly, you can't do this in place in S3. You have to download them, change their content and re-upload.
This is because S3 is an object storage system, not a regular file system.
To simplify working with S3 files, you can use the third-party tool s3fs-fuse, which will make S3 appear like a filesystem on your OS.
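
For the download-modify-reupload approach, here is a minimal boto3 sketch; the bucket name and the search/replace strings are hypothetical, and the abc- prefix comes from the question:

import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"   # hypothetical bucket name
prefix = "abc-"        # the common name prefix from the question

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        # Download the object, edit it in memory, and upload it again;
        # S3 offers no way to modify an object in place.
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        updated = body.replace("old-text", "new-text")  # hypothetical strings
        if updated != body:
            s3.put_object(Bucket=bucket, Key=key, Body=updated.encode("utf-8"))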

How to determine the key of the file uploaded on S3?

I have a file uploaded to an S3 bucket in some folder hierarchy.
/a/b/c/file_i_want_to_stream.csv
Now, if this file were at the root level, I would know the key: the file name itself.
However, I am unable to determine the key when it is in some folder.
Amazon S3 does not actually have folders. It is a flat object storage system.
The hierarchy you see is actually part of the filename (Key) of the object.
Therefore, object /a/b/c/file.csv is stored in the root with a name of /a/b/c/file.csv. It simply appears to be in a directory hierarchy called /a/b/c/.
There are also features of Amazon S3 that make this easier to use, such as the concept of a CommonPrefix that is effectively a folder. So, when listing bucket contents, you can ask for a listing of all objects with a CommonPrefix of /a/b/c/.
Bottom line: The Key (filename) includes the path.
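
As a sketch of the CommonPrefix idea (the bucket name is hypothetical), listing with a Delimiter returns both the objects under a/b/c/ and any deeper 'folders':

import boto3

s3 = boto3.client("s3")

# Delimiter makes S3 group keys by the next path segment and report the
# "folders" as CommonPrefixes.
resp = s3.list_objects_v2(Bucket="my-bucket",  # hypothetical bucket name
                          Prefix="a/b/c/",
                          Delimiter="/")
for obj in resp.get("Contents", []):
    print("object key:", obj["Key"])   # e.g. a/b/c/file_i_want_to_stream.csv
for cp in resp.get("CommonPrefixes", []):
    print("subfolder:", cp["Prefix"])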

Replicate local directory in S3 bucket

I have to replicate my local folder structure in an S3 bucket. I am able to do so, but it is not creating the folders that are empty. My local folder structure is as follows, and the command used is:
"aws-exec s3 sync ./inbound s3://msit.xxwmm.supplychain.relex.eeeeeeeeee/
It is only creating inbound/procurement/pending/test.txt; masterdata and transaction are not created, but if I put some file in each directory, it will be created.
As answered by @SabeenMalik in this StackOverflow thread:
S3 doesn't have the concept of directories; the whole folder/file.jpg is the file name. If you are using a GUI tool or similar and you delete file.jpg from inside the folder, you will most probably see that the folder is gone too. The visual representation in terms of directories is for user convenience.
You do not need to pre-create the directory structure. Just pretend that the structure is there and everything will be okay.
Amazon S3 will automatically create the structure as objects are written to paths. For example, creating an object called s3://bucketname/inbound/procurement/foo will automatically create the directories.
(This isn't strictly true because Amazon S3 doesn't use directories, but it will appear that the directories are there.)
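
If the empty directories really must show up, one workaround (a sketch based on the Create folder behaviour described earlier, with the bucket and path taken from the question) is to create a zero-length object for each empty local directory:

import os
import boto3

s3 = boto3.client("s3")
bucket = "msit.xxwmm.supplychain.relex.eeeeeeeeee"  # bucket from the question

# Walk the local tree; for every empty directory, create a zero-length
# object whose key ends in "/", which the console displays as a folder.
for dirpath, dirnames, filenames in os.walk("./inbound"):
    if not dirnames and not filenames:  # an empty leaf directory
        key = os.path.relpath(dirpath, ".").replace(os.sep, "/") + "/"
        s3.put_object(Bucket=bucket, Key=key, Body=b"")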