move files within source folder to destination folder without deleting source folder - amazon-web-services

I am trying to move sub-folders from one directory of an S3 bucket to another directory in the same bucket. After the files within a sub-folder are moved, the parent directory gets deleted, which must not happen in my case.
aws s3 mv s3://Bucket-Name/Input-List/$i/ s3://Bucket-Name/Input-List-Archive/$i/ --recursive
COLLECTION_LIST=(A B C D E F)
for i in "${COLLECTION_LIST[@]}"
do
  # A and B live directly under Input-List; the other folders live under Input-List/Others
  if [ "$i" = "A" ] || [ "$i" = "B" ]
  then
    aws s3 mv s3://Bucket-Name/Input-List/"$i"/ s3://Bucket-Name/Input-List-Archive/"$i"/ --recursive
  else
    aws s3 mv s3://Bucket-Name/Input-List/Others/"$i"/ s3://Bucket-Name/Input-List-Archive/Others/"$i"/ --recursive
  fi
done
Here, all files within Input-List must be moved to Input-List-Archive without the Input-List directory itself being deleted.

How about writing a script that copies the files recursively from the sub-folders and then deletes those files, instead of using the mv command?
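For example, a minimal sketch of that idea, reusing the bucket and folder names from the question (untested):
# copy everything under the sub-folder, then delete the copied objects
aws s3 cp s3://Bucket-Name/Input-List/"$i"/ s3://Bucket-Name/Input-List-Archive/"$i"/ --recursive
aws s3 rm s3://Bucket-Name/Input-List/"$i"/ --recursive
Note that, as the answer below explains, this ends up behaving just like aws s3 mv unless a zero-length "folder" object exists for Input-List.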

Firstly, please note that directories/folders do not actually exist in Amazon S3.
For example, I could run this command:
aws s3 cp foo.txt s3://my-bucket/folder1/folder2/foo.txt
This will work successfully, even if folder1 and folder2 do not exist.
The Amazon S3 management console will make those folders 'appear', but they do not actually exist.
If I then ran:
aws s3 rm s3://my-bucket/folder1/folder2/foo.txt
then the object would be deleted and the folders would 'disappear' (because they never actually existed).
Sometimes, however, people want a folder to appear. When a folder is created in the management console, a zero-length object is created with the Key (filename) set to the name of the folder. This will force an empty 'folder' to appear, but it is not actually a folder.
When listing objects in S3, API calls can return a common prefix which is similar in concept to a folder, but it is really just the "path portion" of a filename.
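For example, with the my-bucket/folder1 names used above, a listing with a delimiter surfaces such prefixes (output shape abbreviated):
aws s3api list-objects-v2 --bucket my-bucket --prefix folder1/ --delimiter /
# the response contains the objects directly under folder1/ plus a
# CommonPrefixes entry such as folder1/folder2/ for deeper "paths"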
It is also worth mentioning that there is no "move" command in Amazon S3. Instead, when using the aws s3 mv command, the AWS CLI copies the object to a new object and then deletes the original object. This makes the object look like it was moved, but it actually was copied and deleted.
So, your options are:
Don't worry about folders. Just pretend they exist. They do not serve any purpose. OR
Create a new folder after the move. OR
Write your own program to Copy & Delete the objects without deleting the folder.
In fact, it is quite possible that the folder never existed in the first place (that is, there was no zero-length file with a Key matching the name of the folder), so it was never actually deleted. It's just that there was nothing to cause S3 to make the folder 'appear' to be there.
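If you go with the "create a new folder after the move" option, one hedged sketch (reusing the names from the question; the trailing slash in the key is what makes the console show a folder) is:
# move the contents as before
aws s3 mv s3://Bucket-Name/Input-List/"$i"/ s3://Bucket-Name/Input-List-Archive/"$i"/ --recursive
# then put back a zero-length object whose key ends in "/" so that the
# Input-List folder keeps 'appearing' in the console
aws s3api put-object --bucket Bucket-Name --key "Input-List/"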

Related

Syncing files between projects/buckets in Google Cloud Storage

I am trying to synchronize files between two projects and two buckets on Google Cloud.
However, I would like to only copy files that are in A (source) but not in B (destination). It is fine to overwrite files that are in both A and B (that is actually preferred).
When I do the following:
In my bucket, I create a folder test and add a folder A with file-1 inside it.
I run the following command: gsutil cp -r gs://from-project.appspot.com/test gs://to-project.appspot.com/test2
This works fine, and I have the folder A within the folder test2 in my to-project bucket.
Then the problem occurs:
I add a folder B, and within folder A I delete file-1 and add file-2 (to test the notion of a file that is in A but not in B).
When I run the same command, however, I do not end up with just file-2 copied and an additional folder B. Instead I get a new folder named test inside test2, in which I find A and B but without file-1 in A (basically a replica of the new situation).
Why does this happen, and how can I prevent it so that the syncing works?
The gsutil rsync command is the preferred way to synchronize the content of two buckets.
You can use the -d option to delete files under your destination bucket that are not found under the source bucket. Be careful with it, though, because it can delete files in the destination bucket.
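For example, a minimal sketch with the bucket names from the question (not verified against your setup):
# recursively synchronize test from the source bucket into test2
gsutil rsync -r gs://from-project.appspot.com/test gs://to-project.appspot.com/test2
# add -d only if files missing under test should also be deleted under test2
# gsutil rsync -r -d gs://from-project.appspot.com/test gs://to-project.appspot.com/test2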

Google GSutil create folder

How can you create a new folder inside a bucket in Google Cloud Storage using the gsutil command?
I tried using the same command that creates a bucket, but I still got an error:
gsutil mb -l us-east1 gs://my-awesome-bucket/new_folder/
Thanks!
The concept of a directory is abstract in Google Cloud Storage. From the docs (How Subdirectories Work):
gsutil provides the illusion of a hierarchical file tree atop the "flat" name space supported by the Google Cloud Storage service. To the service, the object gs://your-bucket/abc/def.txt is just an object that happens to have "/" characters in its name. There is no "abc" directory; just a single object with the given name.
So you cannot "create" a directory like in a traditional File System.
If you're clear about what folders and objects already exist in the bucket, then you can create a new 'folder' with gsutil by copying an object into the folder.
>mkdir test
>touch test/file1
>gsutil cp -r test gs://my-bucket
Copying file://test\file1 [Content-Type=application/octet-stream]...
/ [1 files][ 0.0 B/ 0.0 B]
Operation completed over 1 objects.
>gsutil ls gs://my-bucket
gs://my-bucket/test/
>gsutil ls gs://my-bucket/test
gs://my-bucket/test/file1
It won't work if the local directory is empty.
More simply:
>touch file2
>gsutil cp file2 gs://my-bucket/new-folder/
Copying file://test\file2 [Content- ...
>gsutil ls gs://my-bucket/new-folder
gs://my-bucket/new-folder/file2
Be aware of the potential for surprising destination subdirectory naming, e.g. if the target directory already exists as an object. For an automated process, a more robust approach would be to use gsutil rsync.
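For instance, a sketch with the same local test directory and bucket as above (an assumption, not a command verified against your data):
# mirror the local test directory into gs://my-bucket/test, avoiding the
# destination-naming surprises that cp -r can produce
gsutil rsync -r test gs://my-bucket/test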
I don't know if it's possible to create an empty folder with gsutil. For that, use the console's Create Folder button.
You cannot create folders with gsutil, as gsutil does not support it (see the workaround below).
However, it is supported via:
the UI in the browser
writing your own GCS client (we have written our own custom client which can create folders)
So even though GCS has a flat namespace, as the other answer correctly points out, it is still possible to create single folders as individual objects. Unfortunately, gsutil does not expose this.
(Ugly) workaround with gsutil: add a dummy file to a folder and upload that dummy file - but the folder will be gone once you delete the file, unless other files in that folder are present.
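A minimal sketch of that workaround (the file and folder names are placeholders):
# upload a throw-away placeholder so the "folder" shows up in listings
touch placeholder.txt
gsutil cp placeholder.txt gs://my-bucket/wanted-folder/placeholder.txt
# deleting placeholder.txt later makes wanted-folder/ disappear again,
# unless other objects still live under that prefix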
Copied from Google cloud help:
Copy the object to a folder in the bucket
Use the gsutil cp command to create a folder and copy the image into it:
gsutil cp gs://my-awesome-bucket/kitten.png gs://my-awesome-bucket/just-a-folder/kitten3.png
This works.
You cannot create a folder with gsutil on GCS.
But you can copy an existing folder with gsutil to GCS.
To copy an existing folder with gsutil to GCS, the folder must not be empty and the -r flag is needed, as shown below; otherwise you will get an error if the folder is empty or if you forget the -r flag:
gsutil cp -r <non-empty-folder> gs://your-bucket
// "-r" is needed for folder
You cannot create an empty folder with mb.

Rename a folder in GCS using gsutil

I can't rename an existing folder in GCS. How do I do this?
As per the documentation, this should be:
gsutil mv gs://my_bucket/olddir gs://my_bucket/newdir
However, what happens is that olddir is placed under newdir, i.e. the directory structure is like this (after the call to gsutil mv):
my_bucket
    newdir
        olddir
instead of (what I would expect)
my_bucket
    newdir
I've tried all four combinations of putting trailing slashes or not, but none of them worked.
This is a confirmed bug in GCS, see https://issuetracker.google.com/issues/112817360
It actually only happens when the name of newdir is a substring of the name of olddir. So the gsutil call from the question actually works, but the following one would not:
gsutil mv gs://my-organization-empty-bucket/dir_old gs://my-organization-empty-bucket/dir
I reproduced your case with a bucket containing a folder named olddir, whose content I wanted to move to a newdir folder.
The following command:
gsutil mv gs://<bucketname>/olddir gs://<bucketname>/newdir
moved the whole content of the folder to the newly created newdir folder.
The olddir and newdir folders were then at the same level, in the bucket root.
After that I just had to remove the folder called olddir.
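A sketch of those two steps with the names from the question (assuming olddir has no nested sub-folders and that a zero-length olddir/ placeholder object exists; untested):
# move the objects directly under olddir into newdir
gsutil -m mv gs://my_bucket/olddir/* gs://my_bucket/newdir/
# then remove the leftover zero-length olddir/ placeholder, if there is one
gsutil rm gs://my_bucket/olddir/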
Objects in a bucket cannot be renamed.
The gsutil mv command does not remove the previous folder object like the mv command would do in a Unix CLI.
I guess that if you have tried moving folders several times by using "/" characters placed differently, the structure and hierarchy of the folders will have changed after issuing the initial command.
Please try again from the beginning.
Bear in mind that once you have a subfolder inside a folder, objects will have to be moved one by one using the full path.

AWS S3 cp creates undefined files

While using the AWS CLI cp command to copy files recursively, there appears to be a bug which creates some "undefined" files.
aws s3 cp --recursive $HOME/$MYHOST-$MYTIMESTAMP/$MYHOST-$MYTIMESTAMP-*.xml s3://mybucket/$MYHOST-$MYTIMESTAMP/
The program works fine and uploads to the specified bucket. But it also creates some "undefined" files outside the destination folder, at the root of the bucket. This happens all the time, and I have to rm (delete) those annoying "undefined" files.
I presumed it to be a bug and then tried uploading the files individually rather than using wildcards, with the same result as the recursive copy: it still creates additional "undefined" files at the root of the bucket. This happens only when I run a bunch of the same cp commands in a bash script, and in that case the problem shows up intermittently.
aws s3 cp $HOME/$MYHOST-$MYTIMESTAMP/$MYHOST-$MYTIMESTAMP-hello.xml s3://mybucket/$MYHOST-$MYTIMESTAMP/
However, when copying only a single file, the problem doesn't show up.
My CLI version:
aws-cli/1.14.34 Python/2.7.14+ Linux/4.4.104-39-default botocore/1.8.38
Any help would be highly appreciated on this.
You have configured S3 access logging to write logs into this bucket. Presumably, these are the log files for this bucket.
Why the filenames begin with "undefined" is not clear -- something may have gone wrong when you set up logging for the bucket so that the log file prefix did not get saved -- but the filenames look like the names of the log files that S3 creates.
https://docs.aws.amazon.com/AmazonS3/latest/dev/ServerLogs.html
Best practice is to set up a separate bucket for collecting S3 access logs in each region.
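To check whether that is what is happening, you can inspect the bucket's logging configuration and, if needed, point it at a dedicated log bucket with an explicit prefix (bucket names below are placeholders):
# show the current access-logging configuration of the bucket
aws s3api get-bucket-logging --bucket mybucket
# redirect the logs to a dedicated bucket with an explicit prefix
# (the target bucket must grant the S3 log delivery group write access)
aws s3api put-bucket-logging --bucket mybucket --bucket-logging-status \
  '{"LoggingEnabled":{"TargetBucket":"mybucket-logs","TargetPrefix":"access-logs/"}}'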

AWS S3: How to delete all contents of a directory in a bucket but not the directory itself?

I have an AWS S3 bucket named static.mysite.com
This bucket contains a directory called html
I want to use the AWS Command Line Interface to remove all contents of the html directory, but not the directory itself. How can I do it?
This command deletes the directory too:
aws s3 rm s3://static.mysite.com/html/ --recursive
I don't see the answer to this question in the manual entry for AWS S3 rm.
Old question, but I didn't see the answer here. If you have a use case where you want to keep the 'folder' prefix but delete all the files, you can use --exclude with an empty match string. I found that the --exclude "." and --exclude ".." options do not prevent the folder from being deleted. Use this:
aws s3 rm s3://static.mysite.com/html/ --recursive --exclude ""
I just want to confirm how the folders were created...
If you created the "subA" folder manually and then deleted the suba1 folder, you should find that the "subA" folder remains. When you create a folder manually, you are actually creating a folder "object", which is similar to any other file/object that you upload to S3.
However, if a file was uploaded directly to a location in S3 (when the "subA" and "suba1" folders don't exist yet), you'll find that the "subA" and "suba1" folders are created automatically. You can do this using something like the AWS CLI tool, e.g.:
aws s3 cp file1.txt s3://bucket/subA/suba1/file1.txt
If you now delete file1.txt, there will no longer be any objects within the "subA" folder and you'll find that the "subA" and "suba1" folders no longer exist.
If another file (file2.txt) was uploaded to the path "bucket/subA/file2.txt", and you deleted file1.txt (from the previous example) you'll find that the "subA" folder remains and the "suba1" folder disappears.
https://forums.aws.amazon.com/thread.jspa?threadID=219733
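A hypothetical sequence illustrating that behaviour (bucket and file names are placeholders; untested):
# uploading a file creates the 'folders' implicitly
aws s3 cp file1.txt s3://bucket/subA/suba1/file1.txt
# a second file directly under subA/
aws s3 cp file2.txt s3://bucket/subA/file2.txt
# deleting file1.txt makes suba1/ disappear, while subA/ remains
# because file2.txt still sits under it
aws s3 rm s3://bucket/subA/suba1/file1.txt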
aws s3 rm s3://static.mysite.com/html/ --recursive --exclude ""
This command worked for me to delete all the files but not the folder.