Syncing files between projects/buckets in Google Cloud Storage - google-cloud-platform

I am trying to synchronize files between two projects and two buckets on Google Cloud.
However, I would like to copy only the files that are in A (source) but not in B (destination). Overwriting files that exist in both A and B is fine (preferred, in fact).
When I do the following:
In my bucket, I create a folder test and add a folder A containing file-1.
I run the following command: gsutil cp -r gs://from-project.appspot.com/test gs://to-project.appspot.com/test2
This works fine, and I have the folder A within the folder test2 in my to-project bucket.
Then the problem occurs:
I add a folder B, and within folder A I delete file-1 and add file-2 (to test the case of a file that is in A but not in B).
When I run the same command again, however, I do not get the expected result (only file-2 copied, plus the new folder B). Instead, a new folder named test appears inside test2, and inside it I find A and B but without file-1 in A (basically a replica of the new situation).
Why does this happen, and how can I prevent it so that syncing works?

The gsutil rsync command is the preferred way to synchronize the contents of two buckets.
You can use the -d option to delete files in the destination bucket that are not found in the source bucket. Be careful with it, though, because it deletes data from the destination bucket.
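For reference, a minimal sketch using the bucket paths from the question (add the -d flag only if you really want deletions mirrored to the destination):
# copy only new or changed objects from the source prefix to the destination prefix
gsutil rsync -r gs://from-project.appspot.com/test gs://to-project.appspot.com/test2
Unlike cp, rsync compares source and destination and transfers only the differences, so re-running it should not create a nested test folder inside test2.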

Related

tar folder in s3 bucket?

Let's say I have a folder on s3:
s3://tmp/folder1
With several folders within. I would like this to now be:
s3://tmp/folder1.tar.gz
in which the contents of folder1 have been tar.gz'd. However, from what I can find, the only way to do this would be to:
Either download folder1 to a local directory or cp/mv to an existing ec2 instance,
run tar czvf folder1.tar.gz folder1
Reupload to s3://tmp
Is there a way to do this without having to move/download folder1? In other words, is there an amazon cli command / set of commands to do this without the download / moving?
No.
Amazon S3 does not provide the ability to manipulate the contents of objects.
You would need to copy the data somewhere, run the tar command, then upload it.
Think of it like asking a Hard Disk to tar/zip a file without a computer attached. It doesn't know how to do that.
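If you do go the copy-locally route, a rough sketch of the round trip (using the paths from the question) could look like this:
# download the folder, archive it locally, then upload the archive back to the bucket
aws s3 cp s3://tmp/folder1 ./folder1 --recursive
tar czvf folder1.tar.gz folder1
aws s3 cp folder1.tar.gz s3://tmp/folder1.tar.gz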

How to prevent the remove of a certain folder in a Google Cloud Storage Bucket?

I'm working on deploying some files to a bucket in Google Cloud via GitLab CI. The command that I'm using to remove the objects is gsutil -m rm gs://pr.homefront.com
That command removes everything from the bucket, but I would like to change it so that it still removes everything except a folder named "ibw" inside the bucket every time it runs.
You can add temporary holds to the objects within a folder; these holds prevent the objects from being modified or deleted.
You can apply the holds via the following gsutil command:
gsutil -m retention temp set gs://bucketname/ibw/*******
(each asterisk represents a folder level)
You can set the holds before the rm command and release them afterwards. The objects cannot be modified or removed until the hold is released.
UPDATE:
It is possible to add holds to empty folders.
I created an empty folder via the Cloud Console.
I ran the following commands to avoid removing the empty folder:
# this blocks the empty folder; if the folder contains files, all of them will be blocked
gsutil -m retention temp set gs://bucketname/ibw/*******
# this operation can't remove objects or empty folders that have holds
gsutil -m rm gs://bucketname
# remove the hold on the empty folder; if the folder contains files, all of them will be released
gsutil -m retention temp release gs://bucketname/ibw/*******

How to create a empty folder in google storage(bucket) using gsutil command?

How can we create a folder using the gsutil command? I am using BashOperator in Airflow, where I need to run gsutil as a Bash command. The bucket is already created; I want to create a folder inside it.
I already tried the command below, but it's not working for me.
$ gsutil cp <new_folder> gs://<bucketname>/
I am getting this error: CommandException: No URLs matched: new_folder
Google Cloud Storage does not work like a regular file system as in Windows/Linux. It appears to have folders, but behind the scenes it does not behave like one. It only lets us create "folders" so we can organize objects better, for our own convenience.
If you want to save data into a specific folder with gsutil, try this:
gsutil cp [filetocopy] gs://your-bucket/folderyouwant/your-file
It will store the item in a "folder".
Check this link for more gsutil cp information.
This is the logic behind Google Cloud Storage "Folders".
gsutil will make a bucket listing request for the named bucket, using delimiter="/" and prefix="abc". It will then examine the bucket listing results and determine whether there are objects in the bucket whose path starts with gs://your-bucket/abc/, to determine whether to treat the target as an object name or a directory name. In turn this impacts the name of the object you create: if the above check indicates there is an "abc" directory you will end up with the object gs://your-bucket/abc/your-file; otherwise you will end up with the object gs://your-bucket/abc.
You can find more interesting information about this here, if you want.
Apparently the ability to create an empty folder using gsutil has been requested a few times but is not yet supported. There appear to be some workarounds using the API that can then be scripted. The GitHub issue for the ability to create empty folders through scripting can be found here:
https://github.com/GoogleCloudPlatform/gsutil/issues/388
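As a rough sketch of such an API workaround (this is an assumption on my part, not something from the thread): the JSON API lets you upload a zero-byte object whose name ends in a slash, which the Console then displays as an empty folder. Replace <bucketname> and new_folder/ with your own values:
# create a zero-byte object named "new_folder/" via the JSON API so an empty "folder" appears
curl -X POST --data-binary "" \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://storage.googleapis.com/upload/storage/v1/b/<bucketname>/o?uploadType=media&name=new_folder/"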
You cannot create or copy an empty folder to GCS with gsutil, as far as I have researched and tried. Yes, it's somewhat inconvenient.
A folder must not be empty to be created or copied to GCS, and don't forget the "-r" flag when copying a folder, as shown below; otherwise you will get an error if the folder is empty or the flag is missing:
gsutil cp -r <non-empty-folder> gs://your-bucket
// "-r" is needed for folder

move files within source folder to destination folder without deleting source folder

I am trying to move subfolders from one directory of an S3 bucket to another directory in the same bucket. After the files within a subfolder are moved, the main directory gets deleted, which must not happen in my case.
aws s3 mv s3://Bucket-Name/Input-List/$i/ s3://Bucket-Name/Input-List-Archive/$i/ --recursive
COLLECTION_LIST=(A B C D E F)
for i in "${COLLECTION_LIST[@]}"
do
  if [ "$i" == "A" -o "$i" == "B" ]
  then
    aws s3 mv s3://Bucket-Name/Input-List/$i/ s3://Bucket-Name/Input-List-Archive/$i/ --recursive
  else
    aws s3 mv s3://Bucket-Name/Input-List/Others/$i/ s3://Bucket-Name/Input-List-Archive/Others/$i/ --recursive
  fi
done
Here all files within Input-List must be moved to Input-List-Archive without the Input-List directory itself being deleted.
How about writing a script that copies the files recursively from the subfolders and then deletes them from the subfolders, instead of using the mv command?
Firstly, please note that directories/folders do not actually exist in Amazon S3.
For example, I could run this command:
aws s3 cp foo.txt s3://my-bucket/folder1/folder2/foo.txt
This will work successfully, even if folder1 and folder2 do not exist.
The Amazon S3 management console will make those folders 'appear', but they do not actually exist.
If I then ran:
aws s3 rm s3://my-bucket/folder1/folder2/foo.txt
then the object would be deleted and the folders would 'disappear' (because they never actually existed).
Sometimes, however, people want a folder to appear. When a folder is created in the management console, a zero-length object is created with the Key (filename) set to the name of the folder. This will force an empty 'folder' to appear, but it is not actually a folder.
When listing objects in S3, API calls can return a common prefix which is similar in concept to a folder, but it is really just the "path portion" of a filename.
It is also worth mentioning that there is no "move" command in Amazon S3. Instead, when using the aws s3 mv command, the AWS CLI copies the object to a new object and then deletes the original object. This makes the object look like it was moved, but it actually was copied and deleted.
So, your options are:
Don't worry about folders. Just pretend they exist. They do not serve any purpose. OR
Create a new folder after the move. OR
Write your own program to Copy & Delete the objects without deleting the folder.
In fact, it is quite possible that the folder never existed in the first place (that is, there was no zero-length file with a Key matching the name of the folder), so it was never actually deleted. It's just that there was nothing to cause S3 to make the folder 'appear' to be there.
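If you do want the empty "folder" to reappear after the move (option 2 above), one possible sketch, using the bucket and prefix names from the question, is to recreate the zero-length marker object with s3api put-object:
# move the contents, then recreate the zero-length "folder" marker object
aws s3 mv s3://Bucket-Name/Input-List/A/ s3://Bucket-Name/Input-List-Archive/A/ --recursive
aws s3api put-object --bucket Bucket-Name --key Input-List/A/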

Google GSutil create folder

How can you create a new folder inside a bucket in Google Cloud Storage using the gsutil command?
I tried using the same command as for creating a bucket, but I still got an error:
gsutil mb -l us-east1 gs://my-awesome-bucket/new_folder/
Thanks!
The concept of directory is abstract in Google Cloud Storage. From the docs (How Subdirectories Work) :
gsutil provides the illusion of a hierarchical file tree atop the "flat" name space supported by the Google Cloud Storage service. To the service, the object gs://your-bucket/abc/def.txt is just an object that happens to have "/" characters in its name. There is no "abc" directory; just a single object with the given name.
So you cannot "create" a directory like in a traditional File System.
If you're clear about what folders and objects already exist in the bucket, then you can create a new 'folder' with gsutil by copying an object into the folder.
>mkdir test
>touch test/file1
>gsutil cp -r test gs://my-bucket
Copying file://test\file1 [Content-Type=application/octet-stream]...
/ [1 files][ 0.0 B/ 0.0 B]
Operation completed over 1 objects.
>gsutil ls gs://my-bucket
gs://my-bucket/test/
>gsutil ls gs://my-bucket/test
gs://my-bucket/test/file1
It won't work if the local directory is empty.
More simply:
>touch file2
>gsutil cp file2 gs://my-bucket/new-folder/
Copying file://test\file2 [Content- ...
>gsutil ls gs://my-bucket/new-folder
gs://my-bucket/new-folder/file2
Be aware of the potential for surprising destination subdirectory naming, e.g. if the target directory already exists as an object. For an automated process, a more robust approach is to use rsync.
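For example, a sketch of the rsync alternative (local-dir and the bucket path here are just illustrative names):
# make gs://my-bucket/new-folder mirror the contents of ./local-dir, whether or not it already exists
gsutil rsync -r ./local-dir gs://my-bucket/new-folder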
I don't know if it's possible to create an empty folder with gsutil. For that, use the console's Create Folder button.
You cannot create folders with gsutil, as gsutil does not support it (see the workaround below).
However, it is supported via:
the UI in the browser
writing your own GCS client (we have written our own custom client which can create folders)
So even though Google Cloud Storage has a flat namespace, as the other answer correctly points out, it is still possible to create single folders as individual objects. Unfortunately, gsutil does not expose this.
(Ugly) workaround with gsutil: put a dummy file into a folder and upload it. The folder will be gone once you delete this file, unless other files in that folder are present.
Copied from Google cloud help:
Copy the object to a folder in the bucket
Use the gsutil cp command to create a folder and copy the image into it:
gsutil cp gs://my-awesome-bucket/kitten.png gs://my-awesome-bucket/just-a-folder/kitten3.png
This works.
You cannot create a folder with gsutil on GCS.
But you can copy an existing folder with gsutil to GCS.
To copy an existing folder to GCS with gsutil, the folder must not be empty and the "-r" flag is needed, as shown below; otherwise you will get an error if the folder is empty or the flag is missing:
gsutil cp -r <non-empty-folder> gs://your-bucket
// "-r" is needed for folder
You cannot create an empty folder with mb.