Find a specific filename in a specific subdirectory with gsutil ls - google-cloud-platform

I have a Google Cloud Platform Storage bucket named bucket with a top-level-folder. I want to list files with a specific extension that could be located in any sub-directory within the top-level folder. How can I do that?
Basically I am having trouble using the glob pattern twice.
gsutil ls gs://bucket/top-level-folder/*/**<sub-directory>/**/*.<extension>

The way to do it is:
gsutil ls gs://bucket/top-level-folder/**<sub-directory>**.<extension>
This lists every object under top-level-folder whose path contains <sub-directory> and whose name ends in .<extension>, because ** matches any number of characters, including "/".
EDIT:
I changed the code to include the subdirectory on the gsutil command.
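To get a feel for what such a pattern selects without touching a real bucket, here is a minimal local sketch. It is only an imitation: a shell case pattern, where * also matches "/", stands in for gsutil's **. The object names, the sub-directory (logs) and the extension (csv) are made up for illustration.

```shell
#!/bin/sh
# Hypothetical object names, one per line, as `gsutil ls` might print them.
list_objects() {
  cat <<'EOF'
gs://bucket/top-level-folder/a/logs/2020/run.csv
gs://bucket/top-level-folder/a/b/logs/run.txt
gs://bucket/top-level-folder/other/data.csv
gs://bucket/elsewhere/logs/run.csv
EOF
}

# match_objects SUBDIR EXT: print the names that a pattern of the form
# gs://bucket/top-level-folder/**SUBDIR**.EXT would be expected to select.
# In a case pattern, * matches any characters, "/" included, which is
# used here only to imitate gsutil's **.
match_objects() {
  list_objects | while IFS= read -r name; do
    case $name in
      gs://bucket/top-level-folder/*"$1"*."$2") printf '%s\n' "$name" ;;
    esac
  done
}

match_objects logs csv
```

Only the first name is printed: the second has the wrong extension, the third does not contain logs, and the fourth is outside top-level-folder.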

Related

Folder name with date on GCP

I want to create a folder in a GCP bucket with the date as a suffix.
I am trying this:
gsutil mkdir gs://bucket_name/raw/data_"$(date +"%m-%d-%y")"
I also tried this:
dt="$(date +"%m-%d-%y")"
mkdir data_$dt
gsutil cp -r data_$dt gs://bucket_name/raw/
But this gives the error:
CommandException: No URLs matched
Is there any other way?
Folders don't exist in Cloud Storage. The folders shown in the console are only a human-friendly representation.
All blobs are stored at the root of the bucket. An object's name contains the path (what you call the folder) and the file name itself. Thus, if you add a file with a path, you see directories. If you remove it, those directories disappear.
Because of this, you can't filter on a file pattern, only on the path prefix.
So, if you want to do this, the solution is to create a placeholder file:
dt="$(date +"%m-%d-%y")"
mkdir data_$dt
touch data_$dt/placeholder
gsutil cp -r data_$dt gs://bucket_name/raw/
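A variant that skips the local directory might look like the sketch below. It builds the placeholder object name and prints the upload command as a dry run instead of executing it. The helper name placeholder_url is hypothetical. The printed command relies on gsutil cp accepting "-" to read the object content from stdin, so treat it as something to try rather than a guaranteed recipe.

```shell
#!/bin/sh
# placeholder_url BUCKET PREFIX: print the object URL for a placeholder
# inside a "folder" named data_<mm-dd-yy> (date format from the question).
placeholder_url() {
  printf 'gs://%s/%s/data_%s/placeholder\n' "$1" "$2" "$(date +%m-%d-%y)"
}

url=$(placeholder_url bucket_name raw)

# Dry run: print the command instead of executing it.
# Running it would upload an empty object read from stdin ("-").
echo "gsutil cp - $url < /dev/null"
```

Because the object name carries the whole path, the dated "folder" shows up in listings as soon as the placeholder exists.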

How to exclude the particular folder in gcs bucket in google cloud while copying to local machine?

I am trying to copy files and folders from Google Cloud Storage to a VM using the gsutil command, but I need to exclude a few folders in the GCS bucket while copying. I searched for an option but couldn't find one. Please help if anyone knows the command for this.
Thanks in advance,
For this you can use a command like:
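gsutil -m rsync -r -x '^dir3/*' gs://bucket <local-dir>
This retrieves all objects in the bucket except those whose path begins with dir3 (in your example, only the files not located in the dir3 directory are copied). Here <local-dir> stands for the destination directory on the VM, which rsync requires as its second argument. The -x value is a Python regular expression, not a wildcard, so '^dir3/*' also skips names such as dir3_backup; use '^dir3/' to exclude only the dir3 folder.
See the gsutil rsync documentation for more details.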
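Since the exclude pattern is a regular expression, it can help to preview it against a list of paths before running the real sync. The sketch below does that with grep -E on made-up paths. grep's extended syntax is not identical to Python's, but the two agree for simple anchored patterns like these.

```shell
#!/bin/sh
# Hypothetical object paths, relative to the bucket root.
list_paths() {
  cat <<'EOF'
dir1/a.txt
dir2/sub/b.txt
dir3/c.txt
dir3_backup/d.txt
top.txt
EOF
}

# kept_paths REGEX: print the paths that do NOT match REGEX,
# i.e. the ones an exclude pattern would leave in the copy.
kept_paths() {
  list_paths | grep -Ev -- "$1"
}

echo "kept with '^dir3/*':"
kept_paths '^dir3/*'
echo "kept with '^dir3/':"
kept_paths '^dir3/'
```

The first pattern drops dir3_backup/d.txt as well, because the * applies to the slash. The second drops only what is inside dir3/.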

How to create a empty folder in google storage(bucket) using gsutil command?

How can we create a folder using the gsutil command? I am using BashOperator in Airflow, where I need to use the gsutil Bash command. The bucket is already created; I want to create a folder inside the bucket.
I already tried the command below, but it's not working for me.
$ gsutil cp <new_folder> gs://<bucketname>/
I am getting the error: CommandException: No URLs matched: new_folder
Google Storage does not work like a regular file system as in Windows/Linux. It appears to have folders, but in the background it behaves as if it does not. It only lets us create "folders" so we can organize things better, for our own comfort.
If you want to save data in specific folders from gsutil, try this.
gsutil cp [filetocopy] gs://your-bucket/folderyouwant/your-file
It will store the item in a "folder".
See the gsutil cp documentation for more information.
This is the logic behind Google Cloud Storage "folders" (the quoted passage refers to a command such as gsutil cp your-file gs://your-bucket/abc):
gsutil will make a bucket listing request for the named bucket, using
delimiter="/" and prefix="abc". It will then examine the bucket
listing results and determine whether there are objects in the bucket
whose path starts with gs://your-bucket/abc/, to determine whether to
treat the target as an object name or a directory name. In turn this
impacts the name of the object you create: If the above check
indicates there is an "abc" directory you will end up with the object
gs://your-bucket/abc/your-file; otherwise you will end up with the
object gs://your-bucket/abc.
Here you have more interesting information about this if you want.
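The rule quoted above can be imitated locally to see which object name results. This is a simplified sketch, not gsutil's actual code: the function name resolve_destination and the sample object names are made up, and the bucket contents are passed in as plain arguments.

```shell
#!/bin/sh
# resolve_destination DEST FILE OBJECT...
# Prints the object name that copying FILE to DEST would be expected to
# create: DEST/FILE if some existing object name starts with "DEST/",
# otherwise DEST itself.
resolve_destination() {
  dest=$1 file=$2
  shift 2
  for obj in "$@"; do
    case $obj in
      "$dest"/*) printf '%s/%s\n' "$dest" "$file"; return 0 ;;
    esac
  done
  printf '%s\n' "$dest"
}

# An object under abc/ already exists, so abc is treated as a directory.
resolve_destination abc your-file abc/existing.txt other.txt
# Nothing under abc/ exists, so abc becomes the object name.
resolve_destination abc your-file other.txt
```

The first call prints abc/your-file and the second prints abc, which is why the same cp command can give different results depending on what is already in the bucket.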
Apparently the ability to create an empty folder using gsutil has been requested a few times but not yet satisfied. There appear to be some workarounds using the API that can then be scripted. The GitHub issue for the ability to create empty folders through scripting can be found here:
https://github.com/GoogleCloudPlatform/gsutil/issues/388
As far as I have researched and tested, you cannot create an empty folder in GCS, or copy one to it, with gsutil. Yes, it's somewhat inconvenient.
A folder must not be empty to be copied to GCS, and don't forget the "-r" flag, as shown below. Otherwise you will get an error if the folder is empty or if you forgot -r:
gsutil cp -r <non-empty-folder> gs://your-bucket
# "-r" is needed for a folder

Google GSutil create folder

How can you create a new folder inside a bucket in Google Cloud Storage using the gsutil command?
I tried the same command that is used for creating a bucket, but still got an error:
gsutil mb -l us-east1 gs://my-awesome-bucket/new_folder/
Thanks!
The concept of directory is abstract in Google Cloud Storage. From the docs (How Subdirectories Work) :
gsutil provides the illusion of a hierarchical file tree atop the "flat" name space supported by the Google Cloud Storage service. To the service, the object gs://your-bucket/abc/def.txt is just an object that happens to have "/" characters in its name. There is no "abc" directory; just a single object with the given name.
So you cannot "create" a directory like in a traditional File System.
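To make the "illusion" concrete, here is a small local sketch that derives a top-level listing from a flat list of object names, roughly the way a delimiter-based listing would. The function name top_level_entries and the object names are hypothetical, and no bucket is involved.

```shell
#!/bin/sh
# top_level_entries: read object names on stdin and print what a
# delimiter-based listing would show: "name/" for each distinct first
# path component, and the bare name for objects without a "/".
top_level_entries() {
  while IFS= read -r name; do
    case $name in
      */*) printf '%s/\n' "${name%%/*}" ;;
      *)   printf '%s\n' "$name" ;;
    esac
  done | sort -u
}

# Two objects share the abc/ prefix, so one "abc/" entry appears.
printf 'abc/def.txt\nabc/ghi.txt\nreadme.txt\n' | top_level_entries
# Once both abc/ objects are deleted, the "abc/" entry is gone as well.
printf 'readme.txt\n' | top_level_entries
```

The "directory" exists only as long as at least one object name starts with it.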
If you're clear about what folders and objects already exist in the bucket, then you can create a new 'folder' with gsutil by copying an object into the folder.
>mkdir test
>touch test/file1
>gsutil cp -r test gs://my-bucket
Copying file://test\file1 [Content-Type=application/octet-stream]...
/ [1 files][ 0.0 B/ 0.0 B]
Operation completed over 1 objects.
>gsutil ls gs://my-bucket
gs://my-bucket/test/
>gsutil ls gs://my-bucket/test
gs://my-bucket/test/file1
It won't work if the local directory is empty.
More simply:
>touch file2
>gsutil cp file2 gs://my-bucket/new-folder/
Copying file://test\file2 [Content- ...
>gsutil ls gs://my-bucket/new-folder
gs://my-bucket/new-folder/file2
Be aware of the potential for Surprising Destination Subdirectory Naming, e.g. if the target directory already exists as an object. For an automated process, a more robust approach would be to use rsync.
I don't know if it's possible to create an empty folder with gsutil. For that, use the console's Create Folder button.
You cannot create folders with gsutil, as gsutil does not support it (see the workaround below).
However, it is supported via:
the UI in the browser
your own GCS client (we have written our own custom client which can create folders)
So even though Google has a flat name space, as the other answer correctly points out, it is still possible to create single folders as individual objects. Unfortunately gsutil does not expose this.
(Ugly) workaround with gsutil: add a dummy file to a folder and upload this dummy file. The folder will be gone once you delete this file, unless other files are present in that folder.
Copied from Google cloud help:
Copy the object to a folder in the bucket
Use the gsutil cp command to create a folder and copy the image into it:
gsutil cp gs://my-awesome-bucket/kitten.png gs://my-awesome-bucket/just-a-folder/kitten3.png
This works.
You cannot create a folder with gsutil on GCS.
But you can copy an existing folder with gsutil to GCS.
To copy an existing folder to GCS with gsutil, the folder must not be empty and the "-r" flag is needed, as shown below. Otherwise you will get an error if the folder is empty or if you forgot -r:
gsutil cp -r <non-empty-folder> gs://your-bucket
# "-r" is needed for a folder
You cannot create an empty folder with mb.

Rename a folder in GCS using gsutil

I can't rename an existing folder in GCS. How do I do this?
As per the documentation, this should be:
gsutil mv gs://my_bucket/olddir gs://my_bucket/newdir
However, what happens is that olddir is placed under newdir, i.e the directory structure is like this (after the call to gsutil mv):
my_bucket
    newdir
        olddir
instead of (what I would expect)
my_bucket
    newdir
I've tried all four combinations of putting trailing slashes or not, but none of them worked.
This is a confirmed bug in GCS, see https://issuetracker.google.com/issues/112817360
It actually only happens when the name of newdir is a substring of olddir. So the gsutil call from the question actually works, but the following one would not:
gsutil mv gs://my-organization-empty-bucket/dir_old gs://my-organization-empty-bucket/dir
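Assuming the condition described above is accurate, a script could check for it before calling gsutil mv. The sketch below is only that check. The function name rename_is_risky is made up, and it tests plain folder names rather than full gs:// URLs.

```shell
#!/bin/sh
# rename_is_risky OLD NEW: succeed (exit 0) when NEW is a substring of OLD,
# the case in which the answer above says gsutil mv nests the old folder.
rename_is_risky() {
  case $1 in
    *"$2"*) return 0 ;;
    *) return 1 ;;
  esac
}

for pair in "olddir newdir" "dir_old dir"; do
  set -- $pair
  if rename_is_risky "$1" "$2"; then
    echo "$1 -> $2: may be affected"
  else
    echo "$1 -> $2: not affected"
  fi
done
```

For the two examples in this thread, olddir -> newdir is reported as not affected and dir_old -> dir as possibly affected.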
I reproduced your case with a bucket containing a folder named olddir, whose content I wanted to move to a newdir folder.
The following command:
gsutil mv gs://<bucketname>/olddir gs://<bucketname>/newdir
moved the whole content of the folder to the newly created newdir folder.
The olddir and newdir folders were then at the same level, in the bucket root.
After that I just had to remove the folder called olddir.
Objects in a bucket cannot be renamed.
The gsutil mv command does not remove the previous folder object like the mv command would in a Unix CLI.
I guess that if you have tried moving folders several times with the "/" characters placed differently, the structure and hierarchy of the folders will have changed after the initial command.
Please try again from the beginning.
Bear in mind that once you have a subfolder inside a folder, objects will have to be moved one by one using the full path.