Read/Write data to/from Cloud Storage Bucket using gcsfuse - google-cloud-platform

I want to mount a Google Cloud Storage bucket onto a directory on a local machine for processing.
I am using a Manjaro environment and installed gcsfuse manually.
In gs://bucket01 there are directories containing jpg and json files.
go get -u github.com/googlecloudplatform/gcsfuse
$GOPATH/src/github.com/googlecloudplatform/gcsfuse
GOOGLE_APPLICATION_CREDENTIALS=/run/media/manjaro/gcp/key.json gcsfuse bucket01 /run/media/manjaro/gcp/bucket01
Using mount point: /run/media/manjaro/gcp/bucket01
Opening GCS connection...
Mounting file system...
File system has been successfully mounted.
cd /run/media/manjaro/gcp/bucket01
ls
# empty
# The expected outcome is data from gs://bucket01 populates /run/media/manjaro/gcp/bucket01
# Updates in /run/media/manjaro/gcp/bucket01 will also be seen in gs://bucket01
Am I using gcsfuse correctly?

Please try using Implicit directories
As mentioned above, by default there is no allowance for the implicit
existence of directories. Since the usual file system operations like
mkdir will do the right thing, if you set up a bucket's structure
using only gcsfuse then you will not notice anything odd about this.
If, however, you use some other tool to set up objects in GCS (such as
the storage browser in the Google Developers Console), you may notice
that not all objects are visible until you create leading directories
for them.
gcsfuse supports a flag called --implicit-dirs that changes the behavior.
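A hedged sketch of remounting with that flag, reusing the mount point and key file from the question (unmount first if the bucket is still mounted):
fusermount -u /run/media/manjaro/gcp/bucket01
GOOGLE_APPLICATION_CREDENTIALS=/run/media/manjaro/gcp/key.json gcsfuse --implicit-dirs bucket01 /run/media/manjaro/gcp/bucket01
# the existing directories of jpg and json files should now be visible under the mount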

Related

GCP bucket reachable in UI but not by gcsfuse in the cloud shell

Hi, I want to reach some files in a GCP bucket from the Cloud Shell terminal (for SFTP reasons). gcsfuse successfully mounts the parent directory and it has all the directories except the one I need. Any ideas what I am doing wrong?
In Google Cloud Storage, object names ending in a slash (/) represent a directory, and all other object names represent a file. By default, directories are not implicitly defined; they exist only if a matching object ending in a slash (/) exists.
Since the usual file system operations like mkdir will do the right thing, if someone set up a bucket's structure using only gcsfuse then they will not notice anything odd about this. However, if someone uses some other tool to set up objects in Google Cloud Storage (such as the storage browser in the Google Cloud Console), they may notice that not all objects are visible until they create leading directories for them.
For example, let's say someone uploaded an object demo/start.txt by choosing the folder upload option in the storage browser section in Google Cloud Console, then mounted it with gcsfuse. The file system will initially appear empty, since there is no demo/ object. However if they subsequently run mkdir demo, they will now see a directory named demo containing a file named start.txt.
To mitigate this issue gcsfuse supports a flag called --implicit-dirs. When this flag is enabled, name lookup requests use the Google Cloud Storage API's Objects.list operation to search for objects that would implicitly define the existence of a directory with the name in question. So, in the example above, a directory named demo containing a file start.txt would appear.
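As a hedged sanity check from the object-storage side, gsutil will list the object even while a mount without --implicit-dirs shows nothing (BUCKET_NAME is a placeholder; demo/start.txt is the example from above):
gsutil ls gs://BUCKET_NAME/demo/
# lists demo/start.txt even though no demo/ placeholder object exists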
So in your case, I suspect the file you are not able to see is a folder which you uploaded to the Google Cloud Storage bucket. As you have already mounted the bucket with gcsfuse, mounting it again using the flag --implicit-dirs will throw an error. So I would suggest unmounting the directory by running the following command:
fusermount -u /path/to/mount/directory
Then mount the directory again by running the following command:
gcsfuse --implicit-dirs BUCKET_NAME /path/to/mount/directory
You can also create a new directory and mount that directory with gcsfuse without unmounting the existing mounted directory.
Please note that the flag --implicit-dirs has some drawbacks. I would recommend going through this GitHub issue to get detailed information about it.

cannot open path of the current working directory: Permission denied when trying to run rsync command twice in gcloud

I am trying to copy the data from the file stores to the google cloud bucket.
This is the command I am using:
gsutil rsync -r /fileserver/demo/dir1 gs://corp-bucket
corp-bucket: Name of my bucket
/fileserver/demo/dir1: Mount point directory (This directory contain the data of the file store)
This command works fine the first time: it copies the data from the directory /fileserver/demo/dir1 to the cloud bucket. But when I delete the data from the cloud bucket and run the same command again without any changes, I get this error:
cannot open path of the current working directory: Permission denied
NOTE: If I make even a small change to a file in /fileserver/demo/dir1 and run the above command, it works fine again. My question is why it does not work without any changes, and is there any way to copy the files without making any changes?
Thanks.
You may be hitting limitation #2 of rsync: "The gsutil rsync command considers only the live object version in the source and destination buckets". You can exclude /dir1... with the -x pattern and still let rsync do the cleanup work as part of the regular sync process.
Another way to copy those files would be to use cp with the -r option to copy recursively, instead of rsync.
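A minimal sketch of that cp alternative, assuming the same source directory and bucket from the question:
gsutil cp -r /fileserver/demo/dir1 gs://corp-bucket
# copies everything again, regardless of what rsync would consider unchanged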

How to create an empty folder in Google Storage (bucket) using the gsutil command?

How can we create a folder using the gsutil command? I am using BashOperator in Airflow, where I need to use a gsutil Bash command. The bucket is already created; I want to create a folder inside the bucket.
I already tried the command below, but it's not working for me.
$ gsutil cp <new_folder> gs://<bucketname>/
I am getting error - CommandException: No URLs matched: new_folder
Google Cloud Storage does not work like a regular file system as in Windows/Linux. It appears to have folders, but in the background it behaves as if it does not. It only allows us to create "folders" so we can organize objects better and for our own convenience.
If you want to save data into specific folders with gsutil, try this:
gsutil cp [filetocopy] gs://your-bucket/folderyouwant/your-file
It will store the item in a "folder".
Check this link for more gsutil cp information.
This is the logic behind Google Cloud Storage "Folders".
gsutil will make a bucket listing request for the named bucket, using
delimiter="/" and prefix="abc". It will then examine the bucket
listing results and determine whether there are objects in the bucket
whose path starts with gs://your-bucket/abc/, to determine whether to
treat the target as an object name or a directory name. In turn this
impacts the name of the object you create: If the above check
indicates there is an "abc" directory you will end up with the object
gs://your-bucket/abc/your-file; otherwise you will end up with the
object gs://your-bucket/abc.
Here you have more interesting information about this if you want.
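As a hedged illustration of that naming logic (your-bucket, your-file and abc are the placeholders from the quote above):
gsutil cp your-file gs://your-bucket/abc
# if no objects start with abc/, this creates the object gs://your-bucket/abc
gsutil cp your-file gs://your-bucket/abc/
# the trailing slash makes gsutil treat abc as a directory, so this creates gs://your-bucket/abc/your-file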
Apparently the ability to create an empty folder using gsutil is a request that has been seen a few times but not yet satisfied. There appear to be some workarounds using the API that can then be scripted. The GitHub issue for the ability to create empty folders through scripting can be found here:
https://github.com/GoogleCloudPlatform/gsutil/issues/388
You cannot create or copy an empty folder to GCS with gsutil, as far as I have researched and tried. Yes, it's somewhat inconvenient.
A folder must not be empty to be created or copied to GCS, and don't forget the -r flag when creating or copying a folder to GCS, as shown below; otherwise you will get an error if the folder is empty or you forget the -r flag:
gsutil cp -r <non-empty-folder> gs://your-bucket
// "-r" is needed for folder

Can I provide AWS credentials via mounted directory to local Docker container built by sbt-native-packager

We have some docker images we build with sbt-native-packager that need to interact with AWS services. When running them outside of AWS, we need to explicitly provide credentials.
I know we can explicitly pass environment variables containing the AWS credentials. Doing this complicates keeping our credentials secret. One option is to provide them via the command line, typically storing them into our shell history (yes I know this can be avoided by adding a space to the start of the command, but that is easy to forget) and putting them at higher risk of accidental copy/paste sharing. Alternatively, we can provide them via an env-file. But this exposes us to possibly checking them into version control or pushing them to another server unintentionally.
We've found that the ideal practice is to mount our local ~/.aws/ directory into the running user's home directory for the docker container. However, our attempts at getting this to work with the sbt-native-packager images have been unsuccessful.
One unique detail of sbt-native-packager images (compared to our others) is that they are built using Docker's ENTRYPOINT instead of CMD to start the application. I don't know if this has a bearing on the problem.
So the question: Is it possible to provide AWS credentials to a docker container created by sbt-native-packager by mounting the AWS credentials folder via command line parameters at startup?
The problem I was running into was related to permissions. The .aws files have very restricted access on my machine, and the default user within the sbt-native-packager image is daemon. This user does not have access to read my files when mounted into the container.
I am able to obtain the behavior I desire by adding the following flags to my docker run command: -v ~/.aws/:/root/.aws/ --user=root
I was able to discover this by using the --entrypoint=ash flag when running the container, looking at the HOME environment variable (the location to mount the /.aws/ folder into) and attempting to cat the contents of the mounted folder.
Now I just need to understand what security vulnerabilities I'm opening myself up to by running docker containers in this way.
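Putting those flags together, a hedged sketch of the full command (my-sbt-app is a hypothetical image name, not from the question):
docker run --user=root -v ~/.aws/:/root/.aws/ my-sbt-app
# --user=root lets the container read the restricted ~/.aws files mounted into root's home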
I'm not entirely sure why mounting ~/.aws would be a problem - typically it could be related to read permissions on that directory and the different UID between the host system and the container.
That said, I can suggest a couple of workarounds:
Use an environment variable file instead of explicitly specifying the credentials on the command line. In docker run, you can do this by specifying --env-file. To me this sounds like the simplest approach.
Mount a different credentials file and provide the AWS_CONFIG_FILE environment variable to specify its location.
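A hedged sketch of both workarounds (aws.env, /opt/aws/config and my-sbt-app are hypothetical names):
# workaround 1: keep the credentials in an env file that stays out of version control
docker run --env-file aws.env my-sbt-app
# workaround 2: mount a config file elsewhere and point the SDK at it
docker run -v ~/.aws/config:/opt/aws/config -e AWS_CONFIG_FILE=/opt/aws/config my-sbt-app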

How to copy file from bucket GCS to my local machine

I need to copy files from Google Cloud Storage to my local machine.
I tried this command on the terminal of a Compute Engine instance:
$sudo gsutil cp -r gs://mirror-bf /var/www/html/mydir
That is my directory on the local machine: /var/www/html/mydir.
I get this error:
CommandException: Destination URL must name a directory, bucket, or bucket
subdirectory for the multiple source form of the cp command.
Where is the mistake?
You must first create the directory /var/www/html/mydir.
Then, you must run the gsutil command on your local machine and not in the Google Cloud Shell. The Cloud Shell runs on a remote machine and can't deal directly with your local directories.
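A minimal sketch of those two steps, run on the local machine with the paths from the question:
mkdir -p /var/www/html/mydir
gsutil cp -r gs://mirror-bf /var/www/html/mydir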
I have had a similar problem and went through the painful process of having to figure it out too, so I thought I would provide my step-by-step solution (under Windows, hopefully similar for Unix users) and hope it helps others.
The first thing (as many others have pointed out in various Stack Overflow threads): you have to run a local console (in admin mode) for this to work (i.e. do not use the Cloud Shell terminal).
Here are the steps:
Assuming you already have Python installed on your machine, you will then need to install the gsutil python package using pip from your console:
pip install gsutil
You will then be able to run the gsutil config from that same console:
gsutil config
A .boto file needs to be created. It is needed to make sure you have permission to access your storage.
Also note that you are now provided a URL, which is needed in order to get the authorization code (prompted in the console).
Open a browser and paste this URL in, then:
Log in to your Google account (i.e. the account linked to your Google Cloud).
Google asks you to confirm that you want to give access to gsutil. Click Allow.
You will then be given an authorization code, which you can copy and paste to your console:
Finally you are asked for a project-id:
Get the project ID of interest from your Google Cloud.
In order to find these IDs, click on "My First Project" in the Google Cloud Console.
You will then see a list of all your projects and their IDs.
Paste the ID of interest into your console, hit Enter, and here you are! You have now created your .boto file. This should be all you need to be able to work with your Cloud Storage.
Console output:
Boto config file "C:\Users\xxxx\.boto" created. If you need to use a proxy to access the Internet please see the instructions in that file.
You will then be able to copy your files and folders from the cloud to your PC using the following gsutil Command:
gsutil -m cp -r gs://myCloudFolderOfInterest/ "D:\MyDestinationFolder"
Files from within "myCloudFolderOfInterest" should then get copied to the destination "MyDestinationFolder" (on your local computer).
gsutil -m cp -r gs://bucketname/ "C:\Users\test"
I put a "r" before file path, i.e., r"C:\Users\test" and got the same error. So I removed the "r" and it worked for me.
Try with '.' as in ./var:
$sudo gsutil cp -r gs://mirror-bf ./var/www/html/mydir
Or maybe the problem is the one below:
gsutil cp does not support copying special file types such as sockets, device files, named pipes, or any other non-standard files intended to represent an operating system resource. You should not run gsutil cp with sources that include such files (for example, recursively copying the root directory on Linux that includes /dev ). If you do, gsutil cp may fail or hang.
Source: https://cloud.google.com/storage/docs/gsutil/commands/cp
The syntax that worked for me when downloading to a Mac was:
gsutil cp -r gs://bucketname dir Dropbox/directoryname