Trying to download organization data to an external drive

Trying to download organization data to an external drive - google-cloud-platform

I am trying to backup all of our Google Cloud data to an external storage device.
There is a lot of data so I am attempting to download the entire bucket at once and am using the following command to do so, but it halts saying that there isn't enough storage on the device to complete the transfer.
gsutil -m cp -r \
"bucket name" \
.
What do I need to add to this command to download this information to my local D: drive? I have searched through the available docs and have not been able to find the answer.
I used the gsutil command that GCP provided for me automatically, but it seems to be trying to copy the files to a destination without enough storage to hold the needed data.

Remember that you are running the command from the Cloud Shell and not in a local terminal or Windows Command Line. If you inspect the Cloud Shell's file system/structure, it resembles more that of a Unix environment in which you can specify the destination like such instead: ~/bucketfiles/. Even a simple gsutil -m cp -R gs://bucket-name.appspot.com ./ will work since Cloud Shell can identify the ./ directory which is the current directory.
A workaround to this is to perform the command on your Windows Command Line. You would have to install Google Cloud SDK beforehand.
Alternatively, this can also be done in Cloud Shell, albeit with an extra step:
Download the bucket objects by running gsutil -m cp -R gs://bucket-name ~/ which will download it into the home directory in Cloud Shell
Transfer the files downloaded in the ~/ (home) directory from Cloud Shell to the local machine either through the User Interface or by running gcloud alpha cloud-shell scp.

Related

how to use gsutil rsync. login and download bucket contents to a local directory

I have the following questions.
I got access to a cloud bucket to my email id. Now I want to download the whole bucket folder into a local directory on ubuntu. I installed gsutil from pip.
Is the command correct?
gsutil rsync gs://bucket_name .
the command seems generic how do I give my gmail credentials to it? The file is 1TB of size and I am allowed to download only once so I want to get the command right.

The command is correct if you want your current directory to mirror the contents of the bucket (including deleting any files on the right not found on the left). If you merely want to copy, you might want cp -r instead.
Here are the current docs on how to authenticate when running a standalone gsutil. It looks like you just need to run gsutil config.

Google cloud vm stop or freeze when extracting a .7z file

Resently I am working with Google Cloud Compute Engine to train a ml model
So I am tring to extract a .7z fike that has the data.
But it is too big and the machine even freezes or stops for uncatching error
I am using the Linux command below:
!7zr 'path of the file'
Any help to be able extracting the file ... Thanks in advance

You could try it by using GCS
Create a directory that only has the compressed file in it and nothing else,
yourdir/myfile.7z
Create an environment variable MYFILE=myfile.7z
Create a bucket on GCS using the gsutil cli:
gsutil mb gs://yourbucket/MY_DIR_FOR_ZIP_FILE
Next you upload the file to the bucket, like so
gsutil cp -m -v $MYFILE gs://MYBUCKET/MY_DIR_FOR_ZIP_FILE
Within the VM you can now download the file, again using gsutil cli
gsutil cp -m -v gs://MYBUCKET/MY_DIR_FOR_ZIP_FILE /YOU_DIR
Then extract and also remove the compresses file,
7z x $MYFILE && rm -v $MYFILE
You should now have the uncompressed file on the VM
Make sure to use the -m flag this will perform a parallel (multi-threaded/multi-processing) copy.
Here is the reference cp - Copy files and objects
Using the gsutil tool
The instructions above assumes that the size of your data is less than 1TB, and also you are using a VM with a disk size large enough to accomadate the data.
If your data is more than 1TB, you will need to use Transfer service for on-premises data.
The steps to follow when setting up transfer jobs are listed here
Creating a transfer job

How to exclude the particular folder in gcs bucket in google cloud while copying to local machine?

I am trying to copy the files and folders from google cloud storage to vm machine using gsutil command but i need to exclude few of the folders in the gcs bucket while copying to vm, i tried searching for the options but i couldn't find it, please help if anyone knows the command for this.
Thanks in-advance,

For this you can use a command like:
gsutil -m rsync -r -x '^dir3/*' gs://bucket
this should retrieve all objects located on the bucket, except objects beginning with dir3 (files not located in dir3 directory in your example).
Here you can find more details about the rsync command

How to download an entire bucket in GCP?

I have a problem downloading entire folder in GCP. How should I download the whole bucket? I run this code in GCP Shell Environment:
gsutil -m cp -R gs://my-uniquename-bucket ./C:\Users\Myname\Desktop\Bucket
and I get an error message: "CommandException: Destination URL must name a directory, bucket, or bucket subdirectory for the multiple source form of the cp command. CommandException: 7 files/objects could not be transferred."
Could someone please point out the mistake in the code line?

To download an entire bucket You must install google cloud SDK
then run this command
gsutil -m cp -R gs://project-bucket-name path/to/local
where path/to/local is your path of local storage of your machine

The error lies within the destination URL as specified by the error message.
I run this code in GCP Shell Environment
Remember that you are running the command from the Cloud Shell and not in a local terminal or Windows Command Line. Thus, it is throwing that error because it cannot find the path you specified. If you inspect the Cloud Shell's file system/structure, it resembles more that of a Unix environment in which you can specify the destination like such instead: ~/bucketfiles/. Even a simple gsutil -m cp -R gs://bucket-name.appspot.com ./ will work since Cloud Shell can identify the ./ directory which is the current directory.
A workaround to this issue is to perform the command on your Windows Command Line. You would have to install Google Cloud SDK beforehand.
Alternatively, this can also be done in Cloud Shell, albeit with an extra step:
Download the bucket objects by running gsutil -m cp -R gs://bucket-name ~/ which will download it into the home directory in Cloud Shell
Transfer the files downloaded in the ~/ (home) directory from Cloud Shell to the local machine either through the User Interface or by running gcloud alpha cloud-shell scp

Your destination path is invalid:
./C:\Users\Myname\Desktop\Bucket
Change to:
/Users/Myname/Desktop/Bucket
C: is a reserved device name. You cannot specify reserved device names in a relative path. ./C: is not valid.

There is not a one-button solution for downloading a full bucket to your local machine through the Cloud Shell.
The best option for an environment like yours (only using the Cloud Shell interface, without gcloud installed on your local system), is to follow a series of steps:
Downloading the whole bucket on the Cloud Shell environment
Zip the contents of the bucket
Upload the zipped file
Download the file through the browser
Clean up:
Delete the local files (local in the context of the Cloud Shell)
Delete the zipped bucket file
Unzip the bucket locally
This has the advantage of only having to download a single file on your local machine.
This might seem a lot of steps for a non-developer, but it's actually pretty simple:
First, run this on the Cloud Shell:
mkdir /tmp/bucket-contents/
gsutil -m cp -R gs://my-uniquename-bucket /tmp/bucket-contents/
pushd /tmp/bucket-contents/
zip -r /tmp/zipped-bucket.zip .
popd
gsutil cp /tmp/zipped-bucket.zip gs://my-uniquename-bucket/zipped-bucket.zip
Then, download the zipped file through this link: https://storage.cloud.google.com/my-uniquename-bucket/zipped-bucket.zip
Finally, clean up:
rm -rf /tmp/bucket-contents
rm /tmp/zipped-bucket.zip
gsutil rm gs://my-uniquename-bucket/zipped-bucket.zip
After these steps, you'll have a zipped-bucket.zip file in your local system that you can unzip with the tool of your choice.
Note that this might not work if you have too much data in your bucket and the Cloud Shell environment can't store all the data, but you could repeat the same steps on folders instead of buckets to have a manageable size.

How to copy file from bucket GCS to my local machine

I need copy files from Google Cloud Storage to my local machine:
I try this command o terminal of compute engine:
$sudo gsutil cp -r gs://mirror-bf /var/www/html/mydir
That is my directory on local machine /var/www/html/mydir.
i have that error:
CommandException: Destination URL must name a directory, bucket, or bucket
subdirectory for the multiple source form of the cp command.
Where the mistake?

You must first create the directory /var/www/html/mydir.
Then, you must run the gsutil command on your local machine and not in the Google Cloud Shell. The Cloud Shell runs on a remote machine and can't deal directly with your local directories.

I have had a similar problem and went through the painful process of having to figuring it out too, so I thought I would provide my step by step solution (under Windows, hopefully similar for unix users) with snapshots and hope it helps others:
The first thing (as many others have pointed out on various stackoverflow threads), you have to run a local Console (in admin mode) for this to work (ie. do not use the cloud shell terminal).
Here are the steps:
Assuming you already have Python installed on your machine, you will then need to install the gsutil python package using pip from your console:
pip install gsutil
The Console looks like this:
You will then be able to run the gsutil config from that same console:
gsutil config
As you can see from the snapshot bellow, a .boto file needs to be created. It is needed to make sure you have permissions to access your drive.
Also note that you are now provided an URL, which is needed in order to get the authorization code (prompted in the console).
Open a browser and paste this URL in, then:
Log in to your Google account (ie. account linked to your Google Cloud)
Google ask you to confirm you want to give access to GSUTIL. Click Allow:
You will then be given an authorization code, which you can copy and paste to your console:
Finally you are asked for a project-id:
Get the project ID of interest from your Google Cloud.
In order to find these IDs, click on "My First Project" as circled here below:
Then you will be provided a list of all your projects and their ID.
Paste that ID in you console, hit enter and here you are! You now have created your .boto file. This should be all you need to be able to play with your Cloud storage.
Console output:
Boto config file "C:\Users\xxxx\.boto" created. If you need to use a proxy to access the Internet please see the instructions in that file.
You will then be able to copy your files and folders from the cloud to your PC using the following gsutil Command:
gsutil -m cp -r gs://myCloudFolderOfInterest/ "D:\MyDestinationFolder"
Files from within "myCloudFolderOfInterest" should then get copied to the destination "MyDestinationFolder" (on your local computer).

gsutil -m cp -r gs://bucketname/ "C:\Users\test"
I put a "r" before file path, i.e., r"C:\Users\test" and got the same error. So I removed the "r" and it worked for me.

Check with '.' as ./var
$sudo gsutil cp -r gs://mirror-bf ./var/www/html/mydir
or maybe below problem
gsutil cp does not support copying special file types such as sockets, device files, named pipes, or any other non-standard files intended to represent an operating system resource. You should not run gsutil cp with sources that include such files (for example, recursively copying the root directory on Linux that includes /dev ). If you do, gsutil cp may fail or hang.
Source: https://cloud.google.com/storage/docs/gsutil/commands/cp

the syntax that worked for me downloading to a Mac was
gsutil cp -r gs://bucketname dir Dropbox/directoryname

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js