Uploading to Cloud Storage - am I missing something obvious? - google-cloud-platform

I'm trying to find a fast way to upload big folders to Google cloud storage. When I do it via the web browser, it often can't handle the size.
So I've been trying to use the Cloud SDK Shell.
I write
gsutil cp C:\Folder\Sub folder - name gs://bucketname/
I get
No urls matched C:\Folder
Then I put the file name in quotes
gsutil cp C:\"Folder\Sub folder - name" gs://bucketname/
I get told
unrecognised scheme name gs
I've had a couple of friends look at it, and they have no idea. I feel like I've tried so many iterations. Obviously I've missed something super basic? Any thoughts? It's a virtual machine running Windows.
Thanks!

You have to use the -r flag:
The -R and -r options are synonymous. Causes directories, buckets, and bucket subdirectories to be copied recursively. If you neglect to use this option for an upload, gsutil will copy any files it finds and skip any directories. Similarly, neglecting to specify this option for a download will cause gsutil to copy any objects at the current bucket directory level, and skip any subdirectories.
gsutil cp -r C:\Folder\sub-folder-name gs://bucketname/
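If the folder name itself contains spaces, as in the question, quote the whole local path rather than just part of it and keep the -r flag. A sketch using the path from the question (bucketname is the placeholder used above):
gsutil cp -r "C:\Folder\Sub folder - name" gs://bucketname/
Unquoted or partially quoted spaces split the path into several arguments, which is likely why the earlier attempts failed with those errors.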

Related

How to use gsutil rsync: log in and download bucket contents to a local directory

I have the following questions.
I was given access to a cloud bucket via my email ID. Now I want to download the whole bucket folder into a local directory on Ubuntu. I installed gsutil from pip.
Is the command correct?
gsutil rsync gs://bucket_name .
The command seems generic; how do I give my Gmail credentials to it? The data is 1 TB in size and I am allowed to download it only once, so I want to get the command right.
The command will make your current directory mirror the top level of the bucket; add -r if you also want subdirectories, and note that rsync only deletes extra local files if you pass the -d flag. If you merely want to copy, you might want cp -r instead.
The gsutil docs cover how to authenticate when running a standalone gsutil. It looks like you just need to run gsutil config.
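A minimal sketch of the whole flow, assuming a standalone gsutil installed via pip (the -m flag simply parallelizes the transfer):
# one-time authentication; prints a URL to open in a browser and writes a .boto file
gsutil config
# recursively pull the bucket into the current directory
gsutil -m rsync -r gs://bucket_name .
If you prefer a plain copy instead of rsync semantics, gsutil -m cp -r gs://bucket_name . works as well (it creates a bucket_name/ subdirectory).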

Delete files older than 30 days under S3 bucket recursively without deleting folders using PowerShell

I can delete files and exclude folders with the following script:
aws s3 rm s3://my-bucket/ --recursive --exclude="*" --include="*/*.*"
When I tried to add a pipe to delete only the older files, I was unable to. Please help with the script.
aws s3 rm s3://my-bucket/ --recursive --exclude="*" --include="*/*.*" | Where-Object {($_.LastModified -lt (Get-Date).AddDays(-31))}
The approach should be to list the files you need, then pipe the results to a delete call (the reverse of what you have). This might be better managed by a full script rather than a one-line shell command; there are articles and worked examples of this pattern online.
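A minimal sketch of that list-then-delete approach, assuming the AWS Tools for PowerShell module is installed (Get-S3Object / Remove-S3Object) and using the bucket name and 30-day cutoff from the question:
Get-S3Object -BucketName my-bucket |
    Where-Object { $_.Key -notlike "*/" -and $_.LastModified -lt (Get-Date).AddDays(-30) } |
    ForEach-Object { Remove-S3Object -BucketName my-bucket -Key $_.Key -Force }
The first filter skips zero-byte "folder" placeholder keys (those ending in /), so the folders themselves are left alone.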
Going forward, you should let S3 versioning take care of this, then you don't have to manage a script or remember to run it. Note: it'll only work with files that are added after versioning has been enabled.

How do I change the extension of multiple files in Cloud Storage?

How do I change the extension of multiple files in GCP's Cloud Storage?
For example, in the following directory:
gs://[bucket-name]/filesdirectory/
I want to change the files with extension .ipynb to .py.
You can do it using gsutil, which you can install on your local machine or use directly from Cloud Shell.
gsutil's commands are much like the Linux CLI's. You can use the gsutil mv command to achieve this, but since you can't use a wildcard in the destination, you have to do something like this:
IFS=$'\n'
gsutil ls gs://your-bucket/*.ipynb | while read -r x; do gsutil mv "$x" "$(echo "$x" | sed 's/\.ipynb$/.py/')"; done
I'm not a shell expert so probably this can be improved, but here's an explanation:
gsutil ls uses a wildcard to return the files you want to rename
loop through the results, storing each object path in a variable
use gsutil mv plus sed to replace the file extension and rewrite the object under the new name
This is like "rewriting" the files entirely since gcs objects are immutable, so there are probably a few considerations that you should keep in mind, although this might not be your case:
if you have ACLs rules specified for those files, you have to use the
-p flag to pass them on to the new files
these are operations for GCS, implying costs based on your storage class. (since mv is actually copy + delete, if you are on
nearline or coldline, you could have additional early deletion fees)
hope this helps :)
As mentioned in the documentation, you can rename files in your GCS buckets using the Console, the gsutil command, client library code, or the REST APIs.
The gsutil command you should use is the following:
gsutil mv gs://[BUCKET_NAME]/[OLD_OBJECT_NAME] gs://[BUCKET_NAME]/[NEW_OBJECT_NAME]
Furthermore, if you want to change more than one file, I would suggest using a script to do it for each file you need to change, as sketched below.
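A minimal loop sketch along those lines, under the same assumptions as the answer above (the gs://[bucket-name]/filesdirectory/ path is the one from the question), using bash suffix stripping instead of sed:
gsutil ls gs://[bucket-name]/filesdirectory/*.ipynb | while read -r obj; do
  gsutil mv "$obj" "${obj%.ipynb}.py"
done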

How to create an empty folder in Google Storage (bucket) using the gsutil command?

How can we create a folder using the gsutil command? I am using BashOperator in Airflow, where I need to use a gsutil Bash command. The bucket is already created; I want to create a folder inside the bucket.
I already tried the command below but it's not working for me.
$ gsutil cp <new_folder> gs://<bucketname>/
I am getting the error: CommandException: No URLs matched: new_folder
Google Cloud Storage does not work like a regular file system as in Windows/Linux. It appears to have folders, but behind the scenes it does not; it only lets us create "folders" so we can organize things better and browse more comfortably.
If you want to save data into a specific folder with gsutil, try this:
gsutil cp [filetocopy] gs://your-bucket/folderyouwant/your-file
It will store the item in a "folder".
Check the gsutil cp documentation for more information.
This is the logic behind Google Cloud Storage "Folders".
gsutil will make a bucket listing request for the named bucket, using delimiter="/" and prefix="abc". It will then examine the bucket listing results and determine whether there are objects in the bucket whose path starts with gs://your-bucket/abc/, to determine whether to treat the target as an object name or a directory name. In turn this impacts the name of the object you create: if the above check indicates there is an "abc" directory you will end up with the object gs://your-bucket/abc/your-file; otherwise you will end up with the object gs://your-bucket/abc.
The gsutil documentation on how subdirectories work has more information about this if you want.
Apparently the ability to create an empty folder using gsutil is a request that has come up a few times but has not yet been satisfied. There appear to be some workarounds using the API that can then be scripted. The GitHub issue for the ability to create empty folders through scripting can be found here:
https://github.com/GoogleCloudPlatform/gsutil/issues/388
You cannot create or copy an empty folder to GCS with gsutil, as far as I have researched and tried. Yes, it's somewhat inconvenient.
A folder must be non-empty to be created or copied to GCS, and don't forget the -r flag when copying a folder, as shown below; otherwise you will get an error if the folder is empty or the flag is missing:
gsutil cp -r <non-empty-folder> gs://your-bucket
// "-r" is needed for folder

How to copy files from a GCS bucket to my local machine

I need to copy files from Google Cloud Storage to my local machine:
I tried this command in the terminal of a Compute Engine instance:
$sudo gsutil cp -r gs://mirror-bf /var/www/html/mydir
That is my directory on the local machine: /var/www/html/mydir.
I get this error:
CommandException: Destination URL must name a directory, bucket, or bucket
subdirectory for the multiple source form of the cp command.
Where is the mistake?
You must first create the directory /var/www/html/mydir.
Then, you must run the gsutil command on your local machine and not in the Google Cloud Shell. The Cloud Shell runs on a remote machine and can't deal directly with your local directories.
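A minimal sketch of those two steps, run on the local machine, using the bucket and path from the question:
mkdir -p /var/www/html/mydir
gsutil cp -r gs://mirror-bf /var/www/html/mydir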
I have had a similar problem and went through the painful process of figuring it out too, so I thought I would provide my step-by-step solution (under Windows, hopefully similar for Unix users) and hope it helps others:
First of all (as many others have pointed out on various Stack Overflow threads), you have to run a local console (in admin mode) for this to work (i.e. do not use the Cloud Shell terminal).
Here are the steps:
Assuming you already have Python installed on your machine, you will then need to install the gsutil python package using pip from your console:
pip install gsutil
You will then be able to run gsutil config from that same console:
gsutil config
A .boto file needs to be created; it makes sure you have permission to access your Cloud Storage.
Also note that you are now given a URL, which is needed in order to get the authorization code (prompted for in the console).
Open a browser and paste this URL in, then:
Log in to your Google account (i.e. the account linked to your Google Cloud).
Google asks you to confirm that you want to give gsutil access. Click Allow.
You will then be given an authorization code, which you can copy and paste into your console.
Finally you are asked for a project-id:
Get the project ID of interest from your Google Cloud.
To find these IDs, click on the project selector (labelled "My First Project" by default) in the Cloud Console.
You will then be shown a list of all your projects and their IDs.
Paste that ID into your console, hit Enter, and there you are! You have now created your .boto file. This should be all you need to start working with your Cloud Storage.
Console output:
Boto config file "C:\Users\xxxx\.boto" created. If you need to use a proxy to access the Internet please see the instructions in that file.
You will then be able to copy your files and folders from the cloud to your PC using the following gsutil command:
gsutil -m cp -r gs://myCloudFolderOfInterest/ "D:\MyDestinationFolder"
Files from within "myCloudFolderOfInterest" should then get copied to the destination "MyDestinationFolder" (on your local computer).
gsutil -m cp -r gs://bucketname/ "C:\Users\test"
I put a "r" before file path, i.e., r"C:\Users\test" and got the same error. So I removed the "r" and it worked for me.
Try with '.' as in ./var:
$sudo gsutil cp -r gs://mirror-bf ./var/www/html/mydir
Or maybe it's the problem below:
gsutil cp does not support copying special file types such as sockets, device files, named pipes, or any other non-standard files intended to represent an operating system resource. You should not run gsutil cp with sources that include such files (for example, recursively copying the root directory on Linux that includes /dev ). If you do, gsutil cp may fail or hang.
Source: https://cloud.google.com/storage/docs/gsutil/commands/cp
The syntax that worked for me when downloading to a Mac was:
gsutil cp -r gs://bucketname dir Dropbox/directoryname