Is there a way to copy a Google Cloud Storage object from the SDK Shell to a network drive like Box? - google-cloud-platform

Is there a way to copy a GCS object via SDK Shell to a network drive like Box?
What I've tried is below. Thanks.
gsutil cp gs://your-bucket/some_file.tif C:/Users/Box/01. name/folder
CommandException: Destination URL must name a directory, bucket, or bucket
subdirectory for the multiple source form of the cp command.

The problem is in your destination path:
C:/Users/Box/01. name/folder
There is a space after the period and before 'name', so gsutil treats the path as two separate arguments and falls back to the multiple-source form of cp, which is why it complains that the destination must be a directory. You'll need to either wrap the path in quotes or escape that space, for example:
gsutil cp gs://your-bucket/some_file.tif "C:/Users/Box/01. name/folder"
Looks like you're on Windows; see the documentation on how to escape spaces in file paths.

Related

How to delete objects filtered by extension in Google Cloud Storage

What's the best way to delete Google Cloud Storage objects filtered by extension? I have a bucket that contains folders with millions of .html and .jpg files, and I'd like to automatically delete only the .html files older than 2 years. I want to apply a rule using Object Lifecycle Management.
I have read article1 and article2, which show how to apply a rule to all objects, but I only want to delete the .html files.
I don't believe this can be done with Object Lifecycle Management; it does not have a condition for matching file extensions. I would instead craft a list of files to remove, filtering by creation date with gsutil stat and wildcards, and then remove those files by piping the list into gsutil rm. Something like this should work:
cat filestoremove.txt | gsutil -m rm -I
Per the gsutil rm doc page, the text file has to contain object URLs, one per line, for this to work.
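If you prefer to build filestoremove.txt programmatically rather than with gsutil stat, here is a minimal sketch using the google-cloud-storage Python client; the bucket name and the two-year cutoff are assumptions based on the question:
from datetime import datetime, timedelta, timezone
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("my-bucket")  # assumed bucket name
cutoff = datetime.now(timezone.utc) - timedelta(days=2 * 365)  # roughly two years

# Write gs:// URLs of .html objects older than the cutoff, one per line
with open("filestoremove.txt", "w") as out:
    for blob in bucket.list_blobs():
        if blob.name.endswith(".html") and blob.time_created < cutoff:
            out.write(f"gs://{bucket.name}/{blob.name}\n")
The resulting file can then be fed to the piped gsutil -m rm -I command shown above.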

How to upload a file to S3 without a name

I am reading data from an S3 bucket using Athena, and the data from the following file is correct.
# aws s3 ls --human s3://some_bucket/email_backup/email1/
2020-08-17 07:00:12 0 Bytes
2020-08-17 07:01:29 5.0 GiB email_logs_old1.csv.gz
When I change the path to _updated as shown below, I get an error.
# aws s3 ls --human s3://some_bucket/email_backup_updated/email1/
2020-08-22 12:01:36 5.0 GiB email_logs_old1.csv.gz
2020-08-22 11:41:18 5.0 GiB  
This is because of the extra file without a name in the same location. I have no idea how I managed to upload a file without a name. I would like to know how to reproduce it (so that I can avoid it).
All S3 files have a name (in fact, the full path is the object key, which is the metadata that defines your object's name).
If you see a blank-named file under the path s3://some_bucket/email_backup_updated/email1/, you have likely created a file named s3://some_bucket/email_backup_updated/email1/.
As mentioned earlier, S3 objects are identified by keys, so a real file hierarchy does not exist; you are simply filtering by prefix.
You should be able to validate this by running the listing without the trailing slash: aws s3 ls --human s3://some_bucket/email_backup_updated/email1
If you add an extra non-breaking space at the end of the destination path, the file will be copied to S3 but with a blank name. For example:
aws s3 cp t.txt s3://some_bucket_123/email_backup_updated/email1/ 
(Note the non-breaking space after email1/.)
\xa0 is the non-breaking space in Latin-1, also chr(160). The non-breaking space itself is the name of the file!
Using the same logic, I can remove the "space" file by adding the non-breaking space at the end.
aws s3 rm s3://some_bucket_123/email_backup_updated/email1/ 
I can also log in to the console and remove it from the user interface.
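For reference, here is a small boto3 sketch of the same cleanup; the bucket and prefix come from the example above, and it deletes any key under the prefix that ends in the non-breaking space, which avoids having to type the invisible character on the command line:
import boto3

s3 = boto3.client("s3")
bucket = "some_bucket_123"                        # bucket from the example above
prefix = "email_backup_updated/email1/"           # prefix from the example above

# A single page of results is assumed; use a paginator for large prefixes
resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
for obj in resp.get("Contents", []):
    key = obj["Key"]
    print(repr(key))                              # repr() makes a trailing \xa0 visible
    if key.endswith("\xa0"):
        s3.delete_object(Bucket=bucket, Key=key)  # remove the blank-named object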

How to download a folder containing brackets in its name?

I have many folders in Google Cloud Storage that contain square brackets in the name. gsutil treats square brackets as wildcards, and I am unable to download the project. Can I download the folders another way?
I tried using the escape character and quotes. These do not work.
gsutil cp gs://myBucket/[Projects]Number1 /Volumes/DriveName/Desktop
The goal is to download the files from Google Cloud Storage to my local computer.
gsutil doesn't have a way to escape wildcard characters in file / object names. There's an open issue about this: https://github.com/GoogleCloudPlatform/gsutil/issues/220
Basically, you'll have to use a different tool (or write some code) to handle such files/objects.
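For example, the google-cloud-storage Python client matches prefixes literally (no wildcard expansion), so a sketch like the following can download such a folder; the bucket name, prefix, and destination directory are taken from the question and are assumptions:
import os
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("myBucket")            # bucket name from the question
dest_root = "/Volumes/DriveName/Desktop"          # destination from the question

# list_blobs treats [ and ] literally, unlike gsutil wildcards
for blob in bucket.list_blobs(prefix="[Projects]Number1/"):
    if blob.name.endswith("/"):                   # skip zero-byte "folder" placeholder objects
        continue
    local_path = os.path.join(dest_root, blob.name)
    os.makedirs(os.path.dirname(local_path), exist_ok=True)
    blob.download_to_filename(local_path)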

How to copy only files from many subdirectories under a directory to another project's bucket in GCP?

I have a huge amount of data in my Google Cloud Storage bucket and I have to copy all the files to another project's bucket. The main problem is that in this bucket I created some folders, each with many sub-folders, and all the sub-folders contain data. When I use a normal gsutil copy command, it copies all the data along with the folder structure.
I need help resolving this, because it is taking too much time to copy from one project's bucket to another.
You can use this command to have all the files in the root path.
gsutil cp 'gs://[YOUR_FIRST_BUCKET_NAME]/*' gs://[YOUR_SECOND_BUCKET_NAME]
If you have nested directories inside your bucket, use this command:
gsutil cp -r 'gs://[YOUR_FIRST_BUCKET_NAME]/*' gs://[YOUR_SECOND_BUCKET_NAME]
Note the single quotes around the wildcard; they prevent your shell from expanding the * itself.
You can take a look at the Wildcard Names if you need more advanced features.
You can use the Storage Transfer Service.
It is the second option in the Google Cloud Storage subcategory.
Use the gsutil cp command without the -r option. From the gsutil documentation:
The -R and -r options are synonymous. Causes directories, buckets, and bucket subdirectories to be copied recursively. If you neglect to use this option for an upload, gsutil will copy any files it finds and skip any directories. Similarly, neglecting to specify this option for a download will cause gsutil to copy any objects at the current bucket directory level, and skip any subdirectories.
If I understand correctly, you want to copy all the files from one bucket to another bucket, but you don't want to keep the same hierarchy; instead, you want all the files in the root path.
There is currently no way to do that with gsutil alone, but you can do it with a script. Here is my solution:
from google.cloud import storage

bucketOrigin = storage.Client().get_bucket("<BUCKET_ID_ORIGIN>")
bucketDestination = storage.Client().get_bucket("<BUCKET_ID_DESTINATION>")

for blob in bucketOrigin.list_blobs():
    # Re-upload each object under its base name only, dropping the "folder" prefix
    strfile = blob.download_as_string()
    blobDest = bucketDestination.blob(blob.name[blob.name.rfind("/") + 1:])
    blobDest.upload_from_string(strfile)
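If you would rather not pull every object through the machine running the script, a server-side copy with copy_blob should work as well; this is a sketch under the same assumptions (placeholder bucket IDs, hierarchy flattened by dropping the prefix):
from google.cloud import storage

client = storage.Client()
src = client.get_bucket("<BUCKET_ID_ORIGIN>")        # placeholder bucket IDs, as above
dst = client.get_bucket("<BUCKET_ID_DESTINATION>")

for blob in src.list_blobs():
    # copy_blob copies server-side; new_name keeps only the base name
    src.copy_blob(blob, dst, new_name=blob.name.rsplit("/", 1)[-1])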
As mentioned by Akash Dathan, you can use the Cloud Storage Transfer Service to move your bucket content. I recommend taking a look at the Moving and Renaming Buckets guide, where you can find the steps required to perform this task.
Bear in mind the following requirements:
The Transfer Service's service account must have permission to read from your source and write to your destination.
If you're deleting the source files, the Transfer Service's service account will also need delete access to the source.
If the service account doesn't have these permissions yet, a bucket owner must grant them.
Note: if you have the 'storage.buckets.setIamPolicy' permission on the source and destination buckets, creating a transfer job will grant that service account the required source and destination permissions to complete the transfer.
You can list all the files from your subfolders and get each file name with the split() method, then use the copy() method to copy the file to another bucket. The snippet below drops all subfolder prefixes:
// Inside an async function; srcBucketName, destBucketName and prefix are assumed to be defined elsewhere
const { Storage } = require("@google-cloud/storage");
const storage = new Storage();

const [files] = await storage.bucket(srcBucketName).getFiles();
files.forEach((file) => {
  // Keep only the base name, dropping any "folder" prefix
  const fileName = file.name.split("/").pop();
  if (fileName)
    file.copy(storage.bucket(destBucketName).file(`${prefix}/${fileName}`));
});

Is there any way to get an S3 URI from the AWS web console?

I want to download a directory from my S3 bucket.
When I need a single file, the S3 management console (the AWS web console) lets me download it, but for a directory I have to use the aws-cli, like:
$ aws s3 cp s3://mybucket/mydirectory/ . --recursive
My question is: is there any way to get the S3 URI (s3://mybucket/mydirectory/) from the S3 management console?
Its URL is available, but it is slightly different from the S3 URI required by the aws-cli, and I could not find any menu to get the URI.
Thank you in advance!
No, it is not displayed in the console. However, it is simply:
s3://<bucket-name>/<key>
Directories are actually part of the key. For example, foo.jpg stored in an images directory will actually have a key (filename) of images/foo.jpg.
(self-answer)
Because it seemed there was no such way, I created one:
pip install aws-s3-url2uri
The aws_s3_url2uri command will be available after installation.
This command internally converts web console URLs to S3 URIs, so it works with URLs, URIs, and local paths:
aws_s3_url2uri ls "https://console.aws.amazon.com/s3/home?region=<regionname>#&bucket=mybucket&prefix=mydir/mydir2/"
calls
aws s3 ls s3://mybucket/mydir/mydir2/
internally.
To convert an S3 URL displayed in the console such as https://s3.us-east-2.amazonaws.com/my-bucket-name/filename to an S3 URI, remove the https://s3.us-east-2.amazonaws.com/ portion and replace it with s3://, like so:
s3://my-bucket-name/filename
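If you do this often, a tiny helper can perform that substitution; this is a minimal sketch for path-style URLs like the one above (the helper name is made up, and virtual-hosted-style URLs, where the bucket is part of the hostname, would need extra handling):
from urllib.parse import urlparse

def to_s3_uri(url):
    # e.g. https://s3.us-east-2.amazonaws.com/my-bucket-name/filename -> s3://my-bucket-name/filename
    return "s3://" + urlparse(url).path.lstrip("/")

print(to_s3_uri("https://s3.us-east-2.amazonaws.com/my-bucket-name/filename"))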
It looks like this feature is now available in the AWS Web Console.
It is accessible in two ways:
Selecting the checkbox next to the file and clicking the "Copy S3 URI" button.
Clicking on the file, then clicking the "Copy S3 URI" button at the top right.
You can get the value from the console by selecting the file. Choose Copy path on the Overview tab to copy the s3:// link to the object.
It is possible to get the S3 URI for an actual key/file in the console by selecting the key and clicking the Copy path button; this places the S3 URI for the file on the clipboard.
However, directories are not keys as such, just key prefixes, so this will not work for them.
You may not see an S3 URI if you have only just created a new bucket.
After creating a new folder in your bucket, select the checkbox next to the newly created folder and copy the S3 URI that appears at the top.