How to download a folder containing brackets in its name? - google-cloud-platform

I have many folders in Google Cloud Storage that contain square brackets in the name. gsutil treats square brackets as wildcards, so I am unable to download the project. Can I download these folders another way?
I tried using escape characters and quotes, but neither works.
gsutil cp gs://myBucket/[Projects]Number1 /Volumes/DriveName/Desktop
The desired result is to download the files from Google Cloud Storage to my local computer.

gsutil doesn't have a way to escape wildcard characters in file / object names. There's an open issue about this: https://github.com/GoogleCloudPlatform/gsutil/issues/220
Basically, you'll have to use a different tool (or write some code) to handle such files/objects.
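For example, the google-cloud-storage Python client treats object names literally (no wildcard expansion), so a small script can copy everything under a bracketed prefix. A minimal sketch, using the bucket, prefix, and destination path from the question:
import os
from google.cloud import storage

client = storage.Client()
bucket_name = "myBucket"                  # bucket from the question
prefix = "[Projects]Number1/"             # treated as a literal prefix, not a wildcard
dest_root = "/Volumes/DriveName/Desktop"

for blob in client.list_blobs(bucket_name, prefix=prefix):
    if blob.name.endswith("/"):           # skip "folder" placeholder objects
        continue
    local_path = os.path.join(dest_root, blob.name)
    os.makedirs(os.path.dirname(local_path), exist_ok=True)
    blob.download_to_filename(local_path)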

Related

AWS S3 remove special characters from multiple files

I have an S3 bucket with thousands of folders containing millions of files.
The problem is that many file names contain special characters such as commas, #, and £, which result in broken URLs.
Is there any way I can remove specific special characters from all of the files?
I have tried the CLI command aws s3 mv <source_file_name> <new_file_name>, but there too I am unable to access some of the files because of the special characters.
Is there a way, or a script, that could help?
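S3 has no real rename, so one approach is to copy each object to a cleaned-up key and then delete the original. A rough boto3 sketch, assuming a placeholder bucket name and that commas, # and £ are the characters to strip (note that copy_object only handles objects up to 5 GB):
import re
import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"                      # placeholder bucket name
bad_chars = re.compile(r"[,#£]")          # characters to strip; adjust as needed

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket):
    for obj in page.get("Contents", []):
        old_key = obj["Key"]
        new_key = bad_chars.sub("", old_key)
        if new_key != old_key:
            # copy to the cleaned key, then delete the original
            s3.copy_object(Bucket=bucket, Key=new_key,
                           CopySource={"Bucket": bucket, "Key": old_key})
            s3.delete_object(Bucket=bucket, Key=old_key)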

How can I search unknown folders in an S3 bucket? I have millions of objects in my bucket and only want the folder list

I have a bucket with 3 million objects. I don't even know how many folders are in my S3 bucket, or what their names are. I want to show only the list of folders in AWS S3. Is there any way to get a list of all folders?
I would use the AWS CLI for this. To get started, have a look here.
Then it is a matter of almost-standard Linux commands (ls):
aws s3 ls s3://<bucket_name>/path/to/search/folder/ --recursive | grep '/$' > folders.txt
where:
the grep command reads what the aws s3 ls command returned and keeps only the entries ending with /.
the trailing > folders.txt saves the output to a file.
Note: grep (if I'm not wrong) is a Unix-only utility, but I believe you can achieve the same on Windows as well.
Note 2: depending on the number of files, this operation might (will) take a while.
Note 3: in systems like AWS S3, the term 'folder' exists only to give users a visual similarity to standard file systems; internally it is simply part of the object key. You can see this in your (web) console when you filter by "prefix".
Amazon S3 buckets with large quantities of objects are very difficult to use. The API calls that list bucket contents are limited to returning 1000 objects per API call. While it is possible to request 'folders' (by using Delimiter='/' and looking at CommonPrefixes), this would take repeated calls to obtain the hierarchy.
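For reference, the Delimiter/CommonPrefixes approach looks roughly like this in boto3 (placeholder bucket name; it returns only the top level, so you would repeat it with each prefix to walk the hierarchy):
import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"                      # placeholder bucket name

# Each API call returns at most 1000 entries; the paginator issues the repeated calls.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Delimiter="/"):
    for cp in page.get("CommonPrefixes", []):
        print(cp["Prefix"])               # top-level "folders"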
Instead, I would recommend using Amazon S3 Inventory, which can provide a daily or weekly CSV file listing all objects. You can then play with that CSV file from code (or possibly Excel? Might be too big?) to obtain your desired listings.
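As a sketch of the code route, assuming a downloaded inventory file named inventory.csv with the bucket and key as the first two columns (inventory CSV files have no header row):
import pandas as pd

df = pd.read_csv("inventory.csv", header=None, usecols=[0, 1],
                 names=["bucket", "key"])

# Keep keys that sit under a prefix and strip the trailing file name.
folders = df.loc[df["key"].str.contains("/"), "key"].str.rsplit("/", n=1).str[0] + "/"
print(sorted(folders.unique()))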
Just be aware that doing anything on that bucket will not be fast.

Assist With AWS CLI S3 Bucket/Folder File Search

I have several folders and only certain folders contain files with ".003" at the end of their name. These files do not have an extension.
I am interested in finding out:
The name of each containing folder that any of those files ARE in (inside the bucket), ideally listed only once (no duplicates)?
The name of each containing folder that those files are NOT in?
I know how to do a search for a file like so:
aws s3 ls s3://{bucket}/{folder1}/{folder2} --recursive | grep "\.003"
Are there CLI commands that can give me what I am looking for?
If this or something like this has been asked before please point me in the correct direction. My apologies if so! :)
Thank you for your time!
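One way to get both lists is a short boto3 script that makes a single pass over the bucket. A rough sketch, with a placeholder bucket name, where 'folder' simply means everything before the last / in a key:
import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"                      # placeholder bucket name

all_folders, folders_with_003 = set(), set()
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        if "/" not in key:
            continue
        folder = key.rsplit("/", 1)[0] + "/"
        all_folders.add(folder)
        if key.endswith(".003"):
            folders_with_003.add(folder)

print("Folders WITH .003 files:", sorted(folders_with_003))
print("Folders WITHOUT .003 files:", sorted(all_folders - folders_with_003))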

Is there a way to copy a Google Cloud Storage object from the SDK Shell to a network drive like Box?

Is there a way to copy a GCS object via SDK Shell to a network drive like Box?
What I've tried is below. Thanks.
gsutil cp gs://your-bucket/some_file.tif C:/Users/Box/01. name/folder
CommandException: Destination URL must name a directory, bucket, or bucket
subdirectory for the multiple source form of the cp command.
The problem appears to be in your destination path:
C:/Users/Box/01. name/folder
There is a space after the period and before 'name'; you'll need to either wrap the path in quotes (e.g. "C:/Users/Box/01. name/folder") or escape that space. It looks like you're on Windows; here's a doc on how to escape spaces in file paths.

Using regex to download an entire directory using wget

I want to download multiple pdfs from urls such as this - https://dummy.site.com/aabbcc/xyz/2017/09/15/2194812/O7ca217a71ac444eda516d8f78c29091a.pdf
If I run wget on the complete URL, it downloads the file:
wget https://dummy.site.com/aabbcc/xyz/2017/09/15/2194812/O7ca217a71ac444eda516d8f78c29091a.pdf
But if I try to recursively download the entire folder, it returns 403 (forbidden access):
wget -r https://dummy.site.com/aabbcc/xyz/
I have tried setting the user agent, ignoring robots.txt, and a bunch of other solutions from the internet, but I keep coming back to the same point.
So I want to form a list of all possible URLs, treating the given URL as a common pattern, but I have no idea how to do that.
I just know that I can pass that file as input to wget, which will download the files. So I'm seeking help forming the URL list using regex here.
Thank You!
You can't use wildcards to download files you can't see. If the host does not support directory listing, you have no idea what the filenames/paths are. And since you do not know the algorithm used to generate the filenames, you can't generate the URLs and fetch them.