I'm trying to use PowerShell on Windows 10 to download a small .gz file from an S3 bucket using the aws s3 cp command.
I am piping the output of s3 cp into gzip -d to decompress it. My aim is to copy, decompress, and display the contents without saving the .gz file locally.
From reading the official Amazon documentation for the s3 cp command, the following is mentioned:
https://docs.aws.amazon.com/cli/latest/reference/s3/cp.html
Downloading an S3 object as a local file stream
WARNING: PowerShell may alter the encoding of or add a CRLF to piped or redirected output.
Here is the command I'm executing from PowerShell:
PS C:\> aws s3 cp s3://my-bucket/test.txt.gz - | gzip -d
Which returns the following error: gzip: stdin: not in gzip format
I just can't seem to get it working with PowerShell, though. From a Windows Command Prompt, the same command works fine:
C:\Windows\system32>aws s3 cp s3://my-bucket/test.txt.gz - | gzip -d
With some sample test data output as follows:
first_name last_name
---------- ----------
Ellerey Place
Cherie Glantz
Isaak Grazier
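A couple of workarounds worth trying (a sketch, assuming gzip is on the PATH and the same bucket and key as above). Either hand the whole pipeline to cmd.exe, which passes piped bytes through untouched:
cmd /c "aws s3 cp s3://my-bucket/test.txt.gz - | gzip -d"
or skip the pipe entirely and go through a temporary file:
aws s3 cp s3://my-bucket/test.txt.gz "$env:TEMP\test.txt.gz"
gzip -dc "$env:TEMP\test.txt.gz"
Remove-Item "$env:TEMP\test.txt.gz"
Newer PowerShell versions (7.4 and later) pass raw bytes between native commands, so the original one-liner may work there unchanged.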
I have a list of files in an AWS S3 bucket, but when I execute the aws s3 cp command it gives me an error saying "Unknown options".
My list:
s3://<bucket>/cms/imagepool/5f84dc7234bf5.jpg
s3://<bucket>/cms/imagepool/5f84daa19b7df.jpg
s3://<bucket>/cms/imagepool/5f84dcb12f9c5.jpg
s3://<bucket>/cms/imagepool/5f84dcbf25d4e.jpg
My bash script is below:
#!/bin/bash
while read line
do
aws s3 cp "${line}" ./
done <../links.txt
This is the error I get:
Unknown options: s3:///cms/imagepool/5f84daa19b7df.jpg
Does anybody know how to solve this issue?
It turns out the solution below worked (I had to include the --no-cli-auto-prompt flag):
#!/bin/bash
while read line
do
aws s3 cp --no-cli-auto-prompt "${line}" ./
done <../links.txt
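If the flag alone doesn't do it, a variant worth trying (a sketch, assuming ../links.txt might have Windows CRLF line endings) strips any trailing carriage return before calling the CLI:
#!/bin/bash
while IFS= read -r line
do
    # trim a trailing CR in case links.txt was saved with CRLF line endings
    line="${line%$'\r'}"
    aws s3 cp --no-cli-auto-prompt "${line}" ./
done < ../links.txt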
I want to get the number of records in a gzipped file that is in an S3 bucket. Could you please tell me the fastest way to get the result?
I am running the command below, but it is not working. Please correct me if I am doing anything wrong.
aws s3 cp s3://itx-agu-lake/raw/vs-1/load-1619/data/phd_admsrc.txt.gz - | wc -l
The above command gives me 0, but the actual count is 24.
You need to decompress the .gz file:
aws s3 cp s3://bucket/object.gz - | zcat | wc -l
This copies the S3 object to stdout, sends it through zcat to decompress, then sends the output to wc -l to count the lines.
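If zcat isn't available (on macOS, for instance, zcat expects .Z files), gzip -dc is the more portable spelling of the same pipeline:
aws s3 cp s3://bucket/object.gz - | gzip -dc | wc -l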
aws s3 cp s3://bucketname/path/to/file/filename.csv.gz . --content-encoding gzip
I'm just trying to download a compressed CSV file from a bucket that we don't control but have permission to read. I ran the above and the file downloads, but it is not usable.
How can I download a viable file?
The object in question still needs to be decompressed.
Try streaming the object to stdout and decompressing it on the way out instead: aws s3 cp s3://bucketname/path/to/file/filename.csv.gz - | gzip -d > filename.csv (the --content-encoding option sets object metadata when uploading, so it won't decompress the download for you).
You can download the file as it is. It will be downloaded as a .csv file but with compressed content. So you can rename the file to a .gz file and then decompress it. That will solve the problem.
If you are using terminal commands and the downloaded file name is x.csv:
mv x.csv x.gz
gzip -d x.gz
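If the downloaded file still isn't readable after decompressing, it can be worth checking the object's metadata first; a sketch with aws s3api head-object (using the same hypothetical bucket and key as above) shows whether a Content-Encoding or Content-Type header explains what the CLI is handing you:
aws s3api head-object --bucket bucketname --key path/to/file/filename.csv.gz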
I am trying to download a folder that is inside my Google Cloud bucket. I read the Google docs for gsutil cp (gsutil/commands/cp) and executed the line below.
gsutil cp -r appengine.googleapis.com gs://my-bucket
But I am getting the error:
CommandException: No URLs matched: appengine.googleapis.com
Edit:
When running the command below
gsutil cp -r gs://logsnotimelimit .
I am getting this error:
IOError: [Errno 22] invalid mode ('ab') or filename: u'.\logsnotimelimit\appengine.googleapis.com\nginx.request\2018\03\14\14:00:00_14:59:59_S0.json_.gstmp'
What is the appengine.googleapis.com parameter in your command? Is that a local directory on your filesystem you are trying to copy to the cloud bucket?
The gsutil cp -r appengine.googleapis.com gs://my-bucket command you provided will copy a local directory named appengine.googleapis.com recursively to your cloud bucket named my-bucket. If that's not what you are doing, you need to construct your command differently.
I.e., to download a directory named folder from your cloud bucket named my-bucket into the current location, try running:
gsutil cp -r gs://my-bucket/folder .
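It can also help to first confirm that the source path really exists in the bucket, e.g. with gsutil ls (a sketch, using the same hypothetical names):
gsutil ls gs://my-bucket/folder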
Update: Since it appears that you're using a Windows machine (the "\" directory separators instead of "/" in the error message) and the filenames contain the ":" character, the cp command will fail when creating those files, producing the error message you're seeing.
Just wanted to help people out if they run into this problem on Windows. As administrator:
Open C:\Program Files (x86)\Google\Cloud SDK\google-cloud-sdk\platform\gsutil\gslib\utils
Delete copy_helper.pyc
Change the permissions for copy_helper.py to allow writing
Open copy_helper.py
Go to the function _GetDownloadFile
On line 2312 (at time of writing), change the following line
download_file_name = _GetDownloadTempFileName(dst_url)
to (for example; the goal is just to remove the colons):
download_file_name = _GetDownloadTempFileName(dst_url).replace(':', '-')
Go to the function _ValidateAndCompleteDownload
On line 3184 (at time of writing), change the following line
final_file_name = dst_url.object_name
to (for example; the goal is just to remove the colons):
final_file_name = dst_url.object_name.replace(':', '-')
Save the file, and rerun the gsutil command
FYI, I was using the command gsutil -m cp -r gs://my-bucket/* . to download all my logs, whose names by default contain ":", which does not bode well for Windows filenames!
Hope this helps someone. I know it's a somewhat hacky solution, but seeing as you should never have colons in Windows filenames anyway, it's fine to do and forget. Just remember that if you update the Google Cloud SDK you'll have to redo this.
I got the same issue and resolved it as below.
Open a Cloud Shell, and copy the objects using the gsutil command:
gsutil -m cp -r gs://[some bucket]/[object] .
In the shell, zip those objects using the zip command:
zip [some file name].zip -r [some name of your specific folder]
In the shell, copy the zip file into GCS using the gsutil command:
gsutil cp [some file name].zip gs://[some bucket]
On a Windows Command Prompt, copy the zip file from GCS using the gsutil command:
gsutil cp gs://[some bucket]/[some file name].zip .
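Put together, the whole round trip might look like this (a sketch with made-up names; my-bucket, folder, and folder.zip are placeholders). In Cloud Shell:
gsutil -m cp -r gs://my-bucket/folder .
zip -r folder.zip folder
gsutil cp folder.zip gs://my-bucket/
Then, from the Windows Command Prompt:
gsutil cp gs://my-bucket/folder.zip .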
I hope this information helps someone.
This is also gsutil's way of saying file not found. The mention of URL is just confusing in the context of local files.
Be careful: in this command the file path is case-sensitive, so check whether it is a capitalization issue.
I'm trying to send the output of an 'echo' to an S3 file. Similar to how we can do something like echo 'Hello World' > file.txt, I'm doing
aws s3 cp s3://dirname/dirsubfolder/file.txt > echo 'Hello World'. However, I get Key "file.txt" does not exist. I know the file doesn't exist; I want to create it from that output. Is there a way to do this?
This feature was added in 2014, but I could not find it in the CLI help docs.
echo "Hello World" | aws s3 cp - s3://example-bucket/hello.txt
You can't redirect command output into aws s3 cp the way you've written it, just like you can't redirect text like that into the standard cp command. Also, the command you are trying:
aws s3 cp s3://dirname/dirsubfolder/file.txt > echo 'Hello World'
actually redirects the output of the aws s3 cp command into a local file named echo (passing 'Hello World' as an extra argument), which is the opposite of what you say you are trying to do.
You're going to need to script this in a couple of steps, like:
echo 'Hello World' > /tmp/file.txt
aws s3 cp /tmp/file.txt s3://dirname/dirsubfolder/file.txt
rm /tmp/file.txt