Downloading s3 bucket to local directory but files not copying? - amazon-web-services

There are many, many examples of how to download a directory of files from an s3 bucket to a local directory.
aws s3 cp s3://<bucket>/<directory> /<path>/<to>/<local>/ --recursive
However, when I run this command from the AWS CLI session I've connected to, I see confirmations in the terminal like:
download: s3://mybucket/myfolder/data1.json to /my/local/dir/data1.json
download: s3://mybucket/myfolder/data2.json to /my/local/dir/data2.json
download: s3://mybucket/myfolder/data3.json to /my/local/dir/data3.json
...
But then I check /my/local/dir for the files, and my directory is empty. I've tried using the sync command instead, I've tried copying just a single file - nothing seems to work right now. In the past I did successfully run this command and downloaded the files as expected.
Why are my files not being copied now, despite seeing no errors?

For testing, you can go to your /my/local/dir folder and execute the following command:
aws s3 sync s3://mybucket/myfolder .
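If the sync above also reports downloads but the directory still looks empty, a common cause is that the command is running on a different machine (e.g. AWS CloudShell or an SSH session) or in a different working directory than the one you are checking. A quick sanity check before syncing (the /tmp/s3-test path is just an illustrative scratch directory):

```shell
# Confirm which machine and directory the CLI will actually write into.
mkdir -p /tmp/s3-test && cd /tmp/s3-test
hostname    # is this really your local machine, or a remote shell?
pwd         # downloaded files will land here
# aws s3 sync s3://mybucket/myfolder .   # then verify with: ls -la
```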

Related

How to use gsutil rsync: log in and download bucket contents to a local directory

I have the following questions.
I was given access to a cloud bucket via my email ID. Now I want to download the whole bucket folder into a local directory on Ubuntu. I installed gsutil from pip.
Is the command correct?
gsutil rsync gs://bucket_name .
The command seems generic; how do I give my Gmail credentials to it? The file is 1 TB in size and I am allowed to download it only once, so I want to get the command right.
The command is correct if you want your current directory to mirror the contents of the bucket. Note that rsync only deletes files on the right that are not found on the left if you pass the -d option. If you merely want to copy, you might want gsutil cp -r instead.
Here are the current docs on how to authenticate when running a standalone gsutil. It looks like you just need to run gsutil config.

Errno 22 When downloading multiple files from S3 bucket "sub-folder"

I've been trying to use the AWS CLI to download all files from a sub-folder in AWS; however, after the first few files download, it fails to download the rest. I believe this is because it adds an extension to the filename, which it then sees as an invalid filepath.
I'm using the following command:
aws s3 cp s3://my_bucket/sub_folder /tmp/ --recursive
It gives me the following error for almost all of the files in the subfolder:
[Errno 22] Invalid argument: 'C:\\tmp\\2019-08-15T16:15:02.tif.deDBF2C2
I think this is because of the .deDBF2C2 extension it seems to be adding to the files when downloading though I don't know why it does. The filenames all end with .tif in the actual bucket.
Does anyone know what causes this?
Update: The command worked once I executed it from a linux machine. Seems to be specific to windows.
This is an oversight by AWS: it uses Windows-reserved characters in log file names. When you execute the command it will create all the directories; however, any logs with :: in the name fail to download.
Issue is discussed here: https://github.com/aws/aws-cli/issues/4543
Frustrated, I came up with a workaround: execute a dry run, which prints the expected log output, and redirect that to a text file, e.g.:
>aws s3 cp s3://config-bucket-7XXXXXXXXXXX3 c:\temp --recursive --dryrun > c:\temp\aScriptToDownloadFilesAndReplaceNames.txt
The output file is filled with these aws log entries we can turn into aws script commands:
(dryrun) download: s3://config-bucket-7XXXXXXXXXXX3/AWSLogs/7XXXXXXXXXXX3/Config/ap-southeast-2/2019/10/1/ConfigHistory/7XXXXXXXXXXX3_Config_ap-southeast-2_ConfigHistory_AWS::RDS::DBInstance_20191001T103223Z_20191001T103223Z_1.json.gz to \AWSLogs\7XXXXXXXXXXX3\Config\ap-southeast-2\2019\10\1\ConfigHistory\703014955993_Config_ap-southeast-2_ConfigHistory_AWS::RDS::DBInstance_20191001T103223Z_20191001T103223Z_1.json.gz
In Notepad++ or another text editor, replace the (dryrun) download: prefix with aws s3 cp.
You will then see lines containing the aws s3 cp command, the bucket file, and the local file path. We need to remove the :: in the local file path, on the right side of the to:
aws s3 cp s3://config-bucket-7XXXXXXXXXXX3/AWSLogs/7XXXXXXXXXXX3/Config/ap-southeast-2/2019/10/1/ConfigHistory/7XXXXXXXXXXX3_Config_ap-southeast-2_ConfigHistory_AWS::RDS::DBInstance_20191001T103223Z_20191001T103223Z_1.json.gz to AWSLogs\7XXXXXXXXXXX3\Config\ap-southeast-2\2019\10\1\ConfigHistory\7XXXXXXXXXXX3_Config_ap-southeast-2_ConfigHistory_AWS::RDS::DBInstance_20191001T103223Z_20191001T103223Z_1.json.gz
We can replace the :: with - in the local paths only (not the S3 bucket paths) using the regex (.*)::, which, being greedy, matches up to the last occurrence of :: on each line.
Replacing with $1- and running 'Replace All' twice turns the ::'s in the local paths into hyphens.
Next, remove the to between the two paths:
FIND: json.gz to AWSLogs
REPLACE: json.gz AWSLogs
Finally select all the lines copy/paste into a command prompt to download all the files with reserved file characters!
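On a machine with sed available (e.g. Git Bash or WSL), the same find-and-replace steps can be sketched as a one-shot pipeline. The dry-run line below is a shortened, illustrative stand-in for the real output:

```shell
# Illustrative dry-run line (real input comes from: aws s3 cp ... --dryrun > dryrun.txt)
printf '%s\n' \
  '(dryrun) download: s3://b/AWS::RDS::f.json.gz to AWSLogs\d\AWS::RDS::f.json.gz' \
  > dryrun.txt

# 1) turn "(dryrun) download:" into "aws s3 cp"
# 2) replace the last two occurrences of "::" (i.e. the local path) with "-"
# 3) drop the " to " between the two paths
sed -e 's/(dryrun) download: /aws s3 cp /' \
    -e 's/\(.*\)::/\1-/' \
    -e 's/\(.*\)::/\1-/' \
    -e 's/\.json\.gz to AWSLogs/.json.gz AWSLogs/' \
    dryrun.txt > download.txt
cat download.txt
# aws s3 cp s3://b/AWS::RDS::f.json.gz AWSLogs\d\AWS-RDS-f.json.gz
```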
UPDATE:
If you have WSL (Windows Subsystem for Linux), you should be able to download the files and then do a simple file rename to replace the ::'s before copying to the mounted Windows file system.
I tried from my raspberry pi and it worked. Seems to only be an issue with Windows OS.
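The Linux-side rename step can be sketched like this (the directory and file name are illustrative):

```shell
# Create a sample downloaded file whose name contains the reserved "::".
mkdir -p /tmp/logs && touch '/tmp/logs/AWS::RDS::x.json.gz'

# Replace every "::" in each matching file name with "-" before copying the
# files over to the mounted Windows folder.
for f in /tmp/logs/*::*; do
  [ -e "$f" ] || continue   # glob matched nothing
  mv "$f" "$(printf '%s' "$f" | sed 's/::/-/g')"
done
ls /tmp/logs
# AWS-RDS-x.json.gz
```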

Download list of specific files from AWS S3 using CLI

I am trying to download only specific files from AWS. I have the list of file URLs. Using the CLI I can only download all files in a bucket using the --recursive command, but I only want to download the files in my list. Any ideas on how to do that?
This is possibly a duplicate of:
Selective file download in AWS S3 CLI
You can do something along the lines of:
aws s3 cp s3://BUCKET/ folder --exclude "*" --include "2018-02-06*" --recursive
https://docs.aws.amazon.com/cli/latest/reference/s3/cp.html
Since you have the s3 urls already in a file (say file.list), like -
s3://bucket/file1
s3://bucket/file2
You could download all the files to your current working directory with a simple bash script -
while read -r line; do aws s3 cp "$line" .; done < file.list
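A slightly more defensive variant of the same loop skips blank lines, which would otherwise produce a failed cp call, and keeps any spaces in keys intact. The sample file.list is illustrative, and the echo is there so you can preview the commands before running them for real:

```shell
# Illustrative list of objects to fetch (note the blank line in the middle).
printf 's3://bucket/file1\n\ns3://bucket/file2\n' > file.list

while IFS= read -r url; do
  [ -n "$url" ] || continue      # skip blank lines
  echo aws s3 cp "$url" .        # drop "echo" to actually download
done < file.list
# prints:
# aws s3 cp s3://bucket/file1 .
# aws s3 cp s3://bucket/file2 .
```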
People, I found out a quicker way to do it: https://stackoverflow.com/a/69018735
WARNING: "Please make sure you don't have an empty line at the end of your text file".
It worked here! :-)

AWS S3 cp creates undefined files

While using the aws cli cp command to copy the files recursively, there is a bug which creates some undefined files.
aws s3 cp --recursive $HOME/$MYHOST-$MYTIMESTAMP/$MYHOST-$MYTIMESTAMP-*.xml s3://mybucket/$MYHOST-$MYTIMESTAMP/
The program works fine and uploads to the specified bucket. But it also creates some undefined files outside the bucket in the root folder. This happens all the time and I have to rm (delete) those annoying undefined files.
I presumed it to be a bug and then tried uploading the files individually rather than using wildcards, with the same result as the recursive copy: it still creates additional undefined files outside the bucket in the root folder. This happens only when I run a bunch of the same cp commands in a bash script, in which case the problem shows up intermittently.
aws s3 cp $HOME/$MYHOST-$MYTIMESTAMP/$MYHOST-$MYTIMESTAMP-hello.xml s3://mybucket/$MYHOST-$MYTIMESTAMP/
However while doing it only for a single file, it doesn't show up.
My Cli version -
aws-cli/1.14.34 Python/2.7.14+ Linux/4.4.104-39-default botocore/1.8.38
Any help would be highly appreciated on this.
You have configured S3 access logging to write logs into this bucket. Presumably, these are the log files for this bucket.
Why the filenames begin with "undefined" is not clear -- something may have gone wrong when you set up logging for the bucket so that the log file prefix did not get saved -- but the filenames look like the names of the log files that S3 creates.
https://docs.aws.amazon.com/AmazonS3/latest/dev/ServerLogs.html
Best practice is to set up a separate bucket for collecting S3 access logs in each region.

How to copy a file from a GCS bucket to my local machine

I need to copy files from Google Cloud Storage to my local machine.
I tried this command on the terminal of a Compute Engine instance:
$ sudo gsutil cp -r gs://mirror-bf /var/www/html/mydir
That is my directory on the local machine: /var/www/html/mydir.
I get this error:
CommandException: Destination URL must name a directory, bucket, or bucket
subdirectory for the multiple source form of the cp command.
Where is the mistake?
You must first create the directory /var/www/html/mydir.
Then, you must run the gsutil command on your local machine and not in the Google Cloud Shell. The Cloud Shell runs on a remote machine and can't deal directly with your local directories.
I had a similar problem and went through the painful process of figuring it out, so I thought I would provide my step-by-step solution (under Windows; hopefully similar for Unix users) and hope it helps others.
The first thing (as many others have pointed out on various Stack Overflow threads): you have to run a local console (in admin mode) for this to work, i.e. do not use the Cloud Shell terminal.
Here are the steps:
Assuming you already have Python installed on your machine, you will then need to install the gsutil python package using pip from your console:
pip install gsutil
You will then be able to run the gsutil config from that same console:
gsutil config
A .boto file needs to be created; it is needed to make sure you have permission to access your storage.
Also note that you are now given a URL, which is needed in order to get the authorization code (prompted in the console).
Open a browser and paste this URL in, then:
Log in to your Google account (ie. account linked to your Google Cloud)
Google asks you to confirm you want to give access to gsutil. Click Allow.
You will then be given an authorization code, which you can copy and paste into your console.
Finally you are asked for a project-id:
Get the project ID of interest from your Google Cloud.
To find these IDs, click on "My First Project"; you will then see a list of all your projects and their IDs.
Paste that ID in your console, hit Enter, and there you are! You have now created your .boto file. This should be all you need to be able to play with your Cloud Storage.
Console output:
Boto config file "C:\Users\xxxx\.boto" created. If you need to use a proxy to access the Internet please see the instructions in that file.
You will then be able to copy your files and folders from the cloud to your PC using the following gsutil Command:
gsutil -m cp -r gs://myCloudFolderOfInterest/ "D:\MyDestinationFolder"
Files from within "myCloudFolderOfInterest" should then get copied to the destination "MyDestinationFolder" (on your local computer).
I initially put an r before the file path, i.e. r"C:\Users\test", and got the same error. After removing the r it worked for me:
gsutil -m cp -r gs://bucketname/ "C:\Users\test"
Try prefixing the destination with '.', as in ./var:
$sudo gsutil cp -r gs://mirror-bf ./var/www/html/mydir
Or it may be the following problem:
gsutil cp does not support copying special file types such as sockets, device files, named pipes, or any other non-standard files intended to represent an operating system resource. You should not run gsutil cp with sources that include such files (for example, recursively copying the root directory on Linux that includes /dev ). If you do, gsutil cp may fail or hang.
Source: https://cloud.google.com/storage/docs/gsutil/commands/cp
The syntax that worked for me when downloading to a Mac was:
gsutil cp -r gs://bucketname dir Dropbox/directoryname