Mass Copy Files on Amazon AWS with CLI - amazon-web-services

This is probably easy but it's really stumping me. I have about 9 hours of experience with Amazon AWS and the CLI.
I have a directory
BDp-Archive/item/
in my S3 bucket, and I want to copy the text files in that directory into its subdirectory called
BDp-Archive/item/txt/
My attempted command was:
aws s3 mv s3://BDp-Archive/00009e98-3e0f-402e-9d12-7aec8e32b783/ s3://BDp-Archive/00009e98-3e0f-402e-9d12-7aec8e32b783/txt/ --include "*.txt"
This is throwing the error:
A client error (NoSuchKey) occurred when calling the HeadObject operation: Key "00009e98-3e0f-402e-9d12-7aec8e32b783" does not exist
Completed 1 part(s) with ... file(s) remaining

I think the problem is that you need to use the --recursive switch, since by default the mv command only applies to a single object (much like the other commands, such as rm and cp). Try:
aws s3 mv s3://BDp-Archive/00009e98-3e0f-402e-9d12-7aec8e32b783/ s3://BDp-Archive/00009e98-3e0f-402e-9d12-7aec8e32b783/txt/ --include "*.txt" --recursive
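Note that the AWS CLI includes everything by default, so --include "*.txt" on its own will not narrow the move down to text files; --include only re-includes files that a preceding --exclude has filtered out. To move only the .txt objects, pair the two filters, for example:
aws s3 mv s3://BDp-Archive/00009e98-3e0f-402e-9d12-7aec8e32b783/ s3://BDp-Archive/00009e98-3e0f-402e-9d12-7aec8e32b783/txt/ --recursive --exclude "*" --include "*.txt"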

I needed to configure the region of my bucket (or specify it as part of the CLI command):
aws s3 cp --region <region> <from> <to>
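If you would rather not pass --region on every call, you can also persist a default region in your profile (us-east-1 below is only an example value):
aws configure set region us-east-1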

You need to configure your access key and secret key. Try:
aws configure
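Running it interactively prompts for four values; the ones shown here are placeholders:
$ aws configure
AWS Access Key ID [None]: AKIAEXAMPLEKEY
AWS Secret Access Key [None]: exampleSecretKey
Default region name [None]: us-east-1
Default output format [None]: json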
For more options, see: http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html#cli-installing-credentials

Related

AWS CLI in Windows won't upload file to S3 bucket

Windows Server 2012 R2 with Python 2.7.10 and the AWS CLI tool installed. The following works:
aws s3 cp c:\a\a.txt s3://path/
I can upload that file without problem. What I want to do is upload a file from a mapped drive to an s3 bucket, so I tried this:
aws s3 cp s:\path\file s3://path/
and it works.
Now what I want to do, and cannot figure out, is how to not specify a single file but let it grab all file(s), so I can schedule this to upload the contents of a directory to my S3 bucket. I tried this:
aws s3 cp "s:\path\..\..\" s3://path/ --recursive --include "201512"
and I get this error "TOO FEW ARGUMENTS"
Nearest I can guess, it's complaining that I'm not specifying a particular file to send up, but I don't want to do that; I want to automate the whole thing.
If someone could please shed some light on what I'm missing I would really appreciate it.
Thank you
In case this is useful for anyone else coming after me: add some extra spaces between the source and target. I'd been beating my head against running this command with every combination of single quotes, double quotes, slashes, etc.:
aws s3 cp /home/<username>/folder/ s3://<bucketID>/<username>/archive/ --recursive --exclude "*" --include "*.csv"
And it would give me: "aws: error: too few arguments" Every. Single. Way. I. Tried.
So I finally saw the --debug option in aws s3 cp help and ran it again this way:
aws s3 cp /home/<username>/folder/ s3://<bucketID>/<username>/archive/ --recursive --exclude "*" --include "*.csv" --debug
And this was the relevant debug line:
MainThread - awscli.clidriver - DEBUG - Arguments entered to CLI: ['s3', 'cp', 'home/<username>/folder\xc2\xa0s3://<bucketID>/<username>/archive/', '--recursive', '--exclude', '*', '--include', '*.csv', '--debug']
I have no idea where the \xc2\xa0 (a UTF-8 non-breaking space) came from in between the source and target, but there it is! I updated the line to add a couple of extra spaces and now it runs without errors:
aws s3 cp /home/<username>/folder/ s3://<bucketID>/<username>/archive/ --recursive --exclude "*" --include "*.csv"
aws s3 cp "s:\path\..\..\" s3://path/ --recursive --include "201512"
TOO FEW ARGUMENTS
This is because, in your command, the closing double quote (") is escaped by the trailing backslash (\), so the local path (s:\path\..\..\) is not parsed correctly.
What you need to do is escape each backslash with a double backslash, i.e.:
aws s3 cp "s:\\path\\..\\..\\" s3://path/ --recursive --include "201512"
Alternatively, you can try 'mc', which ships as a single binary and is available for Windows in both 64-bit and 32-bit builds. 'mc' implements mirror, cp, resumable sessions, JSON-parseable output and more - https://github.com/minio/mc
64-bit from https://dl.minio.io/client/mc/release/windows-amd64/mc.exe
32-bit from https://dl.minio.io/client/mc/release/windows-386/mc.exe
Use aws s3 sync instead of aws s3 cp to copy the contents of a directory.
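For the mapped-drive case above, a minimal sync sketch could look like this (the drive letter, prefix and pattern are the OP's placeholders, and "*201512*" is my interpretation of the original filter):
aws s3 sync "s:\path" s3://path/ --exclude "*" --include "*201512*"
sync is recursive by default, so no --recursive flag is needed, and the --exclude/--include pair restricts it to names containing 201512.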
I faced the same situation. Let me share two scenarios I tried to check the same code.
Within bash
Please make sure you have an AWS profile in place (use aws configure). Also, make sure you use a proper proxy if applicable.
$aws s3 cp s3://bucket/directory/ /usr/home/folder/ --recursive --region us-east-1 --profile yaaagy
it worked.
Within a Perl script
$cmd = "$aws s3 cp s3://bucket/directory/ /usr/home/folder/ --recursive --region us-east-1 --profile yaaagy";
I enclosed it within "" and it was successful. Let me know if this works out for you.
I ran into this same problem recently, and quiver's answer -- replacing single backslashes with double backslashes -- resolved the problem I was having.
Here's the Powershell code I used to address the problem, using the OP's original example:
# Notice how my path string contains a mixture of single- and double-backslashes
$folderPath = "c:\\a\a.txt"
echo "`$folderPath = $($folderPath)"
# Use the "Resolve-Path" cmdlet to generate a consistent path string.
$osFolderPath = (Resolve-Path $folderPath).Path
echo "`$osFolderPath = $($osFolderPath)"
# Escape backslashes in the path string.
$s3TargetPath = ($osFolderPath -replace '\\', "\\")
echo "`$s3TargetPath = $($s3TargetPath)"
# Now pass the escaped string to your AWS CLI command.
echo "AWS Command = aws s3 cp `"s3://path/`" `"$s3TargetPath`""

AWS CLI moving file with wildcard (asterisk) in path

I am attempting to move a file from one S3 location to another, using an activity in an AWS Data Pipeline.
The command I am using is:
(aws s3 mv s3://foobar/Tagger/out//*/lastImage.txt s3://foobar/Tagger/testInput/lastImage.txt)
But I receive the following error:
A client error (404) occurred when calling the HeadObject operation: Key "Tagger/out//*/lastImage.txt" does not exist
But, if I replace the "*" with the specific directory name, it will work. The problem is I won't always know the name of the directory, so I was hoping I could use the "*" as a wildcard.
The AWS S3 CLI does not expand wildcards in the path itself; wildcard filtering only works via --exclude/--include together with the --recursive flag.
So this should work for you:
aws s3 mv s3://foobar/Tagger/out/ s3://foobar/Tagger/testInput/ --recursive --exclude "*" --include "*/lastImage.txt"
Unfortunately, this will recreate the entire directory structure in your target location, and I'm not immediately sure that can be solved by just using the AWS CLI.
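One possible workaround, sketched here with the OP's bucket and key names, is to list the matching keys first and move them one at a time so that only the file name is kept (this assumes a shell with awk and grep available wherever the pipeline activity runs):
# list every key under the out/ prefix, keep only the lastImage.txt ones, and move each to the flat target
aws s3 ls s3://foobar/Tagger/out/ --recursive | awk '{print $4}' | grep '/lastImage.txt$' | while read key; do
  aws s3 mv "s3://foobar/$key" s3://foobar/Tagger/testInput/lastImage.txt
done
Each matching object overwrites the previous one at the target key, so this really only makes sense when a single lastImage.txt is expected.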

Selective file download in AWS CLI

I have files in an S3 bucket. I was trying to download files based on a date, like 8th Aug, 9th Aug, etc.
I used the following code, but it still downloads the entire bucket:
aws s3 cp s3://bucketname/ folder/file \
--profile pname \
--exclude "*" \
--recursive \
--include "2015-08-09*"
I am not sure how to achieve this. How can I download only the files for a selected date?
This command will copy all files starting with 2015-08-15:
aws s3 cp s3://BUCKET/ folder --exclude "*" --include "2015-08-15*" --recursive
If your goal is to synchronize a set of files without copying them twice, use the sync command:
aws s3 sync s3://BUCKET/ folder
That will copy all files that have been added or modified since the previous sync.
In fact, this is the equivalent of the above cp command:
aws s3 sync s3://BUCKET/ folder --exclude "*" --include "2015-08-15*"
References:
AWS CLI s3 sync command documentation
AWS CLI s3 cp command documentation
Bash command to copy all files for a specific date or month to the current folder:
aws s3 ls s3://bucketname/ | grep '2021-02' | awk '{print $4}' | xargs -I {} aws s3 cp s3://bucketname/{} folder
The command does the following:
Lists all the files under the bucket
Selects all the files of 2021-02, i.e. all files from February 2021
Extracts only their names
Runs aws s3 cp on each matching file via xargs (a parallel variant is sketched below)
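If the month contains a lot of objects, the same pipeline can be parallelised with xargs' -P flag (GNU xargs; 4 below is an arbitrary example value):
aws s3 ls s3://bucketname/ | grep '2021-02' | awk '{print $4}' | xargs -I {} -P 4 aws s3 cp s3://bucketname/{} folder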
In case your bucket is large, upwards of 10 to 20 GB (this was true in my own personal use case), you can achieve the same goal by using sync in multiple terminal windows.
All the terminal sessions can use the same token, in case you need to generate a token for the prod environment.
$ aws s3 sync s3://bucket-name/sub-name/another-name folder-name-in-pwd/ --exclude "*" --include "name_date1*" --profile UR_AC_SomeName
and in another terminal window (same pwd):
$ aws s3 sync s3://bucket-name/sub-name/another-name folder-name-in-pwd/ --exclude "*" --include "name_date2*" --profile UR_AC_SomeName
and another two for "name_date3*" and "name_date4*"
Additionally, you can do multiple excludes in the same sync command, as in:
$ aws s3 sync s3://bucket-name/sub-name/another-name my-local-path/ --exclude="*.log/*" --exclude=img --exclude=".error" --exclude=tmp --exclude="*.cache"
This Bash script will copy all files from one bucket to another, filtered by modified date, using the AWS CLI.
aws s3 ls <BCKT_NAME> --recursive | sort | grep "2020-08-*" | cut -b 32- > a.txt
Inside the Bash file:
while IFS= read -r line; do
aws s3 cp s3://<SRC_BCKT>/${line} s3://<DEST_BCKT>/${line} --sse AES256
done < a.txt
The AWS CLI is really slow at this. I waited hours and nothing really happened, so I looked for alternatives.
https://github.com/peak/s5cmd worked great.
It supports globs, for example:
s5cmd -numworkers 30 cp 's3://logs-bucket/2022-03-30-19-*' .
It is blazing fast, so you can work with buckets that have S3 access logs without much fuss.

How to move files from Amazon EC2 to S3 bucket using command line

In my Amazon EC2 instance, I have a folder named uploads. In this folder I have 1000 images. Now I want to copy all the images to my new S3 bucket. How can I do this?
First option: s3cmd
Use s3cmd
s3cmd get s3://AWS_S3_Bucket/dir/file
Take a look at the s3cmd documentation.
If you are on Linux, run this on the command line:
sudo apt-get install s3cmd
or on CentOS/Fedora:
yum install s3cmd
Example of usage:
s3cmd put my.file s3://pactsRamun/folderExample/fileExample
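Since the question is about copying 1000 images rather than a single file, a recursive upload is probably what you want; the bucket and prefix below are placeholders:
s3cmd put --recursive uploads/ s3://your-bucket/uploads/
or, to keep the two sides in step on repeated runs:
s3cmd sync uploads/ s3://your-bucket/uploads/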
Second option: the AWS CLI
Use the CLI from Amazon.
Update
Like @tedder42 said in the comments, instead of using cp, use sync.
Take a look at the following syntax:
aws s3 sync <source> <target> [--options]
Example:
aws s3 sync . s3://my-bucket/MyFolder
More information and examples are available at Managing Objects Using High-Level s3 Commands with the AWS Command Line Interface.
aws s3 sync your-dir-name s3://your-s3-bucket-name/folder-name
Important: This will copy each item in your named directory into the s3 bucket folder you selected. This will not copy your directory as a whole.
Or, for one selected file, use cp instead (sync only works on directories and prefixes):
aws s3 cp your-dir-name/file-name s3://your-s3-bucket-name/folder-name/file-name
Or you can sync the current directory as a whole. Note that this will copy your directory in its entirety, including subdirectories, to your S3 bucket folder.
aws s3 sync . s3://your-s3-bucket-name/folder-name
To copy from EC2 to S3, use the command below on the EC2 instance's command line.
First, you have to give an IAM role with full S3 access to your EC2 instance.
aws s3 cp Your_Ec2_Folder s3://Your_S3_bucket/Your_folder --recursive
Also note that when syncing with S3, the AWS CLI is multithreaded and uploads multiple parts of a file at one time. The number of threads, however, is not configurable at this time.
aws s3 mv /home/inbound/ s3://test/ --recursive --region us-west-2
This can be done very simply. Follow these steps:
Open the AWS EC2 console.
Select the instance and navigate to Actions.
Select Instance Settings and then Attach/Replace IAM Role.
When this is done, connect to the AWS instance and the rest will be done via the following CLI commands:
aws s3 cp filelocation/filename s3://bucketname
Hence you don't need to install anything or make any extra effort.
Please note: the file location refers to the local path on the instance, and bucketname is the name of your bucket.
Also note: This is possible if your instance and S3 bucket are in the same account.
Cheers.
We do have a --dryrun feature available for testing.
To begin with, I would assign the EC2 instance a role that allows it to read from and write to S3.
SSH into the instance and perform the following:
vi tmp1.txt
aws s3 mv ./ s3://bucketname-bucketurl.com/ --dryrun
If this works, then all you have to do is create a script to upload all files with a specific extension from this folder to the S3 bucket.
I have written the following command in my script to move files older than 2 minutes from the current directory to the bucket/folder:
cd dir; ls . -rt | xargs -I FILES find FILES -maxdepth 1 -name '*.txt' -mmin +2 -exec aws s3 mv '{}' s3://bucketurl.com \;
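To actually schedule it, a cron entry along these lines would do (the schedule, directory and bucket are placeholders, and it assumes the aws binary is on cron's PATH):
# every 5 minutes, move .txt files older than 2 minutes from the upload directory to the bucket
*/5 * * * * cd /path/to/uploads && find . -maxdepth 1 -name '*.txt' -mmin +2 -exec aws s3 mv '{}' s3://bucketurl.com/ \;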

Recursive list s3 bucket contents with AWS CLI

How can I recursively list all the contents of a bucket using the AWS CLI, similar to using find . on Unix?
aws s3 ls s3://MyBucket --recursive complains with unknown option.
http://docs.aws.amazon.com/cli/latest/reference/s3/index.html#directory-and-s3-prefix-operations claims that --recursive is a valid parameter.
aws s3 ls s3://MyBucket --recursive works fine for me.
Try updating your AWS CLI. My version is aws-cli/1.6.2; you can check yours with:
aws --version
With recent AWS CLI versions, the --recursive option is supported.
You can recursively list all the files under a bucket named MyBucket using the following command:
aws s3 ls s3://MyBucket/ --recursive
You can recursively list all the files under a folder named MyFolder in the bucket using the following command:
aws s3 ls s3://MyBucket/MyFolder/ --recursive
As @Francisco Cardoso said, the final / is very important. It lets you list the contents of the folder instead of the folder itself.
For more information, see: https://docs.aws.amazon.com/cli/latest/reference/s3/ls.html
I am not able to interpret the link you referred to properly: http://docs.aws.amazon.com/cli/latest/reference/s3/index.html#directory-and-s3-prefix-operations
However, I was able to make the --recursive option work with respect to this link: http://docs.aws.amazon.com/cli/latest/reference/s3/index.html#single-local-file-and-s3-object-operations
As per this link, cp, mv and rm support the --recursive option.
The one that you are trying is ls.
I tried using cp and rm with the --recursive option and they work fine.
You cannot recursively list all the contents of a bucket via:
aws s3 ls s3://MyBucket
To list objects from a folder, you need to execute the command as:
aws s3 ls s3://MyBucket/MyFolder/
The above command lists the objects that reside inside the folder named MyFolder.
To get an object listing from such a logical hierarchy in Amazon S3, you need to specify the full key name for the object in the GET operation.
--recursive: the command is performed on all files or objects under the specified directory or prefix.
Thanks
The one-line bash script below lists all S3 buckets with their objects recursively, printing each bucket name and numbering its objects.
/usr/bin/sudo /usr/local/bin/aws s3 ls | awk '{print $NF}' | while read l; do echo -e "#######---$l objects---##########\n\n"; /usr/bin/sudo /usr/local/bin/aws s3 ls "$l" --recursive | nl; done
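If you only need a per-bucket object count rather than the full listing, a small variation (same assumptions about sudo and the aws binary path as the original) would be:
/usr/bin/sudo /usr/local/bin/aws s3 ls | awk '{print $NF}' | while read bucket; do
  # count every object in the bucket (recursive listing, one line per object)
  count=$(/usr/bin/sudo /usr/local/bin/aws s3 ls "$bucket" --recursive | wc -l)
  echo "$bucket: $count objects"
done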