Recursively list S3 bucket contents with the AWS CLI

How can I recursively list all the contents of a bucket using the AWS CLI, similar to using find . on Unix?
aws s3 ls s3://MyBucket --recursive complains about an unknown option.
http://docs.aws.amazon.com/cli/latest/reference/s3/index.html#directory-and-s3-prefix-operations claims that --recursive is a valid parameter.

aws s3 ls s3://MyBucket --recursive works fine for me.
Try updating your AWS CLI. My version is aws-cli/1.6.2; you can check yours with:
aws --version

With recent AWS CLI versions, the --recursive option is supported.
You can recursively list all the files under a bucket named MyBucket using the following command:
aws s3 ls s3://MyBucket/ --recursive
You can recursively list all the files under a folder named MyFolder in the bucket using the following command:
aws s3 ls s3://MyBucket/MyFolder/ --recursive
As Francisco Cardoso said, the final / is very important. It lists the contents of the folder instead of the folder itself.
For more information, see: https://docs.aws.amazon.com/cli/latest/reference/s3/ls.html
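If you also want object sizes and a grand total in the listing, recent CLI versions accept --human-readable and --summarize together with --recursive (a quick variant of the command above, using the same example bucket):
aws s3 ls s3://MyBucket/ --recursive --human-readable --summarize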

I am not able to properly interpret the link you referred to: http://docs.aws.amazon.com/cli/latest/reference/s3/index.html#directory-and-s3-prefix-operations
However, I was able to make the --recursive option work based on this link: http://docs.aws.amazon.com/cli/latest/reference/s3/index.html#single-local-file-and-s3-object-operations
As per this link, cp, mv and rm support the --recursive option.
The one that you are trying is ls.
I tried using cp and rm with the --recursive option and they work fine.
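For illustration, roughly what I ran (the bucket name and local paths are placeholders, and --dryrun only prints what would happen without touching anything):
aws s3 cp s3://MyBucket/ ./backup --recursive --dryrun
aws s3 rm s3://MyBucket/old-logs/ --recursive --dryrun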

You cannot recursively list all the contents of a bucket via:
aws s3 ls s3://MyBucket
To list objects from a folder, you need to execute the command as:
aws s3 ls s3://MyBucket/MyFolder/
The above command lists the objects that reside inside the folder named MyFolder.
To get an object listing from such a logical hierarchy in Amazon S3, you need to specify the full key name for the object in the GET operation.
--recursive: the command is performed on all files or objects under the specified directory or prefix.
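To make the difference concrete, take a made-up layout with keys MyFolder/a.txt and MyFolder/sub/b.txt (names invented for illustration):
aws s3 ls s3://MyBucket/MyFolder/               # shows a.txt plus the prefix line PRE sub/
aws s3 ls s3://MyBucket/MyFolder/ --recursive   # shows MyFolder/a.txt and MyFolder/sub/b.txt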
Thanks

The one-line bash script below lists every S3 bucket in the account together with its objects, printing each bucket name as a header and numbering the objects so you also get a count.
/usr/bin/sudo /usr/local/bin/aws s3 ls | awk '{print $NF}' | while read -r l; do echo -e "#######---$l objects---##########\n\n"; /usr/bin/sudo /usr/local/bin/aws s3 ls "$l" --recursive | nl; done
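The same thing spread over multiple lines for readability (this assumes aws is on your PATH and credentials are already configured, so the sudo and absolute paths above are not strictly needed):

aws s3 ls | awk '{print $NF}' | while read -r bucket; do
  echo -e "#######---$bucket objects---##########\n\n"
  aws s3 ls "$bucket" --recursive | nl
done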

Related

aws s3 CLI, unable to copy entire directory structure

I have the following directory structure
first/
    second/
        file.txt
If I do aws s3 cp first s3://bucket-name --recursive, the file copied to S3 has the path bucket-name/second/file.txt, not bucket-name/first/second/file.txt.
Why doesn't it behave like the cp command on Linux, and how can I achieve the latter?
Thanks in advance
Use:
aws s3 cp first s3://bucket-name/first/ --recursive
This mimics normal Linux behaviour, such as:
cp -R first third
Both will result in the contents of first being put into the target directory. Neither command creates a first directory.
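The same shape works with sync, if you prefer it (a sketch using the question's placeholder bucket name):
aws s3 sync first s3://bucket-name/first/
Like the cp --recursive form above, sync copies the contents of first under the first/ prefix rather than creating an extra first directory inside it.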

How to get a list of files on an AWS bucket with a specific pattern

I am looking for a way to get the list of files with a specific pattern from AWS, but it seems that this is not supported.
In detail:
I have a bucket in AWS that has several hundred thousand files in it, and I want to find the ones that match a specific pattern. For example, I want to find and list all files matching "my*.txt" in the bucket.
If I run this command:
aws s3 ls s3://my_bucket/ --recursive > byBucketList.txt
Then I can search for my files in the text file and find where they are. But I am not able to run anything such as this:
aws s3 ls s3://my_bucket/my*.txt --recursive > byBucketList.txt
to only get the list of files that I am looking for.
Is there any way to get the list of files with a specific pattern from an AWS bucket?
Sadly you can't do this with s3 ls. But you could possibly use Exclude and Include Filters along with --dryrun for commands that support filters. For example, for s3 cp:
aws s3 cp s3://my_bucket ./. --recursive --dryrun --exclude "*" --include "my*.txt"
--dryrun (boolean) Displays the operations that would be performed using the specified command without actually running them.
This should print out all the objects that would normally be copied without the --dryrun option.
Since you have a large bucket, you can test it out on a small bucket first just to get a feel for the command, as its output is different from that of s3 ls.
@mans you can achieve this using grep:
aws s3 ls s3://my_bucket --recursive | grep -e 'my[A-Za-z0-9]*\.txt' > byBucketList.txt
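Another option (not from the answers above, just a sketch): the lower-level s3api command can do the filtering with a JMESPath --query expression. The CLI still pages through every key and filters client-side, but you skip the separate grep step. my_bucket and the my*.txt pattern are the names from the question; note the match is against the full key, not just the file name:

aws s3api list-objects-v2 --bucket my_bucket \
  --query "Contents[?starts_with(Key, 'my') && ends_with(Key, '.txt')].Key" \
  --output text | tr '\t' '\n' > byBucketList.txt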

How to get files to copy from S3 bucket

Need some help with the cp command in the AWS CLI. I am trying to copy files from an S3 bucket to a local folder. The command I used seems to have run successfully in PowerShell, but the folder is still empty.
Command:
aws s3 cp s3://<my bucket path> <my local destination> --exclude "*" --include "*-20201023*" --recursive --dryrun
The --dryrun parameter prohibits the command from actually copying anything. It just shows you what would happen. Try removing that parameter and running the command.
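In other words, the same command with --dryrun removed should do the actual copy (the bucket path and destination are the question's placeholders):
aws s3 cp s3://<my bucket path> <my local destination> --exclude "*" --include "*-20201023*" --recursive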

Use the aws client to copy s3 files from a single directory only (non-recursively)

Consider an aws bucket/key structure along these lines
myBucket/dir1/file1
myBucket/dir1/file2
myBucket/dir1/dir2/dir2file1
myBucket/dir1/dir2/dir2file2
When using:
aws s3 cp --recursive s3://myBucket/dir1/ .
Then we will copy down dir2file[1,2] along with file[1,2]. How can I copy only the latter files and not the files under subdirectories?
Responding to a comment: I am not interested in putting an --exclude for every subdirectory, so this is not a duplicate of excluding directories from aws cp.
As far as I understand, you want to make sure that the files present in the current directory are copied, but anything in child directories is not. I think you can use something like this:
aws s3 cp s3://myBucket/dir1/ . --recursive --exclude "*/*"
Here we are excluding files that have a path separator after "dir1/".
You can exclude paths using the --exclude option, e.g.
aws s3 cp s3://myBucket/dir1/ . --recursive --exclude "dir1/dir2/*"
More options and examples can be found by using the AWS CLI help:
aws s3 cp help
There is no way to control the recursion depth while copying files using aws s3 cp, nor is it supported in aws s3 ls.
So, if you do not wish to use the --exclude or --include options, I suggest you:
Use the aws s3 ls command without the --recursive option to list files directly under a directory, extract only the file names from the output, and save the names to a file. Refer to this post.
Then write a simple script to read the file names and execute aws s3 cp for each one, as sketched below.
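A minimal sketch of such a script (it assumes the dir1/ prefix from the question and that the key names contain no spaces):

aws s3 ls s3://myBucket/dir1/ | awk '$NF !~ /\/$/ {print $NF}' > files.txt   # keep objects, drop the PRE prefix lines
while IFS= read -r f; do
  aws s3 cp "s3://myBucket/dir1/$f" .
done < files.txt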
Alternatively, you may use:
aws s3 cp s3://spaces/dir1/ . --recursive --exclude "*/*"

Selective file download in AWS CLI

I have files in an S3 bucket. I was trying to download files based on a date, like 08th Aug, 09th Aug, etc.
I used the following code, but it still downloads the entire bucket:
aws s3 cp s3://bucketname/ folder/file \
--profile pname \
--exclude \"*\" \
--recursive \
--include \"" + "2015-08-09" + "*\"
I am not sure how to achieve this. How can I download only the files for a specific date?
This command will copy all files starting with 2015-08-15:
aws s3 cp s3://BUCKET/ folder --exclude "*" --include "2015-08-15*" --recursive
If your goal is to synchronize a set of files without copying them twice, use the sync command:
aws s3 sync s3://BUCKET/ folder
That will copy all files that have been added or modified since the previous sync.
In fact, this is the equivalent of the above cp command:
aws s3 sync s3://BUCKET/ folder --exclude "*" --include "2015-08-15*"
References:
AWS CLI s3 sync command documentation
AWS CLI s3 cp command documentation
Bash command to copy all files for a specific date or month to the current folder:
aws s3 ls s3://bucketname/ | grep '2021-02' | awk '{print $4}' | xargs -I {} aws s3 cp s3://bucketname/{} folder
The command does the following:
Lists all the files under the bucket
Filters out the files of 2021-02, i.e. all files from February 2021
Keeps only their names
Runs aws s3 cp on each of those files (via xargs)
In case your bucket is large, upwards of 10 to 20 GB (this was true in my own personal use case), you can achieve the same goal by using sync in multiple terminal windows.
All the terminal sessions can use the same token, in case you need to generate a token for the prod environment.
$ aws s3 sync s3://bucket-name/sub-name/another-name folder-name-in-pwd/ \
    --exclude "*" --include "name_date1*" --profile UR_AC_SomeName
and another terminal window (same pwd)
$ aws s3 sync s3://bucket-name/sub-name/another-name folder-name-in-pwd/ \
    --exclude "*" --include "name_date2*" --profile UR_AC_SomeName
and another two for "name_date3*" and "name_date4*"
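If you would rather not juggle terminal windows, the same idea can run as background jobs from a single shell (a sketch using the same placeholder bucket, paths and profile as above):

aws s3 sync s3://bucket-name/sub-name/another-name folder-name-in-pwd/ \
    --exclude "*" --include "name_date1*" --profile UR_AC_SomeName &
aws s3 sync s3://bucket-name/sub-name/another-name folder-name-in-pwd/ \
    --exclude "*" --include "name_date2*" --profile UR_AC_SomeName &
wait   # block until both background syncs finish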
Additionally, you can also do multiple excludes in the same sync command, as in:
$ aws s3 sync s3://bucket-name/sub-name/another-name my-local-path/ \
    --exclude="*.log/*" --exclude=img --exclude=".error" --exclude=tmp \
    --exclude="*.cache"
This Bash script will copy all files from one bucket to another by modified date using the AWS CLI.
aws s3 ls <BCKT_NAME> --recursive | sort | grep "2020-08-*" | cut -b 32- > a.txt
Inside the Bash file:
while IFS= read -r line; do
  aws s3 cp "s3://<SRC_BCKT>/${line}" "s3://<DEST_BCKT>/${line}" --sse AES256
done < a.txt
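If the fixed-width cut -b 32- feels fragile (the offset depends on the ls output columns), a variant that takes the fourth column instead works as long as the key names contain no spaces (same placeholders as above):
aws s3 ls <BCKT_NAME> --recursive | sort | grep "2020-08" | awk '{print $4}' > a.txt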
The AWS CLI is really slow at this. I waited hours and nothing really happened, so I looked for alternatives.
https://github.com/peak/s5cmd worked great.
It supports globs, for example:
s5cmd -numworkers 30 cp 's3://logs-bucket/2022-03-30-19-*' .
It is blazing fast, so you can work with buckets that have S3 access logs without much fuss.