Fast way to get AWS S3 key count in bucket

Does anyone know a fast way to get the count of keys in an S3 bucket?
I usually run s3cmd ls s3://bucket/ | wc -l, but my bucket contains a huge number of keys, which makes this operation take too long to ever finish.

Try this to count the objects in the bucket using the aws s3api command:
aws s3api list-objects-v2 --bucket $bucketNameToUse --query '[length(Contents[?LastModified].{Key: Key})]' --output text
Done.
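A simpler query along the same lines should also work, since length() can be applied to the flattened Contents array directly. This is an untested sketch; note that Contents is absent entirely for an empty bucket, so the query may error in that case:
aws s3api list-objects-v2 --bucket $bucketNameToUse --query 'length(Contents[])' --output text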

This is a super old question, but I figured I'd add my $.02.
aws s3api list-objects-v2 --bucket $YOUR_BUCKET --no-cli-pager --query "Contents[].Key" --output text | wc -w
This will give you a count that's pretty close, and it works better for me than Vipin's answer.
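Either way, any approach that lists every key is slow on a huge bucket. If a day-old figure is acceptable, the CloudWatch NumberOfObjects metric (published daily for every bucket at no extra cost) avoids listing entirely. A rough sketch, assuming a bucket named my-bucket and GNU date:
aws cloudwatch get-metric-statistics \
  --namespace AWS/S3 \
  --metric-name NumberOfObjects \
  --dimensions Name=BucketName,Value=my-bucket Name=StorageType,Value=AllStorageTypes \
  --start-time "$(date -u -d '2 days ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 86400 \
  --statistics Average \
  --query 'sort_by(Datapoints, &Timestamp)[-1].Average' \
  --output text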

Related

How to sort in ascending order by last modified date for S3 using the AWS CLI

The code below sorts in descending order. How do I have it sort in ascending order?
KEY=`aws s3 ls $BUCKET --recursive | sort | tail -n 1 | awk '{print $4}'`
It appears that you wish to obtain the Key of the most recently modified object in the Amazon S3 bucket.
For that, you can use:
aws s3api list-objects --bucket bucketname --query 'sort_by(Contents, &LastModified)[-1].Key' --output text
The AWS CLI --query parameter is highly capable. It uses JMESPath, which can do most required manipulations without needing to pipe data.
The aws s3api list-objects command provides information in specific fields, rather than the aws s3 ls command which is simply text output.
The above might not work as expected if there are more than 1000 objects in the bucket, since results are returned in batches of 1000.
Use sort -r for ascending order.
From the manpage for sort:
-r, --reverse
reverse the result of comparisons
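Conversely, to get the oldest object rather than the newest, the same sort_by() pattern shown above should work with index [0] instead of [-1] (an untested variant; the same caveat about very large buckets applies):
aws s3api list-objects --bucket bucketname --query 'sort_by(Contents, &LastModified)[0].Key' --output text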

AWS CLI Commands

I want to get a list of all files in an S3 bucket with a particular naming pattern.
For example, if I have files like:
aaaa2018-05-01
aaaa2018-05-23
aaaa2018-06-30
aaaa2018-06-21
I need to get a list of all files for the 5th month. The output should look like:
aaaa2018-05-01
aaaa2018-05-23
I executed the following command and the result was empty:
aws s3api list-objects --bucket bucketname --query "Contents[?contains(Key, 'aaaa2018-05-*')]" > s3list05.txt
When I check s3list05.txt, it is empty. I also tried the command below:
aws s3 ls s3:bucketname --recursive | grep aaaa2018-05* > s3list05.txt
This command lists all the objects in the bucket instead.
Kindly let me know the exact command to get the desired output.
You are almost there. Try this:
aws s3 ls s3://bucketname --recursive | grep aaaa2018-05
or
aws s3 ls bucketname --recursive | grep aaaa2018-05
The contains() function doesn't need a wildcard:
aws s3api list-objects --bucket bucketname --query "Contents[?contains(Key, 'aaaa2018-05')].[Key]" --output text
This provides a list of Keys.
--output text removes the JSON formatting.
Using [Key] instead of just Key puts each Key on its own line.
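Since the pattern here is a fixed leading string, you can also let S3 filter server-side with --prefix, which avoids transferring keys you would discard anyway. A sketch under that assumption:
aws s3api list-objects --bucket bucketname --prefix 'aaaa2018-05' --query 'Contents[].[Key]' --output text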

How to get size of all files in an S3 bucket with versioning?

I know this command can provide the size of all files in a bucket:
aws s3 ls mybucket --recursive --summarize --human-readable
But this does not account for versioning.
If I run this command:
aws s3 ls s3://mybucket/myfile --human-readable
It will show something like "100 MiB" but it may have 10 versions of this file which will be more like "1 GiB" total.
The closest I have is getting the sizes of every version of a given file:
aws s3api list-object-versions --bucket mybucket --prefix "myfile" --query 'Versions[?StorageClass=`STANDARD`].Size' > /tmp/s3_myfile_version_sizes
Then take the sum of all version sizes.
But I would have to rerun this command for every file in a bucket.
Is there an easier way to do this?
You can run list-object-versions on the bucket as a whole:
aws s3api list-object-versions --bucket my-bucket --query 'Versions[*].Size'
Use jq to sum it up:
aws s3api list-object-versions --bucket my-bucket --query 'Versions[*].Size' | jq add
Or, if you need a human readable output:
aws s3api list-object-versions --bucket my-bucket --query 'Versions[*].Size' | jq add | numfmt --to=iec-i --suffix=B
You can also add a prefix in case you want to know the size of a given "folder", and you can get the number of version objects at the same time:
aws s3api list-object-versions --bucket my-bucket --prefix my-folder --query 'Versions[*].Size' | jq 'length,add'
Or you can use jq filtering to write more complex filters, for example, including only non-current objects:
aws s3api list-object-versions --bucket my-bucket --prefix my-folder | jq '[.Versions[]|select(.IsLatest == false)|.Size] | length,add'
If jq is not available, using the --output text option unfortunately results in tab-separated values, so here's a hack to force it to separate lines and then add up the total:
aws s3api list-object-versions --bucket my-bucket --query 'Versions[*].[Size,Size]' --output text | awk '{s+=$1} END {printf "%.0f", s}'
If you have a large number of objects, it might be better to use data provided by the Amazon S3 Storage Inventory:
Amazon S3 inventory provides a comma-separated values (CSV) flat-file output of your objects and their corresponding metadata on a daily or weekly basis for an S3 bucket or a shared prefix (that is, objects that have names that begin with a common string).
Alternatively, use CloudWatch: the S3 BucketSizeBytes metric is calculated over all object versions, current and noncurrent.
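To expand on the CloudWatch suggestion, here is a rough sketch that reads the daily BucketSizeBytes metric for the STANDARD storage class, again assuming a bucket named my-bucket and GNU date:
aws cloudwatch get-metric-statistics \
  --namespace AWS/S3 \
  --metric-name BucketSizeBytes \
  --dimensions Name=BucketName,Value=my-bucket Name=StorageType,Value=StandardStorage \
  --start-time "$(date -u -d '2 days ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 86400 \
  --statistics Average \
  --query 'sort_by(Datapoints, &Timestamp)[-1].Average' \
  --output text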

aws s3 ls filter storage class (STANDARD)

How can I list files in the STANDARD storage class only?
I want to exclude the GLACIER class.
Currently here is my command:
aws s3 ls s3://Videos/Action/ --human-readable --summarize
The aws s3 ls command doesn't display the Storage Class, but you can do it with this command:
aws s3api list-objects-v2 --bucket Videos --prefix Action --query "Contents[?StorageClass=='STANDARD'].Key" --output text
The output is tab-separated, so you may have to massage the output to get it in your desired format, eg:
aws s3api list-objects-v2 --bucket Videos --prefix Action --query "Contents[?StorageClass=='STANDARD'].Key" --output text | sed 's/\t/\n/g'
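If you want sizes as well, roughly matching the --human-readable --summarize output you started from, you can pull Size alongside Key in the same query (a sketch; totals and human-readable formatting are left to you):
aws s3api list-objects-v2 --bucket Videos --prefix Action --query "Contents[?StorageClass=='STANDARD'].[Key, Size]" --output text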
To gain an understanding of how to selectively use the --query command, see:
How to Filter the Output with the --query Option
JMESPath Tutorial

AWS CLI move all files with condition

I must move into another bucket only files changed in the year 2015. How can I write this condition?
aws s3 mv <condition??> s3://bucket1 s3://bucket2 --recursive
I don't think you can do that directly with the aws s3 command.
What you can do, though, is a two-step approach:
First, get the list of files that have been modified after a given date:
aws s3api list-objects --bucket "bucket1" --query 'Contents[?LastModified > `2015-01-01`].[Key]' --output text
Based on this list you can move the items.
I have not tried this, and I am not a shell expert, but something along these lines should work:
aws s3api list-objects --bucket "<YOUR_BUCKET>" --query 'Contents[?LastModified > `2015-01-01`].[Key]' --output text | xargs -I {} aws s3 mv "s3://<YOUR_BUCKET>/{}" "s3://bucket2/{}"