AWS CLI move all files with condition

I need to move only the files that changed during the year 2015 into another bucket. How can I write this condition?
aws s3 mv <condition??> s3://bucket1 s3://bucket2 --recursive

I don't think you can do that directly through the s3 command.
What you can do, though, is a two-step approach:
Get the list of files that have been modified after a date:
aws s3api list-objects --bucket bucket1 --query 'Contents[?LastModified > `2015-01-01`].[Key]' --output text
Based on this list you can move the items.
I have not tried this and I'm not a shell expert, but something along these lines should work:
aws s3api list-objects --bucket "<YOUR_BUCKET>" --query 'Contents[?LastModified > `2015-01-01`].[Key]' --output text | xargs -I {} aws s3 mv "s3://<YOUR_BUCKET>/{}" "s3://bucket2/{}"
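If you literally need only objects changed during 2015 (and not everything after it), a sketch adding an upper bound to the same JMESPath filter (the bucket names here are just the ones from the question, and this is untested):
aws s3api list-objects --bucket bucket1 --query 'Contents[?LastModified >= `2015-01-01` && LastModified < `2016-01-01`].[Key]' --output text | xargs -I {} aws s3 mv "s3://bucket1/{}" "s3://bucket2/{}"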

Related

Amazon S3 Copy files after date and with regex

I'm trying to copy some files from S3 sourceBucket to targetBucket, but I need to filter by date and by prefix.
I wish it could be done with the AWS CLI, but at the moment I'm stuck with list-objects or with the cp command.
I can filter correctly with:
aws s3api list-objects-v2 --bucket sourceBucket --query 'Contents[?(LastModified > `2021-09-01`)]' --prefix "somePrefix_"
With cp I can copy the files, but only by prefix:
aws s3 cp s3://sourceBucket/ s3://targetBucket/ --recursive --exclude "*" --include "somePrefix*"
I tried to come up with something using the x-amz-copy-source-if-modified-since header, but it looks like it can only be used with the aws s3api copy-object command, which copies one item at a time (doc).
I read some answers/docs and I think I understood that the cp command doesn't filter by date, only by prefix.
Do you have any idea on how to solve this?
Thank you in advance!
Since you already have a list of the objects you want to copy to another bucket, I suggest writing a bash script which does the copying for multiple objects:
#!/bin/bash
SOURCE_BUCKET="<my-bucket>"
DESTINATION_BUCKET="<my-other-bucket>"
PREFIX="<some-prefix>"
# List the keys of every object under the prefix modified after the cut-off date
content=$(aws s3api list-objects-v2 --bucket "$SOURCE_BUCKET" --prefix "$PREFIX" --query 'Contents[?(LastModified > `2021-09-01`)]' | jq -r ".[].Key")
# Copy each key into the destination bucket (assumes keys contain no whitespace)
for file in $content
do
aws s3api copy-object --copy-source "$SOURCE_BUCKET/$file" --key "$file" --bucket "$DESTINATION_BUCKET" | jq
done
Please note, this script requires jq to be installed.
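If jq is not available, a rough sketch of the same loop using only --query to emit one key per line (assuming, as above, that keys contain no embedded newlines; untested):
aws s3api list-objects-v2 --bucket "$SOURCE_BUCKET" --prefix "$PREFIX" --query 'Contents[?(LastModified > `2021-09-01`)].[Key]' --output text |
while IFS= read -r key
do
# Copy each matching key across, keeping the same key name in the destination bucket
aws s3api copy-object --copy-source "$SOURCE_BUCKET/$key" --key "$key" --bucket "$DESTINATION_BUCKET" > /dev/null
done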

AWS CLI Commands

I want to get a list of all files in an S3 bucket with a particular naming pattern.
For example, if I have files like:
aaaa2018-05-01
aaaa2018-05-23
aaaa2018-06-30
aaaa2018-06-21
I need to get a list of all files for the 5th month. The output should look like:
aaaa2018-05-01
aaaa2018-05-23
I executed the following command and the result was empty:
aws s3api list-objects --bucket bucketname --query "Contents[?contains(Key, 'aaaa2018-05-*')]" > s3list05.txt
When I check s3list05.txt it's empty. I also tried the command below:
aws s3 ls s3:bucketname --recursive | grep aaaa2018-05* > s3list05.txt
but this command lists all the objects present in the bucket.
Kindly let me know the exact command to get the desired output.
You are almost there. Try this:
aws s3 ls s3://bucketname --recursive | grep aaaa2018-05
or
aws s3 ls bucketname --recursive | grep aaaa2018-05
The contains() function doesn't need a wildcard:
aws s3api list-objects --bucket bucketname --query "Contents[?contains(Key, 'aaaa2018-05')].[Key]" --output text
This provides a list of Keys.
--output text removes the JSON formatting.
Using [Key] instead of just Key puts each key on its own line.
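Since the sample keys all begin with the literal string being matched, it can also be more efficient to filter server-side with --prefix instead of scanning every key. A sketch, assuming your real keys share that prefix:
aws s3api list-objects-v2 --bucket bucketname --prefix "aaaa2018-05" --query "Contents[].[Key]" --output text > s3list05.txt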

How to get size of all files in an S3 bucket with versioning?

I know this command can provide the size of all files in a bucket:
aws s3 ls mybucket --recursive --summarize --human-readable
But this does not account for versioning.
If I run this command:
aws s3 ls s3://mybucket/myfile --human-readable
It will show something like "100 MiB" but it may have 10 versions of this file which will be more like "1 GiB" total.
The closest I have is getting the sizes of every version of a given file:
aws s3api list-object-versions --bucket mybucket --prefix "myfile" --query 'Versions[?StorageClass=`STANDARD`].Size' > /tmp/s3_myfile_version_sizes
Then take the sum of all version sizes.
But I would have to rerun this command for every file in a bucket.
Is there an easier way to do this?
You can run list-object-versions on the bucket as a whole:
aws s3api list-object-versions --bucket my-bucket --query 'Versions[*].Size'
Use jq to sum it up:
aws s3api list-object-versions --bucket my-bucket --query 'Versions[*].Size' | jq add
Or, if you need a human readable output:
aws s3api list-object-versions --bucket my-bucket --query 'Versions[*].Size' | jq add | numfmt --to=iec-i --suffix=B
You can also add a prefix in case you want to know the size of a given "folder", and maybe also get the number of version objects:
aws s3api list-object-versions --bucket my-bucket --prefix my-folder --query 'Versions[*].Size' | jq 'length,add'
Or you can use jq to write more complex filters, for example including only non-current objects:
aws s3api list-object-versions --bucket my-bucket --prefix my-folder | jq '[.Versions[]|select(.IsLatest == false)|.Size] | length,add'
If jq is not available, the --output text option unfortunately results in tab-separated values, so here's a hack to force each size onto its own line and then add up the total:
aws s3api list-object-versions --bucket my-bucket --query 'Versions[*].[Size]' --output text | awk '{s+=$1} END {printf "%.0f", s}'
If you have a large number of objects, it might be better to use data provided by the Amazon S3 Storage Inventory:
Amazon S3 inventory provides a comma-separated values (CSV) flat-file output of your objects and their corresponding metadata on a daily or weekly basis for an S3 bucket or a shared prefix (that is, objects that have names that begin with a common string).
You can also use CloudWatch: its S3 storage metrics (BucketSizeBytes) include all object versions.
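As a rough sketch of that approach (the StandardStorage dimension, region, and GNU date usage are assumptions; BucketSizeBytes is published roughly once a day, hence the wide time window):
aws cloudwatch get-metric-statistics --namespace AWS/S3 --metric-name BucketSizeBytes \
  --dimensions Name=BucketName,Value=mybucket Name=StorageType,Value=StandardStorage \
  --start-time "$(date -u -d '2 days ago' +%Y-%m-%dT%H:%M:%SZ)" --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --period 86400 --statistics Average --region us-east-1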

aws s3 ls filter storage class(STANDARD)

How can I list files, but only those in the STANDARD storage class? I want to exclude the GLACIER class.
Currently here is my command:
aws s3 ls s3://Videos/Action/ --human-readable --summarize
The aws s3 ls command doesn't display the Storage Class, but you can do it with this command:
aws s3api list-objects-v2 --bucket Videos --prefix Action --query "Contents[?StorageClass=='STANDARD'].Key" --output text
The output is tab-separated, so you may have to massage it to get your desired format, e.g.:
aws s3api list-objects-v2 --bucket Videos --prefix Action --query "Contents[?StorageClass=='STANDARD'].Key" --output text | sed 's/\t/\n/g'
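If the goal is specifically to exclude GLACIER (so that STANDARD_IA and other classes still appear), a variation of the same query should work, assuming the same bucket and prefix:
aws s3api list-objects-v2 --bucket Videos --prefix Action --query "Contents[?StorageClass!='GLACIER'].Key" --output text | sed 's/\t/\n/g'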
To gain an understanding of how to selectively use the --query option, see:
How to Filter the Output with the --query Option
JMESPath Tutorial

Listing S3 bucket objects with specific storage class

It's very time-consuming to get objects from Glacier, so I decided to use the S3 IA storage class instead.
I need to list all the objects in my bucket that have the Glacier storage class (I configured it via a lifecycle policy) and change them to S3 IA.
Is there any script or a tool for that?
You can do that using list-objects.
list-objects will return the StorageClass; in your case you want to filter for values where it is GLACIER:
aws s3api list-objects --bucket %bucket_name% --query 'Contents[?StorageClass==`GLACIER`]'
What you want then is to get only the list of Keys that match:
aws s3api list-objects --bucket %bucket_name% --query 'Contents[?StorageClass==`GLACIER`][Key]' --output text
Then you will need to copy each object over itself, changing the storage class of the Key:
aws s3api list-objects --bucket %bucket_name% --query 'Contents[?StorageClass==`GLACIER`][Key]' --output text \
| xargs -I {} aws s3 cp s3://bucket_name/{} s3://bucket_name/{} --storage-class STANDARD_IA
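Note that if an object is still in the GLACIER storage class, the copy will fail until it has been restored. A minimal sketch of the restore step (the 7-day retention and Standard retrieval tier are just example values):
aws s3api list-objects --bucket %bucket_name% --query 'Contents[?StorageClass==`GLACIER`][Key]' --output text \
| xargs -I {} aws s3api restore-object --bucket bucket_name --key {} --restore-request '{"Days":7,"GlacierJobParameters":{"Tier":"Standard"}}'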
And if you need to run this from PowerShell on Windows, I had to do this:
aws s3api list-objects --bucket Your_Bucket --query 'Contents[?StorageClass==`STANDARD`][Key]' --output text | foreach { aws s3 cp s3://Your_Bucket/$_ s3://Your_Bucket/$_ --storage-class REDUCED_REDUNDANCY }