Amazon S3 Copy files after date and with regex

I'm trying to copy some files from S3 sourceBucket to targetBucket, but I need to filter by date and by prefix.
I wish it could be done with the AWS CLI, but at the moment I'm stuck with the list-objects and cp commands.
I can filter correctly with
aws s3api list-objects-v2 --bucket sourceBucket --query 'Contents[?(LastModified > `2021-09-01`)]' --prefix "somePrefix_"
With cp I can copy the files, but only by prefix:
aws s3 cp s3://sourceBucket/ s3://targetBucket/ --recursive --exclude "*" --include "somePrefix*"
I tried to come up with some ideas using the header --x-amz-copy-source-if-modified-since, but it looks like it can only be used with the aws s3api copy-object command, which copies one item at a time (doc).
I read some answers/docs and I think I understood that the cp command doesn't filter by date, only by prefix.
Do you have any idea on how to solve this?
Thank you in advance!

Since you already have a list of the objects you want to copy to another bucket, I suggest writing a bash script which does the copying for multiple objects:
#!/bin/bash
SOURCE_BUCKET="<my-bucket>"
DESTINATION_BUCKET="<my-other-bucket>"
PREFIX="<some-prefix>"
# List the keys of all objects under the prefix modified after the given date
content=$(aws s3api list-objects-v2 --bucket "$SOURCE_BUCKET" --prefix "$PREFIX" --query 'Contents[?(LastModified > `2021-09-01`)]' | jq -r '.[].Key')
# Copy each matching key to the destination bucket
for file in $content; do
    aws s3api copy-object --copy-source "$SOURCE_BUCKET/$file" --key "$file" --bucket "$DESTINATION_BUCKET" | jq
done
Please note, this script requires jq to be installed.
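If installing jq is not an option, roughly the same loop can be written with --output text instead. This is only a sketch using the same placeholders, and it assumes the object keys contain no whitespace:
#!/bin/bash
SOURCE_BUCKET="<my-bucket>"
DESTINATION_BUCKET="<my-other-bucket>"
PREFIX="<some-prefix>"
# The [Key] projection with --output text prints one key per line
for file in $(aws s3api list-objects-v2 --bucket "$SOURCE_BUCKET" --prefix "$PREFIX" --query 'Contents[?(LastModified > `2021-09-01`)].[Key]' --output text); do
    aws s3api copy-object --copy-source "$SOURCE_BUCKET/$file" --key "$file" --bucket "$DESTINATION_BUCKET"
done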

Related

Copy the latest uploaded file from S3 bucket to local machine

I have a cron job set up that moves files from an EC2 instance to S3:
aws s3 mv --recursive localdir s3://bucket-name/ --exclude "*" --include "localdir/*"
After that I use aws s3 sync s3://bucket-name/data1/ E:\Datafolder in a .bat file and run it with Task Scheduler on Windows.
The issue is that s3 sync command copies all the files in /data1/ prefix.
So, for example:
Day 1: file1 is synced to local.
Day 2: file1 and file2 are both synced to local, because file1 had been removed from the local machine's folder.
I don't want the old files to occupy space on the local machine again. On Day 2, I just want file2 to be copied over.
Can this be accomplished with AWS CLI commands, or do I need to write a Lambda function?
I followed the answer from Get last modified object from S3 using AWS CLI, but on Windows the | and awk commands are not working as expected.
To obtain the name of the object that has the most recent Last Modified date, you can use:
aws s3api list-objects-v2 --bucket BUCKET-NAME --query 'sort_by(Contents, &LastModified)[-1].Key' --output text
Therefore (using shell syntax), you could use:
object=`aws s3api list-objects-v2 --bucket BUCKET-NAME --prefix data1/ --query 'sort_by(Contents, &LastModified)[-1].Key' --output text`
aws s3 cp s3://BUCKET-NAME/$object E:\Datafolder
You might need to tweak it to get it working on Windows.
Basically, it gets the bucket listing, sorts by LastModified, then grabs the name of the last object in the list.
Modified answer to work in a Windows .bat file (uses Windows cmd.exe):
for /f "delims=" %%i in ('aws s3api list-objects-v2 --bucket BUCKET-NAME --prefix data1/ --query "sort_by(Contents, &LastModified)[-1].Key" --output text') do set object=%%i
aws s3 cp s3://BUCKET-NAME/%object% E:\Datafolder
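Put together, a minimal .bat file might look like this (a sketch only; BUCKET-NAME and E:\Datafolder are the placeholders from above):
@echo off
rem Find the most recently modified object under data1/ and copy it locally
for /f "delims=" %%i in ('aws s3api list-objects-v2 --bucket BUCKET-NAME --prefix data1/ --query "sort_by(Contents, &LastModified)[-1].Key" --output text') do set object=%%i
aws s3 cp "s3://BUCKET-NAME/%object%" E:\Datafolder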

AWS CLI Commands

I want to get a list of all files in an S3 bucket with a particular naming pattern.
For example, if I have files like:
aaaa2018-05-01
aaaa2018-05-23
aaaa2018-06-30
aaaa2018-06-21
I need to get a list of all files for the 5th month. The output should look like:
aaaa2018-05-01
aaaa2018-05-23
I executed the following command and the result was empty:
aws s3api list-objects --bucket bucketname --query "Contents[?contains(Key, 'aaaa2018-05-*')]" > s3list05.txt
When I check s3list05.txt, it's empty. I also tried the command below:
aws s3 ls s3:bucketname --recursive | grep aaaa2018-05* > s3list05.txt
This command lists all the objects in the output file.
Kindly let me know the exact command to get the desired output.
You are almost there. Try this:
aws s3 ls s3://bucketname --recursive | grep aaaa2018-05
or
aws s3 ls bucketname --recursive | grep aaaa2018-05
The contains() function doesn't need a wildcard:
aws s3api list-objects --bucket bucketname --query "Contents[?contains(Key, 'aaaa2018-05')].[Key]" --output text
This provides a list of Keys.
--output text removes the JSON formatting.
Using [Key] instead of just Key puts each key on its own line instead of all on one line.
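To write the result into a file, as in the original attempt, the same command can simply be redirected (bucketname is the placeholder used above):
aws s3api list-objects --bucket bucketname --query "Contents[?contains(Key, 'aaaa2018-05')].[Key]" --output text > s3list05.txt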

How to get size of all files in an S3 bucket with versioning?

I know this command can provide the size of all files in a bucket:
aws s3 ls mybucket --recursive --summarize --human-readable
But this does not account for versioning.
If I run this command:
aws s3 ls s3://mybucket/myfile --human-readable
It will show something like "100 MiB" but it may have 10 versions of this file which will be more like "1 GiB" total.
The closest I have is getting the sizes of every version of a given file:
aws s3api list-object-versions --bucket mybucket --prefix "myfile" --query 'Versions[?StorageClass=`STANDARD`].Size' > /tmp/s3_myfile_version_sizes
Then take the sum of all version sizes.
But I would have to rerun this command for every file in a bucket.
Is there an easier way to do this?
You can run list-object-versions on the bucket as a whole:
aws s3api list-object-versions --bucket my-bucket --query 'Versions[*].Size'
Use jq to sum it up:
aws s3api list-object-versions --bucket my-bucket --query 'Versions[*].Size' | jq add
Or, if you need a human readable output:
aws s3api list-object-versions --bucket my-bucket --query 'Versions[*].Size' | jq add | numfmt --to=iec-i --suffix=B
You can also add a prefix in case you want to know the size of a given "folder", and maybe also get the number of version objects:
aws s3api list-object-versions --bucket my-bucket --prefix my-folder --query 'Versions[*].Size' | jq 'length,add'
Or you can use jq filtering to write more complex filters, for example, including only non-current objects:
aws s3api list-object-versions --bucket my-bucket --prefix my-folder | jq '[.Versions[]|select(.IsLatest == false)|.Size] | length,add'
If jq is not available, using the --output text option unfortunately results in tab-separated values, so here's a hack to force it to separate lines and then add up the total:
aws s3api list-object-versions --bucket my-bucket --query 'Versions[*].[Size,Size]' --output text | awk '{s+=$1} END {printf "%.0f", s}'
If you have a large number of objects, it might be better to use data provided by the Amazon S3 Storage Inventory:
Amazon S3 inventory provides a comma-separated values (CSV) flat-file output of your objects and their corresponding metadata on a daily or weekly basis for an S3 bucket or a shared prefix (that is, objects that have names that begin with a common string).
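If you go the inventory route, summing the sizes from a CSV-format report could look roughly like this. It is only a sketch: the column position of Size depends on which optional fields you enabled in your inventory configuration (the 6th column is assumed here), the inventory-*.csv.gz pattern stands in for the downloaded report files, and keys must not contain commas or quotes:
# Sum the assumed Size column (6th) across all downloaded inventory CSV files
gunzip -c inventory-*.csv.gz | tr -d '"' | awk -F',' '{s+=$6} END {print s}'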
Use CloudWatch; its BucketSizeBytes storage metric accounts for all object versions.
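For reference, a sketch of how that metric might be queried from the CLI (the bucket name, dates and storage type are placeholders/assumptions):
aws cloudwatch get-metric-statistics --namespace AWS/S3 --metric-name BucketSizeBytes --dimensions Name=BucketName,Value=my-bucket Name=StorageType,Value=StandardStorage --start-time 2021-09-01T00:00:00Z --end-time 2021-09-03T00:00:00Z --period 86400 --statistics Average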

AWS S3 CLI CP file and add metadata

Trying to copy a local file named test.txt to my s3 bucket and add metadata to the file.
But it always prints an error:
argument --metadata-directive: Invalid choice, valid choices are: COPY | REPLACE
Is it possible to do this with the cp command? As I understand the docs, it should be possible.
AWS CLI CP DOCS
These are the commands I've tried:
aws s3 cp test.txt s3://a-bucket/test.txt --metadata x-amz-meta-cms-id:34533452
aws s3 cp test.txt s3://a-bucket/test.txt --metadata-directive COPY --metadata x-amz-meta-cms-id:34533452
aws s3 cp test.txt s3://a-bucket/test.txt --metadata-directive COPY --metadata '{"x-amz-meta-cms-id":"34533452"}'
aws s3 cp test.txt s3://a-bucket/test.txt --metadata '{"x-amz-meta-cms-id":"34533452"}'
aws --version:
aws-cli/1.9.7 Python/2.7.10 Darwin/16.1.0 botocore/1.3.7
OS: macOS Sierra version 10.12.1
Edit
Worth mentioning is that uploading a file without the --metadata flag works fine.
Hmm, I've checked the help for my version of the CLI with aws s3 cp help.
It turns out it does not list --metadata as an option, unlike the docs at the link above.
If running an older version of the AWS CLI, use aws s3api put-object instead.
How to upload a file to a bucket and add metadata:
aws s3api put-object --bucket a-bucket --key test.txt --body test.txt --metadata '{"x-amz-meta-cms-id":"34533452"}'
Docs: AWS S3API DOCS
Indeed, support for the metadata option was added in 1.9.10:
aws s3: Added support for custom metadata in cp, mv, and sync.
So upgrade your AWS CLI to that version (or even better, to the latest); also, the metadata value needs to be a map:
aws s3 cp test.txt s3://a-bucket/test.txt --metadata '{"x-amz-meta-cms-id":"34533452"}'
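To confirm the metadata was actually applied, the object can be inspected afterwards; head-object returns a Metadata map in its JSON output (bucket and key names taken from the question):
aws s3api head-object --bucket a-bucket --key test.txt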
Install s3cmd tools (free) and invoke like so:
s3cmd modify --add-header x-amz-meta-foo:bar s3://<bucket>/<object>
With the x-amz-meta-foo:bar header you will get foo as the key and bar as its value.
There are special flags to set Content-Type and Content-Encoding:
aws s3 cp test.gz s3://a-bucket/test.gz --content-type application/octet-stream --content-encoding gzip
There is a bug with the metadata directive "COPY" option: the new --metadata supplied alongside it is ignored, as in:
aws s3api copy-object --bucket testkartik --copy-source testkartik/costs.csv --key costs.csv --metadata-directive "COPY" --metadata "SomeKey=SomeValue"
Below are the three steps to understand the CLI command with a jq workaround.
Install the jq library to deal with the JSON metadata on the command line.
Read the existing metadata:
aws s3api head-object --bucket <bucket> --key <key> | jq '.Metadata' | jq --compact-output '. +{"new":"metadata", "another" : "metadata"}'
Add the new metadata:
aws s3api copy-object --bucket <bucket-name> --copy-source <bucket/key> --key <key> --metadata-directive "REPLACE" --metadata $(READ-THE-EXISTING-From-Step-2)
The complete command in one go:
aws s3api copy-object --bucket <bucket-name> --copy-source <bucket/key> --key <key> --metadata-directive "REPLACE" --metadata $(aws s3api head-object --bucket <bucket> --key <key> | jq '.Metadata' | jq --compact-output '. +{"new":"metadata", "another" : "metadata"}')

AWS CLI move all files with condition

I must move into another bucket only the files changed during the year 2015. How can I write this condition?
aws s3 mv <condition??> s3://bucket1 s3://bucket2 --recursive
I don't think you can do that directly with the s3 command.
What you can do, though, is a two-step approach:
Get the list of files that have been modified after a given date:
aws s3api list-objects --bucket bucket1 --query 'Contents[?LastModified > `2015-01-01`].[Key]' --output text
Based on this list you can move the items.
I have not tried it and I'm not a shell expert, but something along these lines should work:
aws s3api list-objects --bucket bucket1 --query 'Contents[?LastModified > `2015-01-01`].[Key]' --output text | xargs -I {} aws s3 mv "s3://bucket1/{}" "s3://bucket2/{}"
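Since the question asks specifically for files changed during 2015, the JMESPath filter could be bounded on both sides. A sketch only (untested, and it assumes the keys contain no whitespace):
aws s3api list-objects --bucket bucket1 --query 'Contents[?LastModified >= `2015-01-01` && LastModified < `2016-01-01`].[Key]' --output text | xargs -I {} aws s3 mv "s3://bucket1/{}" "s3://bucket2/{}"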