Get last modified object from S3 using AWS CLI - amazon-web-services

I have a use case where I programmatically bring up an EC2 instance, copy an executable file from S3, run it, and shut down the instance (done in user-data). I need to get only the last added file from S3.
Is there a way to get the last modified file/object from an S3 bucket using the AWS CLI tool?

You can list all the objects in the bucket with aws s3 ls $BUCKET --recursive:
$ aws s3 ls $BUCKET --recursive
2015-05-05 15:36:17 4 an_object.txt
2015-06-08 14:14:44 16322599 some/other/object
2015-04-29 12:09:29 32768 yet-another-object.sh
They're sorted alphabetically by key, but that first column is the last modified time. A quick sort will reorder them by date:
$ aws s3 ls $BUCKET --recursive | sort
2015-04-29 12:09:29 32768 yet-another-object.sh
2015-05-05 15:36:17 4 an_object.txt
2015-06-08 14:14:44 16322599 some/other/object
tail -n 1 selects the last row, and awk '{print $4}' extracts the fourth column (the name of the object).
$ aws s3 ls $BUCKET --recursive | sort | tail -n 1 | awk '{print $4}'
some/other/object
Last but not least, drop that into aws s3 cp to download the object:
$ KEY=`aws s3 ls $BUCKET --recursive | sort | tail -n 1 | awk '{print $4}'`
$ aws s3 cp s3://$BUCKET/$KEY ./latest-object
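Putting the steps together, here is a minimal sketch as a reusable script (passing the bucket name as the first argument is my own assumption). Note that the awk '{print $4}' step only captures the first word of keys that contain spaces.
#!/usr/bin/env bash
# Hypothetical wrapper around the pipeline above.
# Usage: ./get-latest.sh my-bucket
set -euo pipefail
BUCKET="$1"
KEY=$(aws s3 ls "$BUCKET" --recursive | sort | tail -n 1 | awk '{print $4}')
aws s3 cp "s3://$BUCKET/$KEY" ./latest-object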

Updated answer
After a while, here is a small update on how to do it a bit more elegantly:
aws s3api list-objects-v2 --bucket "my-awesome-bucket" --query 'sort_by(Contents, &LastModified)[-1].Key' --output=text
Instead of an extra reverse function, we can get the last entry from the list via [-1].
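For completeness, a quick sketch of feeding that straight into a download (the bucket name is a placeholder):
KEY=$(aws s3api list-objects-v2 --bucket "my-awesome-bucket" --query 'sort_by(Contents, &LastModified)[-1].Key' --output=text)
# Download the newest object into the current directory
aws s3 cp "s3://my-awesome-bucket/$KEY" ./latest-object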
Old answer
This command just does the job without any external dependencies:
aws s3api list-objects-v2 --bucket "my-awesome-bucket" --query 'reverse(sort_by(Contents, &LastModified))[:1].Key' --output=text

aws s3api list-objects-v2 --bucket "bucket-name" | jq -r '.Contents | max_by(.LastModified) | .Key'

If this is a freshly uploaded file, you can use Lambda to execute a piece of code on the new S3 object.
If you really need to get the most recent one, you can name your files with the date first, sort by name, and take the last object.
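If you go the naming route, a rough sketch of what that looks like (bucket and prefix are made up here): because the date comes first in the key, a reverse lexicographic sort puts the most recent key at the top.
# Hypothetical bucket/prefix with date-first keys such as 2023-07-01-backup.tar.gz
aws s3 ls s3://my-backup-bucket/backups/ | awk '{print $4}' | sort -r | head -n 1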

The following is a bash script that downloads the latest file from an S3 bucket. I used the AWS S3 sync command instead, so that it would not download the file from S3 if it already exists locally.
--exclude excludes all the files
--include includes all the files matching the pattern
#!/usr/bin/env bash
BUCKET="s3://my-s3-bucket-eu-west-1/list/"
FILE_NAME=`aws s3 ls $BUCKET | sort | tail -n 1 | awk '{print $4}'`
TARGET_FILE_PATH=target/datdump/
TARGET_FILE=${TARGET_FILE_PATH}localData.json.gz
echo $FILE_NAME
echo $TARGET_FILE
aws s3 sync $BUCKET $TARGET_FILE_PATH --exclude "*" --include "*$FILE_NAME*"
cp target/datdump/$FILE_NAME $TARGET_FILE
P.S. Thanks @David Murray

Related

Amazon S3 Copy files after date and with regex

I'm trying to copy some files from S3 sourceBucket to targetBucket, but I need to filter by date and by prefix.
I wish it could be done with AWS CLI, but at the moment I'm stuck with list-object or with cp command.
I can filter correctly with
aws s3api list-objects-v2 --bucket sourceBucket --query 'Contents[?(LastModified > `2021-09-01`)]' --prefix "somePrefix_"
With cp I can copy the files, but only by prefix:
aws s3 cp s3://sourceBucket/ s3://targetBucket/ --recursive --include "somePrefix" --exclude "*"
I tried to come up with some ideas using the header --x-amz-copy-source-if-modified-since, but it looks like it can only be used with the aws s3api copy-object command, which copies one item at a time (doc).
I read some answers/docs and I think I understood that the cp command doesn't filter by date, but only by prefix.
Do you have any idea on how to solve this?
Thank you in advance!
Since you already have a list with objects you want to copy to another bucket, I suggest writing a bash script which does the copying for multiple objects:
#!/bin/bash
SOURCE_BUCKET="<my-bucket>"
DESTINATION_BUCKET="<my-other-bucket>"
PREFIX="<some-prefix>"
content=$(aws s3api list-objects-v2 --bucket $SOURCE_BUCKET --query 'Contents[?(LastModified > `2021-09-01`)]' --prefix $PREFIX | jq -r ".[].Key")
for file in $content; do
  aws s3api copy-object --copy-source $SOURCE_BUCKET/$file --key $file --bucket $DESTINATION_BUCKET | jq
done
Please note, this script requires jq to be installed.
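If jq is not available, a roughly equivalent sketch can let the CLI emit one key per line with --output text instead (same placeholder variables as above):
# Same date filter as above, without jq; note that keys containing spaces
# would need a while-read loop instead of word splitting
keys=$(aws s3api list-objects-v2 --bucket "$SOURCE_BUCKET" --prefix "$PREFIX" --query 'Contents[?(LastModified > `2021-09-01`)].[Key]' --output text)
for file in $keys; do
  aws s3api copy-object --copy-source "$SOURCE_BUCKET/$file" --key "$file" --bucket "$DESTINATION_BUCKET"
done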

How to sort ascending order by last modified date for s3 using aws cli

The code below sorts in descending order. How do I have it sort in ascending order?
KEY=`aws s3 ls $BUCKET --recursive | sort | tail -n 1 | awk '{print $4}'`
It appears that you wish to obtain the Key of the most recently modified object in the Amazon S3 bucket.
For that, you can use:
aws s3api list-objects --bucket bucketname --query 'sort_by(Contents, &LastModified)[-1].Key' --output text
The AWS CLI --query parameter is highly capable. It uses JMESPath, which can do most required manipulations without needing to pipe data.
The aws s3api list-objects command provides information in specific fields, whereas the aws s3 ls command simply provides text output.
The above might not work as expected if there are more than 1000 objects in the bucket, since results are returned in batches of 1000.
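If the objects of interest share a key prefix, narrowing the listing with --prefix keeps the JMESPath sort cheap even in a large bucket; a sketch with made-up names:
# Hypothetical bucket and prefix
aws s3api list-objects --bucket bucketname --prefix "backups/2021/" --query 'sort_by(Contents, &LastModified)[-1].Key' --output text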
Use sort -r to reverse the sort order.
From the manpage for sort
-r, --reverse
reverse the result of comparisons

AWS CLI Commands

I want to get list of all files in S3 bucket with particular naming pattern.
For example, if I have files like:
aaaa2018-05-01
aaaa2018-05-23
aaaa2018-06-30
aaaa2018-06-21
I need to get a list of all files for the 5th month. The output should look like:
aaaa2018-05-01
aaaa2018-05-23
I executed the following command and the result was empty:
aws s3api list-objects --bucket bucketname --query "Contents[?contains(Key, 'aaaa2018-05-*')]" > s3list05.txt
When I check s3list05.txt, it is empty. I also tried the command below:
aws s3 ls s3:bucketname --recursive | grep aaaa2018-05* > s3list05.txt
and this command lists all the objects present in the bucket.
Kindly let me know the exact command to get the desired output.
You are almost there. Try this:
aws s3 ls s3://bucketname --recursive | grep aaaa2018-05
or
aws s3 ls bucketname --recursive | grep aaaa2018-05
The contains() function doesn't need a wildcard:
aws s3api list-objects --bucket bucketname --query "Contents[?contains(Key, 'aaaa2018-05')].[Key]" --output text
This provides a list of Keys.
--output text removes the JSON formatting.
Using [Key] instead of just Key puts each key on its own line instead of all on one line.
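Putting it together for the naming pattern in the question (the bucket name is a placeholder), this writes one matching key per line to the file the asker wanted:
# Produces s3list05.txt with one key per line
aws s3api list-objects --bucket bucketname --query "Contents[?contains(Key, 'aaaa2018-05')].[Key]" --output text > s3list05.txt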

How to delete multiple files in S3 bucket with AWS CLI

Suppose I have an S3 bucket named x.y.z
In this bucket, I have hundreds of files. But I only want to delete 2 files named purple.gif and worksheet.xlsx
Can I do this from the AWS command line tool with a single call to rm?
This did not work:
$ aws s3 rm s3://x.y.z/worksheet.xlsx s3://x.y.z/purple.gif
Unknown options: s3://x.y.z/purple.gif
From the manual, it doesn't seem like you can delete a list of files explicitly by name. Does anyone know a way to do it? I prefer not using the --recursive flag.
You can do this by providing an --exclude or --include argument multiple times. But, you'll have to use --recursive for this to work.
When there are multiple filters, remember that the order of the filter parameters is important. The rule is the filters that appear later in the command take precedence over filters that appear earlier in the command.
aws s3 rm s3://x.y.z/ --recursive --exclude "*" --include "purple.gif" --include "worksheet.xlsx"
Here, all files will be excluded from the command except for purple.gif and worksheet.xlsx.
If you're unsure, always try a --dryrun first and inspect which files will be deleted.
Source: Use of Exclude and Include Filters
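As suggested above, a sketch of the same filters with --dryrun first; it prints the delete operations that would run without actually deleting anything:
aws s3 rm s3://x.y.z/ --recursive --dryrun --exclude "*" --include "purple.gif" --include "worksheet.xlsx"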
s3 rm cannot delete multiple files, but you can use s3api delete-objects to achieve what you want here.
Example
aws s3api delete-objects --bucket x.y.z --delete '{"Objects":[{"Key":"worksheet.xlsx"},{"Key":"purple.gif"}]}'
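If the list of keys is long, the same call can read the delete structure from a file via the CLI's file:// parameter syntax (delete-objects accepts up to 1,000 keys per call); a sketch with a hypothetical delete.json:
# Hypothetical file name; Quiet suppresses the per-object entries in the response
cat > delete.json <<'EOF'
{"Objects": [{"Key": "worksheet.xlsx"}, {"Key": "purple.gif"}], "Quiet": true}
EOF
aws s3api delete-objects --bucket x.y.z --delete file://delete.json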
Apparently aws s3 rm works only on individual files/objects.
Below is a bash command that constructs individual delete commands and then removes the objects one by one. It works with some success (it might be a bit slow, but it works):
aws s3 ls s3://bucketname/foldername/ |
awk {'print "aws s3 rm s3://bucketname/foldername/" $4'} |
bash
The first two lines are meant to construct the "rm" commands and the 3rd line (bash) will execute them.
Note that you might face issues if your object names have spaces or funny characters, because the awk-based parsing of the "aws s3 ls" output breaks on such names.
This command deletes all files in a bucket:
aws s3 rm s3://bucketname --recursive
If you are using the AWS CLI, you can filter ls results with a grep regex and delete them. For example:
aws s3 ls s3://BUCKET | awk '{print $4}' | grep -E -i '^2015-([0-9][0-9])\-([0-9][0-9])\-([0-9][0-9])\-([0-9][0-9])\-([0-9][0-9])\-([0-9a-zA-Z]*)' | xargs -I% bash -c 'aws s3 rm s3://BUCKET/%'
This is slow but it works
This solution will work when you want to specify a wildcard for the object name.
aws s3 ls dmap-live-dwh-files/backup/mongodb/oms_api/hourly/ | grep order_2019_08_09_* | awk {'print "aws s3 rm s3://dmap-live-dwh-files/backup/mongodb/oms_api/hourly/" $4'} | bash
I found this one useful to run from the command line. I had more than 4 million files and it took almost a week to empty the bucket. It comes in handy, as the AWS console is not descriptive with its logs.
Note: You need the jq tool installed.
aws s3api list-object-versions --bucket YOURBUCKETNAMEHERE-processed \
--output json --query 'Versions[].[Key, VersionId]' \
| jq -r '.[] | "--key '\''" + .[0] + "'\'' --version-id " + .[1]' \
| xargs -L1 aws s3api delete-object --bucket YOURBUCKETNAMEHERE
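On a versioned bucket, the delete markers usually need the same treatment before the bucket is truly empty; a sketch that follows the exact same pattern, assuming the bucket actually has delete markers, only switching the queried field:
aws s3api list-object-versions --bucket YOURBUCKETNAMEHERE-processed \
    --output json --query 'DeleteMarkers[].[Key, VersionId]' \
    | jq -r '.[] | "--key '\''" + .[0] + "'\'' --version-id " + .[1]' \
    | xargs -L1 aws s3api delete-object --bucket YOURBUCKETNAMEHERE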
You can delete multiple files using aws s3 rm. If you want to delete all files in a specific folder, just use
aws s3 rm --recursive --region <AWS_REGION> s3://<AWS_BUCKET>/<FOLDER_PATH>/
first test it with the --dryrun option!
Quick way to delete a very large Folder in AWS
AWS_PROFILE=<AWS_PROFILE> AWS_BUCKET=<AWS_BUCKET> AWS_FOLDER=<AWS_FOLDER>; aws --profile $AWS_PROFILE s3 ls "s3://${AWS_BUCKET}/${AWS_FOLDER}/" | awk '{print $4}' | xargs -P8 -n1000 bash -c 'aws --profile '${AWS_PROFILE}' s3api delete-objects --bucket '${AWS_BUCKET}' --delete "Objects=[$(printf "{Key='${AWS_FOLDER}'/%s}," "$@")],Quiet=true" >/dev/null 2>&1'
PS: You might have to launch this 2-3 times because sometimes some deletions fail...

Downloading the latest file in an S3 bucket using AWS CLI? [duplicate]

This question already has answers here:
Get last modified object from S3 using AWS CLI
(5 answers)
I have an S3 bucket that contains database backups. I am creating a script to download the latest backup (and eventually restore it somewhere else), but I'm not sure how to go about only grabbing the most recent file from a bucket.
Is it possible to copy only the most recent file from an S3 bucket to a local directory using AWS CLI tools?
And here is a bash script created based on @error2007s's answer. This script requires your AWS profile and bucket name as variables, and it downloads the latest object to your ~/Downloads folder:
#!/bin/sh
PROFILE=your_profile
BUCKET=your_bucket
OBJECT="$(aws s3 ls --profile $PROFILE $BUCKET --recursive | sort | tail -n 1 | awk '{print $4}')"
aws s3 cp s3://$BUCKET/$OBJECT ~/Downloads/$OBJECT --profile $PROFILE
FILE=$(aws s3api list-objects-v2 --bucket "$BUCKET_NAME" --query "reverse(sort_by(Contents[?contains(Key, '$FILE_NAME_FILTER')], &LastModified))[:1].Key" --output=text); aws s3 cp "s3://$BUCKET_NAME/$FILE" .
$BUCKET_NAME - the bucket you want to download from.
$FILE_NAME_FILTER - a string used as a filter for the name you want to match.
aws s3 cp "..." - it is in double quotes so that files with spaces in their names are also handled.
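For illustration, a usage sketch with made-up values for both variables:
# Made-up example values; downloads the newest object whose key contains "db-backup"
BUCKET_NAME="my-backups"
FILE_NAME_FILTER="db-backup"
FILE=$(aws s3api list-objects-v2 --bucket "$BUCKET_NAME" --query "reverse(sort_by(Contents[?contains(Key, '$FILE_NAME_FILTER')], &LastModified))[:1].Key" --output=text)
aws s3 cp "s3://$BUCKET_NAME/$FILE" .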
The above solutions use Bash. If one wants to do the same thing in PowerShell for downloading on Windows, here is the script:
# This assumes AWS CLI exe is in your path.
$s3location = "s3://bucket-name"
$files = $(aws s3 ls $s3location --recursive | sort | select -last 3)
$dlPath = "C:\TEMP"
foreach ($s3FileInfo in $files) {
    $filename = $s3FileInfo.Split()[-1]
    $path = "${s3location}/${filename}"
    aws s3 cp $path $dlPath
    echo("Done downloading ${path} to ${dlPath}")
}