AWS S3 CLI copy by modified date - amazon-web-services

I have a large database on AWS S3. I am trying to download only the data files that were modified on 24 June 2022 to my local workstation. I can download the entire database to a given path as shown below, but how can I download only the data files filtered by modified date?
aws s3 cp s3://database2 . --recursive
Answer:
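aws s3api list-objects --bucket "bucket_name" --output text --query "Contents[?LastModified>='2022-06-24' && LastModified<'2022-06-25'].{Key: Key}" | xargs -I {} aws s3 cp s3://bucket_name/{} .
LastModified is reported in UTC. The upper bound restricts the result to 24 June only; without it, everything modified on or after that date is copied.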
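To preview which keys a given day would select before copying anything, the same filter can be sketched on the client side. This is only a minimal sketch, not part of the answer above: keys_modified_on is a made-up helper name, bucket_name is a placeholder, and it assumes keys contain no tab or newline characters.

```shell
# keys_modified_on DAY
# Reads "LastModified<TAB>Key" rows on stdin and prints the keys whose
# timestamp falls on DAY (YYYY-MM-DD, in UTC as reported by S3).
keys_modified_on() {
  awk -F '\t' -v day="$1" 'substr($1, 1, 10) == day { print $2 }'
}

# Hypothetical usage (needs AWS credentials; bucket_name is a placeholder):
#   aws s3api list-objects-v2 --bucket bucket_name \
#       --query 'Contents[].[LastModified,Key]' --output text |
#     keys_modified_on 2022-06-24 |
#     while IFS= read -r key; do aws s3 cp "s3://bucket_name/$key" .; done
```

Reading the keys with while read instead of xargs keeps keys that contain spaces or quotes intact.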

Related

Copy the latest uploaded file from S3 bucket to local machine

I have a cron job that moves files from an EC2 instance to S3:
aws s3 mv --recursive localdir s3://bucket-name/ --exclude "*" --include "localdir/*"
After that, I run aws s3 sync s3://bucket-name/data1/ E:\Datafolder from a .bat file, scheduled with Windows Task Scheduler.
The issue is that the s3 sync command copies all the files under the data1/ prefix.
So let's say I have the following files:
Day1: file1 is synced to local.
Day2: file1 and file2 are synced to local because file1 is removed from the local machine's folder.
I don't want old files to occupy space on the local machine. On Day 2, I want only file2 to be copied over.
Can this be accomplished with AWS CLI commands, or do I need to write a Lambda function?
I followed the answer from "Get last modified object from S3 using AWS CLI",
but on Windows the | and awk commands do not work as expected.
To obtain the name of the object that has the most recent Last Modified date, you can use:
aws s3api list-objects-v2 --bucket BUCKET-NAME --query 'sort_by(Contents, &LastModified)[-1].Key' --output text
Therefore (using shell syntax), you could use:
object=$(aws s3api list-objects-v2 --bucket BUCKET-NAME --prefix data1/ --query 'sort_by(Contents, &LastModified)[-1].Key' --output text)
aws s3 cp "s3://BUCKET-NAME/$object" 'E:\Datafolder'
You might need to tweak it to get it working on Windows.
Basically, it gets the bucket listing, sorts by LastModified, then grabs the name of the last object in the list.
Here is the answer modified to work in a Windows .bat file (cmd.exe):
for /f "delims=" %%i in ('aws s3api list-objects-v2 --bucket BUCKET-NAME --prefix data1/ --query "sort_by(Contents, &LastModified)[-1].Key" --output text') do set object=%%i
aws s3 cp s3://BUCKET-NAME/%object% E:\Datafolder

AWS S3 File merge using CLI

I am trying to merge the contents of all the files in an S3 bucket folder into a new file. The files should be merged in ascending order of their Last Modified timestamps.
I can do this manually by hard-coding the file names, as follows:
(aws s3 cp s3://bucket1/file1 - && aws s3 cp s3://bucket1/file2 - && aws s3 cp s3://bucket1/file3 - ) | aws s3 cp - s3://bucket1/new-file
Now I want to change the CLI command so that the merge works on however many files exist in the folder, sorted by Last Modified. Ideally, the cp command should receive the list of all files in the S3 bucket folder, sorted by Last Modified, and merge them into a new file.
I appreciate everyone's help on this.
Here are some hints.
First, list the files in ascending order of Last Modified (sort_by sorts oldest first).
aws s3api list-objects --bucket bucket1 --query "sort_by(Contents,&LastModified)"
Then pipe the keys into a loop that appends each object to a local file, and upload the result:
aws s3api list-objects --bucket bucket1 --query "sort_by(Contents,&LastModified)" | jq -r '.[].Key' | while IFS= read -r file
do
  echo "$file"
  # Stream the object to stdout and append it to the local file new-file
  aws s3 cp "s3://bucket1/$file" - >> new-file
done
aws s3 cp new-file s3://bucket1/new-file
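If a local temporary file is undesirable, the loop above can stream straight back to S3, in the spirit of the piped command in the question. The following is a sketch under stated assumptions rather than a tested solution: merge_objects and its three arguments are hypothetical names, the ordering is done locally on the LastModified column instead of with jq, and keys are assumed to contain no tabs or newlines.

```shell
# merge_objects BUCKET PREFIX DEST_KEY
# Concatenates every object under PREFIX, oldest first, into DEST_KEY.
merge_objects() {
  bucket=$1 prefix=$2 dest=$3
  aws s3api list-objects-v2 --bucket "$bucket" --prefix "$prefix" \
      --query 'Contents[].[LastModified,Key]' --output text |
    LC_ALL=C sort |        # ISO 8601 timestamps sort chronologically as text
    cut -s -f 2- |         # keep the key column; -s drops rows without a tab
    while IFS= read -r key; do
      # Skip the output of a previous merge so it is not merged into itself.
      [ "$key" = "$dest" ] && continue
      aws s3 cp "s3://$bucket/$key" - </dev/null
    done |
    aws s3 cp - "s3://$bucket/$dest"
}
```

For example, merge_objects bucket1 folder/ folder/new-file would merge everything under folder/ (a placeholder prefix) into folder/new-file.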

aws-cli command in windows to get the latest object from s3 bucket

I am using an AWS CLI command on my Windows machine to get the latest file from an S3 bucket.
aws s3 ls s3://Bucket-name --recursive | sort |tail -n 1
Up to this point, it lists all the files sorted by date:
aws s3 ls s3://Bucket-name --recursive | sort
But running the full command throws an error:
'Tail is not recognized as an internal or external command'.
Is there an alternative to tail, or to the full command?
The AWS CLI permits JMESPath expressions in the --query parameter.
This command shows the most recently-updated object:
aws s3api list-objects --bucket my-bucket --query "sort_by(Contents, &LastModified)[-1].Key" --output text
It's basically saying:
Sort by LastModified
Obtain the last [-1] entry
Show the Key (filename)

Copy Data From AWS S3 Bucket Locally Based on Date of File

I want to copy the latest CSV file, which has the date appended to its name, from an AWS S3 bucket to a local drive.
I have basic code that downloads files, but it downloads all the files in the bucket. I only want the latest file, the one uploaded that day.
Download latest object by modified date
If you only wish to grab the file that was last stored on Amazon S3, you could use:
aws s3 cp s3://my-bucket/`aws s3api list-objects-v2 --bucket my-bucket --query 'sort_by(Contents, &LastModified)[-1].Key' --output text` .
This command does the following:
The inner aws s3api list-objects-v2 command lists the bucket, sorts by LastModified in ascending order, then returns the Key (filename) of the last entry, which is the most recently modified object
The outer aws s3 cp command downloads that object to the local directory
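One caveat worth checking: as far as I know, when --output text is combined with automatic pagination, the CLI evaluates --query once per page (1,000 keys by default), so on larger buckets the inner command can print one candidate per page instead of a single key. A minimal sketch of a workaround is to print the timestamp and key of every object and pick the newest one locally. latest_key is a hypothetical helper name, my-bucket is the placeholder used above, and keys are assumed to contain no tabs or newlines.

```shell
# latest_key
# Reads "LastModified<TAB>Key" rows on stdin and prints the key of the
# most recently modified object (nothing if there are no rows).
latest_key() {
  # Keep only real rows (an empty listing prints "None"), sort by the
  # ISO 8601 timestamp, take the newest row, and drop the timestamp column.
  awk -F '\t' 'NF >= 2' | LC_ALL=C sort | tail -n 1 | cut -f 2-
}

# Hypothetical usage (needs AWS credentials; my-bucket is a placeholder):
#   key=$(aws s3api list-objects-v2 --bucket my-bucket \
#           --query 'Contents[].[LastModified,Key]' --output text | latest_key)
#   [ -n "$key" ] && aws s3 cp "s3://my-bucket/$key" .
```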
Download latest object based on filename
If your filenames are like:
some_file_20190130.csv
some_file_20190131.csv
some_file_20190201.csv
then you can list by prefix and copy the last one:
aws s3 cp s3://my-bucket/`aws s3api list-objects-v2 --bucket my-bucket --prefix some_file_ --query 'sort_by(Contents, &Key)[-1].Key' --output text` .
This command does the following:
The inner aws s3api list-objects-v2 command lists the bucket, returns only objects with the prefix some_file_, sorts them by Key in ascending order, then returns the Key (filename) of the last entry in the sort
The outer aws s3 cp command downloads that object to the local directory
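Sorting by Key works here because zero-padded YYYYMMDD stamps sort the same way as text and as dates. A small offline sketch of that reasoning, using the file names from the example above (latest_by_name is a hypothetical helper, not an AWS CLI feature):

```shell
# latest_by_name
# Reads one key per line and prints the one that sorts last as plain text.
latest_by_name() {
  LC_ALL=C sort | tail -n 1
}

printf '%s\n' some_file_20190131.csv some_file_20190201.csv some_file_20190130.csv |
  latest_by_name
# prints some_file_20190201.csv
```

Names with unpadded or day-first dates would not sort chronologically this way, so sorting by LastModified is the safer choice for those.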

AWS S3 download and copy

We have a bucket in AWS S3 to which backups from production are copied.
My task is to copy the most recent backup file from AWS S3 to the local sandbox SQL Server, then do the restore.
I have installed all of the AWS Tools for Windows on the local server. Credentials to connect to AWS S3 work, etc.
My local server can list all of the files in the AWS S3 bucket. I can successfully download a single file if I name that file explicitly.
Here is a working example that pulls the most recent copy, from July 25, 2016:
aws s3 cp s3://mybucket/databasefile_20160725.zip E:\DBA
My goal is to have a copy script that only pulls the most recent file, which I won't know the name of. I want to schedule this.
Nothing I google or try is getting me the correct syntax to do this.
To retrieve the latest file in your bucket, you can do the following:
aws s3api list-objects --bucket "mybucket" |\
jq '.Contents | sort_by(.LastModified) | .[-1].Key' --raw-output
The first command lists the objects in your bucket as JSON; its elements are described in the AWS CLI documentation for list-objects.
Then jq sorts the elements by their last modified date, takes the last element, and extracts its Key (i.e. the name of the file in the bucket). The --raw-output flag strips the quotes from the key name.
You can reuse that in a script, or pipe it to the s3 cp command as below:
aws s3api list-objects --bucket "mybucket" |\
jq '.Contents | sort_by(.LastModified) | .[-1].Key' --raw-output |\
xargs -I {} aws s3 cp "s3://mybucket/{}" 'E:\DBA'