Copy the latest uploaded file from S3 bucket to local machine - amazon-web-services

I have a cron job set that moves the files from an EC2 instance to S3
aws s3 mv --recursive localdir s3://bucket-name/ --exclude "*" --include "localdir/*"
After that I use aws s3 sync s3://bucket-name/data1/ E:\Datafolder in .bat file and run task scheduler in Windows to run the command.
The issue is that s3 sync command copies all the files in /data1/ prefix.
So let's say I have the following files:
Day1: file1 is synced to local.
Day2: file1 and file2 are synced to local because file1 is removed from the local machine's folder.
I don't want them to occupy space on local machine. On Day 2, I just want file2 to be copied over.
Can this be accomplished by AWS CLI commands? or do I need to write a lambda function?
I followed the answer from Get last modified object from S3 using AWS CLI
but on Windows, the | and awk commands are not working as expected.

To obtain the name of the object that has the most recent Last Modified date, you can use:
aws s3api list-objects-v2 --bucket BUCKET-NAME --query 'sort_by(Contents, &LastModified)[-1].Key' --output text
Therefore (using shell syntax), you could use:
object=`aws s3api list-objects-v2 --bucket BUCKET-NAME --prefix data1/ --query 'sort_by(Contents, &LastModified)[-1].Key' --output text`
aws s3 cp s3://BUCKET-NAME/$object E:\Datafolder
You might need to tweak it to get it working on Windows.
Basically, it gets the bucket listing, sorts by LastModified, then grabs the name of the last object in the list.

Modified answer to work with Windows .bat file. Uses Windows cmd.exe
for /f "delims=" %%i in ('aws s3api list-objects-v2 --bucket BUCKET-NAME --prefix data1/ --query "sort_by(Contents, &LastModified)[-1].Key" --output text') do set object=%%i
aws s3 cp s3://BUCKET-NAME/%object% E:\Datafolder

Related

Amazon S3 Copy files after date and with regex

I'm trying to copy some files from S3 sourceBucket to targetBucket, but I need to filter by date and by prefix.
I wish it could be done with AWS CLI, but at the moment I'm stuck with list-object or with cp command.
I can filter correctly with
aws s3api list-objects-v2 --bucket sourceBucket --query 'Contents[?(LastModified > `2021-09-01`)]' --prefix "somePrefix_"
With the CP I can copy the files, but only by prefix
aws s3 cp s3://sourceBucket/ s3://targetBucket/ --recursive --include "somePrefix" --exclude "*"
I tried to come up with some ideas using the header --x-amz-copy-source-if-modified-since but it looks like you can use it with the command aws s3api copy-object and it copies one item at a time (doc).
I read some answers/docs and I think I understood che cp command doesn't filter by date, but only by prefix.
Do you have any idea on how to solve this?
Thank you in advance!
Since you already have a list with objects you want to copy to another bucket, I suggest writing a bash script which does the copying for multiple objects:
#!/bin/bash
SOURCE_BUCKET="<my-bucket>"
DESTINATION_BUCKET="<my-other-bucket>"
PREFIX="<some-prefix>"
content=$(aws s3api list-objects-v2 --bucket $SOURCE_BUCKET --query 'Contents[?(LastModified > `2021-09-01`)]' --prefix $PREFIX | jq -r ".[].Key")
for file in $content;
do
aws s3api copy-object --copy-source $SOURCE_BUCKET/$file --key $file --bucket $DESTINATION_BUCKET | jq
done
Please note, this scripts requires jq to be installed.

Why is S3 CLI returning no CommonPrefixes?

I'm trying to list the 'folders' in a S3 bucket under a given prefix.
aws --profile my-profile s3api list-objects-v2 --bucket my-bucket --prefix releases/com/example/app/ --delimiter / --query 'CommonPrefixes[*].Prefix'
There are dozens of folders under the prefix, each containing many files, I I should be able to list the folders., i.e. they do exist
There is no CommonPrefixes returned by this query, so I get null as an output. What am I doing wrong?
running aws-cli/2.0.0 Python/3.7.5 Windows/10 botocore/2.0.0dev4 in Git Bash terminal on Windows.

Copy Data From AWS S3 Bucket Locally Based on Date of File

I want to copy the Latest CSV file which has the date appended from an AWS S3 bucket to a local drive.
I have the basic code that will download the file but it downloads all the files in the bucket I only want the file uploaded that day, latest file.
Download latest object by modified date
If you only wish to grab the file that was last stored on Amazon S3, you could use:
aws s3 cp s3://my-bucket/`aws s3api list-objects-v2 --bucket my-bucket --query 'sort_by(Contents, &LastModified)[-1].Key' --output text` .
This command does the following:
The inner aws s3api list-objects-v2 command lists the bucket, sorts by date (reversed), then returns the Key (filename) of the object that was last modified
The outer aws s3 cp command downloads that object to the local directory
Download latest object based on filename
If your filenames are like:
some_file_20190130.csv
some_file_20190131.csv
some_file_20190201.csv
then you can list by prefix and copy the last one:
aws s3 cp s3://my-bucket/`aws s3api list-objects-v2 --bucket my-bucket --prefix some_file_ --query 'sort_by(Contents, &Key)[-1].Key' --output text` .
This command does the following:
The inner aws s3api list-objects-v2 command lists the bucket, only shows files with a given prefix of some_file_, sorts by Key (reversed), then returns the Key (filename) of the object that is at the end of the sort
The outer aws s3 cp command downloads that object to the local directory

AWS CLI Download list of S3 files

We have ~400,000 files on a private S3 bucket that are inbound/outbound call recordings. The files have a certain pattern to it that lets me search for numbers both inbound and outbound. Note these calls are on the Glacier storage class
Using AWS CLI, I can search through this bucket and grep the files I need out. What I'd like to do is now initiate an S3 restore job to expedited retrieval (so ~1-5 minute recovery time), and then maybe 30 minutes later run a command to download the files.
My efforts so far:
aws s3 ls s3://exetel-logs/ --recursive | grep .*042222222.* | cut -c 32-
Retreives the key of about 200 files. I am unsure of how to proceed next, as aws s3 cp wont work for any objects in storage class.
Cheers,
The AWS CLI has two separate commands for S3: s3 ands3api. s3 is a high level abstraction with limited features, so for restoring files, you'll have to use one of the commands available with s3api:
aws s3api restore-object --bucket exetel-logs --key your-key
If you afterwards want to copy the files, but want to ensure to only copy files which were restored from Glacier, you can use the following code snippet:
for key in $(aws s3api list-objects-v2 --bucket exetel-logs --query "Contents[?StorageClass=='GLACIER'].[Key]" --output text); do
if [ $(aws s3api head-object --bucket exetel-logs --key ${key} --query "contains(Restore, 'ongoing-request=\"false\"')") == true ]; then
echo ${key}
fi
done
Have you considered using a high-level language wrapper for the AWS CLI? It will make these kinds of tasks easier to integrate into your workflows. I prefer the Python implementation (Boto 3). Here is example code for how to download all files from an S3 bucket.

AWS CLI move all files with condition

I must move into another bucket only files changed in the year 2015. How can I write this condition?
aws s3 mv <condition??> s3://bucket1 s3://bucket2 --recursive
I don't think you can directly do that through through the s3 option.
what you can do though is a 2 steps approach:
get the list of files that have been modified after a date
aws s3api list-objects --bucket bucket1" --query 'Contents[?LastModified > `2015-01-01`].[Key]' --output text
Based on this list you can move the items.
I have not tried and not an shell expert but something around this
aws s3api list-objects --bucket "<YOUR_BUCKET>" --query 'Contents[?LastModified > `2015-01-01`].[Key]' --output text | xargs aws s3 mv s3://bucket2/ -