AWS S3 download and copy

We have a bucket in AWS S3 where backups from production are being copied to.
My task is to copy the most recent backup file from AWS S3 to the local sandbox SQL Server, then do the restore.
I have installed all of the AWS tools for Windows on the local server. Credentials to connect to AWS S3 work, etc.
My local server can list all of the files in the AWS S3 bucket. I can successfully download a single file if I specifically name that file.
Here is a working example that pulls the most recent copy, from July 25, 2016:
aws s3 cp s3://mybucket/databasefile_20160725.zip E:\DBA
My goal is to have a copy script that only pulls the most recent file, which I won't know the name of. I want to schedule this.
Nothing I google or try is getting me the correct syntax to do this.

To retrieve the latest file in your bucket, you can do the following:
aws s3api list-objects --bucket "mybucket" |\
jq '.Contents | sort_by(.LastModified) | .[-1].Key' --raw-output
The first command lists the objects of your bucket as JSON.
Then you sort the elements by their last modified date, take the last element, and select its Key (i.e. the name of the file in the bucket). The --raw-output flag strips the quotes from the key name.
You can reuse that in a script, or pipe it into the s3 cp command like below:
aws s3api list-objects --bucket "mybucket" |\
jq '.Contents | sort_by(.LastModified) | .[-1].Key' --raw-output |\
xargs -I {} aws s3 cp s3://mybucket/{} E:\DBA
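Since the goal is a scheduled job, the whole pipeline can be wrapped in a small script; a minimal sketch, assuming the bucket name and E:\DBA destination from the question and a Unix-like shell on the Windows server (e.g. Git Bash) where jq is available:
#!/bin/bash
# Find the newest object in the bucket, then download it for the restore.
latest=$(aws s3api list-objects --bucket "mybucket" |
    jq -r '.Contents | sort_by(.LastModified) | .[-1].Key')
aws s3 cp "s3://mybucket/${latest}" "E:\DBA"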

Related

AWS S3 CLI copy by modified date

I have a large database on AWS S3. I am trying to download only the data files that were modified on 24 June 2022 to my local workstation. I can download the entire database to a given path as below, but how can I download just the data files filtered by modified date?
aws s3 cp s3://database2 .
Answer:
aws s3api list-objects --bucket "bucket_name" --output text --query "Contents[?LastModified>='2022-06-24'].{Key: Key}" | xargs -I {} aws s3 cp s3://bucket_name/{} .
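Note that LastModified>='2022-06-24' also matches everything modified after that day; to restrict the download to 24 June only, bound the filter on both sides. A sketch with the same placeholder bucket name:
aws s3api list-objects --bucket "bucket_name" --output text --query "Contents[?LastModified>='2022-06-24' && LastModified<'2022-06-25'].{Key: Key}" | xargs -I {} aws s3 cp s3://bucket_name/{} .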

How to delete a bucket starting with a name using aws cli?

Is there any aws cli command to delete all buckets starting with a specific name?
When I ran:
aws s3 ls
I get so many buckets that we need to clean up frequently. Given such a command, I can create a pipeline for it in GitLab and use that for cleanup.
As you can see, I want to delete all the buckets starting with somename-.
I tried using
aws s3 rb --force s3://somename-*
It didn't work.
There's no built-in way to accomplish this. On Unix-like platforms, you can list all of the buckets, use some tools to filter the list, and call the CLI to remove all of the buckets that match some pattern:
aws s3api list-buckets --query 'Buckets[].[Name]' --output text | grep "^somename-" | xargs -n1 -IB echo aws s3 rb s3://B
Remove the "echo " after verifying the command will remove the buckets you want removed.

AWS S3 File merge using CLI

I am trying to combine/merge the contents of all the files in an S3 bucket folder into a new file. The merge should follow the ascending order of each file's Last Modified timestamp.
I am able to do that manually with hard-coded file names, as follows:
(aws s3 cp s3://bucket1/file1 - && aws s3 cp s3://bucket1/file2 - && aws s3 cp s3://bucket1/file3 - ) | aws s3 cp - s3://bucket1/new-file
Now I want to change the CLI command so that the merge works for however many files exist in the folder, sorted by Last Modified. Ideally, the cp command should receive the list of all files in the S3 bucket folder, sorted by Last Modified, and merge them into a new file.
I appreciate everyone's help on this.
Here are some hints.
First list the files in ascending order of Last Modified (oldest first, matching the merge order you want):
aws s3api list-objects --bucket bucket1 --query "sort_by(Contents,&LastModified)"
Then you should be fine attaching the rest of the commands as you did:
aws s3api list-objects --bucket bucket1 --query "sort_by(Contents,&LastModified)" | jq -r '.[].Key' | while read -r file
do
    echo "$file"
    # append each object's contents to the local merge file
    aws s3 cp "s3://bucket1/$file" - >> new-file
done
aws s3 cp new-file s3://bucket1/new-file
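If you would rather skip the local temp file, the same loop can stream straight back to S3, mirroring the cp - pattern from your manual command; a sketch (assuming new-file does not already exist in the listing being merged):
aws s3api list-objects --bucket bucket1 --query "sort_by(Contents,&LastModified)" | jq -r '.[].Key' | while read -r key
do
    # emit each object's contents to stdout, oldest first
    aws s3 cp "s3://bucket1/$key" -
done | aws s3 cp - s3://bucket1/new-file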

aws-cli command in windows to get the latest object from s3 bucket

I am using a command with the aws cli on my Windows machine to get the latest file from an S3 bucket:
aws s3 ls s3://Bucket-name --recursive | sort |tail -n 1
Everything up to this point lists all the files, sorted by date:
aws s3 ls s3://Bucket-name --recursive | sort
But running the full command throws an error:
'tail' is not recognized as an internal or external command.
Is there some alternative to tail, or to the full command?
The AWS CLI permits JMESPath expressions in the --query parameter.
This command shows the most recently-updated object:
aws s3api list-objects --bucket my-bucket --query 'sort_by(Contents, &LastModified)[-1].Key' --output text
It's basically saying:
Sort by LastModified
Obtain the last [-1] entry
Show the Key (filename)
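Because the sorting happens inside the AWS CLI itself, this runs unchanged in cmd.exe, with no Unix tail required. To then download that object, the key can be fed to cp; a minimal sketch for a Unix-like shell (in a batch file, a for /f loop over the same command achieves the equivalent):
latest=$(aws s3api list-objects --bucket my-bucket --query 'sort_by(Contents, &LastModified)[-1].Key' --output text)
aws s3 cp "s3://my-bucket/${latest}" .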

AWS CLI Download list of S3 files

We have ~400,000 files in a private S3 bucket that are inbound/outbound call recordings. The file names follow a pattern that lets me search for numbers, both inbound and outbound. Note these recordings are in the Glacier storage class.
Using the AWS CLI, I can search through this bucket and grep out the files I need. What I'd like to do now is initiate an S3 restore job with expedited retrieval (so ~1-5 minute recovery time), and then maybe 30 minutes later run a command to download the files.
My efforts so far:
aws s3 ls s3://exetel-logs/ --recursive | grep .*042222222.* | cut -c 32-
Retrieves the keys of about 200 files. I am unsure how to proceed next, as aws s3 cp won't work for objects in the Glacier storage class.
Cheers,
The AWS CLI has two separate commands for S3: s3 and s3api. s3 is a high-level abstraction with limited features, so for restoring files you'll have to use one of the commands available with s3api. For the expedited retrieval you mentioned (the Days value, i.e. how long the restored copy stays available, is up to you):
aws s3api restore-object --bucket exetel-logs --key your-key --restore-request '{"Days":1,"GlacierJobParameters":{"Tier":"Expedited"}}'
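To kick off restores for all of the matched recordings at once, the same command can be driven by the key list from your ls/grep pipeline; a sketch reusing the pattern and cut offset from your command:
aws s3 ls s3://exetel-logs/ --recursive | grep .*042222222.* | cut -c 32- | while read -r key
do
    # request an expedited restore for each matched recording
    aws s3api restore-object --bucket exetel-logs --key "$key" --restore-request '{"Days":1,"GlacierJobParameters":{"Tier":"Expedited"}}'
done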
If you afterwards want to copy the files, but want to ensure that you only copy files which were restored from Glacier, you can use the following code snippet:
for key in $(aws s3api list-objects-v2 --bucket exetel-logs --query "Contents[?StorageClass=='GLACIER'].[Key]" --output text); do
    # head-object reports Restore: ongoing-request="false" once the restore has finished
    if [ $(aws s3api head-object --bucket exetel-logs --key ${key} --query "contains(Restore, 'ongoing-request=\"false\"')") == true ]; then
        echo ${key}
    fi
done
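Once the keys print as restored, a plain cp works again; a sketch that downloads them, reusing the same filter from your question:
aws s3 ls s3://exetel-logs/ --recursive | grep .*042222222.* | cut -c 32- | while read -r key
do
    # a restored object can be copied like any other
    aws s3 cp "s3://exetel-logs/$key" .
done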
Have you considered using a high-level language SDK for AWS instead of the CLI? It will make these kinds of tasks easier to integrate into your workflows. I prefer the Python implementation (Boto 3). Below is a sketch of how downloading all files from an S3 bucket looks there (the bucket name is the one from your question; flattening keys into local file names is just one simple choice):
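import boto3

# Sketch: download every object in the bucket with Boto 3.
s3 = boto3.resource("s3")
bucket = s3.Bucket("exetel-logs")  # bucket name assumed from the question
for obj in bucket.objects.all():
    if obj.key.endswith("/"):
        continue  # skip folder placeholder keys
    # flatten the key into a local file name so no directories are needed
    bucket.download_file(obj.key, obj.key.replace("/", "_"))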