How to restore multiple files from a prefix in AWS - amazon-web-services

I have plenty of objects in AWS S3 Glacier and I'm trying to restore some of them that share the same prefix (aka folder). However, I can't find a way to restore them all at once. It might be worth mentioning that some of the elements under this prefix are prefixes themselves, which I also want to restore.

I've managed to get it working. I had to write a simple bash script that iterates through all the objects in the bucket's prefix that are in GLACIER or DEEP_ARCHIVE, depending on the case. There are two components to this:
First, you need a file with all the objects:
aws s3api list-objects-v2 --bucket someBucket --prefix some/prefix/within/the/bucket/ \
    --query "Contents[?StorageClass=='GLACIER']" --output text \
    | awk '{print $2}' > somefile.txt
The list-objects-v2 call lists all the objects under the prefix; piping it through awk '{print $2}' keeps just the object keys, so the resulting file is easy to iterate over.
Finally, iterate through the file restoring the objects (BUCKET and DAYS must be set beforehand to your bucket name and the number of days to keep the restored copies):
for i in $(cat somefile.txt);
do
    # echo "Sending request for object: $i"
    aws s3api restore-object --bucket "$BUCKET" --key "$i" --restore-request Days=$DAYS
    # echo "Request sent for object: $i"
done
You can uncomment the echo commands to make the execution more verbose, but that's unnecessary for the most part.
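The same idea works for DEEP_ARCHIVE objects. A minimal sketch (using the same hypothetical bucket and prefix) that skips the intermediate file and requests the Bulk tier, the cheaper of the two retrieval tiers available for DEEP_ARCHIVE:
aws s3api list-objects-v2 --bucket someBucket --prefix some/prefix/within/the/bucket/ \
    --query "Contents[?StorageClass=='DEEP_ARCHIVE'].[Key]" --output text |
while IFS= read -r key; do
    aws s3api restore-object --bucket someBucket --key "$key" \
        --restore-request '{"Days": 7, "GlacierJobParameters": {"Tier": "Bulk"}}'
done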

This can be accomplished easily with s3cmd (https://s3tools.org/s3cmd).
You can use the restore command with the --recursive option. There is no need to gather a list of the objects you want to restore first. You just need to specify the days to restore the objects and the priority of the restore.
Example:
s3cmd restore --recursive s3://BUCKET-NAME/[PREFIX/TO/OBJECTS] --restore-days=60 --restore-priority=standard
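Whichever route you take, you can check whether a restore has finished with head-object: while the restore is still in progress, the Restore field in the response shows ongoing-request="true", and once it completes it shows an expiry date instead. For example (the object key is hypothetical):
aws s3api head-object --bucket BUCKET-NAME --key PREFIX/TO/OBJECTS/some-file.csv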

Related

How to modify the batch file to get two last modified files from S3?

I currently have a .bat file that copies the most recently modified file from an AWS S3 bucket to a local folder.
for /f "delims=" %%i in ('aws s3api list-objects-v2 --bucket example.sftp --prefix data/ --query "sort_by(Contents, &LastModified)[-1].Key" --output text') do set object=%%i
aws s3 cp s3://example.sftp/%object% E:\DATA_S3
I want to modify this to get the two most recently modified files instead of one. Changing [-1] to [-2] is not working.
You can use a JMESPath query to get the two most recent items using [-2:], and operate on each one in turn. Note that in the path, instead of using .Key, use .[Key], which causes AWS CLI's text output to delimit with newlines instead of tabs, allowing for easy parsing by the for operator of the batch file:
for /f "delims=" %%i in ('aws s3api list-objects-v2 --bucket example-bucket --prefix target/prefix/ --query "sort_by(Contents, &LastModified)[-2:].[Key]" --output text') do (
aws s3 cp s3://example-bucket/%%i folder\to\copy\to
)
JMESPath uses Python syntax for referencing lists, so use:
--query "sort_by(Contents, &LastModified)[-2:].Key"
This means "from the second last element to the end of the list".
I highly recommend the JMESPath Tutorial, which includes interactive testing of commands. I tested this concept using that page.
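For reference, a rough POSIX shell equivalent of the batch loop above (a sketch assuming the same hypothetical example-bucket and target/prefix/, and object keys without spaces):
for key in $(aws s3api list-objects-v2 --bucket example-bucket --prefix target/prefix/ \
    --query "sort_by(Contents, &LastModified)[-2:].Key" --output text); do
    aws s3 cp "s3://example-bucket/$key" /folder/to/copy/to/
done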

Copy the latest uploaded file from S3 bucket to local machine

I have a cron job set that moves the files from an EC2 instance to S3
aws s3 mv --recursive localdir s3://bucket-name/ --exclude "*" --include "localdir/*"
After that, I use aws s3 sync s3://bucket-name/data1/ E:\Datafolder in a .bat file and run it with Task Scheduler in Windows.
The issue is that the s3 sync command copies all the files under the /data1/ prefix. So, for example:
Day 1: file1 is synced to local.
Day 2: file1 and file2 are both synced to local, because file1 had been removed from the local machine's folder.
I don't want the old files to occupy space on the local machine; on Day 2, I just want file2 to be copied over.
Can this be accomplished with AWS CLI commands, or do I need to write a Lambda function?
I followed the answer from Get last modified object from S3 using AWS CLI
but on Windows, the | and awk commands are not working as expected.
To obtain the name of the object that has the most recent Last Modified date, you can use:
aws s3api list-objects-v2 --bucket BUCKET-NAME --query 'sort_by(Contents, &LastModified)[-1].Key' --output text
Therefore (using shell syntax), you could use:
object=`aws s3api list-objects-v2 --bucket BUCKET-NAME --prefix data1/ --query 'sort_by(Contents, &LastModified)[-1].Key' --output text`
aws s3 cp s3://BUCKET-NAME/$object E:\Datafolder
You might need to tweak it to get it working on Windows.
Basically, it gets the bucket listing, sorts by LastModified, then grabs the name of the last object in the list.
Modified answer to work with a Windows .bat file (uses Windows cmd.exe):
for /f "delims=" %%i in ('aws s3api list-objects-v2 --bucket BUCKET-NAME --prefix data1/ --query "sort_by(Contents, &LastModified)[-1].Key" --output text') do set object=%%i
aws s3 cp s3://BUCKET-NAME/%object% E:\Datafolder

Amazon S3 Copy files after date and with regex

I'm trying to copy some files from S3 sourceBucket to targetBucket, but I need to filter by date and by prefix.
I wish it could be done with the AWS CLI, but at the moment I'm stuck with the list-objects and cp commands.
I can filter correctly with
aws s3api list-objects-v2 --bucket sourceBucket --query 'Contents[?(LastModified > `2021-09-01`)]' --prefix "somePrefix_"
With cp I can copy the files, but only by prefix:
aws s3 cp s3://sourceBucket/ s3://targetBucket/ --recursive --include "somePrefix" --exclude "*"
I tried to come up with something using the x-amz-copy-source-if-modified-since header, but it looks like it can only be used with the aws s3api copy-object command, which copies one object at a time (doc).
I read some answers/docs and I think I understood that the cp command doesn't filter by date, only by prefix.
Do you have any idea on how to solve this?
Thank you in advance!
Since you already have a list with objects you want to copy to another bucket, I suggest writing a bash script which does the copying for multiple objects:
#!/bin/bash
SOURCE_BUCKET="<my-bucket>"
DESTINATION_BUCKET="<my-other-bucket>"
PREFIX="<some-prefix>"
content=$(aws s3api list-objects-v2 --bucket "$SOURCE_BUCKET" --prefix "$PREFIX" --query 'Contents[?(LastModified > `2021-09-01`)]' | jq -r ".[].Key")
for file in $content;
do
    aws s3api copy-object --copy-source "$SOURCE_BUCKET/$file" --key "$file" --bucket "$DESTINATION_BUCKET" | jq
done
Please note, this script requires jq to be installed.
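If jq is not available, a possible variant of the listing line (assuming the same hard-coded date filter) is to let the CLI itself return one key per line:
content=$(aws s3api list-objects-v2 --bucket "$SOURCE_BUCKET" --prefix "$PREFIX" \
    --query 'Contents[?(LastModified > `2021-09-01`)].[Key]' --output text)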

How to sort ascending order by last modified date for s3 using aws cli

The code below sorts in descending order. How do I have it sort in ascending order?
KEY=`aws s3 ls $BUCKET --recursive | sort | tail -n 1 | awk '{print $4}'`
It appears that you wish to obtain the Key of the most recently modified object in the Amazon S3 bucket.
For that, you can use:
aws s3api list-objects --bucket bucketname --query 'sort_by(Contents, &LastModified)[-1].Key' --output text
The AWS CLI --query parameter is highly capable. It uses JMESPath, which can do most required manipulations without needing to pipe data.
The aws s3api list-objects command provides information in specific fields, whereas the aws s3 ls command simply produces text output.
The above might not work as expected if there are more than 1000 objects in the bucket, since results are returned in batches of 1000.
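If you actually want the full key listing in ascending order (oldest first) rather than just the newest key, a sketch along the same lines:
aws s3api list-objects --bucket bucketname --query 'sort_by(Contents, &LastModified)[].[Key]' --output text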
Use sort -r to reverse the sort order.
From the manpage for sort
-r, --reverse
reverse the result of comparisons
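To illustrate with the original pipeline: since aws s3 ls lines start with the date, both of the following print the key of the most recently modified object; the second just reverses the sort and reads from the top instead of the bottom:
KEY=`aws s3 ls $BUCKET --recursive | sort | tail -n 1 | awk '{print $4}'`
KEY=`aws s3 ls $BUCKET --recursive | sort -r | head -n 1 | awk '{print $4}'`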

How to delete multiple files in S3 bucket with AWS CLI

Suppose I have an S3 bucket named x.y.z
In this bucket, I have hundreds of files, but I only want to delete two of them: purple.gif and worksheet.xlsx.
Can I do this from the AWS command line tool with a single call to rm?
This did not work:
$ aws s3 rm s3://x.y.z/worksheet.xlsx s3://x.y.z/purple.gif
Unknown options: s3://x.y.z/purple.gif
From the manual, it doesn't seem like you can delete a list of files explicitly by name. Does anyone know a way to do it? I prefer not using the --recursive flag.
You can do this by providing an --exclude or --include argument multiple times. But, you'll have to use --recursive for this to work.
When there are multiple filters, remember that the order of the filter parameters is important. The rule is the filters that appear later in the command take precedence over filters that appear earlier in the command.
aws s3 rm s3://x.y.z/ --recursive --exclude "*" --include "purple.gif" --include "worksheet.xlsx"
Here, all files will be excluded from the command except for purple.gif and worksheet.xlsx.
If you're unsure, always try a --dryrun first and inspect which files will be deleted.
Source: Use of Exclude and Include Filters
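For example, a dry run of the command above only prints what would be deleted, without deleting anything:
aws s3 rm s3://x.y.z/ --recursive --dryrun --exclude "*" --include "purple.gif" --include "worksheet.xlsx"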
s3 rm cannot delete multiple files, but you can use s3api delete-objects to achieve what you want here.
Example
aws s3api delete-objects --bucket x.y.z --delete '{"Objects":[{"Key":"worksheet.xlsx"},{"Key":"purple.gif"}]}'
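If there are more keys than you want to type by hand, one possible approach (a sketch; some/prefix/ is a hypothetical prefix, and delete-objects accepts at most 1000 keys per call) is to build the JSON with a JMESPath query and feed it back in via file://:
aws s3api list-objects-v2 --bucket x.y.z --prefix some/prefix/ \
    --query '{Objects: Contents[].{Key: Key}}' --output json > delete.json
aws s3api delete-objects --bucket x.y.z --delete file://delete.json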
Apparently aws s3 rm works only on individual files/objects.
Below is a bash pipeline that constructs individual delete commands and then removes the objects one by one. It works reasonably well (it might be a bit slow, but it works):
aws s3 ls s3://bucketname/foldername/ |
awk {'print "aws s3 rm s3://bucketname/foldername/" $4'} |
bash
The first two lines construct the rm commands and the third line (bash) executes them.
Note that you might face issues if your object names contain spaces or unusual characters, since the awk parsing above splits the aws s3 ls output on whitespace (as of this writing).
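If object names may contain spaces, a sketch that reads full keys line by line instead of relying on awk's whitespace splitting (bucketname and foldername/ are placeholders):
aws s3api list-objects-v2 --bucket bucketname --prefix foldername/ --query 'Contents[].[Key]' --output text |
while IFS= read -r key; do
    aws s3 rm "s3://bucketname/$key"
done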
This command deletes all the files in a bucket:
aws s3 rm s3://bucketname --recursive
If you are using the AWS CLI, you can filter ls results with a grep regex and delete the matches. For example:
aws s3 ls s3://BUCKET | awk '{print $4}' | grep -E -i '^2015-([0-9][0-9])\-([0-9][0-9])\-([0-9][0-9])\-([0-9][0-9])\-([0-9][0-9])\-([0-9a-zA-Z]*)' | xargs -I% bash -c 'aws s3 rm s3://BUCKET/%'
This is slow but it works
This solution will work when you want to specify a wildcard for the object name.
aws s3 ls dmap-live-dwh-files/backup/mongodb/oms_api/hourly/ | grep order_2019_08_09_* | awk {'print "aws s3 rm s3://dmap-live-dwh-files/backup/mongodb/oms_api/hourly/" $4'} | bash
I found this one useful from the command line. I had more than 4 million files and it took almost a week to empty the bucket. This comes in handy, as the AWS console is not very descriptive with its logs.
Note: You need the jq tool installed.
aws s3api list-object-versions --bucket YOURBUCKETNAMEHERE \
--output json --query 'Versions[].[Key, VersionId]' \
| jq -r '.[] | "--key '\''" + .[0] + "'\'' --version-id " + .[1]' \
| xargs -L1 aws s3api delete-object --bucket YOURBUCKETNAMEHERE
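If the bucket is versioned and you want to empty it completely, the same pattern also has to be applied to delete markers (a sketch, mirroring the command above with DeleteMarkers instead of Versions):
aws s3api list-object-versions --bucket YOURBUCKETNAMEHERE \
--output json --query 'DeleteMarkers[].[Key, VersionId]' \
| jq -r '.[] | "--key '\''" + .[0] + "'\'' --version-id " + .[1]' \
| xargs -L1 aws s3api delete-object --bucket YOURBUCKETNAMEHERE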
You can delete multiple files using aws s3 rm. If you want to delete all files in a specific folder, just use
aws s3 rm --recursive --region <AWS_REGION> s3://<AWS_BUCKET>/<FOLDER_PATH>/
First, test it with the --dryrun option!
Quick way to delete a very large folder in AWS:
AWS_PROFILE=<AWS_PROFILE> AWS_BUCKET=<AWS_BUCKET> AWS_FOLDER=<AWS_FOLDER>; aws --profile $AWS_PROFILE s3 ls "s3://${AWS_BUCKET}/${AWS_FOLDER}/" | awk '{print $4}' | xargs -P8 -n1000 bash -c 'aws --profile '${AWS_PROFILE}' s3api delete-objects --bucket '${AWS_BUCKET}' --delete "Objects=[$(printf "{Key='${AWS_FOLDER}'/%s}," "$@")],Quiet=true" >/dev/null 2>&1' _
PS: You might have to launch this two or three times, because sometimes some deletions fail...