How to delete multiple files in S3 bucket with AWS CLI - amazon-web-services

Suppose I have an S3 bucket named x.y.z
In this bucket, I have hundreds of files. But I only want to delete 2 files named purple.gif and worksheet.xlsx
Can I do this from the AWS command line tool with a single call to rm?
This did not work:
$ aws s3 rm s3://x.y.z/worksheet.xlsx s3://x.y.z/purple.gif
Unknown options: s3://x.y.z/purple.gif
From the manual, it doesn't seem like you can delete a list of files explicitly by name. Does anyone know a way to do it? I prefer not using the --recursive flag.

You can do this by providing an --exclude or --include argument multiple times. But you'll have to use --recursive for this to work.
When there are multiple filters, remember that the order of the filter parameters is important: filters that appear later in the command take precedence over filters that appear earlier.
aws s3 rm s3://x.y.z/ --recursive --exclude "*" --include "purple.gif" --include "worksheet.xlsx"
Here, all files will be excluded from the command except for purple.gif and worksheet.xlsx.
If you're unsure, always try a --dryrun first and inspect which files will be deleted.
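For example, the same command with --dryrun added only prints what would be removed, without deleting anything:
aws s3 rm s3://x.y.z/ --recursive --dryrun --exclude "*" --include "purple.gif" --include "worksheet.xlsx"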
Source: Use of Exclude and Include Filters

s3 rm cannot delete multiple files, but you can use s3api delete-objects to achieve what you want here.
Example
aws s3api delete-objects --bucket x.y.z --delete '{"Objects":[{"Key":"worksheet.xlsx"},{"Key":"purple.gif"}]}'
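If you have more than a handful of keys, one way, sketched here, is to build the --delete JSON from a plain key list with jq. Here keys.txt is a hypothetical file with one object key per line; note that delete-objects accepts at most 1,000 keys per call:
# keys.txt is hypothetical: one object key per line (max 1,000 keys per delete-objects call)
aws s3api delete-objects --bucket x.y.z \
  --delete "$(jq -R -s '{Objects: [split("\n")[] | select(length > 0) | {Key: .}], Quiet: true}' keys.txt)"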

Apparently aws s3 rm works only on individual files/objects.
Below is a bash command that constructs individual delete commands and then removes the objects one by one. It works with some success (it might be a bit slow, but it works):
aws s3 ls s3://bucketname/foldername/ |
awk '{print "aws s3 rm s3://bucketname/foldername/" $4}' |
bash
The first two lines construct the rm commands and the third line (bash) executes them.
Note that you might face issues if your object names contain spaces or unusual characters, since awk '{print $4}' only captures the key up to the first whitespace.

This command deletes all files in a bucket:
aws s3 rm s3://bucketname --recursive

If you are using the AWS CLI, you can filter the ls results with a grep regex and delete the matches. For example:
aws s3 ls s3://BUCKET | awk '{print $4}' | grep -E -i '^2015-([0-9][0-9])\-([0-9][0-9])\-([0-9][0-9])\-([0-9][0-9])\-([0-9][0-9])\-([0-9a-zA-Z]*)' | xargs -I% bash -c 'aws s3 rm s3://BUCKET/%'
This is slow, but it works.

This solution works when you want to specify a wildcard for the object name.
aws s3 ls dmap-live-dwh-files/backup/mongodb/oms_api/hourly/ | grep 'order_2019_08_09_*' | awk '{print "aws s3 rm s3://dmap-live-dwh-files/backup/mongodb/oms_api/hourly/" $4}' | bash

I found this one useful from the command line. I had more than 4 million files and it took almost a week to empty the bucket. This comes in handy, as the AWS console is not very descriptive with its logs.
Note: You need the jq tool installed.
aws s3api list-object-versions --bucket YOURBUCKETNAMEHERE \
--output json --query 'Versions[].[Key, VersionId]' \
| jq -r '.[] | "--key '\''" + .[0] + "'\'' --version-id " + .[1]' \
| xargs -L1 aws s3api delete-object --bucket YOURBUCKETNAMEHERE
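Note that on a versioned bucket, the same listing also returns delete markers, so if you want the bucket completely empty, a second pass over DeleteMarkers along the same lines (same placeholder bucket name, same jq/xargs pattern) may be needed:
aws s3api list-object-versions --bucket YOURBUCKETNAMEHERE \
--output json --query 'DeleteMarkers[].[Key, VersionId]' \
| jq -r '.[] | "--key '\''" + .[0] + "'\'' --version-id " + .[1]' \
| xargs -L1 aws s3api delete-object --bucket YOURBUCKETNAMEHERE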

You can delete multiple files using aws s3 rm. If you want to delete all files in a specific folder, just use
aws s3 rm --recursive --region <AWS_REGION> s3://<AWS_BUCKET>/<FOLDER_PATH>/
First, test it with the --dryrun option!
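For example, a dry run of the same command:
aws s3 rm --dryrun --recursive --region <AWS_REGION> s3://<AWS_BUCKET>/<FOLDER_PATH>/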

A quick way to delete a very large folder in AWS:
AWS_PROFILE=<AWS_PROFILE> AWS_BUCKET=<AWS_BUCKET> AWS_FOLDER=<AWS_FOLDER>; aws --profile $AWS_PROFILE s3 ls "s3://${AWS_BUCKET}/${AWS_FOLDER}/" | awk '{print $4}' | xargs -P8 -n1000 bash -c 'aws --profile '${AWS_PROFILE}' s3api delete-objects --bucket '${AWS_BUCKET}' --delete "Objects=[$(printf "{Key='${AWS_FOLDER}'/%s}," "$@")],Quiet=true" >/dev/null 2>&1' _
The trailing _ is just a placeholder for bash -c's $0, so that the first key in each batch isn't swallowed.
PS: You may need to run this 2-3 times, because sometimes some of the deletions fail...

Related

Move files from one S3 folder to another S3 folder up to certain date

I'm trying to move all files from one S3 folder to another folder within the same bucket, but I would like to exclude any file modified within the last 15 days. Any help with a Python script or the --exclude command?
aws s3 mv s3://BUCKETNAME/myfolder/All_files.csv s3://BUCKETNAME/myotherfolder/All_files.csv --exclude last-fifteen-days
Here's a quick one, but you'll still need to filter by date with a wildcard.
aws s3 ls s3://bucketname/folder1 --recursive | \
grep '2021-10*' | \
awk '{print $4}' | \
xargs -I '{}' aws s3 mv s3://bucketname/'{}' s3://bucketname/folder2/'{}'
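If you'd rather filter on object age directly instead of grepping the listing, a rough sketch using list-objects-v2 and a LastModified query could look like the following. It assumes GNU date, and BUCKETNAME, myfolder and myotherfolder are placeholders:
CUTOFF=$(date -d '15 days ago' +%Y-%m-%d)   # GNU date; adjust for macOS/BSD
aws s3api list-objects-v2 --bucket BUCKETNAME --prefix "myfolder/" \
  --query "Contents[?LastModified<\`${CUTOFF}\`].Key" --output text \
  | tr '\t' '\n' \
  | while read -r key; do
      aws s3 mv "s3://BUCKETNAME/${key}" "s3://BUCKETNAME/myotherfolder/${key#myfolder/}"
    done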

Amazon S3 Copy files after date and with regex

I'm trying to copy some files from S3 sourceBucket to targetBucket, but I need to filter by date and by prefix.
I wish it could be done with AWS CLI, but at the moment I'm stuck with list-object or with cp command.
I can filter correctly with
aws s3api list-objects-v2 --bucket sourceBucket --query 'Contents[?(LastModified > `2021-09-01`)]' --prefix "somePrefix_"
With the CP I can copy the files, but only by prefix
aws s3 cp s3://sourceBucket/ s3://targetBucket/ --recursive --include "somePrefix" --exclude "*"
I tried to come up with some ideas using the x-amz-copy-source-if-modified-since header, but it looks like it can only be used with the aws s3api copy-object command, which copies one item at a time (doc).
I read some answers/docs and I think I understood that the cp command doesn't filter by date, only by prefix.
Do you have any idea on how to solve this?
Thank you in advance!
Since you already have a list with objects you want to copy to another bucket, I suggest writing a bash script which does the copying for multiple objects:
#!/bin/bash
# copy every object under PREFIX that was modified after 2021-09-01
SOURCE_BUCKET="<my-bucket>"
DESTINATION_BUCKET="<my-other-bucket>"
PREFIX="<some-prefix>"
content=$(aws s3api list-objects-v2 --bucket "$SOURCE_BUCKET" --query 'Contents[?(LastModified > `2021-09-01`)]' --prefix "$PREFIX" | jq -r ".[].Key")
# note: keys that contain whitespace will not be handled correctly by this loop
for file in $content;
do
aws s3api copy-object --copy-source "$SOURCE_BUCKET/$file" --key "$file" --bucket "$DESTINATION_BUCKET" | jq
done
Please note, this script requires jq to be installed.

How to restore multiple files from a prefix in AWS

I have plenty of objects in AWS S3 Glacier, and I'm trying to restore some of them that are under the same prefix (aka folder). However, I can't find a way to restore them all at once. It might be worth mentioning that some of the elements under this prefix are prefixes themselves, which I also want to restore.
I've managed to get it working. I had to write a simple bash script that iterates through all the objects in the bucket's prefix which are GLACIER or DEEP_ARCHIVE depending on the case. So there are two components to this:
First, you need a file with all the objects:
aws s3api list-objects-v2 --bucket someBucket \
--prefix some/prefix/within/the/bucket/ \
--query "Contents[?StorageClass=='GLACIER']" \
--output text | awk '{print $2}' > somefile.txt
The list-objects-v2 call lists all the objects under the prefix; the awk '{print $2}' command makes sure the resulting file contains just the object keys, one per line.
Finally, iterate through the file restoring the objects:
# BUCKET and DAYS must be set to your bucket name and the number of days to keep the restored copy
for i in $(cat somefile.txt);
do
#echo "Sending request for object: $i"
aws s3api restore-object --bucket "$BUCKET" --key "$i" --restore-request Days=$DAYS
#echo "Request sent for object: $i"
done
You can uncomment the echo commands to make the execution more verbose but it's unnecessary for the most part.
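To check whether a given restore has finished, you can look at the Restore field returned by head-object (the bucket and key below are placeholders); it shows ongoing-request="true" while the restore is in progress and an expiry-date once it is done:
aws s3api head-object --bucket someBucket --key some/prefix/within/the/bucket/some-object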
This can be accomplished easily with s3cmd (https://s3tools.org/s3cmd).
You can use the restore command with the --recursive option. There is no need to gather a list of the objects you want to restore first. You just need to specify the days to restore the objects and the priority of the restore.
Example:
s3cmd restore --recursive s3://BUCKET-NAME/[PREFIX/TO/OBJECTS] --restore-days=60 --restore-priority=standard

Selective file download in AWS CLI

I have files in an S3 bucket. I was trying to download files based on a date, like 08th Aug, 09th Aug, etc.
I used the following code, but it still downloads the entire bucket:
aws s3 cp s3://bucketname/ folder/file \
--profile pname \
--exclude "*" \
--recursive \
--include "2015-08-09*"
I am not sure how to achieve this. How can I download only the files for a specific date?
This command will copy all files starting with 2015-08-15:
aws s3 cp s3://BUCKET/ folder --exclude "*" --include "2015-08-15*" --recursive
If your goal is to synchronize a set of files without copying them twice, use the sync command:
aws s3 sync s3://BUCKET/ folder
That will copy all files that have been added or modified since the previous sync.
In fact, this is the equivalent of the above cp command:
aws s3 sync s3://BUCKET/ folder --exclude "*" --include "2015-08-15*"
References:
AWS CLI s3 sync command documentation
AWS CLI s3 cp command documentation
Bash command to copy all files for a specific date or month to the current folder:
aws s3 ls s3://bucketname/ | grep '2021-02' | awk '{print $4}' | xargs -I {} aws s3 cp s3://bucketname/{} folder
The command does the following:
Lists all the files under the bucket
Filters out the files of 2021-02, i.e. all files from February 2021
Extracts just their names
Runs aws s3 cp (via xargs) on each matching file
If your bucket is large (upwards of 10 to 20 GB, which was true in my own use case), you can achieve the same goal by running sync in multiple terminal windows.
All the terminal sessions can use the same token, in case you need to generate a token for the prod environment.
$ aws s3 sync s3://bucket-name/sub-name/another-name folder-name-in-pwd/ \
    --exclude "*" --include "name_date1*" --profile UR_AC_SomeName
and another terminal window (same pwd)
$ aws s3 sync s3://bucket-name/sub-name/another-name folder-name-in-pwd/ \
    --exclude "*" --include "name_date2*" --profile UR_AC_SomeName
and another two for "name_date3*" and "name_date4*"
You can also do multiple excludes in the same sync command, as in:
$ aws s3 sync s3://bucket-name/sub-name/another-name my-local-path/ \
    --exclude="*.log/*" --exclude=img --exclude=".error" --exclude=tmp \
    --exclude="*.cache"
This bash script copies all files from one bucket to another, filtered by modified date, using the AWS CLI.
aws s3 ls <BCKT_NAME> --recursive | sort | grep "2020-08-*" | cut -b 32- > a.txt
Inside the bash file:
while IFS= read -r line; do
aws s3 cp s3://<SRC_BCKT>/${line} s3://<DEST_BCKT>/${line} --sse AES256
done < a.txt
The AWS CLI is really slow at this. I waited hours and nothing much happened, so I looked for alternatives.
https://github.com/peak/s5cmd worked great.
It supports globs, for example:
s5cmd -numworkers 30 cp 's3://logs-bucket/2022-03-30-19-*' .
It is blazing fast, so you can work with buckets that hold S3 access logs without much fuss.

Get last modified object from S3 using AWS CLI

I have a use case where I programmatically bring up an EC2 instance, copy an executable file from S3, run it and shut down the instance (done in user-data). I need to get only the last added file from S3.
Is there a way to get the last modified file / object from a S3 bucket using the AWS CLI tool?
You can list all the objects in the bucket with aws s3 ls $BUCKET --recursive:
$ aws s3 ls $BUCKET --recursive
2015-05-05 15:36:17 4 an_object.txt
2015-06-08 14:14:44 16322599 some/other/object
2015-04-29 12:09:29 32768 yet-another-object.sh
They're sorted alphabetically by key, but that first column is the last modified time. A quick sort will reorder them by date:
$ aws s3 ls $BUCKET --recursive | sort
2015-04-29 12:09:29 32768 yet-another-object.sh
2015-05-05 15:36:17 4 an_object.txt
2015-06-08 14:14:44 16322599 some/other/object
tail -n 1 selects the last row, and awk '{print $4}' extracts the fourth column (the name of the object).
$ aws s3 ls $BUCKET --recursive | sort | tail -n 1 | awk '{print $4}'
some/other/object
Last but not least, drop that into aws s3 cp to download the object:
$ KEY=`aws s3 ls $BUCKET --recursive | sort | tail -n 1 | awk '{print $4}'`
$ aws s3 cp s3://$BUCKET/$KEY ./latest-object
Updated answer
After a while, here is a small update on how to do it a bit more elegantly:
aws s3api list-objects-v2 --bucket "my-awesome-bucket" --query 'sort_by(Contents, &LastModified)[-1].Key' --output=text
Instead of an extra reverse() call, we can get the last entry from the list via [-1].
Old answer
This command just does the job, without any external dependencies:
aws s3api list-objects-v2 --bucket "my-awesome-bucket" --query 'reverse(sort_by(Contents, &LastModified))[:1].Key' --output=text
aws s3api list-objects-v2 --bucket "bucket-name" |jq -c ".[] | max_by(.LastModified)|.Key"
If this is a freshly uploaded file, you can use Lambda to execute a piece of code on the new S3 object.
If you really need to get the most recent one, you can name your files with the date first, sort by name in reverse, and take the first object.
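For example, with keys that start with the date (the bucket name and key format here are assumptions), a reverse sort on the key column returns the newest key first:
aws s3 ls s3://my-bucket/ | awk '{print $4}' | sort -r | head -n 1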
The following is a bash script that downloads the latest file from an S3 bucket. I used the aws s3 sync command instead, so that it would not download the file from S3 if it already exists locally.
--exclude, excludes all the files
--include, includes all the files matching the pattern
#!/usr/bin/env bash
BUCKET="s3://my-s3-bucket-eu-west-1/list/"
FILE_NAME=`aws s3 ls $BUCKET | sort | tail -n 1 | awk '{print $4}'`
TARGET_FILE_PATH=target/datdump/
TARGET_FILE=${TARGET_FILE_PATH}localData.json.gz
echo $FILE_NAME
echo $TARGET_FILE
aws s3 sync $BUCKET $TARGET_FILE_PATH --exclude "*" --include "*$FILE_NAME*"
cp target/datdump/$FILE_NAME $TARGET_FILE
P.S. Thanks @David Murray