AWS remove old files in repo S3 - amazon-web-services

I have a repository in S3 that contains files with old dates (before 20-10).
I want to delete only those files.
The problem is that when you apply xargs rm you can't delete the files because it concatenates by default the date with the name d
aws s3 ls s3://my_repo/
2019-10-17 06:07:09 9307 20191017_060016_00112_u246w_0950f96f-a55a-4ce5-b0f3-b271ecb8fe90
2019-10-17 06:07:09 467791 20191017_060016_00112_u246w_94bbd3a2-76ea-4c04-8189-d963168ea34b
2019-10-21 19:35:12 1633 20191021_193156_01159_myrsw_2e68c0e4-b1a3-4abf-94b3-797ef653b742
2019-10-21 19:35:12 1643 20191021_193156_01159_myrsw_3491c665-82e3-475c-bba2-35e7d61d7912
aws s3 ls s3://my_repo/ | awk '$1 < "2019-10-20 00:00:00" '
2019-10-17 06:07:09 9307 20191017_060016_00112_u246w_0950f96f-a55a-4ce5-b0f3-b271ecb8fe90
2019-10-17 06:07:09 467791 20191017_060016_00112_u246w_94bbd3a2-76ea-4c04-8189-d963168ea34b
aws s3 ls s3://my_repo/ | awk '$1 < "2019-10-20 00:00:00" {print $0}' | xargs -0 rm --
rm: cannot remove '2019-10-17 06:07:09 9307 20191017_060016_00112_u246w_0950f96f-a55a-4ce5-b0f3-b271ecb8fe90': File name too long

Rather than using aws s3 ls, you can use:
aws s3api list-objects --bucket my-bucket --query "Contents[?LastModified<='2019-06-01'].[Key]" --output text
This lists the name (Key) of every object last modified on or before the given date; substitute your own bucket and cutoff date.
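To go from listing to deleting, the keys can be passed to aws s3 rm one per line. The following is a minimal sketch rather than a tested production script: the function name delete_older_than and the DRY_RUN variable are made up for illustration, and it assumes that no key contains a newline. By default it only prints what it would delete.

```shell
# Hypothetical helper: delete_older_than BUCKET CUTOFF
# CUTOFF is compared against LastModified, e.g. 2019-10-20.
# DRY_RUN (an assumption of this sketch) defaults to 1: print, do not delete.
delete_older_than() {
    bucket=$1
    cutoff=$2
    aws s3api list-objects --bucket "$bucket" \
        --query "Contents[?LastModified<='$cutoff'].[Key]" --output text |
    while IFS= read -r key; do
        # Skip blank lines and the "None" printed when nothing is listed
        case $key in ''|None) continue ;; esac
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "would delete s3://$bucket/$key"
        else
            aws s3 rm "s3://$bucket/$key"
        fi
    done
}
```

Run delete_older_than my_repo 2019-10-20 first to review the list, then repeat with DRY_RUN=0 in front of the command once the output looks right.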

Related

My aws cli command returns an error message "The specified key does not exist."

Here is the command I used. It works fine when there is no space in the S3 URL:
aws s3 ls <s3://bucket/folder/> --recursive | awk '{print $4}' | awk "NR >= 2" | xargs -I %%% aws s3api restore-object --bucket <bucket> --restore-request Days=3,GlacierJobParameters={"Tier"="Bulk"} --key %%%
But if there is a space in the S3 URL, as in the picture I attached, it returns an error message. I don't know what the problem is. How do I fix it?
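One likely cause, offered as an assumption since the attached picture is not available: awk '{print $4}' splits on whitespace, so a key such as folder/my file.txt is cut at its first space, and the truncated key does not exist. Below is a sketch that keeps the whole key, assuming the usual date, time, size, key layout of aws s3 ls --recursive. The names ls_keys and restore_listed are hypothetical.

```shell
# Print the full key from each `aws s3 ls --recursive` line on stdin.
# Strips the leading date, time and size columns; everything after the
# single space that follows the size is the key, spaces included.
ls_keys() {
    sed -E 's/^[0-9-]+ +[0-9:]+ +[0-9]+ //'
}

# Hypothetical wrapper: request a Bulk restore for every key under a prefix.
restore_listed() {
    bucket=$1
    prefix=$2
    aws s3 ls "s3://$bucket/$prefix" --recursive | ls_keys |
    while IFS= read -r key; do
        # Skip folder placeholder keys, which end in a slash
        case $key in */) continue ;; esac
        aws s3api restore-object --bucket "$bucket" --key "$key" \
            --restore-request '{"Days":3,"GlacierJobParameters":{"Tier":"Bulk"}}'
    done
}
```

The while read loop is used instead of xargs so that each key reaches restore-object as a single argument, whatever spaces or quotes it contains.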

Total size of folder in S3 bucket

I have a nested "folder"/object structure in my S3 bucket:
myBucket/level1/level2/file1.txt
myBucket/level1/level2/file2.txt
...
Is there any way with aws s3 ls or aws s3api to list the size of each "folder"/object on level2?
Thanks!

You can use the following command:
aws s3 ls s3://myBucket/level1/level2/ --recursive --summarize | awk 'BEGIN{ FS= " "} /Total Size/ {print $3}'
It will print the sum of the sizes, in bytes, of all files under level2.
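The command above gives one total for everything under level2. If a separate total per "folder" is wanted, the recursive listing can be grouped by the leading path components. This is a sketch under the assumption that each listing line has the form date, time, size, key; sizes_by_prefix is a hypothetical name, and its argument is how many path components make up a folder.

```shell
# Hypothetical helper: sizes_by_prefix DEPTH
# Reads `aws s3 ls --recursive` output on stdin and prints
# "<prefix><TAB><total bytes>" for each prefix of DEPTH path components.
sizes_by_prefix() {
    awk -v depth="$1" '
    {
        size = $3
        key = $0
        sub(/^[^ ]+ +[^ ]+ +[0-9]+ /, "", key)   # drop date, time, size
        n = split(key, parts, "/")
        if (n <= depth) next                      # object is not inside such a folder
        prefix = parts[1]
        for (i = 2; i <= depth; i++) prefix = prefix "/" parts[i]
        total[prefix] += size
    }
    END { for (p in total) printf "%s\t%.0f\n", p, total[p] }
    ' | sort
}
```

For the layout in the question, aws s3 ls s3://myBucket/level1/ --recursive | sizes_by_prefix 2 would print one line per level1/level2-style folder.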

AWS s3api command to get list object with lastmodified greater than yyyy-mm-dd HH:MM:SS

I'm trying to fetch the list of recently uploaded objects from an S3 bucket. However, only the date-only comparison Contents[?LastModified>='yyyy-mm-dd'] works in the query.
When I try Contents[?LastModified>='yyyy-mm-dd HH:MM:SS'], only the yyyy-mm-dd part seems to be compared: I get every object added that day, not just the ones added after the HH:MM:SS timestamp.
echo "###################### Previous Run : $previous_run"
dat2=$(date -d "$previous_run" "+%Y-%m-%d %H:%M:%S")
echo $dat2
get_Latest_Files()
{
#get new files from s3
json_var=$(aws s3api list-objects --bucket "$input_bucket" --prefix "$input_prefix" --query "Contents[?LastModified>='$dat2'].{Key: Key,LastModified: LastModified}" --output text)
echo "$json_var"
if [ -z "$json_var" ]
then
echo "No latest files to Process...!"
exit
else
#grep for tgz files
echo $json_var | tr " " "\n" | egrep -i "(\.tgz)|(\.tar\.gz)$" | awk -v prefix="s3://$input_bucket/" '{print prefix $0}' > input_files.txt
cat input_files.txt
fi
}

Solution:
Try the full ISO 8601 timestamp format for the date:
"Contents[?LastModified>='2019-07-26T17:49:00.000Z'][].{Key: Key,LastModified: LastModified}"

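A plausible explanation for why the timestamp format matters, assuming the filter compares LastModified as plain text: the character after the date is a space in 'yyyy-mm-dd HH:MM:SS' but a T in the stored value, and a space sorts before T, so every object from that day passes the >= test regardless of its time. The sketch below converts the question's format into the one from the solution; to_s3_timestamp is a hypothetical helper and it assumes the input is already in UTC.

```shell
# Hypothetical helper: turn "YYYY-MM-DD HH:MM:SS" (assumed to be UTC)
# into the "YYYY-MM-DDTHH:MM:SS.000Z" form used in the solution.
to_s3_timestamp() {
    # ${1%% *} is the date part, ${1#* } is the time part
    printf '%sT%s.000Z\n' "${1%% *}" "${1#* }"
}

since=$(to_s3_timestamp "2019-07-26 17:49:00")
# Build the filter that would be passed to --query
query="Contents[?LastModified>='$since'].{Key: Key,LastModified: LastModified}"
echo "$query"
```

In the question's script, the same effect could be had by formatting dat2 directly with date -u -d "$previous_run" "+%Y-%m-%dT%H:%M:%S.000Z" (GNU date).
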
looking for s3cmd download command for a certain date

I am trying to figure out what the s3cmd command would be to download files from a bucket by date. For example, I have a bucket named "test" that contains files from different dates, and I want to get the files that were uploaded yesterday. What would the command be?

There is no single command that will do that. You have to write a script, something like the one below, or use an SDK that lets you do this. The sample script below downloads the files that are older than the given age (30 days in the usage example); reverse the comparison to get newer files instead.
#!/bin/bash
# Usage: ./getOld "bucketname" "30 days"
# Downloads every object whose listing date is older than the given age.
# Cutoff as epoch seconds (GNU date syntax), computed once
olderThan=$(date -d "-$2" +%s)
s3cmd ls "s3://$1" | while read -r day time size fileName; do
  # Skip "DIR" lines and anything that does not start with a date
  [[ $day =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ && -n $fileName ]] || continue
  createDate=$(date -d "$day $time" +%s)
  if [[ $createDate -lt $olderThan ]]; then
    echo "$fileName"
    s3cmd get "$fileName"
  fi
done

I like s3cmd, but for a one-line command I prefer the JSON output of the AWS CLI combined with the jq JSON processor.
The command looks like this:
aws s3api list-objects --bucket "yourbucket" |\
jq '.Contents[] | select(.LastModified | startswith("yourdate")).Key' --raw-output |\
xargs -I {} aws s3 cp s3://yourbucket/{} .
Basically, the script does the following:
list all objects in the given bucket
(the interesting part) jq parses the Contents array and selects the elements whose LastModified value starts with your pattern (which you will need to change), gets the Key of each S3 object, and --raw-output strips the quotes from the value
pass the result to an aws copy command to download the file from S3
If you want to automate a bit further, you can get yesterday's date from the command line.
For macOS:
$ export YESTERDAY=`date -v-1d +%F`
$ aws s3api list-objects --bucket "ariba-install" |\
jq '.Contents[] | select(.LastModified | startswith('\"$YESTERDAY\"')).Key' --raw-output |\
xargs -I {} aws s3 cp s3://ariba-install/{} .
For Linux (or other flavors of bash that I am not familiar with):
$ export YESTERDAY=`date -d "1 day ago" '+%Y-%m-%d' `
$ aws s3api list-objects --bucket "ariba-install" |\
jq '.Contents[] | select(.LastModified | startswith('\"$YESTERDAY\"')).Key' --raw-output |\
xargs -I {} aws s3 cp s3://ariba-install/{} .
Now you get the idea: change how the YESTERDAY variable is computed if you want a different date.
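The same selection can also be sketched without jq, using the starts_with function of the --query expression language, together with one helper that covers both date flavors. This is an illustration only: yesterday and download_from_day are hypothetical names, and keys are assumed not to contain newlines.

```shell
# Print yesterday as YYYY-MM-DD; tries GNU date first, then BSD/macOS date.
yesterday() {
    date -d "1 day ago" +%F 2>/dev/null || date -v-1d +%F
}

# Hypothetical helper: download_from_day BUCKET DAY
# Copies every object whose LastModified starts with DAY to the current directory.
download_from_day() {
    bucket=$1
    day=$2
    aws s3api list-objects --bucket "$bucket" \
        --query "Contents[?starts_with(LastModified, '$day')].[Key]" \
        --output text |
    while IFS= read -r key; do
        # Skip blank lines and the "None" printed when nothing is listed
        case $key in ''|None) continue ;; esac
        aws s3 cp "s3://$bucket/$key" .
    done
}
```

A call would look like download_from_day yourbucket "$(yesterday)".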

Recover Deleted Objects From Amazon S3

I have a bucket with versioning enabled. How can I get back objects that were accidentally permanently deleted from it?

I have created a script that restores objects hidden by a delete marker. Invoke it like this:
sh Undelete_deletemarker.sh bucketname path/to/certain/folder
**Script:**
#!/bin/bash
#please provide the bucketname and path to destination folder to restore
# Remove the delete markers under the prefix so the hidden objects become visible again
aws s3api list-object-versions --bucket $1 --prefix $2 --output text |
grep "DELETEMARKERS" | while read obj
do
KEY=$( echo $obj| awk '{print $3}')
VERSION_ID=$( echo $obj | awk '{print $5}')
echo $KEY
echo $VERSION_ID
aws s3api delete-object --bucket $1 --key $KEY --version-id
$VERSION_ID
done
Happy Coding! ;)

Thank you, Kc Bickey, this script works wonderfully! Only thing I might add for others is to make sure " $VERSION_ID" immediately follows "--version-id" on line 12. The forum seems to have wrapped " $VERSION_ID" to the next line and it causes the script to error until that's corrected.
**Script:**
#!/bin/bash
# Please provide the bucket name and the path of the folder to restore
# Remove the delete markers under the prefix so the hidden objects become visible again
aws s3api list-object-versions --bucket "$1" --prefix "$2" --output text |
grep "DELETEMARKERS" | while read -r obj
do
KEY=$( echo "$obj" | awk '{print $3}')
VERSION_ID=$( echo "$obj" | awk '{print $5}')
echo "$KEY"
echo "$VERSION_ID"
aws s3api delete-object --bucket "$1" --key "$KEY" --version-id "$VERSION_ID"
done

With bucket versioning enabled, permanently deleting an object requires naming the specific version in the request (DELETE Object versionId).
If you have done that, you cannot recover that specific version; you can only access the previous versions.
When versioning is enabled, a simple DELETE cannot permanently delete an object. Instead, Amazon S3 inserts a delete marker in the bucket, and you can recover the object by removing that marker. If the object version itself was deleted (and you mention it was permanently deleted), you cannot recover it.
Did you enable Cross-Region Replication? If so, you can retrieve the object from the other region:
If a DELETE request specifies a particular object version ID to delete, Amazon S3 will delete that object version in the source bucket, but it will not replicate the deletion in the destination bucket (in other words, it will not delete the same object version from the destination bucket). This behavior protects data from malicious deletions.
Edit: If versioning is enabled on your bucket, the console should show a Versions Hide/Show toggle button; when Show is selected, an additional Version ID column appears.

If your object keys contain white space, the previous scripts may not work properly. This script captures the whole key, including white space.
#!/bin/bash
#please provide the bucketname and path to destination folder to restore
# Remove the delete markers under the prefix so the hidden objects become visible again
aws s3api list-object-versions --bucket $1 --prefix $2 --output text |
grep "DELETEMARKERS" | while read obj
do
KEY=$( echo $obj| awk '{indice=index($0,$(NF-1))-index($0,$3);print substr($0, index($0,$3), indice-1)}')
VERSION_ID=$( echo $obj | awk '{print $NF}')
echo $KEY
echo $VERSION_ID
aws s3api delete-object --bucket $1 --key "$KEY" --version-id $VERSION_ID
done
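Another way to keep keys with white space intact is to ask the CLI for just the two needed columns and split them on the tab character. This is a sketch, not a drop-in replacement: undelete_prefix and DRY_RUN are hypothetical names, only the current (IsLatest) delete markers are selected, and keys are assumed to contain no tabs or newlines.

```shell
# Hypothetical helper: undelete_prefix BUCKET PREFIX
# Removes the current delete marker of each key under PREFIX, which makes
# the previous version visible again. DRY_RUN (assumed name) defaults to 1.
undelete_prefix() {
    bucket=$1
    prefix=$2
    tab=$(printf '\t')
    aws s3api list-object-versions --bucket "$bucket" --prefix "$prefix" \
        --query 'DeleteMarkers[?IsLatest==`true`].[Key,VersionId]' \
        --output text |
    while IFS=$tab read -r key version; do
        # Skip blank lines and the "None" printed when there are no markers
        case $key in ''|None) continue ;; esac
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "would remove marker $version from $key"
        else
            aws s3api delete-object --bucket "$bucket" --key "$key" \
                --version-id "$version"
        fi
    done
}
```

Running it once with the default dry run shows which markers would be removed before anything is changed.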

This version of the script worked really well for me. I have a bucket that has a directory with 180,000 items in it, and this one chews through them and restores all the files that are in a directory/folder that is within the bucket.
If you just need to restore all the items in a bucket that don't have a directory, then you can just drop the prefix parameter.
#!/bin/bash
BUCKET=mybucketname
DIRECTORY=myfoldername
function run() {
  # List the key of every delete marker under the prefix
  aws s3api list-object-versions --bucket "${BUCKET}" --prefix="${DIRECTORY}" --query='{Objects: DeleteMarkers[].{Key:Key}}' --output text |
  while read -r KEY
  do
    if [[ "$KEY" == "None" ]]; then
      continue
    fi
    # Drop the first column of the text output and keep the key
    KEY=$(echo "${KEY}" | awk '{$1=""; print $0}' | sed "s/^ *//g")
    VERSION=$(aws s3api list-object-versions --bucket "${BUCKET}" --prefix="$KEY" --query='{Objects: DeleteMarkers[].{VersionId:VersionId}}' --output text | awk '{$1=""; print $0}' | sed "s/^ *//g")
    echo "${KEY}"
    echo "${VERSION}"
    # Deleting the delete marker makes the object visible again
    aws s3api delete-object --bucket "${BUCKET}" --key="${KEY}" --version-id "${VERSION}"
  done
}
run
Note that running two copies of this script at the same time does not speed things up: the second copy just gets the same records as the first, so it doesn't really do anything. For a massive bucket, I might set up 3-4 scripts, each filtering on files that start with a certain letter or number. At least this way you can start working on files deeper down in the bucket.
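The idea of splitting the work by leading letter or number could be sketched as one worker per prefix, all started in the background. Everything here is hypothetical: run_per_prefix is an invented name, and the worker is whatever function or script restores a single prefix.

```shell
# Hypothetical helper: run_per_prefix WORKER PREFIX...
# Starts WORKER once per prefix, all in parallel, then waits for all of them.
run_per_prefix() {
    worker=$1
    shift
    for prefix in "$@"; do
        "$worker" "$prefix" &
    done
    wait
}
```

For example, run_per_prefix restore_one myfoldername/a myfoldername/b myfoldername/c would start three workers, assuming restore_one is a function that takes a prefix as its only argument.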
