Here is the command I used, and it works fine when there is no space in the S3 URL:
aws s3 ls <s3://bucket/folder/> --recursive | awk '{print $4}' | awk "NR >= 2" | xargs -I %%% aws s3api restore-object --bucket <bucket> --restore-request Days=3,GlacierJobParameters={"Tier"="Bulk"} --key %%%
But if there is a space in the S3 URL, like in the picture I attached, it returns an error message. I don't know what the problem is; how do I fix it?
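The likely culprit is awk '{print $4}': it keeps only the fourth whitespace-separated field, so a key such as folder/my file.txt is truncated at the first space. Below is a rough sketch that keeps the whole key (bucket/folder are the same placeholders as in the question; I dropped the NR >= 2 filter, so add it back if you need to skip the first entry):

aws s3 ls "s3://bucket/folder/" --recursive |
  sed -E 's/^ *[^ ]+ +[^ ]+ +[^ ]+ //' |    # strip date, time and size columns; keep the full key
  while IFS= read -r key; do
    aws s3api restore-object \
      --bucket bucket \
      --restore-request 'Days=3,GlacierJobParameters={Tier=Bulk}' \
      --key "$key"
  done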
I have a repository in S3 with files with old dates (older than 2019-10-20).
I want to delete only those files.
The problem is that when you apply xargs rm, you can't delete the files, because by default it concatenates the date with the name.
aws s3 ls s3://my_repo/
2019-10-17 06:07:09 9307 20191017_060016_00112_u246w_0950f96f-a55a-4ce5-b0f3-b271ecb8fe90
2019-10-17 06:07:09 467791 20191017_060016_00112_u246w_94bbd3a2-76ea-4c04-8189-d963168ea34b
2019-10-21 19:35:12 1633 20191021_193156_01159_myrsw_2e68c0e4-b1a3-4abf-94b3-797ef653b742
2019-10-21 19:35:12 1643 20191021_193156_01159_myrsw_3491c665-82e3-475c-bba2-35e7d61d7912
aws s3 ls s3://my_repo/ | awk '$1 < "2019-10-20 00:00:00" '
2019-10-17 06:07:09 9307 20191017_060016_00112_u246w_0950f96f-a55a-4ce5-b0f3-b271ecb8fe90
2019-10-17 06:07:09 467791 20191017_060016_00112_u246w_94bbd3a2-76ea-4c04-8189-d963168ea34b
aws s3 ls s3://my_repo/ | awk '$1 < "2019-10-20 00:00:00" {print $0}' | xargs -0 rm --
rm: cannot remove '2019-10-17 06:07:09 9307 20191017_060016_00112_u246w_0950f96f-a55a-4ce5-b0f3-b271ecb8fe90': File name too long
Rather than using aws s3 ls, you can use:
aws s3api list-objects --bucket my-bucket --query "Contents[?LastModified<='2019-06-01'].[Key]" --output text
This will list the name (Key) of objects created before the given date.
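If you then want to delete those objects rather than just list them, one approach (a sketch only; review the listed keys before deleting, and note that my-bucket and the cutoff date are just the placeholders from the question) is to feed the keys back into aws s3 rm:

aws s3api list-objects --bucket my-bucket \
  --query "Contents[?LastModified<='2019-10-20'].[Key]" --output text |
  while IFS= read -r key; do
    aws s3 rm "s3://my-bucket/${key}"   # keys with spaces are preserved by the read loop
  done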
I am trying to figure out what the s3cmd command would be to download files from a bucket by date. For example, I have a bucket named "test", and in that bucket there are different files from different dates. I am trying to get the files that were uploaded yesterday. What would the command be?
There is no single command that will allow you to do that. You have to write a script, something like the one below, or use an SDK that allows you to do this. The script below is a sample that downloads S3 files older than a given age ("30 days" in the example).
#!/bin/bash
# Usage: ./getOld "bucketname" "30 days"
s3cmd ls "s3://$1" | while read -r line; do
    createDate=$(echo "$line" | awk '{print $1" "$2}')
    createDate=$(date -d "$createDate" +%s)
    olderThan=$(date -d "-$2" +%s)
    if [[ $createDate -lt $olderThan ]]; then
        fileName=$(echo "$line" | awk '{print $4}')
        echo "$fileName"
        if [[ $fileName != "" ]]; then
            s3cmd get "$fileName"
        fi
    fi
done
I like s3cmd, but to work with a single-line command, I prefer the JSON output of the AWS CLI and the jq JSON processor.
The command will look like:
aws s3api list-objects --bucket "yourbucket" |\
jq '.Contents[] | select(.LastModified | startswith("yourdate")).Key' --raw-output |\
xargs -I {} aws s3 cp s3://yourbucket/{} .
Basically, what the script does:
lists all objects from a given bucket
(the interesting part) jq parses the Contents array and selects elements whose LastModified value starts with your pattern (which you will need to change), gets the Key of the S3 object, and uses --raw-output to strip the quotes from the value
passes the result to an aws copy command to download the file from S3
If you want to automate it a bit further, you can get yesterday's date from the command line.
For macOS:
$ export YESTERDAY=`date -v-1d +%F`
$ aws s3api list-objects --bucket "ariba-install" |\
jq '.Contents[] | select(.LastModified | startswith('\"$YESTERDAY\"')).Key' --raw-output |\
xargs -I {} aws s3 cp s3://ariba-install/{} .
For Linux (or other flavors of date that I am not familiar with):
$ export YESTERDAY=`date -d "1 day ago" '+%Y-%m-%d' `
$ aws s3api list-objects --bucket "ariba-install" |\
jq '.Contents[] | select(.LastModified | startswith('\"$YESTERDAY\"')).Key' --raw-output |\
xargs -I {} aws s3 cp s3://ariba-install/{} .
Now you get the idea; change the YESTERDAY variable if you want a different kind of date.
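If jq isn't available, roughly the same filtering can also be done with the CLI's own --query option (a sketch only; yourbucket is the same placeholder as above, and it assumes keys contain no tab characters):

TARGET_DATE=$(date -d "1 day ago" '+%Y-%m-%d')   # or: date -v-1d +%F on macOS
aws s3api list-objects --bucket "yourbucket" \
  --query "Contents[?starts_with(LastModified, '${TARGET_DATE}')].Key" \
  --output text | tr '\t' '\n' |
  xargs -I {} aws s3 cp "s3://yourbucket/{}" .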
Does Amazon provide an easy way to see how much storage my S3 bucket or folder is using? This is so I can calculate my costs, etc.
Two ways:
Using the AWS CLI
aws s3 ls --summarize --human-readable --recursive s3://bucket/folder/*
If we omit the / at the end, it will get all folders starting with your folder name and give the total size of all of them.
aws s3 ls --summarize --human-readable --recursive s3://bucket/folder
Using the boto3 API
import boto3

def get_folder_size(bucket, prefix):
    total_size = 0
    for obj in boto3.resource('s3').Bucket(bucket).objects.filter(Prefix=prefix):
        total_size += obj.size
    return total_size
Amazon has changed the web interface, so now you have "Get Size" under the "More" menu.
Answer updated for 2021 :)
In the AWS console, under S3 buckets, find the bucket (or a folder inside it) and click "Calculate total size".
As of 28 July 2015, you can get this information via CloudWatch:
aws cloudwatch get-metric-statistics --namespace AWS/S3 --start-time 2015-07-15T10:00:00 \
  --end-time 2015-07-31T01:00:00 --period 86400 --statistics Average --region us-east-1 \
  --metric-name BucketSizeBytes --dimensions Name=BucketName,Value=myBucketNameGoesHere \
  Name=StorageType,Value=StandardStorage
Important: You must specify both StorageType and BucketName in the dimensions argument, otherwise you will get no results.
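If you want just a number out of that JSON, one way (a sketch, assuming jq is installed and at least one datapoint came back) is:

aws cloudwatch get-metric-statistics --namespace AWS/S3 --start-time 2015-07-15T10:00:00 \
  --end-time 2015-07-31T01:00:00 --period 86400 --statistics Average --region us-east-1 \
  --metric-name BucketSizeBytes --dimensions Name=BucketName,Value=myBucketNameGoesHere \
  Name=StorageType,Value=StandardStorage |
  jq -r '.Datapoints | sort_by(.Timestamp) | .[-1].Average' |    # newest datapoint, in bytes
  awk '{ printf "%.2f GiB\n", $1 / 1024 / 1024 / 1024 }'         # convert bytes to GiB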
In case someone needs byte precision:
aws s3 ls --summarize --recursive s3://path | tail -1 | awk '{print $3}'
Answer adjusted for 2020:
Go into your bucket, select all folders and files, and click "Actions" -> "Get Total Size".
I use s3cmd du s3://BUCKET/ --human-readable to view the size of folders in S3. It gives quite detailed info about the total number of objects in the bucket and their size, in a very readable form.
Using the AWS Web Console and Cloudwatch:
Go to CloudWatch
Click Metrics from the left side of the screen
Click S3
Click Storage
You will see a list of all buckets. Note there are two possible points of confusion here:
a. You will only see buckets that have at least one object in the bucket.
b. You may not see buckets created in a different region and you might need to switch regions using the pull down at the top right to see the additional buckets
Search for the word "StandardStorage" in the area stating "Search for any metric, dimension or resource id"
Select the buckets (or all buckets with the checkbox at the left below the word "All") you would like to calculate total size for
Select at least 3d (3 days) or longer from the time bar towards the top right of the screen
You will now see a graph displaying the daily (or other period) size of all selected buckets over the selected time range.
The most recent and easiest way is to go to the "Metrics" tab.
It provides a clear view of the bucket size and the number of objects inside it.
As an alternative, you can try s3cmd, which has a du command like Unix.
If you don't need an exact byte count, or if the bucket is really large (in the TBs or millions of objects), using CloudWatch metrics is the fastest way, as it doesn't require iterating through all the objects, which can take significant CPU time and can end in a timeout or network error when using a CLI command.
Based on some examples from others on SO for running the aws cloudwatch get-metric-statistics command, I've wrapped it up in a useful Bash function that allows you to optionally specify a profile for the aws command:
# print S3 bucket size and count
# usage: bsize <bucket> [profile]
function bsize() (
  bucket=$1 profile=${2-default}

  if [[ -z "$bucket" ]]; then
    echo >&2 "bsize <bucket> [profile]"
    return 1
  fi

  # ensure aws/jq/numfmt are installed
  for bin in aws jq numfmt; do
    if ! hash $bin 2> /dev/null; then
      echo >&2 "Please install \"$_\" first!"
      return 1
    fi
  done

  # get bucket region
  region=$(aws --profile $profile s3api get-bucket-location --bucket $bucket 2> /dev/null | jq -r '.LocationConstraint // "us-east-1"')
  if [[ -z "$region" ]]; then
    echo >&2 "Invalid bucket/profile name!"
    return 1
  fi

  # get storage class (assumes all objects are in the same class)
  sclass=$(aws --profile $profile s3api list-objects --bucket $bucket --max-items=1 2> /dev/null | jq -r '.Contents[].StorageClass // "STANDARD"')
  case $sclass in
    REDUCED_REDUNDANCY) sclass="ReducedRedundancyStorage" ;;
    GLACIER)            sclass="GlacierStorage" ;;
    DEEP_ARCHIVE)       sclass="DeepArchiveStorage" ;;
    *)                  sclass="StandardStorage" ;;
  esac

  # _bsize <metric> <stype>
  _bsize() {
    metric=$1 stype=$2
    utnow=$(date +%s)
    aws --profile $profile cloudwatch get-metric-statistics --namespace AWS/S3 --start-time "$(echo "$utnow - 604800" | bc)" --end-time "$utnow" --period 604800 --statistics Average --region $region --metric-name $metric --dimensions Name=BucketName,Value="$bucket" Name=StorageType,Value="$stype" 2> /dev/null | jq -r '.Datapoints[].Average'
  }

  # _print <number> <units> <format> [suffix]
  _print() {
    number=$1 units=$2 format=$3 suffix=$4
    if [[ -n "$number" ]]; then
      numfmt --to="$units" --suffix="$suffix" --format="$format" $number | sed -En 's/([^0-9]+)$/ \1/p'
    fi
  }

  _print "$(_bsize BucketSizeBytes $sclass)" iec-i "%10.2f" B
  _print "$(_bsize NumberOfObjects AllStorageTypes)" si "%8.2f"
)
A few caveats:
For simplicity, the function assumes that all objects in the bucket are in the same storage class!
On macOS, use gnumfmt instead of numfmt.
If numfmt complains about invalid --format option, upgrade GNU coreutils for floating-point precision support.
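Example invocation (bucket and profile names are placeholders):

$ bsize my-bucket                # uses the default AWS profile
$ bsize my-bucket other-profile  # uses a named profile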
s3cmd du --human-readable --recursive s3://Bucket_Name/
There are many ways to calculate the total size of folders in the bucket
Using AWS Console
S3 Buckets > #Bucket > #folder > Actions > Calculate total size
Using AWS CLI
aws s3 ls s3://YOUR_BUCKET/YOUR_FOLDER/ --recursive --human-readable --summarize
The command's output shows:
The date the objects were created
Individual file size of each object
The path of each object
The total number of objects in the S3 bucket
The total size of the objects in the bucket
Using Bash script
#!/bin/bash
while IFS= read -r line; do
    echo "$line"
    aws s3 ls --summarize --human-readable --recursive "s3://#bucket/$line" --region #region | tail -n 2 | awk '{print $1 $2 $3 $4}'
    echo "----------"
done < folder-name.txt
Sample Output:
test1/
TotalObjects:10
TotalSize:2.1KiB
----------
s3folder1/
TotalObjects:2
TotalSize:18.2KiB
----------
testfolder/
TotalObjects:1
TotalSize:112.0MiB
----------
Found here
aws s3api list-objects --bucket cyclops-images --output json --query "[sum(Contents[].Size), length(Contents[])]" | awk 'NR!=2 {print $0;next} NR==2 {print $0/1024/1024/1024" GB"}'
You can visit this URL to see the size of your bucket on the "Metrics" tab in S3: https://s3.console.aws.amazon.com/s3/buckets/{YOUR_BUCKET_NAME}?region={YOUR_REGION}&tab=metrics
The data's actually in CloudWatch so you can just go straight there instead and then save the buckets you're interested in to a dashboard.
In Node.js:
// awshelper.getS3Instance() is the author's own wrapper that returns an
// aws-sdk S3 client (e.g. new AWS.S3()).
const getAllFileList = (s3bucket, prefix = null, token = null, files = []) => {
  var opts = { Bucket: s3bucket, Prefix: prefix };
  let s3 = awshelper.getS3Instance();
  if (token) opts.ContinuationToken = token;
  return new Promise(function (resolve, reject) {
    s3.listObjectsV2(opts, async (err, data) => {
      if (err) return reject(err);
      files = files.concat(data.Contents);
      if (data.IsTruncated) {
        resolve(
          await getAllFileList(
            s3bucket,
            prefix,
            data.NextContinuationToken,
            files
          )
        );
      } else {
        resolve(files);
      }
    });
  });
};

const calculateSize = async (bucket, prefix) => {
  let fileList = await getAllFileList(bucket, prefix);
  let size = 0;
  for (let i = 0; i < fileList.length; i++) {
    size += fileList[i].Size;
  }
  return size;
};
Now just call calculateSize("YOUR_BUCKET_NAME", "YOUR_FOLDER_NAME").
I have a bucket with versioning enabled. How can I get back objects that were accidentally permanently deleted from my bucket?
I have created a script to restore objects that have a delete marker. You'll have to invoke it like below:
sh Undelete_deletemarker.sh bucketname path/to/certain/folder
**Script:**
#!/bin/bash
# please provide the bucket name ($1) and the prefix/folder to restore ($2)
# delete the delete markers for each object so the previous version becomes current again
aws s3api list-object-versions --bucket $1 --prefix $2 --output text |
grep "DELETEMARKERS" | while read obj
do
    KEY=$(echo $obj | awk '{print $3}')
    VERSION_ID=$(echo $obj | awk '{print $5}')
    echo $KEY
    echo $VERSION_ID
    aws s3api delete-object --bucket $1 --key $KEY --version-id
$VERSION_ID
done
Happy Coding! ;)
Thank you, Kc Bickey, this script works wonderfully! The only thing I might add for others is to make sure "$VERSION_ID" immediately follows "--version-id" on line 12. The forum seems to have wrapped "$VERSION_ID" to the next line, and that causes the script to error until it's corrected.
**Script:**
#!/bin/bash
# please provide the bucket name ($1) and the prefix/folder to restore ($2)
# delete the delete markers for each object so the previous version becomes current again
aws s3api list-object-versions --bucket $1 --prefix $2 --output text |
grep "DELETEMARKERS" | while read obj
do
    KEY=$(echo $obj | awk '{print $3}')
    VERSION_ID=$(echo $obj | awk '{print $5}')
    echo $KEY
    echo $VERSION_ID
    aws s3api delete-object --bucket $1 --key $KEY --version-id $VERSION_ID
done
With bucket versioning enabled, to permanently delete an object you need to specifically mention the version of the object (DELETE Object versionId).
If you've done so, you cannot recover that specific version; you only get access to the previous versions.
When versioning is enabled, a simple DELETE cannot permanently delete an object. Instead, Amazon S3 inserts a delete marker in the bucket, and you can recover the object by removing that marker. But if the version itself was deleted (and you mention it was permanently deleted), you cannot recover it.
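To check what actually happened to a given key, you can list its versions and delete markers (a sketch; my-bucket and path/to/object are placeholders). If the newest entry is a delete marker, removing that marker restores the object; if no real versions are listed, the data is gone:

aws s3api list-object-versions --bucket my-bucket --prefix "path/to/object"

# if the latest entry is a delete marker, removing it "undeletes" the object
aws s3api delete-object --bucket my-bucket --key "path/to/object" --version-id <delete-marker-version-id>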
Did you enable Cross-Region Replication? If so, you can retrieve the object from the other region:
If a DELETE request specifies a particular object version ID to delete, Amazon S3 will delete that object version in the source bucket, but it will not replicate the deletion in the destination bucket (in other words, it will not delete the same object version from the destination bucket). This behavior protects data from malicious deletions.
Edit: If you have versioning enabled on your bucket, you should see the Versions Hide/Show toggle button, and when Show is selected you should have the additional Version ID column, as per the screenshot from my bucket.
If your bucket's objects have white space in their file names, the previous scripts may not work properly. This script takes the key including the white space.
#!/bin/bash
# please provide the bucket name ($1) and the prefix/folder to restore ($2)
# delete the delete markers for each object; the key is extracted with its white space intact
aws s3api list-object-versions --bucket $1 --prefix $2 --output text |
grep "DELETEMARKERS" | while read obj
do
    KEY=$(echo $obj | awk '{indice=index($0,$(NF-1))-index($0,$3); print substr($0, index($0,$3), indice-1)}')
    VERSION_ID=$(echo $obj | awk '{print $NF}')
    echo "$KEY"
    echo $VERSION_ID
    aws s3api delete-object --bucket $1 --key "$KEY" --version-id $VERSION_ID
done
This version of the script worked really well for me. I have a bucket that has a directory with 180,000 items in it, and this one chews through them and restores all the files that are in a directory/folder that is within the bucket.
If you just need to restore all the items in a bucket that don't have a directory, then you can just drop the prefix parameter.
#!/bin/bash
BUCKET_NAME=mybucketname
DIRECTORY=myfoldername

function run() {
    aws s3api list-object-versions --bucket ${BUCKET_NAME} --prefix="${DIRECTORY}" --query='{Objects: DeleteMarkers[].{Key:Key}}' --output text |
    while read KEY
    do
        if [[ "$KEY" == "None" ]]; then
            continue
        else
            KEY=$(echo ${KEY} | awk '{$1=""; print $0}' | sed "s/^ *//g")
            VERSION=$(aws s3api list-object-versions --bucket ${BUCKET_NAME} --prefix="$KEY" --query='{Objects: DeleteMarkers[].{VersionId:VersionId}}' --output text | awk '{$1=""; print $0}' | sed "s/^ *//g")
            echo ${KEY}
            echo ${VERSION}
        fi
        aws s3api delete-object --bucket ${BUCKET_NAME} --key="${KEY}" --version-id ${VERSION}
    done
}

run
Note: running this script a second time will run, but it won't do anything; it just returns the same records as the first run. If you had a massive bucket, I might set up 3-4 copies of the script that filter by keys starting with a certain letter/number. At least this way you can start working on files deeper down in the bucket.
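If you do go the "split the work by leading character" route, here is a rough sketch of the same idea in a single loop (assuming your keys start with a digit or lowercase letter, and reusing run() and DIRECTORY from the script above):

# process the bucket in chunks, one leading character at a time;
# adjust the prefix list to match your key naming scheme
for prefix in {0..9} {a..z}; do
    DIRECTORY="$prefix"   # run() uses this as its --prefix
    run
done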