How to pass AWS CLI parameters as variables in PowerShell - amazon-web-services

I have an AWS CLI command to download some files from my S3 bucket, but I want to pass parameters and their values in from PowerShell variables. Is this possible?
This works:
$filterInclude = "7012*"
$results = aws s3 cp $bucketPath $destinationDir --recursive --exclude $filterExclude --include $filterInclude
But I wanted something like:
$filterInclude = "7012*"
$includeCom = "--include `"$($filterInclude)`""
$results = aws s3 cp $bucketPath $destinationDir --recursive --exclude $filterExclude "$($includeCom)"
The result I get is:
Unknown options: --include "7012*"

Related

aws s3 sync missing to create root folders

I am archiving some folders to S3
Example: C:\UserProfile\E21126\data ....
I expect to have a folder structure in s3 like, UserProfiles\E21126.
The problem is that it creates the folders under \E21126 but misses creating the root folder \E21126 itself in S3.
Folds1.txt contains these folders to sync:
G:\UserProfiles\E21126
G:\UserProfiles\E47341
G:\UserProfiles\C68115
G:\UserProfiles\C30654
G:\UserProfiles\C52860
my code below:
ForEach ($Folder in (Get-content "F:\scripts\Folds1.txt")) {
aws s3 sync $Folder s3://css-lvdae1cxfs003-archive/Archive-Profiles/ --acl bucket-owner-full-control --storage-class STANDARD
}
It will upload all the folders with their names, excluding the path. If you want to include UserProfiles in the S3 key, then you will need to include that in the destination. You need to upload them to the S3 bucket while specifying the key name:
aws s3 sync $Folder s3://css-lvdae1cxfs003-archive/Archive-Profiles/UserProfiles --acl bucket-owner-full-control --storage-class STANDARD
If your paths use a different name than the UserProfiles string, you can get the parent path and then fetch its leaf to pull the folder name out of the string:
PS C:\> Split-Path -Path "G:\UserProfiles\E21126"
G:\UserProfiles
PS C:\> Split-Path -Path "G:\UserProfiles" -Leaf -Resolve
UserProfiles
If you were to modify the text file to contain:
E21126
E47341
C68115
Then you could use the command:
ForEach ($Folder in (Get-content "F:\scripts\Folds1.txt")) {
aws s3 sync G:\UserProfiles\$Folder s3://css-lvdae1cxfs003-archive/Archive-Profiles/$Folder/ --acl bucket-owner-full-control --storage-class STANDARD
}
Note that the folder name is included in the destination path.

How to Remove Delete Markers from Multiple Objects on Amazon S3 at once

I have an Amazon S3 bucket with versioning enabled. Due to a misconfigured lifecycle policy, many of the objects in this bucket had Delete Markers added to them.
I can remove these markers from the S3 console to restore the previous versions of these objects, but there are enough objects to make doing this manually on the web console extremely time-inefficient.
Is there a way to find all Delete Markers in an S3 bucket and remove them, restoring all files in that bucket? Ideally I would like to do this from the console itself, although I will happily write a script or use the amazon CLI tools to do this if that's the only way.
Thanks!
Use this to restore the files inside a specific folder. I've used AWS CLI commands in my script. Provide the input as:
sh scriptname.sh bucketname path/to/a/folder
Script:
#!/bin/bash
#please provide the bucketname and path to destination folder to restore
# Remove all versions and delete markers for each object
aws s3api list-object-versions --bucket $1 --prefix $2 --output text |
grep "DELETEMARKERS" | while read obj
do
KEY=$( echo $obj| awk '{print $3}')
VERSION_ID=$( echo $obj | awk '{print $5}')
echo $KEY
echo $VERSION_ID
aws s3api delete-object --bucket $1 --key $KEY --version-id $VERSION_ID
done
Edit: put $VERSION_ID in correct position in the script
Here's a sample Python implementation:
import boto3
import botocore
BUCKET_NAME = 'BUCKET_NAME'
s3 = boto3.resource('s3')
def main():
    bucket = s3.Bucket(BUCKET_NAME)
    versions = bucket.object_versions
    for version in versions.all():
        if is_delete_marker(version):
            version.delete()

def is_delete_marker(version):
    try:
        # note head() is faster than get()
        version.head()
        return False
    except botocore.exceptions.ClientError as e:
        if 'x-amz-delete-marker' in e.response['ResponseMetadata']['HTTPHeaders']:
            return True
        # an older version of the key but not a DeleteMarker
        elif '404' == e.response['Error']['Code']:
            return False

if __name__ == '__main__':
    main()
For some context for this answer see:
https://docs.aws.amazon.com/AmazonS3/latest/dev/DeleteMarker.html
If you try to get an object and its current version is a delete
marker, Amazon S3 responds with:
A 404 (Object not found) error
A response header, x-amz-delete-marker: true
The response header tells you that the object accessed was a delete
marker. This response header never returns false; if the value is
false, Amazon S3 does not include this response header in the
response.
The only way to list delete markers (and other versions of an object)
is by using the versions subresource in a GET Bucket versions request.
A simple GET does not retrieve delete marker objects.
Unfortunately, despite what is written in https://github.com/boto/botocore/issues/674, checking if ObjectVersion.size is None is not a reliable way to determine if a version is a delete marker as it will also be true for previously deleted versions of folder keys.
Currently, boto3 is missing a straightforward way to determine if an ObjectVersion is a DeleteMarker. See https://github.com/boto/boto3/issues/1769
However, ObjectVersion.head() and .get() operations will throw an exception on an ObjectVersion that is a DeleteMarker. Catching this exception is likely the only reliable way of determining if an ObjectVersion is a DeleteMarker.
I just wrote a program (using boto) to solve the same problem:
from boto.s3 import deletemarker
from boto.s3.connection import S3Connection
from boto.s3.key import Key
conn = S3Connection()  # assumes boto credentials are already configured

def restore_bucket(bucket_name):
    bucket = conn.get_bucket(bucket_name)
    for version in bucket.list_versions():
        if isinstance(version, deletemarker.DeleteMarker) and version.is_latest:
            bucket.delete_key(version.name, version_id=version.version_id)
If you need to restore folders within the versioned buckets, the rest of the program I wrote can be found here.
Define variables
PROFILE="personal"
REGION="eu-west-1"
BUCKET="mysql-backend-backups-prod"
Delete DeleteMarkers at once
aws --profile $PROFILE s3api delete-objects \
--region $REGION \
--bucket $BUCKET \
--delete "$(aws --profile $PROFILE s3api list-object-versions \
--region $REGION \
--bucket $BUCKET \
--output=json \
--query='{Objects: DeleteMarkers[].{Key:Key,VersionId:VersionId}}')"
Delete versions at once
aws --profile $PROFILE s3api delete-objects \
--region $REGION \
--bucket $BUCKET \
--delete "$(aws --profile $PROFILE s3api list-object-versions \
--region $REGION \
--bucket $BUCKET \
--output=json \
--query='{Objects: Versions[].{Key:Key,VersionId:VersionId}}')"
And delete S3 bucket afterward
aws --profile $PROFILE s3api delete-bucket \
--region $REGION \
--bucket $BUCKET
You would need to write a program to:
Loop through all objects in the Amazon S3 bucket
Retrieve the version IDs for each version of each object
Delete the delete markers
This could be done fairly easily using the SDK, such as boto.
The AWS Command-Line Interface (CLI) can also be used, but you would have to build a script around it to capture the IDs and then delete the markers.
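A rough CLI sketch of those steps, along the lines of the other answers on this page (the bucket name is a placeholder, and there is no pagination handling):
# list every delete marker's key and version ID, then delete each marker
bucket=my-bucket
aws s3api list-object-versions --bucket "$bucket" \
    --query 'DeleteMarkers[].[Key,VersionId]' --output text |
while read -r key version_id; do
    aws s3api delete-object --bucket "$bucket" --key "$key" --version-id "$version_id"
done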
I was dealing with this problem a few weeks ago.
Finally I managed to write a function in PHP that deletes the delete markers of the latest version of the files within a prefix.
Personally, it worked perfectly and, in one pass of this script, iterating through all the prefixes, I managed to fix my own mistake of having unintentionally deleted many S3 objects.
I leave my PHP implementation below:
private function restore_files($file)
{
    $storage = get_storage()->getDriver()->getAdapter()->getClient();
    $bucket_name = 'my_bucket_name';

    $s3_path = $file->s3_path;
    $restore_folder_path = pathinfo($s3_path, PATHINFO_DIRNAME);

    $data = $storage->listObjectVersions([
        'Bucket' => $bucket_name,
        'Prefix' => $restore_folder_path,
    ]);
    $data_array = $data->toArray();
    $deleteMarkers = $data_array['DeleteMarkers'];

    foreach ($deleteMarkers as $key => $delete_marker) {
        if ($delete_marker["IsLatest"]) {
            $objkey = $delete_marker["Key"];
            $objVersionId = $delete_marker["VersionId"];

            $delete_response = $storage->deleteObjectAsync([
                'Bucket' => $bucket_name,
                'Key' => $objkey,
                'VersionId' => $objVersionId
            ]);
        }
    }
}
Some considerations about the script:
The code was implemented using the Laravel framework, so in the $storage variable I get the plain PHP SDK client, without using Laravel's wrapper. So the $storage variable is the Client object of the S3 SDK. Here is the documentation that I used.
The $file parameter that the function receives is an object that has the s3_path among its properties. So, in the $restore_folder_path variable, I get the prefix of the object's S3 path.
Finally, I get all the objects under that prefix in S3. I iterate over the DeleteMarkers list and check whether the current entry is the latest delete marker. If it is, I call the deleteObject function with the specific version ID of the object whose delete marker I want to remove. This is the way the S3 documentation specifies to remove a delete marker.
Most of the above versions are very slow on large buckets as they use delete-object rather than delete-objects. Here is a variant on the bash version which uses awk to issue 100 requests at a time:
Edit: just saw @Viacheslav's version, which also uses delete-objects and is nice and clean, but will fail with large numbers of markers due to line length issues.
#!/bin/bash
bucket=$1
prefix=$2
aws s3api list-object-versions \
--bucket "$bucket" \
--prefix "$prefix" \
--query 'DeleteMarkers[][Key,VersionId]' \
--output text |
awk '{ acc = acc "{Key=" $1 ",VersionId=" $2 "}," }
NR % 100 == 0 {print "Objects=[" acc "],Quiet=False"; acc="" }
END { print "Objects=[" acc "],Quiet=False" }' |
while read batch; do
aws s3api delete-objects --bucket "$bucket" --delete "$batch" --output text
done
Set up a lifecycle rule to remove them after a certain number of days. Otherwise, it will cost you $0.005 per 1,000 object listings.
So the most efficient way is to set up a lifecycle rule.
Here is the step by step method.
https://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-lifecycle.html
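If you prefer the CLI to the console walkthrough, a minimal sketch of such a rule (mirroring the lifecycle examples further down this page; BUCKET_NAME is a placeholder):
# expire delete markers that no longer hide any versions
aws s3api put-bucket-lifecycle-configuration --bucket BUCKET_NAME \
    --lifecycle-configuration '{"Rules":[{
        "ID": "remove-expired-delete-markers",
        "Filter": {"Prefix": ""},
        "Status": "Enabled",
        "Expiration": {"ExpiredObjectDeleteMarker": true}
    }]}'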
I checked the file size: a delete marker's size is None. The following removes all markers.
import boto3

default_session = boto3.session.Session(profile_name="default")
s3_re = default_session.resource(service_name="s3", region_name="ap-northeast-2")

for each_bucket in s3_re.buckets.all():
    bucket_name = each_bucket.name
    s3 = boto3.resource('s3')
    bucket = s3.Bucket(bucket_name)
    version = bucket.object_versions
    for ver in version.all():
        if ver.size is None:  # delete markers report no size
            delete_file = ver.delete()
            print(delete_file)
        else:
            pass

How do I find the total size of my AWS S3 storage bucket or folder?

Does Amazon provide an easy way to see how much storage my S3 bucket or folder is using? This is so I can calculate my costs, etc.
Two ways:
Using the AWS CLI
aws s3 ls --summarize --human-readable --recursive s3://bucket/folder/*
If we omit the trailing /, it will include all folders whose names start with your folder name and give the total size of all of them.
aws s3 ls --summarize --human-readable --recursive s3://bucket/folder
Using the boto3 API
import boto3
def get_folder_size(bucket, prefix):
    total_size = 0
    for obj in boto3.resource('s3').Bucket(bucket).objects.filter(Prefix=prefix):
        total_size += obj.size
    return total_size
Amazon has changed the Web interface so now you have the "Get Size" under the "More" menu.
Answer updated for 2021 :)
In your AWS console, under S3 buckets, find bucket, or folder inside it, and click Calculate total size.
As of the 28th July 2015 you can get this information via CloudWatch.
aws cloudwatch get-metric-statistics --namespace AWS/S3 --start-time 2015-07-15T10:00:00 \
    --end-time 2015-07-31T01:00:00 --period 86400 --statistics Average --region us-east-1 \
    --metric-name BucketSizeBytes --dimensions Name=BucketName,Value=myBucketNameGoesHere \
    Name=StorageType,Value=StandardStorage
Important: You must specify both StorageType and BucketName in the dimensions argument otherwise you will get no results.
In case someone needs byte precision:
aws s3 ls --summarize --recursive s3://path | tail -1 | awk '{print $3}'
Answer adjusted to 2020:
Go into your bucket, select all folders, files and click on "Actions"->"Get Total Size"
I use s3cmd du s3://BUCKET/ --human-readable to view size of folders in S3. It gives quite a detailed info about the total objects in the bucket and its size in a very readable form.
Using the AWS Web Console and Cloudwatch:
Go to CloudWatch
Click Metrics from the left side of the screen
Click S3
Click Storage
You will see a list of all buckets. Note there are two possible points of confusion here:
a. You will only see buckets that have at least one object in the bucket.
b. You may not see buckets created in a different region and you might need to switch regions using the pull down at the top right to see the additional buckets
Search for the word "StandardStorage" in the area stating "Search for any metric, dimension or resource id"
Select the buckets (or all buckets with the checkbox at the left below the word "All") you would like to calculate total size for
Select at least 3d (3 days) or longer from the time bar towards the top right of the screen
You will now see a graph displaying the daily (or other unit) size of list of all selected buckets over the selected time period.
The most recent and easiest way is to go to the "Metrics" tab.
It provides a clear picture of the bucket size and the number of objects inside it.
As an alternative, you can try s3cmd, which has a du command like Unix.
If you don't need an exact byte count or if the bucket is really large (in the TBs or millions of objects), using CloudWatch metrics is the fastest way as it doesn't require iterating through all the objects, which can take significant CPU and can end in a timeout or network error if using a CLI command.
Based on some examples from others on SO for running the aws cloudwatch get-metric-statistics command, I've wrapped it up in a useful Bash function that allows you to optionally specify a profile for the aws command:
# print S3 bucket size and count
# usage: bsize <bucket> [profile]
function bsize() (
bucket=$1 profile=${2-default}
if [[ -z "$bucket" ]]; then
echo >&2 "bsize <bucket> [profile]"
return 1
fi
# ensure aws/jq/numfmt are installed
for bin in aws jq numfmt; do
if ! hash $bin 2> /dev/null; then
echo >&2 "Please install \"$_\" first!"
return 1
fi
done
# get bucket region
region=$(aws --profile $profile s3api get-bucket-location --bucket $bucket 2> /dev/null | jq -r '.LocationConstraint // "us-east-1"')
if [[ -z "$region" ]]; then
echo >&2 "Invalid bucket/profile name!"
return 1
fi
# get storage class (assumes
# all objects in same class)
sclass=$(aws --profile $profile s3api list-objects --bucket $bucket --max-items=1 2> /dev/null | jq -r '.Contents[].StorageClass // "STANDARD"')
case $sclass in
REDUCED_REDUNDANCY) sclass="ReducedRedundancyStorage" ;;
GLACIER) sclass="GlacierStorage" ;;
DEEP_ARCHIVE) sclass="DeepArchiveStorage" ;;
*) sclass="StandardStorage" ;;
esac
# _bsize <metric> <stype>
_bsize() {
metric=$1 stype=$2
utnow=$(date +%s)
aws --profile $profile cloudwatch get-metric-statistics --namespace AWS/S3 --start-time "$(echo "$utnow - 604800" | bc)" --end-time "$utnow" --period 604800 --statistics Average --region $region --metric-name $metric --dimensions Name=BucketName,Value="$bucket" Name=StorageType,Value="$stype" 2> /dev/null | jq -r '.Datapoints[].Average'
}
# _print <number> <units> <format> [suffix]
_print() {
number=$1 units=$2 format=$3 suffix=$4
if [[ -n "$number" ]]; then
numfmt --to="$units" --suffix="$suffix" --format="$format" $number | sed -En 's/([^0-9]+)$/ \1/p'
fi
}
_print "$(_bsize BucketSizeBytes $sclass)" iec-i "%10.2f" B
_print "$(_bsize NumberOfObjects AllStorageTypes)" si "%8.2f"
)
A few caveats:
For simplicity, the function assumes that all objects in the bucket are in the same storage class!
On macOS, use gnumfmt instead of numfmt.
If numfmt complains about invalid --format option, upgrade GNU coreutils for floating-point precision support.
s3cmd du --human-readable --recursive s3://Bucket_Name/
There are many ways to calculate the total size of folders in the bucket
Using AWS Console
S3 Buckets > #Bucket > #folder > Actions > Calculate total size
Using AWS CLI
aws s3 ls s3://YOUR_BUCKET/YOUR_FOLDER/ --recursive --human-readable --summarize
The command's output shows:
The date the objects were created
Individual file size of each object
The path of each object
The total number of objects in the s3 bucket
The total size of the objects in the bucket
Using Bash script
#!/bin/bash
while IFS= read -r line; do
    echo $line
    aws s3 ls --summarize --human-readable --recursive s3://#bucket/$line --region #region | tail -n 2 | awk '{print $1 $2 $3 $4}'
    echo "----------"
done < folder-name.txt
Sample Output:
test1/
TotalObjects:10
TotalSize:2.1KiB
----------
s3folder1/
TotalObjects:2
TotalSize:18.2KiB
----------
testfolder/
TotalObjects:1
TotalSize:112 Mib
----------
Found here
aws s3api list-objects --bucket cyclops-images --output json --query "[sum(Contents[].Size), length(Contents[])]" | awk 'NR!=2 {print $0;next} NR==2 {print $0/1024/1024/1024" GB"}'
You can visit this URL to see the size of your bucket on the "Metrics" tab in S3: https://s3.console.aws.amazon.com/s3/buckets/{YOUR_BUCKET_NAME}?region={YOUR_REGION}&tab=metrics
The data's actually in CloudWatch so you can just go straight there instead and then save the buckets you're interested in to a dashboard.
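For example, assuming the standard AWS/S3 namespace, something like this should list the bucket-size metrics CloudWatch already has, so you can see which buckets to pin to a dashboard:
# list every bucket/storage-class combination that has a BucketSizeBytes metric
aws cloudwatch list-metrics --namespace AWS/S3 --metric-name BucketSizeBytes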
In NodeJs
const getAllFileList = (s3bucket, prefix = null, token = null, files = []) => {
  var opts = { Bucket: s3bucket, Prefix: prefix };
  let s3 = awshelper.getS3Instance();
  if (token) opts.ContinuationToken = token;
  return new Promise(function (resolve, reject) {
    s3.listObjectsV2(opts, async (err, data) => {
      if (err) return reject(err); // surface listing errors instead of crashing on undefined data
      files = files.concat(data.Contents);
      if (data.IsTruncated) {
        resolve(
          await getAllFileList(
            s3bucket,
            prefix,
            data.NextContinuationToken,
            files
          )
        );
      } else {
        resolve(files);
      }
    });
  });
};

const calculateSize = async (bucket, prefix) => {
  let fileList = await getAllFileList(bucket, prefix);
  let size = 0;
  for (let i = 0; i < fileList.length; i++) {
    size += fileList[i].Size;
  }
  return size;
};
Now just call calculateSize("YOUR_BUCKET_NAME", "YOUR_FOLDER_NAME")

How do I delete a versioned bucket in AWS S3 using the CLI?

I have tried both s3cmd:
$ s3cmd -r -f -v del s3://my-versioned-bucket/
And the AWS CLI:
$ aws s3 rm s3://my-versioned-bucket/ --recursive
But both of these commands simply add DELETE markers to S3. The command for removing a bucket also doesn't work (from the AWS CLI):
$ aws s3 rb s3://my-versioned-bucket/ --force
Cleaning up. Please wait...
Completed 1 part(s) with ... file(s) remaining
remove_bucket failed: s3://my-versioned-bucket/ A client error (BucketNotEmpty) occurred when calling the DeleteBucket operation: The bucket you tried to delete is not empty. You must delete all versions in the bucket.
Ok... how? There's no information in their documentation for this. S3Cmd says it's a 'fully-featured' S3 command-line tool, but it makes no reference to versions other than its own. Is there any way to do this without using the web interface, which will take forever and requires me to keep my laptop on?
I ran into the same limitation of the AWS CLI. I found the easiest solution to be to use Python and boto3:
#!/usr/bin/env python
BUCKET = 'your-bucket-here'
import boto3
s3 = boto3.resource('s3')
bucket = s3.Bucket(BUCKET)
bucket.object_versions.delete()
# if you want to delete the now-empty bucket as well, uncomment this line:
#bucket.delete()
A previous version of this answer used boto but that solution had performance issues with large numbers of keys as Chuckles pointed out.
Using boto3 it's even easier than with the proposed boto solution to delete all object versions in an S3 bucket:
#!/usr/bin/env python
import boto3
s3 = boto3.resource('s3')
bucket = s3.Bucket('your-bucket-name')
bucket.object_versions.all().delete()
Works fine also for very large amounts of object versions, although it might take some time in that case.
You can delete all the objects in the versioned s3 bucket.
But I don't know how to delete specific objects.
$ aws s3api delete-objects \
--bucket <value> \
--delete "$(aws s3api list-object-versions \
--bucket <value> | \
jq '{Objects: [.Versions[] | {Key:.Key, VersionId : .VersionId}], Quiet: false}')"
Alternatively without jq:
$ aws s3api delete-objects \
--bucket ${bucket_name} \
--delete "$(aws s3api list-object-versions \
--bucket "${bucket_name}" \
--output=json \
--query='{Objects: Versions[].{Key:Key,VersionId:VersionId}}')"
These two bash lines are enough for me to enable bucket deletion!
1: Delete objects
aws s3api delete-objects --bucket ${buckettoempty} --delete "$(aws s3api list-object-versions --bucket ${buckettoempty} --query='{Objects: Versions[].{Key:Key,VersionId:VersionId}}')"
2: Delete markers
aws s3api delete-objects --bucket ${buckettoempty} --delete "$(aws s3api list-object-versions --bucket ${buckettoempty} --query='{Objects: DeleteMarkers[].{Key:Key,VersionId:VersionId}}')"
Looks like as of now, there is an Empty button in the AWS S3 console.
Just select your bucket and click on it. It will ask you to confirm your decision by typing permanently delete
Note, this will not delete the bucket itself.
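Once the console has emptied it, the bucket itself can then be removed from the CLI, for example:
# bucket name is a placeholder
aws s3 rb s3://your-bucket-name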
Here is a one-liner you can just cut and paste into the command line to delete all versions and delete markers (it requires the AWS tools; set $BUCKET_TO_PERGE to your bucket name first)
echo '#!/bin/bash' > deleteBucketScript.sh \
&& aws --output text s3api list-object-versions --bucket $BUCKET_TO_PERGE \
| grep -E "^VERSIONS" |\
awk '{print "aws s3api delete-object --bucket $BUCKET_TO_PERGE --key "$4" --version-id "$8";"}' >> \
deleteBucketScript.sh && . deleteBucketScript.sh; rm -f deleteBucketScript.sh; echo '#!/bin/bash' > \
deleteBucketScript.sh && aws --output text s3api list-object-versions --bucket $BUCKET_TO_PERGE \
| grep -E "^DELETEMARKERS" | grep -v "null" \
| awk '{print "aws s3api delete-object --bucket $BUCKET_TO_PERGE --key "$3" --version-id "$5";"}' >> \
deleteBucketScript.sh && . deleteBucketScript.sh; rm -f deleteBucketScript.sh;
then you could use:
aws s3 rb s3://bucket-name --force
If you have to delete/empty large S3 buckets, it becomes quite inefficient (and expensive) to delete every single object and version. It's often more convenient to let AWS expire all objects and versions.
aws s3api put-bucket-lifecycle-configuration \
--lifecycle-configuration '{"Rules":[{
"ID":"empty-bucket",
"Status":"Enabled",
"Prefix":"",
"Expiration":{"Days":1},
"NoncurrentVersionExpiration":{"NoncurrentDays":1}
}]}' \
--bucket YOUR-BUCKET
Then you just have to wait 1 day and the bucket can be deleted with:
aws s3api delete-bucket --bucket YOUR-BUCKET
For those using multiple profiles via ~/.aws/config
import boto3
PROFILE = "my_profile"
BUCKET = "my_bucket"
session = boto3.Session(profile_name = PROFILE)
s3 = session.resource('s3')
bucket = s3.Bucket(BUCKET)
bucket.object_versions.delete()
One way to do it is to iterate through the versions and delete them. A bit tricky on the CLI, but as you mentioned Java, that would be more straightforward:
AmazonS3Client s3 = new AmazonS3Client();
String bucketName = "deleteversions-"+UUID.randomUUID();
//Creates Bucket
s3.createBucket(bucketName);
//Enable Versioning
BucketVersioningConfiguration configuration = new BucketVersioningConfiguration(ENABLED);
s3.setBucketVersioningConfiguration(new SetBucketVersioningConfigurationRequest(bucketName, configuration ));
//Puts versions
s3.putObject(bucketName, "some-key",new ByteArrayInputStream("some-bytes".getBytes()), null);
s3.putObject(bucketName, "some-key",new ByteArrayInputStream("other-bytes".getBytes()), null);
//Removes all versions
for ( S3VersionSummary version : S3Versions.inBucket(s3, bucketName) ) {
    String key = version.getKey();
    String versionId = version.getVersionId();
    s3.deleteVersion(bucketName, key, versionId);
}
//Removes the bucket
s3.deleteBucket(bucketName);
System.out.println("Done!");
You can also batch delete calls for efficiency if needed.
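The same batching idea is available outside the SDK as well; a rough CLI sketch borrowing the delete-objects pattern from other answers on this page ($bucket_name is a placeholder, and delete-objects accepts at most 1000 keys per call):
aws s3api delete-objects --bucket "$bucket_name" \
    --delete "$(aws s3api list-object-versions --bucket "$bucket_name" --max-items 1000 \
        --output json --query '{Objects: Versions[].{Key:Key,VersionId:VersionId}}')"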
If you want pure CLI approach (with jq):
aws s3api list-object-versions \
--bucket $bucket \
--region $region \
--query "Versions[].Key" \
--output json | jq 'unique' | jq -r '.[]' | while read key; do
echo "deleting versions of $key"
aws s3api list-object-versions \
--bucket $bucket \
--region $region \
--prefix $key \
--query "Versions[].VersionId" \
--output json | jq 'unique' | jq -r '.[]' | while read version; do
echo "deleting $version"
aws s3api delete-object \
--bucket $bucket \
--key $key \
--version-id $version \
--region $region
done
done
Simple bash loop I've found and implemented for N buckets:
for b in $(ListOfBuckets); do \
echo "Emptying $b"; \
aws s3api delete-objects --bucket $b --delete "$(aws s3api list-object-versions --bucket $b --output=json --query='{Objects: *[].{Key:Key,VersionId:VersionId}}')"; \
done
I ran into issues with Abe's solution as the list_buckets generator is used to create a massive list called all_keys and I spent an hour without it ever completing. This tweak seems to work better for me, I had close to a million objects in my bucket and counting!
import boto

s3 = boto.connect_s3()
bucket = s3.get_bucket("your-bucket-name-here")

chunk_counter = 0  # this is simply a nice to have
keys = []
for key in bucket.list_versions():
    keys.append(key)
    if len(keys) > 1000:
        bucket.delete_keys(keys)
        chunk_counter += 1
        keys = []
        print("Another 1000 done.... {n} chunks so far".format(n=chunk_counter))

# flush whatever is left over (fewer than 1000 keys)
if keys:
    bucket.delete_keys(keys)

#bucket.delete() #as per usual uncomment if you're sure!
Hopefully this helps anyone else encountering this S3 nightmare!
For deleting specific object(s), use a jq filter.
You may need to clean up the 'DeleteMarkers', not just the 'Versions'.
Using $() instead of ``, you may embed variables for the bucket name and key value.
aws s3api delete-objects --bucket bucket-name --delete "$(aws s3api list-object-versions --bucket bucket-name | jq -M '{Objects: [.["Versions","DeleteMarkers"][]|select(.Key == "key-value")| {Key:.Key, VersionId : .VersionId}], Quiet: false}')"
Even though technically it's not AWS CLI, I'd recommend using AWS Tools for Powershell for this task. Then you can use the simple command as below:
Remove-S3Bucket -BucketName {bucket-name} -DeleteBucketContent -Force -Region {region}
As stated in the documentation, DeleteBucketContent flag does the following:
"If set, all remaining objects and/or object versions in the bucket
are deleted proir (sic) to the bucket itself being deleted"
Reference: https://docs.aws.amazon.com/powershell/latest/reference/items/Remove-S3Bucket.html
This bash script found here: https://gist.github.com/weavenet/f40b09847ac17dd99d16
worked as is for me.
I saved script as: delete_all_versions.sh and then simply ran:
./delete_all_versions.sh my_foobar_bucket
and that worked without a flaw.
Did not need python or boto or anything.
You can do this from the AWS Console using Lifecycle Rules.
Open the bucket in question. Click the Management tab at the top.
Make sure the Lifecycle Sub Tab is selected.
Click + Add lifecycle rule
On Step 1 (Name and scope) enter a rule name (e.g. removeall)
Click Next to Step 2 (Transitions)
Leave this as is and click Next.
You are now on the 3. Expiration step.
Check the checkboxes for both Current Version and Previous Versions.
Click the checkbox for "Expire current version of object" and enter the number 1 for "After _____ days from object creation
Click the checkbox for "Permanently delete previous versions" and enter the number 1 for
"After _____ days from becoming a previous version"
click the checkbox for "Clean up incomplete multipart uploads"
and enter the number 1 for "After ____ days from start of upload"
Click Next
Review what you just did.
Click Save
Come back in a day and see how it is doing.
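If you later want to script this instead of clicking through the console, a sketch of roughly the same rule with the CLI (mirroring the expiration settings above; YOUR-BUCKET is a placeholder):
# expire current versions, previous versions and incomplete multipart uploads after 1 day
aws s3api put-bucket-lifecycle-configuration --bucket YOUR-BUCKET \
    --lifecycle-configuration '{"Rules":[{
        "ID": "removeall",
        "Filter": {"Prefix": ""},
        "Status": "Enabled",
        "Expiration": {"Days": 1},
        "NoncurrentVersionExpiration": {"NoncurrentDays": 1},
        "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 1}
    }]}'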
I improved the boto3 answer with Python3 and argv.
Save the following script as something like s3_rm.py.
#!/usr/bin/env python3
import sys
import boto3

def main():
    args = sys.argv[1:]
    if (len(args) < 1):
        print("Usage: {} s3_bucket_name".format(sys.argv[0]))
        exit()
    s3 = boto3.resource('s3')
    bucket = s3.Bucket(args[0])
    bucket.object_versions.delete()
    # if you want to delete the now-empty bucket as well, uncomment this line:
    #bucket.delete()

if __name__ == "__main__":
    main()
Make it executable with chmod +x s3_rm.py.
Run the function like ./s3_rm.py my_bucket_name.
In the same vein as https://stackoverflow.com/a/63613510/805031 ... this is what I use to clean up accounts before closing them:
# If the data is too large, apply LCP to remove all objects within a day
# Create lifecycle-expire.json with the LCP required to purge all objects
# Based on instructions from: https://aws.amazon.com/premiumsupport/knowledge-center/s3-empty-bucket-lifecycle-rule/
cat << JSON > lifecycle-expire.json
{
    "Rules": [
        {
            "ID": "remove-all-objects-asap",
            "Filter": { "Prefix": "" },
            "Status": "Enabled",
            "Expiration": { "Days": 1 },
            "NoncurrentVersionExpiration": { "NoncurrentDays": 1 },
            "AbortIncompleteMultipartUpload": { "DaysAfterInitiation": 1 }
        },
        {
            "ID": "remove-expired-delete-markers",
            "Filter": { "Prefix": "" },
            "Status": "Enabled",
            "Expiration": { "ExpiredObjectDeleteMarker": true }
        }
    ]
}
JSON
# Apply to ALL buckets
aws s3 ls | cut -d" " -f 3 | xargs -I{} aws s3api put-bucket-lifecycle-configuration --bucket {} --lifecycle-configuration file://lifecycle-expire.json
# Apply to a single bucket; replace $BUCKET_NAME
aws s3api put-bucket-lifecycle-configuration --bucket $BUCKET_NAME --lifecycle-configuration file://lifecycle-expire.json
...then a day later you can come back and delete the buckets using something like:
# To force empty/delete all buckets
aws s3 ls | cut -d" " -f 3 | xargs -I{} aws s3 rb s3://{} --force
# To remove only empty buckets
aws s3 ls | cut -d" " -f 3 | xargs -I{} aws s3 rb s3://{}
# To force empty/delete a single bucket; replace $BUCKET_NAME
aws s3 rb s3://$BUCKET_NAME --force
It saves a lot of time and money so worth doing when you have many TBs to delete.
I found the other answers either incomplete or requiring external dependencies to be installed (like boto), so here is one that is inspired by those but goes a little deeper.
As documented in Working with Delete Markers, before a versioned bucket can be removed, all its versions must be completely deleted, which is a 2-step process:
"delete" all version objects in the bucket, which marks them as
deleted but does not actually delete them
complete the deletion by deleting all the deletion marker objects
Here is the pure CLI solution that worked for me (inspired by the other answers):
#!/usr/bin/env bash
bucket_name=...
del_s3_bucket_obj()
{
    local bucket_name=$1
    local obj_type=$2
    local query="{Objects: $obj_type[].{Key:Key,VersionId:VersionId}}"
    local s3_objects=$(aws s3api list-object-versions --bucket ${bucket_name} --output=json --query="$query")
    if ! (echo $s3_objects | grep -q '"Objects": null'); then
        aws s3api delete-objects --bucket "${bucket_name}" --delete "$s3_objects"
    fi
}
del_s3_bucket_obj ${bucket_name} 'Versions'
del_s3_bucket_obj ${bucket_name} 'DeleteMarkers'
Once this is done, the following will work:
aws s3 rb "s3://${bucket_name}"
Not sure how it will fare with 1000+ objects though, if anyone can report that would be awesome.
By far the easiest method I've found is to use this CLI tool, s3wipe. It's provided as a docker container so you can use it like so:
$ docker run -it --rm slmingol/s3wipe --help
usage: s3wipe [-h] --path PATH [--id ID] [--key KEY] [--dryrun] [--quiet]
[--batchsize BATCHSIZE] [--maxqueue MAXQUEUE]
[--maxthreads MAXTHREADS] [--delbucket] [--region REGION]
Recursively delete all keys in an S3 path
optional arguments:
-h, --help show this help message and exit
--path PATH S3 path to delete (e.g. s3://bucket/path)
--id ID Your AWS access key ID
--key KEY Your AWS secret access key
--dryrun Don't delete. Print what we would have deleted
--quiet Suprress all non-error output
--batchsize BATCHSIZE # of keys to batch delete (default 100)
--maxqueue MAXQUEUE Max size of deletion queue (default 10k)
--maxthreads MAXTHREADS Max number of threads (default 100)
--delbucket If S3 path is a bucket path, delete the bucket also
--region REGION Region of target S3 bucket. Default vaue `us-east-1`
Example
Here's an example where I'm deleting all the versioned objects in a bucket and then deleting the bucket:
$ docker run -it --rm slmingol/s3wipe \
--id $(aws configure get default.aws_access_key_id) \
--key $(aws configure get default.aws_secret_access_key) \
--path s3://bw-tf-backends-aws-example-logs \
--delbucket
[2019-02-20@03:39:16] INFO: Deleting from bucket: bw-tf-backends-aws-example-logs, path: None
[2019-02-20@03:39:16] INFO: Getting subdirs to feed to list threads
[2019-02-20@03:39:18] INFO: Done deleting keys
[2019-02-20@03:39:18] INFO: Bucket is empty. Attempting to remove bucket
How it works
There's a bit to unpack here but the above is doing the following:
docker run -it --rm slmingol/s3wipe - runs the s3wipe container interactively and deletes it after each execution
--id & --key - passing our access key and access id in
aws configure get default.aws_access_key_id - retrieves our key id
aws configure get default.aws_secret_access_key - retrieves our key secret
--path s3://bw-tf-backends-aws-example-logs - bucket that we want to delete
--delbucket - deletes bucket once emptied
References
https://github.com/slmingol/s3wipe
Is there a way to export an AWS CLI Profile to Environment Variables?
https://cloud.docker.com/u/slmingol/repository/docker/slmingol/s3wipe
https://gist.github.com/wknapik/191619bfa650b8572115cd07197f3baf
#!/usr/bin/env bash
set -eEo pipefail
shopt -s inherit_errexit >/dev/null 2>&1 || true
if [[ ! "$#" -eq 2 || "$1" != --bucket ]]; then
echo -e "USAGE: $(basename "$0") --bucket <bucket>"
exit 2
fi
# $# := bucket_name
empty_bucket() {
local -r bucket="${1:?}"
for object_type in Versions DeleteMarkers; do
local opt=() next_token=""
while [[ "$next_token" != null ]]; do
page="$(aws s3api list-object-versions --bucket "$bucket" --output json --max-items 1000 "${opt[#]}" \
--query="[{Objects: ${object_type}[].{Key:Key, VersionId:VersionId}}, NextToken]")"
objects="$(jq -r '.[0]' <<<"$page")"
next_token="$(jq -r '.[1]' <<<"$page")"
case "$(jq -r .Objects <<<"$objects")" in
'[]'|null) break;;
*) opt=(--starting-token "$next_token")
aws s3api delete-objects --bucket "$bucket" --delete "$objects";;
esac
done
done
}
empty_bucket "${2#s3://}"
E.g. empty_bucket.sh --bucket foo
This will delete all object versions and delete markers in a bucket in batches of 1000. Afterwards, the bucket can be deleted with aws s3 rb s3://foo.
Requires bash, awscli and jq.
This works for me. Maybe I'm running later versions of things, and it handles more than 1000 items. It has been running over a couple of million files now. However, it's still not finished after half a day, and there is no means to validate progress in the AWS GUI =/
# Set bucket name to clearout
BUCKET = 'bucket-to-clear'

import boto3
s3 = boto3.resource('s3')
bucket = s3.Bucket(BUCKET)

max_len = 1000       # max 1000 items at one req
chunk_counter = 0    # just to keep track
keys = []            # collect to delete

# clear files
def clearout():
    global bucket
    global chunk_counter
    global keys
    result = bucket.delete_objects(Delete=dict(Objects=keys))
    if result["ResponseMetadata"]["HTTPStatusCode"] != 200:
        print("Issue with response")
        print(result)
    chunk_counter += 1
    keys = []
    print(". {n} chunks so far".format(n=chunk_counter))
    return

# start
for key in bucket.object_versions.all():
    item = {'Key': key.object_key, 'VersionId': key.id}
    keys.append(item)
    if len(keys) >= max_len:
        clearout()

# make sure last files are cleared as well
if len(keys) > 0:
    clearout()

print("")
print("Done, {n} items deleted".format(n=chunk_counter*max_len))
#bucket.delete() #as per usual uncomment if you're sure!
To add to python solutions provided here: if you are getting boto.exception.S3ResponseError: S3ResponseError: 400 Bad Request error, try creating ~/.boto file with the following data:
[Credentials]
aws_access_key_id = aws_access_key_id
aws_secret_access_key = aws_secret_access_key
[s3]
host=s3.eu-central-1.amazonaws.com
aws_access_key_id = aws_access_key_id
aws_secret_access_key = aws_secret_access_key
Helped me to delete bucket in Frankfurt region.
Original answer: https://stackoverflow.com/a/41200567/2586441
If you use AWS SDK for JavaScript S3 Client for Node.js (@aws-sdk/client-s3), you can use following code:
const { S3Client, ListObjectsCommand } = require('@aws-sdk/client-s3')

const endpoint = 'YOUR_END_POINT'
const region = 'YOUR_REGION'

// Create an Amazon S3 service client object.
const s3Client = new S3Client({ region, endpoint })

const deleteEverythingInBucket = async bucketName => {
  console.log('Deleting all object in the bucket')
  const bucketParams = {
    Bucket: bucketName
  }
  try {
    const command = new ListObjectsCommand(bucketParams)
    const data = await s3Client.send(command)
    console.log('Bucket Data', JSON.stringify(data))
    if (data?.Contents?.length > 0) {
      console.log('Removing objects in the bucket', data.Contents.length)
      for (const object of data.Contents) {
        console.log('Removing object', object)
        if (object.Key) {
          try {
            await deleteFromS3({
              Bucket: bucketName,
              Key: object.Key
            })
          } catch (err) {
            console.log('Error on object delete', err)
          }
        }
      }
    }
  } catch (err) {
    console.log('Error creating presigned URL', err)
  }
}
For my case, I wanted to be sure that all objects for specific prefixes would be deleted. So, we generate a list of all objects for each prefix, split it into chunks of 1,000 records (an AWS limitation), and delete them.
Please note that AWS CLI and jq must be installed and configured.
A text file with prefixes that we want to delete was created (in the example below prefixes.txt).
The format is:
prefix1
prefix2
And this is a shell script (also please change the BUCKET_NAME with the real name):
#!/bin/sh
BUCKET="BUCKET_NAME"
PREFIXES_FILE="prefixes.txt"
if [ -f "$PREFIXES_FILE" ]; then
while read -r current_prefix
do
printf '***** PREFIX %s *****\n' "$current_prefix"
OLD_OBJECTS_FILE="$current_prefix-all.json"
if [ -f "$OLD_OBJECTS_FILE" ]; then
printf 'Deleted %s...\n' "$OLD_OBJECTS_FILE"
rm "$OLD_OBJECTS_FILE"
fi
cmd="aws s3api list-object-versions --bucket \"$BUCKET\" --prefix \"$current_prefix/\" --query \"[Versions,DeleteMarkers][].{Key: Key, VersionId: VersionId}\" >> $OLD_OBJECTS_FILE"
echo "$cmd"
eval "$cmd"
no_of_obj=$(cat "$OLD_OBJECTS_FILE" | jq 'length')
i=0
page=0
#Get old version Objects
echo "Objects versions count: $no_of_obj"
while [ $i -lt "$no_of_obj" ]
do
next=$((i+999))
old_versions=$(cat "$OLD_OBJECTS_FILE" | jq '.[] | {Key,VersionId}' | jq -s '.' | jq .[$i:$next])
paged_file_name="$current_prefix-page-$page.json"
cat << EOF > "$paged_file_name"
{"Objects":$old_versions, "Quiet":true}
EOF
echo "Deleting records from $i - $next"
cmd="aws s3api delete-objects --bucket \"$BUCKET\" --delete file://$paged_file_name"
echo "$cmd"
eval "$cmd"
i=$((i+1000))
page=$((page+1))
done
done < "$PREFIXES_FILE"
else
echo "$PREFIXES_FILE does not exist."
fi
If you just want to check the list of objects and not delete them immediately, please comment out or remove the last eval "$cmd".
I needed to delete older object versions but keep the current version in the bucket. Code uses iterators, works on buckets of any size with any number of objects.
import boto3
from itertools import islice

bucket = boto3.resource('s3').Bucket('bucket_name')
all_versions = bucket.object_versions.all()
stale_versions = iter(filter(lambda x: not x.is_latest, all_versions))
pages = iter(lambda: tuple(islice(stale_versions, 1000)), ())
for page in pages:
    bucket.delete_objects(
        Delete={
            'Objects': [{
                'Key': item.key,
                'VersionId': item.version_id
            } for item in page]
        })
S3=s3://tmobi-processed/biz.db/
aws s3 rm ${S3} --recursive
BUCKET=`echo ${S3} | egrep -o 's3://[^/]*' | sed -e s/s3:\\\\/\\\\///g`
PREFIX=`echo ${S3} | sed -e s/s3:\\\\/\\\\/${BUCKET}\\\\///g`
aws s3api list-object-versions \
--bucket ${BUCKET} \
--prefix ${PREFIX} |
jq -r '.Versions[] | .Key + " " + .VersionId' |
while read key id ; do
aws s3api delete-object \
--bucket ${BUCKET} \
--key ${key} \
--version-id ${id} >> versions.txt
done
aws s3api list-object-versions \
--bucket ${BUCKET} \
--prefix ${PREFIX} |
jq -r '.DeleteMarkers[] | .Key + " " + .VersionId' |
while read key id ; do
aws s3api delete-object \
--bucket ${BUCKET} \
--key ${key} \
--version-id ${id} >> delete_markers.txt
done
You can use the AWS CLI to delete an S3 bucket:
aws s3 rb s3://your-bucket-name
If the AWS CLI is not installed on your computer, you can use the following commands.
For Linux or Ubuntu:
sudo apt-get install awscli
Then check whether it is installed:
aws --version
Now configure it by providing your AWS access credentials:
aws configure
Then give the access key, secret access key, and your region.

How to cp file only if it does not exist, throw error otherwise?

aws s3 cp "dist/myfile" "s3://my-bucket/production/myfile"
It always copies myfile to S3. I would like to copy the file ONLY if it does not exist, and throw an error otherwise. How can I do it? Or at least, how can I use the AWS CLI to check if the file exists?
You could test for the existence of a file by listing the file, and seeing whether it returns something. For example:
aws s3 ls s3://bucket/file.txt | wc -l
This would return a zero (no lines) if the file does not exist.
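Building on that check, a minimal sketch of the copy-only-if-absent behaviour the question asks for (the paths are the ones from the question; note that ls matches by prefix, so a more specific key is safer):
if [ -z "$(aws s3 ls "s3://my-bucket/production/myfile")" ]; then
    aws s3 cp "dist/myfile" "s3://my-bucket/production/myfile"
else
    echo "s3://my-bucket/production/myfile already exists" >&2
    exit 1
fi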
If you only want to copy a file if it does not exist, try the sync command, e.g.:
aws s3 sync . s3://bucket/ --exclude '*' --include 'file.txt'
This will synchronize the local file with the remote object, only copying it if it does not exist or if the local file is different to the remote object.
So, turns out that "aws s3 sync" doesn't do files, only directories. If you give it a file, you get...interesting...behavior, since it treats anything you give it like a directory and throws a slash on it. At least aws-cli/1.6.7 Python/2.7.5 Darwin/13.4.0 does.
%% date > test.txt
%% aws s3 sync test.txt s3://bucket/test.txt
warning: Skipping file /Users/draistrick/aws/test.txt/. File does not exist.
So, if you -really- only want to sync a file (only upload if exists, and if checksum matches) you can do it:
file="test.txt"
aws s3 sync --exclude '*' --include "$file" "$(dirname $file)" "s3://bucket/"
Note the exclude/include order - if you reverse that, it won't include anything. And your source and include path need to have sanity around their matching, so maybe a $(basename $file) is in order for --include if you're using full paths... aws --debug s3 sync is your friend here to see how the includes evaluate.
And don't forget the target is a directory key, not a file key.
Here's a working example:
%% file="test.txt"
%% date >> $file
%% aws s3 sync --exclude '*' --include "$file" "$(dirname $file)" "s3://bucket/"
upload: ./test.txt to s3://bucket/test.txt/test.txt
%% aws s3 sync --exclude '*' --include "$file" "$(dirname $file)" "s3://bucket/"
%% date >> $file
%% aws s3 sync --exclude '*' --include "$file" "$(dirname $file)" "s3://bucket/"
upload: ./test.txt to s3://bucket/test.txt/test.txt
(now, if only there were a way to ask aws s3 to -just- validate the checksum, since it seems to always do multipart style checksums.. oh, maybe some --dryrun and some output scraping and sync..)
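For what it's worth, a rough sketch of that --dryrun idea: if the dry run prints an upload line, the file is missing or differs; an empty dry run means the object already matches (bucket path is a placeholder):
file="test.txt"
if [ -n "$(aws s3 sync --dryrun --exclude '*' --include "$file" "$(dirname "$file")" "s3://bucket/")" ]; then
    echo "$file is absent from s3://bucket/ or differs from the copy there"
fi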
You can do this by listing and copying if and only if the list succeeds.
aws s3 ls "s3://my-bucket/production/myfile" || aws s3 cp "dist/myfile" "s3://my-bucket/production/myfile"
Edit: replaced && with || to have the desired effect: if the list fails, do the copy.
You can also check the existence of a file by aws s3api head-object subcommand. An advantage of this over aws s3 ls is that it just requires s3:GetObject permission instead of s3:ListBucket.
$ aws s3api head-object --bucket ${BUCKET} --key ${EXISTENT_KEY}
{
"AcceptRanges": "bytes",
"LastModified": "Wed, 1 Jan 2020 00:00:00 GMT",
"ContentLength": 10,
"ETag": "\"...\"",
"VersionId": "...",
"ContentType": "binary/octet-stream",
"ServerSideEncryption": "AES256",
"Metadata": {}
}
$ echo $?
0
$ aws s3api head-object --bucket ${BUCKET} --key ${NON_EXISTENT_KEY}
An error occurred (403) when calling the HeadObject operation: Forbidden
$ echo $?
255
Note that the HTTP status code for the non-existent object depends on whether you have the s3:ListObject permission. See the API document for more details:
If you have the s3:ListBucket permission on the bucket, Amazon S3 returns an HTTP status code 404 ("no such key") error.
If you don’t have the s3:ListBucket permission, Amazon S3 returns an HTTP status code 403 ("access denied") error.
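Putting that together, a hedged sketch of the conditional copy using head-object (bucket and key are placeholders; keep the permission caveats above in mind):
# error out if the object is already there, otherwise upload it
if aws s3api head-object --bucket my-bucket --key production/myfile > /dev/null 2>&1; then
    echo "s3://my-bucket/production/myfile already exists" >&2
    exit 1
else
    aws s3 cp "dist/myfile" "s3://my-bucket/production/myfile"
fi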
AWS HACK
You can run the following command to raise an ERROR if the file already exists:
Run the aws s3 sync command to sync the file to S3; it will return the copied path if the file doesn't exist, or it will give blank output if it exists.
Run the wc -c command to check the character count and raise an error if the output is zero.
com=$(aws s3 sync dist/ s3://my-bucket/production/ | wc -c); if [[ $com -ne 0 ]]; then exit 1; else exit 0; fi;
OR
#!/usr/bin/env bash
com=$(aws s3 sync dist s3://my-bucket/production/ | wc -c)
echo "hello $com"
if [[ $com -ne 0 ]]; then
echo "File already exists"
exit 1
else
echo "success"
exit 0
fi
I voted up aviggiano. Using his example above, I was able to get this to work in my Windows .bat file. If the S3 path exists it will throw an error and end the batch job. If the file does not exist it will continue on to perform the copy function. Hope this helps someone.
:Step1
aws s3 ls s3://00000000000-fake-bucket/my/s3/path/inbound/test.txt && ECHO Could not copy to S3 bucket because S3 Object already exists, ending script. && GOTO :Failure
ECHO No file found in bucket, begin upload.
aws s3 cp Z:\MY\LOCAL\PATH\test.txt s3://00000000000-fake-bucket/my/s3/path/inbound/test.txt --exclude "*" --include "*.txt"
:Step2
ECHO YOU MADE IT, LET'S CELEBRATE
IF %ERRORLEVEL% == 0 GOTO :Success
GOTO :Failure
:Success
echo Job Ended success
GOTO :ExitScript
:Failure
echo BC_Script_Execution_Complete Failure
GOTO :ExitScript
:ExitScript
I am running AWS on windows. and this is my simple script.
rem clean work files:
if exist SomeFileGroup_remote.txt del /q SomeFileGroup_remote.txt
if exist SomeFileGroup_remote-fileOnly.txt del /q SomeFileGroup_remote-fileOnly.txt
if exist SomeFileGroup_Local-fileOnly.txt del /q SomeFileGroup_Local-fileOnly.txt
if exist SomeFileGroup_remote-Download-fileOnly.txt del /q SomeFileGroup_remote-Download-fileOnly.txt
Rem prep:
call F:\Utilities\BIN\mhedate.cmd
aws s3 ls s3://awsbucket//someuser@domain.com/BulkRecDocImg/folder/folder2/ --recursive >>SomeFileGroup_remote.txt
for /F "tokens=1,2,3,4* delims= " %%i in (SomeFileGroup_remote.txt) do @echo %%~nxl >>SomeFileGroup_remote-fileOnly.txt
dir /b temp\*.* >>SomeFileGroup_Local-fileOnly.txt
findstr /v /I /l /G:"SomeFileGroup_Local-fileOnly.txt" SomeFileGroup_remote-fileOnly.txt >>SomeFileGroup_remote-Download-fileOnly.txt
Rem Download:
for /F "tokens=1* delims= " %%i in (SomeFileGroup_remote-Download-fileOnly.txt) do (aws s3 cp s3://awsbucket//someuser#domain.com/BulkRecDocImg/folder/folder2/%%~nxi "temp" >>"SomeFileGroup_Download_%DATE.YEAR%%DATE.MONTH%%DATE.DAY%.log")
I added the date to the path in order to not overwrite the file:
aws s3 cp videos/video_name.mp4 s3://BUCKET_NAME/$(date +%D-%H:%M:%S)
That way I will have history and the existing file won't be overwritten.