AWS CLI rm folder with a special character - amazon-web-services

I'm trying to delete an empty folder structure using the CLI and nothing seems to be working. The root folder is /+7000/ and I'm positive it's because of the "+".
My rm commands work on other folders without special characters, and the CLI isn't returning an error. How would I build this script to recognize this folder and get rid of it?
Test Scripts
(%2B is the URL-encoded form of '+')
>aws s3 ls
2022-07-13 10:29:36 0 %2B7000/
>
>
//Attempts
> aws s3 rm s3://namespace/ --exclude "*" --include "%2B7000/"
> aws s3 rm s3://namespace/ --exclude "*" --include "*7000/"
> aws s3 rm s3://namespace/ --exclude "*" --include "[+]7000/"
> aws s3 rm s3://namespace/ --exclude "*" --include "'+7000'/"
> aws s3 rm s3://namespace/"\+7000/"
> aws s3 rm s3://namespace/%2B7000/
delete: s3://namespace/%2B7000/
Most attempts return a successful deletion but the folder is still there.
Output from aws s3api list-objects --bucket namespace
{
    "Key": "+7000/",
    "LastModified": "2022-07-13T14:29:36.884Z",
    "ETag": "\"100\"",
    "Size": 0,
    "StorageClass": "STANDARD",
    "Owner": {
        "DisplayName": "vmsstg",
        "ID": "1-2-3-4"
    }
}

If aws s3 rm isn't working, you can try the 'lower-level' API call:
aws s3api delete-object --bucket namespace --key %2B7000/
Given that you said that %2B is plus, and based on your comment, you can use:
aws s3api delete-object --bucket namespace --key "+7000/"
I guess the plus sign got translated into URL-encoded form somewhere along the way.
Another approach is to use an AWS SDK (eg boto3 for Python) to retrieve the Key and then delete the object by passing back the exact value. This would avoid any encoding in the process.
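For example, a rough boto3 sketch of that idea (the bucket name namespace and the prefix +7000/ are taken from the question):
import boto3
s3 = boto3.client('s3')
# List the keys exactly as S3 stores them, then pass each one straight back
# to delete_object so no URL encoding/decoding happens in between.
resp = s3.list_objects_v2(Bucket='namespace', Prefix='+7000/')
for obj in resp.get('Contents', []):
    print('deleting', obj['Key'])
    s3.delete_object(Bucket='namespace', Key=obj['Key'])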

Related

AWS S3 list all files with specific content type

I want to use the CLI to list all files in an S3 bucket that have content type = binary/octet-stream.
aws s3 ls s3://mybucket -r -content-type???
How to list files with content type = binary/octet-stream?
How to list all files with the content type of each file?
You would need to:
List the objects in the bucket
For each object, call aws s3api head-object --bucket xxx --key xxx
It will return:
{
    "AcceptRanges": "bytes",
    "LastModified": "2014-03-10T21:59:20+00:00",
    "ContentLength": 41603,
    "ETag": "\"eca134ebe408fdb1f3494d7d916bf027\"",
    "VersionId": "null",
    "ContentType": "image/jpeg",
    "ServerSideEncryption": "AES256",
    "Metadata": {}
}
You would need some shell-scripting skills to be able to do this with the AWS CLI. It would be easier to accomplish with a scripting language, such as Python:
import boto3
s3_resource = boto3.resource('s3')
for obj in s3_resource.Bucket('BUCKETNAME').objects.all():
    # Each get() issues a separate request, so this is slow on large buckets
    print(obj.key, obj.get()['ContentType'])

Undelete folders from AWS S3

I have an S3 bucket with versioning enabled. It is possible to undelete files, but how can I undelete folders?
I know, S3 does not have folders... but how can I undelete common prefixes? Is there a possibility to undelete files recursively?
I created this simple bash script to restore all the files in an S3 folder I deleted:
#!/bin/bash
recoverfiles=$(aws s3api list-object-versions --bucket MyBucketName --prefix TheDeletedFolder/ --query "DeleteMarkers[?IsLatest && starts_with(LastModified,'yyyy-mm-dd')].{Key:Key,VersionId:VersionId}")
for row in $(echo "${recoverfiles}" | jq -c '.[]'); do
    key=$(echo "${row}" | jq -r '.Key')
    versionId=$(echo "${row}" | jq -r '.VersionId')
    # This only prints the delete-object commands so you can review them first;
    # remove the "echo" (or pipe the output into bash) to actually remove the delete markers
    echo aws s3api delete-object --bucket MyBucketName --key "$key" --version-id "$versionId"
done
where yyyy-mm-dd = the date the folder was deleted
I found a satisfying solution here, which is described in more detail here.
To sum up, there is no out-of-the-box tool for this, but a simple bash script wraps the AWS "s3api" tool to achieve the recursive undelete.
The solution worked for me. The only drawback I found is that Amazon seems to throttle the restore operations after about 30,000 files.
You cannot undelete a common prefix. You would need to undelete one object at a time. When an object appears, any associated folder will also reappear.
Undeleting can be accomplished in two ways:
Delete the Delete Marker that will reverse the deletion, or
Copy a previous version of the object to itself, which will make the newest version newer than the Delete Marker, so it will reappear. (I hope you understood that!)
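If you want to script the first approach with an SDK, here is a rough boto3 sketch (bucket and prefix are placeholders) that removes the latest delete markers under a prefix; the second approach (copying an old version over itself) is not shown:
import boto3
s3 = boto3.client('s3')
paginator = s3.get_paginator('list_object_versions')
for page in paginator.paginate(Bucket='MyBucketName', Prefix='TheDeletedFolder/'):
    for marker in page.get('DeleteMarkers', []):
        if marker['IsLatest']:
            # Removing the delete marker makes the previous version current again
            s3.delete_object(Bucket='MyBucketName',
                             Key=marker['Key'],
                             VersionId=marker['VersionId'])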
If a folder and its contents are deleted, you can recover them using the script below, inspired by a previous answer.
The script applies to an S3 bucket where versioning was enabled beforehand. It uses the delete markers to restore files under an S3 prefix.
#!/bin/bash
# Inspired by https://www.dmuth.org/how-to-undelete-files-in-amazon-s3/
# This script can be used to undelete objects from an S3 bucket.
# When run, it will print out a list of AWS commands to undelete files, which you
# can then pipe into Bash.
#
#
# You will need the AWS CLI tool from https://aws.amazon.com/cli/ in order to run this script.
#
# Note that you must have the following permissions via IAM:
#
# Bucket permissions:
#
# s3:ListBucket
# s3:ListBucketVersions
#
# File permissions:
#
# s3:PutObject
# s3:GetObject
# s3:DeleteObject
# s3:DeleteObjectVersion
#
# If you want to do this in a "quick and dirty manner", you could just grant s3:* to
# the account, but I don't really recommend that.
#
# profile = company
# bucket = company-s3-bucket
# prefix = directory1/directory2/directory3/lastdirectory/
# pattern = (.*)
# USAGE
# bash undelete.sh > recover_files.txt   (review the generated delete-object commands)
# bash recover_files.txt                 (run them to restore the objects)
read -p "Enter your aws profile: " PROFILE
read -p "Enter your S3 bucket name: " BUCKET
read -p "Enter your S3 directory/prefix to be recovered from, leave empty for to recover all of the S3 bucket: " PREFIX
read -p "Enter the file pattern looking to recover, leave empty for all: " PATTERN
# Make sure Profile and Bucket are entered
[[ -z "$PROFILE" ]] && { echo "Profile is empty" ; exit 1; }
[[ -z "$BUCKET" ]] && { echo "Bucket is empty" ; exit 1; }
# Fill PATTERN to match all if empty
PATTERN=${PATTERN:-(.*)}
# Errors are fatal
set -e
if [ "$PREFIX" = "" ];
# To recover all of the S3 bucket
then
aws --profile ${PROFILE} --output text s3api list-object-versions --bucket ${BUCKET} \
| grep -i $PATTERN \
| grep -E "^DELETEMARKERS" \
| awk -v PROFILE=$PROFILE -v BUCKET=$BUCKET -v PREFIX=$PREFIX \
-F "[\t]+" '{ print "aws --profile " PROFILE " s3api delete-object --bucket " BUCKET "--key \""$3"\" --version-id "$5";"}'
# To recover a directory
else
aws --profile ${PROFILE} --output text s3api list-object-versions --bucket ${BUCKET} --prefix ${PREFIX} \
| grep -E $PATTERN \
| grep -E "^DELETEMARKERS" \
| awk -v PROFILE=$PROFILE -v BUCKET=$BUCKET -v PREFIX=$PREFIX \
-F "[\t]+" '{ print "aws --profile " PROFILE " s3api delete-object --bucket " BUCKET "--key \""$3"\" --version-id "$5";"}'
fi

How can I read the metadata for every item in an S3 bucket?

I can set Cache-Control metadata on every item in an S3 bucket using the following command (from this answer):
aws s3 cp s3://mybucket s3://mybucket --recursive --metadata-directive REPLACE \
--cache-control max-age=86400
Is there a way to read the Cache-Control metadata for every item in a bucket?
This bash one-liner should work (but it is very slow, since it sends a separate request for each object):
IFS=$'\n'; for object in `aws s3 ls s3://my-bucket-name --recursive | tr -s ' ' | cut -d' ' -f4-`; do echo $object `aws s3api head-object --bucket my-bucket-name --key $object --query CacheControl` ; done
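If you prefer Python over shell, a rough boto3 equivalent (the bucket name is a placeholder, and it is just as slow: one HEAD request per object):
import boto3
s3 = boto3.client('s3')
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket='my-bucket-name'):
    for obj in page.get('Contents', []):
        # head_object returns CacheControl only if it was set on the object
        head = s3.head_object(Bucket='my-bucket-name', Key=obj['Key'])
        print(obj['Key'], head.get('CacheControl'))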

How do I set metadata with aws s3api?

I'm trying to make my CloudFront hosted blog redirect /feed/atom/index.html to /index.xml. I have the following script that is supposed to set up redirect headers for me:
#!/bin/sh
redirect() aws s3api copy-object \
--copy-source blog.afoolishmanifesto.com$1 \
--bucket blog.afoolishmanifesto.com --key $1 \
--metadata x-amz-website-redirect-location=$2 \
--metadata-directive REPLACE
redirect /feed/atom/index.html /index.xml
After running the script I get the following output:
{
    "CopyObjectResult": {
        "LastModified": "2016-03-27T07:26:03.000Z",
        "ETag": "\"40c27e3a5ea160c6695d7f34de8b4dea\""
    }
}
And when I refresh the object in the AWS console view of S3 I do not see a Website Redirect Location (or x-amz-website-redirect-location) piece of metadata for the object in question. What can I do to ensure that the redirect is configured correctly?
Note: I have tried specifying the metadata as JSON and as far as I can tell it made no difference.
UPDATE: I have left the above question the same, as it still applies to metadata, but if you are trying to create a redirect with aws s3api you should use the --website-redirect-location option, not --metadata.
It is not able to create a key /feed/atom/index.html in the bucket, so no metadata attribute was created. Instead you should target feed/atom/index.html (no leading slash). I'd modify it like this:
#!/bin/sh
redirect() {
    aws s3api copy-object \
        --copy-source blog.afoolishmanifesto.com/$1 \
        --bucket blog.afoolishmanifesto.com --key $1 \
        --metadata x-amz-website-redirect-location=$2 \
        --metadata-directive REPLACE
}
redirect feed/atom/index.html /index.xml
In my version, notice the / added in --copy-source, and that the first argument to redirect no longer has a leading /.
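Following the update above (use the redirect location rather than --metadata), here is a hedged boto3 sketch of the same self-copy, with the bucket and key taken from the question:
import boto3
s3 = boto3.client('s3')
# Copy the object onto itself and set the redirect via WebsiteRedirectLocation,
# which maps to the x-amz-website-redirect-location header.
s3.copy_object(
    Bucket='blog.afoolishmanifesto.com',
    Key='feed/atom/index.html',
    CopySource={'Bucket': 'blog.afoolishmanifesto.com', 'Key': 'feed/atom/index.html'},
    WebsiteRedirectLocation='/index.xml',
    MetadataDirective='REPLACE',
)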

Does aws-cli confirm checksums when uploading files to S3, or do I need to manage that myself?

If I'm uploading data to S3 using the aws-cli (i.e. using aws s3 cp), does aws-cli do any work to confirm that the resulting file in S3 matches the original file, or do I somehow need to manage that myself?
Based on this answer and the Java API documentation for putObject(), it looks like it's possible to verify the MD5 checksum after upload. However, I can't find a definitive answer on whether aws-cli actually does that.
It matters to me because I'm intending to upload GPG-encrypted files from a backup process, and I'd like some confidence that what's been stored in S3 actually matches the original.
According to the FAQ in the aws-cli GitHub repository, checksums are checked in most cases during upload and download.
Key points for uploads:
The AWS CLI calculates the Content-MD5 header for both standard and multipart uploads.
If the checksum that S3 calculates does not match the Content-MD5 provided, S3 will not store the object and instead will return an error message back to the AWS CLI.
The AWS CLI will retry this error up to 5 times before giving up and exiting with a nonzero exit code.
The AWS support page How do I ensure data integrity of objects uploaded to or downloaded from Amazon S3? describes how to achieve this.
First, determine the base64-encoded md5sum of the file you wish to upload:
$ md5_sum_base64="$( openssl md5 -binary my-file | base64 )"
Then use the s3api to upload the file:
$ aws s3api put-object --bucket my-bucket --key my-file-name --body my-file-path --content-md5 "$md5_sum_base64"
Note the use of the --content-md5 flag; the help for this flag states:
--content-md5 (string) The base64-encoded 128-bit MD5 digest of the part data.
This does not say much about why to use this flag, but we can find this information in the API documentation for PutObject:
To ensure that data is not corrupted traversing the network, use the Content-MD5 header. When you use this header, Amazon S3 checks the object against the provided MD5 value and, if they do not match, returns an error. Additionally, you can calculate the MD5 while putting an object to Amazon S3 and compare the returned ETag to the calculated MD5 value.
Using this flag causes S3 to verify server-side that the file hash matches the specified value. If the hashes match, S3 will return the ETag:
{
    "ETag": "\"599393a2c526c680119d84155d90f1e5\""
}
The ETag value will usually be the hexadecimal md5sum (see this question for some scenarios where this may not be the case).
If the hash does not match the one you specified, you get an error:
A client error (InvalidDigest) occurred when calling the PutObject operation: The Content-MD5 you specified was invalid.
In addition to this, you can also add the file's md5sum to the object metadata as an additional check:
$ aws s3api put-object --bucket my-bucket --key my-file-name --body my-file-path --content-md5 "$md5_sum_base64" --metadata md5chksum="$md5_sum_base64"
After upload you can issue the head-object command to check the values.
$ aws s3api head-object --bucket my-bucket --key my-file-name
{
    "AcceptRanges": "bytes",
    "ContentType": "binary/octet-stream",
    "LastModified": "Thu, 31 Mar 2016 16:37:18 GMT",
    "ContentLength": 605,
    "ETag": "\"599393a2c526c680119d84155d90f1e5\"",
    "Metadata": {
        "md5chksum": "WZOTosUmxoARnYQVXZDx5Q=="
    }
}
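If you are using the SDK rather than the CLI, a rough boto3 sketch of the same idea (file and bucket names are placeholders) looks like this:
import base64
import hashlib
import boto3
with open('my-file-path', 'rb') as f:
    body = f.read()
# Base64-encoded MD5, equivalent to: openssl md5 -binary my-file | base64
md5_b64 = base64.b64encode(hashlib.md5(body).digest()).decode()
s3 = boto3.client('s3')
s3.put_object(Bucket='my-bucket', Key='my-file-name', Body=body,
              ContentMD5=md5_b64,
              Metadata={'md5chksum': md5_b64})
# Read back the ETag and the metadata checksum to compare against the local values
head = s3.head_object(Bucket='my-bucket', Key='my-file-name')
print(head['ETag'], head['Metadata'].get('md5chksum'))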
Here is a bash script that uses --content-md5, adds the metadata, and then verifies that the values returned by S3 match the local hashes:
#!/bin/bash
set -euf -o pipefail
# assumes you have aws cli, jq installed
# change these if required
tmp_dir="$HOME/tmp"
s3_dir="foo"
s3_bucket="stack-overflow-example"
aws_region="ap-southeast-2"
aws_profile="my-profile"
test_dir="$tmp_dir/s3-md5sum-test"
file_name="MailHog_linux_amd64"
test_file_url="https://github.com/mailhog/MailHog/releases/download/v1.0.0/MailHog_linux_amd64"
s3_key="$s3_dir/$file_name"
return_dir="$( pwd )"
cd "$tmp_dir" || exit
mkdir "$test_dir"
cd "$test_dir" || exit
wget "$test_file_url"
md5_sum_hex="$( md5sum $file_name | awk '{ print $1 }' )"
md5_sum_base64="$( openssl md5 -binary $file_name | base64 )"
echo "$file_name hex = $md5_sum_hex"
echo "$file_name base64 = $md5_sum_base64"
echo "Uploading $file_name to s3://$s3_bucket/$s3_dir/$file_name"
aws \
--profile "$aws_profile" \
--region "$aws_region" \
s3api put-object \
--bucket "$s3_bucket" \
--key "$s3_key" \
--body "$file_name" \
--metadata md5chksum="$md5_sum_base64" \
--content-md5 "$md5_sum_base64"
echo "Verifying sums match"
s3_md5_sum_hex=$( aws --profile "$aws_profile" --region "$aws_region" s3api head-object --bucket "$s3_bucket" --key "$s3_key" | jq -r '.ETag' | sed 's/"//'g )
s3_md5_sum_base64=$( aws --profile "$aws_profile" --region "$aws_region" s3api head-object --bucket "$s3_bucket" --key "$s3_key" | jq -r '.Metadata.md5chksum' )
if [ "$md5_sum_hex" == "$s3_md5_sum_hex" ] && [ "$md5_sum_base64" == "$s3_md5_sum_base64" ]; then
echo "checksums match"
else
echo "something is wrong checksums do not match:"
cat <<EOM | column -t -s ' '
$file_name file hex: $md5_sum_hex s3 hex: $s3_md5_sum_hex
$file_name file base64: $md5_sum_base64 s3 base64: $s3_md5_sum_base64
EOM
fi
echo "Cleaning up"
cd "$return_dir"
rm -rf "$test_dir"
aws \
--profile "$aws_profile" \
--region "$aws_region" \
s3api delete-object \
--bucket "$s3_bucket" \
--key "$s3_key"