How do I set metadata with aws s3api? - amazon-web-services

I'm trying to make my CloudFront-hosted blog redirect /feed/atom/index.html to /index.xml. I have the following script that is supposed to set up the redirect headers for me:
#!/bin/sh
redirect() {
  aws s3api copy-object \
    --copy-source blog.afoolishmanifesto.com$1 \
    --bucket blog.afoolishmanifesto.com --key $1 \
    --metadata x-amz-website-redirect-location=$2 \
    --metadata-directive REPLACE
}
redirect /feed/atom/index.html /index.xml
After running the script I get the following output:
{
"CopyObjectResult": {
"LastModified": "2016-03-27T07:26:03.000Z",
"ETag": "\"40c27e3a5ea160c6695d7f34de8b4dea\""
}
}
And when I refresh the object in the AWS console view of S3 I do not see a Website Redirect Location (or x-amz-website-redirect-location) piece of metadata for the object in question. What can I do to ensure that the redirect is configured correctly?
Note: I have tried specifying the metadata as JSON and as far as I can tell it made no difference.
UPDATE: I have left the above question the same, as it still applies to metadata, but if you are trying to create a redirect with aws s3api you should use the --website-redirect-location option, not --metadata.
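For reference, a minimal sketch of what the corrected helper might look like with --website-redirect-location (same bucket and paths as above; the key is given without a leading slash, as the answer below explains):
#!/bin/sh
# Sketch only: uses --website-redirect-location instead of --metadata,
# and passes the key without a leading slash.
redirect() {
  aws s3api copy-object \
    --copy-source "blog.afoolishmanifesto.com/$1" \
    --bucket blog.afoolishmanifesto.com --key "$1" \
    --website-redirect-location "$2" \
    --metadata-directive REPLACE
}
redirect feed/atom/index.html /index.xml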

It is not able to create a key /feed/atom/index.html in the bucket, so no metadata attribute was created. Instead you should create feed/atom/index.html. I would modify it like this:
#!/bin/sh
redirect() {
  aws s3api copy-object \
    --copy-source blog.afoolishmanifesto.com/$1 \
    --bucket blog.afoolishmanifesto.com --key $1 \
    --metadata x-amz-website-redirect-location=$2 \
    --metadata-directive REPLACE
}
redirect feed/atom/index.html /index.xml
In my solution, notice the / after the bucket name in --copy-source, and that the first argument passed to redirect no longer has a leading /.

Related

AWS CLI rm folder with a special character

I'm trying to delete an empty folder structure using the cli and nothing seems to be working. The root folder is /+7000/ and I'm positive it's because of the "+".
My rm commands are working on other folders without special characters and the cli isn't returning an error. How would I build this script to recognize this folder and get rid of it?
Test Scripts
(%2B is the URL-encoded form of '+')
>aws s3 ls
2022-07-13 10:29:36 0 %2B7000/
//Attempts
> aws s3 rm s3://namespace/ --exclude "*" --include "%2B7000/"
> aws s3 rm s3://namespace/ --exclude "*" --include "*7000/"
> aws s3 rm s3://namespace/ --exclude "*" --include "[+]7000/"
> aws s3 rm s3://namespace/ --exclude "*" --include "'+7000'/"
> aws s3 rm s3://namespace/"\+7000/"
> aws s3 rm s3://namespace/%2B7000/
delete: s3://namespace/%2B7000/
Most attempts return a successful deletion but the folder is still there.
Output from aws s3api list-objects --bucket namespace
{
"Key": "+7000/",
"LastModified": "2022-07-13T14:29:36.884Z",
"ETag": "\"100\"",
"Size": 0,
"StorageClass": "STANDARD",
"Owner": {
"DisplayName": "vmsstg",
"ID": "1-2-3-4"
}
}
If aws s3 rm isn't working, you can try the 'lower-level' API call:
aws s3api delete-object --bucket namespace --key %2B7000/
Given that you said that %2B is plus, and based on your comment, you can use:
aws s3api delete-object --bucket namespace --key "+7000/"
I guess the plus sign got translated into its URL-encoded form somewhere along the way.
Another approach is to use an AWS SDK (eg boto3 for Python) to retrieve the Key and then delete the object by passing back the exact value. This would avoid any encoding in the process.
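If you want to stay within the CLI, a rough sketch of that same idea (pull the exact key as S3 reports it and feed it straight back to delete-object; bucket name taken from the question, untested):
# Sketch: let S3 tell us the exact key for the '+7000/' prefix, then delete it verbatim.
key=$(aws s3api list-objects --bucket namespace \
  --query "Contents[?starts_with(Key, '+')].Key" --output text)
aws s3api delete-object --bucket namespace --key "$key"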

How to add tags to an S3 Bucket without deleting the existing tags using AWS CLI?

I am using this command:
aws s3api put-bucket-tagging --bucket $bucket --tagging 'TagSet=[{Key=ss,Value=mm}]'
It is deleting the existing tags and I can see only one tag.
That is how the API works, yes. PUT APIs in general overwrite the whole resource (a POST API might append a new tag or add a new property, but a PUT replaces what is there).
To work around that, you need to retrieve the existing tags, combine them with the new tags, and then put them all back at once. You can do that easily using e.g. jq:
# assuming there are already tags otherwise the get-bucket-tagging fails
data=$(aws s3api get-bucket-tagging --bucket $bucket | jq '.TagSet += [{"Key":"tag2", "Value": "value2"}]')
aws s3api put-bucket-tagging --bucket $bucket --tagging "$data"
aws s3api get-bucket-tagging --bucket $bucket # should print the merged tags
(Update: added quotes around "$data" because the command did not work without them.)
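If the bucket might not have any tags yet, a hedged variation (my addition, not part of the original answer) is to fall back to an empty TagSet when get-bucket-tagging fails:
# Sketch: treat a missing TagSet (NoSuchTagSet error) as an empty list, then merge and put back.
existing=$(aws s3api get-bucket-tagging --bucket $bucket 2>/dev/null || echo '{"TagSet": []}')
data=$(echo "$existing" | jq '.TagSet += [{"Key":"tag2", "Value": "value2"}]')
aws s3api put-bucket-tagging --bucket $bucket --tagging "$data"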

Undelete folders from AWS S3

I have a S3 bucket with versioning enabled. It is possible to undelete files, but how can I undelete folders?
I know, S3 does not have folders... but how can I undelete common prefixes? Is there a possibility to undelete files recursively?
I created this simple bash script to restore all the files in an S3 folder I deleted:
#!/bin/bash
recoverfiles=$(aws s3api list-object-versions --bucket MyBucketName --prefix TheDeletedFolder/ --query "DeleteMarkers[?IsLatest && starts_with(LastModified,'yyyy-mm-dd')].{Key:Key,VersionId:VersionId}")
for row in $(echo "${recoverfiles}" | jq -c '.[]'); do
key=$(echo "${row}" | jq -r '.Key' )
versionId=$(echo "${row}" | jq -r '.VersionId' )
echo aws s3api delete-object --bucket MyBucketName --key $key --version-id $versionId
done
yyyy-mm-dd = the date the folder was deleted
I found a satisfying solution here, which is described in more detail here.
To sum up, there is no out-of-the-box tool for this, but a simple bash script wraps the AWS tool "s3api" to achieve the recursive undelete.
The solution worked for me. The only drawback I found is that Amazon seems to throttle the restore operations after about 30,000 files.
You cannot undelete a common prefix. You would need to undelete one object at a time. When an object appears, any associated folder will also reappear.
Undeleting can be accomplished in two ways:
Delete the Delete Marker, which will reverse the deletion, or
Copy a previous version of the object to itself, which will make the newest version newer than the Delete Marker, so it will reappear. (I hope you understood that!)
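For illustration, hedged sketches of both options (bucket, key, and version IDs below are placeholders, not taken from the question):
# Option 1: find the delete marker and remove it; the previous version becomes current again.
aws s3api list-object-versions --bucket my-bucket --prefix path/to/file.txt \
  --query 'DeleteMarkers[?IsLatest].[Key,VersionId]' --output text
aws s3api delete-object --bucket my-bucket --key path/to/file.txt --version-id MARKER_VERSION_ID
# Option 2: copy an older version of the object onto itself, creating a new current
# version that is newer than the delete marker.
aws s3api copy-object --bucket my-bucket --key path/to/file.txt \
  --copy-source "my-bucket/path/to/file.txt?versionId=OLD_VERSION_ID"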
If a folder and its contents are deleted, you can recover them using the script below, inspired by a previous answer.
The script applies to an S3 bucket where versioning was enabled beforehand. It uses the delete markers to restore files under an S3 prefix.
#!/bin/bash
# Inspired by https://www.dmuth.org/how-to-undelete-files-in-amazon-s3/
# This script can be used to undelete objects from an S3 bucket.
# When run, it will print out a list of AWS commands to undelete files, which you
# can then pipe into Bash.
#
#
# You will need the AWS CLI tool from https://aws.amazon.com/cli/ in order to run this script.
#
# Note that you must have the following permissions via IAM:
#
# Bucket permissions:
#
# s3:ListBucket
# s3:ListBucketVersions
#
# File permissions:
#
# s3:PutObject
# s3:GetObject
# s3:DeleteObject
# s3:DeleteObjectVersion
#
# If you want to do this in a "quick and dirty manner", you could just grant s3:* to
# the account, but I don't really recommend that.
#
# profile = company
# bucket = company-s3-bucket
# prefix = directory1/directory2/directory3/lastdirectory/
# pattern = (.*)
# USAGE
# bash undelete.sh > recover_files.txt && bash recover_files.txt
read -p "Enter your aws profile: " PROFILE
read -p "Enter your S3 bucket name: " BUCKET
read -p "Enter your S3 directory/prefix to be recovered from, leave empty for to recover all of the S3 bucket: " PREFIX
read -p "Enter the file pattern looking to recover, leave empty for all: " PATTERN
# Make sure Profile and Bucket are entered
[[ -z "$PROFILE" ]] && { echo "Profile is empty" ; exit 1; }
[[ -z "$BUCKET" ]] && { echo "Bucket is empty" ; exit 1; }
# Fill PATTERN to match all if empty
PATTERN=${PATTERN:-(.*)}
# Errors are fatal
set -e
if [ "$PREFIX" = "" ];
# To recover all of the S3 bucket
then
aws --profile ${PROFILE} --output text s3api list-object-versions --bucket ${BUCKET} \
| grep -E "$PATTERN" \
| grep -E "^DELETEMARKERS" \
| awk -v PROFILE=$PROFILE -v BUCKET=$BUCKET -v PREFIX=$PREFIX \
-F "[\t]+" '{ print "aws --profile " PROFILE " s3api delete-object --bucket " BUCKET "--key \""$3"\" --version-id "$5";"}'
# To recover a directory
else
aws --profile ${PROFILE} --output text s3api list-object-versions --bucket ${BUCKET} --prefix ${PREFIX} \
| grep -E $PATTERN \
| grep -E "^DELETEMARKERS" \
| awk -v PROFILE=$PROFILE -v BUCKET=$BUCKET -v PREFIX=$PREFIX \
-F "[\t]+" '{ print "aws --profile " PROFILE " s3api delete-object --bucket " BUCKET "--key \""$3"\" --version-id "$5";"}'
fi

Update cloudfront configuration using awscli

I would like to edit/update my CloudFront distribution with awscli.
I'm using latest cli version:
aws-cli/1.11.56 Python/2.7.10 Darwin/16.4.0 botocore/1.5.19
To use cloudfront features in awscli you need to add this to your aws config file:
[preview]
cloudfront = true
I'm getting config of the distribution that I'd like to modify:
aws cloudfront get-distribution-config --id FOO_BAR_ID > cf_config.json
Looks like it worked as expected. Config looks ok for me. Now I'm trying to reconfigure my CF distribution with the same config.
aws cloudfront update-distribution --distribution-config file://cf_config.json --id FOO_BAR_ID
and I'm getting:
Parameter validation failed:
Missing required parameter in DistributionConfig: "CallerReference"
Missing required parameter in DistributionConfig: "Origins"
Missing required parameter in DistributionConfig: "DefaultCacheBehavior"
Missing required parameter in DistributionConfig: "Comment"
Missing required parameter in DistributionConfig: "Enabled"
Unknown parameter in DistributionConfig: "ETag", must be one of: CallerReference, Aliases, DefaultRootObject, Origins, DefaultCacheBehavior, CacheBehaviors, CustomErrorResponses, Comment, Logging, PriceClass, Enabled, ViewerCertificate, Restrictions, WebACLId, HttpVersion, IsIPV6Enabled
Unknown parameter in DistributionConfig: "DistributionConfig", must be one of: CallerReference, Aliases, DefaultRootObject, Origins, DefaultCacheBehavior, CacheBehaviors, CustomErrorResponses, Comment, Logging, PriceClass, Enabled, ViewerCertificate, Restrictions, WebACLId, HttpVersion, IsIPV6Enabled
What is the right way to reconfigure CF using awscli?
@usterk's answer is correct, but it took me another 3 hours to get to the script that I needed, so I am sharing it here.
My case: CI/CD using S3/CloudFront with manual artifact versioning
I am hosting a static website (SSG) in S3, and I want it to be served by CloudFront. The website gets frequent updates in terms of its code (not just the content) and I want to store all the versions of the website in S3 (just like all the artifacts or docker images) and update CloudFront to point to a new version, right after a new version is pushed to S3.
I know that there is "file versioning" in S3, but this old-school format for keeping all versions of the assets helps with analyzing the assets as well as easy roll-backs.
My configs
After building the assets (JS, CSS, etc), the new files are uploaded to S3 in a folder like s3://<mybucket-name>/artifacts/<version-id>
In CloudFront I have a Distribution for the www website. Route53 for www.domain.com points to it.
In that Distribution I have several Origins (e.g. one that sends the /api path to an ELB).
The Origin that matters here is www, which has its OriginPath pointing to /artifacts/<version-id>.
Workflow
After the S3 sync is done via the AWS CLI, I need to update the CloudFront config so that the www Origin's OriginPath points to the new path in S3.
I also need to initiate an invalidation on the Distribution so CloudFront picks up the new files from S3.
The Task
As @usterk and @BrianLeishman pointed out, the only CLI command for this job is update-distribution, which, per the documentation, requires the ENTIRE CONFIGURATION of the distribution in order to REPLACE it. So, there is no command to partially update just one field in the config.
To achieve this, one must first get the current distribution config, then extract the "DistributionConfig" component, update the fields as needed, and finally put it back in the proper format with the proper verification token (the ETag).
Note that what the "update" command needs is a "subset" of what "get" gives back. So parsing JSON via jq is inevitable.
The Bash Script
The following script I came up with does the job for me:
# 0) You need to set the followings for your case
CLOUDFRONT_DISTRIBUTION_ID="EABCDEF12345ABCD"
NEW_ORIGIN_PATH="/art/0.0.9"
CLOUDFRONT_ORIGIN_ID="E1A2B3C4D5E6F"
DIST_CONFIG_OLD_FILENAME="dist-config.json" # a temp file, which will be removed later
DIST_CONFIG_NEW_FILENAME="dist-config2.json" # a temp file, which will be removed later
# 1) Get the current config, entirely, and put it in a file
aws cloudfront get-distribution --id $CLOUDFRONT_DISTRIBUTION_ID > $DIST_CONFIG_OLD_FILENAME
# 2) Extract the ETag, which we need later for the update
Etag=`cat $DIST_CONFIG_OLD_FILENAME | jq '.ETag' | tr -d \"`
# 3) Modify the config as wished, for me I used `jq` extensively to update the "OriginPath" of the desired "originId"
cat $DIST_CONFIG_OLD_FILENAME | jq \
--arg targetOriginId $CLOUDFRONT_ORIGIN_ID \
--arg newOriginPath $NEW_ORIGIN_PATH \
'.Distribution.DistributionConfig | .Origins.Items = (.Origins.Items | map(if (.Id == $targetOriginId) then (.OriginPath = $newOriginPath) else . end))' \
> $DIST_CONFIG_NEW_FILENAME
# 4) Update the distribution with the new file
aws cloudfront update-distribution --id $CLOUDFRONT_DISTRIBUTION_ID \
--distribution-config "file://${DIST_CONFIG_NEW_FILENAME}" \
--if-match $Etag \
> /dev/null
# 5) Invalidate the distribution to pick up the changes
aws cloudfront create-invalidation --distribution-id $CLOUDFRONT_DISTRIBUTION_ID --paths "/*"
# 6) Clean up
rm -f $DIST_CONFIG_OLD_FILENAME $DIST_CONFIG_NEW_FILENAME
Final Note: IAM Access
The user that performs these needs IAM access to the Get, Invalidate, and Update actions on the Distribution in CloudFront. Here is the Policy that gives that:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "VisualEditor0",
"Effect": "Allow",
"Action": [
"cloudfront:GetDistribution",
"cloudfront:UpdateDistribution",
"cloudfront:CreateInvalidation"
],
"Resource": "arn:aws:cloudfront::<ACCOUNT_ID>:distribution/<DISTRIBUTION_ID>
}
]
}
You have to edit cf_config.json before using it with update-distribution and remove
{
"ETag": "ETag_Value",
"DistributionConfig":
from the beginning of the file, and the final
}
from the end of the file.
Then use this command with the right id and ETag value that was removed from cf_config.json
aws cloudfront update-distribution --distribution-config file://cf_config.json --id FOO_BAR_ID --if-match ETag_Value
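If you'd rather not edit the file by hand, a small sketch (my addition) that does the same stripping with jq and captures the ETag at the same time:
# Sketch: extract the bare DistributionConfig and the ETag from get-distribution-config,
# then feed both to update-distribution. FOO_BAR_ID is the placeholder id from the question.
aws cloudfront get-distribution-config --id FOO_BAR_ID > cf_full.json
ETAG=$(jq -r '.ETag' cf_full.json)
jq '.DistributionConfig' cf_full.json > cf_config.json
# ...modify cf_config.json as needed, then:
aws cloudfront update-distribution --distribution-config file://cf_config.json --id FOO_BAR_ID --if-match "$ETAG"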
aws cloudfront get-distribution-config generates extra information that you cannot pass to update-distribution,
so you have to extract only the DistributionConfig from the generated JSON and modify that.
Using jq:
aws cloudfront get-distribution --id <CLOUDFRONT_DISTRIBUTION_ID> | jq .Distribution.DistributionConfig > config.json
You will also need the ETag separately; you can get it with:
ETAG=`aws cloudfront get-distribution --id <CLOUDFRONT_DISTRIBUTION_ID> | jq -r .ETag`
and then use it in update-distribution, e.g. if you saved the config in config.json:
aws cloudfront update-distribution --id <CLOUDFRONT_DISTRIBUTION_ID> --distribution-config "file://config.json" --if-match $ETAG > /dev/null
Example bash script to change Origin:
https://gist.github.com/ahmed-abdelazim/d5aa4dea6ecb5dbbff94ce1f5c1f32ff?fbclid=IwAR1QL1CujiCEyd5cDLoEuocOuXNfstxV9Ev6ndO9IorHVsx0EMroBVnimNg
#!/bin/bash
CLOUDFRONT_DISTRIBUTION_ID=E2C3RNL2F4MRMQ
NEW_ORIGIN="origin2-zaid.s3.us-west-2.amazonaws.com"
ETAG=`aws cloudfront get-distribution --id $CLOUDFRONT_DISTRIBUTION_ID | jq -r .ETag`
aws cloudfront get-distribution --id $CLOUDFRONT_DISTRIBUTION_ID | \
jq --arg NEW_ORIGIN "$NEW_ORIGIN" '.Distribution.DistributionConfig.Origins.Items[0].Id=$NEW_ORIGIN' | \
jq --arg NEW_ORIGIN "$NEW_ORIGIN" '.Distribution.DistributionConfig.Origins.Items[0].DomainName=$NEW_ORIGIN' | \
jq --arg NEW_ORIGIN "$NEW_ORIGIN" '.Distribution.DistributionConfig.DefaultCacheBehavior.TargetOriginId=$NEW_ORIGIN' | \
jq .Distribution.DistributionConfig > config.json
aws cloudfront update-distribution --id $CLOUDFRONT_DISTRIBUTION_ID --distribution-config "file://config.json" --if-match $ETAG > /dev/null
aws cloudfront create-invalidation --distribution-id $CLOUDFRONT_DISTRIBUTION_ID --paths "/*"
rm config.json

Does aws-cli confirm checksums when uploading files to S3, or do I need to manage that myself?

If I'm uploading data to S3 using the aws-cli (i.e. using aws s3 cp), does aws-cli do any work to confirm that the resulting file in S3 matches the original file, or do I somehow need to manage that myself?
Based on this answer and the Java API documentation for putObject(), it looks like it's possible to verify the MD5 checksum after upload. However, I can't find a definitive answer on whether aws-cli actually does that.
It matters to me because I'm intending to upload GPG-encrypted files from a backup process, and I'd like some confidence that what's been stored in S3 actually matches the original.
According to the FAQ in the aws-cli GitHub repository, the checksums are checked in most cases during upload and download.
Key points for uploads:
The AWS CLI calculates the Content-MD5 header for both standard and multipart uploads.
If the checksum that S3 calculates does not match the Content-MD5 provided, S3 will not store the object and will instead return an error message back to the AWS CLI.
The AWS CLI will retry this error up to 5 times before giving up and exiting with a nonzero exit code.
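If you still want an explicit check of your own after a plain aws s3 cp, a rough sketch (my addition; it only holds for single-part uploads without SSE-KMS, where the ETag is the plain MD5 in hex):
# Sketch: compare the local md5 with the object's ETag after uploading.
# "backup.gpg" and "my-bucket" are placeholders.
aws s3 cp backup.gpg s3://my-bucket/backup.gpg
local_md5=$(md5sum backup.gpg | awk '{print $1}')
remote_etag=$(aws s3api head-object --bucket my-bucket --key backup.gpg | jq -r '.ETag' | tr -d '"')
[ "$local_md5" = "$remote_etag" ] && echo "checksums match" || echo "checksum mismatch"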
The AWS support page How do I ensure data integrity of objects uploaded to or downloaded from Amazon S3? describes how to achieve this.
Firstly determine the base64 encoded md5sum of the file you wish to upload:
$ md5_sum_base64="$( openssl md5 -binary my-file | base64 )"
Then use the s3api to upload the file:
$ aws s3api put-object --bucket my-bucket --key my-file-name --body my-file-path --content-md5 "$md5_sum_base64"
Note the use of the --content-md5 flag; the help for this flag states:
--content-md5 (string) The base64-encoded 128-bit MD5 digest of the part data.
This does not say much about why to use this flag, but we can find this information in the API documentation for put object:
To ensure that data is not corrupted traversing the network, use the Content-MD5 header. When you use this header, Amazon S3 checks the object against the provided MD5 value and, if they do not match, returns an error. Additionally, you can calculate the MD5 while putting an object to Amazon S3 and compare the returned ETag to the calculated MD5 value.
Using this flag causes S3 to verify server-side that the file hash matches the specified value. If the hashes match, S3 will return the ETag:
{
"ETag": "\"599393a2c526c680119d84155d90f1e5\""
}
The ETag value will usually be the hexadecimal md5sum (see this question for some scenarios where this may not be the case).
If the hash does not match the one you specified, you get an error:
A client error (InvalidDigest) occurred when calling the PutObject operation: The Content-MD5 you specified was invalid.
In addition to this, you can also add the file's md5sum to the object metadata as an additional check:
$ aws s3api put-object --bucket my-bucket --key my-file-name --body my-file-path --content-md5 "$md5_sum_base64" --metadata md5chksum="$md5_sum_base64"
After upload you can issue the head-object command to check the values.
$ aws s3api head-object --bucket my-bucket --key my-file-name
{
"AcceptRanges": "bytes",
"ContentType": "binary/octet-stream",
"LastModified": "Thu, 31 Mar 2016 16:37:18 GMT",
"ContentLength": 605,
"ETag": "\"599393a2c526c680119d84155d90f1e5\"",
"Metadata": {
"md5chksum": "WZOTosUmxoARnYQVXZDx5Q=="
}
}
Here is a bash script that uses --content-md5, adds the metadata, and then verifies that the values returned by S3 match the local hashes:
#!/bin/bash
set -euf -o pipefail
# assumes you have aws cli, jq installed
# change these if required
tmp_dir="$HOME/tmp"
s3_dir="foo"
s3_bucket="stack-overflow-example"
aws_region="ap-southeast-2"
aws_profile="my-profile"
test_dir="$tmp_dir/s3-md5sum-test"
file_name="MailHog_linux_amd64"
test_file_url="https://github.com/mailhog/MailHog/releases/download/v1.0.0/MailHog_linux_amd64"
s3_key="$s3_dir/$file_name"
return_dir="$( pwd )"
cd "$tmp_dir" || exit
mkdir "$test_dir"
cd "$test_dir" || exit
wget "$test_file_url"
md5_sum_hex="$( md5sum $file_name | awk '{ print $1 }' )"
md5_sum_base64="$( openssl md5 -binary $file_name | base64 )"
echo "$file_name hex = $md5_sum_hex"
echo "$file_name base64 = $md5_sum_base64"
echo "Uploading $file_name to s3://$s3_bucket/$s3_dir/$file_name"
aws \
--profile "$aws_profile" \
--region "$aws_region" \
s3api put-object \
--bucket "$s3_bucket" \
--key "$s3_key" \
--body "$file_name" \
--metadata md5chksum="$md5_sum_base64" \
--content-md5 "$md5_sum_base64"
echo "Verifying sums match"
s3_md5_sum_hex=$( aws --profile "$aws_profile" --region "$aws_region" s3api head-object --bucket "$s3_bucket" --key "$s3_key" | jq -r '.ETag' | sed 's/"//'g )
s3_md5_sum_base64=$( aws --profile "$aws_profile" --region "$aws_region" s3api head-object --bucket "$s3_bucket" --key "$s3_key" | jq -r '.Metadata.md5chksum' )
if [ "$md5_sum_hex" == "$s3_md5_sum_hex" ] && [ "$md5_sum_base64" == "$s3_md5_sum_base64" ]; then
echo "checksums match"
else
echo "something is wrong checksums do not match:"
cat <<EOM | column -t -s ' '
$file_name file hex: $md5_sum_hex s3 hex: $s3_md5_sum_hex
$file_name file base64: $md5_sum_base64 s3 base64: $s3_md5_sum_base64
EOM
fi
echo "Cleaning up"
cd "$return_dir"
rm -rf "$test_dir"
aws \
--profile "$aws_profile" \
--region "$aws_region" \
s3api delete-object \
--bucket "$s3_bucket" \
--key "$s3_key"