Undelete folders from AWS S3 - amazon-web-services

I have a S3 bucket with versioning enabled. It is possible to undelete files, but how can I undelete folders?
I know, S3 does not have folders... but how can I undelete common prefixes? Is there a possibility to undelete files recursively?

I created this simple bash script to restore all the files in an S3 folder I deleted:
#!/bin/bash
# List the latest delete markers for objects under the prefix that were deleted on the given date
recoverfiles=$(aws s3api list-object-versions --bucket MyBucketName --prefix TheDeletedFolder/ --query "DeleteMarkers[?IsLatest && starts_with(LastModified,'yyyy-mm-dd')].{Key:Key,VersionId:VersionId}")
for row in $(echo "${recoverfiles}" | jq -c '.[]'); do
    key=$(echo "${row}" | jq -r '.Key')
    versionId=$(echo "${row}" | jq -r '.VersionId')
    # Removing the delete marker undeletes the object
    echo aws s3api delete-object --bucket MyBucketName --key "$key" --version-id "$versionId"
done
yyyy-mm-dd = the date the folder was deleted
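The script only prints the delete-object commands because of the echo; one way to actually run them, assuming you saved it as restore.sh (a name I'm using here for illustration), is to pipe its output into bash:
bash restore.sh | bash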

I found a satisfying solution here, which is described in more detail here.
To sum up, there is no out-of-the-box tool for this, but a simple bash script wrapping the AWS "s3api" tool achieves the recursive undelete.
The solution worked for me. The only drawback I found is that Amazon seems to throttle the restore operations after about 30,000 files.

You cannot undelete a common prefix. You would need to undelete one object at a time. When an object appears, any associated folder will also reappear.
Undeleting can be accomplished in two ways (a sketch of both follows after this list):
Delete the Delete Marker that will reverse the deletion, or
Copy a previous version of the object to itself, which will make the newest version newer than the Delete Marker, so it will reappear. (I hope you understood that!)
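For illustration, here is a minimal sketch of both approaches with the AWS CLI; the bucket, key, and version IDs are placeholders you would look up with list-object-versions:
# Option 1: remove the delete marker (its version ID comes from list-object-versions)
aws s3api delete-object --bucket my-bucket --key some/prefix/file.txt --version-id DELETE_MARKER_VERSION_ID
# Option 2: copy a previous version of the object onto itself, making it the newest version
aws s3api copy-object --bucket my-bucket --key some/prefix/file.txt \
    --copy-source "my-bucket/some/prefix/file.txt?versionId=OLD_VERSION_ID"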

If a folder and its contents are deleted, you can recover them using the script below, inspired by a previous answer.
The script applies to an S3 bucket where versioning was enabled beforehand. It uses the delete markers to restore files under an S3 prefix.
#!/bin/bash
# Inspired by https://www.dmuth.org/how-to-undelete-files-in-amazon-s3/
# This script can be used to undelete objects from an S3 bucket.
# When run, it will print out a list of AWS commands to undelete files, which you
# can then pipe into Bash.
#
#
# You will need the AWS CLI tool from https://aws.amazon.com/cli/ in order to run this script.
#
# Note that you must have the following permissions via IAM:
#
# Bucket permissions:
#
# s3:ListBucket
# s3:ListBucketVersions
#
# File permissions:
#
# s3:PutObject
# s3:GetObject
# s3:DeleteObject
# s3:DeleteObjectVersion
#
# If you want to do this in a "quick and dirty manner", you could just grant s3:* to
# the account, but I don't really recommend that.
#
# profile = company
# bucket = company-s3-bucket
# prefix = directory1/directory2/directory3/lastdirectory/
# pattern = (.*)
# USAGE
# bash undelete.sh > recover_files.txt && bash recover_files.txt
read -p "Enter your aws profile: " PROFILE
read -p "Enter your S3 bucket name: " BUCKET
read -p "Enter your S3 directory/prefix to be recovered from, leave empty for to recover all of the S3 bucket: " PREFIX
read -p "Enter the file pattern looking to recover, leave empty for all: " PATTERN
# Make sure Profile and Bucket are entered
[[ -z "$PROFILE" ]] && { echo "Profile is empty" ; exit 1; }
[[ -z "$BUCKET" ]] && { echo "Bucket is empty" ; exit 1; }
# Fill PATTERN to match all if empty
PATTERN=${PATTERN:-(.*)}
# Errors are fatal
set -e
if [ "$PREFIX" = "" ];
# To recover all of the S3 bucket
then
aws --profile ${PROFILE} --output text s3api list-object-versions --bucket ${BUCKET} \
| grep -E "$PATTERN" \
| grep -E "^DELETEMARKERS" \
| awk -v PROFILE=$PROFILE -v BUCKET=$BUCKET -v PREFIX=$PREFIX \
-F "[\t]+" '{ print "aws --profile " PROFILE " s3api delete-object --bucket " BUCKET "--key \""$3"\" --version-id "$5";"}'
# To recover a directory
else
aws --profile ${PROFILE} --output text s3api list-object-versions --bucket ${BUCKET} --prefix ${PREFIX} \
| grep -E "$PATTERN" \
| grep -E "^DELETEMARKERS" \
| awk -v PROFILE=$PROFILE -v BUCKET=$BUCKET -v PREFIX=$PREFIX \
-F "[\t]+" '{ print "aws --profile " PROFILE " s3api delete-object --bucket " BUCKET "--key \""$3"\" --version-id "$5";"}'
fi

Related

How to list unused AWS S3 buckets and empty bucket using shell script?

I am looking for a list of S3 buckets that have been unused for the last 90 days, and also for a list of empty buckets.
In order to get this, I have tried writing the code below:
#!/bin/sh
for bucketlist in $(aws s3api list-buckets --query "Buckets[].Name" --output text);
do
    listobjects=$(\
    aws s3api list-objects --bucket $bucketlist \
    --query 'Contents[?contains(LastModified, `2020-08-06`)]')
    echo "$listobjects"
done
This code prints the following output (I have added results for only one bucket for reference):
{
    "Contents": [
        {
            "Key": "test2/image.png",
            "LastModified": "2020-08-06T17:19:10.000Z",
            "ETag": "\"xxxxxx\"",
            "Size": 179008,
            "StorageClass": "STANDARD"
        }
    ]
}
Expectations:
In the above code I want to print only the buckets whose objects have not been modified/used in the last 90 days.
I am also looking for a list of empty buckets.
I am not good at programming; can anyone guide me on this?
Thank you in advance for your support.
I made this small shell script to find empty buckets in my account:
#!/bin/zsh
for b in $(aws s3 ls | cut -d" " -f3)
do
echo -n $b
if [[ "$(aws s3api list-objects-v2 --bucket $b --max-items 1)" == "" ]]
then
echo " BUCKET EMPTY"
else
echo ""
fi
done
I listed the objects using list-objects-v2 with a maximum of 1 item. If there are no items, the result is empty and I print "BUCKET EMPTY" alongside the bucket name.
Note 1: You must have access to list the objects.
Note 2: I'm not sure how it'll work for versioned buckets with deleted objects (the bucket appears to be empty, but actually contains older versions of deleted objects); see the sketch below for one way to check.
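Regarding Note 2: a possible extra check for versioned buckets is to query list-object-versions the same way, so that buckets holding only old versions or delete markers are not reported as empty. A rough sketch that could go inside the same loop, reusing the $b variable:
# A bucket with no current objects may still hold old versions or delete markers
if [[ "$(aws s3api list-object-versions --bucket $b --max-items 1)" == "" ]]
then
    echo " NO VERSIONS OR DELETE MARKERS EITHER"
fi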
Here's a script I wrote today. It doesn't change anything, but it does give you the command lines to make the changes.
#!/bin/bash
profile="default"
olddate="2020-01-01"
smallbucketsize=10
emptybucketlist=()
oldbucketlist=()
smallbucketlist=()
#for bucketlist in $(aws s3api list-buckets --profile $profile | jq --raw-output '.Buckets[6,7,8,9].Name'); # test this script on just a few buckets
for bucketlist in $(aws s3api list-buckets --profile $profile | jq --raw-output '.Buckets[].Name');
do
echo "* $bucketlist"
if [[ ! "$bucketlist" == *"shmr-logs" ]]; then
listobjects=$(\
aws s3api list-objects --bucket $bucketlist \
--query 'Contents[*].Key' \
--profile $profile)
#echo "==$listobjects=="
if [[ "$listobjects" == "null" ]]; then
echo "$bucketlist is empty"
emptybucketlist+=("$bucketlist")
else
# get size
aws s3 ls --summarize --human-readable --recursive --profile $profile s3://$bucketlist | tail -n1
# get number of files
filecount=$(echo $listobjects | jq length )
echo "contains $filecount files"
if [[ $filecount -lt $smallbucketsize ]]; then
smallbucketlist+=("$bucketlist")
fi
# get number of files older than $olddate
listoldobjects=$(\
aws s3api list-objects --bucket $bucketlist \
--query "Contents[?LastModified<=\`$olddate\`]" \
--profile $profile)
oldfilecount=$(echo $listoldobjects | jq length )
echo "contains $oldfilecount old files"
# check if all files are old
if [[ $filecount -eq $oldfilecount ]]; then
echo "all the files are old"
oldbucketlist+=("$bucketlist")
fi
fi
fi
done
echo -e "\n\n"
echo "check the contents of these buckets which only contain old files"
for oldbuckets in "${oldbucketlist[@]}";
do
echo "$oldbuckets"
done
echo -e "\n\n"
echo "check the contents of these buckets which don't have many files"
for smallbuckets in "${smallbucketlist[@]}";
do
echo "aws s3api list-objects --bucket $smallbuckets --query 'Contents[*].Key' --profile $profile"
done
echo -e "\n\n"
echo "consider deleting these empty buckets"
for emptybuckets in "${emptybucketlist[@]}";
do
echo "aws s3api delete-bucket --profile $profile --bucket $emptybuckets"
done

How to compare versions of an Amazon S3 object?

Versioning of Amazon S3 buckets is nice, but I don't see any easy way to compare versions of a file - either through the console or through any other app I found.
S3Browser seems to have the best versioning support, but no comparison.
Is there a way to compare versions of a file on S3 without downloading both versions and comparing them manually?
--
EDIT:
I just started thinking that some basic automation should not be too hard, see snippet below. Question remains though: is there any tool that supports this properly? This script may be fine for me, but not for non-dev users.
#!/bin/bash
# s3-compare-last-versions.sh
if [[ $# -ne 2 ]]; then
echo "Usage: `basename $0` <bucketName> <fileKey> "
exit 1
fi
bucketName=$1
fileKey=$2
latestVersionId=$(aws s3api list-object-versions --bucket $bucketName --prefix $fileKey --max-items 2 | json Versions[0].VersionId)
previousVersionId=$(aws s3api list-object-versions --bucket $bucketName --prefix $fileKey --max-items 2 | json Versions[1].VersionId)
aws s3api get-object --bucket $bucketName --key $fileKey --version-id $latestVersionId $latestVersionId".js"
aws s3api get-object --bucket $bucketName --key $fileKey --version-id $previousVersionId $previousVersionId".js"
diff $latestVersionId".js" $previousVersionId".js"
I wrote a bash script to download the last two versions of an object and compare them using colordiff. I stumbled across this question after writing it. Thought I could share it here in case anyone wants to use it.
#!/bin/bash
#This script needs awscli, jq and colordiff. Please install them for your environment
#This script also needs the AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_DEFAULT_REGION.
#Please set them using the export command as follows or set them using envrc
#export AWS_ACCESS_KEY_ID=<Your AWS Access Key ID>
#export AWS_SECRET_ACCESS_KEY=<Your AWS Secret Access Key>
#export AWS_DEFAULT_REGION=<Your AWS Default Region>
set -e
if [ -z $1 ] || [ -z $2 ]; then
echo "Usage:"
echo "version_compare.sh *bucket_name* *file_name*"
echo
echo "Example"
echo "version_compare.sh bucket_name folder/filename.extension"
echo
exit 1;
fi
aws_bucket=$1
file_key=$2
echo Getting the last 2 versions of the file at ${file_key}..
echo
echo Executing:
cat << EOF
aws s3api list-object-versions --bucket ${aws_bucket} --prefix ${file_key} --max-items 2
EOF
echo
versions=$(aws s3api list-object-versions --bucket ${aws_bucket} --prefix ${file_key} --max-items 2)
version_1=$( jq -r '.["Versions"][0]["VersionId"]' <<< "${versions}" )
version_2=$( jq -r '.["Versions"][1]["VersionId"]' <<< "${versions}" )
mkdir -p state_comparison_files
echo Getting the latest version ${version_1} of the file at ${file_key}..
echo
echo Executing:
cat << EOF
aws s3api get-object --bucket ${aws_bucket} --key ${file_key} --version-id ${version_1} state_comparison_files/${version_1}
EOF
aws s3api get-object --bucket ${aws_bucket} --key ${file_key} --version-id ${version_1} state_comparison_files/${version_1} > /dev/null
echo
echo Getting older version ${version_2} of the file at ${file_key}..
echo
echo Executing:
cat << EOF
aws s3api get-object --bucket ${aws_bucket} --key ${file_key} --version-id ${version_2} state_comparison_files/${version_2}
EOF
aws s3api get-object --bucket ${aws_bucket} --key ${file_key} --version-id ${version_2} state_comparison_files/${version_2} > /dev/null
echo
echo Comparing the different versions.
echo If no differences are found, nothing will be shown
colordiff --unified state_comparison_files/${version_2} state_comparison_files/${version_1}
Here's the link to it
https://gist.github.com/mohamednajiullah/3edc88d314291be40f2dd3cf13ea0d7f
Note: It's pretty much the same as the script the question asker himself created except that it uses jq for json parsing and colordiff for showing the difference with different colors like in git diff.
I'm creating an electron.js based desktop app to do exactly this. It's currently in development but it can be used. I welcome contributions
https://github.com/mohamednajiullah/s3_object_version_comparator
You can't view file contents at all via S3, so you definitely can't compare the contents of files via S3. You would have to download the different versions and then use a tool like diff to compare them.
You can use MegaSparkDiff, an open source tool that compares multiple types of data sources, including S3:
https://github.com/FINRAOS/MegaSparkDiff
The pair below will return inLeftButNotInRight and inRightButNotInLeft as DataFrames, which you can save as files or examine via code.
SparkFactory.initializeSparkContext();
AppleTable leftAppleTable = SparkFactory.parallelizeTextSource("S3://file1","table1");
AppleTable rightAppleTable = SparkFactory.parallelizeTextSource("S3://file2","table2");
Pair<Dataset<Row>, Dataset<Row>> resultPair = SparkCompare.compareAppleTables(leftAppleTable, rightAppleTable);
resultPair.getLeft().show(100);
SparkFactory.stopSparkContext();

How can I read the metadata for every item in an S3 bucket?

I can set Cache-Control metadata on every item in an S3 bucket using the following command (from this answer):
aws s3 cp s3://mybucket s3://mybucket --recursive --metadata-directive REPLACE \
--cache-control max-age=86400
Is there a way to read the Cache-Control metadata for every item in a bucket?
This bash one-liner should work (but it is very slow, since it sends a separate request for each object):
IFS=$'\n'; for object in `aws s3 ls s3://my-bucket-name --recursive | tr -s ' ' | cut -d' ' -f4-`; do echo $object `aws s3api head-object --bucket my-bucket-name --key $object --query CacheControl` ; done
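Since the loop issues one head-object request per key, a possible speed-up (my own variation, not from the original answer) is to fan the calls out with GNU xargs; a rough sketch, again assuming the bucket my-bucket-name and keys without quote characters:
# Run up to 8 head-object calls in parallel, one per key
aws s3 ls s3://my-bucket-name --recursive | tr -s ' ' | cut -d' ' -f4- \
  | xargs -d '\n' -P 8 -I{} sh -c 'echo "{}: $(aws s3api head-object --bucket my-bucket-name --key "{}" --query CacheControl --output text)"'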

Does aws-cli confirm checksums when uploading files to S3, or do I need to manage that myself?

If I'm uploading data to S3 using the aws-cli (i.e. using aws s3 cp), does aws-cli do any work to confirm that the resulting file in S3 matches the original file, or do I somehow need to manage that myself?
Based on this answer and the Java API documentation for putObject(), it looks like it's possible to verify the MD5 checksum after upload. However, I can't find a definitive answer on whether aws-cli actually does that.
It matters to me because I'm intending to upload GPG-encrypted files from a backup process, and I'd like some confidence that what's been stored in S3 actually matches the original.
According to the FAQ from the aws-cli GitHub repository, the checksums are checked in most cases during both upload and download.
Key points for uploads:
The AWS CLI calculates the Content-MD5 header for both standard and multipart uploads.
If the checksum that S3 calculates does not match the Content-MD5 provided, S3 will not store the object and instead will return an error message back to the AWS CLI.
The AWS CLI will retry this error up to 5 times before giving up and exiting with a nonzero exit code.
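Because the CLI exits with a nonzero code when those retries are exhausted, a minimal way to act on that from a backup script (my own sketch, not from the FAQ; the file and bucket names are placeholders) is to check the exit status:
if aws s3 cp my-backup.gpg s3://my-bucket/backups/my-backup.gpg; then
    echo "upload accepted (Content-MD5 verified by S3)"
else
    echo "upload failed after retries" >&2
    exit 1
fi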
The AWS support page How do I ensure data integrity of objects uploaded to or downloaded from Amazon S3? describes how to achieve this.
Firstly determine the base64 encoded md5sum of the file you wish to upload:
$ md5_sum_base64="$( openssl md5 -binary my-file | base64 )"
Then use the s3api to upload the file:
$ aws s3api put-object --bucket my-bucket --key my-file-name --body my-file-path --content-md5 "$md5_sum_base64"
Note the use of the --content-md5 flag, the help for this flag states:
--content-md5 (string) The base64-encoded 128-bit MD5 digest of the part data.
This does not say much about why to use this flag, but we can find this information in the API documentation for put object:
To ensure that data is not corrupted traversing the network, use the Content-MD5 header. When you use this header, Amazon S3 checks the object against the provided MD5 value and, if they do not match, returns an error. Additionally, you can calculate the MD5 while putting an object to Amazon S3 and compare the returned ETag to the calculated MD5 value.
Using this flag causes S3 to verify server-side that the file hash matches the specified value. If the hashes match, S3 will return the ETag:
{
    "ETag": "\"599393a2c526c680119d84155d90f1e5\""
}
The ETag value will usually be the hexadecimal md5sum (see this question for some scenarios where this may not be the case).
If the hash does not match the one you specified you get an error.
A client error (InvalidDigest) occurred when calling the PutObject operation: The Content-MD5 you specified was invalid.
In addition to this you can also add the file md5sum to the file metadata as an additional check:
$ aws s3api put-object --bucket my-bucket --key my-file-name --body my-file-path --content-md5 "$md5_sum_base64" --metadata md5chksum="$md5_sum_base64"
After upload you can issue the head-object command to check the values.
$ aws s3api head-object --bucket my-bucket --key my-file-name
{
    "AcceptRanges": "bytes",
    "ContentType": "binary/octet-stream",
    "LastModified": "Thu, 31 Mar 2016 16:37:18 GMT",
    "ContentLength": 605,
    "ETag": "\"599393a2c526c680119d84155d90f1e5\"",
    "Metadata": {
        "md5chksum": "WZOTosUmxoARnYQVXZDx5Q=="
    }
}
Here is a bash script that uses content md5 and adds metadata and then verifies that the values returned by S3 match the local hashes:
#!/bin/bash
set -euf -o pipefail
# assumes you have aws cli, jq installed
# change these if required
tmp_dir="$HOME/tmp"
s3_dir="foo"
s3_bucket="stack-overflow-example"
aws_region="ap-southeast-2"
aws_profile="my-profile"
test_dir="$tmp_dir/s3-md5sum-test"
file_name="MailHog_linux_amd64"
test_file_url="https://github.com/mailhog/MailHog/releases/download/v1.0.0/MailHog_linux_amd64"
s3_key="$s3_dir/$file_name"
return_dir="$( pwd )"
cd "$tmp_dir" || exit
mkdir "$test_dir"
cd "$test_dir" || exit
wget "$test_file_url"
md5_sum_hex="$( md5sum $file_name | awk '{ print $1 }' )"
md5_sum_base64="$( openssl md5 -binary $file_name | base64 )"
echo "$file_name hex = $md5_sum_hex"
echo "$file_name base64 = $md5_sum_base64"
echo "Uploading $file_name to s3://$s3_bucket/$s3_dir/$file_name"
aws \
--profile "$aws_profile" \
--region "$aws_region" \
s3api put-object \
--bucket "$s3_bucket" \
--key "$s3_key" \
--body "$file_name" \
--metadata md5chksum="$md5_sum_base64" \
--content-md5 "$md5_sum_base64"
echo "Verifying sums match"
s3_md5_sum_hex=$( aws --profile "$aws_profile" --region "$aws_region" s3api head-object --bucket "$s3_bucket" --key "$s3_key" | jq -r '.ETag' | sed 's/"//'g )
s3_md5_sum_base64=$( aws --profile "$aws_profile" --region "$aws_region" s3api head-object --bucket "$s3_bucket" --key "$s3_key" | jq -r '.Metadata.md5chksum' )
if [ "$md5_sum_hex" == "$s3_md5_sum_hex" ] && [ "$md5_sum_base64" == "$s3_md5_sum_base64" ]; then
echo "checksums match"
else
echo "something is wrong checksums do not match:"
cat <<EOM | column -t -s ' '
$file_name file hex: $md5_sum_hex s3 hex: $s3_md5_sum_hex
$file_name file base64: $md5_sum_base64 s3 base64: $s3_md5_sum_base64
EOM
fi
echo "Cleaning up"
cd "$return_dir"
rm -rf "$test_dir"
aws \
--profile "$aws_profile" \
--region "$aws_region" \
s3api delete-object \
--bucket "$s3_bucket" \
--key "$s3_key"

Query EC2 tags from within instance

Amazon recently added the wonderful feature of tagging EC2 instances with key-value pairs to make management of large numbers of VMs a bit easier.
Is there some way to query these tags in the same way as some of the other user-set data? For example:
$ curl http://169.254.169.254/latest/meta-data/placement/availability-zone
us-east-1d
Is there some similar way to query the tags?
The following bash script returns the Name of your current ec2 instance (the value of the "Name" tag). Modify TAG_NAME to your specific case.
TAG_NAME="Name"
INSTANCE_ID="`wget -qO- http://instance-data/latest/meta-data/instance-id`"
REGION="`wget -qO- http://instance-data/latest/meta-data/placement/availability-zone | sed -e 's:\([0-9][0-9]*\)[a-z]*\$:\\1:'`"
TAG_VALUE="`aws ec2 describe-tags --filters "Name=resource-id,Values=$INSTANCE_ID" "Name=key,Values=$TAG_NAME" --region $REGION --output=text | cut -f5`"
To install the aws cli
sudo apt-get install python-pip -y
sudo pip install awscli
In case you use IAM instead of explicit credentials, use these IAM permissions:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["ec2:DescribeTags"],
            "Resource": ["*"]
        }
    ]
}
Once you've got ec2-metadata and ec2-describe-tags installed (as mentioned in Ranieri's answer above), here's an example shell command to get the "name" of the current instance, assuming you have a "Name=Foo" tag on it.
Assumes EC2_PRIVATE_KEY and EC2_CERT environment variables are set.
ec2-describe-tags \
--filter "resource-type=instance" \
--filter "resource-id=$(ec2-metadata -i | cut -d ' ' -f2)" \
--filter "key=Name" | cut -f5
This returns Foo.
You can use a combination of the AWS metadata tool (to retrieve your instance ID) and the new Tag API to retrieve the tags for the current instance.
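For example, a minimal sketch of that combination (the region is an assumption you would adjust, and the instance needs permission to call ec2:DescribeTags):
# Get the current instance ID from the metadata service, then ask the Tag API for its tags
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
aws ec2 describe-tags --region us-east-1 \
    --filters "Name=resource-id,Values=$INSTANCE_ID"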
You can add this script to your cloud-init user data to download EC2 tags to a local file:
#!/bin/sh
INSTANCE_ID=`wget -qO- http://instance-data/latest/meta-data/instance-id`
REGION=`wget -qO- http://instance-data/latest/meta-data/placement/availability-zone | sed 's/.$//'`
aws ec2 describe-tags --region $REGION --filter "Name=resource-id,Values=$INSTANCE_ID" --output=text | sed -r 's/TAGS\t(.*)\t.*\t.*\t(.*)/\1="\2"/' > /etc/ec2-tags
You need the AWS CLI tools installed on your system: you can either install them with a packages section in a cloud-config file before the script, use an AMI that already includes them, or add an apt or yum command at the beginning of the script.
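For example, one possible form of that extra install step on a Debian/Ubuntu-based AMI, assuming the distribution's awscli package is acceptable, would be:
# Install the AWS CLI before the tag download runs
apt-get update && apt-get install -y awscli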
In order to access EC2 tags you need a policy like this one in your instance's IAM role:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stmt1409309287000",
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeTags"
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}
The instance's EC2 tags will be available in /etc/ec2-tags in this format:
FOO="Bar"
Name="EC2 tags with cloud-init"
You can include the file as-is in a shell script using . /etc/ec2-tags, for example:
#!/bin/sh
. /etc/ec2-tags
echo $Name
The tags are downloaded during instance initialization, so they will not reflect subsequent changes.
The script and IAM policy are based on itaifrenkel's answer.
If you are not in the default region, the results from overthink's answer will come back empty.
ec2-describe-tags \
--region \
$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone | sed -e "s/.$//") \
--filter \
resource-id=$(curl --silent http://169.254.169.254/latest/meta-data/instance-id)
If you want to add a filter to get a specific tag (elasticbeanstalk:environment-name in my case) then you can do this.
ec2-describe-tags \
--region \
$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone | sed -e "s/.$//") \
--filter \
resource-id=$(curl --silent http://169.254.169.254/latest/meta-data/instance-id) \
--filter \
key=elasticbeanstalk:environment-name
And to get only the value for the tag that I filtered on, we pipe to cut and get the fifth field.
ec2-describe-tags \
--region \
$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone | sed -e "s/.$//") \
--filter \
resource-id=$(curl --silent http://169.254.169.254/latest/meta-data/instance-id) \
--filter \
key=elasticbeanstalk:environment-name | cut -f5
You can alternatively use the describe-instances cli call rather than describe-tags:
This example shows how to get the value of tag 'my-tag-name' for the instance:
aws ec2 describe-instances \
--instance-id $(curl -s http://169.254.169.254/latest/meta-data/instance-id) \
--query "Reservations[*].Instances[*].Tags[?Key=='my-tag-name'].Value" \
--region ap-southeast-2 --output text
Change the region to suit your local circumstances. This may be useful where your instance has the describe-instances privilege but not describe-tags in its instance profile policy.
I have pieced together the following, which is hopefully simpler and cleaner than some of the existing answers and uses only the AWS CLI with no additional tools.
This code example shows how to get the value of tag 'myTag' for the current EC2 instance:
Using describe-tags:
export AWS_DEFAULT_REGION=us-east-1
instance_id=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
aws ec2 describe-tags \
--filters "Name=resource-id,Values=$instance_id" 'Name=key,Values=myTag' \
--query 'Tags[].Value' --output text
Or, alternatively, using describe-instances:
aws ec2 describe-instances --instance-id $instance_id \
--query 'Reservations[].Instances[].Tags[?Key==`myTag`].Value' --output text
For Python:
import logging
from boto import utils, ec2
from os import environ
# import keys from os.env or use default (not secure)
aws_access_key_id = environ.get('AWS_ACCESS_KEY_ID', failobj='XXXXXXXXXXX')
aws_secret_access_key = environ.get('AWS_SECRET_ACCESS_KEY', failobj='XXXXXXXXXXXXXXXXXXXXX')
#load metadata , if = {} we are on localhost
# http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AESDG-chapter-instancedata.html
instance_metadata = utils.get_instance_metadata(timeout=0.5, num_retries=1)
region = instance_metadata['placement']['availability-zone'][:-1]
instance_id = instance_metadata['instance-id']
conn = ec2.connect_to_region(region, aws_access_key_id=aws_access_key_id, aws_secret_access_key=aws_secret_access_key)
# get tag status for our instance_id using filters
# http://docs.aws.amazon.com/AWSEC2/latest/CommandLineReference/ApiReference-cmd-DescribeTags.html
tags = conn.get_all_tags(filters={'resource-id': instance_id, 'key': 'status'})
if tags:
    instance_status = tags[0].value
else:
    instance_status = None
    logging.error('no status tag for ' + region + ' ' + instance_id)
A variation on some of the answers above, but this is how I got the value of a specific tag from the user-data script on an instance:
REGION=$(curl http://instance-data/latest/meta-data/placement/availability-zone | sed 's/.$//')
INSTANCE_ID=$(curl -s http://instance-data/latest/meta-data/instance-id)
TAG_VALUE=$(aws ec2 describe-tags --region $REGION --filters "Name=resource-id,Values=$INSTANCE_ID" "Name=key,Values=<TAG_NAME_HERE>" | jq -r '.Tags[].Value')
Starting January 2022, this should also be available directly via the EC2 metadata API (if enabled).
curl http://169.254.169.254/latest/meta-data/tags/instance
https://aws.amazon.com/about-aws/whats-new/2022/01/instance-tags-amazon-ec2-instance-metadata-service/
Using the AWS 'user data' and 'meta data' APIs, it's possible to write a script which wraps Puppet to start a Puppet run with a custom cert name.
First, start an AWS instance with custom user data: 'role:webserver'
#!/bin/bash
# Find the name from the user data passed in on instance creation
USER=$(curl -s "http://169.254.169.254/latest/user-data")
IFS=':' read -ra UDATA <<< "$USER"
# Find the instance ID from the meta data api
ID=$(curl -s "http://169.254.169.254/latest/meta-data/instance-id")
CERTNAME=${UDATA[1]}.$ID.aws
echo "Running Puppet for certname: " $CERTNAME
puppet agent -t --certname=$CERTNAME
This calls Puppet with a certname like 'webserver.i-hfg453.aws'. You can then create a node manifest called 'webserver', and Puppet's 'fuzzy node matching' means it is used to provision all webservers.
This example assumes you build on a base image with puppet installed etc.
Benefits:
1) You don't have to pass around your credentials.
2) You can be as granular as you like with the role configs.
jq + ec2metadata makes it a little nicer. I'm using cf and have access to the region; otherwise you can grab it in bash (see the snippet after this one).
aws ec2 describe-tags --region $REGION \
--filters "Name=resource-id,Values=`ec2metadata --instance-id`" | jq --raw-output \
'.Tags[] | select(.Key=="TAG_NAME") | .Value'
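If the region is not already available, one common way to grab it in bash is to strip the trailing letter from the availability zone returned by the metadata service; a small sketch:
REGION=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone | sed 's/.$//')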
Without jq:
aws ec2 describe-tags --region us-west-2 \
--filters "Name=resource-id,Values=`ec2-metadata --instance-id | cut -d " " -f 2`" \
--query 'Tags[?Key==`Name`].Value' \
--output text
Download and run a standalone executable to do that.
Sometimes one cannot install awscli, which depends on Python, and Docker might be out of the picture too.
Here is my implementation in golang:
https://github.com/hmalphettes/go-ec2-describe-tags
The Metadata tool seems to no longer be available, but that was an unnecessary dependency anyway.
Follow the AWS documentation to have the instance's profile grant it the "ec2:DescribeTags" action in a policy, restricting the target resources as much as you wish. (If you need a profile for another reason then you'll need to merge policies into a new profile-linked role).
Then:
aws --region $(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone | sed -e 's/.$//') ec2 describe-tags --filters Name=resource-type,Values=instance Name=resource-id,Values=$(curl http://169.254.169.254/latest/meta-data/instance-id) Name=key,Values=Name |
perl -nwe 'print "$1\n" if /"Value": "([^"]+)/;'
Well, there are lots of good answers here, but none quite worked for me out of the box; I think the CLI has been updated since some of them were written, and I do like using the CLI. The following single command works out of the box for me in 2021 (as long as the instance's IAM role is allowed to call describe-tags).
aws ec2 describe-tags \
--region "$(ec2-metadata -z | cut -d' ' -f2 | sed 's/.$//')" \
--filters "Name=resource-id,Values=$(ec2-metadata --instance-id | cut -d " " -f 2)" \
--query 'Tags[?Key==`Name`].Value' \
--output text
AWS has recently announced support for instance tags in Instance Metadata Service: https://aws.amazon.com/about-aws/whats-new/2022/01/instance-tags-amazon-ec2-instance-metadata-service/
If you have the tag metadata option enabled for an instance, you can simply do:
$ TOKEN=`curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 900"`
$ curl -H "X-aws-ec2-metadata-token: $TOKEN" -v http://169.254.169.254/latest/meta-data/tags/instance
It is possible to get Instance tags from within the instance via metadata.
First, allow access to tags in instance metadata as explained here
Then, run this command for IMDSv1:
curl http://169.254.169.254/latest/meta-data/tags/instance/Name
or this command for IMDSv2
TOKEN=`curl -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 21600"` \
&& curl -H "X-aws-ec2-metadata-token: $TOKEN" -v http://169.254.169.254/latest/meta-data/tags/instance
Install AWS CLI:
curl "https://s3.amazonaws.com/aws-cli/awscli-bundle.zip" -o "awscli-bundle.zip"
sudo apt-get install unzip
unzip awscli-bundle.zip
sudo ./awscli-bundle/install -i /usr/local/aws -b /usr/local/bin/aws
Get the tags for the current instance:
aws ec2 describe-tags --filters "Name=resource-id,Values=`ec2metadata --instance-id`"
Outputs:
{
    "Tags": [
        {
            "ResourceType": "instance",
            "ResourceId": "i-6a7e559d",
            "Value": "Webserver",
            "Key": "Name"
        }
    ]
}
Use a bit of perl to extract the tags:
aws ec2 describe-tags --filters \
"Name=resource-id,Values=`ec2metadata --instance-id`" | \
perl -ne 'print "$1\n" if /\"Value\": \"(.*?)\"/'
Returns:
Webserver
For those crazy enough to use Fish shell on EC2, here's a handy snippet for your /home/ec2-user/.config/fish/config.fish. The hostdata command will then list all your tags as well as the public IP and hostname.
set -x INSTANCE_ID (wget -qO- http://instance-data/latest/meta-data/instance-id)
set -x REGION (wget -qO- http://instance-data/latest/meta-data/placement/availability-zone | sed 's/.$//')
function hostdata
aws ec2 describe-tags --region $REGION --filter "Name=resource-id,Values=$INSTANCE_ID" --output=text | sed -r 's/TAGS\t(.*)\t.*\t.*\t(.*)/\1="\2"/'
ec2-metadata | grep public-hostname
ec2-metadata | grep public-ipv4
end