Cannot delete Amazon S3 key that contains bad character - amazon-web-services

I just began to use S3 recently. I accidentally made a key that contains a bad character, and now I can't list the contents of that folder, nor delete that bad key. (I've since added checks to make sure I don't do this again).
I was using an old "S3" python module from 2008 originally. Now I've switched to boto-2.0, and I still cannot delete it. I did quite a bit of research online, and it seems the problem is I have an invalid XML character, so it seems a problem at the lowest level, and no API has helped so far.
I finally contacted Amazon, and they said to use "s3-curl.pl" from http://aws.amazon.com/code/128. I downloaded it, and here's my key:
<Key>info/[01</Key>
I think I was doing a quick bash for loop over some files at the time, and I have "lscolors" set up, and so this happened.
I tried
./s3curl.pl --id <myID> --key <myKEY> -- -X DELETE https://mybucket.s3.amazonaws.com/info/[01
(and also tried putting the URL in single/double quotes, and also tried to escape the '[').
Without quotes on the URL, it hangs. With quotes, I get "curl: (3) [globbing] error: bad range specification after pos 50". I edited the s3-curl.pl to do curl --globoff and still get this error.
I would appreciate any help.

This solved the issue, just delete the main folder:
aws s3 rm "s3://BUCKET_NAME/folder/folder" --recursive

You can use the s3cmd tool from here. You first need to run
s3cmd fixbucket <bucket name that contains bad file>.
You can then delete the file using
s3cmd del <bucket>/<file>

In my case there were newlines in the key (however that happened..). I was able to fix it with the aws cli like this:
aws cli rm "s3://my_bucket/Icon"$'\r'
I also had versioning enabled, so I also needed to do this, for all the versions (versions ids are visible in the UI when enabling the version view):
aws s3api delete-object --bucket my_bucket --key "Icon"$'\r' --version-id <version_id>

I was in this situation recently, to list the items you can use:
aws s3api list-objects-v2 --bucket my_bucket --encoding-type url
the bad keys will come back url encoded like:
"Key": "%01%C3%B4%C2%B3%C3%8Bu%C2%A5%27%40yr%3E%60%0EQ%14%C3%A5.gif"
spaces became + and I had to change those to %20 and * wasn't encoded I had to replace those with %2A before I was able to delete them.
To actually delete them, I wasn't able to use the aws cli because it would urlencode the already urlencoded key resulting in a 404, so to get around that I manually hit the rest API with the DELETE verb.

I recently encountered this case. I had newline at the end of my bucket. The following command solved the matter.
aws s3 rm "bucket_name"$'\r' --recursive

Related

PowerShell Copy Files to AWS Bucket with Object Lock

I want to copy CSV files generated by an SSIS package from an AWS EC2 server to an S3 bucket. Each time I try I get an error around the content-MD5 HTTP error because we have object lock enabled on the bucket.
Write-S3Object : Content-MD5 HTTP header is required for Put Object requests with Object Lock parameters
I would assume there is a PowerShell command I can add or I am missing something but after furious googling I cannot find a resolution. Any help or an alternative option would be appreciated.
I am now testing using the AWS CLI process instead of PowerShell.
If you do want to continue to use the Write-S3Object PowerShell command the missing magic flag is:
-CalculateContentMD5Header 1
So the final command will be
Write-S3Object -Region $region -BucketName $bucketName -File $fileToBackup -Key $destinationFileName -CalculateContentMD5Header 1
https://docs.aws.amazon.com/powershell/latest/reference/items/Write-S3Object.html
After a lot of testing, reading and frustration I found the AWS CLI was able to do exactly what I needed. I am unsure if this is an issue with my PowerShell knowledge or a missing feature (I lean toward my knowledge).
I created a bat file that used the CLI to move the files into the S3 bucket, I then called this bat file from within an SSIS execute task process.
Dropped the one line code below just in case it may help others.
aws s3 mv C:\path\to\files\ s3://your.s3.bucket.name/ --recursive

AWS S3 Object deleting issue

The object of s3 is:
images/jkå^ååö¨..ö ÷aq<aa<2a<qa2.jpg
While I delete it from S3 management console it said 100% success but while I the object is still there.
I have tried command line and try to delete it from EC2 instance with the command and that does not work either
My command is:
aws s3 rm s3://sws-bucket/images/jkå^ååö¨..ö ÷aq<aa<2a<qa2.jpg
Which works for another object.
eg: aws s3 rm s3://sws-bucket/images/test.jpg
Working fine.
I really want to delete the object with the special character but can't succeed. Anyone can help with this?
Finally, I have found the solution:
While I was checking in the regional file the name was
'jkå^ååö¨..ö ÷aq
So when I ran the command
s3://sws-bucket/image/'jkå^ååö¨..ö ÷aq

Correct gzip encoding on S3

While transferring my files using "aws s3 sync", transferred files does not have right Content-type and Content-encoding. I am able to solve the types by tweaking /etc/mime.types however no idea how to set right encoding for ".gz" extension so zipped files are served as text apart from:
changing types on s3 afterwards (seems like double-work to me)
aws-cli using exclude / include with correct types
(this results in multiple commands)
Any idea how to solve this? Thanks...
Here is how I solved it,
aws s3 sync /tmp/foo/ s3://bucket/ --recursive
--exclude "*" --include "*.gz" --content-type "text/plain; charset=UTF-8"
By default aws s3 sync command assumes the best matching content types. If you want to change the default behavior you need to handle them separately.
Reference:
https://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
Hope it helps.

AWS CLI S3API find newest folder in path

I've got a very large bucket (hundreds of thousands of objects). I've got a path (lets say s3://myBucket/path1/path2). /path2 gets uploads that are also folders. So a sample might look like:
s3://myBucket/path1/path2/v6.1.0
s3://myBucket/path1/path2/v6.1.1
s3://myBucket/path1/path2/v6.1.102
s3://myBucket/path1/path2/v6.1.2
s3://myBucket/path1/path2/v6.1.25
s3://myBucket/path1/path2/v6.1.99
S3 doesn't take into account version number sorting (which makes sense) but alphabetically the last in the list is not the last uploaded. In that example .../v6.1.102 is the newest.
Here's what I've got so far:
aws s3api list-objects
--bucket myBucket
--query "sort_by(Contents[?contains(Key, \`path1/path2\`)],&LastModified)"´
--max-items 20000
So one problem here is max-items seems to start alphabetically from the all files recursively in the bucket. 20000 does get to my files but it's a pretty slow process to go through that many files.
So my questions are twofold:
1 - This is still searching the whole bucket but I just want to narrow it down to path2/ . Can I do this?
2 - This lists just objects, is it possible to pull up just a path list instead?
Basically the end goal is I just want a command to return the newest folder name like 'v6.1.102' from the example above.
To answer #1, you could add the --prefix path1/path2 to limit what you're querying in the bucket.
In terms of sorting by last modified, I can only think of using an SDK to combine the list_objects_v2 and head_object (boto3) to get last modified on the objects and programmatically sort
Update
Alternatively, you could reverse sort by LastModified in jmespath and return the first item to give you the most recent object and gather the directory from there.
aws s3api list-objects-v2 \
--bucket myBucket \
--prefix path1/path2 \
--query 'reverse(sort_by(Contents,&LastModified))[0]'
If you want general purpose querying e.g. "lowest version", "highest version", "all v6.x versions" then consider maintaining a separate database with the version numbers.
If you only need to know the highest version number and you need that to be retrieved quickly (quicker than a list object call) then you could maintain that version number independently. For example, you could use a Lambda function that responds to objects being uploaded to path1/path2 where the Lambda function is responsible for storing the highest version number that it has seen into a file at s3://mybucket/version.max.
Prefix works with list_object using boto3 client. But using boto3 resource might give some issues. Paginator in pagination is a great concept and works nice!. to find the latest changes(additions of objects) : sort_by(contents)[-1]

AWS S3 Glacier - Programmatically Initiate Restore

I have been writing an web-app using s3 for storage and glacier for backup. So I setup the lifecycle policy to archive it. Now I want to write a webapp that lists the archived files, the user should be able to initiate restore from this and then get an email once their restore is complete.
Now the trouble I am running into is I cant find a php sdk command I can issue to initiateRestore. Then it would be nice if it notified SNS when restore was complete, SNS would push the JSON onto SQS and I would poll SQS and finally email the user when polling detected a complete restore.
Any help or suggestions would be nice.
Thanks.
You could also use the AWS CLI tool like so (here I'm assuming you want to restore all files in one directory):
aws s3 ls s3://myBucket/myDir/ | awk '{if ($4) print $4}' > myFiles.txt
for x in `cat myFiles.txt`
do
echo "restoring $x"
aws s3api restore-object \
--bucket myBucket \
--key "myDir/$x" \
--restore-request '{"Days":30}'
done
Regarding your desire for notification, the CLI tool will report "A client error (RestoreAlreadyInProgress) occurred: Object restore is already in progress" if request already initiated, and probably a different message once it restores. You could run this restore command several times, looking for "restore done" error/message. Pretty hacky of course; there's probably a better way with AWS CLI tool.
Caveat: be careful with Glacier restores that exceed the allotted free-restore amount/period. If you restore too much data too quickly, charges can exponentially pile up.
I wrote something fairly similar. I can't speak to any PHP api, however there's a simple http POST that kicks off glacier restoration.
Since that happens asyncronously (and takes up to 5 hours), you have to set up a process to poll files that are restoring by making HEAD requests for the object, which will have restoration status info in an x-amz-restore header.
If it helps, my ruby code for parsing this header looks like this:
if restore = headers['x-amz-restore']
if restore.first =~ /ongoing-request="(.+?)", expiry-date="(.+?)"/
restoring = $1 == "true"
restore_date = DateTime.parse($2)
elsif restore.first =~ /ongoing-request="(.+?)"/
restoring = $1 == "true"
end
end