Correct gzip encoding on S3 - amazon-web-services

While transferring my files using "aws s3 sync", the transferred files do not get the right Content-Type and Content-Encoding. I am able to fix the types by tweaking /etc/mime.types, but I have no idea how to set the right encoding for the ".gz" extension so zipped files are served as text, apart from:
changing types on s3 afterwards (seems like double-work to me)
using aws-cli exclude / include filters with the correct types (this results in multiple commands)
Any idea how to solve this? Thanks...

Here is how I solved it,
aws s3 sync /tmp/foo/ s3://bucket/ \
  --exclude "*" --include "*.gz" --content-type "text/plain; charset=UTF-8"
By default, the aws s3 sync command guesses the best-matching content type for each file. If you want to change that default behavior, you need to handle those files separately.
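Since the question is also about Content-Encoding, note that sync accepts a --content-encoding flag as well. A hedged variant of the command above (bucket and path are placeholders) that marks the gzipped files as gzip-encoded text might look like:
aws s3 sync /tmp/foo/ s3://bucket/ \
  --exclude "*" --include "*.gz" \
  --content-type "text/plain; charset=UTF-8" \
  --content-encoding "gzip"
You can then spot-check the resulting metadata on one object (foo.gz standing in for whatever key you uploaded) with:
aws s3api head-object --bucket bucket --key foo.gz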
Reference:
https://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
Hope it helps.

Related

AWS S3 ls command returns the folder name too along with files in it

This might seem like a stupid question, but I am not able to find a reason for this. When I run the aws s3 ls command on an S3 URI, it includes the name of the parent folder in the output for some buckets, while for others it just lists the files in the folder.
Example:
aws s3 ls s3://test-bucket/test_folder/ --recursive --human-readable --summarize
2022-06-28 20:04:36 0 Bytes test_folder/
2022-06-28 20:05:58 3.0 KiB test_folder/file.txt
and for another s3 URI it will just list the contents
aws s3 ls s3://sample_/sample_test/ --recursive --human-readable --summarize
2021-06-29 03:24:08 5.2 KiB sample_test/file1.txt
2021-06-29 03:24:07 7.0 KiB sample_test/file2.txt
2021-06-29 03:24:08 5.1 KiB sample_test/file3.txt
I am not sure what is causing this behavior. Is there any documentation that I am missing here?
Thanks
This is likely because someone used the S3 console to explicitly create a 'folder' named test_folder but they didn't do that for sample_test. They simply uploaded 3 files to sample_test.
What you see as the folder test_folder/ is simply an S3 object whose key is test_folder/ and whose size is zero. It doesn't need to exist for you to be able to upload files to test_folder/. It's just a visual convenience in the S3 console.
There are typically no real folders in S3. They're virtual, inferred from the presence of objects with a common prefix ending in a forward slash, e.g. dogs/bingo.png and dogs/elvis.jpg imply the presence of a virtual folder named dogs/, but it doesn't really exist (typically).
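For what it's worth, a zero-byte folder marker like that can be created with a plain put-object call (reusing the bucket and folder names from the question as placeholders), which is essentially what the console's "Create folder" button does:
aws s3api put-object --bucket test-bucket --key test_folder/
It is that zero-byte object, not any real directory, that then shows up in the recursive listing.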

Sync objects to S3 with expires and cache-control headers using the CLI?

I'm trying to sync some objects to S3 and set the Expires and Cache-Control headers, but I'm at my wit's end here. Nothing seems to work. Here's my latest attempt:
aws s3 sync . s3://my-bucket \
--expires "2020-06-16T13:27:40Z" \
--cache-control "max-age=315360000, public, s-maxage=31536000, max-age=31536000, immutable" \
--exclude "*" \
--metadata-directive REPLACE \
--include "bundles"
The result: no Expires header, no Cache-Control header. I've looked around in the console (only one metadata entry, Content-Type), I've used get-object to look at it, and I looked at the response with curl. I'm not really sure about metadata-directive - it is not mentioned under --expires in the docs, but the docs for the directive option indicate it must be set for the other ones to work. What crazy incantation must I conjure to have these headers set on my objects?
This eventually turned out to be a PEBKAC. The command does indeed work when run in isolation. I ran it as part of a multi-step process that would first sync some files, excluding the ones I wanted the headers on, and then sync the files with the headers. Problem was, I fat-fingered the exclude pattern in the first sync, so basically all the files were already synced, and the header-setting sync therefore did nothing. Ah, ain't being a developer lovely?
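For anyone landing here, a stripped-down sync that does set both headers might look like the following (bucket, include pattern and dates are placeholders, and the duplicated max-age from the question is dropped):
aws s3 sync . s3://my-bucket \
  --exclude "*" --include "bundles/*" \
  --cache-control "public, max-age=31536000, immutable" \
  --expires "2020-06-16T13:27:40Z"
Both --cache-control and --expires are ordinary aws s3 sync options, and no metadata directive should be needed for a local-to-S3 upload.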

AWS CLI - A file containing items to be ignored for S3 copy or sync

Is it possible to have a file that lists the files and folders to be ignored when uploading items through the AWS CLI?
It has an --exclude flag, as mentioned here. However, the concept I am after is something like a .gitignore or .dockerignore file, rather than listing the patterns with a flag.
No, there is no in-built capability within the AWS Command-Line Interface (CLI) to support .ignore file capabilities.
I know it's not exactly what you are looking for, but you could set an alias in your ~/.bash_profile, something like:
alias s3_cp='aws s3 cp --exclude "yadda, yadda, yadda"'
This would at least reduce the need to type the excludes every time, even though they aren't kept in a separate file.
Edit: Here is a link showing that the s3 section of the CLI config file doesn't support what you are looking for either: https://docs.aws.amazon.com/cli/latest/topic/s3-config.html
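Another workaround, if you really want .gitignore-style behavior, is a small wrapper that turns each line of an ignore file into an --exclude flag. This is only a sketch; the script name and the .s3ignore file name are made up, and it assumes one glob pattern per line in the current directory:
#!/usr/bin/env bash
# s3-sync-ignore.sh (hypothetical): read one glob pattern per line from
# .s3ignore and pass each pattern to "aws s3 sync" as an --exclude flag.
set -e
src="$1"; dest="$2"; shift 2
excludes=()
while IFS= read -r pattern; do
    # skip blank lines and comment lines
    case "$pattern" in ""|"#"*) continue;; esac
    excludes+=(--exclude "$pattern")
done < .s3ignore
aws s3 sync "$src" "$dest" "${excludes[@]}" "$@"
Invoked, for example, as: ./s3-sync-ignore.sh . s3://my-bucket/prefix --dryrun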

Cannot delete Amazon S3 key that contains bad character

I just began to use S3 recently. I accidentally made a key that contains a bad character, and now I can't list the contents of that folder, nor delete that bad key. (I've since added checks to make sure I don't do this again).
I was using an old "S3" python module from 2008 originally. Now I've switched to boto-2.0, and I still cannot delete it. I did quite a bit of research online, and it seems the problem is that I have an invalid XML character in the key, so it appears to be a problem at the lowest level, and no API has helped so far.
I finally contacted Amazon, and they said to use "s3-curl.pl" from http://aws.amazon.com/code/128. I downloaded it, and here's my key:
<Key>info/[01</Key>
I think I was doing a quick bash for loop over some files at the time, and I have "lscolors" set up, and so this happened.
I tried
./s3curl.pl --id <myID> --key <myKEY> -- -X DELETE https://mybucket.s3.amazonaws.com/info/[01
(and also tried putting the URL in single/double quotes, and also tried to escape the '[').
Without quotes on the URL, it hangs. With quotes, I get "curl: (3) [globbing] error: bad range specification after pos 50". I edited s3-curl.pl to run curl with --globoff and I still get this error.
I would appreciate any help.
This solved the issue for me; just delete the containing folder recursively:
aws s3 rm "s3://BUCKET_NAME/folder/folder" --recursive
You can use the s3cmd tool from here. You first need to run
s3cmd fixbucket <bucket name that contains bad file>.
You can then delete the file using
s3cmd del <bucket>/<file>
In my case there were stray carriage returns in the key (however that happened..). I was able to fix it with the AWS CLI like this:
aws s3 rm "s3://my_bucket/Icon"$'\r'
I also had versioning enabled, so I additionally needed to do this for every version (the version ids are visible in the UI when you enable the version view):
aws s3api delete-object --bucket my_bucket --key "Icon"$'\r' --version-id <version_id>
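If you would rather not dig through the console, the version ids can also be listed from the CLI (bucket and key here are just the placeholders from above):
aws s3api list-object-versions --bucket my_bucket --prefix "Icon"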
I was in this situation recently, to list the items you can use:
aws s3api list-objects-v2 --bucket my_bucket --encoding-type url
the bad keys will come back url encoded like:
"Key": "%01%C3%B4%C2%B3%C3%8Bu%C2%A5%27%40yr%3E%60%0EQ%14%C3%A5.gif"
Spaces became +, which I had to change to %20, and * wasn't encoded, so I had to replace it with %2A, before I was able to delete them.
To actually delete them, I wasn't able to use the aws cli because it would url-encode the already url-encoded key, resulting in a 404, so to get around that I manually hit the REST API with the DELETE verb.
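If you need the raw key back from that url-encoded listing (for example to delete it through an SDK or a hand-signed request), one quick way to decode it is a one-liner like the following; the key is just the example from above, and unquote_plus is used so that + is treated as a space, matching the quirk mentioned earlier:
python3 -c 'import sys, urllib.parse; print(urllib.parse.unquote_plus(sys.argv[1]))' "%01%C3%B4%C2%B3%C3%8Bu%C2%A5%27%40yr%3E%60%0EQ%14%C3%A5.gif"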
I recently encountered this case: I had a trailing carriage return at the end of a folder name. The following command solved the matter (bucket and folder names are placeholders):
aws s3 rm "s3://bucket_name/folder_name"$'\r' --recursive

Change Content-Disposition of existing S3 object

In S3 REST API I am adding metadata to an existing object by using the PUT (Copy) command and copying a key to the same location with 'x-amz-metadata-directive' = 'REPLACE'
What I want to do is change the download file name by setting:
Content-Disposition: attachment; filename=foo.bar;
This sets the metadata correctly but when I download the file it still uses the key name instead of 'foo.bar'
I use a software tool, S3 Browser, to view the metadata and it looks correct (apart from 'Content-Disposition' being all lower case, as that's what S3 asked me to sign).
Then, using S3 Browser, I simply pressed save without changing anything, and now it works???
What am I missing? How come setting the metadata 'Content-Disposition: attachment; filename=foo.bar;' from my web app does not work, but it does work from S3 Browser?
Edited for clarity:
Content-Disposition must be set as its own header and not included as x-amz-meta-Content-Disposition. The x-amz-meta- prefix (all lowercase) is only for user-defined metadata headers.
Thanks to #Eyal for clarifying.
Original:
>SOLVED:
>
>The Doco at http://docs.amazonwebservices.com/AmazonS3/latest/dev/index.html?RESTAuthentication.html
>
>seems to be wrong; it says:
>
>Notice how the 'x-amz-' headers are sorted, white-space trimmed, converted to lowercase, and multiple headers with the same name have been joined using a comma to separate values.
>
>Note how only the Content-Type and Content-MD5 HTTP entity headers appear in the StringToSign. The other Content-* entity headers do not.
>However, Content-Disposition must be set specifically and not included as x-amz-meta-Content-Disposition.
>
>It now works fine.
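In CLI terms, the same in-place replace-copy might look roughly like the following (bucket, key and the text/plain content type are placeholders; with REPLACE you generally need to re-state any headers you want to keep, such as Content-Type):
aws s3api copy-object --bucket mybucket \
  --copy-source mybucket/path/to/key --key path/to/key \
  --metadata-directive REPLACE \
  --content-type "text/plain" \
  --content-disposition 'attachment; filename="foo.bar"'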
here: this uses the CLI to set the Content-Disposition header on all files under a path inside a bucket (and also sets them as public):
aws s3 ls s3://mybucket/brand_img/ios/ | awk '{print $4}' > objects.txt
while read -r line; do aws s3api copy-object --bucket mybucket \
--copy-source mybucket/brand_img/ios/$line --key brand_img/ios/$line \
--metadata-directive REPLACE --content-disposition "attachment; filename=\"$line\"" --acl public-read; done < objects.txt