How to enable gzip compression on AWS CloudFront

I'm trying to gzip compress the images I'm serving through CloudFront. My origin is S3.
Based on several articles/blogs on AWS, what I did is:
1) Set the "Content-Length" header for the object I want to compress. I set the value equal to the size shown in the object's size property box.
2) Set the Compress Objects Automatically value to Yes in the cache behavior of my CloudFront distribution.
3) Invalidated my object to get a fresh copy from S3.
Still, I'm not able to make CloudFront gzip my object. Any ideas?

I'm trying to gzip compress the [image]
You don't typically need to gzip images -- doing so saves very little bandwidth, if any, since virtually all image formats used on the web are already compressed.
Also, CloudFront doesn't support it.
See File Types that CloudFront Compresses for the supported file formats. They are text-based formats, which tend to benefit substantially from gzip compression.
If you really want the files served gzipped, you can store the files in S3, already gzipped.
$ gzip -9 myfile.png
This will create a gzipped file myfile.png.gz.
Upload the file to S3 without the .gz on the end. Set the Content-Encoding: header to gzip and set the Content-Type: header to the normal, correct value for the file, such as image/png.
This breaks any browser that doesn't understand Content-Encoding: gzip, but there should be no browsers in use that have that limitation.
Note that the -9, above, means maximum compression.
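If you go this route, the upload step can also be scripted. A minimal sketch using boto3 (the bucket and key names are placeholders, not from the question):

# Upload the pre-gzipped file, stored WITHOUT the .gz suffix,
# with the headers described above.
import boto3

s3 = boto3.client("s3")

with open("myfile.png.gz", "rb") as f:          # produced by `gzip -9 myfile.png`
    s3.put_object(
        Bucket="my-bucket",                     # placeholder bucket name
        Key="myfile.png",                       # no .gz on the key
        Body=f,
        ContentEncoding="gzip",                 # tells clients the body is gzip-compressed
        ContentType="image/png",                # the normal, correct type for the file
    )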

If you're trying to gzip JPEGs/PNGs, I would suggest that you first compress them with an online tool such as https://tinyjpg.com/
Ideally you will not need to compress the images any further. Image optimization tools work better than gzip -9 because they take textures, colors, patterns, and the like into account.
Also, make sure that you save your files in the proper formats (photographic images as JPEG, and graphics with flat colors or transparency as PNG); this will help reduce the size of the images.

Related

Does content encoding only compress for transmission, or also for persistent storage in S3?

Does 'Content-Encoding': 'gzip' reduce the file size in AWS S3?
Documentation says: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Encoding
This lets the recipient know how to decode the representation in order to obtain the original payload format.
Does it mean that if a file / image / GIF is sent to S3, AWS will decode it and save it in a decoded form? Or will S3 store it in a compressed form and serve it in a compressed form as well?
In my case we store GIFs in S3, and they have to be smaller than 8 MB, so we need some kind of compression.
The header is just metadata applied on the fly to the object; S3 stores exactly the bytes you upload. You can remove and re-add the Content-Encoding / Content-Type headers on the S3 object and check the file size on S3: it will be the same in both cases.
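In other words, if you need the stored object to be smaller, you have to compress the bytes yourself before uploading and record that fact in Content-Encoding. A rough boto3 sketch with placeholder bucket/key names:

# S3 stores whatever bytes you send it; the headers are only metadata.
import gzip
import boto3

s3 = boto3.client("s3")
bucket, key = "my-bucket", "animation.gif"      # placeholders

with open("animation.gif", "rb") as f:
    raw = f.read()

# GIFs are already compressed, so the gain from gzip is usually small.
compressed = gzip.compress(raw)

s3.put_object(Bucket=bucket, Key=key, Body=compressed,
              ContentType="image/gif", ContentEncoding="gzip")

# head_object reports the size of the stored bytes, i.e. len(compressed),
# no matter which metadata headers are attached.
print(s3.head_object(Bucket=bucket, Key=key)["ContentLength"], len(compressed))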

Preferred internet connection to transport of 4 TB CSV files to Amazon (AWS)

I need to transfer a bunch of CSV files, around 4 TB in total, to AWS.
What internet connection from my ISP is preferred to handle this transfer, or does the link not play any role? My link is 70 Mbps upload/download, dedicated. Is this enough, or do I need to increase my link speed?
Thanks.
4 TB = 4,194,304 MB
70 Mbit/s ≈ 8.75 MB/s (approximate, because there will be network overhead)
Dividing gives 479,349 seconds, or about 5.55 days.
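For reference, the same back-of-the-envelope estimate in Python:

# Reproducing the estimate above.
total_mb = 4 * 1024 * 1024        # 4 TB expressed in MB (binary units)
rate_mb_per_s = 70 / 8            # 70 Mbit/s is roughly 8.75 MB/s, ignoring overhead

seconds = total_mb / rate_mb_per_s
print(seconds, seconds / 86400)   # ~479,349 s, ~5.55 days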
Increasing your link speed will certainly improve this, but you'll probably find that you get more improvement using compression (CSV implies text with a numeric bias, which compresses extremely well).
You don't say what you'll be uploading to, nor how you'll be using the results. If you're uploading to S3, I'd suggest using GZip (or another compression format) to compress the files before uploading, and then let the consumers decompress as needed. If you're uploading to EFS, I'd create an EC2 instance to receive the files and use rsync with the -z option (which will compress over the wire but leave the files uncompressed on the destination). Of course, you may still prefer pre-compressing the files, to save on long-term storage costs.
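If you do pre-compress the files before uploading to S3, a rough sketch of that step with gzip and boto3 (the bucket name and paths are placeholders):

import glob
import gzip
import shutil
import boto3

s3 = boto3.client("s3")

for path in glob.glob("data/*.csv"):
    gz_path = path + ".gz"
    # Compress the CSV on disk; text-heavy CSVs typically shrink dramatically.
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    # Upload the compressed file; consumers can decompress after download.
    s3.upload_file(gz_path, "my-bucket", "incoming/" + gz_path.split("/")[-1])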

gZIP with AWS cloudFront and S3

CloudFront offers compression (gzip) for certain file types from the origin. My architecture is S3 as the origin, with a CloudFront distribution in front of it.
So, the requirements for files to get compressed by CloudFront are:
1. The Compress Objects Automatically option has to be enabled in CloudFront's cache behavior settings.
2. Content-Type and Content-Length have to be returned by S3. S3 sends these headers by default; I have cross-checked this.
3. The returned file type must be one of the file types listed by CloudFront. In my case, I want to compress app.bundle.js, which falls under application/javascript (Content-Type), and that is also present in CloudFront's supported file types.
I believe the above are the only requirements to get a gzipped version of the files to the browser. Even with all of this in place, gzip does not work for me. Any ideas what I am missing?
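One way to check from the client side whether CloudFront actually compressed a response is to request the file with an Accept-Encoding: gzip header and inspect the response headers. A minimal sketch with the requests library (the distribution domain is a placeholder):

import requests

resp = requests.get(
    "https://d1234example.cloudfront.net/app.bundle.js",
    headers={"Accept-Encoding": "gzip"},
)
print(resp.headers.get("Content-Encoding"))   # "gzip" if CloudFront compressed it
print(resp.headers.get("Content-Type"))       # should be application/javascript
print(resp.headers.get("Via"))                # confirms the response came through CloudFront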

Merging files on AWS S3 (Using Apache Camel)

I have some files that are being uploaded to S3 and processed for some Redshift task. After that task is complete, these files need to be merged. Currently I am deleting these files and uploading the merged files again.
This eats up a lot of bandwidth. Is there any way the files can be merged directly on S3?
I am using Apache Camel for routing.
S3 allows you to use an S3 object URI as the source for a copy operation. Combined with S3's Multi-Part Upload API, you can supply several S3 object URIs as the source keys for a multi-part upload.
However, the devil is in the details. S3's multi-part upload API has a minimum file part size of 5MB. Thus, if any file in the series of files under concatenation is < 5MB, it will fail.
However, you can work around this by exploiting the loophole which allows the final upload piece to be < 5MB (allowed because this happens in the real world when uploading remainder pieces).
My production code does this by:
Interrogating the manifest of files to be uploaded
If the first part is under 5MB, downloading pieces* and buffering to disk until 5MB is buffered
Appending parts sequentially until the file concatenation is complete
If a non-terminus file is < 5MB, appending it, then finishing the upload, creating a new upload, and continuing
Finally, there is a bug in the S3 API. The ETag (which is really an MD5 file checksum on S3) is not properly recalculated at the completion of a multi-part upload. To fix this, copy the file on completion. If you use a temp location during concatenation, this will be resolved on the final copy operation.
* Note that you can download a byte range of a file. This way, if part 1 is 10K and part 2 is 5GB, you only need to read in 5,110K to meet the 5MB size needed to continue.
** You could also keep a 5MB block of zeros on S3 and use it as your default starting piece. Then, when the upload is complete, do a file copy using a byte range of 5MB+1 to EOF-1.
P.S. When I have time to make a Gist of this code I'll post the link here.
You can use Multipart Upload with Copy to merge objects on S3 without downloading and uploading them again.
You can find some examples in Java, .NET or with the REST API here.
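For reference, a minimal boto3 sketch of this approach (bucket and key names are placeholders; every part except the last must be at least 5MB, as discussed above):

# Concatenate objects server-side with Multipart Upload + UploadPartCopy.
import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"
sources = ["parts/part-1.csv", "parts/part-2.csv", "parts/part-3.csv"]
target = "merged/merged.csv"

mpu = s3.create_multipart_upload(Bucket=bucket, Key=target)
parts = []
for number, key in enumerate(sources, start=1):
    result = s3.upload_part_copy(
        Bucket=bucket,
        Key=target,
        UploadId=mpu["UploadId"],
        PartNumber=number,
        CopySource={"Bucket": bucket, "Key": key},   # copied server-side, no download
    )
    parts.append({"ETag": result["CopyPartResult"]["ETag"], "PartNumber": number})

s3.complete_multipart_upload(
    Bucket=bucket,
    Key=target,
    UploadId=mpu["UploadId"],
    MultipartUpload={"Parts": parts},
)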

Force AWS EMR to unzip files in S3

I have a bucket in AWS's S3 service that contains gzipped CSV files, however when they were stored they all were saved with the metadata Content-Type of text/csv.
Now I am using AWS EMR, which will not recognize them as zipped files and unzip them. I've looked through the configuration options for EMR but don't see anything that would work... I have almost a million files, so renaming their metadata value would require a Boto script that cycles through all the files and updates the metadata value.
Am I missing something easy? Thanks!
The Content-Type isn't the problem... that's correct if the files are csv, but if you stored them gzipped, then you needed to also have set Content-Encoding: gzip in the header metadata. Doing that "should" trigger the useragent that's fetching them to gunzip them on the fly when they are downloaded... so had you done that, it should have "just worked."
(I store gzipped log files this way, with Content-Type: text/plain and Content-Encoding: gzip and when you download them with a web browser, the file you get is no longer gzipped because the browser untwizzles the compression on the fly due to the Content-Encoding header.)
But, since you've already uploaded the files, I did find this in the google machine, which might help:
GZipped input. A lot of my input data had already been gzipped, but luckily if you pass -jobconf stream.recordreader.compression=gzip in the extra arguments section Hadoop will decompress them on the fly before passing the data to your mapper.
http://petewarden.typepad.com/searchbrowser/2010/01/elastic-mapreduce-tips.html
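If you would rather fix the metadata on the existing objects (the Boto script route mentioned in the question), a copy-in-place can replace the headers without re-uploading the bytes. A rough boto3 sketch, with placeholder bucket and prefix:

# Add Content-Encoding: gzip to existing objects by copying each object
# onto itself with replaced metadata.
import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix="logs/"):
    for obj in page.get("Contents", []):
        s3.copy_object(
            Bucket=bucket,
            Key=obj["Key"],
            CopySource={"Bucket": bucket, "Key": obj["Key"]},
            MetadataDirective="REPLACE",        # required to change metadata on copy
            ContentType="text/csv",
            ContentEncoding="gzip",
        )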