AWS CloudFront has the option to compress files (see the CloudFront documentation linked below).
What are the pros and cons of using AWS CloudFront gzip compression vs using the compression-webpack-plugin?
CloudFront compresses content on the fly, and it has certain limitations.
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/ServingCompressedFiles.html
The most common problem is:
CloudFront is busy
In rare cases, when a CloudFront edge location is unusually busy, some files might not be compressed.
This is the biggest issue because, when it happens, CloudFront caches an uncompressed copy and serves it until you invalidate the cache.
compression-webpack-plugin, on the other hand, compresses the assets to disk at build time, which is more reliable.
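For comparison, here is a minimal sketch of pre-compressing assets at build time with compression-webpack-plugin (the option values are illustrative; adjust them to your project):

    // webpack.config.js -- pre-compress text assets to .gz files at build time
    const CompressionPlugin = require('compression-webpack-plugin');

    module.exports = {
      // ...your existing entry/output/module configuration...
      plugins: [
        new CompressionPlugin({
          test: /\.(js|css|html|svg)$/,  // only compress text-based assets
          algorithm: 'gzip',             // emit .gz files alongside the originals
          threshold: 8192,               // skip files smaller than 8 KiB
          minRatio: 0.8,                 // keep results only if they actually shrink
        }),
      ],
    };

You then serve the pre-compressed variant to clients that send Accept-Encoding: gzip, so a stale uncompressed copy can never end up cached at the edge.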
Related
I'm encoding DASH streams locally that I intend to stream through CloudFront afterwards, but when it comes to uploading the whole folder it gets counted as 4000+ PUT requests. So I thought I would instead compress it and upload the zip, which would count as only 1 PUT request, and then unzip it using Lambda.
My question is: is Lambda still going to use PUT requests for unzipping the file? And if so, what would be a better, more cost-effective way to achieve this?
No, there is no way around having to pay for the individual PUT/POST requests per file: when Lambda unzips the archive, it writes each extracted file back to S3 as its own object, so each one is still a billed PUT request.
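To make that concrete, here is a hedged sketch of what such a Lambda handler looks like (the unzip library, bucket names, and key prefix are illustrative, not a definitive implementation); note the PutObjectCommand inside the loop, one billed PUT per extracted segment:

    // Sketch of a Lambda triggered by the zip upload; every extracted entry is
    // written back to S3 as its own object, i.e. one PUT request per file.
    const { S3Client, GetObjectCommand, PutObjectCommand } = require('@aws-sdk/client-s3');
    const AdmZip = require('adm-zip'); // illustrative unzip library

    const s3 = new S3Client({});

    exports.handler = async (event) => {
      const bucket = event.Records[0].s3.bucket.name;
      const key = event.Records[0].s3.object.key;

      const zipped = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
      const zip = new AdmZip(Buffer.from(await zipped.Body.transformToByteArray()));

      for (const entry of zip.getEntries()) {
        if (entry.isDirectory) continue;
        await s3.send(new PutObjectCommand({   // one PUT (and one billed request) per file
          Bucket: bucket,
          Key: 'extracted/' + entry.entryName, // placeholder prefix
          Body: entry.getData(),
        }));
      }
    };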
S3 is expensive, and so is anything related to video streaming, but the bandwidth and storage costs will eclipse your HTTP request costs. You might consider a more affordable provider; AWS is the most expensive among providers that offer S3-compatible hosting.
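A rough back-of-the-envelope comparison (the prices are illustrative only; check the current AWS pricing pages) shows why the request charges are the least of your worries:

    // Illustrative numbers only -- verify against current AWS pricing.
    const putPricePer1000 = 0.005; // USD per 1,000 S3 PUT requests
    const egressPerGB = 0.09;      // USD per GB served out to the internet

    const puts = 4000;             // one PUT per DASH segment
    const gbStreamed = 100;        // example: 100 GB streamed to viewers in a month

    console.log('PUT cost:    $' + ((puts / 1000) * putPricePer1000).toFixed(2)); // ~$0.02
    console.log('Egress cost: $' + (gbStreamed * egressPerGB).toFixed(2));        // ~$9.00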
Can anyone please help me before I go crazy?
I have been searching for any documentation/sample-code (in JavaScript) for uploading files to S3 via CloudFront but I can't find a proper guide.
I know I could use the Transfer Acceleration feature for faster uploads, and yes, Transfer Acceleration essentially does the job through CloudFront edge locations, but as far as I could find, it should be possible to make the POST/PUT request via AWS.CloudFront...
I also read an article posted in 2013 that says AWS just added functionality for making POST/PUT requests, but it doesn't say a single thing about how to do it!?
The CloudFront documentation for JavaScript sucks; it does not even show any sample code. It just assumes we already know everything about the subject. If I did, why would I dive into the documentation in the first place?
I believe there is some confusion here about what adding these requests means. This feature was added simply so that POST/PUT requests are passed through to your origin, so functionality in your application such as form submissions or API requests can now work through CloudFront.
The recommended approach as you pointed out is to make use of S3 transfer acceleration, which actually makes use of the CloudFront edge locations.
Transfer Acceleration takes advantage of Amazon CloudFront’s globally distributed edge locations. As the data arrives at an edge location, data is routed to Amazon S3 over an optimized network path.
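Since the original question asked for JavaScript sample code, here is a minimal sketch of an accelerated upload with the AWS SDK for JavaScript v3 (the bucket name, key, and region are placeholders, and the bucket must have Transfer Acceleration enabled first):

    const { S3Client, PutObjectCommand } = require('@aws-sdk/client-s3');
    const fs = require('fs');

    const s3 = new S3Client({
      region: 'us-west-2',              // placeholder
      useAccelerateEndpoint: true,      // upload via the nearest edge location
    });

    async function upload(filePath, key) {
      await s3.send(new PutObjectCommand({
        Bucket: 'my-example-bucket',    // placeholder
        Key: key,
        Body: fs.createReadStream(filePath),
      }));
    }

    upload('./video.mp4', 'uploads/video.mp4')
      .then(() => console.log('done'))
      .catch(console.error);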
We are using AWS CloudFront to serve static content on our site with an S3 bucket as the origin. As a next step, users can dynamically upload images, which we want to push to the CDN. But we need different sizes of each image so that we can use them later on the site. One option is to pre-process the images before pushing them to the S3 bucket, which ends up creating multiple images based on sizes. Can we instead do post-processing, something like http://imageprocessor.org/imageprocessor-web/ does, but still use CloudFront? Any feedback would be helpful.
Regards
Raghav
Well, yes, it is possible to do post-processing and use CloudFront but you need an intermediate layer between CloudFront and S3. I designed a system using the following high-level implementation:
Request arrives at CloudFront, which serves the image from cache if available; otherwise CloudFront sends the request to the origin server.
The origin server is not S3. The origin server is Varnish, on EC2.
Varnish sends the request to S3, where all the resized image results are stored. If S3 returns 200 OK, the image is returned to CloudFront and to the requesting browser and the process is complete. Since the Varnish machine runs in the same AWS region as the S3 bucket, the performance is essentially indistinguishable between CloudFront >> S3 and CloudFront >> Varnish >> S3.
Otherwise, Varnish is configured to retry the failed request by sending it to the resizer platform, which also runs in EC2.
The resizer examines the request to determine what image is being requested, and at what size. In my application, the desired size is in the last few characters of the filename, so xxxxx_300_300_.jpg means 300 x 300. The resizer fetches the source image... resizes it... stores the result in S3... and returns the new image to Varnish, which returns it to CloudFront and to the requester. The resizer itself is ImageMagick wrapped in Mojolicious, and it uses a MySQL database to identify the source URI where the original image can be fetched.
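To make the naming convention concrete, here is a small sketch of how a resizer might extract the requested dimensions from such a filename (the regex and the null fallback are illustrative, not the exact implementation described above):

    // "photo_300_300_.jpg" -> { width: 300, height: 300 }
    function parseRequestedSize(filename) {
      const match = filename.match(/_(\d+)_(\d+)_\.jpe?g$/i);
      if (!match) return null; // not a resized-variant request
      return { width: Number(match[1]), height: Number(match[2]) };
    }

    console.log(parseRequestedSize('xxxxx_300_300_.jpg')); // { width: 300, height: 300 }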
Storing the results in a backing store, like S3, and checking there, first, on each request, is a critical part of this process, because CloudFront does not work like many people seem to assume. Check your assumptions against the following assertions:
CloudFront has 50+ edge locations. Requests are routed to the edge that is optimal for (usually, geographically close to) the viewer. The edge caches are all independent. If I request an object through CloudFront, and you request the same object, and our requests arrive at different edge locations, then neither of us will be served from cache. If you are generating content on demand, you want to save your results to S3 so that you do not have to repeat the processing effort.
CloudFront honors your Cache-Control: header (or overridden values in configuration) for expiration purposes, but does not guarantee to retain objects in cache until they expire. Caches are volatile and CloudFront is no exception. For this reason, too, your results need to be stored in S3 to avoid duplicate processing.
This is a much more complex solution than pre-processing.
I have a pool of millions of images, a large percentage of which have a very low probability of ever being viewed, so this is an appropriate solution here. It was originally designed as a parallel solution to make up for deficiencies in a poorly-architected preprocessor that sometimes "forgot" to process everything correctly, but it worked so well that it is now the only service providing images.
However, if your motivation revolves around avoiding the storage cost of the preprocessed results, this solution won't entirely solve that.
I am using CloudFront to access my S3 bucket.
I perform both GET and PUT operations to retrieve and update the data. The problem is that after I send a PUT request with new data, a GET request still returns the older data. I do see that the file is updated in the S3 bucket.
I am performing both GET and PUT from an iOS application. However, I tried performing the GET request using regular browsers and I still receive the older data.
Do I need to do anything else to make CloudFront refresh its data?
CloudFront caches your data. How long it does so depends on the headers the origin serves content with and on the distribution settings.
Amazon has a document with the full details of how they interact, but if you haven't set your cache control headers or changed any CloudFront settings, then by default data is cached for up to 24 hours.
You can do one of the following:
Set headers indicating how long to cache content for (e.g. Cache-Control: max-age=300 to allow caching for up to 5 minutes). How exactly you do this depends on how you are uploading the content; at a pinch you can use the console (see the sketch after this list).
Use the console / API to invalidate content. Beware that only the first 1,000 invalidations a month are free; beyond that Amazon charges. In addition, invalidations take 10-15 minutes to process.
Change the naming strategy for your S3 data so that new data is served under a different name (perhaps less relevant in your case).
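A minimal sketch of the first two options using the AWS SDK for JavaScript v3 (the bucket name, object key, and distribution ID are placeholders):

    const { S3Client, PutObjectCommand } = require('@aws-sdk/client-s3');
    const { CloudFrontClient, CreateInvalidationCommand } = require('@aws-sdk/client-cloudfront');

    const s3 = new S3Client({ region: 'us-east-1' });
    const cloudfront = new CloudFrontClient({ region: 'us-east-1' });

    async function main() {
      // Option 1: upload with a Cache-Control header so edges cache for at most 5 minutes.
      await s3.send(new PutObjectCommand({
        Bucket: 'my-example-bucket',        // placeholder
        Key: 'data/latest.json',            // placeholder
        Body: JSON.stringify({ hello: 'world' }),
        CacheControl: 'max-age=300',
      }));

      // Option 2: explicitly invalidate an object that is already cached at the edges.
      await cloudfront.send(new CreateInvalidationCommand({
        DistributionId: 'E1234EXAMPLE',     // placeholder
        InvalidationBatch: {
          CallerReference: Date.now().toString(), // must be unique per request
          Paths: { Quantity: 1, Items: ['/data/latest.json'] },
        },
      }));
    }

    main().catch(console.error);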
When you PUT an object into S3 by sending it through Cloudfront, Cloudfront proxies the PUT request back to S3, without interpreting it within Cloudfront... so the PUT request changes the S3 object, but the old version of the object, if cached, would have no reason to be evicted from the Cloudfront cache, and would continue to be served until it is expired, evicted, or invalidated.
"The" Cloudfront cache is not a single thing. Cloudfront has over 50 global edge locations (reqests are routed to what should be the closest one, using geolocating DNS), and objects are only cached in locations through which they have been requested. Sending an invalidation request to purge an object from cache causes a background process at AWS to contact all of the edge locations and request the object be purged, if it exists.
What's the point of uploading this way, then? The point has to do with the impact of packet loss, latency, and overall network performance on the throughput of a TCP connection.
The Cloudfront edge locations are connected to the S3 regions by high bandwidth, low loss, low latency (within the bounds of the laws of physics) connections... so the connection from the "back side" of Cloudfront towards S3 may be a connection of higher quality than the browser would be able to establish.
Since the Cloudfront edge location is also likely to be closer to the browser than S3 is, the browser connection is likely to be of higher quality and more resilient... thereby improving the net quality of the end-to-end logical connection, by splitting it into two connections. This feature is solely about performance:
http://aws.amazon.com/blogs/aws/amazon-cloudfront-content-uploads-post-put-other-methods/
If you don't have any issues sending directly to S3, then uploads "through" Cloudfront serve little purpose.
I am running a cityscape and nature photography website that contains a lot of images, which range from 50 KB to 2 MB in size. I have already shrunk them down using a batch photo editor, so I can't lose any more quality in the images without them getting too grainy.
Google PageSpeed Insights recommends lossless compression and I am trying to figure out how to solve this. These specific images are in S3 buckets and are being served by AWS CloudFront.
Losslessly compressing https://d339oe4gm47j4m.cloudfront.net/bw107.jpg could save 57.6KiB (38% reduction).
Losslessly compressing https://luminoto-misc.s3-us-west-2.amazonaws.com/bob_horsch.jpg could save 40.6KiB (42% reduction). ...... and a hundred more of the same.
Can CloudFront do the compression before the image is served to the client? Or do I have to do some other type of compression and then re-upload each file to a new S3 bucket? I am looking for a solution where CloudFront does it.
I have searched around but haven't found a definitive answer.
Thanks,
Jeff
Update
As implicitly pointed out by Ryan Parman (+1), there are two different layers at play when it comes to compression (and/or optimization), which seem to have gotten mixed up a bit in this discussion so far:
My initial answer below has addressed lossless compression using Cloudfront as per your question title, which is concerned with the HTTP compression layer:
HTTP compression is a capability that can be built into web servers and web clients to make better use of available bandwidth, and provide greater transmission speeds between both.
[...] data is compressed before it is sent from the server: compliant browsers will announce what methods are supported to the server before downloading the correct format; browsers that do not support compliant compression method will download uncompressed data. [...]
That is, the compress/decompress operation is usually handled automatically by the server and the client to optimize bandwidth usage and transmission performance. The difference with CloudFront is that its server implementation does not handle compression automatically like most web servers do, which is why you need to prepare a compressed representation yourself if desired.
This kind of compression works best with text files like HTML, CSS and JavaScript, but isn't useful (and can even be detrimental) with binary data formats that are already compressed by themselves, like ZIP and other prepacked archives and especially image formats like PNG and JPEG.
Now, your question body talks about a different compression/optimization layer altogether, namely lossy JPEG compression and specifically lossless editing, as well as optimization via jpegoptim. This has nothing to do with how files are handled by HTTP servers and clients; rather it is about compressing/optimizing the files themselves to better match the performance constraints of specific use cases like web or mobile browsing, where transmitting a digital photo at its original size wouldn't make any sense when it is simply to be viewed on a web page, for example.
This kind of compression/optimization is still rarely offered by web servers themselves, even though notable efforts like Google's mod_pagespeed are available these days; usually it is the responsibility of the web designer to prepare appropriate assets, ideally optimized for and selectively delivered to the expected target audience via CSS media queries.
Initial Answer
AWS CloudFront is capable of Serving Compressed Files, however, this is to be taken literally:
Amazon CloudFront can serve both compressed and uncompressed files from an origin server. CloudFront relies on the origin server either to compress the files or to have compressed and uncompressed versions of files available; CloudFront does not perform the compression on behalf of the origin server. With some qualifications, CloudFront can also serve compressed content from Amazon S3. For more information, see Choosing the File Types to Compress. [emphasis mine]
That is, you'll need to provide compressed versions yourself, but once you've set this up, it is transparent for clients. Please note that the content must be compressed using gzip; other compression algorithms are not supported:
[...] If the request header includes additional content encodings, for example, deflate or sdch, CloudFront removes them before forwarding the request to the origin server. If gzip is missing from the Accept-Encoding field, CloudFront serves only the uncompressed version of the file. [...]
Details regarding the requirements and process are outlined in How CloudFront Serves Compressed Content from a Custom Origin and Serving Compressed Files from Amazon S3.
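For example, here is a hedged sketch of preparing a gzip-compressed copy of a text asset and uploading it to S3 with the metadata CloudFront needs (the bucket name, key, and content type are placeholders):

    const { S3Client, PutObjectCommand } = require('@aws-sdk/client-s3');
    const fs = require('fs');
    const zlib = require('zlib');

    const s3 = new S3Client({ region: 'us-east-1' });

    async function uploadGzipped(localPath, key, contentType) {
      const gzipped = zlib.gzipSync(fs.readFileSync(localPath)); // compress before upload
      await s3.send(new PutObjectCommand({
        Bucket: 'my-example-bucket',   // placeholder
        Key: key,
        Body: gzipped,
        ContentType: contentType,      // e.g. 'text/css'
        ContentEncoding: 'gzip',       // tells S3/CloudFront the body is gzip-compressed
      }));
    }

    uploadGzipped('./styles.css', 'assets/styles.css', 'text/css').catch(console.error);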
JPEGOptim doesn't do any compression -- it does optimization.
The short answer is, yes, you should always use JPEGOptim on your .jpg files to optimize them before uploading them to S3 (or whatever your source storage is). This has been a good idea since forever.
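If your build already runs on Node, here is a small sketch of shelling out to jpegoptim before the upload step (jpegoptim must be installed on the machine; the directory path is illustrative):

    const { execFileSync } = require('child_process');
    const fs = require('fs');

    for (const file of fs.readdirSync('./images')) {
      if (!/\.jpe?g$/i.test(file)) continue;
      // jpegoptim is lossless by default; --strip-all removes metadata for extra savings
      execFileSync('jpegoptim', ['--strip-all', './images/' + file], { stdio: 'inherit' });
    }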
If you're talking about files which are plain text-based (e.g., CSS, JavaScript, HTML), then gzip-compression is the appropriate solution, and Steffen Opel would have had the 100% correct answer.
The only compression Amazon really supports is zip or gzip. You are able to load those compressed files into S3, and then do things like load them directly into resources like Redshift. So in short, no, Amazon does not provide the service you are looking for; this is something you would have to handle yourself...