AWS CloudFront: How to know which Edge cache handled the user request?

I know there are 100+ edge locations distributed around the world. My requirement is to identify which CloudFront edge location a particular user request hit.
I have a CloudFront Function attached to the "Viewer Request" event, and I was hoping its 'event' object would contain some info about the edge it ran on, but it does not.
My ultimate goal is to log whatever information is available about the edge, identify the hot edges, and eventually deploy my APIs/application close to those edges' regions.
Is there any way I could achieve this?
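For reference, a minimal sketch of the kind of function described (assuming the standard CloudFront Functions viewer-request event shape; console.log output goes to CloudWatch Logs in us-east-1):

    // CloudFront Function on the viewer-request event -- a sketch of what
    // the question describes. The event exposes the request and viewer,
    // but no field identifying the edge location executing the function.
    function handler(event) {
        var request = event.request;
        console.log(JSON.stringify(event)); // inspect the full event shape
        return request;                     // pass the request through unchanged
    }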

submit PUT request through CloudFront

Can anyone please help me before I go crazy?
I have been searching for documentation/sample code (in JavaScript) for uploading files to S3 via CloudFront, but I can't find a proper guide.
I know I could use the Transfer Acceleration feature for faster uploads, and yes, Transfer Acceleration essentially does the job through CloudFront edge locations, but as far as I have searched it should also be possible to make the POST/PUT request via AWS.CloudFront...
I also read an article posted in 2013 saying that AWS had just added functionality to make POST/PUT requests, but it says not a single thing about how to do it!?
The CloudFront documentation for JavaScript sucks; it does not even show any sample code. All they do is assume that we already know everything about the subject. If I knew, why would I dive into the documentation in the first place?
I believe there is some confusion here about what adding these requests means. This feature was added simply to allow POST/PUT requests to be passed through to your origin, so that functionality in your application such as form submissions or API requests would work.
The recommended approach, as you pointed out, is to make use of S3 Transfer Acceleration, which actually makes use of the CloudFront edge locations.
Transfer Acceleration takes advantage of Amazon CloudFront’s globally distributed edge locations. As the data arrives at an edge location, data is routed to Amazon S3 over an optimized network path.
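As a minimal sketch of that approach with the AWS SDK for JavaScript (v2): enabling the accelerate endpoint is a one-line client option. The bucket name and key below are placeholders, and the bucket must have Transfer Acceleration enabled:

    // Upload via S3 Transfer Acceleration (AWS SDK for JavaScript v2).
    var AWS = require('aws-sdk');
    var fs = require('fs');

    var s3 = new AWS.S3({ useAccelerateEndpoint: true }); // routes via edge locations

    s3.upload({
        Bucket: 'my-bucket',                      // placeholder (acceleration enabled)
        Key: 'uploads/photo.jpg',                 // placeholder object key
        Body: fs.createReadStream('photo.jpg'),
        ContentType: 'image/jpeg'
    }, function (err, data) {
        if (err) return console.error(err);
        console.log('Uploaded to', data.Location);
    });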

AWS WAF Rate Limit by Request URL Component

I want to start by saying that I'm a total newcomer to AWS.
I'm investigating using AWS WAF for dynamic rate limiting based on a component of the request URL. The AWS website has a tutorial for doing this by IP address, but I have no idea if it can be modified to do what I need.
So, with that in mind, please tell me what, if any, of the following is actually possible:
Rate limit by a component of the URL (an API key in this case)
Determine limit dynamically (different behaviour for different keys)
Perform some non-blocking action in the first instance of exceeding the limit, then block if the limit is exceeded consistently
Log both of the above actions and do something with the outputted logs (i.e. forward them somewhere)
Again, I'm not looking for detailed how-tos here, as they would probably warrant separate questions - just want to know if this is possible.
API Gateway is probably the right fit for what you are looking to implement. It has throttling implemented out of the box.
Take a look at API Gateway Usage Plans for implementation details for your specific use case.
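For a rough idea of the shape (a sketch only; the API id, stage, and limits are placeholder values), a usage plan with throttling can be created with the AWS SDK for JavaScript (v2). Different API keys can then be attached to different plans via createUsagePlanKey, which gives you the per-key behaviour you describe:

    // Sketch: create a usage plan with throttle and quota limits.
    var AWS = require('aws-sdk');
    var apigateway = new AWS.APIGateway();

    apigateway.createUsagePlan({
        name: 'per-key-limits',
        apiStages: [{ apiId: 'a1b2c3d4e5', stage: 'prod' }], // placeholder API/stage
        throttle: { rateLimit: 10, burstLimit: 20 },         // requests per second
        quota: { limit: 100000, period: 'MONTH' }
    }, function (err, plan) {
        if (err) return console.error(err);
        console.log('Created usage plan', plan.id);
    });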

Uploading various sized Images to AWS Cloudfront versus post processing

We are using AWS CloudFront to render static content on our site, with an S3 bucket as the origin. As a next step, users can dynamically upload images which we want to push to the CDN, but we would require different sizes of each image so that we can use them later in the site. One option is to do preprocessing of the images before pushing them to the S3 bucket, which ends up creating multiple images based on sizes. Can we do post-processing, something like http://imageprocessor.org/imageprocessor-web/ does, but still use CloudFront? Any feedback would be helpful.
Well, yes, it is possible to do post-processing and use CloudFront but you need an intermediate layer between CloudFront and S3. I designed a system using the following high-level implementation:
Request arrives at CloudFront, which serves the image from cache if available; otherwise CloudFront sends the request to the origin server.
The origin server is not S3. The origin server is Varnish, on EC2.
Varnish sends the request to S3, where all the resized image results are stored. If S3 returns 200 OK, the image is returned to CloudFront and to the requesting browser, and the process is complete. Since the Varnish machine runs in the same AWS region as the S3 bucket, the performance is essentially indistinguishable between CloudFront >> S3 and CloudFront >> Varnish >> S3.
Otherwise, Varnish is configured to retry the failed request by sending it to the resizer platform, which also runs in EC2.
The resizer examines the request to determine what image is being requested, and at what size. In my application, the desired size is in the last few characters of the filename, so xxxxx_300_300_.jpg means 300 x 300. The resizer fetches the source image... resizes it... stores the result in S3... and returns the new image to Varnish, which returns it to CloudFront and to the requester. The resizer itself is ImageMagick wrapped in Mojolicious, and uses a MySQL database to identify the source URI where the original image can be fetched.
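Purely as an illustration of that filename convention (the author's actual resizer is Perl, not JavaScript), the size extraction might look like:

    // Illustration only: parse the "_WIDTH_HEIGHT_." suffix convention
    // described above, e.g. "xxxxx_300_300_.jpg" -> 300 x 300.
    function parseSize(filename) {
        var match = filename.match(/_(\d+)_(\d+)_\.\w+$/);
        if (!match) return null; // no size suffix: serve the original
        return {
            width: parseInt(match[1], 10),
            height: parseInt(match[2], 10)
        };
    }

    console.log(parseSize('xxxxx_300_300_.jpg')); // { width: 300, height: 300 }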
Storing the results in a backing store, like S3, and checking there, first, on each request, is a critical part of this process, because CloudFront does not work like many people seem to assume. Check your assumptions against the following assertions:
CloudFront has 50+ edge locations. Requests are routed to the edge that is optimal for (usually, geographically close to) the viewer. The edge caches are all independent. If I request an object through CloudFront, and you request the same object, and our requests arrive at different edge locations, then neither of us will be served from cache. If you are generating content on demand, you want to save your results to S3 so that you do not have to repeat the processing effort.
CloudFront honors your Cache-Control: header (or overridden values in configuration) for expiration purposes, but does not guarantee to retain objects in cache until they expire. Caches are volatile and CloudFront is no exception. For this reason, too, your results need to be stored in S3 to avoid duplicate processing.
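A sketch of that storage step with the AWS SDK for JavaScript (the bucket name and content type are placeholder assumptions): the resizer writes each result to S3 with a long Cache-Control so CloudFront and browsers can keep it:

    // Sketch: persist a resize result to S3 with a long Cache-Control,
    // so repeated requests from any edge never trigger a re-resize.
    var AWS = require('aws-sdk');
    var s3 = new AWS.S3();

    function storeResized(key, imageBuffer, callback) {
        s3.putObject({
            Bucket: 'resized-images',                  // placeholder bucket
            Key: key,                                  // e.g. 'xxxxx_300_300_.jpg'
            Body: imageBuffer,
            ContentType: 'image/jpeg',
            CacheControl: 'public, max-age=31536000'   // one year
        }, callback);
    }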
This is a much more complex solution than pre-processing.
I have a pool of millions of images, a large percentage of which would have a very low probability of ever being viewed, and this is an appropriate solution here. It was originally designed as a parallel solution to make up for deficiencies in a poorly-architected preprocessor that sometimes "forgot" to process everything correctly, but it worked so well that it is now the only service providing images.
However, if your motivation revolves around avoiding the storage cost of the preprocessed results, this solution won't entirely solve that.

Returning images through AWS API Gateway

I'm trying to use AWS API Gateway as a proxy in front of an image service.
I'm able to get the image to come through but it gets displayed as a big chunk of ASCII because Content-Type is getting set to "application/json".
Is there a way to tell the gateway NOT to change the source Content-Type at all?
I just want "image/jpeg", "image/png", etc. to come through.
I was trying to format a string to be returned w/o quotes and discovered the Integration Response functionality. I haven't tried this fix myself, but something along these lines should work:
Go to the Method Execution page of your Resource,
click on Integration Response,
expand Method Response Status 200,
expand Mapping Templates,
click "application/json",
click the pencil next to Output Passthrough,
change "application/json" to "image/png"
Hope it works!
I apologize, in advance, for giving an answer that does not directly answer the question, and instead suggests you adopt a different approach... but based on the question and comments, and my own experience with what I believe to be a similar application, it seems like you may be using the wrong tool for the problem, or at least a tool that is not the optimal choice within the AWS ecosystem.
If your image service was running inside Amazon Lambda, the need for API Gateway would be more apparent. Absent that, I don't see it.
Amazon CloudFront provides fetching of content from a back-end server, caching of content (at over 50 "edge" locations globally), no charge for the storage of cached content, and lets you configure up to 100 distinct hostnames pointing to a single CloudFront distribution, in addition to the default xxxxxxxx.cloudfront.net hostname. It also supports SSL. This seems like what you are trying to do, and then some.
I use it, quite successfully for exactly the scenario you describe: "a proxy in front of an image service." Exactly what my image service and your image service do may be different (mine is a resizer that can look up the source URL of missing/never before requested images, fetch, and resize) but fundamentally it seems like we're accomplishing a similar purpose.
Curiously, the pricing structure of CloudFront in some regions (such as us-east-1 and us-west-2) is such that it's not only cost-effective; using CloudFront can in fact be almost $0.005 per gigabyte downloaded cheaper than not using it.
In my case, in addition to the back-end image service, I also have an S3 bucket with a single file in it, attached to a single path in the CloudFront distribution (as a second "custom origin"), for the sole purpose of serving up /robots.txt, to control direct access to my images by well-behaved crawlers. This allows the robots.txt file to be managed separately from the image service itself.
If this doesn't seem to address your need, feel free to comment and I will clarify or withdraw this answer.
@kjsc: we finally figured out how to get this working on an alternate question with base64-encoded image data, which you may find helpful in your solution:
AWS Gateway API base64Decode produces garbled binary?
To answer your question, to get the Content-Type to come through as a hard-coded value, you would first go into the method response screen and add a Content-Type header and whatever Content type you want.
Then you'd go into the Integration Response screen and set the Content type to your desired value (image/png in this example). Wrap 'image/png' in single quotes.
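The same two console steps can also be expressed as API calls; here is a sketch with the AWS SDK for JavaScript (v2), where the REST API id, resource id, and method are placeholders:

    // Sketch: hard-code the Content-Type that API Gateway returns.
    var AWS = require('aws-sdk');
    var apigateway = new AWS.APIGateway();

    // 1. Declare the Content-Type header on the method response.
    apigateway.putMethodResponse({
        restApiId: 'a1b2c3d4e5', resourceId: 'abc123',   // placeholders
        httpMethod: 'GET', statusCode: '200',
        responseParameters: { 'method.response.header.Content-Type': false }
    }, function (err) {
        if (err) return console.error(err);
        // 2. Map a static value to it in the integration response.
        //    Note the single quotes around the literal value.
        apigateway.putIntegrationResponse({
            restApiId: 'a1b2c3d4e5', resourceId: 'abc123',
            httpMethod: 'GET', statusCode: '200',
            responseParameters: { 'method.response.header.Content-Type': "'image/png'" }
        }, function (err2) {
            if (err2) console.error(err2);
        });
    });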

Push files up to Amazon Cloudfront: Possible?

I've been reading up about pull and push CDNs. I've been using Cloudfront as a pull CDN for resized images:
Receive image from client
Put image in S3
Later on, when a client makes a request to CloudFront for a URL, CloudFront does not have the image, so it has to forward the request to my server, which will:
Receive request
Pull image from S3
Resize image
Push image back to Cloudfront
However, this takes a few seconds, which is a really annoying wait when you first upload your beautiful image and want to see it. The delay appears to be mostly the downloading/re-uploading time, rather than the resizing, which is pretty fast.
Is it possible to pro-actively push the resized image to Cloudfront and attach it to a URL, such that future requests can immediately get the prepared image? Ideally I would like to
Receive image from client
Put image in S3
Resize image for common sizes
Pre-emptively push these sizes to cloudfront
This avoids the whole download/reupload cycle, making the common sizes really fast, but the less-common sizes can still be accessed (albeit with a delay the first time). However, to do this I'd need to push the images up to Cloudfront. This:
http://www.whoishostingthis.com/blog/2010/06/30/cdns-push-vs-pull/
seems to suggest it can be done, but everything else I've seen makes no mention of it. My question is: is it possible? Or are there any other solutions to this problem that I am missing?
We have tried similar things with different CDN providers, and for CloudFront I don't think there is any existing way for you to push (what we call pre-feeding) your specific contents to nodes/edges if the CloudFront distribution is using your custom origin.
One way I can think of, also as mentioned by @Xint0, is to set up another S3 bucket to specifically host those files you would like to push (in your case, those resized images). Basically you will have two CloudFront distributions: one pulling the rarely accessed files from your custom origin, and another serving the pre-pushed, frequently accessed files (including the images you expect to be resized) from that S3 bucket. This sounds a little bit complex, but I believe that's the tradeoff you have to make.
Another option I can recommend you look at is EdgeCast, another CDN provider. They provide a function called load_to_edge (which I spent quite a lot of time integrating with our service last month, which is why I remember it clearly) that does exactly what you expect. They also support custom origin pull, so maybe you can take a trial there.
The OP asks for a push CDN solution, but it sounds like he's really just trying to make things faster. I'm venturing that you probably don't really need to implement a CDN push; you just need to optimize your origin server pattern.
So, OP, I'm going to assume you're supporting at most a handful of image sizes--let's say 128x128, 256x256 and 512x512. It also sounds like you have your original versions of these images in S3.
This is what currently happens on a cache miss:
CDN receives request for a 128x128 version of an image
CDN does not have that image, so it requests it from your origin server
Your origin server receives the request
Your origin server downloads the original image from S3 (presumably a larger image)
Your origin resizes that image and returns it to the CDN
CDN returns that image to user and caches it
What you should be doing instead:
There are a few options here depending on your exact situation.
Here are some things you could fix quickly, with your current setup:
If you have to fetch your original images from S3, you're basically making it so that a cache miss results in every image taking as long to download as the original-sized image. If at all possible, you should try to stash those original images somewhere that your origin server can access quickly. There are a million different options here depending on your setup, but fetching them from S3 is about the slowest of all of them. At least you aren't using Glacier ;).
You aren't caching the resized images. That means that every edge node CloudFront uses is going to request this image, which triggers the whole resizing process. CloudFront may have hundreds of individual edge node servers, meaning hundreds of misses and resizes per image. Depending on what CloudFront does for tiered distribution, and how you set your file headers, it may not actually be that bad, but it won't be good.
I'm going out on a limb here, but I'm betting you aren't setting custom expiration headers, which means CloudFront is only caching each of these images for 24 hours. If your images are immutable once uploaded, you'd really benefit from returning expiration headers telling the CDN not to check for a new version for a long, long time.
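For example (a sketch only; the values are illustrative), the origin response just needs a long Cache-Control header:

    // Sketch: an origin response telling CloudFront to keep an image
    // for a year instead of re-checking after the 24-hour default.
    var http = require('http');

    http.createServer(function (req, res) {
        var imageBuffer = Buffer.alloc(0);   // stand-in for the real image bytes
        res.writeHead(200, {
            'Content-Type': 'image/jpeg',
            'Cache-Control': 'public, max-age=31536000'   // one year
        });
        res.end(imageBuffer);
    }).listen(8080);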
Here are a couple ideas for potentially better patterns:
When someone uploads a new image, immediately transcode it into all the sizes you support and upload those to S3. Then just point your CDN at that S3 bucket (a sketch of this pattern follows this list). This assumes you have a manageable number of supported image sizes. However, I would point out that if you support too many image sizes, a CDN may be the wrong solution altogether. Your cache hit rate may be so low that the CDN is really getting in the way. If that's the case, see the next point.
If you are supporting something like continuous resizing (ie, I could request image_57x157.jpg or image_315x715.jpg, etc and the server would return it) then your CDN may actually be doing you a disservice by introducing an extra hop without offloading much from your origin. In that case, I would probably spin up EC2 instances in all the available regions, install your origin server on them, and then swap image URLs to regionally appropriate origins based on client IP (effectively rolling your own CDN).
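Here is the transcode-on-upload sketch referenced above. It assumes the "sharp" image library and the AWS SDK for JavaScript (v2); the bucket name, size list, and naming scheme are placeholders:

    // Sketch: on upload, resize into every supported size and store in S3.
    var AWS = require('aws-sdk');
    var sharp = require('sharp');

    var s3 = new AWS.S3();
    var SIZES = [128, 256, 512];   // placeholder list of supported sizes

    function transcodeAndStore(originalBuffer, baseName) {
        return Promise.all(SIZES.map(function (size) {
            return sharp(originalBuffer)
                .resize(size, size)
                .toBuffer()
                .then(function (resized) {
                    return s3.putObject({
                        Bucket: 'my-image-bucket',   // placeholder bucket, CDN origin
                        Key: baseName + '_' + size + 'x' + size + '.jpg',
                        Body: resized,
                        ContentType: 'image/jpeg'
                    }).promise();
                });
        }));
    }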
And if you reeeeeally want to push to Cloudfront:
You probably don't need to, but if you simply must, here are a couple options:
Write a script that uses the webpagetest.org APIs to fetch your image from a variety of different places around the world. In a sense, you'd be pushing a pull command to all the different edge locations. This isn't guaranteed to populate every edge location, but you could probably get close. Note that I'm not sure how thrilled webpagetest.org would be about being used this way, but I don't see anything in their terms of use about it (IANAL).
If you don't want to use a third party or risk irking webpagetest.org, just spin up a micro EC2 instance in every region, and use those to fetch the content, same as in #1.
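The fetch script itself is trivial; here is a sketch of what you might run on each regional instance (the URLs are placeholders; CloudFront's x-cache response header shows whether the edge already had the object):

    // Sketch: warm nearby edge caches by requesting each URL from this region.
    var https = require('https');

    var urls = [
        'https://dxxxxxxxx.cloudfront.net/images/photo_128x128.jpg',  // placeholders
        'https://dxxxxxxxx.cloudfront.net/images/photo_256x256.jpg'
    ];

    urls.forEach(function (url) {
        https.get(url, function (res) {
            res.resume();  // drain the body; we only care about warming the cache
            console.log(res.statusCode, res.headers['x-cache'], url);
        });
    });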
AFAIK CloudFront uses S3 buckets as the datastore. So, after resizing the images, you should be able to save the resized images directly to the S3 bucket used by CloudFront.