I want to use a single AWS CloudFront CDN distribution to serve both production and development content with different behaviors for each.
I have production and development content in one bucket.
bucket
|-- root
    |-- p
    |   |-- demo.html
    |-- d
        |-- v1
            |-- demo.html
The aim is to serve the production files from the root and the dev files from a path with directories.
my.domain.com/demo.html and my.domain.com/d/v1/demo.html
I have two origins:
1 - Origin: wm-prod. Origin domain: wm-bucket. Origin path /root/p
2 - Origin: wm-dev. Origin domain: wm-bucket. Origin path /root/d
and two behaviors:
1 - Behavior precedence: 1. Path pattern Default (*). Origin wm-prod
2 - Behavior precedence: 0. Path pattern /d. Origin wm-dev
I tried /d/*, d/, d/*, and even simply d as the path pattern. I've also tried using /dev so the origin path and url path pattern are different. No luck in either case.
Expected behavior is that when someone visits the root (my.domain.com/demo.html) the production file is loaded thanks to the first origin path of /root/p. This works as expected. Expected behavior also includes visitors who visit a URL with the path pattern that matches /d (my.domain.com/d/v1/demo.html) seeing content from the second origin whose origin path is /root/d. This is not working as expected.
In all cases in which it's "not working" the standard S3 message for an unfound file is returned. "This XML file does not appear...."
A workaround is to skip the multiple behaviors and simply include /p in the URL of the production content (my.domain.com/p/demo.html). This is what I have in place for now, but I would prefer to serve production from the root path.
Is it possible to achieve what I'm seeking?
Thank you!
Related
I am currently trying to create a CloudFront distribution with an Application Load Balancer (ALB) as the origin.
I have set up an Apache server on both of my EC2 instances (which are linked to the ALB) and created two files in separate directories:
/var/www/html/cache/index.html
/var/www/html/no_cache/index.html
Each index.html states which instance it is on and whether it belongs to the caching or non-caching setup.
After that, I tried to set up CloudFront to cache the files with path pattern /cache/ (using a managed caching policy) and not to cache the files with path pattern /no_cache/. The default behavior is also set up for non-caching.
After many trials, I figured out that CloudFront is not caching anything. The file at /cache/index.html changes instantly when I edit it and refresh the page.
I checked the logs by querying them from Athena:
Here are the results:
As you can see, I always get "Miss" in the result_type and x_edge_detailed_result_type columns. From the official AWS docs, the interpretation of Miss looks like this:
Miss – The request could not be satisfied by an object in the cache, so the server forwarded the request to the origin server and returned the result to the viewer.
Could someone tell me more about the problem? I am really confused.
I have a cloudfront distribution which I'm using with a single S3 bucket as the origin. The bucket has both private and public data in it, segregated by folders - public_folder_1, public_folder_2, private_folder_1, private_folder_2. I want to use cloudfront to serve only the content from the 2 public folders. I want requests to xxx.cloudfront.net/public_folder_1/file1 to go to public_folder_1 and requests to xxx.cloudfront.net/public_folder_2/file1 to go to public_folder_2.
I have created 2 origins within the distro with origin name + paths - mybucket/public_folder_1 and mybucket/public_folder_2 for the 2 folders. I have also created two behaviours with path patterns - public_folder_1/* and public_folder_2/* (I have tried adding a leading / to the path pattern, it doesn't seem to make a difference). But I can't access files via cloudfront.
If I change the path pattern for either behaviour to * instead of public_folder_x/* then I can access files using xxx.cloudfront.net/filex . My concern with this is that if I have 2 files with the same name in both folders how would cloudfront know which folder to use as the origin? I don't want to have to create and manage a separate distro for each origin path.
In my case, I wasn't understanding that when CloudFront finds a matching behavior, it forwards the entire URL path to S3. If a matching directory structure isn't represented in your S3 bucket subdirectory, you'll get a 403 response. For example, if you have a behavior matching on /developer/*, and the url is {hostname}/developer/thing.html, the S3 bucket directory which represents your origin must have a root-level subdirectory named "developer" that contains thing.html.
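As a rough sketch of that mapping, assuming CloudFront builds the S3 object key by prepending the origin path to the full, unmodified request path (the helper name originObjectKey is hypothetical; the paths are the ones from the first question above):

```javascript
// Hypothetical helper: models how the origin path and the viewer's request
// path combine into the key requested from S3. Nothing is stripped from the
// request path; the origin path is only prepended.
function originObjectKey(originPath, requestPath) {
  // Drop any trailing slash on the origin path so we never produce "//".
  const prefix = originPath.replace(/\/+$/, '');
  return prefix + requestPath;
}

// Default behavior, origin path /root/p
console.log(originObjectKey('/root/p', '/demo.html'));      // /root/p/demo.html
// Behavior /d/*, origin path /root/d: note the doubled "d"
console.log(originObjectKey('/root/d', '/d/v1/demo.html')); // /root/d/d/v1/demo.html
// Behavior /d/*, origin path /root: lines up with the bucket layout
console.log(originObjectKey('/root', '/d/v1/demo.html'));   // /root/d/v1/demo.html
```

Under that assumption, pointing the /d/* behavior at an origin whose path is /root (rather than /root/d) would match the bucket layout in the first question.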
My concern with this is that if I have 2 files with the same name in both folders how would cloudfront know which folder to use as the origin?
You'd have to find some way to distinguish the files by a matching pattern. I would suggest adding uniquely named folders to each origin, then creating behaviors to divert traffic based on the path provided:
origin1: mybucket/stuff
origin2: mybucket/things
behavior1: /stuff/* --> origin1
behavior2: /things/* --> origin2
cloudfront-url/stuff/file.html --> returns file.html in the stuff directory, aka origin1.
cloudfront-url/things/file.html --> returns file.html in the things directory, aka origin2.
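The routing above can be modeled roughly as follows. This is a simplified sketch, not CloudFront's actual matcher: it assumes behaviors are checked in precedence order with the first match winning, it only handles the * wildcard, and the names patternToRegExp, pickOrigin and 'default-origin' are made up for illustration.

```javascript
// Convert a path pattern such as "/stuff/*" into an anchored RegExp.
// Only "*" is treated as a wildcard in this sketch.
function patternToRegExp(pattern) {
  // Escape regex metacharacters (everything except "*"), then expand "*".
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\*/g, '.*') + '$');
}

// Return the origin of the first behavior whose pattern matches the
// request path, or the default origin when none match.
function pickOrigin(behaviors, defaultOrigin, requestPath) {
  for (const { pattern, origin } of behaviors) {
    if (patternToRegExp(pattern).test(requestPath)) {
      return origin;
    }
  }
  return defaultOrigin;
}

const behaviors = [
  { pattern: '/stuff/*', origin: 'origin1' },
  { pattern: '/things/*', origin: 'origin2' },
];

console.log(pickOrigin(behaviors, 'default-origin', '/stuff/file.html'));  // origin1
console.log(pickOrigin(behaviors, 'default-origin', '/things/file.html')); // origin2
console.log(pickOrigin(behaviors, 'default-origin', '/file.html'));        // default-origin
```

Note that the sketch only chooses an origin; the request path itself (including /stuff or /things) is still forwarded unchanged.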
Edit:
I just created a single distro with the following two origins...
With the following behaviors...
And the following bucket structure...
And here are the results:
https://d103l0p0xb29zj.cloudfront.net/file.html > matches default behavior, so returns file.html from S3-stackoverflow-test-bucket/public_folder_1.
https://d103l0p0xb29zj.cloudfront.net/pub1/file.html > matches first behavior, returns file.html from S3-stackoverflow-test-bucket/public_folder_1/pub1
https://d103l0p0xb29zj.cloudfront.net/pub2/file.html > matches second behavior, returns file.html from S3-stackoverflow-test-bucket/public_folder_2/pub2
Note that S3-stackoverflow-test-bucket/public_folder_2/file.html is not accessible at all based on the current rules.
I'll leave the distro up for a few days so you can click the links.
I have two S3 buckets that are serving as my Cloudfront origin servers:
example-bucket-1
example-bucket-2
The contents of both buckets live in the root of those buckets. I am trying to configure my Cloudfront distribution to route or rewrite based on a URL pattern. For example, with these files
example-bucket-1/something.jpg
example-bucket-2/something-else.jpg
I would like to make these URLs point to the respective files
http://example.cloudfront.net/path1/something.jpg
http://example.cloudfront.net/path2/something-else.jpg
I tried setting up cache behaviors that match the path1 and path2 patterns, but it doesn't work. Do the patterns have to actually exist in the S3 bucket?
Update: the original answer, shown below, was accurate when written in 2015, and is still correct with respect to the built-in behavior of CloudFront itself. Originally, the entire request path needed to exist at the origin.
If the URI is /download/images/cat.png but the origin expects only /images/cat.png then the CloudFront Cache Behavior /download/* will not do what you might assume -- the cache behavior's path pattern is only for matching -- the matched prefix isn't removed.
By itself, CloudFront doesn't provide a way to remove elements from the path requested by the browser when sending the request to the origin. The request is always forwarded as it was received, or with extra characters at the beginning, if the origin path is specified.
However, the introduction of Lambda@Edge in 2017 changes the dynamic.
Lambda@Edge allows you to declare trigger hooks in the CloudFront flow and write small JavaScript functions that inspect and can modify the incoming request, either before the CloudFront cache is checked (viewer request), or after the cache is checked (origin request). This allows you to rewrite the path in the request URI. You could, for example, transform a request path from the browser of /download/images/cat.png to remove /download, resulting in a request being sent to S3 (or a custom origin) for /images/cat.png.
This option does not modify which Cache Behavior will actually service the request, because this is always based on the path as requested by the browser -- but you can then modify the path in-flight so that the actual requested object is at a path other than the one requested by the browser. When used in an Origin Request trigger, the response is cached under the path requested by the browser, so subsequent requests don't need to be rewritten -- they can be served from the cache -- and the trigger won't need to fire for every request.
Lambda@Edge functions can be quite simple to implement. Here's an example function that would remove the first path element, whatever it may be.
    'use strict';
    // Lambda@Edge Origin Request trigger to remove the first path element
    // compatible with either Node.js 6.10 or 8.10 Lambda runtime environment
    exports.handler = (event, context, callback) => {
        const request = event.Records[0].cf.request; // extract the request object
        request.uri = request.uri.replace(/^\/[^\/]+\//,'/'); // modify the URI
        return callback(null, request); // return control to CloudFront
    };
That's it. In .replace(/^\/[^\/]+\//,'/'), we're matching the URI against a regular expression that matches the leading / followed by 1 or more characters that must not be /, and then one more /, and replacing the entire match with a single / -- so the path is rewritten from /abc/def/ghi/... to /def/ghi/... regardless of the exact value of abc. This could be made more complex to suit specific requirements without any notable increase in execution time... but remember that a Lambda@Edge function is tied to one or more Cache Behaviors, so you don't need a single function to handle all requests going through the distribution -- just the requests matched by the associated cache behavior's path pattern.
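To see what that regular expression does to a few sample paths, here is the same replacement pulled out into a plain function that can be run outside Lambda (the name stripFirstSegment is made up for this example):

```javascript
// Same replacement as in the handler above: remove the first path
// element, i.e. "/<anything-without-a-slash>/" becomes "/".
function stripFirstSegment(uri) {
  return uri.replace(/^\/[^\/]+\//, '/');
}

console.log(stripFirstSegment('/abc/def/ghi'));             // /def/ghi
console.log(stripFirstSegment('/download/images/cat.png')); // /images/cat.png
// No second slash, so the pattern does not match and the URI is unchanged.
console.log(stripFirstSegment('/cat.png'));                 // /cat.png
console.log(stripFirstSegment('/abc/'));                    // /
```

A URI with only one path element is passed through untouched, which may or may not be what you want for requests such as /download with no trailing slash.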
To simply prepend a prefix onto the request from the browser, the Origin Path setting can still be used, as noted below, but to remove or modify path components requires Lambda@Edge, as above.
Original answer.
Yes, the patterns have to exist at the origin.
CloudFront, natively, can prepend to the path for a given origin, but it does not currently have the capability of removing elements of the path (without Lambda@Edge, as noted above).
If your files were in /secret/files/ at the origin, you could have the path pattern /files/* transformed before sending the request to the origin by setting the "origin path."
The opposite isn't true. If the files were in /files at the origin, there is not a built-in way to serve those files from path pattern /download/files/*.
You can add (prefix) but not take away.
A relatively simple workaround would be a reverse proxy server on an EC2 instance in the same region as the S3 bucket, pointing CloudFront to the proxy and the proxy to S3. The proxy would rewrite the HTTP request on its way to S3 and stream the resulting response back to CloudFront. I use a setup like this and it has never disappointed me with its performance. (The reverse proxy software I developed can actually check multiple buckets in parallel or series and return the first non-error response it receives, to CloudFront and the requester).
Or, if using the S3 Website Endpoints as the custom origins, you could use S3 redirect routing rules to return a redirect to CloudFront, sending the browser back with the unhandled prefix removed. This would mean an extra request for each object, increasing latency and cost somewhat, but S3 redirect rules can be set to fire only when the request doesn't actually match a file in the bucket. This is useful for transitioning from one hierarchical structure to another.
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/distribution-web-values-specify.html
http://docs.aws.amazon.com/AmazonS3/latest/dev/HowDoIWebsiteConfiguration.html
When I use the method generate_presigned_url -- the resulting url contains my bucket name and path to the object and I am quite paranoid of exposing them!
Is there a way to generate a temp url that will not have the path and bucket name ?
The short answer to my question is No!
Discussion with boto: https://github.com/boto/boto3/issues/1591
The workaround I followed is to move the images to a new, randomly named folder and generate the URL from there. After the URL expires, delete the randomly named folder. Downside: the time spent copying the images. Space is not an issue if the URL expiry is measured in seconds.