Case:
I have a few S3 buckets that store my media files. I wanted to map all these s3 buckets to a single CF distribution. (their files should be accessed from different paths).
I have made a CF distribution and added 2 buckets. For the behaviour, the first bucket is on Default(*) and the second bucket is on path nature/*.
Issues:
I am able to access the primary bucket (one with default behaviour) but not able to access the secondary bucket (with path nature/*). The error on accessing the secondary bucket is "Access Denied".
Additional details:
Both my buckets are not available to global access and CF is accessing them from OAI.
References:
https://aswinkumar4018.medium.com/amazon-cloudfront-with-multiple-origin-s3-buckets-71b9e6f8936
https://vimalpaliwal.com/blog/2018/10/10f435c29f/serving-multiple-s3-buckets-via-single-aws-cloudfront-distribution.html
https://youtu.be/5r9Q-tI7mMw
Your files in the second bucket must start with a top level prefix nature/ or else the won't resolve. CF doesn't remove the path matched when routing, it is still there. The CloudFront behavior will correctly match and send the request to the second bucket, but the path will still be nature/....
If you can't move the objects in the second bucket into a nature/ prefix, then you need a CF Function to remove this part of the path from the object key before forwarding the request to the S3 origin.
Moving all the objects include the nature prefix is easy but annoying. It is the best strategy because it is also cheapest (from a monetary standpoint), but may require extra overhead on your side. A CF function is easier but costly both from a money standpoint and performance, since the CF function has to be run every time.
An example CF function might be:
function handler(event) {
var request = event.request;
request.uri = request.uri.replace(/^\/nature/, '');
return request;
}
CloudFront doesn't natively support this.
The original request path is forwarded intact to the origin server, with only one exception: if the origin has an Origin Path configured, that value is added to the beginning of the path before the request is sent to the origin (and, of course, this doesn't help, here).
However, we can modify the path during CloudFront processing using a CF-Function
The function code to remove the first level from the path might look something like this:
function handler(event) {
var request = event.request;
request.uri = request.uri.replace(/^\/[^/]*\//, "/");
return request;
}
Related
I'm using an AWS Lambda function to create a file and save it to my bucket on S3, it is working fine. After executing the putObject method, I get a data object, but it only contains an Etag of the recently added object.
s3.putObject(params, function(err, data) {
// data only contains Etag
});
I need to know the exact URL that I can use in a browser so the client can see the file. The folder has been already made public and I can see the file if I copy the Link from the S3 console.
I tried using getSignedUrl but the URL it returns is used for other purposes, I believe.
Thanks!
The SDKs do not generally contain a convenience method to create a URL for publicly-readable objects. However, when you called PutObject, you provided the bucket and the object's key and that's all you need. You can simply combine those to make the URL of the object, for example:
https://bucket.s3.amazonaws.com/key
So, for example, if your bucket is pablo and the object key is dogs/toto.png, use:
https://pablo.s3.amazonaws.com/dogs/toto.png
Note that S3 keys do not begin with a / prefix. A key is of the form dogs/toto.png, and not /dogs/toto.png.
For region-specific buckets, see Working with Amazon S3 Buckets and AWS S3 URL Styles. Replace s3 with s3.<region>.amazonaws.com or s3-<region>.amazonaws.com in the above URLs, for example:
https://seamus.s3.eu-west-1.amazonaws.com/dogs/setter.png (with dot)
https://seamus.s3-eu-west-1.amazonaws.com/dogs/setter.png (with dash)
If you are using IPv6, then the general URL form will be:
https://BUCKET.s3.dualstack.REGION.amazonaws.com
For some buckets, you may use the older path-style URLs. Path-style URLs are deprecated and only work with buckets created on or before September 30, 2020. They are used like this:
https://s3.amazonaws.com/bucket/key
https://s3.amazonaws.com/pablo/dogs/toto.png
https://s3.eu-west-1.amazonaws.com/seamus/dogs/setter.png
https://s3.dualstack.REGION.amazonaws.com/BUCKET
Currently there are TLS and SSL certificate issues that may require some buckets with dots (.) in their name to be accessed via path-style URLs. AWS plans to address this. See the AWS announcement.
Note: General guidance on object keys where certain characters need special handling. For example space is encoded to + (plus sign) and plus sign is encoded to %2B. Also here.
in case you got the s3bucket and filename objects and want to extract the url, here is an option:
function getUrlFromBucket(s3Bucket,fileName){
const {config :{params,region}} = s3Bucket;
const regionString = region.includes('us-east-1') ?'':('-' + region)
return `https://${params.Bucket}.s3${regionString}.amazonaws.com/${fileName}`
};
You can do an another call with this:
var params = {Bucket: 'bucket', Key: 'key'};
s3.getSignedUrl('putObject', params, function (err, url) {
console.log('The URL is', url);
});
I have setup a lambda#edge function that dynamically decides what bucket (origin) to use to get the s3 object from (uri..).
I want to make sure that what I am doing does not defeat the whole purpose of using CloudFront in the first place, meaning, allowing origins contents (s3 buckets) to be cached at edges so that users can get what they request fast from the closest edge location to them.
I am following this flow:
'use strict';
const querystring = require('querystring');
exports.handler = (event, context, callback) => {
const request = event.Records[0].cf.request;
/**
* Reads query string to check if S3 origin should be used, and
* if true, sets S3 origin properties.
*/
const params = querystring.parse(request.querystring);
if (params['useS3Origin']) {
if (params['useS3Origin'] === 'true') {
const s3DomainName = 'my-bucket.s3.amazonaws.com';
/* Set S3 origin fields */
request.origin = {
s3: {
domainName: s3DomainName,
region: '',
authMethod: 'none',
path: '',
customHeaders: {}
}
};
request.headers['host'] = [{ key: 'host', value: s3DomainName}];
}
}
callback(null, request);
};
That is just an example of what I am doing. ( found from https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/lambda-examples.html#lambda-examples-content-based-S3-origin-request-trigger )
I have setup an Origin-Request trigger that runs my lambda#adge function which will decide what bucket to use as origin.
When the lambda function sets the bucket (s3DomainName) as it gets run in the Origin-Request trigger, as long as that bucket is setup in CloudFront as one of the origin already, will CloudFront be able to serve the file down from cache still? Or am I bypassing the whole point of using CloudFront by setting the origin bucket like shown in the cod example?
I want to make sure that what I am doing does not defeat the whole purpose of using CloudFront
It doesn't defeat the purpose. This works as intended and the response would still be cached using the same rules that would be used if this bucket were a statically configured origin and no trigger was in place.
Origin Request triggers only fire when there's a cache miss -- after the cache is checked and the requested resource is not there, and CloudFront concludes that the request needs to be forwarded to an origin server. The Origin Request trigger causes CloudFront to briefly suspend its processing while your trigger code evaluates and possibly modifies the request before returning control to CloudFront. Changing/setting the origin server in the trigger modifies the destination where CloudFront will then send the request, but it doesn't have any impact on whether the response can/will be cached or otherwise change what CloudFront would do when the response arrives from the origin server. And, if a suitable response is already available in the cache when a new request arrives, the trigger won't fire at all, since there's nothing for it to do.
as long as that bucket is setup in CloudFront as one of the origin already
This isn't actually required. Any S3 bucket with publicly-accessible content can be specified as the origin server in an Origin Request trigger, even if it isn't one of the origins configured on the distribution.
There are other rules and complexities that come into play if a trigger like this is desired and your buckets require authentication and are using KMS-based object encryption, but in those cases, there would still be no impact of this design on caching -- just some extra things you need to do to actually make authentication work between CloudFront and the bucket, since an Origin Access Identity (normally used to allow CloudFront to authenticate itself to S3) isn't sufficient for this use case.
It's not directly related to the Origin Request trigger and the gist of this question, but noteworthy since you are using request.querystring: be sure you understand the concept of the cache key, which is the combination of request attributes that CloudFront will use when determining whether a future request can be served using a given cached object. If two similar requests result in different cache keys, then CloudFront won't treat them as identical requests and the response for one cannot be used as a cached response for the other, even though your expectation might have been otherwise, since the two request are for the same object. An Origin Request trigger can't change the cache key for a request, either accidentally or deliberately, since it's already been calculated by CloudFront by the time the trigger fires.
meaning, allowing origins contents (s3 buckets) to be cached at edges so that users can get what they request fast from the closest edge location to them
Ah. Having written all the above, it occurs to me now that perhaps your question might actually stem from an assumption I've seen people occasionally make about how CloudFront and S3 interact. It's an assumption that is understandable, but turns out to be inaccurate. When a bucket is configured as a CloudFront origin, there is no mechanism that pushes the bucket's content out to the CloudFront edges proactively. If you were working from a belief that this is something that happens when a bucket is configured as a CloudFront S3 origin, then this question takes on a different significance, since you're essentially asking whether the content from those various buckets would still arrive at CloudFront as expected... but CloudFront is what I call a "pull-through CDN." Objects are stored in the cache at a given edge only when someone requests them through that edge -- they're cached as they are fetched as a result of that initial request, not before. So your setup simply changes where CloudFront goes to fetch a given object. The rest of the system works the same as always, with CloudFront pulling and caching the content on demand.
I have two S3 buckets that are serving as my Cloudfront origin servers:
example-bucket-1
example-bucket-2
The contents of both buckets live in the root of those buckets. I am trying to configure my Cloudfront distribution to route or rewrite based on a URL pattern. For example, with these files
example-bucket-1/something.jpg
example-bucket-2/something-else.jpg
I would like to make these URLs point to the respective files
http://example.cloudfront.net/path1/something.jpg
http://example.cloudfront.net/path2/something-else.jpg
I tried setting up cache behaviors that match the path1 and path2 patterns, but it doesn't work. Do the patterns have to actually exist in the S3 bucket?
Update: the original answer, shown below, is was accurate when written in 2015, and is correct based on the built-in behavior of CloudFront itself. Originally, the entire request path needed to exist at the origin.
If the URI is /download/images/cat.png but the origin expects only /images/cat.png then the CloudFront Cache Behavior /download/* will not do what you might assume -- the cache behavior's path pattern is only for matching -- the matched prefix isn't removed.
By itself, CloudFront doesn't provide a way to remove elements from the path requested by the browser when sending the request to the origin. The request is always forwarded as it was received, or with extra characters at the beginning, if the origin path is specified.
However, the introduction of Lambda#Edge in 2017 changes the dynamic.
Lambda#Edge allows you to declare trigger hooks in the CloudFront flow and write small Javascript functions that inspect and can modify the incoming request, either before the CloudFront cache is checked (viewer request), or after the cache is checked (origin request). This allows you to rewrite the path in the request URI. You could, for example, transform a request path from the browser of /download/images/cat.png to remove /download, resulting in a request being sent to S3 (or a custom orgin) for /images/cat.png.
This option does not modify which Cache Behavior will actually service the request, because this is always based on the path as requested by the browser -- but you can then modify the path in-flight so that the actual requested object is at a path other than the one requested by the browser. When used in an Origin Request trigger, the response is cached under the path requested by the browser, so subsequent responses don't need to be rewritten -- they can be served from the cache -- and the trigger won't need to fire for every request.
Lambda#Edge functions can be quite simple to implement. Here's an example function that would remove the first path element, whatever it may be.
'use strict';
// lambda#edge Origin Request trigger to remove the first path element
// compatible with either Node.js 6.10 or 8.10 Lambda runtime environment
exports.handler = (event, context, callback) => {
const request = event.Records[0].cf.request; // extract the request object
request.uri = request.uri.replace(/^\/[^\/]+\//,'/'); // modify the URI
return callback(null, request); // return control to CloudFront
};
That's it. In .replace(/^\/[^\/]+\//,'/'), we're matching the URI against a regular expression that matches the leading / followed by 1 or more characters that must not be /, and then one more /, and replacing the entire match with a single / -- so the path is rewritten from /abc/def/ghi/... to /def/ghi/... regardless of the exact value of abc. This could be made more complex to suit specific requirements without any notable increase in execution time... but remember that a Lambda#Edge function is tied to one or more Cache Behaviors, so you don't need a single function to handle all requests going through the distribution -- just the request matched by the associated cache behavior's path pattern.
To simply prepend a prefix onto the request from the browser, the Origin Path setting can still be used, as noted below, but to remove or modify path components requires Lambda#Edge, as above.
Original answer.
Yes, the patterns have to exist at the origin.
CloudFront, natively, can prepend to the path for a given origin, but it does not currently have the capability of removing elements of the path (without Lambda#Edge, as noted above).
If your files were in /secret/files/ at the origin, you could have the path pattern /files/* transformed before sending the request to the origin by setting the "origin path."
The opposite isn't true. If the files were in /files at the origin, there is not a built-in way to serve those files from path pattern /download/files/*.
You can add (prefix) but not take away.
A relatively simple workaround would be a reverse proxy server on an EC2 instance in the same region as the S3 bucket, pointing CloudFront to the proxy and the proxy to S3. The proxy would rewrite the HTTP request on its way to S3 and stream the resulting response back to CloudFront. I use a setup like this and it has never disappointed me with its performance. (The reverse proxy software I developed can actually check multiple buckets in parallel or series and return the first non-error response it receives, to CloudFront and the requester).
Or, if using the S3 Website Endpoints as the custom origins, you could use S3 redirect routing rules to return a redirect to CloudFront, sending the browser back with the unhandled prefix removed. This would mean an extra request for each object, increasing latency and cost somewhat, but S3 redirect rules can be set to fire only when the request doesn't actually match a file in the bucket. This is useful for transitioning from one hierarchical structure to another.
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/distribution-web-values-specify.html
http://docs.aws.amazon.com/AmazonS3/latest/dev/HowDoIWebsiteConfiguration.html
I've been setting up aws lambda functions for S3 events. I want to set up a new structure for my bucket, but it's not possible--so I set up a new bucket the way I want and will migrate old things and send new things there. I wanted to have some of the structure the same under a given base folder name old-bucket/images and new-bucket/images. I set up CloudFront to serve from old-bucket/images now, but I wanted to add new-bucket/images as well. I thought the behavior tab would set it such that it would check the new-bucket/images first then old-bucket/images. Alas, that didn't work. If the object wasn't found in the first, that was the end of the line.
Am I misunderstanding how behaviors work? Has anyone attempted anything like this?
That is expected behavior. An origin tells Amazon CloudFront where to obtain the data to serve to users, based upon a prefix, suffix, etc.
For example, you could serve old-bucket/* from one Amazon S3 bucket, while serving new-bucket/* from a different bucket.
However, there is no capability to 'fall-back' to a different origin if a file is not found.
You could check for the existence of files before serving the link, and then provide a different link depending upon where the files are stored. Otherwise, you'll need to put all of your files in the location that matches the link you are serving.
I have two S3 buckets that are serving as my Cloudfront origin servers:
example-bucket-1
example-bucket-2
The contents of both buckets live in the root of those buckets. I am trying to configure my Cloudfront distribution to route or rewrite based on a URL pattern. For example, with these files
example-bucket-1/something.jpg
example-bucket-2/something-else.jpg
I would like to make these URLs point to the respective files
http://example.cloudfront.net/path1/something.jpg
http://example.cloudfront.net/path2/something-else.jpg
I tried setting up cache behaviors that match the path1 and path2 patterns, but it doesn't work. Do the patterns have to actually exist in the S3 bucket?
Update: the original answer, shown below, is was accurate when written in 2015, and is correct based on the built-in behavior of CloudFront itself. Originally, the entire request path needed to exist at the origin.
If the URI is /download/images/cat.png but the origin expects only /images/cat.png then the CloudFront Cache Behavior /download/* will not do what you might assume -- the cache behavior's path pattern is only for matching -- the matched prefix isn't removed.
By itself, CloudFront doesn't provide a way to remove elements from the path requested by the browser when sending the request to the origin. The request is always forwarded as it was received, or with extra characters at the beginning, if the origin path is specified.
However, the introduction of Lambda#Edge in 2017 changes the dynamic.
Lambda#Edge allows you to declare trigger hooks in the CloudFront flow and write small Javascript functions that inspect and can modify the incoming request, either before the CloudFront cache is checked (viewer request), or after the cache is checked (origin request). This allows you to rewrite the path in the request URI. You could, for example, transform a request path from the browser of /download/images/cat.png to remove /download, resulting in a request being sent to S3 (or a custom orgin) for /images/cat.png.
This option does not modify which Cache Behavior will actually service the request, because this is always based on the path as requested by the browser -- but you can then modify the path in-flight so that the actual requested object is at a path other than the one requested by the browser. When used in an Origin Request trigger, the response is cached under the path requested by the browser, so subsequent responses don't need to be rewritten -- they can be served from the cache -- and the trigger won't need to fire for every request.
Lambda#Edge functions can be quite simple to implement. Here's an example function that would remove the first path element, whatever it may be.
'use strict';
// lambda#edge Origin Request trigger to remove the first path element
// compatible with either Node.js 6.10 or 8.10 Lambda runtime environment
exports.handler = (event, context, callback) => {
const request = event.Records[0].cf.request; // extract the request object
request.uri = request.uri.replace(/^\/[^\/]+\//,'/'); // modify the URI
return callback(null, request); // return control to CloudFront
};
That's it. In .replace(/^\/[^\/]+\//,'/'), we're matching the URI against a regular expression that matches the leading / followed by 1 or more characters that must not be /, and then one more /, and replacing the entire match with a single / -- so the path is rewritten from /abc/def/ghi/... to /def/ghi/... regardless of the exact value of abc. This could be made more complex to suit specific requirements without any notable increase in execution time... but remember that a Lambda#Edge function is tied to one or more Cache Behaviors, so you don't need a single function to handle all requests going through the distribution -- just the request matched by the associated cache behavior's path pattern.
To simply prepend a prefix onto the request from the browser, the Origin Path setting can still be used, as noted below, but to remove or modify path components requires Lambda#Edge, as above.
Original answer.
Yes, the patterns have to exist at the origin.
CloudFront, natively, can prepend to the path for a given origin, but it does not currently have the capability of removing elements of the path (without Lambda#Edge, as noted above).
If your files were in /secret/files/ at the origin, you could have the path pattern /files/* transformed before sending the request to the origin by setting the "origin path."
The opposite isn't true. If the files were in /files at the origin, there is not a built-in way to serve those files from path pattern /download/files/*.
You can add (prefix) but not take away.
A relatively simple workaround would be a reverse proxy server on an EC2 instance in the same region as the S3 bucket, pointing CloudFront to the proxy and the proxy to S3. The proxy would rewrite the HTTP request on its way to S3 and stream the resulting response back to CloudFront. I use a setup like this and it has never disappointed me with its performance. (The reverse proxy software I developed can actually check multiple buckets in parallel or series and return the first non-error response it receives, to CloudFront and the requester).
Or, if using the S3 Website Endpoints as the custom origins, you could use S3 redirect routing rules to return a redirect to CloudFront, sending the browser back with the unhandled prefix removed. This would mean an extra request for each object, increasing latency and cost somewhat, but S3 redirect rules can be set to fire only when the request doesn't actually match a file in the bucket. This is useful for transitioning from one hierarchical structure to another.
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/distribution-web-values-specify.html
http://docs.aws.amazon.com/AmazonS3/latest/dev/HowDoIWebsiteConfiguration.html