What CloudFront/S3 doing with HTML/CSS/JS files?

What CloudFront/S3 doing with HTML/CSS/JS files? - amazon-web-services

I uploaded to S3 below files:
EntryPoint__fd6b122d5ca60cd57ec5.js
index.html
Main.css
But server returns below content:
Especially the gray one is strange: it does not seems like file.
It's name (filename?) is ID from URN:
https://XXXX.com/product/53483ca1-9fd1-4970-841d-e9fbeadd4660
But when I checked the content of EntryPoint__fd6b122d5ca60cd57ec5.js, Main.css, I saw the same HTML code as in picture above (by other words, content of 53483ca1-9fd1-4970-841d-e9fbeadd4660, Main.css and EntryPoint__fd6b122d5ca60cd57ec5.js is even).
I have error:
Uncaught SyntaxError: Unexpected token '<'
EntryPoint__fd6b122d5ca60cd57ec5.js:1
To solve this error, first I need to understand what CloudFront/S3 did with my files. What is gray one? Where it's name has been set?
Anyway, it did something wrong: EntryPoint__fd6b122d5ca60cd57ec5.js has HTML content, and certainly
JavaScript can not parse it.
Update or request: deploying to S3 task
const applicationDeployment = ({
targetFilesGlobSelections,
targetIsFunctionalTesting = false
}) =>
Gulp.src(targetFilesGlobSelections)
.pipe(GulpPlugins.plumber({
errorHandler: (error) => {
console.error("Task: 'DeployApplication', error occurred:");
console.error(error);
NodeNotifier.notify({
title: "Task: 'DeployApplication', error occurred:",
message: error.message
});
}
}))
.pipe(GulpPlugins.s3(
targetIsFunctionalTesting ? AMAZON_S3_DEPLOYMENT_CONFIG__FUNCTIONAL_STAGING : AMAZON_S3_DEPLOYMENT_CONFIG__PRODUCTION
));
Gulp.task("Deployment to production", () => applicationDeployment({
targetFilesGlobSelections: `${public}/**/**`
}));

The viewer requests the website at www.example.com.
If the requested object is cached, CloudFront returns the object from
its cache to the viewer.
If the object is not in CloudFront’s cache, CloudFront requests the
object from the origin (an S3 bucket).
S3 returns the object to CloudFront, which triggers the Lambda#Edge
origin response event.
The object, including the security headers added by the Lambda#Edge
function, is added to CloudFront’s cache.
(Not shown) The objects is returned to the viewer. Subsequent
requests for the object that come to the same CloudFront edge
location are served from the CloudFront cache.
As mentioned in 5th point, let elaborate the Lambda#Edge more! There are many uses for Lambda#Edge processing. For example:
A Lambda function can inspect cookies and rewrite URLs so that users
see different versions of a site for A/B testing.
CloudFront can return different objects to viewers based on the
device they're using by checking the User-Agent header, which
includes information about the devices. For example, CloudFront can
return different images based on the screen size of their device.
Similarly, the function could consider the value of the Referer
header and cause CloudFront to return the images to bots that have
the lowest available resolution.
Or you could check cookies for other criteria. For example, on a
retail website that sells clothing, if you use cookies to indicate
which color a user chose for a jacket, a Lambda function can change
the request so that CloudFront returns the image of a jacket in the
selected color.
A Lambda function can generate HTTP responses when CloudFront viewer
request or origin request events occur.
A function can inspect headers or authorization tokens, and insert a
header to control access to your content before CloudFront forwards
the request to your origin.
A Lambda function can also make network calls to external resources
to confirm user credentials, or fetch additional content to customize
a response.
I hope it can help you to understand what happened to your files. By the in inspect grey files mean HTML, orange/yellow mean javascript or .js file and blue mean css file.
Following is the example of my files!

Related

Cloudfront Edge functions

I'm trying to play Instagram Video assets. The challenge is the videos are expirable. They expire every N mins.
I'm brainstorming a solution where I set up my CDN (Cloudfront) which forwards the incoming requests to the original server (Instagram in this case), caches the video at CDN, and then keeps serving it without the need to request Instagram again. I don't want to download the videos and keep them in my bucket.
I'd a look at CloudFront functions and was able to redirect the incoming requests to another URL, basis on some conditions. Following is the code.
function handler(event) {
var request = event.request;
var headers = request.headers;
if request.uri == '/assets/1.jpg'{
var newurl = 'https://instagram.com/media/1.jpg'
var response = {
statusCode: 302,
statusDescription: 'Found',
headers:
{ "location": { "value": newurl } }
}
return response;
}
return request
}
However, this redirects it to the newURL. What I'm looking for is not a redirect, but the following
when the request is made to my server CDN, ie mydomain.com/assets/1.jpg, the file 1.jpg should be served from the Instagram server, whose value is the newURL in the above code snippet. This should be done without changing my domain URL (in the address bar) to Instagram.
The following requests to mydomain.com/assets/1.jpg should be directly served from the cache, and should not be routed again to Instagram.
Any help in this regard is highly appreciated.

I'm afraid LambdaEdge will not help here, however you may use Custom Origin in your CloudFront behavior with your custom cache policy to meet N mins TTL requirement. In case you familiar with CDK then please have a look at HttpOrigin. CloudFront distribution can look like below:
new cloudfront.Distribution(this, 'myDist', {
defaultBehavior: {
origin: new origins.HttpOrigin('www.instagram.com'),
cachePolicy: new cloudfront.CachePolicy(this, 'myCachePolicy', {
cachePolicyName: 'MyPolicy',
comment: 'A default policy',
defaultTtl: Duration.minutes(N)
}),
},
});

Spoke to the AWS team directly. This is what they responded.
From the case description, I understand you're attempting to set up a CloudFront distribution that forwards incoming requests to the original server (Instagram in this case), caches the video at CDN, and then continues to serve it without the need to request Instagram again, and you've also stated that you don't want to store the videos in an S3 bucket. If I've misunderstood your concern, kindly correct me.
Using the internal tools, I could see that the origin for the CloudFront distribution is an S3 bucket. Since you have mentioned in your concern that you want the requests coming to your distribution to be forwarded to the origin, in this case Instagram to serve the video assets from there, you can make use of Custom origins in CloudFront for this. Most CloudFront features are supported when you use a custom origin except for private content. For CloudFront to access the custom origin, the origin must remain publicly accessible. See [1].
With this in mind, I attempted to recreate the situation in which "Instagram" can be set as the custom origin for a CloudFront distribution. I used "www.instagram.com " as my origin, and when I tried to access the CF distribution, I received a "5xx Server Error," implying that Instagram is not allowed to be configured as an origin. Unfortunately, due to the configurations of the origin (Instagram), you will not be able to serve content from Instagram without first storing it in your S3 bucket. Using CloudFront and S3, you can serve video content as described in this document [2]
Another workaround is to use redirection, which can be accomplished by using S3 Bucket's Static website hosting property or Lambda#Edge functions [3,4]. This method does not require you to store the content in an S3 bucket to serve it, since you mentioned in your correspondence that you want to serve the Instagram content from your cache and do not want the requests forwarded to Instagram again, this method is also not possible. When you redirect your CloudFront requests to a new website, a new request is generated to the origin to serve the content, and CloudFront is removed from the picture. Because CloudFront is not involved, it will not be able to cache the content, and every time a request is made, it will directly hit the origin server, i.e. Instagram's servers. Kindly note that, since Instagram is a third-party tool, unless you have the access to use it as a CloudFront origin, CloudFront will not be able to cache it's content.
References:
[1] Using Amazon EC2 (or another custom origin): https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/DownloadDistS3AndCustomOrigins.html
[2] Tutorial: Hosting on-demand streaming video with Amazon S3, Amazon CloudFront, and Amazon Route 53: https://docs.aws.amazon.com/AmazonS3/latest/userguide/tutorial-s3-cloudfront-route53-video-streaming.html
[3] (Optional) Configuring a webpage redirect: https://docs.aws.amazon.com/AmazonS3/latest/userguide/how-to-page-redirect.html
[4] Handling Redirects#Edge Part 1: https://aws.amazon.com/blogs/networking-and-content-delivery/handling-redirectsedge-part1/

Static content on CloudFront is cached incorrectly over time

I have set up a CloudFront on top of multiple S3 buckets (in different regions) to provide a fast stable version of my webapp. This webapp is implemented with React which means it's all one single HTML file and one single Javascript file.
Using the routing mechanism of React, all the paths in the URL are handled within the code. This means if I click on a link like www.example.com/users, there won't be a request sent to the server. Instead, the client code will render the appropriate page without any consultation with the server (I'm just talking about the HTML and not considering the data). This means that if some user types in the given URL, the server should return the index.html (the only HTML file I have) which then will take care of the URL on the client-side. In other words, all the requests sent to the server should either return the HTML file or the Javascript file I mentioned earlier. Even the requests that are pointing to none-existing files.
In order to implement this requirement, I asked this question and I got an answer like this:
I need to set up an error page for my distribution on CloudFront and
redirect all the 403 (Forbidden) requests to /index.html file. This
is because when the request is pointing to a nonexisting file on S3,
S3 will return 403 to CloudFront due to the lack of listing
permission. Or I can grant the listing permission and instead handle
the 404 error (I didn't test this latter option).
Anyways, I set this up and it works perfectly - for a few hours. But then, for some unknown reason, the request to the Javascript file also returns the HTML file. And of course, all I'm getting back is actually coming from CloudFront's cache which means, no matter how many times I send the request, it will keep returning the same value. That is until I invalidate the cache on CloudFront which will solve the problem for few more hours. And we go around and around.
Even though I'm not sure why this happens but my guess is that at some point the S3 buck is inaccessible to CloudFront which will result in CloudFront caching the index.html. What can I do about this?

I think I found the problem:
MAKE SURE YOUR STATIC CONTENT ON ALL THE S3 BUCKETS ARE IDENTICAL!!!
In my case, the Javascript filename is automatically generated by Webpack which means it's random. And since different regions were "compiled" separated, their filenames differed.

Implement Lambda#Edge authentication for CloudFront

I am looking to add the Lambda#Edge to one of our services. The goal is to regex the url for certain values and compare those against a header value to ensure authorization. If the value is present then it is compared and if rejected should return a 403 immediately to the user. If the value compared matches or the url doesn't contain a particular value, then the request continues on as an authorized request.
Initially I was thinking that this would occur with a "viewer request" event. Some of the posts and comments on SO suggest that the "origin request" is more ideal for this check. But right now I've been trying to play around with the examples in the documentation on one of our CF end points but I'm not seeing expected results. The code is the following:
'use strict';
exports.handler = (event, context, callback) => {
const request = event.Records[0].cf.request;
request.headers["edge-test"] = [{
key: 'edge-test',
value: Date.now().toString()
}];
console.log(require('util').inspect(event, { depth: null }));
callback(null, request);
};
I would expect that there should be a logged value inside cloudwatch and a new header value in the request, yet I'm not seeing any logs nor am I seeing the header value when the request comes in.
Can someone shed some light on why things don't seem to be executing as to what I would think should be the response? Is my understanding of what the expected output wrong? Is there configuration that I may be missing (My distribution ID on the trigger is set to the instance we want, and the behavior was set to '*')? Any help is appreciated :)

First, a few notes;
CloudFront is (among other things) a web cache.
A web cache's purpose is to serve content directly to the browser instead of sending the request to the origin server.
However, one of the most critical things a cache must do correctly is not return the wrong content. One of the ways a cache can return the wrong content is by not realizing that certain request headers may cause the orogin server to vary the response it returns for a given URI.
CloudFront has no perfect way of knowing this, so its solution -- by default -- is to remove almost all of the headers from the request before forwarding it to the origin. Then it caches the received response against exactly the request that it sent to the origin, and will only use that cached response for future identical requests.
Injecting a new header in a Viewer Request trigger will cause that header to be discarded after it passes through the matching Cache Behavior, unless the cache behavior specifically is configured to whitelist that header for forwarding to the origin. This is the same behavior you would see if the header had been injected by the browser, itself.
So, your solution to get this header to pass through to the origin is to whitelist it in the cache behavior settings.
If you tried this same code as an Origin Request trigger, without the header whitelisted, CloudFront would actually throw a 502 Bad Gateway error, because you're trying to inject a header that CloudFront already knows you haven't whitelisted in the matching Cache Behavior. (In Viewer Request, the Cache Behavior match hasn't yet occurred, so CloudFront can't tell if you're doing something with the headers that will not ultimately work. In Origin Request, it knows.) The flow is Viewer Request > Cache Behavior > Cache Check > (if cache miss) Origin Request > send to Origin Server. Whitelisting the header would resolve this, as well.
Any header you want the origin to see, whether it comes from the browser, or a request trigger, must be whitelisted.
Note that some headers are inaccessible or immutable, particularly those that could be used to co-opt CloudFront for fraudulent purposes (such as request forgery and spoofing) and those that simply make no sense to modify.

How do you set a default root object for subdirectories for a statically hosted website on Cloudfront?

How do you set a default root object for subdirectories on a statically hosted website on Cloudfront? Specifically, I'd like www.example.com/subdir/index.html to be served whenever the user asks for www.example.com/subdir. Note, this is for delivering a static website held in an S3 bucket. In addition, I would like to use an origin access identity to restrict access to the S3 bucket to only Cloudfront.
Now, I am aware that Cloudfront works differently than S3 and amazon states specifically:
The behavior of CloudFront default root objects is different from the
behavior of Amazon S3 index documents. When you configure an Amazon S3
bucket as a website and specify the index document, Amazon S3 returns
the index document even if a user requests a subdirectory in the
bucket. (A copy of the index document must appear in every
subdirectory.) For more information about configuring Amazon S3
buckets as websites and about index documents, see the Hosting
Websites on Amazon S3 chapter in the Amazon Simple Storage Service
Developer Guide.
As such, even though Cloudfront allows us to specify a default root object, this only works for www.example.com and not for www.example.com/subdir. In order to get around this difficulty, we can change the origin domain name to point to the website endpoint given by S3. This works great and allows the root objects to be specified uniformly. Unfortunately, this doesn't appear to be compatable with origin access identities. Specifically, the above links states:
Change to edit mode:
Web distributions – Click the Origins tab, click the origin that you want to edit, and click Edit. You can only create an origin access
identity for origins for which Origin Type is S3 Origin.
Basically, in order to set the correct default root object, we use the S3 website endpoint and not the website bucket itself. This is not compatible with using origin access identity. As such, my questions boils down to either
Is it possible to specify a default root object for all subdirectories for a statically hosted website on Cloudfront?
Is it possible to setup an origin access identity for content served from Cloudfront where the origin is an S3 website endpoint and not an S3 bucket?

There IS a way to do this. Instead of pointing it to your bucket by selecting it in the dropdown (www.example.com.s3.amazonaws.com), point it to the static domain of your bucket (eg. www.example.com.s3-website-us-west-2.amazonaws.com):
Thanks to This AWS Forum thread

(New Feature May 2021) CloudFront Function
Create a simple JavaScript function below
function handler(event) {
var request = event.request;
var uri = request.uri;
// Check whether the URI is missing a file name.
if (uri.endsWith('/')) {
request.uri += 'index.html';
}
// Check whether the URI is missing a file extension.
else if (!uri.includes('.')) {
request.uri += '/index.html';
}
return request;
}
Read here for more info

Activating S3 hosting means you have to open the bucket to the world. In my case, I needed to keep the bucket private and use the origin access identity functionality to restrict access to Cloudfront only. Like #Juissi suggested, a Lambda function can fix the redirects:
'use strict';
/**
* Redirects URLs to default document. Examples:
*
* /blog -> /blog/index.html
* /blog/july/ -> /blog/july/index.html
* /blog/header.png -> /blog/header.png
*
*/
let defaultDocument = 'index.html';
exports.handler = (event, context, callback) => {
const request = event.Records[0].cf.request;
if(request.uri != "/") {
let paths = request.uri.split('/');
let lastPath = paths[paths.length - 1];
let isFile = lastPath.split('.').length > 1;
if(!isFile) {
if(lastPath != "") {
request.uri += "/";
}
request.uri += defaultDocument;
}
console.log(request.uri);
}
callback(null, request);
};
After you publish your function, go to your cloudfront distribution in the AWS console. Go to Behaviors, then chooseOrigin Request under Lambda Function Associations, and finally paste the ARN to your new function.

I totally agree that it's a ridiculous problem! The fact that CloudFront knows about serving index.html as Default Root Object AND STILL they say it doesn't work for subdirectories (source) is totally strange!
The behavior of CloudFront default root objects is different from the behavior of Amazon S3 index documents. When you configure an Amazon S3 bucket as a website and specify the index document, Amazon S3 returns the index document even if a user requests a subdirectory in the bucket.
I, personally, believe that AWS has made it this way so CloudFront becomes a CDN only (loading assets, with no logic in it whatsoever) and every request to a path in your website should be served from a "Server" (e.g. EC2 Node/Php server, or a Lambda function.)
Whether this limitation exists to enhance security, or keep things apart (i.e. logic and storage separated), or make more money (to enforce people to have a dedicated server, even for static content) is up to debate.
Anyhow, I'm summarizing the possible solutions workarounds here, with their pros and cons.
1) S3 can be Public - Use Custom Origin.
It's the easiest one, originally posted by #JBaczuk answer as well as in this github gist. Since S3 already supports serving index.html in subdirectories via Static Website Hosting, all you need to do is:
Go to S3, enable Static Website Hosting
Grab the URL in the form of http://<bucket-name>.s3-website-us-west-2.amazonaws.com
Create a new Origin in CloudFront and enter this as a Custom Origin (and NOT S3 ORIGIN), so CloudFront treats this as an external website when getting the content.
Pros:
Very easy to set up.
It supports /about/, /about, and /about/index.html and redirect the last two to the first one, properly.
Cons:
If your files in the S3 bucket are not in the root of S3 (say in /artifacts/* then going to www.domain.com/about (without the trailing /) will redirect you to www.domain.com/artifacts/about which is something you don't want at all! Basically the /about to /about/ redirect in S3 breaks if you serve from CloudFront and the path to files (from the root) don't match.
Security and Functionality: You cannot make S3 Private. It's because CloudFront's Origin Access Identity is not going to be supported, clearly, because CloudFront is instructed to take this Origin as a random website. It means that users can potentially get the files from S3 directly, which might not be what you ever what due to security/WAF concerns, as well as the website actually working if you have JS/html that relies on the path being your domain only.
[maybe an issue] The communication between CloudFront and S3 is not the way it's recommended to optimize stuff.
[maybe?] someone has complained that it doesn't work smoothly for more than one Origin in the Distribution (i.e. wanting /blog to go somewhere)
[maybe?] someone has complained that it doesn't preserve the original query params as expected.
2) Official solution - Use a Lambda Function.
It's the official solution (though the doc is from 2017). There is a ready-to-launch 3rd-party Application (JavaScript source in github) and example Python Lambda function (this answer) for it, too.
Technically, by doing this, you create a mini-server (they call it serverless!) that only serves CloudFront's Origin Requests to S3 (so, it basically sits between CloudFront and S3.)
Pros:
Hey, it's the official solution, so probably lasts longer and is the most optimized one.
You can customize the Lambda Function if you want and have control over it. You can support further redirect in it.
If implemented correctly, (like the 3rd party JS one, and I don't think the official one) it supports /about/ and /about both (with a redirect from the latter without trailing / to the former).
Cons:
It's one more thing to set up.
It's one more thing to have an eye, so it doesn't break.
It's one more thing to check when something breaks.
It's one more thing to maintain -- e.g. the third-party one here has open PRs since Jan 2021 (it's April 2021 now.)
The 3rd party JS solution doesn't preserve the query params. So /about?foo=bar is 301 redirected to /about/ and NOT /about/?foo=bar. You need to make changes to that lambda function to make it work.
The 3rd party JS solution keeps /about/ as the canonical version. If you want /about to be the canonical version (i.e. other formats get redirected to it via 301), you have to make changes to the script.
[minor] It only works in us-east-1 (open issue in Github since 2020, still open and an actual problem in April 2021).
[minor] It has its own cost, although given CloudFront's caching, shouldn't be significant.
3) Create fake "Folder File"s in S3 - Use a manual Script.
It's a solution between the first two -- It supports OAI (private S3) and it doesn't require a server. It's a bit nasty though!
What you do here is, you run a script that for each subdirectory of /about/index.html it creates an object in S3 named (has key of) /about and copy that HTML file (the content and the content-type) into this object.
Example scripts can be found in this Reddit answer and this answer using AWS CLI.
Pros:
Secure: Supports S3 Private and CloudFront OAI.
No additional live piece: The script runs pre-upload to S3 (or one-time) and then the system remains intact with the two pieces of S3 and CF only.
Cons:
[Needs Confirmation] It supports /about but not /about/ with trailing / I believe.
Technically you have two different files being stored. Might look confusing and make your deploys expensive if there are tons of HTML files.
Your script has to manually find all the subdirectories and create a dummy object out of them in S3. That has the potential to break in the future.
PS. Other Tricks)
Dirty trick using Javascript on Custom Error
While it doesn't look like a real thing, this answer deserves some credit, IMO!
You let the Access Denied (404s turning into 403) go through, then catch them, and manually, via a JS, redirect them to the right place.
Pros
Again, easy to set up.
Cons
It relies on JavaScript in Client-Side.
It messes up with SEO -- especially if the crawler doesn't run JS.
It messes up with the user's browser history. (i.e. back button) and possibly could be improved (and get more complicated!) via HTML5 history.replace.

There is an "official" guide published on AWS blog that recommends setting up a Lambda#Edge function triggered by your CloudFront distribution:
Of course, it is a bad user experience to expect users to always type index.html at the end of every URL (or even know that it should be there). Until now, there has not been an easy way to provide these simpler URLs (equivalent to the DirectoryIndex Directive in an Apache Web Server configuration) to users through CloudFront. Not if you still want to be able to restrict access to the S3 origin using an OAI. However, with the release of Lambda#Edge, you can use a JavaScript function running on the CloudFront edge nodes to look for these patterns and request the appropriate object key from the S3 origin.
Solution
In this example, you use the compute power at the CloudFront edge to inspect the request as it’s coming in from the client. Then re-write the request so that CloudFront requests a default index object (index.html in this case) for any request URI that ends in ‘/’.
When a request is made against a web server, the client specifies the object to obtain in the request. You can use this URI and apply a regular expression to it so that these URIs get resolved to a default index object before CloudFront requests the object from the origin. Use the following code:
'use strict';
exports.handler = (event, context, callback) => {
// Extract the request from the CloudFront event that is sent to Lambda#Edge
var request = event.Records[0].cf.request;
// Extract the URI from the request
var olduri = request.uri;
// Match any '/' that occurs at the end of a URI. Replace it with a default index
var newuri = olduri.replace(/\/$/, '\/index.html');
// Log the URI as received by CloudFront and the new URI to be used to fetch from origin
console.log("Old URI: " + olduri);
console.log("New URI: " + newuri);
// Replace the received URI with the URI that includes the index page
request.uri = newuri;
// Return to CloudFront
return callback(null, request);
};
Follow the guide linked above to see all steps required to set this up, including S3 bucket, CloudFront distribution and Lambda#Edge function creation.

There is one other way to get a default file served in a subdirectory, like example.com/subdir/. You can actually (programatically) store a file with the key subdir/ in the bucket. This file will not show up in the S3 management console, but it actually exists, and CloudFront will serve it.

Johan Gorter and Jeremie indicated index.html can be stored as an object with key subdir/.
I validated this approach works and an alternative easy way to do this with awscli's s3api copy-object
aws s3api copy-object --copy-source bucket_name/subdir/index.html --key subdir/ --bucket bucket_name

Workaround for the issue is to utilize lambda#edge for rewriting the requests. One just needs to setup the lambda for the CloudFront distribution's viewer request event and to rewrite everything that ends with '/' AND is not equal to '/' with default root document e.g. index.html.

UPDATE: It looks like I was incorrect! See JBaczuk's answer, which should be the accepted answer on this thread.
Unfortunately, the answer to both your questions is no.
1. Is it possible to specify a default root object for all subdirectories for a statically hosted website on Cloudfront?
No. As stated in the AWS CloudFront docs...
... If you define a default root object, an end-user request for a subdirectory of your distribution does not return the default root object. For example, suppose index.html is your default root object and that CloudFront receives an end-user request for the install directory under your CloudFront distribution:
http://d111111abcdef8.cloudfront.net/install/
CloudFront will not return the default root object even if a copy of index.html appears in the install directory.
...
The behavior of CloudFront default root objects is different from the behavior of Amazon S3 index documents. When you configure an Amazon S3 bucket as a website and specify the index document, Amazon S3 returns the index document even if a user requests a subdirectory in the bucket. (A copy of the index document must appear in every subdirectory.)
2. Is it possible to setup an origin access identity for content served from Cloudfront where the origin is an S3 website endpoint and not an S3 bucket?
Not directly. Your options for origins with CloudFront are S3 buckets or your own server.
It's that second option that does open up some interesting possibilities, though. This probably defeats the purpose of what you're trying to do, but you could setup your own server whose sole job is to be a CloudFront origin server.
When a request comes in for http://d111111abcdef8.cloudfront.net/install/, CloudFront will forward this request to your origin server, asking for /install. You can configure your origin server however you want, including to serve index.html in this case.
Or you could write a little web app that just takes this call and gets it directly from S3 anyway.
But I realize that setting up your own server and worrying about scaling it may defeat the purpose of what you're trying to do in the first place.

Another alternative to using lambda#edge is to use CloudFront's error pages. Set up a Custom Error Response to send all 403's to a specific file. Then add javascript to that file to append index.html to urls that end in a /. Sample code:
if ((window.location.href.endsWith("/") && !window.location.href.endsWith(".com/"))) {
window.location.href = window.location.href + "index.html";
}
else {
document.write("<Your 403 error message here>");
}

One can use newly released cloudfront functions and here is sample code.
Note: If you are using static website hosting, then you do not need any function!

I know this is an old question, but I just struggled through this myself. Ultimately my goal was less to set a default file in a directory, and more to have the the end result of a file that was served without .html at the end of it
I ended up removing .html from the filename and programatically/manually set the mime type to text/html. It is not the traditional way, but it does seem to work, and satisfies my requirements for the pretty urls without sacrificing the benefits of cloudformation. Setting the mime type is annoying, but a small price to pay for the benefits in my opinion

#johan-gorter indicated above that CloudFront serves file with keys ending by /
After investigation, it appears that this option works, and that one can create this type of files in S3 programatically. Therefore, I wrote a small lambda that is triggered when a file is created on S3, with a suffix index.html or index.htm
What it does is copying an object dir/subdir/index.html into an object dir/subdir/
import json
import boto3
s3_client = boto3.client("s3")
def lambda_handler(event, context):
for f in event['Records']:
bucket_name = f['s3']['bucket']['name']
key_name = f['s3']['object']['key']
source_object = {'Bucket': bucket_name, 'Key': key_name}
file_key_name = False
if key_name[-10:].lower() == "index.html" and key_name.lower() != "index.html":
file_key_name = key_name[0:-10]
elif key_name[-9:].lower() == "index.htm" and key_name.lower() != "index.htm":
file_key_name = key_name[0:-9]
if file_key_name:
s3_client.copy_object(CopySource=source_object, Bucket=bucket_name, Key=file_key_name)

Is it possible to set Content-Security-Policy headers in Amazon S3?

I'm trying to set a Content-Security-Policy header for an html file I'm serving via s3/cloudfront. I'm using the web-based AWS console. Whenever I try to add the header:
it doesn't seem to respect it. What can I do to make sure this header is served?

I'm having the same problem (using S3/CloudFront) and it appears there is currently no way to set this up easily.
S3 has a whitelist of the headers permitted, and Content-Security-Policy is not on it. Whilst it is true you can use the prefixed x-amz-meta-Content-Security-Policy, this is unhelpful as there is no browser support for it.
There are two options I can see.
1) you can serve the html content from a webserver on an EC2 instance and set that up as another CloudFront origin. Not really a great solution.
2) include the CSP as a meta tag within your html document:
<!doctype html>
<html>
<head>
<meta http-equiv="Content-Security-Policy" content="default-src http://*.foobar.com 'self'">
...
This option is not as widely supported by browsers, but it appears to work with both Webkit and Firefox, so the current Chrome, Firefox, Safari (and IOS 7 Safari) seem to support it.
I chose 2 as it was the simpler/cheaper/faster solution and I hope AWS will add the CSP header in the future.

S3/CloudFront takes any headers that the origin set and forward those to the client, but you can't set custom headers on you response directly.
You can use Lambda#Edge function that can inject security headers through CloudFront.
Here is how the process works: (reference aws blog)
Viewer navigates to website.
Before CloudFront serves content from the cache it will trigger any
Lambda function associated with the Viewer Request trigger for that
behavior.
CloudFront serves content from the cache if available, otherwise it
goes to step 4.
Only after CloudFront cache ‘Miss’, Origin Request trigger is fired
for that behavior.
S3 Origin returns content.
After content is returned from S3 but before being cached in
CloudFront, Origin Response trigger is fired.
After content is cached in CloudFront, Viewer Response trigger is
fired and is the final step before viewer receives content.
Viewer receives content.
Below is the blog from aws on how to do this step by step.
https://aws.amazon.com/blogs/networking-and-content-delivery/adding-http-security-headers-using-lambdaedge-and-amazon-cloudfront/

If you are testing through CloudFront, have you made sure you have invalidated the cached objects? Can you try to upload a completely new file and then try accessing it via CF and see if the header is still not there?
Update
Seems like custom metadata will not work as expected as per DOC. Any metadata other than the ones supported by S3 (the ones displayed in the dropdown) will have to be prefixed with x-amz-meta-

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js