Hosting multiple SPA web apps on S3 + Cloudfront under same URL

Hosting multiple SPA web apps on S3 + Cloudfront under same URL - amazon-web-services

I have two static web apps (create-react-apps) that are currently in two separate S3 buckets. Both buckets are configured for public read + static web hosting, and visiting their S3 hosted URLs correctly display the sites.
Bucket 1 - First App:
index.html
static/js/main.js
Bucket 2 - Second App:
/secondapp/
index.html
static/js/main.js
I have setup a single Cloudfront for this - The default cloudfront origin loads FirstApp correctly, such that www.mywebsite.com loads the index.html by default.
For the SecondApp, I have set up a Cache Behavior so that the path pattern secondapp/* points to the SecondApp bucket URL.
In my browser, when I visit www.mywebsite.com/secondapp/ it correctly displays the second web app.
If I omit the trailing slash however, I instead see the First App, which is undesired.
If I visit www.mywebsite.com/secondapp/something, I am also shown the First App, which is also undesired. (I want it to load the .html of secondapp)
Both apps are configured to use html5 push state via react-router-dom.
My desired behavior is that visiting the following displays the correct site/bucket:
www.mywebsite.com - Currently working
www.mywebsite.com/secondapp/ - Currently working
www.mywebsite.com/secondapp - (Without trailing slash) Not working, shows First App
www.mywebsite.com/secondapp/something_else - Not working, show First App
How can I achieved the desired behavior?
Thanks!

After researching this issue, I was able to resolve it using lambda#edge (https://aws.amazon.com/lambda/edge/)
By deploying a simple javascript function to route specific paths to the desired s3 bucket, we are able to achieve an nginx-like routing setup.
The function sits on lambda#edge on our Cloudfront CDN, meaning you can specify when it is triggered. For us, it's on "Origin Request"
My setup is as follows:
I used a single s3 bucket, and deployed my second-app in a subfolder "second-app"
I created a new Lambda function, hosted on "U.S. East N Virginia". The region is important here, as you can only host lambda function an #edge in this region.
See below for the actual Lambda function
Once created, go to your CloudFront configuration and go to "Behaviors > Select the Default (*) path pattern and hit Edit"
Scroll to the bottom where there is "Lambda Function Associations"
Select "Origin Request" form the drop down
Enter the address for your lambda function (arn:aws:lambda:us-east-1:12345667890:function:my-function-name)
Here is an example of the lambda function I used.
var path = require('path');
exports.handler = (event, context, callback) => {
// Extract the request from the CloudFront event that is sent to Lambda#Edge
var request = event.Records[0].cf.request;
const parsedPath = path.parse(request.uri);
// If there is no extension present, attempt to rewrite url
if (parsedPath.ext === '') {
// Extract the URI from the request
var olduri = request.uri;
// Match any '/' that occurs at the end of a URI. Replace it with a default index
var newuri = olduri.replace(/second-app.*/, 'second-app/index.html');
// Replace the received URI with the URI that includes the index page
request.uri = newuri;
}
// If an extension was not present, we are trying to load static access, so allow the request to proceed
// Return to CloudFront
return callback(null, request);
};
These are the resources I used for this solution:
https://aws.amazon.com/blogs/compute/implementing-default-directory-indexes-in-amazon-s3-backed-amazon-cloudfront-origins-using-lambdaedge/
https://github.com/riboseinc/terraform-aws-s3-cloudfront-website/issues/1
How do you set a default root object for subdirectories for a statically hosted website on Cloudfront?

Related

Client-side routing for S3 / Cloudfront with multiple path-based Single Page Apps

I have the following situation:
An S3 bucket with multiple path-based applications, grouped by version number.
Simplified example:
/v1.0.0
index.html
main.js
/v1.1.0
index.html
main.js
Each application is a (React) SPA and requires client-side routing (via React router)
I am using S3 with Cloudfront and have everything mostly working, however the client-side routing is broken. This is to say I am able to visit the root of each application, ie. https://<app>.cloudfront.net/<version>, but cannot reach any client-side routes.
I'm aware that an error document can be set to redirect to an index.html, but I believe this solution only works when there is one index.html per bucket (ie. I cannot set an error document per route-based path).
What's the best way to get around this issue?

One simple way to deal with SPA through Cloudfront is by using Lambda#Edge - Origin request (or Cloudfront functions).
The objective is to change the Origin URI.
A simple js code that I use very often for SPAs (for the v1.0.0 webapp):
exports.handler = async (event) => {
const request = event.Records[0].cf.request;
const hasType = request.uri.split(/\#|\?/)[0].split('.').length >= 2;
if (hasType) return request; // simply forward to the S3 object as it is an asset
request.uri = '/v1.0.0/index.html'; // handle all react routes
return request;
};
I check if there is an extension (.png, .js, .css, ...) in the URL. If it is an asset, I simply forward to the S3 object otherwise I send the index.html.
In that case, index.html is sent for the path /v1.0.0/my-react-router.
Updated
For dynamic handling, you can do like this (for the idea):
request.uri = '/' + request.uri.split('/')[1] + '/index.html';
Or even better, use regexp to parse the request.uri in order to extract the version, the asset's extension or the spa route.

How do I 301 redirect (HTTP to HTTPS) && (www to non-www) for a single domain using S3 and Cloudfront?

I am hosting a static site (purely html/css) on AWS S3 with a CloudFront distribution. I have no problem configuring only CloudFront to redirect HTTP to HTTPS. Nor do I have a problem only having S3 redirect www to a non-www (naked) subdomain.
The problem comes when I try to redirect all HTTP traffic to HTTPS and simultaneously redirect all www subdomains to non-www.
It simply doesn't work. And I haven't been able to find a solution to this problem and I've been looking for months. It may seem like StackOverflow has the answer, but I'm telling you it doesn't. Either their solution reaches a dead-end or the solution is for an older AWS user interface that doesn't quite match the way it is today.
The best I have been able to come up with is an HTML redirect for www to non-www, but that's not ideal from an SEO and maintainability standpoint.
What is the best solution for this configuration?

As I mentioned in Supporting HTTPS URL redirection with a single CloudFront distribution, the simple and straightforward solution involves two buckets and two CloudFront distributions -- one for www and one for the bare domain. I am highly skeptical that this would have any negative SEO impact.
However, that answer pre-dates the introduction of the CloudFront Lambda#Edge extension, which offers another solution because it allows you to trigger a Javascript Lambda function to run at specific points during CloudFront's request processing, to inspect the request and potentially modify it or otherwise react to it.
There are several examples in the documentation but they are all very minimalistic, so here's a complete, working example, with more comments than actual code, explaining exactly what it does and how it does it.
This function -- configured as an Origin Request trigger -- will fire every time there is a cache miss, and inspect the Host header sent by the browser, to see if the request should be allowed through, or if it should be redirected without actually sending the request all the way through to S3. For cache hits, the function will not fire, because CloudFront already has the content cached.
Any other domain name associated with the CloudFront distribution will be redirected to the "real" domain name of your site, as configured in the function body. Optionally, it will also return a generated 404 response if someone accesses your distribution's *.cloudfront.net default hostname directly.
You may be wondering how the cache of a single CloudFront distribution can differentiate between the content for example.com/some-path and www.example.com/some-path and cache them separately, but the answer is that it can and it does if you configure it appropriately for this setup -- which means telling it to cache based on selected request headers -- specifically the Host header.
Normally, enabling that configuration wouldn't be quite compatible with S3, but it works here because the Lambda function also sets the Host header back to what S3 expects. Note that you need to configure the Origin Domain Name -- the web site hosting endpoint of your bucket -- inline, in the code.
With this configuration, you only need one bucket, and the bucket's name does not need to match any of the domain names. You can use whatever bucket you want... but you do need to use the web site hosting endpoint for the bucket, so that CloudFront treats it as a custom origin. Creating an "S3 Origin" using the REST endpoint for the bucket will not work.
'use strict';
// if an incoming request is for a domain name other than the canonical
// (official) hostname for the site, this Lambda#Edge trigger
// will redirect the request back to the official site, subject to the
// configuration parameters below.
// this trigger must be deployed as an Origin Request trigger.
// in the CloudFront Cache Behavior settings, the Host header must be
// whitelisted for forwarding, in order for this function to work as intended;
// this is an artifact of the way the Lambda#Edge interface interacts with the
// CloudFront cache key mechanism -- we can't react to what we can't see,
// and if it isn't part of the cache key, CloudFront won't expose it.
// specify the official hostname of the site; requests to this domain will
// be passed through; others will redirect to it...
const canonical_domain_name = 'example.com';
// ...but note that every CloudFront distribution has a default *.cloudfront.net
// hostname that can't be disabled; you may not want this hostname to do
// anything at all, including redirect; set this parameter to true if you
// want to to return 404 for the default hostname; see the render_reject()
// function to customize the behavior further.
const reject_default_hostname = false;
// the "origin" is the server that provides your content; this is configured
// in the distribution and selected in the Cache Behavior settings, but
// that information needs to be provided here, so that we can modify
// successful requests to match what the destination expects.
const origin_domain_name = 'example-bucket.s3-website.us-east-2.amazonaws.com';
// http status code for redirects; you may want 302 or 307 for testing,
// and 301 or 308 for production; note that this is a string, not a number.
const redirect_http_status_code = '302';
// for generated redirects, we can also set a cache control header; you'll need
// to ensure you format this correctly, since the code below does not validate
// the syntax; here, max-age is how long the browser should cache redirects,
// while s-maxage tells CloudFront how long to potentially cache them;
// higher values should result in less traffic and potentially lower costs;
// set to empty string or null if you don't want to set a value.
const redirect_cache_control = 'max-age=300, s-maxage=86400';
// set false to drop the query string on redirects; true to preserve
const redirect_preserve_querystring = true;
// set false to change the path to '/' on redirects; true to preserve
const redirect_preserve_path = true;
// end of configuration
// the URL in the generated redirect will always use https unless you
// configure whitelisting of CloudFront-Forwarded-Proto, in which case we
// will use that value; if you want to send http to https, use the
// Viewer Protocol Policy settings in the CloudFront cache behavior.
exports.handler = (event, context, callback) => {
// extract the CloudFront object from the trigger event
const cf = event.Records[0].cf;
// extract the request object
const request = cf.request;
// extract the HTTP Host header
const host = request.headers.host[0].value;
// check whether the host header matches the canonical value; if so,
// set the host header to what the origin expects, and return control
// to CloudFront
if(host === canonical_domain_name)
{
request.headers.host[0].value = origin_domain_name;
return callback(null, request);
}
// check for rejection
if (reject_default_hostname && host.endsWith('.cloudfront.net'))
{
return render_reject(cf, callback);
}
// if neither 'return' above has been invoked, then we need to generate a redirect.
const proto = (request.headers['cloudfront-forwarded-proto'] || [{ value: 'https' }])[0].value;
const path = redirect_preserve_path ? request.uri : '/';
const query = redirect_preserve_querystring && (request.querystring != '') ? ('?' + request.querystring) : '';
const location = proto + '://' + canonical_domain_name + path + query;
// build a response object to redirect the browser.
const response = {
status: redirect_http_status_code,
headers: {
'location': [ { key: 'Location', value: location } ],
},
body: '',
};
// add the cache control header, if configured
if(redirect_cache_control)
{
response.headers['cache-control'] = [{ key: 'Cache-Control', value: redirect_cache_control }];
}
// return the response object, preventing the request from being sent to
// the origin server
return callback(null, response);
};
function render_reject(cf, callback) {
// only invoked if the request is for *.cloudfront.net and you set
// reject_default_hostname to true; here, we generate a very simple
// response, text/plain, with a 404 error. This can be customized to HTML
// or XML, etc., according to your local practices, but be sure you properly
// escape the request URI, since it is untrusted data and could lead to an
// XSS injection otherwise; no similar vulnerability exists with plain text.
const body_text = `The requested URL '${cf.request.uri}' does not exist ` +
'on this server, or access is not enabled via the ' +
`${ cf.request.headers.host[0].value } endpoint.\r\n`;
// generate a response; you may want to customize this; note that
// Lambda#Edge is strict with regard to the way headers are specified;
// the outer keys are lowercase, the inner keys can be mixed.
const response = {
status: '404',
headers: {
'cache-control': [{ key: 'Cache-Control', value: 'no-cache, s-maxage=86400' }],
'content-type': [{ key: 'Content-Type', value: 'text/plain' }],
},
body: body_text,
};
return callback(null, response);
}
// eof

Finishing up the other answer here using Lambda#Edge, I realized there is a significantly simpler solution, using only a single CloudFront distribution and three (explained below) S3 buckets.
There are more constraints to this solution, but it has fewer moving parts and costs less to implement and use.
Here are the constraints:
You must be using the S3 web site hosting feature (should be a given, since we're talking about hosting content and doing redirects)
The buckets must all be in the same AWS region
The first two buckets must be named exactly the same as the hostnames you want to handle -- e.g. you need a bucket named example.com and a bucket named www.example.com.
You also need to create a bucket whose name exactly matches the hostname assigned to the CloudFront distribution, e.g. dzczcexample.cloudfront.net, and this bucket also must be in the same region as the other two.
Configure the CloudFront distribution's Origin Domain Name to point to your main content bucket using its web site hosting endpoint, e.g. example.com.s3-website.us-east-2.amazonaws.com.
Configure the Alternate Domain Name settings for both example.com and www.example.com.
Whitelist the Host header for forwarding to the origin. This setting takes advantage of the fact that when S3 does not recognize the incoming HTTP Host header as being one that belongs to S3, then...
the bucket for the request is the lowercase value of the Host header, and the key for the request is the Request-URI.
https://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html
Ummm... perfect! That's exactly what we need -- and it gives us a way to pass requests to multiple buckets in one S3 region, through a single CloudFront distribution, based on what the browser asks for... because with this setup, we're able to split the logic:
the Origin Domain Name is used only for routing the request from the CloudFront edge to the correct S3 region, then
the whitelisted Host header is used when the request arrives at S3 for selecting which bucket handles the request.
(This is why all the buckets have to be in the same region, as mentioned above. Otherwise, the request will be delivered to the region of the "main" bucket, and that region will reject it as misrouted if the identified bucket is in a different region.)
With this configuration in place, you'll find that example.com requests are handled by the example.com bucket, and www.example.com requests are handled by the www.example.com bucket, which means all you need to do now is configure the buckets as desired.
But there is one more critical step. You absolutely need to create a bucket named after your CloudFront distribution's assigned default domain name (e.g. d111jozxyqk.cloudfront.net), in order to avoid setting up an exploitable scenario. It's not a security vulnerability, it's a billing one. It doesn't make a great deal of difference how you configure this bucket, but it is important that you own the bucket so that nobody else can create it. Why? Because with this configuration, requests sent directly to your CloudFront distribution's default domain name (not your custom domains) will result in S3 returning a No Such Bucket error for that bucket name. If someone else were to discover your setup, they could create that bucket, you'd pay for all their data traffic through your CloudFront distribution. Create the bucket and either leave it empty (so that an error is returned) or set it up to redirect to your main web site.

Redirect to index.html for S3 subfolder

I have a domain example.com. I have a S3 bucket named example.com setup with an index.html file that works. Now I like to create two subfolders called old and new, each containing a separate version of a single page application. Requesting https://example.com/old (I like to omit the index.html when entering the request in address bar for browser) would open the index.html file in the old subfolder and requesting https://example.com/new would open the index.html. What is the best way of doing these redirects? Should I set something up in Route 53 example.com/old -> example.com/old/index.html or is there a better way of doing it?

No need for a lambda function adding expense and complexity to your project.
The following answer is quoted from https://stevepapa.com/
https://stevepapa.com/my-great-new-post/ would be expected to work the same way as: https://stevepapa.com/my-great-new-post/index.html
There’s a clever little way to get these flowing through to the Cloudfront distribution, and it involves changing the source origin from the one that Cloudfront presents to you by default.
When selecting the origin source Cloudfront will show you a list of S3 buckets.
Instead of setting the source from the bucket shown in the dropdown list, you’ll need to grab the static web hosting endpoint for that resource from its S3 settings page and pop it in manually.
Using the static source for the Cloudfront distribution origin means any request to that distribution will be using the S3’s root object lookup, and your 404 responses should disappear as the references flow through.
Important
After doing this:
Clear your browser cache
Devalidate the items in your Cloudfront distribution
Otherwise, the changes you made won't go live immediately.

So I had this problem last night too.
The issue is as follows:
S3 when configured as a website bucket is forgiving and has the index document setting, set to index.html and this gets applied at the root, ie, example.com actually gets redirected to example.com/index.html, and it also gets applied at the subfolder level, so example.com/new or example.com/new/ should both redirect to example.com/new/index.html, where there would be an object in the bucket. (If not, you'd get a NoSuchKey error instead.)
However you then "upgrade" yourself to CloudFront, likely for HTTPS, and this feature goes away. CloudFront instead makes explicit API calls to S3 and therefore doesn't trigger the index document concession. It does work for the root, but not for subfolders.
The RoutingRules solution doesn't look clean to me because by specifying KeyPrefixEquals rather than key exactly equals (which doesn't exist) I think you'd get unintended matches.
I instead have implemented a Lambda#Edge rule that rewrites the request that CloudFront makes to S3 to have a proper key value in it.
Start with the Lambda docs and the A/B testing example here:
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/lambda-examples.html#lambda-examples-general-examples
Change the code to:
'use strict';
exports.handler = (event, context, callback) => {
/*
* Expand S3 request to have index.html if it ends in /
*/
const request = event.Records[0].cf.request;
if ((request.uri !== "/") /* Not the root object, which redirects properly */
&& (request.uri.endsWith("/") /* Folder with slash */
|| (request.uri.lastIndexOf(".") < request.uri.lastIndexOf("/")) /* Most likely a folder, it has no extension (heuristic) */
)) {
if (request.uri.endsWith("/"))
request.uri = request.uri.concat("index.html");
else
request.uri = request.uri.concat("/index.html");
}
callback(null, request);
};
And publish it to your CloudFront distribution.

There is even easier way to accomplish this with an HTML redirect file
Create a plain file named my-great-new-post (don't worry there won't be a name conflict with the folder in the same bucket)
Write a meta-redirect code in that file (I pasted the code below)
upload file to root bucket (where my-great-new-post folder lays)
modify metadata of the new file and make Content-Type:text/html
Here lays the content of the file:
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="refresh" content="0; url=/my-great-new-post/index.html">
</head>
<body>
</body>
</html>

If you're using CloudFront, you can use CloudFront functions to create a simple redirection.
I modified #jkingok's solution.
Go to CloudFront, and click on Functions.
Click on Create function, enter a name and optional description
In the development section, enter the code snippet below and publish from the publish tab.
function handler(event) {
var request = event.request;
if (request.uri !== "/" && (request.uri.endsWith("/") || request.uri.lastIndexOf(".") < request.uri.lastIndexOf("/"))) {
if (request.uri.endsWith("/")) {
request.uri = request.uri.concat("index.html");
} else {
request.uri = request.uri.concat("/index.html");
}
}
return request;
}
Once your function is completed, you can use the function by going to the "Behaviors" tab of your distribution, select the path pattern you want to modify, then under "Function associations", for Viewer Request, select "CloudFront function" as the function type and then select the function you created in the dropdown list.
Once you save the Behaviors, you can test your website.
NOTE: This solution redirects every URL without extension to "URL/index.html", you can modify the behaviour of the function to what works for you.

When you enable and configure static hosting with S3 you need to access the site via the bucket website endpoint. You can find this URL in the bucket properties in the Static website hosting section.
The URL of the website endpoint will look like this:
http://example-bucket.s3-website-eu-west-1.amazonaws.com/example-folder/
However (confusingly) objects stored in S3 are also accessible via a different URL, this url does not honour the index rules on subfolders. This URL looks like this:
https://example-bucket.s3-eu-west-1.amazonaws.com/example-folder/

Configure your Bucket to deliver a static website
Create a CloudFront Distribution: set your bucket as the Origin and leave the OriginPath empty (default: /)
Create Route53 RecordSet which links to your CloudFront Distribution
You can find a helpful walkthrough here
Question: What should happen if your customer enters example.com (without old/new)?
Edit: 2. is optional. You could also link your Route53 RecordSet to your static website but CloudFront enables you to serve your wesbite with https (with help of AWS Certificate Manager).

If you are using CDK to create a CloudFrontWebDistribution with an S3 source, then your first guess is probably to do this:
OriginConfigs = new[] {
new SourceConfiguration {
S3OriginSource = new S3OriginConfig
{
S3BucketSource = bucket
}
Behaviors = new[] { new Behavior { IsDefaultBehavior = true } }
}
}
However, to configure cloudfront to use the website-bucket-url (that does have the behavior to resolve a directory to index.html), you need to use:
OriginConfigs = new[] {
new SourceConfiguration {
CustomOriginSource = new CustomOriginConfig
{
DomainName = bucket.BucketWebsiteDomainName,
OriginProtocolPolicy = OriginProtocolPolicy.HTTP_ONLY
},
Behaviors = new[] { new Behavior { IsDefaultBehavior = true } }
}
}
You need to specify the protocol as HTTP_ONLY because website buckets do not support HTTPS. The default for a CustomOriginSource is HTTPS_ONLY.

You can try setting Redirection rules, Here is an untested rule.
<RoutingRules>
<RoutingRule>
<Condition>
<KeyPrefixEquals>old</KeyPrefixEquals>
</Condition>
<Redirect>
<ReplaceKeyWith>old/index.html</ReplaceKeyWith>
</Redirect>
</RoutingRule>
<RoutingRule>
<Condition>
<KeyPrefixEquals>new</KeyPrefixEquals>
</Condition>
<Redirect>
<ReplaceKeyWith>new/index.html</ReplaceKeyWith>
</Redirect>
</RoutingRule>
</RoutingRules>

Static site hosted on S3 and Cloudfront root page routes to www.example.com/index.html, not www.example.com

I have a static site hosted on Amazon S3 and Cloudfront.Currently, when I go to my site, www.example.com, I get a 403. It's only when I go to www.example.com/index.html that I actually access my site. My desired behavior is that when I go to www.example.com, I see what I see when I go to www.example.com/index.html.
I've set up a bucket that we can call example.com, that contains all of my site's information. I also have another bucket (www.example.com) that redirects to example.com.
My domain points to Cloudfront, where I have a Cloudfront domain set. I think this is where the problem is. I have to go to /index.html from this domain to actually see the site.
How do I set this up so that when I go to www.example.com, I see what currently lives at www.example.com/index.html?
I have already set index.html as my bucket's Index document.

CloudFront's default root object only works for the root of your domain. So a request to https://example.com will serve /index.html from S3, but a request to https://example.com/about-us will simply 404 (or 403, depending on permissions).
If you want to serve index.html from any directory, you need to deploy a Lambda#Edge function for the origin request. Follow this tutorial, and the function body you need (Node.js 8.10) is:
exports.handler = (event, context, callback) => {
const { request } = event.Records[0].cf;
const parts = request.uri.split('/');
const lastPart = parts[parts.length - 1];
const hasExtension = lastPart.includes('.');
if (!hasExtension) {
const newURI = request.uri.replace(/\/?$/, '/index.html');
request.uri = newURI;
}
return callback(null, request);
};

When you setup your S3 bucket for static website hosting (with or without CloudFront), you can configure the bucket with an Index Document. The Index Document is sent to the client when the client makes a directory request.
For example, you want www.example.com/index.html to be served when the client goes to www.example.com.
To do this, set index.html as your bucket's Index Document.
https://docs.aws.amazon.com/AmazonS3/latest/dev/HostingWebsiteOnS3Setup.html
See Step 1, sub-step 3(b).

In the CloudFront general configuration settings make sure the default root object is set to index.html. As a best practice you should use an Origin Access Identifier to ensure your objects are only served through CloudFront.
Also make sure your origin is set properly to your S3 bucket and the folder that contains your site (if applicable).

How do you set a default root object for subdirectories for a statically hosted website on Cloudfront?

How do you set a default root object for subdirectories on a statically hosted website on Cloudfront? Specifically, I'd like www.example.com/subdir/index.html to be served whenever the user asks for www.example.com/subdir. Note, this is for delivering a static website held in an S3 bucket. In addition, I would like to use an origin access identity to restrict access to the S3 bucket to only Cloudfront.
Now, I am aware that Cloudfront works differently than S3 and amazon states specifically:
The behavior of CloudFront default root objects is different from the
behavior of Amazon S3 index documents. When you configure an Amazon S3
bucket as a website and specify the index document, Amazon S3 returns
the index document even if a user requests a subdirectory in the
bucket. (A copy of the index document must appear in every
subdirectory.) For more information about configuring Amazon S3
buckets as websites and about index documents, see the Hosting
Websites on Amazon S3 chapter in the Amazon Simple Storage Service
Developer Guide.
As such, even though Cloudfront allows us to specify a default root object, this only works for www.example.com and not for www.example.com/subdir. In order to get around this difficulty, we can change the origin domain name to point to the website endpoint given by S3. This works great and allows the root objects to be specified uniformly. Unfortunately, this doesn't appear to be compatable with origin access identities. Specifically, the above links states:
Change to edit mode:
Web distributions – Click the Origins tab, click the origin that you want to edit, and click Edit. You can only create an origin access
identity for origins for which Origin Type is S3 Origin.
Basically, in order to set the correct default root object, we use the S3 website endpoint and not the website bucket itself. This is not compatible with using origin access identity. As such, my questions boils down to either
Is it possible to specify a default root object for all subdirectories for a statically hosted website on Cloudfront?
Is it possible to setup an origin access identity for content served from Cloudfront where the origin is an S3 website endpoint and not an S3 bucket?

There IS a way to do this. Instead of pointing it to your bucket by selecting it in the dropdown (www.example.com.s3.amazonaws.com), point it to the static domain of your bucket (eg. www.example.com.s3-website-us-west-2.amazonaws.com):
Thanks to This AWS Forum thread

(New Feature May 2021) CloudFront Function
Create a simple JavaScript function below
function handler(event) {
var request = event.request;
var uri = request.uri;
// Check whether the URI is missing a file name.
if (uri.endsWith('/')) {
request.uri += 'index.html';
}
// Check whether the URI is missing a file extension.
else if (!uri.includes('.')) {
request.uri += '/index.html';
}
return request;
}
Read here for more info

Activating S3 hosting means you have to open the bucket to the world. In my case, I needed to keep the bucket private and use the origin access identity functionality to restrict access to Cloudfront only. Like #Juissi suggested, a Lambda function can fix the redirects:
'use strict';
/**
* Redirects URLs to default document. Examples:
*
* /blog -> /blog/index.html
* /blog/july/ -> /blog/july/index.html
* /blog/header.png -> /blog/header.png
*
*/
let defaultDocument = 'index.html';
exports.handler = (event, context, callback) => {
const request = event.Records[0].cf.request;
if(request.uri != "/") {
let paths = request.uri.split('/');
let lastPath = paths[paths.length - 1];
let isFile = lastPath.split('.').length > 1;
if(!isFile) {
if(lastPath != "") {
request.uri += "/";
}
request.uri += defaultDocument;
}
console.log(request.uri);
}
callback(null, request);
};
After you publish your function, go to your cloudfront distribution in the AWS console. Go to Behaviors, then chooseOrigin Request under Lambda Function Associations, and finally paste the ARN to your new function.

I totally agree that it's a ridiculous problem! The fact that CloudFront knows about serving index.html as Default Root Object AND STILL they say it doesn't work for subdirectories (source) is totally strange!
The behavior of CloudFront default root objects is different from the behavior of Amazon S3 index documents. When you configure an Amazon S3 bucket as a website and specify the index document, Amazon S3 returns the index document even if a user requests a subdirectory in the bucket.
I, personally, believe that AWS has made it this way so CloudFront becomes a CDN only (loading assets, with no logic in it whatsoever) and every request to a path in your website should be served from a "Server" (e.g. EC2 Node/Php server, or a Lambda function.)
Whether this limitation exists to enhance security, or keep things apart (i.e. logic and storage separated), or make more money (to enforce people to have a dedicated server, even for static content) is up to debate.
Anyhow, I'm summarizing the possible solutions workarounds here, with their pros and cons.
1) S3 can be Public - Use Custom Origin.
It's the easiest one, originally posted by #JBaczuk answer as well as in this github gist. Since S3 already supports serving index.html in subdirectories via Static Website Hosting, all you need to do is:
Go to S3, enable Static Website Hosting
Grab the URL in the form of http://<bucket-name>.s3-website-us-west-2.amazonaws.com
Create a new Origin in CloudFront and enter this as a Custom Origin (and NOT S3 ORIGIN), so CloudFront treats this as an external website when getting the content.
Pros:
Very easy to set up.
It supports /about/, /about, and /about/index.html and redirect the last two to the first one, properly.
Cons:
If your files in the S3 bucket are not in the root of S3 (say in /artifacts/* then going to www.domain.com/about (without the trailing /) will redirect you to www.domain.com/artifacts/about which is something you don't want at all! Basically the /about to /about/ redirect in S3 breaks if you serve from CloudFront and the path to files (from the root) don't match.
Security and Functionality: You cannot make S3 Private. It's because CloudFront's Origin Access Identity is not going to be supported, clearly, because CloudFront is instructed to take this Origin as a random website. It means that users can potentially get the files from S3 directly, which might not be what you ever what due to security/WAF concerns, as well as the website actually working if you have JS/html that relies on the path being your domain only.
[maybe an issue] The communication between CloudFront and S3 is not the way it's recommended to optimize stuff.
[maybe?] someone has complained that it doesn't work smoothly for more than one Origin in the Distribution (i.e. wanting /blog to go somewhere)
[maybe?] someone has complained that it doesn't preserve the original query params as expected.
2) Official solution - Use a Lambda Function.
It's the official solution (though the doc is from 2017). There is a ready-to-launch 3rd-party Application (JavaScript source in github) and example Python Lambda function (this answer) for it, too.
Technically, by doing this, you create a mini-server (they call it serverless!) that only serves CloudFront's Origin Requests to S3 (so, it basically sits between CloudFront and S3.)
Pros:
Hey, it's the official solution, so probably lasts longer and is the most optimized one.
You can customize the Lambda Function if you want and have control over it. You can support further redirect in it.
If implemented correctly, (like the 3rd party JS one, and I don't think the official one) it supports /about/ and /about both (with a redirect from the latter without trailing / to the former).
Cons:
It's one more thing to set up.
It's one more thing to have an eye, so it doesn't break.
It's one more thing to check when something breaks.
It's one more thing to maintain -- e.g. the third-party one here has open PRs since Jan 2021 (it's April 2021 now.)
The 3rd party JS solution doesn't preserve the query params. So /about?foo=bar is 301 redirected to /about/ and NOT /about/?foo=bar. You need to make changes to that lambda function to make it work.
The 3rd party JS solution keeps /about/ as the canonical version. If you want /about to be the canonical version (i.e. other formats get redirected to it via 301), you have to make changes to the script.
[minor] It only works in us-east-1 (open issue in Github since 2020, still open and an actual problem in April 2021).
[minor] It has its own cost, although given CloudFront's caching, shouldn't be significant.
3) Create fake "Folder File"s in S3 - Use a manual Script.
It's a solution between the first two -- It supports OAI (private S3) and it doesn't require a server. It's a bit nasty though!
What you do here is, you run a script that for each subdirectory of /about/index.html it creates an object in S3 named (has key of) /about and copy that HTML file (the content and the content-type) into this object.
Example scripts can be found in this Reddit answer and this answer using AWS CLI.
Pros:
Secure: Supports S3 Private and CloudFront OAI.
No additional live piece: The script runs pre-upload to S3 (or one-time) and then the system remains intact with the two pieces of S3 and CF only.
Cons:
[Needs Confirmation] It supports /about but not /about/ with trailing / I believe.
Technically you have two different files being stored. Might look confusing and make your deploys expensive if there are tons of HTML files.
Your script has to manually find all the subdirectories and create a dummy object out of them in S3. That has the potential to break in the future.
PS. Other Tricks)
Dirty trick using Javascript on Custom Error
While it doesn't look like a real thing, this answer deserves some credit, IMO!
You let the Access Denied (404s turning into 403) go through, then catch them, and manually, via a JS, redirect them to the right place.
Pros
Again, easy to set up.
Cons
It relies on JavaScript in Client-Side.
It messes up with SEO -- especially if the crawler doesn't run JS.
It messes up with the user's browser history. (i.e. back button) and possibly could be improved (and get more complicated!) via HTML5 history.replace.

There is an "official" guide published on AWS blog that recommends setting up a Lambda#Edge function triggered by your CloudFront distribution:
Of course, it is a bad user experience to expect users to always type index.html at the end of every URL (or even know that it should be there). Until now, there has not been an easy way to provide these simpler URLs (equivalent to the DirectoryIndex Directive in an Apache Web Server configuration) to users through CloudFront. Not if you still want to be able to restrict access to the S3 origin using an OAI. However, with the release of Lambda#Edge, you can use a JavaScript function running on the CloudFront edge nodes to look for these patterns and request the appropriate object key from the S3 origin.
Solution
In this example, you use the compute power at the CloudFront edge to inspect the request as it’s coming in from the client. Then re-write the request so that CloudFront requests a default index object (index.html in this case) for any request URI that ends in ‘/’.
When a request is made against a web server, the client specifies the object to obtain in the request. You can use this URI and apply a regular expression to it so that these URIs get resolved to a default index object before CloudFront requests the object from the origin. Use the following code:
'use strict';
exports.handler = (event, context, callback) => {
// Extract the request from the CloudFront event that is sent to Lambda#Edge
var request = event.Records[0].cf.request;
// Extract the URI from the request
var olduri = request.uri;
// Match any '/' that occurs at the end of a URI. Replace it with a default index
var newuri = olduri.replace(/\/$/, '\/index.html');
// Log the URI as received by CloudFront and the new URI to be used to fetch from origin
console.log("Old URI: " + olduri);
console.log("New URI: " + newuri);
// Replace the received URI with the URI that includes the index page
request.uri = newuri;
// Return to CloudFront
return callback(null, request);
};
Follow the guide linked above to see all steps required to set this up, including S3 bucket, CloudFront distribution and Lambda#Edge function creation.

There is one other way to get a default file served in a subdirectory, like example.com/subdir/. You can actually (programatically) store a file with the key subdir/ in the bucket. This file will not show up in the S3 management console, but it actually exists, and CloudFront will serve it.

Johan Gorter and Jeremie indicated index.html can be stored as an object with key subdir/.
I validated this approach works and an alternative easy way to do this with awscli's s3api copy-object
aws s3api copy-object --copy-source bucket_name/subdir/index.html --key subdir/ --bucket bucket_name

Workaround for the issue is to utilize lambda#edge for rewriting the requests. One just needs to setup the lambda for the CloudFront distribution's viewer request event and to rewrite everything that ends with '/' AND is not equal to '/' with default root document e.g. index.html.

UPDATE: It looks like I was incorrect! See JBaczuk's answer, which should be the accepted answer on this thread.
Unfortunately, the answer to both your questions is no.
1. Is it possible to specify a default root object for all subdirectories for a statically hosted website on Cloudfront?
No. As stated in the AWS CloudFront docs...
... If you define a default root object, an end-user request for a subdirectory of your distribution does not return the default root object. For example, suppose index.html is your default root object and that CloudFront receives an end-user request for the install directory under your CloudFront distribution:
http://d111111abcdef8.cloudfront.net/install/
CloudFront will not return the default root object even if a copy of index.html appears in the install directory.
...
The behavior of CloudFront default root objects is different from the behavior of Amazon S3 index documents. When you configure an Amazon S3 bucket as a website and specify the index document, Amazon S3 returns the index document even if a user requests a subdirectory in the bucket. (A copy of the index document must appear in every subdirectory.)
2. Is it possible to setup an origin access identity for content served from Cloudfront where the origin is an S3 website endpoint and not an S3 bucket?
Not directly. Your options for origins with CloudFront are S3 buckets or your own server.
It's that second option that does open up some interesting possibilities, though. This probably defeats the purpose of what you're trying to do, but you could setup your own server whose sole job is to be a CloudFront origin server.
When a request comes in for http://d111111abcdef8.cloudfront.net/install/, CloudFront will forward this request to your origin server, asking for /install. You can configure your origin server however you want, including to serve index.html in this case.
Or you could write a little web app that just takes this call and gets it directly from S3 anyway.
But I realize that setting up your own server and worrying about scaling it may defeat the purpose of what you're trying to do in the first place.

Another alternative to using lambda#edge is to use CloudFront's error pages. Set up a Custom Error Response to send all 403's to a specific file. Then add javascript to that file to append index.html to urls that end in a /. Sample code:
if ((window.location.href.endsWith("/") && !window.location.href.endsWith(".com/"))) {
window.location.href = window.location.href + "index.html";
}
else {
document.write("<Your 403 error message here>");
}

One can use newly released cloudfront functions and here is sample code.
Note: If you are using static website hosting, then you do not need any function!

I know this is an old question, but I just struggled through this myself. Ultimately my goal was less to set a default file in a directory, and more to have the the end result of a file that was served without .html at the end of it
I ended up removing .html from the filename and programatically/manually set the mime type to text/html. It is not the traditional way, but it does seem to work, and satisfies my requirements for the pretty urls without sacrificing the benefits of cloudformation. Setting the mime type is annoying, but a small price to pay for the benefits in my opinion

#johan-gorter indicated above that CloudFront serves file with keys ending by /
After investigation, it appears that this option works, and that one can create this type of files in S3 programatically. Therefore, I wrote a small lambda that is triggered when a file is created on S3, with a suffix index.html or index.htm
What it does is copying an object dir/subdir/index.html into an object dir/subdir/
import json
import boto3
s3_client = boto3.client("s3")
def lambda_handler(event, context):
for f in event['Records']:
bucket_name = f['s3']['bucket']['name']
key_name = f['s3']['object']['key']
source_object = {'Bucket': bucket_name, 'Key': key_name}
file_key_name = False
if key_name[-10:].lower() == "index.html" and key_name.lower() != "index.html":
file_key_name = key_name[0:-10]
elif key_name[-9:].lower() == "index.htm" and key_name.lower() != "index.htm":
file_key_name = key_name[0:-9]
if file_key_name:
s3_client.copy_object(CopySource=source_object, Bucket=bucket_name, Key=file_key_name)

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js