How to get clean URLs on AWS Cloudfront (S3)? - amazon-web-services

I'm hosting my static website on AWS S3, with Cloudfront as a CDN, and I'm wondering how I can get clean URLs working.
I currently have to go to example.com/about.html to get the about page. I'd prefer example.com/about as well as across all my other pages. Also, I kind of have to do this because my canonical URLs have been set with meta tags and search engines, and it's gonna be a bit much to go changing them.
Is there a setting in Cloudfront that I'm not seeing?
Updates
There are two options I've explored, one detailed by Matt below.
First is trimming .html off the file name before uploading to S3 and then setting the Content-Type header on that object so it is still served as HTML. This might work beautifully, but I can't figure out how to set that header from the command line, where I'm writing my "push website update" bash script.
Second is detailed by Matt below and leverages S3's feature that recognizes root default files, usually index.html. Might be a great approach, but it makes my local testing challenging, and it leaves a trailing slash on the URLs which doesn't work for me.
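For the first option, the content type can be supplied at upload time instead of being edited afterwards (the AWS CLI's `aws s3 cp` accepts a `--content-type` flag for this). The sketch below only illustrates the mapping from a local file name to an extensionless key plus an explicit content type. The helper name `toUploadParams`, the bucket name, and the small content-type table are assumptions for illustration, and the actual upload call is left out.

```javascript
// Hypothetical helper: derive S3 upload parameters that give a page a clean URL.
const path = require('path');

// Minimal, illustrative content-type table (extend as needed).
const CONTENT_TYPES = {
  '.html': 'text/html; charset=utf-8',
  '.css': 'text/css',
  '.js': 'application/javascript',
};

function toUploadParams(localFile, bucket) {
  // Use forward slashes for the object key regardless of the local OS.
  const normalized = localFile.split(path.sep).join('/');
  const ext = path.posix.extname(normalized);
  // Strip ".html" from pages, but keep index.html so index documents still work.
  const isPage = ext === '.html' && path.posix.basename(normalized) !== 'index.html';
  return {
    Bucket: bucket,
    Key: isPage ? normalized.slice(0, -ext.length) : normalized,
    ContentType: CONTENT_TYPES[ext] || 'application/octet-stream',
  };
}

console.log(toUploadParams('about.html', 'example-bucket'));
```

The returned object has the shape of the parameters an S3 upload call takes, so `about.html` would be stored under the key `about` while still being served as HTML.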

Try AWS Lambda@Edge. It solves this completely.
First, create an AWS Lambda function and then attach your CloudFront distribution as a trigger.
In the code section of the AWS Lambda page, add the snippet from the repository below.
https://github.com/CloudUnder/lambda-edge-nice-urls/blob/master/lambdaRewrite.js
Note the options in the readme section of the repo

When you host your website in S3 (and by extension CloudFront), you can configure S3 to have a "default" file to load when a directory is requested. This is called the "index document".
For example, you can configure S3 to load index.html as the default file. This way, if the request was for example.com/abc/, then it would load abc/index.html.
When you do this, if they requested example.com/abc/123.html, then it will serve up abc/123.html. So the default file only applies when a folder is requested.
To address your request for example.com/about/, you could configure your bucket with a default file of index.html, and put about/index.html in your bucket.
More information can be found in Amazon's documentation: Index Document Support.
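To make the index document behaviour concrete, here is a toy model of how a website endpoint resolves a request. It is a simplified sketch, not S3's implementation: `keys` is a stand-in for the bucket contents and `resolveWebsiteRequest` is a made-up name.

```javascript
// Toy model of index document lookup on an S3 website endpoint.
function resolveWebsiteRequest(requestPath, keys, indexDocument = 'index.html') {
  const key = requestPath.replace(/^\//, '');
  // Root or folder request: look for the index document inside it.
  if (key === '' || key.endsWith('/')) {
    const candidate = key + indexDocument;
    return keys.has(candidate) ? { status: 200, key: candidate } : { status: 404 };
  }
  // Exact object match wins.
  if (keys.has(key)) return { status: 200, key };
  // No trailing slash, but a folder with an index document exists:
  // the website endpoint redirects to the slash-terminated URL.
  if (keys.has(key + '/' + indexDocument)) {
    return { status: 302, location: '/' + key + '/' };
  }
  return { status: 404 };
}

const keys = new Set(['index.html', 'abc/index.html', 'abc/123.html']);
console.log(resolveWebsiteRequest('/abc/', keys));
console.log(resolveWebsiteRequest('/abc', keys));
```

The redirect in the third branch is where the trailing slash mentioned in the question comes from.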

You can overcome ugly urls by using a custom origin, when your S3 bucket is configured as a website endpoint, with your Cloudfront distribution. The downside is that you can't configure Cloudfront to use HTTPS to communicate between Cloudfront and your origin. You can still use HTTPS, just not end-to-end encryption.
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/using-https-cloudfront-to-s3-origin.html

You can use Lambda as a reverse proxy to your website.
In API Gateway you need to create a "proxy resource" with resource path = "{proxy+}"
Then, create a Lambda function to route the requests:
const AWS = require('aws-sdk');
const s3 = new AWS.S3();
const myBucket = 'myBucket';
exports.handler = async (event) => {
var responseBody = "";
if (event.path=="/") {
responseBody = "<h1>My Landing Page</h1>";
responseBody += "<a href='/about'>link to about page</a>";
return buildResponse(200, responseBody);
}
if (event.path == "/about") {
var params = {
Bucket: myBucket,
Key: 'path/to/about.html',
};
const data = await s3.getObject(params).promise();
return buildResponse(200, data.Body.toString('utf-8'));
}
return buildResponse(404, 'Page Not Found');
};
function buildResponse(statusCode, responseBody) {
var response = {
"isBase64Encoded": false,
"statusCode": statusCode,
"headers": {
"Content-Type" : "text/html; charset=utf-8"
},
"body": responseBody,
};
return response;
}
Then you can create a CloudFront distribution for your API Gateway, using your custom domain.
For more details check this answer:
https://stackoverflow.com/a/57913763/2444386
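The handler above hard-codes each route in its own `if` block. One way to keep that manageable as pages are added is a lookup table from clean path to object key. This is only a sketch: the `ROUTES` entries and the `keyForPath` helper are hypothetical and not part of the original answer.

```javascript
// Hypothetical route table: clean path -> S3 object key.
const ROUTES = {
  '/': 'path/to/index.html',
  '/about': 'path/to/about.html',
};

function keyForPath(requestPath) {
  // Treat "/about/" the same as "/about", but leave the root path as it is.
  const normalized = requestPath.length > 1
    ? requestPath.replace(/\/+$/, '')
    : requestPath;
  return Object.prototype.hasOwnProperty.call(ROUTES, normalized)
    ? ROUTES[normalized]
    : null;
}

console.log(keyForPath('/about/'));
```

Inside the Lambda handler, a `null` result would map to the 404 response and any other value would be passed as `Key` to `s3.getObject`.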

Related

Hosting multiple SPA web apps on S3 + Cloudfront under same URL

I have two static web apps (create-react-apps) that are currently in two separate S3 buckets. Both buckets are configured for public read + static web hosting, and visiting their S3 hosted URLs correctly display the sites.
Bucket 1 - First App:
index.html
static/js/main.js
Bucket 2 - Second App:
/secondapp/
index.html
static/js/main.js
I have setup a single Cloudfront for this - The default cloudfront origin loads FirstApp correctly, such that www.mywebsite.com loads the index.html by default.
For the SecondApp, I have set up a Cache Behavior so that the path pattern secondapp/* points to the SecondApp bucket URL.
In my browser, when I visit www.mywebsite.com/secondapp/ it correctly displays the second web app.
If I omit the trailing slash however, I instead see the First App, which is undesired.
If I visit www.mywebsite.com/secondapp/something, I am also shown the First App, which is also undesired. (I want it to load the .html of secondapp)
Both apps are configured to use html5 push state via react-router-dom.
My desired behavior is that visiting the following displays the correct site/bucket:
www.mywebsite.com - Currently working
www.mywebsite.com/secondapp/ - Currently working
www.mywebsite.com/secondapp - (Without trailing slash) Not working, shows First App
www.mywebsite.com/secondapp/something_else - Not working, show First App
How can I achieve the desired behavior?
Thanks!
After researching this issue, I was able to resolve it using Lambda@Edge (https://aws.amazon.com/lambda/edge/).
By deploying a simple JavaScript function to route specific paths to the desired S3 bucket, we are able to achieve an nginx-like routing setup.
The function runs in Lambda@Edge on our CloudFront CDN, and you can specify when it is triggered. For us, it's on "Origin Request".
My setup is as follows:
I used a single s3 bucket, and deployed my second-app in a subfolder "second-app"
I created a new Lambda function in "US East (N. Virginia)" (us-east-1). The region is important here, as Lambda@Edge functions can only be created in this region.
See below for the actual Lambda function
Once created, go to your CloudFront configuration and go to "Behaviors > Select the Default (*) path pattern and hit Edit"
Scroll to the bottom where there is "Lambda Function Associations"
Select "Origin Request" from the drop down
Enter the address for your lambda function (arn:aws:lambda:us-east-1:12345667890:function:my-function-name)
Here is an example of the lambda function I used.
var path = require('path');
exports.handler = (event, context, callback) => {
// Extract the request from the CloudFront event that is sent to Lambda@Edge
var request = event.Records[0].cf.request;
const parsedPath = path.parse(request.uri);
// If there is no extension present, attempt to rewrite url
if (parsedPath.ext === '') {
// Extract the URI from the request
var olduri = request.uri;
// Match 'second-app' and everything after it, and replace it with the app's index page
var newuri = olduri.replace(/second-app.*/, 'second-app/index.html');
// Replace the received URI with the URI that includes the index page
request.uri = newuri;
}
// If an extension was present, we are trying to load a static asset, so allow the request to proceed unchanged
// Return to CloudFront
return callback(null, request);
};
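To see what the rule in this function does before deploying it, the same logic can be pulled into a plain function and run locally with Node. This is a sketch for experimentation; the `rewrite` name and the sample URIs are illustrative.

```javascript
const path = require('path');

// Same rule as the handler above, extracted so it can be run without CloudFront.
function rewrite(uri) {
  // Requests with an extension are static assets: pass them through.
  if (path.posix.parse(uri).ext !== '') return uri;
  // Anything else under second-app is routed to the SPA's index page.
  return uri.replace(/second-app.*/, 'second-app/index.html');
}

const samples = [
  '/second-app',
  '/second-app/something',
  '/second-app/static/js/main.js',
  '/about',
];
for (const uri of samples) {
  console.log(uri, '->', rewrite(uri));
}
```

Note that paths outside `second-app` without an extension, such as `/about`, pass through unchanged, so the first app's client-side routes still need their own handling.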
These are the resources I used for this solution:
https://aws.amazon.com/blogs/compute/implementing-default-directory-indexes-in-amazon-s3-backed-amazon-cloudfront-origins-using-lambdaedge/
https://github.com/riboseinc/terraform-aws-s3-cloudfront-website/issues/1
How do you set a default root object for subdirectories for a statically hosted website on Cloudfront?

Redirect to index.html for S3 subfolder

I have a domain example.com. I have an S3 bucket named example.com set up with an index.html file that works. Now I would like to create two subfolders called old and new, each containing a separate version of a single page application. Requesting https://example.com/old (I would like to omit the index.html when entering the URL in the browser's address bar) would open the index.html file in the old subfolder, and requesting https://example.com/new would open the index.html file in the new subfolder. What is the best way of doing these redirects? Should I set something up in Route 53 (example.com/old -> example.com/old/index.html), or is there a better way of doing it?
No need for a lambda function adding expense and complexity to your project.
The following answer is quoted from https://stevepapa.com/
https://stevepapa.com/my-great-new-post/ would be expected to work the same way as: https://stevepapa.com/my-great-new-post/index.html
There’s a clever little way to get these flowing through to the Cloudfront distribution, and it involves changing the source origin from the one that Cloudfront presents to you by default.
When selecting the origin source Cloudfront will show you a list of S3 buckets.
Instead of setting the source from the bucket shown in the dropdown list, you’ll need to grab the static web hosting endpoint for that resource from its S3 settings page and pop it in manually.
Using the static source for the Cloudfront distribution origin means any request to that distribution will be using the S3’s root object lookup, and your 404 responses should disappear as the references flow through.
Important
After doing this:
Clear your browser cache
Invalidate the items in your Cloudfront distribution
Otherwise, the changes you made won't go live immediately.
So I had this problem last night too.
The issue is as follows:
S3 when configured as a website bucket is forgiving and has the index document setting, set to index.html and this gets applied at the root, ie, example.com actually gets redirected to example.com/index.html, and it also gets applied at the subfolder level, so example.com/new or example.com/new/ should both redirect to example.com/new/index.html, where there would be an object in the bucket. (If not, you'd get a NoSuchKey error instead.)
However you then "upgrade" yourself to CloudFront, likely for HTTPS, and this feature goes away. CloudFront instead makes explicit API calls to S3 and therefore doesn't trigger the index document concession. It does work for the root, but not for subfolders.
The RoutingRules solution doesn't look clean to me because by specifying KeyPrefixEquals rather than key exactly equals (which doesn't exist) I think you'd get unintended matches.
I have instead implemented a Lambda@Edge rule that rewrites the request that CloudFront makes to S3 to have a proper key value in it.
Start with the Lambda docs and the A/B testing example here:
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/lambda-examples.html#lambda-examples-general-examples
Change the code to:
'use strict';
exports.handler = (event, context, callback) => {
/*
* Expand S3 request to have index.html if it ends in /
*/
const request = event.Records[0].cf.request;
if ((request.uri !== "/") /* Not the root object, which redirects properly */
&& (request.uri.endsWith("/") /* Folder with slash */
|| (request.uri.lastIndexOf(".") < request.uri.lastIndexOf("/")) /* Most likely a folder, it has no extension (heuristic) */
)) {
if (request.uri.endsWith("/"))
request.uri = request.uri.concat("index.html");
else
request.uri = request.uri.concat("/index.html");
}
callback(null, request);
};
And publish it to your CloudFront distribution.
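Because the folder check is a heuristic, it is worth trying it against a few paths first. The sketch below restates the condition from the handler as a plain function (`expandUri` is a made-up name) so its edge cases are easy to see.

```javascript
// Restatement of the handler's rule as a pure function, for local experiments.
function expandUri(uri) {
  const looksLikeFolder =
    uri.endsWith('/') || uri.lastIndexOf('.') < uri.lastIndexOf('/');
  // The root is left alone, as are paths that look like files.
  if (uri === '/' || !looksLikeFolder) return uri;
  return uri.endsWith('/') ? uri + 'index.html' : uri + '/index.html';
}

for (const uri of ['/new', '/new/', '/new/app.js', '/v1.2/docs', '/release-1.2']) {
  console.log(uri, '->', expandUri(uri));
}
```

The last sample shows the limit of the heuristic: a folder name containing a dot in its final segment, such as `/release-1.2`, is treated as a file and left unchanged.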
There is an even easier way to accomplish this with an HTML redirect file
Create a plain file named my-great-new-post (don't worry there won't be a name conflict with the folder in the same bucket)
Write a meta-redirect code in that file (I pasted the code below)
Upload the file to the bucket root (where the my-great-new-post folder lies)
Modify the metadata of the new file and set Content-Type: text/html
Here is the content of the file:
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="refresh" content="0; url=/my-great-new-post/index.html">
</head>
<body>
</body>
</html>
If you're using CloudFront, you can use CloudFront functions to create a simple redirection.
I modified @jkingok's solution.
Go to CloudFront, and click on Functions.
Click on Create function, enter a name and optional description
In the development section, enter the code snippet below and publish from the publish tab.
function handler(event) {
var request = event.request;
if (request.uri !== "/" && (request.uri.endsWith("/") || request.uri.lastIndexOf(".") < request.uri.lastIndexOf("/"))) {
if (request.uri.endsWith("/")) {
request.uri = request.uri.concat("index.html");
} else {
request.uri = request.uri.concat("/index.html");
}
}
return request;
}
Once your function is completed, you can use the function by going to the "Behaviors" tab of your distribution, select the path pattern you want to modify, then under "Function associations", for Viewer Request, select "CloudFront function" as the function type and then select the function you created in the dropdown list.
Once you save the Behaviors, you can test your website.
NOTE: This solution rewrites every URL without an extension to "URL/index.html" (an internal rewrite of the request, not an HTTP redirect); you can modify the function's behaviour to suit your site.
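As one example of such a modification, the variant below sends a real 301 redirect from `/new` to `/new/` and only rewrites slash-terminated paths, which keeps relative links on the page resolving against the folder. This is a sketch under assumptions: it also rewrites the root to `/index.html` (so an `index.html` must exist at the bucket root), and it does not carry query strings over to the redirect target.

```javascript
// Variant sketch for a CloudFront Function attached to Viewer Request.
function handler(event) {
  var request = event.request;
  var uri = request.uri;

  // Folder request: rewrite internally to that folder's index page.
  if (uri.endsWith('/')) {
    request.uri = uri + 'index.html';
    return request;
  }

  // No extension in the last segment: treat it as a folder and redirect
  // the browser to the slash-terminated URL.
  if (uri.lastIndexOf('.') < uri.lastIndexOf('/')) {
    return {
      statusCode: 301,
      statusDescription: 'Moved Permanently',
      headers: { location: { value: uri + '/' } }
    };
  }

  // Anything else (assets with an extension) passes through untouched.
  return request;
}
```

Returning a response object instead of the request is how a viewer-request function answers directly without contacting the origin.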
When you enable and configure static hosting with S3 you need to access the site via the bucket website endpoint. You can find this URL in the bucket properties in the Static website hosting section.
The URL of the website endpoint will look like this:
http://example-bucket.s3-website-eu-west-1.amazonaws.com/example-folder/
However (confusingly) objects stored in S3 are also accessible via a different URL, this url does not honour the index rules on subfolders. This URL looks like this:
https://example-bucket.s3-eu-west-1.amazonaws.com/example-folder/
1. Configure your Bucket to deliver a static website
2. Create a CloudFront Distribution: set your bucket as the Origin and leave the OriginPath empty (default: /)
3. Create Route53 RecordSet which links to your CloudFront Distribution
You can find a helpful walkthrough here
Question: What should happen if your customer enters example.com (without old/new)?
Edit: 2. is optional. You could also link your Route53 RecordSet to your static website but CloudFront enables you to serve your website with https (with help of AWS Certificate Manager).
If you are using CDK to create a CloudFrontWebDistribution with an S3 source, then your first guess is probably to do this:
OriginConfigs = new[] {
new SourceConfiguration {
S3OriginSource = new S3OriginConfig
{
S3BucketSource = bucket
},
Behaviors = new[] { new Behavior { IsDefaultBehavior = true } }
}
}
However, to configure cloudfront to use the website-bucket-url (that does have the behavior to resolve a directory to index.html), you need to use:
OriginConfigs = new[] {
new SourceConfiguration {
CustomOriginSource = new CustomOriginConfig
{
DomainName = bucket.BucketWebsiteDomainName,
OriginProtocolPolicy = OriginProtocolPolicy.HTTP_ONLY
},
Behaviors = new[] { new Behavior { IsDefaultBehavior = true } }
}
}
You need to specify the protocol as HTTP_ONLY because website buckets do not support HTTPS. The default for a CustomOriginSource is HTTPS_ONLY.
You can try setting redirection rules. Here is an untested rule:
<RoutingRules>
<RoutingRule>
<Condition>
<KeyPrefixEquals>old</KeyPrefixEquals>
</Condition>
<Redirect>
<ReplaceKeyWith>old/index.html</ReplaceKeyWith>
</Redirect>
</RoutingRule>
<RoutingRule>
<Condition>
<KeyPrefixEquals>new</KeyPrefixEquals>
</Condition>
<Redirect>
<ReplaceKeyWith>new/index.html</ReplaceKeyWith>
</Redirect>
</RoutingRule>
</RoutingRules>

AWS amazon-s3 rewrite url to query param and go to index.html

I have an S3 bucket hosting a static website with a single .html file (index.html) and some .js and .css.
I want to be able to hit mybucket.s3-website-eu-west-1.amazonaws.com/12345678 and have the url rewritten to mybucket.s3-website-eu-west-1.amazonaws.com/index.html?ID=12345678.
After reading the docs I am not sure I can do this with S3 alone, but maybe I am wrong? What solutions do I have for this?
Note: URL's for the assets (e.g. /css/main.css) need to still work although I could look to inlining everything in the HTML if this is the only option.
This is not directly possible with S3. You can use AWS CloudFront and Lambda@Edge to do the URL rewrites as required.
Create a CloudFront distribution and connect your S3 bucket as an origin.
Create a Lambda@Edge function, associate it with the distribution, and write your URL rewrite rules.
exports.handler = (event, context, callback) => {
// this is the request we want to re-map
var request = event.Records[0].cf.request;
// the request has a 'uri' property which is the value we want to overwrite
// rewrite the url applying your custom logic
request.uri = 'some custom logic here to rewrite the url';
// quit the lambda and let the request chain continue
callback( null, request );
};
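For the specific rewrite in the question, the custom logic could look something like the sketch below. It assumes IDs are purely numeric and that the function is attached to a request event; `rewriteToQuery` is a made-up helper name.

```javascript
// Sketch: turn "/12345678" into "/index.html" with querystring "ID=12345678".
function rewriteToQuery(request) {
  // Only a single all-digit path segment is treated as an ID,
  // so asset paths such as /css/main.css are left untouched.
  const match = /^\/(\d+)$/.exec(request.uri);
  if (match) {
    request.uri = '/index.html';
    request.querystring = 'ID=' + match[1];
  }
  return request;
}

exports.handler = async (event) => rewriteToQuery(event.Records[0].cf.request);
```

Because this is a rewrite and not a redirect, the browser's address bar still shows `/12345678`, so a script in the page would read the ID from `location.pathname`. If the page must see `?ID=...` in its own URL, the function would need to return a redirect response instead.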

API Gateway GET / PUT large files into S3

Following this AWS documentation, I was able to create a new endpoint on my API Gateway that is able to manipulate files on an S3 repository. The problem I'm having is the file size (AWS having a payload limitation of 10MB).
I was wondering, without using a lambda work-around (this link would help with that), would it be possible to upload and get files bigger than 10MB (even as binary if needed) seeing as this is using an S3 service as a proxy - or is the limit regardless?
I've tried PUTting and GETting files bigger than 10MB, and each response is a typical "message": "Timeout waiting for endpoint response".
Looks like Lambda is the only way, just wondering if anyone else got around this, using S3 as a proxy.
Thanks
You can create a Lambda proxy function that will return a redirect link with a S3 pre-signed URL.
Example JavaScript code that generates a pre-signed S3 URL:
var s3Params = {
Bucket: 'test-bucket',
Key: file_name,
ContentType: 'application/octet-stream',
Expires: 10000
};
s3.getSignedUrl('putObject', s3Params, function(err, data){
...
});
Then your Lambda function returns a redirect response to your client, like,
{
"statusCode": 302,
"headers": { "Location": "url" }
}
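Putting the two pieces together, a handler might look roughly like this. It is a sketch with assumptions: the signer is passed in as `signUrl` so the example runs without AWS credentials (in a real function it might wrap `s3.getSignedUrl`), and the `file_name` query parameter and the `makeHandler` name are hypothetical.

```javascript
// Sketch: API Gateway proxy handler that redirects the client to a pre-signed URL.
function makeHandler(signUrl) {
  return async (event) => {
    const params = event.queryStringParameters;
    const fileName = params && params.file_name;
    if (!fileName) {
      return { statusCode: 400, body: 'file_name is required' };
    }
    // signUrl is expected to resolve to a pre-signed S3 URL for this key.
    const url = await signUrl(fileName);
    return { statusCode: 302, headers: { Location: url } };
  };
}

// Usage with a stub signer (the URL below is a placeholder, not a real signed URL).
const handler = makeHandler(async (name) => 'https://example.invalid/' + name);
handler({ queryStringParameters: { file_name: 'big.bin' } }).then(console.log);
```

For uploads, keep in mind that some HTTP clients change the method when following a 302; status 307 is the redirect defined to preserve method and body, so it may be the safer choice for PUT requests.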
You might be able to find more information you need from this documentation.
If you have large files, consider uploading them directly to S3 from your client. You can create an API endpoint that returns a signed URL for the client to use for the upload, which lets you keep access control over your private content.
Also you can consider using multi-part uploads for even larger files to speed up the uploading.

Is it possible to upload to S3 directly from URL using POST?

I know there is a way to upload to S3 directly from the web browser using POST without the files going to your backend server. But is there a way to do it from URL instead of web browser.
Example, upload a file that resides at http://example.com/dude.jpg directly to S3 using post. I mean I don't want to download the asset to my server then upload it to S3. I just want to make a POST request to S3 and it uploads it automatically.
It sounds like you want S3 itself to download the file from a remote server where you only pass the URL of the resource to S3.
This is not currently supported by S3.
It needs an API client to actually transfer the content of the object to S3.
I thought I should share my code to achieve something similar. I was working on the backend but possibly could do something similar in frontend though be mindful about AWS credentials likely to be exposed.
For my purposes, I wanted to download a file from the external URL and then ultimately get back the URL from S3 of the uploaded file instead.
I also used axios in order to get the uploadable format and file-type to get the proper type of the file but that is not the requirement.
Below is the snippet of my code:
async function uploadAttachmentToS3(type, buffer) {
var params = {
//file name you can get from URL or in any other way, you could then pass it as parameter to the function for example if necessary
Key : 'yourfolder/directory/filename',
Body : buffer,
Bucket : BUCKET_NAME,
ContentType : type,
ACL: 'public-read' //becomes a public URL
}
//notice use of the upload function, not the putObject function
return s3.upload(params).promise().then((response) => {
return response.Location
}, (err) => {
return {type: 'error', err: err}
})
}
async function downloadAttachment(url) {
return axios.get(url, {
responseType: 'arraybuffer'
})
.then(response => {
// response.data already holds raw bytes, so no encoding argument is needed
const buffer = Buffer.from(response.data);
return (async () => {
let type = (await FileType.fromBuffer(buffer)).mime
return uploadAttachmentToS3(type, buffer)
})();
})
.catch(err => {
return {type: 'error', err: err}
});
}
let myS3Url = await downloadAttachment(url)
I hope it helps people who still struggle with similar issues. Good luck!
I found this article with some details. You will probably have to modify your buckets' security settings in some fashion to allow this type of interaction.
http://aws.amazon.com/articles/1434
There will be some security issues on the client as well since you never want your keys publicly accessible
You can use rclone to achieve this easily:
https://rclone.org/commands/rclone_copyurl/
Create a new access key on AWS for rclone and use rclone config like this:
https://rclone.org/s3/
Then, you can easily interact with your S3 buckets using rclone.
To upload from URL:
rclone copyurl -Pva {URL} RCLONE_CONFIG_NAME:/{BUCKET_NAME}/{FOLDER}/
It is quite handy for me as I am archiving my old files from Dropbox Business to S3 Glacier Deep Archive to save on Dropbox costs.
I can easily create a file transfer from Dropbox (100GB per file limit), copy the download link and upload directly to S3 using rclone.
It is copying at 10-12 MiB/s on a small DigitalOcean droplet.
If you are able you can use Cloudinary as an alternative to S3. They support remote upload via URL and more.
https://cloudinary.com/documentation/image_upload_api_reference#upload_examples