Hope you're all doing well!
I have a question I'm hoping to get some help with. I have a static site served through S3 with CloudFront distributions in front.
My main site is served on www.xyz.xyz and the cloudfront distribution connected ha a behavior http to https redirect.
Then I also want people to be able to access http://xyz.xyz, therefore I have created another bucket for the naked domain, with a redirect policy to www.xyz.xyz with http as protocol. In the CloudFront distribution connected to this the origin is the direct S3 website link, and not the bucket.
In the end this ensures all guests end at https://www.xyz.xyz, however when running Google Lighthouse for a SEO check, if I enter http://xyz.xyz it seems to go through 2 redirects, one to https and one to www and I'm assuming, according to Lighthouse, that this has some negative effects in that regard, both in terms of time to serve, but also SEO.
Am I doing something wrong? I hope you can help me. I really thought it was simpler, also with all the buckets and such :-)
I noticed in AWS Amplify you need to setup redirect/rewrites, but I guess in S3 + CloudFront terms, that's what I'm already doing.
Best,
To maintain compatibility with HSTS, you must perform your redirection in two steps. The first redirect should upgrade the request to https. The second can canonicalize the domain (add or remove www). So this behavior is desirable.
Related
NOTE: I'm providing details of my setup, but really this is a "how is this possible" question, not a "please debug my setup" question.
I have a "singe page application" (ie. an HTML file that uses the History API to simulate URLs). I'm serving this app on AWS S3, behind an AWS Cloudfront ... front.
I had successfully configured things so that if someone went to www.example.com/foo (let's pretend I own example.com), Cloudfront would serve an "error page" of my index.html. My index.html would then see the URL, and use its routing to show the user the correct page.
That all worked great ... until it didn't. Now for some reason when I go to www.example.com/foo, I get redirected to www.example.com. I'm trying to debug things, but what I can't understand is how I'm going from /foo to the main page.
When I look in the Network panel of my developer tools, I can see the request made to the original (/foo). Then I can see the chain of requests (for images, css files, etc.), and they all have a referrer of www.example.com/foo.
Then all of the sudden I see a request for React Developer tools (why it needs to make a request is beyond me) ... and it's from referrer www.example.com. After that I get one last image request from /foo, and then all subsequent requests come from www.example.com.
Can anyone explain how this could be working? I know that if a server returns a redirect (either type) that could change my URL ... but every request has a 200 status (ie. no server redirects).
I know Javascript could "push" a new URL to my browser ... but that would leave a history entry right? When I go "back" (either with my browser or history.back()) I go to the page before; I don't go "back" to /foo.
So somehow I'm not making a history entry, but I am switching my URL, and the URL I make requests from, and this all happens within milliseconds on page load ... without any redirects. How?
P.S. When I use my dev tools to add an beforeunload breakpoint, then try to navigate from example.com to example.com/foo I don't hit that break point (either for going to /foo, or when I'm "redirected" back to example.com).
When I check the box for any Load event, I do see some happen ... after my URL has already switched. In other words, I type example.com/foo, hit enter, and by the time any event fires I'm back on example.com. Whatever mechanism is doing the "redirection" here ... it doesn't trigger any load events.
I figured out my (AWS-specific) problem, thanks to a bit of Gatsby documentation. I'll include the details below in case it helps others, but I won't accept this answer, as I still don't understand how AWS did what it did (and I'd still welcome an answer for that).
What happened was that I had my Cloudfront "Origin Domain Name and Path" pointing to:
example.com.s3.amazonaws.com
However, as explained on https://www.gatsbyjs.com/docs/deploying-to-s3-cloudfront/:
There are two ways that you can connect CloudFront to an S3 origin. The most obvious way, which the AWS Console will suggest, is to type the bucket name in the Origin Domain Name field. This sets up an S3 origin, and allows you to configure CloudFront to use IAM to access your bucket. Unfortunately, it also makes it impossible to perform serverside (301/302) redirects, and it also means that directory indexes (having index.html be served when someone tries to access a directory) will only work in the root directory. You might not initially notice these issues, because Gatsby’s clientside JavaScript compensates for the latter and plugins such as gatsby-plugin-meta-redirect can compensate for the former. But just because you can’t see these issues, doesn’t mean they won’t affect search engines.
In order for all the features of your site to work correctly, you must instead use your S3 bucket’s Static Website Hosting Endpoint as the CloudFront origin. This does (sadly) mean that your bucket will have to be configured for public-read, because when CloudFront is using an S3 Static Website Hosting Endpoint address as the Origin, it’s incapable of authenticating via IAM.
Once I changed my Cloudfront "Origin Domain Name and Path" to the bucket's static hosting URL:
http://example.com.s3-website-us-west-1.amazonaws.com
Everything worked!
But again, I still don't understand how AWS did what it did when I mis-set my "Origin Domain Name and Path". It redirected me to my root domain, seemingly without either a redirect response OR a client-side redirect, and I'd love to hear how that was accomplished.
For example:
About www.abc.com , and en.abc.com
I want to know how to configure the CDN or somethings, make the CDN only works in www.abc.com,
for en.abc.com don't works.
I am using aliyun.com as my cdn provider.
How about the NGINX or Django or Domain or CDN settings?
CDN systems always 'only' work on their configured hostnames. Basically, a CDN is a reverse proxy with a set of rules on it. For any request coming in, it has to know
where to fetch the content from
which additional logic to apply to the content when delivering it
If you want to use a different hostname on the CDN, you will have to make the CDN work, all other components in your web site delivery will not be reached if the CDN configuration doesn't proxy the request to your web server.
I am not familiar with aliyun.com specifically, but there might be a chance to have them set up a wildcard/regex hostname (like *.example.com). You will have to get suport from aliyun to understand if this is possible.
I'm trying to make Cloudfront work on my solution. I'm using Route 53 + CloudFront + ELB.
Consider the following:
1. Route 53 is pointing to CloudFront through a record set alias.
2. CloudFront is pointing to the ELB through a origin domain name.
3. CloudFront has an Alternate Domain Name set to my custom domain (mysite.com)
If I make a request using the CloudFront domain name (d1ngxxxx.cloudfront.net) or the custom domain (mysite.com), the initial request goes to CloudFront which responds with a HTTP 302. All the subsequent requests (for resources like images, css, js..) are made directly to the ELB domain name bypassing CloudFront.
What should I do to make all requests go throuhg CloudFront?
Thanks is advance!
I can't come up with a circumstance where Cloudfront would issue these redirects.
It seems likely that what's happening is that your server itself is issuing the 302 redirect, because it doesn't like the Host: header it's getting from Cloudfront.
Host: CloudFront sets the value to the domain name of the origin that is associated with the requested object.
— http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/RequestAndResponseBehaviorCustomOrigin.html
Cloudfront is then returning the redirect to the browser.
Cloudfront can also cache such a redirect, so be mindful of that as you're troubleshooting. The response headers should indicate whether cloudfront went to the origin for the particular reponse:
X-Cache: Miss from cloudfront
...or whether cloudfront served the request from cache.
X-Cache: Hit from cloudfront
Two possible approaches to resolve this:
If your legacy code is reacting to the Host: header in a negative way, you might be able to reconfigure the web server to modify that value before the code is able to see it, so the redirection wouldn't occur.
Alternately, you could use something outboard, a reverse-proxying engine like Varnish or HAProxy (of which I have touched on elsewhere). In HAProxy, for a simple example:
reqirep ^Host:\ .* Host:\ expected-domain.example.com if { hdr(host) -i unexpected-domain.example.com }
A rule in form similar to this would replace the Host: unexpected-domain.example.com header with Host: expected-domain.example.com in all incoming requests where that header was present, which should keep your legacy code happy and avoid the redirects. Running HAProxy in front of your legacy system doesn't impose a significant load, since the code is very tight. All of my legacy web systems are now fronted with these systems, to give me the ability to manipulate and modify behavior much more easily than might otherwise be possible.
So we are using the Meteor browser-policy package, and using Amazon S3 to store content.
On the server we have setup the browser policy as follows:
BrowserPolicy.content.allowOriginForAll('*.amazonaws.com');
BrowserPolicy.content.allowOriginForAll('*.s3.amazonaws.com');
This works fine in local dev and in production when visiting our http:// site. However when using the https:// address to our site the AWS content no longer passes this policy.
The following error is put on the console
Refused to load the image 'http://our-bucket-name.s3.amazonaws.com/asset-stored-in-s3.png' because it violates the following Content Security Policy directive: "img-src data: 'self' *.google-analytics.com *.zencdn.net *.filepicker.io *.uservoice.com *.amazonaws.com *.s3.amazonaws.com".
As you can see we have some other origins allowed in the browser policy, these all seem to work fine in both http and https. AWS S3 is the only one that is failing.
I've tried Chrome, Firefox, and Safari and they all have the same issue.
Whats going on?
I may not have the exact answer to this question but I have some information which the community may find helpful.
First, you should avoid serving mixed content. I'm unclear if that would set off the browser policy alerts but you just shouldn't do it anyway. The easiest solution is to use a protocol-relative-url or just explicitly specify https in your url.
Second, I too assumed that the wildcard worked like a glob. However, I've been told that it works the same way as an ssl certificate rule - i.e. for all subdomains or for a specific subdomain. In other words, *.example.com and www.example.com, are valid but *.foo.example.com, isn't meaningful. I think you want to explicitly add your bucket like so:
BrowserPolicy.content.allowOriginForAll('our-bucket-name.s3.amazonaws.com')
unless you literally want to trust all of amazonaws.com.
I know this is an apples and oranges question but I'd like to understand the pros and cons of using https and signed urls with AWS Cloudfront. Might people please comment on and add to this list?
HTTPS
PROS
Security: https is more secure than http. Though, I'm not sure what this mean b/c if you can't trust that the URL is actually from Amazon, who can you trust?
Preserve your application's status quo: Your site is already fully https for another reason, like you handle credit cards. Using https for cloudfront prevents alerting the user that you are serving insecure content, i.e., the dreaded "yellow" indicator symbol. Could this also be a con if you're site is fully http (honest question)?
Degree of difficulty: 0/10. Just change http to https in your url, it works either way out of the box. On the other hand, if you want to use your own CNAME with https, this seems significantly more confusing, 7/10, though I haven't tried it due to con #1 below...
CONS
Cost: $600/month !! to use https with own CNAME, e.g., images.mysite.com instead of blah123.cloudfront.com. On the other hand, my understanding is that using CNAMEs with http is free?
SIGNED URLS
PROS
REAL security: signed urls would seem the most commonly needed method to control who has access to your site's content. You can control things like the user IP address and the time duration of who has access.
Cost: none
CONS
Degree of difficulty: 9/10. Creating signed urls is relatively confusing. There's lots of terminology to learn and possibly some libraries not part of the AWS SDK you'll need to track down.
HTTPS helps secure data in transit, which is helpful if you are already using SSL for access to your application. With the CNAME issue, most people are likely not going to realize that your images and other static content are being delivered from cloudfront.net instead of yourdomain.com
Signing URLs only helps control who can access a given file and how long they can access it for. You may use this for delivery digital purchases, or other private files to logged in users. You also loose some of the caching benefit of cloudfront.