AWS S3 / CloudFront deployment - certain paths aren't updating on static website

I'm fairly new to AWS and web dev - I have a simple static website at https://iveyfintechclub.ca
I recently refactored the code and made some changes to the project organization. I basically wiped the S3 bucket and reuploaded all new files and folders.
On CloudFront, I have object caching set to use origin cache headers:
[Image: my CloudFront distribution behavior config]
I also did an invalidation with /*.
On S3, I've set the metadata Cache-Control to max-age=0 for all files.
Two problems are still eluding me:
1. The old bucket had a blank index.html which redirected to a nested HTML file. The new bucket has index.html as the landing page. When I attempt to visit the root URL, I get a 404 error as it still attempts to reach the old nested HTML path. This doesn't happen in incognito mode (browser cache issue).
2. On the new landing page, I have a script file which is getting a 404 error as it's looking for the file on its old path. Inspecting the HTML shows that the new path is in the client. This is happening in incognito mode too. All other resources are loading properly with new paths; just this one is failing.
I'm wondering if I just have to wait longer or if I'm still missing a configuration.
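For anyone reproducing this setup, the two settings mentioned above (an invalidation of /* and Cache-Control: max-age=0 on the objects) can be sketched in Python. This is only a sketch: the helper names and the bucket name are made up for illustration, and the dictionaries follow the request shapes that boto3's create_invalidation and put_object calls take, as far as I know. Nothing here actually talks to AWS.

```python
import time


def build_invalidation_batch(paths):
    """Build the InvalidationBatch structure for a CloudFront invalidation."""
    return {
        "Paths": {"Quantity": len(paths), "Items": list(paths)},
        # CallerReference must be unique per invalidation request.
        "CallerReference": f"deploy-{time.time_ns()}",
    }


def build_upload_args(bucket, key, max_age=0):
    """Build keyword arguments for an S3 upload that sets Cache-Control."""
    return {"Bucket": bucket, "Key": key, "CacheControl": f"max-age={max_age}"}


batch = build_invalidation_batch(["/*"])
# "my-site-bucket" is a hypothetical bucket name.
upload = build_upload_args("my-site-bucket", "index.html")
print(batch["Paths"])
print(upload["CacheControl"])
```

Note that neither setting reaches into a visitor's browser cache, which is consistent with the first problem only showing up outside incognito mode.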

Related

Amazon S3 SOAP in URL returns empty 200

We have a static site hosted on Amazon S3 with a CloudFront CDN. There is a 404 error page in CloudFront that redirects to the site to be handled on the client side (react routing).
All 404 pages work correctly, but if I have "/soap" in the URL, it returns a 200 and a blank response. I believe S3 is returning a 200 instead of a 404 when /soap is in the URL.
How would I prevent S3 from intercepting /soap requests so that it returns a 404?
This problem occurs when the directory layout in the bucket is not set up properly. Instead of copying the build output directly into the root of the S3 bucket, first create a folder with a random name and put the build output inside that folder.
Next, set the Origin Path in your CloudFront distribution. Open the distribution in the CloudFront console, select the Origins tab, select your origin, and click Edit; the Origin Path field is under Settings. Enter your folder name there with a leading '/' (for example: /Random Folder). Now, when you request a URL containing "/soap", it will be routed to your error page.
For details on setting the Origin Path, you can refer to the following link:
Setting Origin Path in AWS Cloud Front Distribution
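To illustrate the effect of the Origin Path: CloudFront prepends it to the path of each viewer request before contacting the origin. A minimal sketch in Python, assuming a hypothetical folder named /site-build (the function name is also made up for illustration):

```python
def origin_object_path(origin_path, request_uri):
    """Return the path CloudFront requests from the origin for a viewer URI."""
    # The origin path is prepended to the viewer's request path;
    # a trailing slash on the origin path is dropped to avoid "//".
    return origin_path.rstrip("/") + request_uri


print(origin_object_path("/site-build", "/soap"))
print(origin_object_path("/site-build", "/index.html"))
```

With the origin path set, a request for /soap reaches S3 as /site-build/soap rather than as /soap at the bucket root.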

Understanding Server/Client Routing: How Can Amazon(?) Be Redirecting My SPA ... Without a Redirect (or History Entry)?

NOTE: I'm providing details of my setup, but really this is a "how is this possible" question, not a "please debug my setup" question.
I have a "single-page application" (i.e. an HTML file that uses the History API to simulate URLs). I'm serving this app on AWS S3, behind an AWS CloudFront ... front.
I had successfully configured things so that if someone went to www.example.com/foo (let's pretend I own example.com), Cloudfront would serve an "error page" of my index.html. My index.html would then see the URL, and use its routing to show the user the correct page.
That all worked great ... until it didn't. Now for some reason when I go to www.example.com/foo, I get redirected to www.example.com. I'm trying to debug things, but what I can't understand is how I'm going from /foo to the main page.
When I look in the Network panel of my developer tools, I can see the request made to the original (/foo). Then I can see the chain of requests (for images, css files, etc.), and they all have a referrer of www.example.com/foo.
Then all of a sudden I see a request for React Developer Tools (why it needs to make a request is beyond me) ... and it's from referrer www.example.com. After that I get one last image request from /foo, and then all subsequent requests come from www.example.com.
Can anyone explain how this could be working? I know that if a server returns a redirect (either type) that could change my URL ... but every request has a 200 status (ie. no server redirects).
I know JavaScript could "push" a new URL to my browser ... but that would leave a history entry, right? When I go "back" (either with my browser or history.back()) I go to the page before; I don't go "back" to /foo.
So somehow I'm not making a history entry, but I am switching my URL, and the URL I make requests from, and this all happens within milliseconds on page load ... without any redirects. How?
P.S. When I use my dev tools to add a beforeunload breakpoint, then try to navigate from example.com to example.com/foo, I don't hit that breakpoint (either for going to /foo, or when I'm "redirected" back to example.com).
When I check the box for any Load event, I do see some happen ... after my URL has already switched. In other words, I type example.com/foo, hit enter, and by the time any event fires I'm back on example.com. Whatever mechanism is doing the "redirection" here ... it doesn't trigger any load events.
I figured out my (AWS-specific) problem, thanks to a bit of Gatsby documentation. I'll include the details below in case it helps others, but I won't accept this answer, as I still don't understand how AWS did what it did (and I'd still welcome an answer for that).
What happened was that I had my Cloudfront "Origin Domain Name and Path" pointing to:
example.com.s3.amazonaws.com
However, as explained on https://www.gatsbyjs.com/docs/deploying-to-s3-cloudfront/:
There are two ways that you can connect CloudFront to an S3 origin. The most obvious way, which the AWS Console will suggest, is to type the bucket name in the Origin Domain Name field. This sets up an S3 origin, and allows you to configure CloudFront to use IAM to access your bucket. Unfortunately, it also makes it impossible to perform serverside (301/302) redirects, and it also means that directory indexes (having index.html be served when someone tries to access a directory) will only work in the root directory. You might not initially notice these issues, because Gatsby’s clientside JavaScript compensates for the latter and plugins such as gatsby-plugin-meta-redirect can compensate for the former. But just because you can’t see these issues, doesn’t mean they won’t affect search engines.
In order for all the features of your site to work correctly, you must instead use your S3 bucket’s Static Website Hosting Endpoint as the CloudFront origin. This does (sadly) mean that your bucket will have to be configured for public-read, because when CloudFront is using an S3 Static Website Hosting Endpoint address as the Origin, it’s incapable of authenticating via IAM.
Once I changed my Cloudfront "Origin Domain Name and Path" to the bucket's static hosting URL:
http://example.com.s3-website-us-west-1.amazonaws.com
Everything worked!
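In case it helps with checking which kind of origin a distribution points at, here is a rough sketch in Python that tells the two hostname styles apart. The function name and the regular expressions are my own and only cover the hostname shapes shown above plus the regional REST form; treat it as an illustration, not an exhaustive list of S3 endpoint formats.

```python
import re
from urllib.parse import urlsplit

WEBSITE_RE = re.compile(r"\.s3-website[.-][a-z0-9-]+\.amazonaws\.com$")
REST_RE = re.compile(r"\.s3([.-][a-z0-9-]+)?\.amazonaws\.com$")


def endpoint_kind(origin):
    """Classify an origin as 'website', 'rest', or 'other' by its hostname."""
    host = urlsplit(origin).hostname if "://" in origin else origin
    host = host.lower()
    # Check the website pattern first: the REST pattern would also match it.
    if WEBSITE_RE.search(host):
        return "website"
    if REST_RE.search(host):
        return "rest"
    return "other"


print(endpoint_kind("example.com.s3.amazonaws.com"))
print(endpoint_kind("http://example.com.s3-website-us-west-1.amazonaws.com"))
```

The first call reports the REST-style endpoint I started with, and the second reports the static website hosting endpoint that fixed things.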
But again, I still don't understand how AWS did what it did when I mis-set my "Origin Domain Name and Path". It redirected me to my root domain, seemingly without either a redirect response OR a client-side redirect, and I'd love to hear how that was accomplished.

Subdirectory pages not found static site hosted on Google Cloud Storage Bucket

I'm setting up a static site on a Google Cloud Storage Bucket with Loadbalancer. The site gets generated with Gridsome and then the dist folder gets saved in the bucket.
I have set the index and error pages with gsutil as described in the documentation: https://cloud.google.com/storage/docs/gsutil/commands/web
Now I am facing a problem: every URL for a subdirectory gets redirected to dir/index.html. This is the desired behavior, and the dir/index.html page even exists in the bucket. But I still get a 404 - not found.
If I do a curl to the URL subdir/index.html, I get the HTML.
I do not know exactly how you are testing your subfolder, but I think this link can help with your issue: Error 404 when loading subfolder on GCS. You may also want to take a look at How subdirectories work.
Based on How subdirectories work on GCS, when the browser requests the URL http://www.example.com/dir, it will be redirected (301) to the object http://www.example.com/dir/index.html, which is served as the content.
My assumption is that there is no route for http://www.example.com/dir/index.html in Vue (vue-router), so the request falls through to the 404 Not Found page.
The simple solution is to change all subdirectory links from
http://www.example.com/dir, http://www.example.com/about, etc. to
http://www.example.com/dir/, http://www.example.com/about/
Then requesting a subdirectory URL or reloading the browser will not redirect to the 404 page. But we all know that it's not best practice.
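The link change above can be sketched as a small helper. This is a minimal Python sketch, assuming that any last path segment containing a dot is a file name; the function name is made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def with_trailing_slash(url):
    """Append '/' to directory-style URLs; leave file URLs untouched."""
    parts = urlsplit(url)
    last_segment = parts.path.rsplit("/", 1)[-1]
    # A last segment containing a dot is treated as a file name (e.g. index.html).
    if parts.path.endswith("/") or "." in last_segment:
        return url
    return urlunsplit(parts._replace(path=parts.path + "/"))


print(with_trailing_slash("http://www.example.com/dir"))
print(with_trailing_slash("http://www.example.com/about/"))
```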

Static content on CloudFront is cached incorrectly over time

I have set up a CloudFront distribution on top of multiple S3 buckets (in different regions) to provide a fast, stable version of my webapp. This webapp is implemented with React, which means it's all one single HTML file and one single JavaScript file.
Using the routing mechanism of React, all the paths in the URL are handled within the code. This means if I click on a link like www.example.com/users, there won't be a request sent to the server. Instead, the client code will render the appropriate page without any consultation with the server (I'm just talking about the HTML and not considering the data). This means that if some user types in the given URL, the server should return the index.html (the only HTML file I have), which will then take care of the URL on the client side. In other words, all the requests sent to the server should return either the HTML file or the JavaScript file I mentioned earlier, even requests that point to nonexistent files.
In order to implement this requirement, I asked this question and I got an answer like this:
I need to set up an error page for my distribution on CloudFront and
redirect all the 403 (Forbidden) requests to /index.html file. This
is because when the request is pointing to a nonexisting file on S3,
S3 will return 403 to CloudFront due to the lack of listing
permission. Or I can grant the listing permission and instead handle
the 404 error (I didn't test this latter option).
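The error-page rule described in that answer corresponds roughly to the following structure. This is a sketch in Python, assuming the CustomErrorResponses shape from CloudFront's distribution config; the function name and the ErrorCachingMinTTL value are illustrative choices, not something taken from the original answer.

```python
def spa_error_responses(error_codes=(403,), page="/index.html"):
    """Build a CustomErrorResponses structure that serves `page` with a 200."""
    items = [
        {
            "ErrorCode": code,
            "ResponsePagePath": page,
            "ResponseCode": "200",  # the CloudFront API takes this as a string
            "ErrorCachingMinTTL": 0,  # assumption: keep error caching minimal
        }
        for code in error_codes
    ]
    return {"Quantity": len(items), "Items": items}


print(spa_error_responses())
print(spa_error_responses(error_codes=(403, 404))["Quantity"])
```

Passing (403, 404) covers the variant where listing permission is granted and S3 returns 404 instead.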
Anyway, I set this up and it works perfectly - for a few hours. But then, for some unknown reason, the request for the JavaScript file also returns the HTML file. And of course, what I'm getting back is actually coming from CloudFront's cache, which means that no matter how many times I send the request, it keeps returning the same value. That is, until I invalidate the cache on CloudFront, which solves the problem for a few more hours. And around and around we go.
I'm not sure why this happens, but my guess is that at some point the S3 bucket is inaccessible to CloudFront, which results in CloudFront caching the index.html. What can I do about this?
I think I found the problem:
MAKE SURE YOUR STATIC CONTENT ON ALL THE S3 BUCKETS IS IDENTICAL!!!
In my case, the JavaScript filename is automatically generated by Webpack, which means it's random. And since the different regions were "compiled" separately, their filenames differed.
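A quick way to catch this is to compare the object keys of the buckets. Here is a minimal sketch in Python; the key listings and file names are hypothetical stand-ins for whatever your bucket listings return.

```python
def diff_bucket_keys(keys_a, keys_b):
    """Return (only_in_a, only_in_b) as sorted lists of object keys."""
    a, b = set(keys_a), set(keys_b)
    return sorted(a - b), sorted(b - a)


# Hypothetical listings: each region was built separately, so the generated
# JavaScript file name differs between the two buckets.
us_keys = ["index.html", "main.3f2a1c.js"]
eu_keys = ["index.html", "main.9b7e44.js"]

only_us, only_eu = diff_bucket_keys(us_keys, eu_keys)
print(only_us, only_eu)
```

If both lists come back empty, the buckets hold the same set of keys; building once and copying the same output to every region avoids the mismatch in the first place.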

Amazon S3 redirects to /index.html not root of page

I have two sites statically hosted at S3.
Site example-one.com should redirect to example-two.com.
I added a redirect in the S3 bucket for example-one.com.
As shown in the picture below.
Problem: entering example-one.com results in example-two.com/index.html
Question: How can I make it redirect to example-two.com and not example-two.com/index.html?
When I access example-two.com directly it results in example-two.com as intended.
example-two.com has the S3 static site setting index document given as index.html
I use CloudFront for both domains. Both previously had the default root object set to index.html; however, I removed this earlier today, so it's blank now.
To see it live access the link
Please specify if further details are needed :-)