How to make CloudFront never cache index.html on S3 bucket - amazon-web-services

I have a React app hosted on an S3 bucket. The code is minified using yarn build (it's a create-react-app based app). The build folder looks something like:
build
├── asset-manifest.json
├── favicon.ico
├── images
│   ├── map-background.png
│   └── robot-icon.svg
├── index.html
├── js
│   ├── fontawesome.js
│   ├── packs
│   │   ├── brands.js
│   │   ├── light.js
│   │   ├── regular.js
│   │   └── solid.js
│   └── README.md
├── service-worker.js
└── static
    ├── css
    │   ├── main.bf27c1d9.css
    │   └── main.bf27c1d9.css.map
    └── js
        ├── main.8d11d7ab.js
        └── main.8d11d7ab.js.map
I never want index.html to be cached, because if I update the code (causing the hex suffix in main.*.js to update), I need the user's next visit to pick up on the <script src> change in index.html to point to the updated code.
In CloudFront, I can only seem to exclude paths, and excluding "/" doesn't seem to work properly. I'm getting strange behavior where I change the code, and if I hit refresh, I see it, but if I quit Chrome and go back, I see very outdated code for some reason.
I don't want to have to trigger an invalidation on every code release (via CodeBuild). Is there some other way? I think one of the challenges is that since this is an app using React Router, I'm having to do some trickery by setting the error document to index.html and forcing an HTTP status 200 instead of 403.

A solution based on CloudFront configuration:
Go to your CloudFront distribution, open the "Behaviors" tab, and create a new behavior.
Specify the following values:
Path Pattern: index.html
Object Caching: customize
Maximum TTL: 0 (or another very small value)
Default TTL: 0 (or another very small value)
Save this configuration.
CloudFront will not cache index.html anymore.

If you never want index.html to be cached, set the Cache-Control: max-age=0 header on that file only. CloudFront will make a request back to your origin S3 bucket on every request, but it sounds like this is desired behavior.
If you want to set longer expiry times and invalidate the CloudFront cache manually, you can use * or /* as your invalidation path (not / as you mentioned). However, it can take up to 15 minutes for all CloudFront edge nodes around the world to reflect the changes in your origin.

Here is the command I ran to set Cache-Control on my index.html file after uploading new files to S3 and invalidating CloudFront:
aws s3 cp s3://bucket/index.html s3://bucket/index.html --metadata-directive REPLACE --cache-control max-age=0 --content-type "text/html"
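For a release script written in Python, the same in-place copy could be sketched with boto3 roughly as follows. This is only a sketch: it assumes boto3 is installed and credentials are configured, and the helper names build_copy_args and set_no_cache are made up for illustration. The bucket name is whatever you pass in.

```python
def build_copy_args(bucket, key="index.html"):
    """Build keyword arguments for an S3 CopyObject call that rewrites metadata in place."""
    return {
        "Bucket": bucket,
        "Key": key,
        "CopySource": {"Bucket": bucket, "Key": key},  # copy the object onto itself
        "MetadataDirective": "REPLACE",  # mirrors --metadata-directive REPLACE
        "CacheControl": "max-age=0",     # mirrors --cache-control max-age=0
        "ContentType": "text/html",      # mirrors --content-type "text/html"
    }


def set_no_cache(bucket, key="index.html"):
    """Apply the metadata rewrite; needs boto3 and AWS credentials at call time."""
    import boto3  # third-party, imported lazily so build_copy_args works without it

    return boto3.client("s3").copy_object(**build_copy_args(bucket, key))
```

Calling set_no_cache("bucket") after the upload step should have the same effect as the CLI command above.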

It's much better to run an invalidation for index.html on every release than to defeat CloudFront's purpose and serve it (which is basically the entry point for your app) from S3 every single time.
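If the release runs from a Python script, a per-release invalidation of index.html might look like the sketch below. It assumes boto3 is installed and credentials are configured. The function names are hypothetical, and the distribution ID is a placeholder you would supply yourself.

```python
import time


def build_invalidation_batch(paths=("/index.html",), caller_reference=None):
    """Build the InvalidationBatch structure expected by CloudFront."""
    paths = list(paths)
    return {
        "Paths": {"Quantity": len(paths), "Items": paths},
        # CallerReference must be unique per request; a timestamp is one simple choice
        "CallerReference": caller_reference or str(time.time()),
    }


def invalidate_index(distribution_id, paths=("/index.html",)):
    """Submit the invalidation; needs boto3 and AWS credentials at call time."""
    import boto3  # third-party, imported lazily so the builder works without it

    return boto3.client("cloudfront").create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch=build_invalidation_batch(paths),
    )
```

Invalidating only /index.html keeps the hashed assets under static/ cached, since their file names change whenever their content does.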

Related

django manage static files accessibility

example app tree:
articles
├── admin.py
├── apps.py
├── models.py
├── static
│   ├── css
│   │   ├── article_change.css
│   │   ├── article_create.css
│   │   └── article_detail.css
│   └── js (obfuscated)
│       ├── article_create.js
│       ├── article_list.js
│       ├── edit_process.js
│       ├── editor.js
│       └── js (readable)
│           ├── article_create.js
│           ├── article_list.js
│           ├── edit_process.js
│           └── editor.js
├── templates
│   └── articles
│       ├── article_create.html
│       ├── article_detail.html
│       ├── edit_process.html
│       └── editor.html
├── tests.py
├── urls.py
└── views.py
static/js/js contains javascript that is human readable
static/js contains obfuscated javascript files.
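I wrote a template tag to include files:
from django import template
from django.conf import settings
from django.templatetags.static import static

register = template.Library()

@register.simple_tag
def jstatic(path):
    s = ''
    if settings.DEBUG:
        s = 'js/'  # DEBUG: point at the readable copy under static/js/js
    return static(s + path)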
in templates, I can do:
<script src="{% jstatic 'js/info.js' %}"></script>
This conditionally renders static JavaScript files based on DEBUG mode: readable files when DEBUG is on, obfuscated files when it is off.
The thing is, I don't want the unobfuscated files to be accessible when DEBUG is off, that is, when the application is running on the server.
When DEBUG is off, I want users to be able to visit only the obfuscated files:
static/js/info.js
and have no access to
static/js/js/info.js
All the apps follow this tree convention. I wonder if there is a way to block static/js/js/info.js if DEBUG is not on.
I have thought about setting directory permissions from the shell, but eventually gave up, because it will not work with the nested directory structure, and restructuring would be too much work: there are about 20 apps in the project.
This is not possible through standard configuration. How to solve this depends on your configuration, but there are three ways to solve this:
If you use a minifier or a webpack-like setup to obfuscate your JS files, you could move the JS files to a src directory and have your tooling only copy them when DEBUG is True, and copy and obfuscate them when DEBUG is False.
You can use two static directories, one for readable files, and the other for obfuscated files (something like src/static/* and dist/static/*), and then only point to the source directory on development environments:
STATICFILES_DIRS = [ "src/static", "dist/static"] vs. STATICFILES_DIRS = [ "dist/static"] on production.
In this case, Django's static files finder will return the first match found.
Leave your configuration as is, but use a web server like NGINX for serving static files (which is already the recommended way to serve static files). In NGINX's configuration you can define a location that returns 404, as long as it appears before the location serving your static files.
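The second option above (two static directories) could be wired into settings.py along these lines. This is a minimal sketch: the helper name staticfiles_dirs is made up, and the src/static and dist/static paths are the example paths from the answer, not anything Django requires.

```python
def staticfiles_dirs(debug):
    """Return the STATICFILES_DIRS list for the given DEBUG value."""
    dirs = ["dist/static"]  # obfuscated files, available in every environment
    if debug:
        # Readable sources go first so the finder matches them before dist/
        dirs.insert(0, "src/static")
    return dirs


# In settings.py this would be used as:
# STATICFILES_DIRS = staticfiles_dirs(DEBUG)
```

With DEBUG off, the readable sources are never among the directories Django collects from, so they cannot end up in the collected static root.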

Where does EMR store Spark stdout?

I am running my Spark application on EMR, and have several println() statements. Other than the console, where do these statements get logged?
My S3 aws-logs directory structure for my cluster looks like:
node
├── i-0031cd7a536a42g1e
│   ├── applications
│   ├── bootstrap-actions
│   ├── daemons
│   ├── provision-node
│   └── setup-devices
containers/
├── application_12341331455631_0001
│   ├── container_12341331455631_0001_01_000001
You can find println's in a few places:
Resource Manager -> Your Application -> Logs -> stdout
Your S3 log directory -> containers/application_.../container_.../stdout (though this takes a few minutes to populate after the application finishes)
SSH into the EMR master node and run: yarn logs -applicationId <Application ID> -log_files <log_file_type>
There is a very important thing that you need to consider when printing from Spark: are you running code that gets executed in the driver or is it code that runs in the executor?
For example, if you do the following, it will output in the console as you are bringing data back to the driver:
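for i in your_rdd.collect():
    print(i)
But the following will run within an executor and thus it will be written in the Spark logs:
def run_in_executor(value):
    print(value)
    return value
# map() is lazy, so an action such as count() is needed to actually run it
your_rdd.map(lambda x: run_in_executor(x)).count()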
Now, going back to your original question, the second case will write to the log location. Logs are usually written on the master node under /mnt/var/log/hadoop/steps, but it might be better to send logs to an S3 bucket with --log-uri. That way they will be easier to find.

How to deploy a Go web application in Beanstalk with custom project folder structure

I'm new to Go.
I am trying to deploy a simple web project to EB without success.
I would like to deploy a project with the following local structure to Amazon EB:
$GOPATH
├── bin
├── pkg
└── src
    ├── github.com
    │   ├── AstralinkIO
    │   │   └── api-server <-- project/repository root
    │   │       ├── bin
    │   │       ├── cmd <-- main package
    │   │       ├── pkg
    │   │       ├── static
    │   │       └── vendor
But I'm not sure how to do that. When building the command, Amazon treats api-server as the $GOPATH, and of course the import paths are broken.
I read that most of the time it's best to keep all repos under the same workspace, but it makes deployment harder..
I'm using Procfile and Buildfile to customize output path, but I can't find a solution to dependencies.
What is the best way to deploy such project to EB?
It has been a long time since I used Beanstalk, so I'm a bit rusty on the details, but the basic idea is as follows. AWS Beanstalk support for Go is a bit odd by design. It basically extracts your source files into a folder on the server, declares that folder as GOPATH, and tries to build your application assuming that your main package is at the root of your GOPATH, which is not a standard layout for Go projects. So your options are:
1) Package your whole GOPATH as "source bundle" for Beanstalk. Then you should be able to write build.sh script to change GOPATH and build it your way. Then call build.sh from your Buildfile.
2) Change your main package to be a regular package (e.g. github.com/AstralinkIO/api-server/cmd). Then create an application.go file at the root of your GOPATH (yes, outside of src, while all actual packages are in src as they should be). Your application.go will become your "package main" and will only contain a main function (which will call your current Main function from github.com/AstralinkIO/api-server/cmd). Should do the trick. Though your mileage might vary.
3) A bit easier option is to use Docker-based Go Platform instead. It still builds your go application on the server with mostly same issues as above, but it's better documented and possibility to test it locally helps a lot with getting configuration and build right. It will also give you some insights into how Beanstalk builds go applications thus helping with options 1 and 2. I used this option myself until I moved to plain EC2 instances. And I still use skills gained as a result of it to build my current app releases using docker.
4) Your best option though (in my humble opinion) is to build your app yourself and package it as a ready-to-run binary file. See second bullet point paragraph here
Well, whichever option you choose - good luck!

how to configure nginx to point '/' to my main page

I have a Django project and my web static files are in the 'web/' directory.
Here is the structure:
➜ web git:(ycw.alpha) tree -L 4
.
└── forward
    ├── asserts
    │   ├── img
    │   │   ├── background
    │   │   ├── qr
    │   │   └── thumb
    │   └── style
    │       ├── css
    │       └── sass
    ├── index.html
    ├── package.json
    ├── script.js
    ├── source
    └── unit
I have set up my Nginx configuration and I want Nginx to serve 'web/forward/index.html' directly when I request my own website 'http://example.com'.
I do that like this:
location / {
    index index.html;
    root /path/to/my/django/project/;
}
location /index.html {
    alias /path/to/my/django/project/web/forward/index.html;
}
It does serve 'index.html' directly, but the problem is that 'index.html' references static files such as images and CSS using relative paths like './asserts/style/css/index.css', so these files are not found and return 404.
How can I configure it correctly?
The problem was solved by changing 'root' in location / to '/path/to/the/index/directory' instead of the root directory of the Django project. That way, the relative paths to static resources in the HTML file still resolve.

how to use Hugo with github pages to automatically update content

I am using Hugo to deploy a static page to GitHub Pages.
I have a git repo in the /public folder, but the contents of the /static folder are not part of the repository. Therefore they are not uploaded to the username.github.io page.
the /static folder contains images and css files. This is why my page does not look good after pushing to github.
My workaround is that each time I manually copy the /static folder into the /public folder after I build the site.
I think there should be a better solution and I am probably missing something in the config.toml file of the hugo workflow.
I am following the instructions from this site
Any ideas how to automatically include /static files into the repository?
Hugo copies all the files in the static/ directory into the public/ directory when your site is rendered. For example, if you have a static/ folder that looks like this:
.
├── css
│   └── main.css
└── img
    ├── favicon.png
    └── avatar.png
Then when you build your site, the public/ folder will look like this:
.
├── css
│ ├── main.css
│ └── <theme css files>
├── img
│ ├── favicon.png
│ ├── avatar.png
│ └── <theme images>
<more content folders>
So the files in your static folder are probably being included already. The problem is likely to be that your theme is looking for your static files in the wrong place. Take a look at your theme documentation and see if it says anything about static assets.