CloudFront atomic replication - amazon-web-services

I want to host a static site via Amazon CloudFront from an S3 bucket. If I update the content of the bucket with a new version of the page, is there a way I can ensure the distribution happens in an atomic way?
What I mean is, if I have assets like a.js and b.js, that the updated version of both is served at the same time, and not e.g. the old a.js and new b.js.

You have a couple of options:
You can request an invalidation. It takes about 15 minutes or so to complete (a boto3 sketch follows below).
You can give your new assets a new name. This is a bit more work, but in my opinion it is the preferable route, since it makes it easier to enable long-expiration client-side caching.
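For the invalidation route, a minimal sketch using boto3; the distribution ID and paths are placeholders:

```python
import time
import boto3

cloudfront = boto3.client("cloudfront")

# One request can cover multiple paths; note this still doesn't guarantee
# that every edge cache flips both files at exactly the same instant.
cloudfront.create_invalidation(
    DistributionId="E1234EXAMPLE",  # hypothetical distribution ID
    InvalidationBatch={
        "Paths": {"Quantity": 2, "Items": ["/a.js", "/b.js"]},
        "CallerReference": str(time.time()),  # must be unique per request
    },
)
```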

If you perform object invalidation, there is no guarantee that the two JS files will be invalidated at exactly the same time, so there will be some window during which your site behaves unexpectedly.
Either do it at a time when you expect the fewest users to be visiting your site, or create new resources as "datasage" mentioned and then update all files that reference them to use the new names (see the sketch below).
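A rough sketch of the rename-per-release approach with boto3, using a content hash as the version component; the bucket name and key layout are just illustrative:

```python
import hashlib
import boto3

s3 = boto3.client("s3")
bucket = "my-site-bucket"  # hypothetical bucket

# Upload each asset under a content-hashed key; the HTML that ships with the
# release then references the new keys, so old and new assets never mix.
for local_path in ("a.js", "b.js"):
    with open(local_path, "rb") as f:
        body = f.read()
    digest = hashlib.sha256(body).hexdigest()[:8]
    key = f"assets/{digest}/{local_path}"
    s3.put_object(
        Bucket=bucket,
        Key=key,
        Body=body,
        ContentType="application/javascript",
        CacheControl="public, max-age=31536000",  # safe: key changes whenever content changes
    )
    print(key)  # reference these new keys from the HTML for this release
```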

Related

multiple organizations/accounts and dev environments for S3 buckets

As per best practice, AWS resources should be separated per account (prod, stage, ...), and it's also good to give devs their own accounts with defined limits (budget, region, ...).
I'm now wondering how I can create a fully working dev environment, especially when it comes to S3 buckets.
Most of the services are pay-per-use, so it's totally fine to spin up some Lambdas, SQS, etc. and use the real services for dev.
Now to the real question: what should be done with static assets like pictures, downloads and so on which are stored in S3 buckets?
Duplicating those buckets for every dev/environment could get expensive, as you pay for storage and/or data transfer.
What I thought was to give the dev's S3 bucket a redirect rule, so that when a file is not found (e.g. 404) in the dev bucket, the request redirects to the prod bucket and images etc. are retrieved from there (see the sketch below).
I have tested this and it works pretty well, but it solves only part of the problem.
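For reference, a hedged sketch of the kind of 404 redirect rule described above, applied with boto3; the bucket names and website endpoint are made up:

```python
import boto3

s3 = boto3.client("s3")

# On the dev website bucket: if a key is missing (404), redirect the request
# to the prod website endpoint so shared assets are served from there.
s3.put_bucket_website(
    Bucket="dev-assets-bucket",  # hypothetical dev bucket
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "RoutingRules": [
            {
                "Condition": {"HttpErrorCodeReturnedEquals": "404"},
                "Redirect": {
                    "HostName": "prod-assets-bucket.s3-website-eu-central-1.amazonaws.com",  # hypothetical prod endpoint
                    "Protocol": "http",
                    "HttpRedirectCode": "302",
                },
            }
        ],
    },
)
```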
The other part is how to replace those files in a convenient way?
Currently static assets and downloads are also in our git (maybe not the best idea after all; but how do you handle file changes that should go live with new features? Currently it's convenient to have them in git as well), and when someone changes something they push it and it gets deployed to prod.
We could of course sync the devs' S3 buckets back to the prod bucket with the newly uploaded files, but how do we combine this with merge requests and keep a good CI/CD experience?
What are your solutions for giving every dev their own S3 bucket, so they can spin up a completely working dev environment with everything available to them?
My experience is that you don't want to complicate things just to save a few dollars. S3 costs are pretty cheap, so if you're just talking about website assets, like HTML, CSS, JavaScript, and some images, then you're probably going to spend more time creating, managing, and troubleshooting a solution than you'll save. Time is, after all, your most precious resource.
If you do have large items that need to be stored to make your system work, then maybe put a lifecycle policy on those large items in the S3 bucket and delete them after some reasonable amount of time. If/when a dev needs such an object, they can retrieve it again from its source and upload it again, manually. You could write a script to do that pretty easily (a lifecycle-rule sketch follows).
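For the lifecycle policy itself, a minimal sketch with boto3; the bucket name, prefix and retention period are assumptions:

```python
import boto3

s3 = boto3.client("s3")

# Expire objects under a "large/" prefix after 30 days so dev buckets stay cheap.
s3.put_bucket_lifecycle_configuration(
    Bucket="dev-assets-bucket",  # hypothetical dev bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-large-dev-objects",
                "Filter": {"Prefix": "large/"},
                "Status": "Enabled",
                "Expiration": {"Days": 30},
            }
        ]
    },
)
```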

best practice for streaming images in S3 to clients through a server

I am trying to find the best practice for streaming images from S3 to a client's app.
I created a grid-like layout using Flutter on a mobile device (similar to Instagram). How can my client access all of their images?
Here is my current setup: the client opens their profile screen (which contains the grid-like layout of all images sorted by timestamp). This automatically requests all images from the server. My Python 3 backend server uses boto3 to access S3 and DynamoDB. The DynamoDB table has a list of all image paths the client uploaded, sorted by timestamp. Once I get the paths, I use them to download all images to my server first and then send them to the client.
Basically my server is the middleman, downloading and then sending the images back to the client. Is this the right way of doing it? It seems that if the client accesses S3 directly, it'll be faster, but I'm not sure if that is safe. Plus I don't know how I can give clients access to S3 without giving them AWS credentials...
Any suggestions would be appreciated. Thank you in advance!
What you are doing will work, and it's probably the best option if you are optimising for getting something working quickly, without worrying too much about wasted server resources and unnecessary computation, and if you don't have scalability concerns.
However, if you're worrying about scalability and lower latency, as well as secure access to these image resources, you might want to improve your current architecture.
Once I get the paths, I use that to download all images to my server first and then send it to the client.
This part is the first thing I would try to get rid of, as you don't really need your backend to download these images and stream them itself. However, it still seems necessary to control access to the resources based on who owns them. I would consider switching to the setup below to improve latency and spend fewer server resources:
Once you get the paths in your backend service, generate presigned URLs for the S3 objects, which give your client temporary access to these resources (depending on your needs, you can adjust how long a URL remains valid).
Then send these links to your client so that it can stream the objects directly from S3, rather than your server acting as the middleman (see the sketch below).
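As a rough illustration, assuming a boto3 client and keys you already looked up in DynamoDB (the bucket and paths are placeholders):

```python
import boto3

s3 = boto3.client("s3")

def presign_image(bucket: str, key: str, expires_in: int = 900) -> str:
    """Return a temporary GET URL the mobile client can stream directly from S3."""
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,  # seconds; tune to how long the grid stays on screen
    )

# Hypothetical usage: presign every path fetched from the DynamoDB table.
image_paths = ["users/alice/2021-01-01.jpg"]  # placeholder keys from DynamoDB
urls = [presign_image("my-images-bucket", path) for path in image_paths]
```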
Once you have this setup working, I would consider using Amazon CloudFront to improve access to your objects through the CDN capabilities it gives you, especially if your clients are distributed across different geographical regions. As far as I can see, you can also make CloudFront work with presigned URLs.
Is this the right way of doing it? It seems that if the client accesses S3 directly, it'll be faster but I'm not sure if that is safe
Presigned URLs are your way of mitigating uncontrolled access to your S3 objects. You probably need to handle some edge cases, though (e.g. how the client should behave when its access to an S3 object has expired, so that users won't notice, etc.). All of these are costs of making something work at scale, if you have those scalability concerns.

CloudFlare or AWS CDN links

I have a script that I install on a page and it will load some more JS and CSS from an S3 bucket.
I have versions, so when I do a release on Github for say 1.1.9 it will get deployed to /my-bucket/1.1.9/ on S3.
Question, if I want to have something like a symbolic link /my-bucket/v1 -> /my-bucket/1.1.9, how can I achieve this with AWS or CloudFlare?
The idea is that I want to release a new version by deploying it to my bucket or whatever CDN, and then, when I am ready, switch v1 to the latest 1.x.y version released. I want all websites to point to /v1 and get the latest version whenever there is a new release.
Is there a CDN or AWS service or configuration that will allow me to create a sort of a linux-like symbolic link like that?
A simple solution with CloudFront requires a slight change in your path design:
Bucket:
/1.1.9/v1/foo
Browser:
/v1/foo
CloudFront Origin Path (on the Origin tab)
/1.1.9
Whatever you configure as the Origin Path is added to the beginning of whatever the browser requested before sending the request to the Origin server.
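If you want to flip the Origin Path programmatically when you promote a release, a sketch with boto3; the distribution ID and version prefix are placeholders, and the invalidation discussed next still applies:

```python
import boto3

cloudfront = boto3.client("cloudfront")
dist_id = "E1234EXAMPLE"  # hypothetical distribution ID

# Fetch the current config together with the ETag that update_distribution requires.
resp = cloudfront.get_distribution_config(Id=dist_id)
config, etag = resp["DistributionConfig"], resp["ETag"]

# Point the origin at the newly deployed version prefix.
config["Origins"]["Items"][0]["OriginPath"] = "/1.1.9"

cloudfront.update_distribution(Id=dist_id, DistributionConfig=config, IfMatch=etag)
```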
Note that changing this means you also need to do a cache invalidation, because responses are cached based on what was requested, not what was fetched.
There is a potential race condition here, between the time you change the config and the time you invalidate -- there is no correlation in the order of operations between configuration changes and invalidation requests, and a config change followed by an invalidation may actually be completed after it,¹ so you will probably need to invalidate, update the config, invalidate, verify that the distribution has progressed to a stable state, and then invalidate once more. You don't need to invalidate objects individually, just /* or /v1*. It would be best if only the resource directly requested is subject to the rewrite, and not its dependencies. Remember, also, that browser caching is a big cost-saver that you can't leverage as fully if you use the same request URI to represent a different object over time.
More complicated path rewriting in CloudFront requires a Lambda@Edge Origin Request trigger (or you could use a Viewer Request trigger, but those run more often and thus cost more and add to overall latency); a sketch of such a handler follows the footnote below.
¹ Invalidation requests -- though this is not documented and is strictly anecdotal -- appear to involve a bit of time travel. Invalidations are timestamped, and it appears that they invalidate anything cached before their timestamp, rather than before the time they propagate to the edge locations. Architecturally, it would make sense if CloudFront is designed such that invalidations don't actively purge content, but only serve as directives for the cache to consider any cached object as stale if it pre-dates the timestamp on the invalidation request, allowing the actual purge to take place in the background. Invalidations seem to complete too rapidly for any other explanation. This means creating an invalidation request after the distribution returns to the stable Deployed state would assure that everything old is really purged, and that another invalidation request when the change is initially submitted would catch most of the stragglers that might be served from cache before the change is propagated. Changes and invalidations do appear to propagate to the edges via independent pipelines, based on observed completion timing.
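For completeness, a hedged sketch of what a Lambda@Edge Origin Request handler doing this kind of rewrite could look like in Python (Lambda@Edge also supports Node.js); the version map and prefixes are illustrative:

```python
# Hypothetical Lambda@Edge Origin Request handler: rewrites /v1/* to /1.1.9/*
# before CloudFront forwards the request to the S3 origin.
VERSION_MAP = {"/v1": "/1.1.9"}

def handler(event, context):
    request = event["Records"][0]["cf"]["request"]
    for public_prefix, real_prefix in VERSION_MAP.items():
        if request["uri"].startswith(public_prefix + "/"):
            request["uri"] = real_prefix + request["uri"][len(public_prefix):]
            break
    return request  # returning the (possibly modified) request lets it continue to the origin
```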

Best way to handle Cloudfront/S3 website with www redirected to bare domain

I have a website that I would like the www-prefixed version to redirect to the bare domain.
After searching for different solutions, I found this closed topic here with this answer that seems to work great: https://stackoverflow.com/a/42869783/8406990
However, I have a problem where if I update the root object "index/html" in my S3 bucket, it can take over a day before CloudFront serves the new version. I have even manually invalidated the file, and while that updates the "index.html" file correctly, CloudFront still serves the old one.
To better explain, if I type in http://mywebsite.com/index.html, it will serve the new version. But if I type in http://mywebsite.com/, it serves the old index.html.
I went ahead and added "index.html" as the Default Root Object property of my CloudFront distribution (for the bare domain), and it immediately worked as I wanted. Typing in just the domain (without adding /index.html) returned the new version.
However, this is in contrast with the answer in the thread I just linked to, which explicitly states NOT to set a "default root object" when using two distributions to do the redirect. I was hoping to gain a better understanding of this "Default Root Object", and whether there is a better way to make sure the root object updates the cached version correctly?
Thank you.
If you really put index.html/ as the default root object and your CloudFront distribution is pointing to the web site hosting endpoint of the bucket and it worked, then you were almost certainly serving up an object in your bucket called index.html/, which would appear in your bucket as a folder, or an object named index.html inside a folder named index.html. The trailing slash doesn't belong there. This might explain the strange behavior. But that also might be a typo in your question.
Importantly... one purpose of CloudFront is to minimize requests to the back-end and keep copies cached in locations that are geographically near where they are frequently requested. Updating an object in S3 isn't designed to update what CloudFront serves right away, unless you have configured it to do so. One way of doing this is to set (for example) Cache-Control: public, max-age=600 on the object metadata when you save it to S3. This would tell CloudFront never to serve up a cached copy of the object that it obtained from S3 longer than 600 seconds (10 minutes) ago. If you don't set this, CloudFront will not check back for 24 hours, by default (the "Default TTL").
http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/Expiration.html
This only works in one direction -- it tells CloudFront how long it is permitted to retain a cached copy without checking for updates. It doesn't tell CloudFront that it must wait that long before checking. Objects that are requested infrequently might be released by CloudFront before their max-age expires. The next request fetches a fresh copy from S3.
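A minimal sketch of setting that header when you upload, using boto3; the bucket and file names are placeholders:

```python
import boto3

s3 = boto3.client("s3")

# Upload index.html with a 10-minute max-age so CloudFront revalidates reasonably often.
with open("index.html", "rb") as body:
    s3.put_object(
        Bucket="my-site-bucket",  # hypothetical bucket
        Key="index.html",
        Body=body,
        ContentType="text/html",
        CacheControl="public, max-age=600",
    )
```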
If you need to wipe an object from CloudFront's cache right away, that's called a cache invalidation. These are billed $0.005 for each path (not each file) that you request be invalidated, but the first 1,000 per month per AWS account are billed at $0.00. You can invalidate all your files by requesting an invalidation for /*. This leaves S3 untouched but CloudFront discards anything it cached before the invalidation request.
The default root object is a legacy feature that is no longer generally needed since S3 introduced static web site hosting buckets. Before that -- and still, if you point CloudFront to the REST endpoint for the bucket -- someone hitting the root of your web site would see a listing of all your objects. Obviously, that's almost always undesirable, so the default root object allowed you to substitute a different page at the root of the site.
With static hosting in S3, you have index documents, which work in any "directory" on your site, making the CloudFront option -- which only works at the root of the site -- unnecessary anywhere an index document is available. So it's relatively uncommon to use this feature now.

Is it possible to set Amazon CloudFront cache to far-future expiry?

How do I change the cache expiry in CloudFront on AWS? I can't see a way to do it and I think I saw an old post of a few years ago on here was somebody said it couldn't be done.
I've gone through every option in S3 and CloudFront and every option on the outer folder and on the file, but nothing.
Can it be done now, or is there any alternative? I really want to set the cache to 6 months or a year if I can.
AWS is hard work.
You can, but it's not exactly obvious how this works.
You can store custom HTTP headers with your S3 objects. In the console, this is under the metadata section for an object. With this you can set a far-future Expires header.
CloudFront will take the existing headers and pass them on. If CloudFront is already caching the object, you will need to invalidate it to see the new headers after you set them.
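As an illustration, one way to set those headers on an object that is already in the bucket is a self-copy with replaced metadata via boto3; the bucket, key, and dates are placeholders:

```python
from datetime import datetime, timezone
import boto3

s3 = boto3.client("s3")
bucket, key = "my-assets-bucket", "js/app.js"  # hypothetical bucket and key

# S3 metadata can only be changed by copying the object onto itself with
# MetadataDirective=REPLACE; this sets far-future caching headers.
s3.copy_object(
    Bucket=bucket,
    Key=key,
    CopySource={"Bucket": bucket, "Key": key},
    MetadataDirective="REPLACE",
    ContentType="application/javascript",  # must be re-specified when replacing metadata
    CacheControl="public, max-age=31536000",  # roughly one year
    Expires=datetime(2026, 1, 1, tzinfo=timezone.utc),  # placeholder far-future date
)
```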