Updating uploaded content on Amazon S3?

We have a problem with updating our uploaded content on Amazon S3. We keep our software updates on S3 and overwrite the old versions of our software with new ones. Sometimes our users still get the old versions of files even though the new versions were uploaded more than 10 hours earlier.
Step by step actions of our team:
We upload our file (about 300 MB) to S3
The file stays on S3 for some time; more than a day, usually a few weeks
We upload a new version of the file to S3, overwriting the old version
We start testing downloads. Some people get the new version, but other people still get the old one.
How to solve this problem?

You should use a different file name for each version; this makes sure that no misbehaving proxy can keep serving a cached copy of the old file.
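A minimal sketch of that approach with the AWS CLI, assuming a hypothetical bucket name and installer path (adjust both for your setup):
VERSION="2.4.1"
aws s3 cp ./build/setup.exe "s3://my-software-bucket/downloads/setup-${VERSION}.exe"
# Publish the new URL (e.g. update the download link on your site) rather than
# overwriting the old object, so no cache can ever confuse the two versions.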

I'd suggest using S3 Object Versioning, and putting CloudFront in front of the bucket with a short TTL / Expiry so that caches discard stale copies as soon as possible.
Just a note for CloudFront: make sure to invalidate the CloudFront cache for the object when releasing a new version.
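A rough sketch of both steps with the AWS CLI, assuming hypothetical bucket and key names: turn on versioning once per bucket, then set a short Cache-Control header on each upload so CloudFront and any other caches revalidate quickly.
aws s3api put-bucket-versioning --bucket my-software-bucket --versioning-configuration Status=Enabled
aws s3 cp ./build/setup.exe s3://my-software-bucket/downloads/setup.exe --cache-control "max-age=300"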

Related

Continuous Delivery issues with S3 and AWS CloudFront

I'm building out a series of content websites, and I've built a working CodePipeline that lets me push edits to HTML files on GitHub and have them instantly reflected in the S3 bucket, and consequently on the live website.
I created a CloudFront distribution to get HTTPS for my website. The certificate and distribution work fine, and it serves the index.html from my S3 bucket, but the changes my GitHub pipeline makes show up in the S3 bucket and not in the CloudFront distribution.
From what I've read, the edge locations used by CloudFront don't update their caches very often, and when they do, they might not pick up the edited index.html file because it has the same name as the old version.
I don't want to manually rename my index.html file in S3 every time one of my writers needs to post a top 10 Tractor Brands article or implement an experimental, low-effort clickbait idea, so that's pretty much off the table.
My overall objective is to build something where teams can quickly add an article with a few images to the website that goes live in minutes, and I've been able to do it so far but not with HTTPS.
If any of you know a good way of instantly updating CloudFront distributions without changing file names, that would be great. Otherwise I'll probably have to start over, because I need my sites secured and the ability to update them instantly.
You people are awesome. Thanks a million for any help.
You need to invalidate files from the edge caches. It's a simple and quick process.
You can automate the process yourself in your pipeline, or you could potentially use a third-party tool such as aws-cloudfront-auto-invalidator.
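For example, a pipeline step along these lines could issue the invalidation after each deploy (the distribution ID and paths here are placeholders, not values from the question):
aws cloudfront create-invalidation --distribution-id E1234EXAMPLE --paths "/index.html" "/articles/*"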

How to set no cache AT ALL on AWS S3?

I started using AWS S3 to give my users a fast way to download the installation files of my Win32 apps. Each install file is about 60 MB and downloads are very fast.
However, when I upload a new version of the app, S3 keeps serving the old file instead! I just rename the old file and upload the new version under the same name as the old one. After I upload, when I try to download, the old version is downloaded instead.
I searched for some solutions and here is what I tried:
Edited all TTL values on CloudFront to 0
Edited the metadata 'Cache-Control' with the value 'max-age=0' for each file in the bucket
Neither of these fixed the issue; AWS keeps serving the old file instead of the new one!
I will often upload new versions, so when users download, S3 must never serve a cached copy at all.
Please help.
I think this behavior might be because S3 uses an eventually consistent model, meaning that updates and deletes will propagate eventually, but it is not guaranteed to happen immediately, or even within a specific amount of time (see here for the specifics of their consistency approach). Specifically, they say "Amazon S3 offers eventual consistency for overwrite PUTS and DELETES in all Regions", and the case you're describing would be an overwrite PUT. There appears to be a good answer on a similar issue here: How long does it take for AWS S3 to save and load an item? It touches on the consistency issue and how to get around it; hopefully that's helpful.
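As a quick sanity check (bucket and key names here are hypothetical), you can compare what S3 reports for the object against your new build; note that the ETag only equals the file's MD5 for non-multipart uploads:
aws s3api head-object --bucket my-bucket --key installers/setup.exe --query '{ETag: ETag, LastModified: LastModified}'
aws s3 cp s3://my-bucket/installers/setup.exe ./check.exe && md5sum ./check.exe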

Scheduling a future version of AWS S3 files

I'd like to queue up a collection of new versions of web site assets, and make them all go live at nearly the same time.
I've got a series of related files and directories that need to go live at a future time, all at once. In other words, a collection of AWS S3 files in a given bucket needs to be updated at nearly the same time. Some of these files are large, and they could originate from locations where Internet access is unreliable and slow. That means they need to be staged somewhere, possibly in another bucket.
I want to be able to roll back to previous version(s) of individual files, or a set of files.
Suggestions or ideas? Bash code is preferred.
One option would be to put Amazon CloudFront in front of the Amazon S3 bucket. The CloudFront distribution can be "pointed" to an origin, such as an S3 bucket and path.
So the cut-over could be done just by changing the origin path setting in the CloudFront distribution.
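A rough sketch of that cut-over with the AWS CLI and jq, where the distribution ID and release prefix are placeholders rather than anything from the question:
DIST_ID="E1234EXAMPLE"
NEW_PREFIX="/releases/2024-06-01"
ETAG=$(aws cloudfront get-distribution-config --id "$DIST_ID" --query ETag --output text)
aws cloudfront get-distribution-config --id "$DIST_ID" --query DistributionConfig > dist.json
# Point the first origin at the new release prefix and push the change.
jq --arg p "$NEW_PREFIX" '.Origins.Items[0].OriginPath = $p' dist.json > dist-new.json
aws cloudfront update-distribution --id "$DIST_ID" --distribution-config file://dist-new.json --if-match "$ETAG"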
If you are sticking with S3 exclusively, the updated files would need to be copied to the appropriate location (either from another bucket or from elsewhere in the same bucket). The time to make this happen would depend upon the size of the objects. You could do a parallel copy to make them copy faster.
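A minimal Bash sketch of that staged copy, with hypothetical bucket and prefix names: upload to a staging prefix ahead of time over the slow links, then copy into the live prefix at the release moment (the CLI parallelises the copies). With bucket versioning enabled, individual files can later be rolled back to an earlier version.
BUCKET="s3://my-site-bucket"
aws s3 sync ./new-release "$BUCKET/staging/next/"
# At go-live:
aws s3 cp "$BUCKET/staging/next/" "$BUCKET/live/" --recursive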
Or, if the data is being accessed via a web page, then you could have the new version of the files already in place, then just update the web page that references the files. This means that all the pages (with different names) could be sitting there ready to be used and you just update the home page, which points to the other pages. Think of it as directing people through a different front door.

Issue with update of objects in AWS S3 bucket

While building an AWS website for one of my clients, I am having issues with the eventual consistency of an S3 bucket when updating an object.
In one of the features we have developed, the user can update his profile picture; we save the picture in the S3 bucket and store its public URL in the DB for later retrieval.
For new objects it works fine, but for updates it takes time (~5-10 minutes) for the change to show up. I have searched the internet and could not find a solution. Some people suggested using versioned paths like v1/filename and v2/filename and reading from the latest version's directory after an update, but this is too impractical.
Can anyone please suggest what to do?
Enable versioning on the bucket and use the versioning features to get the latest object, rather than altering the path; S3 will handle the copies. See
https://forums.aws.amazon.com/thread.jspa?threadID=263531 for a discussion of this feature and consistency.
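For example, assuming versioning is already enabled on the bucket (bucket and key names here are hypothetical), you can list an object's versions and fetch a specific one for rollback:
aws s3api list-object-versions --bucket my-profile-pics --prefix users/123/avatar.jpg
aws s3api get-object --bucket my-profile-pics --key users/123/avatar.jpg --version-id "$VERSION_ID" avatar-previous.jpg
# $VERSION_ID is one of the VersionId values printed by the list command above.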

How can I deploy a new CloudFront S3 version without a small period of unavailability?

I am utilizing AWS CloudFront with an S3 origin. I'm using a webpack plugin to cache-bust with chunked, hashed file names for all of my static files except index.html, which I will simply invalidate using the CloudFront feature on each new release.
I plan on using a Jenkins build to run aws s3 sync ./dist s3://BUCKET-NAME/dist --recursive --delete, which will swap out the new chunked files as necessary. Then I will overwrite the index.html file to use the new chunked references. During the few seconds (at most) it takes to swap the old files for new ones, it is possible that a user will make a request to the website from a region in which CloudFront has not cached the resources, at which point they'll be unavailable because I have just deleted them.
I could not find any information about avoiding this edge case.
Yes, it can happen that a person near a different edge location experiences the missing files. To solve this, you need to change your deployment approach, since cache busting and timing are unpredictable at the request-response level. One commonly used pattern is to keep a separate directory (path) in S3 for each deployment, as follows.
For release v1.0
/dist/v1.0/js/*
/dist/v1.0/css/*
/dist/index.html <- index.html for the v1.0 release, which references the js & css under the /dist/v1.0 path
For release v1.1
/dist/v1.1/js/*
/dist/v1.1/css/*
/dist/index.html <- index.html for the v1.1 release, which references the js & css under the /dist/v1.1 path
After each deployment, a user will receive either the old version (v1.0) or the new version (v1.1) of index.html, and both will keep working during the transition period until the edge cache is busted; a sketch of this deployment order follows below.
You can automate the versioning with Jenkins, either by incrementing the version or by using the parameterized build plugin.
This also gives you immutable deployments: in the case of a critical issue, you can roll back to a previous deployment. Apart from that, you can configure S3 lifecycle management rules to archive the older versions.
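A minimal sketch of that deployment order with the AWS CLI (bucket name and paths are placeholders): push the versioned assets first, and only then overwrite index.html, so every cached copy of index.html always references a complete asset set.
BUCKET="s3://my-app-bucket"
VERSION="v1.1"
aws s3 sync ./build/static "$BUCKET/dist/$VERSION/"
aws s3 cp ./build/index.html "$BUCKET/dist/index.html" --cache-control "no-cache"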
A library called Stout can do this all for you automatically. It's a major time saver. I have no association with them, I just really like it.
A few benefits:
Can help you create new buckets if you want it to
Versions your script and style files during each deploy to ensure your pages don't use an inconsistent set of files during or after a deploy
Supports rollback to any previous version
Properly handles caching headers such as TTL
Compresses files for faster delivery
Usage:
stout deploy --bucket my-bucket-name --root path/to/website
Here is how I solved that problem.
Simply deleting the old files outright will not solve the issue.
Since you have chunked, hashed file names, I assume index.html is the only file whose name is not hashed.
Collect all the old files that will need to be deleted:
aws s3 ls s3://bucket
Deploy all files from your new build:
aws s3 cp ./dist s3://bucket --recursive
Now remove the old files, either with aws s3 mv (to move them aside) or by deleting them:
aws s3 rm each of the files you collected before, except index.html
Your new site will be served with the new app now.
Hope it helps.
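A hedged Bash sketch of those steps (the bucket name is a placeholder, and it assumes every file except index.html has a content-hashed name): record the live keys, upload the new build, then delete only the keys that are not part of the new build.
BUCKET="s3://my-app-bucket"
aws s3 ls "$BUCKET/" --recursive | awk '{print $4}' | sort > old-keys.txt
aws s3 cp ./dist "$BUCKET/" --recursive
find ./dist -type f | sed 's|^\./dist/||' | sort > new-keys.txt
# Keys present before the deploy but absent from the new build are safe to remove.
comm -23 old-keys.txt new-keys.txt | while read -r key; do aws s3 rm "$BUCKET/$key"; done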