Continuous Delivery issues with S3 and AWS CloudFront - amazon-web-services

I'm building out a series of content websites, and I've built a working CodePipeline that allows me to push edits to HTML files on github that instantly reflect in the S3 bucket, and consequently on the live website.
I created a cloudfront distro to get HTTPS for my website. The certificate and distro work fine, and it populates with my index.html in my S3 bucket, but the changes made via my github pipeline to the S3 bucket are reflected in the S3 bucket but not the CloudFront Distribution.
From what I've read, the edge locations used in cloudfront don't update their caches super often, and when they do, they might not update the edited index.html file because it has the same name as the old version.
I don't want to manually rename my index.html file in S3 every time one of my writers needs to post a top 10 Tractor Brands article or implement an experimental, low-effort clickbait idea, so that's pretty much off the table.
My overall objective is to build something where teams can quickly add an article with a few images to the website that goes live in minutes, and I've been able to do it so far but not with HTTPS.
If any of you know a good way of instantly updating CloudFront Distributions without changing file names, that would be great. Otherwise I'll probably have to start over because I need my sites secured and the ability to update them instantly.
You people are awesome. Thanks a million for any help.

You need to invalidate files from the edge caches. It's a simple and quick process.
You can automate the process yourself in your pipeline, or you could potentially use a third-party tool such as aws-cloudfront-auto-invalidator.
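If you automate it yourself, the step can be small. Here is a minimal sketch in Python, assuming boto3 is available in the pipeline step and that the role running it is allowed to call cloudfront:CreateInvalidation. The function names and the distribution ID are placeholders, not something from your setup.

```python
import time


def build_invalidation_batch(paths, caller_reference=None):
    """Build the InvalidationBatch structure for CloudFront's CreateInvalidation."""
    # Invalidation paths must start with "/"; "/*" would invalidate everything.
    items = [p if p.startswith("/") else "/" + p for p in paths]
    return {
        "Paths": {"Quantity": len(items), "Items": items},
        # CallerReference must be unique per request; a timestamp is one simple choice.
        "CallerReference": caller_reference or str(time.time()),
    }


def invalidate(distribution_id, paths):
    """Submit the invalidation. Needs boto3 and AWS credentials at call time."""
    import boto3  # imported here so the helper above works without boto3 installed

    client = boto3.client("cloudfront")
    return client.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch=build_invalidation_batch(paths),
    )


if __name__ == "__main__":
    # Hypothetical paths; this only prints the request body, it does not call AWS.
    print(build_invalidation_batch(["index.html", "/articles/*"], "deploy-1"))
```

Calling invalidate("YOUR_DISTRIBUTION_ID", ["/index.html"]) as the last stage of the pipeline, after the S3 deploy, should make the edge locations fetch the new file on the next request.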

Related

AEM/Adobe Experience Manager upload only some assets to AWS S3

My company is using AEM 6.5 and we were thinking to get some better performance out of our systems.
The idea we had is to upload only some assets (for example videos) to an S3 bucket and keep the other assets locally, we do not want to upload all the assets/datastore to S3. I know I can switch the datastore to S3, but that would mean all the assets go to S3, and we don't want this.
Restriction: we want the video upload to be done seamlessly from within the AEM Author, the editor should upload the video normally and somehow, behind the scenes, this transition to S3 to happen.
I checked as much documentation as I could find, and there is no mention of this partial asset upload to S3, you either go full S3 or nothing at all (we already tested full S3 datastore, it's working, but we do not want it).
So, my question is: did someone manage to do something like this?
Thanks
Have you looked into writing an Adobe Experience Manager workflow that reads a list of assets to upload and then uploads only those specified assets? That way you control which assets go to the Amazon S3 bucket before the AEM workflow runs.
You can create a custom workflow step as discussed in the article linked below. In your use case, however, the custom workflow step would use the S3 Java API. This is one way you can control which assets are uploaded to an Amazon S3 bucket from AEM.
https://helpx.adobe.com/experience-manager/using/message_service_gateway_api_64.html
Technically, it is possible to upload assets to S3 when they are uploaded to AEM instead of storing them in JCR. Nevertheless, this probably won't work as you expect and would require a lot of refactoring of AEM itself to make it work properly.
Just because the binary is stored in S3 does not mean that AEM's internals are aware of that and can deal with it.
Take asset preview on the author for example: this part of AEM would expect the binary to be stored in JCR. Now you have to rewrite this whole part of AEM to go look for those assets in S3. This would be a massive headache, and overlaying those parts of AEM is already deprecated. And this is just one example of hundreds that you would need to find a solution for.
It is not worth the effort.
You probably need to go "all-in" with S3 or leave it as is. Not sure what the reasoning is behind this drive to only use S3 "partially" for videos instead of all assets. Videos are probably already the largest assets you have, so it can't be cost. We run pure asset installations with S3 datastore that have 20TB-60TB of data which is totally fine.

How to set no cache AT ALL on AWS S3?

I started to use AWS S3 to give my users a fast way to download the installation files of my Win32 apps. Each install file is about 60MB and the download works very fast.
However, when I upload a new version of the app, S3 keeps serving the old file instead! I just rename the old file and upload the new version with the same name as the old one. After I upload, when I try to download, the old version is downloaded instead.
I searched for some solutions and here is what i tried :
Edited all TTL values on CloudFront to 0
Edited the metadata 'Cache-Control' with the value 'max-age=0' for each file on the bucket
Neither of these fixed the issue; AWS keeps serving the old file instead of the new one!
I will upload new versions often, so I need S3 to never use a cache at all when users download.
Please help.
I think this behavior might be because S3 uses an eventually consistent model, meaning that updates and deletes will propagate eventually, but it is not guaranteed that this happens immediately or even within a specific amount of time (see the S3 documentation for the specifics of their consistency approach). Specifically, they say "Amazon S3 offers eventual consistency for overwrite PUTS and DELETES in all Regions", and I think the case you're describing would be an overwrite PUT. There appears to be a good answer on a similar issue in "How long does it take for AWS S3 to save and load an item?", which touches on the consistency issue and how to get around it. Hopefully that's helpful.

Scheduling a future version of AWS S3 files

I'd like to queue up a collection of new versions of web site assets, and make them all go live at nearly the same time.
I've got a series of related files and directories that need to go live at a future time, all at once. In other words, a collection of AWS S3 files in a given bucket need to be updated at nearly the same time. Some of these files are large, and they could originate from locations where Internet access is unreliable and slow. That means they need to be staged somewhere, possibly in another bucket.
I want to be able to roll back to previous version(s) of individual files, or a set of files.
Suggestions or ideas? Bash code is preferred.
One option would be to put Amazon CloudFront in front of the Amazon S3 bucket. The CloudFront distribution can be "pointed" to an origin, such as an S3 bucket and path.
So, the update could be done just by changing one configuration in the CloudFront distribution.
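As a rough illustration of that single configuration change, here is a sketch in Python (the question prefers Bash, but the same steps map onto the AWS CLI). It assumes each release has been staged under its own prefix in the bucket, such as /releases/v2, and that boto3 is available. The function names, origin ID, and prefixes are hypothetical.

```python
import copy


def with_origin_path(distribution_config, origin_id, new_path):
    """Return a copy of a DistributionConfig whose origin points at new_path."""
    updated = copy.deepcopy(distribution_config)
    for origin in updated["Origins"]["Items"]:
        if origin["Id"] == origin_id:
            # OriginPath should start with "/" and must not end with "/".
            origin["OriginPath"] = new_path
            return updated
    raise KeyError(f"no origin with Id {origin_id!r}")


def switch_release(distribution_id, origin_id, new_path):
    """Point the distribution at another staged prefix. Needs boto3 and credentials."""
    import boto3  # imported lazily so with_origin_path can be used without boto3

    client = boto3.client("cloudfront")
    current = client.get_distribution_config(Id=distribution_id)
    new_config = with_origin_path(current["DistributionConfig"], origin_id, new_path)
    # IfMatch must carry the ETag returned with the config that was just read.
    return client.update_distribution(
        Id=distribution_id,
        DistributionConfig=new_config,
        IfMatch=current["ETag"],
    )
```

Rolling back would be the same call with the previous prefix. Objects already cached at the edge keep being served until they expire or are invalidated, so the switch is only "nearly" simultaneous unless an invalidation follows it.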
If you are sticking with S3 exclusively, the updated files would need to be copied to the appropriate location (either from another bucket or from elsewhere in the same bucket). The time to make this happen would depend upon the size of the objects. You could do a parallel copy to make them copy faster.
Or, if the data is being accessed via a web page, then you could have the new version of the files already in place, then just update the web page that references the files. This means that all the pages (with different names) could be sitting there ready to be used and you just update the home page, which points to the other pages. Think of it as directing people through a different front door.

Issue with update of objects in AWS S3 bucket

While building an AWS website for one of my clients, I am having issues with the eventual consistency of an S3 bucket when updating an object.
In one of the features we have developed, the user can update their profile picture. We save the profile picture in the S3 bucket and save its public URL in the DB for later retrieval.
For new objects it works fine, but for updates it takes time (~5-10 mins) for the change to show up. I have searched the internet and could not find a solution to this. Some people suggested using versioned paths like v1/filename and v2/filename and, on update, taking the data from the latest version directory, but this is too impractical.
Can anyone please suggest what to do?
Enable versioning on the bucket and use the versioning features to get the latest object, rather than altering the path. S3 will handle the number of copies. See
https://forums.aws.amazon.com/thread.jspa?threadID=263531 for a discussion of this feature and consistency.
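One way to use this, sketched in Python under some assumptions: versioning is enabled on the bucket, the upload goes through boto3's put_object (whose response then carries a VersionId), and the URL saved in the DB pins that exact version. The function names, bucket, and key are hypothetical, and public reads of a specific version would likely need s3:GetObjectVersion allowed in the bucket policy.

```python
from urllib.parse import quote, urlencode


def versioned_object_url(bucket, key, version_id):
    """Build a URL that addresses one specific version of an S3 object."""
    query = urlencode({"versionId": version_id})
    return f"https://{bucket}.s3.amazonaws.com/{quote(key)}?{query}"


def upload_profile_picture(bucket, key, data):
    """Upload the picture and return the URL to store in the DB. Needs boto3."""
    import boto3  # imported lazily so the URL helper works without boto3

    client = boto3.client("s3")
    response = client.put_object(Bucket=bucket, Key=key, Body=data)
    # VersionId is only present when versioning is enabled on the bucket.
    return versioned_object_url(bucket, key, response["VersionId"])
```

Because every update yields a different URL, a client never asks for a stale copy of the old one, while the key itself stays the same.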

Setting up Amazon Cloudfront without S3

I want to use Cloudfront to serve images and CSS from my static website. I have read countless articles showing how to set it up with Amazon S3 but I would like to just host the files on my host and use cloud front to speed up delivery of said files, I'm just unsure on how to go about it.
So far I have created a distribution on CloudFront with my Origin Domain and CName and deployed it.
Origin Domain: example.me CName media.example.me
I added the CNAME for my domain:
media.mydomain.com with destination xxxxxx.cloudfront.net
Now this is where I'm stuck. Do I need to update the links in my HTML to that CNAME? So if the stylesheet was http://example.me/stylesheets/screen.css, do I change that to http://media.example.me/stylesheets/screen.css,
and images within the stylesheet that were ../images/image1.jpg to http://media.example.me/images/image1.jpg?
Just finding it a little confusing how to link everything it's the first time I have really dabbled in using a CDN.
Thanks
Yes, you will have to update the paths in your HTML to point to the CDN. Typically, if you have a deployment/build process, this link changing can be done at that time (so that development can use the local files).
Another important thing to handle here is versioning of the CSS/JS etc. You might make frequent changes to your CSS/JS, and when you make a change, CDNs typically take up to 24 hours to reflect it (24 hours is CloudFront's default TTL). Another option is invalidating files on the CDN, but this is charged explicitly and is discouraged. The suggested method is to generate a path like "media.example.me/XYZ/stylesheets/screen.css" and change XYZ to a different number for each deployment (a timestamp/epoch will do). This way, with each deployment you need to invalidate only the HTML; the other files are at a new path anyway and will load fresh. This technique is generally called fingerprinting the URLs.
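A small sketch of that path generation in Python, assuming the build step knows a per-deployment version value and that the origin either stores the files under that version directory or ignores the extra path segment. The function name is hypothetical; the host and paths are the ones from the question.

```python
import posixpath


def fingerprint_url(cdn_host, asset_path, version):
    """Insert a per-deployment version segment in front of an asset path."""
    # lstrip keeps posixpath.join from discarding the version segment
    # when asset_path is absolute.
    path = posixpath.join("/", str(version), asset_path.lstrip("/"))
    return f"http://{cdn_host}{path}"


if __name__ == "__main__":
    # 1700000000 stands in for the epoch timestamp of a deployment.
    print(fingerprint_url("media.example.me", "/stylesheets/screen.css", 1700000000))
```

The build would call this for every CSS, JS, and image reference it writes into the HTML, using the same version value throughout one deployment.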
Yes, you would update the references to your CSS files to load via the CDN domain. If image paths within CSS do not include a domain, they will also automatically load via cloudfront.