Amazon CloudFront Latency - amazon-web-services

I am experimenting with AWS S3 and CloudFront for a web application that I am developing.
In the app I'm letting users upload files to the S3 bucket (using the AWS SDK) and making them available via the CloudFront CDN. The issue is that even when the files are uploaded and ready in the S3 bucket, it takes a minute or two before they are available at the CloudFront URL. Is this normal?

CloudFront attempts to fetch uncached content from the origin server in real time. There is no "replication delay" or similar issue because CloudFront is a pull-through CDN. Each CloudFront edge location knows only about your site's existence and configuration; it doesn't know about your content until it receives requests for it. When that happens, the CloudFront edge fetches the requested content from the origin server, and caches it as appropriate, for serving subsequent requests.
The issue that's occurring here is related to a concept sometimes called "negative caching" -- caching the fact that a request won't work -- which is typically done to avoid hammering the origin of whatever's being cached with requests that are likely to fail anyway.
By default, when your origin returns an HTTP 4xx or 5xx status code, CloudFront caches these error responses for five minutes and then submits the next request for the object to your origin to see whether the problem that caused the error has been resolved and the requested object is now available.
— http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/custom-error-pages.html
If the browser, or anything else, tries to download the file from that particular CloudFront edge before the upload into S3 is complete, S3 will return an error, and CloudFront -- at that edge location -- will cache that error and remember, for the next 5 minutes, not to bother trying again.
Not to worry, though -- this timer is configurable, so if the browser is doing this under the hood and outside your control, you should still be able to fix it.
You can specify the error-caching duration—the Error Caching Minimum TTL—for each 4xx and 5xx status code that CloudFront caches. For a procedure, see Configuring Error Response Behavior.
— http://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/custom-error-pages.html
To configure this in the console:
When viewing the distribution configuration, click the Error Pages tab.
For each error where you want to customize the timing, begin by clicking Create Custom Error Response.
Choose the error code you want to modify from the drop-down list, such as 403 (Forbidden) or 404 (Not Found) -- your bucket configuration determines which code S3 returns for missing objects, so if you aren't sure, change 403 then repeat the process and change 404.
Set Error Caching Minimum TTL (seconds) to 0
Leave Customize Error Response set to No (If set to Yes, this option enables custom response content on errors, which is not what you want. Activating this option is outside the scope of this question.)
Click Create. This takes you back to the previous view, where you'll see Error Caching Minimum TTL for the code you just defined.
Repeat these steps for each HTTP response code you want to change from the default behavior (which is the 300 second hold time, discussed above).
When you've made all the changes you want, return to the main CloudFront console screen where the distributions are listed. Wait for the distribution state to change from In Progress to Deployed (formerly, this took quite some time but now requires typically about 5 minutes for the changes to be pushed out to all the edges) and test.
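If you prefer to script this instead of clicking through the console, the same Error Caching Minimum TTL change can be made with boto3. This is only a sketch under assumptions: the distribution ID is a placeholder, and any custom error responses already in your configuration would need to be merged rather than overwritten as done here.

import boto3

cloudfront = boto3.client('cloudfront')
dist_id = 'EXXXXXXXXXXXXX'  # placeholder: your distribution ID

# Fetch the current config plus the ETag required for the update call
response = cloudfront.get_distribution_config(Id=dist_id)
config, etag = response['DistributionConfig'], response['ETag']

# Stop caching 403/404 error responses (this overwrites existing custom error responses)
config['CustomErrorResponses'] = {
    'Quantity': 2,
    'Items': [
        {'ErrorCode': 403, 'ErrorCachingMinTTL': 0},
        {'ErrorCode': 404, 'ErrorCachingMinTTL': 0},
    ],
}

cloudfront.update_distribution(Id=dist_id, DistributionConfig=config, IfMatch=etag)

As with the console change, the distribution has to finish redeploying before the new TTLs take effect.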

Are these new files being written to S3 for the first time, or are they updates to existing files? S3 provides read-after-write consistency for new objects, and given CloudFront's pull model you should not be having this issue with new files written to S3. If you are, then I would open a ticket with AWS.
If these are updates to existing files, then you have both S3 eventual consistency and CloudFront cache expiration to deal with; either could cause this sort of behavior.
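One way to rule out the upload itself on the client side is to confirm the object exists before handing out the CloudFront URL. A minimal sketch with boto3; the bucket, key, and domain names are placeholders:

import boto3

s3 = boto3.client('s3')

# Block until S3 reports the newly uploaded object exists
waiter = s3.get_waiter('object_exists')
waiter.wait(Bucket='my-upload-bucket', Key='user-file.jpg')

# Only now build and share the CloudFront URL for this object
url = 'https://xxxxx.cloudfront.net/user-file.jpg'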

As observed in your comment, it seems that Google Chrome is interfering with your upload/preview strategy:
1. Chrome requests the URL before the content exists.
2. CloudFront caches the error response.
3. You upload the file to S3.
4. When you preview the uploaded file, CloudFront answers with the cached error response from step 2.
5. After the CloudFront cache expires, CloudFront hits the origin and the problem can no longer be reproduced.
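If you cannot stop the early request, one hypothetical workaround (besides lowering the Error Caching Minimum TTL as described above) is to give the preview request a one-off query string so it cannot match the negatively cached response from step 2. This only helps if your cache behavior includes query strings in the cache key; the domain and key below are placeholders.

import uuid

def preview_url(cdn_domain, key):
    # A unique query parameter makes the preview request a cache miss
    return 'https://{}/{}?v={}'.format(cdn_domain, key, uuid.uuid4().hex)

print(preview_url('xxxxx.cloudfront.net', 'user-upload.jpg'))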

Related

AWS s3 bucket with CloudFront - Failed to load resource: the server responded with a status of 403 [duplicate]

Resize images on the fly in CloudFront and get them in the same URL instantly: AWS CloudFront -> S3 -> Lambda -> CloudFront

TLDR: We have to trick CloudFront's 307 redirect caching by creating a new cache behavior for responses coming from our Lambda function.
We are very close to achieving this, but we are badly stuck on the last step.
Business case:
Our application stores images in S3 and serves them with CloudFront in order to avoid any geographic slow downs around the globe.
Now, we want to be really flexible with the design and be able to request new image dimensions directly in the CloudFront URL!
Each new image size will be created on demand and then stored in S3, so the second time it is requested it will be served really quickly, as it will exist in S3 and will also be cached in CloudFront.
Let's say the user has uploaded the image chucknorris.jpg.
Only the original image will be stored in S3 and will be served on our page like this:
//xxxxx.cloudfront.net/chucknorris.jpg
We have calculated that we now need to display a thumbnail of 200x200 pixels.
Therefore we put the image src to be in our template:
//xxxxx.cloudfront.net/chucknorris-200x200.jpg
When this new size is requested, AWS has to provide it on the fly, in the same bucket and under the requested key.
This way the image will be directly loaded in the same URL of CloudFront.
I made an ugly drawing with the architecture overview and the workflow of how we are doing this in AWS.
Here is how Python Lambda ends:
return {
    'statusCode': '301',
    'headers': {'location': redirect_url},
    'body': ''
}
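For context, here is a hypothetical sketch of what the rest of such a handler could look like. It assumes an API Gateway proxy integration that passes the requested key as a query-string parameter, that Pillow is packaged with the function, and placeholder bucket and CloudFront domain names; it is a sketch of the technique, not the actual code from the question.

import io
import re
import boto3
from PIL import Image

s3 = boto3.resource('s3')
BUCKET = 'my-images-bucket'             # placeholder bucket name
CDN = 'https://xxxxx.cloudfront.net'    # placeholder distribution domain

def handler(event, context):
    # e.g. key = 'chucknorris-200x200.jpg'
    key = event['queryStringParameters']['key']
    match = re.match(r'(.+)-(\d+)x(\d+)(\.[^.]+)$', key)
    original_key = match.group(1) + match.group(4)
    width, height = int(match.group(2)), int(match.group(3))

    # Fetch the original, resize it, and store the new size under the requested key
    original = s3.Object(BUCKET, original_key).get()['Body'].read()
    image = Image.open(io.BytesIO(original))
    image.thumbnail((width, height))
    buffer = io.BytesIO()
    image.save(buffer, 'JPEG')
    buffer.seek(0)
    s3.Bucket(BUCKET).put_object(Key=key, Body=buffer, ContentType='image/jpeg')

    # Redirect the caller to the object that now exists in S3 / CloudFront
    return {
        'statusCode': '301',
        'headers': {'location': '{}/{}'.format(CDN, key)},
        'body': ''
    }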
The problem:
If we make the Lambda function redirect to S3, it works like a charm.
If we redirect to CloudFront, it goes into redirect loop because CloudFront caches 307 (as well as 301, 302 and 303).
As soon as our Lambda function redirects to CloudFront, CloudFront calls the API Gateway URL instead of fetching the image from S3.
I would like to create new cache behavior in CloudFront's Behaviors settings tab.
This behavior should not cache responses from Lambda or S3 (don't know what exactly is happening internally there), but should still cache any followed requests to this very same resized image.
I am trying to set the path pattern -\d+x\d+\..+$, add the ARN of the Lambda function under "Lambda Function Association", and set the Event Type to Origin Response.
Next to that, I am setting the "Default TTL" to 0.
But I cannot save the behavior due to some error.
Are we on the right track, or is the idea of this "Lambda Function Association" totally different?
Finally I was able to solve it. Although this is not really a structural solution, it does what we need.
First, thanks to Michael's answer, I used path patterns to match all media types. Second, the Cache Behavior page was a bit misleading to me: the Lambda association there is in fact for Lambda@Edge, although I did not see this mentioned anywhere in the tooltips of the cache behavior; all you see is just "Lambda". This feature cannot help us, as we do not want to extend our AWS service scope to Lambda@Edge just because of this particular problem.
Here is the solution approach:
I have defined multiple cache behaviors, one per media type that we support:
For each cache behavior I set the Default TTL to be 0.
And the most important part: In the Lambda function, I have added a Cache-Control header to the resized images when putting them in S3:
s3_resource.Bucket(BUCKET).put_object(Key=new_key,
                                      Body=edited_image_obj,
                                      CacheControl='max-age=12312312',
                                      ContentType=content_type)
To validate that everything works, I can now see that the new image dimension is served with the cache header from CloudFront.
You're on the right track... maybe... but there are at least two problems.
The "Lambda Function Association" that you're configuring here is called Lambda#Edge, and it's not yet available. The only users who can access it is users who have applied to be included in the limited preview. The "maximum allowed is 0" error means you are not a preview participant. I have not seen any announcements related to when this will be live for all accounts.
But even once it is available, it's not going to help you, here, in the way you seem to expect, because I don't believe an Origin Response trigger allows you to do anything to trigger CloudFront to try a different destination and follow the redirect. If you see documentation that contradicts this assertion, please bring it to my attention.
However... Lambda#Edge will be useful for setting Cache-Control: no-cache on the 307 so CloudFront won't cache it, but the redirect itself will still need to go all the way back to the browser.
Note also, Lambda#Edge only supports Node, not Python... so maybe this isn't even part of your plan, yet. I can't really tell, from the question.
Read about the Lambda#Edge limited preview.
The second problem:
I am trying to set path pattern -\d+x\d+\..+$
You can't do that. Path patterns are string matches supporting * wildcards. They are not regular expressions. You might get away with /*-*x*.jpg, though, since multiple wildcards appear to be supported.
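As a quick illustration of the difference, Python's fnmatch uses the same kind of simple "*" wildcard matching (only an approximation of CloudFront's exact matching rules, but close enough to show why a regex will not work):

from fnmatch import fnmatch

# Wildcard pattern in the CloudFront style, not a regular expression
print(fnmatch('/chucknorris-200x200.jpg', '/*-*x*.jpg'))  # True
print(fnmatch('/chucknorris.jpg', '/*-*x*.jpg'))          # False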

Amazon S3 static site serves old contents

My S3 bucket hosts a static website. I do not have CloudFront set up.
I recently updated the files in my S3 bucket. The files did get updated (I confirmed this manually in the bucket), yet the site still serves an older version of the files. Is there some sort of caching or versioning that happens with static websites hosted on S3?
I haven't been able to find any solution on SO so far. Note: CloudFront is NOT enabled.
Is there some sort of caching or versioning that happens on Static websites hosted on S3?
Amazon S3 buckets provide read-after-write consistency for PUTS of new objects and eventual consistency for overwrite PUTS and DELETES
What does this mean? If you create a new object in S3, you will be able to access it immediately. However, if you update an existing object, you will only 'eventually' get the newest version from S3, so S3 might still deliver the previous version of the object.
I believe that, starting some time ago, read-after-write consistency is also available for updates in the US Standard region.
How long do you need to wait? It depends; Amazon does not provide much information about this.
What can you do? Not much. If you want to make sure there is no issue with your S3 bucket delivering the files, upload the content under a new key; you will be able to access it immediately.
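A minimal sketch of that last point with boto3 (bucket and key names are placeholders): a brand-new key is readable immediately after the PUT, which is exactly the read-after-write guarantee for new objects.

import boto3

s3 = boto3.client('s3')
bucket = 'my-static-site-bucket'  # placeholder bucket name

# Write a brand-new object...
s3.put_object(Bucket=bucket, Key='hello-new.txt', Body=b'fresh content')

# ...and read it straight back; new-object PUTs are read-after-write consistent
body = s3.get_object(Bucket=bucket, Key='hello-new.txt')['Body'].read()
print(body)  # b'fresh content'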
The solution is here, but you need to use CloudFront. As @Frederic Henri said, you cannot do much in the S3 bucket itself, but with CloudFront you can invalidate objects.
CloudFront will have cached that file on an edge location for 24 hours, which is the default TTL (time to live), and will continue to return that file for 24 hours. Then, after the 24 hours are over and a request is made for that file, CloudFront will check the origin and see whether the file has been updated in the S3 bucket. If it has been updated, CloudFront will serve the new, updated version of the object. If it has not been updated, CloudFront will continue to serve the original version of the object.
However, if you update the file in the origin and want it to be served immediately via your website, you need to run a CloudFront invalidation. An invalidation wipes the file(s) from the CloudFront cache, so when a request is made to CloudFront, it will see that the file is not in the cache, check the origin, and serve the new, updated file from the origin. Running an invalidation is recommended each time files are updated in the origin (a programmatic sketch follows the console steps below).
To run an invalidation:
click on the following link for CloudFront console
-- https://console.aws.amazon.com/cloudfront/home?region=eu-west-1#
open the distribution in question
click on the 'Invalidations' tab to the right of all the tabs
click on 'Create Invalidation'
on the popup, it will ask for the path. You can enter /* to invalidate every object in the cache, or enter the exact path to the file, such as /images/picture.jpg
finally click on 'Invalidate'
this typically completes within 2 to 3 minutes
then once the invalidation is complete, when you request the object again through CloudFront, CloudFront will check the origin and return the updated file.
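The same invalidation can also be created programmatically. A sketch with boto3, where the distribution ID is a placeholder you must fill in:

import time
import boto3

cloudfront = boto3.client('cloudfront')

cloudfront.create_invalidation(
    DistributionId='EXXXXXXXXXXXXX',  # placeholder distribution ID
    InvalidationBatch={
        'Paths': {'Quantity': 1, 'Items': ['/*']},
        # CallerReference must be unique for each invalidation request
        'CallerReference': str(time.time()),
    },
)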
It sounds like Akshay tried uploading with a new filename and it worked.
I just tried the same (I was having the same problem), and it resolved the file not being available for me.
Do a push of index.html
index.html not updated
mv index.html index-new.html
Do a push of index-new.html
After this, index-new.html was immediately available.
That's kind of shite - I can't share one link to my website if I want to be sure that the recipient will see the latest version? I need to keep changing the filename and re-sharing the new link.
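If renaming by hand gets tedious, one sketch of automating the "new filename every time" approach is to derive the key from the file's content hash, so every changed version automatically gets a fresh URL. The bucket and domain names below are placeholders.

import hashlib
import boto3

s3 = boto3.client('s3')

def upload_versioned(path, bucket, site_domain):
    data = open(path, 'rb').read()
    digest = hashlib.md5(data).hexdigest()[:8]
    key = 'index.{}.html'.format(digest)   # e.g. index.3f2a9c1d.html
    s3.put_object(Bucket=bucket, Key=key, Body=data, ContentType='text/html')
    return 'https://{}/{}'.format(site_domain, key)

print(upload_versioned('index.html', 'my-site-bucket', 'example.com'))

Links you share then always point at a specific, immutable version of the file.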

AWS cloudfront not updating on update of files in S3

I created a distribution in CloudFront using my files on S3.
It worked fine and all my files were available. But today I updated my files on S3 and tried to access them via CloudFront, and it still served the old files.
What am I missing?
Just ran into the same issue. At first I tried setting the cache control to 0 and max-age=0 for the files I updated in my S3 bucket, but that didn't work.
What did work was following the steps from @jpaljasma. Here are the steps with some screenshots.
First go to your AWS CloudFront service.
Then click on the CloudFront distribution you want to invalidate.
Click on the invalidations tab then click on "Create Invalidation" which is circled in red.
In the "object path" text field, you can list the specific files ie /index.html or just use the wildcard /* to invalidate all. This forces cloudfront to get the latest from everything in your S3 bucket.
Once you filled in the text field click on "Invalidate", after CloudFront finishes invalidating you'll see your changes the next time you go to the web page.
Note: if you want to do it via the AWS command line interface, you can use the following command
aws cloudfront create-invalidation --distribution-id <your distribution id> --paths "/*"
The /* will invalidate everything, replace that with specific files if you only updated a few.
To find the list of CloudFront distribution IDs, you can run the command aws cloudfront list-distributions
Look at these two links for more info on those 2 commands:
https://docs.aws.amazon.com/cli/latest/reference/cloudfront/create-invalidation.html
https://docs.aws.amazon.com/cli/latest/reference/cloudfront/list-distributions.html
You should invalidate your objects in CloudFront distribution cache.
Back in the old days you had to do it one file at a time; now you can use a wildcard, e.g. /images/*
https://aws.amazon.com/about-aws/whats-new/2015/05/amazon-cloudfront-makes-it-easier-to-invalidate-multiple-objects/
How to change the Cache-Control max-age via the AWS S3 Console:
Navigate to the file whose Cache-Control you would like to change.
Check the box next to the file name (it will turn blue)
On the top right click Properties
Click Metadata
If you do not see a Key named Cache-Control, then click Add more metadata.
Set the Key to Cache-Control and set the Value to max-age=0 (where 0 is the number of seconds you would like the file to remain in the cache). You can replace 0 with whatever you want.
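The same metadata change can be scripted with boto3. S3 object metadata cannot be edited in place, so the object is copied over itself with MetadataDirective='REPLACE'; the bucket, key, and max-age values below are placeholders.

import boto3

s3 = boto3.client('s3')
bucket, key = 'my-site-bucket', 'index.html'   # placeholders

s3.copy_object(
    Bucket=bucket,
    Key=key,
    CopySource={'Bucket': bucket, 'Key': key},
    MetadataDirective='REPLACE',     # required to rewrite the metadata
    CacheControl='max-age=0',
    ContentType='text/html',         # re-set explicitly, since REPLACE discards the old metadata
)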
The main advantage of using CloudFront is to get your files from a source (S3 in your case) and store it on edge servers to respond to GET requests faster. CloudFront will not go back to S3 source for each http request.
To have CloudFront serve the latest files/objects, you have multiple options:
Use CloudFront to Invalidate modified Objects
You can use CloudFront to invalidate one or more files or directories, manually or using a trigger. This option has been described in other responses here. More information at Invalidate Multiple Objects in CloudFront. This approach comes in handy if you update your files infrequently and do not want to impact the performance benefits of cached objects.
Setting object expiration dates on S3 objects
This is now the recommended solution. It is straightforward:
Log in to AWS Management Console
Go into S3 bucket
Select all files
Choose "Actions" drop down from the menu
Select "Change metadata"
In the "Key" field, select "Cache-Control" from the drop down menu.
In the "Value" field, enter "max-age=300" (number of seconds)
Press "Save" button
The default cache value for CloudFront objects is 24 hours. By changing it to a lower value, CloudFront checks with the S3 source to see if a newer version of the object is available in S3.
I use a combination of these two methods to make sure updates are propagated to the edge locations quickly and to avoid serving outdated files managed by CloudFront.
AWS, however, recommends changing object names by including a version identifier in each file name. If you are using a build command and compiling your files, that option is usually available (as in the React npm build command).
For immediate reflection of your changes, you have to invalidate objects in CloudFront: Distribution list -> Settings -> Invalidations -> Create Invalidation.
This will clear the cached objects and load the latest ones from S3.
If you are updating only one file, you can also invalidate exactly that one file.
It will take just a few seconds to invalidate objects.
Distribution List -> settings -> Invalidations -> Create Invalidation
I also faced similar issues and found out it's really easy to fix in your CloudFront distribution.
Step 1.
Log in to your AWS account and select your target distribution, as shown in the picture below.
Step 2.
Select Distribution Settings and select the Behaviors tab.
Step 3.
Select Edit and choose the option All, as per the image below.
Step 4.
Save your settings and that's it.
I also had this issue and solved it by using versioning (not the same as S3 versioning). Here is a comprehensive link to using versioning with CloudFront:
Invalidating Files
In summary:
When you upload a new file or files to your S3 bucket, change the version and update your links as appropriate. From the documentation, the benefit of using versioning vs. invalidation (the other way to do this) is that there is no additional charge for making CloudFront refresh via version changes, whereas there is with invalidation. If you have hundreds of files this may be problematic, but it's possible that by adding a version to your root directory, or default root object (if applicable), it wouldn't be a problem. In my case, I have an SPA; all I have to do is change the version of my default root object (index.html to index2.html) and it instantly updates on CloudFront.
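For reference, a sketch of flipping the default root object with boto3 (the distribution ID and object names are placeholders; the configuration change still has to finish deploying to the edges):

import boto3

cloudfront = boto3.client('cloudfront')
dist_id = 'EXXXXXXXXXXXXX'  # placeholder distribution ID

response = cloudfront.get_distribution_config(Id=dist_id)
config, etag = response['DistributionConfig'], response['ETag']

config['DefaultRootObject'] = 'index2.html'  # point at the newly uploaded version
cloudfront.update_distribution(Id=dist_id, DistributionConfig=config, IfMatch=etag)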
Thanks tedder42 and Chris Heald
I was able to reduce the cache duration in my origin, i.e. the S3 object, and deliver the files more quickly than with the default of 24 hours.
For some of my other distributions I also set forward all headers to origin, in which case CloudFront doesn't cache anything and sends every request to the origin.
Thanks.
Please refer to this answer; it may help you:
What's the difference between Cache-Control: max-age=0 and no-cache?
Add a Cache-Control header with a value of 0 to the selected file in S3.
How to change the Cache-Control max-age via the AWS S3 Console:
Go to your bucket
Select all files you would like to change (you can select folders as well; this will include all files inside them)
Click on the Actions dropdown, then click on Edit Metadata
On the page that will open, click on Add metadata
Set Type to System defined
Set Key to Cache-Control
Set Value to max-age=0 (or whatever max-age you would like to set)
Click on Save Changes
Invalidate all distribution files:
aws cloudfront create-invalidation --distribution-id <dist-id> --paths "/*"
If you need to remove a file from CloudFront edge caches before it expires, see the docs on invalidating files.
The best practice for solving this issue is probably using the Object Version approach.
The invalidation method can also solve this problem, but it has some side effects: costs increase if you exceed 1,000 invalidation paths per month, and some objects may not be removable via this method.
I hope the official doc "Why CloudFront is serving outdated content from Amazon" helps.

Are deleted files accessible if behind cloudfront

I am trying to wrap my head around how CloudFront and CDNs work.
Suppose I have a file whose Cache-Control header is set to 1 year, and I am using Amazon CloudFront as my CDN.
What happens if I delete the file? Would it still be served because it is cached by the CloudFront servers? Would it be served in all locations of the world, or does it only get cached on an edge server once it has been requested?
Example: I have a file behind Amazon CloudFront:
blue.jpg, with Cache-Control headers set for 1 year.
I visit the file from a location in New York.
I then delete the file.
If I then visit the page which includes the file again from New York, would the file still be served, since it's cached?
What if someone then visits the page with the file from Moscow, Russia? Would they be able to view the file?
Thanks for your help :)
CloudFront is simply a collection of caches close to your users. Each edge location operates independently.
By default, CloudFront obeys your HTTP cache control headers. If you set your headers so a file does not expire for a year, CloudFront will continue serving that file for a year without checking back with your origin server.
Since each edge location operates independently, in your example New York will continue serving the file, but Moscow will see the file as deleted (404). As you can imagine, this could lead to different users seeing different content.
There are strategies to avoid this problem.
From the CloudFront docs (http://aws.amazon.com/cloudfront/#details):
Object Versioning and Cache Invalidation
You have two options to update your files cached at the Amazon CloudFront edge locations. You can use object versioning to manage changes to your content. To implement object versioning, you create a unique filename in your origin server for each version of your file and use the file name corresponding to the correct version in your web pages or applications. With this technique, Amazon CloudFront caches the version of the object that you want without needing to wait for an object to expire before you can serve a newer version.
You can also remove copies of a file from all Amazon CloudFront edge locations at any time by calling the invalidation API. This feature removes the file from every Amazon CloudFront edge location regardless of the expiration period you set for that file on your origin server. If you need to remove multiple files at once, you may send a list of files (up to 1,000) in an XML document. The invalidation feature is designed to be used in unexpected circumstances, e.g., to correct an encoding error on a video you uploaded or an unanticipated update to your website’s CSS file. However, if you know beforehand that your files will change frequently, it is recommended that you use object versioning to manage updates to your files. This technique gives you more control over when your changes take effect and also lets you avoid potential charges for invalidating objects.
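If you do need a deleted object to disappear from every edge before its TTL runs out, the invalidation API mentioned above is the tool. A boto3 sketch, with the bucket, key, and distribution ID as placeholders:

import time
import boto3

s3 = boto3.client('s3')
cloudfront = boto3.client('cloudfront')

# Delete the object at the origin...
s3.delete_object(Bucket='my-bucket', Key='blue.jpg')

# ...then evict it from every CloudFront edge location
cloudfront.create_invalidation(
    DistributionId='EXXXXXXXXXXXXX',
    InvalidationBatch={
        'Paths': {'Quantity': 1, 'Items': ['/blue.jpg']},
        'CallerReference': str(time.time()),
    },
)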