S3 cloudfront expiry on images - performance very slow - amazon-web-services

I recently started serving my website images from the CloudFront CDN instead of S3, thinking it would be much faster. It is not. In fact it is much, much slower. After a lot of investigation I've been given hints that setting an expiry date on image objects is the key, as CloudFront will then know how long to keep cached static content. Makes sense. But this is poorly documented by AWS and I can't figure out how to change the expiry date. People have said "you can change this in the AWS console." Please show how dumb I am, because I cannot see this. I've been at it for several hours. Needless to say I'm quite frustrated fumbling around on this. Anyway, any hints, as small as they might be, would be terrific. I like AWS, and what CloudFront promised, but so far it's not what it seems.
EDIT, ADDITIONAL DETAIL:
Added expiry date headers per the answer. In my case I had no headers at all. My hypothesis was that my slow CloudFront performance serving images had to do with having NO expiry in the header. Having set an expiry date as shown in the screenshot and described in the answer, I'm seeing no noticeable difference in performance (going from no headers to adding an expiry date only). My site takes on average 7s to load, with 10 core images (each < 60 KB). Those 10 images (served via CloudFront) account for 60-80% of the load-time latency, depending on the performance tool used. Obviously something is wrong, given that serving files from my VPS is faster. I hate to conclude that CloudFront is the problem given that so many people use it, and I'd hate to break off from EC2 and S3, but right now testing MaxCDN is showing better results. I'll keep testing over the next 24 hrs, but my conclusion is that the expiry date header is just a confusing detail with no favorable impact. Hopefully I'm wrong, because I'd like to keep it all in the AWS family. Perhaps I'm barking up the wrong tree on the expiry date thing?

You will need to set it in the metadata while uploading the file to S3. This article describes how you can achieve this.
The expiry date uses the RFC 1123 date format, which looks like this:
Expires: Thu, 01 Dec 1994 16:00:00 GMT
Setting this to a far-future date will enable caches like CloudFront to hold the file for a long time, and in this case speed up delivery, as the individual edge locations (servers all over the world delivering content for CloudFront) already have the data and don't need to fetch it again and again.
Even with a far-future expiry header, the first request for an object will be slow, as the edge location has to fetch the object once before it can serve it from the cache.
Alternatively, you can omit the Expires header and use Cache-Control instead. CloudFront understands that one too, and you get more flexibility with the expiry. For example, you can state that the object should be held for one day from the edge location's first request for the object:
Cache-Control: public, max-age=86400
In that case the time is given in seconds instead of as a fixed date.
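As an illustration, here is a small Python sketch that builds both headers for an S3 upload; the helper name and the bucket/key in the commented-out boto3 call are hypothetical, not from the answer above:

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

def cache_headers(days):
    """Build far-future caching headers for an S3 upload (hypothetical helper)."""
    expires = datetime.now(timezone.utc) + timedelta(days=days)
    return {
        # RFC 1123 date, e.g. "Thu, 01 Dec 1994 16:00:00 GMT"
        "Expires": format_datetime(expires, usegmt=True),
        # Same policy expressed in seconds
        "CacheControl": f"public, max-age={days * 86400}",
    }

headers = cache_headers(1)
print(headers["CacheControl"])  # public, max-age=86400

# With boto3 these map onto the upload call, e.g.:
# s3 = boto3.client("s3")
# s3.upload_file("logo.png", "my-bucket", "images/logo.png",
#                ExtraArgs={"ContentType": "image/png", **headers})
```

Note that boto3 uses `CacheControl`/`Expires` as parameter names, while the resulting HTTP response headers are `Cache-Control` and `Expires`.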

While setting Cache-Control or Expires headers will improve the cacheability of your objects, it won't improve your 60k/sec download speed. That requires some help from AWS, so I would recommend posting to the AWS CloudFront forums with some sample response headers, traceroutes, and resolver information.

Related

How to use CloudFront efficiently for less popular website?

We are building a website which contains a lot of images and data. We have optimized a lot to make the website faster. Then we decided to use AWS CloudFront also to make it faster for all regions around the world. The app works faster after the integration of CloudFront.
But later we found that data is loaded into the CloudFront cache only when the website asks for it. So we are afraid the initial load will take the same time as it did without the CDN, because it loads from S3 to the CDN first and then to the user.
Also, we used the default TTL value (i.e., 24 hours). In our case, a user may log in once or twice per week to this website. So in that case the advantage of caching won't apply either, because the cache expires after 24 hours. Will raising the TTL (Maximum TTL) to a larger value solve the issue? Does it cost more money? I also read that increasing to a longer TTL is not a good idea, as it also has disadvantages for updating the data in S3.
CloudFront caches the response only after the first user requests it. So it will be slow for the first user, but significantly faster for every user after that. So it does make sense to use CloudFront.
Using the default TTL value is okay, since most users will see the same content and the website has a lot of static components as well. Every user except the first will see a fast response from your website. You could even reduce this to 10-12 hours, depending on how often you expect your data to change.
There is no additional cost to increasing your TTL. However, invalidation requests are charged, so if you want to remove a cached object, there is a cost attached. I would therefore keep the TTL about as short as the interval at which your data is expected to change, so you don't have to invalidate existing caches when your data changes. At the same time, the maximum number of users can benefit from your CDN.
No additional charge for the first 1,000 paths requested for invalidation each month. Thereafter, $0.005 per path requested for invalidation.
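Given those published rates, a quick back-of-the-envelope check (a sketch; substitute your own numbers) shows how invalidation costs scale:

```python
def invalidation_cost(paths_per_month, free_paths=1000, rate_per_path=0.005):
    """Estimate monthly CloudFront invalidation cost in USD,
    using the pricing quoted above."""
    billable = max(0, paths_per_month - free_paths)
    return billable * rate_per_path

print(invalidation_cost(800))   # 0.0  -> within the free tier
print(invalidation_cost(5000))  # 20.0 -> 4,000 billable paths at $0.005
```

At low invalidation volumes the cost is negligible, which is why a sensible TTL usually matters more to the bill than occasional purges.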
UPDATE: In the event that you only have one user visiting the website over a long period of time (a week or so), it might not be of much benefit to use CloudFront at all. CloudFront and all caching services are only effective when multiple users request the same resources.
However you might still have a marginal benefit using CloudFront, as the requests will be routed from the edge location to S3 over AWS's backbone network which is much faster than the internet. But whether this is cost effective for you or not depends on how many users are using the website and how slow it is.
Aside from using CloudFront, you could also try S3 Cross Region Replication to increase your overall speed. Cross Region Replication can replicate your buckets to a different region as and when they are added in one region. This can help to minimize latency for users from other regions.

How to explain variation between cloudfront update times?

I know CloudFront updates its servers roughly every 24 hours [source].
My question is: why does it sometimes take less than 24 hours? Sometimes I update S3 and bam, the new content is available from XXdomain.com immediately. Other times it seems to take the full 24 hours.
How can anyone explain the variation? Why does it seem like a non-standard amount of time to update?
It depends on whether the request is cached or not. If a request misses at a POP (Point of Presence), the POP needs to fetch from the origin. So for the first request, it will contact the origin and serve whatever is there.
Otherwise, if the object is already cached, the POP serves whatever is cached, and contents aren't modified for a long time. If you purge, it can take longer, up to around 24 hours, varying with the POP location, network availability, cacheable size for a domain, etc.
You can use either cache headers or set your cache configuration to your desired time.
Hope it helps.
I know cloudfront updates it's servers ~24 hours
That isn't really an accurate description of what happens.
More correctly, we can say that by default, an object cached in a CloudFront edge will be evicted after 24 hours.
There is no active update process. The CloudFront cache is a passive, pull-through cache. When a request arrives, it is served from cache if a cached copy is available and not stale, otherwise a request is sent to the origin server, the object is stored in the cache, and returned to the viewer.
If the cache does not contain a fresh copy of the object, it fetches it immediately from the origin upon request. Thus, the timing of requests made by you and others will determine how frequently it appears that CloudFront is "updating," even though "updating" isn't really an accurate term for what is occurring.
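The pull-through behaviour described above can be sketched in a few lines of Python (a toy model for intuition, not CloudFront's actual implementation):

```python
class PullThroughCache:
    """Toy model of a passive, pull-through cache: nothing is pushed;
    an object is fetched from the origin on a miss (or when stale) and
    kept until its TTL elapses (CloudFront's default is 24 hours)."""

    def __init__(self, origin_fetch, ttl=86400):
        self.origin_fetch = origin_fetch
        self.ttl = ttl
        self.store = {}  # key -> (value, expiry timestamp)

    def get(self, key, now):
        cached = self.store.get(key)
        if cached and cached[1] > now:
            return cached[0], "Hit"          # fresh copy in cache
        value = self.origin_fetch(key)       # miss or stale: pull from origin
        self.store[key] = (value, now + self.ttl)
        return value, "Miss"

cache = PullThroughCache(lambda key: f"origin:{key}")
print(cache.get("/img/a.png", now=0))      # ('origin:/img/a.png', 'Miss')
print(cache.get("/img/a.png", now=100))    # ('origin:/img/a.png', 'Hit')
print(cache.get("/img/a.png", now=90000))  # TTL elapsed -> 'Miss' again
```

The "variation" in the question falls out of this model: whether you see fresh content depends entirely on when the last request refreshed the cache entry, not on any scheduled update.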
The CloudFront cache is also not monolithic. If you are in the Eastern U.S. and your user base is in Western Europe, you would potentially see the update sooner, because the edge that handles your request sees less traffic and is thus less likely to have handled a recent request and to have a cached copy available.
After updating your content in S3, create a CloudFront invalidation request for /*. This marks everything cached previously as expired, so all subsequent requests will be sent to the origin server and all viewers will see fresh content. Each AWS account can submit invalidations covering up to 1,000 paths per month (across all distributions combined) at no cost.

Can I expect a delay if I am showing the s3 content in website

Since S3 is eventually consistent, in how much time will the data become consistent? If my user uploads some media and I have to show the same content on the website, can I expect a scenario where some users see the content on the website and others don't for some time?
It's eventually consistent, but in my experience that usually means under a few seconds. So yes, it's possible, but only for a very, very small amount of time.
Yes, it is very quick. Quick enough that any limitation on update speed would not be caused by S3. I host a static website on S3, and I can upload/update a file and see the contents in less than 1 second (i.e., refreshing the page as fast as I can).

How to reduce Amazon Cloudfront costs?

I have a site that has exploded in traffic the last few days. I'm using Wordpress with W3 Total Cache plugin and Amazon Cloudfront to deliver the images and files from the site.
The problem is that the cost of Cloudfront is quite huge, near $500 just the past week. Is there a way to reduce the costs? Maybe using another CDN service?
I'm new to CDN, so I might not be implementing this well. I've created a cloudfront distribution and configured it on W3 Total Cache Plugin. However, I'm not using S3 and don't know if I should or how. To be honest, I'm not quite sure what's the difference between Cloudfront and S3.
Can anyone give me some hints here?
I'm not quite sure what's the difference between Cloudfront and S3.
That's easy. S3 is a data store. It stores files and is super-scalable (easily scaling to serving thousands of people at once). The problem is that it's centralized (i.e., served from one place in the world).
CloudFront is a CDN. It caches your files all over the world so they can be served faster. If you squint, it looks like they are 'storing' your files, but the cache can be lost at any time (or if they boot up a new node), so you still need the files at your origin.
CF may actually hurt you if you have too few hits per file. For example, in Tokyo, CF may have 20 nodes. It may take 100 requests to a file before all 20 CF nodes have cached your file (requests are randomly distributed). Of those 100 requests, 20 of them will hit an empty cache and see an additional ~200ms latency as the node fetches the file. They generally cache your file for a long time.
I'm not using S3 and don't know if I should
Probably not. Consider using S3 if you expect your site to grow massively in media (i.e., lots of user photo uploads).
Is there a way to reduce the costs? Maybe using another CDN service?
That entirely depends on your site. Some ideas:
1) Make sure you are serving the appropriate headers, and make sure your expires time isn't too short (it should ideally be days, weeks, or months).
The "best practice" is to never expire pages, except maybe your index page, which should expire every X minutes, hours, or days (depending on how fast you want it updated). Make sure every page/image says how long it can be cached.
2) As stated above, CF is only useful if each page is requested > 100's of times per cache time. If you have millions of pages, each requested a few times, CF may not be useful.
3) Requests from Asia are much more expensive than those from the US. Consider launching your server in Tokyo if you're more popular there.
4) Look at your web server logs and see how often CF is requesting each of your assets. If it's more often than you expect, your cache headers are set up wrong. If you set "cache this for months", you should only see a handful of requests per day (as they boot new servers, etc.), and a few hundred requests when you publish a new file (i.e., one request per CF edge node).
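One quick way to do that check is to count origin requests per asset. A minimal sketch (the sample lines below are made up, in common log format; real origin logs will differ):

```python
from collections import Counter

# Made-up sample lines in common log format, standing in for your
# origin server's access log.
log_lines = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /img/logo.png HTTP/1.1" 200 1234',
    '203.0.113.9 - - [10/Oct/2023:14:02:10 +0000] "GET /img/logo.png HTTP/1.1" 200 1234',
    '203.0.113.5 - - [10/Oct/2023:14:05:01 +0000] "GET /css/site.css HTTP/1.1" 200 567',
]

# The request line sits between the first pair of double quotes;
# its second token is the path.
hits = Counter(line.split('"')[1].split()[1] for line in log_lines)
print(hits.most_common())  # [('/img/logo.png', 2), ('/css/site.css', 1)]
```

Assets that show hundreds of origin hits per day despite a months-long cache header are the ones to investigate.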
Depending on your setup, other CDNs may be cheaper. And depending on your server, other setups may be less expensive. (i.e. if you serve lots of small files, you might be better off doing your own caching on EC2.)
You could give Cloudflare a go. It's not a full CDN, so it might not have all the features of CloudFront, but the basic package is free and it will offload a lot of traffic from your server.
https://www.cloudflare.com
Amazon CloudFront costs are based on 2 factors:
Number of requests
Data transferred in GB
Solution:
Reduce image requests. To do that, combine small images into one image (image sprites) and use that:
https://www.w3schools.com/css/tryit.asp?filename=trycss_sprites_img
Don't use the CDN for video files, because videos are large and are responsible for very high CDN costs.
What components make up your bill? One thing to check with the W3 Total Cache plugin is the number of invalidation requests it sends to CloudFront. It's known to send a large number of invalidation paths on each change, which can add up.
Aside from that, if your spend is predictable, one option is to use CloudFront Security Savings Bundle to save up to 30% by committing to a minimum amount for a one year period. It's self-service, so you can sign up in the console and purchase additional commitments as your usage grows.
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/savings-bundle.html
Don't forget that CloudFront has 3 different price classes, which influence how widely your data is distributed, but at the same time make it cheaper.
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/PriceClass.html
The key here is this:
"If you choose a price class that doesn’t include all edge locations, CloudFront might still occasionally serve requests from an edge location in a region that is not included in your price class. When this happens, you are not charged the rate for the more expensive region. Instead, you’re charged the rate for the least expensive region in your price class."
It means that you could use price class 100 (the cheapest one) and still get served from regions you are not paying for <3

Is it possible to set Amazon CloudFront cache to far-future expiry?

How do I change the cache expiry in CloudFront on AWS? I can't see a way to do it, and I think I saw an old post on here from a few years ago where somebody said it couldn't be done.
I've gone through every option in S3 and CloudFront and every option on the outer folder and on the file, but nothing.
Can it be done now, or is there any alternative? I really want to set the cache to 6 months or a year if I can.
AWS is hard work.
You can, but it's not exactly obvious how this works.
You can store custom HTTP headers with your S3 objects. In the console, this is under the metadata section for an object. With this you can set a far-future Expires header.
CloudFront will take the existing headers and pass them on. If CloudFront is already caching the object, you will need to invalidate it to see the new headers after you set them.
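For objects already in the bucket, you don't need to re-upload: an in-place S3 copy with MetadataDirective=REPLACE rewrites the headers. A hedged boto3 sketch (the helper and the bucket/key names are hypothetical; the actual copy_object call is left commented out):

```python
from datetime import datetime, timedelta, timezone

def far_future_copy_args(bucket, key, days=180):
    """Arguments for an in-place S3 copy that replaces an existing
    object's metadata with far-future caching headers (hypothetical helper)."""
    return {
        "Bucket": bucket,
        "Key": key,
        "CopySource": {"Bucket": bucket, "Key": key},  # copy the object onto itself
        "MetadataDirective": "REPLACE",                # required to rewrite headers in place
        "CacheControl": f"public, max-age={days * 86400}",
        "Expires": datetime.now(timezone.utc) + timedelta(days=days),
    }

args = far_future_copy_args("my-bucket", "images/logo.png")
print(args["CacheControl"])  # public, max-age=15552000
# boto3.client("s3").copy_object(**args)
# ...then invalidate the object in CloudFront to pick up the new headers.
```

Without MetadataDirective="REPLACE", S3 copies the old metadata along with the object, so the new headers would be silently ignored.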