Multiple distributions without assigning CNAME

From my understanding, custom SSL on CloudFront is a costly option (out of reach for me). Therefore, I am considering using the default https://*.cloudfront.net domain instead.
One of the perks of CF over S3 is the ability to assign multiple custom domains to get the benefit of concurrent parallel HTTP connections, i.e. cdn0.domain.com, cdn1.domain.com, etc.
Since a custom domain + SSL is not an option, does CF have a wildcard option of the https://*[0,1,2,3].cloudfront.net variant that maps to a single distribution?

The solution in this case would be to simply have multiple CF distributions... one for images, another for static code (JS, CSS), etc. Typically, if you are already keeping these assets in S3, have a separate bucket for each type (say, one for images) and make each bucket the origin server of its CF distribution.
Having said that, the number of concurrent connections in browsers has increased over time; it is not that small anymore. Typically a page needs to load only one JS file (combined, minified), one CSS file (combined, minified) and one image for icons (sprited). That is only 1 + 3 connections - not too high. Other images on the page (like large thumbnails) come from another CF distribution anyway. So you don't have to "artificially" create sub-domains for performance.
This shows the current state of browser connection limits: What's the maximum number of simultaneous connections a browser will make?

Related

Seamlessly Transferring Heroku Domain to Another Domain

Use case: 100s of clients, each on a different Heroku box because of a previous need for unique env variables per client. The person or persons who set this up didn't expect these 100s of clients, only a few. Obviously this wasn't built to scale.
Current Challenge: Now we've got a big code push coming that will enable us to move each of these clients onto one box, or a few smaller boxes. Each of these clients has an IT team and will need to point to our new box's DNS target. Heroku doesn't allow a subdomain (whatever.coredomain.com) to be on multiple boxes. Essentially, for every single client, we'll need to coordinate downtime to take the old box down, add the domain to the new box, then have the client point to the latest CNAME.
Is there a clever way to do this without downtime?
I'm familiar with the AWS suite (and we do use some services) and wouldn't be opposed to moving to it, but Heroku has a lot of built-in tools that our not-so-tech-savvy team uses.

Best practice for streaming images from S3 to clients through a server

I am trying to find the best practice for streaming images from S3 to a client's app.
I created a grid-like layout using Flutter on a mobile device (similar to Instagram). How can my client access all of its images?
Here is my current setup: the client opens its profile screen (which contains the grid-like layout of all images, sorted by timestamp). This automatically requests all images from the server. My Python 3 backend uses boto3 to access S3 and DynamoDB. A DynamoDB table holds the list of all image paths the client uploaded, sorted by timestamp. Once I get the paths, I use them to download all the images to my server first and then send them to the client.
Basically my server is the middleman, downloading the images and then sending them back to the client. Is this the right way of doing it? It seems that if the client accessed S3 directly, it would be faster, but I'm not sure if that is safe. Plus, I don't know how I can give clients access to S3 without giving them AWS credentials...
Any suggestions would be appreciated. Thank you in advance!
What you are doing will work, and it's probably the best option if you are optimising for getting something working quickly, without worrying too much about wasted server resources, unnecessary computation, or scalability.
However, if you're worried about scalability, lower latency, and secure access to these image resources, you might want to improve your current architecture.
Once I get the paths, I use that to download all images to my server first and then send it to the client.
This is the first part I would try to get rid of, as you don't really need your backend to download these images and stream them itself. However, you still need to control access to the resources based on who owns them. I would consider switching to the setup below to improve latency and spend fewer server resources:
Once you get the paths in your backend service, generate presigned URLs for the S3 objects, which will give your client temporary access to these resources (depending on your needs, you can adjust how long a URL stays valid).
Then send these links to your client so that it can stream directly from S3, rather than your server acting as the middleman.
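For illustration, here is a minimal boto3 sketch of that step; the bucket name and object keys are placeholders, and in practice the keys would come from the DynamoDB query the backend already runs:

```python
import boto3

s3 = boto3.client("s3")

def build_image_urls(image_keys, bucket="my-image-bucket", expires_in=900):
    """Return short-lived HTTPS URLs the client can fetch directly from S3."""
    return [
        s3.generate_presigned_url(
            "get_object",
            Params={"Bucket": bucket, "Key": key},
            ExpiresIn=expires_in,  # seconds; tune to how long a browsing session needs
        )
        for key in image_keys
    ]

# Example keys; in your case these come from the DynamoDB table of image paths.
urls = build_image_urls(["users/123/img_001.jpg", "users/123/img_002.jpg"])
```

The backend then returns these URLs in its API response instead of the image bytes themselves.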
Once you have this setup working, I would consider using Amazon CloudFront to improve access to your objects through the CDN capabilities that CloudFront gives you, especially if your clients are distributed across different geographical regions. As far as I can see, you can also make CloudFront work with signed URLs (its equivalent of S3 presigned URLs).
Is this the right way of doing it? It seems that if the client accesses S3 directly, it'll be faster but I'm not sure if that is safe
Presigned URLs are your way of mitigating uncontrolled access to your S3 objects. You probably need to worry about edge cases, though (e.g. how the client should behave when its access to an S3 object has expired, so that users won't notice, etc.). All of these are costs of making something work at scale, if you have those scalability concerns.

How to use CloudFront efficiently for a less popular website?

We are building a website which contains a lot of images and data. We have optimized a lot to make the website faster. Then we decided to also use AWS CloudFront to make it faster for all regions around the world. The app works faster after the integration of CloudFront.
But later we found that data is loaded into the CloudFront cache only when the website asks for it. So we are afraid that the initial load will take the same time as it used to without the CDN, because the object loads from S3 to the CDN first and then to the user.
Also, we used the default TTL values (i.e., 24 hours). In our case, a user may log in only once or twice per week, so the advantage of caching won't apply there either, because the cache expires after 24 hours. Will raising the Maximum TTL to a larger value solve the issue? Does it cost more money? I also read that a longer TTL is not a good idea, as it has disadvantages when it comes to updating the data in S3.
CloudFront caches a response only after the first user requests it. So it will be slow for the first user, but significantly faster for every user after that. So it does make sense to use CloudFront.
Using the default TTL value is okay, since most users will see the same content and the website has a lot of static components. Every user except the first will get a fast response from your website. You could even reduce this to 10-12 hours, depending on how often you expect your data to change.
There is no additional cost to increasing your TTL. However, invalidation requests are charged, so if you want to remove something from the cache, there is a cost attached. I would therefore keep the TTL roughly as short as the interval at which your data is expected to change, so you don't have to invalidate existing caches when your data changes, while the maximum number of users can still benefit from your CDN.
No additional charge for the first 1,000 paths requested for invalidation each month. Thereafter, $0.005 per path requested for invalidation.
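For reference, this is roughly what an invalidation request looks like with boto3; the distribution ID is a placeholder, and each entry in Items counts as one path toward that quota (a wildcard like /images/* counts as a single path):

```python
import boto3
import time

cf = boto3.client("cloudfront")

cf.create_invalidation(
    DistributionId="E1EXAMPLE12345",  # placeholder distribution ID
    InvalidationBatch={
        # Two paths are counted here toward the 1,000 free paths per month.
        "Paths": {"Quantity": 2, "Items": ["/index.html", "/images/*"]},
        "CallerReference": str(time.time()),  # must be unique per request
    },
)
```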
UPDATE: In the event that you only have one user using the website over a long period of time (a week or so), it might not be of much benefit to use CloudFront at all. CloudFront and all caching services are only effective when multiple users request the same resources.
However, you might still see a marginal benefit from CloudFront, as requests are routed from the edge location to S3 over AWS's backbone network, which is much faster than the public internet. Whether this is cost effective for you depends on how many users are using the website and how slow it currently is.
Aside from CloudFront, you could also try S3 Cross-Region Replication to improve overall speed. Cross-Region Replication copies objects to a bucket in another region as they are added to the source bucket, which can help minimize latency for users in other regions.
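If you want to try that route, here is a rough boto3 sketch of enabling replication. Versioning must already be enabled on both buckets, and the bucket names and IAM role ARN below are placeholders:

```python
import boto3

s3 = boto3.client("s3")

# Both the source and destination buckets must have versioning enabled first.
s3.put_bucket_replication(
    Bucket="my-source-bucket",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",  # placeholder role
        "Rules": [
            {
                "ID": "replicate-all-objects",
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix = replicate every object
                "Destination": {"Bucket": "arn:aws:s3:::my-destination-bucket"},
                "DeleteMarkerReplication": {"Status": "Disabled"},
            }
        ],
    },
)
```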

Can a distribution automatically match the subdomain from a request to figure out the origin?

We're adding a lot of nearly identical apps on the same domain; each app can be accessed through its own subdomain. Each app has a few specific assets (not a lot).
Every app refers to the same cdn.mydomain.com to get its assets from CloudFront.
Assets are namespaced. For example:
app1:
Can be reached from app1.mydomain.com
assets URL is cdn.mydomain.com/assets/app1
CloudFront origin is app1.mydomain.com
cache behavior /assets/app1/* routes to origin app1.mydomain.com
When CloudFront doesn't have an asset in cache, it downloads it from the right origin.
Currently, we create a new origin and a new cache behavior on the same distribution every time we add a new app.
We're trying to simplify that process so that CloudFront can get the assets from the right origin without us having to specify it. That would also solve the problem of hitting the limit on the number of origins in one distribution.
How can we do this, and is it even possible?
We're thinking of making a single origin of mydomain.com with a cache behavior configured to forward the Host header, but we're not sure that this will do the trick.
Origins are tied to Cache Behaviors, which are tied to path patterns. You can't really do what you're thinking about doing.
I would suggest that you create a distribution for each app and each subdomain. It's very easy to script this using aws-cli, since once you have one set up the way you like it, you can use its configuration output as a template to make more, with minimal changes. (I use a Perl script to build the final JSON for each new distribution, with minimal inputs like the alternate domain name and certificate ARN, and pipe its output into aws-cli.)
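The same clone-and-tweak approach works from Python if you prefer boto3 over Perl and aws-cli; a minimal sketch, where the distribution ID, origin ID, and domain names are placeholders:

```python
import boto3
import time

cf = boto3.client("cloudfront")

# Use an existing, already-tuned distribution as the template.
template = cf.get_distribution_config(Id="E1TEMPLATE12345")["DistributionConfig"]

template["CallerReference"] = str(time.time())  # must be unique per distribution
template["Aliases"] = {"Quantity": 1, "Items": ["app2.cdn.example.com"]}
template["Comment"] = "CDN for app2"

# Point the origin at the new app's backend.
origin = template["Origins"]["Items"][0]
origin["DomainName"] = "app2.mydomain.com"
origin["Id"] = "app2-origin"
template["DefaultCacheBehavior"]["TargetOriginId"] = "app2-origin"

cf.create_distribution(DistributionConfig=template)
```

The viewer certificate settings (e.g. a shared wildcard ACM cert) carry over from the template unchanged.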
I believe this is the right approach, because:
CloudFront cannot select the origin based on the Host header. Only the path pattern is used to select the origin.
Lambda@Edge can rewrite the path and can inspect the Host header, but it cannot rewrite the path before the matching is done that selects the Cache Behavior (and thus the origin). You cannot use Lambda@Edge to cause CloudFront to switch or select origins, unless you generate browser redirects, which you probably don't want to do, for performance reasons. I've submitted a feature request to allow a Lambda trigger to signal CloudFront that it should return to the beginning of processing and re-evaluate the path, but I don't know if it is being considered as a future feature -- AWS tends to keep their plans for future functionality close to the vest, and understandably so.
You don't gain any efficiency or cost savings by combining your sites in a single distribution, since the resources are different.
If you decide to whitelist the Host header, CloudFront will cache responses separately based on the Host header, the same as it would if you had created multiple distributions. Even if the path is identical, it will still cache separate responses if the Host header differs, as it must to ensure sensible behavior.
The default limit for distributions is 200, while the limit for origins and cache behaviors per distribution is 25. Both can be raised by request, but the number of distributions they can give you is essentially unlimited, while the other resources are finite because they increase the workload on the system for each request and would eventually have a negative performance impact.
Separate distributions give you separate logs and reports.
Provisioning errors have a smaller blast radius when each app has its own distribution.
You can also go into AWS Certificate Manager and request a wildcard certificate for *.cdn.example.com. Then use e.g. app1.cdn.example.com as the alternate domain name for the app1 distribution and attach the wildcard cert. Then reuse the same cert on the app2.cdn.example.com distribution, etc.
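Requesting that wildcard certificate can also be scripted; a small boto3 sketch (the domain is a placeholder, and note that certificates used with CloudFront must be issued in us-east-1):

```python
import boto3

# CloudFront only accepts ACM certificates from the us-east-1 region.
acm = boto3.client("acm", region_name="us-east-1")

response = acm.request_certificate(
    DomainName="*.cdn.example.com",
    ValidationMethod="DNS",  # finish validation by creating the CNAME record ACM provides
)
print(response["CertificateArn"])  # attach this ARN to each distribution
```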
Note that you also have an easy migration strategy from your current solution: create a single distribution with *.cdn.example.com as its alternate domain name, code the apps to use their own unique-name-here.cdn.example.com, and point all the DNS records there. Later, when you create a distribution with a specific alternate domain name foo.cdn.example.com, CloudFront will automatically stop routing those requests to the wildcard distribution and start routing them to the one with the specific domain. You will still need to change the DNS entry... but CloudFront will handle the requests correctly, routing them to the newly-created distribution, even before you change the DNS, because it has some internal magic that matches the non-wildcard hostname to the correct distribution regardless of whether the browser connects to the new endpoint or the old... so the migration should pretty much be a non-event.
I'd suggest the wildcard strategy is a good one, anyway, so that your apps are each connecting to a specific endpoint hostname, allowing you much more flexibility in the future.

How to reduce Amazon Cloudfront costs?

I have a site that has exploded in traffic the last few days. I'm using Wordpress with W3 Total Cache plugin and Amazon Cloudfront to deliver the images and files from the site.
The problem is that the cost of CloudFront is quite high, nearly $500 just for the past week. Is there a way to reduce the costs? Maybe using another CDN service?
I'm new to CDNs, so I might not be implementing this well. I've created a CloudFront distribution and configured it in the W3 Total Cache plugin. However, I'm not using S3 and don't know if I should, or how. To be honest, I'm not quite sure what the difference is between CloudFront and S3.
Can anyone give me some hints here?
I'm not quite sure what's the difference between Cloudfront and S3.
That's easy. S3 is a data store. It stores files and is super-scalable (easily serving thousands of people at once). The problem is that it's centralized (i.e. served from one place in the world).
CloudFront is a CDN. It caches your files all over the world so they can be served faster. If you squint, it looks like they are 'storing' your files, but the cache can be lost at any time (or if they boot up a new node), so you still need the files at your origin.
CF may actually hurt you if you have too few hits per file. For example, in Tokyo, CF may have 20 nodes. It may take 100 requests to a file before all 20 CF nodes have cached your file (requests are randomly distributed). Of those 100 requests, 20 of them will hit an empty cache and see an additional ~200ms of latency while the node fetches the file from your origin. Nodes generally cache your file for a long time.
I'm not using S3 and don't know if I should
Probably not. Consider using S3 if you expect your site's media to grow massively (i.e. lots of user photo uploads).
Is there a way to reduce the costs? Maybe using another CDN service?
That entirely depends on your site. Some ideas:
1) Make sure you are serving the appropriate caching headers, and make sure your expires time isn't too short (it should ideally be days, weeks, or months).
The "best practice" is to never expire pages, except maybe your index page, which should expire every X minutes, hours, or days (depending on how fast you want it updated). Make sure every page/image says how long it can be cached.
2) As stated above, CF is only useful if each page is requested hundreds of times per cache period. If you have millions of pages, each requested only a few times, CF may not be useful.
3) Requests from Asia are much more expensive than those from the US. Consider launching your server in Tokyo if you're more popular there.
4) Look at your web server logs and see how often CF is requesting each of your assets. If it's more often than you expect, your cache headers are set up wrong. If you set "cache this for months", you should only see a handful of requests per day (as they boot new servers, etc.), and a few hundred requests when you publish a new file (i.e. one request per CF edge node).
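A quick way to spot-check points 1 and 4 without digging through logs is to request an asset and inspect the headers CloudFront returns; a small sketch (the URL below is a placeholder for one of your own assets):

```python
import requests

# Placeholder: substitute one of your own CloudFront asset URLs.
url = "https://d111111abcdef8.cloudfront.net/wp-content/uploads/example.jpg"

resp = requests.head(url)
print("Cache-Control:", resp.headers.get("Cache-Control"))  # should allow days/weeks/months
print("Expires:", resp.headers.get("Expires"))
print("X-Cache:", resp.headers.get("X-Cache"))  # "Hit from cloudfront" vs "Miss from cloudfront"
```

If repeated requests keep coming back as "Miss from cloudfront", your cache headers (or TTLs) are likely misconfigured.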
Depending on your setup, other CDNs may be cheaper. And depending on your server, other setups may be less expensive. (i.e. if you serve lots of small files, you might be better off doing your own caching on EC2.)
You could give Cloudflare a go. It's not a full CDN, so it might not have all the features of CloudFront, but the basic package is free and it will offload a lot of traffic from your server.
https://www.cloudflare.com
Amazon CloudFront costs are based on two factors:
Number of requests
Data transferred out (in GB)
Solution
Reduce image requests. To do that, combine small images into a single sprite image and use that instead:
https://www.w3schools.com/css/tryit.asp?filename=trycss_sprites_img (image sprites)
Don't serve video files through the CDN; video files are large, and they are what drives CDN costs too high.
What components make up your bill? One thing to check with the W3 Total Cache plugin is the number of invalidation requests it sends to CloudFront. It's known to send a large number of invalidation paths on each change, which can add up.
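One way to audit this is to list the recent invalidations on the distribution and count the paths they contain; a rough boto3 sketch, where the distribution ID is a placeholder:

```python
import boto3

cf = boto3.client("cloudfront")
dist_id = "E1EXAMPLE12345"  # placeholder distribution ID

total_paths = 0
recent = cf.list_invalidations(DistributionId=dist_id, MaxItems="25")["InvalidationList"]
for item in recent.get("Items", []):
    detail = cf.get_invalidation(DistributionId=dist_id, Id=item["Id"])
    paths = detail["Invalidation"]["InvalidationBatch"]["Paths"]
    total_paths += paths["Quantity"]
    print(item["CreateTime"], paths["Quantity"], paths.get("Items", []))

print("Paths across the last", len(recent.get("Items", [])), "invalidations:", total_paths)
```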
Aside from that, if your spend is predictable, one option is the CloudFront Security Savings Bundle, which can save up to 30% in exchange for committing to a minimum spend for a one-year period. It's self-service, so you can sign up in the console and purchase additional commitments as your usage grows.
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/savings-bundle.html
Don't forget that CloudFront has three different price classes, which influence how many edge locations your content is served from; restricting the price class also makes it cheaper.
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/PriceClass.html
The key here is this:
"If you choose a price class that doesn’t include all edge locations, CloudFront might still occasionally serve requests from an edge location in a region that is not included in your price class. When this happens, you are not charged the rate for the more expensive region. Instead, you’re charged the rate for the least expensive region in your price class."
It means that you could use Price Class 100 (the cheapest one) and still occasionally be served from edge locations in regions you are not paying the higher rate for <3
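Switching an existing distribution to the cheapest price class is a one-field change to its config; a minimal boto3 sketch, where the distribution ID is a placeholder:

```python
import boto3

cf = boto3.client("cloudfront")
dist_id = "E1EXAMPLE12345"  # placeholder distribution ID

current = cf.get_distribution_config(Id=dist_id)
config = current["DistributionConfig"]
config["PriceClass"] = "PriceClass_100"  # the cheapest class (North America and Europe edges)

cf.update_distribution(
    Id=dist_id,
    DistributionConfig=config,
    IfMatch=current["ETag"],  # the ETag is required as an optimistic-locking token
)
```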