GitHub Pages site size limits? - github-pages

What are the limitations on the amount of data on GitHub Pages?
The main GitHub repo is limited to 1 GB (https://help.github.com/articles/what-is-my-disk-quota/), but what about GitHub Pages?
Update:
On http://www.quora.com/What-are-bandwidth-and-traffic-limits-for-GitHub-pages I found two very different answers.

GitHub's current policy is a 1 GB repository size limit, a warning when you push a file larger than 50 MB, and outright rejection of files over 100 MB.
https://help.github.com/articles/what-is-my-disk-quota/

Related is bandwidth usage (amount of data transferred). There is a limit but it isn't documented. Taken from the GitHub Terms of Service:
If your bandwidth usage significantly exceeds the average bandwidth
usage (as determined solely by GitHub) of other GitHub customers, we
reserve the right to immediately disable your account or throttle your
file hosting until you can reduce your bandwidth consumption.

https://docs.github.com/en/free-pro-team@latest/github/working-with-github-pages/about-github-pages#guidelines-for-using-github-pages
Usage limits
GitHub Pages sites are subject to the following usage limits:
GitHub Pages source repositories have a recommended limit of 1 GB. For more information, see "What is my disk quota?"
Published GitHub Pages sites may be no larger than 1 GB.
GitHub Pages sites have a soft bandwidth limit of 100 GB per month.
GitHub Pages sites have a soft limit of 10 builds per hour.
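If you want to check how close a Pages source repository is to the 1 GB guideline, one rough approach (not an official tool) is to read the size field the GitHub REST API reports for the repository; a minimal Python sketch, using a placeholder repository name:

import requests

repo = "octocat/Hello-World"  # placeholder: replace with your own "owner/repo"
resp = requests.get(f"https://api.github.com/repos/{repo}")
resp.raise_for_status()

size_kb = resp.json()["size"]  # the API reports repository size in kilobytes
print(f"{repo}: ~{size_kb / (1024 * 1024):.2f} GB (recommended limit: 1 GB)")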

Related

Edge Caching Size Limit

I am trying to take advantage of the built-in Cloud Storage edge caching feature. When a valid Cache-Control header is set, the files can be stored at edge locations. This is without having to set up Cloud Load Balancer & CDN. This built-in behavior is touched on in this Cloud Next '18 video.
What I am seeing though is a hard limit of 10 MB. When I store a file over 10 MB and then download it, it's missing the Age response header; a 9 MB file will have it. The 10 MB limit is mentioned in the CDN docs here, though. What doesn't make sense to me is why files over 10 MB don't get cached to the edge. After all, the Cloud Storage server meets all the requirements; the docs even say that Cloud Storage supports byte range requests for most objects.
Does anyone know more about the default caching limits? I can't seem to find any limits documented for Cloud Storage.
At the moment, the cache limit that applies to objects served from Cloud Storage is documented in the Cloud CDN documentation, and what you describe is expected behavior:
Origin server does not support byte range requests: 10 MB (10,485,760 bytes)
In tests on my side, files of exactly 10,485,760 bytes include the Age field, while files above that limit (for example, 10,485,770 bytes) no longer include it.
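A rough way to reproduce that check, assuming a publicly readable object served via storage.googleapis.com with a Cache-Control header set (the bucket and object names below are placeholders):

import requests

url = "https://storage.googleapis.com/my-bucket/my-object.bin"  # placeholder URL

# Request the object twice: the first request may be a cache miss, so the
# Age header (if the object is edge-cached at all) shows up on the repeat.
for attempt in (1, 2):
    resp = requests.get(url)
    print(f"attempt {attempt}: size={len(resp.content)} bytes, "
          f"Age={resp.headers.get('Age')!r}")
# In the tests described in this answer, objects of 10,485,760 bytes or less
# eventually return an Age header, while larger objects never do.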
I recommend you create a feature request here in order to improve the Google Cloud Storage documentation.
That way you have a direct line to the team responsible for the documentation, and your request can be supported by other members of the community.

Reduce CloudFront costs when serving static S3 website over HTTPS

I maintain a 'hobby' website to experiment with AWS technologies. Because it's a pure hobby, I am trying to keep its costs as low as possible, and only use those services that are absolutely necessary.
Over the months, the website has started to generate some traffic, about 30-50 hits per day, and on some days it has had up to 1K hits per day.
I am using CloudFront (CF) for the main purpose of having HTTPS and having a way to connect my domain with my S3 website bucket, but the costs have been going up as a result of the increase in hits.
Obviously, at this stage, the costs are manageable (a few dollars per month), but as I said my goal is to keep costs to an absolute minimum, and CF is starting to be the lion's share of my costs.
Reviewing the CF costs in Bill Details shows me that HTTPS requests, and especially bandwidth, make up most of that.
I am looking for a way to keep using CF for HTTPS and for pointing my domain at the S3 bucket and serving it securely, but to reduce the costs resulting from the requests and bandwidth.
The website is static and entirely hosted on S3. It contains:
an index.html - auto-updated every hour
10 category pages (250 KB each) - auto-updated every hour; they contain links to the detail pages
< 1,000 detail pages (100 KB each) - these are created once and then never changed again
< 1,000 images (50 KB each) - each detail page has 1 image; their behaviour is the same as the detail pages
My CF configuration is as follows:
no Origin Custom Headers
Behaviour:
Path pattern: Default (*)
Viewer protocol policy: Redirect HTTP to HTTPS
Cache Based on Selected Request Headers: Whitelist
Whitelist Headers: Referer
Object Caching: Customize
Min. TTL: 0
Max. TTL: 31536000
Default TTL: 0
Forward Cookies: None
Query String Forwarding and Caching: None
No geo restrictions
Since the majority of the CF cost is bandwidth, it may be the page and image files that are causing this; i.e. when people load my pages and the image files are served, it adds up to 100 KB + 50 KB per page.
Based on my research on CF, I suspect that the Path Pattern and TTL parameters are what need to be optimised here to achieve a cost reduction. If someone could point me in the right direction that would be great.
Bandwidth costs are proportional to the amount of data retrieved from your website.
Amazon S3: 9c/GB
Amazon CloudFront: 8.5c/GB to 17c/GB depending upon location
Some ideas to reduce your costs:
Change the CloudFront distribution to use Price Class 100, which only serves traffic from the lower-cost locations. Users in other locations will have slower access, but you'll save money!
Increase your default TTL so that content remains cached longer, resulting in fewer repeat requests. (One way to do this from the origin side, by setting long Cache-Control headers on the S3 objects themselves, is sketched after this list.)
Activate and examine CloudFront Access Logs to analyse incoming traffic. It might be that a lot of requests are coming from spiders and bots. You can limit such access by creating a robots.txt file.
Reduce the filesize of images by lowering quality. The trade-off in quality might be worth the cost savings.
Make a less-popular website. That will lower your costs! :)
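Expanding on the TTL point above: a minimal sketch (my own addition, not from the original answer) of setting a long Cache-Control header on the never-changing detail pages in S3 with boto3, so that CloudFront and browsers can keep them cached longer. The bucket name and prefix are placeholders, and you would adjust ContentType per object type:

import boto3

s3 = boto3.client("s3")
bucket = "my-website-bucket"  # placeholder bucket name

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix="details/"):  # placeholder prefix
    for obj in page.get("Contents", []):
        key = obj["Key"]
        # Copying an object onto itself with MetadataDirective=REPLACE is the
        # standard way to rewrite its metadata, including Cache-Control.
        s3.copy_object(
            Bucket=bucket,
            Key=key,
            CopySource={"Bucket": bucket, "Key": key},
            MetadataDirective="REPLACE",
            CacheControl="public, max-age=31536000, immutable",
            ContentType="text/html",  # use image/jpeg etc. for the images
        )
        print(f"updated Cache-Control for {key}")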

Cloud Run Request Limit

Currently, Cloud Run has a request limit of 32 MB per request, which makes it impossible to upload files like videos (which are then placed unchanged into GCP Storage). Meanwhile, the All Quotas page doesn't list this limitation as one you can ask support to increase. So the question is: does anyone know how to increase this limit, or how to make uploading videos and other large files possible on Cloud Run given this limitation?
Google's recommended best practice is to use signed URLs to upload files, which is likely to be more scalable and reliable (over flaky networks) for file uploads:
see this url for further information:
https://cloud.google.com/storage/docs/access-control/signed-urls
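A minimal sketch of that approach with the google-cloud-storage Python client: the Cloud Run service signs a short-lived V4 URL, and the client PUTs the large file straight to the bucket, bypassing the 32 MB request limit. The bucket and object names are placeholders, and the service account needs permission to sign.

import datetime

from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-upload-bucket")   # placeholder bucket name
blob = bucket.blob("uploads/video.mp4")      # placeholder object path

url = blob.generate_signed_url(
    version="v4",
    expiration=datetime.timedelta(minutes=15),  # short-lived upload window
    method="PUT",
    content_type="video/mp4",
)
print(url)
# The client then uploads directly to Cloud Storage, e.g.:
#   curl -X PUT -H "Content-Type: video/mp4" --upload-file video.mp4 "<url>"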
As per official GCP documentation, the maximum request size limit for Cloud Run (which is 32 MB) cannot be increased.
Update since the other answers were posted - the request size can be unlimited if using HTTP/2.
See https://cloud.google.com/run/quotas#cloud_run_limits, which says: "Maximum HTTP/1 request size, in MB: 32 if using HTTP/1 server. No limit if using HTTP/2 server."

Is switching over to Amazon S3 for Drupal 7 image hosting worth it?

So I just have a quick question with regards to using amazon s3.
I have a small Drupal 7 site hosted on a VPS with not too much storage space. I put together the site for members of my School's Photographic Arts Committee to upload photos of School events and projects.
The full-quality photos are stored in a private folder on the server, and the images displayed on the site are watermarked 2048px width ones stored publicly.
I'm worried that I'm going to blow through my storage space very fast, and I fear that I'm going to blow my not-really-existent budget by using Amazon S3 with the module in Drupal.
So, I would like to know if using Amazon S3 is a worthy investment; I'd be willing to spend around $5 on it.
My monthly usage will include about 3 GB of uploads and probably 20 GB of downloads at most, obviously slowly increasing.
Also, I'm a bit confused about storage billing: do I have to pay for, say, the 50 GB of storage accumulated from previous months' uploads, or just the 3 GB I uploaded this month?
PS: I live in South Africa and will probably use the Ireland S3 servers as they have the best latency.
Any feedback much appreciated!
Thanks.
S3 may be a good option in your case, given your limited storage space.
You can calculate things fairly easily. Here's the approximate formula for Ireland (the per-request charge is tiny):
(GB of storage * 0.03) + (avg image size in GB * requests * 0.09) + (requests * 0.004 / 10,000)
There are some volume discounts and some "first N transfer free", but this is a good ceiling, especially for a low-volume site like yours. Also note that storing the full-size images (and not downloading them) means only the first term of the formula matters. As an example, if you have 5 GB in full-size images plus another 1 GB in 350 KB "2048px" images that add up to 10,000 image views per month:
full-size: 5*.03=.15
2048 hosting/downloads: (1*.03)+(0.00033*10000*.09)+(10000*.004/10000)=0.331
So, your monthly costs are about 50 cents.
What happens if your site is slashdotted? Imagine you get 10 million hits:
full-size: 5*.03=.15
2048 hosting/downloads: (1*.03)+(0.00033*10000000*.09)+(10000000*.004/10000)=301.03
So, your monthly cost is now over $300. (this is why billing alarms are important!)
Now, let's imagine you put CloudFront in front of S3 (which is a really good idea for several reasons) and look at the pricing in this scenario. (I've simplified the pricing here a little bit, and I'm assuming nothing is loaded twice by the same browser, so no browser caching.)
full-size: 5*.03=.15
2048 hosting/downloads: (1*.03)+(0.00033*10000000*.085)+(10000000*.009/10000)=289.53
so it saved about $10 but gave you better performance.
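To make these back-of-the-envelope numbers easy to reproduce, here is a small Python sketch using the illustrative rates from this answer (real AWS prices have changed since, so treat the figures as examples only):

# Rates mirror the worked examples above, not current pricing:
# storage $0.03/GB-month, transfer $0.09/GB from S3 or $0.085/GB via
# CloudFront, requests $0.004 (S3) or $0.009 (CloudFront) per 10,000.

def monthly_cost(storage_gb, avg_object_gb, requests,
                 transfer_rate=0.09, requests_rate_per_10k=0.004):
    storage = storage_gb * 0.03
    transfer = avg_object_gb * requests * transfer_rate
    request_fees = requests * requests_rate_per_10k / 10_000
    return storage + transfer + request_fees

full_size = monthly_cost(5, 0, 0)                      # stored, never downloaded
normal = full_size + monthly_cost(1, 0.00033, 10_000)  # 10,000 views/month
slashdot_s3 = full_size + monthly_cost(1, 0.00033, 10_000_000)
slashdot_cf = full_size + monthly_cost(1, 0.00033, 10_000_000,
                                       transfer_rate=0.085,
                                       requests_rate_per_10k=0.009)

print(f"normal month:          ${normal:.2f}")       # ~ $0.48
print(f"slashdotted, S3 only:  ${slashdot_s3:.2f}")  # ~ $301
print(f"slashdotted, with CF:  ${slashdot_cf:.2f}")  # ~ $290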
If you need more features (image resizing, for instance), you may want to consider a photo host like Flickr or Smugmug. They pay for bandwidth, which makes your costs more predictable.

How to reduce Amazon Cloudfront costs?

I have a site that has exploded in traffic the last few days. I'm using Wordpress with W3 Total Cache plugin and Amazon Cloudfront to deliver the images and files from the site.
The problem is that the cost of CloudFront is quite high, nearly $500 just in the past week. Is there a way to reduce the costs? Maybe using another CDN service?
I'm new to CDN, so I might not be implementing this well. I've created a cloudfront distribution and configured it on W3 Total Cache Plugin. However, I'm not using S3 and don't know if I should or how. To be honest, I'm not quite sure what's the difference between Cloudfront and S3.
Can anyone give me some hints here?
I'm not quite sure what's the difference between Cloudfront and S3.
That's easy. S3 is a data store. It stores files, and is super-scalable (easily scaling to serving thousands of people at once). The problem is that it's centralized (i.e. served from one place in the world).
CloudFront is a CDN. It caches your files all over the world so they can be served faster. If you squint, it looks like they are 'storing' your files, but the cache can be lost at any time (or if they boot up a new node), so you still need the files at your origin.
CF may actually hurt you if you have too few hits per file. For example, in Tokyo, CF may have 20 nodes. It may take 100 requests to a file before all 20 CF nodes have cached your file (requests are randomly distributed). Of those 100 requests, 20 of them will hit an empty cache and see an additional 200ms of latency while the node fetches the file. The nodes generally cache your file for a long time.
I'm not using S3 and don't know if I should
Probably not. Consider using S3 if you expect your site to grow massively in media (i.e. lots of user photo uploads).
Is there a way to reduce the costs? Maybe using another CDN service?
That entirely depends on your site. Some ideas:
1) Make sure you are serving the appropriate headers, and make sure your expires time isn't too short (ideally it should be days, weeks, or months). A quick way to check what your origin and CloudFront are actually sending is sketched after this list.
The "best practice" is to never expire pages, except maybe your index page, which should expire every X minutes, hours, or days (depending on how fast you want it updated). Make sure every page/image says how long it can be cached.
2) As stated above, CF is only useful if each page is requested hundreds of times per cache lifetime. If you have millions of pages, each requested a few times, CF may not be useful.
3) Requests from Asia are much more expensive than those from the US. Consider launching your server in Tokyo if you're more popular there.
4) Look at your web server log and see how often CF is requesting each of your assets. If it's more often than you expect, your cache headers are set up wrong. If you set "cache this for months", you should only see a handful of requests per day (as they boot new servers, etc.), and a few hundred requests when you publish a new file (i.e. one request per CF edge node).
Depending on your setup, other CDNs may be cheaper. And depending on your server, other setups may be less expensive. (i.e. if you serve lots of small files, you might be better off doing your own caching on EC2.)
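As a companion to points 1) and 4), here is a quick header check (my own addition, with a placeholder URL): fetch a file twice and look at what Cache-Control/Expires your origin sends and whether CloudFront reports a cache hit.

import requests

url = "https://example.com/wp-content/uploads/sample.jpg"  # placeholder URL

for attempt in (1, 2):
    resp = requests.get(url)
    print(f"request {attempt}:")
    print("  Cache-Control:", resp.headers.get("Cache-Control"))
    print("  Expires:      ", resp.headers.get("Expires"))
    # CloudFront adds X-Cache, e.g. "Miss from cloudfront" or "Hit from cloudfront"
    print("  X-Cache:      ", resp.headers.get("X-Cache"))
# Missing or very short Cache-Control/Expires means CloudFront keeps going
# back to your origin and browsers keep re-downloading files.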
You could give Cloudflare a go. It's not a full CDN, so it might not have all the features of CloudFront, but the basic package is free and it will offload a lot of traffic from your server.
https://www.cloudflare.com
Amazon CloudFront costs are based on 2 factors:
Number of requests
Data transferred in GB
Solution
Reduce image requests. To do that, combine small images into one image (image sprites) and use that image:
https://www.w3schools.com/css/tryit.asp?filename=trycss_sprites_img
Don't use the CDN for video files, because videos are large and are responsible for a very high CDN cost.
What components make up your bill? One thing to check with the W3 Total Cache plugin is the number of invalidation requests it is sending to CloudFront. It's known to send a large number of invalidation paths on each change, which can add up.
Aside from that, if your spend is predictable, one option is to use CloudFront Security Savings Bundle to save up to 30% by committing to a minimum amount for a one year period. It's self-service, so you can sign up in the console and purchase additional commitments as your usage grows.
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/savings-bundle.html
Don't forget that CloudFront has 3 different price classes, which influence how widely your data is replicated but also make it cheaper.
https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/PriceClass.html
The key here is this:
"If you choose a price class that doesn’t include all edge locations, CloudFront might still occasionally serve requests from an edge location in a region that is not included in your price class. When this happens, you are not charged the rate for the more expensive region. Instead, you’re charged the rate for the least expensive region in your price class."
It means that you could use Price Class 100 (the cheapest one) and still occasionally be served from edge locations in regions you are not paying for <3