Concurrent connections and hosting a website "at home" - concurrency

I have searched for an already answered question on this topic, but I can't find quite what I'm looking for.
My question is simple and straightforward: I have a blog on a .com domain that runs WordPress and is currently hosted with a company in my country. The plan supports only 30 concurrent connections. I'm fairly familiar with these terms, but if traffic on my website gets very high, I'm considering buying servers and hosting it at home instead of getting a more expensive hosting plan. If I do so, what do I need? For example: how many concurrent connections can one server (PC) handle? How many servers, and how powerful, would I need for, say, 1 million daily unique visits?

Depending on how well your website is optimized and cached, and on your hardware requirements, any moderate PC today can handle 1 million daily visits easily.
1 million requests per day is roughly 11 requests per second, which really isn't that much, assuming your SQL is not badly unoptimized.
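To make that arithmetic concrete, here is a tiny sketch; the one-request-per-visit ratio and the peak-traffic multiplier are assumptions for illustration, not measured values:

```python
# Back-of-the-envelope: convert daily visits into requests per second.
SECONDS_PER_DAY = 24 * 60 * 60          # 86,400

daily_visits = 1_000_000                # 1 million unique visits per day
requests_per_visit = 1                  # assumption: one dynamic request per visit
peak_factor = 3                         # assumption: peak hour vs. average traffic

average_rps = daily_visits * requests_per_visit / SECONDS_PER_DAY
peak_rps = average_rps * peak_factor

print(f"average: {average_rps:.1f} req/s, assumed peak: {peak_rps:.1f} req/s")
# average: 11.6 req/s, assumed peak: 34.7 req/s
```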
I would suggest you start by hosting the site on whatever hardware you already have and monitor real statistics. That experiment will show you what kind of hardware you actually need, whether you want to optimize more, and whether you need a faster CPU or more RAM.
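If you want a quick way to watch those stats on whatever box you start with, here is a minimal sketch using the psutil library; the sampling interval is an arbitrary choice:

```python
# Minimal resource logger: samples CPU, RAM and disk usage once a minute.
# Requires: pip install psutil
import time
import psutil

INTERVAL_SECONDS = 60  # arbitrary sampling interval

while True:
    cpu = psutil.cpu_percent(interval=1)      # percent CPU averaged over 1 second
    mem = psutil.virtual_memory().percent     # percent of RAM in use
    disk = psutil.disk_usage("/").percent     # percent of root filesystem used
    print(f"{time.strftime('%Y-%m-%d %H:%M:%S')} cpu={cpu}% mem={mem}% disk={disk}%")
    time.sleep(INTERVAL_SECONDS)
```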
You also need to consider a few important things in the long run, like power bills and backups.
Some draft hardware requirements:
CPU: any Core i3 2xxx or Core i3 3xxx or faster.
RAM: RAM is really cheap these days; go for 8 GB or 16 GB. I do not see one site needing that much, but extra RAM will help with disk I/O read caching.
Storage: all storage devices need to be in RAID1 or another RAID level that provides some redundancy, plus daily/weekly backups. Consider getting SSDs from a reputable brand (Intel or Samsung) with a 5-year warranty.
Power backup: get a UPS.
This might be overkill (depending on how much optimization you do), but it should give you a nice and relatively fast server.

Related

Hosting several "in development"-sites on AWS

I've been trying to wrap my head around the best solution for hosting development sites for our company lately.
To be completely frank, I'm new to AWS and its architecture, so more than anything I just want to know whether I should keep learning about it or find another, more suitable solution.
Right now we have a dedicated server which hosts our own website, our intranet, and a lot of websites we've developed for clients.
Our own site and the intranet aren't an issue; however, I'm not quite sure about the websites we've produced for our clients.
There are about 100 of them right now. These sites are only used pre-launch, so our clients can populate them with content; as soon as the content is done, we host the website somewhere else. The copy that remains on our development server is no longer used at all, but we keep it there so that if the client wants a new template or function, we can show it there before sending it to production.
This means the development sites have almost zero traffic, with perhaps at most 5 people adding content to them at any given time (5 people across all 100 sites, not 5 per site).
These sites need to be available at all times and should always feel snappy.
These are not static sites, they all require a database connection.
Is AWS (EC2, or any other kind of instance, Lightsail?) a valid solution for hosting these sites? Or should I just downgrade our current dedicated server to a VPS and only worry about hosting our main site on AWS?
I'll put this in an answer because it's too long for a comment, but it's just advice.
If you move those sites to AWS you're likely to end up paying (significantly) more than you do now. You can use the Simple Monthly Calculator to get an idea.
To clarify: AWS is cost-effective for certain workloads. It is cost-effective because it can scale automatically when needed, so you don't have to provision for peak traffic all the time, and because it's easy to work with, so it takes fewer people and you don't have to pay for a big ops team. It is cost-effective for small teams that want to run production workloads with little operational overhead, all the way up to big teams that are not yet big enough to build their own cloud.
Your sites are development sites that just sit there and see very little activity, which means they are probably below the threshold where AWS becomes cost-effective.
You should clarify why you want to move. If the reason is that you want as close to 100% uptime as possible, then AWS is a good choice. But it will cost you, both in terms of bill paid to Amazon and price of learning to set up such infrastructure. If cost is a primary concern, you might want to think it over.
That said, if your requirements for the next year or more are predictable enough and you have someone who knows what they are doing in AWS, there are ways to lower the cost, so it might be worth it. But without further detail it's hard for anyone to give you a definitive answer.
However, you also asked whether you should keep learning AWS. Yes. Yes, you should. If not AWS, then one of the other major clouds. Cloud and serverless[1] are the future of much of this industry; for some, that is very much the present. It's up to you whether you start with those dev sites or something else.
[1] "Serverless" is as misleading a name as NoSQL. It doesn't mean no servers.
Edit:
You can find a list of EC2 (Elastic Compute Cloud) instance types here. That covers CPU and RAM. Realistically, the cheapest usable instance is about $8 per month. You also need storage, which is called EBS (Elastic Block Store). There are multiple types of that too; you probably want GP2 (General Purpose SSD).
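If you'd rather compare instance sizes programmatically than read through the pricing pages, a small sketch using boto3 could look like this; the region and the instance type names are just examples:

```python
# List vCPU and memory for a few EC2 instance types using boto3.
# Requires AWS credentials configured locally (pip install boto3).
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # region is an arbitrary choice

response = ec2.describe_instance_types(
    InstanceTypes=["t3.micro", "t3.small", "t3.medium"]  # example types only
)

for itype in response["InstanceTypes"]:
    name = itype["InstanceType"]
    vcpus = itype["VCpuInfo"]["DefaultVCpus"]
    mem_gib = itype["MemoryInfo"]["SizeInMiB"] / 1024
    print(f"{name}: {vcpus} vCPU, {mem_gib:.1f} GiB RAM")
```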
I assume you also have one or more databases behind those sites. You can either set up the database(s) on EC2 instance(s) or use RDS (Relational Database Service). Again, there are multiple choices there. You probably don't want Multi-AZ for dev: in short, Multi-AZ means two RDS instances, so that if one crashes the other takes over, but it's also double the price. You pay for storage there too.
And, depending on how you set things up, you might pay for traffic. Traffic between availability zones is charged, but if you put everything in the same zone that traffic is free.
Storage and traffic are pretty cheap though.
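To put rough numbers on that, here is a back-of-the-envelope sketch; every unit price below is an assumed placeholder, so check the calculator for real rates:

```python
# Very rough monthly cost estimate for a single small dev setup.
# All unit prices are assumed placeholders -- check the AWS calculator for real numbers.
ec2_instance_per_month = 8.00     # assumed: one small EC2 instance
gb_month_storage = 0.10           # assumed: price per GB-month (EBS GP2 / RDS storage)
ebs_size_gb = 30
rds_instance_per_month = 15.00    # assumed: one small single-AZ RDS instance
rds_storage_gb = 20

total = (
    ec2_instance_per_month
    + ebs_size_gb * gb_month_storage
    + rds_instance_per_month
    + rds_storage_gb * gb_month_storage
)
print(f"estimated monthly cost: ${total:.2f}")  # $28.00 with these placeholder prices
```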
This is only the most basic of the basics. As I said, it can get complicated. It's probably worth it, but if you don't know AWS you might end up paying more than you should. Take it slow and keep reading.

How to adjust and measure network performance on AWS

Lately I have been struggling to understand what my network speed (downlink) is between nodes on AWS (in a multi-homed cluster, with computers in different regions).
I see a lot of fluctuation when I measure it, both with a script I have written (based on this link and SCP) and with iperf.
I believe this is because network utilization changes rapidly (mostly between regions), but I still don't understand from the AWS documentation what performance I am paying for, for example a minimum and a maximum downlink rate (aws instances).
At first I tried the T2 type, and since it has burst CPU performance I thought the NIC performance might be bursty too, so I moved to the M4 type, but I get the same problems with M4.
Is there any way to know my NIC downlink rate based on the instance type and flavor?
*I have asked a similar question on the AWS forum, but I haven't got a response (https://forums.aws.amazon.com/thread.jspa?threadID=296389).
There is no way to get a better indication than your own measurements. AWS does not publish anything about this, except for the larger instance types where network performance is specified explicitly (e.g. m5.12xlarge is listed at 10 Gbps). Network performance most likely does have a burst component on the smaller instance types.
There are pages with other people's benchmarks, but you won't find an official answer for any of this.
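If you want to characterize the fluctuation rather than chase an official number, one option is to repeat your own measurements and look at the spread. Here is a rough sketch of a socket-based transfer test (the port, transfer size and run count are arbitrary; it is not a replacement for iperf):

```python
# Crude downlink measurement between two hosts: run receive() on the node under
# test and call send("<receiver-ip>") on the remote node once per run.
import socket
import statistics
import time

PORT = 5201                     # arbitrary port
CHUNK = 1024 * 1024             # 1 MiB per send
TOTAL_BYTES = 100 * CHUNK       # 100 MiB per run

def send(host: str) -> None:
    """Run on the remote node: push TOTAL_BYTES to the receiver."""
    with socket.create_connection((host, PORT)) as s:
        payload = b"\x00" * CHUNK
        for _ in range(TOTAL_BYTES // CHUNK):
            s.sendall(payload)

def receive(runs: int = 5) -> None:
    """Run on the node under test: time how long each transfer takes."""
    rates = []
    with socket.socket() as server:
        server.bind(("0.0.0.0", PORT))
        server.listen(1)
        for _ in range(runs):
            conn, _addr = server.accept()
            received, start = 0, time.time()
            with conn:
                while received < TOTAL_BYTES:
                    data = conn.recv(CHUNK)
                    if not data:
                        break
                    received += len(data)
            mbps = received * 8 / (time.time() - start) / 1e6
            rates.append(mbps)
            print(f"run: {mbps:.1f} Mbit/s")
    print(f"mean {statistics.mean(rates):.1f} Mbit/s, stdev {statistics.pstdev(rates):.1f}")

if __name__ == "__main__":
    receive()
```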

Separate server for Memcache/Redis?

I am using Django for my project and I'll be hosting it on Linode or another hosting service. If I want to use Memcached, will I need a separate Linode for it? In other words, will one server be enough, or will I have to host my site on two servers, one for Memcached and one for Django? Is it the same for Redis? And will I also need a separate server for MySQL?
I don't think you understand that nobody here is a fortune-telling wizard. Nobody knows how many requests you will receive per second, nor how CPU- or memory-intensive each request will be. Nobody knows how optimized your code is. Nobody knows whether your application is read-heavy or write-heavy. Your use case is your own, and you're probably the only one who can estimate it.
My only actual advice is to estimate your data volume and server load, and benchmark your setup on one machine. If you are unsatisfied with the performance, then scale. You can either scale vertically, by increasing the size of your Linode, or horizontally, by adding more Linode instances. In the latter case, you will most likely put your DB on a machine of its own and have multiple Django instances fed by a load balancer. Those Django instances could all share one Memcached on a single machine, or each have their own Memcached on its own machine. Which one is better? I can't tell you; it again depends on your use case.
If I were you, I would set it all up on one Linode instance, create test data that I believe is close to real-world, and then test my response times with an estimated number of requests per second. I would measure response times, cache hits, and memory usage, and decide based on that whether this level of performance satisfies my use case, because I'm really the only one who knows what performance is satisfactory. Additionally, adding more Linode resources is not necessarily the first place I would look to improve performance.
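As a minimal sketch of that kind of measurement, something like the following could be a starting point; the URL, request count and concurrency level are assumptions you would replace with your own estimates:

```python
# Fire N concurrent requests at a test endpoint and report latency percentiles.
# Requires: pip install requests
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://127.0.0.1:8000/"   # assumed local Django test endpoint
REQUESTS = 200                   # assumed total request count
CONCURRENCY = 10                 # assumed number of parallel clients

def timed_get(_):
    start = time.time()
    requests.get(URL, timeout=10)
    return time.time() - start

with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    latencies = sorted(pool.map(timed_get, range(REQUESTS)))

print(f"median: {statistics.median(latencies) * 1000:.1f} ms")
print(f"p95:    {latencies[int(len(latencies) * 0.95)] * 1000:.1f} ms")
print(f"max:    {latencies[-1] * 1000:.1f} ms")
```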
Some great tips on optimizing and benchmarking can be found here:
https://docs.djangoproject.com/en/1.8/topics/performance/
http://blog.disqus.com/post/62187806135/scaling-django-to-8-billion-page-views
http://scottbarnham.com/blog/2008/04/28/django-performance-testing-a-real-world-example/
Late night reading about scaling up Django can be found in many books, I like this one:
https://highperformancedjango.com/
Sorry if I sound a bit blunt, I just want you to understand that nobody can walk in here and give you an answer with a large degree of confidence. This question doesn't have a straight-forward answer.
TL;DR Start with one instance and scale up only if you've convinced yourself you need to.
You say Memcached or Redis, so I assume Redis would be deployed without persistence, in a purely in-memory configuration.
In that case, both Memcached and Redis are unlikely to get saturated even if you run them on the same server, since the limiting factor is more likely to be the single Django instance as your requests per second grow.
However, you should make sure to have enough memory and to configure an appropriate maximum memory usage for Memcached / Redis (the two services do this in different ways). Otherwise, under memory pressure the Linux OOM killer may kill your cache. So if you go for a single instance, which seems to me a sensible first step, make sure that your Django memory usage plus the memory you allocate for caching does not come near the limit of the instance's free memory.
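For the Redis side, capping memory could look roughly like this with the redis-py client; the 256 MB limit and the eviction policy are example values (Memcached takes its limit as the -m startup flag instead):

```python
# Cap Redis memory and pick an eviction policy so the OOM killer never has a reason
# to target the cache. Requires: pip install redis
import redis

r = redis.Redis(host="127.0.0.1", port=6379)  # assumed local Redis instance

# Example values -- size these against what the instance can actually spare.
# (Set the same directives in redis.conf if you want them to survive restarts.)
r.config_set("maxmemory", "256mb")
r.config_set("maxmemory-policy", "allkeys-lru")  # evict least-recently-used keys

print(r.config_get("maxmemory"))
print(r.config_get("maxmemory-policy"))
```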
CPU is hardly going to be an issue: as I said, Memcached / Redis are very light on CPU, so I can't foresee a setup where Django is fine serving pages but the instance is in trouble because the CPU is burned by the cache.

Can I improve performance of my GCE small instance?

I'm using cloud VPS instances to host very small private game servers. On Amazon EC2, I get good performance on their micro instance (1 vCPU [single hyperthread on a 2.5GHz Intel Xeon], 1GB memory).
I want to use Google Compute Engine though, because I'm more comfortable with their UX and billing. I'm testing out their small instance (1 vCPU [single hyperthread on a 2.6GHz Intel Xeon], 1.7GB memory).
The issue is that even when I configure near-identical instances running the same game with the same settings, the AWS EC2 instances perform much better than the GCE ones. To give you an idea (the game isn't Minecraft, but I'll use it as an example): on the AWS EC2 instances, the next world chunks load perfectly fine as players approach the edge of a chunk. On the GCE instances, even on more powerful machine types, chunks fail to load after players travel a certain distance, and they must disconnect from and re-log into the server to continue playing.
I can provide more information if necessary, but I'm not sure what is relevant. Any advice would be appreciated.
The diagnostics needed to evaluate this scenario may be more involved than you want to deal with. My first thought is that this shared-core machine type might have some limitations in consistency. Here are a couple of strategies:
1) Try working your way back down to the smaller instance. Since you only pay for 10 minutes at a time, you can cheaply check whether performance is better on larger machine types. If you have consistent performance problems no matter the size of the box, then my guess is it's something to do with the nature of your application and of their virtualization technology.
2) Try measuring the consistency of the performance. I get that it is unacceptable, but is it unacceptable regardless of how long the instance has been running? The nature of the workload? The time of day? If the performance is sometimes good and sometimes bad, then it's probably, once again, related to the type of your workload and their virtualization strategy.
Something Amazon is famous for is consistency. They work very hard to manage the consistency of their performance; it shouldn't spike up or down.
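For the second point, here is a minimal sketch of measuring consistency over time rather than taking a single reading; the address, port and sample counts are placeholders:

```python
# Sample TCP connect latency to the game server every few seconds and keep a log,
# so slowdowns can be correlated with uptime, load or time of day.
import socket
import statistics
import time

HOST, PORT = "203.0.113.10", 25565   # placeholder address and port
SAMPLES, INTERVAL = 60, 5            # placeholder: 60 samples, 5 seconds apart

latencies = []
for _ in range(SAMPLES):
    start = time.time()
    try:
        with socket.create_connection((HOST, PORT), timeout=2):
            latencies.append((time.time() - start) * 1000)
    except OSError:
        latencies.append(float("nan"))  # record failed connects too
    time.sleep(INTERVAL)

ok = [l for l in latencies if l == l]   # drop NaNs (failed connects)
print(f"samples: {len(latencies)}, failures: {len(latencies) - len(ok)}")
if ok:
    print(f"mean: {statistics.mean(ok):.1f} ms, stdev: {statistics.pstdev(ok):.1f} ms")
```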
My best guess here, without all the details, is that you are using a very small disk. GCE throttles disk performance based on disk size. You have two options: attach a larger disk or use PD-SSD.
See here for details on GCE Disk Performance - https://cloud.google.com/compute/docs/disks
Please post back if this helps.
Anthony F. Voellm (aka Tony the #p3rfguy)
Google Cloud Performance Team

Django-based app might grow out of control - how can I scale it?

I need some guidance in the realm of server architecture for Django.
My current Django-based web app's stats (reached in two weeks, running on one VPS with Apache, mod_wsgi and MySQL):
10,000 users total
20 avg requests/user/day
200,000 requests/day
8,000 users access site daily
Where the app could reach (the point where I'd be panicking; this assumes roughly linear growth):
200,000 users total
20 avg requests/user/day
4,000,000 requests/day
160,000 users access site daily
The issue here is really just handling page requests. I only store short strings of text-based data, so DB size shouldn't be an issue.
What sort of server architecture should I be setting up from a hardware and software perspective? I need to think about caching, load balancing, multiple processing servers, multiple DB servers, etc, but don't know where to start.
Your projected growth of ~45 requests per second really isn't that intensive. I think a standard nginx load balancer in front of your web servers will handle everything. If your DB access isn't very intense, you will probably do fine with just one DB machine.
I really think the most important thing is not to do any premature optimization. Deal with issues as they come, or else you may end up wasting a lot of time.
There are tons of tutorials on caching, multi-server configurations, and load balancing.
Google is a good place to start.
Growing traffic is a standard problem; there is no lack of tutorials on these things.
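As a concrete starting point on the caching side, here is a sketch of how per-view caching is typically wired up in Django (recent versions); the backend, location and timeout are example values:

```python
# settings.py -- point Django's cache framework at a local Memcached instance.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.PyMemcacheCache",
        "LOCATION": "127.0.0.1:11211",   # example: Memcached running on the same box
    }
}

# views.py -- cache an expensive, mostly-read-only page for 5 minutes.
from django.views.decorators.cache import cache_page

@cache_page(60 * 5)  # example timeout in seconds
def homepage(request):
    ...
```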