When building an application/system that is to be run in the cloud (e.g., AWS), is it recommended to always make single-purpose instances?
For example, should I have two instances running MySQL (master+slave), and then two web-server instances, instead of combining web+MySQL in one (possibly larger) instance?
What are the pros and cons, apart from separation of concerns?
The primary reasons why it's better to have single-purposes instances are:
1) It's easier to scale (e.g., you can scale up just the bottleneck rather than the entire stack).
2) It's more secure (e.g., your MySQL database isn't on a server that has port 80 open because it also needs to accept your HTTP traffic).
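To make both points concrete, here is a minimal Python sketch. The role names, the `PUBLIC_PORTS_BY_ROLE` table and the helper functions are hypothetical: a toy model of firewall rules (security groups, in AWS terms), not an AWS API. It shows that a combined web+MySQL box puts the database on a host with internet-facing ports, and that splitting roles lets you add instances for just one tier.

```python
from dataclasses import dataclass, field

# Hypothetical: ports each role must expose to the public internet.
# MySQL only needs to be reachable from the web tier, so it exposes nothing publicly.
PUBLIC_PORTS_BY_ROLE = {"web": {80, 443}, "mysql": set()}


@dataclass
class Instance:
    name: str
    roles: set
    public_ports: set = field(default_factory=set)


def build(name, roles):
    """Create an instance whose public ports are the union of its roles' needs."""
    ports = set()
    for role in roles:
        ports |= PUBLIC_PORTS_BY_ROLE[role]
    return Instance(name, set(roles), ports)


def exposed_roles(instances):
    """Roles running on any host that has at least one internet-facing port."""
    exposed = set()
    for inst in instances:
        if inst.public_ports:
            exposed |= inst.roles
    return exposed


def scale_out(instances, role, count):
    """Add `count` single-purpose instances for one role, leaving other tiers alone."""
    existing = sum(1 for inst in instances if role in inst.roles)
    new = [build(f"{role}{existing + k + 1}", {role}) for k in range(count)]
    return instances + new


combined = [build("box1", {"web", "mysql"})]
split = [build("web1", {"web"}), build("db1", {"mysql"})]

print(sorted(exposed_roles(combined)))  # ['mysql', 'web']
print(sorted(exposed_roles(split)))     # ['web']
print([inst.name for inst in scale_out(split, "web", 2)])  # ['web1', 'db1', 'web2', 'web3']
```

In the combined layout, a compromise of the web server lands an attacker on the database host; in the split layout, the database host has no public ports at all.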
The only good reason not to have single-purpose instances is price. It costs money and for some people it's too much.
If you're doing any kind of e-commerce, then definitely use single-purpose instances, since most security standards (PCI-DSS, for example) require it. If you're running a content site that doesn't have any e-commerce components and doesn't accept sensitive data from your users, then you can probably be a little looser to save a few bucks, but I don't recommend it.
Splitting the database apart from the front-end web server(s) is a standard recommendation.
Here's a good writeup on some of the issues to consider:
http://www.mysqlperformanceblog.com/2006/10/16/should-mysql-and-web-server-share-the-same-box/
Lately I've been trying to wrap my head around the best solution for hosting development sites for our company.
To be completely frank, I'm new to AWS and its architecture, so more than anything I just want to know whether I should keep learning about it or find another, more suitable solution.
Right now we have a dedicated server which hosts our own website, our intranet, and a lot of websites we've developed for clients.
Our own website and the intranet aren't an issue; however, I'm not quite sure about the websites we produced for our clients.
There are about 100 of them right now. These sites are only used pre-launch so our clients can populate them with content. As soon as the content is done, we host the website somewhere else. The copy that stays on our development server is then no longer used at all, but we keep it there in case the client wants a new template/function, so we can show it there before sending it to production.
This means the development sites have almost zero traffic, with perhaps at most 5 or so people adding content to them at any given time (5 people for all 100 sites, not 5 per site).
These sites need to be available at all times and should always feel snappy.
These are not static sites, they all require a database connection.
Is AWS (EC2, Lightsail, or any other kind of instance) a valid solution for hosting these sites? Or should I just downgrade our current dedicated server to a VPS and only worry about hosting our main site on AWS?
I'll put this in an answer because it's too long, but it's just advice.
If you move those sites to AWS you're likely to end up paying (significantly) more than you do now. You can use the Simple Monthly Calculator to get an idea.
To clarify, AWS is cost-effective for certain workloads. It is cost-effective because it can scale automatically when needed, so you don't have to provision for peak traffic all the time, and because it's easy to work with, so it takes fewer people and you don't have to pay for a big ops team. It is cost-effective for anyone from small teams that want to run production workloads with little operational overhead up to big teams that are not yet big enough to build their own cloud.
Your sites are development sites that just sit there and see very little activity, which means they probably fall below the threshold where AWS becomes cost-effective.
You should clarify why you want to move. If the reason is that you want as close to 100% uptime as possible, then AWS is a good choice. But it will cost you, both in the bill paid to Amazon and in the effort of learning to set up such infrastructure. If cost is a primary concern, you might want to think it over.
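The autoscaling argument is simple arithmetic. A rough Python sketch, using made-up hourly load figures and a made-up per-instance capacity: a fixed fleet must be sized for the peak every hour, while an autoscaled fleet only pays for what each hour needs. With flat, low traffic like these dev sites, the two come out the same, so autoscaling saves nothing.

```python
import math


def fixed_instance_hours(hourly_load, capacity_per_instance):
    """Instance-hours when the fleet is sized for peak load around the clock."""
    peak_instances = math.ceil(max(hourly_load) / capacity_per_instance)
    return peak_instances * len(hourly_load)


def autoscaled_instance_hours(hourly_load, capacity_per_instance, minimum=1):
    """Instance-hours when each hour runs just enough instances (never below `minimum`)."""
    return sum(max(minimum, math.ceil(load / capacity_per_instance)) for load in hourly_load)


# Hypothetical requests/hour: 20 quiet hours and a 4-hour peak.
spiky = [100] * 20 + [1000] * 4
# Hypothetical dev-site traffic: flat and tiny.
flat = [10] * 24

print(fixed_instance_hours(spiky, 250), autoscaled_instance_hours(spiky, 250))  # 96 36
print(fixed_instance_hours(flat, 250), autoscaled_instance_hours(flat, 250))    # 24 24
```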
That said, if your requirements for the next year or more are predictable enough and you have someone who knows what they are doing in AWS, there are ways to lower the cost, so it might be worth it. But without further detail it's hard for anyone to give you a definitive answer.
However. You also asked if you should keep learning AWS. Yes. Yes, you should. If not AWS, one of the other major clouds. Cloud and serverless[1] are the future of much of this industry. For some that is very much the present. Up to you if you start with those dev sites or something else.
[1] "Serverless" is as misleading a name as NoSQL. It doesn't mean no servers.
Edit:
You can find a list of EC2 (Elastic Compute Cloud) instance types here. The instance type determines CPU and RAM. Realistically, the cheapest instance is about $8 per month. You also need storage, which is called EBS (Elastic Block Store). There are multiple types of that too; you probably want GP2 (General Purpose SSD).
I assume you also have one or more databases behind those sites. You can either set up the database(s) on EC2 instance(s) or use RDS (Relational Database Service). Again, there are multiple choices. You probably don't want Multi-AZ for dev. In short, Multi-AZ means two RDS instances, so that if one crashes the other takes over, but it's also double the price. You pay for storage there as well.
And, depending on how you set things up, you might pay for traffic. You pay for traffic between Availability Zones, but if you put everything in the same zone, traffic is free.
Storage and traffic are pretty cheap though.
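Adding up those line items is easy to sketch. This is a hypothetical Python helper: the roughly $8/month instance price comes from the paragraph above, every other price is a placeholder you should replace with figures from the AWS calculator, and RDS storage and other charges are left out. It encodes two points made above: Multi-AZ doubles the database instance cost, and only cross-zone traffic is billed.

```python
from dataclasses import dataclass


@dataclass
class MonthlyEstimate:
    ec2_instances: int
    ec2_price: float                    # per instance per month (~8.0 for the cheapest, per the answer)
    ebs_gb: float
    ebs_price_per_gb: float             # placeholder price for GP2 storage
    rds_price: float = 0.0              # placeholder price for one single-AZ RDS instance
    multi_az: bool = False              # Multi-AZ runs two RDS instances, so double the price
    cross_az_gb: float = 0.0            # traffic between zones; same-zone traffic treated as free
    cross_az_price_per_gb: float = 0.0  # placeholder

    def total(self):
        rds = self.rds_price * (2 if self.multi_az else 1)
        return (self.ec2_instances * self.ec2_price
                + self.ebs_gb * self.ebs_price_per_gb
                + rds
                + self.cross_az_gb * self.cross_az_price_per_gb)


# Placeholder numbers only; plug in real prices before trusting the result.
dev = MonthlyEstimate(ec2_instances=2, ec2_price=8.0, ebs_gb=20, ebs_price_per_gb=0.5, rds_price=16.0)
print(dev.total())  # 42.0
```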
This is only the most basic of the basics. As I said, it can get complicated. It's probably worth it, but if you don't know AWS you might end up paying more than you should. Take it slow and keep reading.
I'm trying to understand the layout of the microservices pattern. Given that each microservice would run on its own VM (for the sake of example), how does the database fit into this architecture?
Would each service, in turn, connect to the consolidated database to read/write data?
Thanks for any insight
There's no one size fits all solution.
The general principle is that each microservice should make the right decision for itself in terms of what the right persistence architecture should be. It might be connected to a central SQL database, or it could be using a filesystem, or a NoSQL data store, or memcached, or whatever. (This is why people talk about eventual consistency a lot with microservices.)
You want to do it this way to really capture the benefits of microservices.
You want each microservice to be independently shippable, so that you're not blocked on anything. Stronger coupling to centralized infrastructure reduces the independence of the microservice.
Persistence requirements are highly variable. If you're running a search microservice, you don't need the ACID semantics of a typical SQL database. If you're doing payments, you need ACID. If you're storing and processing images, you might just use the filesystem. Etc.
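A minimal Python sketch of "each service picks its own persistence", with hypothetical `SearchService` and `PaymentService` classes: search keeps a plain in-memory inverted index with no transactional guarantees, while payments owns its own SQLite database (standing in for whatever SQL store you would really use) and wraps a transfer in a transaction so it either fully applies or not at all. Neither service touches the other's storage.

```python
import sqlite3
from collections import defaultdict


class SearchService:
    """Owns an in-memory inverted index; losing it on restart is acceptable here."""

    def __init__(self):
        self._index = defaultdict(set)

    def add(self, doc_id, text):
        for word in text.lower().split():
            self._index[word].add(doc_id)

    def query(self, word):
        return sorted(self._index.get(word.lower(), set()))


class PaymentService:
    """Owns its own SQL database and relies on transactions (the 'A' in ACID)."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE accounts (id TEXT PRIMARY KEY, "
            "balance INTEGER NOT NULL CHECK (balance >= 0))"
        )

    def open_account(self, account_id, balance):
        with self._db:
            self._db.execute("INSERT INTO accounts VALUES (?, ?)", (account_id, balance))

    def transfer(self, src, dst, amount):
        # `with conn:` commits if the block succeeds and rolls back if it raises,
        # so a failed credit also undoes the debit.
        with self._db:
            for account, delta in ((src, -amount), (dst, amount)):
                cur = self._db.execute(
                    "UPDATE accounts SET balance = balance + ? WHERE id = ?", (delta, account)
                )
                if cur.rowcount != 1:
                    raise KeyError(account)

    def balance(self, account_id):
        row = self._db.execute(
            "SELECT balance FROM accounts WHERE id = ?", (account_id,)
        ).fetchone()
        return row[0]


search = SearchService()
search.add(1, "red shoes")
search.add(2, "Red hat")

payments = PaymentService()
payments.open_account("alice", 100)
payments.open_account("bob", 0)
payments.transfer("alice", "bob", 30)
try:
    payments.transfer("alice", "nobody", 10)  # unknown account: whole transfer rolls back
except KeyError:
    pass
print(search.query("red"), payments.balance("alice"), payments.balance("bob"))  # [1, 2] 70 30
```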
In my experience, when dealing with mSOA it always comes down to a data warehouse solution in the end. And this is the natural choice if you have a dedicated DB (cluster) per microservice. After all, the business should be able to use that information from your domain. Even Data Vault modeling would be a good fit here.
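A small Python sketch of that consolidation step, under stated assumptions: two hypothetical services each own a separate SQLite database, and an ETL step copies their rows into a third "warehouse" database where the business can run cross-domain queries. It uses a simple dimension/fact layout for brevity; it is not Data Vault modeling, which is built from hubs, links and satellites.

```python
import sqlite3


def service_db(ddl, insert_sql, rows):
    """Create a hypothetical service-owned database with some rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    conn.commit()
    return conn


def build_warehouse(customer_db, order_db):
    """Extract from each service database and load into one reporting database."""
    customers = customer_db.execute("SELECT id, name FROM customers").fetchall()
    orders = order_db.execute("SELECT id, customer_id, amount FROM orders").fetchall()

    wh = sqlite3.connect(":memory:")
    wh.execute("CREATE TABLE dim_customer (id INTEGER PRIMARY KEY, name TEXT)")
    wh.execute("CREATE TABLE fact_order (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER)")
    wh.executemany("INSERT INTO dim_customer VALUES (?, ?)", customers)
    wh.executemany("INSERT INTO fact_order VALUES (?, ?, ?)", orders)
    wh.commit()
    return wh


customer_db = service_db(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)",
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace")],
)
order_db = service_db(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER)",
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 30), (11, 1, 20), (12, 2, 5)],
)

# The join below is only possible once both services' data sit in one place.
wh = build_warehouse(customer_db, order_db)
revenue = wh.execute(
    "SELECT c.name, SUM(o.amount) FROM fact_order o "
    "JOIN dim_customer c ON c.id = o.customer_id "
    "GROUP BY c.name ORDER BY c.name"
).fetchall()
print(revenue)  # [('Ada', 50), ('Grace', 5)]
```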
Google Compute Engine lets you get a group of instances that are semantically local in the sense that only they can talk to each other and all external access has to go through a firewall etc. If I want to run Map-Reduce or other kinds of cluster jobs that are going to induce high network traffic, then I also want machines that are physically local (say, on the same rack). Looking at the APIs and initial documentation, I don't see any way to request that; does anyone know otherwise?
There is no support in GCE right now for specifying rack locality. However, we built the system to work well in the face of large numbers of instances talking to each other in a fully connected way, as long as they are in the same zone.
This is one of the things that allowed MapR to approach the record for a Hadoop TeraSort. You can see that in action in the video of Craig McLuckie's talk from Google I/O:
https://developers.google.com/events/io/sessions/gooio2012/302/
The best way to find out is to test your application and see how it performs.
I was contracted to make a Groupon-clone website for my client. It was done in PHP with MySQL, and I plan to host it on an Amazon EC2 server. My client warned me that he will be email blasting about 10k customers, so my site needs to be able to handle the surge of clicks from those emails. I have two questions:
1) Which Amazon server instance should I choose? Right now I am on a Small instance, I wonder if I should upgrade it to a Large instance for the week of the email blast?
2) What configuration needs to be set for a LAMP server? For example, do Amazon EC2, Apache, PHP, or MySQL have a maximum-connections limit that I should adjust?
Thanks
Technically, putting the static pages, the PHP and the DB on the same instance isn't the best route to take if you want a highly scalable system. That said, if the budget is low and high availability isn't a requirement, then you may get away with it in practice.
One option, as you say, is to re-launch the server on a larger instance size for the period you expect heavy traffic. Often this works well enough. Your problem is that you don't know the exact shape of the traffic that will come. A certain percentage will be at their computers when the email arrives and will go straight to the site; the rest will trickle in over time. Having your client send the email while most of the users are in bed would help you somewhat, if that's possible, by avoiding the surge.
If we take the case of, say, 2,000 users hitting your site in 10 minutes, I doubt a site that hasn't been optimised would cope; there's very likely to be a silly bottleneck in there. The DB is often the problem, and a good-sized in-memory cache often helps.
This all said, there are a number of architectural designs and features provided by the likes of Amazon and GAE that enable you, with a correctly designed back end, to worry very little about scalability; for the most part it is handled for you.
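You can at least bound the problem with back-of-the-envelope numbers. A hypothetical Python sketch: the 10k recipients come from the question, but the click rate, the share who click immediately, the burst window and the time each visitor stays are all made-up inputs to vary. Concurrency is estimated with Little's law (concurrent users = arrival rate times time on site).

```python
def arrival_rates(recipients, click_rate, immediate_share, burst_minutes, trickle_hours):
    """Split email clicks into an initial burst and a long trickle; return arrivals/second for each."""
    clicks = recipients * click_rate
    burst_clicks = clicks * immediate_share
    trickle_clicks = clicks - burst_clicks
    burst_rate = burst_clicks / (burst_minutes * 60)
    trickle_rate = trickle_clicks / (trickle_hours * 3600)
    return burst_rate, trickle_rate


def concurrent_users(arrivals_per_second, seconds_on_site):
    """Little's law: average number in the system = arrival rate * time spent in it."""
    return arrivals_per_second * seconds_on_site


# Hypothetical inputs: 20% click, 60% of those within the first 10 minutes,
# the rest spread over 24 hours, each visitor browsing for 3 minutes.
burst, trickle = arrival_rates(10_000, 0.20, 0.60, burst_minutes=10, trickle_hours=24)
print(round(burst, 2), round(concurrent_users(burst, 180)))  # 2.0 360
```

Sizing for the burst figure, not the daily average, is what matters for the instance choice.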
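The in-memory cache suggestion usually means the cache-aside pattern: check the cache, and only on a miss query the database and store the result for a while. A minimal Python sketch; `TTLCache` and the `loader` callback are hypothetical names, the class is not thread-safe, and in production you would more likely put memcached or Redis in this role. The clock is injectable so expiry can be tested without sleeping.

```python
import time


class TTLCache:
    """Cache-aside helper: serve hits from memory, call `loader` (e.g. a DB query) on a miss."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get_or_load(self, key, loader):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # fresh hit: no database round trip
        value = loader(key)  # miss or expired: go to the database
        self._entries[key] = (value, now + self._ttl)
        return value


# Usage with a fake "database" that records how often it is queried.
queries = []

def load_deal(deal_id):
    queries.append(deal_id)
    return {"id": deal_id, "title": f"Deal {deal_id}"}

cache = TTLCache(ttl_seconds=30)
for _ in range(1000):
    cache.get_or_load(42, load_deal)
print(len(queries))  # 1
```

During a burst, thousands of page views for the same deal then cost one query per TTL window instead of one query each.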
If you split the database away from the web server, you would be able to put the web server instances behind an elastic load balancer and have that scale instances by demand. There also exist standard patterns for scaling databases, though there isn't any particular feature to help you with that, apart from database instances.
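Conceptually, the load balancer spreads requests across however many identical web instances are currently registered, and scaling adds or removes instances from that pool. A toy Python sketch: it uses plain round robin as a stand-in (not necessarily the algorithm ELB uses), and `desired_instances` with its per-instance capacity figure is a hypothetical scaling rule.

```python
import math


class RoundRobinBalancer:
    """Send each request to the next registered backend in turn."""

    def __init__(self, backends):
        self._backends = list(backends)
        self._next = 0

    def register(self, backend):
        self._backends.append(backend)

    def deregister(self, backend):
        self._backends.remove(backend)

    def pick(self):
        if not self._backends:
            raise RuntimeError("no healthy backends registered")
        index = self._next % len(self._backends)
        self._next = index + 1
        return self._backends[index]


def desired_instances(requests_per_second, capacity_per_instance, minimum=2):
    """Hypothetical scaling rule: enough instances for current demand, never fewer than `minimum`."""
    return max(minimum, math.ceil(requests_per_second / capacity_per_instance))


lb = RoundRobinBalancer(["web1", "web2"])
print([lb.pick() for _ in range(3)])   # ['web1', 'web2', 'web1']
lb.register("web3")                    # scale out during the email blast
print([lb.pick() for _ in range(3)])   # ['web2', 'web3', 'web1']
print(desired_instances(120, 50))      # 3
```

This only works because the web instances are stateless and the database lives elsewhere; with MySQL on each web box, adding an instance would also mean adding a database.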
You might want to try Amazon Mechanical Turk, which is basically lots of people who'll perform often trivial tasks (like navigating to a web page, clicking on something, etc.) for a usually very small fee. It's not a bad way to simulate real traffic.
That said, you'd probably have to repeat this several times, so you're better off with a load testing tool. And remember, you can't load-test a time-slicing instance with another time-slicing instance...
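If you want something quick before reaching for a dedicated tool, a minimal Python load generator is only a few lines. It is a sketch: `load_test` and its report fields are hypothetical, the URL you pass should be your own staging site, and it should run from a machine that isn't itself a time-sliced instance, for the reason given above. The `fetch` parameter lets you swap in a fake for a dry run.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_url(url, timeout=10):
    """Fetch one page and return its HTTP status code."""
    with urlopen(url, timeout=timeout) as resp:
        resp.read()
        return resp.status


def load_test(url, total_requests, concurrency, fetch=fetch_url):
    """Fire `total_requests` requests with `concurrency` worker threads; summarize the results."""
    if total_requests < 1:
        raise ValueError("total_requests must be positive")

    def one_request(_):
        start = time.perf_counter()
        try:
            status = fetch(url)
        except Exception:  # timeouts, connection errors, HTTP errors
            status = None
        return status, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(one_request, range(total_requests)))

    latencies = sorted(elapsed for _, elapsed in results)
    ok = sum(1 for status, _ in results if status == 200)
    p95_index = max(0, math.ceil(0.95 * len(latencies)) - 1)  # nearest-rank percentile
    return {
        "requests": total_requests,
        "ok": ok,
        "errors": total_requests - ok,
        "p95_seconds": latencies[p95_index],
    }


# Dry run with a fake fetch; pass your real staging URL (and no `fetch`) for an actual test.
report = load_test("http://localhost:8080/", total_requests=20, concurrency=5, fetch=lambda url: 200)
print(report["ok"], report["errors"])  # 20 0
```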
Could someone please direct me to some good documentation or feedback on best practices for implementing web services in an application that handles different concerns? For example, should I create different services, one that handles security (AuthService), one that handles data entry for customer service reps (CRUDService), a BillingService, and so on, or should I just encapsulate all these "services" into one, e.g. ApplicationService? Basically, I am asking whether it is bad design to create multiple services (files) within one application. Could some of you share your experiences?
Also, let's say three of the listed services from above connect to the same database but address totally different concerns, e.g. one is for all transactions like CRUD, and the other is purely for reporting. Should I create two services here, a CRUDService and a ReportingService? Is it bad to create two different database connections via these two services? Or how can I share the same database connection between different services?
I think there is a tendency among publicly available services to just dump everything into one service, which may not be a bad idea for a publicly available API; it just makes things easier for developers. However, for any project I work on, I try to break things down into logical groups. This way your client doesn't need to inherit functionality it may not need. Updating services is also a slightly easier task, because you're only affecting a certain subset of your web service framework and not everything. So if your service contract breaks and your clients no longer support it, they may still be able to use other parts of your system, just not that particular one, whereas if you break a contract on your aggregated service, everything fails. Finally, if you have to implement something like failover support, you have more flexibility to choose which service requires more failover nodes, allowing you to better manage your resource allocation.
If you want best practices, take a look at the SOA Design Pattern Catalog.
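On the follow-up question about two services hitting one database: splitting the service contracts doesn't force separate connection handling, because both can be handed the same connection (or, more typically, the same connection pool). A minimal Python sketch with hypothetical `CrudService` and `ReportingService` classes; an in-memory SQLite connection stands in for your real database.

```python
import sqlite3


def make_connection():
    """One shared connection; in a real web service this would usually be a pool."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    return conn


class CrudService:
    """Data-entry concerns only: create and update records."""

    def __init__(self, conn):
        self._conn = conn

    def create_customer(self, name):
        with self._conn:  # commit on success, roll back on error
            cur = self._conn.execute("INSERT INTO customers (name) VALUES (?)", (name,))
        return cur.lastrowid


class ReportingService:
    """Read-only reporting concerns, sharing the same database connection."""

    def __init__(self, conn):
        self._conn = conn

    def customer_count(self):
        return self._conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]


conn = make_connection()
crud, reports = CrudService(conn), ReportingService(conn)
crud.create_customer("Ada")
crud.create_customer("Grace")
print(reports.customer_count())  # 2
```

Because each service receives its connection from outside, you could later point ReportingService at a separate read-only copy of the data without touching CrudService.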