I was wandering (because though I am a programmer I am not good with networking) if I have a site with multiple databases for user accounts, what unifies those servers/multiple databases so it doesn't check the wrong database or sever. So when I go from having 1 server to multiple, will I be able to keep the same application and the databases will expand into those server? If someone suggested a book that would be great!
Understanding how horizontal scaling works is concept which would give you clear understanding of how that's done.
I suggest you to read articles and books which related to that topic.
There are a lot of good books and articles related to that topic, few of them listed below:
Understanding Horizontal and Vertical Scaling
Best Practices For Horizontal Application Scaling
Building Scalable Web Sites: Building, Scaling, and Optimizing the Next Generation of Web Applications 1st Edition
4 Architecture Issues When Scaling Web Applications: Bottlenecks, Database, CPU, IO
Related
This is my first attempt at looking into cloud hosting and I'm feeling like a complete idiot. I have always had my own dedicated server with which I would would remote in and install/manage everything myself. So this cloud thing is completely new for me. I just can't seem to grasp basic things... like how I would get Tomcat and PostgreSQL installed in a way that they could talk to each other or get my domain and SSL cert on there, etc.
If I could just get a feel for where I should start, then I could probably calculate my costs and jump into the free trial where hopefully things will click for me.
Here are my basic, high-level requirements...
My web app running in Tomcat over HTTPS
Let's say approximately 1,000 page views per day
PostgreSQL supporting my web app.
Let's say approximately 10GB database storage
Throughout the day, a fairly steady stream of inbound SFTP data (~ 100MB per day)
The processing load on the app server side should be fairly light. The heaving lifting will be on the DB side sorting through and processing lots of data.
I'm having trouble figuring out which options I would install and calculating costs. If someone could help me get started by saying something like "You would start with a std-xyz-med server, install ABC located here at http://blahblah, then install XYZ located at http://XYZ.... etc.. etc. You can expect to pay somewhere around $100-$200 per month"....
Thoughts?
I would be eternally grateful. It seems like they should have some free sales support channel to ask someone at Google about this, but I don't see it.
Thank You!
I'll try to give you some tips where to start looking.
I will be referring to some products, here are the links
If you want to stick to your old ways, you can always spin up an instance on Compute Engine and set it up the same way you did before, these are just regular virtual machines. For some use cases this is completely valid.
You can split different components of your stack to different products:
For example, if your app is fine with postgresql, you can spin up a fully managed service in Cloud SQL, which might make it easier to manage backup or have several apps access the same db.
Alternatively, have a look at the different DB offerings to see if any of them matches your needed workload better. Perhaps have a look at BigQuery?
If you want to turn your app into a microservice, which is then easier to autoscale and is more fault tolerant, have a look at App Engine. That way you don't need to manage a virtual machine. The docs here will lead you through some easy to follow examples on how to set up SSL.
For the services to talk to each other, refer to docs of the individual components. It's usually very simple.
With pricing, try https://cloud.google.com/products/calculator/
Things like BigQuery have different pricing models - you don't pay for server uptime, but for amounts of data stored & processed with your queries.
I'm the only programmer of a pretty small ISP in a rural area with just around 2000 customers. Now I have finished a couple of semesters in university but I only have a couple of years of experience in the field so I'm uncertain of the architectural decisions that I'm making and was hoping somebody could help me pick the right path.
Most of our internal apps were created 8-10 years ago and are severely outdated and I have been given the job to replace those systems. Most of the basic underlying systems are solid but the apps that we use to manage our customers and connecting those to our internal systems are...lacking to say the least.
Most of these applications were created in PHP back in the day and are using mysql databases. I decided that i was going to create a couple of rest APIs using NodeJS on top of these databases and then create a central app that will take care of connecting all those systems together and making sure they stay up to date with one another.
Now for the question. I've been looking a bit into enterprise architecture and from what I've gathered going with this sort of micro service architecture seems to be a solid plan. However I've also seen a couple articles talking about message buses and my question is if i should instead set up a message bus, for example apache activemq so these services can talk together amongst themselves instead of using a central app that would handle managing all of them.
Are there any specific patterns that i should be reading up on or does what I've come up with look solid enough?
An enterprise service bus will add a lot of complexity to your design, so you need to look at the pro/con to see if it's really necessary. Here is an article you can always upgrade your architecture in the future and migrate the services.
I run some complex services on Apache Tomcat and they work great. Supports a user pool of 70,000. If you build in connection pooling and redundancy you should be fine.
We need our Sitecore web application to process 60-80 web requests per second. We are using Sitecore 7.0. We have tried a 1 Webserver + 1 Database server deployment, but it only processes 20-25 requests per second. Web server queues up all the other requests in the Memory. As we increase the load, memory fills up.(We did all Sitecore performance enhancements recommended). We need 4X performance to reach the goal :).
Will it be possible to achieve this goal by upgrading the existing server, or do we have to add more web servers in production environment.
Note: We are using Lucene indexing as well.
Here are some things you can consider without changing overall architecture of your deployment
CDN to offload media and static asset requests
This leaves your content delivery server available to handle important content queries and display logic.
Example www.cloudflare.com
Configure and use Sitecore's built-in caching
This is from the guide:
Investigation and configuration of the Sitecore Caches is broken down
into multiple tasks. This way each task is more focused and
simplified. The focus is on configuration and tuning of the Sitecore
Database Caches (prefetch, data, and item caches.)
For configuration
of the output rendering caching properties, the customer should be
made aware of both the Sitecore Cache Configuration Reference and the
Sitecore Presentation Component Reference as to how properly enable
and the properties to expire these caches.
Check out the Sitecore Tuning Guide
Find Slow Queries or Controls
It sounds like your application follows Sitecore best practices, but I leave this note in for anyone that might find this answer. Use Sitecore's built-in Debug mode to identify the slowest running controls and sublayouts. Additionally, if you have Analytics set up there is a "Slow Pages" report that might give you some information on where your application is slowing down.
Those things being said, if you're prepared to provision additional servers and set up a load-balanced environment then read on.
Separate Content Delivery and Content Management
To me the first logical step before load-balancing content delivery servers is to separate the content management from the equation. This is pretty easy and the Scaling Guide walks you through getting the HistoryEngine set up to keep those Lucene indexes up to date.
Set up Load Balancer with 2 or more Content Delivery servers
Once you've done the first step this can be as easy as cloning your content delivery server and adding it to your load balancer "pool". There are a couple of things to consider here like: Does your web application allow users to log in? So you'll need to worry about sticky sessions or machine keys. Does your web application use file media instead of blob media? I haven't had to deal with this, but I understand that's another consideration.
Scale your SQL solution
I've seen applications with up to four load balanced content delivery servers and the SQL Server did not have a problem - I think this will be unique to each case depending on a lot of factors: horsepower and tuning of SQL Server, content model of your application, complexity of your queries, caching configuration on content delivery servers, etc. Again, the Scaling Guide covers SQL Mirroring and Failover, so that is going to be your first stop on getting that going.
Finally, I would say contact Sitecore. These guys have probably seen more of what's gone right and what's gone wrong with installations and could get you on the right path. Good luck!
This answer written from a Sitecore developer perspective:
Bottom line: You need to figure out exactly where your performance bottleneck is. That is going to take some digging, but will be very worthwhile. You should definitely be able to serve 60-80 requests/s without any trouble... but of course that makes a lot of assumptions about the nature of your site and the requests.
For my site, I found Sitecore's caching implementation to be sub-par... I created some very simple and aggressive application-specific caches in my app and this made all the difference in the world. For instance, we have 900+ "Partner" items where our sites' advertisements live... and simply putting all these objects in an array in the Application object sped up page requests significantly. Finding an object in a Hashtable indexed by its Item.Name or ID is going to be a lot faster than Sitecore.Context.Database.GetItem("/itempath") or a SelectItems() call (at least, that's my experience). If your architecture and data set will allow this strategy, we've had good experience with it.
Another thing to watch out for is XSLT renderings. Personally, I avoid them completely in favor of ASP.NET UserControls. The XSLT rendering is just slow. As much as 10x slower than a native UserControl rendering the same HTML. So if you have a few of these... replace with some custom code and you'll see a world of difference.
I have a web app running on php, mysql, apache on a virtual windows server. I want to redesign it so it is scalable (for fun so I can learn new things) on AWS.
I can see how to setup an EC2 and dump it all in there but I want to make it scalable and take advantage of all the cool features on AWS.
I've tried googling but just can't find a simple guide (note - I have no command line experience of Linux)
Can anyone direct me to detailed resources that can lead me through the steps and teach me? Or alternatively, summarise the steps in an answer so I can research based on what you say.
Thanks
AWS is growing and changing all the time, so there aren't a lot of books to help. Amazon offers training that's excellent. I took their three day class on Architecting with AWS that seems to be just what you're looking for.
Of course, not everyone can afford to spend the travel time and money to attend a class. The AWS re:Invent conference in November 2012 had a lot of sessions related to what you want, and most (maybe all) of the sessions have videos available online for free. Building Web Scale Applications With AWS is probably relevant (slides and video available), as is Dissecting an Internet-Scale Application (slides and video available).
A great way to understand these options better is by fiddling with your existing application on AWS. It will be easy to just move it to an EC2 instance in AWS, then start taking more advantage of what's available. The first thing I'd do is get rid of the MySql server on your own machine and use one offered with RDS. Once that's stable, create one or more read replicas in RDS, and change your application to read from them for most operations, reading from the main (writable) database only when you need completely current results.
Does your application keep any data on the web server, other than in the database? If so, get rid of all local storage by moving that data off the EC2 instance. Some of it might go to the database, some (like big files) might be suitable for S3. DynamoDB is a good place for things like session data.
All of the above reduces the load on the web server to just your application code, which helps with scalability. And now that you keep no state on the web server, you can use ELB and Auto-scaling to automatically run multiple web servers (and even automatically launch more as needed) to handle greater load.
Does the application have any long running, intensive operations that you now perform on demand from a web request? Consider not performing the operation when asked, but instead queueing the request using SQS, and just telling the user you'll get to it. Now have long running processes (or cron jobs or scheduled tasks) check the queue regularly, run the requested operation, and email the result (using SES) back to the user. To really scale up, you can move those jobs off your web server to dedicated machines, and again use auto-scaling if needed.
Do you need bigger machines, or perhaps can live with smaller ones? CloudWatch metrics can show you how much IO, memory, and CPU are used over time. You can use provisioned IOPS with EC2 or RDS instances to improve performance (at a cost) as needed, and use difference size instances for more memory or CPU.
All this AWS setup and configuration can be done with the AWS web console, or command-line tools, or SDKs available in many languages (Python's boto library is great). After learning the basics, look into CloudFormation to automate it better (I've written a couple of posts about that so far).
That's a bit of the 10,000 foot high view of one approach. You'll need to discover the details of each AWS service when you try to use them. AWS has good documentation about all of them.
Depending on how you look at it, this is more of a comment than it is an answer, but it was too long to write as a comment.
What you're asking for really can't be answered on SO--it's a huge, complex question. You're basically asking is "How to I design a highly-scalable, durable application that can be deployed on a cloud-based platform?" The answer depends largely on:
The specifics of your application--what does it do and how does it work?
Your tolerance for downtime balanced against your budget
Your present development and deployment workflow
The resources/skill sets you have on-staff to support the application
What your launch time frame looks like.
I run a software consulting company that specializes in consulting on Amazon Web Services architecture. About 80% of our business is investigating and answering these questions for our clients. It's a multi-week long project each time.
However, to get you pointed in the right direction, I'd recommend that you look at Elastic Beanstalk. It's a PaaS-like service that abstracts away the underlying AWS resources, making AWS easier to use for developers who don't have a lot of sysadmin experience. Think of it as "training wheels" for designing an autoscaling application on AWS.
Sorry for the very involved question, but this is something I've been researching for a while now and it is really frustrating me. I feel like in today's age we have a million and one ways to implement services tat are cross-platform (SOAP) and easy to build (thanks to .NET, java, and other frameworks). However, these technologies have been in the community for 5-10 years, but we are (or at least I am) constantly plagued with the same issues:
Identification (Tracking services) - UDDI; e.g., had to remind a co-worker the 3 times this month where a service is at, despite the fact there is a wiki that discusses the service and a PDF version of the same documentation that lives in a repository where we keep our service docs.
Scalability - Out of the box clustering; As organizations, we spend a lot of money on paying our admins just to watch the utilization of our services and make decisions like, does this service need more RAM, more CPU, more interfaces? How do I load balance this?
Monitoring - error logging, etc; I can't count how many times I have to set up tracing on services in order to see why a bug is happening that only seems to affect one customer, or have to code logic into the service to serialize exceptions, log exceptions to dbs, fail gracefully, etc.
Deployment - easy to deploy; none of this deploying DLLs to 5 load balanced servers
Each one of these problems requires some type of custom solution implemented by the organization. Documentation and UDDIs for #1. Virtualization and load balancing hardware / software for #2. Tracing, writing exceptions to databases / logs, etc for #3. Custom deployment software for #4. I work for a mid-sized organization. I can't even imagine how a company the size of Sun, Google, or Microsoft would tackle these dilemmas.
Maybe my vision is unrealistic, but I dream of having a Framework per se that lives on top of a server cluster that manages all of the above. I was ecstatic to read about Microsoft's AppFabric since it really seems to extend some of the functionality of BizTalk to WCF service implementors: Caching, Hosting, Monitoring, etc. However, from what I've seen, I still don't feel it lives up to my dream for an all-in-one solution that assists the developer and organization in writing services that are scaled across clusters easily, deployed into the cluster easily, and identifiable, possibly even version-able.
So, I don't mean this post to be about my dream. I do actually have a question. For starters, is my dream / want completely unrealistic? Furthermore, what solutions are there available that attempt to solve these problems without confining us to a new and more proprietary way (BizTalk) of developing services? An lastly, in concern to a complete SOA / ESB solution, where do we see the most potential in the market right now or in the future?
I think that you are talking about different kinds of problems here.
1). Developers who don't read documentation. This is an endemic problem, not limited to SOA - just look at some questions on StackOverflow. At least the developer is asking you whether there is a service, rather then just duplicating logic in their own code. I don't see any technical solution to these kinds of problems, you've already provided good registries and documentation, but some developers prefer to talk to people. Maybe, even, this is actually a good thing - human interaction has value above the technical content of the interaction. Or maybe, you're too nice: "No, I won't answer that question, look it up."
2). Scaling. There are technologies addressing this issue. (Disclaimer I work for IBM, who sell some, so I'll reference these - I'm not intending to imply that IBM are the only vendor with solutions in this space.) There are products such as this that can provision a new machine, install a software stack and add it to a cluster to address workload changes. Then at a finer grained level of control in the Java EE world the Application Server can dynamically shape traffic and adjust clusters. See WebSphere Virtual Enterprise
3). Monitoring. I don't "get" what you expect here. In all likelyhood such tricky bugs will require application level trace. For some problems such as finding memory leaks and performance bottlenecks there are very good tools, at least in the Java EE world.
4). I can't speak to the .Net world, but I'd say that Java EE app servers do a reasonable job of deploying the apps across clusters smoothly, and in the cases where we use JNI and need DLLs deploying then we can use products such as the Tivoli stack I mention to manage this.
So, in summary, I do think that vendors are trying to address these issues. And I don't think your life would be simpler without SOA. Imagine instead the same problems applied to myriad separate, independent applications.
Here's my two cents.
I've been a developer at a company that used SOA incorrectly. The worst solution they implemented was field level validation of form elements on a desktop app using SOA. To perform acceptably these require very low latency. A 2-4 second wait to change to a new field gets old fast. The service ran over the network on a biztalk server. Everyone hated it.
If you're going to do this you really need to spend a lot of time dealing with network latency, service failure, timing, and timeout issues.
Don't get carried away and think SOA is the solution to every problem. Used at a high level it's great, used at a low level it makes your applications fragile, slow, and impossible to debug.
If you talk to IBM or one of the big SOA vendors, they got a products that cover each scenario.
Identification (Tracking services) - UDDI; e.g., had to remind a co-worker the 3 times this month where a service is at, despite the fact there is a wiki that discusses the service and a PDF version of the same documentation that lives in a repository where we keep our service docs.
Registry and Repository server. Nice thing is that it does governance (promotion, demotion, versioning, approval) and your ESB typically does a "lookup" for the latest and greatest against the register server.
Scalability - Out of the box clustering; As organizations, we spend a lot of money on paying our admins just to watch the utilization of our services and make decisions like, does this service need more RAM, more CPU, more interfaces? How do I load balance this?
Transaction monitoring software like IBM Tivoli Composite Application Manager for SOA. Basically, it tracks things from a horizontal point of view and to see if there is a service disruption from a end user/end app point of view.
As far as your clustering.... you have to pick good middleware and architecture. Personally speaking, get stuff that is "cloud" ready. App Servers with NoSQL connected by MOM.
Monitoring - error logging, etc; I can't count how many times I have to set up tracing on services in order to see why a bug is happening that only seems to affect one customer, or have to code logic into the service to serialize exceptions, log exceptions to dbs, fail gracefully, etc.
Enterprise standards for your developers and for your vendors. Integration of all business and system events into a single dashboard. (Most companies spilt them). This is done already at most enterprise shops.
Deployment - easy to deploy; none of this deploying DLLs to 5 load balanced servers
Ahh.. Microsoft IIS Web Deployment Tool 2.0. You can sync 100s of MS servers by just updating the master. It's really easy.