Where to go to learn about web architecture? YouTube example?

I'm trying to build a web application that is similar to YouTube (it's not a knock-off), but I don't think I understand very well how video is served on the internet.
I know how to build regular database-driven web applications, but nothing with the scalability of YouTube. All of the applications I have built before have run on a single server, with the files stored on the same box as the web server.
How does one decouple the application servers from the file storage and from the media servers?
I would more or less want four kinds of machines (or clusters of machines):
1.) Application servers
-- Present the web page, handle user uploads, link the user's Flash player to the correct media server, etc.
2.) Database shards
-- Store user information, check favorites, etc.
3.) File storage
-- Store the media files
4.) Media servers
-- Serve the media files
How do I hook all of this together? Which technologies should I leverage? Where do I go to learn more about architecting this?
How does YouTube's embeddable Flash player work? I want to embed my Flash player on other websites and have it tie into my architecture.
Note: I have already looked at http://highscalability.com/youtube-architecture
But I still don't get the overall picture of how this stuff ties together.
Could someone explain, in high-level terms, how all of this works?
Are there dedicated client servers running internally to shuffle all of this data between the application servers, file storage, etc.? Is it all done over HTTP using JSON? What is going on here?
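One common way for the application servers to "link the user's player to the correct media server" without a central lookup service is to have every application server compute the same answer from the video ID alone. The sketch below uses rendezvous (highest-random-weight) hashing; the three media hostnames and the URL layout are hypothetical, and this illustrates the general idea rather than how YouTube actually routes requests.

```python
import hashlib

# Hypothetical media-server hostnames; a real deployment would load these from config.
MEDIA_SERVERS = ["media1.example.com", "media2.example.com", "media3.example.com"]


def pick_media_server(video_id: str, servers: list[str]) -> str:
    """Choose a server for a video using rendezvous (highest-random-weight) hashing.

    Every app server computes the same answer independently, and removing a
    server only reassigns the videos that were mapped to that server.
    """
    def weight(server: str) -> int:
        digest = hashlib.sha256(f"{server}:{video_id}".encode()).hexdigest()
        return int(digest, 16)

    return max(servers, key=weight)


def player_url(video_id: str) -> str:
    """Build the (hypothetical) URL the app server hands to the embedded player."""
    host = pick_media_server(video_id, MEDIA_SERVERS)
    return f"https://{host}/videos/{video_id}.flv"
```

The page (or the embed code on a third-party site) only needs the URL; the player then fetches the video directly from the media server, so video bytes never pass through the application servers.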

Two books I'd recommend are:
Scalable Internet Architectures
Building Scalable Web Sites
The latter is by the director of engineering at Flickr. It's not YouTube, but I think you'll find it enlightening.
Beyond that, the High Scalability blog is a good source of case studies and collected wisdom, all of which provide a good starting point for further exploration.

Start by hiring the right people; if you hire smart people, they'll be able to come up with answers to these questions, and more which will crop up.
Also, start at the scale at which you plan to operate initially. Don't plan for scalability you don't need. You aren't going to be making another YouTube, even if you're very successful within your field.
Scalability is expensive, very expensive, to develop and maintain. If you don't need it, it will drain your resources and restrict your developers needlessly. Just building a credible test environment for high-performance systems tends to be a big job, and such a system would require several such environments.


What do we actually mean by large-scale web application?

What is the criterion for calling something a large-scale web application?
Is it the number of lines of code in the application, or
the number of users per day, say 10K per day?
What do we actually mean by large-scale web applications? That depends on who you ask.
If people have built smaller applications in the past and now need to build a larger one that handles more data and more traffic, they might call it a large-scale web application. But compared with sites like LinkedIn, Facebook, or Google, it will still be a small application.
People have different notions of what large is: what's large for some might be medium-sized for others and small for others still. But large-scale web applications do tend to share characteristics such as these:
performance: being able to handle a large number (millions) of users/requests or a large number (thousands) of transactions per second. Both have challenges depending on the type of app (CPU-bound, I/O-bound, or both);
scalability of the horizontal type, not only at the web server level but also at the database level. Depending on what you are doing, an RDBMS might not cut it anymore, so you have to go down the NoSQL path, sometimes using more than one product, as NoSQL solutions tend to be specialized systems tackling specific use cases rather than general-purpose databases like RDBMSs. A lot of integration challenges arise from connecting heterogeneous solutions together and making them behave as a single application;
distribution: large-scale applications are distributed, taking advantage of CDNs or running the app on servers geographically closer to the user. You can easily have hundreds or thousands of server nodes, with a large sysadmin team managing the setup. If you don't have your own data centers, you can run in the cloud;
team size: besides a large sysadmin team, you often have a large development team optimizing for performance and scalability, plus designers and front-end developers providing a fluid user interface, with mobile support, etc.;
data volume and variety: you have to deal with a large volume of data and with lots of data types. No longer just products, customers, orders, etc., but also clicks, page views/hits, events, logs, customer behavior tracking, etc. This relates to the previous NoSQL point, but with this volume of data these apps also tend to have a large back office offering all kinds of reports, graphs, administrative tools, etc. to manage the app itself;
availability: 24/7;
miscellaneous: other keywords to throw into the mix, like SOA, microservices, data warehouses, security, distributed caches, continuous deployment, etc.
These are some of the characteristics that (I think) large-scale web apps have in common, and it's important to think about these aspects up front and to deal with the challenges they bring from the very beginning, when building the app.
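To relate the question's "10K per day" to the "thousands of transactions per second" mentioned above, a back-of-the-envelope conversion from daily users to requests per second helps. The figures of 20 requests per user per day and a peak of 5x the average are assumptions for illustration only, not measurements.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rps(daily_users: int, requests_per_user: float) -> float:
    """Average requests per second if traffic were spread evenly over a day."""
    return daily_users * requests_per_user / SECONDS_PER_DAY


def peak_rps(daily_users: int, requests_per_user: float, peak_factor: float) -> float:
    """Rough peak load, assuming the busiest period runs at peak_factor x the average."""
    return average_rps(daily_users, requests_per_user) * peak_factor


if __name__ == "__main__":
    for users in (10_000, 10_000_000):
        print(users, round(average_rps(users, 20), 1), round(peak_rps(users, 20, 5), 1))
```

Under those assumptions, 10,000 users a day averages about 2.3 requests per second (about 12 at peak), while 10 million users a day averages about 2,300 requests per second, which is the range where the characteristics listed above start to matter.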

Search Engine Necessary?

In my application, I have a bunch of service providers in my database offering various services. I need a user to be able to search through these service providers by either name, location, or both. I also need a user to be able to filter the providers by different criteria, based on multiple attributes.
I am trying to decide if I could implement this simply with database queries or if a more robust solution (i.e. a search engine) would better suit my needs.
Please let me know the pros and cons of either solution, and which you think would be best to go with.
I am writing my application in Django 1.7, using a PostGIS database, and would use django-haystack with elasticsearch if a search engine is the way to go here.
Buddy, it seems that you are working on a search-intensive application. My opinion in this regard is as follows:
1) If you run search-intensive queries directly against the database, the overhead is going to be very high, because each time a separate criteria-based query has to be built in Django with its own parameters and fired at the backend database engine. As a consequence, you become highly dependent on the availability of the database server. Things can get even worse if the database server is in a remote location, as network latency adds to the overhead.
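For comparison, the plain-database approach the question asks about boils down to building one parameterized query from whichever filters the user supplied. This sketch uses the standard-library sqlite3 module so it is self-contained; the `providers` table and its columns are hypothetical. In Django you would get the same effect by chaining `QuerySet.filter()` calls, and with PostGIS the location filter would typically be a distance lookup rather than an equality test.

```python
import sqlite3

# Hypothetical attribute columns users may filter on; never taken from user input.
FILTERABLE = {"category", "rating"}


def search_providers(conn: sqlite3.Connection, name=None, location=None, **attrs):
    """Return provider names matching all supplied filters, as one parameterized query."""
    clauses, params = [], []
    if name:
        clauses.append("name LIKE ?")
        params.append(f"%{name}%")
    if location:
        clauses.append("location = ?")
        params.append(location)
    for column, value in attrs.items():
        if column not in FILTERABLE:
            raise ValueError(f"cannot filter on {column!r}")
        clauses.append(f"{column} = ?")  # column name comes from the allow-list
        params.append(value)             # value is bound, never interpolated
    sql = "SELECT name FROM providers"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY name"
    return [row[0] for row in conn.execute(sql, params)]
```

With a handful of indexed columns this is often fast enough; a search engine starts to pay off when you need full-text relevance ranking, typo tolerance, or faceted counts.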
2) You should try implementing a server-side caching system like Redis, an in-memory NoSQL database (sometimes called a data structure server), which can mitigate many of the problems I described in the previous point.
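The caching idea in point 2 is usually the cache-aside pattern: look up the result by a key derived from the search filters, and only query the database on a miss. The sketch below uses a small in-process dictionary with expiry as a stand-in for Redis so it runs without a server; with the redis-py client the equivalent calls would be `get(key)` and `set(key, value, ex=ttl)`, with the result serialized (e.g. as JSON) first. The function names are hypothetical.

```python
import json
import time


class TTLCache:
    """In-process stand-in for Redis: each value expires `ttl` seconds after it is set."""

    def __init__(self, ttl: float, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._data[key]  # expired: behave like a miss
            return None
        return value

    def set(self, key, value):
        self._data[key] = (value, self.clock() + self.ttl)


def cache_key(filters: dict) -> str:
    """Same filters in any order produce the same key."""
    return "search:" + json.dumps(filters, sort_keys=True)


def cached_search(cache: TTLCache, filters: dict, run_query):
    """Cache-aside: try the cache, fall back to the database, then store the result."""
    key = cache_key(filters)
    result = cache.get(key)
    if result is None:
        result = run_query(filters)
        cache.set(key, result)
    return result
```

The trade-off is staleness: a newly added provider may not appear in cached results until the entry expires, so the TTL should match how fresh the search results need to be.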
3) To power up your search, read about Apache Solr, a search platform built on the Lucene library. It will take your search to the next level.
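The core data structure behind Lucene-based engines such as Solr (and Elasticsearch) is an inverted index: a map from each term to the documents containing it, so a query only touches the documents that match instead of scanning every row. The toy version below shows just that idea with AND semantics; real engines add text analysis (stemming, synonyms), relevance scoring, and on-disk index segments, none of which is modeled here.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that isn't a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Maps each term to the set of document IDs that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text: str):
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> set:
        """Return IDs of documents containing every query term (AND semantics)."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))  # copy, don't mutate postings
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result
```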
4) Last but not least, look at case studies from big players like Facebook, Twitter, etc. on how they manage their infrastructure. You will get an even better idea.
If you have any doubts or suggestions, kindly comment. Cheers :-)

What does it take to scale Django?

So I've been Django-ing for a number of months*. I find myself in a position where I'm able to code up a Django web app for whatever, but I am terrified by my inability** to come up with solutions for building a Django web app for a large (LARGE) audience. Good to know that Django scales, at least.
How I'm thinking about it
It seems like there would need to be a relatively large leap of knowledge to understand how to scale a Django web app, let alone actually do it. I say this because my research has given me the impression that scaling (or enabling scalability) is a process of fitting aftermarket solutions to the different components of your web app to enhance the performance of each component.
So there are a ton of solutions, and a bunch of components. For instance, there's Elastic Beanstalk for hosting, Django's cache framework, Memcached and Varnish for caching, Cassandra, Redis and PostgreSQL for databases, and uWSGI, Nginx and Apache for deployment. If I've understood correctly, anyway; I'm still not sure.
What I Need
I crave that amazing response that becomes the canonical answer to the question, but would also appreciate leads on where to begin, suggestions for an approach to the problem, or your own approach to scaling Django. Thank you in advance for your been-there-done-that words of wisdom. [Edit: SO disapproves :(]
What I need
What are the three most important/effective things I should do or implement to better prepare the Django web apps I'm building for scaling? Listing each approach and explaining how it helps would be nice.
*I've been cheating. I deploy on PythonAnywhere and have only used SQLite3 up till now. I have also managed to keep my hands clean of WSGI/Apache deployment stuff to date.
**Django is where I first managed to create something of value through programming. Before that, I had only used Pascal to cheat at RuneScape and Java to make some shitty Android apps, which may explain why this feels like such a large leap.
I really wouldn't worry too much about it initially. That said, here are some ideas for how you might want to think about scaling your Django apps.
Caching
Depending on what your application is, caching can be very useful indeed. For any application with a high proportion of reads to writes, such as a blog or content management system, implementing caching is a no-brainer. For other types of sites you may have to be a bit more careful; however, the Django caching framework makes it straightforward to customise how caching works for your application.
Memcached is easy to set up with Django's cache framework, and it's solid and reliable. It should probably be your default choice of caching backend.
Celery
If your web app does an appreciable amount of work that need not happen within the HTTP request itself, consider using Celery to carry it out as separate background tasks.
Case in point: a Django app I built had an option to email a client a PDF copy of a report. Because the email need not be sent within the same HTTP request, I handed that task off to Celery. Now, when the app receives the HTTP request, it just pushes a request to send that email onto the message queue. The Celery worker process picks up this task and handles it separately.
In theory that task could be handled on an entirely separate machine when your web app gets big enough.
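The hand-off described above can be illustrated with the standard library alone: the request handler only enqueues a task and returns, while a separate worker consumes the queue. Here `queue.Queue` and a thread stand in for Celery's message broker and worker process; with Celery itself you would decorate the function with `@app.task` and call `send_report_email.delay(...)` from the view. The function names are hypothetical.

```python
import queue
import threading

task_queue = queue.Queue()  # stand-in for the message broker Celery would use
sent = []                   # records "sent" emails so the example is observable


def send_report_email(address: str, report_id: int) -> None:
    # Hypothetical slow work: render the PDF report and send the email.
    sent.append((address, report_id))


def worker() -> None:
    """Separate consumer, like a Celery worker, pulling tasks off the queue."""
    while True:
        task = task_queue.get()
        if task is None:  # sentinel value: shut the worker down
            task_queue.task_done()
            break
        func, args = task
        try:
            func(*args)
        finally:
            task_queue.task_done()


def handle_request(address: str, report_id: int) -> str:
    """What the view does: enqueue the work and respond immediately."""
    task_queue.put((send_report_email, (address, report_id)))
    return "202 Accepted"
```

Because the worker only talks to the queue, moving it to another machine later means swapping the in-process queue for a networked broker, which is exactly the role Celery's broker plays.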
Web server
It seems to be generally accepted that serving static content through Django in production is a bad idea; Django should serve only the dynamic content. The solution I use seems to be fairly typical and employs two web servers:
Nginx runs on port 80. It serves all the static files and reverse-proxies everything else to another port.
Gunicorn runs on that other port and serves the dynamic content; Supervisor is used to run the Gunicorn process.
There are variants of this general idea, but this kind of two server approach seems to be common. You could also consider using something like Amazon's S3 to host static files.
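What Gunicorn actually runs on that other port is a WSGI callable, the same kind of `application` object that Django generates in a project's `wsgi.py`. The hand-written sketch below involves no Django and exists only to show the interface: Gunicorn calls it once per proxied request, while static files never reach it because Nginx answers those requests itself.

```python
def application(environ, start_response):
    """Minimal WSGI app: the kind of callable Gunicorn serves behind Nginx."""
    path = environ.get("PATH_INFO", "/")
    body = f"Dynamic page for {path}\n".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Assuming this is saved as a module named `myapp.py` (a hypothetical name), something like `gunicorn myapp:application --bind 127.0.0.1:8000` would serve it, and Nginx would proxy non-static requests to that address.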
It's also well worth taking the time to minify your static files to improve page load times. With a tool like Grunt it's quite easy to concatenate and minify your JavaScript and CSS files so that only one of each needs to be downloaded, rather than many files that each have to be downloaded individually.
Database
Either MySQL or PostgreSQL will be fine. Both are solid databases that are used in production on many websites.
As I said higher up, scaling your app shouldn't really be too much of a concern early on. However, it helps to be familiar with the kind of strategies you'll need to use.

Using Microsoft Office software as part of my web service backend?

What licensing issues arise if I install and use Microsoft Office software (in this case Visio) as part of my web service backend?
My company's flagship software can convert Microsoft Visio files for use in its environment, but of course it requires a local install of Visio to decode the files. The system I'm to create is a sort of web service where people can upload their Visio files, so that we can show off the benefits of buying our full-price software.
To do this I'd need an install of our software on the server, as well as Visio. What concerns me a little is that, technically, any visitor to the site is using Visio. I can't really find any other examples by searching online (it doesn't help that words like "server" and "cloud" are essentially buzzwords), so any advice would be greatly appreciated!
I don't know the legal details, but MS says that if you do this, every user would require a Visio licence. You can certainly do it technically, but MS also warns that Office automation was intended to be done in an interactive session; I take this to mean they don't guarantee that it won't pop up a dialog or something at some point. They provide server-side options for most Office products, but not for Visio.
I don't know what your application is but I can think of three options that may be relevant:
Create a downloadable application that opens Visio and converts the file to your internal format and then uploads it to your server
Have files uploaded to the server which then creates a task for someone in your company to download the file and do something with it. You could significantly automate this process
Get the users to upload VDX files and process the data as XML
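Option 3 works because a .vdx file is plain XML, so the server can read it with a standard XML parser and no Visio install. The sketch below pulls the text of each shape using Python's `xml.etree.ElementTree`; the namespace URI and the `Pages/Page/Shapes/Shape/Text` layout are assumptions based on Visio 2003-era VDX files and should be checked against real files from your users.

```python
import xml.etree.ElementTree as ET

# Assumed namespace of Visio 2003-era XML drawings (.vdx); verify against real files.
NS = {"v": "http://schemas.microsoft.com/visio/2003/core"}


def shapes_with_text(vdx_bytes: bytes):
    """Yield (page name, shape ID, text) for every shape that carries text."""
    root = ET.fromstring(vdx_bytes)
    for page in root.iterfind("v:Pages/v:Page", NS):
        page_name = page.get("NameU") or page.get("Name", "")
        # iter() also finds shapes nested inside group shapes
        for shape in page.iter(f"{{{NS['v']}}}Shape"):
            text_el = shape.find("v:Text", NS)
            if text_el is not None:
                text = "".join(text_el.itertext()).strip()
                if text:
                    yield page_name, shape.get("ID"), text
```

Usage would be along the lines of `with open("diagram.vdx", "rb") as f: rows = list(shapes_with_text(f.read()))`, after which the rows can be mapped onto your own internal format.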
Note: if your application uses Visio in such a way that you don't have your own internal data structure, could you use option 1 and have just some of the functionality done on the server through authenticated web services? That way users get to see what it can do, but it only works while they are connected to your server.

Web Applications & Desktop Applications

I am a programmer who writes a lot of code for desktop applications, and I have now started considering cross-platform apps. At work I write C# apps; I come from a C++ and CS background and, of course, I've written several things in Qt/C++. But now I am kind of confused about web applications. I have done some work in PHP and I know how things go there, and I have been a Gmail and Google Docs user for a long time, so I have seen how much web applications have improved with new Web 2.0 technologies such as Ajax, XML and so on. My confusion is whether I should start looking toward web application development and keep exploring the power of Web 2.0, or just stick with my old world, where I feel very comfortable with parallelism and other stuff. Believe me, I have had many offers to work as a web application developer, but I didn't take those opportunities, and now I am unsure whether I should start writing web apps. Have you been writing desktop applications and switched to the web? Does anybody have experience with this scenario?
Thank you.
The boundaries between desktop and web applications have really blurred. Whilst once upon a time the nature of developing for the web was totally different to developing for the desktop, nowadays you find the same concepts (such as parallelism which you referred to) cropping up in both. Don't think of developing web applications as taking a huge step away from traditional software development as you'll employ just as many skills and concepts as you already use. You wouldn't need to learn a whole lot more to get involved in web development if you have C# experience, as you could code backends to web applications in a very similar way to how you currently work. If you wanted/needed to get involved in the UI side of things, there are new technologies you'd need to pick up, but they're not essential to get a job in web development (as long as you weren't looking for a frontend role obviously).
To follow up Dustman's comments about companies wanting to keep tight control of their data, etc.: bear in mind that not all "web applications" involve the use of the internet. Really all the term means is "applications developed on web-based technologies", and as well as being deployed publicly on the web, they're commonly deployed on intranets and other closed-access environments. I work for a software company which develops "web applications", but a large number of systems are hosted by clients for use on their internal networks for the very reasons Dustman refers to: they want to keep tight control of their data. The beauty of web-based technologies is that you can achieve this whilst still reaping the benefits of a centralised system: no need to manage deployment across hundreds of workstations, no need to worry too much about the specifications of client devices, the ability to access the system from different types of device (mobile, etc.), regular and easily deployed updates, and so on.
It's all about what kind of programs you want to be writing. End-user apps have already started a significant move toward being web-oriented, because of the advantages that some companies find in outsourcing their data handling and IT infrastructure. Because this area of development is a new and growing sector, I have no doubt that you will be getting all kinds of offers, and hearing all about new startups and so forth that are centered on developing these kinds of applications.
That doesn't mean that desktop apps are going to go away. Some companies, and lots of private individuals like to have a sense of being in physical possession of their data, and see no monetary benefit in "renting" an online app or in outsourcing their data handling. These people are going to keep the desktop app market open in the foreseeable future, although perhaps not to the extent that we have seen previously.
So at this point, you needn't feel forced to make a move into the web game, but there are certainly opportunities there if you want them.
In the near future, the boundary between web development and desktop application development will keep fading. For a professional programmer, learning new things is real growth, and learning web development is not a difficult task for an experienced programmer. You can certainly go ahead and learn web development. You should get to know the web well, as it will certainly converge with desktop apps in the near future.