Setting up Nginx as a reverse proxy for Apache vs just Apache Event MPM - django

In the Django docs for setting up mod_wsgi, the tutorial notes:
Django doesn’t serve files itself; it leaves that job to whichever Web
server you choose.
We recommend using a separate Web server – i.e., one that’s not also
running Django – for serving media. Here are some good choices:
Nginx
A stripped-down version of Apache
I understand this might be due to wasted resources when Apache spawns new processes to serve each static file, which Nginx avoids. However, Apache's (newish?) Event MPM seems to act similarly to an Nginx instance handing off requests to Apache worker processes. So I'd like to ask: instead of setting up Nginx as a reverse proxy in front of Apache, would using Apache's Event MPM be sufficient for serving static files?

Apache doesn't spawn a new process for each static file. Like nginx, Apache keeps persistent processes around to handle concurrent and subsequent requests. The difference is that nginx uses a fully async model, whereas Apache relies on processes and/or threads for concurrency, although the event MPM now uses an async model for initial request acceptance and keep-alive connections. For the majority of people, Apache alone is still a more than acceptable solution, so don't get ahead of yourself if you are just starting out and think you need a Google/Facebook-scale solution from the outset.
More important than a separate web server is that, if using Apache/mod_wsgi, you serve the static files under a different host name. That way you avoid heavyweight cookie information being sent with every static file request. You can do this using virtual hosts in Apache. Also ensure you are using daemon mode of mod_wsgi for running the Django application: it is a better architecture and provides many more options for setting timeouts, so your application can recover from situations which might otherwise cause the server to lock up when overloaded.
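A minimal sketch of that layout, with placeholder hostnames and paths (the daemon options shown are real mod_wsgi directives, but the values are illustrative):

```apache
# Static files on their own hostname, so no site cookies travel with them
<VirtualHost *:80>
    ServerName static.example.com
    DocumentRoot /srv/www/static
</VirtualHost>

# Django application running in mod_wsgi daemon mode
<VirtualHost *:80>
    ServerName www.example.com
    WSGIDaemonProcess mysite processes=2 threads=15 \
        request-timeout=60 inactivity-timeout=300
    WSGIProcessGroup mysite
    WSGIScriptAlias / /srv/www/mysite/mysite/wsgi.py
</VirtualHost>
```

The timeout options are what give the application a chance to recover automatically instead of locking up under overload.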
For a system which provides a better out-of-the-box configuration and experience than using Apache/mod_wsgi directly and configuring it yourself, look at mod_wsgi-express.
https://pypi.python.org/pypi/mod_wsgi
http://blog.dscpl.com.au/2015/04/introducing-modwsgi-express.html
http://blog.dscpl.com.au/2015/04/using-modwsgi-express-with-django.html
http://blog.dscpl.com.au/2015/04/integrating-modwsgi-express-as-django.html
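Getting started with it is roughly the following (the package and subcommand are real; the flag values are illustrative and installation needs the Apache dev headers available):

```shell
# Install mod_wsgi from PyPI (requires e.g. apache2-dev / httpd-devel)
pip install mod_wsgi

# Serve a Django project with sensible defaults, mapping /static as well
mod_wsgi-express start-server mysite/wsgi.py --port 8000 --url-alias /static static
```

It generates a complete, tuned Apache/mod_wsgi configuration for you, which is the point of the recommendation.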

The advice about separating the web servers has two advantages. One is clearly outlined by Graham. The other is predictable resource consumption.
The number of resources per HTML page varies. Leaving one web server to serve the application and another to serve static resources has the advantage that you know exactly how many concurrent visitors you can serve: the MaxClients setting of Apache.
Even if this slows down the loading of images, those web servers need very few modules and no measurable amount of CPU power, so a one-core machine with SSD disks is all you need, and scaling is cheap.
As Graham indicates, it starts with a STATIC_URL that uses a different hostname. Run it on the same server at the start. When scaling up, tie that hostname to a reverse proxy that serves from several image-server backend machines.
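In Django, that starting point is just a settings change (the hostname is a placeholder; at first it can be a DNS alias for the same machine, later it can point at a reverse proxy over several static-file backends):

```python
# settings.py -- serve static assets from a dedicated hostname so that
# requests for them carry no session cookies and can be scaled separately.
STATIC_URL = "https://static.example.com/static/"
```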

Related

Writing a REST Service in C++ with Nginx

I'm a bit underwhelmed by the Nginx module documentation. I have a lot of C++ code and a REST service already running using Boost Beast, and I'd like to compare performance between Beast and NGINX, using the C++ module interface, against a benchmark I'll write according to my needs.
I've seen this tutorial here: https://www.evanmiller.org/nginx-modules-guide.html
But I've thus far not seen a concise, short example to just get started.
Is there a hidden documentation? Alternatively, do you have an example showing how to use Nginx as a REST service in C(++)?
Short answer: Do not embed any application code into nginx.
Long answer:
You can write a new nginx module to help nginx do its job better, for example:
add some new method of authentication
or some new transport to the back end, like shared memory.
Nginx was designed to serve static content, proxy requests, and do some filtering like modifying headers.
The main objective of nginx is to do these things as fast as possible while spending as few resources as possible.
It allows your application server to scale dynamically without affecting currently connected users.
Nginx is a good web server but was never designed to be an application server.
It does not make much sense to embed application logic into nginx just because nginx is written in C.
If you need to have the best of both worlds (proxy, static files and rest server) then just use them both (nginx and Beast) with each having its own responsibility.
Nginx will take care of balancing, encryption and any other non-application specific function and app server will do its work.
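A minimal nginx sketch of that division of labour (hostname, paths, certificate locations and the backend port are all assumptions):

```nginx
server {
    listen 443 ssl;
    server_name api.example.com;
    ssl_certificate     /etc/ssl/api.pem;
    ssl_certificate_key /etc/ssl/api.key;

    # nginx serves static content itself...
    location /static/ {
        root /srv/www;
    }

    # ...and forwards REST calls to the Beast app server on localhost
    location /api/ {
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

Your Beast server then speaks plain HTTP on a loopback port and never has to deal with TLS, slow clients, or static files.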
Nginx's architecture is based on non-blocking network/file calls, and all connections in a worker are served in a single thread; nginx does this well because it mostly just shuffles data back and forth.
If your application code can generate content fast and without blocking calls to external services, then you could embed your app into nginx at the cost of losing scalability. And if some part of your app requires CPU-bound work or blocking calls, then you need to move such things off the main networking loop, which complicates things "a bit".
By embedding your logic into nginx you could probably save some microseconds and file handles on communications.
For a multi-user websocket app like a chat or stock feed (i.e. an app with long-lived open connections) it could free up extra resources, but for a REST app with fast responses it would not bring any gain.
Your REST app most likely uses SSL encryption. That encryption adds far more to your response time (milliseconds rather than microseconds) than you could gain by such an implementation.
My advice: leave nginx to do its thing and do not interfere with it.

How to warm up django web service before opening to public?

I'm running a Django web application on AWS ECS.
I'd like to warm up the server when deploying a new version (the first request takes some time because Django has to load up).
Is there a way to warm up the server before registering it with the Application Load Balancer?
Edit
I'm using nginx + uwsgi
I assumed that you use mod_wsgi, because that is the behavior described here:
Q: Why do requests against my application seem to take forever, but then after a bit they all run much quicker?
A: This is because mod_wsgi by default performs lazy loading of any application. That is, an application is only loaded the first time that a request arrives which targets that WSGI application. This means that those initial requests will incur the overhead of loading all the application code and performing any startup initialisation.
This startup overhead can appear to be quite significant, especially if using Apache prefork MPM and embedded mode. This is because the startup cost is incurred for each process, and with prefork MPM there are typically a lot more processes than if using worker MPM or mod_wsgi daemon mode. Thus, as many requests as there are processes will run slowly, and everything will only run at full speed once code has all been loaded.
Note that if recycling of Apache child processes or mod_wsgi daemon processes after a set number of requests is enabled, or if for embedded mode Apache decides itself to reap any of the child processes, then you can periodically see these delayed requests occurring.
Some of the benchmarks for mod_wsgi which have been posted do not take these startup costs into account and wrongly try to compare the results to other systems, such as fastcgi or proxy based systems, where the application code would be preloaded by default. As a result mod_wsgi is painted in a worse light than is reality. If mod_wsgi is configured correctly the results would be better than is shown by those benchmarks.
For some cases, such as when WSGIScriptAlias is being used, it is actually possible to preload the application code when the process first starts, rather than when the first request arrives. To preload an application see the WSGIImportScript directive.
I think you may try the WSGIImportScript directive (used alongside WSGIScriptAlias); see more here
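As a hedged sketch, preloading with WSGIImportScript looks like this (paths and the group name are placeholders):

```apache
WSGIDaemonProcess mysite processes=2 threads=15
WSGIScriptAlias / /srv/www/mysite/wsgi.py process-group=mysite application-group=%{GLOBAL}
# Preload the application when the daemon process starts,
# instead of on the first incoming request:
WSGIImportScript /srv/www/mysite/wsgi.py process-group=mysite application-group=%{GLOBAL}
```

With this in place, new processes pay the import cost at startup, before the load balancer ever routes a request to them.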
I just changed the health check from an nginx-based one to a uwsgi-backed one:
create an endpoint in Django, and let the ELB use that as the health check.
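A minimal sketch of such an endpoint (written framework-free here for brevity; in Django you would map a URL to a view that returns the same 200 response). The point is that the check goes through the application itself, so the ALB only marks the instance healthy once the code is fully loaded:

```python
def health_check(environ, start_response):
    """Warm-up/health endpoint: because the request passes through the
    application, hitting it forces all application code to be imported
    before the load balancer sees a healthy response."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]
```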

fcgi vs mod_fastcgi on apache server

I have an Apache server on which I am setting up FastCGI. I was contemplating whether to set up the tailor-made mod_fastcgi or the plain old cgi-fcgi.
mod_fastcgi doesn't seem to support the "multiplexing" features of FastCGI, and the web service I am building is a very high-traffic service with several thousand calls per minute, and I want them processed as quickly as possible.
Any suggestions or advice?
Indeed, mod_fastcgi does not support multiplexing. I suppose this is because the Apache web server handles concurrent processing itself. You've probably dealt with its various Multi-Processing Modules (MPMs) already...
Apache is highly optimized around the several (request) phases it provides. The various modules can hook in wherever you like, which makes Apache an excellent server for directly integrating high-performance and/or really complex applications (e.g. with custom modules in C, mod_perl and so on) as modules themselves.
But both mod_fastcgi and cgi-fcgi are, IMHO, only used to provide response and/or filter handlers. Thus, many of the great features (configuration, mapping, post-request logging & cleanup...) provided by Apache are just not used in such a setup.
So if your application is built on top of FastCGI, I'd rather not recommend using Apache. Especially for high-performance applications under high load, one may prefer a more lightweight but fast HTTP daemon. There are plenty of alternatives, like nginx or lighttpd.
Usually one would use them as proxies/balancers in front of the FCGI processes, plus cache, SSL handler and logging provider. Of course, Apache is also capable of these tasks, but it's somewhat like using a helicopter to direct traffic at an intersection...
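As a sketch, fronting a FastCGI worker pool from nginx looks like this (the socket path is an assumption):

```nginx
location / {
    include fastcgi_params;           # standard FastCGI request variables
    fastcgi_pass unix:/run/app.sock;  # your FastCGI application processes
}
```

nginx handles the HTTP side (keep-alives, TLS, slow clients) and speaks plain FastCGI to your workers over the socket.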
Cheers!

Serve multiple Django and PHP projects on the same machine?

The documentation states that one should not serve static files on the same machine as the Django project, because static content will kick the Django application out of memory. Does this problem also arise from having multiple Django projects on one server? Should I combine all my website projects into one very large Django project?
I'm currently serving Django along with PHP scripts from Apache with mod_wsgi. Does this also cause a loss of efficiency?
Or is the warning just meant for static content, because the problem arises when serving hundreds of files, while serving 20-30 different PHP/Django projects is OK?
I would say that this setup is completely OK. Of course it depends on the hardware, the load and the other projects. But here you can just try it and monitor usage/performance.
The suggestion to use a different server (or servers) for static files makes sense, as it is more efficient with resources. But as long as one server performs well enough I don't see a reason to use a second one.
Another question - which has less to do with performance than with ease of use/configuration - is the decision if you really want to run everything on the same server.
For one setup with a bunch of smaller sites (and as well some php-legacy) we use one machine with four virtual servers:
webhead running nginx (and varnish)
database
simple apache2/php server
django server using gunicorn + supervisord
nginx handles all the sites, either proxying to the application server or serving static content (via NAS). I like this setup, as it is very easy to install and manage, and it makes it simple to scale out one piece if needed.
If the documentation says "one should not serve static files on the same machine as the Django project, because static content will kick the Django application out of memory", then the documentation is very misleading and arguably plain wrong.
The one suggestion I would make if using PHP on same system is that you ensure you are using mod_wsgi daemon mode for running the Python web application and even one daemon process per Python web application.
Do not run the Python web application in embedded mode, because that means you are running stuff in the same process as mod_php, and because PHP (including its extensions) is not really thread-safe, that forces you to run the prefork MPM. Running Python web applications embedded in Apache under prefork MPM is a bad idea unless you know very well how to set up Apache properly for it. Set up Apache wrong and you get issues like those described in:
http://blog.dscpl.com.au/2009/03/load-spikes-and-excessive-memory-usage.html
The short of it is that Apache configuration for PHP and Python need to be quite different. You can get around that though by using mod_wsgi daemon mode for the Python web application.
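A sketch of one daemon process group per Python application, while mod_php stays embedded in the Apache children (site names and paths are placeholders):

```apache
# PHP keeps running embedded in the (prefork) Apache child processes.
# Each Django site gets its own isolated mod_wsgi daemon process group:
WSGIDaemonProcess site1 processes=2 threads=15
WSGIScriptAlias /site1 /srv/site1/wsgi.py process-group=site1 application-group=%{GLOBAL}

WSGIDaemonProcess site2 processes=2 threads=15
WSGIScriptAlias /site2 /srv/site2/wsgi.py process-group=site2 application-group=%{GLOBAL}
```

Each Python application then runs in its own processes, tuned independently of the PHP-oriented Apache settings.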

Move to 2 Django physical servers (front and backend) from a single production server?

I currently have a growing Django production server that has all of the front end and backend services running on it. I could keep growing that server larger and larger, but instead I want to try and leave that main server as my backend server and create multiple front end servers that would run apache/nginx and remotely connect to the main production backend server.
I'm using slicehost now, so I don't think I can benefit from having the multiple servers run on an intranet. How do I do this?
The first step in scaling your server is usually to separate the database server. I'm assuming this is all you meant by "backend services", since you haven't given more details.
All this needs is a change to your settings file. Change DATABASE_HOST from localhost to the new IP of your database server.
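In current Django versions the host lives in the DATABASES dict rather than a flat DATABASE_HOST setting; a sketch of the change (IP, credentials and engine are placeholders):

```python
# settings.py -- point Django at the now-separate database server.
# (Very old Django versions used flat DATABASE_HOST / DATABASE_NAME settings.)
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "mysite",
        "USER": "mysite",
        "PASSWORD": "secret",   # placeholder
        "HOST": "10.0.0.2",     # was "localhost" before the split
        "PORT": "5432",
    }
}
```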
If your site is heavy on static content, creating a separate media server could help. You may even look into a CDN.
The first step usually is to separate the server running the actual Python code from the database server. Any background jobs that do processing would probably run on the database server. I assume that when you say front-end server, you actually mean a server running Python code.
Now, as every request will have to do a number of database queries, latency between the web server and the database server is very important. I don't know if Slicehost has a feature that lets you create two virtual machines that are "close" in terms of network latency (a quick Google search did not find anything). They seem like nice guys, so maybe you could ask them if they have such a service or could make an exception.
Anyway, when you do have two machines on Slicehost, you could check the latency between them by simply pinging between them. When you have the result you will probably know if this is at all feasible or not.
Further steps depends on your application. If it is media heavy, then maybe using a separate media server would make sense. Otherwise the normal step is to add more web servers.
--
As a side note, I personally think it makes more sense to invest in real dedicated servers with dedicated network equipment for this kind of setup. This of course depends on what budget you are on.
I would also suggest looking into Amazon EC2 where you can provision servers that are magically close to each other.