Django/Apache freezing with mod_wsgi - django

I have a Django application that is running behind 2 load balanced mod_wsgi/Apache servers behind Nginx (static files, reverse proxy/load balance).
Every few days, my site becomes completely unresponsive. My guess is that a bunch of clients are requesting URLs that are blocking.
Here is my config
WSGIDaemonProcess web1 user=web1 group=web1 processes=8 threads=15 maximum-requests=500 python-path=/home/web1/django_env/lib/python2.6/site-packages display-name=%{GROUP}
WSGIProcessGroup web1
WSGIScriptAlias / /home/web1/django/wsgi/wsgi_handler.py
I've tried experimenting with only using a single thread and more processes, and more threads and a single process. Pretty much everything I try sooner or later results in timed out page loads.
Any suggestions for what I might try? I'm willing to try other deployment options if that will fix the problem.
Also, is there a better way to monitor mod_wsgi other than the Apache status module? I've been hitting:
curl http://localhost:8080/server-status?auto
And watching the number of busy workers as an indicator for whether I'm about to get into trouble (I assume the more busy workers I have, the more blocking operations are currently taking place).
NOTE: Some of these requests are to a REST web service that I host for the application. Would it make sense to rate limit that URL location through Nginx somehow?

Use:
http://code.google.com/p/modwsgi/wiki/DebuggingTechniques#Extracting_Python_Stack_Traces
to embed functionality that you can trigger at a time where you expect stuck requests and find out what they are doing. Likely the requests are accumulating over time rather than happening all at once, so you could do it periodically rather than wait for total failure.
As a fail safe, you can add the option:
inactivity-timeout=600
to WSGIDaemonProcess directive.
What this will do is restart the daemon mode process if it is inactive for 10 minutes.
Unfortunately at the moment this happens in two scenarios though.
The first is where there have been no requests at all for 10 minutes, the process will be restarted.
The second, and the one you want to kick in, is if all request threads are blocked and none of them has read any input from wsgi.input, nor have any yielded any response content, in 10 minutes, the process will again be restarted automatically.
This will at least mean your process should recover automatically and you will not be called out of bed. Because you are running so many processes, chances are that they will not all get stuck at the same time so restart shouldn't be noticed by new requests as other processes will still handle the requests.
What you should work out is how low you can make that timeout. You don't want it so low that processes will restart because of no requests at all as it will unload the application and next request if lazy loading being used will incur slow down.
What I should do is actually add a new option blocked-timeout which specifically checks for all requests being blocked for the defined period, therefore separating it from restarts due to no requests at all. This would make this more flexible as having it restart due to no requests brings its own issues with loading application again.
Unfortunately one can't easily implement a request-timeout which applies to a single request because the hosting configuration could be multithreaded. Injecting Python exceptions into a request will not necessarily unblock the thread and ultimately you would have to kill process anyway and interupt other concurrent requests. Thus blocked-timeout is probably better.
Another interesting thing to do might be for me to add stuff into mod_wsgi to report such forced restarts due to blocked processes into the New Relic agent. That would be really cool then as you would get visibility of them in the monitoring tool. :-)

We had a similar problem at my work. Best we could ever figure out was race/deadlock issues with the app, causing mod_wsgi to get stuck. Usually killing one or more mod_wsgi processes would un-stick it for a while.
Best solution was to move to all-processes, no-threads. We confirmed with our dev teams that some of the Python libraries they were pulling in were likely not thread-safe.
Try:
WSGIDaemonProcess web1 user=web1 group=web1 processes=16 threads=1 maximum-requests=500 python-path=/home/web1/django_env/lib/python2.6/site-packages display-name=%{GROUP}
Downside is, processes suck up more memory than threads do. Consequently we usually end up with fewer overall workers (hence 16x1 instead of 8x15). And since mod_wsgi provides virtually nothing for reporting on how busy the workers are, you're SOL apart from just blindly tuning how many you have.
Upside is, this problem never happens anymore and apps are completely reliable again.
Like with PHP, don't use a threaded implementation unless you're sure it's safe... that means the core (usually ok), the framework, your own code, and anything else you import. :)

If I've understood your problem properly, you may try the following options:
move URL fetching out of the request/response cycle (using e.g. celery);
increase thread count (they can handle such blocks better than processes because they consume less memory);
decrease timeout for the urllib2.urlopen;
try gevent or eventlet (they will magically solve your problem but can introduce another subtle issues)
I don't think this is a deployment issue, this is more of a code issue and there is no apache configuration solving it.

Related

How to warm up django web service before opening to public?

I'm running django web application on AWS ecs.
I'd like to warm up the server (hit the first request and it takes some time for django to load up) when deploying a new version.
Is there a way for warming up the server before registering it to the Application load balancer?
Edit
I'm using nginx + uwsgi
I asumed that you use mod_wsgi , because that is the behavior described here:
Q: Why do requests against my application seem to take forever, but then after a bit they all run much quicker?
A: This is because mod_wsgi by default performs lazy loading of any application. That is, an application is only loaded the first time
that a request arrives which targets that WSGI application. This means
that those initial requests will incur the overhead of loading all the
application code and performing any startup initialisation.
This startup overhead can appear to be quite significant, especially if using Apache prefork MPM and embedded mode. This is
because the startup cost is incurred for each process and with prefork
MPM there are typically a lot more processes that if using worker MPM
or mod_wsgi daemon mode. Thus, as many requests as there are processes
will run slowly and everything will only run full speed once code has
all been loaded.
Note that if recycling of Apache child processes or mod_wsgi daemon processes after a set number of requests is enabled, or for
embedded mode Apache decides itself to reap any of the child
processes, then you can periodically see these delayed requests
occurring.
Some number of the benchmarks for mod_wsgi which have been posted do not take into mind these start up costs and wrongly try to compare
the results to other systems such as fastcgi or proxy based systems
where the application code would be preloaded by default. As a result
mod_wsgi is painted in a worse light than is reality. If mod_wsgi is
configured correctly the results would be better than is shown by
those benchmarks.
For some cases, such as when WSGIScriptAlias is being used, it is actually possible to preload the application code when the processes
first starts, rather than when the first request arrives. To preload
an application see the WSGIImportScript directive.
I think you may try to use WSGIScriptAlias see more here
I just changed health check from nginx based ones to uwsgi related ones,
create an endpoint in django, and let the ELB use that as health check

Very slow: ActiveRecord::QueryCache#call

I have an app on heroku, running on Puma:
workers 2
threads_count 3
pool 5
It looks like some requests get stuck in the middleware, and it makes the app very slow (VERY!).
I have seen other people threads about this problem but no solution so far.
Please let me know if you have any hint.
!
!
I work for Heroku support and Middleware/Rack/ActiveRecord::QueryCache#call is a commonly reported as a problem by New Relic. Unfortunately, it's usually a red herring as each time the source of the problem lies elsewhere.
QueryCache is where Rails first tries to check out a connection for use, so any problems with a connection will show up here as a request getting 'stuck' waiting. This doesn't mean the database server is out of connections necessarily (if you have Librato charts for Postgres they will show this). It likely means something is causing certain database connections to enter a bad state, and new requests for a connection are waiting. This can occur in older versions of Puma where multiple threads are used and the reaping_frequency is set - if some connections get into a bad state and the others are reaped this will cause problems.
Some high-level suggestions are as follows:
Upgrade Ruby & Puma
If using the rack-timeout gem, upgrade that too
These upgrades often help. If not, there are other options to look into such as switching from threads to worker based processes or using a Postgres connection pool such as PgBouncer. We have more suggestions on configuring concurrent web servers for use with Postgres here: https://devcenter.heroku.com/articles/concurrency-and-database-connections
I will answer my own question:
I simply had to check all queries to my DB. One of them was taking a VERY long time, and even if it was not requested often, it would slow down the whole server for quite some time afterwards(even after the process was done, there was a sort of "traffic jam" on the server).
Solution:
Check all the queries to your database, fix the slowest ones (it might simply mean breaking it down in few steps, it might mean make it run at night when there is no traffic, etc...).
Once this queries are fixed, everything should go back to normal.
I recently started seeing a spike in time spent in ActiveRecord::QueryCache#call. After looking at the source, I decided to try clearing said cache using ActiveRecord::Base.connection.clear_query_cache from a Rails Console attached to the production environment. The error I got back was PG::ConnectionBad: could not fork new process for connection: Cannot allocate memory which lead me to this other SO question at least Heroku Rails could not fork new process for connection: Cannot allocate memory

Configure uwsgi server for performance

I am deploying a uwsgi server for a django app. Each request will have a latency around 2 seconds. I need to handle 100 QPS. On a 4 cores machines, how should I configure the number of processes and the number of threads? I tried to play with the values but I do not understand what I am doing.
Go through the uWSGI Things to know page. 100 requests per second should be easily attainable with uWSGI.
Based on uWSGI behavior I've experienced, I would recommend that you start with only processes and don't use any threads. With both processes and threads we observed that there seemed to be an affinity to use threads over processes. That resulted in a single process handling all requests until it's thread pool was fully occupied and only then were requests handled by the next process. This resulted in poor utilization of resources as a single core was maxed out with all other idle. Turning off threading resulted in a massive performance boost for our particular use model.
Your experience may be different. The uWSGI authors stress that there isn't any magic config combination- it's completely dependent on your particular use case. You need benchmark your app against various configurations to find the sweet spot. Additionally, unless you're able to use benchmarks that perfectly model your actual production load, you'll want to continue to monitor performance and methodically tweak settings after you deploy.
From the Things to know page:
There is no magic rule for setting the number of processes or threads
to use. It is very much application and system dependent. Simple math
like processes = 2 * cpucores will not be enough. You need to
experiment with various setups and be prepared to constantly monitor
your apps. uwsgitop could be a great tool to find the best values.

cpu load and django application that makes long-response-time requests to external API

I'm developing a web application in python for which each user request makes an API call to an external service and takes about 20 seconds to receive response. As a result, in the event of several concurrent requests being made, the CPU load goes crazy (>95%) with several idle processes.
The server consists of a 1.6 GHz dual core Atom 330 with 2GB RAM.
The web app is developed in python which is served through Apache with mod_wsgi
My question is the following. Will a non-blocking webserver such as Tornado improve CPU load and thus handle more concurrent users (I'm also interested why) ? Can you suggest any other scalable solution ?
This really doesn't have anything to do with blocking; it does, but it doesn't. The 20 sec request is blocking one thread, so another has be utilized for the next request. Whereas with quick requests, the threads basically round-robin.
However, this really shouldn't be spiking your CPU output. Webservers have an upward limit of "workers" that get spawned and when they're all tied up, they're all tied up. It won't extend past the limit, so unless you've set or the default setting is higher than the box you have is capable of running, it shouldn't push your CPU that high.
Regardless, all that is merely informational, and doesn't really solve your problem. With such a long running request though, you should be offloading this from your webserver as quick as possible. The webserver should merely hand off the request to another process that can asyncronously handle it and then employ polling to notify the client when the response is ready. Node.js is used a lot in similar scenarios, but I really don't have enough experience with it to give you any real guidance beyond that.
You should look into using message queues to offload tasks so that your user requests are not blocked.
You could look into python libs kombu and celery to handle messages and tasks.
You are likely using prefork MPM with Apache and mod_wsgi embedded mode. This is a bad combination by default because Apache is setup for PHP and not fat Python web applications. Read:
http://blog.dscpl.com.au/2009/03/load-spikes-and-excessive-memory-usage.html
which explains this exact sort of issue.
Use mod_wsgi daemon mode at the minimum, and preferably also change to worker MPM for Apache.

How do I handle blocking IO in mod_wsgi/django?

I am running Django under Apache+mod_wsgi in daemon mode with the following config:
WSGIDaemonProcess myserver processes=2 threads=15
My application does some IO on the backend, which could take several seconds.
def my_django_view:
content=... # Do some processing on backend file
return HttpResponse(content)
It appears that if I am processing more than 2 http requests that are handling this kind of IO, Django will simply block until one of the previous requests completes.
Is this expected behavior? Shouldn't threading help alleviate this i.e. shouldn't I be able to process up to 15 separate requests for a given WSGI process, before I see this kind of wait?
Or am I missing something here?
If the processing is in python, then Global Interpreter Lock is not being released -- in a single python process only one thread of python code can be executing at a time. The GIL is usually released inside C code though -- like most I/O, for example.
If this kind of processing is going to happen a lot, you might consider running a second "worker" application as a deamon, reading tasks from the database, performing the operations and writing resulsts back to the database. Apache might decide to kill processes that take too long to respond.
+1 to Radomir Dopieralski's answer.
If the task takes long you should delegate it to a process outside the request-response cycle, either by using a standard cron, or some distributed task queue like Celery
Databases for workload offloading were quite the thing in 2010, and a good idea then, but we've come a bit farther now.
We're using Apache Kafka as a queue to store our in-flight workload. So, Dataflow is now:
User -> Apache httpd -> Kafka -> python daemon processor
User post operation puts data into system to be processed via wsgi app that just writes it very fast to a Kafka queue. Minimal sanity checking is done in the post operation to keep it fast but find some obvious problems. Kafka stores the data very fast so the http response is zippy.
A separate set of python daemons pull data from Kafka and do processing on it. We actually have multiple processes that need to process it differently, but Kafka makes that fast by only writing once and having multiple readers read the same data if needed; no penalty for duplicate storage is incurred.
This allows very, very fast turnaround; optimal resource usage since we have other boxes offline handle the pull-from-kafka and can tune that to reduce lag as needed. Kafka is HA with same data written to multiple boxes in the cluster so my manager doesn't complain about 'what happens if' scenarios.
We're quite happy with Kafka. http://kafka.apache.org