Tuning gunicorn (with Django): Optimize for more concurrent connections and faster connections - django

I use Django 1.5.3 with gunicorn 18.0 and lighttpd. I serve my static and dynamic content with lighttpd as follows:
$HTTP["host"] == "www.mydomain.com" {
$HTTP["url"] !~ "^/media/|^/static/|^/apple-touch-icon(.*)$|^/favicon(.*)$|^/robots\.txt$" {
proxy.balance = "hash"
proxy.server = ( "" => ("myserver" =>
( "host" => "127.0.0.1", "port" => 8013 )
))
}
$HTTP["url"] =~ "^/media|^/static|^/apple-touch-icon(.*)$|^/favicon(.*)$|^/robots\.txt$" {
alias.url = (
"/media/admin/" => "/var/www/virtualenvs/mydomain/lib/python2.7/site-packages/django/contrib/admin/static/admin/",
"/media" => "/var/www/mydomain/mydomain/media",
"/static" => "/var/www/mydomain/mydomain/static"
)
}
url.rewrite-once = (
"^/apple-touch-icon(.*)$" => "/media/img/apple-touch-icon$1",
"^/favicon(.*)$" => "/media/img/favicon$1",
"^/robots\.txt$" => "/media/robots.txt"
)
}
I have already tried running gunicorn (via supervisord) in many different ways, but I can't get it to handle more than about 1100 concurrent connections. My project needs about 10000-15000 connections:
command = /var/www/virtualenvs/myproject/bin/python /var/www/myproject/manage.py run_gunicorn -b 127.0.0.1:8013 -w 9 -k gevent --preload --settings=myproject.settings
command = /var/www/virtualenvs/myproject/bin/python /var/www/myproject/manage.py run_gunicorn -b 127.0.0.1:8013 -w 10 -k eventlet --worker_connections=1000 --settings=myproject.settings --max-requests=10000
command = /var/www/virtualenvs/myproject/bin/python /var/www/myproject/manage.py run_gunicorn -b 127.0.0.1:8013 -w 20 -k gevent --settings=myproject.settings --max-requests=1000
command = /var/www/virtualenvs/myproject/bin/python /var/www/myproject/manage.py run_gunicorn -b 127.0.0.1:8013 -w 40 --settings=myproject.settings
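For reference, an async (gevent/eventlet) setup's theoretical ceiling is workers × worker_connections, so the per-worker connection limit matters as much as the worker count. A minimal gunicorn config-file sketch along those lines; all the numbers are illustrative assumptions, not values taken from the question:

```python
# Hypothetical gunicorn_conf.py sketch for an async deployment.
import multiprocessing

bind = "127.0.0.1:8013"
worker_class = "gevent"        # async workers: one greenlet per connection
workers = multiprocessing.cpu_count() * 2 + 1
worker_connections = 2000      # concurrent connections handled *per worker*
backlog = 4096                 # kernel listen queue for bursts of new clients
max_requests = 10000           # recycle workers to bound memory growth

# Theoretical concurrency ceiling with this config:
max_concurrent = workers * worker_connections
```

With 2000 connections per worker, even a handful of gevent workers comfortably exceeds the 10000-15000 target on paper; whether the application code is actually non-blocking under gevent is a separate question.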
About 10 other projects live on the same server, but CPU and RAM usage are fine, so that shouldn't be a problem, right?
I ran a load test. At about 1100 connections, which is where the load test shows connections dropping, my lighttpd error log contains entries like:
2013-10-31 14:06:51: (mod_proxy.c.853) write failed: Connection timed out 110
2013-10-31 14:06:51: (mod_proxy.c.939) proxy-server disabled: 127.0.0.1 8013 83
2013-10-31 14:06:51: (mod_proxy.c.1316) no proxy-handler found for: /
... after about one minute
2013-10-31 14:07:02: (mod_proxy.c.1361) proxy - re-enabled: 127.0.0.1 8013
These entries also appear every now and then:
2013-10-31 14:06:55: (network_linux_sendfile.c.94) writev failed: Connection timed out 600
2013-10-31 14:06:55: (mod_proxy.c.853) write failed: Connection timed out 110
...
2013-10-31 14:06:57: (mod_proxy.c.828) establishing connection failed: Connection timed out
2013-10-31 14:06:57: (mod_proxy.c.939) proxy-server disabled: 127.0.0.1 8013 45
So how can I tune gunicorn/lighttpd to serve more connections faster? What can I optimize? Do you know any other/better setup?
Thanks a lot in advance for your help!
Update: Some more server info
root@django ~ # top
top - 15:28:38 up 100 days, 9:56, 1 user, load average: 0.11, 0.37, 0.76
Tasks: 352 total, 1 running, 351 sleeping, 0 stopped, 0 zombie
Cpu(s): 33.0%us, 1.6%sy, 0.0%ni, 64.2%id, 0.4%wa, 0.0%hi, 0.7%si, 0.0%st
Mem: 32926156k total, 17815984k used, 15110172k free, 342096k buffers
Swap: 23067560k total, 0k used, 23067560k free, 4868036k cached
root@django ~ # iostat
Linux 2.6.32-5-amd64 (django.myserver.com) 10/31/2013 _x86_64_ (4 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
33.00 0.00 2.36 0.40 0.00 64.24
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sda 137.76 980.27 2109.21 119567783 257268738
sdb 24.23 983.53 2112.25 119965731 257639874
sdc 24.25 985.79 2110.14 120241256 257382998
md0 0.00 0.00 0.00 400 0
md1 0.00 0.00 0.00 284 6
md2 1051.93 38.93 4203.96 4748629 512773952
root@django ~ # netstat -an |grep :80 |wc -l
7129
Kernel Settings:
echo "10152 65535" > /proc/sys/net/ipv4/ip_local_port_range
sysctl -w fs.file-max=128000
sysctl -w net.ipv4.tcp_keepalive_time=300
sysctl -w net.core.somaxconn=250000
sysctl -w net.ipv4.tcp_max_syn_backlog=2500
sysctl -w net.core.netdev_max_backlog=2500
ulimit -n 10240
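One thing worth checking alongside these sysctls: ulimit -n is a per-process limit, and it has to be raised in the environment that actually launches gunicorn (e.g. the supervisord config), not just in an interactive shell. A quick sanity check that could be run from inside a worker, assuming an illustrative per-worker target of 2000 connections:

```python
# Sketch: every proxied connection consumes at least one file descriptor in
# the gunicorn worker, so the per-process fd limit must exceed the per-worker
# connection target. The target below is an illustrative assumption.
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
target_per_worker = 2000
print(f"RLIMIT_NOFILE soft={soft} hard={hard}")
if soft < target_per_worker:
    print("per-process fd limit is below the connection target")
```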

Related

Django REST Framework slow on Gunicorn

I deployed a new application made on Django REST Framework running with Gunicorn.
The application is deployed on 4 different servers and they are listening on a port (8001) which is consumed by an HAProxy load balancer.
Unfortunately I am suffering many performance problems: many requests take seconds to be served, sometimes 30 or even 60 seconds.
Sometimes even the basic entry endpoint (like https://my.api/api/v2, which basically returns the list of available endpoints) takes 10-20 seconds to respond.
I don't think the problem is the load balancer, because I see the same latencies when connecting directly to any backend server with my client, bypassing the load balancer.
And I don't think the problem is the database, because the call to https://my.api/api/v2 as a guest (not logged in as any user) shouldn't make any database query.
This is a performance test made with hey (https://github.com/rakyll/hey) on the basic endpoint without authorization:
me#staging2:~$ hey -n 10000 -c 100 https://my.api/api/v2/
Summary:
Total: 38.9165 secs
Slowest: 18.6772 secs
Fastest: 0.0041 secs
Average: 0.3099 secs
Requests/sec: 256.9604
Total data: 20870000 bytes
Size/request: 2087 bytes
Response time histogram:
0.004 [1] |
1.871 [9723] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
3.739 [0] |
5.606 [175] |■
7.473 [0] |
9.341 [35] |
11.208 [29] |
13.075 [0] |
14.943 [0] |
16.810 [0] |
18.677 [37] |
Latency distribution:
10% in 0.0054 secs
25% in 0.0122 secs
50% in 0.0322 secs
75% in 0.2378 secs
90% in 0.3417 secs
95% in 0.3885 secs
99% in 8.5935 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0021 secs, 0.0041 secs, 18.6772 secs
DNS-lookup: 0.0001 secs, 0.0000 secs, 0.0123 secs
req write: 0.0001 secs, 0.0000 secs, 0.0153 secs
resp wait: 0.3075 secs, 0.0039 secs, 18.6770 secs
resp read: 0.0001 secs, 0.0000 secs, 0.0150 secs
Status code distribution:
[200] 10000 responses
This is my Gunicorn configuration:
bind = '0.0.0.0:8001'
backlog = 2048
workers = 1
worker_class = 'sync'
worker_connections = 1000
timeout = 120
keepalive = 5
spew = False
daemon = False
pidfile = None
umask = 0
user = None
group = None
tmp_upload_dir = None
errorlog = '-'
loglevel = 'debug'
accesslog = '-'
access_log_format = '%(h)s %(l)s %(u)s %(t)s "%(r)s" %(s)s %(b)s "%(f)s" "%(a)s"'
And Gunicorn is currently running with the following command:
/path/to/application/bin/python3 /path/to/application/bin/gunicorn --env DJANGO_SETTINGS_MODULE=settings.production -c /path/to/application/settings/gunicorn_conf.py --user=deployer --log-file=/path/to/application/django-application-gunicorn.log --chdir /path/to/application/django-application --daemon wsgi:application
What tests can I do to find out what is causing my performance problems?
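One concrete thing to test first: with worker_class = 'sync' and workers = 1, gunicorn serves exactly one request at a time per server, so under hey's 100 concurrent clients most requests sit in the backlog queue, which would match the long tail in the latency distribution above. A sketch of the starting point recommended in the gunicorn docs, the (2 × cores) + 1 rule; whether it resolves this particular case is an assumption to verify:

```python
# Sketch: gunicorn's documented rule of thumb for sync worker count.
import multiprocessing

workers = multiprocessing.cpu_count() * 2 + 1
worker_class = 'sync'
```

Re-running the same hey benchmark after raising the worker count would show whether the tail latencies are queueing or genuinely slow views.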

NOAUTH Authentication required redis

Start redis-server:
redis-server /usr/local/etc/redis.conf
Redis config file (/usr/local/etc/redis.conf):
...
requirepass 'foobared'
...
Rails - application.yml:
...
development:
redis_password: 'foobared
...
The Error:
Redis::CommandError - NOAUTH Authentication required.:
...
app/models/user.rb:54:in `last_accessed_at'
...
app/models/user.rb:54:
Line 53 - def last_accessed_at
Line 54 - Rails.cache.read(session_key)
Line 55 - end
and session_key is just an attribute of the User model.
BTW:
± ps -ef | grep redis
501 62491 57789 0 1:45PM ttys001 0:00.37 redis-server 127.0.0.1:6379
501 62572 59388 0 1:54PM ttys002 0:00.00 grep --color=auto --exclude-dir=.bzr --exclude-dir=CVS --exclude-dir=.git --exclude-dir=.hg --exclude-dir=.svn redis
I got the same error:
I just commented out the requirepass line.
# requirepass 'foobared'
It worked in my case.
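A side note: the application.yml line as quoted above is missing its closing quote. Assuming that is not just a transcription slip, keeping requirepass enabled and fixing the quote would be the alternative to disabling authentication:

```yaml
development:
  redis_password: 'foobared'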

supervisor, gunicorn and django only work when logged in

I'm using this guide to set up an intranet server. Everything goes OK: the server works and I can verify that it is working on my network.
But when I log out, I get a 404 error.
The sock file is in the path indicated in gunicorn_start.
(cmi2014)javier@sgc:~/workspace/cmi/cmi$ ls -l run/
total 0
srwxrwxrwx 1 javier javier 0 mar 10 17:31 cmi.sock
Actually I can see the workers when listing the processes.
(cmi2014)javier@sgc:~/workspace/cmi/cmi$ ps aux | grep cmi
javier 17354 0.0 0.2 14652 8124 ? S 17:27 0:00 gunicorn: master [cmi]
javier 17365 0.0 0.3 18112 10236 ? S 17:27 0:00 gunicorn: worker [cmi]
javier 17366 0.0 0.3 18120 10240 ? S 17:27 0:00 gunicorn: worker [cmi]
javier 17367 0.0 0.5 36592 17496 ? S 17:27 0:00 gunicorn: worker [cmi]
javier 17787 0.0 0.0 4408 828 pts/0 S+ 17:55 0:00 grep --color=auto cmi
And supervisorctl responds that the process is running:
(cmi2014)javier@sgc:~/workspace/cmi/cmi$ sudo supervisorctl status cmi
[sudo] password for javier:
cmi RUNNING pid 17354, uptime 0:29:21
There is an error in the nginx logs:
(cmi2014)javier@sgc:~/workspace/cmi/cmi$ tail logs/nginx-error.log
2014/03/10 17:38:57 [error] 17299#0: *19 connect() to
unix:/home/javier/workspace/cmi/cmi/run/cmi.sock failed (111: Connection refused) while
connecting to upstream, client: 10.69.0.174, server: , request: "GET / HTTP/1.1",
upstream: "http://unix:/home/javier/workspace/cmi/cmi/run/cmi.sock:/", host:
"10.69.0.68:2014"
Again, the error appears only when I log out or close the session; everything works fine when I run or reload supervisor and stay logged in.
By the way, nginx, supervisor and gunicorn all run under my uid.
Thanks in advance.
Edit Supervisor conf
[program:cmi]
command = /home/javier/entornos/cmi2014/bin/cmi_start
user = javier
stdout_logfile = /home/javier/workspace/cmi/cmi/logs/cmi_supervisor.log
redirect_stderr = true
autostart=true
autorestart=true
Gunicorn start script
#!/bin/bash
NAME="cmi" # Name of the application
DJANGODIR=/home/javier/workspace/cmi/cmi # Django project directory
SOCKFILE=/home/javier/workspace/cmi/cmi/run/cmi.sock # we will communicate using this unix socket
USER=javier # the user to run as
GROUP=javier # the group to run as
NUM_WORKERS=3 # how many worker processes should Gunicorn spawn
DJANGO_SETTINGS_MODULE=cmi.settings # which settings file should Django use
DJANGO_WSGI_MODULE=cmi.wsgi # WSGI module name
echo "Starting $NAME as `whoami`"
# Activate the virtual environment
cd $DJANGODIR
source /home/javier/entornos/cmi2014/bin/activate
export DJANGO_SETTINGS_MODULE=$DJANGO_SETTINGS_MODULE
export PYTHONPATH=$DJANGODIR:$PYTHONPATH
export CMI_SECRET_KEY='***'
export CMI_DATABASE_HOST='***'
export CMI_DATABASE_NAME='***'
export CMI_DATABASE_USER='***'
export CMI_DATABASE_PASS='***'
export CMI_DATABASE_PORT='3306'
# Create the run directory if it doesn't exist
RUNDIR=$(dirname $SOCKFILE)
test -d $RUNDIR || mkdir -p $RUNDIR
# Start your Django Unicorn
# Programs meant to be run under supervisor should not daemonize themselves (do not use --daemon)
exec /home/javier/entornos/cmi2014/bin/gunicorn ${DJANGO_WSGI_MODULE}:application --name $NAME --workers $NUM_WORKERS --user=$USER --group=$GROUP --log-level=debug --bind=unix:$SOCKFILE

WSGI using more daemon processes than it should?

So I set up a WSGI server running Python/Django code, and put the following in my httpd.conf file:
WSGIDaemonProcess mysite.com processes=2 threads=15 user=django group=django
However, when I go to the page and hit "refresh" very quickly, I seem to be getting far more than two processes:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
21042 django 20 0 975m 36m 4440 S 98.9 6.2 0:15.63 httpd
1017 root 20 0 67688 2352 740 S 0.3 0.4 0:10.50 sendmail
21041 django 20 0 974m 40m 4412 S 0.3 6.7 0:16.36 httpd
21255 django 20 0 267m 8536 2036 S 0.3 1.4 0:01.02 httpd
21256 django 20 0 267m 8536 2036 S 0.3 1.4 0:00.01 httpd
I thought setting processes=2 would limit it to two processes. Is there something I'm missing?
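One way to check: the extra rows may be ordinary Apache child processes rather than mod_wsgi daemons, since both appear as httpd in top. mod_wsgi's display-name option renames the daemon processes so they can be told apart in ps/top output (the %{GROUP} token expands to the daemon process group name):

```apache
WSGIDaemonProcess mysite.com processes=2 threads=15 user=django group=django display-name=%{GROUP}
```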

Django + lighttpd + fcgi performance

I am using Django to handle fairly long HTTP POST requests, and I am wondering whether my setup has limitations when I receive many requests at the same time.
lighttpd.conf fcgi:
fastcgi.server = (
"a.fcgi" => (
"main" => (
# Use host / port instead of socket for TCP fastcgi
"host" => "127.0.0.1",
"port" => 3033,
"check-local" => "disable",
"allow-x-send-file" => "enable"
))
)
Django init.d script start section:
start-stop-daemon --start --quiet \
--pidfile /var/www/tmp/a.pid \
--chuid www-data --exec /usr/bin/env -- python \
/var/www/a/manage.py runfcgi \
host=127.0.0.1 port=3033 pidfile=/var/www/tmp/a.pid
Starting Django using the script above results in a multi-threaded Django server:
www-data 342 7873 0 04:58 ? 00:01:04 python /var/www/a/manage.py runfcgi host=127.0.0.1 port=3033 pidfile=/var/www/tmp/a.pid
www-data 343 7873 0 04:58 ? 00:01:15 python /var/www/a/manage.py runfcgi host=127.0.0.1 port=3033 pidfile=/var/www/tmp/a.pid
www-data 378 7873 0 Feb14 ? 00:04:45 python /var/www/a/manage.py runfcgi host=127.0.0.1 port=3033 pidfile=/var/www/tmp/a.pid
www-data 382 7873 0 Feb12 ? 00:14:53 python /var/www/a/manage.py runfcgi host=127.0.0.1 port=3033 pidfile=/var/www/tmp/a.pid
www-data 386 7873 0 Feb12 ? 00:12:49 python /var/www/a/manage.py runfcgi host=127.0.0.1 port=3033 pidfile=/var/www/tmp/a.pid
www-data 7873 1 0 Feb12 ? 00:00:24 python /var/www/a/manage.py runfcgi host=127.0.0.1 port=3033 pidfile=/var/www/tmp/a.pid
In the lighttpd error.log I do see load = 10, which shows I am getting many requests at the same time; this happens a few times a day:
2010-02-16 05:17:17: (mod_fastcgi.c.2979) got proc: pid: 0 socket: tcp:127.0.0.1:3033 load: 10
Is my setup correct for handling many long HTTP POST requests (each can last a few minutes) at the same time?
I think you may want to configure your FastCGI worker to run multi-process or multi-threaded.
From manage.py runfcgi help:
method=IMPL prefork or threaded (default prefork)
[...]
maxspare=NUMBER max number of spare processes / threads
minspare=NUMBER min number of spare processes / threads.
maxchildren=NUMBER hard limit number of processes / threads
So your start command would be:
start-stop-daemon --start --quiet \
--pidfile /var/www/tmp/a.pid \
--chuid www-data --exec /usr/bin/env -- python \
/var/www/a/manage.py runfcgi \
host=127.0.0.1 port=3033 pidfile=/var/www/tmp/a.pid \
method=prefork maxspare=4 minspare=4 maxchildren=8
You will want to adjust the number of processes as needed. Note that memory usage increases linearly with the number of FCGI processes. Also, if your processes are CPU-bound, having more processes than available CPU cores won't help concurrency much.
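The memory and CPU constraints in the last paragraph can be turned into a rough sizing calculation; all the numbers here are illustrative assumptions, not measurements:

```python
# Rough sizing sketch for prefork FCGI children: cap by memory budget and by
# CPU cores. per_process_mb and budget_mb are assumed, not measured, values.
import multiprocessing

cores = multiprocessing.cpu_count()
per_process_mb = 80   # assumed resident size of one Django FCGI process
budget_mb = 1024      # assumed memory budget for the application
maxchildren = max(1, min(budget_mb // per_process_mb, cores * 2))
print(f"suggested maxchildren = {maxchildren}")
```

For long-running, mostly I/O-bound POST requests, the memory budget is usually the binding constraint rather than the core count.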