Optimizing nginx requests per second - django

I'm trying to optimize a single-core, 1 GB RAM DigitalOcean VPS to handle more requests per second. After some tweaking (workers, gzip, etc.) it serves about 15 requests per second. I don't have anything to compare it with, but I think this number can be higher.
The stack works like this:
VPS -> Docker container -> nginx (ssl) -> Varnish -> nginx -> uwsgi (Django)
I'm aware of the fact that this is a long chain and that Docker might cause some overhead. However, almost all requests can be handled by Varnish.
These are my tests results:
ab -kc 100 -n 1000 https://mydomain | grep 'Requests per second'
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests
Requests per second: 18.87 [#/sec] (mean)
I actually have 3 questions:
Am I correct that 18.87 requests per second is low?
For a simple Varnished Django blog app, what would be an adequate value (as an indication)?
I already applied the recommended tweaks (tuned for my system) from this tutorial. What more can I tweak, and how do I figure out where the bottlenecks are? One approach is sketched below.
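To locate the bottleneck, one approach is to benchmark each hop of the chain directly, bypassing the layers in front of it. A sketch (the internal ports are assumptions about your setup, not known values):

# Full chain, as above:
ab -kc 100 -n 1000 https://mydomain/
# Varnish directly (assuming its default listen port 6081 is reachable):
ab -kc 100 -n 1000 http://localhost:6081/
# The inner nginx/uwsgi directly (assuming it listens on 8080):
ab -kc 100 -n 1000 http://localhost:8080/

The first hop whose rate drops sharply is where the tuning effort should go.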

First, a note about Docker. It is not meant to run multiple processes in a single container; Docker is not a replacement for a VM. It simply lets you run processes in isolation. So the diagram should be:
VPS -> docker nginx container -> docker varnish container -> docker django container
To make your life with multiple Docker containers simpler, I would recommend using Docker Compose. It is not perfect, but it's an excellent start.
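As a minimal sketch of what that could look like with Docker Compose (image names, build context, and ports are assumptions, not a drop-in config):

# docker-compose.yml -- one container per process
version: "2"
services:
  nginx:                  # TLS termination
    image: nginx
    ports:
      - "443:443"
    depends_on:
      - varnish
  varnish:                # cache layer (hypothetical image name)
    image: varnish
    depends_on:
      - django
  django:                 # uwsgi + Django app
    build: .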
There is an old but still fundamentally relevant blog post about that. Note that some of its suggestions are no longer relevant (e.g. nsenter, since the docker exec command is now available), but most of the post still holds.
As for your performance issues: yes, 18 requests per second is pretty low. However, the issue probably has nothing to do with nginx; it is most likely in your Django application, and possibly in Varnish (though that is very unlikely).
To debug performance issues in Django, I would recommend django-debug-toolbar. Most issues in Django are caused by unnecessary SQL queries, which you can see easily in the debug toolbar, and most of them can be solved with select_related() and prefetch_related(). For more detailed analysis, I would also recommend profiling your application; cProfile is a great start. Some IDEs like PyCharm include built-in profilers, so it's pretty easy to profile your application, see which functions take most of the time, and optimize those. Finally, you can use third-party tools to profile your application: even a free New Relic account will give you quite a bit of information, or alternatively Opbeat, which is the new cool kid on the block.
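As a minimal sketch of the select_related/prefetch_related difference (the Post/Author/Tag models here are hypothetical):

# Hypothetical models: Post has a ForeignKey to Author and a ManyToMany to Tag.

# N+1 queries: each post.author access hits the database again.
for post in Post.objects.all():
    print(post.author.name)

# One or two queries: the FK is JOINed in, the M2M is fetched in bulk.
for post in Post.objects.select_related("author").prefetch_related("tags"):
    print(post.author.name, [t.name for t in post.tags.all()])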

Related

Huge time latency between nginx and upstream django backend

We have a setup with the nginx ingress controller as a reverse proxy for a Django-based backend app in production (a GKE k8s cluster). We use OpenTelemetry to trace this entire stack (SigNoz being the actual tool). One of our most critical APIs is validate-cart.
We have observed that this API sometimes takes a long time, 10-20 seconds and sometimes more. But if we look at the trace of one such request in SigNoz, the actual backend takes very little time, around 100 ms, while the total trace starting from nginx shows 29+ seconds.
Looking at the p99 latency, the nginx service shows much bigger spikes than the order service. The graph covers the same validate-cart API.
I have been banging my head against this for quite some time and I am still stuck.
My assumption is that requests are being queued at either the nginx or the Django layer. But I trust the otel libraries used to trace Django to start the trace the moment a request hits the Django layer, and since there isn't a big latency at the Django layer, the issue is probably at the nginx layer. I have traced nginx using the third-party opentracing module (since ingress-nginx doesn't yet support OpenTelemetry), which provides more tracing information about the nginx layer:
Nginx opentracing
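One way to see on the nginx side whether time is spent queueing in nginx or waiting on Django is to log both of nginx's built-in timers; a sketch using standard nginx variables (the format name and log path are arbitrary; with ingress-nginx this kind of format is set through its ConfigMap):

log_format timing '$remote_addr [$time_local] "$request" '
                  'req=$request_time upstream=$upstream_response_time';
access_log /var/log/nginx/timing.log timing;

If $request_time is large while $upstream_response_time stays small, the extra seconds are spent inside nginx or on the client side (queueing, slow clients, buffering) rather than in Django, which would match the traces described above.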

504 gateway timeout for any requests to Nginx with lot of free resources

We maintain a project internally which has both web and mobile application platforms. The backend is developed in Django 1.9 (Python 3.4) and deployed on AWS.
The server stack consists of Nginx, Gunicorn, Django and PostgreSQL. We use a Redis-based cache server to serve resource-intensive heavy queries. Our AWS resources include:
t1.medium EC2 (2 core, 4 GB RAM)
PostgreSQL RDS with one additional read-replica.
Right now Gunicorn is set to create 5 workers, following the 2*n+1 rule (sketched below). Load-wise, there are 20-30 mobile users making requests every minute and 5-10 users checking the web panel every hour. So I would say: not very much load.
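For reference, a sketch of that rule as a Gunicorn config file (the timeout value is illustrative, not from the setup above):

# gunicorn.conf.py
workers = 2 * 2 + 1   # 2 * cores + 1 = 5 sync workers on a 2-core box
timeout = 30          # seconds before a silent worker is killed and respawned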
Now this setup works fine about 80% of the time. But when something goes wrong (for example, we detect a bug in the live system and have to take the server down for maintenance for a few hours; meanwhile, the mobile apps build up a queue of requests, so when we bring the backend back up, a lot of users hit the system at the same time), the server stops behaving normally and starts responding with 504 gateway timeout errors.
Surprisingly, every time this has happened, we found the server resources (CPU, memory) to be 70-80% free and the connection pools in the databases mostly free.
Any idea where the problem is? How do we debug this? If you have already faced a similar issue, please share the fix.
Thank you,
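A 504 with mostly idle resources usually means requests are queueing in front of the workers: 5 sync workers can serve only 5 requests at once, and nginx gives up waiting on the rest. A hedged sketch of the two timeouts involved (values are illustrative defaults, not a recommendation):

# nginx: how long to wait for Gunicorn before answering 504
proxy_read_timeout 60s;

# gunicorn: how long one worker may spend on a single request
# (gunicorn --timeout 30, or timeout = 30 in the config file)

If the backlog drains more slowly than nginx's timeout, every queued request turns into a 504 even though CPU and memory look free.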

Recommended settings for uwsgi

I have a MySQL + Django + uWSGI + nginx application, and I recently had some issues with uWSGI's default configuration, so I want to reconfigure it, but I have no idea what the recommended values are.
Another problem is that I couldn't find the default settings that uWSGI uses, which makes debugging really hard.
Using the default configuration, the site was too slow under real traffic (too many requests stuck waiting for the uwsgi socket). So I took a configuration from some tutorial; it had cpu-affinity=1 and processes=4, which fixed that issue. The configuration also had limit-as=512, and now the app gets MemoryErrors, so I guess 512 MB is not enough. That configuration is sketched below.
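For reference, the tutorial configuration described above as an ini sketch (the comments mark which value did what in this situation):

[uwsgi]
processes    = 4    # fixed the requests stuck waiting on the socket
cpu-affinity = 1    # pin one core per worker
limit-as     = 512  # address-space cap in MB -- the source of the MemoryErrors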
My questions are:
How can I tell what the recommended settings are? I don't need them to be perfect, just to handle the traffic in a decent way and not crash from memory errors etc. Specifically, the recommended value for limit-as is what I need most right now.
What are the default values of uwsgi's settings?
Thanks!
We usually run quite small applications... rarely more than 2000 requests per minute. But anyway, it is hard to compare different applications. This is what we use in production:
Recommendations from the documentation:
harakiri = 20 # respawn processes taking more than 20 seconds
limit-as = 256 # limit the project to 256 MB
max-requests = 5000 # respawn processes after serving 5000 requests
daemonize = /var/log/uwsgi/yourproject.log # background the process & log
uwsgi_conf.yml
processes: 4
threads: 4
# This part might be important too, that way you limit the log file to 200 MB and
# rotate it once
log-maxsize : 200000000
log-backupname : /var/log/uwsgi/yourproject_backup.log
We use the following project for the deployment and configuration of our Django apps (no documentation here, sorry... it is just used internally):
https://github.com/iterativ/deployit/blob/ubuntu1604/deployit/fabrichelper/fabric_templates/uwsgi.yaml
How can you tell if you configured it correctly? Since that depends a lot on your application, I would recommend using a monitoring tool such as newrelic.com and analysing the results.

Strange apache lag in requests

I have an Apache2 and Django (mod_wsgi) setup that provides a RESTful API. I have a set of automated tests for it that executes ~1000 API requests (plain HTTP GET/POST/PUT/DELETE) in sequential order.
The problem is that for every 80 requests or so, I get a strange lag/timeout of exactly 5 s or 10 s. See these timestamp examples:
Request 1: 2013-08-30T03:49:20.915
Response 1: 2013-08-30T03:49:30.940
Request 2: 2013-08-30T03:50:32.559
Response 2: 2013-08-30T03:50:37.597
I can't figure out why this happens. I have an Apache config with KeepAlive Off (the recommended setting for Django) but otherwise a standard install for Ubuntu 12.04 LTS.
I'm running the tests from the same server the webserver is on. At first I thought this was some kind of DNS cache thing, but I've added the hostname I'm requesting to /etc/hosts and the problem persists.
The system is idle and has lots of CPU and memory free when these lags/timeouts happen.
The lag is not specific to a certain request (URL); it seems fairly random.
Considering that it's always exactly 5 s or 10 s to the millisecond, it feels like some specific setting somewhere is causing this.
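For what it's worth, lags of exactly 5 s or 10 s are the classic signature of DNS resolver timeouts: glibc's default per-query timeout is 5 seconds, so one retry yields 10. A quick way to rule DNS in or out (the hostname is a placeholder):

# /etc/hosts -- pin the test hostname locally, bypassing DNS
127.0.0.1   api.example.com

# then check that resolution is instant
time getent hosts api.example.com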
In case it provides some insight, watch my talk from PyCon US.
http://lanyrd.com/2013/pycon/scdyzk/
The talk deals with things like process churn and startup costs. One thing you shouldn't do is set a maximum-requests limit unless you really need it.
Also consider trying New Relic to help diagnose where the issue is. That will save a lot of guessing about whether it is a web application issue or a backend service infrastructure issue.
As far as seeing how such monitoring can help, watch another one of my PyCon talks.
http://lanyrd.com/2012/pycon/spcdg/
This turned out to be a DNS issue: adding the domain name I used locally to /etc/hosts actually solved the problem. I just hadn't rebooted the server for the changes to take effect; I thought restarting networking would take care of that, but apparently not.
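For completeness: /etc/hosts changes normally take effect immediately, and a reboot only matters when something caches lookups. If a caching daemon such as nscd had been running (an assumption; it is not installed by default on Ubuntu), flushing its hosts table would have had the same effect:

sudo nscd -i hosts    # invalidate nscd's cached hosts lookups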

How to improve the number of requests Django on Heroku can handle at a time?

In my project I use Django, deployed on Heroku. On Heroku I use the uWSGI server (in asynchronous mode); the database is MySQL (on AWS RDS). I use 7 dynos to scale the Django app.
When I run a stress test with 600 requests/second and a 30-second timeout, my server returns timeouts for more than 50% of requests.
Any ideas that could help me improve my server's performance?
If your async setup is right (and this is the hardest part), then your only solution is adding more dynos. If you are not sure about Django + async (or if you have not done any particular customization to make them work together), you probably have a screwed-up setup (no concurrency at all).
Take into account that uWSGI async mode can mean dozens of different setups (gevent, uGreen, callbacks, greenlets...), so some detail on your configuration would help.
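For example, a gevent-based uWSGI setup needs both the loop engine and monkey-patching switched on; a minimal sketch (the socket and counts are illustrative):

[uwsgi]
http-socket = :8000
processes = 4
gevent = 100                 # 100 async cores (greenlets) per worker
gevent-monkey-patch = true   # patch blocking Python calls so I/O yields

Without the engine (or an equivalent), uWSGI's async mode gives you exactly the no-concurrency setup described above.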