Django Soak Testing Bulk Request at once - django

I want to test whether Django can handle a heavy, sustained load of simultaneous requests. How can I set up this scenario? With tests?
Eg:
300 Read requests
300 Write requests
within 1 sec.
You can take any basic ORM query like
User.objects.get(id=some_id)
for 1 request and take this as 300 different read requests.
Same for write requests as well but change any field of your choice.
Then check whether that keeps working for at least 10 seconds.
You can think of this scenario as multiple concurrent users using the Django app.

I think I found the solution myself: use Locust. I'll still wait for someone to give a better answer before accepting my own.
There is also Gatling, a load-testing tool with a paid offering; it is not specific to Django.
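For a quick check before reaching for a dedicated tool, the scenario from the question can be approximated with the standard library alone. This is only a rough sketch under stated assumptions, not a replacement for Locust: `fetch`, `run_burst` and `soak` are hypothetical helper names, and the task passed in would normally be a request against a view that runs the ORM query.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(url, timeout=5.0):
    """Issue one GET request and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def run_burst(task, count, workers=50):
    """Call `task` `count` times concurrently.

    Returns (successes, failures, elapsed_seconds).
    """
    def safe_call(_):
        try:
            task()
            return True
        except Exception:
            # Any exception (timeout, HTTP error, refused connection) is a failure.
            return False

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(safe_call, range(count)))
    elapsed = time.perf_counter() - start
    ok = sum(results)
    return ok, count - ok, elapsed


def soak(task, per_second, duration_s, workers=50):
    """Fire one burst of `per_second` calls each second for `duration_s` seconds."""
    report = []
    for _ in range(duration_s):
        ok, failed, elapsed = run_burst(task, per_second, workers)
        report.append((ok, failed, elapsed))
        # Sleep off the rest of the second so bursts stay roughly one per second.
        time.sleep(max(0.0, 1.0 - elapsed))
    return report
```

Calling `soak(lambda: fetch(url), per_second=300, duration_s=10)` with `url` pointing at a read endpoint covers the read half of the scenario; a second task that issues a write request covers the other half. If a burst takes longer than one second, the server is not keeping up with the target rate.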

Related

Django REST framework Query Throughput on AWS

I'm trying to run a simple Django server on Amazon Lightsail.
The server is supposed to be a backend that serves game state to twitch viewers via an extension. I'm currently trying to have each viewer poll my backend every second.
To start with a very bare-bones read, I'm essentially storing the game state in memory using Django's cache (there's only one tracked 'player' for now) and serving it back to users via the Django REST framework.
Given that my requests and responses are quite small, I'd expect to easily serve at least a few hundred users, but I seem to hit the limit of sustainable CPU usage at around 50 users. Is this expected for this setup? Are there optimizations I can make or obvious bottlenecks?
Here's my backend that gets polled every second (after each git pull I set debug=False and update the database settings)
https://github.com/boardengineer/extension
Normal backend reads come in at about 800 bytes.
I tried returning empty responses to test if the response size should be optimized but it only made a small difference.
I thought of removing header content to further reduce the response size, but I don't think I found a way to do so correctly.
I also tried removing some of my middleware, hoping to make the server leaner, but I failed to find any middleware I could remove.

Django - How to store all the requests/responses with the least overhead?

I'm working on a Django middleware to store all requests/responses in my main database (Postgres / SQLite).
But it's not hard to guess that the overhead will be significant, so I'm thinking of using Redis to queue the requests for a while and then write them to my database in batches.
e.g. receiving 100 requests, storing them in database, waiting to receive another 100 requests and doing the same, or something like this.
The model is like this:
url
method
status
user
remote_ip
referer
user_agent
user_ip
metadata # any important piece of data related to request/response e.g. errors or ...
created_at
updated_at
My questions are: is this a good approach? How can we implement it? Do you have an example that does such a thing?
And the other question is: is there a better solution?
This doesn't suit the concrete question/answer format particularly well, unfortunately.
"Is this a good approach" is difficult to answer directly with a yes or no response. It will work and your proposed implementation looks sound, but you'll be implementing quite a bit of software and adding quite a bit of complexity to your project.
Whether this is desirable isn't easily answerable without context only you have.
Some things you'll want to answer:
What am I doing with these stored requests? Debugging? Providing an audit trail?
If it's for debugging, what does a database record get us that our web server's request logs do not?
If it's for an audit trail, is every individual HTTP request the best representation of that audit trail? Does an auditor care that someone asked for /favicon.ico? Does it convey the meaning and context they need?
Do we absolutely need every request stored? For how long? How do we handle going over our storage budget? How do we handle edge cases like the client hanging up before getting the response, or a crash after we've processed the request but before we've sent a response or logged the record?
Does logging a request in band with the request itself present a performance cost we actually can't afford?
Compare the strengths and weaknesses of your approach to some alternatives:
We can rely on the web server's logs, which we're already paying the cost for and are built to handle many of the oddball cases here.
We can write an HTTPLog model in band with the request using a simple middleware function, which solves some complexities like "what if redis is down but django and the database aren't?"
We can write an audit logging system by providing any context needed to an out-of-band process (perhaps through signals or redis+celery)
Above all: capture your actual requirements first, implement the simplest solution that works second, and optimize only after you actually see performance issues.
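To make the in-band option concrete, here is a minimal sketch of a Django-style middleware that records the fields from the question's model and hands them off in batches. It deliberately imports nothing from Django so it can be read on its own; the `sink` callable and `batch_size` parameter are assumptions made for illustration. In a real project the sink might call something like `HTTPLog.objects.bulk_create(...)`, and since Django constructs middleware with only `get_response`, the sink would have to be wired in another way, for example in a subclass.

```python
import threading


class RequestLogMiddleware:
    """Django-style middleware: built from get_response, called once per request."""

    def __init__(self, get_response, sink=None, batch_size=100):
        self.get_response = get_response
        # Hypothetical hook: receives a list of dicts once a batch is full.
        self.sink = sink or (lambda batch: None)
        self.batch_size = batch_size
        self._buffer = []
        self._lock = threading.Lock()  # the buffer is shared between worker threads

    def __call__(self, request):
        response = self.get_response(request)
        record = {
            "url": request.path,
            "method": request.method,
            "status": response.status_code,
            "remote_ip": request.META.get("REMOTE_ADDR"),
            "referer": request.META.get("HTTP_REFERER"),
            "user_agent": request.META.get("HTTP_USER_AGENT"),
        }
        batch = None
        with self._lock:
            self._buffer.append(record)
            if len(self._buffer) >= self.batch_size:
                # Swap the full buffer out while holding the lock.
                batch, self._buffer = self._buffer, []
        if batch:
            self.sink(batch)  # write outside the lock
        return response

    def flush(self):
        """Write whatever is buffered, e.g. on shutdown."""
        with self._lock:
            batch, self._buffer = self._buffer, []
        if batch:
            self.sink(batch)
```

Buffering in process memory means that records not yet flushed are lost if the worker crashes, which is one of the edge cases listed above; writing one row per request avoids that at the cost of one extra query per request.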
I would not put this functionality in my Django application. There are many tools that do this. One of them is NGINX, a reverse proxy server that you can put in front of Django. Then you can use the NGINX access log, and you can format those logs according to your needs. For this amount of data it is usually better not to store it in the database, because it will rarely be used. You can store the logs in an S3 bucket or just in plain files and use a log parser tool to parse them.

How to profile Django's bottlenecks for scaling?

I am using django and tastypie for REST API.
For profiling, I am using django-silk and below is a summary of requests:
How do I profile the complete flow? Time taken excluding database queries is (382 - 147) ms on average. How do I figure out the bottleneck and optimize/scale? I did use @silk_profile() on the get_object_list method for this resource, but even this method doesn't seem to be the bottleneck.
I used caching for decreasing response time, but that didn't help much, what are the other options?
When testing using loader.io, the peak the server can handle is 1000 requests per 30 secs (which seems very low). Other than caching (which I already tried) what might help?
Here's a bunch of suggestions:
bring the number of queries per request below 5 (34 per request is really bad)
install django-debug-toolbar and look at where the time is spent
use gunicorn or uwsgi behind a reverse proxy (NGINX)
You have too many queries. Even if each one is relatively fast, you still spend time on the round trip to the database. Likewise, if you have external cache storage (for example, Redis), connecting to it can take some time.
To investigate slow parts of the code you have two options:
Use a profiler. Note that profiling on a local PC may not tell you much if you have a distributed system deployed across several machines.
Add tracing points to your code that record a message and the current time (something like https://gist.github.com/dbf256/0f1d5d7d2c9aa70bce89). Deploy this patched code, test it with your load-testing tool, and check the logs.
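A tracing point of the kind described in the second option can be as small as a context manager that logs how long a block took. The sketch below is one possible shape, not the code from the linked gist; the `trace` name, the "trace" logger name and the optional `records` list are assumptions.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("trace")


@contextmanager
def trace(label, records=None):
    """Log how long the wrapped block took; optionally collect (label, ms) pairs."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the block raises, so slow failures are recorded too.
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.info("%s took %.2f ms", label, elapsed_ms)
        if records is not None:
            records.append((label, elapsed_ms))
```

Wrapping suspect sections, for example `with trace("serialize"): ...`, and comparing the logged timings under load shows which step accounts for the time that is not spent in database queries.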

Concurrency issues with django on appengine

Running a Django application on Appengine we need to make a query that returns approx. 450 rows per request including joins M2M prefetch_related and select_related.
When we make many concurrent requests, the query time for each request goes up in a way that all requests end simultaneously.
Running the same concurrent requests on a non-appengine Django installation, or on an appengine instance that has threading set to false, does not show this behavior.
There is also a slight improvement when the requests are separated to different appengine instances.
Has anyone seen this before?
Sounds like your database backend is too heavily loaded by your query. Have you tried upgrading to a higher tier?
The basic tier only handles 25 concurrent queries. You said "many" in your question, so if "many" > 25 that's the source of your problem:
https://developers.google.com/cloud-sql/pricing

Django Performance

I am using Django with Apache mod_wsgi. My site has dynamic data on every page, and all of the media (css, images, js) is stored in Amazon S3 buckets linked via "http://bucket.domain.com/images/*.jpg" inside the markup. My question is: can Varnish still help me speed up my web server?
I am trying to knock down all the stumbling blocks here. Is there something else I should be looking at? I have made a query profiler for my code, and each page renders in about 0.120 CPU seconds, which seems quick enough, but when I use ab -c 5 -n 100 http://mysite.com/ the result is only Requests per second: 12.70 [#/sec] (mean).
I realize that there are lots of variables at play, but I am looking for some guidance on things I can do and thought Varnish might be the answer.
UPDATE
here is a screenshot of my profiler
The only way you can improve your performance is if you measure what is slowing you down. Though it's not the best profiler in the world, Django has good integration with the hotshot profiler (described here) and you can figure out what is taking those 0.120 cpu seconds.
Are you using 2 CPUs? If that's the case, then perhaps the limitation is in the db when you use ab? I only say that because 0.120 * 12.70 is about 1.5, which means that roughly 0.5 seconds out of every second is spent waiting for something. This could also be IO or something else.
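The arithmetic behind that estimate can be written out as a short sketch. The figures 0.120, 12.70 and the concurrency of 5 come from the question; the CPU count of 2 is the answer's guess, and `ab_breakdown` is a hypothetical helper name. The latency line uses the fact that with a fixed number of concurrent clients, mean time per request equals concurrency divided by throughput.

```python
def ab_breakdown(cpu_s_per_request, requests_per_s, concurrency, cpus):
    """Split an ab result into CPU-bound and non-CPU time."""
    cpu_busy = cpu_s_per_request * requests_per_s  # CPU-seconds used per wall-clock second
    latency = concurrency / requests_per_s         # mean wall-clock seconds per request
    return {
        "cpu_busy_per_s": cpu_busy,
        "cpu_idle_per_s": cpus - cpu_busy,
        "mean_latency_s": latency,
        "non_cpu_s_per_request": latency - cpu_s_per_request,
    }


# cpus=2 is an assumption taken from the answer, not a measured value.
result = ab_breakdown(0.120, 12.70, concurrency=5, cpus=2)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

With these inputs the CPUs are busy for about 1.52 of the 2 available CPU-seconds per second, and each request takes about 0.39 s of wall-clock time, of which only 0.12 s is CPU. The remaining 0.27 s or so per request is what needs explaining (database, IO, or queueing).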
Adding another layer such as Varnish for no reason is generally not a good idea. The only case where something like Varnish would help is if you have slow clients with poor connections holding onto threads, but the ab test is not hitting this condition, and frankly it's not a large enough issue to warrant the extra layer.
Now, the next topic is caching, which varnish can help with. Are your pages customized for each user, or can it be static for long periods of time? Often times pages are static except for a simple login status screen -- in this case consider off loading that login status to javascript with cookies. If you are able to cache entire pages then they would be extremely fast in ab. However, the next problem is that ab is not really a good benchmark of your site, since users aren't going to just sit at one page and hit f5 repeatedly.
A few things to think about before you go installing varnish:
First off, have you enabled the page caching middleware within Django?
Are etags set up and working properly?
Is your database server performing optimally?
Have you considered setting up caching using memcached within your code for common results? (particularly front pages and static pages displayed to non-logged-in users)
Except for query heavy dynamic pages that absolutely must display substantially different data for each user, .12 seconds seems like a long time to serve a page. Look at how you can work caching into your app to improve that performance. If you have a page that is largely static other than a username or something similar, cache the computed part of the page.
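Caching the computed part of a page follows a simple pattern: look the value up, build it only on a miss, and store it with a timeout. The sketch below illustrates that pattern with a small in-process stand-in so it runs on its own; `TTLCache` and `cached_fragment` are hypothetical names. In a Django project the same `get`/`set` calls would go to `django.core.cache.cache`, backed by memcached or similar.

```python
import time


class TTLCache:
    """Tiny in-process stand-in for a cache backend with get/set and a timeout."""

    def __init__(self, clock=time.monotonic):
        self._store = {}
        self._clock = clock

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: drop it and report a miss
            return default
        return value

    def set(self, key, value, timeout):
        self._store[key] = (value, self._clock() + timeout)


def cached_fragment(cache, key, build, timeout=60):
    """Return the cached value for `key`, calling `build()` only on a miss."""
    value = cache.get(key)
    if value is None:
        value = build()
        cache.set(key, value, timeout)
    return value
```

The per-user part of the page (a username, a login status) stays outside the cached fragment, so one cached copy of the expensive part can be shared by every visitor.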
When Django's caching is working properly, ab on public pages should be lightning fast. If you aren't using any of the other features of Apache, consider using something lighter and faster like lighttpd or nginx.
Eric Florenzano has put together a pretty useful project called django-newcache that implements some advanced caching behavior. If you're running into the limitations of the built-in caching, consider it. http://github.com/ericflo/django-newcache