Is there any tool, plugin, or technique that I can use to help identify N+1 queries in a Django application? There is a gem called Bullet for Rails that identifies N+1 queries and logs or pops up warnings in a number of ways, but I haven't been able to find anything similar for Django. I'd be open to guidance on how to write my own plugin if no one knows of an existing solution.
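For context, here's what the problem looks like in ORM terms, as a minimal illustration with hypothetical Book/Author models: the first loop issues one query for the books plus one more per book for its author, and select_related collapses that into a single JOIN.

    from django.db import models

    class Author(models.Model):
        name = models.CharField(max_length=100)

    class Book(models.Model):
        author = models.ForeignKey(Author, on_delete=models.CASCADE)

    # N+1: one query for the books, then one more per book
    for book in Book.objects.all():
        print(book.author.name)

    # the usual fix: a single JOIN (use prefetch_related for
    # many-to-many and reverse foreign-key relations instead)
    for book in Book.objects.select_related("author"):
        print(book.author.name)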
nplusone does this: "Auto-detecting the n+1 queries problem in Python".
https://github.com/jmcarp/nplusone
It comes with proper Django support, and it also integrates with Flask, vanilla WSGI, and more.
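Configuration is a few lines in settings.py; roughly, following the project's README:

    # settings.py
    import logging

    INSTALLED_APPS = (
        # ...
        'nplusone.ext.django',
    )

    MIDDLEWARE = (
        'nplusone.ext.django.NPlusOneMiddleware',
        # ...
    )

    # log potential N+1 queries instead of raising an error
    NPLUSONE_LOGGER = logging.getLogger('nplusone')
    NPLUSONE_LOG_LEVEL = logging.WARN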
I don't know of any plugin that would find them automatically and warn you.
I personally use the Django Debug Toolbar:
https://github.com/django-debug-toolbar/django-debug-toolbar
It shows the number of queries run on a page, and you can inspect each of them.
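Setup is a couple of settings plus one URL entry; roughly, per the toolbar's installation docs:

    # settings.py
    INSTALLED_APPS = [
        # ...
        "debug_toolbar",
    ]

    MIDDLEWARE = [
        "debug_toolbar.middleware.DebugToolbarMiddleware",
        # ...
    ]

    INTERNAL_IPS = ["127.0.0.1"]  # the toolbar only renders for these IPs

    # urls.py
    from django.urls import include, path

    urlpatterns = [
        path("__debug__/", include("debug_toolbar.urls")),
        # ... your own patterns
    ]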
Scout, an APM product that supports Django apps, identifies expensive N+1 queries in production.
Here's how to use it:
1. Install the scout-apm Python package (MIT license) and provide your Scout API key, which you can find in the Scout web UI.
2. Deploy your app and confirm Scout is receiving data, then check back in an hour or so. Scout analyzes every web request, checking for N+1s, and displays the worst performers on a dashboard (screenshot).
3. Select an N+1 you're interested in to reveal a trace of the request that triggered it, including the raw SQL of the query and a backtrace to the line of code that triggers it (screenshot).
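For the first step, the settings look roughly like this (setting names per the scout_apm docs; the values are placeholders):

    # settings.py
    INSTALLED_APPS = [
        "scout_apm.django",  # enables the Django integration
        # ...
    ]

    SCOUT_MONITOR = True
    SCOUT_KEY = "[AVAILABLE IN THE SCOUT UI]"
    SCOUT_NAME = "My Django App"  # a friendly name for the dashboard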
An advantage to Scout over a development tool like Bullet: most development databases have a small amount of data, so the true impact of an N+1 is frequently unknown. Scout identifies just those N+1s that are consuming significant time, which can help focus your efforts.
I'm new to Django. I have an application where each of my views runs an average of 6 queries. Is that OK, or should I optimise my database for better performance?
Instagram mainly uses Django as their backend framework, and as we know they have a huge number of users. Their solution to this kind of problem is to run multiple data centers and process requests as a distributed system.
In your case, if it is running as a private project and the queries are not hitting a huge database, I think it is okay.
If you are trying to diagnose slow queries in your MySQL backend and are using a Django frontend, how do you tie together the slow queries reported by the backend with specific querysets in the Django frontend code?
I think you have no alternative besides logging every Django query for the suspicious querysets.
See this answer on how to access the actual query for a given queryset.
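In short: every queryset exposes its SQL through its query attribute, and Django can log everything the backend executes via the django.db.backends logger (only when DEBUG is True). A minimal sketch, with a hypothetical Article model:

    # inspect the SQL for a suspicious queryset
    qs = Article.objects.filter(published=True)  # hypothetical model
    print(qs.query)

    # settings.py: log every executed query to the console
    LOGGING = {
        "version": 1,
        "handlers": {"console": {"class": "logging.StreamHandler"}},
        "loggers": {
            "django.db.backends": {"handlers": ["console"], "level": "DEBUG"},
        },
    }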
If you install django-devserver, it will show you the queries that are being run and the time they take in your shell when using runserver.
Another alternative is django-debug-toolbar, which will do the same in a side-panel overlay on your site.
Either way, you'll need to test things in your development environment. However, neither really solves the problem of pinpointing the offending queries directly; they work on a per-request basis. As a result, you'll have to think about which of your views use the database most heavily and/or deal with exceptionally large amounts of data. By cherry-picking likely-candidate views and inspecting the query times on those pages, you should be able to get a handle on which particular queries are the worst.
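Once you've identified a candidate view, you can also pin a query budget on it so regressions show up as test failures; a sketch, with a hypothetical URL and baseline:

    from django.test import TestCase

    class ArticleListQueryTests(TestCase):
        def test_list_view_query_count(self):
            # fails if the view issues more (or fewer) queries than
            # expected, catching accidental N+1 regressions early
            with self.assertNumQueries(3):  # whatever your baseline is
                self.client.get("/articles/")  # hypothetical URL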
I want to trace user's actions in my web site by logging their requests to database as plain text in Django.
I'm considering writing a custom decorator and placing it on every view that I want to trace.
However, I have some concerns about my design.
First of all, is such a logging mechanism reasonable, or will my log table grow so rapidly that it causes performance problems?
Secondly, how should my log table be designed?
I want to keep the keywords if the user calls the search view, or the item's id if the user calls the item-detail view.
Besides, users' IP addresses should be kept, but how can I separate users if they connect via a single IP address, as in many companies?
I am glad to explain in detail if you think my question is unclear.
Thanks
I wouldn't do that. If this is a production service then you've got a proper web server running in front of it, right? Apache, or nginx or something. That can do logging, and can do it well, and can write to a form that won't bloat your database, and there's a wealth of analytical tools for log analysis.
You are going to have to duplicate a lot of that functionality in your decorator, such as when you want to switch it on or off, or change the log level. The only thing you'll get by doing it all in Django is the possibility of ultra-fine control, such as only logging views of blog posts with id numbers greater than X. But generally you wouldn't want that level of detail; you'd log everything and do any stripping at the analysis phase. You haven't given any reason why you need to do this from Django.
If you really want it in an RDBMS, reading an Apache log file into Postgres or MySQL (or one of the expensive ones) is fairly trivial, as sketched below.
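A rough sketch of that import, assuming the Apache "combined" log format, the psycopg2 driver, and a pre-created logs(ip, ts, request, status) table:

    import re
    import psycopg2  # any DB-API driver works the same way

    # matches the start of an Apache "combined"/"common" log line
    LINE = re.compile(
        r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\]'
        r' "(?P<request>[^"]*)" (?P<status>\d+)'
    )

    conn = psycopg2.connect("dbname=weblogs")  # hypothetical DSN
    with conn, conn.cursor() as cur:  # commits on success
        with open("/var/log/apache2/access.log") as fh:
            for line in fh:
                m = LINE.match(line)
                if m:  # skip malformed lines
                    cur.execute(
                        "INSERT INTO logs (ip, ts, request, status)"
                        " VALUES (%s, %s, %s, %s)",
                        (m["ip"], m["ts"], m["request"], int(m["status"])),
                    )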
One thing you should keep in mind is that SQL databases don't offer very good write performance (compared with reads), so if you are experiencing heavy load you should probably look at an in-memory solution (e.g. a key-value store like Redis).
But keep in mind that, especially if you use a non-SQL solution, you should be clear about what you want to do with the collected data (just display something like a log, or do more in-depth searching/querying on it).
If you want to identify different users behind the same IP address, you should probably look for a cookie-based solution (Django's session framework identifies sessions through a cookie by default, so you could simply use sessions). Another option is doing the logging 'asynchronously' via JavaScript after the page has loaded in the browser, which gives you more ways of identifying the user and avoids extra load when generating the page.
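To make the session idea concrete, here is a minimal sketch of a logging middleware, assuming Django's session framework is enabled (the logger name and logged fields are illustrative):

    import logging

    logger = logging.getLogger("request_trace")  # illustrative name

    class RequestTraceMiddleware:
        def __init__(self, get_response):
            self.get_response = get_response

        def __call__(self, request):
            response = self.get_response(request)
            # the session key tells users apart even behind a shared
            # company IP; note it is None until a session is saved
            logger.info(
                "%s %s session=%s ip=%s",
                request.method,
                request.path,
                request.session.session_key,
                request.META.get("REMOTE_ADDR"),
            )
            return response

Register it in MIDDLEWARE after SessionMiddleware so that request.session is available.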
I am relatively new to Django and this is a more general 'concept' question.
For a client I need to construct an expansive database holding data returned from a series of questionnaires as well as some basic biological data. The idea is to move away from traditional tools (e.g. Microsoft Access) and manage the data in a MySQL database using a basic CRUD interface. Initially the project doesn't need to live on the web, but the next phase will be a centralized db with a login and an admin page.
I have started building the db with Django models which is great, and I want to use the Django admin for the management of the data.
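For illustration, a models-plus-admin setup like that is only a few lines (the models and fields here are hypothetical):

    # models.py
    from django.db import models

    class Participant(models.Model):
        name = models.CharField(max_length=100)
        date_of_birth = models.DateField()

    class QuestionnaireResponse(models.Model):
        participant = models.ForeignKey(Participant, on_delete=models.CASCADE)
        question = models.CharField(max_length=255)
        answer = models.TextField()

    # admin.py -- registration gives you the CRUD interface for free
    from django.contrib import admin
    from .models import Participant, QuestionnaireResponse

    admin.site.register(Participant)
    admin.site.register(QuestionnaireResponse)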
My question is: is this a good use of Django? Is there anything I should consider before relying on Django for the whole process? And is it advisable to use the Django runserver for db admin on a client's local machine (before we get to the web phase)?
Any advice would be much appreciated.
Actually, your description sounds exactly like the sort of thing for which Django is an ideal solution. It sounds more complex and customized than a CMS, and if it's as straightforward as your description then the ORM is definitely a good tool for this. Then again, this sounds exactly like an appserver-ready problem, so Rails, Express for Node.js, or even ChicagoBoss (if you're brave) would be good platforms for this kind of application.
And sure, Django is solid enough you can run it with the test server for local clients before you go whole-hog and run the thing on the web. For that, though, I recommend Apache/mod_wsgi, and if you're going to be fault tolerant there are diamond architectures (one front end proxy with monitoring failover, two or more appserver machines, one database with hot spare) and more complex (see: sharding) architectural layouts you can approach later.
If you're going to run it in a client's local setting, and you're not running Windows, I recommend looking into the screen program. It will allow you to detach the running job into the background while making diagnostics accessible in an ongoing fashion.
I'm working on a django website that needs to track popularity of items within a given date/time range. I'll need the ability to have a most viewed today, this week, all time, etc...
There is a "django-popularity" app on github that looks promising but only works with mysql (I'm using postgresql).
My initial thoughts are to create a generic ViewCounter model that logs views for all the objects that are tracked and then run a cron that crunches those numbers into the relevant time-based statistics for each item.
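A sketch of that generic counter using Django's contenttypes framework (the model and field names are illustrative):

    from django.contrib.contenttypes.fields import GenericForeignKey
    from django.contrib.contenttypes.models import ContentType
    from django.db import models

    class ViewCounter(models.Model):
        # points at any model instance via the contenttypes framework
        content_type = models.ForeignKey(ContentType, on_delete=models.CASCADE)
        object_id = models.PositiveIntegerField()
        content_object = GenericForeignKey("content_type", "object_id")
        viewed_at = models.DateTimeField(auto_now_add=True, db_index=True)

The cron job would then aggregate rows by viewed_at ranges into the per-day, per-week, and all-time counts.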
Looking forward to hearing your ideas.
Did you try django-popularity with postgres? The github page just says that the developer has not tested it with anything other than MySQL.
The app has only been tested with MySQL, but it should fully work on Postgres with a few adjustments. If you do manage to get it to work, please inform me. (I'm the developer.) I would love to be able to tell people this product is usable with Postgres as well.
Moreover, all the functionality relying on raw SQL checks whether there is actually a MySQL database in use. If not, it should throw an assertion error.
Also, the generic view counter is already in my package (it's called ViewTracker, but hell). The cron job seems like too much of a hassle to me when we could use either SQL or Django caching instead.