Django app performing slowly (even when cached)

Launching my second-ever Django site.
I've had problems in the past with Django's ORM: basically, the SQL it was generating just wasn't what I wanted, and even using things like select_related() I couldn't wrangle it into what it should have been. I ended up writing all my DB queries by hand in my views and using this function, taken from the Django docs, to turn the cursor's responses into usable dictionaries:
def dictfetchall(cursor, returnMultiDictAnyway=False):
    "Return all rows from a cursor as a list of dicts (or a single dict if there is only one row)"
    desc = cursor.description
    rows = [
        dict(zip([col[0] for col in desc], row))
        for row in cursor.fetchall()
    ]
    if len(rows) == 1 and not returnMultiDictAnyway:
        return rows[0]
    return rows
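For context, a typical call site looks like this (the table and column names here are just illustrative, not my real schema):

from django.db import connection

def recent_albums():
    # Raw SQL instead of the ORM; dictfetchall() turns the row tuples
    # into dicts keyed by column name.
    cursor = connection.cursor()
    cursor.execute(
        "SELECT id, title, rating FROM reviews_album "
        "ORDER BY created DESC LIMIT %s",
        [10],
    )
    return dictfetchall(cursor, returnMultiDictAnyway=True)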
I'm almost ready to launch my site but I'm finding pretty huge performance problems on the two different webservers I've tried hosting the app with.
Locally, it doesn't run blazingly fast, but I generally put this down to my machine being a little slow in general. I don't have the numbers to hand (I'll add them later on), but the SQL times aren't crazily high and I've made the effort to optimise MySQL (adding missing indexes etc).
Here's the app, running on two different webhosts (using bit.ly to avoid Google spidering these URLs, sorry!):
http://bit.ly/10iEWYt (hosted on Dreamhost, using Passenger WSGI)
http://bit.ly/UZ9adS (hosted on WebFaction, also using WSGI)
At the moment I have DEBUG = False on both of those hosts (so there shouldn't be a debug penalty) and a file-based cache of 15 minutes for each one. On the Dreamhost one I have an experimental cronjob hitting the homepage every 15 minutes in an effort to see if this keeps the Python server alive; this doesn't seem to have done much.
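For reference, the cache is just Django's stock file-based backend, configured something like this (the path is illustrative):

CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.filebased.FileBasedCache',
        'LOCATION': '/home/myuser/django_cache',  # illustrative path
        'TIMEOUT': 60 * 15,  # 15 minutes
    }
}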
If you try those links you should see how long it takes for the server to respond as you click around, even including the cache (try going from the homepage to another page then back home).
I've tried this profiling middleware, but I'm not really sure how to interpret the results (I can add them to this post later on when I'm home). In any case, the functions/lines it pointed to were all inside Django's own code, so I struggled to relate that to my own views etc.
Is it likely that the dictfetchall() method above could be an issue here? I use that to work with the results of every DB query on the site (~5-10 per page, most on the homepage). I do have a few included templates but nothing too crazy. I have a context processor for common things like showing album reviews, which I use all over the place. I'm stumped about what else could be causing this slowness.
Thanks, hope this is enough info to be helpful.
EDIT: okay, here's a profiling trace of the site homepage: http://pastebin.com/raw.php?i=c7kHNXAZ -- struggling to interpret it, to be honest.
Also, I looked at the Debug Toolbar stats: 8 SQL queries in 246ms (I'm currently looking at optimising these further), but a total render time of 3235ms (locally). This is what's confusing me.

Related

Django Debug Toolbar Target?

I've got a web page loading pretty slowly, so I installed the Django Debug Toolbar. I'm pretty new at this, so I'm trying to figure out what I can do with it.
I can see the database did 264 queries in 205 ms. That looks kind of high. I'm pretty sure I can cut down on that by adding some indexes and just writing better queries. But my question is: what is a "good" number that I should be trying to hit here? What is generally accepted as "fast enough", beyond which further optimization isn't really worth it? 50 ms? 20 ms?
Also, this same page shows 2500 ms in user CPU. That sounds terrible to me, and I'm surprised it's so much higher than the database time, which I assumed was the bottleneck. Is this maybe an indication that I am trying to do too much in Python code instead of at the database layer? Would reducing the number of SQL queries help with the CPU (waiting between queries)? Again, is there some well-known target response time I should be aiming for?
I'm looking for a snappy response from my clients. Right now when I click around I can feel a "pregnant pause" before the pages load.
By default, accessing related model fields results in one extra query per related model per row. Look into select_related() and prefetch_related(); this usually cuts the number of queries down and speeds things up by a lot. The debug toolbar shows you the actual queries; if yours doesn't, enable SQL logging before doing any query optimization. Once you've cut the number of queries down to a minimum (no extra queries per row), look for the slowest query and use the EXPLAIN SQL syntax to see whether indexes are being used; this is another area where things can get slow, especially with big data.
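For example, with a hypothetical Book model that has a ForeignKey to Author and a ManyToManyField called genres (these names are made up for illustration):

books = Book.objects.all()
for book in books:
    print(book.author.name)  # one extra query per book: the N+1 pattern

# select_related() pulls the foreign key in with a JOIN in the same query:
books = Book.objects.select_related('author')

# prefetch_related() batches many-to-many/reverse relations into one
# extra query in total, instead of one per row:
books = Book.objects.prefetch_related('genres')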
Usually the database is the bottleneck, unless you are doing some major looping in your code. If you believe the Python code is slow, then you need to profile it; otherwise it's just guessing.

Django/Sqlite Improve Database performance

We are developing an online school diary application using django. The prototype is ready and the project will go live next year with about 500 students.
Initially we used sqlite and hoped that for the initial implementation this would perform well enough.
The data tables are such that, to obtain the details of a school day (periods, classes, teachers, classrooms), many tables are used, and the database access takes 67 ms on a reasonably fast PC.
Most of the data is static once the year starts, with perhaps minor changes to classrooms. I thought of extracting the timetable for each student for each term day so that no table joins would be needed. I put this data into a text file for one student; the file is 100K in size. The time taken to read this data and process it for a day's timetable is about 8 ms. If I pre-load the data on login and store it in the session, it takes 7 ms at login and 2 ms for each query.
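To make that concrete, the login-time pre-load amounts to something like the sketch below (parse_timetable and the file path are placeholders, not the real code):

def preload_timetable(request, student):
    # Parse the static timetable once at login (~7 ms) and stash the
    # result in the session, so each later lookup is plain dict access (~2 ms).
    path = '/data/timetables/%s.txt' % student.pk  # illustrative path
    with open(path) as f:
        request.session['timetable'] = parse_timetable(f.read())  # hypothetical parser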
With 500 students, what would be the impact on the web server using this approach, and what other options are there (putting the student text files into some sort of memory cache rather than the session, for example)?
There will not be a great deal of data entry (students adding notes, teachers likewise), so it will mostly be checking the timetable status and looking to see what events exist for that day or week.
What is your expected response time, and what is your expected number of requests per minute? One twentieth of a second for the database access (which is likely to be the slow part) of a request doesn't sound like a problem to me. SQLite should perform fine in a read-mostly situation like this, so I'm not convinced you even have a performance problem.
If you want faster response you could consider:
First, ensuring that you have the best response time by checking your indexes and profiling individual retrievals to look for performance bottlenecks.
Pre-computing the static parts of the system and storing the HTML. You can put the HTML right back into the database or store it as disk files (see the sketch after this list).
Using the database as a backing store only (to preserve state of the system when the server is down) and reading the entire thing into in-memory structures at system start-up. This eliminates disk access for the data, although it limits you to one physical server.
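A minimal sketch of the pre-computed-HTML idea, using Django's low-level cache API (the template name, cache key and timeout are arbitrary choices, not a prescription):

from django.core.cache import cache
from django.template.loader import render_to_string

def day_timetable_html(student, day):
    key = 'timetable-%s-%s' % (student.pk, day)
    html = cache.get(key)
    if html is None:
        # Render once from the mostly-static data, then reuse the HTML.
        html = render_to_string('timetable/day.html',
                                {'student': student, 'day': day})
        cache.set(key, html, 60 * 60 * 24)  # keep for a day
    return html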
This sounds like premature optimization. 67 ms is scarcely longer than the ~50 ms threshold at which we humans can notice that there was a delay.
SQLite's representation of your data is going to be more efficient than a text format, and unlike a text file that you have to parse, the operating system can efficiently cache just the portions of your database that you're actually using in RAM.
You can lock down ~50MB of RAM to cache a parsed representation of the data for all the students, but you'll probably get better performance using that RAM for something else, like the OS disk cache.
I agree with some of the other answers that suggest using MySQL or PostgreSQL instead of SQLite. SQLite is not designed to be used as a production database for server applications. It is great for storing data for single-user applications such as mobile apps or even desktop applications, but it falls short very quickly in server applications. With Django it is trivial to switch to any other full-fledged database backend.
If you switch to one of those, you should not really have any performance issues, especially if you do all the necessary joins using select_related and prefetch_related.
If you still need more performance, considering that "most of the data is static", you might actually want to convert the Django site into a static site (a collection of HTML files) and then serve those using nginx or something similar. The simplest way I can think of doing that is to write a cron job which loops over all the needed URL configs, requests each page from Django, and then saves it as an HTML file. If you want to go in that direction, you might also want to take a look at Python's static site generators: Hyde and Pelican.
This approach will certainly work much faster than any caching system; however, you will lose any dynamic components of the site. If you need those, then caching seems like the best and fastest solution.
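The cron-job idea above might look roughly like this, using Django's test client to render each page (the output directory and URL list are illustrative, and this assumes DJANGO_SETTINGS_MODULE is set, e.g. by running it as a management command):

import os
from django.test import Client

OUTPUT_DIR = '/var/www/static-site'     # illustrative
URLS = ['/', '/diary/', '/timetable/']  # illustrative URL list

client = Client()
for url in URLS:
    response = client.get(url)
    # '/' becomes index.html, '/diary/' becomes diary/index.html
    rel = url.strip('/')
    out = os.path.join(OUTPUT_DIR, rel, 'index.html') if rel else \
        os.path.join(OUTPUT_DIR, 'index.html')
    os.makedirs(os.path.dirname(out), exist_ok=True)
    with open(out, 'wb') as f:
        f.write(response.content)  # nginx can serve this file directly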
You should use MySQL or PostgreSQL for your production database. sqlite3 isn't a good idea.
You should also avoid pre-loading data at login. Since your records can be inserted in advance, write Django management commands and run the import into your chosen database beforehand, and design your models such that when users log in, they can already access and view/edit their related data (which was pre-inserted before the application even went live). Hard-coding data operations at login does not smell right at all from an application-design point of view.
https://docs.djangoproject.com/en/dev/howto/custom-management-commands/
The benefit of designing your Django models and using custom management commands to insert the records ahead of time, before your application goes live, is that you can use the Django ORM to set up the appropriate relationships between users and their records.
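A skeleton of such a command (the app, model and loader function are placeholders for whatever your schema actually is):

# myapp/management/commands/import_records.py
from django.core.management.base import BaseCommand

class Command(BaseCommand):
    help = 'Pre-load student records before the site goes live'

    def add_arguments(self, parser):
        parser.add_argument('csv_path')

    def handle(self, *args, **options):
        # Hypothetical loader: parse the source file and create rows
        # through the ORM so the relationships get set up properly.
        for row in read_rows(options['csv_path']):  # read_rows is a placeholder
            Student.objects.update_or_create(
                code=row['code'],
                defaults={'name': row['name']},
            )
        self.stdout.write('Import complete')

You would then run ./manage.py import_records students.csv once before launch.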
I suspect, based on your description of what you need above, that you need to take another look at the approach you are taking with this application.
With 500 students, we shouldn't even be talking about caching. If you want response speed, you should deal with the following issues in priority order:
Use a production-quality database
Design your application's use cases correctly and get your application model right
Pre-load any data you need into the production database
Front-end optimization (CSS/JS compression etc.)
Use the Django Debug Toolbar to figure out whether any of your SQL is slow, and optimize specifically those queries
Implement caching (memcached etc.) as needed
As a general guideline.

Optimisation tips when migrating data into Sitecore CMS

I am currently faced with the task of importing around 200K items from a custom CMS implementation into Sitecore. I have created a simple import page which connects to an external SQL database using Entity Framework and I have created all the required data templates.
During a test import of about 5K items I realized that I needed to find a way to make the import run a lot faster, so I set about finding some information about optimizing Sitecore for this purpose. I have concluded that there is not much specific information out there, so I'd like to share what I've found and open the floor for others to contribute further optimizations. My aim is to create some kind of maintenance mode for Sitecore that can be used when importing large volumes of data.
The most useful information I found was on Mark Cassidy's blogpost http://intothecore.cassidy.dk/2009/04/migrating-data-into-sitecore.html. At the bottom of this post he provides a few tips for when you are running an import.
If migrating large quantities of data, try and disable as many Sitecore event handlers and whatever else you can get away with.
Use BulkUpdateContext()
Don't forget your target language
If you can, make the fields shared and unversioned. This should help migration execution speed.
The first thing I noticed on this list was the BulkUpdateContext class, as I had never heard of it. I quickly understood why, as a search on the SDN forum and in the PDF documentation returned no hits. So imagine my surprise when I actually tested it out and found that it improves item creation/deletion speed by at least tenfold!
The next thing I looked at was the first point, where he basically suggests creating a version of web.config that has only the bare essentials needed to perform the import. So far I have removed all events related to creating, saving and deleting items and versions. I have also removed the history engine and system index declarations from the master database element in web.config, as well as any custom events, schedules and search configurations. I expect there are a lot of other things I could look to remove/disable to increase performance. Pipelines? Schedules?
What optimization tips do you have?
Incidentally, BulkUpdateContext() is a very misleading name, as it really improves item creation speed, not item update speed. But as you also point out, it improves your import speed massively :-)
Since I wrote that post, I've added a few new things to my normal routines when doing imports.
Regularly shrink your databases; they tend to grow large and bulky. To do this, first go to Sitecore Control Panel -> Database and select "Clean Up Database". After this, do a regular ShrinkDB on your SQL server.
Disable indexes, especially if importing into the "master" database. For reference, see http://intothecore.cassidy.dk/2010/09/disabling-lucene-indexes.html
Try not to import into "master", however; you will usually find that imports into "web" are a lot faster, mostly because this database isn't (by default) connected to the HistoryManager or other gadgets.
And if you're really adventurous, there's a thing you could try that I'd been considering trying out myself but never got around to. It might work, but I can't guarantee that it will :-)
Try removing all your field types from App_Config/FieldTypes.config. The theory here is that this should essentially disable all of Sitecore's special handling of the content of these fields (like updating the LinkDatabase and so on). You would need to manually trigger a rebuild of the LinkDatabase when done with the import, but that's a relatively small price to pay.
Hope this helps a bit :-)
I'm guessing you've already hit this, but putting the code inside a SecurityDisabler() block may speed things up also.
I'd be a lot more worried about how Sitecore performs with this much data... Assuming you only do the import once, who cares how long that process takes? Is this going to be a regular occurrence?

Profiling Django

My Django application has become painfully slow in production. It is probably due to some complex or unindexed queries.
Is there any Django-ish way to profile my application?
Try the Django Debug Toolbar. It will show you what queries are executed on each page and how much time they take. It's a really useful, powerful and easy to use tool.
Also, read recommendations about Django performance in Database access optimization from the documentation.
And Django performance tips by Jacob Kaplan-Moss.
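Setup for the toolbar is small; per its documentation it is roughly this (the details vary between versions, so check the docs for yours):

# settings.py
INSTALLED_APPS += ['debug_toolbar']
MIDDLEWARE += ['debug_toolbar.middleware.DebugToolbarMiddleware']
INTERNAL_IPS = ['127.0.0.1']  # the toolbar only renders for these IPs

# urls.py
from django.urls import include, path
urlpatterns += [path('__debug__/', include('debug_toolbar.urls'))]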
Just type "django-profiling" on google, you'll get these links (and more):
http://code.djangoproject.com/wiki/ProfilingDjango
http://code.google.com/p/django-profiling/
http://www.rkblog.rk.edu.pl/w/p/django-profiling-hotshot-and-kcachegrind/
Personally I'm using the middleware approach: each user can toggle a "profiling" flag stored in their session, and if my profiling middleware notices that the flag has been set, it uses Python's hotshot module like this:
import hotshot
import tempfile

def process_view(self, request, view_func, view_args, view_kwargs):
    # Setup happens here; setting settings.DEBUG = True makes Django
    # collect a SQL dump in connection.queries.
    fname = tempfile.mktemp(suffix='.prof')  # where the profile log is written
    profiler = hotshot.Profile(fname)
    response = profiler.runcall(view_func, request, *view_args, **view_kwargs)
    profiler.close()
    # process the results here (e.g. load fname with hotshot.stats)
    return response
EDIT: For profiling SQL queries, http://github.com/robhudson/django-debug-toolbar mentioned by Konstantin is a nice thing, but if your queries are really slow (probably because there are hundreds or thousands of them), then you'll be waiting an insane amount of time for it to load in the browser, and then it will be hard to browse because of the slowness. Also, django-debug-toolbar is by design unable to give useful insight into the internals of AJAX requests.
EDIT2: django-extensions has a great profiling command built in:
https://github.com/django-extensions/django-extensions/blob/master/docs/runprofileserver.rst
Just do this and voila:
$ mkdir /tmp/my-profile-data
$ ./manage.py runprofileserver --kcachegrind --prof-path=/tmp/my-profile-data
For profiling data access (which is where the bottleneck is most of the time) check out django-live-profiler. Unlike Django Debug Toolbar it collects data across all requests simultaneously and you can run it in production without too much performance overhead or exposing your app internals.
Shameless plug here, but I recently made https://github.com/django-silk/silk for this purpose. It's somewhat similar to the Django toolbar, but with history, code profiling, and more fine-grained control over everything.
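For anyone trying it out, installation per the README is roughly this (check the current README for the exact steps):

# settings.py
MIDDLEWARE += ['silk.middleware.SilkyMiddleware']
INSTALLED_APPS += ['silk']

# urls.py
from django.urls import include, path
urlpatterns += [path('silk/', include('silk.urls', namespace='silk'))]

# then create silk's tables:
#   ./manage.py migrate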
For all you KCacheGrind fans, I find it's very easy to use the shell in tandem with Django's fantastic test Client for generating profile logs on-the-fly, especially in production. I've used this technique now on several occasions because it has a light touch — no pesky middleware or third-party Django applications are required!
For example, to profile a particular view that seems to be running slow, you could crack open the shell and type this code:
from django.test import Client
import hotshot
c = Client()
profiler = hotshot.Profile("yourprofile.prof") # saves a logfile to your pwd
profiler.runcall(c.get, "/pattern/matching/your/view/")
profiler.close()
To visualize the resulting log, I've used hotshot2cachegrind:
http://kcachegrind.sourceforge.net/html/ContribPython.html
But there are other options as well:
http://www.vrplumber.com/programming/runsnakerun/
https://code.djangoproject.com/wiki/ProfilingDjango
I needed to profile a Django app recently and tried many of these suggestions. I ended up using pyinstrument instead, which can be added to a Django app with a single update to the middleware list, and which provides a stack-based view of the timings.
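That single middleware update looks like this (taken from pyinstrument's docs as I remember them, so double-check the current docs):

# settings.py
MIDDLEWARE = [
    'pyinstrument.middleware.ProfilerMiddleware',
    # ... the rest of your middleware ...
]
# With the middleware installed, appending ?profile to a request URL
# renders pyinstrument's stack-based timing report instead of the page.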
Quick summary of my experience with some other tools:
Django Debug Toolbar is great if the issue is due to SQL queries, and it works well in combination with pyinstrument
django-silk works well, but requires adding a context manager or decorator to each part of the stack where you want sub-request timings. It also provides an easy way to access cProfile timings and automatically displays AJAX timings, both of which can be really helpful.
djdt-flamegraph looked promising, but the page never actually rendered on my system.
Compared to the other tools I tried, pyinstrument was dramatically easier to install and to use.
When the views don't return HTML, for example JSON, use simple middleware methods for profiling.
Here are a couple examples:
https://gist.github.com/1229685 - capture all sql calls went into the view
https://gist.github.com/1229681 - profile all method calls used to create the view
You can use line_profiler.
It lets you display a line-by-line analysis of your code, with the time spent alongside each line (when a line is hit several times, the times are summed).
It's normally used with non-Django Python code, but there's a little trick to make it work on Django: https://stackoverflow.com/a/68163807/1937033
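Outside Django, basic usage of the API looks like this (build_report and its input are made-up examples):

from line_profiler import LineProfiler

def build_report(rows):
    total = 0
    for row in rows:
        total += row['value']  # per-line hit counts and times show up here
    return total

profiler = LineProfiler()
wrapped = profiler(build_report)  # wrap the function to be timed
wrapped([{'value': n} for n in range(1000)])
profiler.print_stats()            # prints the line-by-line breakdown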
I am using silk for live profiling and inspection of my Django application. It's a great tool; have a look at it:
https://github.com/jazzband/django-silk

Monitor database requests in Django, tied to line number

We've got some really strange extraneous DB hits happening in our project. Is there any way to monitor where the queries are coming from, possibly by line number? The SQL-printing middleware helps, but we've looked everywhere those kinds of queries might be generated and can't find the source.
If the above isn't possible, any pointers on narrowing down the source would be greatly appreciated.
To find the code executing queries, you can install django-debug-toolbar to figure out what commands are being executed and which tables they're operating on.
Once you've done that, try hooking into the appropriate Django signals for those models and using print and assert to narrow the code.
I'm sure there's a better way to do some of this (a python debugger?) but this is the first thing that comes to mind and probably what I would end up doing myself.
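On newer Django versions (2.0 and up) there is also connection.execute_wrapper(), which isn't mentioned in the answer above but makes the "which line caused this query?" question fairly direct. A rough sketch (the wrapped queryset is hypothetical):

import traceback
from django.db import connection

def print_caller(execute, sql, params, many, context):
    # Print a short stack trace for every statement so the offending
    # line of your own code appears right next to the SQL.
    traceback.print_stack(limit=8)
    print(sql)
    return execute(sql, params, many, context)

# Wrap any suspect block of code:
with connection.execute_wrapper(print_caller):
    list(Book.objects.all())  # hypothetical query to trace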
If you want to track SQL queries for performance optimization and debugging purposes, and want to see how to monitor query calls in Django, this blog post will help you out:
Tracking SQL Queries for a Request using Django