We've got some really strange extraneous DB hits happening in our project. Is there any way to monitor where the requests are coming from, possibly by line number? The SQL printing middleware helps, but we've looked everywhere those kinds of requests might be generated and can't find the source.
If the above isn't possible, any pointers on narrowing down the source would be greatly appreciated.
To find the code executing queries, you can install django-debug-toolbar to figure out what commands are being executed and which tables they're operating on.
Once you've done that, try hooking into the appropriate Django signals for those models and using print and assert statements to narrow down the code.
I'm sure there's a better way to do some of this (a python debugger?) but this is the first thing that comes to mind and probably what I would end up doing myself.
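For example, here's a rough sketch of that signal idea (the app and model names are just placeholders for whatever model the stray queries touch):

import traceback

from django.db.models.signals import pre_save
from django.dispatch import receiver

from myapp.models import Album  # hypothetical model

@receiver(pre_save, sender=Album)
def report_save(sender, instance, **kwargs):
    # Print the call stack so you can see the file and line number
    # that triggered the write.
    print("pre_save fired for %r" % instance)
    traceback.print_stack()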
If you want to track SQL queries for performance optimization and debugging, and to see how query calls are made in Django, this blog post should help:
Tracking SQL Queries for a Request using Django
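If all you need is to see every query as it's executed, a settings-level sketch like this also works (assuming DEBUG is on, since Django only records SQL in debug mode):

LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "console": {"class": "logging.StreamHandler"},
    },
    "loggers": {
        # Logs every SQL statement the database backends execute (DEBUG only).
        "django.db.backends": {
            "handlers": ["console"],
            "level": "DEBUG",
        },
    },
}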
Is there a Progress profiling tool that allows me to see the queries executing against an OpenEdge database?
We're doing a migration from an OpenEdge database into a SQL database. In order to map the data correctly we'd like to run certain application reports on the OpenEdge database and see what database queries are being executed to retrieve the data.
Is this possible with some kind of Progress profiling tool (a la SQL Server Profiling)? Preferably free...
Progress is record oriented, not set oriented like SQL, so your reports aren't a single query or a set of queries; they're more likely a lot of record lookups combined with what you'd consider query-like operations.
Depending on the version you're running, there is a way to send a signal to the client to see what it is currently doing, however doing so will almost certainly not give you enough information to discern what's going on "under the hood."
Long story short, your options are to get a DataServer product so you can attach the Progress client to a SQL database, which will let you use a SQL database without losing the Progress functionality, or to get a copy of the program's source code to find out how the reports are structured.
Tim is quite right -- without the source code, looking at the queries is unlikely to provide you with much insight.
Nonetheless, there are some tools and capabilities that will provide information about queries. Probably the most useful for your purpose would be to specify something similar to:
-logentrytypes QryInfo -logginglevel 3 -clientlog "mylog.log"
at session startup.
You can use session triggers to identify almost anything done by any program, without modifying or having access to the source of those programs. Setting this up may be more work than it's worth for your purpose. We have a testing system built around this idea. One big flaw: triggers cannot be fired for CAN-FIND.
Launching my second-ever Django site.
I've had problems in the past with Django's ORM (basically, the SQL it was generating just wasn't what I wanted and even using things like select_related() I couldn't wrangle it into what it should've been) -- I ended up just writing all my DB queries by hand in my views and using this function, taken from the Django docs, to turn the cursor's responses into usable dictionaries:
def dictfetchall(cursor, returnMultiDictAnyway=False):
    """Return all rows from a cursor as a list of dicts.

    If exactly one row comes back and returnMultiDictAnyway is False,
    the single dict is returned on its own instead of in a list.
    """
    desc = cursor.description
    rows = [
        dict(zip([col[0] for col in desc], row))
        for row in cursor.fetchall()
    ]
    if len(rows) == 1 and not returnMultiDictAnyway:
        return rows[0]
    return rows
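For context, a typical call site looks roughly like this (the table and column names are illustrative, not my real schema):

from django.db import connection

def album_reviews(request):
    cursor = connection.cursor()
    cursor.execute("SELECT id, title, rating FROM reviews ORDER BY id DESC")
    reviews = dictfetchall(cursor, returnMultiDictAnyway=True)
    # ... render the template with `reviews` ...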
I'm almost ready to launch my site but I'm finding pretty huge performance problems on the two different webservers I've tried hosting the app with.
Locally, it doesn't run blazingly fast, but I generally put this down to my machine in general being a little slow. I don't have the numbers to hand (will add later on) but the SQL times aren't crazily high and I've made the effort to optimise MySQL (adding missing indexes etc).
Here's the app, running on two different webhosts (using bit.ly to avoid Google spidering these URLs, sorry!):
http://bit.ly/10iEWYt (hosted on Dreamhost, using Passenger WSGI)
http://bit.ly/UZ9adS (hosted on WebFaction, also using WSGI)
At the moment I have Debug=False on both of those hosts (so there shouldn't be a loading penalty) and a file-based cache of 15 minutes for each one. On the Dreamhost one I have an experimental cronjob hitting the homepage every 15 minutes in an effort to see if this keeps the Python server alive -- this doesn't seem to have done much.
If you try those links you should see how long it takes for the server to respond as you click around, even including the cache (try going from the homepage to another page then back home).
I've tried this profiling middleware but I'm not really sure how to interpret the results (I can add them to this post later on when I'm home) -- in any case, the functions/lines it pointed to were all inside Django's own code, so I struggled to relate them to my own views etc.
Is it likely that the dictfetchall() method above could be an issue here? I use that to work with the results of every DB query on the site (~5-10 per page, most on the homepage). I do have a few included templates but nothing too crazy. I have a context processor for common things like showing album reviews, which I use all over the place. I'm stumped about what else could be causing this slowness.
Thanks, hope this is enough info to be helpful.
EDIT: okay, here's a profiling trace of the site homepage: http://pastebin.com/raw.php?i=c7kHNXAZ -- struggling to interpret it, to be honest.
Also, I looked at the Debug Toolbar stats: 8 SQL queries in 246ms (looking currently at further optimising these), but total time for render of 3235ms (locally). This is what's confusing me.
We are planning to use django-haystack with Solr 4.0 (with near real time search) for our web app, and I was wondering if anyone could advise on the limits of using Haystack compared to using Solr directly, i.e. is there a performance hit/overhead with django-haystack? We have around 3 million+ documents which would need indexing, plus an additional (estimated) 100k added every day.
Ideally, I'd think we need a simple API over Solr 4 - but I am finding it hard to find anything specific to Python which is still actively maintained (except django-haystack of course). I'd appreciate any guidance on this.
It seems like your question could be rephrased "How has Haystack burned you?" Haystack is nice for some things, but has also caused me some headaches on the job. Here are some things I've had to code around.
You mentioned indexing. Haystack has management commands for rebuilding the index. We'll use these for nuke-and-pave rebuilding during testing, but for reindexing our production content we kind of hit a wall. The command would take forever, you wouldn't know where it was in terms of progress, and if it failed you were screwed and had to start all over again. We reached a point where we had so much content that it would fail reliably. We switched to making batches of content and reindexing these in celery tasks. We made a management command to do the batching and kick off all those tasks. This has been a little more robust in the face of partial failures and, even better, it actually finishes. The underlying task uses haystack calls to convert a database object to a solr document -- this ORMiness is nice and hasn't burned me yet. We edit our solr schema by hand, though.
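To make that concrete, here is a rough, simplified sketch of the batching idea (not our actual code; the model name and batch size are made up):

from celery import shared_task
from django.core.management.base import BaseCommand
from haystack import connections

from myapp.models import Document  # hypothetical model

BATCH_SIZE = 1000  # arbitrary

@shared_task
def reindex_batch(pk_list):
    # Haystack converts each object into a solr document and posts the batch.
    index = connections["default"].get_unified_index().get_index(Document)
    backend = connections["default"].get_backend()
    backend.update(index, Document.objects.filter(pk__in=pk_list))

class Command(BaseCommand):
    help = "Queue celery tasks that reindex documents in batches"

    def handle(self, *args, **options):
        pks = list(Document.objects.values_list("pk", flat=True).order_by("pk"))
        for start in range(0, len(pks), BATCH_SIZE):
            reindex_batch.delay(pks[start:start + BATCH_SIZE])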
The query API is okay for simple stuff. If you are doing more convoluted solr queries, you can find yourself just feeding in raw queries. This can lead to spaghetti code. We eventually pushed that raw spaghetti into request handlers in the solrconfig file instead. We still use haystack to turn on highlighting or to choose a specific request handler, but again, I'm happier when we keep it simple and we did hack on a method to add arbitrary parameters as needed.
There are other assumptions about how you want to use solr that seem to get baked into the code. Haystack is the only open source project where I actually have some familiarity with the code. I'm not sure if that's a good thing, since it hasn't always been by choice. We have plenty of layer-cake code that extends a haystack class and overrides it to do the right thing. This isn't terrible, but when you also have to copy and paste haystack code into those overridden methods, it starts being more terrible.
So... it's not perfect, but parts are handy. If you are on the fence about writing your own API anyway, using haystack might save you some trouble, especially when you want to take all your solr results and pass them back into django templates or whatever. It sounds like with the steady influx of documents, you will want to write some batch indexing jobs anyway. Start with that, prepare to be burnt a little, and look at the source when that happens; it's actually pretty readable.
I am currently faced with the task of importing around 200K items from a custom CMS implementation into Sitecore. I have created a simple import page which connects to an external SQL database using Entity Framework and I have created all the required data templates.
During a test import of about 5K items I realized that I needed to find a way to make the import run a lot faster, so I set about finding some information about optimizing Sitecore for this purpose. I have concluded that there is not much specific information out there, so I'd like to share what I've found and open the floor for others to contribute further optimizations. My aim is to create some kind of maintenance mode for Sitecore that can be used when importing large volumes of data.
The most useful information I found was in Mark Cassidy's blog post http://intothecore.cassidy.dk/2009/04/migrating-data-into-sitecore.html. At the bottom of this post he provides a few tips for when you are running an import.
If migrating large quantities of data, try and disable as many Sitecore event handlers and whatever else you can get away with.
Use BulkUpdateContext()
Don't forget your target language
If you can, make the fields shared and unversioned. This should help migration execution speed.
The first thing I noticed on this list was the BulkUpdateContext class, as I had never heard of it. I quickly understood why: a search on the SDN forum and in the PDF documentation returned no hits. So imagine my surprise when I actually tested it out and found that it improves item creation/deletion speed by at least tenfold!
The next thing I looked at was the first point, where he basically suggests creating a version of web.config that has only the bare essentials needed to perform the import. So far I have removed all events related to creating, saving and deleting items and versions. I have also removed the history engine and system index declarations from the master database element in web.config, as well as any custom events, schedules and search configurations. I expect there are a lot of other things I could look to remove/disable in order to increase performance. Pipelines? Schedules?
What optimization tips do you have?
Incidentally, BulkUpdateContext() is a very misleading name - as it really improves item creation speed, not item updating speed. But as you also point out, it improves your import speed massively :-)
Since I wrote that post, I've added a few new things to my normal routines when doing imports.
Regularly shrink your databases. They tend to grow large and bulky. To do this, first go to Sitecore Control Panel -> Database and select "Clean Up Database". After this, do a regular ShrinkDB on your SQL server.
Disable indexes, especially if importing into the "master" database. For reference, see http://intothecore.cassidy.dk/2010/09/disabling-lucene-indexes.html
Try not to import into "master", however; you will usually find that imports into "web" are a lot faster, mostly because this database isn't (by default) connected to the HistoryManager or other gadgets.
And if you're really adventurous, there's a thing you could try that I'd been considering trying out myself, but never got around to. It might work, but I can't guarantee that it will :-)
Try removing all your field types from App_Config/FieldTypes.config. The theory here is, that this should essentially disable all of Sitecore's special handling of the content of these fields (like updating the LinkDatabase and so on). You would need to manually trigger a rebuild of the LinkDatabase when done with the import, but that's a relatively small price to pay
Hope this helps a bit :-)
I'm guessing you've already hit this, but putting the code inside a SecurityDisabler() block may speed things up also.
I'd be a lot more worried about how Sitecore performs with this much data... assuming you only do the import once, who cares how long that process takes. Is this going to be a regular occurrence?
My django application has become painfully slow on the production. Probably it is due to some complex or unindexed queries.
Is there any django-ish way to profile my application?
Try the Django Debug Toolbar. It will show you what queries are executed on each page and how much time they take. It's a really useful, powerful and easy to use tool.
Also, read recommendations about Django performance in Database access optimization from the documentation.
And Django performance tips by Jacob Kaplan-Moss.
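For reference, the basic toolbar setup is only a few settings lines (a rough sketch; the exact steps depend on your Django and toolbar versions, so double-check the toolbar's install docs):

# settings.py (illustrative; assumes INSTALLED_APPS/MIDDLEWARE are lists)
INSTALLED_APPS += ["debug_toolbar"]

MIDDLEWARE += ["debug_toolbar.middleware.DebugToolbarMiddleware"]

# The toolbar only renders for requests coming from these addresses.
INTERNAL_IPS = ["127.0.0.1"]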
Just type "django-profiling" into Google and you'll get these links (and more):
http://code.djangoproject.com/wiki/ProfilingDjango
http://code.google.com/p/django-profiling/
http://www.rkblog.rk.edu.pl/w/p/django-profiling-hotshot-and-kcachegrind/
Personally I'm using the middleware approach - i.e. each user can toggle a "profiling" flag stored in a session, and if my profiling middleware notices that a flag has been set, it uses Python's hotshot module like this:
def process_view(self, request, view_func, view_args, view_kwargs):
    # Set things up here, along with settings.DEBUG = True
    # so you get a SQL dump in connection.queries.
    # fname is the path the profile data should be written to.
    profiler = hotshot.Profile(fname)
    response = profiler.runcall(view_func, request, *view_args, **view_kwargs)
    profiler.close()
    # Process the results here (e.g. load them with hotshot.stats).
    return response
EDIT: For profiling SQL queries, http://github.com/robhudson/django-debug-toolbar mentioned by Konstantin is a nice thing - but if your queries are really slow (probably because there are hundreds or thousands of them), then you'll be waiting an insane amount of time until the page loads in a browser - and then it'll be hard to browse due to slowness. Also, django-debug-toolbar is by design unable to give useful insight into the internals of AJAX requests.
EDIT2: django-extensions has a great profiling command built in:
https://github.com/django-extensions/django-extensions/blob/master/docs/runprofileserver.rst
Just do this and voila:
$ mkdir /tmp/my-profile-data
$ ./manage.py runprofileserver --kcachegrind --prof-path=/tmp/my-profile-data
For profiling data access (which is where the bottleneck is most of the time) check out django-live-profiler. Unlike Django Debug Toolbar it collects data across all requests simultaneously and you can run it in production without too much performance overhead or exposing your app internals.
Shameless plug here, but I recently made https://github.com/django-silk/silk for this purpose. It's somewhat similar to django toolbar but with history, code profiling and more fine grained control over everything.
For all you KCacheGrind fans, I find it's very easy to use the shell in tandem with Django's fantastic test Client for generating profile logs on-the-fly, especially in production. I've used this technique now on several occasions because it has a light touch — no pesky middleware or third-party Django applications are required!
For example, to profile a particular view that seems to be running slow, you could crack open the shell and type this code:
from django.test import Client
import hotshot
c = Client()
profiler = hotshot.Profile("yourprofile.prof") # saves a logfile to your pwd
profiler.runcall(c.get, "/pattern/matching/your/view/")
profiler.close()
To visualize the resulting log, I've used hotshot2cachegrind:
http://kcachegrind.sourceforge.net/html/ContribPython.html
But there are other options as well:
http://www.vrplumber.com/programming/runsnakerun/
https://code.djangoproject.com/wiki/ProfilingDjango
I needed to profile a Django app recently and tried many of these suggestions. I ended up using pyinstrument instead, which can be added to a Django app using a single update to the middleware list and provides a stack-based view of the timings.
Quick summary of my experience with some other tools:
Django Debug Toolbar is great if the issue is due to SQL queries, and it works well in combination with pyinstrument.
django-silk works well, but requires adding a context manager or decorator to each part of the stack where you want sub-request timings. It also provides an easy way to access cProfile timings and automatically displays ajax timings, both of which can be really helpful.
djdt-flamegraph looked promising, but the page never actually rendered on my system.
Compared to the other tools I tried, pyinstrument was dramatically easier to install and to use.
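For reference, my setup was roughly this (a sketch based on the version I used; check the pyinstrument docs, since the details may differ for yours):

MIDDLEWARE = [
    "pyinstrument.middleware.ProfilerMiddleware",
    # ... the rest of your middleware ...
]

# With the middleware in place, append ?profile to a request URL to see the
# stack-based timing report in the browser, or dump reports to files:
PYINSTRUMENT_PROFILE_DIR = "profiles"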
When the views are not HTML (for example JSON), use simple middleware methods for profiling.
Here are a couple examples:
https://gist.github.com/1229685 - captures all SQL calls that went into the view
https://gist.github.com/1229681 - profiles all method calls used to create the view
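In the same spirit as those gists, here's a bare-bones sketch using the newer middleware style (it needs DEBUG=True, because connection.queries is only populated in debug mode):

import time

from django.db import connection

class QueryReportMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        start = time.time()
        response = self.get_response(request)
        sql_time = sum(float(q["time"]) for q in connection.queries)
        # Works for JSON or any other non-HTML response, since nothing is
        # injected into the page; the numbers just go to the console.
        print("%s: %d queries, %.3fs SQL, %.3fs total" % (
            request.path, len(connection.queries), sql_time, time.time() - start))
        return response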
You can use line_profiler.
It lets you display a line-by-line analysis of your code, with the time spent shown alongside each line (when a line is hit several times, the times are summed).
It's normally used with non-Django Python code, but there's a little trick to use it with Django: https://stackoverflow.com/a/68163807/1937033
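For what it's worth, the usual (non-Django) workflow is just a decorator plus kernprof; a minimal sketch:

$ pip install line_profiler
$ kernprof -l -v my_script.py

# In my_script.py: the @profile decorator is injected by kernprof as a
# builtin, so no import is needed when the script runs under kernprof.
@profile
def slow_function(items):
    total = 0
    for item in items:
        total += item * item
    return total

if __name__ == "__main__":
    slow_function(range(10000))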
I am using Silk for live profiling and inspection of Django applications. It's a great tool; have a look at it.
https://github.com/jazzband/django-silk