I have a Django project and a problem: it eats a lot of memory and puts too much load on the hosting.
How can I find the places in the project that consume a lot of memory?
If you're using Django with DEBUG = True, then Django logs every database query, which can quickly mount up and use a substantial amount of memory.
If you're not running in DEBUG mode, then take a look at the gc module, and in particular try adding gc.set_debug(gc.DEBUG_LEAK) to settings.py. This will show you a great deal of information about which objects are using memory.
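A minimal sketch of what that looks like, assuming you enable it only temporarily while hunting the leak:

import gc

# In settings.py (or anywhere that runs at startup):
gc.set_debug(gc.DEBUG_LEAK)  # report leaked objects and keep them in gc.garbage

# Later, e.g. from a shell, force a collection and inspect what was kept:
gc.collect()
print(len(gc.garbage))  # objects the collector flagged, saved for inspection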
In general for debugging/profiling, I suggest django-debug-toolbar as a starting point, as well as the various tips in:
http://docs.djangoproject.com/en/dev/topics/db/optimization/
However, this won't give you memory usage info. If you really need that, you can try some middleware that uses pympler to log memory usage while debugging, and run it under the development server:
http://www.rkblog.rk.edu.pl/w/p/profiling-django-object-size-and-memory-usage-pympler/?c=1
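A rough sketch of such middleware, assuming pympler is installed (the class name and the new-style middleware shape are mine, not from the linked post):

from pympler import muppy, summary

class MemorySummaryMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        response = self.get_response(request)
        # Summarize every object the interpreter is currently holding
        objects = muppy.get_objects()
        summary.print_(summary.summarize(objects))  # table of types by memory usage
        return response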
I've found that doing this grinds my webapps to a near-halt and then there are the problems from using the dev-webserver (e.g., media files not getting served).
But as others have said, your best bet is to set DEBUG = False:
http://docs.djangoproject.com/en/dev/faq/models/#why-is-django-leaking-memory
As Andrew Wilkinson stated, this might have to do with the DEBUG = True setting. However, it might also be important to know whether you're running this project stand-alone or as a webserver.
Django will automatically cache querysets while a request is open and remove the references when the request returns. Since there are no requests in a stand-alone project, the references are never deleted, and hence every queryset ever requested gets saved.
To fix the stand-alone Python issue, simply call django.db.reset_queries() after you've done a bunch of requests. This will allow the querysets to be garbage collected and fix your leak.
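A minimal sketch of that in a stand-alone script (records and process() are hypothetical stand-ins for your own loop):

from django.db import reset_queries

for i, record in enumerate(records):
    process(record)
    if i % 1000 == 0:
        reset_queries()  # drop the accumulated query log so it can be collected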
The Django docs clearly state:
You shouldn't alter settings in your applications at runtime.
Here's the link to that statement
My question is: why is this so? I want to add applications dynamically at runtime, and add databases at runtime, both of which involve editing the settings. Can someone explain why settings are not to be edited at runtime and, if exceptions exist, which settings they are and why they are exceptional? I'm not so much interested in how to achieve my goal as in the reason why settings shouldn't be altered.
Most settings will not be re-read if you change them at runtime. So Django will not recognise the changes you make.
This is due to the fact that Django is just normal Python code. It isn't like a server that is monitoring your code - it is just part of your code.
In some cases, parts of Django code might respond to changes in settings, because they read settings.DEFAULT_FROM_EMAIL every time mail is sent, for example.
But if Django processes the setting in any way, as it has to do for INSTALLED_APPS, it isn't going to notice you changed something and re-do the processing.
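A small sketch of that difference, with the caveat that neither change is officially supported ("myapp" is a hypothetical app name):

from django.conf import settings

# Read lazily every time mail is sent, so a change here is picked up:
settings.DEFAULT_FROM_EMAIL = "noreply@example.com"

# Processed once at startup to build the app registry, so this has no effect:
settings.INSTALLED_APPS = list(settings.INSTALLED_APPS) + ["myapp"]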
Which settings are safe? Well, the docs say "none are safe", because that might change in the future. Django might save a copy of any setting for some reason, or do some processing with it.
Changing INSTALLED_APPS could never be made to work, because it alters which modules are imported. There is simply no way Django could work around how Python works at this level - it would need to be able to 'unimport' modules, which is basically impossible (the only way is to restart the process), and there are other problems associated with cross-app links.
AFAIK there is no documentation on which settings are safely modifiable at runtime, but there is an open ticket asking that they be documented more clearly.
If you take a look under the hood at the settings object Django exposes to interface with your project's settings module, you'll notice that there's nothing preventing you from dynamically changing the settings at runtime.
You should, however, appreciate that the framework's architecture is built around a request-response flow where a lot of global state is being shared between threads for memory optimization, based on the premise that the application is configured only once during initialization.
I am currently faced with the task of importing around 200K items from a custom CMS implementation into Sitecore. I have created a simple import page which connects to an external SQL database using Entity Framework and I have created all the required data templates.
During a test import of about 5K items, I realized that I needed to find a way to make the import run a lot faster, so I set about finding some information about optimizing Sitecore for this purpose. I have concluded that there is not much specific information out there, so I'd like to share what I've found and open the floor for others to contribute further optimizations. My aim is to create some kind of maintenance mode for Sitecore that can be used when importing large volumes of data.
The most useful information I found was in Mark Cassidy's blog post http://intothecore.cassidy.dk/2009/04/migrating-data-into-sitecore.html. At the bottom of that post he provides a few tips for when you are running an import.
If migrating large quantities of data, try and disable as many Sitecore event handlers and whatever else you can get away with.
Use BulkUpdateContext()
Don't forget your target language
If you can, make the fields shared and unversioned. This should help migration execution speed.
The first thing I noticed on this list was the BulkUpdateContext class, as I had never heard of it. I quickly understood why, as a search on the SDN forum and in the PDF documentation returned no hits. So imagine my surprise when I actually tested it out and found that it improves item creation/deletion speed at least tenfold!
The next thing I looked at was the first point, where he basically suggests creating a version of web.config that has only the bare essentials needed to perform the import. So far I have removed all events related to creating, saving and deleting items and versions. I have also removed the history engine and system index declarations from the master database element in web.config, as well as any custom events, schedules and search configurations. I expect there are a lot of other things I could look to remove/disable in order to increase performance. Pipelines? Schedules?
What optimization tips do you have?
Incidentally, BulkUpdateContext() is a very misleading name - as it really improves item creation speed, not item updating speed. But as you also point out, it improves your import speed massively :-)
Since I wrote that post, I've added a few new things to my normal routines when doing imports.
Regularly shrink your databases. They tend to grow large and bulky. To do this, first go to Sitecore Control Panel -> Database and select "Clean Up Database". After this, do a regular ShrinkDB on your SQL server.
Disable indexes, especially if importing into the "master" database. For reference, see http://intothecore.cassidy.dk/2010/09/disabling-lucene-indexes.html
Try not to import into "master", however. You will usually find that imports into "web" are a lot faster, mostly because this database isn't (by default) connected to the HistoryManager or other gadgets.
And if you're really adventurous, there's one more thing you could try that I'd been considering trying out myself but never got around to. It might work, but I can't guarantee that it will :-)
Try removing all your field types from App_Config/FieldTypes.config. The theory here is that this should essentially disable all of Sitecore's special handling of the content of these fields (like updating the LinkDatabase and so on). You would need to manually trigger a rebuild of the LinkDatabase when done with the import, but that's a relatively small price to pay.
Hope this helps a bit :-)
I'm guessing you've already hit this, but putting the code inside a SecurityDisabler() block may speed things up also.
I'd be a lot more worried about how Sitecore performs with this much data... assuming you only do the import once, who cares how long that process takes. Is this going to be a regular occurrence?
There are different "application memory" options (like 80MB...200MB) on a Django-friendly host called WebFaction, and I'm confused about which one I should buy.
Could someone please walk me through how to figure out how much memory my project might require (excluding the operating system, the main Apache server, and the database servers' memory requirements)? I understand that in theory I'll need to perform some kind of load testing, but I thought there might be a way to estimate it in advance with some simple, relatively easy-to-understand approach.
I don't know how strictly they enforce the application memory usage limit. And another question: what will happen if more users come to the site, and more threads start, than I expected? Will the application crash? Or will delays just become uncomfortable?
And, no, the application is not ready yet (I can't measure anything right now). The development environment, if it matters, is Windows 7, 64-bit. The hosting itself is some kind of Linux, I think.
(Sorry if it's not a stackoverflow question.)
Sorry, but until you have the application completely developed, you can't say anything about the amount of memory it'll use. I recommend that you take their "lowest" plan and upgrade it to fit your needs, or better still: get hosting after you finish developing the application.
On the other hand, if you had the application ready, you could just run it under Apache with your host's config and some sample data to get a rough estimate...
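If you do get that far, one rough sanity check is to ask the OS for the process's peak resident memory from a Django shell; a minimal sketch (on Linux, ru_maxrss is reported in kilobytes):

import resource

# Peak resident memory of the current process so far
print(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)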
I agree that you can't tell much before your app is ready.
As a vague estimate, consider that your host is supposed to be "Django-friendly", so a "basic" application should run without problems. Try it, and upgrade later if that's easily possible.
Also consider the type of data that is processed by your app; e.g., I ran into trouble once when I had to process really large image uploads that made the whole site crash.
Also keep in mind whether you need some RAM for additional processes, e.g. memcached!
Webfaction are indeed a Django-friendly host, and your application will certainly not crash if it starts needing more memory than you have paid for. What will happen is that you will be allowed to use small amounts of extra memory, but if you consistently go over the limit they will send you a polite email requesting that you either reduce the load or pay for more.
My Django application has become painfully slow in production. Probably it is due to some complex or unindexed queries.
Is there any django-ish way to profile my application?
Try the Django Debug Toolbar. It will show you what queries are executed on each page and how much time they take. It's a really useful, powerful and easy to use tool.
Also, read the recommendations about Django performance in Database access optimization from the documentation, and the Django performance tips by Jacob Kaplan-Moss.
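For reference, a minimal sketch of wiring the toolbar into settings.py, with paths as given in its documentation (details vary by version):

INSTALLED_APPS += ["debug_toolbar"]
MIDDLEWARE += ["debug_toolbar.middleware.DebugToolbarMiddleware"]
INTERNAL_IPS = ["127.0.0.1"]  # the toolbar only renders for these client IPs
# Newer versions also need debug_toolbar.urls included in your URLconf.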
Just type "django-profiling" into Google and you'll get these links (and more):
http://code.djangoproject.com/wiki/ProfilingDjango
http://code.google.com/p/django-profiling/
http://www.rkblog.rk.edu.pl/w/p/django-profiling-hotshot-and-kcachegrind/
Personally I'm using the middleware approach - i.e., each user can toggle a "profiling" flag stored in the session, and if my profiling middleware notices that the flag is set, it uses Python's hotshot module like this:
import hotshot
from django.conf import settings

class ProfilingMiddleware(object):
    def process_view(self, request, view_func, view_args, view_kwargs):
        # Setup things here; settings.DEBUG = True also gets you a SQL dump
        # in connection.queries
        fname = "/tmp/view.prof"  # where the hotshot log is written
        profiler = hotshot.Profile(fname)
        response = profiler.runcall(view_func, request, *view_args, **view_kwargs)
        profiler.close()
        # process results here, e.g. with hotshot.stats.load(fname)
        return response
EDIT: For profiling SQL queries, http://github.com/robhudson/django-debug-toolbar mentioned by Konstantin is a nice thing - but if your queries are really slow (probably because there are hundreds or thousands of them), then you'll be waiting an insane amount of time until the page loads in a browser - and then it'll be hard to browse due to slowness. Also, django-debug-toolbar is by design unable to give useful insight into the internals of AJAX requests.
EDIT2: django-extensions has a great profiling command built in:
https://github.com/django-extensions/django-extensions/blob/master/docs/runprofileserver.rst
Just do this and voila:
$ mkdir /tmp/my-profile-data
$ ./manage.py runprofileserver --kcachegrind --prof-path=/tmp/my-profile-data
For profiling data access (which is where the bottleneck is most of the time) check out django-live-profiler. Unlike Django Debug Toolbar it collects data across all requests simultaneously and you can run it in production without too much performance overhead or exposing your app internals.
Shameless plug here, but I recently made https://github.com/django-silk/silk for this purpose. It's somewhat similar to django-debug-toolbar, but with history, code profiling, and more fine-grained control over everything.
For all you KCacheGrind fans, I find it's very easy to use the shell in tandem with Django's fantastic test Client for generating profile logs on-the-fly, especially in production. I've used this technique now on several occasions because it has a light touch - no pesky middleware or third-party Django applications are required!
For example, to profile a particular view that seems to be running slow, you could crack open the shell and type this code:
from django.test import Client
import hotshot
c = Client()
profiler = hotshot.Profile("yourprofile.prof") # saves a logfile to your pwd
profiler.runcall(c.get, "/pattern/matching/your/view/")
profiler.close()
To visualize the resulting log, I've used hotshot2cachegrind:
http://kcachegrind.sourceforge.net/html/ContribPython.html
But there are other options as well:
http://www.vrplumber.com/programming/runsnakerun/
https://code.djangoproject.com/wiki/ProfilingDjango
I needed to profile a Django app recently and tried many of these suggestions. I ended up using pyinstrument instead, which can be added to a Django app using a single update to the middleware list and provides a stack-based view of the timings.
Quick summary of my experience with some other tools:
Django Debug Toolbar is great if the issue is due to SQL queries, and it works well in combination with pyinstrument.
django-silk works well, but requires adding a context manager or decorator to each part of the stack where you want sub-request timings. It also provides an easy way to access cProfile timings and automatically displays AJAX timings, both of which can be really helpful.
djdt-flamegraph looked promising, but the page never actually rendered on my system.
Compared to the other tools I tried, pyinstrument was dramatically easier to install and to use.
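As an illustration of that single middleware update, a sketch of the settings.py change (the middleware path is from pyinstrument's documentation; appending ?profile to a request URL then shows the profile):

MIDDLEWARE = [
    "pyinstrument.middleware.ProfilerMiddleware",
    # ... your other middleware ...
]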
When the views return something other than HTML, for example JSON, use simple middleware methods for profiling.
Here are a couple examples:
https://gist.github.com/1229685 - captures all SQL calls issued from the view
https://gist.github.com/1229681 - profiles all method calls used to create the view
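In the same spirit, a minimal sketch of such a middleware (the class name is mine; it assumes DEBUG = True, since that is what populates connection.queries):

from django.db import connection

class SqlLogMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        response = self.get_response(request)
        # Works for JSON/AJAX views too, since nothing is rendered into the page
        for query in connection.queries:
            print(query["time"], query["sql"])
        return response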
You can use line_profiler.
It displays a line-by-line analysis of your code, with the time spent alongside each line (when a line is hit several times, the time is summed as well).
It's normally used on non-Django Python code, but there's a little trick to use it with Django: https://stackoverflow.com/a/68163807/1937033
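To get a feel for the output, here's a minimal sketch on a plain (non-Django) function, where compute() is just a stand-in:

from line_profiler import LineProfiler

def compute(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

lp = LineProfiler()
wrapped = lp(compute)  # wrapping the function times every line in it
wrapped(100000)
lp.print_stats()       # per-line hit counts and timings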
I am using silk for live profiling and inspection of Django applications. It's a great tool. You can have a look at it:
https://github.com/jazzband/django-silk
We've got some really strange extraneous DB hits happening in our project. Is there any way to monitor where the requests are coming from, possibly by line number? The SQL printing middleware helps, but we've looked everywhere those kinds of requests might be generated and can't find the source.
If the above isn't possible, any pointers on narrowing down the source would be greatly appreciated.
To find the code executing queries, you can install django-debug-toolbar to figure out what commands are being executed and which tables they're operating on.
Once you've done that, try hooking into the appropriate Django signals for those models and using print and assert to narrow down the code (see the sketch below).
I'm sure there's a better way to do some of this (a python debugger?) but this is the first thing that comes to mind and probably what I would end up doing myself.
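To illustrate the signal idea: a hypothetical sketch that prints a stack trace whenever a suspect model is saved, so you can see exactly which code path triggered the write (SuspectModel and myapp are placeholders for your own code):

import traceback
from django.db.models.signals import post_save
from django.dispatch import receiver

from myapp.models import SuspectModel  # hypothetical model

@receiver(post_save, sender=SuspectModel)
def log_save_origin(sender, instance, **kwargs):
    # The stack trace shows the call path that led to this save
    traceback.print_stack()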
If you want to track SQL queries for performance optimization and debugging purposes, and to monitor query calls in Django, this blog post will help you out:
Tracking SQL Queries for a Request using Django