How can I prevent Django, for testing purposes, from automatically fetching related tables not specified in the select_related() call during the intial query?
I have a large application where I make significant use of
select_related() to bring in related model data during each original
query. All select_related() calls are used to specify the specific related models, rather than relying on the default, e.g. select_related('foo', 'bar', 'foo__bar')
As the application has grown, the select_related calls haven't
completely kept up, leaving a number of scenarios where Django happily
and kindly goes running off to the database to fetch related model
rows. This significantly increases the number of database hits, which
I obviously don't want.
I've had some success in tracking these down by checking the queries
generated using the django.db.connection.queries collection, but some
remain unsolved.
I've tried to find a suitable patch location in the django code to raise an
exception in this scenario, making the tracking much easier, but tend
to get lost in the code.
Thanks.
After some more digging, I've found the place in the code to do this.
The file in question is django/db/models/fields/related.py
You need to insert two lines into this file.
Locate class "SingleRelatedObjectDescriptor". You need to change the function __get__() as follows:
def __get__(self, instance, instance_type=None):
if instance is None:
return self
try:
return getattr(instance, self.cache_name)
except AttributeError:
raise Exception("Automated Database Fetch on %s.%s" % (instance._meta.object_name, self.related.get_accessor_name()))
# leave the old code here for when you revert!
Similarly, in class "ReverseSingleRelatedObjectDescriptor" further down the code, you again need to change __get__() to:
def __get__(self, instance, instance_type=None):
if instance is None:
return self
cache_name = self.field.get_cache_name()
try:
return getattr(instance, cache_name)
except AttributeError:
raise Exception("Automated Database Fetch on %s.%s" % (instance._meta.object_name, self.field.name))
# BEWARE: % parameters are different to previous class
# leave old code here for when you revert
Once you've done this, you'll find that Django raises an exception every time it performs an automatic database lookup. This is pretty annoying when you first start, but it will help you track down those pesky database lookups. Obviously, when you've found them all, it's probably best to revert the database code back to normal. I would only suggest using this during a debugging/performance investigation phase and not in the live production code!
So, you're asking how to stop a method from doing what it's specifically designed to do? I don't understand why you would want to do that.
However, one thing to know about select_related is that it doesn't automatically follow relationships which are defined as null=True. So if you can set your FKs to that for now, the relationship won't be followed.
Related
Apologies if my question is very similar to this one and my approach to trying to solve the issue is 100% based on the answers to that question but I think this is slightly more involved and may target a part of Django that I do not fully understand.
I have a CMS system written in Django 1.5 with a few APIs accessible by two desktop applications which cannot make use of cookies as a browser does.
I noticed that every time an API call is made by one of the applications (once every 3 seconds), a new entry is added to django_session table. Looking closely at this table and the code, I can see that all entries to a specific URL are given the same session_data value but a different session_key. This is probably because Django determines that when one of these calls is made from a cookie-less application, the request.session._session_key is None.
The result of this is that thousands of entries are created every day in django_session table and simply running ./manage clearsessions using a daily cron will not remove them from this table, making whole database quite large for no obvious benefit. Note that I even tried set_expiry(1) for these requests, but ./manage clearsessions still doesn't get rid of them.
To overcome this problem through Django, I've had to override 3 Django middlewares as I'm using SessionMiddleware, AuthenticationMiddleware and MessageMiddleware:
from django.contrib.sessions.middleware import SessionMiddleware
from django.contrib.auth.middleware import AuthenticationMiddleware
from django.contrib.messages.middleware import MessageMiddleware
class MySessionMiddleware(SessionMiddleware):
def process_request(self, request):
if ignore_these_requests(request):
return
super(MySessionMiddleware, self).process_request(request)
def process_response(self, request, response):
if ignore_these_requests(request):
return response
return super(MySessionMiddleware, self).process_response(request, response)
class MyAuthenticationMiddleware(AuthenticationMiddleware):
def process_request(self, request):
if ignore_these_requests(request):
return
super(MyAuthenticationMiddleware, self).process_request(request)
class MyMessageMiddleware(MessageMiddleware):
def process_request(self, request):
if ignore_these_requests(request):
return
super(MyMessageMiddleware, self).process_request(request)
def ignore_these_requests(request):
if request.POST and request.path.startswith('/api/url1/'):
return True
elif request.path.startswith('/api/url2/'):
return True
return False
Although the above works, I can't stop thinking that I may have made this more complex that it really is and that this is not the most efficient approach as 4 extra checks are made for every single request.
Are there any better ways to do the above in Django? Any suggestions would be greatly appreciated.
Dirty hack: removing session object conditionally.
One approach would be including a single middleware discarding the session object conditional to the request. It's a bit of a dirty hack for two reasons:
The Session object is created at first and removed later. (inefficient)
You're relying on the fact that the Session object isn't written to the database yet at that point. This may change in future Django versions (though not very likely).
Create a custom middleware:
class DiscardSessionForAPIMiddleware(object):
def process_request(self, request):
if request.path.startswith("/api/"): # Or any other condition
del request.session
Make sure you install this after the django.contrib.sessions.middleware.SessionMiddleware in the MIDDLEWARE_CLASSES tuple in your settings.py.
Also check that settings.SESSION_SAVE_EVERY_REQUEST is set to False (the default). This makes it delay the write to the database until the data is modified.
Alternatives (untested)
Use process_view instead of process_request in the custom middleware so you can check for the view instead of the request path. Advantage: condition check is better. Disadvantage: other middleware might already have done something with the session object and then this approach fails.
Create a custom decorator (or a shared base class) for your API views deleting the session object in there. Advantage: responsibility for doing this will be with the views, the place where you probably like it best (view providing the API). Disadvantage: same as above, but deleting the session object in an even later stage.
Make sure your settings.SESSION_SAVE_EVERY_REQUEST is set to False. That will go a long way in ensuring sessions aren't saved every time.
Also, if you have any ajax requests going to your server, ensure that the request includes the cookie information so that the server doesn't think each request belongs to a different person.
I have made a django app that creates models and database tables on the fly. This is, as far as I can tell, the only viable way of doing what I need. The problem arises of how to pass a dynamically created model between pages.
I can think of a few ways of doing such but they all sound horrible. The methods I can think of are:
Use global variables within views.py. This seems like a horrible hack and likely to cause conflicts if there are multiple simultaneous users.
Pass a reference in the URL and use some eval hackery to try and refind the model. This is probably stupid as the model could potentially be garbage collected en route.
Use a place-holder app. This seems like a bad idea due to conflicts between multiple users.
Having an invisible form that posts the model when a link is clicked. Again very hacky.
Is there a good way of doing this, and if not, is one of these methods more viable than the others?
P.S. In case it helps my app receives data (as a json string) from a pre-existing database, and then caches it locally (i.e. on the webserver) creating an appropriate model and table on the fly. The idea is then to present this data and do various filtering and drill downs on it with-out placing undue strain on the main database (as each query returns a few hundred results out of a database of hundreds of millions of data points.) W.R.T. 3, the tables are named based on a hash of the query and time stamp, however a place-holder app would have a predetermined name.
Thanks,
jhoyla
EDITED TO ADD: Thanks guys, I have now solved this problem. I ended up using both answers together to give a complete answer. As I can only accept one I am going to accept the contenttypes one, sadly I don't have the reputation to give upvotes yet, however if/when I ever do I will endeavor to return and upvote appropriately.
The solution in it's totality,
from django.contrib.contenttypes.models import ContentType
view_a(request):
model = create_model(...)
request.session['model'] = ContentType.objects.get_for_model(model)
...
view_b(request):
ctmodel = request.session.get('model', None)
if not ctmodel:
return Http404
model = ctmodel.model_class()
...
My first thought would be to use content types and to pass the type/model information via the url.
You could also use Django's sessions framework, e.g.
def view_a(request):
your_model = request.session.get('your_model', None)
if type(your_model) == YourModel
your_model.name = 'something_else'
request.session['your_model'] = your_model
...
def view_b(request):
your_model = request.session.get('your_model', None)
...
You can store almost anything in the session dictionary, and managing it is also easy:
del request.session['your_model']
I have a Django view in which I call my_model.save() in a single object (conditionally) in multiple spots. my_model is a normal model class.
save() is commited at once in Django, and thus, the database gets hit several times in the worst case. To prevent this, I defined a boolean variable save_model and set it to True in the case of a object modification. At the end of my view, I check this boolean and call save on my object in needed.
Is there a simpler way of doing this? I tried Djangos transaction.commit_on_success as a view decorator, but the save-calls appear to get queued and committed anyway.
You could look into django-dirtyfields.
Simply use DirtyFieldsMixin as a mixin to your model. You will then be able to check if an object has changed (using obj.is_dirty()) before doing a save().
You can use transaction support everywhere in your code, Django docs say it explicitely:
Although the examples below use view functions as examples, these decorators and context managers can be used anywhere in your code that you need to deal with transactions
But this isn't the thing transactions are for. You can get rid of your boolean variable using some existing app for that, like django-dirtyfields.
But it smells like a bad design. Why do you need to call save multiple times? Are you sure there is no way to call it only once?
There can be two approaches for this... But they are similar... First one is calling save() before returning response.
def my_view(request):
obj = Mymodel.objects.get(...)
if cond1:
obj.attr1 = True
elif cond2:
obj.attr2 = True
else:
obj.attr1 = False
obj.attr2 = False
obj.save()
return .......
Second one is your approach...
But there is no other way to do this, except you define your own decorator or do some other approach, but in fact, you need to check if there is any modification on your model (or you want to save changes to your data).
I'm having some issues with double-posting on my site. I figure a simple unique constraint across all the relevant fields will solve the issue on a database level, but then it just produces a nasty error page for the user. Is there a way I can turn this into a pretty form error instead? Like a non_field_error? Or what approach should I take?
Maybe something like this will help you:
class YourForm(forms.Form):
# Everything as before.
...
def clean(self):
cleaned_data = self.cleaned_data
your_unique_key = cleaned_data['your_unique_key']
if your_unique_key and YourModel.objects.get(your_unique_key=your_unique_key):
raise forms.ValidationError("not unique")
# Always return the full collection of cleaned data.
return cleaned_data
The clean() method will allow you to access all fields of the form which might be useful if you have a combined unique key. Otherwise a (sightly shorter) clean_your_unique_key() might suit you better.
And please note that under rare circumstances (race conditions) the form validation might not report a duplicate entry (but it's of course reported by the database engine). But for most applications the provided example will be the easier and more maintainable one, so I still recommend this approach.
as far as a 'nasty error page' for the user, Django lets you customize your own 500,404 and probably other pages. general info on that:
In order to use the Http404 exception
to its fullest, you should create a
template that is displayed when a 404
error is raised. This template should
be called 404.html and located in the
top level of your template tree.
-- http://docs.djangoproject.com/en/dev/topics/http/views/
another nice way, not as DRY as tux21b's solution but perhaps a little easier to understand for a one-time solution, might be to catch the error intelligently. one way is to do so without even bothering to violate the constraint - a simple query should verify whether the user is about to do something illegal.
okToUpdate=MyModel.objects.filter(parameters=values...).count()
if okToUpdate>0: # an object already exists
errorExists=True
errors={customError:customMessage}
...
if errorExists:
return render_to_response(errors,'customErrorPage.html')
else:
# return whatever you normally would return
you then use render_to_response to render a custom error page.
(another way is to allow the database violation to occur, then catch that error and do the same thing... i theorize that a DB gets slightly less stress doing a lookup than handling an exception but it's up to you how you like to do things).
JB
If there a way to protect against concurrent modifications of the same data base entry by two or more users?
It would be acceptable to show an error message to the user performing the second commit/save operation, but data should not be silently overwritten.
I think locking the entry is not an option, as a user might use the "Back" button or simply close his browser, leaving the lock for ever.
This is how I do optimistic locking in Django:
updated = Entry.objects.filter(Q(id=e.id) && Q(version=e.version))\
.update(updated_field=new_value, version=e.version+1)
if not updated:
raise ConcurrentModificationException()
The code listed above can be implemented as a method in Custom Manager.
I am making the following assumptions:
filter().update() will result in a single database query because filter is lazy
a database query is atomic
These assumptions are enough to ensure that no one else has updated the entry before. If multiple rows are updated this way you should use transactions.
WARNING Django Doc:
Be aware that the update() method is
converted directly to an SQL
statement. It is a bulk operation for
direct updates. It doesn't run any
save() methods on your models, or emit
the pre_save or post_save signals
This question is a bit old and my answer a bit late, but after what I understand this has been fixed in Django 1.4 using:
select_for_update(nowait=True)
see the docs
Returns a queryset that will lock rows until the end of the transaction, generating a SELECT ... FOR UPDATE SQL statement on supported databases.
Usually, if another transaction has already acquired a lock on one of the selected rows, the query will block until the lock is released. If this is not the behavior you want, call select_for_update(nowait=True). This will make the call non-blocking. If a conflicting lock is already acquired by another transaction, DatabaseError will be raised when the queryset is evaluated.
Of course this will only work if the back-end support the "select for update" feature, which for example sqlite doesn't. Unfortunately: nowait=True is not supported by MySql, there you have to use: nowait=False, which will only block until the lock is released.
Actually, transactions don't help you much here ... unless you want to have transactions running over multiple HTTP requests (which you most probably don't want).
What we usually use in those cases is "Optimistic Locking". The Django ORM doesn't support that as far as I know. But there has been some discussion about adding this feature.
So you are on your own. Basically, what you should do is add a "version" field to your model and pass it to the user as a hidden field. The normal cycle for an update is :
read the data and show it to the user
user modify data
user post the data
the app saves it back in the database.
To implement optimistic locking, when you save the data, you check if the version that you got back from the user is the same as the one in the database, and then update the database and increment the version. If they are not, it means that there has been a change since the data was loaded.
You can do that with a single SQL call with something like :
UPDATE ... WHERE version = 'version_from_user';
This call will update the database only if the version is still the same.
Django 1.11 has three convenient options to handle this situation depending on your business logic requirements:
Something.objects.select_for_update() will block until the model become free
Something.objects.select_for_update(nowait=True) and catch DatabaseError if the model is currently locked for update
Something.objects.select_for_update(skip_locked=True) will not return the objects that are currently locked
In my application, which has both interactive and batch workflows on various models, I found these three options to solve most of my concurrent processing scenarios.
The "waiting" select_for_update is very convenient in sequential batch processes - I want them all to execute, but let them take their time. The nowait is used when an user wants to modify an object that is currently locked for update - I will just tell them it's being modified at this moment.
The skip_locked is useful for another type of update, when users can trigger a rescan of an object - and I don't care who triggers it, as long as it's triggered, so skip_locked allows me to silently skip the duplicated triggers.
For future reference, check out https://github.com/RobCombs/django-locking. It does locking in a way that doesn't leave everlasting locks, by a mixture of javascript unlocking when the user leaves the page, and lock timeouts (e.g. in case the user's browser crashes). The documentation is pretty complete.
You should probably use the django transaction middleware at least, even regardless of this problem.
As to your actual problem of having multiple users editing the same data... yes, use locking. OR:
Check what version a user is updating against (do this securely, so users can't simply hack the system to say they were updating the latest copy!), and only update if that version is current. Otherwise, send the user back a new page with the original version they were editing, their submitted version, and the new version(s) written by others. Ask them to merge the changes into one, completely up-to-date version. You might try to auto-merge these using a toolset like diff+patch, but you'll need to have the manual merge method working for failure cases anyway, so start with that. Also, you'll need to preserve version history, and allow admins to revert changes, in case someone unintentionally or intentionally messes up the merge. But you should probably have that anyway.
There's very likely a django app/library that does most of this for you.
Another thing to look for is the word "atomic". An atomic operation means that your database change will either happen successfully, or fail obviously. A quick search shows this question asking about atomic operations in Django.
The idea above
updated = Entry.objects.filter(Q(id=e.id) && Q(version=e.version))\
.update(updated_field=new_value, version=e.version+1)
if not updated:
raise ConcurrentModificationException()
looks great and should work fine even without serializable transactions.
The problem is how to augment the deafult .save() behavior as to not have to do manual plumbing to call the .update() method.
I looked at the Custom Manager idea.
My plan is to override the Manager _update method that is called by Model.save_base() to perform the update.
This is the current code in Django 1.3
def _update(self, values, **kwargs):
return self.get_query_set()._update(values, **kwargs)
What needs to be done IMHO is something like:
def _update(self, values, **kwargs):
#TODO Get version field value
v = self.get_version_field_value(values[0])
return self.get_query_set().filter(Q(version=v))._update(values, **kwargs)
Similar thing needs to happen on delete. However delete is a bit more difficult as Django is implementing quite some voodoo in this area through django.db.models.deletion.Collector.
It is weird that modren tool like Django lacks guidance for Optimictic Concurency Control.
I will update this post when I solve the riddle. Hopefully solution will be in a nice pythonic way that does not involve tons of coding, weird views, skipping essential pieces of Django etc.
To be safe the database needs to support transactions.
If the fields is "free-form" e.g. text etc. and you need to allow several users to be able to edit the same fields (you can't have single user ownership to the data), you could store the original data in a variable.
When the user committs, check if the input data has changed from the original data (if not, you don't need to bother the DB by rewriting old data),
if the original data compared to the current data in the db is the same you can save, if it has changed you can show the user the difference and ask the user what to do.
If the fields is numbers e.g. account balance, number of items in a store etc., you can handle it more automatically if you calculate the difference between the original value (stored when the user started filling out the form) and the new value you can start a transaction read the current value and add the difference, then end transaction. If you can't have negative values, you should abort the transaction if the result is negative, and tell the user.
I don't know django, so I can't give you teh cod3s.. ;)
From here:
How to prevent overwriting an object someone else has modified
I'm assuming that the timestamp will be held as a hidden field in the form you're trying to save the details of.
def save(self):
if(self.id):
foo = Foo.objects.get(pk=self.id)
if(foo.timestamp > self.timestamp):
raise Exception, "trying to save outdated Foo"
super(Foo, self).save()