Django N+1 query solution - django

I visited http://guides.rubyonrails.org/active_record_querying.html after talking with a peer regarding N+1 and the serious performance implications of bad DB queries.
ActiveRecord (Rails):
clients = Client.includes(:address).limit(10)
Where client's have addresses, and I intend to access them while looping through the clients, Rails provides includes to let it know to go ahead and add them to the query, which eliminates 9 queries right off the bat.
Django:
https://github.com/lilspikey/django-batch-select provides batch query support. Do you know of other libraries or tricks to achieve what Rails provides above, but in a less verbose manor (as in the rails example wherein just 19 chars fix N+1 and is very clear)? Also, does batch-select address the concern in the same way, or are these two different things?
BTW, I'm not asking about select_related, though it may seem to be the answer at first glance. I'm speaking of a situation where address has a forign key to client.

You can do it with prefetch_related since Django 1.4:
https://docs.djangoproject.com/en/dev/ref/models/querysets/#prefetch-related
If you're using < 1.4, have a look at this module:
https://github.com/ionelmc/django-prefetch
It claims to be more flexible than Django's prefetch_related. Verbose but works great.

Unfortunately, Django's ORM as of yet has no way of doing this.
Fortunately, it is possible to do it in only 2 queries, with a bit of work done in Python.
clients = list(Client.objects.all()[:10])
addresses = dict((x.client_id, x) for x in
Address.objects.filter(client__in=clients))
for client in clients:
print client, addresses[client.id]

django-batch-select is supposed to provide an answer to this problem, though I haven't tried it out. Ignacio's answer above seems best to me.

Related

When to use Haystack/ElasticSearch vs Django's ORM

So I implemented Haystack with ElasticSearch a week ago within our BETA application. One thing I can notice is that getting some data (large amount) back to our users (for example listing all the users within the application) is much faster by going through Haystack then Django's ORM. Now, I will be releasing a REST service (with TastyPie) to serve the possible tablets within the next weeks, as I want to be able to access the information from iPads, Nexus tablets and so on.
One thing I was wondering, is when should I be querying the ORM vs Haystack/ElasticSearch? For example, if the user on the tablet is requesting a specific set of users, should we let TastyPie query the ORM, or go to ElasticSearch?
If we look at this answer Django: Haystack or ORM, we can all agree that a DB is made to retrieve and write data. However, could we say that retrieving faster can be faster with Haystack/ElasticSearch once the search engine was updated?
I am a bit confused as to when, should we not be querying Haystack if it is much faster?!
To make things clear I guess you're talking about querying Elasticsearch via Haystack without later instantiating any objects for your search results with data from you database.
Some points to consider besides the points mentioned in the other post:
A search engine like Elasticsearch is highly optimized when dealing with full-text searches (When doing something with SQL it highly depends on the database/engine you are using)
Queries that are involving a lot of relations/joins will most like be easier to handle with the ORM, but on the other hand you can eg save data from foreign-key relations in a denormalized fashion when using ES which could give you a performance boost. Of course you can denormalize your database tables as well but this is quite often considered as a bad practice as long as you know what you are doing, eg when solving a performance bottleneck.
ES is somehow quite easy to scale while scaling your SQL DB might be more complicated.
Most likely this is a decision that depends very much on your use case, the amount of data to process and the queries you are intending to run. So the best thing of course is - as always - to do some benchmarking yourself and compare this two solutions. But don't do any premature optimisations as one big advantage of the ORM is to keep things simple - you don't have to care much about the integrity of your data and maintain an additional system.

django built-in support for MongoDB

I'm trying to find any information if official django is going to support any noSQL DBMS, especially MongoDB. I found a fork of django 1.3 the django-nonrel (a fork of official django) and some other not very reliable projects (failures occur often, according to comments I found on the web). Is django going to support noSQL officially at all?
Perhaps, there are other ways to achieve your goals, besides going noSQL.
In short, if you just need dynamic fields, you have other options. I have an extensive writeup about them in another answer:
Entity–attribute–value model (Django-eav)
PostgreSQL hstore (Django-hstore)
Dynamic models based on migrations (Django-mutant)
Yes, that's not exactly what you've asked for, but that's all that we've currently got.
As you said, forked code is never the best alternative: changes take longer to get into the fork, it might break things... And even with django-nonrel, is not really Django as you loose things like model inheritance, M2M... basically anything that will need to do a JOIN query behind the scenes.
Is Django going to support NoSQL? As far as I know, there's no plans on the roadmap for doing so in the short run. According to Russell Keith-Magee on his talk on PyCon Russia 2013, "NoSQL" is on the roadmap but in the long term, as well as SQLAlchemy. So if you wanna wait, is going to take a long time, I'm afraid.
Anyway, even if it's not ideal, you still can use Django but use something else as a ORM. Nothing stops you from use vanilla Django and something like MongoDB instead of Django ORM.

Order by count of ManyToMany field in django

Has anyone had difficulty with this type of api call:
projects.annotate(votes_count=Count('votes')).order_by('votes_count')
It's excruciatingly slow for me and I found this related issue:
https://code.djangoproject.com/ticket/17144
I'm wondering if anyone else is experiencing issues related to this or has good work arounds? It seems like a common enough api call that I'm surprised the bug doesn't have more traction.
The ticket specifically mentions MySQL. I'm using PostgreSQL and use annotations all the time with no noticeable slowness. However, I've never specifically checked the generated SQL to see if it's doing the same thing.
If you are using MySQL, it would seem you potentially have the following choices (albeit, none of them particular awesome):
Switch over to PostgreSQL. If my anecdotal evidence and the lack of complaints regarding PostgreSQL in the ticket are any indication, it might not be a problem then.
Live with it for now, upgrade to 1.4 when it's released.
Run off of trunk to get the fix now (not generally a good idea) or try to patch your current installation.
Use raw() as #jknupp suggests.
4 is probably the easiest and best approach for the moment. It's not really a problem unless you switch database engines at some point in the future, and you can always just make a note to yourself in the code to do it the right way when Django is updated. I typically do something like the following in similar scenarios:
"""Django has a bug in version 1.3 (see ticket:
https://code.djangoproject.com/ticket/17144), resulting in unnecessary fields
being included in the GROUP BY clause and subsequently very slow queries. The raw
query below can be replaced with the commented line once the issue has been
corrected.
"""
# projects.annotate(votes_count=Count('votes')).order_by('votes_count')
projects.raw(...)
To get around this, you can use the raw() SQL query. It's not the prettiest way to do it, but it will work. See the documentation here

Django : looking for a good LDAP manipulation library

I am looking for a good ldap library on Django, that would allow me to manage my ldap server :
adding, modifying, deleting entries
for groups, users, and all kind of objects
The library django-ldapdb looked promising, it offers a Model base class that can be used to declare ldap objects in a Django fashion (which is what we ideally want), however we've had some bugs with it, and furthermore it seems like it is not maintained any more.
Does somebody know a good library that could do the trick ? Otherwise I guess I'll just try to improve and debug django-ldapdb ...
Thanks !
sebpiq, you say you applied "one or two fixes" to django-ldapdb, would you care to share them? So far django-ldapdb meets my needs, but I'd be happy to integrate any fixes you might have.
When using ldapdb to query ldap with more results than the server allows instead of getting the partial list (of say the first 500 users) I get SIZELIMIT_EXCEEDED exception. Trying to change the code to catch that exception resulted in an empty result objects.
Anyone else had that problem?
I fixed that problem by changing the search_s function to use search_ext and read the results one by one until the exception happens.
http://www.python-ldap.org/doc/html/index.html
The beauty of Django is that you can use any python module within your application.
There is also django-auth-ldap which claims
LDAP configuration can be as simple as a single distinguished name template, but there are many rich options for working with User objects, groups, and permissions.
Actually, I have found out that with one or two fixes, django-ldapdb is a pretty good library. The only bad point is that it is not very actively maintained... I will use it anyways, because it is the best solution I have found.

django-nonrel on Google App Engine - Implications of using ListField for ManyToMany

I am working on a Google App Engine application and I am relatively new at this.
I have built an app already in Django and have a model using a field type of ManyToMany.
I am aware that django-nonrel does not support many-to-many field types of Django. So I am considering using ListField instead.
Questions:
- What is the implication of using ListField instead of ManyToMany?
- I am aware that this means that Django's JOIN API cannot be used. But what does this mean for my app?
- Am I going to have problems when it comes to doing a search for something in a many-to-many field?
Apologies if these are programming 101 questions. I'm a designer trying to get my head around development.
Thanks
Well as you probably know, you will be spanning the relationship more manually.
Django cannot help quite as much as when using ManyToMany, but it should not be that big a problem.
Depending on the complexity of the relationship, you might want to consider building a model just for this purpose.
I have never used that approach on GAE, since IMO its only valid when an object has alot relations (more than 50 I would say) or when the lookups you plan to do, will benefit from this. Maybe because they start at either end of the relationship with equal frequency or it would be nice to be able to loop over the relationships to display them or something along those lines.
Last time I made something on GAE I used the ListField (or ListProperty as it was known then) since most of the objects only had about 20 related objects and the lookups would rarely go the other way.
So all in all, its not a big deal and I don't remember it as any kind of a pain to work with/around.
Hope this was helpful despite it being rather "IMO"