Order by count of ManyToMany field in django

Order by count of ManyToMany field in django - django

Has anyone had difficulty with this type of api call:
projects.annotate(votes_count=Count('votes')).order_by('votes_count')
It's excruciatingly slow for me and I found this related issue:
https://code.djangoproject.com/ticket/17144
I'm wondering if anyone else is experiencing issues related to this or has good work arounds? It seems like a common enough api call that I'm surprised the bug doesn't have more traction.

The ticket specifically mentions MySQL. I'm using PostgreSQL and use annotations all the time with no noticeable slowness. However, I've never specifically checked the generated SQL to see if it's doing the same thing.
If you are using MySQL, it would seem you potentially have the following choices (albeit, none of them particular awesome):
Switch over to PostgreSQL. If my anecdotal evidence and the lack of complaints regarding PostgreSQL in the ticket are any indication, it might not be a problem then.
Live with it for now, upgrade to 1.4 when it's released.
Run off of trunk to get the fix now (not generally a good idea) or try to patch your current installation.
Use raw() as #jknupp suggests.
4 is probably the easiest and best approach for the moment. It's not really a problem unless you switch database engines at some point in the future, and you can always just make a note to yourself in the code to do it the right way when Django is updated. I typically do something like the following in similar scenarios:
"""Django has a bug in version 1.3 (see ticket:
https://code.djangoproject.com/ticket/17144), resulting in unnecessary fields
being included in the GROUP BY clause and subsequently very slow queries. The raw
query below can be replaced with the commented line once the issue has been
corrected.
"""
# projects.annotate(votes_count=Count('votes')).order_by('votes_count')
projects.raw(...)

To get around this, you can use the raw() SQL query. It's not the prettiest way to do it, but it will work. See the documentation here

Related

django built-in support for MongoDB

I'm trying to find any information if official django is going to support any noSQL DBMS, especially MongoDB. I found a fork of django 1.3 the django-nonrel (a fork of official django) and some other not very reliable projects (failures occur often, according to comments I found on the web). Is django going to support noSQL officially at all?

Perhaps, there are other ways to achieve your goals, besides going noSQL.
In short, if you just need dynamic fields, you have other options. I have an extensive writeup about them in another answer:
Entity–attribute–value model (Django-eav)
PostgreSQL hstore (Django-hstore)
Dynamic models based on migrations (Django-mutant)
Yes, that's not exactly what you've asked for, but that's all that we've currently got.

As you said, forked code is never the best alternative: changes take longer to get into the fork, it might break things... And even with django-nonrel, is not really Django as you loose things like model inheritance, M2M... basically anything that will need to do a JOIN query behind the scenes.
Is Django going to support NoSQL? As far as I know, there's no plans on the roadmap for doing so in the short run. According to Russell Keith-Magee on his talk on PyCon Russia 2013, "NoSQL" is on the roadmap but in the long term, as well as SQLAlchemy. So if you wanna wait, is going to take a long time, I'm afraid.
Anyway, even if it's not ideal, you still can use Django but use something else as a ORM. Nothing stops you from use vanilla Django and something like MongoDB instead of Django ORM.

Django N+1 query solution

I visited http://guides.rubyonrails.org/active_record_querying.html after talking with a peer regarding N+1 and the serious performance implications of bad DB queries.
ActiveRecord (Rails):
clients = Client.includes(:address).limit(10)
Where client's have addresses, and I intend to access them while looping through the clients, Rails provides includes to let it know to go ahead and add them to the query, which eliminates 9 queries right off the bat.
Django:
https://github.com/lilspikey/django-batch-select provides batch query support. Do you know of other libraries or tricks to achieve what Rails provides above, but in a less verbose manor (as in the rails example wherein just 19 chars fix N+1 and is very clear)? Also, does batch-select address the concern in the same way, or are these two different things?
BTW, I'm not asking about select_related, though it may seem to be the answer at first glance. I'm speaking of a situation where address has a forign key to client.

You can do it with prefetch_related since Django 1.4:
https://docs.djangoproject.com/en/dev/ref/models/querysets/#prefetch-related
If you're using < 1.4, have a look at this module:
https://github.com/ionelmc/django-prefetch
It claims to be more flexible than Django's prefetch_related. Verbose but works great.

Unfortunately, Django's ORM as of yet has no way of doing this.
Fortunately, it is possible to do it in only 2 queries, with a bit of work done in Python.
clients = list(Client.objects.all()[:10])
addresses = dict((x.client_id, x) for x in
Address.objects.filter(client__in=clients))
for client in clients:
print client, addresses[client.id]

django-batch-select is supposed to provide an answer to this problem, though I haven't tried it out. Ignacio's answer above seems best to me.

Django : looking for a good LDAP manipulation library

I am looking for a good ldap library on Django, that would allow me to manage my ldap server :
adding, modifying, deleting entries
for groups, users, and all kind of objects
The library django-ldapdb looked promising, it offers a Model base class that can be used to declare ldap objects in a Django fashion (which is what we ideally want), however we've had some bugs with it, and furthermore it seems like it is not maintained any more.
Does somebody know a good library that could do the trick ? Otherwise I guess I'll just try to improve and debug django-ldapdb ...
Thanks !

sebpiq, you say you applied "one or two fixes" to django-ldapdb, would you care to share them? So far django-ldapdb meets my needs, but I'd be happy to integrate any fixes you might have.

When using ldapdb to query ldap with more results than the server allows instead of getting the partial list (of say the first 500 users) I get SIZELIMIT_EXCEEDED exception. Trying to change the code to catch that exception resulted in an empty result objects.
Anyone else had that problem?
I fixed that problem by changing the search_s function to use search_ext and read the results one by one until the exception happens.

http://www.python-ldap.org/doc/html/index.html
The beauty of Django is that you can use any python module within your application.

There is also django-auth-ldap which claims
LDAP configuration can be as simple as a single distinguished name template, but there are many rich options for working with User objects, groups, and permissions.

Actually, I have found out that with one or two fixes, django-ldapdb is a pretty good library. The only bad point is that it is not very actively maintained... I will use it anyways, because it is the best solution I have found.

How do the big sites handle immediate schema changes whilst using Django?

I've been using south but I really hate having to manually migrate data all over again even if I make one small itty bitty update to a class. If I'm not using django I can easily just alter the table schema and make an adjustment in a class and I'm good.
I know that most people would probably tell me to properly think out the schema way in advance, but realistically speaking there are times where you need to immediately make changes, and I don't think using south is ideal for this.
Is there some sort of advanced method people use, perhaps even modifying the core of Django itself? Or is there something about south that I'm not just grokking?

I really hate having to manually migrate data all over again even if I make one small itty bitty update to a class.
Can you specify what kind of updates? If you mean adding new fields or editing existing ones then obviously yes. If you mean modifying methods that operate on fields then there is no need to migrate.
I know that most people would probably tell me to properly think out the schema way in advance
It would certainly help to think it over a couple of times. Experience helps too. But obviously you cannot foresee everything.
but realistically speaking there are times where you need to immediately make changes, and I don't think using south is ideal for this.
Honestly I am not convinced by this argument. If changes can be deployed "immediately" using SQL then I'd argue that they can be deployed using South as well. Especially if you have automated your deployment using Fabric or such.
Also I find it hard to believe that the time taken to execute a migration using a generated script can be significantly greater than the time taken to first write the appropriate SQL and then execute it. At least this has not been the case in my experience.
The one exception could be a situation where the ORM doesn't readily have an equivalent for the SQL. In that case you can still execute the raw SQL through your (South) migration script.
Or is there something about south that I'm not just grokking?
I suspect that you are not grokking the idea of having orderly, version-controlled, reversible migrations. SQL-only migrations are not always designed to be reversible (I know there are exceptions). And they are not orderly unless the developers take particular care about keeping them so. I've even seen people fire potentially troublesome updates on production without even pausing to start a transaction first and then discard the SQL without even making a record of it.
I'm not questioning your skills or attention to detail here; I'm just pointing out what I think is your disconnect with South.
Hope this helps.

Search engine solution for Django that actually works?

The story so far:
Decided to go with Xapian as search backend because it has all search-engine features I was looking for, knows about Unicode, stemming, has few dependencies and requires no bloated app-server installation on top of it.
Tried Django and Haystack (plus xapian-haystack, the backend glue code to tie Haystack to Xapian) because it was advertised on quite some blogs as "working". Did not work. Neither django-haystack nor the xapian-haystack project provide a version combination that actually works together. MASTER from both projects yields an error from Xapian, so it's not stable at all. Haystack 1.0.1 and xapian-haystack 1.0.x/1.1.0 are not API-compatible. Plus, in a minimally working installation of Haystack 1.0.1 and xapian-haystack MASTER, any complex query yields zero results due to errors in either django-haystack or xapian-haystack (I double-verified this), maybe because the unit-tests actually test very simple cases, and no edge-cases at all.
Tried Djapian. The source-code is riddled with spelling errors (mind you, in variable names, not comments), documentation is also riddled with ambiguities and outdated information that will never lead to a working installation. Not surprisingly, users rarely ask for features but how to get it working in the first place.
Next on the plate: exploring Solr (installing a Java environment plus Tomcat gives me headaches, the machine is RAM- and CPU-constrained), or Lucene (slightly less headaches, but still).
Before I proceed spending more time with a solution that might or might not work as advertised, I'd like to know: Did anyone ever get an actual, real-world search solution working in Django? I'm serious. I find it really frustrating reading about "large problems mostly solved", and then realizing that you will never get a working installation from the source-code because, actually, all bloggers dealing with those "mostly solved problems" never went past basic installation and copy-pasting the official tutorials.
So here are the requirements:
must be able to search for 10-100 terms in one query
must handle + (term must be present) and - (term must not be present), AND/OR
must handle arbitrary grouping (i.e. parentheses around AND/OR)
must allow for Django-ORM filtering before or after fulltext-search (i.e. pre-/post-processing of results with the full set of filters that Django knows about)
alternatively, there must be a facility to bulk-fetch the result set and transform it into a QuerySet
should be light on the machine, so preferably no humongous JVM and Java-based app-server installation
Is there anything out there that does this? I'm not interested in anecdotal evidence, or references to some blog posts that claim it should be working. I'd like to hear from someone who actually has a fully-functional setup working in the real world, under real conditions, with real queries.
EDIT:
Let me repeat again that I'm not so much interested in anecdotal evidence that someone, somewhere has a somewhat running installation working with unspecified properties. I already went there, I read all the blog posts, mailing lists, I contacted the authors, but when it came to actual implementation of real-world scenarios, nothing ever worked as advertised.
Also, and a user below brought that point up as well, considering the TCO of any project, I'm definitely not interested in hearing that someone, somewhere was able to pull it off once a vendor parachuted in an unknown number of specialists to monkey-patch the whole installation with specific domain-knowledge that's documented nowhere.
So, please, if you claim you have a working installation that actually satisfies minimum requirements for a full-fledged search (see requirements above), please provide the following so that we can all benefit from a search solution for Django that actually solves the problem:
exact Linux distribution, release version,
exact release version of Haystack (or equivalent) and release version of search backend,
exact release version of the search engine
publicly (!) available documentation how to set up all components exactly in the way that your installation was set up such that the minimal requirements above are met.
Thank you.

I have developed some Django applications with xapian support too. The biggest of them has a xapian database with an index of 8G storing 2.4M documents (including forum posts, wiki entries, planet entries and blog entries) - still growing.
Overall I am quite happy with xapian. It performs extremely well and is easy to use. The only thing I don't like is that xapian won't work with mod_wsgi (except of the global mode) because of a deadlock. So you are forced to use fastcgi (or connect to xapian-tcpsrv or write your own service).
I recommend you, to use the xapian-bindings directly. Xapian nowadays offers quite a lot of useful helpers (TermGenerator, QueryParser etc), which makes both the indexing and the querying simple. In fact, there is nothing I can imaging which would justify an additional library. In my opinion they are all more complicated and don't allow you to index efficiently.
The only thing you need, is some understanding of the way how xapian is working. (What are terms? What are values? What is stemming and where should I use it? and so on). You can find all those topics on the xapian website, and as soon as you understand those concepts, dealing with xapian will become easy.
Also, the xapian API is extremly stable. I've started using it a long time before the 1.0 release and never had any problems with API changes or version conflicts. The only thing which has changed is that all those helpers (query parser, tokenizer, etc.) I have once written for my Django project are now useless, because similar classes have made their way into the xapian core.
So, to summarize, just give the direct usage of xapian-bindings a try.

I can vouch for Django-Haystack with the Xapian backend (In the interest of full disclosure, I am the author of the xapian-haystack backend) in a real life, production environment. We currently use Haystack/Xapian on several sites, the largest of which has more than 20,000 registered users and a Xapian database with 20,000+ documents containing more than 143,000 unique terms for a total size of ~141mb.
As for not being able to get any combination of Haystack and the Xapian backend running, I'll admit that I was not as diligent as I should have been with my tagging and so there is some confusion with the versions. You should, however, be able to use the current master of both codebases without any issue. If this is not case, I'd be more than happy to assist with problems. You'll need to be a little bit more specific about the issue though. Simply saying "it did not work" is not enough information.
Daniel and I both do our best to respond to any issues opened on Github within a timely manner. Also, we're both usually available on the #haystack IRC channel during the day and the django-haystack Google Group.
Versions used:
Haystack 1.0BETA with Xapian-Haystack 1.1.0BETA
Haystack 1.0.1FINAL with Xapian-Haystack 1.1.3BETA
Most of the sites we've deployed with Haystack have been running Ubuntu 8.04 LTS with Xapian 1.0.5

Short answer: No.
We bailed and went with a Google Custom Search. Although the site has over 10,000 possible page views, we keep the sitemap feed down to the main 4,000 pages or so and it costs $250/year, which is about 2 hours of my time. The customer is happy and he feels comfortable with the results.
I'd love to see someone come up with a good FOSS solution, but in a commercial situation the TCO has got to make economic sense.

The details you requested.
exact Linux distribution, release version - Ubuntu 9.04 & 9.10
exact release version of Haystack (or equivalent) - Haystack 1.0 as well as master
release version of search backend - The Solr & Whoosh backends included with Haystack
exact release version of the search engine - Solr 1.3, Solr 1.4 & Whoosh 0.3.15
publicly (!) available documentation how to set up all components exactly in the way that your installation was set up such that the minimal requirements above are met.
http://docs.haystacksearch.org/dev/installing_search_engines.html#solr (or #whoosh)
Beyond this, it's the standard configuration bits from the tutorial, plus any additional overrides from (which I can't link to, thanks Stack Overflow) as needed.
As the maintainer of Haystack, I'm actively running all of the above previous setups. The smallest Haystack installation (Haystack 1.0 + Whoosh) is ~600 documents. A slightly larger one (Haystack master + Solr 1.4) is ~4000 documents. The largest deployment I'm aware of (Haystack master + Solr 1.4) is ~3 million documents.
I generally try to avoid Stack Overflow, so don't be surprised if you see nothing further from me. The mailing list is the best place for support, but given your responses thus far, I'm sure you'd rather just trash me here.

I (and my colleagues) have successfully used Haystack to achieve a fairly good search functionality.
It is easy to start with haystack and whoosh backend; and change to the Apache-Solr backend when performance of whoosh is not acceptable.
We really got to get around to write a detailed post about it with links to the projects where it works.
For now I can suggest you to have a look at this search: http://www.webdevjobshq.com/search/?q=rails implemented using Haystack with Apache-Solr backend. Or this: http://www.govbuddy.com/search/?q=Roy

Have you considered Sphinx? What are you using as you data store? It has a MySQL engine that works terrific. I think it meet most of your requirements except I'm not exactly certain how nicely it can be tied into Django-ORM.
I'm heavily considering using Sphinx in one of my own Django Apps to improve performance on an auto-suggest field that does a prefix and infix search on a corpus of 3.5 million records. But I haven't got around to implementing it yet, so I can't speak to Django+Sphinx integration. My only Sphinx experience is with the MySQL Engine and directly querying MySQL.

I use Djapian. It was quite simple to install and works great. There is an actual tutorial that covers basic use-cases and shows entire integration process.
Yes, it has some ambiguities but issue tracker is open and authors rapidly fixes bugs and add features.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js