Is it a good idea to modify fields in Model instances returned in a QuerySet? Will it kill performance? - django

So, my problem is this. I have a legacy MySQL database that I'm building a shiny new Django application over. For whatever reason, some fairly daft design decisions were made—such as all fields, no matter what they contain, being stored as varchars—but because since other systems that I'm not rewriting depend on that same data I can't destructively change that schema at all.
I want to treat a certain field—the stock quantity on hand—as an integer, so that in my template I can check its amount and display a relevant value (basically, if there are more than 100 items available, I want to just display "100+ Available").
The existing value for stock quantity is stored as, oddly, a varchar holding a float (as if it's possible to have fractional amounts of an item in stock):
item.qty: u"72.0"
Now, I figure as a worst case I can use QuerySet.values(), and iterate over the results, replacing each stock quantity with an int() parsed version of itself. Something like ...
item_list = items.values()
for item in item_list:
item['qty'] = int(float(item['qty']))
... but won't that cause my QuerySet to evaluate itself completely? I confess to being fairly ignorant of the process by which Django handles lazy execution of queries, but it seems like working with actual values would mean evaluating the query before it needs to.
So, am I complaining about nothing (I mean, it's definitely evaluating these values in the template anyway), or is there a better way to do what I need to do?

Yes, iterating through in the view would evaluate the entire queryset. That may or not be what you want - for example, if you're paginating, the paginator limits the query automatically, so you don't want to evaluate the whole thing separately.
I would approach this in a different way, by calling a function in the template to format the data. You can either do this with a method on the model, if there's just one field you want to format:
class Item(models.Model):
...
def get_formatted_qty(self):
return int(float(self.qty))
and call it:
{{ item.get_formatted_qty }}
Or, to make it more general, you can define a custom filter in your templatetags:
#register.filter
def format_value(value):
return int(float(value))
{{ item.qty|format }}

Related

How can I dynamically create multi-level hierarchical forms in Django?

I'm building an advanced search page for a scientific database using Django. The goal is to be able to allow some dynamically created sophisticated searches, joining groups of search terms with and & or.
I got part-way there by following this example, which allows multiple terms anded together. It's basically a single group of search terms, dynamically created, that I can either and-together or or-together. E.g.
<field1|field2> <is|is not|contains|doesn't contain> <search term> <->
<+>
...where <-> will remove a search term row and <+> will add a new row.
But I would like the user to be able to either add another search term row, or add an and-group and an or-group, so that I'd have something like:
<and-group|or-group> <->
<field1|field2> <is|is not|contains|doesn't contain> <search term> <->
<+term|+and-group|_or-group>
A user could then add terms or groups. The result search might end up like:
and-group
compound is lysine
or-group
tissue is brain
tissue is spleen
feeding status is not fasted
Thus the resulting filter would be like the following.
Data.objects.filter(Q(compound="lysine") & (Q(tissue=brain) | Q(tissue="spleen")) & ~Q(feeding_status="fasted"))
Note - I'm not necessarily asking how to get the filter expression below correct - it's just the dynamic hierarchical construction component that I'm trying to figure out. Please excuse me if I got the Q and/or filter syntax wrong. I've made these queries before, but I'm still new to Django, so getting it right off the top of my head here is pretty much guaranteed to be zero-chance. I also skipped the model relationships I spanned here, so let's assume these are all fields in the same model, for simplicity.
I'm not sure how I would dynamically add parentheses to the filter expression, but my current code could easily join individual Q expressions with and or or.
I'm also not sure how I could dynamically create a hierarchal form to create the sub-groups. I'm guessing any such solution would have to be a hack and that there are not established mechanisms for doing something like this...
Here's a screenshot example of what I've currently got working:
UPDATE:
I got really far following this example I found. I forked that fiddle and got this proof of concept working before incorporating it into my Django project:
http://jsfiddle.net/hepcat72/d42k38j1/18/
The console spits out exactly the object I want. And there are no errors. Clicking the search button works for form validation. Any fields I leave empty causes a prompt to fill in the field. Here's a demo gif:
Now I need to process the POST input to construct the query (which I think I can handle) and restore the form above the results - which I'm not quite sure how to accomplish - perhaps a recursive function in a custom tag?
Although, is there a way to snapshot the form and restore it when the results load below it? Or maybe have the results load in a different frame?
I don't know if I'm teaching a grandmother to suck eggs, but in case not, one of the features of the Python language may be useful.
foo( bar = 27, baz = None)
can instead be coded
args = {}
a1, a2 = 'bar', 'baz'
d[a1] = 27
d[a2] = None
foo( **args )
so an arbitrary Q object specified by runtime keys and values can be constructed q1 = Q(**args)
IIRC q1 & q2 and q1 | q2 are themselves Q objects so you can build up a filter of arbitrary complexity.
I'll also include a mention of Django-filter which is usually my answer to filtering questions like this one, but I suspect in this case you are after greater power than it easily provides. Basically, it will "and" together a list of filter conditions specified by the user. The built-in ones are simple .filter( key=value), but by adding code you can create custom filters with complex Q expressions related to a user-supplied value.
As for the forms, a Django form is a linear construct, and a formset is a list of similar forms. I think I might resort to JavaScript to build some sort of tree representing a complex query in the browser, and have the submit button encode it as JSON and return it through a single text field (or just pick it out of request.POST without using a form). There may be some Javascript out there already written to do this, but I'm not aware of it. You'd need to be sure that malicious submission of field names and values you weren't expecting doesn't result in security issues. For a pure filtering operation, this basically amounts to being sure that the user is entitled to get all data in database table in any case.
There's a form JSONField in the Django PostgreSQL extensions, which validates that user-supplied (or Javascript-generated) text is indeed JSON, and supplies it to you as Python dicts and lists.

Get part of django model depending on a variable

My code
PriceListItem.objects.get(id=tarif_id).price_eur
In my settings.py
CURRENCY='eur'
My Question:
I would like to pick the different info depending on the CURRENCY variable in settings.py
Example:
PriceListItem.objects.get(id=tarif_id).price_+settings.CURRENCY
Is it possible?
Sure. This has nothing to do with Django actually. You can reach the instance's attribute through pure Python:
getattr(PriceListItem.objects.get(id=tarif_id), 'price_'+settings.CURRENCY)
Note it might be a better idea to have a method on the model which accepts the currency as a parameter and returns the correct piece of data (through the line I wrote above, for example).
I think this should work
item = PriceListItem.objects.get(id=tarif_id)
value = getattr(item, price_+settings.CURRENCY)
In case you are only interested in that specific column, you can make the query more efficient with .values_list:
my_price = PriceListItem.objects.values_list_(
'price_{}'.format(settings.CURRENCY),
flat=True
).get(id=tarif_id)
This will only fetch that specific column from the database, which can be a (a bit) faster than first fetching the entire row into memory, and then discard all the rest later.
Here my_price is thus not a PriceListItem object, but the value that is stored for the specific price_cur column.
It will thus result in a query that looks like:
SELECT pricelistitem.price_cur
FROM pricelistitem
WHERE id=tarif_id

django - show the length of a queryset in a template

In my html file, how can I output the size of the queryset that I am using (for my debugging purposes)
I've tried
{{ len(some_queryset) }}
but that didn't work. What is the format?
Give {{ some_queryset.count }} a try.
This is better than using len (which could be invoked with {{ some_queryset.__len__ }}) because it optimizes the SQL generated in the background to only retrieve the number of records instead of the records themselves.
some_queryset.count() or {{some_queryset.count}} in your template.
dont use len, it is much less efficient. The database should be doing that work. See the documentation about count().
However, taking buffer's advice into account, if you are planning to iterate over the records anyway, you might as well use len which will involve resolving the queryset and making the resulting rows resident in main memory - this wont go to waste because you will visit these rows anyway. It might actually be faster, depending on db connection latency, but you should always measure.
Just to highlight #Yuji'Tomita'Tomita comment above as a separate answer:
There is a filter called length to call len() on anything.
So you could use:
{{ some_queryset|length }}
The accepted answer is not entirely correct. Whether you should use len() (or the length-filter in a template) vs count() depends on your use case.
If the QuerySet only exists to count the amount of rows, use count().
If the QuerySet is used elsewhere, i.e. in a loop, use len() or |length. Using count() here would issue another SELECT-query to count the rows, while len() simply counts the amount of cached results in the QuerySet.
From the docs:
Note that if you want the number of items in a QuerySet and are also retrieving model instances from it (for example, by iterating over it), it’s probably more efficient to use len(queryset) which won’t cause an extra database query like count() would.
Although it seems that with related objects that you have already eager-loaded using prefetch_related(), you can safely use count() and Django will be smart enough to use the cached data instead of doing another SELECT-query.

Django - How to annotate QuerySet using multiple field values?

I have a model called "Story" that has two integer fields called "views" and "votes". When I retrieve all the Story objects I would like to annotate the returned QuerySet with a "ranking" field that is simply "views"/"votes". Then I would like to sort the QuerySet by "ranking". Something along the lines of...
Story.objects.annotate( ranking=CalcRanking('views','votes') ).sort_by(ranking)
How can I do this in Django? Or should it be done after the QuerySet is retrieved in Python (like creating a list that contains the ranking for each object in the QuerySet)?
Thanks!
PS: In my actual program, the ranking calculation isn't as simple as above and depends on other filters to the initial QuerySet, so I can't store it as another field in the Story model.
In Django, the things you can pass to annotate (and aggregate) must be subclasses of django.db.models.aggregates.Aggregate. You can't just pass arbitrary Python objects to it, since the aggregation/annotation actually happens inside the database (that's the whole point of aggregate and annotate). Note that writing custom aggregations is not supported in Django (there is no documentation for it). All information available on it is this minimal source code: https://code.djangoproject.com/browser/django/trunk/django/db/models/aggregates.py
This means you either have to store the calculations in the database somehow, figure out how the aggregation API works or use raw sql (raw method on the Manager) to do what you do.

Django ORM: Optimizing queries involving many-to-many relations

I have the following model structure:
class Container(models.Model):
pass
class Generic(models.Model):
name = models.CharacterField(unique=True)
cont = models.ManyToManyField(Container, null=True)
# It is possible to have a Generic object not associated with any container,
# thats why null=True
class Specific1(Generic):
...
class Specific2(Generic):
...
...
class SpecificN(Generic):
...
Say, I need to retrieve all Specific-type models, that have a relationship with a particular Container.
The SQL for that is more or less trivial, but that is not the question. Unfortunately, I am not very experienced at working with ORMs (Django's ORM in particular), so I might be missing a pattern here.
When done in a brute-force manner, -
c = Container.objects.get(name='somename') # this gets me the container
items = c.generic_set.all()
# this gets me all Generic objects, that are related to the container
# Now what? I need to get to the actual Specific objects, so I need to somehow
# get the type of the underlying Specific object and get it
for item in items:
spec = getattr(item, item.get_my_specific_type())
this results in a ton of db hits (one for each Generic record, that relates to a Container), so this is obviously not the way to do it. Now, it could, perhaps, be done by getting the SpecificX objects directly:
s = Specific1.objects.filter(cont__name='somename')
# This gets me all Specific1 objects for the specified container
...
# do it for every Specific type
that way the db will be hit once for each Specific type (acceptable, I guess).
I know, that .select_related() doesn't work with m2m relationships, so it is not of much help here.
To reiterate, the end result has to be a collection of SpecificX objects (not Generic).
I think you've already outlined the two easy possibilities. Either you do a single filter query against Generic and then cast each item to its Specific subtype (results in n+1 queries, where n is the number of items returned), or you make a separate query against each Specific table (results in k queries, where k is the number of Specific types).
It's actually worth benchmarking to see which of these is faster in reality. The second seems better because it's (probably) fewer queries, but each one of those queries has to perform a join with the m2m intermediate table. In the former case you only do one join query, and then many simple ones. Some database backends perform better with lots of small queries than fewer, more complex ones.
If the second is actually significantly faster for your use case, and you're willing to do some extra work to clean up your code, it should be possible to write a custom manager method for the Generic model that "pre-fetches" all the subtype data from the relevant Specific tables for a given queryset, using only one query per subtype table; similar to how this snippet optimizes generic foreign keys with a bulk prefetch. This would give you the same queries as your second option, with the DRYer syntax of your first option.
Not a complete answer but you can avoid a great number of hits by doing this
items= list(items)
for item in items:
spec = getattr(item, item.get_my_specific_type())
instead of this :
for item in items:
spec = getattr(item, item.get_my_specific_type())
Indeed, by forcing a cast to a python list, you force the django orm to load all elements in your queryset. It then does this in one query.
I accidentally stubmled upon the following post, which pretty much answers your question :
http://lazypython.blogspot.com/2008/11/timeline-view-in-django.html