Sorting Records using Foreign Keys - django

From my question at Get Foreign Key Value, I managed to get the desired output...only one last bit remains. I want to sort my records by the year, make, then model in that order. I thought it'd be as simple as Vehicle.objects.all().order_by('common_vehicle') but this doesn't sort anything.

You have to order by specific fields in the related class. You do this by using the double-underscore format. So, for example:
Vehicle.objects.order_by('common_vehicle__year', 'common_vehicle__series__model__model')
to sort by the year value of the CommonVehicle class, then the model value of the Model class which is related via the Series class.
Note that this is a lot of joins, and could make your query performance quite slow. It may be fine for your needs, but just a heads-up that this is a potential source of slowness down the line.

Related

Ordering Django querysets using a JSONField's properties

I have a model that kinda looks like this:
class Person(models.Model):
data = JSONField()
The data field has 2 properties, name, and age. Now, lets say I want to get a paginated queryset (each page containing 20 people), with a filter where age is greater than 25, and the queryset is to be ordered in descending order. In a usual setup, that is, a normalized database, I can write this query like so:
person_list_page_1 = Person.objects.filter(age > 25).order_by('-age')[:20]
Now, what is the equivalence of the above when filtering and ordering using keys stored in the JSONField? I have researched into this, and it seems it was meant to be a feature for 2.1, but I can't seem to find anything relevant.
Link to the ticket about it being implemented in the future
I also have another question. Lets say we filter and order using the JSONField. Will the ORM have to get all the objects, filter, and order them before sending the first 20 in such a case? That is, will performance be legitimately slower?
Obviously, I know a normalized database is far better for these things, but my hands are kinda tied.
You can use the postgresql sql syntax to extract subfields. Then they can be used just as any other field on the model in queryset filters.
from django.db.models.expressions import RawSQL
Person.objects.annotate(
age=RawSQL("(data->>'age')::int", [])
).filter(age__gte=25).order_by('-age')[:20]
See the postgresql docs for other operators and functions.
In some cases, you might have to add explicit typecasts (::int, for example)
https://www.postgresql.org/docs/current/static/functions-json.html
Performance will be slower than with a proper field, but it's not bad.

Django Postgres ArrayField vs One-to-Many relationship

For a model in my database I need to store around 300 values for a specific field. What would be the drawbacks, in terms of performance and simplicity in query, if I use Postgres-specific ArrayField instead of a separate table with One-to-Many relationship?
If you use an array field
The size of each row in your DB is going to be a bit large thus Postgres is going to be using a lot more toast tables (http://www.postgresql.org/docs/9.5/static/storage-toast.html)
Every time you get the row, unless you specifically use defer (https://docs.djangoproject.com/en/1.9/ref/models/querysets/#defer) the field or otherwise exclude it from the query via only, or values or something, you paying the cost of loading all those values every time you iterate across that row. If that's what you need then so be it.
Filtering based on values in that array, while possible isn't going to be as nice and the Django ORM doesn't make it as obvious as it does for M2M tables.
If you use M2M
You can filter more easily on those related values
Those fields are postponed by default, you can use prefetch_related if you need them and then get fancy if you want only a subset of those values loaded
Total storage in the DB is going to be slightly higher with M2M because of keys, and extra id fields
The cost of the joins in this case is completely negligible because of keys.
Personally I'd say go with the M2M tables, but I don't know your specific application. If you're going to be working with a massive amount of data it's likely worth grabbing a representative dataset and testing both methods with it.

What is the best way to use query with a list and keep the list order? [duplicate]

This question already has answers here:
Django: __in query lookup doesn't maintain the order in queryset
(6 answers)
Closed 8 years ago.
I've searched online and could only find one blog that seemed like a hackish attempt to keep the order of a query list. I was hoping to query using the ORM with a list of strings, but doing it that way does not keep the order of the list.
From what I understand bulk_query only works if you have the id's of the items you want to query.
Can anybody recommend an ideal way of querying by a list of strings and making sure the objects are kept in their proper order?
So in a perfect world I would be able to query a set of objects by doing something like this...
Entry.objects.filter(id__in=['list', 'of', 'strings'])
However, they do not keep order, so string could be before list etc...
The only work around I see, and I may just be tired or this may be perfectly acceptable I'm not sure is doing this...
for i in listOfStrings:
object = Object.objects.get(title=str(i))
myIterableCorrectOrderedList.append(object)
Thank you,
The problem with your solution is that it does a separate database query for each item.
This answer gives the right solution if you're using ids: use in_bulk to create a map between ids and items, and then reorder them as you wish.
If you're not using ids, you can just create the mapping yourself:
values = ['list', 'of', 'strings']
# one database query
entries = Entry.objects.filter(field__in=values)
# one trip through the list to create the mapping
entry_map = {entry.field: entry for entry in entries}
# one more trip through the list to build the ordered entries
ordered_entries = [entry_map[value] for value in values]
(You could save yourself a line by using index, as in this example, but since index is O(n) the performance will not be good for long lists.)
Remember that ultimately this is all done to a database; these operations get translated down to SQL somewhere.
Your Django query loosely translated into SQL would be something like:
SELECT * FROM entry_table e WHERE e.title IN ("list", "of", "strings");
So, in a way, your question is equivalent to asking how to ORDER BY the order something was specified in a WHERE clause. (Needless to say, I hope, this is a confusing request to write in SQL -- NOT the way it was designed to be used.)
You can do this in a couple of ways, as documented in some other answers on StackOverflow [1] [2]. However, as you can see, both rely on adding (temporary) information to the database in order to sort the selection.
Really, this should suggest the correct answer: the information you are sorting on should be in your database. Or, back in high-level Django-land, it should be in your models. Consider revising your models to save a timestamp or an ordering when the user adds favorites, if that's what you want to preserve.
Otherwise, you're stuck with one of the solutions that either grabs the unordered data from the db then "fixes" it in Python, or constructing your own SQL query and implementing your own ugly hack from one of the solutions I linked (don't do this).
tl;dr The "right" answer is to keep the sort order in the database; the "quick fix" is to massage the unsorted data from the database to your liking in Python.
EDIT: Apparently MySQL has some weird feature that will let you do this, if that happens to be your backend.

Django: Query with F() into an object not behaving as expected

I am trying to navigate into the Price model to compare prices, but met with an unexpected result.
My model:
class ProfitableBooks(models.Model):
price = models.ForeignKey('Price',primary_key=True)
In my view:
foo = ProfitableBooks.objects.filter(price__buy__gte=F('price__sell'))
Producing this error:
'ProfitableBooks' object has no attribute 'sell'
Is this your actual model or a simplification? I think the problem may lie in having a model whose only field is its primary key is a foreign key. If I try to parse that out, it seems to imply that it's essentially a field acting as a proxy for a queryset-- you could never have more profitable books than prices because of the nature of primary keys. It also would seem to mean that your elided books field must have no overlap in prices due to the implied uniqueness constraints.
If I understand correctly, you're trying to compare two values in another model: price.buy vs. price.sell, and you want to know if this unpictured Book model is profitable or not. While I'm not sure exactly how the F() object breaks down here, my intuition is that F() is intended to facilitate a kind of efficient querying and updating where you're comparing or adjusting a model value based on another value in the database. It may not be equipped to deal with a 'shell' model like this which has no fields except a joint primary/foreign key and a comparison of two values both external to the model from which the query is conducted (and also distinct from the Book model which has the identifying info about books, I presume).
The documentation says you can use a join in an F() object as long as you are filtering and not updating, and I assume your price model has a buy and sell field, so it seems to qualify. So I'm not 100% sure where this breaks down behind the scenes. But from a practical perspective, if you want to accomplish exactly the result implied here, you could just do a simple query on your price model, b/c again, there's no distinct data in the ProfitableBooks model (it only returns prices), and you're also implying that each price.buy and price.sell have exactly one corresponding book. So Price.objects.filter(buy__gte=F('sell')) gives the result you've requested in your snipped.
If you want to get results which are book objects, you should do a query like the one you've got here, but start from your Book model instead. You could put that query in a queryset manager called "profitable_books" or something, if you wanted to substantiate it in some way.

Django ORM: Optimizing queries involving many-to-many relations

I have the following model structure:
class Container(models.Model):
pass
class Generic(models.Model):
name = models.CharacterField(unique=True)
cont = models.ManyToManyField(Container, null=True)
# It is possible to have a Generic object not associated with any container,
# thats why null=True
class Specific1(Generic):
...
class Specific2(Generic):
...
...
class SpecificN(Generic):
...
Say, I need to retrieve all Specific-type models, that have a relationship with a particular Container.
The SQL for that is more or less trivial, but that is not the question. Unfortunately, I am not very experienced at working with ORMs (Django's ORM in particular), so I might be missing a pattern here.
When done in a brute-force manner, -
c = Container.objects.get(name='somename') # this gets me the container
items = c.generic_set.all()
# this gets me all Generic objects, that are related to the container
# Now what? I need to get to the actual Specific objects, so I need to somehow
# get the type of the underlying Specific object and get it
for item in items:
spec = getattr(item, item.get_my_specific_type())
this results in a ton of db hits (one for each Generic record, that relates to a Container), so this is obviously not the way to do it. Now, it could, perhaps, be done by getting the SpecificX objects directly:
s = Specific1.objects.filter(cont__name='somename')
# This gets me all Specific1 objects for the specified container
...
# do it for every Specific type
that way the db will be hit once for each Specific type (acceptable, I guess).
I know, that .select_related() doesn't work with m2m relationships, so it is not of much help here.
To reiterate, the end result has to be a collection of SpecificX objects (not Generic).
I think you've already outlined the two easy possibilities. Either you do a single filter query against Generic and then cast each item to its Specific subtype (results in n+1 queries, where n is the number of items returned), or you make a separate query against each Specific table (results in k queries, where k is the number of Specific types).
It's actually worth benchmarking to see which of these is faster in reality. The second seems better because it's (probably) fewer queries, but each one of those queries has to perform a join with the m2m intermediate table. In the former case you only do one join query, and then many simple ones. Some database backends perform better with lots of small queries than fewer, more complex ones.
If the second is actually significantly faster for your use case, and you're willing to do some extra work to clean up your code, it should be possible to write a custom manager method for the Generic model that "pre-fetches" all the subtype data from the relevant Specific tables for a given queryset, using only one query per subtype table; similar to how this snippet optimizes generic foreign keys with a bulk prefetch. This would give you the same queries as your second option, with the DRYer syntax of your first option.
Not a complete answer but you can avoid a great number of hits by doing this
items= list(items)
for item in items:
spec = getattr(item, item.get_my_specific_type())
instead of this :
for item in items:
spec = getattr(item, item.get_my_specific_type())
Indeed, by forcing a cast to a python list, you force the django orm to load all elements in your queryset. It then does this in one query.
I accidentally stubmled upon the following post, which pretty much answers your question :
http://lazypython.blogspot.com/2008/11/timeline-view-in-django.html