Order of Django Queryset results - django

I'm having trouble understanding why a Queryset is being returned in the order it is. We have authors listed for articles and those get stored in a ManyToMany table called: articlepage_authors
We need to be able to pick, on an article by article basis, what order they are returned and displayed in.
For example, article with id 44918 has authors 13752 (‘Lee Bodding’) and 13751 (‘Mark Lee’).
I queried these in the shell, which returns:
Out[6]: <QuerySet [<User: Mark Lee (MarkLee#uss778.net)>, <User: Lee Bodding (LeeBodding#uss778.net)>]>
Calling this in postgres: SELECT * FROM articlepage_authors;
shows that user Lee Bodding id=13752 is stored first in the table.
id | articlepage_id | user_id
-----+----------------+---------
1 | 44508 | 7781
2 | 44508 | 7775
3 | 44514 | 17240
….
465 | 44916 | 17171
468 | 44918 | 13752
469 | 44918 | 13751
No matter what I try e.g. deleting the authors, adding ‘Lee Bodding’, saving the article, then adding ‘Mark Lee’, and vice versa – I can still only get a query set which returns ‘Mark Lee’ first.
I am not sure how else to debug this.
One solution would be to add another field which defines the order of authors, but I’d like to understand what’s going on here first. Something seems to be defining the order already, and it’d be better to manage that.

You can add an order_by to your queryset to make records appear in the order you want. Depending on the database, you may also need an index on that field to keep the query fast. From the documentation:
By default, results returned by a QuerySet are ordered by the ordering tuple given by the ordering option in the model’s Meta. You can override this on a per-QuerySet basis by using the order_by method.
Example:
Entry.objects.filter(pub_date__year=2005).order_by('-pub_date', 'headline')
The result above will be ordered by pub_date descending, then by headline ascending. The negative sign in front of "-pub_date" indicates descending order. Ascending order is implied.
You pair that with an extra to order by the many-to-many ID:
.extra(select={
'creation_seq': 'articlepage_authors.id'
}).order_by("creation_seq")
If you're using django > 1.10, you can just use the field directly without the extra:
.order_by('articlepage_authors.id')
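To see why ordering by the join table's id gives the order in which authors were added, here is a minimal sketch using Python's built-in sqlite3 module rather than Django or PostgreSQL. The users table and its columns are assumptions made for illustration; only articlepage_authors and its two rows for article 44918 come from the question. Without an ORDER BY, a database is free to return joined rows in any order, so the sketch orders explicitly by articlepage_authors.id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- "users" is a hypothetical stand-in for the real user table
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE articlepage_authors (
        id INTEGER PRIMARY KEY,
        articlepage_id INTEGER,
        user_id INTEGER
    );
    INSERT INTO users VALUES (13751, 'Mark Lee'), (13752, 'Lee Bodding');
    INSERT INTO articlepage_authors VALUES
        (468, 44918, 13752),
        (469, 44918, 13751);
""")


def authors_in_insertion_order(article_id):
    """Return author names ordered by the id of the join-table row."""
    rows = conn.execute(
        """
        SELECT u.name
        FROM users AS u
        JOIN articlepage_authors AS aa ON aa.user_id = u.id
        WHERE aa.articlepage_id = ?
        ORDER BY aa.id
        """,
        (article_id,),
    )
    return [name for (name,) in rows]


authors = authors_in_insertion_order(44918)
print(authors)
```

This only reflects insertion order as long as join-table ids grow with each insert; an explicit ordering column, as the question suggests, is the sturdier option if authors ever need to be reordered.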

Related

Django ORM: django aggregate over filtered reverse relation

The question is remotely related to Django ORM: filter primary model based on chronological fields from related model, by further limiting the resulting queryset.
The models
Assuming we have the following models:
from django.db import models
class Patient(models.Model):
    name = models.CharField()
    # other fields following
class MedicalFile(models.Model):
    patient = models.ForeignKey(Patient, related_name='files')
    issuing_date = models.DateField()
    expiring_date = models.DateField()
    diagnostic = models.CharField()
The query
I need to select all the files which are valid at a specified date, most likely from the past. The problem that I have here is that for every patient, there will be a small overlapping period where a patient will have 2 valid files. If we're querying for a date from that small timeframe, I need to select only the most recent file.
More to the point: consider patient John Doe. He will have a string of "uninterrupted" files starting in 2012, like this:
+---+------------+-------------+
|ID |issuing_date|expiring_date|
+---+------------+-------------+
|1 |2012-03-06 |2013-03-06 |
+---+------------+-------------+
|2 |2013-03-04 |2014-03-04 |
+---+------------+-------------+
|3 |2014-03-04 |2015-03-04 |
+---+------------+-------------+
As one can easily observe, the validity periods of these files overlap by a couple of days. For instance, on 2013-03-05 files 1 and 2 are both valid, but we're considering only file 2 (as the most recent one). I'm guessing that the use case isn't special: it is the same as managing subscriptions, where you renew early in order to keep a continuous subscription.
Now, in my application I need to query historical data, e.g. give me all the files which were valid on 2013-03-05, considering only the "most recent" ones. I was able to solve this by using RawSQL, but I would like to have a solution without raw SQL. In the previous question, we were able to filter the "latest" file by aggregation over the reverse relation, something like:
qs = MedicalFile.objects.annotate(latest_file_date=Max('patient__files__issuing_date'))
qs = qs.filter(issuing_date=F('latest_file_date')).select_related('patient')
The problem is that we need to limit the range over which latest_file_date is computed, by filtering against 2013-03-05. But aggregate functions don't run over filtered querysets ...
The "poor" solution
I'm currently doing this with a RawSQL annotation (substitute "<app>" with your concrete application):
import datetime
from django.db.models import F
from django.db.models.expressions import RawSQL
reference_date = datetime.date(year=2013, month=3, day=5)
# Correlated subquery: latest issuing date per patient, up to the reference date
annotation_latest_issuing_date = {
    'latest_issuing_date': RawSQL('SELECT max(file.issuing_date) '
                                  'FROM <app>_medicalfile file '
                                  'WHERE file.patient_id = <app>_medicalfile.patient_id '
                                  ' AND file.issuing_date <= %s', (reference_date, ))
}
qs = MedicalFile.objects.filter(expiring_date__gt=reference_date, issuing_date__lte=reference_date)
# RawSQL expressions are attached with annotate(); extra() does not accept them as keyword arguments
qs = qs.annotate(**annotation_latest_issuing_date).filter(issuing_date=F('latest_issuing_date'))
Written as such, the queryset returns the correct number of records.
Question: how can this be achieved without RawSQL and (already implied) with the same level of performance?
You can use id__in and provide your nested filtered queryset (like all files that are valid at the given date).
# All files that are valid at the reference date
valid_files = MedicalFile.objects.filter(
    expiring_date__gt=reference_date, issuing_date__lte=reference_date)
qs = (MedicalFile.objects
      .filter(id__in=valid_files)
      .order_by('patient__pk', '-issuing_date')
      .distinct('patient__pk'))  # field names in distinct() are only supported by PostgreSQL
The order_by groups the files by patient, with the latest issuing date first. distinct then retrieves that first file for each patient. However, general care is required when combining order_by and distinct: https://docs.djangoproject.com/en/1.9/ref/models/querysets/#django.db.models.query.QuerySet.distinct
Edit: Removed single patient dependence from first filter and changed latest to combination of order_by and distinct
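The order_by plus distinct idea above can be illustrated without a database. The following is a plain-Python sketch of the same logic, not Django code: the MedicalFile namedtuple is a hypothetical stand-in for the model, the first three rows are John Doe's files from the question, and the fourth row (a second patient) is made up to show the per-patient behaviour.

```python
from collections import namedtuple
from datetime import date

# Hypothetical stand-in for the MedicalFile model
MedicalFile = namedtuple("MedicalFile", "id patient_id issuing_date expiring_date")

FILES = [
    MedicalFile(1, 1, date(2012, 3, 6), date(2013, 3, 6)),
    MedicalFile(2, 1, date(2013, 3, 4), date(2014, 3, 4)),
    MedicalFile(3, 1, date(2014, 3, 4), date(2015, 3, 4)),
    MedicalFile(4, 2, date(2013, 1, 1), date(2014, 1, 1)),  # made-up second patient
]


def latest_valid_files(files, reference_date):
    """Return, per patient, the most recently issued file valid at reference_date."""
    # Same condition as issuing_date__lte / expiring_date__gt
    valid = [f for f in files
             if f.issuing_date <= reference_date < f.expiring_date]
    # Mimic order_by('patient__pk', '-issuing_date'): sort by the secondary
    # key first, then by the primary key, relying on the sort being stable.
    valid.sort(key=lambda f: f.issuing_date, reverse=True)
    valid.sort(key=lambda f: f.patient_id)
    # Mimic distinct('patient__pk'): keep only the first row seen per patient.
    first_per_patient = {}
    for f in valid:
        first_per_patient.setdefault(f.patient_id, f)
    return list(first_per_patient.values())


result_ids = [f.id for f in latest_valid_files(FILES, date(2013, 3, 5))]
print(result_ids)
```

On 2013-03-05 both file 1 and file 2 are valid for the first patient, and only file 2 survives because it sorts first within that patient's group.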
Suppose p is a Patient instance.
I think you can do something like:
p.files.filter(issuing_date__lte=some_date, expiring_date__gt=some_date)
See https://docs.djangoproject.com/en/1.9/topics/db/queries/#backwards-related-objects
Or maybe with the Q magic query object...

django distinct doesn't return just unique fields

I have a model for chat messages with three fields:
sender, recipient (both ForeignKeys to the User model) and message as a TextField.
I'm trying to select all unique conversations where request.user is either the sender or the recipient (excluding request.user from the results), and I'm a bit lost on how to implement that.
I've 2 issues:
Message.objects.filter(Q(sender = request.user)|Q(recipient = request.user)).values('sender').distinct()
doesn't return a list of unique records (even with order_by). I get many identical senders: {'sender': 4L}, {'sender': 4L} (the same happens with recipients).
And the second issue is:
Do I need to concatenate two querysets (for senders and recipients), or is there another way to get the whole list of conversations for the current request.user?
upd. ok, here it is table content:
mysql> select id, sender_id, recipient_id, body from messages_message ;
+----+-----------+--------------+-----------+
| id | sender_id | recipient_id | body |
+----+-----------+--------------+-----------+
| 1 | 4 | 1 | Message 1 |
| 2 | 4 | 1 | Message 2 |
+----+-----------+--------------+-----------+
and here is the result of
Message.objects.filter(Q(sender = request.user)|Q(recipient = request.user)).values('sender').distinct()
[{'sender': 4L}, {'sender': 4L}]
But I expected to get just a [{'sender': 4L}].
So, what's wrong?
upd2. my model:
class Message(models.Model):
body = models.TextField(_("Body"))
sender = models.ForeignKey(User, related_name='sent_messages', verbose_name=_("Sender"))
recipient = models.ForeignKey(User, related_name='received_messages', null=True, blank=True, verbose_name=_("Recipient"))
sent_at = models.DateTimeField(_("sent at"), null=True, blank=True)
I need to select all conversation partners of the current user (people who sent a message to, or received a message from, request.user).
Just my $.02, I really think that this sort of logic is handled better by python than SQL. If you use query params that are specific to one DB, then it kind of defeats the purpose of the ORM, in my opinion.
I would try something like this:
messages = Message.objects.filter(Q(sender = request.user)|Q(recipient = request.user))
## Does this need to be 'Q'? ##
Then:
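partners = set()
for m in messages:
    partners.add(m.sender)
    partners.add(m.recipient)
# The current user appears in every message, and recipient may be null
partners.discard(request.user)
partners.discard(None)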
If you were going to look at this set often, you could cache it.
But it might be better to make partners a field of User, and add to it every time a message is sent. Then no complex query would ever need to be made, just a simple User.partners.
I assume that you need to get the User object to send a message anyway, so it shouldn't be extra overhead.
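As Rob pointed out, distinct() works differently than you expected. It compares every column in the generated SELECT, and any fields used for ordering (from order_by() or the model's default ordering) are added to that SELECT alongside the ones you specify in values(). Clearing the ordering with an empty order_by() before calling distinct() usually avoids this.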
If you're using PostgreSQL then you can do what you want by passing arguments to distinct(). From the documentation:
You can pass positional arguments (*fields) in order to specify the
names of fields to which the DISTINCT should apply. This translates to
a SELECT DISTINCT ON SQL query. Here’s the difference. For a normal
distinct() call, the database compares each field in each row when
determining which rows are distinct. For a distinct() call with
specified field names, the database will only compare the specified
field names.
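The effect described in that passage can be reproduced at the SQL level. This is a small sketch with Python's built-in sqlite3 module, not Django itself. The two rows are the ones from the question; the sent_at values are made up, and the second query assumes (hypothetically, since the model's Meta is not shown) that an ordering column such as sent_at ends up in the SELECT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages_message (
        id INTEGER PRIMARY KEY,
        sender_id INTEGER,
        recipient_id INTEGER,
        body TEXT,
        sent_at TEXT
    );
    -- sent_at values are invented for the example
    INSERT INTO messages_message VALUES
        (1, 4, 1, 'Message 1', '2011-01-01 10:00:00'),
        (2, 4, 1, 'Message 2', '2011-01-01 10:05:00');
""")

# DISTINCT over the sender column alone collapses the duplicates
only_sender = conn.execute(
    "SELECT DISTINCT sender_id FROM messages_message"
).fetchall()

# Once an ordering column joins the SELECT list, each row is unique again
with_ordering_column = conn.execute(
    "SELECT DISTINCT sender_id, sent_at FROM messages_message ORDER BY sent_at"
).fetchall()

print(only_sender)
print(len(with_ordering_column))
```

The second query returns one row per message, which matches the duplicated {'sender': 4L} entries seen in the question when only the sender column is displayed.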
Getting back to your ultimate goal of finding all conversation partners, I don't see a simple, elegant solution. One way to do it is to use aggregation:
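from django.db.models import Count
# annotate() yields one row per partner; aggregate() would collapse everything into a single total
receivers = (user.sent_messages.values('recipient')
             .annotate(num_messages=Count('id')))
senders = (user.received_messages.values('sender')
           .annotate(num_messages=Count('id')))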
You'd then want to manually combine them if you don't care about the distinction between senders and receivers.
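One way to picture the combining step is a SQL UNION, which merges the two lookups and drops duplicates in a single query. The sketch below uses Python's built-in sqlite3 module with the table layout from the question; rows 3 and 4 are made up to give user 1 a second partner and a message without a recipient. In Django 1.11 and later, QuerySet.union() may be a way to express something similar through the ORM, though that is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages_message (
        id INTEGER PRIMARY KEY,
        sender_id INTEGER,
        recipient_id INTEGER,
        body TEXT
    );
    -- rows 1 and 2 are from the question; rows 3 and 4 are invented
    INSERT INTO messages_message VALUES
        (1, 4, 1, 'Message 1'),
        (2, 4, 1, 'Message 2'),
        (3, 1, 7, 'Message 3'),
        (4, 1, NULL, 'Message 4');
""")


def conversation_partners(user_id):
    """Return the ids of everyone the user has sent to or received from."""
    rows = conn.execute(
        """
        SELECT recipient_id AS partner_id FROM messages_message
        WHERE sender_id = :me AND recipient_id IS NOT NULL
        UNION
        SELECT sender_id FROM messages_message
        WHERE recipient_id = :me
        ORDER BY partner_id
        """,
        {"me": user_id},
    )
    return [partner_id for (partner_id,) in rows]


partners = conversation_partners(1)
print(partners)
```

UNION (as opposed to UNION ALL) removes duplicates, so user 4 appears once even though two messages were received from them.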

How to group by AND aggregate with Django

I have a fairly simple query I'd like to make via the ORM, but I can't figure it out.
I have three models:
Location (a place), Attribute (an attribute a place might have), and Rating (a M2M 'through' model that also contains a score field)
I want to pick some important attributes and be able to rank my locations by those attributes - i.e. higher total score over all selected attributes = better.
I can use the following SQL to get what I want:
select location_id, sum(score)
from locations_rating
where attribute_id in (1,2,3)
group by location_id order by sum desc;
which returns
location_id | sum
-------------+-----
21 | 12
3 | 11
The closest I can get with the ORM is:
Rating.objects.filter(
attribute__in=attributes).annotate(
acount=Count('location')).aggregate(Sum('score'))
Which returns
{'score__sum': 23}
i.e. the sum of all, not grouped by location.
Any way around this? I could execute the SQL manually, but would rather go via the ORM to keep things consistent.
Thanks
Try this:
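# values() before annotate() groups by location; the alias must not clash with the "score" field
Rating.objects.filter(attribute__in=attributes) \
    .values('location') \
    .annotate(total_score=Sum('score')) \
    .order_by('-total_score')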
Can you try this?
Rating.objects.values('location_id').filter(attribute__in=attributes).annotate(sum_score=Sum('score')).order_by('-sum_score')
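For reference, the grouped query these ORM calls are aiming for can be checked directly. Here is a minimal sketch with Python's built-in sqlite3 module that runs SQL equivalent to the statement in the question; the individual rating rows are made up so that the totals match the question's output (12 for location 21, 11 for location 3), and the row with attribute 9 is there to show that the filter is applied before summing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE locations_rating (
        id INTEGER PRIMARY KEY,
        location_id INTEGER,
        attribute_id INTEGER,
        score INTEGER
    );
    -- invented rows chosen to reproduce the totals from the question
    INSERT INTO locations_rating (location_id, attribute_id, score) VALUES
        (21, 1, 5), (21, 2, 4), (21, 3, 3),
        (3, 1, 6), (3, 2, 5),
        (3, 9, 100);
""")


def rank_locations(attribute_ids):
    """Sum scores per location over the selected attributes, best first."""
    placeholders = ", ".join("?" for _ in attribute_ids)
    query = (
        "SELECT location_id, SUM(score) AS total_score "
        "FROM locations_rating "
        "WHERE attribute_id IN ({}) "
        "GROUP BY location_id "
        "ORDER BY total_score DESC"
    ).format(placeholders)
    return conn.execute(query, list(attribute_ids)).fetchall()


ranking = rank_locations([1, 2, 3])
print(ranking)
```

In the ORM, calling values() before annotate() is what produces the GROUP BY; calling aggregate() at the end, as in the question, collapses everything into one total.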

Ordering entries via comment count with django

I need to get entries from the database together with their comment counts. Can I do it with Django's comment framework? I am also using a voting application which does not use GenericForeignKeys; I get entries with scores like this:
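from django.db import models
from django.db.models import Sum
class EntryManager(models.Manager):
    def get_queryset(self):
        # Annotate every entry with the sum of its vote values
        return super(EntryManager, self).get_queryset().annotate(
            score=Sum("linkvote__value"))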
But when there are foreign keys, I get stuck. Do you have any ideas about that?
Extra explanation: I need to fetch entries like this:
id | body | vote_score | comment_score |
1 | foo | 13 | 4 |
2 | bar | 4 | 1 |
after doing that, i can order them via comment_score. :)
Thanks for all replies.
Apparently, annotating with reverse generic relations (or extra filters, in general) is still an open ticket (see also the corresponding documentation). Until this is resolved, I would suggest using raw SQL in an extra query, like this:
return super(EntryManager, self).get_queryset().annotate(
    vote_score=Sum("linkvote__value")).extra(select={
        # content_type_id is the column Django creates for the content_type foreign key
        'comment_score': """SELECT COUNT(*) FROM comments_comment
            WHERE comments_comment.object_pk = yourapp_entry.id
            AND comments_comment.content_type_id = %s"""
    }, select_params=(entry_type.id,))
Of course, you have to fill in the correct table names. Furthermore, entry_type is a "constant" that can be set outside your lookup function (see ContentTypeManager):
from django.contrib.contenttypes.models import ContentType
entry_type = ContentType.objects.get_for_model(Entry)
This is assuming you have a single model Entry that you want to calculate your scores on. Otherwise, things would get slightly more complicated: you would need a sub-query to fetch the content type id for the type of each annotated object.
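The correlated subquery in that extra() call can be tried on its own. This sketch uses Python's built-in sqlite3 module with simplified, assumed versions of the two tables (yourapp_entry and comments_comment keep only the columns the subquery touches), and ENTRY_TYPE_ID is a hypothetical content type id. The comment rows are made up to reproduce the comment_score column from the question, plus one comment of another content type that must not be counted.

```python
import sqlite3

ENTRY_TYPE_ID = 7  # hypothetical id of the Entry content type

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE yourapp_entry (id INTEGER PRIMARY KEY, body TEXT);
    CREATE TABLE comments_comment (
        id INTEGER PRIMARY KEY,
        content_type_id INTEGER,
        object_pk TEXT
    );
    INSERT INTO yourapp_entry VALUES (1, 'foo'), (2, 'bar');
    -- four comments on entry 1, one on entry 2, one on some other model
    INSERT INTO comments_comment (content_type_id, object_pk) VALUES
        (7, '1'), (7, '1'), (7, '1'), (7, '1'),
        (7, '2'),
        (8, '1');
""")

rows = conn.execute(
    """
    SELECT e.id, e.body,
           (SELECT COUNT(*) FROM comments_comment AS c
            WHERE c.object_pk = CAST(e.id AS TEXT)  -- object_pk is stored as text here
              AND c.content_type_id = ?) AS comment_score
    FROM yourapp_entry AS e
    ORDER BY comment_score DESC
    """,
    (ENTRY_TYPE_ID,),
).fetchall()

print(rows)
```

The explicit CAST is there because this sketch stores object_pk as text; whether a cast is needed in a real project depends on the database and the actual column types.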

Django: union of different queryset on the same model

I'm programming a search on a model and I have a problem.
My model is almost like:
class Serials(models.Model):
    id = models.AutoField(primary_key=True)
    code = models.CharField("Code", max_length=50)
    name = models.CharField("Name", max_length=2000)
and I have in the database tuples like these:
1 BOSTON The new Boston
2 NYT New York journal
3 NEWTON The old journal of Mass
4 ANEWVIEW The view of the young people
If I search for the string new, what I want to have is:
first the names that start with the string
then the codes that start with the string
then the names that contain the string
then the codes that contain the string
So the previous list should appear in the following way:
2 NYT New York journal
3 NEWTON The old journal of Mass
1 BOSTON The new Boston
4 ANEWVIEW The view of the young people
The only way I found to get this kind of result is to make different searches (if I put "OR" in a single search, I lose the order I want).
My problem is that the template code that shows the result is really redundant and honestly very ugly, because I have to repeat the same code for all 4 querysets. And the worst thing is that I cannot use pagination!
Now, since the structure of the different querysets is the same, I'm wondering if there is a way to join the 4 querysets and give the template only one queryset.
You can make those four queries and then chain them inside your program:
import itertools
result = itertools.chain(qs1, qs2, qs3, qs4)
but this doesn't seem too nice because you have to make four queries.
You can also write your own sql using raw sql, for example:
Serials.objects.raw(sql_string)
Also look at this:
How to combine 2 or more querysets in a Django view?
You should also be able to do qs1 | qs2 | qs3 | qs4. This will give you duplicates, however.
What you might want to look into is Q() objects:
from django.db.models import Q
value = "new"
# Case-insensitive lookups, so that "new" also matches "New York journal"
Serials.objects.filter(Q(name__istartswith=value) |
                       Q(code__istartswith=value) |
                       Q(name__icontains=value) |
                       Q(code__icontains=value)).distinct()
I'm not sure if it will handle the ordering if you do it this way, as this would rely on the db doing that.
Indeed, even using qs1 | qs2 may cause the order to be determined by the db. That might be the drawback (and reason why you might need at least two queries).
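If a single query returns all the matches, the desired order can still be restored afterwards by ranking each row. The following is a plain-Python sketch of that idea using the four rows from the question as tuples; match_rank and search are hypothetical helper names, and matching is assumed to be case-insensitive because the expected output treats "new" and "New" alike. A conditional expression in the ORM could in principle compute the same rank inside the database, which would keep pagination on a real queryset, but that is not shown here.

```python
# (id, code, name) rows from the question
SERIALS = [
    (1, "BOSTON", "The new Boston"),
    (2, "NYT", "New York journal"),
    (3, "NEWTON", "The old journal of Mass"),
    (4, "ANEWVIEW", "The view of the young people"),
]


def match_rank(code, name, term):
    """Return 0-3 for the four match kinds in priority order, or None if no match."""
    term = term.lower()
    code, name = code.lower(), name.lower()
    if name.startswith(term):
        return 0  # name starts with the string
    if code.startswith(term):
        return 1  # code starts with the string
    if term in name:
        return 2  # name contains the string
    if term in code:
        return 3  # code contains the string
    return None


def search(serials, term):
    """Return matching rows, best match kind first; ties keep their input order."""
    ranked = []
    for serial in serials:
        rank = match_rank(serial[1], serial[2], term)
        if rank is not None:
            ranked.append((rank, serial))
    ranked.sort(key=lambda pair: pair[0])  # stable sort
    return [serial for _, serial in ranked]


ordered_ids = [s[0] for s in search(SERIALS, "new")]
print(ordered_ids)
```

Because each row gets exactly one rank, a row that matches several conditions appears only once, which sidesteps the duplicate problem of chaining four separate querysets.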