I have a problem with Django queryset ordering.
My model contains a field named position, a PositiveSmallIntegerField which I'd like to used to order query results.
I use order_by('position'), which works great.
Problem : my position field is nullable (null=True, blank=True), because I don't wan't to specify a position for every 50000 instances of my model. When some instances have a NULL position, order_by returns them in the top of the list: I'd like them to be at the end.
In raw SQL, I used to write things like:
IF(position IS NULL or position='', 1, 0)
(see http://www.shawnolson.net/a/730/mysql-sort-order-with-null.html). Is it possible to get the same result using Django, without writing raw SQL?
You can use the annotate() from django agrregation to do the trick:
items = Item.objects.all().annotate(null_position=Count('position')).order_by('-null_position', 'position')
As of Django 1.8 you can use Coalesce() to convert NULL to 0.
Sample:
import datetime
from django.db.models.functions import Coalesce, Value
from app import models
# Coalesce works by taking the first non-null value. So we give it
# a date far before any non-null values of last_active. Then it will
# naturally sort behind instances of Box with a non-null last_active value.
the_past = datetime.datetime.now() - datetime.timedelta(days=10*365)
boxes = models.Box.objects.all().annotate(
new_last_active=Coalesce(
'last_active', Value(the_past)
)
).order_by('-new_last_active')
It's a shame there are a lot of questions like this on SO that are not marked as duplicate. See (for example) this answer for the native solution for Django 1.11 and newer. Here is a short excerpt:
Added the nulls_first and nulls_last parameters to Expression.asc() and desc() to control the ordering of null values.
Example usage (from comment to that answer):
from django.db.models import F
MyModel.objects.all().order_by(F('price').desc(nulls_last=True))
Credit goes to the original answer author and commenter.
Using extra() as Ignacio said optimizes a lot the end query. In my aplication I've saved more than 500ms (that's a lot for a query) in database processing using extra() instead of annotate()
Here is how it would look like in your case:
items = Item.objects.all().extra(
'select': {
'null_position': 'CASE WHEN {tablename}.position IS NULL THEN 0 ELSE 1 END'
}
).order_by('-null_position', 'position')
{tablename} should be something like {Item's app}_item following django's default tables name.
I found that the syntax in Pablo's answer needed to be updated to the following on my 1.7.1 install:
items = Item.objects.all().extra(select={'null_position': 'CASE WHEN {name of Item's table}.position IS NULL THEN 0 ELSE 1 END'}).order_by('-null_position', 'position')
QuerySet.extra() can be used to inject expressions into the query and order by them.
Related
I found this discussion about annotation over querysets. I wonder, if that ever got implemented?
When I run the following annotation:
x = Isolate.objects.all().annotate(sum_shipped=Sum('shipped'))
x[0].sum_shipped
>>> True
shipped is a boolean field, so I would expect here the number of instances where shipped is set to True. Instead, I get True or 1. That is pretty inconvenient behaviour.
There is also this discussion on stackoverflow. However, this only covers django 1.
I would expect something like Sum([True, True, False]) -> 2.
Instead, True is returned.
Seems this was not touched since then. Is that discussion in the second link still the state-of-the-art ???
Is there a better way to do statistics with the content of the database than this, now that Django is in version 4?
My current database is sqlite3 for development and testing, but will be Postgres in production. I would very much like to get database independent results.
SUM(bool_field) on Oracle will return 1 also, because bools in Oracle are just a bit (1 or 0). Postgres has a specific bool type, and you can't sum a True/False without an implicit conversion to int. This is why I say SUM(bool) is only accidentally supported for a subset of databases. The field type that is returned is based on the backend get_db_converters and the actual values that come back.
Example model
class Isolate(models.Model):
isolate_nbr = models.CharField(max_length=10, primary_key=True)
growing = models.BooleanField(default=True)
shipped = models.BooleanField(default=True)
....
I think you have a few options here, an annotate is not one of them at this time.
from django.db.models import Sum, Count
# Use aggregate with a filter
print(Isolate.objects.filter(shipped=True).aggregate(Sum('shipped')))
# Just filter then get the count of the queryset
print(Isolate.objects.filter(shipped=True).count())
# Just use aggregate without filter (this will only aggregate/Sum the True values)
print(Isolate.objects.aggregate(Sum('shipped')))
# WON'T WORK: annotate will only annotate on that row whatever that row's value is, not the aggregate across the table
print(Isolate.objects.annotate(sum_shipped==Count('shipped')).first().sum_shipped)
Use Postgres as db and Django 1.9
I have some model with field 'price'. 'Price' blank=True.
On ListView, I get query set. Next, I want to sort by price with price=0 at end.
How I can write in SQL it:
'ORDER BY NULLIF('price', 0) NULLS LAST'
How write it on Django ORM? Or on rawsql?
Ok. I found alternative. Write own NullIf with django func.
from django.db.models import Func
class NullIf(Func):
template = 'NULLIF(%(expressions)s, 0)'
And use it for queryset:
queryset.annotate(new_price=NullIf('price')).order_by('new_price')
Edit : Django 2.2 and above have this implemented out of the box. The equivalent code will be
from django.db.models.functions import NullIf
from django.db.models import Value
queryset.annotate(new_price=NullIf('price', Value(0)).order_by('new_price')
You can still ORDER BY PRICE NULLS LAST if in your select you select the price as SELECT NULLIF('price', 0). That way you get the ordering you want, but the data is returned in the way you want. In django ORM you would select the price with annotate eg TableName.objects.annotate(price=NullIf('price', 0) and for the order by NULLS LAST and for the order by I'd follow the recommendations here Django: Adding "NULLS LAST" to query
Otherwise you could also ORDER BY NULLIF('price', 0) DESC but that will reorder the other numeric values. You can also obviously exclude null prices from the query entirely if you don't require them.
This is a bleeding-edge feature that I'm currently skewered upon and quickly bleeding out. I want to annotate a subquery-aggregate onto an existing queryset. Doing this before 1.11 either meant custom SQL or hammering the database. Here's the documentation for this, and the example from it:
from django.db.models import OuterRef, Subquery, Sum
comments = Comment.objects.filter(post=OuterRef('pk')).values('post')
total_comments = comments.annotate(total=Sum('length')).values('total')
Post.objects.filter(length__gt=Subquery(total_comments))
They're annotating on the aggregate, which seems weird to me, but whatever.
I'm struggling with this so I'm boiling it right back to the simplest real-world example I have data for. I have Carparks which contain many Spaces. Use Book→Author if that makes you happier but —for now— I just want to annotate on a count of the related model using Subquery*.
spaces = Space.objects.filter(carpark=OuterRef('pk')).values('carpark')
count_spaces = spaces.annotate(c=Count('*')).values('c')
Carpark.objects.annotate(space_count=Subquery(count_spaces))
This gives me a lovely ProgrammingError: more than one row returned by a subquery used as an expression and in my head, this error makes perfect sense. The subquery is returning a list of spaces with the annotated-on total.
The example suggested that some sort of magic would happen and I'd end up with a number I could use. But that's not happening here? How do I annotate on aggregate Subquery data?
Hmm, something's being added to my query's SQL...
I built a new Carpark/Space model and it worked. So the next step is working out what's poisoning my SQL. On Laurent's advice, I took a look at the SQL and tried to make it more like the version they posted in their answer. And this is where I found the real problem:
SELECT "bookings_carpark".*, (SELECT COUNT(U0."id") AS "c"
FROM "bookings_space" U0
WHERE U0."carpark_id" = ("bookings_carpark"."id")
GROUP BY U0."carpark_id", U0."space"
)
AS "space_count" FROM "bookings_carpark";
I've highlighted it but it's that subquery's GROUP BY ... U0."space". It's retuning both for some reason. Investigations continue.
Edit 2: Okay, just looking at the subquery SQL I can see that second group by coming through ☹
In [12]: print(Space.objects_standard.filter().values('carpark').annotate(c=Count('*')).values('c').query)
SELECT COUNT(*) AS "c" FROM "bookings_space" GROUP BY "bookings_space"."carpark_id", "bookings_space"."space" ORDER BY "bookings_space"."carpark_id" ASC, "bookings_space"."space" ASC
Edit 3: Okay! Both these models have sort orders. These are being carried through to the subquery. It's these orders that are bloating out my query and breaking it.
I guess this might be a bug in Django but short of removing the Meta-order_by on both these models, is there any way I can unsort a query at querytime?
*I know I could just annotate a Count for this example. My real purpose for using this is a much more complex filter-count but I can't even get this working.
Shazaam! Per my edits, an additional column was being output from my subquery. This was to facilitate ordering (which just isn't required in a COUNT).
I just needed to remove the prescribed meta-order from the model. You can do this by just adding an empty .order_by() to the subquery. In my code terms that meant:
from django.db.models import Count, OuterRef, Subquery
spaces = Space.objects.filter(carpark=OuterRef('pk')).order_by().values('carpark')
count_spaces = spaces.annotate(c=Count('*')).values('c')
Carpark.objects.annotate(space_count=Subquery(count_spaces))
And that works. Superbly. So annoying.
It's also possible to create a subclass of Subquery, that changes the SQL it outputs. For instance, you can use:
class SQCount(Subquery):
template = "(SELECT count(*) FROM (%(subquery)s) _count)"
output_field = models.IntegerField()
You then use this as you would the original Subquery class:
spaces = Space.objects.filter(carpark=OuterRef('pk')).values('pk')
Carpark.objects.annotate(space_count=SQCount(spaces))
You can use this trick (at least in postgres) with a range of aggregating functions: I often use it to build up an array of values, or sum them.
I just bumped into a VERY similar case, where I had to get seat reservations for events where the reservation status is not cancelled. After trying to figure the problem out for hours, here's what I've seen as the root cause of the problem:
Preface: this is MariaDB, Django 1.11.
When you annotate a query, it gets a GROUP BY clause with the fields you select (basically what's in your values() query selection). After investigating with the MariaDB command line tool why I'm getting NULLs or Nones on the query results, I've came to the conclusion that the GROUP BY clause will cause the COUNT() to return NULLs.
Then, I started diving into the QuerySet interface to see how can I manually, forcibly remove the GROUP BY from the DB queries, and came up with the following code:
from django.db.models.fields import PositiveIntegerField
reserved_seats_qs = SeatReservation.objects.filter(
performance=OuterRef(name='pk'), status__in=TAKEN_TYPES
).values('id').annotate(
count=Count('id')).values('count')
# Query workaround: remove GROUP BY from subquery. Test this
# vigorously!
reserved_seats_qs.query.group_by = []
performances_qs = Performance.objects.annotate(
reserved_seats=Subquery(
queryset=reserved_seats_qs,
output_field=PositiveIntegerField()))
print(performances_qs[0].reserved_seats)
So basically, you have to manually remove/update the group_by field on the subquery's queryset in order for it to not have a GROUP BY appended on it on execution time. Also, you'll have to specify what output field the subquery will have, as it seems that Django fails to recognize it automatically, and raises exceptions on the first evaluation of the queryset. Interestingly, the second evaluation succeeds without it.
I believe this is a Django bug, or an inefficiency in subqueries. I'll create a bug report about it.
Edit: the bug report is here.
Problem
The problem is that Django adds GROUP BY as soon as it sees using an aggregate function.
Solution
So you can just create your own aggregate function but so that Django thinks it is not aggregate. Just like this:
total_comments = Comment.objects.filter(
post=OuterRef('pk')
).order_by().annotate(
total=Func(F('length'), function='SUM')
).values('total')
Post.objects.filter(length__gt=Subquery(total_comments))
This way you get the SQL query like this:
SELECT "testapp_post"."id", "testapp_post"."length"
FROM "testapp_post"
WHERE "testapp_post"."length" > (SELECT SUM(U0."length") AS "total"
FROM "testapp_comment" U0
WHERE U0."post_id" = "testapp_post"."id")
So you can even use aggregate subqueries in aggregate functions.
Example
You can count the number of workdays between two dates, excluding weekends and holidays, and aggregate and summarize them by employee:
class NonWorkDay(models.Model):
date = DateField()
class WorkPeriod(models.Model):
employee = models.ForeignKey(User, on_delete=models.CASCADE)
start_date = DateField()
end_date = DateField()
number_of_non_work_days = NonWorkDay.objects.filter(
date__gte=OuterRef('start_date'),
date__lte=OuterRef('end_date'),
).annotate(
cnt=Func('id', function='COUNT')
).values('cnt')
WorkPeriod.objects.values('employee').order_by().annotate(
number_of_word_days=Sum(F('end_date__year') - F('start_date__year') - number_of_non_work_days)
)
Hope this will help!
A solution which would work for any general aggregation could be implemented using Window classes from Django 2.0. I have added this to the Django tracker ticket as well.
This allows the aggregation of annotated values by calculating the aggregate over partitions based on the outer query model (in the GROUP BY clause), then annotating that data to every row in the subquery queryset. The subquery can then use the aggregated data from the first row returned and ignore the other rows.
Performance.objects.annotate(
reserved_seats=Subquery(
SeatReservation.objects.filter(
performance=OuterRef(name='pk'),
status__in=TAKEN_TYPES,
).annotate(
reserved_seat_count=Window(
expression=Count('pk'),
partition_by=[F('performance')]
),
).values('reserved_seat_count')[:1],
output_field=FloatField()
)
)
If I understand correctly, you are trying to count Spaces available in a Carpark. Subquery seems overkill for this, the good old annotate alone should do the trick:
Carpark.objects.annotate(Count('spaces'))
This will include a spaces__count value in your results.
OK, I have seen your note...
I was also able to run your same query with other models I had at hand. The results are the same, so the query in your example seems to be OK (tested with Django 1.11b1):
activities = Activity.objects.filter(event=OuterRef('pk')).values('event')
count_activities = activities.annotate(c=Count('*')).values('c')
Event.objects.annotate(spaces__count=Subquery(count_activities))
Maybe your "simplest real-world example" is too simple... can you share the models or other information?
"works for me" doesn't help very much. But.
I tried your example on some models I had handy (the Book -> Author type), it works fine for me in django 1.11b1.
Are you sure you're running this in the right version of Django? Is this the actual code you're running? Are you actually testing this not on carpark but some more complex model?
Maybe try to print(thequery.query) to see what SQL it's trying to run in the database. Below is what I got with my models (edited to fit your question):
SELECT (SELECT COUNT(U0."id") AS "c"
FROM "carparks_spaces" U0
WHERE U0."carpark_id" = ("carparks_carpark"."id")
GROUP BY U0."carpark_id") AS "space_count" FROM "carparks_carpark"
Not really an answer, but hopefully it helps.
I am using Django, with mongoengine. I have a model Classes with an inscriptions list, And I want to get the docs that have an id in that list.
classes = Classes.objects.filter(inscriptions__contains=request.data['inscription'])
Here's a general explanation of querying ArrayField membership:
Per the Django ArrayField docs, the __contains operator checks if a provided array is a subset of the values in the ArrayField.
So, to filter on whether an ArrayField contains the value "foo", you pass in a length 1 array containing the value you're looking for, like this:
# matches rows where myarrayfield is something like ['foo','bar']
Customer.objects.filter(myarrayfield__contains=['foo'])
The Django ORM produces the #> postgres operator, as you can see by printing the query:
print Customer.objects.filter(myarrayfield__contains=['foo']).only('pk').query
>>> SELECT "website_customer"."id" FROM "website_customer" WHERE "website_customer"."myarrayfield_" #> ['foo']::varchar(100)[]
If you provide something other than an array, you'll get a cryptic error like DataError: malformed array literal: "foo" DETAIL: Array value must start with "{" or dimension information.
Perhaps I'm missing something...but it seems that you should be using .filter():
classes = Classes.objects.filter(inscriptions__contains=request.data['inscription'])
This answer is in reference to your comment for rnevius answer
In Django ORM whenever you make a Database call using ORM, it will generally return either a QuerySet or an object of the model if using get() / number if you are using count() ect., depending on the functions that you are using which return other than a queryset.
The result from a Queryset function can be used to implement further more refinement, like if you like to perform a order() or collecting only distinct() etc. Queryset are lazy which means it only hits the database when they are actually used not when they are assigned. You can find more information about them here.
Where as the functions that doesn't return queryset cannot implement such things.
Take time and go through the Queryset Documentation more in depth explanation with examples are provided. It is useful to understand the behavior to make your application more efficient.
How can I retrieve the last record in a certain queryset?
Django Doc:
latest(field_name=None) returns the latest object in the table, by date, using the field_name provided as the date field.
This example returns the latest Entry in the table, according to the
pub_date field:
Entry.objects.latest('pub_date')
EDIT : You now have to use Entry.objects.latest('pub_date')
You could simply do something like this, using reverse():
queryset.reverse()[0]
Also, beware this warning from the Django documentation:
... note that reverse() should
generally only be called on a QuerySet
which has a defined ordering (e.g.,
when querying against a model which
defines a default ordering, or when
using order_by()). If no such ordering
is defined for a given QuerySet,
calling reverse() on it has no real
effect (the ordering was undefined
prior to calling reverse(), and will
remain undefined afterward).
The simplest way to do it is:
books.objects.all().last()
You also use this to get the first entry like so:
books.objects.all().first()
To get First object:
ModelName.objects.first()
To get last objects:
ModelName.objects.last()
You can use filter
ModelName.objects.filter(name='simple').first()
This works for me.
Django >= 1.6
Added QuerySet methods first() and last() which are convenience methods returning the first or last object matching the filters. Returns None if there are no objects matching.
When the queryset is already exhausted, you may do this to avoid another db hint -
last = queryset[len(queryset) - 1] if queryset else None
Don't use try...except....
Django doesn't throw IndexError in this case.
It throws AssertionError or ProgrammingError(when you run python with -O option)
You can use Model.objects.last() or Model.objects.first().
If no ordering is defined then the queryset is ordered based on the primary key. If you want ordering behaviour queryset then you can refer to the last two points.
If you are thinking to do this, Model.objects.all().last() to retrieve last and Model.objects.all().first() to retrieve first element in a queryset or using filters without a second thought. Then see some caveats below.
The important part to note here is that if you haven't included any ordering in your model the data can be in any order and you will have a random last or first element which was not expected.
Eg. Let's say you have a model named Model1 which has 2 columns id and item_count with 10 rows having id 1 to 10.[There's no ordering defined]
If you fetch Model.objects.all().last() like this, You can get any element from the list of 10 elements. Yes, It is random as there is no default ordering.
So what can be done?
You can define ordering based on any field or fields on your model. It has performance issues as well, Please check that also. Ref: Here
OR you can use order_by while fetching.
Like this: Model.objects.order_by('item_count').last()
If using django 1.6 and up, its much easier now as the new api been introduced -
Model.object.earliest()
It will give latest() with reverse direction.
p.s. - I know its old question, I posting as if going forward someone land on this question, they get to know this new feature and not end up using old method.
In a Django template I had to do something like this to get it to work with a reverse queryset:
thread.forumpost_set.all.last
Hope this helps someone looking around on this topic.
MyModel.objects.order_by('-id')[:1]
If you use ids with your models, this is the way to go to get the latest one from a qs.
obj = Foo.objects.latest('id')
You can try this:
MyModel.objects.order_by('-id')[:1]
The simplest way, without having to worry about the current ordering, is to convert the QuerySet to a list so that you can use Python's normal negative indexing. Like so:
list(User.objects.all())[-1]