How to update a queryset that has been annotated?

How to update a queryset that has been annotated? - django

I have the following models:
class Work(models.Model):
visible = models.BooleanField(default=False)
class Book(models.Model):
work = models.ForeignKey('Work')
I am attempting to update some rows like so:
qs=Work.objects.all()
qs.annotate(Count('book')).filter(Q(book__count__gt=1)).update(visible=False)
However, this is giving an error:
DatabaseError: subquery has too many columns
LINE 1: ...SET "visible" = false WHERE "app_work"."id" IN (SELECT...
If I remove the update clause, the query runs with no problems and returns what I am expecting.
It looks like this error happens for queries with an annotate followed by an update. Is there some other way to write this?

Without making a toy database to be able to duplicate your issue and try out solutions, I can at least suggest the approach in Django: Getting complement of queryset as one possible approach.
Try this approach:
qs.annotate(Count('book')).filter(Q(book__count__gt=1))
Work.objects.filter(pk__in=qs.values_list('pk', flat=True)).update(visible=False)

You can also clear the annotations off a queryset quite simply:
qs.query.annotations.clear()
qs.update(..)
And this means you're only firing off one query, not one into another, but don't use this if your query relies on an annotation to filter. This is great for stripping out database-generated concatenations, and the utility rubbish that I occasionally add into model's default queries... but the example in the question is a perfect example of where this would not work.

To add to Oli's answer: If you need your annotations for the update then do the filters first and store the result in a variable and then call filter with no arguments on that queryset to access the update function like so:
q = X.objects.filter(annotated_val=5, annotated_name='Nima')
q.query.annotations.clear()
q.filter().update(field=900)

I've duplicated this issue & believe its a bug with the Django ORM. #acjay answer is a good workaround. Bug report: https://code.djangoproject.com/ticket/25171
Fix released in Django 2 alpha: https://code.djangoproject.com/ticket/19513

Related

Django 1.11 Annotating a Subquery Aggregate

This is a bleeding-edge feature that I'm currently skewered upon and quickly bleeding out. I want to annotate a subquery-aggregate onto an existing queryset. Doing this before 1.11 either meant custom SQL or hammering the database. Here's the documentation for this, and the example from it:
from django.db.models import OuterRef, Subquery, Sum
comments = Comment.objects.filter(post=OuterRef('pk')).values('post')
total_comments = comments.annotate(total=Sum('length')).values('total')
Post.objects.filter(length__gt=Subquery(total_comments))
They're annotating on the aggregate, which seems weird to me, but whatever.
I'm struggling with this so I'm boiling it right back to the simplest real-world example I have data for. I have Carparks which contain many Spaces. Use Book→Author if that makes you happier but —for now— I just want to annotate on a count of the related model using Subquery*.
spaces = Space.objects.filter(carpark=OuterRef('pk')).values('carpark')
count_spaces = spaces.annotate(c=Count('*')).values('c')
Carpark.objects.annotate(space_count=Subquery(count_spaces))
This gives me a lovely ProgrammingError: more than one row returned by a subquery used as an expression and in my head, this error makes perfect sense. The subquery is returning a list of spaces with the annotated-on total.
The example suggested that some sort of magic would happen and I'd end up with a number I could use. But that's not happening here? How do I annotate on aggregate Subquery data?
Hmm, something's being added to my query's SQL...
I built a new Carpark/Space model and it worked. So the next step is working out what's poisoning my SQL. On Laurent's advice, I took a look at the SQL and tried to make it more like the version they posted in their answer. And this is where I found the real problem:
SELECT "bookings_carpark".*, (SELECT COUNT(U0."id") AS "c"
FROM "bookings_space" U0
WHERE U0."carpark_id" = ("bookings_carpark"."id")
GROUP BY U0."carpark_id", U0."space"
)
AS "space_count" FROM "bookings_carpark";
I've highlighted it but it's that subquery's GROUP BY ... U0."space". It's retuning both for some reason. Investigations continue.
Edit 2: Okay, just looking at the subquery SQL I can see that second group by coming through ☹
In [12]: print(Space.objects_standard.filter().values('carpark').annotate(c=Count('*')).values('c').query)
SELECT COUNT(*) AS "c" FROM "bookings_space" GROUP BY "bookings_space"."carpark_id", "bookings_space"."space" ORDER BY "bookings_space"."carpark_id" ASC, "bookings_space"."space" ASC
Edit 3: Okay! Both these models have sort orders. These are being carried through to the subquery. It's these orders that are bloating out my query and breaking it.
I guess this might be a bug in Django but short of removing the Meta-order_by on both these models, is there any way I can unsort a query at querytime?
*I know I could just annotate a Count for this example. My real purpose for using this is a much more complex filter-count but I can't even get this working.

Shazaam! Per my edits, an additional column was being output from my subquery. This was to facilitate ordering (which just isn't required in a COUNT).
I just needed to remove the prescribed meta-order from the model. You can do this by just adding an empty .order_by() to the subquery. In my code terms that meant:
from django.db.models import Count, OuterRef, Subquery
spaces = Space.objects.filter(carpark=OuterRef('pk')).order_by().values('carpark')
count_spaces = spaces.annotate(c=Count('*')).values('c')
Carpark.objects.annotate(space_count=Subquery(count_spaces))
And that works. Superbly. So annoying.

It's also possible to create a subclass of Subquery, that changes the SQL it outputs. For instance, you can use:
class SQCount(Subquery):
template = "(SELECT count(*) FROM (%(subquery)s) _count)"
output_field = models.IntegerField()
You then use this as you would the original Subquery class:
spaces = Space.objects.filter(carpark=OuterRef('pk')).values('pk')
Carpark.objects.annotate(space_count=SQCount(spaces))
You can use this trick (at least in postgres) with a range of aggregating functions: I often use it to build up an array of values, or sum them.

I just bumped into a VERY similar case, where I had to get seat reservations for events where the reservation status is not cancelled. After trying to figure the problem out for hours, here's what I've seen as the root cause of the problem:
Preface: this is MariaDB, Django 1.11.
When you annotate a query, it gets a GROUP BY clause with the fields you select (basically what's in your values() query selection). After investigating with the MariaDB command line tool why I'm getting NULLs or Nones on the query results, I've came to the conclusion that the GROUP BY clause will cause the COUNT() to return NULLs.
Then, I started diving into the QuerySet interface to see how can I manually, forcibly remove the GROUP BY from the DB queries, and came up with the following code:
from django.db.models.fields import PositiveIntegerField
reserved_seats_qs = SeatReservation.objects.filter(
performance=OuterRef(name='pk'), status__in=TAKEN_TYPES
).values('id').annotate(
count=Count('id')).values('count')
# Query workaround: remove GROUP BY from subquery. Test this
# vigorously!
reserved_seats_qs.query.group_by = []
performances_qs = Performance.objects.annotate(
reserved_seats=Subquery(
queryset=reserved_seats_qs,
output_field=PositiveIntegerField()))
print(performances_qs[0].reserved_seats)
So basically, you have to manually remove/update the group_by field on the subquery's queryset in order for it to not have a GROUP BY appended on it on execution time. Also, you'll have to specify what output field the subquery will have, as it seems that Django fails to recognize it automatically, and raises exceptions on the first evaluation of the queryset. Interestingly, the second evaluation succeeds without it.
I believe this is a Django bug, or an inefficiency in subqueries. I'll create a bug report about it.
Edit: the bug report is here.

Problem
The problem is that Django adds GROUP BY as soon as it sees using an aggregate function.
Solution
So you can just create your own aggregate function but so that Django thinks it is not aggregate. Just like this:
total_comments = Comment.objects.filter(
post=OuterRef('pk')
).order_by().annotate(
total=Func(F('length'), function='SUM')
).values('total')
Post.objects.filter(length__gt=Subquery(total_comments))
This way you get the SQL query like this:
SELECT "testapp_post"."id", "testapp_post"."length"
FROM "testapp_post"
WHERE "testapp_post"."length" > (SELECT SUM(U0."length") AS "total"
FROM "testapp_comment" U0
WHERE U0."post_id" = "testapp_post"."id")
So you can even use aggregate subqueries in aggregate functions.
Example
You can count the number of workdays between two dates, excluding weekends and holidays, and aggregate and summarize them by employee:
class NonWorkDay(models.Model):
date = DateField()
class WorkPeriod(models.Model):
employee = models.ForeignKey(User, on_delete=models.CASCADE)
start_date = DateField()
end_date = DateField()
number_of_non_work_days = NonWorkDay.objects.filter(
date__gte=OuterRef('start_date'),
date__lte=OuterRef('end_date'),
).annotate(
cnt=Func('id', function='COUNT')
).values('cnt')
WorkPeriod.objects.values('employee').order_by().annotate(
number_of_word_days=Sum(F('end_date__year') - F('start_date__year') - number_of_non_work_days)
)
Hope this will help!

A solution which would work for any general aggregation could be implemented using Window classes from Django 2.0. I have added this to the Django tracker ticket as well.
This allows the aggregation of annotated values by calculating the aggregate over partitions based on the outer query model (in the GROUP BY clause), then annotating that data to every row in the subquery queryset. The subquery can then use the aggregated data from the first row returned and ignore the other rows.
Performance.objects.annotate(
reserved_seats=Subquery(
SeatReservation.objects.filter(
performance=OuterRef(name='pk'),
status__in=TAKEN_TYPES,
).annotate(
reserved_seat_count=Window(
expression=Count('pk'),
partition_by=[F('performance')]
),
).values('reserved_seat_count')[:1],
output_field=FloatField()
)
)

If I understand correctly, you are trying to count Spaces available in a Carpark. Subquery seems overkill for this, the good old annotate alone should do the trick:
Carpark.objects.annotate(Count('spaces'))
This will include a spaces__count value in your results.
OK, I have seen your note...
I was also able to run your same query with other models I had at hand. The results are the same, so the query in your example seems to be OK (tested with Django 1.11b1):
activities = Activity.objects.filter(event=OuterRef('pk')).values('event')
count_activities = activities.annotate(c=Count('*')).values('c')
Event.objects.annotate(spaces__count=Subquery(count_activities))
Maybe your "simplest real-world example" is too simple... can you share the models or other information?

"works for me" doesn't help very much. But.
I tried your example on some models I had handy (the Book -> Author type), it works fine for me in django 1.11b1.
Are you sure you're running this in the right version of Django? Is this the actual code you're running? Are you actually testing this not on carpark but some more complex model?
Maybe try to print(thequery.query) to see what SQL it's trying to run in the database. Below is what I got with my models (edited to fit your question):
SELECT (SELECT COUNT(U0."id") AS "c"
FROM "carparks_spaces" U0
WHERE U0."carpark_id" = ("carparks_carpark"."id")
GROUP BY U0."carpark_id") AS "space_count" FROM "carparks_carpark"
Not really an answer, but hopefully it helps.

Why .filter() in django returns duplicated objects?

I've followed django tutorial and arrived at tutorial05.
I tried to not show empty poll as tutorial says, so I added filter condition like this:
class IndexView(generic.ListView):
...
def get_queryset(self):
return Question.objects.filter(
pub_date__lte=timezone.now(),
choice__isnull=False
).order_by('-pub_date')[:5]
But this returned two objects which are exactly same.
I think choice__isnull=False caused the problem, but not sure.

choice__isnull causes the problem. It leads to join with choice table (to weed out questions without choices), that is something like this:
SELECT question.*
FROM question
JOIN choice
ON question.id = choice.question_id
WHERE question.pub_date < NOW()
You can inspect query attribute of QuerySet to be sure. So if you have one question with two choices, you will get that question two times. You need to use distinct() method in this case: queryset.distinct().

Just use .distinct() at the end of your ORM.

A little late to the party, but I figured it could help others looking up the same issue.
Instead of using choice__isnull=False with the filter() method, use it with exclude() instead to exclude out any questions without any choices. So your code would look something like this:
...
def get_queryset(self):
return Question.objects.filter(pub_date__lte=timezone.now()).exclude(choice__isnull=True).order_by('-pub_date')[:5]
By doing it this way, it will return only one instance of the question. Be sure to use choice_isnull=True though.

Because you created two objects with same properties. If you want to ensure uniqueness, you should add validation in clean and add unique index on identifier field too.
Besides filter returns all the objects that match the criteria, if you are expecting only one item to be returned, you should use get instead. get would raise exception if less or more than 1 item is found.

Django order_by() filter with distinct()

How can I make an order_by like this ....
p = Product.objects.filter(vendornumber='403516006')\
.order_by('-created').distinct('vendor__name')
The problem is that I have multiple vendors with the same name, and I only want the latest product by the vendor ..
Hope it makes sense?
I got this DB error:
SELECT DISTINCT ON expressions must match initial ORDER BY expressions
LINE 1: SELECT DISTINCT ON ("search_vendor"."name")
"search_product"...

Based on your error message and this other question, it seems to me this would fix it:
p = Product.objects.filter(vendornumber='403516006')\
.order_by('vendor__name', '-created').distinct('vendor__name')
That is, it seems that the DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s). So by making the column you use in distinct as the first column in the order_by, I think it should work.

Just matching leftmost order_by() arg and distinct() did not work for me, producing the same error (Django 1.8.7 bug or a feature)?
qs.order_by('project').distinct('project')
however it worked when I changed to:
qs.order_by('project__id').distinct('project')
and I do not even have multiple order_by args.

In case you are hoping to use a separate field for distinct and order by another field you can use the below code
from django.db.models import Subquery
Model.objects.filter(
pk__in=Subquery(
Model.objects.all().distinct('foo').values('pk')
)
).order_by('bar')

I had a similar issue but then with related fields. With just adding the related field in distinct(), I didn't get the right results.
I wanted to sort by room__name keeping the person (linked to residency ) unique. Repeating the related field as per the below fixed my issue:
.order_by('room__name', 'residency__person', ).distinct('room__name', 'residency__person')
See also these related posts:
ProgrammingError: when using order_by and distinct together in django
django distinct and order_by
Postgresql DISTINCT ON with different ORDER BY

Get most commented posts in django with django comments

I'm trying to get the ten most commented posts in my django app, but I'm unable to do it because I can't think a proper way.
I'm currently using the django comments framework, and I've seen a possibility of doing this with aggregate or annotate , but I can figure out how.
The thing would be:
Get all the posts
Calculate the number of comments per post (I have a comment_count method for that)
Order the posts from most commented to less
Get the first 10 (for example)
Is there any "simple" or "pythonic" way to do this? I'm a bit lost since the comments framework is only accesible via template tags, and not directly from the code (unless you want to modify it)
Any help is appreciated

You're right that you need to use the annotation and aggregation features. What you need to do is group by and get a count of the object_pk of the Comment model:
from django.contrib.comments.models import Comment
from django.db.models import Count
o_list = Comment.objects.values('object_pk').annotate(ocount=Count('object_pk'))
This will assign something like the following to o_list:
[{'object_pk': '123', 'ocount': 56},
{'object_pk': '321', 'ocount': 47},
...etc...]
You could then sort the list and slice the top 10:
top_ten_objects = sorted(o_list, key=lambda k: k['ocount'])[:10]
You can then use the values in object_pk to retrieve the objects that the comments are attached to.

Annotate is going to be the preferred way, partially because it will reduce db queries and it's basically a one-liner. While your theoretical loop would work, I bet your comment_count method relies on querying comments for a given post, which would be 1 query per post that you loop over- nasty!
posts_by_score = Comment.objects.filter(is_public=True).values('object_pk').annotate(
score=Count('id')).order_by('-score')
post_ids = [int(obj['object_pk']) for obj in posts_by_score]
top_posts = Post.objects.in_bulk(post_ids)
This code is shameless adapted from Django-Blog-Zinnia (no affiliation)

Get last record in a queryset

How can I retrieve the last record in a certain queryset?

Django Doc:
latest(field_name=None) returns the latest object in the table, by date, using the field_name provided as the date field.
This example returns the latest Entry in the table, according to the
pub_date field:
Entry.objects.latest('pub_date')

EDIT : You now have to use Entry.objects.latest('pub_date')
You could simply do something like this, using reverse():
queryset.reverse()[0]
Also, beware this warning from the Django documentation:
... note that reverse() should
generally only be called on a QuerySet
which has a defined ordering (e.g.,
when querying against a model which
defines a default ordering, or when
using order_by()). If no such ordering
is defined for a given QuerySet,
calling reverse() on it has no real
effect (the ordering was undefined
prior to calling reverse(), and will
remain undefined afterward).

The simplest way to do it is:
books.objects.all().last()
You also use this to get the first entry like so:
books.objects.all().first()

To get First object:
ModelName.objects.first()
To get last objects:
ModelName.objects.last()
You can use filter
ModelName.objects.filter(name='simple').first()
This works for me.

Django >= 1.6
Added QuerySet methods first() and last() which are convenience methods returning the first or last object matching the filters. Returns None if there are no objects matching.

When the queryset is already exhausted, you may do this to avoid another db hint -
last = queryset[len(queryset) - 1] if queryset else None
Don't use try...except....
Django doesn't throw IndexError in this case.
It throws AssertionError or ProgrammingError(when you run python with -O option)

You can use Model.objects.last() or Model.objects.first().
If no ordering is defined then the queryset is ordered based on the primary key. If you want ordering behaviour queryset then you can refer to the last two points.
If you are thinking to do this, Model.objects.all().last() to retrieve last and Model.objects.all().first() to retrieve first element in a queryset or using filters without a second thought. Then see some caveats below.
The important part to note here is that if you haven't included any ordering in your model the data can be in any order and you will have a random last or first element which was not expected.
Eg. Let's say you have a model named Model1 which has 2 columns id and item_count with 10 rows having id 1 to 10.[There's no ordering defined]
If you fetch Model.objects.all().last() like this, You can get any element from the list of 10 elements. Yes, It is random as there is no default ordering.
So what can be done?
You can define ordering based on any field or fields on your model. It has performance issues as well, Please check that also. Ref: Here
OR you can use order_by while fetching.
Like this: Model.objects.order_by('item_count').last()

If using django 1.6 and up, its much easier now as the new api been introduced -
Model.object.earliest()
It will give latest() with reverse direction.
p.s. - I know its old question, I posting as if going forward someone land on this question, they get to know this new feature and not end up using old method.

In a Django template I had to do something like this to get it to work with a reverse queryset:
thread.forumpost_set.all.last
Hope this helps someone looking around on this topic.

MyModel.objects.order_by('-id')[:1]

If you use ids with your models, this is the way to go to get the latest one from a qs.
obj = Foo.objects.latest('id')

You can try this:
MyModel.objects.order_by('-id')[:1]

The simplest way, without having to worry about the current ordering, is to convert the QuerySet to a list so that you can use Python's normal negative indexing. Like so:
list(User.objects.all())[-1]

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

How to update a queryset that has been annotated? - django

I've duplicated this issue & believe its a bug with the Django ORM. #acjay answer is a good workaround. Bug report: https://code.djangoproject.com/ticket/25171 Fix released in Django 2 alpha: https://code.djangoproject.com/ticket/19513

Related

Django 1.11 Annotating a Subquery Aggregate

Why .filter() in django returns duplicated objects?

Django order_by() filter with distinct()

Get most commented posts in django with django comments

Get last record in a queryset

Categories

Resources