With a raw query the columns for the SQL that will be executed can be accessed like this.
query_set = Model.objects.raw('select * from table')
query_set.columns
There is no columns attribute for django.db.models.query.QuerySet.
I'm not using raw ... I was debugging a normal query set using Q filters ... it was returning way too many records ... so I wrote code to see what the results would be with normal SQL. I'd like the see the columns and the actual SQL being called on a query set so I can diagnose what the issue is without guessing.
How do you get the columns or the SQL that will be executed from a django.db.models.query.QuerySet instance?
Ok, so based on your comment, if you want to see what django has created in terms of SQL there's an attribute you can use on the Queryset.
Once you've written your query, qs = MyModel.objects.all(), you can then inspect that by doing qs.query, which if you print that out will show you the SQL query itself.
You'll have to inspect this query to see what columns are being included.
The query object is a class called Query which I can't find mention of in the django docs, but it's source is here; https://github.com/django/django/blob/master/django/db/models/sql/query.py#L136
Writing code that is generating JSON. The last section of JSON has to be terminated by a ",", so in the code I have:
-- Define a queryset to retrieve distinct values of the database field:
databases_in_workload = DatabaseObjectsWorkload.objects.filter(workload=migration.workload_id).values_list('database_object__database', flat=True).distinct()
-- Then I cycle over it:
for database_wk in databases_in_workload:
... do something
if not (database_wk == databases_in_workload.last()):
job_json_string = job_json_string + '} ],'
else:
job_json_string = job_json_string + '} ]'
I want the last record to be terminated by a square bracket, the preceding by a comma. But instead, the opposite is happening.
I also looked at the database table content. The values I have for "database_wk" are user02 (for the records with a lower value of primary key) and user01 (for the records with the higher value of pk in the DB). The order (if user01 is first or last) really doesn't matter, as long as the last record is correctly identified by last() - so if I have user02, user01 in the query set iterations, I expect last() to return user01. However - this is not working correctly.
What is strange is that if in the database (Postgres) order is changed (first have user01, then user02 ordered by primary key values) then the "if" code above works, but in my situation last() seems to be returning the first record, not the last. It's as if there is one order in the database, another in the query set, and last() is taking the database order... Anybody encountered/solved this issue before? Alternatively - any other method for identifying the last record in a query set (other than last()) which I could try would also help. Many thanks in advance!
The reason is behaving the way it does is because there is no ordering specified. Try using order_by. REF
From: queryset.first()
If the QuerySet has no ordering defined, then the queryset is automatically ordered by the primary key
From: queryset.last()
Works like first(), but returns the last object in the queryset.
If you don't want to use order_by then try using queryset.latest()
This is a bleeding-edge feature that I'm currently skewered upon and quickly bleeding out. I want to annotate a subquery-aggregate onto an existing queryset. Doing this before 1.11 either meant custom SQL or hammering the database. Here's the documentation for this, and the example from it:
from django.db.models import OuterRef, Subquery, Sum
comments = Comment.objects.filter(post=OuterRef('pk')).values('post')
total_comments = comments.annotate(total=Sum('length')).values('total')
Post.objects.filter(length__gt=Subquery(total_comments))
They're annotating on the aggregate, which seems weird to me, but whatever.
I'm struggling with this so I'm boiling it right back to the simplest real-world example I have data for. I have Carparks which contain many Spaces. Use Book→Author if that makes you happier but —for now— I just want to annotate on a count of the related model using Subquery*.
spaces = Space.objects.filter(carpark=OuterRef('pk')).values('carpark')
count_spaces = spaces.annotate(c=Count('*')).values('c')
Carpark.objects.annotate(space_count=Subquery(count_spaces))
This gives me a lovely ProgrammingError: more than one row returned by a subquery used as an expression and in my head, this error makes perfect sense. The subquery is returning a list of spaces with the annotated-on total.
The example suggested that some sort of magic would happen and I'd end up with a number I could use. But that's not happening here? How do I annotate on aggregate Subquery data?
Hmm, something's being added to my query's SQL...
I built a new Carpark/Space model and it worked. So the next step is working out what's poisoning my SQL. On Laurent's advice, I took a look at the SQL and tried to make it more like the version they posted in their answer. And this is where I found the real problem:
SELECT "bookings_carpark".*, (SELECT COUNT(U0."id") AS "c"
FROM "bookings_space" U0
WHERE U0."carpark_id" = ("bookings_carpark"."id")
GROUP BY U0."carpark_id", U0."space"
)
AS "space_count" FROM "bookings_carpark";
I've highlighted it but it's that subquery's GROUP BY ... U0."space". It's retuning both for some reason. Investigations continue.
Edit 2: Okay, just looking at the subquery SQL I can see that second group by coming through ☹
In [12]: print(Space.objects_standard.filter().values('carpark').annotate(c=Count('*')).values('c').query)
SELECT COUNT(*) AS "c" FROM "bookings_space" GROUP BY "bookings_space"."carpark_id", "bookings_space"."space" ORDER BY "bookings_space"."carpark_id" ASC, "bookings_space"."space" ASC
Edit 3: Okay! Both these models have sort orders. These are being carried through to the subquery. It's these orders that are bloating out my query and breaking it.
I guess this might be a bug in Django but short of removing the Meta-order_by on both these models, is there any way I can unsort a query at querytime?
*I know I could just annotate a Count for this example. My real purpose for using this is a much more complex filter-count but I can't even get this working.
Shazaam! Per my edits, an additional column was being output from my subquery. This was to facilitate ordering (which just isn't required in a COUNT).
I just needed to remove the prescribed meta-order from the model. You can do this by just adding an empty .order_by() to the subquery. In my code terms that meant:
from django.db.models import Count, OuterRef, Subquery
spaces = Space.objects.filter(carpark=OuterRef('pk')).order_by().values('carpark')
count_spaces = spaces.annotate(c=Count('*')).values('c')
Carpark.objects.annotate(space_count=Subquery(count_spaces))
And that works. Superbly. So annoying.
It's also possible to create a subclass of Subquery, that changes the SQL it outputs. For instance, you can use:
class SQCount(Subquery):
template = "(SELECT count(*) FROM (%(subquery)s) _count)"
output_field = models.IntegerField()
You then use this as you would the original Subquery class:
spaces = Space.objects.filter(carpark=OuterRef('pk')).values('pk')
Carpark.objects.annotate(space_count=SQCount(spaces))
You can use this trick (at least in postgres) with a range of aggregating functions: I often use it to build up an array of values, or sum them.
I just bumped into a VERY similar case, where I had to get seat reservations for events where the reservation status is not cancelled. After trying to figure the problem out for hours, here's what I've seen as the root cause of the problem:
Preface: this is MariaDB, Django 1.11.
When you annotate a query, it gets a GROUP BY clause with the fields you select (basically what's in your values() query selection). After investigating with the MariaDB command line tool why I'm getting NULLs or Nones on the query results, I've came to the conclusion that the GROUP BY clause will cause the COUNT() to return NULLs.
Then, I started diving into the QuerySet interface to see how can I manually, forcibly remove the GROUP BY from the DB queries, and came up with the following code:
from django.db.models.fields import PositiveIntegerField
reserved_seats_qs = SeatReservation.objects.filter(
performance=OuterRef(name='pk'), status__in=TAKEN_TYPES
).values('id').annotate(
count=Count('id')).values('count')
# Query workaround: remove GROUP BY from subquery. Test this
# vigorously!
reserved_seats_qs.query.group_by = []
performances_qs = Performance.objects.annotate(
reserved_seats=Subquery(
queryset=reserved_seats_qs,
output_field=PositiveIntegerField()))
print(performances_qs[0].reserved_seats)
So basically, you have to manually remove/update the group_by field on the subquery's queryset in order for it to not have a GROUP BY appended on it on execution time. Also, you'll have to specify what output field the subquery will have, as it seems that Django fails to recognize it automatically, and raises exceptions on the first evaluation of the queryset. Interestingly, the second evaluation succeeds without it.
I believe this is a Django bug, or an inefficiency in subqueries. I'll create a bug report about it.
Edit: the bug report is here.
Problem
The problem is that Django adds GROUP BY as soon as it sees using an aggregate function.
Solution
So you can just create your own aggregate function but so that Django thinks it is not aggregate. Just like this:
total_comments = Comment.objects.filter(
post=OuterRef('pk')
).order_by().annotate(
total=Func(F('length'), function='SUM')
).values('total')
Post.objects.filter(length__gt=Subquery(total_comments))
This way you get the SQL query like this:
SELECT "testapp_post"."id", "testapp_post"."length"
FROM "testapp_post"
WHERE "testapp_post"."length" > (SELECT SUM(U0."length") AS "total"
FROM "testapp_comment" U0
WHERE U0."post_id" = "testapp_post"."id")
So you can even use aggregate subqueries in aggregate functions.
Example
You can count the number of workdays between two dates, excluding weekends and holidays, and aggregate and summarize them by employee:
class NonWorkDay(models.Model):
date = DateField()
class WorkPeriod(models.Model):
employee = models.ForeignKey(User, on_delete=models.CASCADE)
start_date = DateField()
end_date = DateField()
number_of_non_work_days = NonWorkDay.objects.filter(
date__gte=OuterRef('start_date'),
date__lte=OuterRef('end_date'),
).annotate(
cnt=Func('id', function='COUNT')
).values('cnt')
WorkPeriod.objects.values('employee').order_by().annotate(
number_of_word_days=Sum(F('end_date__year') - F('start_date__year') - number_of_non_work_days)
)
Hope this will help!
A solution which would work for any general aggregation could be implemented using Window classes from Django 2.0. I have added this to the Django tracker ticket as well.
This allows the aggregation of annotated values by calculating the aggregate over partitions based on the outer query model (in the GROUP BY clause), then annotating that data to every row in the subquery queryset. The subquery can then use the aggregated data from the first row returned and ignore the other rows.
Performance.objects.annotate(
reserved_seats=Subquery(
SeatReservation.objects.filter(
performance=OuterRef(name='pk'),
status__in=TAKEN_TYPES,
).annotate(
reserved_seat_count=Window(
expression=Count('pk'),
partition_by=[F('performance')]
),
).values('reserved_seat_count')[:1],
output_field=FloatField()
)
)
If I understand correctly, you are trying to count Spaces available in a Carpark. Subquery seems overkill for this, the good old annotate alone should do the trick:
Carpark.objects.annotate(Count('spaces'))
This will include a spaces__count value in your results.
OK, I have seen your note...
I was also able to run your same query with other models I had at hand. The results are the same, so the query in your example seems to be OK (tested with Django 1.11b1):
activities = Activity.objects.filter(event=OuterRef('pk')).values('event')
count_activities = activities.annotate(c=Count('*')).values('c')
Event.objects.annotate(spaces__count=Subquery(count_activities))
Maybe your "simplest real-world example" is too simple... can you share the models or other information?
"works for me" doesn't help very much. But.
I tried your example on some models I had handy (the Book -> Author type), it works fine for me in django 1.11b1.
Are you sure you're running this in the right version of Django? Is this the actual code you're running? Are you actually testing this not on carpark but some more complex model?
Maybe try to print(thequery.query) to see what SQL it's trying to run in the database. Below is what I got with my models (edited to fit your question):
SELECT (SELECT COUNT(U0."id") AS "c"
FROM "carparks_spaces" U0
WHERE U0."carpark_id" = ("carparks_carpark"."id")
GROUP BY U0."carpark_id") AS "space_count" FROM "carparks_carpark"
Not really an answer, but hopefully it helps.
I'm trying to display a chat log in django. I can get my entire chatlog in the proper order with this query.
latest_chats_list = Chat.objects.order_by('timestamp')
I want the functionality of this line (last 10 elements in order), but django doesn't allow negative indexes.
latest_chats_list = Chat.objects.order_by('timestamp')[-10:]
if I try this line, I get the messages I want, but they're in the wrong order.
latest_chats_list = Chat.objects.order_by('-timestamp')[:10]
This line gives the first 10 chats instead of the most recent.
latest_chats_list = Chat.objects.order_by('-timestamp')[:10].reverse()
last_ten = Chat.objects.all().order_by('-id')[:10]
last_ten_in_ascending_order = reversed(last_ten)
Edit (from comments)
Why not use Django's queryset.reverse() ?
Because it messes with the SQL query, as does queryset.order_by(). Slicing the queryset ([:10]) also alters the SQL query, adding LIMIT and OFFSET to it. The two can combine in not-obviously-expected ways...
On the other hand, the built-in Python function reversed(iterable) only changes the way queryset gets iterated over, not effecting the SQL at all.
I'm trying to order a list of items in django by the number of comments they have. However, there seems to be an issue in that the Count function doesn't take into account the fact that django comments also uses a content_type_id to discern between comments for different objects!
This gives me a slight problem in that the comment counts for all objects are wrong using the standard methods; is there a 'nice' fix or do I need to drop back to raw sql?
Code to try and ge the correct ordering:
app_list = App.objects.filter(published=True)
.annotate(num_comments=Count('comments'))
.order_by('-num_comments')
Sample output from the query (note no mention of the content type id):
SELECT "apps_app"."id", "apps_app"."name",
"apps_app"."description","apps_app"."author_name", "apps_app"."site_url",
"apps_app"."source_url", "apps_app"."date_added", "apps_app"."date_modified",
"apps_app"."published", "apps_app"."published_email_sent", "apps_app"."created_by_id",
"apps_app"."rating_votes", "apps_app"."rating_score", COUNT("django_comments"."id") AS
"num_comments" FROM "apps_app" LEFT OUTER JOIN "django_comments" ON ("apps_app"."id" =
"django_comments"."object_pk") WHERE "apps_app"."published" = 1 GROUP BY
"apps_app"."id", "apps_app"."name", "apps_app"."description", "apps_app"."author_name",
"apps_app"."site_url", "apps_app"."source_url", "apps_app"."date_added",
"apps_app"."date_modified", "apps_app"."published", "apps_app"."published_email_sent",
"apps_app"."created_by_id", "apps_app"."rating_votes", "apps_app"."rating_score" ORDER
BY num_comments DESC LIMIT 4
Think I found the answer: Django Snippet