Annotating without using Exists or SubQuery


I have a client who is using Django 1.8. While they will be moved to the latest version, we need to run some queries before their migration, and obviously we can't use Exists or OuterRef (both were added in Django 1.11).
In our case we want to annotate a queryset, e.g.:
recordset = Question.objects.annotate(
    has_answers=Exists(Answers.objects.filter(question=OuterRef('pk')))
)
Is there a workaround that does the equivalent of the above annotation? What did people use in "the olden days"?

The following should work in 1.8: annotate each question with the count of its answers, then use a conditional expression to convert that count to a boolean.
from django.db.models import Count, Case, When, BooleanField
Question.objects.annotate(
    num_answers=Count('answer')
).annotate(
    has_answers=Case(
        When(num_answers__gt=0, then=True),
        default=False,
        output_field=BooleanField()
    )
)
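To see why the count-based annotation gives the same result as `Exists`, here is a plain-SQL sketch using Python's built-in `sqlite3` module. The table and column names (`question`, `answer`, `question_id`) are assumptions standing in for whatever tables the real models map to, and both statements only approximate what the ORM emits; they are not Django's literal output.

```python
import sqlite3

# In-memory database with two hypothetical tables mirroring Question/Answer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE question (id INTEGER PRIMARY KEY, text TEXT);
    CREATE TABLE answer (id INTEGER PRIMARY KEY,
                         question_id INTEGER REFERENCES question(id));
    INSERT INTO question (id, text) VALUES (1, 'answered'), (2, 'unanswered'), (3, 'twice');
    INSERT INTO answer (question_id) VALUES (1), (3), (3);
""")

# Roughly the shape of Exists(...OuterRef('pk')) on Django 1.11+.
EXISTS_SQL = """
    SELECT q.id,
           EXISTS (SELECT 1 FROM answer a WHERE a.question_id = q.id) AS has_answers
    FROM question q ORDER BY q.id
"""

# Roughly the shape of Count('answer') + Case/When: LEFT JOIN, GROUP BY, CASE on the count.
COUNT_CASE_SQL = """
    SELECT q.id,
           CASE WHEN COUNT(a.id) > 0 THEN 1 ELSE 0 END AS has_answers
    FROM question q LEFT OUTER JOIN answer a ON a.question_id = q.id
    GROUP BY q.id ORDER BY q.id
"""

exists_rows = conn.execute(EXISTS_SQL).fetchall()
count_case_rows = conn.execute(COUNT_CASE_SQL).fetchall()
print(exists_rows)      # [(1, 1), (2, 0), (3, 1)]
print(count_case_rows)  # same rows as above
```

The trade-off is that the count version joins and groups every answer row, so on large tables it can be slower than an `EXISTS` check, which can stop at the first match.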

Related

Django queryset how to query SQL with positive values first, ZERO values second

I have an SQL query like the following:
select * from results_table
order by case when place = 0 then 1 else 0 end, place
This query sorts positive numbers first, ZEROs next. How can I write this in Django? Better yet, how can I write it in the following way:
Result.objects.filter(...).order_by('positive_place', 'place')
where 'positive_place' exists for certain models. I am reading about annotate, but I am not quite sure how it works yet. Do I need to write the annotation for every query, or is there a way to define it once per queryset?

An annotation adds an attribute to each object in a queryset, and annotated attributes can then be filtered and ordered on. You can annotate a queryset using conditional expressions, and you can make the annotation reusable by calling custom queryset methods from the model manager.
I'm having a hard time understanding your desired ordering, but here's an example of how it could be put together.
from django.db import models
from django.db.models import Case, Value as V, When

class ResultQuerySet(models.QuerySet):
    def annotate_positive_place(self):
        # 0 for non-zero places, 1 for zero, so zeros sort last
        return self.annotate(
            positive_place=Case(
                When(place=0, then=V(1)),
                default=V(0),
                output_field=models.IntegerField(),
            )
        )

class Result(models.Model):
    place = models.IntegerField()

    objects = ResultQuerySet.as_manager()

Result.objects.annotate_positive_place().order_by('positive_place', 'place')
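A quick way to check the ordering is to run the question's SQL against throwaway data with Python's `sqlite3`. The `result` table and its rows are made up for illustration, and places are assumed never to be negative. Ordering by the CASE value first and `place` second (what `.order_by('positive_place', 'place')` asks for) yields the positive places in ascending order, followed by the zeros.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE result (id INTEGER PRIMARY KEY, place INTEGER)")
conn.executemany("INSERT INTO result (place) VALUES (?)", [(3,), (0,), (1,), (0,), (2,)])

# The CASE maps non-zero places to 0 and zeros to 1, so zeros sort last;
# `place` then orders the rows inside each group.
rows = conn.execute("""
    SELECT place,
           CASE WHEN place = 0 THEN 1 ELSE 0 END AS positive_place
    FROM result
    ORDER BY positive_place, place
""").fetchall()
places = [place for place, _ in rows]
print(places)  # [1, 2, 3, 0, 0]
```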

How to get revision dates for first and last update for multiple objects?

I need to make a bulk query for all instances of SomeModel, annotating them with the date of their creation and last update. Here's what I tried, which is terribly slow:
query = SomeModel.objects.all()
for entry in query:
    last_updated_date = entry.details.history.last().history_date
    created_date = entry.details.history.first().history_date
    csv_writer.writerow([entry.name, last_updated_date, created_date])
How could I optimize the code? I imagine the problem is that I'm making a lot of SELECT queries, when a single, somewhat more complex one would probably do.

You can try something like this (using Subquery, which requires Django 1.11+):
from django.db.models import OuterRef, Subquery
# `Details` is a placeholder for the model `entry.details` points to (the one declaring `history = HistoricalRecords()`)
HistoricalDetails = Details.history.model
histories = HistoricalDetails.objects.filter(id=OuterRef('details_id')).order_by('history_date')
SomeModel.objects.annotate(
    created_date=Subquery(histories.values('history_date')[:1]),
    last_updated=Subquery(histories.order_by('-history_date').values('history_date')[:1]),
)
FYI: this is untested code.
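The single query the `Subquery` annotations aim for can be sketched in plain SQL with Python's `sqlite3`. The `some_model` and `historical_details` tables are hypothetical, modelled loosely on the history table django-simple-history creates (the tracked object's `id` plus `history_id` and `history_date`). The point is that one statement with two correlated subqueries replaces the two extra queries per row in the original loop.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE some_model (id INTEGER PRIMARY KEY, name TEXT, details_id INTEGER);
    -- hypothetical history table: `id` holds the tracked object's pk
    CREATE TABLE historical_details (history_id INTEGER PRIMARY KEY,
                                     id INTEGER, history_date TEXT);
    INSERT INTO some_model VALUES (1, 'alpha', 10), (2, 'beta', 20);
    INSERT INTO historical_details (id, history_date) VALUES
        (10, '2020-01-01'), (10, '2020-03-05'), (10, '2020-02-10'),
        (20, '2021-07-01');
""")

# Each correlated subquery is evaluated per outer row, which is the shape
# Subquery(...) combined with OuterRef(...) produces in the ORM.
rows = conn.execute("""
    SELECT s.name,
           (SELECT h.history_date FROM historical_details h
             WHERE h.id = s.details_id ORDER BY h.history_date LIMIT 1) AS created_date,
           (SELECT h.history_date FROM historical_details h
             WHERE h.id = s.details_id ORDER BY h.history_date DESC LIMIT 1) AS last_updated
    FROM some_model s ORDER BY s.id
""").fetchall()
print(rows)  # [('alpha', '2020-01-01', '2020-03-05'), ('beta', '2021-07-01', '2021-07-01')]
```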

Django: using an annotated aggregate in queryset update()

I've run into an interesting situation in a new app I've added to an existing project. My goal is to use a Celery task to update many rows at once with a value computed from aggregates over related (foreign-keyed) objects. Here are some example models that I've used in previous questions:
from django.conf import settings
from django.db import models

class Book(models.Model):
    author = models.CharField(max_length=255)
    num_pages = models.IntegerField()
    num_chapters = models.IntegerField()

class UserBookRead(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL)
    user_book_stats = models.ForeignKey('UserBookStats')
    book = models.ForeignKey(Book)
    complete = models.BooleanField(default=False)
    pages_read = models.IntegerField()

class UserBookStats(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL)
    total_pages_read = models.IntegerField()
I'm attempting to:
1. Use the post_save signal from Book instances to update pages_read on related UserBookRead objects when a Book's page count is updated.
2. At the end of the signal, launch a background Celery task to roll up the pages_read from each UserBookRead which was updated, and update the total_pages_read on each related UserBookStats (this is where the problem occurs).
I'm trying to keep the number of queries as low as possible. Step 1 is complete and only requires a few queries for my actual use case, which seems acceptable for a signal handler as long as those queries are properly optimized.
Step 2 is more involved, hence the delegation to a background task. I've managed to accomplish most of it in a fairly clean manner (well, for me at least).
The problem I run into is that when annotating the UserBookStats queryset with a total_pages aggregation (the Sum() of all pages_read for related UserBookRead objects), I can't follow that with a straight update of the queryset to set the total_pages_read field.
Here's the code (the Book instance is passed to the task as book):
from django.db.models import F, Sum
# use the provided book instance to get the stats which need to be updated
book_read_objects = UserBookRead.objects.filter(book=book)
book_stat_objects = UserBookStats.objects.filter(id__in=book_read_objects.values_list('user_book_stats__id', flat=True).distinct())
# annotate top level stats objects with summed page count
book_stat_objects = book_stat_objects.annotate(total_pages=Sum(F('user_book_read__pages_read')))
# update the objects with that sum
book_stat_objects.update(total_pages_read=F('total_pages'))
On executing the last line, this error is thrown:
django.core.exceptions.FieldError: Aggregate functions are not allowed in this query
After some research, I found an existing Django ticket for this use case here, on which the last comment mentions 2 new features in 1.11 that could make it possible.
Is there any known/accepted way to accomplish this use case, perhaps using Subquery or OuterRef? I haven't had any success trying to fold in the aggregation as a Subquery. The fallback here is:
for obj in book_stat_objects:
    obj.total_pages_read = obj.total_pages
    obj.save()
But with potentially tens of thousands of records in book_stat_objects, I'm really trying to avoid issuing an UPDATE for each one individually.

I ended up figuring out how to do this with Subquery and OuterRef, but had to take a different approach than I originally expected.
I was able to get a Subquery working quickly; however, when I used it to annotate the parent query, I noticed that every annotated value was the first result of the subquery. That was when I realized I needed OuterRef, because the generated SQL wasn't restricting the subquery by anything in the parent query.
This part of the Django docs was super helpful, as was this StackOverflow question. What this process boils down to is that you have to use Subquery to create the aggregation, and OuterRef to ensure the subquery restricts aggregated rows by the parent query PK. At that point, you can annotate with the aggregated value and directly make use of it in a queryset update().
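The difference OuterRef makes is easy to reproduce in plain SQL. This `sqlite3` sketch uses made-up `user_book_stats` and `user_book_read` tables. In the first query the subquery is not tied to the outer row, so every stats row receives the same total; the second adds the `WHERE r.user_book_stats_id = s.id` correlation that `OuterRef('pk')` provides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_book_stats (id INTEGER PRIMARY KEY, total_pages_read INTEGER);
    CREATE TABLE user_book_read (id INTEGER PRIMARY KEY,
                                 user_book_stats_id INTEGER, pages_read INTEGER);
    INSERT INTO user_book_stats (id, total_pages_read) VALUES (1, 0), (2, 0);
    INSERT INTO user_book_read (user_book_stats_id, pages_read)
        VALUES (1, 10), (1, 20), (2, 5);
""")

# Uncorrelated: nothing links the subquery to `s`, so every row gets the first
# group's sum. ORDER BY ... LIMIT 1 only makes the demo deterministic; SQLite
# takes the first row anyway, while other databases may reject a multi-row
# scalar subquery outright.
uncorrelated = conn.execute("""
    SELECT s.id, (SELECT SUM(r.pages_read) FROM user_book_read r
                  GROUP BY r.user_book_stats_id
                  ORDER BY r.user_book_stats_id LIMIT 1) AS total
    FROM user_book_stats s ORDER BY s.id
""").fetchall()

# Correlated: the WHERE clause restricts the subquery to the outer row's reads.
correlated = conn.execute("""
    SELECT s.id, (SELECT SUM(r.pages_read) FROM user_book_read r
                  WHERE r.user_book_stats_id = s.id
                  GROUP BY r.user_book_stats_id) AS total
    FROM user_book_stats s ORDER BY s.id
""").fetchall()

print(uncorrelated)  # [(1, 30), (2, 30)]
print(correlated)    # [(1, 30), (2, 5)]
```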
As I mentioned in the question, the code examples are made up; here are my changes, translated from my actual use case onto those example models:
from django.db.models import F, IntegerField, OuterRef, Subquery, Sum
from django.db.models.functions import Coalesce
# create the queryset to use as the subquery, restricted by the `book_stat_objects` queryset
book_reads = UserBookRead.objects.filter(user_book_stats__in=book_stat_objects, user_book_stats=OuterRef('pk')).values('user_book_stats')
# annotate the future subquery with the aggregation of pages_read from each UserBookRead,
# then keep only that column (a Subquery used as a value must return a single column)
total_pages = book_reads.annotate(total=Sum(F('pages_read'))).values('total')
# annotate each stat object with the subquery total (0 when it has no reads)
book_stats = book_stat_objects.annotate(total=Coalesce(Subquery(total_pages, output_field=IntegerField()), 0))
# update each row with the new total pages count
book_stats.update(total_pages_read=F('total'))
It felt odd to create a queryset that can't be used on its own (trying to evaluate book_reads will throw an error because it includes OuterRef), but once you examine the final SQL generated for book_stats, it makes sense.
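The SQL this boils down to is a single `UPDATE` with a correlated scalar subquery. A plain `sqlite3` sketch with made-up table names (not the exact statement Django generates) shows the effect, including the `Coalesce` fallback for a stats row that has no reads.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_book_stats (id INTEGER PRIMARY KEY, total_pages_read INTEGER);
    CREATE TABLE user_book_read (id INTEGER PRIMARY KEY,
                                 user_book_stats_id INTEGER, pages_read INTEGER);
    INSERT INTO user_book_stats (id, total_pages_read) VALUES (1, 99), (2, 99), (3, 99);
    INSERT INTO user_book_read (user_book_stats_id, pages_read)
        VALUES (1, 10), (1, 20), (2, 5);
""")

# One UPDATE for every row: the subquery sums the reads belonging to the row
# being updated, and COALESCE turns "no reads" (SUM over nothing is NULL) into 0.
conn.execute("""
    UPDATE user_book_stats
    SET total_pages_read = COALESCE(
        (SELECT SUM(r.pages_read) FROM user_book_read r
          WHERE r.user_book_stats_id = user_book_stats.id), 0)
""")
totals = conn.execute(
    "SELECT id, total_pages_read FROM user_book_stats ORDER BY id").fetchall()
print(totals)  # [(1, 30), (2, 5), (3, 0)]
```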
EDIT
I ended up running into a bug with this code a week or two after figuring out this answer. It turned out to be due to a default ordering for the UserBookRead model. As the Django docs state, default ordering is incorporated into any aggregate GROUP BY clauses, so all of my aggregates were off. The solution to that is to clear the default ordering with a blank order_by() when creating the base subquery:
book_reads = UserBookRead.objects.filter(user_book_stats__in=book_stat_objects, user_book_stats=OuterRef('pk')).values('user_book_stats').order_by()
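Why a default ordering breaks the sums can be shown in plain SQL. This `sqlite3` sketch assumes, purely for illustration, that the ordering column is `id`: once it is added to the GROUP BY, the aggregate is computed per read instead of per stats object, and a scalar subquery then picks up only one of those partial sums.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_book_read (id INTEGER PRIMARY KEY,
                                 user_book_stats_id INTEGER, pages_read INTEGER);
    INSERT INTO user_book_read (user_book_stats_id, pages_read)
        VALUES (1, 10), (1, 20);
""")

# Intended: group only by the foreign key, giving one total per stats object.
right = conn.execute("""
    SELECT SUM(pages_read) FROM user_book_read
    WHERE user_book_stats_id = 1 GROUP BY user_book_stats_id
""").fetchall()

# With a hypothetical default ordering on `id` leaking into the GROUP BY,
# the total is split into one partial sum per read.
wrong = conn.execute("""
    SELECT SUM(pages_read) FROM user_book_read
    WHERE user_book_stats_id = 1 GROUP BY user_book_stats_id, id
""").fetchall()

print(right)          # [(30,)]
print(sorted(wrong))  # [(10,), (20,)]
```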

Action vise versa in Admin Django

I wrote an action for a class in admin.py:
class YarnsAdmin(admin.ModelAdmin):
    actions = [make_stockable_unstockable]
I want this action to toggle the stockable status of each selected product (True becomes False and vice versa).
My attempt:
def make_stockable_unstockable(self, request, queryset):
    for product in queryset:
        if product.stockable:
            queryset.filter(id=product.id).update(stockable=False)
        else:
            queryset.filter(id=product.id).update(stockable=True)
    self.message_user(request, "Position(s) were updated")
It works, but I think it takes a lot of resources, since it issues one UPDATE per product.
Does anyone have an idea how to optimize it?

Since Django 1.8, conditional expressions (SQL's CASE ... WHEN) are supported.
Thus the following single Django ORM update statement should accomplish what you need:
from django.db.models import Case, When, F, Q, BooleanField, Value
queryset.annotate(new_value=Case(
    When(Q(stockable=False), then=Value(True)),
    default=Value(False),
    output_field=BooleanField()
)).update(stockable=F('new_value'))
It generates the following SQL:
UPDATE `yourmodel`
SET `stockable` = CASE WHEN `yourmodel`.`stockable` = 0 THEN 1 ELSE 0 END
WHERE <queryset's filters>
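For the record, here is the original, wrong solution I initially proposed.
You could issue just two updates instead of looping:
queryset.filter(stockable=False).update(stockable=True)
queryset.filter(stockable=True).update(stockable=False)
which was meant to flip the flag with two UPDATE statements. It doesn't work: the second update also matches the rows the first one just set to True, so every row in the queryset ends up with stockable=False.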
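The difference between the single CASE update and the two-step version can be reproduced with Python's `sqlite3`. The `product` table and its rows are made up for illustration, with `stockable` stored as 0/1.

```python
import sqlite3


def make_products():
    """Build a throwaway product table with alternating stockable flags (0/1)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, stockable INTEGER)")
    conn.executemany("INSERT INTO product (id, stockable) VALUES (?, ?)",
                     [(1, 1), (2, 0), (3, 1), (4, 0)])
    return conn


def flags(conn):
    """Return the stockable flags ordered by id."""
    return [row[0] for row in conn.execute("SELECT stockable FROM product ORDER BY id")]


# One statement: SET reads each row's old value, so every flag is inverted.
case_conn = make_products()
case_conn.execute(
    "UPDATE product SET stockable = CASE WHEN stockable = 0 THEN 1 ELSE 0 END")
print(flags(case_conn))  # [0, 1, 0, 1]

# Two statements: the second also matches the rows the first one just set to 1.
two_conn = make_products()
two_conn.execute("UPDATE product SET stockable = 1 WHERE stockable = 0")
two_conn.execute("UPDATE product SET stockable = 0 WHERE stockable = 1")
print(flags(two_conn))  # [0, 0, 0, 0]
```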

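You can try bulk updates, which fire one query for a whole batch of records instead of one query per record. Two plain filter().update() calls in a row would flip the freshly updated rows straight back, so collect the rows to enable first:
def make_stockable_unstockable(modeladmin, request, queryset):
    # Remember which rows are currently False before changing anything.
    to_enable = list(queryset.filter(stockable=False).values_list('pk', flat=True))
    queryset.filter(stockable=True).update(stockable=False)
    queryset.model._default_manager.filter(pk__in=to_enable).update(stockable=True)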

Dynamic range filter based on field in Django

I'm relatively new to Django and not familiar yet with Django Querysets.
I want to filter a queryset by a datetime range based on a field.
In MySQL I would do
WHERE (start_time < NOW() - INTERVAL duration MINUTE)
Here start_time is a datetime and duration is an int for duration in minutes.
How would I do this in Django in a portable way? I know I could always use extra, but I would prefer it to work in both MySQL and SQLite 3. It seems that these database engines don't share any datetime functions.

You can use 'F' expressions in Queries to reference fields of the model within filter expressions (see https://docs.djangoproject.com/en/dev/topics/db/queries/#using-f-expressions-in-filters)
This means that within the filter, it's possible to say 'where time_a is before time_b' for any particular row with:
filter(time_a__lt=F('time_b'))
However, the problem arises because, as per this related question, timedelta won't accept a dynamic F() expression, so it looks like custom SQL is required. You can still test that SQL automatically in Django against both SQLite and MySQL to assert that it works on both.
Meanwhile, a particularly dirty solution which avoids custom SQL is to convert the start_time into Unix time which makes the maths easy.
from calendar import timegm
from django.db import models
from django.db.models import F
from django.utils.timezone import now

class Journey(models.Model):
    name = models.CharField(max_length=64)
    start_time = models.IntegerField()  # stored as a Unix timestamp
    duration_minutes = models.IntegerField()

def current_journeys():
    unow = timegm(now().utctimetuple())
    q = Journey.objects.filter(start_time__gt=unow - 60 * F('duration_minutes'))
    return q.all()
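To see the integer-arithmetic trick in isolation, here is a plain `sqlite3` version with a made-up `journey` table and rows. It also shows that the answer's `start_time__gt` filter selects journeys still in progress, while the question's `start_time < NOW() - duration` condition selects the opposite set (journeys that are already over).

```python
import sqlite3
import time

NOW = int(time.time())  # Unix time, like timegm(now().utctimetuple())

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE journey (name TEXT, start_time INTEGER,
                                      duration_minutes INTEGER)""")
conn.executemany("INSERT INTO journey VALUES (?, ?, ?)", [
    ("started 10 min ago, lasts 30", NOW - 10 * 60, 30),  # still running
    ("started 2 h ago, lasts 30", NOW - 2 * 3600, 30),    # already over
])

# Same arithmetic as start_time__gt=unow - 60 * F('duration_minutes'):
# only integer maths, no engine-specific datetime functions.
current = [row[0] for row in conn.execute(
    "SELECT name FROM journey WHERE start_time > ? - 60 * duration_minutes", (NOW,))]

# The question's condition picks the journeys that have finished.
finished = [row[0] for row in conn.execute(
    "SELECT name FROM journey WHERE start_time < ? - 60 * duration_minutes", (NOW,))]
print(current, finished)
```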
