Where should I specify the output_field kwarg in this complex query? - django

I wrote a little survey app. I have a query that causes an exception upon upgrade to Django 2.0:
Expression contains mixed types. You must set output_field.
Here are a few relationships needed to understand the query (where --fk--> indicates a foreign key):
response --fk--> question --fk--> survey
response --fk--> person
Here's my query:
answered_surveys = SurveyResponse.objects.all()\
    .values('fk_question__fk_survey__hash', 'fk_question__fk_survey__name')\
    .annotate(
        nb_respondants=Count('fk_person', distinct=True),
        survey_name=F('fk_question__fk_survey__name'),
        survey_hash=F('fk_question__fk_survey__hash')
    )\
    .annotate(
        completion=100*Count('id')/
        (F('nb_respondants')*F('fk_question__fk_survey__nb_questions_per_survey'))
    )
It's fine up until the last annotate, but I don't know where to add the output_field kwarg since I only have F and Count expressions. Count outputs an IntegerField by definition, and F complains if I try to add that kwarg.
How do I fix this?

Thanks to @Gahan's comment, I discovered I need to use an ExpressionWrapper to perform arithmetic on F objects within an annotation. This is covered in the Django docs on query expressions.
The last part hence becomes:
.annotate(
    # needs: from django.db.models import Count, ExpressionWrapper, F, FloatField
    completion=ExpressionWrapper(
        100*Count('id')/
        (F('nb_respondants')*F('fk_question__fk_survey__nb_questions_per_survey')),
        output_field=FloatField()
    )
)
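One thing worth checking once the ExpressionWrapper is in place: as far as I can tell, output_field only tells Django how to interpret the result and does not by itself add a cast, so the database may still divide an integer by an integer. The sketch below uses Python's sqlite3 module rather than Django, with made-up figures (7 responses, 3 respondents, 4 questions), to show the truncation and one way around it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical figures, not taken from the question.
responses, respondents, questions = 7, 3, 4

# All-integer operands: SQLite performs integer division and truncates.
int_completion = conn.execute(
    "SELECT 100 * ? / (? * ?)", (responses, respondents, questions)
).fetchone()[0]

# A float literal makes the whole expression a real-number division.
float_completion = conn.execute(
    "SELECT 100.0 * ? / (? * ?)", (responses, respondents, questions)
).fetchone()[0]
```

In the ORM, writing 100.0 instead of 100 inside the wrapper should have a similar effect, but that is an assumption to verify against your own backend.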

Related

Annotate queryset with whether matching related object exists

I have two models with an explicit many-to-many relationship: a thing, auth.user, and a "favorite" model connecting the two. I want to be able to order my "thing"s by whether or not they are favorited by a particular user. In SQLite, the best query I've come up with is (roughly) this:
select
*, max(u.name = "john cleese") as favorited
from thing as t
join favorite as f on f.thing_id = t.id
join user as u on f.user_id = u.id
group by t.id
order by favorited desc
;
The thing tripping me up in my SQL-to-Django translation is the max(u.name = "john cleese") bit. As far as I can tell, Django has support for arithmetic but not equality. The closest I can come is a case statement that doesn't properly group the output rows:
Thing.objects.annotate(favorited=Case(
When(favorites__user=john_cleese, then=Value(True)),
default=Value(False),
output_field=BooleanField()
))
The other direction I've tried is to use RawSQL:
Thing.objects.annotate(favorited=RawSQL('"auth_user"."username" = "%s"', ["john cleese"]))
However, this won't work, because (as far as I'm aware) there's no way to explicitly join the favorite and auth_user tables I need.
Is there something I'm missing?
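This will achieve what you (or anyone else googling their way here) want to do:
from django.db.models import Case, Count, IntegerField, When
Thing.objects.annotate(
    favorited=Count(Case(
        When(
            favorites__user=john_cleese,
            then=1
        ),
        # No default: rows that do not match yield NULL, which Count ignores.
        # With default=0 every joined row would be counted.
        output_field=IntegerField(),
    )),
)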
From what I read in a related ticket, you can use a subquery with the Exists query expression.
Exists is a Subquery subclass that uses an SQL EXISTS statement. In many cases it will perform better than a subquery since the database is able to stop evaluation of the subquery when a first matching row is found.
Assuming the intermediate model of your many-to-many relationship is called Favorite:
from django.db.models import Exists, OuterRef
is_favorited_subquery = Favorite.objects.filter(
    thing_id=OuterRef('pk'),
    user=john_cleese,  # restrict the check to the user in question
)
Thing.objects.annotate(favorited=Exists(is_favorited_subquery))
You can then order by the favorited annotation.
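To see roughly what kind of SQL an Exists annotation corresponds to, here is a small sketch using Python's sqlite3 module instead of Django. The table layout, the sample rows and the user id 1 are all hypothetical. The correlated EXISTS subquery filters on both the thing and the user, and because there is no join on the outer query, each thing appears exactly once without any GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE thing (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE favorite (user_id INTEGER, thing_id INTEGER);
    INSERT INTO thing VALUES (1, 'parrot'), (2, 'spam'), (3, 'lumberjack');
    -- user 1 likes spam; user 2 likes parrot and spam
    INSERT INTO favorite VALUES (1, 2), (2, 1), (2, 2);
""")

user_id = 1  # hypothetical id of the user we are asking about

rows = conn.execute("""
    SELECT t.name,
           EXISTS (SELECT 1 FROM favorite AS f
                   WHERE f.thing_id = t.id AND f.user_id = ?) AS favorited
    FROM thing AS t
    ORDER BY favorited DESC, t.id
""", (user_id,)).fetchall()
```

Each row holds the thing's name and a 1/0 flag, with the favorited things sorted first.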
I'm not exactly sure what you're trying to achieve, but I would start like this:
from django.db import models
from django.contrib.auth.models import User
class MyUser(models.Model):
    # on_delete is required from Django 2.0 onwards
    person = models.OneToOneField(User, on_delete=models.CASCADE)
class Thing(models.Model):
    thingname = models.CharField(max_length=10)
    favorited_by = models.ManyToManyField(MyUser)
And in your view:
qs = MyUser.objects.get(id=pk_of_user_john_reese).thing_set.all()
This gives you all Thing objects of the given user.
You should have a look at the Django docs for ManyToMany.
I have been using Django for some years now in several smaller and larger projects, but I have never used the RawSQL features. Most times I considered it, it turned out I had a mistake in my model design.

Annotate filtering -- sum only some of related objects' fields

Let's say there's an Author and he has Books. In order to fetch authors together with the number of written pages, the following can be done:
Author.objects.annotate(total_pages=Sum('book__pages'))
But what if I wanted to sum pages of sci-fi and fantasy books separately? I'd like to end up with an Author, that has total_pages_books_scifi_pages and total_pages_books_fantasy_pages properties.
I know I can do following:
Author.objects.filter(book__category='scifi').annotate(total_pages_books_scifi_pages=Sum('book__pages'))
Author.objects.filter(book__category='fantasy').annotate(total_pages_books_fantasy_pages=Sum('book__pages'))
But how do I do it in one queryset?
from django.db.models import IntegerField, F, Case, When, Sum
categories = ['scifi', 'fantasy']
annotations = {}
for category in categories:
    annotation_name = 'total_pages_books_{}'.format(category)
    case = Case(
        When(book__category=category, then=F('book__pages')),
        default=0,
        output_field=IntegerField()
    )
    annotations[annotation_name] = Sum(case)
Author.objects.filter(
    book__category__in=categories
).annotate(
    **annotations
)
Try:
Author.objects.values("book__category").annotate(total_pages=Sum('book__pages'))
From Django docs:
https://docs.djangoproject.com/en/1.10/topics/db/aggregation/#values:
values()
Ordinarily, annotations are generated on a per-object basis - an annotated QuerySet will return one result for each object in the original QuerySet. However, when a values() clause is used to constrain the columns that are returned in the result set, the method for evaluating annotations is slightly different. Instead of returning an annotated result for each result in the original QuerySet, the original results are grouped according to the unique combinations of the fields specified in the values() clause. An annotation is then provided for each unique group; the annotation is computed over all members of the group.
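The two answers above return differently shaped results, which is easy to miss. The following sketch uses Python's sqlite3 module with hypothetical tables and rows to contrast them: conditional aggregation keeps one row per author with one column per category, while grouping on the category alone (which is what values("book__category") appears to do) yields one row per category, summed across all authors.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (author_id INTEGER, category TEXT, pages INTEGER);
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO book VALUES
        (1, 'scifi', 100), (1, 'scifi', 50), (1, 'fantasy', 200),
        (2, 'fantasy', 300);
""")

# Conditional aggregation: one row per author, one column per category.
per_author = conn.execute("""
    SELECT a.name,
           SUM(CASE WHEN b.category = 'scifi' THEN b.pages ELSE 0 END),
           SUM(CASE WHEN b.category = 'fantasy' THEN b.pages ELSE 0 END)
    FROM author AS a JOIN book AS b ON b.author_id = a.id
    GROUP BY a.id ORDER BY a.id
""").fetchall()

# Grouping by category only: one row per category, across all authors.
per_category = conn.execute("""
    SELECT b.category, SUM(b.pages)
    FROM author AS a JOIN book AS b ON b.author_id = a.id
    GROUP BY b.category ORDER BY b.category
""").fetchall()
```

If per-author totals are what you need from the values() approach, including the author's id in the values() call would presumably be required so that it becomes part of the grouping.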

update a specific field in a filter queryset django without creating an instance

I have a queryset which returns the first object of a model
latest_ignit_data = (ignition_data
    .objects
    .filter(vehicle=veh_obj, updated_time__lt=new_time)
    .order_by('updated_time')
    .first()
)
I would like to update the time field of this object, how can I do that?
EDITED
This is NOT the way:
latest_ignit_data.update(updated_time=new_time)
According to the Django documentation, first() triggers evaluation of the QuerySet (it is a convenience method comparable to slicing) and returns a model instance rather than a QuerySet, so there is no update() method to call.
The right way is:
pk = (ignition_data
    .objects
    .filter(vehicle=veh_obj, updated_time__lt=new_time)  # note the double underscore in __lt
    .order_by('updated_time')
    .values_list('id', flat=True)
    .first()
)
ignition_data.objects.filter(pk=pk).update(updated_time=new_time)
This updates the database without creating a model instance and without sending signals.
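The same two-step idea (fetch only the primary key, then update by key) can be sketched in plain SQL with Python's sqlite3 module. The table name, its columns and the integer timestamps are simplifications made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ignition_data (
        id INTEGER PRIMARY KEY, vehicle_id INTEGER, updated_time INTEGER);
    INSERT INTO ignition_data VALUES (1, 7, 10), (2, 7, 20), (3, 8, 5);
""")
vehicle_id, new_time = 7, 30  # hypothetical values

# Step 1: fetch only the primary key of the first matching row.
row = conn.execute(
    "SELECT id FROM ignition_data"
    " WHERE vehicle_id = ? AND updated_time < ?"
    " ORDER BY updated_time LIMIT 1",
    (vehicle_id, new_time),
).fetchone()

# Step 2: update that single row by key; do nothing if no row matched.
if row is not None:
    conn.execute(
        "UPDATE ignition_data SET updated_time = ? WHERE id = ?",
        (new_time, row[0]),
    )

times = conn.execute(
    "SELECT id, updated_time FROM ignition_data ORDER BY id"
).fetchall()
```

Only the row with the earliest updated_time for that vehicle is touched; the other rows keep their values.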

filtering the order_by relationship in Django ORM

In the below, product has many writers through contributor, and contributor.role_code defines the exact kind of contribution made to the product. Is it possible with the Django ORM to filter the contributors referenced by the order_by() method below? E.g. I want to order products only by contributors such that contributor.role_code in ['A01', 'B01'].
Product.objects.filter(
product_type__name=choices.PRODUCT_BOOK
).order_by(
'contributor__writer__last_name' # filter which contributors it uses?
)
You can do this via an annotation subquery:
1. Define the Subquery that represents the thing we want to order by.
2. Annotate the original QuerySet with the Subquery.
3. Order by the annotation.
from django.db.models import OuterRef, Subquery
contributors_for_ordering = Contributor.objects.filter(  # 1
    product=OuterRef('pk'),
    role_code__in=['A01', 'B01'],
).values('writer__last_name')
queryset = Product.objects.filter(
    product_type__name=choices.PRODUCT_BOOK
).annotate(  # 2
    writer_last_name=Subquery(contributors_for_ordering[:1])  # Slice [:1] to ensure a single result
).order_by(  # 3
    'writer_last_name'
)
Note, however, that there is a potential quirk here. If a Product has contributors with both 'A01' and 'B01' we haven't controlled which one will be used for ordering--we'll get whichever the database returns first. You can add an order_by clause to contributors_for_ordering to deal with that.
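To illustrate that quirk, here is a sketch of the equivalent SQL run through Python's sqlite3 module. The schema is a hypothetical simplification (the writer's last name is stored directly on the contributor row). The ORDER BY inside the subquery is what makes the chosen contributor repeatable when a product has both an 'A01' and a 'B01' contributor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE contributor (
        product_id INTEGER, role_code TEXT, last_name TEXT);
    INSERT INTO product VALUES (1, 'First'), (2, 'Second');
    INSERT INTO contributor VALUES
        (1, 'B01', 'Young'), (1, 'A01', 'Baker'), (1, 'E07', 'Abbot'),
        (2, 'A01', 'Marsh');
""")

rows = conn.execute("""
    SELECT p.title,
           (SELECT c.last_name FROM contributor AS c
            WHERE c.product_id = p.id AND c.role_code IN ('A01', 'B01')
            ORDER BY c.role_code, c.last_name   -- makes the pick repeatable
            LIMIT 1) AS writer_last_name
    FROM product AS p
    ORDER BY writer_last_name
""").fetchall()
```

Here the 'A01' contributor wins for the first product because of the inner ordering, and the 'E07' contributor is ignored entirely.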
To filter on specific values, first build your list of accepted values:
accepted_values = ['A01', 'B01']
Then filter for values in this list:
Product.objects.filter(
product_type__name=choices.PRODUCT_BOOK
).filter(
contributor__role_code__in=accepted_values
).order_by(
'contributor__writer__last_name'
)

Django update queryset with annotation

I want to update all rows in a queryset using an annotated value.
I have these simple models:
class Relation(models.Model):
    rating = models.IntegerField(default=0)
class SignRelation(models.Model):
    relation = models.ForeignKey(Relation, related_name='sign_relations')
    rating = models.IntegerField(default=0)
I want to avoid this code:
for relation in Relation.objects.annotate(total_rating=Sum('sign_relations__rating')):
    relation.rating = relation.total_rating or 0
    relation.save()
and do the update in a single SQL request, using something like this:
Relation.objects.update(rating=Sum('sign_relations__rating'))
Doesn't work:
TypeError: int() argument must be a string or a number, not 'Sum'
or
Relation.objects.annotate(total_rating=Sum('sign_relations__rating')).update(rating=F('total_rating'))
Also doesn't work:
DatabaseError: missing FROM-clause entry for table "relations_signrelation"
LINE 1: UPDATE "relations_relation" SET "rating" = SUM("relations_si...
Is it possible to use Django's ORM for this purpose? There is no info about using update() and annotate() together in docs.
For Django 1.11+ you can use Subquery:
from django.db.models import OuterRef, Subquery, Sum
Relation.objects.update(
rating=Subquery(
Relation.objects.filter(
id=OuterRef('id')
).annotate(
total_rating=Sum('sign_relations__rating')
).values('total_rating')[:1]
)
)
This code produces the same SQL as the one proposed by Tomasz Jakub Rup, but without using a RawSQL expression. (The Django documentation warns against the use of RawSQL because of the possibility of SQL injection.)
Update
I published an article based on this answer with more in-depth explanations: Updating a Django queryset with annotation and subquery on paulox.net
The UPDATE statement doesn't support GROUP BY. See e.g. PostgreSQL Docs, SQLite Docs.
You need something like this:
UPDATE relation
SET rating = (SELECT SUM(rating)
FROM sign_relation
WHERE relation_id = relation.id)
Equivalent in the Django ORM:
from django.db.models.expressions import RawSQL
Relation.objects.all(). \
update(rating=RawSQL('SELECT SUM(rating) FROM signrelation WHERE relation_id = relation.id', []))
or:
from django.db.models import F, Sum
from django.db.models.expressions import RawSQL
Relation.objects.all(). \
update(rating=RawSQL(SignRelation.objects. \
extra(where=['relation_id = relation.id']). \
values('relation'). \
annotate(sum_rating=Sum('rating')). \
values('sum_rating').query, []))
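The correlated-subquery UPDATE can be tried directly with Python's sqlite3 module. The sketch below uses hypothetical table names and rows. It also adds COALESCE, because SUM over zero rows is NULL; the loop in the question handled that case with "or 0".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE relation (
        id INTEGER PRIMARY KEY, rating INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE sign_relation (relation_id INTEGER, rating INTEGER);
    INSERT INTO relation (id) VALUES (1), (2);
    -- relation 2 deliberately has no sign relations
    INSERT INTO sign_relation VALUES (1, 3), (1, 4);
""")

# One statement updates every row; COALESCE turns the NULL sum into 0.
conn.execute("""
    UPDATE relation
    SET rating = COALESCE(
        (SELECT SUM(rating) FROM sign_relation
         WHERE relation_id = relation.id), 0)
""")

ratings = conn.execute(
    "SELECT id, rating FROM relation ORDER BY id"
).fetchall()
```

In the ORM, wrapping the Subquery in Coalesce from django.db.models.functions would presumably play the same role, though that is an assumption rather than something shown in the answers here.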
You can define your own custom manager:
class RelationManager(models.Manager):
    def annotated(self, *args, **kwargs):
        queryset = super(RelationManager, self).get_queryset()
        for obj in queryset:
            obj.rating = ...  # do something
        return queryset
class Relation(models.Model):
    rating = models.IntegerField(default=0)
    rating_objects = RelationManager()
Then in your code:
q = Relation.rating_objects.annotated()
Add args/kwargs to customise what this manager returns.
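Workaround for PostgreSQL:
from django.db import connection
with connection.cursor() as cursor:
    # qs is the annotated queryset; foo and bar stand for your table and column
    sql, params = qs.query.sql_with_params()
    cursor.execute("""
        WITH qs AS ({})
        UPDATE foo SET bar = qs.bar
        FROM qs WHERE qs.id = foo.id
    """.format(sql), params)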
If you want to avoid a separate commit for each save, you can wrap the loop in transaction.atomic. Note that this still issues one UPDATE per row.
Read more in the Django documentation: https://docs.djangoproject.com/en/1.9/topics/db/transactions/#controlling-transactions-explicitly
You really can't do this. Take a look at the code for update and follow it through for some fine reading.
Honestly, what's wrong with placing something like this in a Manager definition? Put those 3 lines you don't want to put in your view into a manager, call that manager as necessary. Additionally, you're doing much less "magic" and when the next developer looks at your code, they won't have to resort to a few WTF's .. :)
Also, I was curious and it looks like you can use an SQL JOIN with UPDATE statements, but it's some classic SQL hackery .. So if you're so inclined, you can use Django's raw SQL functionality for that ;)