I want to update all rows in a queryset using an annotated value.
I have these simple models:
class Relation(models.Model):
    rating = models.IntegerField(default=0)

class SignRelation(models.Model):
    relation = models.ForeignKey(Relation, related_name='sign_relations')
    rating = models.IntegerField(default=0)
And I want to avoid this code:
for relation in Relation.objects.annotate(total_rating=Sum('sign_relations__rating')):
    relation.rating = relation.total_rating or 0
    relation.save()
and instead do the update in a single SQL query, using something like this:
Relation.objects.update(rating=Sum('sign_relations__rating'))
Doesn't work:
TypeError: int() argument must be a string or a number, not 'Sum'
or
Relation.objects.annotate(total_rating=Sum('sign_relations__rating')).update(rating=F('total_rating'))
Also doesn't work:
DatabaseError: missing FROM-clause entry for table "relations_signrelation"
LINE 1: UPDATE "relations_relation" SET "rating" = SUM("relations_si...
Is it possible to use Django's ORM for this purpose? There is no info about using update() and annotate() together in docs.
For Django 1.11+ you can use Subquery:
from django.db.models import OuterRef, Subquery, Sum
Relation.objects.update(
    rating=Subquery(
        Relation.objects.filter(
            id=OuterRef('id')
        ).annotate(
            total_rating=Sum('sign_relations__rating')
        ).values('total_rating')[:1]
    )
)
This code produces the same SQL as the one proposed by Tomasz Jakub Rup, but without using a RawSQL expression. (The Django documentation warns against the use of RawSQL because of the possibility of SQL injection.)
Update
I published an article based on this answer with more in-depth explanations: Updating a Django queryset with annotation and subquery on paulox.net
The UPDATE statement doesn't support GROUP BY. See e.g. PostgreSQL Docs, SQLite Docs.
You need something like this:
UPDATE relation
SET rating = (SELECT SUM(rating)
              FROM sign_relation
              WHERE relation_id = relation.id)
Equivalent in the Django ORM:
from django.db.models.expressions import RawSQL

# Table names follow the ones in the question's error message
# (relations_relation, relations_signrelation); adjust them to your app.
Relation.objects.all().update(
    rating=RawSQL(
        'SELECT SUM(rating) FROM relations_signrelation '
        'WHERE relation_id = relations_relation.id',
        [],
    )
)
or:
from django.db.models import Sum
from django.db.models.expressions import RawSQL

Relation.objects.all().update(
    rating=RawSQL(
        SignRelation.objects
        .extra(where=['relation_id = relations_relation.id'])
        .values('relation')
        .annotate(sum_rating=Sum('rating'))
        .values('sum_rating')
        .query,
        [],
    )
)
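To see the correlated-subquery UPDATE from this answer run end to end, here is a minimal sketch that uses Python's built-in sqlite3 module rather than Django. The table names, column names and sample rows are made up for the illustration. The COALESCE wrapper is an addition that is not in the SQL above: without it, a relation with no sign relations would get NULL instead of 0, which matters because rating is a non-null field in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE relation (
        id INTEGER PRIMARY KEY,
        rating INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE sign_relation (
        id INTEGER PRIMARY KEY,
        relation_id INTEGER NOT NULL REFERENCES relation(id),
        rating INTEGER NOT NULL DEFAULT 0
    );
    INSERT INTO relation (id) VALUES (1), (2), (3);
    INSERT INTO sign_relation (relation_id, rating) VALUES (1, 5), (1, 7), (2, -2);
""")

# One statement updates every row; the subquery is evaluated per relation.
# COALESCE (an assumption added here) turns "no sign relations" into 0.
conn.execute("""
    UPDATE relation
    SET rating = COALESCE(
        (SELECT SUM(rating) FROM sign_relation WHERE relation_id = relation.id),
        0
    )
""")

ratings = dict(conn.execute("SELECT id, rating FROM relation ORDER BY id"))
print(ratings)
```

In the ORM versions, the same effect should be achievable by wrapping the aggregate in Coalesce from django.db.models.functions, though I have not checked that against every Django version.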
You can define your own custom objects manager:
class RelationManager(models.Manager):
    def annotated(self, *args, **kwargs):
        queryset = super(RelationManager, self).get_queryset()
        for obj in queryset:
            obj.rating = ... do something ...
        return queryset

class Relation(models.Model):
    rating = models.IntegerField(default=0)
    rating_objects = RelationManager()
Then in your code:
q = Relation.rating_objects.annotated()
Add args/kwargs to customise what this manager returns.
Workaround for PostgreSQL:
from django.db import connection

with connection.cursor() as cursor:
    sql, params = qs.query.sql_with_params()
    cursor.execute("""
        WITH qs AS ({})
        UPDATE foo SET bar = qs.bar
        FROM qs WHERE qs.id = foo.id
    """.format(sql), params)
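If you keep the loop, you should wrap it in transaction.atomic so that all the saves are committed in a single transaction. This does not reduce the number of queries, but it avoids a separate commit for every save.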
Read more on Django documentation: https://docs.djangoproject.com/en/1.9/topics/db/transactions/#controlling-transactions-explicitly
You really can't do this. Take a look at the code for update and follow it through for some fine reading.
Honestly, what's wrong with placing something like this in a Manager definition? Put those 3 lines you don't want to put in your view into a manager, call that manager as necessary. Additionally, you're doing much less "magic" and when the next developer looks at your code, they won't have to resort to a few WTF's .. :)
Also, I was curious, and it looks like you can use a SQL JOIN in UPDATE statements, but it's some classic SQL hackery. So if you're so inclined, you can use Django's raw SQL functionality for that ;)
Related
The question is about Subquery and ArrayAgg in the Django ORM.
For example, I have 2 models without any relationship to one another:
class Example1(models.Model):
    ident = models.IntegerField()

class Example2(models.Model):
    ident = models.IntegerField()
    email = models.EmailField()
There is no connection between the 2 models such as an FK, M2M or O2O, but the field ident may hold the same integer in both models (which is a connection in a way), and in general for 1 instance of Example1 there are multiple instances of Example2 with the same ident.
I want to use a subquery, ArrayAgg (the DB is Postgres), or any other way short of raw SQL to make an annotation like this:
Example1.objects.annotate(
    cls2=Subquery(
        Example2.objects.filter(
            ident=OuterRef('ident')
        ).values_list('email', flat=True)
    )
)
# or
Example1.objects.annotate(
    cls2=StringAgg(
        something here???,
        delimiter=', ',
        distinct=True,
    )
)
Of course this does not work, because the Subquery returns multiple rows, and it seems impossible to use StringAgg because there is no relation between the models (nothing to put inside StringAgg).
Any ideas how to annotate Example1 with emails from Example2 in one queryset?
This will be used in CASE expression.
Thanks...
For the MySQL backend, you can use GroupConcat from django-mysql, or take a look at the post on how to write an aggregate function yourself:
from django.db.models import OuterRef, Subquery
from django_mysql.models import GroupConcat

Example1.objects.annotate(
    cls2=Subquery(
        Example2.objects.filter(ident=OuterRef('ident')).values('ident')
        .annotate(emails=GroupConcat('email')).values('emails')
    )
)
For the PostgreSQL backend, you can use ArrayAgg or StringAgg:
from django.contrib.postgres.aggregates import ArrayAgg, StringAgg
from django.db.models import OuterRef, Subquery

Example1.objects.annotate(
    cls2=Subquery(
        Example2.objects.filter(ident=OuterRef('ident')).values('ident')
        .annotate(emails=ArrayAgg('email')).values('emails')
    )
)
# or
Example1.objects.annotate(
    cls2=Subquery(
        Example2.objects.filter(ident=OuterRef('ident')).values('ident')
        .annotate(emails=StringAgg('email', ',')).values('emails')
    )
)
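The key trick in both variants is grouping the subquery by the correlated column, so that it collapses to a single row per outer object. As a rough illustration of the SQL shape this should correspond to, here is a sketch using the standard-library sqlite3 module and SQLite's group_concat (standing in for GroupConcat / StringAgg). The table names and sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE example1 (id INTEGER PRIMARY KEY, ident INTEGER);
    CREATE TABLE example2 (id INTEGER PRIMARY KEY, ident INTEGER, email TEXT);
    INSERT INTO example1 (ident) VALUES (10), (20), (30);
    INSERT INTO example2 (ident, email) VALUES
        (10, 'a@example.com'), (10, 'b@example.com'), (20, 'c@example.com');
""")

# The subquery is correlated on ident and grouped, so it yields at most one row.
rows = conn.execute("""
    SELECT e1.ident,
           (SELECT group_concat(e2.email, ',')
            FROM example2 AS e2
            WHERE e2.ident = e1.ident
            GROUP BY e2.ident) AS emails
    FROM example1 AS e1
    ORDER BY e1.ident
""").fetchall()

# group_concat does not guarantee an order, so sort before comparing.
emails_by_ident = {
    ident: sorted(emails.split(',')) if emails else []
    for ident, emails in rows
}
print(emails_by_ident)
```

Rows of example1 with no matching example2 rows get NULL (None in Python) from the subquery, which is worth keeping in mind if the annotation feeds a CASE expression.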
Django 2.0.2, Python 3.4
models.py
Post(models.Model):
    Id = pk
    content = text
Reply(models.Model):
    Id = pk
    PostId = Fk(Post)
    content = text
view.py
Post.objects.all().annotate(lastreply=F("Reply__content__last"))
Can I use something like a "last" lookup in F()?
As far as I know, latest cannot be used with F().
One possible solution is to include a timestamp in the Reply class:
Post(models.Model):
    Id = pk
    content = text
Reply(models.Model):
    Id = pk
    PostId = Fk(Post)
    content = text
    timestamp = DateTime(auto)
Then you can use a query of this form to get the latest reply for each post:
Reply.objects.annotate(max_time=Max('PostId__reply__timestamp')).filter(timestamp=F('max_time'))
Please note that this can be very slow for a large number of records.
If you are using a Postgres DB, you can use distinct() with a field name instead:
Reply.objects.order_by('PostId', '-timestamp').distinct('PostId')
An F expression has no way to do that, but Django has another way to handle it: Subquery expressions.
https://docs.djangoproject.com/en/2.0/ref/models/expressions/#subquery-expressions
For this problem, the code below works:
from django.db.models import OuterRef, Subquery
sub_qs = Reply.objects.filter(
    PostId=OuterRef('pk')
).order_by('-timestamp')
qs = Post.objects.annotate(
    last_reply_content=Subquery(
        sub_qs.values('content')[:1]))
How does it work?
sub_qs is the queryset on the related model, from which we want only the last reply for each post. OuterRef restricts it to the replies related to the current post, and order_by('-timestamp') sorts them so that the first is the most recent and the last is the oldest.
sub_qs = Reply.objects.filter(
    PostId=OuterRef('pk')
).order_by('-timestamp')
The second part is the Post queryset with an annotation. We want to expose sub_qs as an extra field, and Subquery lets us embed another queryset inside annotate.
We use .values('content') to get only the content field, and slice sub_qs with [:1] to keep only the first row.
qs = Post.objects.annotate(
    last_reply_content=Subquery(
        sub_qs.values('content')[:1]))
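For anyone who wants to see what this kind of annotation boils down to, here is a small sketch of the "latest related row" subquery using the standard-library sqlite3 module instead of Django. The table names, columns and rows are hypothetical; the point is only the ordering: the most recent reply comes first when the subquery sorts by timestamp in descending order and takes one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, content TEXT);
    CREATE TABLE reply (
        id INTEGER PRIMARY KEY,
        post_id INTEGER REFERENCES post(id),
        content TEXT,
        timestamp TEXT
    );
    INSERT INTO post (id, content) VALUES (1, 'first post'), (2, 'second post');
    INSERT INTO reply (post_id, content, timestamp) VALUES
        (1, 'old reply', '2018-01-01 10:00:00'),
        (1, 'new reply', '2018-01-02 10:00:00');
""")

# Correlated subquery: newest reply per post (DESC + LIMIT 1).
rows = conn.execute("""
    SELECT p.id,
           (SELECT r.content
            FROM reply AS r
            WHERE r.post_id = p.id
            ORDER BY r.timestamp DESC
            LIMIT 1) AS last_reply_content
    FROM post AS p
    ORDER BY p.id
""").fetchall()
print(rows)
```

A post without replies simply gets NULL for the annotated column, and sorting in ascending order instead would return the oldest reply.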
I wrote a little survey app. I have a query that causes an exception upon upgrade to Django 2.0:
Expression contains mixed types. You must set output_field.
Here are the relationships needed to understand the query (where --fk--> indicates a foreign key):
response --fk--> question --fk--> survey
response --fk--> person
Here's my query:
answered_surveys = SurveyResponse.objects.all()\
    .values('fk_question__fk_survey__hash', 'fk_question__fk_survey__name')\
    .annotate(
        nb_respondants=Count('fk_person', distinct=True),
        survey_name=F('fk_question__fk_survey__name'),
        survey_hash=F('fk_question__fk_survey__hash')
    )\
    .annotate(
        completion=100*Count('id')/
            (F('nb_respondants')*F('fk_question__fk_survey__nb_questions_per_survey'))
    )
It's fine up until the last annotate, but I don't know where to add the output_field kwarg since I only have F and Count expressions. Count outputs an IntegerField by definition, and F complains if I try to add that kwarg.
How do I fix this?
Thanks to @Gahan's comment, I discovered that I need to use an ExpressionWrapper to perform arithmetic on F objects within an annotation. Docs here.
The last part hence becomes:
.annotate(
    completion=ExpressionWrapper(
        100 * Count('id') /
        (F('nb_respondants') * F('fk_question__fk_survey__nb_questions_per_survey')),
        output_field=FloatField()
    )
)
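One thing that may be worth checking with a completion percentage like this is where the division happens. SQLite truncates the result when both operands of a division are integers (PostgreSQL behaves the same way for integer operands), so the value can lose its fractional part before it ever reaches Python. The sketch below uses the standard-library sqlite3 module with made-up numbers to show the difference; whether it applies to your query depends on your backend and on the column types involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Made-up numbers: 7 answers, 3 respondents, 4 questions per survey.
answers, respondents, questions = 7, 3, 4

# All-integer arithmetic: the division truncates.
integer_completion = conn.execute(
    "SELECT 100 * ? / (? * ?)", (answers, respondents, questions)
).fetchone()[0]

# Starting from a float literal keeps the fractional part.
float_completion = conn.execute(
    "SELECT 100.0 * ? / (? * ?)", (answers, respondents, questions)
).fetchone()[0]

print(integer_completion, float_completion)
```

If the truncation matters, multiplying by 100.0 instead of 100 in the annotation is one possible way around it, but I have not verified that on every backend.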
Of course, I don't mean to do what prefetch_related does already.
I'd like to mimic what it does.
What I'd like to do is the following.
I have a list of MyModel instances.
A user either follows or doesn't follow each instance.
my_models = MyModel.objects.filter(**kwargs)
for my_model in my_models:
    my_model.is_following = Follow.objects.filter(user=user, target_id=my_model.id, target_content_type=MY_MODEL_CTYPE)
Here I have an n+1 query problem, and I think I can borrow what prefetch_related does. Its description says that it performs one query for all objects, and when the related attribute is needed, it is taken from that pre-performed queryset.
That's exactly what I'm after: perform one query for is_following for all the objects I'm interested in, and use its result instead of N individual queries.
One additional aspect is that I'd like to attach a queryset rather than the actual value, so that I can defer evaluation until pagination.
If that statement is too ambiguous: I'd like to pass the my_models queryset, with the is_following information attached, to another function (a DRF serializer, for instance).
How does prefetch_related accomplish something like above?
A solution where you only get the is_following bit is possible with a subquery via .extra:
class MyModelQuerySet(models.QuerySet):
    def annotate_is_following(self, user):
        return self.extra(
            select={'is_following': 'EXISTS( \
                SELECT `id` FROM `follow` \
                WHERE `follow`.`target_id` = `mymodel`.id \
                AND `follow`.`user_id` = %s)'},
            # Pass the user id as a parameter instead of interpolating it into the SQL.
            select_params=[user.id],
        )

class MyModel(models.Model):
    objects = MyModelQuerySet.as_manager()
Usage:
my_models = MyModel.objects.filter(**kwargs).annotate_is_following(request.user)
Now another solution where you can get a whole list of following objects.
Because you have a GFK in the Follow class you need to manually create a reverse relation via GenericRelation. Something like:
from django.contrib.contenttypes.fields import GenericRelation
from django.db.models import Prefetch

class MyModelQuerySet(models.QuerySet):
    def with_user_following(self, user):
        return self.prefetch_related(
            Prefetch(
                'following',
                queryset=Follow.objects.filter(user=user)
                .select_related('user'),
                to_attr='following_user'
            )
        )

class MyModel(models.Model):
    following = GenericRelation(
        Follow,
        content_type_field='target_content_type',
        object_id_field='target_id',
        related_query_name='mymodels'
    )
    objects = MyModelQuerySet.as_manager()

    def get_first_following_object(self):
        if hasattr(self, 'following_user') and len(self.following_user) > 0:
            return self.following_user[0]
        return None
usage:
my_models = MyModel.objects.filter(**kwargs).with_user_following(request.user)
Now you have access to following_user attribute - a list with all follow objects per mymodel, or you can use a method like get_first_following_object.
I'm not sure if this is the best approach, and I doubt this is what prefetch_related does, because I'm joining here.
I found there's a way to select extra columns in your query:
extra_select = """
    EXISTS(SELECT * FROM follow_follow
    WHERE follow_follow.target_object_id = myapp_mymodel.id AND
    follow_follow.target_content_type_id = %s AND
    follow_follow.user_id = %s)
"""
qs = self.extra(
    select={'is_following': extra_select},
    select_params=[CONTENT_TYPE_ID, user.id]
)
So you can do this with a join.
The prefetch_related way of doing it would be to run a separate query and then look the attribute up in its results.
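To make the "separate query, then look it up" idea concrete, here is a rough sketch of that two-query pattern outside Django, using the standard-library sqlite3 module. It is not how prefetch_related is actually implemented, only the general shape as I understand it. The table names, the with_is_following helper and the sample rows are all made up, and the content type column is left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mymodel (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE follow (id INTEGER PRIMARY KEY, user_id INTEGER, target_id INTEGER);
    INSERT INTO mymodel (id, name) VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO follow (user_id, target_id) VALUES (42, 1), (42, 3), (7, 2);
""")

def with_is_following(conn, user_id):
    """Hypothetical helper: two queries in total, regardless of the row count."""
    # Query 1: the objects themselves.
    objects = [
        {"id": row[0], "name": row[1]}
        for row in conn.execute("SELECT id, name FROM mymodel ORDER BY id")
    ]
    ids = [obj["id"] for obj in objects]
    if not ids:
        return objects
    # Query 2: every follow by this user for those objects, fetched at once.
    placeholders = ",".join("?" * len(ids))
    followed = {
        row[0]
        for row in conn.execute(
            "SELECT target_id FROM follow "
            "WHERE user_id = ? AND target_id IN (%s)" % placeholders,
            [user_id, *ids],
        )
    }
    # The "join" happens in Python: a set lookup per object.
    for obj in objects:
        obj["is_following"] = obj["id"] in followed
    return objects

result = with_is_following(conn, 42)
flags = [obj["is_following"] for obj in result]
print(flags)
```

The trade-off compared with the EXISTS column is that the objects have to be evaluated before the second query can run, so this fits best after pagination has already limited the rows.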
I have these models:
class Foo(models.Model):
    size = models.IntegerField()
    # other fields

    def is_active(self):
        if check_condition:
            return True
        else:
            return False

class Bar(models.Model):
    foo = models.ForeignKey("Foo")
    # other fields
Now I want to query for Bars that have an active Foo, like this:
Bar.objects.filter(foo.is_active())
I am getting an error such as:
SyntaxError at /
('non-keyword arg after keyword arg'
How can I achieve this?
You cannot query against model methods or properties. Either use the criteria inside the method directly in the query, or filter in Python using a list comprehension or a generator expression.
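As a small illustration of the "filter in Python" option, here is a sketch with plain dataclasses standing in for the models. The condition inside is_active (size > 0) is invented for the example, since the question does not say what check_condition is.

```python
from dataclasses import dataclass

@dataclass
class Foo:
    size: int

    def is_active(self):
        # Hypothetical condition; the real check is not given in the question.
        return self.size > 0

@dataclass
class Bar:
    foo: Foo

# Stand-in for the rows a queryset would return.
bars = [Bar(Foo(3)), Bar(Foo(0)), Bar(Foo(5))]

# The method is called in Python for every object, after the rows are loaded.
active_bars = [bar for bar in bars if bar.foo.is_active()]
print(len(active_bars))
```

With real models, the list would come from something like Bar.objects.select_related('foo'), so that calling bar.foo does not trigger one query per row. Note that every row is still loaded before the filtering happens.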
You could also use a custom manager. Then you could run something like this:
Bar.objects.foo_active()
And all you have to do is:
class BarManager(models.Manager):
    def foo_active(self):
        # use your method to filter results
        return your_custom_queryset
Check out the docs.
I had a similar problem: I am using the class-based view object_list and I had to filter by a model's method. (Storing the information in the database wasn't an option because the property was based on time and I would have to create a cronjob and/or... no way.)
My answer is inefficient and I don't know how it's going to scale on larger data, but it works:
q = Model.objects.filter(...)...
# here is the trick
q_ids = [o.id for o in q if o.method()]
q = q.filter(id__in=q_ids)
You can't filter on methods, however if the is_active method on Foo checks an attribute on Foo, you can use the double-underscore syntax like Bar.objects.filter(foo__is_active_attribute=True)