How to get a Count based on a subquery? - django

I am suffering to get a query working despite all I have been trying based on my web search, and I think I need some help before becoming crazy.
I have four models:
class Series(models.Model):
puzzles = models.ManyToManyField(Puzzle, through='SeriesElement', related_name='series')
...
class Puzzle(models.Model):
puzzles = models.ManyToManyField(Puzzle, through='SeriesElement', related_name='series')
...
class SeriesElement(models.Model):
puzzle = models.ForeignKey(Puzzle,on_delete=models.CASCADE,verbose_name='Puzzle',)
series = models.ForeignKey(Series,on_delete=models.CASCADE,verbose_name='Series',)
puzzle_index = models.PositiveIntegerField(verbose_name='Order',default=0,editable=True,)
class Play(models.Model):
puzzle = models.ForeignKey(Puzzle, on_delete=models.CASCADE, related_name='plays')
user = models.ForeignKey(settings.AUTH_USER_MODEL, blank=True,null=True, on_delete=models.SET_NULL, related_name='plays')
series = models.ForeignKey(Series, blank=True, null=True, on_delete=models.SET_NULL, related_name='plays')
puzzle_completed = models.BooleanField(default=None, blank=False, null=False)
...
each user can play any puzzle several times, each time creating a Play record.
that means that for a given set of (user,series,puzzle) we can have several Play records,
some with puzzle_completed = True, some with puzzle_completed = False
What I am trying (unsuccesfully) to achieve, is to calculate for each series, through an annotation, the number of puzzles nb_completed_by_user and nb_not_completed_by_user.
For nb_completed_by_user, I have something which works in almost all cases (I have one glitch in one of my test that I cannot explain so far):
Series.objects.annotate(nb_completed_by_user=Count('puzzles',
filter=Q(puzzles__plays__puzzle_completed=True,
puzzles__plays__series_id=F('id'),puzzles__plays__user=user), distinct=True))
For nb_not_completed_by_user, I was able to make a query on Puzzle that gives me the good answer, but I am not able to transform it into a Subquery expression that works without throwing up an error, or to get a Count expression to give me the proper answer.
This one works:
puzzles = Puzzle.objects.filter(~Q(plays__puzzle_completed=True,
plays__series_id=1, plays__user=user),series=s)
but when trying to move to a subquery, I cannot find the way to use the following expression not to throw the error:ValueError: This queryset contains a reference to an outer query and may only be used in a subquery.
pzl_completed_by_user = Puzzle.objects.filter(plays__series_id=OuterRef('id')).exclude(
plays__puzzle_completed=True,plays__series_id=OuterRef('id'), plays__user=user)
and the following Count expression doesn't give me the right result:
Series.objects.annotate(nb_not_completed_by_user=Count('puzzles', filter=~Q(
puzzle__plays__puzzle_completed=True, puzzle__plays__series_id=F('id'),
puzzle__plays__user=user))
Could anybody explain me how I could obtain both values ?
and eventually to propose me a link which explains clearly how to use subqueries for less-obvious cases than those in the official documentation
Thanks in advance
Edit March 2021:
I recently found two posts which guided me through one potential solution to this specific issue:
Django Count and Sum annotations interfere with each other
and
Django 1.11 Annotating a Subquery Aggregate
I implemented the proposed solution from https://stackoverflow.com/users/188/matthew-schinckel and https://stackoverflow.com/users/1164966/benoit-blanchon
having help classes: class SubqueryCount(Subquery) and class SubquerySum(Subquery)
class SubqueryCount(Subquery):
template = "(SELECT count(*) FROM (%(subquery)s) _count)"
output_field = PositiveIntegerField()
class SubquerySum(Subquery):
template = '(SELECT sum(_sum."%(column)s") FROM (%(subquery)s) _sum)'
def __init__(self, queryset, column, output_field=None, **extra):
if output_field is None:
output_field = queryset.model._meta.get_field(column)
super().__init__(queryset, output_field, column=column, **extra)
It works extremely well ! and is far quicker than the conventional Django Count annotation.
... at least in SQlite, and probably PostgreSQL as stated by others.
But when I tried in a MariaDB environnement ... it crashed !
MariaDB is apparently not able / not willing to handle correlated subqueries as those are considered sub-optimal.
In my case, as I try to get from the database multiple Count/distinct annotations for each record at the same time, I really see a tremendous gain in performance (in SQLite)
that I would like to replicate in MariaDB.
Would anyone be able to help me figure out a way to implement those helper functions for MariaDB ?
What should template be in this environnement?
matthew-schinckel ?
benoit-blanchon ?
rktavi ?

Going a bit deeper and analysis the Django docs a bit more in details, I was finally able to produce a satisfying way to produce a Count or Sum based on subquery.
For simplifying the process, I defined the following helper functions:
To generate the subquery:
def get_subquery(app_label, model_name, reference_to_model_object, filter_parameters={}):
"""
Return a subquery from a given model (work with both FK & M2M)
can add extra filter parameters as dictionary:
Use:
subquery = get_subquery(
app_label='puzzles', model_name='Puzzle',
reference_to_model_object='puzzle_family__target'
)
or directly:
qs.annotate(nb_puzzles=subquery_count(get_subquery(
'puzzles', 'Puzzle','puzzle_family__target')),)
"""
model = apps.get_model(app_label, model_name)
# we need to declare a local dictionary to prevent the external dictionary to be changed by the update method:
parameters = {f'{reference_to_model_object}__id': OuterRef('id')}
parameters.update(filter_parameters)
# putting '__id' instead of '_id' to work with both FK & M2M
return model.objects.filter(**parameters).order_by().values(f'{reference_to_model_object}__id')
To count the subquery generated through get_subquery:
def subquery_count(subquery):
"""
Use:
qs.annotate(nb_puzzles=subquery_count(get_subquery(
'puzzles', 'Puzzle','puzzle_family__target')),)
"""
return Coalesce(Subquery(subquery.annotate(count=Count('pk', distinct=True)).order_by().values('count'), output_field=PositiveIntegerField()), 0)
To sum the subquery generated through get_subquery on the field field_to_sum:
def subquery_sum(subquery, field_to_sum, output_field=None):
"""
Use:
qs.annotate(total_points=subquery_sum(get_subquery(
'puzzles', 'Puzzle','puzzle_family__target'),'points'),)
"""
if output_field is None:
output_field = queryset.model._meta.get_field(column)
return Coalesce(Subquery(subquery.annotate(result=Sum(field_to_sum, output_field=output_field)).order_by().values('result'), output_field=output_field), 0)
The required imports:
from django.db.models import Count, Subquery, PositiveIntegerField, DecimalField, Sum
from django.db.models.functions import Coalesce
I spent so many hours on solving this ...
I hope that this will save many of you all the frustration I went through figuring out the right way to proceed.

Related

Double reverse filtering in Django ORM

I am facing one issue while filtering the data .
I am having three models ...
class Product:
size = models.CharField(max_length=200)
class Make(models.Model):
name = models.ForeignKey(Product, related_name='product_set')
class MakeContent(models.Model):
make = models.ForeignKey(Make, related_name='make_set')
published = models.BooleanField()
I can generate a queryset that contains all Makes and each one's related MakeContents where published = True.
Make.objects.filter(make_set__published=True)
I'd like to know if it's possible (without writing SQL directly) for me to generate a queryset that contains all Product and each one's related MakeContents where published = True.
I have tried this
Product.objects.filter(product_set__make_set__published=True)
But it's not working
A subquery can solve the problem.
from django.db.models import Subquery
sub_query = MakeContent.objects.filter(published=True)
Product.objects.filter(
pk__in=Subquery(sub_query.values('make__name__pk'))
)
Use this
Product.objects.filter(make__makecontent__published=True).distinct()
Filtering through ForeignKey works both forward and backwards in Django. You can use the model name in lower case for backward filtering.
https://docs.djangoproject.com/en/3.1/topics/db/queries/#lookups-that-span-relationships
Finally .distinct() is required to remove the multiple values produced due to SQL joins

How to sort by the sum of a related field in Django using class based views?

If I have a model of an Agent that looks like this:
class Agent(models.Model):
name = models.CharField(max_length=100)
and a related model that looks like this:
class Deal(models.Model):
agent = models.ForeignKey(Agent, on_delete=models.CASCADE)
price = models.IntegerField()
and a view that looked like this:
from django.views.generic import ListView
class AgentListView(ListView):
model = Agent
I know that I can adjust the sort order of the agents in the queryset and I even know how to sort the agents by the number of deals they have like so:
queryset = Agent.objects.all().annotate(uc_count=Count('deal')).order_by('-uc_count')
However, I cannot figure out how to sort the deals by the sum of the price of the deals for each agent.
Given you already know how to annotate and sort by those annotations, you're 90% of the way there. You just need to use the Sum aggregate and follow the relationship backwards.
The Django docs give this example:
Author.objects.annotate(total_pages=Sum('book__pages'))
You should be able to do something similar:
queryset = Agent.objects.all().annotate(deal_total=Sum('deal__price')).order_by('-deal_total')
My spidy sense is telling me you may need to add a distinct=True to the Sum aggregation, but I'm not sure without testing.
Building off of the answer that Greg Kaleka and the question you asked under his response, this is likely the solution you are looking for:
from django.db.models import Case, IntegerField, When
queryset = Agent.objects.all().annotate(
deal_total=Sum('deal__price'),
o=Case(
When(deal_total__isnull=True, then=0),
default=1,
output_field=IntegerField()
)
).order_by('-o', '-deal_total')
Explanation:
What's happening is that the deal_total field is adding up the price of the deals object but if the Agent has no deals to begin with, the sum of the prices is None. The When object is able to assign a value of 0 to the deal_totals that would have otherwise been given the value of None

Django distinct related querying

I have two models:
Model A is an AbstractUserModel and Model B
class ModelB:
user = ForeignKey(User, related_name='modelsb')
timestamp = DateTimeField(auto_now_add=True)
What I want to find is how many users have at least one ModelB object created at least in 3 of the 7 past days.
So far, I have found a way to do it but I know for sure there is a better one and that is why I am posting this question.
I basically split the query into 2 parts.
Part1:
I added a foo method inside the User Model that checks if a user meets the above conditions
def foo(self):
past_limit = starting_date - timedelta(days=7)
return self.modelsb.filter(timestamp__gte=past_limit).order_by('timestamp__day').distinct('timestamp__day').count() > 2
Part 2:
In the Custom User Manager, I find the users that have more than 2 modelsb objects in the last 7 days and iterate through them applying the foo method for each one of them.
By doing this I narrow down the iterations of the required for loop. (basically its a filter function but you get the point)
def boo(self):
past_limit = timezone.now() - timedelta(days=7)
candidates = super().get_queryset().annotate(rc=Count('modelsb', filter=Q(modelsb__timestamp__gte=past_limit))).filter(rc__gt=2)
return list(filter(lambda x: x.foo(), candidates))
However, I want to know if there is a more efficient way to do this, that is without the for loop.
You can use conditional annotation.
I haven't been able to test this query, but something like this should work:
from django.db.models import Q, Count
past_limit = starting_date - timedelta(days=7)
users = User.objects.annotate(
modelsb_in_last_seven_days=Count('modelsb__timestap__day',
filter=Q(modelsb__timestamp__gte=past_limit),
distinct=True))
.filter(modelsb_in_last_seven_days__gte = 3)
EDIT:
This solution did not work, because the distinct option does specify what field makes an entry distinct.
I did some experimenting on my own Django instance, and found a way to make this work using SubQuery. The way this works is that we generate a subquery where we make the distinction ourself.
counted_modelb = ModelB.objects
.filter(user=OuterRef('pk'), timestamp__gte=past_limit)
.values('timestamp__day')
.distinct()
.annotate(count=Count('timestamp__day'))
.values('count')
query = User.objects
.annotate(modelsb_in_last_seven_days=Subquery(counted_modelb, output_field=IntegerField()))
.filter(modelsb_in_last_seven_days__gt = 2)
This annotates each row in the queryset with the count of all distinct days in modelb for the user, with a date greater than the selected day.
In the subquery I use values('timestamp__day') to make sure I can do distinct() (Because a combination of distinct('timestamp__day') and annotate() is unsupported.)

"SELECT...AS..." with related model data in Django

I have an application where users select their own display columns. Each display column has a specified formula. To compute that formula, I need to join few related columns (one-to-one relationship) and compute the value.
The models are like (this is just an example model, actual has more than 100 fields):
class CompanyCode(models.Model):
"""Various Company Codes"""
nse_code = models.CharField(max_length=20)
bse_code = models.CharField(max_length=20)
isin_code = models.CharField(max_length=20)
class Quarter(models.Model):
"""Company Quarterly Result Figures"""
company_code = models.OneToOneField(CompanyCode)
sales_now = models.IntegerField()
sales_previous = models.IntegerField()
I tried doing:
ratios = {'growth':'quarter__sales_now / quarter__sales_previous'}
CompanyCode.objects.extra(select=ratios)
# raises "Unknown column 'quarter__sales_now' in 'field list'"
I also tried using raw query:
query = ','.join(['round((%s),2) AS %s' % (formula, ratio_name)
for ratio_name, formula in ratios.iteritems()])
companies = CompanyCode.objects.raw("""
SELECT `backend_companycode`.`id`, %s
FROM `backend_companycode`
INNER JOIN `backend_quarter` ON ( `backend_companycode`.`id` = `backend_companyquarter`.`company_code_id` )
""", [query])
#This just gives empty result
So please give me a little clue as to how I can use related columns preferably using 'extra' command. Thanks.
By now the Django documentation says that one should use extra as a last resort.
So here is a query without extra():
from django.db.models import F
CompanyCode.objects.annotate(
growth=F('quarter__sales_now') / F('quarter__sales_previous'),
)
Since the calculation is being done on a single Quarter instance, where's the need to do it in the SELECT? You could just define a ratio method/property on the Quarter model:
#property
def quarter(self):
return self.sales_now / self.sales_previous
and call it where necessary
Ok, I found it out. In above using:
CompanyCode.objects.select_related('quarter').extra(select=ratios)
solved the problem.
Basically, to access any related model data through 'extra', we just need to ensure that that model is joined in our query. Using select_related, the query automatically joins the mentioned models.
Thanks :).

Ordering by a custom model field in django

I am trying to add an additional custom field to a django model. I have been having quite a hard time figuring out how to do the following, and I will be awarding a 150pt bounty for the first fully correct answer when it becomes available (after it is available -- see as a reference Improving Python/django view code).
I have the following model, with a custom def that returns a video count for each user --
class UserProfile(models.Model):
user = models.ForeignKey(User, unique=True)
positions = models.ManyToManyField('Position', through ='PositionTimestamp', blank=True)
def count(self):
from django.db import connection
cursor = connection.cursor()
cursor.execute(
"""SELECT (
SELECT COUNT(*)
FROM videos_video v
WHERE v.uploaded_by_id = p.id
OR EXISTS (
SELECT NULL
FROM videos_videocredit c
WHERE c.video_id = v.id
AND c.profile_id = p.id
)
) AS Total_credits
FROM userprofile_userprofile p
WHERE p.id = %d"""%(int(self.pk))
)
return int(cursor.fetchone()[0])
I want to be able to order by the count, i.e., UserProfile.objects.order_by('count'). Of course, I can't do that, which is why I'm asking this question.
Previously, I tried adding a custom model Manager, but the problem with that was I also need to be able to filter by various criteria of the UserProfile model: Specifically, I need to be able to do: UserProfile.objects.filter(positions=x).order_by('count'). In addition, I need to stay in the ORM (cannot have a raw sql output) and I do not want to put the filtering logic into the SQL, because there are various filters, and would require several statements.
How exactly would I do this? Thank you.
My reaction is that you're trying to take a bigger bite than you can chew. Break it into bite size pieces by giving yourself more primitives to work with.
You want to create these two pieces separately so you can call on them:
Does this user get credit for this video? return boolean
For how many videos does this user get credit? return int
Then use a combination of #property, model managers, querysets, and methods that make it easiest to express what you need.
For example you might attach the "credit" to the video model taking a user parameter, or the user model taking a video parameter, or a "credit" manager on users which adds a count of videos for which they have credit.
It's not trivial, but shouldn't be too tricky if you work for it.
"couldn't you use something like the "extra" queryset modifier?"
see the docs
I didn't put this in an answer at first because I wasn't sure it would actually work or if it was what you needed - it was more like a nudge in the (hopefully) right direction.
in the docs on that page there is an example
query
Blog.objects.extra(
select={
'entry_count': 'SELECT COUNT(*) FROM blog_entry WHERE blog_entry.blog_id = blog_blog.id'
},
)
resulting sql
SELECT blog_blog.*, (SELECT COUNT(*) FROM blog_entry WHERE blog_entry.blog_id = blog_blog.id) AS entry_count
FROM blog_blog;
Perhaps doing something like that and accessing the user id which you currently have as p.id as appname_userprofile.id
note:
Im just winging it so try to play around a bit.
perhaps use the shell to output the query as sql and see what you are getting.
models:
class Positions(models.Model):
x = models.IntegerField()
class Meta:
db_table = 'xtest_positions'
class UserProfile(models.Model):
user = models.ForeignKey(User, unique=True)
positions = models.ManyToManyField(Positions)
class Meta:
db_table = 'xtest_users'
class Video(models.Model):
usr = models.ForeignKey(UserProfile)
views = models.IntegerField()
class Meta:
db_table = 'xtest_video'
result:
test = UserProfile.objects.annotate(video_views=Sum('video__views')).order_by('video_views')
for t in test:
print t.video_views
doc: https://docs.djangoproject.com/en/dev/topics/db/aggregation/
This is either what you want, or I've completely misunderstood!.. Anywhoo... Hope it helps!