Django annotate M2M object with field from through model - django

Let's say I have the following models:
class Color(models.Model):
    name = models.CharField(max_length=255, unique=True)
    users = models.ManyToManyField(User, through="UserColor", related_name="colors")

class UserColor(models.Model):
    class Meta:
        unique_together = (("user", "color"), ("user", "rank"))

    user = models.ForeignKey(User, on_delete=models.CASCADE)
    color = models.ForeignKey(Color, on_delete=models.CASCADE)
    rank = models.PositiveSmallIntegerField()
I want to fetch all users from the database with their respective colors and color ranks. I know I can do this by traversing across the through model, which makes a total of 3 DB hits:
users = User.objects.prefetch_related(
    Prefetch(
        "usercolor_set",
        queryset=UserColor.objects.order_by("rank").prefetch_related(
            Prefetch("color", queryset=Color.objects.only("name"))
        ),
    )
)
for user in users:
    for usercolor in user.usercolor_set.all():
        print(user, usercolor.color.name, usercolor.rank)
I discovered another way to do this by annotating the rank onto the Color objects, which makes sense because we have a unique constraint on user and color.
users = User.objects.prefetch_related(
    Prefetch(
        "colors",
        queryset=(
            Color.objects.annotate(rank=F("usercolor__rank"))
            .order_by("rank")
            .distinct()
        ),
    )
)
for user in users:
    for color in user.colors.all():
        print(user, color, color.rank)
This approach comes with several benefits:
Makes only 2 DB hits instead of 3.
Don't have to deal with the through object, which I think is more intuitive.
However, it only works if I chain distinct() (otherwise I get duplicate objects) and I'm worried this may not be a legit approach (maybe I just came up with a hack that may not work in all cases).
So is the second solution legit? Is there a better way to do it? Or should I stick to the first one?
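One way to sanity-check the second approach is to look at what the annotation does at the SQL level. The sketch below uses plain sqlite3 with hypothetical table names (color, usercolor; Django's real tables carry an app prefix) and made-up rows. It only shows that joining Color to the through table yields one row per (color, user) pair, and that a rank is meaningful only once the join is tied to a single user. How Django combines this join with the prefetch filter is not modelled here and is best confirmed by inspecting the generated SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE color (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE usercolor (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        color_id INTEGER NOT NULL REFERENCES color(id),
        rank INTEGER NOT NULL,
        UNIQUE (user_id, color_id),
        UNIQUE (user_id, rank)
    );
    INSERT INTO color (id, name) VALUES (1, 'red'), (2, 'blue');
    -- user 1 ranks red first; user 2 ranks blue first and red second
    INSERT INTO usercolor (user_id, color_id, rank) VALUES
        (1, 1, 1), (2, 2, 1), (2, 1, 2);
""")

# Joining color to the through table yields one row per (color, user) pair,
# so 'red' appears twice, each time with a different user's rank.
joined = conn.execute("""
    SELECT color.name, usercolor.user_id, usercolor.rank
    FROM color
    LEFT JOIN usercolor ON usercolor.color_id = color.id
    ORDER BY color.name, usercolor.user_id
""").fetchall()

# Restricting the same join to a single user gives exactly one rank per color.
for_user_2 = conn.execute("""
    SELECT color.name, usercolor.rank
    FROM color
    JOIN usercolor ON usercolor.color_id = color.id
    WHERE usercolor.user_id = ?
    ORDER BY usercolor.rank
""", (2,)).fetchall()
```

If the rank column in the generated query comes from a join that is not restricted to the user being prefetched, distinct() would hide repeated rows without guaranteeing that each rank belongs to the right user, so that is the part worth verifying.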

Related

Trouble with Django annotation with multiple ForeignKey references

I'm struggling with annotations and haven't found examples that help me understand. Here are relevant parts of my models:
class Team(models.Model):
    team_name = models.CharField(max_length=50)

class Match(models.Model):
    match_time = models.DateTimeField()
    team1 = models.ForeignKey(
        Team, on_delete=models.CASCADE, related_name='match_team1')
    team2 = models.ForeignKey(
        Team, on_delete=models.CASCADE, related_name='match_team2')
    team1_points = models.IntegerField(null=True)
    team2_points = models.IntegerField(null=True)
What I'd like to end up with is an annotation on the Teams objects that would give me each team's total points. Sometimes, a team is match.team1 (so their points are in match.team1_points) and sometimes they are match.team2, with their points stored in match.team2_points.
This is as close as I've gotten, in maybe a hundred or so tries:
teams = Team.objects.annotate(
    total_points=Value(
        # Parenthesize each "or 0" so that both sums are always added together.
        (Match.objects.filter(team1=21).aggregate(total=Sum(F('team1_points')))['total'] or 0)
        + (Match.objects.filter(team2=21).aggregate(total=Sum(F('team2_points')))['total'] or 0),
        output_field=IntegerField(),
    )
)
This works great, but (of course) annotates the total_points for the team with pk=21 to every team in the queryset. If there's a better approach for all this, I'd love to see it, but short of that, if you can show me how to turn those '21' values into a reference to the outer team's pk, I think that will work?
EDIT: I ended up using a combination of elyas' answers and annotating a raw SQL statement to solve my issues. I was not able to keep normal annotations from dropping non-unique scores from the queryset, but raw SQL seems to work.
Here's that raw annotation:
teams = Team.objects.raw('select id, sum(points) as total_points from (select team1_id as id, team1_points as points from leagueman_match union all select team2_id as id, team2_points as points from leagueman_match) group by id order by total_points desc;')
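To see what the raw statement computes, here is a minimal sketch that runs the same UNION ALL shape against an in-memory SQLite table. The table name leagueman_match comes from the query above; the rows are made up, and the derived-table alias per_side is an addition, since some databases require one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE leagueman_match (
        id INTEGER PRIMARY KEY,
        team1_id INTEGER NOT NULL,
        team2_id INTEGER NOT NULL,
        team1_points INTEGER,
        team2_points INTEGER
    );
    -- made-up sample rows: team 1 scores 3 twice as team1, then 2 and 5 as team2
    INSERT INTO leagueman_match (team1_id, team2_id, team1_points, team2_points) VALUES
        (1, 2, 3, 0),
        (1, 2, 3, 1),
        (2, 1, 0, 2),
        (2, 1, 4, 5);
""")

# Stack both sides of every match into (team id, points) rows, then sum per team.
# UNION ALL keeps equal scores; a plain UNION would collapse identical rows.
totals = conn.execute("""
    SELECT id, SUM(points) AS total_points
    FROM (
        SELECT team1_id AS id, team1_points AS points FROM leagueman_match
        UNION ALL
        SELECT team2_id AS id, team2_points AS points FROM leagueman_match
    ) AS per_side
    GROUP BY id
    ORDER BY total_points DESC
""").fetchall()
```

With these rows, team 1 totals 13 and team 2 totals 5, including the two identical 3-point matches.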
An alternative solution to annotating the QuerySet might be to make total_points a @property of the Team model (depending on the use case):
from django.db.models import Case, Q, Sum, When

class Team(models.Model):
    team_name = models.CharField(max_length=50)

    @property
    def total_points(self):
        return Match.objects.filter(Q(team1=self.id) | Q(team2=self.id)).aggregate(
            total_points=Sum(Case(
                When(team1=self.id, then='team1_points'),
                When(team2=self.id, then='team2_points')
            ))
        )['total_points']
The disadvantage is that it can't be used in subsequent QuerySet operations e.g. .values(), .order_by().
Django also has a @cached_property decorator which will cache the output of the attribute when it is first called.
Other solutions tried
Originally I thought you could leverage the reverse relations match_team1 and match_team2 from the Team model to generate a simple annotation:
teams = Team.objects.annotate(
    total_points=(
        Sum('match_team1__team1_points', distinct=True)
        + Sum('match_team2__team2_points', distinct=True)
    )
)
Unfortunately this solution encounters difficulties in handling duplicates. The distinct=True argument eliminates the issue of points from the same match being summed more than once. But it introduces a different issue where different matches with the same points scored will be excluded.
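The duplicate problem described above can be reproduced at the SQL level. The sketch below is an illustration with sqlite3, not the exact SQL Django emits: the table name leagueman_team and the rows are assumptions. It joins the team to its matches twice, once per foreign key, and compares a plain SUM, a SUM(DISTINCT ...), and the real total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE leagueman_team (id INTEGER PRIMARY KEY, team_name TEXT);
    CREATE TABLE leagueman_match (
        id INTEGER PRIMARY KEY,
        team1_id INTEGER NOT NULL,
        team2_id INTEGER NOT NULL,
        team1_points INTEGER,
        team2_points INTEGER
    );
    INSERT INTO leagueman_team (id, team_name) VALUES (1, 'A'), (2, 'B');
    -- team 1 scores 3 and 3 as team1, then 2 and 5 as team2
    INSERT INTO leagueman_match (team1_id, team2_id, team1_points, team2_points) VALUES
        (1, 2, 3, 0), (1, 2, 3, 1), (2, 1, 0, 2), (2, 1, 4, 5);
""")

TWO_JOINS = """
    FROM leagueman_team AS t
    LEFT JOIN leagueman_match AS m1 ON m1.team1_id = t.id
    LEFT JOIN leagueman_match AS m2 ON m2.team2_id = t.id
    WHERE t.id = 1
"""

# Two joins give 2 x 2 = 4 rows for team 1, so every score is counted twice.
plain_total = conn.execute(
    "SELECT SUM(m1.team1_points) + SUM(m2.team2_points)" + TWO_JOINS
).fetchone()[0]

# DISTINCT removes the repetition but also merges the two separate 3-point matches.
distinct_total = conn.execute(
    "SELECT SUM(DISTINCT m1.team1_points) + SUM(DISTINCT m2.team2_points)" + TWO_JOINS
).fetchone()[0]

# Reference value: pick the right column per match, without any fan-out.
true_total = conn.execute("""
    SELECT SUM(CASE WHEN team1_id = 1 THEN team1_points ELSE team2_points END)
    FROM leagueman_match
    WHERE team1_id = 1 OR team2_id = 1
""").fetchone()[0]
```

With these rows the plain join reports 26 and the DISTINCT variant 10, while the real total is 13.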
I would advise refactoring your data model. Even if you find a solution for this specific problem, you may want to think a little ahead.
This is a solution:
class Team(models.Model):
    team_name = models.CharField(max_length=50)

class Match(models.Model):
    match_time = models.DateTimeField()
    team1 = models.ForeignKey(
        Team, on_delete=models.CASCADE, related_name='match_team1')
    team2 = models.ForeignKey(
        Team, on_delete=models.CASCADE, related_name='match_team2')

class Points(models.Model):
    match = models.ForeignKey(
        Match, on_delete=models.CASCADE, related_name='match')
    team = models.ForeignKey(
        Team, on_delete=models.CASCADE, related_name='team')
    points = models.IntegerField(null=True)
With this you can easily sum up the points of any team, and also filter them by match.

Annotating values from filtered related objects -- Case, Subquery, or another method?

I have some models in Django:
# models.py, simplified here
class Category(models.Model):
    """The category an inventory item belongs to. Examples: car, truck, airplane"""
    name = models.CharField(max_length=255)

class UserInterestCategory(models.Model):
    """
    How interested is a user in a given category. `interest` can be set by any method, maybe a neural network or something like that
    """
    user = models.ForeignKey(User, on_delete=models.CASCADE)  # user is the stock Django user
    category = models.ForeignKey(Category, on_delete=models.CASCADE)
    interest = models.PositiveIntegerField(default=0, validators=[MinValueValidator(0)])

class Item(models.Model):
    """This is a product that we have in stock, which we are trying to get a User to buy"""
    model_number = models.CharField(max_length=40, default="New inventory item")
    product_category = models.ForeignKey(Category, null=True, blank=True, on_delete=models.SET_NULL, verbose_name="Category")
I have a list view showing items, and I'm trying to sort by user_interest_category for the currently logged in user.
I have tried a couple different querysets and I'm not thrilled with them:
primary_queryset = Item.objects.all()

# this one works, and it's fast, but only finds items the user ALREADY has an interest in --
primary_queryset = primary_queryset.filter(
    product_category__userinterestcategory__user=self.request.user
).annotate(
    recommended=F('product_category__userinterestcategory__interest')
)

# this one works great but the baby jesus weeps at its slowness
# probably because we are iterating through every user, item, and userinterestcategory in the db
primary_queryset = primary_queryset.annotate(
    recommended=Case(
        When(
            product_category__userinterestcategory__user=self.request.user,
            then=F('product_category__userinterestcategory__interest'),
        ),
        default=Value(0),
        output_field=IntegerField(),
    )
)

# this one works, but it's still a bit slow -- 2-3 seconds per query:
interest = Subquery(
    UserInterestCategory.objects.filter(
        category=OuterRef('product_category'), user=self.request.user
    ).values('interest')
)
# annotate() needs a keyword alias for a Subquery expression
primary_queryset = primary_queryset.annotate(recommended=interest)
The third method is workable, but it doesn't seem like the most efficient way to do things. Isn't there a better method than this?
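For comparison, here is a sketch of the query shape that keeps every item and fills in 0 where the current user has no interest row: a LEFT JOIN restricted to that user, plus COALESCE. It is written against sqlite3 with hypothetical table names and made-up rows, and it assumes at most one UserInterestCategory row per (user, category) pair; it is not the SQL Django generates for any of the querysets above. The index is likewise an assumption about what might help, not a measured fix.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_interest_category (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        category_id INTEGER NOT NULL REFERENCES category(id),
        interest INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE item (
        id INTEGER PRIMARY KEY,
        model_number TEXT,
        product_category_id INTEGER REFERENCES category(id)
    );
    -- A composite index lets the per-user lookup avoid scanning every interest row.
    CREATE INDEX uic_user_category ON user_interest_category (user_id, category_id);

    INSERT INTO category (id, name) VALUES (1, 'car'), (2, 'truck'), (3, 'airplane');
    INSERT INTO user_interest_category (user_id, category_id, interest) VALUES
        (10, 1, 5), (10, 2, 2), (11, 3, 9);
    INSERT INTO item (id, model_number, product_category_id) VALUES
        (1, 'C-100', 1), (2, 'T-200', 2), (3, 'A-300', 3), (4, 'X-000', NULL);
""")

current_user_id = 10  # stands in for self.request.user.id

# The user condition lives in the ON clause, so items without a matching
# interest row are kept and COALESCE turns their NULL into 0.
recommended = conn.execute("""
    SELECT item.model_number, COALESCE(uic.interest, 0) AS recommended
    FROM item
    LEFT JOIN user_interest_category AS uic
        ON uic.category_id = item.product_category_id
       AND uic.user_id = ?
    ORDER BY recommended DESC, item.model_number
""", (current_user_id,)).fetchall()
```

Here user 10 gets C-100 and T-200 first, while A-300 (only user 11 is interested) and the uncategorized X-000 both fall back to 0.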

Django: How to create a queryset from a model relationship?

I have been struggling with grasping relations for some time and would be very grateful if someone can help me out on this issue.
I have a relation that connects the User model to a ProcessInfo model as one-to-many, and another relation that connects ProcessInfo to ProcessAssumptions as one-to-one.
Is there a way to use the User id to get all ProcessAssumptions related to all processes from that user?
I would like to retrieve a queryset of all ProcessAssumptions related to a user id.
Here are the models:
class ProcessInfo(models.Model):
    process_name = models.CharField(max_length=120, null=True)
    user_rel = models.ForeignKey(User, null=True, on_delete=models.SET_NULL)

class ProcessAssumptions(models.Model):
    completion_time = models.FloatField(default='0')
    process_rel_process = models.OneToOneField(ProcessInfo, primary_key = True, on_delete=models.CASCADE)
Use a field lookup that spans the foreign keys:
process_assumption_objects = ProcessAssumptions.objects.filter(process_rel_process__user_rel=<user_id>)
Replace <user_id> with the id you wish to query for.
When model Y defines a relationship to model X, all related Y objects can be accessed from an instance of X via x_instance.y_set.all() (the default accessor is the lowercased model name plus _set). You can also call the regular filter() or get() on it. x_instance.y_set is a manager for Y, like Y.objects, but restricted to the objects that are related to x_instance.
So in this specific case, you can get the ProcessAssumptions for all of a certain user's processes like this:
user = User.objects.get(pk=the_user_id)
# processassumptions is the default reverse accessor of the OneToOneField;
# it raises an exception for a ProcessInfo that has no ProcessAssumptions row
required_assumptions = [proc_info.processassumptions for proc_info in user.processinfo_set.all()]
This might be a bit hard to read with the _set suffix, so you can pass a related_name argument when defining the relation on the model, like:
# in class ProcessInfo
user_rel = models.ForeignKey(User, null=True, on_delete=models.SET_NULL, related_name='processes')
# and now you can do
some_user.processes.all()

Best way to handle one ForeignKey field that can be sourced from multiple Database Tables

I am running into a somewhat unique problem and wanted to see which solution fits best practice, or if I am missing anything in my design.
I have a model - it has a field on it that represents a metric. That metric is a foreign key to an object which can come from several database tables.
Idea one:
Multiple ForeignKey fields. I'll have the benefits of the cascade options, direct access to the foreign key model instance from MyModel (although that's an easy property to add), and the related lookups. Pitfalls include needing to check an arbitrary number of fields on the model for an FK. Another is the logic to make sure that only one FK field has a value at a given time (easy to check pre-save, although .update poses a problem). Then there's the added space in the database from all of the columns, although that is less concerning.
class MyModel(models.Model):
    source_one = models.ForeignKey(
        SourceOne,
        null=True,
        blank=True,
        on_delete=models.SET_NULL,
        db_index=True
    )
    source_two = models.ForeignKey(
        SourceTwo,
        null=True,
        blank=True,
        on_delete=models.SET_NULL,
        db_index=True
    )
    source_three = models.ForeignKey(
        SourceThree,
        null=True,
        blank=True,
        on_delete=models.SET_NULL,
        db_index=True
    )
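The "only one FK at a time" rule from idea one can also be pushed into the database, which covers the .update case because the check runs on every write. Below is a minimal sqlite3 sketch of such a CHECK constraint; the table and column names are hypothetical stand-ins for what Django would create. In Django, a constraint of this kind would normally be declared with CheckConstraint in Meta.constraints rather than written by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE mymodel (
        id INTEGER PRIMARY KEY,
        source_one_id INTEGER,
        source_two_id INTEGER,
        source_three_id INTEGER,
        -- Each IS NOT NULL test evaluates to 0 or 1 in SQLite,
        -- so the sum counts how many source columns are set.
        CHECK (
            (source_one_id IS NOT NULL)
            + (source_two_id IS NOT NULL)
            + (source_three_id IS NOT NULL) <= 1
        )
    )
""")
conn.execute("INSERT INTO mymodel (id, source_one_id) VALUES (1, 7)")


def violates(sql):
    """Return True if the statement is rejected by the CHECK constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# Setting two sources at once is rejected on insert and on update alike.
insert_rejected = violates(
    "INSERT INTO mymodel (source_one_id, source_two_id) VALUES (1, 2)"
)
update_rejected = violates("UPDATE mymodel SET source_two_id = 3 WHERE id = 1")

# Switching from one source to another in a single statement is still allowed.
switch_allowed = not violates(
    "UPDATE mymodel SET source_one_id = NULL, source_two_id = 3 WHERE id = 1"
)
```

The constraint does not remove the need to inspect several columns when reading the source, so it addresses only the integrity pitfall, not the lookup one.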
Idea two:
Store a source_id and source on the model. Biggest concern I have with this is needing to maintain logic to set these fields to null if the source is deleted. It otherwise seems like a cleaner solution, but not sure if the overhead to make sure the data is accurate is worth it. I can probably write some logic in a delete hook on the fk models to clean MyModel up if necessary.
class MyModel(models.Model):
    ONE = 1
    TWO = 2
    THREE = 3
    SOURCES = (
        (ONE, "SourceOne"),
        (TWO, "SourceTwo"),
        (THREE, "SourceThree")
    )
    source_id = models.PositiveIntegerField(null=True, blank=True)
    source = models.PositiveIntegerField(null=True, blank=True, choices=SOURCES)
I would love the community's opinion.
Your second idea seems fragile, as the integrity is not ensured by the database, as you have pointed out yourself.
Without knowing more about the use case, it's difficult to give informed advice. However, if your "metric" object is referred to by many other tables, I wonder if you should consider approaching this the other way round, i.e. defining the relationships from the models consuming this metric.
To illustrate, let's say that your project is a photo gallery and that your model represents a tag. Tags could be associated with photos, photo albums or users (e.g. the tags they want to follow).
The approach would be as follows:
class Tag(models.Model):
    pass

class Photo(models.Model):
    tags = models.ManyToManyField(Tag)

class Album(models.Model):
    tags = models.ManyToManyField(Tag)

class User(AbstractUser):
    followed_tags = models.ManyToManyField(Tag)
You may even consider factoring this relationship out into an abstract model, as outlined below:
class Tag(models.Model):
    pass

class TaggedModel(models.Model):
    tags = models.ManyToManyField(Tag)

    class Meta:
        abstract = True

class Photo(TaggedModel):
    pass
As mentioned in the comments, you are looking for a Generic Relation:
from django.contrib.contenttypes.fields import GenericForeignKey
from django.contrib.contenttypes.models import ContentType

class SourceA(models.Model):
    name = models.CharField(max_length=45)

class SourceB(models.Model):
    name = models.CharField(max_length=45)

class MyModel(models.Model):
    content_type = models.ForeignKey(ContentType, on_delete=models.CASCADE)
    object_id = models.PositiveIntegerField()
    source = GenericForeignKey('content_type', 'object_id')
There are three parts to setting up a Generic Relation:
1. Give your model a ForeignKey to ContentType. The usual name for this field is “content_type”.
2. Give your model a field that can store primary key values from the models you’ll be relating to. For most models, this means a PositiveIntegerField. The usual name for this field is “object_id”.
3. Give your model a GenericForeignKey, and pass it the names of the two fields described above. If these fields are named “content_type” and “object_id”, you can omit this – those are the default field names GenericForeignKey will look for.
Now you can pass any Source instance to the source field of MyModel, regardless of which model it belongs to:
source_a = SourceA.objects.first()
source_b = SourceB.objects.first()
MyModel.objects.create(source=source_a)
MyModel.objects.create(source=source_b)

m2m field 'through' another model that contains two of the same fields

The basic idea is that I want to track training and have a roster for each training session. I would like to also track who entered each person in the roster hence a table rather than just an M2M to the Member model within Training.
So, here is what I currently have:
class Training( models.Model ):
    name = models.CharField( max_length=100, db_index=True )
    date = models.DateField( db_index=True )
    roster = models.ManyToManyField( Member, through='TrainingRoster' )

class TrainingRoster( models.Model ):
    training = models.ForeignKey( Training )
    member = models.ForeignKey( Member )
    ## auto info
    entered_by = models.ForeignKey( Member, related_name='training_roster_entered_by' )
    entered_on = models.DateTimeField( auto_now_add = True )
The problem is that Django doesn't like the "roster=models.m2m( Member, through='TrainingRoster' )" line, as there are two fields in TrainingRoster with a ForeignKey to Member. I understand why it is unhappy, but is there not a way to specify something like through='TrainingRoster.member'? That doesn't work, but it seems like it should.
[I will admit that I am wondering if the "entered_by" and "entered_on" fields are the best for these models. I want to track who is entering each piece of information, but possibly a log table might be better than having the two extra fields in the TrainingRoster table. But that is a whole separate question, though it would make this question easier. :-) ]