Django - aggregate fields value from joined model - django

My goal seems simple: display the Sum (aggregation) of a particular field on a foreign model.
The difficulty lies in the current set-up. Kindly take a look and let me know whether it needs to change or whether I can achieve the goal with the current models:
class Route(models.Model):
    name = models.CharField(max_length=50)
    route_length = models.IntegerField()
class Race(models.Model):
    race_cod = models.CharField(max_length=6, unique=True)
    route_id = models.ForeignKey(Route, on_delete=models.CASCADE, related_name='b_route')
class Results(models.Model):
    race_id = models.ForeignKey(Race, on_delete=models.CASCADE, related_name='r_race')
    runner_id = models.ForeignKey(Runner, on_delete=models.CASCADE, related_name='r_runner')
Now, I am trying to have like a year summary:
Runner X have raced in 12 races with a total distance of 134 km.
While I was able to count the number of races like this (views.py)
runner = Runner.objects.get(pk=pk)
number_races = Results.objects.filter(runner_id=runner).count()
For computing the distance I have tried:
distance = Results.objects.filter(runner_id=runner).annotate(total_km=Sum(race_id.route_id.route_length))
This code errors out on the distance line in views.py:
Exception Type: NameError
Exception Value: name 'race_id' is not defined
I am sure I have not understood exactly how this works. Would anybody be kind enough to clarify this issue?
Thank you

My workaround is the following:
tmp_race_id = Results.objects.filter(runner_id=runner).values('race_id')
tmp_route_id = Race.objects.filter(pk__in=tmp_race_id).values('route_id')
distance = Route.objects.filter(pk__in=tmp_route_id).aggregate(Sum("route_length"))['route_length__sum'] or 0.00
Thank you Jorge Lopez for the hint.

You don't need a Results model; you can calculate this from the data in the other models. Can you share your Runner model? It needs to have a foreign key to Race. If it does, your query can go from Route -> Race -> Runner, and you can use the same query for the count, so you end up with one variable holding the count and one holding the distance. To do a Sum in your query, do not use annotate, use aggregate, something like this:
.aggregate(total=Coalesce(Sum('route_length'), 0))['total']

Do it like this:
from django.db.models import Sum, Count
u = Runner.objects.annotate(
    tot_result=Count('r_runner'),
    tot_km=Sum('r_runner__race_id__route_id__route_length')
)
for i in u:
    print('Total_race {} -- Total_Km {}'.format(i.tot_result, i.tot_km))
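A double-underscore lookup such as r_runner__race_id__route_id__route_length is resolved by joining the tables along their foreign keys. As a rough illustration only, the sketch below reproduces the count and the sum with the standard-library sqlite3 module. The table names, column names, sample rows and the runner_summary helper are simplified assumptions made for this example, not the SQL Django actually emits.

```python
import sqlite3

# In-memory stand-ins for the Route, Race and Results tables (names are assumptions).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE route   (id INTEGER PRIMARY KEY, route_length INTEGER);
    CREATE TABLE race    (id INTEGER PRIMARY KEY, route_id INTEGER REFERENCES route(id));
    CREATE TABLE results (id INTEGER PRIMARY KEY, race_id INTEGER REFERENCES race(id),
                          runner_id INTEGER);
    INSERT INTO route VALUES (1, 10), (2, 21);
    INSERT INTO race VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO results (race_id, runner_id) VALUES (1, 7), (2, 7), (3, 7), (3, 8);
""")


def runner_summary(conn, runner_id):
    """Return (number of races, total distance) for one runner."""
    return conn.execute("""
        SELECT COUNT(results.id), COALESCE(SUM(route.route_length), 0)
        FROM results
        JOIN race  ON race.id = results.race_id
        JOIN route ON route.id = race.route_id
        WHERE results.runner_id = ?
    """, (runner_id,)).fetchone()


races, total_km = runner_summary(conn, 7)
print(f"Runner 7 has raced in {races} races with a total distance of {total_km} km.")
```

Runner 7 ran route 1 twice, and the join counts its length twice. A pk__in filter on Route, as in the workaround above, would count each distinct route only once, which may or may not be what is wanted.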

Related

Django annotation on compoundish primary key with filter ignoring primary key resulting in too many annotated items

Please see EDIT1 below, as well.
Using Django 3.0.6 and Python 3.8, given the following models:
class Plants(models.Model):
    plantid = models.TextField(primary_key=True, unique=True)
class Pollutions(models.Model):
    pollutionsid = models.IntegerField(unique=True, primary_key=True)
    year = models.IntegerField()
    plantid = models.ForeignKey(Plants, models.DO_NOTHING, db_column='plantid')
    pollutant = models.TextField()
    releasesto = models.TextField(blank=True, null=True)
    amount = models.FloatField(db_column="amount", blank=True, null=True)
    class Meta:
        managed = False
        db_table = 'pollutions'
        unique_together = (('plantid', 'releasesto', 'pollutant', 'year'))
class Monthp(models.Model):
    monthpid = models.IntegerField(unique=True, primary_key=True)
    year = models.IntegerField()
    month = models.IntegerField()
    plantid = models.ForeignKey(Plants, models.DO_NOTHING, db_column='plantid')
    power = models.IntegerField(null=False)
    class Meta:
        managed = False
        db_table = 'monthp'
        unique_together = ('plantid', 'year', 'month')
I'd like to annotate each plant, based on a foreign-key relationship and a filter, with its amount of CO2 and the Sum of its power for a given year. For the sake of debugging, I have replaced Sum with Count in the following query:
annotated = tmp.all().annotate(
    energy=Count('monthp__power', filter=Q(monthp__year=YEAR)),
    co2=Count('pollutions__amount', filter=Q(pollutions__year=YEAR, pollutions__pollutant="CO2", pollutions__releasesto="Air")))
However, this returns too many items (or, when using Sum, a wrong total):
annotated.first().co2 # 60, but it should be 1
annotated.first().energy # 252, but it should be 1
although my database guarantees - as denoted, that (plantid, year, month) and (plantid, releasesto, pollutant, year) are unique together, which can easily be demonstrated:
pl = annotated.first().plantid
testplant = Plants.objects.get(pk=pl) # plant object
pco2 = Pollutions.objects.filter(plantid=testplant, year=YEAR, pollutant="CO2", releasesto="Air")
len(pco2) # 1, as expected
Why does Django return too many results, and how can I tell Django to limit the elements to annotate to the 'current primary key', in other words to annotate only the elements where the foreign key matches the primary key?
I can achieve what I intend to do by using distinct and Max:
energy=Sum('yearly__power', distinct=True, filter=Q(yearly__year=YEAR)),
co2=Max('pollutions__amount', ...
However, the performance is unacceptable.
I have also tested using model_to_dict and appending the wanted values to the dict "by hand". This works for the values themselves, but not for sorting the resulting dict (e.g. by energy), and it is actually faster than the workaround directly above.
It strikes me as conceptually odd that the manual approach is faster than letting the database do what it is intended to do.
Is this a feature limitation of Django's ORM or am I missing something?
EDIT1:
This behaviour has been a known bug for 11 years.
Others have also "spent a whole day on this".
I am now trying subqueries. However, the foreign key I am using is not a primary key of its table, so the "usual" approach of using "pk=''" does not work. More precisely, trying:
tmp = Plants.objects.filter(somefilter)
subq1 = Subquery(Yearly.objects.filter(pk=OuterRef('plantid'), year=YEAR))
tmp1 = tmp.all().annotate(
    energy=Count(Subquery(subq1))
)
returns
OperationalError at /xyz
no such column: U0.yid
Which definitely makes sense because Plants has no clue what a yid is, it only knows plantids. How do I adjust the subquery to that?
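The inflated counts are consistent with what the Django aggregation documentation describes for combining multiple aggregations: the related tables are joined in one query, so every Monthp row is paired with every Pollutions row of the same plant. The sketch below shows the effect with the standard-library sqlite3 module, and how one correlated subquery per value avoids it. The simplified tables and sample rows are assumptions made for illustration, not the asker's real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE plants     (plantid TEXT PRIMARY KEY);
    CREATE TABLE monthp     (plantid TEXT, year INTEGER, month INTEGER, power INTEGER);
    CREATE TABLE pollutions (plantid TEXT, year INTEGER, pollutant TEXT, amount REAL);
    INSERT INTO plants VALUES ('P1');
    INSERT INTO monthp VALUES ('P1', 2019, 1, 100), ('P1', 2019, 2, 110), ('P1', 2019, 3, 120);
    INSERT INTO pollutions VALUES ('P1', 2019, 'CO2', 5.0), ('P1', 2019, 'NOx', 1.5);
""")

# Both child tables joined at once: 3 monthp rows x 2 pollutions rows = 6 joined rows.
joined = conn.execute("""
    SELECT COUNT(m.power), SUM(m.power), COUNT(p.amount)
    FROM plants pl
    LEFT JOIN monthp m     ON m.plantid = pl.plantid
    LEFT JOIN pollutions p ON p.plantid = pl.plantid
    GROUP BY pl.plantid
""").fetchone()

# One correlated subquery per value: each child table is aggregated on its own.
separate = conn.execute("""
    SELECT
        (SELECT SUM(m.power) FROM monthp m
          WHERE m.plantid = pl.plantid AND m.year = 2019),
        (SELECT SUM(p.amount) FROM pollutions p
          WHERE p.plantid = pl.plantid AND p.year = 2019 AND p.pollutant = 'CO2')
    FROM plants pl
""").fetchone()

print(joined)    # counts and sum inflated by the cross pairing
print(separate)  # per-table totals
```

In the ORM, the correlated form is usually written with Subquery and OuterRef, where the inner queryset filters on its own foreign-key field (something like plantid=OuterRef('pk')) rather than on its pk; that would presumably also avoid the "no such column" error, though it is untested against the models above.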

How do I construct an order_by for a specific record in a ManyToOne field?

I'm trying to sort (order) by statistical data stored in a ManyToOne relationship. Suppose I have the following code:
class Product(models.Model):
    info = ...
    data = models.IntegerField(default=0.0)
class Customer(models.Model):
    info = ...
    purchases = models.ManyToManyField(Product, related_name='customers', blank=True)
class ProductStats(models.Model):
    ALL = 0
    YOUNG = 1
    OLD = 2
    TYPE = ((ALL, 'All'), (YOUNG, 'Young'), (OLD, 'Old'),)
    stats_type = models.SmallIntegerField(choices=TYPE)
    product = models.ForeignKey(Product, related_name='stats', on_delete=models.CASCADE)
    data = models.FloatField(default=0.0)
Then I would like to sort the products by their stats for the ALL demographic (assume every product has a stats connected to it for ALL). This might look something like the following:
products = Product.objects.all().order_by('stats__data for stats__stats_type=0')
Currently the only solution I can think of is either to create a new stats class just for all and use a OneToOneField for Product. Or, add a OneToOneField for Product pointing to the ALL stats in ProductStats.
Thank you for your help.
How about using multiple fields in order_by, like this:
Product.objects.all().order_by('stats__data', 'stats__stats_type')
# orders by stats__data first, then by stats__stats_type; a product appears once per related stats row
Or if you want to get data for only stats_type 0:
Product.objects.filter(stats__stats_type=0).order_by('stats__data')
You can annotate the value of the relevant demographic and order by that:
from django.db.models import F
Product.objects.all().filter(stats__stats_type=0).annotate(data_for_all=F('stats__data')).order_by('data_for_all')
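To see why the filter on stats_type matters, it may help to look at the join behind these querysets. The sketch below uses the standard-library sqlite3 module with made-up product names and values (all assumptions for illustration): without the filter, the join yields one row per stats record, while filtering to the ALL type leaves exactly one row per product to sort by.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product      (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE productstats (product_id INTEGER, stats_type INTEGER, data REAL);
    INSERT INTO product VALUES (1, 'apple'), (2, 'bread'), (3, 'cocoa');
    INSERT INTO productstats VALUES
        (1, 0, 3.0), (1, 1, 9.0),
        (2, 0, 1.0), (2, 2, 8.0),
        (3, 0, 2.0), (3, 1, 0.5);
""")

ALL = 0  # mirrors ProductStats.ALL

# Ordering across the relation without a filter: each product shows up once per stats row.
unfiltered = [name for (name,) in conn.execute("""
    SELECT product.name
    FROM product
    JOIN productstats ON productstats.product_id = product.id
    ORDER BY productstats.data
""")]

# Restricting the join to the ALL stats gives one row per product, sorted by that value.
ordered = [name for (name,) in conn.execute("""
    SELECT product.name
    FROM product
    JOIN productstats ON productstats.product_id = product.id
    WHERE productstats.stats_type = ?
    ORDER BY productstats.data
""", (ALL,))]

print(unfiltered)
print(ordered)
```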

Proper way to annotate a rank field for a queryset

Assume models like this:
class Person(models.Model):
    name = models.CharField(max_length=20)
class Session(models.Model):
    start_time = models.TimeField(auto_now_add=True)
    end_time = models.TimeField(blank=True, null=True)
    person = models.ForeignKey(Person)
class GameSession(models.Model):
    game_type = models.CharField(max_length=2)
    score = models.PositiveIntegerField(default=0, blank=True)
    session = models.ForeignKey(Session)
I want a queryset method that returns the total score of each person (the sum of all his game scores and all the time he has spent in all his sessions), alongside the rank that person has relative to all persons. Something like below:
class DenseRank(Func):
    function = 'DENSE_RANK'
    template = '%(function)s() Over(Order by %(expressions)s desc)'
class PersonQuerySet(models.query.QuerySet):
    def total_scores(self):
        return self.annotate(total_score=some_fcn_for_calculate).annotate(rank=DenseRank('total_score'))
I could find a way to calculate the total score, but this dense rank is not what I want: it only ranks persons within the current queryset, whereas I want each person's rank relative to all persons.
I use Django 1.11 and Postgres 10.5. Please suggest a proper way to find the rank of each person in a queryset, because I want to be able to add another filter before or after calculating total_score and rank.
Sadly, this is not possible in a single query since, as far as I can tell, the PostgreSQL WHERE clause (filter/exclude) narrows the rows before the aggregation functions can work on them.
The only solution I found is to compute the ranking for all Persons with a separate queryset and then annotate your queryset with these results.
This answer (see the improved method) explains how to "annotate a queryset with externally prepared data in a dict".
Here is the implementation I made for your models:
from django.db import models
from django.db.models.functions import DenseRank
class PersonQuerySet(models.QuerySet):
    def total_scores(self):
        # compute the global ranking
        ranks = (Person.objects
                 .annotate(total_score=models.Sum('session__gamesession__score'))
                 .annotate(rank=models.Window(expression=DenseRank(),
                                              order_by=models.F('total_score').desc()))
                 .values('pk', 'rank'))
        # extract and put ranks in a dict
        rank_dict = dict((e['pk'], e['rank']) for e in ranks)
        # create `WHEN` conditions for mapping filtered Persons to their Rank
        whens = [models.When(pk=pk, then=rank) for pk, rank in rank_dict.items()]
        # build the query
        return (self.annotate(rank=models.Case(*whens, default=0,
                                               output_field=models.IntegerField()))
                .annotate(total_score=models.Sum('session__gamesession__score')))
I tested it with Django 2.1.3 and PostgreSQL 10.5, so the code may differ slightly for you.
Feel free to share a version compatible with Django 1.11!
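The idea behind this answer, ranking the whole population first and only then narrowing to the persons of interest, can be shown without a database. The following is a plain-Python sketch; the dense_rank helper, the names and the scores are all made up for illustration.

```python
def dense_rank(scores):
    """Map each key to its dense rank (1 = highest score) within `scores`."""
    distinct = sorted(set(scores.values()), reverse=True)
    rank_of_score = {score: i for i, score in enumerate(distinct, start=1)}
    return {key: rank_of_score[score] for key, score in scores.items()}


# Total score per person, as the first annotate() would compute it.
total_scores = {"ann": 50, "bob": 70, "cat": 50, "dan": 20}

# Step 1: rank everybody, before any filtering.
global_rank = dense_rank(total_scores)

# Step 2: filter afterwards and look the global rank up (the role of the CASE/WHEN mapping).
subset = ["cat", "dan"]
ranked_subset = [(name, total_scores[name], global_rank[name]) for name in subset]

# For contrast: ranking only the filtered rows gives ranks relative to the subset.
local_rank = dense_rank({name: total_scores[name] for name in subset})

print(ranked_subset)
print(local_rank)
```

The contrast at the end mirrors the original problem: once the filter runs first, the rank is computed only over the rows that survive it.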

Sorting by distance with a related ManyToMany field

I have these two models.
class Store(models.Model):
    coords = models.PointField(null=True,blank=True)
    objects = models.GeoManager()
class Product(models.Model):
    stores = models.ManyToManyField(Store, null=True, blank=True)
    objects = models.GeoManager()
I want to get the products sorted by the distance to a point. If the stores field in Product was a Foreign Key I would do this and it works.
pnt = GEOSGeometry('POINT(5 23)')
Product.objects.distance(pnt, field_name='stores__coords').order_by('distance')
But since the field is a ManyToMany field it breaks with
ValueError: <django.contrib.gis.db.models.fields.PointField: coords> is not in list
I kind of expected this because it's not clear which of the stores it should use to calculate the distance, but is there any way to do this?
I need the list of products ordered by distance to a specific point.
Just an idea, maybe this would work for you, this should take only two database queries (due to how prefetch works). Don't judge harshly if it doesn't work, I haven't tried it:
class Store(models.Model):
    coords = models.PointField(null=True,blank=True)
    objects = models.GeoManager()
class Product(models.Model):
    stores = models.ManyToManyField(Store, null=True, blank=True, through='ProductStore')
    objects = models.GeoManager()
class ProductStore(models.Model):
    product = models.ForeignKey(Product)
    store = models.ForeignKey(Store)
    objects = models.GeoManager()
then:
pnt = GEOSGeometry('POINT(5 23)')
ps = ProductStore.objects.distance(pnt, field_name='store__coords').order_by('distance').prefetch_related('product')
for p in ps:
    p.product ... # do whatever you need with it
This is how I solved it, but I don't really like this solution. I think it is very inefficient. There should be a better way with GeoDjango. So, until I find a better solution, I probably won't be using this. Here's what I did.
I added a new method to the product model
class Product(models.Model):
    stores = models.ManyToManyField(Store, null=True, blank=True)
    objects = models.GeoManager()
    def get_closest_store_distance(self, point):
        sorted_stores = self.stores.distance(point).order_by('distance')
        if sorted_stores.count() > 0:
            store = sorted_stores[0]
            return store.distance.m
        return 99999999 # If no store, return very high distance
Then I can sort this way
def sort_products(self, obj_list, lat, lng):
    pt = 'POINT(%s %s)' % (lng, lat)
    srtd = sorted(obj_list, key=lambda obj: obj.get_closest_store_distance(pt))
    return srtd
Any better solutions or ways to improve this one are very welcome.
I will take "distance from a product to a point" to be the minimum distance from the point to a store with that product. I will take the output to be a list of (product, distance) for all products sorted by distance ascending. (A comment by someone who placed a bounty indicated they sometimes also want (product,distance,store) sorted by distance then store within product.)
Every model has a corresponding table. The fields of the model are the columns of the table. Every model/table should have a fill-in-the-(named-)blanks statement where its records/rows are the ones that make a true statement.
Store(coords,...) // store [store] is at [coords] and ...
Product(product,store,...) // product [product] is stocked by store [store] and ...
Since Product has store(s) as a ManyToManyField, it already is a "ProductStore" table of products and stocking stores, and Store already is a "StoreCoord" table of stores and their coordinates.
You can mention any object's fields in a query filter() for a model with a ManyToManyField.
The SQL for this is simple:
select p.product, min(distance(s.coord,[pnt])) as distance
from Store s join Product p
on s.store=p.store
group by p.product
order by distance
It should be straightforward to map this to a query. However, I am not familiar enough with Django to give you exact code now.
from django.db.models import F
q = Product.objects.all()
.filter(store__product=F('product'))
...
.annotate(distance=Min('coord.distance([pnt])'))
...
.order_by('distance')
The Min() is an example of aggregation.
You may also be helped by explicitly making a subquery.
It is also possible to query this through the raw interface. However, the names above are not right for a Django raw query. E.g. the table names will by default be APPL_store and APPL_product, where APPL is your application name. Also, distance is not your PointField operator; you must supply the right distance function. But you should not need to query at the raw level.
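The definition used in this answer, where the distance from a product to a point is the minimum distance to any store stocking it, can be sketched in plain Python. This is only an illustration of the grouping logic: the store coordinates, product names and the products_by_distance helper are invented, and planar Euclidean distance stands in for whatever distance function the geographic backend provides.

```python
import math

# Hypothetical data: store id -> (x, y), and product -> ids of the stores stocking it.
stores = {"s1": (0.0, 0.0), "s2": (3.0, 4.0), "s3": (10.0, 0.0)}
stocked_by = {"milk": ["s2", "s3"], "tea": ["s1"], "salt": []}


def products_by_distance(point, stores, stocked_by):
    """Return (product, distance to nearest stocking store), sorted by ascending distance."""
    px, py = point
    result = []
    for product, store_ids in stocked_by.items():
        if not store_ids:
            continue  # a product with no store has no defined distance
        nearest = min(
            math.hypot(px - stores[s][0], py - stores[s][1]) for s in store_ids
        )
        result.append((product, nearest))
    return sorted(result, key=lambda pair: pair[1])


print(products_by_distance((6.0, 0.0), stores, stocked_by))
```

Products without any store are skipped here rather than given a sentinel distance such as 99999999; which behaviour is preferable depends on how the list is displayed.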

Average calculated on the difference on a Min and Max value in Django

I have two simple tables in Django which look like:
class Session(models.Model):
    id = models.AutoField(primary_key=True)
class Track(models.Model):
    id = models.AutoField(primary_key=True)
    session = models.ForeignKey(Session)
    when = models.DateTimeField(null=False, auto_now_add=True)
I need to find the average duration of all the sessions. The duration of a session is calculated by subtracting its lowest when value from its highest when value. Can I do something like this:
Session.objects.all().annotate(duration=Max('track__when') - Min('track__when')).aggregate(Avg('duration'))
Any better methods?
Thanks.
As I already answered for the same question in Querying data from Django, you can't do this using only the ORM API. Either:
you extend the API by defining your custom annotations, or
you do part of the algorithm in Python, in your view/templatetag/whatever.
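For reference, the calculation being asked for is an aggregate over a per-group aggregate, which in SQL takes a subquery. The sketch below shows that shape with the standard-library sqlite3 module. Timestamps are stored as plain integer seconds in a column named when_ts; both choices, like the sample rows, are assumptions made to keep the example small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE track (id INTEGER PRIMARY KEY, session_id INTEGER, when_ts INTEGER);
    INSERT INTO track (session_id, when_ts) VALUES
        (1, 100), (1, 160), (1, 130),
        (2, 500), (2, 520);
""")

# Inner query: one duration (max - min) per session.
# Outer query: average of those durations.
(avg_duration,) = conn.execute("""
    SELECT AVG(duration)
    FROM (
        SELECT MAX(when_ts) - MIN(when_ts) AS duration
        FROM track
        GROUP BY session_id
    )
""").fetchone()

print(avg_duration)
```

The same two steps can also be done in Python, as the second option suggests: fetch the minimum and maximum per session, then average the differences in the view.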