For reporting I want to group some counts together but still have access to the row data. I feel like I do this a lot but can't find any of the code today. This is in a B2B video delivery platform. A simplified version of my modelling:
Video(name)
Company(name)
View(video, company, when)
I want to show which companies viewed the most videos, but also show which Videos were the most popular.
+-----------+---------+-------+
| Company   | Video   | Views |
+-----------+---------+-------+
| Company E | Video 1 |  1215 |
|           | Video 2 |     5 |
|           | Video 3 |     1 |
| Company B | Video 2 |   203 |
|           | Video 4 |     3 |
+-----------+---------+-------+
I can do that by annotating views (to get the companies ordered with the most views first) and then looping over the companies and running a View query for each one, but how can I do this in O(1-3) queries rather than O(n)?
So I think I got there with .values(). I always seem to trip over this and it does mean I'll have to do a couple of additional lookups to pull out the video and company names... But 3 feels healthier than n queries.
from django.db.models import Count, Q

# Assumed definition; the original post only referenced q_last_month
q_last_month = Q(views__when__gte=last_month_start)

results = (
    Video.objects
    .values('views__company_id', 'id')
    .annotate(
        view_count=Count('views', filter=q_last_month)  # only count last month's views
    )
    .order_by('-view_count')  # most-viewed first
)
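To resolve the ids to names afterwards, something like this keeps the total at three queries (a sketch using in_bulk, which issues one query per model; not from the original post):

videos = Video.objects.in_bulk({row['id'] for row in results})
companies = Company.objects.in_bulk({row['views__company_id'] for row in results})

for row in results:
    print(
        companies[row['views__company_id']].name,  # company name
        videos[row['id']].name,                    # video name
        row['view_count'],
    )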
Related
I have a question about the usage of prefetch_related. Based on my understanding, I need to use prefetch_related for reverse foreign key relationships.
As an example I have a User(id, name) model and SchoolHistory(id, start_date, school_name, user_id[FK user.id]) model. A user can have multiple school history records.
If I’m querying the database using the following SQL query:
SELECT
user.id,
name,
start_date,
school_name
FROM user
LEFT JOIN school_history ON school_history.user_id = user.id
the expected result would be:
| User ID | Name  | Start Date | School    |
| 1       | Human | 1/1/2022   | Michigan  |
| 1       | Human | 1/1/2021   | Wisconsin |
| 2       | Alien |            |           |
This is the current result that I’m getting instead with ORM and a serializer:
| User ID | Name | school_history |
| 1 | Human | [{start_date:1/1/2022 , school:Michigan}, {start_date:1/1/2021 , school:Wisconsin}] |
| 2 | Alien | [] |
This is the ORM query that I’m using:
User.objects.prefetch_related(
    Prefetch(
        'school_history',
        queryset=SchoolHistory.objects.order_by('start_date')
    )
)
Is there a way for the ORM query to produce a similar result to the SQL? I want multiple rows if there are multiple schools associated with that user.
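One way to get flat, SQL-style rows is to query across the relation with .values(), which emits one row per related record (users without history appear once with NULLs, via a LEFT OUTER JOIN). A sketch, not from the original thread:

rows = User.objects.values(
    'id',
    'name',
    'school_history__start_date',
    'school_history__school_name',
).order_by('id', 'school_history__start_date')

Note this returns dicts rather than model instances, so it suits serializing flat rows, whereas prefetch_related always hangs the related records off each User object.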
I have 2 tables in Power BI: one contains all transactions to and from people (each client identified by an id, where "I" can be either the receiver or sender of $) and the other is the detail for each client.
Table 1 would look something like
| $ | sender id | receiver id |
|---|-----------| ------------|
| 10| 1 | 2 |
| 15| 1 | 3 |
| 20| 1 | 2 |
| 15| 3 | 1 |
| 10| 3 | 1 |
| 25| 2 | 1 |
| 10| 1 | 2 |
The second table contains sender id and name:
| id | name |
|----|-------|
| 1 | "me" |
| 2 | John |
| 3 | Susan |
The expected result is something like this (not necessarily in a table, just to show):
| $ sent | $ received | Balance|
|--------|------------|--------|
| 55 | 45 | +10 |
And in a filter have "John" and "Susan", so when I select one of them I can see $ sent, $ received and balance for each of them.
The problem, of course, is that I end up with one active and one inactive relationship, so if I apply such a filter I end up with 0 in sender/receiver and the whole value in the other (depending on which is made active and inactive), and if I make another table that's "sender id" + "sender name" then I can't filter all at once.
Is it possible to do this?
I hope this is kinda understandable
You will need to add 2 calculated columns to your user table:
received = CALCULATE(SUM(T1[$]), FILTER(T1, UserTable[id] = T1[receiver id]))
You can do the same for sent. Now in your visual, use the new columns.
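For instance, the matching column might look like this (a sketch reusing the assumed T1 and UserTable names from above):

sent = CALCULATE(SUM(T1[$]), FILTER(T1, UserTable[id] = T1[sender id]))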
Enjoy!
After going around a bit I found a way to solve this. It's probably not the most orthodox way to do it, but it works.
What I did is add 2 columns to my sales table. One is labeled "movement", and in SQL it is just a CASE: when the receiver is 'me' it's "Charged", and when the receiver is 'not-me' it's "Payment". Then I added a column with another CASE so it would always give me the 'not-me' id, and I used that for my relationship with my users table.
Then I just added filters in my cards, making one a "Payment" card and the other a "Charged" card.
This all follows the previous example. It was actually a bit more tricky, as I could have a payment from me to myself, but that's just another CASE for when it was 'me-me'.
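In SQL, the two helper columns might look something like this (a sketch; the transactions table and column names are made up, and 'me' is assumed to be id 1 as in the example):

SELECT
    amount,
    sender_id,
    receiver_id,
    CASE WHEN receiver_id = 1 THEN 'Charged' ELSE 'Payment' END AS movement,
    CASE WHEN sender_id = 1 THEN receiver_id ELSE sender_id END AS counterparty_id
FROM transactions;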
Hope this is understandable; English is not my first language, and the information I actually used is partially confidential, so I had to make up the above example.
Thanks all, and have a nice day.
I have two tables (subject & category) that are both related to the same parent table (main). Because of the foreign key constraints, it looks like Power BI automatically created the links.
Simple mock-up of table links
I need to count the subjects by type for each possible distance range. I tried a simple calculation shown below for each distance category.
less than 2m =
CALCULATE(
COUNTA('Category'[Descr]),
'Subject'[Distance] IN { "less than 2m" }
)
However, the filter doesn't seem to apply properly.
I want...
+-------+--------------+--------------+
| Descr | less than 2m | more than 2m |
+-------+--------------+--------------+
| Car   |            2 |            1 |
| Sign  |            4 |            2 |
+-------+--------------+--------------+
but I'm getting...
+-------+--------------+--------------+
| Descr | less than 2m | more than 2m |
+-------+--------------+--------------+
| Car   |            3 |            3 |
| Sign  |            6 |            6 |
+-------+--------------+--------------+
It's just giving me the total count by type, which is correct, but it isn't applying the distance filter, so I can't break the counts down.
I'm sure this is probably really simple but I'm pretty new with DAX and I can't figure this one out.
I wish I could mark Kosuke's comment as an answer. The issue was indeed with having to enable cross-filtering. This can be done either by clicking on the relationship in your model or by using a function to temporarily enable the cross filter.
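For reference, the function-based approach might look something like this (a sketch using DAX's CROSSFILTER inside the measure; the relationship columns 'Main'[subject_id] and 'Subject'[id] are assumptions based on the mock-up):

less than 2m =
CALCULATE(
    COUNTA('Category'[Descr]),
    'Subject'[Distance] IN { "less than 2m" },
    CROSSFILTER('Main'[subject_id], 'Subject'[id], Both)
)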
Say I have the following data:
| id | user_id | time |
| 1 | 1 | 10.0 |
| 2 | 1 | 12.0 |
| 3 | 2 | 11.0 |
| 4 | 2 | 13.0 |
What I want is to query this table such that I get the MIN(time) per user_id:
| id | user_id | time |
| 1 | 1 | 10.0 |
| 3 | 2 | 11.0 |
So my SQL would look like this:
SELECT id, user_id, time FROM table GROUP BY user_id HAVING MIN(time)
However, when trying to use .annotate(Min('time')), Django will GROUP BY on arbitrary (wrong) fields. For example see the following code and the resulting (simplified) SQL:
>>> Table.objects.annotate(Min('time')).query
SELECT id, user_id, time, MIN(time) FROM table GROUP BY id, user_id, time
>>> Table.objects.values('id', 'time').annotate(Min('time')).query
SELECT id, time, MIN(time) FROM table GROUP BY id, time
The resulting SQL is far from my desired output. I'm currently working around this by using raw SQL; however, this defeats the purpose of using an ORM in the first place. Also, the resulting code is difficult to reuse, as a normal .filter() cannot be applied.
There are similar questions about this type of querying, however they are rather old and do not incorporate changes to Django since 1.3.
It's not pretty code, but using Django's extra() queryset method with a where argument you can get back the desired result set, i.e.:
Table.objects.extra(where=['(user_id, time) IN (SELECT user_id, MIN(time)'
                           ' FROM table_name GROUP BY user_id)'])
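The extra() method is discouraged on current Django versions; a hedged alternative (a sketch assuming Django 1.11+, where Subquery and OuterRef exist) is a correlated subquery that picks the earliest row per user_id and still composes with normal .filter() calls:

from django.db.models import OuterRef, Subquery

# For each outer row, find the id of the earliest row for the same user_id
earliest = (
    Table.objects
    .filter(user_id=OuterRef('user_id'))
    .order_by('time')
    .values('id')[:1]
)
result = Table.objects.filter(id=Subquery(earliest))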
I am parsing the USDA's food database and storing it in SQLite for query purposes. Each food has associated with it the quantities of the same 162 nutrients. It appears that the list of nutrients (name and units) has not changed in quite a while, and since this is a hobby project I don't expect to follow any sudden changes anyway. But each food does have a unique quantity associated with each nutrient.
So, how does one go about storing this kind of information sanely? My priorities are being multi-programming-language friendly (Python and C++ having preference), sanity for me as the coder, and ease of retrieving nutrient sets to sum or plot over time.
The two things I had thought of so far were 162 columns (which I'm not particularly fond of, but it does make the queries simpler), or a food table that links to a nutrient_list table, which then links to a static table with the nutrient name and units. The second seems more flexible in case my expectations are wrong, but I wouldn't even know where to begin on writing the queries for sums and time series.
Thanks
You should read up a bit on database normalization. Most of the normalization stuff is quite intuitive, but really going through the definitions of the steps and seeing an example helps you understand the concepts and will help you greatly if you want to design a database in the future.
As for this problem, I would suggest you use 3 tables: one for the foods (let's call it foods), one for the nutrients (nutrients), and one for the specific nutrients of each food (foods_nutrients).
The foods table should have a unique index for referencing and the food's name. If the food has other data associated to it (maybe a link to a picture or a description), this data should also go here. Each separate food will get a row in this table.
The nutrients table should also have a unique index for referencing and the nutrient's name. Each of your 162 nutrients will get a row in this table.
Then you have the crossover table containing the nutrient values for each food. This table has three columns: food_id, nutrient_id and value. Each food gets 162 rows in this table, one for each nutrient.
This way, you can add or delete nutrients and foods as you like and query everything independent of programming language (well, using SQL, but you'll have to use that anyway :) ).
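In SQLite, a minimal sketch of that schema might look like this (names are illustrative; the units column reflects the name-and-units pairing mentioned in the question):

CREATE TABLE foods (
    food_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);

CREATE TABLE nutrients (
    nutrient_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    units       TEXT NOT NULL
);

CREATE TABLE foods_nutrients (
    food_id     INTEGER NOT NULL REFERENCES foods (food_id),
    nutrient_id INTEGER NOT NULL REFERENCES nutrients (nutrient_id),
    value       REAL NOT NULL,
    PRIMARY KEY (food_id, nutrient_id)
);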
Let's try an example. We have 2 foods in the foods table and 3 nutrients in the nutrients table:
+------------------+
| foods |
+---------+--------+
| food_id | name |
+---------+--------+
| 1 | Banana |
| 2 | Apple |
+---------+--------+
+-------------------------+
| nutrients |
+-------------+-----------+
| nutrient_id | name |
+-------------+-----------+
| 1 | Potassium |
| 2 | Vitamin C |
| 3 | Sugar |
+-------------+-----------+
+-------------------------------+
| foods_nutrients |
+---------+-------------+-------+
| food_id | nutrient_id | value |
+---------+-------------+-------+
| 1 | 1 | 1000 |
| 1 | 2 | 12 |
| 1 | 3 | 1 |
| 2 | 1 | 3 |
| 2 | 2 | 7 |
| 2 | 3 | 98 |
+---------+-------------+-------+
Now, to get the potassium content of a banana, you'd query:
SELECT foods_nutrients.value
FROM foods_nutrients, foods, nutrients
WHERE foods_nutrients.food_id = foods.food_id
  AND foods_nutrients.nutrient_id = nutrients.nutrient_id
  AND foods.name = 'Banana'
  AND nutrients.name = 'Potassium';
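Aggregates follow the same join pattern; for example, a sketch summing one nutrient across all foods:

SELECT nutrients.name, SUM(foods_nutrients.value) AS total_value
FROM foods_nutrients
JOIN nutrients ON foods_nutrients.nutrient_id = nutrients.nutrient_id
WHERE nutrients.name = 'Sugar'
GROUP BY nutrients.name;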
Use the second (more normalized) approach.
You could even get away with fewer tables than you mentioned:
tblNutrients
-- NutrientID
-- NutrientName
-- NutrientUOM (unit of measure)
-- Otherstuff
tblFood
-- FoodId
-- FoodName
-- Otherstuff
tblFoodNutrients
-- FoodID (FK)
-- NutrientID (FK)
-- UOMCount
It will be a nightmare to maintain a 160+ column table.
If there is a time element involved too (can measurements change?), then you could add a date field to the nutrient and/or the foodnutrient table, depending on what could change.
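For instance, with a hypothetical MeasuredOn date added to tblFoodNutrients (not part of the schema above), a time series for one food/nutrient pair could be pulled like this:

SELECT MeasuredOn, UOMCount
FROM tblFoodNutrients
WHERE FoodID = 1
  AND NutrientID = 2
ORDER BY MeasuredOn;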