I have a model for which I want to perform a group-by on two values and calculate the percentage each inner group contributes to its outer grouping.
Currently I just query all the rows, load them into a pandas DataFrame, and do something similar to the answer here. Although this works, I'm sure it would be more efficient if the query returned the information I require directly.
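For context, here is a minimal sketch (mine, not from the question) of what that pandas step presumably boils down to; the column names match the model below:

import pandas as pd

# run inside the Django project: fetch the raw rows first
rows = list(ModelA.objects.values('grouper1', 'grouper2', 'metric1'))
df = pd.DataFrame(rows)

# sum per (grouper1, grouper2), then percentage of each grouper1 total
summed = df.groupby(['grouper1', 'grouper2'], as_index=False)['metric1'].sum()
summed['Percentage'] = (
    100 * summed['metric1']
    / summed.groupby('grouper1')['metric1'].transform('sum')
)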
I am currently running Django 2.0.5 with PostgreSQL 9.6.8 as the backend database.
I think window functions could be the solution, as indicated here, but I cannot construct a combination of annotate and values that gives me the desired output.
Another possible solution could be ROLLUP, introduced in PostgreSQL 9.5, if I could find a way to get the summary row as a set of extra columns for each row; but I also believe it is not yet supported by Django.
Model:
class ModelA(models.Model):
    grouper1 = models.CharField(max_length=100)  # max_length value assumed; CharField requires one
    grouper2 = models.CharField(max_length=100)
    metric1 = models.IntegerField()
All rows:
grouper1 | grouper2 | metric1
---------+----------+---------
A | C | 2
A | C | 2
A | C | 2
A | D | 4
A | D | 4
A | D | 4
B | C | 5
B | C | 5
B | C | 5
B | D | 6
B | D | 4
B | D | 5
Desired output:
grouper1 | grouper2 | sum(metric1) | Percentage
---------+----------+--------------+-----------
A | C | 6 | 40
A | D | 12 | 60
B | C | 15 | 50
B | D | 15 | 50
I got close to what I expected with:

from django.db.models import F, Sum, Window

ModelA.objects.all().values(
    'grouper1',
    'grouper2'
).annotate(
    SumMetric1=Window(expression=Sum('metric1'), partition_by=[F('grouper1'), F('grouper2')]),
    GroupSumMetric1=Window(expression=Sum('metric1'), partition_by=[F('grouper1')])
)
However, this returns a row for every original row in the database, like so:
grouper1 | grouper2 | sum(metric1) | Percentage
---------+----------+--------------+-----------
A | C | 6 | 40
A | C | 6 | 40
A | C | 6 | 40
A | D | 12 | 60
A | D | 12 | 60
A | D | 12 | 60
B | C | 15 | 50
B | C | 15 | 50
B | C | 15 | 50
B | D | 15 | 50
B | D | 15 | 50
B | D | 15 | 50
In this situation .distinct() might help.
More information is here.
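A rough sketch of how that could fit together (my own, not from the answer), assuming the ModelA definition above: keep the two window annotations, collapse the duplicated rows with .distinct(), and derive the percentage from the two sums in Python afterwards:

from django.db.models import F, Sum, Window

qs = ModelA.objects.values('grouper1', 'grouper2').annotate(
    SumMetric1=Window(expression=Sum('metric1'),
                      partition_by=[F('grouper1'), F('grouper2')]),
    GroupSumMetric1=Window(expression=Sum('metric1'),
                           partition_by=[F('grouper1')]),
).distinct()  # one row per (grouper1, grouper2) pair

results = [
    {**row, 'Percentage': 100 * row['SumMetric1'] / row['GroupSumMetric1']}
    for row in qs
]

It is worth checking the generated SQL, since DISTINCT combined with window annotations depends on the exact Django/PostgreSQL versions involved.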
Could you please help me solve this problem? I am totally new to DAX, and English is not my first language, so I am struggling even to find the right question to ask.
Here's the problem.
I have two tables:
start_balance
+------+---------------+
| Type | Start balance |
+------+---------------+
| A | 0 |
| B | 10 |
+------+---------------+
in_out
+------+-------+------+----+-----+
| Year | Month | Type | In | Out |
+------+-------+------+----+-----+
| 2020 | 1 | A | 20 | 20 |
| 2020 | 1 | A | 0 | 10 |
| 2020 | 2 | B | 20 | 0 |
| 2020 | 2 | B | 20 | 10 |
+------+-------+------+----+-----+
I'd like to get the result as follows:
Unfiltered:
+------+-------+------+---------+----+-----+------+
| Year | Month | Type | Balance | In | Out | Left |
+------+-------+------+---------+----+-----+------+
| 2020 | 1 | A | 0 | 20 | 20 | 0 |
| 2020 | 1 | B | 10 | 20 | 10 | 20 |
| 2020 | 2 | A | 0 | 20 | 10 | 10 |
| 2020 | 2 | B | 20 | 20 | 10 | 30 |
+------+-------+------+---------+----+-----+------+
Filtered (for example year/month 2020/2):
+------+-------+------+---------+----+-----+------+
| Year | Month | Type | Balance | In | Out | Left |
+------+-------+------+---------+----+-----+------+
| 2020 | 2 | A | 0 | 20 | 10 | 10 |
| 2020 | 2 | B | 20 | 20 | 10 | 30 |
+------+-------+------+---------+----+-----+------+
So when a year/month is selected in a slicer, it should calculate the balance before the selected year/month and then show the values for the selected year/month.
Edit: corrected start_balance table.
Is the sample data correct?
A -> the starting balance is 10, but in your unfiltered table example, it is 0.
Do you have any relationship between these tables?
Does opening balance always apply to the current year? What if 2021 appears in the in_out table? How do you know when the start balance started?
Here is an example without the starting balance.
If you want to show a value that ignores a given filter, you should use ALL or the REMOVEFILTERS function (available in Analysis Services 2019 and in Power BI since October 2019).
CALCULATE ( SUM ( 'in_out'[In] ) - SUM ( 'in_out'[Out] ), ALL ( 'in_out'[Year], 'in_out'[Month] ) )
More helpful information:
https://www.sqlbi.com/articles/managing-all-functions-in-dax-all-allselected-allnoblankrow-allexcept/
I need to rank products for my dashboard. Each day, we store the sales of our products. As a result, we have this example dataset:
+-----------+------------+-------+
| product | date | sales |
+-----------+------------+-------+
| coffee | 11/03/2019 | 15 |
| coffee | 12/03/2019 | 10 |
| coffee | 13/03/2019 | 28 |
| coffee | 14/03/2019 | 1 |
| tea | 11/03/2019 | 5 |
| tea | 12/03/2019 | 2 |
| tea | 13/03/2019 | 6 |
| tea | 14/03/2019 | 7 |
| Chocolate | 11/03/2019 | 30 |
| Chocolate | 11/03/2019 | 4 |
| Chocolate | 11/03/2019 | 15 |
| Chocolate | 11/03/2019 | 10 |
+-----------+------------+-------+
My attempt
I actually managed to rank my products, but not in the way I wanted. The rank increases by the number of rows per product: for example, Chocolate is first, but since we recorded 4 rows for it, Coffee is ranked 5 and not 2.
+-----------+------------+-------+-----+------+
| product | date | sales | sum | rank |
+-----------+------------+-------+-----+------+
| coffee | 11/03/2019 | 15 | 54 | 5 |
| coffee | 12/03/2019 | 10 | 54 | 5 |
| coffee | 13/03/2019 | 28 | 54 | 5 |
| coffee | 14/03/2019 | 1 | 54 | 5 |
| tea | 11/03/2019 | 5 | 20 | 9 |
| tea | 12/03/2019 | 2 | 20 | 9 |
| tea | 13/03/2019 | 6 | 20 | 9 |
| tea | 14/03/2019 | 7 | 20 | 9 |
| Chocolate | 11/03/2019 | 30 | 59 | 1 |
| Chocolate | 11/03/2019 | 4 | 59 | 1 |
| Chocolate | 11/03/2019 | 15 | 59 | 1 |
| Chocolate | 11/03/2019 | 10 | 59 | 1 |
+-----------+------------+-------+-----+------+
sum field formula:
sum =
SUMX (
    FILTER (
        Table1;
        Table1[product] = EARLIER ( Table1[product] )
    );
    Table1[sales]
)
rank field formula:
rank =
RANKX (
    ALL ( Table1 );
    Table1[sum]
)
As you can see, we get the following ranking:
1 : Chocolate
5 : Coffee
9 : Tea
Improvements
I would like to transform the previous result into :
1 : Chocolate
2 : Coffee
3 : Tea
Can you help me improve my ranking system and get a marvelous 1, 2, 3 instead of this ugly and impractical 1, 5, 9?
If you don't know the answer, help by simply upvoting the question ♥
Fortunately, this is an easy fix.
If you look at the documentation for the RANKX function, you'll notice an optional ties argument that you can set to Skip or Dense. The default is Skip, but you want Dense. Try this:
rank =
RANKX (
    ALL ( Table1 );
    Table1[sum];
    ;
    ;
    "Dense"
)
(Those extra ; delimiters are there since we aren't specifying the optional value or order arguments.)
I have simplified my problem. Let's suppose I have three tables. One contains the data and specific codes that identify objects, let's say apples:
+-------------+------------+-----------+
| Data picked | Color code | Size code |
+-------------+------------+-----------+
| 1-8-2018 | 1 | 1 |
| 1-8-2018 | 1 | 3 |
| 1-8-2018 | 2 | 2 |
| 1-8-2018 | 2 | 3 |
| 1-8-2018 | 2 | 2 |
| 1-8-2018 | 3 | 3 |
| 1-8-2018 | 4 | 1 |
| 1-8-2018 | 4 | 1 |
| 1-8-2018 | 5 | 3 |
| 1-8-2018 | 6 | 1 |
| 1-8-2018 | 6 | 2 |
| 1-8-2018 | 6 | 2 |
+-------------+------------+-----------+
And I have two related helper tables to help interpret the codes (their relationships are inactive in the model due to ambiguity with other tables in the real case).
+-----------+--------+
| Size code | Size |
+-----------+--------+
| 1 | Small |
| 2 | Medium |
| 3 | Large |
+-----------+--------+
and
+------------+----------------+-------+
| Color code | Color specific | Color |
+------------+----------------+-------+
| 1 | Light green | Green |
| 2 | Green | Green |
| 3 | Semi green | Green |
| 4 | Red | Red |
| 5 | Dark | Red |
| 6 | Pink | Red |
+------------+----------------+-------+
Let's say I want to create an extra column in the original table to determine which apples are class A and which are class B, given that medium green apples are class A and large red apples are class B; the others remain blank, as in the example below.
+-------------+------------+-----------+-------+
| Data picked | Color code | Size code | Class |
+-------------+------------+-----------+-------+
| 1-8-2018 | 1 | 1 | |
| 1-8-2018 | 1 | 3 | |
| 1-8-2018 | 2 | 2 | A |
| 1-8-2018 | 2 | 3 | |
| 1-8-2018 | 2 | 2 | A |
| 1-8-2018 | 3 | 3 | |
| 1-8-2018 | 4 | 1 | |
| 1-8-2018 | 4 | 1 | |
| 1-8-2018 | 5 | 3 | B |
| 1-8-2018 | 6 | 1 | |
| 1-8-2018 | 6 | 2 | |
| 1-8-2018 | 6 | 2 | |
+-------------+------------+-----------+-------+
What's the proper DAX to use, given that the relationships are initially inactive? Preferably without creating any further columns in any table. I have already tried code like:
CALCULATE (
    "A";
    FILTER ( 'Size Table'; 'Size Table'[Size] = "Medium" );
    FILTER ( 'Color Table'; 'Color Table'[Color] = "Green" )
)
And many variations on the same principle.
Given that the relationships are inactive, I'd suggest using LOOKUPVALUE to match ID values on the other tables. You should be able to create a calculated column as follows:
Class =
VAR Size =
    LOOKUPVALUE ( 'Size Table'[Size],
        'Size Table'[Size code], 'Data Table'[Size code] )
VAR Color =
    LOOKUPVALUE ( 'Color Table'[Color],
        'Color Table'[Color code], 'Data Table'[Color code] )
RETURN
    SWITCH ( TRUE (),
        ( Size = "Medium" ) && ( Color = "Green" ), "A",
        ( Size = "Large" ) && ( Color = "Red" ), "B",
        BLANK () )
If your relationships are active, then you don't need the lookups:
Class =
SWITCH ( TRUE (),
    ( RELATED ( 'Size Table'[Size] ) = "Medium" )
        && ( RELATED ( 'Color Table'[Color] ) = "Green" ), "A",
    ( RELATED ( 'Size Table'[Size] ) = "Large" )
        && ( RELATED ( 'Color Table'[Color] ) = "Red" ), "B",
    BLANK () )
Or a bit more elegantly written (especially for more classes):
Class =
VAR SizeColor =
    RELATED ( 'Size Table'[Size] ) & " " & RELATED ( 'Color Table'[Color] )
RETURN
    SWITCH ( TRUE (),
        SizeColor = "Medium Green", "A",
        SizeColor = "Large Red", "B",
        BLANK () )
I have a Django model with three fields: product, condition and quantity with data such as:
| Product | Condition | Quantity |
+---------+-----------+----------+
| A | new | 2 |
| A | new | 3 |
| A | new | 4 |
| A | old | 1 |
| A | old | 2 |
| B | new | 2 |
| B | new | 3 |
| B | new | 1 |
| B | old | 4 |
| B | old | 2 |
I'd like to sum the quantities of the entries where product and condition are equal:
| Product | Condition | Quantity |
+---------+-----------+----------+
| A | new | 9 |
| A | old | 3 |
| B | new | 6 |
| B | old | 6 |
This answer helps count entries that share a single field value, but I need to group on two fields.
How could I implement this?
from django.db.models import Sum
Model.objects.values('product', 'condition').order_by().annotate(Sum('quantity'))
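A quick usage sketch (mine, reusing the Model placeholder from the line above): iterating over the queryset yields one dict per (product, condition) pair, and naming the aggregate avoids the auto-generated quantity__sum key:

from django.db.models import Sum

totals = (
    Model.objects
    .values('product', 'condition')    # group on both fields
    .order_by()                        # clear any default ordering so it doesn't split the groups
    .annotate(total=Sum('quantity'))   # explicit alias instead of the default 'quantity__sum'
)
for row in totals:
    print(row['product'], row['condition'], row['total'])
# For the sample data above: A new 9, A old 3, B new 6, B old 6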
So I'm trying to figure out a good way of vectorizing a calculation and I'm a bit stuck.
| A | B (Calculation) | B (Value) |
|---|----------------------|-----------|
| 1 | | |
| 2 | | |
| 3 | | |
| 4 | =SUM(A1:A4)/4 | 2.5 |
| 5 | =(1/4)*A5 + (3/4)*B4 | 3.125 |
| 6 | =(1/4)*A6 + (3/4)*B5 | 3.84375 |
| 7 | =(1/4)*A7 + (3/4)*B6 | 4.6328125 |
I'm basically trying to replicate Wilder's Average True Range (without using TA-Lib). In the case of my simplified example, column A is the precomputed True Range.
Any ideas on how to do this without looping? Breaking down the equation, it's effectively a weighted cumulative sum... but it's definitely not something the existing pandas cumsum allows out of the box.
This is indeed an ewm problem. The key is that the first 4 rows are crammed together into a single seed row (their mean)... then ewm takes over:
import numpy as np
import pandas as pd

a = df.A.values
# collapse the first 4 values into their mean, keep the rest as-is
d1 = pd.DataFrame(dict(A=np.append(a[:4].mean(), a[4:])), df.index[3:])
d1.ewm(adjust=False, alpha=.25).mean()
A
3 2.500000
4 3.125000
5 3.843750
6 4.632812
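A follow-up sketch (not part of the original answer) that wraps the same idea in a small helper; the function name and the period argument n are my own choices:

import pandas as pd

def wilder_smooth(tr: pd.Series, n: int = 4) -> pd.Series:
    """Seed with the mean of the first n values, then apply Wilder's
    recursive smoothing, which is an EWM with alpha = 1/n."""
    seed = pd.Series([tr.iloc[:n].mean()], index=[tr.index[n - 1]])
    return pd.concat([seed, tr.iloc[n:]]).ewm(alpha=1 / n, adjust=False).mean()

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6, 7]})
print(wilder_smooth(df['A'], n=4))   # 2.5, 3.125, 3.84375, 4.6328125

For a real ATR you would typically use n=14 and pass the precomputed True Range series in place of column A here.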