I am trying to calculate variance and average in Power BI, but I am running into circular dependency errors.
This is my data:
Month Year Item Count
1 2017 Chair 100
1 2017 Chair 200
1 2017 Chair 300
1 2017 Bench 110
1 2017 Bench 140
1 2017 Bench 150
2 2017 Chair 180
2 2017 Chair 190
2 2017 Chair 250
2 2017 Bench 270
2 2017 Bench 370
3 2017 Chair 120
3 2017 Chair 150
3 2017 Bench 180
3 2017 Bench 190
4 2017 Chair 200
4 2017 Chair 210
4 2017 Bench 220
4 2017 Bench 230
.
.
.
Average = Sum of Counts for the Previous 3 months / 3
Variance = (Average - Sum(CurrentMonth)) / Sum(CurrentMonth)
Since the average won't be meaningful for the first 3 months, I'm not worried about those rows.
Expected Output,
Month Year Item Sum(CurrentMonth) Average Variance
1
1
2
2
3
3
4 2017 Chair 410 497 0.21
4 2017 Bench x y z
Let's say, for Chair:
Sum of Current Month = 200 + 210 = 410
Average of the last 3 months = (100 + 200 + 300 + 180 + 190 + 250 + 120 + 150) / 3 = 1490 / 3 ≈ 497
Variance = (497 - 410) / 410 = 87 / 410 ≈ 0.21
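As a quick cross-check of the arithmetic above, here is a small pandas sketch (Python here is only for verification; the actual model is in Power BI) that computes the same trailing-3-month average and variance for Chair:

```python
import pandas as pd

# Chair rows for months 1-4 of 2017, taken from the sample data above
df = pd.DataFrame({
    'Month': [1, 1, 1, 2, 2, 2, 3, 3, 4, 4],
    'Count': [100, 200, 300, 180, 190, 250, 120, 150, 200, 210],
})

# Sum of counts per month
monthly = df.groupby('Month')['Count'].sum()

current = monthly.loc[4]                      # sum of the current month: 410
average = monthly.loc[[1, 2, 3]].sum() / 3    # previous 3 months: 1490 / 3 ≈ 497

# Divided by the current month's sum, as in the worked example
variance = (average - current) / current      # ≈ 0.21
print(round(average), round(variance, 2))
```

This reproduces the expected 497 and 0.21 from the worked example.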
Kindly share your thoughts.
I started with this as Table1 (I added a couple of months' data to yours):
I loaded it into Power BI and added a column called "YearMonth" using this code:

YearMonth = Table1[Year] & FORMAT(Table1[Month], "00")

Then I added another column called "Sum(CurrentMonth)" using this code:

Sum(CurrentMonth) =
SUMX(
    FILTER(
        FILTER(Table1, Table1[Item] = EARLIER(Table1[Item])),
        VALUE(Table1[YearMonth]) = VALUE(EARLIER(Table1[YearMonth]))
    ),
    Table1[Count]
)

Then I added another column called "Average" using this code:

Average =
SUMX(
    FILTER(
        FILTER(
            FILTER(Table1, Table1[Item] = EARLIER(Table1[Item])),
            VALUE(Table1[YearMonth]) <= VALUE(EARLIER(Table1[YearMonth])) - 1
        ),
        VALUE(Table1[YearMonth]) >= VALUE(EARLIER(Table1[YearMonth])) - 3
    ),
    Table1[Count]
) / 3

Lastly, I added a column called "Variance" using this code:

Variance = (Table1[Average] - Table1[Sum(CurrentMonth)]) / Table1[Sum(CurrentMonth)]
I hope this helps you.
In Stata, I want to calculate the minimum and maximum for subgroups per country and year, with the result appearing in every observation. Ultimately, I want the difference between min and max as a separate variable.
Here is an example of my dataset:
country   year   oranges   type
USA       2021   100       1
USA       2021   200       0
USA       2021   900       0
USA       2022   500       1
USA       2022   300       0
Canada    2022   300       0
Canada    2022   400       1
The results should look like this:
country   year   oranges   type   min(type=1)   max(type=0)   distance
USA       2021   100       1      100           900           800
USA       2021   200       0      100           900           800
USA       2021   900       0      100           900           800
USA       2022   500       1      500           300           -200
USA       2022   300       0      500           300           -200
Canada    2022   300       0      400           300           -100
Canada    2022   400       1      400           300           -100
So far, I tried the following code:
bysort year country: egen smalloranges = min(oranges) if type == 1
bysort year country: egen bigoranges = max(oranges) if type == 0
gen distance = bigoranges - smalloranges
I would approach this directly, as follows:
* Example generated by -dataex-. For more info, type help dataex
clear
input str6 country int(year oranges) byte type
"USA" 2021 100 1
"USA" 2021 200 0
"USA" 2021 900 0
"USA" 2022 500 1
"USA" 2022 300 0
"Canada" 2022 300 0
"Canada" 2022 400 1
end
egen min = min(cond(type == 1, oranges, .)), by(country year)
egen max = max(cond(type == 0, oranges, .)), by(country year)
gen wanted = max - min
list, sepby(country year)
+------------------------------------------------------+
| country year oranges type min max wanted |
|------------------------------------------------------|
1. | USA 2021 100 1 100 900 800 |
2. | USA 2021 200 0 100 900 800 |
3. | USA 2021 900 0 100 900 800 |
|------------------------------------------------------|
4. | USA 2022 500 1 500 300 -200 |
5. | USA 2022 300 0 500 300 -200 |
|------------------------------------------------------|
6. | Canada 2022 300 0 400 300 -100 |
7. | Canada 2022 400 1 400 300 -100 |
+------------------------------------------------------+
For more discussion, see Section 9 of https://www.stata-journal.com/article.html?article=dm0055
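For readers coming from pandas rather than Stata, the same cond()-style trick maps onto masked values plus a grouped transform. This is an illustrative sketch, not part of the Stata solution:

```python
import pandas as pd

df = pd.DataFrame({
    'country': ['USA'] * 5 + ['Canada'] * 2,
    'year':    [2021, 2021, 2021, 2022, 2022, 2022, 2022],
    'oranges': [100, 200, 900, 500, 300, 300, 400],
    'type':    [1, 0, 0, 1, 0, 0, 1],
})

# min of oranges where type == 1, broadcast to every row of the group
# (where() blanks out non-matching rows, just like cond(type == 1, oranges, .))
df['min1'] = (df['oranges'].where(df['type'] == 1)
                .groupby([df['country'], df['year']]).transform('min'))
# max of oranges where type == 0
df['max0'] = (df['oranges'].where(df['type'] == 0)
                .groupby([df['country'], df['year']]).transform('max'))
df['distance'] = df['max0'] - df['min1']
```

As with egen, the aggregation ignores the masked (missing) values, so every observation in a country-year group gets the same min, max, and distance.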
I am not sure if I understand the purpose of type 1 and 0, but this generates the exact result you describe in the tables. It might seem convoluted to create temporary files like this, but I think it modularizes the code into clean blocks.
* Example generated by -dataex-. For more info, type help dataex
clear
input str6 country int(year oranges) byte type
"USA" 2021 100 1
"USA" 2021 200 0
"USA" 2021 900 0
"USA" 2022 500 1
"USA" 2022 300 0
"Canada" 2022 300 0
"Canada" 2022 400 1
end
tempfile min1 max0
* Get min values for type 1 in each country-year
preserve
keep if type == 1
collapse (min) min_type_1=oranges , by(country year)
save `min1'
restore
* Get max values for type 0 in each country-year
preserve
keep if type == 0
collapse (max) max_type_0=oranges , by(country year)
save `max0'
restore
* Merge the min and the max
merge m:1 country year using `min1', nogen
merge m:1 country year using `max0', nogen
* Calculate distance
gen distance = max_type_0 - min_type_1
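The same collapse-then-merge pattern can be sketched in pandas as well (the intermediate frame names mirror the Stata blocks above and are my own):

```python
import pandas as pd

df = pd.DataFrame({
    'country': ['USA'] * 5 + ['Canada'] * 2,
    'year':    [2021, 2021, 2021, 2022, 2022, 2022, 2022],
    'oranges': [100, 200, 900, 500, 300, 300, 400],
    'type':    [1, 0, 0, 1, 0, 0, 1],
})

# "collapse (min)" on the type == 1 subset
min1 = (df[df['type'] == 1]
        .groupby(['country', 'year'], as_index=False)['oranges'].min()
        .rename(columns={'oranges': 'min_type_1'}))

# "collapse (max)" on the type == 0 subset
max0 = (df[df['type'] == 0]
        .groupby(['country', 'year'], as_index=False)['oranges'].max()
        .rename(columns={'oranges': 'max_type_0'}))

# m:1 merges back onto the full data, then the distance
out = df.merge(min1, on=['country', 'year']).merge(max0, on=['country', 'year'])
out['distance'] = out['max_type_0'] - out['min_type_1']
```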
Is there a recommended Power BI DAX pattern for calculating monthly Days Sales Outstanding (a.k.a. DSO or Debtor Days) using the Countback method?
I have been searching for a while, and although many people ask about this, I cannot find a working solution. Perhaps that is because nobody has set out the problem properly, so I am going to try to explain it as fully as possible.
DSO is a widely-used management accounting measure of the average number of days that it takes a business to collect payment for its credit sales. More background info on the metric here: https://www.investopedia.com/terms/d/dso.asp
There are various options for defining the calculation. I believe my requirement is known as the countback method. My data set is a fairly large star schema with a separate date dimension, but using the below simplified data set to generate a solution would totally point me in the right direction.
Input data set as follows:
Month No   Month   Days in Month   Debt Balance   Gross Income
1          Jan     31              1000           700
2          Feb     28              1100           500
3          Mar     31              900            400
4          Apr     30              950            600
5          May     31              1000           400
6          Jun     30              1100           550
7          Jul     31              900            700
8          Aug     31              950            500
9          Sep     30              1000           400
10         Oct     31              1100           600
11         Nov     30              900            400
12         Dec     31              950            550
The aim is to create a measure for debtor days equal to the number of days of average daily income per month we need to count back to match the debt balance.
Starting with Dec as an example in 3 steps:
1. Debt balance = 950, income = 550. Dec has 31 days, so we take all 31 days of income, reduce the debt balance to 400 (i.e. 950 - 550), and go back to the previous month.
2. Remaining Dec debt balance = 400. Nov income = 700. We don't need all of the daily income from Nov to match the rest of the Dec debt balance: 400/700 x 30 days in Nov = 17.14 days.
3. We have finished counting back days. 31 + 17.14 = 48.14 debtor days.
Nov has a higher balance so we need 1 more step:
1. Debt balance = 1500, income = 700. Nov has 30 days, so we take all 30 days of income, reduce the debt balance to 800 (i.e. 1500 - 700), and go back to the previous month.
2. Remaining Nov debt balance = 800. Oct income = 600. Oct has 31 days, so we take all 31 days of income from Oct and reduce the Nov debt balance to 200 (i.e. 1500 - 700 - 600).
3. Remaining Nov debt balance = 200. Sep income = 400. We don't need all of the daily income from Sep to match the rest of the Nov debt balance: 200/400 x 30 days in Sep = 15 days.
4. We have finished counting back days. 30 + 31 + 15 = 76 debtor days.
Apr has a lower balance so can be resolved in one step:
Debt Balance = 400, income = 600. Apr has 30 days. We don't need all of Apr Income as income exceeds debt in this month. 400/600 * 30 = 20 debtor days
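Before turning to DAX, the countback rule itself can be sketched in plain Python (the function and variable names are my own; the Days, Debt Balance, and Gross Income figures are taken from the results table, which is what the worked examples reference):

```python
# Countback DSO: walk backwards from the current month, consuming whole
# months of income until the remaining balance fits inside one month.
days    = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
income  = [700, 500, 400, 600, 400, 550, 700, 500, 400, 600, 700, 550]
balance = [1000, 1100, 900, 400, 600, 800, 900, 950, 1000, 1100, 1500, 950]

def debtor_days(month_index):
    """month_index is 0-based (0 = Jan ... 11 = Dec)."""
    remaining = balance[month_index]
    total_days = 0.0
    for m in range(month_index, -1, -1):
        if remaining > income[m]:
            # Consume this whole month's income and keep counting back
            remaining -= income[m]
            total_days += days[m]
        else:
            # Only a fraction of this month's income is needed
            total_days += remaining / income[m] * days[m]
            return total_days
    return None  # balance not covered by the available history (e.g. Jan)

print(round(debtor_days(11), 2))  # Dec: 48.14
print(round(debtor_days(10), 2))  # Nov: 76.0
print(round(debtor_days(3), 2))   # Apr: 20.0
```

This reproduces the Dec, Nov, and Apr walkthroughs above, and Jan correctly yields no result because there is no earlier history to count back into.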
The required solution for Debtor days in the simplified data set is therefore shown in the right-most "Debtor Days" column as follows:
Month No   Month   Days   Debt Balance   Gross Income   Debtor Days
1          Jan     31     1000           700
2          Feb     28     1100           500            54.57
3          Mar     31     900            400            59.00
4          Apr     30     400            600            20.00
5          May     31     600            400            41.00
6          Jun     30     800            550            49.38
7          Jul     31     900            700            41.91
8          Aug     31     950            500            50.93
9          Sep     30     1000           400            65.43
10         Oct     31     1100           600            67.20
11         Nov     30     1500           700            76.00
12         Dec     31     950            550            48.14
I hope the above explains the required calculation sufficiently. Of course it needs to be implemented as a measure rather than a calculated column as in the real world it needs to work with more complex scenarios with the user defining the filter context at runtime by filtering and slicing in Power BI.
If anyone can recommend a DAX calculation for Debtor Days, that would be great!
This works on a small example, but it probably won't scale to a large model.
There is no easy way to do this: DAX isn't a general-purpose programming language, and we cannot use loops or recursive statements, so we have many limitations.
We can only mimic this behavior with a bulk/forced calculation, which is a resource-consuming task. The most interesting part is the variable _zz, where for each row we calculate 3 versions of the main table limited to the last 1/2/3 rows (as you can see, some values are hardcoded; I assume the result can be found in at most 3 iterations). You can investigate this, if you want, by adding a new table from this code:
filter(GENERATE(SELECTCOLUMNS(GENERATE(Sheet1, GENERATESERIES(1,3,1)),"MYK", [MonthYearKey], "MonthToCheck", [Value], "Debt", [Debt Balance]),
var _tmp = TOPN([MonthToCheck],FILTER(ALL(Sheet1), Sheet1[MonthYearKey] <= [MYK] ), Sheet1[MonthYearKey], DESC)
return row("IncomAgg", SUMX(_tmp, Sheet1[Gross Income]) )
), [IncomAgg] >= [Debt])
Next, I extract two pieces of information from our table variable: how many months back we must go, and the backward running income over that span.
Full code (I use MonthYearKey for time navigation):
Mes =
var __currRowDebt = SELECTEDVALUE(Sheet1[Debt Balance])
var _zz = TOPN(1,
filter(GENERATE(SELECTCOLUMNS(GENERATE(Sheet1, GENERATESERIES(1,3,1)),"MYK", [MonthYearKey], "MonthToCheck", [Value], "Debt", [Debt Balance]),
var _tmp = TOPN([MonthToCheck],FILTER(ALL(Sheet1), Sheet1[MonthYearKey] <= [MYK] ), Sheet1[MonthYearKey], DESC)
return row("IncomAgg", SUMX(_tmp, Sheet1[Gross Income]) )
), [IncomAgg] >= [Debt]), [MonthToCheck], ASC)
var __monthinscoop = sumx(_zz,[MonthToCheck]) - 2
var __backwardrunningIncom = sumx(_zz,[IncomAgg])
var _calc = CALCULATE( sum(Sheet1[Days]), filter(ALL(Sheet1), Sheet1[MonthYearKey] <= SELECTEDVALUE( Sheet1[MonthYearKey]) && Sheet1[MonthYearKey] >= SELECTEDVALUE( Sheet1[MonthYearKey]) - __monthinscoop ))
var __twik = SWITCH( TRUE()
, __monthinscoop < 0 , -1
, __monthinscoop = 0 , 1
, __monthinscoop = 1 , 3
,0)
var __GetRowValue = CALCULATE( SUM(Sheet1[Gross Income]), FILTER(ALL(Sheet1), Sheet1[MonthYearKey] = (SELECTEDVALUE( Sheet1[MonthYearKey]) + __monthinscoop - __twik)))
var __GetRowDays = CALCULATE( SUM(Sheet1[Days]), FILTER(ALL(Sheet1), Sheet1[MonthYearKey] = (SELECTEDVALUE( Sheet1[MonthYearKey]) + __monthinscoop - __twik)))
return
_calc+DIVIDE(__GetRowValue - (__backwardrunningIncom - __currRowDebt), __GetRowValue) * __GetRowDays
Appreciate any help. Basically, I have a poor data set and am trying to make it more useful.
Below is a representation
import pandas as pd

df = pd.DataFrame({'State': ("Texas","California","Florida"),
'Q1 Computer Sales': (100,200,300),
'Q1 Phone Sales': (400,500,600),
'Q1 Backpack Sales': (700,800,900),
'Q2 Computer Sales': (200,200,300),
'Q2 Phone Sales': (500,500,600),
'Q2 Backpack Sales': (800,800,900)})
I would like to have a df that creates separate columns for the Quarters and Sales for the respective state.
I think perhaps regex, str.contains, and loops?
IIUC, you can use:
df_a = df.set_index('State')
df_a.columns = pd.MultiIndex.from_arrays(zip(*df_a.columns.str.split(' ', n=1)))
df_a.stack(0).reset_index()
Output:
State level_1 Backpack Sales Computer Sales Phone Sales
0 Texas Q1 700 100 400
1 Texas Q2 800 200 500
2 California Q1 800 200 500
3 California Q2 800 200 500
4 Florida Q1 900 300 600
5 Florida Q2 900 300 600
Or we can go further:
df_a = df.set_index('State')
df_a.columns = pd.MultiIndex.from_arrays(zip(*df_a.columns.str.split(' ', n=1)), names=['Quarters','Items'])
df_a = df_a.stack(0).reset_index()
df_a['Quarters'] = df_a['Quarters'].str.extract(r'(\d+)')
print(df_a)
Output:
Items State Quarters Backpack Sales Computer Sales Phone Sales
0 Texas 1 700 100 400
1 Texas 2 800 200 500
2 California 1 800 200 500
3 California 2 800 200 500
4 Florida 1 900 300 600
5 Florida 2 900 300 600
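An alternative sketch using melt plus pivot gets the same result by a different route (the intermediate column names `col`, `Quarter`, `Item`, and `Amount` are my own):

```python
import pandas as pd

df = pd.DataFrame({'State': ("Texas","California","Florida"),
                   'Q1 Computer Sales': (100,200,300),
                   'Q1 Phone Sales': (400,500,600),
                   'Q1 Backpack Sales': (700,800,900),
                   'Q2 Computer Sales': (200,200,300),
                   'Q2 Phone Sales': (500,500,600),
                   'Q2 Backpack Sales': (800,800,900)})

# Unpivot everything, then split "Q1 Computer Sales" into quarter and item
long = df.melt(id_vars='State', var_name='col', value_name='Amount')
long[['Quarter', 'Item']] = long['col'].str.split(' ', n=1, expand=True)

# Re-pivot the item back out into columns, one row per State/Quarter
out = (long.pivot_table(index=['State', 'Quarter'], columns='Item', values='Amount')
           .reset_index()
           .rename_axis(columns=None))
```

melt-then-pivot avoids building a MultiIndex on the columns by hand, at the cost of an extra intermediate frame.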
I am currently trying to create a report that shows how customers behave over time, but instead of doing this by date, I am doing it by customer age (number of months since they first became a customer). So using a date field isn't really an option, considering one customer may have started in Dec 2016 and another starts in Jun 2017.
What I'm trying to find is the month-over-month change in units purchased. If I was using a date field, I know that I could use
[Previous Month Total] = CALCULATE(SUM([Total Units]), PREVIOUSMONTH([FiscalDate]))
I also thought about using EARLIER() to find out but I don't think it would work in this case, as it requires row context that I'm not sure I could create. Below is a simplified version of the table that I'll be using.
ID Date Age Units
219 6/1/2017 0 10
219 7/1/2017 1 5
219 8/1/2017 2 4
219 9/1/2017 3 12
342 12/1/2016 0 500
342 1/1/2017 1 280
342 2/1/2017 2 325
342 3/1/2017 3 200
342 4/1/2017 4 250
342 5/1/2017 5 255
How about something like this?
PrevTotal =
VAR CurrAge = SELECTEDVALUE(Table3[Age])
RETURN CALCULATE(SUM(Table3[Units]), ALL(Table3[Date]), Table3[Age] = CurrAge - 1)
The CurrAge variable gives the Age evaluated in the current filter context. You then plug that into a filter in the CALCULATE line.
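The same look-up-at-Age-minus-1 logic can be sanity-checked in pandas against the sample table (purely illustrative; the actual solution is the DAX measure above):

```python
import pandas as pd

# The simplified table from the question
df = pd.DataFrame({
    'ID':    [219, 219, 219, 219, 342, 342, 342, 342, 342, 342],
    'Age':   [0, 1, 2, 3, 0, 1, 2, 3, 4, 5],
    'Units': [10, 5, 4, 12, 500, 280, 325, 200, 250, 255],
})

# Previous month's units = the same customer's units at Age - 1
df = df.sort_values(['ID', 'Age'])
df['PrevUnits'] = df.groupby('ID')['Units'].shift(1)
df['Change'] = df['Units'] - df['PrevUnits']
```

Each customer's Age-0 row gets a missing PrevUnits, matching the DAX measure returning blank when no Age - 1 row exists.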
I have a table like this.
Company Amount Year
A 200 2016
B 300 2016
C 400 2016
A 500 2017
B 600 2017
C 700 2017
A 100 2016
B 400 2016
C 100 2016
A 600 2017
B 133 2017
C 50 2017
I am looking for a measure that calculates the percentage of the amount that the top 2 companies (based on amount) contribute to that particular year's total amount. This needs to be dynamic based on the values of the Year slicer (for example, if 2 years are selected, then the top 2 companies need to be determined by the total amount each company spent over those 2 years).
How about this as a measure?
PercentTop2 =
DIVIDE(
SUMX(
TOPN(2,
SUMMARIZECOLUMNS(Companies[Company],
"Amount", SUM(Companies[Amount])),
[Amount]),
[Amount]),
SUMX(ALLSELECTED(Companies), Companies[Amount]))
The TOPN(2,[...]) finds the top 2 rows of the summarized table. Then you divide the sum of those two rows by the sum of all the selected rows.
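A quick pandas cross-check of what the measure should return for the sample table under a given slicer selection (illustrative only; `percent_top2` is my own name):

```python
import pandas as pd

# The sample table from the question
df = pd.DataFrame({
    'Company': ['A','B','C','A','B','C','A','B','C','A','B','C'],
    'Amount':  [200, 300, 400, 500, 600, 700, 100, 400, 100, 600, 133, 50],
    'Year':    [2016, 2016, 2016, 2017, 2017, 2017, 2016, 2016, 2016, 2017, 2017, 2017],
})

def percent_top2(years):
    """Share of total amount contributed by the top 2 companies over the selected years."""
    selected = df[df['Year'].isin(years)]          # mimic the Year slicer
    totals = selected.groupby('Company')['Amount'].sum()
    return totals.nlargest(2).sum() / totals.sum()

# With only 2016 selected: B (700) + C (500) out of a 1500 total
print(round(percent_top2([2016]), 2))  # 0.8
```

Note that with both years selected the top 2 companies are re-ranked on their combined two-year totals, exactly as the TOPN over the summarized table does.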