PowerBI: New Column = Sum of value where key = X

Let's say I have a table that looks like this:
Human   Theme      Score
------------------------
J1      Surfing    2
J2      Eating     3
J3      Sleeping   4
J2      Eating     5
J2      Surfing    6
Now I want to add columns that give the total score per theme for each human separately. To give you an idea:
Human   Theme      Score   EatingTotal   SurfingTotal   SleepingTotal
---------------------------------------------------------------------
J1      Surfing    2       0             2              0
J2      Eating     3       8             6              0
J3      Sleeping   4       0             0              4
J2      Eating     5       8             6              0
J2      Surfing    6       8             6              0
How do I get there? Is there a way in DAX to say something like
SurfingTotal = SUM(if table[Theme] = Surfing)
I'm still new to Power BI.

There are probably several ways to do it, but this one seems to work:
Total Eating Measure =
CALCULATE(
    SUM(Jury[Score]);
    ALLEXCEPT(
        Jury;
        Jury[Human]
    );
    Jury[Theme] = "Eating"
) + 0
I've added the zero at the end to prevent blank values from being shown.
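If it helps to verify the expected numbers outside Power BI, here is a small pandas sketch (Python, using the column names from the question) that computes the same per-human, per-theme totals. It is only an illustration of the logic, not a DAX solution:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Human": ["J1", "J2", "J3", "J2", "J2"],
    "Theme": ["Surfing", "Eating", "Sleeping", "Eating", "Surfing"],
    "Score": [2, 3, 4, 5, 6],
})

# For each theme, add a column with that theme's total score per human
for theme in ["Eating", "Surfing", "Sleeping"]:
    df[f"{theme}Total"] = (
        df["Score"].where(df["Theme"] == theme, 0)  # keep the score only for this theme
          .groupby(df["Human"]).transform("sum")    # total per human, broadcast to each row
    )

print(df)
```

In DAX terms, the `where(...)` plays the role of the `Jury[Theme] = "Eating"` filter and the `groupby(...).transform("sum")` the role of `ALLEXCEPT` keeping only the human context.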

Power BI - Subtract rows based on an id

I have two tables as follows:
id   N1   N2     N3          N4   N5
1    UP   REIT
2    UP   REIT   UPDigital   DI
3    UP   REIT   UPDigital   DI   SI
4    UP   REIT   UPdigital   DI   IT
5    UP   FCUP
id_entity   id_person   exit   join
2           1           1      0
5           1           0      1
3           10          1      0
4           10          0      1
4           25          1      0
4           12          0      1
I need to calculate people's joins and exits. To calculate the exits I created the following measure:
N exits = IF(CALCULATE(SUM(Folha2[exit]) - SUM(Folha2[join])) < 0, 0, SUM(Folha2[exit]) - SUM(Folha2[join]))
And for the joins this:
N joins = IF(CALCULATE(SUM(Folha2[join]) - SUM(Folha2[exit])) < 0, 0, SUM(Folha2[join]) - SUM(Folha2[exit]))
This is the result, but it is not correct.
My problem is that this way the calculation is not based on id_person.
For example, in the last two rows of the second table, the person with id_person=25 left entity 4 and the person with id_person=12 entered entity 4.
This way it subtracts the two rows without taking into account that they are two different people.
The correct result would be the following number of exits:
UP - 1
FCUP - 0
REIT - 2
UPDigital - 2
DI - 2
IT - 1
SI - 1
Is it possible to calculate this in Power BI?
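To make the intended logic concrete, here is a pandas sketch (Python, with both tables retyped by hand; note that I've treated 'UPdigital' and 'UPDigital' as the same label, which is an assumption). An exit should only count when the same id_person does not have a matching join for the same label, so the netting has to happen per person and label before summing:

```python
import pandas as pd

# Recreation of the first table: each entity id maps to its labels N1..N5
entities = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "labels": [
        ["UP", "REIT"],
        ["UP", "REIT", "UPDigital", "DI"],
        ["UP", "REIT", "UPDigital", "DI", "SI"],
        ["UP", "REIT", "UPDigital", "DI", "IT"],
        ["UP", "FCUP"],
    ],
})

# Recreation of the second table (Folha2)
moves = pd.DataFrame({
    "id_entity": [2, 5, 3, 4, 4, 4],
    "id_person": [1, 1, 10, 10, 25, 12],
    "exit":      [1, 0, 1, 0, 1, 0],
    "join":      [0, 1, 0, 1, 0, 1],
})

# Expand each movement into one row per label of the entity involved
expanded = moves.merge(entities, left_on="id_entity", right_on="id").explode("labels")

# Net exits per (person, label), floored at zero so a matching join cancels an exit
per_person = expanded.groupby(["id_person", "labels"])[["exit", "join"]].sum()
net_exits = (per_person["exit"] - per_person["join"]).clip(lower=0)

# Total exits per label
print(net_exits.groupby("labels").sum())
```

This reproduces the expected counts from the question (UP 1, REIT 2, UPDigital 2, DI 2, IT 1, SI 1, FCUP 0).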

Get previous row index in Google Sheets where a certain column value is zero

Consider a sheet like:
rowNr | Another Col | Filled  | Cumul. Size
0     | 2           | -1000   | -1000
1     | 3           | 1000    | 0
2     | 1           | -5000   | -5000
3     | 4           | 5000    | 0
4     | 5           | -10000  | -10000
5     | 2           | -10000  | -20000
6     | 1           | -20000  | -40000
6     | 4           | 40000   | 0
The 'Cumul. Size' column displays the cumulative sum of the 'Filled' column.
Each time 'Cumul. Size' = 0, I need to calculate the sum of 'Another Col' over all previous rows, back to (but not including) the previous row where 'Cumul. Size' was 0. For rows where 'Cumul. Size' != 0, display '' (blank).
So something like this:
rowNr | Another Col | Filled  | Cumul. Size | calculated
0     | 2           | -1000   | -1000       |
1     | 3           | 1000    | 0           | 5
2     | 1           | -5000   | -5000       |
3     | 4           | 5000    | 0           | 5
4     | 5           | -10000  | -10000      |
5     | 2           | -10000  | -20000      |
6     | 1           | -20000  | -40000      |
6     | 4           | 40000   | 0           | 12
I'm sure I can create something working as long as I can find a function with a signature similar to: findPreviousRowIndex(curRowIndex, whereCondition)
Any pointers much appreciated
EDIT
Link To example Google Sheet
Paste this in cell D2 and drag down:
=ARRAYFORMULA(IF(LEN(A2), IF(C2=0, SUM(INDIRECT(ADDRESS(IFERROR(MAX(IF(
INDIRECT("C1:C"&ROW()-1)=0, ROW(A:A), ))+1, 2), 1, 4)&":A"&ROW())), ), ))
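The formula above is dense, so the underlying logic can be sketched in plain Python (using the sample rows from the question) to make the intent explicit: walk the rows, accumulate 'Another Col' since the last zero crossing, and emit the accumulated sum whenever the cumulative size returns to zero.

```python
# Sample data from the question: (Another Col, Cumul. Size) per row
rows = [(2, -1000), (3, 0), (1, -5000), (4, 0),
        (5, -10000), (2, -20000), (1, -40000), (4, 0)]

calculated = []
running = 0
for another, cumul in rows:
    running += another              # accumulate 'Another Col' since the last zero
    if cumul == 0:
        calculated.append(running)  # segment closed: emit the sum
        running = 0                 # start a new segment
    else:
        calculated.append(None)     # blank for rows where Cumul. Size != 0

print(calculated)  # [None, 5, None, 5, None, None, None, 12]
```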

Shifting with re-sampling in time-series data

Assume that I have this time-series data:
           A  B
timestamp
1          1  2
2          1  2
3          1  1
4          0  1
5          1  0
6          0  1
7          1  0
8          1  1
I am looking for a resample window size that would give me a specific count of occurrences, at least for some frequency.
If I resample the data from 1 to 8 with a 2S window, I get a different maximum than if I start from 2 to 8 with the same window size (2S):
import pandas as pd

for shift in range(1, 100):
    tries = 1
    series = pd.read_csv("file.csv", index_col='timestamp')[shift:]
    ds = series.resample(str(tries) + 'S').sum()
    while (ds.A.max() + ds.B.max() < 4) and (tries < len(ds)):
        tries = tries + 1
        ds = series.resample(str(tries) + 'S').sum()
    # other lines
I am looking for a performance improvement, as this takes prohibitively long to finish for large data.
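Part of the cost in the loop above is re-reading the CSV on every shift. Here is a sketch of a faster structure, under the assumptions that the index is a proper DatetimeIndex and that the goal is, for each shift, the smallest window size whose bin sums reach the target; the inline frame is a hypothetical stand-in for file.csv, built from the table above:

```python
import pandas as pd

# Hypothetical stand-in for file.csv: the table from the question
series = pd.DataFrame(
    {"A": [1, 1, 1, 0, 1, 0, 1, 1], "B": [2, 2, 1, 1, 0, 1, 0, 1]},
    index=pd.to_datetime([1, 2, 3, 4, 5, 6, 7, 8], unit="s"),
)

def smallest_window(frame, target=4, max_tries=None):
    """Smallest resample width in seconds where max(A bins) + max(B bins) >= target."""
    max_tries = max_tries or len(frame)
    for tries in range(1, max_tries + 1):
        ds = frame.resample(f"{tries}s").sum()
        if ds.A.max() + ds.B.max() >= target:
            return tries
    return None

# Parse once, then slice per shift instead of re-reading the file each time
results = {shift: smallest_window(series.iloc[shift:]) for shift in range(0, 4)}
print(results)
```

The main saving is reading and parsing the file a single time; the per-shift work is then just slicing an already-parsed frame.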

PQ - Function.InvokeAfter() - real delay

Trying to calculate the real delay between InvokeAfter function executions.
The function is supposed to fire five times a second:
index   delay   now
0       0       18:47:33
1       0       18:47:33
2       0       18:47:33
3       0       18:47:33
4       0       18:47:33
5       1       18:47:34
6       1       18:47:34
7       1       18:47:34
8       1       18:47:34
9       1       18:47:34
10      2       18:47:35
11      2       18:47:35
12      2       18:47:35
13      2       18:47:35
14      2       18:47:35
...
But this is what I get.
The real_delay column is the difference between this row and the previous one.
CODE
let
    t = Table.FromList({0..19}, Splitter.SplitByNothing()),
    delay = Table.AddColumn(t, "delay", each Number.IntegerDivide([Column1], 5)),
    InvokeAfter = Table.AddColumn(delay, "InvokeTimeNow", each Function.InvokeAfter(
        () => DateTime.Time(DateTime.LocalNow()), #duration(0, 0, 0, [delay]))),
    real_delay = Table.AddColumn(InvokeAfter, "real_delay",
        each try InvokeAfter{[Column1 = [Column1] - 1]}[InvokeTimeNow] - [InvokeTimeNow] otherwise "-")
in
    real_delay
What's wrong with the code? Or maybe with the InvokeAfter function?
Five times a second means you should be waiting (1 second / 5) = 0.2 seconds per invocation.
If you run this code:
let
    t = Table.FromList({0..19}, Splitter.SplitByNothing()),
    delay = Table.AddColumn(t, "delay", each 0.2),
    InvokeAfter = Table.AddColumn(delay, "InvokeTimeNow", each Function.InvokeAfter(
        () => DateTime.Time(DateTime.LocalNow()), #duration(0, 0, 0, [delay]))),
    real_delay = Table.AddColumn(InvokeAfter, "real_delay",
        each try InvokeAfter{[Column1 = [Column1] - 1]}[InvokeTimeNow] - [InvokeTimeNow] otherwise "-")
in
    real_delay
you'll see the function was invoked about 5 times per second.
== SOLUTION ==
Here is my own solution. A surprising one, I think...
NEW CODE
let
    threads = 5,
    t = Table.FromList({0..19}, Splitter.SplitByNothing()),
    delay = Table.AddColumn(t, "delay",
        each if Number.Mod([Column1], threads) = 0 and [Column1] > 0 then 1 else 0),
    InvokeAfter = Table.AddColumn(delay, "InvokeTimeNow", each Function.InvokeAfter(
        () => DateTime.Time(DateTime.LocalNow()), #duration(0, 0, 0, [delay]))),
    real_delay = Table.AddColumn(InvokeAfter, "real_delay",
        each try InvokeAfter{[Column1 = [Column1] - 1]}[InvokeTimeNow] - [InvokeTimeNow] otherwise "-")
in
    real_delay
The original idea was multithreaded parsing, and since there were limits on simultaneous connections, I had to adapt.
I thought there was a single start moment after which the function is invoked: the moment when the cell is calculated (all cells almost at the same time), with the second parameter meaning a delay after this start point. But it appears that all the delays accumulate. Very strange behaviour, imho.
So I solved the problem, but I still do not understand why =)

SAS - Selecting optimal quantities

I'm trying to solve a problem in SAS where I have quantities of customers across a range of groups, and the quantities I select need to be as even across the different categories as possible. This will be easier to explain with a small table, which is a simplification of a much larger problem I'm trying to solve.
Here is the table:
Customer Category | Revenue Band | Churn Band | # Customers
A                   1              1            4895
A                   1              2            383
A                   1              3            222
A                   2              1            28
A                   2              2            2828
A                   2              3            232
B                   1              1            4454
B                   1              2            545
B                   1              3            454
B                   2              1            4534
B                   2              2            434
B                   2              3            454
Suppose I need to select 3000 customers from category A, and 3000 customers from category B. From the second category, within each A and B, I need to select an equal amount from 1 and 2. If possible, I need to select a proportional amount across each 1, 2, and 3 subcategories. Is there an elegant solution to this problem? I'm relatively new to SAS and so far I've investigated OPTMODEL, but the examples are either too simple or too advanced to be much use to me yet.
Edit: I've thought about using PROC SURVEYSELECT. I can use it to select equal sizes across Revenue Bands 1, 2, and 3. However, where I'm lacking customers in the individual churn bands, SURVEYSELECT may not select the maximum number of customers available where those numbers are low, and I'm back to manually selecting customers.
There are still some ambiguities in the problem statement, but I hope that the PROC OPTMODEL code below is a good start for you. I tried to add examples of many different features, so that you can toy around with the model and hopefully get closer to what you actually need.
Of the many things you could optimize, I am minimizing the maximum violation from your "If possible" goal, e.g.:
min MaxMismatch = MaxChurnMismatch;
I was able to model your constraints as a Linear Program, which means that it should scale very well. You probably have other constraints you did not mention, but those would probably be beyond the scope of this site.
With the data you posted, you can see from the output of the print statements that the optimal penalty corresponds to choosing 1500 customers from A,1,1, where the ideal would be 1736. This is more expensive than ignoring the customers from several groups:
[1]  ChooseByCat
A    3000
B    3000

[1]  [2]  [3]   Choose   IdealProportion
A    1    1     1500     1736.670
A    1    2     0        135.882
A    1    3     0        78.762
A    2    1     28       9.934
A    2    2     1240     1003.330
A    2    3     232      82.310
B    1    1     1500     1580.210
B    1    2     0        193.358
B    1    3     0        161.072
B    2    1     1500     1608.593
B    2    2     0        153.976
B    2    3     0        161.072

Proportion   MaxChurnMisMatch
0.35478      236.67
That is probably not the ideal solution, but figuring out how to model your exact requirements would not be as useful for this site. You can contact me offline if that is relevant.
I've added quotes from your problem statement as comments in the code below.
Have fun!
data custCounts;
input cat $ rev churn n;
datalines;
A 1 1 4895
A 1 2 383
A 1 3 222
A 2 1 28
A 2 2 2828
A 2 3 232
B 1 1 4454
B 1 2 545
B 1 3 454
B 2 1 4534
B 2 2 434
B 2 3 454
;
proc optmodel printlevel = 0;
   set CATxREVxCHURN init {} inter {<'A',1,1>};
   set CAT = setof{<c,r,ch> in CATxREVxCHURN} c;
   num n{CATxREVxCHURN};
   read data custCounts into CATxREVxCHURN=[cat rev churn] n;
   put n[*]=;

   var Choose{<c,r,ch> in CATxREVxCHURN} >= 0 <= n[c,r,ch]
     , MaxChurnMisMatch >= 0, Proportion >= 0 <= 1
     ;

   /* From OP:
      Suppose I need to select 3000 customers from category A,
      and 3000 customers from category B. */
   num goal = 3000;
   /* See "implicit slice" for the parenthesis notation, i.e. (c) below. */
   impvar ChooseByCat{c in CAT} =
      sum{<(c),r,ch> in CATxREVxCHURN} Choose[c,r,ch];
   con MatchCatGoal{c in CAT}:
      ChooseByCat[c] = goal;

   /* From OP:
      From the second category, within each A and B,
      I need to select an equal amount from 1 and 2 */
   con MatchRevenueGroupsWithinCat{c in CAT}:
      sum{<(c),(1),ch> in CATxREVxCHURN} Choose[c,1,ch]
    = sum{<(c),(2),ch> in CATxREVxCHURN} Choose[c,2,ch]
      ;

   /* From OP:
      If possible, I need to select a proportional amount
      across each 1, 2, and 3 subcategories. */
   con MatchBandProportion{<c,r,ch> in CATxREVxCHURN, sign in / 1 -1 /}:
      MaxChurnMismatch >= sign * ( Choose[c,r,ch] - Proportion * n[c,r,ch] );

   min MaxMismatch = MaxChurnMismatch;
   solve;
   print ChooseByCat;

   impvar IdealProportion{<c,r,ch> in CATxREVxCHURN} = Proportion * n[c,r,ch];
   print Choose IdealProportion;
   print Proportion MaxChurnMismatch;
quit;
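For readers who want to sanity-check the model outside SAS, here is a sketch of the same minimax LP in Python with scipy.optimize.linprog. The variable layout and solver choice are my own; the data, bounds, and constraints are transcribed from the OPTMODEL code above, so the optimal penalty should match the printed output up to solver tolerance:

```python
import numpy as np
from scipy.optimize import linprog

# Data from the question: (cat, rev, churn, n)
cells = [("A",1,1,4895), ("A",1,2,383), ("A",1,3,222),
         ("A",2,1,28),   ("A",2,2,2828),("A",2,3,232),
         ("B",1,1,4454), ("B",1,2,545), ("B",1,3,454),
         ("B",2,1,4534), ("B",2,2,434), ("B",2,3,454)]
k = len(cells)

# Variables: Choose[0..k-1], Proportion (index k), MaxMismatch (index k+1)
c = np.zeros(k + 2)
c[k + 1] = 1.0  # minimize MaxMismatch

A_eq, b_eq = [], []
for cat in ("A", "B"):
    # sum of Choose over the category = 3000
    row = [1.0 if cell[0] == cat else 0.0 for cell in cells] + [0.0, 0.0]
    A_eq.append(row); b_eq.append(3000.0)
    # equal totals in revenue bands 1 and 2 within the category
    row = [(1.0 if cell[1] == 1 else -1.0) if cell[0] == cat else 0.0
           for cell in cells] + [0.0, 0.0]
    A_eq.append(row); b_eq.append(0.0)

A_ub, b_ub = [], []
for i, (_, _, _, n) in enumerate(cells):
    for sign in (1.0, -1.0):
        # sign * (Choose_i - Proportion * n_i) - MaxMismatch <= 0
        row = np.zeros(k + 2)
        row[i] = sign; row[k] = -sign * n; row[k + 1] = -1.0
        A_ub.append(row); b_ub.append(0.0)

bounds = [(0, n) for *_, n in cells] + [(0, 1), (0, None)]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(round(res.fun, 2), round(res.x[k], 5))
```

With the data above this should reproduce the optimal penalty of roughly 236.67 printed by PROC OPTMODEL (the individual Choose values may differ if alternative optima exist).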