Power BI, show category average rows in same table - powerbi

I have this raw data from source:
Category Product Price
C1 P1 1
C1 P2 1
C1 P3 4
C2 P4 2
C2 P5 10
C2 P6 12
I want to visualise a Power BI table that shows the Category average within the same structure:
Category Product Price
C1 P1 1
C1 P2 1
C1 Avg_C1 3
C1 P3 4
C2 P4 2
C2 Avg_C2 8
C2 P5 10
C2 P6 12
Many thanks if you show me a solution.

You can create a Matrix visual in which you place the Category and Product columns on Rows and for Values you use a Measure that would be like this:
PriceWithAverage =
VAR CurrentCategory =
    MAX ( ProductTable[Category] )
RETURN
    IF (
        ISFILTERED ( ProductTable[Product] ),
        MAX ( ProductTable[Price] ),
        CALCULATE (
            AVERAGE ( ProductTable[Price] ),
            FILTER ( ProductTable, ProductTable[Category] = CurrentCategory )
        )
    )
Let us know if that works for you
Best
David
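To make the target layout concrete, here is a minimal Python sketch (not Power BI code) of the row arithmetic the question asks for: one extra Avg_<category> row per category, slotted in by price. The function name with_category_averages and the tuple layout are assumptions made for illustration. Note that the true mean for C1 is (1 + 1 + 4) / 3 = 2, not the 3 shown in the question's example.

```python
from collections import defaultdict

# Sample data from the question: (Category, Product, Price)
rows = [
    ("C1", "P1", 1), ("C1", "P2", 1), ("C1", "P3", 4),
    ("C2", "P4", 2), ("C2", "P5", 10), ("C2", "P6", 12),
]


def with_category_averages(rows):
    """Return the rows plus one Avg_<category> row per category,
    sorted by category and then by price."""
    prices = defaultdict(list)
    for category, _product, price in rows:
        prices[category].append(price)
    # One synthetic row per category holding the mean price.
    averages = [
        (category, f"Avg_{category}", sum(values) / len(values))
        for category, values in prices.items()
    ]
    # sorted() is stable, so products with equal prices keep their order.
    return sorted(rows + averages, key=lambda r: (r[0], r[2]))


result = with_category_averages(rows)
for row in result:
    print(*row)
```

In the Matrix visual the average appears on the category subtotal row rather than between the products, so this sketch only shows the numbers involved, not how the visual lays them out.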

Related

Create semi-cumulative columns based off several other columns. SAS

I've got some data which is essentially lots of columns of information and dates, then two columns of numbers and a column which is a flag (i.e. it's either 1 or 0). Each row is information on an individual in a particular month.
For the two columns of numbers I want to create two new columns which are the cumulative numbers for each individual over time. And for the flag I want it to be 1 for all future dates for that individual once it has first become 1 for that individual.
I'm struggling to word this (and so also to google what I want to do!) so I've put what I have and what I want below. In this example: A1, B1, C1 would be one individual and A1, B2, C3 would be another individual.
I've got this:
Col1 Col2 Col3 Date      Value_1 Value_2 Flag
A1   B1   C1   01Jan2021 0       100     0
A1   B1   C1   01Feb2021 0       0       0
A1   B1   C1   01Mar2021 10      100     0
A1   B1   C1   01Apr2021 50      0       0
A1   B1   C1   01May2021 0       10      1
A1   B1   C1   01Jun2021 10      0       0
A1   B1   C1   01Jul2021 0       0       0
A1   B2   C3   01Jan2021 0       0       0
A1   B2   C3   01Feb2021 0       20      1
A1   B2   C3   01Mar2021 10      20      0
A1   B2   C3   01Apr2021 40      20      0
A1   B2   C3   01May2021 0       0       0
A1   B2   C3   01Jun2021 30      0       0
A1   B2   C3   01Jul2021 0       0       0
And I want this:
Col1 Col2 Col3 Date      Value_1_full Value_2_full Flag
A1   B1   C1   01Jan2021 0            100          0
A1   B1   C1   01Feb2021 0            100          0
A1   B1   C1   01Mar2021 10           200          0
A1   B1   C1   01Apr2021 60           200          0
A1   B1   C1   01May2021 60           210          1
A1   B1   C1   01Jun2021 70           210          1
A1   B1   C1   01Jul2021 70           210          1
A1   B2   C3   01Jan2021 0            0            0
A1   B2   C3   01Feb2021 0            20           1
A1   B2   C3   01Mar2021 10           40           1
A1   B2   C3   01Apr2021 50           60           1
A1   B2   C3   01May2021 50           60           1
A1   B2   C3   01Jun2021 80           60           1
A1   B2   C3   01Jul2021 80           60           1
I could do this if the only data I had was for a single individual, but there's lots of them. The code I've written is just giving me the total cumulative of the column - I can't figure out how to calculate them separately for each individual. I'm also struggling to write the code for the flag column for a similar reason. I've put the code below and would be very appreciative of any help/advice.
Note: I'm really new to SAS and to write this question I've struggled to get the date field in correctly by just typing out the data for this example (I've used this "Ignore" bit of the code below as a work around to get it into SAS) so if you could let me know what I've done wrong here that would also be greatly appreciated for the future!
data data_1;
input Col1 $ Col2 $ Col3 $ Date date8. Ignore Value_1 Value_2 Flag;
format Date date8.;
datalines;
A1 B1 C1 "'01Jan2021'd" 0 100 0
A1 B1 C1 "'01Feb2021'd" 0 0 0
A1 B1 C1 "'01Mar2021'd" 10 100 0
A1 B1 C1 "'01Apr2021'd" 50 0 0
A1 B1 C1 "'01May2021'd" 0 10 1
A1 B1 C1 "'01Jun2021'd" 10 0 0
A1 B1 C1 "'01Jul2021'd" 0 0 0
A1 B2 C3 "'01Jan2021'd" 0 0 0
A1 B2 C3 "'01Feb2021'd" 0 20 1
A1 B2 C3 "'01Mar2021'd" 10 20 0
A1 B2 C3 "'01Apr2021'd" 40 20 0
A1 B2 C3 "'01May2021'd" 0 0 0
A1 B2 C3 "'01Jun2021'd" 30 0 0
A1 B2 C3 "'01Jul2021'd" 0 0 0
;
run;
Data data_2;
set data_1;
drop Ignore;
run;
proc sort data=data_2
out=data_3;
by Col1 Col2 Col3 Date;
run;
data data_4;
set data_3;
by Col1 Col2 Col3 Date;
retain Col1 Col2 Col3 Date Value_1 Value_2 Flag Value_1_full Value_2_full;
if first.Col1 AND first.Col2 AND first.Col3 AND first.Date then Value_1_full = Value_1;
else Value_1_full = Value_1_full + Value_1;
run;
So you're pretty close! I think this gets there...
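proc sort data=data_1(drop=Ignore)
          out=data_3;
    by Col1 Col2 Col3 Date;
run;
data data_4;
    /* Rename the incoming flag so Flag can be a new, retained variable */
    set data_3(rename=(Flag=Flag_Early));
    by Col1 Col2 Col3 Date;
    retain Value_1_full Value_2_full Flag;
    /* Reset the running totals and the flag at the start of each individual */
    if first.Col3 then do;
        Value_1_full = Value_1;
        Value_2_full = Value_2;
        Flag = 0;
    end;
    else do;
        Value_1_full = Value_1_full + Value_1;
        Value_2_full = Value_2_full + Value_2;
    end;
    /* Once Flag_Early has been 1, Flag stays 1 for the rest of the group */
    Flag = max(Flag, Flag_Early);
    drop Flag_Early;
run;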
Only a few small changes. I removed one pointless data step (the drop can be done in any of the other places you use the data) and changed the if first. condition to if first.Col3.
You don't need Col1 and Col2: first.Col3 is what you care about, and a change in either of the other two also makes first.Col3 true.
You also don't want first.Date there. first.Date is true every time the date changes (or any BY variable before it changes), and that happens on every row, so it is always true. You don't want that.
Finally, for the flag you need to make a new variable. Variables read from a dataset are in fact always retained, but they are also overwritten with new values on every iteration. So we rename the incoming variable to flag_early (or whatever you like) and use the max function so that flag becomes 1 whenever flag_early is 1, and stays 1 if it already was 1 from an earlier row, again resetting it every time first.Col3 is true.
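For readers who find BY-group logic easier to follow outside SAS, here is a minimal Python sketch of the same idea, assuming the rows are already sorted by individual and then by date. The function name accumulate and the tuple layout are assumptions made for illustration. The key != prev_key test plays the role of first.Col3.

```python
# Rows from the question: (Col1, Col2, Col3, Date, Value_1, Value_2, Flag),
# assumed to be sorted by individual and then by date.
rows = [
    ("A1", "B1", "C1", "01Jan2021", 0, 100, 0),
    ("A1", "B1", "C1", "01Feb2021", 0, 0, 0),
    ("A1", "B1", "C1", "01Mar2021", 10, 100, 0),
    ("A1", "B1", "C1", "01Apr2021", 50, 0, 0),
    ("A1", "B1", "C1", "01May2021", 0, 10, 1),
    ("A1", "B1", "C1", "01Jun2021", 10, 0, 0),
    ("A1", "B1", "C1", "01Jul2021", 0, 0, 0),
    ("A1", "B2", "C3", "01Jan2021", 0, 0, 0),
    ("A1", "B2", "C3", "01Feb2021", 0, 20, 1),
    ("A1", "B2", "C3", "01Mar2021", 10, 20, 0),
    ("A1", "B2", "C3", "01Apr2021", 40, 20, 0),
    ("A1", "B2", "C3", "01May2021", 0, 0, 0),
    ("A1", "B2", "C3", "01Jun2021", 30, 0, 0),
    ("A1", "B2", "C3", "01Jul2021", 0, 0, 0),
]


def accumulate(rows):
    """Running totals and a sticky flag per individual."""
    out = []
    prev_key = None
    total_1 = total_2 = sticky = 0
    for col1, col2, col3, date, value_1, value_2, flag in rows:
        key = (col1, col2, col3)
        if key != prev_key:  # new individual: reset, like first.Col3
            total_1 = total_2 = sticky = 0
            prev_key = key
        total_1 += value_1
        total_2 += value_2
        sticky = max(sticky, flag)  # once 1, stays 1 within the group
        out.append((col1, col2, col3, date, total_1, total_2, sticky))
    return out


out = accumulate(rows)
for row in out:
    print(*row)
```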

Combine data from multiple rows in SAS if multiple ids including adding some vars

I hope you can help with a solution, either in SQL or a data step.
I need to combine multiple rows when the customer id is the same, and append some coded suffixes to the values as well.
I have following static variable containers:
%let FirstColSuffix=<SomeCode1>;
%let SecondColSuffix=#<SomeCode2>;
%let ThirdColSuffix=#<SomeCode3>;
Data have;
Customerid Firstcol Secondcol Thirdcol
1 A1 A2 A3
2 B1 B2 B3
2 C1 C2 C3
2 D1 D2 D3
3 E1 E2 E3
3 F1 F2 F3
3 G1 G2 G3
3 H1 H2 H3
Data want;
Customerid Firstcol Secondcol Thirdcol Result
1 A1 A2 A3 A1<SomeCode1>A2#<SomeCode2>A3#<SomeCode3>
2 B1 B2 B3 B1<SomeCode1>B2#<SomeCode2>B3#<SomeCode3>
2 C1 C2 C3 B1<SomeCode1>B2#<SomeCode2>B3#<SomeCode3>C1<SomeCode1>C2#<SomeCode2>C3#<SomeCode3>
2 D1 D2 D3 B1<SomeCode1>B2#<SomeCode2>B3#<SomeCode3>C1<SomeCode1>C2#<SomeCode2>C3#<SomeCode3>D1<SomeCode1>D2#<SomeCode2>D3#<SomeCode3>
3 E1 E2 E3 E1<SomeCode1>E2#<SomeCode2>E3#<SomeCode3>
3 F1 F2 F3 E1<SomeCode1>E2#<SomeCode2>E3#<SomeCode3>F1<SomeCode1>F2#<SomeCode2>F3#<SomeCode3>
3 G1 G2 G3 E1<SomeCode1>E2#<SomeCode2>E3#<SomeCode3>F1<SomeCode1>F2#<SomeCode2>F3#<SomeCode3>G1<SomeCode1>G2#<SomeCode2>G3#<SomeCode3>
3 H1 H2 H3 E1<SomeCode1>E2#<SomeCode2>E3#<SomeCode3>F1<SomeCode1>F2#<SomeCode2>F3#<SomeCode3>G1<SomeCode1>G2#<SomeCode2>G3#<SomeCode3>H1<SomeCode1>H2#<SomeCode2>H3#<SomeCode3>
I only need the last row for each customer id, but with the data from all rows matching that customer id combined in its "Result" column.
So in this example I need rows 1, 4 and 8.
Can anyone help? :-)
Use retain and by-group processing. We'll continually concatenate result to itself for each row we read and carry that value forward. At the last customer ID, we'll output. At the first customer ID, result is reset.
data want;
    set have;
    by Customerid;
    length Result $500;
    retain Result;
    if first.Customerid then call missing(Result);
    Result = cats(Result, FirstCol, "&FirstColSuffix", SecondCol, "&SecondColSuffix", ThirdCol, "&ThirdColSuffix");
    if last.Customerid;
run;
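The retain-and-output-on-last pattern can also be sketched in Python, assuming the rows are already sorted by customer id. The function name combine and the constant names are assumptions made for illustration; the suffix values are the placeholders from the question.

```python
from itertools import groupby

# Placeholder suffixes from the question.
FIRST_SUFFIX = "<SomeCode1>"
SECOND_SUFFIX = "#<SomeCode2>"
THIRD_SUFFIX = "#<SomeCode3>"

# (Customerid, Firstcol, Secondcol, Thirdcol), assumed sorted by Customerid.
have = [
    (1, "A1", "A2", "A3"),
    (2, "B1", "B2", "B3"), (2, "C1", "C2", "C3"), (2, "D1", "D2", "D3"),
    (3, "E1", "E2", "E3"), (3, "F1", "F2", "F3"),
    (3, "G1", "G2", "G3"), (3, "H1", "H2", "H3"),
]


def combine(rows):
    """Keep the last row per customer, plus the concatenation of all its rows."""
    want = []
    for _customer_id, group in groupby(rows, key=lambda r: r[0]):
        group = list(group)
        result = "".join(
            f"{first}{FIRST_SUFFIX}{second}{SECOND_SUFFIX}{third}{THIRD_SUFFIX}"
            for _cid, first, second, third in group
        )
        want.append(group[-1] + (result,))  # emit only on the last row
    return want


want = combine(have)
for row in want:
    print(*row)
```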

Calculate the average row_count by each process in DAX/powerbi?

I have a table in powerbi that has columns:
process | row_count |
P1 1
P1 2
P1 3
P2 4
P2 5
P3 2
P3 1
and I want to add a column that will have the average row_count of each process.
For example,
process | row_count | avg_row_count |
P1 1 2
P1 2 2
P1 3 2
P2 4 3
P2 5 3
P3 2 1
P3 1 1
Does anyone know how to do this using Dax /powerbi?
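This is a DAX expression for a new calculated column (not a measure) that shows the average of [row_count] for each distinct value of [process]. ALLEXCEPT removes every filter on the table except the one on [process], so the average is taken over all rows that share the current row's process:
avg_row_count =
CALCULATE (
    AVERAGE ( 'Table1'[row_count] ),
    ALLEXCEPT ( 'Table1', 'Table1'[process] )
)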
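As a cross-check of the numbers, here is a minimal Python sketch of the same per-group average attached to every row. The function name add_group_average is an assumption made for illustration. The unrounded averages are 2, 4.5 and 1.5, so the 3 and 1 in the question's example look like rounded or truncated values.

```python
from collections import defaultdict

# (process, row_count) pairs from the question.
rows = [("P1", 1), ("P1", 2), ("P1", 3), ("P2", 4), ("P2", 5), ("P3", 2), ("P3", 1)]


def add_group_average(rows):
    """Append the average row_count of each row's process to that row."""
    groups = defaultdict(list)
    for process, row_count in rows:
        groups[process].append(row_count)
    averages = {p: sum(values) / len(values) for p, values in groups.items()}
    return [(process, row_count, averages[process]) for process, row_count in rows]


with_avg = add_group_average(rows)
for row in with_avg:
    print(*row)
```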

How to get counts from related tables filtered by dates in both tables

I have two tables A and B as shown below. The AccountID in A has a relationship with the AccountID in B. 
A
AccountID CmpName AccFlag SysStartTime
A1 Test1 1 1/1/2020
A2 Test2 0 1/2/2020
A3 Test3 1 1/2/2020
B
ContactId AccountID ConFlag SysStartTime
C1 A1 1 1/1/2020
C2 A1 1 1/1/2020
C3 A1 0 1/1/2020
C4 A2 1 1/2/2020
I want to get the count of records in A that have 3 related records in B. I did this using a calculated column with the DAX:
getcount = COUNTROWS(RELATEDTABLE(B))
I then created another calculated column to flag the ones with getcount = 3.
The problem is that I want the count of records in A that have 3 related records in B at a given time, so I need to filter by SysStartTime in both tables. For example, I want the count of records in A that have 3 related records in B as of 1/1/2020, so the result should be 1. Please advise on how I can do this using a measure instead of a calculated column.
You should be able to do something like this:
SUMX ( A, IF ( COUNTROWS ( RELATEDTABLE ( B ) ) = 3, 1 ) )
Assuming you have a date table with a relationship to both A and B, a slicer on that date table will apply to the measure and you can set whatever dates you want.
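The counting logic behind that measure can be sketched in Python. This is only an illustration under two assumptions that the question does not confirm: "as of" means SysStartTime on or before the chosen date, and the dates are month/day/year. The function name accounts_with_n_contacts is hypothetical.

```python
from collections import Counter
from datetime import date

# Table A: (AccountID, SysStartTime); dates assumed to be month/day/year.
accounts = [
    ("A1", date(2020, 1, 1)),
    ("A2", date(2020, 1, 2)),
    ("A3", date(2020, 1, 2)),
]
# Table B: (ContactId, AccountID, SysStartTime)
contacts = [
    ("C1", "A1", date(2020, 1, 1)),
    ("C2", "A1", date(2020, 1, 1)),
    ("C3", "A1", date(2020, 1, 1)),
    ("C4", "A2", date(2020, 1, 2)),
]


def accounts_with_n_contacts(accounts, contacts, as_of, n=3):
    """Count accounts that exist as of the date and have exactly n contacts
    that also exist as of that date."""
    per_account = Counter(
        account_id for _contact_id, account_id, start in contacts if start <= as_of
    )
    return sum(
        1
        for account_id, start in accounts
        if start <= as_of and per_account[account_id] == n
    )


print(accounts_with_n_contacts(accounts, contacts, date(2020, 1, 1)))
```

In Power BI the date filter would come from the slicer rather than from a function argument.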

How can I fill in missing CSV file values based on a reference CSV file

I have a reference file like this
Id, Value1, Value2
a, a1, a2
b, b1, b2
c, c1, c2
d, d1, d2
...
n, n1, n2
and the missing file
Id, Value1, Value2
d, , d2
g, , g2
a, a1 ,
c, c1 ,
...
n, , n2
How can I write code to fill in the missing values based on the 'Id' column of the reference file?
You can do it using fillna(), but first set the joining column as the index in both DataFrames so that the values are aligned on 'Id':
In [71]: df = df.set_index('Id').fillna(ref.set_index('Id')).reset_index()
In [72]: df
Out[72]:
Id Value1 Value2
0 d d1 d2
1 g NaN g2
2 a a1 a2
3 c c1 c2
Data:
In [69]: ref
Out[69]:
Id Value1 Value2
0 a a1 a2
1 b b1 b2
2 c c1 c2
3 d d1 d2
In [70]: df
Out[70]:
Id Value1 Value2
0 d NaN d2
1 g NaN g2
2 a a1 NaN
3 c c1 NaN
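If pandas is not available, the same lookup by 'Id' can be sketched with the standard library. This is a minimal illustration, assuming the CSV text has no spaces after the commas (unlike the listing in the question) and that an empty field means a missing value. The function name fill_from_reference is hypothetical.

```python
import csv
import io

# Inline stand-ins for the two files (assumption: no spaces after commas).
REF_CSV = """Id,Value1,Value2
a,a1,a2
b,b1,b2
c,c1,c2
d,d1,d2
"""

MISSING_CSV = """Id,Value1,Value2
d,,d2
g,,g2
a,a1,
c,c1,
"""


def fill_from_reference(missing_text, ref_text):
    """Fill empty fields from the reference row that has the same Id."""
    ref = {row["Id"]: row for row in csv.DictReader(io.StringIO(ref_text))}
    filled = []
    for row in csv.DictReader(io.StringIO(missing_text)):
        lookup = ref.get(row["Id"], {})  # no reference row: nothing to fill
        filled.append({col: val or lookup.get(col, "") for col, val in row.items()})
    return filled


filled = fill_from_reference(MISSING_CSV, REF_CSV)
for row in filled:
    print(row)
```

As in the pandas answer, the row for 'g' keeps its missing value because the reference data has no matching Id.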