I have 2 tables like this:

PM_History2
Serial#   Good
A         TRUE
B         FALSE
A         TRUE
B         FALSE
C         TRUE
A         FALSE
C         TRUE

CONTRACTS
Serial#   Enrollment#
A         1
B         2
C         3
D         4
I have a measure that calculates the number of Good for TRUE:
Count of Good for True =
CALCULATE(COUNTA('PM_History2'[Good]), 'PM_History2'[Good] IN { TRUE })
I then have a measure that calculates the percentage of TRUEs for Good.
PM Score = 'PM_History2'[Count of Good for True]/COUNTROWS(PM_History2)
When I create a table visualization to show all the Serial# and their PM Score I get this:
Serial# PM Score
A .67
B
C 1.00
D
What can I do to get what should be a zero to show as 0 and what should be blank to stay blank? Like this:
Serial# PM Score
A .67
B 0
C 1.00
D
Thank you in advance!
Try this:
PM Score = DIVIDE ( [Count of Good for True] + 0, COUNTROWS ( PM_History2 ) )
Adding + 0 makes the numerator nonblank but the DIVIDE function still returns a blank when the denominator is blank, thus distinguishing the results for B and D.
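If it helps to see that logic outside of DAX, here is a minimal Python sketch (an illustration only, modelling DAX's BLANK as None) of why B becomes 0 while D stays blank:

def divide(numerator, denominator):
    # like DAX DIVIDE: returns BLANK (None) when the denominator is 0 or BLANK
    if denominator is None or denominator == 0:
        return None
    return numerator / denominator

# B has 2 rows, none TRUE: BLANK count + 0 -> 0, so 0 / 2 = 0.0
print(divide(0 + 0, 2))     # 0.0
# D has no PM_History2 rows at all: COUNTROWS is BLANK, so the result stays blank
print(divide(0 + 0, None))  # None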
I have a data set with columns (ID, Calc.CompleteBool, where complete = 1 and incomplete = 0) of the form:
ID | Calc.CompleteBool
----------------------------
100| 1
101| 0
103| 1
105| 1
I need to create a measure that gives me a single percentage complete. Thus, the measure needs to count the total number of IDs (n) and divide by that number the total IDs that meet the condition of 'complete' or 1.
E.g. 3 / 4 = 75%
I have tried the following and it does not work. It is returning a value of zero (0). Your assistance is greatly appreciated.
Here is my code:
Calc.pctComplete =
VAR total_aps =
    CALCULATE(
        COUNT('TABLE_NAME'[ID]),
        FILTER(
            ALL('TABLE_NAME'),
            'TABLE_NAME'[Calc.CompleteBool] = 'TABLE_NAME'[Calc.CompleteBool]
        )
    )
VAR total_aps_complete =
    CALCULATE(
        COUNT('TABLE_NAME'[Calc.CompleteBool]),
        FILTER(
            ALL('TABLE_NAME'),
            'TABLE_NAME'[Calc.CompleteBool] = 1
        )
    )
RETURN
    total_aps_complete / total_aps
Update
I also need to add another filter that only keeps rows where "CheckID" = Yes.
There are 3,700 total IDs
There are ~ 1,500 IDs where CheckID = Yes
And roughly 8 where Calc.CompleteBool = 1
ID | Calc.CompleteBool | CheckID |
---------------------------------------
100| 1 | Yes
101| 0 | No
103| 1 | No
105| 1 | Yes
106| 0 | Yes
{100, 105, 106} are the set that would be included. So the division would be 2/3 = 66% complete.
Your result can be calculated with a simple DAX formula like the following. Using filter arguments inside CALCULATE turns COUNT into something similar to Excel's COUNTIFS:

Completion =
CALCULATE(
    COUNT(Sheet1[Calc.CompleteBool]),
    Sheet1[Calc.CompleteBool] = 1,
    Sheet1[CheckID] = "Yes"
) / COUNT(Sheet1[Calc.CompleteBool])
You may use this measure (the + 0 on __completed means you see 0% when every row has 0 in Calc.CompleteBool; without it you would get BLANK):
Percentage% =
VAR __completed =
    CALCULATE(COUNTROWS(VALUES('TABLE_NAME'[ID])), 'TABLE_NAME'[Calc.CompleteBool] = 1) + 0
VAR __all = COUNTROWS('TABLE_NAME')
RETURN
    DIVIDE(__completed, __all)
Consider using DIVIDE instead of "/": https://dax.guide/divide/
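As a quick sanity check of the same logic outside Power BI, a small pandas sketch (illustrative only, using the sample rows from the update):

import pandas as pd

df = pd.DataFrame({
    'ID': [100, 101, 103, 105, 106],
    'Calc.CompleteBool': [1, 0, 1, 1, 0],
    'CheckID': ['Yes', 'No', 'No', 'Yes', 'Yes'],
})

# keep only CheckID = Yes, then take the share of complete rows
checked = df[df['CheckID'] == 'Yes']
print((checked['Calc.CompleteBool'] == 1).mean())  # 0.666..., i.e. 2/3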
I am currently working on an ABC/Pareto analysis of customer IDs.
What I want to calculate is something like this:
ID   Sales / ID    Cum. Sales     % of total   Category
G 15.000,00€ 15.000,00 € 21,45% A
D 5.700,00€ 20.700,00 € 29,60% A
H 4.000,00€ 24.700,00 € 35,32% A
Q 3.800,00€ 28.500,00 € 40,75% A
O 3.650,00€ 32.150,00 € 45,97% A
X 3.500,00€ 35.650,00 € 50,97% B
I 3.350,00€ 39.000,00 € 55,76% B
Ü 3.200,00€ 42.200,00 € 60,34% B
Ö 3.050,00€ 45.250,00 € 64,70% B
N 2.900,00€ 48.150,00 € 68,84% B
J 2.750,00€ 50.900,00 € 72,78% C
Ä 2.600,00€ 53.500,00 € 76,49% C
Z 2.450,00€ 55.950,00 € 80,00% C
Y 2.300,00€ 58.250,00 € 83,29% C
L 2.150,00€ 60.400,00 € 86,36% D
P 2.000,00€ 62.400,00 € 89,22% D
W 1.765,00€ 64.165,00 € 91,74% D
R 1.530,00€ 65.695,00 € 93,93% D
F 1.295,00€ 66.990,00 € 95,78% E
V 1.060,00€ 68.050,00 € 97,30% E
B 825,00€ 68.875,00 € 98,48% E
T 590,00€ 69.465,00 € 99,32% E
M 355,00€ 69.820,00 € 99,83% E
C 120,00€ 69.940,00 € 100,00% E
This way I can say that "A-customers" make up 50% of my total profit.
I used this tutorial to create my measures:
https://www.youtube.com/watch?v=rlUBO5qoKow
total_sales = SUM(fact_table[sales])

cumulative sales =
VAR MYSALES = [total_sales]
RETURN
    SUMX(
        FILTER(
            SUMMARIZE(
                ALLSELECTED(fact_table);
                fact_table[CustomerID];
                "table_sales"; [total_sales]
            );
            [total_sales] >= MYSALES
        );
        [table_sales]
    )
Since I am calculating the cumulative sales for >1000 unique customer IDs the calculation takes ages!
Is there a way I can save this calculation in a new table so I only have to calculate it once?
Or does anyone know a measure that does the same but is less computationally expensive?
Any help is much appreciated!
You could calculate it once as a calculated column, but then ALLSELECTED wouldn't act as you expect, since calculated columns cannot respond to report filters or slicers.
There are some inefficiencies in your measure, though. It looks like you are calculating [total_sales] twice: once inside SUMMARIZE and again for the FILTER.
I haven't tested this measure, but it may be faster as follows:
cumulative sales =
VAR MYSALES = [total_sales]
RETURN
SUMX (
FILTER (
SUMMARIZECOLUMNS (
fact_table[CustomerID];
ALLSELECTED ( fact_table );
"table_sales"; [total_sales]
);
[table_sales] >= MYSALES
);
[table_sales]
)
The important part is reusing [table_sales] in the FILTER, but SUMMARIZECOLUMNS might be a bit faster too.
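If the numbers only need to be computed once per refresh, another option is to pre-compute the ranking outside DAX and load it as its own table. A rough pandas sketch of the cumulative-share and ABC bucketing logic (column names taken from your measure; the sample data and bucket cut-offs are illustrative, not prescribed):

import pandas as pd

# tiny stand-in for fact_table with CustomerID and sales columns
fact_table = pd.DataFrame({
    'CustomerID': ['G', 'D', 'H', 'G'],
    'sales': [10000.0, 5700.0, 4000.0, 5000.0],
})

per_customer = (fact_table.groupby('CustomerID')['sales'].sum()
                          .sort_values(ascending=False)
                          .to_frame('total_sales'))
per_customer['cumulative_sales'] = per_customer['total_sales'].cumsum()
per_customer['pct_of_total'] = per_customer['cumulative_sales'] / per_customer['total_sales'].sum()
# e.g. A up to 50%, B up to 70%, C above -- pick the thresholds your analysis uses
per_customer['category'] = pd.cut(per_customer['pct_of_total'],
                                  bins=[0, 0.5, 0.7, 1.0], labels=['A', 'B', 'C'])
print(per_customer)

The trade-off is the same as with a calculated column: the result is fixed at refresh time and will not respond to slicers.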
I have the table below and want to add a calculated column Rank (oldest top-3) that ranks rows only when Status is "O". Note that **Rank (oldest top-3)** is the desired result.
Status  Days open  Rank (oldest top-3)
C       1
O       1          4
O       2          3
C       3
C       4
C       5
O       6          2
O       7          1
C       8
C       9
I have the code below, but it does not work for me.
Rank = IF(order[Status] = "C", BLANK(),
RANKX(FILTER(order, order[Status] = "O"),
order[Days open], , 1, Dense))
I get the top 3 and not the bottom ones. Also, with FILTER it filters out any other data. I tried to replace FILTER with ALLSELECTED, but it did not work.
Input
I have created a table named order with the following data:
Status  Days open
C       1
O       1
O       2
C       3
C       4
C       5
O       6
O       7
C       8
C       9
Code
Then I have added a calculated column with the following DAX:
Rank =
IF('order'[Status] = "C",
BLANK(),
RANKX(
FILTER('order', 'order'[Status] = "O"),
'order'[Days open],
,
0,
Dense
)
)
The only difference compared to your DAX (apart from formatting) is that the second-to-last argument of the RANKX function is 0 instead of 1.
The documentation of RANKX indicates that 0 ranks the series in descending order.
Output: the calculated column then matches the desired Rank (oldest top-3) above.
Also change FILTER('order', 'order'[Status] = "O") to FILTER(ALL('order'), 'order'[Status] = "O"); if not, your results may all be the same when the table is filtered.
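For a quick sanity check of the descending, dense ranking outside DAX, a small pandas sketch (illustrative only, reproducing the sample table):

import pandas as pd

order = pd.DataFrame({'Status': list('COOCCCOOCC'),
                      'Days open': [1, 1, 2, 3, 4, 5, 6, 7, 8, 9]})

# rank only the open rows, descending and dense, like RANKX(..., 0, Dense)
mask = order['Status'] == 'O'
order.loc[mask, 'Rank'] = order.loc[mask, 'Days open'].rank(ascending=False, method='dense')
print(order)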
I have a data frame like the following:
ID Value
A 70
A 80
B 75
C 10
B 50
A 1000
C 60
B 2000
.. ..
I would like to group this data by ID, remove the outliers from the grouped data (the ones we see from the boxplot) and then calculate mean.
So far I have:

grouped = df.groupby('ID')
statBefore = pd.DataFrame({
    'mean': grouped['Value'].mean(),
    'median': grouped['Value'].median(),
    'std': grouped['Value'].std()
})

How can I find the outliers, remove them, and get the statistics?
I believe the method you're referring to is to remove values > 1.5 * the interquartile range away from the median. So first, calculate your initial statistics:
statBefore = pd.DataFrame({
    'q1': grouped['Value'].quantile(.25),
    'median': grouped['Value'].median(),
    'q3': grouped['Value'].quantile(.75)
})
And then determine whether values in the original DF are outliers:
def is_outlier(row):
    # look up this row's group statistics by its ID
    iq_range = statBefore.loc[row.ID, 'q3'] - statBefore.loc[row.ID, 'q1']
    median = statBefore.loc[row.ID, 'median']
    return row.Value > median + 1.5 * iq_range or row.Value < median - 1.5 * iq_range

# apply the function to the original df:
df.loc[:, 'outlier'] = df.apply(is_outlier, axis=1)

# filter to only non-outliers:
df_no_outliers = df[~df.outlier]
The same idea with the conventional quartile-based fences, computed over the whole Value column:

Q1 = df['Value'].quantile(0.25)
Q3 = df['Value'].quantile(0.75)
IQR = Q3 - Q1
data = df[~((df['Value'] < (Q1 - 1.5 * IQR)) | (df['Value'] > (Q3 + 1.5 * IQR)))]
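Note that this computes a single pair of bounds for the whole column. If you want the fences per ID, a groupby/transform variant looks roughly like this (a sketch, reusing the question's df with ID and Value columns):

import pandas as pd

df = pd.DataFrame({'ID': ['A', 'A', 'B', 'C', 'B', 'A', 'C', 'B'],
                   'Value': [70, 80, 75, 10, 50, 1000, 60, 2000]})

# per-group quartiles, broadcast back to the original rows
q1 = df.groupby('ID')['Value'].transform('quantile', 0.25)
q3 = df.groupby('ID')['Value'].transform('quantile', 0.75)
iqr = q3 - q1

in_bounds = df['Value'].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
stats = df[in_bounds].groupby('ID')['Value'].agg(['mean', 'median', 'std'])
print(stats)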
just do:

In [187]: df[df['Value'] < 100].groupby('ID').agg(['mean','median','std'])
Out[187]:
      Value
       mean  median        std
ID
A      75.0    75.0   7.071068
B      62.5    62.5  17.677670
C      35.0    35.0  35.355339

(Note this hard-codes 100 as the cutoff rather than deriving it from the IQR.)
I have a pandas dataframe like this:
Product Group   Product ID   Units Sold   Revenue   Rev/Unit
A               451          8            $16       $2
A               987          15           $40       $2.67
A               311          2            $5        $2.50
B               642          6            $18       $3.00
B               251          4            $28       $7.00
I want to transform it to look like this:
Product Group   Units Sold   Revenue   Rev/Unit   Mean Abs Deviation
A               25           $61       $2.44      $0.24
B               10           $46       $4.60      $2.00
The Mean Abs Deviation column is to be performed on the Rev/Unit column in the first table. The tricky thing is taking into account the respective weights behind the Rev/Unit calculation.
For example taking a straight MAD of Product Group A's Rev/Unit would yield $0.26. However after taking weight into consideration, the MAD would be $0.24.
I know to use groupby to get the simple summation for units sold and revenue, but I'm a bit lost on how to do the more complicated calculations of the next 2 columns.
Also, while we're giving advice/help: is there any easier way to create/paste tables into SO posts?
UPDATE:
Would a solution like this work? I know it will for the summation fields, but not sure how to implement for the latter 2 fields.
grouped_df = df.groupby("Product Group")
grouped_df.agg({
    'Units Sold': 'sum',
    'Revenue': 'sum',
    'Rev/Unit': 'Revenue'/'Units Sold',
    'MAD': some_function})
You need to clarify what the "weights" are. I assumed the weights are the number of units sold, but that gives a different result from yours:
import numpy as np
import pandas as pd

pv = df.pivot_table(index='Product Group',
                    values=['Units Sold', 'Revenue'],
                    aggfunc='sum')
pv['Rev/Unit'] = pv.Revenue / pv['Units Sold']
this gives:
Revenue Units Sold Rev/Unit
Product Group
A 61 25 2.44
B 46 10 4.60
As for WMAD:
def wmad(prod):
    # weighted mean absolute deviation of Rev/Unit, weighted by units sold
    idx = df['Product Group'] == prod
    w = df['Units Sold'][idx]
    abs_dev = np.abs(df['Rev/Unit'][idx] - pv['Rev/Unit'][prod])
    return sum(abs_dev * w) / sum(w)

pv['Mean Abs Deviation'] = [wmad(idx) for idx in pv.index]
which, as I mentioned, gives a different result:
Revenue Units Sold Rev/Unit Mean Abs Deviation
Product Group
A 61 25 2.44 0.2836
B 46 10 4.60 1.9200
From your suggested solution, you can use a lambda function to operate on each group, e.g.:

'Rev/Unit': lambda x: calculate_revenue_per_unit(x)

Bear in mind that with agg the lambda receives one column of the group at a time (a Series), so a calculate_revenue_per_unit that needs both Revenue and Units Sold would have to operate on the whole group instead, e.g. via groupby().apply().
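Putting it together, one way to get all four output columns in one pass is groupby().apply(), since apply sees the whole group and can combine columns. A sketch assuming the question's column names, with Revenue and Rev/Unit already numeric (the $ signs stripped):

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Product Group': ['A', 'A', 'A', 'B', 'B'],
    'Units Sold': [8, 15, 2, 6, 4],
    'Revenue': [16, 40, 5, 18, 28],
})
df['Rev/Unit'] = df['Revenue'] / df['Units Sold']

def summarize(g):
    units = g['Units Sold'].sum()
    revenue = g['Revenue'].sum()
    rev_per_unit = revenue / units
    # mean absolute deviation of Rev/Unit, weighted by units sold
    mad = np.average(np.abs(g['Rev/Unit'] - rev_per_unit), weights=g['Units Sold'])
    return pd.Series({'Units Sold': units, 'Revenue': revenue,
                      'Rev/Unit': rev_per_unit, 'Mean Abs Deviation': mad})

result = df.groupby('Product Group').apply(summarize)
print(result)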