I am currently working on an ABC/Pareto analysis of customer IDs.
What I want to calculate is something like this:
ID | Sales / ID  | Cum. Sales  | % of total | Category
G  | 15.000,00 € | 15.000,00 € |     21,45% | A
D  |  5.700,00 € | 20.700,00 € |     29,60% | A
H  |  4.000,00 € | 24.700,00 € |     35,32% | A
Q  |  3.800,00 € | 28.500,00 € |     40,75% | A
O  |  3.650,00 € | 32.150,00 € |     45,97% | A
X  |  3.500,00 € | 35.650,00 € |     50,97% | B
I  |  3.350,00 € | 39.000,00 € |     55,76% | B
Ü  |  3.200,00 € | 42.200,00 € |     60,34% | B
Ö  |  3.050,00 € | 45.250,00 € |     64,70% | B
N  |  2.900,00 € | 48.150,00 € |     68,84% | B
J  |  2.750,00 € | 50.900,00 € |     72,78% | C
Ä  |  2.600,00 € | 53.500,00 € |     76,49% | C
Z  |  2.450,00 € | 55.950,00 € |     80,00% | C
Y  |  2.300,00 € | 58.250,00 € |     83,29% | C
L  |  2.150,00 € | 60.400,00 € |     86,36% | D
P  |  2.000,00 € | 62.400,00 € |     89,22% | D
W  |  1.765,00 € | 64.165,00 € |     91,74% | D
R  |  1.530,00 € | 65.695,00 € |     93,93% | D
F  |  1.295,00 € | 66.990,00 € |     95,78% | E
V  |  1.060,00 € | 68.050,00 € |     97,30% | E
B  |    825,00 € | 68.875,00 € |     98,48% | E
T  |    590,00 € | 69.465,00 € |     99,32% | E
M  |    355,00 € | 69.820,00 € |     99,83% | E
C  |    120,00 € | 69.940,00 € |    100,00% | E
This way I can say that my "A-customers" account for roughly half of my total sales.
I used this tutorial to create my measures:
https://www.youtube.com/watch?v=rlUBO5qoKow
total_sales = SUM ( fact_table[sales] )

cumulative sales =
VAR MYSALES = [total_sales]
RETURN
    SUMX (
        FILTER (
            SUMMARIZE (
                ALLSELECTED ( fact_table );
                fact_table[CustomerID];
                "table_sales"; [total_sales]
            );
            [total_sales] >= MYSALES
        );
        [table_sales]
    )
Since I am calculating the cumulative sales for >1000 unique customer IDs the calculation takes ages!
Is there a way I can save this calculation in a new table so I only have to calculate it once?
Or does anyone know a measure that does the same but is less computationally expensive?
Any help is much appreciated!
You could calculate it once as a calculated column, but then ALLSELECTED wouldn't act as you expect, since calculated columns are evaluated at data refresh and cannot respond to report filters or slicers.
There are some inefficiencies in your measure, though. It looks like you are calculating [total_sales] twice: once inside SUMMARIZE and again for the FILTER.
I haven't tested this measure, but it may be faster as follows:
cumulative sales =
VAR MYSALES = [total_sales]
RETURN
    SUMX (
        FILTER (
            SUMMARIZECOLUMNS (
                fact_table[CustomerID];
                ALLSELECTED ( fact_table );
                "table_sales"; [total_sales]
            );
            [table_sales] >= MYSALES
        );
        [table_sales]
    )
The important part is reusing [table_sales] in the FILTER instead of recomputing [total_sales], but SUMMARIZECOLUMNS might be a bit faster too.
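Outside Power BI, the arithmetic itself is easy to sanity-check. Here is a minimal pandas sketch (column names are made up) that reproduces the running total and percent-of-total columns from the table above:

```python
import pandas as pd

# Sales per customer, as in the table above
sales = pd.DataFrame({
    "CustomerID": list("GDHQOXIÜÖNJÄZYLPWRFVBTMC"),
    "sales": [15000, 5700, 4000, 3800, 3650, 3500, 3350, 3200, 3050,
              2900, 2750, 2600, 2450, 2300, 2150, 2000, 1765, 1530,
              1295, 1060, 825, 590, 355, 120],
})

# Sort descending first; this is what makes the cumulative sum meaningful
sales = sales.sort_values("sales", ascending=False)
sales["cum_sales"] = sales["sales"].cumsum()                      # running total
sales["pct_of_total"] = 100 * sales["cum_sales"] / sales["sales"].sum()

print(sales.head(3))
```

The DAX measure achieves the same effect without an explicit sort, by summing every customer whose sales are greater than or equal to the current customer's.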
I think the title accurately describes what I'm trying to achieve.
https://docs.google.com/spreadsheets/d/1sRQzXXZ4a3vjAwvsH4rrWfijxV4jCl_OV3iQO00yvlQ/edit?usp=sharing
Essentially, I have a table of data for houses: the street each one is on, whether it has a pool or gates, etc. I'm trying to create a lookup in Google Sheets so that if someone is looking for a house with a pool for a maximum of $800k, I can return the results that match the criteria.
This is how the table data looks.
I want to be able to query the data here in columns D, E, F, G (G being a maximum value in the lookup) and return the data in columns A, B, C if everything matches.
On a different tab, I would enter the maximum budget (which would need a "maximum value" lookup against column G), then look for any Y/N matches in the other columns and return a list of all matches.
Is this possible with Google Sheets?
Thanks for any help you can offer.
use:
=QUERY(Houses!A:I,
"select C,B,A,H
where H <= "&B3&"
and D = '"&B4&"'
and E = '"&B5&"'
and F = '"&B6&"'", 0)
update:
=IFERROR(QUERY(HousingData,
 "select C,B,A,G
  where G <= "&B3&
  IF(B4="Y", " and D = '"&B4&"'", "")&
  IF(B5="Y", " and E = '"&B5&"'", "")&
  IF(B6="Y", " and F = '"&B6&"'", "")&
  IF(B7="Y", " and J = '"&B7&"'", ""), 0), "No houses found.")
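The core trick in the update is building the query string conditionally: a clause is appended only when the corresponding criteria cell is "Y". The same clause-assembly idea, sketched in Python with hypothetical criteria values:

```python
# Hypothetical criteria as they would sit in the lookup tab (B3 = budget,
# B4:B6 = Y/N flags for columns D, E, F, e.g. pool / gates)
budget = 800000
criteria = {"D": "Y", "E": "N", "F": "Y"}

query = "select C,B,A,G where G <= {}".format(budget)
for col, flag in criteria.items():
    if flag == "Y":  # only add a clause when the user asked for that feature
        query += " and {} = '{}'".format(col, flag)

print(query)
```

Criteria left as "N" simply contribute nothing to the query, exactly as the empty-string branch of each IF does in the formula.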
I have 2 tables like this:
PM_History2
Serial# Good
A TRUE
B FALSE
A TRUE
B FALSE
C TRUE
A FALSE
C TRUE
CONTRACTS
Serial# Enrollment#
A 1
B 2
C 3
D 4
I have a measure that calculates the number of Good for TRUE:
Count of Good for True =
CALCULATE(COUNTA('PM_History2'[Good]), 'PM_History2'[Good] IN { TRUE })
I then have a measure that calculates the percentage of TRUEs for Good.
PM Score = 'PM_History2'[Count of Good for True]/COUNTROWS(PM_History2)
When I create a table visualization to show all the Serial# and their PM Score I get this:
Serial# PM Score
A .67
B
C 1.00
D
What can I do so that what should be a zero comes through as 0 and what should be blank stays blank? Like this:
Serial# PM Score
A .67
B 0
C 1.00
D
Thank you in advance!
Try this:
PM Score = DIVIDE ( [Count of Good for True] + 0, COUNTROWS ( PM_History2 ) )
Adding + 0 makes the numerator non-blank, but the DIVIDE function still returns blank when the denominator is blank, thus distinguishing the results for B and D.
I have a cross reference table and another table with the list of "Items"
I connect "PKG" to "Item" as "PKG" has distinct values.
Example:
**Cross table**

Bulk | PKG
A    | D
A    | E
B    | F
C    | G

**Item table**

Item | Value
A    | 2
B    | 1
C    | 4
D    | 5
E    | 8
F    | 3
G    | 1
After connecting the two tables above by PKG and Item, I get the following result:
Item Value Bulk PKG
A 2
B 1
C 4
D 5 A D
E 8 A E
F 3 B F
G 1 C G
As you can see, nothing shows up for the first 3 values, since the relationship is on PKG and those are "Bulk" values.
I am trying to create a new column that uses the cross reference table
I want to create the following with a new column
Item Value Bulk PKG NEW COLUMN
A 2 5
B 1 3
C 4 1
D 5 A D 5.75
E 8 A E 9.2
F 3 B F 3.45
G 1 C G 1.15
The new column is what I am trying to create.
I want the original values to show up for bulk as they appear for pkg. I then want the Pkg items to be 15% higher than the original value.
How can I calculate this based on the setup?
Just write a conditional custom column in the query editor:
New Column = if [Bulk] = null then [Value] else 1.15 * [Value]
You can also do this as a DAX calculated column:
New Column = IF( ISBLANK( Table1[Bulk] ), Table1[Value], 1.15 * Table1[Value] )
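As a quick sketch of what that conditional computes (using the question's data and mirroring the answer's formula): rows with no Bulk value keep their original Value, and rows with a Bulk value get the 15% uplift:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Item":  ["A", "B", "C", "D", "E", "F", "G"],
    "Value": [2, 1, 4, 5, 8, 3, 1],
    "Bulk":  [None, None, None, "A", "A", "B", "C"],
})

# null Bulk -> keep Value; otherwise apply the 15% markup
df["New Column"] = np.where(df["Bulk"].isna(), df["Value"], 1.15 * df["Value"])
print(df)
```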
I have the table below and want to add a calculated column Rank (oldest top-3) that ranks rows only when Status is "O". Note that **Rank (oldest top-3)** is the desired result.
Status Days open Rank (oldest top-3)
C 1
O 1 4
O 2 3
C 3
C 4
C 5
O 6 2
O 7 1
C 8
C 9
I have the code below, but it does not work for me.
Rank = IF(order[Status] = "C", BLANK(),
RANKX(FILTER(order, order[Status] = "O"),
order[Days open], , 1, Dense))
I get the top 3 and not the bottom ones. Also, with FILTER it filters out all the other data. I tried to replace FILTER with ALLSELECTED, but it did not work.
Input
I have created a table named order with the following data:
Status Days open
C 1
O 1
O 2
C 3
C 4
C 5
O 6
O 7
C 8
C 9
Code
Then I have added a calculated column with the following DAX:
Rank =
IF('order'[Status] = "C",
BLANK(),
RANKX(
FILTER('order', 'order'[Status] = "O"),
'order'[Days open],
,
0,
Dense
)
)
The only difference compared to your DAX (apart from formatting) is that the second-to-last argument of RANKX is 0 instead of 1.
The RANKX documentation indicates that 0 ranks the series in descending order.
Output
Change FILTER('order', 'order'[Status] = "O") to FILTER(ALL('order'), 'order'[Status] = "O"); otherwise your results may all be the same within a table visual.
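As a sanity check on the expected numbers, the descending dense rank over only the "O" rows can be reproduced in pandas:

```python
import pandas as pd

df = pd.DataFrame({
    "Status":    ["C", "O", "O", "C", "C", "C", "O", "O", "C", "C"],
    "Days open": [1, 1, 2, 3, 4, 5, 6, 7, 8, 9],
})

# rank only the open rows: largest Days open gets rank 1, dense ties
open_rows = df["Status"] == "O"
df.loc[open_rows, "Rank"] = df.loc[open_rows, "Days open"].rank(
    ascending=False, method="dense"
)
print(df)
```

The closed rows stay blank (NaN), matching the BLANK() branch of the IF.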
I have a pandas dataframe like this:
Product Group Product ID Units Sold Revenue Rev/Unit
A 451 8 $16 $2
A 987 15 $40 $2.67
A 311 2 $5 $2.50
B 642 6 $18 $3.00
B 251 4 $28 $7.00
I want to transform it to look like this:
Product Group Units Sold Revenue Rev/Unit Mean Abs Deviation
A 25 $61 $2.44 $0.24
B 10 $46 $4.60 $2.00
The Mean Abs Deviation column is to be performed on the Rev/Unit column in the first table. The tricky thing is taking into account the respective weights behind the Rev/Unit calculation.
For example taking a straight MAD of Product Group A's Rev/Unit would yield $0.26. However after taking weight into consideration, the MAD would be $0.24.
I know to use groupby to get the simple summation for units sold and revenue, but I'm a bit lost on how to do the more complicated calculations of the next 2 columns.
Also, while we're giving advice: is there any easier way to create/paste tables into SO posts?
UPDATE:
Would a solution like this work? I know it will for the summation fields, but not sure how to implement for the latter 2 fields.
grouped_df=df.groupby("Product Group")
grouped_df.agg({
'Units Sold':'sum',
'Revenue':'sum',
'Rev/Unit':'Revenue'/'Units Sold',
'MAD':some_function})
You need to clarify what the "weights" are. I assumed the weights are the number of units sold, but that gives a different result from yours:
pv = df.pivot_table( index='Product Group',
                     values=[ 'Units Sold', 'Revenue' ],
                     aggfunc='sum' )
pv[ 'Rev/Unit' ] = pv[ 'Revenue' ] / pv[ 'Units Sold' ]
this gives:
Revenue Units Sold Rev/Unit
Product Group
A 61 25 2.44
B 46 10 4.60
As for WMAD:
import numpy as np

def wmad( prod ):
    # unit-weighted mean absolute deviation around the group's Rev/Unit
    idx = df[ 'Product Group' ] == prod
    w = df[ 'Units Sold' ][ idx ]
    abs_dev = np.abs( df[ 'Rev/Unit' ][ idx ] - pv[ 'Rev/Unit' ][ prod ] )
    return sum( abs_dev * w ) / sum( w )

pv[ 'Mean Abs Deviation' ] = [ wmad( idx ) for idx in pv.index ]
which, as I mentioned, gives a different result:
Revenue Units Sold Rev/Unit Mean Abs Deviation
Product Group
A 61 25 2.44 0.2836
B 46 10 4.60 1.9200
From your suggested solution, you can use a lambda function to operate on each group, e.g.:
'Rev/Unit': lambda x: calculate_revenue_per_unit(x)
Bear in mind that inside agg, x is a pandas Series holding that one column's values for each group, so your calculate_revenue_per_unit function only sees the Rev/Unit values; if it also needs the weights, operate on the whole group with groupby().apply instead.
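Assuming the weights are units sold and using the rounded Rev/Unit figures from the question's table, the whole pipeline can also be written as a single groupby/apply; this reproduces the 0.2836 and 1.92 figures above (with unrounded Rev/Unit, group A comes out near 0.2816):

```python
import pandas as pd

df = pd.DataFrame({
    "Product Group": ["A", "A", "A", "B", "B"],
    "Units Sold":    [8, 15, 2, 6, 4],
    "Revenue":       [16, 40, 5, 18, 28],
    "Rev/Unit":      [2.00, 2.67, 2.50, 3.00, 7.00],  # rounded, as in the question
})

def summarize(g):
    units = g["Units Sold"].sum()
    revenue = g["Revenue"].sum()
    rev_per_unit = revenue / units                    # weighted mean Rev/Unit
    dev = (g["Rev/Unit"] - rev_per_unit).abs()
    wmad = (dev * g["Units Sold"]).sum() / units      # unit-weighted MAD
    return pd.Series({"Units Sold": units, "Revenue": revenue,
                      "Rev/Unit": rev_per_unit, "Mean Abs Deviation": wmad})

out = df.groupby("Product Group")[["Units Sold", "Revenue", "Rev/Unit"]].apply(summarize)
print(out)
```

Because summarize receives the whole group as a DataFrame, the deviation can be weighted by Units Sold, which a per-column agg lambda cannot do.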