I have the table below and want to add a calculated column, Rank (oldest top-3), that ranks rows only when Status is "O". Note that **Rank (oldest top-3)** is the desired result.
Status Days open Rank (oldest top-3)
C 1
O 1 4
O 2 3
C 3
C 4
C 5
O 6 2
O 7 1
C 8
C 9
I have the code below, but it does not work for me.
Rank = IF(order[Status] = "C", BLANK(),
RANKX(FILTER(order, order[Status] = "O"),
order[Days open], , 1, Dense))
I get the top 3 and not the bottom ones. Also, the FILTER removes all the other data. I tried replacing FILTER with ALLSELECTED, but it did not work.
Input
I have created a table named order with the following data:
Status Days open
C 1
O 1
O 2
C 3
C 4
C 5
O 6
O 7
C 8
C 9
Code
Then I have added a calculated column with the following DAX:
Rank =
IF('order'[Status] = "C",
BLANK(),
RANKX(
FILTER('order', 'order'[Status] = "O"),
'order'[Days open],
,
0,
Dense
)
)
The only difference compared to your DAX (apart from formatting) is that the second-to-last argument of the RANKX function is 0 instead of 1.
The documentation of RANKX indicates that 0 ranks the series in descending order.
Output
Also, change FILTER('order', 'order'[Status] = "O") to FILTER(ALL('order'), 'order'[Status] = "O"); otherwise your results may all be the same when shown in a table.
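Putting the two suggestions together, the full calculated column would look like this (a sketch combining both fixes; it is not taken verbatim from either answer):
Rank =
IF(
    'order'[Status] = "C",
    BLANK(),
    RANKX(
        FILTER( ALL( 'order' ), 'order'[Status] = "O" ),
        'order'[Days open],
        ,
        0,      -- 0 = descending: the largest Days open gets rank 1
        Dense   -- no gaps in ranks after ties
    )
)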
Related
I am trying to help a public school here, but I have very limited knowledge of Power BI, so I hope you can enlighten me on this case:
We have a very simple report with a table and a KPI.
The KPI counts all students.
The table shows students' grades.
Student Math Portuguese History Science
StD A 6 6 7 8
StD B 6 7 6 7
StD C 8 9 7 8
StD D 6 6 6 6
StD E 6 7 8 8
StD F 8 6 7 7
The rule that must be applied to the KPI (count of students) and to the table is to show students only if:
at least 2 subjects are equal to or under 6
Portuguese is equal to or under 6
Math is under 6
All the rest should not be shown in the table or counted in the KPI. In this case I would see/count only students A, B, D, E & F.
Any help would be very much appreciated.
To tackle your task, try the following:
Create a calculated column in your table with the following DAX code:
isValid =
VAR cond_2_subjects = (('Table'[Math] <= 6 ) + ('Table'[Portuguese] <= 6) + ('Table'[History] <= 6) + ('Table'[Science] <= 6)) >= 2
VAR cond_portuguese = 'Table'[Portuguese] <= 6
VAR cond_math = 'Table'[Math] < 6
RETURN
-- This will check if any of the given conditions is true
IF(
cond_2_subjects || cond_portuguese || cond_math,
TRUE(),
FALSE()
)
The table should then look like this:
The KPI (measure) can then be written like so:
# Students =
CALCULATE(
COUNT('Table'[Student]),
-- only count Students where conditions are true (calculated column isValid = True)
'Table'[isValid] = TRUE()
)
The final result should then look like this:
The table on the left has 'Table'[isValid] = TRUE() applied as a visual-level filter.
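If you prefer to avoid the helper column, the same conditions can also be folded into a single measure. A sketch reusing the isValid logic above (the measure name is illustrative, not part of the original answer):
# Students (measure only) =
CALCULATE(
    COUNT('Table'[Student]),
    FILTER(
        'Table',
        -- same three conditions as the isValid calculated column
        (('Table'[Math] <= 6) + ('Table'[Portuguese] <= 6) + ('Table'[History] <= 6) + ('Table'[Science] <= 6)) >= 2
            || 'Table'[Portuguese] <= 6
            || 'Table'[Math] < 6
    )
)
Note that this only filters the KPI; the table visual would still need the calculated column (or an equivalent visual-level filter) to hide the excluded students.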
I have a data set with columns (ID, Calc.CompleteBool, where complete = 1 and incomplete = 0) of the form:
ID | Calc.CompleteBool
----------------------------
100| 1
101| 0
103| 1
105| 1
I need to create a measure that gives me a single percentage complete. Thus, the measure needs to count the total number of IDs (n), then divide the number of IDs that meet the 'complete' condition (value 1) by n.
E.g. 3 / 4 = 75%
I have tried the following and it does not work. It is returning a value of zero (0). Your assistance is greatly appreciated.
Here is my code:
Calc.pctComplete =
VAR total_aps =
CALCULATE(
COUNT('TABLE_NAME'[ID]),
FILTER(
ALL('TABLE_NAME'),
'TABLE_NAME'[Calc.CompleteBool] = 'TABLE_NAME'[Calc.CompleteBool]
)
)
VAR total_aps_complete =
CALCULATE(
COUNT('TABLE_NAME'[Calc.CompleteBool]),
FILTER(
ALL('TABLE_NAME'),
'TABLE_NAME'[Calc.CompleteBool] = 1
)
)
RETURN total_aps_complete/total_aps
Update
I also need to add another filter so that only rows where "CheckID" = Yes are returned.
There are 3,700 total IDs
There are ~ 1,500 IDs where CheckID = Yes
And roughly 8 where Calc.CompleteBool = 1
ID | Calc.CompleteBool | CheckID |
---------------------------------------
100| 1 | Yes
101| 0 | No
103| 1 | No
105| 1 | Yes
106| 0 | Yes
{100, 105, 106} is the set that would be included, so the division would be 2/3 ≈ 66% complete.
Your result can be calculated with a simple DAX formula like the one below. The concept of CALCULATE with filters can turn COUNT into something similar to Excel's COUNTIFS:
Completion =
CALCULATE(
    COUNT(Sheet1[ Calc.CompleteBool]),
    Sheet1[ Calc.CompleteBool] = 1, Sheet1[CheckID] = "Yes"
) /
CALCULATE(
    COUNT(Sheet1[ Calc.CompleteBool]),
    -- the denominator must also keep only CheckID = "Yes" rows;
    -- otherwise the sample data gives 2/5 instead of the expected 2/3
    Sheet1[CheckID] = "Yes"
)
Output:
You may use this measure (the + 0 on __completed makes it show 0% when all rows have 0 in Calc.CompleteBool; without it you get BLANK):
Percentage% =
var __completed = CALCULATE( COUNTROWS(VALUES(TABLE_NAME[ID])), 'TABLE_NAME'[Calc.CompleteBool] = 1) + 0
var __all = COUNTROWS('TABLE_NAME')
return
DIVIDE(__completed, __all)
Consider using DIVIDE instead of "/": https://dax.guide/divide/
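For reference, DIVIDE accepts an optional third argument that is returned when the denominator is 0 or BLANK, which is the main reason to prefer it over "/". A small illustration reusing the measure above (the alternate-result value of 0 is my choice, not part of the original answer):
Percentage% (safe) =
VAR __completed = CALCULATE( COUNTROWS( VALUES( TABLE_NAME[ID] ) ), 'TABLE_NAME'[Calc.CompleteBool] = 1 ) + 0
VAR __all = COUNTROWS( 'TABLE_NAME' )
RETURN
    -- the third argument (0 here) is returned instead of an error
    -- when the denominator is 0 or BLANK
    DIVIDE( __completed, __all, 0 )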
I have a cross-reference table and another table with the list of "Items".
I connect "PKG" to "Item", as "PKG" has distinct values.
Example:
**Cross table**
Bulk PKG
A    D
A    E
B    F
C    G

**Item table**
Item Value
A    2
B    1
C    4
D    5
E    8
F    3
G    1
After connecting the two tables above on PKG and Item, I get the following result:
Item Value Bulk PKG
A 2
B 1
C 4
D 5 A D
E 8 A E
F 3 B F
G 1 C G
As you can see, nothing shows up for the first three items, since the connection is by PKG and those are "Bulk" values.
I am trying to create a new column that uses the cross-reference table. I want to produce the following:
Item Value Bulk PKG NEW COLUMN
A 2 5
B 1 3
C 4 1
D 5 A D 5.75
E 8 A E 9.2
F 3 B F 3.45
G 1 C G 1.15
The new column is what I am trying to create.
I want the original values to show up for Bulk items just as they appear for PKG items. I then want the PKG items to be 15% higher than the original value.
How can I calculate this based on the setup?
Just write a conditional custom column in the query editor:
New Column = if [Bulk] = null then [Value] else 1.15 * [Value]
You can also do this as a DAX calculated column:
New Column = IF( ISBLANK( Table1[Bulk] ), Table1[Value], 1.15 * Table1[Value] )
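One caveat for the DAX version: if Bulk lives in the cross-reference table rather than in the same table as Value, the calculated column needs RELATED to follow the relationship. A sketch, assuming a one-to-one PKG-to-Item relationship and the placeholder table names 'Cross' and 'Item':
New Column =
VAR bulk = RELATED( 'Cross'[Bulk] )  -- 'Cross' is a placeholder name for the cross-reference table
RETURN
    IF( ISBLANK( bulk ), 'Item'[Value], 1.15 * 'Item'[Value] )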
I have a dataframe which looks like this:
wave mean median mad
0 4050.32 -0.016182 -0.011940 0.008885
1 4208.98 0.023707 0.007189 0.032585
2 4508.28 3.662293 0.001414 7.193139
3 4531.62 -15.459313 -0.001523 30.408377
4 4551.65 0.009028 0.007581 0.005247
5 4554.46 0.001861 0.010692 0.027969
6 6828.60 -10.604568 -0.000590 21.084799
7 6839.84 -0.003466 -0.001870 0.010169
8 6842.04 -32.751551 -0.002514 65.118329
9 6842.69 18.293519 -0.002158 36.385884
10 6843.66 0.006386 -0.002468 0.034995
11 6855.72 0.020803 0.000886 0.040529
As is clearly evident in the table above, some of the values in the mad and mean columns are very big (outliers), so I want to remove the rows which have these very big values.
For example, in row 3 the value of mad is 30.408377, which is very big, so I want to drop this row. I know that I can use the one-liner below to remove these values from the columns, but it doesn't remove the complete row:
df[np.abs(df.mad-df.mad.mean()) <= (3*df.mad.std())]
But I want to remove the complete row. How can I do that?
A boolean mask like the one you've given does remove entire rows. But none of your data is outside of 3 standard deviations, so nothing is dropped; if you tone it down to just one standard deviation, rows are removed from your example data.
Here's an example using your data:
import pandas as pd
import numpy as np
columns = ["wave", "mean", "median", "mad"]
data = [
[4050.32, -0.016182, -0.011940, 0.008885],
[4208.98, 0.023707, 0.007189, 0.032585],
[4508.28, 3.662293, 0.001414, 7.193139],
[4531.62, -15.459313, -0.001523, 30.408377],
[4551.65, 0.009028, 0.007581, 0.005247],
[4554.46, 0.001861, 0.010692, 0.027969],
[6828.60, -10.604568, -0.000590, 21.084799],
[6839.84, -0.003466, -0.001870, 0.010169],
[6842.04, -32.751551, -0.002514, 65.118329],
[6842.69, 18.293519, -0.002158, 36.385884],
[6843.66, 0.006386, -0.002468, 0.034995],
[6855.72, 0.020803, 0.000886, 0.040529],
]
df = pd.DataFrame(np.array(data), columns=columns)
print("ORIGINAL: ")
print(df)
print()
res = df[np.abs(df['mad']-df['mad'].mean()) <= (df['mad'].std())]
print("REMOVED: ")
print(res)
this outputs:
ORIGINAL:
wave mean median mad
0 4050.32 -0.016182 -0.011940 0.008885
1 4208.98 0.023707 0.007189 0.032585
2 4508.28 3.662293 0.001414 7.193139
3 4531.62 -15.459313 -0.001523 30.408377
4 4551.65 0.009028 0.007581 0.005247
5 4554.46 0.001861 0.010692 0.027969
6 6828.60 -10.604568 -0.000590 21.084799
7 6839.84 -0.003466 -0.001870 0.010169
8 6842.04 -32.751551 -0.002514 65.118329
9 6842.69 18.293519 -0.002158 36.385884
10 6843.66 0.006386 -0.002468 0.034995
11 6855.72 0.020803 0.000886 0.040529
REMOVED:
wave mean median mad
0 4050.32 -0.016182 -0.011940 0.008885
1 4208.98 0.023707 0.007189 0.032585
2 4508.28 3.662293 0.001414 7.193139
3 4531.62 -15.459313 -0.001523 30.408377
4 4551.65 0.009028 0.007581 0.005247
5 4554.46 0.001861 0.010692 0.027969
6 6828.60 -10.604568 -0.000590 21.084799
7 6839.84 -0.003466 -0.001870 0.010169
10 6843.66 0.006386 -0.002468 0.034995
11 6855.72 0.020803 0.000886 0.040529
Observe that rows indexed 8 and 9 are now gone.
Be sure you're reassigning the output of df[np.abs(df['mad']-df['mad'].mean()) <= (df['mad'].std())] as shown above. The operation is not done in place, so
df[np.abs(df['mad'] - df['mad'].mean()) <= (3 * df['mad'].std())]
on its own will not change the dataframe. Assign it back to df:
df = df[np.abs(df['mad'] - df['mad'].mean()) <= (3 * df['mad'].std())]
(Bracket notation df['mad'] is also safer than df.mad here: in pandas versions before 2.0, df.mad resolves to the DataFrame.mad() method rather than the column.)
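If you want to apply the same cutoff to several columns at once (say mean and mad together), one option is a combined boolean mask. A sketch, assuming the df built in the answer above; the column list and k are illustrative choices:
# Keep a row only if every listed column lies within k standard
# deviations of that column's own mean.
cols = ["mean", "mad"]
k = 3
mask = (np.abs(df[cols] - df[cols].mean()) <= k * df[cols].std()).all(axis=1)
df = df[mask]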
I want to aggregate (sum up) the following product list by groups (see below):
prods <- list("101.2000"=data.frame(1,2,3),
"102.2000"=data.frame(4,5,6),
"103.2000"=data.frame(7,8,9),
"104.2000"=data.frame(1,2,3),
"105.2000"=data.frame(4,5,6),
"106.2000"=data.frame(7,8,9),
"101.2001"=data.frame(1,2,3),
"102.2001"=data.frame(4,5,6),
"103.2001"=data.frame(7,8,9),
"104.2001"=data.frame(1,2,3),
"105.2001"=data.frame(4,5,6),
"106.2001"=data.frame(7,8,9))
test= list("100.2000"=data.frame(2,3,5),
"100.2001"=data.frame(4,5,6))
names <- c("A", "B", "C")
prods <- lapply(prods, function (x) {colnames(x) <- names; return(x)})
Each element of the product list (prods) has a name combination of the product number and the year (e.g. 101.2000 --> 101 = prod nr. and 2000 = year). And the groups only contain product numbers for the aggregation.
group1 <- c(101, 106)
group2 <- c(102, 104)
group3 <- c(105, 103)
My expected result shows the aggregated product groups by year:
$group1.2000
A B C
1 8 10 12
$group2.2000
A B C
1 5 7 9
$group3.2000
A B C
1 11 13 15
$group1.2001
A B C
1 8 10 12
$group2.2001
A B C
1 5 7 9
$group3.2001
A B C
1 11 13 15
So far, I tried this way: First I decomposed the names of prods into product numbers:
prodnames <- names(prods)
prodnames_sub <- gsub("\\..*.","", prodnames)
And then I tried to aggregate using lapply:
lapply(prods, function(x) aggregate( ... , FUN = sum))
However, I couldn't figure out how to bring the product numbers into the aggregation function. Ideas? Thanks.
Here are two approaches. No packages are used in either one.
1) Using lists. Create a two-column data.frame S from the groups whose columns are the products (values column) and associated groups (ind column). Create the list to split by, By. In the code that produces By, sub("\\..*", "", names(prods)) extracts the products, and match is then used to find the associated group; sub(".*\\.", "", names(prods)) extracts the year. Next, perform the split and lapply over it to run the summations. The two components of By (group and year) can be reversed to change the order of the output, if desired.
S <- stack(list(group1 = group1, group2 = group2, group3 = group3))
By <- list(group = S$ind[match(sub("\\..*", "", names(prods)), S$values)],
year = sub(".*\\.", "", names(prods)))
lapply(split(prods, By), function(x) colSums(do.call(rbind, x)))
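For the sample data this should give output along these lines (computed from the inputs above; the exact ordering of the list names is assumed from how split() combines the two By factors):
$group1.2000
 A  B  C 
 8 10 12 
$group2.2000
A B C 
5 7 9 
$group3.2000
 A  B  C 
11 13 15 
with identical entries for the .2001 components.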
2) Using data.frames. Convert the groups and prods each to a data frame, merge them, perform an aggregate and split back into a list. The output is the same as requested except for order. (Reverse the two right-hand variables in the aggregate formula to get the order shown in the question, but that will also reverse the two parts of each component name in the output list.)
S <- stack(list(group1 = group1, group2 = group2, group3 = group3))
DF0 <- do.call(rbind, prods)
DF <- cbind(do.call(rbind, strsplit(rownames(DF0), ".", fixed = TRUE)), DF0)
M <- merge(DF, S, all.x = TRUE, by = 1)
Ag <- aggregate(cbind(A, B, C) ~ ind + `2`, M, sum)
lapply(split(Ag, paste(Ag[[1]], Ag[[2]], sep = ".")), "[", 3:5)
giving:
$group1.2000
A B C
1 8 10 12
$group1.2001
A B C
4 8 10 12
$group2.2000
A B C
2 5 7 9
$group2.2001
A B C
5 5 7 9
$group3.2000
A B C
3 11 13 15
$group3.2001
A B C
6 11 13 15