DAX Calculated column to consider multiple rows to generate a result - powerbi

I have a dataset like this.
Reference_ID  MyCode
1             NULL
1             S1010
1             NULL
1             1011
2             NULL
2             NULL
I want to return True for 1, since Reference_ID 1 has at least one MyCode value that is not empty, blank, or NULL, and False for 2.
Reference_ID  MyCode  ExpectedOutput
1             NULL    True
1             S1010   True
1             NULL    True
1             1011    True
2             NULL    False
2             NULL    False
How can I do this using DAX in Power BI?

Try something like:
ExpectedOutPut =
SWITCH (
    TRUE (),
    Reference_ID = 1, TRUE (),
    Reference_ID = 2, FALSE (),
    BLANK ()
)

This works for me. I made use of the COUNTROWS function to do this:
Non_NULL_MyCodes =
VAR required_PolicyNumber = Sheet1[Reference_ID]
VAR numberOfRows =
    CALCULATE (
        COUNTROWS (
            FILTER ( ALL ( Sheet1 ),
                Sheet1[Reference_ID] = required_PolicyNumber && Sheet1[MyCode] <> "NULL" )
        )
    )
RETURN numberOfRows > 0
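For cross-checking the expected output outside Power BI, the same per-group logic can be sketched in pandas (a sketch only; the data and column names are taken from the question, and "NULL" is assumed to be a literal text value as in the DAX above):

```python
import pandas as pd

df = pd.DataFrame({
    "Reference_ID": [1, 1, 1, 1, 2, 2],
    "MyCode": ["NULL", "S1010", "NULL", "1011", "NULL", "NULL"],
})

# For each Reference_ID, True if any row in the group has a MyCode other than "NULL"
df["ExpectedOutput"] = (
    df["MyCode"].ne("NULL").groupby(df["Reference_ID"]).transform("any")
)
```

This mirrors the FILTER/COUNTROWS approach: the group-wise "any" plays the role of counting the non-NULL rows per Reference_ID.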

Related

How to mark column with 0 or 1 if data exists in another table in Power BI?

I have 2 tables
1 is a set of employees (Table 1)
1 is a set of terminations (Table 2)
They will both match on an Employee ID column. I want to add a new calculated column to Table 1 that returns 1 if the employee is in Table 2 and returns 0 otherwise. I can't figure out how to write this in DAX. I feel like this should be extremely simple.
I tried
Column =
VAR X = RELATED(Table1[Employee ID])
VAR RES = IF(ISBLANK(X), "no data", X)
RETURN
RES
This just returns "#ERROR" in all values.
Make sure your IF statement returns the same data type for both the true and the false branch:
use either
VAR RES = IF(ISBLANK(X), "no data", FORMAT(X, "#"))
or
VAR RES = IF(ISBLANK(X), 0, X)
And referring to the title of your question, you should actually use
VAR RES = IF(ISBLANK(X), 0, 1)
(with X looking up the key in the terminations table, e.g. VAR X = RELATED(Table2[Employee ID]), assuming a relationship between the two tables).
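The same "flag 1 if the key exists in the other table" idea can be sketched in pandas for comparison (table contents and column names here are illustrative assumptions, not from the question):

```python
import pandas as pd

employees = pd.DataFrame({"Employee ID": [101, 102, 103]})
terminations = pd.DataFrame({"Employee ID": [102]})

# 1 if the Employee ID appears in the terminations table, 0 otherwise
employees["Terminated"] = (
    employees["Employee ID"].isin(terminations["Employee ID"]).astype(int)
)
```

Note that astype(int) plays the same role as returning a consistent data type in both IF branches.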

How to create a measure that can mark two columns conditionally?

I have a table that is something like this:
ID    A     B
1H4
6S8   True
1L7         True
6T8   True
7Y8
6S2   True  True
1H1   True
6S3         True
1H9   True  True
6S0
I want to create a measure that evaluates the table so I can later apply conditional formatting in the report (i.e. place color values in such cells) based on the following 2 conditions:
when there are values in both column A and column B
when there are blanks/nulls in both columns
(If both can be done in a single measure, that would be ideal.)
You can use a measure like this:
Background Color =
VAR Count_A = COUNTBLANK ( 'Table'[A] )
VAR Count_B = COUNTBLANK ( 'Table'[B] )
RETURN
    SWITCH (
        TRUE (),
        AND ( Count_A = 0, Count_B = 0 ), "Red",
        AND ( Count_A > 0, Count_B > 0 ), "Green",
        ""
    )
First count the blank values in each of the columns, and then return a different color depending on both counts. Then use this measure to conditionally format the background color of each of the columns.
You'll need a custom column with the logic of
Column name =
SWITCH (
    TRUE (),
    A = "True" && B = "True", "True",
    A = "" && B = "", "False",
    "Else goes here"
)
You'll have to change the logic depending on whether the cells without anything in them contain "" or true blanks. SWITCH acts like a multiple IF statement, and SWITCH with TRUE() evaluates each condition in order, returning the first result whose condition is met.
You can achieve the desired result with either a custom column or a measure.
Custom Column
Column =
IF (
    Table[A] <> BLANK () && Table[B] <> BLANK (), "Green",
    IF ( Table[A] = BLANK () && Table[B] = BLANK (), "Red" )
)
Measure
Measure X =
IF (
    COUNTBLANK ( Table[A] ) = 0 && COUNTBLANK ( Table[B] ) = 0, "#00FF00",
    IF ( COUNTBLANK ( Table[A] ) <> 0 && COUNTBLANK ( Table[B] ) <> 0, "#FF0000" )
)
After creating the measure or custom column, go to conditional formatting, select background colour, and choose either the measure or the column. This will give you the desired result.
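As a quick sanity check of the two conditions, the cell-level logic can be sketched in plain Python (a sketch only; None stands in for a blank cell, and the color names follow the column version above):

```python
def cell_color(a, b):
    # Both cells filled -> "Green"; both blank -> "Red"; mixed -> no color
    if a is not None and b is not None:
        return "Green"
    if a is None and b is None:
        return "Red"
    return ""
```

The mixed case falls through to "", matching the measure, which returns blank when exactly one column has a value.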

Skip a record if empty

I've created a function that cleans my data of extra columns with null values. There should always be 15 columns after this; however, occasionally there are more or fewer, and when that happens those tables should just be removed.
I've tried skipping all those rows and returning an empty table, but when I try to expand those tables I get the error "Cannot convert the value false to type Number."
(tbl as table) =>
let
    ColumnNames = Table.ColumnNames(tbl),
    RemoveNullColumns = Table.SelectColumns(tbl, List.Select(ColumnNames, each List.MatchesAny(Table.Column(tbl, _), each _ <> null))),
    CheckColumns = Table.Skip(RemoveNullColumns, Table.ColumnCount(RemoveNullColumns) <> 15)
in
    CheckColumns
See if this works for you. It removes any column containing a null and returns tbl only if there are 15 remaining columns:
(tbl as table) =>
let
    ColumnNames = Table.ColumnNames(tbl),
    ReplacedValue = Table.ReplaceValue(tbl, null, "imanull", Replacer.ReplaceValue, ColumnNames),
    UnpivotedColumns = Table.UnpivotOtherColumns(ReplacedValue, {}, "Attribute", "Value"),
    FilteredRows = Table.SelectRows(UnpivotedColumns, each [Value] = "imanull"),
    NonNullColumns = List.Difference(ColumnNames, List.Distinct(FilteredRows[Attribute])),
    Results = if List.Count(NonNullColumns) <> 15 then null else Table.SelectColumns(tbl, NonNullColumns)
in
    Results
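For comparison, the same idea can be sketched in pandas (a sketch only; the function name and sample data are assumptions): drop every column that contains a null, then keep the table only if the expected column count survives.

```python
import pandas as pd

def clean_table(df, expected_cols=15):
    # Drop every column containing at least one null, like the M answer above
    no_null = df.dropna(axis=1, how="any")
    # Return the cleaned table only if exactly expected_cols columns remain
    return no_null if no_null.shape[1] == expected_cols else None

df = pd.DataFrame({"a": [1, 2], "b": [None, None], "c": [3, None]})
cleaned = clean_table(df, expected_cols=1)  # only column "a" is null-free
```

Returning None here plays the role of the M function returning null, which the caller can then filter out instead of expanding.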

How should I check more than 10 columns for NaN values and select the rows having NaN values, i.e. keepna() instead of dropna()?

Output = df[df['TELF1'].isnull() | df['STCEG'].isnull() | df['STCE1'].isnull()]
This is my code. Here I select a row only if one of the columns contains a NaN value. But I have over 10 columns to check, which makes my code huge. Is there a shorter, more pythonic way to do it?
df.dropna(subset=['STRAS','ORT01','LAND1','PSTLZ','STCD1','STCD2','STCEG','TELF1','BANKS','BANKL','BANKN','E-MailAddress'])
Is there any way to get the opposite of the above command? It would give me the same output I was trying for above, but that approach was getting very long.
Using loc with a simple boolean filter should work:
df = pd.DataFrame(np.random.random((5,4)), columns=list('ABCD'))
subset = ['C', 'D']
df.at[0, 'C'] = None
df.at[4, 'D'] = None
>>> df
A B C D
0 0.985707 0.806581 NaN 0.373860
1 0.232316 0.321614 0.606824 0.439349
2 0.956236 0.169002 0.989045 0.118812
3 0.329509 0.644687 0.034827 0.637731
4 0.980271 0.001098 0.918052 NaN
>>> df.loc[df[subset].isnull().any(axis=1), :]
A B C D
0 0.985707 0.806581 NaN 0.37386
4 0.980271 0.001098 0.918052 NaN
df[subset].isnull() returns a boolean DataFrame indicating whether each value in the subset columns is NaN.
>>> df[subset].isnull()
C D
0 True False
1 False False
2 False False
3 False False
4 False True
.any(axis=1) will return True if any value in the row is True (because axis=1; with axis=0 it would check each column instead).
>>> df[subset].isnull().any(axis=1)
0 True
1 False
2 False
3 False
4 True
dtype: bool
Finally, use loc (rows, columns) to locate rows that satisfy a boolean condition. The : symbol means to select everything, so it selects all columns for rows 0 and 4.
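Wrapped up as the keepna() the question title asks for (a hypothetical helper name, not a pandas function):

```python
import pandas as pd

def keepna(df, subset):
    # Inverse of dropna: keep only rows with a NaN in any of the subset columns
    return df.loc[df[subset].isnull().any(axis=1)]

df = pd.DataFrame({"A": [1.0, None, 3.0], "B": [None, 5.0, 6.0]})
with_nans = keepna(df, ["A", "B"])  # rows 0 and 1 have a NaN somewhere
```

The long chain of | conditions from the question collapses into one any(axis=1) over the subset list.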

Geo Pandas Data Frame / Matrix - filter/drop NaN / False values

I applied the GeoSeries.almost_equals(other, decimal=6) function to a geo data frame with 10 million entries, in order to find multiple geo points close to each other. This gave me a matrix; now I need to filter all the True values in order to create a DF/list with only the POIs that are geo-related. I struggle to figure out how to proceed with filtering this matrix.
The expected output is a vector, a list, or ideally a DF with all the True (matched) values, matched to each other 1-to-1 and with repeats removed (if [1, 9] is present, then [9, 1] should be removed from the output).
Consider this example dataframe:
In [1]: df = pd.DataFrame([[True, False, False, True],
...: [False, True, True, False],
...: [False, True, True, False],
...: [True, False, False, True]])
In [2]: df
Out[2]:
0 1 2 3
0 True False False True
1 False True True False
2 False True True False
3 True False False True
A possible solution to get to the dataframe of matching indexes:
First I use np.triu to only consider the upper triangle (so you don't have duplicates):
In [15]: df2 = pd.DataFrame(np.triu(df))
In [16]: df2
Out[16]:
0 1 2 3
0 True False False True
1 False True True False
2 False False True False
3 False False False True
Then I stack the dataframe, give the index levels the desired names, and select only the rows where we have 'True' values:
In [17]: result = df2.stack()
In [18]: result
Out[18]:
0 0 True
1 False
2 False
3 True
1 0 False
1 True
2 True
3 False
2 0 False
1 False
2 True
3 False
3 0 False
1 False
2 False
3 True
dtype: bool
In [21]: result.index.names = ['POI_id', 'matched_POI_ids']
In [23]: result[result].reset_index()
Out[23]:
POI_id matched_POI_ids 0
0 0 0 True
1 0 3 True
2 1 1 True
3 1 2 True
4 2 2 True
5 3 3 True
You can then of course delete the column with trues: .drop(0, axis=1)
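Putting the steps above together as one function (the function name is an assumption; the logic follows the triu/stack approach shown):

```python
import numpy as np
import pandas as pd

def matched_pairs(df):
    # Keep only the upper triangle so (i, j) and (j, i) aren't both reported
    upper = pd.DataFrame(np.triu(df), index=df.index, columns=df.columns)
    stacked = upper.stack()
    stacked.index.names = ["POI_id", "matched_POI_ids"]
    # Keep only the True entries and drop the leftover boolean column
    return stacked[stacked].reset_index().drop(0, axis=1)

df = pd.DataFrame([[True, False, False, True],
                   [False, True, True, False],
                   [False, True, True, False],
                   [True, False, False, True]])
pairs = matched_pairs(df)
```

On the example matrix this yields the same six (POI_id, matched_POI_ids) pairs as the step-by-step version, without the duplicate mirrored matches.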