Different measure in a drill up scenario - Power BI

This case is data sensitive, so I've created an example.
The goal is to aggregate two assets and change how the measure behaves when I do so.
Data: (1)
Measure:
Measure =
VAR DataFim = MAX ( 'NAVPS'[date] )
VAR DataInicio = MIN ( 'NAVPS'[date] )
RETURN
    CALCULATE (
        PRODUCTX ( 'NAVPS', 'NAVPS'[profitability + 1] ),
        FILTER ( 'NAVPS', 'NAVPS'[date] >= DataInicio && 'NAVPS'[date] <= DataFim )
    ) - 1
The result it shows: (2)
The result I want after drilling up: (3)
When I drill up, the PRODUCTX function just multiplies all the [profitability + 1] values from Asset A and Asset B, but the correct way would be (1 + [Asset A Measure]) * (1 + [Asset B Measure]).
Asset A and Asset B are the only ones I want to see aggregated; the other ones are just repeated in the [Classf] column.
The tricky part is that assets A and B can come in with different names, so I can't write an IF that checks [Asset] = "Asset A" or [Asset] = "Asset B".
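In other words, at the aggregated level I want to compound the per-asset results, something like the sketch below (assuming the [Asset] column lives in the 'NAVPS' table; the measure name [Measure] refers to the measure above):

Aggregated Measure =
PRODUCTX (
    VALUES ( 'NAVPS'[Asset] ),    -- iterate the distinct assets in the current context
    1 + [Measure]                 -- per-asset compounded return from the measure above
) - 1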
Do you guys see another way of showing this result?
It doesn't need to look exactly like image (3), as long as it shows what I want.
Best regards!

Related

How to choose the MIN of a calculated column (not in Power Query)

Working with basketball data, I'm trying to get the time on court for the players (several columns contain information about one or more players).
I tried to obtain the value with a calculated column named "TimeOnCourt". The code works for most cases, but there is one case where, due to a mistake by the data entry team, there are different values in the player columns for the same "TimeOnCourt", so when I try to visualize the information the data entry mistake shows up.
I guess I could use the column "Index" to add a piece of code to choose the MIN value for the "TimeOnCourt" column but, after trying some options, I don't know where to put it or if I have to change the full code.
I also tried with Test_Flag measures, but they don't work for all cases (they fixed 2 of the 4 cases).
Here is the link to the pbix file with the Test_Flag measures I tried: Link to pbix file v3
And the image with the mistake marked. The expected time in the right visualization should be 0:40:00 instead of 0:43:03 (it's due to the duplicate in Full Quarter = 2Q and Time_Def = 0:04:00). This could happen again even though I talked with the team, so the solution should be general, not one that filters out this specific case.

Importrange + Query + Matches + Regexp

I am trying to filter data from a different sheet by a specific account number. However, the formula below doesn't return any results:
=query(importrange(Setup!B1,"Sheet1!A2:F"),"Select * where Col4 matches '\d\d-\d\d\d\d-[14-7]\d\d\d'",1)
This is supposed to filter out all accounts where the 1st digit in the 3rd group of numbers is either 1, 4, 5, 6 or 7. The account numbers are all the same width, following the format xx-xxxx-xxxx.
Try using this as your match criteria:
\d{2}-\d{4}-[14-7]\d{3}
Also, while I can't see your data, make sure you actually have a header in the first row of your IMPORTRANGE results (which you've requested with the 1 at the end of the QUERY). If you don't actually have headers, the 1 will leave you with one more result than you want; if that is the case, just remove the ,1 from the end of the QUERY.
If this doesn't produce the results you want, it may be due to mixed data types in your raw data that are being filtered out by the QUERY. In that case, you can try using FILTER and REGEXMATCH instead:
=ArrayFormula(FILTER(IMPORTRANGE(Setup!B1,"Sheet1!A2:F"),REGEXMATCH(IMPORTRANGE(Setup!B1,"Sheet1!D2:D"),"\d{2}-\d{4}-[14-7]\d{3}")))
It is always hard to write complex formulas sight unseen. If none of these solutions (which work in my local sheet) produce the results you expect, I encourage you to share a link to both of your sheets. The raw data sheet being called by IMPORTRANGE can be "View Only"; but you'll want to set the Share permission on the second sheet (the one with the IMPORTRANGE formula itself) to "Anyone with the link..." and "Editor," so that those here can access it to test.

Checking if context is total

What might be an alternative, possibly more efficient, way of checking whether the current context is the total? I use this measure as a benchmark:
IsTotal1 = CALCULATE(COUNT(Tab[Store]), ALLSELECTED(Tab)) = COUNT(Tab[Store])
The idea is that it calculates COUNT on the table with filters removed (the left-hand side, so we get the count across all dimensions in context) and checks it against the COUNT in the current context. If both are the same, we are at the total.
I know that using the function HASONEVALUE might be tempting:
IsTotal2 = NOT(HASONEVALUE(Tab[Store]))
However, this approach has a serious drawback. If we make a table displaying sales by store and by product, the first measure will work and the second will fail. Moreover, if we display sales by product only, the first measure will still work, while the second would have to be retyped as HASONEVALUE(Tab[Product]).
So I want the measure to be resistant to any change of granularity caused by adding a new dimension to the table visual.
Based on the information you provided in the comments, it sounds like you have a page- or report-level filter. In that case, you can't rely on functions such as ISFILTERED(...) or ISCROSSFILTERED(...), because these external filters or slicers can affect the result those two functions return.
So you have to either stick with your approach (perhaps changing COUNT(...) to COUNTROWS(Tab) could improve the performance slightly), or write something like
ISINSCOPE('Tab'[Store]) || ISINSCOPE('Tab'[Product]) || etc...
where you repeat ISINSCOPE for every column that could potentially be used to slice the data, as ISINSCOPE is the only function that distinguishes using a column on a filter/slicer vs. using it as a row/column grouping on a table/matrix visual.
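Putting both options together, the measures might look like the sketch below (assuming Tab[Store] and Tab[Product] are the only columns used to group the visual):

IsTotal1 = CALCULATE ( COUNTROWS ( Tab ), ALLSELECTED ( Tab ) ) = COUNTROWS ( Tab )

IsTotal3 = NOT ( ISINSCOPE ( Tab[Store] ) || ISINSCOPE ( Tab[Product] ) )

IsTotal1 keeps your original logic with the slightly cheaper COUNTROWS, while IsTotal3 returns TRUE only when neither grouping column is in scope, i.e. on the grand total row.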

How can one create a single table from multiple datasets?

I'm trying to create a descriptive table by treatment group. For my analysis, I have 3 different partitions of the data (because I'm running 3 separate analyses) from a complete data set, but I only have one statistic from each subset that I am trying to describe, so I think it'd look better in one complete table. At the end, I'd like an output that can convert to LaTeX (as I'm using bookdown).
I've been using the compareGroups package to easily create each table individually. I know there is an rbind function that allows you to create a stacked table, but it won't let me combine them because the n of each separate data frame is different (due to missingness). For instance, I'm studying marriage in one of my analyses and divorce in another (a separate analysis), so the n's of these two data frames differ, but the definition of the treatment group is the same.
Ideally, I'd have two columns, one for the treatment group and one for the control group. There would be two rows, one that has age of first marriage, and the second row which would have length of that first marriage, and then the respective ns of the cells.
library(compareGroups)
library(magrittr)  # for the %>% pipe

d1 <- compareGroups(treat ~ time1mar,
                    data = nlsy.mar,
                    simplify = TRUE,
                    na.action = na.omit) %>%
  createTable(type = 1,
              show.p.overall = FALSE)

d2 <- compareGroups(treat ~ time1div,
                    data = nlsy.div,
                    simplify = TRUE,
                    na.action = na.omit) %>%
  createTable(type = 1,
              show.p.overall = FALSE)
d.tot <- rbind(`First Age at Marriage` = d1, `Length of First Marriage` = d2)
This is the error that I get:
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 6626, 5057
Any suggestions?
The problem might be that you're using na.omit, which deletes the cases/rows with NAs from both of your datasets. Probably a different number of cases gets removed from each data set. Differing numbers of rows should really only be a problem with cbind, though, so you might also try changing the na.action option.
I'm just guessing; as joshpk said, without sample data it's difficult to reproduce your problem.

DAX code for count and distinct count measures (or calculated columns)

I hope somebody can help me with some hints for the following analysis. Students may perform actions on courses (enrol, join, grant, ...) and also the reverse: cancel the latest action.
The first metric is to count all the actions that occurred in the system between two dates, which are exposed as a filter/slicer.
Some sample data :
person-id,person-name,course-name,event,event-rank,startDT,stopDT
11, John, CS101, enrol,1,2000-01-01,2000-03-31
11, John, CS101, grant,2,2000-04-01,2000-04-30
11, John, CS101, cancel,3,2000-04-01,2000-04-30
11, John, PHIL, enrol, 1, 2000-02-01,2000-03-31
11, John, PHIL, grant, 2, 2000-04-01,2000-04-30
The data set (ds) is shown above, and I have written the following code for the count metric:
EVALUATE
{
    SUMX (
        ADDCOLUMNS (
            ds,
            "z+", IF ( [event] <> "cancel", 1, 0 ),
            "z-", IF ( [event] = "cancel", -1, 0 )
        ),
        [z+] + [z-]
    )
}
The metric should display 3 subscriptions (John-CS101 = 1, John-PHIL = 2).
There are some other rules, but I don't know how to add them to the DAX code: the cancel date is the same as the date of the preceding (non-cancel) action, and the rank of the cancel action equals the rank of that non-cancel action + 1.
There is also a need to count the distinct student and course combinations (the composite key). How can I add this to the code (via SUMMARIZE, RANKX)?
Regards,
Q
This isn't technically an answer, but more of a recommendation.
It sounds like your challenge is that you have actions that may then be cancelled. There is specific logic that determines whether an action is cancelled or not (i.e. the cancellation has to be the immediate next row and the dates must match).
What I would recommend, which doesn't answer your specific question, is to adjust your data model rather than put the cancellation logic in DAX.
For example, if you could add a column to your data model that flags a row as subsequently cancelled, then all DAX has to do is check that flag to know whether an action is cancelled or not: a simple CALCULATE statement. You don't have to have lots of logic to determine whether the event was cancelled, and you entirely eliminate the need for SUMX, which can be slow when working with a lot of rows since it works row by row.
The logic for whether an action is cancelled or not moves to your source system (e.g. SQL or even a calculated column in Excel), or to your ETL (e.g. the Query Editor in Power BI) which are better equipped for such tasks. The logic is applied 1 time and then exists in your data model for all measures, instead of needing to apply the logic each time a measure is used.
I know this doesn't help you solve your logic question, but the reason I make this recommendation is that DAX is fundamentally a giant calculator. It adds things up. It's great at filters (adding some things up but not others), but it works best when everything is reduced to columns that it can sum or count. Once you go beyond that (e.g. wanting to look at the row below to adjust something about the current row), your DAX is going to get very complicated (and slow), whereas a source system or the Query Editor will likely be able to handle such requirements more easily.
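For illustration, once such a flag column exists the measures might be as simple as the sketch below (the column ds[IsCancelled] and the measure names are assumptions for this example, not something in your current model):

Active Actions :=
CALCULATE (
    COUNTROWS ( ds ),                 -- count rows, i.e. actions
    ds[IsCancelled] = FALSE ()        -- keep only actions that were never cancelled
)

Distinct Student-Course :=
CALCULATE (
    COUNTROWS (
        SUMMARIZE ( ds, ds[person-id], ds[course-name] )   -- distinct composite keys
    ),
    ds[IsCancelled] = FALSE ()
)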