How do you exclude negative values in an iterator function?

I am putting together a measure of the difference between two datetime values and have run into negative durations caused by bad data, which I need to omit from the calculation.
My DAX function is as follows:
Job_length =
SUMX (
    jobs,
    DATEDIFF (
        jobs[actualstart],
        jobs[actualend],
        MINUTE
    )
)
This returns the following output:
How can I change the formula to skip rows where the iterator expression returns a negative value?

The question is a bit lacklustre in the sense that it is perhaps the wrong question to ask! E.g. why is there data where actualend < actualstart to begin with? This is something you should fix in ETL prior to loading the data into Power BI.
Or better yet, ensure that this is not allowed within your source systems. Perhaps there is a bad setting somewhere in your stack that results in this behavior in the data.
However, to do exactly as you describe, you can apply a filter to the table you are iterating over within the SUMX function to remove rows that don't comply with the requirement. I have assumed here that only rows where actualend is strictly greater than actualstart should be evaluated:
Job_length =
SUMX (
    FILTER (
        jobs,
        jobs[actualstart] < jobs[actualend]
    ),
    DATEDIFF (
        jobs[actualstart],
        jobs[actualend],
        MINUTE
    )
)
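If the data cannot be fixed upstream right away, it can also help to surface how many bad rows the FILTER is skipping, so the data-quality problem stays visible. A minimal sketch of such a diagnostic measure (the name is just an example):
Negative Duration Rows =
COUNTROWS (
    FILTER (
        jobs,
        jobs[actualend] < jobs[actualstart]   -- reversed timestamps that produce negative durations
    )
)
Putting this next to Job_length on a card or table makes it easy to spot when the source system starts producing reversed timestamps again.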

Related

How can I calculate how many times a value is present excluding another value?

I have a column for which I need to count how many times the value "Good" is present, without counting the "Neutral" value.
This is the sample table:
Col1
Bad
Neutral
Good
Bad
Neutral
Okay
Here is what I have so far:
Measure = CALCULATE(Col1],Table[Col1] = "Good",ALLEXCEPT(Table,Table[Col1]="Neutral" ))
but I end up getting this error:
A single value for column 'Intune Owner Email' in table 'users' cannot be determined. This can happen when a measure formula refers to a column that contains many values without specifying an aggregation such as min, max, count, or sum to get a single result.
How can I work around this?
Why would you need to except Col1 = "Neutral" for this?
This measure counts the number of rows where Col1 = "Good":
# Good :=
CALCULATE (
    COUNT ( 'Table'[Col1] ),
    'Table'[Col1] = "Good"
)
If you are after different filtering semantics, you can also try invoking KEEPFILTERS:
# Good Variant =
CALCULATE (
    COUNT ( 'Table'[Col1] ),
    KEEPFILTERS ( 'Table'[Col1] = "Good" )
)
The difference between the semantics can be shown here:
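To make the difference concrete, here is a small sketch of a query (runnable in DAX Studio, for example) that evaluates both measures side by side against the sample column; the column header names are just examples:
EVALUATE
SUMMARIZECOLUMNS (
    'Table'[Col1],
    "Good (overwrite)", [# Good],
    "Good (keepfilters)", [# Good Variant]
)
With the sample data above, the plain CALCULATE version overwrites the filter on Col1, so every row (Bad, Neutral, Good, Okay) shows 1, while the KEEPFILTERS version intersects with the existing filter, so only the Good row shows 1 and the other rows stay blank.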

Why using ALLSELECTED() AND FILTER() together?

I am trying to understand the below DAX code:
CALCULATE(
SUM(Revenue[Net])
,FILTER('Center', NOT 'Center'[Acc] IN {"RSM", BLANK() })
,ALLSELECTED()
,VALUES('Customer'[Customer Number])
)
I have the below questions:
What's the use of ALLSELECTED?? By definition ALLSELECTED returns all rows in a table, ignoring any filters that might have been applied inside the query, but keeping filters that come from outside. https://dax.guide/allselected/
So, what's the point of writing FILTER() if it's going to be ignored by the next line (ALLSELECTED)?!
Also by definition:
CALCULATE is just an expression followed by filters...
What's the use of VALUES() ? It doesn't appear to be a filter, so how is it even allowed to appear there? (Per definition VALUES(): returns a one-column table that contains the distinct values from the specified column.)
I do not understand what this is returning: is it the SUM() or the VALUES()?
(I come from a SQL background, so any sql-friendly answer is appreciated).
In DAX, every filter is a table of values; it behaves similarly to an INNER JOIN.
ALLSELECTED is useful when you need to keep a row context (this is also a filter in DAX). You can also use ALLSELECTED inside the FILTER function.
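To connect this back to the expression in the question, here is the same CALCULATE with each argument annotated; this is my reading of it, with the table and column names taken from the question:
CALCULATE (
    SUM ( Revenue[Net] ),                                            -- the expression whose value is returned
    FILTER ( 'Center', NOT 'Center'[Acc] IN { "RSM", BLANK () } ),   -- table filter: Center rows whose Acc is neither RSM nor blank
    ALLSELECTED (),                                                  -- modifier: drops filters applied inside the visual, keeps outer selections (slicers)
    VALUES ( 'Customer'[Customer Number] )                           -- table filter: the customer numbers visible in the original context, re-applied
)
Roughly speaking, the result is the SUM ( Revenue[Net] ); the FILTER () and VALUES () arguments are tables that only shape the filter context, they are not returned. They are evaluated in the original context and applied after ALLSELECTED (), so ALLSELECTED does not ignore them; it only affects filters coming from the visual itself.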
To better understand what the engine does, you can use DAX Studio with Server Timings.
As you can see, this produces one simple statement:
SELECT
SUM ( 'Table'[Cost Centre] )
FROM 'Table'
WHERE
'Table'[Project] NIN ( 'AB' ) ;
You can find a useful video by Alberto Ferrari and Marco Russo here:
https://www.sqlbi.com/tv/auto-exist-on-clusters-or-numbers-unplugged-22/
Does it only convert DAX queries when you connect DAX Studio to SQL Server Analysis Services? It does not seem to work for PBI file data.

How do I dynamically grab data row-by-row and calculate a distinct count where parameters are met?

INTRO
I realize the title makes the problem sound simple, however, this task has proved incredibly difficult for me and it's taken up hours of my time every day for the past week. With that being said, any help is appreciated!
The first table involved is LoadView, which contains the fields LoadNumber, CarrierID, and BookedFrom. The second table is LoadBaseView, which contains the fields LoadNumber, CarrierID, and BookedOnDateTime. These two are related by LoadNumber.
The visualization I want to add to is the following, where the new row would be "New Carriers", listed right under "Carriers":
Lastly, as far as preliminary info goes, that table is just a matrix with LoadView[BookedFrom] as the only context (Autobook, etc.) and simple measures across LoadView along with it.
PROBLEM
Now that I've (hopefully) laid everything out clearly, let me explain exactly what I am looking for. I would like to count the number of new carriers that have booked in each BookedFrom category, i.e. I would like to count, for each category, the carriers booking there that have never booked before. This means that any carrier could potentially be counted in each BookedFrom column, just to clarify. I've tried many different measures to capture this and I've run into a whole host of problems, ranging from memory insufficiencies to exceeding available resources. The latter's DAX is the following:
IsFirstCarrierBookedFrom* =
Var current_booked_from = MIN(dsgLoadView[BookedFrom])
Var T1 = ADDCOLUMNS(ALL(dsgLoadView),"BookedOnDateTime",RELATED(dsgLoadBaseView[BookedOnDateTime]))
Var T2 = GROUPBY(T1,dsgLoadView[CarrierId],dsgLoadView[BookedOn],"MinBookedOnDateTime",MINX(CURRENTGROUP(),[BookedOnDateTime]))
Var T3 = NATURALINNERJOIN(T1,T2)
Var T4 = FILTER(T3,[BookedOnDateTime]=[MinBookedOnDateTime])
RETURN
CALCULATE(DISTINCTCOUNT([CarrierId]),FILTER(T4,[BookedFrom]=current_booked_from),USERELATIONSHIP(dsgLoadView[BookedOnDate],dsgCalendar[Date]))
The above results in this error:
This next attempt results in a memory insufficiency error:
TotalFirstCarrierBooks* =
Var current_row_carrier_id = MIN(dsgLoadBaseView[CarrierId])
Var current_row_booked_from = MIN(dsgLoadView[BookedFrom])
Var first_booked_from_date_time =
CALCULATE(
MIN(dsgLoadBaseView[BookedOnDateTime]),
FILTER(
ALL(dsgLoadBaseView),
dsgLoadBaseView[CarrierId]=current_row_carrier_id
),
FILTER(
All(dsgLoadView),
dsgLoadView[BookedFrom]=current_row_booked_from
)
)
Var is_first_date = IF(first_booked_from_date_time=MIN(dsgLoadBaseView[BookedOnDateTime]),1,0)
RETURN
SUMX(dsgLoadBaseView,is_first_date)
With that being said, if I take out the BookedFrom bits (current_row_booked_from, etc.), the measure works and, when shown alongside LoadNumber, it returns a 1 or 0, denoting whether the LoadNumber was or was not the first booking by the Carrier. I decided this wasn't the right path, though, due to that memory error. Also, summing up these 1's gets me duplicate bookings per Carrier per BookedFrom. In other words, a Carrier can book 2 loads via Manual at the same DateTime and, as those 2 rows would each get a 1 per the logic, that would add up to 2, which is a no-no.
THANK YOU
Seriously, to anyone who got this far! This problem has eaten up a ton of time for me, I've Googled relentlessly and I've watched countless YouTube videos. Thanks for your time!
I am unable to provide a solution that is guaranteed to work because I don't have the data, and it's impossible to solve questions like these without the actual data, but I do know DAX, so based on my experience here are my suggestions.
For measure IsFirstCarrierBookedFrom:
Substitute DISTINCTCOUNT with SUMX ( VALUES ( Table[CarrierId] ), 1 ) and see if that results in better performance (see the sketch after this list)
You are adding a column to dsgLoadView with RELATED; what is the cardinality of this table? Pay attention to these details and, based on that, supply a smaller table to ADDCOLUMNS and then use CALCULATE to compute BookedOnDateTime
Functions like NATURALINNERJOIN utilize the slower engine of DAX
You are probably applying a huge table T4 into the filter context with CALCULATE
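As a rough sketch of the first suggestion (I cannot test this without your data, the rest of the measure is kept as in your post, and I have qualified the column as dsgLoadView[CarrierId]; adjust to whichever CarrierId column you intend), the substitution in the RETURN would look roughly like this:
RETURN
    CALCULATE (
        SUMX ( VALUES ( dsgLoadView[CarrierId] ), 1 ),   -- counts the distinct carriers without DISTINCTCOUNT
        FILTER ( T4, [BookedFrom] = current_booked_from ),
        USERELATIONSHIP ( dsgLoadView[BookedOnDate], dsgCalendar[Date] )
    )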
For measure TotalFirstCarrierBooks:
You are probably iterating a huge table inside CALCULATE in the variable first_booked_from_date_time; try to change that to this:
VAR first_booked_from_date_time =
    CALCULATE (
        MIN ( dsgLoadBaseView[BookedOnDateTime] ),
        dsgLoadBaseView[CarrierId] = current_row_carrier_id,
        dsgLoadView[BookedFrom] = current_row_booked_from,
        REMOVEFILTERS ( dsgLoadBaseView ),
        REMOVEFILTERS ( dsgLoadView )
    )
The RETURN part isn't working the way you would expect it to. Variables in DAX are constants: the value of is_first_date is computed before RETURN and is then a fixed value, and nothing can change it. Let's assume it is 10; then inside SUMX ( dsgLoadBaseView, is_first_date ) you are summing 10 for each row of the table dsgLoadBaseView.
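If you want the per-row check to be re-evaluated for every row, the variables have to be declared inside the iterator. Here is a bare sketch of that structure, covering only the CarrierId part (how you bring BookedFrom into it depends on your model, and I cannot verify this without the data):
TotalFirstCarrierBooks sketch =
SUMX (
    dsgLoadBaseView,
    VAR current_carrier = dsgLoadBaseView[CarrierId]
    VAR current_datetime = dsgLoadBaseView[BookedOnDateTime]
    VAR first_datetime =
        CALCULATE (
            MIN ( dsgLoadBaseView[BookedOnDateTime] ),   -- earliest booking for this carrier
            REMOVEFILTERS ( dsgLoadBaseView ),
            dsgLoadBaseView[CarrierId] = current_carrier
        )
    RETURN
        IF ( current_datetime = first_datetime, 1, 0 )   -- re-evaluated per row, unlike a variable declared before RETURN
)
As you already noted, this still gives a 1 to two bookings made at exactly the same datetime, so the distinct-carrier handling still needs to be layered on top of it.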

Power BI - Getting the most recent value from a related table

I know this must be extremely simple, but every example I can find online only works within a single table. I've simplified my situation to these two tables:
I want to add a calculated column to the first table, showing the most recent value for that id. It also needs to work with text.
There are a variety of ways to do this kind of thing as I've explained before and all of the solutions there can be adjusted to work in this case.
Doing this as a calculated column and with a second table, you need to make sure you are using row context and filter context appropriately.
Here are a couple of different possibilities I think may work:
MostRecentValue =
MAXX ( TOPN ( 1, RELATEDTABLE ( Table2 ), Table2[date] ), Table2[value] )
In this one, RELATEDTABLE is doing the work of filtering Table2 to only the rows where id matches Table1.
MostRecentValue =
VAR PrevDate = CALCULATE ( MAX ( Table2[date] ) )
RETURN CALCULATE ( MAX ( Table2[value] ), Table2[date] = PrevDate )
The relationship is more subtle here. Wrapping the MAX in CALCULATE forces a context transition so that the row context (which includes id) is applied to Table2 as filter context.
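For completeness, here is a variation that spells the filtering out explicitly instead of relying on RELATEDTABLE or context transition; it is a sketch that assumes the two tables are related on an id column in each:
MostRecentValue =
VAR CurrentId = Table1[id]
RETURN
    MAXX (
        TOPN ( 1, FILTER ( Table2, Table2[id] = CurrentId ), Table2[date] ),   -- Table2 rows for this id, latest date first
        Table2[value]
    )
Functionally this is what RELATEDTABLE does in the first version; writing the FILTER out just makes the id match visible and avoids relying on context transition.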

Calculate value if outside time range

I have a problem where I need to figure out if a project has values outside its start and finish date range.
Below is a simple relationship between a dimension table containing the start and finish dates of the projects and a fact table containing time registrations.
The table below has a column 'Outside Date Range' which I'd like to hold a true/false value. For example, if the Main2 table contains the date Monday, May 13, 2018, the column should show false.
I tried something like
Outside Date Range = CALCULATE(SUM(Main2[Value]), FILTER(Main2, Main2[Time] < LOOKUPVALUE(Main[Start], Main[Project], ALL(Main2[Project]))))
But I'm not really sure how to approach the relationship between the two tables properly.
The two approaches I would suggest are either a calculated column or a measure.
Calculated column:
Outside Date Range =
VAR rowsOutsideRange =
    CALCULATE (
        COUNTROWS ( Main2 ),
        FILTER (
            RELATEDTABLE ( Main2 ),
            Main2[Time] < Main[Start]
                || Main2[Time] > Main[Finish]
        )
    )
RETURN
    IF ( rowsOutsideRange > 0, TRUE (), FALSE () )
You were pretty close in your solution! Since you have a relationship between the two tables, RELATEDTABLE will only return the related rows, which removes the need for a LOOKUPVALUE(). Also, counting the rows is sufficient, since we only want to know whether any rows exist outside of the range, not how many.
You could also create a measure:
Outside Date Range Measure :=
VAR rowsOutsideRange =
    CALCULATE (
        COUNTROWS ( Main2 ),
        FILTER (
            Main2,
            Main2[Time] < MIN ( Main[Start] )
                || Main2[Time] > MAX ( Main[Finish] )
        )
    )
RETURN
    IF ( rowsOutsideRange > 0, TRUE (), FALSE () )
This is pretty similar to the calculated column; the only difference is that we need to aggregate the start and finish dates. On its own this measure doesn't have any value, it needs to be sliced by a project to be correct. If you really wanted to, you could use a SUMX() type of construction to create an overall TRUE/FALSE statement that tells you whether any of the projects have rows outside their ranges, but for your use case I don't see the benefit of that. For completeness, such a construction could look like the sketch below.
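A minimal sketch of that SUMX() construction, assuming the measure above is named [Outside Date Range Measure] (the new measure name is just an example, and this is untested against your model):
Any Project Outside Range :=
SUMX (
    VALUES ( Main[Project] ),                       -- iterate the projects in the current context
    IF ( [Outside Date Range Measure], 1, 0 )       -- context transition evaluates the measure per project
) > 0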
The choice between a calculated column and a measure depends on the legibility of the code and on resource usage: a calculated column uses more memory and a measure uses more CPU.
Looking at your case I would go for a calculated column, which seems the most simple and clear solution.
Hope that helps!
Jan