Using DAX to identify first instance of a record
I'm trying to identify the first time that someone (identified by the ID column) purchased a given product. The same person may purchase the product multiple times on different days, or purchase different products on the same day. I came up with an Excel formula that gets me there, but I'm having trouble translating it into DAX.
=COUNTIFS(ID,ID,PurchaseDate,"<="&PurchaseDate,Product,Product)
This produces the correct values in the "First Instance?" column.
Ideally I won't have to hardcode values, as I would like to use the "Product" column as a parameter in the future. If there are other suggestions aside from translating this into DAX, those would also be appreciated (i.e. using filters or other tools in Power BI).
Thanks in advance!
This is very similar to an answer I gave to another question (which you can find here).
In that question, the request was to see a running count of rows for the given row's criteria (product, year, etc.). We can modify that slightly to get it to work in your problem.
This is the formula I provided in the answer I linked above. The basic concept is to use the EARLIER function to get the value from the current row and pass it into the FILTER condition.
Running Count =
COUNTROWS(
FILTER(
'Data',
[ProductName] = EARLIER([ProductName]) &&
[Customer] = EARLIER([Customer]) &&
[Seller] = EARLIER([Seller]) &&
[Year] <= EARLIER([Year])
)
)
What I would suggest for your problem is to create this as a TRUE/FALSE flag by simply checking if the running count is 1. This formula will evaluate to a Boolean flag.
First Instance =
COUNTROWS(
FILTER(
'Data',
[ID] = EARLIER([ID]) &&
[Product] = EARLIER([Product]) &&
[Purchase Date] <= EARLIER([Purchase Date])
)
) = 1
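To sanity-check the logic outside Power BI, here is a minimal Python sketch of the same count-and-compare rule. The sample rows and the function name first_instance_flags are made up for illustration; they are not from the question's data.

```python
from datetime import date

# Hypothetical sample rows: (ID, Product, Purchase Date)
rows = [
    (1, "A", date(2019, 1, 5)),
    (1, "A", date(2019, 2, 1)),
    (1, "B", date(2019, 1, 5)),
    (2, "A", date(2019, 3, 9)),
]


def first_instance_flags(rows):
    """Flag a row True when it is the only row for its ID and product on or before its date."""
    flags = []
    for cur_id, cur_product, cur_date in rows:
        # Same idea as COUNTROWS(FILTER(...)) with EARLIER: count the matching rows
        # whose date is on or before the current row's date.
        running_count = sum(
            1
            for other_id, other_product, other_date in rows
            if other_id == cur_id
            and other_product == cur_product
            and other_date <= cur_date
        )
        flags.append(running_count == 1)
    return flags


flags = first_instance_flags(rows)
```

One consequence of the <= comparison: if the same ID buys the same product twice on the same date, both rows get a count of 2, so neither is flagged. If that can happen in your data, you would need a tie-breaker such as a time or row-index column.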
Using the table attached as an example 'Financial_Tab'.
I want to sum [Budget] based on whether "1.2.4" is within [CBS Code], and where "NOP1" or "NOP2" or "NOP3" is within [CBS Name]. Returning £50 in the example table.
I'm struggling to find a succinct and functional way of searching for NOP1/NOP2/NOP3 in one go. I'm getting blank results. I think IN {} could be used, but I can't get it to work with CONTAINSSTRING.
I really appreciate any help.
Thanks
This is what you're looking for (replace 'Table' with the name of your own table, e.g. 'Financial_Tab'):
Sum Budget =
CALCULATE(
SUM('Table'[Budget]),
CONTAINSSTRING('Table'[CBS Code], "1.2.4"),
CONTAINSSTRING('Table'[CBS Name], "NOP1")
|| CONTAINSSTRING('Table'[CBS Name], "NOP2")
|| CONTAINSSTRING('Table'[CBS Name], "NOP3")
)
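The filter logic can be checked with a small Python sketch. The rows below are hypothetical (the question's table was an attachment), and sum_budget is an illustrative name; the matching is lower-cased on both sides because CONTAINSSTRING is case-insensitive.

```python
# Hypothetical rows standing in for the attached table.
rows = [
    {"CBS Code": "1.2.4.1", "CBS Name": "Works NOP1", "Budget": 20.0},
    {"CBS Code": "1.2.4.2", "CBS Name": "Works NOP3", "Budget": 30.0},
    {"CBS Code": "1.2.4.3", "CBS Name": "Works OTHER", "Budget": 40.0},
    {"CBS Code": "1.3.1", "CBS Name": "Works NOP2", "Budget": 50.0},
]

NAME_TOKENS = ("NOP1", "NOP2", "NOP3")


def sum_budget(rows, code_fragment="1.2.4", name_tokens=NAME_TOKENS):
    """Sum Budget where the code contains the fragment AND the name contains any token."""
    total = 0.0
    for row in rows:
        code_matches = code_fragment.lower() in row["CBS Code"].lower()
        # any(...) plays the role of the chained || CONTAINSSTRING(...) tests.
        name_matches = any(
            token.lower() in row["CBS Name"].lower() for token in name_tokens
        )
        if code_matches and name_matches:
            total += row["Budget"]
    return total


total = sum_budget(rows)
```

Only the first two rows satisfy both conditions: the third has no NOP token in its name, and the fourth has the wrong code.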
I replicated an issue I am having with the 'Adventure Works DW 2020' pbix file, so if my analysis seems a little out of context, please understand this example is not the true data I am working with. The pbix I used can be downloaded here:
https://drive.google.com/file/d/1vn6CluiE5rrAF3UjYPh5ejb93H2JX6IX/view?usp=sharing
My goal is to create a measure that can flag the subset of records that I want to use for a matrix visual.
I created the following measure with notes in the syntax:
VAR TABLEVAR =
SELECTCOLUMNS(
FILTER(
SUMMARIZE(
CALCULATETABLE(Sales/*Apply several filters to Sales table*/
,NOT Sales[CustomerKey] = -1
,Sales[orderdatekey] > 20180731
,Sales[orderdatekey] < 20190601
)
,[CustomerKey]/*Count the number of products per customer*/
,"Count",COUNT(Sales[ProductKey])
)
,[Count] > 1/*Only keep customers that bought more than 1 product*/
)
,[CustomerKey] /*Select the identifiers of the desired customers*/
)
RETURN
{
SWITCH(TRUE()
,SELECTEDVALUE(Sales[CustomerKey]) IN TABLEVAR/*Flag the customers that were identified in the previous table*/
,1,BLANK()
)
}
Now, in the PowerBI Matrix visual, this seems to work at first:
I had successfully flagged the desired output. Now I just have to filter for the 'Analysis' measure to be 'Not Blank', but then this happens:
Now removing that filter and going down a level:
So you see, the measure does not evaluate at the record level of the table. Does anyone understand the concept I am missing here? I have tried all kinds of different measures but it all comes down to the same problem about flagging different levels of analysis.
Ideally, the output would only include the following(circled in green):
These are the records that are within the date filters I put into the CALCULATETABLE() arguments.
Any help or insight with this problem would be greatly appreciated. Thank you
I'm not 100% clear what you're trying to do but please try the following and see if it helps.
Analysis =
VAR TABLEVAR =
SELECTCOLUMNS(
FILTER(
SUMMARIZE(
CALCULATETABLE(Sales
,NOT Sales[CustomerKey] = -1
,Sales[orderdatekey] > 20180731
,Sales[orderdatekey] < 20190601,
REMOVEFILTERS()
)
,[CustomerKey]
,"Count",COUNT(Sales[ProductKey])
)
,[Count] > 1
)
,[CustomerKey]
)
RETURN
//CONCATENATEX(TABLEVAR, [CustomerKey], ",")
SWITCH(TRUE()
,SELECTEDVALUE(Sales[CustomerKey]) IN TABLEVAR
,1,BLANK()
)
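The only change from the measure in the question is the REMOVEFILTERS() argument, which makes CALCULATETABLE build the customer list without the filters coming from the visual, while the explicit date and CustomerKey filters still apply. As a rough Python sketch of that idea (hypothetical rows, illustrative function names), the flagged set is computed over the whole table and only then compared with the current customer:

```python
from collections import Counter

# Hypothetical Sales rows; only the three columns used by the measure are modelled.
sales = [
    {"CustomerKey": 11, "ProductKey": 1, "OrderDateKey": 20180815},
    {"CustomerKey": 11, "ProductKey": 2, "OrderDateKey": 20190110},
    {"CustomerKey": 12, "ProductKey": 1, "OrderDateKey": 20180901},
    {"CustomerKey": 12, "ProductKey": 3, "OrderDateKey": 20190715},
    {"CustomerKey": -1, "ProductKey": 1, "OrderDateKey": 20181001},
    {"CustomerKey": -1, "ProductKey": 2, "OrderDateKey": 20181002},
]


def flagged_customers(sales):
    """Customers with more than one sales row inside the date window, over the whole table."""
    counts = Counter(
        row["CustomerKey"]
        for row in sales
        if row["CustomerKey"] != -1 and 20180731 < row["OrderDateKey"] < 20190601
    )
    return {customer for customer, n in counts.items() if n > 1}


def analysis(customer_key, sales):
    """Return 1 for a flagged customer, None (standing in for BLANK) otherwise."""
    return 1 if customer_key in flagged_customers(sales) else None


flagged = flagged_customers(sales)
```

Note that the count here, like COUNT(Sales[ProductKey]), counts rows rather than distinct products, so a customer who bought the same product twice in the window is also flagged.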
Basically, I'd like to get one entity's totals, but calculated for another (related) entity. The relationship between these entities is many-to-many.
Just to be less abstract, let’s take Trips and Shipments as mentioned entities and Shipments’ weight as a total to be calculated.
Calculating weight totals per trip is a pretty easy task. Here is a table of Shipments weights:
We place them into some amounts of trucks/trips and get following weight totals per trip:
But when I try to show SUM of Trip weight totals (figures from 2nd table) per each related Shipment (Column from 1st table), it becomes much harder than I expect.
It should look like:
And I can’t get such table within Power BI.
Data model for your reference:
The SUMMARIZE function seems almost right, but it doesn't let me use a column from a table other than the one passed to the function:
Additional restrictions:
Selections made by user (clicks on cells etc.) should not affect calculation anyhow.
The figures should be able to be used in further calculations, using them as a basis.
Can anyone advise a solution? Or at least proper DAX references to consider? I thought I could find a quick answer in DAX reference guide on my own. However I failed to find a quick answer.
Version 1
Try the following DAX function as a calculated column inside of your shipments table:
TripWeight =
VAR tripID =
RELATED ( Trips[TripID] )
RETURN
CALCULATE (
SUM ( Shipments[ShipmentTaxWeightKG] );
FILTER ( Shipments; RELATED ( InkTable[TripID] ) = tripID )
)
The variable tripID stores the TripID of the current row; CALCULATE then returns the sum of the weights of all shipments that belong to that trip.
Version 2
You can also create a calculated table using the following DAX and create a relationship between the newly created table and your Trips table and simply display the weight in this table:
TripWeight =
GROUPBY (
Shipments;
Trips[TripID];
"Total Weight KG"; SUMX ( CURRENTGROUP (); Shipments[ShipmentTaxWeightKG] )
)
Version 3
Versions 1 and 2 only work if the relationship between lnkTrip and Shipment is one-to-one. If it is many-to-one, the following calculated column can be created inside the Trips table:
ShipmentTaxWeightKG by Trip = SUMX(RELATEDTABLE(Shipments); Shipments[ShipmentTaxWeightKG])
Hope this helps.
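For checking whichever version you use, here is a minimal Python sketch of the result the question asks for: first total the weight per trip, then, for each shipment, add up the totals of every trip it is linked to. The shipment weights, the link pairs, and the function names are all hypothetical; the link list stands in for the many-to-many bridge table and is assumed to contain no duplicate pairs.

```python
from collections import defaultdict

# Hypothetical data: weight per shipment and (trip, shipment) link pairs.
shipment_weight = {"S1": 100, "S2": 200, "S3": 50}
links = [("T1", "S1"), ("T1", "S2"), ("T2", "S2"), ("T2", "S3")]


def trip_totals(shipment_weight, links):
    """Total shipment weight carried on each trip."""
    totals = defaultdict(int)
    for trip, shipment in links:
        totals[trip] += shipment_weight[shipment]
    return dict(totals)


def trip_weight_per_shipment(shipment_weight, links):
    """For each shipment, the sum of the totals of all trips it is linked to."""
    per_trip = trip_totals(shipment_weight, links)
    result = {shipment: 0 for shipment in shipment_weight}
    for trip, shipment in links:
        result[shipment] += per_trip[trip]
    return result


per_trip = trip_totals(shipment_weight, links)
per_shipment = trip_weight_per_shipment(shipment_weight, links)
```

Here shipment S2 travels on both trips, so it picks up both trip totals; comparing a handful of such hand-computed values against the Power BI output is a quick way to tell whether the relationship direction in the model is doing what you expect.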
I have a table which looks something like this. I am trying to find a way to detect a status change per account, e.g. if the status is Written off in the current month but was Active last month, the tag should be Newly written Off. Is this feasible in Power BI? I found PREVIOUSMONTH, but it deals only with measures, not categorical values like the ones I have.
This seems like a trivial problem, but I found that it's not easily solvable in Power BI.
Before showing how I solve this, I would recommend solving it before loading the data into Power BI (i.e. in the data source).
If that is not possible, here's what you should do:
(Recommended) Create a "T-1 date" column, which holds the previous date for comparison (for you, the previous month or date):
T-1 date =
VAR __acc_id = TABLE[account_id]
VAR __date = TABLE[date]
RETURN
CALCULATE(MAX(TABLE[date]),FILTER(ALL(TABLE), TABLE[account_id] = __acc_id && TABLE[date] < __date))
The FILTER part of the calculation returns the part of the table that has the same account_id as the current row and a smaller date than the current row. After that, we simply select the max date, which should be the previous one.
This way, we find the biggest previous date for each account.
If you know what the previous date is, feel free to skip this step.
Prior to the creation of the status column itself, I would create a calculated column, which contains the previous status. The column should be calculated like this:
t-1 status =
VAR __acc_id = TABLE[account_id]
VAR tdate = TABLE[T-1 date] // created in the previous step, or your own previous-date logic (e.g. DATEADD or PREVIOUSMONTH)
RETURN
CALCULATE(FIRSTNONBLANK(TABLE[status], 1), FILTER(ALL(TABLE), TABLE[account_id] = __acc_id && TABLE[date] = tdate))
With the previous status column in place, each row has both its current and its previous status, which should be enough to label the rows correctly with a simple IF statement.
The calculated column formula might look something like this:
status_label = IF(TABLE[status] == "Written off" && TABLE[t-1 status] == "active", "newly written off", "something else")
If a simple IF isn't enough, have a look at the SWITCH function.
This sequence of steps should solve your issue, but I have to admit, it's not performance efficient. It would be better to solve it in Power Query, but sadly, I do not know how. Any solution using Power Query would be highly appreciated.
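The three steps (previous date, previous status, label) can be sketched in Python to check the logic on a small sample. The rows and the function name label_status_changes are hypothetical; dates are ISO strings so that plain string comparison orders them correctly.

```python
# Hypothetical rows: one status per account per month-end date.
rows = [
    {"account_id": 1, "date": "2020-01-31", "status": "Active"},
    {"account_id": 1, "date": "2020-02-29", "status": "Written off"},
    {"account_id": 1, "date": "2020-03-31", "status": "Written off"},
    {"account_id": 2, "date": "2020-02-29", "status": "Active"},
]


def label_status_changes(rows):
    """Attach the previous status and a change label to every row."""
    labelled = []
    for row in rows:
        # Step 1: rows of the same account with a strictly smaller date.
        earlier = [
            r
            for r in rows
            if r["account_id"] == row["account_id"] and r["date"] < row["date"]
        ]
        # Step 2: the status on the latest of those dates (None if there is none).
        previous = max(earlier, key=lambda r: r["date"]) if earlier else None
        previous_status = previous["status"] if previous else None
        # Step 3: compare current and previous status.
        if row["status"] == "Written off" and previous_status == "Active":
            label = "newly written off"
        else:
            label = "something else"
        labelled.append(
            {**row, "t-1 status": previous_status, "status_label": label}
        )
    return labelled


labels = [r["status_label"] for r in label_status_changes(rows)]
```

Only the February row of account 1 is labelled as newly written off; the March row is still written off but its previous status is no longer Active, and the first row of each account has no previous status at all.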
I have the following tables & relationships in our pbix report:
For some obvious reasons, I need to have a relationship (non-active) between Dates[date] and Table2[T2Date]. However, doing so causes data fluctuation to measure 'Total Amount' in Table1.
Here are some screenshots:
Before Relationship (Dates[date] - Table2[T2Date]):
After Relationship (Dates[date] - Table2[T2Date]):
I need to understand why this difference is coming up and how the relationship is causing it since the measure uses a different relationship.
For reference, I am attaching the pbix report.
https://drive.google.com/open?id=1XknisXvElS6uQN224bEcZ_biX7m-4el4
Any help would be appreciated :)
The link that @MikeHoney gives has really useful information on the subtleties of relationships and does relate to this problem (do watch it!), but this issue is not ultimately related to bidirectional filtering in particular. In fact, I can reproduce it with this simplified relationship structure:
The key thing to note here is that Table2 contains T2Date values that don't match any Dates[date]. When you attach Table2 to Dates, this creates an extra row in Dates with a blank date, which you can see in your filter on 6. Year whenever that relationship exists (active or inactive). Filtering out that blank in the 6. Year filter would work, except that your measure uses ALL(Dates), which strips all filtering done on that table.
There are multiple ways to resolve this discrepancy, the easiest being replacing ALL with ALLNOBLANKROW. If you used ALLSELECTED that would also work in conjunction with filtering out blanks on your report-level filter on 6. Year.
Cleaning up some items not relevant in this context and changing ALL to ALLNOBLANKROW, your total measure can be more simply written as:
ALLNOBLANKROW =
VAR EndServiceDate =
MAX ( Dates[Date] )
RETURN
CALCULATE (
SUM ( Table1[Net Amount] ),
FILTER (
ALLNOBLANKROW ( Dates ),
Dates[Date] <= EndServiceDate
),
Table1[Flag2] = 1,
Table1[Flag] = TRUE ()
)
Results with no 6. Year filter and with two measures, one using ALL and one using ALLNOBLANKROW:
Notice that every row in the ALL column has been reduced by -7,872.01. This is the sum of all the Net Amount values that don't match to any dates in the Dates table. If you remove the relationship from Dates[date] to Table2[T2Date] then the blank row no longer exists and both of these will match the ALLNOBLANKROW version.
Setting the Cross Filter Direction to Both on any relationship is a bit risky - you essentially hand over control of the runtime query designs to the Power BI robots. There's then a risk that they will come up with a "creative" query design that is unexpected.
There's some insight into how this happens in a recent talk by Alberto Ferrari:
https://www.sqlbi.com/tv/understanding-relationships-in-power-bi/
I'm sure you'll agree it's quite terrifying.
Looking at your info, I expect you can avoid those traps by changing the Cross Filter Direction to Single, for the relationship from MonthYear to Dates.