Parsing DAX Query with Regular expression to get measures and dimensions - regex

I want to create an application that uses Extended Events to analyze the usage of Azure Analysis Services models.
Specifically, I want to know what measures and dimensions the end users are using.
I look at the QueryEnd event and try to parse the TextData field. Depending on the tool used for querying the model I get either MDX or DAX in the TextData.
I think I have managed to parse the MDX with this RegEx: ([[\w ]+].[[\w ]+](?:.(?:Members|[Q\d]))?)
(from this post: Regular expression for extracting element from MDX Query)
Now parsing the DAX is the problem. If I query the model from fx PowerBI I get a DAX like this:
EVALUATE
TOPN(
502,
SUMMARIZECOLUMNS(
ROLLUPADDISSUBTOTAL('Product'[Color], \"IsGrandTotalRowTotal\"),
\"Order_Count\", 'SalesOrderDetail'[Order Count]
),
[IsGrandTotalRowTotal],
0,
[Order_Count],
0,
'Product'[Color],
1
)
ORDER BY
[IsGrandTotalRowTotal] DESC, [Order_Count] DESC, 'Product'[Color]
What I would like to match with the RegEx is:
'Product'[Color] and 'SalesOrderDetail'[Order Count]
And.... how would i know that Order Count is used as a measyre while Color is an attribute of the Product dimension?..... guess I won't?
Thanx a lot
Nicolaj

think I just found a possible solution for parsing both DAX and MDX queries:
([\[\'][\w ]+[\]\']\.?[\[\'][\w ]+[\]\'])(?!.*\1)
This will give me what I need.... without duplicates. Improvements are welcomed :-)

For disambiguating columns from measures, I would query the DMVs for the deployed model to get a list of columns and a list of measures. Then you can just look up your parsed tokens in the two lists:
DMV docs
Column DMV
Measure DMV.
Note that measure names are globally unique among the population of column and measure names, so that is an easy lookup (just throw away the table reference). Column names are only unique within the table they belong to.
It looks like you've already got a regex that's working for you, so run with that.
I'll also note that almost all PBI visuals which are returning anything other than a simple list (e.g. slicers or one column tables will return just a list) will follow the pattern of the query you shared.
Measures should all be in later args to SUMMARIZECOLUMNS. The pattern is:
SUMMARIZECOLUMNS (
<grouping columns, optionally in a ROLLUPADDISSUBTOTAL>,
<filters, defined above in VARs>,
<measures as ("Alias", [Measure]) pairs>
)

Related

Power BI LOOKUPVALUE with a column of values for the search items? (VLOOKUP alternative)

In Power BI, I need to create a VLOOKUP alternative. From the research I've done, this is done with the LOOKUPVALUE function, but the problem is that function needs one specific SEARCH ITEM, which isn't super helpful in a VLOOKUP type scenario where you have a full column of values to search for?
Given these two tables, connected through the user_name and first_name columns:
...what's the formula needed in order to create a new column in the Employee_Table called phone_call_group by using the names as the search items in order to return the group they belong to? So how can I end up with this?
(Forget that the entries in each table are already sorted, needs to be dynamic). Will be back tomorrow to review solutions.
In Power BI you have relations between tables instead of Excel's VLOOKUP function.
In your case you just have to create a one-to-one relation between
'Phone_Call_Table'[user_name] and 'Employee_Table'['first_name]'
With that you can add a Calculated Column to your 'Employee_Table' using the following expression:
phone_call_group = RELATED(Phone_Call_Table[group])
and in the data view the table will look like this:
LOOKUPVALUE() is just a workaround if for other reasons you can't establish that relation. What you've been missing so far is that in a Calculated Column there is a Row Context which gives you exactly one value per row for the <search_value> (this is different from Measures):
alt_phone_call_group =
LOOKUPVALUE(
Phone_Call_Table[group],
Phone_Call_Table[user_name],
Employee_Table[first_name]
)

Why using ALLSELECTED() AND FILTER() together?

I am trying to understand the below DAX code:
CALCULATE(
SUM(Revenue[Net])
,FILTER('Center', NOT 'Center'[Acc] IN {"RSM", BLANK() })
,ALLSELECTED()
,VALUES('Customer'[Customer Number])
)
I have the below questions:
What's the use of ALLSELECTED?? By definition ALLSELECTED returns all rows in a table, ignoring any filters that might have been applied inside the query, but keeping filters that come from outside. https://dax.guide/allselected/
So, what's the point of writing FILTER() if its going to be forced to be ignored by the next line (ALLSELECTED)?!?
Also by definition:
CALCULATE is just a expresion followed by filters...
What's the use of VALUES() ? It doesn't appear to be a filter, so how is it even allowed to appear there? (Per definition VALUES(): returns a one-column table that contains the distinct values from the specified column.)
I do not understand what is this returning? is it the SUM() or the VALUES()?
(I come from a SQL background, so any sql-friendly answer is appreciated).
In Dax every filter is a table of values its look similar to INNER JOIN;
ALLSELECTED is useful when you need to keep a row context (this is also a filter in DAX). You can use ALLSELECTED inside FILTER function.
For better understand what engine does you can use a DaxStudio with ServerTiming;
As you see this product one simple statement:
SELECT
SUM ( 'Table'[Cost Centre] )
FROM 'Table'
WHERE
'Table'[Project] NIN ( 'AB' ) ;
You can find useful article by Alberto Ferrari and Marco Russo:
https://www.sqlbi.com/tv/auto-exist-on-clusters-or-numbers-unplugged-22/
If it is only converting DAX queries if your connecting your sql analysis services to in DAX studio. because it is not working for PBI file data

Power BI - Creating a calculated table

I am creating a dashboard in Power BI. I have to report the executions of a process in a daily basis. When selecting one of these days, I want to create another calculated table based on the day selected (providing concrete information about the number of executions and hours) as it follows:
TABLE_B = FILTER(TABLE_A; TABLE_A[EXEC_DATE] = [dateSelected])
When [dateSelected] is previously calculated from the selected day as it follows:
dateSelected = FORMAT(FIRSTDATE(TABLE_A[EXEC_DATE]);"dd/MM/yyyy")
I tried a lot of alternatives as, for example, create individualy the year, month and day to later compare. I used the format in both sides of the comparation, but none of them works for me. The most of the cases it returns me the whole source table without any kind of filters. In other cases, it doesn't return anything. But, when I put a concrete day ...
TABLE_B = FILTER(TABLE_A; TABLE_A[EXEC_DATE] = "20/02/2019")
... it makes the filter correctly generating the table as I want.
Does someone know how to implement the functionality I am searching for?
Thanks in advance.
You're almost there Juan. You simply need to use dateSelected as a varialbe inside of your DAX query:
TABLE_B =
var dateSelected = FIRSTDATE(TABLE_A[EXEC_DATE])
return
FILTER(TABLE_A, TABLE_A[EXEC_DATE] = dateSelected)
Note that all my dates are formatted as Date so I didn't need to use a FORMAT function.
Here's the final result:
I admit that this behavior can be quite confusing! Here is a useful link that will help you understand Power BI's context:
https://community.powerbi.com/t5/Desktop/Filtering-table-by-measures/td-p/131361
Let's treat option 1 as FILTER(TABLE_A; TABLE_A[EXEC_DATE] = "20/02/2019") and option 2 as FILTER(TABLE_A; TABLE_A[EXEC_DATE] = [dateSelected]). Quote from the post:
In option 1, in the filter function, you are iterating
over each row of your 'Table' (row context). In option 2, because you
are using a measure as part of the filter condition, this row context
is transformed into an equivalent filter context (context transition).
Using variables (...) is very convenient when you want to filter
a column based on the value of a measure but you don't want context
transition to apply.

Refer to slicer-filtered table in DAX calculated column

I'm beginning to think that what I'm looking for isn't actually possible.
I have two datasets - one of Titles and one of Keywords. Only one column from each is relevant to this query - Titles[Title] and Keywords[Keyword]. So pretty straightforward data. Titles has around 2 million rows, and Keywords will eventually have ~500-1000.
I would like to display a slicer of Keywords[Keyword] values, which will filter the Titles dataset where Titles[Title] contains one of the selected Keywords[Keyword] values.
I tried creating a DAX calculated column on Titles like below -
Matches = IF(SUMX(FILTER('Keywords','Keywords'[Keyword] <> ""),FIND(UPPER('Keywords'[Keyword]), UPPER(Titles[Title]),,0)) > 0,1, 0)
I then apply a report level filter for Matches >= 1. This works for all keyword values, but is not aware of the selection in the slicer.
I tried changing it to use ALLSELECTED('Keywords'[Keyword]) as the first argument passted to FILTER, but this doesn't seem to have any effect.
As a test, I created a Calculated Column and a Measure with the exact same DAX -
CONCATENATEX(VALUES(Keywords[Keyword]), Keywords[Keyword], ",")
This displays the slicer selection delimited by commas for the measure, but not for the column. Since I want to calculate this per row and filter the report based on this, a measure isn't suitable.
Is there any other way I can refer to the filtered Keywords[Keyword] in my calculated column? Or is there a way that a Measure could actually be used to achieve this? Or is there a completely different approach that I could try?

Inactive relationships affecting measures

I have the following tables & relationships in our pbix report:
For some obvious reasons, I need to have a relationship (non-active) between Dates[date] and Table2[T2Date]. However, doing so causes data fluctuation to measure 'Total Amount' in Table1.
Here are some screenshots:
Before Relationship (Dates[date] - Table2[T2Date]):
After Relationship (Dates[date] - Table2[T2Date]):
I need to understand why this difference is coming up and how the relationship is causing it since the measure uses a different relationship.
For reference, I am attaching the pbix report.
https://drive.google.com/open?id=1XknisXvElS6uQN224bEcZ_biX7m-4el4
Any help would be appreciated :)
The link that #MikeHoney gives has really useful information on the subtleties of relationships and does relate to this problem (do watch it!), but this issue is not ultimately related to bidirectional filtering in particular. In fact, I can reproduce it with this simplified relationship structure:
The key thing to note here is that when you attach Table2 to Dates, since Table2 contains T2Date values that don't match to any Date[date], this creates an extra row in Dates with a blank date which you can notice in your filter on 6. Year when that relationship exists (active or inactive). Filtering out that that blank in the 6. Year filter would work, except that in your measure, you use ALL(Dates) to strip all filtering done on that table.
There are multiple ways to resolve this discrepancy, the easiest being replacing ALL with ALLNOBLANKROW. If you used ALLSELECTED that would also work in conjunction with filtering out blanks on your report-level filter on 6. Year.
Cleaning up some items not relevant in this context and changing ALL to ALLNOBLANKROW, your total measure can be more simply written as:
ALLNOBLANKROW =
VAR EndServiceDate =
MAX ( Dates[Date] )
RETURN
CALCULATE (
SUM ( Table1[Net Amount] ),
FILTER (
ALLNOBLANKROW ( Dates ),
Dates[Date] <= EndServiceDate
),
Table1[Flag2] = 1,
Table1[Flag] = TRUE ()
)
Results with no 6. Year filter and with two measures, one using ALL and one using ALLNOBLANKROW:
Notice that every row in the ALL column has been reduced by -7,872.01. This is the sum of all the Net Amount values that don't match to any dates in the Dates table. If you remove the relationship from Dates[date] to Table2[T2Date] then the blank row no longer exists and both of these will match the ALLNOBLANKROW version.
Setting the Cross Filter Direction to Both on any relationship is a bit risky - you essentially hand over control of the runtime query designs to the Power BI robots. There's then a risk that they will come up with a "creative" query design that is unexpected.
There's some insight into how this happens in a recent talk by Alberto Ferrari:
https://www.sqlbi.com/tv/understanding-relationships-in-power-bi/
I'm sure you'll agree it's quite terrifying.
Looking at your info, I expect you can avoid those traps by changing the Cross Filter Direction to Single, for the relationship from MonthYear to Dates.