The documentation for COUNTAX (DAX) and COUNTX (DAX)
states that the second argument is an expression that is evaluated for each row.
See: https://learn.microsoft.com/en-us/dax/countax-function-dax
This is exactly what I need, but I cannot figure out what the 'expression' should look like.
The example given in Microsoft's documentation is this:
=COUNTAX(FILTER('Reseller',[Status]="Active"),[Phone])
But the second argument ([Phone]) does not look like an expression.
An expression in my expectation is something like "value > 3 AND value <= 10"
What kind of expression can be used here?
In the example, [Phone] is the expression evaluated for each row of the resulting table. To clarify: COUNTAX and COUNTX count the rows for which the expression returns a nonblank result, so here the computed result is the count of nonblank values in the [Phone] column. Once the FILTER function has been applied to the table, the expression is equivalent to COUNT([Phone]) in this context. Using the Server Timings feature in DAX Studio, you can view a text representation of what is passed to the storage engine. In the case of COUNTX you will see a query with IS NOT NULL in the WHERE clause for the column used as the expression ([Phone] in this case), with the COUNT function selected, since any rows with a blank [Phone] will have already been filtered out.
The statement below is an example query from the Server Timings feature in DAX Studio, using the example measure from your question. As you can see, there are two filters in the WHERE clause. The first is on the Status column, to return only rows that are active. The second eliminates null values in the Phone column. This leaves the COUNT aggregate function to count all rows that have an active Status and a non-blank value in the Phone column, which is equivalent to a count of the Phone column for the active Status. The query here is only a text representation of the requests sent to the storage engine, so the syntax displayed is not actual SQL, but it will give you a better idea of how the DAX is being processed.
SET DC_KIND="AUTO";
SELECT
COUNT ( )
FROM 'Reseller'
WHERE
'Reseller'[Status] = 'Active' AND
'Reseller'[Phone] IS NOT NULL;
'Estimated size ( volume, marshalling bytes ) : 224012, 1392082'
Usually in DAX, the term expression means a column or a formula, so you can use either a column or a formula in COUNTAX/COUNTX or in any other function that is evaluated for each row (generally the functions that end in X :D )
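To make the "column or formula" point concrete, here is a rough Python model of how a row-by-row counting iterator behaves. It is only a sketch of the semantics described above, not how the DAX engine is implemented: the `countx` helper, the sample rows, and the `Orders` column are all made up for illustration, and `None` stands in for BLANK.

```python
from typing import Any, Callable, Iterable, Mapping, Optional

Row = Mapping[str, Any]


def countx(table: Iterable[Row], expression: Callable[[Row], Optional[Any]]) -> int:
    """Count the rows for which `expression` yields a non-blank (non-None) result."""
    return sum(1 for row in table if expression(row) is not None)


# Hypothetical sample data; None plays the role of BLANK.
resellers = [
    {"Status": "Active", "Phone": "555-0100", "Orders": 5},
    {"Status": "Active", "Phone": None, "Orders": 12},
    {"Status": "Inactive", "Phone": "555-0199", "Orders": 7},
    {"Status": "Active", "Phone": "555-0142", "Orders": 2},
]

# Rough stand-in for FILTER('Reseller', [Status] = "Active").
active = [r for r in resellers if r["Status"] == "Active"]

# A bare column reference is already an expression.
phone_count = countx(active, lambda r: r["Phone"])

# A formula works too: return a value when the condition holds, blank otherwise.
in_range_count = countx(active, lambda r: 1 if 3 < r["Orders"] <= 10 else None)

print(phone_count, in_range_count)
```

The second call is the kind of "value > 3 AND value <= 10" test from the question. In DAX the comparable trick would be an expression that returns BLANK when the condition fails (for example an IF with no else branch), because a plain TRUE/FALSE result is not blank.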
Thank you #userfl89, for the enlightening and analytic explanation.
I see now that <expression> is actually a WHERE clause. Now it starts making sense why Microsoft chose to use the word 'expression' here.
Still, it all is full of contradictions.
The syntax for Count is
COUNT(<column>)
Then, [Phone], in COUNT([Phone]) is of type <column>. I assume this is also just a WHERE clause towards the storage engine?
Compare with COUNTAX(FILTER('Reseller',[Status]="Active"),[Phone]), where [Phone] is of type <expression>.
The real logic in COUNTAX is in fact the FILTER-function.
The <expression> is always a WHERE clause that only selects the non-blank values. OK, it is an 'expression', but it is always the same one. Right?
I cannot but conclude that the type system in Microsoft's syntax descriptions is a bit messy, and that makes it hard for me (as a beginner with Power BI) to write my own DAX queries based on their documentation.
Or they should come up with better examples to compensate for the confusing syntax descriptions.
Related
I use Power BI version 2.112.603.0 64-bit (December 2022). In this version, when I write a DAX measure with the SWITCH function, after I enter the second result followed by a comma, it asks for a third result. It is skipping the third value. What am I doing wrong here?
Intellisense isn't always correct. Just provide the correct number of arguments and the DAX engine will interpret them correctly.
The SWITCH function has a minimum of 3 parameters: Expression, Value, Result. In that form it behaves like the IF function. But you can also add as many further Value, Result pairs as you want, in which case it is more similar to a CASE WHEN statement.
To come back to your example, you could add other HASONEVALUE functions with their respective Result values, and as the last parameter the else value (used if no Value is matched).
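The argument layout described above (one expression, then Value/Result pairs, then an optional trailing else value) can be modelled in a few lines of Python. This is only an illustration of the parameter pattern; the `switch` helper is hypothetical and `None` stands in for BLANK.

```python
def switch(expression, *args):
    """Mimic SWITCH(expression, value1, result1, ..., [else_result])."""
    if len(args) < 2:
        raise ValueError("SWITCH needs at least one value/result pair")
    # An odd number of trailing arguments means the last one is the else result.
    has_else = len(args) % 2 == 1
    pairs = args[:-1] if has_else else args
    for value, result in zip(pairs[0::2], pairs[1::2]):
        if expression == value:
            return result
    # None stands in for BLANK when nothing matches and no else is given.
    return args[-1] if has_else else None


print(switch(2, 1, "one", 2, "two", "other"))
print(switch(9, 1, "one", 2, "two", "other"))
# The SWITCH(TRUE(), condition, result, ...) style picks the first true condition.
print(switch(True, False, "first", True, "second", "fallback"))
```

Seen this way, a prompt for a "third result" after the second pair is just the editor offering another optional pair; whether the final argument is read as an else value depends only on whether the arguments after the expression come to an odd or even count.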
I tried to add new measures in Power BI, but the EARLIER function does not seem to work properly.
The dataset includes Year, Month, DealerName, MerchantType, Province, TotalApplications, and Requested Amount. I want to calculate the Total Applications based on DealerName and MerchantType respectively. Please see one of the functions below:
Merchant_TotalApp = CALCULATE(SUM('Requested Summary'[TotalApplications]), FILTER('Requested Summary','Requested Summary'[MerchantType] = EARLIER ('Requested Summary'[MerchantType])))
The EARLIER ('Requested Summary'[MerchantType]) is underlined in red and the error message is "EARLIER/EARLIEST refers to an earlier row context which doesn't exist". When I hover over the error, it shows "parameter is not the correct type".
How can I solve this problem? Is it related to a data type error or something else? Thanks for any help!
EARLIER is rarely used in Measures.
It is used in an iterator function when you want to compare what's in the current row context to a prior row context.
In your current Measure, the FILTER function is not nested within a row context, so the EARLIER function has nothing to 'look back' to.
If what you are after are sums that pay attention only to Merchant type and ignore any other filters, you can get that with one of the ALL functions.
Merchant_TotalApp =
CALCULATE (
SUM ( 'Requested Summary'[TotalApplications] ),
ALLEXCEPT('Requested Summary','Requested Summary'[MerchantType])
)
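As a rough mental model of what the ALLEXCEPT version computes, the Python below sums over every row that shares the current row's merchant type, regardless of any other column. The sample rows and the `merchant_total_app` helper are hypothetical, and this ignores everything else the DAX engine does with filter context.

```python
# Hypothetical rows standing in for the 'Requested Summary' table.
rows = [
    {"DealerName": "A", "MerchantType": "Auto", "TotalApplications": 10},
    {"DealerName": "B", "MerchantType": "Auto", "TotalApplications": 5},
    {"DealerName": "C", "MerchantType": "Retail", "TotalApplications": 7},
]


def merchant_total_app(all_rows, current_merchant_type):
    """Sum TotalApplications over every row with the given MerchantType,
    ignoring filters on any other column."""
    return sum(
        r["TotalApplications"]
        for r in all_rows
        if r["MerchantType"] == current_merchant_type
    )


# Evaluate the "measure" once per dealer, as a table visual would.
totals = {r["DealerName"]: merchant_total_app(rows, r["MerchantType"]) for r in rows}
print(totals)
```

Dealers A and B both see the combined total for their shared merchant type, which is the behaviour the measure above is aiming for.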
MY GOAL:
parse a MM/DD date from the result of a vlookup so that it can be used in a project plan
BACKGROUND:
The vlookup result contains multiple values separated by a "•" (I don't need all of them)
The value I'm looking to parse is not always in the same location in the vlookup result (otherwise I could use the RIGHT formula)
There is a finite number of the values I'm looking to retrieve (and I know them already)
The value that I'm looking to retrieve contains some text with a date range; I only want the first four values in the date range (MM/DD)
I'd like to achieve all this with a single formula with the result in a single cell
CURRENT FORMULA
The formula that I've been working on that is not working is:
=ARRAYFORMULA(if(iserror(search(Iterations!D2:D7,(VLOOKUP(A2,'Results {2596503}'!$C$2:$L$183,3)))),,))
I've set up a sheet called "Erik Help" with the following formulas in B2 and C2:
=ArrayFormula(IF(A2:A="","",MID(VLOOKUP(A2:A,data!A2:B,2,FALSE),FIND(REGEXEXTRACT(VLOOKUP(A2:A,data!A2:B,2,FALSE),"[0-9]-[0-9]"),VLOOKUP(A2:A,data!A2:B,2,FALSE))-4,5)))
and
=ArrayFormula(IF(A2:A="","",MID(VLOOKUP(A2:A,data!A2:B,2,FALSE),FIND(REGEXEXTRACT(VLOOKUP(A2:A,data!A2:B,2,FALSE),"[0-9]-[0-9]"),VLOOKUP(A2:A,data!A2:B,2,FALSE))+2,5)))
respectively.
They may be longer than actually needed, but you did not share realistic results in Column B or list which symbols may appear in Column B other than in the date; so I tried to account for either a hyphen or a forward slash possibly appearing in Column B in places other than within the date span.
Your analytics sheet also shows a formula that is sorting the results from data!A:A. So even though in your example the original data order happens to be the same as in analytics!A:A, that is not a given (again, based on your formula). Therefore, the VLOOKUP is also necessary.
You did not indicate whether you need to further use these returned date-snippets in calculations, or whether you just need to view them. So the results generated in "Erik Help" are text.
If you want usable numbers/dates, you add further issues that would need to be controlled for in the formula, because you'll only be extracting month and day, not year. That's fine right now. But what about when the date range to be extracted is "12/28-01/13"? If you simply make these values/dates, they will both be assigned to the current year. So the end date here will wind up being earlier than the start date.
Because of this, I've added a second sheet, "Erik Help 2," which contains extended formulas to account for these cases while still returning the date format you want as actual dates which can be used in calculations.
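The year-rollover problem is easier to see outside a spreadsheet formula. The Python sketch below is not the formula from "Erik Help 2"; it only illustrates the idea of bumping the end date into the next year when it would otherwise fall before the start date. The sample text, the `parse_range` name, and the choice of passing the year in explicitly are assumptions made for the example.

```python
import re
from datetime import date

# MM/DD-MM/DD, e.g. "12/28-01/13".
RANGE_PATTERN = re.compile(r"(\d{2})/(\d{2})-(\d{2})/(\d{2})")


def parse_range(text, year):
    """Return (start, end) dates for the first MM/DD-MM/DD range in `text`,
    or None if there is no such range."""
    match = RANGE_PATTERN.search(text)
    if match is None:
        return None
    m1, d1, m2, d2 = (int(g) for g in match.groups())
    start = date(year, m1, d1)
    end = date(year, m2, d2)
    # A range such as 12/28-01/13 crosses New Year: move the end into next year.
    if end < start:
        end = date(year + 1, m2, d2)
    return start, end


print(parse_range("Sprint 7 • 12/28-01/13 • Team A", 2022))
print(parse_range("Sprint 3 • 03/01-03/14 • Team B", 2022))
```

A range ending on 02/29 that crosses into a non-leap year would still need extra handling; the sketch leaves that case out.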
EDIT
(following your note on the sheet: "I would like to remove col b altogether and nest in the formulas in col c and d")
You can adjust the range B2:B by replacing it with your already existing formula in B2.
The new adjusted formula will become
=ArrayFormula(IFNA(SPLIT(REGEXEXTRACT(VLOOKUP(ARRAYFORMULA(sort(unique(data!A2:A))),data!$A$1:$C,2),"\d+\/\d+-\d+\/\d+"),"-")))
Original answer
You can use the following formula:
=ArrayFormula(IFNA(SPLIT(REGEXEXTRACT(B2:B,"\d{2}\/\d{2}-\d{2}\/\d{2}"),"-")))
Make sure you format the results as Date.
(Please adjust ranges to your needs)
Functions used:
ArrayFormula
IFNA
SPLIT
REGEXEXTRACT
try:
=ARRAYFORMULA(IF(A2:A="",,IFNA(TEXT(SPLIT(REGEXEXTRACT(
VLOOKUP(data!A2:A, data!A:C, 2), "\d+/\d+-\d+/\d+"), "-"), "mm/dd"))))
What might be an alternative, possibly more efficient, way of checking whether the current context is a total? I use this measure as a benchmark:
IsTotal1 = CALCULATE(COUNT(Tab[Store]), ALLSELECTED(Tab)) = COUNT(Tab[Store])
The idea is that it calculates COUNT on a table with filters removed (left side, so we get counts for all dimensions in context) and checks it against the COUNT with current context. If both are the same, we have total.
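The comparison behind IsTotal1 can be mimicked in plain Python to show why it does not depend on which columns the visual groups by. This is a simplified model only: `rows` plays the part of the table after slicers (the ALLSELECTED side), the keyword filters play the part of the current cell's context, and all names and data are made up.

```python
# Hypothetical table contents after any slicers have been applied.
rows = [
    {"Store": "North", "Product": "Tea"},
    {"Store": "North", "Product": "Coffee"},
    {"Store": "South", "Product": "Tea"},
]


def count_rows(table, **filters):
    """Count rows matching every column=value filter (no filters: count all)."""
    return sum(1 for r in table if all(r[c] == v for c, v in filters.items()))


def is_total(table, **cell_filters):
    """True when the current cell sees as many rows as the unfiltered table."""
    return count_rows(table) == count_rows(table, **cell_filters)


print(is_total(rows))                                # total row
print(is_total(rows, Store="North"))                 # grouped by store
print(is_total(rows, Store="South", Product="Tea"))  # grouped by store and product
```

Nothing in `is_total` names a specific column, which is the property the question asks for.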
I know that using the function HASONEVALUE might be tempting:
IsTotal2 = NOT(HASONEVALUE(Tab[Store]))
However, using this approach has a serious drawback. If we make a table displaying sales by store and by product then the first measure will work and the second will fail. Moreover, if we display sales by product only the first measure will still work, and the second should be retyped to HASONEVALUE(Tab[Product]).
So I want the measure to be resistant to any change of granularity due to adding new dimension to table visual.
Based on the information you provided in the comments, it sounds like you have a page-level or report-level filter. In that case, you can't rely on functions such as ISFILTERED(...) or ISCROSSFILTERED(...), as these external filters or slicers could impact the result returned from these two functions.
So you have to either stick with your approach (perhaps changing COUNT(...) to COUNTROWS(Tab) could improve the performance slightly), or write something like
ISINSCOPE('Tab'[Store]) || ISINSCOPE('Tab'[Product]) || etc...
where you repeat ISINSCOPE for every column that could potentially be used to slice the data, as ISINSCOPE is the only function that distinguishes using a column on a filter/slicer vs. using it as a row/column grouping on a table/matrix visual.
I have a question about MDX tuples. I would like to gain some insight into something that seems confusing to me.
Most of the literature I have read talks about tuples being a set of co-ordinates essentially pointing to a cell which contains a measure value. From what I understand a tuple is defined as containing only one distinct member from each dimension. Typically when writing queries we don't specify every member for every dimension we let SSAS engine use the default members and aggregate the measure data accordingly.
Straight out of the adventure works sample OLAP database (cube) "adventure works"
A super simple query that I understand represents a tuple:
SELECT
([Date].[Calendar Quarter of Year].&[CY Q3],[Measures].[Sales Amount]) --Tuple
ON COLUMNS
FROM [Adventure Works]
SS Management studio returns this result
No problem here: the tuple specified by the &[CY Q3] member points to the cell containing the displayed measure amount. Clearly a tuple has been returned.
Typically though I use this sort of thing more often:
select
non empty ([Date].[Calendar].[Calendar Quarter],[Measures].[Sales Amount]) --Tuple??
ON COLUMNS
FROM [Adventure Works]
Which returns all the quarter totals across all years for said measure (not a great example but it's just an example):
I see this result as a set because more than one distinct member has been returned from the same dimension (date). In fact, by default all members are being returned; if so, how can it be a tuple?
So my question is this. The parentheses around the "tuple" in the query above indicate to me that I'm selecting a tuple. The query engine processes it and returns a result that looks to me like a set, not because more than one cell value is returned but because more than one member from the date dimension has been used.
The query indicates that a tuple is being selected, and the query engine seems to accept it as one. However, the result set includes multiple members from the same dimension and their corresponding cell values, which indicates to me that more than one tuple is being returned, i.e. a set.
Also, The query engine throws no error when I treat it as a set and use set functions on it:
select
nonempty({([Date].[Calendar].[Calendar Quarter],[Measures].[Sales Amount])}) --Set
ON COLUMNS
FROM [Adventure Works]
My question is this: assuming I am correct and the results do in fact represent a set (a set of tuples, one for each distinct member instance), why does the query engine allow you to use parentheses, which indicate selection of a tuple, to return something that is not a tuple?
This makes more sense to me :
SELECT
nonemptycrossjoin(
{[Date].[Calendar].[Calendar Quarter]}, --Set 1
{[Measures].[Sales Amount]} --Set 2
)
ON COLUMNS
FROM [Adventure Works]
At least this code reflects the result set that's returned. Thoughts?
Or is it all just Analysis Services semantics?
Thanks
Unfortunately, MDX is an ambiguous language. From what I've understood, the () notation is a tuple or an operator precedence notation. And:
{...} , {...}
is actually a crossjoin:
{...} * {...}
Then, when you specify a level where a set is expected, level.members is used by default. When you specify a tuple where a set is expected, a singleton set is created containing only that tuple. So:
( [level], [measures].[amount] )
is equivalent to:
crossjoin( [level].members, { [measures].[amount] } )
The only tuple (a member is a tuple) specified is [Measures].[Amount] which by the way does not use the () notation ;-)
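The expansion described above can be pictured with a small Python analogy. This is not how Analysis Services is implemented; it only shows that crossing the members of a level with a one-element measure set yields a set of tuples, one per level member. The member names are placeholders.

```python
from itertools import product


def crossjoin(*sets):
    """Return every combination of one element from each set, as tuples."""
    return [tuple(combo) for combo in product(*sets)]


# Placeholder members standing in for [level].members.
level_members = ["Q1", "Q2", "Q3"]
# A single member promoted to a singleton set, like { [measures].[amount] }.
measure_set = ["Sales Amount"]

axis = crossjoin(level_members, measure_set)
for coordinates in axis:
    print(coordinates)
```

Each printed pair is one tuple, and the axis as a whole is a set, which matches the result the question observed.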
You mention that you typically use this syntax - but I write mdx nearly every day and never use this syntax -
SELECT
NON EMPTY ([Date].[Calendar].[Calendar Quarter],[Measures].[Sales Amount]) ON COLUMNS
FROM [Adventure Works];
I'm a little surprised it runs as it seems to be indicating to analysis services to create a tuple from a level and a measure member.
MarcP mentions that MDX is ambiguous, but I'm more in favour of saying it can be written ambiguously. Unfortunately it fails quite gracefully most of the time, and you get no numbers returned or the wrong numbers. I wish it threw more errors and enforced tighter syntax rules, as this might make it more understandable.
In your original script I would just use the * operator rather than typing out the full crossjoin function when you need it. In your script, though, it is much more readable to move the measure onto the rows and delete that tuple. As Marc mentions, SSAS applies the MEMBERS function implicitly; I find things more readable when it is included explicitly wherever it is being used:
SELECT
NON EMPTY
[Date].[Calendar].[Calendar Quarter].MEMBERS ON 0,
[Measures].[Sales Amount] ON 1
FROM [Adventure Works];
[Date].[Calendar].[Calendar Quarter]
As Marc mentioned, this is actually the same as [Date].[Calendar].[Calendar Quarter].MEMBERS. In this case MSDN is your friend (and for MDX, unlike some other MS languages, I find MSDN very good). Here is the definition of the MEMBERS function:
https://msdn.microsoft.com/en-us/library/ms144851.aspx
telling you the return type:
Returns the set of members in a dimension, level, or hierarchy.
nonemptycrossjoin
You mentioned nonemptycrossjoin. This is not needed any more: a simple crossjoin via * is all that is needed with the last two or three versions of SSAS.