I have a question about MDX tuples and would like some insight into something that confuses me.
Most of the literature I have read describes a tuple as a set of coordinates pointing to a cell that contains a measure value. As I understand it, a tuple contains only one member from each dimension. Typically when writing queries we don't specify a member for every dimension; we let the SSAS engine use the default members and aggregate the measure data accordingly.
The examples below run against the "Adventure Works" cube from the Adventure Works sample OLAP database.
A super simple query that I understand represents a tuple:
SELECT
([Date].[Calendar Quarter of Year].&[CY Q3],[Measures].[Sales Amount]) --Tuple
ON COLUMNS
FROM [Adventure Works]
SS Management studio returns this result
No problem here: the tuple specified by the &[CY Q3] member points to the cell containing the displayed measure amount. Clearly a tuple has been returned.
Typically though I use this sort of thing more often:
select
non empty ([Date].[Calendar].[Calendar Quarter],[Measures].[Sales Amount]) --Tuple??
ON COLUMNS
FROM [Adventure Works]
This returns all the quarter totals across all years for that measure (not a great example, but it's just an example).
I see this result as a set because more than one member from the same dimension (Date) has been returned. In fact, by default all members are returned, so how can it be a tuple?
So my question is this. The parentheses around the "tuple" in the query above indicate to me that I'm selecting a tuple, yet the result the query engine returns looks to me like a set: not because more than one cell value is returned, but because more than one member from the Date dimension has been used.
The query indicates that a tuple is being selected, and the query engine seems to accept it as one. However, the result includes multiple members from the same dimension with their corresponding cell values, which tells me that more than one tuple is being returned, i.e. a set.
Also, the query engine throws no error when I treat it as a set and use set functions on it:
select
nonempty({([Date].[Calendar].[Calendar Quarter],[Measures].[Sales Amount])}) --Set
ON COLUMNS
FROM [Adventure Works]
Assuming I am correct and the results do in fact represent a set (a set of tuples, one per distinct member), why does the query engine allow you to use parentheses, which indicate selection of a tuple, to return something that is not a tuple?
This makes more sense to me:
SELECT
nonemptycrossjoin(
{[Date].[Calendar].[Calendar Quarter]}, --Set 1
{[Measures].[Sales Amount]} --Set 2
)
ON COLUMNS
FROM [Adventure Works]
At least this code reflects the result set that's returned. Thoughts?
Or is it all just Analysis Services semantics?
Thanks
Unfortunately, MDX is an ambiguous language. From what I've understood, the () notation is a tuple or an operator precedence notation. And:
{...} , {...}
is actually a crossjoin:
{...} * {...}
Then, when you specify a level where a set is expected, level.members is used by default. When you specify a tuple where a set is expected, a singleton set containing only that tuple is created. So:
( [level], [measures].[amount] )
is equivalent to:
crossjoin( [level].members, { [measures].[amount] } )
The only tuple (a member is a tuple) specified is [Measures].[Amount] which by the way does not use the () notation ;-)
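As a rough model of the expansion described above (Python is used purely for illustration; the quarter names and the helper `expand` are hypothetical and not anything SSAS exposes), the parenthesized level-and-measure expression can be thought of as a crossjoin that yields one tuple per quarter:

```python
from itertools import product

# Hypothetical members of the [Calendar Quarter] level (illustrative names only).
quarter_level_members = ["Q1 CY 2007", "Q2 CY 2007", "Q3 CY 2007"]

def expand(*items):
    """Model the implicit conversions: a level (list) is treated as its set of
    members, a single member (str) becomes a singleton set, and the resulting
    sets are crossjoined."""
    sets = [item if isinstance(item, list) else [item] for item in items]
    return list(product(*sets))

tuples = expand(quarter_level_members, "Sales Amount")
for t in tuples:
    print(t)
```

Each element of `tuples` holds exactly one quarter and one measure, which matches the observation that the result is a set of tuples rather than a single tuple.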
You mention that you typically use this syntax, but I write MDX nearly every day and never use it:
SELECT
NON EMPTY ([Date].[Calendar].[Calendar Quarter],[Measures].[Sales Amount]) ON COLUMNS
FROM [Adventure Works];
I'm a little surprised it runs, as it seems to tell Analysis Services to create a tuple from a level and a measure member.
MarcP mentions that MDX is ambiguous, but I'd rather say it can be written ambiguously. Unfortunately it fails quite gracefully most of the time, so you get no numbers or the wrong numbers back. I wish it threw more errors and enforced tighter syntax rules, as this might make it more understandable.
For your original script, I would just use the * operator rather than typing out the full crossjoin function when you need it. In your script it is much more readable to move the measure onto the rows and drop that "tuple". As Marc mentions, SSAS applies the MEMBERS function implicitly; I find it more readable to include it explicitly when it is being used:
SELECT
NON EMPTY
[Date].[Calendar].[Calendar Quarter].MEMBERS ON 0,
[Measures].[Sales Amount] ON 1
FROM [Adventure Works];
[Date].[Calendar].[Calendar Quarter]
As Marc mentioned, this is actually the same as [Date].[Calendar].[Calendar Quarter].MEMBERS in this case. MSDN is your friend (and for MDX, unlike some other MS languages, I find MSDN very good). Here is the definition of the MEMBERS function:
https://msdn.microsoft.com/en-us/library/ms144851.aspx
It tells you the return type:
Returns the set of members in a dimension, level, or hierarchy.
nonemptycrossjoin
You mentioned nonemptycrossjoin. This is not needed any more: a simple crossjoin via * is all that is needed with the last two or three versions of SSAS.
I'm making a time-spending tracker based on the work I do every hour of the day.
Now, suppose I have 28 types of work listed in my tracker (which I also have to increase from time to time), and I have about 8 significance values that I have decided to relate to these 28 types of work, predefined.
As soon as I enter a type of work in one cell, I want the adjacent cell to be populated automatically with the significance value (one of the 8 values) that I have predefined for it.
Every time I enter a new or repeated type of work, the adjacent cell should be matched with its significance value and populated in real time.
I know how to do it using IF, IFS, and IF/OR conditions, but as the types of work and significance values keep expanding, those formulas will become very big, complicated, and repetitive. I feel there's a more efficient way to achieve it. Also, I don't want the value to be selected from a drop-down list.
Guys, please help me out with the most efficient way to handle this. TIA :)
Also, I've added a snapshot and a sample sheet describing the problem.
Sample sheet
XLOOKUP() may work. Try-
=XLOOKUP(D2,A2:A,B2:B)
Or FILTER() function like-
=FILTER(B2:B,A2:A=D2)
You can use this formula for a whole column:
=INDEX(IFERROR(VLOOKUP(C14:C,A2:B9,2,0)))
Adapt the ranges to your actual tables so that the second argument includes all the potential values and their significances.
This is the formula that worked for me (for anybody's reference):
I created another reference sheet listing the types of work and their significance. From that sheet I look values up with VLOOKUP, FILTER, or XLOOKUP. I'm using Google Forms to input my data.
=ARRAYFORMULA(IFS(ROW(D:D)=1,"Significance",A:A="","",TRUE,VLOOKUP(D:D,Reference!$A:$B,2,0)))
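The idea shared by all of these formulas is a two-column reference table that is consulted once per entry, instead of a growing chain of IF conditions. A minimal Python sketch of that idea follows; the work types and significance values are made up for illustration and do not come from the sample sheet:

```python
# Hypothetical reference table: work type -> significance.
SIGNIFICANCE = {
    "Email": "Low",
    "Coding": "High",
    "Meetings": "Medium",
}

def significance_for(work_type):
    """Return the significance for a work type, or "" when the cell is empty
    or the type is missing from the reference table (similar in spirit to
    wrapping VLOOKUP in IFERROR)."""
    return SIGNIFICANCE.get(work_type.strip(), "")

entries = ["Coding", "Email", "", "Gardening"]
results = [significance_for(e) for e in entries]
print(results)
```

Adding a new type of work then only means adding a row to the reference table; the lookup itself stays unchanged.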
I have a couple of rather large nested if functions in my spreadsheet. It sure would be nice to have an alternative method. Problem is I'm using a wildcard (*) in my lookup because the source text has slight variations (date for example).
For example, if my list of data contains:
VENMO PAYMENT 220828 1022093447487 BRENDA HOSPY
VENMO PAYMENT 220813 1031323447487 BRENDA HOSPY
I want these to show in an adjacent column of cells as just Venmo
Currently my if function in that second column of cells is:
=IF(COUNTIF($F10,"*APPLE.COM/BILL*"),"AP",
IF(COUNTIF($F10,"IIA VOYA*"),"VOYA",
IF(COUNTIF($F10,"VENMO PAYMENT*"),"Venmo",
IF(COUNTIF($F10,etc...
This works fine but quickly gets unruly as more things get added.
I've spent a great deal of time searching for functions and processes that would make this easier, or at least more compact, but I can't find a way with typical functions like vlookup or index/match.
If I've explained this in a comprehensible fashion perhaps you've seen or experienced a similar situation and could offer a suggestion. It would be appreciated!
I'm not opposed to using a programming function.
I've looked at, and for, various Excel functions or combinations with no luck on my own or online.
I have created a structure as below
Formula present in B2 is as below
=IFERROR(INDEX($F$2:$F$9,MIN(IF(COUNTIF(A2,"*"&$E$2:$E$9&"*")>0,ROW($E$2:$E$9),9999999)-1)),"---")
Enter it as an array formula using Ctrl+Shift+Enter.
It searches A2 for every string in column E and returns the row numbers in column E where there is a match; MIN then picks the first one. Where there is no match, 9999999 is returned instead. Because the data starts in row 2, I subtract 1 to turn the row number into an index into the data range, and INDEX then fetches the value at that index from column F. Finally, IFERROR shows --- when no match was found and the 9999999 placeholder made INDEX fail.
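Since the question mentions being open to a programming approach, the same "first matching keyword wins" logic can be sketched in Python. The keyword table below is hypothetical and simply mirrors the three patterns from the question; the comparison is made case-insensitive on the assumption that this matches how COUNTIF wildcards behave:

```python
# Hypothetical keyword table mirroring columns E (keyword) and F (label).
KEYWORDS = [
    ("APPLE.COM/BILL", "AP"),
    ("IIA VOYA", "VOYA"),
    ("VENMO PAYMENT", "Venmo"),
]

def label_for(description, default="---"):
    """Return the label of the first keyword contained in the description,
    or `default` when no keyword matches."""
    text = description.upper()
    for keyword, label in KEYWORDS:
        if keyword.upper() in text:
            return label
    return default

print(label_for("VENMO PAYMENT 220828 1022093447487 BRENDA HOSPY"))
print(label_for("GROCERY STORE 1234"))
```

New patterns are added as rows in `KEYWORDS`, just as new rows would be added to columns E and F in the spreadsheet version.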
My issue is that I need to reference a cell (A1) which will either be the name of a state that can be found in column L, or it can be "All States" which I then want to include all results of column L. I can't work out how to include this.
=SUMPRODUCT(--(IF(A1="All States",Data!$L:$L,Data!$L:$L=A1)),Data!Q:Q)
I want to add a bunch more criteria based on the above, so I don't want to go down the route of embedding the SUMPRODUCT in an IF function, because the formula will quickly become too unwieldy.
You have a lot of choices. Using your initial formula I would tweak it to
(A) =SUMPRODUCT((IF($A$1="All States",1,($L$2:$L$11=$A$1)))*($Q$2:$Q$11))
But this would need to be entered as an array formula so instead of just confirming with ENTER, you need CONTROL+SHIFT+ENTER. You will know you have done it right when { } show up around your formula. Note that they cannot be added manually.
A non-array formula, which I believe would be faster, starts from your two options: you are either dealing with a single state or with all states. Set up an IF check to determine whether you need to sum all of column Q or find a single value from column Q. I used the following formula:
(B) =IF(A1="all states",SUM($Q$2:$Q$11),INDEX($Q$2:$Q$11,MATCH($A$1,$L$2:$L$11,0)))
A bit of a cheat, but one that simplifies things, is to add a final state to the bottom of your list in L and call it "All States". In the corresponding row in Q, place =sum(First Cell:Last Cell). If you do that, you can use the following formula:
(C) =SUMPRODUCT(($L$2:$L$12=$A$1)*($Q$2:$Q$12))
There are other options out there as well; I just thought I would show a few.
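The logic behind formula (A), where the state test collapses to "always true" when "All States" is selected, can be sketched in Python as follows. The rows are invented stand-ins for columns L and Q, and `total_for` is a hypothetical helper name:

```python
# Hypothetical rows standing in for columns L (state) and Q (amount).
rows = [("Ohio", 10.0), ("Texas", 20.0), ("Ohio", 5.0)]

def total_for(selection, data):
    """Sum the amounts for one state, or for every row when the selection
    is "All States"."""
    return sum(
        amount
        for state, amount in data
        if selection == "All States" or state == selection
    )

print(total_for("Ohio", rows))
print(total_for("All States", rows))
```

Further criteria would be added as extra conditions joined with `and`, which corresponds to multiplying additional masks inside SUMPRODUCT.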
The documentation for COUNTAX (DAX) and COUNTX (DAX) states that the second argument is an expression that is evaluated for each row.
See: https://learn.microsoft.com/en-us/dax/countax-function-dax
This is exactly what I need, but I cannot figure out what the 'expression' should look like.
The example given in Microsoft's documentation is this:
=COUNTAX(FILTER('Reseller',[Status]="Active"),[Phone])
But the second argument ([Phone]) does not look like an expression.
I would expect an expression to look something like "value > 3 AND value <= 10".
What kind of expression can be used here?
In the example, [Phone] is the expression evaluated for each row of the resulting table. To clarify: since COUNTAX and COUNTX return the count of nonblank rows, the computed result is the count of nonblank values in the [Phone] column. After the FILTER function is applied to the table, the expression is equivalent to COUNT([Phone]) in this context. Using the Server Timings feature in DAX Studio, you can view a text representation of what is passed to the storage engine. In the case of COUNTX you will see a query with IS NOT NULL in the WHERE clause for the column used as the expression ([Phone] in this case), with the COUNT function selected, since any rows with a blank [Phone] will already have been filtered out.
The statement below is an example query from the Server Timings feature in DAX Studio using the example measure from your question. As you can see, there are two filters in the WHERE clause. The first on the Status column, to return only rows that are active. The second is to eliminate null values in the Phone column. This leaves the COUNT aggregate function to count all rows that have an active Status and a non-blank value in the Phone column, which is equivalent to a count of the Phone column with the active Status. The query here is only a text representation of the requests sent to the storage engine, thus the syntax displayed won't be actual SQL, but will give you a better idea of how the DAX is being processed.
SET DC_KIND="AUTO";
SELECT
COUNT ( )
FROM 'Reseller'
WHERE
'Reseller'[Status] = 'Active' AND
'Reseller'[Phone] IS NOT NULL;
'Estimated size ( volume, marshalling bytes ) : 224012, 1392082'
Usually in DAX the term expression means a column or a formula, so you can use either in COUNTAX/COUNTX or in any other function that is evaluated for each row (generally the functions that end in X :D).
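To make the "column or formula, evaluated for each row" idea concrete, here is a small Python model of the behaviour described above. It is only a sketch: `countx` is a hypothetical stand-in for the DAX function, the reseller rows are invented, and blank is modelled as `None`:

```python
def countx(table, expression):
    """Evaluate `expression` for each row and count the results that are
    not blank (blank is modelled as None)."""
    return sum(1 for row in table if expression(row) is not None)

# Invented sample rows for the 'Reseller' table.
resellers = [
    {"Status": "Active", "Phone": "555-0100"},
    {"Status": "Active", "Phone": None},
    {"Status": "Inactive", "Phone": "555-0199"},
]

# A plain column reference as the expression, applied to a filtered table.
active = [r for r in resellers if r["Status"] == "Active"]
phone_count = countx(active, lambda r: r["Phone"])

# A formula as the expression: it yields blank for rows that should not count.
formula_count = countx(
    resellers,
    lambda r: r["Phone"] if r["Status"] == "Active" else None,
)
print(phone_count, formula_count)
```

In this model the expression never filters rows by itself; it only produces a value per row, and the counting function ignores the blank ones.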
Thank you #userfl89, for the enlightening and analytic explanation.
I see now that <expression> is actually a WHERE clause. Now it starts making sense why Microsoft chose to use the word 'expression' here.
Still, it all is full of contradictions.
The syntax for Count is
COUNT(<column>)
Then, [Phone], in COUNT([Phone]) is of type <column>. I assume this is also just a WHERE clause towards the storage engine?
Compare with COUNTAX(FILTER('Reseller',[Status]="Active"),[Phone]), where [Phone] is of type <expression>.
The real logic in COUNTAX is in fact the FILTER-function.
The <expression> is always a WHERE clause that selects only the non-blank values. OK, it is an 'expression', but it is always the same one. Right?
I cannot but conclude that Microsoft's type system in their syntax descriptions is a bit messy and makes it hard for me (as a beginner with Power BI) to write my own DAX queries based on their documentation.
Or they should come up with better examples to compensate for the confusing syntax descriptions.
I need to define a calculated member in MDX (this is SAS OLAP, but I'd appreciate answers from people who work with different OLAP implementations anyway).
The new measure's value should be calculated from an existing measure by applying an additional filter condition. I suppose it will be clearer with an example:
Existing measure: "Total traffic"
Existing dimension: "Direction" ("In" or "Out")
I need to create a calculated member "Incoming traffic", which equals "Total traffic" with an additional filter (Direction = "In")
The problem is that I don't know MDX and I'm on a very tight schedule (so sorry for a newbie question). The best I could come up with is:
([Measures].[Total traffic], [Direction].[(All)].[In])
Which almost works, except for cells with specific direction:
So it looks like the "intrinsic" filter on Direction is overridden by my own filter. I need an intersection of the "intrinsic" filter and my own. My gut feeling is that it has to do with intersecting [Direction].[(All)].[In] with the intrinsic coordinates of the cell being evaluated, but it's hard to know what I need without first reading up on the subject :)
[update] I ended up with
IIF([Direction].currentMember = [Direction].[(All)].[Out],
0,
([Measures].[Total traffic], [Direction].[(All)].[In])
)
...but at least in SAS OLAP this causes extra queries (to calculate the value for [In]) to be performed against the underlying data set, so I didn't use it in the end.
To begin with, you can define a new calculated measure in your MDX, and tell it to use the value of another measure, but with a filter applied:
WITH MEMBER [Measures].[Incoming Traffic] AS
'([Measures].[Total traffic], [Direction].[(All)].[In])'
Whenever you show the new measure on a report, it will behave as if it has a filter of 'Direction > In' on it, regardless of whether the Direction dimension is used at all.
But in your case, you WANT the Direction dimension to take precedence when it is used, so things get a little messy. You will have to detect whether this dimension is in use and act accordingly:
WITH MEMBER [Measures].[Incoming Traffic] AS
'IIF([Direction].currentMember = [Direction].[(All)].[Out],
([Measures].[Total traffic]),
([Measures].[Total traffic], [Direction].[(All)].[In])
)'
To see if the Dimension is in use, we check if the current cell is using OUT. If so we can return Total Traffic as it is. If not, we can tell it to use IN in our tuple.
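A toy Python model may help show how the two definitions behave for different cells. The traffic figures and function names are invented, and the model only imitates the coordinate-override behaviour described above; it is not how any OLAP engine is implemented:

```python
# Hypothetical fact data: Total traffic by Direction member.
TRAFFIC = {"In": 70, "Out": 30}

def total_traffic(direction="All"):
    """Value of Total traffic at the given Direction coordinate."""
    return sum(TRAFFIC.values()) if direction == "All" else TRAFFIC[direction]

def incoming_fixed(current_direction):
    """Plain tuple: the explicit In member replaces the cell's own Direction."""
    return total_traffic("In")

def incoming_iif(current_direction):
    """IIF version: keep the cell's own coordinate for Out, use In otherwise."""
    if current_direction == "Out":
        return total_traffic(current_direction)
    return total_traffic("In")

for cell in ("All", "In", "Out"):
    print(cell, incoming_fixed(cell), incoming_iif(cell))
```

In this model the IIF version returns the Out figure for Out cells; the version from the question's update returns 0 there instead, so which branch value is right depends on what the report should show.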
I think you should put a column in your Total Traffic fact table for IN/OUT indication & create a Dim table for the IN & Out values. You can then analyse your data based on IN & Out.