How to construct filter tables for SUMMARIZECOLUMNS function?
SUMMARIZECOLUMNS has the following pattern:
SUMMARIZECOLUMNS(
ColumnName1, ...
ColumnNameN,
FilterTable1, -- my question concerns this line
FilterTableN,
Name1, [measure1],
NameN, [measureN]
)
I have checked that the following 3 patterns work. They return the same results, at least for the simple sample data I used.
SUMMARIZECOLUMNS (
T[col],
FILTER( T, T[col] = "red" )
)
SUMMARIZECOLUMNS (
T[col],
CALCULATETABLE( T, T[col] = "red" )
)
SUMMARIZECOLUMNS (
T[col],
CALCULATETABLE ( T, KEEPFILTERS ( T[col] = "red" ) )
)
Is any of these patterns superior to the others?
Reference: https://www.sqlbi.com/articles/introducing-summarizecolumns/
Update
I would be interested in an answer that contains a query plan analysis
or link to credible source. I would be grateful if you mentioned
using the SUMMARIZECOLUMNS function when grouping columns from
multiple tables.
You can also construct them the way PowerBI does, using VAR:
VAR __MyFilterTable = FILTER( T, T[col] = "red" )
RETURN
SUMMARIZECOLUMNS (
T[col],
__MyFilterTable
)
Which is more efficient will depend on the complexity of your filtering, so there is not necessarily a "one size fits all" rule. For a simple table-level filter, just FILTER will suffice. I caution you that line 1, where you're filtering the entire table T, is a bad idea. It's much more performant to filter only a single column. When you filter the entire table, DAX materializes the entire table in memory, while the following materializes just the one value of T[col]:
VAR __MyFilterTable = FILTER( ALL(T[col]), T[col] = "red" ) // This is better.
RETURN
SUMMARIZECOLUMNS (
T[col],
__MyFilterTable
)
You can do even better than that, conceptually. You can basically tell DAX, "I know this is a value, so don't even look in the table for it. Just make me a table and treat it as though I filtered it." Like this:
VAR __MyFilterTable = TREATAS ({"red"}, T[col] )
RETURN
SUMMARIZECOLUMNS (
T[col],
__MyFilterTable
)
Again, this is the pattern that PowerBI uses when performing its filters.
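As a rough sketch of how the pieces fit together in a complete query (for example, one run in DAX Studio), the filter table can be declared in a DEFINE block and then passed to SUMMARIZECOLUMNS alongside a grouping column and an expression. The variable name __ColorFilter and the "Row Count" column are made up for illustration; T and T[col] are the names from the question.

```dax
DEFINE
    -- Hypothetical filter table: one column, one row, with lineage to T[col]
    VAR __ColorFilter =
        TREATAS ( { "red" }, T[col] )

EVALUATE
SUMMARIZECOLUMNS (
    T[col],                        -- group-by column
    __ColorFilter,                 -- filter table
    "Row Count", COUNTROWS ( T )   -- illustrative expression, not from the question
)
```

Additional group-by columns, including columns from related tables, go before the filter table argument.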
BTW, creating the filter tables at the top vs. creating them inline with SUMMARIZECOLUMNS() won't make any difference for speed. In general, avoid using CALCULATETABLE() the way you've done here.
You can also do the following, though you generally aren't likely to see a speed increase:
CALCULATETABLE(
SUMMARIZECOLUMNS (
T[col]
),
KEEPFILTERS(T[col] = "red")
)
Related
I have two tables that I need to combine without merging them. I was told that merged tables do not auto-update when posted, so I had to keep my tables separate. I have Fioptics and legacy tables, and I pull RecordID, JobTypeID and CustTypeID from each individually. Below is how my code looks:
legacy res install =
CALCULATE (
COUNT ( LEGACY[RecordID] ),
FILTER (
LEGACY,
LEGACY[JobTypeID] = 1
&& LEGACY[CustTypeID] = 1
&& LEGACY[prod_grouping] = "legacy"
)
)
and
fioptics res install =
CALCULATE (
COUNT ( FIOPTIC[RecordID] ),
FILTER (
FIOPTIC,
FIOPTIC[JobTypeID] = 1
&& FIOPTIC[CustTypeID] = 1
&& FIOPTIC[prod_grouping] = "fioptics"
)
)
How do I go about using a DAX function to pull RecordID, JobTypeID and CustTypeID from both tables and filter my FIOPTICS and LEGACY tables at the same time? It might be a simple answer, but I'm having a brain fart and may be overlooking a simple solution to this problem.
You can combine those columns in a calculated table:
Combined =
UNION(
SELECTCOLUMNS(
LEGACY,
"RecordID", LEGACY[RecordID],
"JobTypeID", LEGACY[JobTypeID],
"CustTypeID", LEGACY[CustTypeID]
),
SELECTCOLUMNS(
FIOPTICS,
"RecordID", FIOPTICS[RecordID],
"JobTypeID", FIOPTICS[JobTypeID],
"CustTypeID", FIOPTICS[CustTypeID]
)
)
And of course you can filter the combined table in your report to suit your needs.
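If the goal is a single count across both sources, a measure over the calculated table could look like the sketch below. It assumes the calculated table is named Combined, as in the definition above, and that the JobTypeID and CustTypeID conditions from the question still apply; the measure name is made up for illustration.

```dax
combined res install =
CALCULATE (
    COUNT ( Combined[RecordID] ),
    -- same conditions as the two original measures
    Combined[JobTypeID] = 1,
    Combined[CustTypeID] = 1
)
```

Note that Combined, as defined above, does not carry the prod_grouping column, so that condition is not applied here.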
I'm trying to replicate the example presented in this youtube tutorial
https://www.youtube.com/watch?v=z9ttZAZkEhs
However, even if I use the same DAX code the controls do not recognize the values properly.
Selected = calculate (DimProduct[uniqueCustomer], treatas( Values(Products[Name]), DimProduct[EnglishProductName] ) )
I tried different ways to recognize the values coming from the slicer, but they simply do not work.
CheckColumn = if (trim(DimProduct[EnglishProductName]) = trim(DimProduct[SelectedNumber2]),true,false)
I have attached the example file that I'm using.
https://github.com/gabrielacosta/TestPowerBiSlicer/blob/main/testslicer.pbix
Does anyone know what could be the issue?
I guess your calculation should look like the formula below, but that calculation doesn't make much sense.
Selected =
CALCULATE (
[uniqueCustomer],
TREATAS ( VALUES ( Products[Name] ), DimProduct[EnglishProductName] ),
VALUES ( DimProduct[EnglishProductName] )
)
For this specific scenario, just using SELECTEDVALUE is better:
Selected2 =
VAR SelectedProduct =
SELECTEDVALUE ( Products[Name] )
VAR CountCalc =
CALCULATE (
[uniqueCustomer],
FILTER ( DimProduct, DimProduct[EnglishProductName] = SelectedProduct )
)
RETURN
CountCalc
I have created two measures:
Revenue Red 1 = CALCULATE([Revenue], FILTER('Product', 'Product'[Color] = "Red"))
and
Revenue Red 2 = CALCULATE([Revenue], KEEPFILTERS('Product'[Color] = "Red"))
that seem to behave similarly.
The measure behaviour can be investigated by downloading
https://github.com/MicrosoftDocs/mslearn-dax-power-bi/raw/main/activities/Adventure%20Works%20DW%202020%20M06.pbix
and adding the two measures above.
Based on the documentation https://learn.microsoft.com/en-us/dax/keepfilters-function-dax I understand that the CALCULATE filters replace any filters on the same column, whereas the KEEPFILTERS clause always applies no matter what. But I still find this confusing and I wonder what is the best practice to use these two constructs. Any insights will be appreciated.
When you write a measure of the form:
Measure :=
CALCULATE (
[Revenue] ,
'Product'[Color] = "Red"
)
The filter is translated at query time to:
Measure :=
CALCULATE (
[Revenue] ,
FILTER (
ALL ( 'Product'[Color] ) ,
'Product'[Color] = "Red"
)
)
Note that any filter in the current filter context is removed by the ALL function (and not by CALCULATE in and of itself).
If the existing filter context on this column is important, you can invoke KEEPFILTERS to change the semantics so that the filter on the specified column is retained. This means that a measure of the form:
Measure :=
CALCULATE (
[Revenue] ,
KEEPFILTERS ( 'Product'[Color] = "Red" )
)
Is translated to:
Measure :=
CALCULATE (
[Revenue] ,
FILTER (
VALUES ( 'Product'[Color] ) ,
'Product'[Color] = "Red"
)
)
I would like to add something to the comment posted by Marcus because I feel it doesn't answer the question 100%. As Marcus says already, this here:
Revenue Red 2 = CALCULATE([Revenue], KEEPFILTERS('Product'[Color] = "Red"))
becomes this:
Revenue Red 2 = CALCULATE([Revenue], KEEPFILTERS(FILTER(ALL('Product'[Color]), 'Product'[Color] = "Red")))
As Marcus explained already, ALL() removes any filter context on 'Product'[Color], but KEEPFILTERS() lets the filter specified in FILTER() be combined with any other filter that the filter context already places on the same column.
Your question, however, was what the difference between your Red1 and Red2 measures is. As you wrote, the result is the same in your example. As stated in this article here, Red1 iterates the entire Product table, whereas Red2 uses a boolean expression. The latter will usually perform better than passing an iterator/table function like FILTER() into CALCULATE.
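One way to check whether the two measures really agree is to evaluate them side by side for every color, for example in DAX Studio or the DAX query view. This sketch assumes both measures have been added to the model under the names used in the question; the output column names are arbitrary.

```dax
EVALUATE
SUMMARIZECOLUMNS (
    'Product'[Color],
    "Red 1", [Revenue Red 1],   -- FILTER over the whole 'Product' table
    "Red 2", [Revenue Red 2]    -- KEEPFILTERS with a boolean predicate
)
```

Comparing the server timings of two such queries, one per measure, is a reasonable way to see whether the table-level FILTER actually costs more on your data.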
I know this must be extremely simple, but every example I can find online only works within a single table. I've simplified my situation to these two tables:
I want to add a calculated column to the first table, showing the most recent value for that id. It also needs to work with text.
There are a variety of ways to do this kind of thing as I've explained before and all of the solutions there can be adjusted to work in this case.
Doing this as a calculated column and with a second table, you need to make sure you are using row context and filter context appropriately.
Here are a couple of different possibilities that I think may work:
MostRecentValue =
MAXX ( TOPN ( 1, RELATEDTABLE ( Table2 ), Table2[date] ), Table2[value] )
In this one, RELATEDTABLE is doing the work of filtering Table2 to only the rows where id matches Table1.
MostRecentValue =
VAR PrevDate = CALCULATE ( MAX ( Table2[date] ) )
RETURN CALCULATE ( MAX ( Table2[value] ), Table2[date] = PrevDate )
The relationship is more subtle here. Wrapping the MAX in CALCULATE forces a context transition so that the row context (which includes id) is applied to Table2 as filter context.
What is the difference between CALCULATE(m, x=red) and CALCULATE(m, KEEPFILTERS(x=red))?
Apparently they are not the same. I found docs and explanations, but I still do not get it.
https://learn.microsoft.com/en-us/dax/keepfilters-function-dax
https://dax.guide/keepfilters/
Your first measure, without KEEPFILTERS, overrides all other filters applied to field x.
The second measure, using KEEPFILTERS maintains the filter context on field x, and applies the new filter context as a subset of existing filters (or blank, if no overlap of filter contexts).
Here's a simple example PBIX file to demonstrate - play with the colour slicer, and see the difference in the two measures: https://pwrbi.com/so_57850298/
This SQLBI.com article explains it well.
It is helpful to understand a bit more about how simple predicates in CALCULATE are evaluated. The following two expressions are equivalent; in fact, the first is just syntactic sugar for the second - the former is rewritten to the latter behind the scenes:
CALCULATE ( [m], 'T'[Col] = "Red" )
and
CALCULATE (
[m],
FILTER (
ALL ( 'T'[Col] ),
'T'[Col] = "Red"
)
)
FILTER is an iterator that takes a table as its first argument and a predicate to be evaluated in row context as its second argument. It removes any rows from the input table where the predicate is false.
Thus, CALCULATE manipulation of filter context is actually almost entirely manipulations of tables. If you're comfortable with the relational algebra, the tables in args2-N of CALCULATE are tables which are left semijoined to the table(s) being operated on in the expression in CALCULATE's arg1. These semijoins depend on relationships being defined in the data model.
So the pattern of FILTER ( ALL ( 'T'[Col] ), <predicate> ) ignores any external filter context on 'T'[Col] and replaces that with a new filter context that you are defining.
Now for KEEPFILTERS. I am not 100% positive that this is just syntactic sugar, but I believe it is. Either way, the two expressions below are semantically equivalent - they will always return the same values:
CALCULATE ( [M], KEEPFILTERS ( 'T'[Col] = "red" ) )
and
CALCULATE (
[M],
FILTER (
VALUES ( 'T'[Col] ), // this is the only difference from the first expansion
'T'[Col] = "red"
)
)
You can see that the KEEPFILTERS expansion is using VALUES instead of ALL. So, ALL returns all unique values from the named column, ignoring any filter context on that column (it also has other forms where it can operate on more than one column, but that is not relevant to this discussion). VALUES returns the unique values from the named column in the current filter context.
Another way to think of this is as follows. Assume that the value "red" does exist in 'T'[Col]. FILTER ( ALL ( 'T'[Col] ), 'T'[Col] = "red" ) will always return the 1-column, 1-row table of 'T'[Col] with the value "red". FILTER ( VALUES ( 'T'[Col] ), 'T'[Col] = "red" ) will always return a 1-column table, either with 0 or 1 row; if the external filter context includes 'T'[Col]="red", then it will return the 1-row table with 'T'[Col]="red", whereas if the external filter context does not include that value, it will return the empty table.
Again, the table output of the FILTER expressions above is treated as the right side table in a left semi-join.
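The difference between the two expansions can be seen in a small query that evaluates both forms once per value of 'T'[Col]. This is only a sketch: it assumes a measure [m] and a column 'T'[Col] as in the expressions above, and the output column names "Override" and "Keep" are made up.

```dax
EVALUATE
SUMMARIZECOLUMNS (
    'T'[Col],
    -- ALL-based expansion: the row's own 'T'[Col] filter is replaced by "red"
    "Override", CALCULATE ( [m], 'T'[Col] = "red" ),
    -- VALUES-based expansion: "red" is intersected with the row's 'T'[Col] filter
    "Keep", CALCULATE ( [m], KEEPFILTERS ( 'T'[Col] = "red" ) )
)
```

Under these assumptions, "Override" should show the value for red on every row, while "Keep" should be blank everywhere except on the red row.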
Note, especially, that all of the above is based on single columns. You might get thrown for a loop if there are multiple columns contributing filter context. Here is an easy-to-understand example. We define two measures and put them into a table visual with 'DimDate'[Year] and 'DimDate'[Date].
Prior Date =
VAR CurrentDate = MAX ( 'DimDate'[Date] )
RETURN
CALCULATE (
MAX ( 'DimDate'[Date] ),
'DimDate'[Date] = CurrentDate - 1
)
Prior Day ALL DimDate =
VAR CurrentDate = MAX ( 'DimDate'[Date] )
RETURN
CALCULATE (
MAX ( 'DimDate'[Date] ),
ALL ( 'DimDate' ),
'DimDate'[Date] = CurrentDate - 1
)
And here's what they return in our table visual:
Arithmetic with dates is defined in DAX and <date> - 1 will always return the prior date. So CurrentDate - 1 on 2019-01-01 is 2018-12-31. But in our visual, we have filter context coming from both 'DimDate'[Year] and 'DimDate'[Date], so in the first measure, we're calculating MAX ( 'DimDate'[Date] ) in the filter context of 'DimDate'[Year]=2019 and 'DimDate'[Date]=2018-12-31 (the context manipulation in our CALCULATE). There are no rows in 'DimDate' that simultaneously match both of those conditions, so the first version returns blank. The second version clears all filter context coming from 'DimDate', so the only context remaining is what we explicitly apply with 'DimDate'[Date] = CurrentDate - 1.
Note that the example above would only return values for totals when we use KEEPFILTERS.
Prior Date KEEPFILTERS =
VAR CurrentDate = MAX ( 'DimDate'[Date] )
RETURN
CALCULATE (
MAX ( 'DimDate'[Date] ),
KEEPFILTERS ( 'DimDate'[Date] = CurrentDate - 1 )
)
This only works on totals, because at a detail level there is no way for KEEPFILTERS ( 'DimDate'[Date] = CurrentDate - 1 ) to return any values. It's saying, essentially, "find me a date that is one less than itself", which is obviously impossible. But at a grand total level, there are many dates in context, so we're filtering a table of many contiguous dates, and our measure can return something for totals.