Understanding the DAX CALCULATE function

I am working on a model in Power BI that has two datasets:
Set_1 (just a list of each group name)
Group:
1
2
3
and Set_2, a bunch of values per group in a different dataset:
Group: Value:
1 10
1 20
1 -7
2 100
2 -25
3 45
3 15
1 3
The tables are related by Group. I want to create a calculated column on Set_1 that shows the sum of the values by group from Set_2. I can do so with the following DAX formula:
GroupSum = CALCULATE(SUMX(Set_2, Set_2[Value]))
The result looks like this:
Group: GroupSum:
1 26
2 75
3 60
But I don't understand why the CALCULATE function, which isn't passed any filter arguments here, works the way it does in this instance. Without the CALCULATE function,
GroupSum = SUMX(Set_2, Set_2[Value])
looks like this:
Group: GroupSum:
1 161
2 161
3 161
That makes sense. I just need help understanding how the CALCULATE function works, specifically when it isn't passed any filter arguments.
EDIT: The answer lies in the concept of "context transition" as pointed out below.

Using the CALCULATE function makes DAX perform a context transition.
CALCULATE transforms all existing row contexts into an equivalent filter context before applying its filter arguments to the original filter context.
For more detail on this, check out the site I quoted above:
Understanding Context Transition.
In your example, the value in the Group column of each row acts as a filter when you use CALCULATE, as if you had written something like CALCULATE(SUM(Set_2[Value]), Set_2[Group] = 1). Even though there is no explicit filter argument, the row context becomes a filter.
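To see the equivalence concretely, here is a minimal sketch, assuming (as the repeated 161 suggests) that GroupSum is a calculated column on Set_1 and the tables are related on Group; the name GroupSum2 is mine. RELATEDTABLE is defined as CALCULATE applied to a table expression, so both versions perform the same context transition and return the same per-group sums:

-- context transition triggered explicitly by CALCULATE (the original formula)
GroupSum = CALCULATE(SUMX(Set_2, Set_2[Value]))

-- equivalent: RELATEDTABLE wraps its table argument in CALCULATE for you
GroupSum2 = SUMX(RELATEDTABLE(Set_2), Set_2[Value])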

Related

Return Slicer's Value (trade simulator)

I work with a single table (called sTradeSim) that I created in Power Query. It has 3 columns (Fund1, Fund2, Fund3), each holding values from -10 to 10 in increments of 1.
I also have three separate slicers, each created using the "Greater than or equal to" option. Each slicer has a field assigned to it: Slicer 1 = Fund1, Slicer 2 = Fund2, Slicer 3 = Fund3. Below is a screenshot of Slicer 1.
Right next to these three slicers is a table with three rows. For each row, I would like to retrieve the value of the respective slicer. So the desired result would look like:
Row No 1 = -10.00 (the value of Slicer 1),
Row No 2 = -2.00 (the value of Slicer 2),
Row No 3 = 3.00 (the value of Slicer 3).
Unfortunately, the DAX formula that I have developed always returns 3.00 (the value of the third slicer).
I have tried to find a solution on the forum and to combine my SWITCH formula with ALL, ALLEXCEPT, SELECTEDVALUE, etc., but it seems like I'm missing something very basic.
mHV_Trades =
SWITCH(
    MAX(FundTable[FundsRanked]),
    1, MIN(sTradeSim[Fund1]),
    2, MIN(sTradeSim[Fund2]),
    3, MIN(sTradeSim[Fund3])
)
What you are trying to do doesn't work because, when you place a filter on any column of the table, it filters whole rows. So when you apply the filter Fund1 = -10, it also restricts the rows that supply the values for Fund2 and Fund3.
You have two options:
1. Create independent (disconnected) tables, each with values from -10 to 10.
2. Create a single table with every combination of the -10 to 10 values for all funds.
For your example with 3 funds, option 2 works quite nicely: all combinations of the 21 values from -10 to 10, raised to the power of 3, give roughly 10k records (21^3 = 9,261). The problem with this solution is that the table grows exponentially with the number of funds, so you will run out of space quite quickly. A sketch of option 2 follows below.
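A minimal sketch of option 2 as a DAX calculated table (the table name FundCombos is mine, and using it as the slicer source is an assumption, not something stated in the original answer):

FundCombos =
CROSSJOIN(
    SELECTCOLUMNS(GENERATESERIES(-10, 10, 1), "Fund1", [Value]),
    SELECTCOLUMNS(GENERATESERIES(-10, 10, 1), "Fund2", [Value]),
    SELECTCOLUMNS(GENERATESERIES(-10, 10, 1), "Fund3", [Value])
)

With option 1 (independent tables), each slicer would instead get its own single-column table, and the SWITCH measure could read each slicer's selection independently with MIN or SELECTEDVALUE over that table.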

Why does my measure with SumX return this result?

I have 2 tables:
**Partners**
PartnerID Name
1 AAAA
2 BBBB
3 CCCC
4 DDDD
**Sales**
PartnerID SaleAmount
1 15
2 20
3 30
4 40
1 15
I have a visual with PartnerID from my Partners table and this measure:
TotalSalesMeasure: Sumx(Partners, Sum(Sales[SaleAmount]))
**Resulting table visual with measure**
PartnerID TotalSalesMeasure
1 30
2 20
3 30
4 40
What's confusing me is how the results are derived. It's my understanding that:
-The Partners table is filtered using the incoming filter context (PartnerID)
-The filtered Partners table is iterated, and Sum(Sales[SaleAmount]) is evaluated for each row
-After the iteration is done, the per-row results are summed
First row, for example:
-Partners is filtered to one row based on the PartnerID
-Since Sum uses the row context and not a filter context on the Partners table, it should sum the entire Sales[SaleAmount] column one time
-That should result in 15+20+30+40+15 = 120, but it shows 30
I was basing this on a video here at around the 49 min mark where he does a similar operation:
https://www.youtube.com/watch?v=1yWLhxYoq88
What's odd is that if I do the same thing he does later, wrapping Sum with Calculate, I get the same result (aside from the totals row). In fact, my result seems to be what Calculate would return (again, outside of the grand total).
I'm obviously confusing something, but I don't know what it is.
Edit:
I think I know what's going on now. The external filter context is applied to the Sales table before summing; I didn't realize it propagated there as well.
SumX(
    Partners,                  <---Affected by the exterior filter context
    Sum(Sales[SaleAmount])     <---Affected by the exterior filter context (the row context alone does not filter it)
)
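A hedged way to see what the edit describes (the second measure name is mine): wrapping the inner SUM in CALCULATE forces a context transition, so each Partners row filters Sales. In a per-partner visual row both versions agree, because the exterior filter context already restricts Sales; at the grand total they diverge:

-- no context transition: at the grand total every one of the 4 Partners rows
-- sums the whole Sales table (120), so SUMX returns 4 * 120 = 480
TotalSalesMeasure = SUMX(Partners, SUM(Sales[SaleAmount]))

-- context transition: each Partners row filters Sales to that partner,
-- so the grand total is 30 + 20 + 30 + 40 = 120
TotalSalesWithCalculate = SUMX(Partners, CALCULATE(SUM(Sales[SaleAmount])))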

DAX selecting and displaying the max value of all selected records

Problem
I'm trying to calculate and display the maximum value of all selected rows alongside their actual values in a table in Power BI. When I try to do this with the measure MaxSelectedSales = MAXX(ALLSELECTED(FactSales), FactSales[Value]), the maximum value ends up being repeated, like this:
If I add additional dimensions to the output, even more rows appear.
What I want to see is just the selected rows in the fact table, without the blank values. (i.e., only four rows would be displayed for SaleId 1 through 4).
Does anyone know how I can achieve my goal with the data model shown below?
Details
I've configured the following model.
The DimMarket and DimSubMarket tables have two rows each; you can see their names above. The FactSales table looks like this:
SaleId  MarketId  SubMarketId  Value  IsCurrent
1       1         1            100    true
2       2         1            50     true
3       1         2            60     true
4       2         2            140    true
5       1         1            30     false
6       2         2            20     false
7       1         1            90     false
8       2         2            200    false
In the table output, I've filtered FactSales to only include rows where IsCurrent = true by setting a visual level filter.
Your max value (the measure) is a scalar value (a single value only). If you put a scalar value in a table with the other records, the value just gets repeated; in general, mixing scalar values and records (tables) does not bring much benefit.
Measures like yours are better displayed in a KPI or Multi KPI visual (typically per year, so that you get the max value per year).
If you just want to display the max value of selected rows (for example a filter in your table), use this measure:
Max Value = MAX(FactSales[Value])
This way, all filters that are applied are considered in the measure's calculation.
I've found a solution to my problem, but I'm slightly concerned with query performance, although on my current dataset things seem to perform fairly well.
MaxSelectedSales =
MAXX(
    FILTER(
        SELECTCOLUMNS(
            ALLSELECTED(FactSales),
            "id", FactSales[SaleId],
            "max", MAXX(ALLSELECTED(FactSales), FactSales[Value])
        ),
        [id] = MAX(FactSales[SaleId])
    ),
    [max]
)
If I understand this correctly: for every row in the output, this measure calculates the maximum value across all selected FactSales rows, writes it to a column named max, and then filters the derived table so that only the row for the current FactSales[SaleId] remains. The performance hit comes from the fact that the inner MAXX is executed for every row in the output, doing a full scan of the selected rows each time.
Posted on behalf of the question asker
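If performance becomes a concern, a lighter sketch that should produce the same visible result on this model (an alternative, not from the original answers): return the selected-rows maximum only when the current row actually has FactSales rows, which suppresses the extra blank rows without building a derived table for every cell:

MaxSelectedSales =
IF(
    NOT ISEMPTY(FactSales),                           -- current row has sales rows
    MAXX(ALLSELECTED(FactSales), FactSales[Value])    -- max over all selected rows
)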

Sorting between groups based on a variable other than the one grouped on

I would like to use Pandas groupby to sort groups according to a value within each group. This value is not the one used for the grouping.
I am working with public transport data which tells me the stops and arrival times of different bus trips. Here is a sample of the dataframe (called stopTimes):
trip_id stop_sequence arrival_time
1 3 15:08:00
2 2 16:01:00
1 1 09:00:40
2 3 16:45:00
2 1 07:05:30
1 2 12:03:00
I would like to sort the trips according to the arrival time at the first stop. So the result of the sorting for the above dataframe would be:
trip_id stop_sequence arrival_time
2 1 07:05:30
2 2 16:01:00
2 3 16:45:00
1 1 09:00:40
1 2 12:03:00
1 3 15:08:00
I have been able to achieve this result already by:
# trip_ids ordered by the arrival time at each trip's first stop
timeSortedTrips = stopTimes.loc[stopTimes['stop_sequence'] == 1].sort_values('arrival_time')['trip_id']
# make trip_id categorical so that sorting follows that trip order
stopTimes['trip_id'] = pd.Categorical(stopTimes['trip_id'], timeSortedTrips)
stopTimes = stopTimes.sort_values(['trip_id', 'arrival_time'])
However, I am curious: can I achieve this using groupby? If so, would it be more efficient? Additionally, I am new to Python, so if you have even better ideas to do this sorting please point me in that direction.
You can group by trip_id and, within each group, sort by arrival_time:
stopTimes['arrival_time'] = pd.to_datetime(stopTimes['arrival_time'])
stopTimes = stopTimes.groupby('trip_id', group_keys=False).apply(lambda x: x.sort_values('arrival_time'))
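If you want the groups themselves ordered by the arrival time at the first stop, as in the question, here is a groupby-based sketch. It assumes arrival times increase along each trip, so the per-trip minimum equals the first stop's arrival time:

import pandas as pd

# sample frame from the question
stopTimes = pd.DataFrame({
    'trip_id': [1, 2, 1, 2, 2, 1],
    'stop_sequence': [3, 2, 1, 3, 1, 2],
    'arrival_time': ['15:08:00', '16:01:00', '09:00:40',
                     '16:45:00', '07:05:30', '12:03:00'],
})
stopTimes['arrival_time'] = pd.to_datetime(stopTimes['arrival_time'])

# earliest arrival per trip, broadcast back to every row of that trip
stopTimes['first_arrival'] = (
    stopTimes.groupby('trip_id')['arrival_time'].transform('min')
)

# order trips by their first arrival, then rows within each trip
stopTimes = (
    stopTimes.sort_values(['first_arrival', 'arrival_time'])
             .drop(columns='first_arrival')
)
print(stopTimes)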

Dynamic Rolling Window in SAS for correlation calculation

Problem: I have a data set as below -
Comp date time returns
1 12-Aug-97 10:23:38 0.919292648
1 12-Aug-97 10:59:43 0.204139521
1 13-Aug-97 11:03:12 0.31909242
1 14-Aug-97 11:10:02 0.989339371
1 14-Aug-97 11:19:27 0.08394389
1 15-Aug-97 11:56:17 0.481199854
1 16-Aug-97 13:53:45 0.140404929
1 17-Aug-97 10:09:03 0.538569786
2 14-Aug-97 11:43:49 0.427344962
2 14-Aug-97 11:48:32 0.154836294
2 15-Aug-97 14:03:47 0.445415114
2 15-Aug-97 9:38:59 0.696953041
2 15-Aug-97 13:59:23 0.577391987
2 15-Aug-97 9:10:12 0.750949097
2 15-Aug-97 10:22:38 0.077787596
2 15-Aug-97 11:07:57 0.515822161
2 16-Aug-97 11:37:26 0.862673945
2 17-Aug-97 11:42:33 0.400670247
2 19-Aug-97 11:59:34 0.109279307
These are simply share price returns for every company at a date and time level.
I need to calculate the autocorrelation (order 1) of returns over a period of 10 days for each combination of Comp and date. As you can see, my time series is not continuous; it has breaks for weekends and public holidays. So if I need to take a 10-day range, I can't use the INTNX function, because adding 10 days to the date column might include a Saturday/Sunday for which I don't have data, and hence my autocorrelation value would be compromised. How do I make this range dynamic?
I found this question Calculating rolling correlations in SAS that I thought might help but then again, there is the same intnx problem.
You can use the INTERVALDS system option to define a custom interval that fits your needs. See this article for more details.
The basic concept is that you create a dataset containing all of your possible dates (or datetimes) and define an interval value for each one, then tell SAS via the system option to use that dataset when you use a particular interval name. Then use INTNX as normal.
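A minimal sketch of that approach (the interval and dataset names are mine; the input dataset is assumed to be called have with a date column, and the custom-interval dataset must contain a variable named BEGIN):

/* one row per unique trading date */
proc sort data=have(keep=date) out=TradingDays nodupkey;
    by date;
run;

data TradingDays;
    set TradingDays;
    begin = date;          /* BEGIN is the variable name the option expects */
    format begin date9.;
run;

options intervalds=(TradingDay=TradingDays);

/* INTNX now steps in trading days: -9 gives a 10-trading-day window
   including the current date */
data want;
    set have;
    fromDate = intnx('TradingDay', date, -9);
    format fromDate date9.;
run;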
Otherwise, you could just do a PROC FREQ of your data to get the unique days, and then use that to create a day counter; then, instead of creating your fromDate with INTNX, you can use SQL to grab the rows whose day counter is within 10 of the current row's (a sketch follows below).
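And a sketch of the day-counter alternative (again assuming an input dataset have with columns Comp, date, time, and returns):

/* unique trading days, numbered in calendar order */
proc sort data=have(keep=date) out=days nodupkey;
    by date;
run;

data days;
    set days;
    day_num = _n_;
run;

/* attach the counter, then pair each anchor row with all rows of the
   same company whose counter falls in the 10-trading-day window */
proc sql;
    create table have_n as
    select a.*, b.day_num
    from have as a inner join days as b
        on a.date = b.date;

    create table windows as
    select a.comp, a.date as anchor_date, b.date, b.time, b.returns
    from have_n as a inner join have_n as b
        on a.comp = b.comp
        and b.day_num between a.day_num - 9 and a.day_num;
quit;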