Aggregating a dimension table by fact table in PowerBI

I have a hierarchy of three Power BI tables.
Courses:
Course ID    Course credits
C1           2
C2           3
Practical Courses:
Practical Course ID    Course ID    Year    Number of Students
C1.22                  C1           2022    10
C1.23                  C1           2023    15
C2.21                  C2           2021    17
Practical course lecturers:
Lecturer name    Practical Course ID
Jack             C1.22
Jack             C1.23
Jill             C1.22
Jill             C2.21
Note that one course may have multiple practical courses and that one practical course may have multiple lecturers.
I would like to find the total number of credits and students for each lecturer. For the given sample data, I would like to get the following results:
Lecturer name    Credits    Students
Jack             4          25
Jill             5          27
The problem is that the filters go the "wrong" way. I somehow need to aggregate the metadata filtered by the fact table value.
Is there a simple way of doing this without resorting to bidirectional filters? This is a general problem I encounter in multiple forms.
Edit: I was asked whether the thread "Avoiding bidirectional filter for a matrix report" answers my question.
Two answers are suggested there:
1. Use a cross-filter to define an ad-hoc filter for this specific query.
2. Use a common field that appears in the fact table.
I don't see how these answers apply to this question. In particular, it seems cumbersome to set two cross-filters to connect the fact table with the top-level table (and there may be more than two levels). The second suggestion isn't relevant since there is no common field between the first and third tables.

@peter is correct. I'll make the following points.
You have a snowflake schema. That is why "it seems cumbersome to set two cross-filters to connect the fact table with the top-level table".
Your sample data doesn't match your results. I have corrected this in my example below.
Here is how you achieve your desired result.
Measure 1:
Students =
CALCULATE(
    SUM('Practical Courses'[Number of Students]),
    CROSSFILTER('Practical Courses'[Practical Course ID], 'Practical Course Lecturers'[Practical Course ID], Both)
)
Measure 2:
Credits =
SUMX(
    'Practical Course Lecturers',
    CALCULATE(
        SUM(Courses[Course credits]),
        CROSSFILTER('Practical Courses'[Practical Course ID], 'Practical Course Lecturers'[Practical Course ID], Both),
        CROSSFILTER(Courses[Course ID], 'Practical Courses'[Course ID], Both)
    )
)
There are three ways to aggregate a dimension from a fact table if you don't use bidirectional filters:
1. Pass the whole fact table as a filter in CALCULATE (not good)
2. Use CROSSFILTER in CALCULATE
3. Use TREATAS()
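The TREATAS option could look roughly like the sketch below. It reuses the table and column names from the measures in this answer; the measure names are made up for illustration, and it assumes the default single-direction relationships are left as they are. The lecturer's practical course IDs are read from the current filter context and applied as a filter on 'Practical Courses', and the credits are then picked up per practical course with RELATED.

```dax
-- Sketch only: measure names are hypothetical.
Students (TREATAS) =
CALCULATE (
    SUM ( 'Practical Courses'[Number of Students] ),
    TREATAS (
        -- practical course IDs visible for the current lecturer
        VALUES ( 'Practical Course Lecturers'[Practical Course ID] ),
        'Practical Courses'[Practical Course ID]
    )
)

Credits (TREATAS) =
CALCULATE (
    -- one credit value per practical course, looked up on the one-side table
    SUMX ( 'Practical Courses', RELATED ( Courses[Course credits] ) ),
    TREATAS (
        VALUES ( 'Practical Course Lecturers'[Practical Course ID] ),
        'Practical Courses'[Practical Course ID]
    )
)
```

Note that when no single lecturer is selected, both measures count each practical course once, so a grand total can differ from the SUMX-based Credits measure, which counts a practical course once per lecturer.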

Related

How is it possible for DAX syntax to reference the original table name when using table variables?

This question comes from an example that I'm trying to understand in The Definitive Guide to DAX, Second Edition chapter 4. If you want the sample Power BI file, you can download it from the website above; it's Figure 4-26 in chapter 4. Here is the DAX code:
Correct Average =
VAR CustomersAge =
    SUMMARIZE (                -- Existing combinations
        Sales,                 -- that exist in Sales
        Sales[CustomerKey],    -- of the customer key and
        Sales[Customer Age]    -- the customer age
    )
RETURN
    AVERAGEX (                 -- Iterate on list of
        CustomersAge,          -- Customers/age in Sales
        Sales[Customer Age]    -- and average the customer’s age
    )
I understand the logic behind how SUMMARIZE and AVERAGEX are used in this example, and the requirements are all clear. What's confusing to me is how AVERAGEX references Sales[Customer Age]. Since AVERAGEX is operating on the summarized CustomersAge table variable, I would have assumed that the syntax would have been something along the lines of:
AVERAGEX (
    CustomersAge,
    [Customer Age]    -- This is the line that I assumed would be different
)
How is it that the code given in the book is correct? Does the table variable (and the summarized table it contains) somehow have pointers to the original underlying table and column names? And is that normal for writing DAX queries, to always reference the original underlying table and column names when using table variables for intermediate steps?

Yes, the columns have what's known as data lineage. Sometimes you even have to restore lineage if it gets lost. You can read more about it here: https://www.sqlbi.com/articles/understanding-data-lineage-in-dax/

Lars, To the best of my understanding this is how I can explain it.
Creating a variable doesn't create a table that is added to the model. You can think of variables as steps or placeholders in a series of DAX expressions.
In the case of the SUMMARIZE used in the CustomersAge variable in this code, the arguments of SUMMARIZE reference actual columns in the model. So when you perform calculations on that variable, the columns you can access are the actual columns in the model rather than new columns.
What the variable has done is help you break down the process of writing the calculation and make it less complex.
The code you expected would have been valid if, in the CustomersAge variable, we had created a new column, say Age * 2, and needed to average over that. In that case the new column isn't part of the model, so we'd reference it the way you wrote.
I just got my copy of the book but I hope this helps a bit.
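To illustrate the case described above, here is a sketch in which the variable does contain a column that is not in the model. The measure name and the "@DoubleAge" column are made up for illustration; Sales[CustomerKey] and Sales[Customer Age] are the columns from the book's example.

```dax
-- Sketch only: "Average Double Age" and "@DoubleAge" are hypothetical names.
Average Double Age =
VAR CustomersAge =
    ADDCOLUMNS (
        SUMMARIZE ( Sales, Sales[CustomerKey], Sales[Customer Age] ),
        "@DoubleAge", Sales[Customer Age] * 2    -- new column, not part of the model
    )
RETURN
    AVERAGEX (
        CustomersAge,
        [@DoubleAge]    -- new column: referenced by name only, no table prefix
    )
```

The columns that came from Sales are still written as Sales[...] inside the iteration, while the added column has no table to qualify it with.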

Filter one sided table in many to one relationship

I would like two measures that SUM the Sales[Value] for all the Sales[ID] that have a specific StatusID in SalesStatus.
One that can filter on Sales[Date], and one that can filter on SalesStatus[statusDate]
Diagram
Regards,
Anders

In this scenario I would consider modifying your model to have only two tables by combining what appear to be two fact tables (Sales and SalesStatus). Depending on what your data consists of, I would either UNION the two tables after joining and then treat the date in your Sales table as another status date (i.e. shipped complete or sale finished, whatever that date represents), or I would join the two tables and have two relationships to the date table.
This will create a data duplication issue, because the value column will end up in your final fact table. If you go with the union option, you can force the user to select a single sales status, which effectively removes the sales duplication. If you end up with two relationships to the date table, you can use the USERELATIONSHIP() function to write the two different sales measures, and the one that uses the date from the Sales table will need some clever tricks to ensure the measure does not duplicate data. I would try to UNION the tables though.
For more details, I would research what are referred to as semi-additive fact tables in data warehousing. There is a great article from SQLBI on the subject. I have tried setting up models like the one you diagrammed, and even when I could get them to work through intense DAX measures, they would produce unexpected results and perform poorly. I find the semi-additive fact table pattern to be a much cleaner solution once you get past the data duplication that results.
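As a rough sketch of the two-relationship option, the two measures might look something like this. Everything about the model here is an assumption, since the original diagram is not available: a combined Sales table with one row per sale and status, carrying [Value], [Date], [StatusID] and [StatusDate]; a 'Date' table with an active relationship to Sales[Date] and an inactive one to Sales[StatusDate]; and the status value 3 as a placeholder.

```dax
-- Sketch only: table layout, relationships and the status value 3 are assumptions.
Sales by Sale Date =
CALCULATE (
    SUM ( Sales[Value] ),
    Sales[StatusID] = 3    -- a single status, so each sale is counted once
)

Sales by Status Date =
CALCULATE (
    SUM ( Sales[Value] ),
    Sales[StatusID] = 3,
    -- switch to the inactive relationship on the status date
    USERELATIONSHIP ( Sales[StatusDate], 'Date'[Date] )
)
```

Restricting both measures to one status is what keeps the value from being summed once per status row under these assumptions.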

Calculated column returning all results

I have a colleague table (named 'colleagues') which lists all of the colleagues in the department. I have another table (named 'cases') which lists all of the cases worked by all of the colleagues. In the cases table, there is a column called 'outcome' which will be either Good, Satisfactory or Bad depending on how well the case was dealt with.
colleagues table:
cases table:
There is a one-to-many relationship between the colleague table and the cases table. I am trying to create a calculated column in the colleague table which will count how many 'Good' outcome cases they had in total. This is the calculated column formula I have:
CALCULATE(COUNTROWS('cases'), FILTER('cases', 'cases'[outcome]="Good"))
This calculated column is just adding all of the 'Good' cases for everyone rather than just the individual colleague. See calculated column below with column name 'Good':
The expected behavior is that the column would calculate the number of 'Good' cases each colleague had. This is the expected outcome:

Create a calculated column as follows:
CountValues =
CALCULATE(COUNTROWS('cases'),'cases'[outcome]="good")
p.s. If this (or another) answer helps you, please take a moment to "accept" the answer that helped by clicking on the check mark beside the answer to toggle it from "greyed out" to "filled in".

Another solution that doesn't involve CALCULATE is
COUNTROWS( FILTER( RELATEDTABLE('cases'), 'cases'[outcome]="Good" ) )

Can a measure be created in Power BI that references two tables that share no relationship?

I'm trying to create a matrix table in Power BI to display the monthly rent projections for a number of properties. I thought I could simply create a measure that summed the rent from one table and then displayed it by month based on start and end date conditions, but it's been a while since I created any measures and I had forgotten that there needs to be a relationship between columns, among other things.
Data Model
A site can have more than one lease associated with it and a lease can have both car-parks and floors associated with it, of which there can be multiple.
In addition to the tables in the linked image, once I had sorted out what I thought would be the easy step I was going to add another table which includes the estimated percentage rent increase and the period in which the increase will occur.
I started out by trying to create a measure along the lines of the following:
Matrix Test =
IF (
    HASONEVALUE ( Period[Month] ),
    IF (
        Period[Month] >= Leases[Custom Start Date],
        SUM ( Floor_Rent[Annual Rent] ) / 12,
        0
    ),
    0
)
This would need to be expanded upon because the end date of a lease would also need to be taken into consideration.
As well as forgetting about the relationship requirements, I've forgotten how to deal with the issue of narrowing down to a single value within a column.
The result is supposed to be something that looks like this:
The blanks indicate a lease that starts in the future or ends within the time-frame displayed.
When I try linking the Leases table and the Period table on Leases[Start month for current term] and Period[Month] all I can get to is a table that shows the rent amount in the month the lease starts.
Is what I'm trying to achieve possible? If so, how do I accomplish the desired result?
Link to .pbix file
Solution
The direct answer to the title question is probably 'no', but while trying to figure out how I could use Pratik Bhavsar's LOOKUPVALUE suggestion I had a thought and performed a clumsy google search - power bi create table for each value in column - and found this post. By meddling with some of the DAX in said post I was able to come up with the following:
Test Table =
GENERATE(
    SELECTCOLUMNS(
        VALUES(Leases[Lease ID]), "Lease ID", [Lease ID]
    ),
    SELECTCOLUMNS(
        VALUES(Period[Month]), "Month", [Month]
    )
)
The result is a table with each Lease ID mapped against each Month. I can't claim to understand exactly how the functions work, and it's not the outcome I thought I needed, but it allows me to achieve exactly what I set out to do.
I've accepted Pratik Bhavsar's answer because it effectively accomplishes the same thing as the work around I implemented. Pratik's solution might be better than what I eventually landed on, but I need to have a closer look at how the two compare.

The following DAX will give you a table with all buildings mapped against all rows in the period table, eliminating the requirement of a relationship.
SiteToPeriod =
CROSSJOIN(
    SELECTCOLUMNS(Sites1, "Building name/label", Sites1[Building name/label]),
    Period
)
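As a side note on the title question, a measure can compare values from tables that share no relationship, because it can read the selected month into a variable and test each lease row against it. The following is only a sketch under several assumptions: Leases[Custom End Date] is a hypothetical column (only the start date appears in the question), Period[Month] and the lease dates are comparable date values, and Leases has a one-to-many relationship to Floor_Rent. The measure name is also made up.

```dax
-- Sketch only: Leases[Custom End Date] and the measure name are hypothetical.
Monthly Rent =
VAR CurrentMonth = SELECTEDVALUE ( Period[Month] )
RETURN
    IF (
        HASONEVALUE ( Period[Month] ),
        SUMX (
            -- leases active in the selected month; no relationship to Period needed
            FILTER (
                Leases,
                Leases[Custom Start Date] <= CurrentMonth
                    && Leases[Custom End Date] >= CurrentMonth
            ),
            -- context transition filters Floor_Rent to the current lease
            CALCULATE ( SUM ( Floor_Rent[Annual Rent] ) ) / 12
        )
    )
```

When no lease is active in a month, the measure returns blank rather than 0, which matches the blanks in the desired matrix.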

How to display data from different data source tables in a single table in Power BI

I have a couple of different tables in my report. For demonstration purposes, let's say that I have one data source that is actual invoice amounts and another table that is forecasted amounts. Each table has several dimensions that are the same between them, let's say Country, Region, Product Classification and Product.
What I want is to be able to display a table/matrix that pulls information from both of these data sources like this
Description               Invoice   Forecast   vs Forecast
USA                           300        325           92%
  East                        150        175           86%
    Product Grouping 1        125        125          100%
      Product 1                50         75           67%
      Product 2                75         50          150%
    Product Grouping 3         25         50           50%
      Product 3                25         50           50%
  West                        150        150          100%
    Product Grouping 1         75        100           75%
      Product 1                25         50           50%
      Product 2                50         50          100%
    Product Grouping 3         75         50          150%
      Product 3                75         50          150%
I have not been able to figure out a way to combine the information from the multiple data source into a single matrix table, so any help would be appreciated. The one thing that I did find was somebody hard coded the structure of the rows into a separate data source and then used DAX expressions to pull in the pieces of information into the columns, but I don't like this solution because the structure of the rows is not constant.

What you're asking about is a common part of the star schema: combining facts from different fact tables together into a single visual or report.
What Not To Do (That You Might Be Tempted To)
What you don't want to do is combine the 2 fact tables into a single table in your Power BI data model. That's a lot of work and there's absolutely no need, especially since there are likely dimensions that the 2 fact tables do not have in common (e.g. actual amounts might be associated with a customer dimension, but forecast amounts wouldn't be).
What you also don't want to do is relate the 2 fact tables to each other in any way. Again, that's a lot of work. (Especially since there's no natural way to relate them at the row level.)
What To Do
Generally, how you handle 2 fact tables is the same as you handle a single fact table. First, you have your dimensions (country, region, classification, product, date, customer). Then you load your fact tables, and join them to the dimensions. You do not join your fact tables to each other. You then create measures (i.e. DAX expressions).
When you want to combine measures from the two facts together in a single matrix, you only use rows/columns that are meaningful to both fact tables. For example, actual amounts might be associated with a customer, but forecast amounts aren't. So you can't include customer information in the matrix. Another possibility is that actual amounts are recorded each day, whereas forecasts were done for the whole month. In this situation, you could put month in your matrix (since that's meaningful to both), but you wouldn't want to use date because Power BI wouldn't know how to divide up forecasts to individual dates.
As long as you're only using dimensions & attributes that are meaningful to both fact tables, you can easily create a matrix as you envision above. Simply drag on the attributes you want, then add the measures (i.e. DAX expressions).
The Invoice & Forecast columns would both be measures. The two measures from different fact tables can be combined into a 3rd measure for the vs. Forecast measure. Everything will work as long as you're just using dimensions/attributes that mean something to both fact tables.
I don't see anything in your proposed pivot table that strikes me as problematic.
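As a minimal sketch of those three measures: the table and column names Invoices[Amount] and Forecasts[Amount] are assumptions (the question doesn't give them), and both fact tables are assumed to be related to the shared Country, Region, Product Classification and Product dimensions.

```dax
-- Sketch only: Invoices[Amount] and Forecasts[Amount] are hypothetical names.
Invoice = SUM ( Invoices[Amount] )

Forecast = SUM ( Forecasts[Amount] )

-- DIVIDE returns blank instead of an error when the forecast is 0 or blank
vs Forecast = DIVIDE ( [Invoice], [Forecast] )
```

Formatted as a percentage, the third measure would give 300 / 325, roughly 92%, on the USA row of the example matrix.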
Other Situations
If you have a situation where forecasts are at a month level and actual is at a date level, then you may be wondering how you'd relate them both to the same date dimension. This situation is called having different granularities, and there's a good article here I'd recommend reading that has advice: https://www.daxpatterns.com/handling-different-granularities/. Indeed, there's a whole section on comparing budget with revenue that you might find useful.
Finally, you mention that someone hard-coded the structure of the rows and used DAX expressions to build everything. This does, admittedly, sound like overkill. The goal with Power BI is flexibility. Once you have your facts, measures & dimensions, you can combine them in any way that makes sense. Hard-coding the rows eliminates that flexibility, and is a good clue that something isn't right. (Another good clue that something isn't right is when DAX expressions seem really complicated for something that should be easy)
I hope my answer helps. It's a general answer since your question is general. If you have specific questions about your specific situation, definitely post additional questions. (Sample data, a description of the model, the problem you're seeing, and what you want to see is helpful to get a good answer.)
If you're brand new to Power BI, data models, and the star schema, Alberto Ferrari and Marco Russo have an excellent book that I'd recommend reading to get a crash course: https://www.sqlbi.com/books/analyzing-data-with-microsoft-power-bi-and-power-pivot-for-excel/
