About table expansion in regular relationships in Power BI

I am trying to understand the Microsoft Power BI documentation page "Understand Model Relationships". I understood most of it, except for table expansion in regular relationships and limited relationships. Below is the paragraph from the document that I am referring to:
"At query time, regular relationships permit table expansion to take place. Table expansion results in the creation of a virtual table by including the native columns of the base table and then expanding into related tables. For Import tables, it's done in the query engine; for DirectQuery tables it is done in the native query sent to the source database (as long as the Assume referential integrity property isn't enabled). The query engine then acts upon the expanded table, applying filters and grouping by the values in the expanded table columns."
The paragraph above appears under the heading "Regular Relationships". Would anyone please help me understand what it means? Thank you for your time.
Please find the link to the document in this post
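For what it's worth, one rough way to picture the quoted paragraph is as a left outer join from the base (many-side) table to its related (one-side) table, after which filtering and grouping run on the widened rows. The sketch below is plain Python and only an illustration of that idea, not what the Power BI engine actually executes; the Sales and Product tables and all column names are made up.

```python
# Illustration only: models "table expansion" as a left outer join.
# The tables and column names are hypothetical.

product = [
    {"ProductID": 1, "Category": "Bikes"},
    {"ProductID": 2, "Category": "Helmets"},
]
sales = [
    {"SaleID": 10, "ProductID": 1, "Amount": 500},
    {"SaleID": 11, "ProductID": 1, "Amount": 700},
    {"SaleID": 12, "ProductID": 2, "Amount": 40},
    {"SaleID": 13, "ProductID": 3, "Amount": 25},  # no matching product
]


def expand(base_rows, related_rows, key):
    """Return the base rows widened with the related table's columns."""
    lookup = {row[key]: row for row in related_rows}
    related_columns = [c for c in related_rows[0] if c != key]
    expanded = []
    for row in base_rows:
        match = lookup.get(row[key])
        wide = dict(row)
        for col in related_columns:
            # Unmatched rows keep a blank (None), as in a left outer join.
            wide[col] = match[col] if match else None
        expanded.append(wide)
    return expanded


def total_by(rows, group_col, value_col):
    """Group the expanded rows by one column and sum another."""
    totals = {}
    for row in rows:
        totals[row[group_col]] = totals.get(row[group_col], 0) + row[value_col]
    return totals


expanded_sales = expand(sales, product, "ProductID")
amount_by_category = total_by(expanded_sales, "Category", "Amount")
print(amount_by_category)
```

Grouping by Category works here only because the expanded Sales rows carry the Product columns with them, which is the point the documentation paragraph seems to be making.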

Related

Showing Side by Side Measures from Semi-Related Tables in Power BI

I have a data model in Power BI that, among other things, has the following tables
Employees (Dimension; employee ID/name)
Jobs (Dimension; contains details about the job, including job ID)
Employee History (snapshot table; contains a record for each day an associate was in a job)
Job Budget History (snapshot table; contains a record for each day a job was budgeted)
Calendar Table
The table is modeled like so (simplified version):
In Power BI, I am trying to make a simplified table view that contains measures based on both the budget history as well as the employee history for the most recent day in the dataset (simple count rows/distinct count of calendar table)
However, attempting to do so gives me the results below if I try to put both measures on the table. Basically it appears to be doing a cross join between each table and matching associates with jobs they don't have (this happens when the budget is added).
Of course, if I just do one of the singular measures everything works perfectly. I am fairly certain it is because there is no real connection between the 'employee' and the 'budget history' in this relationship, so it is just joining everything on the date without any context.
I have tried several things, such as creating inactive relationships and activating them with USERELATIONSHIP(), using visual-level filters, etc., but I'm not sure what the best option would be in this situation. (I am trying to avoid bidirectional relationships if possible.)
Ideally this information should show on this date that Joe was present as President, Sally was present as an operator, and the Manager position had nobody, but all three were budgeted.
Any advice is appreciated. I have attached a simplified mockup pbix file for reference.
PBIX File
This is a complicated problem for many reasons. I was able to produce this report:
by removing the field "Name" from the table and replacing it with a measure:
Employee Name =
CALCULATE(
    SELECTEDVALUE(Employees[Name]),
    CROSSFILTER(Employees[Employee_ID], Employee_History[Employee_ID], BOTH)
)
It looks exactly like the report you want, but if you have additional requirements, you'll need to make sure that such approach works for you.
If this is acceptable, a brief explanation:
The root cause of the issue is the missing relationship between Employees and Budget History. When you put Name in the table visual, it acts as a filter, but that filter doesn't propagate to the budget table, so every name is paired with every budgeted job (a Cartesian product).
Removing Name from the table eliminates the need for that filter propagation, but then you won't see employee names. I solved this by pulling the employee name with a measure, in which the required propagation is forced by the CROSSFILTER function. Essentially, it acts like a temporary bidirectional relationship that exists only while the measure is evaluated, so it does not negatively affect the rest of the model.
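To make the difference concrete, here is a small Python sketch of the two report shapes. It is only a model of the behaviour described above, not how the engine evaluates DAX, and the three-job dataset is a made-up stand-in for the mockup file.

```python
# Hypothetical data for a single snapshot date.
employees = {1: "Joe", 2: "Sally"}                       # employee_id -> name
employee_history = [(1, "President"), (2, "Operator")]   # (employee_id, job)
budget_history = ["President", "Operator", "Manager"]    # budgeted jobs


def report_with_name_column():
    """Name used as a grouping column: its filter never reaches the budget
    table, so every name is paired with every budgeted job."""
    rows = []
    for emp_id, name in employees.items():
        for job in budget_history:
            present = sum(
                1 for e, j in employee_history if e == emp_id and j == job
            )
            budgeted = 1  # unaffected by the name filter
            rows.append((job, name, present, budgeted))
    return rows


def report_with_name_measure():
    """Name computed per job, loosely mimicking SELECTEDVALUE evaluated
    with the filter flowing from the history rows back to Employees."""
    rows = []
    for job in budget_history:
        ids = {e for e, j in employee_history if j == job}
        names = {employees[e] for e in ids}
        # Like SELECTEDVALUE: a value only when exactly one name remains.
        name = next(iter(names)) if len(names) == 1 else None
        rows.append((job, name, len(ids), 1))
    return rows


for row in report_with_name_measure():
    print(row)
```

The first function returns six rows (two names times three jobs), while the second returns one row per job, with a blank name for the unfilled Manager position.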

PowerBI Query Performance

I have a Power BI report that has a few different pages displaying different visuals. The report uses the same table of data (let's call it Jobs).
The previous author of this report created two queries in the data section that read from this base table but apply different transformations and filters to the underlying data. The visuals then use one or the other of these queries to display their data. For example, the first one applies a filter to exclude certain columns based off a status field, and the other applies a different filter and performs transformations on some of the columns.
When I manually refresh the report, it looks like the report is retrieving data for both of these queries, even though the base data is the same. Since the dataset is quite large, I am worried that this report has been built inefficiently, but I am not sure if there is a better way of doing this.
TL;DR: The Source and Navigation steps of both queries are exactly the same. Is this retrieving the data twice and making my report inefficient, and if so, what is the appropriate way to achieve what I am trying to do?
Power BI will try to parallelize as much as possible. If you have two queries that read from the same table, then two queries will be executed against the source.
To avoid this you can:
Create a query which only gets the necessary data from the table.
Set this query not to be loaded into the model (untick "Enable Load").
Build every other table that starts from this data as a reference to this query, not as a duplicate of it.
In this way, the data will be fetched once from the source and then used to create other tables using PowerQuery.
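As a loose analogy in Python, the difference between two duplicated queries and two queries that reference one staging query looks like this. The `fetch_jobs` function and its rows are hypothetical stand-ins for the expensive source read. Note that this only models the behaviour described above; whether Power BI really reads the source once can depend on query folding and caching, so it is worth confirming against the source's own query log.

```python
fetch_count = 0


def fetch_jobs():
    """Stand-in for the expensive read of the Jobs table (made-up rows)."""
    global fetch_count
    fetch_count += 1
    return [
        {"JobID": 1, "Status": "Open", "Hours": 5},
        {"JobID": 2, "Status": "Closed", "Hours": 8},
        {"JobID": 3, "Status": "Open", "Hours": 2},
    ]


# Duplicated queries: each one starts from the source on its own.
open_jobs_dup = [r for r in fetch_jobs() if r["Status"] == "Open"]
closed_hours_dup = sum(r["Hours"] for r in fetch_jobs() if r["Status"] == "Closed")
duplicated_fetches = fetch_count

# Referenced queries: one staging read, two derived results.
fetch_count = 0
staging = fetch_jobs()  # plays the role of the query with load disabled
open_jobs = [r for r in staging if r["Status"] == "Open"]
closed_hours = sum(r["Hours"] for r in staging if r["Status"] == "Closed")
referenced_fetches = fetch_count

print(duplicated_fetches, referenced_fetches)
```

Both approaches produce the same derived tables; only the number of reads against the source differs.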

Most efficient Snowflake connection type from PowerBI?

We're trialling PowerBI on a Snowflake dimensional model and performance seems very non-optimised. Can anyone point me to information on best practices for this connection? I've previously used Tableau and there's an excellent white paper describing the pros/cons of each connection type and how to set this up so that as much heavy lifting as possible is done in Snowflake, with minimal load on the viz tool.
For example, when you summarise 1 million invoices to get a chart of sales volume by year, which distils to about 10 data points, Tableau would send 'SELECT year, sum(volume) FROM t GROUP BY year' (~10 rows), but with Power BI we see Snowflake receiving a query like 'SELECT invoice_id, sum(volume) FROM t GROUP BY invoice_id' (~1M rows), leaving the viz tool to do a lot more work.
So far, we've tried mapping the individual facts and dimensions within PowerBI, and also using a mix of direct query and import, but without significant improvement. Is there any guidance on best practice?
Thanks in advance!
I've never used Snowflake, and I have no clue how Power BI interfaces with it. That said, on the Power BI side you may be interested in composite models and aggregations.
MS Docs:
https://learn.microsoft.com/en-us/power-bi/desktop-composite-models
https://learn.microsoft.com/en-us/power-bi/desktop-storage-mode
https://learn.microsoft.com/en-us/power-bi/desktop-aggregations
Radacad's blog about aggregations:
https://radacad.com/power-bi-fast-and-furious-with-aggregations
https://radacad.com/dual-storage-mode-the-most-important-configuration-for-aggregations-step-2-power-bi-aggregations
In practice, when you are using a composite model, the aggregation functionality allows you to create a hidden table (in import mode) in your model with aggregated data (by year, month, customer, etc.).
Now when you query your data, Power BI will check whether this table can answer the query. If it can, it will just pick the data from this table; otherwise, it will run a query against the source (DirectQuery).
The example you shared about PowerBI querying the source without asking for aggregation (but instead asking for every single InvoiceId) might be caused by not setting up the composite model correctly.
A table in "direct query" cannot reference other tables in its query (in this case the calendar) unless that table is also in "Direct query" or "dual" mode.
What does the model look like in the case you shared, and what is the storage mode of each table?
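The hit-or-fallback idea can be sketched in a few lines of Python. This is only a toy model of the routing decision, not Power BI's actual matching logic, and the invoice rows and the `answer` helper are invented for the example.

```python
# Hypothetical detail rows, standing in for the DirectQuery fact table.
invoices = [
    {"InvoiceID": 1, "Year": 2019, "Customer": "A", "Volume": 10},
    {"InvoiceID": 2, "Year": 2019, "Customer": "B", "Volume": 20},
    {"InvoiceID": 3, "Year": 2020, "Customer": "A", "Volume": 5},
]


def group_sum(rows, column):
    """Sum Volume per distinct value of the given column."""
    totals = {}
    for row in rows:
        totals[row[column]] = totals.get(row[column], 0) + row["Volume"]
    return totals


# Small pre-aggregated table kept in memory (the "import" side).
AGG_GRAIN = "Year"
agg_by_year = group_sum(invoices, AGG_GRAIN)


def answer(group_by):
    """Return (where the answer came from, totals)."""
    if group_by == AGG_GRAIN:
        return "aggregation table", dict(agg_by_year)
    # Grain not covered by the aggregation: scan the detail rows instead.
    return "detail table", group_sum(invoices, group_by)


year_source, year_totals = answer("Year")
customer_source, customer_totals = answer("Customer")
print(year_source, year_totals)
print(customer_source, customer_totals)
```

A query by year is served from the small in-memory table, while a query by customer falls through to the detail rows, which in the real setup would mean a query against the source.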

How can I get a mapped value from a many-to-one related table via Power BI DirectQuery?

I have 2 tables that share a Foreign Key. Power BI sees them as a Many (Table A) to One (Table B) relationship. All I'm trying to do is to get a value from Table B to show up as a column for Table A.
When I look at the table in Power Query (using "Edit Query" in Power BI Desktop), I can see Table B, but every row just shows "Value" as its value. If I click "Value", I get the details of the related object below the table, so I know the relationship works.
My struggle is that none of the methods I've seen via google results to get that value work for me.
I've tried using LOOKUPVALUE and RELATED.
RELATED(TableB[ColumnNameImTryingToRetrieve])
RELATED(TableB[IdColumn])
For the RELATED function, every variation I try for the ColumnName parameter results in either the error message
"The column 'TableB[NameIveGiven]' either doesn't exist or doesn't have a relationship to any table available in the current context."
or the error message
"Parameter is not the correct type".
LOOKUPVALUE isn't even available in the IntelliSense options that come up, so I can't try it.
I've seen a lot of references about LOOKUPVALUE not being available in DirectQuery mode and that there used to be an option in DirectQuery options called "Allow unrestricted measures in DirectQuery mode" but that is no longer available. This supposedly would have allowed LOOKUPVALUE to work.
Also, when I make most changes in PowerQuery when trying to add the new column I get the error message "This step results in a query that is not supported in DirectQuery mode".
Is there any simple way to get the value I'm after in DirectQuery mode or should I switch to Import Mode?
Okay, I got what I was after. I used "Merge Queries" in Power Query Editor to do a left join on the tables. Then I expanded the table column that was created by the join and kept only the column I was after.
Then, in a third table, I was able to do:
RELATED(TableA[TableB.1.ColumnINeed])
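For anyone trying to picture the merge-then-expand step, here is a rough Python equivalent. It is not Power Query code, just a sketch of the same two operations; the table contents and the `BId` key column are hypothetical.

```python
# Hypothetical contents for the two tables.
table_a = [
    {"Id": 1, "BId": 10},
    {"Id": 2, "BId": 20},
    {"Id": 3, "BId": 99},  # no match in table_b
]
table_b = [
    {"BId": 10, "ColumnINeed": "red", "Other": "x"},
    {"BId": 20, "ColumnINeed": "blue", "Other": "y"},
]


def merge_left(left, right, key, new_col):
    """Left join that stores the matching right-hand row as a nested
    record, similar to the "Value" cells shown in the editor."""
    index = {row[key]: row for row in right}
    return [{**row, new_col: index.get(row[key])} for row in left]


def expand_column(rows, nested_col, field):
    """Replace the nested record with one flat column taken from it."""
    flattened = []
    for row in rows:
        flat = {k: v for k, v in row.items() if k != nested_col}
        nested = row[nested_col]
        flat[f"{nested_col}.{field}"] = nested[field] if nested else None
        flattened.append(flat)
    return flattened


merged = merge_left(table_a, table_b, "BId", "TableB")
result = expand_column(merged, "TableB", "ColumnINeed")
for row in result:
    print(row)
```

Rows of Table A without a match in Table B are kept, with a blank in the new column, which is what a left join is expected to give.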

Power BI Aggregations - Detail Tables Must Be DirectQuery Tables?

I have a simple data model, from the Contoso database, that looks like this:
I'm trying to set up the table named Online Sales Aggregate as an aggregate table. When I attempt to set up a mapping, all the detail tables are disabled (see below)
When I hover over a table I see a message that says, "Customers (for example) must be a DirectQuery table to be used as the detail table."
All the tables in the model, including the Online Sales Aggregate table were imported. Why do the detail tables need to be DQ tables?
This is currently a limitation that Microsoft has imposed at least while aggregates are still in preview.
From Microsoft's documentation:
Detail table must be DirectQuery, not Import.
According to Microsoft people, it's likely that this limitation will eventually go away.
v-lili6-msft: Power bi product team is improving this preview feature
JoshCaplan-MSFT: This is still a work in progress but it is coming.
To expand on what David says below, I'd guess that removing this limitation is not a high priority since the main use case for aggregations is for datasets that are too unwieldy to import. If you've already imported all the data, then adding an aggregate table probably won't really speed things up that much in most cases.
If you still do need an aggregate table for an imported table, then you can do the workaround he describes by creating a summarized table via the query editor or a DAX calculated table and write your measure(s) to try to read from that first. An added bonus with this method is that you can use custom measures in your summarized table instead of being limited to aggregate summarization functions (Count, GroupBy, Max, Min, Sum), though you'll need to be careful with how you handle non-additive measures.
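To illustrate the warning about non-additive measures, here is a small Python sketch with made-up sales rows. A sum can safely be rolled up from a pre-summarized table, but a distinct count cannot, because the same customer may appear in more than one summary row.

```python
# Hypothetical detail rows: (month, customer, amount).
sales = [
    ("2020-01", "A", 10),
    ("2020-01", "B", 5),
    ("2020-02", "A", 7),
]

# Summarized table at month grain: total amount and distinct customers.
summary = {}
for month, customer, amount in sales:
    entry = summary.setdefault(month, {"amount": 0, "customers": set()})
    entry["amount"] += amount
    entry["customers"].add(customer)

# Additive measure: summing the monthly totals gives the right answer.
total_amount = sum(entry["amount"] for entry in summary.values())

# Non-additive measure: summing monthly distinct counts double-counts
# customer A, who bought in both months.
naive_customer_count = sum(len(entry["customers"]) for entry in summary.values())
true_customer_count = len({customer for _, customer, _ in sales})

print(total_amount, naive_customer_count, true_customer_count)
```

So a measure that reads from the summarized table needs either a grain that matches the question being asked or a fallback to the detail table for measures like distinct counts.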