Power BI Query Performance

I have a Power BI report with a few different pages displaying different visuals. The report uses the same table of data (let's call it Jobs).
The previous author of this report created two queries in the data section that read off this base table, but apply different transformations and filters to the underlying data. The visuals then use one or the other of these queries to display their data. For example, the first one applies a filter that excludes certain rows based on a status field, and the other applies a different filter and performs transformations on some of the columns.
When I manually refresh the report, it looks like the report is retrieving data for both of these queries, even though the base data is the same. Since the dataset is quite large, I am worried that this report has been built inefficiently, but I am not sure if there is a better way of doing this.
TL;DR: The Source and Navigation steps of both queries are exactly the same. Is this retrieving the data twice and making my report inefficient, and if so, what is the appropriate way to achieve what I am trying to do?

Power BI will try to parallelize as much as possible. If you have two queries that read from the same table, then two queries will be executed against the source.
To avoid this you can:
Create a query that gets only the necessary data from the table.
Set this query not to be loaded into the model (untick "Enable load").
Have every other query that starts from this data reference it, rather than clone it.
This way the data is fetched once from the source and is then used to build the other tables in Power Query, as sketched below.
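A minimal sketch of this pattern in Power Query M, assuming a SQL source (the server, database, and query names are hypothetical). The staging query fetches Jobs once and has "Enable load" unticked, so it is not loaded into the model:

// JobsBase - staging query, "Enable load" unticked
let
    Source = Sql.Database("myserver", "mydb"),            // hypothetical source
    Jobs = Source{[Schema = "dbo", Item = "Jobs"]}[Data]
in
    Jobs

Each downstream query then starts from a reference to JobsBase (right-click JobsBase > Reference) and applies its own filters and transformations:

// ActiveJobs - references the staging query instead of re-reading the source
let
    Source = JobsBase,
    Filtered = Table.SelectRows(Source, each [Status] <> "Closed")   // hypothetical filter
in
    Filtered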

Related

How is data retrieved in Power BI (DirectQuery) when a filter is applied?

I was wondering: when you get data in Power BI (with DirectQuery)
and you filter it in the Power Query Editor ("Lignes filtrées" = Filtered rows),
is the data retrieved first and then filtered,
or filtered first and then retrieved?
The distinction matters, because the two orderings would operate on different data.
Power BI applies query steps from top to bottom: it gets the data from the Source, navigates to the selected table, then applies filters and whatever other steps follow.
Getting the data first and then filtering it is useful because you can see each change the data goes through in the Query Editor.
But filtering the data before it is retrieved can also be worthwhile, because the filtering is then handled in the database, and with good infrastructure your data will arrive faster.
The pragmatic rule is: filtering in queries is fine, but don't make your database queries too complex. So there is no single correct answer to your question; use a bit of both, depending on the quality of your data.
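For illustration, a simple filter step in M is usually folded back into the native query in DirectQuery mode, so the database does the filtering. A minimal sketch (server, database, table, and column names are hypothetical):

let
    Source = Sql.Database("myserver", "mydb"),            // hypothetical source
    Sales = Source{[Schema = "dbo", Item = "Sales"]}[Data],
    // A simple row filter like this one typically folds into the generated
    // SQL, i.e. it runs as a WHERE Amount > 100 clause on the server
    FilteredRows = Table.SelectRows(Sales, each [Amount] > 100)
in
    FilteredRows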

Optimize unpivot and filter

I am trying to make the following visualization:
[visualization screenshot]
Using the following fact table called Crashes (Crash_ID, which is not shown, is the primary key; many more columns have been left out):
[fact table screenshot]
My approach was to first unpivot the "EA" columns from the fact table using DAX:
EA_Unpivots =
UNION (
    SELECTCOLUMNS ( Crashes, "Crash_ID", Crashes[Crash_ID], "EA", "EA_Distracted_Driving", "Counts", Crashes[EA_Distracted_Driving] ),
    SELECTCOLUMNS ( Crashes, "Crash_ID", Crashes[Crash_ID], "EA", "EA_Impaired_Driving", "Counts", Crashes[EA_Impaired_Driving] ),
    SELECTCOLUMNS ( Crashes, "Crash_ID", Crashes[Crash_ID], "EA", "EA_Intersection_Safety", "Counts", Crashes[EA_Intersection_Safety] ),
    SELECTCOLUMNS ( Crashes, "Crash_ID", Crashes[Crash_ID], "EA", "EA_Older_Road_Users", "Counts", Crashes[EA_Older_Road_Users] ),
    SELECTCOLUMNS ( Crashes, "Crash_ID", Crashes[Crash_ID], "EA", "EA_Pedestrian_Safety", "Counts", Crashes[EA_Pedestrian_Safety] ),
    SELECTCOLUMNS ( Crashes, "Crash_ID", Crashes[Crash_ID], "EA", "EA_Roadway_and_Lane_Departures", "Counts", Crashes[EA_Roadway_and_Lane_Departures] ),
    SELECTCOLUMNS ( Crashes, "Crash_ID", Crashes[Crash_ID], "EA", "EA_Speeding", "Counts", Crashes[EA_Speeding] )
)
and then use
EA_Counts =
GROUPBY ( EA_Unpivots, EA_Unpivots[EA], "EA_Count", SUMX ( CURRENTGROUP (), EA_Unpivots[Counts] ) )
to set up the table needed to produce the visualization.
However, the drawback of this approach is that the visualization will not react to different filters being applied dynamically on the dashboard since EA_Counts does not have Crash_ID as a column anymore, and the filters indirectly operate on Crash_ID by selecting different attributes of each crash from the fact table.
Because of this, I noticed that EA_Counts was unnecessary and I could get the visualization by just creating a relationship between the fact table, Crashes, and EA_Unpivots on Crash_ID:
[data model screenshot]
and then setting up the visualization like this:
[visualization screenshot]
Here is my question: Is there a way to achieve the same result without creating EA_Unpivots? The reason is that EA_Unpivots is very large and blows up the file size. It seems there must be a more efficient way to achieve this. Thanks.
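One direction worth exploring is performing the unpivot upstream in Power Query instead of as a DAX calculated table. A minimal sketch in M, assuming the EA column names shown above and a fact-table query named Crashes (whether this actually reduces the file size depends on the data):

let
    Source = Crashes,                                   // fact table query; name assumed
    // Keep only the key and the EA columns before unpivoting
    EAOnly = Table.SelectColumns(Source, {"Crash_ID",
        "EA_Distracted_Driving", "EA_Impaired_Driving", "EA_Intersection_Safety",
        "EA_Older_Road_Users", "EA_Pedestrian_Safety",
        "EA_Roadway_and_Lane_Departures", "EA_Speeding"}),
    // Rotate the EA columns into attribute/value pairs
    Unpivoted = Table.UnpivotOtherColumns(EAOnly, {"Crash_ID"}, "EA", "Counts")
in
    Unpivoted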

Can you replace fields from query with fields from a split query in Power BI?

I have a report in Power BI that cannot refresh because the data from the table is too large:
The amount of data on the gateway client has exceeded the limit for a single table. Please consider reducing the use of highly repetitive strings values through normalized keys, removing unused columns, or upgrading to Power BI Premium
I have tried to shrink the columns used in the dataset to the best of my ability, but it is still too large to refresh. As a test, instead of using a single query to retrieve the data, I made two queries that split the columns roughly half and half, then linked them back together in Power BI on their ID column (roughly as sketched below). The test data refresh appeared to start working once the table's data was split into two separate queries.
Please correct me if there is a better method to trim the data down enough for the dataset to refresh, but for now this is the best solution I can see. What I am wondering is: since my data is now split into two separate queries, what is the best way to repoint the existing visualizations from the full, non-refreshable query to the split, refreshable queries? It looks like I would have to recreate the visuals from scratch, but if there is a way to simply do a mass replace of the fields, that would save a lot of time. The split queries I created both have the same fields as the non-split query.
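For reference, the split described above looks roughly like this in M (server, table, and column names are hypothetical). Each half keeps the ID column, which is what lets the two queries be related in the model:

// JobsPart1 - key column plus the first half of the columns
let
    Source = Sql.Database("myserver", "mydb"),          // hypothetical source
    Jobs = Source{[Schema = "dbo", Item = "Jobs"]}[Data],
    Part1 = Table.SelectColumns(Jobs, {"ID", "ColA", "ColB", "ColC"})
in
    Part1

// JobsPart2 - key column plus the remaining columns
let
    Source = Sql.Database("myserver", "mydb"),
    Jobs = Source{[Schema = "dbo", Item = "Jobs"]}[Data],
    Part2 = Table.SelectColumns(Jobs, {"ID", "ColD", "ColE", "ColF"})
in
    Part2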

Run a SQL stored procedure from Power BI using DirectQuery

Is there a way to call a SQL Server stored procedure from Power BI?
Do you have to use Import mode only?
I need the visual based on the SQL stored procedure to refresh, though.
Can you do this with DirectQuery mode?
Please help!
It's not possible.
From Microsoft's Documentation:
Limited data transformations
Similarly, there are limitations in the data transformations that can be applied within Query Editor. With imported data, a sophisticated set of transformations can easily be applied to clean and reshape the data before using it to create visuals, such as parsing JSON documents, or pivoting data from a column to a row form. Those transformations are more limited in DirectQuery.
First, when connecting to an OLAP source like SAP Business Warehouse, no transformations can be defined at all, and the entire external model is taken from the source. For relational sources, like SQL Server, it's still possible to define a set of transformations per query, but those transformations are limited for performance reasons.
Any such transformation will need to be applied on every query to the underlying source, rather than once on data refresh, so they're limited to those transformations that can reasonably be translated into a single native query. If you use a transformation that is too complex, you receive an error that either it must be deleted or the model switched to import.
Additionally, the query that results from the Get Data dialog or Query Editor will be used in a subselect within the queries generated and sent to retrieve the necessary data for a visual. The query defined in Query Editor must be valid within this context. In particular, it's not possible to use a query using Common Table Expressions, nor one that invokes Stored Procedures.
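In Import mode, by contrast, a stored procedure can typically be invoked as a native query. A minimal sketch in Power Query M (server, database, procedure, and parameter names are hypothetical):

let
    Source = Sql.Database("myserver", "mydb"),          // hypothetical source
    // Runs the stored procedure and imports its result set;
    // this works in Import mode, not in DirectQuery
    Result = Value.NativeQuery(Source, "EXEC dbo.usp_GetJobs @Status = 'Active'")
in
    Result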

Power Query Formula Language - Detect type of columns

In Power BI, I've got some query tables generated from imported data. All the data comes in as type 'Any', and I'm trying to automatically detect the type of the data in each column.
Some of the queries generate tables with columns based on the incoming data - I don't know what the columns are going to be until the query runs and sets up the table (the data comes from an Azure blob). As I will have quite a few tables to maintain, whose columns can change (possibly with new columns being added) on any data refresh, it would be unmanageable to go through all of them each time and press 'Detect Data Type' on the columns.
So I'm trying to figure out how to do a 'Detect Data Type' in the query formula language that I can attach to the end of the query that generates the table columns. I've tried grabbing the first entry in a column and doing Value.Type(column{0}); however, this seems to come out as 'Text' for a column that has integers in it, whereas pressing 'Detect Data Type' correctly identifies the type as 'Whole Number'.
Does anyone know how to detect a column's entry types?
P.S. I'm not too worried about a column possibly holding values of different data types
You seem to have multiple issues here, and your solution will be fragile; there's a better way. But let's first deal with column type detection.
Power Query uses the 'any' data type as its go-to data type. You can write a function that samples the rows of a column in a table, does a best-match data type detection, and then explicitly sets the data type of the column; a sketch follows below. This is messy and tricky, since you need to do it once per column.
This might be workable for a fixed schema, but for a dynamic schema you'll run into a couple of things very quickly. First, you'll need to write some crazy PQ code to list all the columns and run your function on each. This will work the first time, but might break on subsequent refreshes, because data model changes are not allowed during refresh. If you're using a tool like Power BI Desktop, you'll be able to fix things up; if you publish your report to the Power BI service, you'll just see refresh errors.
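A minimal sketch of such a sampling function in M; the detection heuristic here is deliberately simplistic (numbers, then dates, else text) and is an illustration, not a robust implementation:

// Guess each column's type from a sample of its values, then ascribe it
DetectColumnTypes = (tbl as table) as table =>
    let
        // Rough heuristic: try numbers, then dates, else fall back to text
        GuessType = (values as list) as type =>
            let
                sample = List.FirstN(List.RemoveNulls(values), 10),
                numbers = List.Transform(sample, each try Number.From(_) otherwise null),
                dates = List.Transform(sample, each try Date.From(_) otherwise null)
            in
                if List.NonNullCount(numbers) = List.Count(sample) then type number
                else if List.NonNullCount(dates) = List.Count(sample) then type date
                else type text,
        // Build a {column name, detected type} pair for every column
        transforms = List.Transform(
            Table.ColumnNames(tbl),
            (name) => {name, GuessType(Table.Column(tbl, name))}
        )
    in
        Table.TransformColumnTypes(tbl, transforms)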
Dynamic schemas will suffer the same data-model-change issue mentioned above.
The alternate solution that avoids these problems is to use a DirectQuery data source instead of Power Query. If you load your data into Azure SQL or a Tabular model, the reporting layer will pick up the updated fields automatically, so you don't have to work around this in PQ.