How to perform incremental refresh on merged table? - powerbi

I have 2 tables in Power query editor.
I want to merge them and implement incremental load on merged table.
Following is my plan:
Merge both tables into a new table (Table3)
Disable refresh and disable load for both tables.
How to configure Incremental refresh on Table3?
Do I need to also configure Incremental refresh on Table1 and Table2?
So technically, will each table get incrementally loaded and then merged, or will the entire data be merged first and then incrementally loaded?

For this to work you need to, in simple terms:
Create your limiting parameters RangeStart and RangeEnd
Set up a filter on the applicable date columns using the RangeStart and RangeEnd parameters in your subqueries Table1 and Table2 (this controls data ingestion)
Set up the same type of logic on the applicable date column in Table3 (this controls data deletion); a sketch of this filter is shown at the end of this answer
Configure incremental refresh time logic
For it to be actually efficient you also need to make sure:
Data is transactional in nature
Both subqueries are foldable and from the same data source
The resulting table is foldable
If the queries are not foldable, it will require a full data load and subsequent filter anyway, removing the benefits of incremental refresh.
There exists a nice write-up of this in the Power BI Community pages that details how you would go about setting this up for a header/detail table join.
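For illustration, here is a minimal Power Query M sketch of the filter from steps 2 and 3, assuming the tables have a DateTime column named ModifiedDate (the server, database, and column names are placeholders; RangeStart and RangeEnd are the required DateTime parameters you create in step 1):

```m
// Hypothetical filter step added to Table1, Table2 and (on the merged result) Table3.
// Use >= and < so that adjacent partitions do not overlap.
let
    Source   = Sql.Database("myserver", "mydb"){[Schema = "dbo", Item = "Table1"]}[Data],
    Filtered = Table.SelectRows(
        Source,
        each [ModifiedDate] >= RangeStart and [ModifiedDate] < RangeEnd)
in
    Filtered
```

If this filter folds back to the source, each incremental refresh only pulls the rows belonging to the partitions being refreshed; otherwise the full table is read and filtered locally, as noted above.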

Related

Does Power BI support Incremental refresh for expanded table?

I have made a table as follows (https://apexinsights.net/blog/convert-date-range-to-list):
In this scenario, suppose I configure incremental refresh on the Start Date column; will Power BI support this correctly? I am asking because, say the refresh is for the last 2 days or last 2 months, it will fetch the source rows and apply the transform to the partition. But my concern is that I will have to put the date parameter filter on Start Date prior to the non-folding steps so that the query folds (alternatively, Power Query will auto-apply the date filter so that the query can fold).
So when it pulls the data based on Start Date and applies the transforms, I'm not able to think clearly about what kind of partitions it will create: whether they are for the Start Date or for the expanded date. Is query folding supported in this scenario?
This is a quite complicated scenario, where I would probably just avoid adding incremental refresh.
You would have to use the RangeStart/RangeEnd parameters twice in this query: once where the filter gets folded to the data source to retrieve ranges that overlap with the [RangeStart, RangeEnd) interval, and a second time after expanding the ranges, to filter out the individual rows that fall outside [RangeStart, RangeEnd).
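As a hedged illustration, a Power Query sketch of that double filter might look like this (assuming a source table of date ranges with [StartDate] and [EndDate] date columns; all server, table, and column names are placeholders):

```m
let
    // Hypothetical source of date ranges
    Source = Sql.Database("myserver", "mydb"){[Schema = "dbo", Item = "Ranges"]}[Data],

    // 1) Foldable filter: keep only ranges that overlap [RangeStart, RangeEnd)
    Overlapping = Table.SelectRows(
        Source,
        each [StartDate] < Date.From(RangeEnd) and [EndDate] >= Date.From(RangeStart)),

    // 2) Expand each range into one row per day (folding typically breaks here)
    WithDates = Table.AddColumn(
        Overlapping, "Date",
        each List.Dates(
            [StartDate],
            Duration.Days([EndDate] - [StartDate]) + 1,
            #duration(1, 0, 0, 0))),
    Expanded = Table.ExpandListColumn(WithDates, "Date"),

    // 3) Second filter on the expanded dates, so each partition keeps only its own rows
    Partitioned = Table.SelectRows(
        Expanded,
        each [Date] >= Date.From(RangeStart) and [Date] < Date.From(RangeEnd))
in
    Partitioned
```

Only step 1 determines how much data is pulled from the source, and whether it folds depends on the connector and on keeping that filter ahead of any non-folding steps.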

PBI generated query having a lengthy where clause which filter out certain records (Direct Mode)

We are creating a PBI report with data sources in Direct Mode. Row-level security has been implemented through some tables. While viewing the report with data security, we can see a lengthy WHERE clause included in the PBI-generated back-end query (Snowflake). It seems the WHERE clause contains values for each record of the result set, but it sometimes misses certain records, and we couldn't identify why those values are not included.
We tried using dynamic parameters and were able to see the records coming through in the grid, but we couldn't go with parameters because there are some issues with RLS and multi-selection.
Is there a way to avoid generating this WHERE clause in the back-end query?

Can you replace fields from query with fields from a split query in Power BI?

I have a report in Power BI that cannot refresh because the data from the table is too large:
The amount of data on the gateway client has exceeded the limit for a single table. Please consider reducing the use of highly repetitive strings values through normalized keys, removing unused columns, or upgrading to Power BI Premium
I have tried to shrink the columns used in the dataset to the best of my ability, but it is still too large to refresh. I did a test where, instead of using just a single query to retrieve the data, I made two queries that split the columns roughly half and half and then linked them back together in Power BI using their ID column. It looked like the test data refresh started working once the table's data was split into two separate queries.
Please correct me if there is a better method to trim the data down to allow the data set to refresh, but for now this is the best solution I see. What I am wondering is, since now my data is split into two separate queries, what is the best way to adapt the already existing visualizations I have that are linked up to the full, non-refreshable query to the split, refreshable queries? It looks to me like I would have to recreate the visuals from scratch, but if there is a way to simply do a mass replace of the fields that would save so much time. The split queries I created both have the same fields as the non-split query.
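For reference, the column split described above could be expressed in Power Query roughly like this (a sketch only; the base query name, ID column, and column lists are placeholders):

```m
// Hypothetical base query "BigTable" with an [ID] key column.
// Query 1 keeps the key plus the first half of the columns...
let
    Source = BigTable,
    Half1  = Table.SelectColumns(Source, {"ID", "ColA", "ColB", "ColC"})
in
    Half1
```

```m
// ...and Query 2 keeps the key plus the remaining columns; the two loaded
// tables are then related on [ID] in the model.
let
    Source = BigTable,
    Half2  = Table.SelectColumns(Source, {"ID", "ColD", "ColE", "ColF"})
in
    Half2
```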

PowerBI Query Performance

I have a Power BI report that has a few different pages displaying different visuals. The report uses the same table of data (let's call it Jobs).
The previous author of this report created two queries in the data section that read off this base table but apply different transformations and filters to the underlying data. The visuals then use either of these models to display their data. For example, the first one applies a filter to exclude certain columns based on a status field, and the other applies a different filter and performs transformations on some of the columns.
When I manually refresh the report, it looks like the report is retrieving data for both of these queries, even though the base data is the same. Since the dataset is quite large, I am worried that this report has been built inefficiently, but I am not sure if there is a better way of doing this.
TL;DR: The Source and Navigation steps of both queries are exactly the same - is this retrieving the data twice and making my report inefficient, and if so, what is the appropriate way to achieve what I am trying to do?
Power BI will try to parallelize as much as possible. If you have two queries that read from the same table, then two queries will be executed against the source.
To avoid this you can:
Create a query that gets only the necessary data from the table.
Set this query not to be loaded into the model (toggle "Enable load" off).
Build every other table that starts from this data as a reference to this query, not as a clone of it.
In this way, the data is fetched once from the source and then used to build the other tables in Power Query.
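For illustration, a minimal sketch of the referencing pattern (server, database, and filter values are placeholders):

```m
// Base query "JobsBase" with "Enable load" turned off: the only query that
// contains the shared Source and Navigation steps.
let
    Source = Sql.Database("myserver", "mydatabase"),
    Jobs   = Source{[Schema = "dbo", Item = "Jobs"]}[Data]
in
    Jobs
```

```m
// One of the loaded queries: it starts from JobsBase (right-click JobsBase >
// Reference) and then applies its own filters and transformations.
let
    Source     = JobsBase,
    ActiveJobs = Table.SelectRows(Source, each [Status] = "Active")
in
    ActiveJobs
```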

Power BI - how to load from a source based on min/max values of another already-loaded source

I am attempting to build a data model in Power BI from a data mart (star schema) in SQL Server. This data mart has a fact table and several dimension tables. One of the dimension tables is a date table. I want to load all rows from the fact table. However, I only want to load a subset of the date table. In particular, I want those dates (rows) between the min and max dates in my fact table. This way, when I create slicers and such, I don't have unnecessary dates appearing.
In other BI tools (e.g., Qlik Sense), the usual solution is to first load the fact table into memory, compute and load its min/max dates into another table (also in-memory), set variables from this other table, load the date dimension table (into memory) based on the min/max variables, and finally drop the temporary table from memory so that it doesn't stay in the model and cause problems. This seems like the most efficient solution to me. It only reads the required (as opposed to unnecessary) data from the source dimension table, it doesn't need to perform any joins in the source, and it only reads each table once (as opposed to 2+ times).
How can I achieve this in Power BI? Or, more importantly, is this solution method even possible in Power BI?
I found this solution, but it seems inefficient, as it creates 2 queries (instead of just 1) for the min/max and, moreover, it performs the dimension table filtering after all rows have already been fetched from the source. (In my particular case, this isn't too bad, but it could be problematic in other situations in which my dimension table is large.)
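For what it's worth, the idea described above can be written directly in Power Query; this is a sketch only, assuming the fact query is named FactSales with an [OrderDate] column and the dimension reads dbo.DimDate with a [Date] column (all names are placeholders, and whether the filter folds to SQL Server depends on how FactSales is evaluated):

```m
// DimDate query: trim the date dimension to the fact table's date span.
let
    Source  = Sql.Database("myserver", "mydatamart"){[Schema = "dbo", Item = "DimDate"]}[Data],
    MinDate = List.Min(FactSales[OrderDate]),   // a column of another query used as a list
    MaxDate = List.Max(FactSales[OrderDate]),
    Trimmed = Table.SelectRows(Source, each [Date] >= MinDate and [Date] <= MaxDate)
in
    Trimmed
```

Note that Power Query may evaluate FactSales a second time to compute the min/max, so this does not necessarily avoid the extra read that the linked solution has; it only avoids loading a temporary table into the model.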