We have a situation where a table's full data is replaced during a data refresh. We would like to compare the refreshed data with the data the table held previously.
One option that I thought of was to clone the table and then refresh the original table. However, refreshing the original table also refreshes the data in the clone.
Is there a way to refresh the original table without also refreshing the clone/copy table?
Another solution I thought of was to use SSAS and create two tables on top of the source. That way each table can be processed individually.
However, SSAS is not an option in this situation, so is it possible to do this within Power BI?
Power BI is not really suited to this purpose, and accumulating historical data is best done upstream in your DB or data warehouse.
However, there are options using dataflows, although this is not a recommended pattern. You can read about it here: https://data-azure.com/2021/10/25/dataflows-to-store-as-historical-data/
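If the previous and refreshed versions of the table are both kept upstream (for example as exports or staging tables), the comparison itself is simple. The following is a minimal Python sketch, not a Power BI feature: the function name `diff_snapshots` and the assumption that each row has a unique key column are illustrative only.

```python
def diff_snapshots(previous, current, key):
    """Compare two snapshots of a table, given as lists of dict rows.

    Assumes `key` names a column that uniquely identifies a row in
    both snapshots (a hypothetical primary key).
    """
    prev_by_key = {row[key]: row for row in previous}
    curr_by_key = {row[key]: row for row in current}

    # Rows whose key exists only in the refreshed data.
    added = [row for k, row in curr_by_key.items() if k not in prev_by_key]
    # Rows whose key existed before the refresh but is now gone.
    removed = [row for k, row in prev_by_key.items() if k not in curr_by_key]
    # Rows present in both snapshots whose values differ.
    changed = [
        (prev_by_key[k], row)
        for k, row in curr_by_key.items()
        if k in prev_by_key and prev_by_key[k] != row
    ]
    return {"added": added, "removed": removed, "changed": changed}
```

Calling `diff_snapshots(old_rows, new_rows, "id")` returns the added, removed, and changed rows separately, and each list could be written to its own table for reporting.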
I need help improving refresh times on a Power BI dashboard with about 20M rows of data and 80 columns, pulling from SQL Server. I cannot use Power BI Service in any capacity; this has to load into Power BI Desktop.
Refreshing the raw data (virtually no transformations in Power Query) takes about 3-4 hours.
Microsoft recommends incremental refresh to archive my historical data and only refresh the latest changes, but that requires Service and I 100% cannot use it.
Is there any other way to significantly improve my refresh times beyond Service's incremental refresh? If it was under an hour I'd be happy.
What I've tried:
Using a native query to leverage the server.
Reducing the column selection.
Removing all transformations.
Splitting tables in Power Query and selectively turning off refresh for the historical tables. As soon as they are stacked/appended, Power Query triggers a refresh on all of the stacked tables, regardless of which ones have refresh turned off.
Looking into Power Query (PQFL/M) code to control which tables refresh. I can't find any method or property for this in M.
Optimizing the SQL. I haven't gotten any significant improvement.
20 million rows should not take that long, especially with no transformations. Something else is going wrong but without access to your data and hardware, it is impossible to say.
One possibility is to do an initial data load and then turn off refresh on that query. Add a new query for just the new data (which should be quick), but load the new query to a completely new table. In PBI, you will then have two tables. Create a calculated table in DAX which is a union of your old, non-refreshed data and your new data. Refreshes should be very quick after your first load, but obviously you need to think about how it scales as your data grows.
I have a Power BI dataset that takes its data from software made by the IT team in my organization.
I was wondering whether it is possible to "freeze" all the data in the PBI dataset (like taking a picture of the data as of today, for example) and use this dataset for further analysis (I have another Power BI file linked to that Power BI dataset). I know the data won't refresh, but that's not important for what I need to do, as I only need the past info.
The reason I need to know whether that's possible is that I'm going overseas for one month and won't have access to the original dataset. Downloading all the data into one Excel file is impossible, as it is way too big.
thanks
It sounds like you're after some sort of snapshotting functionality.
If you just want to keep the file as is, you can download the pbix and simply not refresh it, provided it's in import mode.
However, one approach you could take if you want to continue development without worrying about accidentally refreshing is to use a Power BI dataflow.
You could copy your Power Query queries to a dataflow, refresh them all as of today, and then not refresh the dataflow anymore.
You can then point your Power BI dataset to your dataflow.
https://learn.microsoft.com/en-us/power-bi/transform-model/dataflows/dataflows-create
That way, if you wanted to do further transformation of the data, you wouldn't be getting new data from the data source (so long as you don't refresh the dataflow).
In Power BI first we get source data. And then we add multiple query steps to filter data/remove column/etc. Then we add relations and model the data.
We can have calculated columns that are stored in the data. And measures that are not stored in the data but calculated on the fly.
Which data is stored in Power BI - the one after query or the one after modelling?
Power BI has 3 connection types for data access: Import, DirectQuery, and live connection.
If we use Import as the connection type, the data is imported into the Power BI file using Power BI Desktop, so the data is always stored on disk. When the model is queried or refreshed, the data is held in the computer's memory. We can use this data for querying and modeling. When we save our work, Power BI writes a file with the .pbix extension, and the data is compressed and stored inside this file.
In DirectQuery mode, the data stays in the remote location and Power BI connects to it. Each time we refresh or change a slicer, a request goes to the data source and the results are brought back to Power BI. With this method we can't access the data directly, but we can still create a data model.
Live connection is another method. It is supported for only a few data sources. With this method, the data is not stored in the computer's memory and we can't create a data model using Power BI Desktop.
Power BI is very well documented. Many of the questions you've recently asked are answered in that resource, so please take a look. I get the feeling that you are using this community because you don't want to read the manual. I strongly suggest you take a look at the documentation, because everything we write in answer to your questions has already been written and documented, and SO is not meant to be a shadow user guide for well documented systems.
Depending on the data source you use in Power BI Desktop, Power BI supports query folding, which does as much of the data processing as possible at the source (for example SQL Server).
If query folding is not possible because the source does not support it, then the source data is loaded before the query steps are applied.
Read more about query folding here: https://learn.microsoft.com/en-us/power-bi/guidance/power-query-folding
When you perform additional modelling after the Power Queries are loaded, i.e. creating tables with DAX, adding columns, etc., these will be performed when the PBIX file is published to the Power BI service, and they will be performed each time the data is refreshed with the data gateway.
I'm trying to build a simple report in Power BI based upon data published on a website.
Here is what I want to achieve:
1. This website publishes data for COVID cases in the country.
2. The numbers are just the current numbers, without any time series.
3. I want to fetch these numbers from this website daily and build a report on top of them (with time-series kind of analysis).
4. So I fetch these numbers (Get Data > Web > URL) and get this into a query. I then add a custom column with a timestamp (M's DateTime.LocalNow() function) and get this data with the required timestamp.
5. Now I want to refresh this query daily, so that I get daily results in this query.
6. As expected, PBI simply overwrites the existing rows with new data, with the latest timestamp (my custom column).
I tried a few things:
1. Creating a new query and appending data to it. It doesn't seem to work; existing data gets overwritten (maybe because of the way I created the new query).
2. Explored the incremental refresh functionality. It doesn't seem to fit my use case.
3. Tried looking at other similar posts. None seem to help me resolve this.
Questions:-
Is there a simple workaround to circumvent this (point#7) and have PBI append new data instead of overwriting existing data?
Am I correct on point#2 above (incremental refresh)?
Appreciate any pointers. Thanks in advance!
There is no simple workaround within Power BI.
Power BI is not designed to be used as a database where you store historical data. It's designed to connect to data and create reports from that, so you'll need to store the daily data somewhere external.
There are tons of ways to store the data. E.g., you could save them as CSVs in a folder that Power BI loads from or you could write them to a database table and connect to that.
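As one illustration of the CSV-folder option, here is a minimal Python sketch that writes each day's figures to a separate dated file, so earlier days are never overwritten. The function name `save_daily_snapshot`, the file naming scheme, and the `snapshot_date` column are assumptions made for this example; fetching the numbers from the website is left out.

```python
import csv
from datetime import date
from pathlib import Path


def save_daily_snapshot(rows, folder, snapshot_date=None):
    """Write one CSV file per day into `folder` and return its path.

    `rows` is a non-empty list of dicts that all share the same keys.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    snapshot_date = snapshot_date or date.today()
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)

    # One file per date: running again on the same day replaces only that day.
    path = folder / f"snapshot_{snapshot_date.isoformat()}.csv"

    # Stamp every row so the combined table has a time axis.
    stamped = [{**row, "snapshot_date": snapshot_date.isoformat()} for row in rows]

    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(stamped[0].keys()))
        writer.writeheader()
        writer.writerows(stamped)
    return path
```

Run once a day by a scheduler, this builds up a folder of files that Power BI can load with its Folder connector, combining them into a single table with one set of rows per day.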
Edit: That said, there is a non-simple workaround if this is something you really must do.
Though not recommended, you can use incremental refresh to trick Power BI into doing what you want.
Without Premium licensing, is it possible to simulate an incremental refresh to speed up Power BI Desktop?
Say we keep all the data before a certain date in a local Access database and connect to the "live" database only for data after that date.
The question is: how can we export the historical data from one or several pbix files to Access?
Try doing it as a composite model. Load your archive data as one query using Import and your recent data as another query using DirectQuery. Then you can union those two tables as a DAX calculated table and use that for your report.
If you aren't using Direct Query for recent data or you need to be refreshing your model, then I believe you can uncheck "Include in report refresh" in the query editor (right-click on the query in the Queries pane) and it won't refresh that archive table unless you specifically ask it to.