I have a handful of Power BI queries that hit the same data source (Azure Blob Storage). Currently, when I want to refresh data, all the queries download the files from blob storage and parse them, making the process take far longer. Is there a way to have one query download the files and store them for the other queries to read from, so I don't have to download the same files over and over?
You can add a blank query and reference the original query:
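For example, a minimal M sketch of that pattern, where the storage account URL, container name and query names are placeholders: one query (here called BlobSource) downloads and parses the files, and every other query simply starts from it.

// Base query "BlobSource": connects to the storage account and parses the files
let
    Source = AzureStorage.Blobs("https://youraccount.blob.core.windows.net/"),
    Container = Source{[Name = "yourcontainer"]}[Data],
    // ... your existing parsing steps (Csv.Document, Table.PromoteHeaders, ...) go here ...
    Parsed = Container
in
    Parsed

// Any other query: a blank query that references BlobSource is simply
let
    Source = BlobSource
in
    Source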
Then, when you refresh, the data in both queries will be downloaded only once.
When developing a query in Power BI with a database data source, making any changes causes the query editor to 'start from scratch' and re-query the database.
Wondering if there is a workaround that allows you to develop a query without repeated long wait times, e.g. by downloading a temporary local flat file of the full dataset, which can be used to develop the query offline and then swapped out for the live database connection when you are happy with it.
Importing the data once, exporting as a csv from a Power BI table visualisation and re-importing as a new data source would work but maybe there's a simpler way?
Thanks
There are two approaches you can use.
If your database supports query folding, make the first step take just the top 200 records whilst you develop your query (see the M sketch after the second approach). Once you're happy with it, remove the FirstN filter.
Load the entire table to the model, export it to a CSV using DAX Studio, develop your query against the CSV, and then switch back to the DB once you're happy with it.
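For the first approach, a minimal M sketch, assuming a SQL Server source and a hypothetical dbo.Sales table; the Table.FirstN step folds to a TOP 200 on the server and gets deleted once development is done.

let
    Source = Sql.Database("yourserver", "yourdatabase"),
    Sales = Source{[Schema = "dbo", Item = "Sales"]}[Data],
    // Development-only step: folds to SELECT TOP 200 on the server; remove when you're done
    Top200 = Table.FirstN(Sales, 200)
in
    Top200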
I have an existing Power BI report that imports data from a SQL Server Analysis Services database. This is working fine and I can schedule automatic refreshes using the gateway provided by my organization.
I would now like to add some additional, but rarely changing, data that I only have in a local Excel file. When I add this data, the report stops refreshing automatically and complains that it has no gateway to refresh the Excel file.
What I would like is for Power BI to refresh the data from the SQL Server Analysis Services database but keep the existing Excel data without updating it. I will upload an updated version of the Power BI report if I need to change the data in the Excel file.
Is that possible? I couldn't find out how. I tried uploading the Excel file as a separate dataset to the Power BI service and referencing that dataset in my report, only to find out that I cannot access another Power BI dataset and a SQL Server Analysis Services database from the same report.
Three things I can think of:
1. Upload the file to OneDrive/SharePoint so that it's accessible online (per Dev's answer).
2. If the data is simple enough, you can add the data directly into Power BI itself and skip the Excel file entirely (see the sketch after this list).
3. You can disable the Excel file refresh so that Power BI does not try to refresh (and thus access) the local Excel file. (Not sure if this will work.)
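For option 2, one way to skip the Excel file entirely is a hand-authored static table in M (the Enter Data button produces something equivalent); the column names and values below are made up.

let
    // Hypothetical rarely-changing lookup data, typed directly into the query instead of read from Excel
    Source = #table(
        type table [Region = text, Target = number],
        {
            {"North", 100},
            {"South", 120}
        }
    )
in
    Source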
I came across a similar issue. Yes, you can just use Enter Data to add a table, but you can only build something with fewer than 3000 cells, so you'd have to merge several tables if the data was larger than that.
Turning off the report refresh in the suggestion above (#3) still requires a gateway, unfortunately.
I just created a dataflow and plopped the data from my csv there. You'll have to create a connection and refresh it, but you don't need to schedule a refresh there, so no need to create a gateway.
Then just link the dataflow as a source to your .pbix file and set up your gateway to point at the dataflow.
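For reference, a sketch of what the dataflow source query ends up looking like in the .pbix file; the navigation steps are roughly what the Power BI dataflows connector generates when you pick the entity, and the GUIDs and entity name are placeholders.

let
    Source = PowerBI.Dataflows(null),
    // Approximate navigation steps; the connector fills these in when you select the entity
    Workspace = Source{[workspaceId = "your-workspace-guid"]}[Data],
    Dataflow = Workspace{[dataflowId = "your-dataflow-guid"]}[Data],
    Entity = Dataflow{[entity = "MyCsvData"]}[Data]
in
    Entity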
Currently we have a problem with loading data when refreshing the report against the DB, since it has too many records and it takes forever to load all the data. The issue is how I can load only the data from the last year, to avoid taking so long to load everything. As far as I can see, the Cosmos DB connection dialog allows me to enter an SQL query, but I don't know how to write it for this type of non-relational database.
Power BI has an incremental refresh feature. You should be able to refresh the current year only.
If that still doesn't meet expectations, I would look at a preview feature called Azure Synapse Link, which automatically pulls all Cosmos DB updates out into analytical storage that you can query much faster in Azure Synapse Analytics, in order to refresh Power BI faster.
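If you go the incremental refresh route, it expects two datetime parameters named RangeStart and RangeEnd and a filter step on a datetime column. A minimal sketch of that filter, assuming the Cosmos DB connector and a hypothetical OrderDate column in your documents:

let
    Source = DocumentDB.Contents("https://youraccount.documents.azure.com:443/", "yourdatabase", "yourcollection"),
    // ... expand the Document column into individual fields here ...
    Expanded = Source,
    // Incremental refresh loads only the rows that fall inside the RangeStart/RangeEnd window
    Filtered = Table.SelectRows(Expanded, each [OrderDate] >= RangeStart and [OrderDate] < RangeEnd)
in
    Filtered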
Depending on the volume of the data, you will hit a number of issues. The first is that you may exceed your RU limit, slowing down the extraction of the data from Cosmos DB. The second is transforming the data from JSON into a structured format.
I would try to write a query that specifies only the fields and items you need. That will reduce the time spent processing and retrieving the data.
For SQL queries it will be something like:
SELECT * FROM c WHERE c.partitionEntity = 'guid'
For more information on the Cosmos DB SQL API syntax, please see here to get you started.
You can use the query window in Azure to run the SQL commands, or Azure Storage Explorer to test the query, then move it to Power BI.
What is highly recommended is to extract the data into a place where it can be transformed into a structured format like a table or CSV file.
For example, use Azure Databricks to extract the data, then turn the JSON into a table-formatted object.
You have the option of running Databricks notebook queries against Cosmos DB, or Azure Databricks in its own instance. Another option would be to use the change feed with an Azure Function to shred the data into Blob Storage and query it from there using Power BI, Databricks, Azure SQL Database, etc.
In the Source step of your query, you can make a select based on the Cosmos DB _ts system property, like:
Query ="SELECT * FROM XYZ AS t WHERE t._ts > 1609455599"
In this case, 1609455599 is the timestamp which corresponds to 31.12.2020, 23:59:59. So, only data from 2021 will be selected.
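In M, that statement typically goes into the Query option of the Cosmos DB connector's source step; a sketch with placeholder account, database and collection names:

let
    Source = DocumentDB.Contents(
        "https://youraccount.documents.azure.com:443/",
        "yourdatabase",
        "yourcollection",
        [Query = "SELECT * FROM XYZ AS t WHERE t._ts > 1609455599"]
    )
in
    Source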
What I do:
I built ETL processes with Power Query to load data (production machine stop history) from multiple Excel files directly into Power BI.
On each new shift (every 8 hrs.) there is a new Excel file generated by the production machine that needs to be loaded into the data model too.
How I did it:
To do so, Power Query processes all files found in a specific folder.
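(For reference, a stripped-down sketch of that kind of folder query; the folder path, file filter and parsing steps are placeholders.)

let
    Source = Folder.Files("C:\MachineStops"),
    // Keep only the shift workbooks
    ExcelFiles = Table.SelectRows(Source, each [Extension] = ".xlsx"),
    // Parse each workbook and combine the rows from all files into one table
    WithData = Table.AddColumn(ExcelFiles, "Workbook", each Excel.Workbook([Content], true)),
    Sheets = Table.ExpandTableColumn(WithData, "Workbook", {"Data"}, {"SheetData"}),
    Combined = Table.Combine(Sheets[SheetData])
in
    Combined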
The problem:
During query refresh it needs to process all the data files again and again (old files + new files).
If I remove the old files from the folder, Power Query also removes their data from the data model during the next refresh cycle.
What I need / My question:
A batch process copies new files into the folder while removing all the old files.
Is there a way to configure Power Query so that it keeps the existing data inside the data model and just extends it with the data from the new files?
What I would like to avoid:
I know building a database would be one solution, but this requires a second system with a new ETL process. Power Query already does a very good job of preprocessing the data! Therefore, if possible, it would be highly appreciated if this problem could be solved directly inside Power Query / Power BI.
If you want to shoot sparrows with a cannon, you could try incremental refresh, but it's a Premium feature.
In Power BI, refreshing a dataset reloads it: first it is cleared, and second, you will need all the files in order to re-load them and recalculate everything. If you don't want this, you have to either change your ETL to store the data outside of the report's dataset (e.g. a database would be a very good choice), or push the data from only the new files to a dataset (which I wouldn't recommend in your case).
To summarize: the best solution is to build an ETL process and put the data in a data warehouse, and then use it as a data source for your reports.
Without premium licensing, is it possible to simulate an incremental refresh to speed up Power BI Desktop?
Say, we keep all the data before a certain date in a local Access database and connect to the "live" database only for data after that date?
The question is how to export the historical data from one or several .pbix files to Access; how can we do that?
Try doing it as a composite model. Load your archive data as one query using Import and your recent data as another query using Direct Query. Then you can union those two tables as a DAX calculated table and use that for your report.
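A minimal DAX sketch of that union, assuming the two queries landed as tables named SalesArchive (Import) and SalesRecent (Direct Query) with the same columns in the same order:

Sales =
UNION (
    SalesArchive,   -- historical rows, Import mode
    SalesRecent     -- rows after the cut-off date, Direct Query
)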
If you aren't using Direct Query for recent data or you need to be refreshing your model, then I believe you can uncheck "Include in report refresh" in the query editor (right-click on the query in the Queries pane) and it won't refresh that archive table unless you specifically ask it to.