When developing a query in Power BI with a database data source, making any changes causes the query editor to 'start from scratch' and re-query the database.
I'm wondering if there is a workaround that would let you develop a query without repeated long waits, e.g. by downloading a temporary local flat file of the full dataset, developing the query offline against that file, and then swapping the live database connection back in once you are happy with it.
Importing the data once, exporting it as a CSV from a Power BI table visualisation, and re-importing that as a new data source would work, but maybe there's a simpler way?
Thanks
There are two approaches you can use.
If your database supports query folding, make the first step take just the top 200 records whilst you develop your query (a sketch follows after these two options). Once you're happy with it, remove the FirstN filter.
Load the entire table to the model, export it to a CSV using DAX Studio, develop your query against the CSV, and then switch back to the DB once you're happy with it.
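A minimal sketch of the first (query folding) approach in M, assuming a SQL Server source and a hypothetical dbo.Sales table; Table.FirstN folds to a TOP (200) on the server, so only 200 rows come over the wire while you develop:

let
    Source = Sql.Database("myserver.example.com", "MyDb"),  // hypothetical server and database
    Sales = Source{[Schema = "dbo", Item = "Sales"]}[Data],
    // Development-only step: folds to SELECT TOP (200) ... on the server.
    // Delete this step once the rest of the query is finished.
    Top200 = Table.FirstN(Sales, 200)
in
    Top200

Any transformation steps you build on top of Top200 keep working unchanged after you delete the development filter.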
Context:
I use SQL Lab to generate a table for charting, but it often times out when I connect the SQL query result (through Explore) to multiple charts in a dashboard. So I try to use CTAS and CVAS in Superset to create a middle-layer table for the charts to use, which loads much faster. But the problem is that it doesn't refresh when new data comes in.
Current solution: I have searched the internet for a week and found no relevant solution so far.
The Problem:
How can I make CTAS and CVAS tables and views created through SQL Lab auto-refresh?
Currently we have a problem with loading data when refreshing the report against the DB: it has too many records and it takes forever to load all the data. The issue is how I can load only the data from the last year, to avoid taking so long to load everything. As far as I can see, connecting to the Cosmos DB lets me enter an SQL query in the dialog box, but I don't know how to write one for this type of non-relational database.
Power BI has an incremental refresh feature. You should be able to refresh the current year only.
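Incremental refresh is configured in the UI, but it relies on two reserved datetime parameters, RangeStart and RangeEnd, plus a filter step in your query. A minimal sketch, assuming hypothetical account, database, and collection names, and a hypothetical CreatedAt property on your documents:

let
    Source = DocumentDB.Contents("https://myaccount.documents.azure.com:443/", "MyDb", "MyCollection"),
    // Expand the document records and keep the (hypothetical) CreatedAt property
    Expanded = Table.ExpandRecordColumn(Source, "Document", {"CreatedAt"}),
    // RangeStart and RangeEnd are the reserved parameters incremental refresh fills in per partition
    Filtered = Table.SelectRows(Expanded,
        each DateTime.From([CreatedAt]) >= RangeStart and DateTime.From([CreatedAt]) < RangeEnd)
in
    Filtered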
If that still doesn’t meet expectations I would look at a preview feature called Azure Synapse Link which automatically pulls all Cosmos DB updates out into analytical storage you can query much faster in Azure Synapse Analytics in order to refresh Power BI faster.
Depending on the volume of the data, you will hit a number of issues. The first is that you may exceed your RU limit, slowing down the extraction of the data from Cosmos DB. The second is transforming the data from JSON into a structured format.
I would try to write a query that specifies only the fields and items you need. That will reduce the processing time and the amount of data retrieved.
For SQL queries it will be something like:
SELECT * FROM c WHERE c.partitionEntity = 'guid'
For more information to get you started, see the Cosmos DB SQL API syntax documentation.
You can use the query window in Azure to run the SQL commands, or Azure Storage Explorer to test the query, then move it to Power BI.
What is highly recommended is to extract the data into a place where it can be transformed into a structured format like a table or CSV file.
For example, use Azure Databricks to extract the data and then turn the JSON into a table-formatted object.
You have the option of running Databricks notebook queries against Cosmos DB, or running Azure Databricks in its own instance. One other option would be to use the change feed with an Azure Function to send and shred the data to Blob Storage, and query it from there using Power BI, Databricks, Azure SQL Database, etc.
In the Source step of your query, you can make a select based on the Cosmos DB _ts system property, like:
Query ="SELECT * FROM XYZ AS t WHERE t._ts > 1609455599"
In this case, 1609455599 is the Unix timestamp which corresponds to 31.12.2020, 23:59:59 (UTC+1; _ts is recorded in epoch seconds). So, only data from 2021 will be selected.
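If you'd rather not work out the epoch value by hand, you can compute it in M. A small sketch, using the end of 2020 as the cut-off (in UTC; adjust for your timezone if needed):

let
    CutOff = #datetime(2020, 12, 31, 23, 59, 59),
    // _ts is Unix time in seconds, i.e. seconds since 1970-01-01 00:00:00
    EpochSeconds = Int64.From(Duration.TotalSeconds(CutOff - #datetime(1970, 1, 1, 0, 0, 0))),
    QueryText = "SELECT * FROM XYZ AS t WHERE t._ts > " & Number.ToText(EpochSeconds)
in
    QueryText

QueryText can then be passed as the Query option of the Cosmos DB Source step shown above.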
I'm trying to build a simple report in Power BI based upon data published on a website.
Here is what I want to achieve:
This website publishes data for COVID cases in the country.
The numbers are just the current figures, without any time series.
I want to fetch these numbers from this website daily and build a report on
top of it (with time series kind of analysis).
So I fetch these numbers (Get Data > Web > URL) into a query. I then add a
custom column with a timestamp (M's DateTime.LocalNow() function)
and get the data with the required timestamp.
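The custom-column step looks roughly like this (the stub table stands in for whatever the Web connector parsed from the page; region names and numbers are made up):

let
    // Stand-in for the table parsed from the website
    Data = #table({"Region", "Cases"}, {{"North", 120}, {"South", 87}}),
    // Stamp every row with the time this refresh ran
    WithTimestamp = Table.AddColumn(Data, "FetchedAt", each DateTime.LocalNow(), type datetime)
in
    WithTimestamp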
Now I want to refresh this query daily, so that I get daily results in this query.
As expected, PBI simply overwrites the existing rows with new data, all carrying the latest timestamp (my custom column).
I tried a few things:
Creating a new query and appending data to it. It doesn't seem to work; the existing data gets overwritten (maybe because of the way I created the new query).
Exploring the incremental refresh functionality; it doesn't seem to fit my use case.
Looking at other similar posts; none seem to help me resolve this.
Questions:
Is there a simple workaround to circumvent this overwriting behaviour and have PBI append new data instead of replacing the existing data?
Am I correct about incremental refresh above, i.e. that it doesn't fit this use case?
Appreciate any pointers. Thanks in advance!
There is no simple workaround within Power BI.
Power BI is not designed to be used as a database where you store historical data. It's designed to connect to data and create reports from that, so you'll need to store the daily data somewhere external.
There are tons of ways to store the data. E.g., you could save them as CSVs in a folder that Power BI loads from or you could write them to a database table and connect to that.
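For the folder option, a minimal sketch, assuming you save one CSV per day (same columns in each) into a hypothetical C:\CovidSnapshots folder; Power BI then stacks all the snapshots into one history table:

let
    Source = Folder.Files("C:\CovidSnapshots"),
    // Parse each file and promote its first row to headers
    Parsed = Table.AddColumn(Source, "Data", each Table.PromoteHeaders(Csv.Document([Content]))),
    // Stack all daily snapshots into one table
    Combined = Table.Combine(Parsed[Data])
in
    Combined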
Edit: That said, there is a non-simple workaround if this is something you really must do.
Though not recommended, you can use incremental refresh to trick Power BI into doing what you want.
I’m pretty new to Power BI and am still at the point of assessing whether it will meet our needs.
I’ve got as far as realising that when creating a new report I can either Import tables (I’m using SQL Server) or use a Direct Query.
The particular report I’m trying to create is quite resource-intensive. Creating it in T-SQL requires iterating through hundreds of thousands of rows in multiple tables with a cursor and then storing some data in a temp table, which is the output of the query. I’m very concerned about using the Direct Query option for this because of the potential performance degradation on the server.
Is it possible in Power BI Desktop to Import the 5 tables that are used in my query and then somehow write my query against those tables? That way (in theory) the query wouldn’t be sent directly to our server each time someone views the report.
My question is based on my lack of knowledge of Power BI, so I may be asking something that is completely impossible!
Thanks in advance for your help
Regards
Dotdev
That's exactly what the Import option does. It imports the tables only once (unless you refresh or change your query). The viewer would be looking at the data that was extracted upon import and packaged into the PBIX file, rather than at a direct connection to the database.
I have a handful of Power BI Queries that hit the same datasource (Azure Blob Storage). Currently, when I want to refresh data, all the queries download the files from blob storage and parse them, making the process take far longer. Is there a way to have a query that does the download of the file and store it for other queries to read from so I don't have to download the same files over and over?
You can add a blank query and reference the original query.
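A minimal sketch, with hypothetical names: the base query Blobs does the download and parsing once, and every other query starts from it instead of hitting Blob Storage again.

// Base query, named "Blobs"
let
    Source = AzureStorage.Blobs("https://myaccount.blob.core.windows.net"),  // hypothetical account
    Parsed = Table.AddColumn(Source, "Data", each Csv.Document([Content]))   // parse each blob
in
    Parsed

// A dependent query that references the base query
let
    Source = Blobs,
    SalesOnly = Table.SelectRows(Source, each Text.StartsWith([Name], "sales"))
in
    SalesOnly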
Then, when you refresh, the data in both queries will be downloaded only once.