Google BigQuery BI Engine monitoring and comparing

I have recently been asked to look into BI Engine for our BigQuery tables and views. I am trying to find out how to compare query speed with a BI Engine reservation against without one. Is there any way I can see this?
Thank you

Keep in mind that BI Engine uses BigQuery as a backend, so BI Engine reservations work like BigQuery reservations. For that reason, I suggest you look at the Reservations docs for more information about the differences between on-demand capacity and flat-rate pricing.
You can find useful concepts about reservations in this link.

There are a couple of ways to do that:
1) If your table is less than 1 GB, it will use the free tier, and any dashboard created in Data Studio will be accelerated (see https://cloud.google.com/bi-engine/pricing).
2) If not, create a reservation in the Cloud Console: https://cloud.google.com/bi-engine/docs/reserving-capacity. Once you create a reservation, Data Studio dashboards will be accelerated. You can experiment for a couple of hours and then remove the reservation; you will only be charged for the time the reservation was enabled.

BI Engine will in general only speed up smaller SELECT queries coming from Tableau, Looker, etc., and the BigQuery UI - for example, queries processing less than 16 GB.
My advice would be to make a reservation of, for example, 8 GB and then check how long the queries that used BI Engine took. You can do that by querying the information schema:
select
  creation_time,
  start_time,
  end_time,
  (unix_millis(end_time) - unix_millis(start_time)) / 1000 as total_time_seconds,
  job_id,
  cache_hit,
  bi_engine_statistics.bi_engine_mode,
  user_email,
  query
from `your_project_id.region-eu.INFORMATION_SCHEMA.JOBS`
where
  creation_time >= '2022-12-13' -- table is partitioned on creation_time
  and creation_time < '2022-12-14'
  and bi_engine_statistics.bi_engine_mode = 'FULL' -- BI Engine fully used for speed-up
  and query not like '%INFORMATION_SCHEMA%' -- BI Engine will not speed up these queries
order by creation_time desc, job_id
Then switch off BI Engine and rerun the queries that had bi_engine_mode = 'FULL', but now without BI Engine. Also make sure the query cache is turned off!
You can now compare the speed. In general, queries are 1.5 to 2 times faster, although it can also happen that there is no speed-up, and in some cases a query will take slightly longer.
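To quantify the comparison, you can export the timings from the two runs and pair them up per query. A minimal stdlib-only sketch; the two lists below are made-up stand-ins for (query, total_time_seconds) rows exported from INFORMATION_SCHEMA.JOBS:

```python
# Made-up timings: one run with BI Engine mode FULL, one run without it.
with_bi = [
    ("select country, sum(sales) from t group by 1", 0.8),
    ("select product, avg(price) from t group by 1", 1.1),
]
without_bi = [
    ("select country, sum(sales) from t group by 1", 1.6),
    ("select product, avg(price) from t group by 1", 1.9),
]

def speedups(with_bi, without_bi):
    # Pair runs by query text and compute how many times faster BI Engine was.
    baseline = dict(without_bi)
    return {
        query: baseline[query] / accelerated
        for query, accelerated in with_bi
        if query in baseline
    }

for query, ratio in speedups(with_bi, without_bi).items():
    print(f"{ratio:.1f}x  {query}")
```

With real data you would match on job metadata rather than raw query text if the same query ran several times.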
See also:
https://lakshmanok.medium.com/speeding-up-small-queries-in-bigquery-with-bi-engine-4ac8420a2ef0
BigQuery BI Engine: how to choose a good reservation size?

Related

Azure Data Explorer - Measuring the Cluster performance /impact

Is there a way to measure the impact on the Kusto cluster when we run a query from Power BI? This is because the query I use in Power BI might fetch a lot of data, even if it is for a limited time range. I am aware of the setting to limit query result records, but I would like to measure the cluster impact of specific queries.
Do I need to use the metrics under Data Explorer monitoring? Is there a best way to do this, and any specific metrics? Thanks.
You can use .show queries or the Query diagnostics logs - these can show you the resource utilization per query (e.g. total CPU time and memory peak), and you can filter to a specific user or application name (e.g. Power BI).
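As a rough sketch, filtering the command output to Power BI traffic could look like the following (the column names such as Application, Duration, TotalCpu and MemoryPeak are assumed from the standard .show queries output schema - verify them against your cluster):

```kusto
.show queries
| where Application contains "PowerBI"
| project StartedOn, Duration, TotalCpu, MemoryPeak, User, Text
| order by Duration desc
```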

Is there a way to connect PBI to a Databricks cluster that is not running?

In my scenario, Databricks performs read and write transformations on Delta tables. We have PBI connected to the Databricks cluster, which needs to be running most of the time and is expensive.
Knowing that the Delta tables live in a storage container, what would be the best way, in terms of cost vs. performance, to feed PBI from the Delta tables?
If your dataset is under the maximum size allowed in Power BI (100 GB, I believe) and a daily refresh is enough, you can just load everything into your Power BI model.
https://blog.gbrueckl.at/2021/01/reading-delta-lake-tables-natively-in-powerbi/
If you want to save costs and you don't need transactions, you can save the data as CSV in the data lake; then loading everything into Power BI and refreshing daily is really easy.
If you want to save costs while querying new incoming data all the time with DirectQuery, consider Azure SQL. It has really competitive prices, starting from about 5 EUR/USD per month, and the integration with Databricks is also excellent: just write in append mode and it does all the magic.
Another option to consider is to create an Azure Synapse workspace and use serverless SQL compute to query the Delta Lake files. This is a pay-per-TB-consumed pricing model, so you don't have to keep your Databricks cluster running all the time. It's a great way to load Power BI import models.
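A minimal sketch of what such a serverless query could look like, assuming the Delta files sit in ADLS Gen2 (the storage account, container and path here are placeholders):

```sql
-- Synapse serverless SQL pool: read the Delta table directly from the lake
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://yourstorageaccount.dfs.core.windows.net/yourcontainer/delta/your_table/',
    FORMAT = 'DELTA'
) AS rows;
```

You can wrap this in a view and point the Power BI import model at the view.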

Can Power BI power query be connected to a source of another Power BI report?

Is it possible in Power BI Power Query to connect from report A.pbix to the results of another report, B.pbix? If so, how? The reason for doing this is that in A.pbix we have one sort of aggregation - say, many monthly reports for one country - and in B.pbix we have another, second-stage aggregation - say, one report for all countries.
There are reasons for keeping them separate: tidiness, the possibility to refresh a single source, and lower memory usage.
The best option for this architecture is to publish B.pbix to a Workspace in the web service (app.powerbi.com) and then start A.pbix by connecting to the B.pbix dataset via Online Services / Power BI service.
That will make the entire dataset from B.pbix available for re-use. You only need to worry about query / model maintenance and refresh on the B.pbix dataset. Varying visuals on the report pages you build in A.pbix and B.pbix should meet your requirements.
It's described in some detail here:
https://learn.microsoft.com/en-us/power-bi/desktop-report-lifecycle-datasets

Power BI - Best practice importing big data?

I'm new to the BI field and I have to build BI over more than 25 million records.
I'm using DirectQuery to bring the data into Power BI and publish to the web, but the dashboard loads too slowly, and sometimes it won't load at all.
I'd like to know the best way to import a large amount of data into Power BI and then publish it to the web.
DirectQuery, Analysis Services, what else?
Thanks for the answers.
You can:
1) Aggregate the data to reduce its size.
2) Bring in only the useful columns - that reduces the size again.
3) Use DirectQuery only if the data really is real-time or the permutations/combinations are too many to pre-compute.
4) If you are on Spark, you can connect Power BI directly to Spark.
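Point 1 is usually the biggest win. As an illustration with made-up records, rolling daily fact rows up to month level shrinks the row count before it ever reaches Power BI:

```python
from collections import defaultdict

# Made-up daily fact rows: (date, store, amount).
rows = [
    ("2024-01-03", "A", 10.0),
    ("2024-01-17", "A", 5.0),
    ("2024-01-09", "B", 7.5),
    ("2024-02-02", "A", 4.0),
]

def monthly_totals(rows):
    # Roll daily rows up to (month, store), so far fewer rows are imported.
    totals = defaultdict(float)
    for date, store, amount in rows:
        totals[(date[:7], store)] += amount
    return dict(totals)

print(monthly_totals(rows))
# {('2024-01', 'A'): 15.0, ('2024-01', 'B'): 7.5, ('2024-02', 'A'): 4.0}
```

In practice you would do this aggregation in the source database or a view, not in Python, but the principle is the same: import pre-aggregated rows at the grain the dashboard actually needs.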

Power BI and Azure Document DB

I have connected Power BI to Azure Document DB, but it is taking too much time for the data to load and even more time to apply the queries. Is there any way to reduce this data loading time?
This is more of a generic question around Power BI, and even BI tools broadly. In general, you have to specify more filters in the queries (yes, they can be edited, even in plain SQL). Azure Cosmos DB is super fast; it all depends on how much data you're trying to query. Also, make sure the data is in the region from which users are accessing it in Power BI.
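For example, instead of pulling the whole container, you can push a filter down to Cosmos DB. A sketch in the Cosmos DB SQL API syntax (the property names orderDate and region are placeholders for your own document fields):

```sql
-- Filter server-side so Power BI only loads the slice it needs
SELECT c.id, c.orderDate, c.total
FROM c
WHERE c.orderDate >= '2024-01-01' AND c.region = 'EU'
```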