Azure Data Explorer - measuring the cluster performance/impact - Power BI

Is there a way to measure the impact on the Kusto cluster when we run a query from Power BI? The query I use in Power BI might retrieve a large amount of data even for a limited time range. I am aware of the setting to limit query result records, but I would like to measure the impact on the cluster for specific queries.
Do I need to use the metrics under Data Explorer monitoring? Is there a best way to do this, and are there specific metrics I should look at? Thanks.

You can use .show queries or the query diagnostics logs - both show the resource utilization per query (e.g. total CPU time and peak memory), and you can filter to a specific user or application name (e.g. Power BI).
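For example, a minimal KQL sketch along these lines (assuming the Power BI connector shows up with an application name containing "Power BI"; check the Application column on your cluster and adjust the filter accordingly):

.show queries
| where StartedOn > ago(1d)
| where Application has "Power BI" // assumption: the exact application string may differ on your cluster
| project StartedOn, User, Application, State, Duration, TotalCpu, MemoryPeak, Text
| order by StartedOn desc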

Related

Item Duration in Cache

I am trying to create a metric to measure the amount of time that an item has been in a cache using ElastiCache. There does not seem to be any built-in metric for this in CloudWatch, and I have struggled to write a query in Logs Insights to obtain this information.
I have tried running a query in Logs Insights to create this metric, but it requires matching on an ID, and the query language used in AWS does not seem to support these kinds of conditional queries, so I am unsure how to solve this problem.

BigQuery with BI Engine is slower than BigQuery with cache

I've read almost all the threads about how to improve BigQuery performance so that data can be retrieved in milliseconds, or at least in under a second.
I decided to use BI Engine for the purpose because it has seamless integration without code changes, it supports partitioning, smart offloading, real-time data, built-in compression, low latency, etc.
Unfortunately, for the same query I got a slower response time with BI Engine enabled than with just the query cache enabled.
BigQuery with cache hit
Average 691ms response time from BigQuery API
https://gist.github.com/bgizdov/b96c6c3d795f5f14e5e9a3e9d7091d85
BigQuery + BI Engine
Average 1605ms response time from BigQuery API.
finalExecutionDurationMs is about 200-300ms, but the total time to retrieve the data (just 8 rows) is 5-6 times more.
BigQuery UI: elapsed 766ms, but the actual time for its call to the REST entity service is 1.50s. This explains why I get similar results.
https://gist.github.com/bgizdov/fcabcbce9f96cf7dc618298b2d69575d
I am using Quarkus with BigQuery integration and measuring the time for the query with Stopwatch by Guava.
The table is about 350MB, the BI reservation is 1GB.
The returned rows are 8, aggregated from 300 rows. This is a very small data size with a simple query.
I know BigQuery does not perform well with small data sizes (or that it may not matter at this scale), but I want to get the data in under a second; that's why I tried BI Engine, and this will not improve with bigger datasets.
Could you please share the job ID?
BI Engine enables a number of optimizations, and for the vast majority of queries they allow significantly faster and more efficient processing.
However, there are corner cases where BI Engine optimizations are not as effective. One issue is the initial loading of the data - we fetch data into RAM using an optimal encoding, whereas BigQuery processes data directly, so subsequent queries should be faster. Another is that some operators are very easy to optimize to maximize CPU utilization (e.g. aggregations/filtering/compute), while others can be trickier.

Is there a way to connect PBI to a Databricks cluster that is not running?

In my scenario, Databricks performs read and write transformations on Delta tables. We have Power BI connected to the Databricks cluster, which needs to be running most of the time and is therefore expensive.
Given that the Delta tables live in a storage container, what would be the best way, in terms of cost vs. performance, to feed Power BI from the Delta tables?
If your dataset size is under the maximum allowed size in Power BI (100 GB, I believe) and a daily refresh is enough, you can just load everything into your Power BI model.
https://blog.gbrueckl.at/2021/01/reading-delta-lake-tables-natively-in-powerbi/
If you want to save costs, maybe you don't need transactions and can store the data as CSV in the data lake; then loading everything into Power BI and refreshing daily is really easy.
If you want to save costs and query newly incoming data all the time using DirectQuery, consider using Azure SQL. It has really competitive prices, starting from around 5 EUR/USD. Integration with Databricks is also simple: write to it in append mode and it does all the magic.
Another option to consider is to create an Azure Synapse workspace and use serverless SQL compute to query the Delta Lake files. This is a pay-per-TB-consumed pricing model, so you don't have to have your Databricks cluster running all the time. It's a great way to load Power BI import models; see the sketch below.
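A minimal, hedged T-SQL sketch of the serverless approach (the storage account, container and folder path are hypothetical placeholders; the point is only the OPENROWSET ... FORMAT = 'DELTA' pattern):

-- Query Delta Lake files directly from Synapse serverless SQL; Power BI can then
-- import from (or direct-query) a view defined over this query.
SELECT TOP 100 *
FROM OPENROWSET(
    BULK 'https://yourstorageaccount.dfs.core.windows.net/yourcontainer/delta/sales/', -- placeholder path
    FORMAT = 'DELTA'
) AS rows;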

Is it possible to set a query size limit quota for BigQuery ML (BQML)?

It's possible to set query size limits for the BigQuery API at the project and user level; see https://cloud.google.com/bigquery/quotas
As I understand it, this includes BQML. The costs between BQ and BQML differ significantly, though. If we set a query size limit of 1 TB per user per day, this would allow the user to consume 1 TB with BQML, which results in costs of $250, whereas for a normal BQ query the cost would be $5.
Is there a way to set a user query size limit specifically for BQML?
Unfortunately, for on-demand users there is no way to set a query limit specifically for the BQML CREATE MODEL statement.

Google BigQuery BI Engine monitoring and comparing

I have recently been asked to look into BI Engine for our BigQuery tables and views. I am trying to find out how to compare the speed of queries with a BI Engine reservation against queries without one. Is there any way I can see this?
Thank you
Keep in mind that BI Engine uses BigQuery as a backend; for that reason, BI Engine reservations work like BigQuery reservations too. Based on this, I suggest you look at the Reservations docs to get more information about the differences between on-demand capacity and flat-rate pricing.
You can find useful concepts about reservations in this link.
There are a couple of ways to do that:
1) If your table is less than 1 GB, it will use the free tier. Then any dashboard created in Data Studio will be accelerated (see https://cloud.google.com/bi-engine/pricing).
2) If not, create a reservation in the Cloud Console: https://cloud.google.com/bi-engine/docs/reserving-capacity. Once you create the reservation, Data Studio dashboards will be accelerated. You can experiment for a couple of hours and then remove the reservation; you will only be charged for the time the reservation was enabled.
BI Engine will in general only speed up smaller SELECT queries coming from Tableau, Looker etc., and the UI - for example, queries processing less than 16 GB.
My advice would be to make a reservation for example for 8GB and then check how long it took for queries that used BI Engine. You can do that by querying the information schema:
select
creation_time,
start_time,
end_time,
(unix_millis(end_time) - unix_millis(start_time)) / 1000 total_time_seconds,
job_id,
cache_hit,
bi_engine_statistics.bi_engine_mode,
user_email,
query
from `your_project_id.region-eu.INFORMATION_SCHEMA.JOBS`
where
creation_time >= '2022-12-13' -- partitioned on creation_time
and creation_time < '2022-12-14'
and bi_engine_statistics.bi_engine_mode = 'FULL' -- BI Engine fully used for speed up
and query not like '%INFORMATION_SCHEMA%' -- BI Engine will not speed up these queries
order by creation_time desc, job_id
Then switch off BI Engine and run the queries that had bi_engine_mode = 'FULL' again, but now without BI Engine. Also make sure the query cache is turned off (for example by disabling 'Use cached results' in the query settings).
You can now compare the speed. In general, queries are 1.5 to 2 times faster, although it can also happen that there is no speed-up, or in some cases a query will even take slightly longer.
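To summarize the comparison directly from the jobs metadata, a small follow-up query along these lines can help (a sketch only; the project, region and date filter are the same assumptions as in the query above):

select
ifnull(bi_engine_statistics.bi_engine_mode, 'NONE') as bi_engine_mode,
count(*) as runs,
round(avg(unix_millis(end_time) - unix_millis(start_time)) / 1000, 2) as avg_runtime_seconds
from `your_project_id.region-eu.INFORMATION_SCHEMA.JOBS`
where
creation_time >= '2022-12-13'
and creation_time < '2022-12-14'
and cache_hit = false -- exclude cache hits so the comparison is fair
and query not like '%INFORMATION_SCHEMA%'
group by bi_engine_mode
order by bi_engine_mode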
See also:
https://lakshmanok.medium.com/speeding-up-small-queries-in-bigquery-with-bi-engine-4ac8420a2ef0
BigQuery BI Engine: how to choose a good reservation size?