I'm using GCP BigQuery with Google Data Studio. I set up the BI Engine reservation in the same region as the dataset (EU, multiple locations).
I still get this warning in Data Studio: "Not accelerated by BigQuery BI Engine"
BI Engine accelerates only some types of queries; see the limitations documented here.
Go to your BigQuery console, open the project history (or job history), and filter to Queries only.
Examine each query coming from Data Studio; if you open its details pane, it will show the reason it was not accelerated (for example, "Detected unsupported join type").
If you prefer a more developer-friendly route, you can use the bq command-line tool.
To get the most recent jobs:
bq ls -j -a --max_results=15
To fetch the BI Engine statistics for a given query job, run the following bq command:
bq show --format=prettyjson -j job_id
and it will have a section such as:
"statistics": {
"creationTime": "1602175128902",
"endTime": "1602175130700",
"query": {
"biEngineStatistics": {
"biEngineMode": "DISABLED",
"biEngineReasons": [
{
"code": "UNSUPPORTED_SQL_TEXT",
"message": "Detected unsupported join type"
}
]
},
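If you prefer to script this check, here is a minimal Python sketch using the google-cloud-bigquery client. It reads the raw job resource, so the path into the statistics mirrors the JSON above; the project ID is a placeholder:

from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project ID

# Inspect the most recent jobs and print why BI Engine did or did not accelerate them
for job in client.list_jobs(max_results=15):
    query_stats = job._properties.get("statistics", {}).get("query", {})
    bi_stats = query_stats.get("biEngineStatistics")
    if bi_stats:
        print(job.job_id, bi_stats.get("biEngineMode"), bi_stats.get("biEngineReasons", []))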
I was trying to auto-tag InfoTypes like PhoneNumber and EmailId on the data in a GCS bucket and on BigQuery external tables using the Data Loss Prevention tool in GCP, so that I can have those tags in Data Catalog and subsequently in Dataplex. Now the problems are:
If I select any source other than a BigQuery table (GCS, Datastore, etc.), the option to publish GCP DLP inspection results to Data Catalog is disabled.
If I select a BigQuery table, the Data Catalog publish option is enabled, but when I try to run the inspection job it errors out saying, "External tables are not supported for inspection". Surprisingly, it supports only internal BigQuery tables.
The question is: is my understanding correct that the GCP DLP - Data Catalog integration works only for internal BigQuery tables? Am I doing something wrong here? The GCP documentation doesn't mention these things either!
Also, while configuring the inspection job from the DLP UI console, I had to provide a BigQuery table ID mandatorily. Is there a way I can run a DLP inspection job against a BQ dataset or a bunch of tables?
Regarding Data Loss Prevention services in Google Cloud, your understanding is correct: data cannot be exfiltrated by copying to services outside the perimeter, e.g., a public Google Cloud Storage (GCS) bucket or an external BigQuery table. Visit this URL for more reference.
Now, about how to run a DLP inspection job against a set of BQ tables, there are two ways to do it:
Programmatically fetch the BigQuery tables, query each table, and call the DLP Streaming Content API. It operates in real time, but it is expensive. Here is the concept in a Java example:
// Build the JDBC URL for the Simba BigQuery driver and list the project's tables
String url = String.format(
    "jdbc:bigquery://https://www.googleapis.com/bigquery/v2:443;OAuthType=3;ProjectId=%s;",
    projectId);
DataSource ds = new com.simba.googlebigquery.jdbc42.DataSource();
ds.setURL(url);
Connection conn = ds.getConnection();
DatabaseMetaData databaseMetadata = conn.getMetaData();
ResultSet tablesResultSet =
    databaseMetadata.getTables(conn.getCatalog(), null, "%", new String[]{"TABLE"});
while (tablesResultSet.next()) {
    // Query your table data and call the DLP Streaming Content API
}
Here is a tutorial for this method.
Programmatically fetch the BigQuery tables, and then trigger one inspect job for each table. It is the cheapest method, but you need to consider that it's a batch operation, so it doesn't execute in real time. Here is the concept in a Python example:
from google.cloud import bigquery

client = bigquery.Client()
datasets = list(client.list_datasets(project=project_id))
if datasets:
    for dataset in datasets:
        tables = client.list_tables(dataset.reference)
        for table in tables:
            # Create an inspect job for table.table_id
            print(table.table_id)
Use this thread for more reference on running a DLP inspection job against a set of BQ tables.
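To make the "create an inspect job for each table" step concrete, here is a minimal Python sketch using the google-cloud-dlp client. The project ID, info types, and the Data Catalog publish action shown here are illustrative assumptions; adapt them to your setup (and note that, per the above, the Data Catalog action only works for native BigQuery tables):

from google.cloud import bigquery
from google.cloud import dlp_v2

project_id = "my-project"  # placeholder
bq_client = bigquery.Client(project=project_id)
dlp_client = dlp_v2.DlpServiceClient()
parent = f"projects/{project_id}/locations/global"

for dataset in bq_client.list_datasets(project=project_id):
    for table in bq_client.list_tables(dataset.reference):
        inspect_job = {
            "storage_config": {
                "big_query_options": {
                    "table_reference": {
                        "project_id": table.project,
                        "dataset_id": table.dataset_id,
                        "table_id": table.table_id,
                    }
                }
            },
            "inspect_config": {
                "info_types": [{"name": "EMAIL_ADDRESS"}, {"name": "PHONE_NUMBER"}]
            },
            # Publish findings to Data Catalog (native BigQuery tables only)
            "actions": [{"publish_findings_to_cloud_data_catalog": {}}],
        }
        dlp_client.create_dlp_job(request={"parent": parent, "inspect_job": inspect_job})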
Where/how can I easily see how many BigQuery analysis queries have been run per month? How about storage usage overall and its changes over time (monthly)?
I've had a quick look at "Monitoring > Dashboards > BigQuery". Is that the best place to explore? It only seems to go back to early October - was that when it was released, or does it only display the last X weeks of data? Trying Metrics Explorer for query count (metric: bigquery.googleapis.com/job/num_in_flight) gave me a weird unlabelled y-axis, e.g. a scale of 0 to 0.08? Odd, as I expect to see a few hundred queries run per week.
Context: it would be good to have a high-level summary of BigQuery usage as the months progress, to give the wider organisation and management an idea of the scale of usage.
You can track your bytes billed by exporting BigQuery usage logs.
Set up the logs export (this uses the Legacy Logs Viewer):
Open Logging -> Logs Viewer
Click Create Sink
Enter "Sink Name"
For "Sink service" choose "BigQuery dataset"
Select your BigQuery dataset to monitor
Click Create sink
Once the export is enabled, every executed query will write its data access log to a table named "cloudaudit_googleapis_com_data_access_YYYYMMDD" in the BigQuery dataset you selected for your sink.
Here is a sample query to get bytes billed per user:
#standardSQL
WITH data AS
(
  SELECT
    protopayload_auditlog.authenticationInfo.principalEmail AS principalEmail,
    protopayload_auditlog.metadataJson AS metadataJson,
    CAST(JSON_EXTRACT_SCALAR(protopayload_auditlog.metadataJson,
      "$.jobChange.job.jobStats.queryStats.totalBilledBytes") AS INT64) AS totalBilledBytes
  FROM
    `myproject_id.training_big_query.cloudaudit_googleapis_com_data_access_*`
)
SELECT
  principalEmail,
  SUM(totalBilledBytes) AS billed_bytes
FROM
  data
WHERE
  JSON_EXTRACT_SCALAR(metadataJson, "$.jobChange.job.jobConfig.type") = "QUERY"
GROUP BY principalEmail
ORDER BY billed_bytes DESC
NOTES:
You can only track usage starting from the date you set up the logs export
A new "cloudaudit_googleapis_com_data_access_YYYYMMDD" table is created each day to hold that day's logs
I think Cloud Monitoring is the only place to create and view metrics. If you are not happy with what it provides for BigQuery by default, the only other alternative is to create your own customized charts and dashboards that satisfy your needs. You can achieve that using the Monitoring Query Language (MQL). Using MQL you can achieve what you described in your question. Here are the links for more detailed information.
Introduction to BigQuery monitoring
Introduction to Monitoring Query Language
I have set up a build pipeline in Azure DevOps which builds the project, runs the MSTest tests, and generates a code coverage report as well as code analysis metrics results.
How do I get these results onto a dashboard such as Power BI or anything similar? What are the different visualization options from Azure DevOps?
I know about adding a widget and getting the visualization in an Azure DevOps dashboard, but I am looking for an option where I can publish the results, see the historic code metrics, and drill down to class-level results.
You can check the sample reports in the following link:
https://learn.microsoft.com/en-us/azure/devops/report/powerbi/sample-odata-overview?view=azure-devops
For example, you can paste a Power BI query like the one below directly into the Get Data -> Blank Query window ({organization}, {project}, {version}, {pipelineName}, and {startdate} are placeholders to fill in):
let
    Source = OData.Feed ("https://analytics.dev.azure.com/{organization}/{project}/_odata/{version}/TestResultsDaily?"
        &"$apply=filter( "
            &"Pipeline/PipelineName eq '{pipelineName}' "
            &"And Date/Date ge {startdate} "
            &"And Workflow eq 'Build' "
        &") "
        &"/aggregate( "
            &"ResultCount with sum as ResultCount, "
            &"ResultPassCount with sum as ResultPassCount, "
            &"ResultFailCount with sum as ResultFailCount, "
            &"ResultNotExecutedCount with sum as ResultNotExecutedCount, "
            &"ResultNotImpactedCount with sum as ResultNotImpactedCount "
        &") "
    ,null, [Implementation="2.0",OmitValues = ODataOmitValues.Nulls,ODataVersion = 4])
in
    Source
I published the metrics XML files from the pipeline to blob storage and then added the blob storage as a data source in Power BI. I did the necessary transformations in Power BI and published the report to a Power BI dashboard, which gave me the required report and made it accessible to people in the organization.
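In case it helps, here is a minimal Python sketch of the "copy the metrics XML to blob storage" step using azure-storage-blob; the connection string, container, and file names are placeholders:

from azure.storage.blob import BlobServiceClient

# Placeholders: your storage connection string, container, and metrics file
connection_string = "DefaultEndpointsProtocol=...;AccountName=...;AccountKey=..."
service = BlobServiceClient.from_connection_string(connection_string)
blob_client = service.get_blob_client(container="build-metrics", blob="CodeMetricsResult.xml")

with open("CodeMetricsResult.xml", "rb") as data:
    blob_client.upload_blob(data, overwrite=True)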
I have a Power BI project where the data is stored in SQL Server and accessed through DirectQuery. I would like to schedule the refresh of the report, and therefore of the data, through Microsoft Power Automate (Flow).
When I run the schedule, however, I get this error message: "Invalid dataset. This API can only be called on a Model-based dataset".
{
  "error": {
    "code": "InvalidRequest",
    "message": "Invalid dataset. This API can only be called on a Model-based dataset"
  }
}
Why?
When you publish a DirectQuery report to the Power BI Service, there is no need for a scheduled refresh: the service is able to query the data source directly. Only Import datasets require a refresh when published to the service. That is the reason for the error - there is basically nothing to refresh.
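For context, the dataset refresh REST API that such a Flow action corresponds to only succeeds for Import-mode (model-based) datasets. A rough Python sketch of that call, assuming you already have an Azure AD access token and the workspace and dataset IDs:

import requests

# Placeholders: an AAD access token with Power BI scope, plus workspace and dataset IDs
access_token = "<AAD access token>"
group_id = "<workspace id>"
dataset_id = "<dataset id>"

url = f"https://api.powerbi.com/v1.0/myorg/groups/{group_id}/datasets/{dataset_id}/refreshes"
response = requests.post(url, headers={"Authorization": f"Bearer {access_token}"})

# For a DirectQuery dataset this returns the "Invalid dataset" error shown above;
# for an Import dataset it queues a refresh.
print(response.status_code, response.text)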
I have injected data into Druid using Tranquility.
The datasource is visible through the Overlord console; all good, I can query it.
Tranquility 0.1.0
Druid 12.3
Superset 0.1.0
When I attach the Druid datasource to Superset, I see that all defined columns are of type String. That is pretty weird, because I defined the types in the Tranquility schema as follows:
"dimensionsSpec": {
"dimensions": [
"some_id",
{
"type": "double",
"name": "total_positions"
}]
}
I tried to use Calculated Columns and Metrics, but when I save, those new elements do not appear in Druid.
Did anyone have a similar issue? Is there any way I can change the column type in Superset, or should the schema be defined in some different way?
We have the same issue in our environment. We were planning to use it in Apache Branch Report.
As a workaround, we've created an external table for Druid on Hive and use the Hive connector in Superset in order to cast to integer in SQL Lab: https://cwiki.apache.org/confluence/display/Hive/Druid+Integration
However, it would have been much better if Superset charts could interpret numeric dimensions out of the box so that the architecture would be leaner.
We faced a similar issue. By default, all dimensions were treated as Strings. In Tranquility, we used metricsSpec and defined the column as longSum. These columns are reflected as numbers in Superset. Remember to refresh the Druid metadata in Superset.
"metricsSpec": [
{
"name": "trafficUp",
"type": "longSum",
"fieldName": "trafficUp"
}
]