I cannot find a way to do this in the UI: I'd like to have distinct query tabs in the BigQuery UI attached to the same session (i.e. so that they share the same @@session_id and _SESSION variables). For example, I'd like to create a temporary (session-scoped) table in one tab, then refer to that temp table from a separate query tab.
As far as I can tell, when I put a query tab in Session Mode, it always creates a new session, which is precisely what I don't want :-\
Is this doable in BQ's UI?
There is a third-party IDE for BigQuery that supports this feature (namely, joining tabs to an existing session).
It is Goliath, part of the Potens.io suite available on the Marketplace.
Let's see how it works there:
Step 1 - create a tab with a new session and run a query to actually initiate the session
Step 2 - create new tab(s) and join them to the existing session (using either the session_id or simply the respective tab name)
Now both tabs (Tab 2 and Tab 3) share the same session, with all the expected perks.
You can add as many tabs to that session as you want to organize your workspace comfortably.
Tabs that belong to the same session are shown in a user-defined color, so it is easy to navigate between them.
Note: Another tool in this suite is Magnus, a workflow automator. It supports all of BigQuery, Cloud Storage and most Google APIs, as well as many simple utility-type tasks such as BigQuery Task, Export to Storage Task, Loop Task and many more, along with advanced scheduling, triggering, etc. It also supports GitHub for source control.
Disclosure: I am a GDE for Google Cloud, the creator of these tools, and the lead of the Potens team.
In Google Cloud Platform, you can add labels to several kinds of resources, and you can also add labels to the query jobs you execute. I went with the second option. A typical command looks like this:
bq query --label=my_label:{parameter} --label=my_label2:{parameter2} --format=json --use_legacy_sql=false '{query}'
But, by mistake, the first time I ran it like this:
bq query --label=my_label{parameter} --label=my_label2:{parameter2} --format=json --use_legacy_sql=false '{query}'
which created several jobs (I ran this command regularly) with a label named my_labelFoo and an empty value, instead of a label named my_label with a value of Foo. We detected this when we noticed, in the Billing UI, several labels offered as filter options, all of this form:
my_labelFoo
my_labelBar
my_labelBaz
my_labelJohn
my_labelGeorge
my_labelRingo
my_labelPaul
...
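One way to guard against this kind of typo is to build the --label flags from a mapping instead of typing them by hand. The sketch below is a hypothetical helper (label_flags is not part of bq or of any Google library); it only assembles strings in the key:value form used in the command above and rejects empty keys or values.

```python
def label_flags(labels):
    """Build bq --label flags from a dict so the key:value colon is never dropped."""
    flags = []
    for key, value in labels.items():
        if not key or ":" in key:
            raise ValueError(f"invalid label key: {key!r}")
        if not value:
            # An empty value is exactly what the mistyped flag produced.
            raise ValueError(f"label {key!r} has no value")
        flags.append(f"--label={key}:{value}")
    return flags


# Placeholder values; join the flags into the rest of the bq command.
print(" ".join(label_flags({"my_label": "Foo", "my_label2": "Bar"})))
```

The returned list can be passed as separate arguments to whatever launches the bq command, which avoids hand-editing the flag strings.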
What I tried to do, then, is to delete the metadata of those wrong jobs. So I tried this query in BigQuery (having the appropriate permissions):
SELECT job_id, query, labels FROM `my-project`.`region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT WHERE ARRAY_LENGTH(labels) > 0 AND EXISTS(SELECT * FROM UNNEST(labels) l WHERE l.key = 'my_labelRingo')
For each job_id retrieved this way, I tried invoking:
from google.cloud.bigquery import Client
Client().delete_job_metadata(job_id, location="us")
What I can say for sure is that the job entries were removed (there were only a few), but...
...when I go back to the Billing UI, I still see my_labelRingo as a selectable label there. I don't want that label to exist anymore.
So, my question is:
How do I delete the wrong labels from the Billing UI?
Is there, perhaps, a time I have to wait for my_labelRingo to cease to exist?
The behavior you are seeing with labels in the Billing console is specific to Cloud Billing, so you will need to engage Cloud Billing Support directly so that they can fully investigate why it is happening.
The solutions shared below are different alternatives to delete labels in the GCP BigQuery console.
You can delete a table or view label in the following ways:
Using the console
Using SQL DDL statements
Using the bq command-line tool's bq update command
Calling the tables.patch API method (because views are treated like table resources, tables.patch is used to modify both views and tables)
Using the client libraries
To do so, you need the following permissions:
bigquery.tables.get
bigquery.tables.update
For example, to delete a label through the console, follow these steps:
In the console, select the dataset you want to edit.
Click the Details tab, then click the pencil icon to the right of Labels.
In the Edit labels dialog:
For each label you want to delete, click delete (X).
Click Update to save the changes.
Also you can see more ways to delete labels.
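For the client-library route, here is a minimal sketch using the google-cloud-bigquery Python client, where setting a label's value to None and updating the labels field removes that label. The function names and the table ID passed in are illustrative assumptions, and the call needs the two permissions listed above.

```python
def labels_marked_for_deletion(current_labels, keys_to_delete):
    """Return a copy of the labels where each key to delete maps to None."""
    updated = dict(current_labels)
    for key in keys_to_delete:
        if key in updated:
            updated[key] = None  # None asks update_table to remove the label
    return updated


def delete_table_labels(table_id, keys_to_delete):
    """Remove the given label keys from a table or view, e.g. 'project.dataset.table'."""
    # Imported lazily so the helper above works without the library installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    table = client.get_table(table_id)  # requires bigquery.tables.get
    table.labels = labels_marked_for_deletion(table.labels, keys_to_delete)
    # Only the labels field is sent; requires bigquery.tables.update
    return client.update_table(table, ["labels"])
```

This only affects labels stored on the table or view itself; it does not change what the Billing console lists for past usage.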
My scenario is:
I have 3 Dataflows:
Recent Data (from SQL Server. Refreshes 8 times a day)
Historical Data (does not refresh, just once initially)
Sharepoint Excel file Data
In my Dataset, I want to have a single Fact table that is a "union all" of all 3 sources.
Instead of an Append transformation, I want to create 3 custom partitions (well explained here: https://www.youtube.com/watch?v=6CRqdsLjHNA&t=127s).
I want to somehow tell the scheduled refresh to process only the Recent Data and Excel Data partitions.
The reasoning is: if I use Append, the dataset will process the Historical Data again on every refresh.
Now 2 questions:
How do I tell the scheduled refresh to only process two of 3 partitions? (I can do it manually via XMLA endpoint, but I need it scheduled)
What if I change something in my report (like visuals) - how do I deploy the changes without needing to recreate the partitions?
See Advanced Refresh Scenarios which includes Metadata Only Deployment, and Automate Premium workspace and dataset tasks with service principals.
The easiest way to generate the TMSL scripts for the advanced refresh scenarios is with SQL Server Management Studio (SSMS) which has wizards for configuring refresh, and can generate the script for you. Then you use the script through PowerShell cmdlets or using ADOMD.NET, which in turn can be automated with Azure Automation or an Azure Function.
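As a rough illustration of what such a script contains, the sketch below assembles a TMSL refresh command limited to specific partitions. The database, table and partition names are placeholders based on the question's scenario, not real objects, and a script generated by SSMS may include additional properties.

```python
import json


def tmsl_partition_refresh(database, table, partitions, refresh_type="full"):
    """Build a TMSL refresh command that targets only the named partitions."""
    return {
        "refresh": {
            "type": refresh_type,
            "objects": [
                {"database": database, "table": table, "partition": name}
                for name in partitions
            ],
        }
    }


# Placeholder names: refresh everything except the Historical Data partition.
script = tmsl_partition_refresh("MyDataset", "Fact", ["Recent Data", "Excel Data"])
print(json.dumps(script, indent=2))
```

The printed JSON is what would be executed against the XMLA endpoint, for example from a PowerShell cmdlet in a scheduled job.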
If you don't need full TMSL scripting capabilities, Power Automate has connectors that call the Power BI REST APIs, but they don't currently support partition-based refresh. You can, however, call the REST Refresh API directly from any programming language, or through the Power Automate HTTP action.
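A direct call could look roughly like the sketch below, which prepares a POST to the dataset refreshes endpoint with an objects list naming the partitions to process. It assumes the dataset sits on a capacity where this extended request body is accepted, and the workspace ID, dataset ID, table and partition names and the access token are all placeholders.

```python
import json
import urllib.request

API_ROOT = "https://api.powerbi.com/v1.0/myorg"


def build_partition_refresh(group_id, dataset_id, table, partitions, access_token):
    """Prepare (but do not send) a refresh request limited to some partitions."""
    body = {
        "type": "full",
        "objects": [{"table": table, "partition": name} for name in partitions],
    }
    return urllib.request.Request(
        f"{API_ROOT}/groups/{group_id}/datasets/{dataset_id}/refreshes",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_partition_refresh(
    "WORKSPACE_ID", "DATASET_ID", "Fact", ["Recent Data", "Excel Data"], "ACCESS_TOKEN"
)
print(request.get_method(), request.full_url)
# To actually trigger the refresh with a valid token:
# urllib.request.urlopen(request)
```

The same URL, headers and body can be pasted into the Power Automate HTTP action and run on a schedule.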
Also you should take a look at the new (Preview) Hybrid Tables feature which would enable you to have the recent data in a DirectQuery partition, while the historical data is in Import mode.
I am new to Power BI. I am using a SQL connection to load data into Power BI.
I created the report in the Dev environment, but I want to use the same report in all environments (dev/test/uat/prod).
Question: Is it possible to switch the connection via a button click in the dashboard?
You'll have to use a parameter to select the connection and store the report in template format (*.pbit). You can then easily create different versions of the report from the template by specifying the corresponding parameter value.
The only way to use a slicer for changing the environment would be to load the data from all different environments into the model first - which is clearly not recommended.
Power BI offers Deployment Pipelines for this purpose. This tool will allow you to create 3 workspaces for dev, test and production stages. Then you can deploy from one stage to another by clicking a button in Power BI Service or using the REST API. In the pipeline you can define rules for dataset and parameters, which can be used to automatically change the datasource when deploying to the next stage, i.e. to change the datasource from your dev database to the test database, or from test database to production one.
You can also implement similar functionality using the API. See for example this answer.
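As one illustration of the API route, the sketch below prepares a call to the dataset Update Parameters endpoint of the Power BI REST API, which changes parameter values on a published dataset. The parameter name ServerName, the server value, the IDs and the token are placeholders, and the dataset would typically need a refresh afterwards for the new value to take effect.

```python
import json
import urllib.request

API_ROOT = "https://api.powerbi.com/v1.0/myorg"


def build_update_parameters(group_id, dataset_id, parameters, access_token):
    """Prepare (but do not send) a request that sets dataset parameter values."""
    body = {
        "updateDetails": [
            {"name": name, "newValue": value} for name, value in parameters.items()
        ]
    }
    url = f"{API_ROOT}/groups/{group_id}/datasets/{dataset_id}/Default.UpdateParameters"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical parameter pointing the dataset at the test database server.
request = build_update_parameters(
    "WORKSPACE_ID", "DATASET_ID", {"ServerName": "test-sql.example.com"}, "ACCESS_TOKEN"
)
print(request.get_method(), request.full_url)
# To send it with a valid token: urllib.request.urlopen(request)
```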
It's a tricky question.
Try the answers above first; if those don't work, try this approach.
I don't think a built-in solution for this has been implemented yet.
In my experience, I had to create 3 dashboards and gateways, one each for the dev, test and prod dashboards.
If the column names in your dev, test and prod databases are the same, you can just copy and paste your dashboard and rename it accordingly.
Then go to Change data source, add the new test environment host, and change the schema to the test environment.
If you get errors, resolve them by checking the column names and host, and finally sync your data.
You can use the same approach for the prod environment.
Once you publish, you can point to the gateway for the dev, test or prod environment.
Note: Establish the gateway on your server.
This is what I am trying to do: I have various SQL Server databases with data. I created views in all of them. All views will need to be imported, and I will specify their relationships. I want this to be refreshed nightly. I want to build various reports on the same data source.
Do I have to use a PowerBI desktop application to import data into PowerBI Report Service? [I have done this so far, but then can create new reports in the cloud on existing data. It would make sense to connect directly from PowerBI report service to my SQL servers.]
Once I uploaded data using a desktop application (as I have done so far), how can I view the data model in the report service once it is uploaded in the cloud?
In order to get routinely refreshed data I need to setup a gateway. Is the local PowerBI desktop application still involved in this process, or could I [in theory] delete the local desktop application that pushed the data in initially?
For your questions:
You have two options, use PBI Desktop to connect to the data using import/direct query, then load it to the service. You can use dataflows to create an import based on your views, but you will then need to create reports from those. Using dataflows, you'll have to set up a refresh schedule, then for the dataset(s) built on top of those, you'll have to set another refresh schedule.
If importing data, you will be limited to a dataset size of 1GB for the workspace. You cannot use direct query on dataflows (unless you have enhanced compute with PBI Premium). Once the dataset is loaded, you can create new reports in the service or via Desktop on top of that dataset. If possible, it is recommended to use direct query.
To see the data model, you can use Desktop to connect to the PBI Service dataset. This will connect in 'Live Connection' mode and will be limited to that one dataset; you can't add other sources to it (Excel, CSV, SQL, etc.). You can also use Analyze in Excel, a plugin for Excel that can connect to the data model. You can create new reports in the service for existing data models as well.
When you create the report in PBI Desktop, it does not use the gateway; you connect to your data sources as normal. Once you load the dataset to Power BI, it will match the data sources in the file to the ones set up in the gateway admin settings. So you will still need PBI Desktop to create reports, but the gateway is there for refreshing. Desktop is not used in the refresh process. You could delete the workbook or application, but if you have to make changes, what will you refer to? (You could download a copy of the report from the service.) It is easier to make changes in the desktop app than in the service, as there is a feature difference between dataset creation in Desktop and in the service.
For context, we would like to visualize our data in google data studio - this dataset receives more entries each week. I have tried hosting our data sets in google drive, but it seems that they're too large and this slows down google data studio (the file is only 50 mb, am I doing something wrong?).
I have loaded our data into google cloud storage --> google bigquery, and connected my google data studio to my bigquery table. This has allowed me to use the google data studio dashboard much quicker!
I'm not sure what is the best way to update our data weekly in google cloud/bigquery. I have found a slow way to do this by uploading the new weekly data to google cloud, then appending the data to my table manually in bigquery, but I'm wondering if there's a better way to do this (or at least a more automated way)?
I'm open to any suggestions, and if you think that bigquery/google cloud storage is not the answer for me, please let me know!
If I understand your question correctly, you want to automate the query that populates your table, which is connected to Data Studio.
If this is the case, you can use a Scheduled Query from BigQuery. A scheduled query lets you define a query whose results are inserted into a table. In particular, you can specify rules for repetition (at minimum every 15 minutes) and execution, as well as destination write options (destination table, and write mode: append or truncate).
To use Scheduled Queries, your account must have the right permissions. See the following documentation to better understand how to use Scheduled Queries [1].
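A scheduled query can also be created from code. The sketch below uses the google-cloud-bigquery-datatransfer Python client and follows the pattern of the official samples; the display name, the default schedule string and the function names are assumptions to adapt (for a weekly cadence, pick a schedule string that follows the syntax described in [1]).

```python
def scheduled_query_params(query, destination_table, write_disposition="WRITE_APPEND"):
    """Build the params of a scheduled query (append or truncate the destination)."""
    if write_disposition not in ("WRITE_APPEND", "WRITE_TRUNCATE"):
        raise ValueError(f"unsupported write disposition: {write_disposition}")
    return {
        "query": query,
        "destination_table_name_template": destination_table,
        "write_disposition": write_disposition,
    }


def create_scheduled_query(project_id, dataset_id, query, destination_table,
                           schedule="every 24 hours"):
    """Create a scheduled query that writes its results into dataset_id."""
    # Imported lazily so the helper above works without the library installed.
    from google.cloud import bigquery_datatransfer

    client = bigquery_datatransfer.DataTransferServiceClient()
    config = bigquery_datatransfer.TransferConfig(
        destination_dataset_id=dataset_id,
        display_name="Append new data",  # illustrative name
        data_source_id="scheduled_query",
        params=scheduled_query_params(query, destination_table),
        schedule=schedule,
    )
    return client.create_transfer_config(
        parent=client.common_project_path(project_id),
        transfer_config=config,
    )
```

With WRITE_APPEND, each run adds the query results to the destination table, which matches the manual append step described in the question.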
Also, please note that on the front end, updated data in the BigQuery table will show up in Data Studio on each refresh (click the refresh button in Data Studio). To refresh the front-end visualization automatically, you can use the following plugin [2] or automate the click on the refresh button through browser console commands.
[1] https://cloud.google.com/bigquery/docs/scheduling-queries
[2] https://chrome.google.com/webstore/detail/data-studio-auto-refresh/inkgahcdacjcejipadnndepfllmbgoag?hl=en