How to update data in Google Cloud Storage/BigQuery for Google Data Studio? - google-cloud-platform

For context, we would like to visualize our data in Google Data Studio; this dataset receives more entries each week. I have tried hosting our data sets in Google Drive, but it seems they're too large and this slows down Google Data Studio (the file is only 50 MB, am I doing something wrong?).
I have loaded our data into Google Cloud Storage and then into BigQuery, and connected Google Data Studio to my BigQuery table. This has let me use the Data Studio dashboard much more quickly!
I'm not sure of the best way to update our data weekly in Google Cloud Storage/BigQuery. I have found a slow way to do this by uploading the new weekly data to Cloud Storage and then manually appending it to my table in BigQuery, but I'm wondering if there's a better (or at least more automated) way to do this.
I'm open to any suggestions, and if you think that BigQuery/Google Cloud Storage is not the answer for me, please let me know!

If I understand your question correctly, you want to automate the query that populates your table, which is connected to Data Studio.
If this is the case, then you can use scheduled queries in BigQuery. A scheduled query lets you define a query whose results are written to a destination table. In particular, you can specify the repetition schedule (minimum every 15 minutes) and execution rules, as well as the destination write options (destination table, write mode: append or truncate).
In order to use scheduled queries your account must have the right permissions. You can have a look at the following documentation to better understand how to use scheduled queries [1].
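For illustration, here is a minimal sketch of creating such a weekly scheduled query with the BigQuery Data Transfer Service Python client; the project, dataset, and table names and the query itself are placeholders you would replace with your own:

    # Minimal sketch (not from the original answer): create a weekly scheduled
    # query with the BigQuery Data Transfer Service client. Project, dataset,
    # and table names and the query are placeholders.
    from google.cloud import bigquery_datatransfer

    client = bigquery_datatransfer.DataTransferServiceClient()

    project_id = "my-project"              # placeholder
    destination_dataset = "my_dataset"     # placeholder destination dataset

    transfer_config = bigquery_datatransfer.TransferConfig(
        destination_dataset_id=destination_dataset,
        display_name="Weekly append of new rows",
        data_source_id="scheduled_query",
        params={
            # Example query; replace it with whatever produces your weekly rows.
            "query": "SELECT * FROM `my-project.staging.weekly_upload`",
            "destination_table_name_template": "my_table",
            # Append each run's results instead of overwriting the table.
            "write_disposition": "WRITE_APPEND",
        },
        schedule="every monday 09:00",     # weekly repetition rule
    )

    transfer_config = client.create_transfer_config(
        parent=client.common_project_path(project_id),
        transfer_config=transfer_config,
    )
    print("Created scheduled query:", transfer_config.name)

The same configuration can also be created from the Scheduled queries page in the BigQuery console, as described in the documentation [1].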
Also note that, on the front end, the updated data in the BigQuery table will only show up in Data Studio after each refresh (click the refresh button in Data Studio). To refresh the front-end visualization automatically you can use the following plugin [2] or automate the click on the refresh button through browser console commands.
[1] https://cloud.google.com/bigquery/docs/scheduling-queries
[2] https://chrome.google.com/webstore/detail/data-studio-auto-refresh/inkgahcdacjcejipadnndepfllmbgoag?hl=en

Related

Advanced Partitions Management in Power BI

My scenario is:
I have 3 Dataflows:
Recent Data (from SQL Server. Refreshes 8 times a day)
Historical Data (does not refresh, just once initially)
Sharepoint Excel file Data
In my Dataset, I want to have a single Fact table that is a "union all" of all 3 sources.
Instead of an Append transformation, I want to create 3 custom partitions (well explained here: https://www.youtube.com/watch?v=6CRqdsLjHNA&t=127s).
I want to somehow tell the scheduled refresh to process only the Recent Data and Excel Data partitions.
The reasoning is that if I use Append, the dataset will reprocess the Historical Data on every refresh.
Now 2 questions:
How do I tell the scheduled refresh to process only two of the three partitions? (I can do it manually via the XMLA endpoint, but I need it scheduled.)
What if I change something in my report (like visuals) - how do I deploy the changes without needing to recreate the partitions?
See Advanced Refresh Scenarios, which includes Metadata Only Deployment, and Automate Premium workspace and dataset tasks with service principals.
The easiest way to generate the TMSL scripts for the advanced refresh scenarios is with SQL Server Management Studio (SSMS), which has wizards for configuring refresh and can generate the script for you. You then run the script through PowerShell cmdlets or ADOMD.NET, which in turn can be automated with Azure Automation or an Azure Function.
If you don't need full TMSL scripting capabilities, Power Automate has connectors that hit the Power BI REST APIs, but they don't currently support partition-based refresh.
But you can call the REST Refresh API directly from any programming language, or through the Power Automate HTTP action.
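For example, here is a rough sketch of a partition-level refresh call against the enhanced refresh endpoint; the workspace and dataset IDs, the table and partition names, and the way you obtain the Azure AD token are all placeholders:

    # Rough sketch: trigger a partition-level (enhanced) refresh through the
    # Power BI REST API. Workspace/dataset IDs and table/partition names are
    # placeholders; acquiring the Azure AD access token is only outlined.
    import requests

    ACCESS_TOKEN = "<Azure AD token for the Power BI service>"
    GROUP_ID = "<workspace-id>"
    DATASET_ID = "<dataset-id>"

    url = (
        f"https://api.powerbi.com/v1.0/myorg/groups/{GROUP_ID}"
        f"/datasets/{DATASET_ID}/refreshes"
    )

    body = {
        "type": "full",
        # Only the partitions that actually change are listed, so the
        # Historical Data partition is left untouched.
        "objects": [
            {"table": "Fact", "partition": "Recent Data"},
            {"table": "Fact", "partition": "Excel Data"},
        ],
    }

    resp = requests.post(url, json=body,
                         headers={"Authorization": f"Bearer {ACCESS_TOKEN}"})
    resp.raise_for_status()
    print("Refresh request accepted, HTTP status:", resp.status_code)

A call like this can then be scheduled from Azure Automation, an Azure Function, or any other scheduler.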
Also take a look at the new (preview) Hybrid Tables feature, which would let you keep the recent data in a DirectQuery partition while the historical data stays in Import mode.

Adding static Excel to automatically refreshing Power BI report

I have an existing Power BI report that imports data from a SQL Server Analysis Services database. This is working fine and I can schedule automatic refreshes using the gateway provided by my organization.
I would now like to add some additional, rarely changing data that I only have in a local Excel file. When I add this data, the report stops refreshing automatically and complains that it has no gateway to refresh the Excel file.
What I would like is for Power BI to refresh the data from the SQL Server Analysis Services database but keep the existing Excel data without updating it; I will upload an updated version of the Power BI report if I need to change the data in the Excel file.
Is that possible? I couldn't find out how. I tried uploading the Excel file as a separate dataset to the Power BI service and referencing that dataset in my report, only to find out that I cannot access a different Power BI dataset and a SQL Server Analysis Services database from the same report.
Three things I can think of:
Upload the file to OneDrive/SharePoint so that it's accessible online (per Dev's answer).
If the data is simple enough, you can add the data directly into Power BI itself and skip the Excel file entirely.
You can disable the Excel file refresh so that Power BI does not try to refresh (and thus access) the local Excel file. (Not sure if this will work.)
I came across a similar issue. Yes, you can just use Enter Data to add a table, but you can only build something with fewer than 3,000 cells, so you'd have to merge several tables if your data is larger than that.
Turning off the report refresh in the suggestion above (#3) still requires a gateway, unfortunately.
I just created a dataflow and put the data from my CSV there. You'll have to create a connection and refresh it, but you don't need to schedule a refresh there, so there's no need to create a gateway.
Then just link the dataflow as a source in your .pbix file and set up your gateway to point at the dataflow.

Export Data from BigQuery to Google Cloud SQL using the Create Job From SQL tab in Dataflow

I am working on a project that crunches data and does a lot of processing, so I chose to work with BigQuery as it has good support for running analytical queries. However, the final result that is computed is stored in a table that has to power my webpage (used transactionally/OLTP). My understanding is that BigQuery is not suitable for transactional queries. I was looking into other alternatives and realized I can use Dataflow to do the analytical processing and move the data to Cloud SQL (a relational DB fits my purpose).
However, it's not as straightforward as it seems. First I have to create a pipeline to move the data to a Cloud Storage bucket and then move it to Cloud SQL.
Is there a better way to manage this? Can I use "Create Job from SQL" in Dataflow to do it? I haven't found any examples that use "Create Job from SQL" to process and move data to Cloud SQL.
Consider a simple example like Robinhood:
Compute the user's returns by looking at their portfolio and show a graph of the returns for every month.
There are other options besides using a pipeline, but in all cases you cannot export table data to a local file, to Sheets, or to Drive. The only supported export location is Cloud Storage, as stated on the Exporting table data documentation page.
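As an illustration of the two-step path (a sketch, not the only way to do this), you can extract the computed BigQuery table to a CSV in Cloud Storage and then ask the Cloud SQL Admin API to import that file; the project, bucket, instance, database, and table names below are placeholders:

    # Minimal sketch: export a BigQuery table to Cloud Storage as CSV, then
    # import the CSV into Cloud SQL via the Admin API. All names are
    # placeholders, and the Cloud SQL service account needs read access
    # to the bucket.
    from google.cloud import bigquery
    import googleapiclient.discovery

    PROJECT = "my-project"
    BUCKET_URI = "gs://my-bucket/exports/monthly_returns.csv"
    CLOUDSQL_INSTANCE = "my-cloudsql-instance"

    # 1. Export the computed BigQuery table to Cloud Storage.
    bq = bigquery.Client(project=PROJECT)
    extract_job = bq.extract_table(
        bigquery.TableReference.from_string(
            f"{PROJECT}.analytics.monthly_returns"),
        BUCKET_URI,
    )
    extract_job.result()  # wait for the export to finish

    # 2. Import the CSV into a Cloud SQL table through the Admin API.
    sqladmin = googleapiclient.discovery.build("sqladmin", "v1beta4")
    import_body = {
        "importContext": {
            "fileType": "CSV",
            "uri": BUCKET_URI,
            "database": "webapp",
            "csvImportOptions": {"table": "monthly_returns"},
        }
    }
    operation = (
        sqladmin.instances()
        .import_(project=PROJECT, instance=CLOUDSQL_INSTANCE, body=import_body)
        .execute()
    )
    print("Import operation started:", operation["name"])

A sequence like this can be orchestrated by a Cloud Function, Cloud Composer, or a simple cron job instead of a full Dataflow pipeline.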

PowerBI – How to set up a streaming data report?

I have a range of devices that push data to an Azure IoT Hub. I then have a Stream Analytics job which ingests the data from IoT Hub and outputs it to Power BI. In Power BI I have then created a report from the dataset, which is working. I now want that report to be ‘live’, using the streaming data from Stream Analytics, but I can't see how to make it so.
Looking at this guide, under the section ‘Create and publish a Power BI report to visualize the data’, point 4 says ‘Click Streaming Datasets’, which sounds like exactly what I need, but I don't see that option anywhere.
Have I missed a step or set something up wrong, or is there another way to make my report ‘live’ without needing manual refresh and publishing?
Here's what I see when creating a report:
Thanks
Go to the report where you want to add the streaming data tile.
Click '+ Add tile' in the top navigation.
You should see an option for Real time - Custom streaming data on the resulting blade.

Framework selection for a new project?

Problem Context
We have a set of Excel reports which are generated from an Excel input provided by the user and then fed into SAS for further transformation. SAS pulls data from a Teradata database, and there is then a lot of manipulation of the input data and the data pulled from Teradata. Finally, a dataset is generated which can either be sent to the client as a report or be used to populate a Tableau dashboard. The database is also being migrated from Teradata to Google Cloud (BigQuery EDW), as the Teradata pulls from SAS used to take almost 6-7 hours.
Problem Statement
Now we need to automate this whole process by creating a front end where the user can upload the input files; from there the process should trigger, and in the end the user should receive the Excel file or Tableau dashboard as an email attachment.
Can you suggest what technologies should be used in the front end and middle tier to make this process feasible in the least possible time, with Google Cloud Platform as the backend?
Can an R Shiny front end be a solution, given that we need to communicate with a Google Cloud backend?
I have received suggestions that Django would be a good framework to accomplish this task. What are your views on this?