Scheduled refresh with Azure SQL-only data: credentials grayed out - powerbi

Setup
I have around 7 GB of data in an Azure SQL DB, which will continue to grow; the PBIX report is just under 1 GB. Currently I'm using the import method to work with the data: it's loaded into PBI Desktop and then the report is published to a workspace. All the data comes from the same Azure DB, and I've already checked the firewall option that allows internal Azure connections.
Problem
I am unable to set a scheduled refresh because I haven't filled out the data source credentials, but that option is grayed out, so I can't fill them in. All the data comes from the same Azure DB (some of it used to be CSV files, but I created tables in the DB instead and replaced the data), which is online, so I should not need a gateway.
Thoughts
Maybe the capacity of the Office tenant (not sure if it's A1-3 or larger; I'm not sure how to check) is full, since the report is just shy of 1 GB, and the error shown is simply handled badly?
Maybe, because I had some of the data as files first, it's not recognizing that everything is now under the same DB connection? (I deleted the report along with the dataset and re-uploaded.)
Maybe I should change to DirectQuery (which I think makes me lose some of the things I've done in the report) and pay for more DB use instead, if scheduled refresh isn't possible here, although this seems like the natural setup, since both products are Microsoft's.
Maybe PBI just hates me.
Error message:
Last refresh failed: Tue Apr 06 2021 22:39:08 GMT+0000 (Greenwich Mean Time) Scheduled refresh has been disabled. Data source error: Scheduled refresh is disabled because at least one data source is missing credentials. To start the refresh again, go to this dataset's settings page and enter credentials for all data sources. Then reactivate scheduled refresh.

Related

I'm unable to republish/overwrite an existing dataset in my shared workspace nor can I refresh it

I haven't made any changes to my dataset for a while, and until the past few days the refresh and publish features (via PBI Desktop) were working just fine. However, this morning it suddenly stopped following through on the scheduled/automatic refreshes, which were set 5x a day, every 2 hours starting at 8 AM. When I trigger a manual refresh, it says "Preparing for refresh" but doesn't really follow through, so I kept on clicking it, to no avail. After a while, I checked the Refresh History, and the error I got was that it was timing out.
I was trying to republish it with a newer version, which was the same dataset refreshed in a different workspace. Apparently, when I publish it to a different workspace, or if I rename it and publish it to my existing shared workspace, it works. BUT I don't want to push through with this kind of workaround, because I cannot afford to redo the RLS setup and the managed permissions in the dataset of my existing workspace.
Hope you can help me on this. Thanks.
Try signing in to the workspace using a browser, then use the Get Data feature (arrow button at bottom left), then choose Files / Local File and select your PBIX.
It gives effectively the same result as a Publish from Power BI Desktop, but sometimes the method above works when Power BI Desktop is jammed up.
You might also have a refresh issue, not a publish issue. There's an unannounced refresh-on-publish behavior that most people don't notice:
"When you republish a dataset published from Power BI Desktop and have a refresh schedule defined, a dataset refresh is started as soon as you republish."
https://learn.microsoft.com/en-us/power-bi/create-reports/desktop-upload-desktop-files

Power BI report data storage concept

I am looking for an answer regarding the report data storage concept in Power BI.
I have published 3 reports to Power BI service (cloud):
Report1 with an Excel source
Report2 with an on-premises SQL Server source
Report3 with an Azure SQL source
Around 200 users in my organization will be accessing these reports. I want to understand the following:
1. The first time a particular report is accessed, will the data be fetched from the source and shown in the report, or will it be stored in some cloud location from which the data will go to the report?
2. Suppose a user opens a report that was already viewed by another user; will the data be fetched from the source again, or is there some concept of a cross-user shared cache?
3. Suppose a user opens the report for the 2nd time (for example, refreshing the web page after having already accessed it); will the data be fetched again, or is there some concept of a shared cache?
4. Does the answer to any of the above change if I had used Power BI Report Server (on-premises) and deployed the report on PBRS?
With the service, you typically upload a PBIX, which contains the report pages and all of the underlying data. Unless you set up a data gateway to accommodate DirectQuery connections and/or scheduled refreshes, the cloud service does not access your original data sources at all. With a scheduled refresh, it only accesses the original data during the refresh. A DirectQuery connection does access a server "live" but has many limitations.
1. The data is fetched when you load it into your Power BI Desktop application and then uploaded to the cloud when you publish the report to a workspace. Once it's there, the data shown to the user is fetched from the cloud copy, not the original data source.
2. Same answer as above regarding where the data is fetched from (the cloud copy). I don't believe there is a shared cache between users; rather, each user gets some temporary caching individually. This type of caching saves the calculation results (computed on the underlying data) that are needed to populate the report visuals.
3. There is some temporary caching, so if a user switches among slicer combinations and returns to one previously chosen, you may see much quicker loading than when selecting a new configuration, since the results were cached and don't need to be recomputed. As far as I understand, this kind of caching is short-lived and not shared among users. Remember, this type of cache is not the same as the underlying data in the cloud copy of the PBIX.
4. I've not used an on-premises server, but I would expect the behavior to be similar, with the exception that the service runs on the local server instead of a cloud server somewhere else.
The upshot is that traffic in the service is separated from the requests to the original source data (assuming no DirectQuery connections). Those original sources are only accessed during data refreshes, which are independent of end-user actions (under the same assumption).

Reuse a previously published datasource in a Power BI report

I have developed a Power BI report using Power BI Desktop, pointing to a private on-premises development database as the data source, so that I was able to develop and test it easily. Then I published it from my Power BI Desktop PBIX to my customer's workspace.
As a result, the workspace contains the published report and the dataset. Later, my customer changed the dataset so that it now points to their own on-premises production database. It works perfectly.
Now I want to publish a new report for my customer using the previously published and reconfigured dataset. The problem is that I can't see any option in Power BI Desktop to point the report at the published dataset, nor any option to avoid creating a new dataset each time I publish a report, nor any way to reconfigure the newly published report from the web portal so that it points to the same dataset as the first one.
Is there any way to do this, or any workaround for this scenario? I think the most reasonable solution would be to be able to change the dataset of any report, so that the datasets of reports would be interchangeable.
Update:
I had already used connection-specific parameters, but I'm not given rights to change the published dataset, so that's a dead end.
Another thing I have come across is that in Power BI Desktop you cannot change the connection parameter values to those of the production environment and publish the report if you can't reach the target database from your computer. Power BI Desktop asks you to apply the changes first, and when it tries to apply the new values it tries to connect to the corresponding database and, obviously, ends with a network or timeout error, cancelling the changes and returning you to the starting point.
It's always good practice to use connection-specific parameters to define the data source. This means that you do not enter the server name directly, but specify it indirectly using a parameter; the same goes for the database name, if applicable.
If you are about to make a new report, cancel the Get Data dialog, define the parameters as described below, and then specify the data source in Get Data using those parameters.
To modify an existing report, open Power Query Editor by clicking Edit Queries, and in Manage Parameters define two new text parameters; let's name them ServerName and DatabaseName.
Set their current values to point to one of your data sources, e.g. SQLSERVER2016 and AdventureWorks2016. Then right-click your query in the report and open Advanced Editor. Find the server name and database name in the M code:
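For a SQL Server source like the example above, the relevant step might look something like this (a simplified sketch; a real query will have more steps):

let
    // Server and database names are hard-coded in the source step
    Source = Sql.Database("SQLSERVER2016", "AdventureWorks2016")
in
    Source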
and replace them with the parameters defined above, so the M code will look like this:
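Continuing the same sketch:

let
    // The hard-coded names are replaced by the text parameters
    Source = Sql.Database(ServerName, DatabaseName)
in
    Source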
Now you can close and apply the changes, and your report should work as before. When you later want to change the data source, do it using Edit Parameters, and change the server and/or database name to point to the other data source that you want to use for your report.
After changing the parameter values, Power BI Desktop will ask you to apply the changes and reload the data from the new data source. To change the parameter values (i.e. the data source) of a report published in the Power BI Service, go to the dataset's settings and enter the new server and/or database name.
If the server is on-premises, check the Gateway connection too, to make sure it is configured to use the right gateway. You may also want to check the available gateways under Manage gateways.
After changing the data source, refresh your dataset to get the data from the new source. With a Power BI Pro account you can do this 8 times per 24 hours; if the dataset is in a dedicated capacity, the limit is raised to 48 times per 24 hours.
This is an easy way to make your reports "switchable", e.g. for switching a report from a DEV or QA environment to PROD, or, as part of your disaster recovery plan, for automating the switch of all reports in a workspace to another DR server. In your case, it will allow you (or your customers) to easily switch the data source of the report.
I think the only correct answer is that it cannot be done, at least at this moment.
The closest way of achieving this is with Live connections:
https://learn.microsoft.com/en-us/power-bi/desktop-report-lifecycle-datasets
But if you have already designed your report without the Live connection, against your own development environment and its connection parameters, then you are out of luck; your only option is to redo the whole report with a Live connection, or, as the oddest workaround, to use an alias in your configuration that matches the server name and database name of the target production environment.

How to update data in Google Cloud Storage/BigQuery for Google Data Studio?

For context, we would like to visualize our data in Google Data Studio, and this dataset receives more entries each week. I have tried hosting our data sets in Google Drive, but it seems they're too large and this slows down Google Data Studio (the file is only 50 MB; am I doing something wrong?).
I have loaded our data into Google Cloud Storage and from there into Google BigQuery, and connected Google Data Studio to my BigQuery table. This has made the Data Studio dashboard much quicker!
I'm not sure what the best way is to update our data weekly in Google Cloud/BigQuery. I have found a slow way: uploading the new weekly data to Google Cloud Storage and then appending it to my table manually in BigQuery. Is there a better way to do this (or at least a more automated one)?
I'm open to any suggestions, and if you think that BigQuery/Google Cloud Storage is not the answer for me, please let me know!
If I understand your question correctly, you want to automate the query that populates your table, which is connected to Data Studio.
If that is the case, you can use Scheduled Queries in BigQuery. A scheduled query lets you define a query whose results are written to a destination table. In particular, you can specify different repetition rules (as frequently as every 15 minutes) and execution settings, as well as destination writing options (destination table; write mode: append or truncate).
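For example, a weekly append could be set up from the bq command-line tool roughly like this (a sketch; mydataset.weekly_data and mydataset.new_uploads are placeholder names for your destination table and a staging table loaded from Cloud Storage):

# Create a scheduled query that appends the staging rows every Monday
bq query \
    --use_legacy_sql=false \
    --destination_table=mydataset.weekly_data \
    --display_name='Weekly Data Studio load' \
    --schedule='every monday 09:00' \
    --append_table=true \
    'SELECT * FROM mydataset.new_uploads'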
In order to use scheduled queries, your account must have the right permissions. Have a look at the following documentation to better understand how to use them [1].
Also, please note that on the front end, the updated data in the BigQuery table will only show up in Data Studio after a refresh (clicking the refresh button in Data Studio). To refresh the front-end visualization automatically, you can use the following plugin [2] or automate the click on the refresh button through browser console commands.
[1] https://cloud.google.com/bigquery/docs/scheduling-queries
[2] https://chrome.google.com/webstore/detail/data-studio-auto-refresh/inkgahcdacjcejipadnndepfllmbgoag?hl=en

Power BI embedded, direct query, not refreshing

I have implemented Power BI Embedded in a web app with DirectQuery, using Azure SQL as the data source.
The Azure SQL database is being updated by WebJobs, and if I leave the Power BI Embedded web app open, I don't see the visuals refresh with the new data unless I trigger a query, for example by changing tabs or filtering with a slicer.
In the documentation I found the following:
"If there is no user interaction in a visualization, like in a dashboard, data is refreshed automatically about every fifteen minutes."
Do I understand correctly that an open visual, in my case, should refresh without the need for user interaction?
Can you point out the reason it is not updating automatically? Also, do you know a way to control the refresh timing with DirectQuery, without user interaction, more precisely than "...about every fifteen minutes"?
When inspecting the connection properties in Power BI Desktop, I have made sure it indicates "DirectQuery".
From my understanding, the embedded report won't refresh automatically. However, if you're using the Power BI JS framework (https://github.com/Microsoft/PowerBI-JavaScript) to embed your report from a Workspace Collection, you can use the refresh() method on the report object to manually get the latest data, provided your report is using DirectQuery.
This method is only present in version 2.2.0 of the framework; it was removed in the latest version (currently 2.2.1) while further testing around billing is performed (see https://github.com/Microsoft/PowerBI-JavaScript/commit/5230b2f96b10a1104efecdffe78255b9788526b8).
However, in my testing I found the session count remained unaffected by the refresh method. You can refresh as often as every 15 seconds (a limit set by the server). This may change, given that the method was removed in 2.2.1, but using 2.2.0 seems to work currently.
Here's a quick and dirty example which will refresh the report every minute within the allocated session:
// Embed the report, then get a reference to it from its container element
powerbi.embed(reportContainer, embedConfig);
var report = powerbi.get(reportContainer);

// Refresh the report every minute (the server enforces a minimum
// interval of 15 seconds between refreshes)
window.setInterval(function () {
    report.refresh();
}, 60 * 1000);
If the session expires (after 1 hour currently), a new JWT will need to be requested and the report reloaded with the new token.
You may want to implement some checks around the session expiry if you plan to keep the report open for more than the allotted session time.
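As a rough sketch (reusing reportContainer and embedConfig from the example above, and assuming a hypothetical /api/powerbi-token endpoint on your own backend that returns a fresh token), the renewal could look like this:

// Hypothetical endpoint on your backend returning { token: '...' }
function renewToken() {
    fetch('/api/powerbi-token')
        .then(function (response) { return response.json(); })
        .then(function (data) {
            embedConfig.accessToken = data.token;
            // Re-embedding over the same container reloads the report
            // with the new token
            powerbi.embed(reportContainer, embedConfig);
        });
}

// Renew shortly before the 1-hour session expires
window.setInterval(renewToken, 55 * 60 * 1000);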