I am looking for an answer regarding the report data storage concept in Power BI.
I have published 3 reports to Power BI service (cloud):
Report1 with Excel source
Report2 with on-premises SQL Server source
Report3 with Azure SQL source
Around 200 users in my organization will be accessing these reports. I want to understand whether:
The first time a particular report is accessed, will the data be fetched from the source and shown in the report or will it be stored to some cloud location from where the data will go to the report?
Suppose a user opens a report that was already viewed by another user, then will the data be fetched from the source again or is there any concept of cross user shared cache?
Suppose a user opens the report for the 2nd time (example: after having already accessed it, the user refreshes the web page), will the data be fetched again? Or is there any concept of a shared cache?
Does the answer to any of the above change if I had used Power BI Report Server (on-premises) and deployed the report there?
With the service, you typically upload a PBIX, which contains the report pages and all of the underlying data. Unless you set up a data gateway to accommodate DirectQueries and/or scheduled refreshes, the cloud service does not access your original data sources at all. With a scheduled refresh, it only accesses the original data during the refresh. A DirectQuery connection does access a server "live" but has many limitations.
The data is fetched when you load it into your Power BI desktop application and then loaded into the cloud when you publish the report to a workspace. Once it's there, the data shown to the user is fetched from the cloud copy, not the original data source.
Same answer as above regarding where the data is fetched from (the cloud copy). I don't believe there is shared cache between users but rather each user has some temporary caching individually. This type of caching saves the calculation results (computed on the underlying data) that are needed to populate the report visuals.
There is some caching done temporarily so that if a user switches among slicer combinations to one previously chosen you may see much quicker loading than when selecting a new configuration since it cached the results and doesn't need to recompute them. As far as I understand, this kind of caching is short-lived and not shared among users. Remember, this type of cache is not the same as the underlying data in the cloud copy of the PBIX.
I've not used an on-premises server, but I would expect the behavior to be similar, with the exception that the service runs on the local server instead of a cloud server somewhere else.
The upshot is that traffic in the service is separated from the requests to the original source data (assuming no DirectQuery connections). Those original sources are only accessed during data refreshes, which are independent of end-user actions (under the same assumption).
I'm trying to find the best approach to delivering a BI solution to 400+ customers which each have their own database.
I've got PowerBI Embedded working using service principal licensing and I have the PowerBI service connected to my data through the On Premise Data Gateway.
I've built my first report pointing to 1 of the customer databases, which works lovely.
What I want to do next, when embedding the report, is to tell PowerBI, for this session, to get the data from a different database.
I'm struggling to find somewhere where this is explained, or to understand if this is even possible.
I'm trying to avoid creating 400+ WorkSpaces or 400+ Data Sets.
If someone could point me in the right direction, it would be appreciated.
You can configure the report to use parameters and these parameters can be used to configure the source for your dataset:
https://www.phdata.io/blog/how-to-parameterize-data-sources-power-bi/
These parameters can be set by the app hosting the embedded report:
https://learn.microsoft.com/en-us/rest/api/power-bi/datasets/update-parameters-in-group
Because the app is setting the parameter, each user will only see their own data. Since this will be a live connection, you would need to think about how the underlying server can support the workload.
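As a rough sketch of what the hosting app's call to the Update Parameters In Group endpoint could look like, the snippet below only builds the HTTP request and does not send it. The parameter names (ServerName, DatabaseName), the IDs, and the token are placeholders I made up for illustration; acquiring the access token for the service principal is assumed to happen elsewhere.

```python
import json
import urllib.request

API_ROOT = "https://api.powerbi.com/v1.0/myorg"


def build_update_parameters_request(group_id, dataset_id, params, access_token):
    """Build (but do not send) an Update Parameters In Group request.

    `params` maps parameter names to new values. The names must match
    parameters already defined in the dataset; the ones used in the demo
    below are hypothetical.
    """
    url = (
        f"{API_ROOT}/groups/{group_id}/datasets/{dataset_id}"
        "/Default.UpdateParameters"
    )
    body = {
        "updateDetails": [
            {"name": name, "newValue": value} for name, value in params.items()
        ]
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder IDs and values, for illustration only.
    req = build_update_parameters_request(
        "<group-id>",
        "<dataset-id>",
        {"ServerName": "<server>", "DatabaseName": "<customer-db>"},
        "<access-token>",
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req)
```

One thing to verify before relying on this: as far as I know, the call changes the parameter on the dataset itself rather than for a single embed session, and an import-mode dataset needs a refresh before the new source takes effect. It is worth testing how that behaves when several customers view the report at the same time.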
An alternative solution would be to consolidate the customer databases into a single database (just the relevant tables) and use row level security to restrict access for each customer. The advantage to this design is that you take the burden off of the underlying SQL instance and push it into a PBI dataset that is made to handle huge datasets with sub-second response times.
More on that here: https://learn.microsoft.com/en-us/power-bi/enterprise/service-admin-rls
I set up incremental refresh in powerBI, and it loaded all the history to the service, and now only refreshes the last data every time, as expected. I need to reload once all the history, because of changes made to historical data. Is there a way to do so?
If I republish my dataset from PowerBI desktop to the service, the data will be reloaded fully. Is there a simpler solution?
If your model is NOT in a Premium workspace, republishing is your only option.
If your model is published to a Premium workspace, you can manage the underlying data through the XMLA endpoint. You have to have that set up to be writable in your Power BI Admin settings. Once it's open, you can manage IR models with SQL Server Management Studio (MS doc here). There are also some third-party tools. Of these, Tabular Editor is the best. This is a video from Guy in a Cube that will get you started with TE.
If you have Premium, I highly recommend setting up all your IR tables as Dataflows. That separates the IR from your report, and makes managing IR much simpler in the long run. IR Dataflows are not available without Premium.
If I republish my dataset from PowerBI desktop to the service, the data will be reloaded fully. Is there a simpler solution?
Simpler, no. But you can force a full refresh with the Rest API.
applyRefreshPolicy (Boolean, default true): If an incremental refresh policy is defined, applyRefreshPolicy will determine if the policy is applied or not. If the policy isn't applied, a process full operation will leave partition definitions unchanged and all partitions in the table will be fully refreshed. Modes are true or false.
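A minimal sketch of such a refresh request, assuming the Refresh Dataset In Group endpoint and a dataset that supports the enhanced refresh options (which, as far as I know, requires Premium-type capacity). The IDs and token are placeholders, and the request is only built here, not sent.

```python
import json
import urllib.request

API_ROOT = "https://api.powerbi.com/v1.0/myorg"


def build_full_refresh_request(group_id, dataset_id, access_token):
    """Build (but do not send) a refresh request that skips the
    incremental refresh policy, so every partition is reloaded."""
    url = f"{API_ROOT}/groups/{group_id}/datasets/{dataset_id}/refreshes"
    body = {
        "type": "Full",               # process full
        "applyRefreshPolicy": False,  # ignore the incremental refresh policy
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder IDs, for illustration only.
    req = build_full_refresh_request("<group-id>", "<dataset-id>", "<access-token>")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req)
```

Because the partition definitions are left unchanged, later scheduled refreshes should go back to refreshing only the recent partitions.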
This is what I am trying to do: I have various SQL server databases with data. I created views in all of them. All views will need to be imported, and I specify their relationships. I want this to be refreshed nightly. I want to build various reports of the same data source.
Do I have to use a PowerBI desktop application to import data into PowerBI Report Service? [I have done this so far, but then can create new reports in the cloud on existing data. It would make sense to connect directly from PowerBI report service to my SQL servers.]
Once I uploaded data using a desktop application (as I have done so far), how can I view the data model in the report service once it is uploaded in the cloud?
In order to get routinely refreshed data I need to setup a gateway. Is the local PowerBI desktop application still involved in this process, or could I [in theory] delete the local desktop application that pushed the data in initially?
For your questions:
You have two options. The first is to use PBI Desktop to connect to the data using Import or DirectQuery, then publish it to the service. The second is to use dataflows to create an import based on your views, but you will then need to create reports from those. With dataflows, you'll have to set up a refresh schedule, and then set another refresh schedule for the dataset(s) built on top of them.
You will be limited to the dataset sizes of 1GB for the workspace if importing data. You cannot use direct query on dataflows (unless you have enhanced compute with PBI premium). Once the dataset is loaded, you can then create new reports in the service or via desktop on top of that dataset. If possible it is recommended to use direct query.
To see the data model, you can use desktop to connect to PBI Service Dataset. This will connect in 'Live Connection' mode, and will be limited to that one dataset, you can't add others to it, Excel, CSV, SQL etc. You can also use Analyse in Excel, a plugin for Excel, that can connect to the data model. You can create new reports in the service for existing data models as well.
When creating the report in PBI Desktop, it does not use the Gateway. You connect to your data sources as normal, and once you load the dataset to Power BI, it will match the data sources in the file to the ones set up in the Gateway Admin settings. So you will still need PBI Desktop to create reports, but the gateway is there for the refreshing. The Desktop is not used in the refresh process. You could delete the workbook or application, but if you have to make changes, what will you refer to? (You could download a copy of the report from the service.) It is easier to make changes in the desktop app than in the service, as there is a feature difference between dataset creation in the desktop vs. the service.
I have developed a Power BI report using Power BI Desktop, pointing to a private on premise development database as the datasource so that I was able to develop and test it easily. Then, I published it from my Power BI Desktop pbix to the work area of my customer.
As a result, the work area contains the published report and the dataset. Later, my customer has changed the dataset so that it now points to the correct on premise production database of their own. It works perfectly.
Now, I want to publish a new report for my customer using the previously published and reconfigured dataset. The problem is that I can't see any option in Power BI Desktop to have the report point to the published dataset, nor can I see any option to avoid creating a new dataset each time I publish a report, nor any way to reconfigure the newly published report from the web portal so that it points to the same dataset as the first one.
Is there any way to do this or any work around for this scenario? I think the most reasonable solution would be to be able to change the dataset of any report, so that the datasets of any report could be interchangeable.
Update:
I had already used connection-specific parameters, but I'm not given rights to change the published dataset, so that's a dead end.
Another thing I have run into is that in Power BI Desktop you cannot change the connection parameter values to those of the production environment and publish the report if you can't access the target database from your computer. Power BI Desktop asks you to apply the changes first, and when it applies the values it tries to connect to the corresponding database. That obviously ends with a network-related or timeout error while connecting to the database server, which cancels the changes and returns you to the starting point.
It's always a good practice to use connection-specific parameters to define the data source. This means that you do not enter the server name directly, but specify it indirectly using a parameter. The same goes for the database name, if applicable.
If you are about to make a new report, cancel the Get data dialog, define parameters as described below, and then in Get data specify the data source using these parameters:
To modify an existing report, open Power Query Editor by clicking Edit Queries and in Manage Parameters define two new text parameters, let's name them ServerName and DatabaseName:
Set their current values to point to one of your data sources, e.g. SQLSERVER2016 and AdventureWorks2016. Then right click your query in the report and open Advanced Editor. Find the server name and database name in the M code:
and replace them with the parameters defined above, so the M code will look like this:
Now you can close and apply changes and your report should work as before. But now when you want to change the data source, do it using Edit Parameters:
and change the server and/or database name to point to the other data source, that you want to use for your report:
After changing parameter values, Power BI Desktop will ask you to apply the changes and reload the data from the new data source. To change the parameter values (i.e. the data source) of a report published in Power BI Service, go to dataset's settings and enter new server and/or database name:
If the server is on-premise, check the Gateway connection too, to make sure that it is configured properly to use the right gateway. You may also want to check the available gateways in Manage gateways:
After changing the data source, refresh your dataset to get the data from the new data source. With Power BI Pro account you can do this 8 times per 24 hours, while if the dataset is in a dedicated capacity, this limit is raised to 48 times per 24 hours.
This is an easy way to make your reports "switchable", e.g. for switching one report from a DEV or QA to a PROD environment, or as part of your disaster recovery plan, to automate switching all reports in some workgroup to another DR server. In your case, this will allow you (or your customers) to easily switch the data source of the report.
I think the only correct answer is that it cannot be done, at least at this moment.
The closest way of achieving this is with Live connections:
https://learn.microsoft.com/en-us/power-bi/desktop-report-lifecycle-datasets
But if you have already designed your report against your own development environment and its connection parameters rather than a Live connection, then you are stuck. Your only options are to redo the whole report with the Live connection, or, as the oddest workaround, to set up an alias in your configuration so that your development server and database have the same names as those in the target production environment.
So, currently I'm having difficulty understanding how Power BI Embedded can be set up so that each customer can access data from their own separate Azure Analysis Service; this is an App Owns Data situation. Analysis Services will be running in In-Memory mode and it will be accessed from Power BI via Live Connect.
Ideally I would like the Power BI Report to be ignorant of the data set/data source until the embedded report is provided with a parameter (e.g. connection string) which the report interprets so that it knows which server to connect to. So, ideally have: one Workspace, one Report, and zero (or a fake) Dataset.
The following is roughly what I'm looking to do (note the Red and Blue flow access a different server):
It looks like if I created both a Report and Dataset per customer I can achieve my goal but this seems like a poor approach since if the Report needs to be updated this involves updating, potentially, hundreds of reports. Also creating hundreds of Reports seems like unnecessary overhead when all Power BI needs to change for each request is the connection string pointing to the data source.
So is it possible to share the Workspace and Report across all customers but having completely separate data sources? Or is my approach in conflict with the way Power BI expects to function?
To date, I've tried using Query Parameters when configuring the data source in Power BI Desktop but I get the following error:
The connect live option for this file is disabled because it already contains data from another data source. You cannot explore live data and connect to another type of data source in the same file.
Please note,
Every report in Power BI can be connected to only one Dataset.
There is NO ability to dynamically change a connection string on the fly.
Currently, and in the foreseeable future, you'd have to clone the report & dataset per customer (or per connection setup) and modify the new dataset's connection string to match.
You can then dynamically choose which report to display based on your customer's needs.
Cloning a report can be done using:
POST https://api.powerbi.com/v1.0/myorg/reports/{report_id}/Clone
POST https://api.powerbi.com/v1.0/myorg/groups/{group_id}/reports/{report_id}/Clone
https://msdn.microsoft.com/en-us/library/mt784674.aspx
Changing the connection string would be done using:
POST https://api.powerbi.com/v1.0/myorg/datasets/{dataset_id}/Default.SetAllConnections
(similar API for groups)
https://msdn.microsoft.com/en-us/library/mt748181.aspx
Using the C# .NET library provided by the Power BI team, you'd use:
Reports.CloneReport(string reportKey, CloneReportRequest requestParameters)
Datasets.SetAllDatasetConnections(string datasetKey, ConnectionDetails parameters)
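For anyone calling the REST endpoints directly instead of the .NET library, the two calls above could be sketched roughly as follows. This only builds the requests and does not send them; the request body fields (name, targetModelId, connectionString) follow my reading of the REST reference, and all IDs, names, and the connection string are placeholders.

```python
import json
import urllib.request

API_ROOT = "https://api.powerbi.com/v1.0/myorg"


def _post(url, body, access_token):
    """Build a JSON POST request with a bearer token (not sent here)."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def _scope(group_id):
    """URL prefix for a workspace, or empty for 'My workspace'."""
    return f"/groups/{group_id}" if group_id else ""


def build_clone_request(report_id, new_name, target_dataset_id,
                        access_token, group_id=None):
    """Clone a report and bind the copy to `target_dataset_id`."""
    url = f"{API_ROOT}{_scope(group_id)}/reports/{report_id}/Clone"
    body = {"name": new_name, "targetModelId": target_dataset_id}
    return _post(url, body, access_token)


def build_set_connections_request(dataset_id, connection_string,
                                  access_token, group_id=None):
    """Point a dataset at a different connection string."""
    url = (
        f"{API_ROOT}{_scope(group_id)}/datasets/{dataset_id}"
        "/Default.SetAllConnections"
    )
    return _post(url, {"connectionString": connection_string}, access_token)


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    clone = build_clone_request(
        "<report-id>", "Report for customer A", "<customer-a-dataset-id>",
        "<access-token>", group_id="<group-id>",
    )
    conn = build_set_connections_request(
        "<customer-a-dataset-id>", "<connection-string>",
        "<access-token>", group_id="<group-id>",
    )
    for req in (clone, conn):
        print(req.get_method(), req.full_url)
        print(req.data.decode("utf-8"))
```

I believe the newer documentation steers people away from SetAllConnections toward Update Datasources or Update Parameters, so check which of these your dataset type supports before building on it.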