What is the difference between using normal vs CDM tables in data flows? - powerbi

I am creating a Power BI dataflow, and there is a button called Map to standard. I understand this stores the data in CDM.
If I don't click this button, where does the data get stored? Is it in Azure Data Lake?
What is the difference between using non-CDM (data lake, I assume) and CDM tables in dataflows? For example, does either improve performance?

Related

What is the difference between writing multiple dataflows and combining via datamart versus connect to data source via data mart?

I'm exploring Power BI dataflows and datamarts.
A dataflow lets you connect to a data source.
A datamart also lets you connect to a data source, and even to dataflows.
What is the benefit of having dataflows and connecting to them via a datamart versus using the datamart to connect to the data source directly?
A Data Flow uses Power Query to copy data into a Data Lake in .CSV format.
A Data Mart uses Power Query to copy data into a relational database where it's stored in a highly-compressed columnstore.
What is the benefit of having data flows and connecting to them via data mart versus directly using datamart to connect to data source?
So this is essentially the same question as whether to load a data warehouse directly, or to load a Data Lake first, and load the warehouse from there.
The data would be available for other uses. So you might copy the raw data with a Data Flow, and then transform while loading into a Datamart. Another user might use that same raw data to build a Dataset.
On the other hand, it's an additional copy of the data, and you've got two refreshes to worry about.

PowerBi - Connection Type (DIRECT QUERY or IMPORT DATA) Question

I am working on a Power BI project and need some advice on the best way to approach it. I am tasked with creating a dashboard for employee metrics pulled from an on-premises SQL Server database. The managers here will have access to Power BI in the cloud, so I will end up publishing the report there. There are 10 or so metrics that need to be shown on the dashboard, and we have 5000+ employees. My first thought was to dump all the metrics into one table and set the Power BI report to import the data, but it seems excessive and a waste of space to upload all that data to the cloud, because the managers don't all need access to every employee. They may want to see only 1 or 2 employees' metrics on the dashboard.
My second thought is (if this is possible) to create a stored procedure that takes an employee id and outputs a dataset for Power BI to turn into a visual. The dashboard would have a list of employees, and when a manager selects one, Power BI would call the stored procedure with that employee id and build a visual from the returned dataset based on my measurements. I guess I would set the report's connection type to DIRECT QUERY?
Here are my questions:
Is this possible? Can I do what I am describing in my second plan? Is this how DIRECT QUERY works?
If so, how does DIRECT QUERY work with Power BI in the cloud?
What is the setup like? Do I just install and configure the Power BI data gateway as I would for IMPORT DATA, and Power BI does the rest?
A couple of questions:
How often is the data updated?
If it is loaded by a batch job, it is preferable to import the data from the source into the Power BI model and report on the imported data, because:
a) performance would be better
b) there would be no to and fro of data between the on-premises database and the cloud
c) the source would not be hit constantly
Is the requirement to have RLS, so that managers see only the employees under them?
If so, RLS is easier to implement with imported data than with DirectQuery.
Also, you can't execute stored procedures in DirectQuery mode, so you won't be able to pass parameters to them. You can, however, create table-valued functions, which let you use table variables and perform other, more complex operations in DirectQuery mode.
You can refer to this thread for additional details:
https://community.powerbi.com/t5/Desktop/Can-i-call-Stored-Procedure-with-Direct-Query/m-p/267141#:~:text=%40Pallavi%20you%20won't%20be,nature%20in%20Direct%20Query%20mode.

Which data is stored in Power BI - the one after query or the one after modelling?

In Power BI, we first get the source data. Then we add multiple query steps to filter data, remove columns, etc. Then we add relationships and model the data.
We can have calculated columns, which are stored in the data, and measures, which are not stored in the data but calculated on the fly.
Which data is stored in Power BI - the one after query or the one after modelling?
Power BI has 3 connection types for data access: import, DirectQuery and live connection.
With import, the data is imported into the Power BI file using Power BI Desktop, so a copy of the data is always kept on disk. When you query or refresh, the data is held in memory, and you can use it for querying and modelling. When you save your work, the file is saved with the .pbix extension, and the data is compressed and stored inside it.
In DirectQuery mode, the data stays in the remote source and Power BI only connects to it. Each time we refresh or change a slicer, a request goes to the data source and the results are brought back to Power BI. With this method the data itself is not stored in Power BI, but we can still create a data model.
Live connection is another method. It is supported for only a few data sources. With this method, the data is not stored in memory and you can't create a data model in Power BI Desktop.
Power BI is very well documented. Many of the questions you've recently asked are answered in that resource, so please take a look. I get the feeling that you are using this community because you don't want to read the manual. I strongly suggest you take a look at the documentation, because everything we write in answer to your questions has already been written and documented, and SO is not meant to be a shadow user guide for well documented systems.
Depending on the data source you use in Power BI Desktop, Power BI supports query folding, which will do as much processing of the data at the source (for example SQL Server).
If query folding is not possible because the source does not support it, then the source data is loaded before the query steps are applied.
Read more about query folding here: https://learn.microsoft.com/en-us/power-bi/guidance/power-query-folding
When you perform additional modelling after the Power Queries are loaded, i.e. creating tables with DAX, adding columns, etc., these will be performed when the PBIX file is published to the Power BI service, and they will be performed each time the data is refreshed with the data gateway.

Optimize data load from Azure Cosmos DB to Power BI

Currently we have a problem when refreshing the report data from the DB: it has too many records, and it takes forever to load all of them. The question is how to load only the data from the last year, to avoid the long wait for the full load. When connecting to Cosmos DB, the connection dialog lets me enter an SQL query, but I don't know how to write one for this type of non-relational database.
Power BI has an incremental refresh feature. You should be able to refresh the current year only.
If that still doesn’t meet expectations I would look at a preview feature called Azure Synapse Link which automatically pulls all Cosmos DB updates out into analytical storage you can query much faster in Azure Synapse Analytics in order to refresh Power BI faster.
Depending on the volume of the data you will hit a number of issues. First is you may exceed your RU limit, slowing down the extraction of the data from CosmosDB. The second issue will be the transforming of the data from JSON format to a structured format.
I would try to write a query to specify the fields and items that you need. That will reduce the time of processing and getting the data.
For SQL queries it will be something like:
SELECT * FROM c WHERE c.partitionEntity = 'guid'
For more information on the CosmosDB SQL API syntax please see here to get you started.
You can use the query window in Azure to run the SQL commands, or Azure Storage Explorer to test the query, then move it to Power BI.
What is highly recommended is to extract the data to a place where it can be transformed into a structured format, like a table or a CSV file.
For example, use Azure Databricks to extract the data, then turn the JSON into a table-formatted object.
You have the option of running Databricks notebook queries against Cosmos DB, or of using Azure Databricks in its own instance. Another option would be to use the change feed together with an Azure Function to send and shred the data to Blob Storage, and query it from there using Power BI, Databricks, Azure SQL Database, etc.
In the Source of your Query, you can make a select based on the CosmosDB _ts system property, like:
Query ="SELECT * FROM XYZ AS t WHERE t._ts > 1609455599"
In this case, 1609455599 is the Unix timestamp (in seconds) which corresponds to 31.12.2020, 23:59:59 in UTC+1 (22:59:59 UTC). So, only data from 2021 onwards will be selected.
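To avoid working out the cutoff by hand, the timestamp can be computed. The following is a minimal sketch, not part of the original answer: the helper names `year_start_epoch` and `build_ts_query` are made up for illustration, and it assumes the cutoff should be midnight UTC on 1 January, which differs by one hour from the UTC+1 value quoted above. `_ts` holds the last-modified time in seconds since the Unix epoch.

```python
from datetime import datetime, timezone


def year_start_epoch(year: int) -> int:
    """Unix timestamp (seconds) for 1 January of `year`, 00:00:00 UTC."""
    return int(datetime(year, 1, 1, tzinfo=timezone.utc).timestamp())


def build_ts_query(container: str, year: int) -> str:
    """Build a Cosmos DB SQL query for items modified since the start of `year`.

    `container` is a hypothetical placeholder (XYZ in the answer above).
    """
    cutoff = year_start_epoch(year)
    return f"SELECT * FROM {container} AS t WHERE t._ts >= {cutoff}"


if __name__ == "__main__":
    print(build_ts_query("XYZ", 2021))
```

The resulting string can be pasted into the query box of the connector. Selecting only the fields you need instead of `*` should reduce the volume further.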

How to access single tenant Azure Analysis Server with Power BI Embedded

So, currently I'm having difficulty understanding how Power BI Embedded can be set up so that each customer can access data from their own separate Azure Analysis Service. This is an App Owns Data situation. Analysis Services will be running in In-Memory mode and it will be accessed from Power BI via Live Connect.
Ideally I would like the Power BI Report to be ignorant of the data set/data source until the embedded report is provided with a parameter (e.g. connection string) which the report interprets so that it knows which server to connect to. So, ideally have: one Workspace, one Report, and zero (or a fake) Dataset.
The following is roughly what I'm looking to do (note the Red and Blue flow access a different server):
It looks like if I created both a Report and Dataset per customer I can achieve my goal but this seems like a poor approach since if the Report needs to be updated this involves updating, potentially, hundreds of reports. Also creating hundreds of Reports seems like unnecessary overhead when all Power BI needs to change for each request is the connection string pointing to the data source.
So is it possible to share the Workspace and Report across all customers but having completely separate data sources? Or is my approach in conflict with the way Power BI expects to function?
To date, I've tried using Query Parameters when configuring the data source in Power BI Desktop but I get the following error:
The connect live option for this file is disabled because it already contains data from another data source. You cannot explore live data and connect to another type of data source in the same file.
Please note,
Every report in Power BI can be connected to only one Dataset.
There is NO ability to dynamically change a connection string on the fly.
Currently, and in the foreseeable future, you'd have to clone the report & dataset per customer (or per connection setup) and modify the new dataset's connection string to match.
You can then dynamically choose which report to display based on your customer's needs.
Cloning a report can be done using:
POST https://api.powerbi.com/v1.0/myorg/reports/{report_id}/Clone
POST https://api.powerbi.com/v1.0/myorg/groups/{group_id}/reports/{report_id}/Clone
https://msdn.microsoft.com/en-us/library/mt784674.aspx
Changing the connection string would be done using:
POST https://api.powerbi.com/v1.0/myorg/datasets/{dataset_id}/Default.SetAllConnections
(similar API for groups)
https://msdn.microsoft.com/en-us/library/mt748181.aspx
Using the C# .NET library provided by the Power BI team, you'd use:
Reports.CloneReport(string reportKey, CloneReportRequest requestParameters)
Datasets.SetAllDatasetConnections(string datasetKey, ConnectionDetails parameters)
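For anyone calling the REST endpoints directly rather than through the .NET library, here is a rough Python sketch of how the two requests could be assembled. It only builds the requests and does not send them. The helper names, the JSON field names (`name`, `connectionString`) and the bearer-token header are my assumptions about the request shape, so check them against the API reference before relying on this.

```python
import json
from typing import Optional
from urllib.request import Request

API_ROOT = "https://api.powerbi.com/v1.0/myorg"


def _post(url: str, payload: dict, token: str) -> Request:
    """Build (but do not send) an authenticated JSON POST request."""
    return Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def clone_report_request(report_id: str, new_name: str, token: str,
                         group_id: Optional[str] = None) -> Request:
    """Request for the Clone endpoint; uses the group-scoped URL if group_id is given."""
    scope = f"/groups/{group_id}" if group_id else ""
    url = f"{API_ROOT}{scope}/reports/{report_id}/Clone"
    # Assumed body shape: the name of the new (cloned) report.
    return _post(url, {"name": new_name}, token)


def set_all_connections_request(dataset_id: str, connection_string: str,
                                token: str) -> Request:
    """Request for Default.SetAllConnections on a dataset."""
    url = f"{API_ROOT}/datasets/{dataset_id}/Default.SetAllConnections"
    # Assumed body shape: the new connection string for the dataset.
    return _post(url, {"connectionString": connection_string}, token)
```

Passing a returned request to `urllib.request.urlopen` would actually send it. Per customer, you would clone once, point the cloned dataset at that customer's server, and then pick the matching report at embed time.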