I have multiple databases which have similar schemas. I need to combine the data from all these databases and do reporting over it.
For example:
Customer table in AdventureWorks in Server 1
Customer table in AdventureWorks in Server 2
Customer table in AdventureWorks in Server 3
Now in Power BI I will have a dataset called Customer, and its data needs to come from all 3 servers mentioned above. I know I can do this with merge queries in Power BI, but that means pulling the data from each server as a separate dataset in Power BI and then merging them, which I want to avoid.
Do let me know if there is any other way to do this.
You can use Edit queries -> Append queries (or Append queries as new) to combine the data from these data sources into one table.
I find Append queries as new more meaningful in this case. It creates a new (fourth) table containing all the rows from the other 3 tables. Select Three or more tables, then select all the customer tables from your data sources.
Then use this new table in the report.
You should not worry that duplicating the data will lead to increased size of the data. The data itself will be reused and not copied twice.
Your model is the right place for combining data from different data sources. Maybe this should not be your report, but one dataset shared between your reports, or an SSAS model. Combining the data at the SQL Server level (e.g. a partitioned view over 3 databases) is not a good idea. Combining in your model also lets you mix different types of data sources, e.g. Customers 1 is in SQL Server, Customers 2 is a .csv file in OneDrive and Customers 3 comes from a web service call.
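The append described above can also be written as a single query in the Advanced Editor, so the three source tables never appear as separate tables in the model. This is only a minimal sketch: the server names (server1, server2, server3) and the Sales.Customer schema/table name are assumptions, and all three tables are assumed to have the same columns.

```powerquery
let
    // Hypothetical server names; each hosts an AdventureWorks database.
    // The "Sales" schema and "Customer" table name are assumptions as well.
    Source1 = Sql.Database("server1", "AdventureWorks"){[Schema = "Sales", Item = "Customer"]}[Data],
    Source2 = Sql.Database("server2", "AdventureWorks"){[Schema = "Sales", Item = "Customer"]}[Data],
    Source3 = Sql.Database("server3", "AdventureWorks"){[Schema = "Sales", Item = "Customer"]}[Data],
    // Append the rows of all three tables; columns are matched by name.
    Customer = Table.Combine({Source1, Source2, Source3})
in
    Customer
```

Table.Combine does the same thing as the Append queries dialog; columns that exist in only some of the tables are filled with null for the others.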
Just looking for a pointer as to the best way to go about this.
I'm comfortable with Power BI Report Builder (SSRS experience), but am pretty much a Power BI novice.
Basically, we have to create a Paginated (non-interactive) report for client consumption. It's going to be large, have multiple datasets, and use parameters / presence of data in the data sets to group data and/or turn sections on or off.
Not too much visualisation - some illustrative graphs and tables here and there - and quite a bit of text, some of it with data / text inserted via placeholders from the various datasets.
There are 3 Azure SQL databases I need to combine data from for this (split roughly into config, data and results).
In SSRS / SQL Server, I would have used one of my databases as the data source, and written a stored procedure per SSRS data set, joining to tables in other databases in the stored procedure query.
Then in Report Builder I would just have set up the datasets to call the stored procs and gone from there.
On Azure SQL, I think I've got 2 options:
Write elastic queries so I can bring in the data I need from each database, but only query one database.
Build a Power BI model / dataset that joins the relevant tables from the 3 databases together, publish it to the Power BI service and use that as my data source.
What's the best solution for my reporting scenario?
Cheers
I have 2 datasets deployed to the Power BI portal.
In a report I can connect to 1 dataset using a live connection, and then convert the connection to DirectQuery to also connect to the 2nd dataset.
Then I can create relationships between the model tables. How do I merge (append) data from 2 live models?
Today you can't.
You can leave the tables separate and write DAX measures that operate over both tables.
But if you try creating a DAX calculated table that appends the two, refresh will fail in the service, as this scenario is not currently supported.
Instead of using DirectQuery to Power BI datasets, if the datasets are on Premium capacities, you can import tables from both models using the Analysis Services data source and an explicit DAX query, e.g.:
let
    Source1 = AnalysisServices.Database("powerbi://api.powerbi.com/v1.0/myorg/someworkspace", "AdventureWorksDW", [Query="evaluate DimCustomer", Implementation="2.0"]),
    Source2 = AnalysisServices.Database("powerbi://api.powerbi.com/v1.0/myorg/someworkspace", "AdventureWorksDW2", [Query="evaluate DimCustomer", Implementation="2.0"]),
    Appended = Table.Combine({Source1, Source2})
in
    Appended
I have 3 OLTP databases, all using the same database schema. Each db represents one department.
I am exploring Power BI as a solution for reporting at the company level, so all departments combined.
What is the approach for combining data from multiple dbs into a data warehouse? For example, do I need SSIS to combine the 3 dbs into 1 data warehouse?
Another option could be to have 1 shared dataset per db, and have the final report connect to and combine multiple live datasets. Or is there another way with Power BI, like combining multiple live datasets?
Does anyone have a reference link showing how this has been done?
Or is there another way with Power BI
Yes. Simply create a single import model and load data from all three databases into it. For each table in your Power BI model you would have three Power Query queries set to not load into the model, and you would append them in a fourth query that is the one loaded into your model. See e.g. https://learn.microsoft.com/en-us/power-query/append-queries
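As a rough sketch of that pattern, the query that loads into the model could look like the following. The staging query names (Customer_DeptA, Customer_DeptB, Customer_DeptC) are hypothetical and are assumed to already exist with "Enable load" switched off; the extra Department column is an optional assumption added here so rows can be traced back to their source database.

```powerquery
let
    // Helper that tags every row of a staging table with its department.
    TagDepartment = (source as table, department as text) as table =>
        Table.AddColumn(source, "Department", each department, type text),
    // Customer_DeptA/B/C are hypothetical staging queries (one per database),
    // each set to not load into the model.
    Customer = Table.Combine({
        TagDepartment(Customer_DeptA, "A"),
        TagDepartment(Customer_DeptB, "B"),
        TagDepartment(Customer_DeptC, "C")
    })
in
    Customer
```

Only this combined query is loaded, so the model ends up with one Customer table rather than four.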
Best practice would be to:
Extract the data into a single database (DWH or reporting schema)
Build the necessary items there for your data model, be it reporting schema, or star/snowflake schemas
Connect Power BI to that schema.
Combining datasets is going to be tricky, because you may have the same measures in each of the datasets. Combining in the database, with an added column to indicate the department, is the best option for supporting updating/adding/removing items. For example, if the schema changes in the DBs, you handle it in one place, not in three datasets. The toolset in the DB/SSIS is also better suited to the heavy lifting of moving the data to one location.
You would use SSIS to extract the data if it is on-premises, or Azure Data Factory for Azure DBs. Extract to a staging schema, then convert/transform the data into its final form under a new schema that describes what it is: facts/dimensions, or another schema name such as reporting, depending on the data model you wish to build. Most of this is covered by the standard ETL pattern of moving from an OLTP to an OLAP database.
Connection Type: Direct Query to multiple sources so limited DAX available especially in Power Query load.
Data Model Query: The data model is not a perfect star schema, but there is an attempt to separate tables into business processes and lookup tables. There are probably a few issues to discuss in the current data model, but I only have 1 question at this time.
My current goal is to generate a single summarised customer table, replacing the current two tables, that has the measures I need, such as the number of app customers, the total number of customers, the date a customer first accessed the app, etc.
I cannot merge the 2 customer tables and add calculated columns and measures at the import stage, as Power BI does not allow it, and SQL is out as I am using DirectQuery. My plan is to create a summarised customer table using the DAX SUMMARIZE function on the front-end visual page, containing only the app customers, plus measures like the total number of customers. Is this best practice, or is there a better way of approaching this? I understand you would ideally do this in SQL or Power Query, but in these circumstances I think this is the best way and wanted a second view.
Is there a reason to use DirectQuery over Import? If you are in Import mode, you can easily append the two customer tables together in Power Query.
Treb Gatte, Power BI MVP
I’m building a model in Azure Analysis Services. The model should contain only data for the last 3 months and is processed every day.
I have a separate date dimension that is related to a fact table through a datekey. I’m using a Power Query query to load only the last 3 months into the date dimension. In the query that loads the fact table I used Table.NestedJoin to load only the rows that have a matching value in the date table.
When I do this, processing the model takes forever. After some troubleshooting I saw that the query Analysis Services uses to retrieve data from the SQL database retrieves all rows. So, am I correct in saying that AS loads all the data before it merges the rows? Is there a way to change this? Or is there a better way to achieve my goal?
Kind regards,
Joins are very slow in Power Query. You should avoid them if you can do the join in the data source instead, or use normal relationships in the data model.
Also, you can set up the date dimension in DAX and dynamically populate it so that it contains only the dates present in the fact table.
As for the load of all the data, it could be because the data is fetched as is, and only then does Power Query apply the transformations (the join).
You can modify the query in the Power Query Editor / Advanced Editor to add a WHERE clause directly in the query.
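One way to do that, shown here only as a sketch, is to pass a native SQL statement through the Query option of Sql.Database so that the filter runs on the server. The server, database, table and column names below are hypothetical, and the example assumes the datekey is an integer in yyyymmdd form.

```powerquery
let
    // Hypothetical server, database, table and column names.
    // Assumes DateKey is an integer such as 20240131 (yyyymmdd).
    Source = Sql.Database(
        "myserver",
        "mydb",
        [Query = "SELECT * FROM dbo.FactSales WHERE DateKey >= CONVERT(int, CONVERT(char(8), DATEADD(month, -3, GETDATE()), 112))"]
    )
in
    Source
```

With the filter in the WHERE clause, only the last 3 months of rows are sent to Analysis Services, so no join against the date table is needed during processing.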