Strategies to map attributes from multiple dimension tables

I'm trying to tackle a requirement in an AAS (Azure Analysis Services) tabular model.
In the underlying data model (specific to a particular application), I have a Sales Fact table which joins to multiple dimension tables (let's call these application-specific dimension tables) using the usual PK-FK relationships.
These application-specific dimensions are frozen against changes in the DWH at the end of each reporting month-year, and hence can be considered to hold snapshots for all reporting month-years.
Each of these dimension tables has a corresponding "enterprise" version, which is a Type 2 dimension table. That is, for a dimension such as Product, the enterprise version of the table stores the history of changes to the dimension attributes, with the usual fields such as a valid-from date (start of the month-year), a valid-to date (end of the month) and a latest indicator ('Y' for the latest version for a specific reporting month-year). A new attribute value means a new Product FK.
Both versions of these dimension tables and the Sales fact would be part of the AAS tabular model.
Now, as part of the reporting requirement, I need to build a report (over this tabular model) that runs for a particular reporting month-year and shows Sales data along with specific dimension attributes for that month-year.
Additionally, the report needs to show the data of these dimension attributes from the "enterprise version" of the dimension tables as well, with the latest values (Latest indicator='Y').
For example, considering just one set of dimension tables:
Product-App (Product Name, Product Group, Product PK .... so on)
Product-Enterprise (Product Name, Product Group, Valid-From, Valid-To, Latest Indicator ... so on)
and the Sales Fact:
**If the user runs the report for Nov-2021 in Jan-2022, the report needs to show Nov-2021 Sales data with:
1. Product attributes from Product-App, and
2. Latest values of the Product attributes from Product-Enterprise.
That is, for requirement 2, the tabular model needs to provide a way for the user to bring in the latest values of the Product dimension attributes from Product-Enterprise.**
My understanding is that bringing in the latest values of a specific dimension attribute in this scenario/data model would require either a calculated column (to map the latest values of the dimension attributes onto the application-specific dimension) or a measure. These calculated columns and measures could then be consumed from the model for reporting.
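A minimal sketch of that calculated-column idea, written in Python rather than DAX purely for illustration. It assumes both versions of the dimension share a business key; the name product_code is hypothetical (the post does not name such a key), and the sample rows are made up.

```python
# Hypothetical rows. "product_code" is an ASSUMED business key shared by both
# dimension versions; it is not named in the original post.
product_app = [
    {"product_pk": 101, "product_code": "P-1", "product_group": "Snacks"},
    {"product_pk": 102, "product_code": "P-2", "product_group": "Drinks"},
]
product_enterprise = [
    {"product_code": "P-1", "product_group": "Snacks", "latest": "N"},
    {"product_code": "P-1", "product_group": "Confectionery", "latest": "Y"},
    {"product_code": "P-2", "product_group": "Drinks", "latest": "Y"},
]

# Keep only the rows flagged as latest, indexed by the business key.
latest_by_code = {
    row["product_code"]: row
    for row in product_enterprise
    if row["latest"] == "Y"
}

# Rough equivalent of a calculated column: attach the latest enterprise value
# to every application-specific dimension row.
for row in product_app:
    latest = latest_by_code.get(row["product_code"])
    row["latest_product_group"] = latest["product_group"] if latest else None
```

With this shape, the report can show product_group (the frozen snapshot value) next to latest_product_group (the current enterprise value) for the same Sales rows.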
I'm looking for any pointers on whether this is the best approach, or whether there are alternative approaches that would better address such a requirement.
Thanks!

Related

What is best practice to deal with missing data according to Kimball?

I have a database with the following tables:
Customers, Invoices, Salesman, Target.
The ones relevant to my question are Customers and Invoices.
There are customer IDs used in Invoices that don't exist in the Customers table.
If I used only the customers from the Customers table, my customer dimension would be incomplete.
My solution is to append these IDs from Invoices to Customers and fill the other columns in the Customers table with nulls.
I don't know if this is the best approach according to Kimball.
Also, if it is a good solution, how can I accomplish it with Power BI Desktop?
Customers table (generated data): [table not shown]
Invoice table: [table not shown]
This is just a sample; the real table has thousands of rows.

There are two points here:
Firstly, (in import mode at least) PBI already creates a "blank row" for items present in your fact table but missing from your dimension table, for precisely this scenario. If you don't need the granularity of each individual missing customer id, then you don't need to do anything.
Secondly, if you need to retain that granularity then your approach is the correct one. The way to do this in Power Query is as follows:
Create a new query which takes your invoice fact table and does a left anti join on customer id with your customer dimension table. This keeps only the invoice rows whose customer id has no match in the dimension.
Remove all columns apart from the customer id column.
Remove duplicates.
You now have a list of missing customer ids. Ensure the column name is the same as the column name of your customer id in the customer dimension table. Append this to the original customer dimension query and the nulls will be filled in automatically for the missing columns.
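The intended result (a deduplicated list of customer ids that appear in the invoices but not in the customer dimension, appended to the dimension with nulls in the other columns) can be sketched in plain Python. This is only an illustration of the logic, not Power Query code; the column names and sample rows are made up.

```python
# Made-up sample data; column names are illustrative only.
customers = [
    {"customer_id": 1, "name": "Alice", "city": "Paris"},
    {"customer_id": 2, "name": "Bob", "city": "Rome"},
]
invoices = [
    {"invoice_id": 10, "customer_id": 1, "amount": 50.0},
    {"invoice_id": 11, "customer_id": 3, "amount": 20.0},
    {"invoice_id": 12, "customer_id": 3, "amount": 35.0},
    {"invoice_id": 13, "customer_id": 4, "amount": 10.0},
]

known_ids = {c["customer_id"] for c in customers}

# Anti join plus "remove duplicates": ids used in invoices but absent from
# the dimension. Using a set removes the repeated id 3.
missing_ids = sorted({i["customer_id"] for i in invoices} - known_ids)

# Append one placeholder row per missing id; every other column is None,
# which mirrors the nulls Power Query fills in on append.
columns = list(customers[0].keys())
placeholders = [
    {col: (cid if col == "customer_id" else None) for col in columns}
    for cid in missing_ids
]
customer_dimension = customers + placeholders
```

After this, every customer_id in invoices has exactly one matching row in customer_dimension.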

Please keep in mind that it is Kimball, not Kimble.
There are 4 steps in the DWH methodology:
1) Understand the business process (what is your process actually measuring?)
2) Decide the grain (what does every row in your fact table actually represent?)
3) Decide the dimensions (ask Where-What-Who-When-How-HowMany-HowMuch of your grain declaration, together with the business process)
4) Define the facts (metrics)
Following this order, you define the dimension tables before building your fact tables. If your dimension table (the Customer table in this case) is missing customers that appear in your fact table, my biggest advice according to DWH dimensional modeling is to set your customer table right! Define every customer in your dimension table! Then populate your fact table with records:
[Customer ID] in Customer Table : PRIMARY KEY
[CustomerID] in Invoice Table : FOREIGN KEY
SQL and Power BI react very differently to your problem:
1) Power BI does not enforce referential integrity: it adds a blank row to your dimension table in such a case.
2) SQL raises a referential integrity error (when the foreign key constraint is defined), and you can't even add the rows to your fact table. I personally side with SQL in this case!
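As a small illustration of the SQL side, here is a sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for whichever database you use (and it needs foreign key enforcement switched on explicitly); the table and column names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE invoice ("
    " invoice_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(customer_id),"
    " amount REAL)"
)
conn.execute("INSERT INTO customer VALUES (1, 'Alice')")
conn.execute("INSERT INTO invoice VALUES (10, 1, 50.0)")  # customer 1 exists

try:
    # Customer 99 is not in the dimension, so the fact row is rejected.
    conn.execute("INSERT INTO invoice VALUES (11, 99, 20.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

invoice_count = conn.execute("SELECT COUNT(*) FROM invoice").fetchone()[0]
```

The orphan invoice never reaches the fact table, which is the behaviour described in point 2.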
Finally: use some ETL tool (SSIS, Talend, ODI or even Power Query) to make your dimension table as accurate as possible.
For example:
Do not leave any column value as null!
If an unknown date exists, put in a default date value like '1900-12-31'.
If a textual property is unknown, put in keywords such as 'unknown', 'not available', etc.
This is because dimension tables are what SQL statements query against, and different SQL vendors (SQL Server, Oracle, MySQL) deal with NULL values in different ways, which causes performance problems!
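A minimal sketch of that clean-up step in Python, assuming a made-up customer dimension with one text column and one date column. In practice this would live in your ETL tool; the default values follow the examples above.

```python
from datetime import date

UNKNOWN_TEXT = "Unknown"
UNKNOWN_DATE = date(1900, 12, 31)  # default date suggested above

# Made-up dimension rows; customer 3 has no known attributes.
customers = [
    {"customer_id": 1, "name": "Alice", "signup_date": date(2020, 5, 1)},
    {"customer_id": 3, "name": None, "signup_date": None},
]

# Per-column default used whenever the value is missing.
defaults = {"name": UNKNOWN_TEXT, "signup_date": UNKNOWN_DATE}

cleaned = [
    {
        col: (defaults[col] if val is None and col in defaults else val)
        for col, val in row.items()
    }
    for row in customers
]
```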

Power BI problem in direct active relationship

I have a problem with Power BI when using multiple direct active relationships between two tables.
I have these 2 tables, Population and Cost, and I want to graph the costs over the years in the different regions according to the population of the region.
In Population the attributes of interest are Year, Region, #Population; in Cost I have Year, Region, Costs. So I need to create two active relationships, on Year and on Region, between the two tables. However, I can't do that in Power BI.
I tried generating a new Population table (an identical copy of the other) and creating the two active relationships: the first (for Year) between Cost and one Population table, the second (for Region) between Cost and the other Population table.
Unfortunately this solution does not work: when dividing the costs by the population, in one case I obtain the aggregation over all the years, in the other the aggregation over all regions.
Does anyone have any idea how to solve the problem?

Each Power BI relationship is limited to a single column from each table. The typical workaround to your scenario is to create a concatenated column on each table.
You can do this in the Query Editor (my preference) using the Merge Columns button from the Add Column ribbon.
The other method would be to use the Add Column button (e.g. on the Modeling ribbon) and write a DAX formula such as:
Year Region Key = [Year] & [Region]
Once you have a concatenated column added to both tables, use that to create the active relationship.
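To show why the concatenated key fixes the aggregation, here is a small Python sketch of the same idea with made-up numbers. One difference from the formula above is an assumption of mine: the key includes a "|" delimiter, which guards against two different pairs producing the same string.

```python
# Made-up sample data for the two tables.
population = [
    {"year": 2019, "region": "North", "population": 1000},
    {"year": 2019, "region": "South", "population": 2000},
    {"year": 2020, "region": "North", "population": 1100},
]
cost = [
    {"year": 2019, "region": "North", "cost": 5000.0},
    {"year": 2019, "region": "South", "cost": 4000.0},
    {"year": 2020, "region": "North", "cost": 6600.0},
]

def year_region_key(row):
    # Single-column key built from both attributes (delimiter is my addition).
    return f"{row['year']}|{row['region']}"

# One population value per (year, region) key: the "one" side of the relationship.
population_by_key = {year_region_key(r): r["population"] for r in population}

# Each cost row now matches exactly one population row.
cost_per_capita = {
    year_region_key(r): r["cost"] / population_by_key[year_region_key(r)]
    for r in cost
}
```

Because the key identifies a single year and region, the division is no longer taken over all years or all regions.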

Showing items with no data

Recently I started working in Power BI to generate a few reports. I am new to Power BI. So far I have been able to manage the key tasks, but I am stuck at one point:
I have one matrix in my report which uses one measure column. I have used an IF condition in that measure column and, based on this condition, categorised the rows into 3 types. Now when I populate these on the matrix, I can see only 2 categories, not 3. The reason is that no value falls under the third category, but I want to show the 3rd category as well, with zero data. I have tried "Show items with no data" but had no luck. Any help will be appreciated. Thanks in advance.

Try the steps below.
Create a separate "Categories" table with all possible categories.
Create a relationship between the "Categories" table and your calculated column.
Use categories from "Categories" table in the visual. Mark "Show items with no data".
Let's see a simplified example.
I have a Sales table (with a very small number of rows for simplicity) like this. There are 3 possible categories: A, B, and C. However, category C does not yet appear in the existing data.
In my matrix visual there is, unsurprisingly, no category C.
Now, I create a Categories table with all possible categories including C, then build a many-to-one relationship between the Sales and Categories tables.
It is advisable to hide Category in the Sales table (the "many" side) in report view, to make sure users choose the one from the Categories table.
Then, in the settings of the matrix visual, I replace the Category with the one from the Categories table, and mark "Show items with no data".
Category C now shows up in the matrix with an empty value.
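The reason this works can be sketched in a few lines of Python with made-up data: the list of rows to display comes from the complete Categories table, not from the categories that happen to occur in Sales. Here the missing category is shown as 0; in the matrix it appears as an empty value unless the measure itself returns 0.

```python
from collections import Counter

# Made-up Sales rows; category C never occurs.
sales = [
    {"order_id": 1, "category": "A", "amount": 10},
    {"order_id": 2, "category": "B", "amount": 20},
    {"order_id": 3, "category": "A", "amount": 5},
]

# Separate "Categories" table listing every possible category.
categories = ["A", "B", "C"]

# Aggregate the facts by the category found in the data.
totals = Counter()
for row in sales:
    totals[row["category"]] += row["amount"]

# Drive the output from the Categories table so C is listed even with no rows.
report = [(category, totals.get(category, 0)) for category in categories]
```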

One-to-many relationship always changed into many-to-one by PowerBI

I have two tables from Azure SQL in PowerBI, using direct query:
EMP(empID PK)
contactInfo(contactID PK, empID FK, contactDetail)
which have an obvious one-to-many relationship from EMP.empID to contactInfo.empID. The foreign key constraint is successfully enforced.
However I can only create a many-to-one relationship (contactInfo.empID to EMP.empID) in PowerBI. If I ever try the opposite, PowerBI always automatically converts the relationship to many-to-one (by swapping the from and to column), which prevents me from creating visuals. Does PowerBI think the two are equivalent?
Update:
What I'm doing is to just create a table in PowerBI showing the join results of these two tables. The foreign key constraint is contactInfo.empID REFERENCES EMP.empID, which is many-to-one. That should not be a problem, I guess, since I can directly query the join using SQL.
Please also suggest if I should create the foreign key in the opposite direction.
More info on failure to create visual
The exact error message is:
Can't display the data because Power BI can't determine
the relationship between two or more fields.
Version: 2.43.4647.541 (PBIDesktop)
To reproduce the error:
DB schema is as follows:
What I want is a table in PowerBI showing the contact and sales info of an employee, that is, joining all four tables. The error occurs when VALUES of the table visual contains "empName, contactDetail, contactType, productName"; however, the error does NOT occur if I only include "empName, contactDetail, contactType" or "empName, productName". At first I thought the problem might lie in the relationship between contactInfo and emp, but it now seems to be more complicated. I guess it may be caused by multiple one-to-many relationships?

Expanding my comments to make an answer:
Root of the Problem
In your data model, a single employee can have multiple contacts and multiple sales. But, there's no way for Power BI to know which contactDetail corresponds to which productName, or vice versa (which it needs to know to display them together in a table).
Deeper Explanation
Let's say you have 1 emp row, that joins to 10 rows in the sales table, and 13 rows in the contactInfo table. In SQL, if you start from the emp row and outer join to the other 2 tables, you'll get back (1*10)*(1*13) rows (130 rows in total). Each row in the contactInfo table is repeated for each row in the sales table.
That repetition can be a problem if you do something like sum the sales and don't realize a single sales record is repeated 13 times but might be fine otherwise (e.g. if you just want a list of sales and all associated contacts).
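A quick Python sketch of that fan-out, using made-up rows for one employee with 10 sales and 13 contacts. The amt_sold column is hypothetical and only there to show how a sum gets inflated.

```python
# One employee, 10 sales rows and 13 contact rows (all made up).
emp = {"emp_id": 27, "emp_name": "Ann"}
sales = [{"emp_id": 27, "sale_id": s, "amt_sold": 10} for s in range(10)]
contacts = [{"emp_id": 27, "contact_id": c} for c in range(13)]

# Joining emp to both child tables pairs every sale with every contact.
joined = [
    {**emp, **s, **c}
    for s in sales if s["emp_id"] == emp["emp_id"]
    for c in contacts if c["emp_id"] == emp["emp_id"]
]

true_total = sum(s["amt_sold"] for s in sales)            # summed once per sale
inflated_total = sum(row["amt_sold"] for row in joined)   # each sale counted 13 times
```

Here len(joined) is 130, and summing over the joined rows gives 13 times the real total.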
Power BI vs. SQL
Power BI works slightly differently. Power BI is designed primarily to aggregate numbers, and then break them down by different attributes. E.g. sales by product. Sales by contact. Sales by day. In order to do this, Power BI needs to know 100% how to divide numbers up between the attributes on your table.
At this point, I'll note that your database diagram doesn't include any obvious metrics that you'd use Power BI to aggregate. However, Power BI doesn't know that. It behaves the same whether you have metrics to aggregate or not. (And failing all else, Power BI can always count your rows to make a metric.)
Let's say that you have a metric on your sales table called Amt Sold. If you bring in the empName, productName, and Amt Sold columns, Power BI will know exactly how to divide up Amt Sold between empName and ProductName. There's no problem.
Now add in contactDetail. Using your database diagram, Power BI has no way of knowing how an Amt Sold metric in the sales table relates to a given contactDetail. It might know that $100 belongs to empID 27. And that empID 27 corresponds to 3 records in the contactInfo table. But it has no way of knowing how to divide up the $100 between those 3 contacts.
In SQL, what you'd get is 3 contacts, each showing the $100 amount sold. But in Power BI, that would imply $300 was sold, which isn't the case. Even equally dividing the $100 up would be misleading. What if the $100 belonged entirely to 1 contact? So instead, Power BI shows the error you're seeing.
My Recommendations
If you can, I recommend changing your data model before you bring it in. Power BI works best with a single fact table, which would contain your metrics (like amount sold). You then join this fact table to as many lookup tables as you like (e.g. customer, product, etc.), directly. This allows you to slice & dice your metrics with any combination of attributes from any of the lookup tables. I would recommend checking out the star schema data model and the concept of lookup tables: powerpivotpro.com/2016/02/data-modeling-power-pivot-power-bi
At the very least, you would want to flatten your tables (i.e. merge the contactInfo and sales tables into a single table) before importing them into your data model.
It may be that Power BI isn't the best tool for what you're trying to accomplish. If all you want is a table showing all sales & contact info for an employee, without any associated metrics, a regular reporting tool + SQL query might be a better way to go.
Side Note: You can't reverse a many:one relationship to get past this error. The emp table contains one row per empID. Both the contactInfo and sales tables contain multiple rows with the same empID. This means the emp table is necessarily the "one" side of the relationship to both those tables. You can't arbitrarily change that.

power bi - how to manage unrelated dimensions

I'm attempting to create a shared date dimension between two fact tables in Power BI, based on a relational data source.
Currently, if I include an unrelated dimension in the report, I get numbers duplicated across multiple rows, where they don't really apply.
I'm wondering if there is any way to tell Power BI that certain dimensions cannot be used with certain fact tables, similar to using IgnoreUnrelatedDimensions in SSAS.
Currently the only solution I can find is to create a separate date dimension, so that the two fact tables have no relationship that could be used to join them, however this would mean forfeiting the ability to do any time based comparisons.

Create a combined view of the fact tables with only compatible columns to be used for time based comparison:
In Query Editor, create new queries for your fact tables by referencing, i.e. right-click the original query and select "Reference".
Then in those "copies" cut out the incompatible dimensions.
Rename columns to align terminology (e.g. Sales Date ==> Transaction Date, Payment Date ==> Transaction Date).
Use "Merge Queries" function to combine the copies using Full Outer Join.
Join this merged view to your date dimension
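A rough Python sketch of the resulting combined view, with made-up fact rows. One assumption beyond the steps above: each copy is first summed per date, so the full outer join lines up one row per date instead of pairing every sale with every payment on the same day.

```python
from collections import defaultdict

# Made-up fact tables with incompatible dimensions (product vs. method).
sales = [
    {"sales_date": "2021-11-01", "product": "X", "sales_amount": 100},
    {"sales_date": "2021-11-02", "product": "Y", "sales_amount": 50},
]
payments = [
    {"payment_date": "2021-11-02", "method": "card", "payment_amount": 40},
    {"payment_date": "2021-11-03", "method": "cash", "payment_amount": 60},
]

def totals_by_date(rows, date_col, value_col):
    # Drop the incompatible dimensions and keep one total per date.
    out = defaultdict(int)
    for row in rows:
        out[row[date_col]] += row[value_col]
    return out

sales_by_date = totals_by_date(sales, "sales_date", "sales_amount")
payments_by_date = totals_by_date(payments, "payment_date", "payment_amount")

# Full outer join on the aligned "transaction_date" column: keep dates that
# appear in either table, with None where one side has no rows.
combined = [
    {
        "transaction_date": d,
        "sales_amount": sales_by_date.get(d),
        "payment_amount": payments_by_date.get(d),
    }
    for d in sorted(set(sales_by_date) | set(payments_by_date))
]
```

The combined rows carry only the date and the two metrics, so they can be related to the date dimension without dragging in dimensions that apply to just one of the fact tables.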
