I have tables as follows:
DimLocation
DimProduct
DimCustomer
FactSales
DimLocation filters both product and customer.
Product and customer filter factSales.
This looks like a diamond schema.
Power BI only allows me to make 1 Link as active: location to product Or location to customer.
Both cannot be made active as it says it will introduce ambiguity. I am looking for simple example of why it will introduce ambiguity.
My thinking is that selecting a location will filter both product and customer. Then both these tables will filter the factSales such that sales common to both dims should be kept. Therefore there is no ambiguity. Why does power bi think it will cause ambiguity?
Related
I have a data base with the following tables:
Customers, Invoices, Salesman, Target.
The ones concerned about my question are Customers, Invoices.
There are customersIDs used in the Invoices but doesn't exist in the Customers table.
If I used only the customers from Customers Table, my customer dimension would be incomplete.
My solution is to append these IDs from Invoices to Customers and fill other columns in the Customers table with nulls.
I don't know if this is the best approche according to Kimball?
also, if it is a good solution, how can I add accomplish it with Power bi desktop?
Customers table: "generated Data"
Invoice table:
..... just a sample the table is thousands of rows.
There's two points here:
Firstly, (in import mode at least) PBI already creates the "blank row" for items present in your fact table but missing from your dimension table for precisely this scenario. If you don't need the granularity of each individual missing customer id, then you don't need to do anything.
Secondly, if you need to to retain that granularity then your approach is the correct one. The way to do this in Power Query is as follows:
Create a new query which takes your customer dimension table and does a left outer join on customer id with your invoice fact table.
Expand the newly joined table but retain only the new customer id column.
Remove all columns apart from the new customer id column.
Remove duplicates
You now have a list of missing customer ids. Ensure the column name is the same as the column name of you customer id in the customer dimension table. Append this to the original customer dimension query and the nulls will be filled in automatically for the missing columns.
Please keep in mind that It is Kimball, not Kimble.
There are 4 steps of DWH Methodology:
1) Understand Business Process (What your process is actually measuring?)
2) Deciding the grain (It means what every row in your fact table actually represents?)
3) Deciding Dimensions (Ask Where-What-Who-Where-How-HowMany-HowMuch to your grain declaration formed together with business processing)
4) Define Facts (Metrics)
According to this order, You define Dimension tables before building your fact tables: If your dimension table , Customer table in this case, is missing in terms of customers available in your fact table, My biggest biggest advice according to the DWH Dimensional Modeling is to set your customer table right!!! Define every piece of customer in your dimension table!!!! Then populate your Fact table with records:
[Customer ID] in Customer Table : PRIMARY KEY
[CustomerID] in Invoice Table : FOREIGN KEY
SQL and Power BI reacts very differently in your problem:
1) Power BI has no referential integrity concept: It adds a blank row to your dimension table in such a case.
2) SQL gives referential integrity error, and you can't even add rows to your fact table. I support SQL in this case personally!!!!
Finally: Use some ETL tool(SSIS, Talend, ODI or even Power Query) to make your dimension table as accurate as possible:
For example:
Do not leave any column value as null!
If an unknown date exists, put a default date value like '1900-12-31'
If an unknown textual property, put in keywords 'unknown','not available' etc..
Because dimensional table are sources of querying in SQL statements; and different SQL Vendors (SQL Server, Oracle, MySQL) has to deal with NULL values in a different way, and this cause problems in terms of performance wise!!
I have a data model in Power BI that, among other things, has the following tables
Employees (Dimension; employee ID/name)
Jobs (Dimension; contains details about the job, including job ID)
Employee history- Contains a record for each day an associate was in a job(snapshot table);
Job Budget History- Contains a record for each day a job was budgeted(snapshot table)
Calendar Table
The table is modeled like so (simplified version):
In Power BI, I am trying to make a simplified table view that contains measures based on both the budget history as well as the employee history for the most recent day in the dataset (simple count rows/distinct count of calendar table)
However, attempting to do so gives me the results below if I try to put both measures on the table. Basically it appears to be doing a cross join between each table and matching associates with jobs they don't have (this happens when the budget is added).
Of course, if I just do one of the singular measures everything works perfectly. I am fairly certain it is because there is no real connection between the 'employee' and the 'budget history' in this relationship, so it is just joining everything on the date without any context.
I have tried several things such as making inactive relationships with userelationship(), using visual level filters etc. but I'm not sure what the best option in this situation would be. (I am trying to avoid bidirectional relationships if possible)
Ideally this information should show on this date that Joe was present as President, Sally was present as an operator, and the Manager position had nobody, but all three were budgeted.
Any advice is appreciated. I have attached a simplified mockup pbix file for reference.
PBIX File
This is a complicated problem for many reasons. I was able to produce this report:
by removing field "Name" from the table and replacing it a measure:
Employee Name =
CALCULATE(
SELECTEDVALUE(Employees[Name]),
CROSSFILTER(Employees[Employee_ID], Employee_History[Employee_ID], BOTH)
)
It looks exactly like the report you want, but if you have additional requirements, you'll need to make sure that such approach works for you.
If this is acceptable, a brief explanation:
the root cause of the issue is missing Employee-Budget relationship. When you put Name in the table as a filter, it doesn't propagate to the budget table and causes a cartesian product.
Removing Name from the table eliminates the need for the filter propagation, but then you won't see employee names. I solved this by pulling employee names with the measure, where required propagation is forced by CROSSFILTER function (essentially, it's like a temporary bi-directional relation only when you need it, so it does not negatively affect the rest of the model).
I've imported three tables into Power BI using a REST API, have added relationships between them, and am now trying to add fields from the various tables to a table on the canvas. The table names (from a Human Resources database) are named Employees, Job History, and Salary History.
Employees is joined to Job History using EmployeeID as a 1:Many relationship, and also to Salary History using EmployeeID on a 1:Many relationship.
I can add fields from the Employee table and EITHER the Salary History OR the Job History table to the table on the report canvas. However, if I try to add fields from all three tables, I'm seeing the error 'Can't display the data because Power BI can't determine the relationship between two or more fields'.
Could anyone advise where I'm going wrong? Many thanks.
If I understand correctly, you have a model like in this picture:
The way PBI filters work is: The 1: side table filters the N: side table. Filters propagate that way. So in this case, you can filter JobHistory with data from Employees, and SalaryHistory with data from Employees. But the 2 fact tables can't relate because the filters don't propagate that way.
Look into DAX measures like RELATED(), RELATEDTABLE() and USERELATIONSHIP() that might work for you.
Without that, I don't think you can use data from the 3 tables, since you have a model with 2 Fact Tables.
Someone correct me if I'm wrong.
I'm trying to figure out how to get a measure to adhere to the filter set by a slicer in Power BI.
My DAX query is: Block Time Cost = SUMX( FILTER(v_Invoice_Line_Items, v_Invoice_Line_Items[IV_Item_RecID]=9), v_Invoice_Line_Items[billable_ext_price_amount])
I know very, very little about DAX so my initial query may be way off base.
It calculates as I expect, but when filter with a date range silcer the value does not update as expected or at all.
I'm pulling my data from two views in the same database, v_Invoice and v_Data_Combined. I have a page level filter on the row Record_Description to limit the data to the types I'm looking for and the measure pulls it's data from rows in the v_Data_Combined view.
The rows in v_Invoice are below.
A sample copy is here.
and the rows for v_Data_Combined, if you click they will enlarge.
A sample copy is here.
I have no relations set between the views.
How can I have a measure adhere to slicer filters?
The slicer has to be on the same table as the measures you're filtering, or on a table related to that table. If your slicer is on a column in v_Invoice and your data is from v_Data_Combined - and the 2 tables are unrelated in Power BI, the slicer from one table will have no effect on the data from the other table.
Without sample data (which can be fake data), it's hard to make further recommendations.
However, if the two tables you have aren't really related to each other, then I would recommend exploring the possibility of "lookup" tables. E.g. if you have Company_Name in both tables, then you might add a 3rd table that is a unique list of companies (their name, address, etc). Then, when you want to slice by company you would slice on this 3rd table. That slicer will then filter both related tables (without having to have the tables related to each other).
You can read more about data modeling in Power BI, and how to design lookup tables, here: https://powerpivotpro.com/2016/02/data-modeling-power-pivot-power-bi/
I have two tables from Azure SQL in PowerBI, using direct query:
EMP(empID PK)
contactInfo(contactID PK, empID FK, contactDetail)
which have an obvious one-to-many relationship from EMP.empID to contactInfo.empID. The foreign key constraint is successfully enforced.
However I can only create a many-to-one relationship (contactInfo.empID to EMP.empID) in PowerBI. If I ever try the opposite, PowerBI always automatically converts the relationship to many-to-one (by swapping the from and to column), which prevents me from creating visuals. Does PowerBI think the two are equivalent?
Update:
What I'm doing is to just create a table in PowerBI showing the join results of these two tables. The foreign key constraint is contactInfo.empID REFERENCES EMP.empID, which is many-to-one. That should not be a problem, I guess, since I can directly query the join using SQL.
Please also suggest if I should create the foreign key in the opposite direction.
More info on failure to create visual
The exact error message is:
Can't display the data because Power BI can't determine
the relationship between two or more fields.
Version: 2.43.4647.541 (PBIDesktop)
To reproduce the error:
DB schema is as follows:
What I want is a table in PowerBI showing contact and sales info of am employee, that is, joining all the four tables. The error will occur when VALUES of the table visual contains "empName, contactDetail, contactType, productName", however, error will NOT occur if I only include "empName, contactDetail, contactType" or "empName, productName". At first I thought the problem may lie in the relationship between contactInfo and emp, but it now seems to be more complicated. I guess it may be caused by multiple one-to-many relationships?
Expanding my comments to make an answer:
Root of the Problem
In your data model, a single employee can have multiple contacts and multiple sales. But, there's no way for Power BI to know which contactDetail corresponds to which productName, or vice versa (which it needs to know to display them together in a table).
Deeper Explanation
Let's say you have 1 emp row, that joins to 10 rows in the sales table, and 13 rows in the contactInfo table. In SQL, if you start from the emp row and outer join to the other 2 tables, you'll get back (1*10)*(1*13) rows (130 rows in total). Each row in the contactInfo table is repeated for each row in the sales table.
That repetition can be a problem if you do something like sum the sales and don't realize a single sales record is repeated 13 times but might be fine otherwise (e.g. if you just want a list of sales and all associated contacts).
Power BI vs. SQL
Power BI works slightly differently. Power BI is designed primarily to aggregate numbers, and then break them down by different attributes. E.g. sales by product. Sales by contact. Sales by day. In order to do this, Power BI needs to know 100% how to divide numbers up between the attributes on your table.
At this point, I'll note that your database diagram doesn't include any obvious metrics that you'd use Power BI to aggregate. However, Power BI doesn't know that. It behaves the same whether you have metrics to aggregate or not. (And failing all else, Power BI can always count your rows to make a metric.)
Let's say that you have a metric on your sales table called Amt Sold. If you bring in the empName, productName, and Amt Sold columns, Power BI will know exactly how to divide up Amt Sold between empName and ProductName. There's no problem.
Now add in contactDetail. Using your database diagram, Power BI has no way of knowing how an Amt Sold metric in the sales table relates to a given contactDetail. It might know that $100 belongs to empID 27. And that empID 27 corresponds to 3 records in the contactInfo table. But it has no way of knowing how to divide up the $100 between those 3 contacts.
In SQL, what you'd get is 3 contacts, each showing the $100 amount sold. But in Power BI, that would imply $300 was sold, which isn't the case. Even equally dividing the $100 up would be misleading. What if the $100 belonged entirely to 1 contact? So instead, Power BI shows the error you're seeing.
My Recommendations
If you can, I recommend changing your data model before your bring it in. Power BI works best with a single fact table, which would contain your metrics (like amount sold). You then join this fact table to as many lookup tables as you like (e.g. customer, product, etc.), directly. This allows you to slice & dice your metrics with any combination of attributes from any of the lookup tables. I would recommend checking out the star schema data model and the concept of lookup tables: powerpivotpro.com/2016/02/data-modeling-power-pivot-power-bi
At the very least, you would want to flatten your tables (i.e. merge the contactInfo and sales tables into a single table before importing them into your data model.
It may be that Power BI isn't the best tool for what you're trying to accomplish. If all you want is a table showing all sales & contact info for an employee, without any associated metrics, a regular reporting tool + SQL query might be a better way to go.
Side Note: You can't reverse a many:one relationship to get past this error. The emp table contains one row per empID. Both the contactInfo and sales tables contain multiple rows with the same empID. This means the emp table is necessarily the "one" side of the relationship to both those tables. You can't arbitrarily change that.