I'm trying to create an "opportunity" calculation model, but the results I'm getting are not accurate. As a sample problem, take McDonald's and Burger King, which sell food in various regions. In some regions both BK and McD are present and sell similar food types. In others both are present but can't fulfill the same order type. For example, in zip code 10049 both BK and McD exist, but McD sells burgers and BK sells salads, so BK covers the area but can't fulfill what the potential customer wants.
In the example spreadsheet, there are three tabs, first with McD sales, second with BK sales, and the third reconciles the naming convention between McD's and BK's orders.
I started by connecting the tables with relationships. I figured I need to connect McD to BK by Zip, then McD to Crossreference. Due to many-to-many relationship limitations in PBI, I'm forced to create lookup tables with unique values for zips, and order names. Looks a bit messy, but does the job. The problem is that I can look up the zip code connections, but not the sales for the potential orders.
Relationships:
This is a clear example of how things don't work. Zip code 10048 sums up McD's sales and displays it for each BK order type. The expected output would be $5 for angus and $3 for onion soup, $8 in total.
If I try to connect crossreference BK order names to BK orders, then I get an ambiguity error.
Spreadsheet data file:
https://docs.google.com/spreadsheets/d/1WM9OD7voApax7uNJ6_bJk75zfj9FQN9tSf2jU1hXl7c/edit?usp=sharing
Excel and Power BI files: https://drive.google.com/drive/folders/1hOdP5ZglHcqo_dk2GMlXr6Xmc5Ywm6nj?usp=sharing
I don't think you'll be able to do this exactly how you want to. It would basically be equivalent to creating multiple relationships between two tables, and Power BI doesn't let you do that.
There are some workarounds though. For example, you can pull over the McD[order] values into your BK table using a calculated column:
McD Order = MAXX(FILTER(Crossreference, Crossreference[BK] = BK[order]), Crossreference[McD])
This will allow you to pull across the price from the McD table using a lookup or similar max type function:
Price = LOOKUPVALUE(McD[price], McD[order], BK[McD Order], McD[Zip], BK[Zip])
or
Price = MAXX(FILTER(McD, McD[Zip] = BK[Zip] && McD[order] = BK[McD Order]), McD[price])
Once you do that, you can work entirely on the BK table.
Note that some price rows will have nulls since there was no corresponding McD row with matching zip and order. (I suppose you could take the median price of those orders over the zip codes they do exist in and plug that in those cases...) If the price is uniform across zip codes, then this can be made simpler.
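The median fallback mentioned above could be sketched as another calculated column on the BK table. This is only a sketch: the column name Price Filled is hypothetical, and it assumes BK[Price] and BK[McD Order] already exist as calculated columns from the previous steps.

```dax
-- Calculated column on BK (hypothetical name "Price Filled").
-- Assumes BK[Price] and BK[McD Order] were created as described above.
Price Filled =
VAR ExactPrice = BK[Price]
-- Median McD price for the same order across all zip codes where it is sold
VAR MedianForOrder =
    MEDIANX (
        FILTER ( McD, McD[order] = BK[McD Order] ),
        McD[price]
    )
RETURN
    -- Use the exact zip + order match when there is one, otherwise the median
    IF ( ISBLANK ( ExactPrice ), MedianForOrder, ExactPrice )
```

If an order has no McD rows at all, the result stays blank.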
Also, notice that when you put the price into a table and use an implicit measure on it, it will likely default to a sum and you'll get $6 for 10048 and angus since you have duplicate rows. Switching to max will get you the $3 if that's what you prefer.
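Rather than relying on the implicit aggregation, an explicit measure makes the choice of max visible in the model. A minimal sketch, assuming the BK[Price] calculated column from above (the measure name is made up):

```dax
-- Explicit measure: returns the highest matched McD price in the current filter context,
-- so duplicate BK rows for the same zip and order are not added together.
Max McD Price = MAX ( BK[Price] )
```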
This type of merging is also possible to do in the query editor, but I couldn't play with that on the pbix you included since I don't have access to the data source on your C: drive.
I have a dashboard that looks like this in Power BI:
Almost every slicer and visual on this page comes from the "visits" dataset. That dataset is 70,000+ rows, where each row stands for a single patient visit to the hospital. There are a few relevant columns for this question such as: "subject mrn", and "protocol_no" (the study they're on).
Well, elsewhere, I have a dataset called "Data Managers" that is the staff assigned to each protocol. It has relevant columns of "subject mrn", "protocol no" and "staff name"
I have these datasets in my power bi like this:
When I connect these datasets by dragging in between them, Power BI warns me that they are many-to-many relationships. This makes sense because:
Let's say staff member John is the data manager for patient 12345 on study x.
Patient 12345 might also be on study y, and on that study, staff member Steve is the data manager.
Also, other patients on study x might have other data managers.
So I need to connect these datasets in a way that when I filter to John, I only get rows back from the visits data where John is the data manager for that combination of subject AND study.
When I just drag across from protocol no and subject mrn like this
it doesn't work. The dropdown appears to filter to lists for John, but when I check for accuracy, it's people with totally different data managers. Any idea what to do?
If anybody is looking at this, there's probably a way to do it by managing multiple relationships, but I ended up creating a concatenated column in each dataset of "Protocol, subject_mrn" and then linking those new columns together.
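For anyone wanting to try the same workaround, the concatenated key could look roughly like the calculated columns below. The column name ProtocolSubjectKey and the "|" separator are assumptions for illustration; the table and column names are taken from the description in the question.

```dax
-- Calculated column on the visits table (hypothetical column name)
ProtocolSubjectKey = visits[protocol_no] & "|" & visits[subject mrn]

-- Calculated column on the Data Managers table, built the same way
ProtocolSubjectKey = 'Data Managers'[protocol no] & "|" & 'Data Managers'[subject mrn]
```

The relationship is then created between the two ProtocolSubjectKey columns. It only becomes one-to-many if Data Managers has a single row per protocol and subject combination.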
Basically I have a big Excel dataset about 500x500 with economic information from various companies.
Each row represents a different company, and the columns hold the information. A little of it is qualitative, like ZIP code, type, etc., but most of it is quantitative. For each quantitative item we have data for 5 years, so there is one column per item per year, e.g. Debt 2019, Debt 2020, etc.
So my question is: what is the best way to preprocess this data to work with it, and how should it be done? Should the preprocessing be done in Excel, with a script in Power BI, with Query, with SQL, ...?
The objective is to have a report which will be accessible online and the user will type the name of the company and it will show them the dashboard with the information of that company (only that one), so they can navigate through it.
The structure and which information is shown is the same for each company, the only thing that changes is the "numbers" that each company has. So it has to be possible to change which data is showing (to use the one from the company they want).
It also needs to be able to show comparative data to other groups of companies or to the total.
I want to have it right from the start, because then changes get complicated.
I thought about doing sort of a "relational model" with one "table" for each company with the quantitative data (with one row for each year and each column one info point) and then a general table with the qualitative data (with rows being each company and the columns the info). But I am not really sure.
I know how to use Power BI but I have never used it for something this big. I would like to know which way to organize this data is better and some info on how to do it.
Many thanks to everyone.
I thought about doing sort of a "relational model" with one "table" for each company with the quantitative data (with one row for each year and each column one info point) and then a general table with the qualitative data (with rows being each company and the columns the info).
Yes, do that.
General guidance is to use Power Query in Power BI to transform the data into a star schema model. See "Understand star schema and the importance for Power BI".
So that would typically result in one table that has the "dimension" data for each company, a date table, and a "fact" table at the grain of (CompanyId,Date) with the quantitative data.
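With that shape, the comparison against the total could be handled with measures rather than extra tables. A rough sketch, assuming a dimension table named Company and a fact table named Financials with a Debt column (all of these names are hypothetical):

```dax
-- Debt for whatever company (or group of companies) is currently selected
Total Debt = SUM ( Financials[Debt] )

-- Same total, ignoring any filter on the Company dimension
Total Debt All Companies = CALCULATE ( [Total Debt], ALL ( Company ) )

-- Selected company's share of the overall total; DIVIDE returns blank on a zero denominator
Debt Share of Total = DIVIDE ( [Total Debt], [Total Debt All Companies] )
```

A slicer on the company name then changes [Total Debt], while [Total Debt All Companies] stays fixed for comparison.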
I believe I have a data modelling problem since I can't connect my fact table with my ABC dimension. I have tried several things but nothing seems to work, so I am looking for improvement tips, if you can help. I am also new to Power BI. I have two Excel files that I use to create the model.
This is a material flow. I would like to show it in a flow map. So, I have a material that is sold from company located in country A to company located in country B. This material can have a certain ABC classification for company A but a different ABC classification for company B.
Ex) Material 001 is A for China and B for Germany. For example, my Source can be Austria that sends this material to China and Germany.
I have the following tables:
Dimension Source and Destination (ID, Source Country, Destination Country, City, Postal Code, etc)
Dimension Material (ID, description)
Dimension ABC & Status (material ID, Source and Destination ID, ABC classification, Status)
Fact (Order Qty, Price in Eur, etc)
Therefore, one Material can have many ABC classifications.
A single source can send a material to several destinations. A destination can also have multiple sources.
I also thought of just connecting ABC directly to the fact table by using a composite key built from material and source/destination. However, I think it is conceptually wrong. In this case, I am able to see the data in a table, so it works. But as soon as I plot the graph, everything freezes. I think a dimension is not supposed to be so big...
Figuring out the right relationship should be the right approach, I guess...
I would remove or deactivate the other relationships for Dim ABC, then connect it directly to the fact as you described.
The relationship definition needs to be 1 to many (Dim ABC to Fact).
For your graph challenge, you'll need to give more info.
I'm trying to create a column that has a total of values between 3 columns from 3 tables. How would I go about doing this?
The 2 tables are tables of values that share an id, and they are both linked to a table of account by Id. The goal is to add up 3 columns, and place it into a table grouped by the Id.
I've attempted summing them, trying to use the USERELATIONSHIP function, and creating a relationship between them. It seems to give very inaccurate results, as if it's summing all of the totals together, and passing them to each Id. That, or it won't let me use the column, as if it never existed.
EDIT: General Idea of what I'm trying to do (Lines should be pointing to Account's Id column, but I messed up the lines)
EDIT 2: I also forgot to illustrate or mention. There are more columns with information in each table that can't be summarized for each account preventing me from just merging the table together.
Make sure your data model looks like this (change names as you please, but the structure must be the same):
In dimensional modeling, your table "Account" is a Dimension, and both fee tables are Fact tables. The operation of combining data from multiple fact tables that share the same dimension is called "drill-across", and it's a standard functionality of Power BI.
To combine fees from these tables, you just need to use measures, not columns. This article explains the difference:
Calculated Columns and Measures in DAX
First, create 2 measures for the fees:
Fee1 Amount = SUM(Fee_1[Amount])
Fee2 Amount = SUM(Fee_2[Amount])
Then, create a third measure to combine them:
Total Fee Amount = [Fee1 Amount] + [Fee2 Amount]
Create a matrix visual, and place Account_ID from the Account table on the rows. Then drop all these measures into the matrix values area, like this:
Result:
Of course, you don't have to have all these measures in the matrix. I just showed them for your convenience, to validate the results. If you remove them, the last measure still works:
I have two tables from Azure SQL in Power BI, using DirectQuery:
EMP(empID PK)
contactInfo(contactID PK, empID FK, contactDetail)
which have an obvious one-to-many relationship from EMP.empID to contactInfo.empID. The foreign key constraint is successfully enforced.
However, I can only create a many-to-one relationship (contactInfo.empID to EMP.empID) in Power BI. If I try the opposite, Power BI always automatically converts the relationship to many-to-one (by swapping the from and to columns), which prevents me from creating visuals. Does Power BI think the two are equivalent?
Update:
What I'm doing is just creating a table in Power BI showing the join results of these two tables. The foreign key constraint is contactInfo.empID REFERENCES EMP.empID, which is many-to-one. That should not be a problem, I guess, since I can directly query the join using SQL.
Please also suggest if I should create the foreign key in the opposite direction.
More info on failure to create visual
The exact error message is:
Can't display the data because Power BI can't determine
the relationship between two or more fields.
Version: 2.43.4647.541 (PBIDesktop)
To reproduce the error:
DB schema is as follows:
What I want is a table in Power BI showing contact and sales info of an employee, that is, joining all four tables. The error occurs when the VALUES of the table visual contain "empName, contactDetail, contactType, productName"; however, the error does NOT occur if I only include "empName, contactDetail, contactType" or "empName, productName". At first I thought the problem might lie in the relationship between contactInfo and emp, but it now seems to be more complicated. I guess it may be caused by multiple one-to-many relationships?
Expanding my comments to make an answer:
Root of the Problem
In your data model, a single employee can have multiple contacts and multiple sales. But, there's no way for Power BI to know which contactDetail corresponds to which productName, or vice versa (which it needs to know to display them together in a table).
Deeper Explanation
Let's say you have 1 emp row, that joins to 10 rows in the sales table, and 13 rows in the contactInfo table. In SQL, if you start from the emp row and outer join to the other 2 tables, you'll get back (1*10)*(1*13) rows (130 rows in total). Each row in the contactInfo table is repeated for each row in the sales table.
That repetition can be a problem if you do something like sum the sales and don't realize a single sales record is repeated 13 times but might be fine otherwise (e.g. if you just want a list of sales and all associated contacts).
Power BI vs. SQL
Power BI works slightly differently. Power BI is designed primarily to aggregate numbers, and then break them down by different attributes. E.g. sales by product. Sales by contact. Sales by day. In order to do this, Power BI needs to know 100% how to divide numbers up between the attributes on your table.
At this point, I'll note that your database diagram doesn't include any obvious metrics that you'd use Power BI to aggregate. However, Power BI doesn't know that. It behaves the same whether you have metrics to aggregate or not. (And failing all else, Power BI can always count your rows to make a metric.)
Let's say that you have a metric on your sales table called Amt Sold. If you bring in the empName, productName, and Amt Sold columns, Power BI will know exactly how to divide up Amt Sold between empName and ProductName. There's no problem.
Now add in contactDetail. Using your database diagram, Power BI has no way of knowing how an Amt Sold metric in the sales table relates to a given contactDetail. It might know that $100 belongs to empID 27. And that empID 27 corresponds to 3 records in the contactInfo table. But it has no way of knowing how to divide up the $100 between those 3 contacts.
In SQL, what you'd get is 3 contacts, each showing the $100 amount sold. But in Power BI, that would imply $300 was sold, which isn't the case. Even equally dividing the $100 up would be misleading. What if the $100 belonged entirely to 1 contact? So instead, Power BI shows the error you're seeing.
My Recommendations
If you can, I recommend changing your data model before you bring it in. Power BI works best with a single fact table, which would contain your metrics (like amount sold). You then join this fact table directly to as many lookup tables as you like (e.g. customer, product, etc.). This allows you to slice & dice your metrics with any combination of attributes from any of the lookup tables. I would recommend checking out the star schema data model and the concept of lookup tables: powerpivotpro.com/2016/02/data-modeling-power-pivot-power-bi
At the very least, you would want to flatten your tables (i.e. merge the contactInfo and sales tables into a single table) before importing them into your data model.
It may be that Power BI isn't the best tool for what you're trying to accomplish. If all you want is a table showing all sales & contact info for an employee, without any associated metrics, a regular reporting tool + SQL query might be a better way to go.
Side Note: You can't reverse a many:one relationship to get past this error. The emp table contains one row per empID. Both the contactInfo and sales tables contain multiple rows with the same empID. This means the emp table is necessarily the "one" side of the relationship to both those tables. You can't arbitrarily change that.