I believe I have a data modelling problem, since I can't connect my fact table to my ABC dimension. I have tried several things but nothing seems to work, so I am looking for some tips for improvement. I am also new to Power BI. I use two Excel files to create the model.
The data describes a material flow, which I would like to show in a flow map. A material is sold from a company located in country A to a company located in country B. The material can have one ABC classification for company A but a different ABC classification for company B.
For example, material 001 is classified A for China and B for Germany, and my source could be Austria, which sends this material to both China and Germany.
I have the following tables:
Dimension Source and Destination (ID, Source Country, Destination Country, City, Postal Code, etc.)
Dimension Material (ID, description)
Dimension ABC & Status (material ID, Source and Destination ID, ABC classification, Status)
Fact (Order Qty, Price in EUR, etc.)
Therefore, one material can have many ABC classifications.
A single source can send a material to several destinations. A destination can also have multiple sources.
I also thought of connecting the ABC dimension directly to the fact table using a composite key made of material and source/destination. However, I think that is conceptually wrong. With that approach I can see the data in a table visual, so it works, but as soon as I plot the graph, everything freezes. I think a dimension is not supposed to be this big...
I guess figuring out the right relationship is the correct approach...
I would remove or deactivate the other relationships on Dim ABC, then connect it directly to the fact table using the composite key, as you described.
The relationship needs to be one-to-many (Dim ABC to Fact).
For the graph performance problem, you'll need to provide more information.
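A minimal sketch, in plain Python rather than Power BI, of the composite-key approach, assuming Dim ABC holds exactly one row per material and source/destination pair. The names `material_id`, `location_id`, `composite_key` and the sample rows are hypothetical. The uniqueness check mirrors the requirement that the Dim ABC side of a one-to-many relationship never repeats a key.

```python
from collections import Counter

# Hypothetical rows: one ABC row per (material, source/destination) pair.
dim_abc = [
    {"material_id": "001", "location_id": "CN", "abc": "A", "status": "active"},
    {"material_id": "001", "location_id": "DE", "abc": "B", "status": "active"},
]
fact = [
    {"material_id": "001", "location_id": "CN", "order_qty": 10},
    {"material_id": "001", "location_id": "DE", "order_qty": 4},
    {"material_id": "001", "location_id": "CN", "order_qty": 7},
]


def composite_key(row):
    """Join material and source/destination into one key; the separator avoids collisions."""
    return f"{row['material_id']}|{row['location_id']}"


def check_one_side_unique(dim_rows):
    """The 'one' side of a one-to-many relationship must not repeat a key."""
    counts = Counter(composite_key(r) for r in dim_rows)
    duplicates = [k for k, n in counts.items() if n > 1]
    if duplicates:
        raise ValueError(f"duplicate keys on the one side: {duplicates}")
    return {composite_key(r): r for r in dim_rows}


def attach_abc(fact_rows, dim_rows):
    """Resolve each fact row to the single ABC row sharing its key (None if missing)."""
    lookup = check_one_side_unique(dim_rows)
    result = []
    for f in fact_rows:
        dim = lookup.get(composite_key(f))
        result.append({**f, "abc": dim["abc"] if dim else None})
    return result


for row in attach_abc(fact, dim_abc):
    print(row["location_id"], row["abc"], row["order_qty"])
```

The `|` separator keeps keys such as "00" + "1CN" and "001" + "CN" from colliding; any character that never appears in the IDs would do.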
I have a dashboard in Power BI that looks like this:
Almost every slicer and visual on this page comes from the "visits" dataset. That dataset has 70,000+ rows, each representing a single patient visit to the hospital. The columns relevant to this question are "subject mrn" and "protocol_no" (the study the patient is on).
Separately, I have a dataset called "Data Managers" that lists the staff assigned to each protocol. Its relevant columns are "subject mrn", "protocol no" and "staff name".
The two datasets are set up in my Power BI model like this:
When I connect these datasets by dragging between them, Power BI warns me that the relationship is many-to-many. This makes sense because:
Let's say staff member John is the data manager for patient 12345 on study x.
Patient 12345 might also be on study y, where staff member Steve is the data manager.
Also, other patients on study x might have other data managers.
So I need to connect these datasets in a way that when I filter to John, I only get rows back from the visits data where John is the data manager for that combination of subject AND study.
When I just drag relationships across from protocol no and subject mrn, like this:
it doesn't work. The dropdown appears to filter to John's list, but when I check for accuracy, it shows people with totally different data managers. Any idea what to do?
If anybody is looking at this: there's probably a way to do it by managing multiple relationships, but I ended up creating a concatenated "Protocol, subject_mrn" column in each dataset and then linking those new columns together.
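A plain-Python sketch (not Power BI's filter engine) of one plausible reason matching each column on its own misbehaves, using hypothetical rows modeled on the John/Steve example: filtering subjects and protocols independently lets through any cross-combination of John's subjects and John's studies, while the concatenated key only matches the exact pairs. The function names are hypothetical.

```python
# Hypothetical rows modeled on the John/Steve example.
data_managers = [
    {"protocol": "x", "mrn": "12345", "staff": "John"},
    {"protocol": "y", "mrn": "12345", "staff": "Steve"},
    {"protocol": "y", "mrn": "67890", "staff": "John"},
]
visits = [
    {"visit_id": 1, "protocol": "x", "mrn": "12345"},
    {"visit_id": 2, "protocol": "y", "mrn": "12345"},
    {"visit_id": 3, "protocol": "y", "mrn": "67890"},
]


def visits_by_separate_columns(staff):
    """Match subject and protocol independently: any mix of the staff member's values passes."""
    mine = [d for d in data_managers if d["staff"] == staff]
    mrns = {d["mrn"] for d in mine}
    protocols = {d["protocol"] for d in mine}
    return [v["visit_id"] for v in visits if v["mrn"] in mrns and v["protocol"] in protocols]


def pair_key(row):
    """Same idea as the concatenated "Protocol, subject_mrn" column."""
    return f"{row['protocol']}, {row['mrn']}"


def visits_by_combined_key(staff):
    """Match only the exact (protocol, subject) pairs the staff member manages."""
    keys = {pair_key(d) for d in data_managers if d["staff"] == staff}
    return [v["visit_id"] for v in visits if pair_key(v) in keys]


print(visits_by_separate_columns("John"))  # [1, 2, 3]: visit 2 is Steve's
print(visits_by_combined_key("John"))      # [1, 3]
```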
I'm just starting with Power BI and need some help creating the data model for this practice case.
I receive the data as a table with the hourly power consumption of different houses. The first column holds the date and hour, and the remaining columns hold the power consumption of each house (one column per house).
My idea is to create a dashboard with a map that acts as a filter to select the house, and a date slicer to filter a time period.
I don't know the best way to approach this: should I create a subtable for each house, or something else?
I know I also have to create another table with the information for each house, including its coordinates, and relate it to the consumption table. However, the consumption table has the house names in the column headers, each followed by the text "Power Consumption".
I would appreciate your help. I don't think this is a complicated problem, so any clues or examples on how to approach it would be welcome.
Thanks!
Regards
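One common way to handle a wide table like this is to unpivot it into one row per hour and house, so a single House column can relate to the house table with the coordinates. In Power Query that is typically done with Unpivot Other Columns followed by a Replace Values step on the header text. The plain-Python sketch below shows the same reshaping; the house names, values, and the `SUFFIX` header text are hypothetical.

```python
# Assumed header suffix; adjust it to match the real column headers.
SUFFIX = " Power Consumption"

# Hypothetical wide rows: a datetime column plus one column per house.
wide = [
    {"DateTime": "2023-01-01 00:00", "House1 Power Consumption": 1.2, "House2 Power Consumption": 0.8},
    {"DateTime": "2023-01-01 01:00", "House1 Power Consumption": 1.0, "House2 Power Consumption": 0.5},
]


def unpivot(rows, id_column="DateTime"):
    """Turn one column per house into one row per (DateTime, House) pair."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue
            # Strip the suffix so the house name can relate to a house table.
            house = column[: -len(SUFFIX)] if column.endswith(SUFFIX) else column
            long_rows.append({"DateTime": row[id_column], "House": house, "Consumption": value})
    return long_rows


for r in unpivot(wide):
    print(r)
```

With the data in this long shape, the map filters on House and the date slicer filters on DateTime, so no per-house subtables are needed.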
I am building a dashboard to compare baseline demographics to those of a population of interest.
I am visualizing demographic composition with a 100% bar chart, with bars over time and the Legend set to a field parameter containing the four field options. I then set up a similar visual next to the first: a one-bar chart across the top showing the overall baseline group composition, as a comparison to the population of interest over time.
Data on the population of interest is at the individual person level:
Name | Race | Ethnicity | Gender | Degree
xxxx | Asian | non-Hisp | Female | Doctorate
xxxx | White | Hispanic | Female | Masters
xxxx | White | non-Hisp | Male | Doctorate
Baseline data was obtained from a self-serve table builder, so it is not at the record level but at the "smallest subgroup" level: a hierarchical table ultimately broken into individual subgroups of unique Race-Ethnicity-Gender-Degree combinations, each with its count (e.g., one record could be for 5000 Asian non-Hispanic women with a doctorate):
Race | Ethnicity | Gender | Degree | Count
Asian | non-Hisp | Female | Doctorate | 5000
White | Hispanic | Female | Masters | 3000
White | non-Hisp | Male | Doctorate | 200
I grouped the first table by those four fields to align the data for comparison. My initial plan was to combine the tables so that the same field parameter could be used for both visuals, with each visual being a filtered subset of the combined table (a simple "Source" field accomplishes this).
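A plain-Python sketch of that grouping and combining step, using the sample rows from the two tables above: record-level rows are counted per Race-Ethnicity-Gender-Degree combination, tagged with a Source value, and appended to the baseline counts. The `composition` helper computes the per-category shares a 100% bar would display. The Source labels "Target" and "Baseline" and all function names are hypothetical.

```python
from collections import Counter

FIELDS = ("Race", "Ethnicity", "Gender", "Degree")

# Sample rows from the two tables above.
target_people = [
    {"Name": "xxxx", "Race": "Asian", "Ethnicity": "non-Hisp", "Gender": "Female", "Degree": "Doctorate"},
    {"Name": "xxxx", "Race": "White", "Ethnicity": "Hispanic", "Gender": "Female", "Degree": "Masters"},
    {"Name": "xxxx", "Race": "White", "Ethnicity": "non-Hisp", "Gender": "Male", "Degree": "Doctorate"},
]
baseline = [
    {"Race": "Asian", "Ethnicity": "non-Hisp", "Gender": "Female", "Degree": "Doctorate", "Count": 5000},
    {"Race": "White", "Ethnicity": "Hispanic", "Gender": "Female", "Degree": "Masters", "Count": 3000},
    {"Race": "White", "Ethnicity": "non-Hisp", "Gender": "Male", "Degree": "Doctorate", "Count": 200},
]


def group_counts(rows):
    """Collapse record-level rows to one row per field combination with a Count."""
    counts = Counter(tuple(r[f] for f in FIELDS) for r in rows)
    return [dict(zip(FIELDS, combo), Count=n) for combo, n in counts.items()]


def combine(target_rows, baseline_rows):
    """Append both tables, tagging each row with a Source value."""
    target = [{**r, "Source": "Target"} for r in group_counts(target_rows)]
    base = [{**r, "Source": "Baseline"} for r in baseline_rows]
    return target + base


def composition(rows, field, source):
    """Share of each category of `field` within one Source, as a 100% bar would show."""
    totals = Counter()
    for r in rows:
        if r["Source"] == source:
            totals[r[field]] += r["Count"]
    grand_total = sum(totals.values())
    return {category: n / grand_total for category, n in totals.items()}


combined = combine(target_people, baseline)
print(composition(combined, "Gender", "Baseline"))
```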
The wrinkle is that I can't use the grouped target data in the bar chart over time, because that chart relates the data to other record-level dimension tables in the model (including the date variable used to split the bars over time). This means the target data's bar chart and the baseline bar chart must rely on two different parameters, one for each table's fields.
Since the fields are identically named, I was hoping for a way to make one field parameter simply take the current value of the other, so the two stay in sync and users only need to change one parameter.
Is this possible? Or can this use case be approached and implemented differently? The only workable solutions I can come up with are clunky and undesirable (e.g., requiring users to manage two identical slicers). Other avenues I explored were not fruitful: creating a measure equal to the current value of the parameter, and modifying the parameter table to reference the second table, either in an additional field or as two-element lists in the <Param> Field column. I was also hopeful about Sync Slicers, but that just applies the same parameter to multiple slicers and does not address this use case.
Basically, I have a large Excel dataset, about 500x500, with economic information on various companies.
Each row represents a different company, and the columns hold the information. A little of it is qualitative, like ZIP code, type, etc., but most of it is quantitative. For each quantitative item we have data for 5 years, so there is one column per item per year, e.g. Debt 2019, Debt 2020, etc.
So my question is: what is the best way to preprocess this data, and how should it be done? Should the preprocessing be done in Excel, with a script in Power BI, in Power Query, in SQL, ...?
The goal is a report that is accessible online, where the user types the name of a company and the dashboard shows the information for that company only, so they can navigate through it.
The structure and the information shown are the same for every company; the only thing that changes is each company's "numbers". So it has to be possible to switch which data is shown to the company the user wants.
It also needs to show comparisons with other groups of companies or with the total.
I want to get it right from the start, because changes get complicated later.
I thought about doing sort of a "relational model" with one "table" for each company with the quantitative data (with one row for each year and each column one info point) and then a general table with the qualitative data (with rows being each company and the columns the info). But I am not really sure.
I know how to use Power BI, but I have never used it for something this big. I would like to know the best way to organize this data and get some pointers on how to do it.
Many thanks to everyone.
I thought about doing sort of a "relational model" with one "table" for each company with the quantitative data (with one row for each year and each column one info point) and then a general table with the qualitative data (with rows being each company and the columns the info).
Yes, do that.
General guidance is to use Power Query in Power BI to transform the data into a star schema model. See "Understand star schema and the importance for Power BI".
So that would typically result in one table that has the "dimension" data for each company, a date table, and a "fact" table at the grain of (CompanyId, Date) with the quantitative data.
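A minimal plain-Python sketch of that reshaping, assuming headers of the form "<Metric> <Year>" such as "Debt 2019": qualitative columns go into a company dimension, and each "<Metric> <Year>" column becomes a value in a fact row keyed by (CompanyId, Year). Year stands in for the date key here; the company rows, the `QUALITATIVE` column list, and the generated CompanyId values are hypothetical. In Power Query a similar result could come from unpivoting, splitting the header into metric and year, and pivoting the metric back out.

```python
import re

# Hypothetical wide rows in the "Debt 2019" header style from the question.
companies = [
    {"Company": "Acme", "ZIP": "10001", "Type": "Retail",
     "Debt 2019": 100, "Debt 2020": 120, "Revenue 2019": 500, "Revenue 2020": 550},
    {"Company": "Globex", "ZIP": "20002", "Type": "Industry",
     "Debt 2019": 80, "Debt 2020": 60, "Revenue 2019": 300, "Revenue 2020": 320},
]
QUALITATIVE = ("ZIP", "Type")  # assumed list of descriptive columns
YEAR_COLUMN = re.compile(r"^(?P<metric>.+) (?P<year>\d{4})$")


def to_star(rows):
    """Split wide rows into a company dimension and a (CompanyId, Year) fact table."""
    dim_company, fact = [], {}
    for company_id, row in enumerate(rows, start=1):
        dim_company.append(
            {"CompanyId": company_id, "Company": row["Company"], **{c: row[c] for c in QUALITATIVE}}
        )
        for column, value in row.items():
            match = YEAR_COLUMN.match(column)
            if match is None:
                continue  # not a "<Metric> <Year>" column
            year = int(match["year"])
            fact_row = fact.setdefault((company_id, year), {"CompanyId": company_id, "Year": year})
            fact_row[match["metric"]] = value
    return dim_company, list(fact.values())


dim, facts = to_star(companies)
for f in facts:
    print(f)
```

The report then filters the dimension by company name, and comparisons with groups or the total become measures over the same fact table rather than extra tables per company.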
I'm trying to create an "opportunity" calculation model, but the results I'm getting are not accurate. A sample problem: McDonald's and Burger King sell food in various regions. Some regions have both BK and McD in the area, and they sell similar food types; other regions also have both, but they can't fulfill the same order types. For example, in zip code 10049 both BK and McD exist, but McD sells burgers and BK sells salads, so BK covers the area but can't fulfill what potential customers want.
The example spreadsheet has three tabs: the first with McD sales, the second with BK sales, and the third reconciling the naming conventions of McD's and BK's orders.
I started by connecting the tables with relationships. I figured I needed to connect McD to BK by Zip, then McD to Crossreference. Due to many-to-many relationship limitations in PBI, I had to create lookup tables with unique values for zips and order names. It looks a bit messy, but it does the job. The problem is that I can look up the zip code connections, but not the sales for the potential orders.
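The lookup tables described here amount to one row per distinct value collected from both sales tables, so each sales table can relate to the lookup one-to-many instead of many-to-many. A plain-Python sketch with hypothetical rows and a hypothetical `distinct_lookup` helper:

```python
# Hypothetical sales rows; only the Zip column matters here.
mcd = [
    {"Zip": "10048", "order": "big burger", "price": 5.0},
    {"Zip": "10049", "order": "big burger", "price": 4.0},
]
bk = [
    {"Zip": "10048", "order": "angus"},
    {"Zip": "10049", "order": "salad"},
    {"Zip": "10050", "order": "angus"},
]


def distinct_lookup(*tables, column):
    """Build a one-column lookup table holding each value of `column` exactly once."""
    values = sorted({row[column] for table in tables for row in table})
    return [{column: v} for v in values]


zips = distinct_lookup(mcd, bk, column="Zip")
print(zips)
```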
Relationships:
Here is a clear example of what doesn't work: for zip code 10048, the model sums McD's sales and displays that total for each BK order type. The expected output would be $5 for angus and $3 for onion soup, $8 in total.
If I try to connect crossreference BK order names to BK orders, then I get an ambiguity error.
Spreadsheet data file:
https://docs.google.com/spreadsheets/d/1WM9OD7voApax7uNJ6_bJk75zfj9FQN9tSf2jU1hXl7c/edit?usp=sharing
Excel and Power BI files: https://drive.google.com/drive/folders/1hOdP5ZglHcqo_dk2GMlXr6Xmc5Ywm6nj?usp=sharing
I don't think you'll be able to do this exactly the way you want. It would basically be equivalent to creating multiple active relationships between two tables, which Power BI doesn't let you do.
There are some workarounds, though. For example, you can pull the McD[order] values over into your BK table using a calculated column:
McD Order = MAXX(FILTER(Crossreference, Crossreference[BK] = BK[order]), Crossreference[McD])
This allows you to pull the price across from the McD table using LOOKUPVALUE or a similar MAXX-based expression:
Price = LOOKUPVALUE(McD[price], McD[order], BK[McD Order], McD[Zip], BK[Zip])
or
Price = MAXX(FILTER(McD, McD[Zip] = BK[Zip] && McD[order] = BK[McD Order]), McD[price])
Once you do that, you can work entirely on the BK table.
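A plain-Python sketch of the two calculated columns above, handy for checking the logic outside Power BI: first map each BK order to its McD counterpart through the cross-reference, then look up the McD price for the same zip and order, with `None` standing in for a blank. The cross-reference names and prices are hypothetical, loosely modeled on the zip 10048 example.

```python
# Hypothetical tables; order names and prices are illustrative only.
crossreference = [
    {"BK": "angus", "McD": "big burger"},
    {"BK": "onion soup", "McD": "soup"},
    {"BK": "salad", "McD": "side salad"},
]
mcd = [
    {"Zip": "10048", "order": "big burger", "price": 5.0},
    {"Zip": "10048", "order": "soup", "price": 3.0},
]
bk = [
    {"Zip": "10048", "order": "angus"},
    {"Zip": "10048", "order": "onion soup"},
    {"Zip": "10049", "order": "salad"},
]


def mcd_order(bk_order):
    """Like the McD Order column: largest McD name mapped to this BK order."""
    matches = [c["McD"] for c in crossreference if c["BK"] == bk_order]
    return max(matches) if matches else None


def price(bk_row):
    """Like the MAXX(FILTER(McD, ...)) Price column: match on zip and mapped order."""
    target = mcd_order(bk_row["order"])
    prices = [m["price"] for m in mcd if m["Zip"] == bk_row["Zip"] and m["order"] == target]
    return max(prices) if prices else None


enriched = [{**row, "McD Order": mcd_order(row["order"]), "Price": price(row)} for row in bk]
for row in enriched:
    print(row)
```

The salad row gets `None` because there is no McD row for its zip and mapped order, which corresponds to the blank price rows mentioned below.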
Note that some price rows will have nulls since there was no corresponding McD row with matching zip and order. (I suppose you could take the median price of those orders over the zip codes they do exist in and plug that in those cases...) If the price is uniform across zip codes, then this can be made simpler.
Also, notice that when you put the price into a table and use an implicit measure on it, it will likely default to a sum and you'll get $6 for 10048 and angus since you have duplicate rows. Switching to max will get you the $3 if that's what you prefer.
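The duplicate-row effect is easy to reproduce: with two identical rows that each carry the same looked-up price, summing double-counts while taking the max returns the single price. Hypothetical values in plain Python, with a hypothetical `aggregate` helper standing in for the implicit measure:

```python
# Hypothetical duplicate rows: the same lookup result repeated on two BK rows.
rows = [
    {"Zip": "Z1", "order": "item", "Price": 3.0},
    {"Zip": "Z1", "order": "item", "Price": 3.0},
]


def aggregate(rows, zip_code, order, how):
    """Aggregate non-blank prices for one zip/order pair, like an implicit measure would."""
    values = [r["Price"] for r in rows
              if r["Zip"] == zip_code and r["order"] == order and r["Price"] is not None]
    return how(values) if values else None


print(aggregate(rows, "Z1", "item", sum))  # 6.0
print(aggregate(rows, "Z1", "item", max))  # 3.0
```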
This type of merging is also possible to do in the query editor, but I couldn't play with that on the pbix you included since I don't have access to the data source on your C: drive.