I have created a relationship between multiple data sources and am trying to leverage information across them. I ideally want to show the cost associated with different banking transactions by company. These would come from the deals table and the cost table - the two at either end of the relationship.
I tried deleting blanks, removing excess values etc. I am hoping the relationship can be fixed.
ENTITY_INFO filters PAYMENTS but not Cost.
To calculate the total cost of all payments for a company, add a measure like
TotalPaymentCost = sumx(PAYMENTS,RELATED(Cost[Cost]))
. Most commonly this measure would be added to the PAYMENTS table, but it doesn't actually matter what table you put measures on.
Related
I would like two measures that SUM the Sales[Value] for all the Sales[ID] that have a specific StatusID in SalesStatus.
One that can filter on Sales[Date], and one that can filter on SalesStatus[statusDate]
Diagram
Regards,
Anders
In this scenario I would consider modifying your model to have only two tables by combining what appears to be two FACT tables (sales, sales status). Depending on what your data consists of I would either UNION the two tables after joining and then treat the date in your Sales table as another status date (i.e. shipped complete or sale finished, whatever that date represents) OR I would join the two tables and have two relationships to the date table.
This will create a duplicated data issue as you will ideally result in having the value column in your final fact table. If you go with the union option, you can force the user to select a single sales status effectively removing the sales duplication. If you end up with two connections to the date table, you can use the USERELATIONSHIP() function to write the two different sales measures, and the one that uses the date from the Sales table will need some clever tricks to ensure the data does measure does not duplicate. I would try to UNION the tables though.
For more details, I would research what's referred to as SEMI-ADDITIVE fact tables in datawarehousing. There is a great article from SQL BI on the subject. I have tried setting up models like you diagrammed and even if I could get them to work through intense DAX measures, they would produce unexpected results and have poor performance. I find the Semi additive fact table pattern to be a much cleaner solution once you get passed the data duplication that results.
Example:
I'm struggling triying to make a star schema from a set of tables with different origins, two SQL databases, Excel files and CSV reports, it's a bit of a puzzle.
The initial tables that they provide me are set like this:
The important points of this set of tables are:
In Products table IdProduct is not unique, because one product can be make with one type of machine in factory A, and another type of machine in factory B, so it's one row for every Factory/Machine/IdProduct combination.
The OrderItems table has mixed rows with materials and products, so you have all the products in the Order and all the materials used in each product of the same Order.
The cost of the material changes daily and is updated in the system from where I get the OrderItems table.
The delivery cost is different for each order.
The packaging and fix costs are updated once a week.
The product price changes from order to order (it is set taking into account the client, day and size of the order).
I got to this model dividing the OrderItems in products and costs (materials), and joining with them the fixed costs and the packaging costs, I haven't joined the delivery costs, but i end up with two fact tables and a snowflake schema:
I am thinking in Region, Factory, Machine, Date, Product, and a compound of cost concepts (materials, fixed costs, etc.) as dimensions, and the total amounts and quantities as facts. This to compare the total sales to the total costs around the different dimensions.
I just wanna know if this is the correct path or there is a better way, I tried to search more about the subject but the case is too specific so I get nothing.
Thanks in advance for your answers.
Choosing a star schema over a snowflake schema ?
The Star schema is in a more de-normalized form and can be better for performance. Along the same records the Star schema uses less foreign keys so the query execution time is limited. In almost all cases the data retrieval speed of a Star schema has the Snowflake beat.
But you can split your work as data marts :
A data mart is a simple form of data warehouse focused on a single
subject or line of business.
A data mart can contain star schemas and other tables for more than one warehouse pack. For example, a single data mart might contain data for the your reporting needs related to costs
I have a data model in Power BI that, among other things, has the following tables
Employees (Dimension; employee ID/name)
Jobs (Dimension; contains details about the job, including job ID)
Employee history- Contains a record for each day an associate was in a job(snapshot table);
Job Budget History- Contains a record for each day a job was budgeted(snapshot table)
Calendar Table
The table is modeled like so (simplified version):
In Power BI, I am trying to make a simplified table view that contains measures based on both the budget history as well as the employee history for the most recent day in the dataset (simple count rows/distinct count of calendar table)
However, attempting to do so gives me the results below if I try to put both measures on the table. Basically it appears to be doing a cross join between each table and matching associates with jobs they don't have (this happens when the budget is added).
Of course, if I just do one of the singular measures everything works perfectly. I am fairly certain it is because there is no real connection between the 'employee' and the 'budget history' in this relationship, so it is just joining everything on the date without any context.
I have tried several things such as making inactive relationships with userelationship(), using visual level filters etc. but I'm not sure what the best option in this situation would be. (I am trying to avoid bidirectional relationships if possible)
Ideally this information should show on this date that Joe was present as President, Sally was present as an operator, and the Manager position had nobody, but all three were budgeted.
Any advice is appreciated. I have attached a simplified mockup pbix file for reference.
PBIX File
This is a complicated problem for many reasons. I was able to produce this report:
by removing field "Name" from the table and replacing it a measure:
Employee Name =
CALCULATE(
SELECTEDVALUE(Employees[Name]),
CROSSFILTER(Employees[Employee_ID], Employee_History[Employee_ID], BOTH)
)
It looks exactly like the report you want, but if you have additional requirements, you'll need to make sure that such approach works for you.
If this is acceptable, a brief explanation:
the root cause of the issue is missing Employee-Budget relationship. When you put Name in the table as a filter, it doesn't propagate to the budget table and causes a cartesian product.
Removing Name from the table eliminates the need for the filter propagation, but then you won't see employee names. I solved this by pulling employee names with the measure, where required propagation is forced by CROSSFILTER function (essentially, it's like a temporary bi-directional relation only when you need it, so it does not negatively affect the rest of the model).
I have two tables from Azure SQL in PowerBI, using direct query:
EMP(empID PK)
contactInfo(contactID PK, empID FK, contactDetail)
which have an obvious one-to-many relationship from EMP.empID to contactInfo.empID. The foreign key constraint is successfully enforced.
However I can only create a many-to-one relationship (contactInfo.empID to EMP.empID) in PowerBI. If I ever try the opposite, PowerBI always automatically converts the relationship to many-to-one (by swapping the from and to column), which prevents me from creating visuals. Does PowerBI think the two are equivalent?
Update:
What I'm doing is to just create a table in PowerBI showing the join results of these two tables. The foreign key constraint is contactInfo.empID REFERENCES EMP.empID, which is many-to-one. That should not be a problem, I guess, since I can directly query the join using SQL.
Please also suggest if I should create the foreign key in the opposite direction.
More info on failure to create visual
The exact error message is:
Can't display the data because Power BI can't determine
the relationship between two or more fields.
Version: 2.43.4647.541 (PBIDesktop)
To reproduce the error:
DB schema is as follows:
What I want is a table in PowerBI showing contact and sales info of am employee, that is, joining all the four tables. The error will occur when VALUES of the table visual contains "empName, contactDetail, contactType, productName", however, error will NOT occur if I only include "empName, contactDetail, contactType" or "empName, productName". At first I thought the problem may lie in the relationship between contactInfo and emp, but it now seems to be more complicated. I guess it may be caused by multiple one-to-many relationships?
Expanding my comments to make an answer:
Root of the Problem
In your data model, a single employee can have multiple contacts and multiple sales. But, there's no way for Power BI to know which contactDetail corresponds to which productName, or vice versa (which it needs to know to display them together in a table).
Deeper Explanation
Let's say you have 1 emp row, that joins to 10 rows in the sales table, and 13 rows in the contactInfo table. In SQL, if you start from the emp row and outer join to the other 2 tables, you'll get back (1*10)*(1*13) rows (130 rows in total). Each row in the contactInfo table is repeated for each row in the sales table.
That repetition can be a problem if you do something like sum the sales and don't realize a single sales record is repeated 13 times but might be fine otherwise (e.g. if you just want a list of sales and all associated contacts).
Power BI vs. SQL
Power BI works slightly differently. Power BI is designed primarily to aggregate numbers, and then break them down by different attributes. E.g. sales by product. Sales by contact. Sales by day. In order to do this, Power BI needs to know 100% how to divide numbers up between the attributes on your table.
At this point, I'll note that your database diagram doesn't include any obvious metrics that you'd use Power BI to aggregate. However, Power BI doesn't know that. It behaves the same whether you have metrics to aggregate or not. (And failing all else, Power BI can always count your rows to make a metric.)
Let's say that you have a metric on your sales table called Amt Sold. If you bring in the empName, productName, and Amt Sold columns, Power BI will know exactly how to divide up Amt Sold between empName and ProductName. There's no problem.
Now add in contactDetail. Using your database diagram, Power BI has no way of knowing how an Amt Sold metric in the sales table relates to a given contactDetail. It might know that $100 belongs to empID 27. And that empID 27 corresponds to 3 records in the contactInfo table. But it has no way of knowing how to divide up the $100 between those 3 contacts.
In SQL, what you'd get is 3 contacts, each showing the $100 amount sold. But in Power BI, that would imply $300 was sold, which isn't the case. Even equally dividing the $100 up would be misleading. What if the $100 belonged entirely to 1 contact? So instead, Power BI shows the error you're seeing.
My Recommendations
If you can, I recommend changing your data model before your bring it in. Power BI works best with a single fact table, which would contain your metrics (like amount sold). You then join this fact table to as many lookup tables as you like (e.g. customer, product, etc.), directly. This allows you to slice & dice your metrics with any combination of attributes from any of the lookup tables. I would recommend checking out the star schema data model and the concept of lookup tables: powerpivotpro.com/2016/02/data-modeling-power-pivot-power-bi
At the very least, you would want to flatten your tables (i.e. merge the contactInfo and sales tables into a single table before importing them into your data model.
It may be that Power BI isn't the best tool for what you're trying to accomplish. If all you want is a table showing all sales & contact info for an employee, without any associated metrics, a regular reporting tool + SQL query might be a better way to go.
Side Note: You can't reverse a many:one relationship to get past this error. The emp table contains one row per empID. Both the contactInfo and sales tables contain multiple rows with the same empID. This means the emp table is necessarily the "one" side of the relationship to both those tables. You can't arbitrarily change that.
I need to build a formula in DAX that will show the number of customers who purchased again after their initial purchase, broken out by Product. I have a standard data warehouse with a Order Placed fact table, a Customer dimension table, and a Product dimension table. I am able to find the number of customers who purchased each product on their initial purchase by using this formula:
First Purchase Customer Count = CALCULATE(DISTINCTCOUNT(Demand[CustomerKey]),Demand[Customer Order Sequence Number] = 1)
My visual is a table with the Product as the only attribute, so this first formula is being calculated per Product. The next formula needs to count the number of customers who bought a second time, regardless of what Product they purchased that second time, but it should only include customers who purchased the current Product the first time. I have successfully created this formula to do it, but it usually errors out on the 1M row limit unless I filter the Product by subcategory.
Rebuyer Count = COUNTROWS(INTERSECT(SUMMARIZE(FILTER(Demand,Demand[Customer Order Sequence Number] = 1),[CustomerKey]),SUMMARIZE(CALCULATETABLE(FILTER(Demand,Demand[Customer Order Sequence Number] = 2),all('Product')),[CustomerKey])))
How can I improve this formula so it will run without bombing?
Old question, but on the off-chance that this is still an issue for you, have you reviewed the New and Returning Customers pattern on DaxPatterns.com? http://www.daxpatterns.com/new-and-returning-customers/
Your returning customers solution is creating a table of first-time customers, and then a table of second-time customers, and then joining those two tables together to count the number of first-time customers who are also a second-time customer.
The DaxPatterns solution is slightly different. Rather than calculating 2 potentially large tables of customers on the fly and then joining them, it is counting existing customers who have made a prior purchase.
You would need to adapt their solution to meet your requirements. They have a concept of Absolute Returning Customers (meaning the customer's first-purchase was for any product, and their second purchase was for the specific product you care about). You want the reverse of that (the customer's first-purchase was for a specific product, and their second purchase can be anything).
Overall, however, you'll have less bombing out if you start from a single list of first-time customers and filter out the customers you don't want (i.e. who never bought again) vs. compiling 2 separate lists of customers and intersecting them.
(It's the difference between letting everyone into the theatre, then throwing out everyone without a ticket, vs. checking tickets at the door and only letting people with a ticket in).