PowerBI - Comparing two similar sets of data (Many to Many)

I am trying to compare two sets of data that are very similar. I have built a bridge table and used a many-to-many (M:M) relationship in PowerBI, but I am still not getting the result I want.
Here is an example of the data:
Dataset 1
Name | Service | Usage
A | 1 | 10
A | 2 | 20
B | 1 | 10
B | 2 | 10
C | 1 | 20
C | 2 | 10
Dataset 2
Name | Service | Usage
A | 1 | 40
A | 2 | 20
B | 1 | 40
B | 2 | 10
C | 1 | 40
C | 2 | 10
Desired output
Name | Service | Usage 1 | Usage 2
A | 1 | 10 | 40
A | 2 | 20 | 20
B | 1 | 10 | 40
B | 2 | 10 | 10
C | 1 | 20 | 40
C | 2 | 10 | 10
Is this possible in PowerBI?

One approach (as suggested in the comments) is to split the distinct Name and Service values into separate dimension tables in the query editor:
Names:
= Table.FromList(List.Distinct(List.Combine({#"Dataset 1"[Name], #"Dataset 2"[Name]})),Splitter.SplitByNothing(),{"Name"})
Services:
= Table.FromList(List.Distinct(List.Combine({#"Dataset 1"[Service], #"Dataset 2"[Service]})),Splitter.SplitByNothing(),{"Service"})
Create the DAX measures you want:
Usage 1 = SUM ( 'Dataset 1'[Usage] )
Usage 2 = SUM ( 'Dataset 2'[Usage] )
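Now create relationships between the fact tables (Dataset 1, Dataset 2) and the dimension tables (Names, Services), relating each fact table to Names on Name and to Services on Service.
Then lay out the visual as required, using Name and Service from the dimension tables together with the two measures.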

Another approach may be to combine your dataset fact tables into one table, with an added "Dataset" column.
Create your combined table in the query editor.
Combined Data:
= Table.Combine({Table.AddColumn(#"Dataset 1", "Dataset", each "Dataset 1", type text), Table.AddColumn(#"Dataset 2", "Dataset", each "Dataset 2", type text)})
Now use this table as your single source, either with a crosstab (matrix) visual that puts the Dataset column on the columns,
or by adding a separate measure for each dataset:
Usage 1 = CALCULATE ( SUM('Combined Data'[Usage]), 'Combined Data'[Dataset] = "Dataset 1" )
Usage 2 = CALCULATE ( SUM('Combined Data'[Usage]), 'Combined Data'[Dataset] = "Dataset 2" )
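
To sanity-check the expected numbers outside Power BI, here is a minimal Python sketch of the same idea: key both datasets on (Name, Service) and put the two usage totals side by side. The names `dataset_1`, `dataset_2`, and `compare_usage` are made up for this illustration and are not part of the original answer.

```python
from collections import defaultdict

# Sample rows from the question: (Name, Service, Usage)
dataset_1 = [("A", 1, 10), ("A", 2, 20), ("B", 1, 10),
             ("B", 2, 10), ("C", 1, 20), ("C", 2, 10)]
dataset_2 = [("A", 1, 40), ("A", 2, 20), ("B", 1, 40),
             ("B", 2, 10), ("C", 1, 40), ("C", 2, 10)]


def compare_usage(first, second):
    """Sum usage per (name, service) in each dataset and line the totals up."""
    totals = defaultdict(lambda: [0, 0])
    for position, rows in enumerate((first, second)):
        for name, service, usage in rows:
            totals[(name, service)][position] += usage
    return [(name, service, usage_1, usage_2)
            for (name, service), (usage_1, usage_2) in sorted(totals.items())]


comparison = compare_usage(dataset_1, dataset_2)
for row in comparison:
    print(row)
```

A key that appears in only one dataset still gets a row, with 0 for the missing side, which is roughly what the shared dimension tables achieve in the model.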

Related

What is the more efficient way to find row-wise sum in power bi DAX?

I have a sample table with the following values:
location | col1 | col2 | col3 | col4
------------------------------------------
usa1 | 1 | 1 | 1 | 1
usa2 | 1 | 0 | 1 | 1
The values are boolean: true (1) and false (0).
I would like to add a new column that shows the sum per row. The article at https://www.c-sharpcorner.com/article/sum-multiple-column-using-dax-in-power-bi/
suggests the following approach:
Measure Total = SUM(table[col1]) + SUM(table[col2]) + ... + SUM(table[colx])
This gives the expected sum for the four columns I tried, but it becomes unwieldy with 20 columns. Is there a more efficient way to write the DAX?
Expected output:
location | col1 | col2 | col3 | col4 | sum
------------------------------------------
usa1 | 1 | 1 | 1 | 1 | 4
usa2 | 1 | 0 | 1 | 1 | 3
I would use the unpivot feature of Power Query to go from wide to long: select location and unpivot all the other columns.
The sum by location is then immediate in any visual, with no need for DAX.
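
As a rough illustration of what unpivoting does to the sample table, here is a small Python sketch (not Power Query code). The helper `unpivot` and the output column names `attribute` and `value` are assumptions chosen for this example.

```python
from collections import defaultdict

# The wide sample table from the question, one dict per row.
wide_rows = [
    {"location": "usa1", "col1": 1, "col2": 1, "col3": 1, "col4": 1},
    {"location": "usa2", "col1": 1, "col2": 0, "col3": 1, "col4": 1},
]


def unpivot(rows, id_column):
    """Turn every non-id column into its own (id, attribute, value) row."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column != id_column:
                long_rows.append(
                    {id_column: row[id_column], "attribute": column, "value": value}
                )
    return long_rows


long_rows = unpivot(wide_rows, "location")

# With the long shape, the row-wise sum is a plain group-by on location.
sum_by_location = defaultdict(int)
for row in long_rows:
    sum_by_location[row["location"]] += row["value"]

print(dict(sum_by_location))
```

The point of the long shape is that adding a 21st column needs no formula change: it simply shows up as more rows.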
One way I do it is with a calculated column that adds the columns directly:
Sum = table[col1] + table[col2] + table[col3] + ...
I am not sure whether there is a better way for your situation, since I have only ever had five columns at most to add.

Pandas group_by string column which values contained in a separate list

I have a hierarchy-based event stream in which each parent node (shown here as level0, level1) has multiple children (level00, level01, level02) and sub-children (level000, level001, level002). "level" is just a placeholder; each hierarchy level has its own unique name. The only rule is that a parent node's hierarchy string is always included in the child's hierarchy string. Assume that this event stream has 300k or more entries.
| index | hierarchystr |
| ----- | --------------------- |
| 0 | level0level00level000|
| 1 | level0level01 |
| 2 | level0level02level021|
| 3 | level0level02level021|
| 4 | level0level02level020|
| 5 | level0level02level021|
| 6 | level1level02level021|
| 7 | level1level02level021|
| 8 | level1level02level021|
| 9 | level2level02level021|
Now I want to do an inclusive group-by against a separate list: a row should be counted for a list entry if that entry is included in the row's hierarchystr value. Expected output (beware: hstrs is in a different order every time!):
#hstrs = ["level0", "level1", "level0level01", "level0level02", "level0level02level021"]
|index| 0 | Count |
|-----|---------------------|-------|
|0 |level0 | 6 |
|1 |level1 | 3 |
|2 |level0level01 | 1 |
|3 |level0level02 | 4 |
|4 |level0level02level021| 3 |
I tried the following solutions, but all are slow as hell:
#V1
for hstr in hstrs:
    s = df[df.hierarchystr.str.contains(hstr)]
    s2 = s.count()
    s3 = s2.values[0]
    if s3 > 200:
        beforeset.append(hstr)
#V2
for hstr in hstrs:
    s = df.hierarchystr.str.extract('(' + hstr + ')', expand=True)
    s2 = s.count()
    s3 = s2.values[0]
    if s3 > 200:
        beforeset.append(hstr)
#V3 - fastest, but also slow and not satisfying
containing = [item for hierarchystr in df.hierarchystr for item in hstrs if item in hierarchystr]
containing = Counter(containing)
df1 = pd.DataFrame([containing]).T
nodeNamesWithOver200 = df1[df1 > 200].dropna().index.values
I also tried versions that handle all entries at once with a combined pattern and extract, but then the size per group changes on every run, because the list hstrs is in a different order on every run.
df.hierarchystr.str.extract("(" + "|".join(hstrs) + ")")  # likewise with str.extractall
Is there a regex or method that does this task in one step, so that it runs in reasonable time on huge data frames and does not depend on the order of the hstrs list?
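You can try counting prefix matches with str.startswith, which reproduces the expected counts for the sample data: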
count = [df['hierarchystr'].str.startswith(hstr).sum() for hstr in hstrs]
out = pd.DataFrame({'hstr': hstrs, 'count': count})
print(out)
# Output
                    hstr  count
0                 level0      6
1                 level1      3
2          level0level01      1
3          level0level02      4
4  level0level02level021      3
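
If the column has many repeated values, one possible speed-up is to count each distinct string once and test prefixes only against the distinct values. The sketch below uses only the standard library; `count_by_prefix` is a hypothetical helper for this illustration, and whether it actually helps depends on how many distinct hierarchy strings the real data contains.

```python
from collections import Counter

# The ten sample rows from the question.
hierarchy_strings = [
    "level0level00level000",
    "level0level01",
    "level0level02level021",
    "level0level02level021",
    "level0level02level020",
    "level0level02level021",
    "level1level02level021",
    "level1level02level021",
    "level1level02level021",
    "level2level02level021",
]
hstrs = ["level0", "level1", "level0level01", "level0level02", "level0level02level021"]


def count_by_prefix(values, prefixes):
    """Count rows per prefix, testing each distinct value only once per prefix."""
    unique_counts = Counter(values)
    return {
        prefix: sum(n for value, n in unique_counts.items() if value.startswith(prefix))
        for prefix in prefixes
    }


prefix_counts = count_by_prefix(hierarchy_strings, hstrs)
print(prefix_counts)
```

Each prefix is counted independently, so the result does not depend on the order of `hstrs`. A substring test would behave differently here: "level1level02level021" contains "level0" without starting with it.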

How to sum up a measure based on different levels in Power BI using DAX

I have the following table structure:
| Name 1 | Name 2 | Month | Count 1 | Count 2 | SumCount |
|--------|--------|--------|---------|---------|----------|
| A | E | 1 | 5 | 3 | 8 |
| A | E | 2 | 1 | 6 | 7 |
| A | F | 3 | 3 | 4 | 7 |
Now I calculate the following with a DAX measure:
Measure = (sum(Table[Count 2]) - sum(Table[Count 1])) * sum(Table[SumCount])
I can't use a calculated column, because then the formula is applied before a level (e.g. Month) is excluded. Added to my table structure, with Month excluded, it looks like this:
| Name 1 | Name 2 | Count 1 | Count 2 | SumCount | Measure |
|--------|--------|---------|---------|----------|---------|
| A | E | 6 | 9 | 15 | 45 |
| A | F | 3 | 4 | 7 | 7 |
I added a table visual that displays only Name 1. In that case the measure of course sums Count 1, Count 2 and SumCount first and then applies the formula, which leads to the following result:
| Name 1 | Measure |
|--------|---------|
| A | 88 |
But the desired result should be
| Name 1 | Measure |
|--------|---------|
| A | 52 |
which is the sum of Measure over the rows above (45 + 7).
So basically I want the calculation Measure = (sum(Table[Count 2]) - sum(Table[Count 1])) * sum(Table[SumCount]) at my base level, but when drilling up and grouping those names it should only sum the base-level results.
An iterator function like SUMX is what you want here since you are trying to sum row by row rather than aggregating first.
Measure = SUMX ( Table, ( Table[Count 2] - Table[Count 1] ) * Table[SumCount] )
Any filters you have will be applied to the first argument, Table, and it will only sum the corresponding rows.
Edit:
If I'm understanding correctly, you want to aggregate over Month before taking the difference and product. One way to do this is by summarizing (excluding Month) before using SUMX like this:
Measure =
VAR Summary =
    SUMMARIZE (
        Table,
        Table[Name 1],
        Table[Name 2],
        "Count1Sum", SUM ( Table[Count 1] ),
        "Count2Sum", SUM ( Table[Count 2] ),
        "SumCountSum", SUM ( Table[SumCount] )
    )
RETURN
    SUMX ( Summary, ( [Count2Sum] - [Count1Sum] ) * [SumCountSum] )
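
The difference between the three ways of aggregating can be checked by hand with the sample rows. The Python sketch below is only an illustration of the arithmetic, not DAX; the variable names are invented for this example.

```python
from collections import defaultdict

# Sample rows: (Name 1, Name 2, Month, Count 1, Count 2, SumCount)
rows = [
    ("A", "E", 1, 5, 3, 8),
    ("A", "E", 2, 1, 6, 7),
    ("A", "F", 3, 3, 4, 7),
]

# 1) Row by row, like SUMX over the raw table.
row_level = sum((c2 - c1) * s for _, _, _, c1, c2, s in rows)

# 2) Aggregate everything first, like the original measure at the Name 1 level.
total_c1 = sum(r[3] for r in rows)
total_c2 = sum(r[4] for r in rows)
total_s = sum(r[5] for r in rows)
aggregate_first = (total_c2 - total_c1) * total_s

# 3) Sum per (Name 1, Name 2) first, then apply the formula and add up,
#    like SUMX over the summarized table.
groups = defaultdict(lambda: [0, 0, 0])
for name_1, name_2, _, c1, c2, s in rows:
    group = groups[(name_1, name_2)]
    group[0] += c1
    group[1] += c2
    group[2] += s
grouped_then_summed = sum((c2 - c1) * s for c1, c2, s in groups.values())

print(row_level, aggregate_first, grouped_then_summed)
```

Only the third variant gives the desired 52; the second reproduces the 88 from the question.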
You don't want a measure in this case; you need a new calculated column instead.
The same formula as a new column will give your desired result:
Column = ('Table (2)'[Count1]-'Table (2)'[Count2])*'Table (2)'[SumCount]

Power BI - max date or nearest date of selected date

Objective: sum the values on the date nearest to a given date, per ID
Here is my data
Table: MyData
+-------------------------------+
| ID TradeDate Value |
+-------------------------------+
| 1 2018/11/30 105 |
| 1 2018/11/8 101 |
| 1 2018/10/31 100 |
| 1 2018/9/30 100 |
| 2 2018/11/30 200 |
| 2 2018/10/31 201 |
| 2 2018/9/30 205 |
| 3 2018/11/30 300 |
| 3 2018/10/31 305 |
| 3 2018/9/30 301 |
+-------------------------------+
I created a table named 'DateList' and use a slicer on it to select a specific date.
I want to achieve the following result:
as of *11/9/2018*
+-----------------------------------+
| ID TradeDate Value |
+-----------------------------------+
| 1 2018/11/8 101 |
| 2 2018/10/31 201 |
| 3 2018/10/31 305 |
+-----------------------------------+
| Total 607 |
+-----------------------------------+
Currently, I use the following steps to try to achieve the above result.
First, I find the nearest date in table 'MyData' with a new measure:
MyMaxDate = CALCULATE(MAX(MyData[TradeDate]),Filter(MyData, MyData[TradeDate] <= FIRSTDATE(DateList[Date]) ))
Second, I create a new measure "MySum" to sum up the values where [TradeDate] equals "MyMaxDate":
MySum = CALCULATE(SUM(MyData[Value]),Filter(MyData, MyData[TradeDate] = [MyMaxDate]))
Third, I create a matrix to show the result.
Unfortunately, the result, 1313, is different from my goal of 607.
So, how can I fix my DAX formula to get the right result?
Many thanks
You can calculate the closest date by taking a min over the difference in dates and then taking the minimal date with that minimal difference.
MyDate =
VAR SlicerDate = MIN(DateList[Date])
VAR MinDiff =
    MINX(
        FILTER(ALL(MyData),
            MyData[ID] IN VALUES(MyData[ID])
        ),
        ABS(SlicerDate - MyData[TradeDate]))
RETURN
    MINX(
        FILTER(ALL(MyData),
            MyData[ID] IN VALUES(MyData[ID])
                && ABS(SlicerDate - MyData[TradeDate]) = MinDiff
        ),
        MyData[TradeDate])
From there you can create the summing measure fairly easily:
MySum = CALCULATE(SUM(MyData[Value]), FILTER(MyData, MyData[TradeDate] = [MyDate]))
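
The nearest-date logic can be checked against the sample data with a short Python sketch. It mirrors the approach above (smallest absolute difference per ID, earliest date on a tie) and assumes at most one row per ID and trade date; `nearest_rows` is a hypothetical helper, not part of the answer.

```python
from datetime import date

# Sample rows from the question: (ID, TradeDate, Value)
my_data = [
    (1, date(2018, 11, 30), 105), (1, date(2018, 11, 8), 101),
    (1, date(2018, 10, 31), 100), (1, date(2018, 9, 30), 100),
    (2, date(2018, 11, 30), 200), (2, date(2018, 10, 31), 201),
    (2, date(2018, 9, 30), 205),
    (3, date(2018, 11, 30), 300), (3, date(2018, 10, 31), 305),
    (3, date(2018, 9, 30), 301),
]


def nearest_rows(rows, selected):
    """Per ID, keep the row whose trade date is closest to the selected date."""
    best = {}
    for row_id, trade_date, value in rows:
        # Sort key: absolute distance in days first, then the earlier date on a tie.
        key = (abs((trade_date - selected).days), trade_date)
        if row_id not in best or key < best[row_id][0]:
            best[row_id] = (key, trade_date, value)
    return {row_id: (trade_date, value)
            for row_id, (_, trade_date, value) in best.items()}


nearest = nearest_rows(my_data, date(2018, 11, 9))
total = sum(value for _, value in nearest.values())
print(nearest, total)
```

Because this uses the absolute difference, a date after the selected date can win if it is closer. To allow only dates on or before the selected date, as the question's MyMaxDate measure does, rows with a later trade date would have to be skipped first.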

How to create a table (or list) with the order codes of orders with both products

I have a Transactions table with the following structure:
ID | Product | OrderCode | Value
1 | 8 | ABC | 100
2 | 5 | ABC | 150
3 | 4 | ABC | 80
4 | 5 | XPT | 100
5 | 6 | XPT | 100
6 | 8 | XPT | 100
7 | 5 | XYZ | 100
8 | 8 | UYI | 90
How do I create a table (or list) with the order codes of orders with both products 5 and 8?
In the example above it should be the orders ABC and XPT.
There are probably many ways to do this, but here's a fairly general solution that I came up with:
FilteredList =
VAR ProductList = {5, 8}
VAR SummaryTable =
    SUMMARIZE(Transactions,
        Transactions[OrderCode],
        "Test",
            COUNTROWS(INTERSECT(ProductList, VALUES(Transactions[Product])))
                = COUNTROWS(ProductList))
RETURN
    SELECTCOLUMNS(FILTER(SummaryTable, [Test]), "OrderCode", Transactions[OrderCode])
The key here is that if the set of products for a particular order code contains both 5 and 8, then the intersection of VALUES(Transactions[Product]) with the set {5, 8} is exactly that set and has a count of 2. If it doesn't contain both, the count will be 1 or 0 and the test fails.
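
The same set test can be written out in a few lines of Python to check the expected answer. This is a sketch of the idea only, with a made-up helper name `orders_with_all`.

```python
from collections import defaultdict

# Sample rows from the question: (ID, Product, OrderCode, Value)
transactions = [
    (1, 8, "ABC", 100), (2, 5, "ABC", 150), (3, 4, "ABC", 80),
    (4, 5, "XPT", 100), (5, 6, "XPT", 100), (6, 8, "XPT", 100),
    (7, 5, "XYZ", 100), (8, 8, "UYI", 90),
]


def orders_with_all(rows, required_products):
    """Return the order codes whose product set contains every required product."""
    products_by_order = defaultdict(set)
    for _id, product, order_code, _value in rows:
        products_by_order[order_code].add(product)
    required = set(required_products)
    return sorted(code for code, products in products_by_order.items()
                  if required <= products)


matching_orders = orders_with_all(transactions, [5, 8])
print(matching_orders)
```

The subset check `required <= products` plays the role of comparing the intersection count with the size of the product list, so it also works unchanged for three or more required products.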
Please elaborate on your question. From your post, I understand that you want to filter the list. For that, you can use the code below, which keeps the transactions whose product is 5 or 8:
List<Transactions> filteredTransactions = listTransactions.FindAll(x => x.Product == 5 || x.Product == 8);