I have a simple CSV data set such as this.
ID,MainCategory,SubCategory,Type,Value
1,E,E1,Demo,5
2,N,N3,Install,2
3,E,E1,Demo,4
4,E,E2,Install,7
5,D,D1,Install,3
6,S,S2,PM,4
7,N,N2,Install,7
8,N,N2,Demo,1
9,E,E2,Demo,2
10,D,D2,Install,6
11,D,D3,PM,4
12,S,S1,PM,8
13,N,N1,Install,5
14,S,S3,Install,8
15,S,S1,Demo,9
16,E,E3,Demo,5
17,N,N2,Install,3
18,E,E2,PM,6
19,D,D2,PM,6
20,N,N3,Demo,6
21,S,S2,Demo,7
22,E,E3,Install,2
23,S,S1,Install,4
24,S,S2,PM,8
25,D,D1,Install,5
In my Power BI Desktop, I'd like to load this into a table, and conditionally format the Value column based on whether the value in each row is greater than or less than the average for the currently selected data set.
For instance, the average of Value considering the entire table is 5.08, so if there are no filters applied (as in, all my slicers are set to select nothing), I'd like all rows whose Value is 6 or more to be background colored in one color, and the others in another color. For this, I created two measures like so:
AvgOfVal = DIVIDE( SUM(G2G[Value]), COUNTA(G2G[ID]) )
BGColor = IF(SUM(G2G[Value]) > [AvgOfVal], "Light Pink", "Light Blue")
Then I tried to apply the BGColor measure for conditionally formatting the background, but this doesn't work as expected, and instead produces the result below.
I realize that this is due to the fact that the measure is calculated per row, so when conditional formatting is applied, as seen in the AvgOfVal column in the table, it calculates the average per row instead of for the entire data set. How can I write a measure that takes into account the entire data set (respecting slicers), and apply the conditional formatting I need?
Please keep in mind that if a user were to select a slicer filter (say, MainCategory = D), then I want the conditional formatting to reflect this. So in this case, given that AvgOfVal = 4.80 for MainCategory = D entries, I'd like all rows whose Value >= 5 to be in one color, and others in another color.
I realize that this is due to the fact that the measure is calculated per row
Yes. The key is understanding how that happens. When the measure is calculated a "context transition" happens and the current row is added to the filter context.
So what you want is a calculation that removes the row filter that was added in the context transition. ALLSELECTED() does precisely that, e.g.
AvgOfVal = CALCULATE( AVERAGE('data'[Value]), ALLSELECTED() )
This removes the "innermost" filter, which in this case is the filter on the row, but keeps all other filters, i.e. filters added on the report, page, or visual, and filters coming from interactions with other visuals such as slicers.
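The intent can also be sketched outside DAX. The Python below is illustrative only: the function name colour_rows and the tuple layout are assumptions made for this sketch, not anything Power BI provides. It computes one average over whatever rows survive the slicer selection and compares every row against that single figure, which is roughly the behaviour the ALLSELECTED() version of the measure aims for.

```python
# Illustrative sketch only: plain Python standing in for the DAX evaluation.
# Rows are (ID, MainCategory, Value), copied from the question's CSV.
rows = [
    (1, "E", 5), (2, "N", 2), (3, "E", 4), (4, "E", 7), (5, "D", 3),
    (6, "S", 4), (7, "N", 7), (8, "N", 1), (9, "E", 2), (10, "D", 6),
    (11, "D", 4), (12, "S", 8), (13, "N", 5), (14, "S", 8), (15, "S", 9),
    (16, "E", 5), (17, "N", 3), (18, "E", 6), (19, "D", 6), (20, "N", 6),
    (21, "S", 7), (22, "E", 2), (23, "S", 4), (24, "S", 8), (25, "D", 5),
]


def colour_rows(rows, selected_category=None):
    """Return (average, {ID: colour}) for the currently selected rows.

    selected_category plays the role of a MainCategory slicer;
    None means nothing is selected, so every row is kept.
    """
    selected = [r for r in rows
                if selected_category is None or r[1] == selected_category]
    # One average for the whole selection, not one per row.
    average = sum(value for _, _, value in selected) / len(selected)
    colours = {
        row_id: ("Light Pink" if value > average else "Light Blue")
        for row_id, _, value in selected
    }
    return average, colours


average_all, colours_all = colour_rows(rows)
average_d, colours_d = colour_rows(rows, "D")
```

With no selection the average is 127 / 25 = 5.08, so rows with Value 6 or more get one colour; with MainCategory = D it is 24 / 5 = 4.80, so rows with Value 5 or more do.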
I have a query in Power BI that takes two parameters: Start Date and End Date.
Whenever I pass these dates, it returns a date table with a few columns derived from this range, such as Date, QuarterofYear, Year, MonthName, etc.
Can we create a mapping data flow in ADF that takes two parameters as input and returns a calculated table for the provided dates?
Is there any function that returns the range of dates?
For your request: "I want that I pass two date Start Date and End Date in ADF Mapping Data Flow , and Data flow will Create a column such as "Date" that contain that number of Date rows. Is there any function for this? Exam. Start Date=20-01-2019, End Date=20-01-2020 Then Date Column Values should be: 20-01-2019 21-01-2019 ......... ......... 20-02-2020". According to the Data Factory documentation and my experience, the answer is no: we can't achieve it in Data Flow.
There is a solution to this, but it is a bit tricky.
TL;DR
The general data flow looks like this:
We need a dummy source with exactly one row which contains whatever.
Then we derive a column where we use the mapLoop() expression to create an array of all the dates we want to get rows for.
Finally, we need to flatten the array column which will result in one row per array entry and thus one row per date.
Walkthrough
Source dummy
Each dataflow needs a source and we need exactly one row to make our dataflow work. To achieve this I've created a dataset called empty of type CSV in my data lake which has this content:
empty
""
This is our source definition:
And its result looks like this:
Derived column days
This is where the magic happens!
We create a new column dates which is an array of all the dates we want to have in our date table:
In this scenario we want a date table starting on 2019-01-01 and reaching one year into the future. The full expression looks like this:
mapLoop(
addDays(currentDate(), 365) - toDate('2019-01-01'),
addDays(toDate('2019-01-01'), #index)
)
This is what happens here:
the mapLoop() function builds an array of elements. You specify the number of elements you want and a lambda expression that calculates each element. For example, mapLoop(3, #index * 10) results in [10, 20, 30]
addDays(currentDate(), 365) - toDate('2019-01-01') is the number of days between our start (2019-01-01) and end date (1 year in the future from now) and thus the number of dates we want to have in our resulting array.
addDays(toDate('2019-01-01'), #index) calculates each array item by adding #index days to our start date. This is executed for the number of days we've calculated before, and #index is the array position, starting at 1. Thus, the first element of the array will be 2019-01-01 + 1, the second 2019-01-01 + 2, and so on.
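To make the arithmetic concrete, here is a minimal Python sketch of the same idea, assuming (as described above) that #index starts at 1. The function name build_dates and its start/end parameters are hypothetical stand-ins for data flow parameters; this is not ADF code.

```python
from datetime import date, timedelta


def build_dates(start, end):
    """Mimic mapLoop(end - start, addDays(start, #index)) with a 1-based index."""
    count = (end - start).days  # number of array elements to produce
    # index runs 1..count, so the first element is start + 1 day
    return [start + timedelta(days=index) for index in range(1, count + 1)]


dates = build_dates(date(2019, 1, 20), date(2020, 1, 20))
first_date, last_date = dates[0], dates[-1]
```

Because the index is 1-based, the array runs from the day after the start date through the end date. If the start date itself should be included, one option is to add one to the element count and use #index - 1 as the offset.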
Our stream now has these columns:
Flatten
Finally, you need a flatten transformation which will expand each item in your array to its dedicated row. We can also dismiss the useless empty column in this step:
And this finally results in what we wanted to achieve:
References
Data transformation expressions in mapping data flow
I have a table CarHistoryFact (CarHistoryFactId, CarId, CarHistoryFactTime, CarHistoryFactConditions) that tracks the historical status of cars. CarHistoryFactConditions is a 25-bit (in binary) int column that encodes the status of 25 various conditions the car may be in at a given point in time.
I have a dimension table CarConditions with a row for each of the conditions, and their base 10 bit value.
How can I implement a "relationship" between the fact and dimension, giving a list of all the conditions a given car is in?
I can come up with bit parsing code, but I'm not sure how to hook it up to the dimension table to get just the currently applicable conditions at a car-time.
Bitmask parsing in DAX is shown here:
https://radacad.com/quick-dax-convert-number-to-binary
You can create a CROSSJOIN table in which every fact row is paired with each of the 25 conditions, and then filter out the pairs that do not apply:
CarHistoryConditions =
var temp = CROSSJOIN(CarHistoryFact; CarConditions)
return FILTER(temp; MOD(TRUNC(CarHistoryFact[CarHistoryFactConditions] / CarConditions[bit]); 2) = 1)
note: I assumed the CarHistoryFactConditions and bit to be an integer, not a string of bits. For sure you can change that.
The result is a table with one row per applicable condition for each car. E.g. if car one has 2 conditions and car two has 5 conditions, you get 7 rows.
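The cross-join-and-filter logic can be checked with a small Python sketch. The sample rows and condition names below are hypothetical, and only 5 bits are used instead of 25 to keep it short. The test (mask // bit) % 2 == 1 mirrors MOD(TRUNC(mask / bit); 2) = 1.

```python
# Hypothetical fact rows: (CarHistoryFactId, car label, conditions bitmask)
car_history = [
    (1, "Car one", 0b00101),  # bits 1 and 4 set -> 2 conditions
    (2, "Car two", 0b11111),  # all five bits set -> 5 conditions
]

# Hypothetical dimension rows: (base-10 bit value, condition name)
car_conditions = [(1, "A"), (2, "B"), (4, "C"), (8, "D"), (16, "E")]


def applicable_conditions(facts, conditions):
    """Pair every fact with every condition, keeping pairs whose bit is set."""
    return [
        (fact_id, car, name)
        for fact_id, car, mask in facts      # cross join ...
        for bit, name in conditions
        if (mask // bit) % 2 == 1            # ... then filter on the bit
    ]


pairs = applicable_conditions(car_history, car_conditions)
car_one_conditions = [name for _, car, name in pairs if car == "Car one"]
```

Integer-dividing by the bit value shifts the wanted bit into the lowest position, and the remainder modulo 2 reads it out.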
We are trying to implement a dashboard that displays various tables, metrics and a map where the dataset is a list of customers. The primary filter condition is the disjunction of two numeric fields. We want the user to be able to select a threshold for [field 1] and a separate threshold for [field 2] and then impose the condition [field 1] >= <threshold> OR [field 2] >= <threshold>.
After that, we want to also allow various other interactive slicers so the user can restrict the data further, e.g. by country or account manager.
Power BI naturally imposes AND between all filters and doesn't have a neat way to specify OR. Can you suggest a way to define a calculation using the two numeric fields that is then applied as a filter within the same interactive dashboard screen? Alternatively, is there a way to first prompt the user for the two threshold values before the dashboard is displayed -- so when they click Submit on that parameter-setting screen they are then taken to the main dashboard screen with the disjunction already applied?
Added in response to a comment:
The data can be quite simple: no complexity there. The complexity is in getting the user interface to enable a disjunction.
Suppose the data was a list of customers with customer id, country, gender, total value of transactions in the last 12 months, and number of purchases in last 12 months. I want the end-user (with no technical skills) to specify a minimum threshold for total value (e.g. $1,000) and number of purchases (e.g. 10) and then restrict the data set to those where total value of transactions in the last 12 months > $1,000 OR number of purchases in last 12 months > 10.
After doing that, I want to allow the user to see the data set on a dashboard (e.g. with a table and a graph) and from there select other filters (e.g. gender=male, country=Australia).
The key here is to create separate parameter tables and combine conditions using a measure.
Suppose we have the following Sales table:
Customer  Value  Number
-----------------------
A           568       2
B          2451      12
C          1352       9
D           876       6
E           993      11
F          2208      20
G          1612       4
Then we'll create two new tables to use as parameters. You could do a calculated table like
Number = VALUES(Sales[Number])
Or something more complex like
Value = GENERATESERIES(0, ROUNDUP(MAX(Sales[Value]),-2), ROUNDUP(MAX(Sales[Value]),-2)/10)
Or define the table manually using Enter Data or some other way.
In any case, once you have these tables, name their columns what you want (I used MinCount and MinValue) and write your filtering measure
Filter = IF(MAX(Sales[Number]) > MIN(Number[MinCount]) ||
MAX(Sales[Value]) > MIN('Value'[MinValue]),
1, 0)
Then add your Filter measure as a visual-level filter with the condition "is not 0", and use the MinCount and MinValue columns as slicers.
If you select 10 for MinCount and 1000 for MinValue then your table should look like this:
Notice that C, E and G each exceed only one of the thresholds and that A and D are excluded.
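As a quick check of the OR logic against the sample Sales table, here is a small Python sketch. The function names are assumptions made for illustration. It also takes one threshold per parameter, whereas the DAX measure uses MIN() over whatever the slicer has selected.

```python
# (Customer, Value, Number) rows from the sample Sales table.
sales = [
    ("A", 568, 2), ("B", 2451, 12), ("C", 1352, 9), ("D", 876, 6),
    ("E", 993, 11), ("F", 2208, 20), ("G", 1612, 4),
]


def passes_filter(value, number, min_value, min_count):
    """Return 1 when either threshold is exceeded, else 0 (like the measure)."""
    return 1 if number > min_count or value > min_value else 0


def visible_customers(rows, min_value, min_count):
    """Keep only rows where the filter flag is not 0."""
    return [
        customer
        for customer, value, number in rows
        if passes_filter(value, number, min_value, min_count) != 0
    ]


kept = visible_customers(sales, min_value=1000, min_count=10)
```

B and F exceed both thresholds, C and G only the value threshold, and E only the count threshold.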
To my knowledge, there is currently no such built-in slicer feature in Power BI. There is, however, a suggestion in the Power BI forum that requests functionality like this. If you'd be willing to use the Power Query Editor, it's easy to obtain the values you're looking for, but only with hard-coded values for your limits or thresholds.
Let me show you how for a synthetic dataset that should fit the structure of your description:
Dataset:
CustomerID,Country,Gender,TransactionValue12,NPurchases12
51,USA,M,3516,1
58,USA,M,3308,12
57,USA,M,7360,19
54,USA,M,2052,6
51,USA,M,4889,5
57,USA,M,4746,6
50,USA,M,3803,3
58,USA,M,4113,24
57,USA,M,7421,17
58,USA,M,1774,24
50,USA,F,8984,5
52,USA,F,1436,22
52,USA,F,2137,9
58,USA,F,9933,25
50,Canada,F,7050,16
56,Canada,F,7202,5
54,Canada,F,2096,19
59,Canada,F,4639,9
58,Canada,F,5724,25
56,Canada,F,4885,5
57,Canada,F,6212,4
54,Canada,F,5016,16
55,Canada,F,7340,21
60,Canada,F,7883,6
55,Canada,M,5884,12
60,UK,M,2328,12
52,UK,M,7826,1
58,UK,M,2542,11
56,UK,M,9304,3
54,UK,M,3685,16
58,UK,M,6440,16
50,UK,M,2469,13
57,UK,M,7827,6
Desktop table:
Here you see an Input table and a subset table using two Slicers. If the forum suggestion gets implemented, it should hopefully be easy to change a subset like below to an "OR" scenario:
Transaction Value > 1000 OR Number of purchases > 10 using Power Query:
If you use Edit Queries > Advanced filter you can set it up like this:
The last step under Applied Steps will then contain this formula:
= Table.SelectRows(#"Changed Type2", each [NPurchases12] > 10 or [TransactionValue12] > 1000)
Now your original Input table will look like this:
Now, if only we were able to replace the hardcoded 10 and 1000 with a dynamic value, for example from a slicer, we would be fine! But no...
I know this is not what you were looking for, but it was the best 'negative answer' I could find. I guess I'm hoping for a better solution just as much as you are!
In Power BI Desktop, I'm trying to order the following column with repeated values by an ID column (contains primary key).
This returns the error: "There can't be more than one value in "Nível2"...."
In this other post, the suggestion seems to be to concatenate the values of the column so they aren't duplicated.
But I want them to be repeated so they can aggregate values in visuals.
So, what's the workaround for this situation?
Thanks in advance for helping!
The issue is that your sort column (i.e. your ID column) contains multiple values for each value in the column you are trying to sort (i.e. your Nivel2 column).
You need to ensure that your sort column contains only one distinct value for each value in the column you are trying to sort.
One way to achieve this would be to create a new (calculated) sort column based on your ID column. It could be defined like this:
SortColumn:=CALCULATE(MAX('YourTable'[ID]),ALLEXCEPT('YourTable','YourTable'[Nivel2]))
Here is an example of how the SortColumn would behave:
Id Nivel2 SortColumn
1 Caixa 4
2 Caixa 4
3 Caixa 4
4 Caixa 4
5 Depósitos à ordem 7
6 Depósitos à ordem 7
7 Depósitos à ordem 7
You can now sort Nivel2 by SortColumn.
EDIT - The implementation of the SortColumn should be done in the data source
There seems to be a limitation in PowerBI where it checks the implementation of the sort column rather than the data in the sort column. Therefore the above solution does not work, even though the data in the sort column is perfectly valid. The above solution will throw this error when you attempt to sort [Nivel2] by SortColumn:
This column can't be sorted by a column that is already sorted, directly or indirectly, by this column.
The implementation of the SortColumn should be moved to the data source instead. I.e. if your data source is an Excel sheet, then the SortColumn should be created inside the Excel sheet.
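If the source is produced by a script, the sort column can be precomputed there. The following Python sketch is one hedged way to do it, using the sample rows from the example above; the function name add_sort_column is hypothetical. It assigns every row the maximum Id of its Nivel2 group, so each Nivel2 value maps to exactly one sort value.

```python
# (Id, Nivel2) rows from the example above.
table = [
    (1, "Caixa"), (2, "Caixa"), (3, "Caixa"), (4, "Caixa"),
    (5, "Depósitos à ordem"), (6, "Depósitos à ordem"),
    (7, "Depósitos à ordem"),
]


def add_sort_column(rows):
    """Append the maximum Id of each Nivel2 group as a third column."""
    max_id = {}
    for row_id, name in rows:
        # Track the largest Id seen so far for this Nivel2 value.
        max_id[name] = max(row_id, max_id.get(name, row_id))
    return [(row_id, name, max_id[name]) for row_id, name in rows]


with_sort = add_sort_column(table)
```

The resulting three-column rows can then be written back to the source file, and Nivel2 sorted by the new column after loading.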
The above answer explains the issue and the resolution correctly. The only change is that the SortColumn must be implemented outside of the tabular model (Power BI) to ensure that Power BI does not know about the dependency between the SortColumn and the [Nivel2] column.
In my case, I calculate the levels from a parent-child hierarchy
Path = Path([id],[father])
For each level:
Level1 = LOOKUPVALUE([Name],[id], PathItem([Path],1))
Level2 = LOOKUPVALUE([Name],[id], PathItem([Path],2))
.....
Then I created a new column for each level to sort the column Level:
SortL1 = LOOKUPVALUE([nID],[id], PathItem([Path],1))
SortL2 = LOOKUPVALUE([nID],[id], PathItem([Path],2))
.....
id and nID are the same numeric variable, but "id" is in string format because Path does not support numeric values.
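For readers unfamiliar with PATH and PATHITEM, the level and sort columns can be approximated in Python as follows. This is a sketch only: the sample hierarchy and its names are hypothetical, and the helper functions merely imitate the DAX behaviour (a pipe-delimited path from the root down to the current row, with 1-based positions).

```python
# Hypothetical hierarchy: id (string) -> (father id or None, Name)
nodes = {
    "1": (None, "Assets"),
    "2": ("1", "Caixa"),
    "3": ("2", "Caixa central"),
}


def path(node_id):
    """Imitate PATH: ids from the root down to node_id, joined by '|'."""
    parts = []
    current = node_id
    while current is not None:
        parts.append(current)
        current = nodes[current][0]  # move up to the father
    return "|".join(reversed(parts))


def path_item(path_value, position):
    """Imitate PATHITEM: 1-based lookup, None when the level does not exist."""
    items = path_value.split("|")
    return items[position - 1] if 1 <= position <= len(items) else None


def level_name(node_id, position):
    """Name of the ancestor at the given level (like the Level columns)."""
    item = path_item(path(node_id), position)
    return nodes[item][1] if item is not None else None


def level_sort(node_id, position):
    """Numeric id of the ancestor at the given level (like the SortL columns)."""
    item = path_item(path(node_id), position)
    return int(item) if item is not None else None
```

For example, level_name("3", 2) gives the name of the second-level ancestor of node 3, and level_sort("3", 2) gives the numeric key to sort that level by.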