I would like to imitate SQL's NTILE function in DAX. For a given number of bins, I would like a measure that returns the bin number for any value in a column. The bins should each contain a more or less equal number of observations.
So the parameters are:
number of bins
test value
table column
Here is something similar in Excel:
= MAX( ROUNDUP( PERCENTRANK($A$1:$A$8, A1) *4, 0),1)
In DAX, you can use PERCENTILE.INC as the basis for such a calculation.
Bucket =
VAR N = 4
VAR Percentiles =
    ADDCOLUMNS (
        GENERATESERIES ( 1, N ),
        "Percentile", PERCENTILE.INC ( Table1[Col1], [Value] / N )
    )
RETURN
    MINX ( FILTER ( Percentiles, Table1[Col1] <= [Percentile] ), [Value] )
For your data, the Percentiles table variable looks like this:
Value Percentile
1 24.8
2 66.5
3 81.8
4 85.0
Then, for each row in your original table, you take the minimum Value from the calculated table among the rows whose Percentile is greater than or equal to Col1 in that row.
Note that the above is for a calculated column. For a measure, you'd need to specify an aggregation for Table1[Col1] in the last line (e.g. MAX(Table1[Col1])).
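A measure version of the note above could look like the following sketch. It assumes the test value is taken as MAX ( Table1[Col1] ) in the current filter context, and that the bin boundaries should be computed over the whole table regardless of the visual's filters, which is why it switches to PERCENTILEX.INC over ALL ( Table1 ). The names Bucket Measure, TestValue and K are made up for illustration.

```dax
Bucket Measure =
VAR N = 4 // number of bins
VAR TestValue = MAX ( Table1[Col1] ) // assumed aggregation for the test value
VAR Percentiles =
    ADDCOLUMNS (
        GENERATESERIES ( 1, N ),
        "Percentile",
            VAR K = [Value] / N
            RETURN
                // ALL ( Table1 ) keeps the boundaries independent of the current filters
                PERCENTILEX.INC ( ALL ( Table1 ), Table1[Col1], K )
    )
RETURN
    MINX ( FILTER ( Percentiles, TestValue <= [Percentile] ), [Value] )
```

If the boundaries should respect slicers, ALLSELECTED ( Table1 ) in place of ALL ( Table1 ) is probably the closer fit.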
Related
I have a summary table in Power BI which shows how many days it takes for leads to convert to a sale. It has 2 columns: sum_convert (the number of days between the lead creation date and the converted date) and count_lead (the count of leads that took that number of days to convert). Both are numeric. Here is an example of the data:
What I want is a column next to count_lead that shows the running percentage total in ascending order of sum_convert. Currently I've created a measure called lead_count, which is the sum of count_lead. I then attempted to create the cumulative total with the following measure:
Cum_Lead = calculate([lead_count], FILTER(ALL(Convert_Count_Summary[Sum_Convert]), SUM(Convert_Count_Summary[count_lead]) <= [lead_count]))
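This creates a cumulative total, but not in sum_convert order; it accumulates in order of the largest count_lead volume instead. Any idea what I need to change so that it follows the order of sum_convert?
You could do this in Power Query using M (this relies on the rows being sorted by sum_convert, with values running 0, 1, 2, ... without gaps, because it takes the first sum_convert + 1 rows):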
= Table.AddColumn(#"Previous Step", "Cumulative_Count_PQ", each List.Sum(List.FirstN(#"Previous Step"[count_lead],_[sum_convert]+1)), type number)
Or as a calculated column using DAX:
Cumulative Count DAX =
CALCULATE (
    SUM ( Convert_Count_Summary[count_lead] ),
    ALL ( Convert_Count_Summary ),
    Convert_Count_Summary[sum_convert] <= EARLIER ( Convert_Count_Summary[sum_convert] )
)
Edit:
Cumulative percentages in Power Query:
= Table.AddColumn(#"Previous Step", "Cumulative_Count_Percent_PQ", each List.Sum(List.FirstN(#"Previous Step"[count_lead],_[sum_convert]+1)) / List.Sum(#"Previous Step"[count_lead]), Percentage.Type)
Cumulative percentages calculated column in DAX:
Cumulative Count % DAX =
VAR _Numerator =
    CALCULATE (
        SUM ( Convert_Count_Summary[count_lead] ),
        ALL ( Convert_Count_Summary ),
        Convert_Count_Summary[sum_convert] <= EARLIER ( Convert_Count_Summary[sum_convert] )
    )
VAR _Divisor =
    SUM ( Convert_Count_Summary[count_lead] )
RETURN
    DIVIDE (
        _Numerator,
        _Divisor
    )
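Since the question asked for a measure rather than a column, here is a rough sketch of the same idea as measures. It assumes the existing lead_count measure from the question and a visual grouped by sum_convert; CurrentDays and the name Cum_Lead % are made up for illustration.

```dax
Cum_Lead =
// Largest sum_convert visible in the current cell of the visual
VAR CurrentDays = MAX ( Convert_Count_Summary[sum_convert] )
RETURN
    CALCULATE (
        [lead_count],
        FILTER (
            ALL ( Convert_Count_Summary[sum_convert] ),
            Convert_Count_Summary[sum_convert] <= CurrentDays
        )
    )

Cum_Lead % =
DIVIDE (
    [Cum_Lead],
    // Grand total over the whole table, ignoring its filters
    CALCULATE ( [lead_count], ALL ( Convert_Count_Summary ) )
)
```

The key difference from the attempt in the question is that the filter compares sum_convert against the current row's sum_convert, not count_lead against the total.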
I have a dataset where the values in the column 'value' are repeated per month and id; e.g., for 1/1/2020 and id 1 the value is 0.5, for 2/1/2020 the value is 2, and so on. The dataset has other columns that are used as filters for other calculations.
What will be the measure to get:
so that even when I use filters from the table, e.g. filter1, the value remains grouped by date ONLY?
I've tried sumx and max, and sum and value, but neither gives the right result, and the calculation still reacts to other filters.
When I spoke about ALL, I had this kind of solution in mind:
WithoutExternalFilter =
CALCULATE (
    VAR __dist =
        ADDCOLUMNS (
            SUMMARIZE ( Te, Te[Date], Te[ID] ),
            "val", CALCULATE ( MAX ( Te[value] ) )
        )
    RETURN
        SUMX ( __dist, [val] ),
    ALL ( Te[filter1] )
)
When we apply a filter, the value for "2020-08-01" is still 1.1:
I'm creating a report in Power BI and want to return last month's Size.
I have a table with four columns: Name, Size, Connections, and Disconnections. The values in these columns cover the last 12 months. For example, the Name column has A, B, C; the Size column has 3608445, 2839945, 874434; the Connections column has 66875, 85632, 19237; and the Disconnections column has 52658, 61529, 15832. See screenshot below.
The code I used to create the expected table is:
last_month_size =
VAR current_month =
    MONTH ( TODAY () )
RETURN
    CALCULATE (
        [Size],
        FILTER (
            'Monthly Calendar_Lookup',
            MONTH ( 'Monthly Calendar_Lookup'[Dates] ) = current_month - 1
        )
    )
I want to create a measure that returns last month's value for the Size column while Connections and Disconnections stay the same. That is, the Size value changes while the Connections and Disconnections values remain the last-12-month values.
I find it difficult because the columns are on the same table.
I researched the question I posted and found a solution.
It relies on creating measures rather than using variables.
First, I created a measure called Total Size
Total Size = Sum ( Tablename [Size] )
Then, created another measure called prev_month size using DATEADD function with number_of_intervals as 0
prev_month size = CALCULATE ( [Total Size], DATEADD ('Monthly Calendar_Lookup'[Dates], 0, MONTH ) )
Next, I created measures of total connections and total disconnections
Total Connections = Sum ( Tablename [Connections] )
Total Disconnections = Sum ( Tablename [Disconnections] )
Also, I created two measures of rolling 12 months Connections and Disconnections each.
Rolling_Connections_12_months =
CALCULATE (
    SUMX ( 'Tablename', [Total Connections] ),
    DATESINPERIOD ( 'Date'[Month], LASTDATE ( 'Date'[Month] ), -12, MONTH )
)
Rolling_Disconnections_12_months =
CALCULATE (
    SUMX ( 'Tablename', [Total Disconnections] ),
    DATESINPERIOD ( 'Date'[Month], LASTDATE ( 'Date'[Month] ), -12, MONTH )
)
Drag Name, prev_month size, Rolling_Connections_12_months, and Rolling_Disconnections_12_months onto the canvas as a table visualization.
Finally, I added a relative date slicer and set it to Last 1 Month.
This produces the expected results.
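For comparison, the measure from the question can also be written without a slicer by working out last month's date range explicitly, which avoids the case where MONTH ( TODAY () ) - 1 evaluates to 0 in January. This is only a sketch: it assumes that 'Monthly Calendar_Lookup'[Dates] is a date column related to the data table, that the Total Size measure defined above exists, and that "last month" means the calendar month before today. LastMonthStart and LastMonthEnd are made-up names.

```dax
last_month_size =
// Last day of the previous calendar month
VAR LastMonthEnd = EOMONTH ( TODAY (), -1 )
// First day of the previous calendar month
VAR LastMonthStart = EOMONTH ( TODAY (), -2 ) + 1
RETURN
    CALCULATE (
        [Total Size],
        DATESBETWEEN ( 'Monthly Calendar_Lookup'[Dates], LastMonthStart, LastMonthEnd )
    )
```

Because the date filter lives inside the measure, the rolling 12-month measures in the same visual are not affected by it.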
I would like to count the frequency of a value in a column for each row. In Excel my situation can be solved with this formula:
=COUNTIF(I:I;I4)
In a Power BI report, I have a table of students with a column, "Pääaine" (main subject). There are 81 distinct values in 1580 rows. I would like to calculate the number of similar students for each row (so that I can filter out the main subjects that have 4 or fewer students).
How do I do it in PowerBI?
With a calculated column like this, I get 1580 in every cell:
Pääaine lkm =
CALCULATE(
    COUNTROWS(Opiskelunkulku);
    FILTER(
        Opiskelunkulku;
        Opiskelunkulku[Pääaine] = Opiskelunkulku[Pääaine]
    )
)
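You can use COUNTROWS() and EARLIER() to achieve this. FILTER() opens a new row context over the table, and EARLIER() returns the value of the specified column from the outer row context, i.e. the row the calculated column is being evaluated for. In your attempt, both sides of the comparison refer to the same inner row, so the condition is always true and every row is counted.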
Pääaine lkm =
COUNTROWS (
    FILTER (
        Opiskelunkulku,
        Opiskelunkulku[Pääaine] = EARLIER ( Opiskelunkulku[Pääaine] )
    )
)
As an alternative to Rory's answer, try CALCULATE() with ALLEXCEPT() as a filter, like this:
Pääaine lkm =
CALCULATE (
    COUNTROWS ( Opiskelunkulku ),
    ALLEXCEPT ( Opiskelunkulku, Opiskelunkulku[Pääaine] )
)
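A third variant of the same calculated column captures the current row's value in a variable before FILTER() starts iterating, so EARLIER() is not needed. This is a sketch under the same table and column names as above; CurrentSubject is a made-up name. With a locale that uses semicolons as list separators, as in the question, replace the commas accordingly.

```dax
Pääaine lkm =
// Evaluated once in the calculated column's row context, before FILTER iterates
VAR CurrentSubject = Opiskelunkulku[Pääaine]
RETURN
    COUNTROWS (
        FILTER (
            Opiskelunkulku,
            Opiskelunkulku[Pääaine] = CurrentSubject
        )
    )
```

The variable keeps the outer row's value fixed, which is the same thing EARLIER() retrieves in the answer above.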
How do I use if/else in a DAX measure? If the row value is 1, take the value calculated in var a; otherwise take the value calculated in var b.
x:=var a=[DATA1]
var b=[DATA2]
return(if([HOUR]=1),a,b)
I get an error using the above formula.
It seems your problem is that you are not aggregating the columns when creating the measure. Measures only work by aggregating data in a given context; generally, if you want to perform calculations per row, you should use a calculated column instead of a measure.
And the DAX expression for a calculated column should be:
MyColumn = IF([HOUR] = 1, [DATA1], [DATA2])
Otherwise, if you want to use a measure, you have to explicitly aggregate the column values in the given context, e.g.:
MyMeasure =
VAR a =
    FIRSTNONBLANK ( ExampleTable[Data1], 0 )
VAR b =
    FIRSTNONBLANK ( ExampleTable[Data2], 0 )
RETURN
    IF ( SUM ( ExampleTable[Hour] ) = 1, a, b )
Or simply:
MyMeasure =
IF (
    SUM ( ExampleTable[Hour] ) = 1,
    FIRSTNONBLANK ( ExampleTable[Data1], 0 ),
    FIRSTNONBLANK ( ExampleTable[Data2], 0 )
)
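If the intent is to pick Data1 or Data2 row by row and then total the result, an iterator can carry the IF inside a measure. A minimal sketch, assuming ExampleTable has numeric columns Hour, Data1 and Data2 and that summing is the aggregation you want (MyRowwiseMeasure is a made-up name):

```dax
MyRowwiseMeasure =
SUMX (
    ExampleTable,
    // SUMX provides a row context, so plain column references are allowed here
    IF (
        ExampleTable[Hour] = 1,
        ExampleTable[Data1],
        ExampleTable[Data2]
    )
)
```

Unlike the versions above, this evaluates the condition once per row instead of once per filter context.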
Let me know if this helps.