The COUNT function only accepts a column reference as an argument (Power BI)

I created this formula
RRC SR < 98% = CALCULATE(COUNT(x_TOPN_new_daily[RRC Setup Success Rate] < 98))
to count the values below 98 in the column RRC Setup Success Rate, but it gives the error "The COUNT function only accepts a column reference as an argument." Do you have any idea?
City | RRC Setup Success Rate
B    | 100
A    | 96
C    | 94
F    | 95
R    | 99
C    | 97
I want to get the count of rows where the value is less than 98.

Use this measure instead. COUNT only accepts a bare column reference, so the < 98 condition has to be moved out of COUNT and passed to CALCULATE as a filter argument:
RRC SR < 98% =
CALCULATE (
    COUNT ( x_TOPN_new_daily[RRC Setup Success Rate] ),
    x_TOPN_new_daily[RRC Setup Success Rate] < 98
)


Calculate the amount of the cost of tickets finalized per material divided by the total amount of the tickets finalized

I have the following need:
calculate the ratio between the sum of the amounts of tickets with status "Finalized" for each material and the total amount of all finalized tickets.
My fact table looks like this:
TicketID | StatusID | MaterialID | CategoryID | Amount | FKDATE
123      | 3        | 45         | 9          | 150    | 12/03/2021
124      | 5        | 50         | 4          | 569    | 11/03/2021
125      | 3        | 78         | 78         | 556    | 14/03/2021
126      | -1       | -1         | -1         | -1     | 12/03/2021
My dimension Status looks like this:
StatusID | Status
1        | Open
2        | In Process
3        | Finalized
My dimension Material looks like this:
MaterialID | MaterielLabel
1          | Bikes
..         | ..
I want to exclude the tickets with MaterialID = -1.
Try the following:
AmountFinalizedByMaterial :=
VAR AmountFinalizedByMaterialGroup =
    CALCULATE (
        SUM ( yourFactTable[Amount] ),
        Status[Status] = "Finalized",
        yourFactTable[MaterialID] <> -1
    )
VAR TotalAmountFinalized =
    CALCULATE (
        SUM ( yourFactTable[Amount] ),
        Status[Status] = "Finalized",
        ALL ( Material )
    )
RETURN
    DIVIDE (
        AmountFinalizedByMaterialGroup,
        TotalAmountFinalized
    )

How can I write a query to carry a remaining balance of hours forward for load leveling a schedule?

I have a query result with the total hours scheduled per week, in chronological order and without gaps, plus a set number of hours that can be processed each week. Any hours not processed should be carried over to one or more following weeks. The following information is available:
Week | Hours | Capacity
1    | 2000  | 160
2    | 100   | 160
3    | 0     | 140
4    | 150   | 160
5    | 500   | 160
6    | 1500  | 160
Each week should reduce the new hours plus the carried-over hours by the Capacity, but never go below zero. A positive remainder should carry into the following week(s). For example:
Week | Hours | Capacity | LeftOver = (Hours + LAG(LeftOver) - Capacity)
1    | 400   | 160      | 240 (400 + 0 - 160)
2    | 100   | 160      | 180 (100 + 240 - 160)
3    | 0     | 140      | 40  (0 + 180 - 140)
4    | 20    | 160      | 0   (20 + 40 - 160) (no negative, change to zero)
5    | 500   | 160      | 340 (500 + 0 - 160)
6    | 0     | 160      | 180 (0 + 340 - 160)
I'm assuming this can be done with CTE recursion and a running value that never goes below zero, but I can't find any specific examples of how this would be written.
Well, you are not wrong: a recursive common table expression is indeed an option to construct a solution.
Construction of recursive queries can generally be done in steps. Run your query after every step and validate the result.
1. Define the "anchor" of the recursion: where does the recursion start? Here the start is defined by Week = 1.
2. Define a recursion iteration: what is the relation between iterations? Here that is the incrementing week numbers: d.Week = r.Week + 1.
3. Avoid negative numbers with a case expression.
Sample data
create table data
(
Week int,
Hours int,
Capacity int
);
insert into data (Week, Hours, Capacity) values
(1, 400, 160),
(2, 100, 160),
(3, 0, 140),
(4, 20, 160),
(5, 500, 160),
(6, 0, 160);
Solution
with rcte as
(
  select d.Week,
         d.Hours,
         d.Capacity,
         case
           when d.Hours - d.Capacity > 0
           then d.Hours - d.Capacity
           else 0
         end as LeftOver
  from data d
  where d.Week = 1
  union all
  select d.Week,
         d.Hours,
         d.Capacity,
         case
           when d.Hours + r.LeftOver - d.Capacity > 0
           then d.Hours + r.LeftOver - d.Capacity
           else 0
         end
  from rcte r
  join data d
    on d.Week = r.Week + 1
)
select r.Week,
       r.Hours,
       r.Capacity,
       r.LeftOver
from rcte r
order by r.Week;
Result
Week Hours Capacity LeftOver
---- ----- -------- --------
1 400 160 240
2 100 160 180
3 0 140 40
4 20 160 0
5 500 160 340
6 0 160 180
Fiddle to see things in action.
I ended up writing a few CTEs, then a recursive CTE, and got what I needed. The capacity is a static number here, but it will later be replaced with one that takes holidays and vacations into account. I will also need to handle the initial 'LeftOver' value for the first week; one option is to run this query over an earlier date period to find the most recent week with a zero LeftOver, use that as the new start date, and filter out the earlier weeks in the final query.
DECLARE @StartDate date = (SELECT MAX(FirstDayOfWorkWeek) FROM dbo._Calendar WHERE Date <= GETDATE());
DECLARE @EndDate date = DATEADD(week, 12, @StartDate);
DECLARE @EmployeeQty int = (SELECT ISNULL(COUNT(*), 0) FROM Employee WHERE DefaultDepartment IN (4) AND Hidden = 0 AND DateTerminated IS NULL);
WITH hours AS (
/* GRAB ALL NEW HOURS SCHEDULED FOR EACH WEEK IN THE SELECTED PERIOD */
SELECT c.FirstDayOfWorkWeek as [Date]
, SUM(budget.Hours) as hours
FROM dbo.Project_Phase phase
JOIN dbo.Project_Budget_Labor budget on phase.ID = budget.Phase
JOIN dbo._Calendar c on CONVERT(date, phase.Date1) = c.[Date]
WHERE phase.CompletedOn IS NULL AND phase.Project <> 4266
AND phase.Date1 BETWEEN @StartDate AND @EndDate
AND budget.Department IN (4)
GROUP BY c.FirstDayOfWorkWeek
)
, weeks AS (
/* CREATE BLANK ROWS FOR EACH WEEK AND JOIN TO ACTUAL HOURS TO ELIMINATE GAPS */
/* ADD A ROW NUMBER FOR RECURSION IN NEXT CTE */
SELECT cal.[Date]
, ROW_NUMBER() OVER(ORDER BY cal.[Date]) as [rownum]
, ISNULL(SUM(hours.Hours), 0) as Hours
FROM (SELECT FirstDayOfWorkWeek as [Date] FROM dbo._Calendar WHERE [Date] BETWEEN @StartDate AND @EndDate GROUP BY FirstDayOfWorkWeek) as cal
LEFT JOIN hours on cal.[Date] = hours.[Date]
GROUP BY cal.[Date]
)
, spread AS (
/* GRAB FIRST WEEK AND USE RECURSION TO CREATE RUNNING TOTAL THAT DOES NOT DROP BELOW ZERO*/
SELECT TOP 1 [Date]
, rownum
, Hours
, @EmployeeQty * 40 as Capacity
, CONVERT(numeric(9,2), 0.00) as LeftOver
, Hours as running
FROM weeks
ORDER BY rownum
UNION ALL
SELECT curr.[Date]
, curr.rownum
, curr.Hours
, @EmployeeQty * 40 as Capacity
, CONVERT(numeric(9,2), CASE WHEN curr.Hours + prev.LeftOver - (@EmployeeQty * 40) < 0 THEN 0 ELSE curr.Hours + prev.LeftOver - (@EmployeeQty * 40) END) as LeftOver
, curr.Hours + prev.LeftOver as running
FROM weeks curr
JOIN spread prev on curr.rownum = (prev.rownum + 1)
)
SELECT spread.Hours as NewHours
, spread.LeftOver as PrevHours
, spread.Capacity
, spread.running as RunningTotal
, CASE WHEN running < Capacity THEN running ELSE Capacity END as HoursThisWeek
FROM spread

Implementing Binomial Hypothesis Testing significance tests in Power BI (DAX)

This is partly a theory question and partly an implementation question. My stats are a little rusty...
I am developing a report that attempts to determine whether the difference in occurrences between a reference group and a selected group is statistically significant.
So, for example, if something occurs in X of n tests for one group, is that significantly different from a 'normal' occurrence rate of Y of m tests in a different (control) group?
So, my H0 is that the rate is Y of m, per the control group.
H1 is that it is not the same as the control group. (Ideally, I'd like to use a one-tailed test, depending on whether the observed occurrence is greater or less than the control, but my current implementation is two-tailed.)
I'd be comfortable with an 80% confidence level.
I've got (slightly pseudocode here):
Zscore =
VAR pControl =
    DIVIDE ( COUNT ( [Control occurrences] ), COUNT ( [Control Tests] ) )
VAR pTest =
    DIVIDE ( COUNT ( [Test occurrences] ), COUNT ( [Test Tests] ) )
VAR controlStandardError =
    SQRT (
        DIVIDE (
            pControl * ( 1 - pControl ),
            COUNT ( [Control Tests] )
        )
    )
VAR testStandardError =
    SQRT (
        DIVIDE (
            pTest * ( 1 - pTest ),
            COUNT ( [Test Tests] )
        )
    )
RETURN
    DIVIDE (
        pTest - pControl,
        SQRT ( POWER ( testStandardError, 2 ) + POWER ( controlStandardError, 2 ) )
    )
I'm then calculating:
p-Value =
VAR pControl =
    DIVIDE ( COUNT ( [Control occurrences] ), COUNT ( [Control Tests] ) )
RETURN
    IF (
        pControl > 0,
        1 - ABS ( NORM.DIST ( [Zscore], 0, 1, TRUE ) )
    )
I am then displaying each of my non-null hypotheses in a table and filtering the table so that p-Value is less than 0.1 (two-tailed, 80%).
Am I on the right track here, or have I completely bungled the theory on this one?
Theory and example tables - Right-tailed (μ > μ₀)
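As a sketch of the theory the measures below implement (this assumes the standard pooled two-proportion z-test), the pooled proportion and test statistic are:

$$\hat{p} = \frac{X_T + X_C}{n_T + n_C}, \qquad z = \frac{\hat{p}_T - \hat{p}_C}{\sqrt{\hat{p}\,(1-\hat{p})\left(\frac{1}{n_T} + \frac{1}{n_C}\right)}}$$

For the right-tailed test, H0 is rejected at the 90% level when z exceeds the critical value NORM.S.INV(0.90).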
DAX
ControlGroup:
XControl = COUNTROWS ( FILTER ( ControlGroup, ControlGroup[Outcome] = 1 ) )
NControl = COUNTROWS ( ControlGroup )
pControl = DIVIDE ( [XControl], [NControl] )
TreatmentGroup:
XTreatment = COUNTROWS ( FILTER ( TreatmentGroup, TreatmentGroup[Outcome] = 1 ) )
NTreatment = COUNTROWS ( TreatmentGroup )
pTreatment = DIVIDE ( [XTreatment], [NTreatment] )
Test parameters:
PooledProportion =
DIVIDE (
    [XTreatment] + [XControl],
    [NTreatment] + [NControl]
)
ZCriticalValue = NORM.S.INV ( 0.90 )
ZValue =
DIVIDE (
    [pTreatment] - [pControl],
    SQRT (
        [PooledProportion] * ( 1 - [PooledProportion] ) * ( 1 / [NTreatment] + 1 / [NControl] )
    )
)
Visualization (example image omitted).

How to distinguish between BLANK and 0 in a PowerBI Measure?

I have 2 tables like this:
PM_History2
Serial# | Good
A       | TRUE
B       | FALSE
A       | TRUE
B       | FALSE
C       | TRUE
A       | FALSE
C       | TRUE
CONTRACTS
Serial# | Enrollment#
A       | 1
B       | 2
C       | 3
D       | 4
I have a measure that calculates the number of Good for TRUE:
Count of Good for True =
CALCULATE(COUNTA('PM_History2'[Good]), 'PM_History2'[Good] IN { TRUE })
I then have a measure that calculates the percentage of TRUEs for Good.
PM Score = 'PM_History2'[Count of Good for True]/COUNTROWS(PM_History2)
When I create a table visualization to show all the Serial# and their PM Score I get this:
Serial# | PM Score
A       | .67
B       |
C       | 1.00
D       |
What can I do so that what should be zero comes in as 0 and what should be blank stays blank? Like this:
Serial# | PM Score
A       | .67
B       | 0
C       | 1.00
D       |
Thank you in advance!
Try this:
PM Score = DIVIDE ( [Count of Good for True] + 0, COUNTROWS ( PM_History2 ) )
Adding + 0 makes the numerator non-blank, but DIVIDE still returns BLANK when the denominator is blank, which distinguishes B (rows exist, so the result is 0) from D (no rows, so the result stays blank).

Python remove outliers from data

I have a data frame like the following:
ID | Value
A  | 70
A  | 80
B  | 75
C  | 10
B  | 50
A  | 1000
C  | 60
B  | 2000
.. | ..
I would like to group this data by ID, remove the outliers from the grouped data (the ones we see in the boxplot), and then calculate the mean.
So far I have:
grouped = df.groupby('ID')
statBefore = pd.DataFrame({'mean': grouped['Value'].mean(),
                           'median': grouped['Value'].median(),
                           'std': grouped['Value'].std()})
How can I find the outliers, remove them, and get the statistics?
I believe the method you're referring to is to remove values > 1.5 * the interquartile range away from the median. So first, calculate your initial statistics:
statBefore = pd.DataFrame({'q1': grouped['Value'].quantile(.25),
                           'median': grouped['Value'].median(),
                           'q3': grouped['Value'].quantile(.75)})
And then determine whether values in the original DF are outliers:
def is_outlier(row):
    # width of the interquartile range for this row's group
    iq_range = statBefore.loc[row.ID]['q3'] - statBefore.loc[row.ID]['q1']
    median = statBefore.loc[row.ID]['median']
    # outlier if more than 1.5 IQRs away from the group median
    return row.Value > (median + 1.5 * iq_range) or row.Value < (median - 1.5 * iq_range)
#apply the function to the original df:
df.loc[:, 'outlier'] = df.apply(is_outlier, axis = 1)
#filter to only non-outliers:
df_no_outliers = df[~(df.outlier)]
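To finish the original task, the per-group statistics can then be recomputed on the filtered frame (a minimal follow-up using the df_no_outliers frame from above):
# grouped mean after outlier removal
statAfter = df_no_outliers.groupby('ID')['Value'].mean()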
Alternatively, with a single global (ungrouped) IQR filter:
Q1 = df['Value'].quantile(0.25)
Q3 = df['Value'].quantile(0.75)
IQR = Q3 - Q1
data = df[~((df['Value'] < (Q1 - 1.5 * IQR)) | (df['Value'] > (Q3 + 1.5 * IQR)))]
Just do:
In [187]: df[df<100].groupby('ID').agg(['mean','median','std'])
Out[187]:
Value
mean median std
ID
A 75.0 75.0 7.071068
B 62.5 62.5 17.677670
C 35.0 35.0 35.355339
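Note that the hardcoded df < 100 cutoff only fits this toy data, and it applies the comparison to every column, including the string ID column. As a more general sketch, the per-group IQR idea can be combined with a groupby apply, here using the standard boxplot fences (Q1/Q3 ± 1.5·IQR) rather than the median-based rule above, and assuming the same df with ID and Value columns:

import pandas as pd

def drop_group_outliers(g):
    # per-group quartiles and IQR fences
    q1, q3 = g['Value'].quantile(0.25), g['Value'].quantile(0.75)
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # keep only the rows inside the fences
    return g[(g['Value'] >= lo) & (g['Value'] <= hi)]

cleaned = df.groupby('ID', group_keys=False).apply(drop_group_outliers)
stats = cleaned.groupby('ID')['Value'].agg(['mean', 'median', 'std'])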