Implementing Binomial Hypothesis Testing significance tests in Power BI (DAX) - powerbi

This is partly a theory question, and partly an implementation question. My stats is a little rusty...
I am developing a report that is attempting to determine if the difference in occurances between a reference group and a selected group are statistically significant.
So, for example, if something occurs in X of n tests for one group, is it statistically significant than if it 'normally' occurs at a rate of Y of m tests for a different (control) group.
So, my H0 is that the rate is Y of m, per the control group
h1 is that it is not the same as the control group. (ideally, I'd like to use a 1-tailed test, depending if the observed occurrence is greater or less than the control, but my current implementation is 2 tailed)
I'd be comfortable with a CI of 80%.
I've got (slightly pseudocode here):
Zscore =
VAR pControl = DIVIDE(COUNT([Control occurrences]), COUNT([Control Tests])) RETURN
VAR pTest = DIVIDE(COUNT([Test occurrences]), COUNT([Test Tests])) RETURN
VAR controlStandardError =
SQRT(
DIVIDE(
(pControl * (1-pControl)
, COUNT([Control Tests])
)
) RETURN
VAR testStandardError =
SQRT(
DIVIDE(
(pTest* (1-pTest)
, COUNT([Test Tests])
)
) RETURN
DIVIDE(
(pTest - pControl)
, SQRT(POWER(testStandardError, 2) + POWER(controlStandardError, 2)
)
I'm then calculating:
p-Value =
VAR pControl = DIVIDE(COUNT([Control occurrences]), COUNT([Control Tests])) RETURN
IF(pControl > 0,
1 - ABS(NORM.DIST(Zscore, 0, 1, TRUE)
)
I am then displaying in a table each of my non-null hypotheses and filtering the table such that p-Value is less than 0.1. (2-tailed 80%)
am I on the right track here? Or have I completely bungled the theory on this one?

Theory and example tables - Right-tailed (μ > μ₀)
DAX
ControlGroup
XControl = COUNTROWS(FILTER(ControlGroup,ControlGroup[Outcome]=1))
NControl = COUNTROWS(ControlGroup)
pControl = DIVIDE([XControl],[NControl])
TreatmentGroup
XTreatment = COUNTROWS(FILTER(TreatmentGroup,TreatmentGroup[Outcome]=1))
NTreatment = COUNTROWS(TreatmentGroup)
pTreatment = DIVIDE([XTreatment],[NTreatment])
Test Parameters
PooledProportion =
DIVIDE(
[XTreatment]+[XControl],
[NTreatment]+[NControl]
)
ZCritivalValue = NORM.S.INV(0.90)
ZValue = DIVIDE(
[pTreatment]-[pControl],
SQRT(
[PooledProportion]*(1-[PooledProportion])*((1/[NTreatment])+(1/[NControl]))
)
)
Visualization (example)

Related

ABC analysis without aggregation

I never use DAX and I find it very complicated!
I read this https://www.daxpatterns.com/abc-classification/ but I find it complicated.
My dataset is like this https://1drv.ms/x/s!AvMy-Bzzx3mqgwmKSZm6wr_W4VXr?e=Jwwbji&nav=MTVfezAwMDAwMDAwLTAwMDEtMDAwMC0wMDAwLT...
I need to sum column Value if the Name and date are the sums after doing the ABC analysis. The problem is that I can't aggregate before in transform because I need the column Name Fatt!
It is not necessary for a dynamic ABC analysis. The value for each Name and Date is the same...
I appreciate every help! Also video tutorial or something can help me. Thanks!
'Aggregazione fatturato = SUMX(UNION(VALUES(Name),VALUES(Date)),SUM(Value)
Fatturato Cumulato =
VAR CurrentProductSales = 'Misure'[Aggregazione fatturato]
VAR BetterProducts =
FILTER (
Name,
'Misure'[Aggregazione fatturato] >= CurrentProductSales
)
VAR Result =
SUMX (
BetterProducts,
'Misure'[Aggregazione fatturato]
)
RETURN
Result
Cumulated Pct =
DIVIDE (
'Misure'[Fatturato Cumulato],
SUM ('Misure'[Aggregazione fatturato])
ERROR 'Misure'[Aggregazione fatturato] not found

Power bi returning value based on multiple condition

I have the data below
create table #data (Post_Code varchar(10), Internal_Code varchar(10))
insert into #data values
('AB10','hb3'),('AB10','hb25'),('AB12','dd1'),('AB15','hb6'),('AB16','aa4'),('AB16','hb7'),
('AB16','aa2'),('AB16','ab9'),('AB18','rr6'),('AB18','rr9'),('AB18','hb10'),('AB20','rr15'),
('AB20','td2'),('AB21','hb8'),('AB21','cc4'),('AB21','cc4'),('AB24','td5'),('AB9','yy3'),
('RM2','CC1'),('RM6','hb6'),('RM7','cc2'),('SA24','rr1'),('SA24','hb5'),('SA24','rr2'),
('SA24','cc34'),('SE15','rr9'),('SE15','rr5'),('SE25','rr10'),('SE25','hb11'),('SE25','rr8'),
('SE25','rr1'),('LA15','rr2')
select * from #data
drop table #data
What I want to achieve is if the same post code area have “hb” or “rr” in the same post code I want to return 1 else 0
The “hb” or “rr” internal_code must be in the same post_code if they in different post code. It should be 0
I wrote this DAX
Result = IF(left(Data[Internal_Code],2)="hb" || left(Data[Internal_Code],2)="rr",1,0)
it is not returning the correct result
current output
expected output
I think your expected result is incorrect as SA24 should also be 1. You should definitely do a calculation like this in PQ but if you need to do it in DAX in a calculated column, then use the following code which works.
Result =
VAR post_code = Data[Post_code]
RETURN
VAR hb = CALCULATE (COUNTROWS(Data),'Data'[Post_code] = post_code && left(Data[Internal_Code],2) = "hb" )
VAR rr = CALCULATE (COUNTROWS(Data),'Data'[Post_code] = post_code && left(Data[Internal_Code],2) = "rr" )
RETURN IF(hb>0 && rr > 0,1)

How to set proper back test range

I do not know how to code and I am trying to learn Pinescript but it really makes no sense to me so i googled how to set a backtest range and used some code someone else wrote but it doesn't seem to be actually testing the area i would like, it tests the entirety of the chart. I'd like to test from 1/1/2018 to present. I'm trying to do this for multiple strategies so I can better tailor them to the current market. here is wat I have for one of them and if you are willing to help with the others I would very much appreciate it!!! feel free to DM me.
//#version=5
strategy("Bollinger Bands BACKTEST", overlay=true)
source = close
length = input.int(20, minval=1)
mult = input.float(2.0, minval=0.001, maxval=50)
basis = ta.sma(source, length)
dev = mult * ta.stdev(source, length)
upper = basis + dev
lower = basis - dev
buyEntry = ta.crossover(source, lower)
sellEntry = ta.crossunder(source, upper)
if (ta.crossover(source, lower))
strategy.entry("BBandLE", strategy.long, stop=lower, oca_name="BollingerBands", oca_type=strategy.oca.cancel, comment="BBandLE")
else
strategy.cancel(id="BBandLE")
if (ta.crossunder(source, upper))
strategy.entry("BBandSE", strategy.short, stop=upper, oca_name="BollingerBands", oca_type=strategy.oca.cancel, comment="BBandSE")
else
strategy.cancel(id="BBandSE")
//plot(strategy.equity, title="equity", color=color.red, linewidth=2, style=plot.style_areabr)
// === INPUT BACKTEST RANGE ===
fromMonth = input.int(defval = 1, title = "From Month", minval = 1, maxval = 12)
fromDay = input.int(defval = 1, title = "From Day", minval = 1, maxval = 31)
fromYear = input.int(defval = 2018, title = "From Year", minval = 1970)

How to build an inflation term structure in QuantLib?

This is what I've got, but I'm getting weird results. Can you spot an error?:
#Zero Coupon Inflation Indexed Swap Data
zciisData = [(ql.Date(18,4,2020), 1.9948999881744385),
(ql.Date(18,4,2021), 1.9567999839782715),
(ql.Date(18,4,2022), 1.9566999673843384),
(ql.Date(18,4,2023), 1.9639999866485596),
(ql.Date(18,4,2024), 2.017400026321411),
(ql.Date(18,4,2025), 2.0074000358581543),
(ql.Date(18,4,2026), 2.0297999382019043),
(ql.Date(18,4,2027), 2.05430006980896),
(ql.Date(18,4,2028), 2.0873000621795654),
(ql.Date(18,4,2029), 2.1166999340057373),
(ql.Date(18,4,2031), 2.152100086212158),
(ql.Date(18,4,2034), 2.18179988861084),
(ql.Date(18,4,2039), 2.190999984741211),
(ql.Date(18,4,2044), 2.2016000747680664),
(ql.Date(18,4,2049), 2.193000078201294)]
def build_inflation_term_structure(calendar, observationDate):
dayCounter = ql.ActualActual()
yTS = build_yield_curve()
lag = 3
fixing_date = calendar.advance(observationDate,-lag, ql.Months)
convention = ql.ModifiedFollowing
cpiTS = ql.RelinkableZeroInflationTermStructureHandle()
inflationIndex = ql.USCPI(False, cpiTS)
#last observed CPI level
fixing_rate = 252.0
baseZeroRate = 1.8
inflationIndex.addFixing(fixing_date, fixing_rate)
observationLag = ql.Period(lag, ql.Months)
zeroSwapHelpers = []
for date,rate in zciisData:
nextZeroSwapHelper = ql.ZeroCouponInflationSwapHelper(rate/100,observationLag,date,calendar,
convention,dayCounter,inflationIndex)
zeroSwapHelpers = zeroSwapHelpers + [nextZeroSwapHelper]
# the derived inflation curve
derived_inflation_curve = ql.PiecewiseZeroInflation(observationDate, calendar, dayCounter, observationLag,
inflationIndex.frequency(), inflationIndex.interpolated(),
baseZeroRate, yTS, zeroSwapHelpers,
1.0e-12, ql.Linear())
cpiTS.linkTo(derived_inflation_curve)
return inflationIndex, derived_inflation_curve, cpiTS, yTS
observation_date = ql.Date(17, 4, 2019)
calendar = ql.UnitedStates()
inflationIndex, derived_inflation_curve, cpiTS, yTS = build_inflation_term_structure(calendar, observation_date)
If I plot the inflationIndex zero rates, I get this:
I've now been checking the same problem and first of all I don't think you need that function ZeroCouponInflationSwapHelper at all.
Just to let you know I've perfectly replicated OpenRiskEngine (ORE)'s inflation curve which itself is built on QuantLib and the idea is quite simple.
compute base value I(0) which is the lagged CPI (this is supposedly your value 252.0),
compute CPI(t)=I(T)=I(0)*(1+quote)^T, where T = ZCIS expiry date, t is T - 3M
I'm not sure where you took I(0) from, but since a lag is present the I(0) is not the latest available CPI value. Instead I(0)=I(2019-04-17) is interpolated value of CPI(2019-01) and CPI(2019-02).
Also to build the CPI curve you don't need any interest rate yield curve at all because the ZCIS pays at maturity T a single cash flow: 'floating' (I(T)/I(0)-1) for 'fixed' (1+quote)^T-1 cash flows. If you equate these, you can back out the 'fair' I(T) which I used above.
Assuming your value I(0)=252.0 is correct, your CPI curve would look like this:
import QuantLib as ql
import pandas as pd
fixing_rate = 252.0
observation_date = ql.Date(17, 4, 2019)
zciisData = [(ql.Date(18,4,2020), 1.9948999881744385),
(ql.Date(18,4,2021), 1.9567999839782715),
(ql.Date(18,4,2022), 1.9566999673843384),
(ql.Date(18,4,2023), 1.9639999866485596),
(ql.Date(18,4,2024), 2.017400026321411),
(ql.Date(18,4,2025), 2.0074000358581543),
(ql.Date(18,4,2026), 2.0297999382019043),
(ql.Date(18,4,2027), 2.05430006980896),
(ql.Date(18,4,2028), 2.0873000621795654),
(ql.Date(18,4,2029), 2.1166999340057373),
(ql.Date(18,4,2031), 2.152100086212158),
(ql.Date(18,4,2034), 2.18179988861084),
(ql.Date(18,4,2039), 2.190999984741211),
(ql.Date(18,4,2044), 2.2016000747680664),
(ql.Date(18,4,2049), 2.193000078201294)]
fixing_dates = []
CPI_computed = []
for tenor, quote in zciisData:
fixing_dates.append(tenor - ql.Period('3M')) # this is 'fixing date' t
pay_date = ql.ActualActual().yearFraction(observation_date, tenor) # this is year fraction of 'pay date' T
CPI_computed.append(fixing_rate * (1+quote/100)**(pay_date))
results = pd.DataFrame({'date': pd.Series(fixing_dates).apply(ql.Date.to_date), 'CPI':CPI_computed})
display(results)
results.set_index('date')['CPI'].plot();

Year over Year Variance Calculation Error "EARLIER/EARLIEST" refers to an earlier row context which doesn't exist

While trying to calculate Year over Year variance (unsuccessfully for 2 days now), I get the following error message.
EARLIER/EARLIEST refers to an earlier row context which doesn't exist.
YOY Variance = var PreviousYearPrinBal = CALCULATE(SUM(Deals[Principal Balance]),FILTER(ALL(Deals[Close Date].[Year]),Deals[Close Date].[Year] = EARLIER(Deals[Close Date].[Year])))
return
if(PreviousYearPrinBal = BLANK(), BLANK(), Deals[PrincipalBalance] - PreviousYearPrinBal)
In a different SO question, there is a different approach which gives me the following error:
A column specified in the call to function 'SAMEPERIODLASTYEAR' is not of type DATE. This is not supported.
yoy = CALCULATE([PrincipalBalance], SAMEPERIODLASTYEAR(Deals[Close Date].[Year]))
While I have some idea of what these mean, I do not have an idea of how to fix them. Here is my table.
And Here is what I expect as the result.
I've tried posting this question in Power BI community but haven't received an answer yet. Calculate Year over Year Variance.
ACTUAL DATA SAMPLE:
1) Created Year and Year Difference Column (Calculated Column)
Year = YEAR(Table1[Date])
Year Difference = Table1[Year] - Min(Table1[Year])
2) Created the Variance (Measure)
Variance =
Var current_YearDifference = SELECTEDVALUE(Table1[Year Difference])
Var Current_PrincipalBalance = CALCULATE(SUM(Table1[Principal Balance]),FILTER(ALL(Table1), Table1[Year Difference] = (current_YearDifference)))
Var Previous_PrincipalBalance = CALCULATE(SUM(Table1[Principal Balance]),FILTER(ALL(Table1), Table1[Year Difference] = (current_YearDifference - 1)))
Return if(current_YearDifference <> 0, (Current_PrincipalBalance - Previous_PrincipalBalance), 0)
3) Finally Created the Variance in terms of Percentage (Measure),
Variance in terms of Percentage =
Var current_YearDifference = SELECTEDVALUE(Table1[Year Difference])
Var Current_PrincipalBalance = CALCULATE(SUM(Table1[Principal Balance]),FILTER(ALL(Table1), Table1[Year Difference] = (current_YearDifference)))
Var Previous_PrincipalBalance = CALCULATE(SUM(Table1[Principal Balance]),FILTER(ALL(Table1), Table1[Year Difference] = (current_YearDifference - 1)))
Return if(current_YearDifference <> 0, ((Current_PrincipalBalance - Previous_PrincipalBalance) / Previous_PrincipalBalance), 0)
My Final Output
The Principal Balance has the function SUM selected on the Values Pane of the Output Table, where as the Year is Don't Summarize.
My Best Practice
Always Use Vars when creating Complex Measures to simplify the
formula.
Then return only a part of the Measure to check if the output is as expected.
Kindly let me know, if it helps or not.