Hierarchical modeling categorical variable interactions in PyMC3 - pymc3

I'm attempting to use PyMC3 to implement a hierarchical model with categorical variables and their interactions. In R, the formula would take the form of something like:
y ~ x1 + x2 + x1:x2
However, the tutorial at https://pymc-devs.github.io/pymc3/GLM-hierarchical/#partial-pooling-hierarchical-regression-aka-the-best-of-both-worlds explicitly says that glm doesn't play nicely with hierarchical modeling yet.
So how would I go about adding the x1:x2 term? Would it be a categorical variable with two categorical parents (x1 and x2)?

You can just manually add the interaction term to your linear model. You would have to add 3 regression coefficients (betas) and one intercept. You can then estimate your y with a likelihood as follows:
y = pm.Normal('regression',
              mu=intercept + beta_x1 * data_x1 + beta_x2 * data_x2
                 + beta_interaction * data_x1 * data_x2,
              sd=sigma,
              observed=data_y)
The parameters themselves can all have hyperpriors to build a hierarchical model.
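Stripped of priors for a moment, the interaction regressor is nothing more than the elementwise product of the two (dummy-coded) inputs. A quick NumPy sanity check with simulated data (all names hypothetical) showing that ordinary least squares recovers all four coefficients from such a design matrix:

```python
import numpy as np

rng = np.random.RandomState(0)
n = 500

# two dummy-coded categorical predictors
x1 = rng.randint(0, 2, n).astype(float)
x2 = rng.randint(0, 2, n).astype(float)

# true parameters: intercept, beta_x1, beta_x2, beta_interaction
true = np.array([1.0, 2.0, -1.0, 0.5])
y = true[0] + true[1] * x1 + true[2] * x2 + true[3] * x1 * x2 \
    + 0.1 * rng.randn(n)

# the interaction column is just the elementwise product x1 * x2
X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```

In the PyMC3 model the same x1 * x2 column simply enters mu with its own beta_interaction coefficient, which, like the other parameters, can be given a hyperprior.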

Related

Trying to calculate the indirect effects in a multilevel structural equation model using Stata. Can someone confirm that I have this code right?

I'm trying to build a multilevel structural equation model in Stata, using Example 42g in the Stata handbook as a guide. My code is working, but I'm going off-book a little in calculating the indirect effects and could just use a sanity check from a pro.
I'm testing a moderated mediation model in which
Path A = fear <- i.theme + i.condition + i.theme#i.condition + easyseec
Path B = ppt_ave <- fear
Path C' = ppt_ave <- i.theme + i.condition + i.theme#i.condition + easyseec
The model uses nested data -- easyseec is nested under subjects, so I've added a random intercept and random slope under the variable responseid.
The .gsem output roughly tracks with the output of the same model built in R with lmer, so I feel good about that. But I could really use an eyeball on the .nlcom below. This should calculate the indirect effect for a particular theme/condition contrast, right?
. gsem (ppt_ave <- fear i.theme i.condition i.theme#i.condition easyseec M1[responseid]) (fear <- i.theme i.condition i.theme#i.condition easyseec M2[responseid])
. nlcom _b[ppt_ave:fear]*(_b[fear:2.theme]+_b[fear:2.condition]+_b[fear:2.theme#2.condition]+_b[fear:easyseec])

How to apply a mask to a DataFrame in Python?

My dataset ds_f is an 840x57 DataFrame that contains NaN values. I want to forecast a variable with a linear regression model, but when I try to fit the model I get the error "SVD did not converge":
X = ds_f[ds_f.columns[:-1]]
y = ds_f['target_o_tempm']
model = sm.OLS(y,X) #stackmodel
f = model.fit() #ERROR
So I've been searching for a way to apply a mask to a DataFrame. I tried creating a masked array to "ignore" the NaN values and then converting it back into a DataFrame, but I get the same DataFrame as ds_f; nothing changes:
m = ma.masked_array(ds_f, np.isnan(ds_f))
m_ds_f = pd.DataFrame(m,columns=ds_f.columns)
EDIT: I've solved the fitting problem by writing model = sm.OLS(X, y, missing='drop'), but a new problem appears: when I display the results, I get only NaN.
Are you using statsmodels? If so, you could specify sm.OLS(y, X, missing='drop'), to drop the NaN values prior to estimation.
Alternatively, you may want to consider interpolating the missing values, rather than dropping them.
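To make the two options concrete, here is a small pandas sketch on a toy frame (column names hypothetical): dropping incomplete rows, which is roughly what missing='drop' does internally, versus filling the gaps by linear interpolation:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.0, 2.0, np.nan, 4.0],
                   "y": [2.0, np.nan, 6.0, 8.0]})

# Option 1: drop any row containing a NaN before fitting
dropped = df.dropna()

# Option 2: fill the gaps by linear interpolation instead of discarding rows
filled = df.interpolate()
```

Dropping keeps only the two complete rows, while interpolation preserves all 840 observations in a case like the question's, at the cost of inventing in-between values.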

Extracting coefficients from sqreg in Stata

I am trying to run quantile regressions across deciles, and so I use the sqreg command to get bootstrap standard errors for every decile. However, after I run the regression (so Stata runs 9 different regressions - one for each decile except the 100th) I want to store the coefficients in locals. Normally, this is what I would do:
reg y x, r
local coeff = _b[x]
And things would work well. However, here my command is:
sqreg y x, q(0.1 0.2 0.3)
So, I will have three different coefficients here that I want to store as three different locals. Something like:
local coeff10 = _b[x] //Where _b[x] is the coefficient on x for the 10th quantile.
How do I do this? I tried:
local coeff10 = _b[[q10]x]
But this gives me an error. Please help!
Thank you!
Simply save the matrix of coefficients from the postestimation results and reference it by row and column.
The reason you cannot do this the same way as after plain OLS is that the sqreg coefficient matrix holds the estimates for all the quantiles side by side, so the coefficient names repeat across equations:
* OUTPUTS MATRIX OF COEFFICIENTS (1 X 6)
matrix list e(b)
* SAVE COEFF. MATRIX TO REGULAR MATRIX VARIABLE
mat b = e(b)
* EXTRACT BY ROW/COLUMN INTO OTHER VARIABLES
local coeff10 = b[1,1]
local coeff20 = b[1,3]
local coeff30 = b[1,5]

Stata: extract p-values and save them in a list

This may be a trivial question, but as an R user coming to Stata I have so far failed to find the correct Google terms to find the answer. I want to do the following steps:
Do a bunch of tests (e.g. lrtest results in a foreach loop)
Extract the p-value from each test and save them in a list of some kind
Have a list I can do further operations on (e.g. perform multiple comparison correction)
So I am wondering how to extract p-values (or similar) from command results and how to save them into a vector-like object that I can work with. Here is some R code that does something similar:
myData <- data.frame(a=rnorm(10), b=rnorm(10), c=rnorm(10)) ## generate some data
pValue <- c()
for (variableName in c("b", "c")) {
myModel <- lm(as.formula(paste("a ~", variableName)), data=myData) ## fit model
pValue <- c(pValue, coef(summary(myModel))[2, "Pr(>|t|)"]) ## extract p-value and save in vector
}
pValue * 2 ## do amazing multiple comparison correction
To me it seems like Stata has much less of a 'programming' mindset to it than R. If you have any general Stata literature recommendations for an R user who can program, that would also be appreciated.
Here is an approach that would save the p-values in a matrix and then you can manipulate the matrix, maybe using Mata or standard matrix manipulation in Stata.
matrix storeMyP = J(2, 1, .) //create an empty 2 x 1 matrix: one row per variable we loop over
matrix list storeMyP //look at the matrix
loc n = 0 //count the iterations
foreach variableName of varlist b c {
loc n = `n' + 1 //each iteration, adjust the count
reg a `variableName'
test `variableName' //this does an F-test, but for one variable it is equivalent to a t-test (see -help test-; there is a lot it can do)
matrix storeMyP[`n', 1] = `r(p)' //save the p-value in the matrix
}
matrix list storeMyP //look at your p-values
matrix storeMyP_2 = 2*storeMyP //replicating your example above
What's going on is that Stata automatically stores certain quantities after estimation and test commands. When a help file says this command stores the following values in r(), you refer to them in single quotes, e.g. `r(p)'.
It could also be interesting for you to convert the matrix column(s) into variables using svmat storeMyP, or see help svmat for more info.

Nonlinear least squares in Stata, how to model summation over variables/sets?

I would like to estimate the following function by nonlinear least squares using Stata:
I am replicating the results of another paper and would like to use Stata, since it is the same software/solver used in that paper, and because it should be easier than using GAMS, for example.
My problem is that I cannot find any way to write out the sum part of the equation above. In my data, each i is a single observation, with the values for the j's stored in separate variables. I could write out the whole expression in the following manner (expanding the sum over three j's):
nl (ln_wage = {alpha0} + {alpha1}*log( ((S_over_H_1)^{alpha2})*exp({alpha3}*distance_1) + ((S_over_H_2)^{alpha2})*exp({alpha3}*distance_2) + ((S_over_H_3)^{alpha2})*exp({alpha3}*distance_3) ))
Is there a simple way to tell Stata to sum over an expression/variables for a given set of numbers, like in GAMS where you can write:
lnwage(i) = alpha0 + alpha1*ln(sum((j), power(S_over_H(i,j),alpha2) * exp(alpha3 * distance(i,j))))
There is no direct equivalent in Stata of the GAMS notation you cite, but you could build the sum in a local macro and substitute it into the nl command:
local call 0 //seed with a harmless zero so every term can be prefixed with +
forval j = 1/3 {
    local call `call' + (S_over_H_`j')^({alpha2}) * exp({alpha3} * distance_`j')
}
nl (ln_wage = {alpha0} + {alpha1} * ln(`call'))
P.S. please explain what GAMS is.