Stata multinomial regression - post-estimation Wald test

I've conducted a multinomial logistic regression analysis in Stata, followed by a Wald test, and was hoping someone could confirm that my code is doing what I think it's doing.
NB: I'm using some of Stata's example data to illustrate. The analysis I'm running for this illustration is completely meaningless, but uses the same procedure as my 'real' analysis, other than the fact that my real analysis also includes some probability weights and other covariates.
sysuse auto.dta
First, I run a multinomial logistic regression, predicting 'Repair Record' from 'Foreign' and 'Price':
mlogit rep78 i.foreign price, base(1) rrr nolog
Multinomial logistic regression Number of obs = 69
LR chi2(8) = 31.15
Prob > chi2 = 0.0001
Log likelihood = -78.116372 Pseudo R2 = 0.1662
------------------------------------------------------------------------------
rep78 | RRR Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
1 | (base outcome)
-------------+----------------------------------------------------------------
2 |
foreign |
Foreign | .7822853 1672.371 -0.00 1.000 0 .
price | 1.000414 .0007027 0.59 0.556 .9990375 1.001792
_cons | .5000195 1.669979 -0.21 0.836 .000718 348.2204
-------------+----------------------------------------------------------------
3 |
foreign |
Foreign | 686842 1.30e+09 0.01 0.994 0 .
price | 1.000462 .0006955 0.66 0.507 .9990996 1.001826
_cons | 1.254303 4.106511 0.07 0.945 .0020494 767.6863
-------------+----------------------------------------------------------------
4 |
foreign |
Foreign | 6177800 1.17e+10 0.01 0.993 0 .
price | 1.000421 .0006999 0.60 0.547 .9990504 1.001794
_cons | .5379627 1.7848 -0.19 0.852 .0008067 358.7452
-------------+----------------------------------------------------------------
5 |
foreign |
Foreign | 2.79e+07 5.29e+10 0.01 0.993 0 .
price | 1.000386 .0007125 0.54 0.587 .9989911 1.001784
_cons | .146745 .5072292 -0.56 0.579 .0001676 128.4611
------------------------------------------------------------------------------
Second, I want to know whether the 'Foreign' coefficient for outcome category 4 is significantly different from the 'Foreign' coefficient for outcome category 5. So, I run a Wald test:
test [4]1.foreign = [5]1.foreign
( 1) [4]1.foreign - [5]1.foreign = 0
chi2( 1) = 2.72
Prob > chi2 = 0.0988
From this, I conclude that the 'Foreign' coefficient for outcome category 4 is NOT significantly different from the 'Foreign' coefficient for outcome category 5. Put more simply, there is no evidence that the association between 'Foreign' and 'Repair 4' (compared to 'Repair 1') differs from the association between 'Foreign' and 'Repair 5' (compared to 'Repair 1').
Is my code for the Wald test, and my inferences about what it's doing and showing, correct?

In addition to what was discussed in the comments, you can also perform a likelihood-ratio test using the following code.
sysuse auto.dta
qui mlogit rep78 i.foreign price, base(1) rrr nolog
estimates store unrestricted
constraint 1 [4]1.foreign = [5]1.foreign
qui mlogit rep78 i.foreign price, base(1) rrr nolog constraints(1)
estimates store restricted
lrtest unrestricted restricted
The output of the test leads to the same conclusion as the Wald test, but the likelihood-ratio test has better properties, as explained below.
Likelihood-ratio test LR chi2(1) = 3.13
(Assumption: restricted nested in unrestricted) Prob > chi2 = 0.0771
Quoting the official documentation for mlogit:
The results produced by test are an approximation based on the estimated covariance matrix of the coefficients. Because the probability of being uninsured is low, the log-likelihood may be nonlinear for the uninsured. Conventional statistical wisdom is not to trust the asymptotic answer under these circumstances but to perform a likelihood-ratio test instead.
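As an aside, the same hypothesis can also be written with test's cross-equation shorthand, which I believe is equivalent to the test command above; a minimal sketch:
* test that the coefficient on 1.foreign is equal in equations 4 and 5
test [4=5]: 1.foreign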

Related

How can I predict a dependent variable at the regressors' sample means?

I am working with several kinds of regressions in Stata (probit, logit, quantile regression, ...). I would like to know how to predict the dependent variable at the regressors' sample means. This is straightforward for OLS, but I don't see how to get it for a quantile regression.
The margins command is useful for this:
. sysuse auto
(1978 Automobile Data)
. qreg price weight length i.foreign, nolog
Median regression Number of obs = 74
Raw sum of deviations 71102.5 (about 4934)
Min sum of deviations 54411.29 Pseudo R2 = 0.2347
------------------------------------------------------------------------------
price | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
weight | 3.933588 1.328718 2.96 0.004 1.283543 6.583632
length | -41.25191 45.46469 -0.91 0.367 -131.9284 49.42456
|
foreign |
Foreign | 3377.771 885.4198 3.81 0.000 1611.857 5143.685
_cons | 344.6489 5182.394 0.07 0.947 -9991.31 10680.61
------------------------------------------------------------------------------
. margins, at((mean) _continuous (base) _factor)
Warning: cannot perform check for estimable functions.
Adjusted predictions Number of obs = 74
Model VCE : IID
Expression : Linear prediction, predict()
at : weight = 3019.459 (mean)
length = 187.9324 (mean)
foreign = 0
------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_cons | 4469.386 418.7774 10.67 0.000 3648.597 5290.175
This predicts the median at means of the covariates for continuous variables and the base for the dummies (so you can avoid nonsensical values like fractionally pregnant).
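If you instead want the dummies held at their sample means as well (so foreign enters as the observed share of foreign cars rather than the base level), the atmeans option is the usual route; a minimal sketch:
* hold every covariate at its sample mean; factor variables enter
* as the proportion of observations in each level
margins, atmeans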

Stata xtoverid command error

I have panel data and my regression is of the form:
s_roa1 = s_roa + c_roa
I am new to Stata and I am trying to use the xtoverid command for a robust Hausman test to help me choose between a fixed- or random-effects model:
xtoverid s_roa1 s_roa c_roa, fe i (year)
However, I get the following error:
varlist not allowed
Can anyone help me understand what this suggests?
First of all, xtoverid is a community-contributed command, something you fail to make clear in your question. It is customary and useful to provide this information right from the start, so others know that you are not referring to an official, built-in command.
Second, this is a post-estimation command, which means you run it directly after estimating your model with xtreg, xtivreg, xtivreg2 or xthtaylor.
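If you do not have it installed yet, it can be downloaded from SSC; a minimal sketch (ivreg2 and ranktest are, as far as I recall, required dependencies):
ssc install xtoverid
* xtoverid relies on other community-contributed commands; as far as
* I recall, ivreg2 and ranktest are needed as well
ssc install ivreg2
ssc install ranktest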
The help file provided by the authors offers an enlightening example:
. webuse nlswork
(National Longitudinal Survey. Young Women 14-26 years of age in 1968)
. tsset idcode year
panel variable: idcode (unbalanced)
time variable: year, 68 to 88, but with gaps
delta: 1 unit
.
. gen age2=age^2
(24 missing values generated)
. gen black=(race==2)
.
. xtivreg ln_wage age (tenure = union south), fe i(idcode)
Fixed-effects (within) IV regression Number of obs = 19,007
Group variable: idcode Number of groups = 4,134
R-sq: Obs per group:
within = . min = 1
between = 0.1261 avg = 4.6
overall = 0.0869 max = 12
Wald chi2(2) = 142054.65
corr(u_i, Xb) = -0.6875 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
ln_wage | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
tenure | .2450528 .0382041 6.41 0.000 .1701741 .3199314
age | -.0650873 .0126167 -5.16 0.000 -.0898156 -.040359
_cons | 2.826672 .2451883 11.53 0.000 2.346112 3.307232
-------------+----------------------------------------------------------------
sigma_u | .71990151
sigma_e | .64315554
rho | .55612637 (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(4133,14871) = 1.53 Prob > F = 0.0000
------------------------------------------------------------------------------
Instrumented: tenure
Instruments: age union south
------------------------------------------------------------------------------
.
. xtoverid
Test of overidentifying restrictions:
Cross-section time-series model: xtivreg fe
Sargan-Hansen statistic 0.965 Chi-sq(1) P-value = 0.3259
. xtoverid, robust
Test of overidentifying restrictions:
Cross-section time-series model: xtivreg fe robust
Sargan-Hansen statistic 0.960 Chi-sq(1) P-value = 0.3271
. xtoverid, cluster(idcode)
Test of overidentifying restrictions:
Cross-section time-series model: xtivreg fe robust cluster(idcode)
Sargan-Hansen statistic 0.495 Chi-sq(1) P-value = 0.4818
From Stata's command prompt, type help xtoverid for more details.
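For the fixed- versus random-effects question specifically, the usual workflow is to fit the random-effects model with cluster-robust standard errors and then call xtoverid with no arguments; a minimal sketch, where panelvar and timevar are placeholders for your own identifiers:
* declare the panel structure (panelvar and timevar are placeholders)
xtset panelvar timevar
* fit the random-effects model with cluster-robust standard errors
xtreg s_roa1 s_roa c_roa, re cluster(panelvar)
* Sargan-Hansen test of the random-effects restrictions;
* a rejection favors the fixed-effects model
xtoverid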

how to regress categorical variables in Stata

I'm trying to fit a multinomial logit and my independent variables are categorical. I have two dummy variables: edu1 for those with high school degrees (edu1=1 with, edu1=0 without) and edu2 for those with college degrees. I want the results so that I can compare against those who have college degrees. However, when I do mlogit edu*, the model automatically includes edu1, not edu2. Is there a way to reverse this, so that edu2 is included and edu1 is omitted instead?
You can't have both in the model unless you drop the constant. Google "dummy variable trap" to see why. Here's an example:
. webuse sysdsn1, clear
(Health insurance data)
. recode male (0=1) (1=0), gen(female)
(644 differences between male and female)
. mlogit insure male female, nocons nolog
Multinomial logistic regression Number of obs = 616
Wald chi2(4) = 149.44
Log likelihood = -553.40712 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
insure | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Indemnity | (base outcome)
-------------+----------------------------------------------------------------
Prepaid |
male | .3001046 .1703301 1.76 0.078 -.0337363 .6339455
female | -.1772065 .0968274 -1.83 0.067 -.3669847 .0125718
-------------+----------------------------------------------------------------
Uninsure |
male | -1.529395 .3059244 -5.00 0.000 -2.128996 -.9297944
female | -1.989585 .1884768 -10.56 0.000 -2.358993 -1.620177
------------------------------------------------------------------------------
. mlogit insure male, nolog
Multinomial logistic regression Number of obs = 616
LR chi2(2) = 6.38
Prob > chi2 = 0.0413
Log likelihood = -553.40712 Pseudo R2 = 0.0057
------------------------------------------------------------------------------
insure | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Indemnity | (base outcome)
-------------+----------------------------------------------------------------
Prepaid |
male | .477311 .1959283 2.44 0.015 .0932987 .8613234
_cons | -.1772065 .0968274 -1.83 0.067 -.3669847 .0125718
-------------+----------------------------------------------------------------
Uninsure |
male | .46019 .3593233 1.28 0.200 -.2440708 1.164451
_cons | -1.989585 .1884768 -10.56 0.000 -2.358993 -1.620177
------------------------------------------------------------------------------
Note that in the second specification, the constant is the female effect and males get the constant plus the male coefficient. This matches what you get with the no-constant specification above.
If you have other dummies in the model, things get a bit more complicated. The constant will correspond to all the omitted categories from each set of dummy variables.
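If your education categories can be combined into a single variable, factor-variable notation with the ib#. prefix lets you choose which category is omitted, which sounds like what you are after; a minimal sketch, where depvar and the coding of edu (1 = high school, 2 = college) are hypothetical:
* combine the two dummies into one categorical variable (hypothetical
* coding; observations with neither degree are left missing here)
gen edu = 1 if edu1 == 1
replace edu = 2 if edu2 == 1
* ib2. makes college (edu = 2) the omitted base category, so the
* high school coefficient is measured relative to college graduates
mlogit depvar ib2.edu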

include panel-specific trends in a first-difference regression

I was wondering if there's a way to include panel-specific or just varying trends in a first-difference regression when clustering on the panel id and the time variable.
Here's an example with Stata:
. webuse nlswork
(National Longitudinal Survey. Young Women 14-26 years of age in 1968)
. ivreg2 S1.(ln_wage tenure) , cluster(idcode year)
OLS estimation
--------------
Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity and clustering on idcode and year
Number of clusters (idcode) = 3660 Number of obs = 10528
Number of clusters (year) = 8 F( 1, 7) = 2.81
Prob > F = 0.1378
Total (centered) SS = 1004.098948 Centered R2 = 0.0007
Total (uncentered) SS = 1035.845686 Uncentered R2 = 0.0314
Residual SS = 1003.36326 Root MSE = .3087
------------------------------------------------------------------------------
| Robust
S.ln_wage | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
tenure |
S1. | .0076418 .0042666 1.79 0.073 -.0007206 .0160043
|
_cons | .0501738 .0070986 7.07 0.000 .0362608 .0640868
------------------------------------------------------------------------------
Included instruments: S.tenure
------------------------------------------------------------------------------
. ivreg2 S1.(ln_wage tenure i.c_city), cluster(idcode year)
factor variables not allowed
r(101);
In the specification above, the constant corresponds to a common time trend. Putting the factor variable outside the seasonal-difference operator produces an error as well.
I understand that the differencing operator does not play well with factor variables or interactions, but I feel there must be some hack to get around that.
The use of ivreg2 is a bit of a red herring. I am not doing IV estimation; I just want to use two-way clustering.
You get the same solution as @Metrics if you do xi: ivreg2 S1.(ln_wage tenure) i.ind_code, cluster(idcode year)
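If you would rather avoid xi, the same hack can be done by building the dummies by hand with tabulate's gen() option; a minimal sketch (one dummy is dropped to avoid the dummy variable trap):
webuse nlswork, clear
* create one indicator variable per industry code
tab ind_code, gen(ind_)
* drop one category to avoid the dummy variable trap
drop ind_1
* dummies in a first-difference regression act as group-specific
* linear trends in levels
ivreg2 S1.(ln_wage tenure) ind_*, cluster(idcode year)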

What command in Stata 12 do I use to interpret the coefficients of the Limited Dependent Variable model?

I am running the following code:
oprobit var1 var2 var3 var4 var5 var2##var3 var4##var5 var6 var7 etc.
Without the interaction terms I could have used the following code to interpret the coefficients:
mfx compute, predict(outcome(2))
[for outcome equaling 2 (in total I have 4 outcomes)]
But since mfx does not work with the interaction terms, I get an error.
I tried to use the margins command, but it did not work either:
margins var2 var3 var4 var5 var2##var3 var4##var5 var6 var7 etc... , post
margins works ONLY when I leave out the interaction terms: (margins var2 var3 var4 var5, post)
What command do I use to be able to interpret BOTH interaction and regular variables?
Finally, to use simple language, my question is: given the regression model above, what command can I use to interpret the coefficients?
mfx is an old command that has been superseded by margins, which is why it does not work with the factor-variable notation you used to define the interactions. I am not clear on what you actually intended to calculate with the margins command.
Here's an example of how you can get the average marginal effects on the probability of outcome 2:
. webuse fullauto
(Automobile Models)
. oprobit rep77 i.foreign c.weight c.length##c.mpg
Iteration 0: log likelihood = -89.895098
Iteration 1: log likelihood = -76.800575
Iteration 2: log likelihood = -76.709641
Iteration 3: log likelihood = -76.709553
Iteration 4: log likelihood = -76.709553
Ordered probit regression Number of obs = 66
LR chi2(5) = 26.37
Prob > chi2 = 0.0001
Log likelihood = -76.709553 Pseudo R2 = 0.1467
--------------------------------------------------------------------------------
rep77 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
---------------+----------------------------------------------------------------
1.foreign | 1.514739 .4497962 3.37 0.001 .633155 2.396324
weight | -.0005104 .0005861 -0.87 0.384 -.0016593 .0006384
length | .0969601 .0348506 2.78 0.005 .0286542 .165266
mpg | .4747249 .2241349 2.12 0.034 .0354286 .9140211
|
c.length#c.mpg | -.0020602 .0013145 -1.57 0.117 -.0046366 .0005161
---------------+----------------------------------------------------------------
/cut1 | 17.21885 5.386033 6.662419 27.77528
/cut2 | 18.29469 5.416843 7.677877 28.91151
/cut3 | 19.66512 5.463523 8.956814 30.37343
/cut4 | 21.12134 5.515901 10.31038 31.93231
--------------------------------------------------------------------------------
. margins, dydx(*) predict(outcome(2))
Average marginal effects Number of obs = 66
Model VCE : OIM
Expression : Pr(rep77==2), predict(outcome(2))
dy/dx w.r.t. : 1.foreign weight length mpg
------------------------------------------------------------------------------
| Delta-method
| dy/dx Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.foreign | -.2002434 .0576487 -3.47 0.001 -.3132327 -.087254
weight | .0000828 .0000961 0.86 0.389 -.0001055 .0002711
length | -.0088956 .003643 -2.44 0.015 -.0160356 -.0017555
mpg | -.012849 .0085546 -1.50 0.133 -.0296157 .0039178
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
If you want the prediction, rather than the marginal effect, try
margins, predict(outcome(2))
The marginal effect of just the interaction term is harder to calculate in a non-linear model. Details here.
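One common workaround is to trace how the marginal effect of one interacted variable changes across values of the other; a minimal sketch continuing the example above, with the length values chosen purely for illustration:
* marginal effect of mpg on Pr(rep77==2) at several values of length,
* tracing out the work done by the c.length#c.mpg interaction
margins, dydx(mpg) predict(outcome(2)) at(length=(170 190 210))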