Difference between two means and medians in Stata

I want to export the p-values for the differences in means and medians from Stata to LaTeX. I tried the code below, and it works for the p-value of the difference in means. However, I don't know how to add the analysis of the difference between medians. Can you please help me with this?
eststo control: quietly estpost summarize a b c if treated == 0
eststo treated: quietly estpost summarize a b c if treated == 1
eststo diff: quietly estpost ttest a b c, by(treated) unequal
esttab using means_medians.tex, replace mlabels("Control" "Treated" "Difference") ///
    cells("mean(pattern(1 1 0) fmt(2)) sd(pattern(1 1 0) fmt(3)) b(star pattern(0 0 1) fmt(2))") label

You can use epctile, which is implemented as a proper estimation command and returns e(V) and the estimation table. (Disregard the ugly output in the first part. I wrote it about 10 years ago for a specific project, and it seems I never made it pretty enough for general use.)
. sysuse auto, clear
(1978 Automobile Data)
. epctile mpg, p(50)
Mean estimation Number of obs = 74
--------------------------------------------------------------
| Mean Std. Err. [95% Conf. Interval]
-------------+------------------------------------------------
__000006 | -.027027 .058435 -.1434878 .0894338
--------------------------------------------------------------
Percentile estimation
------------------------------------------------------------------------------
mpg | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
p50 | 20 .75 26.67 0.000 18.53003 21.46997
------------------------------------------------------------------------------
This command is not on SSC; it is on my page, so to install it, follow the prompts from findit epctile.
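If the goal is specifically to get the median difference and its p-value into an esttab table like the one in the question, a hedged sketch (my addition, assuming treated is a 0/1 indicator) is to run qreg of each outcome on the treatment dummy: the coefficient on 1.treated estimates the difference in (conditional) medians, and because qreg is a standard estimation command, eststo/esttab can store and export it just like any other model. This produces one column per outcome rather than the single stacked column that estpost ttest gives.
* Sketch: median differences via qreg; the filename is arbitrary.
eststo med_a: quietly qreg a i.treated
eststo med_b: quietly qreg b i.treated
eststo med_c: quietly qreg c i.treated
esttab med_a med_b med_c using median_diffs.tex, replace ///
    keep(1.treated) b(2) p(3) label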

Related

Stata multinomial regression - post-estimation Wald test

I've conducted a multinomial logistic regression analysis in Stata, followed by a Wald test, and was hoping someone could confirm that my code is doing what I think it's doing.
NB: I'm using some of Stata's example data to illustrate. The analysis I'm running for this illustration is completely meaningless, but uses the same procedure as my 'real' analysis, other than the fact that my real analysis also includes some probability weights and other covariates.
sysuse auto.dta
First, I run a multinomial logistic regression, predicting 'Repair Record' from 'Foreign' and 'Price':
mlogit rep78 i.foreign price, base(1) rrr nolog
Multinomial logistic regression Number of obs = 69
LR chi2(8) = 31.15
Prob > chi2 = 0.0001
Log likelihood = -78.116372 Pseudo R2 = 0.1662
------------------------------------------------------------------------------
rep78 | RRR Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
1 | (base outcome)
-------------+----------------------------------------------------------------
2 |
foreign |
Foreign | .7822853 1672.371 -0.00 1.000 0 .
price | 1.000414 .0007027 0.59 0.556 .9990375 1.001792
_cons | .5000195 1.669979 -0.21 0.836 .000718 348.2204
-------------+----------------------------------------------------------------
3 |
foreign |
Foreign | 686842 1.30e+09 0.01 0.994 0 .
price | 1.000462 .0006955 0.66 0.507 .9990996 1.001826
_cons | 1.254303 4.106511 0.07 0.945 .0020494 767.6863
-------------+----------------------------------------------------------------
4 |
foreign |
Foreign | 6177800 1.17e+10 0.01 0.993 0 .
price | 1.000421 .0006999 0.60 0.547 .9990504 1.001794
_cons | .5379627 1.7848 -0.19 0.852 .0008067 358.7452
-------------+----------------------------------------------------------------
5 |
foreign |
Foreign | 2.79e+07 5.29e+10 0.01 0.993 0 .
price | 1.000386 .0007125 0.54 0.587 .9989911 1.001784
_cons | .146745 .5072292 -0.56 0.579 .0001676 128.4611
------------------------------------------------------------------------------
Second, I want to know whether the 'Foreign' coefficient for outcome category 4 is significantly different from the 'Foreign' coefficient for outcome category 5. So, I run a Wald test:
test [4]1.foreign = [5]1.foreign
( 1) [4]1.foreign - [5]1.foreign = 0
chi2( 1) = 2.72
Prob > chi2 = 0.0988
From this, I conclude that the 'Foreign' coefficient for outcome category 4 is NOT significantly different from the 'Foreign' coefficient for outcome category 5. Put more simply, the association between 'Foreign' and 'Repair 4' (compared to 'Repair 1') is equal to the association between 'Foreign' and 'Repair 5' (compared to 'Repair 1').
Is my code for the Wald test, and my inferences about what it's doing and showing, correct?
In addition to what was discussed in the comments, you can also perform a likelihood-ratio test using the following code.
sysuse auto.dta
qui mlogit rep78 i.foreign price, base(1) rrr nolog
estimate store unrestricted
constraint 1 [4]1.foreign = [5]1.foreign
qui mlogit rep78 i.foreign price, base(1) rrr nolog constraints(1)
estimate store restricted
lrtest unrestricted restricted
The output of the test shows the same conclusion as the Wald test, but it has better properties as explained below.
Likelihood-ratio test LR chi2(1) = 3.13
(Assumption: restricted nested in unrestricted) Prob > chi2 = 0.0771
Quoting the official documentation for mlogit:
The results produced by test are an approximation based on the estimated covariance matrix of the coefficients. Because the probability of being uninsured is low, the log-likelihood may be nonlinear for the uninsured. Conventional statistical wisdom is not to trust the asymptotic answer under these circumstances but to perform a likelihood-ratio test instead.
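As a small supplement to both tests above (my addition, not part of the original answer): after the unrestricted mlogit you can also ask lincom for the estimated difference itself, on the log-RRR scale, together with its standard error and confidence interval. The z statistic it reports is the same Wald comparison that test expresses as a chi-squared.
* After the unrestricted mlogit fit:
lincom [4]1.foreign - [5]1.foreign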

Should tabstat and centile give different results for percentiles?

. sysuse auto
(1978 Automobile Data)
. centile price, centile(25 75)
-- Binom. Interp. --
Variable | Obs Percentile Centile [95% Conf. Interval]
-------------+-------------------------------------------------------------
price | 74 25 4193 4009.467 4501.838
| 75 6378 5798.432 9691.6
. tabstat price, stat(p25 p75)
variable | p25 p75
-------------+--------------------
price | 4195 6342
When making the calculations by hand, my answers agree with the centile command and disagree with the tabstat command (bonus: they also disagree with the summarize, detail command).
Where is this discrepancy (25th percentile: 4193 vs 4195, and 75th percentile: 6378 vs 6342) coming from?
I am using Stata 15.1 for Unix.
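As a hedged check (my addition; the mapping between options is my reading of the manuals, so please verify): _pctile supports both percentile definitions, so you can reproduce the two sets of numbers side by side. Its default rule is the one summarize and tabstat use, while altdef requests the interpolation-based formula that centile uses by default.
* Sketch: the same percentiles under the two definitions.
sysuse auto, clear
_pctile price, p(25 75)           // default definition (as in summarize/tabstat)
display r(r1) "  " r(r2)
_pctile price, p(25 75) altdef    // alternative (interpolation) definition
display r(r1) "  " r(r2)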

How can I predict a dependent variable at the regressors' sample means?

I am working with several kinds of regressions in Stata (probit, logit, quantile regression, ...). I would like to know how to predict the dependent variable at the regressors' sample means. This is straightforward for OLS, but I don't see how to get it for a quantile regression.
The margins command is useful for this:
. sysuse auto
(1978 Automobile Data)
. qreg price weight length i.foreign, nolog
Median regression Number of obs = 74
Raw sum of deviations 71102.5 (about 4934)
Min sum of deviations 54411.29 Pseudo R2 = 0.2347
------------------------------------------------------------------------------
price | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
weight | 3.933588 1.328718 2.96 0.004 1.283543 6.583632
length | -41.25191 45.46469 -0.91 0.367 -131.9284 49.42456
|
foreign |
Foreign | 3377.771 885.4198 3.81 0.000 1611.857 5143.685
_cons | 344.6489 5182.394 0.07 0.947 -9991.31 10680.61
------------------------------------------------------------------------------
. margins, at((mean) _continuous (base) _factor)
Warning: cannot perform check for estimable functions.
Adjusted predictions Number of obs = 74
Model VCE : IID
Expression : Linear prediction, predict()
at : weight = 3019.459 (mean)
length = 187.9324 (mean)
foreign = 0
------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_cons | 4469.386 418.7774 10.67 0.000 3648.597 5290.175
This predicts the median at the means of the covariates for the continuous variables and at the base level for the dummies (so you can avoid nonsensical values like being fractionally pregnant).
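The same margins trick works after the other estimators the question mentions. As a hedged sketch (model and variables chosen purely for illustration), after a probit with only continuous regressors you can request the predicted probability with each covariate held at its sample mean; with factor variables in the model, you would add the (base) _factor piece exactly as in the qreg example above.
* Sketch: prediction at the sample means after a probit.
sysuse auto, clear
probit foreign weight mpg
margins, at((mean) _continuous)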

Using two different versions of Stata

I am working in two locations; in one I am using Stata 13 and in the other Stata 14.
Can I build a do-file that works in both versions even if some specific command has changed?
For instance, the following code will not work in Stata 13
sysuse auto, clear
ci means mpg price, level(90)
but this one works:
sysuse auto, clear
ci mpg price, level(90)
Using Stata 14, it is the opposite.
I thought about adding capture, but then nothing happens in either Stata 13 or Stata 14.
. sysuse auto, clear
(1978 Automobile Data)
. capture ci means mpg price, level(90)
. capture ci mpg price, level(90)
Update: Adding noisily after capture didn't help, unfortunately. Here is an example with Stata 14 that works:
. sysuse auto, clear
(1978 Automobile Data)
. capture noisily ci mpg price, level(90)
you must specify one of means, proportions, or variances following ci
. capture noisily ci means mpg price, level(90)
Variable | Obs Mean Std. Err. [90% Conf. Interval]
-------------+---------------------------------------------------------------
mpg | 74 21.2973 .6725511 20.17683 22.41776
price | 74 6165.257 342.8719 5594.033 6736.48
. gen lb=r(lb)
. su lb
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
lb | 74 5594.033 0 5594.033 5594.033
But this one does not work when you invert the two lines of code (still with Stata 14):
. sysuse auto, clear
(1978 Automobile Data)
. capture noisily ci means mpg price, level(90)
Variable | Obs Mean Std. Err. [90% Conf. Interval]
-------------+---------------------------------------------------------------
mpg | 74 21.2973 .6725511 20.17683 22.41776
price | 74 6165.257 342.8719 5594.033 6736.48
. capture noisily ci mpg price, level(90)
you must specify one of means, proportions, or variances following ci
* The program didn't stop but:
. gen lb=r(lb)
(74 missing values generated)
. su lb
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
lb | 0
Finally, note that the first example, which works correctly in Stata 14, does not work in Stata 13:
. sysuse auto, clear
(1978 Automobile Data)
. capture noisily ci mpg price, level(90)
Variable | Obs Mean Std. Err. [90% Conf. Interval]
-------------+---------------------------------------------------------------
mpg | 74 21.2973 .6725511 20.17683 22.41776
price | 74 6165.257 342.8719 5594.033 6736.48
. capture noisily ci means mpg price, level(90)
variable means not found
. gen lb=r(lb)
(74 missing values generated)
. su lb
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
lb | 0
If you wish to use capture to catch an error, there needs to be some follow-through as well. Here you first try the Stata 14 syntax, and if and only if that fails, you try the Stata 13 syntax:
sysuse auto, clear
capture noisily ci means mpg price, level(90)
if _rc ci mpg price, level(90)
gen lb = r(lb)
Here if _rc is a Stataish abbreviation for if _rc > 0, which is true if and only if the previous command failed. An _rc of 0 means everything was legal (with minute qualifications); _rc is the return code.
I am not clear that putting a single value in a variable is a good idea, but let that be a different issue. Also, you asked for confidence intervals for two variables, and only the results for the last one listed (here price) will remain in memory.
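An alternative, not taken from the answer above but a common pattern, is to avoid error-trapping altogether and branch on the running version with c(stata_version) inside the do-file:
* Sketch: choose the ci syntax explicitly by version.
sysuse auto, clear
if c(stata_version) >= 14 {
    ci means mpg price, level(90)
}
else {
    ci mpg price, level(90)
}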

What command in Stata 12 do I use to interpret the coefficients of the Limited Dependent Variable model?

I am running the following code:
oprobit var1 var2 var3 var4 var5 var2##var3 var4##var5 var6 var7 etc.
Without the interaction terms I could have used the following code to interpret the coefficients:
mfx compute, predict(outcome(2))
[for outcome equaling 2 (in total I have 4 outcomes)]
But since mfx does not work with the interaction terms, I get an error.
I tried to use the margins command, but it did not work either:
margins var2 var3 var4 var5 var2##var3 var4##var5 var6 var7 etc... , post
margins works ONLY if I leave out the interaction terms: (margins var2 var3 var4 var5, post)
What command do I use to be able to interpret BOTH interaction and regular variables?
Finally, to use simple language, my question is: given the regression model above, what command can I use to interpret the coefficients?
mfx is an old command that has been replaced with margins, which is why it does not work with the factor-variable notation you used to define the interactions. I am not clear what you actually intended to calculate with your margins command.
Here's an example of how you can get the average marginal effects on the probability of outcome 2:
. webuse fullauto
(Automobile Models)
. oprobit rep77 i.foreign c.weight c.length##c.mpg
Iteration 0: log likelihood = -89.895098
Iteration 1: log likelihood = -76.800575
Iteration 2: log likelihood = -76.709641
Iteration 3: log likelihood = -76.709553
Iteration 4: log likelihood = -76.709553
Ordered probit regression Number of obs = 66
LR chi2(5) = 26.37
Prob > chi2 = 0.0001
Log likelihood = -76.709553 Pseudo R2 = 0.1467
--------------------------------------------------------------------------------
rep77 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
---------------+----------------------------------------------------------------
1.foreign | 1.514739 .4497962 3.37 0.001 .633155 2.396324
weight | -.0005104 .0005861 -0.87 0.384 -.0016593 .0006384
length | .0969601 .0348506 2.78 0.005 .0286542 .165266
mpg | .4747249 .2241349 2.12 0.034 .0354286 .9140211
|
c.length#c.mpg | -.0020602 .0013145 -1.57 0.117 -.0046366 .0005161
---------------+----------------------------------------------------------------
/cut1 | 17.21885 5.386033 6.662419 27.77528
/cut2 | 18.29469 5.416843 7.677877 28.91151
/cut3 | 19.66512 5.463523 8.956814 30.37343
/cut4 | 21.12134 5.515901 10.31038 31.93231
--------------------------------------------------------------------------------
. margins, dydx(*) predict(outcome(2))
Average marginal effects Number of obs = 66
Model VCE : OIM
Expression : Pr(rep77==2), predict(outcome(2))
dy/dx w.r.t. : 1.foreign weight length mpg
------------------------------------------------------------------------------
| Delta-method
| dy/dx Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
1.foreign | -.2002434 .0576487 -3.47 0.001 -.3132327 -.087254
weight | .0000828 .0000961 0.86 0.389 -.0001055 .0002711
length | -.0088956 .003643 -2.44 0.015 -.0160356 -.0017555
mpg | -.012849 .0085546 -1.50 0.133 -.0296157 .0039178
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
If you want the prediction, rather than the marginal effect, try
margins, predict(outcome(2))
The marginal effect of just the interaction term is harder to calculate in a non-linear model. Details here.
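One hedged way to see the interaction at work after the oprobit above (my sketch, with arbitrary illustrative values of length) is to let the marginal effect of mpg on Pr(rep77==2) vary across a few lengths and plot it:
* Sketch: the mpg effect at several lengths traces out the interaction.
margins, dydx(mpg) predict(outcome(2)) at(length=(160 180 200))
marginsplot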
The marginal effects for positive outcomes, Pr(depvar1=1, depvar2=1), are
. mfx compute, predict(p11)
The marginal effects for Pr(depvar1=1, depvar2=0) are
. mfx compute, predict(p10)
The marginal effects for Pr(depvar1=0, depvar2=1) are
. mfx compute, predict(p01)
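If these joint-probability effects come from a bivariate probit, a hedged sketch of the margins equivalents (assuming a fit like biprobit depvar1 depvar2 x1 x2, which is not shown here) would be:
* Sketch, assuming a prior fit such as: biprobit depvar1 depvar2 x1 x2
margins, dydx(*) predict(p11)   // Pr(depvar1=1, depvar2=1)
margins, dydx(*) predict(p10)   // Pr(depvar1=1, depvar2=0)
margins, dydx(*) predict(p01)   // Pr(depvar1=0, depvar2=1)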