Stata: estpost+ttest with "reverse" option?

I am struggling to apply the reverse option to a ttest that is run through estpost.
Here is a toy example:
sysuse auto,clear
local varlist mpg price weight
eststo domestic: quietly estpost summarize `varlist' if foreign==0
eststo foreign: quietly estpost summarize `varlist' if foreign==1
eststo diff: quietly estpost ttest mpg, by(foreign) unequal reverse
esttab foreign domestic diff
The following works:
sysuse auto,clear
local varlist mpg price weight
eststo domestic: quietly estpost summarize `varlist' if foreign==0
eststo foreign: quietly estpost summarize `varlist' if foreign==1
eststo diff: quietly estpost ttest mpg, by(foreign) unequal
esttab foreign domestic diff
Note:
ttest mpg, by(foreign) unequal reverse
works.
After reading the estpost documentation, it seems that estpost currently does not support the reverse option.
Eventually, I need to create a table of t-tests for about 20 variables and reverse the sign of the results (b and t). I would be very thankful for workarounds!

You can do everything using regression like this (even in small samples):
. sysuse auto, clear
(1978 automobile data)
. /* reverse t-test */
. ttest mpg, by(foreign) unequal reverse
Two-sample t test with unequal variances
------------------------------------------------------------------------------
Group | Obs Mean Std. err. Std. dev. [95% conf. interval]
---------+--------------------------------------------------------------------
Foreign | 22 24.77273 1.40951 6.611187 21.84149 27.70396
Domestic | 52 19.82692 .657777 4.743297 18.50638 21.14747
---------+--------------------------------------------------------------------
Combined | 74 21.2973 .6725511 5.785503 19.9569 22.63769
---------+--------------------------------------------------------------------
diff | 4.945804 1.555438 1.771556 8.120053
------------------------------------------------------------------------------
diff = mean(Foreign) - mean(Domestic) t = 3.1797
H0: diff = 0 Satterthwaite's degrees of freedom = 30.5463
Ha: diff < 0 Ha: diff != 0 Ha: diff > 0
Pr(T < t) = 0.9983 Pr(|T| > |t|) = 0.0034 Pr(T > t) = 0.0017
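. /* keep the Satterthwaite degrees of freedom that ttest leaves in r(df_t) */
. scalar df_t = r(df_t)
. /* calculate means */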
. eststo means: qui reg mpg ibn.foreign, vce(hc2) nocons
. /* regression equivalent using robust variance with a bias correction and t-test DoF */
. eststo diff: qui margins, dydx(foreign) df(`=scalar(df_t)') post
. esttab means diff, label se
----------------------------------------------------
(1) (2)
Mileage (m~)
----------------------------------------------------
Domestic 19.83*** 0
(0.658) (.)
Foreign 24.77*** 4.946**
(1.410) (1.555)
----------------------------------------------------
Observations 74 74
----------------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001
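Another possible workaround, not part of the answer above, is to flip the coding of the grouping variable so that estpost ttest itself produces the reversed difference. This is a minimal sketch. It assumes that estpost ttest, like plain ttest without reverse, reports the group with the lower code minus the group with the higher code. The variable name domestic and the stored-estimate names are made up for illustration.

```stata
sysuse auto, clear
local varlist mpg price weight

* Hypothetical helper variable: 0 = Foreign, 1 = Domestic,
* so that Foreign becomes the first group in the t-test
generate byte domestic = 1 - foreign

eststo clear
eststo grp_dom: quietly estpost summarize `varlist' if foreign==0
eststo grp_for: quietly estpost summarize `varlist' if foreign==1

* Under the assumption above, b is now mean(Foreign) - mean(Domestic)
* for every variable in the list
eststo diff: quietly estpost ttest `varlist', by(domestic) unequal

esttab grp_dom grp_for diff, label ///
    cells("mean(pattern(1 1 0) fmt(2)) b(star pattern(0 0 1) fmt(2)) t(pattern(0 0 1) par fmt(2))")
```

For mpg, the sign of b can be checked against ttest mpg, by(foreign) unequal reverse before the approach is applied to the full list of variables.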

Related

Difference between two means and medians in stata

I want to export the p-values for the differences in means and in medians from Stata to LaTeX. The code below works for the p-value of the difference in means. However, I don't know how to add the analysis of the difference between medians. Can you please help me with this?
eststo control: quietly estpost summarize a b c if treated == 0
eststo treated: quietly estpost summarize a b c if treated == 1
eststo diff: quietly estpost ttest a b c, by(treated) unequal
esttab using means_medians.tex, replace mlabels("Treated" "Control" "Difference")
cells("mean(pattern(1 1 0) fmt(2)) sd(pattern(1 1 0 ) fmt(3)) b(star pattern(0 0 1) fmt(2))") label
You can use epctile, which is implemented as a proper estimation command and returns e(V) and the estimation table. (Disregard the ugly output in the first part. I wrote it about 10 years ago for a specific project, but it seems I never made it pretty enough for general use.)
. sysuse auto, clear
(1978 Automobile Data)
. epctile mpg, p(50)
Mean estimation Number of obs = 74
--------------------------------------------------------------
| Mean Std. Err. [95% Conf. Interval]
-------------+------------------------------------------------
__000006 | -.027027 .058435 -.1434878 .0894338
--------------------------------------------------------------
Percentile estimation
------------------------------------------------------------------------------
mpg | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
p50 | 20 .75 26.67 0.000 18.53003 21.46997
------------------------------------------------------------------------------
This command is not on SSC; it is on my page, so to install it, follow the prompts from findit epctile.
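If installing a community-contributed command is not an option, a rough alternative using only built-in commands can be sketched as follows. It is not the method from the answer above. It uses the auto data, with foreign standing in for the treated indicator, and the stored-estimate name meddiff is made up. Median regression on a group dummy gives a coefficient that estimates the difference in medians, and because qreg is an estimation command, its results can be passed to esttab. Note that qreg's median can differ slightly from the p50 reported by summarize when a group has an even number of observations.

```stata
sysuse auto, clear

* Nonparametric test of equal medians across the two groups;
* the p-value is left behind in r(p)
median mpg, by(foreign)
display "p-value for equality of medians = " r(p)

* Median regression: the coefficient on 1.foreign estimates
* median(Foreign) - median(Domestic)
eststo clear
eststo meddiff: quietly qreg mpg i.foreign

* Show the estimated difference with its p-value instead of the t statistic
esttab meddiff, keep(1.foreign) b(2) p(3) label
```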

Should tabstat and centile give different results for percentiles?

. sysuse auto
(1978 Automobile Data)
. centile price, centile(25 75)
-- Binom. Interp. --
Variable | Obs Percentile Centile [95% Conf. Interval]
-------------+-------------------------------------------------------------
price | 74 25 4193 4009.467 4501.838
| 75 6378 5798.432 9691.6
. tabstat price, stat(p25 p75)
variable | p25 p75
-------------+--------------------
price | 4195 6342
When I make the calculations by hand, my answers agree with the centile command and disagree with the tabstat command (bonus: they also disagree with the summarize, detail command).
Where is this discrepancy (25th percentile: 4193 vs 4195, and 75th percentile: 6378 vs 6342) coming from?
I am using Stata 15.1 for Unix.
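The two commands appear to use different percentile definitions. As far as I know, centile by default interpolates linearly at position (N+1)p/100, while summarize, detail and tabstat work from position Np/100, taking the next observation when that position is not an integer and averaging two neighbours when it is. The sketch below computes both definitions by hand for the 25th percentile of price; the local macro names are arbitrary. It is intended to reproduce the 4193 and 4195 shown above.

```stata
sysuse auto, clear
sort price
local p = 25

* Definition 1 (centile default, as assumed here):
* linear interpolation at position (N+1)*p/100
local R = (_N + 1) * `p' / 100
local r = floor(`R')
display "interpolated: " price[`r'] + (`R' - `r') * (price[`r' + 1] - price[`r'])

* Definition 2 (summarize, detail and tabstat, as assumed here):
* position N*p/100; average two neighbours if it is an integer,
* otherwise take the next observation
local P = _N * `p' / 100
if `P' == floor(`P') {
    display "no interpolation: " (price[`P'] + price[`P' + 1]) / 2
}
else {
    display "no interpolation: " price[ceil(`P')]
}
```

Setting the local p to 75 gives the corresponding comparison for the 75th percentile.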

Add column headings for coefficients and standard errors in an esttab table

I am trying to use the community-contributed command estout to produce regression tables in the wide format (i.e. one column for coefficients and a separate column for standard errors), with a heading (e.g. "coefficient" and "s.e.") above each column.
A reproducible example using the auto dataset:
sysuse auto, clear
regress mpg weight i.foreign
estimates store m1
regress mpg weight length i.foreign
estimates store m2
esttab m1 m2, wide b(3) se(3)
esttab m1 m2, wide plain b(3) se(3)
This produces output that is almost exactly what I am after, but without the headings (e.g. "coefficient" and "s.e.") above each column:
esttab m1 m2, wide b(3) se(3)
----------------------------------------------------------------------
(1) (2)
mpg mpg
----------------------------------------------------------------------
weight -0.007*** (0.001) -0.004** (0.002)
0.foreign 0.000 (.) 0.000 (.)
1.foreign -1.650 (1.076) -1.708 (1.067)
length -0.083 (0.055)
_cons 41.680*** (2.166) 50.537*** (6.246)
----------------------------------------------------------------------
N 74 74
----------------------------------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001
I suspect my preferred output is possible, because if I use the plain option, I get the headers ("b" and "se", although I would like to be able to rename both if possible):
esttab m1 m2, wide plain b(3) se(3)
m1 m2
b se b se
weight -0.007 0.001 -0.004 0.002
0.foreign 0.000 . 0.000 .
1.foreign -1.650 1.076 -1.708 1.067
length -0.083 0.055
_cons 41.680 2.166 50.537 6.246
N 74 74
My desired output would look like this:
----------------------------------------------------------------------
(1) (2)
mpg mpg
coefficient s.e. coefficient s.e.
----------------------------------------------------------------------
weight -0.007*** (0.001) -0.004** (0.002)
0.foreign 0.000 (.) 0.000 (.)
1.foreign -1.650 (1.076) -1.708 (1.067)
length -0.083 (0.055)
_cons 41.680*** (2.166) 50.537*** (6.246)
----------------------------------------------------------------------
N 74 74
----------------------------------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001
Also, while the output above is in text format for reproducibility, in my real output, I'm trying to produce tables in rich text format.
The following works for me:
sysuse auto, clear
regress mpg weight i.foreign
estimates store m1
regress mpg weight length i.foreign
estimates store m2
esttab m1 m2, cells("b(fmt(3) star) se(fmt(3) par)") collabels("coefficient" "s.e.")
----------------------------------------------------------------------
(1) (2)
mpg mpg
coefficient s.e. coefficient s.e.
----------------------------------------------------------------------
weight -0.007*** (0.001) -0.004** (0.002)
0.foreign 0.000 (.) 0.000 (.)
1.foreign -1.650 (1.076) -1.708 (1.067)
length -0.083 (0.055)
_cons 41.680*** (2.166) 50.537*** (6.246)
----------------------------------------------------------------------
N 74 74
----------------------------------------------------------------------
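For the rich text output mentioned in the question, the same options can be combined with using and a file name ending in .rtf, since esttab chooses the output format from the file suffix. A minimal sketch, where the file name results.rtf is a placeholder:

```stata
sysuse auto, clear
regress mpg weight i.foreign
estimates store m1
regress mpg weight length i.foreign
estimates store m2

* The .rtf suffix makes esttab write rich text;
* replace overwrites an existing file of the same name
esttab m1 m2 using results.rtf, replace ///
    cells("b(fmt(3) star) se(fmt(3) par)") collabels("coefficient" "s.e.")
```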

Rename columns in esttab

Consider the following toy example using the community-contributed command esttab:
sysuse auto, clear
estpost tabstat mpg price, statistics(count mean) columns(statistics) listwise
esttab . , title("summary stats") cells("count(fmt(%5.0f)) mean(fmt(%5.0f))")
summary stats
--------------------------------------
(1)
count mean
--------------------------------------
mpg 74 21
price 74 6165
--------------------------------------
How can I change the column name from count to N?
Use the collabels() option:
sysuse auto, clear
estpost tabstat mpg price, statistics(count mean) columns(statistics) listwise
esttab . , title("summary stats") collabels("N" "Mean") ///
cells("count(fmt(%5.0f)) mean(fmt(%5.0f))")
summary stats
--------------------------------------
(1)
N Mean
--------------------------------------
mpg 74 21
price 74 6165
--------------------------------------
N 74
--------------------------------------

Using two different versions of Stata

I am working in two locations, in one I am using Stata 13 and in the other Stata 14.
Can I build a do-file that works in both versions even if some specific command has changed?
For instance, the following code will not work using Stata 13
sysuse auto, clear
ci means mpg price, level(90)
but this one works
sysuse auto, clear
ci mpg price, level(90)
Using Stata 14, it is the opposite.
I thought about adding capture, but then nothing is displayed in either Stata 13 or Stata 14.
. sysuse auto, clear
(1978 Automobile Data)
. capture ci means mpg price, level(90)
. capture ci mpg price, level(90)
Update: Unfortunately, adding noisily after capture didn't help. Here is an example that works with Stata 14:
. sysuse auto, clear
(1978 Automobile Data)
. capture noisily ci mpg price, level(90)
you must specify one of means, proportions, or variances following ci
. capture noisily ci means mpg price, level(90)
Variable | Obs Mean Std. Err. [90% Conf. Interval]
-------------+---------------------------------------------------------------
mpg | 74 21.2973 .6725511 20.17683 22.41776
price | 74 6165.257 342.8719 5594.033 6736.48
. gen lb=r(lb)
. su lb
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
lb | 74 5594.033 0 5594.033 5594.033
But this one does not work when you invert two lines of code (with Stata 14):
. sysuse auto, clear
(1978 Automobile Data)
. capture noisily ci means mpg price, level(90)
Variable | Obs Mean Std. Err. [90% Conf. Interval]
-------------+---------------------------------------------------------------
mpg | 74 21.2973 .6725511 20.17683 22.41776
price | 74 6165.257 342.8719 5594.033 6736.48
. capture noisily ci mpg price, level(90)
you must specify one of means, proportions, or variances following ci
* The program didn't stop but:
. gen lb=r(lb)
(74 missing values generated)
. su lb
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
lb | 0
Finally, note that the first version, which works correctly with Stata 14, doesn't work with Stata 13:
. sysuse auto, clear
(1978 Automobile Data)
. capture noisily ci mpg price, level(90)
Variable | Obs Mean Std. Err. [90% Conf. Interval]
-------------+---------------------------------------------------------------
mpg | 74 21.2973 .6725511 20.17683 22.41776
price | 74 6165.257 342.8719 5594.033 6736.48
. capture noisily ci means mpg price, level(90)
variable means not found
. gen lb=r(lb)
(74 missing values generated)
. su lb
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
lb | 0
If you wish to use capture to catch an error, there is follow-through as well. Here you first try version 14 syntax, and if and only if that fails you try version 13 syntax.
sysuse auto, clear
capture noisily ci means mpg price, level(90)
if _rc ci mpg price, level(90)
gen lb = r(lb)
Here if _rc is a Stataish abbreviation for if _rc > 0, which is true if and only if the captured command failed. An _rc of 0 means everything was legal (with minute qualifications). _rc is the return code.
I am not clear that putting a single value in a variable is a good idea, but let that be a different issue. Also, you asked for two confidence intervals, and only the results for the last variable (price in your output) will remain in memory.
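Another option, sketched here on the assumption that the old ci syntax remains available under version control in newer releases, is to declare the older version at the top of the do-file. Both installations should then interpret the commands by Stata 13 rules. The scalar name lb_price is made up for illustration.

```stata
* Interpret everything below using Stata 13 syntax,
* whether the do-file is run in Stata 13 or Stata 14
version 13

sysuse auto, clear
ci mpg price, level(90)

* r(lb) holds the lower bound for the last variable listed (price);
* a scalar holds the single value without creating a constant variable
scalar lb_price = r(lb)
display lb_price
```

The trade-off is that syntax introduced after Stata 13 cannot be used below the version statement.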