Use esttab to generate summary statistics by group with columns for mean difference and significance - stata

I would like to use esttab (ssc install estout) to generate summary statistics by group with columns for the mean difference and significance. It is easy enough to generate these as two separate tables with estpost, summarize, and ttest, and combine manually, but I would like to automate the whole process.
The following code generates the two components of the desired table.
sysuse auto, clear
* summary statistics by group
eststo clear
by foreign: eststo: quietly estpost summarize ///
price mpg weight headroom trunk
esttab, cells("mean sd") label nodepvar
* difference in means
eststo: estpost ttest price mpg weight headroom trunk, ///
by(foreign) unequal
esttab ., wide label
And I can print the two tables and cut-and-paste them into one table.
* can generate similar tables and append horizontally
esttab, cells("mean sd") label
esttab, wide label
* manual, cut-and-paste solution
-------------------------------------------------------------------------------------------------------
(1) (2) (3)
mean sd mean sd
-------------------------------------------------------------------------------------------------------
Price 6072.423 3097.104 6384.682 2621.915 -312.3 (-0.44)
Mileage (mpg) 19.82692 4.743297 24.77273 6.611187 -4.946** (-3.18)
Weight (lbs.) 3317.115 695.3637 2315.909 433.0035 1001.2*** (7.50)
Headroom (in.) 3.153846 .9157578 2.613636 .4862837 0.540** (3.30)
Trunk space (.. ft.) 14.75 4.306288 11.40909 3.216906 3.341*** (3.67)
-------------------------------------------------------------------------------------------------------
Observations 52 22 74
-------------------------------------------------------------------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
It seems that I should be able to get the desired table with one esttab call and without cutting-and-pasting, but I can't figure it out. Is there a way to generate the desired table without manually cutting-and-pasting?
I would prefer to output a LaTeX table, but anything that eliminates the cutting-and-pasting is a big step, even passing through a delimited text file.

If you still want to use esttab, you can play around using cells and pattern. The table in the original post can be replicated with the following code:
sysuse auto, clear
eststo domestic: quietly estpost summarize ///
price mpg weight headroom trunk if foreign == 0
eststo foreign: quietly estpost summarize ///
price mpg weight headroom trunk if foreign == 1
eststo diff: quietly estpost ttest ///
price mpg weight headroom trunk, by(foreign) unequal
esttab domestic foreign diff, ///
cells("mean(pattern(1 1 0) fmt(2)) sd(pattern(1 1 0)) b(star pattern(0 0 1) fmt(2)) t(pattern(0 0 1) par fmt(2))") ///
label
which yields
-----------------------------------------------------------------------------------------------------
(1) (2) (3)
mean sd mean sd b t
-----------------------------------------------------------------------------------------------------
Price 6072.42 3097.10 6384.68 2621.92 -312.26 (-0.44)
Mileage (mpg) 19.83 4.74 24.77 6.61 -4.95** (-3.18)
Weight (lbs.) 3317.12 695.36 2315.91 433.00 1001.21*** (7.50)
Headroom (in.) 3.15 0.92 2.61 0.49 0.54** (3.30)
Trunk space (.. ft.) 14.75 4.31 11.41 3.22 3.34*** (3.67)
-----------------------------------------------------------------------------------------------------
Observations 52 22 74
-----------------------------------------------------------------------------------------------------
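Since the question asks for LaTeX output, the same call can probably be pointed at a file. The following is only a sketch: the file name balance_table.tex and the column titles are illustrative choices, not part of the original answer, and the booktabs option assumes the LaTeX document loads the booktabs package.

```stata
* Sketch: the table above, written to a LaTeX file instead of the screen.
* "balance_table.tex" and the mtitles() text are hypothetical choices.
sysuse auto, clear
eststo clear
eststo domestic: quietly estpost summarize ///
    price mpg weight headroom trunk if foreign == 0
eststo foreign: quietly estpost summarize ///
    price mpg weight headroom trunk if foreign == 1
eststo diff: quietly estpost ttest ///
    price mpg weight headroom trunk, by(foreign) unequal
* pattern() switches each statistic on or off per stored model
esttab domestic foreign diff using "balance_table.tex", replace booktabs label ///
    cells("mean(pattern(1 1 0) fmt(2)) sd(pattern(1 1 0) fmt(2)) b(star pattern(0 0 1) fmt(2)) t(pattern(0 0 1) par fmt(2))") ///
    mtitles("Domestic" "Foreign" "Difference") nonumbers
```

Dropping the using clause and booktabs prints the same table to the Results window, which is a quick way to check the layout before exporting.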

I don't think there's a way to do this with esttab (estout package from ssc), but I have a solution with listtab (also ssc) and postfile. The table here is a little different than the one I propose above, but the approach is general enough that you can modify it to fit your needs.
This solution also uses LaTeX's booktabs package.
/* data and variables */
sysuse auto, clear
local vars price mpg weight headroom trunk
/* means */
tempname postMeans
tempfile means
postfile `postMeans' ///
str100 varname domesticMeans foreignMeans pMeans using "`means'", replace
foreach v of local vars {
local name: variable label `v'
ttest `v', by(foreign)
post `postMeans' ("`name'") (r(mu_1)) (r(mu_2)) (r(p))
}
postclose `postMeans'
/* medians */
tempname postMedians
tempfile medians
postfile `postMedians' ///
domesticMedians foreignMedians pMedians using `medians', replace
foreach v of local vars {
summarize `v' if !foreign, detail
local med1 = r(p50)
summarize `v' if foreign, detail
local med2 = r(p50)
ranksum `v', by(foreign)
local pval = 2 * (1 - normal(abs(r(z))))
post `postMedians' (`med1') (`med2') (`pval')
}
postclose `postMedians'
/* combine */
use `means'
merge 1:1 _n using `medians', nogenerate
format *Means *Medians %9.3gc
list
/* make latex table */
/* requires LaTeX package `booktabs` */
listtab * using "Table.tex", ///
rstyle(tabular) replace ///
head("\begin{tabular}{lcccccc}" ///
"\toprule" ///
"& \multicolumn{3}{c}{Means} & \multicolumn{3}{c}{Medians} \\" ///
"\cmidrule(lr){2-4} \cmidrule(lr){5-7}" ///
"& Domestic & Foreign & \emph{p} & Domestic & Foreign & \emph{p}\\" ///
"\midrule") ///
foot("\bottomrule" "\end{tabular}")
This yields the following.

The accepted answer is nice but a bit redundant. You can achieve the same result with only estpost ttest.
sysuse auto, clear
estpost ttest price mpg weight headroom trunk, by(foreign)
esttab, cells("mu_1 mu_2 b(star)")
The output looks like this (the variable names shown are from a different dataset, not auto):
mu_1 mu_2 b
c_score 43.33858 42.034 1.30458***
nc_a4_17 4.007524 3.924623 .0829008*

Related

Twoway estpost tabstat-esttab: retain variable labels

I want to describe characteristics of cars by origin, and retrieve the result as latex table:
sysuse auto
estpost tabstat price trunk, by(foreign) statistics(mean sd) columns(statistics) listwise nototal
esttab using test.txt, main(mean) aux(sd)
Already after the estpost I can feel that the labels are going missing: it correctly displays the value labels "Domestic" and "Foreign", but simply lists the variables as "price" and "trunk" instead of "Price" and "Trunk space".
I have seen this problem on the internet, but no solution is satisfactory. Some suggest fsum, but that doesn't really allow latex, and also no cross-tabulation (means of x by category y).
How can I fix this?
I automated the accepted answer as follows:
local varlabels
foreach var in price trunk {
local varlabels `"`varlabels' `var' "`:variable label `var''""'
}
The varlabels option allows you to add custom labels. After the estpost command the names of the estimates look like this:
. mat l e(mean)
e(mean)[1,4]
Domestic: Domestic: Foreign: Foreign:
price trunk price trunk
mean 6072.4231 14.75 6384.6818 11.409091
You can replace these names with variable labels with the addition of some code:
sysuse auto
estpost tabstat price trunk, by(foreign) statistics(mean sd) columns(statistics) listwise nototal
foreach name in `:colfullnames e(mean)' {
foreach var in price trunk {
if strpos("`name'", "`var'") > 0 {
local varlabels `"`varlabels' `name' "`:variable label `var''""'
}
}
}
di `"`varlabels'"'
esttab, main(mean) aux(sd) varlabels(`varlabels')
Result:
. di `"`varlabels'"'
Domestic:price "Price" Domestic:trunk "Trunk space (cu. ft.)" Foreign:price "Price" Foreign:trunk "Trunk space (cu. ft.)"
. esttab, main(mean) aux(sd) varlabels(`varlabels')
----------------------------
(1)
----------------------------
Domestic
Price 6072.4
(3097.1)
Trunk.. ft.) 14.75
(4.306)
----------------------------
Foreign
Price 6384.7
(2621.9)
Trunk.. ft.) 11.41
(3.217)
----------------------------
N 74
----------------------------
mean coefficients; sd in parentheses
* p<0.05, ** p<0.01, *** p<0.001
When adding option unstack to esttab, Domestic and Foreign are used as column names and the names of the coefficients are only price and trunk, so you could do something like this:
esttab, main(mean) aux(sd) unstack varlabels(price "`:variable label price'" trunk "`:variable label trunk'")
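The loop and the unstack variant can be combined so that the labels are not typed by hand. This is a minimal sketch under the assumption, stated above, that with unstack the coefficient names are just the variable names; the local vars is introduced here only for illustration.

```stata
* Sketch: build varlabels() from the variable labels, then use unstack
sysuse auto, clear
local vars price trunk
estpost tabstat `vars', by(foreign) statistics(mean sd) ///
    columns(statistics) listwise nototal
* collect: name "label" name "label" ...
local varlabels
foreach var of local vars {
    local varlabels `"`varlabels' `var' "`: variable label `var''""'
}
di `"`varlabels'"'
esttab, main(mean) aux(sd) unstack nostar varlabels(`varlabels')
```

Because the list is built from `vars', adding a variable to the table only requires changing that one local.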

How to add ttest results to esttab

I am constructing a table of means and p-values from ttest. How can I get all of this in the same esttab table? Here is a MWE:
Get the sample, save it as a temporary file, create a local with the variables we will consider, create a local that is the length of the first local:
sysuse auto2, clear
*create two groups: 0 and 1
gen group = _n<37
tempfile a
save `a'
local vars "price headroom trunk weight"
local vars_n: word count `vars'
ssc install estout
eststo clear
Calculate the means of group 0 (column 1) and group 1 (column 2):
*group 0 means
use `a', clear
keep if group==0
eststo: estpost sum `vars'
*group 1 means
use `a', clear
keep if group==1
eststo: estpost sum `vars'
Conduct t-tests for each variable (is there an easier way to do this?):
*t-test
*create blank matrix
matrix pval = J(`vars_n',1,.)
use `a', clear
forvalues i=1/`vars_n' {
local var `: word `i' of `vars''
ttest `var', by(group)
*add the two-sided p-value to matrix
matrix pval[`i',1]=r(p)
}
This previous block of code saves the p-values (column 3) into a matrix.
Use esttab to output the results:
esttab, cells(mean(fmt(2))) collabels(none) nodepvars nonumber replace label
esttab matrix(pval, fmt(2 0))
My issue is that I need to have the p-values in the same esttab as the means, but I currently have them in a matrix. How can I use something like eststo: estpost to get them so that I can use esttab (as opposed to esttab matrix)? Or is there a better way to do all of this? My goal is to run esttab, cells(mean(fmt(2))) collabels(none) nodepvars nonumber replace label and have it create a table with the first two columns being the means and the third column being the p-values.
All the information you need is in estpost ttest, so an easy solution would be this:
sysuse auto2, clear
gen group = _n<37
local vars price headroom trunk weight
estpost ttest `vars', by(group)
esttab ., cells("mu_1 mu_2 p") nonumber label
-----------------------------------------------------------
mu_1 mu_2 p
-----------------------------------------------------------
Price 5847.526 6500.639 .3445597
Headroom (in.) 2.828947 3.166667 .0861206
Trunk space (.. ft.) 12.39474 15.19444 .0041618
Weight (lbs.) 2654.474 3404.722 .0000115
-----------------------------------------------------------
Observations 74
-----------------------------------------------------------
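If the raw names mu_1, mu_2 and p are not wanted as headers, the columns can be relabeled and formatted in the same call. This is a sketch rather than part of the original answer; the header texts are illustrative, and it relies on estpost ttest reporting the group==0 mean as mu_1 and the group==1 mean as mu_2.

```stata
* Sketch: same one-call table with readable headers and fixed formats
sysuse auto2, clear
gen group = _n < 37
local vars price headroom trunk weight
estpost ttest `vars', by(group)
* fmt() controls decimals per column; collabels() replaces mu_1 mu_2 p
esttab ., cells("mu_1(fmt(2)) mu_2(fmt(2)) p(fmt(3))") ///
    collabels("Mean (group 0)" "Mean (group 1)" "p-value") ///
    nonumber noobs label
```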

How do I create a difference in means table with a column for N observations using esttab?

I'm trying to create a table showing the difference in means of two groups but I'm struggling to get exactly the right columns. I'm specifically trying to recreate a table that includes a column indicating N. I'm really confused how I would go about doing this. Here's the code for the bulk of the kind of table I'm talking about:
sysuse auto, clear
eststo all: quietly estpost summarize ///
price mpg weight headroom trunk
eststo domestic: quietly estpost summarize ///
price mpg weight headroom trunk if foreign == 0
eststo foreign: quietly estpost summarize ///
price mpg weight headroom trunk if foreign == 1
eststo diff: quietly estpost ttest ///
price mpg weight headroom trunk, by(foreign) unequal
esttab all domestic foreign diff, ///
cells("mean(pattern(1 1 1 0) fmt(2)) b(star pattern(0 0 0 1) fmt(2))" "sd(pattern(1 1 1 0) par fmt(2)) t(pattern(0 0 0 1) par fmt(2))") ///
label
In this scenario, I want a column N that just lists 74 in each row. In this shortened example, it doesn't make much sense but I'm using multiple data sources with the same variables and I want to stack the difference in means tables together.
You can reference the number of observations in each subsample (in this case by(foreign)) using N_1 and N_2 as follows:
sysuse auto, clear
estpost ttest price mpg weight headroom trunk, by(foreign)
esttab . , ///
cell(( N_2(fmt(%12.0fc)) mu_2(fmt(%12.3fc)) ///
N_1(fmt(%12.0fc)) mu_1(fmt(%12.3fc)) ///
b(fmt(%12.3fc)) t(fmt(%12.2fc)) )) ///
noobs label collabels(Count Mean Count Mean Diff T-Stat )
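For a single N column rather than one count per group, estpost ttest should also store the combined number of observations per variable. The sketch below assumes that this is available as e(count), as listed in the estpost help file; check ereturn list after estpost ttest if the column comes out empty.

```stata
* Sketch: one N column (assumes estpost ttest stores e(count))
sysuse auto, clear
estpost ttest price mpg weight headroom trunk, by(foreign) unequal
* mu_1 is the foreign==0 (Domestic) mean, mu_2 the foreign==1 (Foreign) mean
esttab ., cells("count(fmt(0)) mu_1(fmt(2)) mu_2(fmt(2)) b(star fmt(2)) t(par fmt(2))") ///
    collabels("N" "Domestic" "Foreign" "Diff." "t") ///
    noobs nonumber label
```

With the auto data every row should show 74, since none of the five variables has missing values.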

Stata creating output for IV regression with bysort

So I am running a 2SLS model by interview year and I have many interview years and different models. I want to present the first-stage results first and then, after reassuring the reader that they are solid, move on to the interesting results.
Example of Table A (first stage):
Year DV Coef SE F N
1 A 0.5 0.1 100 1000
2 A 0.8 0.2 10 1500
3 B -0.6 0.4 800 800
Table B with the main results would look the same just without the F-Stat.
I searched the web for how to create those tables automatically in Stata, but despite finding many questions I didn't find an answer that worked for me. From those different posts and help files I built something that is nearly there.
It creates the table I want for the main results with the F-Stat together by some variable (Step A in the code). However, when I move on to do the same for the first stage it only saves the last wave as I restore the estimates. I understand why Stata does it like that, but I cannot think of a way of convincing it to do what I want.
clear all
*Install user-written commands
ssc install outreg2, replace
ssc install ivreg210, replace
*load data
sysuse auto, clear
*run example model (obviously the model itself is bogus)
********************************************************
*Step A: creates the IV results by foreign plus the F-Statistic
bys foreign: ///
outreg2 using output1-IV-F, label excel stats(coef se) dec(2) adds(F-Test, e(widstat)) nocons nor2 keep(mpg) replace: ///
ivreg210 price headroom trunk (mpg=rep78 ), savefirst first
*Step B: creates the first stage results in a separate table
bys foreign: ///
ivreg210 price headroom trunk (mpg=rep78 ), savefirst first
est restore _ivreg210_mpg
outreg2 using output1_1st-stage, replace keep(rep78)
cap erase output1-IV-F
cap erase output1_1st-stage
So ideally I would only run the model once and have the F-Stat in the first-stage table, but I can fix that manually. The biggest issue I have is how to store the estimates when using bysort. If anyone has any suggestions about that, I would greatly appreciate it.
Thanks!
Install estout from SSC:
ssc install estout
Then you can store whatever results you want for later use, even after a bysort:
eststo clear
sysuse auto, clear
bysort foreign: eststo: reg price weight mpg
esttab, label nodepvar nonumber
This is a roundabout solution. It works, but it isn't really the proper solution I was (and am) looking for. The "trick" is to run the first stage as a separate model.
clear all
*Install user-written commands
ssc install outreg2, replace
ssc install ivreg210, replace
*load data
sysuse auto, clear
*run example model (obviously the model itself is bogus)
********************************************************
*Step A: creates the IV results by foreign plus the F-Statistic
bys foreign: ///
outreg2 using output1-IV-F, label excel stats(coef se) dec(2) adds(F-Test, e(widstat)) nocons nor2 keep(mpg) replace: ///
ivreg210 price headroom trunk (mpg=rep78 ), savefirst first
*Step B: creates the first stage results in a separate table
bys foreign: ///
ivreg210 price headroom trunk (mpg=rep78 ), savefirst first
est restore _ivreg210_mpg
outreg2 using output1_1st-stage1, replace keep(rep78)
*************
/* NEW BIT */
*************
*Step C: creates the first stage results in a separate table
bys foreign: ///
outreg2 using output1_1st_NEW, label excel stats(coef se) dec(2) nocons nor2 keep(rep78) replace: ///
reg mpg headroom trunk rep78
cap erase output1-IV-F
cap erase output1_1st-stage1
cap erase output1_1st_NEW
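The two answers can be combined: run the first stage as its own regression, but store every group's results under a name with eststo so nothing is overwritten. The following is a sketch under several assumptions: it swaps in official ivregress 2sls for ivreg210, loops over groups with levelsof instead of bysort, and uses stored names (first_0, iv_0, ...) that are purely illustrative. It does not add the first-stage F statistic.

```stata
* Sketch: store first-stage and 2SLS results for every group by name
sysuse auto, clear
eststo clear
levelsof foreign, local(groups)
foreach g of local groups {
    * first stage run as a separate OLS model, as in Step C above
    eststo first_`g': quietly regress mpg headroom trunk rep78 if foreign == `g'
    * main 2SLS results (official ivregress instead of ivreg210)
    eststo iv_`g': quietly ivregress 2sls price headroom trunk (mpg = rep78) if foreign == `g'
}
* Table A: first stage, one column per group
esttab first_0 first_1, keep(rep78) se label mtitles("Domestic" "Foreign")
* Table B: main results
esttab iv_0 iv_1, keep(mpg) se label mtitles("Domestic" "Foreign")
```

Because each model has its own name, both tables can be produced after all estimation is finished, which avoids the problem of only the last by-group surviving.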

Add column with number of observations to esttab summary statistics table

I would like to make a summary statistics table using esttab from the estout package on SSC. I can make the table just fine, but I would like to add a column that counts the number of non-missing observations for each variable. That is, some variables may not be complete and I would like this to be clear to the reader.
In the example below I removed the first five observations for price, so I would like a 69 in that row. But my code doesn't include row-specific observation counts, only the total number of observations in the footer.
sysuse auto, clear
estpost summarize, detail
replace price = . in 1/5
local screen ///
cells("N mean sd min p50 max") ///
nonumber label
esttab, `screen'
This yields an empty N column, where I would like to see 69 followed by all 74s.
Is this it:
clear all
set more off
*----- example data -----
sysuse auto, clear
keep price mpg rep78 headroom
replace price = . in 1/5
*----- what you want -----
estpost summarize, detail
local screen cells("count mean sd") nonumber label noobs
esttab, `screen'
?
It just uses count. esttab is a wrapper for estout, and the help for the latter documents that it will take "results from e(myel)", which you have from estpost summarize, detail.
An alternative is:
tabstat _all, statistics(count mean sd) columns(statistics)
Yet another option, which also allows variable labels to be displayed:
fsum _all, stat(n mean sd) uselabel
fsum is from SSC.