How to display different decimal places for N and R-Squared using esttab?

I use esttab to generate regression tables. Please see sample code below.
eststo clear
eststo: reghdfe y x1 , absorb(year state) cluster(year state)
estadd local FE "Yes"
estadd local Controls "No"
eststo: reghdfe y x1 x2 x3, absorb(year state) cluster(year state)
estadd local FE "Yes"
estadd local Controls "Yes"
esttab ,star(* 0.10 ** 0.05 *** 0.01) b(3) t(2) r2 replace label drop(_cons) stats(FE Conrols N r2,fmt(%9.0fc) labels("Fixed Effects" "Controls" "Obserations" "R-Squared"))
Because my dataset is large, I use fmt(%9.0fc) to add commas to my number of observations. However, this option rounds my R-Squared to 0. How can I have integers (with commas) for observations and three decimal places for R-squared?
Also, is there a way to display adjusted R-squared? The user manual suggests that stats disables ar2 but I struggle to find a fix.

esttab is part of estout by Ben Jann, see the online documentation for installation and further information.
Here is a minimal working example using esttab's default formats.
eststo clear
sysuse auto
eststo: quietly regress price weight mpg
eststo: quietly regress price weight mpg foreign
esttab ,star(* 0.10 ** 0.05 *** 0.01) ///
b(3) t(2) ar2
Note that ar2 requests the adjusted R^2. However, if you are going to use the stats() option to format the number of observations, the syntax changes and you request it as r2_a inside stats():
esttab ,star(* 0.10 ** 0.05 *** 0.01) ///
b(3) t(2) ///
stats(N r2_a)
To apply formats you add a term for each stat, and if you supply fewer terms than stats esttab applies the last one to the remaining stats. This is why your R^2 is rounding to zero. Try the following:
esttab ,star(* 0.10 ** 0.05 *** 0.01) ///
b(3) t(2) ///
stats(N r2_a, fmt(%9.0fc %9.2fc))
So I would edit your esttab line as follows:
esttab ,star(* 0.10 ** 0.05 *** 0.01) ///
b(3) t(2) /// no 'ar2' here
label /// note you only need 'replace' if you are generating a file
nocons /// esttab option; same effect as 'drop(_cons)'
stats(FE Controls N r2_a, /// 'Controls' not 'Conrols'
fmt(%9.0fc %9.0fc %9.0fc %9.3f) /// placeholders are needed for the string stats; %9.3f gives three decimals
labels("Fixed Effects" "Controls" "Observations" "Adjusted R-Squared"))
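For a version that can be run without the asker's data, here is a minimal sketch on the auto dataset. It assumes regress as a stand-in for reghdfe and a hand-made Controls flag, so it illustrates the formatting rule rather than the original models. Each entry in fmt() lines up with one entry in stats(), including a placeholder for the string.

```stata
sysuse auto, clear
eststo clear
eststo: quietly regress price weight
estadd local Controls "No"
eststo: quietly regress price weight mpg foreign
estadd local Controls "Yes"
* one format per statistic, in order:
* placeholder for the string, integer with commas, then three decimals twice
esttab, b(3) t(2) label ///
    stats(Controls N r2 r2_a, ///
    fmt(%9.0fc %9.0fc %9.3f %9.3f) ///
    labels("Controls" "Observations" "R-squared" "Adjusted R-squared"))
```

Dropping the last two formats from fmt() should reproduce the rounding problem, because the last format supplied is reused for the remaining statistics.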

Related

How to store confidence intervals from stata margins estimation?

stata experts,
I have been trying to find a way to store marginal estimations, including the p value and confidence interval.
Below is the code I have. All that I can get is the estimated marginal effect of variable I. Looks like I can't specify "ci" like what we can do for usual regression models. Is there a way to also store and present the other numbers from marginal estimations?
probit Y1 X
margins, dydx(X) post
est store m1
probit Y2 X
margins, dydx(X) post
est store m2
esttab m1 m2
esttab m1 m2, ci
Another related question is: how do I save marginal estimations for interaction terms? Example code below
probit Y2 year month year*month
margins year#month, asbalanced post
Thank you in advance!
Here's a way to grab p-values and confidence intervals after a margins command.
sysuse auto, clear
probit foreign price trunk
margins, dydx(price) post
eststo m1
The results from the margins command:
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
price | .0000268 .0000159 1.69 0.092 -4.36e-06 .000058
------------------------------------------------------------------------------
Then the p-value and confidence interval are recoverable from the stored matrices e(b) and e(V). To get the p-value, we need the z-score, which is the point estimate over the standard error: e(b)[1,1]/sqrt(e(V)[1,1]). The rest is calculating the area in the two tails using normal().
The confidence interval is the point estimate e(b)[1,1] plus or minus the standard error sqrt(e(V)[1,1]) times the critical value of z, invnormal(0.975).
Shown with the output so that you can see the numbers line up:
. di "P-value: " normal(-abs(e(b)[1,1]/sqrt(e(V)[1,1])))*2
P-value: .09186065
. di "Upper bound: " e(b)[1,1] + sqrt(e(V)[1,1])*invnormal(0.975)
Upper bound: .00005796
. di "Lower bound: " e(b)[1,1] - sqrt(e(V)[1,1])*invnormal(0.975)
Lower bound: -4.361e-06
To put the p-value in a table, for example, you could use estadd:
estadd scalar pvalue = normal(-abs(e(b)[1,1]/sqrt(e(V)[1,1])))*2
And then esttab:
. esttab m1, stats(pvalue, label("P-value"))
----------------------------
(1)
----------------------------
price 0.0000268
(1.69)
----------------------------
P-value 0.0919
----------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
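An alternative sketch, assuming a Stata version in which margins leaves its results table in r(table): the p-value and both confidence bounds can be read off that matrix instead of being recomputed. The scalar names ci_lo and ci_hi and the matrix name T are made up for this example.

```stata
sysuse auto, clear
probit foreign price trunk
margins, dydx(price) post
* copy r(table) straight away, before another command overwrites r()
matrix T = r(table)
matrix list T
* rows are looked up by name; column 1 is the only margin (price)
estadd scalar pvalue = T[rownumb(T, "pvalue"), 1]
estadd scalar ci_lo  = T[rownumb(T, "ll"), 1]
estadd scalar ci_hi  = T[rownumb(T, "ul"), 1]
eststo m1
esttab m1, stats(pvalue ci_lo ci_hi, labels("P-value" "CI lower" "CI upper"))
```

The values should agree with the hand calculation from e(b) and e(V), since both use the normal distribution.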

I want to create a table reporting results from two separate regressions using different variables

I have run two regressions on similar but different data sets and regressors the results of which I want to report next to each other for comparability but keep them in one table using estout/esttab. The finished product should look something like this.
Table
-----------------------------------------
Dep. Var.: a
-----------------------------------------
Regression 1 |Regression 2
-----------------------------------------
x_1 coeff.|x_2 coeff.
y_1 coeff.|y_2 coeff.
z_1 coeff.|z_2 coeff.
l_1 coeff.|
m_1 coeff.|
-----------------------------------------
Obs value|Obs value
-----------------------------------------
Hypothesis |
x_1=1,p-value |x_2=1,p-value
-----------------------------------------
I am able to create individual tables like this just fine but I have honestly no idea where to start here and documentation hasn't been very helpful either. I hope someone here can point me in the right direction.
I think the eststo/esttab syntax will help. Here's code which I think produces what you're after:
ssc install estout, replace
sysuse auto, clear
eststo m1: regress price mpg
test mpg=1
estadd scalar pvalue = r(p)
eststo m2: regress price mpg trunk
test trunk=1
estadd scalar pvalue = r(p)
esttab m1 m2, b(3) se(3) stats(pvalue r2 N, fmt(3 3 0) label("P-value" "R-squared" "Observations")) mtitle("Regression 1" "Regression 2") nocons
It creates this table:
--------------------------------------------
(1) (2)
Regression 1 Regression 2
--------------------------------------------
mpg -238.894*** -220.165**
(53.077) (65.593)
trunk 43.559
(88.719)
--------------------------------------------
P-value 0.000 0.633
R-squared 0.220 0.222
Observations 74 74
--------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001
The esttab guide is really great.

cycling Ranksum on Stata

I have some data with two different groups of patients automatically exported from a diagnostic tool.
Variables are automatically named by the diagnostic tool (e.g. L1DensityWholeImage, L1WholeImageSHemi, L1WholeImageIHemi , L1WholeETDRS ,[...], DeepL2StartLayer, L2Startoffsetum, L2EndLayer, [...], Perimeter, AcircularityIndex )
I have to perform a rank-sum test (or Mann-Whitney U test) on all the variables (more than 80) by group.
Normally, I should write each single analysis like that:
ranksum L1DensityWholeImage, by(Group)
ranksum L1WholeImageSHemi, by(Group)
ranksum L1WholeImageIHemi, by(Group)
ranksum L1WholeETDRS, by(Group)
Is there any way to write the command with a varlist? And maybe to obtain a single output with all the p-values?
e.g.: ranksum L1DensityWholeImage L1WholeImageSHemi L1WholeImageIHemi L1WholeETDRS, DeepL2StartLayer L2Startoffsetum L2EndLayer Perimeter AcircularityIndex, by(Group)
The short answer is to write a loop and customise the output.
Here is a token example which you can run.
sysuse auto, clear
foreach v of var mpg price weight length displacement {
    quietly ranksum `v', by(foreign) porder
    * two-sided p-value from the z statistic; normal() is the current name for normprob()
    scalar pval = 2*normal(-abs(r(z)))
    di "`v'{col 14}" %05.3f pval " " %6.4e pval " " %05.3f r(porder)
}
Output is
mpg 0.002 1.9e-03 0.271
price 0.298 3.0e-01 0.423
weight 0.000 3.8e-07 0.875
length 0.000 9.4e-07 0.862
displacement 0.000 1.1e-08 0.921
Notes:
If your variable names are longer, they will need more space.
Displaying P-values with fixed numbers of decimal places won't prepare you for the circumstance in which all displayed digits are zero. The code exemplifies two forms of output.
The probability that values for the first group exceed those for the second group is very helpful in interpretation. Further summary statistics could be added.
Naturally a presentable table needs more header lines, best given with display.
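As a sketch of the header lines mentioned in the last note, the same loop can be preceded by two display commands. The column positions (14 and 26) and the header wording are arbitrary choices for this example and would need widening for longer variable names.

```stata
sysuse auto, clear
* header lines, aligned with the columns used inside the loop
di "variable{col 14}P-value{col 26}P(first > second)"
di "{hline 44}"
foreach v of varlist mpg price weight length displacement {
    quietly ranksum `v', by(foreign) porder
    scalar pval = 2*normal(-abs(r(z)))
    di "`v'{col 14}" %6.4f pval "{col 26}" %5.3f r(porder)
}
```

With real data the explicit list can be replaced by any varlist, for example a range such as firstvar-lastvar, so that 80 or more variables do not have to be typed out.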

Use esttab to generate summary statistics by group with columns for mean difference and significance

I would like to use esttab (ssc install estout) to generate summary statistics by group with columns for the mean difference and significance. It is easy enough to generate these as two separate tables with estpost, summarize, and ttest, and combine manually, but I would like to automate the whole process.
The following code generates the two components of the desired table.
sysuse auto, clear
* summary statistics by group
eststo clear
by foreign: eststo: quietly estpost summarize ///
price mpg weight headroom trunk
esttab, cells("mean sd") label nodepvar
* difference in means
eststo: estpost ttest price mpg weight headroom trunk, ///
by(foreign) unequal
esttab ., wide label
And I can print the two tables and cut-and-paste them into one table.
* can generate similar tables and append horizontally
esttab, cells("mean sd") label
esttab, wide label
* manual, cut-and-paste solution
-------------------------------------------------------------------------------------------------------
(1) (2) (3)
mean sd mean sd
-------------------------------------------------------------------------------------------------------
Price 6072.423 3097.104 6384.682 2621.915 -312.3 (-0.44)
Mileage (mpg) 19.82692 4.743297 24.77273 6.611187 -4.946** (-3.18)
Weight (lbs.) 3317.115 695.3637 2315.909 433.0035 1001.2*** (7.50)
Headroom (in.) 3.153846 .9157578 2.613636 .4862837 0.540** (3.30)
Trunk space (.. ft.) 14.75 4.306288 11.40909 3.216906 3.341*** (3.67)
-------------------------------------------------------------------------------------------------------
Observations 52 22 74
-------------------------------------------------------------------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
It seems that I should be able to get the desired table with one esttab call and without cutting-and-pasting, but I can't figure it out. Is there a way to generate the desired table without manually cutting-and-pasting?
I would prefer to output a LaTeX table, but anything that eliminates the cutting-and-pasting is a big step, even passing through a delimited text file.
If you still want to use esttab, you can play around using cells and pattern. The table in the original post can be replicated with the following code:
sysuse auto, clear
eststo domestic: quietly estpost summarize ///
price mpg weight headroom trunk if foreign == 0
eststo foreign: quietly estpost summarize ///
price mpg weight headroom trunk if foreign == 1
eststo diff: quietly estpost ttest ///
price mpg weight headroom trunk, by(foreign) unequal
esttab domestic foreign diff, ///
cells("mean(pattern(1 1 0) fmt(2)) sd(pattern(1 1 0)) b(star pattern(0 0 1) fmt(2)) t(pattern(0 0 1) par fmt(2))") ///
label
which yields
-----------------------------------------------------------------------------------------------------
(1) (2) (3)
mean sd mean sd b t
-----------------------------------------------------------------------------------------------------
Price 6072.42 3097.10 6384.68 2621.92 -312.26 (-0.44)
Mileage (mpg) 19.83 4.74 24.77 6.61 -4.95** (-3.18)
Weight (lbs.) 3317.12 695.36 2315.91 433.00 1001.21*** (7.50)
Headroom (in.) 3.15 0.92 2.61 0.49 0.54** (3.30)
Trunk space (.. ft.) 14.75 4.31 11.41 3.22 3.34*** (3.67)
-----------------------------------------------------------------------------------------------------
Observations 52 22 74
-----------------------------------------------------------------------------------------------------
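Since the question asks for LaTeX output, the same call can probably be pointed at a file. This is a sketch only: the file name summary_by_group.tex and the column titles are assumptions, and the booktabs option requires the LaTeX booktabs package when the table is compiled.

```stata
sysuse auto, clear
eststo clear
eststo domestic: quietly estpost summarize ///
    price mpg weight headroom trunk if foreign == 0
eststo foreign: quietly estpost summarize ///
    price mpg weight headroom trunk if foreign == 1
eststo diff: quietly estpost ttest ///
    price mpg weight headroom trunk, by(foreign) unequal
* same cells() specification as on screen, written to a .tex file instead
esttab domestic foreign diff using "summary_by_group.tex", ///
    replace booktabs label ///
    mtitles("Domestic" "Foreign" "Difference") ///
    cells("mean(pattern(1 1 0) fmt(2)) sd(pattern(1 1 0)) b(star pattern(0 0 1) fmt(2)) t(pattern(0 0 1) par fmt(2))")
```

Here replace is needed because a file is being written, unlike in the on-screen version.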
I don't think there's a way to do this with esttab (estout package from ssc), but I have a solution with listtab (also ssc) and postfile. The table here is a little different than the one I propose above, but the approach is general enough that you can modify it to fit your needs.
This solution also uses LaTeX's booktabs package.
/* data and variables */
sysuse auto, clear
local vars price mpg weight headroom trunk
/* means */
tempname postMeans
tempfile means
postfile `postMeans' ///
str100 varname domesticMeans foreignMeans pMeans using "`means'", replace
foreach v of local vars {
    local name: variable label `v'
    ttest `v', by(foreign)
    post `postMeans' ("`name'") (r(mu_1)) (r(mu_2)) (r(p))
}
postclose `postMeans'
/* medians */
tempname postMedians
tempfile medians
postfile `postMedians' ///
domesticMedians foreignMedians pMedians using `medians', replace
foreach v of local vars {
    summarize `v' if !foreign, detail
    local med1 = r(p50)
    summarize `v' if foreign, detail
    local med2 = r(p50)
    ranksum `v', by(foreign)
    local pval = 2 * (1 - normal(abs(r(z))))
    post `postMedians' (`med1') (`med2') (`pval')
}
postclose `postMedians'
/* combine */
use `means', clear
merge 1:1 _n using `medians', nogenerate
format *Means *Medians %9.3gc
list
/* make latex table */
/* requires LaTeX package `booktabs` */
listtab * using "Table.tex", ///
rstyle(tabular) replace ///
head("\begin{tabular}{lcccccc}" ///
"\toprule" ///
"& \multicolumn{3}{c}{Means} & \multicolumn{3}{c}{Medians} \\" ///
"\cmidrule(lr){2-4} \cmidrule(lr){5-7}" ///
"& Domestic & Foreign & \emph{p} & Domestic & Foreign & \emph{p}\\" ///
"\midrule") ///
foot("\bottomrule" "\end{tabular}")
This yields the following.
The answer chosen is nice but a bit redundant. You can achieve the same result with only estpost ttest.
sysuse auto, clear
estpost ttest price mpg weight headroom trunk, by(foreign)
esttab, cells("mu_1 mu_2 b(star)")
The output looks like this (the rows shown come from a different dataset, not auto):
mu_1 mu_2 b
c_score 43.33858 42.034 1.30458***
nc_a4_17 4.007524 3.924623 .0829008*

Save on disk and reload estimates for esttab

Is there any way to save and reload data in between an eststo command and an esttab?
What I would love is something like the following:
eststo: quietly reg a b
estsave using foo.est, replace
***
*Some other File
estload using foo.est
esttab foo.tex
Any other alternatives that let me play with the way I output regressions by trial and error (without having to re-run them and having to be at an interactive prompt) would be enormously helpful.
Why do you need to put it to disk?
The (prefix) command eststo keeps the results in memory for the rest of the Stata session (or until you clear them) and, unless you specify a name, labels each set consecutively (est1, est2, etc.). You can therefore edit and re-run only the part of the do-file that builds the table.
Alternatively, you could create all estimates in a do and call it from a secondary do:
/* .do for make tables */
do makeEstimates.do
esttab ...
Elsewhere you program makeEstimates.do:
/* .do to make estimates */
quietly regress a b
eststo ab
You can run it once, then comment out the do makeEstimates.do line and work only on the esttab call, as long as the estimates do not change.
You can store estimates on disk with the estimates save command:
sysuse auto, clear
quietly regress price mpg
estimates save foo1
quietly regress price trunk
estimates save foo2
quietly regress price weight
estimates save foo3
The above code snippet creates 3 files in your current working directory containing the estimates:
foo1.ster
foo2.ster
foo3.ster
You can then reload these and use them with esttab non-interactively and in any way you like with the estimates use command:
estimates use foo2
esttab .
----------------------------
(1)
price
----------------------------
trunk 216.7**
(2.81)
_cons 3183.5**
(2.87)
----------------------------
N 74
----------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
estimates use foo1
esttab .
----------------------------
(1)
price
----------------------------
mpg -238.9***
(-4.50)
_cons 11253.1***
(9.61)
----------------------------
N 74
----------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
estimates use foo3
esttab .
----------------------------
(1)
price
----------------------------
weight 2.044***
(5.42)
_cons -6.707
(-0.01)
----------------------------
N 74
----------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
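To put several reloaded models side by side in one table, one possible approach is to give each reloaded set a name with estimates store before calling esttab. The following is a sketch under that assumption; the file names est_mpg, est_trunk, est_weight and the stored names m_* are invented for this example, and e(sample) is not restored by estimates use, so anything that depends on it will not be available.

```stata
sysuse auto, clear
* save one .ster file per model
foreach x in mpg trunk weight {
    quietly regress price `x'
    estimates save est_`x', replace
}
* later, possibly in another do-file: reload and name each set
estimates clear
foreach x in mpg trunk weight {
    estimates use est_`x'
    estimates store m_`x'
}
esttab m_mpg m_trunk m_weight
```

This keeps the slow estimation step and the table-formatting step in separate places, which is what the question is after.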