How to store confidence intervals from Stata margins estimation?

Stata experts,
I have been trying to find a way to store marginal estimates, including the p-value and confidence interval.
Below is the code I have. All that I can get is the estimated marginal effect of X. It looks like I can't specify "ci" the way we can for the usual regression models. Is there a way to also store and present the other numbers from the marginal estimates?
probit Y1 X
margins, dydx(X) post
est store m1
probit Y2 X
margins, dydx(X) post
est store m2
esttab m1 m2
esttab m1 m2, ci
Another related question is: how do I save marginal estimations for interaction terms? Example code below
probit Y2 i.year##i.month
margins year#month, asbalanced post
Thank you in advance!

Here's a way to grab p-values and confidence intervals after a margins command.
sysuse auto, clear
probit foreign price trunk
margins, dydx(price) post
eststo m1
The results from the margins command:
------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       price |   .0000268   .0000159     1.69   0.092    -4.36e-06     .000058
------------------------------------------------------------------------------
Then the p-value and confidence interval are recoverable from the stored matrices e(b) and e(V). To get the p-value, we need the z-score, which is the point estimate over the standard error: e(b)[1,1]/sqrt(e(V)[1,1]). The rest is calculating the area in the two tails using normal().
The confidence interval is the point estimate e(b)[1,1] plus or minus the standard error sqrt(e(V)[1,1]) times the critical value of z, invnormal(0.975).
Shown with the output so that you can see the numbers line up:
. di "P-value: " normal(-abs(e(b)[1,1]/sqrt(e(V)[1,1])))*2
P-value: .09186065
. di "Upper bound: " e(b)[1,1] + sqrt(e(V)[1,1])*invnormal(0.975)
Upper bound: .00005796
. di "Lower bound: " e(b)[1,1] - sqrt(e(V)[1,1])*invnormal(0.975)
Lower bound: -4.361e-06
To put the p-value in a table, for example, you could use estadd:
estadd scalar pvalue = normal(-abs(e(b)[1,1]/sqrt(e(V)[1,1])))*2
And then esttab:
esttab m1, stats(pvalue, label("P-value"))
. esttab m1, stats(pvalue, label("P-value"))
----------------------------
(1)
----------------------------
price 0.0000268
(1.69)
----------------------------
P-value 0.0919
----------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
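The same estadd approach should carry over to the interval bounds. The sketch below assumes the estout package is installed and uses the illustrative scalar names ci_lo and ci_hi; the formulas are the ones from the display commands above.

```stata
* assumes the estout package is installed (ssc install estout)
sysuse auto, clear
probit foreign price trunk
margins, dydx(price) post
eststo m1

* add both bounds of the 95% interval as scalars (names are illustrative)
estadd scalar ci_lo = e(b)[1,1] - invnormal(0.975)*sqrt(e(V)[1,1])
estadd scalar ci_hi = e(b)[1,1] + invnormal(0.975)*sqrt(e(V)[1,1])

* report the added scalars in the table footer
esttab m1, stats(ci_lo ci_hi, fmt(%9.3g) labels("CI lower" "CI upper"))
```

With a single marginal effect the footer is a convenient place for these numbers; with several effects per model, one scalar pair per coefficient would be needed.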

Related

I want to create a table reporting results from two separate regressions using different variables

I have run two regressions on similar but different data sets and regressors, and I want to report the results side by side in one table using estout/esttab for comparability. The finished product should look something like this.
Table
-----------------------------------------
Dep. Var.: a
-----------------------------------------
Regression 1 |Regression 2
-----------------------------------------
x_1 coeff.|x_2 coeff.
y_1 coeff.|y_2 coeff.
z_1 coeff.|z_2 coeff.
l_1 coeff.|
m_1 coeff.|
-----------------------------------------
Obs value|Obs value
-----------------------------------------
Hypothesis |
x_1=1,p-value |x_2=1,p-value
-----------------------------------------
I am able to create individual tables like this just fine but I have honestly no idea where to start here and documentation hasn't been very helpful either. I hope someone here can point me in the right direction.
I think the eststo/esttab syntax will help. Here's code which I think produces what you're after:
ssc install estout, replace
sysuse auto, clear
eststo m1: regress price mpg
test mpg=1
estadd scalar pvalue = r(p)
eststo m2: regress price mpg trunk
test trunk=1
estadd scalar pvalue = r(p)
esttab m1 m2, b(3) se(3) stats(pvalue r2 N, fmt(3 3 0) label("P-value" "R-squared" "Observations")) mtitle("Regression 1" "Regression 2") nocons
It creates this table:
--------------------------------------------
(1) (2)
Regression 1 Regression 2
--------------------------------------------
mpg -238.894*** -220.165**
(53.077) (65.593)
trunk 43.559
(88.719)
--------------------------------------------
P-value 0.000 0.633
R-squared 0.220 0.222
Observations 74 74
--------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001
The esttab guide is really great.

How to display different decimal places for N and R-Squared using esttab?

I use esttab to generate regression tables. Please see sample code below.
eststo clear
eststo: reghdfe y x1 , absorb(year state) cluster(year state)
estadd local FE "Yes"
estadd local Controls "No"
eststo: reghdfe y x1 x2 x3, absorb(year state) cluster(year state)
estadd local FE "Yes"
estadd local Controls "Yes"
esttab ,star(* 0.10 ** 0.05 *** 0.01) b(3) t(2) r2 replace label drop(_cons) stats(FE Conrols N r2,fmt(%9.0fc) labels("Fixed Effects" "Controls" "Obserations" "R-Squared"))
Because my dataset is large, I use fmt(%9.0fc) to add commas to my number of observations. However, this option rounds my R-Squared to 0. How can I have integers (with commas) for observations and three decimal places for R-squared?
Also, is there a way to display adjusted R-squared? The user manual suggests that stats disables ar2 but I struggle to find a fix.
esttab is part of estout by Ben Jann, see the online documentation for installation and further information.
Here is a minimal working example using esttab's default formats.
eststo clear
sysuse auto
eststo: quietly regress price weight mpg
eststo: quietly regress price weight mpg foreign
esttab ,star(* 0.10 ** 0.05 *** 0.01) ///
b(3) t(2) ar2
Note that ar2 calls adjusted R^2. However, if you are going to use the stats() option to format the number of obs the syntax changes:
esttab ,star(* 0.10 ** 0.05 *** 0.01) ///
b(3) t(2) ///
stats(N r2_a)
To apply formats you add a term for each stat, and if you supply fewer terms than stats esttab applies the last one to the remaining stats. This is why your R^2 is rounding to zero. Try the following:
esttab ,star(* 0.10 ** 0.05 *** 0.01) ///
b(3) t(2) ///
stats(N r2_a, fmt(%9.0fc %9.2fc))
So I would edit your esttab line as follows:
esttab ,star(* 0.10 ** 0.05 *** 0.01) ///
b(3) t(2) /// no 'ar2' here
label /// note you only need 'replace' if you are generating a file
nocons /// 'nocons' replaces 'drop(_cons)'
stats(FE Controls N r2_a, /// 'Controls' not 'Conrols'
fmt(%9.0fc %9.0fc %9.0fc %9.2fc) /// it seems you need placeholders for the string stats
labels("Fixed Effects" "Controls" "Observations" "Adjusted R-Squared"))

How to extract any result from the output table

I am doing a weighted average and here is the table:
mean Income [fweight=Group]
Mean estimation
Number of obs = 1000
| Mean Std. Err. [95% Conf. Interval]
Income | 612.863 10.748 627.554 594.921
I really want to get the standard error and the confidence interval. However, I can only get variance by e(V). So my current method is to store e(V) in a matrix and store the element in a scalar and then use sqrt(). This is tedious!
Is there any way I can extract these statistics easily?
For example in R, all the output table is saved in a matrix RESULT and you can get the standard error simply through RESULT[1,2].
The command mean returns r(table) with the results you require:
webuse highschool, clear
mean height [pw = weight]
Mean estimation Number of obs = 4,071
--------------------------------------------------------------
| Mean Std. Err. [95% Conf. Interval]
-------------+------------------------------------------------
height | 432.8991 .4149654 432.0856 433.7127
--------------------------------------------------------------
matrix list r(table)
r(table)[9,1]
height
b 432.89913
se .41496538
t 1043.2175
pvalue 0
ll 432.08557
ul 433.71269
df 4070
crit 1.960547
eform 0
More generally, different Stata commands return different results. However, in nearly all cases they give you all the ingredients to easily calculate what you need.
It may require a bit more effort to calculate further results but this is easily programmable and if you need to do something often you can write a wrapper program for the command.
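As a small illustration of pulling individual numbers out of r(table), here is a sketch using the auto data instead of the highschool data so that it runs without a download. The matrix name T and the scalar names are illustrative.

```stata
sysuse auto, clear
mean price

* copy the results table before another command replaces r()
matrix T = r(table)

* pick cells by row and column name
scalar m_se = T[rownumb(T, "se"), colnumb(T, "price")]
scalar m_ll = T[rownumb(T, "ll"), colnumb(T, "price")]
scalar m_ul = T[rownumb(T, "ul"), colnumb(T, "price")]

display "Std. err.: " m_se
display "95% CI: [" m_ll ", " m_ul "]"

* the standard error is also available without the matrix
display "Std. err. via _se[]: " _se[price]
```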

How do I use outreg2 to display value labels in its output?

Take this code
sysuse auto, clear
reg price mpg c.mpg#i.foreign
outreg2 using "example.txt", stats(coef) replace
This outputs
(1)
VARIABLES price
price
mpg -329.0***
0b.foreign#co.mpg 0
1.foreign#c.mpg 78.33**
Constant 12,596***
Observations 74
R-squared 0.289
Standard errors in parentheses
*** p<0.01, ** p<0.05, * p<0.1
Ideally, I'd like it to display the value labels, as is done in the console's regression output:
-------------------------------------------------------------------------------
price | Coef. Std. Err. t P>|t| [95% Conf. Interval]
--------------+----------------------------------------------------------------
mpg | -329.0368 61.46843 -5.35 0.000 -451.6014 -206.4723
|
foreign#c.mpg |
Foreign | 78.32918 29.78726 2.63 0.010 18.93508 137.7233
|
_cons | 12595.97 1235.936 10.19 0.000 10131.58 15060.35
-------------------------------------------------------------------------------
I don't need any of the other stats at the moment; I'm strictly including that last piece of output to show what I mean with the value labels. Searching through the documentation for outreg2 tells me how to display variable labels, but not value labels.
Also posted on Statalist.
As @Dimitriy points out, you can use estout, from SSC. An example:
sysuse auto, clear
reg price mpg c.mpg#i.foreign
estimates store m1, title(Model 1)
estout m1, label
You can add other statistics, stars and more. After installation (ssc install estout), read help estout carefully.
If you decode your variables and use xi, it will do the trick. Of course, this solution assumes that you recode your variables, but if you want to stick with outreg2, it is an easy solution.
sysuse auto, clear
set seed 1234
gen maxspeed = round(uniform()*3)+1
label define speed 1 "Light" 2 "Ridiculous" 3 "Ludicrous" 4 "Plaid"
label values maxspeed speed
decode maxspeed, gen(maxspeed_str)
decode foreign, gen(foreign_str)
xi: reg price mpg weight i.foreign_str*i.maxspeed_str
outreg2 using test, see text label
I used the example you asked in Statalist as it was your latest question.

Percentile for COMPLEX SURVEY DATA while using subpop option

I am using a survey sample and am trying to analyze a subpopulation.
I am trying to get the mean, median, 10th percentile and 90th percentile of a continuous variable for my subpopulation of interest.
The Stata website http://www.stata.com/support/faqs/statistics/percentiles-for-survey-data/ shows the method to obtain the median and other percentiles.
However, I am interested in sub population and not the entire sample.
Can you please show me the appropriate commands to obtain any percentile while using a complex survey sample with sub population option?
You can use _pctile to get percentiles for a subpopulation without svyset, because the percentiles depend only on the weights. However, to get standard errors and confidence intervals, you should download epctile by Stas Kolenikov (findit epctile in Stata) and svyset the data.
net describe epctile, from(http://web.missouri.edu/~kolenikovs/stata)
net install epctile.pkg
The auto data will provide the example, with the variable weight being the probability weight.
sysuse auto, clear
_pctile price if foreign==0 [pw = weight], p(25 50 75)
return list
scalars:
r(r1) = 4195
r(r2) = 5104
r(r3) = 6486
Compare to svysetting the data and calling epctile:
gen strat = rep78
gen mkr = substr(make,1,2)
svyset mkr [pw = weight], strata(strat)
epctile price, percentiles(25 50 75) subpop(if foreign==0) svy
Results:
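Percentile estimation
------------------------------------------------------------------------------
             |             Linearized
       price |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         p25 |       4195       108.5    38.66   0.000     3982.344    4407.656
         p50 |       5104       320.5    15.93   0.000     4475.832    5732.168
         p75 |       6486        2093     3.10   0.002     2383.795     10588.2
------------------------------------------------------------------------------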
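For the percentiles asked about in the question (10th, median, 90th), the _pctile approach can be sketched as follows. It gives point estimates only. The scalar names are illustrative, and the auto variable weight again stands in for a probability weight.

```stata
sysuse auto, clear

* weighted percentiles for the subpopulation of domestic cars
_pctile price if foreign==0 [pw = weight], p(10 50 90)

* results are returned in the order requested: r(r1), r(r2), r(r3)
scalar p10 = r(r1)
scalar p50 = r(r2)
scalar p90 = r(r3)

display "10th percentile: " p10
display "Median:          " p50
display "90th percentile: " p90
```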