How to display r-squared for multiple models using outreg2 in Stata

I run two regressions for which I would like to show the r-squared:
logit y c.x1 c.x2
quietly est store e1
local r1 = e(r2_p)
logit y c.x1 c.x2 c.x3
quietly est store e2
local r2 = e(r2_p)
I tried to create a matrix and fill it, but was not successful:
mat t1=J(1,2,0) //Defining empty matrix with 2 columns 1 row
local rsq `r*' //Trying to store r1 and r2 as numeric
local a=1
forval i=1/2{
mat t1[`i'+1,`a']= `r*' // filling each row, one at a time, this fails
loc ++a
}
mat li t1
Ultimately, I would like to export the results with the community-contributed Stata command outreg2:
outreg2 [e*] using "myfile", excel replace addstat(Adj. R^2:, `rsq')

The following works for me:
webuse lbw, clear
logit low age smoke
outreg2 using "myfile.txt", replace addstat(Adj. R^2, e(r2_p))
logit low age smoke ptl ht ui
outreg2 using "myfile.txt", addstat(Adj. R^2, e(r2_p)) append
type myfile.txt
                              (1)         (2)
VARIABLES                     low         low

age                       -0.0498     -0.0541
                         (0.0320)    (0.0339)
smoke                     0.692**       0.557
                          (0.322)     (0.339)
ptl                                   0.679**
                                      (0.344)
ht                                    1.408**
                                      (0.624)
ui                                     0.817*
                                      (0.451)
Constant                   0.0609      -0.168
                          (0.757)     (0.806)

Observations                  189         189
Adj. R^2                   0.0315      0.0882
Standard errors in parentheses
*** p<0.01, ** p<0.05, * p<0.1
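As an aside, the matrix-filling attempt in the question fails for two reasons: `r*' is not valid macro syntax, and the row index `i'+1 runs past the matrix's single row. A minimal sketch of a working version, assuming locals r1 and r2 already hold the two e(r2_p) values:
mat t1 = J(1,2,0) // empty matrix with 1 row and 2 columns, one per model
forval i = 1/2 {
    mat t1[1,`i'] = `r`i'' // nested expansion: `r`i'' resolves `i' first, yielding `r1' then `r2'
}
mat li t1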

Power analysis via simulations in Stata version 15.1

I have been trying to run this simulation code in Stata 15.1, but I am running into the errors shown below.
local num_clus 3 6 9 18 36
local clussize 5 10 15 20 25
*Model specifications
local intercept 17.87
local timecoeff1 -5.42
local timecoeff2 -5.72
local timecoeff3 -7.03
local timecoeff4 -6.13
local timecoeff5 -9.13
local intrvcoeff 5.00
local sigma_u3 25.77
local sigma_u2 120.62
local sigma_error 38.35
local nrep 1000
local alpha 0.05
*Generate multi-level data
capture program drop swcrt
program define swcrt, rclass
version 15.1
preserve
clear
args num_clus clussize intercept intrvcoeff timecoeff1 timecoeff2 timecoeff3 timecoeff4 timecoeff5 sigma_u3 sigma_error alpha
assert `num_clus' > 0 & `clussize' > 0 & `intercept' > 0 & `intrvcoeff' > 0 & `timecoeff1' < 0 & `timecoeff2' < 0 & `timecoeff3' < 0 & `timecoeff4' < 0 & `timecoeff5' < 0 & `sigma_u3' > 0 & `sigma_error' > 0 & `alpha' > 0
/*Generate simulated multi-level data*/
qui
clear
set obs `num_clus'
qui gen cluster = _n
qui gen group = 1+mod(_n-1,4)
/*Generate cluster-level errors*/
qui gen u_3 = rnormal(0,`sigma_u3')
expand `clussize'
bysort cluster: gen individual = _n
/*Set up time*/
expand 6
bysort cluster individual: gen time = _n-1
/*Set up intervention variable*/
gen intrv = (time>=group)
/*Generate residual errors*/
qui gen error = rnormal(0,`sigma_error')
/*Generate outcome y*/
qui gen y = `intercept' + `intrvcoeff'*intrv + `timecoeff1'*1.time + `timecoeff2'*2.time + `timecoeff3'*3.time + `timecoeff4'*4.time + `timecoeff5'*5.time + u_3 + error
/*Fit multi-level model to simulated dataset*/
mixed y intrv i.time ||cluster:, covariance(unstructured) reml dfmethod(kroger)
/*Return estimated effect size, bias, p-value, and significance dichotomy*/
tempname M
matrix `M' = r(table)
return scalar bias = _b[intrv] - `intrvcoeff'
return scalar p = `M'[1,4]
return scalar p_= (`M'[1,4] < `alpha')
exit
end
*Postfile to store results
tempname step
tempfile powerresults
capture postutil clear
postfile `step' num_clus clussize intrvcoeff p p_ bias using `powerresults', replace
ERROR: (note: file /var/folders/v4/j5kzzhc52q9fvh6w9pcx9fgm0000gn/T//S_00310.00000c not found)
*Loop over number of clusters
foreach c of local num_clus{
display as text "Number of clusters" as result "`c'"
foreach s of local clussize{
display as text "Cluster size" as result "`s'"
forvalue i = 1/`nrep'{
display as text "Iterations" as result `nrep'
quietly swcrt `num_clus' `clussize' `intercept' `intrvcoeff' `timecoeff1' `timecoeff2' `timecoeff3' `timecoeff4' `timecoeff5' `sigma_u3' `sigma_error' `alpha'
post `step' (`c') (`s') (`intrvcoeff') (`r(p)') (`r(p_)') (`r(bias)')
}
}
}
postclose `step'
ERROR:
Number of clusters3
Cluster size5
Iterations1000
r(9);
*Open results, calculate power
use `powerresults', clear
levelsof num_clus, local(num_clus)
levelsof clussize, local(clussize)
matrix drop _all
*Loop over combinations of clusters
*Add power results to matrix
foreach c of local num_clus{
foreach s of local clussize{
quietly ci proportions p_ if num_clus == `c' & clussize == `s'
local power `r(proportion)'
local power_lb `r(lb)'
local power_ub `r(ub)'
quietly ci mean bias if num_clus == `c' & clussize == `s'
local bias `r(mean)'
matrix M = nullmat(M) \ (`c', `s', `intrvcoeff', `power', `power_lb', `power_ub', `bias')
}
}
*Display the matrix
matrix colnames M = c s intrvcoeff power power_lb power_ub bias
ERROR:
matrix M not found
r(111);
matrix list M, noheader format(%3.2f)
ERROR:
matrix M not found
r(111);
There are a few things that seem to be amiss above.
I get a message after the postfile command saying that the file is not found. Nowhere in my code do I actually use that name so it seems to be generated by Stata.
After the loop and the post command I get error r(9).
Error message r(111) - says that the matrix is not found.
I have checked the following parts of the code to try and resolve the issue:
Specified local macros outside of the program and passed into it via the args statement of the program
Match between the variables in the call of the swcrt with the args statement in the program
Match between the arguments in the program's assert statement and its args command, and whether the macro quote characters (`') are placed appropriately
Match b/w the number of variables in the post and postfile commands
I am not quite sure why I get these errors, considering that the code worked previously and the program iterated (even when I undo my changes, the errors persist). Does anyone know why this happens? If I had to guess, the matrix cannot be found because of the earlier error with the file not being found when I use postfile.
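For what it is worth, one likely culprit is visible in the code itself (offered as an observation, not a verified fix): the innermost loop calls swcrt with the full lists `num_clus' and `clussize' rather than the current loop values `c' and `s'. The arguments then arrive misaligned inside the program (timecoeff1, for example, receives 36), so the assert fails and exits with r(9); nothing is ever posted, the results file stays empty, and matrix M is never built, which is what produces the r(111) errors downstream. The note after postfile is harmless: Stata is simply remarking that the temporary file does not exist yet before creating it. A sketch of the corrected call:
quietly swcrt `c' `s' `intercept' `intrvcoeff' `timecoeff1' `timecoeff2' `timecoeff3' `timecoeff4' `timecoeff5' `sigma_u3' `sigma_error' `alpha'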

Export combined tables when using oprobit

I am running an ordered probit with four levels (A lot, Somewhat, Little, Not at all) on a female variable and some controls:
* Baseline only
eststo, title ("OProbit1"): /*quietly*/ oprobit retincome_worry i.female $control_socio, vce(robust)
estimate store OProbit1
* Baseline + Health Controls
eststo, title ("OProbit3"): oprobit retincome_worry i.female $control_socio $control_health, vce(robust)
estimate store OProbit3
I am doing this for marginal effects of the female variable:
* TABLE BASELINE
estimate restore OProbit1
margins, dydx(i.female) predict (outcome(1)) atmeans post
outreg using results\Reg_margins\Reg2.tex, noautosumm replace rtitle(A lot) ctitle(Social Controls) title(Worry about Retirement Income)
estimate restore OProbit1
margins, dydx(i.female) predict (outcome(2)) atmeans post
outreg using results\Reg_margins\Reg2.tex, noautosumm append rtitle(Somewhat)
estimate restore OProbit1
margins, dydx(i.female) predict (outcome(3)) atmeans post
outreg using results\Reg_margins\Reg2.tex, noautosumm append rtitle(Little)
estimate restore OProbit1
margins, dydx(i.female) predict (outcome(4)) atmeans post
outreg using results\Reg_margins\Reg2.tex, noautosumm append rtitle(Not at all) tex
* TABLE BASELINE + HEALTH
estimate restore OProbit3
margins, dydx(i.female) predict (outcome(1)) atmeans post
outreg using results\Reg_margins\Reg3.tex, noautosumm replace rtitle(A lot) ctitle(Baseline and Health) title(Worry about Retirement Income)
estimate restore OProbit3
margins, dydx(i.female) predict (outcome(2)) atmeans post
outreg using results\Reg_margins\Reg3.tex, append noautosumm rtitle(Somewhat)
estimate restore OProbit3
margins, dydx(i.female) predict (outcome(3)) atmeans post
outreg using results\Reg_margins\Reg3.tex, append noautosumm rtitle(Little)
estimate restore OProbit3
margins, dydx(i.female) predict (outcome(4)) atmeans post
outreg using results\Reg_margins\Reg3.tex, append noautosumm rtitle(Not at all) tex
I currently have four tables, each with a column name (the controls included in the model) and four rows, one per level.
How can I have all of this in a single table, keeping the four rows and adding more columns?
You can get the desired output using the community-contributed command esttab.
First, define the program appendmodels (obtained from here):
capt prog drop appendmodels
*! version 1.0.0 14aug2007 Ben Jann
program appendmodels, eclass
// using first equation of model
version 8
syntax namelist
tempname b V tmp
foreach name of local namelist {
qui est restore `name'
mat `tmp' = e(b)
local eq1: coleq `tmp'
gettoken eq1 : eq1
mat `tmp' = `tmp'[1,"`eq1':"]
local cons = colnumb(`tmp',"_cons")
if `cons'<. & `cons'>1 {
mat `tmp' = `tmp'[1,1..`cons'-1]
}
mat `b' = nullmat(`b') , `tmp'
mat `tmp' = e(V)
mat `tmp' = `tmp'["`eq1':","`eq1':"]
if `cons'<. & `cons'>1 {
mat `tmp' = `tmp'[1..`cons'-1,1..`cons'-1]
}
capt confirm matrix `V'
if _rc {
mat `V' = `tmp'
}
else {
mat `V' = ///
( `V' , J(rowsof(`V'),colsof(`tmp'),0) ) \ ///
( J(rowsof(`tmp'),colsof(`V'),0) , `tmp' )
}
}
local names: colfullnames `b'
mat coln `V' = `names'
mat rown `V' = `names'
eret post `b' `V'
eret local cmd "whatever"
end
Next, run the following (here I use Stata's fullauto toy dataset for illustration):
webuse fullauto, clear
estimates clear
forvalues i = 1 / 4 {
oprobit rep77 i.foreign
margins, dydx(foreign) predict (outcome(`i')) atmeans post
estimate store OProbit1`i'
}
appendmodels OProbit11 OProbit12 OProbit13 OProbit14
estimates store result1
forvalues i = 1 / 4 {
oprobit rep77 i.foreign length mpg
margins, dydx(foreign) predict (outcome(`i')) atmeans post
estimate store OProbit2`i'
}
appendmodels OProbit21 OProbit22 OProbit23 OProbit24
estimates store result2
forvalues i = 1 / 4 {
oprobit rep77 i.foreign trunk weight
margins, dydx(foreign) predict (outcome(`i')) atmeans post
estimate store OProbit3`i'
}
appendmodels OProbit31 OProbit32 OProbit33 OProbit34
estimates store result3
forvalues i = 1 / 4 {
oprobit rep77 i.foreign price displ
margins, dydx(foreign) predict (outcome(`i')) atmeans post
estimate store OProbit4`i'
}
appendmodels OProbit41 OProbit42 OProbit43 OProbit44
estimates store result4
Finally, see the results:
esttab result1 result2 result3 result4, keep(1.foreign) varlab(1.foreign " ") ///
labcol2("A lot" "Somewhat" "A little" "Not at all") gaps noobs nomtitles
----------------------------------------------------------------------------
                            (1)            (2)            (3)            (4)
----------------------------------------------------------------------------
A lot                   -0.0572        -0.0677        -0.0728        -0.0690
                        (-1.83)        (-1.67)        (-1.81)        (-1.67)

Somewhat               -0.144**      -0.247***       -0.188**        -0.175*
                        (-2.73)        (-3.54)        (-2.86)        (-2.47)

A little                 -0.124       -0.290**       -0.290**         -0.163
                        (-1.86)        (-3.07)        (-3.07)        (-1.74)

Not at all              0.198**       0.351***        0.252**         0.237*
                         (2.64)         (3.82)         (2.95)         (2.55)
----------------------------------------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
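Since the tables in the question were exported to .tex files, note that esttab can also write this combined table directly to LaTeX; a minimal sketch (the output path is illustrative, the options mirror the screen version):
esttab result1 result2 result3 result4 using Reg_combined.tex, tex replace ///
    keep(1.foreign) varlab(1.foreign " ") ///
    labcol2("A lot" "Somewhat" "A little" "Not at all") gaps noobs nomtitles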
You can install esttab by typing the following in Stata's command prompt:
ssc install estout

test with missing standard errors

How can I conduct a hypothesis test in Stata when my predictor perfectly predicts my dependent variable?
I would like to run the same regression over many subsets of my data. For each regression, I would then like to test the hypothesis that beta_1 = 1/2. However, for some subsets, I have perfect collinearity, and Stata is not able to calculate standard errors.
For example, in the below case,
sysuse auto, clear
gen value = 2*foreign*(price<6165)
gen value2 = 2*foreign*(price>6165)
gen id = 1 + (price<6165)
I get the output
. reg foreign value value2 weight length, noconstant

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  4,    70) =       .
       Model |          22     4         5.5           Prob > F      =       .
    Residual |           0    70           0           R-squared     =  1.0000
-------------+------------------------------           Adj R-squared =  1.0000
       Total |          22    74  .297297297           Root MSE      =       0

------------------------------------------------------------------------------
     foreign |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       value |         .5          .        .       .           .           .
      value2 |         .5          .        .       .           .           .
      weight |   3.54e-19          .        .       .           .           .
      length |  -6.31e-18          .        .       .           .           .
------------------------------------------------------------------------------
and
. test value = .5

 ( 1)  value = .5

       F(  1,    70) =        .
            Prob > F =        .
In the actual data, there is usually more variation. So I can identify the cases where the predictor does a very good job of predicting the DV--but I miss those cases where prediction is perfect. Is there a way to conduct a hypothesis test that catches these cases?
EDIT:
The end goal would be to classify observations within subsets based on the hypothesis test. If I cannot reject the hypothesis at the 95% confidence level, I classify the observation as type 1. Below, both groups would be classified as type 1, though I only want the second group.
gen type = .
forvalues i = 1/2 {
quietly: reg foreign value value2 weight length if id == `i', noconstant
test value = .5
replace type = 1 if r(p)>.05
}
There is no way to do this out of the box that I'm aware of. Of course you could program it yourself to get an approximation of the p-value in these cases. The standard error is missing here because the relationship between x and y is perfectly collinear. There is no noise in the model, nothing deviates.
Interestingly enough, though, the standard error of the estimate is useless in this case anyway. test performs a Wald test of beta_i = exp against beta_i != exp (where exp is the hypothesized value), not a t-test.
The Wald test uses the variance-covariance matrix from the regression. To see this yourself, refer to the Methods and formulas section here and run the following code:
(also, if you remove the -1 from the gen mpg2 = line and rerun the code, you will see the issue)
sysuse auto, clear
gen mpg2 = mpg * 2.5 - 1
qui reg mpg2 mpg, nocons
* collect matrices to calculate Wald statistic
mat b = e(b) // Vector of Coefficients
mat V = e(V) // Var-Cov matrix
mat R = (1) // for use in Rb-r; this is not [0,1] because regress was run with the noconstant option
mat r = (2.5) // Value you want to test for equality
mat W = (R*b-r)'*inv(R*V*R')*(R*b-r)
// This is where it breaks for you, because with perfect collinearity, V == 0
reg mpg2 mpg, nocons
test mpg = 2.5
sca F = r(F)
sca list F
mat list W
Now, as @Brendan Cox suggested, you might be able to simply use the missing value returned in r(p) to condition your replace command, depending on exactly how you are using it. A word of caution on this, however: when the relationship between some x and y is such that y = 2x, and you want to test x = 5 versus test x = 2, you will want to be very careful about the interpretation of missing p-values. In both cases the group is classified as type == 1, even though the test x = 2 command should not result in that outcome.
Another work-around would be to simply set p = 0 in these cases, since the variance estimate will asymptotically approach 0 as the linear relationship becomes near perfect, and thus the Wald statistic will approach infinity (driving p down, all else equal).
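In its simplest form, that workaround just recodes a missing r(p) before classifying; a sketch, adapting the loop from the question's edit:
gen type = .
forvalues i = 1/2 {
    quietly reg foreign value value2 weight length if id == `i', noconstant
    quietly test value = .5
    local p = cond(missing(r(p)), 0, r(p)) // treat a perfect fit (missing p) as p = 0
    replace type = 1 if id == `i' & `p' > .05
}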
A final yet more complicated work-around in this case could be to calculate the F-statistic manually using the formula in the manual, and setting V to some arbitrary, yet infinitesimally small number. I've included code to do this below, but it is quite a bit more involved than simply issuing the test command, and in truth only an approximation of the actual p-value from the F distribution.
clear *
sysuse auto
gen i = ceil(_n/5)
qui sum i
gen mpg2 = mpg * 2 if i <= 5 // Get different estimation results
replace mpg2 = mpg * 10 if i > 5 // over different subsets of data
gen type = .
local N = _N // use for d.f. calculation later
local iMax = r(max) // use to iterate loop
forvalues i = 1/`iMax' {
qui reg mpg2 mpg if i == `i', nocons
mat b`i' = e(b) // collect returned results for Wald stat
mat V`i' = e(V)
sca cov`i' = V`i'[1,1]
mat R`i' = (1)
mat r`i' = (2) // Value you wish to test against
if (cov`i' == 0) { // set V to be very small if Variance = 0 & calculate Wald
mat V`i' = 1.0e-14
}
mat W`i' = (R`i'*b`i'-r`i')'*inv(R`i'*V`i'*R`i'')*(R`i'*b`i'-r`i')
sca W`i' = W`i'[1,1] // collect Wald statistic into scalar
sca p`i' = Ftail(1,`N'-2, W`i') // pull p-value from F dist
if p`i' > .05 {
replace type = 1 if i == `i'
}
}
Also note that this workaround will become slightly more involved if you want to test multiple coefficients.
I am hesitant to advise these approaches without a word of caution, considering you are in a very real sense "making up" variance estimates; then again, without a variance estimate you won't be able to test the coefficients at all.

Plot confidence interval efficiently

I want to plot confidence intervals for some estimates after running a regression model.
As I'm working with a very big dataset, I need an efficient solution: in particular, a solution that does not require me to sort or save the dataset. In the following example, I plot estimates for b1 to b6:
reg y b1 b2 b3 b4 b5 b6
foreach i of numlist 1/6 {
local mean `mean' `=_b[b`i']' `i'
local ci `ci' ///
(scatteri ///
`=_b[b`i'] +1.96*_se[b`i']' `i' ///
`=_b[b`i'] -1.96*_se[b`i']' `i' ///
,lpattern(shortdash) lcolor(navy))
}
twoway `ci' (scatteri `mean', mcolor(navy)), legend(off) yline(0)
While scatteri efficiently plots the estimates, I can't get boundaries for the confidence interval similar to rcap.
Is there a better way to do this?
Here's token code for what you seem to want. The example is ridiculous. It's my personal view that refining this would be pointless given the very accomplished previous work behind coefplot. The multiplier of 1.96 only applies in very large samples.
sysuse auto, clear
set scheme s1color
reg mpg weight length displ
gen coeff = .
gen upper = .
gen lower = .
gen which = .
local i = 0
quietly foreach v in weight length displ {
local ++i
replace coeff = _b[`v'] in `i'
replace upper = _b[`v'] + 1.96 * _se[`v'] in `i'
replace lower = _b[`v'] - 1.96 * _se[`v'] in `i'
replace which = `i' in `i'
label def which `i' "`v'", modify
}
label val which which
twoway scatter coeff which, mcolor(navy) xsc(r(0.5, `i'.5)) xla(1/`i', val) ///
|| rcap upper lower which, lcolor(navy) xtitle("") legend(off)
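For comparison, the coefplot route mentioned at the outset needs no changes to the dataset at all, since it builds the plot from e(b) and e(V); a minimal sketch (coefplot is community-contributed and installable with ssc install coefplot):
reg mpg weight length displ
coefplot, drop(_cons) vertical yline(0)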

Create comparison-of-means table with multiple variables by multiple groups comparing to total mean

I'm looking for a way to create a comparison-of-means (t-test) table from the output of a tabstat command. Basically, I want to know if the mean of each group is statistically significantly different from the mean for the variable overall.
I have 75 variables across 15 groups for a total of 1125 t-tests, so doing them one at a time is out of the question.
I could always write a loop for the tests, but I was wondering if there was a command similar to tabstat that would make the table for me. Google has been unhelpful thus far, even though it seems like a fairly logical place to go from a tabstat output.
Thanks!
There might be packages that better serve you, but here's an example I just put together. It assumes a one-sample t-test, because I cannot see another way to do it with a t-test. This block of code returns a matrix with three things: the difference from the grand mean, the t value, and the p value.
Feel free to adapt the code as you see fit. Actually it'd just take a few more steps to make it into an ado file.
sysuse auto,clear
loca varlist mpg weight length price // put varlist here
loca grpvar foreign // put grouping variable here
loca n_var=wordcount("`varlist'")
qui tab `grpvar'
loca n_grp=`r(r)'
mat T=J(`n_var'*3,`n_grp',.) // (# of vars*3, # of groups,.)
**colnames
loca cnames=""
su `grpvar', meanonly
forval i=`r(min)'/`r(max)' { // assuming consecutive sequence
loca cnames="`cnames'"+" "+"`i'"
}
mat colnames T=`cnames' // values of grouping variable
**rownames
loca rnames=""
forval i=1/`n_var' {
loca var=word("`varlist'",`i')
loca rnames="`rnames'"+" "+"`var':diff `var':t `var':p"
}
mat rownames T=`rnames' // difference, t value, p value
loca i=1
foreach var in `varlist' {
loca j=1
su `grpvar', meanonly
forval f=`r(min)'/`r(max)' {
su `var', meanonly
loca ydbhat=`r(mean)' // y double hat
su `var' if `grpvar'==`f', meanonly
loca diff=`ydbhat'-`r(mean)' // difference
qui ttest `var'=`ydbhat' if `grpvar'==`f' // one-sample ttest
mat T[`i',`j']=`diff'
mat T[`i'+1,`j']=`r(t)'
mat T[`i'+2,`j']=`r(p)'
loca ++j
}
loca i=`i'+3
}
mat list T, f(%8.3f)
Now, I am not sure whether 15 columns would be too wide. If so, change the display format, or simply use putexcel to export the matrix to a spreadsheet, as sketched below.
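For the putexcel route, a minimal sketch (the file name is illustrative; this syntax requires Stata 14 or newer):
putexcel set ttest_results.xlsx, replace
putexcel A1 = matrix(T), names // writes T together with its row and column names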
Edited: Fixed the forval i=0/1 in the loops to a more generally applicable form. Also other minor editing.
Edited the code a bit - can't post markdown in the comments, so I made it a new answer. This version does a two-sample t-test and also displays the cluster mean for each variable.
local varlist var1 var2 var3 // put varlist here
local grpvar _clus_1 // put grouping variable here
local n_var=wordcount("`varlist'")
qui summ `grpvar', meanonly
local n_grp=`r(max)'
mat T=J(`n_var'*4,`n_grp',.) // (# of vars*4,# of groups,.)
**colnames
local cnames=""
qui summ `grpvar', meanonly
forval i=`r(min)'/`r(max)' { // assuming consecutive sequence
local cnames="`cnames'"+" "+"`i'"
}
//di "`cnames'"
mat colnames T=`cnames' // values of grouping variable
**rownames
local rnames=""
forval i=1/`n_var' {
local var=word("`varlist'",`i')
local rnames="`rnames'"+" "+"`var':mean `var':diff `var':t-stat `var':p-value"
}
mat rownames T=`rnames' // mean, difference, t value, p value
local i=1
foreach var in `varlist' {
local j=1
qui summ `grpvar'
forval f=`r(min)'/`r(max)' {
qui summ `var'
local varmean=`r(mean)'
local varn = `r(N)'
local varsd = `r(sd)'
qui summ `var' if `grpvar'==`f'
local clusmean = `r(mean)'
local clusn = `r(N)'
local clussd = `r(sd)'
local diff=`clusmean'-`varmean' // difference
**two-sample t-test
qui ttesti `varn' `varmean' `varsd' `clusn' `clusmean' `clussd'
mat T[`i',`j']=`clusmean'
mat T[`i'+1,`j']=`diff'
mat T[`i'+2,`j']=`r(t)'
mat T[`i'+3,`j']=`r(p)'
local ++j
}
local i=`i'+4
}
mat list T, f(%8.3f)