Storing the predictions and coefficients from Stata for n replications - stata

I have the following code, in which I am trying to repeat the estimation n times and then generate predictions and coefficients for further use.
capture program drop mypro
program define mypro
drop _all
sysuse auto
bsample
reg mpg price headroom
mat mycoef=e(b)
gen mypri=mycoef[1,1]
gen myhead=mycoef[1,2]
gen mycons=mycoef[1,3]
predict x1b
end
simulate, seed(10) reps(10) nodots : mypro
By default, simulate returns only the coefficients from the 10 different samples. However, I am trying to save each sample dataset along with the coefficients mypri, myhead, mycons, and the prediction x1b. Is it possible to do this using simulate, or do I need to use a loop?
Updated as per comment of Nick:
capture program drop mypro
program define mypro
set seed 1
local r=10
forvalues i=1/`r'{
drop _all
sysuse auto
bsample
reg mpg price headroom
mat mycoef=e(b)
gen mypri=mycoef[1,1]
gen myhead=mycoef[1,2]
predict x1b
save data`i',replace
}
end

You are calling simulate to run your program to take a bootstrap sample to get regression results.
sysuse auto
bootstrap : reg mpg price headroom
is a much simpler approach. Look at the documentation for bootstrap to learn more.
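As a minimal sketch of how the replicate-level coefficients could be kept for further use, bootstrap accepts a saving() option. The file name bsresults is a hypothetical choice, and the seed and number of replications simply mirror the question. Note that this stores the coefficients only; keeping each resampled dataset together with its predictions would still need a loop with save, as in the updated code in the question.

```stata
sysuse auto, clear
* Save the coefficients from each of 10 bootstrap replications
* to bsresults.dta (hypothetical file name)
bootstrap _b, reps(10) seed(10) saving(bsresults, replace): regress mpg price headroom
* Load and inspect the replicate-level coefficients
use bsresults, clear
list
```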

Related

How can I run a ttest on the CEM matched data?

I have run a CEM matching process on my data using Stata, and now I would like to know how to run a t-test on the variables of the matched data.
/* Simple example of my code; first I run the CEM */
cem age gender education, treatment(treat)
/* Then I want to have a look at the summary statistics of the entire population and the matched data (this code works fine) */
summarize age gender education
summarize age gender education [iweight=cem_weights]
/* But if I want to do a t test only on the matched data, I get an error with the weights */
ttest age, by(treat) /* works fine */
ttest age [iweight=cem_weights], by(treat) /* error saying that weights are not allowed */
How can I run t tests only on the matched data? An option could also be to export the matched data, so how could I do that?
Something like this should do it:
sysuse auto, clear
cem mpg weight rep78, treatment(foreign)
mean price [iweight = cem_weights], over(foreign) //coeflegend
lincom _b[c.price#1.foreign] - _b[c.price#0.foreign]
Personally, I would just use regress with heteroskedasticity-robust standard errors (which gives similar results here, and will get even closer with a bigger sample):
regress price i.foreign [iweight = cem_weights], robust
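On the second part of the question, exporting the matched data: a rough sketch, assuming cem has left its usual cem_matched indicator in memory, is to keep the matched observations and save them. The file name matched_auto is hypothetical. An unweighted ttest on the saved subset would ignore cem_weights, so the weighted commands above remain the safer comparison.

```stata
sysuse auto, clear
* cem is user-written; install it first with: ssc install cem
cem mpg weight rep78, treatment(foreign)
preserve
* Keep only observations that cem flagged as matched
keep if cem_matched == 1
save matched_auto, replace   // hypothetical file name
restore
```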

How to access elements from previously-stored estimates

Say that I use this data to run these regressions and store the output:
eststo clear
sysuse auto2, clear
eststo e_mpg: reg price mpg
eststo e_trunk: reg price trunk
I now want to be able to access the _b and _se components of e_mpg.
I can see them displayed by:
estimates replay e_mpg
However, running the following does not work:
estimates use e_mpg
Basically, I want to have these estimates be in the current memory so that I can access _se[...] and _b[...] or, if that is not possible, access something from ereturn list or return list or some other solution.
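estimates use reads results from a file on disk written by estimates save, whereas eststo keeps the results in memory. You need to use estimates restore instead: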
eststo clear
sysuse auto2, clear
eststo e_mpg: regress price mpg
eststo e_trunk: regress price trunk
estimates restore e_mpg
(results e_mpg are active now)
matrix list e(b)
e(b)[1,2]
           mpg      _cons
y1  -238.89435  11253.061
display _b[mpg]
-238.89435
display _se[mpg]
53.076687
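If several stored models are needed in turn, the same idea can be put in a loop. This is only a sketch: the scalar names b_* and se_* are made up for illustration, and it relies on each stored model being named e_ followed by its regressor, as in the question.

```stata
eststo clear
sysuse auto2, clear
eststo e_mpg: regress price mpg
eststo e_trunk: regress price trunk
* Restore each stored model in turn and copy its slope and standard error
foreach x in mpg trunk {
    quietly estimates restore e_`x'
    scalar b_`x'  = _b[`x']
    scalar se_`x' = _se[`x']
    display "`x': b = " b_`x' "  se = " se_`x'
}
```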

Stata creating output for IV regression with bysort

So I am running a 2SLS model by interview year, and I have many interview years and different models. I want to present the first-stage results first and then, after reassuring the reader that they are solid, move on to the interesting results.
Example of Table A (first stage):
Year  DV  Coef  SE   F    N
1     A    0.5  0.1  100  1000
2     A    0.8  0.2   10  1500
3     B   -0.6  0.4  800   800
Table B with the main results would look the same just without the F-Stat.
I searched the web for how to create those tables automatically in Stata, but despite finding many questions I didn't find an answer that worked for me. From those different posts and help files I built something that is nearly there.
It creates the table I want for the main results with the F-Stat together by some variable (Step A in the code). However, when I move on to do the same for the first stage it only saves the last wave as I restore the estimates. I understand why Stata does it like that, but I cannot think of a way of convincing it to do what I want.
clear all
*Install user-written commands
ssc install outreg2, replace
ssc install ivreg210, replace
*load data
sysuse auto, clear
*run example model (obviously the model itself is bogus)
********************************************************
*Step A: creates the IV results by foreign plus the F-Statistic
bys foreign: ///
outreg2 using output1-IV-F, label excel stats(coef se) dec(2) adds(F-Test, e(widstat)) nocons nor2 keep(mpg) replace: ///
ivreg210 price headroom trunk (mpg=rep78 ), savefirst first
*Step B: creates the first stage results in a separate table
bys foreign: ///
ivreg210 price headroom trunk (mpg=rep78 ), savefirst first
est restore _ivreg210_mpg
outreg2 using output1_1st-stage, replace keep(rep78)
cap erase output1-IV-F
cap erase output1_1st-stage
So ideally I would only run the model once and have the F-Stat in the first-stage table, but I can fix that manually. The biggest issue I have is how to store the estimates when using bysort. If anyone has any suggestions about that, I would greatly appreciate it.
Thanks!
Install the estout package from SSC:
ssc install estout
Then you can store whatever results you want for later use, even after a bysort:
eststo clear
sysuse auto, clear
bysort foreign: eststo: reg price weight mpg
esttab, label nodepvar nonumber
This is a roundabout solution. It works, but it really isn't the proper solution I was (and am) looking for. The "trick" is to run the first stage as a separate model.
clear all
*Install user-written commands
ssc install outreg2, replace
ssc install ivreg210, replace
*load data
sysuse auto, clear
*run example model (obviously the model itself is bogus)
********************************************************
*Step A: creates the IV results by foreign plus the F-Statistic
bys foreign: ///
outreg2 using output1-IV-F, label excel stats(coef se) dec(2) adds(F-Test, e(widstat)) nocons nor2 keep(mpg) replace: ///
ivreg210 price headroom trunk (mpg=rep78 ), savefirst first
*Step B: creates the first stage results in a separate table
bys foreign: ///
ivreg210 price headroom trunk (mpg=rep78 ), savefirst first
est restore _ivreg210_mpg
outreg2 using output1_1st-stage1, replace keep(rep78)
*************
/* NEW BIT */
*************
*Step C: creates the first stage results in a separate table
bys foreign: ///
outreg2 using output1_1st_NEW, label excel stats(coef se) dec(2) nocons nor2 keep(rep78) replace: ///
reg mpg headroom trunk rep78
cap erase output1-IV-F
cap erase output1_1st-stage1
cap erase output1_1st_NEW
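The two answers can be combined: run the first stage as a separate regression for each group, store each one with eststo, and attach an F statistic for the instrument. The sketch below assumes estout is installed; the names first_0, first_1 and F_inst are made up for illustration. The F from test is the ordinary first-stage F for rep78, which need not equal e(widstat) from ivreg210 under robust or clustered standard errors.

```stata
sysuse auto, clear
eststo clear
* One first-stage regression per value of foreign
levelsof foreign, local(groups)
foreach g of local groups {
    quietly regress mpg headroom trunk rep78 if foreign == `g'
    * F test of the excluded instrument, added to the active results
    quietly test rep78
    estadd scalar F_inst = r(F)
    eststo first_`g'
}
* One column per group, showing only the instrument's coefficient
esttab first_*, keep(rep78) se stats(F_inst N) nonumber
```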

How to add more lines to esttab summarize summary stat table

I am trying to use esttab to create a LaTeX table with summary statistics using the summarize command. I can use code like the following to do this if I summarize multiple variables at once:
sysuse auto, clear
global vars price mpg headroom
eststo clear
eststo: estpost sum $vars, listwise
esttab est*, cells("count mean(fmt(2)) sd") nomtitles nonumber noobs
However, I am not sure how to summarize one line, store it, summarize another, store it, etc., and then combine all of them in the same table without creating unnecessary columns. I may want to summarize each variable individually if I want to make individualized restrictions by variable on which observations to summarize.
Here is code that doesn't get me what I want. Specifically, it does not put the summary statistics for each variable under the same column, but instead creates new columns, each set of which correspond to a different variable.
eststo clear
gen count = 1
foreach i in $vars {
eststo: estpost sum `i' if `i'>count
replace count = count+1
}
esttab est*, cells("count mean(fmt(2)) sd") nomtitles nonumber noobs
What should I change to get me my desired result?
Your problem is analogous to stacking models; instead of "models" you have summaries. The user-written command estout doesn't stack models, so one way out is to create your own matrix and feed it to estout (or esttab):
clear
set more off
*----- example data -----
sysuse auto
*----- two-variable example -----
eststo clear
// process price
estpost summarize price
matrix mymat = e(mean), e(count)
// process mpg
estpost summarize mpg if mpg > 15
matrix mymat = mymat \ e(mean), e(count)
// finish formatting matrix
matrix colnames mymat = mean count
matrix rownames mymat = price mpg
matrix list mymat
// tabulate
esttab matrix(mymat), nomtitles
With additional work, you can automate the steps.
See http://repec.org/bocode/e/estout/advanced.html#advanced901 for another example.
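One way the steps might be automated is to build the matrix inside a loop with nullmat(). This is a sketch only: the local cutoff is a hypothetical stand-in for the question's count variable, and summarize replaces estpost summarize because only r(N), r(mean) and r(sd) are needed.

```stata
sysuse auto, clear
local vars price mpg headroom
local cutoff = 1                 // hypothetical per-variable restriction
capture matrix drop mymat
foreach v of local vars {
    quietly summarize `v' if `v' > `cutoff'
    * nullmat() lets the first pass start from an empty matrix
    matrix mymat = nullmat(mymat) \ (r(N), r(mean), r(sd))
    local ++cutoff
}
matrix colnames mymat = count mean sd
matrix rownames mymat = `vars'
* One row per variable, all statistics under the same columns
esttab matrix(mymat), nomtitles
```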
You can use the fragment and append options to make tables line-by-line. You might want to do one variable without the fragment option to generate the same table header/footer, then cut-and-paste the remaining lines into this table.

Computing and plotting difference in group means

In what follows I plot the mean of an outcome of interest (price) by a grouping variable (foreign) for each possible value taken by the fake variable time:
sysuse auto, clear
gen time = rep78 - 3
bysort foreign time: egen avg_p = mean(price)
scatter avg_p time if (foreign==0 & time>=0) || ///
scatter avg_p time if (foreign==1 & time>=0), ///
legend(order(1 "Domestic" 2 "Foreign")) ///
ytitle("Average price") xlab(#3)
What I would like to do is to plot the difference in the two group means over time, not the two separate means.
I am surely missing something, but to me it looks complicated because the information about the averages is stored "vertically" (in avg_p).
Arguably, the easiest way to do this is to use linear regression to estimate the differences:
/* Regression Way */
drop if time < 0 | missing(time)
reg price i.foreign##i.time
margins, dydx(foreign) at(time =(0(1)2))
marginsplot, noci title("Foreign vs Domestic Difference in Price")
If regression is hard to wrap your mind around, the other way involves reshaping the data:
/* Transform the Data */
keep price time foreign
collapse (mean) price, by(time foreign)
reshape wide price, i(time) j(foreign)
gen diff = price1-price0
tw connected diff time
Here is another approach. graph dot will happily plot means.
sysuse auto, clear
set scheme s1color
collapse price if inrange(rep78, 3, 5), by(foreign rep78)
reshape wide price, i(rep78) j(foreign)
rename price0 Domestic
label var Domestic
rename price1 Foreign
label var Foreign
graph dot (asis) Domestic Foreign, over(rep78) vertical ///
marker(1, ms(Oh)) marker(2, ms(+))
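A further possibility, sketched here under the same setup as the question, keeps the original data in memory: compute each group's mean into its own variable with egen, so the two means sit side by side rather than "vertically". The variable names avg_dom, avg_for, diff and tag are made up for illustration.

```stata
sysuse auto, clear
gen time = rep78 - 3
* Group-specific means by time, each in its own variable
bysort time: egen avg_dom = mean(cond(foreign == 0, price, .))
bysort time: egen avg_for = mean(cond(foreign == 1, price, .))
gen diff = avg_for - avg_dom
* Plot one point per value of time
egen tag = tag(time)
twoway connected diff time if tag & time >= 0, sort ///
    ytitle("Foreign minus domestic average price") xlab(#3)
```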