generating bootstrap standard errors for large number of scalars - stata

Suppose I have four scalars: call them dea_1 dea_2 dea_3 dea_4. They are output from a program samprogram (not shown here).
Now I use the bootstrap command in Stata with these scalars to get bootstrapped standard errors.
set seed 123
bootstrap dea_1=r(dea_1)dea_2=r(dea_2)dea_3=r(dea_3)dea_4=r(dea_4), reps(100): samprogram
This is fine but in my original program, I calculate 30 scalars, dea_1 dea_2 ... dea_30. Now I want to avoid writing each of these 30 scalars in the bootstrap command and for this purpose I wrote a loop as follows:
set seed 234
forvalues i in 1(1)30{
local k dea_`i'
bootstrap dea_`k'=r(dea_`k'), reps(100): samprogram
}
This works, but gives the output for each scalar one at a time. However, I am looking for code which avoids writing all scalars in the bootstrap command but still gives the output for all at the same time (i.e. like the output from the following command)
set seed 345
bootstrap dea_1=r(dea_1)dea_2=r(dea_2)dea_3=r(dea_3)dea_4=r(dea_4)[omitted]...dea_30=r(dea_30), reps(100): samprogram
Any help in this regard will be highly appreciated.

This yields to building-up the contents of a local macro step-by-step.
set seed 123
forval i = 1/30 {
local call `call' dea_`i'=r(dea_`i')
}
bootstrap `call', reps(100) : samprogram
If need be, blank out the macro beforehand by
local call
More discussion in http://www.stata-journal.com/sjpdf.html?articlenum=pr0005 [free .pdf]
(LATER) Note that contrary to your assertion the code
set seed 234
forvalues i in 1(1)30{
local k dea_`i'
bootstrap dea_`k'=r(dea_`k'), reps(100): samprogram
}
would not work as intended. First time round, for example, bootstrap would be looking for r(dea_dea_1) and would return missing for every sample. The code for calling bootstrap repeatedly could simply be
set seed 234
forvalues i = 1/30 {
bootstrap dea_`i'=r(dea_`i'), reps(100): samprogram
}
but that would be a bad idea when you can do what you want in one call.

An alternative solution would be to make your program eclass and return the results in the matrix e(b). This allows the shortcut bootstrap _b, reps(100): samprogram. Below is an example. The key points here are that the different scalars are stored in the row vector `b', which is returned by the program as the row vector e(b) whith the command:
ereturn post `b', esample(`touse')
A complete example is here:
clear all
program define sim, eclass
syntax varlist(numeric) [if] [in], by(varname numeric)
marksample touse
markout `touse' `by'
local k : word count `varlist'
tempname b m0
matrix `b' = J(1,`k',.)
local i = 1
foreach var of local varlist {
sum `var' if `touse' & `by', meanonly
scalar `m0' = r(mean)
sum `var' if `touse' & !`by', meanonly
matrix `b'[1,`i'] = `m0' - r(mean)
local i = `i' + 1
}
ereturn post `b', esample(`touse')
end
sysuse auto
bootstrap _b, reps(100) : sim price mpg length weight trunk, by(foreign)

Related

Stata coefplot: plot coefficients and corresponding confidence intervals on 2nd axis

When trying to depict two coefficients from one regression on separate axes with Ben Jann's superb coefplot (ssc install coefplot) command, the coefficient to be shown on the 2nd axis is correctly displayed, but its confidence interval is depicted on the 1st scale.
Can anyone explain how I get the CI displayed on the same (2nd) axis as the coefficient it belongs to? I couldn't find any option to change this - and imagine it should be the default, if not the only, option to plot the CI around the point estimate it belongs to.
I use the latest coefplot version with Stata 16.
Here is a minimum example to illustrate the problem:
results plot
webuse union, clear
eststo results: reg idcode i.union grade
coefplot (results, keep(1.union)) (results, keep(grade) xaxis(2))
In the line
coefplot (results, keep(1.union)) (results, keep(grade) xaxis(2))
you specify the option xaxis(2), but this is not a documented option of coefplot, although it is a valid option of twoway rspike which is called by coefplot. Apparently, if you use xaxis(2) something goes wrong with the communication between coefplot and rspike.
This works for me:
coefplot (results, keep(1.union)) (results, keep(grade) axis(2))
I'm trying to create something similar. Since this option is not built-in we need to write a program to tweak how coefplot works. I'm sharing the code from the user manual here: http://repec.sowi.unibe.ch/stata/coefplot/markers.html
capt program drop coefplot_mlbl
*! version 1.0.0 10jun2021 Ben Jann
program coefplot_mlbl, sclass
_parse comma plots 0 : 0
syntax [, MLabel(passthru) * ]
if `"`mlabel'"'=="" local mlabel mlabel(string(#b, "%5.2f") + " (" + string(#ll, "%5.2f") + "; " + string(#ul, "%5.2f") + ")")
preserve
qui coefplot `plots', `options' `mlabel' generate replace nodraw
sreturn clear
tempvar touse
qui gen byte `touse' = __at<.
mata: st_global("s(mlbl)", ///
invtokens((strofreal(st_data(.,"__at","`touse'")) :+ " " :+ ///
"`" :+ `"""' :+ st_sdata(.,"__mlbl","`touse'") :+ `"""' :+ "'")'))
sreturn local plots `"`plots'"'
sreturn local options `"`options'"'
end
capt program drop coefplot_ymlbl
*! version 1.0.0 10jun2021 Ben Jann
program coefplot_ymlbl
_parse comma plots 0 : 0
syntax [, MLabel(str asis) * ]
_parse comma mlspec mlopts : mlabel
local mlopts = substr(`"`mlopts'"', 2, .) // remove leading comma
if `"`mlspec'"'!="" local mlabel mlabel(`mlspec')
else local mlabel
coefplot_mlbl `plots', `options' `mlabel'
coefplot `plots', ///
yaxis(1 2) yscale(alt) yscale(axis(2) alt noline) ///
ylabel(none, axis(2)) yti("", axis(2)) ///
ymlabel(`s(mlbl)', axis(2) notick angle(0) `mlopts') `options'
end
coefplot_ymlbl D F, drop(_cons) xline(0)
However, the above program does not allow for the option 'bylabel'. I get a stata error saying "bylabel not allowed". I wanted to ask if there is a way to edit this code and include the bylabel option which is used to label subplots?

Scatter plot color by variable

I want to make an scatter plot in Stata with points colored according to a categorical variable.
The only way I've found to do this, is to code colors in layers of a twoway plot.
However, this seems a rather convoluted solution for such a simple operation:
twoway (scatter latitud longitud if nougrups4 ==1, mcolor(black)) ///
(scatter latitud longitud if nougrups4 ==2, mcolor(blue)) ///
(scatter latitud longitud if nougrups4 ==3, mcolor(red)) ///
(scatter latitud longitud if nougrups4 ==4, mcolor(green))
Is there a simpler and automatic way to do this?
In this case, the categorical variable nougrups4 came from a cluster analysis. A general solution would be fine, but also a specific solution to draw clusters.
This is how I would do this by hand:
sysuse auto, clear
separate price, by(rep78)
tw scatter price? mpg
drop price?
Or in one line using Nick Cox's sepscatter command from SSC:
sepscatter price mpg, separate(rep78)
The latter command can also output other type of plots with the recast() option.
There isn't a 'simpler' built-in solution for what you want to do.
However, here's a simple wrapper command, which you can extend to meet your needs:
capture program drop foo
program define foo
syntax varlist(min=1 max=3)
quietly {
tokenize `varlist'
levelsof `3', local(foolevels)
local i = 0
local foocolors red green blue
foreach x of local foolevels {
local ++i
local extra `extra' || scatter `1' `2' if `3' == `x', mcolor("`: word `i' of `foocolors''")
}
twoway `extra'
}
end
And a toy example:
clear
set obs 10
generate A = runiform()
generate B = runiform()
generate C = .
replace C = 1 in 1/3
replace C = 2 in 4/7
replace C = 3 in 8/10
foo A B C

Post e(b) vector from a custom program in Stata

I wrote a program that computes a weighted regression and now I want my estimation results to be stored as an e(b) vector so that the bootstrap command can easily access the results, but I keep getting an error. My program looks like:
capture program drop mytest
program mytest, eclass
version 13
syntax varlist [if]
marksample touse
// mata subroutine creates matrix `b', such as mata: bla("`varlist'", "`touse'")
tempname b
matrix `b' = (1\2\3)
ereturn post `b'
end
mytest town_id
ereturn list
But I keep getting a conformability error r(503); upon running the script. When I instead post an ordinary matrix such as ereturn matrix x = b, everything works fine but I would like to have my coefficients stored 'properly' in an e(b) vector.
I checked Stata's documentation but was unable to find out why this is not working. Their advice is to code
tempname b V
// produce coefficient vector `b' and variance–covariance matrix `V'
ereturn post `b' `V', obs(`nobs') depname(`depn') esample(`touse')
The options of ereturn post are all optional. Could anyone tell me what I am missing here? Thanks!
Use a "row" vector instead of a "column" vector. If you check, for example, the stored results of regress, you'll see that this is what is expected.
capture program drop mytest
program mytest, eclass
version 13
syntax varlist [if]
marksample touse
// mata subroutine creates matrix `b', such as mata: bla("`varlist'", "`touse'")
tempname b
matrix `b' = (1,2,3)
ereturn post `b'
end
*----- tests -----
clear
sysuse auto
// mytest test
mytest mpg weight
ereturn list
matrix list e(b)
// regress example
regress price weight mpg
ereturn list
matrix list e(b)

How to make Stata margins work for user-written model

I wonder, what requirements must a user-written estimation and/or prediction program satisfy in order for standard Stata margins command to be able to work with it?
I have created a toy "estimation" program with a prediction module, but when I run margins, dydx(x) after myreg y x, Stata throws r(103) ("too many specified") and produces nothing. Can anyone modify my code so that margins could work with it?
Yes, I know that if e(predict) is not returned, margins assume linear prediction and work OK, but eventually I need to write a nonlinear model and estimate marginal effects for it.
program mypred
version 13
syntax name [if] [in]
marksample touse
local newVar = "`1'"
mat b = e(b)
local columnNames: colfullnames b
tokenize `columnNames'
gen `newVar' = b[1,1] + b[1,2] * `2'
end
program myreg, eclass
version 13
syntax varlist(min=2 max=2) [if] [in]
marksample touse
tempname b V
matrix input b = (1.1, 2.3)
matrix input V = (9, 1 \ 1, 4)
matrix colnames b = _cons `2'
matrix colnames V = _cons `2'
matrix rownames V = _cons `2'
ereturn post b V, esample(`touse')
ereturn local predict "mypred"
ereturn local cmd "myreg"
ereturn display
end
I don't have a complete answer. If there is such a one-stop location within the Stata documentation that answers your question, I'm not aware of it.
The recommendation is to read, at least, the whole entry: [R] margins. Here is a list of conditions that should be considered:
margins cannot be used after estimation commands that do not produce
full variance matrices, such as exlogistic and expoisson (see [R]
exlogistic and [R] expoisson).
margins is all about covariates and
cannot be used after estimation commands that do not post the
covariates, which eliminates gmm (see [R] gmm).
margins cannot be used
after estimation commands that have an odd data organization, and that
excludes asclogit, asmprobit, asroprobit, and nlogit (see [R]
asclogit, [R] asmprobit, [R] asroprobit, and [R] nlogit).
From another subsection:
... as of Stata 11, you are supposed to set in e(marginsok) the list
of options allowed with predict that are okay to use with margins.
Consider also inspecting (help viewsource) user-written commands from experienced user/programmers who allow for this in their commands. Maarten Buis is one of them. (You can run search maarten buis, all to search within Stata.)

Stata output files in surveys

I have some survey data which I'm using Stata to analyze. I want to compute means of one variable by group and save those means to a Stata file. My code looks like this:
svyset [iw=wtsupp], sdrweight(repwtp1-repwtp160) vce(sdr)
svy: mean x
I tried
svy: by grp: mean x
but that did not work. I could save each mean to a separate file by simply saying
svy: mean x if grp==1
but that's inefficient. Is there a better way?
Saving results to a file like one can use SAS ODS to capture results is also a need. I am not talking about the log here. I need the means and the associated group. I'm thinking
estimates save [path],replace
but I'm not sure if that will give me a Stata file or the group if I can figure out how to use by processing.
Here's a simpler approach that creates a data set of the displayed estimation results: estimated means, standard errors, confidence limits, z statistics, and p-values. svy: mean is called with the over() option, which does away with the need for the foreach loop and computes standard errors appropriate for subpopulation analysis. The estimation results are contained in the returned matrix r(table), which is converted by the svmat command to a Stata data set. While svmat maintains column names, it does not preserve row (group) names, so it is necessary to merge these in to the created data set.
set more off
use http://www.stata-press.com/data/r13/ss07ptx, clear
svyset _n [pw= pwgtp], sdrweight(pwgtp*) vce(sdr)
************************************************ *
* Set name of grouping variable in double quotes *
* in the next line. *
* ************************************************
local gpname "sex"
tempvar gp
egen `gp' = group(`gpname')
preserve
tempfile t1
bys `gp': keep if _n==1
keep `gp' `gpname'
save `t1'
restore
svy: mean agep , over(`gp')
matrix a = r(table)'
clear
qui svmat double a, names(col)
gen `gp'=_n
merge 1:1 `gp' using `t1'
keep `gpname' b se z pvalue ll ul
order `gpname'
save results, replace
list
Edited 10/28
This version contains legibility improvements and the outcome variable and saved datasets are specified in a local macro. Therefore the analyst need not touch the foreach block. Easier to write and read matrix subscript expressions are used instead of the el matrix function: thus m[1,1] instead of el("m",1,1).
sysuse auto, clear
svyset _n
************************************************ *
* Set names of grouping variable and results data *
* set in double quotes in the next line. *
* ************************************************
local yvar mpg // variable for mean
local gpname "foreign"
local d_results "results"
tempvar gp
gen `gp' = `gpname'
tempname memhold
postfile `memhold' ///
`gpname' n mean se sd using `d_results', replace
levelsof `gp', local(lg)
foreach x of local lg{
svy, subpop(if `gp'==`x'): mean `yvar'
matrix m = e(b)
matrix v = e(V)
matrix a = e(V_srssub)
matrix b = e(_N_subp)
matrix c = e(_N)
scalar gx = `x'
scalar mean = m[1,1]
scalar sem = sqrt(v[1,1])
scalar sd = sqrt(b[1,1]*a[1,1])
scalar n = c[1,1]
post `memhold' (gx) (n) (mean) (sem) (sd)
}
postclose `memhold'
use results, clear
list