Post e(b) vector from a custom program in Stata

I wrote a program that computes a weighted regression and now I want my estimation results to be stored as an e(b) vector so that the bootstrap command can easily access the results, but I keep getting an error. My program looks like:
capture program drop mytest
program mytest, eclass
version 13
syntax varlist [if]
marksample touse
// mata subroutine creates matrix `b', such as mata: bla("`varlist'", "`touse'")
tempname b
matrix `b' = (1\2\3)
ereturn post `b'
end
mytest town_id
ereturn list
But running the script gives a conformability error, r(503). When I instead post an ordinary matrix, such as ereturn matrix x = b, everything works fine, but I would like my coefficients stored 'properly' in an e(b) vector.
I checked Stata's documentation but was unable to find out why this is not working. Their advice is to code
tempname b V
// produce coefficient vector `b' and variance–covariance matrix `V'
ereturn post `b' `V', obs(`nobs') depname(`depn') esample(`touse')
The options of ereturn post are all optional. Could anyone tell me what I am missing here? Thanks!

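Use a "row" vector instead of a "column" vector. ereturn post expects b to be a 1 x k row vector; (1\2\3) is 3 x 1, which is why you get the conformability error. If you check the stored results of regress, for example, you'll see that e(b) is a row vector.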
capture program drop mytest
program mytest, eclass
version 13
syntax varlist [if]
marksample touse
// mata subroutine creates matrix `b', such as mata: bla("`varlist'", "`touse'")
tempname b
matrix `b' = (1,2,3)
ereturn post `b'
end
*----- tests -----
clear
sysuse auto
// mytest test
mytest mpg weight
ereturn list
matrix list e(b)
// regress example
regress price weight mpg
ereturn list
matrix list e(b)
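A slightly fuller sketch along the lines of the documentation quoted in the question: it posts a row vector b together with a variance matrix V whose row and column names match b's column names, and records the number of observations and the estimation sample. The zero coefficients and the identity V are placeholders for whatever the Mata routine actually produces, and the extra _cons column is an assumption for illustration. Note that esample() moves the marker variable into e(sample), so the count has to be taken before posting.

```stata
capture program drop mytest
program mytest, eclass
    version 13
    syntax varlist [if]
    marksample touse
    local k : word count `varlist'
    local p = `k' + 1
    tempname b V
    // placeholders for the Mata results: a 1 x p row vector and a p x p matrix
    matrix `b' = J(1, `p', 0)
    matrix `V' = I(`p')
    // keep b's column names identical to V's row and column names
    matrix colnames `b' = `varlist' _cons
    matrix colnames `V' = `varlist' _cons
    matrix rownames `V' = `varlist' _cons
    // count first: esample() moves `touse' into e(sample)
    quietly count if `touse'
    local nobs = r(N)
    ereturn post `b' `V', obs(`nobs') esample(`touse')
    ereturn local cmd "mytest"
end

sysuse auto, clear
mytest mpg weight
ereturn display
```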

Related

Print CI with esttab in a specific order Stata

I'm trying to print my CIs with esttab in a specific order.
My code, with the cars.csv dataset from https://gist.github.com/noamross/e5d3e859aa0c794be10b#file-cars-csv:
clear all
import excel "C:\Users\luism\Desktop\archive/cars_usa.xlsx", sheet("hoja1") firstrow
destring mileage, force replace
reg price mileage
estimates store model_1
esttab model_1 using "C:\Users\luism\Desktop\archive/results.csv", replace beta ci
When I split the results.csv file on commas, the lower and upper bounds end up next to each other.
I would like the upper bound printed below the lower bound instead. Thanks
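estadd (part of the estout package from SSC, like esttab) lets you add almost anything to the stored estimates as a scalar and then display it through esttab's stats() option: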
sysuse auto , clear
reg price mpg
matrix results = r(table)
estadd scalar upperCI = results[rownumb(results,"ul"),colnumb(results,"mpg")]
estadd scalar lowerCI = results[rownumb(results,"ll"),colnumb(results,"mpg")]
esttab, stats(lowerCI upperCI)
This is exactly what Dimitry was suggesting in his comment. Wouter's comment is true to the extent that this isn't, strictly speaking, an "option" of esttab or estout. However, estadd provides a large amount of flexibility to esttab and it's worth knowing about.
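An alternative sketch that avoids estadd, assuming the estout package is installed (ssc install estout): in esttab's cells() option, statistics listed as separate, unquoted elements are printed in separate rows, and ci_l and ci_u are the lower and upper confidence bounds. This uses the auto data rather than the question's cars file and prints unstandardized b, so the question's beta option is not reproduced here.

```stata
* requires the estout package: ssc install estout
sysuse auto, clear
regress price mpg
estimates store model_1
* separate, unquoted cell elements are stacked: b, then ci_l below it, then ci_u
esttab model_1 using results.csv, replace cells(b ci_l ci_u)
```

Putting elements inside one pair of quotes, as in cells("ci_l ci_u"), would instead place them side by side in the same row.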

Scatter plot color by variable

I want to make a scatter plot in Stata with points colored according to a categorical variable.
The only way I've found to do this is to code the colors in layers of a twoway plot.
However, this seems a rather convoluted solution for such a simple operation:
twoway (scatter latitud longitud if nougrups4 ==1, mcolor(black)) ///
(scatter latitud longitud if nougrups4 ==2, mcolor(blue)) ///
(scatter latitud longitud if nougrups4 ==3, mcolor(red)) ///
(scatter latitud longitud if nougrups4 ==4, mcolor(green))
Is there a simpler and automatic way to do this?
In this case, the categorical variable nougrups4 came from a cluster analysis. A general solution would be fine, but a specific solution for drawing clusters would also help.
This is how I would do this by hand:
sysuse auto, clear
separate price, by(rep78)
tw scatter price? mpg
drop price?
Or in one line using Nick Cox's sepscatter command from SSC:
sepscatter price mpg, separate(rep78)
The latter command can also output other type of plots with the recast() option.
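The separate approach also covers the cluster case from the question. A sketch on the auto data, assuming a hypothetical 4-group k-means on mpg and weight (unstandardized, so weight dominates the distances); cluster kmeans names its grouping variable after name() by default, and mcolor() takes one color per plotted y variable, in order.

```stata
sysuse auto, clear
set seed 12345
* hypothetical clustering: 4 k-means groups, stored in the variable g4
cluster kmeans mpg weight, k(4) name(g4)
* one new variable per cluster: mpg1, mpg2, ...
separate mpg, by(g4)
* one marker color per generated variable, in order
twoway scatter mpg? weight, mcolor(black blue red green)
drop mpg?
```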
There isn't a 'simpler' built-in solution for what you want to do.
However, here's a simple wrapper command, which you can extend to meet your needs:
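capture program drop foo
program define foo
    // usage: foo yvar xvar groupvar (one scatter layer per level of groupvar)
    syntax varlist(min=3 max=3 numeric)
    quietly {
        tokenize `varlist'
        levelsof `3', local(foolevels)
        local foocolors red green blue
        local ncolors : word count `foocolors'
        local i = 0
        foreach x of local foolevels {
            local ++i
            // cycle through the colors when there are more levels than colors
            local j = mod(`i' - 1, `ncolors') + 1
            local c : word `j' of `foocolors'
            local extra `extra' (scatter `1' `2' if `3' == `x', mcolor(`c'))
        }
        twoway `extra'
    }
end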
And a toy example:
clear
set obs 10
generate A = runiform()
generate B = runiform()
generate C = .
replace C = 1 in 1/3
replace C = 2 in 4/7
replace C = 3 in 8/10
foo A B C

How to make Stata margins work for user-written model

What requirements must a user-written estimation and/or prediction program satisfy for Stata's standard margins command to work with it?
I have created a toy "estimation" program with a prediction module, but when I run margins, dydx(x) after myreg y x, Stata throws r(103) ("too many specified") and produces nothing. Can anyone modify my code so that margins could work with it?
Yes, I know that if e(predict) is not returned, margins assumes a linear prediction and works fine, but eventually I need to write a nonlinear model and estimate marginal effects for it.
program mypred
version 13
syntax name [if] [in]
marksample touse
local newVar = "`1'"
mat b = e(b)
local columnNames: colfullnames b
tokenize `columnNames'
gen `newVar' = b[1,1] + b[1,2] * `2'
end
program myreg, eclass
version 13
syntax varlist(min=2 max=2) [if] [in]
marksample touse
tempname b V
matrix input b = (1.1, 2.3)
matrix input V = (9, 1 \ 1, 4)
matrix colnames b = _cons `2'
matrix colnames V = _cons `2'
matrix rownames V = _cons `2'
ereturn post b V, esample(`touse')
ereturn local predict "mypred"
ereturn local cmd "myreg"
ereturn display
end
I don't have a complete answer. If there is such a one-stop location within the Stata documentation that answers your question, I'm not aware of it.
I recommend reading at least the whole [R] margins entry. Here is a list of conditions that should be considered:
margins cannot be used after estimation commands that do not produce full variance matrices, such as exlogistic and expoisson (see [R] exlogistic and [R] expoisson).
margins is all about covariates and cannot be used after estimation commands that do not post the covariates, which eliminates gmm (see [R] gmm).
margins cannot be used after estimation commands that have an odd data organization, and that excludes asclogit, asmprobit, asroprobit, and nlogit (see [R] asclogit, [R] asmprobit, [R] asroprobit, and [R] nlogit).
From another subsection:
... as of Stata 11, you are supposed to set in e(marginsok) the list of options allowed with predict that are okay to use with margins.
Consider also inspecting (help viewsource) user-written commands from experienced user/programmers who allow for this in their commands. Maarten Buis is one of them. (You can run search maarten buis, all to search within Stata.)
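A sketch of how the toy code might be changed so that margins can call the prediction program; treat it as something to check against [R] margins rather than a verified recipe. A likely source of the r(103) error is that margins passes a storage type before the new variable name (for example, double plus a temporary name), which syntax name rejects as too many names; syntax newvarname accepts the type and leaves it in `typlist'. The sketch also names the coefficients after the actual covariate (in the original myreg, `2' is empty because syntax does not tokenize), uses tempnames for b and V, reads e(b) on every predict call, and sets e(marginsok) to "default XB" (assumed, mirroring the form regress uses). The exponential mean, the coefficient values, and the simulated data are hypothetical.

```stata
capture program drop mypred
program mypred
    version 13
    // newvarname accepts an optional storage type, returned in `typlist'
    syntax newvarname [if] [in] [, xb]
    marksample touse, novarlist
    tempname b
    tempvar lin
    matrix `b' = e(b)
    // linear index: column names of e(b) are matched to variables
    matrix score double `lin' = `b' if `touse'
    if "`xb'" != "" {
        quietly generate `typlist' `varlist' = `lin' if `touse'
    }
    else {
        // hypothetical nonlinear default prediction: exp(xb)
        quietly generate `typlist' `varlist' = exp(`lin') if `touse'
    }
end

capture program drop myreg
program myreg, eclass
    version 13
    syntax varlist(min=2 max=2 numeric) [if] [in]
    marksample touse
    gettoken depvar xvar : varlist
    tempname b V
    // placeholder estimates (same values as the question, x listed first)
    matrix `b' = (2.3, 1.1)
    matrix `V' = (4, 1 \ 1, 9)
    matrix colnames `b' = `xvar' _cons
    matrix colnames `V' = `xvar' _cons
    matrix rownames `V' = `xvar' _cons
    quietly count if `touse'
    local N = r(N)
    ereturn post `b' `V', esample(`touse') obs(`N') depname(`depvar')
    ereturn local predict "mypred"
    ereturn local marginsok "default XB"
    ereturn local cmd "myreg"
    ereturn display
end

clear
set obs 100
set seed 12345
generate x = runiform()
generate y = exp(1.1 + 2.3*x) + rnormal()
myreg y x
margins, dydx(x)
```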

Stata output files in surveys

I have some survey data which I'm using Stata to analyze. I want to compute means of one variable by group and save those means to a Stata file. My code looks like this:
svyset [iw=wtsupp], sdrweight(repwtp1-repwtp160) vce(sdr)
svy: mean x
I tried
svy: by grp: mean x
but that did not work. I could save each mean to a separate file by simply saying
svy: mean x if grp==1
but that's inefficient. Is there a better way?
I also need to save the results to a file, the way one can capture results with SAS ODS. I am not talking about the log here; I need the means and the associated groups. I'm thinking of
estimates save [path], replace
but I'm not sure whether that will give me a Stata dataset, or keep the group, if I can figure out how to use by processing.
Here's a simpler approach that creates a dataset of the displayed estimation results: estimated means, standard errors, confidence limits, test statistics, and p-values. svy: mean is called with the over() option, which does away with the need for a foreach loop and computes standard errors appropriate for subpopulation analysis. The estimation results are contained in the returned matrix r(table), which svmat converts to a Stata dataset. Because svmat keeps column names but not row (group) names, the group identifiers have to be merged back into the created dataset. Note that svy results report t rather than z statistics, so the corresponding column is named t.
set more off
use http://www.stata-press.com/data/r13/ss07ptx, clear
svyset _n [pw= pwgtp], sdrweight(pwgtp1-pwgtp80) vce(sdr)
**************************************************
* Set name of grouping variable in double quotes *
* in the next line.                              *
**************************************************
local gpname "sex"
tempvar gp
egen `gp' = group(`gpname')
preserve
tempfile t1
bys `gp': keep if _n==1
keep `gp' `gpname'
save `t1'
restore
svy: mean agep , over(`gp')
matrix a = r(table)'
clear
qui svmat double a, names(col)
gen `gp'=_n
merge 1:1 `gp' using `t1'
keep `gpname' b se t pvalue ll ul
order `gpname'
save results, replace
list
Edited 10/28
This version improves legibility, and the outcome variable and the name of the saved dataset are set in local macros, so the analyst need not touch the foreach block. Matrix subscript expressions, which are easier to write and read, replace the el() matrix function: m[1,1] instead of el("m",1,1).
sysuse auto, clear
svyset _n
****************************************************
* Set the outcome variable, grouping variable, and *
* results dataset names in the next three lines.   *
****************************************************
local yvar mpg // variable for mean
local gpname "foreign"
local d_results "results"
tempvar gp
gen `gp' = `gpname'
tempname memhold
postfile `memhold' ///
`gpname' n mean se sd using `d_results', replace
levelsof `gp', local(lg)
foreach x of local lg{
svy, subpop(if `gp'==`x'): mean `yvar'
matrix m = e(b)
matrix v = e(V)
matrix a = e(V_srssub)
matrix b = e(_N_subp)
matrix c = e(_N)
scalar gx = `x'
scalar mean = m[1,1]
scalar sem = sqrt(v[1,1])
scalar sd = sqrt(b[1,1]*a[1,1])
scalar n = c[1,1]
post `memhold' (gx) (n) (mean) (sem) (sd)
}
postclose `memhold'
use `d_results', clear
list
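On the question of whether estimates save gives a Stata file: it writes the estimation results to a .ster file, which estimates use can load back into e() later, but it is not a .dta dataset of means and groups. A minimal sketch with the auto data; the file name mpg_by_foreign is hypothetical.

```stata
sysuse auto, clear
svyset _n
svy: mean mpg, over(foreign)
* writes mpg_by_foreign.ster (estimation results, not a .dta dataset)
estimates save mpg_by_foreign, replace
* later, possibly in another session: reload the results into e()
estimates use mpg_by_foreign
matrix list e(b)
```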

Generating bootstrap standard errors for a large number of scalars

Suppose I have four scalars: call them dea_1 dea_2 dea_3 dea_4. They are output from a program samprogram (not shown here).
Now I use the bootstrap command in Stata with these scalars to get bootstrapped standard errors.
set seed 123
bootstrap dea_1=r(dea_1) dea_2=r(dea_2) dea_3=r(dea_3) dea_4=r(dea_4), reps(100): samprogram
This is fine but in my original program, I calculate 30 scalars, dea_1 dea_2 ... dea_30. Now I want to avoid writing each of these 30 scalars in the bootstrap command and for this purpose I wrote a loop as follows:
set seed 234
forvalues i in 1(1)30{
local k dea_`i'
bootstrap dea_`k'=r(dea_`k'), reps(100): samprogram
}
This works, but gives the output for each scalar one at a time. However, I am looking for code which avoids writing all scalars in the bootstrap command but still gives the output for all at the same time (i.e. like the output from the following command)
set seed 345
bootstrap dea_1=r(dea_1) dea_2=r(dea_2) dea_3=r(dea_3) dea_4=r(dea_4) [omitted]... dea_30=r(dea_30), reps(100): samprogram
Any help in this regard will be highly appreciated.
This problem yields to building up the contents of a local macro step by step.
set seed 123
forval i = 1/30 {
local call `call' dea_`i'=r(dea_`i')
}
bootstrap `call', reps(100) : samprogram
If need be, blank out the macro beforehand by
local call
More discussion in http://www.stata-journal.com/sjpdf.html?articlenum=pr0005 [free .pdf]
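Because samprogram is not shown, here is a self-contained sketch with a hypothetical stand-in that returns r(dea_1) through r(dea_30) (simply multiples of the mean of mpg), so the macro-building call can be run end to end. The stand-in's statistic is made up; only the loop that builds `call' follows the answer.

```stata
capture program drop samprogram
program samprogram, rclass
    // hypothetical stand-in: returns r(dea_1) ... r(dea_30)
    quietly summarize mpg, meanonly
    local m = r(mean)
    forvalues i = 1/30 {
        return scalar dea_`i' = `m' * `i'
    }
end

sysuse auto, clear
// build "dea_1=r(dea_1) dea_2=r(dea_2) ... dea_30=r(dea_30)" in one macro
local call
forvalues i = 1/30 {
    local call `call' dea_`i'=r(dea_`i')
}
set seed 123
bootstrap `call', reps(100) nodots: samprogram
```

All 30 statistics then appear in a single bootstrap table, from one set of resamples.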
(LATER) Note that contrary to your assertion the code
set seed 234
forvalues i in 1(1)30{
local k dea_`i'
bootstrap dea_`k'=r(dea_`k'), reps(100): samprogram
}
would not work as intended. To begin with, forvalues i in 1(1)30 is not legal syntax (forvalues requires =). Even with that fixed, first time round bootstrap would be looking for r(dea_dea_1) and would return missing for every sample. The code for calling bootstrap repeatedly could simply be
set seed 234
forvalues i = 1/30 {
bootstrap dea_`i'=r(dea_`i'), reps(100): samprogram
}
but that would be a bad idea when you can do what you want in one call.
An alternative solution would be to make your program eclass and return the results in the matrix e(b). This allows the shortcut bootstrap _b, reps(100): samprogram. Below is an example. The key point is that the scalars are stored in the row vector `b', which the program posts as the row vector e(b) with the command:
ereturn post `b', esample(`touse')
A complete example is here:
clear all
program define sim, eclass
    // difference in means of each variable between by==1 and by==0
    syntax varlist(numeric) [if] [in], by(varname numeric)
    marksample touse
    markout `touse' `by'
    local k : word count `varlist'
    tempname b m0
    // 1 x k row vector, one column per variable, labeled with its name
    matrix `b' = J(1,`k',.)
    matrix colnames `b' = `varlist'
    local i = 1
    foreach var of local varlist {
        sum `var' if `touse' & `by', meanonly
        scalar `m0' = r(mean)
        sum `var' if `touse' & !`by', meanonly
        matrix `b'[1,`i'] = `m0' - r(mean)
        local i = `i' + 1
    }
    ereturn post `b', esample(`touse')
end
sysuse auto
bootstrap _b, reps(100) : sim price mpg length weight trunk, by(foreign)