Stata output files in surveys - stata

I have some survey data which I'm using Stata to analyze. I want to compute means of one variable by group and save those means to a Stata file. My code looks like this:
svyset [iw=wtsupp], sdrweight(repwtp1-repwtp160) vce(sdr)
svy: mean x
I tried
svy: by grp: mean x
but that did not work. I could save each mean to a separate file by simply saying
svy: mean x if grp==1
but that's inefficient. Is there a better way?
Saving results to a file like one can use SAS ODS to capture results is also a need. I am not talking about the log here. I need the means and the associated group. I'm thinking
estimates save [path],replace
but I'm not sure if that will give me a Stata file or the group if I can figure out how to use by processing.

Here's a simpler approach that creates a data set of the displayed estimation results: estimated means, standard errors, confidence limits, z statistics, and p-values. svy: mean is called with the over() option, which does away with the need for the foreach loop and computes standard errors appropriate for subpopulation analysis. The estimation results are contained in the returned matrix r(table), which is converted by the svmat command to a Stata data set. While svmat maintains column names, it does not preserve row (group) names, so it is necessary to merge these in to the created data set.
set more off
use http://www.stata-press.com/data/r13/ss07ptx, clear
svyset _n [pw= pwgtp], sdrweight(pwgtp*) vce(sdr)
************************************************ *
* Set name of grouping variable in double quotes *
* in the next line. *
* ************************************************
local gpname "sex"
tempvar gp
egen `gp' = group(`gpname')
preserve
tempfile t1
bys `gp': keep if _n==1
keep `gp' `gpname'
save `t1'
restore
svy: mean agep , over(`gp')
matrix a = r(table)'
clear
qui svmat double a, names(col)
gen `gp'=_n
merge 1:1 `gp' using `t1'
keep `gpname' b se z pvalue ll ul
order `gpname'
save results, replace
list

Edited 10/28
This version contains legibility improvements and the outcome variable and saved datasets are specified in a local macro. Therefore the analyst need not touch the foreach block. Easier to write and read matrix subscript expressions are used instead of the el matrix function: thus m[1,1] instead of el("m",1,1).
sysuse auto, clear
svyset _n
************************************************ *
* Set names of grouping variable and results data *
* set in double quotes in the next line. *
* ************************************************
local yvar mpg // variable for mean
local gpname "foreign"
local d_results "results"
tempvar gp
gen `gp' = `gpname'
tempname memhold
postfile `memhold' ///
`gpname' n mean se sd using `d_results', replace
levelsof `gp', local(lg)
foreach x of local lg{
svy, subpop(if `gp'==`x'): mean `yvar'
matrix m = e(b)
matrix v = e(V)
matrix a = e(V_srssub)
matrix b = e(_N_subp)
matrix c = e(_N)
scalar gx = `x'
scalar mean = m[1,1]
scalar sem = sqrt(v[1,1])
scalar sd = sqrt(b[1,1]*a[1,1])
scalar n = c[1,1]
post `memhold' (gx) (n) (mean) (sem) (sd)
}
postclose `memhold'
use results, clear
list

Related

How do I get two coefficients from a set of regressions plotted on the same chart?

I am estimating a model in Stata 16 over several subsamples. I want a chart comparing two coefficients of interest over the different subsamples, with axis labels showing which subsample it comes from.
Is there a way to combine both of these on the same panel, with the mileage estimates in one colour and the trunk space in another?
The closest I can get using coefplot is a tiled plot with a set of coefficients of one variable in one panel, and the coefficients for the other variable in another panel (see toy example below). Any idea how to get both on the same panel?
webuse auto
forval v=2/5 {
reg price trunk mpg if rep78==`v'
est store reg_`v'
}
coefplot reg_2 || reg_3 || reg_4 || reg_5, keep(trunk mpg) bycoefs vertical
There's likely a more elegant way to do this with coefplot, but until someone posts that solution: you can use matrices to brute force coefplot into behaving the way you'd like. Specifically, define as many matrices as you have unique covariates, with each matrix's dimension being #specs x 3. Each row will contain the covariate's estimated coefficient, lower CI, and upper CI for a particular model specification.
This works because coefplot assigns the same color to all quantities associated with plot (as defined by coefplot's help file). plot is usually a stored model from estimates store, but by using the matrix trick, we've shifted plot to be equivalent to a specific covariate, giving us the same color for a covariate across all the model specifications. coefplot then looks to the matrix's rows to find its "categorical" information for the labeled axis. In this case, our matrix's rows correspond to a stored model, giving us the specification for our axis labels.
// (With macros for the specification names + # of coefficient
// matrices, for generalizability)
clear *
webuse auto
// Declare model's covariates
local covariates trunk mpg
// Estimate the various model specifs
local specNm = "" // holder for gph axis labels
forval v=2/5 {
// Estimate the model
reg price `covariates' if rep78==`v'
// Store specification's name, for gph axis labels
local specNm = "`specNm' reg_`v'"
// For each covariate, pull its coefficient + CIs for this model, then
// append that row vector to a new matrix containing that covariate's
// b + CIs across all specifications
matrix temp = r(table)
foreach x of local covariates{
matrix `x' = nullmat(`x') \ (temp["b","`x'"], temp["ll","`x'"], temp["ul","`x'"])
}
}
// Store the list of 'new' covariate matrices, along w/the
// column within this matrix containing the coefficients
global coefGphList = ""
foreach x of local covariates{
matrix rownames `x' = `specNm'
global coefGphList = "$coefGphList matrix(`x'[,1])"
}
// Plot
coefplot $coefGphList, ci((2 3)) vertical

Print CI with esttab in a specific order Stata

I´m trying to print my CI with esttab in this order:
My code with the cars.csv dataset form https://gist.github.com/noamross/e5d3e859aa0c794be10b#file-cars-csv:
clear all
import excel "C:\Users\luism\Desktop\archive/cars_usa.xlsx", sheet("hoja1") firstrow
destring mileage, force replace
reg price mileage
estimates store model_1
esttab model_1 using "C:\Users\luism\Desktop\archive/results.csv", replace beta ci
This is what I got when I separate by commas the results.csv file.
I would like to print the upper bound below the lower bound, not next to after I separate by commas the results.csv. Thanks
estadd allows you to add anything as a scalar and include it as follows:
sysuse auto , clear
reg price mpg
matrix results = r(table)
estadd scalar upperCI = results[rownumb(results,"ul"),colnumb(results,"mpg")]
estadd scalar lowerCI = results[rownumb(results,"ll"),colnumb(results,"mpg")]
esttab, stats(lowerCI upperCI)
This is exactly what Dimitry was suggesting in his comment. Wouter's comment is true to the extent that this isn't, strictly speaking, an "option" of esttab or estout. However, estadd provides a large amount of flexibility to esttab and it's worth knowing about.

Scatter plot color by variable

I want to make an scatter plot in Stata with points colored according to a categorical variable.
The only way I've found to do this, is to code colors in layers of a twoway plot.
However, this seems a rather convoluted solution for such a simple operation:
twoway (scatter latitud longitud if nougrups4 ==1, mcolor(black)) ///
(scatter latitud longitud if nougrups4 ==2, mcolor(blue)) ///
(scatter latitud longitud if nougrups4 ==3, mcolor(red)) ///
(scatter latitud longitud if nougrups4 ==4, mcolor(green))
Is there a simpler and automatic way to do this?
In this case, the categorical variable nougrups4 came from a cluster analysis. A general solution would be fine, but also a specific solution to draw clusters.
This is how I would do this by hand:
sysuse auto, clear
separate price, by(rep78)
tw scatter price? mpg
drop price?
Or in one line using Nick Cox's sepscatter command from SSC:
sepscatter price mpg, separate(rep78)
The latter command can also output other type of plots with the recast() option.
There isn't a 'simpler' built-in solution for what you want to do.
However, here's a simple wrapper command, which you can extend to meet your needs:
capture program drop foo
program define foo
syntax varlist(min=1 max=3)
quietly {
tokenize `varlist'
levelsof `3', local(foolevels)
local i = 0
local foocolors red green blue
foreach x of local foolevels {
local ++i
local extra `extra' || scatter `1' `2' if `3' == `x', mcolor("`: word `i' of `foocolors''")
}
twoway `extra'
}
end
And a toy example:
clear
set obs 10
generate A = runiform()
generate B = runiform()
generate C = .
replace C = 1 in 1/3
replace C = 2 in 4/7
replace C = 3 in 8/10
foo A B C

How to make Stata margins work for user-written model

I wonder, what requirements must a user-written estimation and/or prediction program satisfy in order for standard Stata margins command to be able to work with it?
I have created a toy "estimation" program with a prediction module, but when I run margins, dydx(x) after myreg y x, Stata throws r(103) ("too many specified") and produces nothing. Can anyone modify my code so that margins could work with it?
Yes, I know that if e(predict) is not returned, margins assume linear prediction and work OK, but eventually I need to write a nonlinear model and estimate marginal effects for it.
program mypred
version 13
syntax name [if] [in]
marksample touse
local newVar = "`1'"
mat b = e(b)
local columnNames: colfullnames b
tokenize `columnNames'
gen `newVar' = b[1,1] + b[1,2] * `2'
end
program myreg, eclass
version 13
syntax varlist(min=2 max=2) [if] [in]
marksample touse
tempname b V
matrix input b = (1.1, 2.3)
matrix input V = (9, 1 \ 1, 4)
matrix colnames b = _cons `2'
matrix colnames V = _cons `2'
matrix rownames V = _cons `2'
ereturn post b V, esample(`touse')
ereturn local predict "mypred"
ereturn local cmd "myreg"
ereturn display
end
I don't have a complete answer. If there is such a one-stop location within the Stata documentation that answers your question, I'm not aware of it.
The recommendation is to read, at least, the whole entry: [R] margins. Here is a list of conditions that should be considered:
margins cannot be used after estimation commands that do not produce
full variance matrices, such as exlogistic and expoisson (see [R]
exlogistic and [R] expoisson).
margins is all about covariates and
cannot be used after estimation commands that do not post the
covariates, which eliminates gmm (see [R] gmm).
margins cannot be used
after estimation commands that have an odd data organization, and that
excludes asclogit, asmprobit, asroprobit, and nlogit (see [R]
asclogit, [R] asmprobit, [R] asroprobit, and [R] nlogit).
From another subsection:
... as of Stata 11, you are supposed to set in e(marginsok) the list
of options allowed with predict that are okay to use with margins.
Consider also inspecting (help viewsource) user-written commands from experienced user/programmers who allow for this in their commands. Maarten Buis is one of them. (You can run search maarten buis, all to search within Stata.)

generating bootstrap standard errors for large number of scalars

Suppose I have four scalars: call them dea_1 dea_2 dea_3 dea_4. They are output from a program samprogram (not shown here).
Now I use the bootstrap command in Stata with these scalars to get bootstrapped standard errors.
set seed 123
bootstrap dea_1=r(dea_1)dea_2=r(dea_2)dea_3=r(dea_3)dea_4=r(dea_4), reps(100): samprogram
This is fine but in my original program, I calculate 30 scalars, dea_1 dea_2 ... dea_30. Now I want to avoid writing each of these 30 scalars in the bootstrap command and for this purpose I wrote a loop as follows:
set seed 234
forvalues i in 1(1)30{
local k dea_`i'
bootstrap dea_`k'=r(dea_`k'), reps(100): samprogram
}
This works, but gives the output for each scalar one at a time. However, I am looking for code which avoids writing all scalars in the bootstrap command but still gives the output for all at the same time (i.e. like the output from the following command)
set seed 345
bootstrap dea_1=r(dea_1)dea_2=r(dea_2)dea_3=r(dea_3)dea_4=r(dea_4)[omitted]...dea_30=r(dea_30), reps(100): samprogram
Any help in this regard will be highly appreciated.
This yields to building-up the contents of a local macro step-by-step.
set seed 123
forval i = 1/30 {
local call `call' dea_`i'=r(dea_`i')
}
bootstrap `call', reps(100) : samprogram
If need be, blank out the macro beforehand by
local call
More discussion in http://www.stata-journal.com/sjpdf.html?articlenum=pr0005 [free .pdf]
(LATER) Note that contrary to your assertion the code
set seed 234
forvalues i in 1(1)30{
local k dea_`i'
bootstrap dea_`k'=r(dea_`k'), reps(100): samprogram
}
would not work as intended. First time round, for example, bootstrap would be looking for r(dea_dea_1) and would return missing for every sample. The code for calling bootstrap repeatedly could simply be
set seed 234
forvalues i = 1/30 {
bootstrap dea_`i'=r(dea_`i'), reps(100): samprogram
}
but that would be a bad idea when you can do what you want in one call.
An alternative solution would be to make your program eclass and return the results in the matrix e(b). This allows the shortcut bootstrap _b, reps(100): samprogram. Below is an example. The key points here are that the different scalars are stored in the row vector `b', which is returned by the program as the row vector e(b) whith the command:
ereturn post `b', esample(`touse')
A complete example is here:
clear all
program define sim, eclass
syntax varlist(numeric) [if] [in], by(varname numeric)
marksample touse
markout `touse' `by'
local k : word count `varlist'
tempname b m0
matrix `b' = J(1,`k',.)
local i = 1
foreach var of local varlist {
sum `var' if `touse' & `by', meanonly
scalar `m0' = r(mean)
sum `var' if `touse' & !`by', meanonly
matrix `b'[1,`i'] = `m0' - r(mean)
local i = `i' + 1
}
ereturn post `b', esample(`touse')
end
sysuse auto
bootstrap _b, reps(100) : sim price mpg length weight trunk, by(foreign)