I need to modify the code below which I'm using on some CPS data to capture insurance coverage. I need to output a file with the percent covered by Census region (there are four). It should look something like this:
region n percent
1 xxx xx
2 xxx xx
3 xxx xx
4 xxx xx
I could live with two rows defining the percentages covered and not covered in each region if necessary, but I really only need the percentage covered.
Here's the code I'm using:
svyset [iw=hinswt], sdrweight(repwt1-repwt160) vce(sdr)
tempname memhold
postfile `memhold' region_rec n prop using Insurance, replace
levelsof region_rec, local(lf)
foreach x of local lf{
svy, subpop(if region_rec==`x' & age>=3 & age<=17): proportion hcovany
scalar forx = `x'
scalar prop = _b[hcovany]
matrix b = e(_N_subp)
matrix c = e(_N)
scalar n = el(c,1,1)
post `memhold' (forx) (n) (prop)
}
postclose `memhold'
use Insurance, clear
This is what it produces:
Survey: Proportion estimation Number of obs = 210648
Population size = 291166198
Subpop. no. obs = 10829
Subpop. size = 10965424.5
Replications = 160
_prop_1: hcovany = Not covered

             | Proportion   Std. Err.     [95% Conf. Interval]
hcovany      |
     _prop_1 |   .0693129    .0046163     .0602651    .0783607
     Covered |   .9306871    .0046163     .9216393    .9397349
[hcovany] not found
I can't figure out how to get around the error message at the bottom and get it to save the results. I think a SE and CV would be a desirable feature as well, but I'm not sure how to handle that within the matrix framework.
EDIT: Additional output
| region~c n prop se |
| 1 9640 .9360977 2 |
| 2 12515 .9352329 2 |
| 3 14445 .8769684 2 |
| 4 13241 .8846368 2 |
Try replacing _b[hcovany] with _b[some-value-label]; judging by the output you posted, that would be _b[Covered]. To be clear, the following nonsensical example is similar to your code, but instead of using _b[sex], where sex is a variable, it uses _b[Male], where Male is a value label for sex. Subpopulation sizes and standard errors are also saved.
clear all
set more off
webuse nhanes2f
svyset [pweight=finalwgt]
tempname memhold
tempfile results
postfile `memhold' region nsubpop maleprop stderr using `results', replace
levelsof region, local(lf)
foreach x of local lf{
svy, subpop(if region == `x' & inrange(age, 20, 40)): proportion sex
post `memhold' (`x') (e(N_subpop)) (_b[Male]) (_se[Male])
}
postclose `memhold'
use `results', clear
If we were to use _b[sex] instead of _b[Male], we would get the same r(111) error as in your original post.
For this example, let's see what the matrix e(b), containing the estimated proportions, looks like:
. matrix list e(b)

           sex:        sex:
          Male      Female
y1   .48821487   .51178513
Therefore, if we wanted to extract the proportions for females instead of males, we could use _b[Female].
Yet another option is to save the estimation result in a matrix and use numerical subscripts:
matrix b = e(b)
post `memhold' (`x') (b[1,2])
There are other slight changes like the use of inrange and direct use of returned estimation results with post.
Also, you may want to take a look at help _variables and its link:
[U] 13.5 Accessing coefficients and standard errors.
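Since the question also asks for a SE and a CV, here is a minimal sketch of the numerical-subscript route extended to those quantities. It reuses the nhanes2f example above for a single region; the scalar names p1, se1 and cv1 are made up for illustration. Coefficient names stored in e(b) may differ between Stata releases (check with matrix list e(b)), which is one reason to prefer subscripts here.

```stata
* Sketch only: same public dataset and svyset as the example above
webuse nhanes2f, clear
svyset [pweight=finalwgt]

svy, subpop(if region == 1 & inrange(age, 20, 40)): proportion sex

* Copy the estimates and their variance-covariance matrix
matrix b = e(b)
matrix V = e(V)

* First category: proportion, standard error, and CV in percent
scalar p1  = b[1,1]
scalar se1 = sqrt(V[1,1])
scalar cv1 = 100 * scalar(se1) / scalar(p1)

display "prop = " scalar(p1) "  se = " scalar(se1) "  CV(%) = " scalar(cv1)
```

Inside the loop, the same three scalars could be passed to post, provided the postfile statement declares a matching column for each.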
I am doing a weighted average and here is the table:
mean Income [fweight=Group]
Mean estimation
Number of obs = 1000
| Mean Std. Err. [95% Conf. Interval]
Income | 612.863 10.748 627.554 594.921
I really want to get the standard error and the confidence interval. However, I can only get variance by e(V). So my current method is to store e(V) in a matrix and store the element in a scalar and then use sqrt(). This is tedious!
Is there any way I can extract these statistics easily?
For example, in R the whole output table is saved in a matrix RESULT, and you can get the standard error simply through RESULT[1,2].
The command mean returns r(table) with the results you require:
webuse highschool, clear
mean height [pw = weight]
Mean estimation Number of obs = 4,071
| Mean Std. Err. [95% Conf. Interval]
height | 432.8991 .4149654 432.0856 433.7127
matrix list r(table)
b 432.89913
se .41496538
t 1043.2175
pvalue 0
ll 432.08557
ul 433.71269
df 4070
crit 1.960547
eform 0
More generally, different Stata commands return different results. However, in nearly all cases they give you all the ingredients to easily calculate what you need.
It may require a bit more effort to calculate further results but this is easily programmable and if you need to do something often you can write a wrapper program for the command.
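As a minimal sketch of pulling single numbers out of r(table), the rows can be looked up by name with rownumb(). The matrix name T and the scalar names se_h, ll_h and ul_h are arbitrary choices for illustration.

```stata
webuse highschool, clear
mean height [pw = weight]

* Copy r(table) before another command overwrites the r() results
matrix T = r(table)

* Look rows up by name instead of relying on their position
scalar se_h = T[rownumb(T, "se"), 1]
scalar ll_h = T[rownumb(T, "ll"), 1]
scalar ul_h = T[rownumb(T, "ul"), 1]

display "se = " scalar(se_h) "  CI = [" scalar(ll_h) ", " scalar(ul_h) "]"
```

For the standard error alone, _se[height] gives the same number without going through a matrix.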
Stata has some swell built-in operators to facilitate working with factor variables and interactions in the context of estimation commands. For example, assuming there are two factor variables named sex (male/female), and arm (treat/control) the following command:
. regress outcome sex##arm
produces estimates for indicator variables thus in the output:
outcome | Coef. Std. Err. t P>|t| [95% Conf. Interval]
sex |
female | ...
arm |
control | ...
sex#arm |
femal#contr | ...
_cons | ...
The ## operator will also work for three-way interactions like sex##arm##group. In addition the ib or synonymous b operator provides a means of specifying which base value of each factor variable will serve as the referent category.
If I am writing a new estimation command MyReg, is there some syntax or parsing tool that will permit a call like MyReg outcome sex##arm to get access to these factor names/factor value names (appearing in the left column of the above table) without having to write my own parser for the nontrivial set of Stata's factor variable operators?
I am not entirely sure if I completely understood what you have in mind, but here's an example to get you going down this road:
/* (1) Define MyReg */
capture program drop MyReg
program define MyReg, eclass
version 14.2
syntax varlist(min=1 fv)
/* do the regression */
regress `varlist'
/* pull out the column names from the coefficient matrix */
local fvnames : colfullnames e(b)
/* drop omitted categories from column names list */
foreach var of local fvnames {
    _ms_parse_parts `var'
    if !`r(omit)' {
        local fvlist `fvlist' `var'
    }
}
/* e-return the names */
ereturn local fvlist `fvlist'
end
/* (2) An Example */
sysuse auto
MyReg price i.foreign##c.weight
display "Left Column Contents: " e(fvlist)
There's also a great FAQ on useful factor variable commands for programmers here.
I have two variables in my firm-level dataset containing the industrial classification and the industry name to which that company belongs. For a given id_class, industry_name might be missing in the data. See below
| id_class | industry_name |
| 10 | auto |
| 11 | telecommunication |
| 12 | . |
I'm doing regressions by industry using the levelsof command to save each category in id_class to a local macro to allow me to loop through each category
levelsof id_class, local(id_class_list)
foreach i of local id_class_list {
    reg y x if id_class == `i'
}
I want to save the estimated coefficients for each regression to a table (I know how to do this part), but I want the table to have the title contained in the industry_name variable. How can I do this?
You can use the macro extended functions for extracting data attributes, such as variable value labels, like this:
sysuse auto
levelsof foreign, local(list)
foreach v of local list {
local vl: label (foreign) `v'
di "Car Origin is `vl':"
reg price if foreign==`v'
}
Have a look at statsby: check out its help and manual entry.
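A minimal sketch of what statsby does, using the auto data rather than the firm-level data from the question (the regression of price on weight is only a placeholder):

```stata
sysuse auto, clear

* One regression per level of foreign; the clear option replaces
* the data in memory with the collected results
statsby _b _se, by(foreign) clear: regress price weight

* One row per group, with variables such as _b_weight and _se_weight
list
```

The by-variable stays in the results dataset as the group identifier, so each row of coefficients can be matched back to its category.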
I am trying to calculate the 95% binomial Wilson confidence interval for the proportion of people completing treatment by year (dataset is line-listed for each person).
I want to store the results into a matrix so that I can use the putexcel command to export the results to an existing Excel spreadsheet without changing the formatting of the sheet. I have created a binary variable dscomplete_binary which is 0 for a person if treatment was not completed, and 1 if treatment was completed.
I have tried the following:
bysort year: ci dscomplete_binary, binomial wilson level(95)
This gives output of each year with the 95% confidence intervals. Previously I used statsby to collapse the dataset to store the results in variables but this clears the dataset from the memory and so I have to constantly re-open it.
Is there a way to run the command and store the results in a tabular format so that the data is stored in a similar way to this:
year mean LowerCI UpperCI
r1 2005 .7031588 .69229454 .71379805
r2 2006 .75532377 .74504232 .7653212
r3 2007 .78125924 .77125096 .79094833
r4 2008 .80014324 .79059798 .80935836
r5 2009 .81860977 .80955398 .82732689
r6 2010 .82641232 .81723672 .83522016
r7 2011 .81854123 .80955547 .82719356
r8 2012 .83497983 .82621944 .8433823
r9 2013 .85411799 .84527379 .86253893
r10 2014 .84461939 .83499599 .85377985
I have tried the following commands, which give different estimates to the binomial Wilson option:
svyset id2
bysort year: eststo: ci dscomplete_binary, binomial wilson level(95)
I think the postfile family of commands will help you here. This won't save your data into a matrix, but will save the results of the ci command into a new data set, which you name and whose structure you set. After the analysis is complete, you can load the data saved by postfile and export to Excel in the manner of your choosing.
For postfile, you analyze the data in a loop instead of using by or bysort.
Assuming the years in your data run 2005-2014, here is sample code:
/*make sure no postfile is open, in case a previous run did not close the file*/
cap postclose ci_results
/*create the postfile that will store results*/
postfile ci_results year mean lowerCI upperCI using ci_results.dta, replace
/*loop through years*/
forval y = 2005/2014 {
    ci dscomplete_binary if year==`y', binomial wilson level(95)
    /*store saved results from ci to postfile. Make sure the post statement contains results in the same order stated in postfile command.*/
    post ci_results (`y') (r(mean)) (r(lb)) (r(ub))
}
/*close the postfile once you've looped through all the cases of interest*/
postclose ci_results
use ci_results.dta, clear
Once you load the ci_results.dta data into memory, you can apply any Excel exporting command you like.
This is a development of the suggestion already made to use statsby. The objections to it are quite puzzling, as it is easy to get back to the original dataset. There is some machine time in re-loading a dataset, but how much personal time has been spent in pursuit of an alternative?
Absent a dataset which we can use, I've provided a reproducible example.
If you wish to do this repeatedly, you'll write a more elaborate program to do it, which is what this forum is all about.
I leave how to export results to Excel as a matter for those so inclined: no details of what is wanted are provided in any case.
. sysuse auto, clear
(1978 Automobile Data)
. preserve
. statsby mean=r(mean) ub=r(ub) lb=r(lb), by(rep78) : ci foreign, binomial wilson level(95)
(running ci on estimation sample)
command: ci foreign, binomial wilson
mean: r(mean)
ub: r(ub)
lb: r(lb)
by: rep78
Statsby groups
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
. list
| rep78 mean ub lb |
1. | 1 0 .6576198 0 |
2. | 2 0 .3244076 0 |
3. | 3 .1 .2562108 .0345999 |
4. | 4 .5 .7096898 .2903102 |
5. | 5 .8181818 .9486323 .5230194 |
. restore
. describe
The describe results will show that we are back where we started.
I would like to put column percentages into an Excel file. The first step therefore would be to capture the percentages (or counts, if percentages are not possible) into a matrix and then post the values into Excel using putexcel. I cannot use the matcell and matrow options with svy: tab, so I attempted to check the stored results using e(name). The issue I am facing is how to capture the values from the following tabulation into a matrix:
webuse nhanes2b, clear
svyset psuid [pweight=finalwgt], strata(stratid)
svy: tabulate sex race , format(%11.3g) percent
1=male, | 1=white, 2=black, 3=other
2=female | White Black Other Total
Male | 42.3 4.35 1.33 47.9
Female | 45.7 5.2 1.2 52.1
Total | 87.9 9.55 2.53 100
Key: cell percentages
I would like to put the values above in a matrix. I tried the following which worked:
mat pct = e(b)' * 100
matrix list pct
p11 42.254909
p12 4.3497373
p13 1.3303765
p21 45.660537
p22 5.2008547
p23 1.2035865
But what I am interested in is the column percentages given by the following tabulation:
svy: tabulate sex race , format(%11.3g) col percent
1=male, | 1=white, 2=black, 3=other
2=female | White Black Other Total
Male | 48.1 45.5 52.5 47.9
Female | 51.9 54.5 47.5 52.1
Total | 100 100 100 100
Key: column percentages
I tried this which did not return the desired values in the table above:
mat pct = e(b)' * 100
matrix list pct
p11 42.254909
p12 4.3497373
p13 1.3303765
p21 45.660537
p22 5.2008547
p23 1.2035865
After checking through various stored objects using ereturn list I did not seem to find anything corresponding to column percentages.
How can I get the column percentages into a matrix?
easy peasy
ssc install estout
webuse nhanes2b, clear
estpost svy: tabulate race diabetes, row percent
esttab . using "C:/table.csv", b(2) se(2) scalars(F_Pear) nostar unstack mtitle(`e(colvar)')
see also here http://repec.org/bocode/e/estout/hlp_estpost.html#svy_tabulate
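If a plain Stata matrix is wanted, as asked in the question, the column percentages can also be derived from e(b) directly. The following is only a sketch: it assumes e(b) holds the cell proportions row by row (p11 p12 p13 p21 ..., as in the listing in the question) and that e(c) holds the number of columns. The names P and colpct are arbitrary.

```stata
webuse nhanes2b, clear
svyset psuid [pweight=finalwgt], strata(stratid)
svy: tabulate sex race

mata:
// reshape the 1 x (r*c) row vector of cell proportions into an r x c table
P = colshape(st_matrix("e(b)"), st_numscalar("e(c)"))
// divide every cell by its column total and express as a percentage
colpct = 100 * P :/ colsum(P)
st_matrix("colpct", colpct)
end

matrix list colpct
```

The resulting matrix colpct can then be handed to putexcel like any other Stata matrix.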