Percentile for COMPLEX SURVEY DATA while using subpop option - stata

I am using a survey sample and am trying to analyze a subpopulation.
I am trying to get the mean, median, 10th percentile, and 90th percentile of a continuous variable for my subpopulation of interest.
The Stata website (http://www.stata.com/support/faqs/statistics/percentiles-for-survey-data/) shows the method to obtain the median and other percentiles.
However, I am interested in a subpopulation, not the entire sample.
Can you please show me the appropriate commands to obtain any percentile from a complex survey sample with the subpop option?

You can use _pctile to get percentiles for a subpopulation without svyset, because the point estimates depend only on the weights. However, to get standard errors and confidence intervals, you should download epctile by Stas Kolenikov (findit epctile in Stata) and svyset the data.
net describe epctile, from(http://web.missouri.edu/~kolenikovs/stata)
net install epctile.pkg
The auto data will provide the example, with the variable weight being the probability weight.
sysuse auto, clear
_pctile price if foreign==0 [pw = weight], p(25 50 75)
return list
scalars:
r(r1) = 4195
r(r2) = 5104
r(r3) = 6486
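The question asked for the 10th and 90th percentiles as well as the median. A minimal sketch, assuming the same auto data with weight standing in for the probability weight, that requests those percentiles and copies the r() results into scalars (the scalar names p10, p50, and p90 are illustrative) before another command overwrites them:

```stata
sysuse auto, clear
* Assumption: weight plays the role of the probability weight, as in the example above
_pctile price if foreign==0 [pw = weight], p(10 50 90)
* r(r1), r(r2), r(r3) follow the order of the values given in p()
scalar p10 = r(r1)
scalar p50 = r(r2)
scalar p90 = r(r3)
display "p10 = " p10 "   median = " p50 "   p90 = " p90
```

These are point estimates only; _pctile does not report standard errors.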
Compare to svysetting the data and calling epctile:
gen strat = rep78
gen mkr = substr(make,1,2)
svyset mkr [pw = weight], strata(strat)
epctile price, percentiles(25 50 75) subpop(if foreign==0) svy
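Results:
Percentile estimation
------------------------------------------------------------------------------
             |             Linearized
       price |      Coef.   Std. Err.       z    P>|z|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
         p25 |       4195       108.5    38.66   0.000    3982.344    4407.656
         p50 |       5104       320.5    15.93   0.000    4475.832    5732.168
         p75 |       6486        2093     3.10   0.002    2383.795     10588.2
------------------------------------------------------------------------------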
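For the subpopulation mean, the built-in svy prefix accepts a subpop() option. A minimal sketch, assuming a simplified weights-only design (no strata or clusters) rather than the svyset shown above:

```stata
sysuse auto, clear
* Assumption: weights-only design, used here only for illustration
svyset [pw = weight]
* subpop() restricts estimation to domestic cars but keeps the full
* sample for variance estimation
svy, subpop(if foreign==0): mean price
display "Subpopulation observations: " e(N_sub)
display "Mean: " _b[price] "   SE: " _se[price]
```

Using subpop() instead of an if qualifier matters for the standard errors, not for the point estimate.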

Related

How to store confidence intervals from stata margins estimation?

Stata experts,
I have been trying to find a way to store marginal estimates, including the p-value and confidence interval.
Below is the code I have. All that I can get is the estimated marginal effect of variable I. It looks like I can't specify "ci" the way we can for the usual regression models. Is there a way to also store and present the other numbers from the marginal estimates?
probit Y1 X
margins, dydx(X) post
est store m1
probit Y2 X
margins, dydx(X) post
est store m2
esttab m1 m2
esttab m1 m2, ci
Another related question is: how do I save marginal estimations for interaction terms? Example code below
probit Y2 i.year##i.month
margins year#month, asbalanced post
Thank you in advance!
Here's a way to grab p-values and confidence intervals after a margins command.
sysuse auto, clear
probit foreign price trunk
margins, dydx(price) post
eststo m1
The results from the margins command:
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
price | .0000268 .0000159 1.69 0.092 -4.36e-06 .000058
------------------------------------------------------------------------------
Then the p-value and confidence interval are recoverable from the stored matrices e(b) and e(V). To get the p-value, we need the z-score, which is the point estimate over the standard error (e(b)[1,1]/sqrt(e(V)[1,1])). The rest is calculating the area in the two tails using normal().
The confidence interval is the point estimate e(b)[1,1] plus or minus the standard error sqrt(e(V)[1,1]) times the critical value of z, invnormal(0.975).
Shown with the output so that you can see the numbers line up:
. di "P-value: " normal(-abs(e(b)[1,1]/sqrt(e(V)[1,1])))*2
P-value: .09186065
. di "Upper bound: " e(b)[1,1] + sqrt(e(V)[1,1])*invnormal(0.975)
Upper bound: .00005796
. di "Lower bound: " e(b)[1,1] - sqrt(e(V)[1,1])*invnormal(0.975)
Lower bound: -4.361e-06
To put the p-value in a table, for example, you could use estadd:
estadd scalar pvalue = normal(-abs(e(b)[1,1]/sqrt(e(V)[1,1])))*2
And then esttab:
esttab m1, stats(pvalue, label("P-value"))
. esttab m1, stats(pvalue, label("P-value"))
----------------------------
(1)
----------------------------
price 0.0000268
(1.69)
----------------------------
P-value 0.0919
----------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
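As a cross-check on the manual calculation, margins also leaves a matrix r(table) behind that already contains the p-value and the confidence bounds. A minimal sketch, assuming the same probit model on the auto data; the matrix name T is illustrative, and it must be copied right after margins because the next r-class command overwrites r():

```stata
sysuse auto, clear
probit foreign price trunk
margins, dydx(price)
* Copy r(table) immediately; rows are b, se, z, pvalue, ll, ul, ...
matrix T = r(table)
display "P-value:     " T[rownumb(T, "pvalue"), 1]
display "Lower bound: " T[rownumb(T, "ll"), 1]
display "Upper bound: " T[rownumb(T, "ul"), 1]
```

Looking rows up by name with rownumb() avoids hard-coding their positions.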

How to extract any result from the output table

I am doing a weighted average and here is the table:
mean Income [fweight=Group]
Mean estimation
Number of obs = 1000
| Mean Std. Err. [95% Conf. Interval]
Income | 612.863 10.748 627.554 594.921
I really want to get the standard error and the confidence interval. However, I can only get the variance from e(V). So my current method is to store e(V) in a matrix, copy the element into a scalar, and then use sqrt(). This is tedious!
Is there any way I can extract these statistics easily?
For example, in R the whole output table is saved in a matrix RESULT and you can get the standard error simply through RESULT[1,2].
The command mean returns r(table) with the results you require:
webuse highschool, clear
mean height [pw = weight]
Mean estimation Number of obs = 4,071
--------------------------------------------------------------
| Mean Std. Err. [95% Conf. Interval]
-------------+------------------------------------------------
height | 432.8991 .4149654 432.0856 433.7127
--------------------------------------------------------------
matrix list r(table)
r(table)[9,1]
height
b 432.89913
se .41496538
t 1043.2175
pvalue 0
ll 432.08557
ul 433.71269
df 4070
crit 1.960547
eform 0
More generally, different Stata commands return different results. However, in nearly all cases they give you all the ingredients to easily calculate what you need.
It may require a bit more effort to calculate further results but this is easily programmable and if you need to do something often you can write a wrapper program for the command.
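To pull single numbers out of r(table), copy it to a named matrix and subscript it. A minimal sketch, assuming the highschool data used above; the matrix name T and the scalar names are illustrative:

```stata
webuse highschool, clear
mean height [pw = weight]
* Copy r(table) before another command replaces r()
matrix T = r(table)
* Look up rows and columns by name rather than by position
scalar se_height = T[rownumb(T, "se"), colnumb(T, "height")]
scalar ll_height = T[rownumb(T, "ll"), colnumb(T, "height")]
scalar ul_height = T[rownumb(T, "ul"), colnumb(T, "height")]
display "SE = " se_height "   95% CI = [" ll_height ", " ul_height "]"
```

The standard error alone is also available directly as _se[height] after mean.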

Return stored results with svy and tab command in Stata

I would like to put column percentages into an excel file. The first step therefore would be to capture the percentages (or counts if not possible for percentages) into a matrix and then post the values into excel using putexcel. I can not use the matcell and matrow option with svy: tab so I attempted to check the stored results using e(name). The issue I am facing is how to capture the values from the following tabulation into a matrix:
webuse nhanes2b, clear
svyset psuid [pweight=finalwgt], strata(stratid)
svy: tabulate sex race , format(%11.3g) percent
--------------------------------------
1=male, | 1=white, 2=black, 3=other
2=female | White Black Other Total
----------+---------------------------
Male | 42.3 4.35 1.33 47.9
Female | 45.7 5.2 1.2 52.1
|
Total | 87.9 9.55 2.53 100
--------------------------------------
Key: cell percentages
I would like to put the values above in a matrix. I tried the following which worked:
mat pct = e(b)' * 100
matrix list pct
pct[6,1]
y1
p11 42.254909
p12 4.3497373
p13 1.3303765
p21 45.660537
p22 5.2008547
p23 1.2035865
But what I am interested in is the column percentages given by the following tabulation:
svy: tabulate sex race , format(%11.3g) col percent
--------------------------------------
1=male, | 1=white, 2=black, 3=other
2=female | White Black Other Total
----------+---------------------------
Male | 48.1 45.5 52.5 47.9
Female | 51.9 54.5 47.5 52.1
|
Total | 100 100 100 100
--------------------------------------
Key: column percentages
I tried this which did not return the desired values in the table above:
mat pct = e(b)' * 100
matrix list pct
pct[6,1]
y1
p11 42.254909
p12 4.3497373
p13 1.3303765
p21 45.660537
p22 5.2008547
p23 1.2035865
After checking through various stored objects using ereturn list I did not seem to find anything corresponding to column percentages.
How can I get the column percentages into a matrix?
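One option is estpost, part of the estout package from SSC, which posts the results of svy: tabulate so that esttab can export them: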
ssc install estout
webuse nhanes2b, clear
estpost svy: tabulate race diabetes, row percent
esttab . using "C:/table.csv", b(2) se(2) scalars(F_Pear) nostar unstack mtitle(`e(colvar)')
See also http://repec.org/bocode/e/estout/hlp_estpost.html#svy_tabulate
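If the goal is only a matrix of column percentages, they can also be derived from the cell proportions in e(b), since each column percentage is the cell proportion divided by its column total. A minimal sketch, assuming e(b) is ordered row by row (p11 p12 p13 p21 ...) as in the listing in the question and that e(r) holds the number of table rows; the names b and colpct are illustrative:

```stata
webuse nhanes2b, clear
svyset psuid [pweight=finalwgt], strata(stratid)
svy: tabulate sex race, col percent
matrix b = e(b)
local r = e(r)
mata:
// Reshape the 1 x (r*c) vector of cell proportions into an r x c table
P = rowshape(st_matrix("b"), strtoreal(st_local("r")))
// Divide each cell by its column total and convert to percent
st_matrix("colpct", 100 * P :/ colsum(P))
end
matrix list colpct
```

The resulting matrix can then be written to Excel with putexcel. Row and column names are not carried over by this sketch.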

How do I use outreg2 to display value labels in its output?

Take this code
sysuse auto, clear
reg price mpg c.mpg#i.foreign
outreg2 using "example.txt", stats(coef) replace
This outputs
(1)
VARIABLES price
price
mpg -329.0***
0b.foreign#co.mpg 0
1.foreign#c.mpg 78.33**
Constant 12,596***
Observations 74
R-squared 0.289
Standard errors in parentheses
*** p<0.01, ** p<0.05, * p<0.1
Ideally, I'd like it to display the value labels, as is done in the console's regression output:
-------------------------------------------------------------------------------
price | Coef. Std. Err. t P>|t| [95% Conf. Interval]
--------------+----------------------------------------------------------------
mpg | -329.0368 61.46843 -5.35 0.000 -451.6014 -206.4723
|
foreign#c.mpg |
Foreign | 78.32918 29.78726 2.63 0.010 18.93508 137.7233
|
_cons | 12595.97 1235.936 10.19 0.000 10131.58 15060.35
-------------------------------------------------------------------------------
I don't need any of the other stats at the moment; I'm strictly including that last piece of output to show what I mean with the value labels. Searching through the documentation for outreg2 tells me how to display variable labels, but not value labels.
Also posted on Statalist.
As @Dimitriy points out, you can use estout, from SSC. An example:
sysuse auto, clear
reg price mpg c.mpg#i.foreign
estimates store m1, title(Model 1)
estout m1, label
You can add other statistics, stars, and more. After installation (ssc install estout), read help estout patiently.
If you decode your variables and use xi, it will do the trick. Of course, this solution requires you to recode your variables, but it is an easy option if you want to stick with outreg2.
sysuse auto, clear
set seed 1234
gen maxspeed = round(uniform()*3)+1
label define speed 1 "Light" 2 "Ridiculous" 3 "Ludicrous" 4 "Plaid"
label values maxspeed speed
decode maxspeed, gen(maxspeed_str)
decode foreign, gen(foreign_str)
xi: reg price mpg weight i.foreign_str*i.maxspeed_str
outreg2 using test, see text label
I used the example you asked in Statalist as it was your latest question.

Stata output files in surveys (proportions)

I need to modify the code below which I'm using on some CPS data to capture insurance coverage. I need to output a file with the percent covered by Census region (there are four). It should look something like this:
region n percent
1 xxx xx
2 xxx xx
3 xxx xx
4 xxx xx
I could live with two rows defining the percentages covered and not covered in each region if necessary, but I really only need the percentage covered.
Here's the code I'm using:
svyset [iw=hinswt], sdrweight(repwt1-repwt160) vce(sdr)
tempname memhold
postfile `memhold' region_rec n prop using Insurance, replace
levelsof region_rec, local(lf)
foreach x of local lf{
svy, subpop(if region_rec==`x' & age>=3 & age<=17): proportion hcovany
scalar forx = `x'
scalar prop = _b[hcovany]
matrix b = e(_N_subp)
matrix c = e(_N)
scalar n = el(c,1,1)
post `memhold' (forx) (n) (prop)
}
postclose `memhold'
use Insurance, clear
list
This is what it produces:
Survey: Proportion estimation Number of obs = 210648
Population size = 291166198
Subpop. no. obs = 10829
Subpop. size = 10965424.5
Replications = 160
_prop_1: hcovany = Not covered
--------------------------------------------------------------
| SDR
| Proportion Std. Err. [95% Conf. Interval]
-------------+------------------------------------------------
hcovany |
_prop_1 | .0693129 .0046163 .0602651 .0783607
Covered | .9306871 .0046163 .9216393 .9397349
--------------------------------------------------------------
[hcovany] not found
r(111);
I can't figure out how to get around the error message at the bottom and get it to save the results. I think a SE and CV would be a desirable feature as well, but I'm not sure how to handle that within the matrix framework.
EDIT: Additional output
+----------------------------------+
| region~c n prop se |
|----------------------------------|
| 1 9640 .9360977 2 |
| 2 12515 .9352329 2 |
| 3 14445 .8769684 2 |
| 4 13241 .8846368 2 |
+----------------------------------+
Try replacing _b[hcovany] with _b[some-value-label]. To be clear, the following nonsensical example is similar to your code, but instead of using _b[sex], where sex is a variable, it uses _b[Male], where Male is a value label for sex. Subpopulation sizes and standard errors are also saved.
clear all
set more off
webuse nhanes2f
svyset [pweight=finalwgt]
tempname memhold
tempfile results
postfile `memhold' region nsubpop maleprop stderr using `results', replace
levelsof region, local(lf)
foreach x of local lf{
svy, subpop(if region == `x' & inrange(age, 20, 40)): proportion sex
post `memhold' (`x') (e(N_subpop)) (_b[Male]) (_se[Male])
}
postclose `memhold'
use `results', clear
list
If we were to use _b[sex] instead of _b[Male], we would get the same r(111) error as in your original post.
For this example, let's see what the matrix e(b), containing the estimated proportions, looks like:
. matrix list e(b)
e(b)[1,2]
sex: sex:
Male Female
y1 .48821487 .51178513
Therefore, if we wanted to extract the proportions for females instead of males, we could use _b[Female].
Yet another option is to save the estimation result in a matrix and use numerical subscripts:
<snip>
matrix b = e(b)
post `memhold' (`x') (b[1,2])
<snip>
There are other slight changes like the use of inrange and direct use of returned estimation results with post.
Also, you may want to take a look at help _variables and its link:
[U] 13.5 Accessing coefficients and standard errors.
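The names that _b[] accepts come from the column names of e(b), and these can differ across Stata versions. A minimal sketch, assuming the nhanes2f data and a single region for brevity, that lists those names and then uses numerical subscripts, which do not depend on them:

```stata
webuse nhanes2f, clear
svyset [pweight=finalwgt]
svy, subpop(if region == 1 & inrange(age, 20, 40)): proportion sex
* Show the names that can go inside _b[] and _se[]
local names : colfullnames e(b)
display "`names'"
* Numerical subscripts work regardless of how the columns are named
matrix b = e(b)
matrix V = e(V)
display "First proportion = " b[1,1] "   SE = " sqrt(V[1,1])
display "Second proportion = " b[1,2] "   SE = " sqrt(V[2,2])
```

Inside the postfile loop, sqrt(V[1,1]) could be posted alongside b[1,1] in the same way.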