I am using Stata to conduct survey analysis and I am running the command:
quietly svy: mean consumption
I need to extract the results and put them in a matrix.
How do I extract the mean, sd and confidence intervals?
I believe for the mean we can run:
matrix[1,2] = e(b)
The following works for me:
webuse nhanes2f
svyset psuid [pweight=finalwgt], strata(stratid)
quietly svy: mean zinc
return list
scalars:
r(level) = 95
macros:
r(mcmethod) : "noadjust"
matrices:
r(table) : 9 x 1
matrix list r(table)
r(table)[9,1]
zinc
b 87.182067
se .49448269
t 176.30965
pvalue 4.244e-48
ll 86.173563
ul 88.190571
df 31
crit 2.0395134
eform 0
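To get these numbers into a matrix of your own, one option is to copy r(table) straight after the estimation command and pick out the rows you need. The sketch below assumes the row order shown in the listing above (b, se, t, pvalue, ll, ul); the matrix names T, res and sd are made up for illustration. For the standard deviation, it assumes that estat sd is available after svy: mean and leaves its result in r(sd); check with return list in your version of Stata.

```stata
webuse nhanes2f, clear
svyset psuid [pweight=finalwgt], strata(stratid)
quietly svy: mean zinc

* copy r(table) right away, before another command overwrites r()
matrix T = r(table)

* rows 1, 2, 5 and 6 of r(table) are b, se, ll and ul in the listing above
matrix res = (T[1,1], T[2,1], T[5,1], T[6,1])
matrix colnames res = mean se ll ul
matrix rownames res = zinc
matrix list res

* assumption: estat sd reports the standard deviation and stores it in r(sd)
estat sd
matrix sd = r(sd)
matrix list sd
```

Note that se is the standard error of the mean, not the standard deviation of the variable, which is why the second step is needed.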
Stata experts,
I have been trying to find a way to store marginal estimations, including the p value and confidence interval.
Below is the code I have. All I can get is the estimated marginal effect of variable X. It looks like I can't specify "ci" the way we can for the usual regression models. Is there a way to also store and present the other numbers from the marginal estimations?
probit Y1 X
margins, dydx(X) post
est store m1
probit Y2 X
margins, dydx(X) post
est store m2
esttab m1 m2
esttab m1 m2, ci
Another related question is: how do I save marginal estimations for interaction terms? Example code below
probit Y2 year month year*month
margins year#month, asbalanced post
Thank you in advance!
Here's a way to grab p-values and confidence intervals after a margins command.
sysuse auto, clear
probit foreign price trunk
margins, dydx(price) post
eststo m1
The results from the margins command:
------------------------------------------------------------------------------
| Delta-method
| dy/dx std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
price | .0000268 .0000159 1.69 0.092 -4.36e-06 .000058
------------------------------------------------------------------------------
The p-value and confidence interval can then be recovered from the stored matrices e(b) and e(V). To get the p-value, we need the z-score, which is the point estimate divided by the standard error: e(b)[1,1]/sqrt(e(V)[1,1]). The p-value is then the area in the two tails, computed with normal().
The confidence interval is the point estimate e(b)[1,1] plus or minus the standard error sqrt(e(V)[1,1]) times the critical value of z, invnormal(0.975).
Shown with the output so that you can see the numbers line up:
. di "P-value: " normal(-abs(e(b)[1,1]/sqrt(e(V)[1,1])))*2
P-value: .09186065
. di "Upper bound: " e(b)[1,1] + sqrt(e(V)[1,1])*invnormal(0.975)
Upper bound: .00005796
. di "Lower bound: " e(b)[1,1] - sqrt(e(V)[1,1])*invnormal(0.975)
Lower bound: -4.361e-06
To put the p-value in a table, for example, you could use estadd:
estadd scalar pvalue = normal(-abs(e(b)[1,1]/sqrt(e(V)[1,1])))*2
And then esttab:
esttab m1, stats(pvalue, label("P-value"))
. esttab m1, stats(pvalue, label("P-value"))
----------------------------
(1)
----------------------------
price 0.0000268
(1.69)
----------------------------
P-value 0.0919
----------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
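As an alternative to recomputing these by hand, recent versions of Stata also leave an r(table) matrix behind after margins, with the same row layout as after other estimation commands (b, se, z, pvalue, ll, ul). A minimal sketch, assuming that layout; the matrix names T and ci are made up for illustration:

```stata
sysuse auto, clear
quietly probit foreign price trunk
margins, dydx(price) post

* copy r(table) before any other command overwrites r()
matrix T = r(table)
matrix list T

* assumed row order: 1 = b, 4 = pvalue, 5 = ll, 6 = ul
matrix ci = (T[1,1], T[4,1], T[5,1], T[6,1])
matrix colnames ci = dydx pvalue ll ul
matrix rownames ci = price
matrix list ci
```

The values should match the margins output shown earlier (p-value of about 0.092, interval from about -4.36e-06 to .000058).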
I have run two regressions on similar but different data sets and regressors. I want to report the results next to each other for comparability, in a single table, using estout/esttab. The finished product should look something like this.
Table
-----------------------------------------
Dep. Var.: a
-----------------------------------------
Regression 1        | Regression 2
-----------------------------------------
x_1          coeff. | x_2          coeff.
y_1          coeff. | y_2          coeff.
z_1          coeff. | z_2          coeff.
l_1          coeff. |
m_1          coeff. |
-----------------------------------------
Obs           value | Obs           value
-----------------------------------------
Hypothesis          |
x_1=1,      p-value | x_2=1,      p-value
-----------------------------------------
I am able to create individual tables like this just fine but I have honestly no idea where to start here and documentation hasn't been very helpful either. I hope someone here can point me in the right direction.
I think the eststo/esttab syntax will help. Here's code which I think produces what you're after:
ssc install estout, replace
sysuse auto, clear
eststo m1: regress price mpg
test mpg=1
estadd scalar pvalue = r(p)
eststo m2: regress price mpg trunk
test trunk=1
estadd scalar pvalue = r(p)
esttab m1 m2, b(3) se(3) stats(pvalue r2 N, fmt(3 3 0) label("P-value" "R-squared" "Observations")) mtitle("Regression 1" "Regression 2") nocons
It creates this table:
--------------------------------------------
(1) (2)
Regression 1 Regression 2
--------------------------------------------
mpg -238.894*** -220.165**
(53.077) (65.593)
trunk 43.559
(88.719)
--------------------------------------------
P-value 0.000 0.633
R-squared 0.220 0.222
Observations 74 74
--------------------------------------------
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001
The esttab guide is really great.
I often make graphs that plot a mean or coefficient with a 95% error bar using -twoway scatter- and -twoway rcap-. The code below produces a legend with two entries: one for the mean marker symbol and one for the error bar. But I want the legend to display a single entry, showing the marker symbol and the error bar combined. Below is an example of how I usually make a graph.
sysuse auto
gen b = .
gen se = .
mean mpg if foreign == 1
replace b = _b[mpg] in 1
replace se = _se[mpg] in 1
mean mpg if foreign == 0
replace b = _b[mpg] in 2
replace se = _se[mpg] in 2
gen lb = b - (1.96 * se)
gen ub = b + (1.96 * se)
gen index = _n in 1/2
twoway scatter b index || rcap lb ub index, legend(order(1 "Mean" 2 "95% Interval"))
Is there an option in -legend- to allow me to overlay two legend entries in the way I want?
I don't really know how to do exactly what you want. It seems hard.
I also hate to waste the legend real estate, so one alternative is to label the means instead of using a legend (and add "With 95%CIs" to the title):
sysuse auto
reg mpg i.foreign
margins foreign, post
estimates store means
marginsplot, recast(scatter) xscale(reverse)
coefplot means
Another is to just use ciplot without any regression/summarization:
ciplot mpg, by(foreign) xscale(reverse)
coefplot and ciplot are both user-written.
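The "label the means instead of using a legend" idea can also be applied directly to the twoway graph from the question. This is only a sketch of that workaround, not a way to merge two legend keys: it turns the legend off, labels the x axis with the group names, and puts the interval information in the title. The axis labels and title text are my own choices.

```stata
sysuse auto, clear

* collect the mean and standard error of mpg for each origin group
gen b = .
gen se = .
forvalues g = 0/1 {
    local i = `g' + 1
    quietly mean mpg if foreign == `g'
    replace b  = _b[mpg]  in `i'
    replace se = _se[mpg] in `i'
}
gen lb = b - 1.96 * se
gen ub = b + 1.96 * se
gen index = _n in 1/2

* row 1 holds Domestic (foreign == 0), row 2 holds Foreign (foreign == 1)
* the /// continuation requires running this from a do-file
twoway (rcap lb ub index) (scatter b index), ///
    legend(off) xlabel(1 "Domestic" 2 "Foreign") xscale(range(0.5 2.5)) ///
    xtitle("") ytitle("Mean mpg") title("Mean mpg with 95% CIs")
```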
I am trying to calculate the 95% binomial Wilson confidence interval for the proportion of people completing treatment by year (dataset is line-listed for each person).
I want to store the results into a matrix so that I can use the putexcel command to export the results to an existing Excel spreadsheet without changing the formatting of the sheet. I have created a binary variable dscomplete_binary which is 0 for a person if treatment was not completed, and 1 if treatment was completed.
I have tried the following:
bysort year: ci dscomplete_binary, binomial wilson level(95)
This gives output of each year with the 95% confidence intervals. Previously I used statsby to collapse the dataset to store the results in variables but this clears the dataset from the memory and so I have to constantly re-open it.
Is there a way to run the command and store the results in a tabular format so that the data is stored in a similar way to this:
year mean LowerCI UpperCI
r1 2005 .7031588 .69229454 .71379805
r2 2006 .75532377 .74504232 .7653212
r3 2007 .78125924 .77125096 .79094833
r4 2008 .80014324 .79059798 .80935836
r5 2009 .81860977 .80955398 .82732689
r6 2010 .82641232 .81723672 .83522016
r7 2011 .81854123 .80955547 .82719356
r8 2012 .83497983 .82621944 .8433823
r9 2013 .85411799 .84527379 .86253893
r10 2014 .84461939 .83499599 .85377985
I have tried the following commands, which give different estimates to the binomial Wilson option:
svyset id2
bysort year: eststo: ci dscomplete_binary, binomial wilson level(95)
I think the postfile family of commands will help you here. This won't save your data into a matrix, but will save the results of the ci command into a new data set, which you name and whose structure you set. After the analysis is complete, you can load the data saved by postfile and export to Excel in the manner of your choosing.
For postfile, you analyze the data in a loop instead of using by or bysort.
Assuming the years in your data run 2005-2014, here is sample code:
/*make sure no postfile is open, in case a previous run did not close the file*/
cap postclose ci_results
/*create the postfile that will store results*/
postfile ci_results year mean lowerCI upperCI using ci_results.dta, replace
/*loop through years*/
forval y = 2005/2014 {
    ci dscomplete_binary if year==`y', binomial wilson level(95)
    /*store saved results from ci to postfile. Make sure the post statement contains results in the same order stated in postfile command.*/
    post ci_results (`y') (r(mean)) (r(lb)) (r(ub))
}
/*close the postfile once you've looped through all the cases of interest*/
postclose ci_results
use ci_results.dta, clear
Once you load the ci_results.dta data into memory, you can apply any Excel exporting command you like.
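If you would rather keep the original data in memory and collect the results in a matrix, as the question asks, the same loop can fill a matrix that is then handed to putexcel. This is a sketch under several assumptions: the years run 2005-2014, the workbook name results.xlsx and the sheet name CI are placeholders, and the putexcel syntax is the Stata 14+ form (Stata 13 used a different syntax).

```stata
* one row per year: year, mean, lower bound, upper bound
matrix results = J(10, 4, .)
matrix colnames results = year mean LowerCI UpperCI

local i = 0
forvalues y = 2005/2014 {
    local ++i
    quietly ci dscomplete_binary if year == `y', binomial wilson level(95)
    matrix results[`i', 1] = `y'
    matrix results[`i', 2] = r(mean)
    matrix results[`i', 3] = r(lb)
    matrix results[`i', 4] = r(ub)
}
matrix list results

* write into an existing workbook; file and sheet names are placeholders
putexcel set "results.xlsx", sheet("CI") modify
putexcel A1 = matrix(results), colnames
```

Because nothing here replaces the data in memory, there is no need to reload the dataset afterwards.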
This is a development of the suggestion already made to use statsby. The objections to it are quite puzzling, as it is easy to get back to the original dataset. There is some machine time in re-loading a dataset, but how much personal time has been spent in pursuit of an alternative?
Absent a dataset which we can use, I've provided a reproducible example.
If you wish to do this repeatedly, you'll write a more elaborate program to do it, which is what this forum is all about.
I leave how to export results to Excel as a matter for those so inclined: no details of what is wanted are provided in any case.
. sysuse auto, clear
(1978 Automobile Data)
. preserve
. statsby mean=r(mean) ub=r(ub) lb=r(lb), by(rep78) : ci foreign, binomial wilson level(95)
(running ci on estimation sample)
command: ci foreign, binomial wilson
mean: r(mean)
ub: r(ub)
lb: r(lb)
by: rep78
Statsby groups
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
.....
. list
+----------------------------------------+
| rep78 mean ub lb |
|----------------------------------------|
1. | 1 0 .6576198 0 |
2. | 2 0 .3244076 0 |
3. | 3 .1 .2562108 .0345999 |
4. | 4 .5 .7096898 .2903102 |
5. | 5 .8181818 .9486323 .5230194 |
+----------------------------------------+
. restore
. describe
The describe results will show that we are back where we started.
I would like to put column percentages into an Excel file. The first step would therefore be to capture the percentages (or counts, if percentages are not possible) in a matrix and then post the values to Excel using putexcel. I cannot use the matcell and matrow options with svy: tab, so I tried to check the stored results using e(name). The issue I am facing is how to capture the values from the following tabulation in a matrix:
webuse nhanes2b, clear
svyset psuid [pweight=finalwgt], strata(stratid)
svy: tabulate sex race , format(%11.3g) percent
--------------------------------------
1=male, | 1=white, 2=black, 3=other
2=female | White Black Other Total
----------+---------------------------
Male | 42.3 4.35 1.33 47.9
Female | 45.7 5.2 1.2 52.1
|
Total | 87.9 9.55 2.53 100
--------------------------------------
Key: cell percentages
I would like to put the values above in a matrix. I tried the following which worked:
mat pct = e(b)' * 100
matrix list pct
pct[6,1]
y1
p11 42.254909
p12 4.3497373
p13 1.3303765
p21 45.660537
p22 5.2008547
p23 1.2035865
But what I am interested in is the column percentages given by the following tabulation:
svy: tabulate sex race , format(%11.3g) col percent
--------------------------------------
1=male, | 1=white, 2=black, 3=other
2=female | White Black Other Total
----------+---------------------------
Male | 48.1 45.5 52.5 47.9
Female | 51.9 54.5 47.5 52.1
|
Total | 100 100 100 100
--------------------------------------
Key: column percentages
I tried this which did not return the desired values in the table above:
mat pct = e(b)' * 100
matrix list pct
pct[6,1]
y1
p11 42.254909
p12 4.3497373
p13 1.3303765
p21 45.660537
p22 5.2008547
p23 1.2035865
After checking through various stored objects using ereturn list I did not seem to find anything corresponding to column percentages.
How can I get the column percentages into a matrix?
easy peasy
ssc install estout
webuse nhanes2b, clear
estpost svy: tabulate race diabetes, row percent
esttab . using "C:/table.csv", b(2) se(2) scalars(F_Pear) nostar unstack mtitle(`e(colvar)')
See also http://repec.org/bocode/e/estout/hlp_estpost.html#svy_tabulate
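Applied to the table in the question, the same approach might look like the sketch below. It assumes that, when col percent is specified, estpost svy: tabulate posts the column percentages in e(b); run ereturn list afterwards to confirm what is stored in your version of estout. The matrix name colpct is made up for illustration.

```stata
* requires the user-written estout package: ssc install estout
webuse nhanes2b, clear
svyset psuid [pweight=finalwgt], strata(stratid)

estpost svy: tabulate sex race, col percent

* assumption: with col percent, e(b) holds the column percentages
matrix colpct = e(b)'
matrix list colpct
```

From there the matrix can be passed to putexcel in the usual way.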