Coefplot confidence intervals by colour - stata

I want to create a coefficient plot where each confidence interval shows three significance levels: 99, 95, 90. Moreover, I would like the confidence interval line to vary by colour depending on the significance level it is showing. My code is as follows:
sysuse auto
eststo reg_1: reg price mpg
eststo reg_2: reg price mpg headroom
coefplot reg_1 reg_2, keep(mpg) levels(99 95 90)
I tried to do this by adding the following mcolor and ciopts:
coefplot (reg_1, mcolor(ebblue) ciopts(color(ebblue))) (reg_2, mcolor(ebblue) ciopts(color(ebblue))), keep(mpg) levels(99 95 90)
but it seems only the 99% confidence interval is affected. How can I tell Stata which confidence interval should be which colour?
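One possible approach, sketched here on the assumption that coefplot's ciopts() accepts a space-separated list of values for a suboption when several levels() are requested (as I understand its documentation for lwidth() and lcolor()), is to give lcolor() one colour per level, in the same order as levels(). The three colour names below are arbitrary choices for illustration, not something taken from the question.

```stata
sysuse auto, clear
eststo clear
eststo reg_1: regress price mpg
eststo reg_2: regress price mpg headroom
* levels(99 95 90) draws three intervals per coefficient; lcolor() lists
* one colour per interval in that order (99% first, 90% last).
* The colour names are illustrative; any Stata colorstyle should work.
coefplot (reg_1, mcolor(ebblue)) (reg_2, mcolor(ebblue)), ///
    keep(mpg) levels(99 95 90) ///
    ciopts(lcolor(ltblue ebblue navy))
```

This would also be consistent with what the question reports: a single colour in ciopts() only changes the first (99%) interval, because only the first element of the list has been supplied.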

Related

How do I create a difference in means table with a column for N observations using esttab?

I'm trying to create a table showing the difference in means of two groups but I'm struggling to get exactly the right columns. I'm specifically trying to recreate a table that includes a column indicating N. I'm really confused how I would go about doing this. Here's the code for the bulk of the kind of table I'm talking about:
sysuse auto, clear
eststo all: quietly estpost summarize ///
price mpg weight headroom trunk
eststo domestic: quietly estpost summarize ///
price mpg weight headroom trunk if foreign == 0
eststo foreign: quietly estpost summarize ///
price mpg weight headroom trunk if foreign == 1
eststo diff: quietly estpost ttest ///
price mpg weight headroom trunk, by(foreign) unequal
esttab all domestic foreign diff, ///
cells("mean(pattern(1 1 1 0) fmt(2)) b(star pattern(0 0 0 1) fmt(2))" "sd(pattern(1 1 1 0) par fmt(2)) t(pattern(0 0 0 1) par fmt(2))") ///
label
In this scenario, I want a column N that just lists 74 in each row. In this shortened example, it doesn't make much sense but I'm using multiple data sources with the same variables and I want to stack the difference in means tables together.
You can reference the number of observations in each subsample (in this case by(foreign)) using N_1 and N_2 as follows:
sysuse auto, clear
estpost ttest price mpg weight headroom trunk, by(foreign)
esttab . , ///
cell(( N_2(fmt(%12.0fc)) mu_2(fmt(%12.3fc)) ///
N_1(fmt(%12.0fc)) mu_1(fmt(%12.3fc)) ///
b(fmt(%12.3fc)) t(fmt(%12.2fc)) )) ///
noobs label collabels(Count Mean Count Mean Diff T-Stat )
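If a single N column with the combined sample size is wanted (74 in every row here), a minimal sketch along the same lines follows. It assumes that estpost ttest also stores the combined count per variable in e(count), which is worth confirming with ereturn list; the column labels are my own choice.

```stata
sysuse auto, clear
estpost ttest price mpg weight headroom trunk, by(foreign)
* Assumption: e(count) holds N_1 + N_2 for each variable;
* run -ereturn list- after estpost to confirm on your installation.
esttab . , ///
    cell(( count(fmt(%12.0fc)) ///
           mu_1(fmt(%12.3fc)) mu_2(fmt(%12.3fc)) ///
           b(fmt(%12.3fc)) t(fmt(%12.2fc)) )) ///
    noobs label collabels(N "Mean (group 1)" "Mean (group 2)" Diff T-Stat)
```

Group 1 is foreign == 0 (Domestic) and group 2 is foreign == 1 (Foreign), following the ordering that ttest, by() uses.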

Computing and plotting difference in group means

In what follows I plot the mean of an outcome of interest (price) by a grouping variable (foreign) for each possible value taken by the fake variable time:
sysuse auto, clear
gen time = rep78 - 3
bysort foreign time: egen avg_p = mean(price)
scatter avg_p time if (foreign==0 & time>=0) || ///
scatter avg_p time if (foreign==1 & time>=0), ///
legend(order(1 "Domestic" 2 "Foreign")) ///
ytitle("Average price") xlab(#3)
What I would like to do is to plot the difference in the two group means over time, not the two separate means.
I am surely missing something, but to me it looks complicated because the information about the averages is stored "vertically" (in avg_p).
Arguably the easiest way to do this is to use linear regression to estimate the differences:
/* Regression Way */
drop if time < 0 | missing(time)
reg price i.foreign##i.time
margins, dydx(foreign) at(time =(0(1)2))
marginsplot, noci title("Foreign vs Domestic Difference in Price")
If regression is hard to wrap your mind around, the other way involves collapsing the data to group means and reshaping it:
/* Transform the Data */
keep price time foreign
collapse (mean) price, by(time foreign)
reshape wide price, i(time) j(foreign)
gen diff = price1-price0
tw connected diff time
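As a quick check that the two approaches above describe the same quantity, here is a small sketch that compares the raw difference in means at time == 0 with the matching coefficient from the fully interacted regression. The local macro names are mine, and it only looks at time == 0, the base level of i.time.

```stata
sysuse auto, clear
gen time = rep78 - 3
drop if time < 0 | missing(time)

* Raw difference in mean price (foreign minus domestic) at time == 0
quietly summarize price if foreign == 1 & time == 0
local mean_foreign = r(mean)
quietly summarize price if foreign == 0 & time == 0
local mean_domestic = r(mean)
display "Raw difference at time 0: " `mean_foreign' - `mean_domestic'

* In the fully interacted model, with time == 0 as the base level,
* the coefficient on 1.foreign is that same difference
quietly regress price i.foreign##i.time
lincom 1.foreign
```

The regression route additionally gives standard errors for each difference, which the collapse/reshape route does not.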
Here is another approach. graph dot will happily plot means.
sysuse auto, clear
set scheme s1color
collapse price if inrange(rep78, 3, 5), by(foreign rep78)
reshape wide price, i(rep78) j(foreign)
rename price0 Domestic
label var Domestic
rename price1 Foreign
label var Foreign
graph dot (asis) Domestic Foreign, over(rep78) vertical ///
marker(1, ms(Oh)) marker(2, ms(+))

Plotting log odds against mid-point of category

I have a binary outcome variable (disease) and a continuous independent variable (age). There's also a cluster variable clustvar. Logistic regression assumes that the log odds is linear with respect to the continuous variable. To visualize this, I can categorize age as (for example, 0 to <5, 5 to <15, 15 to <30, 30 to <50 and 50+) and then plot the log odds against the category number using:
logistic disease i.agecat, vce(cluster clustvar)
margins agecat, predict(xb)
marginsplot
However, since the categories are not equal width, it would be better to plot the log odds against the mid-point of the categories. Is there any way that I can manually define that the values plotted on the x-axis by marginsplot should be 2.5, 10, 22.5, 40 and (slightly arbitrarily) 60, and have the points spaced appropriately?
If anyone is interested, I achieved the required graph as follows:
Recategorised age variable slightly differently using (integer) labels that represent the mid-point of the category:
gen agecat = .
replace agecat = 3 if age<6
replace agecat = 11 if age>=6 & age<16
replace agecat = 23 if age>=16 & age<30
replace agecat = 40 if age>=30 & age<50
replace agecat = 60 if age>=50 & age<.
For labelling purposes, created a label:
label define agecat 3 "Less than 5y" 11 "10 to 15y" 23 "15 to <30y" 40 "30 to <50y" 60 "Over 50 years"
label values agecat agecat
Ran logistic regression as above:
logistic disease i.agecat, vce(cluster clustvar)
Used margins and plotted the results using marginsplot:
margins agecat, predict(xb)
marginsplot

Add column with number of observations to esttab summary statistics table

I would like to make a summary statistics table using esttab from the estout package on SSC. I can make the table just fine, but I would like to add a column that counts the number of non-missing observations for each variable. That is, some variables may not be complete and I would like this to be clear to the reader.
In the example below I removed the first five observations for price, so I would like a 69 in that row. But my code doesn't include row-specific observation counts, only the total number of observations in the footer.
sysuse auto, clear
estpost summarize, detail
replace price = . in 1/5
local screen ///
cells("N mean sd min p50 max") ///
nonumber label
esttab, `screen'
This yields an empty N column, which I would prefer to show 69 for price, followed by 74 for the complete variables.
Is this it:
clear all
set more off
*----- example data -----
sysuse auto, clear
keep price mpg rep78 headroom
replace price = . in 1/5
*----- what you want -----
estpost summarize, detail
local screen cells("count mean sd") nonumber label noobs
esttab, `screen'
?
It just uses count. esttab is a wrapper for estout, and the help for the latter documents that it will take "results from e(myel)", which you have from estpost summarize, detail.
An alternative is:
tabstat _all, statistics(count mean sd) columns(statistics)
Yet another option, which also allows variable labels to be displayed:
fsum _all, stat(n mean sd) uselabel
fsum is from SSC.
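To see which statistic names cells() will accept, it can help to look at what estpost leaves behind in e(). A minimal sketch, using the same example data as above:

```stata
sysuse auto, clear
keep price mpg rep78 headroom
replace price = . in 1/5
quietly estpost summarize, detail
* Every matrix listed here can be named in cells()
ereturn list
* e(count) holds the non-missing N per variable (69 for price here)
matrix list e(count)
```

This is also why N comes out empty in the question: the per-variable counts are stored under the name count, not N.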

Use esttab to generate summary statistics by group with columns for mean difference and significance

I would like to use esttab (ssc install estout) to generate summary statistics by group with columns for the mean difference and significance. It is easy enough to generate these as two separate tables with estpost, summarize, and ttest, and combine manually, but I would like to automate the whole process.
The following code generates the two components of the desired table.
sysuse auto, clear
* summary statistics by group
eststo clear
by foreign: eststo: quietly estpost summarize ///
price mpg weight headroom trunk
esttab, cells("mean sd") label nodepvar
* difference in means
eststo: estpost ttest price mpg weight headroom trunk, ///
by(foreign) unequal
esttab ., wide label
And I can print the two tables and cut-and-paste them into one table.
* can generate similar tables and append horizontally
esttab, cells("mean sd") label
esttab, wide label
* manual, cut-and-paste solution
-----------------------------------------------------------------------------------------------
                                (1)                     (2)                      (3)
                              mean          sd        mean          sd
-----------------------------------------------------------------------------------------------
Price                     6072.423    3097.104    6384.682    2621.915      -312.3      (-0.44)
Mileage (mpg)             19.82692    4.743297    24.77273    6.611187      -4.946**    (-3.18)
Weight (lbs.)             3317.115    695.3637    2315.909    433.0035      1001.2***    (7.50)
Headroom (in.)            3.153846    .9157578    2.613636    .4862837       0.540**     (3.30)
Trunk space (.. ft.)         14.75    4.306288    11.40909    3.216906       3.341***    (3.67)
-----------------------------------------------------------------------------------------------
Observations                    52                      22                      74
-----------------------------------------------------------------------------------------------
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
It seems that I should be able to get the desired table with one esttab call and without cutting-and-pasting, but I can't figure it out. Is there a way to generate the desired table without manually cutting-and-pasting?
I would prefer to output a LaTeX table, but anything that eliminates the cutting-and-pasting is a big step, even passing through a delimited text file.
If you still want to use esttab, you can play around using cells and pattern. The table in the original post can be replicated with the following code:
sysuse auto, clear
eststo domestic: quietly estpost summarize ///
price mpg weight headroom trunk if foreign == 0
eststo foreign: quietly estpost summarize ///
price mpg weight headroom trunk if foreign == 1
eststo diff: quietly estpost ttest ///
price mpg weight headroom trunk, by(foreign) unequal
esttab domestic foreign diff, ///
cells("mean(pattern(1 1 0) fmt(2)) sd(pattern(1 1 0)) b(star pattern(0 0 1) fmt(2)) t(pattern(0 0 1) par fmt(2))") ///
label
which yields
-----------------------------------------------------------------------------------------------
                                (1)                     (2)                      (3)
                              mean          sd        mean          sd           b            t
-----------------------------------------------------------------------------------------------
Price                      6072.42     3097.10     6384.68     2621.92     -312.26      (-0.44)
Mileage (mpg)                19.83        4.74       24.77        6.61       -4.95**    (-3.18)
Weight (lbs.)              3317.12      695.36     2315.91      433.00     1001.21***    (7.50)
Headroom (in.)                3.15        0.92        2.61        0.49        0.54**     (3.30)
Trunk space (.. ft.)         14.75        4.31       11.41        3.22        3.34***    (3.67)
-----------------------------------------------------------------------------------------------
Observations                    52                      22                      74
-----------------------------------------------------------------------------------------------
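Since the question asks for LaTeX output, the same esttab call can write to a file instead of the screen. This is only a sketch: the file name diff_table.tex is a placeholder of mine, and the booktabs option assumes the LaTeX document loads the booktabs package.

```stata
sysuse auto, clear
eststo clear
eststo domestic: quietly estpost summarize ///
    price mpg weight headroom trunk if foreign == 0
eststo foreign: quietly estpost summarize ///
    price mpg weight headroom trunk if foreign == 1
eststo diff: quietly estpost ttest ///
    price mpg weight headroom trunk, by(foreign) unequal
* "diff_table.tex" is a placeholder name; booktabs output needs
* the LaTeX package booktabs in the preamble
esttab domestic foreign diff using "diff_table.tex", replace booktabs ///
    cells("mean(pattern(1 1 0) fmt(2)) sd(pattern(1 1 0) fmt(2)) b(star pattern(0 0 1) fmt(2)) t(pattern(0 0 1) par fmt(2))") ///
    label
```

The resulting file can then be pulled into the document with \input{diff_table.tex}.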
I don't think there's a way to do this with esttab (estout package from ssc), but I have a solution with listtab (also ssc) and postfile. The table here is a little different than the one I propose above, but the approach is general enough that you can modify it to fit your needs.
This solution also uses LaTeX's booktabs package.
/* data and variables */
sysuse auto, clear
local vars price mpg weight headroom trunk
/* means */
tempname postMeans
tempfile means
postfile `postMeans' ///
str100 varname domesticMeans foreignMeans pMeans using "`means'", replace
foreach v of local vars {
local name: variable label `v'
ttest `v', by(foreign)
post `postMeans' ("`name'") (r(mu_1)) (r(mu_2)) (r(p))
}
postclose `postMeans'
/* medians */
tempname postMedians
tempfile medians
postfile `postMedians' ///
domesticMedians foreignMedians pMedians using `medians', replace
foreach v of local vars {
summarize `v' if !foreign, detail
local med1 = r(p50)
summarize `v' if foreign, detail
local med2 = r(p50)
ranksum `v', by(foreign)
local pval = 2 * (1 - normal(abs(r(z))))
post `postMedians' (`med1') (`med2') (`pval')
}
postclose `postMedians'
/* combine */
use `means'
merge 1:1 _n using `medians', nogenerate
format *Means *Medians %9.3gc
list
/* make latex table */
/* requires LaTeX package `booktabs` */
listtab * using "Table.tex", ///
rstyle(tabular) replace ///
head("\begin{tabular}{lcccccc}" ///
"\toprule" ///
"& \multicolumn{3}{c}{Means} & \multicolumn{3}{c}{Medians} \\" ///
"\cmidrule(lr){2-4} \cmidrule(lr){5-7}" ///
"& Domestic & Foreign & \emph{p} & Domestic & Foreign & \emph{p}\\" ///
"\midrule") ///
foot("\bottomrule" "\end{tabular}")
This yields the following.
The chosen answer is nice but a bit redundant. You can achieve the same result with only estpost ttest.
sysuse auto, clear
estpost ttest price mpg weight headroom trunk, by(foreign)
esttab, cells("mu_1 mu_2 b(star)")
The output looks like this (the rows shown come from a different dataset, not auto):
                    mu_1        mu_2           b
c_score         43.33858      42.034     1.30458***
nc_a4_17        4.007524    3.924623    .0829008*