coefplot: several models with several coefficients each in one graph - stata

I would like to display the coefficients (with their confidence intervals) of two regressions beneath one another.
Using Ben Jann's nice coefplot (ssc install coefplot), I can create a graph with a single subgraph that includes all coefficients from all models, but I cannot get the coefficients ordered by model rather than by coefficient.
Alternatively, I can create a graph with several subgraphs, one per coefficient, which is not what I need: there should be only one subgraph, with a common scale for the coefficients.
Here is a minimal example illustrating what I need and what I just described:
sysuse auto.dta, clear
reg price mpg rep78
eststo model1
reg price mpg rep78 weight
eststo model2
*what do I have: 2 models with 2 coefficients each (plus constant)
*what do I want: 1 graph with 2 models beneath one another,
*2 coefficients per model, 1 colour and legend entry per coefficient (not model!)
*common scale
*what is easy to get:
coefplot model1 model2, ///1 graph with all coefficients and models,
keep(mpg rep78) //but order is by coefficient, not by model
//how to add model names as ylabels?
*or 1 graph with 2 subgraphs by coefficient:
coefplot model1 || model2, ///
keep(mpg rep78) bycoefs
Can anyone help me get the graph I want, ideally using coefplot?
As the notes in the example say, the perfect solution would have one colour and legend entry per coefficient (not per model) and y-axis labels showing the model names, but this is secondary.
I have already tried several coefplot options, but most of them seem designed for several equations from one model rather than for coefficients from different models.

I am not sure how to deal with the model names, but for the first part of your question, it seems to me that you could just do something like:
sysuse auto.dta, clear
reg price mpg rep78
eststo m1
reg price mpg rep78 weight
eststo m2
coefplot (m1) || (m2), ///
drop(_cons) byopts(row(2)) keep(mpg rep78)
Or do I misunderstand what you want?

Related

How do I plot coefficients for multiple models in one graph using coefplot?

I have models with different dependent variables (DVs) that all use the same instrumental variable (IV), under two different identification strategies. In short, I have:
Group1
model 1
model 2
model 3
model 4
Group2
model 1-2
model 2-2
model 3-2
model 4-2
I want to plot the coefficients so that I can compare the IV coefficient of model 1 with that of model 1-2, model 2 with model 2-2, and so on, all at once. I would like all eight coefficients plotted in one graph, since the IV of interest is the same for all models.
Is there a stylized way to do this?
For now, I only know how to plot the coefficients of model 1 and model 1-2 in one graph, but not the others. I want the coefficients plotted horizontally by model number (1, 2, 3, 4), so that for each model number on the x-axis I can compare model 1 with model 1-2, model 2 with model 2-2, and so on.
The community-contributed command coefplot is not meant to be used like that.
Nevertheless, below is a toy example of how you could get what you want:
sysuse auto, clear
estimates clear
// code for identification strategy 1
regress price mpg trunk length turn if foreign == 0
estimates store A
regress price trunk turn if foreign == 0
estimates store B
regress price weight trunk turn if foreign == 0
estimates store C
regress price trunk if foreign == 0
estimates store D
// code for identification strategy 2
regress price mpg trunk length turn if foreign == 1
estimates store E
regress price trunk turn if foreign == 1
estimates store F
regress price weight trunk turn if foreign == 1
estimates store G
regress price trunk if foreign == 1
estimates store H
Then you draw the plot as follows:
local gap1 : display _dup(30) " "
local gap2 : display _dup(400) " "
local label1 |`gap1'|`gap1'|`gap1'|`gap2'
local gap3 : display _dup(25) " "
local gap4 : display _dup(400) " "
local label2 DV1`gap3'DV2`gap3'DV3`gap3'DV4`gap4'
coefplot (A, offset(-0.38)) (E, offset(-0.33)) ///
(B, offset(-0.15)) (F, offset(-0.1)) ///
(C, offset(0.09)) (G, offset(0.135)) ///
(D, offset(0.3)) (H, offset(0.35)), ///
vertical keep(trunk) coeflabels(trunk = `""`label1'""`label2'""') ///
xlabel(, notick labgap(0)) legend(off)
The code here uses Stata's toy auto dataset to run a number of simple regressions for each foreign category. The same dependent variable price is used for illustration but you can use different variables in its place. The variable trunk is assumed here to be the variable of interest.
The snippet stores each set of regression estimates and then places the models in pairs using the offset() option. The DV labels are drawn with a hack that I demonstrated recently in the following two questions:
coefplot: Putting names of regressions with vertical option
coefplot: Putting names of regressions on y-axis
The result is a very good approximation, but it will require some experimentation on your part.

Stata creating output for IV regression with bysort

So I am running a 2SLS model by interview year, and I have many interview years and different models. I want to present the first-stage results first and then, after reassuring the reader that they are solid, move on to the interesting results.
Example of Table A (first stage):
Year  DV  Coef   SE   F    N
1     A    0.5   0.1  100  1000
2     A    0.8   0.2   10  1500
3     B   -0.6   0.4  800   800
Table B with the main results would look the same just without the F-Stat.
I searched the web for how to create these tables automatically in Stata, but despite finding many related questions, I did not find an answer that worked for me. From those posts and help files I built something that is nearly there.
It creates the table I want for the main results, including the F statistic, by some variable (Step A in the code). However, when I do the same for the first stage, only the last wave is saved, because I restore the estimates after bysort has finished. I understand why Stata does this, but I cannot think of a way to make it do what I want.
clear all
*Install user-written commands
ssc install outreg2, replace
ssc install ivreg210, replace
*load data
sysuse auto, clear
*run example model (obviously the model itself is bogus)
********************************************************
*Step A: creates the IV results by foreign plus the F-Statistic
bys foreign: ///
outreg2 using output1-IV-F, label excel stats(coef se) dec(2) adds(F-Test, e(widstat)) nocons nor2 keep(mpg) replace: ///
ivreg210 price headroom trunk (mpg=rep78 ), savefirst first
*Step B: creates the first stage results in a separate table
bys foreign: ///
ivreg210 price headroom trunk (mpg=rep78 ), savefirst first
est restore _ivreg210_mpg
outreg2 using output1_1st-stage, replace keep(rep78)
cap erase output1-IV-F
cap erase output1_1st-stage
So ideally I would run the model only once and have the F statistic in the first-stage table, but I can fix that manually. The biggest issue is how to store the estimates when using bysort. If anyone has any suggestions, I would greatly appreciate it.
Thanks!
ssc install estout
Then you can store whatever results you want for later use, even after bysort:
eststo clear
sysuse auto, clear
bysort foreign: eststo: reg price weight mpg
esttab, label nodepvar nonumber
This is a roundabout solution. It works, but it really isn't the proper solution I was (and still am) looking for. The "trick" is to run the first stage as a separate model.
clear all
*Install user-written commands
ssc install outreg2, replace
ssc install ivreg210, replace
*load data
sysuse auto, clear
*run example model (obviously the model itself is bogus)
********************************************************
*Step A: creates the IV results by foreign plus the F-Statistic
bys foreign: ///
outreg2 using output1-IV-F, label excel stats(coef se) dec(2) adds(F-Test, e(widstat)) nocons nor2 keep(mpg) replace: ///
ivreg210 price headroom trunk (mpg=rep78 ), savefirst first
*Step B: creates the first stage results in a separate table
bys foreign: ///
ivreg210 price headroom trunk (mpg=rep78 ), savefirst first
est restore _ivreg210_mpg
outreg2 using output1_1st-stage1, replace keep(rep78)
*************
/* NEW BIT */
*************
*Step C: runs the first stage as a separate regression by foreign and tables it
bys foreign: ///
outreg2 using output1_1st_NEW, label excel stats(coef se) dec(2) nocons nor2 keep(rep78) replace: ///
reg mpg headroom trunk rep78
cap erase output1-IV-F
cap erase output1_1st-stage1
cap erase output1_1st_NEW
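A sketch that combines the two answers above: run the first stage as its own regression (as in Step C) inside a loop over the groups, attach the first-stage F statistic with estadd, and store each group's results with eststo under its own name so that later groups do not overwrite earlier ones. It assumes the same toy specification (mpg instrumented by rep78, with headroom and trunk as exogenous regressors); the names first0, first1 and F_excl are illustrative choices. The F computed by test is the conventional (non-robust) F of the excluded instrument, so it need not match ivreg210's e(widstat) exactly.

```stata
* requires estout: ssc install estout
sysuse auto, clear
eststo clear
levelsof foreign, local(groups)
foreach g of local groups {
    * first stage: endogenous regressor on the exogenous regressors and the instrument
    quietly regress mpg headroom trunk rep78 if foreign == `g'
    * F test of the excluded instrument (rep78) in this group's first stage
    quietly test rep78
    local F_excl = r(F)
    * attach the F statistic to the current estimates (F_excl is an illustrative name)
    estadd scalar F_excl = `F_excl'
    * store under a group-specific name so the next group does not overwrite it
    eststo first`g'
}
esttab first0 first1, keep(rep78) se stats(F_excl N) mtitles("Domestic" "Foreign") nonumbers
```

Because every group is stored under its own name, the same loop can hold the second-stage estimates too (for example, stored as second`g'), which avoids the est restore problem from Step B.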

Computing and plotting difference in group means

In what follows I plot the mean of an outcome of interest (price) by a grouping variable (foreign) for each possible value taken by the fake variable time:
sysuse auto, clear
gen time = rep78 - 3
bysort foreign time: egen avg_p = mean(price)
scatter avg_p time if (foreign==0 & time>=0) || ///
scatter avg_p time if (foreign==1 & time>=0), ///
legend(order(1 "Domestic" 2 "Foreign")) ///
ytitle("Average price") xlab(#3)
What I would like to do is to plot the difference in the two group means over time, not the two separate means.
I am surely missing something, but to me it looks complicated because the information about the averages is stored "vertically" (in avg_p).
Arguably the easiest way to do this is to use linear regression to estimate the differences:
/* Regression Way */
drop if time < 0 | missing(time)
reg price i.foreign##i.time
margins, dydx(foreign) at(time =(0(1)2))
marginsplot, noci title("Foreign vs Domestic Difference in Price")
If regression is hard to wrap your mind around, the other way involves mangling the data with a reshape:
/* Transform the Data */
keep price time foreign
collapse (mean) price, by(time foreign)
reshape wide price, i(time) j(foreign)
gen diff = price1-price0
tw connected diff time
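A third option keeps the data "vertical" and needs neither regression nor reshape: compute each group's mean within every value of time and subtract. This is a sketch that relies on egen's mean() accepting an expression, with cond() setting the other group's prices to missing so each mean only sees one group; the variable names mean_dom, mean_for and onepertime are illustrative.

```stata
sysuse auto, clear
gen time = rep78 - 3
* group means within each value of time; cond() hides the other group's prices
bysort time: egen double mean_dom = mean(cond(foreign == 0, price, .))
bysort time: egen double mean_for = mean(cond(foreign == 1, price, .))
gen double diff = mean_for - mean_dom
* flag one observation per value of time so each difference is plotted once
egen onepertime = tag(time)
* Stata treats missing as larger than any number, so time >= 0 alone would keep missing time
twoway connected diff time if onepertime & time >= 0 & !missing(time), sort ///
    ytitle("Foreign minus domestic average price")
```

For time < 0 (rep78 of 1 or 2) there are no foreign cars in auto.dta, so diff is missing there.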
Here is another approach. graph dot will happily plot means.
sysuse auto, clear
set scheme s1color
collapse price if inrange(rep78, 3, 5), by(foreign rep78)
reshape wide price, i(rep78) j(foreign)
rename price0 Domestic
label var Domestic
rename price1 Foreign
label var Foreign
graph dot (asis) Domestic Foreign, over(rep78) vertical ///
marker(1, ms(Oh)) marker(2, ms(+))

Remove extra column title in esttab (Stata)

I'm using estout to prepare regression tables for LaTeX. It seems when storing a nonlinear model, there is an extra row in the output that I can't seem to turn off or remove. As an MWE, consider:
sysuse auto, clear
eststo clear
eststo: poisson mpg rep78
esttab, tex nomti nodepvars
This will produce output with a row after the model numbers and before the coefficients with "mpg" (the dependent variable) in the left-most column and the other columns blank. This is not a model title and the nomti and nodepvars options will not remove it.
Oddly, this row does not show up when the regression is OLS. However, this row does appear in the following example with OLS and Poisson, now containing "main":
sysuse auto, clear
eststo clear
eststo: reg mpg rep78
eststo: poisson mpg rep78
esttab, tex nomti nodepvars
I've been through the options in esttab and estout and I can't find any that seem to turn off this row (I'm not even sure what purpose it might serve). Any ideas how to get rid of this row?
Add the option eqlabels(none) to your esttab.
@Dimitriy has posted the solution.
As a side note, consider why this is happening:
The matrix of coefficients for regress looks like:
. matrix list e(b)
e(b)[1,2]
rep78 _cons
y1 2.3842975 13.169421
while that for poisson:
. matrix list e(b)
e(b)[1,2]
mpg: mpg:
rep78 _cons
y1 .11242548 2.6692439
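estout, esttab, etc. build their tables from these stored results. The equation name mpg in poisson's e(b) is what produces the extra row, and when regress and poisson results are combined, the unnamed regress equation is labelled "main".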
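Applying the answer to the mixed OLS/Poisson example gives the sketch below; eqlabels(none) is an estout option that esttab passes through. The last lines only illustrate the side note: they restore the stored Poisson results (eststo stores unnamed models as est1, est2, ...) and display their equation names, which is where the extra row comes from.

```stata
* requires estout: ssc install estout
sysuse auto, clear
eststo clear
eststo: regress mpg rep78
eststo: poisson mpg rep78
* eqlabels(none) suppresses the equation-label row ("mpg" or "main")
esttab, tex nomtitles nodepvars eqlabels(none)
* inspect the equation names behind that row
estimates restore est2
local eqnames : coleq e(b)
display "`eqnames'"
```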

Stata: Combine table command with ttest and output latex

For regression output, I usually use a combination of eststo to store estimates, estadd to add the R2 and additional tests, and then esttab to output the lot.
I need to do the same with the table command. I need the mean, median and N for a variable across three by variables, and I would like to add stars based on a ttest of mean == 1 and a signtest of median == 1. Because I have three by variables, I have been using table to collate the mean, median and N, which I call as in the following pseudo-code:
sysuse auto,clear
table foreign rep78 , ///
contents(mean price median price n price) format(%9.2f)
ttest price==1, by(foreign rep78)
signtest price=1, by(foreign rep78)
I've tried esttab and estpost to no avail. I've also looked at tabstat, tablemat and summarize as alternatives to table, but they don't allow three by variables.
How can I create this table, add the stars for the ttest and signtest p-values and output the full table?
The main point of your question seems to be producing a LaTeX table. However, you show "pseudo-code" that looks very much like Stata code, with the caveat that it is illegal.
In particular, ttest allows only one variable in the by() option. But note that ttest also allows the by: prefix (you can use both, in fact); the two serve different purposes. signtest, on the other hand, does not allow a by() option but does allow the by: prefix. So you should probably clarify what you want to do before creating the table.
If you are trying to use the by: prefix in both cases and then produce a table, you can create a grouping variable and put the commands in a loop. That way, you can collect the saved results for each group and tabulate them with the ESTOUT module (by Ben Jann, on SSC). Something like:
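clear all
set more off
sysuse auto
keep price foreign rep78
* create group variable (one group per foreign/rep78 combination; 8 groups in auto.dta)
egen grou = group(foreign rep78)
* results matrix: mean, median, N and the two p-values, one row per group
matrix R = J(8, 5, .)
matrix colnames R = mean median N p_ttest p_signtest
* tests by group
forvalues i = 1/8 {
    quietly summarize price if grou == `i', detail
    matrix R[`i', 1] = r(mean)
    matrix R[`i', 2] = r(p50)
    matrix R[`i', 3] = r(N)
    quietly ttest price == 1 if grou == `i'
    matrix R[`i', 4] = r(p)
    quietly signtest price = 1 if grou == `i'
    matrix R[`i', 5] = r(p_2)
}
* tabulate the collected results with esttab (from estout: ssc install estout)
esttab matrix(R), tex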
See help by, help egen (the group function), help estout and help saved results.