Stata coefplot: plot coefficients and corresponding confidence intervals on 2nd axis

When trying to depict two coefficients from one regression on separate axes with Ben Jann's superb coefplot (ssc install coefplot) command, the coefficient to be shown on the 2nd axis is correctly displayed, but its confidence interval is depicted on the 1st scale.
Can anyone explain how I get the CI displayed on the same (2nd) axis as the coefficient it belongs to? I couldn't find any option to change this - and imagine it should be the default, if not the only, option to plot the CI around the point estimate it belongs to.
I use the latest coefplot version with Stata 16.
Here is a minimal example to illustrate the problem:
webuse union, clear
eststo results: reg idcode i.union grade
coefplot (results, keep(1.union)) (results, keep(grade) xaxis(2))

In the line
coefplot (results, keep(1.union)) (results, keep(grade) xaxis(2))
you specify the option xaxis(2). This is not a documented option of coefplot, although it is a valid option of twoway rspike, which coefplot calls. Apparently, when you use xaxis(2), something goes wrong in the communication between coefplot and rspike. coefplot's own option for choosing the axis is axis().
This works for me:
coefplot (results, keep(1.union)) (results, keep(grade) axis(2))
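For reference, a self-contained version of this fix might look as follows. It is only a sketch: it assumes coefplot is installed from SSC, it uses the official estimates store instead of eststo so that no further package is needed, and the two axis titles are illustrative additions that are not part of the original example.

```stata
* Assumes: ssc install coefplot has been run beforehand
webuse union, clear
regress idcode i.union grade
estimates store results

* axis(2) puts both the marker and its CI spike for grade on the second axis
* (the axis titles below are illustrative, not from the original post)
coefplot (results, keep(1.union)) (results, keep(grade) axis(2)), ///
    xtitle("Coefficient on union (axis 1)")                       ///
    xtitle("Coefficient on grade (axis 2)", axis(2))
```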

I'm trying to create something similar. Since this option is not built-in we need to write a program to tweak how coefplot works. I'm sharing the code from the user manual here: http://repec.sowi.unibe.ch/stata/coefplot/markers.html
capt program drop coefplot_mlbl
*! version 1.0.0 10jun2021 Ben Jann
program coefplot_mlbl, sclass
    _parse comma plots 0 : 0
    syntax [, MLabel(passthru) * ]
    if `"`mlabel'"'=="" local mlabel mlabel(string(@b, "%5.2f") + " (" + string(@ll, "%5.2f") + "; " + string(@ul, "%5.2f") + ")")
    preserve
    qui coefplot `plots', `options' `mlabel' generate replace nodraw
    sreturn clear
    tempvar touse
    qui gen byte `touse' = __at<.
    mata: st_global("s(mlbl)", ///
        invtokens((strofreal(st_data(.,"__at","`touse'")) :+ " " :+ ///
        "`" :+ `"""' :+ st_sdata(.,"__mlbl","`touse'") :+ `"""' :+ "'")'))
    sreturn local plots `"`plots'"'
    sreturn local options `"`options'"'
end
capt program drop coefplot_ymlbl
*! version 1.0.0 10jun2021 Ben Jann
program coefplot_ymlbl
    _parse comma plots 0 : 0
    syntax [, MLabel(str asis) * ]
    _parse comma mlspec mlopts : mlabel
    local mlopts = substr(`"`mlopts'"', 2, .) // remove leading comma
    if `"`mlspec'"'!="" local mlabel mlabel(`mlspec')
    else local mlabel
    coefplot_mlbl `plots', `options' `mlabel'
    coefplot `plots', ///
        yaxis(1 2) yscale(alt) yscale(axis(2) alt noline) ///
        ylabel(none, axis(2)) yti("", axis(2)) ///
        ymlabel(`s(mlbl)', axis(2) notick angle(0) `mlopts') `options'
end
coefplot_ymlbl D F, drop(_cons) xline(0)
However, the above program does not allow the bylabel option: I get a Stata error saying "bylabel not allowed". Is there a way to edit this code so that it accepts bylabel, which is used to label subplots?

Related

Scatter plot color by variable

I want to make a scatter plot in Stata with points colored according to a categorical variable.
The only way I've found to do this is to code the colors in separate layers of a twoway plot.
However, this seems a rather convoluted solution for such a simple operation:
twoway (scatter latitud longitud if nougrups4 ==1, mcolor(black)) ///
(scatter latitud longitud if nougrups4 ==2, mcolor(blue)) ///
(scatter latitud longitud if nougrups4 ==3, mcolor(red)) ///
(scatter latitud longitud if nougrups4 ==4, mcolor(green))
Is there a simpler and automatic way to do this?
In this case, the categorical variable nougrups4 came from a cluster analysis. A general solution would be fine, but also a specific solution to draw clusters.
This is how I would do this by hand:
sysuse auto, clear
separate price, by(rep78)
tw scatter price? mpg
drop price?
Or in one line using Nick Cox's sepscatter command from SSC:
sepscatter price mpg, separate(rep78)
The latter command can also produce other types of plots with the recast() option.
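The by-hand approach can also be given explicit colors and legend labels. The following is a minimal sketch using the auto data with foreign as the grouping variable; the choice of colors and legend text is an assumption for illustration only.

```stata
sysuse auto, clear

* separate creates price0 (foreign == 0) and price1 (foreign == 1)
separate price, by(foreign)

* one marker color per generated variable, in the order they are listed
twoway scatter price0 price1 mpg, mcolor(black red) ///
    legend(order(1 "Domestic" 2 "Foreign")) ytitle("Price")

* remove the helper variables again
drop price0 price1
```

Because each group ends up in its own variable, twoway treats the groups as separate series, which is what makes per-group colors possible without writing one if condition per level.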
There isn't a 'simpler' built-in solution for what you want to do.
However, here's a simple wrapper command, which you can extend to meet your needs:
capture program drop foo
program define foo
    // expects exactly three variables: y, x, and the grouping variable
    syntax varlist(min=3 max=3)
    quietly {
        tokenize `varlist'
        levelsof `3', local(foolevels)
        local i = 0
        local foocolors red green blue
        foreach x of local foolevels {
            local ++i
            // one scatter layer per level; colors are taken in order from foocolors
            local extra `extra' || scatter `1' `2' if `3' == `x', mcolor("`: word `i' of `foocolors''")
        }
        twoway `extra'
    }
end
And a toy example:
clear
set obs 10
generate A = runiform()
generate B = runiform()
generate C = .
replace C = 1 in 1/3
replace C = 2 in 4/7
replace C = 3 in 8/10
foo A B C

Graph evolution of quantile non-linear coefficient: can it be done with grqreg? Other options?

I have the following model:
Y_{it} = alpha_i + B1*weight_{it} + B2*Dummy_Foreign_{i} + B3*(weight*Dummy_Foreign)_{it} + e_{it}
and I am interested in the effect of weight on Y for foreign cars, and in graphing the evolution of the relevant coefficient across quantiles together with its standard errors. That is, I need to see the evolution of the sum of coefficients (B1 + B3). I know this is a non-linear effect, and would require some sort of delta method to obtain the variance-covariance matrix to obtain the standard error of (B1 + B3).
Before I delve into writing a program that attempts to do this, I thought I would try and ask if there is a way of doing it with grqreg. If this is not possible with grqreg, would someone please guide me into how they would start writing a code that computes the proper standard errors, and graphs the quantile coefficient.
For a cross section example of what I am trying to do, please see code below.
I use grqreg to generate the evolution of the separate coefficients, but I need the joint one: a single graph for the evolution of (B1 + B3) with its respective standard errors.
Thanks.
(I am using Stata 14.1 on Windows 10):
clear
sysuse auto
set scheme s1color
gen gptm = 1000/mpg
label var gptm "gallons / 1000 miles"
gen weight_foreign= weight*foreign
label var weight_foreign "Interaction weight and foreign car"
qreg gptm weight foreign weight_foreign , q(.5)
grqreg weight weight_foreign , ci ols olsci reps(40)
*** Question 1: How to construct the plot of the coefficient of interest?
Your second question is off-topic here since it is statistical. Try the CV SE site or Statalist.
Here's how you might do (1) in a cross section, using margins and marginsplot:
clear
set more off
sysuse auto
set scheme s1color
gen gptm = 1000/mpg
label var gptm "gallons / 1000 miles"
sqreg gptm c.weight##i.foreign, q(10 25 50 75 95) reps(500) coefl
margins, dydx(weight) predict(outcome(q10)) predict(outcome(q25)) predict(outcome(q50)) predict(outcome(q75)) predict(outcome(q95)) at(foreign=(0 1))
marginsplot, xdimension(_predict) xtitle("Quantile") ///
legend(label(1 "Domestic") label(2 "Foreign")) ///
xlabel(none) xlabel(1 "Q10" 2 "Q25" 3 "Q50" 4 "Q75" 5 "Q95", add) ///
title("Marginal Effect of Weight By Origin") ///
ytitle("GPTM")
This produces a graph like this:
I didn't recast the CI here since it would look cluttered, but that would make it look more like your graph. Just add recastci(rarea) to the options.
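As a cross-check at a single quantile, the sum B1 + B3 and its standard error can also be obtained directly with lincom. This is only a sketch for the median; it relies on the fact that for a sum of two coefficients the variance is Var(B1) + Var(B3) + 2*Cov(B1, B3), which lincom computes from e(V).

```stata
sysuse auto, clear
generate gptm = 1000/mpg

* median regression with the weight-by-foreign interaction
qreg gptm c.weight##i.foreign, quantile(.5)

* effect of weight for foreign cars: B1 + B3
lincom weight + 1.foreign#c.weight

* lincom leaves the point estimate and standard error in r()
display "B1 + B3 = " r(estimate) "   se = " r(se)
```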
Unfortunately, none of the panel quantile regression commands play nice with factor variables and margins. But we can hack something together. First, you can calculate the sums of coefficients with nlcom (instead of the more natural lincom, which lacks the post option), store them, and use Ben Jann's coefplot to graph them. Here's a toy example to give you the main idea, where we will look at the effect of tenure for union members:
set more off
estimates clear
webuse nlswork, clear
gen tXu = tenure*union
local quantiles 1 5 10 25 50 75 90 95 99 // K quantiles that you care about
local models "" // names of K quantile models for coefplot to graph
local xlabel "" // for x-axis labels
local j=1 // counter for quantiles
foreach q of numlist `quantiles' {
    qregpd ln_wage tenure union tXu, id(idcode) fix(year) quantile(`q')
    nlcom (me_tu:_b[tenure]+_b[tXu]), post
    estimates store me_tu`q'
    local models `"`models' me_tu`q' || "'
    local xlabel `"`xlabel' `j++' "Q{sub:`q'}""'
}
di "`models'"
di `"`xlabel'"'
coefplot `models' ///
, vertical bycoefs rescale(100) ///
xlab(none) xlabel(`xlabel', add) ///
title("Marginal Effect of Tenure for Union Members On Each Conditional Quantile Q{sub:{&tau}}", size(medsmall)) ///
ytitle("Wage Change in Percent" "") yline(0) ciopts(recast(rcap))
This makes a dromedary curve, which suggests that the effect of tenure is larger in the middle of the wage distribution than at the tails:

How to make Stata margins work for user-written model

I wonder, what requirements must a user-written estimation and/or prediction program satisfy in order for standard Stata margins command to be able to work with it?
I have created a toy "estimation" program with a prediction module, but when I run margins, dydx(x) after myreg y x, Stata throws r(103) ("too many specified") and produces nothing. Can anyone modify my code so that margins could work with it?
Yes, I know that if e(predict) is not returned, margins assumes a linear prediction and works OK, but eventually I need to write a nonlinear model and estimate marginal effects for it.
program mypred
    version 13
    syntax name [if] [in]
    marksample touse
    local newVar = "`1'"
    mat b = e(b)
    local columnNames: colfullnames b
    tokenize `columnNames'
    gen `newVar' = b[1,1] + b[1,2] * `2'
end
program myreg, eclass
    version 13
    syntax varlist(min=2 max=2) [if] [in]
    marksample touse
    tempname b V
    matrix input b = (1.1, 2.3)
    matrix input V = (9, 1 \ 1, 4)
    matrix colnames b = _cons `2'
    matrix colnames V = _cons `2'
    matrix rownames V = _cons `2'
    ereturn post b V, esample(`touse')
    ereturn local predict "mypred"
    ereturn local cmd "myreg"
    ereturn display
end
I don't have a complete answer. If there is such a one-stop location within the Stata documentation that answers your question, I'm not aware of it.
The recommendation is to read, at least, the whole entry: [R] margins. Here is a list of conditions that should be considered:
margins cannot be used after estimation commands that do not produce full variance matrices, such as exlogistic and expoisson (see [R] exlogistic and [R] expoisson).
margins is all about covariates and cannot be used after estimation commands that do not post the covariates, which eliminates gmm (see [R] gmm).
margins cannot be used after estimation commands that have an odd data organization, and that excludes asclogit, asmprobit, asroprobit, and nlogit (see [R] asclogit, [R] asmprobit, [R] asroprobit, and [R] nlogit).
From another subsection:
... as of Stata 11, you are supposed to set in e(marginsok) the list of options allowed with predict that are okay to use with margins.
Consider also inspecting (help viewsource) user-written commands from experienced user/programmers who allow for this in their commands. Maarten Buis is one of them. (You can run search maarten buis, all to search within Stata.)
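One plausible cause of the r(103) error is that mypred declares syntax name, which accepts exactly one name, whereas predict may be called with a storage type in front of the new variable (for example predict double yhat). The following sketch of a more permissive prediction program is an assumption about what is needed, not a confirmed fix: the name mypred2 is hypothetical, the optional xb option is added only for illustration, and whether margins needs further e() results such as e(marginsok) is not checked here.

```stata
capture program drop mypred2
program mypred2
    version 13
    // newvarname accepts an optional storage type, returned in `typlist'
    syntax newvarname [if] [in] [, xb]
    // the new variable does not exist yet, so do not mark on it
    marksample touse, novarlist
    tempname b
    matrix `b' = e(b)
    // e(b) is assumed to be ordered as (_cons, x), as in myreg
    local xvar : word 2 of `: colnames `b''
    generate `typlist' `varlist' = `b'[1,1] + `b'[1,2]*`xvar' if `touse'
end
```

To try it, myreg would set ereturn local predict "mypred2". Reading e(b) inside the prediction program, rather than hard-coding the coefficients, matters if the caller evaluates predictions at modified coefficient values.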

generating bootstrap standard errors for large number of scalars

Suppose I have four scalars: call them dea_1 dea_2 dea_3 dea_4. They are output from a program samprogram (not shown here).
Now I use the bootstrap command in Stata with these scalars to get bootstrapped standard errors.
set seed 123
bootstrap dea_1=r(dea_1) dea_2=r(dea_2) dea_3=r(dea_3) dea_4=r(dea_4), reps(100): samprogram
This is fine but in my original program, I calculate 30 scalars, dea_1 dea_2 ... dea_30. Now I want to avoid writing each of these 30 scalars in the bootstrap command and for this purpose I wrote a loop as follows:
set seed 234
forvalues i in 1(1)30{
local k dea_`i'
bootstrap dea_`k'=r(dea_`k'), reps(100): samprogram
}
This works, but gives the output for each scalar one at a time. However, I am looking for code which avoids writing all scalars in the bootstrap command but still gives the output for all at the same time (i.e. like the output from the following command)
set seed 345
bootstrap dea_1=r(dea_1) dea_2=r(dea_2) dea_3=r(dea_3) dea_4=r(dea_4) [omitted]... dea_30=r(dea_30), reps(100): samprogram
Any help in this regard will be highly appreciated.
This can be solved by building up the contents of a local macro step by step.
set seed 123
forval i = 1/30 {
    local call `call' dea_`i'=r(dea_`i')
}
bootstrap `call', reps(100) : samprogram
If need be, blank out the macro beforehand by
local call
More discussion in http://www.stata-journal.com/sjpdf.html?articlenum=pr0005 [free .pdf]
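Since samprogram is not shown, here is a small self-contained sketch of the same idea. The program samprogram_demo is a hypothetical stand-in that returns three scalars (means of three variables in the auto data) under the names r(dea_1) to r(dea_3); only the macro-building loop and the single bootstrap call correspond to the answer above.

```stata
capture program drop samprogram_demo
program define samprogram_demo, rclass
    // hypothetical stand-in for samprogram: returns r(dea_1) ... r(dea_3)
    local vars price mpg weight
    forvalues i = 1/3 {
        local v : word `i' of `vars'
        summarize `v', meanonly
        return scalar dea_`i' = r(mean)
    }
end

sysuse auto, clear

* build the expression list step by step, starting from an empty macro
local call
forvalues i = 1/3 {
    local call `call' dea_`i'=r(dea_`i')
}
display "`call'"

* one bootstrap call reports all statistics in a single table
set seed 123
bootstrap `call', reps(50) nodots: samprogram_demo
```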
(LATER) Note that, contrary to your assertion, the code
set seed 234
forvalues i in 1(1)30{
    local k dea_`i'
    bootstrap dea_`k'=r(dea_`k'), reps(100): samprogram
}
would not work as intended. First, forvalues requires = rather than in, so the loop as written is a syntax error. Second, even with that fixed, the first time round bootstrap would be looking for r(dea_dea_1) and would return missing for every sample. The code for calling bootstrap repeatedly could simply be
set seed 234
forvalues i = 1/30 {
    bootstrap dea_`i'=r(dea_`i'), reps(100): samprogram
}
but that would be a bad idea when you can do what you want in one call.
An alternative solution would be to make your program eclass and return the results in the matrix e(b). This allows the shortcut bootstrap _b, reps(100): samprogram. The key point is that the different scalars are stored in the row vector `b', which the program returns as e(b) with the command:
ereturn post `b', esample(`touse')
A complete example is here:
clear all
program define sim, eclass
    syntax varlist(numeric) [if] [in], by(varname numeric)
    marksample touse
    markout `touse' `by'
    local k : word count `varlist'
    tempname b m0
    matrix `b' = J(1,`k',.)
    local i = 1
    foreach var of local varlist {
        sum `var' if `touse' & `by', meanonly
        scalar `m0' = r(mean)
        sum `var' if `touse' & !`by', meanonly
        matrix `b'[1,`i'] = `m0' - r(mean)
        local i = `i' + 1
    }
    ereturn post `b', esample(`touse')
end
sysuse auto
bootstrap _b, reps(100) : sim price mpg length weight trunk, by(foreign)

Calculating the Gini Coefficient from LIS data (in Stata)

I need to calculate the Gini coefficient from disposable personal income data at LIS. According to a LIS training document, the Stata code to do this is:
di "** INCOME DISTRIBUTION II – Exercise 13 **"
program define bottop
    qui sum ey [w=hweight*d4]
    replace ey = .01*r(mean) if ey<.01*r(mean)
    qui sum dpi [w=hweight*d4], de
    replace ey = (10*r(p50)/(d4^.5)) if dpi>10*r(p50)
end
foreach file in $us00h $fi00h {
    display "`file'"
    use hweight d4 dpi if (!mi(dpi) & !(dpi==0)) using "`file'", clear
    gen ey=dpi/(d4^0.5)
    bottop
    ineqdeco ey [w=hweight*d4]
}
I have simply copied and pasted this code from the training document. The snippets
qui sum ey [w=hweight*d4]
replace ey=0.01*r(mean) if ey<0.01*r(mean)
and
qui sum dpi [w=hweight*d4], de
replace ey=(10*r(p50)/(d4^0.5)) if dpi>10*r(p50)
are bottom and top coding, respectively.
When I tried to run this code, the variable hweight was not found. Does anyone know what the new name of hweight is at LIS? Or can anyone suggest how I might otherwise overcome this impasse?
I'm familiar with Stata, but the sophistication of this code is beyond my ken.
Much appreciated.
Based on the variable definition list on the LIS Documentation page, it looks like the variable is now called HWGT.
This is more of a second-best solution. However, the census of population provides income by brackets. If you are willing to use that, you can get the counts for every bracket, with a top-coded bracket for the last one. Use the median income value within each bracket. Then you can directly apply the formula for the Gini coefficient. It is second best because it is only an approximation to what the individual-level data would give.
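For grouped data of this kind, one common form of the formula is the trapezoid approximation to the area under the Lorenz curve. The notation below is an assumption introduced for illustration: the K brackets are sorted by income, n_k is the count in bracket k, m_k is the representative (median) income in bracket k, and p_k and L_k are the cumulative population and income shares.

```latex
p_k = \frac{\sum_{j \le k} n_j}{\sum_{j=1}^{K} n_j}, \qquad
L_k = \frac{\sum_{j \le k} n_j m_j}{\sum_{j=1}^{K} n_j m_j}, \qquad
p_0 = L_0 = 0,
\\[1ex]
G \approx 1 - \sum_{k=1}^{K} (p_k - p_{k-1})\,(L_k + L_{k-1}).
```

Because everyone within a bracket is assigned the same income, this ignores within-bracket inequality and so tends to understate the Gini coefficient relative to the individual-level calculation.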
Why don't you try the fastgini command:
http://www.stata.com/statalist/archive/2007-02/msg00524.html
ssc install fastgini
fastgini income
return list
This should give you the Gini coefficient for the variable income.
The package also allows for weights. Type
help fastgini
for more information.