Spacing for y-axis labels in coefplot - Stata

I am working with the community-contributed command coefplot in Stata.
I have a large number of estimated coefficients, which I would like to plot on the same graph.
As such, I would like to reduce the spacing between coefficients.
Consider the following toy example using Stata's auto dataset:
quietly sysuse auto, clear
quietly regress price mpg trunk length turn
coefplot, drop(_cons) xline(0)
How can the spacing between Mileage (mpg) and Trunk space (cu. ft.) be decreased?

Some white space around the graph is an unavoidable limitation because of how Stata's graphics system works. With that said, an alternative way around this (which does not tinker with the aspect ratio of the graph) is to increase the range of the y-axis.
For example:
forvalues i = 1 / 4 {
coefplot, drop(_cons) xline(0) yscale(range(-`i' `=6+`i''))
}
A different but related approach is to turn off the y-axis labels entirely and use marker labels instead:
forvalues i = 1 / 4 {
coefplot, drop(_cons) ///
xline(0) ///
yscale(range(-`i' `=6+`i'')) ///
yscale(off) ///
mlabels(mpg = 12 "Mileage" ///
trunk = 12 "Trunk space (cu. ft.)" ///
length = 12 "Length (in.)" ///
turn = 12 "Turn Circle (ft.)")
}
In both approaches, the starting and ending positions (i.e. the amount of space above and below the labels) can be set by tweaking the values specified within the range() suboption.
Note that the grid lines can be turned off by using the option grid(none).
In addition, by combining the at(matrix()) option with yscale(range()), one can allow for unequal reductions in the distances between coefficients:
matrix A = (0.2,0.21,0.22,0.225,0.255)
coefplot, drop(_cons) ///
xline(0) ///
yscale(range(0.18 0.26)) ///
yscale(off) ///
mlabels(mpg = 12 "Mileage" ///
trunk = 12 "Trunk space (cu. ft.)" ///
length = 12 "Length (in.)" ///
turn = 12 "Turn Circle (ft.)") ///
at(matrix(A)) ///
horizontal

Related

Marginal effects from estat mfx after asclogit

I am trying to understand how Stata calculates both the probability that an alternative is selected and the marginal effect evaluated at the mean when I run estat mfx after estimating a McFadden / conditional logit model using asclogit.
For example:
asclogit H t, case(ID) alternatives(AQ) casevars(Medicaid) ///
basealternative(1) vce(cluster Medicaid)
estat mfx, varlist(Medicaid)
My goal is to re-create the results by estimating the same model using clogit and manually calculating the equivalent marginal effects. I am able to reproduce the conditional logit estimates generated by asclogit using clogit, but I get stuck reproducing the post-estimation calculations.
Specifically, I have not been able to reproduce the computed probability of each alternative being selected, which, from reading the documentation for estat mfx, I learned is evaluated at the value labeled X in the table output.
Here are the estat mfx probability figures, shown as a matrix listing:
. matrix baseline = r(pr_1)\r(pr_2)\r(pr_3)\r(pr_4)\r(pr_5)
. matrix list baseline
baseline[5,1]
c1
r1 .04077232
r2 .15206384
r3 .01232535
r4 .10465885
r5 .69017964
Keep in mind that the variables beginning with Valpha and VMedicaid are case-specific variables I created for the clogit command. They are, respectively, the intercept and an indicator for Medicaid coverage.
Here is what I have:
clogit H t Valpha* i.VMedicaid_2 i.VMedicaid_3 i.VMedicaid_4 i.VMedicaid_5 , ///
group(ID) vce(cluster Medicaid)
* Reproducing probability an alternative selected calculated by estat mfx
// calculate covariate means to plug into probability calculations
local V t Valpha* VMedicaid_* Medicaid
foreach var of varlist `V' {
summarize `var'
scalar `var'_MN = r(mean)
}
// alternative specific ZB
scalar zb = _b[t]*t_MN
// numerators attempt 1
foreach j of numlist 2/5 {
scalar XB`j' = exp(zb + (_b[Valpha_`j']*Valpha_`j'_MN) + ///
(_b[1.VMedicaid_`j']*VMedicaid_`j'_MN))
di "this is `j': " XB`j'
}
// numerators attempt 2: the documentation for estat mfx says the probability is
// evaluated with Medicaid = 0.68, the Medicaid coverage rate across cases,
// rather than the means of the various VMedicaid_ variables used to estimate
// clogit. Replaced the intercept mean with 1
foreach j of numlist 2/5 {
scalar XB`j' = exp(zb + (_b[Valpha_`j']) + (_b[1.VMedicaid_`j']*Medicaid_MN))
di "this is `j': " XB`j'
}
scalar XB1 =exp(zb)
// denominator
scalar DNM = XB1+ XB2+ XB3+ XB4 + XB5
// Baseline
foreach j of numlist 1/5 {
scalar PRB`j' = XB`j'/DNM
di "The probability of choosing hospital `j' is: " PRB`j'
}
The results I get are the following:
The probability of choosing hospital 1 is: .14799075
The probability of choosing hospital 2 is: .21019437
The probability of choosing hospital 3 is: .09046377
The probability of choosing hospital 4 is: .18383085
The probability of choosing hospital 5 is: .36752026
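For reference, the probability estat mfx reports for each alternative follows the conditional-logit (softmax) form P_j = exp(xb_j) / sum_k exp(xb_k), with the base alternative's linear index normalized to zero. A minimal Python sketch of that formula, using made-up linear indices rather than the question's estimates:

```python
import math

def clogit_probs(xb):
    """Conditional-logit choice probabilities: P_j = exp(xb_j) / sum_k exp(xb_k)."""
    exps = [math.exp(v) for v in xb]
    denom = sum(exps)
    return [e / denom for e in exps]

# Hypothetical linear indices for 5 alternatives; the base alternative has xb = 0
xb = [0.0, 0.35, -0.49, 0.22, 0.91]
probs = clogit_probs(xb)
print([round(p, 4) for p in probs])
```

The discrepancy in the attempts above therefore comes down to which covariate values are plugged into each xb_j, not the formula itself.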

Coding in SAS using a dynamically changing number of variables

I have a covariance matrix of variables like this. The values mirror each other across the diagonal, so the lower or upper triangle can be left null for convenience. This is the variance-covariance matrix of the coefficients of a linear regression.
Obs Intercept length diameter height weight_w weight_s
1 0.15510 -0.29969 -0.05904 -0.20594 0.07497 -0.00168
2 -0.29969 3.46991 -3.50836 -0.01703 -0.04841 -0.14048
3 -0.05904 -3.50836 5.08407 -0.82108 -0.13027 0.10732
4 -0.20594 -0.01703 -0.82108 4.89589 -0.29959 0.30447
5 0.07497 -0.04841 -0.13027 -0.29959 0.13787 -0.18763
6 -0.00168 -0.14048 0.10732 0.30447 -0.18763 0.40414
Obs 1-6 represent the intercept and five variables: length, diameter, height, weight_w, and weight_s. The diagonal values are variances; the rest are covariances between pairs of variables and between each variable and the intercept.
I want to create a formula where the number of variables can change and the user can input them as parameters. Based on the number of variables, the formula should dynamically expand or contract and calculate the result. The formula for five variables is shown below. C1, C2, ..., C5 are constants that come with the five variables from outside; they are the beta coefficients of a linear regression and will vary with the number of variables.
0.15+(C1)^2 * 3.46 + (C2)^2 * 5.08 + (C3)^2 * 4.89 + (C4)^2*0.13 + (C5)^2*0.40 -- covers all variances
+2*C1*-0.29 + 2*C2*-0.05 + 2*C3*-0.20 + 2*C4*0.07 + 2*C5*-0.001 -- covers covariances of all variables with the intercept
+2*C1*C2*-3.50 + 2*C1*C3*-0.01 + 2*C1*C4*-0.04 + 2*C1*C5*-0.14 -- covers covariances of "length" with the remaining (non-intercept) variables
+2*C2*C3*-0.82 + 2*C2*C4*-0.13 + 2*C2*C5*0.10 -- covers covariances of "diameter" with the remaining variables
+2*C3*C4*-0.29 + 2*C3*C5*0.30 -- covers covariances of "height" with the remaining variables
+2*C4*C5*-0.18 -- covers the covariance of weight_w & weight_s
Those five constants, matching the five variables, are supplied from outside; the rest come from the covariance matrix. In the covariance matrix, the diagonal values are the variances of those variables and the rest are covariances. In the formula you can see that where there is a variance I take the square of the constant (C), and where there is a covariance the two constants for the variables involved multiply. The intercept is another term that comes with these variables, but the intercept does not have any C.
For two variables, the covariance matrix (again including the intercept) will be like this:
Obs Intercept GRE GPA
1 1.15582 -.000281894 -0.28256
2 -0.00028 0.000001118 -0.00011
3 -0.28256 -.000114482 0.10213
Formula for calculation:
1.15582+(C1)^2 * 0.000001118 + (C2)^2 * 0.10213 -- covers all variances on the diagonal
+2*C1*-.000281894 + 2*C2*-0.28256 -- covers covariances of all variables with the intercept
+2*C1*C2*-0.00011 -- covers the covariance between GRE & GPA
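What the expanded formulas above compute is the quadratic form c'Vc with c = (1, C1, ..., Ck), i.e. the variance of a linear combination of the regression coefficients, so the "dynamically expanding" formula is just a matrix product that works for any number of variables. A language-neutral sketch in Python (the matrix values are the two-variable GRE/GPA example from the question; the constants C1, C2 are made up):

```python
# Variance-covariance matrix of (Intercept, GRE, GPA), values from the question
V = [
    [ 1.15582,    -0.000281894, -0.28256    ],
    [-0.000281894, 0.000001118, -0.000114482],
    [-0.28256,    -0.000114482,  0.10213    ],
]

def var_linear_combo(V, coeffs):
    """Var(b0 + C1*b1 + ... + Ck*bk) = c' V c with c = (1, C1, ..., Ck)."""
    c = [1.0] + list(coeffs)
    n = len(c)
    return sum(c[i] * V[i][j] * c[j] for i in range(n) for j in range(n))

# Hypothetical user-supplied constants C1, C2
C = [2.0, 3.0]
quad = var_linear_combo(V, C)

# Same value via the expanded formula written out in the question
expanded = (V[0][0]
            + C[0] ** 2 * V[1][1] + C[1] ** 2 * V[2][2]
            + 2 * C[0] * V[0][1] + 2 * C[1] * V[0][2]
            + 2 * C[0] * C[1] * V[1][2])
assert abs(quad - expanded) < 1e-12
print(quad)
```

Because the quadratic form covers every (i, j) pair automatically, nothing in the code needs to be rewritten when the number of variables changes; in SAS the same idea can be expressed with PROC IML matrix multiplication.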

Polychoric correlation (Stata) using multiple imputations and a complex sample design

I have a dataset (I use Stata 13) with multiple imputations and a complex sample design (strata and pweights), so I generally prefix my analyses with: mi estimate, esampvaryok: svy:
I just want to know is there any way to use the polychoric command in Stata in that context? Or, if it's not possible, do you know other software that would allow me to do so?
Apply polychoric to each imputation data set and then average the results. Although polychoric is not survey-aware, only the probability weights are needed to estimate the correlations. Here's code that computes two estimates of the correlations: 1) the average of the individual correlations from polychoric; 2) an estimate based on the average inverse-hyperbolic-tangent transform of those correlations. In the example they are similar, but I'd usually prefer the latter. See: Nick Cox, Speaking Stata: Correlation with confidence, or Fisher’s z revisited. The Stata Journal (2008) 8, Number 3, pp. 413–439. http://www.stata-journal.com/article.html?article=pr0041.
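The two averaging schemes described above differ only in the scale on which the correlations are averaged. A minimal Python sketch of the Fisher z version (the correlation values for the five imputations are made up):

```python
import math

def avg_corr_fisher(rs):
    """Average correlations on the Fisher z (atanh) scale, then back-transform."""
    z = sum(math.atanh(r) for r in rs) / len(rs)
    return math.tanh(z)

# Made-up correlations of one variable pair across 5 imputations
rs = [0.52, 0.61, 0.58, 0.49, 0.64]
naive = sum(rs) / len(rs)          # estimate 1: plain average
fisher = avg_corr_fisher(rs)       # estimate 2: back-transformed average z
print(round(naive, 4), round(fisher, 4))
```

As in the example below, the two estimates are usually close; the atanh transform mainly matters when the correlations are large or vary a lot across imputations.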
Update: Output the correlations to Stata matrices and add row & column names
/* Create MI data set */
set seed 43228226
sysuse auto, clear
recode rep78 1/2=3
replace foreign = . in 3/5
mi set flong
mi register impute rep78 foreign
mi impute chained ///
(ologit) rep78 ///
(logit) foreign = turn weight mpg ///
[pw = turn], add(5) double
/* Set up polychoric */
/* local macro with variables */
local pvars rep78 foreign mpg weight
local nvars : word count `pvars'
mata: nv = strtoreal(st_local("nvars"))
qui mi des
mata: nreps = st_numscalar("r(M)")
/* loop through MI data sets to get sums*/
mata: sum_r = J(nv,nv,0)
mata: sum_atr = J(nv,nv,0)
forvalues i = 1/`=r(M)'{
qui polychoric `pvars' [pw = turn] if _mi_m==`i'
mata: r = st_matrix("r(R)")
mata: sum_r = sum_r + r
mata: sum_atr = sum_atr +atanh(r)
}
/* Now average and get estimates */
mata:
st_matrix("rho1",sum_r/nreps)
/* For version based on atanh:
1) Get average of atanh transforms.
2) Diagonal elements are missing; substitute 0s.
3) Back transform to correlation scale.
4) Put 1s on diagonal.
5) Make the matrix symmetric. */
st_matrix("rho2", ///
makesymmetric(tanh(lowertriangle(sum_atr/nreps,0))) +I(nv))
end
/* rho1 : average correlations
rho2 : back transform of average atanh(r) */
forvalues i = 1/2 {
mat colnames rho`i' = `pvars'
mat rownames rho`i' = `pvars'
mat list rho`i'
}
Update 2: Mostly Mata
/* Set up variables & pweight */
local pcvars rep78 foreign mpg weight
local pwtvar turn
mata:
stata("qui mi query")
M= st_numscalar("r(M)")
vnames = tokens(st_local("pcvars"))
nv = cols(vnames)
/*Initialize sums for average numerators */
sum_r = J(nv,nv,0)
sum_atr = J(nv,nv,0)
/* Run -polychoric- on each imputed data set */
for (j = 1; j<=M; j++) {
st_numscalar("j",j)
stata("qui polychoric `pcvars' [pw = `pwtvar'] if _mi_m==j")
r = st_matrix("r(R)")
sum_r = sum_r + r
sum_atr = sum_atr + atanh(r)
}
/* Create Stata correlation matrices from average over imputations*/
st_matrix("rho1",sum_r/M)
st_matrix("rho2", ///
makesymmetric(tanh(lowertriangle(sum_atr/M,0))) +I(nv))
/* Label rows & columns */
c = (J(nv,1,""),vnames')
st_matrixrowstripe("rho1",c)
st_matrixcolstripe("rho1",c)
st_matrixrowstripe("rho2",c)
st_matrixcolstripe("rho2",c)
end
mat list rho1
mat list rho2

Adding variables in Stata and then dividing by a number is giving unexpected results

I am trying to calculate z-scores by creating a variable D from three other variables, namely A, B, and C, as D = (A-B)/C, but for some reason it produces very large numbers. When I computed just (A-B), I did not get what I calculated by hand: instead of -2, I got -105.66.
Variable A is 'long' and variable B is 'float'; I am not sure if this is the reason. My Stata syntax is:
gen zscore= (height-avheight)/meansd
but it did not work.
You are confusing scalars and variables. Here's a solution (chop off the first four lines and replace x by height to fit the calculation into your code):
// example data
clear
set obs 50
gen x = runiform()
// summarize
qui su x
// store scalars
sca de mu = r(mean)
sca de sd = r(sd)
// z-score
gen zx = (x - mu) / sd
su zx
x and its z-score zx are variables that take many values, whereas mu and sd are constants. You might code constants in Stata by using scalars or macros.
I am not sure what you are trying to get, but I will use the auto dataset from Stata to explain. This is basic stuff in Stata. Say I want the z-score for a price of 3:
sysuse auto
sum price
return list // optional: lists the stored results
scalar myz = (3-r(mean))/r(sd) // r(mean) and r(sd) hold the mean and SD of price; if known, you can simply enter the values
dis myz
-2.0892576
So, the z-value is -2.09 here.
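The scalar-versus-variable point is language-independent: the mean and SD are single constants computed once, then applied to each value. A short Python sketch of the same calculation (the data values are made up):

```python
# z-score of a single value against a sample, mirroring the Stata scalar approach
data = [3.2, 4.1, 5.0, 6.3, 4.8, 5.5, 3.9]  # stand-in for the price variable
n = len(data)
mean = sum(data) / n
# sample SD with n-1 denominator, matching Stata's r(sd)
sd = (sum((x - mean) ** 2 for x in data) / (n - 1)) ** 0.5
z = (3 - mean) / sd
print(round(z, 4))
```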

Compute percentages in a PROC REPORT

PRODUCT          CODE  Quantity
A                1     100
A                2     150
A                3     50
total product A        300
B                1     10
B                2     15
B                3     5
total product B        30
I made a proc report and the break after product gives me the total quantity for each product. How can I compute an extra column on the right to calculate the percent quantity of product based on the subtotal?
SAS has a good example of this in their documentation, here. I reproduce a portion of this with some additional comments below. See the link for the initial datasets and formats (or create basic ones yourself).
proc report data=test nowd split="~" style(header)=[vjust=b];
format question $myques. answer myyn.;
column question answer,(n pct cum) all;
/* Since n/pct/cum are nested under answer, they are columns 2,3,4 and 5,6,7 */
/* and must be referred to as _c2_ _c3_ etc. rather than by name */
/* in the OP example this may not be the case, if you have no across nesting */
define question / group "Question";
define answer / across "Answer";
define pct / computed "Column~Percent" f=percent8.2;
define cum / computed "Cumulative~Column~Percent" f=percent8.2;
define all / computed "Total number~of answers";
/* Sum total number of ANSWER=0 and ANSWER=1 */
/* Here, _c2_ refers to the 2nd column; den0 and den1 store the sums for those. */
/* compute before would be compute before <variable> if there is a variable to group by */
compute before;
den0 = _c2_;
den1 = _c5_;
endcomp;
/* Calculate percentage */
/* Here you divide the value by its denominator from before */
compute pct;
_c3_ = _c2_ / den0;
_c6_ = _c5_ / den1;
endcomp;
/* This produces a summary total */
compute all;
all = _c2_ + _c5_;
/* Calculate cumulative percent */
temp0 + _c3_;
_c4_ = temp0;
temp1 + _c6_;
_c7_ = temp1;
endcomp;
run;
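The logic the COMPUTE blocks implement — capture each group's subtotal first, then divide every detail row by it — can be sketched outside SAS in a few lines of Python, using the product data from the question:

```python
# Percent of each row's quantity relative to its product subtotal
rows = [
    ("A", 1, 100), ("A", 2, 150), ("A", 3, 50),
    ("B", 1, 10),  ("B", 2, 15),  ("B", 3, 5),
]

# First pass: subtotal per product (what the break after PRODUCT displays)
totals = {}
for product, code, qty in rows:
    totals[product] = totals.get(product, 0) + qty

# Second pass: extra column with quantity as a share of the product subtotal
report = [(p, c, q, q / totals[p]) for p, c, q in rows]
for p, c, q, pct in report:
    print(f"{p} {c} {q:>4} {pct:6.1%}")
```

In PROC REPORT the same two-pass idea appears as a COMPUTE BEFORE block that stores the denominator and a COMPUTE block on the percent column that divides by it.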