Mediation analysis - KHB ologit - stata

I'm using KHB in Stata, based on this paper by Kohler, Karlson, and Holm: https://journals.sagepub.com/doi/10.1177/1536867X1101100306
When trying out the code for ordinal logistic regression, I got the same error message as the author of this thread:
https://www.statalist.org/forums/forum/general-stata-discussion/general/1496400-khb-method-for-mediation-analysis
I have tried the suggestion in the forum, but it doesn't work. Does anyone have experience with KHB and ologit in Stata who can help me with this code?
forv i = 1/3 {
quietly eststo: khb ologit diabetes education || healty_diet, outcome(`i´) ape summary
}
esttab, scalars("ratio_education Conf.-Ratio" "pct_education Conf.-Perc.")
I get this error:
program error: code follows on the same line as open brace

The error comes from the syntax of your code, not from the methods you are linking to.
When curly braces { } mark the beginning and end of a loop, nothing may follow the opening brace { on the same line.
This is allowed:
forv i = 1/3 {
di "`i'"
}
but not this:
forv i = 1/3 { di "`i'"
}
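A slightly fuller sketch of the same layout rule, using the auto dataset that ships with Stata (the regression itself is only a placeholder, not the KHB model from the question). The opening brace ends the forvalues line, each command sits on its own line, and the closing brace stands alone; the Stata manual allows only a comment after the opening brace.

```stata
* Placeholder model on the shipped auto data; only the loop layout matters here.
sysuse auto, clear
forvalues i = 3/5 {    // in a do-file, a comment is the only thing allowed here
    quietly regress price mpg if rep78 == `i'
    display "rep78 = `i': N = " e(N) ", b[mpg] = " %9.3f _b[mpg]
}
```

Note also that the local macro is written with a left single quote and a plain apostrophe, as in the answer's example.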

Stata coefplot: plot coefficients and corresponding confidence intervals on 2nd axis

When trying to depict two coefficients from one regression on separate axes with Ben Jann's superb coefplot (ssc install coefplot) command, the coefficient to be shown on the 2nd axis is correctly displayed, but its confidence interval is depicted on the 1st scale.
Can anyone explain how I get the CI displayed on the same (2nd) axis as the coefficient it belongs to? I couldn't find any option to change this - and imagine it should be the default, if not the only, option to plot the CI around the point estimate it belongs to.
I use the latest coefplot version with Stata 16.
Here is a minimum example to illustrate the problem:
webuse union, clear
eststo results: reg idcode i.union grade
coefplot (results, keep(1.union)) (results, keep(grade) xaxis(2))
In the line
coefplot (results, keep(1.union)) (results, keep(grade) xaxis(2))
you specify the option xaxis(2), but this is not a documented option of coefplot, although it is a valid option of twoway rspike which is called by coefplot. Apparently, if you use xaxis(2) something goes wrong with the communication between coefplot and rspike.
This works for me:
coefplot (results, keep(1.union)) (results, keep(grade) axis(2))
I'm trying to create something similar. Since this option is not built-in we need to write a program to tweak how coefplot works. I'm sharing the code from the user manual here: http://repec.sowi.unibe.ch/stata/coefplot/markers.html
capt program drop coefplot_mlbl
*! version 1.0.0 10jun2021 Ben Jann
program coefplot_mlbl, sclass
_parse comma plots 0 : 0
syntax [, MLabel(passthru) * ]
if `"`mlabel'"'=="" local mlabel mlabel(string(@b, "%5.2f") + " (" + string(@ll, "%5.2f") + "; " + string(@ul, "%5.2f") + ")")
preserve
qui coefplot `plots', `options' `mlabel' generate replace nodraw
sreturn clear
tempvar touse
qui gen byte `touse' = __at<.
mata: st_global("s(mlbl)", ///
invtokens((strofreal(st_data(.,"__at","`touse'")) :+ " " :+ ///
"`" :+ `"""' :+ st_sdata(.,"__mlbl","`touse'") :+ `"""' :+ "'")'))
sreturn local plots `"`plots'"'
sreturn local options `"`options'"'
end
capt program drop coefplot_ymlbl
*! version 1.0.0 10jun2021 Ben Jann
program coefplot_ymlbl
_parse comma plots 0 : 0
syntax [, MLabel(str asis) * ]
_parse comma mlspec mlopts : mlabel
local mlopts = substr(`"`mlopts'"', 2, .) // remove leading comma
if `"`mlspec'"'!="" local mlabel mlabel(`mlspec')
else local mlabel
coefplot_mlbl `plots', `options' `mlabel'
coefplot `plots', ///
yaxis(1 2) yscale(alt) yscale(axis(2) alt noline) ///
ylabel(none, axis(2)) yti("", axis(2)) ///
ymlabel(`s(mlbl)', axis(2) notick angle(0) `mlopts') `options'
end
coefplot_ymlbl D F, drop(_cons) xline(0)
However, the above program does not allow the option bylabel. I get a Stata error saying "bylabel not allowed". Is there a way to edit this code so that it accepts the bylabel option, which is used to label subplots?

Precisions and counts

I am working with an educational dataset called IPEDS from the National Center for Education Statistics. It tracks students in college by major, degree completion, etc. The problem in Stata is that I am trying to determine the total count of degrees obtained in a specific major.
They have a variable cipcode which contains values that serve as "majors". cipcode might be 14.2501 "Petroleum Engineering", 16.0102 "Linguistics", and so forth.
When I write a particular code like
tab cipcode if cipcode==14.2501
it reports no observations. What code will give me the totals?
/*Convert Float Variable to String Variable and use Force Replace*/
tostring cipcode, gen(cipcode_str) format(%6.4f) force
replace cipcode_str = reverse(substr(reverse(cipcode_str), indexnot(reverse(cipcode_str), "0"), .))
replace cipcode_str = reverse(substr(reverse(cipcode_str), indexnot(reverse(cipcode_str), "."), .))
/* Created a total variable called total_t1 for total count of all stem majors listed in table 1*/
gen total_t1 = cipcode_str== "14.2501" + "14.3901" + "15.0999" + "40.0601"
This minimal example confirms your problem. (See, by the way, https://stackoverflow.com/help/mcve for advice on good examples.)
* code
clear
input code
14.2501
14.2501
14.2501
end
tab code if code == 14.2501
tab code if code == float(14.2501)
* results
. tab code if code == 14.2501
no observations
. tab code if code == float(14.2501)
       code |      Freq.     Percent        Cum.
------------+-----------------------------------
    14.2501 |          3      100.00      100.00
------------+-----------------------------------
      Total |          3      100.00
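The keyword is one you use: precision. In Stata, search precision for resources, starting with blog posts by William Gould. A decimal like 14.2501 cannot be held exactly in binary, and the details of holding a variable as type float can bite: the stored value is rounded to float precision, while the literal 14.2501 in your expression is evaluated in double precision, so the two approximations differ and the == comparison fails.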
It's hard to see what you're doing with your last block of code, which you don't explain. The last statement looks puzzling, as you're adding strings. Consider what happens with
. gen whatever = "14.2501" + "14.3901" + "15.0999" + "40.0601"
. di whatever[1]
14.250114.390115.099940.0601
The result is a long string that cannot be a valid cipcode. I suspect that you are reaching towards
... if inlist(cipcode_str, "14.2501", "14.3901", "15.0999", "40.0601")
which is quite different.
But using float() is the minimal trick for this problem.
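A minimal sketch that puts the two suggestions together, assuming cipcode is stored as float as in the question. The toy values and the indicator name stem_t1 are made up for illustration.

```stata
* Toy data standing in for the IPEDS variable (values are illustrative).
clear
input float cipcode
14.2501
14.3901
15.0999
40.0601
16.0102
end

* The same decimal held at double and at float precision.
display %20.16f 14.2501
display %20.16f float(14.2501)

count if cipcode == 14.2501          // float storage against a double literal
count if cipcode == float(14.2501)   // both sides rounded to float

* Indicator for the four listed majors, compared at float precision.
generate byte stem_t1 = inlist(cipcode, float(14.2501), float(14.3901), ///
    float(15.0999), float(40.0601))
count if stem_t1
```

Wrapping each literal in float() keeps the numeric variable as it is, so no string conversion is needed.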

American Community Survey, SAS EG code for margin of error

The Census Bureau gives the mathematical formula for calculating the margin of error for the American Community Survey, but doesn't include the SAS code for it. The formula is on page 24 of the documentation here: http://www2.census.gov/programs-surveys/acs/tech_docs/accuracy/ACS_Accuracy_of_Data_2014.pdf
Does anyone have the SAS code for the Margin of Error? It would have to incorporate all 80 pwgtp's.
Here is the relevant code. It uses a 90% confidence level because that is what the Census Bureau uses for its published margins of error on American FactFinder. To use a different confidence level, change the multiplier 1.64537 at the start of the expression. The factor .05 is the 4/80 from the Census Bureau's replicate-weight variance formula.
/* Margin of Error 90% confidence code*/
1.64537*(SQRT(.05*(SUM((SUM(t1.pwgtp1)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp2)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp3)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp4)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp5)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp6)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp7)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp8)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp9)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp10)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp11)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp12)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp13)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp14)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp15)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp16)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp17)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp18)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp19)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp20)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp21)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp22)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp23)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp24)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp25)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp26)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp27)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp28)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp29)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp30)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp31)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp32)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp33)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp34)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp35)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp36)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp37)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp38)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp39)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp40)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp41)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp42)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp43)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp44)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp45)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp46)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp47)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp48)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp49)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp50)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp51)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp52)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp53)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp54)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp55)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp56)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp57)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp58)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp59)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp60)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp61)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp62)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp63)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp64)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp65)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp66)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp67)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp68)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp69)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp70)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp71)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp72)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp73)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp74)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp75)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp76)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp77)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp78)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp79)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp80)-(SUM(t1.PWGTP)))**2))))
AS Margin_of_Error,
/* Plus_Minus */
(1.64537*(SQRT(.05*(SUM((SUM(t1.pwgtp1)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp2)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp3)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp4)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp5)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp6)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp7)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp8)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp9)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp10)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp11)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp12)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp13)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp14)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp15)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp16)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp17)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp18)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp19)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp20)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp21)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp22)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp23)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp24)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp25)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp26)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp27)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp28)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp29)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp30)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp31)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp32)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp33)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp34)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp35)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp36)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp37)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp38)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp39)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp40)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp41)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp42)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp43)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp44)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp45)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp46)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp47)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp48)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp49)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp50)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp51)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp52)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp53)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp54)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp55)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp56)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp57)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp58)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp59)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp60)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp61)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp62)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp63)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp64)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp65)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp66)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp67)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp68)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp69)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp70)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp71)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp72)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp73)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp74)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp75)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp76)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp77)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp78)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp79)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp80)-(SUM(t1.PWGTP)))**2)))))/ (SUM(t1.PWGTP))
FORMAT=PERCENT6.1 AS Plus_Minus_Percent
/* End of Margin of Error Code */
To see where this fits into a query, here is the full code of a query with the margin of error code embedded.
/* Example program */
PROC SQL;
CREATE TABLE WORK.QUERY_FOR_COMBINEDACS2013_SAS7BD(label="QUERY_FOR_combinedacs2013.sas7bdat") AS
SELECT /* SUM_of_PWGTP */
(SUM(t1.PWGTP)) FORMAT=Z5. AS SUM_of_PWGTP,
t1.SCHL,
1.64537*(SQRT(.05*(SUM((SUM(t1.pwgtp1)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp2)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp3)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp4)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp5)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp6)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp7)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp8)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp9)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp10)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp11)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp12)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp13)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp14)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp15)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp16)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp17)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp18)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp19)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp20)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp21)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp22)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp23)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp24)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp25)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp26)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp27)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp28)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp29)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp30)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp31)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp32)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp33)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp34)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp35)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp36)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp37)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp38)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp39)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp40)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp41)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp42)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp43)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp44)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp45)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp46)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp47)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp48)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp49)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp50)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp51)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp52)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp53)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp54)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp55)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp56)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp57)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp58)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp59)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp60)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp61)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp62)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp63)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp64)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp65)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp66)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp67)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp68)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp69)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp70)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp71)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp72)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp73)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp74)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp75)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp76)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp77)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp78)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp79)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp80)-(SUM(t1.PWGTP)))**2))))
AS Margin_of_Error,
/* Plus_Minus */
(1.64537*(SQRT(.05*(SUM((SUM(t1.pwgtp1)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp2)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp3)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp4)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp5)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp6)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp7)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp8)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp9)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp10)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp11)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp12)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp13)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp14)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp15)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp16)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp17)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp18)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp19)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp20)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp21)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp22)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp23)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp24)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp25)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp26)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp27)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp28)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp29)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp30)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp31)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp32)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp33)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp34)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp35)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp36)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp37)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp38)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp39)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp40)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp41)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp42)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp43)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp44)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp45)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp46)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp47)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp48)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp49)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp50)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp51)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp52)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp53)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp54)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp55)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp56)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp57)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp58)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp59)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp60)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp61)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp62)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp63)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp64)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp65)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp66)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp67)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp68)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp69)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp70)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp71)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp72)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp73)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp74)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp75)-(SUM(t1.PWGTP)))**2
,(SUM(t1.pwgtp76)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp77)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp78)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp79)-(SUM(t1.PWGTP)))**2,(SUM(t1.pwgtp80)-(SUM(t1.PWGTP)))**2)))))/ (SUM(t1.PWGTP))
FORMAT=PERCENT6.1 AS Plus_Minus_Percent
FROM EC100005.combinedacs2013 t1
GROUP BY t1.SCHL;
QUIT;
/* End example program */
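For readers working in Stata, as in the other threads here, the same arithmetic can be sketched with a loop over the 80 replicate weights instead of 80 written-out terms. This is only an illustration under stated assumptions: the data below are a deterministic toy, not ACS records, and the scalar names tot, ssq, and moe are made up. The multiplier 1.64537 and the factor .05 are taken from the SAS code above.

```stata
* Toy data: 3 records, main weight pwgtp and 80 replicate weights (illustrative only).
clear
set obs 3
generate pwgtp = 10 * _n
forvalues r = 1/80 {
    generate pwgtp`r' = pwgtp + cond(mod(`r', 2), 1, -1)
}

* Full-sample weighted total.
quietly summarize pwgtp, meanonly
scalar tot = r(sum)

* Sum of squared differences between each replicate total and the full-sample total.
scalar ssq = 0
forvalues r = 1/80 {
    quietly summarize pwgtp`r', meanonly
    scalar ssq = ssq + (r(sum) - tot)^2
}

* 90% margin of error, as an amount and as a share of the estimate.
scalar moe = 1.64537 * sqrt(.05 * ssq)
display "Estimate: " tot "   MOE: " moe "   MOE/estimate: " %6.3f moe / tot
```

To get one margin of error per group, as the GROUP BY t1.SCHL query does, the loop would have to run within each group.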

Identify subsequent event windows (or occurrences) for each individual

This question is in the context of twoway line with the by() option, but I think the bigger problem is how to identify a second (and all subsequent) event windows without a priori knowing every event window.
Below I generate some data with five countries over the 1990s and 2000s. In all countries an event occurs in 1995 and in Canada only the event repeats in 2005. I would like to plot outcome over the five years centered on each event in each country. If I do this using twoway line and by(), then Canada plots twice in the same plot window.
clear
set obs 100
generate year = 1990 + mod(_n, 20)
generate country = "United Kingdom" in 1/20
replace country = "United States" in 21/40
replace country = "Canada" in 41/60
replace country = "Australia" in 61/80
replace country = "New Zealand" in 81/100
generate event = (year == 1995) ///
| ((year == 2005) & (country == "Canada"))
generate time_to_event = 0 if (event == 1)
generate outcome = runiform()
encode country, generate(countryn)
xtset countryn year
forvalue i = 1/2 {
replace time_to_event = `i' if (l`i'.event == 1)
replace time_to_event = -`i' if (f`i'.event == 1)
}
twoway line outcome time_to_event, ///
by(country) name(orig, replace)
A manual solution adds an occurrence variable that numbers each event occurrence by country, then adds occurrence to the by() option.
generate occurrence = 1 if !missing(time_to_event)
replace occurrence = 2 if ///
(inrange(year, 2005 - 2, 2005 + 2) & (country == "Canada"))
twoway line outcome time_to_event, ///
by(country occurrence) name(attempt, replace)
This works great in the play data, but in my real data there are many more countries and many more events. I can manually code this occurrence variable, but that is tedious (and now I'm really curious if there's a tool or logic that works :) ).
Is there a logic to automate identifying windows? Or one that at least works with twoway line? Thanks!
You have generated a variable time_to_event which is -2 .. 2 in a window and missing otherwise. You can use tsspell from SSC, installed by
ssc inst tsspell
to label such windows. Windows are defined by spells or runs of observations all non-missing on that time_to_event:
tsspell, cond(time_to_event < .)
tsspell requires a prior tsset and generates three variables, explained in its help. You can then renumber windows by using one of those variables, _seq (the sequence number within each spell, numbered from 1 up):
gen _spell2 = (_seq > 0) * sum(_seq == 1)
and then label spells distinctly by using country and the spell identifier for each spell from _spell, another variable produced by tsspell:
egen gspell = group(country _spell) if _spell2, label
My code assumes that windows are disjoint and cannot overlap, but that seems to be one of your assumptions too. Some technique for handling spells is given by http://www.stata-journal.com/sjpdf.html?articlenum=dm0029 That article does not mention tsspell, which in essence is an implementation of its principles. I started explaining the principles, but the article got long enough before I could explain the program. As the help of tsspell is quite detailed, I doubt that a sequel paper is needed, or at least that it will be written.
(LATER) This code also assumes that windows don't touch. Solving that problem suggests a more direct approach not involving tsspell at all:
bysort country (year) : gen w_id = (time_to_event < .) * sum(time_to_event == -2)
egen w_label = group(country w_id) if w_id, label
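Put together, the direct approach can be sketched end to end as follows. The two-country toy panel, the seed, and the graph name auto_windows are assumptions made for illustration; the window logic is the pair of lines above, and like them it assumes that windows within a country do not overlap.

```stata
* Toy panel: two countries, 1990-2009; Canada has a second event in 2005.
clear
set seed 2024
set obs 40
generate country = cond(_n <= 20, "Canada", "Australia")
bysort country: generate year = 1989 + _n
generate byte event = (year == 1995) | (year == 2005 & country == "Canada")
generate outcome = runiform()
encode country, generate(countryn)
xtset countryn year

* Event time within two years either side of each event.
generate time_to_event = 0 if event
forvalues i = 1/2 {
    replace time_to_event = `i' if L`i'.event == 1
    replace time_to_event = -`i' if F`i'.event == 1
}

* Number windows within country: the running count ticks up at each window start (-2).
bysort country (year): generate w_id = (time_to_event < .) * sum(time_to_event == -2)
egen w_label = group(country w_id) if w_id, label

* One panel per country-window combination.
tabulate w_label
twoway line outcome time_to_event if w_id, by(w_label) name(auto_windows, replace)
```

Because w_label carries value labels built from country and window number, by(w_label) gives Canada a separate panel for each of its events.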

Calculating the Gini Coefficient from LIS data (in Stata)

I need to calculate the Gini coefficient from disposable personal income data at LIS. According to a LIS training document, the Stata code to do this is:
di "** INCOME DISTRIBUTION II – Exercise 13 **"
program define bottop
qui sum ey [w=hweight*d4]
replace ey = .01*r(mean) if ey<.01*r(mean)
qui sum dpi [w=hweight*d4], de
replace ey = (10*r(p50)/(d4^.5)) if dpi>10*r(p50)
end
foreach file in $us00h $fi00h {
display "`file'"
use hweight d4 dpi if (!mi(dpi) & !(dpi==0)) using "`file'", clear
gen ey=dpi/(d4^0.5)
bottop
ineqdeco ey [w=hweight*d4]
}
I have simply copied and pasted this code from the training document. The snippets
qui sum ey [w=hweight*d4]
replace ey=0.01*r(mean) if ey<0.01*r(mean)
and
qui sum dpi [w=hweight*d4], de
replace ey=(10*r(p50)/(d4^0.5)) if dpi>10*r(p50)
are bottom and top coding, respectively.
When I tried to run this code, the variable hweight was not found. Does anyone know what the new name of hweight is at LIS? Or can anyone suggest how I might otherwise overcome this impasse?
I'm familiar with Stata, but the sophistication of this code is beyond my ken.
Much appreciated.
Based on the variable definition list at the LIS Documentation page, it looks like the variable is now called HWGT.
This is more of a second-best solution. The census of population provides income by brackets. If you are willing to work with that, you can get the counts for every bracket, with a top-coded bracket as the last one. Use the median income value within each bracket. Then you can directly apply the formula for the Gini coefficient. It is second best because it only approximates the result from individual-level data.
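A minimal sketch of that bracket-based approximation, assuming one row per bracket with a representative income and a household count. The numbers and the variable names midvalue and n_hh are hypothetical. The Gini is computed as one minus twice the area under the Lorenz curve, using the trapezoid rule across brackets.

```stata
* Hypothetical bracket data: representative income and household count per bracket.
clear
input midvalue n_hh
  5000 120
 15000 300
 30000 400
 60000 150
120000  30
end
sort midvalue

quietly summarize n_hh, meanonly
generate double cum_pop = sum(n_hh) / r(sum)    // cumulative population share
generate double inc = midvalue * n_hh           // total income in the bracket
quietly summarize inc, meanonly
generate double cum_inc = sum(inc) / r(sum)     // cumulative income share

* Trapezoid rule under the Lorenz curve; the lagged shares are 0 for the first bracket.
generate double area = (cum_pop - cond(_n == 1, 0, cum_pop[_n-1])) * ///
    (cum_inc + cond(_n == 1, 0, cum_inc[_n-1]))
quietly summarize area, meanonly
scalar gini = 1 - r(sum)
display "Approximate Gini: " %6.4f gini
```

Treating every household in a bracket as having the same income ignores inequality within brackets, so the result tends to understate the Gini from individual-level data.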
Why don't you try the fastgini command:
http://www.stata.com/statalist/archive/2007-02/msg00524.html
ssc install fastgini
fastgini income
return list
This should give you the Gini coefficient for the variable income.
This package also allows for weights. Type
help fastgini
for more information.