I am trying to analyze how extreme flatlining can be in a set of Likert-scale variables (V21-V34, V84-V92, and V114-V119). For example, if a respondent answers "strongly agree" for 14 out of 27 variables, "somewhat agree" for 9, "neither/nor" for 2, "somewhat disagree" for 1, and "strongly disagree" for 2, I would like this variable to be "14" for this respondent.
I don't have any code because I don't know where to begin here. I haven't had much experience with Stata.
EDIT: It seems like egen will work best for what I need. Code included below--
egen flatlinerstrongsupp = anycount(V21 V22 V23 V24 V25 V26 V27 V28 V29 V30 V31 V32 V84 V85 V86 V87 V88 V89 V90 V91 V92 V114 V115 V116 V117 V118 V119), values(1)
You appear to want the frequency of the mode for each person. Supposing that your possible values run 1 to 5, then this should help:
forval j = 1/5 {
egen count`j' = anycount(V21-V34 V84-V92 V114-V119), values(`j')
}
egen countmode = rowmax(count?)
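As a quick check of the approach above, here is a minimal sketch on made-up data. The variables V1-V5 and the three toy respondents are assumptions for illustration only; substitute your own variable list and value range.

```stata
* Toy data (assumption): 3 respondents, 5 Likert items coded 1 to 5
clear
input V1 V2 V3 V4 V5
1 1 1 2 5
2 2 3 3 3
4 4 4 4 4
end

* One count variable per possible answer value
forval j = 1/5 {
    egen count`j' = anycount(V1-V5), values(`j')
}

* The largest count is the frequency of the most common answer
egen countmode = rowmax(count?)
list V1-V5 countmode
```

A respondent who gives the same answer to every item gets a countmode equal to the number of items, which is the most extreme flatlining possible.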
I am trying to create a school-level learning inequality index based on the difference in learning outcomes between the top 10% and the bottom 50% of students in each school. The data are student-level (below is my attempt), but evidently I am not creating the deciles at the school level. I imagine I need to use a foreach loop, but that is not yet something I am fluent in.
My attempt:
bysort IDSCHOOL: egen pv__std_deciles = std(avg_pv)
xtile pv_deciles=avg_pv, nq(10)
bysort IDSCHOOL: egen school_pv_5 = mean(avg_pv) if pv_deciles<6
bysort IDSCHOOL: egen school_pv_10 = mean(avg_pv) if pv_deciles==10
egen max_sp_10=max(school_pv_10), by(IDSCHOOL)
egen max_sp_5=max(school_pv_5), by(IDSCHOOL)
gen school_pv_diff= max_sp_10 - max_sp_5
sort IDSCHOOL
quietly by IDSCHOOL: gen dup = cond(_N == 1, 0, _n)
drop if dup > 1
isid IDSCHOOL
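One possible way to get deciles within each school is sketched below on made-up data. Because xtile does not accept a by: prefix, the sketch loops over schools. It assumes IDSCHOOL is numeric and that every school has enough students for ten groups to be meaningful; the toy data and the names tmp, top10, bottom50, and tag are illustrative only.

```stata
* Toy data (assumption): 2 schools with 20 students each
clear
set obs 40
generate IDSCHOOL = cond(_n <= 20, 1, 2)
generate avg_pv = _n

* Deciles computed separately within each school
generate pv_deciles = .
levelsof IDSCHOOL, local(schools)
foreach s of local schools {
    xtile tmp = avg_pv if IDSCHOOL == `s', nq(10)
    replace pv_deciles = tmp if IDSCHOOL == `s'
    drop tmp
}

* School-level means for the top 10% and the bottom 50%
egen top10 = mean(cond(pv_deciles == 10, avg_pv, .)), by(IDSCHOOL)
egen bottom50 = mean(cond(pv_deciles <= 5, avg_pv, .)), by(IDSCHOOL)
generate school_pv_diff = top10 - bottom50

* Keep one observation per school
egen tag = tag(IDSCHOOL)
keep if tag
isid IDSCHOOL
```

Putting the condition inside mean() via cond() gives every student in a school the same school-level value, so no separate max() step is needed to spread it across rows.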
In Stata I am trying to repeat code inside an if qualifier using perhaps a forvalues loop. My code looks something like this:
gen y=0
replace y=1 if x_1==1 & x_2==1 & x_3==1 & x_4==1
Instead of writing the & x_i==1 statement every time for each variable, I want to do it using a loop, something like this:
gen y=0
replace y=1 if forvalues i=1/4{x_`i'==1 &}
LATER EDIT:
Would it be possible to build up a local along these lines, with the elements joined together:
forvalues i=1/4{
local text_`i' "x_`i'==1 &"
display "`text_`i''"
}
And then call it at the if qualifier ?
Although you use the term "if statement" all your code is phrased in terms of if qualifiers, which aren't commands or statements. (Your use of the term "statement" is looser than customary, but that doesn't affect an answer directly.)
You can't insert loops in if qualifiers.
For the differences, see
help if
help ifcmd
The entire example
gen y = 0
replace y = 1 if x==1 | x==2 | x==3 | x==4
would be better as
gen y = inlist(x, 1, 2, 3, 4)
or (dependent possibly on whatever values are allowed)
gen y = inrange(x, 1, 4)
A loop solution could be
gen y = 0
quietly forval i = 1/4 {
replace y = 1 if x == `i'
}
We can't discuss whether inlist() or inrange() would or would not be a solution for your real problem if you don't show it to us.
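On the LATER EDIT: a loop cannot sit inside an if qualifier, but a loop can build the condition in a local macro beforehand, and the macro can then be used wherever an expression is allowed. The following is a minimal sketch under that reading of the question; the toy data and the macro name cond are assumptions for illustration.

```stata
* Toy data (assumption): four 0/1 variables x_1 to x_4
clear
input x_1 x_2 x_3 x_4
1 1 1 1
1 0 1 1
end

* Start from a true condition and append one term per variable
local cond 1
forvalues i = 1/4 {
    local cond `cond' & x_`i' == 1
}

* Show the text that was built, then use it as an expression
display "`cond'"
generate y = `cond'
list
```

Starting the macro at 1 avoids having to strip a leading or trailing & afterwards. The same macro could follow an if qualifier, as in replace y = 1 if `cond'.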
I usually don't like - in Nick's terms - to write code to write code. I see an immediate, though neither elegant nor 'heterodox', solution to your issue. The whole thing amounts to generating an indicator function for all your indicators, and using it with your if qualifier.
Implicit assumptions, which make this a bad, non-generalizable solution, are: 1) all variables are dummies, and you need them to be == 1, and 2) variable names are conveniently ordered 1 to N (although, if that is not the case, you can easily change the forv into a 'foreach var of varlist etc.')
g touse = 1
forv i = 1/30 {
replace touse = touse * x_`i'
}
<your action> if touse == 1
I am using two-level loops to create a set of variables. But Stata reports a syntax error.
forvalues i = 1/5 {
local to `i'+1
dis `to'
forvalues j = `to'/6{
dis `j'
gen e_`i'_`j' = .
}
}
I could not figure out where I made the syntax error.
And a follow-up question. I would like to change how the loop bounds are specified in the example above. Right now, they are hard-coded as 5 and 6, but I want to base them on the data. For instance, I am coding as below:
sum x
scalar x_max_1 = `r(max)'-1
scalar x_max_2 = `r(max)'
forvalues i = 1/x_max_1 {
local to = `i'+1
dis `to'
forvalues j = `to'/x_max_2{
dis `j'
gen e_`i'_`j' = .
}
}
However, Stata reports a syntax error in this case. I am not sure why. The scalar is a numeric variable. Why would the code above not work?
Your code would be better as
forvalues i = 1/5 {
local to = `i' + 1
forvalues j = `to'/6 {
gen e_`i'_`j' = .
}
}
With your code you went
local to `i' + 1
so the first time around the loop, the local macro to becomes the string or text 1 + 1, which is then illegal as an argument to forvalues. That is, a local definition without an = sign copies the text; it does not evaluate the expression.
The way you used display could not show you this error because display used that way will evaluate expressions to the extent possible. If you had insisted that the macro was a string with
di "`to'"
then you would have seen its contents.
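The difference between the two forms of local can be seen directly in a short sketch (run it as a do-file; the macro names i and to follow the question):

```stata
local i 1

* No = sign: the text after the macro name is copied as is
local to `i' + 1
display "`to'"

* With = sign: the expression is evaluated first
local to = `i' + 1
display "`to'"
```

The first display shows the unevaluated text, while the second shows the number that forvalues needs.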
Another way to do it is
forvalues i = 1/5 {
forvalues j = `= `i' + 1'/6 {
gen e_`i'_`j' = .
}
}
EDIT
You asked further about
sum x
scalar x_max_1 = `r(max)'-1
scalar x_max_2 = `r(max)'
forvalues i = 1/x_max_1 {
and quite a lot can be said about that. Let's work backwards from one of various better solutions:
sum x, meanonly
forvalues i = 1/`= r(max) - 1' {
or another, perhaps a little more transparent:
sum x, meanonly
local max = r(max) - 1
forvalues i = 1/`max' {
What are the messages here?
If you only want the maximum, specify meanonly. Agreed: the option name alone does not imply this. See https://www.stata-journal.com/sjpdf.html?articlenum=st0135 for more.
What is the point of pushing the r-class result r(max) into a scalar? You already have what you need in r(max). Educate yourself out of this with the following analogy.
I have what I want. Now I put it into a box. Now I take it out of the box. Now I have what I want again. Come to think of it, the box business can be cut.
The box is the scalar, two scalars in this case.
forvalues won't evaluate scalars to give you the number you want. That will happen in many languages, but not here.
More subtly, forvalues doesn't even evaluate local references or similar constructs. What happens is that Stata's generic syntax parser does that for you before what you typed is passed to forvalues.
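Putting these points together, here is a minimal sketch of data-driven loop bounds. It uses the auto dataset shipped with Stata and its variable rep78 as stand-ins for your data and x; those choices are assumptions for illustration.

```stata
* Stand-in data (assumption): rep78 plays the role of x
sysuse auto, clear

* meanonly is enough when only r(max) is needed
summarize rep78, meanonly
local max = r(max)

* The `= ...' constructs are evaluated before forvalues sees the line
forvalues i = 1/`= `max' - 1' {
    forvalues j = `= `i' + 1'/`max' {
        generate e_`i'_`j' = .
    }
}
describe e_*
```

If a scalar really is wanted, the same trick applies: a bound written as `= scalar(x_max_1)' is replaced by its value before forvalues runs.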
EDIT: Thanks to Joe's advice, I will make my question more specific. Actually I need to code a function in Stata which takes variables A,B,C,D,... as inputs and a variable Y as output which can be evaluated with usual Stata functions/commands like "generate dummy=2*myfun(X) if ..."
The function itself contains numerical calculations. A pseudo Stata code will look like
myfun(X)
gen Y=0.5*X if X==1
replace Y=31-X if X==2
replace Y=X-2 if X==3
.... a long list
return(Y)
Notice that X can be a huge set of different Stata variables and the numerical calculations are rather long inside the function. That's why I would like to use a function. I guess that the native "program" command in Stata is not suitable for this type of problem because it cannot take variables as input/output.
(ANSWER TO ORIGINAL QUESTION)
I have never used SAS, but at a wild guess you want something like
foreach v in A B C D {
gen test`v' = 0.5 * (`v' == 1) + 0.6 * (`v' == 2) + 0.7 * (`v' == 3)
}
or
foreach v in A B C D {
gen test`v' = cond(`v' == 1, 0.5, cond(`v' == 2, 0.6, cond(`v' == 3, 0.7, .)))
}
But hang on; that middle line also looks like
gen test`v' = (4 + `v') / 10
(ANSWER TO COMPLETELY DIFFERENT REVISED QUESTION)
This can be done in various ways. As above you could have a loop
foreach v in A B C D {
gen test`v' = 0.5 * `v' if `v' == 1
replace test`v' = 31 - `v' if `v' == 2
replace test`v' = `v' - 2 if `v' == 3
}
The question says "I guess that the native "program" command in Stata is not suitable for this type of problem because it cannot take variables as input/output." That guess is completely incorrect. You could write a program to do this too. This example is schematic, not definitive. A real program would include more checks and error messages to match any incorrect input. For detailed advice, you really need to read the documentation. One answer on SO can't teach you all you need to know even to write simple Stata programs. In any case, the example is evidently frivolous and/or incomplete, so a complete working example would be pointless or impossible.
program myweirdexample
    version 13
    syntax varlist(numeric), Generate(namelist)
    local nold : word count `varlist'
    local nnew : word count `generate'
    if `nold' != `nnew' {
        di as err "`generate' does not match `varlist'"
        exit 198
    }
    local i = 1
    quietly foreach v of local varlist {
        local new : word `i' of `generate'
        gen `new' = 0.5 * `v' if `v' == 1
        replace `new' = 31 - `v' if `v' == 2
        replace `new' = `v' - 2 if `v' == 3
        local ++i
    }
end
Footnote on terminology: The question uses the term function more broadly than it is used in Stata. In Stata, commands and functions are distinct; "function" is not a synonym for command.
Second footnote: Check out recode. It may be what you need, but it is best for mapping integer codes to other integer codes.
Third footnote: An example of a needed check is that the argument of generate() should be variable names that are legal and new.
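The check mentioned in the third footnote could be sketched as follows. The program name checknewnames is hypothetical and exists only to isolate the check; in practice the confirm line would go inside the real program, right after syntax. The auto dataset is used as stand-in data.

```stata
capture program drop checknewnames
program checknewnames
    version 13
    * namelist already insists on legal names
    syntax , Generate(namelist)
    * Errors out if any name in generate() already exists as a variable
    confirm new variable `generate'
end

sysuse auto, clear

* Passes silently: neither name is in use
checknewnames, generate(newprice newmpg)

* Fails: price already exists, so a nonzero return code is recorded
capture noisily checknewnames, generate(price)
display _rc
```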
Is the modified version of kappa proposed by Conger (1980) available in Stata? Tried to google it to no avail.
This is an old question, but in case anyone is still looking--the SSC package kappaetc now calculates that, along with every other inter-rater statistic you could ever want.
Since no one has responded with a Stata solution, I developed some code to calculate Conger's kappa using the formulas provided in Gwet, K. L. (2012). Handbook of Inter-Rater Reliability (3rd ed.), Gaithersburg, MD: Advanced Analytics, LLC. See especially pp. 34-35.
My code is undoubtedly not as efficient as others could write, and I would welcome any improvements to the code or to the program format that others wish to make.
cap prog drop congerkappa
prog def congerkappa
* This program has only been tested with Stata 11.2, 12.1, and 13.0.
preserve
* Number of judges
scalar judgesnum = _N
* Subject IDs
quietly ds
local vlist `r(varlist)'
local removeit = word("`vlist'",1)
local targets: list vlist - removeit
* Sums of ratings by each judge
egen judgesum = rowtotal(`targets')
* Mean of each target's ratings
foreach i in `targets' {
quietly summarize `i', meanonly
scalar mean`i' = r(mean)
}
* % each target rating of all target ratings
foreach i in `targets' {
gen `i'2 = `i'/judgesum
}
* Variance of each target's % ratings
foreach i in `targets' {
quietly summarize `i'2
scalar s2`i'2 = r(Var)
}
* Mean of each target's % ratings
foreach i in `targets' {
quietly summarize `i'2, meanonly
scalar mean`i'2 = r(mean)
}
* Square of mean of each target's % ratings
foreach i in `targets' {
scalar mean`i'2sq = mean`i'2^2
}
* Sum of variances of each target's % ratings
scalar sumvar = 0
foreach i in `targets' {
scalar sumvar = sumvar + s2`i'2
}
* Sum of means of each target's % ratings
scalar summeans = 0
foreach i in `targets' {
scalar summeans = summeans + mean`i'2
}
* Sum of meansquares of each target's % ratings
scalar summeansqs = 0
foreach i in `targets' {
scalar summeansqs = summeansqs + mean`i'2sq
}
* Conger's kappa
scalar conkappa = summeansqs -(sumvar/judgesnum)
di _n "Conger's kappa = " conkappa
restore
end
The data structure required by the program is shown below. The variable names are not fixed, but the judge/rater variable must be in the first position in the data set. The data set should not include any variables other than the judge/rater and targets/ratings.
Judge S1 S2 S3 S4 S5 S6
Rater1 2 4 2 1 1 4
Rater2 2 3 2 2 2 3
Rater3 2 5 3 3 3 5
Rater4 3 3 2 3 2 3
If you would like to run this against a test data set, you can use the judges data set from StataCorp and reshape it as shown.
use http://www.stata-press.com/data/r12/judges.dta, clear
sort judge
list, sepby(judge)
reshape wide rating, i(judge) j(target)
rename rating* S*
list, noobs
* Run congerkappa program on demo data set in memory
congerkappa
I have run only a single validation test of this code against the data in Table 2.16 in Gwet (p. 35) and have replicated the Conger's kappa = .23343 as calculated by Gwet on p. 34. Please test this code on other data with known Conger's kappas before relying on it.
I don't know if Conger's kappa for multiple raters is available in Stata, but it is available in R via the irr package, using the kappam.fleiss function and specifying the exact option. For information on the irr package in R, see http://cran.r-project.org/web/packages/irr/irr.pdf#page.12 .
After installing and loading the irr package in R, you can view a demo data set and Conger's kappa calculation using the following code.
data(diagnoses)
print(diagnoses)
kappam.fleiss(diagnoses, exact=TRUE)
I hope someone else here can help with a Stata solution, as you requested, but this may at least provide a solution if you can't find it in Stata.
In response to Dimitriy's comment below, I believe Stata's native kappa command applies either to two unique raters or to more than two non-unique raters.
The original poster may also want to consider the icc command in Stata, which allows for multiple unique raters.