How to get the original value labels from a tempvar in a Stata program?

I have a program that does some calculations and saves several result matrices. I would like to use a sub-program, which receives some arguments from the main program, to name the columns and rows of the result matrices. Ideally, the value labels of the original variable in my dataset should be used as the row names of the matrices. However, I cannot figure out how to get the value labels from the original variable once the variable has been passed on. In the main program, I use syntax varname, rowvar(varname). Here is example code:
*** Sub-program name matrix rows and cols ***
program namemat
version 6.0
args rowvar
tempname rowlab tmp_min tmp_max tmp_rowlab
mat def exmat = J(13,3,0)
qui sum `rowvar'
local tmp_min = r(min)
local tmp_max = r(max)
foreach i of numlist `tmp_min' / `tmp_max' {
local tmp_rowlab: label (`rowvar') `i'
local rowlab = `"`rowlab'"' + `""`tmp_rowlab'" "'
}
matrix colnames exmat = "col 1" "col 2" "col 3"
matrix rownames exmat = `rowlab'
mat list exmat
end
*** Use subprogram ***
sysuse nlsw88, clear
namemat occupation
How can I get the original value labels of the 13 occupations as row names? In the next coding step, I would save value labels that are too long in an additional scalar, which I would then store along with the matrices as rclass results.

This works for me:
*** Sub-program name matrix rows and cols ***
program namemat
version 6.0
args rowvar
mat def exmat = J(13,3,0)
sum `rowvar', meanonly
forval i = `r(min)'/`r(max)' {
local rowlab `"`rowlab' "`: label (`rowvar') `i''" "'
}
matrix colnames exmat = "col 1" "col 2" "col 3"
matrix rownames exmat = `rowlab'
mat list exmat
end
*** Use subprogram ***
sysuse nlsw88, clear
namemat occupation
EDIT: Your problem is tempname rowlab, which means that the local macro rowlab starts out in life holding a temporary name like __000000 rather than being empty, so that name ends up at the front of the list of row names. Moral: don't use tempname when you're defining a local macro.
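The point about tempname can be checked directly. This is a minimal sketch, with "Managers" standing in as an illustrative label text rather than anything read from the dataset:

```stata
* tempname puts a temporary name into the local macro of the same name
tempname rowlab
display "after tempname: [`rowlab']"

* building a list on top of it keeps that temporary name as the first element
local rowlab `"`rowlab' "Managers" "'
display `"`rowlab'"'

* a local macro that was never defined starts out empty
local cleanlab `"`cleanlab' "Managers" "'
display `"`cleanlab'"'
```

The second display should show only the quoted label, which is what matrix rownames needs.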

Related

Power analysis via simulations in Stata version 15.1

I have been trying to run this simulation code in Stata version 15.1, but am having issues running it as indicated below.
local num_clus 3 6 9 18 36
local clussize 5 10 15 20 25
*Model specifications
local intercept 17.87
local timecoeff1 -5.42
local timecoeff2 -5.72
local timecoeff3 -7.03
local timecoeff4 -6.13
local timecoeff5 -9.13
local intrvcoeff 5.00
local sigma_u3 25.77
local sigma_u2 120.62
local sigma_error 38.35
local nrep 1000
local alpha 0.05
*Generate multi-level data
capture program drop swcrt
program define swcrt, rclass
version 15.1
preserve
clear
args num_clus clussize intercept intrvcoeff timecoeff1 timecoeff2 timecoeff3 timecoeff4 timecoeff5 sigma_u3 sigma_error alpha
assert `num_clus' > 0 & `clussize' > 0 & `intercept' > 0 & `intrvcoeff' > 0 & `timecoeff1' < 0 & `timecoeff2' < 0 & `timecoeff3' < 0 & `timecoeff4' < 0 & `timecoeff5' < 0 & `sigma_u3' > 0 & `sigma_error' > 0 & `alpha' > 0
/*Generate simulated multi-level data*/
qui
clear
set obs `num_clus'
qui gen cluster = _n
qui gen group = 1+mod(_n-1,4)
/*Generate cluster-level errors*/
qui gen u_3 = rnormal(0,`sigma_u3')
expand `clussize'
bysort cluster: gen individual = _n
/*Set up time*/
expand 6
bysort cluster individual: gen time = _n-1
/*Set up intervention variable*/
gen intrv = (time>=group)
/*Generate residual errors*/
qui gen error = rnormal(0,`sigma_error')
/*Generate outcome y*/
qui gen y = `intercept' + `intrvcoeff'*intrv + `timecoeff1'*1.time + `timecoeff2'*2.time + `timecoeff3'*3.time + `timecoeff4'*4.time + `timecoeff5'*5.time + u_3 + error
/*Fit multi-level model to simulated dataset*/
mixed y intrv i.time ||cluster:, covariance(unstructured) reml dfmethod(kroger)
/*Return estimated effect size, bias, p-value, and significance dichotomy*/
tempname M
matrix `M' = r(table)
return scalar bias = _b[intrv] - `intrvcoeff'
return scalar p = `M'[1,4]
return scalar p_= (`M'[1,4] < `alpha')
exit
end swcrt
*Postfile to store results
tempname step
tempfile powerresults
capture postutil clear
postfile `step' num_clus clussize intrvcoeff p p_ bias using `powerresults', replace
ERROR: (note: file /var/folders/v4/j5kzzhc52q9fvh6w9pcx9fgm0000gn/T//S_00310.00000c not found)
*Loop over number of clusters
foreach c of local num_clus{
display as text "Number of clusters" as result "`c'"
foreach s of local clussize{
display as text "Cluster size" as result "`s'"
forvalue i = 1/`nrep'{
display as text "Iterations" as result `nrep'
quietly swcrt `num_clus' `clussize' `intercept' `intrvcoeff' `timecoeff1' `timecoeff2' `timecoeff3' `timecoeff4' `timecoeff5' `sigma_u3' `sigma_error' `alpha'
post `step' (`c') (`s') (`intrvcoeff') (`r(p)') (`r(p_)') (`r(bias)')
}
}
}
postclose `step'
ERROR:
Number of clusters3
Cluster size5
Iterations1000
r(9);
*Open results, calculate power
use `powerresults', clear
levelsof num_clus, local(num_clus)
levelsof clussize, local(clussize)
matrix drop _all
*Loop over combinations of clusters
*Add power results to matrix
foreach c of local num_clus{
foreach s of local clussize{
quietly ci proportions p_ if num_clus == `c' & clussize = `s'
local power `r(proportion)'
local power_lb `r(lb)'
local power_ub `r(ub)'
quietly ci mean bias if num_clus == `c' & clussize = `s'
local bias `r(mean)'
matrix M = nullmat(M) \ (`c', `s', `intrvcoeff', `power', `power_lb', `power_ub', `bias')
}
}
*Display the matrix
matrix colnames M = c s intrvcoeff power power_lb power_ub bias
ERROR:
matrix M not found
r(111);
matrix list M, noheader format(%3.2f)
ERROR:
matrix M not found
r(111);
There are a few things that seem to be amiss above.
I get a message after the postfile command saying that the file is not found. Nowhere in my code do I actually use that name so it seems to be generated by Stata.
After the loop and the post command I get error r(9).
Error message r(111) - says that the matrix is not found.
I have checked the following parts of the code to try and resolve the issue:
Specified local macros outside of the program and passed into it via the args statement of the program
Match between the variables in the call of the swcrt with the args statement in the program
Match between arguments in assert statement of the program with args command and whether the alligator clips are specified appropriately
Match between the number of variables in the post and postfile commands
I am not quite sure why I get these errors considering that the code did work previously and the program iterated (even when I take away the changes there is still the error). Does anyone know why this happens? If I had to guess, the matrix can't be found because of the error with the file not being found when I use postfile.
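One possible reading of the r(9) error, offered as a sketch rather than a confirmed diagnosis: r(9) is the code Stata returns when an assert is false, and args assigns arguments by position. The call inside the loop passes `num_clus' and `clussize', which hold the whole lists, instead of the loop values `c' and `s'. The later arguments then shift, so timecoeff1 would receive 36 and fail the `timecoeff1' < 0 check. The toy program showargs below is hypothetical and only illustrates the shifting:

```stata
capture program drop showargs
program define showargs
    * args fills these names strictly by position
    args num_clus clussize intercept
    display "num_clus=`num_clus' clussize=`clussize' intercept=`intercept'"
end

local num_clus 3 6 9 18 36
local clussize 5 10 15 20 25
local intercept 17.87

* passing the whole lists: the first three tokens of num_clus fill all three arguments
showargs `num_clus' `clussize' `intercept'

* passing one value from each list, as the loop variables c and s would
local c : word 1 of `num_clus'
local s : word 1 of `clussize'
showargs `c' `s' `intercept'
```

If this is the cause, calling swcrt with `c' and `s' in place of `num_clus' and `clussize' would be the change to try first.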

Add mean and sd column in correlation matrix in Stata

I'm trying to create a correlation matrix that also includes the means and sd's of each variable.
** Set variables used in Summary and Correlation
local variables relationship commission anxiety enjoyment negotiation_efficacy similarity_values similarity_behaviors SPT_confidence own_SPT_effort
** Descriptive statistics
estpost summarize `variables'
matrix table = ( e(mean) \ e(sd) )
matrix rownames table = mean sd
matrix list table
** Correlation matrix
correlate `variables'
matrix C = r(C)
local k = colsof(C)
matrix C = C[1..`=`k'-1',.]
local corr : rownames C
matrix table = ( table \ C )
matrix list table
estadd matrix table = table
local cells table[count](fmt(0) label(Count)) table[mean](fmt(2) label(Mean)) table[sd](fmt(2) label(Standard Deviation))
local drop
foreach row of local corr {
local drop `drop' `row'
local cells `cells' table[`row'](fmt(4) drop(`drop'))
}
display "`cells'"
esttab using Report.rtf, ///
replace ///
noobs ///
nonumbers ///
compress ///
cells("`cells'")
If it helps, this is what the correlation code looks like:
asdoc corr relationship commission anxiety enjoyment negotiation_efficacy similarity_values similarity_behaviors SPT_confidence own_SPT_effort ranger_SPT_effort cooperative_motivation competitive_motivation, nonum
This correlation matrix looks exactly how it should, but I'm essentially hoping to add means and sd's to the beginning.
*This is cross-posted here: https://www.statalist.org/forums/forum/general-stata-discussion/general/1549809-add-mean-and-sd-column-in-correlation-matrix-in-stata
It's not clear to me whether you want the table to include significance stars or not. If not, you can just use corr and a loop to obtain the sd and mean, then use frmttable. That seems shorter than your current example. Here's an example:
bcuse wage2
global variables "wage hours educ exper"
corr $variables
matrix corr_t = r(C)
local rows = rowsof(corr_t)
di "`rows'"
matrix add = J(`rows',2,.)
matrix list add
local n = 1
foreach x of global variables {
sum `x'
mat add[`n',1] = r(sd)
mat add[`n',2] = r(mean)
local n = `n' + 1
}
matrix final = corr_t,add
matrix list final
frmttable, statmat(final) sdec(2) ctitle("","wage", "hours", "educ", "exper","sd","mean") rtitle("wage"\ "hours"\ "educ" \ "exper")
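Since the question asks for the means and sd's at the beginning of the table, the same idea can be sketched with built-in commands only. This version assumes the auto dataset and four of its variables purely for illustration, and it stops at the matrix without formatting it for export:

```stata
sysuse auto, clear
local variables price mpg weight length

* correlation matrix
quietly correlate `variables'
matrix C = r(C)

* one row per variable: mean in column 1, sd in column 2
local k : word count `variables'
matrix S = J(`k', 2, .)
local n = 1
foreach x of local variables {
    quietly summarize `x'
    matrix S[`n', 1] = r(mean)
    matrix S[`n', 2] = r(sd)
    local ++n
}
matrix rownames S = `variables'
matrix colnames S = mean sd

* mean and sd first, correlations after
matrix final = S, C
matrix list final, format(%9.2f)
```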

How to display r-squared for multiple models using outreg2

I run two regressions for which I would like to show the r-squared:
logit y c.x1 c.x2
quietly est store e1
local r1 = e(r2_p)
logit y c.x1 c.x2 c.x3
quietly est store e2
local r2 = e(r2_p)
I tried to create a matrix to fill it but was not successful:
mat t1=J(1,2,0) //Defining empty matrix with 2 columns 1 row
local rsq `r*' //Trying to store r1 and r2 as numeric
local a=1
forval i=1/2{
mat t1[`i'+1,`a']= `r*' // filling each row, one at a time, this fails
loc ++a
}
mat li t1
Ultimately, I would like to export the results with the community-contributed Stata command outreg2:
outreg2 [e*] using "myfile", excel replace addstat(Adj. R^2:, `rsq')
The following works for me:
webuse lbw, clear
logit low age smoke
outreg2 using "myfile.txt", replace addstat(Adj. R^2, e(r2_p))
logit low age smoke ptl ht ui
outreg2 using "myfile.txt", addstat(Adj. R^2, e(r2_p)) append
type myfile.txt
                (1)         (2)
VARIABLES       low         low
age           -0.0498     -0.0541
              (0.0320)    (0.0339)
smoke          0.692**     0.557
              (0.322)     (0.339)
ptl                        0.679**
                          (0.344)
ht                         1.408**
                          (0.624)
ui                         0.817*
                          (0.451)
Constant       0.0609     -0.168
              (0.757)     (0.806)
Observations   189         189
Adj. R^2       0.0315      0.0882
Standard errors in parentheses
*** p<0.01, ** p<0.05, * p<0.1
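For the matrix the question tried to build, one way it could be filled is to write e(r2_p) into the matrix right after each model is fitted. This is only a sketch using the same lbw data as above, and the column names e1 and e2 are taken from the stored estimate names in the question:

```stata
webuse lbw, clear

* one row, one column per model
matrix t1 = J(1, 2, .)

quietly logit low age smoke
matrix t1[1, 1] = e(r2_p)

quietly logit low age smoke ptl ht ui
matrix t1[1, 2] = e(r2_p)

matrix colnames t1 = e1 e2
matrix list t1
```

Note that e(r2_p) is the pseudo R-squared that logit stores, whatever label is printed next to it.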

Create comparison-of-means table with multiple variables by multiple groups comparing to total mean

I'm looking for a way to create a comparison-of-means (t-test) table from the output of a tabstat command. Basically, I want to know if the mean of each group is statistically significantly different from the mean for the variable overall.
I have 75 variables across 15 groups for a total of 1125 t-tests, so doing them one at a time is out of the question.
I could always write a loop for the tests, but I was wondering if there was a command similar to tabstat that would make the table for me. Google has been unhelpful thus far, even though it seems like a fairly logical place to go from a tabstat output.
Thanks!
There might be packages that better serve you, but here's an example that I just put together. It's assuming you are using the one sample t-test because I can't see another way to do it with a t-test. This block of code returns a matrix with three things: the difference from the grand mean, the t value, and the p value.
Feel free to adapt the code as you see fit. Actually it'd just take a few more steps to make it into an ado file.
sysuse auto,clear
loca varlist mpg weight length price // put varlist here
loca grpvar foreign // put grouping variable here
loca n_var=wordcount("`varlist'")
qui tab `grpvar'
loca n_grp=`r(r)'
mat T=J(`n_var'*3,`n_grp',.) // (# of vars*3, # of groups,.)
**colnames
loca cnames=""
su `grpvar', meanonly
forval i=`r(min)'/`r(max)' { // assuming consecutive sequence
loca cnames="`cnames'"+" "+"`i'"
}
mat colnames T=`cnames' // values of grouping variable
**rownames
loca rnames=""
forval i=1/`n_var' {
loca var=word("`varlist'",`i')
loca rnames="`rnames'"+" "+"`var':diff `var':t `var':p"
}
mat rownames T=`rnames' // difference, t value, p value
loca i=1
foreach var in `varlist' {
loca j=1
su `grpvar', meanonly
forval f=`r(min)'/`r(max)' {
su `var', meanonly
loca ydbhat=`r(mean)' // y double hat
su `var' if `grpvar'==`f', meanonly
loca diff=`ydbhat'-`r(mean)' // difference
qui ttest `var'=`ydbhat' if `grpvar'==`f' // one-sample ttest
mat T[`i',`j']=`diff'
mat T[`i'+1,`j']=`r(t)'
mat T[`i'+2,`j']=`r(p)'
loca ++j
}
loca i=`i'+3
}
mat list T, f(%8.3f)
Now I am not sure if 15 columns would be too wide. If so, change the display format or even just use putexcel to export the matrix into a spreadsheet.
Edited: Fixed the forval i=0/1 in the loops to a more generally applicable form. Also other minor editing.
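The putexcel route mentioned above might look like the following. This is a sketch assuming the putexcel syntax of Stata 14 or later. The small matrix, its numbers, and the file name ttests.xlsx are made up here so that the snippet runs on its own:

```stata
* stand-in for the T matrix built above, with made-up values
matrix T = (1.5, -2.0 \ 2.1, -1.7 \ 0.04, 0.09)
matrix rownames T = diff t p
matrix colnames T = domestic foreign

* write the matrix with its row and column names to a spreadsheet
putexcel set "ttests.xlsx", replace
putexcel A1 = matrix(T), names
```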
Edited the code a bit - can't post markdown in the comments, so I made it a new answer. This version does a two-sample t-test and also displays the cluster mean for each variable.
local varlist var1 var2 var3 // put varlist here
local grpvar _clus_1 // put grouping variable here
local n_var=wordcount("`varlist'")
qui summ `grpvar', meanonly
local n_grp=`r(max)'
mat T=J(`n_var'*4,`n_grp',.) // (# of vars*4,# of groups,.)
**colnames
local cnames=""
qui summ `grpvar', meanonly
forval i=`r(min)'/`r(max)' { // assuming consecutive sequence
local cnames="`cnames'"+" "+"`i'"
}
//di "`cnames'"
mat colnames T=`cnames' // values of grouping variable
**rownames
local rnames=""
forval i=1/`n_var' {
local var=word("`varlist'",`i')
local rnames="`rnames'"+" "+"`var':mean `var':diff `var':t-stat `var':p-value"
}
mat rownames T=`rnames' // mean, difference, t value, p value
local i=1
foreach var in `varlist' {
local j=1
qui summ `grpvar'
forval f=`r(min)'/`r(max)' {
qui summ `var'
local varmean=`r(mean)'
local varn = `r(N)'
local varsd = `r(sd)'
qui summ `var' if `grpvar'==`f'
local clusmean = `r(mean)'
local clusn = `r(N)'
local clussd = `r(sd)'
local diff=`clusmean'-`varmean' // difference
**two-sample t-test
qui ttesti `varn' `varmean' `varsd' `clusn' `clusmean' `clussd'
mat T[`i',`j']=`clusmean'
mat T[`i'+1,`j']=`diff'
mat T[`i'+2,`j']=`r(t)'
mat T[`i'+3,`j']=`r(p)'
local ++j
}
local i=`i'+4
}
mat list T, f(%8.3f)

Is Conger's kappa available in Stata?

Is the modified version of kappa proposed by Conger (1980) available in Stata? Tried to google it to no avail.
This is an old question, but in case anyone is still looking--the SSC package kappaetc now calculates that, along with every other inter-rater statistic you could ever want.
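A minimal sketch of how kappaetc might be used, assuming one variable per rater and one observation per subject. The ratings below are invented for illustration, and the package has to be installed from SSC first:

```stata
* install the community-contributed package if it is not there yet
capture which kappaetc
if _rc ssc install kappaetc

* hypothetical data: 6 subjects classified into categories 1-3 by 4 raters
clear
input subject rater1 rater2 rater3 rater4
1 1 1 1 2
2 3 3 3 3
3 2 2 3 2
4 1 2 1 1
5 2 2 2 2
6 3 3 2 3
end

* reports several agreement coefficients for the listed raters
kappaetc rater1 rater2 rater3 rater4
```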
Since no one has responded with a Stata solution, I developed some code to calculate Conger's kappa using the formulas provided in Gwet, K. L. (2012). Handbook of Inter-Rater Reliability (3rd ed.), Gaithersburg, MD: Advanced Analytics, LLC. See especially pp. 34-35.
My code is undoubtedly not as efficient as others could write, and I would welcome any improvements to the code or to the program format that others wish to make.
cap prog drop congerkappa
prog def congerkappa
* This program has only been tested with Stata 11.2, 12.1, and 13.0.
preserve
* Number of judges
scalar judgesnum = _N
* Subject IDs
quietly ds
local vlist `r(varlist)'
local removeit = word("`vlist'",1)
local targets: list vlist - removeit
* Sums of ratings by each judge
egen judgesum = rowtotal(`targets')
* Sum of each target's ratings
foreach i in `targets' {
quietly summarize `i', meanonly
scalar mean`i' = r(mean)
}
* % each target rating of all target ratings
foreach i in `targets' {
gen `i'2 = `i'/judgesum
}
* Variance of each target's % ratings
foreach i in `targets' {
quietly summarize `i'2
scalar s2`i'2 = r(Var)
}
* Mean variance of each target's % ratings
foreach i in `targets' {
quietly summarize `i'2, meanonly
scalar mean`i'2 = r(mean)
}
* Square of mean of each target's % ratings
foreach i in `targets' {
scalar mean`i'2sq = mean`i'2^2
}
* Sum of variances of each target's % ratings
scalar sumvar = 0
foreach i in `targets' {
scalar sumvar = sumvar + s2`i'2
}
* Sum of means of each target's % ratings
scalar summeans = 0
foreach i in `targets' {
scalar summeans = summeans + mean`i'2
}
* Sum of meansquares of each target's % ratings
scalar summeansqs = 0
foreach i in `targets' {
scalar summeansqs = summeansqs + mean`i'2sq
}
* Conger's kappa
scalar conkappa = summeansqs -(sumvar/judgesnum)
di _n "Conger's kappa = " conkappa
restore
end
The data structure required by the program is shown below. The variable names are not fixed, but the judge/rater variable must be in the first position in the data set. The data set should not include any variables other than the judge/rater and targets/ratings.
Judge S1 S2 S3 S4 S5 S6
Rater1 2 4 2 1 1 4
Rater2 2 3 2 2 2 3
Rater3 2 5 3 3 3 5
Rater4 3 3 2 3 2 3
If you would like to run this against a test data set, you can use the judges data set from StataCorp and reshape it as shown.
use http://www.stata-press.com/data/r12/judges.dta, clear
sort judge
list, sepby(judge)
reshape wide rating, i(judge) j(target)
rename rating* S*
list, noobs
* Run congerkappa program on demo data set in memory
congerkappa
I have run only a single validation test of this code against the data in Table 2.16 in Gwet (p. 35) and have replicated the Conger's kappa = .23343 as calculated by Gwet on p. 34. Please test this code on other data with known Conger's kappas before relying on it.
I don't know if Conger's kappa for multiple raters is available in Stata, but it is available in R via the irr package, using the kappam.fleiss function and specifying the exact option. For information on the irr package in R, see http://cran.r-project.org/web/packages/irr/irr.pdf#page.12 .
After installing and loading the irr package in R, you can view a demo data set and Conger's kappa calculation using the following code.
library(irr)
data(diagnoses)
print(diagnoses)
kappam.fleiss(diagnoses, exact=TRUE)
I hope someone else here can help with a Stata solution, as you requested, but this may at least provide a solution if you can't find it in Stata.
In response to Dimitriy's comment below, I believe Stata's native kappa command applies either to two unique raters or to more than two non-unique raters.
The original poster may also want to consider the icc command in Stata, which allows for multiple unique raters.