Displaying results from ranksum - stata

I'm trying to modify the code posted by @Nick Cox in my previous question, but I have some issues.
I have set my varlist and my group variable. I have also changed the col option to fit my variable names. I would like to add the number of observations for each group, r(N 1) / r(N 2), and put some column titles above the results list.
I have been studying the display command, but I cannot figure out a solution.
My code is the following:
foreach v of var BVCAlogMAR AvgSSI L1DensityWholeImage {
quietly ranksum `v', by(G6PDcarente) porder
scalar pval = 2*normprob(-abs(r(z)))
di "`v'{col 34}" %05.3f pval " " %6.4e pval " " %05.3f r(porder) ///
" " %05.3f r(N 1) " " %05.3f r(N 2)
}
I cannot get the values of r(N 1) and r(N 2) into the results list. Moreover, I have no idea how to show a column header with the following titles:
P-Value - PValue2 - Porder - Observ. group 1 - Observ. group 2
Can you please help me?

You are incorrectly referring to r(N_1) and r(N_2) as r(N 1) and r(N 2) respectively.
Using the toy example from your previous post, I have corrected this mistake and inserted six additional lines (the counter i and the if block that prints the header), which achieve what you want:
sysuse auto, clear
local i = 0
foreach v of var mpg price weight length displacement {
    local ++i
    quietly ranksum `v', by(foreign) porder
    scalar pval = 2*normprob(-abs(r(z)))
    if `i' == 1 {
        display %20s "P-Value", %5s "PValue2", %5s "Porder1", %5s "group 1", %5s "group 2"
        display ""
    }
    display "`v'{col 14}" %05.3f pval " " %6.4e pval " " %05.3f r(porder) ///
        " " %05.3f r(N_1) " " %05.3f r(N_2)
}
The first display command inside the if block acts as a header, but you will have to play with the format widths to get the desired spacing. The second merely adds a blank line.
The counter macro i ensures that the header and blank line are displayed only in the first iteration of the foreach loop instead of repeating for each variable.
The results are illustrated below:
             P-Value PValue2 Porder1 group 1 group 2

mpg          0.002 1.9e-03 0.271 52.000 22.000
price        0.298 3.0e-01 0.423 52.000 22.000
weight       0.000 3.8e-07 0.875 52.000 22.000
length       0.000 9.4e-07 0.862 52.000 22.000
displacement 0.000 1.1e-08 0.921 52.000 22.000
For more information about output formatting using the display command, type help format in Stata's command prompt.
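As a possible alternative to tuning format widths by hand, the header and the rows can both be positioned with the _col() directive of display, so that each column starts at a fixed position. The following is only a sketch under assumptions: the column positions (16, 26, 36, 44) and the shortened variable list are arbitrary choices for illustration, normal() is used as the current name of normprob(), and the header is printed before the loop so that no counter is needed.

```stata
sysuse auto, clear

* header printed once, before the loop (column positions are arbitrary)
display as text "Variable" _col(16) "P-value" _col(26) "Porder" ///
    _col(36) "N1" _col(44) "N2"

foreach v of varlist mpg price weight {
    quietly ranksum `v', by(foreign) porder
    * two-sided p-value from the z statistic
    local p = 2*normal(-abs(r(z)))
    * each piece starts at the same column as its header
    display as text "`v'" _col(16) as result %7.4f `p' ///
        _col(26) %6.3f r(porder) _col(36) %3.0f r(N_1) _col(44) %3.0f r(N_2)
}
```

Because the group sizes are counts, an integer format such as %3.0f may read better than %05.3f, which is what produces 52.000 and 22.000 in the output above.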

Related

Multiple local in foreach command macro

I have a dataset with multiple subgroups (variable economist) and dates (variable temps99).
I want to run a tabsplit command that does not accept bysort or by prefixes. So I created a macro to apply my tabsplit command to each of my subgroups within my data.
For example:
levelsof economist, local(liste)
foreach gars of local liste {
display "`gars'"
tabsplit SubjectCategory if economist=="`gars'", p(;) sort
return list
replace nbcateco = r(r) if economist == "`gars'"
}
For each subgroup, Stata runs the tabsplit command and I use the variable nbcateco to store count results.
I did the same for the date so I can have the evolution of r(r) over time:
levelsof temps99, local(liste23)
foreach time of local liste23 {
display "`time'"
tabsplit SubjectCategory if temps99 == "`time'", p(;) sort
return list
replace nbcattime = r(r) if temps99 == "`time'"
}
Now I want to do it for each economist subgroup by date temps99. I tried multiple combinations, but I am not very good with macros (yet?).
What I want is to be able to have my r(r) for each of my subgroups over time.
Here's a solution that shows how to calculate the number of distinct publication categories within each by-group. This uses runby (from SSC). runby loops over each by-group, each time replacing the data in memory with the data from the current by-group. For each by-group, the commands contained in the user's program are executed. Whatever is left in memory when the user's program terminates is considered results and accumulates. Once all the groups have been processed, these results replace the data in memory.
I used the verbose option because I wanted to present the results for each by-group using nice formatting. The derivation of the list of distinct categories is done by splitting each list, converting to a long layout, and reducing to one observation per distinct value. The distinct_categories program generates one variable that contains the final count of distinct categories for the by-group.
* create a demonstration dataset
* ------------------------------------------------------------------------------
clear all
set seed 12345
* Example generated by -dataex-. To install: ssc install dataex
clear
input str19 economist
"Carmen M. Reinhart"
"Janet Currie"
"Asli Demirguc-Kunt"
"Esther Duflo"
"Marianne Bertrand"
"Claudia Goldin"
"Bronwyn Hughes Hall"
"Serena Ng"
"Anne Case"
"Valerie Ann Ramey"
end
expand 20
bysort economist: gen temps99 = 1998 + _n
gen pubs = runiformint(1,10)
expand pubs
sort economist temps99
gen pubid = _n
local nep NEP-AGR NEP-CBA NEP-COM NEP-DEV NEP-DGE NEP-ECM NEP-EEC NEP-ENE ///
NEP-ENV NEP-HIS NEP-INO NEP-INT NEP-LAB NEP-MAC NEP-MIC NEP-MON ///
NEP-PBE NEP-TRA NEP-URE
gen SubjectCategory = ""
forvalues i=1/19 {
replace SubjectCategory = SubjectCategory + " " + word("`nep'",`i') ///
if runiform() < .1
}
replace SubjectCategory = subinstr(trim(SubjectCategory)," ",";",.)
leftalign // from SSC
* ------------------------------------------------------------------------------
program distinct_categories
dis _n _n _dup(80) "-"
dis as txt "fille = " as res economist[1] as txt _col(68) " temps = " as res temps99[1]
// if there are no subjects for the group, exit now to avoid a no obs error
qui count if !mi(trim(SubjectCategory))
if r(N) == 0 exit
// split categories, reshape to a long layout, and reduce to unique values
preserve
keep pubid SubjectCategory
quietly {
split SubjectCategory, parse(;) gen(cat)
reshape long cat, i(pubid)
bysort cat: keep if _n == 1
drop if mi(cat)
}
// show results and generate the wanted variable
list cat
local distinct = _N
dis _n as txt "distinct = " as res `distinct'
restore
gen wanted = `distinct'
end
runby distinct_categories, by(economist temps99) verbose
This is an example of the XY problem, I think. See http://xyproblem.info/
tabsplit is a command in the package tab_chi from SSC. I have no negative feelings about it, as I wrote it, but it seems quite unnecessary here.
You want to count categories in a string variable: semi-colons are your separators. So count semi-colons and add 1.
local SC SubjectCategory
gen NCategory = 1 + length(`SC') - length(subinstr(`SC', ";", "", .))
Then (e.g.) table or tabstat will let you explore further by groups of interest.
To see the counting idea, consider 3 categories with 2 semi-colons.
. display length("frog;toad;newt")
14
. display length(subinstr("frog;toad;newt", ";", "", .))
12
If we replace each semi-colon with an empty string, the change in length is the number of semi-colons deleted. Note that we don't have to change the variable to do this. Then add 1. See also this paper.
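The counting idea can be tried on a tiny made-up dataset. This is a minimal sketch, not part of the original answer: the three example strings are invented, and the extra replace line reflects the assumption that an empty SubjectCategory should count as zero categories (the formula alone would return 1 for it, since there are no semi-colons to count).

```stata
clear
input str20 SubjectCategory
"frog;toad;newt"
"frog"
""
end

* number of categories = number of semi-colons + 1
gen NCategory = 1 + length(SubjectCategory) - ///
    length(subinstr(SubjectCategory, ";", "", .))

* assumption: an empty string means no categories at all
replace NCategory = 0 if missing(trim(SubjectCategory))

list
```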
That said, a way to extend your approach might be
egen class = group(economist temps99), label
su class, meanonly
local nclass = r(max)
gen result = .
forval i = 1/`nclass' {
di "`: label (class) `i''"
tabsplit SubjectCategory if class == `i', p(;) sort
return list
replace result = r(r) if class == `i'
}
Using statsby would be even better. See also this FAQ.

How to get the original value labels from a tempvar in a Stata program?

I have a program which does some calculations and saves some result matrices. I would like to use a sub-program, which is passed some arguments from the main program, to name the columns and rows of the result matrices. Ideally, the value labels of the original variable in my dataset should be used for the row names of the matrices. However, I cannot figure out how to get the value labels from the original variable when I am passing the variable on. In the main program, I use syntax varname, rowvar(varname). Here is some example code:
*** Sub-program name matrix rows and cols ***
program namemat
version 6.0
args rowvar
tempname rowlab tmp_min tmp_max tmp_rowlab
mat def exmat = J(13,3,0)
qui sum `rowvar'
local tmp_min = r(min)
local tmp_max = r(max)
foreach i of numlist `tmp_min' / `tmp_max' {
local tmp_rowlab: label (`rowvar') `i'
local rowlab = `"`rowlab'"' + `""`tmp_rowlab'" "'
}
matrix colnames exmat = "col 1" "col 2" "col 3"
matrix rownames exmat = `rowlab'
mat list exmat
end
*** Use subprogram ***
sysuse nlsw88, clear
namemat occupation
How can I get the original value labels from the 13 occupations as rownames? In the next coding step, I would save value labels which are too long in an additional scalar which I would then store along with the matrices as rclass results.
This works for me:
*** Sub-program name matrix rows and cols ***
program namemat
    version 6.0
    args rowvar
    mat def exmat = J(13,3,0)
    sum `rowvar', meanonly
    forval i = `r(min)'/`r(max)' {
        local rowlab `"`rowlab' "`: label (`rowvar') `i''" "'
    }
    matrix colnames exmat = "col 1" "col 2" "col 3"
    matrix rownames exmat = `rowlab'
    mat list exmat
end
*** Use subprogram ***
sysuse nlsw88, clear
namemat occupation
EDIT: Your problem is using tempname rowlab, which means that the local macro rowlab starts out in life as a temporary name like __000000. Moral: don't use tempname when you're defining a local macro.
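The effect of tempname can be seen directly. In this small sketch (the macro name plain is made up for illustration), tempname fills the local macro rowlab with a temporary name, whereas a local that has never been defined is simply empty, which is the starting point you want when accumulating a list.

```stata
* tempname stores a temporary name in the local macro rowlab
tempname rowlab
display "after tempname: `rowlab'"

* a local that was never defined expands to nothing
display "undefined local: [`plain']"
```

Anything appended to rowlab after tempname therefore ends up behind that temporary name, which is how it leaks into the row names.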

Plot confidence interval efficiently

I want to plot confidence intervals for some estimates after running a regression model.
As I'm working with a very big dataset, I need an efficient solution: in particular, a solution that does not require me to sort or save the dataset. In the following example, I plot estimates for b1 to b6:
reg y b1 b2 b3 b4 b5 b6
foreach i of numlist 1/6 {
local mean `mean' `=_b[b`i']' `i'
local ci `ci' ///
(scatteri ///
`=_b[b`i'] +1.96*_se[b`i']' `i' ///
`=_b[b`i'] - 1.96*_se[b`i']' `i' ///
,lpattern(shortdash) lcolor(navy))
}
twoway `ci' (scatteri `mean', mcolor(navy)), legend(off) yline(0)
While scatteri efficiently plots the estimates, I can't get boundaries for the confidence interval similar to rcap.
Is there a better way to do this?
Here's token code for what you seem to want. The example is ridiculous. It's my personal view that refining this would be pointless given the very accomplished previous work behind coefplot. The multiplier of 1.96 only applies in very large samples.
sysuse auto, clear
set scheme s1color
reg mpg weight length displ
gen coeff = .
gen upper = .
gen lower = .
gen which = .
local i = 0
quietly foreach v in weight length displ {
local ++i
replace coeff = _b[`v'] in `i'
replace upper = _b[`v'] + 1.96 * _se[`v'] in `i'
replace lower = _b[`v'] - 1.96 * _se[`v'] in `i'
replace which = `i' in `i'
label def which `i' "`v'", modify
}
label val which which
twoway scatter coeff which, mcolor(navy) xsc(r(0.5, `i'.5)) xla(1/`i', val) ///
|| rcap upper lower which, lcolor(navy) xtitle("") legend(off)
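On the remark that the multiplier of 1.96 only applies in very large samples: after regress, a t-based multiplier can be computed from the residual degrees of freedom stored in e(df_r). The following is a hedged sketch rather than part of the answer above; it assumes 95% intervals and reuses the same toy regression.

```stata
sysuse auto, clear
regress mpg weight length displacement

* critical value of the t distribution for a 95% interval
local tcrit = invttail(e(df_r), 0.025)
display "t multiplier = " %6.4f `tcrit'

* lower and upper limits for each coefficient
foreach v in weight length displacement {
    display "`v'" _col(16) %9.5f (_b[`v'] - `tcrit'*_se[`v']) ///
        _col(30) %9.5f (_b[`v'] + `tcrit'*_se[`v'])
}
```

Replacing 1.96 with `tcrit' in the upper and lower variables would give limits that match those reported by regress.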

Trying to get descriptive/summary statistics by 2 levels/categories

I have a bit of a rudimentary question. I am trying to get descriptive statistics into word/excel/txt format for my data which is made up of 5 repeated cross section surveys with probability weights [pw=weight_hh]
I want to get the descriptives into this format (attached)
I have the population divided into 2 groups and the years represent the survey rounds. At the moment I am using the means command to either obtain the descriptives for each round separately or each group separately- but not both at one go. Moreover I used both tabout and outreg2 with means to export data and both commands save a file with a table but the tables are blank?
I am relatively new to programming on Stata so I apologize if the question isn't very clear. Any direction on this will be greatly appreciated.
This is what I'm trying at the moment.
global outcomes marital male age
foreach var in $outcomes {
levelsof year, local (year)
levelsof group, local (group)
foreach g of local hhgrp {
qui foreach r of loc year {
qui mean `var' if round==`r' & hhgrp==`g' & keep_main==1 [pw=hhwt]
matrix N`var'=e(N)
matrix m`var'=e(b)
matrix sd`var'=e(V)
matmap sd`var'sd`var', m(sqrt(#))
matrix `r'`var'`g'= N`var', m`var', sd`var'
matrix rownames `var'=`var'
}
}
}
But it's only producing one matrix per variable, that is, 3 in total. I was hoping it would produce 10 per variable, that is, 30 in total.
That is
Marital->10 matrices-> 5 survey years x 2 groups
Age ->10 matrices-> 5 survey years x 2 groups
Male ->10 matrices-> 5 survey years x 2 groups
Cross posted here: http://www.statalist.org/forums/forum/general-stata-discussion/general/990246-basic-question-about-descriptive-statiistics
It's not clear what you want. A table similar to the one shown in the image can be produced with tabstat (weights are allowed but not used in the example):
clear
set more off
*----- example data -----
sysuse auto
*----- what you want -----
label define lblrep78 1 ", rep 1" 2 ", rep 2" 3 ", rep 3" ///
4 ", rep 4" 5 ", rep 5"
label values rep78 lblrep78
egen groupv = group(foreign rep78), label
tabstat price weight, stats(mean semean) by(groupv) nototal
Your code however implies that you want matrices. Which one is it?
Thanks for the replies, everyone. I tried tabstat, but it doesn't allow pweights. I eventually went with this code and got close to what I wanted:
outreg, clear
global outcomes age yrs_ed marital male
foreach var in $outcomes {
levelsof round, local (round)
levelsof hhgrp, local (hhgrp)
foreach g of local hhgrp {
qui foreach r of loc round {
qui mean `var' if round==`r' & hhgrp==`g' & keep_main==1 [pw=hhwt]
matrix N`var'=e(N)
matrix m`var'=e(b)
matrix sd`var'=e(V)
matrix `var'`g'`r'= m`var', sd`var', N`var'
}
}
}
outreg, clear
frmttable, clear
foreach var in $outcomes {
levelsof hhgrp, local (hhgrp)
foreach g of local hhgrp {
mat `var'`g'= `var'`g'90\ `var'`g'95\ `var'`g'00\ `var'`g'05\ `var'`g'10
}
}
foreach var in $outcomes {
levelsof hhgrp, local (hhgrp)
foreach g of local hhgrp {
qui frmttable using "$output\Output1.doc", sdec (2\2\0\2\2\0\2\2\0\2\2\0\2\2\0) substat(2) statmat(`var'`g')/*
*/ title("Individual Statistics") coljust(l;c) varlabels ctitles("", `var' \ "", `g') /*
*/ rtitles (Mean \ SD \ N \ Mean \ SD\ N\ Mean \ SD \ N \ Mean \ SD \ N \ Mean \ SD \ N ) merge
}
}
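Since tabstat does not accept pweights, one alternative to looping over groups may be mean with the over() option, which does accept them. This is only a sketch under assumptions: it uses the auto dataset instead of the survey data, and wt is a made-up weight created purely for illustration. Note that mean reports standard errors of the means, not standard deviations.

```stata
sysuse auto, clear

* wt is a hypothetical sampling weight, for illustration only
gen wt = 1 + mod(_n, 3)

* weighted means of two variables within each level of foreign
mean price weight [pweight = wt], over(foreign)

* all estimates come back in a single row vector
matrix b = e(b)
matrix list b
```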

How to return a value label by indexing label position

Suppose I have a variable named MyVar with value labels defined like this:
0 Something
1 Something else
2 Yet another thing
How do I obtain the second value label (i.e. "Something else")? Edit: Assume that I do not know a priori what the factor values are (i.e. I do not know the minimum value label, and the factor values may increment by numbers other than 1, and may increment unevenly).
I know I can obtain the label corresponding to the value of 2:
. local LABEL: label (MyVar) 2, strict
. di "`LABEL'"
Yet another thing
But I want to obtain the label corresponding to the position of 2 in the value label list:
. <Some amazing Stata-fu using (labeled) variable MyVar and the position 2>
. di "`LABEL'"
Something else
You want to nest a couple of extended macro functions like matryoshkas:
clear
set obs 3
gen x=_n-1
label define xlab 0 "Something" 1 "Something else" 2 "Yet another thing"
lab val x xlab
levelsof x, local(xnumbers)
di "`:label xlab `:word 2 of `xnumbers'''"
Work from the end of the last line to the front. The local xnumbers produced by levelsof contains the distinct levels of x from smallest to largest: 0 1 2. Next, you find the second word of that local, which is 1. Finally, you get the label corresponding to that numeric value, which is "Something else".
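The same nesting also handles values that increment unevenly, as the question requires. Here is a minimal sketch with invented values 0, 5 and 17, split into two steps for readability. One caveat worth stating as an assumption: levelsof only sees values that actually occur in the data, so a label defined for a value that never occurs would not be counted in the position.

```stata
clear
set obs 3
gen x = cond(_n == 1, 0, cond(_n == 2, 5, 17))
label define xlab 0 "Something" 5 "Something else" 17 "Yet another thing"
label values x xlab

* distinct values present in the data, in ascending order
quietly levelsof x, local(xnumbers)

* second value in that list, then the label attached to it
local second : word 2 of `xnumbers'
local lab : label xlab `second'
display "`lab'"
```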
You can get the labels into a vector in Mata.
. sysuse auto, clear
(1978 Automobile Data)
. mata
------------------------------------------------- mata (type end to exit) --
: st_vlload("origin", values = ., text = "")
: values
       1
    +-----+
  1 |  0  |
  2 |  1  |
    +-----+
: text
              1
    +------------+
  1 |  Domestic  |
  2 |   Foreign  |
    +------------+
: text[2,1]
  Foreign
: end
That could be the hard core of a program to do something with them. Dependent on what you want to do, the answer could be expanded. It's also up for grabs whether you start with a variable name or a value label name.
EDIT: Here is a quick hack at a program to return the jth value label. You supply a name, which by default is taken to be a variable name; with the labelname option it is taken to be a value label name. Not much tested.
*! 1.0.0 NJC 7 Oct 2014
program jthvaluelabel, rclass
    version 9
    syntax name , j(numlist int >0 min=1 max=1) [labelname]
    if "`labelname'" == "" {
        confirm var `namelist'
        local labelname : value label `namelist'
        if "`labelname'" == "" {
            di as err "no value label attached to `namelist'"
            exit 111
        }
    }
    else {
        local labelname `namelist'
        capture label list `labelname'
        if _rc {
            di as err "no such value label defined"
            exit 111
        }
    }
    mata: lookitup("`labelname'", `j')
    di as text `"`valuelabel'"'
    return local valuelabel `"`valuelabel'"'
end

mata:
void lookitup (string scalar lblname, real scalar j) {
    real colvector values
    string colvector text
    real scalar nlbl
    string scalar labels
    st_vlload(lblname, values = ., text = "")
    nlbl = length(text)
    if (nlbl == 1) labels = "label"
    else if (nlbl > 1) labels = "labels"
    if (nlbl < j) {
        errprintf("no such label; %1.0f %s, but #%1.0f requested\n",
            nlbl, labels, j)
        exit(498)
    }
    else {
        st_local("valuelabel", text[j])
    }
}
end
Some examples:
. sysuse auto, clear
(1978 Automobile Data)
. jthvaluelabel foreign, j(1)
Domestic
. jthvaluelabel foreign, j(2)
Foreign
. jthvaluelabel foreign, j(3)
no such label; 2 labels, but #3 requested
r(498);
. jthvaluelabel make, j(1)
no value label attached to make
r(111);
. jthvaluelabel origin, j(1) labelname
Domestic
Posting code here is occasionally a little difficult. The code delimiters aren't always respected. The real program on my machine is indented more systematically than is evident from the version above.
I cobbled together a nice solution from Nick's and Dimitriy's answers and comments. The application is a function that outputs one line of a table, in a section where the user has asked for the value label of groupvar at position index:
local labelname : value label `groupvar'
mata: st_vlload("`labelname'", values = ., text = "")
mata: st_local("vallab", text[`index'])
local vallab = substr("`vallab'",1,8)
Then the program carries on using the local vallab.