Capitalizing value labels in Stata

Some datasets come with full-lowercase value labels, and I end up with graphs and tables showing results for "egypt", "jordan" and "saudi arabia" instead of the capitalized country names.
I guess that the proper() string function can do something for me, but I am not finding the right way to write the code for Stata 11 that will capitalize all value labels for a given variable.
I basically need to run the proper() function on all value labels on the variable, and then assign them to the variable. Is that possible using a foreach loop and macros in Stata?

Yes. First let's create some sample data with labels for testing:
clear
drawnorm x, n(10)
gen byte v = int(4+x)
drop x
label define types 0 "zero" 1 "one" 2 "two" 3 "three" 4 "four" 5 "five" 6 "six"
label list types
label values v types
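Here's code that capitalizes the value labels attached to the variable "v". Note that levelsof returns only the values that actually occur in the data, so labels for values that never occur are left unchanged: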
local varname v
* name of the value label attached to the variable
local sLabelName: value label `varname'
di "`sLabelName'"
* distinct values of the variable that occur in the data
levelsof `varname', local(xValues)
foreach x of local xValues {
    local sLabel: label (`varname') `x', strict
    local sLabelNew = proper("`sLabel'")
    noi di "`x': `sLabel' ==> `sLabelNew'"
    label define `sLabelName' `x' "`sLabelNew'", modify
}
After running it, check the results:
label list types
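If labels for values that never occur in the data should be capitalized too, one possible variation is to loop over the range stored by label list instead of over the observed values. This is only a sketch: it assumes integer codes, relies on r(min) and r(max) being left behind by label list, and ignores labels attached to extended missing values. The small dataset is made up for illustration.

```stata
* Sketch: capitalize every defined label, including values absent from the data
clear
set obs 3
gen byte v = _n
label define types 0 "zero" 1 "one" 2 "two" 3 "three" 4 "four"
label values v types

local lbl : value label v
* label list leaves the smallest and largest labelled values in r(min) and r(max)
quietly label list `lbl'
forvalues x = `r(min)'/`r(max)' {
    * with strict, the result is empty when no label is defined for this value
    local old : label `lbl' `x', strict
    if `"`old'"' != "" {
        local new = proper(`"`old'"')
        label define `lbl' `x' `"`new'"', modify
    }
}
label list `lbl'
```

Here only 1, 2 and 3 occur in v, yet the labels for 0 and 4 are modified as well.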

Related

Loop through a set of variables based on condition in another variable

I have a list of variables a_23 a_24_1 a_24_2 a_24_3 a_24_4 a_24_5 a_24_6 a_24_7 a_24_8.
The values in variables a_24* are based on the response in a_23.
If a_23==1, then at least one variable in a_24* must be equal to 1.
I therefore want to check if any of the variables a_24* does not contain the value 1 if a_23==1
I tried the loop below,
foreach var of varlist a_24_1* {
br a_23 a_24* if a_23==1 & `var' != 1
}
but it returns all the variables that do not contain 1 in the set of variables. However, I only need cases where all variables do not contain the value 1 if the determining variable is equal to 1.
A data example as well as code would be a good idea, so that you then base your question on an MCVE: see https://stackoverflow.com/help/mcve for explanation.
As I understand it an intermediate variable would help here:
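egen maxa_24 = rowmax(a_24_*)
as the maximum will be 0 if and only if all values are 0 (assuming the a_24_* variables are coded 0 or 1). The cases to inspect are then those with a_23 == 1 & maxa_24 != 1.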
Note that your loop
foreach var of varlist a_24_1* {
br a_23 a_24* if a_23 == 1 & `var' != 1
}
is a loop over the single variable a_24_1; presumably you mean a_24_* in the foreach line.
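Another way to flag the inconsistent cases, which does not depend on how the "not 1" values are coded, is egen's anymatch() function. The following is a minimal sketch with made-up data; the variable name any1 and the three a_24_* variables are hypothetical stand-ins for the real ones.

```stata
* Sketch with made-up data: flag rows where a_23 == 1 but no a_24_* equals 1
clear
input byte(a_23 a_24_1 a_24_2 a_24_3)
1 1 0 0
1 0 0 0
0 0 0 0
1 0 1 1
end

* any1 is 1 if at least one of the a_24_* variables equals 1, and 0 otherwise
egen byte any1 = anymatch(a_24_*), values(1)

list a_23 a_24_* if a_23 == 1 & any1 == 0
count if a_23 == 1 & any1 == 0
```

In this toy dataset only the second observation is listed.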

How can I sort variables based on part of a string variable?

I have a dataset with string variables and I am trying to generate a new binary variable based on the first two characters. All strings are 5 characters long, but I'm only concerned with the first two in order to sort.
For example, I could have 22001 and 22005. Since both are of the form 22XXX, I want to assign value 1 for both in the variable type_A. And if I have 25001 and 25005, since both are not of the form 22XXX, I want to assign value 0 for both in the variable type_A.
This should do the job:
clear
set obs 4
generate str5 var1 = "22001" in 1
replace var1 = "22005" in 2
replace var1 = "25001" in 3
replace var1 = "25005" in 4
gen type_A = substr(var1, 1, 2) == "22"
Please note that, as you explain your problem, it looks like you are storing 22005 as text, which may not necessarily be the best idea.
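If the codes were held as numbers instead, the same indicator could be built from a numeric range. This is a sketch only: the name num1 is hypothetical, and it assumes the codes are purely numeric with no leading zeros that matter, since destring would drop them.

```stata
* Sketch: same classification with a numeric copy of the code
clear
input str5 var1
"22001"
"22005"
"25001"
"25005"
end

* num1 is a hypothetical numeric version of var1
destring var1, generate(num1)
generate byte type_A = inrange(num1, 22000, 22999)
list
```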

How to populate missing values for string variable in a column based on fixed criteria

To populate missing data with a fixed range of values
I would like to check how to populate column aktype with a range of values (the range of values for the same pidlink are always fixed at 11 types of values listed below) for those cells with missing values. I have about 17,000+ observations that are missing.
The range of values are as follows:
A
B
C
D
E
G
H
I
J
K
L
I have tried the following command but it does not work:-
foreach x of varlist aktype=1/11 {
replace aktype = "A" in 1 if aktype==""
replace aktype = "B" in 2 if aktype==""
replace aktype = "C" in 3 if aktype==""
replace aktype = "D" in 4 if aktype==""
replace aktype = "E" in 5 if aktype==""
replace aktype = "G" in 6 if aktype==""
replace aktype = "H" in 7 if aktype==""
replace aktype = "I" in 8 if aktype==""
replace aktype = "J" in 9 if aktype==""
replace aktype = "K" in 10 if aktype==""
replace aktype = "L" in 11 if aktype==""
}
Would appreciate it if you could advise on the right command to use. Many thanks!
I would generate a variable AK that has letters A-K in positions 1-11 (and 12-22, and 23-33, and so on). Then replace missing values with the value of this variable AK.
* generate data
clear
set obs 20
generate aktype = ""
replace aktype = "foo" in 1/1
replace aktype = "bar" in 10/12
* generate variable with letters A-K
generate AK = char(65 + mod(_n - 1, 11))
* fill missing values
replace aktype = AK if missing(aktype)
list
This yields the following.
. list
     +-------------+
     | aktype   AK |
     |-------------|
  1. |    foo    A |
  2. |      B    B |
  3. |      C    C |
  4. |      D    D |
  5. |      E    E |
     |-------------|
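Note that the list in the question runs from A to L without F, whereas char(65 + ...) produces A to K including F. If the cycle should follow the question's list exactly, a variation using word() might look like the sketch below; it makes the same assumption as the code above, namely that the letters simply repeat in blocks of 11 observations.

```stata
* Sketch: cycle through the question's exact list (no F) instead of A-K
clear
set obs 20
generate aktype = ""
replace aktype = "foo" in 1
replace aktype = "bar" in 10/12

* word() picks the n-th space-separated word; the position cycles 1..11
generate AK = word("A B C D E G H I J K L", mod(_n - 1, 11) + 1)
replace aktype = AK if missing(aktype)
list
```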
This first addresses the comment "it does not work".
Generally, in this kind of forum you should always be specific and say exactly what happens, namely where the code breaks down and what the result is (e.g. what error message you get). If necessary, add why that is not what is wanted.
Specifically, in this case Stata would get no further than
foreach x of varlist aktype=1/11
which is illegal (as well as unclear to Stata programmers).
You can loop over a varlist. In this case looping over a single variable aktype is legal. (It is usually pointless, but that's style, not syntax.) So this is legal:
foreach x of varlist aktype
By the way, you define x as the loop argument, but never refer to it inside the loop. That isn't illegal, but it is unusual.
You can also loop over a numlist, e.g.
foreach x of numlist 1/11
although
forval x = 1/11
is a more direct way of doing that. All this follows from the syntax diagrams for the commands concerned, where whatever is not explicitly allowed is forbidden.
On occasions when you need to loop over a varlist and a numlist you will need to use different syntax, but what is best depends on the precise problem.
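One common pattern for stepping through a list of variables and a list of numbers in parallel is to loop over positions and pick the matching word from each list. This is a sketch only; the auto dataset and the numbers 10 20 30 are arbitrary stand-ins.

```stata
* Sketch: walk through a variable list and a number list in parallel
sysuse auto, clear
local vars price mpg weight
local nums 10 20 30

local n : word count `vars'
forvalues i = 1/`n' {
    * pick the i-th element of each list
    local v : word `i' of `vars'
    local k : word `i' of `nums'
    display "`v' is paired with `k'"
}
```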
Now second to the question: I can't see any kind of rule in the question for which values get assigned A through L, so can't advise positively.

How to return a value label by indexing label position

Suppose I have a variable named MyVar with value labels defined like this:
0 Something
1 Something else
2 Yet another thing
How do I obtain the second value label (i.e. "Something else")? Edit: Assume that I do not know a priori what the factor values are (i.e. I do not know the minimum value label, and the factor values may increment by numbers other than 1, and may increment unevenly).
I know I can obtain the label corresponding to the value of 2:
. local LABEL: label (MyVar) 2, strict
. di "`LABEL'"
Yet another thing
But I want to obtain the label corresponding to the position of 2 in the value label list:
. <Some amazing Stata-fu using (labeled) variable MyVar and the position 2>
. di "`LABEL'"
Something else
You want to nest a couple of extended macro functions like matryoshkas:
clear
set obs 3
gen x=_n-1
label define xlab 0 "Something" 1 "Something else" 2 "Yet another thing"
lab val x xlab
levelsof x, local(xnumbers)
di "`:label xlab `:word 2 of `xnumbers'''"
Work from the end of the last line to the front. The local xnumbers produced by levelsof contains the distinct levels of x from smallest to largest: 0 1 2. Then you figure out what the second word of that local is, which is 1. Finally, you get the label corresponding to that numeric value, which is "Something else".
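The same steps can be unrolled into a loop that shows every label by position. This is a sketch using the auto dataset; like the one-liner above, it only sees values that actually occur in the data, so positions refer to observed levels rather than to every entry in the label definition.

```stata
* Sketch: list value labels by position among the observed levels
sysuse auto, clear
levelsof foreign, local(levels)
local n : word count `levels'
forvalues j = 1/`n' {
    * value at position j, then the label attached to that value
    local val : word `j' of `levels'
    local lab : label (foreign) `val'
    display "position `j': value `val' = `lab'"
}
```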
You can get the labels into a vector in Mata.
. sysuse auto, clear
(1978 Automobile Data)
. mata
------------------------------------------------- mata (type end to exit) --
: st_vlload("origin", values = ., text = "")
: values
       1
    +-----+
  1 |  0  |
  2 |  1  |
    +-----+
: text
              1
    +------------+
  1 |  Domestic  |
  2 |   Foreign  |
    +------------+
: text[2,1]
  Foreign
: end
That could be the hard core of a program to do something with them. Dependent on what you want to do, the answer could be expanded. It's also up for grabs whether you start with a variable name or a value label name.
EDIT: Here is a quick hack at a program to return the jth value label. You supply a name, which by default is taken to be a variable name; with the labelname option it is taken to be a value label name. Not much tested.
*! 1.0.0 NJC 7 Oct 2014
program jthvaluelabel, rclass
    version 9
    syntax name , j(numlist int >0 min=1 max=1) [labelname]
    if "`labelname'" == "" {
        confirm var `namelist'
        local labelname : value label `namelist'
        if "`labelname'" == "" {
            di as err "no value label attached to `namelist'"
            exit 111
        }
    }
    else {
        local labelname `namelist'
        capture label list `labelname'
        if _rc {
            di as err "no such value label defined"
            exit 111
        }
    }
    mata: lookitup("`labelname'", `j')
    di as text `"`valuelabel'"'
    return local valuelabel `"`valuelabel'"'
end
mata:
void lookitup (string scalar lblname, real scalar j) {
    real colvector values
    string colvector text
    real scalar nlbl
    string scalar labels
    st_vlload(lblname, values = ., text = "")
    nlbl = length(text)
    if (nlbl == 1) labels = "label"
    else if (nlbl > 1) labels = "labels"
    if (nlbl < j) {
        errprintf("no such label; %1.0f %s, but #%1.0f requested\n",
            nlbl, labels, j)
        exit(498)
    }
    else {
        st_local("valuelabel", text[j])
    }
}
end
Some examples:
. sysuse auto, clear
(1978 Automobile Data)
. jthvaluelabel foreign, j(1)
Domestic
. jthvaluelabel foreign, j(2)
Foreign
. jthvaluelabel foreign, j(3)
no such label; 2 labels, but #3 requested
r(498);
. jthvaluelabel make, j(1)
no value label attached to make
r(111);
. jthvaluelabel origin, j(1) labelname
Domestic
Posting code here is occasionally a little difficult. The code delimiters aren't always respected. The real program on my machine is indented more systematically than is evident from the version above.
I cobbled together a nice solution from Nick's and Dimitriy's answers and comments. The application is a function that outputs one line of a table; the user has specified that they want the value label of groupvar at the position index:
local labelname : value label `groupvar'
mata: st_vlload("`labelname'", values = ., text = "")
mata: st_local("vallab", text[`index'])
local vallab = substr("`vallab'",1,8)
Then the program carries on using the local vallab.

Stata factor value from label

I would like to look up a value/code associated with a label, and store that value in a scalar or local macro. While the information I want is stored in the definition of the label vector, apparently I need to go through some contortions to get it.
Extending Roberto Ferrer's answer to my last question, I came up with this approach:
// sample data
clear
input str5 mystr int mynum
a 5
b 5
b 6
c 4
end
encode mystr, gen(myfactor)
// get code for "b"
gen tmp = 0
replace tmp = myfactor if myfactor == "b":myfactor
sort tmp
scalar bcode = tmp[_N]
This seems woefully inefficient in terms of data manipulation and code maintenance, especially considering how the information I want is already saved (and viewable with label list).
This uses labellist, from SSC. Download using ssc install labellist.
clear
set more off
*----- example data -----
input str5 mystr
"good"
"bad"
"bad"
"regular"
end
encode mystr, gen(myfactor)
*----- what you want -----
labellist
local faclab = r(myfactor_labels)
local facval = r(myfactor_values)
// get # for "good"
local i : list posof "good" in faclab
local j : word `i' of `facval'
display "`j'"
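For a single known label, the "label":labelname construct that the question already uses inside its if condition can also be evaluated directly, with no add-on. A minimal sketch, assuming the value label has the same name as the encoded variable (which is what encode does when no label() option is given):

```stata
* Sketch: look up the code for "b" directly from the value label
clear
input str5 mystr
"a"
"b"
"b"
"c"
end
encode mystr, gen(myfactor)

* "b":myfactor evaluates to the numeric value mapped to "b" in label myfactor
scalar bcode = "b":myfactor
local bcode2 = "b":myfactor
display bcode
```

Since encode numbers the strings in alphabetical order, "b" is coded 2 in this example.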