Replace $ in Stata variable labels with \textdollar

I have some variables with dollar signs (i.e., $) in the variable labels. This causes problems downstream in my code: I later modify these labels, and the dollar signs are dereferenced as empty global macros. So I would like to replace these dollar signs with LaTeX's \textdollar using Stata's subinstr() function.
But I can't figure it out. Is this possible? Or should I resign myself to doing this more manually, or to looking for other characters near the $ in the variable labels?
clear
set obs 10
generate x = runiform()
label variable x "Label with $mil"
generate y = runiform()
label variable y "Another label with $mil"
describe
foreach v of varlist * {
local name : variable label `v'
local name `=subinstr("`name'", "$mil", "\textdollar", .)'
label variable `v' "`name'"
}
describe
This removes the label altogether.

You are missing an argument in subinstr(), what appears in help as n:
clear
set obs 10
generate x = runiform()
label variable x "Label with $"
local name: variable label x
local name = subinstr("`name'", "$", "\textdollar", .)
label variable x "`name'"
describe

(The problem has completely changed, which is why I give a separate answer.)
Having $something in a variable label is problematic because Stata treats it as a global macro and therefore dereferences it. What is Stata actually doing in your toy example? Let's see:
This is expected behavior:
. local name = subinstr("some text", " ", "xyz", .)
. display "`name'"
somexyztext
The following, which may or may not be documented, is less obvious but crucial to understanding the problem:
. local name = subinstr("some text", "", "xyz", .)
. display "`name'"
. (blank)
I added the last line to emphasize that the local name is empty.
In your code Stata dereferences $mil to nothing (because no global macro mil was defined beforehand; it's not meant to be one, of course). In fact,
label variable x "Label with $mil"
does not hold what you intend. Rather you want to delay macro substitution with \:
label variable x "Label with \$mil"
For the other part, when you run this
local name `=subinstr("`name'", "$mil", "\textdollar", .)'
it evaluates to
local name `=subinstr("`name'", "", "\textdollar", .)'
and the local name now holds nothing. That ends the story of why your code does what it does.
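The dereferencing described above can be seen directly with display. This is a minimal sketch that assumes no global macro named mil has been defined in the current session:

```stata
* Assumes no global macro named mil exists in this session
display "Label with $mil"     // $mil is expanded to nothing before display runs
display "Label with \$mil"    // the backslash delays substitution, so $mil survives
```

The first line prints the text with nothing after "with", while the second prints the literal $mil.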
A solution might be:
clear
set obs 10
generate x = runiform()
label variable x "Label with \$mil"
generate y = runiform()
label variable y "Another \$mil"
describe
*-----
foreach v of varlist _all {
local name : variable label `v'
label variable `v' "`=subinstr("`name'\$mil", "\$mil", "\textdollar", .)'"
}
describe
but this only works if $mil is at the end of the label text. If it is in the middle somewhere, another strategy must be used.
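One possible alternative strategy, not part of the approach above, is to do the replacement in Mata, where the label text is handled as an ordinary string rather than passing through a local macro. The following is only a sketch: it assumes the label really contains a literal $ (stored with \$ as shown earlier), it replaces the dollar sign alone rather than the whole $mil, and the trailing space after \textdollar is my own choice so that LaTeX does not run the command into the following text.

```stata
clear
set obs 10
generate x = runiform()
label variable x "Price in \$mil per unit"

mata:
// char(36) is the dollar sign; writing it this way keeps a literal $ out of the code
for (i = 1; i <= st_nvar(); i++) {
    st_varlabel(i, subinstr(st_varlabel(i), char(36), "\textdollar "))
}
end

describe
```

Because the replacement targets the character itself, it should not matter where in the label the dollar sign appears.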
All this on Stata 12.1.

Related

Shorten all value labels of a variable

Let's say I'm using
sysuse auto
where the variable foreign has two value labels. I want to take a certain variable, foreign, and change all the value labels that are longer than 6 chars, to 3 chars + (...).
In this dataset, that would mean changing the labels to For(...) and Dom(...). In my actual dataset, I have dozens of different values. Therefore, I'm looking for a solution that loops through all value labels, and doesn't specifically change each of these two manually.
Since I'm doing this on an air-gapped server, I prefer approaches that work with default packages.
sysuse auto, clear
levelsof foreign, local(values)
foreach value of local values {
local labfull : label (foreign) `value'
if strlen("`labfull'") > 6 {
local labsub = substr("`labfull'", 1, 3)
local newlab `"`labsub'(...)"'
label define newlabel `value' "`newlab'", add
}
else label define newlabel `value' "`labfull'", add
}
label list
label values foreign newlabel

How to separate Stata macro `varlist' with commas for using in mi( ) and inlist( )?

I want to store a list of variables in a macro and then call that macro inside a mi() statement. The original application is for a programme that uses data I cannot bring online for secrecy reasons, and which will include the following statement:
generate u = cond(mi(`vars'),., runiform(0,1))
The issue being that mi() requires comma separated variable names but vars is delimited by spaces.
I use the auto dataset and mark to illustrate my problem:
sysuse auto
local myvars foreign price
mark missing if mi(`myvars')
In this example, mi() asks for arguments separated by commas, so Stata stops and complains that it cannot find a foreignprice variable. Is there a utility function that will insert the commas between the macro elements?
A direct answer to the question as set is to use the macro extended function subinstr to change spaces to commas:
sysuse auto
local myvars foreign price
local myvars : subinstr local myvars " " ",", all
mark missing if mi(`myvars')
If the aim is to create a marker variable that marks observations with any values missing on specified variables, then there are alternative ways, most of which don't need any fiddling with separators in a list. This doesn't purport to be a complete set.
A1.
regress foreign price
gen missing = !e(sample)
A2.
egen missing = rowmiss(foreign price)
replace missing = missing > 0
A3.
local myvars foreign price
local myvars : subinstr local myvars " " ",", all
gen missing = missing(`myvars')
A4.
gen missing = 0
quietly foreach v in foreign price {
replace missing = 1 if missing(`v')
}
A5.
mark missing
markout missing foreign price
replace missing = !missing
EDIT In the edited question there is reference to this within a program:
generate u = cond(mi(`vars'),., runiform(0,1))
I wouldn't do that, even with the macro edited to include commas too, although any issue is more one of personal taste.
marksample touse
markout `touse' `vars'
generate u = runiform(0,1) if `touse'
It's likely that the indicator variable so produced is needed, or at least useful, somewhere else in the same program.

Extract the mean from svy mean result in Stata

I am able to extract the mean into a matrix as follows:
svy: mean age, over(villageid)
matrix villagemean = e(b)'
clear
svmat village
However, I also want to merge this mean back to the villageid. My current thinking is to extract the rownames of the matrix villagemean like so:
local names : rownames villagemean
Then I try to turn this macro names into a variable:
foreach v in names {
gen `v' = "``v''"
}
However, the variable names is empty. What did I do wrong? Since a lot of this is copied from Stata mailing list, I particularly don't understand the meaning of local names : rownames villagemean.
It's not completely clear to me what you want, but I think this might be it:
clear
set more off
*----- example data -----
webuse nhanes2f
svyset [pweight=finalwgt]
svy: mean zinc, over(sex)
matrix eb = e(b)
*----- what you want -----
levelsof sex, local(levsex)
local wc: word count `levsex'
gen avgsex = .
forvalues i = 1/`wc' {
replace avgsex = eb[1,`i'] if sex == `:word `i' of `levsex''
}
list sex zinc avgsex in 1/10
I make use of two extended macro functions:
local wc: word count `levsex'
and
`:word `i' of `levsex''
The first one returns the number of words in a string; the second returns the nth token of a string. The help entry for extended macro functions is help extended_fcn. Better yet, read the manuals, starting with: [U] 18.3 Macros. You will see there (18.3.8) that I use an abbreviated form.
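As a small standalone illustration of these two extended macro functions (the local fruits and its contents are made up for the example):

```stata
local fruits apple banana cherry

* Number of words in the macro
local n : word count `fruits'
display `n'

* Second word of the macro
local second : word 2 of `fruits'
display "`second'"
```

This displays 3 and then banana.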
Some notes on your original post
Your loop doesn't do what you intend (although again, that is not crystal clear to me) because you are supplying a list with one element: the literal text names. You can see this by running and comparing:
local names 1 2 3
foreach v in names {
display "`v'"
}
foreach v in `names' {
display "`v'"
}
foreach v of local names {
display "`v'"
}
You need to read the corresponding help files to set that right.
As for the question in your original post, : rownames is another extended macro function but for matrices. See help matrix, #11.
My impression is that for the kind of things you are trying to achieve, you need to dig deeper into the manuals. Furthermore, if you have not read the initial chapters of the Stata User's Guide, then you must do so.

How to access and manipulate strings in Stata variable labels?

I have a Stata dataset with variable labels describing the year of measurement. I need to access the year of measurement from the variables to later rename each variable using a suffix showing the year. For example V95 has a label GNP/CAPITA,75, and I want to rename it to gnp_capita_75. Some variables have labels like GNP/C:GROWTH RTS,60-75, and I want to add the midpoint of the interval after the comma to the variable name.
So far my code for accessing the variable labels looks like this:
local varlist V1 V2 V3 V28 V29 V30 V85 V86 V87 V88 V89 V90 V91 V92 V93 V94
foreach variable in `varlist' {
//get the label of the variable
local label : var label `variable'
//find the position of the comma
local commapos : strpos(`label', ",")
//find the stub before the comma
local namestub : substr(`label', 1, `commapos' - 1)
//find year after the comma
local year : substr(`label', `commapos' + 1, `commapos' + 2)
//replace any illegal character (%,/,:," ") with underscores:
local namestub : subinstr(`namestub', "%", "_")
local namestub : subinstr(`namestub', "/", "_")
local namestub : subinstr(`namestub', ":", "_")
local namestub : subinstr(`namestub', " ", "_")
//rename variable with the new stub and the year
rename `variable' `namestub'`year'
}
I get an error saying that strpos() is not allowed. Is it because I'm trying to make that a local macro? The examples I saw use it to generate a variable in the dataset, but I'm dealing with the variable labels.
Am I in the right direction here? How do I fix this error?
The syntax you need is
local commapos = strpos("`label'", ",")
and similarly for substr().
Note the extra bug that you need to enclose the label text in " ". That is needed elsewhere too.
The colon syntax is for extended macro functions, not functions in general.
(Learning how to automate this is a good ambition, but if the problem is this size, using varm interactively would have been quicker.)
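To make the distinction concrete, here is a sketch for a single label that uses = with ordinary functions and the colon only with the extended macro function subinstr. The label text is taken from the question; converting to lowercase with lower() and joining the pieces with an underscore are my assumptions about the desired name.

```stata
local label "GNP/CAPITA,75"

* Ordinary functions: use = and quote the macro contents
local commapos = strpos("`label'", ",")
local namestub = substr("`label'", 1, `commapos' - 1)
local year     = substr("`label'", `commapos' + 1, .)

* Extended macro function: use the colon syntax
local namestub : subinstr local namestub "/" "_", all

local namestub = lower("`namestub'")
display "`namestub'_`year'"
```

This displays gnp_capita_75. Inside the loop, the same lines would be followed by rename, with further subinstr calls for the other illegal characters.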

Use value label in if command

I am working with a set of dta files representing surveys from different years.
Conveniently, each year uses different values for the country variable, so I am trying to set the country value labels for each year to match. I am having trouble comparing value labels though.
So far, I have come up with the following code:
replace country=1 if countryO=="Japan"
replace country=2 if countryO=="South Korea" | countryO=="Korea"
replace country=3 if countryO=="China"
replace country=4 if countryO=="Malaysia"
However, this doesn't work because "Japan" is the value label, not the actual value.
How do I tell Stata that I am comparing the value label?
Try
replace country=1 if countryO=="Japan":countryOvaluelabel
replace country=2 if inlist(countryO,"South Korea":countryOvaluelabel,"Korea":countryOvaluelabel)
You will have to replace countryOvaluelabel with the name of the corresponding value label in your data. You can find its name in the penultimate column of the output of describe countryO.
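The same "text":labelname syntax can be tried on the auto dataset shipped with Stata, where the variable foreign carries the value label origin:

```stata
sysuse auto, clear

* "Foreign":origin evaluates to the numeric value behind that label
display "Foreign":origin

* so it can be used anywhere a number is expected
count if foreign == "Foreign":origin
```

The display line shows 1, and count reports the 22 foreign cars.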
To complement @Dimitriy's answer:
clear all
set more off
sysuse auto
keep foreign weight
describe foreign
label list origin
replace weight = . if foreign == 0
list in 1/15
list in 1/15, nolabel
describe displays the value label associated with a variable. label list can show the content of a particular value label.
I know I'm responding to this post years later, but I wanted to provide a solution that will work for multiple variables in case anybody comes across this.
My task was similar, except that I had to recode every variable in which a "Refused" response was stored as a numeric value (8, 9, 99, etc.) to a missing value type (., .r, .b, etc.). "Refused" was coded as a different value depending on the value label, e.g. some variables had "Refused" coded as 9, while others had it as 99 or 8.
Version Information
Stata 15.1
Code
foreach v of varlist * {
if `"`: val label `v''"' == "yndkr" {
recode `v' (9 = .r)
}
else if `"`: val label `v''"' == "bw3" {
recode `v' (9 = .r)
}
else if `"`: val label `v''"' == "def_some" {
recode `v' (9 = .r)
}
else if `"`: val label `v''"' == "difficulty5" {
recode `v' (9 = .r)
}
}
You can keep adding as many else if commands as needed. I only showed a chunk of my entire loop, but I hope this demonstrates what needs to be done. If you need to find the name of your value labels, use the command labelbook and it will print them all for you.
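When several value labels share the same recoding rule, as in the chunk shown, the branches could be collapsed with inlist(). This is a sketch on made-up toy data (the variables q1 and q2 and their values are hypothetical; the label names come from the code above), not a replacement for branches that need different rules:

```stata
clear
input q1 q2
1 1
9 9
2 9
end
label define yndkr 1 "Yes" 2 "No" 9 "Refused"
label values q1 yndkr

foreach v of varlist * {
    * Name of the value label attached to the variable (empty if none)
    local lbl : value label `v'
    if inlist("`lbl'", "yndkr", "bw3", "def_some", "difficulty5") {
        recode `v' (9 = .r)
    }
}
list, nolabel
```

Here only q1 is recoded, because q2 has no value label attached.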