How to access and manipulate strings in Stata variable labels?

I have a Stata dataset with variable labels describing the year of measurement. I need to extract the year of measurement from the labels so that I can later rename each variable with a suffix showing the year. For example, V95 has the label GNP/CAPITA,75, and I want to rename it to gnp_capita_75. Some variables have labels like GNP/C:GROWTH RTS,60-75, and for those I want to add the midpoint of the interval after the comma to the variable name.
So far my code for accessing the variable labels looks like this:
local varlist V1 V2 V3 V28 V29 V30 V85 V86 V87 V88 V89 V90 V91 V92 V93 V94
foreach variable in `varlist' {
//get the label of the variable
local label : var label `variable'
//find the position of the comma
local commapos : strpos(`label', ",")
//find the stub before the comma
local namestub : substr(`label', 1, `commapos' - 1)
//find year after the comma
local year : substr(`label', `commapos' + 1, `commapos' + 2)
//replace any illegal character (%,/,:," ") with underscores:
local namestub : subinstr(`namestub', "%", "_")
local namestub : subinstr(`namestub', "/", "_")
local namestub : subinstr(`namestub', ":", "_")
local namestub : subinstr(`namestub', " ", "_")
//rename variable with the new stub and the year
rename `variable' `namestub'`year'
}
I get an error saying that strpos() is not allowed. Is it because I'm trying to make that a local macro? The examples I saw use it to generate a variable in the dataset, but I'm dealing with the variable labels.
Am I in the right direction here? How do I fix this error?

The syntax you need is
local commapos = strpos("`label'", ",")
and similarly for substr().
Note the extra bug: you need to enclose the label text in double quotes, " ". That is needed elsewhere too.
The colon syntax is for extended macro functions, not functions in general.
(Learning how to automate this is a good ambition, but if the problem is this size, using varm interactively would have been quicker.)
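Putting those fixes together, a minimal sketch of the whole loop might look like this. It uses two toy variables whose labels have the shapes shown in the question, and it also fixes two further bugs in the original: the third argument of substr() is a length, not an end position, and subinstr() needs a fourth argument (the number of substitutions, with . meaning all). Rounding a half-year midpoint such as 67.5 down to 67 with floor() is an assumption, since variable names cannot contain a period.

```stata
clear
set obs 1
* Toy variables with labels shaped like those in the question
generate V95 = 1
label variable V95 "GNP/CAPITA,75"
generate V96 = 1
label variable V96 "GNP/C:GROWTH RTS,60-75"

foreach variable of varlist V95 V96 {
    local label : variable label `variable'
    * Functions need = (not :) and quoted string arguments
    local commapos = strpos("`label'", ",")
    local namestub = substr("`label'", 1, `commapos' - 1)
    * The third argument of substr() is a length; . means "to the end"
    local years = substr("`label'", `commapos' + 1, .)
    * For a range such as 60-75, use the midpoint rounded down (assumption)
    local dashpos = strpos("`years'", "-")
    if `dashpos' > 0 {
        local y1 = real(substr("`years'", 1, `dashpos' - 1))
        local y2 = real(substr("`years'", `dashpos' + 1, .))
        local year = floor((`y1' + `y2') / 2)
    }
    else local year `years'
    * Replace every illegal character with an underscore, then lowercase
    local namestub = subinstr("`namestub'", "%", "_", .)
    local namestub = subinstr("`namestub'", "/", "_", .)
    local namestub = subinstr("`namestub'", ":", "_", .)
    local namestub = lower(subinstr("`namestub'", " ", "_", .))
    rename `variable' `namestub'_`year'
}
describe
```

This should leave variables named gnp_capita_75 and gnp_c_growth_rts_67. The built-in strtoname() function is an alternative to the chain of subinstr() calls, as it also converts characters that are illegal in names to underscores.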

Related

Why do I get an invalid syntax error with a foreach loop?

I want to rename variable names starting with intensity. I received an invalid syntax, r(198) error, with the following code.
#delimit;
foreach VAR of varlist intensity* {;
local NEW = subinstr("`VAR'", "intensity", "int");
rename `VAR' `NEW';
};
Your use of the delimiter ; here does not bite, so I will ignore it.
The error is in the use of subinstr(), which must have four arguments, the fourth being the number of substitutions to be made. See help subinstr().
This works (please note the use of a minimal complete verifiable example):
clear
set obs 1
generate intensity1 = 1
generate intensity2 = 2
foreach VAR of varlist intensity* {
local NEW = subinstr("`VAR'", "intensity", "int", 1)
rename `VAR' `NEW'
}
ds
But the loop is utterly unnecessary. First, let's flip the names back and then show how to change names directly:
rename int* intensity*
rename intensity* int*
See help rename groups for more.
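To see what the fourth argument of subinstr() does, compare a count of 1 with . (missing), which means "replace every occurrence". The string below is made up purely for illustration.

```stata
* Replace only the first occurrence: displays int_intensity
display subinstr("intensity_intensity", "intensity", "int", 1)
* Replace all occurrences: displays int_int
display subinstr("intensity_intensity", "intensity", "int", .)
```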

How to separate Stata macro `varlist' with commas for using in mi( ) and inlist( )?

I want to store a list of variables in a macro and then call that macro inside a mi() statement. The original application is for a programme that uses data I cannot bring online for secrecy reasons, and which will include the following statement:
generate u = cond(mi(`vars'),., runiform(0,1))
The issue is that mi() requires comma-separated variable names, but vars is space-delimited.
I use the auto dataset and mark to illustrate my problem:
sysuse auto
local myvars foreign price
mark missing if mi(`myvars')
In this example, because mi() expects arguments separated by commas, Stata stops and complains that it cannot find a variable named foreignprice. Is there a utility function that will insert the commas between the macro elements?
A direct answer to the question as set is to use the macro extended function subinstr to change spaces to commas:
sysuse auto
local myvars foreign price
local myvars : subinstr local myvars " " ",", all
mark missing if mi(`myvars')
If the aim is to create a marker variable that flags observations with missing values on any of the specified variables, there are other ways, most of which need no fiddling with separators in a list. This doesn't purport to be a complete set.
A1.
regress foreign price
gen missing = !e(sample)
A2.
egen missing = rowmiss(foreign price)
replace missing = missing > 0
A3.
local myvars foreign price
local myvars : subinstr local myvars " " ",", all
gen missing = missing(`myvars')
A4.
gen missing = 0
quietly foreach v in foreign price {
replace missing = 1 if missing(`v')
}
A5.
mark missing
markout missing foreign price
replace missing = !missing
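As a quick check that the alternatives agree, the sketch below compares A2 and A3 on the auto data. It swaps rep78 in for foreign (a choice made only for illustration), because rep78 has some missing values in that dataset whereas foreign and price have none, so the indicators are not trivially all zero.

```stata
sysuse auto, clear
* rep78 has missing values in the auto data; foreign and price do not
local myvars rep78 price
local commavars : subinstr local myvars " " ",", all

* A3: missing() with a comma-separated list
generate miss_a3 = missing(`commavars')

* A2: count missing values per observation, then flag any
egen miss_a2 = rowmiss(`myvars')
replace miss_a2 = miss_a2 > 0

* The two indicators should agree observation by observation
assert miss_a2 == miss_a3
count if miss_a3
```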
EDIT: The edited question refers to this statement within a program:
generate u = cond(mi(`vars'),., runiform(0,1))
I wouldn't do that, even with the macro edited to include commas too, although any issue is more one of personal taste.
marksample touse
markout `touse' `vars'
generate u = runiform(0,1) if `touse'
It's likely that the indicator variable so produced is needed, or at least useful, somewhere else in the same program.
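For context, here is a minimal sketch of such a program. The program name draw_u and its vars() option are hypothetical, invented only for illustration; note that markout takes the marker variable as its first argument, and that runiform(0,1) requires Stata 14 or later.

```stata
capture program drop draw_u
program define draw_u
    * Hypothetical syntax: optional if/in plus a required vars() option
    syntax [if] [in], vars(varlist)
    * touse is 1 for observations selected by if/in, 0 otherwise
    marksample touse
    * Also set touse to 0 wherever any of the listed variables is missing
    markout `touse' `vars'
    generate u = runiform(0,1) if `touse'
end

sysuse auto, clear
draw_u, vars(rep78 price)
* u is missing exactly where rep78 or price is missing
count if missing(u)
```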

Generate variables with loop over pairs of variables

I have data on quantities and values for a set of countries. The variable names are currently Q_US V_US Q_UK V_UK Q_France V_France, in that order: Quantity_country Value_country, etc.
For each country (US, UK, France, etc.) I want to generate a new variable that gives me the unit value. Manually I would create them as
gen unit_US = V_US/Q_US
gen unit_UK = V_UK/Q_UK
gen unit_France = V_France/Q_France
But I have 100+ countries, and it would be great to do this in a loop if possible.
Is there an easy way to do this?
Let's get a list of all the countries as you have used them in variable names.
unab where : V_*
local where " `where'"
local where : subinstr local where " V_" " ", all
The additional space is designed to ensure that the text removed is just the prefix V_ at the start of variable names. For another example of using unab, see this FAQ.
Check it worked:
display "`where'"
Now loop:
foreach c of local where {
gen unit_`c' = V_`c'/Q_`c'
}
I'd also consider reshape long.
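A minimal sketch of the reshape long route, assuming the data also contain an identifier (here a made-up year variable) that uniquely identifies the rows in wide form. After the reshape there is one row per year and country, and the unit value needs a single generate with no loop.

```stata
clear
set obs 3
* Toy data: one row per year, one Q_/V_ pair per country
generate year = 2000 + _n
generate Q_US = _n
generate V_US = 10 * _n
generate Q_UK = 2 * _n
generate V_UK = 30 * _n
generate Q_France = 4 * _n
generate V_France = 20 * _n

* Stubs Q_ and V_; the country suffix becomes the string variable country
reshape long Q_ V_, i(year) j(country) string
generate unit = V_ / Q_
list, sepby(year)
```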

Replace $ in Stata variable labels with \textdollar

I have some variables with dollar signs (i.e., $) in the variable labels. This causes problems downstream in my code: I later modify these labels, and the dollar signs are interpreted as references to empty global macros. So I would like to replace these dollar signs with LaTeX's \textdollar using Stata's subinstr() function.
But I can't figure it out. Is this possible? Or should I resign myself to doing this more manually, or to looking for other characters near or around the $ in the variable labels?
clear
set obs 10
generate x = runiform()
label variable x "Label with $mil"
generate y = runiform()
label variable y "Another label with $mil"
describe
foreach v of varlist * {
local name : variable label `v'
local name `=subinstr("`name'", "$mil", "\textdollar", .)'
label variable `v' "`name'"
}
describe
This removes the label altogether.
You are missing an argument in subinstr(), the one that appears in the help as n:
clear
set obs 10
generate x = runiform()
label variable x "Label with $"
local name: variable label x
local name = subinstr("`name'", "$", "\textdollar", .)
label variable x "`name'"
describe
(The problem has completely changed, which is why I give a separate answer.)
Having $something in the variable label is somewhat problematic because Stata will treat it as a macro reference and will therefore dereference it. What is Stata actually doing in your toy example? Let's see.
This is expected behavior:
. local name = subinstr("some text", " ", "xyz", .)
. display "`name'"
somexyztext
The following behavior, which may or may not be documented, is not necessarily expected but is crucial to understanding the problem:
. local name = subinstr("some text", "", "xyz", .)
. display "`name'"
(blank)
I put in the last line to emphasize that the local name holds nothing: display shows an empty line.
In your code, Stata dereferences $mil to nothing, because no global macro mil has been defined beforehand (and, of course, none is meant to be). In fact,
label variable x "Label with $mil"
does not hold what you intend. Rather you want to delay macro substitution with \:
label variable x "Label with \$mil"
For the other part, when you run this
local name `=subinstr("`name'", "$mil", "\textdollar", .)'
it evaluates to
local name `=subinstr("`name'", "", "\textdollar", .)'
and the local name now holds nothing. That ends the story of why your code does what it does.
A solution might be:
clear
set obs 10
generate x = runiform()
label variable x "Label with \$mil"
generate y = runiform()
label variable y "Another \$mil"
describe
*-----
foreach v of varlist _all {
local name : variable label `v'
label variable `v' "`=subinstr("`name'\$mil", "\$mil", "\textdollar", .)'"
}
describe
but this only works if $mil is at the end of the label text. If it is in the middle somewhere, another strategy must be used.
All this on Stata 12.1.
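When the $ can appear anywhere in the label, one alternative is to do the replacement in Mata, where the label is handled as ordinary string data rather than passing through Stata's macro expansion. A sketch under these assumptions: the helper name replace_dollar_in_labels is hypothetical, it replaces every $ (not only $mil), and it writes \textdollar{} so that LaTeX does not read the letters that follow as part of the command name.

```stata
clear
set obs 1
generate x = runiform()
* The backslash stores a literal dollar sign instead of expanding a global
label variable x "Label with \$mil in the middle"

mata:
// Hypothetical helper: replace every dollar sign in every variable label
void replace_dollar_in_labels()
{
    real scalar j
    string scalar lab
    for (j = 1; j <= st_nvar(); j++) {
        lab = st_varlabel(j)
        // char(36) is the dollar sign; building it avoids a literal one in code
        st_varlabel(j, subinstr(lab, char(36), "\textdollar{}"))
    }
}
replace_dollar_in_labels()
end

describe
```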

Use value label in if command

I am working with a set of dta files representing surveys from different years.
Conveniently, each year uses different values for the country variable, so I am trying to set the country value labels for each year to match. I am having trouble comparing value labels though.
So far, I have come up with the following code:
replace country=1 if countryO=="Japan"
replace country=2 if countryO=="South Korea" | countryO=="Korea"
replace country=3 if countryO=="China"
replace country=4 if countryO=="Malaysia"
However, this doesn't work because "Japan" is the value label, not the actual value.
How do I tell Stata that I am comparing the value label?
Try
replace country=1 if countryO=="Japan":country0valuelabel
replace country=2 if inlist(countryO,"South Korea":country0valuelabel,"Korea":country0valuelabel)
You will have to replace country0valuelabel with the name of the corresponding value label in your data. You can find it in the penultimate column (value label) of the output of describe countryO.
To complement @Dimitriy's answer:
clear all
set more off
sysuse auto
keep foreign weight
describe foreign
label list origin
replace weight = . if foreign == 0
list in 1/15
list in 1/15, nolabel
describe displays the value label associated with a variable. label list can show the content of a particular value label.
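Putting both answers together on the auto data: the extended macro function value label retrieves the label name without reading describe output, and the "text":lblname syntax turns a label into its numeric code inside an expression.

```stata
sysuse auto, clear
* Retrieve the name of the value label attached to foreign
local lbl : value label foreign
display "`lbl'"
* "text":lblname evaluates to the numeric code carrying that label
count if foreign == "Foreign":`lbl'
replace weight = . if foreign == "Domestic":`lbl'
count if missing(weight)
```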
I know I'm responding to this post years later, but I wanted to provide a solution that will work for multiple variables in case anybody comes across this.
My task was similar, except that I had to recode every variable that had a "Refused" response coded as a numerical value (8, 9, 99, etc.) to a missing value (., .r, .b, etc.). Each variable coded "Refused" as a different value depending on its value label: for example, some variables coded "Refused" as 9, while others used 99 or 8.
Version Information
Stata 15.1
Code
foreach v of varlist * {
if `"`: val label `v''"' == "yndkr" {
recode `v' (9 = .r)
}
else if `"`: val label `v''"' == "bw3" {
recode `v' (9 = .r)
}
else if `"`: val label `v''"' == "def_some" {
recode `v' (9 = .r)
}
else if `"`: val label `v''"' == "difficulty5" {
recode `v' (9 = .r)
}
}
You can keep adding as many else if commands as needed. I only showed a chunk of my entire loop, but I hope this demonstrates what needs to be done. If you need to find the name of your value labels, use the command labelbook and it will print them all for you.
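Since each value label already records which code means "Refused", the loop could look that code up instead of listing every label name. A sketch, assuming the label text is exactly Refused and that .r is the wanted missing code; the toy variables q1 and q2 and the label name other are made up for illustration (yndkr is borrowed from the code above).

```stata
clear
set obs 4
* Toy data: "Refused" is coded 9 in one value label and 99 in another
generate q1 = _n + 6
generate q2 = _n * 33
label define yndkr 1 "Yes" 2 "No" 9 "Refused"
label define other 1 "Yes" 2 "No" 99 "Refused"
label values q1 yndkr
label values q2 other

foreach v of varlist * {
    local lbl : value label `v'
    * Skip variables without a value label
    if "`lbl'" == "" continue
    * "Refused":`lbl' gives the code labelled Refused in this value label
    * (expected to be missing if the label has no such text; worth checking)
    local code = "Refused":`lbl'
    if !missing(`code') {
        recode `v' (`code' = .r)
    }
}
list
```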