I am currently recoding a household survey dataset with 300+ variables.
1. It would be too inefficient, though, to label the variables one by one. Is there any way to relabel them all in a few lines of code?
This is the dictionary file entry for two of my variables, lc03_rel (relationship to household head) and lc04_sex (sex of the household head):
[Item]
Label=C03-Relationship to Household Head
Name=LC03_REL
Start=27
Len=2
ZeroFill=Yes
[ValueSet]
Label=C03-Relationship to Household Head
Name=LC03_REL_VS1
Value=1;Head
Value=2;Wife/Spouse
Value=3;Son/daughter
Value=4;Brothers/sisters
Value=5;Son/daughter_law
Value=6;Grandchildren
Value=7;Father/Mother
Value=8;Other Relative
Value=9;Boarder
Value=10;Domestic Helper
Value=11;Non_Relative
[Item]
Label=C04-Sex
Name=LC04_SEX
Start=29
Len=1
ZeroFill=Yes
[ValueSet]
Label=C04-Sex
Name=LC04_SEX_VS1
Value=1;Male
Value=2;Female
2. Also, can one assign value labels to categorical variables in a few lines of code?
- Most of my categorical variables are 0 = No / 1 = Yes.
- Note that the value sets in the dictionary file are the value labels.
Question 1. I don't see that there is a way in any language to avoid defining value labels separately for two variables Relationship to Household Head and Sex. What would be "inefficient" about that?
Question 2. You can assign a label to several variables at once, as is documented in the help and the associated manual entry.
label def yesno 1 Yes 0 No
label val foo bar CO5-CO72 yesno
If all your variables that use (for example) the value labels male/female end in "_sex" you could use a loop to assign labels (edited to show examples without loops as suggested in comment).
label def sex_lab 2 female 1 male
foreach var of varlist *_sex {
label values `var' sex_lab
}
or without a loop:
label def sex_lab 2 female 1 male
label values *_sex sex_lab
Alternatively create a list of the variables you want to assign a particular label to:
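local LIST_MY_VARS "lc04_sex foo bar"
* Count the words so the loop covers every variable in the list
local n_vars : word count `LIST_MY_VARS'
forvalues i=1/`n_vars' {
local my_var: word `i' of `LIST_MY_VARS'
label values `my_var' sex_lab
}
or without a loop:
label values `LIST_MY_VARS' sex_lab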
Related
I have a variable called "count," which contains the number of subjects who attend each of 1300 study visits. I would like to store these values in a local macro and display them one by one using a for loop.
E.g.,
local count_disp = count
forvalues i = 1/1300 {
disp `i' of `count_disp'
}
However, I'm unsure how to store the entire list of the count variable in a macro or how to call each "word" in the macro.
Is this possible in Stata?
In case you only want to display all values in order, then it is easier to skip the intermediate step of creating the macro. You can just display the values row by row like this:
* Example generated by -dataex-. For more info, type help dataex
clear
input byte count
11
22
33
end
* Loop over the number of observations and display
* the value of variable count for that row number
* (the count command stores the number of observations in r(N))
count
forvalues i = 1/`r(N)' {
* Display value
display count[`i']
}
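If the values really do need to live in a local macro, as the question asks, one possible sketch is below. It assumes the same toy data as above and that the full list fits within Stata's macro length limit; the macro name count_disp is taken from the question, everything else is illustrative.

```stata
clear
input byte count
11
22
33
end

* Build a space-separated list of the values of count, in observation order
local count_disp
forvalues i = 1/`=_N' {
    local count_disp `count_disp' `=count[`i']'
}

* Walk through the list one "word" at a time
local n : word count `count_disp'
forvalues i = 1/`n' {
    local w : word `i' of `count_disp'
    display "visit `i': `w'"
}
```

Here `=_N' expands to the number of observations, and the word extended macro function pulls out the i-th element of the list.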
Let's say I'm using
sysuse auto
where the variable foreign has two labeled values. I want to take a certain variable, foreign, and change all the value labels that are longer than 6 characters to their first 3 characters followed by (...).
In this dataset, that would mean changing the labels to For(...) and Dom(...). In my actual dataset, I have dozens of different values. Therefore, I'm looking for a solution that loops through all value labels rather than changing each of these two manually.
Since I'm doing this on an air-gapped server, I prefer approaches that work with default packages.
sysuse auto, clear
levelsof foreign, local(values)
foreach value of local values {
local labfull : label (foreign) `value'
if strlen("`labfull'") > 6 {
local labsub = substr("`labfull'", 1, 3)
local newlab `"`labsub'(...)"'
label define newlabel `value' "`newlab'", add
}
else label define newlabel `value' "`labfull'", add
}
label list
label values foreign newlabel
I have many datasets that have a panel identifier that is a labeled integer. The files were put together by several co-authors and RAs, so I am worried that value labels are not consistent and that a merge will be wrong. For example, Firm ABC might be 1 in one file and 11 in another. The data is xtset, so these cannot be stored as a string.
Is there a way to check that, for a given variable, the value labels are consistent across datasets? There are too many panels and datasets to check by hand.
Here's a toy example of what I am trying to avoid:
clear
tempfile f1
clear
input id
1
2
end
label define idlab 1 "One" 2 "Two"
lab val id idlab
save `f1'
clear
input id
1
2
end
label define idlab 1 "Three" 2 "Four"
lab val id idlab
merge 1:1 id using `f1', nogen
list, clean noobs
I think you can use the value label and label extended macro functions to check whether value labels are consistent across datasets. My strategy would be:
1. Load the dataset whose value labels I consider the most adequate.
2. Loop through all datasets and, for each one:
- get all the values of the variable id
- get the name of the value label with the value label extended macro function
- get the label of every value of id with the label extended macro function
- compare the "adequate" value label with the value label of this dataset
If you detect that the label is different for some value, you can then do something to change it.
This would be the strategy to verify with your toy code:
clear
tempfile f1
clear
input id
1
2
end
label define idlab 1 "One" 2 "Two"
lab val id idlab
levelsof id, local(values_id)
global labelvname1 : value label id
foreach x of local values_id{
global labelvname1_`x' : label ${labelvname1} `x'
}
save `f1'
clear
input id
1
2
end
label define idlab 1 "Three" 2 "Four"
lab val id idlab
global labelvname2 : value label id
foreach x of local values_id{
global labelvname2_`x' : label ${labelvname2} `x'
if "${labelvname1_`x'}"!="${labelvname2_`x'}"{
display "Label values are different for value `x'"
}
}
I hope this helps.
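A variant of the same check, sketched here under the assumption that the official uselabel command is available, turns each file's value label into a small dataset and compares the two with merge. The toy data match the question; the names lab1, label_f1, and label_f2 are illustrative.

```stata
* File 1: toy data with one labeling of id
clear
input id
1
2
end
label define idlab 1 "One" 2 "Two"
label values id idlab
* uselabel replaces the data in memory with the label contents
* (variables lname, value, label, trunc)
uselabel idlab, clear
rename label label_f1
drop trunc
tempfile lab1
save `lab1'

* File 2: same values, different label text
clear
input id
1
2
end
label define idlab 1 "Three" 2 "Four"
label values id idlab
uselabel idlab, clear
rename label label_f2
drop trunc

* Line the two label sets up by label name and value, then show mismatches
merge 1:1 lname value using `lab1'
list lname value label_f1 label_f2 if label_f1 != label_f2 | _merge != 3, clean noobs
```

Values that are labeled in only one of the files show up with _merge different from 3, so they are listed as well.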
I am able to extract the mean into a matrix as follows:
svy: mean age, over(villageid)
matrix villagemean = e(b)'
clear
svmat villagemean
However, I also want to merge this mean back to the villageid. My current thinking is to extract the rownames of the matrix villagemean like so:
local names : rownames villagemean
Then I try to turn this macro, names, into a variable:
foreach v in names {
gen `v' = "``v''"
}
However, the variable names is empty. What did I do wrong? Since a lot of this is copied from the Stata mailing list, I particularly don't understand the meaning of local names : rownames villagemean.
It's not completely clear to me what you want, but I think this might be it:
clear
set more off
*----- example data -----
webuse nhanes2f
svyset [pweight=finalwgt]
svy: mean zinc, over(sex)
matrix eb = e(b)
*----- what you want -----
levelsof sex, local(levsex)
local wc: word count `levsex'
gen avgsex = .
forvalues i = 1/`wc' {
replace avgsex = eb[1,`i'] if sex == `:word `i' of `levsex''
}
list sex zinc avgsex in 1/10
I make use of two extended macro functions:
local wc: word count `levsex'
and
`:word `i' of `levsex''
The first one returns the number of words in a string; the second returns the nth token of a string. The help entry for extended macro functions is help extended_fcn. Better yet, read the manuals, starting with: [U] 18.3 Macros. You will see there (18.3.8) that I use an abbreviated form.
Some notes on your original post
Your loop doesn't do what you intend (although, again, it is not crystal clear to me) because you are supplying a list with a single element: the literal text names, not the contents of the macro. You can see this by running and comparing the following loops:
local names 1 2 3
foreach v in names {
display "`v'"
}
foreach v in `names' {
display "`v'"
}
foreach v of local names {
display "`v'"
}
You need to read the corresponding help files to set that right.
As for the question in your original post, : rownames is another extended macro function but for matrices. See help matrix, #11.
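As a small illustration of what : rownames returns, here is a sketch with a hypothetical 2 x 2 matrix A (the matrix and its row names are made up for the example):

```stata
* A hypothetical matrix with named rows
matrix A = (1, 2 \ 3, 4)
matrix rownames A = first second

* Store the row names as a space-separated list in the local macro names
local names : rownames A
display "`names'"

* Loop over the contents of the macro, not over the literal word "names"
foreach v of local names {
    display "`v'"
}
```

The macro holds plain text, so it has to be dereferenced (or passed via of local) before a loop can see its elements.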
My impression is that, for the kind of things you are trying to achieve, you need to dig deeper into the manuals. Furthermore, if you have not read the initial chapters of the Stata User's Guide, you should do so.
I am working with a set of dta files representing surveys from different years.
Conveniently, each year uses different values for the country variable, so I am trying to set the country value labels for each year to match. I am having trouble comparing value labels though.
So far, I have come up with the following code:
replace country=1 if countryO=="Japan"
replace country=2 if countryO=="South Korea" | countryO=="Korea"
replace country=3 if countryO=="China"
replace country=4 if countryO=="Malaysia"
However, this doesn't work because "Japan" is the value label, not the actual value.
How do I tell Stata that I am comparing the value label?
Try
replace country=1 if countryO=="Japan":country0valuelabel
replace country=2 if inlist(countryO,"South Korea":country0valuelabel,"Korea":country0valuelabel)
You will have to replace country0valuelabel with the name of the value label attached to countryO in your data. You can find its name in the penultimate column of the output of describe countryO.
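The same "text":labelname syntax can be tried on the auto dataset that ships with Stata, where the value label attached to foreign is called origin. A minimal sketch (the variable is_foreign is made up for the example):

```stata
sysuse auto, clear

* "Foreign":origin evaluates to the number that the value label
* origin maps to the text "Foreign"
count if foreign == "Foreign":origin

* Use it like any other numeric expression
generate byte is_foreign = (foreign == "Foreign":origin)
tabulate foreign is_foreign
```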
To complement @Dimitriy's answer:
clear all
set more off
sysuse auto
keep foreign weight
describe foreign
label list origin
replace weight = . if foreign == 0
list in 1/15
list in 1/15, nolabel
describe displays the value label associated with a variable. label list can show the content of a particular value label.
I know I'm responding to this post years later, but I wanted to provide a solution that will work for multiple variables in case anybody comes across this.
My task was similar, except that I had to recode every variable that had a "Refused" response stored as a numerical value (8, 9, 99, etc.) to an extended missing value (., .r, .b, etc.). The numeric code for "Refused" differed across variables depending on the value label, e.g. some variables had "Refused" coded as 9, while others had it as 99 or 8.
Version Information
Stata 15.1
Code
foreach v of varlist * {
if `"`: val label `v''"' == "yndkr" {
recode `v' (9 = .r)
}
else if `"`: val label `v''"' == "bw3" {
recode `v' (9 = .r)
}
else if `"`: val label `v''"' == "def_some" {
recode `v' (9 = .r)
}
else if `"`: val label `v''"' == "difficulty5" {
recode `v' (9 = .r)
}
}
You can keep adding as many else if commands as needed. I only showed a chunk of my entire loop, but I hope this demonstrates what needs to be done. If you need to find the name of your value labels, use the command labelbook and it will print them all for you.
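When the list of value labels gets long, one possible alternative is to look up which numeric value carries the text "Refused" in each variable's value label, instead of hard-coding label names and codes. The sketch below uses made-up toy data (q1, q2) and reuses the label names yndkr and bw3 purely for illustration; it assumes the label text is exactly "Refused".

```stata
* Toy data: "Refused" is coded 9 in q1 and 99 in q2
clear
input byte q1 byte q2
1 0
9 99
0 1
end
label define yndkr 0 "No" 1 "Yes" 9 "Refused"
label define bw3 0 "Never" 1 "Sometimes" 99 "Refused"
label values q1 yndkr
label values q2 bw3

foreach v of varlist * {
    * Name of the value label attached to this variable (empty if none)
    local lbl : value label `v'
    if "`lbl'" == "" continue
    * Check the label text of every value that occurs in the data
    quietly levelsof `v', local(levels)
    foreach x of local levels {
        local txt : label `lbl' `x'
        if `"`txt'"' == "Refused" {
            quietly replace `v' = .r if `v' == `x'
        }
    }
}
list, nolabel
```

Variables without a value label, including string variables, are skipped by the continue line.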