Use value label in if command - stata

I am working with a set of dta files representing surveys from different years.
Conveniently, each year uses different values for the country variable, so I am trying to set the country value labels for each year to match. I am having trouble comparing value labels though.
So far, I have come up with the following code:
replace country=1 if countryO=="Japan"
replace country=2 if countryO=="South Korea" | countryO=="Korea"
replace country=3 if countryO=="China"
replace country=4 if countryO=="Malaysia"
However, this doesn't work because "Japan" is the value label, not the actual value.
How do I tell Stata that I am comparing the value label?

Try
replace country=1 if countryO=="Japan":country0valuelabel
replace country=2 if inlist(countryO,"South Korea":country0valuelabel,"Korea":country0valuelabel)
You will have to replace country0valuelabel with the name of the value label attached to countryO in your data. You can find that name in the penultimate column (value label) of the output of describe countryO.
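As a minimal sketch of the same idea applied to the question's harmonisation task, the block below builds a common country variable from label text. The dataset, the label name cty_y1 and its codes are made up for illustration; in real data you would use the label name reported by describe.

```stata
clear
input byte countryO
3
1
2
4
5
end

* hypothetical coding for one survey year; real label names and codes will differ
label define cty_y1 1 "Korea" 2 "Japan" 3 "China" 4 "Malaysia" 5 "South Korea"
label values countryO cty_y1

* "text":labelname evaluates to the numeric code that carries that text
generate byte country = .
replace country = 1 if countryO == "Japan":cty_y1
replace country = 2 if inlist(countryO, "South Korea":cty_y1, "Korea":cty_y1)
replace country = 3 if countryO == "China":cty_y1
replace country = 4 if countryO == "Malaysia":cty_y1

* one harmonised label that every survey year can share
label define country 1 "Japan" 2 "South Korea" 3 "China" 4 "Malaysia"
label values country country
list
```

Only the label name changes from year to year, so the replace lines can be repeated for each file with the appropriate label name.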

To complement @Dimitriy's answer:
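clear all
set more off
sysuse auto
keep foreign weight
describe foreign // the value label column shows the label name: origin
label list origin // the mapping 0 = Domestic, 1 = Foreign
replace weight = . if foreign == 0 // conditions use the numeric code
list in 1/15 // foreign is displayed through its value label
list in 1/15, nolabel // the underlying numeric codes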
describe displays the name of the value label attached to a variable; label list shows the contents (the value-to-text mapping) of a particular value label.
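The label name can also be retrieved programmatically with the extended macro function : value label, which is what the loop in the next answer relies on. A minimal sketch using the auto data, where the value label attached to foreign is origin:

```stata
sysuse auto, clear
* store the name of the value label attached to foreign
local lbl : value label foreign
display "foreign uses the value label `lbl'"
label list `lbl'
* the stored name can then be used in a "text":label comparison
count if foreign == "Foreign":`lbl'
```

This avoids hard-coding the label name, which helps when different survey years attach differently named labels to the same variable.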

I know I'm responding to this post years later, but I wanted to provide a solution that will work for multiple variables in case anybody comes across this.
My task was similar, except that I had to recode every variable with a "Refused" response stored as a number (8, 9, 99, etc.) to a missing value (., .r, .b, etc.). The code for "Refused" differed between variables depending on their value label: some variables coded "Refused" as 9, while others used 99 or 8.
Version Information
Stata 15.1
Code
foreach v of varlist * {
if `"`: val label `v''"' == "yndkr" {
recode `v' (9 = .r)
}
else if `"`: val label `v''"' == "bw3" {
recode `v' (9 = .r)
}
else if `"`: val label `v''"' == "def_some" {
recode `v' (9 = .r)
}
else if `"`: val label `v''"' == "difficulty5" {
recode `v' (9 = .r)
}
}
You can keep adding as many else if branches as needed. I only showed a chunk of my entire loop, but I hope this demonstrates what needs to be done. If you need to find the names of your value labels, use the command labelbook, which prints them all for you.
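Because the loop above has to list every label name by hand, a variant that looks up the text "Refused" itself may help when that response sits on different codes in different labels. This is a sketch, assuming the label text is exactly "Refused"; the variables q1 and q2 and the labels yndkr and scale are made up for illustration.

```stata
clear
input q1 q2
1 8
2 99
9 1
end
* hypothetical labels: "Refused" is 9 in one label and 8 in the other
label define yndkr 1 "Yes" 2 "No" 9 "Refused"
label define scale 1 "Low" 8 "Refused" 99 "Not asked"
label values q1 yndkr
label values q2 scale

foreach v of varlist * {
    * skip variables without a value label (this includes string variables)
    local lbl : value label `v'
    if "`lbl'" == "" continue
    * check the label text of every value that occurs in the data
    quietly levelsof `v', local(vals)
    foreach x of local vals {
        if `"`: label (`v') `x''"' == "Refused" {
            quietly replace `v' = .r if `v' == `x'
        }
    }
}
list
```

Since levelsof only returns values present in the data, labelled codes that never occur are simply skipped, which is harmless here.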

Related

Syntax issues for making a new variable from integer data

I am currently writing my do-file for Australian data. The survey asked participants to type their postcode into a free-text box, and I would like to create a new variable that assigns them to states. Stata has stored the postcode as type int, but when I try to make the new variable I get a syntax error. I have included the variations on the value range that I have tried.
*Make postcode to states
generate famsur_state = "ACT/NSW" if famsur_postcode=="2000/2999"
replace famsur_state = "SA" if famsur_postcode==(5000/5999)
replace famsur_state = "QLD" if famsur_postcode==4000/4999
replace famsur_state = "NT" if famsur_postcode==">=0000 & <=0999"
replace famsur_state = "WA" if famsur_postcode==>=6000 & <=6999
replace famsur_state = "TAS" if famsur_postcode==>=7000 & <=7999
replace famsur_state = "VIC" if famsur_postcode==>=3000 & <=3999
label var famsur_state "Which state is the participant from?"
label define state 1 "ACT/NSW" ///
2 "SA" ///
3 "QLD" ///
4 "NT" ///
5 "WA" ///
6 "TAS" ///
7 "VIC"
label values famsur_state state
I'm not sure I understand what you want to do here. If I do, you're dealing with data on families within states, perhaps at one point in time, perhaps at many. It would help to know a bit more about the context the data come from.
Anyway, if that is right and all you want is a string variable based on the values of your numeric postcode variable, the issue is that you are treating the postcode as if it were a string. More precisely, you've misspecified the way to delineate the range of codes you're interested in. As written, the first line generates the string ACT/NSW if the postcode is "2000/2999". Stata interprets this as a string, so the condition could only be true if the contents of the cell were literally the text "2000/2999"; with a numeric postcode variable, Stata instead stops with a type mismatch error.
One way of tackling this problem is the inrange() function: inrange(x, a, b) is 1 when a <= x <= b (both endpoints included) and 0 otherwise. Here's my code so you can follow along. To see it at work step by step, uncomment the set trace on line (set tr on for short).
clear
cls
set obs 4
// example postcodes at the edges of two ranges
generate famsur_postcode = .
replace famsur_postcode = 2000 in 1
replace famsur_postcode = 2999 in 2
replace famsur_postcode = 3000 in 3
replace famsur_postcode = 3999 in 4
// parallel lists: the i-th state goes with the i-th range
local states ACT/NSW VIC
local ranges `""2000,2999" "3000,3999""'
local n : word count `states'
assert `n' == `: word count `ranges''
generate famsur_state = ""
*set trace on
forvalues i = 1/`n' {
    local a : word `i' of `states' // 1 = "ACT/NSW", 2 = "VIC"
    local b : word `i' of `ranges' // 1 = "2000,2999", 2 = "3000,3999"
    replace famsur_state = "`a'" if inrange(famsur_postcode, `b')
}
list
What I do above is make one macro for the state abbreviations and another for the code ranges. I count how many words there are in the states macro (2 here, but it can be any number you like). I then loop over the two macros as parallel lists (see the good documentation on this on the Stata website). As you can see, this handles the first part correctly.
You also seem to want to assign value labels to these abbreviations. What I would do is use the sencode command (install it from SSC with ssc install sencode, replace), recode the remaining values by hand, and then attach the state label defined in the question:
sencode famsur_state, replace
recode famsur_state (2=7)
label values famsur_state state
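If installing from SSC is not an option, the official encode command can do the same job: when the value label named in its label() option already exists, encode reuses those codes, so no manual recode is needed. A minimal sketch with made-up postcodes, using the ranges and the state label from the question:

```stata
clear
input int famsur_postcode
2000
3999
5100
7250
end

* string state from postcode ranges (ranges as given in the question)
generate famsur_state = ""
replace famsur_state = "ACT/NSW" if inrange(famsur_postcode, 2000, 2999)
replace famsur_state = "VIC" if inrange(famsur_postcode, 3000, 3999)
replace famsur_state = "SA" if inrange(famsur_postcode, 5000, 5999)
replace famsur_state = "TAS" if inrange(famsur_postcode, 7000, 7999)

* define the label first; encode then uses its existing codes
label define state 1 "ACT/NSW" 2 "SA" 3 "QLD" 4 "NT" 5 "WA" 6 "TAS" 7 "VIC"
encode famsur_state, generate(state_code) label(state)
list
```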
You have here three different guesses at the syntax, all wrong. P1 to P3 are identifiable problems with your syntax.
generate famsur_state = "ACT/NSW" if famsur_postcode=="2000/2999"
P1. This syntax is illegal, and so wrong, if the postcode variable is integer, as the double quotes imply that it is a string.
replace famsur_state = "SA" if famsur_postcode==(5000/5999)
replace famsur_state = "QLD" if famsur_postcode==4000/4999
P2. This syntax is legal if the postcode is integer, but wrong because the slash indicates division: 5000/5999 and 4000/4999 are each a fraction between 0 and 1, not an integer, so no postcode can ever equal them.
replace famsur_state = "NT" if famsur_postcode==">=0000 & <=0999"
P1. Again, the implication of string is illegal.
replace famsur_state = "WA" if famsur_postcode==>=6000 & <=6999
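P3. Stata does have >= for "greater than or equal to", but & does not work like that: each side of & must be a complete comparison, as in famsur_postcode >= 6000 & famsur_postcode <= 6999. And ==>= is not an operator at all, so it will not do what you want.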
replace famsur_state = "TAS" if famsur_postcode==>=7000 & <=7999
P3. Similar comment.
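To make P3 concrete, here is a small sketch with made-up postcodes around the WA range. It writes out both comparisons in full and checks that inrange() expresses the same closed interval:

```stata
clear
input int famsur_postcode
5999
6000
6500
6999
7000
end

generate famsur_state = ""
* each side of & has to be a complete comparison
replace famsur_state = "WA" if famsur_postcode >= 6000 & famsur_postcode <= 6999

* inrange() expresses the same closed interval [6000, 6999]
assert (famsur_postcode >= 6000 & famsur_postcode <= 6999) == inrange(famsur_postcode, 6000, 6999)
list
```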
label var famsur_state "Which state is the participant from?"
label define state 1 "ACT/NSW" ///
2 "SA" ///
3 "QLD" ///
4 "NT" ///
5 "WA" ///
6 "TAS" ///
7 "VIC"
This is legal and possibly even correct, meaning what you want:
generate famsur_state = "ACT/NSW" if inrange(famsur_postcode, 2000, 2999)
This would be legal too but Stata will ignore the leading zeros:
replace famsur_state = "NT" if inrange(famsur_postcode, 0000, 0999)
If what you are seeing is the result of value labels, this could be quite wrong. It is possible that you have integers presenting as e.g. 0000 if the display format is %04.0f, but there is not enough hard information in the question to rule out other possibilities.
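A minimal sketch of the display-format possibility, with made-up values: the %04.0f format only changes how the number is shown, so a postcode displayed as 0800 is stored as 800, and a range test written with leading zeros is the same test without them.

```stata
clear
input int famsur_postcode
800
2000
end
* display format pads with zeros; the stored value is unchanged
format famsur_postcode %04.0f
list
* so inrange(x, 0000, 0999) is the same test as inrange(x, 0, 999)
generate byte is_nt = inrange(famsur_postcode, 0, 999)
```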
The function inrange() will help you, but otherwise Stata's syntax here is what is documented at help operators.

Stata factor variables may not contain negative values in categorical variable

I want to estimate the proportions of a categorical variable, but the error "factor variables may not contain negative values" always comes up. When I check the value label, it contains the following:
label list w38_E1a
w38_E1a:
-99 Refused
-98 Don't know
1 Yes
2 No
How do I remove these -99 and -98 values?
Thank you.
Assuming that the variable is numeric, I would simply recode the values to be positive, because if the variable is categorical the sign of its codes shouldn't matter:
recode w38_E1a (-99 = 99) (-98 = 98)
I think you should drop those outliers; you can use drop if w38_E1a < 0
It seems that -99 and -98 are intended to code missing values, so they are not outliers. If this is the case, you should recode the values -99 and -98 to missing in every variable that uses the value label w38_E1a. To find the variables whose values are labelled with a specific value label, you can use -findname- from SSC.
cap which findname
if _rc ssc install findname // install -findname- if necessary
findname, vallabelname(w38_E1a)
foreach v of varlist `r(varlist)' {
recode `v' (-99 = .a ) (-98 = .b)
}
label def w38_E1a .a "Refused" .b "Don't know" -99 "" -98 "", modify
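If installing from SSC is not possible, the same steps can be sketched with official commands only: ds with has(vallabel ...) lists the variables that use a given value label, and mvdecode turns the codes into extended missing values. The data and the second variable q2 below are made up for illustration.

```stata
clear
input w38_E1a q2
1 -99
-98 2
2 1
end
label define w38_E1a -99 "Refused" -98 "Don't know" 1 "Yes" 2 "No"
* q2 is a hypothetical second variable sharing the same value label
label values w38_E1a w38_E1a
label values q2 w38_E1a

* official -ds- finds every variable that uses the value label w38_E1a
quietly ds, has(vallabel w38_E1a)
local targets `r(varlist)'
* -99 becomes .a and -98 becomes .b in all of them
mvdecode `targets', mv(-99=.a \ -98=.b)
label define w38_E1a .a "Refused" .b "Don't know" -99 "" -98 "", modify
tabulate w38_E1a, missing
```

Because the variables are found through their value label rather than their names, this also covers the case where the label name matches no variable name.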
I could not find a way to respond to https://stackoverflow.com/users/15742435/jesse-kaczmarski or https://stackoverflow.com/users/15819003/bing and, because I have not "earned" enough reputation, I can't comment on their answers directly. However, note that their advice can go wrong:
puput0808 only showed us the contents of a value label. Your answers, however, recode a variable with the same name, or drop cases where a variable with the same name has the values -99 or -98. What if the variable name is not identical to the name of the value label? It could be (a) that no variable of that name exists (in that case an error message would occur) or (b) that several variables use this value label and only one of them also has the name of the value label (in that case the problem would persist for the others).
puput0808 showed us the labels of -99 and -98 indicating that the values are intended to be treated as missing. In that case recoding the values to positive numbers would certainly be a mistake.

Shorten all value labels of a variable

Let's say I'm using
sysuse auto
where the variable foreign has two labelled values. I want to take a certain variable, foreign, and change every label text longer than 6 characters to its first 3 characters followed by (...).
In this dataset, that would mean changing the labels to For(...) and Dom(...). In my actual dataset, I have dozens of different values. Therefore, I'm looking for a solution that loops through all value labels, and doesn't specifically change each of these two manually.
Since I'm doing this on an air-gapped server, I prefer approaches that work with official Stata commands, without installing packages.
sysuse auto, clear
capture label drop newlabel // allow the do-file to be re-run
levelsof foreign, local(values)
foreach value of local values {
    // text currently attached to this value of foreign
    local labfull : label (foreign) `value'
    if strlen(`"`labfull'"') > 6 {
        // keep the first 3 characters and append (...)
        local labsub = substr(`"`labfull'"', 1, 3)
        local newlab `"`labsub'(...)"'
        label define newlabel `value' `"`newlab'"', add
    }
    else label define newlabel `value' `"`labfull'"', add
}
label list newlabel
label values foreign newlabel
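For dozens of variables, the loop can be wrapped in a small program so it can be called once per variable. This is a sketch only: shortlab is a hypothetical name, not an official command, and the optional third argument (default 6) is the length threshold.

```stata
capture program drop shortlab
program define shortlab
    * hypothetical helper: shortlab varname newlabel [maxlen]
    args vname newlbl maxlen
    if "`maxlen'" == "" local maxlen 6
    capture label drop `newlbl'
    quietly levelsof `vname', local(values)
    foreach value of local values {
        local lab : label (`vname') `value'
        if strlen(`"`lab'"') > `maxlen' {
            local lab = substr(`"`lab'"', 1, 3) + "(...)"
        }
        label define `newlbl' `value' `"`lab'"', modify
    }
    label values `vname' `newlbl'
end

sysuse auto, clear
shortlab foreign foreign_short
label list foreign_short
```

Like the loop above, it only visits values that occur in the data, so labelled codes that never appear are not copied to the new label.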

giving a string variable values conditional on another variable

I am using Stata 14. I have US states and their corresponding regions coded as integers.
I want to create a string variable that holds the region name for each observation.
Currently my code is
gen div_name = "A"
replace div_name = "New England" if div_no == 1
replace div_name = "Middle Atlantic" if div_no == 2
.
.
replace div_name = "Pacific" if div_no == 9
...so the code is really long.
I was wondering if there is a shorter way to do this where I can automate assigning values rather than manually hard coding them.
You can define value labels in one line with label define and then use decode to create the string variable. See the help for those commands.
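A minimal sketch of that approach. The question shows only codes 1, 2 and 9, so the names for 3 to 8 assume that div_no follows the Census Bureau's standard numbering of the nine divisions; check this against your own data.

```stata
clear
input byte div_no
1
2
5
9
end

* assumes div_no follows the Census Bureau's numbering of the nine divisions
label define divlbl 1 "New England" 2 "Middle Atlantic" 3 "East North Central" ///
    4 "West North Central" 5 "South Atlantic" 6 "East South Central" ///
    7 "West South Central" 8 "Mountain" 9 "Pacific"
label values div_no divlbl

* decode copies each value's label text into a new string variable
decode div_no, generate(div_name)
list
```

The /// continuations work in a do-file; typed interactively, the label define command must go on one line.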
If the correspondence were defined in a separate dataset, you could use merge. See e.g. this FAQ
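A sketch of the merge route, assuming a lookup file with one row per division. Only three divisions are typed here for brevity, and both datasets are made up; a tempfile stands in for a real file on disk.

```stata
* lookup table: one row per division (only three shown here)
clear
input byte div_no str20 div_name
1 "New England"
2 "Middle Atlantic"
9 "Pacific"
end
tempfile divisions
save `divisions'

* hypothetical main dataset with repeated division codes
clear
input byte div_no
9
1
9
2
end
merge m:1 div_no using `divisions', keep(master match) nogenerate
list
```

The m:1 form reflects that many observations share each division code while the lookup file has exactly one row per code.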
There can't be a short-cut here other than typing all the names at some point or exploiting the fact that someone else typed them earlier into a file.
With nine or so labels, typing them yourself is quickest.
Note that you type one statement more than you need, even doing it the long way, as you could start with
gen div_name = "New England" if div_no == 1

Extract the mean from svy mean result in Stata

I am able to extract the mean into a matrix as follows:
svy: mean age, over(villageid)
matrix villagemean = e(b)'
clear
svmat villagemean
However, I also want to merge this mean back to the villageid. My current thinking is to extract the rownames of the matrix villagemean like so:
local names : rownames villagemean
Then I try to turn the macro names into a variable:
foreach v in names {
gen `v' = "``v''"
}
However, the variable names ends up empty. What did I do wrong? Since a lot of this is copied from the Stata mailing list, I don't understand in particular the meaning of local names : rownames villagemean.
It's not completely clear to me what you want, but I think this might be it:
clear
set more off
*----- example data -----
webuse nhanes2f
svyset [pweight=finalwgt]
svy: mean zinc, over(sex)
matrix eb = e(b)
*----- what you want -----
levelsof sex, local(levsex)
local wc: word count `levsex'
gen avgsex = .
forvalues i = 1/`wc' {
replace avgsex = eb[1,`i'] if sex == `:word `i' of `levsex''
}
list sex zinc avgsex in 1/10
I make use of two extended macro functions:
local wc: word count `levsex'
and
`:word `i' of `levsex''
The first one returns the number of words in a string; the second returns the nth token of a string. The help entry for extended macro functions is help extended_fcn. Better yet, read the manuals, starting with: [U] 18.3 Macros. You will see there (18.3.8) that I use an abbreviated form.
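A tiny sketch of the two extended macro functions in isolation, using the same macro name as above but with the levels typed by hand (1 and 2, as for sex in nhanes2f):

```stata
local levsex 1 2
* number of words in the macro
local wc : word count `levsex'
display `wc'
forvalues i = 1/`wc' {
    * inline form: the i-th word, evaluated where it appears
    display "word `i' is `: word `i' of `levsex''"
}
```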
Some notes on your original post
Your loop doesn't do what you intend (although again, what you intend is not crystal clear to me) because you are supplying a list with a single element: the literal text names. You can see this by running and comparing:
local names 1 2 3
foreach v in names {
display "`v'"
}
foreach v in `names' {
display "`v'"
}
foreach v of local names {
display "`v'"
}
You need to read the corresponding help files to set that right.
As for the question in your original post, : rownames is another extended macro function but for matrices. See help matrix, #11.
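A minimal sketch of what : rownames returns, using a small matrix whose row names (v1, v2, v3) are made up for illustration:

```stata
* a small matrix with hypothetical row names
matrix A = (10 \ 20 \ 30)
matrix rownames A = v1 v2 v3
* : rownames returns the row names as one space-separated list
local names : rownames A
display "`names'"
* loop over the contents of the macro, not over the literal word names
foreach v of local names {
    display "`v'"
}
```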
My impression is that, for the kind of things you are trying to achieve, you need to dig deeper into the manuals. Furthermore, if you have not read the initial chapters of the Stata User's Guide, then you must do so.