How do you recode a variable whilst keeping the value labels in Stata?

I have a variable 'gender' that is coded 1=Male 2=Female. I just want to recode it so Male=0 and Female=1.
I tried this:
recode gender 1=0 2=1
which gives me the right codes, but now Males just come up as '0' and Females come up as 'Male'. I assume this is because Stata associates the value label 'Male' with the numeric code 1, and as it has no value label for 0, that value is simply displayed as 0.
So I tried this:
label define gender 0 "Male" 1 "Female", modify
and
label define gender 0 "Male" 1 "Female", replace
Neither does anything.
I want to recode whilst keeping the value labels, or at least be able to relabel the values so they go back to Male and Female. How do I do this?
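One possible approach, sketched under an assumption: the value label attached to gender may have a name other than gender, which would explain why redefining a label called gender changed nothing on screen. The names gender01 and gender_lbl below are hypothetical.

```stata
* Show which value label (if any) is attached to gender;
* its name appears in the "value label" column
describe gender

* Option 1: recode into a new variable and label the new codes in one step
* (gender01 is a hypothetical name; labels inside rules require generate())
recode gender (1 = 0 "Male") (2 = 1 "Female"), generate(gender01)

* Option 2: recode in place, then define a fresh label and attach it
* (gender_lbl is a hypothetical label name that must not already exist)
recode gender (1 = 0) (2 = 1)
label define gender_lbl 0 "Male" 1 "Female"
label values gender gender_lbl

* Check the result with and without labels
tab gender
tab gender, nolabel
```

Option 1 leaves the original variable untouched, which makes it easy to cross-check the new coding against the old one before dropping anything.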

Related

Add a new column based on other columns in SAS

I'm new to SAS and would like some help with the following question.
The sample table is shown below:
Time Color Food label
2020 red Apple A
2019 red Orange A,B
2018 blue Apple A,B
2017 blue Orange B
Logic to return label is:
when color = 'red' then 'A'
when color = 'blue' then 'B'
when food = 'orange' then 'B'
when food = 'apple' then 'A'
Since row 2 has both red and orange, its label should contain both: 'A,B'. The same goes for row 3.
The requirement is to print the label for each combination. I know that we can use a CASE WHEN statement to define the label based on color and food. Here we only have 2 colors and 2 foods, but with 7 colors and 10 foods we would have 7*10 combinations. I don't want to list all of those combinations in a CASE WHEN statement.
Is there a convenient way to return the label? Thanks for any ideas! (I'd prefer to do it in PROC SQL, but a SAS data step is also welcome.)
This looks like a simple application of formats. So define a format that converts COLOR to a code letter and a second one that converts FOOD to a code letter.
proc format ;
/* character formats need a $ prefix on the format name */
value $color 'red'='A' 'blue'='B';
value $food 'Apple'='A' 'Orange'='B' ;
run;
Then use those to convert the actual values of the COLOR and FOOD variables into the labels. Either in a data step:
data want;
set have ;
length label $5 ;
label=catx(',',put(color,$color.),put(food,$food.));
run;
Or in an SQL query:
proc sql ;
create table want as
select *
, catx(',',put(color,$color.),put(food,$food.)) as label length=5
from have
;
quit;
You do not need to re-create the format if the data changes, only if the list of possible values changes.

Create unique personal id

I have a Stata data set like this:
HouseholdId PersonId OtherVariables
1 1
1 2
2 1
2 2
3 1
3 2
Here HouseholdId is a unique identifier for each household and PersonId is a unique identifier for each person within a household. How would I create a unique personal id for each person within the whole sample?
I have tried egen per_id = group(PersonID HouseholdID)
but that doesn't seem to work.
I take it that you want a unique identifier for each person within the entire dataset. That could be just
sort HouseholdId PersonId
gen long obsId = _n
as follows from an accessible discussion in this Stata FAQ. That would have been found by typing in Stata
search identifier
or even
search id
(Meta-answer: You can and should look within Stata for information on basic notions like this.)
I add a strong recommendation that the word unique still carries its original meaning of appearing once only. The word distinct is, I suggest, a much better word when that is what you mean. More on that on p.558 of this paper.
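As a hedged aside on the attempt in the question: Stata variable names are case-sensitive, so PersonID and HouseholdID do not match the PersonId and HouseholdId shown in the data listing, which may be why egen appeared not to work. A minimal sketch, assuming the variable names are exactly as listed and reusing the per_id name from the question:

```stata
* Number the distinct (household, person) combinations 1, 2, 3, ...
* Variable names must match the dataset's capitalisation exactly
egen long per_id = group(HouseholdId PersonId)

* isid exits with an error unless per_id identifies each observation
* exactly once
isid per_id
```

Unlike the observation number, an identifier built with group() does not depend on the current sort order of the data.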

Generate a new variable from row conditions in Stata

I am using Stata 13 on Windows 7. I have a dataset with repeated observations of age and educ in a row for each id, i.e. variables q9p1educ and q9p1age are the education and age for person 1, q9p2educ and q9p2age are the education and age for person 2, etc. I want to extract the education level of the person with the highest age. I have managed to extract the maximum age maxage using egen maxage = rowmax(q9p1age - q9p9age). How can I get the education of the person with the maximum age?
The sample data is here
I would start by reshaping your data into long format (reshape marks the position of the j values with @):
reshape long q9@educ q9@age, i(id maxage) j(pid) string
Then the answer depends on what you want to do if the maximum age is not unique within an id. Perhaps you could average the education of the tied persons?
bysort id (q9age): gen temp=q9educ if q9age==maxage
bysort id: egen educmaxage=mean(temp)
drop temp
Then if you want it wide again, you could simply reshape wide:
reshape wide q9@educ q9@age, i(id maxage educmaxage) j(pid) string
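If reshaping is not wanted, a loop over the wide variables is another option. This is a different technique from the reshape answer above, sketched under the assumptions that persons are numbered 1 to 9 (as the q9p1age to q9p9age range in the question suggests), that the educ variables are numeric, and that ties should go to the lowest-numbered person rather than being averaged.

```stata
* Start with missing; fill in from the first person whose age equals maxage
gen educmaxage = .
forvalues p = 1/9 {
    * Only fill rows that are still missing, so the first match wins;
    * the !missing(maxage) guard avoids matching missing age to missing maxage
    replace educmaxage = q9p`p'educ ///
        if q9p`p'age == maxage & !missing(maxage) & missing(educmaxage)
}
```

One caveat: if the first matching person has a missing education value, the loop moves on to the next tied person, because the row still counts as unfilled.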

SAS covariates in a linear regression

I am running a simple linear regression in SAS. The regression has three different groups of participants as the predictors (with group 1 as the reference), the outcome a continuous social support variable, and five covariates. Three of the covariates are dichotomized (age, sex, & education), one is a three-level nominal variable (marital status), and the last is continuous (it's a chronic disease index).
My question is: Do I need to specify the different types of covariates in the SAS coding somehow?
Would this coding example be correct?:
proc glm data=work.example;
class group age sex education marital education chronic_diseases;
model social_support = group age sex education marital education chronic_diseases;
estimate 'group 1 vs group 2' group -1 1 0;
estimate 'group 1 vs group 3' group -1 0 1;
run;
The class statement tells SAS to treat a variable as non-continuous, that is, categorical or binary. It doesn't differentiate between the two: it sorts the levels in ascending order and, in PROC GLM, uses the last one as the reference by default unless you specify a reference group.
For example, if you're comparing Apples and Oranges, SAS will use Oranges as the reference value. Hey, they're fruit - you can compare fruit to fruit! :)
All model covariates are considered numeric unless specified in a class statement. Since chronic_diseases is continuous, simply remove it from the class statement; otherwise, SAS will look at every single value of chronic_diseases and consider it a level, then compare each of them to the reference level. Your code also lists education twice; it only needs to appear once.
proc glm data=work.example;
class group age sex education marital;
model social_support = group age sex education marital chronic_diseases;
estimate 'group 1 vs group 2' group -1 1 0;
estimate 'group 1 vs group 3' group -1 0 1;
run;

Recoding of Gender in Stata

In my dataset I currently have the labels Male and Female within my gender variable.
As I am going to be running a regression model I would like to change this so Male and Female are recoded to appear as 0 and 1. However, I am not sure how to do this!
Any help greatly appreciated
You need to do something like this:
recode gender (X = 0) (Y = 1), gen(gender_dummy)
where X and Y are the values you want to recode. You can run label list to find out what the coding is.
You stated that your gender variable is numeric, with labels. To determine the numeric values, tabulate without labels
tab gender, nolabel
Let's assume the output reveals that gender variable is coded as male==1 and female==2. To recode that as 0 and 1, I would create a new dichotomous variable called female where female==1 and male==0.
gen female=.
replace female=1 if gender==2
replace female=0 if gender==1
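The three lines above can also be condensed into one. This is a minimal sketch, assuming the same coding of 1 for male and 2 for female; it is meant to be used in place of the generate/replace lines, not after them.

```stata
* (gender == 2) evaluates to 1 for females and 0 otherwise;
* the if qualifier leaves female missing for any other value of gender
gen byte female = (gender == 2) if inlist(gender, 1, 2)

* Cross-tabulate old against new coding to confirm the mapping
tab gender female, missing
```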
If you then want to add labels to the new female variable you can do that by defining a new label and assigning it to the variable:
label define FEMALE 1 "female" 0 "male"
label values female FEMALE
You can then test this by tabulating with and without labels:
tab female
tab female, nolabel
If you no longer want the original gender variable, you can drop it:
drop gender
You can then rename the new female variable to gender, if you'd like, but it's generally recommended that you name dichotomous variables after whatever value is coded as 1, so I'd leave it as female.
rename female gender