Recoding of Gender in Stata

In my dataset I currently have the labels Male and Female within my gender variable.
As I am going to be running a regression model I would like to change this so Male and Female are recoded to appear as 0 and 1. However, I am not sure how to do this!
Any help greatly appreciated

You need to do something like this:
recode gender (X = 0) (Y = 1), gen(gender_dummy)
where X and Y are the existing numeric values you want to recode. You can run label list to find out which value each label is attached to.

You stated that your gender variable is numeric, with labels. To determine the numeric values, tabulate without labels:
tab gender, nolabel
Let's assume the output reveals that the gender variable is coded as male==1 and female==2. To recode that as 0 and 1, I would create a new dichotomous variable called female, where female==1 and male==0.
gen female=.
replace female=1 if gender==2
replace female=0 if gender==1
If you then want to add labels to the new female variable you can do that by defining a new label and assigning it to the variable:
label define FEMALE 1 "female" 0 "male"
label values female FEMALE
You can then test this by tabulating with and without labels:
tab female
tab female, nolabel
If you no longer want the original gender variable, you can drop it:
drop gender
You can then rename the new female variable to gender if you'd like, but it's generally recommended that you name dichotomous variables after whatever value is coded as 1, so I'd leave it as female.
rename female gender
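For readers who want to check the recoding logic outside Stata, here is a minimal Python sketch of the same idea. The data values and the helper name recode are hypothetical, and the 1 = male, 2 = female coding is the same assumption the answer makes. Codes without a mapping, including missing values, stay missing, which mirrors starting from gen female=. and only replacing known codes.

```python
def recode(values, mapping):
    """Return a new list with each code replaced via `mapping`.

    Codes that have no entry in `mapping` (including None, used here
    for missing values) come back as None, i.e. they stay missing.
    """
    return [mapping.get(v) for v in values]


# Hypothetical data, assuming 1 = male and 2 = female as in the answer.
gender = [1, 2, 2, None, 1]

# New dummy named after the category coded 1.
female = recode(gender, {1: 0, 2: 1})
print(female)
```

The original list is left untouched, just as the answer creates a new variable rather than overwriting gender.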

Related

How do you recode a variable whilst keeping the value labels in Stata?

I have a variable 'gender' that is coded 1=Male 2=Female. I just want to recode it so Male=0 and Female=1.
I tried this:
recode gender 1=0 2=1
This gives me the right codes, but now males just show up as 0 and females show up as 'Male'. I assume this is because Stata associates the value label 'Male' with the numeric code 1, and since it has no value label for 0, it just displays 0.
So I tried this:
label define gender 0 "Male" 1 "Female", modify
and
label define gender 0 "Male" 1 "Female", replace
Neither does anything.
I want to recode whilst keeping the value labels please or at least be able to rename them so it can go back to Male and Female. How do I do this?
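The symptom described in the question is what you would expect if value labels are a lookup from numeric code to text that is stored separately from the data. The Python sketch below only models that assumption; it is not how Stata is implemented, and the names display, old_labels and new_labels are hypothetical.

```python
def display(codes, labels):
    """Show each code as its label if one is defined, else as the bare number."""
    return [labels.get(c, str(c)) for c in codes]


old_labels = {1: "Male", 2: "Female"}   # labels attached before recoding
codes = [1, 2, 1]                       # Male, Female, Male

# Recode 1 -> 0 and 2 -> 1 without touching the label lookup.
recoded = [{1: 0, 2: 1}[c] for c in codes]

# The old lookup has no entry for 0, and 1 still points at "Male".
print(display(recoded, old_labels))

# A lookup that matches the new codes restores the intended text.
new_labels = {0: "Male", 1: "Female"}
print(display(recoded, new_labels))
```

Under this model the recode and the label update are two separate steps, and the display is only right once the lookup actually used for the variable matches the new codes.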

Calculate Total male Total Female in powerbi report

I want to count the total number of males and females in a Power BI report, e.g.:
Name Gender
std1 Female
std2 Male
std3 Female
std4 Male
std5 Male
std6 Male
The result I want is:
Female 2
Male 4
To get the results you are looking for, follow these steps:
Make a second table using the New Table button with the following code:
GenderCounts = DISTINCT(TableName[Gender])
Make a relationship from the newly created table back to the original table
Add a new column to the GenderCounts table with the following code:
Count = COUNTROWS(RELATEDTABLE(TableName))
And there you have a second table containing the counts of each gender.
For more information and other possibilities, check out a related Power BI Community forum post here and Stack Overflow questions here and here.
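To sanity-check the expected counts outside Power BI, the same grouping can be sketched in Python with the sample rows from the question. This is only an illustration of the count-per-distinct-value logic, not a substitute for the DAX steps; the function name gender_counts is hypothetical.

```python
from collections import Counter

# Sample rows from the question: (Name, Gender).
rows = [
    ("std1", "Female"),
    ("std2", "Male"),
    ("std3", "Female"),
    ("std4", "Male"),
    ("std5", "Male"),
    ("std6", "Male"),
]


def gender_counts(rows):
    """Count how many rows fall under each distinct gender value."""
    return Counter(gender for _name, gender in rows)


for gender, count in sorted(gender_counts(rows).items()):
    print(gender, count)
```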

Generate a new variable from row conditions in Stata

I am using Stata 13 on Windows 7. I have a dataset with repeated observations of age and educ in a single row for each id, i.e. variables q9p1educ and q9p1age are the education and age for person 1, q9p2educ and q9p2age are the education and age for person 2, and so on. I want to extract the education level of the person with the highest age. I have managed to extract the maximum age, maxage, using
egen maxage = rowmax(q9p1age - q9p9age)
How can I get the education of the person with the maximum age?
The sample data is here
I would start by reshaping your data into long format:
reshape long q9@educ q9@age, i(id maxage) j(pid) string
Then the answer depends on what you want to do if the maximum age is not unique. Perhaps you could average the education of the tied people?
bysort id (q9age): gen temp = q9educ if q9age == maxage
bysort id: egen educmaxage = mean(temp)
drop temp
Then, if you want the data wide again, you can simply reshape wide:
reshape wide q9@educ q9@age, i(id maxage educmaxage) j(pid) string
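The tie-handling rule in this answer (average the education of everyone who shares the maximum age) can be sketched for a single row in Python. The function name and the example values are hypothetical; None stands in for a missing value, and missing ages are ignored when taking the maximum, as rowmax does.

```python
def educ_of_oldest(ages, educs):
    """Mean education among the people whose age equals the row maximum.

    `ages` and `educs` are parallel lists for one id. Missing values are
    None. Returns None if no age is known or if the oldest people have
    no recorded education.
    """
    known_ages = [a for a in ages if a is not None]
    if not known_ages:
        return None
    maxage = max(known_ages)
    tied = [e for a, e in zip(ages, educs) if a == maxage and e is not None]
    if not tied:
        return None
    return sum(tied) / len(tied)


# Hypothetical household: persons 2 and 3 are both 61, so their education is averaged.
print(educ_of_oldest([34, 61, 61, None], [12, 8, 10, None]))
```

If a different tie rule is wanted (for example, the first or the highest education among the oldest), only the last step needs to change.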

SAS covariates in a linear regression

I am running a simple linear regression in SAS. The regression has three different groups of participants as the predictors (with group 1 as the reference), the outcome a continuous social support variable, and five covariates. Three of the covariates are dichotomized (age, sex, & education), one is a three-level nominal variable (marital status), and the last is continuous (it's a chronic disease index).
My question is: Do I need to specify the different types of covariates in the SAS coding somehow?
Would this coding example be correct?:
proc glm data=work.example;
class group age sex education marital education chronic_diseases;
model social_support = group age sex education marital education chronic_diseases;
estimate 'group 1 vs group 2' group -1 1 0;
estimate 'group 1 vs group 3' group -1 0 1;
run;
The class statement tells SAS that you want to treat a variable as non-continuous, that is, categorical or binary. It doesn't differentiate between the two: it sorts the levels and, in PROC GLM, uses the last level in ascending order as the reference by default unless you specify a reference group.
For example, if you're comparing Apples and Oranges, SAS will use Oranges as the reference value. Hey, they're fruit - you can compare fruit to fruit! :)
All model covariates are considered numeric unless specified in a class statement. Since chronic_diseases is continuous, simply remove it from the class statement; otherwise, SAS will look at every single value of chronic_diseases and consider it a level, then compare them all to the lowest level.
proc glm data=work.example;
class group age sex education marital;
model social_support = group age sex education marital chronic_diseases;
estimate 'group 1 vs group 2' group -1 1 0;
estimate 'group 1 vs group 3' group -1 0 1;
run;
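To see why a continuous covariate should stay out of the class statement, here is a rough Python sketch of the idea that a class variable contributes one level per distinct value while a continuous variable contributes a single column. It is an illustration only, not SAS's actual parameterization; the function name and the example values are hypothetical.

```python
def design_columns(name, values, is_class):
    """List the columns a covariate would contribute under this simplified model.

    A class variable gets one column per distinct level (sorted);
    a continuous variable gets exactly one column.
    """
    if not is_class:
        return [name]
    return [f"{name}={level}" for level in sorted(set(values))]


marital = ["single", "married", "widowed", "married"]
chronic = [0.5, 1.2, 3.1, 2.4]  # hypothetical index scores

print(design_columns("marital", marital, is_class=True))
print(design_columns("chronic_diseases", chronic, is_class=False))
# Treating the index as a class variable yields one level per distinct score:
print(len(design_columns("chronic_diseases", chronic, is_class=True)))
```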

Transpose keeping all combinations of multiple values for id variable in SAS

I want to transpose a table from long to wide but I have more than one value for each Key. I want the transposed table to have one line for each combination of Id and Key, so for this example item, I'd have 8 lines after transposition. The Id variable would be preserved, each distinct Key would be all combinations of different values for the same key. So 2 * 2 * 1 * 1 * 1 * 2 = 8 lines.
data grades;
input Id Key $ Value $;
cards;
219381 Category Something
219381 Category Another
219381 Color White
219381 Color Black
219381 Sport Football
219381 Gender Male
219381 Size Big
219381 Quality Good
219381 Quality Bad
;
run;
This is what I want to come out after this complex transposition:
Id Category Color Sport Gender Size Quality
219381 Something White Football Male Big Good
219381 Something White Football Male Big Bad
219381 Something Black Football Male Big Good
219381 Something Black Football Male Big Bad
219381 Another White Football Male Big Good
219381 Another White Football Male Big Bad
219381 Another Black Football Male Big Good
219381 Another Black Football Male Big Bad
Any ideas how I can achieve this?
I've tried many things without success.
To me this looks like you want a Cartesian product of the values for the different keys, all of which are stored in one table. While not very SAS-like, one way you can get the result you're looking for is by using PROC SQL with joins on the same table, simulating individual tables for these different key types.
PROC SQL;
CREATE TABLE grades_combos AS
SELECT DISTINCT
g.id, category.value as category, color.value as color, sport.value as sport,
gender.value as gender, size.value as size, quality.value as quality
FROM grades g
INNER JOIN grades category ON category.id = g.id AND category.key = 'Category'
INNER JOIN grades color ON color.id = g.id AND color.key = 'Color'
INNER JOIN grades sport ON sport.id = g.id AND sport.key = 'Sport'
INNER JOIN grades gender ON gender.id = g.id AND gender.key = 'Gender'
INNER JOIN grades size ON size.id = g.id AND size.key = 'Size'
INNER JOIN grades quality ON quality.id = g.id AND quality.key = 'Quality'
ORDER BY id, category, color, sport, gender, size, quality
;
QUIT;
You could probably make this more flexible and generic, perhaps wrapping this in a macro that generates the JOIN statements based on an arbitrary set of keys.
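As a cross-check of the expected eight rows, the same Cartesian product can be sketched in Python with itertools.product, using the sample data from the question. The function name combos is hypothetical. Like the inner joins, it produces no rows for an Id that lacks one of the keys, but unlike the SQL it neither de-duplicates nor sorts the result.

```python
from itertools import product

# Sample rows from the question: (Id, Key, Value).
rows = [
    (219381, "Category", "Something"),
    (219381, "Category", "Another"),
    (219381, "Color", "White"),
    (219381, "Color", "Black"),
    (219381, "Sport", "Football"),
    (219381, "Gender", "Male"),
    (219381, "Size", "Big"),
    (219381, "Quality", "Good"),
    (219381, "Quality", "Bad"),
]
keys = ["Category", "Color", "Sport", "Gender", "Size", "Quality"]


def combos(rows, keys):
    """Return one tuple (Id, value per key...) for every combination of values."""
    by_id = {}
    for id_, key, value in rows:
        by_id.setdefault(id_, {}).setdefault(key, []).append(value)
    out = []
    for id_, values in by_id.items():
        # A key with no values gives an empty list, so product yields nothing.
        for combo in product(*(values.get(k, []) for k in keys)):
            out.append((id_, *combo))
    return out


for row in combos(rows, keys):
    print(*row)
```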
Off the top of my head, I can think of the following two approaches:
Use a by-statement? (requires sorted data)
Create a third variable which is the concatenation of the other 2 variables and use that one.