Split IDs in categorical variables - stata

I have a variable with IDs:
clear
input ID
1
.
2
1
.
3
4
5
4
4
6
end
How can I create separate categorical variables with ID as a name and values of 1 and 2 (the latter if the generated variable matches the ID)?
For example, variable _ID_1 should look as follows:
2
.
1
2
.
1
1
1
1
1
1
Any ideas?

Another way to do it:
clear
input ID
1
.
2
1
.
3
4
5
4
4
6
end
forvalues j = 1/6 {
generate ID_`j' = 1 + (ID == `j') if ID != .
}
list

Related

page break by length and group sas proc report

I would like to create a page break value that can help me break the page when I use proc report.
Now my data looks like this:
Group Value
a 1
a 2
a 3
...
b 1
b 2
...
c 1
c 2
c 3
And suppose I only want two lines per page, and break if the group changed.
So I need a dataset like this:
Group Value Page
a 1 1
a 2 1
a 3 2
...
b 1 3
b 2 3
...
c 1 4
c 2 4
c 3 5
Can anyone help me with this? Thanks!
Retain holds values across rows. Create a counter value that you can use to track the number of records per group. This allows you to split it into pages of N amount.
Use BY and FIRST to reset counter at the start of each group
Check if the you need to increment page
data have;
input Group $ Value;
cards;
a 1
a 2
a 3
b 1
b 2
c 1
c 2
c 3
;;;;
data want;
set have;
by group;
retain counter page;
if first.group then counter=0;
counter+1;
if mod(counter, 2) =1 or first.group then page+1;
run;
proc print data=want;
run;
Results:
Obs Group Value counter page
1 a 1 1 1
2 a 2 2 1
3 a 3 3 2
4 b 1 1 3
5 b 2 2 3
6 c 1 1 4
7 c 2 2 4
8 c 3 3 5

Create duplicate rows in SAS and change values of variables

I have been so confused on how to implement this in SAS. I am trying to create duplicate rows if the value of "2" occurs more than once between the variables (member1 -member4). For example, if a row has the value 2 in member2, member3, and member4, then I will create 2 duplicate rows since the initial row will serve for the first variable and the duplicate rows will be for member 3 and 4. On the duplicate row for member3 for example, member 2 and 4 will be missing if their values is equal to 2. Basically the value "2" can only occur once per row. let's assume sa1 to sa4 corresponds to other variables of member1 to member4 respectively. When we create a duplicate row for each member, the other variables should be missing if they have a value of "1". For example, if the duplicate row is for member 3, then values that equal "1" for sa1, sa2 and sa4 should be set to missing. There are other variables in the dataset that will have same values for all duplicate rows as initial rows. Duplicate rows will also have a suffix for the ID to indicate the parent rows.
This is an example of the data I have
id member1 member2 member3 member4 sa1 sa2 sa3 sa4
1 0 2 2 0 0 1 1 0
2 2 2 0 5 . 1 0 0
3 2 2 3 2 1 1 0 1
Then this is the output I am trying to achieve
id member1 member2 member3 member4 sa1 sa2 sa3 sa4
1 0 2 . 0 0 1 . 0
1_1 0 . 2 0 0 . 1 0
2 2 . 0 5 . . 0 0
2_1 . 2 0 5 . 1 0 0
3 2 . 3 . 1 . 0 .
3_1 . 2 3 . . 1 0 .
3_2 . . 3 2 . . 0 1
Will appreciate any help. Thank you!
You need to count the number of '2's. You also need to remember where they used to be. "I had the spots removed for good luck, but I remember where the spots formerly were."
data have ;
input id :$10. member1 member2 member3 member4 sa1 sa2 sa3 sa4 ;
cards;
1 0 2 2 0 0 1 1 0
2 2 2 0 5 . 1 0 0
3 2 2 3 2 1 1 0 1
4 2 0 0 0 . . . .
5 0 0 0 0 . . . .
;
data want ;
set have ;
array m member1-member4 ;
array x [4] _temporary_;
do index=1 to dim(m);
x[index]=m[index]=2;
end;
n2 = sum(of x[*]);
if n2<2 then output;
else do counter=1 to n2;
id=scan(id,1,'_');
if counter > 1 then id=catx('_',id,counter-1);
counter2=0;
do index=1 to dim(m);
if x[index] then do;
counter2+1;
if counter = counter2 then m[index]=2;
else m[index]=.;
end;
end;
output;
end;
drop index n2 counter counter2;
run;
Results
Obs id member1 member2 member3 member4 sa1 sa2 sa3 sa4
1 1 0 2 . 0 0 1 1 0
2 1_1 0 . 2 0 0 1 1 0
3 2 2 . 0 5 . 1 0 0
4 2_1 . 2 0 5 . 1 0 0
5 3 2 . 3 . 1 1 0 1
6 3_1 . 2 3 . 1 1 0 1
7 3_2 . . 3 2 1 1 0 1
8 4 2 0 0 0 . . . .
9 5 0 0 0 0 . . . .
I think your expecting us to code the whole thing for you... I dont get your logic explanation of what you want - but to start off with:
create a new dataset
rename all the variables on the way in - prefix with O_ (Original)
code however you like to see how many values contain 2 (HOWMANYTWOS)
do ROW = 1 to HOWMANYTWOS
4.1 again go through the values on the O_ variables you have
4.2 if the ROW - corresponds to your increasing counter its the 2 you wish to keep and so you dont touch it - if the 2 does not correspond to your ROW - make it .
4.3 output the record with a new(if required) ID
a start for you:
data NEW;
set ORIG (rename=(MEMBER1-MEMBER4=O_MEMBER1-O_MEMBER4 ID=O_ID etc..)
HOWMANYTWOS = sum(O_MEMBER1=2,O_MEMBER2=2,O_MEMBER3=2,O_MEMBER4=2);
do ROW = 1 to HOWMANYTWOS; /* This is stepping through and creating the new rows - you need to step through the variables to see if you want to make them null before outputting... NOTE do not change O_ variables only create/update the variables going to the output dataset (The O_ version is for checking against only)
ID = ifc(ROW = 1, O_ID, catx("_", O_ID, ROW);
/* create a counter
output;
end;
run;
Sorry - Not got sas here and its been a little while

Function to sum group based on id independently based off id

i am currently trying to write some code that goes through my dataset and sums each group everytime it appears independently of the whole group. this is what it currently looks like vs what i want it to. I thought it would be simple but sas 9.3 does not support sum over statements/
week ID var2 ... MinUnits group
24jun2019 1 x 5 0
01jul2019 1 x 4 1
08jul2019 1 x 7 1
15jul2019 1 x 2 1
22jul2019 1 x 0 2
29jul2019 1 x 5 2
05aug2019 1 x 2 2
24jun2019 1 x 9 0
01jul2019 2 x 5 1
08jul2019 2 x 6 1
15jul2019 2 x 8 1
22jul2019 2 x 1 2
29jul2019 2 x 5 2
05aug2019 3 x 3 2
what i want it to show
week ID var2 ... MinUnits group SumMinUnits
24jun2019 1 x 5 0 5
01jul2019 1 x 4 1 13
08jul2019 1 x 7 1
15jul2019 1 x 2 1
22jul2019 1 x 0 2 7
29jul2019 1 x 5 2
05aug2019 1 x 2 2
24jun2019 1 x 9 0 9
01jul2019 2 x 5 1 19
08jul2019 2 x 6 1
15jul2019 2 x 8 1
22jul2019 2 x 1 2 9
29jul2019 2 x 5 2
05aug2019 2 x 3 2
as you can see simply summing by group would not work because the group number gets repeated for different ID's (and eventually same ID's but in those cases a location variable is different than the orignal time the ID showed up).
please note i am not asking for you to code it for me as that is too much work. i just want to know if there is a functin i could use to do this. I thought about using a loop and groupby but that would sum up the total groups
You can use the NOTSORTED keyword on the BY statement use the GROUP variable to make BY groups.
data want;
do until (last.group);
set have ;
by group notsorted;
SumMinUnits=sum(SumMinUnits,MinUnits);
end;
do until (last.group);
set have ;
by group notsorted;
output;
end;
run;
Note this will set SUMMINUNITS to the same value for all observations in the group. You could add extra code to set it to missing inside the second DO loop when it is not the first observation for the group.
Wouldn't something like this work? It adds the total to every record of the group but otherwise your data seems order by ID and GROUP.
proc sql;
create table want as
select *, sum(minUnits) as total_units
from have
group by ID, GROUP;
quit;

Putting same income for same groupID

In my data, income was asked only to one person of the group.
householdID memberID income
1 1 4
2 2 .
1 2 .
2 3 .
2 1 3
But obviously, I need to fill them up like
householdID memberID income
1 1 4
2 2 3
1 2 4
2 3 3
2 1 3
How can I do this in Stata?
This is an elementary application of by:
bysort householdID (income) : replace income = income[1] if missing(income)
See for related material this FAQ
A more circumspect approach would check that at most one non-missing value has been supplied for each household:
bysort householdID (income) : gen OK = missing(income) | (income == income[1])
list if !OK

Create consecutive ID based on non-consecutive ID in Stata

Given the following variables (id and partner) in Stata, I would like to create a new variable (pid) that is simply the consecutive partner counter within id (as you can see, partner is not consecutive). Here is a MWE:
clear
input id partner pid
1 1 1
1 1 1
1 3 2
1 3 2
2 2 1
2 3 2
2 3 2
2 3 2
2 5 3
2 5 3
end
// create example data
clear
input id partner
1 1
1 1
1 3
1 3
2 2
2 3
2 3
2 3
2 5
2 5
end
// create pid
bysort id partner : gen pid = _n == 1
by id : replace pid = sum(pid)
// admire the result
list, sepby(id)