Create consecutive ID based on non-consecutive ID in Stata

Given the variables id and partner in Stata, I would like to create a new variable, pid, that numbers the partners consecutively within each id (as you can see, partner itself is not consecutive). Here is an MWE that includes the desired pid:
clear
input id partner pid
1 1 1
1 1 1
1 3 2
1 3 2
2 2 1
2 3 2
2 3 2
2 3 2
2 5 3
2 5 3
end

// create example data
clear
input id partner
1 1
1 1
1 3
1 3
2 2
2 3
2 3
2 3
2 5
2 5
end
// create pid
bysort id partner : gen pid = _n == 1
by id : replace pid = sum(pid)
// admire the result
list, sepby(id)
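The two steps above (flag the first row of each id-partner pair, then take a running sum of the flags within id) can be cross-checked outside Stata. The following is only a minimal Python sketch, not part of the original answer: the function name consecutive_pid is made up for illustration, and it assumes the rows are already sorted by id and then partner, as bysort leaves them.

```python
def consecutive_pid(rows):
    """rows: list of (id, partner) tuples sorted by id, then partner."""
    result = []
    prev = None   # previous (id, partner) pair
    pid = 0
    for id_, partner in rows:
        if prev is None or id_ != prev[0]:
            pid = 1            # new id: the counter restarts
        elif partner != prev[1]:
            pid += 1           # new partner within the same id
        result.append((id_, partner, pid))
        prev = (id_, partner)
    return result

rows = [(1, 1), (1, 1), (1, 3), (1, 3),
        (2, 2), (2, 3), (2, 3), (2, 3), (2, 5), (2, 5)]
pids = [pid for _, _, pid in consecutive_pid(rows)]
print(pids)
```

On the example data this gives the same pid column as the desired result in the question.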

Related

page break by length and group sas proc report

I would like to create a page break value that can help me break the page when I use proc report.
Now my data looks like this:
Group Value
a 1
a 2
a 3
...
b 1
b 2
...
c 1
c 2
c 3
Suppose I want at most two lines per page, plus a page break whenever the group changes.
So I need a dataset like this:
Group Value Page
a 1 1
a 2 1
a 3 2
...
b 1 3
b 2 3
...
c 1 4
c 2 4
c 3 5
Can anyone help me with this? Thanks!
RETAIN holds values across rows. Create a counter that tracks the number of records within each group; this lets you split each group into pages of N rows.
Use BY and FIRST. to reset the counter at the start of each group.
Check whether you need to increment the page.
data have;
input Group $ Value;
cards;
a 1
a 2
a 3
b 1
b 2
c 1
c 2
c 3
;;;;
data want;
set have;
by group;
retain counter page;
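/* restart the row counter at the start of each group */
if first.group then counter=0;
counter+1;
/* start a new page on rows 1, 3, 5, ... of each group (two rows per page) */
if mod(counter, 2) =1 or first.group then page+1;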
run;
proc print data=want;
run;
Results:
Obs Group Value counter page
1 a 1 1 1
2 a 2 2 1
3 a 3 3 2
4 b 1 1 3
5 b 2 2 3
6 c 1 1 4
7 c 2 2 4
8 c 3 3 5
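The counter logic generalises to any page length. As a rough illustration outside SAS, here is a small Python sketch of the same idea; assign_pages and lines_per_page are hypothetical names, and the rows are assumed to be sorted by group, as the BY statement requires.

```python
def assign_pages(rows, lines_per_page=2):
    """rows: list of (group, value) tuples sorted by group."""
    out = []
    page = 0
    counter = 0
    prev_group = None
    for group, value in rows:
        if group != prev_group:
            counter = 0                    # like: if first.group then counter=0
        counter += 1
        # a new page starts on rows 1, N+1, 2N+1, ... of each group
        if (counter - 1) % lines_per_page == 0:
            page += 1
        out.append((group, value, page))
        prev_group = group
    return out

have = [("a", 1), ("a", 2), ("a", 3), ("b", 1), ("b", 2),
        ("c", 1), ("c", 2), ("c", 3)]
pages = [page for _, _, page in assign_pages(have)]
print(pages)
```

With two lines per page this matches the page column printed above. Testing (counter - 1) against the page length covers the first row of every group, so no separate first-row condition is needed.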

Split IDs in categorical variables

I have a variable with IDs:
clear
input ID
1
.
2
1
.
3
4
5
4
4
6
end
How can I create separate categorical variables with ID as a name and values of 1 and 2 (the latter if the generated variable matches the ID)?
For example, variable _ID_1 should look as follows:
2
.
1
2
.
1
1
1
1
1
1
Any ideas?
Another way to do it:
clear
input ID
1
.
2
1
.
3
4
5
4
4
6
end
forvalues j = 1/6 {
generate ID_`j' = 1 + (ID == `j') if ID != .
}
list
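The loop relies on a logical expression such as ID == 1 evaluating to 1 when true and 0 when false, so adding 1 gives 2 for a match and 1 otherwise, while the if qualifier leaves missing IDs missing. A minimal Python sketch of the same arithmetic, with None standing in for Stata's missing value and indicator as a made-up helper name:

```python
def indicator(ids, j):
    # None plays the role of Stata's missing value (.)
    return [None if v is None else 1 + (v == j) for v in ids]

ids = [1, None, 2, 1, None, 3, 4, 5, 4, 4, 6]
id_1 = indicator(ids, 1)
print(id_1)
```

For j = 1 this gives the column shown in the question.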

Function to sum group based on id independently based off id

I am currently trying to write some code that goes through my dataset and sums each group every time it appears, independently of the other occurrences of the same group number. This is what the data currently looks like versus what I want. I thought it would be simple, but SAS 9.3 does not support SUM OVER statements.
week ID var2 ... MinUnits group
24jun2019 1 x 5 0
01jul2019 1 x 4 1
08jul2019 1 x 7 1
15jul2019 1 x 2 1
22jul2019 1 x 0 2
29jul2019 1 x 5 2
05aug2019 1 x 2 2
24jun2019 1 x 9 0
01jul2019 2 x 5 1
08jul2019 2 x 6 1
15jul2019 2 x 8 1
22jul2019 2 x 1 2
29jul2019 2 x 5 2
05aug2019 3 x 3 2
What I want it to show:
week ID var2 ... MinUnits group SumMinUnits
24jun2019 1 x 5 0 5
01jul2019 1 x 4 1 13
08jul2019 1 x 7 1
15jul2019 1 x 2 1
22jul2019 1 x 0 2 7
29jul2019 1 x 5 2
05aug2019 1 x 2 2
24jun2019 1 x 9 0 9
01jul2019 2 x 5 1 19
08jul2019 2 x 6 1
15jul2019 2 x 8 1
22jul2019 2 x 1 2 9
29jul2019 2 x 5 2
05aug2019 2 x 3 2
As you can see, simply summing by group would not work, because the group number is repeated for different IDs (and eventually for the same ID, but in those cases a location variable differs from the original time the ID showed up).
Please note I am not asking you to code it for me, as that is too much work. I just want to know if there is a function I could use to do this. I thought about using a loop and GROUP BY, but that would sum up the total groups.
You can use the NOTSORTED keyword on the BY statement and use the GROUP variable to define the BY groups.
data want;
do until (last.group);
set have ;
by group notsorted;
SumMinUnits=sum(SumMinUnits,MinUnits);
end;
do until (last.group);
set have ;
by group notsorted;
output;
end;
run;
Note this will set SUMMINUNITS to the same value for all observations in the group. You could add extra code to set it to missing inside the second DO loop when it is not the first observation for the group.
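To see what BY GROUP NOTSORTED does, it can help to think of it as splitting the file into consecutive runs of equal GROUP values. Python's itertools.groupby behaves the same way, so the following sketch (an illustration only, with run_sums as a hypothetical name) mimics the two passes and applies the variation mentioned above: the total is kept on the first row of each run and left empty elsewhere.

```python
from itertools import groupby

def run_sums(rows):
    """rows: list of (group, min_units) tuples in file order."""
    out = []
    # groupby only merges adjacent equal keys, like BY ... NOTSORTED
    for _, run in groupby(rows, key=lambda r: r[0]):
        run = list(run)
        total = sum(units for _, units in run)      # first pass: sum the run
        for i, (group, units) in enumerate(run):    # second pass: write rows
            out.append((group, units, total if i == 0 else None))
    return out

have = [(0, 5), (1, 4), (1, 7), (1, 2), (2, 0), (2, 5), (2, 2),
        (0, 9), (1, 5), (1, 6), (1, 8), (2, 1), (2, 5), (2, 3)]
sums = [s for _, _, s in run_sums(have)]
print(sums)
```

The (group, MinUnits) pairs are taken from the question's data; the result matches the SumMinUnits column the question asks for.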
Wouldn't something like this work? It adds the total to every record of the group, but otherwise your data seem to be ordered by ID and GROUP.
proc sql;
create table want as
select *, sum(minUnits) as total_units
from have
group by ID, GROUP;
quit;

find the count of num column changing by id

Can anyone please help me with the following problem?
I have following dummy data:
id num
1 1
1 2
1 1
1 2
1 1
1 2
2 1
2 15
2 1
2 1
2 1
2 15
2 1
2 15
How can I count the number of times num changes for each id?
The desired results, with the new column, are below.
I need results like this:
id number no_of_times
1 1 1
1 2 1
1 1 1
1 2 2
1 1 1
1 2 3
2 1 1
2 15 1
2 1 1
2 1 1
2 1 1
2 15 2
2 1 1
2 15 3
I hope the results make the requirement clear.
The following hash approach works for the test data provided with the question:
data have;
input id number no_of_times_target;
cards;
1 1 1
1 2 1
1 1 1
1 2 2
1 1 1
1 2 3
2 1 1
2 15 1
2 1 1
2 1 1
2 1 1
2 15 2
2 1 1
2 15 3
;
run;
data want;
set have;
by id;
if _n_ = 1 then do;
length prev_number no_of_times 8;
declare hash h();
rc = h.definekey('number','prev_number');
rc = h.definedata('no_of_times');
rc = h.definedone();
end;
prev_number = lag(number);
if number > prev_number and not(first.id) then do;
rc = h.find();
no_of_times = sum(no_of_times,1);
rc = h.replace();
end;
else no_of_times = 1;
if last.id then rc = h.clear();
drop rc prev_number;
run;
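Reading the step above: within each id, every increase from the previous number to the current one is looked up in the hash by the pair (number, prev_number), its count is bumped by one and stored back, and all other rows get 1. The hash is emptied at the end of each id. As a hedged cross-check, the same bookkeeping can be sketched in Python with a dictionary in place of the hash object; count_increases is a made-up name.

```python
def count_increases(rows):
    """rows: list of (id, number) tuples in file order, grouped by id."""
    out = []
    seen = {}                 # (number, prev_number) -> times seen so far
    prev_id = prev_num = None
    for id_, num in rows:
        if id_ != prev_id:
            seen = {}         # new id: start with an empty table
            times = 1
        elif num > prev_num:
            key = (num, prev_num)
            seen[key] = seen.get(key, 0) + 1
            times = seen[key]
        else:
            times = 1
        out.append(times)
        prev_id, prev_num = id_, num
    return out

have = [(1, 1), (1, 2), (1, 1), (1, 2), (1, 1), (1, 2),
        (2, 1), (2, 15), (2, 1), (2, 1), (2, 1), (2, 15), (2, 1), (2, 15)]
no_of_times = count_increases(have)
print(no_of_times)
```

On the test data this reproduces the no_of_times column requested in the question. Like the SAS step, it only counts increases, which is enough for this data but is an assumption about what "changing" should mean in general.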

Putting same income for same groupID

In my data, income was asked of only one person in each household.
householdID memberID income
1 1 4
2 2 .
1 2 .
2 3 .
2 1 3
But obviously, I need to fill in the missing values like this:
householdID memberID income
1 1 4
2 2 3
1 2 4
2 3 3
2 1 3
How can I do this in Stata?
This is an elementary application of by:
bysort householdID (income) : replace income = income[1] if missing(income)
See this FAQ for related material.
A more circumspect approach would check that at most one non-missing value has been supplied for each household:
bysort householdID (income) : gen OK = missing(income) | (income == income[1])
list if !OK
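Both commands rest on the fact that sorting on income within householdID puts non-missing values first, so income[1] is the household's reported income whenever one exists. A small Python sketch of the fill plus the consistency check, offered only as an illustration: fill_income is a hypothetical name, and None stands in for Stata's missing value.

```python
def fill_income(rows):
    """rows: list of (householdID, memberID, income) tuples; income may be None."""
    known = {}
    for household, _, income in rows:
        if income is not None:
            known.setdefault(household, set()).add(income)
    # households reporting more than one distinct income need a closer look
    conflicts = sorted(h for h, values in known.items() if len(values) > 1)
    filled = []
    for household, member, income in rows:
        if income is None and household in known:
            # smallest non-missing value, like income[1] after the sort
            income = min(known[household])
        filled.append((household, member, income))
    return filled, conflicts

have = [(1, 1, 4), (2, 2, None), (1, 2, None), (2, 3, None), (2, 1, 3)]
filled, conflicts = fill_income(have)
incomes = [income for _, _, income in filled]
print(incomes, conflicts)
```

For the example data every household reports a single income, so the conflict list is empty and the filled column matches the desired output.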