Create a running counter based on ID and date - sas

I have 3 variables and a counter has to be created based on them.
Input:
ID window start window end
1 29oct20 12mar21
1 31oct20 08Feb21
1 31oct21 08feb21
1 31oct21 08feb21
2 06Nov20 11Apr21
2 06Nov20 11Apr21
2 27Nov20 01Apr19
Expected output:
ID window start window end priority_count
1 29oct20 12mar21 1
1 31oct20 08Feb21 2
1 31oct21 08feb21 2
1 31oct21 08feb21 2
2 06Nov20 11Apr21 1
2 06Nov20 11Apr21 1
2 27Nov20 01Apr19 2
So for every ID a new count should start once a new date comes.
I have been using this code
data want;
set have;
by ID window_start window_end;
if first.ID and first.window_start and first.window_endthen priority_count=1;
else priority_count+1;
run;
But it gives:
priority_count
1
2
3
4
1
2
3

Not sure if those are typos but there are several observations for which window_start is after window_end.
Using the LAG function
data want;
set have;
by id;
_lag=lag(window_start);
if first.id then priority_count=1;
else do;
if window_start ne _lag then
priority_count + 1;
end;
drop _lag;
run;
ID window_start window_end priority_count
1 29OCT2020 12MAR2021 1
1 31OCT2020 08FEB2021 2
1 31OCT2020 08FEB2021 2
1 31OCT2020 08FEB2021 2
2 06NOV2020 11APR2021 1
2 06NOV2020 11APR2021 1
2 27NOV2020 01APR2019 2

I think you're on the right track but need a slight modifications on your IF statements to reflect the logic.
Set to 0 at first of each ID
Increment if the window_end changes (or window_start since they're consistent in your example). Setting it to 0 initially means you can increment without worrying if it's the first or not.
data want;
set have;
by ID window_start window_end;
if first.ID then priority_count=0;
if first.window_end then priority_count+1;
run;

Related

How do I find first row of last group in SAS, where ordering matters?

I'd like to ask help in this, as I am new to SAS, but a PROC SQL approach is usable as well.
My dataset has IDs, a time variable, and a flag. After I sort by id and time, I need to find the first flagged observation of the last flagged group/streak. As in:
ID TIME FLAG
1 2 1
1 3 1
1 4 1
1 5 0
1 6 1
1 7 0
1 8 1
1 9 1
1 10 1
2 2 0
2 3 1
2 4 1
2 5 1
2 6 1
2 7 1
Here I want my script to return the row where time is 8 for ID 1, as it is the first observation from the last "streak", or flagged group. For ID 2 it should be where time is 3.
Desired output:
ID TIME FLAG
1 8 1
2 3 1
I'm trying to wrap my head around using first. and last. here, but I suppose the problem here is that I view temporally displaced flagged groups/streaks as different groups, while SAS looks at them as they are only separated by flag, so a simple "take first. from last." is not sufficient.
I was also thinking of collapsing the flags to a string and using a regex lookahead, but I couldn't come up with either the method or the pattern.
I would just code a double DOW loop. The first will let you calculate the observation for this ID that you want to output and the second will read through the records again and output the selected observation.
You can use the NOTSORTED keyword on the BY statement to have SAS calculate the FIRST.FLAG variable.
data have;
input ID TIME FLAG;
cards;
1 2 1
1 3 1
1 4 1
1 5 0
1 6 1
1 7 0
1 8 1
1 9 1
1 10 1
2 2 0
2 3 1
2 4 1
2 5 1
2 6 1
2 7 1
;
data want;
do obs=1 by 1 until(last.id);
set have;
by id flag notsorted;
if first.flag then want=obs;
end;
do obs=1 to obs;
set have;
if obs=want then output;
end;
drop obs want;
run;
Loop through the dataset by id. Use the lag function to look at the current and previous value of flag. If the current value is 1 and the previous value is 0, or it's the first observation for that ID, write the value of time to a retained variable. Only output the last observation for each id. The retained variable should contain the time of the first flagged observation of the last flagged group:
data result;
set have;
by id;
retain firstflagged;
prevflag = lag(flag);
if first.id and flag = 1 then firstflagged = time;
else if first.id and flag = 0 then firstflagged = .;
else if flag = 1 and prevflag = 0 then firstflagged = time;
if last.id then output;
keep id firstflagged flag;
rename firstflagged = time;
run;

Needing to retain Lab category tests based on individual positive test result

Hello so this is a sample of my data (There is an additional column of LBCAT =URINALYSIS for those panel of tests)
I've been asked to only include the panel of tests where LBNRIND is populated for any of those tests and the rest to be removed. Some subjects have multiple test results at different visit timepoints and others only have 1.I can't utilise a simple where LBNRIND ne '' in the data step because I need the entire panel of Urinalysis tests and not just that particular test result. What would be the best approach here? I think transposing the data would be too messy but maybe putting the variables in an array/macro and utilising a do loop for those panel of tests?.
Update:I've tried this code but it doesn't keep the corresponding tests for where lb_nrind >0. If I apply the sum(lb_nrind > '' ) the same when applying lb_nrind > '' to the having clause
*proc sql;
*create table want as
select * from labUA
group by ptno and day and lb_cat
having sum(lb_nrind > '') > 0 ;
data want2;
do _n_ = 1 by 1 until (last.ptno);
set labUA;
by ptno period day hour ;
if not flag_group then flag_group = (lb_nrind > '');
end;
do _n_ = 1 to _n_;
set want;
if flag_group then output;
end;
drop flag_group; run;*
You can use a SQL HAVING clause to retain rows of a group meeting some aggregate condition. In your case that group might be a patientid, panelid and condition at least one LBNRIND not NULL
Example:
Consider this example where a group of rows is to be kept only if at least one of the rows in the group meets the criteria result7=77
Both code blocks use the SAS feature that a logical evaluation is 1 for true and 0 for false.
SQL
data have;
infile datalines missover;
input id test $ parm $ result1-result10;
datalines;
1 A P 1 2 . 9 8 7 . . . .
1 B Q 1 2 3
1 C R 4 5 6
1 D S 8 9 . . . 6 77
1 E T 1 1 1
1 F U 1 1 1
1 G V 2
2 A Z 3
2 B K 1 2 3 4 5 6 78
2 C L 4
2 D M 9
3 G N 8
4 B Q 7
4 D S 6
4 C 1 1 1 . . 5 0 77
;
proc sql;
create table want as
select * from have
group by id
having sum(result7=77) > 0
;
DOW Loop
data want;
do _n_ = 1 by 1 until (last.id);
set have;
by id;
if not flag_group then flag_group = (result7=77);
end;
do _n_ = 1 to _n_;
set have;
if flag_group then output;
end;
drop flag_group;
run;

SAS and do loop

I'm writing a program in SAS.
Here's the dataset I have:
id huuse days
1 0 4
1 0 3
1 1 12
1 1 1
1 2 15
2 1 13
2 0 16
2 1 18
2 0 44
For each ID, I want to delete the record if variable huuse ne 1, until I get to the first huuse=1. Then I want to keep that record and all subsequent records for that id, no matter what value huuse is. So for id=1, I want to delete the first two records than keep all records for id=1 starting with the 3rd record. For id=2, the first record has huuse=1, so I want to keep all records for id=2.
The data set I want should look like this:
id huuse days
1 0 4
1 0 3
1 1 12
1 1 1
1 2 15
2 1 13
2 0 16
2 1 18
2 0 44
I tried this code, but it removes all records that have huuse ne 1.
data want;
set have;
by id;
do until (huuse=1);
if huuse = 1 then LEAVE;
if huuse ne 1 then DELETE;
END;
run;
I've tried several variations of do loops, but they all do the same thing.
The DATA step is a program with an implicit loop that reads every record of the data set specified in the SET statement. Any program data vector (pdv) variables not coming from the data set are, by default, reset to missing at the top of the implicit loop. You change that behavior using a RETAIN statement to name variables that should not get reset.
So, in your problem you have two situations when a tracking variable is needed. The variable will track the state of the condition Have I seen huuse=1 yet in this group ?. Call this variable one_flag
RETAIN one_flag; so you control when it's value changes
At the start of a BY group one_flag needs to be reset to false (0)
When huuse is first seen as 1 set the flag to true (1)
Example:
data want(drop=one_flag);
set have;
by id;
retain one_flag 0;
if first.id then one_flag = 0;
if not one_flag and huuse = 1 then one_flag = 1;
if one_flag then OUTPUT; * want all rows in group starting at first huuse=1;
run;
You can place the SET and BY statement inside an explicit DO and that changes the operating behavior of the program, especially if the explicit loop is terminated according to a LAST.<var> automatic variable. Such a loop is commonly called a DOW loop by SAS programmers. There is no phrase DOW loop in the SAS documentation.
Example:
data want;
do until (last.id);
set have;
by id;
if not one_flag and huuse=1 then one_flag = 1;
if one_flag then OUTPUT; * want all rows in group starting at first huuse=1;
end;
run;
Because the looping is explicit and never reaches the TOP of the program with in the loop, there is no need to RETAIN the flag variable, nor reset it. Program variables that are not retained are reset automatically at the top of the program, and the top of the program is only reached at the start of the BY group. Learn more about this programming construct in the SGF 2013 paper "The Magnificent DO", Paul M. Dorfman
Your source and result are same :-)
But if I understood your question correctly the solution is quite simple with a retain solution. I add 2 lines to the example to make it clear that I understood correctly.
The code with example table:
data test;
id=1;huuse=0;days=4;output;
id=1;huuse=0;days=3;output;
id=1;huuse=1;days=12;output;
id=1;huuse=1;days=1;output;
id=1;huuse=2;days=15;output;
id=2;huuse=1;days=13;output;
id=2;huuse=0;days=16;output;
id=2;huuse=1;days=18;output;
id=2;huuse=0;days=44;output;
id=3;huuse=0;days=1;output;
id=3;huuse=1;days=2;output;
run;
data test_output;
set test;
retain keep_id -1;
if (keep_id ne id and huuse ne 0) then keep_id=id;
if keep_id = id then output;
run;
/* the results:
id huuse days
1 1 12 1
1 1 1 1
1 2 15 1
2 1 13 2
2 0 16 2
2 1 18 2
2 0 44 2
3 1 2 3
*/

How to sum value from next row by group using SAS?

I want to create a column in my dataset that calculates the sum of the current row and next row for another field. There are several groups within the data, and I only want to take the sum of the next row if the next row is part of the current group. If a row is the last record for that group I want to fill with a null value.
I'm referencing reading next observation's value in current observation, but still can't figure out how to obtain the solution I need.
For example:
data have;
input Group ID Salary;
cards;
10 1 1
10 2 2
10 3 2
10 4 1
11 1 2
11 2 2
11 3 1
11 4 1
;
run;
The result I want to obtain here is this:
data want;
input Group ID Salary Sum;
cards;
10 1 1 3
10 2 2 4
10 3 2 3
10 4 1 .
11 1 2 4
11 2 2 3
11 3 1 2
11 4 1 .
;
run;
Similar to Tom's answer, but using a 'look-ahead' merge (without a by statement, and firstobs=2) :
data want ;
merge have
have (firstobs=2
keep=Group Salary
rename=(Group=NextGroup Salary=NextSalary)) ;
if Group = NextGroup then sum = sum(Salary,NextSalary) ;
drop Next: ;
run ;
Use BY group processing and a second SET statement that skips the first observation.
data want ;
set have end=eof;
by group ;
if not eof then set have (keep=Salary rename=(Salary=Sum) firstobs=2);
if last.group then Sum=.;
else sum=sum(sum,salary);
run;
I found a solution using proc expand that produced what I needed:
proc sort data = have;
by Group ID;
run;
proc expand data=have out=want method=none;
by Group;
convert Salary = Next_Sal / transformout=(lead 1);
run;
data want(keep=Group ID Salary Sum);
set want;
Sum = Salary + Next_Sal;
run;

Creating a dummy variable for ``switching''

I'm working on a project in SAS and I wanted to create a dummy variable that accounted for ``preferences in medicine''. I have a long data-set, by time period, of individuals taking either medicine type 1 or type 2. For my research, I want to create a variable to represent if individuals who take type 1 medicine, then switched to type 2, but went back to type 1. I am unconcerned with the time interval that the individual was on the medication for, just that they followed this pattern.
id month type
1 1 2
1 2 2
1 3 2
2 1 1
2 2 2
2 3 1
...
I have more months, but just wanted to provide something to elucidate what I'm trying to get. Basically, I want to tally those subjects who are like subject 2.
well, nothing fancy, but it works for me:
DATA LONG1;
input id month type;
cards;
1 1 2
1 2 2
1 3 2
1 4 2
1 5 2
1 6 2
1 7 2
1 8 2
1 9 2
1 10 2
2 1 1
2 2 1
2 3 1
2 4 1
2 5 1
2 6 1
2 7 1
2 8 1
2 9 1
2 10 1
3 1 1
3 2 1
3 3 1
3 4 2
3 5 1
3 6 1
3 7 1
3 8 1
3 9 1
3 10 1
;
Proc Print; run;
* 1) make a wide dataset by deconstructing the initial long data by month & rejoining by id
2) then use if/then statements to create your dummy variable,
3) then merge the dummy variable back into your long dataset using ID;
DATA month1; set long1; where month=1; rename month=month_1 type=type_1; Proc Sort; by ID; run;
DATA month2; set long1; where month=2; rename month=month_2 type=type_2; Proc Sort; by ID; run;
DATA month3; set long1; where month=3; rename month=month_3 type=type_3; Proc Sort; by ID; run;
DATA month4; set long1; where month=4; rename month=month_4 type=type_4; Proc Sort; by ID; run;
DATA month5; set long1; where month=5; rename month=month_5 type=type_5; Proc Sort; by ID; run;
DATA month6; set long1; where month=6; rename month=month_6 type=type_6; Proc Sort; by ID; run;
DATA month7; set long1; where month=7; rename month=month_7 type=type_7; Proc Sort; by ID; run;
DATA month8; set long1; where month=8; rename month=month_8 type=type_8; Proc Sort; by ID; run;
DATA month9; set long1; where month=9; rename month=month_9 type=type_9; Proc Sort; by ID; run;
DATA month10; set long1; where month=10; rename month=month_10 type=type_10; Proc Sort; by ID; run;
DATA WIDE;
merge month1 month2 month3 month4 month5 month6 month7 month8 month9 month10; by ID;
if (type_1=1 and type_2=1 and type_3=1 and type_4=1 and type_5=1
and type_6=1 and type_7=1 and type_8=1 and type_9=1 and type_10=1) or
(type_1=2 and type_2=2 and type_3=2 and type_4=2 and type_5=2
and type_6=2 and type_7=2 and type_8=2 and type_9=2 and type_10=2)
then switch='no '; else switch='yes '; keep ID switch; run;
DATA LONG2;
merge wide long1; by ID;
Proc Print; run;
btw: also go to the SAS listserv, they love stuff like this:
http://www.listserv.uga.edu/archives/sas-l.html
This worked on the limited data I used:
DATA Have;
input id month type;
datalines;
1 1 1
1 2 1
1 3 1
1 4 1
1 5 1
2 1 1
2 2 2
2 3 1
2 4 1
2 5 1
3 1 1
3 2 1
3 3 2
3 4 2
3 5 1
4 1 2
4 2 2
4 3 2
4 4 2
4 5 2
;
Data Temp(keep=id dummy);
length dummy $15;
retain Start Type2 dummy;
set Have;
by id;
if first.id then Do;
Start=0;
Type2=0;
Dummy="";
end;
If Type=1 then do;
If Start=0 then Start=1;
else if Start=1 and Type2=1 then Dummy="Switch-er-Roo";
end;
else do;
if Start=1 then Type2=1;
end;
if last.id then output;
run;
Data Want;
merge temp(in=a) have(in=b);
by id;
run;
I prefer #CarolinaJay65 approach, it's a lot cleaner and just involves one pass of the data. If all you are interested in are the patients who start and finish on Type1, but use Type2 at some point, then the code can be simplified slightly. The following code (using #CarolinaJay65 source data) will only output the patient_id's matching this criteria.
data switch_id (keep=id);
set have;
by id month;
retain switch;
if first.id then do;
call missing(switch);
if type=1 then switch=0;
end;
else if not missing(switch) and type=2 then switch=1;
if last.id and type=1 and switch=1 then output;
run;
If you just wanted the number of patients who match the criteria then you could tweak this code further.
data switch (keep=count);
set have end=final;
by id month;
retain switch count 0;
if first.id then do;
call missing(switch);
if type=1 then switch=0;
end;
else if not missing(switch) and type=2 then switch=1;
if last.id and type=1 and switch=1 then count+1;
if final then output;
run;
I think the following should work:
DATA Have;
input id month type;
if _n_ ^= 1 and id ^= lag(id) then diftype = .;
else diftype = dif(type);
datalines;
1 1 1
1 2 1
1 3 1
1 4 1
1 5 1
2 1 1
2 2 2
2 3 1
2 4 1
2 5 1
3 1 1
3 2 1
3 3 2
3 4 2
3 5 1
4 1 2
4 2 2
4 3 2
4 4 2
4 5 2
;
proc sql;
select case when max(diftype) = 1 and min(diftype) = -1 then 1 else 0 end as flag, * from have
group by id
;
quit;