Group By Interpolation Based on the Previous Row - sas

The goal is to add a new row whenever there is a gap in the date variable between two consecutive rows with the same id.
If a gap occurs, the earlier row should be duplicated, except that the date of the copy should not equal the date of the earlier row; it should be incremented by one day.
Everything needs to be grouped by id, and I need to achieve this without using the expand function.
data sample;
input id date numeric_feature character_feature $;
informat date yymmdd10.;
datalines;
1 2020-01-01 5 A
1 2020-01-02 3 Z
1 2020-01-04 2 D
1 2020-01-05 7 B
2 2020-01-01 4 V
2 2020-01-03 1 B
2 2020-01-05 9 F
;
data sample;
set sample;
format date yymmdd10.;
run;
The desired result:
data sample;
input id date numeric_feature character_feature $;
informat date yymmdd10.;
datalines;
1 2020-01-01 5 A
1 2020-01-02 3 Z
1 2020-01-03 3 Z
1 2020-01-04 2 D
1 2020-01-05 7 B
2 2020-01-01 4 V
2 2020-01-02 4 V
2 2020-01-03 1 B
2 2020-01-04 1 B
2 2020-01-05 9 F
;
data sample;
set sample;
format date yymmdd10.;
run;

You can perform a 1:1 self merge with the second self starting at row 2 in order to provide a lead value. A 1:1 merge does not use a BY statement.
Example:
data have;
input id date numeric_feature character_feature $;
informat date yymmdd10.;
format date yymmdd10.;
datalines;
1 2020-01-01 5 A
1 2020-01-02 3 Z
1 2020-01-04 2 D
1 2020-01-05 7 B
2 2020-01-01 4 V
2 2020-01-03 1 B
2 2020-01-05 9 F
;
data want;
* 1:1 merge without by statement;
merge
have /* start at row 1 */
have ( firstobs=2 /* start at row 2 for lead values */
keep=id date /* more data set options that prepare the lead */
rename = ( id=nextid
date=nextdate
))
;
output;
flag = '*'; /* marker for filled in dates */
if id = nextid then
do date=date+1 to nextdate-1;
output;
end;
drop next:;
run;
In the result, the filled-in dates are flagged with '*' in the flag variable.

To "look ahead" you can re-read the same dataset starting from the second observation. SAS will stop when you read past the end of the input so add an extra empty observation.
data sample;
input id date numeric_feature character_feature $;
informat date yymmdd.;
format date yymmdd10.;
datalines;
1 2020-01-01 5 A
1 2020-01-02 3 Z
1 2020-01-04 2 D
1 2020-01-05 7 B
2 2020-01-01 4 V
2 2020-01-03 1 B
2 2020-01-05 9 F
;
data want;
set sample;
by id;
set sample(firstobs=2 keep=date rename=(date=next_date)) sample(obs=1 drop=_all_);
output;
if not last.id then do date=date+1 to next_date-1; output; end;
run;
Results:
                          numeric_    character_
Obs    id          date    feature     feature       next_date
  1     1    2020-01-01       5           A         2020-01-02
  2     1    2020-01-02       3           Z         2020-01-04
  3     1    2020-01-03       3           Z         2020-01-04
  4     1    2020-01-04       2           D         2020-01-05
  5     1    2020-01-05       7           B         2020-01-01
  6     2    2020-01-01       4           V         2020-01-03
  7     2    2020-01-02       4           V         2020-01-03
  8     2    2020-01-03       1           B         2020-01-05
  9     2    2020-01-04       1           B         2020-01-05
 10     2    2020-01-05       9           F                  .
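To see why the extra empty observation matters, here is a small sketch on a cut-down version of the data. The data set names lead_short and lead_padded are made up for illustration. Without the padding, the step ends as soon as the second SET statement runs out of observations, so the last input row is never written.

```sas
data sample;
input id date :yymmdd10.;
format date yymmdd10.;
datalines;
1 2020-01-01
1 2020-01-02
2 2020-01-01
;

/* No padding: on the third iteration the second SET reads past  */
/* end-of-file, the step stops, and only two rows are written.   */
data lead_short;
  set sample;
  set sample(firstobs=2 keep=date rename=(date=next_date));
run;

/* Padding: one extra observation with no variables lets the     */
/* third row through, with next_date missing on that row.        */
data lead_padded;
  set sample;
  set sample(firstobs=2 keep=date rename=(date=next_date))
      sample(obs=1 drop=_all_);
run;
```

Comparing the observation counts of the two outputs in the log (two versus three) shows the effect.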

Related

SAS: assign a value to a variable based on characteristics of other variables

I'd like to fill an empty field with a value taken from other records. Here is my dataset:
data temp;
input ID date :$10. type typea :$10.;
datalines;
1 10/11/2006 1 a
2 10/12/2006 2 a
2 . 2 b
3 20/01/2007 5 p
4 . 1 r
5 11/09/2008 1 ca
5 . 1 cb
5 . 2 b
;
run;
My goal is the following: for every record where "date" is empty, assign the date of the record that has the same ID and the same type but a different typea. If no other record meets these criteria, leave the date empty. So the output should be:
data temp;
input ID date :$10. type typea :$10.;
datalines;
1 10/11/2006 1 a
2 10/12/2006 2 a
2 10/12/2006 2 b
3 20/01/2007 5 p
4 . 1 r
5 11/09/2008 1 ca
5 11/09/2008 1 cb
5 . 2 b
;
run;
I tried something like the following, based on another answer on SO (SAS: get the first value where a condition is verified by group), but it doesn't work:
proc sort data=temp;
by ID type typea ;
run;
data temp;
set temp;
by ID type typea ;
if cat(first.ID, first.type, first.typea) then date_store=date;
if cat(ID eq ID and type ne type and typea eq typea) then do;
date_change_type1to2=date_store;
end;
run;
Do you have any hints? Thanks a lot!
You could use the UPDATE statement to carry the DATE values forward within a group.
data have;
input ID type typea :$10. date :yymmdd. ;
format date yymmdd10.;
datalines;
1 1 a 2006-11-10
2 2 a 2006-12-10
2 2 b .
3 5 p 2007-01-20
4 1 r .
5 1 ca 2008-09-11
5 1 cb .
5 2 b .
;
data want;
update have(obs=0) have;
by id type ;
output;
run;
If there are also missing values of TYPEA then those will also be carried forward. If you don't want that to happen you could re-read just those variables after the update.
data want;
update have(obs=0) have;
by id type ;
set have(keep=typea);
output;
run;
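The difference between the two steps only shows up when TYPEA itself is missing, which the sample data never exercises. A minimal sketch, using a made-up two-row data set (demo, carried and reread are illustrative names only):

```sas
data demo;
input ID type typea :$10. date :yymmdd10.;
format date yymmdd10.;
datalines;
1 1 a 2006-11-10
1 1 . .
;

/* UPDATE alone: the missing TYPEA and DATE on row 2 are both    */
/* filled from row 1.                                            */
data carried;
  update demo(obs=0) demo;
  by id type;
  output;
run;

/* Re-reading TYPEA afterwards restores its original (blank)     */
/* value on row 2, while DATE is still carried forward.          */
data reread;
  update demo(obs=0) demo;
  by id type;
  set demo(keep=typea);
  output;
run;
```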

Needing to retain Lab category tests based on individual positive test result

Hello, this is a sample of my data (there is an additional column, LBCAT = URINALYSIS, for this panel of tests).
I've been asked to include only the panels of tests where LBNRIND is populated for at least one test, and to remove the rest. Some subjects have multiple test results at different visit timepoints and others have only one. I can't use a simple where LBNRIND ne '' in the data step, because I need the entire panel of urinalysis tests and not just the particular test result. What would be the best approach here? I think transposing the data would be too messy, but maybe I could put the variables in an array/macro and use a do loop over the panel of tests?
Update: I've tried the code below, but it doesn't keep the corresponding tests where lb_nrind is populated. I get the same result whether I use sum(lb_nrind > '') > 0 or just lb_nrind > '' in the having clause.
*proc sql;
*create table want as
select * from labUA
group by ptno and day and lb_cat
having sum(lb_nrind > '') > 0 ;
data want2;
do _n_ = 1 by 1 until (last.ptno);
set labUA;
by ptno period day hour ;
if not flag_group then flag_group = (lb_nrind > '');
end;
do _n_ = 1 to _n_;
set want;
if flag_group then output;
end;
drop flag_group; run;*
You can use a SQL HAVING clause to retain the rows of a group that meets some aggregate condition. In your case the group might be defined by patient id and panel id, and the condition is that at least one LBNRIND in the group is not null.
Example:
Consider this example where a group of rows is to be kept only if at least one of the rows in the group meets the criteria result7=77
Both code blocks use the SAS feature that a logical evaluation is 1 for true and 0 for false.
SQL
data have;
infile datalines missover;
input id test $ parm $ result1-result10;
datalines;
1 A P 1 2 . 9 8 7 . . . .
1 B Q 1 2 3
1 C R 4 5 6
1 D S 8 9 . . . 6 77
1 E T 1 1 1
1 F U 1 1 1
1 G V 2
2 A Z 3
2 B K 1 2 3 4 5 6 78
2 C L 4
2 D M 9
3 G N 8
4 B Q 7
4 D S 6
4 C 1 1 1 . . 5 0 77
;
proc sql;
create table want as
select * from have
group by id
having sum(result7=77) > 0
;
quit;
DOW Loop
data want;
do _n_ = 1 by 1 until (last.id);
set have;
by id;
if not flag_group then flag_group = (result7=77);
end;
do _n_ = 1 to _n_;
set have;
if flag_group then output;
end;
drop flag_group;
run;
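The 1/0 behaviour of logical expressions that both blocks rely on can be checked in isolation. A tiny sketch (the variable names are arbitrary):

```sas
data _null_;
  result7 = 77;
  is_match  = (result7 = 77);   /* true comparison evaluates to 1  */
  not_match = (result7 = 78);   /* false comparison evaluates to 0 */
  put is_match= not_match=;     /* log shows: is_match=1 not_match=0 */
run;
```

Summing such an expression over a group therefore counts the rows that satisfy the condition, which is why sum(result7=77) > 0 means "at least one row matches".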

Generating Unique ID for same group

I have data set,
CustID Rating
1 A
1 A
1 B
2 A
2 B
2 C
2 D
3 X
3 X
3 Z
4 Y
4 Y
5 M
6 N
7 O
8 U
8 T
8 U
And expecting Output
CustID Rating ID
1 A 1
1 A 1
1 B 1
2 A 1
2 B 2
2 C 3
2 D 4
3 X 1
3 X 1
3 Z 2
4 Y 1
4 Y 1
5 M 1
6 N 1
7 O 1
8 U 1
8 T 2
8 U 1
In the solution below, I select the distinct ratings into a macro variable that is then used in an array statement. Each rating in the data is searched for among these distinct values, and the position of the match is returned as the id.
You can avoid the macro in this case by replacing the %sysfunc call with 3 (the number of distinct ratings, if you know it beforehand). The %sysfunc call works the number out for you when you don't know it.
data have;
input CustomerID Rating $;
cards;
1 A
1 A
1 B
2 A
2 A
3 A
3 A
3 B
3 C
;
run;
proc sql noprint;
select distinct quote(strip(rating)) into :list separated by ' '
from have
order by 1;
%put &list.;
quit;
If you know the number beforehand:
data want;
set have;
array num(3) $ _temporary_ (&list.);
do i = 1 to dim(num);
if findw(rating,num(i),'tips')>0 then id = i;
end;
drop i;
run;
Otherwise:
%macro Y;
data want;
set have;
array num(%sysfunc(countw(&list., %str( )))) $ _temporary_ (&list.);
do i = 1 to dim(num);
if findw(rating,num(i),'tips')>0 then id = i;
end;
drop i;
run;
%mend;
%Y;
The output:
Obs CustomerID Rating id
1 1 A 1
2 1 A 1
3 1 B 2
4 2 A 1
5 2 A 1
6 3 A 1
7 3 A 1
8 3 B 2
9 3 C 3
Assuming the data is sorted by customerid and rating (as in the original unedited question), is the following what you want?
data want;
set have;
by customerid rating;
if first.customerid then
id = 0;
if first.rating then
id + 1;
run;

SAS - Split single column into two based on value of an ID column

I have data which is as follows.
data have;
input group replicate $ sex $ count;
datalines;
1 A F 3
1 A M 2
1 B F 4
1 B M 2
1 C F 4
1 C M 5
2 A F 5
2 A M 4
2 B F 6
2 B M 3
2 C F 2
2 C M 2
3 A F 5
3 A M 1
3 B F 3
3 B M 4
3 C F 3
3 C M 1
;
run;
I want to break the count column into two separate columns based on gender.
                            count_    count_
Obs    group    replicate   female     male
  1      1          A          3         2
  2      1          B          4         2
  3      1          C          4         5
  4      2          A          5         4
  5      2          B          6         3
  6      2          C          2         2
  7      3          A          5         1
  8      3          B          3         4
  9      3          C          3         1
This can be done by first creating two separate data sets for each level of sex and then performing a merge.
data just_female;
set have;
where sex = 'F';
rename count = count_female;
run;
data just_male;
set have;
where sex = 'M';
rename count = count_male;
run;
data want;
merge
just_female
just_male
;
by
group
replicate
;
keep
group
replicate
count_female
count_male
;
run;
Is there a less verbose way to do this which doesn't require the need to sort or explicitly drop/keep variables?
You can do this using proc transpose but you will need to sort the data. I believe this is what you're looking for though.
proc sort data=have;
by group replicate;
run;
The data is sorted so now you have your by-group for transposing.
proc transpose data=have out=want(drop=_name_) prefix=count_;
by group replicate;
id sex;
var count;
run;
proc print data=want;
run;
Then you get:
Obs group replicate count_F count_M
1 1 A 3 2
2 1 B 4 2
3 1 C 4 5
4 2 A 5 4
5 2 B 6 3
6 2 C 2 2
7 3 A 5 1
8 3 B 3 4
9 3 C 3 1
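If the data is already ordered by group and replicate, as the sample is, the three steps from the question can also be folded into one merge by using data set options. This is only a sketch of that variant, not something taken from the answers here; it assumes at most one F row and one M row per group/replicate pair, and it uses a shortened copy of the data.

```sas
data have;
input group replicate $ sex $ count;
datalines;
1 A F 3
1 A M 2
1 B F 4
1 B M 2
2 A F 5
2 A M 4
;

data want;
  /* Each copy of HAVE is filtered and renamed as it is read */
  merge have(where=(sex='F') rename=(count=count_female))
        have(where=(sex='M') rename=(count=count_male));
  by group replicate;
  drop sex;
run;
```

Unlike PROC TRANSPOSE with an ID statement, this names the new columns explicitly, but it has to be edited by hand if further levels of sex are added.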

Using a sas lookup table when the column number changes

I have two sas datasets,
Table 1                               Table 2
col1  col2  col3  col4  col5          a  b
   .     1     2     3     4          1  1
   1     5     8     6     1          1  4
   2     5     9     7     1          4  3
   3     6     9     7     1          2  1
   4     6     9     7     2          2  2
where table 1 is a lookup table for the values a and b in table 2, so that I can create a column c. In table 1, a corresponds to col1 and b to the first row (i.e. the new column c in table 2 should read 5, 1, 7, 5, 9). How can I achieve this in SAS? I was thinking of reading table 1 into a 2D array and then getting column c = array(a,b), but I can't get it to work.
Here's an IML solution, first, as I think this is really the 'best' solution for you - you're using a matrix, so use the matrix language. I'm not sure if there's a non-loop method - there may well be; if you want to find out, I would add the sas-iml tag to the question and see if Rick Wicklin happens by the question.
data table1;
input col1 col2 col3 col4 col5 ;
datalines;
. 1 2 3 4
1 5 8 6 1
2 5 9 7 1
3 6 9 7 1
4 6 9 7 2
;;;;
run;
data table2;
input a b;
datalines;
1 1
1 4
4 3
2 1
2 2
;;;;
run;
proc iml;
use table1;
read all var _ALL_ into table1[colname=varnames1];
use table2;
read all var _ALL_ into table2[colname=varnames2];
print table1;
print table2;
table3 = j(nrow(table2),3);
table3[,1:2] = table2;
do _i = 1 to nrow(table3);
table3[_i,3] = table1[table3[_i,1]+1,table3[_i,2]+1];
end;
print table3;
quit;
Here is the temporary array solution. It's not all that pretty. If speed is an issue you don't have to loop over the array to insert it, you can use direct memory access, but I don't want to do that unless speed is a huge issue (and if it is, you should use a better data structure first).
data table3;
set table2;
array _table1[4,4] _temporary_;
if _n_ = 1 then do;
do _i = 1 by 1 until (eof);
set table1(firstobs=2) nobs=_nrows end=eof;
array _cols col2-col5;
do _j = 1 to dim(_cols);
_table1[_i,_j] = _cols[_j];
end;
end;
end;
c = _table1[a,b];
keep a b c;
run;
Just use the POINT= option on a SET statement to pick the row. You can then use an ARRAY to pick the column.
data table1 ;
input col1-col4 ;
cards;
5 8 6 1
5 9 7 1
6 9 7 1
6 9 7 2
;
data table2 ;
input a b ;
cards;
1 1
1 4
4 3
2 1
2 2
;
data want ;
set table2 ;
p=a ;
set table1 point=p ;
array col col1-col4 ;
c=col(b);
drop col1-col4;
run;
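The POINT= step above assumes every a is a valid row number and every b a valid column number. A hedged sketch of one way to guard against out-of-range keys follows; the NOBS= variable nrows, the output name want_checked and the extra 9 1 row are assumptions added for illustration.

```sas
data table1;
input col1-col4;
cards;
5 8 6 1
5 9 7 1
6 9 7 1
6 9 7 2
;

data table2;
input a b;
cards;
1 4
4 3
9 1
;

data want_checked;
  set table2;
  /* nrows is set at compile time, so it can be tested before the SET runs */
  if 1 <= a <= nrows then do;
    p = a;
    set table1 point=p nobs=nrows;
    array col col1-col4;
    if 1 <= b <= dim(col) then c = col(b);
  end;
  /* c stays missing when a or b is out of range */
  drop col1-col4;
run;
```

With this input, c should be 1 and 7 for the first two rows and missing for the third, since table1 has no ninth row.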