SAS - Working with consecutive months?

From the sample data below, I'm trying to identify accounts (by ID and SEQ) where there is an occurrence of STATUS_DATE in at least 3 consecutive months. I've been messing with this for a while and I'm not at all sure how to tackle it.
Sample Data:
ID SEQ STATUS_DATE
11111 1 01/01/2014
11111 1 02/10/2014
11111 1 03/15/2014
11111 1 05/01/2014
11111 2 01/30/2014
22222 1 06/20/2014
22222 1 07/15/2014
22222 1 07/16/2014
22222 1 08/01/2014
22222 2 02/01/2014
22222 2 09/10/2014
What I need to return:
ID SEQ STATUS_DATE
11111 1 01/01/2014
11111 1 02/10/2014
11111 1 03/15/2014
22222 1 06/20/2014
22222 1 07/15/2014
22222 1 07/16/2014
22222 1 08/01/2014
Any help would be appreciated.

Here is one method:
data have;
input ID SEQ STATUS_DATE $12.;
datalines;
11111 1 01/01/2014
11111 1 02/10/2014
11111 1 03/15/2014
11111 1 05/01/2014
11111 2 01/30/2014
22222 1 06/20/2014
22222 1 07/15/2014
22222 1 07/16/2014
22222 1 08/01/2014
22222 2 02/01/2014
22222 2 09/10/2014
;
run;
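data grouped (keep = id seq status_date group) groups (keep = group2);
set have;
sasdate = input(status_date, mmddyy12.);
month = month(sasdate);
year = year(sasdate);
pdate = intnx('month', sasdate, -1);
/* Call LAG unconditionally so that each LAG queue is updated on every row */
prev_month = lag(month);
prev_year = lag(year);
/* Same month as the previous row: stay in the current group */
if prev_year = year and prev_month = month then group+0;
/* Month immediately after the previous row's month: extend the run */
else if prev_year = year(pdate) and prev_month = month(pdate) then count+1;
/* Otherwise start a new group */
else do;
group+1;
count = 0;
end;
prev_count = lag(count);
/* A new group has just started and the previous one spanned 3+ months */
if count = 0 and prev_count > 1 then do;
group2 = group-1;
output groups;
end;
output grouped;
run;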
data want (keep = id seq status_date);
merge grouped groups (in=a rename=(group2=group));
by group;
if a;
run;
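Basically, I give observations the same group number when they fall in the same or consecutive months, and I also create a data set (GROUPS) listing the numbers of the groups that span at least three consecutive months. I then merge the two data sets and keep only the observations whose group appears in GROUPS. Note that a qualifying group is only detected when the next group starts, so a qualifying run at the very end of the data would be missed, and that the code does not check ID or SEQ explicitly.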
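The same idea can be sketched so that it respects the ID and SEQ boundaries and also catches a run at the end of the data. This is only a sketch, not the answer's method: it assumes HAVE is already sorted by ID, SEQ and date, and the names RUNS, RUN_ID, PREV and GAP are made up for illustration.

```sas
/* Sketch: number the runs of consecutive calendar months within each ID/SEQ. */
/* Assumes HAVE is sorted by ID, SEQ and date (as in the sample data).        */
data runs;
  set have;
  by id seq;
  sasdate = input(status_date, mmddyy10.);
  prev = lag(sasdate);                      /* LAG is called on every row */
  if first.seq then run_id + 1;             /* new account: start a new run */
  else do;
    gap = intck('month', prev, sasdate);    /* month boundaries crossed since the previous row */
    if gap > 1 then run_id + 1;             /* a month was skipped: start a new run */
  end;
run;

/* Keep the rows of every run whose first and last dates are at least two */
/* month boundaries apart, i.e. the run covers three or more months.      */
proc sql;
  create table want as
  select id, seq, status_date
  from runs
  group by run_id
  having intck('month', min(sasdate), max(sasdate)) >= 2
  order by id, seq, sasdate;
quit;
```

PROC SQL remerges the summary statistics with the detail rows here, so the log will contain a note about remerging. Because each run has no skipped months by construction, comparing the first and last dates is enough to establish its length.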

How about the following? You may want to sort by month first, if that is what you need.
data want;
do _n_ = 1 by 1 until(last.id);
set survey;
by id;
if _n_ <=3 then output;
end;
run;

Related

Summation for more than once of the dataset using proc sql

My data
data mydata;
input
Category $
Item
type
amount;
datalines;
A 1 100 11111
A 2 900 11111
A 3 123 11111
B 1 113 11111
B 2 900 11111
C 1 111 11111
C 2 900 11111
;
My attempt
proc sql;
create table want as
select *, sum(amount and item <> 900) as without900, sum(amount) as total from mydata
group by category
;
quit;
Result
Category Item type amount without900 total
A 3 123 11111 3 33333
A 1 100 11111 3 33333
A 2 900 11111 3 33333
B 2 900 11111 2 22222
B 1 113 11111 2 22222
C 2 900 11111 2 22222
C 1 111 11111 2 22222
Expected result
Category Item type amount without900 total
A 3 123 11111 22222 33333
A 1 100 11111 22222 33333
A 2 900 11111 22222 33333
B 2 900 11111 11111 22222
B 1 113 11111 11111 22222
C 2 900 11111 11111 22222
C 1 111 11111 11111 22222
I know this can easily be achieved by creating another table and then using a left join. I wonder how to get the expected result using as few PROC SQL steps as possible. Thank you very much.
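You are comparing item to 900, when you should be comparing type. In addition, the expression amount and item <> 900 is a logical expression that evaluates to 1 or 0, so summing it counts rows instead of adding up amounts. The conditional sum can be computed with a CASE expression inside SUM().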
Example
data mydata;
input Category $ Item type amount;
datalines;
A 1 100 11111
A 2 900 11111
A 3 123 11111
B 1 113 11111
B 2 900 11111
C 1 111 11111
C 2 900 11111
;
proc sql;
create table want as
select
*
, sum(case when type ne 900 then amount end) as without900
, sum(amount) as total
from
mydata
group by
category
;
quit;
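A possible shorter variant, offered as a sketch rather than as part of the original answer: a comparison in SAS evaluates to 1 or 0, so multiplying amount by the comparison zeroes out the rows to exclude. The table name want2 is made up for illustration.

```sas
proc sql;
  create table want2 as
  select *
       , sum(amount * (type ne 900)) as without900  /* (type ne 900) is 1 or 0 */
       , sum(amount) as total
  from mydata
  group by category;
quit;
```

One difference from the CASE version: if every row in a category has type 900, this variant returns 0 for without900, whereas the CASE expression without an ELSE returns a missing value.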

Last Changed row +1 in a group

I've got grouped data with a flag set any time a name changes within a group. I can pull the last two or first two observations within the group, but I am struggling to figure out how to pull the last observation with a name change AND the row right after it.
The code below gives me the first or last two observations per group, depending on how I sort the data.
DATA LastTwo;
SET WhatIveGot;
count + 1;
BY group_ID /*data pre sorted*/;
IF FIRST.group_ID THEN count=1;
IF count<=2 THEN OUTPUT;
RUN;
What I need is the LAST observation with a name change AND the following row.
group_ID NAME DATE NAME_CHange
1 TOM 1/1/19 0
1 Jill 1/30/19 1
1 Jill 1/20/19 0
1 Bob 2/10/19 1
1 Bob 2/30/19 0
2 TOM 2/1/19 0
2 Jill 2/30/19 1
2 Jill 2/20/19 0
2 Jim 3/10/19 1
2 Jim 3/30/19 0
2 Jim 4/15/19 0
3 Joe 2/20/19 0
3 Kim 3/10/19 1
3 Kim 3/30/19 0
3 Ken 4/15/19 1
4 Tim 3/10/19 0
4 Tim 3/30/19 0
The desired output:
group_ID NAME DATE NAME_CHange
1 Bob 2/10/19 1
1 Bob 2/30/19 0
2 Jim 3/10/19 1
2 Jim 3/30/19 0
3 Ken 4/15/19 1
The cases for Group_ID 2 and 3 are the roadblock. The data is already sorted by date.
Thank you for any help in advance
Use DOW processing to determine where the last name change was. Apply that information in a succeeding loop.
Example:
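data want;
/* First pass over the group: remember the row index of the last name change. */
/* The first row of a group is not a change, so it is excluded. */
do _n_ = 1 by 1 until (last.group_ID);
set have;
by group_ID name notsorted;
if first.name and not first.group_ID then _index_of_last_name_change = _n_;
end;
/* Second pass over the same rows: output that row and the one after it, if any. */
do _n_ = 1 to _n_;
set have;
if not missing(_index_of_last_name_change) then
if _index_of_last_name_change <= _n_ <= _index_of_last_name_change+1 then OUTPUT;
end;
drop _:;
run;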

SAS flag each row that contains the max value

I tried searching but couldn't exactly find what I was looking for. I have a dataset with multiple rows per ID. I'd like to add a variable called maxdec and show a 1 for each row that has the max dec for each ID.
Sample Dataset:
ID DEC
123 1
123 2
123 2
123 2
456 2
456 3
456 3
Desired Output:
ID DEC MAXDEC
123 1 .
123 2 1
123 2 1
123 2 1
456 2 .
456 3 1
456 3 1
It is easier to define it with 1 or 0 instead of 1 or missing.
proc sql;
create table want as
select id,dec, dec=max(dec) as maxdec
from have
group by id
;
quit;
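If the 1-or-missing coding from the question is really required, a CASE expression is one way to get it. This is a sketch under the same assumptions as the query above; the table name want_missing is made up for illustration.

```sas
proc sql;
  create table want_missing as
  select id, dec,
         case when dec = max(dec) then 1 else . end as maxdec  /* missing instead of 0 */
  from have
  group by id;
quit;
```

As with the query above, PROC SQL remerges the group maximum with the detail rows, and the row order of the result is not guaranteed unless an ORDER BY clause is added.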
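proc sort data=have;
by id;
run;
/* NWAY keeps only the per-ID rows; without it the overall (_TYPE_=0) row, */
/* whose ID is missing, would be merged in as an extra observation */
proc summary data=have nway;
class id;
var dec;
output out=max_info max=max_value;
run;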
data want;
merge have
max_info (keep=id max_value)
;
by id;
if dec=max_value then maxdec=1;
run;
The proc summary calculates the maximum value of DEC for each ID, and outputs as variable MAX_VALUE in dataset MAX_INFO. The subsequent data step assigns MAXDEC=1 if the current value of DEC is equal to MAX_VALUE for that ID.
Here is a DoW loop approach
data have;
input ID DEC;
datalines;
123 1
123 2
123 2
123 2
456 2
456 3
456 3
;
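data want(drop = m);
/* First pass: find the maximum DEC within the ID group */
do _N_ = 1 by 1 until (last.id);
set have;
by id;
m = max(m, dec);
end;
/* Second pass: flag every row whose DEC equals that maximum */
do _N_ = 1 to _N_;
set have;
maxdec = ifn(dec = m, 1, .);
output;
end;
run;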

In SAS: How to flag unique combinations of a set of variable values

In SAS, how can I create an identifier for each unique combination of a set of variables?
I have, for example, several thousand observations with a dichotomous value for each of six variables. There are 2^6 possible combinations of values for these variables in each observation. I would like to create an identifier for each unique combination, and eventually group my observations according to this value.
Have:
SubjectID Var1 Var2 Var3 Var4 Var5 Var6
---------------------------------------------------------------
ID1 1 1 1 1 1 1
ID2 1 1 1 1 1 1
ID3 0 1 1 1 1 1
ID4 0 0 1 1 1 0
... ... ... ... ... ... ...
ID3000 1 1 0 1 0 0
Want:
SubjectID Var1 Var2 Var3 Var4 Var5 Var6 Identifier
------------------------------------------------------------------------------
ID1 1 1 1 1 1 1 A
ID2 1 1 1 1 1 1 A
ID3 0 1 1 1 1 1 B
ID4 0 0 1 1 1 0 C
... ... ... ... ... ... ...
ID3000 1 1 0 1 0 0 Z
A would represent 1, 1, 1, 1, 1, 1 as a unique combination and B would represent 0, 1, 1, 1, 1, 1 etc.
I have thought about creating a dummy variable based on 64 Var1-Var6 conditional statements. I've also thought about concatenating the values from Var1-Var6 into a new row to create a unique identifier.
Is there a more straightforward way of going about this?
I prefer an approach that assigns a specific identifier to a specific combination of the values, rather than one that just generates some arbitrary unique string whenever a new combination comes up.
PROC SUMMARY works well with the LEVELS option, which adds a _LEVEL_ variable numbering the output rows. This technique works for any values of the grouping variables, numeric or character.
data have;
input (v1-v6)(1.);
cards;
111111
111110
111101
111011
110111
;;;;
proc print;
proc summary data=have nway;
class v1-v6;
output out=unique(drop=_type_) / levels;
run;
Why not just concatenate the values?
So your combinations are:
111111
111110
111101
111011
110111
....
You can use PROC FREQ to check the number of each type.
proc freq data=have;
table var1*var2*var3*var4*var5*var6 / out=want list;
run;
You can get the result by taking the distinct combinations of the variables and then assigning a letter to each one in order:
data inp;
length combined $6.;
input subjectid $4. v1 1. v2 1. v3 1. v4 1. v5 1. v6 1.;
combined=cats(of v1-v6); /* CATS converts the numbers and strips blanks without conversion notes */
datalines;
ID1 111111
ID2 011111
ID3 001111
ID4 111110
ID5 000111
ID6 111111
ID7 000111
;
run;
proc sql;
create table uniq
as
select distinct combined from inp order by combined desc;
quit;
data uniq1;
set uniq;
retain alphabet 65;
Id=byte(alphabet) ;
alphabet+1;
drop alphabet;
run;
proc sql;
create table final_ds
as
select subjectid, v1, v2, v3, v4, v5, v6, Id
from inp a
left join uniq1 b
on a.combined=b.combined;
quit;
Assuming the data is sorted by your grouping variables then just use BY group processing.
data want;
set have;
by var1-var6 ;
groupid + first.var6 ;
run;
Or you could just convert the 6 binary variables into a single unique value.
group2 = input(cats(of var1-var6),binary6.);
This has the added benefit of not requiring sorted data, but it does require that none of the grouping variables be missing.
Result
SubjectID Var1 Var2 Var3 Var4 Var5 Var6 Identifier Want groupno group2
ID4 0 0 1 1 1 0 C 1 14
ID3 0 1 1 1 1 1 B 2 31
ID1 1 1 1 1 1 1 A 3 63
ID2 1 1 1 1 1 1 A 3 63

In a sas compare, output only differences and new records

In a compare with an ID variable, how can I output only the differences and the new records, but not the old records that are no longer present?
Example, suppose I have two tables:
mybase:
key other
1 Ann
3 Ann
4 Charlie
5 Emily
and mycompare:
key other
2 Bill
3 Charlie
4 Charlie
running:
proc compare data=mybase
compare=mycompare
outnoequal
outdif
out=myoutput
listvar
outcomp
outbase
method = absolute
criterion = 0.0001
;
id key;
run;
I get a table "myoutput" like this:
type obs key other
base 1 1 Ann
compare 1 2 Bill
base 2 3 Ann
compare 2 3 Charlie
dif 2 3 XXXXXXX
base 4 5 Emily
I would like to have this:
type obs key other
compare 1 2 Bill
base 2 3 Ann
compare 2 3 Charlie
dif 2 3 XXXXXXX
This works for your example. I think you want to output records that are not matched in base and any records that match and have differences.
data mybase;
input key other $;
cards;
1 Ann
3 Ann
4 Charlie
5 Emily
;;;;
data mycompare;
input key other $;
cards;
2 Bill
3 Charlie
4 Charlie
;;;;
proc compare data=mybase
compare=mycompare
outnoequal
outdif
out=myoutput
listvar
outcomp
outbase
method = absolute
criterion = 0.0001
;
id key;
run;
proc print;
run;
data test;
set myoutput;
by key;
if (first.key and last.key) and _type_ eq 'BASE' then delete;
run;
proc print;
run;
Obs _TYPE_ _OBS_ key other
1 COMPARE 1 2 Bill
2 BASE 2 3 Ann
3 COMPARE 2 3 Charlie
4 DIF 1 3 XXXXXXX.