Suppose I have the following data set:
ID Date_Start Date_End Flag1 Flag2
001 13JAN2015 01JUN2018 1 0
001 02JUN2018 02JUL2018 1 0
001 03JUL2018 31DEC2020 1 0
002 01JAN2015 31DEC2020 1 0
003 01JAN2017 31DEC2019 1 0
003 01JAN2020 31DEC2021 1 0
004 01JAN2011 31DEC2021 1 2
..... ......... ......... ..... ......
Desired output:
ID Date_Start Date_End Flag1 Flag2
001 13JAN2015 01JUN2018 1 0
001 02JUN2018 02JUL2018 1 0
001 03JUL2018 31DEC2020 1 10
002 01JAN2015 31DEC2020 1 10
003 01JAN2017 31DEC2019 1 0
003 01JAN2020 31DEC2021 1 10
004 01JAN2011 31DEC2021 1 2
..... ......... ......... ..... ......
In other words: if Flag2 = 0 and Flag1 = 1, replace Flag2 with 10 on one row per ID, chosen as follows:
for IDs with several rows, use the row with the last time interval;
for IDs with a single row, use that row.
I'm new to SAS programming. I know that the basic step is:
data my_data;
set input;
if Flag2 = 0 and Flag1 = 1 then Flag2 = 10;
run;
but I don't know how to handle the periods and the repeated IDs. Can anyone help me, please?
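I'm not entirely sure, but I think this is what you want. With BY-group processing, last.ID is true on the final row of each ID, which covers both repeated and unique IDs (the data must already be sorted by ID and Date_Start).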
data have;
input ID $ (Date_Start Date_End)(:date9.) Flag1 Flag2;
format Date_Start Date_End date9.;
datalines;
001 13JAN2015 01JUN2018 1 0
001 02JUN2018 02JUL2018 1 0
001 03JUL2018 31DEC2020 1 0
002 01JAN2015 31DEC2020 1 0
003 01JAN2017 31DEC2019 1 0
003 01JAN2020 31DEC2021 1 0
004 01JAN2011 31DEC2021 1 2
;
data want;
set have;
by ID;
if last.ID and flag1 = 1 and flag2 = 0 then flag2 = 10;
run;
Result
ID Date_Start Date_End Flag1 Flag2
001 13JAN2015 01JUN2018 1 0
001 02JUN2018 02JUL2018 1 0
001 03JUL2018 31DEC2020 1 10
002 01JAN2015 31DEC2020 1 10
003 01JAN2017 31DEC2019 1 0
003 01JAN2020 31DEC2021 1 10
004 01JAN2011 31DEC2021 1 2
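The answer above relies on the rows already being in ID and date order. If that is not guaranteed, a minimal sketch is to sort first; the data set name have_sorted is made up for illustration and is not part of the original answer.

```sas
/* Assumption: the input may arrive unsorted, so order it by ID and start date. */
/* have_sorted is a hypothetical intermediate data set name.                   */
proc sort data=have out=have_sorted;
  by ID Date_Start;
run;

data want;
  set have_sorted;
  by ID Date_Start;
  /* last.ID is true on the latest interval of each ID (or the only one) */
  if last.ID and Flag1 = 1 and Flag2 = 0 then Flag2 = 10;
run;
```

Writing to a separate data set with OUT= leaves the original have untouched, which makes it easier to compare before and after.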
Suppose I have the following:
ID Start_date End_date Hospital Work
00001 01JAN2015 15JAN2015 006 w
00001 16JAN2015 16JAN2015 006 p
00001 17JAN2015 20JAN2015 006 w
00001 21JAN2015 29JAN2015 006 f
00001 30JAN2015 02FEB2015 004 w
00001 03FEB2015 03FEB2015 004 s
00001 04FEB2015 08FEB2015 004 w
00001 09FEB2015 13FEB2015 004 f
00001 14FEB2015 16FEB2015 006 f
00001 17FEB2015 28DEC2016 006 w
00001 29DEC2016 31DEC2016 006 w
.... ..... ...... ... ...
Desired output:
ID Start_date End_date Hospital Work Flag1 Flag2
00001 01JAN2015 15JAN2015 006 w 1 4
00001 16JAN2015 16JAN2015 006 p 4 9
00001 17JAN2015 20JAN2015 006 w 9 4
00001 21JAN2015 29JAN2015 006 f 4 9
00001 30JAN2015 02FEB2015 004 w 9 2
00001 03FEB2015 03FEB2015 004 s 2 9
00001 04FEB2015 08FEB2015 004 w 9 4
00001 09FEB2015 13FEB2015 004 f 4 9
00001 14FEB2015 16FEB2015 006 f 9 2
00001 17FEB2015 28DEC2016 006 w 2 4
00001 29DEC2016 31DEC2016 006 w 4 Stop
.... ..... ...... ... ...
In other words, I need to add two columns, Flag1 and Flag2, containing indices assigned by the following criteria:
On the first Start_date of an ID, Flag1 must always be 1. Flag2 takes one of four values: 4 if Work is "w"; 9 if Work is not "w" (f, s or other); 2 if the Hospital changes (here from 006 to 004 and then back to 006); and Stop at the end of the period, here 31DEC2016 but it could be 31DEC2019 or 31DEC2020 depending on the ID. In total I have 350 IDs, each repeated because there are many periods per ID.
On every other row, Flag1 takes the previous row's Flag2 value.
Can anyone help me, please? Thank you in advance.
data source_data;
input ID :$5. Start_date :date9. End_flag :date9. Hospital :$3. Work :$1.;
format Start_date End_flag date9.;
datalines;
00001 01JAN2015 15JAN2015 006 w
00001 16JAN2015 16JAN2015 006 p
00001 17JAN2015 20JAN2015 006 w
00001 21JAN2015 29JAN2015 006 f
00001 30JAN2015 02FEB2015 004 w
00001 03FEB2015 03FEB2015 004 s
00001 04FEB2015 08FEB2015 004 w
00001 09FEB2015 13FEB2015 004 f
00001 14FEB2015 16FEB2015 006 f
00001 17FEB2015 28DEC2016 006 w
00001 29DEC2016 31DEC2016 006 w
;
proc sort data=source_data;
by ID start_date hospital;
run;
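data destination_data;
/* RETAIN before SET fixes the column order of the output */
retain ID Start_date End_flag Hospital Work Flag1 Flag2;
attrib Flag1 length=$8 Flag2 length=$8;
set source_data;
by id start_date hospital;
/* Call LAG on every row so its queue always holds the previous row's value */
prev_hospital=lag(hospital);
if work='w' then Flag2='4';
else Flag2='9';
if not first.ID and prev_hospital NE hospital then Flag2='2';
if last.ID then Flag2='Stop';
/* Flag1 is the previous row's Flag2, except on the first row of an ID */
Flag2R=lag(Flag2);
if first.ID then flag1='1';
else flag1=Flag2R;
drop Flag2R prev_hospital;
run;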
proc print data=destination_data noobs;
run;
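A note on LAG(): it returns the value stored at its previous call, not necessarily the value from the previous row, so it is usually safest to call it on every row and test the result afterwards. The small sketch below illustrates the difference; the data set demo and the variables x, prev_x and prev_even are made up for illustration.

```sas
data demo;
  input x @@;
  /* Unconditional call: always yields the previous row's x */
  prev_x = lag(x);
  /* Conditional call: the queue only advances on even-numbered rows, */
  /* so this returns the x seen at the previous even row              */
  if mod(_n_, 2) = 0 then prev_even = lag(x);
  datalines;
1 2 3 4 5 6
;

proc print data=demo noobs;
run;
/* prev_x    is  . 1 2 3 4 5 */
/* prev_even is  . . . 2 . 4 */
```

On row 4, prev_even is 2 rather than 3, because the conditional LAG was last called on row 2.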
In SAS, I have a dataset (have) as shown below. I need to add a group variable based on test and visitnum. Rows with visitnum 101 and 108 need to be in the same group. The desired result is shown as data want.
data have:
test visitnum ord seq
aa 101 0 0
aa 101 0 1
aa 108 1 0
aa 108 1 1
aa 108 2 0
aa 108 2 1
aa 115 1 0
aa 115 1 1
aa 115 2 0
aa 115 2 1
bb 101 0 0
bb 101 0 1
bb 108 1 0
bb 108 1 1
bb 108 2 0
bb 108 2 1
bb 115 1 0
bb 115 1 1
bb 115 2 0
bb 115 2 1
data want:
test visitnum ord seq group
aa 101 0 0 1
aa 101 0 1 1
aa 108 1 0 1
aa 108 1 1 1
aa 108 2 0 1
aa 108 2 1 1
aa 115 1 0 2
aa 115 1 1 2
aa 115 2 0 2
aa 115 2 1 2
bb 101 0 0 3
bb 101 0 1 3
bb 108 1 0 3
bb 108 1 1 3
bb 108 2 0 3
bb 108 2 1 3
bb 115 1 0 4
bb 115 1 1 4
bb 115 2 0 4
bb 115 2 1 4
First sort your data by test and visitnum. There are two cases when we want to increment the group number:
When it's the start of a test group and the visitnum is 101 or 108
When it's the start of a visitnum group and it's not 101 or 108
Here's how this looks:
proc sort data=have;
by test visitnum;
run;
data want;
set have;
by test visitnum;
if( first.test AND visitnum IN(101, 108)
OR (first.visitnum AND visitnum NOT IN(101, 108) )
)
then group+1;
run;
Output:
test visitnum ord seq group
aa 101 0 0 1
aa 101 0 1 1
aa 108 1 0 1
aa 108 1 1 1
aa 108 2 0 1
aa 108 2 1 1
aa 115 1 0 2
aa 115 1 1 2
aa 115 2 0 2
.. ... .. .. ..
bb 115 2 0 4
bb 115 2 1 4
I need help with SAS code that keeps the rows for each unique ID after a certain condition is first met. For example, suppose I have a dataset called BASE as shown below:
Account_Number Default_Indicator
1010 0
1010 0
1010 1
1010 1
1010 1
1010 0
1010 0
1010 0
1010 1
1010 1
1020 0
1020 0
1020 0
1020 1
1020 1
1020 1
1020 0
1020 0
1020 1
1020 1
I would like the final dataset to keep only the rows from the point where Default_Indicator changes from 1 to 0 for the first time, as shown below:
Account_Number Default_Indicator
1010 0
1010 0
1010 0
1010 1
1010 1
1020 0
1020 0
1020 1
1020 1
Help with this will be greatly appreciated.
You can use BY group processing; just add the NOTSORTED keyword to the BY statement. Use the LAG() function to access the value from the previous data step iteration. Retain a flag variable indicating that you have found a 1 -> 0 transition. Make sure to reset it when starting a new account.
data have;
row+1;
input Account_Number Default_Indicator @@ ;
cards;
1010 0 1010 0
1010 1 1010 1 1010 1
1010 0 1010 0 1010 0
1010 1 1010 1
1020 0 1020 0 1020 0
1020 1 1020 1 1020 1
1020 0 1020 0
1020 1 1020 1
;
data want ;
set have;
by account_number default_indicator notsorted;
lag_indicator=lag(default_indicator);
if first.account_number then call missing(found,lag_indicator);
if first.default_indicator and default_indicator=0 and lag_indicator=1 then found+1;
if found then output;
drop lag_indicator found;
run;
Results (without the DROP statement):
Obs row Account_Number Default_Indicator lag_indicator found
1 6 1010 0 1 1
2 7 1010 0 0 1
3 8 1010 0 0 1
4 9 1010 1 0 1
5 10 1010 1 1 1
6 17 1020 0 1 1
7 18 1020 0 0 1
8 19 1020 1 0 1
9 20 1020 1 1 1
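As a variation on the same idea, the transition can also be tracked with two retained flags instead of LAG(). This is only a sketch under the assumption that the rows for each account are contiguous and in the intended order; the names want_alt and seen_default are made up for illustration.

```sas
data want_alt;
  set have;
  by account_number notsorted;
  retain seen_default found;
  /* Reset both flags at the start of each account */
  if first.account_number then do;
    seen_default = 0;
    found = 0;
  end;
  /* Remember that a 1 has occurred; the first 0 after it is the 1 -> 0 change */
  if default_indicator = 1 then seen_default = 1;
  else if seen_default then found = 1;
  /* Keep every row from the first 1 -> 0 change onward */
  if found then output;
  drop seen_default found;
run;
```

With the sample data this should keep the same rows as the LAG() version (rows 6 to 10 and 17 to 20).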
Here I have a list of weights for 2 subjects.
data weight_test;
format subject $3. weight 4.;
infile datalines dlm=" " dsd;
input subject weight ;
datalines;
001 27
001 27.5
001 28
001 30
001 29
001 29
002 29
002 30
002 31
002 29
;
run;
I want to mark each weight with 0 or 1:
if the weight is < 30, mark it with 0;
once a weight >= 30 occurs, mark it and all remaining rows for the same subject with 1.
The result should look like this:
subject weight mark
001 27 0
001 27 0
001 28 0
001 30 1
001 29 1
001 29 1
002 29 0
002 30 1
002 31 1
002 29 1
I tried the following code, but it doesn't work properly. Please help me. Thank you.
data weight;
set weight_test;
by subject;
i=0;
retain i;
if weight < 30 then mark=i;
else if weight >= 30 then do;
i = 1;
mark = i;
end;
run;
You have overcomplicated it. The assignment i=0 runs on every row, so it undoes the RETAIN. Just set MARK to zero when you start a new subject and set it to one when the target weight is seen.
data weight_test;
input subject $ weight @@ ;
datalines;
001 27 001 27.5 001 28 001 30 001 29 001 29
002 29 002 30 002 31 002 29
;
data weight;
set weight_test;
by subject;
if first.subject then mark=0;
if weight >= 30 then mark=1;
retain mark;
run;
Results:
Obs subject weight mark
1 001 27.0 0
2 001 27.5 0
3 001 28.0 0
4 001 30.0 1
5 001 29.0 1
6 001 29.0 1
7 002 29.0 0
8 002 30.0 1
9 002 31.0 1
10 002 29.0 1
Make sure the variable MARK does not already exist in the input dataset; otherwise the SET statement would overwrite the retained value on every row.
Try this DOW loop, which processes one subject per DATA step iteration:
data want;
mark=0; _iorc_=0;
do until (last.subject);
set weight_test;
by subject;
if weight >= 30 & _iorc_=0 then do;
_iorc_=1;
mark=1;
end;
output;
end;
run;
I need to create some new variables day1, day2, day3, etc. For each run of consecutive rows with readmit=1, day[i] should hold the gap from the i-th row of that run. For example, the first two readmit=1 rows should give day[1]=21 and day[2]=9. The next run of readmit=1 rows (the third, fourth and fifth) should give day[1]=29, day[2]=12, day[3]=23, and so on. Hopefully I have expressed this well enough. Thanks in advance.
STUDYID index readmit gap
10001 1 0
10001 1 0 79
10001 1 0 48
10001 1 0 39
10001 1 0 74
10001 1 0 41
10001 0 1 21
10001 0 1 9
10001 0 0 130
10001 0 0 52
10001 0 0 110
10001 1 0 80
10001 0 1 29
10001 0 1 12
10001 0 1 23
10001 1 0 57
10001 0 1 28
10001 0 1 14
10001 1 0 118
10001 0 1 5
10001 0 1 22
10001 1 0 40
10001 0 1 23
10001 0 1 24
10001 0 1 19
I think the code below answers your question. This requires 2 passes of the data, the first to calculate the maximum number of consecutive rows where READMIT=1, which is stored in a macro variable used to determine the array size in the second pass.
The key to solving this question is the order of the data and the use of the NOTSORTED option in the BY statement. This enables every change in the READMIT value to be treated as a new section.
Hope this helps, although it would be good if someone could find a method that just uses a single pass of the data.
data have;
input STUDYID index readmit gap;
cards;
10001 1 0 .
10001 1 0 79
10001 1 0 48
10001 1 0 39
10001 1 0 74
10001 1 0 41
10001 0 1 21
10001 0 1 9
10001 0 0 130
10001 0 0 52
10001 0 0 110
10001 1 0 80
10001 0 1 29
10001 0 1 12
10001 0 1 23
10001 1 0 57
10001 0 1 28
10001 0 1 14
10001 1 0 118
10001 0 1 5
10001 0 1 22
10001 1 0 40
10001 0 1 23
10001 0 1 24
10001 0 1 19
;
run;
data _null_;
set have (keep=readmit) end=last;
by readmit notsorted;
if first.readmit then days=0;
retain max_days;
if readmit=1 then days+1;
max_days=max(max_days,days);
if last then call symputx('max_days',max_days);
run;
%put maximum consecutive days = &max_days.;
data want;
set have;
by readmit notsorted;
array dayvar{*} day1-day&max_days.;
if first.readmit then do;
num_day=0;
call missing(of day:);
end;
retain day1-day&max_days.;
if readmit=1 then do;
num_day+1;
dayvar{num_day}=gap;
if last.readmit then output;
end;
keep studyid index day: ;
run;