I need to calculate max (Measure) in the last 3 months for each ID and month, without using PROC SQL.I was wondering I could do this using the RETAIN statement, however I have no idea how to implement the condition of comparing the value of Measure in the current row and the preceding two.
I will also need to prepare the above for more than 3 months so any solution that do not require a separate step for each additional month would be absolutely appreciated!
Here is the data I have:
data have;
input month ID $ measure;
cards;
201501 A 0
201502 A 30
201503 A 60
201504 A 90
201505 A 0
201506 A 0
201501 B 0
201502 B 30
201503 B 0
201504 B 30
201505 B 60
;
Here the one I need:
data want;
input month ID $ measure max_measure_3m;
cards;
201501 A 0 0
201502 A 30 30
201503 A 60 60
201504 A 90 90
201505 A 0 90
201506 A 0 90
201501 B 0 0
201502 B 30 30
201503 B 0 30
201504 B 30 30
201505 B 60 60
;
And here both tables: the one I have on the left and the one I need on the right
You can do this with an array that's size to your moving window. I'm not sure what type of dynamic code you need in terms of windows. If you need the max for a 4 or 5 month on top of 3 month then I would recommend using PROC EXPAND instead of these methods. The documentation for PROC EXPAND has a good example of how to do this.
data want;
set have;
by id;
array _prev(0:2) _temporary_;
if first.id then
do;
call missing (of _prev(*));
count=0;
end;
count+1;
_prev(mod(count, 3))=measure;
max=max(of _prev(*));
drop count;
run;
proc expand data=test out=out method=none;
by id;
id month;
convert x = x_movave3 / transformout=(movave 3);
convert x = x_movave4 / transformout=(movave 4);
run;
Try this:
data want(drop=l1 l2 cnt tmp);
set have;
by id;
retain cnt max_measure_3m l1 l2;
if first.id then do;
max_measure_3m = 0;
cnt = 0;
l1 = .;
l2 = .;
end;
cnt = cnt + 1;
tmp = lag(measure);
if cnt > 1 then
l1 = tmp;
tmp = lag2(measure);
if cnt > 2 then
l2 = tmp;
if measure > l1 and measure > l2 then
max_measure_3m = measure;
run;
Related
Every Subject has a baseline. Once the difference between the value and the baseline exceeds 5, that value becomes the baseline for all future comparisons until another value exceeds this new baseline by 5.
This is what I want the output data to look like:
This is what I'm getting
This is my current code, which gets me as close as anything I've tried. I've tried different combinations of retain, lag(), and ifn (suggested in this post)
Data Have;
Input Visit usubjid Baseline Value;
datalines;
1 1 112.2 112.2
2 1 112.2 113.7
3 1 112.2 112
3 1 112.2 108
4 1 112.2 109
5 1 112.2 107
7 1 112.2 106
8 1 112.2 107
;
run;
proc sort;by usubjid;run;
data want;
Length chg $71;
retain chg;
set Have;
length prevchg $71;
by usubjid;
prevchg=chg;
if first.usubjid then do; prevchg=''; end;
baseline=ifn(prevchg in ('Increase >= 5mm New', "Decrease >= 5mm"),lag(value),lag(baseline));
diff = value-baseline;
if visit > 1 then do;
if diff > 5 then do; chg='Increase >= 5mm New'; order = 3; end;
else if diff < -5 then do; chg = 'Decrease >= 5mm'; order = 6; end;
else if -5 <= diff <= 5 then do;
if prevchg in('Increase >= 5mm New', 'Increase > 5mm Persistent') then do; chg ='Increase > 5mm Persistent'; order = 4; end;
else do; chg = 'No Change (change >= -5 and <= 5mm)'; order = 5; end;
end;
end;
run;
Right now the code will correctly update the baseline to the previous value for the next visit, but then goes right back to the original baseline. I'm confident this has something to do with the way Lag() and Retain work with if/then, but I cannot figure out the solution. here is an example of the issue:
You should be able to do this easily. The BASELINE variable CANNOT be on the input if you want to RETAIN its value.
data want ;
set have ;
by usubjid;
retain baseline;
if first.usubjid then baseline=value;
difference = baseline - value;
output;
if difference > 5 then baseline=value;
run;
I have the following data and used one of the existing answered questions to solve my data problem but could not get what I want. Here is what I have in my data
Amt1 is populated when the Evt_type is Fee
Amt2 is populated when the Evt_type is REF1/REF2
I don't want to display any observations after the last Flag='Y'
If there is no Flag='Y' then I want all the observations for that id (e.g. id=102)
I want to display if the next row for that id is a Fee followed by REF1/REF2 after flag='Y' (e.g. id=101) However I don't want if there is no REF1/REF2 (e.g.id=103)
Have:
id Date Evt_Type Flag Amt1 Amt2
101 2/2/2019 Fee 5
101 2/3/2019 REF1 Y 5
101 2/4/2019 Fee 10
101 2/6/2019 REF2 Y 10
101 2/7/2019 Fee 4
101 2/8/2019 REF1
102 2/2/2019 Fee 25
102 2/2/2019 REF1 N 25
103 2/3/2019 Fee 10
103 2/4/2019 REF1 Y 10
103 2/5/2019 Fee 10
Want:
id Date Evt_Type Flag Amt1 Amt2
101 2/2/2019 Fee 5
101 2/3/2019 REF1 Y 5
101 2/4/2019 Fee 10
101 2/6/2019 REF2 Y 10
101 2/7/2019 Fee 4
101 2/8/2019 REF1
102 2/2/2019 Fee 25
102 2/2/2019 REF1 N 25
103 2/4/2019 REF1 Y 10
103 2/5/2019 Fee 10
I tried the following
data want;
set have;
by id Date;
drop count;
if (first.id or first.date) and FLAG='Y' then
do;
retain count;
count=1;
output;
return;
end;
if count=1 and ((first.id or first.date) and Flag ne 'Y') then
do;
retain count;
delete;
return;
end;
output;
run;
Any help is appreciated.
Thanks
A technique known as DOW loop can perform a computation that measures a group in some way and then, in a second loop, apply that computation to members of the group.
The DOW relies on a SET statement inside the loop. In this case the computation is 'what row in the group is the last one having flag="Y".
data want;
* DOW loop, contains computation;
_max_n_with_Y = 1e12;
do _n_ = 1 by 1 until (last.id);
set have;
by id;
if flag='Y' then _max_n_with_Y = _n_;
end;
* Follow up loop, applies computation;
do _n_ = 1 to _n_;
set have;
if _n_ <= _max_n_with_Y then OUTPUT;
end;
drop _:;
run;
Here is one way
data have;
input id $ Date : mmddyy10. Evt_Type $ Flag $ Amt1 Amt2;
format Date mmddyy10.;
infile datalines dsd missover;
datalines;
101,2/2/2019,Fee,,5,
101,2/3/2019,REF1,Y,,5
101,2/4/2019,Fee,,10,
101,2/6/2019,REF2,Y,,10
101,2/7/2019,Fee,,4,
102,2/2/2019,Fee,,25,
102,2/2/2019,REF1,N,25,
;
data want;
do _N_ = 1 by 1 until (last.id);
set have;
by id;
if flag = "Y" then _iorc_ = _N_;
end;
do _N_ = 1 to _N_;
set have;
if _N_ le _iorc_ then output;
end;
_iorc_=1e7;
run;
I want to transform my SAS table from data Have to data want.
I feel I need to use Proc transpose but could not figure it out how to do it.
data Have;
input Stat$ variable_1 variable_2 variable_3 variable_4;
datalines;
MAX 6 7 11 23
MIN 0 1 3 5
SUM 29 87 30 100
;
data Want;
input Variable $11.0 MAX MIN SUM;
datalines;
Variable_1 6 0 29
Variable_2 7 1 87
Variable_3 11 3 87
Variable_4 23 5 100
;
You are right, proc transpose is the solution
data Have;
input Stat$ variable_1 variable_2 variable_3 variable_4;
datalines;
MAX 6 7 11 23
MIN 0 1 3 5
SUM 29 87 30 100
;
/*sort it by the stat var*/
proc sort data=Have; by Stat; run;
/*id statement will keep the column names*/
proc transpose data=have out=want name=Variable;
id stat;
run;
proc print data=want; run;
I have a dataset of money earned as a % every week in 2017 to 2018. Some don't have data at the start of 2017 as they didn't start earning until later on. The weeks are numbered as 201701, 201702 - 201752 and 201801 - 201852.
What I'd like to do is have 104 new variables called WEEK0 - WEEK103, where WEEK0 will have the first non empty column value of the money earned columns. Here is an example of the data:
MON_EARN_201701 MON_EARN_201702 MON_EARN_201703 MON_EARN_201704
30 21 50 65
. . 30 100
. 102 95 85
Then I want my data to have the following columns (example)
WEEK0 WEEK1 WEEK2 WEEK3
30 21 50 65
30 100 . .
102 95 85 .
These are just small examples of a very large dataset.
I was thinking I'd need to try and do some sort of do loops so what I've tried so far is:
DATA want;
SET have;
ARRAY mon_earn{104} mon_earn_201701 - mon_earn_201752 mon_earn_201801 -mon_earn_201852;
ARRAY WEEK {104} WEEK0 - WEEK103;
DO i = 1 to 104;
IF mon_earn{i} NE . THEN;
WEEK{i} = mon_earn{i};
END;
END;
RUN;
This doesn't work as it doesn't fill the WEEK0 when the first value is empty.
If anymore information is needed please comment and I will add it in.
Sounds like you just need to find the starting point for copying.
First look thru the list of earnings by calendar month until you find the first non missing value. Then copy the values starting from there into you new array of earnings by relative month.
data want;
set have;
array mon_earn mon_earn_201701 -- mon_earn_201852;
array week (104);
do i = 1 to dim(mon_earn) until(found);
if mon_earn{i} ne . then found=1;
end;
do j=1 to dim(week) while (i+j<dim(mon_earn));
week(j) = mon_earn(i+j-1);
end;
run;
NOTE: I simplified the ARRAY definitions. For the input array I assumed that the variables are defined in order so that you could use positional array list. For the WEEK array SAS and I both like to start counting from one, not zero.
You could do this if it was a long format. There's a chance you don't need it while in a long format.
proc sort data=have;
by ID week;
run;
data want;
set have;
by id; *for each group/id counter;
retain counter;
if first.id then counter=0;
if counter=0 and not missing(value) then do;
counter=1; new_week=0; end;
if counter = 1 then new_week+1;
run;
If you really need it wide:
Find first not missing value and store index in i
Loop from i to end of week dimension
Assign week to mon_earned from i to end of week.
data want;
set have;
array mon_earned(*) .... ;
array week(*) ... ;
found=0; i=0;
do while(found=0);
if not missing(mon_earned(i)) then found=1;
i+1;
end;
z=0;
do j=i to dim(week);
week(z) = mon_earned(j);
z+1;
end;
run;
You need a second index variable, call it j, to target the proper week assignment. j is only incremented when a months earning is not missing.
This example code will 'squeeze out` all missing earnings; even those missing earnings that occurring after some earning has occurred. For example
earning: . . . 10 . 120 . 25 … will squeeze to
week: 10 120 25 …
data have;
array earn earn_201701-earn_201752 earn_201801-earn_201852;
do _n_ = 1 to 1000;
call missing (of earn(*));
do _i_ = 1 + 25 * ranuni(123) to dim(earn);
if ranuni(123) < 0.95 then
earn(_i_) = round(10 + 125 * ranuni(123));
end;
output;
end;
run;
data want;
set have;
array earn earn_201701-earn_201752 earn_201801-earn_201852;
array week(0:103);
j = -1;
do i = 1 to dim(earn);
if not missing(earn(i)) then do;
j+1;
week(j) = earn(i);
end;
end;
drop i j;
run;
If you want to maintain interior missing earnings the logic would be
if not missing(earn(i)) or j >=0 then do;
j+1;
week(j) = earn(i);
end;
I am working on a dataset in SAS to get the next observation's score should be the current observation's value for the column Next_Row_score. If there is no next observation then the current observation's value for the column Next_Row_score should be 'null'per group(ID). For better illustration i have provided the sample below dataset :
ID Score
10 1000
10 1500
10 2000
20 3000
20 4000
30 2500
Resultant output should be like -
ID Salary Next_Row_Salary
10 1000 1500
10 1500 2000
10 2000 .
20 3000 4000
20 4000 .
30 2500 2500
Thank you in advance for your help.
data want(drop=_: flag);
merge have have(firstobs=2 rename=(ID=_ID Score=_Score));
if ID=_ID then do;
Next_Row_Salary=_Score;
flag+1;
end;
else if ID^=_ID and flag>=1 then do;
Next_Row_Salary=.;
flag=.;
end;
else Next_Row_Salary=score;
run;
Try this :
data have;
input ID Score;
datalines;
10 1000
10 1500
10 2000
20 3000
20 4000
30 2500
;
run;
proc sql noprint;
select count(*) into :obsHave
from have;
quit;
data want2(rename=(id1=ID Score1=Salary) drop=ID id2 Score);
do i=1 to &obsHave;
set have point=i;
id1=ID;
Score1=Score;
j=i+1;
set have point=j;
id2=ID;
if id1=id2 then do;
Next_Row_Salary = Score;
end;
else Next_Row_Salary=".";
output;
end;
stop;
;
run;
There is a simpler (in my mind, at least) proc sql approach that doesn't involve loops:
data have;
input ID Score;
datalines;
10 1000
10 1500
10 2000
20 3000
20 4000
30 2500
;
run;
/*count each observation's place in its ID group*/
data have2;
set have;
count + 1;
by id;
if first.id then count = 1;
run;
/*if there is only one ID in a group, keep original score, else lag by 1*/
proc sql;
create table want as select distinct
a.id, a.score,
case when max(a.count) = 1 then a.score else b.score end as score2
from have2 as a
left join have2 (where = (count > 1)) as b
on a.id = b.id and a.count = b.count - 1
group by a.id;
quit;