My problem is calculating, for each id and each date, the cumulative sum of price over a sliding window of the 15 previous days. Each time the cumulative sum exceeds 10k, the variable top is incremented.
The processing is done for June only.
Here is an example of the desired result:
id app_date price cum top
1 29-Jun-20 4000 4 000 .
1 13-Jun-20 5000 45 000 1
1 13-Jun-20 6000 40 000 2
1 11-Jun-20 7000 34 000 3
1 10-Jun-20 8000 27 000 4
1 01-Jun-20 9000 19 000 5
1 30-May-20 10000 10 000 .
proc sort data = tab out= tab1;
by id descending app_date;
run;
data tab2;
set tab1;
%let annee=2020;
%let month=06;
by id;
retain last_date date_last_d CUM;
if first.id then do;
last_date =app_date;
date_last_dem = app_date;
CUM=0;
end;
if month(date_last_d) =&month. then do ;
diff= date_last_d -app_date;
CUM= price+ CUM;
end;
if diff>15 then do;
diff = .;
CUM =.;
last_date =app_date;
date_last_d = app_date;
end;
if last.id and CUM>10000 then top= top+1 ;
output;
last_date=app_date;
format last_date DDMMYY10.;
format date_last_d DDMMYY10.;
format CUM 14.2;
run;
I can do it for the first iteration but I cannot do it for all the lines.
How about this?
data have;
input Cnt Price ID App_date :ddmmyy10.;
format App_date ddmmyy10.;
datalines;
1 2265 534 30/05/2020
2 2330 4594 27/06/2020
3 1360 723 14/05/2020
4 1393 723 14/05/2020
5 2400 101666 12/06/2020
6 2411 101666 12/06/2020
7 2400 101666 11/06/2020
8 2400 101666 11/06/2020
9 2527 101666 10/06/2020
10 2536 101666 10/06/2020
11 2458 101666 04/06/2020
12 2758 1088 30/05/2020
13 4412 1056 13/06/2020
14 1870 1255 30/06/2020
15 4198 1255 14/05/2020
;
data want(drop = c k p dt);
dcl hash h(ordered : "Y");
h.definekey("c");
h.definedata("c", "p", "dt");
h.definedone();
dcl hiter i("h");
do c = 1 by 1 until (last.ID);
set have(rename=(App_Date=dt Price=p));
by ID notsorted;
h.add();
end;
do k = 1 by 1 until (last.ID);
set have;
by ID notsorted;
cum = 0;
do while (i.next() = 0);
if App_Date - 15 <= dt <= App_Date & k <= c then cum + p;
end;
if cum > 10000 then top + 1;
else top = .;
output;
end;
h.clear();
run;
data scores;
length variables $ 16;
input variables $ low high score;
datalines;
Debt -10000 1 55
Debt 1 10000 23
MAX_NA -1 1 500
MAX_NA 1 100 -240
;
data main_data;
input ID Debt MAX_NA;
datalines;
222554 7584 12
212552 20 0
883123 500 7
913464 -200 -78
;
data end_result;
input ID Debt MAX_NA score;
datalines;
222554 7584 12 -217
212552 20 0 523
883123 500 7 -185
913464 -200 -78 555
;
Above you'll find three data sets.
The scores data set gives the score for each variable, based on which range between the low and high columns the value falls into.
The second data set, main_data, holds the actual values of Debt and MAX_NA.
The end_result table is what I would like to achieve.
Which step and statements should I use to calculate the score and get the end_result table?
Another approach is to use a double left join, like so:
data scores;
length variables $ 16;
input variables $ low high score;
datalines;
Debt -10000 1 55
Debt 1 10000 23
MAX_NA -1 1 500
MAX_NA 1 100 -240
;
data main_data;
input ID Debt MAX_NA;
sortseq = _n_;
datalines;
222554 7584 12
212552 20 0
883123 500 7
913464 -200 -78
;
proc sql;
create table end_result as
select a.ID
,a.Debt
,a.MAX_NA
,coalesce(b.score,0) + coalesce(c.score,0) as score
from main_data as a
left join scores(where=(variables="Debt")) as b
on b.low < a.Debt <= b.high
left join scores(where=(variables="MAX_NA")) as c
on c.low < a.MAX_NA <= c.high
order by a.sortseq
;
quit;
Note that I have included a sortseq variable in main_data to preserve the original row order.
Like draycut, I get the same score for IDs 222554 and 883123. For ID 913464 the MAX_NA value is outside every range in the scores data set, so I have counted it as zero by using the coalesce function. I therefore get these results:
ID Debt MAX_NA score
222554 7584 12 -217
212552 20 0 523
883123 500 7 -217
913464 -200 -78 55
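To illustrate why the coalesce call matters in the query above: in SAS, adding a missing value with the + operator gives a missing result, so an unmatched left join would otherwise wipe out the whole score. A minimal sketch, with variable names made up purely for illustration:

```sas
data _null_;
  debt_score = 55;   /* a range matched for Debt */
  na_score   = .;    /* no range matched for MAX_NA, as for ID 913464 */

  /* the + operator propagates missing values, so this is missing */
  without_coalesce = debt_score + na_score;

  /* coalesce returns its first non-missing argument, so missing becomes 0 */
  with_coalesce = coalesce(debt_score, 0) + coalesce(na_score, 0);

  /* writes without_coalesce=. with_coalesce=55 to the log */
  put without_coalesce= with_coalesce=;
run;
```

The SUM function, which ignores missing arguments, would be another way to get the same effect.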
Simpler:
data end_result(keep=ID Debt MAX_NA score);
set main_data;
score = 0;
do i = 1 to n;
set scores(rename=score=s) point=i nobs=n;
if variables = "Debt" and low <= Debt <= high then score + s;
else if variables = "MAX_NA" and low <= MAX_NA <= high then score + s;
end;
run;
I don't understand why IDs 222554 and 883123 do not get the same score in your expected output.
Anyway, here is an approach you can use as a template.
data end_result;
if _N_ = 1 then do;
dcl hash h(dataset : "scores(rename=score=s)", multidata : "Y");
h.definekey("variables");
h.definedata(all : "Y");
h.definedone();
dcl hiter hi("h");
end;
set main_data;
if 0 then set scores(rename=score=s);
score = 0;
do while (hi.next() = 0);
if variables = "Debt" and low <= Debt <= high then score + s;
else if variables = "MAX_NA" and low <= MAX_NA <= high then score + s;
end;
keep id Debt max_na score;
run;
Result:
ID Debt MAX_NA score
222554 7584 12 -217
212552 20 0 523
883123 500 7 -217
913464 -200 -78 55
I want to transform my SAS table from data set Have to data set Want.
I think I need to use PROC TRANSPOSE, but I could not figure out how to do it.
data Have;
input Stat$ variable_1 variable_2 variable_3 variable_4;
datalines;
MAX 6 7 11 23
MIN 0 1 3 5
SUM 29 87 30 100
;
data Want;
input Variable :$11. MAX MIN SUM;
datalines;
Variable_1 6 0 29
Variable_2 7 1 87
Variable_3 11 3 30
Variable_4 23 5 100
;
You are right, proc transpose is the solution:
data Have;
input Stat$ variable_1 variable_2 variable_3 variable_4;
datalines;
MAX 6 7 11 23
MIN 0 1 3 5
SUM 29 87 30 100
;
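/* sort by Stat; this only fixes the order of the new columns */
proc sort data=Have; by Stat; run;
/* the ID statement names the new columns after the values of Stat;
   NAME= stores the original variable names in the column Variable */
proc transpose data=have out=want name=Variable;
id stat;
run;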
proc print data=want; run;
I am new to SAS and I want to transpose the following table in SAS.
From
ID Var1 Var2 Jul-09 Aug-09 Sep-09
1 10 15 200 300 .
2 5 17 . -150 200
to
ID Var1 Var2 Date Transpose
1 10 15 Jul-09 200
1 10 15 Aug-09 300
2 5 17 Aug-09 -150
2 5 17 Sep-09 200
Can anyone help please?
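You can use proc transpose to transform the data.
options validvarname=any;
data a;
infile datalines missover;
input ID Var1 Var2 "Jul-09"n "Aug-09"n "Sep-09"n;
/* a period marks a month without a value so the other values stay in the right columns */
datalines;
1 10 15 200 300 .
2 5 17 . -150 200
;
run;
proc transpose data=a out=b(rename=(_NAME_=Date COL1=Transpose));
var "Jul-09"n--"Sep-09"n;
by ID Var1-Var2;
run;
/* drop the rows for months that have no value */
data b;
set b;
where Transpose is not missing;
run;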
data a;
input ID Var1 Var2 Jul_09 Aug_09;
CARDS;
1 10 15 200 300
2 5 17 -150 200
;
DATA b(drop=i jul_09 aug_09);
array dates_{*} jul_09 aug_09;
set a;
do i=1 to dim(dates_);
this_value=dates_{i};
this_date=input(compress(vname(dates_{i}),'_'),MONYY5.);
output;
end;
format this_date monyy5.;
run;
I am working on a data set in SAS where, within each group (ID), the column Next_Row_Score of the current observation should hold the score of the next observation. If the group has no next observation, Next_Row_Score should be missing. For better illustration, I have provided a sample data set below:
ID Score
10 1000
10 1500
10 2000
20 3000
20 4000
30 2500
The resulting output should look like this:
ID Salary Next_Row_Salary
10 1000 1500
10 1500 2000
10 2000 .
20 3000 4000
20 4000 .
30 2500 2500
Thank you in advance for your help.
data want(drop=_: flag);
merge have have(firstobs=2 rename=(ID=_ID Score=_Score));
if ID=_ID then do;
Next_Row_Salary=_Score;
flag+1;
end;
else if ID^=_ID and flag>=1 then do;
Next_Row_Salary=.;
flag=.;
end;
else Next_Row_Salary=score;
run;
Try this:
data have;
input ID Score;
datalines;
10 1000
10 1500
10 2000
20 3000
20 4000
30 2500
;
run;
proc sql noprint;
select count(*) into :obsHave
from have;
quit;
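data want2(rename=(id1=ID Score1=Salary) drop=ID id2 Score);
do i=1 to &obsHave;
/* read the current row directly by its observation number */
set have point=i;
id1=ID;
Score1=Score;
j=i+1;
/* read the next row only if it exists; POINT= past the last row is an error */
if j <= &obsHave then do;
set have point=j;
id2=ID;
end;
else id2=.;
/* same ID: take the next row's score, otherwise set a numeric missing value */
if id1=id2 then Next_Row_Salary=Score;
else Next_Row_Salary=.;
output;
end;
/* STOP is required because POINT= never reaches an end-of-file condition */
stop;
run;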
There is a simpler (in my mind, at least) proc sql approach that doesn't involve loops:
data have;
input ID Score;
datalines;
10 1000
10 1500
10 2000
20 3000
20 4000
30 2500
;
run;
/*count each observation's place in its ID group*/
data have2;
set have;
count + 1;
by id;
if first.id then count = 1;
run;
/*if an ID group has only one observation, keep its own score; otherwise take the score of the next row*/
proc sql;
create table want as select distinct
a.id, a.score,
case when max(a.count) = 1 then a.score else b.score end as score2
from have2 as a
left join have2 (where = (count > 1)) as b
on a.id = b.id and a.count = b.count - 1
group by a.id;
quit;
I need to calculate max(Measure) over the last 3 months for each ID and month, without using PROC SQL. I was wondering whether I could do this with the RETAIN statement, but I have no idea how to implement the comparison of Measure in the current row with the two preceding rows.
I will also need to do this for windows longer than 3 months, so any solution that does not require a separate step for each additional month would be greatly appreciated!
Here is the data I have:
data have;
input month ID $ measure;
cards;
201501 A 0
201502 A 30
201503 A 60
201504 A 90
201505 A 0
201506 A 0
201501 B 0
201502 B 30
201503 B 0
201504 B 30
201505 B 60
;
Here the one I need:
data want;
input month ID $ measure max_measure_3m;
cards;
201501 A 0 0
201502 A 30 30
201503 A 60 60
201504 A 90 90
201505 A 0 90
201506 A 0 90
201501 B 0 0
201502 B 30 30
201503 B 0 30
201504 B 30 30
201505 B 60 60
;
You can do this with an array that is sized to your moving window. I'm not sure how dynamic the window needs to be. If you need the max over 4 or 5 months in addition to 3 months, I would recommend using PROC EXPAND instead of these methods. The documentation for PROC EXPAND has a good example of how to do this.
data want;
set have;
by id;
array _prev(0:2) _temporary_;
if first.id then
do;
call missing (of _prev(*));
count=0;
end;
count+1;
_prev(mod(count, 3))=measure;
max=max(of _prev(*));
drop count;
run;
proc expand data=test out=out method=none;
by id;
id month;
convert x = x_movave3 / transformout=(movave 3);
convert x = x_movave4 / transformout=(movave 4);
run;
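If the window length has to change, one option is to parameterise the array approach shown earlier with a macro variable. This is only a sketch under some assumptions: the macro variable window, the data set name want_n and the column max_measure are names I made up, the data is sorted by id and then month, every id has one row per consecutive month, and window is at least 2.

```sas
%let window = 3;   /* hypothetical: number of months in the moving window */

data want_n;
  set have;
  by id;
  /* circular buffer holding the last &window values of measure */
  array _prev(0:%eval(&window - 1)) _temporary_;
  if first.id then do;
    /* empty the buffer at the start of each id */
    call missing(of _prev(*));
    count = 0;
  end;
  count + 1;
  /* overwrite the oldest slot with the current value */
  _prev(mod(count, &window)) = measure;
  /* MAX ignores the slots that are still missing */
  max_measure = max(of _prev(*));
  drop count;
run;
```

Changing the %let line to 4 or 5 then gives the 4-month or 5-month maximum without adding another step.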
Try this:
data want(drop=l1 l2 cnt tmp);
set have;
by id;
retain cnt max_measure_3m l1 l2;
if first.id then do;
max_measure_3m = 0;
cnt = 0;
l1 = .;
l2 = .;
end;
cnt = cnt + 1;
tmp = lag(measure);
if cnt > 1 then
l1 = tmp;
tmp = lag2(measure);
if cnt > 2 then
l2 = tmp;
/* the window maximum is the largest of the current and the two previous values;
   MAX ignores missing arguments */
max_measure_3m = max(measure, l1, l2);
run;