How to Capture previous row value and perform subtraction - sas

Treat Table 1 as the input data and Table 2 as the desired output. Closing_Bal is derived as Opening_Bal - EMI; for example, 20 - 2 = 18. I want that 18 to appear in the second row under Opening_Bal, then compute Opening_Bal - EMI again, and so on until a new LAN starts. When a new LAN appears, the calculation should start over.
I tried the LAG function but could not get the running calculation to work.

Try this
data A;
infile datalines dlm = '|' dsd; /* INFILE goes before INPUT so the delimiter options apply */
input Month $ LAN Opening_Bal EMI Closing_Bal;
datalines;
1_Nov|1|20|2|18
2_Dec|1| |3|
3_Jan|1| |5|
4_Feb|1| |3|
1_Nov|2|30|4|26
2_Dec|2| |3|
3_Jan|2| |2|
4_Feb|2| |5|
5_Mar|2| |6|
;
data B(drop = c);
set A;
by LAN;
if first.LAN then c = Closing_Bal;
if Opening_Bal = . then do;
Opening_Bal = c;
Closing_Bal = Opening_Bal - EMI;
c = Closing_Bal;
end;
retain c;
run;
Result:
Month LAN Opening_Bal EMI Closing_Bal
1_Nov 1 20 2 18
2_Dec 1 18 3 15
3_Jan 1 15 5 10
4_Feb 1 10 3 7
1_Nov 2 30 4 26
2_Dec 2 26 3 23
3_Jan 2 23 2 21
4_Feb 2 21 5 16
5_Mar 2 16 6 10

The problem is that you already have CLOSING_BAL on the input dataset, so when the SET statement reads a new observation it will overwrite the value calculated on the previous observation. Either drop or rename the variable in the source dataset.
Example:
data have;
input Month $ LAN Opening_Bal EMI Closing_Bal;
datalines;
1_Nov 1 20 2 18
2_Dec 1 . 3 .
3_Jan 1 . 5 .
4_Feb 1 . 3 .
1_Nov 2 30 4 26
2_Dec 2 . 3 .
3_Jan 2 . 2 .
4_Feb 2 . 5 .
5_Mar 2 . 6 .
;
data want;
set have (drop=closing_bal);
retain Closing_Bal;
Opening_Bal=coalesce(Opening_Bal,Closing_Bal);
Closing_bal=Opening_bal - EMI ;
run;
Results:
Obs Month LAN Opening_Bal EMI Closing_Bal
1 1_Nov 1 20 2 18
2 2_Dec 1 18 3 15
3 3_Jan 1 15 5 10
4 4_Feb 1 10 3 7
5 1_Nov 2 30 4 26
6 2_Dec 2 26 3 23
7 3_Jan 2 23 2 21
8 4_Feb 2 21 5 16
9 5_Mar 2 16 6 10

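I don't think this works:
data B;
  set A;
  by lan;
  if not first.lan then do;
    opening_bal = lag(closing_bal);
    closing_bal = opening_bal - EMI;
  end;
run;
because LAG is not executed for every observation. LAG returns the value stored the last time that same LAG call ran, not the value from the previous row, so calling it only inside the IF block gives it a queue that skips the first row of every LAN.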
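The effect of calling LAG conditionally can be seen in a small sketch. The dataset DEMO and the variable names prev_always and prev_cond are hypothetical, introduced only for illustration.

```sas
/* Hypothetical four-row dataset, two groups of two rows */
data demo;
  input grp x;
  datalines;
1 10
1 20
2 30
2 40
;

data lag_compare;
  set demo;
  by grp;
  /* LAG executed on every row: returns x from the previous row */
  prev_always = lag(x);
  /* LAG executed only on non-first rows: its queue is fed only by those rows */
  if not first.grp then prev_cond = lag(x);
run;

proc print data=lag_compare;
run;
```

On the last row prev_always is 30 (the previous row), while prev_cond is 20, the value of x the last time the conditional LAG call actually ran. That is why a conditional LAG rarely gives the "previous row" value people expect.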

Related

SAS: How to calculate a moving average in SAS using the current observation?

I am trying to calculate a moving average for a test data set in SAS, where each newly calculated moving average feeds into the next one. I have added a sample calculation below.
I have data like this:
data have;
input category $ week value ;
datalines;
a 1 10
a 2 5
a 3 .
a 4 30
a 5 50
b 1 30
b 2 5
b 3 .
b 4 0
b 5 50
;
I want to calculate a 4-week moving average at the category level.
Here is the expected output:
data want;
input category $ week value moving_average;
datalines;
a 1 10 .
a 2 5 .
a 3 . .
a 4 30 .
a 5 50 .
a 6 . 28.33
a 7 . 36.11
a 8 . 34.86
b 1 30 .
b 2 5 .
b 3 . .
b 4 0 .
b 5 50 .
b 6 . 18.33
b 7 . 22.77
b 8 . 22.775
b 9 . 28.46
;
Here is the logic for b:
For week 6: (50+0+5)/3 = 18.33
For week 7: (18.33+50+0)/3 = 22.77
For week 8: (22.77+18.33+50+0)/4 = 22.775
A similar calculation applies to a.
Weeks 1 to 5 can be treated as training data and the weeks after week 5 as test data.
I hope the problem statement is clear this time.
So you want to create new observations? You will need an explicit OUTPUT statement.
You can use a "circular array" to make it easier to calculate the average.
data have;
input category $ week value ;
datalines;
a 1 10
a 2 5
a 3 .
a 4 30
a 5 50
b 1 30
b 2 5
b 3 .
b 4 0
b 5 50
;
data want;
set have;
by category ;
array c_array [0:3] _temporary_ ;
if first.category then call missing(of c_array[*]);
if week <= 5 then c_array[mod(week,4)]=value;
output;
if week=5 then do week=6 to 9;
value=.;
average=mean(of c_array[*]);
output;
c_array[mod(week,4)]=average;
end;
run;
Results
Obs category week value average
1 a 1 10 .
2 a 2 5 .
3 a 3 . .
4 a 4 30 .
5 a 5 50 .
6 a 6 . 28.3333
7 a 7 . 36.1111
8 a 8 . 36.1111
9 a 9 . 37.6389
10 b 1 30 .
11 b 2 5 .
12 b 3 . .
13 b 4 0 .
14 b 5 50 .
15 b 6 . 18.3333
16 b 7 . 22.7778
17 b 8 . 22.7778
18 b 9 . 28.4722
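To see why the array above behaves as a circular buffer, here is a minimal sketch that only prints which slot each week maps to. The variable name slot is introduced here for illustration.

```sas
/* Print the slot index that mod(week, 4) assigns to each week */
data _null_;
  do week = 1 to 9;
    slot = mod(week, 4);   /* slot cycles through 1, 2, 3, 0 */
    put week= slot=;
  end;
run;
```

Weeks 1, 5 and 9 all land in slot 1, so each new value overwrites the one from four weeks earlier, and the array always holds the four most recent values.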

How to Count Distinct for SAS PROC SQL with Rolling Date Window of 5 years?

I want to count the distinct values of a variable grouped by MEMBER_ID and a rolling date range of 5 years. I have seen a similar post.
How to Count Distinct for SAS PROC SQL with Rolling Date Window?
When I change h2.DATE BETWEEN h.DATE - 180 AND h.DATE to h2.year BETWEEN h.year-5 AND h.year, should it give me the correct distinct count within the last 5 years? Thank you in advance.
data have;
input permno year Cand_ID$;
datalines;
1 2000 1
1 2001 2
1 2002 3
1 2003 1
1 2004 3
1 2005 1
2 2000 1
2 2001 3
2 2002 1
2 2003 2
2 2004 2
2 2005 2
2 2006 1
2 2007 1
3 2001 3
3 2002 3
3 2003 3
3 2004 1
3 2005 1
;
run;
Here's how you can do it with a data step. This assumes you have values for all years. If you do not, fill it in with zeros.
Use the LAG function to keep a rolling, sorted list of the IDs from the last 5 years; counting the distinct values in that list on each row gives the rolling 5-year count.
In other words, we're going to create and count a list that looks like this:
permno year id1 id2 id3 id4 id5
1 2000 . . . . 1
1 2001 . . . 1 2
1 2002 . . 1 2 3
1 2003 . 1 1 2 3
Code:
data want;
set have;
by permno year;
array lagid[4] $;
array id[5] $;
id1 = cand_id;
lagid1 = lag1(cand_id);
lagid2 = lag2(cand_id);
lagid3 = lag3(cand_id);
lagid4 = lag4(cand_id);
/* Reset the counter for the first group */
if(first.permno) then n = 0;
/* Count the number of rows within a group */
n+1;
/* Save the last 5 years by using the lag function,
but do not get lags from previous groups
*/
do i = 1 to 4;
if(i < n) then id[i+1] = lagid[i];
end;
/* Sort the array of IDs into ascending order */
call sortc(of id:);
/* Count the number of distinct IDs in the array. Do not count
missing values.
*/
n_distinct = 1;
do i = 2 to dim(id);
if(id[i] > id[i-1] AND NOT missing(id[i-1]) ) then n_distinct+1;
end;
drop lag: n i;
run;
Output (without id: dropped):
permno year Cand_ID id1 id2 id3 id4 id5 n_distinct
1 2000 1 . . . . 1 1
1 2001 2 . . . 1 2 2
1 2002 3 . . 1 2 3 3
1 2003 1 . 1 1 2 3 3
1 2004 3 1 1 2 3 3 3
1 2005 1 1 1 2 3 3 3
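For comparison, the self-join idea from the question can be sketched in PROC SQL as follows. This is an illustrative sketch, not part of the answer above; the table name want_sql is made up here. Note that BETWEEN h.year - 5 AND h.year spans six calendar years, so this sketch uses h.year - 4 to match the five-row window of the DATA step.

```sas
proc sql;
  create table want_sql as
  select h.permno,
         h.year,
         h.Cand_ID,
         /* distinct IDs seen for this permno in the current and previous 4 years */
         count(distinct h2.Cand_ID) as n_distinct
  from have as h
       inner join have as h2
         on  h2.permno = h.permno
         and h2.year between h.year - 4 and h.year
  group by h.permno, h.year, h.Cand_ID
  order by h.permno, h.year;
quit;
```

Unlike the LAG version, this form does not require a row for every year, because the window is defined on the year values rather than on row positions.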

Bidirectional VLOOKUP - flag in the same table - SAS

I need to do this:
table 1:
ID Cod.
1 20
2 102
4 30
7 10
9 201
10 305
table 2:
ID Cod.
1 20
2 50
3 15
4 30
5 25
7 10
10 300
Now, I got a table like this with an outer join:
ID Cod. ID1 Cod1.
1 20 1 20
2 50 . .
. . 2 102
3 15 . .
4 30 4 30
5 25 . .
7 10 7 10
. . 9 201
10 300 . .
. . 10 305
Now I want to add flags that tell me whether the ID appears in both tables and whether the code matches, so:
ID Cod. ID1 Cod1. FLag_ID Flag_cod:
1 20 1 20 0 0
2 50 . . 0 1
. . 2 102 0 1
3 15 . . 1 1
4 30 4 30 0 0
5 25 . . 1 1
7 10 7 10 0 0
. . 9 201 1 1
10 300 . . 0 1
. . 10 305 0 1
I would like to know how I can get flag_ID, specifically to cover the cases of ID = 2 and ID = 10.
Thank you
You can group by a coalescence of id in order to count and compare details.
Example
data table1;
input id code @@; datalines;
1 20 2 102 4 30 7 10 9 201 10 305
;
data table2;
input id code @@; datalines;
1 20 2 50 3 15 4 30 5 25 7 10 10 300
;
proc sql;
create table got as
select
table2.id, table2.code
, table1.id as id1, table1.code as code1
, case
when count(table1.id) = 1 and count(table2.id) = 1 then 0 else 1
end as flag_id
, case
when table1.code - table2.code ne 0 then 1 else 0
end as flag_code
from
table1
full join
table2
on
table2.id=table1.id and table2.code=table1.code
group by
coalesce(table2.id,table1.id)
;
quit;
You might also want to look into
Proc COMPARE with BY
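A minimal sketch of what a PROC COMPARE call might look like for these two tables. It uses the ID statement to match rows by id, which is a choice made here rather than the BY form the answer mentions, and the LISTALL option to report ids found in only one table; treat both as assumptions to adapt.

```sas
data table1;
  input id code @@;
  datalines;
1 20 2 102 4 30 7 10 9 201 10 305
;

data table2;
  input id code @@;
  datalines;
1 20 2 50 3 15 4 30 5 25 7 10 10 300
;

/* Both tables are already in id order, which the ID statement expects */
proc compare base=table1 compare=table2 listall;
  id id;      /* match observations on id */
  var code;   /* compare the code values for matched ids */
run;
```

The report lists ids present in only one table and the ids whose code differs, which covers the same ground as the two flags.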

SAS, calculate row difference

data test;
input ID month d_month;
datalines;
1 59 0
1 70 11
1 80 21
2 10 0
2 11 1
2 13 3
3 5 0
3 9 4
4 8 0
;
run;
I have two columns of data, ID and month. The same ID may have multiple rows (1-5). The second column is the enrolled month. I want to create a third column, d_month, holding the difference between the current month and the initial month for each ID.
You can do it like this:
data test;
input ID month d_month;
datalines;
1 59 0
1 70 11
1 80 21
2 10 0
2 11 1
2 13 3
3 5 0
3 9 4
4 8 0
;
run;
data calc;
set test;
by id;
retain current_month;
if first.id then do;
current_month=month;
calc_month=0;
end;
if ^first.id then do;
calc_month = month - current_month ;
end;
run;
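The same idea can be written more compactly. This is a sketch assuming the data are sorted by ID; first_month is a name introduced here, and the input is a shortened copy of the sample data.

```sas
data test;
  input ID month;
  datalines;
1 59
1 70
1 80
2 10
2 11
;

data calc;
  set test;
  by ID;
  retain first_month;                        /* keep the value across rows */
  if first.ID then first_month = month;      /* remember the initial month */
  d_month = month - first_month;             /* 0 on the first row of each ID */
  drop first_month;
run;
```

Because the subtraction runs on every row, the first row of each ID gets 0 without a separate branch.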

Detect the difference b/w ages greater than some value using SAS

I am trying to detect groups in which the difference between the first age and the second age is greater than 5. For example, with the following data, the age difference in grp=1 is 39, so I want to output that group to a separate data set. The same goes for grp 4.
id grp age sex
1 1 60 M
2 1 21 M
3 2 30 M
4 2 25 F
5 3 45 F
6 3 30 F
7 3 18 M
8 4 32 M
9 4 18 M
10 4 16 M
My initial idea was to sort by grp and then take the absolute difference between ages using something like if first.grp then do;. But I don't know how to get the absolute difference between the first and second age within each group, or really how to start.
Thanks in advance.
Here's one way that I think works. Note that it compares the largest and smallest age in each group rather than the first two rows.
data have;
input id $ grp $ age sex $;
datalines;
1 1 60 M
2 1 21 M
3 2 30 M
4 2 25 F
5 3 45 F
6 3 30 F
7 3 18 M
8 4 32 M
9 4 18 M
10 4 16 M
;
proc sort data=have ;
by grp descending age;
run;
data temp(keep=grp);
retain old;
set have;
by grp descending age;
if first.grp then old=age;
if last.grp then do;
diff=old-age;
if diff>5 then output ;
end;
run;
Data want;
merge temp(in=a) have(in=b);
by grp ;
if a and b;
run;
I would use PROC TRANSPOSE so the values in each group can easily be compared. For example:
data groups1;
input id $ grp age sex $;
datalines;
1 1 60 M
2 1 21 M
3 2 30 M
4 2 25 F
5 3 45 F
6 3 30 F
7 3 18 M
8 4 32 M
9 4 18 M
10 4 16 M
;
run;
proc sort data=groups1;
by grp; /* This maintains age order */
run;
proc transpose data=groups1 out=groups2;
by grp;
var age;
run;
With the transposed data you can do whatever comparison you like (I can't tell from your question what exactly you want, so I just compare first two ages):
/* With all ages of a particular group in a single row, it is easy to compare */
data outgroups1(keep=grp);
set groups2;
if abs(col1-col2)>5 then output;
run;
In this instance this would be my preferred method for creating a separate data set for each group that satisfies whatever condition is applied (generate and include code dynamically):
/* A separate data set per GRP value in OUTGROUPS1 */
filename dynacode catalog "work.dynacode.mycode.source";
data _null_;
set outgroups1;
file dynacode;
put "data grp" grp ";";
put " set groups1(where=(grp=" grp "));";
put "run;" /;
run;
%inc dynacode;
If you are after the difference between just the 1st and 2nd ages, then the following code is a fairly straightforward way of extracting these. It reads through the dataset to identify the groups, then uses the direct access method, POINT=, to extract the relevant records. I put in an extra condition, grp=lag(grp), just in case you have any groups with only 1 record.
data want;
set have;
by grp;
if first.grp then do;
num_grp=0;
outflag=0;
end;
outflag+ifn(lag(first.grp)=1 and grp=lag(grp) and abs(dif(age))>5,1,0) /* set flag to determine if group meets criteria */;
if not first.grp then num_grp+1; /* count number of records in group */
if last.grp and outflag=1 then do i=_n_-num_grp to _n_;
set have point=i; /* extract required group records */
drop num_grp outflag;
output;
end;
run;
Here's an SQL approach (using CarolinaJay's code to create the dataset):
data groups1;
input id grp age sex $;
datalines;
1 1 60 M
2 1 21 M
3 2 30 M
4 2 25 F
5 3 45 F
6 3 30 F
7 3 18 M
8 4 32 M
9 4 18 M
10 4 16 M
;
run;
proc sql noprint;
create table xx as
select a.*
from groups1 a
where grp in (select b.grp
from groups1 b
join groups1 c on c.id = b.id+1
and c.grp = b.grp
and abs(c.age - b.age) > 5
left join groups1 d on d.id = b.id-1
and d.grp = b.grp
where d.id eq .
)
;
quit;
The join on C finds every record whose next record (id + 1) in the same group differs in age by more than 5 in absolute value. The left join on D, together with the WHERE clause, keeps only the cases where that record is the first one in its group. This relies on the id values being consecutive.
data have;
input id $ grp $ age sex $;
datalines;
1 1 60 M
2 1 21 M
3 2 30 M
4 2 25 F
5 3 45 F
6 3 30 F
7 3 18 M
8 4 32 M
9 4 18 M
10 4 16 M
;
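data want;
  /* First pass over the group: capture the first and second ages */
  do i = 1 by 1 until(last.grp);
    set have;
    by grp notsorted;
    if i = 1 then age1 = age;
    if i = 2 then age2 = age;
  end;
  /* Absolute difference; stays missing when the group has only one row */
  if n(age1, age2) = 2 then diff = abs(age1 - age2);
  /* Second pass over the same group: output its rows if it qualifies */
  do until(last.grp);
    set have;
    by grp notsorted;
    if diff > 5 then output;
  end;
  drop i age1 age2;
run;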