SAS, calculate row difference - sas

data test;
input ID month d_month;
datalines;
1 59 0
1 70 11
1 80 21
2 10 0
2 11 1
2 13 3
3 5 0
3 9 4
4 8 0
;
run;
I have two columns of data ID and Month. Column 1 is the ID, the same ID may have multiple rows (1-5). The second column is the enrolled month. I want to create the third column. It calculates the different between the current month and the initial month for each ID.

you can do it like that.
data test;
input ID month d_month;
datalines;
1 59 0
1 70 11
1 80 21
2 10 0
2 11 1
2 13 3
3 5 0
3 9 4
4 8 0
;
run;
data calc;
set test;
by id;
retain current_month;
if first.id then do;
current_month=month;
calc_month=0;
end;
if ^first.id then do;
calc_month = month - current_month ;
end;
run;
Krs

Related

How to Capture previous row value and perform subtraction

How to Capture previous row value and perform subtraction
Refer Table 1 as main data, Table 2 as desired output, Let me explain you in detail, Closing_Bal is derived from (Opening_bal - EMI) for eg if (20 - 2) = 18, as value 18 i want in 2nd row under opening_bal column then ( opening_bal - EMI) and so till new LAN , If New LAN available then start the loop again ,
i have created lag function butnot able to run loop
Try this
data A;
input Month $ LAN Opening_Bal EMI Closing_Bal;
infile datalines dlm = '|' dsd;
datalines;
1_Nov|1|20|2|18
2_Dec|1| |3|
3_Jan|1| |5|
4_Feb|1| |3|
1_Nov|2|30|4|26
2_Dec|2| |3|
3_Jan|2| |2|
4_Feb|2| |5|
5_Mar|2| |6|
;
data B(drop = c);
set A;
by LAN;
if first.LAN then c = Closing_Bal;
if Opening_Bal = . then do;
Opening_Bal = c;
Closing_Bal = Opening_Bal - EMI;
c = Closing_Bal;
end;
retain c;
run;
Result:
Month LAN Opening_Bal EMI Closing_Bal
1_Nov 1 20 2 18
2_Dec 1 18 3 15
3_Jan 1 15 5 10
4_Feb 1 10 3 7
1_Nov 2 30 4 26
2_Dec 2 26 3 23
3_Jan 2 23 2 21
4_Feb 2 21 5 16
5_Mar 2 16 6 10
The problem is that you already have CLOSING_BAL on the input dataset, so when the SET statement reads a new observation it will overwrite the value calculated on the previous observation. Either drop or rename the variable in the source dataset.
Example:
data have;
input Month $ LAN Opening_Bal EMI Closing_Bal;
datalines;
1_Nov 1 20 2 18
2_Dec 1 . 3 .
3_Jan 1 . 5 .
4_Feb 1 . 3 .
1_Nov 2 30 4 26
2_Dec 2 . 3 .
3_Jan 2 . 2 .
4_Feb 2 . 5 .
5_Mar 2 . 6 .
;
data want;
set have (drop=closing_bal);
retain Closing_Bal;
Opening_Bal=coalesce(Opening_Bal,Closing_Bal);
Closing_bal=Opening_bal - EMI ;
run;
Results:
Opening_ Closing_
Obs Month LAN Bal EMI Bal
1 1_Nov 1 20 2 18
2 2_Dec 1 18 3 15
3 3_Jan 1 15 5 10
4 4_Feb 1 10 3 7
5 1_Nov 2 30 4 26
6 2_Dec 2 26 3 23
7 3_Jan 2 23 2 21
8 4_Feb 2 21 5 16
9 5_Mar 2 16 6 10
I am not sure this works
data B;
set A;
by lan;
if not first.lan then do;
opening_bal = lag(closing_bal);
closing_bal = opening_bal - EMI;
end;
run;
because you don't execute lag for each observation.

Create values for group - SAS

data test;
input Index Indicator value FinalValue;
datalines;
1 0 5 21
1 1 21 21
2 1 0 0
3 0 4 7
3 1 7 7
3 0 8 7
3 0 2 7
4 1 1 1
4 0 4 1
;
run;
I have a data set with the first 3 columns. How do I get the 4th columns based on the indicators? For example, for the index, when the indicator =1, the value is 21, so I put 21 is the final values in all lines for index 1.
Use the SAS Retain Keyword.
You can do this in a data step; by Retaining the Value where indicator = 1.
Steps:
Sort your data by Index and Indicator
Group by the Index & Retain the Value where Indicator=1
Code:
/*Sort Data by Index and Indicator & remove the hardcodeed finalvalue*/
proc sort data=test (keep= Index Indicator value);
by index descending indicator ;
run;
/*Retain the FinalValue*/
data want;
set test;
retain FinalValue;
keep Index Indicator value FinalValue;
if indicator =1 then do;FinalValue=value;end;
/*The If statement below will assign . to records that doesn't have an indicator value of 1*/
if indicator ne 1 and FIRST.Index=1 then FinalValue=.;
by index;
run;
Output:
Index=1 Indicator=1 value=21 FinalValue=21
Index=1 Indicator=0 value=5 FinalValue=21
Index=2 Indicator=1 value=0 FinalValue=0
Index=3 Indicator=1 value=7 FinalValue=7
Index=3 Indicator=0 value=4 FinalValue=7
Index=3 Indicator=0 value=8 FinalValue=7
Index=3 Indicator=0 value=2 FinalValue=7
Index=4 Indicator=1 value=1 FinalValue=1
Index=4 Indicator=0 value=4 FinalValue=1
Use proc sql by left join. Select value which indicator=1 and group by index, then left join with original dataset. It seemed that your first row of index=3 should be 7, not 0.
proc sql;
select a.*,b.finalvalue from test a
left join (select *,value as finalvalue from test group by index having indicator=1) b
on a.index=b.index;
quit;
This is rather old school but should be adequate. I reckon you call it a self merge or something.
data test;
input Index Indicator value;* FinalValue;
datalines;
1 0 5 21
1 1 21 21
2 1 0 0
3 0 4 7
3 1 7 7
3 0 8 7
3 0 2 7
4 1 1 1
4 0 4 1
;;;;
run;
data final;
if 0 then set test;
merge test(where=(indicator eq 1) rename=(value=FinalValue)) test;
by index;
run;
proc print;
run;
Final
Obs Index Indicator value Value
1 1 0 5 21
2 1 1 21 21
3 2 1 0 0
4 3 0 4 7
5 3 1 7 7
6 3 0 8 7
7 3 0 2 7
8 4 1 1 1
9 4 0 4 1

Filling in gaps between sequentially numbered records and updating a status indicator

In a summarized dataset, I have the status of an event at each hour after baseline in which it was recorded. I also have the last hour the event could have been recorded. I want to create a new dataset with one record for each hour from the first through the last hour, with the status for each record being the one from the last recorded status.
Here is an example dataset:
data new;
input hour status last_hour;
cards;
2 1 12
4 1 12
5 1 12
6 1 12
7 0 12
9 1 12
10 0 12
;
run;
In this case, the first recorded hour was the second, and the last recorded hour was the 10th. The last possible hour to record data was the 12th.
The final dataset should look like so:
0 . 12
1 . 12
2 1 12
3 1 12
4 1 12
5 1 12
6 1 12
7 0 12
8 0 12
9 1 12
10 0 12
11 0 12
12 0 12
I sort of have it working with this series of data steps, but I'm not sure if there's a cleaner way I'm not seeing.
data step1;
set new (keep=id hour);
by id;
do hour = 0 to last_hour;
output;
end;
run;
proc sort data=step1;
by id hour;
run;
proc sql;
create table step2 as
select distinct a.id, a.hour, b.status
from step1 as a
left join new as b
on a.id = b.id
and a.hour = b.hour
order by a.id, a.hour;
quit;
data step3;
set step2;
by id hour;
retain previous_status;
if first.id then do;
previous_status = .;
if status > . then previous_status = status;
end;
if not first.id then do;
if status = . and previous_status > . then status = previous_status;
if status > . then previous_status = status;
end;
run;
Seeing your code, it seems you left out of your question the fact that you also have id's. So this is a newer solution that deals with different id's. See further below for my first solution ignoring id's.
Since last_hour is always 12, I left it out of the have dataset. It will be added later on.
data have;
input id hour status;
cards;
1 2 1
1 4 1
1 5 1
1 6 1
1 7 0
1 9 1
1 10 0
2 2 1
2 4 1
2 5 1
2 6 1
2 7 0
2 9 1
2 10 0
;
Create a hours dataset, just containing numbers 0 thru 12;
data hours;
do i = 0 to 12;
hour = i;
output;
end;
drop i;
run;
Create a temporary dataset that will have the right number of rows (13 rows for every id, with valid hour values where they exist in the have table).
proc sql;
create table tmp as
select distinct t1.id, t2.hour, 12 as last_hour
from have as t1
cross join
(select hour from hours) as t2;
quit;
Then use merge and retain to fill in the missing hour column where appropriate.
data want;
merge have
tmp;
by id hour;
retain status_previous;
if not first.id then do;
if status ne . then status_previous = status;
else if status_previous ne . then status = status_previous;
end;
if last.id then status_previous = .;
drop status_previous;
run;
Previous solution (no id's)
If last_hour is always 12, then this should do it:
data have;
input hour status last_hour;
datalines;
2 1 12
4 1 12
5 1 12
6 1 12
7 0 12
9 1 12
10 0 12
;
data hours;
do i = 0 to 12;
hour = i;
last_hour = 12;
output;
end;
drop i;
run;
data want;
merge have
hours;
by hour;
retain status_previous;
if status ne . then status_previous = status;
else if status_previous ne . then status = status_previous;
drop status_previous;
run;

SAS Creating a count variable that resets

I'd like to create a count variable that resets every 6 months by the manner of crime. Here is my code :
proc sort data=incident_v1;
by incident year;
run;
Data incident_v2;
set incident_v1 end= eof;
by incident;
do i=1 until (eof);
do j = 1 to 6;
retain ID 0;
if first.incident then ID=ID +1;
end;
end;
run;
Year-mo incident # of incidents Count every six months
199901 car-jack 6 1
199902 car-jack 7 2
199903 car-jack 12 3
199904 car-jack 8 4
199905 car-jack 13 5
199906 car-jack 8 6
199907 car-jack 13 1
199908 car-jack 6 2
199909 car-jack 8 6
200001 robbery 5 1
200002 robbery 5 2
200003 robbery 8 3
200004 robbery 4 4
200005 robbery 6 5
200006 robbery 14 6
Try this:
proc sort data=incident_v1;
by incident year;
run;
Data incident_v2;
set incident_v1 end= eof;
by incident;
/*Reset Count if New Incident Type*/
if first.incident then Incident_count_6_mnths = 0;
/*Reset Count if month is 07 (which from your data below I believe to be actually year & month and of the form YYYYMM*/
if substr(year,5,2) = "07" then Incident_count_6_mnths = 0;
/*if year is a not a text field you will have to use: substr(compress(put(year,8.)),5,2) = "07" */
/*Add a counter - please note there is no need for a retain statement as the +1 will automatically retain.*/
Incident_count_6_mnths+1;
run;

subset of dataset using first and last in sas

Hi I am trying to subset a dataset which has following
ID sal count
1 10 1
1 10 2
1 10 3
1 10 4
2 20 1
2 20 2
2 20 3
3 30 1
3 30 2
3 30 3
3 30 4
I want to take out only those IDs who are recorded 4 times.
I wrote like
data AN; set BU
if last.count gt 4 and last.count lt 4 then delete;
run;
But there is something wrong.
EDIT - Thanks for clarifying. Based on your needs, PROC SQL will be more direct:
proc sql;
CREATE TABLE AN as
SELECT * FROM BU
GROUP BY ID
HAVING MAX(COUNT) = 4
;quit;
For posterity, here is how you could do it with only a data step:
In order to use first. and last., you need to use a by clause, which requires sorting:
proc sort data=BU;
by ID DESCENDING count;
run;
When using a SET statement BY ID, first.ID will be equal to 1 (TRUE) on the first instance of a given ID, 0 (FALSE) for all other records.
data AN;
set BU;
by ID;
retain keepMe;
If first.ID THEN DO;
IF count = 4 THEN keepMe=1;
ELSE keepMe=0;
END;
if keepMe=0 THEN DELETE;
run;
During the datastep BY ID, your data will look like:
ID sal count keepMe first.ID
1 10 4 1 1
1 10 3 1 0
1 10 2 1 0
1 10 1 1 0
2 20 3 0 1
2 20 2 0 0
2 20 1 0 0
3 30 4 1 1
3 30 3 1 0
3 30 2 1 0
3 30 1 1 0
If I understand correct, you are trying to extract all observations are are repeated 4 time or more. if so, your use of last.count and first.count is wrong. last.var is a boolean and it will indicate which observation is last in the group. Have a look at Tim's suggestion.
In order to extract all observations that are repeated four times or more, I would suggest to use the following PROC SQL:
PROC SQL;
CREATE TABLE WORK.WANT AS
SELECT /* COUNT_of_ID */
(COUNT(t1.ID)) AS COUNT_of_ID,
t1.ID,
t1.SAL,
t1.count
FROM WORK.HAVE t1
GROUP BY t1.ID
HAVING (CALCULATED COUNT_of_ID) ge 4
ORDER BY t1.ID,
t1.SAL,
t1.count;
QUIT;
Result:
1 10 1
1 10 2
1 10 3
1 10 4
3 30 1
3 30 2
3 30 3
3 30 4
Slight variation on Tims - assuming you don't necessarily have the count variable.
proc sql;
CREATE TABLE AN as
SELECT * FROM BU
GROUP BY ID
HAVING Count(ID) >= 4;
quit;