SAS data mining

SAS data mining - sas

I have a dataset with ID & a Good/Bad indicator variable:
ID Good_Bad
734374 0
4834110 1
I want to extrapolate the 1's 12 times as 0s so that for every 0 I have 12 1s like this:
ID Good_Bad
734374 0
4834110 1
4834110 1
4834110 1
4834110 1
4834110 1
4834110 1
4834110 1
4834110 1
4834110 1
4834110 1
4834110 1
4834110 1
& this has to be done using SAS. Can anyone help out?
Thanks in advance!!

data want;
set have;
if good_bad = 0 then output;
else if good_bad = 1 then do;
do i = 1 to 12;
output;
end;
end;
drop i;
run;

Related

How do I find first row of last group in SAS, where ordering matters?

I'd like to ask help in this, as I am new to SAS, but a PROC SQL approach is usable as well.
My dataset has IDs, a time variable, and a flag. After I sort by id and time, I need to find the first flagged observation of the last flagged group/streak. As in:
ID TIME FLAG
1 2 1
1 3 1
1 4 1
1 5 0
1 6 1
1 7 0
1 8 1
1 9 1
1 10 1
2 2 0
2 3 1
2 4 1
2 5 1
2 6 1
2 7 1
Here I want my script to return the row where time is 8 for ID 1, as it is the first observation from the last "streak", or flagged group. For ID 2 it should be where time is 3.
Desired output:
ID TIME FLAG
1 8 1
2 3 1
I'm trying to wrap my head around using first. and last. here, but I suppose the problem here is that I view temporally displaced flagged groups/streaks as different groups, while SAS looks at them as they are only separated by flag, so a simple "take first. from last." is not sufficient.
I was also thinking of collapsing the flags to a string and using a regex lookahead, but I couldn't come up with either the method or the pattern.

I would just code a double DOW loop. The first will let you calculate the observation for this ID that you want to output and the second will read through the records again and output the selected observation.
You can use the NOTSORTED keyword on the BY statement to have SAS calculate the FIRST.FLAG variable.
data have;
input ID TIME FLAG;
cards;
1 2 1
1 3 1
1 4 1
1 5 0
1 6 1
1 7 0
1 8 1
1 9 1
1 10 1
2 2 0
2 3 1
2 4 1
2 5 1
2 6 1
2 7 1
;
data want;
do obs=1 by 1 until(last.id);
set have;
by id flag notsorted;
if first.flag then want=obs;
end;
do obs=1 to obs;
set have;
if obs=want then output;
end;
drop obs want;
run;

Loop through the dataset by id. Use the lag function to look at the current and previous value of flag. If the current value is 1 and the previous value is 0, or it's the first observation for that ID, write the value of time to a retained variable. Only output the last observation for each id. The retained variable should contain the time of the first flagged observation of the last flagged group:
data result;
set have;
by id;
retain firstflagged;
prevflag = lag(flag);
if first.id and flag = 1 then firstflagged = time;
else if first.id and flag = 0 then firstflagged = .;
else if flag = 1 and prevflag = 0 then firstflagged = time;
if last.id then output;
keep id firstflagged flag;
rename firstflagged = time;
run;

SAS: How can I test the stability of a value among time

I have this database:
data temp;
input ID monitoring_date score ;
datalines;
1 10/11/2006 0
1 10/12/2006 0
1 15/01/2007 1
1 20/01/2007 1
1 20/04/2007 1
2 10/08/2008 0
2 11/09/2008 0
2 17/10/2008 1
2 12/11/2008 0
3 10/12/2008 0
3 10/08/2008 0
3 11/09/2008 0
3 17/10/2009 1
3 12/12/2009 1
3 05/01/2010 0
4 10/12/2006 0
4 10/08/2006 0
4 11/09/2006 0
4 17/10/2007 0
4 12/12/2007 0
4 09/04/2008 1
4 05/08/2008 1
5 10/12/2013 0
5 03/09/2013 0
5 11/09/2013 0
5 19/10/2014 0
5 10/12/2014 1
5 14/01/2015 1
6 10/12/2017 0
6 10/08/2018 0
6 11/09/2018 0
6 17/10/2018 1
6 12/12/2018 1
6 09/04/2019 1
6 25/07/2019 0
6 05/08/2019 1
6 15/09/2019 0
;
I would like to create a new database with a new column where I note, for each ID, the date of the first progression of the score from 0 to 1 and if this progression is stable at least 3 months until at the end of monitoring else date_progresion = . :
data want;
input ID date_progression;
datalines;
1 15/01/2007
2 .
3 .
4 09/04/2008
5 .
6 .
;
I really have no idea to code this and I would like to get the wanted data to generate a cox model where the progression (Yes/No) is my event.
I am really stuck !
Thank you in advance for your help!

A DOW loop can process the ID groups, tracking for a single active run of 1s. A run has a start date and duration.
Example:
data want;
do _n_ = 1 by 1 until (last.id);
set have;
by id;
select;
when (pscore = 0 and score = 1) do; state = 1; start = date; dur = 1; end;
when (pscore = 1 and score = 1) do; state = 2; dur + 1; end;
when (pscore = 1 and score = 0) do; state = 3; start = .; dur = .; end;
when (pscore = 0 and score = 0) do; state = 4; end;
otherwise;
end;
pscore = score;
end;
if state = 2 and dur >= 3 then progression_date = start;
keep ID progression_date;
format progression_date yymmdd10.;
run;

Identifying first occurrence after trigger event

I have a big panel dataset that looks somewhat like this:
data have;
input id t a b ;
datalines;
1 1 0 0
1 2 0 0
1 3 1 0
1 4 0 0
1 5 0 1
1 6 1 0
1 7 0 0
1 8 0 0
1 9 0 0
1 10 0 1
2 1 0 0
2 2 1 0
2 3 0 0
2 4 0 0
2 5 0 1
2 6 0 1
2 7 0 1
2 8 0 1
2 9 1 0
2 10 0 1
3 1 0 0
3 2 0 0
3 3 0 0
3 4 0 0
3 5 0 0
3 6 0 0
3 7 1 0
3 8 0 0
3 9 0 0
3 10 0 0
;
run;
For every ID I want to record all 'trigger' events, namely when a=1 and then I need to how long it takes to the next occurrence of b=1. The final output should then give me the following:
data want;
input id a_no a_t b_t diff ;
datalines;
1 1 3 5 2
1 2 6 10 4
2 1 2 5 3
2 2 9 10 1
3 1 7 . .
;
run;
It is of course no problem to get all a=1 and b=1 events, but as it is a very big dataset with a lot of both events for every ID I am searching for an elegant and straight-forward solution. Any ideas?

Here's a fairly simple SQL approach that gives more or less the desired output:
proc sql;
create table want
as select
t1.id,
t1.t as a_t,
t2.t as b_t,
t2.t - t1.t as diff
from
have(where = (a=1)) t1
left join
have(where = (b=1)) t2
on
t1.id = t2.id
and t2.t > t1.t
group by t1.id, t1.t
having diff = min(diff)
;
quit;
The only part missing is a_no. This sort of row-incrementing ID is quite a lot of work to generate consistently in SQL, but it's trivial with an extra data step:
data want;
set want;
by id;
if first.id then a_no = 0;
a_no + 1;
run;

An elegant DATA step way can use nested DOW loops. It's straight forward when you understand DOW loops.
data want(keep=id--diff);
length id a_no a_t b_t diff 8;
do until (last.id); * process each group;
do a_no = 1 by 1 until(last.id); * counter for each output;
do until ( output_condition or end); * process each triggering state change;
SET have end=end; * read data;
by id; * setup first. last. variables for group;
if a=1 then a_t = t; * detect and record start of trigger state;
output_condition = (b=1 and t > a_t > 0); * evaluate for proper end of trigger state;
end;
if output_condition then do;
b_t = t; * compute remaining info at output point;
diff = b_t - a_t;
OUTPUT;
a_t = .; * reset trigger state tracking variables;
b_t = .;
end;
else
OUTPUT; * end of data reached without triggered output;
end;
end;
run;
Note: A SQL way (not shown) can use self join within groups.

Create values for group - SAS

data test;
input Index Indicator value FinalValue;
datalines;
1 0 5 21
1 1 21 21
2 1 0 0
3 0 4 7
3 1 7 7
3 0 8 7
3 0 2 7
4 1 1 1
4 0 4 1
;
run;
I have a data set with the first 3 columns. How do I get the 4th columns based on the indicators? For example, for the index, when the indicator =1, the value is 21, so I put 21 is the final values in all lines for index 1.

Use the SAS Retain Keyword.
You can do this in a data step; by Retaining the Value where indicator = 1.
Steps:
Sort your data by Index and Indicator
Group by the Index & Retain the Value where Indicator=1
Code:
/*Sort Data by Index and Indicator & remove the hardcodeed finalvalue*/
proc sort data=test (keep= Index Indicator value);
by index descending indicator ;
run;
/*Retain the FinalValue*/
data want;
set test;
retain FinalValue;
keep Index Indicator value FinalValue;
if indicator =1 then do;FinalValue=value;end;
/*The If statement below will assign . to records that doesn't have an indicator value of 1*/
if indicator ne 1 and FIRST.Index=1 then FinalValue=.;
by index;
run;
Output:
Index=1 Indicator=1 value=21 FinalValue=21
Index=1 Indicator=0 value=5 FinalValue=21
Index=2 Indicator=1 value=0 FinalValue=0
Index=3 Indicator=1 value=7 FinalValue=7
Index=3 Indicator=0 value=4 FinalValue=7
Index=3 Indicator=0 value=8 FinalValue=7
Index=3 Indicator=0 value=2 FinalValue=7
Index=4 Indicator=1 value=1 FinalValue=1
Index=4 Indicator=0 value=4 FinalValue=1

Use proc sql by left join. Select value which indicator=1 and group by index, then left join with original dataset. It seemed that your first row of index=3 should be 7, not 0.
proc sql;
select a.*,b.finalvalue from test a
left join (select *,value as finalvalue from test group by index having indicator=1) b
on a.index=b.index;
quit;

This is rather old school but should be adequate. I reckon you call it a self merge or something.
data test;
input Index Indicator value;* FinalValue;
datalines;
1 0 5 21
1 1 21 21
2 1 0 0
3 0 4 7
3 1 7 7
3 0 8 7
3 0 2 7
4 1 1 1
4 0 4 1
;;;;
run;
data final;
if 0 then set test;
merge test(where=(indicator eq 1) rename=(value=FinalValue)) test;
by index;
run;
proc print;
run;
Final
Obs Index Indicator value Value
1 1 0 5 21
2 1 1 21 21
3 2 1 0 0
4 3 0 4 7
5 3 1 7 7
6 3 0 8 7
7 3 0 2 7
8 4 1 1 1
9 4 0 4 1

find the count of num column changing by id

Please can anyone help me the follwing probelm.
I have following dummy data:
id num
1 1
1 2
1 1
1 2
1 1
1 2
2 1
2 15
2 1
2 1
2 1
2 15
2 1
2 15
How to count number of times num (column) is changing for each id?
Please find the results and new column.
I need results like this
id number no_of_times
1 1 1
1 2 1
1 1 1
1 2 2
1 1 1
1 2 3
2 1 1
2 15 1
2 1 1
2 1 1
2 1 1
2 15 2
2 1 1
2 15 3
Hope you can understand after seeing the results

The following hash approach works for the test data provided with the question:
data have;
input id number no_of_times_target;
cards;
1 1 1
1 2 1
1 1 1
1 2 2
1 1 1
1 2 3
2 1 1
2 15 1
2 1 1
2 1 1
2 1 1
2 15 2
2 1 1
2 15 3
;
run;
data want;
set have;
by id;
if _n_ = 1 then do;
length prev_number no_of_times 8;
declare hash h();
rc = h.definekey('number','prev_number');
rc = h.definedata('no_of_times');
rc = h.definedone();
end;
prev_number = lag(number);
if number > prev_number and not(first.id) then do;
rc = h.find();
no_of_times = sum(no_of_times,1);
rc = h.replace();
end;
else no_of_times = 1;
if last.id then rc = h.clear();
drop rc prev_number;
run;

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

SAS data mining - sas

data want; set have; if good_bad = 0 then output; else if good_bad = 1 then do; do i = 1 to 12; output; end; end; drop i; run;

Related

How do I find first row of last group in SAS, where ordering matters?

SAS: How can I test the stability of a value among time

Identifying first occurrence after trigger event

Create values for group - SAS

find the count of num column changing by id

Categories

Resources