SAS count a sequence of equal numbers - sas

I wish to get last number in a sequence of equal numbers. For example, I have the following dataset:
X
1
1
0
0
0
1
1
0
Given that sequence of numbers I need extract the last number of a sequence of "ones" until appear a 0. That is what I want:
X Seq
1 1
1 2
1 3
0 1
0 2
0 3
1 1
1 2
0 1
1 1
1 2
1 3
0 1
I need create a new dataset with the numbers in bold, that is:
Seq1
3
2
3
Thanks for any advice.

One more option - use the NOTSORTED BY group option.
Data want;
Set have;
By x NOTSORTED;
Retain count;
If first.x then count=1;
Else count+1;
If last.x then output;
Keep count;
Run;

You can create group variables using a lag and then keep the last observation of each group you have created:
data temp;
input x $;
datalines;
1
1
1
0
0
0
1
1
0
1
1
1
0
;
run;
data temp2;
set temp;
retain flag;
if lag(x) > x then flag = _n_;
if x = 0 then delete;
run;
data temp3 (keep = seq1);
set temp2;
seq1 + 1;
by flag;
if first.flag then seq1 = 1;
if last.flag then save = 1;
if missing(save) then delete;
run;

Use a proc summary with notsorted option here:
data math;
input x;
datalines;
1
1
1
0
0
0
1
1
0
1
0
1
1
1
1
0
1
;
run;
proc summary data=math;
by x notsorted;
class x;
output Out=z;
run;
data z (drop=_type_ x);
set z (rename=(_FREQ_=COUNT));
where _type_=1 and x=1;/*if you are looking for other number then 1, replace it here*/
run;
proc print data=z noobs;
run;
result is:

Here's a solution using only one data step with a retain statement.
data have;
input x ##;
output;
datalines;
1 1 1 0 0 0 1 1 0 1 0 1 1 1 1 0 1
;
data want(keep = count);
set have end = last;
retain x_previous . count .;
if x = 0 then do;
if x_previous = 1 then do;
output;
count = 0;
end;
end;
else if x = 1 then count + 1;
if last = 1 and count > 0 then output;
x_previous = x;
run;
Results
count
-----
3
2
1
4
1

Related

SAS LOOP - create columns from the records which are having a value

Suppose i have random diagnostic codes, such as 001, v58, ..., 142,.. How can I construct columns from the codes which is 1 for the records?
Input:
id found code
1 1 001
2 0 v58
3 1 v58
4 1 003
5 0 v58
......
......
15000 0 v58
Output:
id code_001 code_v58 code_003 .......
1 1 0 0
2 0 0 0
3 0 1 0
4 1 0 0
5 0 0 0
.........
.........
You will want to TRANSPOSE the values and name the pivoted columns according to data (value of code) with an ID statement.
Example:
In real world data it is often the case that missing diagnoses will be flagged zero, and that has to be done in a subsequent step.
data have;
input id found code $;
datalines;
1 1 001
2 0 v58
2 1 003 /* second diagnosis result for patient 2 */
3 1 v58
4 1 003
5 0 v58
;
proc transpose data=have out=want(drop=_name_) prefix=code_;
by id;
id code; * column name becomes <prefix><code>;
var found;
run;
* missing occurs when an id was not diagnosed with a code;
* if that means the flag should be zero (for logistic modeling perhaps)
* the missings need to be changed to zeroes;
data want;
set want;
array codes code_:;
do _n_ = 1 to dim(codes); /* repurpose automatic variable _n_ for loop index */
if missing(codes(_n_)) then codes(_n_) = 0;
end;
run;

SAS: How can I test the stability of a value among time

I have this database:
data temp;
input ID monitoring_date score ;
datalines;
1 10/11/2006 0
1 10/12/2006 0
1 15/01/2007 1
1 20/01/2007 1
1 20/04/2007 1
2 10/08/2008 0
2 11/09/2008 0
2 17/10/2008 1
2 12/11/2008 0
3 10/12/2008 0
3 10/08/2008 0
3 11/09/2008 0
3 17/10/2009 1
3 12/12/2009 1
3 05/01/2010 0
4 10/12/2006 0
4 10/08/2006 0
4 11/09/2006 0
4 17/10/2007 0
4 12/12/2007 0
4 09/04/2008 1
4 05/08/2008 1
5 10/12/2013 0
5 03/09/2013 0
5 11/09/2013 0
5 19/10/2014 0
5 10/12/2014 1
5 14/01/2015 1
6 10/12/2017 0
6 10/08/2018 0
6 11/09/2018 0
6 17/10/2018 1
6 12/12/2018 1
6 09/04/2019 1
6 25/07/2019 0
6 05/08/2019 1
6 15/09/2019 0
;
I would like to create a new database with a new column where I note, for each ID, the date of the first progression of the score from 0 to 1 and if this progression is stable at least 3 months until at the end of monitoring else date_progresion = . :
data want;
input ID date_progression;
datalines;
1 15/01/2007
2 .
3 .
4 09/04/2008
5 .
6 .
;
I really have no idea to code this and I would like to get the wanted data to generate a cox model where the progression (Yes/No) is my event.
I am really stuck !
Thank you in advance for your help!
A DOW loop can process the ID groups, tracking for a single active run of 1s. A run has a start date and duration.
Example:
data want;
do _n_ = 1 by 1 until (last.id);
set have;
by id;
select;
when (pscore = 0 and score = 1) do; state = 1; start = date; dur = 1; end;
when (pscore = 1 and score = 1) do; state = 2; dur + 1; end;
when (pscore = 1 and score = 0) do; state = 3; start = .; dur = .; end;
when (pscore = 0 and score = 0) do; state = 4; end;
otherwise;
end;
pscore = score;
end;
if state = 2 and dur >= 3 then progression_date = start;
keep ID progression_date;
format progression_date yymmdd10.;
run;

Identifying first occurrence after trigger event

I have a big panel dataset that looks somewhat like this:
data have;
input id t a b ;
datalines;
1 1 0 0
1 2 0 0
1 3 1 0
1 4 0 0
1 5 0 1
1 6 1 0
1 7 0 0
1 8 0 0
1 9 0 0
1 10 0 1
2 1 0 0
2 2 1 0
2 3 0 0
2 4 0 0
2 5 0 1
2 6 0 1
2 7 0 1
2 8 0 1
2 9 1 0
2 10 0 1
3 1 0 0
3 2 0 0
3 3 0 0
3 4 0 0
3 5 0 0
3 6 0 0
3 7 1 0
3 8 0 0
3 9 0 0
3 10 0 0
;
run;
For every ID I want to record all 'trigger' events, namely when a=1 and then I need to how long it takes to the next occurrence of b=1. The final output should then give me the following:
data want;
input id a_no a_t b_t diff ;
datalines;
1 1 3 5 2
1 2 6 10 4
2 1 2 5 3
2 2 9 10 1
3 1 7 . .
;
run;
It is of course no problem to get all a=1 and b=1 events, but as it is a very big dataset with a lot of both events for every ID I am searching for an elegant and straight-forward solution. Any ideas?
Here's a fairly simple SQL approach that gives more or less the desired output:
proc sql;
create table want
as select
t1.id,
t1.t as a_t,
t2.t as b_t,
t2.t - t1.t as diff
from
have(where = (a=1)) t1
left join
have(where = (b=1)) t2
on
t1.id = t2.id
and t2.t > t1.t
group by t1.id, t1.t
having diff = min(diff)
;
quit;
The only part missing is a_no. This sort of row-incrementing ID is quite a lot of work to generate consistently in SQL, but it's trivial with an extra data step:
data want;
set want;
by id;
if first.id then a_no = 0;
a_no + 1;
run;
An elegant DATA step way can use nested DOW loops. It's straight forward when you understand DOW loops.
data want(keep=id--diff);
length id a_no a_t b_t diff 8;
do until (last.id); * process each group;
do a_no = 1 by 1 until(last.id); * counter for each output;
do until ( output_condition or end); * process each triggering state change;
SET have end=end; * read data;
by id; * setup first. last. variables for group;
if a=1 then a_t = t; * detect and record start of trigger state;
output_condition = (b=1 and t > a_t > 0); * evaluate for proper end of trigger state;
end;
if output_condition then do;
b_t = t; * compute remaining info at output point;
diff = b_t - a_t;
OUTPUT;
a_t = .; * reset trigger state tracking variables;
b_t = .;
end;
else
OUTPUT; * end of data reached without triggered output;
end;
end;
run;
Note: A SQL way (not shown) can use self join within groups.

Create different decimal format in SAS proc report and ODS

I have a sample SAS code below. And I want to know how to proc report the percentage using two decimal formats. For example, I want to put 100% as just zero decimal, and all the other percentages which is not 100% with 1 decimal such like 25.0%.
Here is my code.
data a;
infile datalines missover;
input subjid trt itt safety pp complete enroll disreason;
uncomplete=(complete^=1);
datalines;
1 1 1 1 1 1 1
2 2 0 1 1 1 1
3 1 1 1 1 0 1 4
4 2 1 1 1 1 1
5 1 1 1 0 0 1 5
6 2 1 1 1 1 1
7 2 1 1 1 0 1 1
8 1 0 1 0 1 1
9 1 1 1 1 0 1 5
10 2 0 1 0 0 1 5
11 2 1 1 1 0 1 1
12 2 1 1 0 0 1 2
13 1 1 1 0 0 1 3
14 2 1 1 0 0 1 4
;
run;
data b;
set a(in=a) a(in=b);
if b then trt=3;
run;
data c;
length cat $20;
set b;
array a(6) itt safety pp complete uncomplete enroll;
array b(5) r1 r2 r3 r4 r5;
do i=1 to 6;
cat=upcase(vname(a(i)));
value=a(i);
output;
end;
do j=1 to 5;
if disreason=j then value=1;
else value=0;
cat=upcase(vname(b(j)));
output;
end;
keep trt cat value;
run;
proc format ;
value $newcat(notsorted)
'ENROLL'='Enrolled Population'
'1'='1'
'PP'='Per-Protocol Population'
'2'='2'
'ITT'='ITT Population'
'3'='3'
'SAFETY'='Safety Population'
'4'='4'
'COMPLETE'='Patients Completed'
'UNCOMPLETE'='Patients Discontinued'
'5'='Primary Reason for Discontinuation of Study Dose'
'R1'='\li360 Lack of Effect'
'R2'='\li360 Protocol Violation'
'R3'='\li360 Lost to Follow-up'
'R4'='\li360 Adverse Event'
'R5'='\li360 Personal Reason';
value trt 1='Treatment 1'
2='Treatment 2'
3='Overall';
picture pct(round)
0<-100='0009)'(prefix='(' mult=100)
0=0;
run;
option missing='' nodate nonumber orientation=landscape;
ods rtf file='c:\dispoistion.rtf';
proc report data=c completerows nowd
style(report)={frame=hsides rules=groups}
style(header)={background=white};
column cat cat2 trt, value, (sum mean);
define cat/group format=$newcat. preloadfmt order=data noprint;
define cat2/computed ' ' style(column)={cellwidth=30% PROTECTSPECIALCHARS=OFF
};
define trt/across format=trt. '' order=internal;
define value/analysis '';
define sum/ 'N' style(column)={cellwidth=35pt} style(header)={just=right};
define mean/ '(%)' format=pct. style(column)={cellwidth=35pt just=left}
style(header)={just=left};
compute cat2/char length=50;
cat2=put(cat, $newcat.);
if cat2 in ('1', '2', '3','4') then cat2='';
endcomp;
run;
ods rtf close;
You need to use a picture format:
data test;
input x;
datalines;
0.5
0.751
0.999
1.00
;;;;
run;
proc format;
picture pct100f
-1 = [PERCENT7.0]
-1<-<1 = [PERCENT7.1]
1 = [PERCENT7.0]
other=[BEST12.];
quit;
proc print data=test;
format x pct100f.;
var x;
run;
Adjust that as needed. The -1 <-< 1 means anything that is between -1 and 1, exclusive. The [PERCENT7.0] is telling it to use that format for that section.
Try using this format:
define mean/ '(%)' format=percent7.1

Using Retain Statement for Mathematical Operations in SAS

I have a dataset with 4 observations (rows) per person.
I want to create three new variables that calculate the difference between the second and first, third and second, and fourth and third rows.
I think retain can do this, but I'm not sure how.
Or do I need an array?
Thanks!
data test;
input person var;
datalines;
1 5
1 10
1 12
1 20
2 1
2 3
2 5
2 90
;
run;
data test;
set test;
by person notsorted;
retain pos;
array diffs{*} diff0-diff3;
retain diff0-diff3;
if first.person then do;
pos = 0;
end;
pos + 1;
diffs{pos} = dif(var);
if last.person then output;
drop var diff0 pos;
run;
Why not use The Lag function.
data test; input person var;
cards;
1 5
1 10
1 12
1 20
2 1
2 3
2 5
2 90
run;
data test; set test;
by person;
LagVar=Lag(Var);
difference=var-Lagvar;
if first.person then difference=.;
run;
An alternative approach without arrays.
/*-- Data from simonn's answer --*/
data SO1019005;
input person var;
datalines;
1 5
1 10
1 12
1 20
2 1
2 3
2 5
2 90
;
run;
/*-- Why not just do a transpose? --*/
proc transpose data=SO1019005 out=NewData;
by person;
run;
/*-- Now calculate your new vars --*/
data NewDataWithVars;
set NewData;
NewVar1 = Col2 - Col1;
NewVar2 = Col3 - Col2;
Newvar3 = Col4 - Col3;
run;
Why not use the dif() function instead?
/* test data */
data one;
do id = 1 to 2;
do v = 1 to 4 by 1;
output;
end;
end;
run;
/* check */
proc print data=one;
run;
/* on lst
Obs id v
1 1 1
2 1 2
3 1 3
4 1 4
5 2 1
6 2 2
7 2 3
8 2 4
*/
/* now create diff within id */
data two;
set one;
by id notsorted; /* assuming already in order */
dif = ifn(first.id, ., dif(v));
run;
proc print data=two;
run;
/* on lst
Obs id v dif
1 1 1 .
2 1 2 1
3 1 3 1
4 1 4 1
5 2 1 .
6 2 2 1
7 2 3 1
8 2 4 1
*/
data output_data;
retain count previous_value diff1 diff2 diff3;
set data input_data
by person;
if first.person then do;
count = 0;
end;
else do;
count = count + 1;
if count = 1 then diff1 = abs(value - previous_value);
if count = 2 then diff2 = abs(value - previous_value);
if count = 3 then do;
diff3 = abs(value - previous_value);
output output_data;
end;
end;
previous_value = value;
run;