How to tricky rank SAS? - sas

For example, I have data like this:
KEY first_count
11Y 1
11Y 2
11N 3
11N 4
11Y 5
11N 6
I want output like this:
KEY first_count RANKS
11Y 1 1
11Y 2 1
11N 3 2
11N 4 2
11Y 5 3
11N 6 4
How can I do this in SAS?
Thanks
I tried this:
proc sort data=step3;
by first_count key;
run;
data step4;
set step3;
by key;
if first.key THEN ranks=1;
else ranks+1;
run;
This causes the error:
ERROR: BY variables are not properly sorted on data set WORK.STEP3

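You need the NOTSORTED option on the BY statement, so that SAS accepts groups that are contiguous but not in sorted order.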
data key;
input KEY:$3. count;
cards;
11Y 1
11Y 2
11N 3
11N 4
11Y 5
11N 6
;
run;
data key2;
set key;
by key NOTSORTED;
if first.key then rank+1;
run;
proc print;
run;
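To see what NOTSORTED changes, it can help to print the FIRST. and LAST. flags next to the data. This is a minimal sketch reusing the data above; the dataset name `flags` and the variables `first_flag` and `last_flag` are made up for illustration.

```sas
data key;
  input KEY:$3. count;
  cards;
11Y 1
11Y 2
11N 3
11N 4
11Y 5
11N 6
;
run;

data flags;
  set key;
  by key notsorted;        /* groups are runs of equal KEY values, in data order */
  first_flag = first.key;  /* 1 on the first row of each run, otherwise 0 */
  last_flag  = last.key;   /* 1 on the last row of each run, otherwise 0 */
run;

proc print data=flags;
run;
```

Each run of identical KEY values counts as its own BY group, so the second 11Y run (count=5) gets first_flag=1 again. That is what lets `rank+1` fire there.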

Based on DN's answer, you can also do it without the FIRST.variable logic:
data key;
input KEY:$3. count;
cards;
11Y 1
11Y 2
11N 3
11N 4
11Y 5
11N 6
;
run;
data key2;
set key;
rank + (key ^= lag(key));
run;
proc print;
run;
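The one-line sum statement packs three steps together. The sketch below spells them out under the same data; `prev_key` and `changed` are hypothetical helper variables added only to make the intermediate values visible.

```sas
data key;
  input KEY:$3. count;
  cards;
11Y 1
11Y 2
11N 3
11N 4
11Y 5
11N 6
;
run;

data key2_verbose;
  set key;
  prev_key = lag(key);           /* KEY from the previous row, blank on the first row */
  changed  = (key ^= prev_key);  /* comparison yields 1 when KEY changed, else 0 */
  rank + changed;                /* sum statement: rank is retained and starts at 0 */
run;

proc print data=key2_verbose;
run;
```

Because LAG is called on every row here, its queue always holds the previous row's KEY. Calling it inside a condition would change that.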

Related

SAS how to Dense_rank

I am new to SAS; I used to work in Oracle SQL.
I asked a similar question before:
How to tricky rank SAS?
I thought that question would solve my problem, but I got stuck.
My code is this:
data stepstep;
input emplid KEY:$3. count;
cards;
11 11Y 1
11 11Y 2
11 11N 3
11 11N 4
11 11Y 5
11 11N 6
12 12Y 1
12 12Y 2
12 12N 3
;
run;
Then I tried:
data stepstep2;
set stepstep;
by key emplid NOTSORTED;
if first.key AND first.emplid then rank=1;
ELSE rank+1;
run;
The output I want to show is:
emplid key count rank
11 11Y 1 1
11 11Y 2 1
11 11N 3 2
11 11N 4 2
11 11Y 5 3
11 11N 6 4
12 12Y 1 1
12 12Y 2 1
12 12N 3 2
So when a new emplid comes, I want rank to start counting from 1 again.
In this example, when the first emplid "12" row comes, rank goes back to 1.
How can I do that?
You need to leverage your BY groups properly and I think you have them in the wrong order for starters. Try this instead:
data stepstep2;
set stepstep;
by emplid KEY NOTSORTED;
if first.emplid then rank=1; *start of each emplid group;
ELSE if first.key then rank+1; *start of each new key;
run;
You can also use a sum statement:
data stepstep2;
set stepstep;
by emplid key NOTSORTED;
if first.emplid then rank=0;
rank + first.key;
run;
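The sum-statement version relies on how SAS sets FIRST. flags for several BY variables: when FIRST.emplid is 1, FIRST.key is set to 1 as well, even if the KEY value itself did not change. A small sketch, assuming hypothetical data in which the same KEY straddles two emplid groups (the dataset names `demo` and `demo_rank` are illustrative):

```sas
data demo;
  input emplid KEY:$3. count;
  cards;
11 Y 1
11 N 2
11 N 3
12 N 1
12 Y 2
;
run;

data demo_rank;
  set demo;
  by emplid key notsorted;
  f_emplid = first.emplid;  /* copy the flags so they appear in the output */
  f_key    = first.key;
  if first.emplid then rank = 0;  /* reset at the start of each emplid */
  rank + first.key;               /* add 1 at the start of each key run */
run;

proc print data=demo_rank;
run;
```

On the fourth row KEY is still N, but a new emplid starts, so both flags are 1 and rank restarts at 1 rather than staying at 2.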

Compare column values

I have 5 columns and want to check which columns have exactly the same values.
num1 num2 num3 num4 num5
1 2 2 3 1
2 3 3 2 2
2 2 2 2 2
4 5 6 7 4
Here the first column (num1) and the last (num5) have exactly the same values in every row. How can I find that?
You could transpose and then look for duplicate rows instead.
data have ;
input num1-num5 ;
cards;
1 2 2 3 1
2 3 3 2 2
2 2 2 2 2
4 5 6 7 4
;
data _null_;
call symputx('nobs',nobs);
stop;
set have nobs=nobs;
run;
proc transpose data=have out=tran; var num1-num5; run;
proc sort data=tran; by col1-col&nobs; run;
data want;
set tran ;
by col1-col&nobs;
if not (first.col&nobs and last.col&nobs) ;
run;
proc print data=want;
run;
Results
Obs _NAME_ COL1 COL2 COL3 COL4
1 num1 1 2 2 4
2 num5 1 2 2 4
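A different route, not the one used in the answer above, is to compare every pair of columns in a single pass and remember which pairs never differed. This is only a sketch: the dataset name `same_pairs` and the variables `col_a` and `col_b` are invented for illustration, and the array size is hard-coded to the five columns in the example.

```sas
data have;
  input num1-num5;
  cards;
1 2 2 3 1
2 3 3 2 2
2 2 2 2 2
4 5 6 7 4
;
run;

data same_pairs(keep=col_a col_b);
  set have end=eof;
  array n{5} num1-num5;
  array eq{5,5} _temporary_;   /* eq{i,j}=1 while columns i and j still agree */
  length col_a col_b $32;
  do i = 1 to 4;
    do j = i + 1 to 5;
      if _n_ = 1 then eq{i,j} = 1;           /* assume equal until shown otherwise */
      if n{i} ne n{j} then eq{i,j} = 0;      /* one mismatch rules the pair out */
    end;
  end;
  /* after the last row, write out the pairs that never differed */
  if eof then do i = 1 to 4;
    do j = i + 1 to 5;
      if eq{i,j} then do;
        col_a = vname(n{i});
        col_b = vname(n{j});
        output;
      end;
    end;
  end;
run;

proc print data=same_pairs;
run;
```

For the sample data this leaves a single row pairing num1 with num5. It avoids the transpose and sort, but the number of comparisons grows with the square of the number of columns.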

How to join multiple columns into one in sas

I have a time series SAS dataset and I want to convert it to a vertical (long) dataset.
My data looks like this:
ID A2009 A2010 A2011 A2012
1 1 2 3 4
2 1 2 3 4
3 1 2 3 4
4 1 2 3 4
5 1 2 3 4
data multcol;
infile datalines;
input ID A2009 A2010 A2011 A2012 A2013;
return;
datalines;
1 1 2 3 4 5
2 1 2 3 4 5
3 1 2 3 4 5
4 1 2 3 4 5
5 1 2 3 4 5
;
run;
proc print data=multcol noobs;
run;
I searched the web and only found the following solution. It did not work for me: my dataset is too large, and this method shut down my computer.
data cmbcol(keep=a orig_varname orig_obsnum);
set multcol;
array myvars _numeric_;
do i = 2 to dim(myvars);
orig_varname = vname(myvars(i));
orig_obsnum = _n_;
A = myvars(i);
output;
end;
run;
proc print data=cmbcol ;
title 'cmbcol';
run;
proc sort data=cmbcol;
by orig_varname a;
run;
proc print data=cmbcol noobs;
title 'cmbcol';
run;
I want the result to look like this:
ID t t+1
1 1 2
2 1 2
3 1 2
4 1 2
5 1 2
1 2 3
2 2 3
3 2 3
4 2 3
5 2 3
1 3 4
2 3 4
3 3 4
4 3 4
5 3 4
How can we do that?
Thanks in advance.
That is an unusual data structure for sure, but you could achieve this using the following macro (adjust to your needs).
options validvarname = any;
%macro transp;
%let i = 2009;
%do %while (&i <= 2011);
%let j = %eval(&i + 1);
data part_&i(rename = (A&i = t A&j = 't+1'n));
set multcol(keep = ID A&i A&j);
run;
%let i = %eval(&i + 1);
%end;
data combined;
set part_:;
run;
proc datasets nolist nodetails;
delete part_:;
quit;
%mend transp;
%transp
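Since the question mentions a very large dataset, a single DATA step with an array may be worth trying as well, because it reads the input once instead of once per year. This is a sketch under assumptions: the columns are A2009 to A2012 as in the question's listing, and the names `combined_sketch`, `year` and `t_plus_1` are invented (`t_plus_1` stands in for the 't+1'n name used above).

```sas
data multcol;
  input ID A2009 A2010 A2011 A2012;
  datalines;
1 1 2 3 4
2 1 2 3 4
3 1 2 3 4
4 1 2 3 4
5 1 2 3 4
;
run;

data combined_sketch(keep=ID year t t_plus_1);
  set multcol;
  array a{2009:2012} A2009-A2012;  /* index the array by year */
  do year = 2009 to 2011;
    t        = a{year};       /* value in the current year */
    t_plus_1 = a{year + 1};   /* value in the following year */
    output;                   /* one output row per ID and year */
  end;
run;

/* reorder to match the wanted layout: all IDs for the first year, then the next */
proc sort data=combined_sketch;
  by year ID;
run;

proc print data=combined_sketch noobs;
run;
```

The sort is only needed if the year-by-year row order matters; without it the rows come out grouped by ID.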

Creating a dummy variable for "switching"

I'm working on a project in SAS and want to create a dummy variable that accounts for "preferences in medicine". I have a long dataset, by time period, of individuals taking either medicine type 1 or type 2. For my research, I want a variable that flags individuals who took type 1 medicine, then switched to type 2, and then went back to type 1. I am unconcerned with how long the individual was on each medication, just that they followed this pattern.
id month type
1 1 2
1 2 2
1 3 2
2 1 1
2 2 2
2 3 1
...
I have more months, but just wanted to provide something to elucidate what I'm trying to get. Basically, I want to tally those subjects who are like subject 2.
well, nothing fancy, but it works for me:
DATA LONG1;
input id month type;
cards;
1 1 2
1 2 2
1 3 2
1 4 2
1 5 2
1 6 2
1 7 2
1 8 2
1 9 2
1 10 2
2 1 1
2 2 1
2 3 1
2 4 1
2 5 1
2 6 1
2 7 1
2 8 1
2 9 1
2 10 1
3 1 1
3 2 1
3 3 1
3 4 2
3 5 1
3 6 1
3 7 1
3 8 1
3 9 1
3 10 1
;
Proc Print; run;
* 1) make a wide dataset by deconstructing the initial long data by month & rejoining by id
2) then use if/then statements to create your dummy variable,
3) then merge the dummy variable back into your long dataset using ID;
DATA month1; set long1; where month=1; rename month=month_1 type=type_1; Proc Sort; by ID; run;
DATA month2; set long1; where month=2; rename month=month_2 type=type_2; Proc Sort; by ID; run;
DATA month3; set long1; where month=3; rename month=month_3 type=type_3; Proc Sort; by ID; run;
DATA month4; set long1; where month=4; rename month=month_4 type=type_4; Proc Sort; by ID; run;
DATA month5; set long1; where month=5; rename month=month_5 type=type_5; Proc Sort; by ID; run;
DATA month6; set long1; where month=6; rename month=month_6 type=type_6; Proc Sort; by ID; run;
DATA month7; set long1; where month=7; rename month=month_7 type=type_7; Proc Sort; by ID; run;
DATA month8; set long1; where month=8; rename month=month_8 type=type_8; Proc Sort; by ID; run;
DATA month9; set long1; where month=9; rename month=month_9 type=type_9; Proc Sort; by ID; run;
DATA month10; set long1; where month=10; rename month=month_10 type=type_10; Proc Sort; by ID; run;
DATA WIDE;
merge month1 month2 month3 month4 month5 month6 month7 month8 month9 month10; by ID;
if (type_1=1 and type_2=1 and type_3=1 and type_4=1 and type_5=1
and type_6=1 and type_7=1 and type_8=1 and type_9=1 and type_10=1) or
(type_1=2 and type_2=2 and type_3=2 and type_4=2 and type_5=2
and type_6=2 and type_7=2 and type_8=2 and type_9=2 and type_10=2)
then switch='no '; else switch='yes '; keep ID switch; run;
DATA LONG2;
merge wide long1; by ID;
Proc Print; run;
btw: also go to the SAS listserv, they love stuff like this:
http://www.listserv.uga.edu/archives/sas-l.html
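The ten per-month DATA steps can probably be replaced by one PROC TRANSPOSE, which builds the same type_1, type_2, ... columns for however many months exist. A sketch using the question's small sample rather than the ten-month data; like the answer above, it flags any change of type, not specifically the 1, 2, 1 pattern.

```sas
data long1;
  input id month type;
  cards;
1 1 2
1 2 2
1 3 2
2 1 1
2 2 2
2 3 1
;
run;

proc sort data=long1;
  by id month;
run;

/* one row per id, with columns type_1, type_2, ... named from MONTH */
proc transpose data=long1 out=wide(drop=_name_) prefix=type_;
  by id;
  id month;
  var type;
run;

data wide;
  set wide;
  length switch $3;
  /* if the smallest and largest type are equal, the type never changed */
  if min(of type_:) = max(of type_:) then switch = 'no';
  else switch = 'yes';
  keep id switch;
run;

/* attach the flag to every row of the long data */
data long2;
  merge wide long1;
  by id;
run;

proc print data=long2;
run;
```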
This worked on the limited data I used:
DATA Have;
input id month type;
datalines;
1 1 1
1 2 1
1 3 1
1 4 1
1 5 1
2 1 1
2 2 2
2 3 1
2 4 1
2 5 1
3 1 1
3 2 1
3 3 2
3 4 2
3 5 1
4 1 2
4 2 2
4 3 2
4 4 2
4 5 2
;
Data Temp(keep=id dummy);
length dummy $15;
retain Start Type2 dummy;
set Have;
by id;
if first.id then Do;
Start=0;
Type2=0;
Dummy="";
end;
If Type=1 then do;
If Start=0 then Start=1;
else if Start=1 and Type2=1 then Dummy="Switch-er-Roo";
end;
else do;
if Start=1 then Type2=1;
end;
if last.id then output;
run;
Data Want;
merge temp(in=a) have(in=b);
by id;
run;
I prefer @CarolinaJay65's approach; it's a lot cleaner and involves just one pass of the data. If all you are interested in are the patients who start and finish on Type 1 but use Type 2 at some point, then the code can be simplified slightly. The following code (using @CarolinaJay65's source data) will only output the patient IDs matching these criteria.
data switch_id (keep=id);
set have;
by id month;
retain switch;
if first.id then do;
call missing(switch);
if type=1 then switch=0;
end;
else if not missing(switch) and type=2 then switch=1;
if last.id and type=1 and switch=1 then output;
run;
If you just wanted the number of patients who match the criteria then you could tweak this code further.
data switch (keep=count);
set have end=final;
by id month;
retain switch count 0;
if first.id then do;
call missing(switch);
if type=1 then switch=0;
end;
else if not missing(switch) and type=2 then switch=1;
if last.id and type=1 and switch=1 then count+1;
if final then output;
run;
I think the following should work:
DATA Have;
input id month type;
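/* Call LAG and DIF on every row so their queues stay in step with the data */
prev_id = lag(id);
diftype = dif(type);
/* The first row of each id has nothing earlier to compare against */
if id ^= prev_id then diftype = .;
drop prev_id;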
datalines;
1 1 1
1 2 1
1 3 1
1 4 1
1 5 1
2 1 1
2 2 2
2 3 1
2 4 1
2 5 1
3 1 1
3 2 1
3 3 2
3 4 2
3 5 1
4 1 2
4 2 2
4 3 2
4 4 2
4 5 2
;
proc sql;
select case when max(diftype) = 1 and min(diftype) = -1 then 1 else 0 end as flag, * from have
group by id
;
quit;
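Another way to look for the 1, then 2, then back to 1 pattern is to collect each patient's types into a string and test it with a regular expression. This is a different technique from the answers above, shown only as a sketch: `seq` and `flag` are invented names, the `$200` length assumes no patient has more than 200 months, and the question's small sample is used as input.

```sas
data have;
  input id month type;
  datalines;
1 1 2
1 2 2
1 3 2
2 1 1
2 2 2
2 3 1
;
run;

proc sort data=have;
  by id month;
run;

data pattern(keep=id seq flag);
  length seq $200;
  retain seq;
  set have;
  by id;
  if first.id then seq = '';   /* start a fresh sequence for each patient */
  seq = cats(seq, type);       /* append this month's type, e.g. 1, 12, 121 */
  if last.id then do;
    /* a 1, followed by one or more 2s, followed by a 1 */
    flag = (prxmatch('/12+1/', seq) > 0);
    output;
  end;
run;

proc print data=pattern;
run;
```

With this input, id 1 has the sequence 222 and is not flagged, while id 2 has 121 and is.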

Using Retain Statement for Mathematical Operations in SAS

I have a dataset with 4 observations (rows) per person.
I want to create three new variables that calculate the difference between the second and first, third and second, and fourth and third rows.
I think retain can do this, but I'm not sure how.
Or do I need an array?
Thanks!
data test;
input person var;
datalines;
1 5
1 10
1 12
1 20
2 1
2 3
2 5
2 90
;
run;
data test;
set test;
by person notsorted;
retain pos;
array diffs{*} diff0-diff3;
retain diff0-diff3;
if first.person then do;
pos = 0;
end;
pos + 1;
diffs{pos} = dif(var);
if last.person then output;
drop var diff0 pos;
run;
Why not use the LAG function?
data test; input person var;
cards;
1 5
1 10
1 12
1 20
2 1
2 3
2 5
2 90
;
run;
data test; set test;
by person;
LagVar=Lag(Var);
difference=var-Lagvar;
if first.person then difference=.;
run;
An alternative approach without arrays.
/*-- Data from simonn's answer --*/
data SO1019005;
input person var;
datalines;
1 5
1 10
1 12
1 20
2 1
2 3
2 5
2 90
;
run;
/*-- Why not just do a transpose? --*/
proc transpose data=SO1019005 out=NewData;
by person;
run;
/*-- Now calculate your new vars --*/
data NewDataWithVars;
set NewData;
NewVar1 = Col2 - Col1;
NewVar2 = Col3 - Col2;
Newvar3 = Col4 - Col3;
run;
Why not use the dif() function instead?
/* test data */
data one;
do id = 1 to 2;
do v = 1 to 4 by 1;
output;
end;
end;
run;
/* check */
proc print data=one;
run;
/* on lst
Obs id v
1 1 1
2 1 2
3 1 3
4 1 4
5 2 1
6 2 2
7 2 3
8 2 4
*/
/* now create diff within id */
data two;
set one;
by id notsorted; /* assuming already in order */
dif = ifn(first.id, ., dif(v));
run;
proc print data=two;
run;
/* on lst
Obs id v dif
1 1 1 .
2 1 2 1
3 1 3 1
4 1 4 1
5 2 1 .
6 2 2 1
7 2 3 1
8 2 4 1
*/
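The IFN form matters because IFN evaluates all of its arguments, so DIF runs on every row. If DIF is called only on some rows, its queue keeps the last value it actually saw, which can belong to the previous id. The sketch below contrasts the two on the same test data; `pitfall`, `safe` and `risky` are illustrative names.

```sas
data one;
  do id = 1 to 2;
    do v = 1 to 4;
      output;
    end;
  end;
run;

data pitfall;
  set one;
  by id notsorted;
  /* DIF runs on every row; the first row of each id is blanked afterwards */
  safe = dif(v);
  if first.id then safe = .;
  /* DIF is skipped on the first row of each id, so on the next row it
     compares against the last value it saw before the skip */
  if not first.id then risky = dif(v);
run;

proc print data=pitfall;
run;
```

On the second row of id 2 (v=2), `safe` is 1, but `risky` is -2 because the queue still holds v=4 from the end of id 1.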
data output_data;
retain count previous_value diff1 diff2 diff3;
set input_data;
by person;
if first.person then do;
count = 0;
end;
else do;
count = count + 1;
if count = 1 then diff1 = abs(value - previous_value);
if count = 2 then diff2 = abs(value - previous_value);
if count = 3 then do;
diff3 = abs(value - previous_value);
output output_data;
end;
end;
previous_value = value;
run;