I have the following SAS dataset.
correlation policynum risknum
A X Y
A X Y
A X Y
B X Y
B X Y
B X Y
B X L
B X L
B X L
C Z M
C Z M
C Z M
D Z M
D Z M
D Z M
In SAS, I want to filter the above dataset so I get my final output as:
correlation policynum risknum
B X Y
B X Y
B X Y
B X L
B X L
B X L
D Z M
D Z M
D Z M
i.e. for each group of policynum and risknum, if multiple values exist for correlation, I want to keep the second value and get rid of the first value.
If only a single value of correlation exists for a group of policynum and risknum, I want to retain that group in my final output too.
What would be the best way to do this? It might be something simple as I am relatively new to SAS.
Thanks in advance!
If the correlation value you want is always the highest one in sort order (that is, the sort order of the correlation values matches the order in which they appear row-wise in the data set), you can use SQL. Otherwise SQL cannot be used, because it is based on set theory and has no implicit row numbers. In that case a DATA step with a DOW loop can be used.
FYI, one common situation in which SAS coders use the phrase 'DOW loop' is when SET and BY statements occur inside a DO loop.
Example:
data have;
input correlation $ policynum $ risknum $;
datalines;
A X Y
A X Y
A X Y
B X Y
B X Y
B X Y
B X L
B X L
B X L
C Z M
C Z M
C Z M
D Z M
D Z M
D Z M
;
/* keep last group of a nested group */
* SQL can be used only if correlation wanted is ALWAYS highest valued correlation;
proc sql;
create table want as
select * from have
group by policynum, risknum
having correlation = max(correlation)
;
quit;
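* DATA Step DOW loops can be used when correlation wanted is last occurring correlation within by group;
data want;
/* first pass: read one policynum/risknum group and count its rows in _n_ */
do _n_ = 1 by 1 until (last.risknum);
set have;
by policynum risknum notsorted; /* presume at least contiguous */
end;
/* correlation now holds the value from the last row of the group */
_want_correlation = correlation;
/* second pass: re-read the same rows and keep only the matching ones */
do _n_ = 1 to _n_;
set have;
if _want_correlation = correlation then OUTPUT;
end;
drop _want_correlation;
run;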
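To illustrate why the SQL version only works when the wanted correlation is also the highest-sorting one, here is a minimal sketch on a made-up data set. The names have2, want_sql and want_dow are hypothetical and not part of the question. The last-occurring correlation (B) sorts before the first one (Z), so the two approaches keep different rows.

```sas
/* Hypothetical data: the last correlation (B) sorts BEFORE the first (Z) */
data have2;
  input correlation $ policynum $ risknum $;
datalines;
Z X Y
Z X Y
B X Y
B X Y
;

/* MAX(correlation) selects the Z rows, which is not the last-occurring value */
proc sql;
  create table want_sql as
  select * from have2
  group by policynum, risknum
  having correlation = max(correlation);
quit;

/* The double DOW loop selects the B rows, the value on the group's last row */
data want_dow;
  do _n_ = 1 by 1 until (last.risknum);
    set have2;
    by policynum risknum notsorted;
  end;
  _want_correlation = correlation;
  do _n_ = 1 to _n_;
    set have2;
    if _want_correlation = correlation then output;
  end;
  drop _want_correlation;
run;
```

Comparing want_sql and want_dow (for example with proc print) should show which behaviour matches your data.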
I am trying to find the equivalent of the Stata code "egen group" in SAS.
The goal is:
I have three variables x, y, and z. I want to create a new variable which will assign a different ordinal number for each combination of values of x, y, and z. How can I do this in SAS?
If you order your data by x, y, and z, SAS knows exactly where the groups x, y, and z start/end. You can use this to create unique identifiers.
Let's make some sample data. This data purposefully has duplicate values to illustrate how first. works.
data have;
do x = 'a', 'b', 'c';
do y = 'd', 'e', 'f';
do z = 'g', 'h', 'i';
output;
output;
end;
end;
end;
run;
Single-Threaded Unique IDs
This is the most likely case for you. This applies if you're running code in Base SAS.
First, sort the data by x y z.
proc sort data=have;
by x y z;
run;
Next, create your identifiers. We'll tell SAS that the data is ordered by x y z. Since z is nested within y and x, if we reach the first value of z, we've reached a unique combination of x y z.
data want;
set have;
by x y z;
if(first.z) then id+1;
run;
Output:
x y z id
a d g 1
a d g 1
a d h 2
a d h 2
a d i 3
a d i 3
...
id+1 is a special SAS shortcut called a sum statement and is equivalent to the following code:
retain id 0;
if(first.z) then id = id+1;
Multi-Threaded Unique IDs
This applies if you're running code in SAS Viya in CAS. You need to add _THREADID_ to the ID to make it unique. For example:
cas;
libname casuser cas caslib='casuser';
data casuser.have;
set have;
run;
data casuser.want;
set casuser.have;
by x y z;
if(first.z) then _id+1;
id = catx('_', _THREADID_, _id);
drop _id;
run;
Output:
x y z id
a d g 15_1
a d g 15_1
a d h 15_2
a d h 15_2
a d i 15_3
a d i 15_3
...
I'm trying to pull only the last 4 working days of data in SAS. I tried the following code, but I'm not getting the result I intended:
data input;
Input id $ id1 $ id2 $ num date date9.;
Format Date Date9.;
datalines;
x y z 3 19JUL2015
x y z 2 18JUL2015
x y z 3 17JUL2015
x y z 2 16JUL2015
x y z 3 15JUL2015
x y z 2 14JUL2015
x y z 3 13JUL2015
a b c 1 12JUL2015
a b c 1 11JUL2015
a b c 1 10JUL2015
a b c 1 09JUL2015
a b c 1 08JUL2015
a b c 2 07JUL2015
x y z 1 06JUL2015
;
Run;
Data test;
Set input;
Weekday=Weekday(Date);
intck=intck('weekday',Date,today());
*if intck('weekday',Date,today()) >4;
if 1<Weekday(Date)<7 and Date>=today()-4;
Run;
I think you need to reverse the > in your code, and add a qualification that you only want weekdays:
Data test;
Set input;
Weekday=Weekday(Date);
intck=intck('weekday',Date,today());
if intck('weekday',Date,'20JUL2015'd) le 4 and 1<weekday(Date)<7;
*if 1<Weekday(Date)<7 and Date>='20JUL2015'd-5;
Run;
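If it is unclear what the two conditions evaluate to, a small diagnostic step like the one below may help. It is only a sketch: the variable names ref, wd and n are made up for illustration, and the date range is simply the week before the fixed reference date used above. It writes each date to the log together with its WEEKDAY value and the INTCK 'weekday' count up to the reference date.

```sas
data _null_;
  ref = '20JUL2015'd;                 /* fixed reference date, as in the answer */
  do date = '13JUL2015'd to '20JUL2015'd;
    wd = weekday(date);               /* 1 = Sunday, ..., 7 = Saturday */
    n  = intck('weekday', date, ref); /* weekday intervals between date and ref */
    put date= date9. wd= n=;          /* inspect the values in the log */
  end;
run;
```

Checking the log for the Saturday and Sunday rows shows why the extra 1<weekday(Date)<7 condition is needed alongside the INTCK test.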
I'm here asking for help for a problem with proc transpose.
I have a dataset made this way (I'm going to show only 3 variables but I have lots of them)
PR ID VAR1a VAR1b VAR1c VAR2a VAR2b VAR2c VAR3a VAR3b VAR3c
1 1 x x x x x x x x x
1 2 x x x x x x x x x
1 3 x x x x x x x x x
2 1 x x x x x x x x x
2 2 x x x x x x x x x
2 3 x x x x x x x x x
I need an output dataset like this:
PREID ID VAR(name) A B C
1 1 VAR1(name) x x x
1 1 VAR2(name) x x x
1 1 VAR3(name) x x x
1 2 VAR1(name) x x x
1 2 VAR2(name) x x x
1 2 VAR3(name) x x x
1 3 VAR1(name) x x x
1 3 VAR2(name) x x x
1 3 VAR3(name) x x x
etc with preid 2 id 1 2 3, preid 3 id 1 2 3.
So I need to transpose but using the name (discriminating from a b c), I really have no idea from where I could start.
Can you help me please?
If I understand the desired output correctly, each observation of your input data first has to be broken into several observations, so a single observation becomes 9 observations (VAR1a to VAR3c). You can achieve that using PROC TRANSPOSE with PR and ID as BY variables, transposing the variables VAR1a to VAR3c. After this, in a DATA step, you need to split the _NAME_ variable into the VAR1/2/3 part and the a/b/c part. Once that is done, you can transpose the data again to get your result.
I tried to write down the code based on your input data. Let me know if it helps.
data input;
infile datalines dsd dlm=',' missover;
input PR :$8.
ID :$8.
VAR1a :$8.
VAR1b :$8.
VAR1c :$8.
VAR2a :$8.
VAR2b :$8.
VAR2c :$8.
VAR3a :$8.
VAR3b :$8.
VAR3c :$8.;
datalines4;
1,1,x,x,x,x,x,x,x,x,x
1,2,x,x,x,x,x,x,x,x,x
1,3,x,x,x,x,x,x,x,x,x
2,1,x,x,x,x,x,x,x,x,x
2,2,x,x,x,x,x,x,x,x,x
2,3,x,x,x,x,x,x,x,x,x
;;;;
run;
proc transpose data=input out=staging ;
by pr id ;
var VAR1a--VAR3c;
run;
data staging;
set staging;
var=substrn(strip(_name_),1,length(strip(_name_))-1);
dummy=substrn(strip(_name_),length(strip(_name_)),1);
drop _name_;
run;
proc transpose data=staging out=final(drop=_name_);
by pr id var;
id dummy;
var col1;
run;
proc print data=final;run;
Similar to @sushil's solution above, but with one less step. Since you have to go into a DATA step anyway, you may as well transpose the data in that step too, so this solution combines the first PROC TRANSPOSE and the DATA step. If you had few enough variables I'd remove the last transpose as well, but this is more flexible if you have quite a few variables.
data input;
infile datalines dsd dlm=',' missover;
input PR :$8.
ID :$8.
VAR1a :$8.
VAR1b :$8.
VAR1c :$8.
VAR2a :$8.
VAR2b :$8.
VAR2c :$8.
VAR3a :$8.
VAR3b :$8.
VAR3c :$8.;
datalines4;
1,1,x,x,x,x,x,x,x,x,x
1,2,x,x,x,x,x,x,x,x,x
1,3,x,x,x,x,x,x,x,x,x
2,1,x,x,x,x,x,x,x,x,x
2,2,x,x,x,x,x,x,x,x,x
2,3,x,x,x,x,x,x,x,x,x
;;;;
run;
data out1;
set input;
array vars(*) var1a--var3c;
do i=1 to dim(vars);
name=vname(vars(i));
varname=substr(name,1,length(name)-1);
group=substr(name,length(name));
value=vars(i);
output;
end;
drop var1a--var3c i name; /* also drop the loop counter and the full variable name */
run;
proc transpose data=out1 out=out2(drop=_name_);
by pr id varname;
id group;
var value;
run;
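For the case mentioned above where there are few enough variables to skip the last transpose, a minimal sketch could look like the following. It assumes exactly three stems (VAR1 to VAR3) with suffixes a, b and c, as in the sample data; the names out_direct, va, vb and vc are made up for illustration.

```sas
data out_direct;
  set input;
  /* one array per suffix, listing the variables explicitly */
  array va(3) var1a var2a var3a;
  array vb(3) var1b var2b var3b;
  array vc(3) var1c var2c var3c;
  length varname $32 a b c $8;
  do i = 1 to 3;
    varname = cats('VAR', i);  /* VAR1, VAR2, VAR3 */
    a = va(i);
    b = vb(i);
    c = vc(i);
    output;                    /* one row per PR, ID and variable stem */
  end;
  keep pr id varname a b c;
run;
```

This avoids PROC TRANSPOSE entirely, at the cost of hard-coding the variable lists, which is why the transpose-based version scales better when there are many variables.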