Do it in proc sql - sas

How can I do the code below in PROC SQL?
It consists of two PROC SORT steps and one MERGE:
proc sort data=new out=new1 nodupkey;
by id;
where roll=100;
run;
proc sort data=new2 out=new4 nodupkey;
by id;
where roll=100;
run;
data score;
merge new4 (in=a) new1;
by id;
if a;
run;

The merge you show is equivalent to an SQL left join: you want all the rows from "new2" and want to ignore the rows from "new" that don't have a matching id. The uniqueness of id (guaranteed by the NODUPKEY pre-sorts) further supports the left-join equivalence.
proc sql;
select new1.*, new4.*
from (select * from new2 where roll=100) as new4
left join (select * from new where roll=100) as new1
on new1.id = new4.id
order by new4.id;
quit;
For atypical data where there are many-to-many ids in the merge, the left join is not equivalent.
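A tiny illustration of that difference, using two hypothetical tables M1 and M2 that both repeat id 1: the DATA step MERGE pairs the repeated rows one-to-one within the BY group, while the SQL left join forms every combination of them. This is a sketch for demonstration only; the table names are made up.

```sas
/* Hypothetical inputs: id 1 appears twice in both tables */
data m1;
  input id x;
  datalines;
1 1
1 2
;
run;

data m2;
  input id y;
  datalines;
1 10
1 20
;
run;

/* MERGE matches rows positionally within the BY group: 2 rows       */
/* (the log notes that more than one data set has repeats of BY values) */
data merged;
  merge m1(in=a) m2;
  by id;
  if a;
run;

/* The left join pairs every m1 row with every matching m2 row: 4 rows */
proc sql;
  create table joined as
  select m1.id, m1.x, m2.y
  from m1 left join m2
  on m1.id = m2.id;
quit;
```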
I did leave out the NODUPKEY equivalent. Presuming the EQUALS option is in effect, selecting each group's first row would be equivalent. The undocumented MONOTONIC() function can be used to apply a default row order in a sub-query, which can then be used in a GROUP BY ... HAVING expression.
data LEFT;
input id x1 x2 x3;
datalines;
1 1 1 1
1 2 2 2
1 3 3 3
2 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
5 5 5 5
;
run;
data RIGHT;
input id y1 y2 y3 x1;
datalines;
1 1 1 1 11
2 1 1 1 22
3 1 2 3 4
3 2 3 4 5
3 3 4 5 6
4 1 1 1 44
6 6 6 6 6
;
run;
proc sql;
select
L.id
, coalesce(R.x1,L.x1) as x1
, L.x2
, L.x3
, R.y1
, R.y2
, R.y3
from
(
select * from (select monotonic() as _seq_, * from LEFT) group by id having _seq_ = min(_seq_)
)
as L
left join
(
select * from (select monotonic() as _seq_, * from RIGHT) group by id having _seq_ = min(_seq_)
)
as R
on
L.id = R.id
;
quit;
I feel the need to reiterate that an SQL left join is not always the same as a merge, and SQL does not have the common-variable 'overlaying' that is implicit in the DATA step. When LEFT and RIGHT collide on non-key variables, you need to select a coalescence of the common variables into a new, like-named variable in the output.


Needing to retain Lab category tests based on individual positive test result

Here is a sample of my data (there is an additional column, LBCAT = URINALYSIS, for that panel of tests).
I've been asked to include only the panels of tests where LBNRIND is populated for at least one of the tests, and to remove the rest. Some subjects have multiple test results at different visit timepoints, and others have only one. I can't use a simple WHERE LBNRIND ne '' in the data step, because I need the entire panel of urinalysis tests and not just that particular test result. What would be the best approach here? I think transposing the data would be too messy, but maybe I could put the variables in an array/macro and use a DO loop over that panel of tests?
Update: I've tried the code below, but it doesn't keep the corresponding tests where lb_nrind is populated. Using sum(lb_nrind > '') in the HAVING clause gives the same result as using lb_nrind > '' directly.
proc sql;
create table want as
select * from labUA
group by ptno and day and lb_cat
having sum(lb_nrind > '') > 0 ;
data want2;
do _n_ = 1 by 1 until (last.ptno);
set labUA;
by ptno period day hour ;
if not flag_group then flag_group = (lb_nrind > '');
end;
do _n_ = 1 to _n_;
set want;
if flag_group then output;
end;
drop flag_group;
run;
You can use an SQL HAVING clause to retain the rows of a group that meets some aggregate condition. In your case the group might be patient ID and panel ID, and the condition would be that at least one LBNRIND is not missing (NULL).
Example:
Consider this example, where a group of rows is to be kept only if at least one of the rows in the group meets the criterion result7=77.
Both code blocks use the SAS feature that a logical evaluation is 1 for true and 0 for false.
SQL
data have;
infile datalines missover;
input id test $ parm $ result1-result10;
datalines;
1 A P 1 2 . 9 8 7 . . . .
1 B Q 1 2 3
1 C R 4 5 6
1 D S 8 9 . . . 6 77
1 E T 1 1 1
1 F U 1 1 1
1 G V 2
2 A Z 3
2 B K 1 2 3 4 5 6 78
2 C L 4
2 D M 9
3 G N 8
4 B Q 7
4 D S 6
4 C 1 1 1 . . 5 0 77
;
proc sql;
create table want as
select * from have
group by id
having sum(result7=77) > 0
;
quit;
DOW Loop
data want;
do _n_ = 1 by 1 until (last.id);
set have;
by id;
if not flag_group then flag_group = (result7=77);
end;
do _n_ = 1 to _n_;
set have;
if flag_group then output;
end;
drop flag_group;
run;
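Applied to the question's own table, the HAVING approach might look like the sketch below. It assumes a panel is identified by ptno, day and lb_cat (as in the question's attempt); add hour or other visit variables if a panel can repeat within a day. The small labUA dataset and the lb_test variable are hypothetical stand-ins for the real data. The grouping columns must be separated by commas: the question's `group by ptno and day and lb_cat` is parsed as a single logical expression rather than a list of three columns.

```sas
/* Hypothetical sample: only patient 1's day-1 panel has a flagged test */
data labUA;
  input ptno day lb_cat :$12. lb_test :$8. lb_nrind :$8.;
  datalines;
1 1 URINALYSIS PH .
1 1 URINALYSIS PROTEIN HIGH
1 2 URINALYSIS PH .
1 2 URINALYSIS PROTEIN .
2 1 URINALYSIS PH .
;
run;

proc sql;
  create table want as
  select *
  from labUA
  /* one group per patient, day and lab category (assumed panel key) */
  group by ptno, day, lb_cat
  /* keep every row of the panel when at least one lb_nrind is populated */
  having sum(not missing(lb_nrind)) > 0;
quit;
```

With this sample, only the two day-1 rows for patient 1 are kept; the log will note that the query remerges summary statistics with the original data, which is what returns every row of a qualifying panel.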

Aggregate multiple vars on different groupings in one Proc SQL query

I need to aggregate about ten different variables on different groupings using PROC SQL.
Is there a way to achieve SUM() OVER ([partition_by_clause] order_by_clause) in one SQL query with different PARTITION BY clauses?
Here is an example:
data have;
infile cards;
input a b c d e;
cards;
1 2 3 4 5
2 2 4 5 6
1 4 3 4 7
3 4 4 5 8
;
run;
proc sql;
create table want as
select *,
sum(a) over (partition by b, c) as a1,
sum(b) over (partition by c, d) as b1,
sum(c) over (partition by d, e) as c1,
sum(d) over (partition by a, c) as d1
from have
;
quit;
I don't want to write multiple SQL queries, grouping on different variables and calculating one variable in each step.
Hope that makes sense.
PROC SQL does not implement window functions, and therefore not the PARTITION BY syntax found in other SQL implementations. You can only use PARTITION BY in pass-through SQL to a database connection that supports such syntax.
You could perform such a computation in a DATA step using hash objects.
data have;
infile cards;
input a b c d e ;
cards;
1 2 3 4 5
2 2 4 5 6
1 4 3 4 7
3 4 4 5 8
;
run;
data want;
if 0 then set have;
length a1 b1 c1 d1 8;
declare hash a1s();
a1s.defineKey('b', 'c');
a1s.defineData('a1');
a1s.defineDone();
declare hash b1s();
b1s.defineKey('c', 'd');
b1s.defineData('b1');
b1s.defineDone();
declare hash c1s();
c1s.defineKey('d', 'e');
c1s.defineData('c1');
c1s.defineDone();
declare hash d1s();
d1s.defineKey('a', 'c');
d1s.defineData('d1');
d1s.defineDone();
do while (not end);
set have end=end;
if a1s.find() = 0 then a1+a; else a1=a; a1s.replace();
if b1s.find() = 0 then b1+b; else b1=b; b1s.replace();
if c1s.find() = 0 then c1+c; else c1=c; c1s.replace();
if d1s.find() = 0 then d1+d; else d1=d; d1s.replace();
end;
do while (not last);
set have end=last;
a1s.find();
b1s.find();
c1s.find();
d1s.find();
output;
end;
format _numeric_ 4.;
stop;
run;
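If staying inside PROC SQL matters more than speed, one alternative (not part of the original answer) is a correlated sub-query per aggregate: each sub-query sums over the rows that share the current row's "partition" key, which emulates SUM() OVER (PARTITION BY ...) in a single query. This sketch assumes the have table from the step above; the output name want_sql is hypothetical. Because each sub-query may re-scan the table for every row, expect it to be much slower than the hash approach on large data, and add an ORDER BY if row order matters.

```sas
proc sql;
  create table want_sql as
  select h.*,
         /* sum of a over rows sharing this row's (b, c) */
         (select sum(x.a) from have as x where x.b = h.b and x.c = h.c) as a1,
         /* sum of b over rows sharing (c, d) */
         (select sum(x.b) from have as x where x.c = h.c and x.d = h.d) as b1,
         /* sum of c over rows sharing (d, e) */
         (select sum(x.c) from have as x where x.d = h.d and x.e = h.e) as c1,
         /* sum of d over rows sharing (a, c) */
         (select sum(x.d) from have as x where x.a = h.a and x.c = h.c) as d1
  from have as h;
quit;
```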

Using a sas lookup table when the column number changes

I have two SAS datasets:
Table 1
col1 col2 col3 col4 col5
.    1    2    3    4
1    5    8    6    1
2    5    9    7    1
3    6    9    7    1
4    6    9    7    2
Table 2
a b
1 1
1 4
4 3
2 1
2 2
Table 1 is a lookup table for the values a and b in table 2, and I want to use it to create a column c. In table 1, a corresponds to col1 and b to row 1 (i.e., the new column c in table 2 should read 5, 1, 7, 5, 9). How can I achieve this in SAS? I was thinking of reading table 1 into a 2D array and then getting column c = array(a,b), but I can't get it to work.
Here's an IML solution first, as I think this is really the 'best' solution for you: you're using a matrix, so use the matrix language. I'm not sure whether there's a non-loop method (there may well be); if you want to find out, I would add the sas-iml tag to the question and see if Rick Wicklin happens by.
data table1;
input col1 col2 col3 col4 col5 ;
datalines;
. 1 2 3 4
1 5 8 6 1
2 5 9 7 1
3 6 9 7 1
4 6 9 7 2
;;;;
run;
data table2;
input a b;
datalines;
1 1
1 4
4 3
2 1
2 2
;;;;
run;
proc iml;
use table1;
read all var _ALL_ into table1[colname=varnames1];
use table2;
read all var _ALL_ into table2[colname=varnames2];
print table1;
print table2;
table3 = j(nrow(table2),3);
table3[,1:2] = table2;
do _i = 1 to nrow(table3);
table3[_i,3] = table1[table3[_i,1]+1,table3[_i,2]+1];
end;
print table3;
quit;
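For what it's worth, a loop-free IML version appears to be possible, because SAS/IML stores matrices in row-major order: element (r, k) of a matrix with n columns can be addressed with the single index (r-1)*n + k. As in the loop above, row a+1 and column b+1 are used because table1's first row and first column hold the lookup headers. This is a sketch assuming table1 and table2 from the steps above; table3_iml is a hypothetical output name.

```sas
proc iml;
  use table1;  read all var _ALL_ into t1;  close table1;
  use table2;  read all var _ALL_ into t2;  close table2;

  /* skip the header row and header column of table1 */
  r = t2[,1] + 1;
  k = t2[,2] + 1;

  /* row-major linear index of element (r, k) */
  ndx = (r - 1) * ncol(t1) + k;
  cval = t1[ndx];

  table3 = t2 || cval;
  print table3;

  /* write the result back to a data set */
  create table3_iml from table3[colname={"a" "b" "c"}];
  append from table3;
  close table3_iml;
quit;
```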
Here is the temporary array solution. It's not all that pretty. If speed is an issue, you don't have to loop over the array to insert the values; you can use direct memory access, but I wouldn't do that unless speed is a huge issue (and if it is, you should use a better data structure first).
data table3;
set table2;
array _table1[4,4] _temporary_;
if _n_ = 1 then do;
do _i = 1 by 1 until (eof);
set table1(firstobs=2) nobs=_nrows end=eof;
array _cols col2-col5;
do _j = 1 to dim(_cols);
_table1[_i,_j] = _cols[_j];
end;
end;
end;
c = _table1[a,b];
keep a b c;
run;
Just use the POINT= option on a SET statement to pick the row. You can then use an ARRAY to pick the column.
data table1 ;
input col1-col4 ;
cards;
5 8 6 1
5 9 7 1
6 9 7 1
6 9 7 2
;
data table2 ;
input a b ;
cards;
1 1
1 4
4 3
2 1
2 2
;
data want ;
set table2 ;
p=a ;
set table1 point=p ;
array col col1-col4 ;
c=col(b);
drop col1-col4;
run;
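One caution with this lookup: if a falls outside 1 to the number of rows in table1, POINT= does not point at a valid observation, and a b outside the array bounds stops the step with an array-subscript error. A guarded variant, sketched below under the assumption of the same table1 and table2 (want_safe is a hypothetical name), checks the bounds first. It relies on the NOBS= count being filled in before the step executes, so it can be tested before the SET statement runs; the POINT= and NOBS= variables are not written to the output.

```sas
data want_safe;
  set table2;
  array col col1-col4;
  /* only look up when a is a valid row number and b a valid column */
  if 1 <= a <= nrows and 1 <= b <= dim(col) then do;
    p = a;
    set table1 point=p nobs=nrows;
    c = col(b);
  end;
  /* otherwise c is simply left missing for this row */
  drop col1-col4;
run;
```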

How to sum value from next row by group using SAS?

I want to create a column in my dataset that holds, for another field, the sum of the current row's and the next row's values. There are several groups within the data, and I only want to add the next row's value if the next row is part of the current group. If a row is the last record of its group, I want to fill the column with a null value.
I've looked at the question on reading the next observation's value in the current observation, but I still can't figure out how to get the solution I need.
For example:
data have;
input Group ID Salary;
cards;
10 1 1
10 2 2
10 3 2
10 4 1
11 1 2
11 2 2
11 3 1
11 4 1
;
run;
The result I want to obtain here is this:
data want;
input Group ID Salary Sum;
cards;
10 1 1 3
10 2 2 4
10 3 2 3
10 4 1 .
11 1 2 4
11 2 2 3
11 3 1 2
11 4 1 .
;
run;
Similar to Tom's answer, but using a 'look-ahead' merge (without a BY statement, and with firstobs=2):
data want ;
merge have
have (firstobs=2
keep=Group Salary
rename=(Group=NextGroup Salary=NextSalary)) ;
if Group = NextGroup then sum = sum(Salary,NextSalary) ;
drop Next: ;
run ;
Use BY group processing and a second SET statement that skips the first observation.
data want ;
set have end=eof;
by group ;
if not eof then set have (keep=Salary rename=(Salary=Sum) firstobs=2);
if last.group then Sum=.;
else sum=sum(sum,salary);
run;
I found a solution using PROC EXPAND (part of SAS/ETS) that produced what I needed:
proc sort data = have;
by Group ID;
run;
proc expand data=have out=want method=none;
by Group;
convert Salary = Next_Sal / transformout=(lead 1);
run;
data want(keep=Group ID Salary Sum);
set want;
Sum = Salary + Next_Sal;
run;

SAS sort by original order

Say you have three separate data sets consisting of the same number of observations. Each observation has an ID letter, A-Z, followed by some numerical observation. For example:
Data set 1:
B 3 8 1 9 4
C 4 1 9 3 1
A 4 4 5 4 9
Data set 2:
C 3 1 9 4 0
A 4 1 2 0 0
B 0 3 3 1 8
I want to merge the data sets BY that first variable. The problem is that the first variable is NOT already sorted alphabetically, and I do not want to sort it alphabetically. I want to merge the data but keep the original order. For example, I would get:
Merged data:
B 3 8 1 9 4
B 0 3 3 1 8
C 4 1 9 3 1
C 3 1 9 4 0
A 4 4 5 4 9
A 4 1 2 0 0
Is there any way to do this?
You can create a variable that holds the order and then apply it to the new dataset after it's "merged". I believe this is an append rather than a merge, though. I've used a format, but you could use an SQL or DATA step merge as well.
data have1;
input id $ var1-var5;
cards;
B 3 8 1 9 4
C 4 1 9 3 1
A 4 4 5 4 9
;
run;
data have2;
input id $ var1-var5;
cards;
C 3 1 9 4 0
A 4 1 2 0 0
B 0 3 3 1 8
;
run;
data order;
set have1;
fmtname='sort_order';
type='J';
label=_n_;
start=id;
keep id fmtname type label start;
run;
proc format cntlin=order;
run;
data want;
set have1 have2;
order_var=input(id, $sort_order.);
run;
proc sort data=want;
by order_var;
run;
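A small caveat on the format approach: TYPE='J' builds a character informat, so ORDER_VAR above is character and would sort '10' before '2' once there are more than nine IDs. A variant with a numeric informat (TYPE='I') avoids that. This is a sketch; order_fmt, sort_order_n and want_n are hypothetical names.

```sas
data order_fmt;
  set have1;
  fmtname = 'sort_order_n';
  type = 'I';            /* numeric informat: reads text, returns a number */
  start = id;
  label = cats(_n_);     /* original row position in have1, as text */
  keep fmtname type start label;
run;

proc format cntlin=order_fmt;
run;

data want_n;
  set have1 have2;
  order_var = input(id, sort_order_n.);
run;

/* EQUALS is the PROC SORT default, so have1 rows stay ahead of have2 rows within each id */
proc sort data=want_n;
  by order_var;
run;
```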
This is an SQL version that follows a similar path to Joe's answer. Row order is supplied via a sub-query rather than a format. However, the initial order of the two input tables is lost in the join to the row-order sub-query. The original order (have2 follows have1) is reinstated by using the table names as a secondary order variable.
proc sql;
create table want1 as
select want.id
,want.var1
,want.var2
,want.var3
,want.var4
,want.var5
from (
select *
, 'have1' as source
from have1
union all
select *
, 'have2' as source
from have2
) as want
left join
(
select id
, monotonic() as row_no
from have1
) as ord
on want.id eq ord.id
order by ord.row_no
,want.source
;
quit;
proc compare
base=want1
compare=want
;
run;
And this is a DATA step version without a format. Here the have1 table, with its row order added, is re-merged with the concatenated data (have1 and have2), and the result is then re-sorted by row order.
data want2;
set have1 have2;
run;
data have1;
set have1;
order_var = _n_;
run;
proc sort data=want2;
by id;
run;
proc sort data=have1;
by id;
run;
data want2;
merge want2 have1(keep=id order_var);
by id;
run;
proc sort data=want2;
by order_var;
run;
proc compare
base=want2
compare=want
;
run;