SAS: Select rows where the ID is in another table

I have two tables that both have an ID column. I'd like to select the rows in one table whose ID also appears in the second table.
In R I would do this by writing tbl1[tbl1$ID %in% tbl2$ID,], but I haven't found a way to translate this into SAS.

Try this:
PROC SQL;
CREATE TABLE result AS
SELECT t2.*
FROM table1 AS t1, table2 AS t2
WHERE t1.id = t2.id
;
QUIT;
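A closer match to R's %in% is an IN subquery. The sketch below assumes the tables are named table1 and table2 and share a column named id, as in the answer above. Unlike the join, it returns columns from table1 only and returns each table1 row at most once, even if table2 contains duplicate IDs.

```sas
proc sql;
    /* Keep the rows of table1 whose id appears anywhere in table2. */
    create table result as
    select *
    from table1
    where id in (select id from table2);
quit;
```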

This is an expansion on Hong Ooi's method with the corrections suggested by Jon Clements. I have found using a data step to be quicker than using SQL, and it gives you more options for outputting data. For instance, this solution also creates a table called "match_error", which holds all rows of table1 whose ID isn't in table2.
proc sort data=table1;
by id;
run;
proc sort data=table2;
by id;
run;
data result match_error;
merge table1 (in=in_T1) table2 (in=in_T2 keep=id);
by id;
if in_T1 and in_T2 then output result;
if in_T1 and not in_T2 then output match_error;
run;

data out;
merge table1 (in=t1) table2 (in=t2);
by id;
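/* Subsetting IF: keeps the rows of table1 whose id is NOT in table2. */
/* Use "if t1 and t2;" instead to keep the rows whose id IS in table2. */
if t1 and not t2;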
run;

Related

How do I merge the results from proc sql count into one table?

I used proc sql to count the number of observations in 4 different tables. Now how do I merge these 4 results so that I get one nice table? Thanks.
SQL DICTIONARY.TABLES might be what you want.
Example:
proc sql;
create table want as
select libname, memname, nobs
from dictionary.tables
where libname = 'SASHELP'
and upcase(memname) in ('CARS', 'CLASS', 'AIR', 'BASEBALL')
;
quit;
There are many ways to merge tables in SAS, and you can even count the observations all in a single step. Assuming you have four datasets with one row each, you can do a simple merge step. For example:
data count1;
n_obs1 = 100;
run;
data count2;
n_obs2 = 200;
run;
data want;
merge count1 count2;
run;
You could also do everything in a single SQL step.
proc sql;
create table want as
select nobs_1, nobs_2
from (select count(*) as nobs_1 from sashelp.cars)
, (select count(*) as nobs_2 from sashelp.class)
;
quit;
Proc sql;
Create table counts as
Select count(*) as count, "t1" as source from t1
Union select count(*) as count, "t2" as source from t2
Union select count(*) as count, "t3" as source from t3
Union select count(*) as count, "t4" as source from t4;
Quit;

Merging multiple datasets together

Hello, I have a listing I'm struggling with, as I don't think the code I am using is doing the job correctly. Here is the spec:
Dataset: Firstly, merge QS with SUPPQS by USUBJID, IDVARVAL=QSSEQ, keep only records where QSCAT='SOFA'. Then merge with ADSOFA by USUBJID and QSSEQ. Only keep records where MITTFL='Y'.
And here is the code I'm using:
proc sql;
create table qs (where=(qscat="SOFA" )) as
select a.*,b.qnam as SOFASCS,qval as avalc_qs from trans.qs as a
left join
trans.suppqs (where=(qnam='SOFASCS')) as b
on a.usubjid = b.usubjid and a.qsseq = input(b.idvarval,best.);
quit;
proc sort data=qs;
by usubjid qsseq;
run;
data adsofa;
set adb.adsofa;
run;
proc sort data=adsofa;
by usubjid qsseq;
run;
data qs01;
merge qs(in=a drop=studyid)
adsofa(in=b where=(mittfl = "Y"));
by usubjid qsseq;
if a or b;
run;
I keep getting rows I don't want. Is there a cleaner way of doing this?
I tried to convert your logic into classic SQL.
proc sql;
create table qs as
select a.*
,b.qnam as SOFASCS
,qval as avalc_qs
from trans.qs as a
left join trans.suppqs as b
on a.usubjid = b.usubjid and a.qsseq = input(b.idvarval,best.) and qnam='SOFASCS'
where qscat="SOFA" ;
quit;
proc sql;
create table qs01 as
select qs.*, a.*
from qs
full /* left? */ join adb.adsofa as a
on a.usubjid = qs.usubjid and a.qsseq = qs.qsseq and mittfl = "Y"
;
quit;
I assume that you did not really want a full join but a simple left join in the last one.
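If the spec line "Only keep records where MITTFL='Y'" is meant to drop QS records that have no matching ADSOFA record with that flag (an assumption, since the spec does not say so explicitly), then an inner join with the filter in the WHERE clause may be closer to what is wanted. A minimal sketch, reusing the qs table built above and selecting only mittfl from ADSOFA to avoid duplicate column names:

```sas
proc sql;
    create table qs01 as
    select qs.*
          ,a.mittfl
    from qs
    inner join adb.adsofa as a
        on a.usubjid = qs.usubjid and a.qsseq = qs.qsseq
    /* Filtering in WHERE (not ON) removes non-matching rows entirely. */
    where a.mittfl = "Y"
    ;
quit;
```

With an outer join, a condition placed in the ON clause only decides which rows count as a match, so unmatched rows from the preserved side still come through. That difference is a common source of unwanted rows.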

Put everything into sas sql

I have two pieces of code: one proc sql step, and one proc sort plus data step. The second works on the dataset created by the first.
Below is the proc sql step.
proc sql;
create table new as select a.id,a.alid,b.pdate from tb a inner join
tb1 act on a.aid =act.aid left join tb2 as b on (b.alid=a.alid) where
a.did in (15,45); quit;
Below are the proc sort and data step that use the dataset new created above.
proc sort data = new uodupkey;
by alid;
data new1;
set new;
format ddate date9.
dat1=datepart(today);
datno=input(number,20.);
key=_n_;
rename alid blid;
run;
proc sort data=new1 nodupkey;
by datno dat1;
run;
I need to put everything into a single proc sql step.
You mention two data steps but I only see one.
Anyway, your data step and proc sort can indeed be written in one sql query (which you can then insert in your proc sql):
proc sql;
create table new1 as
select id
,alid as blid
,pdate
,datepart(today) as dat1
,input(number,20.) as datno
,monotonic() as key
from new
group by datno, dat1
having key=min(key)
;
quit;
One remark though. Your data step expects variables called ddate, today and number in your input dataset new. If that dataset is supposed to be the result of your first sql query, then those variables don't exist, and their values, along with those of dat1 and datno in new1, will always be missing.
Also I assume you misspelled nodupkey on your proc sort.
EDIT: or, to have it all in the same query (if that's what you meant with "the same proc sql"):
proc sql;
create table new1 as
select id
,alid as blid
,pdate
,datepart(today) as dat1
,input(number,20.) as datno
,monotonic() as key
from (
select a.id,a.alid,b.pdate
from tb a
inner join tb1 act
on a.aid =act.aid
left join tb2 as b
on (b.alid=a.alid)
where a.did in (15,45)
)
group by datno, dat1
having key=min(key)
;
quit;

Joining two sets in SAS SQL with table/correlation error

I am trying to join two datasets. The first dataset1 has two columns item and price. The second dataset2 has three columns - item, customerid, and qty. I need to only include the unique rows from dataset1 that are not in dataset2. While trying to implement this code, I get the error:
Error: Unresolved reference to table/correlation name i.
I am unsure how to fix this error, thanks.
PROC SQL;
create table a as
select *
from dataset1 as i
except corr
select *
from dataset2 as p
where i.item = p.item;
describe table a;
QUIT;
EXCEPT is used to select records in the first set that do not exist in the second set. So if what you want is, to quote you, select records from dataset1 that do not appear in dataset2, you don't need the where clause:
PROC SQL;
create table a as
select *
from dataset1 as i
except corr
select *
from dataset2 as p
;
QUIT;
If, however, as that where clause suggests, you actually want to select records from dataset1 where the value of item is not found in dataset2, you could do this:
proc sql;
select *
from dataset1 i
where not exists (select *
from dataset2 p
where i.item=p.item
)
;
quit;
EDIT: following your latest comment, and if you reaaaally need your query to feature an except, this should get you your result
proc sql;
create table a as
select t1.*
from dataset1 t1
inner join (select *
from dataset1 as i
except corr
select *
from dataset2 as p
) t2
on t1.item=t2.item
;
quit;
Even though this will do the same as the query above (with not exists) or, now that I think of it (stupid me), as this:
proc sql;
create table a as
select *
from dataset1
where item not in (select distinct item from dataset2)
;
quit;

SAS enterprise guide summing by personal ID

I have a dataset which has multiple obs per person. I want each record to show the sum of a variable per person ID. However, I do not want to collapse the data to one row per person ID. I hope the example below explains my question.
I want to create the SUM column. How do I do this in SAS EG (or SAS if necessary)?
ID   Var1   SUM
X    10     30
X    20     30
Y    20     80
Y    20     80
Y    40     80
Z    30     30
You can do this using either proc sql or proc means
proc sql:
proc sql noprint;
create table new_table as
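/* No DISTINCT: identical (id, var1) rows must be kept. */
/* PROC SQL remerges the per-id sum onto every row. */
select id, var1, sum(var_to_sum) as summed_var_name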
from old_table
group by id
;
quit;
After rereading your question: with proc means you would need to merge var1 back in, so you are better off using proc sql above.
proc means:
proc means data = old_table sum;
by id var1;
var var_to_sum;
output out = new_table sum = summed_var_name;
run;
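For completeness, here is a rough sketch of the proc means route with the merge-back step mentioned above. It assumes the variable to sum is var1 itself, as in the question's example table. The names sums and sum_var1 are made up for illustration.

```sas
/* BY-group processing requires the data to be sorted by id. */
proc sort data=old_table;
    by id;
run;

/* One row per id holding the sum of var1. */
proc means data=old_table noprint;
    by id;
    var var1;
    output out=sums (drop=_type_ _freq_) sum=sum_var1;
run;

/* Merge the per-id sum back onto every original row. */
data new_table;
    merge old_table sums;
    by id;
run;
```

Because sums has exactly one row per id, the merge repeats sum_var1 on each original row of that id, which keeps duplicate rows such as the two Y/20 records intact.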