Concatenate two SAS data sets, but only if the ID appears in both - sas

I want to concatenate two SAS data sets, one from 2003 and one from 2013. There is a unique identifier in both, and I only want records to be concatenated if their ID appears in both data sets.
NB. There are multiple records with the same ID.

Here's some untested code:
proc sql;
create table want as
select * from(
select * from t1 where t1.id in (select t2.id from t2)
union all /* plain UNION would drop duplicate rows */
select * from t2 where t2.id in (select t1.id from t1)) as A;
quit;
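One possible way to write this is with PROC SQL's OUTER UNION CORR, which stacks the two result sets, lines up columns by name, and keeps every row. This is only a sketch: it assumes the data sets are called t1 and t2 and that the identifier column is called id in both, as in the untested code above.

```sas
proc sql;
  create table want as
  /* rows from the first data set whose id also occurs in the second */
  select * from t1
    where id in (select id from t2)
  outer union corr
  /* rows from the second data set whose id also occurs in the first */
  select * from t2
    where id in (select id from t1);
quit;
```

Columns that exist in only one of the two data sets are kept and are missing for rows coming from the other one.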

Related

How do I merge the results from proc sql count into one table?

I used proc sql to count the number of observations in 4 different tables. Now how do I merge these 4 results so that I get one nice table? Thanks.
SQL DICTIONARY.TABLES might be what you want.
Example:
proc sql;
create table want as
select libname, memname, nobs
from dictionary.tables
where libname = 'SASHELP'
and upcase(memname) in ('CARS', 'CLASS', 'AIR', 'BASEBALL')
;
quit;
There are many ways to merge tables in SAS, and you can even count the observations all in a single step. Assuming each of your four tables holds a single row with its count, a simple merge step with no BY statement places the counts side by side. For example:
data count1;
n_obs1 = 100;
run;
data count2;
n_obs2 = 200;
run;
data want;
merge count1 count2;
run;
You could also do everything in a single SQL step.
proc sql;
create table want as
select nobs_1, nobs_2
from (select count(*) as nobs_1 from sashelp.cars)
, (select count(*) as nobs_2 from sashelp.class)
;
quit;
Proc sql;
Create table counts as
Select count(*) as count, "t1" as source from t1
Union select count(*) as count, "t2" as source from t2
Union select count(*) as count, "t3" as source from t3
Union select count(*) as count, "t4" as source from t4;
Quit;

How to prevent left join from returning multiple rows

When I use a left join in SAS, the right-side table has duplicate IDs with different donations, so the join returns several rows per ID.
I only want one row per ID, the one with the highest donated amount.
The code is as follows:
Proc sql;
Create table x
As select T1.*,
T2.Donations
From xxx t1
Left join yy t2 on (t1.id = t2.id);
Quit;
Thanks for any help
In SAS, follow https://stackoverflow.com/a/61486331/8227346
In MySQL (8.0 or later, which added window functions) you can use partitioning with ROW_NUMBER:
CREATE TABLE x AS SELECT t1.*, t2.Donations
FROM xxx t1
LEFT JOIN
(
SELECT id, Donations FROM
(
SELECT
id,
Donations,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY Donations DESC) AS rn
FROM
yy
) ranked
WHERE
rn = 1
)
t2
ON (t1.id = t2.id);
More info can be found at https://www.c-sharpcorner.com/blogs/rownumber-function-with-partition-by-clause-in-sql-server1
You can either work with a subselect that selects only the highest donation for a given ID, or you could do some preparatory work in SAS (which I prefer):
*Order ascending by ID and DONATIONS;
proc sort data=work.t2;
by ID DONATIONS;
run;
*Keep only the observation with the highest DONATIONS per ID;
data work.HIGHEST_DONATIONS;
set work.t2;
by ID;
if last.ID then output;
run;
I don't have SAS available right now but it should work.
Don't hesitate to ask further questions. :)
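The subselect option mentioned above might look like this in PROC SQL. It is a sketch that reuses the question's table names xxx and yy and its Donations column; max_donation is a made-up alias.

```sas
proc sql;
  create table x as
  select t1.*,
         t2.max_donation as Donations
  from xxx as t1
  left join
    /* collapse yy to one row per id, keeping the largest donation */
    (select id, max(Donations) as max_donation
     from yy
     group by id) as t2
  on t1.id = t2.id;
quit;
```

Because the inline view has exactly one row per id, the left join cannot multiply the rows of xxx.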

Joining two sets in SAS SQL with table/correlation error

I am trying to join two datasets. The first dataset1 has two columns item and price. The second dataset2 has three columns - item, customerid, and qty. I need to only include the unique rows from dataset1 that are not in dataset2. While trying to implement this code, I get the error:
Error: Unresolved reference to table/correlation name i.
I am unsure how to fix this error, thanks.
PROC SQL;
create table a as
select *
from dataset1 as i
except corr
select *
from dataset2 as p
where i.item = p.item;
describe table a;
QUIT;
EXCEPT is used to select records in the first set that do not exist in the second set. So if what you want is, to quote you, select records from dataset1 that do not appear in dataset2, you don't need the where clause:
PROC SQL;
create table a as
select *
from dataset1 as i
except corr
select *
from dataset2 as p
;
QUIT;
If, however, as that where clause would suggest, you actually want to select records from dataset1 where the value of item is not found in dataset2, you could do this:
proc sql;
select *
from dataset1 i
where not exists (select *
from dataset2 p
where i.item=p.item
)
;
quit;
EDIT: following your latest comment, and if you reaaaally need your query to feature an except, this should get you your result
proc sql;
create table a as
select t1.*
from dataset1 t1
inner join (select *
from dataset1 as i
except corr
select *
from dataset2 as p
) t2
on t1.item=t2.item
;
quit;
Even though this will do the same as the query above (with not exists) or, now that I think of it (stupid me), as this:
proc sql;
create table a as
select *
from dataset1
where item not in (select distinct item from dataset2)
;
quit;

Selecting Max Value from Left Join

I have two tables like below.
Profile : ID
Charac : ID, NAME, DATE
With the above tables, I am trying to get NAME from Charac where we have max date.
I am trying to do a join with proc sql by replicating the answer for mysql like below
proc sql;
create table ggg as
select profile.ID ,T2.NAME
from Profile
left join
( select ID,max(DATE) as max_DATE
from EDW.CHARAC
group by ID
) as T1
on fff.ID = EDW.ID
left join EDW.CHARAC as T2
on T2.ID = T1.max_DATE
order by profile.ID DESC;
quit;
Error
ERROR: Unresolved reference to table/correlation name EDW.
ERROR: Expression using equals (=) has components that are of different data types.
Could it be that you intended this line
on T2.ID = T1.max_DATE
(which is probably the source of the "components that are of different data types" error) to be
on T2.ID = T1.ID and T2.DATE = T1.max_DATE
that is, joining on ID at the maximum DATE?
You can't use EDW like that. You need to join
on fff.ID=T1.ID
As for the data type error, that is probably because EDW.ID is undefined and thus treated as numeric by default.
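Putting the two suggestions above together, the whole query might look like the sketch below. It assumes that Profile is the table the question's join condition calls fff, and it uses p as a made-up alias for it; T1 and T2 are the aliases from the question.

```sas
proc sql;
  create table ggg as
  select p.ID, T2.NAME
  from Profile as p
  left join
    /* latest DATE per ID */
    (select ID, max(DATE) as max_DATE
     from EDW.CHARAC
     group by ID) as T1
    on p.ID = T1.ID
  /* fetch the NAME that belongs to that latest DATE */
  left join EDW.CHARAC as T2
    on T2.ID = T1.ID and T2.DATE = T1.max_DATE
  order by p.ID desc;
quit;
```

If an ID has several rows sharing the same maximum DATE, this still returns one row per tie.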

SAS: Select rows where the ID is in another table

I have two tables, that both have an ID column. I'd like to select the rows in the one table, that have an ID that is in the second table.
In R I would do this by saying tbl1[tbl1$ID %in% tbl2$ID,], but I haven't found a way to translate this into SAS.
Try this:
PROC SQL;
CREATE TABLE result AS
SELECT t2.*
FROM table1 AS t1, table2 AS t2
WHERE t1.id = t2.id
;
QUIT;
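A closer match to R's %in% may be an IN subquery, sketched here with the same table names as above. Unlike a join, it returns each qualifying row of table1 once, even when the ID occurs several times in table2.

```sas
proc sql;
  create table result as
  select *
  from table1
  /* keep a row only if its id occurs somewhere in table2 */
  where id in (select id from table2);
quit;
```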
This is an expansion on Hong Ooi's method with the corrections suggested by Jon Clements. I have found using a data step to be quicker than using SQL. And it gives you more options for outputting data. For instance, this solution creates a table called "match_error" which holds all IDs in table1 that aren't in table2.
proc sort data=table1;
by id;
run;
proc sort data=table2;
by id;
run;
data result match_error;
merge table1 (in=in_T1) table2 (in=in_T2 keep=id);
by id;
if in_T1 and in_T2 then output result;
if in_T1 and not in_T2 then output match_error;
run;
data out;
merge table1 (in=t1) table2 (in=t2);
by id;
if t1 and not t2;
run;