Joining Together Tables in PROC SQL

Joining Together Tables in PROC SQL - sas

I want to join two tables that I created into one table but I am getting a syntax error that says Column OverallStudentReport.ID was found in more than one table in the same scope. If anyone could help fix this syntax error that would be appreciated or if anyone has a better way to join these two tables together into one that would be helpful as well.
The code below created my first table
PROC SQL;
Create table SemesterReport1 as select coalesce(A.ID,B.ID,C.ID,D.ID,E.ID) as ID,
coalesce(A.Year,B.Year,C.Year,D.Year,E.Year) as Year, coalesce(A.Term,B.Term,C.Term,D.Term,E.Term) as Term,
SemesterGPA.SemGPA, AccumulativeGPA.GPAAccum,
CreditHoursEarnedSemester.CreditHoursEarnedSemester,
GradedCreditHoursEarnedSemester.GradedCreditHoursEarnedSemester,
ClassStanding.ClassStanding
from SemesterGPA as A
full join AccumulativeGPA as B on A.ID=B.ID and A.Year=B.Year and A.Term=B.Term
full join CreditHoursEarnedSemester as C on A.ID=C.ID and A.Year=C.Year and A.Term=C.Term
full join GradedCreditHoursEarnedSemester as D on A.ID=D.ID and A.Year=D.Year and A.Term=D.Term
full join ClassStanding as E on A.ID=E.ID and A.Year=E.Year and A.Term=E.Term
order by ID, Year, Term
;
quit;
The code below created my second table
PROC SQL;
Create table OverallStudentReport as select coalesce(A.ID,B.ID,C.ID,D.ID,E.ID) as ID,
OverallGPA.TotalGPA,
OverallCreditHoursEarned.OverallCreditHoursEarned,
OverallGradedCreditHoursEarned.OverallGradedCreditHoursEarned,
RepeatClasses.RepeatClasses,
GradeCounts.ACount,GradeCounts.BCount,GradeCounts.CCount,GradeCounts.DCount,
GradeCounts.ECount, GradeCounts.WCount
from OverallGPA as A
full join OverallCreditHoursEarned as B on A.ID=B.ID
full join OverallGradedCreditHoursEarned as C on A.ID=C.ID
full join RepeatClasses as D on A.ID=D.ID
full join GradeCounts as E on A.ID=E.ID
order by ID
;
quit;
and the code below is supposed to join the two tables created above but there is a syntax error.
PROC SQL;
Create table Report1 as select *
from SemesterReport1, OverallStudentReport
full join
OverallStudentReport
on SemesterReport1.ID=OverallStudentReport.ID
order by ID
;
quit;
Here is my log
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
72
73 PROC SQL;
74 Create table Report1 as select *
75 from SemesterReport1, OverallStudentReport
76 full join
77 OverallStudentReport
78 on SemesterReport1.ID=OverallStudentReport.ID
79 order by ID
80 ;
ERROR: Column OverallStudentReport.ID was found in more than one table in the same scope.
WARNING: Column named ID is duplicated in a select expression (or a view). Explicit references to it will be to the first one.
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
81 quit;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):

When you assign table aliases, you should use them consistently throughout the query, not just selectively in SELECT and JOIN. Also, fields in ORDER BY is ambiguous. Since you require the calculated columns in SELECT use calculated keyword.
By the way, see Bad Habits to Kick : Using table aliases like (a, b, c) or (t1, t2, t3). Instead, use more informative shorthand aliases that align to original table names. Consider following adjustments:
PROC SQL;
create table SemesterReport1 as
select coalesce(s.ID, a.ID, ch.ID, g.ID, cs.ID) as Final_ID
, coalesce(s.Year, a.Year, ch.Year, g.Year, cs.Year) as Final_Year
, coalesce(s.Term, a.Term, ch.Term, g.Term, cs.Term) as Final_Term
, s.SemGPA
, a.GPAAccum
, ch.CreditHoursEarnedSemester
, g.GradedCreditHoursEarnedSemester
, cs.ClassStanding
from SemesterGPA as s
full join AccumulativeGPA as a
on s.ID = a.ID
and s.Year = a.Year
and s.Term = a.Term
full join CreditHoursEarnedSemester as ch
on s.ID = ch.ID
and s.Year = ch.Year
and s.Term = ch.Term
full join GradedCreditHoursEarnedSemester as g
on s.ID = g.ID
and s.Year = g.Year
and s.Term = g.Term
full join ClassStanding as cs
on s.ID = cs.ID
and s.Year = cs.Year
and s.Term = cs.Term
order by calculated Final_ID
, calculated Final_Year
, calculated Final_Term;
quit;
PROC SQL;
create table OverallStudentReport as
select coalesce(og.ID, och.ID, ogch.ID, r.ID, gc.ID) as Final_ID
, og.TotalGPA
, och.OverallCreditHoursEarned
, ogch.OverallGradedCreditHoursEarned
, r.RepeatClasses
, gc.ACount
, gc.BCount
, gc.CCount
, gc.DCount
, gc.ECount
, gc.WCount
from OverallGPA as og
full join OverallCreditHoursEarned as och
on og.ID = och.ID
full join OverallGradedCreditHoursEarned as ogch
on og.ID = ogch.ID
full join RepeatClasses as r
on og.ID = r.ID
full join GradeCounts as gc
on og.ID = gc.ID
order by calculated Final_ID;
quit;
Then in final query, do not repeat table OverallStudentReport. And you should qualify the ID (here being Final_ID) in order by. And see another habit to kick: Why is SELECT * considered harmful?
PROC SQL;
create table Report1 as
select smr.Final_ID as ID
, smr.Final_Year as Year
, smr.Final_Term as Term
, smr.SemGPA
, smr.GPAAccum
, smr.CreditHoursEarnedSemester
, smr.GradedCreditHoursEarnedSemester
, smr.ClassStanding
, osr.Final_ID
, osr.TotalGPA
, osr.OverallCreditHoursEarned
, osr.OverallGradedCreditHoursEarned
, osr.RepeatClasses
, osr.ACount
, osr.BCount
, osr.CCount
, osr.DCount
, osr.ECount
, osr.WCount
from SemesterReport1 smr
full join OverallStudentReport osr
on smr.Final_ID = osr.Final_ID
order by smr.Final_ID ;
quit;

Related

Sql to proc sql conversion

I have 2 tables that contains var values like (ID, date, var1,var2,var3....)
I need to get the data from table2 and add it to table1 for which (ID or date) does not exist in table1.
I am using the below code in sql to get new ID's from tab2 to tab1:
INSERT INTO table1
SELECT * FROM table2 a
WHERE ID not in(select ID from table1 where ID=a.ID)
Here is the code to add new date for existing ID's in tab2 to tab1 :
INSERT INTO table1
SELECT * FROM table2 a
WHERE date not in(select date from table1 where ID=a.ID)
I don't know how to do this in proc sql.
Please share an effective method to do this task.
To insert the new ID I used:
proc sql;
create table lookup as
select a.ID
from table1 a inner join table2 b
on a.ID = b.ID
;
insert into table1
select * from table2 a
where a.ID not in (select ID from lookup)
;
quit;
This works well. But, it failed to insert a date for existing IDs.
Please suggest some ideas to complete this step.
Thanks in advance!

SAS SQL is Similar to the SQl you have written.
The SAME insert statements can be warped as proc sqls in SAS and they work like charm.
If your SQL did work then the following will work too.
PROC SQL;
INSERT INTO work.table1
SELECT * FROM work.table2 a
WHERE ID not in(select ID from work.table1 where ID=a.ID);
INSERT INTO work.table1
SELECT * FROM work.table2 a
WHERE date not in(select date from work.table1 where ID=a.ID)
QUIT;

SAS column name replacement scheme

I have a pass-through query using proc sql (excerpt below), in which some of the resulting names are modified since they aren't valid SAS names. 1A is replaced by _A, 2A is replaced by _A0, and some other changes are made. My questions are:
Is there a document which explains the rules for name replacement e.g. 2A becomes _A0?
Is it possible for me to change the way SAS corrects the names? For example, can I make 1A become _1A instead of _A?
.
proc sql;
connect to oracle as clc([omitted]);
CREATE table out.bk_ald as
SELECT *
FROM connection to bpm (
SELECT
, "1A"
, "1B"
, "1C"
, "1D"
, "1E"
, "2A"
, "2B"
, "2C"
...

You cannot change the algorithm and I am not sure if it is published. But you could either rename the column yourself on the Oracle side.
select * from connection to oracle (select "1A" as "_1A", ...);
Or rename on the SAS side. SAS will store the original name as the variable's LABEL. You could query the metadata and use that to rename the variables.
proc contents data=bk_ald noprint out=contents; run;
proc sql noprint ;
select catx(name,'=',cats('_',label)) into :rename separated by ' '
from contents
where upcase(name) ne upcase(label)
;
quit;
data want ;
set bk_ald;
rename &rename ;
run;

Joining two sets in SAS SQL with table/correlation error

I am trying to join two datasets. The first dataset1 has two columns item and price. The second dataset2 has three columns - item, customerid, and qty. I need to only include the unique rows from dataset1 that are not in dataset2. While trying to implement this code, I get the error:
Error: Unresolved reference to table/correlation name i.
I am unsure how to fix this error, thanks.
PROC SQL;
create table a as
select *
from dataset1 as i
except corr
select *
from dataset2 as p
where i.item = p.item;
describe table a;
QUIT;

EXCEPT is used to select records in the first set that do not exist in the second set. So if what you want is, to quote you, select records from dataset1 that do not appear in dataset2, you don't need the where clause:
PROC SQL;
create table a as
select *
from dataset1 as i
except corr
select *
from dataset2 as p
;
QUIT;
If however, like that where clause would suggest, you actually want to select records from dataset1 where the value of item is not found in dataset2, you could do this
proc sql;
select *
from dataset1 i
where not exists (select *
from dataset2 p
where i.item=p.item
)
;
quit;
EDIT: following your latest comment, and if you reaaaally need your query to feature an except, this should get you your result
proc sql;
create table a as
select t1.*
from dataset1 t1
inner join (select *
from dataset1 as i
except corr
select *
from dataset2 as p
) t2
on t1.item=t2.item
;
quit;
Even though this will do the same as the query above (with not exists) or, now that I think of it (stupid me), as this:
proc sql;
create table a as
select *
from dataset1
where item not in (select distinct item from dataset2)
;
quit;

Multiple left join with same columns

I have 3 Data sets A,B,C which contains the following variables
A:
period region city
B:
period city Sales
C:
period region Sales
My goal is to do a left join on A using B and C to get the Sales information based on the geographic location. I tried to in the sequence of steps:
/* Left joining B to A based on period and region */
proc sql;
Create table merge1 as
select l.* , r.* from
A as l left join B as r
on l.period = r.period and l.city=r.city;
quit;
/* Renaming Sales variable*/
data merged2;
set merge1;
rename Sales= s1;
run;
/*Doing another left join again, this time using C*/
proc sql;
create table merge3 as
select l.*,r.* from
A as l left join C as r
on l.period= r.period and l.region=r.region;
quit;
/*Replacing some of the values*/
data merge4;
set merge3;
Sales1= IFN(s1=., Sales, s1);
drop s1 Sales;
run;
My question would be if there are much better/ efficient ways to go about this? Especially on the multiple left joins since the process will get really tedious as the number of datasets and varaibles to be matched increases, thanks!

You could do it in a single SQL procedure. Since you have multiple tables, you will have to join them one by one.
proc sql;
Create table merge1 as select
A.* ,
B.sales as s1,
C.sales as s2,
coalesce(B.sales, C.sales) as Sales /*takes first non missing value*/
from A
left join B on (A.period = B.period and A.city = B.city)
left join C on (A.period = C.period and A.region = C.region);
quit;

Selecting Max Value from Left Join

I have two tables like below.
Profile : ID
Charac : ID, NAME, DATE
With the above tables, I am trying to get NAME from Charac where we have max date.
I am trying to do a join with proc sql by replicating the answer for mysql like below
proc sql;
create table ggg as
select profile.ID ,T2.NAME
from Profile
left join
( select ID,max(DATE) as max_DATE
from EDW.CHARAC
group by ID
) as T1
on fff.ID = EDW.ID
left join EDW.CHARAC as T2
on T2.ID = T1.max_DATE
order by profile.ID DESC;
quit;
Error
ERROR: Unresolved reference to table/correlation name EDW.
ERROR: Expression using equals (=) has components that are of different data types.

Could it be you intended
on T2.ID = T1.max_DATE
which is probably source of "components that are of different data types" error
to be:
on T2.ID = T1.ID and T2.DATE = T1.max_DATE
that, is - joining on IDs at maximum DATE?

You can't use EDW like that. You need to join
on fff.ID=T1.ID
As far as data types, that probably is because EDW.ID is undefined and thus numeric by default.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

Joining Together Tables in PROC SQL - sas

Related

Sql to proc sql conversion

SAS column name replacement scheme

Joining two sets in SAS SQL with table/correlation error

Multiple left join with same columns

Selecting Max Value from Left Join

Categories

Resources