Why does erroneous inner query not make outer query erroneous - sas

Help me understand why an erroneous inner query does not make the outer query erroneous.
The following query returns 19:
proc sql;
select count(distinct name)
from sashelp.class
where name in (select name from sashelp.iris
where species is not missing)
;quit; *returns 19;
However, I would expect it to return an error, because the inner query fails when run on its own (the column 'name' is not found in sashelp.iris):
proc sql;
select name from sashelp.iris
where species is not missing
;quit; *returns an error (column not found);
Can someone explain why I am not getting an error message in the first instance?

You did not qualify the reference to name, so PROC SQL resolved it to the only variable called name that it could find, which is the one in sashelp.class from the outer query. So you effectively ran this query:
proc sql;
select count(distinct A.name)
from sashelp.class A
where A.name in
(select A.name
from sashelp.iris B
where B.species is not missing
)
;
quit;
If you actually refer to NAME from IRIS, you will get the error message:
220 proc sql;
221 select count(distinct A.name)
222 from sashelp.class A
223 where A.name in
224 (select B.name
225 from sashelp.iris B
226 where B.species is not missing
227 )
228 ;
ERROR: Column name could not be found in the table/view identified with the correlation name B.
ERROR: Unresolved reference to table/correlation name B.
229 quit;
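One way to see that the subquery is correlated with the outer table is to make its WHERE clause match no rows. The sketch below assumes that no row in sashelp.iris has the made-up species value 'NoSuchSpecies'. Under that assumption the subquery returns nothing for every outer row, so the count should come back as 0 instead of 19.

```sas
/* Sketch: the unqualified NAME inside the subquery resolves to the outer
   table (sashelp.class), which makes the subquery correlated. */
proc sql;
  /* 'NoSuchSpecies' is a hypothetical value that is assumed to match no
     iris rows, so the subquery is empty and nothing passes the IN test. */
  select count(distinct name) as n_names
    from sashelp.class
    where name in (select name
                     from sashelp.iris
                     where species = 'NoSuchSpecies');
quit;
```

Giving each table an alias and qualifying every column reference turns this kind of silent resolution into an explicit error.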

Related

Joining Together Tables in PROC SQL

I want to join two tables that I created into one table, but I am getting an error that says Column OverallStudentReport.ID was found in more than one table in the same scope. I would appreciate help fixing this error, or a better way to join these two tables into one.
The code below created my first table:
PROC SQL;
Create table SemesterReport1 as select coalesce(A.ID,B.ID,C.ID,D.ID,E.ID) as ID,
coalesce(A.Year,B.Year,C.Year,D.Year,E.Year) as Year, coalesce(A.Term,B.Term,C.Term,D.Term,E.Term) as Term,
SemesterGPA.SemGPA, AccumulativeGPA.GPAAccum,
CreditHoursEarnedSemester.CreditHoursEarnedSemester,
GradedCreditHoursEarnedSemester.GradedCreditHoursEarnedSemester,
ClassStanding.ClassStanding
from SemesterGPA as A
full join AccumulativeGPA as B on A.ID=B.ID and A.Year=B.Year and A.Term=B.Term
full join CreditHoursEarnedSemester as C on A.ID=C.ID and A.Year=C.Year and A.Term=C.Term
full join GradedCreditHoursEarnedSemester as D on A.ID=D.ID and A.Year=D.Year and A.Term=D.Term
full join ClassStanding as E on A.ID=E.ID and A.Year=E.Year and A.Term=E.Term
order by ID, Year, Term
;
quit;
The code below created my second table:
PROC SQL;
Create table OverallStudentReport as select coalesce(A.ID,B.ID,C.ID,D.ID,E.ID) as ID,
OverallGPA.TotalGPA,
OverallCreditHoursEarned.OverallCreditHoursEarned,
OverallGradedCreditHoursEarned.OverallGradedCreditHoursEarned,
RepeatClasses.RepeatClasses,
GradeCounts.ACount,GradeCounts.BCount,GradeCounts.CCount,GradeCounts.DCount,
GradeCounts.ECount, GradeCounts.WCount
from OverallGPA as A
full join OverallCreditHoursEarned as B on A.ID=B.ID
full join OverallGradedCreditHoursEarned as C on A.ID=C.ID
full join RepeatClasses as D on A.ID=D.ID
full join GradeCounts as E on A.ID=E.ID
order by ID
;
quit;
The code below is supposed to join the two tables created above, but it produces an error:
PROC SQL;
Create table Report1 as select *
from SemesterReport1, OverallStudentReport
full join
OverallStudentReport
on SemesterReport1.ID=OverallStudentReport.ID
order by ID
;
quit;
Here is my log:
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
72
73 PROC SQL;
74 Create table Report1 as select *
75 from SemesterReport1, OverallStudentReport
76 full join
77 OverallStudentReport
78 on SemesterReport1.ID=OverallStudentReport.ID
79 order by ID
80 ;
ERROR: Column OverallStudentReport.ID was found in more than one table in the same scope.
WARNING: Column named ID is duplicated in a select expression (or a view). Explicit references to it will be to the first one.
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
81 quit;
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SQL used (Total process time):
When you assign table aliases, you should use them consistently throughout the query, not just selectively in SELECT and JOIN. Also, the column references in ORDER BY are ambiguous. Since you need the calculated columns from SELECT, use the calculated keyword.
By the way, see Bad Habits to Kick: Using table aliases like (a, b, c) or (t1, t2, t3). Instead, use more informative shorthand aliases that align with the original table names. Consider the following adjustments:
PROC SQL;
create table SemesterReport1 as
select coalesce(s.ID, a.ID, ch.ID, g.ID, cs.ID) as Final_ID
, coalesce(s.Year, a.Year, ch.Year, g.Year, cs.Year) as Final_Year
, coalesce(s.Term, a.Term, ch.Term, g.Term, cs.Term) as Final_Term
, s.SemGPA
, a.GPAAccum
, ch.CreditHoursEarnedSemester
, g.GradedCreditHoursEarnedSemester
, cs.ClassStanding
from SemesterGPA as s
full join AccumulativeGPA as a
on s.ID = a.ID
and s.Year = a.Year
and s.Term = a.Term
full join CreditHoursEarnedSemester as ch
on s.ID = ch.ID
and s.Year = ch.Year
and s.Term = ch.Term
full join GradedCreditHoursEarnedSemester as g
on s.ID = g.ID
and s.Year = g.Year
and s.Term = g.Term
full join ClassStanding as cs
on s.ID = cs.ID
and s.Year = cs.Year
and s.Term = cs.Term
order by calculated Final_ID
, calculated Final_Year
, calculated Final_Term;
quit;
PROC SQL;
create table OverallStudentReport as
select coalesce(og.ID, och.ID, ogch.ID, r.ID, gc.ID) as Final_ID
, og.TotalGPA
, och.OverallCreditHoursEarned
, ogch.OverallGradedCreditHoursEarned
, r.RepeatClasses
, gc.ACount
, gc.BCount
, gc.CCount
, gc.DCount
, gc.ECount
, gc.WCount
from OverallGPA as og
full join OverallCreditHoursEarned as och
on og.ID = och.ID
full join OverallGradedCreditHoursEarned as ogch
on og.ID = ogch.ID
full join RepeatClasses as r
on og.ID = r.ID
full join GradeCounts as gc
on og.ID = gc.ID
order by calculated Final_ID;
quit;
Then, in the final query, do not list the table OverallStudentReport twice. You should also qualify the ID (here Final_ID) in ORDER BY. And see another habit to kick: Why is SELECT * considered harmful?
PROC SQL;
create table Report1 as
select smr.Final_ID as ID
, smr.Final_Year as Year
, smr.Final_Term as Term
, smr.SemGPA
, smr.GPAAccum
, smr.CreditHoursEarnedSemester
, smr.GradedCreditHoursEarnedSemester
, smr.ClassStanding
, osr.Final_ID
, osr.TotalGPA
, osr.OverallCreditHoursEarned
, osr.OverallGradedCreditHoursEarned
, osr.RepeatClasses
, osr.ACount
, osr.BCount
, osr.CCount
, osr.DCount
, osr.ECount
, osr.WCount
from SemesterReport1 smr
full join OverallStudentReport osr
on smr.Final_ID = osr.Final_ID
order by smr.Final_ID ;
quit;

SAS Array creation

I am trying to create an array of macro variables, each holding a value.
proc sql noprint;
select count(*) into :dscnt from study;
select libname into :libname1 - :libname&dscnt from study;
quit;
I think the syntax is correct, but I keep getting the following error message in SAS Studio.
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
NOTE: Line generated by the macro variable "DSCNT".
79 libname 4
_
22
200
ERROR 22-322: Syntax error, expecting one of the following: ',', FROM, NOTRIM.
ERROR 200-322: The symbol is not recognized and will be ignored.
Can someone explain to me what I am doing wrong?
Thanks
You do not need to know the number of items ahead of time. If you leave the upper bound of the range blank, SAS will automatically create the correct number of macro variables.
If you do want to use that number elsewhere you can create it using the TRIMMED option to remove any extra spaces. See the second example below.
proc sql noprint;
select name into :name1- from sashelp.class;
quit;
%put &name1;
%put &name19.;
proc sql noprint;
select count(distinct name) into :name_count TRIMMED from sashelp.class;
quit;
%put &name_count;
Results:
3068 proc sql noprint;
3069 select name into :name1- from sashelp.class;
3070 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
3071
3072 %put &name1;
Alfred
3073 %put &name19.;
William
3074
3075 proc sql noprint;
3076 select count(distinct name) into :name_count TRIMMED from
3076! sashelp.class;
3077 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
3078
3079 %put &name_count;
19
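If the count is only needed after the macro variables have been created, a separate counting query may not be necessary. The sketch below relies on the automatic macro variable SQLOBS, which PROC SQL sets to the number of rows processed by the last statement. The name name_count is reused from the example above, and the assumption is that SQLOBS is copied before any other SQL statement runs.

```sas
proc sql noprint;
  /* Open-ended range: SAS creates name1, name2, ... as needed. */
  select name into :name1- from sashelp.class;
quit;

/* Copy SQLOBS right away; the next SQL statement would overwrite it. */
%let name_count = &sqlobs;

/* &&name&name_count resolves to the last macro variable in the series. */
%put Created &name_count macro variables, the last one is &&name&name_count;
```

Because %let stores the value without leading blanks, the copy can be used directly in names such as name&name_count.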
The into syntax in proc sql stores formatted values into macro variables. For example if you run this code:
proc sql noprint;
select count(*) into :dscnt from sashelp.class;
quit;
%put #&dscnt#;
You will see the output is:
# 19#
In other words, the result is left-padded with spaces. This means that in your example the code resolves to something like:
select libname into :libname1 - :libname 19 from study;
which is invalid syntax. To fix this, add the TRIMMED keyword to your SQL statement:
select count(*) into :dscnt TRIMMED from study;
Thanks to Reeza for the TRIMMED keyword.
Do something like the following:
proc sql noprint;
select count(*) into :dscnt from sashelp.class;
select name into :name1 - :name%left(&dscnt) from sashelp.class;
quit;

SAS: Create Variable from Proc SQL to use in Macro

I want to count the number of unique items in a variable (call it "categories") then use that count to set the number of iterations in a SAS macro (i.e., I'd rather not hard code the number of iterations).
I can get a count like this:
proc sql;
select count(*)
from (select DISTINCT categories from myData);
quit;
I can run a macro like this:
%macro superFreq;
%do i=1 %to &iterationVariable;
proc freq data=myData;
table var&i / out=var&i.freq;
run;
%end;
%mend superFreq;
%superFreq
I want to know how to get the count into the iteration variable so that the macro iterates as many times as there are unique values in the variable "categories".
Sorry if this is confusing. Happy to clarify if need be. Thanks in advance.
You can achieve this by using the into clause in proc sql:
proc sql noprint;
select max(age),
max(height),
max(weight)
into :max_age,
:max_height,
:max_weight
from sashelp.class;
quit;
%put &=max_age &=max_height &=max_weight;
Result:
MAX_AGE= 16 MAX_HEIGHT= 72 MAX_WEIGHT= 150
You can also select a list of results into a macro variable by combining the into clause with the separated by clause:
proc sql noprint;
select name into :list_of_names separated by ' ' from sashelp.class;
quit;
%put &=list_of_names;
Result:
LIST_OF_NAMES=Alfred Alice Barbara Carol Henry James Jane Janet Jeffrey John Joyce Judy Louise Mary Philip Robert Ronald Thomas
William
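Applied to the question, the same INTO clause can feed the loop bound directly. The following is a minimal sketch that reuses the names myData, categories, var&i and iterationVariable from the question. The output dataset name var&i.freq is an assumption about what was intended there. Note that count(distinct ...) ignores missing values, whereas counting the rows of a SELECT DISTINCT subquery treats missing as one more category.

```sas
proc sql noprint;
  /* TRIMMED strips the leading blanks that INTO would otherwise keep. */
  select count(distinct categories)
    into :iterationVariable trimmed
    from myData;
quit;

%macro superFreq;
  %do i = 1 %to &iterationVariable;
    /* Assumes myData contains variables var1, var2, ... as in the question. */
    proc freq data=myData;
      table var&i / out=var&i.freq;
    run;
  %end;
%mend superFreq;

%superFreq
```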

Joining two sets in SAS SQL with table/correlation error

I am trying to join two datasets. The first dataset1 has two columns item and price. The second dataset2 has three columns - item, customerid, and qty. I need to only include the unique rows from dataset1 that are not in dataset2. While trying to implement this code, I get the error:
Error: Unresolved reference to table/correlation name i.
I am unsure how to fix this error, thanks.
PROC SQL;
create table a as
select *
from dataset1 as i
except corr
select *
from dataset2 as p
where i.item = p.item;
describe table a;
QUIT;
EXCEPT is used to select records in the first set that do not exist in the second set. So if what you want is, to quote you, select records from dataset1 that do not appear in dataset2, you don't need the where clause:
PROC SQL;
create table a as
select *
from dataset1 as i
except corr
select *
from dataset2 as p
;
QUIT;
If however, like that where clause would suggest, you actually want to select records from dataset1 where the value of item is not found in dataset2, you could do this
proc sql;
select *
from dataset1 i
where not exists (select *
from dataset2 p
where i.item=p.item
)
;
quit;
EDIT: following your latest comment, and if you reaaaally need your query to feature an except, this should get you your result
proc sql;
create table a as
select t1.*
from dataset1 t1
inner join (select *
from dataset1 as i
except corr
select *
from dataset2 as p
) t2
on t1.item=t2.item
;
quit;
Even though this will do the same as the query above (with not exists) or, now that I think of it (stupid me), as this:
proc sql;
create table a as
select *
from dataset1
where item not in (select distinct item from dataset2)
;
quit;
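To try the NOT EXISTS variant on something concrete, here is a small sketch with made-up sample data. The column names come from the question, while the rows themselves are hypothetical. With these rows, only the pear row should remain in table a, because apple and plum both appear in dataset2.

```sas
/* Hypothetical sample data using the column names from the question. */
data dataset1;
  input item $ price;
  datalines;
apple 1.50
pear 2.00
plum 0.75
;
run;

data dataset2;
  input item $ customerid qty;
  datalines;
apple 101 3
plum 102 1
;
run;

proc sql;
  /* Keep dataset1 rows whose item never appears in dataset2. */
  create table a as
    select i.*
      from dataset1 as i
      where not exists (select 1
                          from dataset2 as p
                          where p.item = i.item);
quit;
```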

Column Sorting with PROC Transpose in SAS

I am using the code below to sort the columns dynamically after PROC TRANSPOSE. I have gone through a lot of solutions for this problem, but now I am getting an error when I run:
data work.AB ;
input name $ class $ dt $ gpa $;
datalines;
JOHN 1 201607 C-
JOHN 1 201608 C+
JOHN 1 201702 B-
JOHN 2 201608 A
NICK 1 201608 A
NICK 1 201707 A
MIKE 2 201608 B
MIKE 2 201607 B
MIKE 2 201707 B+
MIKE 2 201702 B
BOB 3 201702 D
BOB 3 201607 C
BOB 3 201707 C
;
proc sort data=work.AB;
by NAME ClASS dt;
run;
PROC TRANSPOSE DATA = AB OUT = ABC(drop=_name_) ;
BY nAME cLASS;
VAR GPA;
ID dt;
RUN ;
proc sql ;
create table test as
select name into : list separated by ' '
from dictionary.columns
where libname='WORK' and memname='ABC'
order by input(substr(name,anydigit(name)),best32.)
;
quit;
%put &list;
data want;
retain &list;
set ABC;
run;
The error that I get is:
22 GOPTIONS ACCESSIBLE;
WARNING: Apparent symbolic reference LIST not resolved.
23 %put &list;
&list
24 data want;
25 retain &list;
_
22
200
WARNING: Apparent symbolic reference LIST not resolved.
ERROR 22-322: Syntax error, expecting one of the following: a name, ;, _ALL_, _CHARACTER_, _CHAR_, _NUMERIC_.
ERROR 200-322: The symbol is not recognized and will be ignored.
26 set ABC;
27 run;
Kindly suggest.
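You cannot put the values from the same SELECT statement into both a dataset and macro variables. Remove the CREATE TABLE TEST AS clause from your SQL code.
You also might want to suppress some of the warnings by changing the query as follows. The CASE expression sorts column names that contain no digits (here NAME and CLASS) first, the ? modifier on the informat suppresses invalid-data messages, and the first :list is there only to absorb the sort key column before it is overwritten with the space-separated list of names: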
proc sql noprint ;
select case when (not anydigit(name)) then -1
else input(substr(name,anydigit(name)),?32.)
end as order
, name
into :list
, :list separated by ' '
from dictionary.columns
where libname='WORK' and memname='ABC'
order by 1
;
quit;
%put &list;