issue with nested select statements in proc sql - sas

I was given this code to run, but I keep getting errors even after making sure the left and right brackets balance. This is the original code, since the brackets I added seemed to be in the wrong place.
proc sql;
create table all as
select distinct a.id, a.count, b.date
from (select distinct id, count (*) as count from (select distinct id, p_id, date2 from table1) group by id) a
(select distinct id, min(date2) as date format datetime. from table1) b
where a.id=b.id;
quit;
(select distinct id, min(date2) as date format datetime. from
-------- -
22 22
202 76
3520! table1) group by id) b
ERROR 22-322: Syntax error, expecting one of the following: ), ','.
ERROR 202-322: The option or parameter is not recognized and will be ignored.
ERROR 76-322: Syntax error, statement will be ignored.
Edit: after adding the comma, I get this error instead:
256 , (select id, min(date2) as date format datetime. from
256! table1) group by id ) b
-
22
76
ERROR 22-322: Syntax error, expecting one of the following: ;, !, !!, &, (, *, **, +, ',', -,
'.', /, <, <=, <>, =, >, >=, ?, AND, BETWEEN, CONTAINS, EQ, EQT, EXCEPT, GE, GET,
GT, GTT, HAVING, IN, INTERSECT, IS, LE, LET, LIKE, LT, LTT, NE, NET, NOT, NOTIN,
OR, ORDER, OUTER, UNION, ^, ^=, |, ||, ~, ~=.
ERROR 76-322: Syntax error, statement will be ignored.
257 Where a.id=b.id;
258 quit;

The error is not caused by the brackets but by a missing comma (,). You are missing a comma at the start of the fifth line, between the two subqueries in the FROM clause.
, (select distinct id, min(date2) as date format datetime. from table1) b
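EDIT: I have indented the original code with my comma fix. I copied your original code with the comma added, tested it with dummy data, and it works fine. Note, though, that the log for your new error shows "from table1) group by id ) b": the parenthesis after table1 closes the subquery early, leaving "group by id )" outside it, which is where SAS reports the syntax error. Moving the GROUP BY inside the parentheses, as in (select id, min(date2) as date format datetime. from table1 group by id) b, should fix it. If it does not, a hidden junk character in the code could also cause an error like this.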
data table1;
input id p_id date2 :yymmdd10.;
datalines;
1 1 2012-01-15
1 1 2012-01-15
2 1 2012-01-15
2 2 2012-01-15
4 1 2012-01-15
;;;;
run;
proc sql;
create table all as
select distinct a.id, a.count, b.date
from (select distinct id, count (*) as count
from (select distinct id, p_id, date2 from table1)
group by id
) a
, (select distinct id, min(date2) as date format datetime.
from table1
) b
where a.id=b.id;
quit;
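If, as the log after the edit suggests, the second subquery is meant to return the earliest date2 per id, its GROUP BY has to sit inside the parentheses. The sketch below repeats the dummy table1 so it runs on its own. Because the dummy date2 holds SAS date values (not datetimes), it uses DATE9. instead of DATETIME.; the output names want, n_rows and first_date are hypothetical renames chosen for illustration.

```sas
/* Same dummy data as above: date2 holds SAS date values */
data table1;
  input id p_id date2 :yymmdd10.;
  datalines;
1 1 2012-01-15
1 1 2012-01-15
2 1 2012-01-15
2 2 2012-01-15
4 1 2012-01-15
;
run;

proc sql;
  create table want as
  select a.id, a.n_rows, b.first_date
    from (select id, count(*) as n_rows            /* distinct (p_id, date2) rows per id */
            from (select distinct id, p_id, date2 from table1)
           group by id) as a
       , (select id, min(date2) as first_date format=date9.
            from table1
           group by id) as b                       /* GROUP BY inside the parentheses */
   where a.id = b.id;
quit;
```

Keep DATETIME. if your real date2 column stores datetime values.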

Related

Partitions in Proc SQL

Can we use partitions (PARTITION BY window functions) inside PROC SQL? If not, could you please help me with the equivalent logic?
Proc sql;
select user_id,
content_title,
calendar_date,
sum(watch_second)/3600 as hours_watched,
sum(hours_watched) over (partition by user_id, content_title order by calendar_date) AS cumulative_hours,
available_hours,
cumulative_hours / available_hours as pct_completed
from watch_history as a
inner join dim_user s
ON s.USER_ID = a.USER_ID
left join dim_content_meta pb
ON pb.metrics_video_id = a.metrics_video_id
inner join sascidm.coop_top50 v
on pb.hummus_show_id = v.hummus_show_id
where pb.hummus_playback_type in ('VOD')
AND pb.hummus_show_type = 'series'
order by 1,2,3;
quit;
Error:
sum(watch_second)/3600 as hours_watched,
29 sum(hours_watched) over (partition by user_id, content_title order by calendar_date) AS cumulative_hours,
____
22
76
ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, *, **, +, ',', -, /, <, <=, <>, =, >, >=, ?, AND, BETWEEN,
CONTAINS, EQ, EQT, GE, GET, GT, GTT, LE, LET, LIKE, LT, LTT, NE, NET, OR, ^=, |, ||, ~=.
ERROR 76-322: Syntax error, statement will be ignored.
No. PROC SQL supports a relatively limited SQL dialect that is close to the ANSI SQL of a few decades ago, and it does not support window functions such as SUM() OVER (PARTITION BY ...), which were added to the standard later.
For the most part, a SAS programmer would compute things like this with SAS procedures or a DATA step rather than SQL; PROC TABULATE could probably do this.
Generating a cumulative sum is simple in a DATA step: just use a sum statement (variable + expression), which retains its value across observations. To reset it for each group, use the FIRST. flags to detect when a new group starts.
Proc sql;
create table raw as
select
 a.user_id /* qualified: USER_ID is in both watch_history (a) and dim_user (s) */
,content_title
,calendar_date
,sum(watch_second)/3600 as hours_watched
/*
,sum(hours_watched) over (partition by user_id, content_title order by calendar_date) AS cumulative_hours
*/
,available_hours
/*
,cumulative_hours / available_hours as pct_completed
*/
from watch_history as a
inner join dim_user s
ON s.USER_ID = a.USER_ID
left join dim_content_meta pb
ON pb.metrics_video_id = a.metrics_video_id
inner join sascidm.coop_top50 v
on pb.hummus_show_id = v.hummus_show_id
where pb.hummus_playback_type in ('VOD')
AND pb.hummus_show_type = 'series'
/* one row per user/title/day, so SUM() is a daily total instead of a remerged grand total;
   assumes available_hours is constant within each title */
group by a.user_id, content_title, calendar_date, available_hours
order by 1,2,3
;
quit;
data want;
set raw ;
by user_id content_title calendar_date;
if first.content_title then cumulative_hours=0;
cumulative_hours + hours_watched;
pct_completed = cumulative_hours / available_hours;
run;
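If you would rather stay inside PROC SQL, a correlated subquery can emulate the running total: for each row it sums every row in the same user/title group up to and including that date. This is a swapped-in technique, not a window function, and it rescans the table for every row, so expect it to be much slower than the DATA step on large data. The daily table and its values are hypothetical stand-ins for RAW, assuming one row per user_id, content_title and calendar_date.

```sas
/* Hypothetical daily totals standing in for the RAW table
   (one row per user_id / content_title / calendar_date) */
data daily;
  input user_id content_title $ calendar_date :yymmdd10. hours_watched available_hours;
  format calendar_date yymmdd10.;
  datalines;
1 showA 2021-01-01 1 10
1 showA 2021-01-02 2 10
1 showB 2021-01-01 4 8
2 showA 2021-01-01 3 10
;
run;

proc sql;
  create table want_sql as
  select a.user_id, a.content_title, a.calendar_date, a.hours_watched,
         /* correlated subquery: this row plus all earlier rows in the same group */
         (select sum(b.hours_watched)
            from daily as b
           where b.user_id = a.user_id
             and b.content_title = a.content_title
             and b.calendar_date <= a.calendar_date) as cumulative_hours,
         calculated cumulative_hours / a.available_hours as pct_completed
    from daily as a
   order by a.user_id, a.content_title, a.calendar_date;
quit;
```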

Using the result of a query as a parameter for another query

I am trying to use a query result as a parameter for another query.
As below:
PROC SQL;
SELECT mydate INTO : varmydate FROM work.MyTable WHERE codigo = 1234;
QUIT;
PROC SQL;
SELECT * FROM work.MyOtherTable
WHERE codata = &varmydate;
QUIT;
But, unfortunately, this didn't work.
In this example, the first query assigns a date value to the varmydate macro variable.
96 PROC SQL;
97 SELECT codata FORMAT date9., valor FROM work.MyOtherTable WHERE codata = &varmydate;
NOTE: PROC SQL set option NOEXEC and will continue to check the syntax of statements.
NOTE: Line generated by the macro variable "VARMYDATE".
97 01MAR2020
_______
22
76
ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, *, **, +, -, /, <, <=, <>, =, >, >=, AND, EQ, EQT, GE, GET,
GROUP, GT, GTT, HAVING, LE, LET, LT, LTT, NE, NET, OR, ORDER, ^=, |, ||, ~=.
ERROR 76-322: Syntax error, statement will be ignored.
98 QUIT;
After executing the first query, the command below:
%PUT &varmydate;
will have the following result:
01MAR2020
But the second query returns an error.
If I use the syntax below:
PROC SQL;
SELECT * FROM work.MyOtherTable
WHERE codata = '01MAR2020'd;
QUIT;
then it will work.
You did not say what the LOG window showed when it didn't work.
I presume column mydate in table mytable is character and stores the DATE9. representation of a date. A numeric date column with a DATE9. format would produce the same text, because SELECT INTO applies the column's format; only an unformatted numeric date would give the macro variable a value such as 21975.
Either way, the macro variable holds text like 01MAR2020, so you need to resolve it inside a SAS date literal:
WHERE codata = "&varmydate"d;
Double quotes are needed because macro resolution will not occur within single quoted literals.
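A minimal end-to-end sketch, using small hypothetical versions of work.MyTable and work.MyOtherTable. Option 1 is the date-literal fix above; option 2 is an alternative not taken from the answer: the FORMAT= column modifier overrides the DATE9. format so the macro variable holds the raw day count, which can be compared to codata directly. TRIMMED strips any blanks from the stored value.

```sas
/* Hypothetical tables standing in for work.MyTable and work.MyOtherTable */
data work.MyTable;
  codigo = 1234;
  mydate = '01MAR2020'd;   /* numeric SAS date ...                       */
  format mydate date9.;    /* ... displayed, and copied by INTO, as text */
run;

data work.MyOtherTable;
  input codata :date9. valor;
  format codata date9.;
  datalines;
01MAR2020 10
02MAR2020 20
;
run;

proc sql noprint;
  /* Option 1: keep the formatted text */
  select mydate into :varmydate trimmed
    from work.MyTable
   where codigo = 1234;

  /* Option 2: override the format so the value is the number of days since 1960 */
  select mydate format=best12. into :varmydate_num trimmed
    from work.MyTable
   where codigo = 1234;
quit;

%put &=varmydate &=varmydate_num;  /* VARMYDATE=01MAR2020 VARMYDATE_NUM=21975 */

proc sql;
  create table work.match_literal as   /* text wrapped in a date literal */
  select codata, valor
    from work.MyOtherTable
   where codata = "&varmydate"d;

  create table work.match_number as    /* raw number compared directly */
  select codata, valor
    from work.MyOtherTable
   where codata = &varmydate_num;
quit;
```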

Replace ? in a column with an appropriate value in a SAS SQL query?

I have data with a column A containing the following values:
Column A
--------
1
2
?
2
I used the query:
proc sql;
select
if A= '?' then A=., count(*) as N_obs
from freq_sex_Partner
group by Number_of_sexual_partners;
quit;
This is not working. How can I replace the ? with a standard value?
In SQL you use a CASE expression, not IF/THEN.
proc sql;
select
case when a='?' then .
else a end as a, count(*) as N_obs
from freq_sex_Partner
group by Number_of_sexual_partners;
quit;
Or you could use an IFC() function as well.
proc sql;
select
ifc(a='?', ., a) as a, count(*) as N_obs
from freq_sex_Partner
group by Number_of_sexual_partners;
quit;
Column A contains "?", so it is character-valued. @Reeza's code should therefore use '' (a blank) instead of ., for example case when a='?' then '' else a end, or ifc(a='?', '', a). Also, if you do not select the grouping variable as well, the N_obs counts lose their context.
I suggest:
data have;
input a $ nsp ;
datalines;
1 2
2 3
? 7
2 7
;
run;
proc sql;
select
nsp
, case when a='?' then '' else a end as a
, count(*) as nsp_count
from have
group by nsp
;
quit;
The query will also log the message NOTE: The query requires remerging summary statistics back with the original data. as Proc SQL is performing an automatic remerge of group aggregates with individual rows within the group.
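If the goal is instead a numeric column in which ? becomes a standard missing value, one option (a swapped-in technique, not from the answers above) is to convert with the INPUT function; the ?? modifier suppresses the invalid-data notes for the ? rows. The counts table name and a_num alias are hypothetical.

```sas
data have;
  input a $ nsp;
  datalines;
1 2
2 3
? 7
2 7
;
run;

proc sql;
  create table counts as
  select input(a, ?? best12.) as a_num,  /* '?' becomes numeric missing (.) */
         count(*) as n_obs
    from have
   group by 1;                            /* group by the first select item */
quit;
```

Grouping on the converted value gives one count per distinct number, with the former ? rows counted under missing, and avoids the remerge note.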

Syntax error using CATX in SAS PROC SQL

I get a syntax error in SAS 9.4 when trying to use CATX("|", of a1-a5) in PROC SQL.
Why do the first two outputs work, but the third fails?
data test;
input a1 $ a2 $ a3 $ a4 $ a5 $;
cards;
a b c d e
f g h i j
k l m n o
p q r s t
u v w x y
;
run;
proc sql;
select CATX('|',a1,a2,a3,a4,a5) as catx from test;
quit;
data test2;
set test;
catx=CATX('|',OF a1-a5);
run;
proc print data=test2; run;
proc sql;
select CATX('|',OF a1-a5) as catx from test;
quit;
The first proc sql and the data step produce the expected "a|b|c|d|e", etc. But the third proc sql produces a syntax error pointed at the "a1":
32 proc sql;
33 select CATX('|',OF a1-a5) as catx from test;
--
22
ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, (, *, **, +, ',', -, '.', /, <, <=, <>, =, >, >=, ?, AND, BETWEEN,
CONTAINS, EQ, EQT, GE, GET, GT, GTT, LE, LET, LIKE, LT, LTT, NE, NET, OR, ^=, |, ||, ~=.
You have hit one of those walls in PROC SQL where some Base SAS syntax is not supported: CATX itself works, as your first query shows, but OF variable lists do not. As someone mentioned earlier, you're better off creating a macro variable that contains the columns you need for the concatenation.
Here is a quick example:
proc sql noprint;
select name into :cols separated by ','
from dictionary.columns
where libname = "WORK" and
memname = "TEST";
quit;
%put &cols;
proc sql;
select CATX('|',&cols) as catx from test;
quit;
Obviously your where clause will be more complex as your original data set may contain columns not required in the CATX expression.
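A sketch of such a tighter WHERE clause, assuming the columns to concatenate are exactly those whose names start with A (a hypothetical rule for this example). The extra id column shows that non-matching columns are left out, and ORDER BY varnum keeps the selected columns in data-set order.

```sas
data test;
  input a1 $ a2 $ a3 $ a4 $ a5 $ id;
  cards;
a b c d e 1
f g h i j 2
;
run;

proc sql noprint;
  select name into :cols separated by ','
    from dictionary.columns
   where libname = 'WORK'
     and memname = 'TEST'
     and upcase(name) like 'A%'   /* hypothetical filter: only the a1-a5 columns */
   order by varnum;               /* keep the columns in data-set order */
quit;

%put &=cols;

proc sql;
  create table test_catx as
  select id, catx('|', &cols) as catx
    from test;
quit;
```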
The OF a1-a5 variable-list syntax is valid only in DATA step code.
PROC SQL implements a simpler SQL dialect, and DATA step syntax cannot be mixed into it.

Extend SAS MACRO to multiple fields

I have a macro inspired by "PROC SQL by Example" that finds duplicate rows based on a single column/field:
data have ;
input name $ term $;
cards;
Joe 2000
Joe 2000
Joe 2002
Joe 2008
Sally 2001
Sally 2003
; run;
%MACRO DUPS(LIB, TABLE, GROUPBY) ;
PROC SQL ;
CREATE TABLE DUPROWS AS
SELECT &GROUPBY, COUNT(*) AS Duplicate_Rows
FROM &LIB..&TABLE
GROUP BY &GROUPBY
HAVING COUNT(*) > 1
ORDER BY Duplicate_Rows;
QUIT;
%MEND DUPS ;
%DUPS(WORK,have,name) ;
proc print data=duprows ; run;
I would like to extend this to look for duplicates based on multiple columns (Rows 1 and 2 in my example), but still be flexible enough to deal with a single column.
In this case it would run the code:
proc sql ;
create table duprows as select name,term,count(*) as Duplicate_Rows
from work.have
group by name,term
HAVING COUNT(*) > 1
;quit;
To produce a single row for the duplicated pair: name Joe, term 2000, Duplicate_Rows 2.
To include an arbitrary number of fields to group on, you can list them all in the groupby macro parameter, but the list must be comma-delimited and surrounded by %quote(). Otherwise SAS will see the commas and think you're providing more macro parameters.
So in your case, your macro call would be:
%dups(lib = work, table = have, groupby = %quote(name, term));
Since &groupby is included in the select and group by clauses, all fields listed will appear in the output and will be used for grouping. This is because when &groupby resolves, it becomes the text name, term.
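A self-contained sketch that repeats the sample data and the macro from the question and calls it both ways, with positional arguments as in the original call. The second call overwrites DUPROWS from the first.

```sas
data have;
  input name $ term $;
  cards;
Joe 2000
Joe 2000
Joe 2002
Joe 2008
Sally 2001
Sally 2003
;
run;

%macro dups(lib, table, groupby);
  proc sql;
    create table duprows as
    select &groupby, count(*) as Duplicate_Rows
      from &lib..&table
     group by &groupby
    having count(*) > 1
     order by Duplicate_Rows;
  quit;
%mend dups;

/* single column: no quoting needed */
%dups(work, have, name);

/* several columns: %quote() hides the comma from the macro parameter parser */
%dups(work, have, %quote(name, term));
```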