Difference between Create table r&<VAR> and &<VAR> - sas

I was asked to look into some SAS code today and was wondering what each of these meant. This is the first time I am looking at SAS code, so it might be a basic question. I checked the docs (and Google), but didn't find the right answer.
create table r&MAC as
create table &MAC as
The full code snippet
create table r&MAC as
select distinct
<COL LIST>
from all;
and
create table &MAC as
select *
from r&MAC;
Any idea what the r&MAC means?

The & is a trigger to the SAS Macro language that you want to reference a macro variable. So &MAC will be replaced by the value of the macro variable named MAC. So if you set MAC to table1
%let mac=table1;
then run this statement:
create table &MAC as
select *
from r&MAC;
The SAS Macro language processor will convert that to this SAS code that will then run.
create table table1 as
select *
from rtable1;
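A small sketch of the same substitution, for checking in the log how a reference resolves. The name mac and its value table1 come from the example above; the _copy suffix is only an illustration. It also shows the period that ends a macro variable name, which is needed when text follows the reference rather than precedes it.

```sas
%let mac=table1;

/* Text before the reference: r is literal, &mac is replaced. */
%put r&mac;        /* writes rtable1 to the log */

/* Text after the reference: the period marks the end of the name, */
/* otherwise SAS would look for a macro variable named mac_copy.   */
%put &mac._copy;   /* writes table1_copy to the log */
```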

Related

SAS - Choose last dataset in library that satisfies specific name convention

Say I have a library named mylib.
Within the mylib library, the following datasets are held:
mylib.data_yearly_2015
mylib.data_yearly_2016
mylib.data_yearly_2017
mylib.data_yearly_2018
mylib.data_yearly_2015
mylib.data_mtly_01JUN2015
mylib.data_mtly_01DEC2015
mylib.data_mtly_01JUN2016
mylib.data_mtly_01DEC2016
mylib.data_mtly_01JUN2017
mylib.data_mtly_01DEC2017
Now I need to write a macro that will specifically choose the latest data_mtly_xxxxxx table from the mylib library.
For example, in the current stage, it should choose mylib.data_mtly_01DEC2017
If, however, a new dataset gets added, for example mylib.data_mtly_01JUN2018, it would have to choose that table.
How can I go about doing this in SAS?
Get a list of all data sets
Get the date portion using SCAN() and INPUT()
Get max date.
proc sql noprint;
select max(input(scan(memname, -1, '_'), date9.)) into :latest_date
from sashelp.vtable
where upcase(libname) = 'MYLIB' and upcase(memname) like 'DATA_MTLY_%';
quit;
Now you should have the latest date value in a macro variable and can use that in your code.
%put &latest_date.;
If it looks like a number and not a date, you’ll need a format applied but you should be able to convert it using PUT().
Note: code is untested.
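A sketch of how the PUT() conversion might look, so that the macro variable holds the date in the same form as the table suffix. It assumes the table names are in the MEMNAME column of sashelp.vtable and that every DATA_MTLY_ table ends in a valid DATE9 value; the macro variable latest_table is a name introduced here for illustration. Like the answer above, it is untested against a real library.

```sas
proc sql noprint;
  /* Convert the maximum date back to text such as 01DEC2017. */
  select put(max(input(scan(memname, -1, '_'), date9.)), date9.)
    into :latest_date trimmed
  from sashelp.vtable
  where libname = 'MYLIB'
    and memname like 'DATA_MTLY_%';
quit;

/* Hypothetical helper variable holding the full table name. */
%let latest_table = mylib.data_mtly_&latest_date.;
%put &latest_table.;
```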

Filter a SAS dataset to contain only identifiers given in a list

I am working in SAS Enterprise guide and have a one column SAS table that contains unique identifiers (id_list).
I want to filter another SAS table to contain only observations that can be found in id_list.
My code so far is:
proc sql noprint;
CREATE TABLE test AS
SELECT *
FROM data_sample
WHERE id IN id_list
quit;
This code gives me the following errors:
Error 22-322: Syntax error, expecting one of the following: (, SELECT.
What am I doing wrong?
Thanks up front for the help.
You can't just give it the table name. You need to make a subquery that includes what variable you want it to read from ID_LIST.
CREATE TABLE test AS
SELECT *
FROM data_sample
WHERE id IN (select id from id_list)
;
You could use a join in proc sql, but it is probably simpler to use a merge in a data step with the in= data set option.
data want;
merge oneColData(in = A) otherData(in = B);
by id;
if A and B;
run;
You merge the two datasets together, and by using if A and B you keep only the observations whose id appears in both the single-column dataset and the dataset being filtered. For this to work, the by variable (id here) must be in both datasets, and both datasets must be sorted by it.
The problem with using a data step instead of PROC SQL is that the data step requires the data set to be sorted on the variable used for the merge. If this is not yet the case, the complete data set must be sorted first.
If I have a very large SAS data set that is not sorted on the merge variable, I have to sort it first (which can take quite some time). If I use the subquery in PROC SQL, I can read the data set selectively, so no sort is needed.
My bet is that PROC SQL is much faster for large data sets from which you want only a small subset.
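For completeness, the PROC SQL join mentioned above could be sketched as follows. It assumes, as in the question, that id_list is a table with a single column named id holding unique values; if id_list contained duplicates, the join would repeat matching rows, which the subquery form does not.

```sas
proc sql;
  create table test as
  select d.*                      /* keep only the columns of data_sample */
  from data_sample as d
       inner join id_list as l
       on d.id = l.id;            /* rows without a match are dropped */
quit;
```

No prior sort is needed for either the join or the subquery.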

SAS Warning: CREATE TABLE statement recursively references the target table

SAS allows a proc sql create table statement in which the table being created references itself in the select statement, e.g.:
proc sql;
create table t1 as
select
t1.id
,t2.val1
from
t1
inner join t2 on t1.id=t2.id
;
quit;
When a statement like this is executed a warning message is written to the log.
WARNING: This CREATE TABLE statement recursively references the target table. A consequence of this is a possible data integrity problem.
This warning message could be suppressed by using undo_policy=none option. (see SAS Usage Note 12062)
Question:
Can creating a table in such a recursive manner potentially return unexpected results? Is it possible that it would produce different results than splitting the same operation into 2 steps:
proc sql;
create table _data_ as
select
t1.id
,t2.val1
from
t1
inner join t2 on t1.id=t2.id;
create table t1 as
select * from &syslast;
quit;
Is the two step approach better/safer to use?
This should work fine if the tables being queried are SAS datasets. It is no worse than this simple data step.
data t1;
merge t1 t2;
by id;
run;
When SAS runs that type of step it will first create a new physical file with the results, and only after the step has finished will it delete the old t1.sas7bdat and rename the temporary file to t1.sas7bdat. If you do it with a PROC SQL statement, SAS will follow the same basic steps.
I believe that the warning is there because if the tables being referenced were from an external database system (such as Oracle) then SAS might push the query into the database and there it could cause trouble.
I have found that using the same table name as input and output for SAS proc sql can produce incorrect results. It works OK most of the time, but definitely not 100% of the time. Rather than suppress the warning, use a different output table name.
SAS has confessed to this: http://support.sas.com/kb/12/062.html
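One way to follow the advice to use a different output table name, while still ending up with a table called t1, might look like this. The name t1_new is made up for the example, and the sketch assumes both tables live in the WORK library.

```sas
proc sql;
  /* Write the result to a new name, so no recursive reference occurs. */
  create table t1_new as
  select t1.id
        ,t2.val1
  from t1
       inner join t2 on t1.id = t2.id;
quit;

/* Swap the new table in place of the old one. */
proc datasets library=work nolist;
  delete t1;
  change t1_new=t1;
quit;
```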

Writing a macro in SAS to create a table

Very new to SAS Programming. Want to start with something simple - writing a macro that runs an append query. This is all I have managed to figure out. Where am I going wrong?
%MACRO APPENDTEST;
PROC SQL;
CREATE TABLE WORK.APPENDTEST AS
SELECT *
FROM WORK.MONTHLY_SALES_SUMMARY
QUIT;
%MEND APPENDTEST;
You've created a macro but have not executed it. This functionality, similar to a function in other languages, allows a macro to be compiled and executed at different times.
Adding in the following line will call the macro.
%appendtest;
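Put together, a complete version might look like the sketch below. Besides the call, it adds a semicolon after the FROM clause, which the snippet in the question does not have; without it, QUIT would most likely be read as part of the query rather than as the statement that ends PROC SQL.

```sas
%macro appendtest;
  proc sql;
    create table work.appendtest as
    select *
    from work.monthly_sales_summary;   /* semicolon ends the CREATE TABLE statement */
  quit;
%mend appendtest;

/* Defining the macro only compiles it; this line runs it. */
%appendtest;
```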

Limiting results in PROC SQL

I am trying to use PROC SQL to query a DB2 table with hundreds of millions of records. During the development stage, I want to run my query on an arbitrarily small subset of those records (say, 1000). I've tried using INOBS to limit the observations, but I believe that this parameter is simply limiting the number of records which SAS is processing. I want SAS to only fetch an arbitrary number of records from the database (and then process all of them).
If I were writing a SQL query myself, I would simply use SELECT * FROM x FETCH FIRST 1000 ROWS ONLY ... (the equivalent of SELECT TOP 1000 * FROM x in SQL Server). But PROC SQL doesn't seem to have any option like this. It's taking an extremely long time to fetch the records.
The question: How can I instruct SAS to arbitrarily limit the number of records to return from the database.
I've read that PROC SQL uses ANSI SQL, which doesn't have any specification for a row limiting keyword. Perhaps SAS didn't feel like making the effort to translate its SQL syntax to vendor-specific keywords? Is there no work around?
Have you tried using the outobs option in your proc sql?
For example,
proc sql outobs=10;
create table test as
select * from schema.HUGE_TABLE
order by n;
quit;
Alternatively, you can use SQL passthrough to write a query using DB2 syntax (FETCH FIRST 10 ROWS ONLY), although this requires you to store all your data in the database, at least temporarily.
Passthrough looks something like this:
proc sql;
connect to db2 (user=&userid. password=&userpw. database=MY_DB);
create table test as
select * from connection to db2 (
select * from schema.HUGE_TABLE
order by n
FETCH FIRST 10 ROWS ONLY
);
quit;
It requires more syntax and can't access your sas datasets, so if outobs works for you, I would recommend that.
When SAS is talking to a database via SAS syntax, part of the query can be translated to DBMS language equivalent - this is called implicit pass through. The rest of the query is "post-processed" by SAS to produce final result.
Depending on SAS version, DBMS vendor and DBMS version, and in some cases even some connection/libname options, different parts of SAS syntax are translatable/considered compatible between SAS and DBMS and thus sent to be performed by DBMS instead of SAS.
As for the SAS SQL options INOBS and OUTOBS: I've worked a lot with MS SQL and Oracle via different versions of SAS, but I have never seen those translated to TOP xxx type queries, so this is probably not supported yet, although it should be quite doable when the query touches only DBMS data (no joins to SAS data etc).
So I think you're left with the so-called explicit pass-through - specific SAS SQL syntax to connect to the database. This type of query looks like this:
proc sql;
connect to oracle as db1 (user=user1 pw=pasw1 path=DB1);
create table test_table as
select *
from connection to db1
( /* here we're in oracle */
select * from test.table1 where rownum <20
)
;
disconnect from db1;
quit;
In SAS 9.3 the syntax can be simplified - if there's already a LIBNAME connection, you can reuse it for explicit pass-through:
LIBNAME ORALIB ORACLE user=...;
PROC SQL;
connect to oracle using ORALIB;
create table work.test_table as
select *
from connection to ORALIB (
....
When connecting through a LIBNAME, be sure to set the READBUFF option (I usually use 5000 or so) for reads, or INSERTBUFF (1000 or more) when loading data into the database.
To see if implicit pass-through takes place, set sastrace option:
option sastrace=',,,ds' sastraceloc=saslog nostsuffix;
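A sketch of how the trace might be used around a single query. The libref ORALIB is the one from the LIBNAME example above; table1 and the id filter are placeholders, not real objects. With tracing on, the log should show the SQL text that SAS actually sent to the database, which can be compared with the query as written.

```sas
/* Turn on tracing of the SQL sent to the DBMS and route it to the SAS log. */
options sastrace=',,,d' sastraceloc=saslog nostsuffix;

proc sql;
  create table work.sample as
  select *
  from oralib.table1     /* placeholder table name */
  where id < 100;        /* placeholder filter */
quit;

/* Turn tracing off again so later steps do not flood the log. */
options sastrace=off;
```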