SAS allows a PROC SQL CREATE TABLE statement in which the table being created recursively references itself in the SELECT statement, e.g.:
proc sql;
create table t1 as
select
t1.id
,t2.val1
from
t1
inner join t2 on t1.id=t2.id
;
quit;
When a statement like this is executed, a warning message is written to the log.
WARNING: This CREATE TABLE statement recursively references the target table. A consequence of this is a possible data integrity problem.
This warning message can be suppressed by using the UNDO_POLICY=NONE option (see SAS Usage Note 12062).
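For reference, the option goes on the PROC SQL statement itself; e.g. the same query with the warning suppressed:
proc sql undo_policy=none;
create table t1 as
select t1.id, t2.val1
from t1
inner join t2 on t1.id=t2.id;
quit;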
Question:
Can creating a table in such a recursive manner potentially return unexpected results? Is it possible that it would produce different results than splitting the same operation into two steps:
proc sql;
create table _data_ as
select
t1.id
,t2.val1
from
t1
inner join t2 on t1.id=t2.id;
create table t1 as
select * from &syslast;
quit;
Is the two-step approach better/safer to use?
This should work fine if the tables being queried are SAS datasets. It is no worse than this simple data step.
data t1;
merge t1 t2;
by id;
run;
When SAS runs that type of step, it first creates a new physical file with the results; only after the step has finished does it delete the old t1.sas7bdat and rename the temporary file to t1.sas7bdat. If you do it with a PROC SQL statement, SAS will follow the same basic steps.
I believe that the warning is there because, if the tables being referenced were from an external database system (such as Oracle), SAS might push the query into the database and there it could cause trouble.
I have found that using the same table name as input and output for SAS proc sql can produce incorrect results. It works OK most of the time, but definitely not 100% of the time. Rather than suppress the warning, use a different output table name.
SAS has confessed to this: http://support.sas.com/kb/12/062.html
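A minimal sketch of that safer pattern, using the t1/t2 tables from the question (the WORK library and the t1_new name are assumptions for illustration):
proc sql;
create table t1_new as
select t1.id, t2.val1
from t1
inner join t2 on t1.id=t2.id;
quit;

/* swap the new table in under the old name */
proc datasets lib=work nolist;
delete t1;
change t1_new=t1;
quit;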
Related
There are many datasets contained in one library. How can I use SAS code to find out which dataset has the largest number of cases?
Suppose the library name is "SASHELP".
Thank you!
The SQL DICTIONARY.* family of tables gives access to all sorts of metadata. Be careful: some dictionary requests can cause a lot of activity as SAS collects the information requested.
From Docs:
How to View DICTIONARY Tables
…
DICTIONARY Tables and Performance
When you query a DICTIONARY table, SAS gathers information that is pertinent to that table. Depending on the DICTIONARY table that is being queried, this process can include searching libraries, opening tables, and executing SAS views. Unlike other SAS procedures and the DATA step, PROC SQL can improve this process by optimizing the query before the select process is launched. Therefore, although it is possible to access DICTIONARY table information with SAS procedures or the DATA step by using the SASHELP views, it is often more efficient to use PROC SQL instead.
…
Note: SAS does not maintain DICTIONARY table information between queries. Each query of a DICTIONARY table launches a new discovery process.
Example:
* Use dictionary.tables to get the names of the 10 tables with the largest row counts;
proc sql;
reset outobs=10;
create table top_10_datasets_by_rowcount as
select libname, memname, nobs
from dictionary.tables
where libname = 'SASHELP'
and memtype = 'DATA'
order by nobs descending
;
reset outobs=max;
quit;
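If only the single largest dataset is wanted, the same query can be limited with OUTOBS=1 (this variant is an illustration, not part of the original answer):
proc sql outobs=1;
select libname, memname, nobs
from dictionary.tables
where libname = 'SASHELP'
and memtype = 'DATA'
order by nobs descending;
quit;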
Sometimes I like to make an instant copy of a data set outside of the work library, so if development gets messy I always know where my backup copy of a critical data set is located.
Ex - Let's say I have a permanent library already established, named source.
I can use a data step in order to create a set (set_1) in two different spots.
data set_1 source.set_1;
set sashelp.cars;
run;
I do understand the SQL below (or even a COPY procedure, for that matter) would produce equivalent results to the data step above:
proc sql;
create table set_1 as
select distinct *
from sashelp.cars
;
create table source.set_1 as
select *
from set_1
;
quit;
I sound lazy here, but I am interested to know whether there is a method in PROC SQL by which I can create two tables from the same query, as in the data step example above.
You cannot. Stick to using data steps.
I am working in SAS Enterprise guide and have a one column SAS table that contains unique identifiers (id_list).
I want to filter another SAS table to contain only observations that can be found in id_list.
My code so far is:
proc sql noprint;
CREATE TABLE test AS
SELECT *
FROM data_sample
WHERE id IN id_list
quit;
This code gives me the following errors:
Error 22-322: Syntax error, expecting one of the following: (, SELECT.
What am I doing wrong?
Thanks up front for the help.
You can't just give it the table name. You need to write a subquery that specifies which variable you want it to read from ID_LIST.
CREATE TABLE test AS
SELECT *
FROM data_sample
WHERE id IN (select id from id_list)
;
You could use a join in PROC SQL, but it is probably simpler to use a merge in a data step with an in= dataset option.
data want;
merge oneColData(in = A) otherData(in = B);
by id_list;
if A;
run;
You merge the two datasets together, and then, using if A, you keep only the IDs that appear in the single-column dataset. For this to work you have to merge on id_list, which must be in both datasets, and both datasets must be sorted by id_list.
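A minimal sketch of the sorting this requires, using the dataset and variable names from the answer above:
proc sort data=oneColData; by id_list; run;
proc sort data=otherData; by id_list; run;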
The problem with using a data step instead of PROC SQL is that, for the data step, the dataset must be sorted on the variable used for the merge. If this is not yet the case, the complete dataset must be sorted first.
If I have a very large SAS dataset that is not sorted on the variable to be merged, I have to sort it first (which can take quite some time). If I use the subquery in PROC SQL, I can read the dataset selectively, so no sort is needed.
My bet is that PROC SQL is much faster for large datasets from which you want only a small subset.
I work with SAS on a relational database that I can access with a LIBNAME ODBC statement as below:
libname myDBMS odbc datasrc="myDBMS";
Say the database contains a table named 'myTable' with a numeric variable 'var_ex' whose values can be 0, 1, or . (missing). Now say I want to exclude all rows for which var_ex=1.
If I use the following:
DATA test1;
SET myDBMS.myTable; /* I call directly the table from the DBMS */
where var_ex NE 1;
run;
I don't get rows for which 'var_ex' is missing. Here is a screenshot of the log, with my actual data:
Whereas if I do the exact same thing after importing the table into the WORK library:
DATA myTable; /* I put myTable in the Work library */
SET myDBMS.myTable;
run;
DATA test2;
SET myTable; /* I call the table from the work */
where var_ex NE 1;
run;
I select rows for which 'var_ex' is 0 or missing, as intended. Here is a screenshot of the log, with my actual data:
The same happens if I use PROC SQL instead of a DATA step, or another NE-like operator.
I did some research and more or less understood here that unintended stuff like that can happen if you work directly on a DBMS table.
Does that mean it is simply not recommended to work directly with a DBMS table, and that one has to import the table locally, as below, before doing anything?
DATA myTable; /* I put myTable in the Work library */
SET myDBMS.myTable;
run;
Or is there a proper way to manipulate such tables?
The best way to test how SAS is translating the data step code into database code is through the sastrace system option. Before running code, try this:
options sastrace=',,,db' sastraceloc=saslog;
Then run your code tests. When you check the log, you will see precisely how SAS is translating the code (if it can at all). If it can't, you'll see,
ACCESS ENGINE: SQL statement was not passed to the DBMS, SAS will do the processing.
followed by a select * from table.
In general, if SAS cannot translate data step code into dbms-specific code, it will pull everything to locally manipulate the data. By viewing this output, you can determine precisely how to get the data step to translate into what you need.
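For instance (an illustration, not from the original answer): because the DBMS treats NULL comparisons differently from the way SAS treats missing values, making the condition explicit keeps the missing rows regardless of where the WHERE clause is processed, and the trace output will show whether it is passed to the database:
data test1;
set myDBMS.myTable;
where var_ex NE 1 or var_ex is missing;
run;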
If all else fails, you can use explicit SQL pass-through. The code in parentheses operates the same way as if you're running SQL directly from some other client.
proc sql;
connect to odbc(datasrc='source' user='username' pass='password');
create table want as
select * from connection to odbc
(<code specific to your dbms language>);
disconnect from odbc;
quit;
MY SITUATION
I hope all is well. I am currently undertaking a research project that deals with a lot of datasets placed under different libraries. I have created multiple %macro definitions, which in turn have generated many output tables and utilised many input tables. These tables are saved under different libraries.
MY ISSUE:
My computer slows down when these datafiles are created. Clearing unwanted tables in every macro program session speeds up computational response.
My QUERY:
Is there a way to generate a list of the input and output tables created, using either PROC SQL or DATA steps, by each macro program? Each macro program has multiple %macros, again for code readability purposes. Prefixing/suffixing the datafile names with 'IN' or 'OUT' does not help.
This would help me in data management.
You probably have two options: parsing the log file, or taking a snapshot of DICTIONARY.tables before and after every macro call and extracting the differences. I would prefer the (easier :) ) second option, e.g.:
PROC SQL;
CREATE TABLE BEFORE AS SELECT CATS(libname,".",memname) AS fname FROM DICTIONARY.tables WHERE libname in ('WORK');
QUIT;
PROC SQL;
CREATE TABLE AFTER AS SELECT CATS(libname,".",memname) AS fname FROM DICTIONARY.tables WHERE libname in ('WORK') AND calculated fname not in (select fname FROM BEFORE);
QUIT;
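The intended usage is to take the BEFORE snapshot, run the macro, then take the AFTER snapshot and inspect it; %my_macro is a hypothetical placeholder for one of your macro programs, and the two PROC SQL steps are the ones shown above:
PROC SQL;
CREATE TABLE BEFORE AS SELECT CATS(libname,".",memname) AS fname FROM DICTIONARY.tables WHERE libname in ('WORK');
QUIT;

%my_macro /* hypothetical: one of your macro programs */

PROC SQL;
CREATE TABLE AFTER AS SELECT CATS(libname,".",memname) AS fname FROM DICTIONARY.tables WHERE libname in ('WORK') AND calculated fname not in (select fname FROM BEFORE);
QUIT;

PROC PRINT DATA=AFTER; /* lists the datasets the macro created in WORK */
RUN;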