What exactly is the use of SQL in SAS?

I have just started studying SAS and am a little bit confused. This link shows a query against a DATA SET. I thought it would be like connecting to an external database and sending a query to it.
So is the DATA SET the database, and is the SQL syntax just another way of processing data in the DATA SET?
Also, can you recommend a better tutorial? A free, open-source tutorial, book, or other source would be much better.
I'm still learning, and I would appreciate any opinion, answer, or recommendation.
I use SAS University Edition in a virtual machine on my computer.

So is the DATA SET the database, and is the SQL syntax just another way of processing data in the DATA SET?
A DATA SET is a table (not the database), and yes, SQL is another way.
You can think of the native SAS library engine, V9, as the database. For example:
libname mydata 'c:\projectx\sasdata';
is the same as
libname mydata V9 'c:\projectx\sasdata';
In general:
libname mydata <engine> 'c:\projectx\sasdata';
libname mydata <engine> <options for connection parameters>;
V9 is the default engine used when the libname statement does not specify one. There are different engines for connecting to almost any remote (non-SAS) database, data file, or data provider; they let a SAS coder write SAS code without having to learn the language or dialect of the remote environment.
A rough mapping of SAS structure concepts to database concepts:
V9 engine ~ "database"
local folder ~ schema, instance, or catalog
data set ~ table
variable ~ column
observation ~ row
You can learn more about engines by searching the help system for "SAS Engines" and "How Engines Work with SAS Files".
PROC SQL lets you code using SQL. A coder can choose the best language for themselves and for the problem at hand, be it SQL, DATA steps, or PROC steps.

Do not confuse SQL (the query language) with MySQL, PostgreSQL, SQLite, or any other database technology.
PROC SQL is an alternative to the DATA step.
Mostly you can do the same things with both, but one may perform better in certain situations or allow easier/shorter syntax than the other.
The data set you use has nothing to do with the language you use to "query" it.
Look into the LIBNAME statement to connect to external databases.

As someone said before, do not confuse SQL (the query language) with a DATA SET (the name for a table in SAS).
Here is an example that produces the same result with DATA step syntax and with PROC SQL syntax:
With a DATA step:
DATA myNewTable;
SET myTable;
WHERE id = 123;
RUN;
With PROC SQL syntax:
PROC SQL;
CREATE TABLE myNewTable AS
SELECT * FROM myTable
WHERE id = 123;
QUIT;
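As a rough Python analogy (not SAS code), the two styles above can be mimicked with a row-by-row loop standing in for the DATA step and an in-memory SQLite query standing in for PROC SQL. The table name my_table, its columns, and the sample rows are hypothetical; the point is only that both routes select the same rows.

```python
import sqlite3

# Hypothetical sample rows standing in for the SAS data set myTable: (id, label).
rows = [(123, "a"), (456, "b"), (123, "c")]


def data_step_style(table):
    """Procedural filter: read each row and keep it when id = 123 (like SET + WHERE)."""
    return [row for row in table if row[0] == 123]


def sql_style(table):
    """Declarative filter: the same subset expressed as one SQL query."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE my_table (id INTEGER, label TEXT)")
    con.executemany("INSERT INTO my_table VALUES (?, ?)", table)
    # ORDER BY rowid keeps insertion order so the two results are comparable.
    result = con.execute(
        "SELECT id, label FROM my_table WHERE id = 123 ORDER BY rowid"
    ).fetchall()
    con.close()
    return result


print(data_step_style(rows))  # [(123, 'a'), (123, 'c')]
print(sql_style(rows))        # [(123, 'a'), (123, 'c')]
```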
Hope it makes sense.

Related

Get server info for all librefs

How can I get a table with variables libref and server_id (or any server info) for all libraries available to me in SAS?
My goal is to get a summary of where the data is physically located for all these libraries, in order to write efficient queries when fetching data from different servers.
Look at what information is available in the view SASHELP.VLIBNAM (or DICTIONARY.LIBNAMES when using PROC SQL).
Here is a utility macro that pulls the engine, host, and schema from that view for a given libref. I have used it with the TERADATA, ORACLE, and ODBC engines: dblibchk.sas
From Tom's code and advice, I built the table I needed with this code:
PROC SQL;
SELECT distinct libname, engine, path,
CASE WHEN engine in('BASE','V9') THEN 'SAS' ELSE catx('_',engine,path) END AS server
FROM DICTIONARY.LIBNAMES ;
QUIT;
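The CASE expression above can be sketched in Python to show what it computes for each row. This is an illustration only: the sample rows are hypothetical and do not reflect what DICTIONARY.LIBNAMES actually reports for a given engine, and the helper only approximates CATX (strip each argument, skip blank ones, join with the delimiter).

```python
def server_label(engine, path):
    """Mimic the CASE expression: native SAS engines map to 'SAS', others to engine_path."""
    if engine in ("BASE", "V9"):
        return "SAS"
    # Approximation of catx('_', engine, path): strip blanks, drop empty parts, join with '_'.
    parts = [p.strip() for p in (engine, path) if p and p.strip()]
    return "_".join(parts)


# Hypothetical (libname, engine, path) rows; real DICTIONARY.LIBNAMES output will differ.
libnames = [
    ("WORK", "V9", "/saswork/td123"),
    ("SALES", "ODBC", " sales_dsn "),
    ("SALES", "ODBC", " sales_dsn "),
]


def summarize(rows):
    """Return distinct (libname, engine, path, server) rows, like SELECT DISTINCT."""
    seen = []
    for libname, engine, path in rows:
        record = (libname, engine, path, server_label(engine, path))
        if record not in seen:
            seen.append(record)
    return seen


for record in summarize(libnames):
    print(record)
```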
There are a few tables in the SASHELP library that can help you, as Tom mentioned.
You can also use VTABLE, which lists all the tables in each library, and VCOLUMN, which goes from library to table to column and includes each column's data type and length.
They work a bit like the information_schema views in SQL databases.
Alternatively, running PROC CONTENTS on a dataset will return all of its components, and you can save that output to a table or a macro variable.
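For readers coming from other databases, the metadata-view idea can be seen in SQLite, which exposes its catalog through sqlite_master and PRAGMA table_info. This is a Python analogy to SASHELP.VTABLE / VCOLUMN, not SAS code; the teachers and schools tables are made up.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE teachers (id INTEGER, last_name TEXT, hire_date INTEGER)")
con.execute("CREATE TABLE schools (school_id INTEGER, name TEXT)")

# Table-level metadata, roughly what SASHELP.VTABLE / DICTIONARY.TABLES provide.
tables = [
    r[0]
    for r in con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# Column-level metadata (name, declared type), roughly what SASHELP.VCOLUMN provides.
# PRAGMA rows are (cid, name, type, notnull, dflt_value, pk); table names come
# from sqlite_master above, so formatting them into the statement is safe here.
columns = {
    t: [(row[1], row[2]) for row in con.execute(f"PRAGMA table_info({t})")]
    for t in tables
}

print(tables)               # ['schools', 'teachers']
print(columns["teachers"])  # [('id', 'INTEGER'), ('last_name', 'TEXT'), ('hire_date', 'INTEGER')]
```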
Hope this helps!

Processing large data sets in SAS

I am looking for solutions or ideas on how to speed up the processing of large data sets in SAS.
What would you recommend?
Which is better, the DATA step or PROC SQL?
Speeding up your data processing depends on where your data is saved.
Your data can be either in:
a SAS table, or
a database table (Microsoft SQL Server, Oracle, DB2, MySQL, etc.).
Use the SAS DATA step when:
you are querying/processing SAS tables, or
you want to do iterative processing (e.g., retaining values or using arrays).
Use PROC SQL when:
you are querying a large database table,
you can do an SQL "pass-through", where you send SQL code to be executed on the database server and only the output is sent to SAS (instead of bringing entire tables through the network to SAS and then filtering them), or
you want to query SAS tables but prefer SQL joins to DATA step merges.
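The "iterative processing" versus "set-based query" distinction can be sketched in Python: a loop that carries a running total from row to row, roughly what a DATA step sum statement does, next to a single SQL aggregate run in SQLite. The sales data and function names are hypothetical, and this is an analogy rather than SAS code.

```python
import sqlite3

# Hypothetical daily sales amounts, in processing order.
amounts = [10, 5, 20, 5]


def running_total(values):
    """DATA step style: carry a value across rows, like the sum statement
    total + amount; which retains total from one observation to the next."""
    total = 0
    out = []
    for v in values:
        total += v
        out.append(total)
    return out


def grand_total_sql(values):
    """PROC SQL style: one set-based aggregate computed by the query engine."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (amount INTEGER)")
    con.executemany("INSERT INTO sales VALUES (?)", [(v,) for v in values])
    (total,) = con.execute("SELECT SUM(amount) FROM sales").fetchone()
    con.close()
    return total


print(running_total(amounts))    # [10, 15, 35, 40]
print(grand_total_sql(amounts))  # 40
```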
Another topic you should consider is efficient programming, where you optimize your queries and lookups.
I find PROC SQL to be better for my use cases. We may need more specifics on the size and variety of the data you're trying to join/export, etc.
Give us some info on that and we can try to help.
Tips:
Limit the fields you're pulling over.
Subset the data.
Anecdotally, from my experience, PROC SQL seems faster.
Here are two tips on speeding up queries with PROC SQL:
In general, you want to rule out as much data as possible when querying. If you are using PROC SQL, the order of the restrictions in the WHERE clause matters: put the most restrictive parts first.
For example, if I'm querying a database for teachers with the last name "JONES" who were hired after Jan 2005, I would structure my WHERE clause like this:
where last_name = 'JONES' and hire_date > 200501
I would do this because the last name is likely to exclude more records than the hire date restriction.
When possible, don't use SELECT *; instead, list the specific columns you need. Remember, even if you use a column in a calculation or in the WHERE clause, you don't have to include that column in your SELECT list.
Here is a very useful resource for understanding how to use PROC SQL efficiently. I recommend reading it in its entirety if you do a lot of work with large data sets in SAS.
http://www2.sas.com/proceedings/sugi29/127-29.pdf
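The two PROC SQL tips (filter early, select only the columns you need) can be illustrated with a small SQLite example in Python. It assumes, as in the answer's example, that hire_date is stored as a YYYYMM number; the table, columns, and rows are hypothetical. Note that last_name and hire_date appear only in the WHERE clause, not in the SELECT list, and the wide notes column is never fetched.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE teachers (id INTEGER, last_name TEXT, hire_date INTEGER, notes TEXT)"
)
con.executemany(
    "INSERT INTO teachers VALUES (?, ?, ?, ?)",
    [
        (1, "JONES", 200406, "x" * 1000),  # hired before Jan 2005: filtered out
        (2, "JONES", 200703, "y" * 1000),  # matches both conditions
        (3, "SMITH", 200901, "z" * 1000),  # wrong last name: filtered out
    ],
)

# Only the id column is returned; the filter columns stay in the WHERE clause.
ids = con.execute(
    "SELECT id FROM teachers WHERE last_name = 'JONES' AND hire_date > 200501"
).fetchall()
print(ids)  # [(2,)]
```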

SAS: Select rows from a relational database

I work with SAS on a relational database that I can access with a LIBNAME ODBC statement, as below:
libname myDBMS odbc datasrc="myDBMS";
Say the database contains a table named 'myTable' with a numeric variable 'var_ex' whose values can be 0, 1, or . (missing). Now say I want to exclude all rows for which var_ex = 1.
If I use the following :
DATA test1;
SET myDBMS.myTable; /* I call directly the table from the DBMS */
where var_ex NE 1;
run;
I don't get rows for which 'var_ex' is missing.
Whereas if I do the exact same thing after importing the table into the Work library:
DATA myTable; /* I put myTable in the Work library */
SET myDBMS.myTable;
run;
DATA test2;
SET myTable; /* I call the table from the work */
where var_ex NE 1;
run;
I select rows for which 'var_ex' is 0 or missing, as intended.
The same happens if I use PROC SQL instead of a DATA step, or another not-equal operator such as ^=.
I did some research and more or less understood that unintended behavior like this can happen if you work directly on a DBMS table.
Does that mean it is simply not recommended to work with a DBMS table directly, and that one has to import the table locally, as below, before doing anything?
DATA myTable; /* I put myTable in the Work library */
SET myDBMS.myTable;
run;
Or is there a proper way to manipulate such tables ?
The best way to see how SAS translates DATA step code into database code is the SASTRACE system option. Before running code, try this:
options sastrace=',,,db' sastraceloc=saslog;
Then run your code tests. When you check the log, you will see precisely how SAS translates the code (if it can at all). If it can't, you'll see
ACCESS ENGINE: SQL statement was not passed to the DBMS, SAS will do the processing.
followed by a select * from table.
In general, if SAS cannot translate DATA step code into DBMS-specific code, it will pull all the data into SAS and process it locally. By viewing this output, you can work out how to write the DATA step so that it translates into what you need.
If all else fails, you can use explicit SQL pass-through. The code in parentheses runs exactly as if you were running SQL directly from another client.
proc sql;
connect to odbc(datasrc='source' user='username' pass='password');
create table want as
select * from connection to odbc
(<code specific to your dbms language>);
disconnect from odbc;
quit;
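A likely explanation for the missing rows, consistent with what the question describes, is that the WHERE clause is passed to the DBMS, where SQL three-valued logic applies: NULL <> 1 is unknown, so the row is dropped, whereas SAS treats a missing numeric as a value that is simply not equal to 1. The SQLite sketch below (Python, hypothetical data) shows the difference and the usual fix of spelling out the NULL case, which in SAS would look something like where var_ex ne 1 or var_ex is missing;. Checking the SASTRACE output is still the way to confirm what SAS actually sent.

```python
import sqlite3

# Hypothetical rows: var_ex is 0, 1, or missing (NULL in the database, . in SAS).
values = [0, 1, None, 0, None]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE my_table (var_ex INTEGER)")
con.executemany("INSERT INTO my_table VALUES (?)", [(v,) for v in values])

# In SQL, NULL <> 1 evaluates to UNKNOWN, so the NULL rows are dropped.
dbms_style = con.execute(
    "SELECT COUNT(*) FROM my_table WHERE var_ex <> 1"
).fetchone()[0]

# Spelling out the missing case keeps the NULL rows on the database side too.
explicit = con.execute(
    "SELECT COUNT(*) FROM my_table WHERE var_ex <> 1 OR var_ex IS NULL"
).fetchone()[0]

# SAS semantics: a missing numeric is an ordinary value that is not equal to 1.
sas_style = sum(1 for v in values if v != 1)

print(dbms_style, explicit, sas_style)  # 2 4 4
```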

Limiting results in PROC SQL

I am trying to use PROC SQL to query a DB2 table with hundreds of millions of records. During the development stage, I want to run my query on an arbitrarily small subset of those records (say, 1000). I've tried using INOBS to limit the observations, but I believe this option simply limits the number of records SAS processes. I want SAS to fetch only an arbitrary number of records from the database (and then process all of them).
If I were writing a SQL query myself, I would simply use SELECT * FROM x FETCH FIRST 1000 ROWS ONLY ... (the equivalent of SELECT TOP 1000 * FROM x in SQL Server). But PROC SQL doesn't seem to have any option like this. It's taking an extremely long time to fetch the records.
The question: how can I instruct SAS to limit the number of records returned from the database?
I've read that PROC SQL uses ANSI SQL, which doesn't have any specification for a row limiting keyword. Perhaps SAS didn't feel like making the effort to translate its SQL syntax to vendor-specific keywords? Is there no work around?
Have you tried using the OUTOBS= option in your PROC SQL?
For example,
proc sql outobs=10;
create table test as
select * from schema.HUGE_TABLE
order by n;
quit;
Alternatively, you can use SQL pass-through to write a query using DB2 syntax (FETCH FIRST 10 ROWS ONLY), although this requires all your data to be stored in the database, at least temporarily.
Pass-through looks something like this:
proc sql;
connect to db2 (user=&userid. password=&userpw. database=MY_DB);
create table test as
select * from connection to db2 (
select * from schema.HUGE_TABLE
order by n
FETCH FIRST 10 ROWS ONLY
);
quit;
It requires more syntax and can't access your sas datasets, so if outobs works for you, I would recommend that.
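The difference the question is after, limiting rows inside the database rather than after they reach the client, can be sketched with SQLite in Python. SQLite's LIMIT stands in for DB2's FETCH FIRST n ROWS ONLY; the table and row counts are hypothetical, and this says nothing about how OUTOBS= is implemented internally.

```python
import sqlite3

# Hypothetical stand-in for schema.HUGE_TABLE.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE huge_table (n INTEGER)")
con.executemany("INSERT INTO huge_table VALUES (?)", ((i,) for i in range(100_000)))


def client_side_limit(limit):
    """Fetch every row, then keep the first `limit`: all rows reach the client."""
    rows = con.execute("SELECT n FROM huge_table ORDER BY n").fetchall()
    return rows[:limit], len(rows)


def server_side_limit(limit):
    """Let the database stop early (LIMIT here, FETCH FIRST n ROWS ONLY in DB2)."""
    rows = con.execute(
        "SELECT n FROM huge_table ORDER BY n LIMIT ?", (limit,)
    ).fetchall()
    return rows, len(rows)


print(client_side_limit(10)[1])  # 100000 rows fetched to keep 10
print(server_side_limit(10)[1])  # 10 rows fetched
```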
When SAS talks to a database via SAS syntax, part of the query can be translated into the DBMS language equivalent; this is called implicit pass-through. The rest of the query is "post-processed" by SAS to produce the final result.
Depending on the SAS version, the DBMS vendor and version, and in some cases even some connection/LIBNAME options, different parts of SAS syntax are translatable (considered compatible between SAS and the DBMS) and are thus sent to the DBMS to be performed there instead of in SAS.
As for the SAS SQL options INOBS and OUTOBS: I've worked a lot with MS SQL and Oracle via different versions of SAS, but I have never seen them translated into TOP xxx-type queries, so this is probably not supported yet, although when a query touches only DBMS data (no joins to SAS data, etc.) it should be quite doable.
So I think you're left with so-called explicit pass-through: specific SAS SQL syntax that connects to the database directly. These queries look like this:
proc sql;
connect to oracle as db1 (user=user1 pw=pasw1 path=DB1);
create table test_table as
select *
from connection to db1
( /* here we're in oracle */
select * from test.table1 where rownum <20
)
;
disconnect from db1;
quit;
In SAS 9.3 and later, the syntax can be simplified: if there's already a LIBNAME connection, you can reuse it for explicit pass-through:
LIBNAME ORALIB ORACLE user=...;
PROC SQL;
connect using ORALIB;
create table work.test_table as
select *
from connection to ORALIB (
....
When connecting through a LIBNAME, be sure to use the READBUFF= option when reading (I usually set it to about 5000) or INSERTBUFF= (1000 or more) when loading the database.
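READBUFF= controls how many rows SAS reads from the DBMS per fetch. The same idea in Python's DB-API is cursor.fetchmany(size); the sketch below uses an in-memory SQLite table, so it shows the batching pattern only, not the network savings you would see against a remote database. The table t and its size are hypothetical.

```python
import sqlite3

# Hypothetical table with 12,345 rows.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (n INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", ((i,) for i in range(12_345)))


def read_in_batches(batch_size):
    """Pull rows in fixed-size batches instead of one at a time.

    Returns (number_of_batches, total_rows_read).
    """
    cur = con.execute("SELECT n FROM t")
    batches = 0
    total = 0
    while True:
        chunk = cur.fetchmany(batch_size)
        if not chunk:  # an empty list means the result set is exhausted
            break
        batches += 1
        total += len(chunk)
    return batches, total


print(read_in_batches(5000))  # (3, 12345)
```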
To see if implicit pass-through takes place, set sastrace option:
option sastrace=',,,ds' sastraceloc=saslog nostsuffix;

How can I read a SAS dataset?

I have a lot of files in SAS format, and I'd like to be able to read them in programs outside of SAS. I don't have anything except the base SAS system installed. I could manually convert each one, but I'd like a way to do it automatically.
You'll need to have a running SAS session to act as a data server. You can then access the SAS data using ODBC, see the SAS ODBC drivers guide.
To get the local SAS ODBC server running, you need to:
Define your SAS ODBC server setup as described in the SAS ODBC Drivers guide. In the example that follows, I'll connect to a server set up with the name "loclodbc".
Add an entry in your services file, (C:\WINDOWS\system32\drivers\etc\services), like this:
loclodbc 9191/tcp
...Set the port number (here, 9191) so that it fits your local setup. The service name "loclodbc" must match the server name defined in the ODBC setup. Note that the term "server" here has nothing to do with the physical host name of your PC.
Your SAS ODBC server is now ready to run, but it has no data resources assigned. Normally you would set these on the "Libraries" tab of the SAS ODBC setup, but since you want to point to data sources "on the fly", we omit this step.
From your client application you can now connect to the SAS ODBC server, point to the data resources you want to access, and fetch the data.
The way SAS points to data resources is through the concept of the "LIBNAME". A libname is a logical pointer to a collection of data.
Thus
LIBNAME sasadhoc 'C:\sasdatafolder';
assigns the folder "C:\sasdatafolder" the logical handle "sasadhoc".
If, from within SAS, you want to access the data in the SAS data table file "C:\sasdatafolder\test.sas7bdat", you would do something like this:
LIBNAME sasadhoc 'C:\sasdatafolder';
PROC SQL;
CREATE TABLE WORK.test as
SELECT *
FROM sasadhoc.test
;
QUIT;
So what we need to do is tell our SAS ODBC server, from our client application, to assign a libname to C:\sasdatafolder. We can do this by sending it this resource allocation request on startup, using the DBCONINIT parameter.
I've made some sample code for doing this. My sample code is also written in the Base SAS language. Since there are obviously cleverer ways to access SAS data than SAS connecting to SAS via ODBC, this code serves only as an example.
You should be able to take the useful bits and create your own solution in the programming environment you're using...
SAS ODBC connection sample code:
PROC SQL;
CONNECT TO ODBC(DSN=loclodbc DBCONINIT="libname sasadhoc 'c:\sasdatafolder'");
CREATE TABLE temp_sas AS
SELECT * FROM CONNECTION TO ODBC(SELECT * FROM sasadhoc.test);
QUIT;
The magic happens in the "CONNECT TO ODBC..." part of the code, assigning a libname to the folder where the needed data resides.
There's now a Python package that lets you read .sas7bdat files, or convert them to CSV if you prefer:
https://pypi.python.org/pypi/sas7bdat
You could make a SAS-to-CSV conversion program.
Save the following in sas_to_csv.sas:
proc export data=&sysparm
outfile=stdout dbms=csv;
run;
Then, assuming you want to access libname.dataset, call this program as follows:
sas sas_to_csv -noterminal -sysparm "libname.dataset"
The SAS data is converted to CSV that can be piped into Python. In Python, it would be easy enough to generate the "libname.dataset" parameters programmatically.
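On the Python side, the command from the answer could be wrapped with subprocess and the CSV parsed with the csv module. This is a sketch under assumptions: sas is on the PATH, sas_to_csv.sas sits in the working directory, and the program really writes CSV with a header row to standard output as described; the function names are hypothetical.

```python
import csv
import io
import subprocess


def build_command(dataset):
    """Build the command line from the answer (assumes 'sas' is on the PATH
    and sas_to_csv.sas is in the current directory)."""
    return ["sas", "sas_to_csv", "-noterminal", "-sysparm", dataset]


def parse_csv_output(text):
    """Turn CSV text (header row first) into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def read_dataset(dataset):
    """Run the SAS program for one libname.dataset and parse what it prints."""
    result = subprocess.run(
        build_command(dataset), capture_output=True, text=True, check=True
    )
    return parse_csv_output(result.stdout)
```

For example, looping over a list of names such as ["mylib.a", "mylib.b"] (hypothetical) and calling read_dataset on each would cover the "generate the parameters programmatically" step. Note that all values come back as strings, since CSV carries no type information.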
I have never tried http://www.oview.co.uk/dsread/, but it might be what you're looking for: "a simple command-line utility for working with datasets in the SAS7BDAT file format." But do note "This software should be considered experimental and is not guaranteed to be accurate. You use it at your own risk. It will only work on uncompressed Windows-format SAS7BDAT files for now. "
I think you might be able to use ADO.
See the SAS support site for more details.
Disclaimer:
I haven't looked at this for a while
I'm not 100% sure that this doesn't require additional licensing
I'm not sure if you can do this using Python