How can I read a SAS dataset? - sas

I have a lot of files in SAS format, and I'd like to be able to read them in programs outside of SAS. I don't have anything except the base SAS system installed. I could manually convert each one, but I'd like a way to do it automatically.

You'll need to have a running SAS session to act as a data server. You can then access the SAS data using ODBC; see the SAS ODBC Drivers Guide.
To get the local SAS ODBC server running, you need to:
Define your SAS ODBC server setup as described in the SAS ODBC Drivers Guide. In the example that follows, I'll connect to a server that is set up with the name "loclodbc".
Add an entry to your services file (C:\WINDOWS\system32\drivers\etc\services), like this:
loclodbc 9191/tcp
...set the port number (here: 9191) to fit your local setup. The name of the service, "loclodbc", must match the server name as defined in the ODBC setup. Note that the term "server" here has nothing to do with the physical host name of your PC.
Your SAS ODBC server is now ready to run, but it has no data resources assigned. Normally you would set these in the "Libraries" tab during the SAS ODBC setup, but since you want to point to data sources "on the fly", we omit this.
From your client application you can now connect to the SAS ODBC server, point to the data resources you want to access, and fetch the data.
The way SAS points to data resources is through the concept of the "LIBNAME". A libname is a logical pointer to a collection of data.
Thus
LIBNAME sasadhoc 'C:\sasdatafolder';
assigns the folder "C:\sasdatafolder" the logical handle "sasadhoc".
If, from within SAS, you wanted to access the data residing in the SAS data table file "C:\sasdatafolder\test.sas7bdat", you would do something like this:
LIBNAME sasadhoc 'C:\sasdatafolder';
PROC SQL;
CREATE TABLE WORK.test as
SELECT *
FROM sasadhoc.test
;
QUIT;
So what we need to do is tell our SAS ODBC server to assign a libname to C:\sasdatafolder from our client application. We can do this by sending it a resource allocation request at start-up, using the DBCONINIT parameter.
I've made some sample code for doing this. My sample code is also written in the Base SAS language. Since there are obviously cleverer ways to access SAS data than SAS connecting to SAS via ODBC, this code only serves as an example.
You should be able to take the useful bits and create your own solution in the programming environment you're using...
SAS ODBC connection sample code:
PROC SQL;
CONNECT TO ODBC(DSN=loclodbc DBCONINIT="libname sasadhoc 'c:\sasdatafolder'");
CREATE TABLE temp_sas AS
SELECT * FROM CONNECTION TO ODBC(SELECT * FROM sasadhoc.test);
QUIT;
The magic happens in the "CONNECT TO ODBC..." part of the code, assigning a libname to the folder where the needed data resides.

There's now a Python package that will allow you to read .sas7bdat files, or convert them to CSV if you prefer:
https://pypi.python.org/pypi/sas7bdat

You could make a SAS-to-CSV conversion program.
Save the following in sas_to_csv.sas:
proc export data=&sysparm
outfile=stdout dbms=csv;
run;
Then, assuming you want to access libname.dataset, call this program as follows:
sas sas_to_csv -noterminal -sysparm "libname.dataset"
The SAS data is converted to CSV that can be piped into Python. In Python, it would be easy enough to generate the "libname.dataset" parameters programmatically.
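If piping stdout is awkward in your setup, a variant of the same idea writes the CSV to a physical file instead. This is only a sketch: it assumes the libref named in &sysparm is already assigned (for example in an autoexec), and the /tmp path and program name are placeholders of mine.
/* sas_to_csv_file.sas - hypothetical variant that writes to a named file     */
/* invoke as: sas sas_to_csv_file -noterminal -sysparm "libname.dataset"      */
proc export data=&sysparm
    outfile="/tmp/&sysparm..csv"  /* resolves to /tmp/libname.dataset.csv */
    dbms=csv
    replace;
run;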

I have never tried http://www.oview.co.uk/dsread/, but it might be what you're looking for: "a simple command-line utility for working with datasets in the SAS7BDAT file format." But do note "This software should be considered experimental and is not guaranteed to be accurate. You use it at your own risk. It will only work on uncompressed Windows-format SAS7BDAT files for now. "

I think you might be able to use ADO.
See the SAS support site for more details.
Disclaimer:
I haven't looked at this for a while
I'm not 100% sure that this doesn't require additional licensing
I'm not sure if you can do this using Python

Related

What exactly is the use of SQL in SAS?

I have just started studying SAS and am a little bit confused. This link here shows a query against a DATA SET. I thought it would be like connecting to an external DATABASE and running a query against that DATABASE.
So is the DATA SET the database, and is the SQL syntax just another way of processing data in the DATA SET?
Also, can you recommend a better tutorial? A free, open-source tutorial/book/source would be much better.
Well, I'm still learning and I will appreciate any opinion/answer/recommendation.
I use SAS University Edition in a virtual machine on my computer.
So is the DATA SET the database, and is the SQL syntax just another way of processing data in the DATA SET?
A DATA SET is the table (not the database), and yes, SQL is another way.
You can think of the native SAS library engine, V9, as the database. For example:
libname mydata 'c:\projectx\sasdata'; is the same as
libname mydata V9 'c:\projectx\sasdata';
libname mydata <engine> 'c:\projectx\sasdata';
libname mydata <engine> <options for connection parameters>;
V9 is the default engine used when the LIBNAME statement does not specify one. There are different engines for connecting to almost any remote (non-SAS) database, data file or data provider, letting a SAS coder code in SAS without having to learn the language or dialect of the remote environment.
A rough mapping of SAS structure concepts to data base concepts:
V9 engine ~ "data base"
local folder ~ schema, instance, or catalog
data set ~ table
variable ~ column
observation ~ row
You can learn more about engines by searching the help system for "SAS Engines" and "How Engines Work with SAS Files".
PROC SQL lets you code using SQL. A coder can choose the best language for themselves and for the problem at hand, be it SQL or DATA steps and PROC steps.
Do not confuse SQL (the query language) with MySQL, PostgreSQL, SQLite or any other database technology.
proc sql is an alternative to the data step.
Mostly you can do the same with both, but one might perform better in a given situation or allow for easier/shorter syntax than the other.
The dataset you use has nothing to do with the language you use to "query" it.
Look into the LIBNAME statement to connect to external databases.
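For instance, with the appropriate SAS/ACCESS engine licensed, a libname can point straight at a database. A sketch, where the connection values and table name are placeholders:
/* requires SAS/ACCESS Interface to Oracle - all values below are examples */
libname mydb oracle user=myuser password=mypass path=mydbserver schema=myschema;

proc sql;
   /* query the remote table as if it were a SAS dataset */
   select count(*) from mydb.customers;
quit;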
As someone said before, do not confuse SQL (the query language) with a data set (the name for a table in SAS).
Here is an example producing the same result with DATA step syntax and with PROC SQL syntax:
With a DATA step:
DATA myNewTable;
SET myTable;
WHERE id = 123;
RUN;
With PROC SQL syntax:
PROC SQL;
CREATE TABLE myNewTable AS
SELECT * FROM myTable
WHERE id = 123;
QUIT;
Hope it makes sense.

Read a sas7bdat file in SAS Studio

I've scoured the internet but cannot seem to figure this out. My question is: if I have a sas7bdat file, how can I read it in SAS Studio so that I can work with it?
I've tried:
libname test 'C:\Users\name\Downloads\test.sas7bdat';
which gives me the error that library TEST does not exist. And if I try the following, I know that I need an INPUT statement, which I can't write unless I can see into the file:
DATA test;
INFILE 'C:\Users\lees162\Downloads\test.sas7bdat';
RUN;
Is there something I'm missing?
Librefs that you create via the LIBNAME statement point to directories, not individual files.
libname test 'C:\Users\name\Downloads\';
INFILE is for reading raw data files. To reference an existing SAS dataset you use a SET statement (or a MERGE, MODIFY, or UPDATE statement).
set test.test ;
Note that you can skip defining a libref and just use the quoted physical name in the SET statement.
DATA test;
set 'C:\Users\lees162\Downloads\test.sas7bdat';
RUN;
Of course, using C:\ paths like these assumes that you are using SAS Studio to point to a full SAS installation running on your PC. If you are using SAS University Edition, then SAS is running in a virtual machine and you will need to put the SAS dataset into a folder that is mapped to the virtual machine, and then reference it in the SAS code with the name that the virtual machine uses for that directory.
So something like:
DATA test;
set '/folders/myfolders/test.sas7bdat';
RUN;
LIBNAME just points to the location; once you have done that, you can use that libname, followed by a period and the dataset name, in your SET statement:
libname test "C:\Users\name\Downloads";
DATA test;
set test.test;
RUN;
One possible reason could be that you are using SAS University Edition (it does not support library locations outside its shared folders).
From one of the SAS community Q/A:
"When you are using the SAS University Edition, any libraries that you create must be assigned to a shared folder. You access your shared folder with this pathname: /folders/myfolders/. Always use '/' in the directory path, even in Windows operating environments"
After setting the directory address, proceed as instructed by Tom above in one of the answers.
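A minimal sketch under that assumption, after copying the sas7bdat file into the shared folder:
/* the file is assumed to sit at /folders/myfolders/test.sas7bdat */
libname test '/folders/myfolders';

data work.test;
   set test.test;   /* libref.dataset */
run;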
Suppose you have the SAS dataset at location C:\Users\name\Downloads\test.sas7bdat.
libname download 'C:\Users\name\Downloads';
proc sql;
select * from download.test;
quit;
You can read your dataset like a table using PROC SQL if you want to query it, but if you want to modify the existing dataset then you can use the data step as mentioned by #krian.

Get server info for all librefs

How can I get a table with variables libref and server_id (or any server info) for all libraries available to me in SAS?
My goal is to get a summary of where the data is physically located for all these libraries, in order to write efficient queries when fetching data from different servers.
Look at what information is available in the view SASHELP.VLIBNAM (or DICTIONARY.LIBNAMES when using PROC SQL).
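A quick way to see what that view offers is to describe it and then pull a few columns; a sketch:
proc sql;
   /* list every column the dictionary view exposes */
   describe table dictionary.libnames;
   /* then look at a few fields of interest for all assigned librefs */
   select libname, engine, path
      from dictionary.libnames;
quit;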
Here is a utility macro that pulls the engine, host and schema from that view for a given libref. I have used it for TERADATA, ORACLE and ODBC engines. dblibchk.sas
From Tom's code and advice, I built the table I needed with this code:
PROC SQL;
SELECT distinct libname, engine, path,
CASE WHEN engine in('BASE','V9') THEN 'SAS' ELSE catx('_',engine,path) END AS server
FROM DICTIONARY.LIBNAMES ;
QUIT;
There are a few tables in the SASHELP library that can help you, like Tom mentioned.
You can also use VTABLE, which lists all the tables and the library each one belongs to, and VCOLUMN, which goes from library to table to column, including each column's data type and length.
They work a bit like the information_schema views in SQL databases.
Alternatively, running PROC CONTENTS on a dataset will also return all of its components, and you can put that into a table or a macro variable.
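For example, a sketch of sending PROC CONTENTS output to a dataset (sashelp.class is just a stand-in):
/* write the column metadata to a dataset instead of the listing */
proc contents data=sashelp.class out=work.class_info noprint;
run;

proc print data=work.class_info;
   var libname memname name type length;
run;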
Hope this helps!

SAS File <lib>.<dataset>.DATA does not exist but proc datasets shows dataset

I'm trying to move a SAS dataset over to our Linux server from a client. They created it on SAS 9.4, 64-bit on Windows 7. I'm using SAS 9.4, 64-bit on Linux.
If I do
proc datasets library=din;
run;
I get the following in my log
Libref DIN
Engine V9
Physical Name /sasUsr/DM/DATA/SAS_DATA/201510_SSI
Filename /sasUsr/DM/DATA/SAS_DATA/201510_SSI
Inode Number 46358529
Access Permission rwxrwxr-x
Owner Name cvandenb
File Size (bytes) 4096
Member File
# Name Type Size Last Modified
1 SAMPLE_FROM_SSI DATA 131072 09/14/2015 17:07:01
2 TEST DATA 131072 09/15/2015 09:35:59
15 run;
but when I do
data test;
set din.sample_from_SSI;
run;
I get
18 data test;
19 set din.sample_from_SSI;
ERROR: File DIN.SAMPLE_FROM_SSI.DATA does not exist.
20 run;
I also created a dummy dataset din.test and was able to proc print it. This seems to either be a version compatibility issue or transmission issue. I thought this would be straightforward. Any suggestions? I'm moving the file from windows to Linux with WinSCP. I'd rather not have to request a .csv and create the input statement, but will if I have to. Your help is appreciated.
Thanks,
Cory
If you are talking about an actual SAS dataset then make sure that the name of the file is in all lowercase letters and has the extension of .sas7bdat. If the source file from Windows did not have an extension of .sas7bdat then perhaps you are not dealing with a SAS dataset, but some other type of file.
In SAS code it does not matter whether you reference a dataset using upper or lower case letters, so you can reference a dataset as sample_from_SSI or Sample_From_Ssi to refer to the same file. The same is true of filenames in general on a Windows machine. But on Unix systems, file names that differ only in case are distinct files, and SAS requires that the filename of a SAS dataset be in all lowercase letters.
So if you write:
libname DIN '/sasUsr/DM/DATA/SAS_DATA/201510_SSI';
proc print data=DIN.SAMPLE_FROM_SSI;
run;
Then you are looking to make a listing of the data in a file named:
/sasUsr/DM/DATA/SAS_DATA/201510_SSI/sample_from_ssi.sas7bdat
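So one quick fix is simply to rename the transferred file to all lowercase on the Linux side. A sketch, assuming the file arrived with uppercase letters in its name and that X commands are allowed in your session:
/* hypothetical rename from within SAS - a plain shell mv works just as well */
x "mv /sasUsr/DM/DATA/SAS_DATA/201510_SSI/SAMPLE_FROM_SSI.sas7bdat /sasUsr/DM/DATA/SAS_DATA/201510_SSI/sample_from_ssi.sas7bdat";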
I usually get a note about CEDA in this case, not a missing-dataset error.
Create either a CPORT or XPORT transport file using the associated procedure (PROC CPORT, or PROC COPY with the XPORT engine) and then move that file.
Try referring to the data in all caps as well, which I don't think should be the issue, but it's possible.
I would try using PROC COPY directly on the libname, as you can select memtype=data that way without explicitly specifying the file.
If SAS still can't do that, then you might have a permissions issue or something else that is outside of the SAS realm, I suspect.
Try using PROC CPORT and PROC CIMPORT.
Use the CPORT Procedure to convert the file into a transport file.
Use the CIMPORT Procedure to convert the transport file to a SAS format.
There is an example that sounds similar to what you are doing here.
According to SAS, the general procedure is:
A transport file is created at the source computer using PROC CPORT.
The transport file is transferred from the source computer to the target computer via communications software or a magnetic medium
The transport file is read at the target computer using PROC CIMPORT.
Note: Transport files that are created using PROC CPORT are not
interchangeable with transport files that are created using the XPORT
engine.
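A minimal sketch of that round trip (the librefs, paths and file names are placeholders):
/* on the source (Windows) machine: write a transport file */
libname src 'C:\path\to\sasdata';
proc cport library=src file='C:\temp\mydata.cpo';
run;

/* move mydata.cpo to the Linux machine as a binary transfer, then: */
libname tgt '/sasUsr/DM/DATA/SAS_DATA/201510_SSI';
proc cimport library=tgt infile='/tmp/mydata.cpo';
run;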
If that doesn't work, or it is taking a very long time to figure out, it would be faster to ask them for a CSV and import it directly using PROC IMPORT. It should read in quite easily, especially if it comes from PROC EXPORT.

Proc SQL: How / When does SAS Move the Data

I am a DBA / R user. I just took a job in an office full of SAS users and I am trying to understand better how SAS' proc sql works. I understand that SAS includes a relational database and it includes the ability to run proc sql against external servers like Oracle. I am trying to better understand when / how it decides to use the database server rather than its internal database system.
I have seen some really S. L. O. W. SAS code where my coworkers run a series of proc sql commands. These programs typically include 3 - 5 proc sql steps. Each proc sql step creates a local SAS table. They are not using pass-through SQL. The data sets are large (1 million+ rows) and these proc sql steps run slowly. Most of the data lives on the server. There is usually a small table that defines the population we want to look at, and it is in a SAS data file, but everything else lives on the server.
I have demonstrated dramatic improvements in speed by simply running all of the queries directly on the server. (Oracle in this case, but I don't think that is important.) Usually, I have to first upload a table to my personal schema that defines the population of clients we want to examine. Everything else is on the server. Sometimes I collapse their queries together because they can be done in a single step, but I do not believe that is why my version of their program is so much faster.
I think proc sql uploads the initial data set and then runs the first query on the server. It then downloads the output to the local computer, creating the local SAS data set. For the second proc sql step, it uploads the table created in step one back to the server and then runs the query on the server. To make this all even worse, the "local" SAS data sets are actually stored on a remote server, not the actual local machine. This is invisible to SAS, but it does mean we are copying data across the network yet again. I believe SAS is running slowly because of a large amount of unnecessary network traffic.
Question #1 - Is my understanding of what proc sql is doing correct? Are we really wasting as much time as I think we are uploading and downloading large tables / data sets across our network?
Question #2 - Is there some way to control when proc sql runs against the server versus when it runs against the local database? In some cases, if we could prevent the upload/download steps, the query would run more efficiently.
Short answer
Your understanding is not exactly correct, but it's in the right ballpark. SQL is probably not sending the SAS dataset to the server; it is more likely downloading the server data to SAS - but it's probably downloading the entire table, not limited by the join criteria. Your solution is exactly what I would suggest doing - hopefully your colleagues will get on board.
Long answer
In terms of how the processing works, it depends on your code. PROC SQL will execute code locally (as in, on the SAS server/desktop), unless it decides to pass the query up to the server and hasn't been told it's not allowed to. That's called implicit passthrough. You can't really control it except to turn it entirely off (with noipassthru on the PROC SQL statement). You can look at it sometimes using options msglevel=i; (a system option), and _METHOD or _TREE to see what SQL decided to do (similar to explain plan).
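A sketch of turning those diagnostics on (oralib stands in for whatever library points at your database, as in the examples below):
options msglevel=i;        /* log notes about implicit pass-through decisions */

proc sql _method _tree;    /* print the access plan PROC SQL chose */
   create table work.check as
   select *
   from oralib.tableA;
quit;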
I've had cases where it caused harm: SQL Server runs character comparisons case-insensitively while SAS does not, and I had a particular query that sometimes was sent up to the server and sometimes not depending on details of the data. I wasn't careful enough with checking case, and so it appeared to work when it really wasn't correct (comparing Propcase to UPCASE).
The general rule is that SAS will try to send the query to the server if:
The data in the query entirely already resides on the server
The query is sufficiently simple that SAS can easily figure out how to tell the server to do it, in its native language
If you're running a query with a local SAS dataset (say, joining a server table to a SAS dataset locally), it won't (at least as far as I know) go to the server. It should always run locally, which means downloading from the server all data in the contributing tables (possibly filtered if there is a logical filter in the query). For example (these examples aren't necessarily good SQL code, just illustrations of the concept):
libname oralib oracle [connection info];
proc sql;
*Will pass through likely;
select tableA.*, tableB.cost
from oralib.tableA inner join oralib.tableB
on tableA.id=tableB.id;
*Will probably not pass through;
select tableA.*, tableB.cost
from oralib.tableA inner join work.tableB
on tableA.id=tableB.id;
*Might pass through, might not;
select tableA.*, tableB.cost, tableC.productID
from oralib.tableA inner join oralib.tableB
on tableA.id=tableB.id
left join oralib.tableC
on tableA.id=tableC.id;
*This downloads the data but probably applies the where statement server side;
select tableA.*, tableB.cost
from oralib.tableA inner join work.tableB
on tableA.id=tableB.id
where tableA.date < '01JAN2010'd;
quit;
In the case of the second query, it probably pulls all of tableA down. In the fourth query, it will likely pass the where clause to the server (assuming the date doesn't cause a problem, but it shouldn't; SAS knows how to convert dates to Oracle-type dates).
Note that SAS procs can also generate passthrough. PROC MEANS, etc., will send the instructions to Oracle to do the means/sums/etc. if it can easily do so.
Your best bet is to:
Try to do everything in pass-through that you can (and that makes sense). The only way to be sure a query goes to the server is to use pass-through (see the sketch after this list).
If you have a large table on the server and a small table in SAS, upload the table in SAS to the server. A pass-through session and a libname session can't see each other's session-specific temporary tables, so you'd have to use a GTT or similar (something all users can see). Similarly, if you have a large table in SAS and a small table (or a small query result) in SQL, bring it down locally (through pass-through if necessary).
When you do have to bring things down, limit as much as possible. When I worked in that kind of environment, I made huge time savings simply by joining to tables on the server to limit my result set before bringing them down.
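A sketch of explicit pass-through (the connection details, table and column names are placeholders):
proc sql;
   connect to oracle (user=myuser password=mypass path=mydbserver);

   /* the inner query runs entirely on Oracle - only the result comes back */
   create table work.result as
   select * from connection to oracle
      (select a.id, a.cust_name, b.cost
         from tableA a
         join tableB b on a.id = b.id
        where a.order_date >= date '2010-01-01');

   disconnect from oracle;
quit;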
At the end of the day, you will be constrained by network traffic no matter what you do; just try to optimize it as best you can. It sounds like you understand how to do that already, so just do what you normally would do in non-SAS environments.