SAS : Select rows from a relationnal database - sas

I work with SAS on a relationnal database that I can access with a libname odbc statement as below :
libname myDBMS odbc datasrc="myDBMS";
Say the database contains a table named 'myTable' with a numeric variable 'var_ex' which values can be 0,1 or . (missing). Now say I want to exclude all rows for which var_ex=1.
If I use the following :
DATA test1;
SET myDBMS.myTable; /* I call directly the table from the DBMS */
where var_ex NE 1;
run;
I don't get rows for which 'var_ex' is missing. Here is a screenshot of the log, with my actual data :
Whereas if I do the exact same thing after importing the table in the Work :
DATA myTable; /* I put myTable in the Work library */
SET myDBMS.myTable;
run;
DATA test2;
SET myTable; /* I call the table from the work */
where var_ex NE 1;
run;
I select rows for which 'var_ex' is 0 or missing, as intended. Here is a screenshot of the log, with my actual data :
The same happens if I use PROC SQL instead of a DATA step, or another NE-like.
I did some research and more or less understood here that unintended stuff like that can happen if you work directly on a DBMS table.
Does that mean is it simply not recommended to work with a DBMS table, and one has to import table locally as below before doing anything ?
DATA myTable; /* I put myTable in the Work library */
SET myDBMS.myTable;
run;
Or is there a proper way to manipulate such tables ?

The best way to test how SAS is translating the data step code into database code is through the sastrace system option. Before running code, try this:
options sastrace=',,,db' sastraceloc=saslog;
Then run your code tests. When you check the log, you will see precisely how SAS is translating the code (if it can at all). If it can't, you'll see,
ACCESS ENGINE: SQL statement was not passed to the DBMS, SAS will do the processing.
followed by a select * from table.
In general, if SAS cannot translate data step code into dbms-specific code, it will pull everything to locally manipulate the data. By viewing this output, you can determine precisely how to get the data step to translate into what you need.
If all else fails, you can use explicit SQL pass-through. The code in parentheses operates the same way as if you're running SQL directly from some other client.
proc sql;
connect to odbc(datasrc='source' user='username' pass='password');
create table want as
select * from connection to odbc
(<code specific to your dbms language>);
disconnect from odbc;
quit;

Related

Is it possible to read RAW data type in SAS?

I am working with SAS and I am using data from an Oracle database via an ODBC connection. There are some fields I require from this database that have data_type = RAW in the Oracle SQL Developer environment.
SAS is reading these in incorrectly and is returning every field as 2A2A2A2A2A2A2A2A2A2A2A2A2A2A2A2A2A2A2A2A with Type = Character and Format and Informat = $HEX40.
One thing I tried to do is read it in as a character variable instead, using character formats and informats using the following code, where mylib is the library connected to an Oracle database.
data want;
set mylib.have (obs= 10000);
format raw_data_var char40.;
informat raw_data_var char40.;
run;
This changed the formats to character but it then converted the cells to ********************
I also tried find some SAS documentation on reading binary data, https://documentation.sas.com/?docsetId=lrcon&docsetTarget=p1vu9u7w1ieua7n17973igt2cq3c.htm&docsetVersion=9.4&locale=en
but unfortunately, I could not find something useful to help.
Can someone point me in the right direction to read in a raw data type using a data step or proc sql?
Thank you
You could use Proc SQL with a pass though query that utilizes the Oracle function RAWTOHEX
proc sql;
connect using mylib;
create table want as
select
a,b,c,input(rawhexed,$HEX32000.) as raw16kchars
from
connection to mylib
(
select a,b,c,rawtohex(myraw) as rawhexed
from have /* oracle side reference */
)
;
quit;

Sql Query into Access database(.accdb)

I'm using right now , and its working, a proc sql to connect to my DB(access .accdb). Until now I was only using it to do SELECT query. Here is a example that work and that I use to do so.
proc sql;
/* create an ODBC pass-through connection using the Microsoft Office Access 2007 .accdb driver */
connect to ODBC as savesdb
(required="driver=Microsoft Access Driver (*.mdb, *.accdb); dbq=&dir_BD.;");
create table MYTABLE as select * from connection to savesdb(select MYID from ACCESSTABLE);
/* close the pass-through connection */
disconnect from savesdb;
quit;
Now I want to execute a INSERT INTO query. I know that the next code is working
execute( INSERT INTO ACCESSTABLE ( MYID ) VALUES ( 1 )) by savesdb;
The thing is in the INSERT INTO I want to specify to insert the values that are in a dataset. In other word I have a dataset with 4 records so I want to call my insert into 4 times with the values in dataset.
Is there a way to do so?
All you need to do is loop through the DataSet in a data step and read the columns to a macro variable.. one var for each column.
then call the proc sql code with these macro variables as input to the macro definition which will do the insert into DB using full pass through.
however as pointed out libref is the easier method.
Thanks,
Manish

Load SAS dataset into Teradata table using Fast LOAD

I am trying to load SAS dataset into a teradata table using FASTLOAD utility. This works fine in some cases, but I want to separate the error tables and create them in my own/other database in teradata environment.
Could some one provide me the syntax (I do know it but it's not working) for how to make it possible?
Any method is fine either using proc sql command or proc append command. Thanks in advance.
You can use the LOGDB libname option to tell SAS into which database the log files should be created. By default they are created in the same database as the table being created (named as the target table named plus the three character suffixes you've discovered). Using the info provided in your comments, try this:
/* Delete any exisiting log files for table TPT_LD_TEST */
libname TPTLOAD TERADATA
SERVER=TDServ DATABASE=TPTLOAD
USER=tduser PASSWORD=tdpasswd1
;
proc delete data=TPTLOAD.TPT_LD_TEST_ET;
run;
proc delete data=TPTLOAD.TPT_LD_TEST_UV;
run;
proc delete data=TPTLOAD.TPT_LD_TEST_RS;
run;
libname TPTLOAD clear;
/* Define connection to target database */
LIBNAME TDSERV TERADATA
SERVER=TDServ
USER=tduser PASSWORD=tdpasswd1
LOGDB=TPTLOAD;
/* Truncate target table if necessary */
proc sql noprint;
delete from TDSERV.TPT_LD_TEST;
quit;
proc append base=TDSERV.TPT_LD_TEST(fastload=yes tpt=yes)
data=work.FastLoad;
run;
I added some code to delete any existing rows in the target table (a requirement for FASTLOAD).
If you have DROP TABLE and CREATE TABLE rights on your target database, it might be safer to drop and re-create the table so you can guarantee the structure and explicitly name the table index.
/* Delete target table if it exists */
proc delete data=TDSERV.TPT_LD_TEST;
run;
data TDSERV.TPT_LD_TEST
(fastload=yes tpt=yes
dbcreate_table_opts='primary index(index_column_name)'
)
set work.FastLoad;
run;
And in either case, be sure to remove any duplicate records from your source dataset; those will be written to your error files (as well as any records that fail other constraints).
PROC DELETE is a handy device because it will not create an error if the target table does not exist.

SAS Proc SQL to add a constant to a variable

I have a SAS dataset with numeric variables to, from, and weight. Some of the observations have value 0 for weight. I need all the weight values to be positive, so I wish to simply add 1 to all weight values.
How can I do that using Proc SQL?
I have tried the following, but it doesn't work:
proc sql;
update mylib.mydata
set weight=weight+1;
quit;
The error is:
ERROR: A CURRENT-OF-CURSOR operation cannot be initiated because
the column "weight" cannot be used to uniquely identify a row
because of its data type.
Also, mylib refers to a Greenplum appliance. This might be the problem...
If you have the database permissions to update that table, you might want to use the SAS/Access pass-through facility. You will need to know the correct syntax for this to work. Here is a non-working example:
proc sql;
connect to greenplm as dbcon
(server=greenplum04 db=sample port=5432 user=gpusr1 password=gppwd1);
execute (
/* Native code goes here */
update sample.mydata
set weight=weight+1
) by dbcon;
quit;
The connection string would be the same as used on the LIBNAME that defined your "mylib' libref.
However, if you are really trying to create a SAS dataset (not update the real table), you can do that with a simple data step:
data mydata;
set mylib.mydata
weight = weight + 1;
run;
That will create a copy of the table that can be used with other SAS procedures.
Check out this note at prosgress.com. You probably need to add UPDATE_MULT_ROWS=YES to your library definition.

Limiting results in PROC SQL

I am trying to use PROC SQL to query a DB2 table with hundreds of millions of records. During the development stage, I want to run my query on an arbitrarily small subset of those records (say, 1000). I've tried using INOBS to limit the observations, but I believe that this parameter is simply limiting the number of records which SAS is processing. I want SAS to only fetch an arbitrary number of records from the database (and then process all of them).
If I were writing a SQL query myself, I would simply use SELECT * FROM x FETCH FIRST 1000 ROWS ONLY ... (the equivalent of SELECT TOP 1000 * FROM x in SQL Server). But PROC SQL doesn't seem to have any option like this. It's taking an extremely long time to fetch the records.
The question: How can I instruct SAS to arbitrarily limit the number of records to return from the database.
I've read that PROC SQL uses ANSI SQL, which doesn't have any specification for a row limiting keyword. Perhaps SAS didn't feel like making the effort to translate its SQL syntax to vendor-specific keywords? Is there no work around?
Have you tried using the outobs option in your proc sql?
For example,
proc sql outobs=10; create table test
as
select * from schema.HUGE_TABLE
order by n;
quit;
Alternatively, you can use SQL passthrough to write a query using DB2 syntax (FETCH FIRST 10 ROWS ONLY), although this requires you to store all your data in the database, at least temporarily.
Passthrough looks something like this:
proc sql;
connect to db2 (user=&userid. password=&userpw. database=MY_DB);
create table test as
select * from connection to db2 (
select * from schema.HUGE_TABLE
order by n
FETCH FIRST 10 ROWS ONLY
);
quit;
It requires more syntax and can't access your sas datasets, so if outobs works for you, I would recommend that.
When SAS is talking to a database via SAS syntax, part of the query can be translated to DBMS language equivalent - this is called implicit pass through. The rest of the query is "post-processed" by SAS to produce final result.
Depending on SAS version, DBMS vendor and DBMS version, and in some cases even some connection/libname options, different parts of SAS syntax are translatable/considered compatible between SAS and DBMS and thus sent to be performed by DBMS instead of SAS.
With SAS SQL options - INOBS and OUTOBS - I've worked a lot with MS SQL and Oracle via different versions of SAS, but I haven't seen those ever translated to TOP xxx type of queries, so this is probably not supported yet, although when query touches just DMBS data (no joins to SAS data etc), should be quite doable.
So I think you're left with the so called explicit pass-through - specific SAS SQL syntax to connect to database. This type of queries look like this:
proc sql;
connect to oracle as db1 (user=user1 pw=pasw1 path=DB1);
create table test_table as
select *
from connection to db1
( /* here we're in oracle */
select * from test.table1 where rownum <20
)
;
disconnect from db1;
quit;
In SAS 9.3 the syntax can be simplified - if there's already a LIBNAME connection, you can reuse it for explicit pass-through:
LIBNAME ORALIB ORACLE user=...;
PROC SQL;
connect to oracle using ORALIB;
create table work.test_table as
select *
from connection to ORALIB (
....
When connecting using libname be sure to use READBUFF (I usually set some 5000 or so) or INSERTBUFF options (1000 or more) when loading database.
To see if implicit pass-through takes place, set sastrace option:
option sastrace=',,,ds' sastraceloc=saslog nostsuffix;