Performance issue on data load from SAS to Vertica

Our organisation is in the process of migrating from PADB to Vertica.
We have some analysts who use SAS.
I converted their tools and macros so they work with Vertica instead of PADB.
But when it comes to loading data from SAS into Vertica, the performance is not the same as before.
I am seeing execution times go from 1-2 minutes to 2-3 hours.
I am using ODBC, as I was not able to get other methods to work, like:
proc sql exec;
connect to odbc (datasrc=EDW authdomain=VERTICA);
execute(COPY CRM_COMMON.new_load_test FROM local
'/data/saswork/SAS_work765E0000405D_cammsaim238/SAS_workB45C0000405D_cammsaim238/test1.csv' PARSER fcsvparser() ) by odbc;
disconnect from odbc;
quit;
It's not working; I'm getting a note: NOTE: No data found/modified.
I tried using proc append, proc datasets, and proc copy; everything is slow.
I tried using bulkload, but it's not available.
Any ideas on what I can do or try to speed up the data transfer?
Let me know!
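One thing that can be worth testing while staying on the ODBC engine is batching the inserts with the SAS/ACCESS data set options INSERTBUFF= and DBCOMMIT=, instead of sending one row per round trip. A minimal sketch, assuming a LIBNAME pointed at the same DSN; the library name, source table, and buffer sizes below are placeholders, not from the original post:
/* Hypothetical sketch: batch the ODBC inserts rather than sending      */
/* one row at a time. vert and work.source_table are placeholder names. */
libname vert odbc datasrc=EDW authdomain=VERTICA;

proc append base=vert.new_load_test (insertbuff=10000 dbcommit=100000)
            data=work.source_table force;
run;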
-------UPDATE----
I tried various other ways and am still getting errors. When I check the log, there seems to be an issue with null values in my data ... any idea?
proc sql exec;
/* Loading converted csv file to ParAccel */
connect to odbc (datasrc=EDW authdomain=VERTICA);
execute(COPY CRM_COMMON.new_load_test FROM local
'/data/saswork/SAS_work765E0000405D_cammsaim238/SAS_workB45C0000405D_cammsaim238/test1.csv'
DELIMITER ',' ) by odbc;
disconnect from odbc;
quit;

Managed to load the data with this:
I used the ABORT ON ERROR option to get more details about the error, and the REJECTED DATA option to see the records that failed.
Ultimately the issue was with null values, specifically in the timestamp columns.
I haven't found a one-size-fits-all solution for loading any type of data regardless of nulls and formats, but I managed to load data faster than using the ODBC engine.
I tried the TRAILING NULLCOLS option, but it's not helping with the null values.
proc sql exec;
/* Loading converted csv file to ParAccel */
connect to odbc (datasrc=EDW authdomain=VERTICA);
execute(COPY CRM_COMMON.test_low FROM local '/data/saswork/SAS_work765E0000405D_cammsaim238/SAS_workB45C0000405D_cammsaim238/test_low.csv' DELIMITER ',' abort on error
rejected data '/data/saswork/SAS_work765E0000405D_cammsaim238/SAS_workB45C0000405D_cammsaim238/reject_low.csv' ) by odbc;
disconnect from odbc;
quit;
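For completeness, a minimal sketch of how the staging CSV might be produced on the SAS side so that missing values arrive as empty fields, which the COPY above can then load as NULL; the dataset name and output path are placeholders, not taken from the original post:
/* Hypothetical sketch: export the SAS dataset to the staging CSV.      */
/* PROC EXPORT should write missing values as empty fields rather than  */
/* '.', and PUTNAMES=NO keeps the header row out of the data file.      */
proc export data=work.test_low
    outfile='/data/saswork/test_low.csv'
    dbms=csv replace;
    putnames=no;
run;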

Related

Is it possible to read RAW data type in SAS?

I am working with SAS and I am using data from an Oracle database via an ODBC connection. There are some fields I require from this database that have data_type = RAW in the Oracle SQL Developer environment.
SAS is reading these in incorrectly and is returning every field as 2A2A2A2A2A2A2A2A2A2A2A2A2A2A2A2A2A2A2A2A with Type = Character and Format and Informat = $HEX40.
One thing I tried was to read it in as a character variable instead, using character formats and informats with the following code, where mylib is the library connected to the Oracle database.
data want;
set mylib.have (obs= 10000);
format raw_data_var char40.;
informat raw_data_var char40.;
run;
This changed the formats to character, but it then converted every cell to ********************.
I also tried to find some SAS documentation on reading binary data, https://documentation.sas.com/?docsetId=lrcon&docsetTarget=p1vu9u7w1ieua7n17973igt2cq3c.htm&docsetVersion=9.4&locale=en
but unfortunately I could not find anything useful.
Can someone point me in the right direction to read in a raw data type using a data step or proc sql?
Thank you
You could use PROC SQL with a pass-through query that uses the Oracle function RAWTOHEX: it converts the RAW column to a hex string on the Oracle side, and the $HEXw. informat then turns that string back into the original bytes in SAS.
proc sql;
connect using mylib;
create table want as
select
a,b,c,input(rawhexed,$HEX32000.) as raw16kchars
from
connection to mylib
(
select a,b,c,rawtohex(myraw) as rawhexed
from have /* oracle side reference */
)
;
quit;

Error while loading data into Teradata using proc append (SAS EG 14.1)

I am getting the below error while appending data to a Teradata table from SAS:
ERROR: Teradata connection: No more room in database TINYDB. Correct error and
restart as an APPEND process with option TPT_RESTART=YES. Since no checkpoints were
taken, if the previous run used FIRSTOBS=n, use the same value in the restart.
I don't know why I am getting this error for one table, because I am able to append other Teradata tables.
I am using a simple proc append:
proc append data=table1 base=table2 (MULTILOAD=YES TPT=YES) force;
run;
Please suggest why it's giving the above error just for this one table, while appending to other Teradata tables works fine.
Thanks
Adding just one thing: if I remove (MULTILOAD=YES TPT=YES) from the proc append statement, then it works, but it takes a huge amount of time.
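Not from the original thread, but one possible middle ground to test: the MULTISTMT=YES data set option uses multi-statement inserts, which avoid the MultiLoad work/error tables that need room in the target database while still being faster than plain row-by-row inserts. A hedged sketch only:
/* Hypothetical alternative, worth testing only: multi-statement insert */
/* instead of MultiLoad, so no MLOAD work/error tables are created.     */
proc append data=table1 base=table2 (MULTISTMT=YES TPT=YES) force;
run;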

SAS: Select rows from a relational database

I work with SAS on a relational database that I can access with a libname odbc statement as below:
libname myDBMS odbc datasrc="myDBMS";
Say the database contains a table named 'myTable' with a numeric variable 'var_ex' whose values can be 0, 1, or . (missing). Now say I want to exclude all rows for which var_ex=1.
If I use the following:
DATA test1;
SET myDBMS.myTable; /* I call directly the table from the DBMS */
where var_ex NE 1;
run;
I don't get the rows for which 'var_ex' is missing.
Whereas if I do the exact same thing after importing the table into the Work library:
DATA myTable; /* I put myTable in the Work library */
SET myDBMS.myTable;
run;
DATA test2;
SET myTable; /* I call the table from the work */
where var_ex NE 1;
run;
I get the rows for which 'var_ex' is 0 or missing, as intended.
The same happens if I use PROC SQL instead of a DATA step, or another NE-like operator.
I did some research and more or less understood here that unintended behaviour like that can happen if you work directly on a DBMS table.
Does that mean it is simply not recommended to work directly with a DBMS table, and that one has to import the table locally, as below, before doing anything?
DATA myTable; /* I put myTable in the Work library */
SET myDBMS.myTable;
run;
Or is there a proper way to manipulate such tables?
The best way to test how SAS is translating the data step code into database code is through the sastrace system option. Before running code, try this:
options sastrace=',,,db' sastraceloc=saslog;
Then run your code tests. When you check the log, you will see precisely how SAS is translating the code (if it can at all). If it can't, you'll see,
ACCESS ENGINE: SQL statement was not passed to the DBMS, SAS will do the processing.
followed by a select * from table.
In general, if SAS cannot translate data step code into dbms-specific code, it will pull everything to locally manipulate the data. By viewing this output, you can determine precisely how to get the data step to translate into what you need.
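In this particular case, the difference comes from NULL handling: once the WHERE clause is passed to the DBMS, NULLs compare as unknown rather than as a value, so var_ex NE 1 silently drops the missing rows on the database side, whereas SAS treats missing as an ordinary (low) value. A small sketch, using the names from the question, of one way to make the intent explicit so both paths behave the same:
DATA test1;
SET myDBMS.myTable;
/* Include missing values explicitly so the generated DBMS SQL reads    */
/* roughly as: var_ex <> 1 OR var_ex IS NULL                            */
where var_ex NE 1 or missing(var_ex);
run;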
If all else fails, you can use explicit SQL pass-through. The code in parentheses operates the same way as if you're running SQL directly from some other client.
proc sql;
connect to odbc(datasrc='source' user='username' pass='password');
create table want as
select * from connection to odbc
(<code specific to your dbms language>);
disconnect from odbc;
quit;

How to get the return codes from SAS pass-through SQL to Teradata?

In SAS 9.2, how do I get the return codes / error messages from explicit pass-through SQL to Teradata, printed in the log or output or something?
I already got a small query to work fine, but having some trouble with a more complex one. Debugging would be much easier with the error messages.
I tried the sqlxmsg and sqlxrc macro variables that are used when querying DB2, but of course those don't work... I haven't found any documentation on this. (I'm quite new to Teradata.)
Use the SASTRACE option to bring back debugging messages from Teradata.
http://support.sas.com/documentation/cdl/en/acreldb/63647/HTML/default/viewer.htm#a000433982.htm
This document shows an example of it in use with explicit pass-through SQL:
https://support.sas.com/resources/papers/TroubleshootingSASandTeradataQueryPerformanceProblems.pdf
N.B. If you are using this option on large sets of data, be careful to choose the trace options wisely or you will create huge logs.
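A minimal sketch of the options as they are typically set for this kind of debugging (adjust the trace letters to the level of detail you need):
/* Send detailed DBMS messages, including Teradata errors, to the SAS log */
options sastrace=',,,d' sastraceloc=saslog nostsuffix;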
This was the code, and now, after shutting down and restarting SAS, it works fine!
It seems I had some process hanging somewhere...
rsubmit sashost;
proc sql;
connect to teradata (user=&terauser password=&terapass server=&teraserv mode=teradata);
create table test as
select * from connection to teradata
(select x
from y.z
where c);
%put &sqlxmsg;
%put &sqlxrc;
disconnect from teradata;
quit;
proc download data=test out=locallib.test; run;
endrsubmit;

How can I read a SAS dataset?

I have a lot of files in SAS format, and I'd like to be able to read them in programs outside of SAS. I don't have anything except the base SAS system installed. I could manually convert each one, but I'd like a way to do it automatically.
You'll need to have a running SAS session to act as a data server. You can then access the SAS data using ODBC; see the SAS ODBC drivers guide.
To get the local SAS ODBC server running, you need to:
Define your SAS ODBC server setup as described in the SAS ODBC drivers guide. In the example that follows, I'll connect to a server that is set up with the name "loclodbc".
Add an entry in your services file (C:\WINDOWS\system32\drivers\etc\services), like this:
loclodbc 9191/tcp
...set the port number (here: 9191) so that it fits into your local setup. The name of the service "loclodbc" must match the server name as defined in the ODBC setup. Note that the term "Server" has nothing to do with the physical host name of your PC.
Your SAS ODBC server is now ready to run, but it has no assigned data resources available. Normally you would set this in the "Libraries" tab in the SAS ODBC setup process, but since you want to point to data sources "on the fly", we omit this.
From your client application you can now connect to the SAS ODBC server, point to the data resources you want to access, and fetch the data.
The way SAS points to data resources is through the concept of the "LIBNAME". A libname is a logical pointer to a collection of data.
Thus
LIBNAME sasadhoc 'C:\sasdatafolder';
assigns the folder "C:\sasdatafolder" the logical handle "sasadhoc".
If, from within SAS, you want access to the data residing in the SAS data table file "C:\sasdatafolder\test.sas7bdat", you would do something like this:
LIBNAME sasadhoc 'C:\sasdatafolder';
PROC SQL;
CREATE TABLE WORK.test as
SELECT *
FROM sasadhoc.test
;
QUIT;
So what we need to do is to tell our SAS ODBC server to assign a libname to C:\sasdatafolder, from our client application. We can do this by sending it this resource allocation request on start up, by using the DBCONINIT parameter.
I've made some sample code for doing this. My sample code is also written in the BASE SAS language. Since there are obviously more clever ways to access SAS data, than SAS connecting to SAS via ODBC, this code only serves as an example.
You should be able to take the useful bits and create your own solution in the programming environment you're using...
SAS ODBC connection sample code:
PROC SQL;
CONNECT TO ODBC(DSN=loclodbc DBCONINIT="libname sasadhoc 'c:\sasdatafolder'");
CREATE TABLE temp_sas AS
SELECT * FROM CONNECTION TO ODBC(SELECT * FROM sasadhoc.test);
QUIT;
The magic happens in the "CONNECT TO ODBC..." part of the code, assigning a libname to the folder where the needed data resides.
There's now a Python package that will allow you to read .sas7bdat files, or convert them to CSV if you prefer:
https://pypi.python.org/pypi/sas7bdat
You could make a SAS-to-CSV conversion program.
Save the following in sas_to_csv.sas:
proc export data=&sysparm
outfile=stdout dbms=csv;
run;
Then, assuming you want to access libname.dataset, call this program as follows:
sas sas_to_csv -noterminal -sysparm "libname.dataset"
The SAS data is converted to CSV that can be piped into Python. In Python, it would be easy enough to generate the "libname.dataset" parameters programmatically.
I have never tried http://www.oview.co.uk/dsread/, but it might be what you're looking for: "a simple command-line utility for working with datasets in the SAS7BDAT file format." But do note "This software should be considered experimental and is not guaranteed to be accurate. You use it at your own risk. It will only work on uncompressed Windows-format SAS7BDAT files for now. "
I think you might be able to use ADO; see the SAS support site for more details.
Disclaimer:
I haven't looked at this for a while
I'm not 100% sure that this doesn't require additional licensing
I'm not sure if you can do this using Python