SAS PROC PRINT is really slow for me, any ideas?

Let me start by saying that I'm on a team whose members are all very new to SAS. We are using Enterprise Guide 5.1 with SAS 9.3, and have a set of schedule data arranged vertically (one or two rows per person per day). We have some PROC SQL statements, a PROC TRANSPOSE, and a couple of other steps that together group the data by week and lay it out horizontally. That set of code works fine. The first time the process flow runs, it takes a little extra time to establish the connection to the database, but once the connection is made, the rest of the process takes only a few seconds (about 6 seconds for a test run of 7 months of data: 58,000 rows and 26 columns of source data become 6,000 rows and 53 columns of output).
Our problem is in the output. The end-users are looking for results in Excel, so we are using the SAS Excel add-in and opening a stored process. In order to get output, we need a PROC PRINT, or something similar. But using PROC PRINT on the results from above (6,000 rows x 53 columns) is taking 36 seconds just to generate. Then, it is taking another 10 seconds or so to render in EG, and even more time in Excel.
The code is very basic, just:
PROC PRINT DATA=WORK.Report_1
NOOBS
LABEL;
RUN;
We have also tried using a basic PROC REPORT, but we are only gaining 3 seconds: it is still taking 33 seconds to generate plus rendering time.
PROC REPORT DATA=WORK.Report_1;
RUN;
QUIT;
Any ideas why it is taking so long? Are there other print options that might be faster?

Tested on my laptop. Took about 13 seconds to output a table with 6000 records and 53 variables (I used 8 character long strings) with PROC PRINT and ODS HTML.
data test;
format vars1-vars53 $8.;
array vars[53];
do i=1 to 6000;
do j=1 to 53;
vars[j] = "aasdfjkl;";
end;
output;
end;
drop i j;
run;
ods html body="c:\temp\test.html";
proc print data=test noobs;
run;
ods html close;
The file size was a little less than 11 MB.
If you are only using this as a stored process, you can make it a streaming stored process and write the HTML directly to _WEBOUT. This still works for viewing in Excel and greatly reduces the size of the HTML generated (no CSS is included).
data _null_;
set test end=last;
file _webout;
array vars[53] $;
length outstr $32;
if _n_ = 1 then do;
put '<html><body><table>';
put '<tr>';
do i=1 to 53;
outstr = vname(vars[i]);
put '<th>' outstr '</th>';
end;
put '</tr>';
end;
put '<tr>';
do i=1 to 53;
put '<td>' vars[i] '</td>';
end;
put '</tr>';
if last then do;
put '</table></body></html>';
end;
run;
This takes 0.2 seconds to run and generates 6 MB of output. Add any HTML decorators as needed.
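A possible generalization of the _WEBOUT writer above, offered as a sketch rather than a tested replacement: instead of hard-coding 53 columns, it loops over an array of all character columns, and it escapes each value with HTMLENCODE so that characters such as < or & in the data do not break the table. The macro name stream_table and its OUT= parameter are made up for illustration; the sketch assumes every column is character (as in the test data) and that OUT= names an open fileref, which is _WEBOUT inside a streaming stored process.

```sas
/* Hypothetical sketch: stream a data set as a bare HTML table.        */
/* Assumes all columns are character and OUT= is an open fileref.      */
%macro stream_table(data=, out=_webout);
data _null_;
  set &data. end=last;
  file &out.;
  /* _CHARACTER_ covers only columns read by SET: _cell is defined later */
  array _c _character_;
  length _cell $400;
  if _n_ = 1 then do;
    put '<html><body><table><tr>';
    do _i = 1 to dim(_c);
      _cell = vname(_c[_i]);            /* column name as header cell */
      put '<th>' _cell '</th>';
    end;
    put '</tr>';
  end;
  put '<tr>';
  do _i = 1 to dim(_c);
    _cell = htmlencode(strip(_c[_i]));  /* escape special characters  */
    put '<td>' _cell '</td>';
  end;
  put '</tr>';
  if last then put '</table></body></html>';
run;
%mend stream_table;
```

Outside a stored process you can point OUT= at a temporary fileref to inspect the result. Note that a zero-row table writes nothing at all, because the opening tags are only emitted on the first row.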

How to skip code if created dataset has zero rows

I have a job which first imports some xlsx files, then connects to multiple DB tables. Based on conditions, the job selects rows to output and creates an Excel file to send on to the final end-user.
Sometimes that job returns zero rows, which is acceptable; in that case, I would prefer to create an empty Excel file with only the variables, but not run the other code (checking/cleaning code).
How can I conditionally execute code only when there are results?
Something like this:
I get 0 rows
If Result = 0 then Go to *"here"*
Else *"just run the code further"*
You have a few useful things that can help you here.
First off, PROC SQL sets a macro variable SQLOBS, which is particularly useful in identifying how many records were returned from the last SQL query it ran.
proc sql;
select * from sashelp.class;
quit;
%put I returned &SQLOBS rows;
You might use this to drive further processing, either with %IF blocks as Tom notes in comments or other methods I will cover below.
You can also check how many rows are in a dataset explicitly, if you prefer a slightly more robust option.
proc sql;
select count(*) into :class_count from sashelp.class;
quit;
%put I returned &class_count rows;
For very large datasets, there are faster options (using the dataset descriptors, dictionary tables, or a few other options), but for most tables this is fine.
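One of those faster options, sketched here: open the data set with the OPEN function and read its header with ATTRN(..., NLOBS), which does not scan the rows. The macro name nobs is hypothetical. NLOBS can be -1 when SAS cannot report the count without reading the data (for example, for some views), so this sketch returns -1 in that case and also when the data set cannot be opened.

```sas
/* Hypothetical helper: return the logical observation count from the  */
/* data set header. Returns -1 if the data set cannot be opened or if  */
/* ATTRN cannot report the count (e.g. for some views).                */
%macro nobs(data);
  %local dsid n rc;
  %let n = -1;
  %let dsid = %sysfunc(open(&data.));
  %if &dsid. %then %do;
    %let n = %sysfunc(attrn(&dsid., nlobs));
    %let rc = %sysfunc(close(&dsid.));
  %end;
  &n.
%mend nobs;

%put SASHELP.CLASS has %nobs(sashelp.class) rows;
```

Because the macro resolves to just the number, it can be used directly in a %IF condition inside another macro.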
Either way, in a program intended for production, I would typically drive the rest of the program from macros.
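%macro whatIWantToDo(data=);
  /* checking/cleaning steps go here */
  proc print data=&data.;
  run;
%mend whatIWantToDo;
/* %IF in open code needs SAS 9.4M5 or later, so the check sits in a macro */
%macro runIfRows;
  proc sql;
    /* example query; replace with your own */
    create table work.results as
      select * from sashelp.class
      where age > 12;
  quit;
  %if &sqlobs. gt 0 %then %do;
    %whatIWantToDo(data=work.results);
  %end;
  %else %do;
    %put Nothing to do;
  %end;
%mend runIfRows;
%runIfRows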
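For the original request (an output file that still has the column headers when there are no rows), one possible sketch keeps the export outside the %IF so it always runs, and puts only the checking/cleaning code behind the row check. It assumes DBMS=CSV, which Excel opens directly; my understanding is that PROC EXPORT to CSV still writes the header row for a zero-row table, but whether DBMS=XLSX behaves the same on your installation is worth checking. The macro name export_results and the output path are hypothetical.

```sas
/* Hypothetical macro: clean only when there are rows, always export.  */
%macro export_results(data=, outfile=);
  %local n;
  proc sql noprint;
    select count(*) into :n trimmed from &data.;
  quit;
  %if &n. > 0 %then %do;
    /* checking/cleaning steps would go here */
    %put NOTE: &n. rows found, running checks.;
  %end;
  %else %put NOTE: No rows found, exporting headers only.;
  proc export data=&data. outfile="&outfile." dbms=csv replace;
  run;
%mend export_results;

/* Example: a zero-row copy of SASHELP.CLASS keeps its columns. */
data work.empty;
  set sashelp.class;
  stop;
run;
%export_results(data=work.empty, outfile=%sysfunc(pathname(work))/empty.csv)
```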
Another option is to use call execute; this is appropriate if your data drives the macro parameters. The big advantage of call execute is that it only runs if you have data rows - if you have zero, it won't do anything!
Say you have some datasets to run code on. You could have up to twelve - one per month - but only have them for the current calendar year, so in Jan you have one, Feb you have two, etc. You could do this:
data mydata_jan mydata_feb mydata_mar;
set sashelp.class;
run;
%macro printit(data=);
title "Printing &data.";
proc print data=&data;
run;
title;
%mend printit;
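data _null_;
  set sashelp.vtable;
  /* only non-empty WORK tables whose names start with MYDATA_ */
  where libname = 'WORK' and upcase(memname) like 'MYDATA_%' and nobs gt 0;
  /* %NRSTR delays the macro call until the generated code runs */
  callstr = cats('%nrstr(%printit(data=', memname, '))');
  call execute(callstr);
run;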
First I create the datasets, with names I can identify programmatically. Then I write the macro that I want to run on each one (this could be checking, cleaning, whatever). Then I use sashelp.vtable, which lists the existing tables, and check that the nobs variable (number of observations) is greater than zero. Finally, I use call execute to run the macro on each such dataset!

SAS: proc reg and macro

I have data containing 30 variables and 2000 observations.
I want to run a regression in a loop, where at step i I delete row i from the data.
So in the end I need 2001 regressions as output: one on all the data, and 2000 that each leave out one row.
I am new to SAS, and I tried to figure out how to do it with a macro, but I didn't understand.
Any comments and help will be appreciated!
This will create the data set I was talking about in my comment to Chris.
data del1V /view=del1v;
length group _obs_ 8;
set sashelp.class nobs=nobs;
_obs_ = _n_;
group=0;
output;
do group=1 to nobs;
if group eq _n_ then;
else output;
end;
run;
proc sort data=del1v out=analysis;
by group;
run;
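DATA NEW;
  SET OLD NOBS=N;
  do i = 1 to N;
    /* row _N_ joins every group except its own; in that slot it goes  */
    /* to the missing group, which therefore holds all rows (full fit) */
    IF _N_ ^= i THEN group=i;
    else group=.;
    output;
  end;
RUN;
proc sort data=new;
  by group;
run;
proc reg data=new;
  by group;
  model y = x1-x30; /* hypothetical names: use your own MODEL statement */
run;
quit;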
This will create a data set that is much longer. You will only call proc reg once, but it will run 2001 models.
Examining 2001 regression outputs will be difficult just written as output. You will likely need to go read the PROC REG support documentation and look into the output options for whatever type of output you're interested in. SAS can create a data set with the GROUP column to differentiate the results.
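One way to follow that advice, sketched on SASHELP.CLASS rather than your data: build the stacked leave-one-out data set, then fit every BY group with NOPRINT and collect the estimates with OUTEST=, which writes one row of coefficients per BY group. Here group 0 is the full-data fit and group k leaves out row k, so 19 rows give 20 models; the MODEL statement is only an illustration.

```sas
/* Leave-one-out on SASHELP.CLASS: group 0 = all 19 rows,             */
/* group k (k = 1..19) = every row except row k.                       */
data loo;
  set sashelp.class nobs=n;
  group = 0;                       /* full-data model                 */
  output;
  do group = 1 to n;
    if group ne _n_ then output;   /* row _n_ is left out of group _n_ */
  end;
run;

proc sort data=loo;
  by group;
run;

/* NOPRINT suppresses the printed reports; OUTEST= keeps one row of    */
/* coefficients (plus _RMSE_) per BY group, keyed by GROUP.            */
proc reg data=loo outest=est noprint;
  by group;
  model weight = height age;       /* illustrative model only         */
run;
quit;

proc print data=est(obs=5);
  var group intercept height age _rmse_;
run;
```

The EST data set can then be summarized or compared across groups like any other table, instead of reading thousands of printed outputs.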
I edited my original answer per data _null_'s suggestion. I agree that the above is probably faster, though I'm not as confident that it would be 100x faster; I do not know enough about the overhead of PROC REG compared with the cost of the BY statement and a larger data set. Regardless, the answer above is simpler programming. Here is my original answer/alternate approach.
You can do this within a macro program. It will have this general structure:
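%macro regress;
  %local i;
  %do i=1 %to 2001;
    DATA NEW;
      SET OLD;
      IF _N_=&I THEN DELETE; /* with 2000 rows, i=2001 deletes nothing: full model */
    RUN;
    proc reg data=new outest=est&i noprint;
      model y = x1-x30; /* hypothetical: use your own MODEL statement */
    run;
    quit;
    proc append base=all_est data=est&i;
    run;
  %end;
%mend;
%regress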
Macros are an advanced programming feature in SAS. A macro program is required in order to loop over PROC REG. The %'s mark macro statements and functions. &i is a macro variable (& is the prefix used when a macro variable is referenced). The macro is defined in a block that starts and ends with %macro / %mend, and is called by %regress.
With this approach, use &i to create a different output data set on each pass, then append them together as part of the macro loop.

Why is syscc being set to 4 on exit of gplot when I have a title of 42 characters and output to activex?

My department uses syscc to trap errors and warnings. Most of the time this is really helpful, however, I’ve either run into a very strange bug, or I’m doing something wrong.
The code below creates some data, then plots 2 charts. The second “%put &syscc;”, after the chart whose title1 is a string 42 characters long, returns a value of 4, but “%put &syswarningtext;” doesn’t return anything, and Enterprise Guide doesn’t identify it as a warning.
ETA NB: The same behaviour is seen in both Enterprise Guide and the Stored Process Web Application.
ETA NB: Because of its better handling of busy axes, we need to output to ActiveX.
data test;
length x y 8;
infile datalines dsd;
input x y;
datalines;
1,1
2,2
;
run;
TITLE1 'This is forty one characters long. Honest';
PROC GPLOT DATA = WORK.test;
PLOT y * x;
run; quit;
%put &syscc;
%put &syswarningtext;
TITLE1 'This however, is forty two characters long';
PROC GPLOT DATA = WORK.test;
PLOT y * x;
run; quit;
%put &syscc;
%put &syswarningtext;
I’ve done some investigation and determined a few things:
The value is set on the exit of gplot (either on hitting a quit or a subsequent data step).
The Linesize option is set to 132, so it’s not that.
It isn’t affected by the width of the chart (I thought it might be affected by the ability to fit the title over the chart).
Any thoughts? (NB Already submitted to SAS support. Race!)
*****************EDIT********************
Further investigation, prompted by Joe's answer below: it seems that one of the things "goptions reset=all;" does is change the device from ActiveX to blank. While this fixes the title problem, it causes the real chart (but not the example) to throw other errors, since it has a somewhat congested x-axis. It would be great to get both working. In the meantime, I think we will have weirdly short titles.
*****************EDIT2*******************
SAS Support have opened a (very small, low priority) defect about this. My first SAS bug!
I'm not sure of the direct cause, but it seems it can be solved by adding
goptions reset=all;
to the top of the program (or at least before any PROC GPLOT statements), and then adding back whatever graphics options you need. This is in EG 6.1/SAS 9.4, local.
It is specific to EG, as it doesn't occur when I run it in base SAS. It occurs with GPLOT and GCHART, but not the ODS Graphics procs (SGPLOT etc.) or regular PROC FREQ.
I also saw some weird inconsistencies that suggested it's not related to the 41-42 characters precisely; often it would be 0 for my first entire run but then 4 at the start of the second run. It's possible some of this was related to your data step, which I fixed belatedly (it works, but it's not exactly right, and it's possibly related, though certainly not the entire cause).
I also took out the spurious QUIT; after the gplots, which sometimes seemed to have an effect; another thing that seemed to have an effect was TITLE1; before re-setting the title. But again, the effects were inconsistent, and never persisted to a second running.
My best guess is that there's a goption somewhere that's not quite handled properly by EG, probably related to the creation of the image file.
data test;
infile datalines dlm=',';
input x y;
datalines;
1,1
2,2
;
run;
goptions reset=all;
TITLE1 'This is forty one characters long. Honest';
PROC GPLOT DATA = WORK.test;
PLOT y * x;
run;
%put &syscc;
%put &syswarningtext;
TITLE1 'This however, is forty two characters long';
PROC GPLOT DATA = WORK.test;
PLOT y * x;
run;
%put &syscc;
%put &syswarningtext;
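Since the department traps SYSCC, a small helper can at least pin down which step raised it: called after each step, it logs the current SYSCC and the last warning text, then resets SYSCC to 0 so the next step is judged on its own. This is a hypothetical sketch (the name check_cc is made up), not a fix from SAS support; note that SYSWARNINGTEXT keeps the last warning seen, so it may describe an earlier step.

```sas
/* Hypothetical helper: report and reset SYSCC after a step.           */
%macro check_cc(step=);
  %if &syscc. > 0 %then %do;
    %put NOTE: &step. left SYSCC=&syscc.;
    %put NOTE: Last warning text: %superq(syswarningtext);
    %let syscc = 0;   /* SYSCC can be assigned, so reset for next step */
  %end;
%mend check_cc;
```

Because the value appears when GPLOT exits, the call would go after the QUIT; (or the RUN; of the following step) that ends each PROC GPLOT, for example %check_cc(step=GPLOT 2).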

How to run the same SAS program with data from different months in one program rather than doing it separately for each month?

I need to rerun the same program with data from different months and create a separate Excel spreadsheet for each month. What is a shorter way to program this in SAS than running each program separately? For example, in the following I read data from October, and at the end of the same program I output the October results to Excel. I need to do the same for each month. Can I do it in one SAS program (maybe using a macro)? Thanks.
data sourceh.trades2;
set sourceh.trades1_october08_wk1;
if time<34200000 or time>57602000 then delete;
run;
proc export data=sourceh.avesymbol
outfile='C:\Documents and Settings\zd\My Documents\h\hdata\trades\2008\October 08 1 min correlations.xls'
replace;
run;
I would use a macro for that. Here I have wrapped your code in a macro, which you can call as %RunProgram(month, year) for each desired month and year.
%MACRO RunProgram(month, year);
data sourceh.trades2;
set sourceh.trades1_&month.&year._wk1;
if time<34200000 or time>57602000 then delete;
run;
proc export data=sourceh.avesymbol
outfile="C:\Documents and Settings\zd\My Documents\h\hdata\trades\2008\&month. &year. 1 min correlations.xls"
replace;
run;
%MEND RunProgram;
%RunProgram(October, 08);
%RunProgram(November, 08);
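To avoid writing one %RunProgram call per month, a small driver macro can loop over a space-separated list of months. The name RunAllMonths and its parameters are hypothetical; it simply calls the %RunProgram macro above once per list entry, using %SCAN and COUNTW to walk the list.

```sas
/* Hypothetical driver: call %RunProgram once per month in the list.   */
/* Requires %RunProgram (above) to be defined first.                   */
%macro RunAllMonths(months=, year=);
  %local i month;
  %do i = 1 %to %sysfunc(countw(&months., %str( )));
    %let month = %scan(&months., &i., %str( ));
    %RunProgram(&month., &year.);
  %end;
%mend RunAllMonths;
```

For example, %RunAllMonths(months=October November December, year=08) would run the program three times, provided the matching sourceh.trades1_<month><year>_wk1 tables exist.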

Print a SAS dataset having 100,000 columns into an Excel file

I need to print a SAS dataset of 100,000 rows × 100,000 columns into an Excel file.
PROC EXPORT and ODS HTML both break and are unable to write it.
DATA step FILE/PUT statements can write it, but because of the logical record length limit the output is not correct: each of my rows is broken into 3 rows.
Is there a way around this, or is this a limitation of SAS in terms of data handling?
Not so much a limitation of SAS as a limitation of Excel, which can handle at most 16,384 columns and 1,048,576 rows in Excel 2007 and later (older .xls files allow only 256 columns and 65,536 rows). Excel isn't meant to handle datasets of this magnitude; use a proper database.
You certainly cannot get this into Excel on any system.
You should be able to get this into another format, like a text file. For example:
data mydata;
array vars[100000];
do _n_=1 to 10;
do _t = 1 to dim(vars);
vars[_t]=_t;
end;
output;
end;
drop _t;
run;
data _null_;
file "c:\temp\myfile.csv" dlm=',' lrecl=2000000;
set mydata;
put _all_;
run;
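*put _all_ is not really right here: it writes every variable as name=value, including _N_ and _ERROR_, rather than plain comma-separated values. As I do not know your variable names or setup I cannot really give you a better solution, but more than likely you can list the variables (or use a variable-list shortcut) in the put statement.;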
Maximum LRECL value depends on your operating system, but I'd think most of them could handle a million or two. Certainly Win7 can. You could also use PROC EXPORT to a csv, but you'd have to grab the (300k lines of) code from the log and modify the LRECL to be larger as it defaults to 32767, and I don't think you can modify it in the proc.
SAS/IML would also allow another option. I'm not sure you could really do 100k*100k on any reasonable system (if it's numeric 8 byte matrix elements, you're at 80 billion bytes required to store...)
proc iml;
x=j(1e5,1e5,12345);
filename out 'c:\temp\myfile.csv';
file out lrecl=800000;
do i=1 to nrow(x);
do j=1 to ncol(x);
put (x[i,j]) 5.0 +5 ',' @;
end;
put;
end;
closefile out;
quit;
Edit: It seems that the lrecl statement in IML doesn't quite behave properly, or else I'm doing something wrong here - but that may be a fault of my system. I get buffer overflows even when the lrecl is clearly long enough.
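To make the scale of the IML estimate above concrete (a dense numeric matrix at 8 bytes per element, ignoring any overhead):

```latex
10^5 \times 10^5 \times 8\ \text{bytes} = 8 \times 10^{10}\ \text{bytes} \approx 74.5\ \text{GiB}
```

That is for the matrix alone, so the j(1e5,1e5,12345) call in the example would need roughly that much memory before anything is written to the file.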