I want to get the content of my SAS EG log into R.
My first idea was to use PROC PRINTTO to print to a text file that I would then import, but I can only use it to print the log to the server on which SAS is installed, which I am not able to access from R (I don't have admin rights).
However, I figured out a way to run .egp projects from R and to read SAS tables from R, so I will be able to fetch the log if I can redirect its content to a table, or to a macro variable that I can then store in a table.
How can I do this?
You could register and run your code as a SAS Stored Process, and use R to call it over HTTP. Appending &_debug=log to the request will give you the log. Just one option, and it avoids PROC PRINTTO.
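For reference, a Stored Process Web Application request with the debug flag appended typically looks something like this (the server name and program path are placeholders, not taken from the question):
http://yourserver/SASStoredProcess/do?_program=/Your/Folder/your_stp&_debug=log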
I figured out a way:
use PROC PRINTTO to redirect the log of my project to a file on the server that I can write to from SAS (though not from R).
read this file into a table as a delimited file, using an exotic delimiter that I just have to avoid using in my code (having no delimiter at all doesn't seem to be an option, unfortunately).
import this table from R and trim the irrelevant first rows.
My SAS code:
%let writeable_folder_on_server = /some_path/;
%let temp_log_for_R = &writeable_folder_on_server/temp_log_for_R.txt;
%let log_as_tbl = mylib.mytbl;
proc printto log="&temp_log_for_R" print="&temp_log_for_R" new;
run;
proc datasets library=mylib nolist;
   delete mytbl;
run;

/* code producing log */
%put foo;
%put bar;

proc import datafile="&temp_log_for_R" out=&log_as_tbl dbms=dlm replace;
   delimiter='§';
   getnames=no;
   guessingrows=max;
run;
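One refinement I would consider (my own addition, not something the original code requires): submit a plain PROC PRINTTO before the import, so the default log destination is restored and the redirected file is complete when PROC IMPORT reads it.

/* restore the default log and listing destinations,
   releasing the redirected file before it is read */
proc printto;
run;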
The REPLACE option of PROC IMPORT "should" make the table deletion redundant, but for some reason (maybe because I used an Oracle library) it doesn't.
It produces the following output, stored in the table:
NOTE: PROCEDURE PRINTTO used (Total process time):
real time 0.01 seconds
user cpu time 0.01 seconds
system cpu time 0.01 seconds
memory 904.75k
OS Memory 15140.00k
Timestamp 01/30/2019 01:29:21 PM
Page Faults 2
Page Reclaims 251
Page Swaps 0
Voluntary Context Switches 1
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 0
28
29 proc datasets library= mylib nolist;
30 delete mytbl;
31 run;
NOTE: Deleting mylib.mytbl (memtype=DATA)
32
33 /* code producing log */
34 %put foo;
foo
35 %put bar;
bar
36
NOTE: PROCEDURE DATASETS used (Total process time)
real time 0.17 seconds
user cpu time 0.02 seconds
system cpu time 0.00 seconds
memory 2425.56k
OS Memory 17956.00k
Timestamp 01/30/2019 01:29:21 PM
Page Faults 5
Page Reclaims 858
Page Swaps 0
Voluntary Context Switches 57
Involuntary Context Switches 4
Block Input Operations 0
Block Output Operations 0
Below is a simple representation of my problem. I do not control the data, nor the format applied (this is a backend service for a Stored Process Web App). My goal is to return the error message generated - which in this case is actually a NOTE.
data _null_;
input x 8.;
cards;
4 4
;
run;
The above generates:
NOTE: Invalid data for x in line 61 1-8.
RULE:      ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
61         4 4
x=. _ERROR_=1 _N_=1
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
It's easy enough to capture the error status (if _error_ ne 0 then do) but what I'd like to do is return the value of the NOTE - which handily tells us which column was invalid, along with line and column numbers.
Is this possible without log scanning? I've tried sysmsg() and syswarningtext to no avail.
AFAIK, there is no feature for capturing the NOTEs a data step generates while the data step is running.
Since you are in an STP environment, you might use either:
-altlog at session startup, or
a PROC PRINTTO log=… wrapper around the step
and then scan the resulting log.
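A minimal sketch of the PROC PRINTTO wrap plus scan (the temporary fileref, the NOTES dataset name, and the search string are my own, not from the original post):

filename steplog temp;

/* redirect the log around the offending step */
proc printto log=steplog new;
run;

data _null_;
   input x 8.;
   cards;
4 4
;
run;

/* restore the default log, which also releases the captured file */
proc printto;
run;

/* scan the captured log for the NOTE of interest */
data notes;
   infile steplog truncover;
   input logline $char200.;
   if index(logline, 'NOTE: Invalid data') then output;
run;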
I would like to know if it is possible to have the European letters
Ä, Å, and Ö as part of a variable name in SAS 9.3.
It is possible in SAS Enterprise Guide, but I couldn't do it in SAS 9.3.
data dsn;
input År name$;
datalines;
1 fgh
2 hjy
;
run;
and the log details from 9.3 are:
38 data dsn;
39 input År name$;
ERROR: The name År is not a valid SAS name.
40 datalines;
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.DSN may be incomplete. When this step was stopped there were 0 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
and from Enterprise Guide it works:
NOTE: The data set WORK.DSN has 2 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds
Try changing the setting for valid variable names - the VALIDVARNAME system option. It tends to default to ANY in EG and to V7 in base SAS.
Options validvarname=ANY;
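For example (my sketch, assuming the session encoding supports these characters), the original step should then run just as it does in EG:

options validvarname=any;

data dsn;
   input År name $;
   datalines;
1 fgh
2 hjy
;
run;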
I have a short question: if we are creating a SAS dataset, say Sample.sas7bdat, that already exists, will the code take more time to execute (because it has to overwrite the existing dataset) than it would if the dataset were not already there?
data sample;
.....
.....
run;
I did some research on the internet but could not find a satisfactory answer. To me it seems like the code should take a little bit of extra time, though I'm not sure how much of an impact it would make on a 10 GB dataset.
You could test this yourself fairly easily. A few caveats:
Make sure you have a large enough dataset such that you won't miss the differences in simple random cpu activity. 100+MB is usually a good target.
Make sure you perform the test multiple times - the more the better, with no time in between if possible. A single test will always be insufficient and will tend to show the first dataset as faster, because it benefits from write caching (basically, the OS reports that it's done writing when it isn't, and simply has the write queued up in memory).
Here's an example of my test. This is a 100 million row dataset with two 8 byte numerics, so 1.6 GB.
First, the results. I see a difference of a few seconds. Why? SAS performs a few extra operations when replacing a dataset:
Write dataset to temporary file
Delete the old dataset
Rename temporary dataset to new dataset
On some OSs this seems to be faster than others; I've found Windows desktop to be fairly slow about this, compared to unix or even Windows Server OS which is pretty quick. I'm guessing Windows is more careful about deleting than simply changing a file system pointer, but I don't really know. It's certainly not copying the whole file over from the utility directory (it's not nearly enough time for that). I also suspect write caching is still giving a bit of a boost to the new datasets, particularly as time for all datasets is growing as I write. The difference is probably only about a second or so - the difference between _REP iteration 2 and _NEW iteration 3 seems the most reasonable to me.
Iteration 1 _NEW=7.26999998099927 _REP=12.9079999922978
Iteration 2 _NEW=10.0119998454974 _REP=11.0789999961998
Iteration 3 _NEW=10.1360001564025 _REP=15.3819999695042
Iteration 4 _NEW=14.7720000743938 _REP=17.4649999142056
Iteration 5 _NEW=16.2560000418961 _REP=19.2009999752044
Notice that the first _NEW iteration is far faster than the others, and overall time increases as you go (as the write caching is less and less able to keep up). I suspect that if you allowed it to continue (or used a still larger file, which I don't have time for right now) you might see even more consistent times. I'm also not sure what happens with write caching when a file that is write cached is deleted; it's possible it has to wait for the cached writes to reach disk before doing the delete op, or something similar. You could perform a test where you waited 30 seconds between _NEW and _REP to verify that.
The code:
%macro test_me(iter=1);
   %do _i=1 %to &iter.;
      %let start = %sysfunc(time());

      data test&_i.;                 /* first write: dataset does not exist yet */
         do x = 1 to 1e8;
            y = x**2;
            output;
         end;
      run;

      %let mid = %sysfunc(time());

      data test&_i.;                 /* second write: replaces the dataset just created */
         do x = 1 to 1e8;
            y = x**2;
            output;
         end;
      run;

      %let end = %sysfunc(time());

      %let _new = %sysevalf(&mid. - &start.);
      %let _rep = %sysevalf(&end. - &mid.);
      %put Iteration &_i. &=_new. &=_rep.;
   %end;

   proc datasets nolist kill;
   quit;
%mend test_me;

options nosource nonotes nomprint nosymbolgen;
%test_me(iter=5);
There are more file operations involved when you are overwriting. After creating the table, SAS will delete the old table and rename the new one. In my tests this took 0.2 seconds of extra time.
In a brief test, my 800 MB dataset took 4 seconds to create new and 10-15 seconds to overwrite. I'm assuming this is because SAS has to keep the existing dataset until the data step finishes executing, so as to preserve data integrity. That's why you might get the following message in the log:
WARNING: Data set dset was not replaced because this step was stopped.
Overwrite test
NOTE: The data set WORK.SAMPLE has 100000000 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 10.06 seconds
user cpu time 3.08 seconds
system cpu time 1.48 seconds
memory 1506.46k
OS Memory 26268.00k
Timestamp 08/12/2014 11:43:06 AM
Step Count 42 Switch Count 38
Page Faults 0
Page Reclaims 155
Page Swaps 0
Voluntary Context Switches 190
Involuntary Context Switches 288
Block Input Operations 0
Block Output Operations 1588496
New data test
NOTE: The data set WORK.SAMPLE1 has 100000000 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 3.94 seconds
user cpu time 3.14 seconds
system cpu time 0.80 seconds
memory 1482.18k
OS Memory 26268.00k
Timestamp 08/12/2014 11:43:10 AM
Step Count 43 Switch Count 38
Page Faults 0
Page Reclaims 112
Page Swaps 0
Voluntary Context Switches 99
Involuntary Context Switches 294
Block Input Operations 0
Block Output Operations 1587464
The only difference between the log messages is the real time, which to me would indicate SAS is processing filesystem operations on the dataset files.
N.B. I have tested this on SAS (r) Proprietary Software Release 9.4 TS1M2, which I'm running through SAS Studio online. I think it's a Linux operating system; results could vary depending on your OS.
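For reference, the steps behind those two logs would look roughly like this (my reconstruction - the original code was not posted; the 1e8 iterations match the 100,000,000 observations and single variable shown in the notes above):

options fullstimer;

/* overwrite test: WORK.SAMPLE already exists, so this step replaces it */
data sample;
   do x = 1 to 1e8;
      output;
   end;
run;

/* new-data test: WORK.SAMPLE1 does not exist yet */
data sample1;
   do x = 1 to 1e8;
      output;
   end;
run;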
Let me start by saying that I'm on a team that are all very new to SAS. We are using Enterprise Guide 5.1 in SAS 9.3, and have a set of schedule data arranged vertically (one or two rows per person per day). We have some PROC SQL statements, a PROC TRANSPOSE, and a couple other steps that together primarily make the data grouped by week and displayed horizontally. That set of code works fine. The first time the process flow runs, it takes a little extra time establishing the connection to the database, but once the connection is made, the rest of the process only takes a few seconds (about 6 seconds for a test run of 7 months of data: 58,000 rows and 26 columns of source data going to 6,000 rows, 53 columns of output).
Our problem is in the output. The end-users are looking for results in Excel, so we are using the SAS Excel add-in and opening a stored process. In order to get output, we need a PROC PRINT, or something similar. But using PROC PRINT on the results from above (6,000 rows x 53 columns) is taking 36 seconds just to generate. Then, it is taking another 10 seconds or so to render in EG, and even more time in Excel.
The code is very basic, just:
PROC PRINT DATA=WORK.Report_1
NOOBS
LABEL;
RUN;
We have also tried using a basic PROC REPORT, but we are only gaining 3 seconds: it is still taking 33 seconds to generate plus rendering time.
PROC REPORT DATA=WORK.Report_1;
RUN;
QUIT;
Any ideas why it is taking so long? Are there other print options that might be faster?
Tested on my laptop. Took about 13 seconds to output a table with 6000 records and 53 variables (I used 8 character long strings) with PROC PRINT and ODS HTML.
data test;
   format vars1-vars53 $8.;
   array vars[53];
   do i=1 to 6000;
      do j=1 to 53;
         vars[j] = "aasdfjkl;";
      end;
      output;
   end;
   drop i j;
run;
ods html body="c:\temp\test.html";
proc print data=test noobs;
run;
ods html close;
File size was a little less than 11M.
If you are only using this as a stored process, you can make it a streaming process and write to _WEBOUT HTML. This will work for viewing in Excel and greatly reduces the size of the HTML generated (no CSS included).
data _null_;
   set test end=last;
   file _webout;
   array vars[53] $;
   format outstr $32.;
   if _n_ = 1 then do;
      put '<html><body><table>';
      put '<tr>';
      do i=1 to 53;
         outstr = vname(vars[i]);
         put '<th>' outstr '</th>';
      end;
      put '</tr>';
   end;
   put '<tr>';
   do i=1 to 53;
      put '<td>' vars[i] '</td>';
   end;
   put '</tr>';
   if last then do;
      put '</table></body></html>';
   end;
run;
This takes 0.2 seconds to run and generates 6 MB of output. Add any HTML decorators as needed.
I've been moving all of my datasets into SPDE libraries because I've experienced wonderful performance gains in everything - everything, that is, until running PROC TRANSPOSE. It takes ~60x longer to execute on the SPDE dataset than on the same dataset stored in a normal v9 library. The dataset is sorted by item_id and is being read from and written to the same library.
Does anyone have an idea why this is the case? Am I missing something important about SPDE and Proc Transpose not playing well together?
SPDE Library
MPRINT(XMLIMPORT_VANTAGE): proc transpose data = smplus.links_response_mechanism out = smplus.response_mechanism (drop = _NAME_)
prefix = rm_;
MPRINT(XMLIMPORT_VANTAGE): by item_id;
MPRINT(XMLIMPORT_VANTAGE): id lookup_code;
MPRINT(XMLIMPORT_VANTAGE): var x;
MPRINT(XMLIMPORT_VANTAGE): run;
NOTE: There were 5866747 observations read from the data set SMPLUS.LINKS_RESPONSE_MECHANISM.
NOTE: The data set SMPLUS.RESPONSE_MECHANISM has 3209353 observations and 14 variables.
NOTE: Compressing data set SMPLUS.RESPONSE_MECHANISM decreased size by 37.98 percent.
NOTE: PROCEDURE TRANSPOSE used (Total process time):
real time 28:27.63
cpu time 28:34.64
V9 Library
MPRINT(XMLIMPORT_VANTAGE): proc transpose data = mplus.links_response_mechanism out = mplus.response_mechanism (drop = _NAME_)
prefix = rm_;
MPRINT(XMLIMPORT_VANTAGE): by item_id;
MPRINT(XMLIMPORT_VANTAGE): id lookup_code;
MPRINT(XMLIMPORT_VANTAGE): var x;
MPRINT(XMLIMPORT_VANTAGE): run;
NOTE: There were 5866747 observations read from the data set MPLUS.LINKS_RESPONSE_MECHANISM.
NOTE: The data set MPLUS.RESPONSE_MECHANISM has 3209353 observations and 14 variables.
NOTE: Compressing data set MPLUS.RESPONSE_MECHANISM decreased size by 27.60 percent.
Compressed is 32271 pages; un-compressed would require 44572 pages.
NOTE: PROCEDURE TRANSPOSE used (Total process time):
real time 28.76 seconds
cpu time 28.79 seconds
It looks to me like there is some issue with PROC TRANSPOSE and SPDE. Here's a simple SSCCE that shows significant differences - not as significant as yours, but to some extent that may be because this is running on a desktop without much performance tuning in the first place. Sounds like a call to SAS tech support is in order.
libname spdelib spde 'c:\temp\SPDE Main'
datapath=('c:\temp\SPDE Data' 'd:\temp\SPDE Data')
indexpath=('d:\temp\SPDE Index')
partsize=512;
libname mainlib 'c:\temp\';
data mainlib.bigdata;
   do ID = 1 to 1500000;
      do _varn=1 to 10;
         varname=cats("Var_",_varn);
         vardata=ranuni(7);
         output;
      end;
   end;
run;

data spdelib.bigdata;
   do ID = 1 to 1500000;
      do _varn=1 to 10;
         varname=cats("Var_",_varn);
         vardata=ranuni(7);
         output;
      end;
   end;
run;
*These data steps take roughly the same amount of time, around 30 seconds each;
proc transpose data=spdelib.bigdata out=spdelib.transdata;
   by id;
   id varname;
   var vardata;
run;
*Run a few times, this takes around 3 to 4 minutes, with 1.5 minutes CPU time;
proc transpose data=mainlib.bigdata out=mainlib.transdata;
   by id;
   id varname;
   var vardata;
run;
*Run a few times, this takes around 30 to 45 seconds, with 20 seconds CPU time;
There have been known issues with SPDE and proc compare in the past (not multi-threading), at least up to version 4.1. What version are you using? (can be seen in the “!install/logs” folder).
This is definitely something to raise with SAS support. To "speed" things along, I would recommend submitting a log with the following options:
proc setinit noalias; run;
proc options; run;
%put _ALL_;
options fullstimer msglevel=i;
Also:
options spdedebug='DA_TRACEIO_OCR CJNL=Trace.txt';
(The CJNL option simply routes the trace message output to a text file)
In the meantime, you may be able to take advantage of some of the following SPDE-specific options:
http://support.sas.com/kb/11/349.html
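Putting those pieces together, the diagnostic submission for tech support might look something like this (a sketch only - the trace file name and the re-run of your own step are placeholders to adapt):

options fullstimer msglevel=i;
options spdedebug='DA_TRACEIO_OCR CJNL=Trace.txt';

proc setinit noalias; run;
proc options; run;
%put _ALL_;

/* re-run the problem step with tracing enabled */
proc transpose data=smplus.links_response_mechanism
               out=smplus.response_mechanism (drop=_name_) prefix=rm_;
   by item_id;
   id lookup_code;
   var x;
run;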
This issue usually occurs when PROC TRANSPOSE is used with BY-processing on compressed datasets. SAS is forced to read the same block of rows repeatedly, decompressing them every time, until all the records are fully sorted.
Set the COMPRESS=NO option and it will work. See the logs below: one program has COMPRESS=YES and the other COMPRESS=NO; the former took almost 57 minutes versus about 27 seconds.
OPTIONS COMPRESS=YES;
50 **tranpose from spde to spde;
51 proc transpose data=spdelib.balancewalkoutput out=spdelib.spdelib_to_spdelib;
52 var metric ;
53 by balancewalk facility_id isretained isexisting isicaapnpl monthofmaturity vintage;
54 run;
NOTE: There were 10000000 observations read from the data set SPDELIB.BALANCEWALKOUTPUT.
NOTE: The data set SPDELIB.SPDELIB_TO_SPDELIB has 160981 observations and 74 variables.
NOTE: Compressing data set SPDELIB.SPDELIB_TO_SPDELIB decreased size by 69.96 percent.
NOTE: PROCEDURE TRANSPOSE used (Total process time):
real time 56:58.54
user cpu time 52:03.65
system cpu time 4:03.00
memory 19028.75k
OS Memory 34208.00k
Timestamp 09/16/2019 06:19:55 PM
Step Count 9 Switch Count 22476
Page Faults 0
Page Reclaims 4056
Page Swaps 0
Voluntary Context Switches 142316
Involuntary Context Switches 5726
Block Input Operations 88
Block Output Operations 569200
OPTIONS COMPRESS=NO;
50 **tranpose from spde to spde;
51 proc transpose data=spdelib.balancewalkoutput out=spdelib.spdelib_to_spdelib;
52 var metric ;
53 by balancewalk facility_id isretained isexisting isicaapnpl monthofmaturity vintage;
54 run;
NOTE: There were 10000000 observations read from the data set SPDELIB.BALANCEWALKOUTPUT.
NOTE: The data set SPDELIB.SPDELIB_TO_SPDELIB has 160981 observations and 74 variables.
NOTE: PROCEDURE TRANSPOSE used (Total process time):
real time 26.73 seconds
user cpu time 14.52 seconds
system cpu time 11.99 seconds
memory 13016.71k
OS Memory 27556.00k
Timestamp 09/16/2019 04:13:06 PM
Step Count 9 Switch Count 24827
Page Faults 0
Page Reclaims 2662
Page Swaps 0
Voluntary Context Switches 162653
Involuntary Context Switches 1678
Block Input Operations 96
Block Output Operations 1510040
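If you would rather not change the global option, my assumption (not tested in the original post) is that the same fix can be scoped to just this step with the COMPRESS= dataset option on the output table:

/* keep the session COMPRESS= setting, but write this output uncompressed */
proc transpose data=spdelib.balancewalkoutput
               out=spdelib.spdelib_to_spdelib (compress=no);
   var metric;
   by balancewalk facility_id isretained isexisting isicaapnpl monthofmaturity vintage;
run;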