Below is a simple representation of my problem. I do not control the data, nor the informat applied (this is a backend service for a Stored Process Web App). My goal is to return the error message generated, which in this case is actually a NOTE.
data _null_;
input x 8.;
cards;
4 4
;
run;
The above generates:
NOTE: Invalid data for x in line 61 1-8.
RULE:      ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
61         4 4
x=. _ERROR_=1 _N_=1
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
It's easy enough to capture the error status (if _error_ ne 0 then do), but what I'd like to do is return the text of the NOTE, which handily tells us which variable was invalid, along with the line and column numbers.
Is this possible without log scanning? I've tried SYSMSG() and &SYSWARNINGTEXT to no avail.
AFAIK, there is no feature for capturing the NOTEs a DATA step generates while that step is running.
Since you are in a Stored Process (STP) environment, you could use either:
the -ALTLOG option at session startup, or
PROC PRINTTO LOG=… wrapped around the step
and then scan the resulting log file.
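A minimal sketch of the PROC PRINTTO route, assuming the captured log can be written to a file in the WORK directory (the file name step_capture.log, the fileref STEPLOG, the data set INVALID_NOTES and the macro variable INVALID_NOTE are all hypothetical). Only the step of interest is redirected; the default log is then restored, and every line starting with "NOTE: Invalid data" is kept, with the first one stored in a macro variable the stored process could return.

```sas
/* Hypothetical capture file in the WORK library directory */
filename steplog "%sysfunc(pathname(work))/step_capture.log";

/* Send the log of the following step to the capture file */
proc printto log=steplog new;
run;

data _null_;
  input x 8.;
  cards;
4 4
;
run;

/* Restore the default log destination before reading the file */
proc printto;
run;

/* Keep the "Invalid data" NOTE lines from the captured log */
%let invalid_note=;
data invalid_notes;
  infile steplog truncover;
  input logline $char256.;
  retain found 0;
  if index(logline, 'NOTE: Invalid data') then do;
    output;
    /* Remember the first matching NOTE for the STP to return */
    if not found then call symputx('invalid_note', logline);
    found = 1;
  end;
  drop found;
run;

%put &=invalid_note;
```

Depending on the LINESIZE option, a long NOTE can wrap onto a second log line, so a production scan may need to join continuation lines.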
I am using the following code
proc surveyselect data = tmp method = urs sampsize = 500 seed = 100 out = out_tmp; run;
However, when I look at the log I am getting 491 records. My tmp dataset has 30,000 records. I need help understanding why 9 records are getting dropped. I played around with the seed value and get around 470 to 495 records per seed, but never exactly 500. I referred to the documentation, where the URS option means "unrestricted random sampling, which is selection with equal probability and with replacement". Equal probability has no impact here; however, I understand "with replacement" to mean that a record could be present more than once, which is what I am aiming for.
What I do not understand is why the drawn sample stops at fewer than the 500 records I have specified.
Thanks for the help.
The issue is that you don't quite understand how URS works; I recommend a look through the documentation.
Take this (extreme) example:
proc surveyselect data=sashelp.cars method=urs out=sample_cars sampsize=10000 seed=100;
run;
NOTE: The sample size, 10000, is greater than the number of sampling units, 428.
NOTE: The data set WORK.SAMPLE_CARS has 428 observations and 16 variables.
NOTE: PROCEDURE SURVEYSELECT used (Total process time):
real time 0.02 seconds
cpu time 0.03 seconds
Here I ask for 10,000 (out of 428 total records!), and get... 428 records. The important detail to pay attention to is the NumberHits variable. That says how many times each record was sampled.
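A short worked calculation (not part of the original answer) shows why a URS sample of n = 500 from N = 30,000 records usually has slightly fewer than 500 distinct rows. Each draw hits a given record with probability 1/N, so the expected number of distinct records is:

```latex
E[\text{distinct}] = N\left(1 - \left(1 - \frac{1}{N}\right)^{n}\right)
\approx 30000\left(1 - e^{-500/30000}\right) \approx 495.9
```

So on average roughly four of the 500 draws land on a record that was already selected. Those draws are not dropped; they are counted in NumberHits.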
If you want one record output for each hit, meaning you want those duplicates, you can add outhits to your PROC SURVEYSELECT statement. From the documentation on URS:
For unrestricted random sampling, by default, the output data set contains a single copy of each unit selected, even when a unit is selected more than once, and the variable NumberHits records the number of hits (selections) for each unit. If you specify the OUTHITS option, the output data set contains m copies of a sampling unit for which NumberHits is m; for example, the output data set contains three copies of a sampling unit that is selected three times (NumberHits is three). For information about the contents of the output data set, see the section Sample Output Data Set.
Here is my example modified to do just that.
proc surveyselect data=sashelp.cars method=urs out=sample_cars sampsize=10000 seed=100 outhits;
run;
NOTE: The sample size, 10000, is greater than the number of sampling units, 428.
NOTE: The data set WORK.SAMPLE_CARS has 10000 observations and 16 variables.
NOTE: PROCEDURE SURVEYSELECT used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
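To confirm that no draws are lost when OUTHITS is omitted, one can total NumberHits over the default output: the row count is the number of distinct units, while the sum of NumberHits equals the requested sample size. A minimal sketch reusing the SASHELP.CARS example; the macro variable names are illustrative.

```sas
/* Default URS output: one row per selected unit, plus a NumberHits count */
proc surveyselect data=sashelp.cars method=urs out=sample_cars
                  sampsize=10000 seed=100;
run;

/* Distinct units versus total draws */
proc sql noprint;
  select count(*), sum(NumberHits)
    into :distinct_units trimmed, :total_hits trimmed
    from sample_cars;
quit;

%put Distinct units: &distinct_units  Total draws: &total_hits;
```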
I want to get the content of my SAS EG log into R.
My first idea was to use PROC PRINTTO to print to a text file that I would then import, but I can only use it to print the log to the server on which SAS is installed, which I am not able to access from R (I don't have admin rights).
I figured out a way to run egp projects from R and to read SAS tables from R however, so I will be able to fetch the log if I can redirect its content to a table, or to a macro variable that I will then store into a table.
How can I do this?
You could register and run your code as a SAS Stored Process and use R to call it over HTTP. Appending &_debug=log to the request will give you the log. It's just one option, but it avoids PROC PRINTTO.
I figured out a way:
use PROC PRINTTO to redirect the log of my project to a file in a server location that SAS (but not R) can write to.
read this file as a delimited file into a table, using an exotic delimiter that I will just have to avoid using in my code (using no delimiter at all does not seem to be an option, unfortunately)
import this table from R and trim the irrelevant first rows
My SAS code:
%let writeable_folder_on_server = /some_path/;
%let temp_log_for_R = &writeable_folder_on_server/temp_log_for_R.txt;
%let log_as_tbl = mylib.mytbl;
proc printto log="&temp_log_for_R" print="&temp_log_for_R" new;
run;
proc datasets library= mylib nolist;
delete mytbl;
run;
/* code producing log */
%put foo;
%put bar;
proc import datafile="&temp_log_for_R" out=&log_as_tbl dbms=dlm replace;
delimiter='§';
getnames=no;
GUESSINGROWS=MAX;
run;
/* restore the default log destination */
proc printto;
run;
The REPLACE option of PROC IMPORT "should" make the table deletion redundant, but for some reason (maybe because I used an Oracle library) it doesn't.
It produces the following output, stored in the table:
NOTE: PROCEDURE PRINTTO used (Total process time):
real time 0.01 seconds
user cpu time 0.01 seconds
system cpu time 0.01 seconds
memory 904.75k
OS Memory 15140.00k
Timestamp 01/30/2019 01:29:21 PM
Page Faults 2
Page Reclaims 251
Page Swaps 0
Voluntary Context Switches 1
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 0
28
29 proc datasets library= mylib nolist;
30 delete mytbl;
31 run;
NOTE: Deleting mylib.mytbl (memtype=DATA)
32
33 /* code producing log */
34 %put foo;
foo
35 %put bar;
bar
36
NOTE: PROCEDURE DATASETS used (Total process time)
real time 0.17 seconds
user cpu time 0.02 seconds
system cpu time 0.00 seconds
memory 2425.56k
OS Memory 17956.00k
Timestamp 01/30/2019 01:29:21 PM
2
Page Faults 5
Page Reclaims 858
Page Swaps 0
Voluntary Context Switches 57
Involuntary Context Switches 4
Block Input Operations 0
Block Output Operations 0
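If the exotic-delimiter workaround proves fragile, a DATA step can read each log line whole, so no delimiter is needed at all. This is a sketch that assumes &temp_log_for_R and &log_as_tbl are defined as in the code above; the column names lineno and line and the 1,024-character limit are illustrative choices, and lineno lets R restore the original line order. For an Oracle library, keeping the PROC DATASETS deletion before this step is probably still necessary.

```sas
/* Stop writing to the capture file before reading it back */
proc printto;
run;

/* Read every log line as one character value; no delimiter involved */
data &log_as_tbl;
  infile "&temp_log_for_R" truncover lrecl=32767;
  length lineno 8 line $1024;
  lineno = _n_;
  /* $CHAR keeps leading blanks, so the log's indentation survives */
  input line $char1024.;
run;
```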
I have a panel/longitudinal dataset in SAS.
One field indicates a class or type, another a point in time without breaks, another is the observed history and another is the log difference forecast for said history. I'd like to add a new field: the history field, advanced by the forecast field.
So if the time field is in the 'future', I want to recursively advance my goal variable: its own lag multiplied by the exp of the log-difference forecast variable. A trivial operation, it seems to me.
I've attempted to replicate the problem with a toy dataset below.
data in;
input class time hist forecast;
datalines;
1 1 100 .
1 2 . .1
1 3 . .15
1 4 . .17
2 1 100 .
2 2 . .18
2 3 . .12
2 4 . .05
;
run;
proc sort data=work.in;
by class time;
run;
data out;
set in;
by class time;
retain goal hist;
if time > 1 then goal= lag1(goal) * exp(forecast);
run;
JP:
You might want this:
data out;
set in;
by class time;
retain goal;
if first.class
then goal=hist;
else goal = goal * exp(forecast);
run;
A retained variable that is not read from the data set behaves much like a LAG1 stack: at the top of each iteration it still holds the value from the previous row. The initial goal needs to be reset at the start of each group.
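Because each row multiplies the previous goal by exp(forecast), the recursion telescopes, which gives a closed form for checking the output (a worked check, not part of the original answer). Writing f for forecast and t_0 for the first time in class c:

```latex
\text{goal}_{c,t} = \text{goal}_{c,t-1}\, e^{f_{c,t}}
\quad\Longrightarrow\quad
\text{goal}_{c,t} = \text{hist}_{c,t_0}\, \exp\!\Big(\sum_{s=t_0+1}^{t} f_{c,s}\Big)
```

For the toy data, class 1 at time 4 gives 100 e^{0.1+0.15+0.17} = 100 e^{0.42} ≈ 152.20, and class 2 at time 4 gives 100 e^{0.18+0.12+0.05} = 100 e^{0.35} ≈ 141.91.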
Your first attempt conditionally applies LAG1 to a retained variable during BY-group processing, which makes my head spin. LAG-n is tricky because the implicit LAG queue is updated only when processing flow passes through the function call. If a condition bypasses the LAG invocation, there is no way the LAG queue can be updated. When LAG does appear in SAS code, it is usually placed unconditionally, before any IF statements.
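A minimal illustration of that unconditional pattern, assuming the WORK.IN data set from the question (the output data set lag_demo and the variable prev_hist are illustrative): LAG1 is called on every row so its queue stays in step with the data, and only the use of its result is conditional.

```sas
data lag_demo;
  set in;
  by class time;
  /* Call LAG1 on every iteration so its queue is always updated */
  prev_hist = lag1(hist);
  /* Condition the use of the result, not the LAG call itself:
     do not carry a value across the class boundary */
  if first.class then prev_hist = .;
run;
```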
NOTE: Retaining data set variables (such as hist) is atypical because their values are overwritten when the SET statement executes. The exception is when the retained value is tested before the SET statement for a functional purpose.
I would like to know whether it is possible to use European letters such as Ä, Å and Ö as part of a variable name in SAS 9.3.
It works in SAS Enterprise Guide, but I couldn't get it to work in SAS 9.3:
data dsn;
input År name$;
datalines;
1 fgh
2 hjy
;
run;
The log from SAS 9.3:
38 data dsn;
39 input År name$;
ERROR: The name År is not a valid SAS name.
40 datalines;
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.DSN may be incomplete. When this step was stopped there were 0 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
And from Enterprise Guide, where it works:
NOTE: The data set WORK.DSN has 2 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds
Try changing the VALIDVARNAME system option, which controls which variable names are valid. It tends to default to ANY in EG and to V7 in Base SAS.
options validvarname=any;
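A minimal sketch of the question's step with VALIDVARNAME=ANY set first. It writes the variable as a name literal ('År'n), which is the documented way to refer to names outside the V7 rules, and it assumes the session encoding can represent Å (for example Latin-1 or UTF-8).

```sas
/* Allow non-standard variable names for this session */
options validvarname=any;

/* Name literal: quote the name and add the n suffix */
data dsn;
  input 'År'n name $;
  datalines;
1 fgh
2 hjy
;
run;

proc print data=dsn;
run;
```

Setting options validvarname=v7; afterwards restores the Base SAS default if other code relies on it.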
I have a short question: if we create a SAS dataset, say Sample.sas7bdat, that already exists, will the code take longer to execute (because it has to overwrite the existing dataset) than when the dataset does not exist yet?
data sample;
.....
.....
run;
I did some research on the internet but could not find a satisfactory answer. It seems to me that the code should take a little extra time, though I am not sure how much impact that would have on a 10 GB dataset.
You could test this yourself fairly easily. A few caveats:
Make sure you have a large enough dataset that the differences are not lost in random CPU activity. 100+ MB is usually a good target.
Make sure you perform the test multiple times - the more the better, with no time in between if possible. One test will always be insufficient and will always tend to show the first dataset as faster, because it benefits from write caching (basically the OS saying that it's done writing when it's not, but simply has the write queued up in memory).
Here's an example of my test. This is a 100 million row dataset with two 8 byte numerics, so 1.6 GB.
First, the results. I see a difference of a few seconds. Why? SAS performs a few operations when replacing a dataset:
Write dataset to temporary file
Delete the old dataset
Rename temporary dataset to new dataset
On some OSs this seems to be faster than others; I've found Windows desktop to be fairly slow about this, compared to unix or even Windows Server OS which is pretty quick. I'm guessing Windows is more careful about deleting than simply changing a file system pointer, but I don't really know. It's certainly not copying the whole file over from the utility directory (it's not nearly enough time for that). I also suspect write caching is still giving a bit of a boost to the new datasets, particularly as time for all datasets is growing as I write. The difference is probably only about a second or so - the difference between _REP iteration 2 and _NEW iteration 3 seems the most reasonable to me.
Iteration 1 _NEW=7.26999998099927 _REP=12.9079999922978
Iteration 2 _NEW=10.0119998454974 _REP=11.0789999961998
Iteration 3 _NEW=10.1360001564025 _REP=15.3819999695042
Iteration 4 _NEW=14.7720000743938 _REP=17.4649999142056
Iteration 5 _NEW=16.2560000418961 _REP=19.2009999752044
Notice the first iteration new is far faster than the others, and overall time increases as you go (as the write caching is less and less able to keep up). I suspect if you allow it to continue (or use a still larger file, which I don't have time for right now) you might see even more consistent times. I'm also not sure what happens with write caching when a file that is write cached is deleted; it's possible it has to wait for the write caching to write out to disk before doing the delete op or something similar. You could perform a test where you waited 30 seconds between _NEW and _REP to verify that.
The code:
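%macro test_me(iter=1);
  %do _i=1 %to &iter.;
    /* _NEW: create test&_i. for the first time */
    %let start = %sysfunc(time());
    data test&_i.;
      do x = 1 to 1e8;
        y=x**2;
        output;
      end;
    run;
    /* _REP: create the same data set again, replacing the existing one */
    %let mid=%sysfunc(time());
    data test&_i.;
      do x = 1 to 1e8;
        y=x**2;
        output;
      end;
    run;
    %let end=%sysfunc(time());
    %let _new = %sysevalf(&mid.-&start.);
    %let _rep = %sysevalf(&end.-&mid.);
    %put Iteration &_i. &=_new. &=_rep.;
  %end;
  /* clean up: KILL deletes every data set in the WORK library */
  proc datasets nolist kill;
  quit;
%mend test_me;
/* suppress log noise so only the %PUT timings are shown */
options nosource nonotes nomprint nosymbolgen;
%test_me(iter=5);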
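To try the suggestion of waiting between _NEW and _REP, here is a hypothetical variant of the macro (the name test_wait and the WAIT= parameter are illustrative) that pauses with the SLEEP function before the replace step. It assumes a SAS release where SLEEP is available on your host; the _REP timer starts after the pause, so the wait itself is not counted.

```sas
%macro test_wait(iter=1, wait=30);
  %do _i=1 %to &iter.;
    %let start = %sysfunc(time());
    data test&_i.;
      do x = 1 to 1e8;
        y = x**2;
        output;
      end;
    run;
    %let mid = %sysfunc(time());

    /* Pause so pending cached writes have a chance to reach disk */
    data _null_;
      rc = sleep(&wait., 1);   /* unit 1 = seconds */
    run;

    /* _REP timing starts after the pause */
    %let mid2 = %sysfunc(time());
    data test&_i.;
      do x = 1 to 1e8;
        y = x**2;
        output;
      end;
    run;
    %let end = %sysfunc(time());

    %let _new = %sysevalf(&mid. - &start.);
    %let _rep = %sysevalf(&end. - &mid2.);
    %put Iteration &_i. &=_new. &=_rep.;
  %end;
%mend test_wait;

%test_wait(iter=5, wait=30);
```

The TEST data sets are left in WORK; the PROC DATASETS KILL step from the original macro can remove them afterwards.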
There are more file operations involved when you are overwriting. After creating the table, SAS will delete the old table and rename the new. In my tests this took 0.2 seconds extra time.
In a brief test, my 800 MB dataset took 4 seconds to create new and 10-15 seconds to overwrite. I assume this is because SAS has to keep the existing dataset until the DATA step finishes executing, to preserve data integrity. That's why you might get the following message in the log:
WARNING: Data set dset was not replaced because this step was stopped.
Overwrite test
NOTE: The data set WORK.SAMPLE has 100000000 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 10.06 seconds
user cpu time 3.08 seconds
system cpu time 1.48 seconds
memory 1506.46k
OS Memory 26268.00k
Timestamp 08/12/2014 11:43:06 AM
Step Count 42 Switch Count 38
Page Faults 0
Page Reclaims 155
Page Swaps 0
Voluntary Context Switches 190
Involuntary Context Switches 288
Block Input Operations 0
Block Output Operations 1588496
New data test
NOTE: The data set WORK.SAMPLE1 has 100000000 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 3.94 seconds
user cpu time 3.14 seconds
system cpu time 0.80 seconds
memory 1482.18k
OS Memory 26268.00k
Timestamp 08/12/2014 11:43:10 AM
Step Count 43 Switch Count 38
Page Faults 0
Page Reclaims 112
Page Swaps 0
Voluntary Context Switches 99
Involuntary Context Switches 294
Block Input Operations 0
Block Output Operations 1587464
The only difference between the log messages is the real time, which to me would indicate SAS is processing filesystem operations on the dataset files.
N.B. I tested this on SAS (r) Proprietary Software Release 9.4 TS1M2, running through SAS Studio online. I believe it runs on Linux; results could vary depending on your operating system.
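The answer does not show the code behind these two logs. Judging from the notes (100,000,000 observations, 1 variable, FULLSTIMER-style statistics), a comparable test could look like the sketch below; the variable name x and the step bodies are assumptions.

```sas
/* Print detailed resource statistics (memory, block I/O, context switches) */
options fullstimer;

/* Create WORK.SAMPLE once so that the next step has to replace it */
data sample;
  do x = 1 to 1e8;
    output;
  end;
run;

/* Overwrite test: WORK.SAMPLE already exists */
data sample;
  do x = 1 to 1e8;
    output;
  end;
run;

/* New data test: WORK.SAMPLE1 does not exist yet */
data sample1;
  do x = 1 to 1e8;
    output;
  end;
run;
```

Comparing the real time of the last two steps against their user and system CPU time isolates the extra file-system work of the replace.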