I am relatively new to PROC IML. I'd like my log to be completely clean, which means no notes and no "!" (length in this case too?), if possible. How can I eliminate the note below and keep CPU use and performance efficient?
Thank you for your help, I appreciate it! - Michelle
71 proc iml;
NOTE: IML Ready
72
72 ! varNames={"NACCZMMS" "NACCZLMI" "NACCZLMD" "NACCZDFT" "NACCAGEB"};
73
73 ! use Class2.exercise2;
NOTE: Data file CLASS2.EXERCISE2.DATA is in a format that is native to
another host, or the file encoding does not
match the session encoding. Cross Environment Data Access will be used,
which might require additional CPU
resources and might reduce performance.
74
74 ! read all var varNames into CG;
75
75 ! print CG[c=varNames];
75 ! /*c for colname*/
76 quit;
You can convert the data set to a format that's optimal for your system.
data exercise2;
set Class2.exercise2;
run;
Then use the exercise2 data in your IML code. You only need to do this once. The note appears because the data set was created on a different operating system (or with a different encoding) than yours, and SAS is letting you know. SAS will do the conversion automatically on every read, but that can slow things down.
Alternatively, set OPTIONS NONOTES; to suppress all NOTEs in the log. WARNINGs will still be displayed. I don't recommend this, as NOTEs can be very useful for detecting issues in your code.
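Putting the two suggestions together, a minimal sketch might look like the following. It assumes the Class2 libref from the question is already assigned. The WORK.EXERCISE2 copy is an illustrative choice and only lasts for the current session, and OPTIONS NOTES at the end restores normal logging.

```sas
/* Copy the foreign-format data set into the native format (assumed target: WORK) */
data work.exercise2;
    set Class2.exercise2;
run;

/* Suppress NOTEs only around the step that should log cleanly */
options nonotes;

proc iml;
    varNames = {"NACCZMMS" "NACCZLMI" "NACCZLMD" "NACCZDFT" "NACCAGEB"};
    use work.exercise2;               /* read the native copy, not Class2.exercise2 */
    read all var varNames into CG;
    close work.exercise2;
    print CG[c=varNames];             /* c= is short for colname= */
quit;

/* Turn NOTEs back on so later steps still report problems */
options notes;
```

Restoring NOTES afterwards keeps the diagnostic value of the log for the rest of the session.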
I have a data set with lots of flight information, in the below format.
carrier flight origin dest air_time
9E 4194 EWR ATL 105
9E 4362 EWR ATL .
9E 4362 EWR ATL 117
9E 3633 EWR ATL 113
The second record does not have air_time available. The business requirement in such cases is:
Find the average air_time for the carrier code,
using the same origin and destination airports.
Populate this average as the air_time for row #2, which has the missing value.
I am unable to code this in SAS. The code should do this every time a missing value is found in air_time. Could the experts please help me?
Thanks in advance!
The solution below worked perfectly for me.
Step #1: sort the data by the variables that define the groups over which I planned to compute the averages.
proc sort data=cs1.flights_cln out=cs1.flights_srt;
by carrier origin dest;
run;
Step #2: use PROC STANDARD with the REPLACE option. After I ran this code, the missing values in the data set were replaced with the average values of their BY group.
proc standard data=cs1.flights_srt out=cs1.flights_stn replace;
by carrier origin dest;
run;
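As a self-contained illustration of the two steps, here is a sketch that uses the four sample rows from the question. The WORK data set names are made up for the example, and the VAR statement is an addition that restricts the replacement to air_time (without it, PROC STANDARD processes every numeric variable). The missing value should come out as the mean of 105, 117 and 113, roughly 111.67.

```sas
/* Sample rows from the question; data set names are hypothetical */
data work.flights;
    input carrier $ flight origin $ dest $ air_time;
    datalines;
9E 4194 EWR ATL 105
9E 4362 EWR ATL .
9E 4362 EWR ATL 117
9E 3633 EWR ATL 113
;
run;

/* Step 1: sort by the grouping variables */
proc sort data=work.flights out=work.flights_srt;
    by carrier origin dest;
run;

/* Step 2: REPLACE fills missing values with the BY-group mean */
proc standard data=work.flights_srt out=work.flights_stn replace;
    by carrier origin dest;
    var air_time;            /* added: limit the replacement to air_time */
run;

proc print data=work.flights_stn;
run;
```

Because neither MEAN= nor STD= is specified, the non-missing values are left unchanged.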
I'm creating a stored process in SAS EG for some business partners, but I can't seem to get my dataset to output.
A 'Results' viewer shows up but is blank. My code works perfectly fine when it is not run as a stored process, but then the user has to manually change the macro variable for the account they are looking for. With a stored process I can prevent users from accidentally deleting some code, etc.
I can see in my SAS log that the output dataset is being created with variables and observations, but it doesn't automatically pop up like a typical SAS EG job would. I also have some documentation on stored processes that I received from a co-worker, and it seems to me that a SAS dataset should automatically be output after successful execution.
One thought: will a stored process output a dataset if there are warnings in the log? I get warnings because I am appending datasets to a base file that hasn't been created yet, so the lengths of my numeric variables change.
Here's a snippet from the log:
NOTE: The address space has used a maximum of 5504K below the line and 222716K above the line.
104
105 data tran_last;
106 retain TRAN_DT MRCH_NAME MRCH_CITY AMT_TRAN DEB_CRD_IND;
107 set tran_sorted;
108 output;
109 run;
The SAS System
NOTE: There were 164 observations read from the data set WORK.TRAN_SORTED.
NOTE: The data set WORK.TRAN_LAST has 164 observations and 5 variables.
NOTE: The DATA statement used 0.00 CPU seconds and 51817K.
NOTE: The address space has used a maximum of 5504K below the line and 222716K above the line.
WORK.TRAN_LAST is the dataset I want to be output so that my user can copy/paste directly from it. Maybe I'm missing something obvious, but I can't seem to figure this out.
Version 7.1
The answer was extremely simple. I had to use
PROC PRINT DATA = MYDATA ;
RUN;
at the end of my stored process.
However, I have books from the SAS Institute that say you can retrieve an "Output Data" file from a stored process instead of getting the "Results Viewer" via PROC PRINT. This functionality must have been taken out in newer versions, or maybe I was doing something wrong.
To work around this, I connected SAS to an Excel file that the end user runs the program(s) from, so they don't need to worry about the output being in the "Results Viewer".
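Applied to the data set from the question, the PROC PRINT fix might look like the sketch below. It assumes WORK.TRAN_LAST has already been built earlier in the same stored process. The %STPBEGIN and %STPEND wrappers are the standard stored process macros that set up and close ODS output; if your stored process definition already inserts them for you, leave them out here.

```sas
/* Assumes the earlier code in the stored process has created WORK.TRAN_LAST */

%stpbegin;                      /* set up ODS output for the stored process */

proc print data=work.tran_last noobs;
run;

%stpend;                        /* close what %STPBEGIN opened */
```

The printed table then appears in the results the user sees, where it can be copied.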
I have a short question: if we create a SAS dataset, say Sample.sas7bdat, that already exists, will the code take more time to execute (because it has to overwrite the existing dataset) than it would if the dataset were not already there?
data sample;
.....
.....
run;
I did some research on the internet but could not find a satisfactory answer. It seems to me that the code should take a little extra time, though I'm not sure how much impact that would have on a 10 GB dataset.
You could test this yourself fairly easily. A few caveats:
Make sure you have a large enough dataset that the difference isn't lost in ordinary random CPU activity. 100+ MB is usually a good target.
Make sure you perform the test multiple times, the more the better, with no time in between if possible. One test will always be insufficient and will always tend to show the first dataset as faster, because it benefits from write caching (basically the OS reporting that it's done writing when it isn't, and simply has the write queued up in memory).
Here's an example of my test. This is a 100 million row dataset with two 8-byte numerics, so 1.6 GB.
First, the results. I see a difference of a few seconds. Why? SAS performs a few operations when replacing a dataset:
Write dataset to temporary file
Delete the old dataset
Rename temporary dataset to new dataset
On some OSs this seems to be faster than on others; I've found Windows desktop to be fairly slow at this, compared to Unix or even Windows Server, which are pretty quick. I'm guessing Windows desktop does something more careful when deleting than simply changing a file system pointer, but I don't really know. It's certainly not copying the whole file over from the utility directory (the difference is not nearly enough time for that). I also suspect write caching is still giving a bit of a boost to the new datasets, particularly as the time for all datasets grows as I write. The real difference is probably only about a second or so; the difference between _REP iteration 2 and _NEW iteration 3 seems the most reasonable estimate to me.
Iteration 1 _NEW=7.26999998099927 _REP=12.9079999922978
Iteration 2 _NEW=10.0119998454974 _REP=11.0789999961998
Iteration 3 _NEW=10.1360001564025 _REP=15.3819999695042
Iteration 4 _NEW=14.7720000743938 _REP=17.4649999142056
Iteration 5 _NEW=16.2560000418961 _REP=19.2009999752044
Notice that the first _NEW iteration is far faster than the others, and that overall time increases as you go (as the write caching is less and less able to keep up). I suspect that if you let it continue (or use a still larger file, which I don't have time for right now) you might see more consistent times. I'm also not sure what happens with write caching when a file that is write cached is deleted; it's possible the delete has to wait for the cached writes to reach disk first, or something similar. You could perform a test where you wait 30 seconds between _NEW and _REP to verify that.
The code:
%macro test_me(iter=1);
  %do _i=1 %to &iter.;
    %let start = %sysfunc(time());
    /* First write: test&_i. does not exist yet */
    data test&_i.;
      do x = 1 to 1e8;
        y=x**2;
        output;
      end;
    run;
    %let mid=%sysfunc(time());
    /* Second write: replaces the dataset just created */
    data test&_i.;
      do x = 1 to 1e8;
        y=x**2;
        output;
      end;
    run;
    %let end=%sysfunc(time());
    %let _new = %sysevalf(&mid.-&start.);
    %let _rep = %sysevalf(&end.-&mid.);
    %put Iteration &_i. &=_new. &=_rep.;
  %end;
  /* Clean up: deletes every dataset in WORK */
  proc datasets nolist kill;
  quit;
%mend test_me;
options nosource nonotes nomprint nosymbolgen;
%test_me(iter=5);
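The 30-second wait suggested above could be sketched as a variant of the same macro. The names test_wait, wait, and t0 to t3 are made up for this example, and the pause itself is excluded from both timings. Like the original, it finishes by deleting everything in WORK, so run it in a scratch session.

```sas
%macro test_wait(iter=1, wait=30);
    %local _i t0 t1 t2 t3 rc _new _rep;
    %do _i = 1 %to &iter.;
        %let t0 = %sysfunc(time());
        /* First write: new dataset */
        data test&_i.;
            do x = 1 to 1e8;
                y = x**2;
                output;
            end;
        run;
        %let t1 = %sysfunc(time());

        /* Pause (not timed) so queued writes have a chance to reach disk */
        %let rc = %sysfunc(sleep(&wait., 1));

        %let t2 = %sysfunc(time());
        /* Second write: replaces the dataset created above */
        data test&_i.;
            do x = 1 to 1e8;
                y = x**2;
                output;
            end;
        run;
        %let t3 = %sysfunc(time());

        %let _new = %sysevalf(&t1. - &t0.);
        %let _rep = %sysevalf(&t3. - &t2.);
        %put Iteration &_i. &=_new. &=_rep.;
    %end;
    /* Clean up: deletes every dataset in WORK */
    proc datasets nolist kill;
    quit;
%mend test_wait;

options nosource nonotes nomprint nosymbolgen;
%test_wait(iter=5, wait=30);
```

If the gap between _NEW and _REP shrinks with the pause in place, that would point to write caching rather than the delete-and-rename step.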
There are more file operations involved when you are overwriting. After creating the new table, SAS deletes the old table and renames the new one. In my tests this took 0.2 seconds of extra time.
In a brief test, my 800 MB dataset took 4 seconds to create new and 10-15 seconds to overwrite. I assume this is because SAS has to preserve the existing dataset until the DATA step finishes executing, to protect data integrity. That's why you might get the following message in the log:
WARNING: Data set dset was not replaced because this step was stopped.
Overwrite test
NOTE: The data set WORK.SAMPLE has 100000000 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 10.06 seconds
user cpu time 3.08 seconds
system cpu time 1.48 seconds
memory 1506.46k
OS Memory 26268.00k
Timestamp 08/12/2014 11:43:06 AM
Step Count 42 Switch Count 38
Page Faults 0
Page Reclaims 155
Page Swaps 0
Voluntary Context Switches 190
Involuntary Context Switches 288
Block Input Operations 0
Block Output Operations 1588496
New data test
NOTE: The data set WORK.SAMPLE1 has 100000000 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 3.94 seconds
user cpu time 3.14 seconds
system cpu time 0.80 seconds
memory 1482.18k
OS Memory 26268.00k
Timestamp 08/12/2014 11:43:10 AM
Step Count 43 Switch Count 38
Page Faults 0
Page Reclaims 112
Page Swaps 0
Voluntary Context Switches 99
Involuntary Context Switches 294
Block Input Operations 0
Block Output Operations 1587464
The main difference between the two logs is the real time (system CPU time is also somewhat higher for the overwrite), which to me indicates that SAS is spending the extra time on filesystem operations on the dataset files.
N.B. I tested this on SAS (r) Proprietary Software Release 9.4 TS1M2, which I'm running through SAS Studio online. I think it's a Linux operating system; results could vary depending on your operating system.
I have SAS output with some information for each person. Each person is supposed to be on a separate page when printed; in other words, the PDF should have one page per person. I didn't use a macro in my code, and I don't know how to write one. Is there any way to separate the pages without using a macro?
Code:
data _null_;
set maingroup;
call execute('%bygroup(' || trim(maingroup) || ')');
run;
This code puts each person on a separate page. But I don't have the macro, so I changed the code a little bit. See the report below:
Ayda Ceyhan: 325
1258 458
Grade:3.0
Issues: Test
-------
Justin Costay: 526
1568 132
Grade:3.5
Issues: NA
This is the output; there are two people in it. I need each of them on a separate page when printed.
This depends largely on your actual report; but in general, you should be able to use by groups rather than using macros.
A simple example:
ods pdf file="c:\temp\test.pdf" startpage=bygroup;
proc report data=sashelp.class nowd;
by name;
columns age sex height weight;
run;
ods pdf close;
The startpage=bygroup option tells the PDF engine to start a new page for each BY group. You might need the NOTSORTED option on the BY statement if your data cannot be sorted by the BY variable. This may or may not do exactly what you want, depending on how you're producing the report.
If you're doing this with data step programming, you may have a harder time without having access to the macro that's doing it. I honestly wouldn't use data step programming; nowadays, proc report/tabulate/etc. are very good at producing reports in whatever format you want, and they're much more powerful than data step programming.
In your specific simple example, you may be able to issue ods pdf startpage=now; commands via call execute (and then use startpage=never on the original ods pdf statement).
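A rough sketch of that last idea follows. Since the %bygroup macro from the question isn't available, a PROC PRINT on SASHELP.CLASS stands in for whatever produces each person's section, and only the first three rows are used to keep the PDF short; both are assumptions made purely for illustration.

```sas
/* startpage=never: no automatic page breaks between procedures */
ods pdf file="c:\temp\test.pdf" startpage=never;

data _null_;
    set sashelp.class(obs=3);
    /* Force a page break before each person's output */
    call execute('ods pdf startpage=now;');
    /* Stand-in for the per-person report code */
    call execute(cats('proc print data=sashelp.class(where=(name="',
                      name, '")) noobs; run;'));
run;

ods pdf close;
```

The statements queued by CALL EXECUTE run right after the DATA step ends, so they execute before the final ods pdf close.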
We read SAS xpt files to load data in .NET. Everything works fine, but recently we encountered a problem where the customer has stored a date as a numeric value in a column and provided a format in the file header. The SAS viewer can display that data correctly using the given format, but we have to load the data in our .NET program, and we don't require SAS.
I recently found out that you can use the SAS LocalProvider with OLEDB, but it turns out that it does not support numeric formatting. So we end up with the wrong data in columns where the data is stored as a numeric value with a format provided for it.
Can anyone please help me understand and resolve the issue, ideally with a code sample? I have looked around for code samples in .NET, but with no luck so far.
Thanks in advance.
Regards,
Nasir
SAS Date values are stored as the number of days since Jan 1, 1960.
data _null_;
  x=today();
  put x=;
run;
x=19410
Today (2/21/2013) for example is 19410 days since 1/1/1960. Assuming you know your own software's date format (probably some number of days since some other date), you can perform the transformation on your own.
If it's relevant, SAS datetime values are the number of seconds since 1/1/1960 00:00:00.
data _null_;
  x=datetime();
  put x=;
run;
x=1677052885.5
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.00 seconds
Again, that's the datetime as of about 08:00 on 2/21/2013.
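To make the arithmetic concrete, here is a small sketch that takes the two values above apart. The variable names are made up for the example. The same steps (add the day count to 1 January 1960, or divide the seconds by 86400 first) can be reproduced in any language, and the last line shows the day offset you would subtract if your target system counts from 1 January 1970 instead.

```sas
data _null_;
    d  = 19410;              /* SAS date value from the example above     */
    dt = 1677052885.5;       /* SAS datetime value from the example above */

    /* A date value is days since 01JAN1960 */
    put d= date9.;                          /* d=21FEB2013 */

    /* A datetime value is seconds since 01JAN1960:00:00:00 */
    days_part = floor(dt / 86400);          /* whole days                 */
    secs_part = dt - days_part * 86400;     /* seconds past midnight      */
    put days_part= secs_part=;              /* days_part=19410 secs_part=28885.5 */

    /* Days between 01JAN1960 and 01JAN1970 */
    unix_offset = '01JAN1970'd;
    put unix_offset=;                       /* unix_offset=3653 */
run;
```

Here 28885.5 seconds past midnight is 08:01:25.5, which matches the time of day noted above.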