Is there a way to force SAS to continue processing, despite finding errors?
I'm appending a large number of datasets at the moment; however, some of the dataset names in my list don't exist. This results in a bunch of errors and causes SAS to exit with the message "The SAS System stopped processing this step because of errors.".
You could evaluate the existence of a dataset using the EXIST() function and make the execution of the append conditional on the outcome.
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000210903.htm
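A minimal sketch of that idea, assuming the dataset names are held in a space-separated list. The macro name append_if_exists, the base table work.all_data, and the names work.jan, work.feb, and work.mar are hypothetical placeholders, not names from the question.

```sas
/* Hypothetical macro: append only the datasets that actually exist */
%macro append_if_exists(base=, dslist=);
  %local i ds;
  %let i = 1;
  /* Split on blanks only, so the dot in libref.member is kept */
  %let ds = %scan(&dslist, &i, %str( ));
  %do %while (%length(&ds) > 0);
    /* EXIST() returns 1 when the dataset is found, 0 otherwise */
    %if %sysfunc(exist(&ds)) %then %do;
      proc append base=&base data=&ds;
      run;
    %end;
    %else %put NOTE: Skipping &ds because it does not exist.;
    %let i = %eval(&i + 1);
    %let ds = %scan(&dslist, &i, %str( ));
  %end;
%mend append_if_exists;

/* Example call with placeholder dataset names */
%append_if_exists(base=work.all_data, dslist=work.jan work.feb work.mar);
```

Missing datasets are skipped with a note in the log, so PROC APPEND is never asked to read a table that is not there.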
I work on a team that runs this big data QA project. In addition to the QA, I'm often tasked with trying to improve the speed/efficiency of the SAS code (the QA evolves a lot). One coworker will write multiple DATA or PROC steps and then put a single RUN; at the end of them all. I never learned to code that way. All I care about is speed and memory use. Does this style of coding impact that?
Example:
data a;
set b;
if yadda yadda;
proc transpose data= a out= c;
id ;
var;
data ;
set;
data;
set;
run;
Any execution speed impact (plus or minus) from leaving off the explicit step ending statements will be trivial. SAS will determine the step has ended when it sees that you have started another step (PROC or DATA).
The only speed impact would be if you were running interactively, say in Display Manager, and left the last DATA step unterminated. The SAS session would just wait for you to finish defining the data step before it would start to compile and run it.
But leaving them off might have a large impact on your ability to maintain the code or debug any issues by reading the SAS logs. You also run the risk of a coding mistake affecting more than one step. If a step has been ended with a RUN statement and does not contain any coding errors, it will run. But if the step is left unterminated and there is a coding error in the first line of the next step, that error might affect SAS's ability to understand and execute both steps.
I would consider this poor programming style, but it doesn't affect the functioning of the program.
SAS will consider a DATA or PROC step terminated when it encounters a step boundary, such as:
RUN
QUIT
DATA
PROC
Any of those ends the current step.
From SAS's RUN documentation:
Although the RUN statement is not required between steps in a SAS program, using it creates a step boundary and can make the SAS log easier to read.
For that reason, I consider it mandatory in my environment. But it doesn't affect the actual running time. However, to me running time is less important than programmer time.
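For comparison, here is a sketch of the same kind of program with an explicit step boundary after every step. It uses the sample table sashelp.class and made-up output names (work.a, work.c) purely for illustration; it is not the asker's actual code.

```sas
/* Each step ends with its own RUN; so the log notes line up with one step */
data work.a;
  set sashelp.class;
  if age > 12;            /* subsetting IF: keep only these rows */
run;

proc transpose data=work.a out=work.c;
  id name;                /* one output column per student name */
  var height weight;
run;

data work.d;
  set work.c;
run;
```

With this layout, the notes and any error messages for a step appear directly under that step's code in the log.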
Code executes correctly - the log doesn't show any errors, etc.
I've tried removing the observations with multiple methods. I can manually delete observations. I am able to add to the dataset, but can't remove with code.
Data genes3;
Set genes;
If A6_A8= 28.0507 THEN A6_A8=.;
IF ND3_A8= 0.11936 THEN ND3_A8=.;
IF ND5_A8=0.39961 THEN ND5_A8=.;
IF ND3_ND5= 20.0195 THEN ND3_ND5=.;
Run;
Results: the dataset shows no difference before and after running the code.
If a SAS DATA step references a non-existent variable in a DROP, KEEP, or RENAME statement, it returns an error saying so and stops the DATA step because of that error.
How do I get SAS to keep going with the step when it references a non-existent variable? I assume there's an OPTION for this (?) but I can't figure out what it's called if this is the case.
(I'm dealing with yearly datasets for which variables occasionally get added or deleted from year to year.)
Try using:
options dkricond=nowarn dkrocond=nowarn;
First one is for input datasets, second one is for output datasets.
You might want to set these back to warn or error after you are done with the specific data steps where you know this will be an issue.
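A small sketch of switching the options off for one step and then restoring whatever was in effect before. The tables work.y2014 and work.y2015 and the variable newvar are invented here to stand in for yearly datasets where a variable exists in only some years.

```sas
/* Hypothetical yearly tables: NEWVAR exists only in the later year */
data work.y2014; x = 1; run;
data work.y2015; x = 2; newvar = 3; run;

/* Remember the current settings, e.g. DKRICOND=ERROR */
%let old_dkricond = %sysfunc(getoption(dkricond, keyword));
%let old_dkrocond = %sysfunc(getoption(dkrocond, keyword));

options dkricond=nowarn dkrocond=nowarn;

data work.combined;
  /* DROP= on y2014 would normally be an error because NEWVAR is absent */
  set work.y2014 (drop=newvar)
      work.y2015 (drop=newvar);
run;

/* Put the options back the way they were */
options &old_dkricond &old_dkrocond;
```

Saving the values with GETOPTION avoids having to assume what the session defaults were.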
SAS Manual page
I'm new to SAS EG, I usually use BASE SAS when I actually need the program, but my company is moving heavily toward EG. I'm helping some areas with some code to get data they need on an ad-hoc basis (the code won't change though).
However, during processing, we create many temporary files that are just iterations across months. For example, if the user wants data from 2002 - 2016, we have to pull all those libraries and then concatenate them with our results. This is due to high transactional volume; the final dataset is limited to a small number of observations. Whenever I run this program, though, SAS outputs all 183 of the datasets created by the data steps in the macro, making it very ugly, and sometimes the "Output Data" that appears isn't even output from the last data step but from an intermediary step, making it annoying to search through for the 'final output dataset'.
Is there a way to limit the datasets written to "Output Data" so that it only shows the final dataset - so that our end user doesn't need to worry about being confused?
Above is an example - There's a ton of output data sets that I don't care to see. I just want the final, which is located (somewhere) in that list...
Version is SAS E.G. 7.1
EG will always automatically show every dataset that was created after the program ends. If you don't want it to show any intermediate tables, delete them at the very last step in your process.
In your case, it looks as if your temporary tables all share the name TRN. You can clean it up as such:
/* Start of process flow */
<program statements>;
/* End of process flow*/
proc datasets lib=work nolist nowarn nodetails;
delete TRN:;
quit;
Be careful if you do this. Make sure that all of your temporary tables follow the same prefix naming scheme, otherwise you may accidentally delete tables that you need.
Another solution is to limit the number of datasets generated, and have a user-created link to the final dataset. There's an article about it here.
The alternate solution here is to add the output dataset explicitly as an entry on your process flow, and disregard the OUTPUT window unless you need to investigate something from the intermediary datasets.
This has the advantage that it lets you look at the intermediary datasets if something goes wrong, but also lets you not have to look through all of them to see the final dataset.
You should be able to add the final output dataset to the process flow easily once it has been created for the first time; after that, it will always be there for you to select and look at.
We are evaluating the time taken by two sets of code in SAS. Is there a way we can write/tabulate the option fullstimer results in a SAS dataset, without copying the entire log file into a notepad?
I would go about it like this.
Create separate SAS program files containing your code for each approach. Include options fullstimer at the top of both.
Batch submit your programs and write the logs to permanent files using the -log command line option.
Create a simple program that reads in both logs and compares the results.
The last step can be accomplished by using data steps with the INFILE statement and restricting the input records to those which are standard output from FULLSTIMER. Then you can compare the created datasets however you wish, e.g. via PROC COMPARE.
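A rough sketch of that last step for one log. The path approach_a.log is a hypothetical placeholder for the file written by -log, and the parsing assumes the usual English FULLSTIMER layout (a "NOTE: ... used" header followed by "real time", "user cpu time", and "system cpu time" lines); logs from other versions or locales may need different search strings.

```sas
/* Hypothetical path: point this at the file written by the -log option */
filename runlog "approach_a.log";

data work.timing_a (keep=step metric seconds);
  infile runlog truncover;
  length line $256 step $60 metric $20 value_txt $32;
  retain step;
  input line $char256.;

  p1 = index(line, 'NOTE:');
  p2 = index(line, ' used');
  /* Header line, e.g. "NOTE: DATA statement used (Total process time):" */
  if p1 > 0 and p2 > p1 + 5 then
    step = strip(substr(line, p1 + 5, p2 - p1 - 5));
  /* Otherwise look for one of the timing statistics */
  else do metric = 'real time', 'user cpu time', 'system cpu time';
    pos = index(line, strip(metric));
    if pos > 0 then do;
      value_txt = scan(substr(line, pos + length(metric)), 1, ' ');
      if value_txt ne ' ' then do;
        /* Values may be printed as ss.ss, m:ss.ss, or h:mm:ss.ss */
        seconds = 0;
        do k = 1 to countw(value_txt, ':');
          seconds = seconds * 60 + input(scan(value_txt, k, ':'), ?? best32.);
        end;
        output;
      end;
    end;
  end;
run;
```

Running the same step against the second log (with a different FILENAME and output name) gives two datasets with one row per step and metric, which can then be merged or compared.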
SAS has provided a log parsing macro that looks as though it should do the sort of thing that you want. It's available here:
http://support.sas.com/kb/34/301.html