How do I get SAS to ignore missing variable names? - sas

If a SAS DATA step references a non-existant variable in a DROP, KEEP, or RENAME statement, it returns an error saying such and stops the DATA step due to this error.
How do I get SAS to keep going with the step when it references a non-existent variable? I assume there's an OPTION for this (?) but I can't figure out what it's called if this is the case.
(I'm dealing with yearly datasets for which variables occasionally get added or deleted from year to year.)

Try using:
options dkricond=nowarn dkrocond=nowarn;
First one is for input datasets, second one is for output datasets.
You might want to set these back to warn or error after you are done with the specific data steps where you know this will be an issue.
SAS Manual page

Related

SAS sortedby option

I am reading some code that involves a sortedby option which confuses me a little bit.
So in the first part of the code, the following code was used to readin the data:
data _quotes / view=_quotes;
set taq.&yyyymmdd:;
by symbol date time NOTSORTED ex; length EXN 3.;
run;
Note that there is a NOTSORTED option here. when I remove it, SAS would return an error ERROR: BY variables are not properly sorted on data set.
Based on my understanding of how SAS NOTSORTED works, the taq dataset is not properly sorted, but in proper groups.
However, in the following code almost immediately after the previous code (no sort code is involved in between), there in no more NOTSORTED option, but there is no error:
data &outset (sortedby= SYMBOL DATE TIME index=(SYMBOL)
label="WRDS-TAQ NBBO Data");
set _quotes;
by symbol date time;
run;
Therefore, I was wondering if it is because of the sortedby statement that made the difference? I read the SAS documentation, it seems that sortedby will NOT sort the dataset but only specify how the data is currently sorted.
But why did the by statement without the NOTSORTED option work in the second code but not the first code?
Note that the variable ex is not on the BY statement on the second step. If the source data is sorted by symbol date time but is not sorted by ex, what you are observing makes sense.
It's unusual to put the notsorted option in the middle of a list of variables. No matter where it is placed, it applies to the entire list. In this case, it's possible the author was intending to suggest to the reader which variable was not sorted. I find this style confusing.
To check, add ex to the BY statement on the second step and see if it throws an error.

SAS NOTSORTED Equivalent

I was using the following code to analyze data:
set taq.cq_&yyyymmdd:;
by symbol date time NOTSORTED ex;
There are are thousands of datasets I am running the code on in the unit of days. When &yyyymmdd only specifies one dataset (for one day. for example, 20130102), it works. However, when I try to run it for multiple datasets (for example, 201301:), SAS returns the following errors:
BY NOTSORTED/NOBYSORTED cannot be used with SET statement when
more than one data set is specified.
If I cannot use NOTSORTED here, what is an equivalent statement that I could use?
My understanding of the keyword NOTSORTED is that you use it when the data is not sorted yet. Therefore, do I need to sort it first? How to do it?
I am also confused by the number of variables that NOTSORTED is referencing. Does it only have an effect on "time", or it has effect on "symbol, data, time"?
Many thanks!
UPDATE#2:
The rest of the process immediately following the set statement is: (pseudo code as i don't have the permission to post the original code)
Data _quotes;
SET STATEMENT HERE
Change the name of a variable in the dataset (Variable name is EXN).
last.EXN in a if statement. If the condition is satisfied, label EXN.
Drop some variables.
Run;
DATA NEWDATASET (sortedby= SYMBOL DATE TIME index=(SYMBOL)
label="WRDS-TAQ NBBO Data");
SET _quotes;
by symbol date time;
....
Run;
NOTSORTED means that SAS can assume the sort order in the data is correct, so it may not have explicitly gone through a PROC SORT but it is in logical order as listed in the BY statement.
All variables in the BY statement are included in the NOTSORTED option. Given that I suspect you fully don't understand BY group processing.
It's usually a bit dangerous to use, especially if you don't understand BY group processing. If your data is in the same group but not adjacent it won't work properly and will not produce an error. The correct workaround depends on your processes to be honest.
I would suggest reviewing the documentation regarding BY group processing. It's quite in depth and has lots of samples to illustrate the different type of calculations.
http://support.sas.com/documentation/cdl/en/lrcon/69852/HTML/default/viewer.htm#n138da4gme3zb7n1nifpfhqv7clq.htm
NOTSORTED is often used in example posts to either avoid a sort or when using a custom sort that's difficult to implement in other ways. Explicitly sorting will remove this issue but you may also be misunderstanding how SAS processes data when you have a SET statement with a BY statement. I believe this is called interleaving.
http://support.sas.com/documentation/cdl/en/lrcon/69852/HTML/default/viewer.htm#n1tgk0uanvisvon1r26lc036k0w7.htm
I suspect that the NOTSORTED keyword is being using to find groups for observations with the same value for the EX variable within the same symbol,date,time. If you only need to find the FIRST then you can use the LAG() function to calculate the FIRST.EX flag.
data want;
set taq.cq_&yyyymmdd:;
by symbol date time;
first_ex = first.time or ex ne lag(ex);
Otherwise then perhaps you want to convert the process to data step views and then set the views together.
data work.view_cq_20130102 / view=work.view_cq_20130102;
set taq.cq_20130102;
by symbol date time ex NOTSORTED;
...
run;
...
data want ;
set work.view_cq_201301: ;
by symbol date time;
...

Prevent SAS EG from outputting every dataset in datastep

I'm new to SAS EG, I usually use BASE SAS when I actually need the program, but my company is moving heavily toward EG. I'm helping some areas with some code to get data they need on an ad-hoc basis (the code won't change though).
However, during processing, we create many temporary files that are just iterations across months. I.E. if the user wants data from 2002 - 2016, we have to pull all those libraries and then concatenate them with our results. This is due to high transactional volume, the final dataset is limited to a small number of observations. Whenever I run this program though, SAS outputs all 183 of the datasteps created in the macro, making it very ugly, and sometimes the "Output Data" that appears isn't even output from the last datastep, but from an intermediary step, making it annoying to search through for the 'final output dataset'.
Is there a way to limit the datasets written to "Output Data" so that it only shows the final dataset - so that our end user doesn't need to worry about being confused?
Above is an example - There's a ton of output data sets that I don't care to see. I just want the final, which is located (somewhere) in that list...
Version is SAS E.G. 7.1
EG will always automatically show every dataset that was created after the program ends. If you don't want it to show any intermediate tables, delete them at the very last step in your process.
In your case, it looks as if your temporary tables all share the name TRN. You can clean it up as such:
/* Start of process flow */
<program statements>;
/* End of process flow*/
proc datasets lib=work nolist nowarn nodetails;
delete TRN:;
quit;
Be careful if you do this. Make sure that all of your temporary tables follow the same prefix naming scheme, otherwise you may accidentally delete tables that you need.
Another solution is to limit the number of datasets generated, and have a user-created link to the final dataset. There's an article about it here.
The alternate solution here is to add the output dataset explicitly as an entry on your process flow, and disregard the OUTPUT window unless you need to investigate something from the intermediary datasets.
This has the advantage that it lets you look at the intermediary datasets if something goes wrong, but also lets you not have to look through all of them to see the final dataset.
You should be able to add the final output dataset to the process flow once it's created once easily, and then after that one time it will be there for you to select to look at.

SAS Data Integration - Create a physical table from metadata structure

i need to use a append object after a series of join that have a conditional run... So the join step may be not execute if the condition is not verified and his work physical dataset will not be created.
The problem is that the append step take an error if one o more input physical dataset are not created.
Is there a smart way to create a physical empty table from a metadata structure of the works table of the joins or to use the append with some non-created datasets?
The create table with the list of all field is not a real solution because i've to replicate it per 8 different joins and then replicate the job 10 times...
Thanks to all
Roberto
Thank you for your comments.
What you should do:
Amend your conditional node so that it would on positive condition to create a global macro variable with value of MAX. On negative condition to create the same variable with value of 0.
Replace offending SQL step with "CREATE TABLE" node
In the options for "CREATE TABLE", specify macro variable for "MAXIMUM OUTPUT ROWS (OUTOBS)". See the picture below for example of those options.
So now when your condition is not met, you will always end up with an empty table. When condition is met, the step executes normally.
I must say my version of DI Studio is a bit old. In my version SQL node doens't allow passing macro variables to SQL options, only integers can be typed in. Check if your version allows it because if it does, then you can amend existing SQL step and avoid replacing it with another node.
One more thing, you will get a warning when OUTOBS options is less then the resulting would be dataset.
Let me know if you have any questions.
See the picture for create table options:
At the end i've created another step that extract 0 row from the source table by the condition 1=0 in the where tab. In this way i have a empty table that i can use with a data/set in the post sql of the conditional run if the work table of the join does not exist.
This is not a solution but a valid work around.

Keep SAS processing after finding errors

Is there a way to force sas to continue processing, despite finding errors?
I'm appending a large quantity of datasets at the moment, however within the list I of dataset names I have, some don't exist. This is resulting in a bunch of errors and causing SAS to exit with the message "The SAS System stopped processing this step because of errors.".
You could evaluate the existence of a dataset using the EXIST() function and make the execution of the append conditional on the outcome.
http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000210903.htm