Warn if column is missing in a data step [duplicate] - sas

This question already has an answer here:
Can I Promote Notes About Uninitialized Variables to Errors
I had a recent problem where I was using a data step to create an output file and one of the columns had been renamed. The data step executed normally filling in the now missing column with nulls without any errors or warnings. It did add a note in the log saying that a variable was undefined but otherwise there was no indication that anything was wrong.
Is there any way to force the data step to error out, or at least give a more noticeable warning, in such a situation?

There is an undocumented system option which turns problematic notes into errors, including the uninitialized note. I find it very handy.
options dsoptions=note2err;
data a;
y=x;
run;
ERROR: Variable x is uninitialized.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.A may be incomplete. When this step was stopped there were 0 observations and 2 variables.

Put a list of the expected variables in the output file in a keep statement in your data step (or a keep= clause on the output dataset), and set option dkrocond = error; before running your data step. This is quite an old option (it goes back to at least SAS 9.1.3) so it should work in your scenario.
You can also trigger similar error messages if variables are missing from an input dataset by setting option dkricond = error;.
You can also set either of these to warn if you prefer.
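As a rough illustration of the output-side check, the sketch below lists the expected columns in a keep= clause on the output dataset. The dataset name want and the column no_such_column are made up for the example; sashelp.class is used only because it ships with SAS.

```sas
/* Sketch only: "want" and "no_such_column" are hypothetical names. */
options dkrocond=error;

data want(keep=name sex no_such_column);
  /* no_such_column is never created, so the keep= list cannot be satisfied */
  set sashelp.class;
run;

/* Switch back to the more lenient setting afterwards if preferred */
options dkrocond=warn;
```

With dkrocond=error the step should stop with an error about the unreferenced variable instead of quietly writing the dataset without it.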
Also, if you want a more general method of detecting whether a variable is present in a data set, you could try something like this:
data _null_;
dsid = open('sashelp.class');
vnum1 = varnum(dsid,'varname');
vnum2 = varnum(dsid,'sex');
rc = close(dsid);
put vnum1= vnum2=;
run;
The crucial behaviour here is that the varnum function returns 0 for variables that are not present in the opened dataset. All of the functions used above can be used with %sysfunc, so it's even possible to do this sort of check in pure macro code, i.e. without actually running a data step or proc.
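The pure-macro variant mentioned above could look something like the following sketch. The macro name var_exists is an invention for this example; open, varnum and close are the same functions used in the data step above, called through %sysfunc.

```sas
/* Sketch only: var_exists is a hypothetical helper name. */
%macro var_exists(ds, var);
  %local dsid vnum rc;
  %let vnum = 0;
  %let dsid = %sysfunc(open(&ds));
  %if &dsid %then %do;
    /* varnum returns the column position, or 0 if the column is absent */
    %let vnum = %sysfunc(varnum(&dsid, &var));
    %let rc = %sysfunc(close(&dsid));
  %end;
  &vnum
%mend var_exists;

%put sex: %var_exists(sashelp.class, sex);
%put varname: %var_exists(sashelp.class, varname);
```

Because the macro resolves to a number, it can be used directly in a %if condition to abort a job before the data step that depends on the column is ever submitted.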

Related

Put with macro variable as format

I have a dataset with a variable called pt with values such as 8.1, 8.2, 8.3, etc., and a variable called mean with values such as 8.24, 8.1, 8.234, etc. The two variables are paired with each other.
I want to be able to set my put informat to the formats from the variable num.
I get the errors "Expecting an arithmetic expression", "the symbol is not recognized and will be ignored" and "syntax error" from my code (with the &fmt. part underlined):
if pt=&type;
call symput("fmt",pt);
fmt_mean = putn(mean,&fmt.);
Thanks in advance for your help.
The macro processor's work is done before SAS compiles and runs the data step. So trying to place the value into a macro variable and then use it immediately to generate and execute SAS code will not work.
But since you are using the PUTN() function it can use the value of an actual variable, so there is no need to put the format into a macro variable.
fmt_mean = putn(mean,pt);
Please post your data set and data step. Your description is hard to understand.
However, the solution seems to be simple: do not use macro variables! You don't need them here. Unlike the put() function, which expects the format to be known at compile time (which is where macro variables can be used), its analog putn() accepts a variable as its second argument. Of course, it runs a little slower because of that flexibility. So your code can look like this:
data ...;
set ...(keep=mean pt);
fmt_mean = putn(mean, pt);
run;
where the pt variable may be numeric, e.g. 8.2, or character, e.g. '8.2'.
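A small self-contained version of the same idea, using made-up rows that mirror the values described in the question (the dataset names have and want are placeholders):

```sas
/* Sketch only: sample rows are invented to match the question's description. */
data have;
  input pt $ mean;
  datalines;
8.1 8.24
8.2 8.1
8.3 8.234
;
run;

data want;
  set have;
  /* putn takes the format from the value of pt on each row, at run time */
  fmt_mean = putn(mean, pt);
run;
```

Here pt is read as a character variable so that its value can be passed to putn as a format specification without any numeric-to-character conversion.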
If you want to understand how the SAS macro facility works and what call symput does, look here:
https://stackoverflow.com/a/69979074/7864377

Check whether a SAS date is valid before using MDY()?

Is there a way to check whether three variables (month, day, year) can actually build a valid SAS date format before handing those variables over to MDY() (maybe except checking all possible cases)?
Right now I am dealing with a couple of thousand input variables and let SAS put them together - there are a lot of date variables which cannot work like month=0, day=33, year=10 etc. and I'd like to catch them. Otherwise I will get way too many Notes like
NOTE: Invalid argument to function MDY(13,12,2014)
which then eventually culminate in Warnings like
WARNING: Limit set by ERRORS= option reached. Further errors of this type will not be printed.
I would really like to prevent those warnings, and I thought the best way would be to actually check the validity of the date. Any recommendations?
Use an INFORMAT instead, then you can use the ?? modifier to suppress errors.
month=0;
day=33;
year=10;
date = input(cats(put(year,z4.),put(month,z2.),put(day,z2.)),??yymmdd8.);
SAS documentation: ? or ?? (Format Modifiers for Error Reporting)
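Put into a complete data step, the approach might look like the sketch below. The dataset name dates, the flag variable valid, and the sample rows are assumptions for illustration; the input expression is the one from the answer.

```sas
/* Sketch only: dataset name, flag variable and sample rows are made up. */
data dates;
  input month day year;
  /* Build a yyyymmdd string; ?? suppresses the log messages for bad values */
  date = input(cats(put(year, z4.), put(month, z2.), put(day, z2.)), ?? yymmdd8.);
  /* Invalid combinations leave date missing, so they can be flagged */
  valid = not missing(date);
  format date date9.;
  datalines;
12 13 2014
13 12 2014
0 33 10
;
run;
```

Rows whose month, day and year do not form a real date end up with a missing date, so the valid flag can be used to catch them without filling the log.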

Create Lag variable with three conditions

- I need a lead variable based on 3 conditions: if the variable RoaDLM has a number, and if Co_ID is the same as lag(Co_ID), and if CEO = lag(CEO), then I need a lead variable, Lead1.
- I sort descending to create the lag variable.
- Everything else should be '.'.
- Here is my code:
data RoaReg;
set RoaReg;
by CO_ID descending fyear;
if RoaDlm ne 0 and Co_ID = lag(CO_ID) and ceo=ceo then
Lead1 = lag(ROA);
else if RoaDlm= 0 then
Lead1='.';
run;
- Anyway, this does not work. Thanks!
There are a couple of issues with your code.
Do not use the same data set name in the SET and DATA statements. This is a recipe for errors that are difficult to debug.
Lag() keeps its own queue of values that is only updated when the function actually executes, so it should not be called conditionally. Call it on every observation and set the result to missing when necessary.
data RoaReg2;
set RoaReg;
by CO_ID descending fyear;
Lead1 = lag(ROA);
if RoaDlm= 0 then call missing (lead1);
run;
This is the correct version of your code, or my best guess. Providing sample data would help for sure.
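To see why a conditional lag() misbehaves, a tiny made-up example may help (the dataset demo and its variables are invented for this sketch and are not the asker's data):

```sas
/* Sketch only: invented data to contrast conditional and unconditional lag(). */
data demo;
  input id x;
  /* The queue behind this lag() is only updated on rows where id = 2 */
  if id = 2 then cond_lag = lag(x);
  /* This lag() runs on every row, so it returns the previous row's x */
  always_lag = lag(x);
  datalines;
1 10
2 20
1 30
2 40
;
run;
```

On the last row, cond_lag holds 20 (the value of x the last time that lag() ran) while always_lag holds 30 (the value from the previous row), which is why the unconditional call followed by call missing() is the safer pattern.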
Based on what I understood, you need a lead variable based on few conditions - two being lagged value of the variables.
SAS does not have a lead function, as far as I know. You can use proc expand for that purpose. You also did not mention which variable you want a lead for, so I am assuming it is a variable named ROA.
So, here is my best guess/interpretation of what you want.
data RoaReg_lead;
merge RoaReg RoaReg(keep=ROA rename=(ROA=LeadROA) firstobs=2); /*merged the same table with only the ROA variable, and read the values from 2nd observation | can't use by variables in order to do so*/
Lag_co_id=lag(co_id); /*creating lagged values*/
Lag_ceo=lag(ceo);
/*conditions*/
if (RoaDLM ne . and RoaDLM>0) and co_id=Lag_co_id and ceo=Lag_ceo then
Lead1=LeadROA;
drop Lag_co_id Lag_ceo LeadROA; /*You can keep the vars to do a manual check*/
run;
Otherwise, providing a sample table of your data (have and want) would be very helpful.

How to choose indexed assignment variable dynamically in SAS?

I am trying to build a custom transformation in SAS DI. This transformation will "act" on columns in an input data set, producing the desired output. For simplicity let's assume the transformation will use input_col1 to compute output_col1, input_col2 to compute output_col2, and so on up to some specified number of columns to act on (let's say 2).
In the Code Options section of the custom transformation users are able to specify (via prompts) the names of the columns to be acted on; for example, a user could specify that input_col1 should refer to the column named "order_datetime" in the input dataset, and either make a similar specification for input_col2 or else leave that prompt blank.
Here is the code I am using to generate the output for the custom transformation:
data cust_trans;
set &_INPUT0;
i=1;
do while(i<3);
call symputx('index',i);
result = myfunc("&&input_col&index");
output_col&index = result; /*what is proper syntax here?*/
i = i+1;
end;
run;
Here myfunc refers to a custom function I made using proc fcmp which works fine.
The custom transformation works fine if I do not try to take into account the variable number of input columns to act on (i.e. if I use "&&input_col&i" instead of "&&input_col&index" and just use the column result on the output table).
However, I'm having two issues with trying to make the approach more dynamic:
I get the following warning on the line containing
result = myfunc("&&input_col&index"):
WARNING: Apparent symbolic reference INDEX not resolved.
I do not know how to have the assignment to the desired output column happen dynamically; i.e., depending on the iteration of the do loop I'd like to assign the output value to the corresponding output column.
I feel confident that the solution to this must be well known amongst experts, but I cannot find anything explaining how to do this.
Any help is greatly appreciated!
You can't use macro variables that depend on data step variables in this manner. Macro variables are resolved when the step is compiled, not at run time.
So you either have to
%do i = 1 %to .. ;
which is fine if you're in a macro (it won't work outside of an actual macro), or you need to use an array.
data cust_trans;
set &_INPUT0;
array in[2] &input_col1 &input_col2; *or however you determine the input columns;
array output_col[2]; *automatically names the results;
do i = 1 to dim(in);
result = myfunc(in[i]); *You quote the input - I cannot see what your function is doing, but it is probably wrong to do so;
output_col[i] = result; /*the array index selects output_col1, output_col2, ... at run time*/
end;
run;
That's the way you'd normally do that. I don't know what myfunc does, and I also don't know why you quote "&&input_col&index." when you pass it to it, but that would be a strange way to operate unless you want the name of the input column as text (and don't want to know what data is in that variable). If you do, then pass vname(in[i]) which passes the name of the variable as a character.
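For completeness, the %do alternative mentioned above might be sketched as follows. The %let values stand in for what the DI Studio prompts would supply, sashelp.class stands in for the real input table, and sqrt stands in for the asker's myfunc, which is not shown in the question; all of these are assumptions for illustration.

```sas
/* Sketch only: stand-in values for the prompts and for myfunc. */
%let _INPUT0 = sashelp.class;
%let input_col1 = height;
%let input_col2 = weight;

%macro cust_trans(n=2);
  data cust_trans;
    set &_INPUT0;
    /* The macro loop generates one assignment statement per column */
    %do i = 1 %to &n;
      output_col&i = sqrt(&&input_col&i);
    %end;
  run;
%mend cust_trans;

%cust_trans(n=2)
```

The loop runs while the code is being generated, so by the time the data step compiles it contains two ordinary assignment statements, one for output_col1 and one for output_col2.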

SAS code behaves differently in interactive and batch modes

I have the following code that runs inside a macro. When it is run in interactive mode, it runs absolutely fine, with no errors or warnings. That has been the case for the last two years.
The same code has now been deployed in batch mode, and it generates the warning "WARNING: Apparent symbolic reference FIRSTRECCOUNT not resolved." and no value is assigned to the macro variable.
My question is, does anyone have any ideas why batch mode and interactive mode would behave differently?
Here is some more information:
The dataset is being created and it is in the work library.
The dataset does get opened by the data step.
FIRSTRECCOUNT doesn't get initialised anywhere else in the program.
I have searched the SAS community. There is a topic here, but I don't have the same errors in batch initialisation as described in the answer.
There is detailed information on the warning, but it doesn't explain why it would work in interactive mode but not in batch mode.
%LET FIRSTSET = work.dataset1;
DATA _NULL_;
IF 0 THEN
SET &FIRSTSET NOBS=X;
CALL SYMPUT('FIRSTRECCOUNT' ,X);
STOP;
RUN;
DATA _NULL_;
IF 0 THEN
SET &SECONDSET NOBS=X;
CALL SYMPUT('SECONDRECOUNT' ,X);
STOP;
RUN;
WARNING: Apparent symbolic reference FIRSTRECCOUNT not resolved.
Update:
I attempted to replicate the error by copying the code that produced the warning into a separate scheduled flow, but it didn't cause any errors at all.
By the way, the original job was deployed from SAS DI Studio. I have checked all lines in the user-written code nodes and made sure that each line was within 80 characters, as recommended by #RawFocus and #RobertPentridge, but it didn't solve the issue.
As recommended by #data_null_, I checked VALIDVARNAME and it was different between interactive mode (value of "any") and batch mode (value of "V7"), but changing it hasn't made any difference.
I rewrote the logic to get the number of observations by calling attr for an open dataset. This eliminated the warning, but the program would still fail, with warnings popping up in different places. It made me think Robert Partridge is correct. At the same time, I got an error about a macro not being resolved. The macro was inserted by DI Studio to collect performance MI even though the job wasn't meant to be collecting MI. This made me think that SAS DI Studio is not generating code correctly when deploying it, so I manually edited the deployed code to remove the offending macro call. I also spotted one line of code with an MD5 function that was too long because of the number of parameters being passed to it, so I inserted some line breaks. And finally the problem was fixed!
I still need to do something about the job, because when it gets redeployed from SAS DI it will generate the same errors again. I don't have time to look into this further at the moment.
Conclusion: what you write in SAS DI and what gets deployed can be slightly different, which can cause the syntax parser to throw errors in random places. So I will mark Robert's answer as correct because it got me closer to solving the problem than any other answer.
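The rewritten observation count described in the update is not shown in the post. One way it might look, using the attrn function through %sysfunc, is sketched below; sashelp.class is substituted for work.dataset1 so the snippet can run on its own, and this is not the asker's actual code.

```sas
/* Sketch only: not the asker's code; sashelp.class replaces work.dataset1. */
%let FIRSTSET = sashelp.class;

%let dsid = %sysfunc(open(&FIRSTSET));
/* nlobs is the number of logical (non-deleted) observations */
%let FIRSTRECCOUNT = %sysfunc(attrn(&dsid, nlobs));
%let rc = %sysfunc(close(&dsid));

%put NOTE: &FIRSTSET has &FIRSTRECCOUNT observations.;
```

This version assumes the dataset can be opened; a production job would probably want to check that dsid is non-zero before calling attrn.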
The problem could be happening above the code snippet you pasted. The parser got into a funk earlier, and ended up issuing warning about code that is perfectly fine.
Check to make sure that no code within a macro is longer than ~160 chars on a single line. I try to keep my code well below that, but long lines of code can run fine interactively and fail in batch, particularly when inside of a macro.
I expect your program has some small error above that does not cause SAS to go into syntax check mode when run interactively but does cause SAS to set obs to 0 and enter syntax check mode when run in batch.
One possibility is the limit (in batch mode) of the length of a line in your submitted SAS program:
See: http://support.sas.com/kb/15/883.html
Which version of SAS are you running?