Error surrounding use of scan(&varlist) + Comparison of macro variables - sas

As a follow up to this question, for which my existing answer appears to be best:
Extracting sub-data from a SAS dataset & applying to a different dataset
Given a dataset dsn_in, I currently have a set of macro variables max_1 - max_N that contain numeric data. I also have a macro variable varlist containing a list of variables. The two sets of macro variables are related such that max_1 is associated with scan(&varlist, 1), etc. I am trying to compare the data values within dsn_in for each variable in varlist to the associated comparison values max_1 - max_N, capping any value that exceeds its maximum. I would like to output the updated data to dsn_out. Here is what I have so far:
data dsn_out;
set dsn_in;
/* scan list of variables and compare to decision criteria.
if > decision criteria, cap variable */
do i = 1 by 1 while(scan(&varlist, i) ~= '');
if scan("&varlist.", i) > input(symget('max_' || left(put(i, 2.))), best12.) then
scan("&varlist.", i) = input(symget('max_' || left(put(i, 2.))), best12.);
end;
run;
However, I'm getting the following error, which I don't understand. The log below was produced with options mprint; turned on. SAS appears to be interpreting scan as both an array and a variable, when it's a SAS function.
ERROR: Undeclared array referenced: scan.
MPRINT(OUTLIERS_MAX): if scan("var1 var2 var3 ... varN", i) > input(symget('max_'
|| left(put(i, 2.))), best12.) then scan("var1 var2 var3 ... varN", i) =
input(symget('max_' || left(put(i, 2.))), best12.);
ERROR: Variable scan has not been declared as an array.
MPRINT(OUTLIERS_MAX): end;
MPRINT(OUTLIERS_MAX): run;
Any help you can provide would be greatly appreciated.

The specific issue you have here is that you place SCAN on the left side of an equal sign. That is not allowed; SUBSTR is allowed to be used in this fashion, but not SCAN.
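One way around this, sketched under assumptions rather than taken from the question: declare an array over &varlist and assign to the array element, which is a real variable reference. The sample data and the three max_ values are made up for illustration, and symgetn with cats stands in for the input(symget(...)) expression.

```sas
/* Hypothetical setup: three variables and their caps */
%let varlist = var1 var2 var3;
%let max_1 = 10;
%let max_2 = 20;
%let max_3 = 30;

/* Made-up input data for illustration */
data dsn_in;
  input var1 var2 var3;
  datalines;
5 25 31
12 18 7
;
run;

data dsn_out;
  set dsn_in;
  /* The array maps position i to the i-th variable named in &varlist */
  array vars {*} &varlist;
  do i = 1 to dim(vars);
    /* Look up max_1, max_2, ... at run time as a number */
    cap = symgetn(cats('max_', i));
    /* An array element can appear on the left of the equal sign */
    if vars{i} > cap then vars{i} = cap;
  end;
  drop i cap;
run;
```

Missing values are left alone here, because a missing value is never greater than the cap.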

Related

How to correct this sas function in order to have the jaccard distance?

I created a SAS function using fcmp to calculate the Jaccard distance between two strings. I do not want to use macros, as I'm going to apply it to multiple variables across a large dataset. The lists of substrings I get are incomplete.
proc fcmp outlib=work.functions.func;
function distance_jaccard(string1 $, string2 $);
n = length(string1);
m = length(string2);
ngrams1 = "";
do i = 1 to (n-1);
ngrams1 = cats(ngrams1, substr(string1, i, 2) || '*');
end;
/*ngrams1= ngrams1||'*';*/
put ngrams1=;
ngrams2 = "";
do j = 1 to (m-1);
ngrams2 = cats(ngrams2, substr(string2, j, 2) || '*');
end;
endsub;
options cmplib=(work.functions);
data test;
string1 = "joubrel";
string2 = "farjoubrel";
jaccard_distance = distance_jaccard(string1, string2);
run;
I expected ngrams1 and ngrams2 to contain all the substrings of length 2. Instead I got this:
ngrams1=jo*ou*ub
ngrams2=fa*ar*rj
If you want real help with your algorithm you need to explain in words what you want to do.
I suspect your problem is that you never defined how long your new character variables NGRAMS1 and NGRAMS2 should be. From the output you show, it appears that FCMP defaulted them to length $8.
To define a variable you need to use a LENGTH statement (or an ATTRIB statement with the LENGTH= option) before you start referencing the variable.
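A minimal sketch of that fix, not a full Jaccard implementation. The helper name bigrams and the length of 200 are assumptions chosen for illustration; pick a length large enough for your longest string.

```sas
proc fcmp outlib=work.functions.func;
  /* Hypothetical helper: returns the 2-character substrings of a string, separated by '*' */
  function bigrams(string $) $ 200;
    /* Declare the length before the variable is first referenced */
    length ngrams $ 200;
    ngrams = '';
    n = length(string);
    do i = 1 to n - 1;
      /* CATS strips leading and trailing blanks, so bigrams containing blanks lose them */
      ngrams = cats(ngrams, substr(string, i, 2), '*');
    end;
    return (ngrams);
  endsub;
run;

options cmplib=(work.functions);

data test;
  length ngrams1 ngrams2 $ 200;
  ngrams1 = bigrams("joubrel");
  ngrams2 = bigrams("farjoubrel");
  put ngrams1= ngrams2=;
run;
```

The receiving variables in the data step also get an explicit length, so the returned value is not truncated on assignment.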

NOT+IN SAS operators combined, is this valid? Can't find documentation

I'm trying to understand how to code something along the lines of "NOT IN the LIST" type of logic in SAS.
I figured I could do "NOT" + "IN" as something like below.
Data work.OUT;
Set work.IN;
If VAR=1 then OUTPUT=1;
else if VAR=2 then OUTPUT=2;
else if VAR NOT in (1,2) then OUTPUT=3;
else OUTPUT=4;
run;
When I export the dataset all I see is OUTPUT=3 for all records. So something is happening in the derivation and it's transforming all VAR values into OUTPUT 3 values for some reason. Even though I know for a fact that other values exist in the VAR.
I don't understand what the problem is. Can we not combine the NOT and IN operators? Alternatively, is there another way of coding this type of logic in SAS? I'd rather not write a separate condition for each value, since I have more than 300 unique values for VAR.
Welcome to Stack Overflow Alejandro. Your code assigns values 1 2 or 3 depending on what values are in the variable called var:
data in;
do var = 1 to 5;
output;
end;
run;
Data work.OUT;
set work.IN;
If VAR=1 then OUTPUT=1;
else if VAR=2 then OUTPUT=2;
else if VAR NOT in (1,2) then OUTPUT=3;
else OUTPUT=4;
run;
Your code says check for var = 1 then check for var = 2 and then check if it is not 1 or 2. The final else is never checked because a var will be 1 or 2 or not 1 or 2.
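A quick check that NOT and IN do combine as expected. The dataset name check and the variable flag_not_in are made up for illustration:

```sas
data check;
  do var = 1 to 5;
    /* A comparison evaluates to 1 (true) or 0 (false) */
    flag_not_in = (var not in (1, 2));
    output;
  end;
run;

proc print data=check;
run;
```

If the flag is 1 for every row of your real data, the values in VAR are probably not the numbers 1 and 2 that you expect.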
If you have a pile of if checks, you can use a select/when/otherwise/end block. It will check a series of rules (in the order you type them) and then will do something based on whichever rule is true first.
data out;
set in;
select;
when(missing(var)) output = -9999999;
when(var = 1) output = 1;
when(var = 2) output = 2;
when(var < 5) output = 3;
otherwise output = 42;
end;
run;
I hope that helps. If not please send up another flare.

Is there a SAS function to delete negative and missing values from a variable in a dataset?

Variable name is PRC. This is what I have so far. First block to delete negative values. Second block is to delete missing values.
data work.crspselected;
set work.crspraw;
where crspyear=2016;
if (PRC < 0)
then delete;
where ticker = 'SKYW';
run;
data work.crspselected;
set work.crspraw;
where ticker = 'SKYW';
where crspyear=2016;
where=(PRC ne .) ;
run;
Instead of using a function to remove negative and missing values, it can be done more simply when inputting or outputting the data. It can also be done with only one data step:
data work.crspselected(where = (PRC >= 0 & PRC ^= .)); * drop negative and missing values;
set work.crspraw;
where crspyear = 2016 and ticker = 'SKYW'; * one WHERE statement, because a second one would replace the first;
run;
The section that does it is:
(where = (PRC >= 0 & PRC ^= .))
Which can be done for either the input dataset (work.crspraw) or the output dataset (work.crspselected).
If you must use a function, then the function missing() includes only missing values as per this answer. Hence ^missing() would do the opposite and include only non-missing values. There is not a function for non-negative values. But I think it's easier and quicker to do both together simultaneously without a function.
You don't need more than your first test to remove negative and missing values. SAS treats all 28 missing values (., ._, .A ... .Z) as less than any actual number.
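A small sketch of that point, using made-up rows in place of the real work.crspraw data. The single test PRC < 0 removes both the negative and the missing price:

```sas
/* Made-up sample rows for illustration */
data crspraw;
  input ticker $ crspyear PRC;
  datalines;
SKYW 2016 25.5
SKYW 2016 -3
SKYW 2016 .
SKYW 2015 10
AAPL 2016 100
;
run;

data crspselected;
  set crspraw;
  /* Combine both conditions in one WHERE statement */
  where ticker = 'SKYW' and crspyear = 2016;
  /* Missing sorts below every number, so this also deletes missing PRC */
  if PRC < 0 then delete;
run;
```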

Loop through a set of variables based on condition in another variable

I have a list of variables a_23 a_24_1 a_24_2 a_24_3 a_24_4 a_24_5 a_24_6 a_24_7 a_24_8.
The values in variables a_24* are based on the response in a_23.
If a_23==1, then at least one variable in a_24* must be equal to 1.
I therefore want to check if any of the variables a_24* does not contain the value 1 if a_23==1
I tried the loop below,
foreach var of varlist a_24_1* {
br a_23 a_24* if a_23==1 & `var' != 1
}
but it returns all the variables that do not contain 1 in the set of variables. However, I only need cases where all variables do not contain the value 1 if the determining variable is equal to 1.
A data example as well as code would be a good idea, so that you then base your question on an MCVE: see https://stackoverflow.com/help/mcve for explanation.
As I understand it an intermediate variable would help here:
egen maxa_24 = rowmax(a_24_*)
as the maximum will be 0 if and only if all values are 0.
Note that your loop
foreach var of varlist a_24_1* {
br a_23 a_24* if a_23 == 1 & `var' != 1
}
is a loop over the single variable a_24_1; presumably you mean a_24_* in the foreach line.

min/max function work-around in SAS

Essentially, what I would like to do is use the min/max function while altering a table. I am altering a table, adding a column, and then having that column set to a combination of a min/max function. In SAS, however, you can't use summary functions. Is there a way to go around this?
There are many more inputs but for the sake of clarity, a condensed version is below! Thanks!
%let variable = 42
alter table X add Z float;
update X
set C = min(max(0,500 - %sysevalf(variable)),0);
First, let's remove the %sysevalf() calls, which are not needed, and format the code for readability:
alter table claims.simulation add Paid_Claims_NoISL float;
update claims.simulation
  set Paid_Claims_NoISL
    = min(
        max(0
          , Allowed_Claims - &OOPM
          , min(Allowed_Claims
              , &Min_Paid + max(Allowed_Claims - &Deductible * &COINS
                              , 0
                              )
              )
          , &Ind_Cap_Claim
          )
      );
Notice that the first min() only has 1 argument. That is causing your ERROR. SAS thinks that because it only has 1 input, you want to summarize a column, which is not allowed in an update.
Just take that out and it should work:
alter table claims.simulation add Paid_Claims_NoISL float;
update claims.simulation
  set Paid_Claims_NoISL
    = max(0
        , Allowed_Claims - &OOPM
        , min(Allowed_Claims
            , &Min_Paid + max(Allowed_Claims - &Deductible * &COINS
                            , 0
                            )
            )
        , &Ind_Cap_Claim
        );
To reference the value of a macro variable you need to use &.
%let mvar = 42;
proc sql;
update X set C = min(max(0,500 - &mvar),0);
quit;
Note there is no need to use the macro function %SYSEVALF() since SAS can more easily handle the case when the value &mvar is an expression than the macro processor can.
%let mvar = 500 - 42;
proc sql;
update X set C = min(max(0,&mvar),0);
quit;
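A self-contained sketch that pulls these points together. The table contents and the column A are made up for illustration. With two or more arguments, max() in PROC SQL is evaluated row by row, so it is allowed in an update:

```sas
/* Made-up table for illustration */
data X;
  input A;
  datalines;
100
600
;
run;

%let mvar = 42;

proc sql;
  alter table X add C float;
  /* Two arguments: a row-wise function, not a column summary */
  update X set C = max(0, A - &mvar);
quit;
```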