min/max function work-around in SAS

Essentially, what I would like to do is use the min/max function while altering a table. I am altering a table, adding a column, and then having that column set to a combination of a min/max function. In SAS, however, you can't use summary functions. Is there a way to go around this?
There are many more inputs but for the sake of clarity, a condensed version is below! Thanks!
%let variable = 42;
alter table X add Z float;
update X
set C = min(max(0,500 - %sysevalf(variable)),0);

First, let's remove the %sysevalf() calls, since they are not needed, and format the code for readability:
alter table claims.simulation add Paid_Claims_NoISL float;
update claims.simulation
   set Paid_Claims_NoISL
       = min(
             max(0
                , Allowed_Claims - &OOPM
                , min(Allowed_Claims
                     , &Min_Paid + max(Allowed_Claims - &Deductible * &COINS
                                      , 0
                                      )
                     )
                , &Ind_Cap_Claim
                )
            );
Notice that the first min() only has 1 argument. That is causing your ERROR. SAS thinks that because it only has 1 input, you want to summarize a column, which is not allowed in an update.
Just take that out and it should work:
alter table claims.simulation add Paid_Claims_NoISL float;
update claims.simulation
   set Paid_Claims_NoISL
       = max(0
            , Allowed_Claims - &OOPM
            , min(Allowed_Claims
                 , &Min_Paid + max(Allowed_Claims - &Deductible * &COINS
                                  , 0
                                  )
                 )
            , &Ind_Cap_Claim
            );

To reference the value of a macro variable you need to use &.
%let mvar = 42;
proc sql;
update X set C = min(max(0,500 - &mvar),0);
quit;
Note there is no need for the macro function %SYSEVALF(): even when the value of &mvar is an expression, SAS itself can evaluate it more easily than the macro processor can.
%let mvar = 500 - 42;
proc sql;
update X set C = min(max(0,&mvar),0);
quit;
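To see both points together, here is a small self-contained sketch. The table X, its Amount column and the cap of 300 are hypothetical, chosen only for illustration. Because MIN and MAX each receive two arguments, PROC SQL evaluates them row by row, which is allowed inside UPDATE ... SET, and the macro variable is referenced with &mvar rather than %SYSEVALF().

```sas
/* Hypothetical sample table X with one numeric column, Amount */
data X;
  input Amount;
  datalines;
100
450
700
;
run;

%let mvar = 500;

proc sql;
  alter table X add C float;
  /* Two-argument MIN/MAX are row-wise functions, so they are allowed here.
     A single-argument call such as min(Amount) would be a summary function. */
  update X
    set C = min(max(0, &mvar - Amount), 300);
quit;

proc print data=X;
run;
```

For the rows with Amount = 100, 450 and 700, C works out to 300, 50 and 0.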

Related

How to correct this sas function in order to have the jaccard distance?

I created a SAS function using FCMP to calculate the Jaccard distance between two strings. I do not want to use macros, as I'm going to apply it across a large dataset for multiple variables. The problem is that some of the substrings are missing.
proc fcmp outlib=work.functions.func;
function distance_jaccard(string1 $, string2 $);
n = length(string1);
m = length(string2);
ngrams1 = "";
do i = 1 to (n-1);
ngrams1 = cats(ngrams1, substr(string1, i, 2) || '*');
end;
/*ngrams1= ngrams1||'*';*/
put ngrams1=;
ngrams2 = "";
do j = 1 to (m-1);
ngrams2 = cats(ngrams2, substr(string2, j, 2) || '*');
end;
endsub;
options cmplib=(work.functions);
data test;
string1 = "joubrel";
string2 = "farjoubrel";
jaccard_distance = distance_jaccard(string1, string2);
run;
I expected ngrams1 and ngrams2 to contain all the substrings of length 2, but instead I got this:
ngrams1=jo*ou*ub
ngrams2=fa*ar*rj
If you want real help with your algorithm you need to explain in words what you want to do.
I suspect your problem is that you never defined how long your new character variables NGRAMS1 and NGRAMS2 should be. From the output you show, it appears that FCMP defaulted them to length $8.
To define a variable you need to use a LENGTH statement (or an ATTRIB statement with the LENGTH= option) before you start referencing the variable.
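As a sketch of that fix, the function below declares its working variable with a LENGTH statement before using it. The function name bigrams and the length of 200 are assumptions for illustration; it only builds the list of two-character substrings and does not compute the Jaccard distance itself.

```sas
proc fcmp outlib=work.functions.func;
  /* Hypothetical helper: returns a character value of up to 200 characters */
  function bigrams(string $) $ 200;
    /* Declare the length first; otherwise the variable may default to $8 */
    length grams $ 200;
    grams = '';
    do i = 1 to length(string) - 1;
      grams = cats(grams, substr(string, i, 2), '*');
    end;
    return (grams);
  endsub;
run;

options cmplib=(work.functions);

data test;
  length g1 g2 $ 200;
  g1 = bigrams("joubrel");
  g2 = bigrams("farjoubrel");
  put g1= g2=;
run;
```

With the length declared, g1 holds all six bigrams of "joubrel" (jo*ou*ub*br*re*el*) instead of stopping after three.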

How to use string as column name in Bigquery

I have a scenario where a BigQuery function receives a string that I need to use as a column name.
Here is the function:
CREATE OR REPLACE FUNCTION METADATA.GET_VALUE(column STRING, row_number int64) AS (
(SELECT column from WORK.temp WHERE rownumber = row_number)
);
When I call this function as select METADATA.GET_VALUE("TXCAMP10",149); I get the value TXCAMP10, so it is effectively processed as SELECT "TXCAMP10" from WORK.temp WHERE rownumber = 149. What I need is SELECT TXCAMP10 from WORK.temp WHERE rownumber = 149, which returns a value from the temp table; let's suppose that value is A.
So ultimately I need the value A instead of the column name TXCAMP10.
I tried using EXECUTE IMMEDIATE, as in execute immediate("SELECT" || column || "from WORK.temp WHERE rownumber =" ||row_number), based on this Stack Overflow post, but it turns out I can't use it in a function.
How do I achieve the required result?
I don't think you can achieve this with a standard SQL UDF in BigQuery.
But it is possible to do this with stored procedures in BigQuery and EXECUTE IMMEDIATE statement. Consider this code, which simulates the situation you have:
create or replace table d1.temp(
c1 int64,
c2 int64
);
insert into d1.temp values (1, 1), (2, 2);
create or replace procedure d1.GET_VALUE(column STRING, row_number int64, out result int64)
BEGIN
EXECUTE IMMEDIATE 'SELECT ' || column || ' from d1.temp where c2 = ?' into result using row_number;
END;
BEGIN
DECLARE result_c1 INT64;
call d1.GET_VALUE("c1", 1, result_c1);
select result_c1;
END;
After some research and trial and error, I used this workaround to solve the issue. It may not be the best solution when you have many columns, but it works.
CREATE OR REPLACE FUNCTION METADATA.GET_VALUE(column_name STRING, row_number int64) AS (
(SELECT case
when column_name = 'a' then a
when column_name = 'b' then b
when column_name = 'c' then c
when column_name = 'd' then d
when column_name = 'e' then e
end from WORK.temp WHERE rownumber = row_number)
);
And this gives the required results.
Point to note: all the columns you use in the CASE expression must have the same data type, otherwise it won't work.

Is there a SAS function to delete negative and missing values from a variable in a dataset?

The variable name is PRC. This is what I have so far: the first block deletes negative values and the second block deletes missing values.
data work.crspselected;
set work.crspraw;
where crspyear=2016;
if (PRC < 0)
then delete;
where ticker = 'SKYW';
run;
data work.crspselected;
set work.crspraw;
where ticker = 'SKYW';
where crspyear=2016;
where=(PRC ne .) ;
run;
Instead of using a function to remove negative and missing values, it can be done more simply when inputting or outputting the data. It can also be done with only one data step:
data work.crspselected;
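/* A WHERE statement is ignored for a data set that also has a WHERE= option,
   and a second WHERE statement replaces the first, so all conditions go in the option */
set work.crspraw(where = (PRC >= 0 & PRC ^= . & crspyear = 2016 & ticker = 'SKYW')); * delete values that are negative or missing;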
run;
The part that removes negative and missing values is the PRC condition of the WHERE= data set option:
(PRC >= 0 & PRC ^= .)
A WHERE= option can be attached to either the input dataset (work.crspraw) or the output dataset (work.crspselected).
If you must use a function, the MISSING() function is true only for missing values, as per this answer, so ^missing(PRC) does the opposite and keeps only non-missing values. There is no function for non-negative values, but I think it's easier and quicker to handle both conditions together without a function.
You don't need more than your first test to remove negative and missing values. SAS treats all 28 missing values (., ._, .A ... .Z) as less than any actual number.
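A small sketch of that last point, using made-up rows (the tickers, years and prices below are hypothetical): a single WHERE statement with PRC >= 0 drops both the negative and the missing price, because missing values compare as smaller than any number.

```sas
/* Hypothetical sample data for illustration only */
data work.crspraw;
  input ticker $ crspyear PRC;
  datalines;
SKYW 2016 45.2
SKYW 2016 -44.9
SKYW 2016 .
SKYW 2015 40.1
AAPL 2016 115.8
;
run;

data work.crspselected;
  set work.crspraw;
  /* Missing PRC sorts below every number, so PRC >= 0 also excludes it */
  where ticker = 'SKYW' and crspyear = 2016 and PRC >= 0;
run;
```

Only the first row (SKYW, 2016, 45.2) should remain in work.crspselected.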

SAS: When using user defined formats, if there's not a match, "default value" is the unformatted input variable?

In SAS EG, I have a user defined format
value $MDC
'001' = '77'
'002' = '77'
...
'762' = '14'
etc.
My data set has DRG_code string variables with values like '001' and '140'.
I was trying to create a new variable, with the below code.
MDC = put(DRG_code, $MDC.);
However, there are more values of DRG_code in my data set than are specified in the user-defined format $MDC.
For example, when DRG_code equals '140', a value that does not exist in the user-defined format, the PUT function returns MDC = '14' (which should only be the value when the DRG code is '762').
Is there a way to make sure my put statement only returns a value from the user defined format when a corresponding value is present?
Grateful for feedback.
Lori
I've tried using a LENGTH statement to make my PUT function return 3 characters, which I thought would result in "140" instead of "14", but that didn't work.
Formats have a DEFAULT width. If you do not specify a width when using the format then SAS will use the default width. When making a user defined format PROC FORMAT will set the default width to the maximum width of the formatted values. In your example the default width is being set to 2.
You can override that when you use the format.
MDC = put(DRG_code, $MDC3.);
Or you could define the default when you define the format.
value $MDC (default=3)
'001' = '77'
'002' = '77'
...
'762' = '14'
;
You can also set a default value for the unmatched values using the other keyword.
value $MDC (default=3)
'001' = '77'
'002' = '77'
...
'762' = '14'
other = 'UNK'
;
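For a runnable version of the OTHER example, the sketch below defines the format and applies it to matched and unmatched codes. The sample codes are illustrative; with DEFAULT=3 the label 'UNK' fits without truncation.

```sas
proc format;
  value $MDC (default=3)
    '001' = '77'
    '002' = '77'
    '762' = '14'
    other = 'UNK'   /* label used for any code not listed above */
  ;
run;

data _null_;
  length var $ 3 mdc $ 3;
  do var = '001', '140', '762';
    mdc = put(var, $MDC.);
    put var= mdc=;
  end;
run;
```

The unmatched code '140' maps to UNK, while '001' and '762' keep their listed labels.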
You can even nest a call to another format for the unmatched values (or any target format). In which case you do not need to specify the default width since the width on the nested format will be used when defining the default width.
value $MDC
'001' = '77'
'002' = '77'
...
'762' = '14'
other = [$3.]
;
I presume all the value mappings were $2, because that width is also what is used for an 'unfound' source value. To make sure unfound values are not truncated, give one of the formatted values trailing spaces that pad it out to the length of the longest unfound value.
value $MDC
'001' = '77     ' /* 7 characters, presuming no DRG_code exceeds 7 characters */
'002' = '77'
'762' = '14'
;
You can also fix this by specifying a length to use when applying the format, e.g.
proc format;
value $MDC
'001' = '77'
'762' = '14'
;
run;
data _null_;
do var = '001','140','762';
var_formatted = quote(put(var,$MDC3.));
put var= var_formatted=;
end;
run;
Output:
var=001 var_formatted="77 "
var=140 var_formatted="140"
var=762 var_formatted="14 "
N.B. both this solution and Richard's will result in trailing whitespace being added to formatted values, as you can see from the quotes.
Here I propose a slight modification to user667489's solution so that:
you don't need to specify the length of the format every time you use it (using the default option of the value statement when defining the format)
the resulting formatted value doesn't have trailing blanks (using the trim() function on the output resulting from applying the format)
i.e.
proc format;
value $MDC(default=3)
'001' = '77'
'002' = '77'
'762' = '14'
;
run;
data _null_;
do var = '001', '140', '762';
var_formatted = quote(trim(put(var, $MDC.)));
put var= var_formatted=;
end;
run;
which gives the following output:
var=001 var_formatted="77"
var=140 var_formatted="140"
var=762 var_formatted="14"

Error surrounding use of scan(&varlist) + Comparison of macro variables

As a follow up to this question, for which my existing answer appears to be best:
Extracting sub-data from a SAS dataset & applying to a different dataset
Given a dataset dsn_in, I currently have a set of macro variables max_1 - max_N that contain numeric data. I also have a macro variable varlist containing a list of variables. The two sets of macros are related such that max_1 is associated with scan(&varlist, 1), etc. I am trying to compare the data values within dsn_in for each variable in varlist to the associated comparison values max_1 - max_N. I would like to output the updated data to dsn_out. Here is what I have so far:
data dsn_out;
set dsn_in;
/* scan list of variables and compare to decision criteria.
if > decision criteria, cap variable */
do i = 1 by 1 while(scan(&varlist, i) ~= '');
if scan("&varlist.", i) > input(symget('max_' || left(put(i, 2.))), best12.) then
scan("&varlist.", i) = input(symget('max_' || left(put(i, 2.))), best12.);
end;
run;
However, I'm getting the following error, which I don't understand. options mprint; shown. SAS appears to be interpreting scan as both an array and a variable, when it's a SAS function.
ERROR: Undeclared array referenced: scan.
MPRINT(OUTLIERS_MAX): if scan("var1 var2 var3 ... varN", i) > input(symget('max_'
|| left(put(i, 2.))), best12.) then scan("var1 var2 var3 ... varN", i) =
input(symget('max_' || left(put(i, 2.))), best12.);
ERROR: Variable scan has not been declared as an array.
MPRINT(OUTLIERS_MAX): end;
MPRINT(OUTLIERS_MAX): run;
Any help you can provide would be greatly appreciated.
The specific issue you have here is that you place SCAN on the left side of an equal sign. That is not allowed: SUBSTR can be used on the left side of an assignment in this way, but SCAN cannot.
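A common way around this is to reach the variables through an ARRAY instead of SCAN, since array references may appear on the left side of an assignment. The sketch below assumes varlist holds numeric variable names and that max_1 through max_N are defined in the same order; the sample data and caps are hypothetical. SYMGET still looks up each cap at run time, as in the question.

```sas
/* Hypothetical sample data and caps, for illustration only */
data dsn_in;
  input var1 var2 var3;
  datalines;
5 50 500
15 5 900
;
run;

%let varlist = var1 var2 var3;
%let max_1 = 10;
%let max_2 = 20;
%let max_3 = 600;

data dsn_out;
  set dsn_in;
  /* The array elements are the variables named in &varlist, in order */
  array capvars{*} &varlist;
  do i = 1 to dim(capvars);
    /* Look up max_1, max_2, ... for the i-th variable */
    cap = input(symget(cats('max_', i)), best32.);
    if capvars{i} > cap then capvars{i} = cap;
  end;
  drop i cap;
run;
```

Using IF ... THEN rather than MIN() keeps missing values missing: MIN() ignores missing arguments, so it would replace a missing value with the cap.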