SAS error in PROC HPBIN: "is class variable, only interval variable is supported"

I ran this query and got this error message:
ERROR: num_bankrupt_iva is class variable, only interval variable is supported.
/*WoE and IV Analysis*/
proc hpbin data=train numbin=5;
input monthly_installment loan_balance bureau_score2 num_bankrupt_iva time_since_bankrupt
num_ccj2 time_since_ccj ccj_amount num_bankrupt num_iva min_months_since_bankrupt
ltv arrears_months origination_date maturity_date arrears_status arrears_segment
mob remaining_mat loan_term live_status repaid_status /*year quarter*/
month worst_arrears_status max_arrears_12m recent_arrears_date months_since_2mia
avg_mia_6m max_arrears_bal_6m max_mia_6m avg_bal_6m avg_bureau_score_6m
cc_util annual_income emp_length months_since_recent_cc_delinq;
ods output mapping=mapping;
run;
The log shows these errors:
NOTE: Binning methods: BUCKET BINNING .
ERROR: num_bankrupt_iva is class variable, only interval variable is supported.
ERROR: time_since_bankrupt is class variable, only interval variable is supported.
ERROR: time_since_ccj is class variable, only interval variable is supported.
ERROR: ccj_amount is class variable, only interval variable is supported.
ERROR: num_bankrupt is class variable, only interval variable is supported.
ERROR: num_iva is class variable, only interval variable is supported.
ERROR: min_months_since_bankrupt is class variable, only interval variable is supported.
ERROR: recent_arrears_date is class variable, only interval variable is supported.
ERROR: months_since_2mia is class variable, only interval variable is supported.
ERROR: avg_bureau_score_6m is class variable, only interval variable is supported.
NOTE: The number of bins is: 5.
NOTE: The HPBIN procedure is executing in single-machine mode.

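It looks as though every variable named in the log is stored as a character variable in your input dataset, and PROC HPBIN treats character variables as class variables. You will need to convert them to numeric with the INPUT() function. Alternatively, you could multiply them by 1 and let SAS do the character-to-numeric conversion automatically (SAS writes a conversion NOTE to the log when it does this).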
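data want;
  set have;
  /* Explicit conversion: read the text with a numeric informat wide enough for the value */
  numvar = input(classvar, best32.);
  /* Implicit conversion: SAS converts automatically and notes it in the log */
  numvar2 = 1*classvar;
run;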
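PROC HPBIN looks its INPUT variables up by name, so it can help to convert the character columns while keeping their original names. A minimal sketch, showing only two of the affected variables; the tiny train_demo dataset is made-up sample data, and the temporary names _c1 and _c2 are hypothetical:

```sas
/* Made-up sample data: numeric-looking columns stored as character */
data train_demo;
  length num_bankrupt_iva time_since_bankrupt $ 8;
  input num_bankrupt_iva $ time_since_bankrupt $ loan_balance;
  datalines;
0 12 1500
1 . 2300
;
run;

data train_num;
  /* Move the character versions out of the way under temporary names */
  set train_demo(rename=(num_bankrupt_iva=_c1 time_since_bankrupt=_c2));
  array chr[*] $ _c1-_c2;
  /* These names no longer exist in the PDV, so the array creates them as numeric */
  array num[*] num_bankrupt_iva time_since_bankrupt;
  do i = 1 to dim(chr);
    num[i] = input(chr[i], best32.);
  end;
  drop i _c1-_c2;
run;
```

You would then point PROC HPBIN at the converted dataset instead of train. Values that are not valid numbers become missing, and INPUT writes an invalid-argument note to the log for them.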

Related

Pass a character value dynamically to a macro

I'm trying to pass a file name to a macro. The macro runs once a month, so I'm trying to store the output file with a month prefix. In the current code, someone has to provide a file name manually every month (Sep17_Sales, Oct17_Sales, etc.). I want to automate this so that SAS generates files with the name of the month prefixed to the data file.
Macro:
%macro sales (outdata = , dt =);
Current code:
%Sales(Outdata = Sep17_Sales, dt = '2017-09-01');
%Sales(Outdata =Oct17_Sales, dt ='2017-10-01');
My approach:
data _null_;
current_date = today();
current_month = intnx('month', current_date, 0, "beginning");
Name = "_Sales";
Result = put(current_month, monyy7.) || name;
run;
%Sales(Outdata=Result, dt='2017-10-01');
When I try to pass the parameter, it throws an error. I tried changing Result to %let Result and passing a reference &Result to the macro, but that also fails.
Any suggestions on how to solve this? Thank you for all the help!
What you are doing there is assigning a value to a data step variable called Result. The name Result means nothing outside that data step, so it does not resolve to anything when you call your macro. Instead, you are telling your macro that your output file should literally be called "Result".
You could fix that by replacing your Result= line with call symput('Result',put(current_month, monyy7.) || name);, which creates a macro variable called Result, and then calling your sales macro like so: %Sales(Outdata=&Result, dt='2017-10-01');
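Put together, the CALL SYMPUT fix described above looks roughly like this. It is a sketch: the %sales stub is hypothetical, because the question does not show the real macro body, and it uses CALL SYMPUTX (which trims leading and trailing blanks) in place of CALL SYMPUT:

```sas
/* Hypothetical stub: the real %sales macro body is not shown in the question */
%macro sales(outdata=, dt=);
  %put NOTE: outdata=&outdata dt=&dt;
%mend sales;

data _null_;
  /* First day of the current month */
  current_month = intnx('month', today(), 0, 'beginning');
  /* Store a value such as OCT2017_Sales in the global macro variable RESULT */
  call symputx('Result', cats(put(current_month, monyy7.), '_Sales'), 'G');
run;

/* &Result is resolved before %sales receives it */
%sales(outdata=&Result, dt='2017-10-01');
```

The DATA step has to finish before the macro call, which is why the call sits after RUN. Note that monyy7. produces a four-digit year (OCT2017); monyy5. would give OCT17.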
Or, you could scratch all that and simply call your macro like this:
%sales(outdata=%sysfunc(today(),monyy7.)_Sales, dt='2017-10-01');
Going further, assuming the second argument (dt) is always meant to be the first day of the month formatted as yyyy-mm-dd and enclosed in single quotes (although if that is the case I see little use in specifying it as a parameter of the macro), you could make the call even more dynamic:
%sales(outdata=%sysfunc(today(),monyy7.)_Sales, dt=%str(%')%sysfunc(intnx(month,%sysfunc(today()),0,B),E8601DA.)%str(%'));
If that date can be enclosed in double quotes, this can be simplified a little:
%sales(outdata=%sysfunc(today(),monyy7.)_Sales, dt="%sysfunc(intnx(month,%sysfunc(today()),0,B),E8601DA.)");
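Before passing these %SYSFUNC expressions to the macro, it can help to check what they resolve to. A small sketch; the macro variable names outdata and dt are only illustrative:

```sas
/* Build both arguments once, e.g. OCT2017_Sales and "2017-10-01" */
%let outdata = %sysfunc(today(),monyy7.)_Sales;
%let dt = "%sysfunc(intnx(month,%sysfunc(today()),0,B),E8601DA.)";

/* &= prints name and value, e.g. OUTDATA=OCT2017_Sales */
%put &=outdata &=dt;
```

Once the log shows the expected values, the call can be written as %sales(outdata=&outdata, dt=&dt);.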

Is it possible to filter a data step on a newly computed variable?

In a basic data step I'm creating a new variable and I need to filter the dataset based on this new variable.
data want;
set have;
newVariable = 'aaa';
*lots of computations that change newVariable ;
*if xxx then newVariable = 'bbb';
*if yyy AND not zzz then newVariable = 'ccc';
*etc.;
where newVariable ne 'aaa';
run;
ERROR: Variable newVariable is not on file WORK.have.
I usually do this in 2 steps, but I'm wondering if there is a better way.
(Of course, you could always write a complex WHERE statement based on variables present in WORK.have. But in this case the computation of newVariable is too complex, and it is more efficient to do the filtering in a second data step.)
I couldn't find any info on this, I apologize for the dumb question if the answer is in the documentation and I didn't find it. I'll remove the question if needed.
Thanks!
Use a subsetting if statement:
if newVariable ne 'aaa';
In general, if <condition>; is equivalent to if not(<condition>) then delete;. The delete statement tells SAS to abandon this iteration of the data step and go back to the start for the next iteration. Unless you have used an explicit output statement before your subsetting if statement, this will prevent a row from being output.
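A minimal sketch of the subsetting IF applied to the question's pattern; the have data and the two IF conditions are hypothetical stand-ins for the real data and computations:

```sas
/* Made-up sample data */
data have;
  input id x;
  datalines;
1 5
2 15
3 25
;
run;

data want;
  set have;
  length newVariable $ 3;
  newVariable = 'aaa';
  /* Hypothetical stand-ins for the real computations */
  if x > 10 then newVariable = 'bbb';
  if x > 20 then newVariable = 'ccc';
  /* Subsetting IF: tested on the computed value, so only rows 2 and 3 are written */
  if newVariable ne 'aaa';
run;
```

A WHERE= data set option on the output dataset, as in data want(where=(newVariable ne 'aaa'));, also works, because on an output dataset it is applied as observations are written, after newVariable has been computed.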

SAS: use observation values in another statement

I have a question on how to use a value from a SAS dataset in another statement. In my case, I have a dataset with two variables (cell and res). "cell" contains a reference to a cell in an Excel sheet into which the value of "res" should be copied.
So I would like to use the value stored in "cell" in my statement linking to the Excel sheet. This code does not work (concatenating with || fails):
DATA _null_;
SET test;
FILENAME ExcelTmp DDE "EXCEL|[&myInputTemplate.]&mySheet.!" || cell;
FILE ExcelTmp NOTAB LRECL=7000;
PUT res;
RUN;
Error message:
ERROR 23-2: Invalid option name ||.
1491! DDE "EXCEL|[&myInputTemplate.]&mySheet.!" || cell;
ERROR: Error in the FILENAME statement.
ERROR 23-2: Invalid option name cell.
1492 FILE ExcelTmp NOTAB LRECL=7000;
ERROR 23-2: Invalid option name NOTAB.
If I write
FILENAME ExcelTmp DDE "EXCEL|[&myInputTemplate.]&mySheet.!R1C1:R1C1";
then the value is written to cell A1 in Excel.
Is there some similar approach that works without invoking a macro?
Thanks for your help!
Christoph
The usual way to use values from a dataset as part of a command or statement is the CALL EXECUTE routine:
DATA _null_;
SET test;
call execute("DATA _NULL_;");
call execute(cats("FILENAME ExcelTmp DDE ""EXCEL|[&myInputTemplate.]&mySheet.!",cell,""";"));
call execute("FILE ExcelTmp NOTAB LRECL=7000;");
call execute("PUT '"||res||"';");
call execute("RUN;");
run;
This code generates DATA steps that are stacked up in a buffer and executed after the step above finishes. So you will generate as many DATA _NULL_ steps as there are records in your test dataset.
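To check the statements CALL EXECUTE would generate without needing Excel and DDE, you can build the same strings and write them out instead of executing them. A dry-run sketch, assuming cell holds R1C1-style references; the workbook and sheet names and the test_demo data are hypothetical:

```sas
/* Hypothetical workbook and sheet names */
%let myInputTemplate = Book1.xlsx;
%let mySheet = Sheet1;

/* Made-up sample data */
data test_demo;
  length cell $ 20;
  input cell $ res;
  datalines;
R1C1 10
R2C3 42
;
run;

/* Dry run: keep each generated FILENAME statement and print it to the log */
data stmts;
  set test_demo;
  length stmt $ 200;
  stmt = cats("FILENAME ExcelTmp DDE ""EXCEL|[&myInputTemplate.]&mySheet.!", cell, """;");
  put stmt;
run;
```

Once the log shows the statements you expect, the same CATS expression can go back inside CALL EXECUTE.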
Assuming you're trying to update multiple cells, and cell is in the form RnCn, something like this may work.
You also need to determine the cell range beforehand, e.g. R2C2:R100C5.
%LET RANGE = R2C2:R100C5 ;
DATA _null_;
SET test;
FILENAME ExcelTmp DDE "EXCEL|[&myInputTemplate.]&mySheet.!&RANGE" ;
FILE ExcelTmp NOTAB LRECL=7000;
put "[select(""" cell """)]" ;
PUT res;
RUN;

SAS function that will only use non-missing values for a variable?

I am trying to create a new variable that is the sum of other variables. This should be simple enough; however, if one of the variables used in the calculation has a missing value, the new variable is missing as well, whereas I want it to sum across the remaining non-missing variables. For example, the data may look like:
a b c d e
1 . 3 2 6
The new variable is calculated as
newvar=a+b+c+d+e
For the above row, SAS returns a missing value for newvar because b is missing, when I would like it to return
newvar=a+c+d+e
as the answer. Is there a simple way to get SAS to do this?
Sure thing: just use the SUM function, which ignores missing arguments:
data _null_;
a=1;
b=.;
c=3;
d=2;
e=6;
newvar = sum(a,b,c,d,e);
put newvar=; /* writes newvar=12 to the log */
run;
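With real data you usually would not list every variable by hand. A short sketch using a variable-list shortcut, with made-up data: the first row matches the question, and the second shows that SUM returns missing when every argument is missing unless you add a constant 0:

```sas
data demo;
  input a b c d e;
  /* a--e: every variable from a to e, in the order they appear in the PDV */
  total      = sum(of a--e);
  /* A leading 0 makes an all-missing row sum to 0 instead of missing */
  total_zero = sum(0, of a--e);
  datalines;
1 . 3 2 6
. . . . .
;
run;
```

The first row gives total=12; the second gives a missing total and total_zero=0.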

Stata ambiguous abbreviation r(111)

I am trying to draw a marginsplot using Stata 12. I am running the following code:
margins, at(FuncVariant =(0(0.2) 1)) over(Platform)
Following is the error:
FuncVariant ambiguous abbreviation r(111);
I have the following variables whose names start with FuncVariant:
FuncVariant
FuncVariant_mean
FuncVariant_W
Is that creating a problem?
Post the exact result of the following command to get a diagnosis of the issue in your data:
d FuncVariant*
To get rid of the issue, turn the Stata variable abbreviation setting permanently off:
set varabbrev off, perm
tl;dr: you probably don't have a FuncVariant variable in your data.
d FuncVariant*
              storage   display    value
variable name   type    format     label      variable label
--------------------------------------------------------------------------------
FuncVariant     byte    %8.0g
FuncVariant_m~n float   %9.0g
FuncVariant_W   float   %9.0g
I understood that FuncVariant is a dummy variable, so I used FuncVariant_W instead, but it throws an error:
margins, at( FuncVariant_W =-1(0.2)1) over(Platform)
'FuncVariant_W' not found in list of covariates
I get the same error for many other variables, even though they are present in the dataset.