Naive Bayes Classifier in SAS - sas

I am attempting to classify a dataset on heart disease using the naive Bayes classifier in SAS. The dataset I am using can be found on Kaggle at https://www.kaggle.com/johnsmith88/heart-disease-dataset. The code I am using is below.
data heart_train;
set heart_train;
if target="0" then class_diease="2";
if target="1" then class_diease="1";
run;
%nb( train=heart_train, score=heart_test,
nclass=2,
target=class_diease, inputs=age sex cp trestbps chol
fbs restecg thalach exang oldpeak slope ca thal)
%end;
When I run this code I get a lot of errors, such as:
ERROR: Expression using equals (=) has components that are of different data types.
ERROR: Column sex could not be found in the table/view identified with the correlation name B.
ERROR: Column sex could not be found in the table/view identified with the correlation name B.
ERROR: WHERE clause operator requires compatible variables.
Can anyone tell me why my code isn't working?
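One way to start narrowing this down, as a minimal sketch: the type-mismatch errors suggest that some variable is stored as numeric in one place and compared as character in another, and the "Column sex could not be found" errors suggest that the scoring table may not contain every input. Assuming heart_train and heart_test both exist in WORK, PROC CONTENTS lists each variable's name and type so the two tables can be compared. How the %nb macro uses these variables internally is not shown here, so this is only a diagnostic step, not a fix.

```sas
/* Diagnostic sketch: list variable names and types in both tables. */
/* Compare the Type column for target, class_diease and each input. */
proc contents data=heart_train varnum;
run;

/* Check that every variable named in inputs= also exists here. */
proc contents data=heart_test varnum;
run;
```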

Related

SAS transformation and missing data

I'm using the Box-Cox transformation in SAS with PROC TRANSREG, and I was wondering how SAS handles missing data.
I have a dataset that includes one row per month per participant, with a continuous variable measured every month. For some months, the variable is missing. The Box-Cox formula itself doesn't depend on the distribution of the variable, so how does SAS deal with these rows? Does it exclude the missing data?
Below is my code to apply the boxcox transformation to my variable:
PROC TRANSREG DATA=myfile DETAILS;
MODEL BOXCOX(myvariable/ parameter=0.1) = identity(month);
OUTPUT OUT= transformed_myfile;
RUN;
Thanks!
From the documentation:
PROC TRANSREG can estimate missing values, with or without category or monotonicity constraints, so that the regression model fit is optimized. Several approaches to missing data handling are provided. All observations with missing values in IDENTITY, CLASS, POINT, EPOINT, QPOINT, SMOOTH, PBSPLINE, PSPLINE, and BSPLINE variables are excluded from the analysis. When METHOD=UNIVARIATE (specified in the PROC TRANSREG or MODEL statement), observations with missing values in any of the independent variables are excluded from the analysis. When you specify the NOMISS a-option, observations with missing values in the other analysis variables are excluded. Otherwise, missing data are estimated, and the variable means are the initial estimates.
(Emphasis added). You can add various transformations as you prefer, or go with SAS's default estimates.
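As a rough sketch of what the quoted passage describes, the NOMISS a-option can be added to the MODEL statement so that observations with missing values are excluded from the analysis instead of being estimated. The dataset and variable names are taken from the question; whether excluding the rows is the right choice for this data is an assumption.

```sas
/* Sketch: same model as in the question, with NOMISS added so that   */
/* rows with a missing myvariable are excluded rather than estimated. */
proc transreg data=myfile details;
  model boxcox(myvariable / parameter=0.1) = identity(month) / nomiss;
  output out=transformed_myfile;
run;
```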

SAS propensity score matching: Observations considered for matching in PSMATCH is less than the total observations available in the data set

I am using SAS procedure PSMATCH to balance the cohorts. I am calculating the propensity score separately using logistic regression and then using the generated dataset in PSMATCH using PSDATA. I am doing multiple iterations of matching (to get the best results) by bringing variation in region, method (Optimal, Greedy and variable ratio), distance variable, caliper value and ratio. Please find the code below:
proc psmatch data=work.&data_set. region=&region_var.;
class &cat_var.;
psdata treatvar = case_cntrl_fl(Treated='1') PS=prop_score;
match method=&mtch_method.(&k_method.=&k_val.) exact= &.exact_mtch_var.
stat=&stat_var. caliper(mult=stddev)=&caliper_var.;
assess lps ps var=(prop_score &covar_asses.) / plots = (boxplot cloudplot);
output out(obs=match)=WORK.psm ps=ps lps=lps matchid=_MatchID matchwgt = _MATCHWGT_;
run;
My concern is the number of observations considered for matching (i.e. All Observations). The logistic regression data set contains Treatment Arm 1: 531 and Treatment Arm 2: 3252 observations. However, the PSMATCH report shows All Observations as Treatment Arm 1: 446 and Treatment Arm 2: 2784. The result is the same regardless of the PSMATCH method used.
Can somebody help me understand the possible reason for the drop in counts?
You likely have missing values in your data. If any variable used in the procedure has a missing value in a row, that entire row is excluded from the analysis.
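A quick way to check this, as a minimal sketch, is to count missing values per variable in the input data set. The data set reference reuses the &data_set. macro variable from the question; PROC MEANS only covers the numeric variables, so character variables such as class variables would need a separate check.

```sas
/* Sketch: N and NMISS for every numeric variable in the input data. */
/* A nonzero NMISS on a variable used in PROC PSMATCH would explain  */
/* rows being dropped from the All Observations count.               */
proc means data=work.&data_set. n nmiss;
  var _numeric_;
run;
```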

In SAS: How to produce odds ratios (OR) in the results of PROC GENMOD

Possibly related to this question: How can I print odds ratios as part of the results of a GENMOD procedure?
I am dealing with a wide dataset containing a main exposure variable, a categorical variable Type (four levels), as well as several continuous and binary variables as confounding factors.
Additional info: The dataset contains multiple imputations.
I am using the following code:
Proc genmod;
Class ID Type (ref=first)
Model class1= Type;
estimate 'black' TYPE 0 1 1/exp;
estimate 'white' TYPE 1 0 1/exp;
estimate 'red' TYPE 0 1 0/exp;
Repeated ID;
By imputation;
Run;
I expected the results table to contain, among other things, the exponentiated beta for every level of the categorical variable Type (bar that variable's reference group). The actual results table lacks beta values, and no confidence intervals are printed.
What syntax should I use to tell SAS to produce those numbers in the results? I have looked through the SAS documentation, but I have yet to find an answer.

ERROR: Expression using IN has components that are of different data types

I am using the query below in SAS Enterprise Guide to count the distinct customers for each offer_id within a date range:
PROC SQL;
CREATE TABLE test1 as
select offer_id,
(Count(DISTINCT (case when date between '2016-11-13' and '2016-12-27' then customer_id else 0 end))) as CUSTID
from test
group by offer_id
;QUIT;
ERROR: Expression using IN has components that are of different data types
Note: Here, offer_id is a character variable whereas customer_id is a numeric variable.
Most likely the error is caused by comparing the numeric variable DATE to character strings such as '2016-11-13'. If you want to specify a date literal in SAS you must write the date in DATE9 style and append the letter D after the closing quote.
date BETWEEN '13NOV2016'd AND '27DEC2016'd
Note that there is no reference to any external database in the posted code. But even if your source table was tdlib.tdtable instead of work.test you still need to use SAS syntax when writing SAS code. Let the Teradata engine figure out how to convert it for you.
You don't make it clear whether this is being run on SAS or Teradata (via pass through).
I'm guessing SAS, in which case you are missing the d suffix on your date literals, which also need to be written in DATE9 style (e.g. '13NOV2016'd). Without this, the dates are treated as text instead of numeric date values.
The error message is slightly misleading, as SAS is treating the BETWEEN operator as an IN operator.
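Putting the answers together, the query might look like the sketch below. It assumes that date is a numeric SAS date variable in work.test. The change from else 0 to a missing value is an additional suggestion not made in the answers: COUNT ignores missing values, whereas a 0 would be counted as one more distinct customer.

```sas
/* Sketch: date literals in DATE9 style with the d suffix.          */
/* Rows outside the range return missing (.), which COUNT ignores.  */
proc sql;
  create table test1 as
  select offer_id,
         count(distinct case
                 when date between '13NOV2016'd and '27DEC2016'd
                 then customer_id
                 else .
               end) as custid
  from test
  group by offer_id;
quit;
```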

Datatype mismatch converting SAS numeric to Teradata BIGINT

I have a SAS dataset with a numeric variable ACCT_ID (among other fields). Its attributes in a PROC CONTENTS are:
# Variable Type Len Format Informat Label
1 ACCT_ID Num 8 19. 19. ACCT_ID
I know that this field doesn't have any non-integer values in it, so I want to store it as a BIGINT in Teradata, and I've specified this with the dbtype data set option like this:
data td.output(dbtype=(ACCT_ID="BIGINT", <etc etc>));
However, this gives the following error:
ERROR: Datatype mismatch for column: ACCT_ID.
There are no missing or non-integer values in that field, and the error persists even if I round ACCT_ID using round(acct_id, 1) to explicitly remove any floating point values that could exist.
Strangely enough, no error is given if I assign this to be a DECIMAL(18,0) in Teradata rather than a BIGINT. I guess that could be one workaround, but I'd like to understand how I can create integer fields in Teradata from SAS numeric variables like this given that SAS doesn't distinguish types between integer and floating point.
SAS does not support the BIGINT datatype. See http://support.sas.com/kb/34/729.html.
Teradata's BIGINT data type is not supported in SAS/ACCESS Interface
to Teradata. You cannot read or update a table containing a column
with the BIGINT data type in SAS/ACCESS Interface to Teradata.
Attempting to do so generates the following error message:
ERROR: At least one of the columns in this DBMS table has a datatype that is
not supported by this engine.
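Given that limitation, the workaround mentioned in the question can be sketched as follows. The input data set name work.source_data is hypothetical, and td is assumed to be the Teradata libref from the question; DECIMAL(18,0) is the type the question reports as loading without error.

```sas
/* Sketch: store the integer-valued ACCT_ID as DECIMAL(18,0) instead */
/* of BIGINT. work.source_data is a placeholder for the real input.  */
data td.output(dbtype=(ACCT_ID="DECIMAL(18,0)"));
  set work.source_data;
run;
```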