I think I am getting closer in SAS

I am trying to create a logistic regression model, but I keep getting error messages when I try to run it.
Here is my sample code:
proc logistic data = cleaned_anes descending;
class default / param=glm;
model default = student balance income;
run;
It says my variables are not found, but they are in my Excel spreadsheet.
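One way to narrow this down is to list what the data set actually contains. This is only a diagnostic sketch: it assumes the spreadsheet may have been imported under a different data set name, or with column names that differ from default, student, balance and income.

```sas
/* List the variables that cleaned_anes really contains, in column order. */
/* If default, student, balance and income are not listed, the model      */
/* statement is pointing at the wrong data set or the wrong names.        */
proc contents data=cleaned_anes varnum;
run;
```

If the names are missing from this listing, the fix belongs in the import step rather than in proc logistic.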

Related

PROC GENMOD Error: Nesting of continuous variable not allowed

I am doing cross-sectional logistic regression modeling of the probability of an event in eyes. Each patient is assigned a PatientID and each eye is assigned an EyeID; there are 2 eyes per patient.
I have attached my code below.
PROC GENMOD data=new descend;
class patientID Explan1(ref="0") Explan2(ref ="0") Gender(ref="M") / param=ref;
model Therapy = PVD_STATUS Explan1 Explan2 Explan3 Gender/ dist=bin;
repeated subject=patientID(EyeID) / corr=unstr corrw;
run;
I get this error code: ERROR: Nesting of continuous variable not allowed.
This could be an issue related to the following statement:
repeated subject=patientID(EyeID)
Has anyone encountered this before? Possible solutions?
Set EyeID as a class variable. SAS assumes that it is continuous unless otherwise defined.
PROC GENMOD data=new descend;
class EyeID patientID Explan1(ref="0") Explan2(ref ="0") Gender(ref="M") / param=ref;
model Therapy = PVD_STATUS Explan1 Explan2 Explan3 Gender/ dist=bin;
repeated subject=patientID(EyeID) / corr=unstr corrw;
run;

Can SAS Score a Data Set to an ARIMA Model?

Is it possible to score a data set with a model created by PROC ARIMA in SAS?
This is the code I have that is not working:
proc arima data=work.data;
identify var=x crosscorr=(y(7) y(30));
estimate outest=work.arima;
run;
proc score data=work.data score=work.arima type=parms predict out=pred;
var x;
run;
When I run this code I get an error from the PROC SCORE portion that says "ERROR: Variable x not found." The x column is in the data set work.data.
proc score does not support autocorrelated variables. The simplest way to get an out-of-sample score is to combine both proc arima and a data step. Here's an example using sashelp.air.
Step 1: Generate historical data
We leave out the year 1960 as our score dataset.
data have;
set sashelp.air;
where year(date) < 1960;
run;
Step 2: Generate a model and forecast
The nooutall option tells proc arima to only produce the 12 future forecasts.
proc arima data=have;
identify var=air(12);
estimate p=1 q=(2) method=ml;
forecast lead=12 id=date interval=month out=forecast nooutall;
run;
Step 3: Score
Merge together your forecast and full historical dataset to see how well the model did. I personally like the update statement because it will not replace anything with missing values.
data want;
update forecast(in=fcst)
sashelp.air(in=historical);
by Date;
/* Generate fit statistics */
Error = Forecast-Air;
PctError = Error/Air;
AbsPctError = abs(PctError);
/* Helpful for bookkeeping */
if(fcst) then Type = 'Score';
else if(historical) then Type = 'Est';
format PctError AbsPctError percent8.2;
run;
You can take this code and convert it into a generalized macro for yourself. That way in the future, if you wanted to score something, you could simply call a macro program to get what you need.
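A rough sketch of such a macro follows. The macro name arima_score and all of its parameters are hypothetical, and the model specification (p=1 q=(2)) is simply carried over from the example above, so it would need to be changed for other series. It also assumes both input data sets are sorted by the id variable.

```sas
/* Hypothetical macro: the name and parameters are illustrative only. */
%macro arima_score(train=, actual=, var=, id=, diff=12, lead=12,
                   interval=month, out=want);

    /* Fit on the training data and keep only the future forecasts */
    proc arima data=&train;
        identify var=&var(&diff);
        estimate p=1 q=(2) method=ml;   /* model taken from the example above */
        forecast lead=&lead id=&id interval=&interval
                 out=_forecast nooutall;
    run;
    quit;

    /* Overlay the actual values on the forecasts */
    data &out;
        update _forecast(in=fcst)
               &actual(in=historical);
        by &id;

        /* Fit statistics */
        Error       = Forecast - &var;
        PctError    = Error / &var;
        AbsPctError = abs(PctError);

        /* Bookkeeping */
        length Type $5;
        if fcst then Type = 'Score';
        else if historical then Type = 'Est';

        format PctError AbsPctError percent8.2;
    run;

%mend arima_score;

/* Reproduces the three steps above for sashelp.air */
%arima_score(train=have, actual=sashelp.air, var=air, id=date);
```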

ods output in proc logistic

I am running proc logistic with selection=score to get the best model based on the chi-square value. Here is the code:
options symbolgen;
%let input_var=ABC_DEF_CkkkkkedHojjjjjerRen101 dept_gert home_value
child_household ;
ods output bestsubsets=score;
proc logistic data=trail;
model response(event='Y')=&input_var
/ selection=score best=1;
run;
The output dataset named score has been generated through ods output. Below is the image of the data set.
score data set image
In the score data set, the "variables included in model" column shows only part of the variable name "ABC_DEF_CkkkkkedHojjjjjerRen101", not the entire name. Why is this happening, and how do I get the entire variable name?
Add NAMELEN=32 to your proc logistic statement.
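Applied to the code from the question, that would look roughly like this (a sketch; everything except the added namelen=32 option is copied from the question, including the data set name trail and the &input_var macro variable):

```sas
ods output bestsubsets=score;
proc logistic data=trail namelen=32;   /* namelen= widens the displayed variable names */
    model response(event='Y') = &input_var
          / selection=score best=1;
run;
```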

How to calculate regression coefficient and put it into each row of a table

I have a SQL query that creates, for each customer, a short excerpt of their history. Suppose the columns I am interested in are TIME_STAMP and PURCHASE_VALUE. I'd like to calculate a linear regression for each customer and put the resulting coefficient into a table.
proc sql;
create table CUSTOMERHISTORY as
select
TIME_STAMP
,PURCHASE_VALUE
,CUSTOMER_ID
from <my data source>
;quit;
The table is quite large; it would be best if it did not have to be loaded into RAM before the computation.
I tried
proc reg
data = CUSTOMERHISTORY;
model PURCHASE_VALUE=TIME_STAMP;
outest = OUTTABLE;
by CUSTOMER_ID;
but it never wrote anything to the OUTTABLE. (I found parameter outest in http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_reg_sect007.htm )
According to the documentation you linked to, outest is an option of the proc reg statement itself, not a separate statement. To get that output, your code should look like this:
proc reg
data = CUSTOMERHISTORY
outest = OUTTABLE;
model PURCHASE_VALUE=TIME_STAMP;
by CUSTOMER_ID;
run;
Note that there is no semicolon between data = ... and outest = ....
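One more thing to watch for: BY-group processing expects the input to be sorted (or indexed) by CUSTOMER_ID. A slightly fuller sketch, assuming the table is not already sorted; the proc sort step, the noprint option and the closing quit are my additions, not part of the question:

```sas
/* BY processing needs the rows grouped by customer */
proc sort data=CUSTOMERHISTORY;
    by CUSTOMER_ID;
run;

/* One regression per customer; noprint suppresses the printed output */
/* so only the OUTTABLE data set is produced.                         */
proc reg data=CUSTOMERHISTORY outest=OUTTABLE noprint;
    model PURCHASE_VALUE = TIME_STAMP;
    by CUSTOMER_ID;
run;
quit;
```

OUTTABLE should then hold one row per CUSTOMER_ID, with the slope in the TIME_STAMP column and the intercept in the Intercept column.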

SAS selecting top logit models by AIC

I have a problem with SAS proc logistic.
I was using the following procedures when I had OLS regression and everything worked OK:
proc reg data = input_data outest = output_data;
model y = x1-x25 / selection = cp aic stop = 10;
run;
quit;
Here I wanted SAS to estimate all possible regressions using combinations of 25 regressors (x1-x25) including no more than 10 regressors in model.
Basically, I want to do the same thing (estimate all possible models having 25 regressors with no more than 10 included in a model and output top-models in a dataset with corresponding AIC) but with logistic regression.
I also know that I can use selection = score in Proc Logistic, but I'm not sure how to use outest= with it, or whether the score chi-square is really a reliable alternative to cp and AIC in proc reg.
So far, I know how to do stepwise/backward/forward logistic regressions, but these methods do not suit me well, and they write only the top-1 model to the output dataset, while I want at least the top 100.
Any help or advice will be highly appreciated!