I am trying to create a decision tree in SAS. Whenever I run my SAS code, I get an error message.
Could someone please help?
Sample code:
proc hpsplit data=credit;
class default student;
model default = student balance income
output out=hpsliout;
prune costcomplexity;
run;
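One thing visible in the sample above is that the MODEL statement has no terminating semicolon, so SAS reads the following OUTPUT statement as part of MODEL. Since the error message itself is not shown, this is only a likely cause, not a confirmed one. A minimal corrected sketch, keeping the same dataset and variable names:

```sas
/* Same step as above, with the missing semicolon added after the MODEL statement. */
proc hpsplit data=credit;
  class default student;                    /* categorical variables */
  model default = student balance income;   /* <- semicolon was missing here */
  output out=hpsliout;
  prune costcomplexity;
run;
```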
I'm wondering if someone could help me figure out where I'm going wrong with this HPSPLIT procedure to form a decision tree. Here are the instructions for classification using a decision tree. We are using SAS Studio (University Edition):
"Using the variable default as the response variable, fit a decision tree model with the predictor variables student, balance, and income."
The biggest problem seems to be that the default and student variables are yes/no values and the others are numeric. I can't seem to get the predictor variables student and income to be part of the tree. Here's the code I have. I've tried all sorts of combinations.
ODS GRAPHICS ON;
PROC HPSPLIT DATA=MYFOLDER.DEFAULT;
Class default student;
model default=balance student income;
output out=hpsliout;
prune costcomplexity;
run;
Here's what the tree looks like:
[Figure: subtree starting at node=0]
Thanks for the help!
I am very new to SAS, so this question probably has a fairly easy answer. I use SAS University Edition.
I have a dataset containing socio-structural data with 31 variables and 1,000,000 observations. The data is stored in a Stata .dta file, so I used the following code to import it into SAS:
LIBNAME IN '/folders/myfolders/fake_data';
proc import out= fake_2017 datafile = "/folders/myfolders/fake_data/mz_2017.dta" replace;
run;
Now, I want to create new variables. First, a year variable that takes the value 2017 for all observations. After that, I have several other variables that I want to generate from the 31 existing variables. However, when I run my code, I get the same error message for every step:
year = 2017;
run;
ERROR 180-322: Statement is not valid or it is used out of proper order.
I found many things online, but nothing that helped. What am I doing wrong or forgetting? To me, the code looks like the code in all the SAS tutorial videos that I have watched.
You cannot have an assignment statement outside of a data step. You used a PROC IMPORT step to create a dataset named fake_2017. So now you need to run a data step to make a new dataset where you can create your new variable. Let's call the new dataset fixed_2017.
data fixed_2017;
set fake_2017;
year=2017;
run;
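The same data step can hold the other derived variables too, so one pass over the data is enough. A minimal sketch, assuming a numeric column named birth_year exists in the imported data (that name, and the derived age and age_group variables, are hypothetical placeholders, not taken from the question):

```sas
data fixed_2017;
  set fake_2017;
  year = 2017;                      /* constant for every observation */
  /* birth_year is a hypothetical column; substitute one of the 31 real variables. */
  age = year - birth_year;
  /* Missing numeric values sort below any number in SAS, so check them first. */
  if missing(age) then age_group = .;
  else if age < 18 then age_group = 1;
  else age_group = 2;
run;
```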
Here is the beginning of the code, which reads in the data:
data bmi;
infile "/home/my_courses/bmi.dat" firstobs=2;
input studyid age ChildhoodBMIz sex AdulthoodBMI Obesity;
run;
proc sort data=bmi;
by studyid;
run;
The following is the block of code that I am having trouble with:
* MRM for RC method;
PROC MIXED data=bmi METHOD=ML COVTEST;
CLASS studyid;
MODEL ChildhoodBMIz = age/SOLUTION;
RANDOM INTERCEPT age/SUB=studyid TYPE=un G S cl;
ODS LISTING EXCLUDE SOLUTIONR; ODS OUTPUT SOLUTIONR=randnew SOLUTIONF=fixednew;
RUN;
However, I am receiving an error that I don't quite understand.
ERROR: The LISTING destination is not active; no select/exclude lists are available.
How do I activate the listing destination?
Remove the ODS LISTING EXCLUDE SOLUTIONR; portion of the code and it should successfully run.
The ODS OUTPUT statement saves the random-effects solution table (SolutionR) from your mixed model to a new dataset called randnew:
ODS OUTPUT SOLUTIONR=randnew
Modern SAS interfaces send output to HTML by default. The ODS LISTING EXCLUDE statement suppresses the SolutionR table in the older plain-text LISTING destination, which appears to be closed in your session, hence the error. If this code was autogenerated, it is probably legacy code that no longer applies to modern SAS interfaces, and it is worth reporting to Tech Support.
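If the goal is still to keep the potentially long SolutionR table out of the displayed results, one option is the general ODS EXCLUDE statement, which is not tied to the LISTING destination. The sketch below reuses the step from the question. Alternatively, submitting `ods listing;` before the procedure opens the LISTING destination so the original statement has something to act on. Which of the two suits your setup is an assumption to verify in your own session.

```sas
/* Sketch: same model as in the question, with the destination-specific
   ODS LISTING EXCLUDE replaced by the general ODS EXCLUDE statement. */
proc mixed data=bmi method=ml covtest;
  class studyid;
  model ChildhoodBMIz = age / solution;
  random intercept age / sub=studyid type=un g s cl;
  ods exclude SolutionR;                            /* hide the table in displayed output */
  ods output SolutionR=randnew SolutionF=fixednew;  /* still save it as a dataset */
run;
```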
I am trying to use PROC TABULATE to compute the average price of some configurable items across stores and across months. Below is the sample dataset that I need to process:
Configuration|Store_Postcode|Retail Price|month
163|SE1 2BN|455|1
320|SW12 9HD|545|1
23|E2 0RY|515|1
The code below displays the month-wise average price for each configuration.
proc tabulate data=cs2.pos_data_raw;
class configuration store_postcode month;
var retail_price;
table configuration,month*MEAN*retail_price;
run;
But can I get this grouped one more level, at the store postcode level? I modified the code as shown below, but executing it crashes the system!
proc tabulate data=cs2.pos_data_raw;
class configuration store_postcode month;
var retail_price;
table configuration,store_postcode*month*MEAN*retail_price;
run;
Please advise whether my approach is incorrect, or what I am doing wrong in PROC TABULATE that makes it crash the system.
I am new to SAS, so I am not sure this fully answers your question, but when I switched store_postcode*month*MEAN*retail_price to month*store_postcode*MEAN*retail_price, it ran without crashing. My guess is that this works because your data contains only one value for month but many postcodes, so month is the most general level of categorization and postcode the more specific one.
On a side note, I tried to format the table in another way also to segment the data by postal code:
proc tabulate data=pos_data_raw;
class configuration store_postcode month;
var retail_price;
table store_postcode*configuration, month*MEAN*retail_price;
run;
In the output, the table has postal code and configuration ID on the left and month and retail price on top.
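If the crash comes from the sheer size of the rendered table (an assumption; the question does not say how many distinct postcodes there are), a possible workaround is to compute the same averages into a dataset with PROC MEANS instead of displaying one very wide table. A sketch, where avg_price and avg_retail_price are hypothetical names chosen for illustration:

```sas
/* Sketch: one row per configuration x store_postcode x month combination. */
proc means data=cs2.pos_data_raw noprint nway;
  class configuration store_postcode month;
  var retail_price;
  output out=avg_price mean=avg_retail_price;
run;

/* Inspect only the first rows rather than rendering everything at once. */
proc print data=avg_price(obs=20);
run;
```

The NWAY option keeps only the rows for the full three-way combination, which corresponds to the cells of the crossed table.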
I already have the output from a logistic regression, namely the coefficients. I would like to apply those coefficients to a new dataset, using the variables in the new dataset with the old coefficients to predict a new "y". What should I do? I have already tried PROC SCORE, but I am not sure whether it is the proper way.
You can use the INMODEL= option of PROC LOGISTIC with a SCORE statement. See the example from the SAS documentation:
proc logistic inmodel = your_coefficient_file_from_logistic_run;
score data= new_dataset_to_score out=new_scored_dataset;
run;
Let me know if you have any questions
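Note that INMODEL= expects a dataset written by the OUTMODEL= option of an earlier PROC LOGISTIC run, not a hand-built table of coefficients. A minimal end-to-end sketch, where train_data, y, x1, x2, and model_store are hypothetical names used only for illustration:

```sas
/* Step 1: fit the model and store it (all dataset and variable names are placeholders). */
proc logistic data=train_data outmodel=model_store;
  model y(event='1') = x1 x2;
run;

/* Step 2: apply the stored model to new data without refitting. */
proc logistic inmodel=model_store;
  score data=new_dataset_to_score out=new_scored_dataset;
run;
```

The scored dataset should then contain the predicted probabilities alongside the original variables of the new data.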