I am really new to SAS and I am struggling. I am trying to create a logistic regression model. The three variables I need are student, balance, and income, but I keep getting an error message. The error I am getting:
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
72
NOTE: PROCEDURE LOGISTIC used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
ERROR: File WORK.CREDIT.DATA does not exist.
NOTE: The SAS System stopped processing this step because of errors.
73 proc logistic data=credit;
74 class gender/ param=glm;
75 model default (event='1') = gender age;
76 run;
77
78 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
90
Code:
proc logistic data=credit;
class gender/ param=glm;
model default (event='1') = gender age;
run;
I changed my code to:
proc logistic WORK.IMPORT;
class gender/ param=glm; model default (event='1') = student balance income;
run;
And now I am getting this:
OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
72
NOTE: PROCEDURE LOGISTIC used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
NOTE: The SAS System stopped processing this step because of errors.
73 proc logistic WORK.IMPORT;
___________
22
201
ERROR 22-322: Syntax error, expecting one of the following: ;, ALPHA, COVOUT, DATA, DESC, DESCENDING, EXACTONLY, EXACTOPTIONS, IN,
INEST, INMODEL, MAXRESPONSELEVELS, MISSING, MULTIPASS, NAMELEN, NOFIT, NOPRINT, ORDER, OUT, OUTDESIGN, OUTDESIGNONLY,
OUTEST, OUTMODEL, PLOT, PLOTS, REF, REFERENCE, ROCOPTIONS, RORDER, SIMPLE, TRUNCATE.
ERROR 201-322: The option is not recognized and will be ignored.
74 class gender/ param=glm; model default (event='1') = student balance income;
75 run;
76
77 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
89
I am getting closer:
proc logistic data= WORK.IMPORT;
class gender/ param=glm; model default (event='1') = student balance income;
run;
New error: Variable GENDER not found.
Let's start with the basics and use sashelp.heart to create a simple logistic regression model and break down each statement.
proc logistic data=sashelp.heart plots=all;
class sex(ref='Male') / param=glm;
model status(event='Dead') = Sex Weight Height AgeAtStart Smoking;
run;
proc logistic data=sashelp.heart plots=all;
Start the proc logistic procedure, which creates logistic regression models, and print all the plots it produces. Read our data from sashelp.heart.
class sex(ref='Male') / param=glm;
Tell proc logistic that the variable sex is a categorical variable. We want Male to be the reference level, and we want GLM encoding. The / introduces additional options for the class statement. Many procedures use this style of option syntax; check the procedure's documentation to learn about its options and features.
The class statement must precede the model statement. proc logistic needs to know which variables are categorical before it starts modeling.
model status(event='Dead') = Sex Weight Height AgeAtStart Smoking;
Estimate a logistic regression model using dead as our event of interest. In other words, we're creating parameter estimates of the log odds that someone will die.
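Applying the same pattern to the question's variables: "Variable GENDER not found" means the CLASS statement names a variable that does not exist in WORK.IMPORT. The sketch below is hypothetical. Its first step only simulates a stand-in WORK.IMPORT with the question's variable names (DEFAULT and STUDENT assumed to be coded 0/1, coefficients made up), so replace it with your imported data. Putting STUDENT in the CLASS statement assumes it is your categorical predictor; PROC CONTENTS shows the real variable names.

```sas
/* HYPOTHETICAL stand-in for WORK.IMPORT: simulated rows that only mimic the
   question's variable names (DEFAULT and STUDENT coded 0/1). The coefficients
   are made up. Delete this step and use your own imported data instead. */
data work.import;
  call streaminit(2024);
  do id = 1 to 1000;
    student = rand('bernoulli', 0.3);
    balance = rand('normal', 800, 450);
    income  = rand('normal', 35000, 12000);
    xb      = -6 + 0.005*balance - 0.5*student;   /* assumed linear predictor */
    default = rand('bernoulli', logistic(xb));    /* 0/1 response */
    output;
  end;
  drop id xb;
run;

/* Check which variables really exist before naming them in CLASS or MODEL. */
proc contents data=work.import;
run;

/* STUDENT replaces GENDER: every CLASS variable must exist in the data set. */
ods output ParameterEstimates=pe_import;
proc logistic data=work.import;
  class student(ref='0') / param=glm;
  model default(event='1') = student balance income;
run;
```

If DEFAULT in your data is stored as Yes/No text instead of 0/1, the EVENT= value has to match that formatted value (for example event='Yes').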
When I use proc logistic in SAS, the output returns the confidence interval and p-value of the odds ratio. How can I output the standard error of the odds ratio?
proc logistic data=edu;
model school = age sex income/ clodds=wald orpvalue;
oddsratio age;
run;
The output looks like this:
Odds Ratio Estimates and Wald Confidence Intervals
Effect   Point Estimate   95% Wald Confidence Limits   p-value
age      1.21             0.74      2.001              < 0.01
Tip: The documentation page Proc Logistic Details -> ODS Table Names lists all the tables the procedure will produce for ODS. The ODDSRATIO ... /CL=WALD ...; statement creates an output table named OddsRatiosWald.
The ODS TRACE ON statement will also write to the log the names of the tables that a proc step produces for ODS output.
Save the table as an output data set using the ODS OUTPUT statement.
Example:
Code from SAS samples tweaked to save ODS OUTPUT.
* Example 76.4 Nominal Response Data: Generalized Logits Model;
data school;
length Program $ 9;
input School Program $ Style $ Count @@;   /* @@ holds the line: several records per line */
datalines;
1 regular self 10 1 regular team 17 1 regular class 26
1 afternoon self 5 1 afternoon team 12 1 afternoon class 50
2 regular self 21 2 regular team 17 2 regular class 26
2 afternoon self 16 2 afternoon team 12 2 afternoon class 36
3 regular self 15 3 regular team 15 3 regular class 16
3 afternoon self 12 3 afternoon team 12 3 afternoon class 20
;
ods trace on;
ods graphics on;
ods html file='logistic.html';
proc logistic data=school;
freq Count;
class School Program(ref=first);
model Style(order=data)=School Program School*Program / link=glogit;
oddsratio program / cl=wald;
ods output OddsRatiosWald=or_program;
run;
proc print data=or_program;
title "Logistic Odds Ratios CL=Wald output data";
run;
ods html close;
ods trace off;
title;
Output data as examined by viewtable in Base SAS
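As far as I know, the OddsRatiosWald table holds the odds ratio and its confidence limits but no standard error column. A common workaround is the delta-method approximation SE(OR) ≈ OR * SE(beta), where beta is the log-odds parameter and SE(beta) is the StdErr column of the ParameterEstimates table. The sketch below uses SASHELP.HEART instead of the question's EDU data; the result is an approximation computed afterwards, not a value PROC LOGISTIC reports.

```sas
/* Sketch: approximate SE of an odds ratio by the delta method,
   SE(OR) ~= OR * SE(beta), from the ParameterEstimates ODS table.
   SASHELP.HEART stands in for the question's EDU data. */
ods output ParameterEstimates=pe_heart;
proc logistic data=sashelp.heart;
  model status(event='Dead') = AgeAtStart;
run;

data or_se;
  set pe_heart;
  where upcase(Variable) ne 'INTERCEPT';
  OddsRatio    = exp(Estimate);        /* OR for a one-unit increase */
  SE_OddsRatio = OddsRatio * StdErr;   /* delta-method approximation */
  keep Variable Estimate StdErr OddsRatio SE_OddsRatio ProbChiSq;
run;

proc print data=or_se noobs;
run;
```

For a continuous predictor this matches the default one-unit odds ratio; if you request a different UNITS value, the odds ratio and its SE change accordingly.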
I have a dataset that contains a list of employees and the number of hours they work.
Name Hours_CD_Max
Bob 455
Dan 675
Jane 543
Suzzy 575
Emily 234
I used a proc summary step to calculate the total number of hours worked by these employees.
Proc summary data = staff;
where position = 'PA_FT_UMC';
var Hours_CD_Max;
output out=PAFT_Only_Staff_Totals
sum = Hours_CD_Max_tot;
run;
I would like to take the 'Hours_CD_Max_tot' statistic (2482) from this proc summary step and use it elsewhere in my code where I need to make calculations.
For example, I want to take each provider's current hours and divide them by Hours_CD_Max_tot.
So, for example, I want to create a data set that looks like this:
data PA_FT_UMC_StaffingV2;
set PA_FT_UMC_StaffingV1 PAFT_Only_Staff_Totals;
if position = 'PA_FT_UMC';
PA_FT_Max_Percent= Hours_CD_Max/Hours_CD_Max_tot;
run;
Name Hours_CD_Max Hours_CD_Max_tot
Bob 455 2482
Dan 675 2482
Jane 543 2482
Suzzy 575 2482
Emily 234 2482
I realize I can calculate the % of hours worked using proc freq, but I would really like to have that number (2482) freely available so I can plug it into more complex equations.
If you want to use a variable from another dataset then pull it into your current code.
If there are BY variables, you can merge it onto your other data set. In this case, where you have just one observation, run a SET statement on the first iteration of the data step; variables read with SET are retained, so the total stays on every row.
data PA_FT_UMC_StaffingV2;
set PA_FT_UMC_StaffingV1;   /* the totals are pulled in by the second SET below */
if _n_=1 then set PAFT_Only_Staff_Totals(keep=Hours_CD_Max_tot);
if position = 'PA_FT_UMC';
PA_FT_Max_Percent= Hours_CD_Max/Hours_CD_Max_tot;
run;
You can also create a macro variable from it, and SQL is the fastest way to do that.
proc sql noprint;
select sum(hours_cd_max) into :hours_max
from have;
quit;
Then you can use it in later calculations by referencing &hours_max.
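A self-contained sketch of the macro variable approach using the five rows from the question. The STAFF step is a hypothetical stand-in that assumes every row has position 'PA_FT_UMC'; the TRIMMED option of the INTO clause strips the leading blanks SAS would otherwise store in the macro variable.

```sas
/* HYPOTHETICAL stand-in for the question's STAFF data set. */
data staff;
  input Name $ Hours_CD_Max;
  position = 'PA_FT_UMC';   /* assumed: all five rows belong to this position */
  datalines;
Bob 455
Dan 675
Jane 543
Suzzy 575
Emily 234
;

/* Store the total hours in macro variable HOURS_MAX. */
proc sql noprint;
  select sum(Hours_CD_Max) into :hours_max trimmed
  from staff
  where position = 'PA_FT_UMC';
quit;

%put NOTE: hours_max=&hours_max;

/* The macro variable resolves to the constant before the step compiles. */
data PA_FT_UMC_StaffingV2;
  set staff;
  where position = 'PA_FT_UMC';
  PA_FT_Max_Percent = Hours_CD_Max / &hours_max;
  format PA_FT_Max_Percent percent8.1;
run;
```

Because &hours_max is resolved as text, it can be dropped into any later expression; for very large or non-integer totals, consider a wider format in the INTO clause so no precision is lost.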
Issue: I'm running two separate proc tabulate steps in the same program, which generate two separate cross-frequency tables. I would like to generate two different result reports as output, instead of the standard single one that puts both outputs on the same results page, without having to write two separate programs. Is there any method to achieve this?
Update 1: Below is the output of the two proc tabulate steps I want to separate into two different objects.
You can use the SAS ODS (Output Delivery System) to send your results to two different files. (The files can be PDF, HTML, or RTF.)
The code below, based on a SAS Support example, splits the output into two files, ttest1.htm and ttest2.htm.
title 'Comparing Group Means';
data Scores;
input Gender $ Score @@;   /* @@ reads several records per line */
datalines;
f 75 f 76 f 80 f 77 f 80 f 77 f 73
m 82 m 80 m 85 m 85 m 78 m 87 m 82
;
ods html body='ttest1.htm' style=HTMLBlue;
proc ttest data=Scores;
class Gender;
var Score;
run;
ods html close;
ods html body='ttest2.htm' style=HTMLBlue;
proc ttest data=Scores;
class Gender;
var Score;
run;
ods html close;
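The same pattern applied to proc tabulate, as a sketch: close the HTML destination after each step and reopen it with a new file name. SASHELP.CLASS, the two TABLE statements, and the file names tabulate1.html and tabulate2.html are placeholders for your own data and tables; the files are written to the current working directory.

```sas
/* Close any HTML destination that is already open. */
ods html close;

/* First table goes to its own file. */
ods html body='tabulate1.html' style=HTMLBlue;
proc tabulate data=sashelp.class;
  class Sex Age;
  table Sex, Age;              /* counts (N) of Sex by Age */
run;
ods html close;

/* Second table goes to a different file. */
ods html body='tabulate2.html' style=HTMLBlue;
proc tabulate data=sashelp.class;
  class Sex;
  var Height;
  table Sex, Height*(n mean);
run;
ods html close;
```

Swapping ODS HTML for ODS PDF FILE= or ODS RTF FILE= works the same way if you need those formats.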
In SAS Enterprise Guide:
You can turn on the option to create your SAS report output in RTF and PDF formats as well. These formats show a page break between the tables within one file/report.
Go to Tools > Options, check the output formats you want, and then rerun your project.
As the title suggests, I wonder whether there's a way to save Somers' D and the p-value of the predictor x to a data set.
You can get such statistics by simply running:
ODS TRACE ON;
PROC LOGISTIC DATA = BETTING.TRAINING_DUMMIES NOPRINT;
MODEL Z1 (EVENT = '1') = D_INT_LNGAP_1;
OPTIONS;
RUN;
ODS TRACE OFF;
ODS OUTPUT FITSTATISTICS=FITDS;
PROC LOGISTIC DATA = BETTING.TRAINING_DUMMIES NOPRINT;
MODEL Z1 (EVENT = '1') = D_INT_LNGAP_1;
OPTIONS;
RUN;
If I run code similar to the one proposed here, I get only the AIC, the SIC, and the LR statistic, and in the SAS log I find:
10 ODS TRACE ON;
11
12 PROC LOGISTIC DATA = BETTING.TRAINING_DUMMIES NOPRINT;
13 MODEL Z1 (EVENT = '1') = D_INT_LNGAP_1;
14 OPTIONS;
15 RUN;
NOTE: PROC LOGISTIC is modeling the probability that z1=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 3968 observations read from the data set BETTING.TRAINING_DUMMIES.
NOTE: PROCEDURE LOGISTIC used (Total process time):
real time 0.07 seconds
cpu time 0.04 seconds
16
17 ODS TRACE OFF;
in the first piece of code, while in the second I find the following:
18 ODS OUTPUT FITSTATISTICS=FITDS;
NOTE: Writing HTML Body file: sashtml.htm
19 PROC LOGISTIC DATA = BETTING.TRAINING_DUMMIES NOPRINT;
20 MODEL Z1 (EVENT = '1') = D_INT_LNGAP_1;
21 OPTIONS;
22 RUN;
NOTE: PROC LOGISTIC is modeling the probability that z1=1.
NOTE: Convergence criterion (GCONV=1E-8) satisfied.
NOTE: There were 3968 observations read from the data set BETTING.TRAINING_DUMMIES.
NOTE: PROCEDURE LOGISTIC used (Total process time):
real time 0.04 seconds
cpu time 0.04 seconds
WARNING: Output 'FITSTATISTICS' was not created. Make sure that the output object name, label,
or path is spelled correctly. Also, verify that the appropriate procedure options are
used to produce the requested output object. For example, verify that the NOPRINT
option is not used.
Can someone suggest a way to save these statistics to a new data set?
Any help will be appreciated.
Thanks!
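You're not getting ODS TRACE output because the NOPRINT option suppresses the displayed output and, with it, the ODS tables; that is also why ODS OUTPUT warns that FITSTATISTICS was not created. Remove NOPRINT.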
The tables you want are called Association and ParameterEstimates. Somers' D requires the ODDSRATIO statement to be created.
ods trace on;
ods output association=somers parameterestimates=pe;
proc logistic data=sashelp.heart;
model status=ageatstart;
oddsratio ageatstart;
run;
ods trace off;
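A sketch that puts Somers' D and the predictor's p-value side by side in one data set, using the SASHELP.HEART model above. Two assumptions worth checking on your release with PROC CONTENTS: Somers' D is stored in the Label2/nValue2 columns of the Association table, and the Wald p-value is the ProbChiSq column of ParameterEstimates. NOPRINT is left off on purpose.

```sas
/* Capture both ODS tables; NOPRINT would prevent them from being created. */
ods output Association=assoc ParameterEstimates=pe;
proc logistic data=sashelp.heart;
  model status(event='Dead') = AgeAtStart;
  oddsratio AgeAtStart;
run;

/* Assumed layout: Somers' D sits in the second label/value pair. */
data somers_d;
  set assoc;
  where upcase(Label2) like 'SOMERS%';
  SomersD = nValue2;
  keep SomersD;
run;

/* Attach the single Somers' D value to the predictor's estimate and p-value. */
data somers_pvalue;
  if _n_ = 1 then set somers_d;
  set pe(where=(upcase(Variable) = 'AGEATSTART'));
  keep Variable Estimate ProbChiSq SomersD;
run;

proc print data=somers_pvalue noobs;
run;
```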
I have the following dataset:
Date Occupation Total_Employed
1/1/2005 Teacher 45
1/1/2005 Economist 76
1/1/2005 Artist 14
2/1/2005 Doctor 26
2/1/2005 Economist 14
2/1/2005 Mathematician 10
and so on until November 2014
What I am trying to do is to calculate a column of percentage of employed by occupation such that my data will look like this:
Date Occupation Total_Employed Percent_Emp_by_Occupation
1/1/2005 Teacher 45 33.33
1/1/2005 Economist 76 56.30
1/1/2005 Artist 14 10.37
2/1/2005 Doctor 26 52.00
2/1/2005 Economist 14 28.00
2/1/2005 Mathematician 10 20.00
where Percent_Emp_by_Occupation is calculated by dividing each occupation's Total_Employed for a date (month and year) by the sum of Total_Employed over all occupations for that date:
Example for Teacher: (45/135)*100, where 135 is the sum of 45+76+14
I know I can get a table via proc tabulate, but I was wondering if there is any way of getting it through another procedure, especially since I want this as a separate data set.
What is the best way to go about doing this? Thanks in advance.
Extract month and year from the date and create a key:
data ds;
set ds;
month=month(date);
year=year(date);
key=catx("_",month,year);
run;
Roll up the total at month level:
Proc sql;
create table month_total as
select key,sum(total_employed) as monthly_total
from ds
group by key;
quit;
Update the original data with the monthly total:
Proc sql;
create table ds as
select a.*,b.monthly_total
from ds as a left join month_total as b
on a.key=b.key;
quit;
This would lead to the following data set:
Date Occupation Total_Employed monthly_total
1/1/2005 Teacher 45 135
1/1/2005 Economist 76 135
1/1/2005 Artist 14 135
Finally calculate the percentage as:
data ds;
set ds;
percentage=100*total_employed/monthly_total;   /* in percent, as in the requested output */
run;
Here you go:
proc sql;
create table occ2 as
select
a.*,
total_employed/employed_by_date as percentage_employed_by_date format=percent7.1
from
occ a
join
(select
date,
sum(total_employed) as employed_by_date
from occ
group by date) b
on
a.date = b.date
;
quit;
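A shorter variant, offered as an alternative technique rather than the answerer's join: PROC SQL can compute the per-date sum and remerge it onto each row in a single query (SAS writes a NOTE about remerging summary statistics). The data step rebuilds only the six sample rows shown in the question.

```sas
/* The six sample rows from the question (the real data runs to Nov 2014). */
data occ;
  input Date :mmddyy10. Occupation :$15. Total_Employed;
  format Date mmddyy10.;
  datalines;
1/1/2005 Teacher 45
1/1/2005 Economist 76
1/1/2005 Artist 14
2/1/2005 Doctor 26
2/1/2005 Economist 14
2/1/2005 Mathematician 10
;

/* GROUP BY Date makes SUM() a per-date total; because non-grouped columns are
   also selected, PROC SQL remerges that total onto every row of the date. */
proc sql;
  create table occ_pct as
  select Date, Occupation, Total_Employed,
         100 * Total_Employed / sum(Total_Employed)
           as Percent_Emp_by_Occupation format=8.2
  from occ
  group by Date;
quit;

proc print data=occ_pct noobs;
run;
```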
This produces table OCC2, which adds PERCENTAGE_EMPLOYED_BY_DATE (formatted with PERCENT7.1) to each row.
One last thought: you can create all of the totals you desire for this calculation in one pass of the data. I looked at a prior question you asked about this data and assumed that you used proc means to summarize your initial data by date and occupation. You can calculate the totals by date as well in the same procedure. I don't have your data, so I'll illustrate the concept with sashelp.class data set that comes with every SAS installation.
In this example, I want to get the total number of students by sex and age, but I also want to get the total students by sex because I will calculate the percentage of students by sex later. Here's how to summarize the data and get counts for 2 different levels of summary.
proc summary data=sashelp.class;
class sex age;
types sex sex*age;
var height;
output out=summary (drop=_freq_) n=count;
run;
The types statement identifies the levels of summary of my class variables. In this case, I want counts of just sex, as well as counts of sex by age.
The _TYPE_ variable identifies the level of summary. The total count of sex is _TYPE_=2 while the count of sex by age is _TYPE_=3.
Then a simple SQL query to calculate the percentages within sex.
proc sql;
create table summary2 as
select
a.sex,
a.age,
a.count,
a.count/b.count as percent_of_sex format=percent7.1
from
summary (where=(_type_=3)) a /* sex * age */
join
summary (where=(_type_=2)) b /* sex */
on
a.sex = b.sex
;
quit;
The answer is to look back at the questions you have asked in the last few days about this same data and study those answers. Your answer is there.
While you are reviewing those answers, take time to thank the people who answered and give someone a check mark for helping you out.