proc transreg not outputting curve fit plot - sas

I am using proc transreg to test different transformations in the sashelp.baseball dataset. I request all plots and sometimes I can see a curve fit graph and sometimes I can't. Is there something I am missing if I want to output the regression fit with the code below?
DATA BASEBALL;
SET SASHELP.BASEBALL;
RUN;
ODS GRAPHICS ON;
ODS OUTPUT
NObs = num_obs
FitStatistics = fitstat
Coef = params
;
PROC TRANSREG
DATA=BASEBALL
PLOTS=ALL
SOLVE
SS2
PREDICTED;
;
MODEL_1:
MODEL POWER(logsalary/parameter=1) = log(nruns);
OUTPUT OUT = fitted_model;
RUN;
For clarity, the regression fit plot is a scatter plot with the estimated regression line fitted through

The fit plot is generated when the dependent variable does not have a transformation. You can create the transformation ahead of time to get this graph then.
From documentation:
ODS Graph Name: FitPlot
Plot Description: Simple Regression and Separate Group Regressions
Statement and Option: MODEL, a dependent variable that is not
transformed, one non-CLASS independent variable, and at most one CLASS
variable
This code works for me:
PROC TRANSREG
DATA=sashelp.BASEBALL
PLOTS=ALL
SOLVE
SS2
PREDICTED;
;
MODEL_1:
MODEL identity(logsalary) = log(nruns);
OUTPUT OUT = fitted_model;
RUN;
And generates the desired graph.

Related

Only output the ROC curve in SAS

I am looking to create a pdf with 4 nice graphs for different analysis. My question is, how do I output only the ROC curve for my logistic regression?
I use the following code
TITLE2 JUSTIFY=CENTER "Rank ordering characteristic curve (ROC)";
ODS GRAPHICS ON;
PROC LOGISTIC
DATA = input
plots(only)=(roc(id=obs))
;
MODEL y
(Event = '1')= x
/
SELECTION=NONE
LINK=LOGIT;
RUN;
QUIT;
ODS GRAPHICS OFF;
and a dummy dataset can be imagined using this
DATA HAVE;
DO I = 1 TO 100;
Y = RAND('integer',0,1);
x = ranuni(i);
output;
end;
run;
Thanks
EDIT: just to be explicit, I'm looking to output just a plot of the ROC curve and nothing else, i.e. the tables containing the somers' D etc.
ODS SELECT ROCCURVE;
ODS SELECT allows you to control the output and include only the tables/output you want.
You can wrap your code in ODS TRACE ON, ODS TRACE OFF to find out what the table name, or check the documentation.

how can create a prediction interval based on a linear model in SAS

I am trying to create a prediction interval in SAS. My SAS code is
Data M;
input y x;
datalines;
100 20
120 40
125 32
..
;
proc reg;
model y = x / clb clm alpha =0.05;
Output out=want p=Ypredicted;
run;
data want;
set want;
y1= Ypredicted;
proc reg data= want;
model y1 = x / clm cli;
run;
but when I run the code I could find the new Y1 how can I predict the new Y?
What you're trying to do is score your model, which takes the results from the regression and uses them to estimate new values.
The most common way to do this in SAS is simply to use PROC SCORE. This allows you to take the output of PROC REG and apply it to your data.
To use PROC SCORE, you need the OUTEST= option (think 'output estimates') on your PROC REG statement. The dataset that you assign there will be the input to PROC SCORE, along with the new data you want to score.
As Reeza notes in comments, this is covered, along with a bunch of other ways to do this that might work better for you, in Rick Wicklin's blog post, Scoring a regression model in SAS.

Can SAS Score a Data Set to an ARIMA Model?

Is it possible to score a data set with a model created by PROC ARIMA in SAS?
This is the code I have that is not working:
proc arima data=work.data;
identify var=x crosscorr=(y(7) y(30));
estimate outest=work.arima;
run;
proc score data=work.data score=work.arima type=parms predict out=pred;
var x;
run;
When I run this code I get an error from the PROC SCORE portion that says "ERROR: Variable x not found." The x column is in the data set work.data.
proc score does not support autocorrelated variables. The simplest way to get an out-of-sample score is to combine both proc arima and a data step. Here's an example using sashelp.air.
Step 1: Generate historical data
We leave out the year 1960 as our score dataset.
data have;
set sashelp.air;
where year(date) < 1960;
run;
Step 2: Generate a model and forecast
The nooutall option tells proc arima to only produce the 12 future forecasts.
proc arima data=have;
identify var=air(12);
estimate p=1 q=(2) method=ml;
forecast lead=12 id=date interval=month out=forecast nooutall;
run;
Step 3: Score
Merge together your forecast and full historical dataset to see how well the model did. I personally like the update statement because it will not replace anything with missing values.
data want;
update forecast(in=fcst)
sashelp.air(in=historical);
by Date;
/* Generate fit statistics */
Error = Forecast-Air;
PctError = Error/Air;
AbsPctError = abs(PctError);
/* Helpful for bookkeeping */
if(fcst) then Type = 'Score';
else if(historical) then Type = 'Est';
format PctError AbsPctError percent8.2;
run;
You can take this code and convert it into a generalized macro for yourself. That way in the future, if you wanted to score something, you could simply call a macro program to get what you need.

ods output in proc logistic

I am running a proc logistic with selection =score , to get the best model based on chi-square value. Here is the code
options symbolgen;
%let input_var=ABC_DEF_CkkkkkedHojjjjjerRen101 dept_gert home_value
child_household ;
ods output bestsubsets=score;
proc logistic data=trail;
model response(event='Y')=&input_var
/ selection=score best=1;
run;
The output dataset named score has been generated through ods output. Below is the image of the data set.
score data set image
In the score dataset, in the "variables included in model" column, you can only see a part of variable name "ABC_DEF_CkkkkkedHojjjjjerRen101" and not the entire name. May I know why is this happening and how do I get the entire variable name. Please let me know
Add NAMELEN=32 to your proc logistic statement.

drawing histogram and boxplot in SAS

I wrote the following code in sas, but I did not get result!
The result histogram in grey and the range of data is not as I specified! what is the problem?
I got the following warning too: WARNING: The MIDPOINTS= list was extended to accommodate the data
what about color?
axis1 order=(0 to 100000 by 50000);
axis2 order=(0 to 100 by 5);
run;
proc capability data=HW2 noprint;
histogram Mvisits/midpoints=0 to 98000 by 10000
haxis=axis1
cfill=blue;
run;
.......................................
I have the same problem with boxplot, for example I got the following plot and I want to change the distances, then I could see the plot better, but I could not.
The below is for proc univariate rather than proc capability, I do not have access to SAS/QC to test, but the user guide shows very similar syntax for the histogram statements. Hopefully, you'll be able to translate it back.
It looks like you are having problems with the colour due to your output system. Your graphs are probably delivered via ODS, in which case the cfill option does not apply (see here and not the Traditional Graphics tag).
To change the colour of the histogram bars in ODS output you can use proc template:
proc template;
define style styles.testStyle;
parent = styles.htmlblue;
style GraphDataDefault /
color = green;
end;
run;
ods listing style = styles.testStyle;
proc univariate data = sashelp.cars;
histogram mpg_city;
run;
An example explaining this can be found here.
Alternatively you can use proc sgplot to create a histogram with more control of the colour as follows:
proc sgplot data = sashelp.cars;
histogram mpg_city / fillattrs = (color = red);
run;
As to your question of truncating the histogram. It doesn't really make a great deal of sense to ignore the extreme values as it will give you an erroneous image of the distribution, which somewhat defeats the purpose of the histogram. That said, you can achieve what you are asking for with bit of a hack:
data tempData;
set sashelp.cars;
tempClass = 1;
run;
proc univariate data = tempData noprint;
class tempClass;
histogram mpg_city / maxnbin = 5 endpoints = 0 to 25 by 5;
run;
In the above a dummy class tempClass is created and then comparative histograms are requested using the class statement. maxnbins will limit the number of bins displayed only in a comparative histogram.
Your other option is to exclude (or cap) your extreme points before creating the histogram, but this will lead to slightly erroneous frequency counts/percentages/bar heights.
data tempData;
set sashelp.cars;
mpg_city = min(mpg_city, 20);
run;
proc univariate data = tempData noprint;
histogram mpg_city / endpoints = 0 to 25 by 5;
run;
This is a possible approach to original question (untested as no SAS/QC or data):
proc capability data = HW2 noprint;
histogram Mvisits /
midpoints = 0 to 300000 by 10000
noplot
outhistogram = histData;
run;
proc sgplot data = histData;
vbar _MIDPT_ /
response = _OBSPCT_
fillattrs = (color = blue);
where _MIDPT_ <= 100000;
run;