Calculation of values for confidence bands - sas

I need actual values for the confidence bands for regression lines generated by SAS during PROC REG. SAS does this automatically when plotting, but I need to know the actual values of the range (knowing this for just some sampled x's would be sufficient.) How can I get SAS to report these values?

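Use the output statement with the out= option and specify the confidence-limit options. The lclm= and uclm= options give the lower and upper confidence limits for the mean response, which is the confidence band drawn around the regression line; lcl= and ucl= give the wider limits for an individual prediction. The code below outputs a dataset named predicted containing predicted values as pred, the confidence band as lower_mean and upper_mean, and the prediction limits as lower and upper.
proc reg data=sashelp.cars;
model msrp=horsepower;
output out=predicted p=pred lclm=lower_mean uclm=upper_mean lcl=lower ucl=upper;
run;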
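If the limits are only needed at a few chosen x values, one common approach is to append rows with those x values and a missing response before fitting. PROC REG leaves such rows out of the fit but still writes predicted values and limits for them. The sketch below assumes that approach; the dataset names to_score, cars_plus, and band_at_x and the horsepower values are made up for illustration, and lclm=/uclm= request the limits for the mean response.

```sas
/* Hypothetical x values at which the band is wanted */
data to_score;
  input horsepower;
  datalines;
100
200
300
;
run;

/* Stack the real data and the scoring rows; msrp is missing on the scoring rows */
data cars_plus;
  set sashelp.cars(keep=msrp horsepower) to_score;
run;

proc reg data=cars_plus;
  model msrp=horsepower;
  /* lclm=/uclm= are the confidence limits for the mean response */
  output out=band_at_x p=pred lclm=lower_mean uclm=upper_mean;
run;
quit;

/* Show only the appended rows, which carry the limits at the chosen x values */
proc print data=band_at_x;
  where msrp is missing;
run;
```

The default limits are 95%; the alpha= option on the proc reg or model statement changes the level.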

Related

Feature Selection and PCA in Machine Learning

I have a dataset with around 15 numeric columns and two categorical columns which are a "State" column and an "Income" column with six buckets representing each different income range. Do I need to encode the "Income" column if it contains integers 1-6 representing each income range? In addition, what type of encoder should I use for the "state" column and does anyone have any good resources on this?
In addition, does one typically perform feature selection (wrapper and filter methods such as Pearson's correlation and Recursive Feature Elimination) before PCA? What is the typical correlation threshold when using a method like Pearson's? And what is the ideal number of dimensions or explained variance ratio one should use when running PCA? I'm not sure whether you use one of them or both. Thank you.

How does CfsSubsetEval (Correlation-based Feature Selection) work in Weka

I have a categorical dataset and am using the WEKA software for feature selection. I used CfsSubsetEval as the attribute evaluator with the GreedyStepwise search method. I learned from this link that CFS uses Pearson correlation to find strong correlations in the dataset. I also found out how to calculate the Pearson correlation coefficient using this link. According to the link, the data values need to be numerical for the evaluation. How, then, could WEKA evaluate my categorical dataset?
The strange result is that, among 70 attributes, CFS selects only 10. Is that because the dataset is categorical? Additionally, my dataset is highly imbalanced, with an imbalance ratio of 1:9 (yes:no).
A quick question:
If you follow the link, you will find the statement that the correlation coefficient measures the strength and direction of the linear relationship between two numerical variables X and Y. I understand the strength of the correlation coefficient, which varies between -1 and +1, but what about the direction? How can I get that? The variable is not a vector, so it should not have a direction.
The method correlate in the CfsSubsetEval class is used to compute the correlation between two attributes. It calls other methods, depending on the attribute types, which I've linked here:
two numeric attributes: num_num
numeric/nominal attributes: num_nom2
two nominal attributes: nom_nom

SAS Proc Logistic Selection=Score

I'm using the PROC LOGISTIC procedure in SAS with the option SELECTION=SCORE, which gives me a few logistic regression models and their chi-square values. Which model is better: the one with the smaller chi-square or the larger?
In general, the larger chi-squared statistic will correspond with a lower p-value (more significance). However, it is important to know the shape of the chi-squared distribution and also the number of degrees of freedom. As you can see in the graph, the relationship between p and chi-squared changes based on the degrees of freedom.
The larger the score chi-square, the better the model, when comparing models with the same number of variables.

estimate linear combination of regression coefficients in sas

I'm using an LMM in SAS, and I would like to get an estimate (and a p-value) of a linear combination of some of the regression coefficients.
Say that the model is:
b0 + b1*Time + b2*X1 + b3*X2 + b4*(Time*X1)
and say that I want to get an estimate and a p-value for b1+b4.
What should I do?
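One way this is usually done is with an ESTIMATE statement in PROC MIXED. The sketch below assumes Time, X1, and X2 are all continuous (not listed in CLASS) and that the model has a random intercept per subject; the dataset name long_data, the response y, and the subject variable id are hypothetical. If X1 were a CLASS variable, the coefficients would have to be given per level instead.

```sas
/* Hypothetical names: long_data, y, id. Time, X1, X2 assumed continuous. */
proc mixed data=long_data;
  class id;
  model y = time x1 x2 time*x1 / solution;
  random intercept / subject=id;
  /* Linear combination b1 + b4: coefficient 1 on time and 1 on time*x1 */
  estimate 'b1 + b4' time 1 time*x1 1 / cl;
run;
```

The Estimates table then reports the estimate, its standard error, a t test with its p-value, and, because of the cl option, confidence limits.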

SAS, ROC curve, PROC LOGISTIC, point labels

I am trying to create a single ROC curve plot for three biomarkers on a common population.
I have already created an overlay curve with proc logistic. Is there any way in SAS (among the default options) to label specific points on one of the biomarkers' curves?
Also, I would like to draw horizontal and vertical lines that mark the Sn and 1-Sp for those specific points.
Is there an easier way to do this than creating an annotation dataset and plotting the graph through proc gplot?
Thanks in advance!
Among default options, the answer is no. SAS gives you options to control certain aspects of the ROC curves in the roc and rocoptions options in the proc logistic statement, but it doesn't support adding specific features to plots directly within the procedure.
To get the features you're looking for, as you said, you'll need to plot the raw ROC data using a graphics procedure. I like sgplot, the ODS graphics successor to gplot. Assuming you know exactly which points you want to label ahead of time, horizontal and vertical lines for the sensitivity and 1 - specificity can be generated using the refline statement in sgplot.
An annotation dataset may be the best way to go to label specific points. If you're using sgplot, you can generate an SG annotation dataset using the SG annotation macros. More information regarding SG annotation, including the use of the macros, can be found here. The macros are located in the default SAS autocall macro library so they should be able to be referenced without any special fussing. Once you have your dataset, you can feed it into sgplot using the sganno= option in the proc sgplot statement.
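As a rough sketch of the refline part, the code below writes the ROC coordinates for a single marker with the outroc= option and draws reference lines at one chosen point. The dataset biomarkers, the variables disease and marker1, and the values in sn and fpr are all made up for illustration; it covers one marker only and does not use SG annotation.

```sas
/* Hypothetical data set and variables: biomarkers, disease, marker1 */
proc logistic data=biomarkers;
  model disease(event='1') = marker1 / outroc=roc_pts;
run;

/* Made-up coordinates of the ROC point to highlight */
%let sn  = 0.80;   /* sensitivity */
%let fpr = 0.15;   /* 1 - specificity */

proc sgplot data=roc_pts noautolegend;
  /* outroc= stores the curve in _1mspec_ (x) and _sensit_ (y) */
  series x=_1mspec_ y=_sensit_;
  refline &sn  / axis=y lineattrs=(pattern=dash) label="Sn = &sn";
  refline &fpr / axis=x lineattrs=(pattern=dash) label="1-Sp = &fpr";
  xaxis label="1 - Specificity";
  yaxis label="Sensitivity";
run;
```

The reference-line labels sit on the axes rather than at the point itself; text placed at the point would still need an annotation dataset or an extra plot statement.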