I'm using PROC LOGISTIC in SAS with the option SELECTION=SCORE, which gives me a few logistic regression models and their chi-square values. Which model is better: the one with the smaller chi-square or the one with the larger?
In general, a larger chi-squared statistic corresponds to a lower p-value (more significance). However, it is important to know the shape of the chi-squared distribution and the number of degrees of freedom: the same chi-squared value maps to a different p-value depending on the degrees of freedom.
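The larger the score chi-square, the better the model, provided the models being compared have the same number of variables. Adding a variable can only increase the statistic, so values for models of different sizes are not directly comparable.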
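To make the dependence on degrees of freedom concrete, here is a minimal Python sketch (not SAS output) that computes the upper-tail p-value of a chi-squared statistic for integer degrees of freedom. The function name chi2_sf is made up for this illustration; it uses the standard closed forms for 1 and 2 degrees of freedom plus the recurrence for the regularized incomplete gamma function.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        # Base case df = 1: P(X > x) = erfc(sqrt(x / 2))
        k = 1
        q = math.erfc(math.sqrt(half))
    else:
        # Base case df = 2: P(X > x) = exp(-x / 2)
        k = 2
        q = math.exp(-half)
    # Recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1), with a = k / 2
    while k < df:
        a = k / 2.0
        q += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        k += 2
    return q


# The same statistic (10.0) becomes less significant as df grows.
for df in (1, 2, 5, 10):
    print(df, chi2_sf(10.0, df))
```

With the statistic held fixed, the p-value rises as the degrees of freedom increase, which is why a raw chi-square comparison only makes sense between models of the same size.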
I am new to generalised linear modelling. I ran a negative binomial model and then tried to estimate the residuals from the model.
Here is what I did:
Ran a negative binomial regression model with the nbreg command in Stata 17.
Ran the predict command to obtain the predicted values.
Generated the residuals by subtracting the predicted values from the observed values.
Did I do it correctly?
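The subtraction described above gives raw (response) residuals. As a rough sketch of the arithmetic, written in Python rather than Stata, the snippet below computes raw residuals and Pearson residuals, which scale each raw residual by the model's standard deviation. It assumes the mean-dispersion (NB2) variance Var(y) = mu + alpha * mu^2, that the predicted values are predicted counts, and that alpha is the overdispersion estimate reported with the fit. The function name and the numbers are hypothetical.

```python
import math


def nb_residuals(observed, predicted, alpha):
    """Return (raw, pearson) residual lists for an NB2 model.

    alpha is the overdispersion parameter, so Var(y) = mu + alpha * mu**2.
    """
    # Raw residual: observed count minus predicted count
    raw = [y - mu for y, mu in zip(observed, predicted)]
    # Pearson residual: raw residual divided by the model standard deviation
    pearson = [
        (y - mu) / math.sqrt(mu + alpha * mu * mu)
        for y, mu in zip(observed, predicted)
    ]
    return raw, pearson


# Hypothetical observed counts, predicted counts, and alpha
raw, pearson = nb_residuals([3, 0, 7], [2.0, 1.0, 5.5], alpha=0.5)
print(raw)
print(pearson)
```

Raw residuals from a count model grow with the predicted mean, so scaled residuals are usually easier to compare across observations.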
I have a categorical dataset and I am using WEKA for feature selection. I used CfsSubsetEval as the attribute evaluator with the GreedyStepwise search method. I learned from this link that CFS uses Pearson correlation to find strong correlations in the dataset. I also found out how to calculate the Pearson correlation coefficient using this link. According to the link, the data values need to be numerical for the evaluation. How, then, can WEKA perform the evaluation on my categorical dataset?
The strange result is that, out of 70 attributes, CFS selects only 10. Is it because the dataset is categorical? Additionally, my dataset is highly imbalanced, with an imbalance ratio of 1:9 (yes:no).
A quick question
If you go through the link, you will find the statement that the correlation coefficient measures the strength and direction of the linear relationship between two numerical variables X and Y. I understand the strength of the correlation coefficient, which varies between -1 and +1, but what about the direction? How can I get that? The variable is not a vector, so it should not have a direction.
The method correlate in the CfsSubsetEval class is used to compute the correlation between two attributes. It calls other methods, depending on the attribute types, which I've linked here:
two numeric attributes: num_num
numeric/nominal attributes: num_nom2
two nominal attributes: nom_nom
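On the "direction" question: direction here is just the sign of the coefficient, not a vector direction. A positive value means the two variables tend to move together, a negative value means one tends to fall as the other rises. The Python sketch below illustrates this with a hand-written Pearson correlation. The 0/1 coding of a two-valued nominal attribute at the end is an assumption made for illustration and is not claimed to be what WEKA's num_nom2 or nom_nom methods do internally.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Positive sign: y rises with x. Negative sign: y falls as x rises.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))

# Illustrative assumption: code a yes/no attribute as 1/0 before correlating
answers = ["yes", "no", "yes", "no", "no"]
coded = [1 if a == "yes" else 0 for a in answers]
print(pearson_r(coded, [5.0, 1.0, 4.0, 2.0, 1.5]))
```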
I need the actual values of the confidence bands for regression lines generated by SAS in PROC REG. SAS computes them automatically when plotting, but I need the numeric limits themselves (values at just a few sampled x's would be sufficient). How can I get SAS to report these values?
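Use the output statement with out= and specify the limit keywords you need. lcl= and ucl= give the lower and upper limits for an individual prediction, while lclm= and uclm= give the lower and upper confidence limits for the mean, which is the band drawn around the regression line. The code below outputs a dataset named predicted containing predicted values as pred, prediction limits as lower and upper, and confidence limits for the mean as lower_mean and upper_mean.
proc reg data=sashelp.cars;
model msrp=horsepower;
/* lcl=/ucl=: limits for an individual prediction; lclm=/uclm=: limits for the mean */
output out=predicted p=pred lcl=lower ucl=upper lclm=lower_mean uclm=upper_mean;
run;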
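For a sanity check on the numbers, the limits for a simple linear regression can be reproduced by hand. The Python sketch below is an illustration, not SAS code. It computes the fitted value at a point x0 together with the confidence limits for the mean response and the wider limits for an individual prediction. The critical t value is passed in as a parameter because the standard library has no t quantile function; 3.182 is the 97.5% quantile for 3 degrees of freedom. The function name and data are made up.

```python
import math


def regression_limits(xs, ys, x0, t_crit):
    """Fitted value and interval limits at x0 for simple linear regression."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual mean square with n - 2 degrees of freedom
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    mse = sse / (n - 2)
    fit = intercept + slope * x0
    leverage = 1.0 / n + (x0 - mx) ** 2 / sxx
    se_mean = math.sqrt(mse * leverage)        # standard error of the mean response
    se_ind = math.sqrt(mse * (1.0 + leverage))  # standard error of a new observation
    return {
        "pred": fit,
        "lclm": fit - t_crit * se_mean,
        "uclm": fit + t_crit * se_mean,
        "lcl": fit - t_crit * se_ind,
        "ucl": fit + t_crit * se_ind,
    }


# Five made-up points, so n - 2 = 3 degrees of freedom
limits = regression_limits([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], x0=3, t_crit=3.182)
print(limits)
```

The only difference between the two intervals is the extra 1 inside the square root, which accounts for the noise of a single new observation.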
I am trying to fit a univariate Gaussian Mixture Model with the EM algorithm. But I only found a package in R (mclust). Does anybody know an equivalent proc step in SAS 9.3?
Check out PROC FMM (Finite Mixture Models). I believe that does what you are looking for.
http://support.sas.com/documentation/cdl/en/statug/68162/HTML/default/viewer.htm#statug_fmm_overview01.htm
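For readers who want to see what the EM iterations look like for a univariate Gaussian mixture, here is a minimal Python sketch. It is not what PROC FMM or mclust do internally. The initialisation (means at evenly spaced order statistics, a common standard deviation, equal weights) and the function names are choices made for this illustration, and there are no safeguards against degenerate components.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_gmm_em(data, k=2, n_iter=200, tol=1e-8):
    """Fit a k-component univariate Gaussian mixture by EM.

    Returns (weights, means, sds).
    """
    n = len(data)
    ordered = sorted(data)
    # Initialise means at evenly spaced order statistics
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    overall_mean = sum(data) / n
    overall_sd = math.sqrt(sum((x - overall_mean) ** 2 for x in data) / n)
    sds = [overall_sd] * k
    weights = [1.0 / k] * k
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point
        resp = []
        ll = 0.0
        for x in data:
            dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
            total = sum(dens)
            ll += math.log(total)
            resp.append([d / total for d in dens])
        # M-step: weighted updates of weights, means and standard deviations
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            sds[j] = math.sqrt(max(var, 1e-12))
        # Stop when the log-likelihood no longer improves
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return weights, means, sds


# Simulated data from two well-separated normal components
random.seed(0)
sample = [random.gauss(0, 1) for _ in range(200)] + [random.gauss(10, 1) for _ in range(200)]
print(fit_gmm_em(sample, k=2))
```

EM only finds a local maximum of the likelihood, so with overlapping components the result can depend on the starting values.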
I'm using an LMM in SAS, and I would like to get an estimate (and a p-value) for a linear combination of some of the regression coefficients.
Say that the model is:
b0 + b1*Time + b2*X1 + b3*X2 + b4*(Time*X1)
and say that I want to get an estimate and a p-value for b1 + b4.
What should I do?
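In PROC MIXED this kind of request is normally handled with an ESTIMATE statement. The arithmetic behind it is short: write the combination as a contrast vector L, so the estimate is L'b and its standard error is sqrt(L'VL), where V is the covariance matrix of the fixed-effect estimates. The Python sketch below illustrates that calculation with made-up coefficients and covariances. It uses a normal (Wald z) approximation for the p-value, whereas SAS would use a t distribution with its own denominator degrees of freedom, so the p-values may differ slightly.

```python
import math


def linear_combination(coefs, cov, contrast):
    """Return (estimate, se, z, p) for the linear combination contrast' * coefs.

    The p-value is two-sided and based on a normal approximation.
    """
    m = len(contrast)
    est = sum(l * b for l, b in zip(contrast, coefs))
    # Variance of the combination: L' V L
    var = sum(contrast[i] * cov[i][j] * contrast[j] for i in range(m) for j in range(m))
    se = math.sqrt(var)
    z = est / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return est, se, z, p


# Hypothetical estimates for b0, b1 (Time), b2 (X1), b3 (X2), b4 (Time*X1)
coefs = [1.0, 0.5, 0.2, -0.1, 0.3]
# Hypothetical covariance matrix; only the b1 and b4 entries matter for this contrast
cov = [
    [0.10, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.04, 0.0, 0.0, -0.005],
    [0.0, 0.0, 0.09, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.09, 0.0],
    [0.0, -0.005, 0.0, 0.0, 0.01],
]
contrast = [0, 1, 0, 0, 1]  # picks out b1 + b4
print(linear_combination(coefs, cov, contrast))
```

The covariance term matters: ignoring Cov(b1, b4) and simply adding the two variances would give the wrong standard error whenever the estimates are correlated.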