Difference between SimpleLogistic and Logistic in Weka - weka

I am working with Weka, but I cannot understand the difference between the SimpleLogistic and Logistic classifiers. Does anybody know the difference?

According to the documentation (SimpleLogistic and Logistic), SimpleLogistic fits the model with LogitBoost, using simple regression functions as base learners and cross-validating the number of boosting iterations (which gives automatic attribute selection), whereas Logistic fits a multinomial logistic regression model with a ridge estimator. The papers that describe the algorithms are:
Niels Landwehr, Mark Hall, Eibe Frank (2005). Logistic Model Trees. Machine Learning 59(1-2):161-205, for SimpleLogistic.
le Cessie, S., van Houwelingen, J.C. (1992). Ridge Estimators in Logistic Regression. Applied Statistics 41(1):191-201, for Logistic.
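To make the ridge side of the comparison concrete, here is a minimal Python sketch of ridge-penalized logistic regression for a binary target. It is not Weka's implementation: the function name `fit_ridge_logistic`, the plain gradient-ascent optimizer, the unpenalized intercept and the toy data are all assumptions made for illustration. It only shows the core idea that the ridge term shrinks the coefficients.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_ridge_logistic(xs, ys, ridge=1e-8, lr=0.1, steps=5000):
    """Hypothetical helper: gradient ascent on
    log-likelihood - ridge * sum(w_j^2); the intercept is left unpenalized."""
    n = len(xs)
    n_features = len(xs[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(steps):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
            grad_b += err
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
        for j in range(n_features):
            grad_w[j] -= 2.0 * ridge * w[j]   # derivative of the ridge penalty
            w[j] += lr * grad_w[j] / n
        b += lr * grad_b / n
    return b, w

# Toy, non-separable data (made up for this sketch)
xs = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
ys = [0, 0, 1, 0, 1, 1]

b_small, w_small = fit_ridge_logistic(xs, ys, ridge=1e-8)  # almost no shrinkage
b_big, w_big = fit_ridge_logistic(xs, ys, ridge=5.0)       # strong shrinkage
```

With the larger `ridge` value the fitted slope is pulled toward zero. SimpleLogistic takes a different route to a regularized model: it adds one simple regression function per boosting iteration and stops early.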

Related

How could I fit a logistic regression from SAS code in R?

I read the following article: MAYFIELD LOGISTIC REGRESSION: A PRACTICAL APPROACH FOR ANALYSIS OF NEST SURVIVAL, and it offers SAS code (below) to fit a logistic model:
proc logistic data = {data set};
model FAIL/OBSDAYS = MIDHT SNAGBA VERTDENS;
run;
In this model, the objective is to estimate the daily survival rate of nests. The response variable (FAIL) is the fate of each nest (0 = successful, 1 = failed), OBSDAYS is the exposure time of the nest in days, and MIDHT, SNAGBA, and VERTDENS are covariates. I understand the right-hand side of the model perfectly, but I have doubts about how to set up the response variable in R. Would it be appropriate to specify it as follows in R?
m1 <- glm(FAIL ~ MIDHT + SNAGBA + VERTDENS, data = dataset, family = "binomial")
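One point worth spelling out: in the SAS events/trials syntax `FAIL/OBSDAYS`, each nest contributes FAIL events out of OBSDAYS trials (nest-days), which is why the fitted probability is a daily rate. A response of `FAIL` alone treats each nest as a single trial and ignores exposure. In R, a binomial `glm` accepts a two-column response of the form `cbind(events, non-events)`, so `cbind(FAIL, OBSDAYS - FAIL)` would be the natural counterpart to check. The Python sketch below illustrates the events/trials idea for the intercept-only case, where the estimate reduces to failures per nest-day. The data and function names are hypothetical.

```python
import math

def daily_failure_rate(fail, obsdays):
    """Intercept-only events/trials estimate: failures per nest-day of exposure."""
    return sum(fail) / sum(obsdays)

def events_trials_loglik(p, fail, obsdays):
    """Binomial log-likelihood (up to a constant): FAIL events in OBSDAYS trials."""
    return sum(f * math.log(p) + (d - f) * math.log(1.0 - p)
               for f, d in zip(fail, obsdays))

# Hypothetical toy data: four nests, fate (1 = failed) and days observed
fail = [0, 1, 0, 1]
obsdays = [10, 5, 12, 13]

p_hat = daily_failure_rate(fail, obsdays)   # 2 failures / 40 nest-days
daily_survival = 1.0 - p_hat
```

The value `p_hat` maximizes `events_trials_loglik`, so the events/trials likelihood and the classic exposure-based estimate agree when there are no covariates.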

How to perform likelihood ratio test for linear regression in SAS

I am trying to do a likelihood ratio test to compare nested models in SAS. I am very new to SAS and am only familiar with PROC REG for regression analysis. Do you have any ideas on how I can obtain the likelihood ratio test, or where I should start?
I know how to do an LR test for logistic regression, but there it seems to be reported automatically by PROC LOGISTIC.
Any help would be appreciated!
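For ordinary least squares with normal errors, the likelihood ratio statistic for two nested models depends only on their residual (error) sums of squares: LR = n * ln(RSS_reduced / RSS_full), which is compared with a chi-square distribution whose degrees of freedom equal the number of dropped parameters. The Python sketch below works this through for the simplest case, an intercept-only model against a one-predictor model (1 degree of freedom). The function names and data are hypothetical, and the p-value shortcut via `math.erfc` is valid only for 1 degree of freedom.

```python
import math

def rss_intercept_only(y):
    """Residual sum of squares of the reduced model y = a."""
    mean_y = sum(y) / len(y)
    return sum((yi - mean_y) ** 2 for yi in y)

def rss_simple_regression(x, y):
    """Residual sum of squares of the full model y = a + b*x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))

def lr_test_one_df(rss_reduced, rss_full, n):
    """LR statistic and chi-square(1) p-value for nested Gaussian linear models."""
    lr = n * math.log(rss_reduced / rss_full)
    p_value = math.erfc(math.sqrt(lr / 2.0))  # chi-square survival function, 1 df only
    return lr, p_value

# Made-up data for illustration
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 6.2, 6.8, 8.1]

lr, p_value = lr_test_one_df(rss_intercept_only(y), rss_simple_regression(x, y), len(y))
```

The same arithmetic can be applied by hand to the error sums of squares from two separate regression fits, provided both models are fitted to exactly the same observations.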

Why do I get different regression outputs in SAS and in Stata when using Prais-Winsten estimation?

I have a time series dataset with a serious serial correlation problem, so I adopted the Prais-Winsten estimator with iterated estimates to fix that. I ran the regressions in Stata with the following command:
prais depvar indepvar indepvar2, vce(robust) rhotype(regress)
My colleague wanted to reproduce my results in SAS, so she used the following:
proc autoreg data=DATA;
model depvar = indepvar indepvar2/nlag=1 iter itprint method=YW;
run;
For the different specifications we ran, some of them roughly match, while others do not. Also I noticed that for each regression specification, Stata has many more iterations than SAS. I wonder if there is something wrong with my (or my colleague's) code.
Update
Inspired by Joe's comment, I modified my SAS code.
/*Iterated Estimation*/
proc autoreg data=DATA;
model depvar = indepvar indepvar2/nlag=1 itprint method=ITYW;
run;
/*Twostep Estimation*/
proc autoreg data=DATA;
model depvar = indepvar indepvar2/nlag=1 itprint method=YW;
run;
I have a few suggestions. Note that I'm not a real statistician and am not familiar with the specific estimators here, so this is just a quick read of the docs.
First off, the most likely issue is that it looks like SAS uses the OLS variance estimation method. That is, in your Stata code, you have vce(robust), which is in contrast to what I read SAS as using, the equivalent of vce(ols). See this page in the docs which explains how SAS does the Y-W method of autoregression, compared to this doc page that explains how Stata does it.
Second, you probably should not specify method=YW. SAS distinguishes between the simple Y-W estimation ("two-step" method) and iterated Y-W estimation. method=ITYW is what you want. You specify iter, so it may well be that you're getting this anyway as SAS tends to be smart about those sorts of things, but it's good to verify.
I would suggest actually turning the iterations off to begin with - have both do the two-step method (Stata option twostep, SAS by removing the iter request and specifying method=YW or no method specification). See how well they match there. Once you can get those to match, then move on to iterated; it's possible SAS has a different cutoff than Stata and may well not iterate past that.
I'd also suggest trying this with only one independent and dependent variable pair first, as it's possible the two programs handle things differently when you add in a second independent variable. Always start simple and then add complexity.
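When comparing the two programs on the simplest case, it can help to have a reference calculation that is fully under your control. The Python sketch below implements the textbook two-step Prais-Winsten procedure for one regressor. It is an illustration under stated assumptions, not a description of what `prais` or `proc autoreg` does internally: rho is taken from a no-constant regression of the OLS residuals on their first lag, the first observation is rescaled by sqrt(1 - rho^2), and the simulated data are made up.

```python
import math
import random

def ols_two_columns(c, x, y):
    """Least squares of y on columns c and x (no extra intercept): returns (b_c, b_x)."""
    scc = sum(ci * ci for ci in c)
    scx = sum(ci * xi for ci, xi in zip(c, x))
    sxx = sum(xi * xi for xi in x)
    scy = sum(ci * yi for ci, yi in zip(c, y))
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = scc * sxx - scx * scx
    return (scy * sxx - sxy * scx) / det, (scc * sxy - scx * scy) / det

def prais_winsten_two_step(x, y):
    n = len(y)
    # Step 1: ordinary least squares with an intercept
    b0, b1 = ols_two_columns([1.0] * n, x, y)
    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    # Step 2: rho from regressing e_t on e_{t-1} without a constant (assumed rule)
    rho = (sum(resid[t] * resid[t - 1] for t in range(1, n))
           / sum(resid[t - 1] ** 2 for t in range(1, n)))
    # Step 3: Prais-Winsten transform; the first observation is rescaled, not dropped
    s = math.sqrt(1.0 - rho * rho)
    c_star = [s] + [1.0 - rho] * (n - 1)
    x_star = [s * x[0]] + [x[t] - rho * x[t - 1] for t in range(1, n)]
    y_star = [s * y[0]] + [y[t] - rho * y[t - 1] for t in range(1, n)]
    # Step 4: OLS on the transformed data
    a0, a1 = ols_two_columns(c_star, x_star, y_star)
    return rho, a0, a1

# Simulated data: y = 2 + 3*x + u, with AR(1) errors u_t = 0.6*u_{t-1} + noise
random.seed(0)
n_obs = 200
x = [random.gauss(0.0, 1.0) for _ in range(n_obs)]
u = [random.gauss(0.0, 1.0)]
for _ in range(n_obs - 1):
    u.append(0.6 * u[-1] + random.gauss(0.0, 1.0))
y = [2.0 + 3.0 * xi + ui for xi, ui in zip(x, u)]

rho, intercept, slope = prais_winsten_two_step(x, y)
```

If the two packages disagree with each other, comparing each against a hand calculation like this one on a small dataset can show which step (the rho estimate, the treatment of the first observation, or the iteration rule) is responsible.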

slow logistic regression performance in accord.net

Logistic regression using Accord.net (http://accord-framework.net/docs/html/T_Accord_Statistics_Analysis_LogisticRegressionAnalysis.htm) takes about 5 minutes to compute. SAS does it in a few seconds (also using a single CPU core).
The dataset is about 40000 rows and 30 inputs.
Why is there such a difference? Does SAS use an algorithm with much better complexity? As far as I know, logistic regression is quite a simple algorithm.
Is there any other library that will do better (preferably free)?
The solution is to comment out this line:
https://github.com/accord-net/framework/blob/development/Sources/Accord.Statistics/Analysis/LogisticRegressionAnalysis.cs#L504
It computes some very expensive statistics that I don't need.
Here is a class that can be used with the standard Accord package: https://gist.github.com/eugenem/e1dd2ef2149e8c21c37d
I had the same experience with multinomial logistic regression. I compared Accord, R, SPSS and Python's scikit-learn. I have 30 inputs, 10 outputs and 1600+ training examples. Accord took 8 min, while the rest took 2-8 sec. Accord looks beautiful, but for multinomial logistic regression it's way too slow. My solution was to write a small Python web service that calculates the regression and saves the result in the database.

Automated regression procedure in Stata

I'm going to study the relationship between illiquidity and returns in stock markets, using the Amihud model proposed in the paper "Illiquidity and stock returns: cross-section and time-series effects" (2002). I would like to know if it is possible to automate the regression analysis. I have more than 2000 stocks in the sample and I'd like to avoid running each regression one by one, to speed the process up.
Do you know if it is possible to automate this process in Stata, or with some other statistical software (R, SAS, Matlab, Gretl, ...)? If it is, how could I do that?
You should look at foreach and forval as ways of looping.
forval i = 1/3 {
regress Ystock`i' Xstock`i'
}
would be an example if and only if there are variables with names like those you indicated. If you have other names, or a different data structure, a loop would still be possible.
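The same looping idea carries over to other software. As a rough illustration only, here is a Python sketch that fits one simple regression per stock and collects the coefficients. The data layout (a dictionary mapping each stock name to an illiquidity series and a return series), the names and the numbers are all hypothetical.

```python
def ols_fit(x, y):
    """Simple linear regression y = intercept + slope * x; returns (intercept, slope)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical data: for each stock, (illiquidity series, return series)
stocks = {
    "stock1": ([0.1, 0.2, 0.3, 0.4], [1.0, 1.2, 1.4, 1.6]),
    "stock2": ([0.5, 0.1, 0.4, 0.2], [2.0, 1.2, 1.8, 1.4]),
}

# One regression per stock, with the results gathered in a single dictionary
results = {name: ols_fit(x, y) for name, (x, y) in stocks.items()}
```

Collecting the coefficients in one structure, rather than reading 2000 separate outputs, is usually the part that saves the most time.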