Logistic regression coefficients in weka LMT tree - weka

How can I obtain the coefficients of the regression function in the LMT leave nodes?
Thanks!

It should come up by default.
The screenshot below contains the coefficients that are generated using LMT:
This result was achieved without changes to LMT Default Parameters on a Randomly generated dataset and using Weka 3.7.11.
From Java, the LMT.ToString() method should give you the leaves of the tree.

Related

How can I test overdispersion in STATA when using xtpoisson and xtnbreg?

I have balanced panel data and my dependent variable is count one which distribution has lots of zero(0).
therefore I think it might be suitable for using negative binomial regression rather than poisson one. However, I cannot find how can I test whether xtnbreg or xtpoisson is suitable for my data.
If someone can help how can I test overdispersion to choose poisson model or nbmodel.
Thank you in advance!

Empty confusion matrix in Weka with test data

I am classifying iris data using DECISION TREE (C4.5), RANDOM FOREST and NAIVE BAYES. I am using the dataset downloaded from iris-train and iris-test. When I train the all networks everything is fine with proper results with 'classifier output', 'Detailed accuracy with class' and 'confusion matrix'. But, when I select the iris-test data in the Weka-explorer-classify-test options and select the iris-test file and in 'more options' select 'output prediction' as 'csv' and click start, I am getting the result as shown in the figure below. The 'classifier output' is showing the classified samples correctly, but, 'Detailed accuracy with class' and 'confusion matrix' is with all values zeros. Any suggestion where I am going wrong in selecting any option. Thank you.
The confusion matrix shows you how well your trained classifier performs by comparing the actual class of the instances in the test set with the class that was predicted by the classifier. But you are supplying a test set with no class information, so there's nothing to compare against. This is why you see
Total Number of Instances 0
Ignored Class Unknown Instances 120
in the output in your screenshot.
Typically you would first evaluate the performance of your classifier using cross-validation, or a test set that has class information. Then you can use the trained classifier to classify unknown data, for example using the Re-evaluate model on current test set right-click option as described in the help.

How does Weka calculate the output predictions in J48 and other classifier?

I have used the output predictions of J48 classifier in Weka and got the results with predictions (probability). As I need to use these predictions number in my research, I need to know how the weka calculates these numbers? What is the formula? Is it specified for each classifier?
In addition to Jan Eglinger answer.
The J48 classifier is Weka's implementation of the infamous C4.5 decision tree classifier, which is a classification algorithm based on ID3 that classifies using information entropy.
The training data is a set S = {s_1, s_2, ...} of already classified samples. Each sample s_i consists of a p-dimensional vector (x_{1,i}, x_{2,i}, ...,x_{p,i}) , where the x_j represent attribute values or features of the sample, as well as the class in which s_i falls.
At each node of the tree, C4.5 chooses the attribute of the data that most effectively splits its set of samples into subsets enriched in one class or the other. The splitting criterion is the normalized information gain (difference in entropy). The attribute with the highest normalized information gain is chosen to make the decision. The C4.5 algorithm then recurs on the smaller sublists.
This algorithm has a few base cases.
All the samples in the list belong to the same class. When this
happens, it simply creates a leaf node for the decision tree saying
to choose that class.
None of the features provide any information gain. In this case,
C4.5 creates a decision node higher up the tree using the expected
value of the class.
Instance of previously-unseen class encountered. Again, C4.5 creates
a decision node higher up the tree using the expected value.
You can find the information Gain and entropy in the Weka Api package. For that you need to start dubbing the java weka api and go through each step.
In general, if you don't worry about how algorithm works internally using high level mathematics. Try to calculate InformationGain and entropy and explain them in your research apart from decision trees, you have methods for both of these to calculate their value.
What is the formula?
Weka's J48 classifier is an implementation of the C4.5 algorithm.
I need to know how the weka calculates these numbers?
You can find implementation details in J48.java and in the weka.classifiers.trees.j48 package.

methods does not give confusion matrix in weka

I want to do classification in weka. I am using some methods(Random Tree, Random Forest, Decision Table, RandomSubspace...) but they give results like below.
=== Cross-validation ===
=== Summary ===
Correlation coefficient 0.1678
Mean absolute error 0.4832
Root mean squared error 0.4931
Relative absolute error 96.6501 %
Root relative squared error 98.6323 %
Total Number of Instances 100000
However I want results as accurancy and confusion matrix. How can I get results like that?
Note: When I use small dataset, it gives results as confusion matrix. Can it be related with the size of dataset?
The output of the training/testing in Weka depends on the type of the attribute that you are trying to predict. If your attribute is nominal, you will get a confusion matrix and accuracy value. If your attribute is numeric, you will get a correlation coefficient.
In your small and large datasets that you mention, what is your type of the attribute that you are predicting?
I have run a 2-class problem using J48 and RandomForest with 100000 instances and the confusion matrix appeared correctly. I additionally increased the problem complexity to run 20 different classes and the confusion matrix appeared correctly as well.
If you look under more options, please ensure that the 'output confusion matrix' is checked and see if this resolves the issue.

Regression Tree Forest in Weka

I'm using Weka and would like to perform regression with random forests. Specifically, I have a dataset:
Feature1,Feature2,...,FeatureN,Class
1.0,X,...,1.4,Good
1.2,Y,...,1.5,Good
1.2,F,...,1.6,Bad
1.1,R,...,1.5,Great
0.9,J,...,1.1,Horrible
0.5,K,...,1.5,Terrific
.
.
.
Rather than learning to predict the most likely class, I want to learn the probability distribution over the classes for a given feature vector. My intuition is that using just the RandomForest model in Weka would not be appropriate, since it would be attempting to minimize its absolute error (maximum likelihood) rather than its squared error (conditional probability distribution). Is that intuition right? Is there a better model to be using if I want to perform regression rather than classification?
Edit: I'm actually thinking now that in fact it may not be a problem. Presumably, classifiers are learning the conditional probability P(Class | Feature1,...,FeatureN) and the resulting classification is just finding the c in Class that maximizes that probability distribution. Therefore, a RandomForest classifier should be able to give me the conditional probability distribution. I just had to think about it some more. If that's wrong, please correct me.
If you want to predict the probabilities for each class explicitly, you need different input data. That is, you would need to replace the value to predict. Instead of one data set with the class label, you would need n data sets (for n different labels) with aggregated data for each unique feature vector. Your data would look something like
Feature1,...,Good
1.0,...,0.5
0.3,...,1.0
and
Feature1,...,Bad
1.0,...,0.8
0.3,...,0.1
and so on. You would need to learn one model for each class and run them separately on any data to be classified. That is, for each label you learn a model to predict a number that is the probability of being in that class, given a feature vector.
If you don't need the probabilities to be predicted explicitly, have a look at the Bayesian classifiers in Weka, which make use of probabilities in the models that they learn.