why my weka doesn't show detailed accuracy by class - weka

why my weka doesn't show detailed accuracy by class section in result? how can i change the setting? Thanks

Your statistics output lists Correlation coefficient and not Correctly Classified Instances, which implies that your class attribute is numeric (RandomForest can function as a regressor and classifier).
Per-class statistics can only be generated for nominal class attributes.
If your class attribute is indeed a nominal one, then you have to turn it into a nominal one, e.g., by using the NumericToNominal filter.

Related

number expected.read Token[2015-02-02 14:19:00] weka project

i hope you all are doing well!
I have a project at data mining class.Τhe data consists of numerical data and many algorithms do not work.I have to do this:"you should compare the performance of the following categorization algorithms:
RandomForest, C4.5, JRip, Bayesian Network. Where necessary use them
Weka filters to replace or create values ​​for some properties
new properties. For comparison, adopt the Train / Test Percentage Split type with
percentage for training data equal to 80%.Describe your observations by giving tables with the results and
presenting the performance of the algorithms. Repeat the experiment by putting
percentage for training data equal to 70% and 50% presenting the results."
So my first try was to transform the data inside weka with preprocessing data numeric to nominal but a friend of mine suggest that is statistical wrong.So my second try was to use excel to transform all data even the date to numeric,remove the first row(id) and pass it to the weka(I leave double quotes only at date)
.But i have the error that i mention on the title.The dataset is:https://archive.ics.uci.edu/ml/datasets/Occupancy+Detection+
Thank you for the time.
If you define date-like data as a DATE attribute in the ARFF file (using the right format for parsing the strings), then WEKA will treat it as a numeric attribute internally (Java epoch, ie milli-seconds since 1970-01-01).
Instead of using NumericToNominal, use either the supervised or unsupervised Discretize filter if the algorithm cannot handle numeric attributes.
Converting nominal attributes to numeric ones is not a recommended approach. Instead, try the supervised or unsupervised NominalToBinary filter.

Interpreting results using J48 for a divided attribute of interest in x levels (WEKA)

I'm new to data mining and Weka. I built a classifier with J48 in Weka using the GUI, with J48 (training set) for an attribute of interest in five levels. I have to evaluate the precision of the model, but I don't know very well how to do it! Some information may be of interest:
== Detailed Accuracy By Class ===
Precision
0.80
?
0.67
0.56
?
?
First, I would like to know the meaning of the "?" in the precision column. When probing with an attribute of interest in two levels I got no "?". The tree is bigger now than when dividing into two levels. I am questioning if this means that taking an attribute of interest in five levels could generate a less efficient tree in terms of classification and computation time. This seems quite obvious as the number of Correctly Classified Instances when the attribute had 2 levels were up to 72%.
Thank you in advance, all interesting answers will be rewarded!
"I would like to know the meaning of the "?" in the precision column"
Note that for these same classes the TP and FP rates are 0. It appears that J48 has not assigned any of your observations to these classes.
Are these classes relatively small? If so, you might want to consider using the ClassBalancer filter. This will use weights to make all classes look the same size.
Of course, after you get the model you need to "convert back" to the real situation. This is similar for correcting for physically oversampling or undersampling. See my answer here: https://stats.stackexchange.com/questions/211174/how-to-exact-prediction-from-over-sampled-dataundoing-oversampling/257507#257507

Weka not display Correctly classified instances as output

I am new on weka. I have a dataset in csv with 5000 samples. here 20 samples of it; when I upload this dataset into weka, it looks ok, but when I run knn algorithm it gives a result that is not supposed to give. here is the sample data.
a,b,c,d
74,85,123,1
73,84,122,1
72,83,121,1
70,81,119,1
70,81,119,1
69,80,118,1
70,81,119,1
70,81,119,1
76,87,125,1
76,87,125,1
82,92,146,2
74,86,140,2
68,80,134,2
64,76,130,2
64,75,132,2
83,96,152,2
72,85,141,2
71,83,141,2
69,81,139,2
65,79,137,2
here is the result :
=== Cross-validation ===
=== Summary ===
Correlation coefficient 0.6148
Mean absolute error 0.2442
Root mean squared error 0.4004
Relative absolute error 50.2313 %
Root relative squared error 81.2078 %
Total Number of Instances 5000
it is supposed to give this kind of result like:
Correctly classified instances: 69 92%
Incorrectly classified instances: 6 8%
What should be the problem? What am I missing? I did this in all other algorithms but they all give the same output. I have used sample weka datasets, they all work as expected.
The IBk algorithm can be used for regression (predicting the value of a numeric response for each instance) as well as for classification (predicting which class each instance belongs to).
It looks like all the values of the class attribute in your dataset (column d in your CSV) are numbers. When you load this data into Weka, Weka therefore guesses that this attribute should be treated as a numeric one, not a nominal one. You can tell this has happened because the histogram in the Preprocess tab looks something like this:
instead of like this (coloured by class):
The result you're seeing when you run IBk is the result of a regression fit (predicting a numeric value of column d for each instance) instead of a classification (selecting the most likely nominal value of column d for each instance).
To get the result you want, you need to tell Weka to treat this attribute as nominal. When you load the csv file in the Preprocess tab, check Invoke options dialog in the file dialog window. Then when you click Open, you'll get this window:
The field nominalAttributes is where you can give Weka a list of which attributes are nominal ones even if they look numeric. Entering 4 here will specify that the fourth attribute (column) in the input is a nominal attribute. Now IBk should behave as you expect.
You could also do this by applying the NumericToNominal unsupervised attribute filter to the already loaded data, again specifying attribute 4 otherwise the filter will apply to all the attributes.
The ARFF format used for the Weka sample datasets includes a specification of which attributes are which type. After you've imported (or filtered) your dataset as above, you can save it as ARFF and you'll then be able to reload it without having to go through the same process.

Empty confusion matrix in Weka with test data

I am classifying iris data using DECISION TREE (C4.5), RANDOM FOREST and NAIVE BAYES. I am using the dataset downloaded from iris-train and iris-test. When I train the all networks everything is fine with proper results with 'classifier output', 'Detailed accuracy with class' and 'confusion matrix'. But, when I select the iris-test data in the Weka-explorer-classify-test options and select the iris-test file and in 'more options' select 'output prediction' as 'csv' and click start, I am getting the result as shown in the figure below. The 'classifier output' is showing the classified samples correctly, but, 'Detailed accuracy with class' and 'confusion matrix' is with all values zeros. Any suggestion where I am going wrong in selecting any option. Thank you.
The confusion matrix shows you how well your trained classifier performs by comparing the actual class of the instances in the test set with the class that was predicted by the classifier. But you are supplying a test set with no class information, so there's nothing to compare against. This is why you see
Total Number of Instances 0
Ignored Class Unknown Instances 120
in the output in your screenshot.
Typically you would first evaluate the performance of your classifier using cross-validation, or a test set that has class information. Then you can use the trained classifier to classify unknown data, for example using the Re-evaluate model on current test set right-click option as described in the help.

Regression Tree Forest in Weka

I'm using Weka and would like to perform regression with random forests. Specifically, I have a dataset:
Feature1,Feature2,...,FeatureN,Class
1.0,X,...,1.4,Good
1.2,Y,...,1.5,Good
1.2,F,...,1.6,Bad
1.1,R,...,1.5,Great
0.9,J,...,1.1,Horrible
0.5,K,...,1.5,Terrific
.
.
.
Rather than learning to predict the most likely class, I want to learn the probability distribution over the classes for a given feature vector. My intuition is that using just the RandomForest model in Weka would not be appropriate, since it would be attempting to minimize its absolute error (maximum likelihood) rather than its squared error (conditional probability distribution). Is that intuition right? Is there a better model to be using if I want to perform regression rather than classification?
Edit: I'm actually thinking now that in fact it may not be a problem. Presumably, classifiers are learning the conditional probability P(Class | Feature1,...,FeatureN) and the resulting classification is just finding the c in Class that maximizes that probability distribution. Therefore, a RandomForest classifier should be able to give me the conditional probability distribution. I just had to think about it some more. If that's wrong, please correct me.
If you want to predict the probabilities for each class explicitly, you need different input data. That is, you would need to replace the value to predict. Instead of one data set with the class label, you would need n data sets (for n different labels) with aggregated data for each unique feature vector. Your data would look something like
Feature1,...,Good
1.0,...,0.5
0.3,...,1.0
and
Feature1,...,Bad
1.0,...,0.8
0.3,...,0.1
and so on. You would need to learn one model for each class and run them separately on any data to be classified. That is, for each label you learn a model to predict a number that is the probability of being in that class, given a feature vector.
If you don't need the probabilities to be predicted explicitly, have a look at the Bayesian classifiers in Weka, which make use of probabilities in the models that they learn.