Confusion matrix in Weka

I want to calculate the confusion matrix, F1 score, ROC, etc., but the Weka output does not show them (it reports a correlation coefficient instead). How can I get the confusion matrix, F1 score, ROC, etc.?

First of all, your dataset seems to have a numeric class attribute. The correlation coefficient is a statistic generated for regression models, whereas a confusion matrix (which you want) is only computed for classification models.
Secondly, you are using ZeroR as the classifier, which is only useful for establishing a baseline. ZeroR ignores all input attributes and predicts either the mean class value (numeric class attribute) or the majority class (nominal class attribute).
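To make the baseline idea concrete, here is a minimal, Weka-independent sketch of what ZeroR predicts. It is plain Java, not Weka's own ZeroR class, and the class and method names are made up for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of the ZeroR baseline (not Weka's implementation):
// it ignores every input attribute and always predicts one value
// learned from the training labels.
final class ZeroRSketch {

    // Nominal class: predict the most frequent label (majority class).
    static String majorityClass(List<String> labels) {
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (String label : labels) {
            int c = counts.merge(label, 1, Integer::sum);
            if (c > bestCount) { // the first label to reach the highest count wins ties
                best = label;
                bestCount = c;
            }
        }
        return best;
    }

    // Numeric class: predict the mean of the training targets.
    static double meanTarget(double[] targets) {
        double sum = 0.0;
        for (double t : targets) sum += t;
        return sum / targets.length;
    }

    public static void main(String[] args) {
        System.out.println(majorityClass(List.of("yes", "no", "yes"))); // yes
        System.out.println(meanTarget(new double[] {1.0, 2.0, 6.0}));   // 3.0
    }
}
```

Because the prediction never depends on the attributes, any useful classifier should beat this baseline.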
Solutions:
Ensure that you are using the right attribute for your class. Assuming that you are using the Weka Explorer, check that the combobox on the Classify panel has the right attribute selected. On the command line, use the -c flag to specify the index of the class attribute (1-based index; "first" and "last" are also accepted).
If you imported your data from a CSV file and the class attribute column contains only numeric values, then Weka will have left it as numeric (it doesn't know that this column represents a nominal attribute). In that case, make sure that you convert your class attribute to a nominal one, e.g., by using the NumericToNominal filter in the Preprocess panel.
Choose a different classifier, like RandomForest or J48, which tend to generate reasonable models with just the default parameters.
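Once the class attribute is nominal, the Classify panel's output includes the confusion matrix and a "Detailed Accuracy By Class" table with F-Measure and ROC Area. From Java, weka.classifiers.Evaluation offers methods such as toMatrixString(), fMeasure(int) and areaUnderROC(int). As a rough illustration of how a confusion matrix and F1 are derived, here is a plain-Java sketch; the names are hypothetical and this is not Weka's implementation.

```java
import java.util.Arrays;

// Illustrative sketch: builds a confusion matrix from actual vs. predicted
// class indices and derives precision, recall and F1 for one class.
final class ConfusionSketch {

    // m[actual][predicted]: one row per actual class, one column per predicted
    // class, the same layout Weka prints ("<-- classified as").
    static int[][] confusionMatrix(int[] actual, int[] predicted, int numClasses) {
        int[][] m = new int[numClasses][numClasses];
        for (int i = 0; i < actual.length; i++) {
            m[actual[i]][predicted[i]]++;
        }
        return m;
    }

    static double precision(int[][] m, int c) {
        int predictedAsC = 0;
        for (int[] row : m) predictedAsC += row[c]; // column sum
        return predictedAsC == 0 ? 0.0 : (double) m[c][c] / predictedAsC;
    }

    static double recall(int[][] m, int c) {
        int actuallyC = 0;
        for (int v : m[c]) actuallyC += v; // row sum
        return actuallyC == 0 ? 0.0 : (double) m[c][c] / actuallyC;
    }

    // F1 is the harmonic mean of precision and recall.
    static double f1(int[][] m, int c) {
        double p = precision(m, c);
        double r = recall(m, c);
        return (p + r) == 0.0 ? 0.0 : 2 * p * r / (p + r);
    }

    public static void main(String[] args) {
        int[] actual    = {0, 0, 0, 1, 1, 1};
        int[] predicted = {0, 0, 1, 1, 1, 0};
        int[][] m = confusionMatrix(actual, predicted, 2);
        System.out.println(Arrays.deepToString(m)); // [[2, 1], [1, 2]]
        System.out.println(f1(m, 1));
    }
}
```

None of these statistics can be computed for a numeric class, which is why Weka only reports regression measures there.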

Related

How can I change the order of the attributes in Weka?

I was doing a machine learning task in Weka and the dataset has 486 attributes. So, I wanted to do attribute selection using chi-square, which gives me a ranked list of attributes.
Now, I also have a testing dataset and I have to make it compatible. How can I reorder the test set's attributes in the same way, so that it is compatible with the training set?
Changing the order of attributes (e.g., when using the Ranker in conjunction with an attribute evaluator) will probably not have much influence on the performance of your classifier model (since all the attributes will stay in the dataset). Removing attributes, on the other hand, will more likely have an impact (for that, use subset evaluators).
If you want the ordering to get applied to the test set as well, then simply define your attribute selection search and evaluation schemes in the AttributeSelectedClassifier meta-classifier, instead of using the Attribute selection panel (that panel is more for exploration).
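If you do need to line up a separately prepared test file by hand, the operation amounts to reordering its columns by attribute name. The sketch below shows that idea with a hypothetical plain-Java helper; it is not a Weka API (within Weka you would rely on a filter such as Reorder, or on the AttributeSelectedClassifier route described above).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Weka): reorders the columns of a test set
// so they follow the training set's attribute order, matching by name.
final class ReorderSketch {

    static List<String[]> reorder(List<String> trainOrder, List<String> testHeader,
                                  List<String[]> testRows) {
        Map<String, Integer> position = new HashMap<>();
        for (int i = 0; i < testHeader.size(); i++) position.put(testHeader.get(i), i);

        // For each training attribute (in training order), its column in the test data.
        int[] source = new int[trainOrder.size()];
        for (int j = 0; j < trainOrder.size(); j++) {
            Integer idx = position.get(trainOrder.get(j));
            if (idx == null) {
                throw new IllegalArgumentException("missing attribute: " + trainOrder.get(j));
            }
            source[j] = idx;
        }

        List<String[]> out = new ArrayList<>();
        for (String[] row : testRows) {
            String[] reordered = new String[source.length];
            for (int j = 0; j < source.length; j++) reordered[j] = row[source[j]];
            out.add(reordered);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"1", "2", "3"});
        List<String[]> out = reorder(List.of("b", "c", "a"), List.of("a", "b", "c"), rows);
        System.out.println(String.join(",", out.get(0))); // 2,3,1
    }
}
```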

How do I apply my model to a new dataset in WEKA?

I have created a new prediction model based on a dataset that was given to me. It predicts a nominal (binary) class attribute (positive/negative) based on a number of numerical attributes.
Now I have been asked to use this prediction model to predict classes for a new dataset. This dataset has all the same attributes except for the class column, which does not exist yet. How do I apply my model to this new data? I have tried adding an empty class column to my new dataset and then doing the following:
Simply loading the new dataset in WEKA's explorer and loading the model. It tells me there is no training data.
Opening my training set in WEKA's explorer and then opening my training model, then choosing my new data as a 'supplied test set'. It runs but does not output any predictions.
I should note that the model works fine when testing on the training data for cross validation. It also works fine with a subset of the training data I separated ages ago for test/eval use. I think it may be a problem with how I am adding a new class column, maybe?
For making predictions, Weka requires the two datasets, the training set and the one for making predictions, to have exactly the same structure, down to the order of the labels. That also means that you need to have a class attribute with the correct labels present. For the class values themselves, simply use the missing value (denoted by a question mark).
See the FAQ "How do I make predictions with a trained model?" on the Weka wiki for more information on how to make predictions.
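As an illustration of the required structure, the sketch below writes unlabeled rows as ARFF with a header that mirrors the training file and ? as every class value. It assumes all input attributes are numeric; the helper, relation name and attribute names are hypothetical, and in practice the header should be copied exactly from the training ARFF.

```java
import java.util.List;

// Hypothetical helper (not part of Weka): writes unlabeled rows as ARFF whose
// header mirrors the training file, with "?" (missing) as every class value.
final class PredictionArffSketch {

    static String buildArff(String relation, List<String> numericAttributes,
                            List<String> classLabels, List<double[]> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        for (String name : numericAttributes) {
            sb.append("@attribute ").append(name).append(" numeric\n");
        }
        // Same labels, in the same order, as in the training header.
        sb.append("@attribute class {").append(String.join(",", classLabels)).append("}\n\n");
        sb.append("@data\n");
        for (double[] row : rows) {
            for (double v : row) sb.append(v).append(',');
            sb.append("?\n"); // unknown class value, to be predicted by the model
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(buildArff("new_data", List.of("x1", "x2"),
                List.of("positive", "negative"), List.of(new double[] {1.5, 2.0})));
    }
}
```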

In data mining, what is a class label? Please give an example

I don't understand what it means.
In a database, does a tuple mean a field value and an attribute mean a table field?
Am I correct?
And what is a class label in data mining?
Very short answer: class label is the discrete attribute whose value you want to predict based on the values of other attributes. (Do read the rest of the answer.)
The term class label is usually used in the context of supervised machine learning, and in classification in particular, where one is given a set of examples of the form (attribute values, classLabel) and the goal is to learn a rule that computes the label from the attribute values. The class label always takes on a finite (as opposed to infinite) number of different values.
For a concrete example, we might be given a set of adult people and we'd like to predict whether they're homeless or not. Suppose the attributes were highest educational level achieved and origin (examples are of the form (origin, educationalLevel; isHomeless)):
(Manhattan, PhD; no)
(Brooklyn, Primary school; yes)
...
In this particular case, isHomeless is the class label. The goal is to learn a function that computes whether a person with given attribute values is homeless or not. (More specifically, to learn a function that makes as few mistakes as possible under a certain quantification of the number of mistakes.)
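Learning a function from attribute values to a class label can be illustrated with a very simple learner: for each value of one attribute (here origin), predict the most frequent label, then count the mistakes on the training examples. This is only a toy sketch (a simplified form of the idea behind Weka's OneR classifier), and the names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy sketch: learns a one-attribute rule (origin -> majority isHomeless label)
// from labeled examples and counts the rule's mistakes on those examples.
final class ClassLabelSketch {

    // Each example is {origin, educationalLevel, isHomeless}; index 2 is the class label.
    static Map<String, String> learnRule(List<String[]> examples) {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (String[] ex : examples) {
            counts.computeIfAbsent(ex[0], k -> new HashMap<>()).merge(ex[2], 1, Integer::sum);
        }
        Map<String, String> rule = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> e : counts.entrySet()) {
            String best = null;
            int bestCount = -1;
            // On a tie, whichever label the HashMap iterates first wins (arbitrary).
            for (Map.Entry<String, Integer> label : e.getValue().entrySet()) {
                if (label.getValue() > bestCount) {
                    best = label.getKey();
                    bestCount = label.getValue();
                }
            }
            rule.put(e.getKey(), best);
        }
        return rule;
    }

    // Number of examples whose class label the rule gets wrong.
    static int mistakes(Map<String, String> rule, List<String[]> examples) {
        int errors = 0;
        for (String[] ex : examples) {
            if (!ex[2].equals(rule.get(ex[0]))) errors++;
        }
        return errors;
    }

    public static void main(String[] args) {
        List<String[]> examples = List.of(
                new String[] {"Manhattan", "PhD", "no"},
                new String[] {"Brooklyn", "Primary school", "yes"});
        Map<String, String> rule = learnRule(examples);
        System.out.println(rule.get("Brooklyn"));     // yes
        System.out.println(mistakes(rule, examples)); // 0
    }
}
```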
The Wikipedia article Supervised learning gives a good description.
Regarding the other question: no, a tuple means the whole set of values of the attributes in a given row. For example, if you had a table person(id, name, surname), then a tuple representing the first row could be (0, 'Akhil', 'Mohan').
Basically a class label (in classification) can be compared to a response variable (in regression): a value we want to predict in terms of other (independent) variables.
The difference is that a class label is usually a discrete/categorical variable (e.g., yes/no, 0/1), whereas a response variable is normally a continuous/real-number variable.
You can find more about regression and classification in relation to response variables and class labels at https://math.stackexchange.com/questions/141381/regression-vs-classification.
Take the example of an email spam filter: it classifies an email as spam or not spam, so we define two classes, spam (class 1) and not spam (class 2). Both of these are class labels; in other words, depending on its attributes, an email belongs either to the spam class or to the not-spam class.

Remove Missing Values in Weka

I'm using a dataset in Weka for classification that includes missing values. As far as I understand, Weka replaces them automatically with the modes or means of the training data (using the filter unsupervised/attribute/ReplaceMissingValues) when using a classifier like NaiveBayes.
I would like to try removing them, to see how this affects the quality of the classifier. Is there a filter to do that?
See this answer below for a better, modern approach.
My approach is not ideal: if more than 5 or 6 attributes have missing values, it becomes quite cumbersome to apply. But if only a few attributes have missing values, you can use a MultiFilter for this purpose.
If you have missing values in 2 attributes, then you add RemoveWithValues twice to the MultiFilter.
Load your data in Weka Explorer
Select MultiFilter from the Filter area
Click on MultiFilter and add a RemoveWithValues filter
Then configure each RemoveWithValues filter with the attribute index and set matchMissingValues to True
Save the filter settings and click Apply in Explorer.
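Use the removeIf() method of weka.core.Instances with a method reference to the hasMissingValue() method of weka.core.Instance, which returns true if a given Instance has any missing values.
DataSource source = new DataSource("data.arff"); // weka.core.converters.ConverterUtils.DataSource; the path is a placeholder
Instances dataset = source.getDataSet();
dataset.removeIf(Instance::hasMissingValue); // removes every instance with at least one missing value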
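For comparison, here is a Weka-independent sketch of the two strategies discussed in this question: deleting rows that contain missing values versus replacing missing numeric values with the column mean. Missing values are encoded as Double.NaN, which is also how Weka represents them internally; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch contrasting two ways of handling missing values,
// encoded here as Double.NaN.
final class MissingValuesSketch {

    // Listwise deletion: drop every row that has at least one missing value.
    static List<double[]> removeRowsWithMissing(List<double[]> rows) {
        List<double[]> kept = new ArrayList<>();
        for (double[] row : rows) {
            boolean hasMissing = false;
            for (double v : row) {
                if (Double.isNaN(v)) { hasMissing = true; break; }
            }
            if (!hasMissing) kept.add(row);
        }
        return kept;
    }

    // Mean imputation for numeric columns: replace each missing value with the
    // mean of the observed values in that column (ReplaceMissingValues uses the
    // mean for numeric attributes and the mode for nominal ones).
    static List<double[]> replaceWithColumnMeans(List<double[]> rows, int numCols) {
        double[] sum = new double[numCols];
        int[] count = new int[numCols];
        for (double[] row : rows) {
            for (int j = 0; j < numCols; j++) {
                if (!Double.isNaN(row[j])) { sum[j] += row[j]; count[j]++; }
            }
        }
        List<double[]> out = new ArrayList<>();
        for (double[] row : rows) {
            double[] copy = row.clone(); // leave the input rows untouched
            for (int j = 0; j < numCols; j++) {
                if (Double.isNaN(copy[j]) && count[j] > 0) copy[j] = sum[j] / count[j];
            }
            out.add(copy);
        }
        return out;
    }

    public static void main(String[] args) {
        List<double[]> rows = List.of(
                new double[] {1.0, Double.NaN},
                new double[] {3.0, 4.0},
                new double[] {5.0, 8.0});
        System.out.println(removeRowsWithMissing(rows).size());        // 2
        System.out.println(replaceWithColumnMeans(rows, 2).get(0)[1]); // 6.0
    }
}
```

Deletion shrinks the training set, while imputation keeps every row but invents values, so comparing both is a reasonable experiment.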

Convert String attributes to numeric values in WEKA

I am new to Weka. My data contains a column of student names. I want to convert these names to numeric values over the whole column.
E.g., suppose there are 10 names: abcd, cdef, xyz, etc. I want to preprocess the data so that each name corresponds to a distinct numeric value, e.g., abcd becomes 1, cdef becomes 2, etc.
Also, two or more rows can have the same name; in that case, the same name should get the same value.
Please help me...
Weka supports 4 non-relational attribute types: nominal, numeric, string and date. You can find out more about them in the Weka Manual (it can be found in the same folder where you downloaded Weka), chapter "The ARFF Header Section".
You should find out what the type of the "student's name" attribute is (probably string, but it could be nominal), and decide what the type of the attribute with the converted values should be (numeric, nominal, or string).
There can be 2 scenarios:
(1) If the types of the existing and desired attributes are the same (string-string or nominal-nominal, i.e. you only want to change values, not the attribute type), you could do so either
(a) manually - open the data file in Weka Explorer and click the Edit... button, or
(b) write a small program using Weka's Attribute class functions value and setValue.
(2) Types are different - Weka attribute types cannot be converted, so you will have to create and insert a new attribute with the converted values, and delete the old attribute. An example of how to create a new attribute can be found at
http://weka.wikispaces.com/Programmatic+Use#Step.
As far as I understand, strictly converting names into a "numeric" type doesn't seem like the best approach within the context of WEKA - WEKA treats numeric attributes differently than "string" or "nominal" attributes (for example, certain "attribute selection" algorithms cannot be run on "numeric" types - they need to be "discretized" or converted into nominal form).
So, for your case, I think you can convert your "string" names into the "nominal" type using the StringToNominal class (this class acts as a WEKA "filter" that converts a given "string" attribute into an attribute of type "nominal"). This also takes care of the repeating names: the list of "nominal" values generated after you apply this filter will contain each name only once, no matter how many times it appears.
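The encoding idea can be sketched without Weka: each distinct name becomes one nominal value, every occurrence maps to that value's index, and repeated names reuse it. This is an illustration only, not StringToNominal's actual implementation (in particular, the order in which Weka assigns nominal values may differ); the indices here are 0-based and assigned in order of first appearance, so add 1 to get the numbering from the question.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: maps each distinct string to an integer index,
// similar in spirit to a nominal attribute's value index.
final class NominalEncodingSketch {

    // Fills 'dictionary' (name -> index) as it goes; repeated names reuse their index.
    static int[] encode(List<String> names, Map<String, Integer> dictionary) {
        int[] codes = new int[names.size()];
        for (int i = 0; i < names.size(); i++) {
            String name = names.get(i);
            Integer code = dictionary.get(name);
            if (code == null) {
                code = dictionary.size(); // next unused 0-based index
                dictionary.put(name, code);
            }
            codes[i] = code;
        }
        return codes;
    }

    public static void main(String[] args) {
        Map<String, Integer> dictionary = new LinkedHashMap<>();
        int[] codes = encode(List.of("abcd", "cdef", "abcd", "xyz"), dictionary);
        System.out.println(Arrays.toString(codes)); // [0, 1, 0, 2]
        System.out.println(dictionary.keySet());    // [abcd, cdef, xyz]
    }
}
```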
"Nominal" attributes also have the advantage that implicitly, they do have a numeric representation (the index of the value within the set of values; similar to how the "enums" in Java have a numeric index). So, you can utilize that as the "numeric" information corresponding to the names (though as I said earlier, it's probably best to just use it as "nominal" attribute; really depends on your particular use case).
I had the same problem as the one mentioned in the question, and I could "address" it in the following way.
I first applied the StringToNominal filter as mentioned before (don't forget to change the attribute range from "last" to "first-last"). Once that was done, I saved the dataset in LibSVM format, which changes the nominal values to numeric ones.
Then, if you close Weka, open it again and load the saved file, you will have the same dataset with the same number of features, but they will be numeric. A few changes are still needed: first, normalize all the numeric values in the dataset using the Normalize filter; after that, apply the NumericToNominal filter to the last attribute.
Then, you will have a similar dataset with numeric values.
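The Normalize step above rescales each numeric attribute to [0, 1] by default, i.e. min-max scaling x' = (x - min) / (max - min). A minimal plain-Java sketch of that formula follows; the class name is hypothetical, and mapping a constant column to 0 is an assumption of this sketch rather than documented Weka behavior.

```java
import java.util.Arrays;

// Illustrative sketch of min-max scaling one numeric column to [0, 1].
final class NormalizeSketch {

    static double[] normalize(double[] column) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : column) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double range = max - min;
        double[] out = new double[column.length];
        for (int i = 0; i < column.length; i++) {
            // A constant column has no spread; this sketch maps it to 0.
            out[i] = range == 0.0 ? 0.0 : (column[i] - min) / range;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(normalize(new double[] {2.0, 4.0, 6.0}))); // [0.0, 0.5, 1.0]
    }
}
```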
Hope this helps.