How can I change the order of the attributes in Weka? - data-mining

I was doing a machine learning task in Weka on a dataset with 486 attributes. I ran attribute selection using chi-square, and it gave me a ranked list of attributes.
I also have a test dataset that I have to make compatible. How can I reorder the test attributes so that they match the order in the training set?

Changing the order of attributes (e.g., when using the Ranker in conjunction with an attribute evaluator) will probably not have much influence on the performance of your classifier model (since all the attributes will stay in the dataset). Removing attributes, on the other hand, will more likely have an impact (for that, use subset evaluators).
If you want the ordering to get applied to the test set as well, then simply define your attribute selection search and evaluation schemes in the AttributeSelectedClassifier meta-classifier, instead of using the Attribute selection panel (that panel is more for exploration).
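The key idea in the answer above is that an ordering learned on the training data has to be applied unchanged to the test data, which is what AttributeSelectedClassifier takes care of for you. As a library-free illustration of that idea only, here is a minimal sketch in plain Java. The class and method names (AttributeReorder, reorder) are hypothetical and are not part of Weka's API.

```java
import java.util.List;

/** Hypothetical helper (not a Weka class): rearranges one row to follow a reference attribute order. */
final class AttributeReorder {

    /**
     * Returns the values of row rearranged to follow targetOrder.
     * sourceOrder names the attribute stored at each position of row.
     */
    static double[] reorder(List<String> sourceOrder, List<String> targetOrder, double[] row) {
        if (sourceOrder.size() != row.length) {
            throw new IllegalArgumentException("Row length does not match the source header");
        }
        double[] out = new double[targetOrder.size()];
        for (int i = 0; i < targetOrder.size(); i++) {
            String name = targetOrder.get(i);
            int src = sourceOrder.indexOf(name);
            if (src < 0) {
                throw new IllegalArgumentException("Attribute missing from the test set: " + name);
            }
            out[i] = row[src]; // copy the value into its position in the training order
        }
        return out;
    }
}
```

Here targetOrder would be the ranked attribute names obtained from the training set, and sourceOrder the header of the test set. Matching by name rather than by position means the test file does not need to start out in any particular order.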

Related

Confusion matrix in Weka

I want to calculate the confusion matrix, F1 score, ROC, etc., but the Weka output does not show them. How can I get these metrics?
First of all, your dataset seems to have a numeric class attribute. Correlation coefficient is a statistic generated for regression models. A confusion matrix (which you want) is only computed for classification models.
Secondly, you are using ZeroR as the classifier, which is not a very useful one (it only serves to determine a baseline). ZeroR predicts either the mean class value (numeric class attribute) or the majority class (nominal class attribute).
Solutions:
Ensure that you are using the right attribute for your class. Assuming that you are using the Weka Explorer, check that the combobox on the Classify panel has the right attribute selected. On the command line, use the -c flag to specify the index of the class attribute (1-based index; first and last can be used as well).
If you imported your data from a CSV file and the class attribute column contains only numeric values, then Weka will have left it as numeric (it doesn't know that this column represents a nominal attribute). In that case, make sure that you convert your class attribute to a nominal one, e.g., by using the NumericToNominal filter in the Preprocess panel.
Choose a different classifier, like RandomForest or J48, which tend to generate reasonable models with just the default parameters.
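To see why a confusion matrix only makes sense for a nominal class, it can help to compute one by hand. The following is a minimal sketch in plain Java, not Weka's own evaluation code. The class name ConfusionMatrixSketch and the convention of returning 0.0 for an undefined F1 score are assumptions made for this example.

```java
/** Hypothetical sketch (not a Weka class): confusion matrix and per-class F1 for nominal labels. */
final class ConfusionMatrixSketch {

    /** Returns counts[actual][predicted] for class indices 0..numClasses-1. */
    static int[][] confusionMatrix(int[] actual, int[] predicted, int numClasses) {
        if (actual.length != predicted.length) {
            throw new IllegalArgumentException("actual and predicted must have the same length");
        }
        int[][] counts = new int[numClasses][numClasses];
        for (int i = 0; i < actual.length; i++) {
            counts[actual[i]][predicted[i]]++;
        }
        return counts;
    }

    /** F1 = 2*TP / (2*TP + FP + FN) for one class; 0.0 if the class never occurs or is never predicted. */
    static double f1(int[][] counts, int cls) {
        int tp = counts[cls][cls];
        int fp = 0;
        int fn = 0;
        for (int k = 0; k < counts.length; k++) {
            if (k != cls) {
                fp += counts[k][cls]; // other classes predicted as cls
                fn += counts[cls][k]; // cls predicted as some other class
            }
        }
        int denominator = 2 * tp + fp + fn;
        return denominator == 0 ? 0.0 : (2.0 * tp) / denominator;
    }
}
```

The matrix is indexed by discrete class labels, so there is nothing to count when the class is a real number; that is why a numeric class yields regression statistics instead.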

Weka InfoGainAttributeEval not available

I need to do feature selection using information gain in learning to rank, and I am trying to use Weka for this. But under "Select attributes" - "Attribute Evaluator", "InfoGainAttributeEval" is not available. I do not know how to install it or make it available. Does anybody know how to fix this problem?
This usually means that something about your dataset is not compatible with the technique you are trying to use.
Although the InfoGainAttributeEval entry is greyed out in the list, you should still be able to select it (note the Start button is now greyed out), click on its name and then click Capabilities which should show you:
CAPABILITIES
Class -- Binary class, Missing class values, Nominal class
Attributes -- Binary attributes, Date attributes, Empty nominal attributes, Missing values, Nominal attributes, Numeric attributes, Unary attributes
Does your data have attributes that don't match these requirements, or have you selected a class attribute that doesn't match the class requirements?
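The class requirement follows from how information gain is defined: it is the reduction in class entropy after splitting on an attribute, and entropy is computed over discrete class labels. A minimal sketch in plain Java, handling nominal attributes only, is shown below. It is an illustration of the definition, not Weka's InfoGainAttributeEval implementation, and the class name InfoGainSketch is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch (not a Weka class): information gain of one nominal attribute. */
final class InfoGainSketch {

    /** Shannon entropy in bits of a label distribution given as counts. */
    static double entropy(Map<String, Integer> counts, int total) {
        double h = 0.0;
        for (int c : counts.values()) {
            if (c > 0) {
                double p = (double) c / total;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    /** H(class) - H(class | attribute), with attribute[i] and cls[i] describing the same instance. */
    static double infoGain(String[] attribute, String[] cls) {
        if (attribute.length != cls.length) {
            throw new IllegalArgumentException("attribute and class arrays must have the same length");
        }
        int n = cls.length;
        Map<String, Integer> classCounts = new HashMap<>();
        Map<String, Integer> valueTotals = new HashMap<>();
        Map<String, Map<String, Integer>> classCountsByValue = new HashMap<>();
        for (int i = 0; i < n; i++) {
            classCounts.merge(cls[i], 1, Integer::sum);
            valueTotals.merge(attribute[i], 1, Integer::sum);
            classCountsByValue.computeIfAbsent(attribute[i], k -> new HashMap<>())
                    .merge(cls[i], 1, Integer::sum);
        }
        // Weighted average of the class entropy inside each attribute value.
        double conditional = 0.0;
        for (Map.Entry<String, Map<String, Integer>> e : classCountsByValue.entrySet()) {
            int size = valueTotals.get(e.getKey());
            conditional += ((double) size / n) * entropy(e.getValue(), size);
        }
        return entropy(classCounts, n) - conditional;
    }
}
```

With a numeric class (for example, a relevance score stored as a number) there are no label counts to feed into the entropy, which is consistent with the evaluator being greyed out until the class is nominal.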

How to classify using J48 in Weka with information gain and random attribute selection?

I know that the J48 decision tree uses gain ratio to select attributes when building the tree.
But I want to use information gain and random selection instead of gain ratio. In the Select attributes tab in the Weka Explorer, I chose InfoGainAttributeEval and clicked the Start button. After that I see the list of attributes sorted by information gain, but I don't know how to use this list to run J48 in Weka. I also don't know how to make J48 select attributes randomly.
Please help me if you can.
If you want to perform feature selection on the data before running the algorithm you have two options:
In the Classify tab, use AttributeSelectedClassifier (under the meta folder). There you can configure the feature selection algorithm you want. (The default is J48 with CfsSubsetEval.)
In the Preprocess tab, find and apply the AttributeSelection filter (located in the supervised/attribute folder). The default here is also the CfsSubsetEval algorithm.
Note that the first method runs the selection only on the training data of each evaluation (for example, on the training folds in cross-validation), while the second method uses the entire loaded dataset and removes the features that were not selected (you can use Undo to bring them back).
Notice that the way J48 selects features during the training process will remain the same. To change it you need to implement your own algorithm or change the current implementation.

Remove Missing Values in Weka

I'm using a dataset in Weka for classification that includes missing values. As far as I understand, Weka replaces them automatically with the modes or means of the training data (using the filter unsupervised/attribute/ReplaceMissingValues) when using a classifier like NaiveBayes.
I would like to try removing them to see how this affects the quality of the classifier. Is there a filter to do that?
See the removeIf() answer below for a better, more modern approach.
My approach is not perfect: if more than 5 or 6 attributes are involved, it becomes quite cumbersome to apply. If only a few attributes have missing values, though, MultiFilter can be used for this purpose.
If you have missing values in 2 attributes then you'll use RemoveWithValues 2 times in a MultiFilter.
Load your data in Weka Explorer
Select MultiFilter from the Filter area
Click on MultiFilter and Add RemoveWithValues
Then configure each RemoveWithValues filter with the attribute index and select True in matchMissingValues
Save the filter settings and click Apply in Explorer.
Alternatively, in Java code, call removeIf() on weka.core.Instances with a method reference to the hasMissingValue method of weka.core.Instance, which returns true if the given Instance has any missing values.
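import weka.core.Instance;
import weka.core.Instances;
Instances dataset = source.getDataSet(); // for some source
dataset.removeIf(Instance::hasMissingValue); // drops every row with at least one missing value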

What does the attribute selection in the Preprocess tab do in Weka?

I can't seem to find out what the attribute selection filter in the Preprocess tab does. Could someone please explain it in simple language? I'm new to Weka.
When I apply it to my dataset it seems to remove a couple of attributes, but I'm unsure why.
A real data set may contain many attributes. Applying any data mining process to such a data set (e.g. finding clusters, generating a classification model ...) may take a very long time.
Instead, we can select a subset of the attributes (dimensions), called the most discriminative attributes. These attributes describe the data set almost as well as the full set does, and using fewer of them speeds up any process run on the data.
Weka offers many different methods for selecting these attributes. One of them is CfsSubsetEval (correlation-based feature subset evaluation). It favors attributes that are highly correlated with the class label and only weakly correlated with each other, which is why the filter removes some of your attributes.
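For intuition, the correlation-based (CFS) heuristic scores a subset of k attributes as k * r_cf / sqrt(k + k*(k-1) * r_ff), where r_cf is the average attribute-class correlation and r_ff is the average correlation between pairs of attributes. Below is a minimal sketch of that score in plain Java. The correlation values are assumed to be given as inputs, the class name CfsMeritSketch is hypothetical, and this is not Weka's CfsSubsetEval code.

```java
/** Hypothetical sketch (not a Weka class): CFS merit of an attribute subset from precomputed correlations. */
final class CfsMeritSketch {

    /**
     * classCorr[i] is the correlation of attribute i with the class;
     * featCorr[i][j] is the correlation between attributes i and j (only i < j is read).
     */
    static double merit(double[] classCorr, double[][] featCorr) {
        int k = classCorr.length;
        if (k == 0) {
            return 0.0;
        }
        double sumClassCorr = 0.0;
        for (double r : classCorr) {
            sumClassCorr += r;
        }
        double sumPairCorr = 0.0;
        for (int i = 0; i < k; i++) {
            for (int j = i + 1; j < k; j++) {
                sumPairCorr += featCorr[i][j];
            }
        }
        // k * mean(classCorr) == sumClassCorr, and k*(k-1) * mean(pairCorr) == 2 * sumPairCorr.
        return sumClassCorr / Math.sqrt(k + 2.0 * sumPairCorr);
    }
}
```

Two attributes that each correlate 0.8 with the class score higher when they are uncorrelated with each other than when they are perfectly redundant, so the redundant one adds nothing and tends to be dropped.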