I was trying to preprocess the Leukemia dataset, which has two classes, ALL and AML. I need to convert it into binary values. I used the NominalToBinary filter, but it does not convert the values to binary. My Weka version is 3.6.11.
Well, on my 3.6 version of Weka, it is working.
1. Load the file in the Explorer.
2. Go to Filter -> weka -> filters -> unsupervised -> attribute -> NominalToBinary.
3. In attributeIndices, enter the index of the nominal attribute that you want to convert to binary.
4. Leave all other options at their defaults. Click OK.
5. Click apply.
To get the NominalToBinary filter to work on the class attribute, temporarily select a different attribute in the class dropdown, apply the filter, and then switch the class back.
Weka apparently does not let you apply the NominalToBinary filter to the currently selected class attribute.
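To make concrete what the conversion amounts to, here is a minimal plain-Java sketch. It is not Weka's code: the class name `NominalToBinarySketch` and the `encode` method are made up for illustration, and collapsing a two-valued attribute into a single 0/1 column is my understanding of the filter's default behavior, not something confirmed above.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: not Weka's implementation, just the idea of indicator encoding.
final class NominalToBinarySketch {

    // Encodes one nominal column. With exactly two labels the result is a single
    // 0/1 column (assumed default for two-valued attributes); with more labels
    // it is one indicator column per label.
    static int[][] encode(List<String> labels, List<String> column) {
        boolean twoValued = labels.size() == 2;
        int width = twoValued ? 1 : labels.size();
        int[][] out = new int[column.size()][width];
        for (int row = 0; row < column.size(); row++) {
            int index = labels.indexOf(column.get(row));
            if (index < 0) {
                throw new IllegalArgumentException("Unknown label: " + column.get(row));
            }
            if (twoValued) {
                out[row][0] = index;   // first label -> 0, second label -> 1
            } else {
                out[row][index] = 1;   // set only the matching indicator
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] cls = encode(Arrays.asList("ALL", "AML"), Arrays.asList("ALL", "AML", "AML"));
        System.out.println(Arrays.deepToString(cls));
    }
}
```

For the ALL/AML class this gives one column of zeros and ones; an attribute with three or more values would get one indicator column per value.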
I need to do feature selection using information gain in learning to rank, and I am trying to use Weka for this. However, under "Select attributes" -> "Attribute Evaluator", "InfoGainAttributeEval" is not available. I do not know how to install it or make it available. Does anybody know how to fix this problem?
This usually means that something about your dataset is not compatible with the technique you are trying to use.
Although the InfoGainAttributeEval entry is greyed out in the list, you should still be able to select it (note the Start button is now greyed out), click on its name and then click Capabilities which should show you:
CAPABILITIES
Class -- Binary class, Missing class values, Nominal class
Attributes -- Binary attributes, Date attributes, Empty nominal attributes, Missing values, Nominal attributes, Numeric attributes, Unary attributes
Does your data have attributes that don't match these requirements, or have you selected a class attribute that doesn't match the class requirements?
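It may help to see why the class has to be nominal: information gain is defined through the entropy of discrete class labels, so a numeric class (for example a raw relevance score, if that is what you have) has nothing to count. The following is a rough plain-Java sketch of the textbook definition for a nominal attribute, not Weka's implementation; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: textbook information gain for one nominal attribute.
final class InfoGainSketch {

    // Shannon entropy (in bits) of a list of nominal labels.
    static double entropy(String[] labels) {
        Map<String, Integer> counts = new HashMap<>();
        for (String label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        double h = 0.0;
        for (int count : counts.values()) {
            double p = (double) count / labels.length;
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    // Information gain: H(class) - H(class | attribute).
    static double infoGain(String[] attribute, String[] classLabels) {
        if (attribute.length != classLabels.length) {
            throw new IllegalArgumentException("Columns must have the same length");
        }
        // Group the class labels by attribute value.
        Map<String, List<String>> groups = new HashMap<>();
        for (int i = 0; i < attribute.length; i++) {
            groups.computeIfAbsent(attribute[i], k -> new ArrayList<>()).add(classLabels[i]);
        }
        // Weighted entropy of the class within each group.
        double conditional = 0.0;
        for (List<String> group : groups.values()) {
            double weight = (double) group.size() / attribute.length;
            conditional += weight * entropy(group.toArray(new String[0]));
        }
        return entropy(classLabels) - conditional;
    }

    public static void main(String[] args) {
        String[] cls = {"a", "a", "b", "b"};
        System.out.println(infoGain(new String[] {"x", "x", "y", "y"}, cls));
        System.out.println(infoGain(new String[] {"x", "y", "x", "y"}, cls));
    }
}
```

An attribute that separates the classes perfectly gets the full class entropy as its gain, while one that is independent of the class gets zero.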
I have a couple of attributes with missing values.
This is a survey, so the fact that the person refused to answer is, by itself, useful information!
I would like to create a new attribute called is-missing-value = 1 if a given value in an attribute is a missing value and 0 otherwise.
Things I have tried:
I have tried using AddExpression, but this seems to only perform arithmetic operations such as 2*attribute.
I know that MathExpression allows using if-elses, such as ifelse(A < 3.0, 1, 0)... Do you guys know if/how I can test if a value is nan?
MakeIndicator (or NominalToBinary) should be able to do what I want, but I think I need (i) to convert my missing values to a nominal value, so that (ii) I can then convert this new nominal value to binary. The problem is that ReplaceMissingValues only fills in the mode or mean; I need to be able to define a new value. One solution could be to edit the data directly, but I'd rather avoid this.
Please note that I need to do this using the Weka GUI, not the Java interface.
I think I have a solution for you:
1. Copy the attribute (if you want the original one to remain): apply the Copy filter with the index of the attribute. This and the following filters are all under the unsupervised/attribute folder.
2. Convert your attribute to nominal using the NumericToNominal filter (set the attribute index).
3. Fill the missing values with a new value using ReplaceMissingWithUserConstant. Here you need to specify the nominalStringReplacementValue parameter (e.g. "missing") in addition to the index of your attribute.
4. Apply the NominalToBinary filter to your attribute. This creates several new attributes (one per unique value in the dataset, plus one for the missing value). You can remove the attributes you don't need and keep only the one for the missing value.
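Since you asked for a GUI-only route, the following is only meant to pin down the column those steps should leave you with. It is a plain-Java sketch, not Weka code; the class name is made up, and treating a missing value as `NaN` is an assumption of the sketch.

```java
import java.util.Arrays;

// Illustrative only: the target "is-missing-value" column for one attribute.
final class MissingIndicatorSketch {

    // Returns 1 where the value is missing (assumed to be NaN) and 0 otherwise.
    static int[] isMissing(double[] column) {
        int[] flags = new int[column.length];
        for (int i = 0; i < column.length; i++) {
            flags[i] = Double.isNaN(column[i]) ? 1 : 0;
        }
        return flags;
    }

    public static void main(String[] args) {
        double[] answers = {1.0, Double.NaN, 3.5};
        System.out.println(Arrays.toString(isMissing(answers)));
    }
}
```

After the NominalToBinary step, the indicator attribute for the "missing" value should match this column row for row.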
Hope it helped.
I know that the J48 decision tree uses gain ratio to select attributes when building the tree.
But I want to use information gain and random selection instead of gain ratio. In the Select attributes tab of the Weka Explorer, I chose InfoGainAttributeEval and pressed the Start button. After that I see the list of attributes sorted by information gain, but I don't know how to use this list to run J48 in Weka. Moreover, I don't know how to select attributes randomly in J48.
Please help me if you can.
If you want to perform feature selection on the data before running the algorithm you have two options:
1. In the Classify tab, use AttributeSelectedClassifier (under the meta folder). There you can configure the feature selection algorithm you want. (The default is J48 with CfsSubsetEval.)
2. In the Preprocess tab, find and apply the AttributeSelection filter (located in the supervised/attribute folder). The default here is also the CfsSubsetEval algorithm.
Note that the first method applies the selection algorithm only to the training set when you evaluate the classifier, while the second method uses the entire dataset and removes the features that were not selected (you can use Undo to bring them back).
Note also that the way J48 selects features during training remains the same in both cases. To change it, you need to implement your own algorithm or modify the current implementation.
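For what the two criteria actually compute, here is a small plain-Java sketch of gain ratio as it is usually defined (information gain divided by split information). It is not J48's code, which may apply further conditions when choosing a split; the class name and the contingency-table layout (`counts[v][c]` = rows with attribute value `v` and class `c`, all rows the same length) are assumptions made for illustration.

```java
// Illustrative only: information gain vs. gain ratio from a contingency table.
final class GainRatioSketch {

    // Entropy (in bits) of a distribution given as raw counts.
    static double entropy(int[] counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        if (total == 0) {
            return 0.0;
        }
        double h = 0.0;
        for (int c : counts) {
            if (c > 0) {
                double p = (double) c / total;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    // Number of rows for each attribute value.
    static int[] rowTotals(int[][] counts) {
        int[] totals = new int[counts.length];
        for (int v = 0; v < counts.length; v++) {
            for (int c : counts[v]) {
                totals[v] += c;
            }
        }
        return totals;
    }

    // H(class) - H(class | attribute).
    static double infoGain(int[][] counts) {
        int[] rows = rowTotals(counts);
        int n = 0;
        for (int r : rows) {
            n += r;
        }
        if (n == 0) {
            return 0.0;
        }
        int[] classTotals = new int[counts[0].length];
        double conditional = 0.0;
        for (int v = 0; v < counts.length; v++) {
            for (int c = 0; c < counts[v].length; c++) {
                classTotals[c] += counts[v][c];
            }
            conditional += ((double) rows[v] / n) * entropy(counts[v]);
        }
        return entropy(classTotals) - conditional;
    }

    // Split information: entropy of the attribute's own value distribution.
    static double splitInfo(int[][] counts) {
        return entropy(rowTotals(counts));
    }

    static double gainRatio(int[][] counts) {
        double split = splitInfo(counts);
        return split == 0.0 ? 0.0 : infoGain(counts) / split;
    }

    public static void main(String[] args) {
        int[][] twoValues = {{2, 0}, {0, 2}};
        int[][] idLike = {{1, 0}, {1, 0}, {0, 1}, {0, 1}};
        System.out.println(infoGain(twoValues) + " " + gainRatio(twoValues));
        System.out.println(infoGain(idLike) + " " + gainRatio(idLike));
    }
}
```

Both tables have the same information gain, but the ID-like attribute with one row per value gets a lower gain ratio, which is the bias correction you would lose by switching criteria.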
I'm using a dataset in Weka for classification that includes missing values. As far as I understand, Weka replaces them automatically with the modes or means of the training data (using the filter unsupervised/attribute/ReplaceMissingValues) when using a classifier like NaiveBayes.
I would like to try removing them instead, to see how this affects the quality of the classifier. Is there a filter to do that?
See this answer below for a better, modern approach.
My approach is not perfect: with more than 5 or 6 attributes it becomes quite cumbersome to apply. If only a few attributes have missing values, though, I suggest using MultiFilter for this purpose.
If you have missing values in 2 attributes, you use RemoveWithValues twice inside a MultiFilter.
1. Load your data in the Weka Explorer.
2. Select MultiFilter from the Filter area.
3. Click on MultiFilter and add one RemoveWithValues filter per attribute.
4. Configure each RemoveWithValues filter with the attribute index and set matchMissingValues to True.
5. Save the filter settings and click Apply in the Explorer.
Use the removeIf() method on weka.core.Instances, passing a method reference to hasMissingValue from weka.core.Instance, which returns true if a given Instance has any missing values.
Instances dataset = source.getDataSet(); // for some source
dataset.removeIf(Instance::hasMissingValue);
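If you want to see the mechanics without Weka on the classpath, the same idea can be sketched on plain Java collections. This is an analogy only, not Weka code: the class name is made up, rows are plain `double[]` arrays, and `NaN` stands in for a missing value.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: dropping rows with missing values from a plain list of rows.
final class DropIncompleteRowsSketch {

    // True if any cell in the row is missing (assumed to be NaN).
    static boolean hasMissingValue(double[] row) {
        for (double value : row) {
            if (Double.isNaN(value)) {
                return true;
            }
        }
        return false;
    }

    // Returns a copy of the rows without the incomplete ones; the input is untouched.
    static List<double[]> dropIncomplete(List<double[]> rows) {
        List<double[]> kept = new ArrayList<>(rows);
        kept.removeIf(DropIncompleteRowsSketch::hasMissingValue);
        return kept;
    }

    public static void main(String[] args) {
        List<double[]> rows = new ArrayList<>();
        rows.add(new double[] {1.0, 2.0});
        rows.add(new double[] {Double.NaN, 3.0});
        rows.add(new double[] {4.0, 5.0});
        System.out.println(dropIncomplete(rows).size());
    }
}
```

Unlike the copy made here, calling removeIf directly on the dataset modifies it in place, so keep the original around if you want to compare classifier quality before and after.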
I can't seem to find out what the attribute selection filter in the Preprocess tab does. Could someone please explain it in simple language, as I'm new to Weka?
When I apply it to my dataset it seems to remove a couple of attributes, but I'm unsure why.
A real data set may contain many attributes. Applying a data mining process to such a data set (e.g. finding clusters or generating a classification model) may take a very long time.
Instead, we can select a subset of the attributes (dimensions), the most discriminative ones. These attributes can describe the data set almost as well with fewer attributes, which speeds up any process run on the data.
The attribute selection tab contains many different methods for selecting these attributes. One of them is CfsSubsetEval (correlation-based feature subset evaluation). It gives you the attributes that have a higher correlation with the class label, which is what makes them discriminative.
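To give a feel for how a correlation-based method can decide that an attribute is not worth keeping, here is a sketch of the subset merit heuristic from the CFS literature, which I understand CfsSubsetEval to be based on. It is not Weka's code, the names are made up, and the correlations are assumed to be non-negative numbers that you already have.

```java
// Illustrative only: the CFS "merit" of a subset of k attributes.
final class CfsMeritSketch {

    // merit = k * r_cf / sqrt(k + k * (k - 1) * r_ff), where r_cf is the mean
    // attribute-class correlation and r_ff the mean attribute-attribute
    // correlation inside the subset (both assumed to be in [0, 1]).
    static double merit(int k, double meanAttrClassCorr, double meanAttrAttrCorr) {
        if (k <= 0) {
            throw new IllegalArgumentException("Subset must contain at least one attribute");
        }
        return k * meanAttrClassCorr / Math.sqrt(k + k * (k - 1) * meanAttrAttrCorr);
    }

    public static void main(String[] args) {
        System.out.println(merit(1, 0.5, 0.0)); // a single attribute
        System.out.println(merit(2, 0.5, 1.0)); // plus a fully redundant copy
        System.out.println(merit(2, 0.5, 0.0)); // plus an unrelated, equally useful one
    }
}
```

Adding a redundant attribute leaves the merit unchanged, while adding an equally useful but unrelated one raises it. That is why such a filter can drop attributes that look informative on their own.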