I need to do feature selection using information gain for learning to rank, and I am trying to use Weka for this. However, in "Select attributes" > "Attribute Evaluator", the "InfoGainAttributeEval" entry is not available. I do not know how to install it and make it available. Does anybody know how to fix this problem?
This usually means that something about your dataset is not compatible with the technique you are trying to use.
Although the InfoGainAttributeEval entry is greyed out in the list, you should still be able to select it (note that the Start button then becomes greyed out). Click on its name and then click Capabilities, which should show you:
CAPABILITIES
Class -- Binary class, Missing class values, Nominal class
Attributes -- Binary attributes, Date attributes, Empty nominal attributes, Missing values, Nominal attributes, Numeric attributes, Unary attributes
Does your data have attributes that don't match these requirements, or have you selected a class attribute that doesn't match the class requirements?
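For context, InfoGainAttributeEval scores each attribute by IG(Class, Attribute) = H(Class) - H(Class | Attribute), which only makes sense for a nominal class. In learning to rank the relevance label is often stored as a numeric attribute; if that is the case here, it would explain the greyed-out entry, since the capabilities above list only binary and nominal classes. The plain-Java sketch below computes the score for one nominal attribute against a nominal class. It is not Weka's implementation (Weka, for example, discretizes numeric attributes first), and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class InfoGainSketch {

    // Shannon entropy (in bits) of a label distribution given as counts.
    static double entropy(Map<String, Integer> counts, int total) {
        double h = 0.0;
        for (int c : counts.values()) {
            if (c == 0) continue;
            double p = (double) c / total;
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    // IG = H(class) - sum over attribute values v of P(attr = v) * H(class | attr = v)
    static double infoGain(String[] attr, String[] cls) {
        int n = cls.length;
        Map<String, Integer> classCounts = new HashMap<>();
        Map<String, Map<String, Integer>> byValue = new HashMap<>();
        for (int i = 0; i < n; i++) {
            classCounts.merge(cls[i], 1, Integer::sum);
            byValue.computeIfAbsent(attr[i], k -> new HashMap<>())
                   .merge(cls[i], 1, Integer::sum);
        }
        double conditional = 0.0;
        for (Map<String, Integer> sub : byValue.values()) {
            int size = 0;
            for (int c : sub.values()) size += c;
            conditional += ((double) size / n) * entropy(sub, size);
        }
        return entropy(classCounts, n) - conditional;
    }

    public static void main(String[] args) {
        String[] cls = {"rel", "rel", "nonrel", "nonrel"};
        String[] perfect = {"a", "a", "b", "b"}; // separates the classes exactly
        String[] useless = {"a", "b", "a", "b"}; // independent of the class
        System.out.println(infoGain(perfect, cls)); // ≈ 1.0 bit
        System.out.println(infoGain(useless, cls)); // ≈ 0.0
    }
}
```

A Ranker search then simply sorts the attributes by this score.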
I was doing a machine learning task in Weka with a dataset that has 486 attributes. I wanted to do attribute selection using chi-square, and it gave me a ranked list of attributes.
I also have a test dataset, and I have to make it compatible. How can I reorder the test attributes in the same way, so that the test set is compatible with the training set?
Changing the order of attributes (e.g., when using the Ranker in conjunction with an attribute evaluator) will probably not have much influence on the performance of your classifier model (since all the attributes will stay in the dataset). Removing attributes, on the other hand, will more likely have an impact (for that, use subset evaluators).
If you want the ordering to get applied to the test set as well, then simply define your attribute selection search and evaluation schemes in the AttributeSelectedClassifier meta-classifier, instead of using the Attribute selection panel (that panel is more for exploration).
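The key point is that the ranking is computed once, on the training data, and the same attribute indices are then reused for every test instance; AttributeSelectedClassifier does this bookkeeping for you. The plain-Java sketch below only illustrates that idea: rows are modelled as double arrays, and project/projectAll are hypothetical helpers, not Weka API.

```java
import java.util.Arrays;

public class ApplyTrainingOrderSketch {

    // Project one row onto the attribute indices chosen on the TRAINING set.
    static double[] project(double[] row, int[] keptIndices) {
        double[] out = new double[keptIndices.length];
        for (int j = 0; j < keptIndices.length; j++) {
            out[j] = row[keptIndices[j]];
        }
        return out;
    }

    // Reuse the same index array for every row, so train and test end up
    // with the same attribute order (and the same attribute subset).
    static double[][] projectAll(double[][] rows, int[] keptIndices) {
        double[][] out = new double[rows.length][];
        for (int i = 0; i < rows.length; i++) {
            out[i] = project(rows[i], keptIndices);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical ranking produced on the training set: attribute 2, then 0, then 1.
        int[] rankingFromTrain = {2, 0, 1};
        double[][] test = {{10, 20, 30}, {11, 21, 31}};
        System.out.println(Arrays.deepToString(projectAll(test, rankingFromTrain)));
        // [[30.0, 10.0, 20.0], [31.0, 11.0, 21.0]]
    }
}
```

Passing a shorter index array drops attributes as well as reordering them, which corresponds to the subset-evaluator case.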
I was trying to preprocess the Leukemia dataset, which has two classes, ALL and AML. I need to convert it into binary values, so I used the "NominalToBinary" filter, but it does not convert it to binary values. My Weka version is 3.6.11.
Well, on my Weka 3.6 installation it works:
1. Load the file in the Explorer.
2. Under Filter, choose weka -> filters -> unsupervised -> attribute -> NominalToBinary.
3. In the attributeIndices field, enter the index of the nominal attribute you want to convert to binary.
4. Leave all other options at their defaults and click OK.
5. Click Apply.
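Conceptually, the filter replaces nominal values with numeric 0/1 indicators. As far as I know, in the default setting a two-valued attribute such as ALL/AML becomes a single 0/1 column, while an attribute with k > 2 values becomes k indicator columns (one per value) when the class is nominal. The sketch below mimics that encoding in plain Java; encode is a hypothetical helper, not Weka's code.

```java
import java.util.Arrays;
import java.util.List;

public class NominalToBinarySketch {

    // Encode one nominal value against the attribute's declared labels.
    // Two labels -> a single 0/1 column (the label's index).
    // k > 2 labels -> k indicator columns, exactly one of them set to 1.
    static double[] encode(String value, List<String> labels) {
        int idx = labels.indexOf(value);
        if (idx < 0) throw new IllegalArgumentException("unknown label: " + value);
        if (labels.size() == 2) {
            return new double[] {idx};
        }
        double[] oneHot = new double[labels.size()];
        oneHot[idx] = 1.0;
        return oneHot;
    }

    public static void main(String[] args) {
        List<String> leukemia = Arrays.asList("ALL", "AML");
        System.out.println(Arrays.toString(encode("AML", leukemia))); // [1.0]
        List<String> colors = Arrays.asList("red", "green", "blue");
        System.out.println(Arrays.toString(encode("green", colors))); // [0.0, 1.0, 0.0]
    }
}
```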
To get the NominalToBinary filter to work on the class attribute, temporarily change the attribute selected in the class dropdown to a different attribute, apply the filter, and then switch back.
Weka apparently does not let you apply the NominalToBinary filter to the attribute currently selected as the class.
I'm using a dataset in Weka for classification that includes missing values. As far as I understand, Weka replaces them automatically with the modes or means of the training data (using the filter unsupervised/attribute/ReplaceMissingValues) when using a classifier like NaiveBayes.
I would like to try removing them instead, to see how this affects the quality of the classifier. Is there a filter to do that?
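For numeric attributes, the replacement described in the question amounts to filling each gap with that column's mean over the non-missing training values (nominal attributes get the mode instead, which this sketch skips). A minimal plain-Java sketch, assuming rows are double arrays with NaN marking a missing value; the names are hypothetical.

```java
import java.util.Arrays;

public class MeanImputationSketch {

    // Replace NaN in each column with the mean of that column's non-NaN values.
    // A column with no observed values keeps its NaNs.
    static double[][] replaceMissingWithMeans(double[][] rows) {
        int cols = rows[0].length;
        double[] means = new double[cols];
        for (int j = 0; j < cols; j++) {
            double sum = 0;
            int count = 0;
            for (double[] r : rows) {
                if (!Double.isNaN(r[j])) { sum += r[j]; count++; }
            }
            means[j] = count > 0 ? sum / count : Double.NaN;
        }
        double[][] out = new double[rows.length][];
        for (int i = 0; i < rows.length; i++) {
            out[i] = rows[i].clone(); // leave the input untouched
            for (int j = 0; j < cols; j++) {
                if (Double.isNaN(out[i][j])) out[i][j] = means[j];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] data = {{1.0, 4.0}, {Double.NaN, 6.0}, {3.0, Double.NaN}};
        System.out.println(Arrays.deepToString(replaceMissingWithMeans(data)));
        // [[1.0, 4.0], [2.0, 6.0], [3.0, 5.0]]
    }
}
```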
See the answer below for a better, more modern approach.
My approach is not perfect: if more than 5 or 6 attributes have missing values, it becomes quite cumbersome to apply. If only a few attributes have missing values, however, I suggest using a MultiFilter for this purpose.
For example, if two attributes have missing values, you add RemoveWithValues twice to the MultiFilter.
1. Load your data in the Weka Explorer.
2. Select MultiFilter in the Filter area.
3. Click on MultiFilter and add RemoveWithValues (once per attribute with missing values).
4. Configure each RemoveWithValues filter with the attribute index and set matchMissingValues to True.
5. Save the filter settings and click Apply in the Explorer.
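The recipe above keeps only the instances that have a value for every attribute you configured. The same logic in plain Java, as a sketch: rows are double arrays with NaN standing in for a missing value (an assumption of this sketch), and removeRowsMissingIn is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RemoveMissingInColumnsSketch {

    // Keep only rows that are non-missing (not NaN) in every listed column.
    // One check per column plays the role of one RemoveWithValues filter;
    // columns that are not listed may still contain NaN.
    static List<double[]> removeRowsMissingIn(List<double[]> rows, int... cols) {
        List<double[]> kept = new ArrayList<>();
        for (double[] r : rows) {
            boolean complete = true;
            for (int c : cols) {
                if (Double.isNaN(r[c])) { complete = false; break; }
            }
            if (complete) kept.add(r);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<double[]> data = new ArrayList<>(Arrays.asList(
            new double[] {1, 2, 3},
            new double[] {Double.NaN, 2, 3},
            new double[] {1, 2, Double.NaN},
            new double[] {1, Double.NaN, 3}));
        // Drop rows missing attribute 0 or attribute 2; attribute 1 is not checked.
        System.out.println(removeRowsMissingIn(data, 0, 2).size()); // 2
    }
}
```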
Use the removeIf() method on weka.core.Instances with a method reference to Instance::hasMissingValue; hasMissingValue() returns true if the instance has any missing values (removeIf and method references require Java 8 or later).
Instances dataset = source.getDataSet(); // source: e.g. a weka.core.converters.ConverterUtils.DataSource
dataset.removeIf(Instance::hasMissingValue); // drops every instance that has at least one missing value
I can't figure out what the AttributeSelection filter in the Preprocess tab does. Could someone please explain it in simple terms? I'm new to Weka.
When I apply it to my dataset it seems to remove a couple of attributes, but I'm unsure why.
A real data set may contain many attributes, and applying any data mining process to it (e.g. finding clusters or generating a classification model) may take a very long time.
Instead, we can select a subset of the attributes (dimensions): the most discriminative ones. These attributes describe the data set almost as well with fewer dimensions, which speeds up any further processing of the data.
The Select attributes tab contains many different methods for choosing these attributes. One of them is CFS (correlation-based feature subset evaluation, CfsSubsetEval in Weka), which is also the default evaluator of the AttributeSelection filter. It keeps attributes that are highly correlated with the class label, which makes them discriminative, while avoiding attributes that are highly correlated with each other.
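CFS rates a whole subset S of k attributes with a merit that rewards correlation with the class and penalizes correlation among the attributes: Merit_S = k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|). The plain-Java sketch below evaluates that formula using absolute Pearson correlations on numeric columns. Weka's CfsSubsetEval uses other measures for nominal attributes (e.g. symmetric uncertainty) and pairs the merit with a search such as BestFirst, so treat this as an illustration only; the names are hypothetical.

```java
public class CfsMeritSketch {

    // Pearson correlation of two equally long numeric columns
    // (NaN if either column is constant).
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double mx = 0, my = 0;
        for (int i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
        mx /= n;
        my /= n;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - mx, dy = y[i] - my;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    // features[f] holds the values of feature f for every instance.
    static double merit(double[][] features, double[] cls, int[] subset) {
        int k = subset.length;
        double rcf = 0; // mean |feature-class correlation|
        for (int f : subset) rcf += Math.abs(pearson(features[f], cls));
        rcf /= k;
        double rff = 0; // mean |feature-feature correlation| over all pairs
        int pairs = 0;
        for (int a = 0; a < k; a++) {
            for (int b = a + 1; b < k; b++) {
                rff += Math.abs(pearson(features[subset[a]], features[subset[b]]));
                pairs++;
            }
        }
        rff = pairs > 0 ? rff / pairs : 0;
        return k * rcf / Math.sqrt(k + k * (k - 1) * rff);
    }

    public static void main(String[] args) {
        double[] cls = {0, 0, 1, 1};
        double[][] features = {
            {0, 0, 1, 1}, // feature 0: identical to the class
            {0, 0, 1, 1}, // feature 1: redundant copy of feature 0
            {1, 0, 1, 0}, // feature 2: uncorrelated with the class
        };
        System.out.println(merit(features, cls, new int[] {0}));    // ≈ 1.0
        System.out.println(merit(features, cls, new int[] {0, 1})); // ≈ 1.0
        System.out.println(merit(features, cls, new int[] {0, 2})); // ≈ 0.707
    }
}
```

A redundant copy of a good attribute leaves the merit unchanged and an irrelevant one lowers it, which is why the filter ends up dropping such attributes.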
In Sitecore's Advanced System Reporter (v1.3) shared source module, is there an out-of-the-box way to sort the results before they are displayed on screen or sent by email, or will I need to implement it myself?
In a standard ASR install, I can see the Media Viewer viewer configuration item has a sort parameter in the attributes field, but it uses the ASR.Reports.Items.ItemViewer class, which, as I checked in Reflector, doesn't respect the sort parameter. I take this to mean that the class might have respected the sort parameter previously but doesn't now.
As a side thought, I would have thought that a Scanner class would be a much more logical place to put sorting logic than at the Viewer class level.
OK, found the answer: the sort parameter I found is in fact used by the ASR module when it runs the report.
The sort parameter is set up in the attributes and is in the following format:
sort=ColumnName,ASC|DESC,[DateTime]
where ColumnName is the display name of the column, ASC or DESC is the sort direction (required), and DateTime should be added if the column holds date-time values.
Example:
Given the column formatting of
<Columns>
<Column name="item name">Item Name</Column>
<Column name="publish date">Publish Date</Column>
</Columns>
to sort by publish date descending, the appropriate sort parameter would be
sort=Publish Date,DESC,DateTime
and to sort by item name, the sort parameter would be
sort=Item Name,ASC
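To make the format concrete, here is a small parser that turns such a parameter into a comparator. It is a hypothetical Java sketch, not the ASR module's code (ASR itself is a .NET module): rows are modelled as maps from column display name to cell text, and DateTime columns are assumed to hold ISO-8601 text such as 2014-03-01T10:15:00, which may differ from what ASR actually renders.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AsrSortSketch {

    // Parse "sort=ColumnName,ASC|DESC[,DateTime]" into a row comparator.
    static Comparator<Map<String, String>> parseSort(String spec) {
        String body = spec.startsWith("sort=") ? spec.substring("sort=".length()) : spec;
        String[] parts = body.split(",");
        if (parts.length < 2) {
            throw new IllegalArgumentException("sort direction (ASC or DESC) is required: " + spec);
        }
        String column = parts[0].trim();
        String direction = parts[1].trim();
        boolean isDateTime = parts.length > 2 && parts[2].trim().equalsIgnoreCase("DateTime");

        Comparator<Map<String, String>> cmp;
        if (isDateTime) {
            // Compare as date-times (assumed ISO-8601 text).
            cmp = Comparator.comparing((Map<String, String> row) -> LocalDateTime.parse(row.get(column)));
        } else {
            // Compare as plain text.
            cmp = Comparator.comparing((Map<String, String> row) -> row.get(column));
        }
        if (direction.equalsIgnoreCase("DESC")) return cmp.reversed();
        if (direction.equalsIgnoreCase("ASC")) return cmp;
        throw new IllegalArgumentException("direction must be ASC or DESC: " + spec);
    }

    // Hypothetical row with the two columns from the example above.
    static Map<String, String> row(String itemName, String publishDate) {
        Map<String, String> m = new HashMap<>();
        m.put("Item Name", itemName);
        m.put("Publish Date", publishDate);
        return m;
    }

    // Sort three made-up rows with the given spec and return the item names in order.
    static List<String> sortedNames(String spec) {
        List<Map<String, String>> rows = new ArrayList<>(Arrays.asList(
            row("Alpha", "2013-12-31T23:59:00"),
            row("Beta", "2014-03-01T10:15:00"),
            row("Gamma", "2014-01-15T09:00:00")));
        rows.sort(parseSort(spec));
        List<String> names = new ArrayList<>();
        for (Map<String, String> r : rows) names.add(r.get("Item Name"));
        return names;
    }

    public static void main(String[] args) {
        System.out.println(sortedNames("sort=Publish Date,DESC,DateTime")); // [Beta, Gamma, Alpha]
        System.out.println(sortedNames("sort=Item Name,ASC"));              // [Alpha, Beta, Gamma]
    }
}
```

The DateTime flag matters because date text in a non-ISO display format would not sort chronologically as plain strings.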
I'm not sure anyone apart from, probably, the module author can answer your question immediately. But you have a huge advantage in this case: the module sources. Instead of browsing the assemblies with Reflector, you can check out the latest sources and just debug them. One debug session can answer more questions than a bunch of SO posts. ;-)
Also, as a side note, you might have noticed the special Sitecore logos on that page; this blog post will tell you what they mean.