Set all negative attribute values to zero in Weka

I have an attribute that I added using AddExpression filter, and now I want to change its values so that all negative values are set to zero. I tried using MathExpression filter like this:
MathExpression -E "ifelse(A > 0, A, 0)" -V -R 17
17 is the attribute index shown in Weka under Preprocess/Attributes. But after applying the filter, I can still see that the minimum value for my attribute is -5, not 0 as expected. What am I doing wrong?
In case it matters: I removed some attributes before applying this filter, so the attribute index changed.

The problem appears to be some kind of bug: it sometimes works and sometimes doesn't, and I don't know exactly how to reproduce it. However, I've found a workaround that works if there aren't many negative values.
If you click the Edit button, you can sort the rows by the attribute you want to modify and manually change the negative values to zeros. If you want to preserve the original row order, add an ID attribute with the AddID filter before sorting. After you've finished modifying the values, sort the data by ID to restore the original order.
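If there are too many negative values to fix by hand, the same result can be computed outside the GUI. The following is only a plain-Java sketch of what the expression ifelse(A > 0, A, 0) is meant to do to one column; it does not call the Weka API, and the class and method names (ClampNegatives, clamp) are made up for illustration. It assumes missing values are represented as NaN, which is how Weka stores them internally.

```java
import java.util.Arrays;

// Plain-Java illustration only; no Weka classes are used here.
class ClampNegatives {

    // Returns a copy of the column in which every negative value is replaced by 0.
    // NaN (assumed to mark a missing value) fails the "< 0" test and is left as-is.
    static double[] clamp(double[] column) {
        double[] result = new double[column.length];
        for (int i = 0; i < column.length; i++) {
            double v = column[i];
            result[i] = (v < 0) ? 0.0 : v;
        }
        return result;
    }

    public static void main(String[] args) {
        double[] column = {-5.0, 2.5, 0.0, -0.1, 7.0};
        // prints [0.0, 2.5, 0.0, 0.0, 7.0]
        System.out.println(Arrays.toString(clamp(column)));
    }
}
```

After a step like this the minimum of the column can no longer be below zero, which is a quick way to check whether the filter in the GUI actually did its job.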

Related

Remove attributes not falling in particular range in Weka

I have a data set with 7070 attributes and 70 instances. I want to remove the attributes whose values do not lie in the range [20, 2000] in Weka. What filter should I use, and how do I pass parameters to it? Also, what cutoff should be used for feature selection when using Pearson correlation?
Java code will do. Check the numeric stats (minimum and maximum) of each attribute, collect the indices of the attributes that fall outside the range in an array (or collect the in-range ones and invert the selection), and then pass that array to the Remove filter.
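The "check the stats, collect the indices" step could look roughly like this. It is a plain-Java sketch over a double[][] table rather than Weka's own data structures, and the names (RangeFilter, columnsOutsideRange) are hypothetical. The returned indices are 0-based, whereas the Weka GUI numbers attributes from 1, so they would need adjusting before being typed into a filter dialog.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; data[row][column] stands in for the data set.
class RangeFilter {

    // Returns the 0-based indices of columns with at least one non-missing
    // value outside [lo, hi]. NaN is assumed to mark a missing value and is skipped;
    // a column made up entirely of NaN is therefore never flagged.
    static List<Integer> columnsOutsideRange(double[][] data, double lo, double hi) {
        List<Integer> toRemove = new ArrayList<>();
        if (data.length == 0) {
            return toRemove;
        }
        int numCols = data[0].length;
        for (int c = 0; c < numCols; c++) {
            double min = Double.POSITIVE_INFINITY;
            double max = Double.NEGATIVE_INFINITY;
            for (double[] row : data) {
                double v = row[c];
                if (Double.isNaN(v)) {
                    continue;
                }
                min = Math.min(min, v);
                max = Math.max(max, v);
            }
            if (min < lo || max > hi) {
                toRemove.add(c);
            }
        }
        return toRemove;
    }

    public static void main(String[] args) {
        double[][] data = {
            {25.0, 10.0, 1500.0},
            {30.0, 50.0, 2500.0}
        };
        // prints [1, 2]: column 1 dips below 20, column 2 exceeds 2000
        System.out.println(columnsOutsideRange(data, 20.0, 2000.0));
    }
}
```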

Weka GUI: add attribute is-missing-value

I have a couple of attributes with missing values.
This is a survey, so the fact that the person refused to answer is, by itself, useful information!
I would like to create a new attribute called is-missing-value = 1 if a given value in an attribute is a missing value and 0 otherwise.
Things I have tried:
I have tried using AddExpression, but this seems to only perform arithmetic operations such as 2*attribute.
I know that MathExpression allows if-else expressions, such as ifelse(A < 3.0, 1, 0)... Do you guys know if/how I can test whether a value is NaN?
MakeIndicator (or NominalToBinary) should be able to do what I want, but I think I need (i) to convert my missing values to a nominal value, so that (ii) I can then convert this new nominal value to binary. The problem is that ReplaceMissingValues only fills in the mode or mean; I need to be able to define a new value. One solution could be to Edit the data directly, but I'd rather avoid this.
Please notice that I need to do this using the Weka GUI, not the Java interface.
I think I have a solution for you:
Copy the attribute (if you want the original one to remain): apply the Copy filter with the index of the attribute (this and the following filters are all under the unsupervised/attribute folder).
Convert your attribute to nominal using the NumericToNominal filter (set the attribute index).
Fill the missing values with a new value using ReplaceMissingWithUserConstant. Here you need to specify the nominalStringReplacementValue parameter (e.g. "missing") in addition to the index of your attribute.
Apply the NominalToBinary filter to your attribute. This will create several new attributes (one per unique value in the dataset, plus one for the "missing" value). You can remove the attributes you don't need and keep only the one for "missing".
Hope it helped.
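For reference, the attribute this recipe ends up with is equivalent to the following plain-Java sketch. It does not use the Weka API; the names MissingIndicator and flags are made up, and NaN is assumed to mark a missing value, as Weka does internally.

```java
import java.util.Arrays;

// Plain-Java illustration of the desired is-missing-value attribute.
class MissingIndicator {

    // Returns 1 where the column value is missing (NaN) and 0 otherwise.
    static int[] flags(double[] column) {
        int[] result = new int[column.length];
        for (int i = 0; i < column.length; i++) {
            result[i] = Double.isNaN(column[i]) ? 1 : 0;
        }
        return result;
    }

    public static void main(String[] args) {
        double[] answers = {3.0, Double.NaN, 1.0, Double.NaN};
        // prints [0, 1, 0, 1]
        System.out.println(Arrays.toString(flags(answers)));
    }
}
```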

Remove instances where nominal attribute = value (Weka GUI)

I have a dataset with:
400 instances
I have one nominal attribute cluster with values:
cluster1
cluster2
...
cluster10
How do I remove instances where e.g. cluster=cluster5? (using the GUI)
I was told to use the filter weka.filters.unsupervised.instance.RemoveWithValues, but it seems to only be able to remove numerical values below a certain splitPoint. I could of course use the Edit window, but notice I have 400 instances!
weka.filters.unsupervised.instance.RemoveWithValues will also remove instances by nominal value. Note the "nominalIndices" field in the filter's options. After setting the index of the desired attribute, enter in nominalIndices the index of the nominal value whose instances you would like removed.
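To make the "index of the nominal value" concrete: the index refers to the position of the label in the attribute's declared list of labels, counted from 1. Here is a plain-Java sketch of that lookup and of the resulting row removal. It does not call Weka, and the names (RemoveByLabel, labelIndex, dropLabel) are hypothetical. It assumes the labels are declared in the order cluster1 ... cluster10; if your file declares them in a different order, the index will differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration only; lists of strings stand in for the nominal attribute.
class RemoveByLabel {

    // 1-based position of a label in the declared label list, or -1 if absent.
    static int labelIndex(List<String> declaredLabels, String label) {
        int i = declaredLabels.indexOf(label);
        return (i < 0) ? -1 : i + 1;
    }

    // Returns a copy of the values with every occurrence of the label removed.
    static List<String> dropLabel(List<String> values, String label) {
        List<String> kept = new ArrayList<>(values);
        kept.removeIf(label::equals);
        return kept;
    }

    public static void main(String[] args) {
        List<String> declared = Arrays.asList(
            "cluster1", "cluster2", "cluster3", "cluster4", "cluster5");
        // prints 5
        System.out.println(labelIndex(declared, "cluster5"));

        List<String> column = Arrays.asList("cluster1", "cluster5", "cluster2", "cluster5");
        // prints [cluster1, cluster2]
        System.out.println(dropLabel(column, "cluster5"));
    }
}
```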

Infragistics UltraGrid - How to use displayed values in group by headers when using an IEditorDataFilter?

I have a situation where I'm using the IEditorDataFilter interface within a custom UltraGrid editor control to automatically map values from a bound data source when they're displayed in the grid cells. In this case it's converting guid-based key values into user-friendly values, and it works well by displaying what I need in the cell, but retaining the GUID values as the 'value' behind the scenes.
My issue is what happens when I enable the built-in group by functionality and the user groups by a column using my editor. In that case the group by headers default to using the cell's value, which is the guid in my case, so I end up with headers like this:
Column A: 7F720CE8-123A-4A5D-95A7-6DC6EFFE5009 (10 items)
What I really want is the cell's display value to be used instead so it's something like this:
Column A: Item 1 (10 items)
What I've tried so far
Infragistics provides a couple of mechanisms for modifying what's shown in group by rows:
GroupByRowDescriptionMask property of the grid (http://bit.ly/1g72t1b)
Manually set the row description via the InitializeGroupByRow event (http://bit.ly/1ix1CbK)
Option 1 doesn't appear to give me what I need because the cell's display value is not exposed in the set of tokens they provide. Option 2 looks promising but it's not clear to me how to get at the cell's display value. The event argument only appears to contain the cell's backing value, which in my case is the GUID.
Is there a proper approach for using the group by functionality when you're also using an IEditorDataFilter implementation to convert values?
This may be frowned upon, but I asked my question on the Infragistics forums as well, and a complete answer is available there (along with an example solution demonstrating the problem):
http://www.infragistics.com/community/forums/p/88541/439210.aspx
In short, I was applying my custom editors at the cell level, which made them unavailable when the rows were grouped together. A better approach would be to apply the editor at the column level, which would make the editor available at the time of grouping, and would provide the expected behavior.

Remove Missing Values in Weka

I'm using a dataset in Weka for classification that includes missing values. As far as I understand, Weka replaces them automatically with the modes or means of the training data (using the filter unsupervised/attribute/ReplaceMissingValues) when using a classifier like NaiveBayes.
I would like to try removing them, to see how this affects the quality of the classifier. Is there a filter to do that?
See this answer below for a better, modern approach.
My approach is not perfect: once more than 5 or 6 attributes have missing values it becomes quite cumbersome to apply. But if only a few attributes have missing values, I suggest using MultiFilter for this purpose.
If you have missing values in 2 attributes, then you'll use RemoveWithValues 2 times in a MultiFilter.
Load your data in Weka Explorer
Select MultiFilter from the Filter area
Click on MultiFilter and Add RemoveWithValues
Then configure each RemoveWithValues filter with the attribute index and select True in matchMissingValues
Save the filter settings and click Apply in Explorer.
Use the removeIf() method on weka.core.Instances, passing a method reference to hasMissingValue from weka.core.Instance, which returns true if a given Instance has any missing values.
Instances dataset = source.getDataSet(); // for some source
dataset.removeIf(Instance::hasMissingValue);
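If you want to see what that one-liner does without Weka on the classpath, here is a plain-Java analogue over a list of double[] rows. The names DropIncomplete, hasMissing and dropRowsWithMissing are made up for illustration, and NaN is assumed to mark a missing value, as in Weka's internal representation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java analogue of removing every instance that has a missing value.
class DropIncomplete {

    // True if any cell in the row is NaN (assumed to mean "missing").
    static boolean hasMissing(double[] row) {
        for (double v : row) {
            if (Double.isNaN(v)) {
                return true;
            }
        }
        return false;
    }

    // Returns a new list containing only the rows without missing values.
    static List<double[]> dropRowsWithMissing(List<double[]> rows) {
        List<double[]> kept = new ArrayList<>(rows);
        kept.removeIf(DropIncomplete::hasMissing);
        return kept;
    }

    public static void main(String[] args) {
        List<double[]> rows = Arrays.asList(
            new double[]{1.0, 2.0},
            new double[]{Double.NaN, 4.0},
            new double[]{5.0, 6.0});
        for (double[] row : dropRowsWithMissing(rows)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Unlike the MultiFilter route, this drops a row if any attribute is missing, so it is worth checking how many rows survive before training on the result.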