Remove attributes not falling in particular range in Weka - weka

I have data-set with 7070 attributes and 70 instances. I want to remove attributes whose values do not lie in the range[20, 2000] in weka. What filter should I use and how to pass parameters to that filter. Also what cutoff should be used for feature selection while using Pearson correlation?

Java code will do. Use "remove" filter and check numeric stats of each attribute if they fall in range add them to an array and then call remove filter.

Related

How can I change the order of the attributes in Weka?

I was doing a machine learning task in Weka and the dataset has 486 attributes. So, I wanted to do attribute selection using chi-square and it provides me ranked attributes like below:
Now, I also have a testing dataset and I have to make it compatible. But how can I reorder the test attributes in the same manner that can be compatible with the train set?
Changing the order of attributes (e.g., when using the Ranker in conjunction with an attribute evaluator) will probably not have much influence on the performance of your classifier model (since all the attributes will stay in the dataset). Removing attributes, on the other hand, will more likely have an impact (for that, use subset evaluators).
If you want the ordering to get applied to the test set as well, then simply define your attribute selection search and evaluation schemes in the AttributeSelectedClassifier meta-classifier, instead of using the Attribute selection panel (that panel is more for exploration).

Confusion matrix in Weka

I want to calculate confusion matrix, f1 score, roc etc. But the Weka output is showing this. How can I get the confusion matrix, f1 score, roc, etc?
First of all, your dataset seems to have a numeric class attribute. Correlation coefficient is a statistic generated for regression models. A confusion matrix (which you want) is only computed for classification models.
Secondly, you are using ZeroR as classifier, which is not a very useful classifier (only for determining a baseline). ZeroR either predicts the mean class value (numeric class attribute) or the majority class (nominal class attribute).
Solutions:
Ensure that you are using the right attribute for your class. Assuming that you are using the Weka Explorer, check the combobox on the Classify panel that it has the right attribute selected. On the command-line, use the -c flag to specify the index of the class attribute (1-based index, first and last can be used as well).
If you imported your data from a CSV file and the class attribute column contains only numeric values, then Weka will have left it as numeric (it doesn't know that this column represents a nominal attribute). In that case, make sure that you convert your class attribute to a nominal one, e.g., by using the NumericToNominal filter in the Preprocess panel.
Choose a different classifier, like RandomForest or J48, which tend to generate reasonable models with just the default parameters.

Negative filtering by filter_box or some other mechanism

Let's say I have a column named Column1. There are more than 10k different values for this column, but my goal is to display on a dashboard all data except few of them. Is it possible to achieve it in Superset? As far as I understand the only one option to filter dashboard is a filter_box, and I have to choose values explicitly in filterbox, so no way to use a negative filter. Is it true, or there is some hidden mechanism?
You can use the limit selector values option to provide the filter out values you dont need by specifying the column name and the list of values you would like to ignore using the appropriate condition like *equals, not equals, etc

Remove Missing Values in Weka

I'm using a dataset in Weka for classfication that includes missing values. As far as I understood, Weka replaces them automatically with the Modes or Mean of the training data (using the filter unsupervised/attribute/ReplaceMissingValues) when using a classifier like NaiveBayes.
I would like to try removing them, to see how this effects the quality of the classifier. Is there a filter to do that?
See this answer below for a better, modern approach.
My approach is not the perfect one because IF you have more than 5 or 6 attributes then it becomes quite cumbersome to apply but I can suggest that MultiFilter should be used for this purpose if only a few attributes have missing values.
If you have missing values in 2 attributes then you'll use RemoveWithValues 2 times in a MultiFilter.
Load your data in Weka Explorer
Select MultiFilter from the Filter area
Click on MultiFilter and Add RemoveWithValues
Then configure each RemoveWithValues filter with the attribute index and select True in matchMissingValues
Save the filter settings and click Apply in Explorer.
Use the removeIf() method on weka.core.Instances using the method reference from weka.core.Instance for the hasMissingValue method, which returns a boolean if a given Instance has any missing values.
Instances dataset = source.getDataSet(); // for some source
dataset.removeIf(Instance::hasMissingValue);

what does the attribute selection in preprocess tab do in weka?

I cant seem to find out what attribute selection filter does in pre process tab? someone could please tell me in simple language as im new to weka
when i apply it to my dataset it seems to remove a couple of attributes but im unsure why
A real data set may contain many attributes. Applying any data mining process on this data set (e.g. finding clusters, generating a classification model ...) may take very long time.
Instead of that, we can select some attributes(dimensions) which is called the most discriminative attributes. These attributes can almost describe the data set with lower number of attributes and this will speed up any process done on the data.
Attribute selection tab contains many different methods for selecting these attributes. One of them is CFS Feature Set Evaluation This filter gives you the attributes that have higher correlation with the class label which makes them discriminative attributes.