improve weka classifier results - weka

I have a dataset consisting of 27 attributes and 597 instances.
I want to classify it with the best possible results using Weka; which classifier I use doesn't matter. The class attribute is nominal and the rest are numeric.
The best results so far are LWL (83.2215%) and OneR (83.389%). I used an attribute selection filter, but the results did not improve, and no other classifier gives better results, not even a neural network, SMO, or the meta classifiers.
Any idea how to improve these results, given that there are no missing values and the dataset covers 597 patients gathered over three years?

Have you tried boosting or bagging? These ensemble methods can often improve results.
http://machinelearningmastery.com/improve-machine-learning-results-with-boosting-bagging-and-blending-ensemble-methods-in-weka/
Boosting
Boosting is an ensemble method that starts out with a base classifier
that is prepared on the training data. A second classifier is then
created behind it to focus on the instances in the training data that
the first classifier got wrong. The process continues adding
classifiers until a limit on the number of models or on accuracy is
reached.
Boosting is provided in Weka in the AdaBoostM1 (adaptive boosting)
algorithm.
Click “Add new…” in the “Algorithms” section. Click the “Choose”
button. Click “AdaBoostM1” under the “meta” selection. Click the
“Choose” button for the “classifier”, select “J48” under the “tree”
section, and click the “Choose” button. Click the “OK” button on the
“AdaBoostM1” configuration.
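If you prefer to script this rather than use the GUI, here is a minimal sketch using the Weka Java API. The file name patients.arff and the class-is-last-attribute assumption are placeholders for your own data:

    import java.util.Random;

    import weka.classifiers.Evaluation;
    import weka.classifiers.meta.AdaBoostM1;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class BoostingExample {
        public static void main(String[] args) throws Exception {
            // Placeholder file name; assumes the class attribute is last.
            Instances data = DataSource.read("patients.arff");
            data.setClassIndex(data.numAttributes() - 1);

            // AdaBoostM1 with J48 as the base learner, mirroring the GUI steps.
            AdaBoostM1 boost = new AdaBoostM1();
            boost.setClassifier(new J48());
            boost.setNumIterations(10); // maximum number of boosting rounds

            // 10-fold cross-validation of the boosted model.
            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(boost, data, 10, new Random(1));
            System.out.printf("Accuracy: %.4f%%%n", eval.pctCorrect());
        }
    }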
Bagging
Bagging (Bootstrap Aggregating) is an ensemble method that creates
separate samples of the training dataset and creates a classifier for
each sample. The results of these multiple classifiers are then
combined (for example, by averaging or majority voting). The trick is
that each sample of the training dataset is different, giving each
trained classifier a subtly different focus and perspective on the
problem.
Click “Add new…” in the “Algorithms” section. Click the “Choose”
button. Click “Bagging” under the “meta” selection. Click the “Choose”
button for the “classifier”, select “J48” under the “tree” section,
and click the “Choose” button. Click the “OK” button on the “Bagging”
configuration.
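The API version differs only in the meta classifier; this fragment is a sketch meant to replace the AdaBoostM1 setup in the boosting example above:

    import weka.classifiers.meta.Bagging;
    import weka.classifiers.trees.J48;

    // Bagging with J48 as the base learner, mirroring the GUI steps.
    Bagging bag = new Bagging();
    bag.setClassifier(new J48());
    bag.setNumIterations(10);   // number of bootstrap samples / models
    bag.setBagSizePercent(100); // each bag is as large as the training set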

I tried boosting and bagging as @applecrusher mentioned. It showed a small improvement in accuracy, but for the same data scikit-learn gave much better accuracy. When I compared the code and the output at each step, I found that scikit-learn's train-test split function shuffles the data by default. When I shuffled the data for Weka using Collections.shuffle(), I saw improved results. Give it a try.
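For reference, a sketch of that shuffle against a Weka Instances object (the file name is a placeholder; Instances.randomize is Weka's built-in equivalent of Collections.shuffle here):

    import java.util.Random;

    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    Instances data = DataSource.read("patients.arff"); // placeholder file
    data.setClassIndex(data.numAttributes() - 1);
    // Shuffle before an explicit train/test split; note that
    // Evaluation.crossValidateModel already randomizes internally.
    data.randomize(new Random(42));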

Related

What is the correct way to apply a feature selection method to an imbalanced dataset?

I am new to data science & machine learning, so I'll write my question in detail.
I have an imbalanced dataset (binary classification), and I want to apply these methods using the Weka platform:
10-Fold cross validation.
Oversampling to balance the data.
A Wrapper feature selection method.
6 classifiers, comparing their performance.
I want to apply them under these conditions:
Balancing the data before applying a feature selection method (reference).
Balancing the data during cross validation (reference).
What is the correct procedure?
I've written a post below with a suggested procedure.
Is this procedure correct?
Firstly, using a feature selection method to reduce the number of features:
From the Preprocess tab: balancing the entire dataset.
From the Select attributes tab: applying a feature selection method to the balanced dataset.
From the Preprocess tab: removing the unselected attributes (resulting from step #2) from the original imbalanced dataset and saving the new copy of the dataset in order to use it in the following steps.
Then, applying cross validation and balancing methods to the new copy of the dataset:
From the Classify tab: choosing 10-fold cross validation.
Choosing FilteredClassifier and editing its properties (a code sketch of this setup follows below):
classifier: selecting the classifier (one by one).
filter: Resample.
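A hedged sketch of what that Classify-tab setup looks like through the Weka API (the file name and the J48 choice are placeholders; the supervised Resample filter with biasToUniformClass = 1.0 rebalances only the training folds):

    import java.util.Random;

    import weka.classifiers.Evaluation;
    import weka.classifiers.meta.FilteredClassifier;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.filters.supervised.instance.Resample;

    public class BalancedCrossValidation {
        public static void main(String[] args) throws Exception {
            // The reduced copy of the dataset saved in the first stage.
            Instances data = DataSource.read("reduced.arff");
            data.setClassIndex(data.numAttributes() - 1);

            // Supervised resampling biased towards a uniform class distribution.
            Resample resample = new Resample();
            resample.setBiasToUniformClass(1.0);

            // The filter is applied to the training data of each fold only,
            // so the test folds keep their original class distribution.
            FilteredClassifier fc = new FilteredClassifier();
            fc.setFilter(resample);
            fc.setClassifier(new J48()); // swap in each of your 6 classifiers

            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(fc, data, 10, new Random(1));
            System.out.println(eval.toSummaryString());
        }
    }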

Adding a custom reference point on scatter plot

I am working on a scatter chart to display Speed (X-axis) vs Consumption (Y-axis) for different vehicle designs. The goal of the report is to examine whether, for the same design, a particular vehicle is more or less efficient than others in the market.
I would like to know if it is possible for the user to input the X-axis and Y-axis specifications of the particular vehicle within the report itself, so that the user can compare it visually.
As seen in the image below, say the user has input the specifications for the specific vehicle when laden (in red) and in ballast (in green).
There are a few options:
When in Direct Query or Mixed Mode:
Embed PowerApps in your report to capture data and write it to your data store and then refresh the visual.
Build a companion App (in the tech of your choice) to update the DB with your parameters.
When in Import Mode: use What-if parameters to provide the input.

How to save the result of feature selection in Weka?

I’m trying to use InfoGainAttributeEval in Weka for feature selection. How do I save the result? When I try to save, Weka just saves my input data, not the result of the feature selection.
Welcome to SO. As far as I understand, you want the ranked values of the attributes. To get them, right-click on the "Ranker + InfoGainAttributeEval" entry in the "Result list" section and select "Save result buffer". You can view the result in a program such as Notepad, or import it into Excel and build a chart from it. This assumes you selected "Ranker" in the Search Method section, as in the screenshot below.
After selecting and running "InfoGainAttributeEval" and "Ranker" (Use full training set), you will get a "ranked" list. Right-click it, select "Save Reduced Data", and save. Open the file in Notepad, and also in Weka. In Weka, select the attributes whose rank value is 0 and delete them with "Remove", keeping those with a non-zero rank. You now have the same data reduced to those features. Save it in .arff format; you have now acquired the reduced data.
If "Save Reduced Data" is not working for you, here is another approach.
Attribute selection can be accomplished in the Preprocess tab.
There is a bar near the top for Filtering the data. Click the
"Choose" button. Under Filters->Supervised->Attribute you will
find AttributeSelection. Select that.
Once it says "AttributeSelection" in the Filter bar, you can click
on the bar to pick a selection method and a search method as well as
set the parameters for those choices.
Once you have made your choices for the feature selection algorithm,
click Apply to the right of the filter bar so that the filter is
actually applied to the data. The data should now have the reduced
feature set. So all you need to do is save it by clicking on the
Save button at the top right.
This should save the reduced data set.
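If you would rather do this programmatically, here is a sketch of the same Preprocess-tab filter via the Weka API (the file names are placeholders; the Ranker threshold of 0.0 is one way to discard zero-gain attributes):

    import java.io.File;

    import weka.attributeSelection.InfoGainAttributeEval;
    import weka.attributeSelection.Ranker;
    import weka.core.Instances;
    import weka.core.converters.ArffSaver;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.filters.Filter;
    import weka.filters.supervised.attribute.AttributeSelection;

    public class SaveReducedData {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("input.arff"); // placeholder
            data.setClassIndex(data.numAttributes() - 1);

            // Same evaluator/search combination as in the Explorer.
            AttributeSelection filter = new AttributeSelection();
            filter.setEvaluator(new InfoGainAttributeEval());
            Ranker ranker = new Ranker();
            ranker.setThreshold(0.0); // discard attributes ranked below this merit
            filter.setSearch(ranker);
            filter.setInputFormat(data);

            Instances reduced = Filter.useFilter(data, filter);

            // Save the reduced dataset in .arff format.
            ArffSaver saver = new ArffSaver();
            saver.setInstances(reduced);
            saver.setFile(new File("reduced.arff"));
            saver.writeBatch();
        }
    }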

Prevent "all pages" -> page -> visualization filters from applying

I need to create a report with several pages filtered to certain entities to display some consumption charts. I created a filter at the "all pages" level to modify all pages at once. Working fine!
However, I would also like to display the average for all entities, so a member of a specific entity can compare themselves to others without seeing their details.
To do this, I found it would be convenient to have a way to prevent:
a visualization from applying "page" and "all pages" filters;
a "page" from applying "all pages" filters.
In other words, to prevent filters from cascading.
Can this be done?
Thanks,
To stop filter interactions, click on your visual and then go to Format > Edit interactions. You will now see a diagram and a circle on every other visual. If the diagram is grey, the interaction is active; if the circle is grey, the interaction is inactive.
For your second problem, you can activate the Sync slicers pane (View > Sync slicers). However, this only works for slicers, not for filters in the filter pane.
At the moment it is not possible to stop or edit interactions for filters within the filter pane. They are always applied exactly as stated: to a visual, a page, or all pages.

How to classify using J48 in Weka with information gain and random attribute selection?

I know that the J48 decision tree uses gain ratio to select attributes when building the tree.
But I want to use information gain and random selection instead of gain ratio. In the Select attributes tab of the Weka Explorer, I choose InfoGainAttributeEval and press the Start button. After that I see the list of attributes sorted by information gain. But I don't know how to use this list to run J48 in Weka. Moreover, I don't know how to select attributes randomly in J48.
Please help me if you can.
If you want to perform feature selection on the data before running the algorithm, you have two options:
In the Classify tab, use AttributeSelectedClassifier (under the meta folder). There you can configure the feature selection algorithm you want. (The default is J48 with CfsSubsetEval.)
In the Preprocess tab, find and apply the AttributeSelection filter (located in the supervised > attribute folder). The default here is also the CfsSubsetEval algorithm.
Notice that the first method applies feature selection only on the training set each time you evaluate the algorithm, while the second method uses the entire dataset and removes the features that were not selected (you can use Undo to bring them back).
Notice that the way J48 selects features during the training process will remain the same; to change it you would need to implement your own algorithm or modify the current implementation. A sketch of the first option via the API is below.
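Here is that sketch, swapping in InfoGainAttributeEval and Ranker as the question asks (the cutoff of 10 attributes is hypothetical; tune it or use a threshold instead):

    import weka.attributeSelection.InfoGainAttributeEval;
    import weka.attributeSelection.Ranker;
    import weka.classifiers.meta.AttributeSelectedClassifier;
    import weka.classifiers.trees.J48;

    // Feature selection by information gain, wrapped around J48.
    AttributeSelectedClassifier asc = new AttributeSelectedClassifier();
    asc.setEvaluator(new InfoGainAttributeEval());
    Ranker ranker = new Ranker();
    ranker.setNumToSelect(10); // hypothetical number of attributes to keep
    asc.setSearch(ranker);
    // Note: J48 itself will still split on gain ratio internally.
    asc.setClassifier(new J48());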