I want to perform feature selection with NSGA-II in Weka, choosing the features that give the best classification results in terms of precision and recall.
How can I do this? Can anyone give me a blueprint for this task? Any help will be really appreciated.
Maybe this paper can help you choose the components to use in Weka.
Related
I am currently working with WEKA and I would appreciate your advice regarding preprocessing filters for unbalanced attribute data. I was previously advised to use the SMOTE filter to deal with the problem. The classifier I am mainly using is MultilayerPerceptron, and the SMOTE filter seems to be working decently, but I would like to know whether anyone can propose an alternative method.
Cost-sensitive classification is another approach. See the FAQ entry "I have unbalanced data now what" on the Weka wiki.
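To give a rough idea of what cost-sensitive classification does, here is a minimal sketch in plain Python (it is not the Weka API). It assumes you already have class probability estimates from some classifier and picks the class with the lowest expected cost. The function name and the cost values are made up for illustration.

```python
def min_expected_cost_class(probs, cost):
    """Pick the prediction with the lowest expected cost.

    probs[i]   : estimated probability that the true class is i
    cost[i][j] : cost of predicting class j when the true class is i
    """
    n = len(probs)
    # Expected cost of each possible prediction j, averaged over true classes i
    expected = [sum(probs[i] * cost[i][j] for i in range(n)) for j in range(n)]
    best = min(range(n), key=lambda j: expected[j])
    return best, expected


# Hypothetical cost matrix: missing the minority class (index 1)
# is assumed to be five times worse than a false alarm.
cost = [[0, 1],
        [5, 0]]

label, expected = min_expected_cost_class([0.7, 0.3], cost)
print(label, expected)
```

With equal costs the majority class (index 0) would win here; with the skewed costs above the minority class is predicted even though its probability is only 0.3.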
How can I tell if the classification model I build in Weka is correct or wrong? What can help me find my mistakes while building classification models?
In machine learning there is no such thing as totally correct or totally wrong. Your classifier makes decisions: some of them are correct, some of them are wrong. The quality of a classifier is commonly measured as its accuracy, the fraction of its answers that are correct. So you take a test set with known answers, apply the classifier to it, and check how many correct answers it gives. The test set must not be part of the training set, because otherwise you will not detect overfitting.
I have a classification problem that is highly correlated with economics by city. I have unstructured data in free text, such as population, median income, employment, etc. Is it possible to use text mining to understand the values in the text and make a classification? Most text mining articles I have read use keyword or phrase counts to make the classification. I would like to be able to classify by the meaning of the text rather than the frequency of terms. Is this possible?
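As a small illustration of checking a classifier against a held-out test set, the sketch below computes accuracy, precision, and recall from known answers and predictions. It is plain Python rather than Weka code, and the labels are invented for the example.

```python
def evaluate(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against division by zero when a class is never predicted/present
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Invented test-set labels and predictions, for illustration only
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
print(evaluate(y_true, y_pred))
```

On unbalanced data, precision and recall for the minority class usually say more than accuracy alone.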
BTW, I currently use RapidMiner and R. Not sure if this would work with either of these?
Thanks in advance,
John
Yes, this probably is possible.
But no, I cannot give you a simple solution. You will have to gain a lot of experience and experiment yourself. There is no push-button magic solution that works for everybody.
As your question is overly broad, I don't think there will be a better answer than "Yes, this might be possible", sorry.
You could think of these as two separate problems.
1. Extract information from unstructured data.
2. Classification.
There are several approaches to mining specific features from the text. On the other hand, you could also use a bag-of-words approach for classification directly and see the results. Depending on your problem, a classifier could potentially learn from just the text features.
You could also use PCA or something similar to find the important features and then run a mining process to extract those features.
All of this depends on your problem, which as stated is too broad and vague.
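For reference, a bag-of-words representation can be sketched in a few lines of plain Python. This is only a baseline illustration: the tokenizer, the example sentences, and the function names are assumptions, and RapidMiner and R have their own text-processing tools for this.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercase alphanumeric tokens in a piece of text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def to_vector(bag, vocabulary):
    """Turn a bag into a fixed-length count vector over a shared vocabulary."""
    return [bag[word] for word in vocabulary]


# Invented example documents
docs = [
    "Median income rose; income is high and employment is strong",
    "Population fell and employment is weak",
]
bags = [bag_of_words(d) for d in docs]
vocabulary = sorted(set().union(*bags))
vectors = [to_vector(b, vocabulary) for b in bags]
print(vocabulary)
print(vectors)
```

The resulting vectors can be fed to any standard classifier. They capture term frequency only, not meaning, so extracting the actual values (step 1) would need a separate, problem-specific step.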
I've been searching the web for how to generate J48 decision trees, but after almost a couple of days I haven't found anything about how to generate a J48 decision tree without Weka, I mean manually, by hand. The reason I want to do this is that I need to evaluate my data in an assignment.
I would appreciate any information about the j48 algorithm.
The J48 classifier implements the C4.5 algorithm. You should be able to use either a description of that or, if you need to be exactly like what Weka does, you can step through the code itself.
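If you want to work through a split by hand, the core quantity C4.5 uses to choose an attribute is the gain ratio. Below is a minimal sketch in plain Python for a nominal attribute. It is not Weka's J48 code, and it leaves out numeric thresholds, missing values, and pruning; the toy data is invented.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of splitting `labels` on a nominal attribute `values`."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)

    # Weighted entropy of the class labels after the split
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder

    # Split information penalizes attributes with many distinct values
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0


# Invented toy data: the attribute separates the classes perfectly
outlook = ["sunny", "sunny", "rainy", "rainy"]
play = ["no", "no", "yes", "yes"]
print(gain_ratio(outlook, play))
```

To build a tree by hand, you would compute this for every candidate attribute, split on the best one, and repeat on each resulting subset.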
You can also use Weka from your own code. Download the Weka jar file, study the API documentation that Weka provides, and then write your own program that applies the algorithm to your data.
I am just starting to play around with the Weka API and a couple of the example data sets, but just wanted to understand a couple bits and pieces. Does anyone know how to perform 0.632 bootstrapping in Weka?
Also, how would I go about detecting outliers (I understand there are many different methods of doing this...)?
Also how would I remove say 10% of outliers, once they have been identified?
Any help would be greatly appreciated!
Cheers,
Neil
You can draw a bootstrap sample, that is, sample with replacement, using the supervised Resample filter.
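For the 0.632 part specifically, the idea can be sketched in plain Python (this is not the Weka API, and the function names are hypothetical). You draw n instances with replacement as the training set, use the instances that were never drawn as the test set, and combine the two error rates with weights 0.368 and 0.632.

```python
import random


def bootstrap_split(n, rng):
    """Return (train_indices, test_indices) for one bootstrap round.

    Training indices are drawn with replacement; the test set is
    every index that was never drawn (the out-of-bag instances).
    """
    train = [rng.randrange(n) for _ in range(n)]
    drawn = set(train)
    test = [i for i in range(n) if i not in drawn]
    return train, test


def error_632(train_error, test_error):
    """0.632 bootstrap estimate from training and out-of-bag error rates."""
    return 0.368 * train_error + 0.632 * test_error


rng = random.Random(42)  # fixed seed so the run is repeatable
train, test = bootstrap_split(1000, rng)
# Roughly 63.2% of the distinct instances end up in the training sample
print(len(set(train)) / 1000, len(test) / 1000)
```

In practice you would repeat the split several times, train and evaluate the classifier on each round, and average the resulting estimates.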