Is there a better way to deal with unbalanced data than the preprocessing filter SMOTE? - weka

I am currently working with WEKA and I would appreciate your advice regarding preprocessing filters for imbalanced class data. I was previously recommended the SMOTE filter to deal with the problem, and I was wondering if anyone could propose an alternative solution. The classifier I am mainly using is MultilayerPerceptron, and the SMOTE filter seems to be working decently, but I would like to know if there is another possible method.

Cost-sensitive classification is another approach. See the FAQ entry "I have unbalanced data - now what?" on the Weka wiki.
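For what it's worth, here is a minimal sketch of the cost-sensitive route through the Weka Java API, wrapping the MultilayerPerceptron from the question. The ARFF file name and the cost of 5.0 for minority-class errors are placeholders you would replace and tune for your own data.

```java
import java.util.Random;

import weka.classifiers.CostMatrix;
import weka.classifiers.Evaluation;
import weka.classifiers.functions.MultilayerPerceptron;
import weka.classifiers.meta.CostSensitiveClassifier;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class CostSensitiveExample {
    public static void main(String[] args) throws Exception {
        // Load the data set; "mydata.arff" is a placeholder path.
        Instances data = DataSource.read("mydata.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // 2x2 cost matrix for a binary problem: misclassifying the
        // minority class (row 1) is penalised 5x more than the majority
        // class. The 5.0 is an illustrative value, not a recommendation.
        CostMatrix costs = new CostMatrix(2);
        costs.setCell(0, 0, 0.0);
        costs.setCell(0, 1, 1.0);
        costs.setCell(1, 0, 5.0);
        costs.setCell(1, 1, 0.0);

        CostSensitiveClassifier csc = new CostSensitiveClassifier();
        csc.setClassifier(new MultilayerPerceptron());
        csc.setCostMatrix(costs);
        // false = reweight the training data according to the costs;
        // true  = predict the class with minimum expected cost instead.
        csc.setMinimizeExpectedCost(false);

        // 10-fold cross-validation; per-class precision/recall is what
        // you usually want to look at on imbalanced data.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(csc, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
        System.out.println(eval.toClassDetailsString());
    }
}
```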

Related

Do ML.NET transformations apply stopword removal?

I'm new to ML.NET and playing around with some basic MultiClassClassification scenarios, and I'm wondering whether it already handles stopwords by default or whether I should do that in my data prep.
Please check out this section of the ML.NET cookbook.
If you use mlContext.Transforms.Text.FeaturizeText in your pipeline, it will by default remove English stopwords.
Of course, you are free to tweak your NLP preprocessing using the other components ML.NET provides, but in my limited experience with text classification, the catch-all FeaturizeText does a reasonable job in most cases.

How to extract the best features through the NSGA-II algorithm in Weka

I want to perform feature selection with NSGA-II in Weka, so that the selected features give the best classification results in terms of precision and recall.
How can I do this? Can anyone give me a blueprint for this task? Any help would be really appreciated.
Maybe this paper can help you choose the components to use in Weka.
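I have not used an NSGA-II search in Weka myself, so treat the following only as a rough sketch of how attribute selection is wired up through the Weka Java API. It uses WrapperSubsetEval (which scores feature subsets by the cross-validated performance of a base classifier) with the stock GreedyStepwise search; if you install a package that provides an NSGA-II / multi-objective evolutionary search, you would plug that class in as the ASSearch in place of GreedyStepwise. The file name is a placeholder.

```java
import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.GreedyStepwise;
import weka.attributeSelection.WrapperSubsetEval;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class FeatureSelectionSketch {
    public static void main(String[] args) throws Exception {
        // "mydata.arff" is a placeholder path.
        Instances data = DataSource.read("mydata.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Wrapper evaluation: candidate feature subsets are scored by the
        // cross-validated performance of the chosen base classifier.
        WrapperSubsetEval evaluator = new WrapperSubsetEval();
        evaluator.setClassifier(new J48());
        evaluator.setFolds(5);

        // Stand-in search strategy; swap in an NSGA-II search class here
        // if you have a Weka package that provides one.
        GreedyStepwise search = new GreedyStepwise();
        search.setSearchBackwards(false);

        AttributeSelection selection = new AttributeSelection();
        selection.setEvaluator(evaluator);
        selection.setSearch(search);
        selection.SelectAttributes(data);

        // Prints the selected attribute indices and evaluation details.
        System.out.println(selection.toResultsString());
    }
}
```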

Classification using text mining - by values versus keywords

I have a classification problem that is highly correlated to economics by city. I have unstructured data in free text, such as population, median income, employment, etc. Is it possible to use text mining to understand the values in the text and make a classification? Most text mining articles I have read use keyword or phrase counts to classify. I would like to be able to classify by the meaning of the text rather than the frequency of the text. Is this possible?
BTW, I currently use RapidMiner and R. Not sure if this would work with either of these?
Thanks in advance,
John
Yes, this probably is possible.
But no, I cannot give you a simple solution, you will have to collect a lot of experience and experiment yourself. There is no push-button magic solution that works for everybody.
As your question is overly broad, I don't think there will be a better answer than "Yes, this might be possible", sorry.
You could think of these as two separate problems:
1. Extract information from unstructured data.
2. Classification.
There are several approaches to mine specific features from the text. On the other hand, you could also use a bag-of-words approach for classification directly and see the results. Depending on your problem, a classifier could potentially learn from just the text features.
You could also use PCA or something similar to find the important features and then run a mining process to extract those features.
All of this depends on your problem, which as stated is too broad and vague.
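To make the bag-of-words route above concrete, here is a minimal sketch using Weka (Java), since that is what the other threads here revolve around; RapidMiner's text-processing operators and R text-mining packages offer equivalent pipelines. It assumes (as a placeholder) an ARFF file with one string attribute holding the free text and a nominal class attribute.

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayesMultinomial;
import weka.classifiers.meta.FilteredClassifier;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.unsupervised.attribute.StringToWordVector;

public class BagOfWordsSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder ARFF: one string attribute with the free text,
        // plus a nominal class attribute as the last column.
        Instances data = DataSource.read("cities.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Bag of words: turns the string attribute into word-count features.
        StringToWordVector bow = new StringToWordVector();
        bow.setLowerCaseTokens(true);
        bow.setOutputWordCounts(true);

        // FilteredClassifier applies the filter inside each training fold,
        // so the cross-validation estimate stays honest.
        FilteredClassifier fc = new FilteredClassifier();
        fc.setFilter(bow);
        fc.setClassifier(new NaiveBayesMultinomial());

        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(fc, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}
```

Whether word counts are enough to capture the "meaning" you are after is exactly the open question in your post; this only shows the baseline you would compare richer feature extraction against.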

J48 decision tree

I've been searching the web for how to generate J48 decision trees, but after almost a couple of days I haven't found anything about how to generate a J48 decision tree without Weka, I mean manually by hand. The reason I want to do this is that I need to evaluate my data in an assignment.
I would appreciate any information about the J48 algorithm.
The J48 classifier implements the C4.5 algorithm. You should be able to work from a description of that algorithm, or, if you need to match exactly what Weka does, you can step through the code itself.
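If you do want to grow the tree by hand, the core of C4.5 is the gain ratio split criterion (this is standard C4.5, not anything Weka-specific). With $S$ the instances at a node, $p_i$ the fraction of $S$ in class $i$, and $S_v$ the subset where attribute $A$ takes value $v$:

$$H(S) = -\sum_i p_i \log_2 p_i$$
$$\mathrm{Gain}(S, A) = H(S) - \sum_v \frac{|S_v|}{|S|}\, H(S_v)$$
$$\mathrm{SplitInfo}(S, A) = -\sum_v \frac{|S_v|}{|S|} \log_2 \frac{|S_v|}{|S|}$$
$$\mathrm{GainRatio}(S, A) = \frac{\mathrm{Gain}(S, A)}{\mathrm{SplitInfo}(S, A)}$$

At each node you split on the attribute with the highest gain ratio (among attributes whose gain is at least average), recurse on the resulting subsets, and finally prune the tree.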
You can also use Weka itself to develop a simple program: download the Weka jar file, study the API that Weka provides, and write your own program that runs the algorithm on your data. A rough sketch of this is below.
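Along those lines, a minimal sketch of driving J48 from your own Java program, with the Weka jar on the classpath, might look like this (the data file name is a placeholder):

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class J48Example {
    public static void main(String[] args) throws Exception {
        // Load the data; "assignment.arff" is a placeholder path.
        Instances data = DataSource.read("assignment.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // J48 is Weka's implementation of C4.5.
        J48 tree = new J48();
        tree.setConfidenceFactor(0.25f); // default pruning confidence
        tree.setMinNumObj(2);            // default minimum instances per leaf
        tree.buildClassifier(data);

        // Print the tree itself, then 10-fold cross-validation results.
        System.out.println(tree);
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(new J48(), data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}
```

The printed tree shows the same splits you would compute by hand with the gain ratio criterion, so it is also useful for checking a manual derivation.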

How do you structure a database for a wiki site?

What does the table look like - is there only one? How do you revert to older versions? Similar to how Stack Overflow works.
The best way to go about this is to look at other software such as MediaWiki and see how they structure their database. Then you can pick and choose what you want to use to start off on your own wiki design.
On the other hand, you could always start off with a pretty basic spread of tables that would keep track of Users, Articles, Revisions on an Article, etc. and start spiraling out from there.
MediaWiki details how they lay out their database in their help pages.
I agree with CookieOfFortune's comment that you should take a look at an existing open source wiki to see how they do it, but I'll also offer this thought, prefaced with the caveat that I have no experience writing wiki software. Maybe some sort of partial star schema could be useful for maintaining the previous versions.
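Purely as a hedged sketch of the Article/Revision split the other answers hint at (names and fields are made up for illustration, not taken from MediaWiki's real schema), the core of the data model might look like this:

```java
import java.time.Instant;

// Illustrative data model only: each table from the answers above is shown
// as a plain class; in a real database these would be tables with foreign keys.
class User {
    long id;
    String name;
}

class Article {
    long id;
    String title;
    long currentRevisionId; // points at the latest row in Revision
}

class Revision {
    long id;
    long articleId;   // which Article this version belongs to
    long authorId;    // the User who made the edit
    String wikitext;  // full text of this version
    Instant editedAt;
    // Reverting = inserting a new Revision whose wikitext copies an older one,
    // then updating Article.currentRevisionId to point at it. Old revisions
    // are never deleted, which is what gives you the page history.
}
```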