How to add a new classification algorithm to Weka

I want to use some classification algorithms in Weka (like C4.5 and ID3), but I don't know how to add them to Weka. Are they available in Weka? If these algorithms are not available, how can I add them? I also could not find the weka.classifiers.trees package on Google.

Weka's implementation of C4.5 (the successor to ID3) is called J48. The J is for Java, and the 48 refers to C4.8, i.e. C4.5 revision 8, the last public release of the algorithm.
In the Weka Explorer, open the "Classify" tab, click "Choose" under "Classifier", and select "trees" > "J48".
Click the "More" button in its options dialog:
NAME weka.classifiers.trees.J48
SYNOPSIS Class for generating a pruned or unpruned C4.5 decision tree.
For more information, see
Ross Quinlan (1993). C4.5: Programs for Machine Learning. Morgan
Kaufmann Publishers, San Mateo, CA.

Yes, they are available in Weka.
As knb mentioned, J48 is Weka's implementation of C4.5. It ships with Weka, so you don't need to download it.
To use the ID3 classification algorithm, you need to install a package named "simpleEducationalLearningSchemes".
For the full process, see:
Weka 3.8 Package installation: What are the steps to add id3?
(There's an answer there written by G5W)

Related

Weka software decision tree

After installing Weka 3.8, I loaded a .csv file in the Explorer and wanted to build a decision tree with the test option "Use training set".
Everything installed fine (a self-extracting executable for 64-bit Windows that includes Oracle's 64-bit Java VM 1.8).
The file loaded fine; it had previously been saved from Excel as comma-delimited.
The problem lies in building the decision tree itself: I go to the Classify tab, select the test option "Use training set", and click Start.
After it runs, a result appears which, according to some images I've seen, should let me right-click it and select "Visualize tree".
That doesn't happen, as you can see in the next image:
How do I fix this in order to build the decision tree?
You have run a ZeroR classifier, see http://chem-eng.utoronto.ca/~datamining/dmc/zeror.htm. ZeroR is not a decision tree classifier, so it cannot be visualized as one. You need to train an actual decision tree classifier; J48 is one of them. See http://facweb.cs.depaul.edu/mobasher/classes/ect584/WEKA/classify.html for a guide on how to do so.
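To see why there is nothing to visualize, here is a minimal Python sketch of what a ZeroR-style baseline does for a nominal class: it ignores every attribute and always predicts the most frequent class in the training data. The function names and sample data are hypothetical, and this is an illustration of the idea, not Weka's code.

```python
from collections import Counter


def train_zero_r(labels):
    """Return the majority class label; attributes are never consulted."""
    if not labels:
        raise ValueError("need at least one training label")
    return Counter(labels).most_common(1)[0][0]


def predict_zero_r(majority_label, instances):
    """Predict the same label for every instance, whatever its attributes."""
    return [majority_label for _ in instances]


if __name__ == "__main__":
    # Made-up training labels for illustration.
    train_labels = ["yes", "yes", "no", "yes", "no"]
    model = train_zero_r(train_labels)
    print(model)
    print(predict_zero_r(model, [{"outlook": "sunny"}, {"outlook": "rainy"}]))
```

Because the "model" is a single label, there are no splits or branches for a tree viewer to draw.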

Weka, text classification on an ARFF file

This is a basic question. I am trying to classify text files into 20 different classes.
I therefore have a project structure with two folders, train and test.
In the train folder I have 20 different folders, and each of those contains many files related to that particular class, e.g. weather, atheism, etc.
I have now created a train.arff file for the entire train folder. When I visualize the data in Weka, I can see only two attributes.
I have provided a link below:
Screen in Weka
My question is: how can I view the various files under these folders and apply stopword removal, punctuation removal, and stemming? How do I go about preprocessing? If there are links to good resources, please suggest them.
I found the videos below quite helpful when I first got my hands on text classification using Weka. You might want to take a look.
Weka Tutorial 31: Document Classification 1 (Application)
Weka Tutorial 32: Document classification 2 (Application)
WEKA Text Classification for First Time & Beginner Users
You might want to use the StringToWordVector filter, which turns each word into an attribute so you can see its effect; it is described in detail in the first and last videos. In the filter settings you can supply a stopword list and choose on each run whether to use it. The same goes for stemming: you can change the stemmer there as well. These videos should make it easy to understand.
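Conceptually, a filter like StringToWordVector turns each document string into a set of word attributes. The sketch below imitates that idea in plain Python, with an optional stopword list and a deliberately naive suffix-stripping stemmer. The stopword set, the `naive_stem` helper, and the sample sentence are all made up for illustration and are not what Weka uses internally.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not Weka's list).
STOPWORDS = {"the", "a", "an", "is", "are", "of", "and", "in", "to"}


def naive_stem(word):
    """Very crude stemmer: strip one common English suffix if enough remains."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def to_word_vector(text, use_stopwords=True, use_stemming=True):
    """Lowercase, drop punctuation, optionally filter stopwords and stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if use_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    if use_stemming:
        tokens = [naive_stem(t) for t in tokens]
    return Counter(tokens)


if __name__ == "__main__":
    doc = "The storms are moving in, and the storm is strong!"
    print(to_word_vector(doc))
    print(to_word_vector(doc, use_stopwords=False, use_stemming=False))
```

Toggling the two flags between runs mirrors the advice above: compare the resulting attributes with and without stopword removal and stemming.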

weka SVM multi class classifier

I understand that Weka uses a one-vs-one approach for multi-class SVMs. However, I would like to classify documents and I have 10 class labels.
Is it possible to change the parameters to use a one-vs-rest approach instead?
How should I actually go about doing it?
The official site http://weka.wikispaces.com/LibSVM does not help much.
Other classification methods such as naive Bayes have been tried, but I would like to compare the results against SVM methods.
LIBSVM also supports multi-label classification. You can find example implementations here:
http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/multilabel/
You can also search for papers that use LibSVM with non-binary datasets,
e.g. http://link.springer.com/article/10.1007/s00521-011-0793-1
Another alternative to LibSVM is Weka's own SMO classifier.
In Weka version 3.8, the MultiClassClassifier meta classifier can be used. Its options include the 1-against-1 and 1-against-all multi-class methods.
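For the 10-class case in the question, the two decompositions differ mainly in how many binary problems get trained: one-vs-one needs one per pair of classes, one-vs-rest needs one per class. The following Python sketch simply enumerates those problems; the function names and class labels are hypothetical and no SVM is trained here.

```python
from itertools import combinations


def one_vs_one_problems(classes):
    """Each binary problem separates one pair of classes."""
    return list(combinations(classes, 2))


def one_vs_rest_problems(classes):
    """Each binary problem separates one class from all the others."""
    return [(c, [o for o in classes if o != c]) for c in classes]


if __name__ == "__main__":
    labels = [f"class_{i}" for i in range(10)]
    print(len(one_vs_one_problems(labels)))   # 45 pairwise problems
    print(len(one_vs_rest_problems(labels)))  # 10 one-vs-rest problems
```

One-vs-rest trains fewer models, but each one sees all the data with an unbalanced positive/negative split, which is one reason results can differ between the two methods.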

Creating custom voice commands (GNU/Linux)

I'm looking for advice on a personal project.
I'm trying to write software for creating customized voice commands. The goal is to let the user (me) record a short audio clip (2-3 seconds) to define a command or macro. Then, when the user speaks the same phrase again, the command or macro is executed.
The software must be able to detect a command in less than 1 second of processing time on a low-cost computer (a Raspberry Pi, for example).
I have already looked in two directions:
- Speech recognition (CMU-Sphinx, Julius, simon): There are good open-source solutions, but they often need large database files, and speech recognition is not really what I'm trying to do. Speech recognition could consume too much power for such a small feature.
- Audio fingerprinting (Chromaprint -> http://acoustid.org/chromaprint): It seems to be almost what I'm looking for. The principle is to create a fingerprint from raw audio data, then compare fingerprints to determine whether they match. However, this kind of software/library seems to be designed for song identification (like the well-known smartphone apps). I'm trying to configure a good "comparator", but I think I'm on the wrong track.
Do you know of any dedicated software or piece of code that does something similar?
Any suggestion would be appreciated.
I had a more or less similar project in which I wanted to send voice commands to a robot. Speech recognition software is too complicated for such a task. I used an FFT implementation in C++ to extract the Fourier components of the sampled voice, and then created a histogram of the major frequencies (the frequencies at which the target voice command has the highest amplitudes). I tried two approaches:
- Comparing the histogram of the given voice command with those saved in memory to identify the most probable command.
- Using a Support Vector Machine (SVM) to train a classifier to distinguish voice commands. I used LibSVM, and the results were considerably better than with the first approach. However, one problem with the SVM method is that you need a rather large data set for training. Another problem is that when an unknown voice is given, the classifier will output a command anyway (which is obviously a wrong detection). The first approach avoids this, because there I had a threshold on the similarity measure.
I hope this helps you to implement your own voice activated software.
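The first approach (histogram comparison with a rejection threshold) could look roughly like the Python sketch below. It assumes the frequency histograms have already been computed and all have the same length. Cosine similarity, the 0.8 threshold, and the command names are assumptions chosen for illustration; the answer does not say which similarity measure or threshold was actually used.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length histograms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_command(histogram, templates, threshold=0.8):
    """Return the best-matching command name, or None if nothing is similar enough."""
    best_name, best_score = None, threshold
    for name, template in templates.items():
        score = cosine_similarity(histogram, template)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


if __name__ == "__main__":
    # Hypothetical stored histograms for two recorded commands.
    templates = {"lights_on": [9, 1, 0, 0], "lights_off": [0, 0, 1, 9]}
    print(match_command([8, 2, 0, 0], templates))  # close to "lights_on"
    print(match_command([1, 1, 1, 1], templates))  # unlike either template
```

Returning `None` below the threshold is what lets this approach reject unknown input instead of always emitting some command.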
Song fingerprinting is not a good idea for this task, because command timings can vary and fingerprinting expects an exact time match. However, it's very easy to implement matching with the DTW algorithm for time series, using features extracted with Sphinxbase, the CMUSphinx library. See the Wikipedia entry on DTW for details.
http://en.wikipedia.org/wiki/Dynamic_time_warping
http://cmusphinx.sourceforge.net/wiki/download
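As a rough illustration of the DTW idea, here is a minimal pure-Python sketch of the classic dynamic-programming formulation on one-dimensional sequences. Real audio features would be vectors per frame (with a vector distance in place of `abs`), and the sample sequences are invented; this is a sketch under those simplifying assumptions, not Sphinxbase code.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping on 1-D sequences."""
    if not a or not b:
        raise ValueError("sequences must be non-empty")
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a only, step in b only, step in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


if __name__ == "__main__":
    template = [1, 2, 3]
    slower = [1, 1, 2, 2, 3, 3]  # same shape, spoken at half speed
    other = [3, 2, 1]
    print(dtw_distance(template, slower))
    print(dtw_distance(template, other))
```

The time-stretched copy aligns with zero cost, which is exactly the tolerance to timing variation that fingerprint matching lacks.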

KORMARC to MARC21 converter

Does anyone know if there is a free open-source solution to convert KORMARC (Korean MARC) into MARC21 (aka USMARC)?
While I'm not certain it has KORMARC support, you may want to try USEMARCON if you can find a mapping. From the USEMARCON page:
USEMARCON facilitates the conversion of catalogue records from one MARC format to another e.g. from UKMARC to UNIMARC. The software was designed as a toolbox-style application, allowing users with detailed knowledge of the source and target MARC formats to develop rules governing the behaviour of the conversion. Rules files may be supplemented by additional tables for more accurate conversion of MARC-specific character sets or coded information. The tables and rules files are simple ASCII text files and can be created using any standard text editor such as MS Windows Notepad.
Also, this thread from the Ask a Korean Studies Librarian Google Group might be useful, particularly the following message:
Library of Congress once tried to download records from the National
Library of Korea (NLK) to use as order records. LC wrote a
specification and developed a in-house program to convert KORMARC to
USMARC. Since NLK records only provide script, LC used a
transliterator to provide romanization for Voyager system developed by
non-LC programmer. The feedback of this method is not very positive
by LC staff. ... In stead of converting KORMARC to USMARC, a few research libraries
including LC is currently using MarcEdit with Excel spreadsheets which
are provided by Korean vendors based on contract. Vendors provide
both Korean script and romanization for several elements of MARC
fields (ISBN, title, author, publisher, place, series, etc.) in
different columns of spreadsheet for your order items. It sounds a
lot simpler to set up initially. And once MarcEdit is set up
properly, it creates MARC records.