Cross Validation JAVA WEKA

I need to run classifiers in WEKA from the command line and perform 10-fold cross-validation. What would the command lines for cross-validation be? The algorithms I need to run are MLP, SMO, and Random Forest. I already have the commands for these algorithms, but I still need to perform cross-validation.

To perform cross-validation with the WEKA CLI, simply omit the test file option (i.e. the -T parameter). WEKA will then perform cross-validation on the training data.
Use the -x parameter to set the number of folds.
See this page for full details about the WEKA CLI options: https://www.cs.waikato.ac.nz/~remco/weka_bn/node13.html
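As a rough illustration of the answer above, the following Python sketch builds the command line for each of the three classifiers from the question. The jar location `weka.jar`, the data file `data.arff`, and the helper name `cv_command` are assumptions made for this example; only the `-t` (training file) and `-x` (folds) options are passed, and `-T` is deliberately left out.

```python
import shlex

# Fully qualified WEKA class names for the classifiers named in the question.
CLASSIFIERS = {
    "MLP": "weka.classifiers.functions.MultilayerPerceptron",
    "SMO": "weka.classifiers.functions.SMO",
    "RandomForest": "weka.classifiers.trees.RandomForest",
}


def cv_command(classifier, train_file, folds=10, weka_jar="weka.jar"):
    """Build a WEKA command line that cross-validates on train_file.

    No -T (test file) option is passed, so WEKA falls back to
    cross-validation on the training data; -x sets the number of folds.
    weka_jar and train_file are placeholder paths (assumptions).
    """
    return [
        "java", "-cp", weka_jar, CLASSIFIERS[classifier],
        "-t", train_file,
        "-x", str(folds),
    ]


if __name__ == "__main__":
    # Print one ready-to-paste command per classifier.
    for name in CLASSIFIERS:
        print(shlex.join(cv_command(name, "data.arff")))
```

Each printed line can be pasted into a shell, or the list can be handed to `subprocess.run` if WEKA should be launched from the script itself.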

Related

Tool for pdf comparison with images to be integrated with test scripts

I am evaluating tools for PDF comparison that can compare text and images (all elements in the PDF) and that can be integrated with test scripts for automated comparison.
I looked at Beyond Compare. It has a command line utility that can trigger the comparison automatically from scripts.
However, it does not have a robust solution for comparing images within a PDF.
Is there a feature I am missing that enables image comparison?
Beyond Compare doesn't support comparing images in PDF files, only plain text.

Wrapping code for PIG script, Hive queries and corresponding MapReduce code

I am working on 2 datasets. I have run MapReduce jobs on them, then processed the output with PIG and HIVE. I want to execute all these steps at once, in sequence. How should I wrap these things into a single script, i.e. the MapReduce code, followed by the PIG script and finally a few Hive queries?
Thanks,
Ketan
You need to wrap those in an Oozie workflow.
Oozie lets you run a collection of actions arranged in a DAG (directed acyclic graph).
Its documentation is good, so you can start with that.

Weka - Measuring testing time

I'm using Weka 3.6.8 to carry out some machine learning and I want to find the 'time taken to test model on training/testing data'. When I test a predictive model on evaluation data, this value seems to be missing. Has this feature been removed from Weka or is it just a setting I'm missing? All I can find is the time taken to build the actual predictive model. (I've also checked the Weka Manual but can't find anything.)
Thanks in advance
That feature was added in 3.7.7, so you need to upgrade. You should be able to get this data by running the test on the command line with the -T parameter.

Creating custom voice commands (GNU/Linux)

I'm looking for advice on a personal project.
I'm attempting to write software for creating customized voice commands. The goal is to let the user (me) record some audio data (2 to 3 seconds) to define commands/macros. Then, when the user speaks (records the same audio data), the command/macro is executed.
The software must be able to detect a command in less than 1 second of processing time on a low-cost computer (a Raspberry Pi, for example).
I have already looked in two directions:
- Speech Recognition (CMU-Sphinx, Julius, simon): There are good open-source solutions, but they often need large database files, and speech recognition is not really what I'm attempting to do. Speech recognition could consume too much power for such a small feature.
- Audio Fingerprinting (Chromaprint -> http://acoustid.org/chromaprint): It seems to be almost what I'm looking for. The principle is to create a fingerprint from raw audio data, then compare fingerprints to determine whether they match. However, this kind of software/library seems to be designed for song identification (like the well-known smartphone apps). I'm trying to configure a good "comparator", but I think I'm heading in the wrong direction.
Do you know of any dedicated software or piece of code that does something similar?
Any suggestion would be appreciated.
I had a more or less similar project in which I intended to send voice commands to a robot. Speech recognition software is too complicated for such a task. I used an FFT implementation in C++ to extract the Fourier components of the sampled voice, and then I created a histogram of the major frequencies (the frequencies at which the target voice command has the highest amplitudes). I tried two approaches:
- Comparing the histogram of the given voice command with those saved in memory, and picking the most similar one as the most probable command.
- Using a Support Vector Machine (SVM) to train a classifier to distinguish voice commands. I used LibSVM, and the results were considerably better than with the first approach. However, one problem with the SVM method is that you need a rather large data set for training. Another problem is that, when an unknown voice is given, the classifier will output a command anyway (which is obviously a wrong detection). This can be avoided with the first approach, where I had a threshold on the similarity measure.
I hope this helps you to implement your own voice activated software.
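The first approach (histogram comparison with a rejection threshold) could look roughly like the sketch below. The answer does not say which similarity measure was used, so cosine similarity, the threshold value of 0.9, the function names, and the example histograms are all assumptions made for illustration; the FFT step that produces the histograms is left out.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length frequency histograms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_command(histogram, templates, threshold=0.9):
    """Return the name of the best-matching stored command, or None.

    templates maps a command name to its stored histogram. Anything
    scoring below the threshold is rejected as an unknown sound, which
    is the rejection behaviour described for the first approach.
    """
    best_name, best_score = None, 0.0
    for name, template in templates.items():
        score = cosine_similarity(histogram, template)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


if __name__ == "__main__":
    # Hypothetical 4-bin histograms of dominant frequencies.
    templates = {"lights_on": [0, 5, 9, 1], "lights_off": [7, 1, 0, 4]}
    print(match_command([0, 4, 10, 1], templates))  # close to "lights_on"
    print(match_command([5, 5, 5, 5], templates))   # matches nothing well
```

The threshold is the part that would need tuning on real recordings: too low and unknown sounds trigger commands, too high and valid commands are rejected.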
Song fingerprinting is not a good idea for this task, because command timings can vary and a fingerprint expects an exact time match. However, it's very easy to implement matching with the DTW (dynamic time warping) algorithm for time series, using features extracted with the CMUSphinx library Sphinxbase. See the Wikipedia entry about DTW for details.
http://en.wikipedia.org/wiki/Dynamic_time_warping
http://cmusphinx.sourceforge.net/wiki/download
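To make the DTW suggestion concrete, here is a minimal sketch of the classic dynamic-programming formulation. It is simplified to one scalar feature per frame with an absolute-difference cost, which is an assumption for illustration; real speech features (for example the vectors Sphinxbase extracts) would need a vector distance instead, and the function name is hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences.

    cost[i][j] holds the cheapest alignment cost of a[:i] against b[:j].
    Each step may advance in a, in b, or in both, which lets the two
    sequences stretch or compress in time relative to each other.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats a frame
                cost[i][j - 1],      # b advances, a repeats a frame
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


if __name__ == "__main__":
    # The same shape spoken more slowly still aligns at zero cost.
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

A recorded command would then be compared against each stored template, choosing the template with the smallest distance and rejecting the input if even that distance is above some tuned limit.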

Cross Validation in libsvm

I'm using the libsvm library in my project and have recently discovered that it provides out-of-the-box cross validation.
I'm checking the documentation and it says clearly that I have to call svm-train with the -v switch to use the CV feature.
When I call it with the -v switch I cannot get a model file, which is needed by svm-predict.
Implementing Support Vector Machine from scratch is beyond the scope of my project, so I'd rather fix this one if it is broken or ask the community for support.
Can anybody help with that?
Here's the link to the library, implemented in C and C++, and here is the paper that describes how to use it.
That's because libsvm uses cross validation only for parameter selection.
From libsvm FAQ:
Q: After doing cross validation, why there is no model file outputted ?
Cross validation is used for selecting good parameters. After finding them, you want to re-train the whole data without the -v option.
If you want to use cross validation to estimate the quality of the classifier on your data, you should implement external cross validation yourself: split the data, train on one part and test on the other.
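The external cross validation described above comes down to producing train/test index splits yourself. The sketch below is one way to do that in plain Python; the function name, the shuffling seed, and the round-robin fold assignment are assumptions for illustration and are not part of libsvm.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation.

    Sample indices are shuffled once, dealt into k folds round-robin,
    and each fold serves as the test set exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


if __name__ == "__main__":
    for train, test in k_fold_indices(25, k=5):
        # Here one would write the train rows to a file, train a model on
        # it, predict on the test rows, and accumulate the accuracy.
        print(len(train), len(test))
```

For each split, the training rows would be passed to svm-train (without -v, so a model file is written) and the held-out rows to svm-predict; averaging the per-fold accuracy gives the quality estimate.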
It's been a while since I used libsvm, so I don't think I have the answer you're looking for, but if you run the cross-validation and are satisfied with the results, running libsvm with the same parameters without the -v option will yield a model trained with those parameters.