WEKA: Classifying ARFF data with a given SMO model

I'm new to Weka and this is my problem:
I have an unlabeled ARFF file and a given SMO model; I need to classify this data with that model.
I searched for examples, but all of them use a test set to build the classifier, and I don't have a test set.
I need to get the classification with Java or the Weka command line.
I tried (under Linux) a command like:
java weka.classifiers.functions.SMO -l /path/of/mymodel/SMOModel.model -T /path/pf/myunlabeledarff/unlabeled.arff
but I get several errors.
Can someone help me?
Thanks a lot.

Documentation showing that the -l flag works is here: http://weka.wikispaces.com/Primer. That documentation also indicates that your syntax is correct, and that what you are trying to do is possible.
You say that the data is unlabeled: this can cause errors if the ARFF file you are using for prediction does not match the format of the ARFF file that was used to create the model. Make sure that the ARFF header has the class attribute declared in it, and that every instance (row) in the file has a class value (even if the value is a ? to indicate unknown). Otherwise the formats won't match, and the classifier won't work.
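A minimal sketch of what such an unlabeled file looks like (attribute names and values here are illustrative, not from your data): the class attribute is declared in the header exactly as in the training data, and each row carries ? as its class value:

```
@relation unlabeled_example

@attribute feature1 numeric
@attribute feature2 numeric
@attribute class {yes,no}

@data
1.2,3.4,?
0.5,2.1,?
```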
Please post your error messages if this does not solve the problem.
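One common cause of errors with that kind of invocation is that weka.jar is not on the classpath. A sketch of the command with an explicit classpath (the jar location is an assumption; -p 0 asks Weka to print the predictions without extra attributes):

```shell
# -cp points at your Weka installation's jar (location is illustrative)
java -cp /path/to/weka.jar weka.classifiers.functions.SMO \
  -l /path/of/mymodel/SMOModel.model \
  -T /path/pf/myunlabeledarff/unlabeled.arff \
  -p 0
```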

Related


Why does Weka say my train and test files are not compatible?
I'm trying to test my model with a new dataset in Weka. I have done the same preprocessing steps as when building my model. All the attributes (train vs. test dataset) are in the same order, with the same attribute names and data types. But I still can't resolve the issue. Both files, train and test, seem similar, but the Weka Explorer gives me an error saying "Train and test set are not compatible". How do I resolve this error?

How to use .rec format for training in MXNet C++ implementation?

The C++ examples in MXNet contain model-training examples for MNISTIter and the MNIST data set (.idx3-ubyte or .idx1-ubyte). However, the same code actually recommends using the im2rec tool to produce the data, and that produces a different format, .rec. It looks like the .rec format contains images and labels in the same file, because im2rec takes a prepared .lst file with both (a number, a label, and an image file name per line).
I have produced code like
auto val_iter = MXDataIter("ImageRecordIter");
setDataIter(&val_iter, "Train", vector<string>
{"output_train.rec", "output_validate.rec"}, batch_size);
with all files present, but it fails because four files are still required in the vector (segmentation fault). But why? Shouldn't the labels be inside the file now?
Digging more into the code, I found that setDataIter actually sets the parameters. The parameters for ImageRecordIter can be found here. I tried to set parameters like path_imgrec and path.imgrec, then call .CreateDataIter(), but none of this helped: segmentation fault on the first attempt to use the iterator.
I was not able to find a single example anywhere on the Internet of how to train any MXNet neural network in C++ using the .rec file format for the training and validation sets. Is it possible? The only workaround I found was to try the original MNIST tools that produce the files covered by the MNIST examples.
Eventually I used Mnisten to produce a matching data set, so my input format is now the same as the MXNet examples use. Mnisten is a good tool to work with; just don't forget that it normalizes grayscale pixels into the 0..1 range (no longer 0..255).
It is a command-line tool, but with all the C++ code available (and there is not really a lot of it), the converter can also be integrated into the existing code of a project to handle various specifics. I have never been affiliated with this project.

Converting source code directory into ARFF (WEKA)

Currently, I am working on a project using WEKA. Being a naive newbie with it, there are many things I am not familiar with. In my last project I used text files for classification with WEKA: I applied the TextDirectoryLoader converter to convert a directory containing text files, as described at this URL: Text categorization with WEKA. Now I want to use the same strategy to convert a directory containing source code (instead of plain text). For example, I have a jEdit source file containing Java source code. I am trying to convert it to an ARFF file so that I can apply classifiers or other functions present in WEKA to that ARFF file for data-mining purposes. I have also tried a test file given at the following URL: ARFF files from Text Collections. I believe I can use the same file as an example to convert source code files. However, I do not know what attributes I should define in a FastVector, what format the data should be in (string or numeric), and what other sections an ARFF file may have.
As in the example the authors have defined following attributes
FastVector atts = new FastVector(2);
atts.addElement(new Attribute("filename", (FastVector) null));
atts.addElement(new Attribute("contents", (FastVector) null));
I have tried to find some examples on Google, but with no success.
Could anyone here suggest a solution or an alternative to solve the above problem? (Example code would be highly appreciated.)
Or at least give me a short example which converts a source code directory into an ARFF file, if that is possible.
If it is not possible, what could be the reason?
Is there any alternative solution (other than WEKA) where I can use the same set of functions on source code?
It is not clear what your goal is. Do you want to classify the source code files, find the files which contain a bug, or something else?
As I understand it, you want to extract features from each source file and represent it with an instance. Then you can apply any machine-learning algorithm.
Here you can find a Java example of how to construct an ARFF file from Java:
https://weka.wikispaces.com/Creating+an+ARFF+file
But you have to define your task-specific features and extract them from each source code file.
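Since ARFF is plain text, an alternative to the Weka API is to emit the file directly. The sketch below (plain Java, no Weka dependency; the class name and relation name are my own choices) walks a directory and writes one instance per file, mirroring the filename/contents string attributes from the FastVector example above:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DirToArff {
    // Quote a value as an ARFF string, escaping backslashes and single quotes
    static String quote(String s) {
        return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args[0]);          // directory of source files
        StringBuilder sb = new StringBuilder();
        sb.append("@relation source_files\n\n");
        sb.append("@attribute filename string\n");
        sb.append("@attribute contents string\n\n");
        sb.append("@data\n");
        try (Stream<Path> files = Files.walk(dir)) {
            for (Path p : files.filter(Files::isRegularFile).collect(Collectors.toList())) {
                String text = new String(Files.readAllBytes(p));
                // One instance per file: filename, then contents flattened to one line
                sb.append(quote(p.getFileName().toString()))
                  .append(",")
                  .append(quote(text.replace("\n", " ")))
                  .append("\n");
            }
        }
        Files.write(Paths.get(args[1]), sb.toString().getBytes());
    }
}
```

The resulting file can then be loaded in Weka and passed through StringToWordVector, just like the output of TextDirectoryLoader.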

Create arff training and test files for weka

Good day.
Apologies for my English; it is not my native language, so please excuse any errors.
I have a text file with processed data from which I want to produce an .arff file, the file type that Weka uses.
I do not want to generate a single file; I want to get two files: one for training the model (training) and another for testing the model (test).
This can be done directly in Weka by applying a StringToWordVector filter, but the problem is that when you use the second file for testing you get an error, because it is not fair to test a model with words that should not be there.
If someone helps me, I would appreciate it.
Thank you and best regards.
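For the vocabulary mismatch described above, Weka's filters support batch mode: the filter's dictionary is built from the training file and then applied unchanged to the test file, so both outputs share the same attributes. A sketch of the invocation (file names and jar path are illustrative):

```shell
# -b enables batch mode: -i/-o are the training input/output,
# -r/-s are the test input/output filtered with the same dictionary
java -cp weka.jar weka.filters.unsupervised.attribute.StringToWordVector \
  -b -i train_raw.arff -o train.arff -r test_raw.arff -s test.arff
```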

Weka says 'training and test set are not compatible' when both are the same file

I'm getting a very odd error from the weka machine learning toolkit:
java weka.classifiers.meta.AdaBoostM1 -t train.arff -d tmp.model -c 22 //generates the model
java weka.classifiers.meta.AdaBoostM1 -l tmp.model -T train.arff -p 22 //have the model predict values in the set it was trained on.
This produces the message:
java.lang.Exception: training and test set are not compatible
at weka.classifiers.Evaluation.evaluateModel(Evaluation.java:1035)
at weka.classifiers.Classifier.runClassifier(Classifier.java:312)
at weka.classifiers.meta.AdaBoostM1.main(AdaBoostM1.java:779)
But of course, the input files are the same... Any suggestions?
Sometimes Weka complains when the class variable does not consist of the same set of classes, e.g. when your training data contains the classes {a,b,c} and the test data (loaded later) only has {a,c}. In that case Weka just throws that nice exception :)
Maybe you can find a solution in the Weka source code, or by loading your data sets with the Weka Explorer. The latter tells you what the data set looks like once it is loaded...
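The class mismatch described above is visible directly in the ARFF headers; the two declarations below (attribute name and values are illustrative) make the files incompatible even if everything else agrees, because the declared sets of class values differ:

```
% train.arff header
@attribute class {a,b,c}

% test.arff header
@attribute class {a,c}
```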