weka say that my data set train and test set are not compatible - data-mining

why weka say my train and test file is not compatible
I'm trying to test my model with new dataset in weka. I have done the same preprocessing step as i have done for building my model. I have all the attributes(train vs test dataset) in same order, same attribute names and data types. But still i'm not able to resolve the issue. Both of the files train and test seems to be similar but the weka explorer is giving me error saying Train and test set are not compatible. How to resolve this error?

Related

Create arff training and test files for weka

Good day.
An apology for my English but it's not my native language so they know apologize for any errors.
I have a text file with the data processing which want to get a .arff file, which is the file type using weka.
I do not want to generate a single file. I get 2 files, one for training the model (training) and another to test the model (test).
This is done directly weka eh applying a filter stringtowordtokenizer but the problem is that when you use the second file to test a mistake because it is not fair test a model that has words that should not be.
If someone helps me I would appreciate.
Thank you and best regards.

Load existing Model in Weka Knowledge Flow

I am trying to plot multiple ROC curves in the same diagram in Weka. I have learnt that I can do this in Weka Knowledge Flow using "Model Performance Chart". However, I can't figure out how to do this for existing models.
I have tried using ArffLoader and TestSetMaker to generate the testing data, and connected this to a suitable Classifier icon (eg AdaBoostM1 when this is the kind of model I am trying to load). In the configurations of the Classifier icon I choose "load model" and in the Status bar it says "Loaded model.". However, when I run this it says "ERROR: no trained/loaded classifier to use for prediction".
Can anyone tell me what I am doing wrong here? Thanks in advance!
There is a post that was published here that indicates some ambiguity in the meaning of the error. It also continues to state that the order of attributes and the number and order of values is also rather important.
It also states that 'for performance results to be computed, your Knowledge Flow process will need a "ClassifierPerformanceEvaluator" component after the classifier and before a TextViewer component.'
If you are new with the KnowledgeFlow environment, there is a great tutorial here from Rushdi Shams that details the general process.
Below is a sample workflow that has generated desirable results using AdaBoost (preloaded model):
Hope this Helps!

Weka - Measuring testing time

I'm using Weka 3.6.8 to carry out some machine learning and I'm want to find the 'time taken to test model on training/testing data'. When I test a predictive model on evaluation data, this parameter seems to be missing. Has this feature been removed from Weka or is it just a setting I'm missing? All I seem to be able to find is the time taken to build the actual predictive model. (I've also checked the Weka Manual but can't find anything)
Thanks in advance
That feature was added to 3.7.7, you need to upgrade. You should be able to get this data by running the test on the command line with the -T parameter.

WEKA: Classifying an ARFF data with a given SMO model

I'm new with weka and this is my problem:
I've a unlabeled arff data and a given SMO model; I need classify this data with that model.
I searched examples, but all of them use a testing set to build classifier and I've not testing sets.
I need get classification with java or weka command line.
I tryed (under linux) command like:
java weka.classifiers.functions.SMO -l /path/of/mymodel/SMOModel.model -T /path/pf/myunlabeledarff/unlabeled.arff
but I get several errors :S
Can someone help me?
Thanks a lot
Documentation showing that the -l flag works is here: http://weka.wikispaces.com/Primer. That documentation also indicates that your syntax is correct, and that what you are trying to do is possible.
You say that the data is unlabeled: this can cause errors if the arff file you are using to predict does not match the format of the arff file which was used to create the model. Make sure that the arff header has the class attribute declared in it, and that every instance (row) in the file has a class value in it (even if the value is a ? to indicate unknown). Otherwise the formats won't match, and the classifier won't work.
Please post your error messages if this does not solve the problem.

Weka says 'training and test set are not compatible' when both are the same file

I'm getting a very odd error from the weka machine learning toolkit:
java weka.classifiers.meta.AdaBoostM1 -t train.arff -d tmp.model -c 22 //generates the model
java weka.classifiers.meta.AdaBoostM1 -l tmp.model -T train.arff -p 22 //have the model predict values in the set it was trained on.
This produces the message:
java.lang.Exception: training and test set are not compatible
at weka.classifiers.Evaluation.evaluateModel(Evaluation.java:1035)
at weka.classifiers.Classifier.runClassifier(Classifier.java:312)
at weka.classifiers.meta.AdaBoostM1.main(AdaBoostM1.java:779)
But of course, the input files are the same... Any suggestions?
Sometimes Weka is complaining when the class variable does not consist of the same number of classes, e.g. when you training data consists of the classes {a,b,c} and the testing data (loaded later) only has {a,c}. In that case Weka just throws that nice exception :)
Maybe you find a solution in the Weka source code or by loading your data sets with the Weka Explorer. The latter one tells you how the data set is looking like when it is loaded...