I'm using Weka and trying to test my file, but I always get a popup window showing "Train and test set are not compatible". I'm using CSV files. All the attributes are the same in both files. Out of 30 attributes, I divided them into two parts: the first 20 attributes as the training set and the remaining 10 as the test set. Please help me.
Your attributes and their order must be the same in both files. See the following Weka Wiki post and Stack Overflow question 1 and question 2. Even a small difference may cause this error.
According to you, their order may be the same, but according to Weka it is not. Convert both files to ARFF format and try again; you will see that their ARFF headers are not the same. See the example below.
CSV file1
Feature A
true
false
CSV file2
Feature A
false
true
The ARFF headers generated from these two CSV files are NOT the same: since the first occurrence of each value differs between the files, the order of the values in the ARFF header differs too, as shown below.
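For illustration, the generated headers would look roughly like this (the exact quoting may vary with your Weka version):

@attribute 'Feature A' {true,false}

for CSV file1, but

@attribute 'Feature A' {false,true}

for CSV file2. Weka treats these two nominal declarations as incompatible even though they list the same values.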
I use Weka for text classification. I have a training data set, to which I applied the StringToWordVector and NumericToNominal filters, and a test data set, to which I applied the same filters.
When I try to apply my model to the test data, it gives me the following error:
Train and test set are not compatible
I searched for a solution: the error occurs because the number of attributes differs between the two sets, and it will always differ because the texts in the two sets are different.
How can I solve this error, please?
The best thing you can do is combine your training and test sets into one file, apply the filter to it all in one go, then split them up again and copy the @attribute declarations from the combined file into both the training and test files. This way the attributes will be consistent across both files.
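Alternatively, a sketch using Weka's batch mode (assuming your files are named train.arff and test.arff; adjust the filter options to your setup):

java weka.filters.unsupervised.attribute.StringToWordVector -b -i train.arff -o train_vec.arff -r test.arff -s test_vec.arff

The -b flag enables batch mode: -i/-o are the training input/output, -r/-s the test input/output, and the test file is filtered using the dictionary built from the training file, so both outputs share the same attributes.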
I have an .arff file with 5 features named:
JaccardCoefficient,adamicadar,commonneighbors,katz,rootedpagerank
I open the file in Weka, but it does not show the katz values. It shows max: 0, min: 0, mean: 0, stddev: 0.
Note that the katz values are very small, e.g. 0.0000312. What should I do to see the katz values?
I have had a look at your sample file in Weka and found the zero values you reported. The data itself is represented correctly, but the precision of the attribute summary appears to be limited to three decimal places. For this reason, the values are too small to show up in the attribute list.
One way you could work around this for use with Weka's prediction models is to pre-process the data into a more suitable range. This could be done using normalisation or other rescaling techniques, as required for your purposes. I adjusted the data by simply multiplying the attribute by 100, which brought the attribute summaries into a visible range on the screen.
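As a sketch of that rescaling step (with hypothetical file names), Weka's Normalize filter maps every numeric attribute to [0,1]:

java weka.filters.unsupervised.attribute.Normalize -i linkpred.arff -o linkpred_rescaled.arff

The multiply-by-100 variant can be done with the weka.filters.unsupervised.attribute.MathExpression filter and the expression A*100; note that by default it is applied to all numeric attributes, not just katz.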
Hope this helps!
While I am running CRF++ on my training data (train.txt), I get the following error:
C:\Users\2012\Desktop\CRF_Software_Package\CRF++-0.58>crf_learn template train.data model
CRF++: Yet Another CRF Tool Kit
Copyright (C) 2005-2013 Taku Kudo, All rights reserved.
reading training data: tagger.cpp(393) [feature_index_->buildFeatures(this)]
0.00 s
My training data contains Unicode characters, and the data was saved using Notepad (encoding: Unicode big-endian).
I am not sure if the problem is with the template or with the format of the training data. How can I check the format of the training data?
I think this is because of your template file.
Please check whether you have included the last column, which is the gold standard, as a training feature. The column index starts from 0.
E.g., if you have 6 columns in your BIO file, the template should not contain something like %x[0,5]. A valid template for that case is sketched below.
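As a minimal sketch for 6-column data (columns 0-4 are observations, column 5 is the label and is never referenced):

U00:%x[0,0]
U01:%x[0,1]
U02:%x[0,2]
U03:%x[0,3]
U04:%x[0,4]
B

Here the U lines are unigram feature templates over the observation columns, and B adds the standard bigram (label transition) feature.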
The problem is with the template file.
Check your features for incorrect "grammar", e.g.:

U10:%x[-1,0]/% [0,0]

Notice that after the second % there is a missing 'x'. The corrected line should look like the one below:
U10:%x[-1,0]/%x[0,0]
I had the same issue: the files were in UTF-8, and the template file and training file were definitely in the correct format. The reason was that CRF++ expects at most 1024 columns in the input files. It would be great if it printed an appropriate error message in such a case.
The problem is not with the Unicode encoding, but with the template file.
Have a look at this similar Q: The failure in using CRF+0.58 train NE Model
I am generating an .arff file using a Java program. The file has about 600 attributes.
I am unable to open the file in Weka Explorer.
It says: "nominal value not declared in header, read Token[0], line 626."
Here is the first attribute line: @attribute vantuono numeric
Here are the first few chars of line 626: 0,0,0,0,1,0,0,0,0,1,0,1...
Why is WEKA unable to parse '0' as a numeric value?
Interestingly, this happens only in this file. I have other files with numeric attributes accepting '0' for a value.
Are you sure that your declaration is correct? The WEKA FAQ says:
nominal value not declared in header, read Token[X], line Y
If you get this error message, then you seem to have declared a nominal attribute in the ARFF header section, but Weka came across a value ("X") in the data (in line Y) for this particular attribute that wasn't listed as a possible value.
All nominal values that appear in the data must be declared in the header.
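For illustration (a made-up snippet, not taken from the question's file), the following data line would raise exactly this error, because 'drizzle' is not among the declared values:

@attribute outlook {sunny,overcast,rainy}
@data
drizzle

So something on line 626 of the generated file is being read as a value that the header does not declare.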
There is also a bug regarding sparse ARFF files
Increase the in-memory buffer to accommodate all the rows using the -B <noOfRecords> option:
java weka.core.converters.CSVLoader filename.csv -B 33000 > filename.arff
If you get this error, it is more likely that in your dataset (after the @data line) you kept the header row (the column names) that you had already declared. Remove that header line and you should be good to go.
I got the same error. Then I saw that my program was adding an extra apostrophe. When I removed the apostrophe, it worked.
I had the same problem and it cost me dearly, so to spare you the trouble: put the class attribute last, and make sure the attributes are declared in the same order as they appear in the data.
Each row of my training and test datasets contains intensity values for the pixels of an image, with the last column holding the label, which tells which digit is represented in the image; the label can be any number from 0 to 9 in the training set and is always ? in the test set. I loaded the training dataset in the Weka Explorer, passed the data through the NumericToNominal filter, and used the RemovePercentage filter to split the data in a 70-30 ratio, the 30% file being used as a cross-validation set. I built a classifier and saved the model.
Then I loaded the test data, which has ? as the label for each row, applied the NumericToNominal filter, and saved it as an ARFF file. Now, when I load the test data and try to use the model against it, I always get the error message saying "training and test set are not compatible". Both datasets have undergone the same processing. What could possibly have gone wrong?
As you can read from ARFF manual (http://www.cs.waikato.ac.nz/ml/weka/arff.html):
Nominal values are defined by providing an <nominal-specification> listing the possible values: {<nominal-name1>, <nominal-name2>, <nominal-name3>, ...}
For example, the class value of the Iris dataset can be defined as follows:
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
So when you apply NumericToNominal to your test file, you can end up with a different number of possible values for one or more attributes in the train and test ARFF files. It really can happen; it has bothered me many times. One solution is to check your ARFF files manually (if they are not too big) and copy the attribute declarations, e.g.

@attribute 'My first binary attribute' {0,1}
(...)
@attribute 'My last binary attribute' {0,1}

from the train file to the test file. That should work.
You can also use batch filtering; here you can read how to do batch filtering in Weka. A sketch follows below.
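As a sketch with hypothetical file names (the -b/-r/-s options are Weka's standard batch-mode flags), this runs NumericToNominal on both files in one go, so the test set gets the same header as the training set:

java weka.filters.unsupervised.attribute.NumericToNominal -b -i train.arff -o train_nom.arff -r test.arff -s test_nom.arff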