Problem running the LinearRegression algorithm in Weka: "Problem evaluating classifier: null"

I'm working on an article using Weka, and when I try to use the LinearRegression algorithm it always returns "Problem evaluating classifier: null". The arff file is correct; I have already run some other algorithms on it without any problem.
What appears on the console after the error is a stack trace that I honestly could not decipher to find the real problem.
The file has more than 600k lines, but here is a small part of it (in Portuguese, I am Brazilian ¯\_(ツ)_/¯).

Related

How to use .rec format for training in MXNet C++ implementation?

The C++ examples for MXNet contain model training examples for MNISTIter and the MNIST data set (.idx3-ubyte or .idx1-ubyte). However, the same code actually recommends using the im2rec tool to produce the data, and that produces a different .rec format. It looks like the .rec format contains images and labels in the same file, because im2rec takes a prepared .lst file with both (number, label, and image file name per line).
I have produced code like

auto val_iter = MXDataIter("ImageRecordIter");
setDataIter(&val_iter, "Train",
            vector<string>{"output_train.rec", "output_validate.rec"}, batch_size);
with all the files present, but it fails because four files are still required in the vector (segmentation fault). But why? Shouldn't the labels be inside the file now?
Digging more into the code, I found that setDataIter actually just sets parameters. The parameters for ImageRecordIter can be found here. I tried to set parameters like path_imgrec and path.imgrec, then call .CreateDataIter(), but none of this helped: segmentation fault on the first attempt to use the iterator.
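For reference, this is roughly the pattern the official MXNet C++ (cpp-package) examples use to configure an ImageRecordIter directly, without setDataIter; the .rec path, data shape, and batch size below are placeholder assumptions for your data:

// minimal sketch, assuming "using namespace mxnet::cpp;" as in the official examples
auto train_iter = MXDataIter("ImageRecordIter")
    .SetParam("path_imgrec", "./output_train.rec")   // .rec file produced by im2rec
    .SetParam("data_shape", Shape(3, 28, 28))        // channels, height, width -- adjust
    .SetParam("batch_size", batch_size)
    .SetParam("shuffle", 1)
    .CreateDataIter();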
I was not able to find a single example anywhere of how to train an MXNet neural network in C++ using the .rec file format for the training and validation sets. Is it possible? The only workaround I found was to try the original MNIST tools, which produce files covered by the MNIST output examples.
Eventually I used Mnisten to produce a matching data set, so that my input format is now the same as the MXNet examples use. Mnisten is a good tool to work with; it is just important not to forget that it normalizes grayscale pixels into the 0..1 range (no longer 0..255).
It is a command line tool, but with all the C++ code available (and there is not really a lot of it), the converter can also be integrated with the existing code of a project to handle various specifics. I have never been affiliated with this project.

Improve quality of tesseract ocr result

I'm developing an OCR app for Android using JNI and code written in C++ using OpenCV and Tesseract. It will be used to read a badge with an alphanumeric ID from a photo taken by the app.
I developed code that handles the preprocessing of the image, in order to obtain a "readable image" like the one below:
I wrote the following function for "reading" the image using tesseract:
#include <tesseract/baseapi.h>
#include <opencv2/core/core.hpp>

using namespace cv;

char* read_text(Mat input_image)
{
    tesseract::TessBaseAPI text_recognizer;
    // English data, legacy Tesseract engine only
    text_recognizer.Init("/usr/share/tesseract-ocr/tessdata", "eng", tesseract::OEM_TESSERACT_ONLY);
    // restrict recognition to uppercase letters and digits
    text_recognizer.SetVariable("tessedit_char_whitelist", "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789");
    text_recognizer.SetImage((uchar*)input_image.data, input_image.cols, input_image.rows,
                             input_image.channels(), input_image.step1());
    text_recognizer.Recognize(NULL);
    // note: the caller owns the returned string and must delete[] it
    return text_recognizer.GetUTF8Text();
}
The expected result is "KQ 978 A3705", but what I get is "KO 978 H375".
I followed all the recommendations for improving the quality of the image from https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality. In addition, reading the docs from https://github.com/tesseract-ocr/docs, I tried approximating the shapes in the image with polygons in order to get "better" features. The image I used is one like this:
With this image, I get "KO 978 A3705". The result is clearly better than the previous one, but it is still not correct.
I think the processed image I pass to Tesseract is good enough to get a correct result, yet I don't get one. I don't know what else to do, so I'm asking you for ideas to solve this problem. I need an exact result, and I believe the processed image should be able to yield it. Ideas please! =)
I noticed that with some dilations, the OCR result improved incredibly well! For me, that was the solution.
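For anyone who wants to try the same thing, here is a minimal sketch of such a dilation step with OpenCV; the kernel size and iteration count are assumptions you will need to tune for your images:

#include <opencv2/imgproc/imgproc.hpp>

cv::Mat dilate_for_ocr(const cv::Mat &binary_image)
{
    // 3x3 rectangular kernel applied twice -- both values are tuning assumptions
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(3, 3));
    cv::Mat dilated;
    cv::dilate(binary_image, dilated, kernel, cv::Point(-1, -1), 2);
    return dilated;
}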

Tesseract OCR not able to train image correctly

I am facing the following issue while training Tesseract OCR. I am using Tesseract 3.02 for Windows.
I have a dataset of characters to be trained on. I have written a C++ program that reads each character from the data set, crops it, resizes it to a 40x40 image, and merges/pastes it onto a single image of size 650x450 (see the attached image; a sketch of this step follows below). This is repeated for all 100 images in the dataset. The C++ program also generates the box file entry for every character added. I have verified the box file and image using the box editor tools mentioned on the Tesseract wiki, and these files are correct. The extension of the merged image is .tif.
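A minimal sketch of that crop/resize/paste step, assuming OpenCV; the 50x50 grid spacing and the function name are illustrative assumptions:

#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>

// Paste the index-th character onto the 650x450 training sheet.
void paste_character(const cv::Mat &character, cv::Mat &sheet, int index)
{
    cv::Mat resized;
    cv::resize(character, resized, cv::Size(40, 40));   // normalize each glyph to 40x40
    int cols = sheet.cols / 50;                          // 50x50 cells leave a small margin
    int x = (index % cols) * 50;
    int y = (index / cols) * 50;
    resized.copyTo(sheet(cv::Rect(x, y, 40, 40)));
}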
I am attaching the image for your reference. The issue is when I train the image in the Tesseract I get following output on console.
F:\test>tesseract eng.normal.exp0.tif eng.normal.exp0 box.train
Tesseract Open Source OCR Engine v3.02 with Leptonica
APPLY_BOXES:
Boxes read from boxfile: 100
Found 100 good blobs.
TRAINING ... Font name = normal
Generated training data for 9 words
Even though there are 36 distinct characters in the image, Tesseract says it could generate training data for only 9 words. It also says it found 100 good blobs. I do not know why this issue occurs; the box file has labels for all 100 characters in the image.
Please help!
Thanks
The training data set should be realistic, according to the training guide. Note that, as you mentioned, it generated training data for 9 words, not for 9 characters, so it probably did identify all the characters. You can use this tool to inspect the generated .traineddata file and analyze which characters Tesseract has been trained on.
Per the Training Wiki: "DO NOT MIX FONTS IN AN IMAGE FILE (In a single .tr file to be precise.) This will cause features to be dropped at clustering, which leads to recognition errors."

How to use an SVMLight model file to detect an object in OpenCV using C++

I am using OpenCV and C++ and trying to detect an object in images. Here is what I have done so far:
1. From each small image patch (108x64) I extracted the desired 6200 features. Then I wrote these features to train.txt and test.txt files in SVMLight format.
2. Then I gave train.txt to SVMLight and got a model file. Using this model file I can test the classification accuracy, which is approximately 90%. All of this was done on Ubuntu with OpenCV and C++, with both training and testing on the command line.
3. Now I want to detect the object in the original images (480x640) using the model file generated during training.
BUT the problem is I don't know how to use the model file to detect the object in the original image (640x480). I want the very basic/fundamental thing: how to use this model file for detection with a simple sliding window (108x64) and SVMLight (or LatentSVM or cvSVM). Please don't tell me that I should resize my original image (image pyramid) for better accuracy, or that I should use an ADM (Active Deformable Model/Snake). Don't tell me about non-maximum suppression to remove extra rectangle boxes either. Just tell me how to detect (a step-by-step, complete implementation) and get the rectangle box. Thanks in advance; waiting for expert replies.
Thank GOD, I did it.
Here is what I had in my object detection project:
First I implemented the feature extraction part. Using a small image patch I extracted the desired features and wrote them to a train.txt file in SVMLight format; I used C++ and it's very easy. I used 10 image patches, so I wrote 10 lines to the .txt file with the appropriate label (1 or -1); the dimension of each feature vector is 6200 (so each line in the train.txt file has 6200 index:value pairs).
Second, in the same way I generated a test.txt file. Here the label is not necessary, but you then need to put 0 as the label instead of 1 or -1 (a sketch of writing one line in this format follows below).
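For clarity, a minimal sketch of writing one feature vector as an SVMLight line (label first, then 1-based index:value pairs); the function name is just an illustration:

#include <fstream>
#include <vector>

// Write one line of an SVMLight file: "label 1:v1 2:v2 ... 6200:v6200"
void write_svmlight_line(std::ofstream &out, int label, const std::vector<double> &features)
{
    out << label;                      // +1 / -1 for training, 0 allowed in test files
    for (size_t i = 0; i < features.size(); ++i)
        out << ' ' << (i + 1) << ':' << features[i];
    out << '\n';
}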
I got the SVMLight executables. So I simply ran the $ ./svm_learn train.txt model.txt command on Ubuntu, and after this I got the model file.
I did the classification with the $ ./svm_classify test.txt model.txt predict.txt command, and it showed the accuracy and precision/recall rate. I got 95% accuracy. It depends upon the number of samples you use for training; later I used 800 positives and 800 negatives and got 97% accuracy. I was very happy.
Up to here I had results and was very happy, but after that I didn't know what to do or how to do it. I read lots of documents and articles about using this model file to detect the object in the original image (size 512x512). The main problem was that I had no idea how to use the model file. After reading lots of articles on the Internet and Stack Overflow I got really confused, but somewhere on Stack Overflow I read that you can take code from SVMLight and integrate it into your application. That is what I did.
So now, coming to the solution of the problem I asked about in the question:
First I downloaded the SVMLight source code and tried to understand it. I came to the conclusion that I only needed the svm_classify module in my application, so I copied selected pieces of code from svm_classify.c into my application; you could say I integrated/merged svm_classify.c into my application.
Don't worry about the model file; svm_classify loads it and will do all the work for you.
Then you will get a processed value for every small detection window, input vector, or line in test.txt; the variable name is "dist". If it is positive, the object is present in the detection window (or test input vector); otherwise it is not (see the sliding-window sketch below).
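To make the sliding-window part concrete, here is a minimal sketch; extract_features and classify_patch are hypothetical wrappers (your 6200-dimensional extractor and the code merged from svm_classify.c that returns dist), and the stride is an assumption to tune:

#include <opencv2/core/core.hpp>
#include <vector>

std::vector<double> extract_features(const cv::Mat &patch);   // your extractor (assumed)
double classify_patch(const std::vector<double> &features);  // returns SVMLight's "dist" (assumed)

// Scan the full image with a 108x64 window; keep windows with positive dist.
void detect(const cv::Mat &image, std::vector<cv::Rect> &detections)
{
    const int win_w = 108, win_h = 64, stride = 8;            // stride is a tuning assumption
    for (int y = 0; y + win_h <= image.rows; y += stride)
        for (int x = 0; x + win_w <= image.cols; x += stride) {
            cv::Rect window(x, y, win_w, win_h);
            if (classify_patch(extract_features(image(window))) > 0) // +ve dist => object present
                detections.push_back(window);
        }
}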
NOTE: SVMLight is free to use only for non-commercial purposes. I am using it for research, an M.Tech project (non-commercial) at a government university. If I am breaking any rule, please let me know.
If anyone has doubts, you are most welcome to ask.

Read svm data and retrain with more data?

I am implementing facial expression recognition and am using an SVM to classify a given expression.
When I train, I use this code:
svm.train(myFeatureVector, myLabels, Mat(), Mat(), myParameters);
svm.save("myClassifier.yml");
and later, when I want to predict, I use
response = svm.predict(incomingFeatureVector);
But when I train more than once (exit the program and start it again), it overwrites my previous SVM file. Is there any way I could read the previous SVM file and add more data to it (and then resave it, etc.)? I looked through the OpenCV documentation and found nothing. However, on this page there is a method called CvSVM::read; I don't know what it does or how to use it.
Hope someone can help me :(
What you are trying to do is incremental learning, but unfortunately the Support Vector Machine is a batch algorithm: if you want to add more data, you have to retrain on the whole set again (see the sketch below).
There are online learning alternatives, like Pegasos SVM, but I am not aware of any that is implemented in OpenCV.
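A minimal sketch of that retraining approach using the CvSVM API from the question: persist the accumulated training data next to the model, append the new samples, and train again. The file names and the params argument are assumptions for illustration:

#include <opencv2/core/core.hpp>
#include <opencv2/ml/ml.hpp>

void retrain(const cv::Mat &newFeatures, const cv::Mat &newLabels, CvSVMParams params)
{
    cv::Mat allFeatures = newFeatures, allLabels = newLabels;

    // Merge with the samples saved by earlier sessions, if any.
    cv::FileStorage in("trainingData.yml", cv::FileStorage::READ);
    if (in.isOpened()) {
        cv::Mat oldFeatures, oldLabels;
        in["features"] >> oldFeatures;
        in["labels"] >> oldLabels;
        cv::vconcat(oldFeatures, newFeatures, allFeatures);
        cv::vconcat(oldLabels, newLabels, allLabels);
    }
    in.release();

    // Save the grown training set for the next session.
    cv::FileStorage out("trainingData.yml", cv::FileStorage::WRITE);
    out << "features" << allFeatures << "labels" << allLabels;
    out.release();

    // Retrain on the whole set and overwrite the old model.
    CvSVM svm;
    svm.train(allFeatures, allLabels, cv::Mat(), cv::Mat(), params);
    svm.save("myClassifier.yml");
}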