3-fold cross-validation using Joachims' SVM light

I need to do a 3-fold cross-validation using Joachims' SVM light. Cross-validation and SVMs are new to me and I don't know if I'm doing it right. What have I done so far? I converted my data into 3 files, which I called fold1.txt, fold2.txt and fold3.txt, with my features in the following format:
1 numberofthefeature:1 numberofthefeature:1 ...
I also made a file called words.txt containing my tokens, where each line number is the corresponding numberofthefeature. Did I do everything right?
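For reference, SVM light's input format is one example per line, <target> <feature>:<value> <feature>:<value> ..., with feature numbers starting at 1 and listed in increasing order. For binary classification the target is +1 (or 1) or -1; 0 marks unlabeled examples for transductive learning. So if every line starts with 1, every example is a positive one, and the negative class needs -1. Here is a minimal C++ sketch of formatting one line, assuming the feature numbers are the 1-based line numbers from words.txt; the function name is hypothetical.

```cpp
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Format one labelled binary example in SVM light's sparse input format:
//   <target> <feature>:<value> <feature>:<value> ...
// std::map iterates its keys in increasing order, which SVM light requires.
// Zero-valued features are simply left out (that is what "sparse" means here).
std::string toSvmLightLine(int target, const std::map<int, double>& features) {
    if (target != 1 && target != -1) {
        // This sketch only handles labelled binary examples.
        throw std::invalid_argument("binary classification targets must be +1 or -1");
    }
    std::ostringstream out;
    out << target;
    for (const auto& [index, value] : features) {
        if (index < 1) {
            throw std::invalid_argument("feature numbers start at 1");
        }
        if (value != 0.0) {
            out << ' ' << index << ':' << value;
        }
    }
    return out.str();
}
```

Because the map sorts the keys, the order in which you collect tokens from a document does not matter.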
Now I have to do the 3-fold cross-validation, but I don't know how to do it with Joachims' SVM light. I don't know how to make SVM light learn and classify using the three files, or how to choose which ones to use for testing and which for training. Do I have to write a script or a program to do it?
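One way to do it is a small driver program. SVM light itself provides two programs, svm_learn [options] example_file model_file and svm_classify [options] example_file model_file output_file. For 3-fold cross-validation you run three rounds: in each, two folds are concatenated into a training file and the remaining fold is the test file. The C++ sketch below plans the rounds and builds the command lines; the file names train1.txt, model1 and predictions1.txt are hypothetical.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// One round of k-fold cross-validation: train on all folds but one,
// test on the fold that was held out.
struct CvRound {
    std::vector<std::string> trainFolds;
    std::string testFold;
    std::string trainFile;       // hypothetical name for the concatenated folds
    std::string modelFile;       // written by svm_learn
    std::string predictionFile;  // written by svm_classify
};

std::vector<CvRound> makeRounds(const std::vector<std::string>& folds) {
    if (folds.size() < 2) throw std::invalid_argument("need at least two folds");
    std::vector<CvRound> rounds;
    for (std::size_t k = 0; k < folds.size(); ++k) {
        CvRound r;
        for (std::size_t j = 0; j < folds.size(); ++j) {
            if (j != k) r.trainFolds.push_back(folds[j]);
        }
        r.testFold = folds[k];
        const std::string id = std::to_string(k + 1);
        r.trainFile = "train" + id + ".txt";
        r.modelFile = "model" + id;
        r.predictionFile = "predictions" + id + ".txt";
        rounds.push_back(r);
    }
    return rounds;
}

// Concatenate the training folds of one round line by line, so a fold file
// without a trailing newline cannot glue two examples together.
void writeTrainFile(const CvRound& r) {
    std::ofstream out(r.trainFile);
    if (!out) throw std::runtime_error("cannot create " + r.trainFile);
    for (const auto& fold : r.trainFolds) {
        std::ifstream in(fold);
        if (!in) throw std::runtime_error("cannot open " + fold);
        std::string line;
        while (std::getline(in, line)) out << line << '\n';
    }
}

// Command lines for SVM light's two programs.
std::string learnCommand(const CvRound& r) {
    return "svm_learn " + r.trainFile + " " + r.modelFile;
}

std::string classifyCommand(const CvRound& r) {
    return "svm_classify " + r.testFold + " " + r.modelFile + " " + r.predictionFile;
}
```

For each round you would call writeTrainFile and then run the two commands from a shell, or with std::system from <cstdlib>. svm_classify writes one decision value per test example, so comparing the signs with the held-out fold's labels gives that round's accuracy. Averaging the three rounds gives the cross-validation estimate.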
Thanks to everybody
Thiago

I'm going to assume that you are doing text mining, since you are referring to Thorsten Joachims. Anyway, here is a set of tutorial videos on text classification with cross-validation:
http://vancouverdata.blogspot.ca/2010/11/text-analytics-with-rapidminer-part-5.html

Related

Why is my classification model in Weka predicting all instances as one class?

I have built a classification model using Weka. I have two classes, namely {spam, non-spam}. After applying the StringToWordVector filter, I get 10000 attributes for 19000 records. Then I use the LIBLINEAR library to build a model, which gives me the following F-scores:
Spam-94%
non-spam-98%
When I use the same model to predict new instances, it predicts all of them as spam.
Also, when I use the training set itself as the test set, it predicts all of them as spam too. I am exhausted from looking for the problem. Any help will be appreciated.
I also get this wrong every so often. Then I watch this video to remind myself how it's done: https://www.youtube.com/watch?v=Tggs3Bd3ojQ
In it, Prof Witten, one of Weka's developers/architects, shows how to apply the FilteredClassifier (which in turn is configured to load the StringToWordVector filter) correctly to the training set and the test set.
This is shown for Weka 3.6; Weka 3.7 might be slightly different.
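Why this matters: StringToWordVector builds its dictionary from the data it is first applied to. One common cause of the all-one-class symptom is filtering the test set on its own, so attribute i stands for a different word than it did in training and the model's predictions become meaningless. FilteredClassifier learns the dictionary on the training data and reuses it for the test data. The C++ sketch below (not Weka code; names are hypothetical) shows that idea with plain word counts.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// word -> attribute index
using Vocabulary = std::map<std::string, int>;

// Learn the dictionary from the TRAINING documents only.
Vocabulary buildVocabulary(const std::vector<std::string>& trainingDocs) {
    Vocabulary vocab;
    for (const auto& doc : trainingDocs) {
        std::istringstream words(doc);
        std::string word;
        while (words >> word) {
            // New words get the next free index; known words keep theirs.
            const int next = static_cast<int>(vocab.size());
            vocab.emplace(word, next);
        }
    }
    return vocab;
}

// Count words using a FIXED dictionary; words unseen in training are ignored,
// so every document maps onto the same attributes the model was trained on.
std::vector<int> vectorize(const std::string& doc, const Vocabulary& vocab) {
    std::vector<int> counts(vocab.size(), 0);
    std::istringstream words(doc);
    std::string word;
    while (words >> word) {
        const auto it = vocab.find(word);
        if (it != vocab.end()) ++counts[static_cast<std::size_t>(it->second)];
    }
    return counts;
}
```

With training documents "win cash now" and "meeting at noon", "cash" is attribute 1. A dictionary built from the test text "cash cash lunch" alone would put "cash" at attribute 0, which is exactly the mismatch to avoid.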
What does ZeroR give you? ZeroR always predicts the majority class, so its accuracy is your baseline: if it is close to 100%, any classification algorithm should not be far off either.
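ZeroR ignores every attribute and always predicts the most frequent class in the training data, so its accuracy is simply that class's share of the test set. A minimal C++ sketch of the baseline (not Weka's code; the function names are hypothetical):

```cpp
#include <algorithm>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// The class ZeroR would always predict: the most frequent training label.
// Ties go to the alphabetically first label (std::map order).
std::string zeroRPrediction(const std::vector<std::string>& labels) {
    if (labels.empty()) throw std::invalid_argument("no labels");
    std::map<std::string, int> counts;
    for (const auto& label : labels) ++counts[label];
    const auto best = std::max_element(counts.begin(), counts.end(),
        [](const auto& a, const auto& b) { return a.second < b.second; });
    return best->first;
}

// Accuracy of that constant prediction on a test set.
double zeroRAccuracy(const std::vector<std::string>& trainLabels,
                     const std::vector<std::string>& testLabels) {
    if (testLabels.empty()) throw std::invalid_argument("no test labels");
    const std::string guess = zeroRPrediction(trainLabels);
    const auto hits = std::count(testLabels.begin(), testLabels.end(), guess);
    return static_cast<double>(hits) / static_cast<double>(testLabels.size());
}
```

If a trained model does not clearly beat this number, it has learned little beyond the class distribution.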
Why do you optimize for F-Measure? Just asking. I have never used this and don't know much about it. (I would optimize for the "Precision" metric assuming you have much more Spam than Nonspam).

Reading saved SVM data and retraining with more data?

I am implementing facial expression recognition and am using an SVM to classify a given expression.
When I train, I use these lines:
svm.train(myFeatureVector,myLabels,Mat(),Mat(), myParameters);
svm.save("myClassifier.yml");
and later, when I predict, I use
response = svm.predict(incomingFeatureVector);
But when I want to train more than once (i.e. I exit the program and start it again), it seems to overwrite my previous SVM file. Is there any way I could read the previous SVM file and add more data to it (and then save it again, etc.)? I looked in the OpenCV documentation and found nothing. However, on this page I read that there is a method called CvSVM::read. I don't know what it does or how to use it.
I hope someone can help me :(
What you are trying to do is incremental learning, but unfortunately the SVM is a batch algorithm, so if you want to add more data you have to retrain on the whole set again. (CvSVM::read only restores an already trained model so you can predict with it; it does not let you continue training.)
There are online learning alternatives, like the Pegasos SVM, but I am not aware of any that is implemented in OpenCV.
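Pegasos trains a linear SVM with stochastic sub-gradient steps on the objective (lambda/2)·||w||² + average hinge loss, using the step size 1/(lambda·t) at step t. The whole model is the weight vector plus the step counter, so you can keep training later on new data only. Below is a minimal C++ sketch, not an OpenCV API. It visits examples in the order given instead of sampling them at random as the original algorithm does. It also omits the optional projection step and has no bias term, so you would append a constant 1 feature instead. The class and method names are hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Linear SVM trained with Pegasos-style sub-gradient steps (no bias term).
class PegasosSvm {
public:
    PegasosSvm(std::size_t dim, double lambda) : w_(dim, 0.0), lambda_(lambda) {}

    // One pass over the given examples; labels must be +1 or -1.
    // Can be called again later with new data only.
    void partialFit(const std::vector<std::vector<double>>& X, const std::vector<int>& y) {
        if (X.size() != y.size()) throw std::invalid_argument("one label per example needed");
        for (std::size_t i = 0; i < X.size(); ++i) step(X[i], y[i]);
    }

    double decision(const std::vector<double>& x) const { return dot(x); }
    int predict(const std::vector<double>& x) const { return decision(x) >= 0.0 ? 1 : -1; }

private:
    double dot(const std::vector<double>& x) const {
        if (x.size() != w_.size()) throw std::invalid_argument("dimension mismatch");
        double s = 0.0;
        for (std::size_t j = 0; j < w_.size(); ++j) s += w_[j] * x[j];
        return s;
    }

    void step(const std::vector<double>& x, int label) {
        ++t_;
        const double eta = 1.0 / (lambda_ * static_cast<double>(t_));
        // Hinge loss is active if the example is inside the margin (checked with the old w).
        const bool violates = label * dot(x) < 1.0;
        for (std::size_t j = 0; j < w_.size(); ++j) {
            w_[j] *= (1.0 - eta * lambda_);             // regularisation shrink
            if (violates) w_[j] += eta * label * x[j];  // move towards the example
        }
    }

    std::vector<double> w_;
    double lambda_;
    unsigned long t_ = 0;
};
```

To carry the model across program runs you would also have to save and reload the weight vector and the step counter, for example with cv::FileStorage; this sketch leaves that out.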

Where can I find a complete image set for training an OpenCV face recognition system?

So... where can I find a complete image set for training an OpenCV face recognition system?
Can anybody help?
Have a look here.
The AT&T faces database was probably used a lot (if you look at the docs).
Once you have downloaded a set of images, you'll want to run the little Python script to generate the CSV file needed for training.
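As far as I recall, the CSV used by the OpenCV FaceRecognizer tutorial samples has one line per image in the form path;label, with one integer label per person. The Python script does this directory walk. Here is a C++17 std::filesystem sketch of the same idea, assuming one sub-folder per person as in the AT&T database (s1 to s40). The function names and the output file name are hypothetical.

```cpp
#include <algorithm>
#include <filesystem>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// One "path;label" line per image, one integer label per person folder.
std::string makeFaceCsv(const fs::path& root, char separator = ';') {
    if (!fs::is_directory(root)) throw std::invalid_argument("not a directory: " + root.string());
    std::vector<fs::path> people;
    for (const auto& entry : fs::directory_iterator(root)) {
        if (entry.is_directory()) people.push_back(entry.path());
    }
    std::sort(people.begin(), people.end());  // directory order is unspecified

    std::ostringstream csv;
    int label = 0;
    for (const auto& person : people) {
        std::vector<fs::path> images;
        for (const auto& entry : fs::directory_iterator(person)) {
            if (entry.is_regular_file()) images.push_back(entry.path());
        }
        std::sort(images.begin(), images.end());
        for (const auto& image : images) {
            csv << image.string() << separator << label << '\n';
        }
        ++label;
    }
    return csv.str();
}

// Save the listing, e.g. as "faces.csv" (hypothetical file name).
void writeFaceCsv(const fs::path& root, const fs::path& csvFile) {
    std::ofstream out(csvFile);
    if (!out) throw std::runtime_error("cannot create " + csvFile.string());
    out << makeFaceCsv(root);
}
```

Labels follow the sorted folder names, so s10 gets a smaller label than s2. That does not matter to the recognizer as long as each person keeps one label. With GCC older than 9, std::filesystem may need -lstdc++fs.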
If you opt for the Yale database, you'll have to convert the images to PNG or PGM first (OpenCV can't handle GIFs).
But honestly, in the end you want to use a database that consists entirely of the faces you want to recognize (that is, your own database).
Unlike most ML algorithms, this one does not need explicit 'negative' images (people other than the ones you want to recognize). Those only add noise and degrade the actual recognition.
The only situation where you'd want them is when there is only one person to recognize; then you'd need some others there to increase the 'contrast'.

Automatic Numberplate Recognition

As the title suggests, I want to build an ANPR application on Windows. I am working with Brazilian number plates, and I am using OpenCV for this.
So far I have managed to extract the letters from the number plate. The following images show some of the numbers I have extracted.
The problem I am facing is how to recognize those letters. I tried Google Tesseract, but it sometimes fails to recognize them. Then I tried to train an OCR database with OpenCV, using about 10 images for each character, but that did not work properly either.
So I am stuck here. I need this for my final-year project. Can anybody help me? I would really appreciate it.
The following site does it very nicely:
https://www.anpronline.net/demo.html
Thank you.
You could train an ANN or a multi-class SVM on the letter images, like here.
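Whichever classifier you pick, each extracted character first has to become a fixed-length feature vector. One simple choice, which is an assumption and not necessarily what the linked example does, is to split the binarized character into a grid and use the fraction of ink pixels per cell. A std-only C++ sketch with hypothetical names follows. The values are float because, as far as I know, OpenCV's ml classes work on 32-bit float samples.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Binarized character image, row-major; 0 = background, non-zero = ink.
using Glyph = std::vector<std::vector<unsigned char>>;

// Split the glyph into a cells x cells grid and return, per cell, the fraction
// of ink pixels. Every character becomes cells*cells floats, whatever its size.
std::vector<float> gridFeatures(const Glyph& img, int cells) {
    if (img.empty() || img[0].empty() || cells <= 0) {
        throw std::invalid_argument("empty image or invalid grid size");
    }
    const int rows = static_cast<int>(img.size());
    const int cols = static_cast<int>(img[0].size());
    std::vector<float> ink(static_cast<std::size_t>(cells * cells), 0.0f);
    std::vector<float> area(ink.size(), 0.0f);
    for (int r = 0; r < rows; ++r) {
        if (static_cast<int>(img[r].size()) != cols) {
            throw std::invalid_argument("all rows must have the same width");
        }
        for (int c = 0; c < cols; ++c) {
            // Map the pixel to its grid cell.
            const int cell = (r * cells / rows) * cells + (c * cells / cols);
            area[cell] += 1.0f;
            if (img[r][c] != 0) ink[cell] += 1.0f;
        }
    }
    for (std::size_t i = 0; i < ink.size(); ++i) {
        // A cell can be empty when the grid is finer than the image.
        ink[i] = area[i] > 0.0f ? ink[i] / area[i] : 0.0f;
    }
    return ink;
}
```

Each vector would then become one row of the training matrix, with the character's class index as its label. The grid size is something to tune on your own plates.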
Check out OpenALPR (http://www.openalpr.com). It already has the problem solved.
If you need to do it yourself, you really do need to train Tesseract. It will give you the best results. 10 images per character is not enough, you need dozens or hundreds. If you can find a font that is similar to your plate characters, a good approach is to print out a sheet of paper with all of the characters used multiple times. Then take 5-10 pictures of the page with your camera. These can then be your input for training Tesseract.

How can I translate my feature matrix into Weka's format?

I need some help please.
Well, I have some feature vectors from 2 classes (2 different movements of the upper limb). Now I need to put my feature matrix (all feature vectors) into Weka to classify my movements, specifically with the SVM algorithm. But I have never worked with Weka before, or with Java, or with the ARFF format. How can I translate my feature matrix into Weka's format?
Thank you very much. I will appreciate all help.
Lilia
I realized this should probably be a full answer. There are a number of great documents out there that detail the .arff file format. Since you already have feature vectors, it's worth simply using each entry in the feature vector as a separate numeric attribute.
There's a good explanation of the Arff format here: http://www.cs.waikato.ac.nz/ml/weka/arff.html
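As a minimal sketch of what such a file looks like, the C++ function below writes a feature matrix plus one class label per row as ARFF. Each column becomes a NUMERIC attribute and the class becomes a nominal attribute listing its allowed values. The attribute names f1, f2, ... and the movement names in the example are hypothetical. Names containing spaces would need quoting, which this sketch doesn't do.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Feature matrix (one row per example) + one class label per row -> ARFF text.
std::string toArff(const std::string& relation,
                   const std::vector<std::vector<double>>& features,
                   const std::vector<std::string>& labels,
                   const std::vector<std::string>& classValues) {
    if (features.size() != labels.size()) {
        throw std::invalid_argument("one label per row needed");
    }
    const std::size_t dims = features.empty() ? 0 : features[0].size();
    std::ostringstream out;
    out << "@RELATION " << relation << "\n\n";
    // One numeric attribute per feature column.
    for (std::size_t j = 0; j < dims; ++j) {
        out << "@ATTRIBUTE f" << (j + 1) << " NUMERIC\n";
    }
    // The class is nominal: list every allowed value.
    out << "@ATTRIBUTE class {";
    for (std::size_t k = 0; k < classValues.size(); ++k) {
        out << (k ? "," : "") << classValues[k];
    }
    out << "}\n\n@DATA\n";
    // Data rows: comma-separated feature values, class label last.
    for (std::size_t i = 0; i < features.size(); ++i) {
        if (features[i].size() != dims) {
            throw std::invalid_argument("all rows need the same length");
        }
        for (double v : features[i]) out << v << ',';
        out << labels[i] << '\n';
    }
    return out.str();
}
```

Once the file opens in the Weka Explorer, the SVM classifier is SMO, under functions.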
There's a Java example showing how to convert a CSV file to an ARFF file programmatically:
http://weka.wikispaces.com/Converting+CSV+to+ARFF
And there's even an online tool that will do most of it for you (I don't really recommend this as it makes sometimes critical mistakes):
http://slavnik.fe.uni-lj.si/markot/csv2arff/csv2arff.php
Though if all you want to do is run some regression, weka will let you do that without converting anything to arff.