How can I translate my feature matrix to Weka format?

I need some help, please.
Well, I have some feature vectors from two classes (two different movements of the upper limb). Now I need to put my feature matrix (all the feature vectors) into Weka to classify my movements, specifically with an SVM algorithm. But I have never worked with Weka before, or with Java, or with the ARFF format. How can I translate my feature matrix to Weka format?
Thank you very much. I will appreciate all help.
Lilia

Realized it should probably be a full answer, but there are a number of great documents out there that detail the .arff file format. Since you already have feature vectors, it's worth just using each entry in the feature vector as a separate numeric attribute.
There's a good explanation of the ARFF format here: http://www.cs.waikato.ac.nz/ml/weka/arff.html
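For instance, a feature matrix from two movement classes could look like the following in ARFF (the attribute and class names here are made up; use one numeric attribute per entry of your feature vector, plus a final nominal class attribute):

@relation upper_limb_movements

@attribute feature1 numeric
@attribute feature2 numeric
@attribute feature3 numeric
@attribute movement {movement1, movement2}

@data
0.42,1.73,0.08,movement1
0.39,2.01,0.11,movement2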
There's a Java example showing how to convert a CSV to an ARFF file programmatically:
http://weka.wikispaces.com/Converting+CSV+to+ARFF
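The gist of that example is Weka's CSVLoader/ArffSaver pair; a minimal sketch (the file names are placeholders):

import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.CSVLoader;
import java.io.File;

public class Csv2Arff {
    public static void main(String[] args) throws Exception {
        // Load the CSV file (assumes the first row holds attribute names)
        CSVLoader loader = new CSVLoader();
        loader.setSource(new File("features.csv"));
        Instances data = loader.getDataSet();

        // Write the same instances back out in ARFF format
        ArffSaver saver = new ArffSaver();
        saver.setInstances(data);
        saver.setFile(new File("features.arff"));
        saver.writeBatch();
    }
}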
And there's even an online tool that will do most of it for you (I don't really recommend this, as it sometimes makes critical mistakes):
http://slavnik.fe.uni-lj.si/markot/csv2arff/csv2arff.php
Though if all you want to do is run some regression, Weka will let you do that without converting anything to ARFF.
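And once the ARFF file exists, you can drive the SVM from Java in a few lines. Here is a minimal sketch using Weka's built-in SMO classifier (the file name, and the assumption that the class is the last attribute, are mine):

import weka.classifiers.Evaluation;
import weka.classifiers.functions.SMO;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import java.util.Random;

public class SvmOnArff {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("features.arff");
        // Assume the class (movement label) is the last attribute
        data.setClassIndex(data.numAttributes() - 1);

        // SMO is Weka's built-in support vector machine
        SMO svm = new SMO();

        // Estimate accuracy with 10-fold cross-validation
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(svm, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}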

Related

Input for Hidden Markov Model-based speech recognition program

I am going to build a speech recognition program based on a Hidden Markov Model. Unfortunately, I don't know how to get an input sound sequence and, well, work with it. Can anyone tell me what the general approach is for reading values from a sound file format (e.g. .wav, .mp3, etc.) and slicing a soundtrack into pieces in C++?
The general approach is to convert the input sound into a sequence of feature vectors (usually MFCCs). This process is described in general in the CMU Sphinx wiki, and in detail in the HTK Book. You might also want to study the general-purpose openSMILE toolkit to see how it is done in C++.
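The question asks about C++, but the read-and-slice step is easy to see in any language. Here is a rough Java sketch (requires Java 9+; assumes 16-bit mono little-endian PCM; the file name and frame sizes are illustrative) that frames a .wav file the way MFCC front ends typically do, with 25 ms windows and a 10 ms hop:

import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class FrameSlicer {
    public static void main(String[] args) throws Exception {
        // Open the .wav file (assumed 16-bit mono PCM, little-endian)
        AudioInputStream in = AudioSystem.getAudioInputStream(new File("speech.wav"));
        int sampleRate = (int) in.getFormat().getSampleRate();

        // Decode the raw bytes into samples scaled to [-1, 1]
        byte[] raw = in.readAllBytes();
        int n = raw.length / 2;
        double[] samples = new double[n];
        for (int i = 0; i < n; i++) {
            int lo = raw[2 * i] & 0xff;
            int hi = raw[2 * i + 1]; // sign-carrying high byte
            samples[i] = ((hi << 8) | lo) / 32768.0;
        }

        // Slice into 25 ms frames with a 10 ms hop; each frame would then
        // be windowed and turned into one MFCC feature vector
        int frameLen = sampleRate * 25 / 1000;
        int hop = sampleRate * 10 / 1000;
        List<double[]> frames = new ArrayList<>();
        for (int start = 0; start + frameLen <= n; start += hop) {
            double[] frame = new double[frameLen];
            System.arraycopy(samples, start, frame, 0, frameLen);
            frames.add(frame);
        }
        System.out.println(frames.size() + " frames of " + frameLen + " samples");
    }
}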

Read SVM data and retrain with more data?

I am implementing facial expression recognition and am using an SVM to classify a given expression.
When I train, I use this command line
svm.train(myFeatureVector,myLabels,Mat(),Mat(), myParameters);
svm.save("myClassifier.yml");
which I will later load when I predict using
response = svm.predict(incomingFeatureVector);
But then, when I want to train more than once (I exited the program and started it again), it seems to have overwritten my previous SVM file. Is there any way I could read the previous SVM file and add more data to it (and then resave it, etc.)? I looked through the OpenCV documentation and found nothing. However, when I read this page, there is a method called CvSVM::read. I don't know what that does or how to implement it.
Hope anyone can help me :(
What you are trying to do is incremental learning, but unfortunately the Support Vector Machine is a batch algorithm; hence, if you want to add more data you have to retrain with the whole set again.
There are online learning alternatives, like Pegasos SVM, but I am not aware of any that are implemented in OpenCV.
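To make the batch-retraining idea concrete, here is a rough sketch using the OpenCV 3.x Java bindings (the question uses the 2.x C++ API; the Mat shapes and file name are placeholders). Note that you must keep the original training samples and labels around yourself, since the saved .yml model does not contain them:

import org.opencv.core.Core;
import org.opencv.core.CvType;
import org.opencv.core.Mat;
import org.opencv.ml.Ml;
import org.opencv.ml.SVM;
import java.util.Arrays;

public class RetrainSvm {
    static { System.loadLibrary(Core.NATIVE_LIBRARY_NAME); }

    public static void main(String[] args) {
        // Placeholder Mats: fill these with your real feature rows and labels
        Mat oldSamples = new Mat(100, 64, CvType.CV_32F);
        Mat oldLabels  = new Mat(100, 1,  CvType.CV_32S);
        Mat newSamples = new Mat(20,  64, CvType.CV_32F);
        Mat newLabels  = new Mat(20,  1,  CvType.CV_32S);

        // SVM is a batch learner: stack old and new data, retrain from scratch
        Mat allSamples = new Mat();
        Mat allLabels  = new Mat();
        Core.vconcat(Arrays.asList(oldSamples, newSamples), allSamples);
        Core.vconcat(Arrays.asList(oldLabels, newLabels), allLabels);

        SVM svm = SVM.create();
        svm.train(allSamples, Ml.ROW_SAMPLE, allLabels);
        svm.save("myClassifier.yml");
    }
}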

Weka SVM multi-class classifier

I understand that Weka uses a one-vs-one approach for multi-class SVM. However, I would like to classify documents, and I have 10 class labels.
Is it possible to change the parameters to use a one-vs-rest approach instead?
How should I actually go about doing it?
The official site http://weka.wikispaces.com/LibSVM does not help much.
Other classification methods such as naive Bayes have been tried, but I would like to compare the results against SVM methods.
LIBSVM also allows multi-label classification. You can find examples of implementations here:
http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/multilabel/
You can also search for papers that use LibSVM with non-binary datasets,
e.g. http://link.springer.com/article/10.1007/s00521-011-0793-1
Another option besides LibSVM is Weka's SMO classifier.
In Weka version 3.8, the MultiClassClassifier meta classifier can be used. It has options for several multi-classification methods, including 1-against-1 and 1-against-all.
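A rough Java sketch of that meta classifier wrapped around SMO (the data file name is a placeholder):

import weka.classifiers.functions.SMO;
import weka.classifiers.meta.MultiClassClassifier;
import weka.core.Instances;
import weka.core.SelectedTag;
import weka.core.converters.ConverterUtils.DataSource;

public class OneVsRest {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("documents.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Wrap the binary SMO classifier in a one-vs-rest meta classifier
        MultiClassClassifier ovr = new MultiClassClassifier();
        ovr.setMethod(new SelectedTag(MultiClassClassifier.METHOD_1_AGAINST_ALL,
                MultiClassClassifier.TAGS_METHOD));
        ovr.setClassifier(new SMO());
        ovr.buildClassifier(data);
    }
}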

3-fold cross-validation using Joachims' SVMlight

I need to do a 3-fold cross-validation using Joachims' SVMlight. Cross-validation and SVMs are new things to me, and I don't know if I'm doing it right. What have I done so far? I converted my data into 3 files that I called fold1.txt, fold2.txt and fold3.txt, with my features in the following format:
1 numberofthefeature:1 numberofthefeature:1 ...
I also made a file called words.txt with my tokens, where the line numbers are my feature numbers. Did I do everything right?
So, now I have to do the 3-fold cross-validation, but I don't know how to do it with SVMlight. I don't know how to make SVMlight learn and classify using the three files, or how to choose which ones to use for testing and which for training. Do I have to write a script or a program to do it?
Thanks to everybody
Thiago
I'm going to assume that you are doing text mining, as you are referring to Thorsten Joachims. Anyway, here is a set of tutorial videos on text classification with cross-validation:
http://vancouverdata.blogspot.ca/2010/11/text-analytics-with-rapidminer-part-5.html
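If you want to stay with SVMlight itself, a small shell script is all you need. A sketch of one rotation, using SVMlight's svm_learn and svm_classify binaries and the fold files from the question:

# Fold 1 held out: train on folds 2 and 3, test on fold 1
cat fold2.txt fold3.txt > train1.dat
svm_learn train1.dat model1
svm_classify fold1.txt model1 predictions1
# Repeat twice more, holding out fold2.txt and then fold3.txt,
# and average the three accuracies that svm_classify reports.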

How do I write a Perl script to filter out digital pictures that have been doctored?

Last night before going to bed, I browsed through the Scalar Data section of Learning Perl again and came across the following sentence:
the ability to have any character in a string means you can create, scan, and manipulate raw binary data as strings.
An idea immediately hit me that I could actually let Perl scan the pictures that I have stored on my hard disk to check if they contain the string Adobe. It seems by doing so, I can tell which of them have been photoshopped. So I tried to implement the idea and came up with the following code:
#!perl
use autodie;
use strict;
use warnings;

{
    # Read each file in paragraph-sized chunks rather than line by line
    local $/ = "\n\n";
    my $dir = 'f:/TestPix/';
    my @pix = glob "$dir/*";
    foreach my $file (@pix) {
        open my $pic, '<', $file;
        while (<$pic>) {
            # Report any picture whose raw bytes contain the string "Adobe"
            if (/Adobe/) {
                print "$file\n";
            }
        }
    }
}
Excitingly, the code seems to be really working, and it does the job of filtering out the pictures that have been photoshopped. But the problem is that many pictures are edited by other utilities. I think I'm kind of stuck there. Do we have some simple but universal method to tell if a digital picture has been edited or not, something like
if (!= /the original format/) {...}
Or do we simply have to add more conditions, like
if (/Adobe/ || /ACDSee/ || /some other picture editor/)
Any ideas on this? Or am I oversimplifying due to my miserably limited programming knowledge?
Thanks, as always, for any guidance.
Your best bet in Perl is probably ExifTool. This gives you access to whatever non-image information is embedded into the image. However, as other people said, it's possible to strip this information out, of course.
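For example, the exiftool command-line tool that ships with the Image::ExifTool distribution can print the editing-related tags directly (which tags are present depends on the file; the file name is a placeholder):

exiftool -Software -CreatorTool photo.jpg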
I'm not going to say there is absolutely no way to detect alterations in an image, but the problem is extremely difficult.
The only person I know of who claims to have an answer is Dr. Neal Krawetz, who claims that digitally altered parts of an image will have different compression error rates from the original portions. He claims that re-saving a JPEG at different quality levels will highlight these differences.
I have not found this to be the case, in my investigations, but perhaps you might have better results.
No. There is no functional distinction between a perfectly edited image, and one which was the way it is from the start - it's all just a bag of pixels in the end, after all, and any other metadata you can remove or forge all you want.
The name of the graphics program used to edit the image is not part of the image data itself but of something called metadata, which may be stored in the image file but, as others have noted, is neither required (some programs may not store it, and some may give you the option of not storing it) nor reliable: if you forged an image, you might have forged the metadata as well.
So the answer to your question is "no, there's no way to universally tell if the pic was edited or not", although some image-editing software may write its signature into the image file, and it'll be left there through the carelessness of the person doing the editing.
If you're inclined to learn more about image processing in Perl, you could take a look at some of the excellent modules CPAN has to offer:
Image::Magick - read, manipulate and write of a large number of image file formats
GD - create colour drawings using a large number of graphics primitives, and emit the drawings in various formats.
GD::Graph - create charts
GD::Graph3d - create 3D Graphs with GD and GD::Graph
However, there are other utilities available for identifying various image formats. It's more of a question for Super User, but for various Unix distros you can use file to identify many different types of files, and for Mac OS X, Graphic Converter has never let me down. (It was even able to open the bizarre multi-file X-ray of my cat's shattered pelvis that I got on a disc from the vet.)
How would you know what the original format was? I'm pretty sure there's no guaranteed way to tell if an image has been modified.
I can just open the file (with my favourite programming language and filesystem API) and just write whatever I want into that file willy-nilly. As long as I don't screw something up with the file format, you'd never know it happened.
Heck, I could print the image out and then scan it back in; how would you tell it from an original?
As others have stated, there is no way to know if the image was doctored. I'm guessing what you basically want to know is the difference between a realistic photograph and one that has been enhanced or modified.
There's always the option of running some extremely complex image recognition algorithm that would analyze every pixel in your image and do some very complicated work to determine whether the image was doctored. This solution would probably involve AI that examines millions of photos, both doctored and untouched, and learns from them. However, this is more of a theoretical solution and isn't very practical; you would probably only see it in movies. It would be extremely complex to develop and would probably take years. And even if you did get something like this to work, it probably still wouldn't be correct 100% of the time. I'm guessing AI technology still isn't at that level, and it could take a while until it is.
A not-commonly-known feature of exiftool allows you to recognize the originating software through an analysis of the JPEG quantization tables (not relying on image metadata). It recognizes tables written by many applications. Note that some cameras may use the same quantization tables as some applications, so this isn't a 100% solution, but it is worth looking into. Here is an example of exiftool run on two images, the first of which was edited by Photoshop.
> exiftool -jpegdigest a.jpg b.jpg
======== a.jpg
JPEG Digest : Adobe Photoshop, Quality 10
======== b.jpg
JPEG Digest : Canon EOS 30D/40D/50D/300D, Normal
2 image files read
This will work even if the metadata has been removed.
There is existing software out there which uses various techniques (compression artifacting, comparison to signature profiles in a database of cameras, etc.) to analyze the actual image data for evidence of alteration. If you have access to such software and the software available to you provides an API for external access to these analysis functions, then there's a decent chance that a Perl module exists which will interface with that API and, if no such module exists, it could probably be created rather quickly.
In theory, it would also be possible to implement the image analysis code directly in native Perl, but I'm not aware of anyone having done so and I expect that you'd be better off writing something that low-level and processor-intensive in a fully-compiled language (e.g., C/C++) rather than in Perl.
http://www.impulseadventure.com/photo/jpeg-snoop.html is a tool that does the job fairly well.
If there has been any cloning, there is a variation in the pixel density or concentration which sometimes shows up upon manual inspection: a Photoshop-cloned area will have an even pixel density (by which I mean the variation of pixels, compared with a scanned image).