Importing weights into opencv MLP? - c++

Say I want to use some other product to create an MLP (R, Python, Matlab, whatever), but I want to run that network, i.e. just for prediction, under OpenCV. Assume that the parameters (e.g. activation function) are compatible between the training product and OpenCV.
How can I import my trained weights into the OpenCV MLP? Perhaps the training product uses an MxN matrix of weights for each layer, where M is the size of the input layer and N the size of the output layer (so W(i,j) would be the weight between input node i and output node j). Perhaps the biases are stored in a separate N-element vector. The specifics of the original format don't matter much, because as long as I know what the weights mean and how they are stored, I can transform them into whatever OpenCV needs.
So, given that, how do I import these weights into a (run-time, prediction-only) OpenCV MLP? What weight/bias (etc.) format does OpenCV need, and how do I set its weights and biases?

I've just run into the same problem. I haven't looked at OpenCV's MLP class enough yet to know whether there's an easier or simpler way, but OpenCV lets you save and load MLPs as .xml and .yml files. So if you build an ANN in OpenCV, you can save it to one of those formats, look at the file to figure out the layout OpenCV wants, and then write your network into that format from R/Python/MatLab, or at least into some intermediate format plus a script that translates it into OpenCV's format. Once that's done, it should be as simple as instantiating OpenCV's MLP in the code where you actually want to predict and calling its load("filename") function. (I realize this is a year after the fact, so hopefully you found an answer or a workaround. If you found a better approach, tell me, I'd love to know.)
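To make that round trip concrete, here is a minimal sketch using the OpenCV 3.x ml module. The layer sizes, file name, and dummy training data are placeholders, not anything the original poster specified: build a small ANN_MLP, save it to YAML so you can inspect the layout OpenCV expects, then load it back in the prediction-only program.

#include <opencv2/opencv.hpp>
#include <opencv2/ml.hpp>

int main()
{
    // Build a small network just to see what OpenCV writes out.
    cv::Ptr<cv::ml::ANN_MLP> mlp = cv::ml::ANN_MLP::create();
    cv::Mat layerSizes = (cv::Mat_<int>(1, 3) << 4, 8, 3); // 4 inputs, 8 hidden, 3 outputs (placeholder)
    mlp->setLayerSizes(layerSizes);
    mlp->setActivationFunction(cv::ml::ANN_MLP::SIGMOID_SYM, 1.0, 1.0);
    mlp->setTrainMethod(cv::ml::ANN_MLP::BACKPROP, 0.1, 0.1);

    // Dummy training data, only so that the weight matrices get allocated.
    cv::Mat samples(10, 4, CV_32F), responses(10, 3, CV_32F);
    cv::randu(samples, cv::Scalar(0.0), cv::Scalar(1.0));
    cv::randu(responses, cv::Scalar(0.0), cv::Scalar(1.0));
    mlp->train(samples, cv::ml::ROW_SAMPLE, responses);

    mlp->save("mlp.yml");   // open this file to see the layout OpenCV expects

    // In the prediction-only program (on older 3.x releases use
    // cv::Algorithm::load<cv::ml::ANN_MLP>("mlp.yml") instead):
    cv::Ptr<cv::ml::ANN_MLP> net = cv::ml::ANN_MLP::load("mlp.yml");
    cv::Mat sample = cv::Mat::zeros(1, 4, CV_32F), output;
    net->predict(sample, output);
    return 0;
}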

You must parse your model the same way the 'read' function of OpenCV's MLP parses the xml or yml file. I think this will not be too hard.

Related

Neural Net gradient calculation with Batch Normalization C++

I was trying to change the activation function of my neural net from sigmoid to ReLU (or, more specifically, SELU). Since I got a lot of exploding gradients with that change, I tried to use batch normalization. I calculated the gradients of my error function w.r.t. the learnable parameters \beta and \gamma, but they seem to be a bit different from the ones I saw in several (sadly Python-only) examples.
Here, for example, the code example at the bottom of the page says dbeta = np.sum(dout, axis=0), and I wonder what exactly this dout is.
My derivatives look like this:
Derivation of error function w.r.t \beta
What am I doing wrong in this derivation?
Thank you a lot for your help.
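For what it's worth, in the usual convention of such Python examples (an assumption about that particular page, but it matches e.g. the CS231n batch-norm code), dout is the upstream gradient of the loss with respect to the batch-norm output y. A short sketch of why summing it gives the \beta gradient:

y_i = \gamma \hat{x}_i + \beta
  \quad\Rightarrow\quad
  \frac{\partial y_i}{\partial \beta} = 1,
  \qquad
  \frac{\partial L}{\partial \beta}
    = \sum_{i=1}^{m} \frac{\partial L}{\partial y_i}\,\frac{\partial y_i}{\partial \beta}
    = \sum_{i=1}^{m} \frac{\partial L}{\partial y_i}

with m the mini-batch size and \hat{x}_i the normalized input, i.e. dbeta = np.sum(dout, axis=0) is just the upstream gradient summed over the batch axis.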
I tried adding a batchnorm2d layer to a small CNN tested on MNIST, with Libtorch C++ code, with or without GPU use. Here:
https://github.com/ollewelin/libtorch-GPU-CNN-test-MNIST-with-Batchnorm
The precision increases a little with it. Search for "bn1" or "bn2" in that code to find the relevant layers.
Installation on Ubuntu with GPU and Libtorch + OpenCV for C++ is here:
https://github.com/ollewelin/torchlib-opencv-gpu
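For reference, a minimal sketch of what such a small MNIST CNN with "bn1"/"bn2" batch-norm layers looks like in Libtorch C++. The channel counts and layer sizes here are assumptions, not the repo's exact values:

#include <torch/torch.h>

struct SmallCNN : torch::nn::Module {
    SmallCNN()
        : conv1(torch::nn::Conv2dOptions(1, 16, 3).padding(1)),
          bn1(16),
          conv2(torch::nn::Conv2dOptions(16, 32, 3).padding(1)),
          bn2(32),
          fc(32 * 7 * 7, 10)
    {
        register_module("conv1", conv1);
        register_module("bn1", bn1);    // the layer name to search for
        register_module("conv2", conv2);
        register_module("bn2", bn2);
        register_module("fc", fc);
    }

    torch::Tensor forward(torch::Tensor x) {
        x = torch::relu(bn1->forward(conv1->forward(x)));
        x = torch::max_pool2d(x, 2);               // 28x28 -> 14x14
        x = torch::relu(bn2->forward(conv2->forward(x)));
        x = torch::max_pool2d(x, 2);               // 14x14 -> 7x7
        x = x.view({x.size(0), -1});
        return fc->forward(x);
    }

    torch::nn::Conv2d conv1;
    torch::nn::BatchNorm2d bn1;
    torch::nn::Conv2d conv2;
    torch::nn::BatchNorm2d bn2;
    torch::nn::Linear fc;
};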

classification with SVM using a vocabulary built from Bag of Words

My intention is to build a classifier that correctly classifies the image ROI against the templates that I have manually extracted.
Here is what I have done.
My first step is to understand what should be done to achieve the above.
Through research on the net, I have realized I would need to create representation vectors (of the template). Hence I have used Bag of Words to create the vocabulary.
I have taken Roy's project, rewritten it for OpenCV 3.1, and also used his food database. Looking at his database, I realised that some of the images contain multiple class types. I have tried to clip the images so that each training image only contains one class of item, but the images are now of different sizes.
I have tried to run this code. The result is very disappointing. It always points to one class.
Questions I have:
Is my processing of the training images wrong? I have read around, and some posts suggest the image size must be constant, or at least the aspect ratio. I am confused by this. Are there tools available for resizing samples?
It does not matter what size the sample images are, since Roy's algorithm uses local descriptors extracted around detected points of interest.
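As a rough illustration of that point (a sketch, not Roy's actual code; KAZE is used here only because it ships with core OpenCV 3.x and produces float descriptors suitable for k-means, and the vocabulary size is an arbitrary choice):

#include <opencv2/opencv.hpp>
#include <opencv2/features2d.hpp>
#include <string>
#include <vector>

int main() {
    cv::Ptr<cv::Feature2D> detector = cv::KAZE::create();
    cv::BOWKMeansTrainer bowTrainer(100);   // 100 visual words (arbitrary)

    std::vector<std::string> trainImages = {"img1.jpg", "img2.jpg"}; // placeholder paths
    for (const std::string& path : trainImages) {
        cv::Mat img = cv::imread(path, cv::IMREAD_GRAYSCALE);
        if (img.empty()) continue;
        std::vector<cv::KeyPoint> keypoints;
        cv::Mat descriptors;
        detector->detectAndCompute(img, cv::noArray(), keypoints, descriptors);
        if (!descriptors.empty())
            bowTrainer.add(descriptors);    // the image size never enters here
    }
    cv::Mat vocabulary = bowTrainer.cluster(); // one row per visual word, CV_32F
    return 0;
}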
An SVM is a binary linear classifier, so you need to train a separate SVM for each class. Each one will say whether a sample belongs to that class or to the rest: the so-called one-vs-rest scheme.
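A minimal sketch of that one-vs-rest setup with OpenCV 3.1's ml module. It assumes you already have a CV_32F matrix of BoW histograms, one row per training image (e.g. from cv::BOWImgDescriptorExtractor), plus the corresponding class ids; those names are placeholders:

#include <opencv2/opencv.hpp>
#include <opencv2/ml.hpp>
#include <map>
#include <set>
#include <vector>

std::map<int, cv::Ptr<cv::ml::SVM>> trainOneVsRest(const cv::Mat& bowHistograms,
                                                   const std::vector<int>& classIds)
{
    std::map<int, cv::Ptr<cv::ml::SVM>> classifiers;
    std::set<int> classes(classIds.begin(), classIds.end());
    for (int c : classes) {
        // Label the current class +1 and everything else -1.
        cv::Mat labels(static_cast<int>(classIds.size()), 1, CV_32S);
        for (size_t i = 0; i < classIds.size(); ++i)
            labels.at<int>(static_cast<int>(i)) = (classIds[i] == c) ? 1 : -1;

        cv::Ptr<cv::ml::SVM> svm = cv::ml::SVM::create();
        svm->setType(cv::ml::SVM::C_SVC);
        svm->setKernel(cv::ml::SVM::LINEAR);
        svm->train(bowHistograms, cv::ml::ROW_SAMPLE, labels);
        classifiers[c] = svm;
    }
    return classifiers;
}

// At prediction time, run each SVM on the test histogram and pick the class
// whose classifier answers +1 (or the one with the largest decision value).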

Sparse coding and dictionary learning using opencv and c++

I am trying to perform text image restoration, and I can find no proper documentation on how to perform OMP or K-SVD in C++ using OpenCV.
I have over 1000 training images of different sizes, so should I divide the images into equal-sized patches or resize all of them? How do I construct the signal matrix X?
What other pre-processing steps are required for sparse coding? How do I actually perform K-SVD on color images?
What data type is available in OpenCV for an image dictionary, and how do I initialize the dictionary D?
These are very basic questions, but I have tried various libraries and they don't make the workflow very clear.
I found this code useful. It is the only OpenCV implementation I have come across so far. I believe it uses a single image for dictionary learning, whereas I have to use at least 1000 images, but it certainly provides a good guideline.
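On the question of building the signal matrix X: one common approach, sketched below under the assumption that fixed-size grayscale patches are used (the 8x8 patch size and the non-overlapping grid are arbitrary choices), is to cut every training image into equal-sized patches and store each vectorized patch as one column of X. This also sidesteps the different image sizes, since only the patch size has to be fixed.

#include <opencv2/opencv.hpp>
#include <string>
#include <vector>

cv::Mat buildSignalMatrix(const std::vector<std::string>& imagePaths, int patch = 8)
{
    std::vector<cv::Mat> columns;
    for (const std::string& path : imagePaths) {
        cv::Mat img = cv::imread(path, cv::IMREAD_GRAYSCALE);
        if (img.empty()) continue;
        img.convertTo(img, CV_32F, 1.0 / 255.0);
        // Non-overlapping patches; images of any size can be fed in.
        for (int y = 0; y + patch <= img.rows; y += patch)
            for (int x = 0; x + patch <= img.cols; x += patch) {
                cv::Mat p = img(cv::Rect(x, y, patch, patch)).clone();
                columns.push_back(p.reshape(1, patch * patch)); // (patch*patch) x 1 column
            }
    }
    cv::Mat X;
    cv::hconcat(columns, X);   // patch*patch rows, one column per patch
    return X;
}

// The dictionary D can then be a CV_32F cv::Mat with patch*patch rows and as many
// columns as atoms, initialized e.g. from randomly selected (and normalized) columns of X.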

Modifying an OpenCV RandomTree classifier

My problem: the objective is to implement a computer vision paper which uses a random tree structure to regress pixels from an RGB-D image to 3D world coordinates.
I have already used OpenCV for AdaBoost and random forests, but I never dived into the code.
Now that I would like to modify the error function of the split nodes, I don't know whether it is possible; I didn't see clear declarations in the header file.
Just to add some information about what I want to do in the error function:
The input is a pixel (i, j). In the error function, depending on the parameter, a feature would be computed from the RGB-D image, and the best split over that feature, across the pixels of the subset, would have to be found. The features clearly depend on the parameter and should be estimated during training.
My questions:
Is it possible to create a class extending CvRTrees and modify the error function for each split node?
If yes, which members should be modified? If not, do you know of any library that could help me achieve that?
As no one answered, I will just post what I found out:
CvRTrees uses a fixed feature vector as input (e.g. a HOG descriptor).
If you want to use random features, you either have to put all of these features in as input (which may be totally suboptimal or impossible),
or you can create your own implementation of the weak learner, in which the type of feature used is a random variable, just as the threshold can be.

vehicle type identification with neural network

I was given a project on vehicle type identification with a neural network, and that is how I came to know the awesomeness of neural networks.
I am a beginner with this field, but I have sufficient materials to learn it. I just want to know some good places to start for this project specifically, as my biggest problem is that I don't have very much time. I would really appreciate any help. Most importantly, I want to learn how to match patterns with images (in my case, vehicles).
I'd also like to know if Python is a good language to start this in, as I'm most comfortable with it.
I have some images of cars as input, and I need to classify those cars by their model number.
E.g.: Audi A4, Audi A6, Audi A8, etc.
You didn't say whether you can use an existing framework or need to implement the solution from scratch, but either way Python is an excellent language for coding neural networks.
If you can use a framework, check out Theano, which is written in Python and is the most complete neural network framework available in any language:
http://www.deeplearning.net/software/theano/
If you need to write your implementation from scratch, look at the book 'Machine Learning, An Algorithmic Perspective' by Stephen Marsland. It contains example Python code for implementing a basic multilayered neural network.
As for how to proceed, you'll want to convert your images into 1-D input vectors. Don't worry about losing the 2-D information; the network will learn 'receptive fields' on its own that extract 2-D features. Normalize the pixel intensities to a -1 to 1 range (or better yet, 0 mean with a standard deviation of 1). If the images are already centered and normalized to roughly the same size, then a simple feed-forward network should be sufficient. If the cars vary wildly in angle or distance from the camera, you may need to use a convolutional neural network, but that's much more complex to implement (there are examples in the Theano documentation). For a basic feed-forward network, try using two hidden layers and anywhere from 0.5 to 1.5 x the number of pixels in each layer.
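If you end up doing this pre-processing part with OpenCV in C++ rather than Python, it might look roughly like the sketch below. The 64x64 target size and the grayscale conversion are assumptions, not requirements of the approach:

#include <opencv2/opencv.hpp>
#include <string>

cv::Mat imageToInputVector(const std::string& path, cv::Size size = cv::Size(64, 64))
{
    cv::Mat img = cv::imread(path, cv::IMREAD_GRAYSCALE);
    cv::resize(img, img, size);
    img.convertTo(img, CV_32F);

    cv::Mat vec = img.reshape(1, 1);            // 1 x (width*height) row vector
    cv::Scalar mean, stddev;
    cv::meanStdDev(vec, mean, stddev);
    vec = (vec - mean[0]) / (stddev[0] + 1e-8); // zero mean, unit standard deviation
    return vec;
}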
Break your dataset into separate training, validation, and testing sets (perhaps with a 0.6, 0.2, 0.2 ratio respectively) and make sure each image only appears in one set. Train ONLY on the training set, and don't use any regularization until you're getting close to 100% of the training instances correct. You can use the validation set to monitor progress on instances that you're not training on. Performance should be worse on the validation set than the training set. Stop training when the performance on the validation set stops improving. Once you've accomplished this you can try different regularization constants and choose the one that results in the best validation set performance. The test set will tell you how well your final result is performing (but don't change anything based on test set results, or you risk overfitting to that too!).
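A small sketch of that 0.6/0.2/0.2 split (just shuffled index bookkeeping; the fixed seed is arbitrary), to make sure no image appears in more than one set:

#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

struct Split { std::vector<int> train, val, test; };

Split splitDataset(int numSamples, unsigned seed = 42)
{
    std::vector<int> idx(numSamples);
    std::iota(idx.begin(), idx.end(), 0);
    std::shuffle(idx.begin(), idx.end(), std::mt19937(seed));

    const int nTrain = static_cast<int>(0.6 * numSamples);
    const int nVal   = static_cast<int>(0.2 * numSamples);

    Split s;
    s.train.assign(idx.begin(), idx.begin() + nTrain);
    s.val.assign(idx.begin() + nTrain, idx.begin() + nTrain + nVal);
    s.test.assign(idx.begin() + nTrain + nVal, idx.end());
    return s;
}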
If your car images are very complex and varied and you cannot get a basic feed-forward net to perform well, you might consider using 'deep learning'. That is, add more layers and pre-train them using unsupervised training. There's a detailed tutorial on how to do this here (though all the code examples are in MatLab/Octave):
http://ufldl.stanford.edu/wiki/index.php/UFLDL_Tutorial
Again, that adds a lot of complexity. Try it with a basic feed-forward NN first.