Sparse coding and dictionary learning using opencv and c++ - c++

I am trying to perform text image restoration and I can find no proper documentation on how to perform OMP or K-SVD in C++ using opencv.
I have over 1000 training images of different sizes so do I divide images into equal sized patches or resize all images? How do I construct the signal matrix X?
What other pre-processing steps are required for sparse coding? How to actually perform K-SVD on color images?
What data type is available in OpenCV for an image dictionary and how do I initialize the Dictionary D?
I have these very basic questions and have tried to use various libraries but they don't make the working very clear.

I found this code useful. This is the only implementation in opencv I have come across so far. I guess it uses a single image for dictionary learning whereas I have to use at least 1000 images. But it certainly provides a good guideline.

Related

classification with SVM using vocab build from Bag of Word

My intention is to build a classifier that correctly classify the image ROI with the template that I have manually extracted
Here is what I have done.
My first step is to understand what should be done to achieve the above
I have realized I would need to create the representation vectors(of the template) through research from the net. Hence I have used Bag of words to create the vocabulary
I have used and rewritten the Roy's project to opencv 3.1 and also used his food database. On seeing his database, I have realised that some of the image contain multiple class type. I try to clip the image so that each training image only contains one class of item but the image are now of different size
I have tried to run this code. The result is very disappointing. It always points to one class.
Question I have?
Is my step in processing the training image wrong? I read around and some posts suggest the image size must be constant or at least the aspect ratio. I am confused by this. Is there some tools available for resizing samples?
It does not matter what is the size of the sample images, since Roy's algorithm uses local descriptors extracte from nearby points of interest.
SVM is linear regression classifier and you need to train different SVM-s for each class. For each class it will say whether it's of that class or the rest. The so called one vs. rest.

FFT based image registration (optionally using OpenCV) in cpp?

I'm trying to align two images taken from a handheld camera.
At first, I was trying to use the OpenCV warpPerspective method based on SIFT/SURF feature points. The problem is the feature-extract & matching process may be extremely slow when the image quality is high (3000x4000). I tried to scale-down the image before find feature-points, the result is not as good as before.(The Mat generated from findHomography shouldn't be affected by scaling down the image, right?) And sometimes, due to lack of good feature point matches, the result is quite strange.
After searching on this topic, it seems that solving the problem in Fourier domain will speed up the registration process. And I've found this question which leads me to the code here.
The only problem is the code is written in python with numpy (not even using OpenCV), which makes it quite hard to re-written to C++ code using OpenCV (In OpenCV, I can only find dft and there's no fftshift nor fft stuff, I'm not quite familiar with NumPy, and I'm not brave enough to simply ignore the missing methods). So I'm wondering why there is not such a Fourier-domain image registration implementation using C++?
Can you guys give me some suggestion on how to implement one, or give me a link to the already implemented C++ version? Or help me to turn the python code into C++ code?
Big thanks!
I'm fairly certain that the FFT method can only recover a similarity transform, that is, only a (2d) rotation, translation and scale. Your results might not be that great using a handheld camera.
This is not quite a direct answer to your question, but, as a suggestion for a speed improvement, have you tried using a faster feature detector and descriptor? In OpenCV SIFT/SURF are some of the slowest methods they have for feature extraction/matching. You could try testing some of their other methods first, they all work quite well and are faster than SIFT/SURF. Especially if you use their FLANN-based matcher.
I've had to do this in the past with similar sized imagery, and using the binary descriptors OpenCV has increases the speed significantly.
If you need only shift you can use OpenCV's phasecorrelate

Pattern Recognition in C++

I have a simple template grayscale image, with white background and black shape over it, and I have several similar test images, I want to compare these two images and see if template matches any of the test images. Can you please suggest a simple(easy to use) pattern recognition library for C++ which takes two images and compares them and shows the result?
Just do image1-image2 for all pixels. Then sum up all the differences. The lower the results, the closer the images.
If your pattern could be of several sizes, then you have to resize it and check it for each positions.
Implement a Neural Network on the image. Inputs should be the greyscales of your image. you should train your network to a train set, chose proper regularization parameters using a cross validation set, and finally test your network on a test set.
http://www.codeproject.com/Articles/13582/Back-propagation-Neural-Net
(I have done this myself to train a network to recognise hand written digits - it works very well.)
How simple the library you need is depends on the specific parameters of your problem. OpenCV is a great image processing library that should be able to do what you need it to. Here is a tutorial on template matching in OpenCV. It makes it very easy to switch between matching metrics and choose the best one for your problem.

Object recognition using LDA and ORB with different sized training image.

I'm trying to build a lightweight object recognition system using ORB for feature extraction and LDA for classification. But I'm running into an issue do to the varying size of extracted features.
These are my steps:
Extract keypoints using ORB.
Extract trainable features in the image by grouping the keypoints.
(example of whats being extracted: http://imgur.com/gaQWk)
Train the recognizer with the extracted features. (This is where problems arise)
Classify objects in an image from the wild.
If I attempt to create a generalized matrix using cv::gemm, I get an exception due to the varying sizes. My first thought was to just to normalize all the images by resizing them, but this causes a lot of accuracy issues when objects have similar small features.
Is there any solution to this? Is LDA an appropriate method for this? I know it's commonly used with facial recognition algorithms such as fisherfaces.
LDA requires fixed length features, as do most optimization and machine learning methods. You could resize the image patches to be a fixed size, but that is probably not going to be a good feature. Normally people use a scale invariant feature such as SIFT. You also might try a color histogram, or some variation of edge detection and spatial histogram binning such as a GIST vector.
It's hard to say if LDA is an appropriate method for this without knowing what you hope to accomplish. You might also look into using SVM, some form of boosting, or just plain nearest neighbor with a large training set.

pixel conversion to raw bits (bit-stream)

I want to read the contents of every pixel in an image i have and convert it to a bit-stream (raw bits) or contain it in a 2-D array . Which would be the best place to start looking for such a conversion?
Specifics of the image : Standard test image called lena.bmp
size : 256 x 256
Bit depth of pixel : 8
Also I would like to know the importance of the number of bits per pixel with regards to this question since packing and unpacking will also be incorporated .
CImg is a nice simple, lightweight C++ library which can load and save a number of image formats (including BMP).
It's a single header file, so there's no need to compile or link the library. Just include the header, and you're good to go.
You should investigate OpenCV: a cross-platform computer vision library. It provides a C++ API as well as a C API, and it supports many image formats including bmp.
In the C++ interface, cv::Mat is the type that represents a 2D image. A simple application that loads and displays an image can be found here.
To learn how to access the matrix elements (pixels) you can check these threads:
OpenCV get pixel information from Mat image
Pixel access in OpenCV 2.2
Common Matrix Operations in OpenCV
OpenCV’s C++ interface offers a short introduction to cv::Mat. There has been many threads on Stackoverflow regarding OpenCV, there's a lot of valuable content around and you can benefit a lot by using the search box.
This page has a collection of books/tutorials/install guides focused on OpenCV, but this the newest official tutorial.