I'm trying to use dlib's one-class SVM trainer (svm_one_class_trainer) on contours.
The data is an integer vector of the differences between consecutive points (each either 1, 0, or -1).
dlib::svm_one_class_trainer<dlib::radial_basis_kernel<dlib::matrix<double, features * 2, 1>>> trainer;
trainer.set_kernel(dlib::radial_basis_kernel<dlib::matrix<double, features * 2, 1>>(0.00001));
std::vector<dlib::matrix<double, features * 2, 1>> data;
for(auto &contour : contours){
int loops = std::ceil(contour.size() / features);
//Padding
for (long unsigned int i = 0; i < features - (contour.size() % features); i++) {
contour.push_back(feature{0, 0});
}
for (int i = 0; i < loops; i++) {
dlib::matrix<double, features * 2, 1> datapoint;
for (int j = 0; j < features * 2; j += 2) {
datapoint(j) = contour[i * features + j].x;
datapoint(j + 1) = contour[i * features + j + 1].y;
}
data.push_back(datapoint);
}
}
//Train SVM
return(trainer.train(data));
However, the output is always 0. Not 0.0, or some almost-zero floating point approximation, which surprised me. Is there anything obviously wrong with my training code that would cause this?
I'm trying to write a convolution function, but I'm having trouble accessing the kernel data (cv::Mat).
I create the 3x3 kernel:
cv::Mat krn(3, 3, CV_32FC1);
krn.setTo(1);
krn = krn/9;
Then I try to loop over it. Below, image is the Mat to which I want to apply the convolution operator and output holds the result of the convolution:
for (int r = 0; r < image.rows - krn.rows; ++r) {
for (int c = 0; c < image.cols - krn.cols; ++c) {
int sum = 0;
for (int rs = 0; rs < krn.rows; ++rs) {
for (int cs = 0; cs < krn.cols; ++cs) {
sum += krn.data[rs * krn.cols + cs] * image.data[(r + rs) * image.cols + c + cs];
}
}
output.data[(r+1)*src.cols + c + 1]=sum; // assuming 3x3 kernel
}
}
However, the output is not as desired (only random black and white pixels).
However, if I change my code this way:
for (int r = 0; r < image.rows - krn.rows; ++r) {
for (int c = 0; c < image.cols - krn.cols; ++c) {
int sum = 0;
for (int rs = 0; rs < krn.rows; ++rs) {
for (int cs = 0; cs < krn.cols; ++cs) {
sum += 0.11 * image.data[(r + rs) * image.cols + c + cs]; // CHANGE HERE
}
}
output.data[(r+1)*src.cols + c + 1]=sum; // assuming 3x3 kernel
}
}
Using 0.11 instead of the kernel values seems to give the correct output.
For this reason I think I'm doing something wrong accessing the kernel's data.
P.S: I cannot use krn.at<float>(rs,cs).
Thanks!
Instead of needlessly using memcpy, you can just cast the pointer. I'll use a C-style cast because why not.
cv::Mat krn = 1 / (cv::Mat_<float>(3,3) <<
1, 2, 3,
4, 5, 6,
7, 8, 9);
for (int i = 0; i < krn.rows; i += 1)
{
for (int j = 0; j < krn.cols; j += 1)
{
// to see clearly what's happening: offset in bytes first, then cast
uint8_t *byteptr = krn.data + krn.step[0] * i + krn.step[1] * j;
float *floatptr = (float*) byteptr;
// or in one step (commented out so floatptr is not declared twice):
// float *floatptr = (float*) (krn.data + krn.step[0] * i + krn.step[1] * j);
cout << "krn.at<float>(" << i << "," << j << ") = " << (*floatptr) << endl;
}
}
krn.at<float>(0,0) = 1
krn.at<float>(0,1) = 0.5
krn.at<float>(0,2) = 0.333333
krn.at<float>(1,0) = 0.25
krn.at<float>(1,1) = 0.2
krn.at<float>(1,2) = 0.166667
krn.at<float>(2,0) = 0.142857
krn.at<float>(2,1) = 0.125
krn.at<float>(2,2) = 0.111111
Note that pointer arithmetic may not be obvious. If you have a uint8_t*, adding 1 moves it by one uint8_t (one byte), and if you have a float*, adding 1 moves it by one float, which is four bytes. The step[] array contains offsets expressed in bytes, so they must be added to the byte pointer before casting it to float*.
Consult the documentation for details, which include information on the step[] array that contains the strides/steps to calculate the offset given a tuple of indices into the matrix.
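The byte-offset arithmetic can also be tried without OpenCV. The following is a minimal sketch, assuming a plain buffer of floats with padded rows; readFloat and readFromPaddedExample are hypothetical helper names, not OpenCV functions, and rowStep and elemStep play the roles of step[0] and step[1].

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of OpenCV): returns element (i, j) of a float
// matrix given its raw byte pointer, the byte distance between rows (rowStep,
// the role of step[0]) and between neighbouring elements (elemStep, step[1]).
float readFloat(const std::uint8_t* data, std::size_t rowStep,
                std::size_t elemStep, int i, int j)
{
    assert(data != nullptr && i >= 0 && j >= 0);
    // Arithmetic on a uint8_t* advances in bytes, so the steps apply directly.
    const std::uint8_t* bytePtr = data
        + rowStep * static_cast<std::size_t>(i)
        + elemStep * static_cast<std::size_t>(j);
    // Only now reinterpret the address as a float.
    return *reinterpret_cast<const float*>(bytePtr);
}

// A 2x3 matrix stored with one unused float of padding at the end of each
// row, so rowStep is 16 bytes rather than 3 * sizeof(float) = 12.
float readFromPaddedExample(int i, int j)
{
    static const std::vector<float> storage = {1.f, 2.f, 3.f, 0.f,
                                               4.f, 5.f, 6.f, 0.f};
    const auto* bytes = reinterpret_cast<const std::uint8_t*>(storage.data());
    return readFloat(bytes, 4 * sizeof(float), sizeof(float), i, j);
}
```

Indexing with i * cols + j on the float pointer would give the wrong element for row 1 here, because the padding makes each row longer than cols floats.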
cv::Mat::data is a pointer of type uchar*.
With data[y * cols + x] you access a single byte of the float values stored in krn, not a whole float. To get full float values, use the at method template:
krn.at<float>(rs,cs)
Consider changing the type of the sum variable to a floating-point type. Otherwise the fractional part of every partial result is truncated while the convolution is accumulated.
So, if you cannot use at, just read the 4 bytes of one float from the data pointer:
float v = 0.0;
memcpy(&v, krn.data + (rs * krn.step + cs * sizeof(float)), sizeof(float));
Here step is the total number of bytes occupied by one row of the Mat.
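To see why the accumulator type matters, here is a small sketch without OpenCV. It assumes the kernel is available only as raw bytes (as with cv::Mat::data); the function names are made up for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Reads the k-th float weight out of a raw byte buffer with memcpy.
static float weightAt(const std::uint8_t* kernelBytes, int k)
{
    float w = 0.0f;
    std::memcpy(&w, kernelBytes + static_cast<std::size_t>(k) * sizeof(float),
                sizeof w);
    return w;
}

// Accumulates in an int, like the question: every addition converts the
// running total back to int and silently drops the fractional part.
int weightedSumTruncating(const std::uint8_t* kernelBytes,
                          const std::uint8_t* pixels, int count)
{
    assert(count >= 0);
    int sum = 0;
    for (int k = 0; k < count; ++k)
        sum = static_cast<int>(sum + weightAt(kernelBytes, k) * pixels[k]);
    return sum;
}

// Accumulates in a float and rounds once, at the very end.
int weightedSumFloat(const std::uint8_t* kernelBytes,
                     const std::uint8_t* pixels, int count)
{
    assert(count >= 0);
    float sum = 0.0f;
    for (int k = 0; k < count; ++k)
        sum += weightAt(kernelBytes, k) * pixels[k];
    return static_cast<int>(std::lround(sum));
}
```

With nine weights of 1/9 and nine pixels of value 100, the truncating version loses about 0.11 at every step, while the float version recovers the expected mean of 100.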
I have implemented a polynomial curve fitting method in C++ with OpenCV, based on the fact that a smooth function can be approximated by a polynomial (a sum of powers of x). The equation is then written in matrix form and solved by least squares. Basically, the code is this:
cv::Mat PolynomialFit(std::vector<cv::Point>& points, int order) {
// U holds the powers of x (one row per point), Y holds the y values
cv::Mat U(points.size(), (order + 1), CV_64F);
cv::Mat Y(points.size(), 1, CV_64F);
for (int i = 0; i < U.rows; i++) {
for (int j = 0; j < U.cols; j++) {
U.at<double>(i, j) = pow(points[i].x, j);
}
}
for (int i = 0; i < Y.rows; i++) {
Y.at<double>(i, 0) = points[i].y;
}
cv::Mat K((order + 1), 1, CV_64F);
if(U.data != NULL) {
// least-squares solution of U * K = Y via the normal equations
K = (U.t() * U).inv() * U.t() * Y;
}
return K;
}
and in main this is how I call it:
int order = 2;
cv::Mat K = PolynomialFit(_points, order);
if(_points.size() > 0) {
for (int j = _points.at(0).x; j < _points.at(_points.size() - 1).x; j++) {
cv::Point2d point(j, 0);
for (int k = 0; k < order + 1; k++) {
point.y += K.at<double>(k, 0) * std::pow(j, k);
}
cv::circle(image, point, 1, cv::Scalar(0, 255, 0), CV_FILLED, CV_AA);
}
}
The problem is, it only works for a certain type of points. For example, in the image below, it only works for the points that are in the left curve. How could I change this behaviour? I already tried changing the order parameter, but the right curve won't fit as it should be.
To fit a curve like the right one, you have to swap the axes.
You can compute two fitted curves, one with x as the independent variable (horizontal axis) and one with y as the independent variable (vertical axis), then compute the sum of squared errors for each and select the curve with the minimum sum.
To get the second fit, simply exchange x and y in the code of your PolynomialFit function.
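A minimal sketch of that idea, written without OpenCV so it stands alone. Pt, fitPoly, sumSquaredError, AxisFit and fitBestAxis are hypothetical names; the normal equations are solved with a small Gauss-Jordan elimination instead of cv::Mat::inv(). Note that the two errors are measured along different axes, so comparing them is a heuristic rather than a rigorous criterion.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

struct Pt { double x, y; };

// Least-squares fit v = k0 + k1*u + ... + kn*u^n via the normal equations
// (U^T U) k = U^T v, solved by Gauss-Jordan elimination with pivoting.
std::vector<double> fitPoly(const std::vector<double>& u,
                            const std::vector<double>& v, int order)
{
    assert(u.size() == v.size() && order >= 0);
    const int n = order + 1;
    std::vector<std::vector<double>> a(n, std::vector<double>(n + 1, 0.0));
    for (std::size_t p = 0; p < u.size(); ++p)
        for (int r = 0; r < n; ++r) {
            for (int c = 0; c < n; ++c) a[r][c] += std::pow(u[p], r + c);
            a[r][n] += std::pow(u[p], r) * v[p];
        }
    for (int col = 0; col < n; ++col) {
        int piv = col;
        for (int r = col + 1; r < n; ++r)
            if (std::fabs(a[r][col]) > std::fabs(a[piv][col])) piv = r;
        std::swap(a[col], a[piv]);
        if (a[col][col] == 0.0) continue;  // singular column: leave as is
        for (int r = 0; r < n; ++r) {
            if (r == col) continue;
            const double f = a[r][col] / a[col][col];
            for (int c = col; c <= n; ++c) a[r][c] -= f * a[col][c];
        }
    }
    std::vector<double> k(n, 0.0);
    for (int r = 0; r < n; ++r)
        if (a[r][r] != 0.0) k[r] = a[r][n] / a[r][r];
    return k;
}

// Sum of squared residuals of v against the polynomial k evaluated at u.
double sumSquaredError(const std::vector<double>& u,
                       const std::vector<double>& v,
                       const std::vector<double>& k)
{
    double sse = 0.0;
    for (std::size_t p = 0; p < u.size(); ++p) {
        double est = 0.0;
        for (std::size_t c = k.size(); c-- > 0;) est = est * u[p] + k[c];
        sse += (v[p] - est) * (v[p] - est);
    }
    return sse;
}

struct AxisFit { bool swapped; std::vector<double> coeffs; double sse; };

// Fits y = f(x) and x = g(y) and keeps whichever has the smaller error sum.
AxisFit fitBestAxis(const std::vector<Pt>& pts, int order)
{
    std::vector<double> xs, ys;
    for (const Pt& p : pts) { xs.push_back(p.x); ys.push_back(p.y); }
    AxisFit direct{false, fitPoly(xs, ys, order), 0.0};
    direct.sse = sumSquaredError(xs, ys, direct.coeffs);
    AxisFit swapped{true, fitPoly(ys, xs, order), 0.0};
    swapped.sse = sumSquaredError(ys, xs, swapped.coeffs);
    return swapped.sse < direct.sse ? swapped : direct;
}
```

When swapped is true, the coefficients describe x as a polynomial in y, so the drawing loop has to iterate over y and compute x.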
I want to multiply one image by its transpose. My image size is n x m.
I do it as follows:
for (int k = 0; k < total_images; k++)
{
Mat img_tp1 = cv::Mat(imgRows, imgCols, CV_32FC1);
Mat img_tp2 = cv::Mat(imgRows, imgRows, CV_32FC1);
subtract(img[k], MeanMat, img_tp1);
img_tp2 = img_tp1 * img_tp2.t();
std::ostringstream name;
name << "sub" << k << ".jpg";
cv::imwrite(name.str(), img_tp2);
}
and I get this error:
Unhandled exception at 0x000007FEFDB79E5D in Tracking.exe: Microsoft C++ exception: cv::Exception at memory location 0x00000000001E5EE0.
How can I do this multiplication? In fact, I want to compute the covariance matrix of the sequence of images, so I need this multiplication.
Thanks.
Then I decided to implement the multiplication for my RGB image myself, using this code:
for (int i = 0; i < imgRows; i++)
{
for (int j = 0; j < imgRows; j++)
{
uchar pix1[3];
uchar pix2[3];
uchar pix[3] = { 0, 0, 0 };
for (int k = 0; k < imgCols; k++)
{
img_tp1.at<Vec3b>(i, k) = { pix1[0], pix1[1], pix1[2] };
img_tp1.at<Vec3b>(j, k) = { pix2[0], pix2[1], pix2[2] };
CovMat0.at<Vec3b>(i, j) = { pix[0], pix[1], pix[2] };
pix[0] = (pix1[0] * pix2[0]) + pix[0];
pix[1] = (pix1[1] * pix2[1]) + pix[1];
pix[2] = (pix1[2] * pix2[2]) + pix[2];
CovMat0.at<Vec3b>(i, j) = { pix[0], pix[1], pix[2] };
}
}
}
but it takes a lot of time to process. Is there a better way to do this?
(I want to multiply one image by its transpose)
Hi everyone, I am trying to implement pattern matching with FFT, but I am not sure what the result should be. (I think I am missing something, even though I have read a lot about the problem and tried many different implementations; this one is the best so far.) Here is my FFT correlation function.
void fft2d(fftw_complex**& a, int rows, int cols, bool forward = true)
{
fftw_plan p;
for (int i = 0; i < rows; ++i)
{
p = fftw_plan_dft_1d(cols, a[i], a[i], forward ? FFTW_FORWARD : FFTW_BACKWARD, FFTW_ESTIMATE);
fftw_execute(p);
}
fftw_complex* t = (fftw_complex*)fftw_malloc(rows * sizeof(fftw_complex));
for (int j = 0; j < cols; ++j)
{
for (int i = 0; i < rows; ++i)
{
t[i][0] = a[i][j][0];
t[i][1] = a[i][j][1];
}
p = fftw_plan_dft_1d(rows, t, t, forward ? FFTW_FORWARD : FFTW_BACKWARD, FFTW_ESTIMATE);
fftw_execute(p);
for (int i = 0; i < rows; ++i)
{
a[i][j][0] = t[i][0];
a[i][j][1] = t[i][1];
}
}
fftw_free(t);
}
int findCorrelation(int argc, char* argv[])
{
BMP bigImage;
BMP keyImage;
BMP result;
RGBApixel blackPixel = { 0, 0, 0, 1 };
const bool swapQuadrants = (argc == 4);
if (argc < 3 || argc > 4) {
cout << "correlation img1.bmp img2.bmp" << endl;
return 1;
}
if (!keyImage.ReadFromFile(argv[1])) {
return 1;
}
if (!bigImage.ReadFromFile(argv[2])) {
return 1;
}
//Preparations
const int maxWidth = std::max(bigImage.TellWidth(), keyImage.TellWidth());
const int maxHeight = std::max(bigImage.TellHeight(), keyImage.TellHeight());
const int rowsCount = maxHeight;
const int colsCount = maxWidth;
BMP bigTemp = bigImage;
BMP keyTemp = keyImage;
keyImage.SetSize(maxWidth, maxHeight);
bigImage.SetSize(maxWidth, maxHeight);
for (int i = 0; i < rowsCount; ++i)
for (int j = 0; j < colsCount; ++j) {
RGBApixel p1;
if (i < bigTemp.TellHeight() && j < bigTemp.TellWidth()) {
p1 = bigTemp.GetPixel(j, i);
} else {
p1 = blackPixel;
}
bigImage.SetPixel(j, i, p1);
RGBApixel p2;
if (i < keyTemp.TellHeight() && j < keyTemp.TellWidth()) {
p2 = keyTemp.GetPixel(j, i);
} else {
p2 = blackPixel;
}
keyImage.SetPixel(j, i, p2);
}
//Here is where the transforms begin
fftw_complex **a = (fftw_complex**)fftw_malloc(rowsCount * sizeof(fftw_complex*));
fftw_complex **b = (fftw_complex**)fftw_malloc(rowsCount * sizeof(fftw_complex*));
fftw_complex **c = (fftw_complex**)fftw_malloc(rowsCount * sizeof(fftw_complex*));
for (int i = 0; i < rowsCount; ++i) {
a[i] = (fftw_complex*)fftw_malloc(colsCount * sizeof(fftw_complex));
b[i] = (fftw_complex*)fftw_malloc(colsCount * sizeof(fftw_complex));
c[i] = (fftw_complex*)fftw_malloc(colsCount * sizeof(fftw_complex));
for (int j = 0; j < colsCount; ++j) {
RGBApixel p1;
p1 = bigImage.GetPixel(j, i);
a[i][j][0] = (0.299*p1.Red + 0.587*p1.Green + 0.114*p1.Blue);
a[i][j][1] = 0.0;
RGBApixel p2;
p2 = keyImage.GetPixel(j, i);
b[i][j][0] = (0.299*p2.Red + 0.587*p2.Green + 0.114*p2.Blue);
b[i][j][1] = 0.0;
}
}
fft2d(a, rowsCount, colsCount);
fft2d(b, rowsCount, colsCount);
result.SetSize(maxWidth, maxHeight);
for (int i = 0; i < rowsCount; ++i)
for (int j = 0; j < colsCount; ++j) {
fftw_complex& y = a[i][j];
fftw_complex& x = b[i][j];
double u = x[0], v = x[1];
double m = y[0], n = y[1];
c[i][j][0] = u*m + n*v;
c[i][j][1] = v*m - u*n;
int fx = j;
if (fx>(colsCount / 2)) fx -= colsCount;
int fy = i;
if (fy>(rowsCount / 2)) fy -= rowsCount;
float r2 = (fx*fx + fy*fy);
const double cuttoffCoef = (maxWidth * maxHeight) / 37992.;
if (r2<128 * 128 * cuttoffCoef)
c[i][j][0] = c[i][j][1] = 0;
}
fft2d(c, rowsCount, colsCount, false);
const int halfCols = colsCount / 2;
const int halfRows = rowsCount / 2;
if (swapQuadrants) {
for (int i = 0; i < halfRows; ++i)
for (int j = 0; j < halfCols; ++j) {
std::swap(c[i][j][0], c[i + halfRows][j + halfCols][0]);
std::swap(c[i][j][1], c[i + halfRows][j + halfCols][1]);
}
for (int i = halfRows; i < rowsCount; ++i)
for (int j = 0; j < halfCols; ++j) {
std::swap(c[i][j][0], c[i - halfRows][j + halfCols][0]);
std::swap(c[i][j][1], c[i - halfRows][j + halfCols][1]);
}
}
for (int i = 0; i < rowsCount; ++i)
for (int j = 0; j < colsCount; ++j) {
const double& g = c[i][j][0];
RGBApixel pixel;
pixel.Alpha = 0;
int gInt = 255 - static_cast<int>(std::floor(g + 0.5));
pixel.Red = gInt;
pixel.Green = gInt;
pixel.Blue = gInt;
result.SetPixel(j, i, pixel);
}
BMP res;
res.SetSize(maxWidth, maxHeight);
result.WriteToFile("result.bmp");
return 0;
}
Sample output
This question would probably be more appropriately posted on another site like Cross Validated (metaoptimize.com also used to be a good one, but it appears to be gone).
That said:
There are two similar operations you can perform with the FFT: convolution and correlation. Convolution is used for determining how two signals interact with each other, whereas correlation can be used to express how similar two signals are to each other. Make sure you're doing the right operation, as both are commonly implemented through a DFT.
For this type of application of DFTs you usually wouldn't extract any useful information from the Fourier spectrum itself unless you were looking for frequencies common to both data sources (e.g., if you were comparing two bridges to see if their supports are spaced similarly).
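The difference between the two operations is easiest to see in a direct (non-FFT) 1D form. The sketch below uses hypothetical function names and computes only the "valid" part, where the pattern fits entirely inside the signal. Convolution is the same sum with the pattern reversed; in the frequency domain this corresponds to multiplying by the spectrum directly (convolution) versus by its complex conjugate (correlation).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Cross-correlation: out[s] = sum over k of signal[s + k] * pattern[k].
// A large out[s] means the signal resembles the pattern at offset s.
std::vector<double> correlate(const std::vector<double>& signal,
                              const std::vector<double>& pattern)
{
    assert(!pattern.empty() && pattern.size() <= signal.size());
    std::vector<double> out(signal.size() - pattern.size() + 1, 0.0);
    for (std::size_t s = 0; s < out.size(); ++s)
        for (std::size_t k = 0; k < pattern.size(); ++k)
            out[s] += signal[s + k] * pattern[k];
    return out;
}

// Convolution: identical, except that the pattern is reversed first.
std::vector<double> convolve(const std::vector<double>& signal,
                             const std::vector<double>& pattern)
{
    const std::vector<double> flipped(pattern.rbegin(), pattern.rend());
    return correlate(signal, flipped);
}
```

For the signal {0, 1, 2, 3, 0} and the pattern {1, 2, 3}, correlation peaks at offset 1, exactly where the pattern occurs, whereas convolution does not.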
Your 3rd image looks a lot like the power domain; normally I see the correlation output entirely grey except where overlap occurred. Your code definitely appears to be computing the inverse DFT, so unless I'm missing something the only other explanation I've come up with for the fuzzy look could be some of the "fudge factor" code in there like:
if (r2<128 * 128 * cuttoffCoef)
c[i][j][0] = c[i][j][1] = 0;
As for what you should expect: wherever there are common elements between the two images you'll see a peak. The larger the peak, the more similar the two images are near that region.
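As a small illustration of what "a peak" means, the following sketch computes the correlation directly in the spatial domain for every offset where the pattern fits, and reports the best one. Image, Peak and findCorrelationPeak are hypothetical names; an FFT-based implementation should yield the same surface up to scaling and wrap-around.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Image = std::vector<std::vector<double>>;

struct Peak { std::size_t row, col; double value; };

// Slides the pattern over the image and returns the offset with the largest
// (unnormalized) correlation score.
Peak findCorrelationPeak(const Image& image, const Image& pattern)
{
    assert(!pattern.empty() && !pattern[0].empty());
    assert(image.size() >= pattern.size() &&
           image[0].size() >= pattern[0].size());
    Peak best{0, 0, 0.0};
    bool first = true;
    for (std::size_t r = 0; r + pattern.size() <= image.size(); ++r)
        for (std::size_t c = 0; c + pattern[0].size() <= image[0].size(); ++c) {
            double score = 0.0;
            for (std::size_t pr = 0; pr < pattern.size(); ++pr)
                for (std::size_t pc = 0; pc < pattern[0].size(); ++pc)
                    score += image[r + pr][c + pc] * pattern[pr][pc];
            if (first || score > best.value) {
                best = Peak{r, c, score};
                first = false;
            }
        }
    return best;
}
```

Because the score is unnormalized, a uniformly bright region can outscore a true match, which is one reason to normalize the images first.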
Some comments and/or recommended changes:
1) Convolution & correlation are not scale invariant operations. In other words, the size of your pattern image can make a significant difference in your output.
2) Normalize your images before correlation.
When you get the image data ready for the forward DFT pass:
a[i][j][0] = (0.299*p1.Red + 0.587*p1.Green + 0.114*p1.Blue);
a[i][j][1] = 0.0;
/* ... */
How you grayscale the image is your business (though I would've picked something like sqrt( r*r + b*b + g*g )). However, I don't see you doing anything to normalize the image.
The word "normalize" can take on a few different meanings in this context. Two common types:
normalize the range of values between 0.0 and 1.0
normalize the "whiteness" of the images
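Both kinds of normalization can be sketched on a flat array of grayscale values. The second function is one possible reading of "whiteness" (removing the overall brightness and contrast by shifting to zero mean and scaling to unit standard deviation); that interpretation, like the function names, is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Maps the values linearly so the smallest becomes 0.0 and the largest 1.0.
// A constant image maps to all zeros.
std::vector<double> normalizeRange(const std::vector<double>& px)
{
    assert(!px.empty());
    const auto mm = std::minmax_element(px.begin(), px.end());
    const double lo = *mm.first;
    const double span = *mm.second - lo;
    std::vector<double> out(px.size(), 0.0);
    if (span > 0.0)
        for (std::size_t i = 0; i < px.size(); ++i)
            out[i] = (px[i] - lo) / span;
    return out;
}

// Shifts the values to zero mean and scales them to unit standard deviation,
// so that overall brightness and contrast do not dominate the correlation.
std::vector<double> normalizeBrightness(const std::vector<double>& px)
{
    assert(!px.empty());
    const double n = static_cast<double>(px.size());
    const double mean = std::accumulate(px.begin(), px.end(), 0.0) / n;
    double var = 0.0;
    for (double p : px) var += (p - mean) * (p - mean);
    const double sd = std::sqrt(var / n);
    std::vector<double> out(px.size(), 0.0);
    if (sd > 0.0)
        for (std::size_t i = 0; i < px.size(); ++i)
            out[i] = (px[i] - mean) / sd;
    return out;
}
```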
3) Run your pattern image through an edge enhancement filter. I've personally made use of Canny, Sobel, and I think I messed with a few others. As I recall, Canny was "quick'n dirty" and Sobel was more expensive, but I got comparable results when it came time to do correlation. See chapter 24 of the "dsp guide" book that's freely available online. The whole book is worth your time, but if you're low on time then at a minimum chapter 24 will help a lot.
4) Re-scale the output image between [0, 255]; if you want to implement thresholds, do it after this step because the thresholding step is lossy.
My memory on this one is hazy, but as I recall (edited for clarity):
You can scale the final image pixels (before rescaling) between [-1.0, 1.0] by dividing off the largest power spectrum value from the entire power spectrum
The largest power spectrum value is, conveniently enough, the center-most value in the power spectrum (corresponding to the lowest frequency)
If you divide it off the power spectrum, you'll end up doing twice the work; since FFTs are linear, you can delay the division until after the inverse DFT pass to when you're re-scaling the pixels between [0..255].
If after rescaling most of your values end up so black you can't see them, you can use a solution to the ODE y' = y(1 - y) (one example is the sigmoid f(x) = 1 / (1 + exp(-c*x) ), for some scaling factor c that gives better gradations). This has more to do with improving your ability to interpret the results visually than anything you might use to programmatically find peaks.
Edit: I said [0, 255] above. I suggest you rescale to [128, 255] or some other lower bound that is gray rather than black.
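A possible sketch of the sigmoid mapping combined with a gray lower bound. It assumes the correlation value v has already been scaled to roughly [-1.0, 1.0]; toGray and its default lower bound of 128 are illustrative choices.

```cpp
#include <cassert>
#include <cmath>

// Logistic sigmoid; c controls how steep the transition around x = 0 is.
double sigmoid(double x, double c)
{
    return 1.0 / (1.0 + std::exp(-c * x));
}

// Maps a correlation value to a gray level in [lowerBound, 255].
int toGray(double v, double c, int lowerBound = 128)
{
    assert(lowerBound >= 0 && lowerBound <= 255);
    const double g = lowerBound + (255 - lowerBound) * sigmoid(v, c);
    return static_cast<int>(std::lround(g));
}
```

A larger c stretches small differences near zero into more visible gradations, at the cost of saturating strong peaks sooner.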
I am new to OpenCV in C++. I am getting an error with convolution code (found on the internet) that is meant to be equivalent to conv2 in MATLAB. The problem is that all the pixel values become 255. The filter I am using in the code has the same size as the image. Can anybody please help me correct the problem? My OpenCV C++ code is given below:
#include<opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include<stdio.h>
#include<iostream>
#include<math.h>
#include<cv.hpp>
using namespace cv;
using namespace std;
Mat gd,img,bimgFiltered,gimgFiltered,rimgFiltered,fin_img;
Mat b,g,r,cr,cb,cg,B,G,R;
Mat b_logplane, b_plane,b_logfiltered,b_log,g_logplane,g_plane,g_logfiltered;
Mat g_log,r_logplane,r_plane,r_logfiltered,r_log;
Mat kernel, dest;
int m,n,m1,m2,n1,n2;
int c = 120;
double mysum = 0.0, mysum1 = 0.0, k = 0;
int cent=0,radius=0;
enum ConvolutionType {
/* Return the full convolution, including border */
CONVOLUTION_FULL,
/* Return only the part that corresponds to the original image */
CONVOLUTION_SAME,
/* Return only the submatrix containing elements that were not influenced
by the border
*/
CONVOLUTION_VALID
};
void conv2(const Mat &img, const Mat& kernel, ConvolutionType type,Mat& dest)
{
Mat source = img;
if(CONVOLUTION_FULL == type)
{
source = Mat();
const int additionalRows = kernel.rows - 1, additionalCols = kernel.cols - 1;
copyMakeBorder(img, source, (additionalRows + 1) / 2, additionalRows / 2,
(additionalCols + 1) / 2, additionalCols / 2, BORDER_CONSTANT, Scalar(0));
}
flip(kernel, kernel, -1);
Point anchor(kernel.cols - kernel.cols / 2 - 1, kernel.rows - kernel.rows / 2 - 1);
int borderMode = BORDER_CONSTANT;
filter2D(source, dest, img.depth(), kernel, anchor, 0, borderMode);
if(CONVOLUTION_VALID == type)
{
dest = dest.colRange((kernel.cols - 1) / 2, dest.cols - kernel.cols /
2).rowRange((kernel.rows - 1) / 2, dest.rows - kernel.rows / 2);
}
}
int main()
{
img = imread("milla.bmp", CV_LOAD_IMAGE_COLOR);
b.create(img.size(),img.type());
g.create(img.size(),img.type());
r.create(img.size(),img.type());
cr.create(img.size(),img.type());
cg.create(img.size(),img.type());
cb.create(img.size(),img.type());
Mat planes[3];
split(img,planes);
bimgFiltered.create(img.size(),img.type());
gimgFiltered.create(img.size(),img.type());
rimgFiltered.create(img.size(),img.type());
dest.create(img.size(), img.type());
gd.create(img.size(), img.type());
for(int j = 0; j < img.rows; j++)
{
for(int i = 0; i < img.cols; i++)
{
radius = ((cent - i)^2 + (cent - j)^2);
gd.at<float>(j, i) = exp((-(radius) / c^2));
mysum = mysum + gd.at<float>(j, i);
}
mysum1 = mysum1 + mysum;
}
k=1/mysum1;
cout<<endl<<k<<"\n"<<endl;
for(int j = 0; j < img.rows; j++)
{
for(int i = 0; i < img.cols; i++)
{
gd.at<float>(j, i) = k * gd.at<float>(j, i);
}
}
planes[0].convertTo(planes[0],CV_32F,1.0/255.0);
planes[1].convertTo(planes[1],CV_32F,1.0/255.0);
planes[2].convertTo(planes[2],CV_32F,1.0/255.0);
conv2(planes[0],gd,CONVOLUTION_SAME,bimgFiltered);
conv2(planes[1],gd,CONVOLUTION_SAME,gimgFiltered);
conv2(planes[2],gd,CONVOLUTION_SAME,rimgFiltered);
imshow("img",gimgFiltered );
waitKey(0);
return 0;
}
There are a few problems with the code:
Issue 1:
In the following two lines:
radius = ((cent - i)^2 + (cent - j)^2);
gd.at<float>(j, i) = exp((-(radius) / c^2));
You are using ^ operator which is the bitwise XOR operator in C/C++. I think you are mistaking it for power operator. To take the power of a number you have to use the pow function as follows:
radius = powf((cent - i),2) + powf((cent - j),2);
// radius and c are both int, so cast to avoid integer division
gd.at<float>(j, i) = expf(-radius / (float)(c * c));
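The effect of ^ can be checked in isolation. Because + binds more tightly than ^ in C++, the question's expression is not even "XOR each term, then add". The sketch below spells out the implied parentheses; the function names are made up for illustration.

```cpp
#include <cassert>
#include <cmath>

// ^ is bitwise XOR: 3 is 0b11, 2 is 0b10, so the result is 0b01.
static_assert((3 ^ 2) == 1, "^ is XOR, not a power operator");

// How `(cent - i)^2 + (cent - j)^2` is parsed: the addition happens first,
// so the parentheses below are the ones the compiler implies.
int radiusAsParsed(int cent, int i, int j)
{
    return (cent - i) ^ (2 + (cent - j)) ^ 2;
}

// The intended squared distance from (cent, cent).
int radiusSquared(int cent, int i, int j)
{
    const int dx = cent - i, dy = cent - j;
    return dx * dx + dy * dy;
}

// Gaussian weight with a floating-point division; dividing two ints here
// would truncate the exponent towards zero.
double gaussianWeight(int radius2, int c)
{
    assert(c != 0);
    return std::exp(-radius2 / static_cast<double>(c * c));
}
```

For cent = 0, i = 3, j = 4 the parsed expression evaluates to 1, while the squared distance is 25.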
Issue 2:
The gd matrix is assumed to have floating point values as it is accessed like gd.at<float>(j, i), but it is declared with the same type as that of the image, i.e. CV_8UC3. So gd should be created as follows:
gd.create(img.size(), CV_32FC1);
Issue 3:
Another possible logical error may be present in the first nested loop. You may have to set mysum = 0; before starting the inner loop like this:
for(int j = 0; j < img.rows; j++)
{
mysum = 0;
for(int i = 0; i < img.cols; i++)
{
radius = powf((cent - i),2) + powf((cent - j),2);
// radius and c are both int, so cast to avoid integer division
gd.at<float>(j, i) = expf(-radius / (float)(c * c));
mysum = mysum + gd.at<float>(j, i);
}
mysum1 = mysum1 + mysum;
}
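For comparison, here is a sketch of the whole kernel construction without OpenCV. It keeps a single running total instead of the mysum/mysum1 pair, which sidesteps the reset problem altogether; makeNormalizedGaussian is a hypothetical name and the kernel is stored row-major in a flat vector.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Builds a rows x cols kernel with weights exp(-r^2 / c^2), where r is the
// distance from (cent, cent), and scales it so that all weights sum to 1.
std::vector<double> makeNormalizedGaussian(int rows, int cols, int cent,
                                           double c)
{
    assert(rows > 0 && cols > 0 && c != 0.0);
    std::vector<double> gd(static_cast<std::size_t>(rows) *
                           static_cast<std::size_t>(cols));
    double total = 0.0;  // one accumulator for the whole kernel
    for (int j = 0; j < rows; ++j)
        for (int i = 0; i < cols; ++i) {
            const double dx = cent - i, dy = cent - j;
            const double w = std::exp(-(dx * dx + dy * dy) / (c * c));
            gd[static_cast<std::size_t>(j) * cols + i] = w;
            total += w;
        }
    for (double& w : gd) w /= total;  // normalize
    return gd;
}
```

After normalization the weights add up to 1, so filtering with the kernel preserves the overall brightness of the image.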
Issue 4:
Output filtered images should be created single channel instead of 3 channels:
bimgFiltered.create(img.size(),CV_8UC1);
gimgFiltered.create(img.size(),CV_8UC1);
rimgFiltered.create(img.size(),CV_8UC1);