I am trying to compare images using a method similar to Features2D + Homography to find a known object, but replacing findHomography() with a self-written findAffine() function. I use Ceres Solver to obtain the optimal affine matrix while accounting for outliers.
double translation[] = {0, 0};
double angle = 0;
double scaleFactor = 1;

ceres::Problem problem;
for (size_t i = 0; i < points1.size(); ++i) {
    problem.AddResidualBlock(
        new ceres::AutoDiffCostFunction<AffineResidual, 1, 2, 1, 1>(
            new AffineResidual(Eigen::Vector2d(points1[i].x, points1[i].y),
                               Eigen::Vector2d(points2[i].x, points2[i].y))),
        new ceres::HuberLoss(1.0),
        translation,
        &angle,
        &scaleFactor);
}
ceres::Solver::Options options;
options.linear_solver_type = ceres::DENSE_QR;
options.minimizer_progress_to_stdout = true;
ceres::Solver::Summary summary;
Solve(options, &problem, &summary);
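For reference, the AffineResidual functor isn't shown above. Given the <AffineResidual, 1, 2, 1, 1> signature (one scalar residual; parameter blocks of sizes 2, 1 and 1), a minimal sketch of what it might look like follows; this is an assumption for illustration, not the actual code:

struct AffineResidual {
    AffineResidual(const Eigen::Vector2d& p1, const Eigen::Vector2d& p2)
        : p1_(p1), p2_(p2) {}

    template <typename T>
    bool operator()(const T* const translation,
                    const T* const angle,
                    const T* const scale,
                    T* residual) const {
        using std::cos;
        using std::sin;
        // Rotate by angle, scale, then translate the point from image 1.
        const T c = cos(angle[0]);
        const T s = sin(angle[0]);
        const T x = scale[0] * (c * T(p1_.x()) - s * T(p1_.y())) + translation[0];
        const T y = scale[0] * (s * T(p1_.x()) + c * T(p1_.y())) + translation[1];
        // One scalar residual: squared distance to the matched point in image 2.
        const T dx = x - T(p2_.x());
        const T dy = y - T(p2_.y());
        residual[0] = dx * dx + dy * dy;
        return true;
    }

    Eigen::Vector2d p1_, p2_;
};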
Ceres Solver provides LossFunction:

Loss functions reduce the influence of residual blocks with high residuals, usually the ones corresponding to outliers.

Of course, I could transform the keypoint coordinates from the first image with the obtained matrix, compare them with the second image, and compute the deviation myself. But Ceres Solver has already done this internally during the solve. How can I retrieve it? I did not find anything about this in the documentation.
I had a similar problem. After looking into the Ceres library sources (particularly into the ResidualBlock::Evaluate() method) I concluded that there is no explicit "outlier" status for a residual block. The loss function merely affects the resulting cost value for a block, which is exactly what the documentation phrase you quoted describes: "Loss functions reduce the influence of residual blocks with high residuals". So the answer is that you cannot retrieve outliers from Ceres; there is no such feature.
A workaround might be to calculate the residuals for your data with the solved result and apply the loss function to them yourself. The comment from LossFunction::Evaluate() might help:
// For a residual vector with squared 2-norm 'sq_norm', this method
// is required to fill in the value and derivatives of the loss
// function (rho in this example):
//
// out[0] = rho(sq_norm),
// out[1] = rho'(sq_norm),
// out[2] = rho''(sq_norm),
//
// Here the convention is that the contribution of a term to the
// cost function is given by 1/2 rho(s), where
//
// s = ||residuals||^2.
//
// Calling the method with a negative value of 's' is an error and
// the implementations are not required to handle that case.
//
// Most sane choices of rho() satisfy:
//
// rho(0) = 0,
// rho'(0) = 1,
// rho'(s) < 1 in outlier region,
// rho''(s) < 0 in outlier region,
//
// so that they mimic the least squares cost for small residuals.
virtual void Evaluate(double sq_norm, double out[3]) const = 0;
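A sketch of that workaround, assuming the similarity-transform parameterization from the question (treating rho'(s) < 1 as the outlier test is my own reading of the comment above, not an official Ceres feature):

#include <cmath>

// After Solve(): recompute each residual manually and query the loss function.
ceres::HuberLoss loss(1.0);
const double c = std::cos(angle), s = std::sin(angle);
for (size_t i = 0; i < points1.size(); ++i) {
    // Apply the solved transform to the point from image 1.
    const double x = scaleFactor * (c * points1[i].x - s * points1[i].y) + translation[0];
    const double y = scaleFactor * (s * points1[i].x + c * points1[i].y) + translation[1];
    const double dx = x - points2[i].x;
    const double dy = y - points2[i].y;
    const double sq_norm = dx * dx + dy * dy; // s = ||residuals||^2
    double rho[3];
    loss.Evaluate(sq_norm, rho);
    if (rho[1] < 1.0) {
        // rho'(s) < 1: this correspondence lies in the outlier region.
    }
}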
I'm attempting to improve performance of the OpenCV Lanczos interpolation algorithm for applying homography transformations to astronomical images, as it is prone to ringing artefacts around stars in some images.
My approach is to apply the homography twice: once using Lanczos and once using bilinear filtering, which is not susceptible to ringing but doesn't preserve detail as well. I then use the bilinear-interpolated output as a guide image, and clamp the Lanczos-interpolated output to the guide wherever it undershoots the guide by more than a given percentage.
I have working code (below) but have 2 questions:
It doesn't seem optimal to iterate across elements in the Mat. Is there a better way of doing the compare and replace loop using OpenCV Mat methods?
My overall approach is computationally expensive - I'm applying homography to the entire Mat twice. Is there an overall better approach to preventing deringing of lanczos interpolation? (Rewriting the entire algorithm plus all the various optimisations that OpenCV makes available is not an option for me.)
warpPerspective(in, out, H, Size(target_rx, target_ry), interpolation, BORDER_TRANSPARENT);
if (interpolation == INTER_LANCZOS4) {
    int count = 0;
    // factor sets how big an undershoot can be tolerated
    double factor = 0.75;
    // Create guide image
    warpPerspective(in, guide, H, Size(target_rx, target_ry), INTER_LINEAR, BORDER_TRANSPARENT);
    // Compare the two, replace out pixels with guide pixels if too far out
    for (int i = 0; i < out.rows; i++) {
        double* outi = out.ptr<double>(i);
        const double* guidei = guide.ptr<double>(i);
        for (int j = 0; j < out.cols; j++) {
            if (outi[j] < guidei[j] * factor) {
                outi[j] = guidei[j];
                count++;
            }
        }
    }
}
With a steer from Christoph Rackwitz, the answer was surprisingly simple:
compare(out, (guide * factor), mask, CMP_LT);
guide.copyTo(out, mask);
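For completeness, a minimal self-contained version of the fix (assuming out and guide are single-channel CV_64F images, as in the loop above):

cv::Mat mask;
// Flag pixels where the Lanczos output undershoots the guide by more than factor.
cv::compare(out, guide * factor, mask, cv::CMP_LT);
// Replace only the flagged pixels with their bilinear guide values.
guide.copyTo(out, mask);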
Thanks :)
When I use Intel IPP's ippsFFTFwd_RToCCS_64f and then ippsMagnitude_64fc, I get a massive peak at the zero index of the magnitudes array.
My sine wave is long, and the main component I am interested in is somewhere between 0.15 Hz and 0.25 Hz. I sample at a 500 Hz sampling frequency. If I subtract the mean from the signal before the FFT, I get a really small zero component instead of that peak. Below is a pic of the head of the magnitudes array:
Also, the magnitude scaling seems to be 10 times the amplitude I see in the time series of the signal, e.g. if the amplitude is 29, in the magnitudes array it is 290.
I am not sure why this is so, and my questions are: 1. Do I really need to address the zero-index peak with mean subtraction? 2. Where does this scale factor of 10 come from?
void CalculateForwardTransform(array<double> ^signal, array<double> ^transformedSignal, array<double> ^magnitudes)
{
    // source signal
    pin_ptr<double> pinnedSignal = &signal[0];
    double *pSignal = pinnedSignal;
    int order = (int)Math::Round(Math::Log(signal->Length, 2));
    // get sizes
    int sizeSpec = 0, sizeInit = 0, sizeBuf = 0;
    int status = ippsFFTGetSize_R_64f(order, IPP_FFT_DIV_INV_BY_N, ippAlgHintNone, &sizeSpec, &sizeInit, &sizeBuf);
    // memory allocation
    IppsFFTSpec_R_64f* pSpec;
    Ipp8u *pSpecMem = (Ipp8u*)ippMalloc(sizeSpec);
    Ipp8u *pMemInit = (Ipp8u*)ippMalloc(sizeInit);
    // FFT specification structure initialized
    status = ippsFFTInit_R_64f(&pSpec, order, IPP_FFT_DIV_INV_BY_N, ippAlgHintNone, pSpecMem, pMemInit);
    // transform
    pin_ptr<double> pinnedTransformedSignal = &transformedSignal[0];
    double *pDst = pinnedTransformedSignal;
    Ipp8u *pBuffer = (Ipp8u*)ippMalloc(sizeBuf);
    status = ippsFFTFwd_RToCCS_64f(pSignal, pDst, pSpec, pBuffer);
    // get magnitudes
    pin_ptr<double> pinnedMagnitudes = &magnitudes[0];
    double *pMagn = pinnedMagnitudes;
    status = ippsMagnitude_64fc((Ipp64fc*)pDst, pMagn, magnitudes->Length); // magnitudes is half of signal len
    // free memory
    ippFree(pSpecMem);
    ippFree(pMemInit);
    ippFree(pBuffer);
}
Do I really need to address the zero-index peak with mean subtraction?
For low-frequency signal analysis, a small bias can really interfere (especially due to spectral leakage). For the sake of illustration, consider the following low-frequency signal tone and another one with a constant bias, tone_with_bias:
double fs = 1;
double f0 = 0.15;
for (int i = 0; i < N; i++)
{
    tone[i] = 0.001*cos(2*M_PI*i*f0/fs);
    tone_with_bias[i] = 1 + tone[i];
}
If we plot the frequency spectrum of a 100-sample window of these signals, you should notice that the spectrum of tone_with_bias completely drowns out the spectrum of tone:
So yes it's better if you can remove that bias. However, it should be emphasized that this is possible provided that you know the nature of the bias. If you know that the bias is indeed a constant, removing it will reveal the low-frequency component. Otherwise, removing the mean from the signal may not achieve the desired effect if the bias is just a very low-frequency component.
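As a sketch, removing a constant bias before the FFT could look roughly like this with IPP primitives (reusing pSignal and signal->Length from the question's CalculateForwardTransform):

// Estimate the constant bias as the signal mean, then subtract it in place.
Ipp64f mean = 0.0;
ippsMean_64f(pSignal, signal->Length, &mean);
ippsSubC_64f_I(mean, pSignal, signal->Length);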
Where does this scale factor of 10 come from?
Scaling of the magnitude by the FFT is expected: as described in this answer, a sinusoid of amplitude A comes out of the FFT with a magnitude of approximately 0.5*N*A (where N is the FFT size). If you were processing a small chunk of 20 samples, that gives exactly the factor of 10 you observed. If you scale the output of the FFT by 2/N (or equivalently scale by 2 while also using the IPP_FFT_DIV_FWD_BY_N flag), you should get results whose magnitudes are similar to the time-domain signal.
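For example, a sketch of that normalization, reusing pMagn and magnitudes->Length from the question (here N is the FFT length, signal->Length):

// Scale magnitudes by 2/N so a sine of amplitude A reads as approximately A.
// Note the DC bin (and the Nyquist bin) strictly only needs a factor of 1/N.
ippsMulC_64f_I(2.0 / signal->Length, pMagn, magnitudes->Length);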
In my program, I am downscaling an image of 500px or larger to an extreme level of approximately 16px-32px. The source image is user-specified, so I do not have control over its size. As you can imagine, few pixel interpolations hold up, and inevitably the result is heavily aliased.
I've tried bilinear, bicubic and square average sampling. The square average sampling actually provides the most decent results, but the smaller the output gets, the larger the sampling radius has to be. As a result it gets quite slow - slower than the other interpolation methods.
I have also tried adaptive square average sampling, so that the smaller the output gets the greater the sampling radius, while the closer it is to its original size, the smaller the sampling radius. However, this produces problems and I am not convinced it is the best approach.
So the question is: What is the recommended type of pixel interpolation that is fast and works well on such extreme levels of downscaling?
I do not wish to use a library so I will need something that I can code by hand and isn't too complex. I am working in C++ with VS 2012.
Here's some example code I've tried, as requested (hopefully without errors from my pseudo-code cut and paste). It performs a 7x7 average downscale, and although the result is better than bilinear or bicubic interpolation, it also takes quite a performance hit:
// Sizing control
ctl(0): "Resize",Range=(0,800),Val=100

// Variables
float fracx,fracy;
int Xnew,Ynew,p,q,Calc;
int x,y,z,p1,q1,i,j;

// New image dimensions
Xnew = image->width*ctl(0)/100;
Ynew = image->height*ctl(0)/100;

for (y=0; y<Ynew; y++){ // rows
    for (x=0; x<Xnew; x++){ // columns
        p1 = (int)x*image->width/Xnew;
        q1 = (int)y*image->height/Ynew;
        for (z=0; z<3; z++){ // channels
            Calc = 0;
            for (i=-3; i<=3; i++) {
                for (j=-3; j<=3; j++) {
                    Calc += (int)(src(p1-i,q1-j,z));
                } //j
            } //i
            Calc /= 49;
            pset(x, y, z, Calc);
        } // channels
    } // columns
} // rows
Thanks!
The first point is to use pointers to your data. Never compute a full index for every pixel. When you write src(p1-i,q1-j,z) or pset(x, y, z, Calc), how much computation is being done? Use pointers to the data and manipulate those instead.
Second: your algorithm is wrong. You don't want an average filter; instead, lay a grid over your source image and, for every grid cell, compute the average of the pixels that fall in it and write it to the corresponding pixel of the output image.
The specific solution should be tailored to your data representation, but it could be something like this:
std::vector<uint32_t> accum(Xnew);
std::vector<uint32_t> count(Xnew);
uint32_t *paccum, *pcount;
uint8_t* pin = /*pointer to input data*/;
uint8_t* pout = /*pointer to output data*/;
for (int dr = 0, sr = 0, w = image->width, h = image->height; sr < h; ++dr) {
    memset(paccum = accum.data(), 0, Xnew*sizeof(uint32_t));
    memset(pcount = count.data(), 0, Xnew*sizeof(uint32_t));
    while (sr * Ynew / h == dr) {
        paccum = accum.data();
        pcount = count.data();
        for (int dc = 0, sc = 0; sc < w; ++sc) {
            *paccum += *pin;
            *pcount += 1;
            ++pin;
            if (sc * Xnew / w > dc) {
                ++dc;
                ++paccum;
                ++pcount;
            }
        }
        sr++;
    }
    std::transform(begin(accum), end(accum), begin(count), pout, std::divides<uint32_t>());
    pout += Xnew;
}
This was written using my own library (still in development) and it seems to work, but I later changed the variable names to make it simpler here, so I don't guarantee anything!
The idea is to keep a local buffer of 32-bit ints which holds the partial sums of all the pixels of the source rows that fall into one row of the output image. Then you divide by the cell counts and write the result to the final image.
The first thing you should do is to set up a performance evaluation system to measure how much any change impacts on the performance.
As said previously, you should use pointers rather than indexes for a (probably) substantial speed-up, and you should not simply average: a basic averaging of pixels is essentially a blur filter.
I would highly advise you to rework your code to use "kernels", i.e. matrices representing the weight given to each pixel. That way you will be able to test different strategies and optimize quality.
Example of kernels:
https://en.wikipedia.org/wiki/Kernel_(image_processing)
Upsampling/downsampling kernel:
http://www.johncostella.com/magic/
Note: the posted code applies a 7x7 box kernel (i and j run from -3 to 3, with division by 49). Expressed in kernel form, the equivalent 3x3 box average would be:
[1 1 1]
[1 1 1] * 1/9
[1 1 1]
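As a rough illustration of applying such a kernel by hand (a sketch only, not the recommended downscaler itself; it assumes a single-channel 8-bit image in row-major order and clamps coordinates at the borders):

#include <algorithm>
#include <cstdint>
#include <vector>

// Apply a normalized 3x3 kernel to a single-channel 8-bit image of size w x h.
std::vector<uint8_t> apply3x3(const std::vector<uint8_t>& src, int w, int h,
                              const float k[3][3])
{
    std::vector<uint8_t> dst(src.size());
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            float sum = 0.0f;
            for (int dy = -1; dy <= 1; ++dy) {
                for (int dx = -1; dx <= 1; ++dx) {
                    // Clamp so border pixels reuse the edge values.
                    const int sx = std::min(std::max(x + dx, 0), w - 1);
                    const int sy = std::min(std::max(y + dy, 0), h - 1);
                    sum += k[dy + 1][dx + 1] * src[sy * w + sx];
                }
            }
            dst[y * w + x] = (uint8_t)(sum + 0.5f);
        }
    }
    return dst;
}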
I get an image from a camera (calibrated and without lens distortions) and I need to detect a rectangular object. Markers are a good example. For markers I check corner count, minimum size, border contrast and convexity. I had an idea on how to improve this in cases where there are a large number of false rectangles.
Here is an example image:
Normally all of these are valid, because without knowing anything about the camera we cannot determine whether perspective allows these kinds of shapes. I know the size (or at least the ratio) of the rectangle in real life, so I had the idea that I should be able to disregard many of these shapes just by reprojecting them and checking the error.
For instance, if I used solvePnPRansac, it should not be able to converge if the shape is not possible, and if it doesn't converge I could just disregard the shape. Sadly, none of the OpenCV solve functions allow me to check for error or convergence. I actually need some ratio or quality measure, because it is possible that some of the rectangles overlap. For example, my object finder identifies these rectangles:
One of the three is actually correct, or at least "the best". But I need some way to know which one it is. I cannot use things like line lengths because of the camera perspective, so I just thought I could solve for each and see which has the smallest error.
There are no lens distortions in the image, but even if there were, solvePnP usually allows passing D to it as well.
Is this even possible, or am I missing something?
I guess I could try hacking around solvePnPRansac just to return convergence, but maybe there is a simpler way?
I figured I can do something like what is done during calibration with a grid: calculate the reprojection error. So first I solve to get the transformation matrix, then I transform the object points into 3D using that matrix, and afterwards use projectPoints to project them back into 2D. Then I check the distance between the original 2D points and the reprojected 2D points and use it as a quality measure. Objects that are not possible often have a reprojection error of 100 pixels or more in my images, while possible objects have less than 20px, so I just used a 25-pixel cutoff and it seems to work fine.
Note that more transformations are possible than I thought. In my original image maybe two are not possible with my current camera, but this check still rejected a lot of fakes.
If nobody else has some ideas, I will accept this as the answer.
Here is some code for the method I use:
//This is the object in 3D
double width = 50.0; //Object is 50mm wide
double height = 30.0; //Object is 30mm tall
cv::Mat object_points(4,3,CV_64FC1);
object_points.at<double>(0,0)=0;
object_points.at<double>(0,1)=0;
object_points.at<double>(0,2)=0;
object_points.at<double>(1,0)=width;
object_points.at<double>(1,1)=0;
object_points.at<double>(1,2)=0;
object_points.at<double>(2,0)=width;
object_points.at<double>(2,1)=height;
object_points.at<double>(2,2)=0;
object_points.at<double>(3,0)=0;
object_points.at<double>(3,1)=height;
object_points.at<double>(3,2)=0;
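// (Style note: the object points above could equivalently be built in one
//  statement with the cv::Mat_ comma initializer:
//      cv::Mat object_points = (cv::Mat_<double>(4, 3) <<
//          0,     0,      0,
//          width, 0,      0,
//          width, height, 0,
//          0,     height, 0);
//  )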
// Check all rectangles for error
cv::Mat image_points(4,2,CV_64FC1);
for (size_t i = 0; i < rectangles_to_test.size(); i++) {
    // Get rectangle points
    for (size_t c = 0; c < 4; ++c) {
        image_points.at<double>(c,0) = (rectangles_to_test[i].points[c].x);
        image_points.at<double>(c,1) = (rectangles_to_test[i].points[c].y);
    }
    // Calculate transformation matrix
    cv::Mat rvec, tvec;
    cv::solvePnP(object_points, image_points, M1, D1, rvec, tvec);
    cv::Mat rotation;
    Matrix4<double> transform;
    transform.init_identity();
    cv::Rodrigues(rvec, rotation);
    for (size_t row = 0; row < 3; ++row) {
        for (size_t col = 0; col < 3; ++col) {
            transform.set(row, col, rotation.at<double>(row, col));
        }
        transform.set(row, 3, tvec.at<double>(row, 0));
    }
    // Calculate projection
    std::vector<cv::Point3f> p3(4);
    std::vector<cv::Point2f> p2;
    Vector4<double> p = transform * Vector4<double>(0, 0, 0, 1);
    p3[0] = cv::Point3f((float)p.x, (float)p.y, (float)p.z);
    p = transform * Vector4<double>(width, 0, 0, 1);
    p3[1] = cv::Point3f((float)p.x, (float)p.y, (float)p.z);
    p = transform * Vector4<double>(width, height, 0, 1);
    p3[2] = cv::Point3f((float)p.x, (float)p.y, (float)p.z);
    p = transform * Vector4<double>(0, height, 0, 1);
    p3[3] = cv::Point3f((float)p.x, (float)p.y, (float)p.z);
    cv::projectPoints(p3, cv::Mat::zeros(1, 3, CV_64FC1), cv::Mat::zeros(1, 3, CV_64FC1), M1, D1, p2);
    // Calculate reprojection error
    rectangles_to_test[i].reprojection_error = 0.0;
    for (size_t c = 0; c < 4; ++c) {
        double dx = p2[c].x - rectangles_to_test[i].points[c].x;
        double dy = p2[c].y - rectangles_to_test[i].points[c].y;
        rectangles_to_test[i].reprojection_error += std::sqrt(dx*dx + dy*dy);
    }
    if (rectangles_to_test[i].reprojection_error > reprojection_error_threshold) {
        // rectangle is no good
    }
}
Basically, I want to detect a fault in an image using logistic regression. I'm hoping to get some feedback on my approach, which is as follows:
For training:
Take a small section of the image marked "bad" and "good"
Greyscale them, then break them up into a series of 5*5 pixel segments
Calculate the histogram of pixel intensities for each of these segments
Pass the histograms along with the labels to the Logistic Regression class for training
For classification: break the whole image into 5*5 segments and predict "good"/"bad" for each segment.
Using the sigmoid function, the logistic regression hypothesis is:
1 / (1 + e^(-xθ))
where x is the input values and theta (θ) is the weights. I use gradient descent to train the model. My code for this is:
void LogisticRegression::Train(float **trainingSet, float *labels, int m)
{
    float tempThetaValues[m_NumberOfWeights];
    for (int iteration = 0; iteration < 10000; ++iteration)
    {
        // Reset the temp values for theta.
        memset(tempThetaValues, 0, m_NumberOfWeights*sizeof(float));
        float error = 0.0f;
        // For each training example in the set
        for (int trainingExample = 0; trainingExample < m; ++trainingExample)
        {
            float *x = trainingSet[trainingExample];
            float y = labels[trainingExample];
            // Partial derivative of the cost function.
            float h = Hypothesis(x) - y;
            for (int i = 0; i < m_NumberOfWeights; ++i)
            {
                tempThetaValues[i] += h*x[i];
            }
            float cost = h - y; // Actual J(theta), Cost(x,y), keeps giving NaN; use MSE for now
            error += cost*cost;
        }
        // Update the weights using batch gradient descent.
        for (int theta = 0; theta < m_NumberOfWeights; ++theta)
        {
            m_pWeights[theta] = m_pWeights[theta] - 0.1f*tempThetaValues[theta];
        }
        printf("Cost on iteration[%d] = %f\n", iteration, error);
    }
}
Where sigmoid and the hypothesis are calculated using:
float LogisticRegression::Sigmoid(float z) const
{
    return 1.0f/(1.0f+exp(-z));
}

float LogisticRegression::Hypothesis(float *x) const
{
    float z = 0.0f;
    for (int index = 0; index < m_NumberOfWeights; ++index)
    {
        z += m_pWeights[index]*x[index];
    }
    return Sigmoid(z);
}
And the final prediction is given by:
int LogisticRegression::Predict(float *x)
{
    return Hypothesis(x) > 0.5f;
}
As we are using a histogram of intensities, the input and weight arrays are 255 elements each. My hope is to use this on something like a picture of an apple with a bruise and have it identify the bruised parts. The (normalized) histograms for the bruised and good sections across the whole apple training set look something like this:
For the "good" sections of the apple (y=0):
For the "bad" sections of the apple (y=1):
I'm not 100% convinced that using the intensities alone will produce the results I want, but even so, using it on a clearly separable data set isn't working either. To test it, I passed it labeled, completely white and completely black images. I then ran it on the small image below:
Even on this image it fails to identify any segments as being black.
Using MSE I see that the cost converges downwards to a point where it remains; for the black-and-white test it starts at about 250 and settles at 100. The apple chunks start at about 4000 and settle at 1600.
What I can't tell is where the issues are.
Is the approach sound but the implementation broken? Is logistic regression the wrong algorithm to use for this task? Is gradient descent not robust enough?
I forgot to answer this... Basically the problem was in my histograms, which weren't being memset to 0 when generated. As to the overall question of whether logistic regression on greyscale images was a good solution: the answer is no. Greyscale just didn't provide enough information for good classification. Using all colour channels was a bit better, but I think the complexity of the problem I was trying to solve (bruises in apples) was a bit much for simple logistic regression on its own. You can see the results on my blog here.