How can I find the rotation of a document? - computer-vision

Take the following scanned image as an example:
By looking at the borders, you can clearly see that it is rotated slightly to the left. How can I detect the amount of this rotation? (In order to fix it)
Edge detection would "highlight" the border, but except for trying many rotations and building an x- / y- histogram I don't know how to use this information. And the iterative approach seems to be more computationally intensive than appropriate for such a simple problem.
I am looking for Pseudo-code / an algorithmic idea. Hence I didn't tag this question with a programming language. However, if you like to give code, I prefer Python.

What about the following approach (assuming the amount of rotation is small and there are indeed enough horizontal / vertical line features in the document):
Extract all (parametric) lines in the document via a Hough transform,
Classify lines as either close to horizontal or vertical (and simply discard the ones that can't be classified up to some tolerance),
Robustly fit a rotation to minimize the deviation of the lines from their expected orientation (RANSAC variant with a rotation solver).
For 3., one could start without any RANSAC (just fit a best rotation to all the horizontal / vertical lines) and only add it if there are noticeable blunders that need to be taken care of.
Regarding fitting a rotation to lines, each line can be parameterized with a unit 2D vector n and a scalar d s.t. a 2D point M belongs to the line iff n . M + d = 0. The vector n gives the orientation of the line and based on the classification is expected to be close to a reference vector n_0 (e_x, -e_x, e_y or -e_y depending on the actual classification). So a possible objective function would be F(theta) = argmax_theta 1/2 sum_i | ( R(theta) n^i ) . n_0^i |^2 where theta is the angle of the 2D rotation to apply, R(theta) the corresponding 2 x 2 rotation matrix. This objective function finds the rotation that maximizes the alignment between the rotated line vectors and the expected line vectors. If the square is dropped from the objective function, then the objective function actually boils down to a simplified Kabsch algorithm / absolute orientation in 2D without translation / scale which can be solved with an SVD.

The deskew package works pretty well for normal text documents. It doesn't work well for images and also not for the given example.

Related

How a feature descriptior for traffic sign detection works in opencv

I am trying to understand how Centroid to Contour (CtC) detector works. Here is a sample code that I have found on Github, and I am trying to understand what is the idea behind this. In this sample, the author trying to detect speed sign using CtC. There are just two important functions:
pre_process()
CtC_features
I have understood a part of the code and how it works but I have some problems in understanding how CtC_features function works.
If you can help me I would like to understand the following parts (just 3 points):
Why if centroid.x > curr.x we need to add PI value to the angle result ( if (centroid.x > curr.x) ang += 3.14159; //PI )
When we start selecting the features on line 97 we set the start angle ( double ang = - 1.57079; ). Why is this half the pi value and negative? How was this value chosen?
And a more general question, how can you know that what feature you select are related to speed limit sign? You find the centroid of the image and adjust the angle in the first step, but in the second step how can you know if ( while (feature_v[i].first > ang) ) the current angle is bigger than your hardcode angle ( in first case ang = - 1.57079) then we add that distance as a feature.
I would like to understand the idea behind this code and if someone with more experience and with some knowledge about trigonometry would help me it will be just great.
Thank you.
The code you provided is not the best, but let's see what happens.
I took this starting image of a sign:
Then, pre_process is called, which basically runs a Canny edge detector, along with some tricks which should lead to a better edge detection. I won't look into them, but this is what it returns:
Not the greatest. Maybe some parameter tuning would help.
But now, CtC_features is called, which is the scope of the question. The role of CtC_features is to obtain some features for a machine learning algorithms. This amounts to finding a numerical description of the image which would help the ML algorithm detect the sign. Such a description can be anything. Think about how someone who never saw a STOP sign and does not know how to read would describe it. They would say something like "A red, flat plate, with 8 sides and some white stuff in the middle". Based on this description, someone might be able to tell it's a STOP sign. We want to do the same, but since computers are computers, we look for numerical features. And, with them, some algorithm could be trained to "learn" what features each sign has.
So, let's see what features does CtC_features obtains from the contours.
The first thing it does is to call findContours. This function takes a binary image and returns arrays of points representing the contours of the image. Basically, it takes the edges and puts them into arrays of points. With connectivity, so we basically know which points are connected. If we use the code from here for visualization, we can see what happens:
So, the array contours is a std::vector<std::vector<cv::Point>> and you have in each sub-array a continuous contour, here drawn with a different color.
Next, we compute the number of points (edge pixels), and do an average over their coordinates to find the centroid of the edge image. The centroid is the filled circle:
Then, we iterate over all points, and create a vector of std::pair<double, double>, recording for each point the distance from the centroid and the angle. The angle function is defined at the bottom of the file as
double angle(Point2f a, Point2f b) {
return atan((a.y - b.y) / (a.x - b.x));
}
It basically computes the angle of the line from a to b with respect to the x axis, using the arctangent function. I'll let you watch a video on arctangent, but tl;dr is that it gives you the angle from a ratio. In radians (a circle is 2 PI radians, half a circle is PI radians). The problem is that the function is periodic, with a period of PI. This means that there are 2 angles on the circle (the circle of all points at the same distance around the centroid) which give you the same value. So, we compute the ratio (the ratio is btw known as the tangent of the angle), apply the inverse function (arctangent) and we get an angle (corresponding to a point). But what if it's the other point? Well, we know that the other point is exactly with PI degrees offset (it is diametrically opposite), so we add PI if we detect that it's the other point.
The picture below also helps understand why there are 2 points:
The tangent of the angle is highlighted vertical distance,. But the angle on the other side of the diagonal line, which intersects the circle in the bottom left, also has the same tangent. The atan function gives the tangents only for angles on the left side of the center. Note that there are no 2 directions with the same tangent.
What the check does is to ask whether the point is on the right of the centroid. This is done in order to be able to add a half a circle (PI radians or 180 degrees) to correct for the result of atan.
Now, we know the distance (a simple formula) and we have found (and corrected) for the angle. We insert this pair into the vector feature_v, and we sort it. The sort function, called like that, sorts after the first element of the pair, so we sort after the angle, then after distance.
The interval variable:
int degree = 10;
double interval = double((double(degree) / double(360)) * 2 * 3.14159); //5 degrees interval
simply is value of degree, converted from degrees into radians. We need radians since the angles have been computed so far in radians, and degrees are more user friendly. Yep, the comment is wrong, the interval is 10 degrees, not 5.
The ang variable defined below it is -PI / 2 (a quarter of a circle):
double ang = - 1.57079;
Now, what it does is to divide the points around the centroid into bins, based on the angle. Each bin is 10 degrees wide. This is done by iterating over the points sorted after angle, all are accumulated until we get to the next bin. We are only interested in the largest distance of a point in each bin. The starting point should be small enough that all the direction (points) are captured.
In order to understand why it starts from -PI/2, we have to get back at the trigonometric function diagram above. What happens if the angle goes like this:
See how the highlighted vertical segment goes "downwards" on the y axis. This means that its length (and implicitly the tangent) is negative here. Also, the angle is considered to be negative (otherwise there would be 2 angles on the same side of the center with the same tangent). Now, we are interested in the range of angles we have. It's all the angles on the right side of the centroid, starting from the bottom at -PI/2 to the top at PI/2. A range of PI radians, or 180 degrees. This is also written in the documentation of atan:
If no errors occur, the arc tangent of arg (arctan(arg)) in the range [-PI/2, +PI/2] radians, is returned.
So, we simply split all the possible directions (360 degrees) into buckets of 10 degrees, and take the distance of the farthest point in each bin. Since the circle has 360 degrees, we'll get 360 / 10 = 36 bins. Then, these are normalized such that the greatest value is 1. This helps a bit with the machine learning algorithm.
How can we know if the point we selected belongs to the sign? We don't. Most computer vision make some assumptions regarding the image in order to simplify the problem. The idea of the algorithm is to determine the shape of the sign by recording the distance from the center to the edges. This makes the assumption that the centroid is roughly in the middle of the sign. Depending on the ML algorithm used, and on the training data, different levels of robustness can be obtained.
Also, it assumes that (some of) the edges can be reliably identified. See how in my image, the algorithm was not able to detect the upper left edge?
The good news is that this doesn't have to be perfect. ML algorithms know how to handle this variation (up to some extent) provided that they are appropriately trained. It doesn't have to be perfect, but it has to be good enough. In order to answer what good enough means, what are the actual limitations of the algorithm, some more testing needs to be done, as well as some understanding of the ML algorithm used. But this is also why ML is so popular in vision: it can handle a lot of variation quite well.
At the end, we basically get an array of 36 numbers, one for each of the 36 bins of 10 degrees, representing the maximum distance of a point in the bin. I assume this is because the developer of the algorithm wanted a way to capture the shape of the sign, by looking at distances from center in various directions. This assumes that no edges are detected in the background, and the sign looks something like:
The max distance is used to pick the border, and not the or other symbols on the sign.
It is not directly used here, but a possibly related reading is the Hough transform, which uses a similar particularization to detect straight lines in an image.

How can I get a point on a spiral given degrees of rotation?

The closest thing I've found to help explain what I need is here in this question: Draw equidistant points on a spiral
However, that's not exactly what I want.
The spiral to draw is an archimedean spiral and the points obtained must be equidistant from each other. (Quote: From the question linked above.)
This is precisely what I want given the Archimedean Spiral equation, .
There is a specific set of data a user can input, they are NOT based on spirals but circular figures in general. They are as follows: center point [X,Y,Z], radius, horizontal separation [can be called X separation, depends on figure], and vertical separation [can be called Y separation, depends on figure], and most importantly degrees of rotation. I'd like the horizontal separation to be the distance between consecutive points since they are the ones that need to be the same distance between each other. I'd also like vertical separation to be the distance between the 'parallel' curves.
So given that specific input selection (and yes, some can be ignored), how can I iterate through all of the consecutive, equidistant points it would take to reach the input degrees (which can be very large but is finite) and return the X and Y point of each point of those points?
Basically what I'm try to achieve is a loop from zero to the number of degrees in the input, given all of the rest of the input and my preferences noted above, and drawing a point for all of the equidistant, consecutive points (if you decide to represent using code, just represent the drawing using a 'print').
I'm having a hard time explaining, but I think I got it pretty much covered. The points on this graph are exactly what I need:
Assuming a 2D case and an archimedean spiral centered around zero (a=0), so with equation . Successive lines are then apart, so to obtain a 'vertical spacing' of , set .
The length of the arc from the centre to a point at given angle is given by Wolfram, but his solution is difficult to working with. Instead, we can approximate the length of the arc (using a very rough for-large-theta approximation) to . Rearranging, , allowing us to determine what angles correspond to the desired 'horizontal spacing'. If this approximation is not good enough, I would look at using something like Newton-Raphson. The question you link to uses also uses an approximation, although not the same one.
Finally, recognising that polar coordinates translate to cartesian as follows: ; .
I get the following:
This is generated by the following MATLAB code, but it should be straight-forward enough to translate to C++ if this is what you actually need.
% Entered by user
vertspacing = 1;
horzspacing = 1;
thetamax = 10*pi;
% Calculation of (x,y) - underlying archimedean spiral.
b = vertspacing/2/pi;
theta = 0:0.01:thetamax;
x = b*theta.*cos(theta);
y = b*theta.*sin(theta);
% Calculation of equidistant (xi,yi) points on spiral.
smax = 0.5*b*thetamax.*thetamax;
s = 0:horzspacing:smax;
thetai = sqrt(2*s/b);
xi = b*thetai.*cos(thetai);
yi = b*thetai.*sin(thetai);

Finding "how straight" is a shape. openCV

I'm working on an application were I have a set of Contours(each one representing a Potential Line) and I wanna check "How straight" is that contour/shape.
The article I am using as a refrence uses the following technique:
It Matches a "segmented" line crossing the shape like so-
Then grading how "straight" is the line.
Heres an example of the Contours I am working on:
How would you go about implementing this technique?
Is there any other way of checking "How Straight" is a contour\shape?
Regards!
My first guess would be to use a coefficient of determination. That would be, fit a linear line to all your point assuming some reasonable origin where you won't receive rounding errors and calculate R^2.
A more advanced approach, if all contours are disconnected components, would be to calculate the structure model index (the link is for bone morphometry, but they explain the concept and cite the original paper.) This gives you a number that tells you how much your segment is "like a rod". This is just an idea, though. Anything that forms curves or has branches will be less and less like a rod.
I would say that it also depends on what you are using the metric for and if your contours are always generally carrying left to right.
An additional method would be to create the covariance matrix of your points, calculate the eigenvalues from that matrix, and take their ratio (where the ratio is greater than or equal to 1; otherwise, invert the ratio.) This is the basic principle behind a PCA besides the final ratio. If you have a rather linear data set (the data set varies in only one direction) then you will have a very large ratio. As the data set becomes less and less linear (or more uncorrelated) you would see the ratio approach one. A perfectly linear data set would be infinity and a perfect circle one (I believe, but I would appreciate if someone could verify this for me.) Also, working in two dimensions would mean the calculation would be computationally cheap and straight forward.
This would handle outliers very well and would be invariant to the rotation and shape of your contour. You also have a number which is always positive. The only issue would be preventing overflow when dividing the two eigenvalues. Then again you could always divide the smaller eigenvalue by the larger and your metric would be bound between zero and one, one being a circle and zero being a straight line.
Either way, you would need to test if this parameter is sensitive enough for your application.
One example for a simple algorithm is using the dot product between two segments to determine the angle between them. The formula for dot product is:
A * B = ||A|| ||B|| cos(theta)
Solving the equation for cos(theta) yields
cos(theta) = (A * B / (||A|| ||B||))
Since cos(0) = 1, cos(pi) = -1.0 and you're checking for the "straightness" of the lines, a line whose normalization of cos(theta) angles is closest to -1.0 is the straightest.
straightness = SUM(cos(theta))/(number of line segments)
where a straight line is close to -1.0, and a non-straight line approaches 1.0. Keep in mind this is a cursory evaluation of this algorithm and it obviously has edge cases and caveats that would need to be addressed in an implementation.
The trick is to use image moments. In short, you calculate the minimum inertia around an axis, the inertia around an axis perpendicular to this, and the ratio between them (which is always between 0 and 1; since inertia is non-negative)
For a straight line, the inertia along the line is zero, so the ratio is also zero. For a circle, the inertia is the same along all axis so the ratio is one. Your segmented line will be 0.01 or so as it's a fairly good match.
A simpler method is to compare the circumference of the the convex polygon containing the shape with the circumference of the shape itself. For a line, they're trivially equal, and for a not too crooked shape it's still comparable.

displacement between two images using opencv surf

I am working on image processing with OPENCV.
I want to find the x,y and the rotational displacement between two images in OPENCV.
I have found the features of the images using SURF and the features have been matched.
Now i want to find the displacement between the images. How do I do that? Can RANSAC be useful here?
regards,
shiksha
Rotation and two translations are three unknowns so your min number of matches is two (since each match delivers two equations or constraints). Indeed imagine a line segment between two points in one image and the corresponding (matched) line segment in another image. The difference between segments' orientations gives you a rotation angle. After you rotated just use any of the matched points to find translation. Thus this is 3DOF problem that requires two points. It is called Euclidean transformation or rigid body transformation or orthogonal Procrustes.
Using Homography (that is 8DOF problem ) that has no close form solution and relies on non-linear optimization is a bad idea. It is slow (in RANSAC case) and inaccurate since it adds 5 extra DOF. RANSAC is only needed if you have outliers. In the case of pure noise and overdetrmined system (more than 2 points) your optimal solution that minimizes the sum of squares of geometric distance between matched points is given in a close form by:
Problem statement: min([R*P+t-Q]2), R-rotation, t-translation
Solution: R = VUT, t = R*Pmean-Qmean
where X=P-Pmean; Y=Q-Qmean and we take SVD to get X*YT=ULVT; all matrices have data points as columns. For a gentle intro into rigid transformations see this

Finding curvature from a noisy set of data points using 2d/3dsplines? (C++)

I am trying to extract the curvature of a pulse along its profile (see the picture below). The pulse is calculated on a grid of length and height: 150 x 100 cells by using Finite Differences, implemented in C++.
I extracted all the points with the same value (contour/ level set) and marked them as the red continuous line in the picture below. The other colors are negligible.
Then I tried to find the curvature from this already noisy (due to grid discretization) contour line by the following means:
(moving average already applied)
1) Curvature via Tangents
The curvature of the line at point P is defined by:
So the curvature is the limes of angle delta over the arclength between P and N. Since my points have a certain distance between them, I could not approximate the limes enough, so that the curvature was not calculated correctly. I tested it with a circle, which naturally has a constant curvature. But I could not reproduce this (only 1 significant digit was correct).
2) Second derivative of the line parametrized by arclength
I calculated the first derivative of the line with respect to arclength, smoothed with a moving average and then took the derivative again (2nd derivative). But here I also got only 1 significant digit correct.
Unfortunately taking a derivative multiplies the already inherent noise to larger levels.
3) Approximating the line locally with a circle
Since the reciprocal of the circle radius is the curvature I used the following approach:
This worked best so far (2 correct significant digits), but I need to refine even further. So my new idea is the following:
Instead of using the values at the discrete points to determine the curvature, I want to approximate the pulse profile with a 3 dimensional spline surface. Then I extract the level set of a certain value from it to gain a smooth line of points, which I can find a nice curvature from.
So far I could not find a C++ library which can generate such a Bezier spline surface. Could you maybe point me to any?
Also do you think this approach is worth giving a shot, or will I lose too much accuracy in my curvature?
Do you know of any other approach?
With very kind regards,
Jan
edit: It seems I can not post pictures as a new user, so I removed all of them from my question, even though I find them important to explain my issue. Is there any way I can still show them?
edit2: ok, done :)
There is ALGLIB that supports various flavours of interpolation:
Polynomial interpolation
Rational interpolation
Spline interpolation
Least squares fitting (linear/nonlinear)
Bilinear and bicubic spline interpolation
Fast RBF interpolation/fitting
I don't know whether it meets all of your requirements. I personally have not worked with this library yet, but I believe cubic spline interpolation could be what you are looking for (two times differentiable).
In order to prevent an overfitting to your noisy input points you should apply some sort of smoothing mechanism, e.g. you could try if things like Moving Window Average/Gaussian/FIR filters are applicable. Also have a look at (Cubic) Smoothing Splines.