Good Morning everybody,
Today I wanna concern about the topic "Image Manipulation in C++".
So far I am able to filter all the noisy stuff out of the picture and change the color to black and white.
But now I have two questions.
First Question:
Below you see a screenshot of the image. What is the best way to find out how to rotate the text. In the end it would be nice if the text is horizontal. Does anybody have a good link or an example.
Second Question:
How to go on? Do you think I should send the image to an "Optical Character Recognizer" (a) or should I filter out each letter (b)?
If the answer is (a) what is the smallest ocr lib? All libs I found so far seem to be overpowered and difficult to implement in an existing project. (like gocr or tesseract)
If the answer is (b) what is the best way to save each letter as an own image? Shoul i search for an white pixel an than go from pixel to pixel an save the coordinates in an 2D Array? What is with the letter "i" ;)
Thanks to everybody who will help me to find my way!Sorry for the strange english above. I'm still a language noob :-)
The usual name for the problem in your first question is "Skew Correction"
You may Google for it (lot of references). A nice paper here, showing for example how to get this:
An easy way to start (but not as good as the previously mentioned), is to perform a Principal Component Analysis:
For your first question:
First, Remove any "specs" of noisy white pixels that aren't part of the letter sequence. A gentle low-pass filter (pixel color = average of surrounding pixels) followed by a clamping of the pixel values to pure black or pure white. This should get rid of the little "dot" underneath the "a" character in your image and any other specs.
Now search for the following pixels:
xMin = white pixel with the lowest x value (white pixel closest to the left edge)
xMax = white pixel with the largest x value (white pixel closest to the right edge)
yMin = white pixel with the lowest y value (white pixel closest to the top edge)
yMax = white pixel with the largest y value (white pixel closest to the bottom edge)
with these four pixel values, form a bounding box: Rect(xMin, yMin, xMax, yMax);
compute the area of the bounding box and find the center.
using the center of the bounding box, rotate the box by N degrees. (You can pick N: 1 degree would be an ok value).
Repeat the process of finding xMin,xMax,yMin,yMax and recompute the area
Continue rotating by N degrees until you've rotated K degrees. Also rotate by -N degrees until you've rotated by -K degrees. (Where K is the max rotation... say 30 degrees). At each step recompute the area of the bounding box.
The rotation that produces the bounding box with the smallest area is likely the rotation that aligns the letters parallel to the bottom edge (horizontal alignment).
You could measure the height to each white pixel from the bottom and find how much the text is leaning. It's a very simple approach but it worked fine for me when I tried it.
Related
I am trying to use opencv's simple blob detector to determine the colors on the face of a Rubik's cube. I have been using this fantastic resource so far and it has proved to be very helpful and descriptive. After a fair bit of tweaking, I have been successful in making good filters for each color. I look at the x and y position of each color blob, and since the cubies have an even spacing, do a quick rounding division to determine which row and column they belong in, with groups of two being split and belonging in two different rows/columns respectively.
This is more of a curiosity question than anything else. To my eye, it looks like the centroids are being calculated incorrectly... shouldn't the drawn circle be more centered in each blob? Yet both seem to be sticking out arbitrarily to one side.
Below, I have the original image of the cube, and two of the color filters, the top for green, and the bottom for blue.
As you can see, the green and blue blobs are positioned correctly, and should be far enough away from each other to be classified into separate rows, but the centroids visually seem to be skewed from the centers of the blobs (green centroid should be more to the right, and blue more to the left). Is there something that I am not picking up on here? Is this just a quirk of the system?
For my undergraduate paper I am working on a iPhone Application using openCV to detect domino tiles. The detection works well in close areas, but when the camera is angled the tiles far away are difficult to detect.
My approach to solve this I would want to do some spacial calculations. For this I would need to convert a 2D Pixel value into world coordinates, calculate a new 3D position with a vector and convert these coordinates back to 2D and then check the colour/shape at that position.
Additionally I would need to know the 3D positions for Augmented Reality additions.
The Camera Matrix i got trough this link create opencv camera matrix for iPhone 5 solvepnp
The Rotationmatrix of the Camera I get from the Core Motion.
Using Aruco markers would be my last resort, as I woulnd't get the decided effect that I would need for the paper.
Now my question is, can i not make calculations when I know the locations and distances of the circles on a lets say Tile with a 5 on it?
I wouldn't need to have a measurement in mm/inches, I can live with vectors without measurements.
The camera needs to be able to be rotated freely.
I tried to invert the calculation sm'=A[R|t]M' to be able to calculate the 2D coordinates in 3D. But I am stuck with inverting the [R|t] even on paper, and I don't know either how I'd do that in swift or c++.
I have read so many different posts on forums, in books etc. and I am completely stuck and appreciate any help/input you can give me. Otherwise I'm screwed.
Thank you so much for your help.
Update:
By using the solvePnP that was suggested by Micka I was able to get the Rotation and Translation Vectors for the angle of the camera.
Meaning that if you are able to identify multiple 2D Points in your image and know their respective 3D World coordinates (in mm, cm, inch, ...), then you can get the mechanisms to project points from known 3D World coordinates onto the respective 2D coordinates in your image. (use the opencv projectPoints function).
What is up next for me to solve is the translation from 2D into 3D coordinates, where I need to follow ozlsn's approach with the inverse of the received matrices out of solvePnP.
Update 2:
With a top down view I am getting along quite well to being able to detect the tiles and their position in the 3D world:
tile from top Down
However if I am now angling the view, my calculations are not working anymore. For example I check the bottom Edge of a 9-dot group and the center of the black division bar for 90° angles. If Corner1 -> Middle Edge -> Bar Center and Corner2 -> Middle Edge -> Bar Center are both 90° angles, than the bar in the middle is found and the position of the tile can be found.
When the view is Angled, then these angles will be shifted due to the perspective to lets say 130° and 50°. (I'll provide an image later).
The Idea I had now is to make a solvePNP of 4 Points (Bottom Edge plus Middle), claculate solvePNP and then rotate the needed dots and the center bar from 2d position to 3d position (height should be irrelevant?). Then i could check with the translated points if the angles are 90° and do also other needed distance calculations.
Here is an image of what I am trying to accomplish:
Markings for Problem
I first find the 9 dots and arrange them. For each Edge I try to find the black bar. As said above, seen from Top, the angle blue corner, green middle edge to yellow bar center is 90°.
However, as the camera is angled, the angle is not 90° anymore. I also cannot check if both angles are 180° together, that would give me false positives.
So I wanted to do the following steps:
Detect Center
Detect Edges (3 dots)
SolvePnP with those 4 points
rotate the edge and the center points (coordinates) to 3D positions
Measure the angles (check if both 90°)
Now I wonder how I can transform the 2D Coordinates of those points to 3D. I don't care about the distance, as I am just calculating those with reference to others (like 1.4 times distance Middle-Edge) etc., if I could measure the distance in mm, that would even be better though. Would give me better results.
With solvePnP I get the rvec which I could change into the rotation Matrix (with Rodrigues() I believe). To measure the angles, my understanding is that I don't need to apply the translation (tvec) from solvePnP.
This leads to my last question, when using the iPhone, can't I use the angles from the motion detection to build the rotation matrix beforehand and only use this to rotate the tile to show it from the top? I feel that this would save me a lot of CPU Time, when I don't have to solvePnP for each tile (there can be up to about 100 tile).
Find Homography
vector<Point2f> tileDots;
tileDots.push_back(corner1);
tileDots.push_back(edgeMiddle);
tileDots.push_back(corner2);
tileDots.push_back(middle.Dot->ellipse.center);
vector<Point2f> realLivePos;
realLivePos.push_back(Point2f(5.5,19.44));
realLivePos.push_back(Point2f(12.53,19.44));
realLivePos.push_back(Point2f(19.56,19.44));
realLivePos.push_back(Point2f(12.53,12.19));
Mat M = findHomography(tileDots, realLivePos, CV_RANSAC);
cout << "M = "<< endl << " " << M << endl << endl;
vector<Point2f> barPerspective;
barPerspective.push_back(corner1);
barPerspective.push_back(edgeMiddle);
barPerspective.push_back(corner2);
barPerspective.push_back(middle.Dot->ellipse.center);
barPerspective.push_back(possibleBar.center);
vector<Point2f> barTransformed;
if (countNonZero(M) < 1)
{
cout << "No Homography found" << endl;
} else {
perspectiveTransform(barPerspective, barTransformed, M);
}
This however gives me wrong values, and I don't know anymore where to look (Sehe den Wald vor lauter Bäumen nicht mehr).
Image Coordinates https://i.stack.imgur.com/c67EH.png
World Coordinates https://i.stack.imgur.com/Im6M8.png
Points to Transform https://i.stack.imgur.com/hHjBM.png
Transformed Points https://i.stack.imgur.com/P6lLS.png
You see I am even too stupid to post 4 images here??!!?
The 4th index item should be at x 2007 y 717.
I don't know what I am doing wrongly here.
Update 3:
I found the following post Computing x,y coordinate (3D) from image point which is doing exactly what I need. I don't know maybe there is a faster way to do it, but I am not able to find it otherwise. At the moment I can do the checks, but still need to do tests if the algorithm is now robust enough.
Result with SolvePnP to find bar Center
The matrix [R|t] is not square, so by-definition, you cannot invert it. However, this matrix lives in the projective space, which is nothing but an extension of R^n (Euclidean space) with a '1' added as the (n+1)st element. For compatibility issues, the matrices that multiplies with vectors of the projective space are appended by a '1' at their lower-right corner. That is : R becomes
[R|0]
[0|1]
In your case [R|t] becomes
[R|t]
[0|1]
and you can take its inverse which reads as
[R'|-Rt]
[0 | 1 ]
where ' is a transpose. The portion that you need is the top row.
Since the phone translates in the 3D space, you need the distance of the pixel in consideration. This means that the answer to your question about whether you need distances in mm/inches is a yes. The answer changes only if you can assume that the ratio of camera translation to the depth is very small and this is called weak perspective camera. The question that you're trying to tackle is not an easy one. There is still people researching on this at PhD degree.
I am trying to develop box sorting application in qt and using opencv. I want to measure width and length of box.
As shown in image above i want to detect only outermost lines (ie. box edges), which will give me width and length of box, regardless of whatever printed inside the box.
What i tried:
First i tried using Findcontours() and selected contour with max area, but the contour of outer edge is not enclosed(broken somewhere in canny output) many times and hence not get detected as a contour.
Hough line transform gives me too many lines, i dont know how to get only four lines am interested in out of that.
I tried my algorithm as,
Convert image to gray scale.
Take one column of image, compare every pixel with next successive pixel of that column, if difference in there value is greater than some threshold(say 100) that pixel belongs to edge, so store it in array. Do this for all columns and it will give upper line of box parallel to x axis.
Follow the same procedure, but from last column and last row (ie. from bottom to top), it will give lower line parallel to x axis.
Likewise find lines parallel to y axis as well. Now i have four arrays of points, one for each side.
Now this gives me good results if box is placed in such a way that its sides are exactly parallel to X and Y axis. If box is placed even slightly oriented in some direction, it gives me diagonal lines which is obvious as shown in below image.
As shown in image below i removed first 10 and last 10 points from all four arrays of points (which are responsible for drawing diagonal lines) and drew the lines, which is not going to work when box is tilted more and also measurements will go wrong.
Now my question is,
Is there any simpler way in opencv to get only outer edges(rectangle) of box and get there dimensions, ignoring anything printed on the box and oriented in whatever direction?
I am not necessarily asking to correct/improve my algorithm, but any suggestions on that also welcome. Sorry for such a big post.
I would suggest the following steps:
1: Make a mask image by using cv::inRange() (documentation) to select the background color. Then use cv::not() to invert this mask. This will give you only the box.
2: If you're not concerned about shadow, depth effects making your measurment inaccurate you can proceed right away with trying to use cv::findContours() again. You select the biggest contour and store it's cv::rotatedRect.
3: This cv::rotatedRect will give you a rotatedRect.size that defines the width en the height of your box in pixels
Since the box is placed in a contrasting background, you should be able to use Otsu thresholding.
threshold the image (use Otsu method)
filter out any stray pixels that are outside the box region (let's hope you don't get many such pixels and can easily remove them with a median or a morphological filter)
find contours
combine all contour points and get their convex hull (idea here is to find the convex region that bounds all these contours in the box region regardless of their connectivity)
apply a polygon approximation (approxPolyDP) to this convex hull and check if you get a quadrangle
if there are no perspective distortions, you should get a rectangle, otherwise you will have to correct it
if you get a rectangle, you have its dimensions. You can also find the minimum area rectangle (minAreaRect) of the convexhull, which should directly give you a RotatedRect
I've created an openCV application for human detection on images.
I run my algorithm on the same image over different scales, and when detections are made, at the end I have information about the bounding box position and at which scale it was taken from. Then I want to transform that rectangle to the original scale, given that position and size will vary.
I've wrapped my head around this and I've gotten nowhere. This should be rather simple, but at the moment I am clueless.
Help anyone?
Ok, got the answer elsewhere
"What you should do is store the scale where you are at for each detection. Then transforming should be rather easy right. Imagine you have the following.
X and Y coordinates (center of bounding box) at scale 1/2 of the original. This means that you should multiply with the inverse of the scale to get the location in the original, which would be 2X, 2Y (again for the bounxing box center).
So first transform the center of the bounding box, than calculate the width and height of your bounding box in the original, again by multiplying with the inverse. Then from the center, your box will be +-width_double/2 and +-height_double/2."
Another tricky question. What you can see here is my physical pyramid built with 3 leds which form a triangle in 1 plane and another led in the mid center, about 18mm above the other 3. The 4th one makes the triangle to a pyramid. (You may can see it better if you look on the right triangle. This one is rotated about the horizontal achsis, and you can see a diode on a stick very well).
The second picture shows my running program. The left box shows the raw picture of the leds (photo with ir-filter). The picture in the center shows that my program found the points and is also able to tell which point is which, based on some conditions (like C is always where the both lines with maximal distance betweens diodes intersect; and the both longest lengths are always a and b). But dont care about this, i know the points are 100% correctly found.
Then on the right picture are some calculated values, like the height between C and c and so on. I would be able to calculate more, but i didnt bother to care for now, cause I am stuck.
I want to calculate the pyramids rotation and translation in the 3 dimensional space.
The yellow points are the leds after rotation arround an axis throught the center of the triangle in camera z- direction. So now i do not have to worry about this, when calculating the other 2. The Rotation arround the horizontal axis, and the rotation arround the vertical axis. I could easily calculate this with the lengths of the distance from the center of the triangle to the 4th diode (as you can see the 4th diode moves on the image plane with rotation), or the lengths of the both axes.
But my problem is the unknown depth.
It affects all lengths (a,b,c, and also the lengths from the center to the 4th diode if we call this d and e). I know the measurments of the real pyramid, with a tolerance of +-5% or so, but they get also affected by the zoom. So how do i deal with this?
I thought of an equation with a ratio between something with the lengths of the horizontal axis, the length of the vertical axis, the angles alpha, beta and gamma, and the lengths d and e.
Alpha, beta and gamma only get affected by rotation arround the axes (which i want to know. i want to know the rotation and the zoom), where a rotation arround one axis has the opposite effect than a rotation arround the other. So if you rotate arround both axes in the same angle, the ratio between the length of the axes is the same as before.
The zoom (real: how close it is to the camera; what i want to know in 1st place: multiplication factor 2x, 3x,0.5, 0,4322344,.....) does not affect the angles, but all the lengths: a,b,c,d,e,hc (vertical length of axis), hx (i have not calculated it yet, but it would be easy. the name hx can vary, i just thought of something random right now; it is the length of the horizontal axis) in the same way (i guess).
You see i have thought of many, but i think i am too dumb.
So, is there any math genius out there wo can give me the right equations, for either the rotation OR/AND the zoomfactor?
(i also thought about using Posit/Downhill- Simplex, and so on, but this would be the nicest, since i already know so much, like all Points, and so on and so on)
Please, please, i need your help really bad! I am writing this in C++ and with help of OpenCV if you need to know, but i think its more a mathematical problem.
Thanks in advance!
Ah, and Alpha seems to be always the same as Beta!
Edit: Had to delete the second picture
Have a look to Boost Geometry or here also
Have a look at SolvePnP() in OpenCV. Even if you don't use it directly, the documentation has citations for the methods used.