Char's bounding box order of vertices - google-cloud-platform

Google Vision API documentation states that vertices of detected characters will always be in the same order:
// The bounding box for the symbol.
// The vertices are in the order of top-left, top-right, bottom-right,
// bottom-left. When a rotation of the bounding box is detected the rotation
// is represented as around the top-left corner as defined when the text is
// read in the 'natural' orientation.
// For example:
// * when the text is horizontal it might look like:
// 0----1
// | |
// 3----2
// * when it's rotated 180 degrees around the top-left corner it becomes:
// 2----3
// | |
// 1----0
// and the vertice order will still be (0, 1, 2, 3).
However sometimes I can see a different order of vertices. Here is an example of two characters from the same image, which have the same orientation:
[x:778 y:316 x:793 y:316 x:793 y:323 x:778 y:323 ]
0----1
| |
3----2
and
[x:857 y:295 x:857 y:287 x:874 y:287 x:874 y:295 ]
1----2
| |
0----3
Why is the order of the vertices not always the same, and not as described in the documentation?

It seems like a bug in Vision API.
The solution is to detect image orientation and then reorder vertices in the correct order.
Unfortunately Vision API doesn't provide image orientation in its output, so I had to write code to detect it.
Horizontal/vertical orientation can be detected by comparing character height and width: height is usually larger than width.
The next step is to detect the direction of the text. For example, with a vertical image orientation, text may run from top to bottom or from bottom to top.
Most characters in the output appear in their natural reading order, so by looking at simple statistics we can detect the text direction. For example:
line 1 has Y coord 1000
line 2 has Y coord 900
line 3 has Y coord 950
line 4 has Y coord 800
We can see that image is rotated upside down.

You must reorder the four vertices into A-B-C-D (clockwise from A) such that:
A: min X, min Y
B: max X, min Y
C: max X, max Y
D: min X, max Y
Then save them to your rectangle object.
Update: You can order vertices by distance from O(0,0) for A-B-C-D order above.
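One way to implement that reordering for roughly axis-aligned boxes is the classic sum/difference trick (a sketch, assuming image coordinates with Y growing downward): A and C are the points with the smallest and largest x+y, and B and D the points with the largest and smallest x-y.

```cpp
#include <algorithm>
#include <array>

struct Pt { int x, y; };

// Reorder four vertices into A (top-left), B (top-right),
// C (bottom-right), D (bottom-left) for a roughly axis-aligned quad.
std::array<Pt, 4> reorderVertices(std::array<Pt, 4> p) {
    auto bySum  = [](const Pt& a, const Pt& b) { return a.x + a.y < b.x + b.y; };
    auto byDiff = [](const Pt& a, const Pt& b) { return a.x - a.y < b.x - b.y; };
    Pt A = *std::min_element(p.begin(), p.end(), bySum);   // min X, min Y
    Pt C = *std::max_element(p.begin(), p.end(), bySum);   // max X, max Y
    Pt B = *std::max_element(p.begin(), p.end(), byDiff);  // max X, min Y
    Pt D = *std::min_element(p.begin(), p.end(), byDiff);  // min X, max Y
    return {A, B, C, D};
}
```

Applied to the second example from the question ([x:857 y:295 x:857 y:287 x:874 y:287 x:874 y:295]), this yields A=(857,287), B=(874,287), C=(874,295), D=(857,295).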

Related

What value should Z actually be for perspective divide?

So I'm trying to understand the fundamentals of perspective projection for 3D graphics and I'm getting stuck. I'm trying to avoid matrices at the moment to try and make things easier for understanding. This is what I've come up with so far:
First I imagine I have a point coming in with screen (pixel) coordinates of x: 200, y: 600, z: 400. The z amount in this context represents the distance, in pixels, from the projection plane or monitor (this is just how I'm thinking of it). I also have a camera that I'm saying is 800 pixels from the projection plane/monitor (on the back side of the projection plane/monitor), so that acts as the focal length of the camera.
From my understanding, first I find the total z distance of the point 200, 600 by adding its z to the camera's focal length (400 + 800), which gives me a total z distance of 1200. Then, if I wanted to find the projected point of these coordinates I just need to multiply each coordinate (x & y) by (focal_length/z_distance) or 800/1200 which gives me the projected coordinates x: 133, y: 400.
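The arithmetic in the paragraph above can be checked with a tiny sketch (names are illustrative, not OpenGL API):

```cpp
// Perspective-project a point following the reasoning above:
// z_distance = z + focal_length, then scale x and y by
// focal_length / z_distance.
struct Projected { double x, y; };

Projected project(double x, double y, double z, double focalLength) {
    double zDistance = z + focalLength;       // e.g. 400 + 800 = 1200
    double scale = focalLength / zDistance;   // e.g. 800 / 1200
    return { x * scale, y * scale };
}
```

With the numbers from the question, project(200, 600, 400, 800) gives x ≈ 133.3 and y = 400, matching the text.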
Now, from what I understand, openGL expects me to send my point down in clips space (-1 to 1) so I shouldn't send my pixel values down as 200, 600. I would have to normalize my x and y coordinates to this -1 to 1 space first. So I normalize my x & y values like so:
xNorm = (x / (width/2)) - 1;
yNorm = (y / (height/2)) - 1;
This gives me normalized values of x: -.6875, y: -.0625. What I'm unsure of is what my Z would need to be if openGL is going to eventually divide these normalized values by it. I know aspect ratio probably needs to be entered into the equation but not sure how.

Find a point inside a rotated rectangle

Ok so, this should be super simple, but I'm not a smart man. Technically I want to know whether a point resides inside a rectangle; however, the rectangle can be in different states. In my current context, when I want to draw a rectangle rotated by, let's say, 45° clockwise, I rotate the entire x,y axis centered at the top-left corner of the rectangle and then draw the rectangle as if nothing has happened. The same goes if I want to draw the rectangle at a random coordinate. Since it is the coordinate system that gets translated and rotated, the rectangle always thinks it's being drawn at (0,0) with 0°. Therefore, the best way to find out if a given point is inside the rectangle would be to project the point based on the translation + rotation of the rectangle. But I have no idea how to do that.
This is what I currently do in order to find out if a point is inside a rectangle (not taking into consideration rotation):
bool Image::isPointInsideRectangle(int x, int y, const ofRectangle & rectangle){
    return x - xOffset >= rectangle.getX() && x - xOffset <= rectangle.getX() + rectangle.getWidth() &&
           y - yOffset >= rectangle.getY() && y - yOffset <= rectangle.getY() + rectangle.getHeight();
}
I already have angleInDegrees stored, as long as I could use it to project the (x,y) point I receive I should be able find out if the point is inside the rectangle.
Cheers!
Axel
The easiest way is to un-rotate x,y in the reverse direction relative to the origin and rotation of the rectangle.
For example, if angleInDegrees is 45 degrees, you would rotate the point to test -45 degrees (or 315 degrees if your rotation routine only allows positive rotations). This will plot the x,y on the same coordinate system as the unrotated rectangle.
Then, you can use the function you already provided to test whether the point is within the rectangle.
Note that prior to rotating x,y, you will probably need to adjust x,y relative to the point of rotation - the upper-left corner of the rectangle - since the rotation is relative to that point rather than the overall coordinate origin (0,0). You can compute the difference between x,y and the upper-left corner of your rectangle (which doesn't change during rotation), rotate the adjusted point by -angleInDegrees, then add the corner offset back to get absolute coordinates in your coordinate system.
Edited:
#include <cmath>

bool Image::isPointInsideRectangle(int x, int y, const ofRectangle & rectangle){
    // Convert the angle to radians and negate it to un-rotate the point.
    // (cosd/sind do not exist in C++; use cos/sin consistently.)
    double rad = -angleInDegrees * M_PI / 180.0;
    double rx = x * cos(rad) - y * sin(rad) + xOffset;
    double ry = x * sin(rad) + y * cos(rad) + yOffset;
    return rx >= rectangle.getX() && rx <= rectangle.getX() + rectangle.getWidth() &&
           ry >= rectangle.getY() && ry <= rectangle.getY() + rectangle.getHeight();
}
Like you have already said, you can translate the coordinates of your point into the space of the rectangle. This is a common task in many software products that work with geometry. Each object has its own coordinate space and works as if it were at position (0, 0) without rotation. If your rectangle is at position v and rotated by b degrees/radians, then you can translate your point P into the space of the rectangle with the following formula:
| cos(-b) -sin(-b) | | P_x - v_x |
| | ⋅ | |
| sin(-b) cos(-b) | | P_y - v_y |
Many of the most important transformations can be represented as matrices, at least if you are using homogeneous coordinates, and it is very common to do so. Depending on the complexity and goals of your program, you could consider using a math library like glm and storing the transformations of your objects as matrices. Then you could write something like inverse(rectangle.transformation()) * point to get the point translated into the space of the rectangle.
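Putting the two ideas together, here is a self-contained sketch of the whole test: move the point relative to the rotation pivot (the rectangle's top-left corner), apply the inverse rotation, then do the plain axis-aligned check. Names are illustrative, not from openFrameworks, and a counterclockwise-positive angle convention is assumed.

```cpp
#include <cmath>

// Is (px, py) inside a rectW x rectH rectangle whose top-left corner
// sits at (rectX, rectY) and which is rotated by angleDegrees about
// that corner?
bool pointInRotatedRect(double px, double py,
                        double rectX, double rectY,
                        double rectW, double rectH,
                        double angleDegrees) {
    const double PI = 3.14159265358979323846;
    double rad = -angleDegrees * PI / 180.0;     // inverse rotation
    double dx = px - rectX, dy = py - rectY;     // relative to the pivot
    double lx = dx * std::cos(rad) - dy * std::sin(rad);
    double ly = dx * std::sin(rad) + dy * std::cos(rad);
    return lx >= 0 && lx <= rectW && ly >= 0 && ly <= rectH;
}
```

With angle 0 this degenerates to the original axis-aligned check; with any other angle the point is first mapped back into the rectangle's own coordinate space, exactly as described above.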

Translating x,y while scaling

I'm trying to graph x,y coordinates where the window size is 600px by 600px. (0,0) would be in the top left.
(300,300) middle of the window.
(600,600) would be in the bottom right.
I am trying to translate latitude/longitude in radians to pixels and then plotting them.
I'm calculating 1px = ? lat by
`fabs(lborder+rborder)/600`
I calculated lon by taking the top and bottom borders.
Then when I want to find a specific position for a specific lat or lon:
lat/(previous # calculated above)
Problem is my window goes from 0,0 to 600,600 as explained above and I can get negative points and I'm not sure how to move them and I don't know how to center them around 300,300 when the bounds change.
At the moment, as long as I make the center (0,0) in terms of (x,y), not pixels, points get plotted where they're supposed to go.
For example, if x is -1 to 1 and y is -3 to 3, (300px,300px) would be (0,0).
If I change the bounds to, say, x -.5 to 1 and y -3 to .5, (300px,300px) would be (.25, 1.25). However, applying the calculations above with these numbers:
1.5/600 = .0025 ----> 1px = .0025lat.
3.5/600 = .0058 ----> 1px = .0058lon.
Then taking the midpoint (.25,1.25):
.25/.0025 = 100px
1.25/.0058 = 215px
which is clearly not 300px,300px despite being the center of the graph.
Any ideas would be extremely helpful.
If you would like to adjust the coordinates to the center, then maybe something like this?
Coord coordsFromCenter(Coord old_coord, float height, float width)
{
    Coord new_center;
    float new_x_center = width/2.0;
    float new_y_center = height/2.0;
    new_center.set_x(old_coord.get_x() - new_x_center);
    new_center.set_y(old_coord.get_y() - new_y_center);
    return new_center;
}
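Another way to attack the original problem is to map the coordinate bounds directly to the window, so the midpoint of the bounds always lands at the window center. This sidesteps the "1px = ?" ratio entirely. A sketch, assuming a simple linear mapping (flip the y result if your y axis should grow upward):

```cpp
// Map a value in [minVal, maxVal] linearly to a pixel in [0, windowSize].
// The midpoint of the bounds maps to windowSize/2 by construction.
double toPixel(double value, double minVal, double maxVal, double windowSize) {
    return (value - minVal) / (maxVal - minVal) * windowSize;
}
```

For x bounds -0.5 to 1, the midpoint 0.25 maps to toPixel(0.25, -0.5, 1, 600) = 300 pixels, i.e. the center of the 600-pixel window, regardless of how asymmetric the bounds are.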

How to detect image gradient or normal using OpenCV

I wanted to detect ellipse in an image. Since I was learning Mathematica at that time, I asked a question here and got a satisfactory result from the answer below, which used the RANSAC algorithm to detect ellipse.
However, recently I need to port it to OpenCV, but there are some functions that only exist in Mathematica. One of the key function is the "GradientOrientationFilter" function.
Since there are five parameters for a general ellipse, I need to sample five points to determine one. However, more sampling points means a lower chance of a good guess, which leads to a lower success rate in ellipse detection. Therefore, the Mathematica answer adds another condition: the gradient of the image must be parallel to the gradient of the ellipse equation. With that approach we only need three points to determine an ellipse using least squares. The result is quite good.
However, when I try to find the image gradient using Sobel or Scharr operator in OpenCV, it is not good enough, which always leads to the bad result.
How to calculate the gradient or the tangent of an image accurately? Thanks!
Result with gradient, three points
Result without gradient, five points
----------updated----------
I did some edge detect and median blur beforehand and draw the result on the edge image. My original test image is like this:
In general, my final goal is to detect the ellipse in a scene or on an object. Something like this:
That's why I choose to use RANSAC to fit the ellipse from edge points.
As for your final goal, you may try
findContours and fitEllipse in OpenCV
The pseudo code will be
1) some image process
2) find all contours
3) fit each contours by fitEllipse
here is part of the code I used before:
[... image process ....you get a bwimage ]
vector<vector<Point> > contours;
findContours(bwimage, contours, CV_RETR_LIST, CV_CHAIN_APPROX_NONE);
for(size_t i = 0; i < contours.size(); i++)
{
    size_t count = contours[i].size();
    Mat pointsf;
    Mat(contours[i]).convertTo(pointsf, CV_32F);
    RotatedRect box = fitEllipse(pointsf);
    /* You can put some limitation about size and aspect ratio here */
    if( box.size.width > 20 &&
        box.size.height > 20 &&
        box.size.width < 80 &&
        box.size.height < 80 )
    {
        if( MAX(box.size.width, box.size.height) > MIN(box.size.width, box.size.height)*30 )
            continue;
        //drawContours(SrcImage, contours, (int)i, Scalar::all(255), 1, 8);
        ellipse(SrcImage, box, Scalar(0,0,255), 1, CV_AA);
        ellipse(SrcImage, box.center, box.size*0.5f, box.angle, 0, 360, Scalar(200,255,255), 1, CV_AA);
    }
}
imshow("result", SrcImage);
If you focus on ellipses (no other shapes), you can treat the pixel values of the ellipse as the masses of points.
Then you can calculate the moments of inertia Ixx, Iyy, Ixy to find the angle theta which rotates a general ellipse back to the canonical form (X-Xc)^2/a^2 + (Y-Yc)^2/b^2 = 1.
Then you can find Xc and Yc from the center of mass.
Then you can find a and b from min X and min Y.
--------------- update -----------
This method can apply to filled ellipse too.
More than one ellipse on a single image will fail unless you segment them first.
Let me explain more,
I will use C to represent cos(theta) and S to represent sin(theta)
After rotation to canonical form, the new X is [eq0] X=xC-yS and Y is Y=xS+yC where x and y are original positions.
The rotation will give you min IYY.
[eq1]
IYY = Sum(m*Y*Y) = Sum{m*(xS+yC)^2} = Sum{m*(x^2*S^2 + y^2*C^2 + 2*x*y*S*C)} = Ixx*S^2 + Iyy*C^2 + 2*Ixy*S*C
where Ixx = Sum(m*x*x), Iyy = Sum(m*y*y), Ixy = Sum(m*x*y).
For min IYY, d(IYY)/d(theta) = 0, that is
2*Ixx*S*C - 2*Iyy*S*C + 2*Ixy*(C*C - S*S) = 0
(Ixx - Iyy)/Ixy = (S*S - C*C)/(S*C) = S/C - C/S = Z - 1/Z
While programming, the LHS is just a number; let's call it N. Then
Z^2 - N*Z - 1 = 0
So there are two roots for Z, and hence for theta; let's call them Z1 and Z2. One minimizes IYY and the other maximizes it.
----------- pseudo code --------
Compute Ixx, Iyy, Ixy for a hollow or filled ellipse.
Compute theta1=atan(Z1) and theta2=atan(Z2)
Put These two theta into eq1 find which is smaller. Then you get theta.
Go back to those non-zero pixels, transfer them to new X and Y by the theta you found.
Find center of mass Xc Yc and min X and min Y by sort().
-------------- by hand -----------
If you need the original equation of the ellipse
Just put [eq0] into the canonical form
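The derivation and pseudo code above can be sketched as follows (a sketch, not OpenCV code; the point list and "mass" per point stand in for the non-zero pixels and their values):

```cpp
#include <cmath>
#include <vector>

struct MassPoint { double x, y, m; };  // position and "mass" (pixel value)

// Angle theta that rotates the point cloud into canonical orientation,
// via second moments about the center of mass. Solves Z^2 - N*Z - 1 = 0
// with N = (Ixx - Iyy)/Ixy and Z = tan(theta), then picks the root
// that minimizes IYY(theta) = Ixx*S^2 + Iyy*C^2 + 2*Ixy*S*C.
double canonicalAngle(const std::vector<MassPoint>& pts) {
    double M = 0, xc = 0, yc = 0;
    for (const MassPoint& p : pts) { M += p.m; xc += p.m * p.x; yc += p.m * p.y; }
    xc /= M; yc /= M;                              // center of mass
    double Ixx = 0, Iyy = 0, Ixy = 0;
    for (const MassPoint& p : pts) {
        double x = p.x - xc, y = p.y - yc;
        Ixx += p.m * x * x; Iyy += p.m * y * y; Ixy += p.m * x * y;
    }
    double N = (Ixx - Iyy) / Ixy;
    double Z1 = (N + std::sqrt(N * N + 4)) / 2;    // roots of Z^2 - N*Z - 1
    double Z2 = (N - std::sqrt(N * N + 4)) / 2;
    auto iyy = [&](double t) {                     // eq1 from above
        double S = std::sin(t), C = std::cos(t);
        return Ixx * S * S + Iyy * C * C + 2 * Ixy * S * C;
    };
    double t1 = std::atan(Z1), t2 = std::atan(Z2);
    return iyy(t1) < iyy(t2) ? t1 : t2;
}
```

For example, four unit-mass points on the axes of an ellipse whose major axis is tilted 30 degrees counterclockwise yield theta = -30 degrees, the rotation that brings it back to canonical form.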
You're using terms in an unusual way.
Normally for images, the term "gradient" is interpreted as if the image is a mathematical function f(x,y). This gives us a (df/dx, df/dy) vector in each point.
Yet you're looking at the image as if it's a function y = f(x), and the gradient would be df(x)/dx.
Now, if you look at your image, you'll see that the two interpretations are definitely related. Your ellipse is drawn as a set of contrasting pixels, and as a result there are two sharp gradients in the image - the inner and outer. These of course correspond to the two normal vectors, and therefore are in opposite directions.
Also note that your image has pixels, so the gradient is also pixelated. The way your ellipse is drawn, with a single-pixel width, means that your local gradient takes on only values that are a multiple of 45 degrees:
▄▄ ▄▀ ▌ ▀▄
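The (df/dx, df/dy) interpretation of the gradient can be computed with Sobel kernels; this is what OpenCV's Sobel() does over a whole image. A minimal sketch at a single interior pixel of a raw array (the function name is illustrative):

```cpp
// 3x3 Sobel gradient (gx, gy) at interior pixel (r, c) of a 5x5 image.
// gx responds to horizontal intensity change (df/dx), gy to vertical
// change (df/dy); the gradient direction is atan2(gy, gx).
void sobelAt(const float img[5][5], int r, int c, float& gx, float& gy) {
    static const int kx[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
    static const int ky[3][3] = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};
    gx = gy = 0;
    for (int i = -1; i <= 1; ++i)
        for (int j = -1; j <= 1; ++j) {
            gx += kx[i + 1][j + 1] * img[r + i][c + j];
            gy += ky[i + 1][j + 1] * img[r + i][c + j];
        }
}
```

On a vertical step edge (dark left half, bright right half) this gives a purely horizontal gradient (gx > 0, gy = 0), which is the normal of the edge, matching the discussion of the two contrasting sides of the drawn ellipse.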

OpenGL - GlVertex relative/absolute position

Imagine I have a list of 2D points (x,y) that describe a 2D terrain in my simple game.
I then use glVertex() to draw all those points in GL_POINTS mode.
Then I have a Ball that also has its (x,y) coordinates.
I want the ball to have a definite size in relation to everything else (such as the terrain).
How should I set the values of the (x,y) coordinates to draw everything the size I want it?
Having a 600x400 screen.
I am also troubled because glVertex2f(1,1) will draw a point in the upper right corner, so 1 means 100% to the right or top. But the screen is 600x400, so I can't have dimensions of equal length on the x and y axes.
Since 0 is 0% (absolute left/bottom) and 1 is 100% (absolute right/top), you just have to find a point in between that will line up with the pixels.
For example. Say your ball is 20x20 pixels. This means that it is 5% of the screen tall and 3.33% of the screen wide. Therefore, the square surrounding your ball would have the following vertices:
void drawBall()
{
    // Note the float literals: (20/600) is integer division and yields 0.
    glVertex2f(ball.x - (20.0f/600)/2, ball.y - (20.0f/400)/2);
    glVertex2f(ball.x - (20.0f/600)/2, ball.y + (20.0f/400)/2);
    glVertex2f(ball.x + (20.0f/600)/2, ball.y + (20.0f/400)/2);
    glVertex2f(ball.x + (20.0f/600)/2, ball.y - (20.0f/400)/2);
}
See how I'm dividing the width of the ball by the width of the screen to get a floating point value that works with glVertex2f? Also, ball.x and ball.y should be a floating point value between 0 and 1.
I divide these numbers by 2 because I'm assuming that (ball.x, ball.y) is the coordinate of the center of the ball, so half of the addition goes on either side of the center.
You can write your own function that draws the vertices and that takes pixels in arguments:
#define WINDOW_WIDTH 600
#define WINDOW_HEIGHT 400
void glVertex_pixels(const double x,const double y){
    glVertex2d(x * 2.0 / (double)WINDOW_WIDTH - 1.0, 1.0 - y * 2.0 / (double)WINDOW_HEIGHT);
}
You can also use a macro (note there is no trailing semicolon in the definition, so the macro behaves like a normal function call):
#define WINDOW_WIDTH 600
#define WINDOW_HEIGHT 400
#define glVertex_pixels(x,y) glVertex2d((double)(x) * 2.0 / (double)WINDOW_WIDTH - 1.0, 1.0 - (double)(y) * 2.0 / (double)WINDOW_HEIGHT)
No matter which of the above codes you use, the use of this function is simple. For example, the following code draws a vertex 10 pixels from the left side and 20 pixels from the top side:
glVertex_pixels(10,20);
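The pixel-to-clip-space mapping can be isolated and sanity-checked without an OpenGL context (a sketch; the function names are illustrative):

```cpp
#define WINDOW_WIDTH 600
#define WINDOW_HEIGHT 400

// Map a pixel coordinate (origin top-left, y down) to OpenGL clip
// space [-1, 1] (origin center, y up), as in glVertex_pixels above.
double pixelToClipX(double x) { return x * 2.0 / WINDOW_WIDTH - 1.0; }
double pixelToClipY(double y) { return 1.0 - y * 2.0 / WINDOW_HEIGHT; }
```

Pixel (0,0) maps to clip (-1, 1), the window center (300, 200) maps to (0, 0), and (600, 400) maps to (1, -1), which confirms that unequal window dimensions are handled per axis.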