Finding distance of rectangle with known aspect ratio in OpenCV - c++

I'm working on an OpenCV program to find the distance from the camera to a rectangle with a known aspect ratio. Finding the distance to a rectangle as seen from a forward-facing view works just fine:
The actual distance is very close to the distance calculated by this:
$$d = c \cdot \frac{w_{\text{target}} \cdot p_{\text{image}}}{2 \cdot p_{\text{target}} \cdot \tan(\theta_{\text{fov}} / 2)}$$
Where w_target is the actual width (in inches) of the target, p_image is the pixel width of the overall image, p_target is the largest width (in pixels) of the detected quadrilateral, and θ_fov is the FOV of our webcam. This is then multiplied by some constant c.
The issue occurs when the target rectangle is viewed from a perspective that isn't forward-facing:
The difference in actual distance between these two orientations is small, but the detected distance differs by almost 2 feet.
What I'd like to know is how to calculate the distance consistently, accounting for different perspective angles. I've experimented with getPerspectiveTransform, but that requires me to know the resulting scale of the target - I only know the aspect ratio.

Here's what you know:
The distance between the top left and top right corners in inches (w_target)
The distance between those corners in pixels on a 2D plane (p_target)
So the trouble is that you're not accounting for p_target shrinking when the rectangle is at an angle. For example, when the rectangle is turned 45 degrees, p_target drops to roughly cos(45°) ≈ 71% of its head-on value (ignoring perspective effects), but your formula assumes the full w_target is still spread across those pixels, so you overestimate the distance.
To account for this, you should estimate the angle the box is turned. I'm not sure of an easy way to extract that information out of getPerspectiveTransform, but it may be possible. You could set up a constrained optimization where the decision variables are distance and skew angle and enforce a metric distance between the left and right points on the box.
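As a rough illustration of the angle idea (not something taken from getPerspectiveTransform), here is a minimal sketch that uses the known aspect ratio to estimate the turn angle. It assumes the rectangle is rotated only about its vertical axis, so its pixel height is unaffected, and it ignores perspective distortion. The function names are made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// The question's formula: d = c * w_target * p_image / (2 * p_target * tan(fov / 2)).
double distanceFromWidth(double wTarget, double pImage, double pTarget,
                         double fovRad, double c = 1.0) {
    assert(pTarget > 0.0);
    return c * wTarget * pImage / (2.0 * pTarget * std::tan(fovRad / 2.0));
}

// Assumption: rotation about the vertical axis only, so
// cos(yaw) ~= (observed width / observed height) / known aspect ratio.
double estimateYaw(double pWidth, double pHeight, double knownAspect) {
    assert(pHeight > 0.0 && knownAspect > 0.0);
    double ratio = (pWidth / pHeight) / knownAspect;
    // Keep acos() defined and cos(yaw) away from zero.
    ratio = std::min(1.0, std::max(1e-6, ratio));
    return std::acos(ratio);
}

// Undo the foreshortening of the pixel width, then apply the formula.
double distanceYawCorrected(double wTarget, double pImage, double pWidth,
                            double pHeight, double knownAspect,
                            double fovRad, double c = 1.0) {
    const double yaw = estimateYaw(pWidth, pHeight, knownAspect);
    const double pHeadOn = pWidth / std::cos(yaw);
    return distanceFromWidth(wTarget, pImage, pHeadOn, fovRad, c);
}
```

With a 2:1 target turned 45 degrees, the observed width shrinks by cos(45°), and distanceYawCorrected returns the same value as the head-on formula would. For real images, where tilt and perspective also matter, a full pose estimate from a calibrated camera is likely more robust.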
Finally, no matter what you are doing, you should make sure your camera is calibrated. Depending on your application, you might be able to use AprilTags to just solve your problem.


AprilTag Localization Expected Accuracy

I am using the University of Michigan AprilTag library for localizing objects and am seeking advice for meeting my localization accuracy goals. I am using a 0.4-megapixel camera on tags that are roughly 7.5 cm wide, from distances of 0.1-1.5 meters. I have used MATLAB to calibrate my camera intrinsics and distortion coefficients.
Desired Outcome
I would like to be able to localize tags to within 5 mm accuracy.
Observed Outcome
As I move the camera relative to the tag, the localization results vary. For every 100 cm I move away from the tag, I find drift in the projected location of the tag in the world of about 10 cm.
What is a reasonable expectation for the accuracy of my localization? What actions can I take to reduce the drift I am observing?
If the drift mainly appears in the Z component of the tvec and the error increases more or less linearly with distance, it is a sure sign that the focal length (fx and fy in the camera matrix) of your calibration is off.
Try the following:
check your calibration board: Is the size of the grid correct? Make sure that your printer does not scale the original file
make sure that the calibration board is fixed on a sturdy, flat surface
calibrate again and check if the values of fx and fy have changed (entries (0,0) and (1,1) in the camera matrix).
use at least 50 pictures, vary the board's angle and remove all pictures showing motion blur before calibrating
also check your detection parameters: You can try to activate para.cornerRefinementMethod = cv2.aruco.CORNER_REFINE_APRILTAG to improve corner accuracy (if you are using C++, adjust the command accordingly).
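To see why a focal length error shows up as drift that grows linearly with distance, here is a small sketch under an ideal pinhole model (no distortion, tag facing the camera). The function names and the numbers in the usage note are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Pinhole model: a tag of width W at depth Z spans w = f * W / Z pixels.
double tagWidthPx(double fTruePx, double tagWidthM, double depthM) {
    assert(depthM > 0.0);
    return fTruePx * tagWidthM / depthM;
}

// Depth recovered from that pixel width using the calibrated focal length.
// Substituting w gives Z_est = Z_true * f_cal / f_true.
double estimatedDepth(double fCalPx, double tagWidthM, double widthPx) {
    assert(widthPx > 0.0);
    return fCalPx * tagWidthM / widthPx;
}
```

For example, if the true focal length were 600 px but the calibration reported 660 px, a tag at 1.0 m would be estimated at about 1.1 m and a tag at 0.5 m at about 0.55 m. The relative error stays constant, so 10 cm of drift per 100 cm would be consistent with a focal length that is off by roughly 10%.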
(too long for a comment, so I have to post it as another answer:) This will depend on the pixel size of your sensor and the focal length of your lens (which will "scale" your actual pixel size to a "projected" pixel size). As the effective resolution changes with distance, a safe estimate would be to use the effective pixel size at 1.5 m. In terms of pixels, I would not trust marker corner accuracies below 0.3 px, as there seems to be an issue with subpixel accuracy when rotating the marker (see my open question: Understanding openCV aruco marker detection/pose estimation in detail: subpixel accuracy). Tilting the marker will also degrade the accuracy, as the precision of the determined rotation (rvec of the pose) is usually only within a few degrees. If small tilt angles occur (say, only 2°), the pose might not reflect them, so the marker will appear smaller and the distance will be overestimated. In a flat setup (provided you are not using a wide-angle lens) you might be able to get the 5 mm accuracy with a sensor > 5 MPx. But taking into account tilt and rotation of the marker, I am not sure if it will suffice...
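To put rough numbers on the "projected pixel size" argument, here is a minimal sketch under an ideal pinhole model with the tag facing the camera. The focal length in pixels is an assumption you would read from your own camera matrix, and the function names are invented for this example.

```cpp
#include <cassert>
#include <cmath>

// Size of one pixel projected onto a fronto-parallel plane at depthM (metres per pixel).
double projectedPixelSize(double depthM, double fPx) {
    assert(fPx > 0.0);
    return depthM / fPx;
}

// Lateral position error caused by a corner error of cornerErrPx pixels.
double lateralError(double depthM, double fPx, double cornerErrPx) {
    return cornerErrPx * projectedPixelSize(depthM, fPx);
}

// First-order depth error when the measured tag width is off by widthErrPx pixels.
// From Z = f * W / w it follows that |dZ| ~= Z^2 * |dw| / (f * W).
double depthError(double depthM, double fPx, double tagWidthM, double widthErrPx) {
    assert(fPx > 0.0 && tagWidthM > 0.0);
    return depthM * depthM * widthErrPx / (fPx * tagWidthM);
}
```

With a hypothetical focal length of 600 px, a 7.5 cm tag at 1.5 m and a 0.3 px error, the lateral error is about 0.75 mm, but the depth error is about 15 mm. Under these assumptions the Z component, not the lateral position, is what threatens a 5 mm target.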

How a feature descriptor for traffic sign detection works in OpenCV

I am trying to understand how the Centroid to Contour (CtC) detector works. Here is some sample code that I found on GitHub, and I am trying to understand the idea behind it. In this sample, the author is trying to detect a speed sign using CtC. There are just two important functions:
I have understood a part of the code and how it works but I have some problems in understanding how CtC_features function works.
If you can help me I would like to understand the following parts (just 3 points):
Why do we need to add PI to the angle if centroid.x > curr.x ( if (centroid.x > curr.x) ang += 3.14159; //PI )?
When we start selecting the features on line 97 we set the start angle ( double ang = - 1.57079; ). Why is this half of PI and negative? How was this value chosen?
And a more general question: how can you know that the features you select are related to the speed limit sign? You find the centroid of the image and adjust the angle in the first step, but in the second step ( while (feature_v[i].first > ang) ), once the current angle is bigger than the hard-coded angle (in the first case ang = - 1.57079), that distance is added as a feature. How do you know it belongs to the sign?
I would like to understand the idea behind this code and if someone with more experience and with some knowledge about trigonometry would help me it will be just great.
Thank you.
The code you provided is not the best, but let's see what happens.
I took this starting image of a sign:
Then, pre_process is called, which basically runs a Canny edge detector, along with some tricks which should lead to a better edge detection. I won't look into them, but this is what it returns:
Not the greatest. Maybe some parameter tuning would help.
But now, CtC_features is called, which is the scope of the question. The role of CtC_features is to obtain some features for a machine learning algorithm. This amounts to finding a numerical description of the image which would help the ML algorithm detect the sign. Such a description can be anything. Think about how someone who never saw a STOP sign and does not know how to read would describe it. They would say something like "A red, flat plate, with 8 sides and some white stuff in the middle". Based on this description, someone might be able to tell it's a STOP sign. We want to do the same, but since computers are computers, we look for numerical features. And, with them, some algorithm could be trained to "learn" what features each sign has.
So, let's see what features CtC_features obtains from the contours.
The first thing it does is to call findContours. This function takes a binary image and returns arrays of points representing the contours of the image. Basically, it takes the edges and puts them into arrays of points, with connectivity, so we know which points are connected. If we use the code from here for visualization, we can see what happens:
So, the array contours is a std::vector<std::vector<cv::Point>> and you have in each sub-array a continuous contour, here drawn with a different color.
Next, we compute the number of points (edge pixels), and do an average over their coordinates to find the centroid of the edge image. The centroid is the filled circle:
Then, we iterate over all points, and create a vector of std::pair<double, double>, recording for each point the distance from the centroid and the angle. The angle function is defined at the bottom of the file as
double angle(Point2f a, Point2f b) {
    // Angle of the line through a and b relative to the x axis, in (-PI/2, PI/2)
    return atan((a.y - b.y) / (a.x - b.x));
}
It basically computes the angle of the line from a to b with respect to the x axis, using the arctangent function. I'll let you watch a video on arctangent, but the tl;dr is that it gives you the angle from a ratio, in radians (a circle is 2 PI radians, half a circle is PI radians). The problem is that the tangent function is periodic, with a period of PI. This means that there are 2 angles on the circle (the circle of all points at the same distance around the centroid) which give you the same value. So, we compute the ratio (the ratio is, by the way, known as the tangent of the angle), apply the inverse function (arctangent) and we get an angle (corresponding to a point). But what if it's the other point? Well, we know that the other point is offset by exactly PI radians (it is diametrically opposite), so we add PI if we detect that it's the other point.
The picture below also helps understand why there are 2 points:
The tangent of the angle is the highlighted vertical distance. But the angle on the other side of the diagonal line, which intersects the circle in the bottom left, also has the same tangent. The atan function returns angles only on the right side of the center, where no 2 directions have the same tangent.
What the check does is to ask whether the point is on the left of the centroid (centroid.x > curr.x). If it is, half a circle (PI radians or 180 degrees) is added to correct the result of atan.
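A minimal standalone sketch of that correction, assuming the same convention as the check in the question (centroid.x > curr.x means the point lies to the left of the centroid). Pt is a hypothetical stand-in for cv::Point2f, and the vertical case is handled explicitly rather than relying on floating-point division by zero.

```cpp
#include <cassert>
#include <cmath>

struct Pt { double x; double y; };  // hypothetical stand-in for cv::Point2f

// Angle of point p as seen from the centroid, in [-PI/2, 3*PI/2).
double fullAngle(Pt centroid, Pt p) {
    const double kPi = std::acos(-1.0);
    const double dx = p.x - centroid.x;
    const double dy = p.y - centroid.y;
    assert(dx != 0.0 || dy != 0.0);  // the centroid itself has no direction
    if (dx == 0.0) {
        return dy > 0.0 ? kPi / 2.0 : -kPi / 2.0;  // straight up or down
    }
    double ang = std::atan(dy / dx);  // always in (-PI/2, PI/2)
    if (centroid.x > p.x) {
        ang += kPi;  // point is left of the centroid: use the opposite direction
    }
    return ang;
}
```

For a centroid at the origin, the point (1, 1) gives PI/4 while the diametrically opposite point (-1, -1) gives 5*PI/4, even though both have the same tangent. std::atan2(dy, dx) resolves the same ambiguity directly, but returns angles in [-PI, PI] instead.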
Now, we know the distance (a simple formula) and we have found (and corrected) the angle. We insert this pair into the vector feature_v, and we sort it. The sort function, called like that, sorts by the first element of the pair and then by the second, so we sort by angle, then by distance.
The interval variable:
int degree = 10;
double interval = double((double(degree) / double(360)) * 2 * 3.14159); //5 degrees interval
is simply the value of degree, converted from degrees into radians. We need radians since the angles have been computed so far in radians, while degrees are more user friendly. Yep, the comment is wrong, the interval is 10 degrees, not 5.
The ang variable defined below it is -PI / 2 (a quarter of a circle):
double ang = - 1.57079;
Now, what it does is to divide the points around the centroid into bins, based on the angle. Each bin is 10 degrees wide. This is done by iterating over the points sorted by angle and accumulating them until we get to the next bin. We are only interested in the largest distance of a point in each bin. The starting angle should be small enough that all the directions (points) are captured.
In order to understand why it starts from -PI/2, we have to get back to the trigonometric function diagram above. What happens if the angle goes like this:
See how the highlighted vertical segment goes "downwards" on the y axis. This means that its length (and implicitly the tangent) is negative here. Also, the angle is considered to be negative (otherwise there would be 2 angles on the same side of the center with the same tangent). Now, we are interested in the range of angles we have. It's all the angles on the right side of the centroid, starting from the bottom at -PI/2 to the top at PI/2. A range of PI radians, or 180 degrees. This is also written in the documentation of atan:
If no errors occur, the arc tangent of arg (arctan(arg)) in the range [-PI/2, +PI/2] radians, is returned.
So, we simply split all the possible directions (360 degrees) into buckets of 10 degrees, and take the distance of the farthest point in each bin. Since the circle has 360 degrees, we'll get 360 / 10 = 36 bins. Then, these are normalized such that the greatest value is 1. This helps a bit with the machine learning algorithm.
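The binning step can be sketched roughly as follows. This is not the code from the repository: it computes each bin index directly instead of sorting and sweeping, which should yield the same 36 maxima. It assumes the angles have already been corrected into [-PI/2, 3*PI/2), and ctcBins is a name made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// features: (angle in radians within [-PI/2, 3*PI/2), distance from the centroid).
// Returns one value per bin: the largest distance in that bin, scaled so the
// overall maximum is 1. Empty bins stay at 0.
std::vector<double> ctcBins(const std::vector<std::pair<double, double>>& features,
                            int degree = 10) {
    assert(degree > 0 && 360 % degree == 0);
    const double kPi = std::acos(-1.0);
    const int numBins = 360 / degree;
    const double interval = degree * kPi / 180.0;  // bin width in radians
    std::vector<double> bins(numBins, 0.0);
    for (const auto& f : features) {
        // Bins start at -PI/2, the smallest angle atan can return.
        int idx = static_cast<int>(std::floor((f.first + kPi / 2.0) / interval));
        idx = std::min(std::max(idx, 0), numBins - 1);  // guard against rounding at the ends
        bins[idx] = std::max(bins[idx], f.second);
    }
    const double maxDist = *std::max_element(bins.begin(), bins.end());
    if (maxDist > 0.0) {
        for (double& b : bins) b /= maxDist;
    }
    return bins;
}
```

Two points at angles 0.05 and 0.1 rad fall into the same bin, and only the larger of their two distances survives, which is how the outer border wins over anything closer to the centroid.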
How can we know if the point we selected belongs to the sign? We don't. Most computer vision algorithms make some assumptions regarding the image in order to simplify the problem. The idea of the algorithm is to determine the shape of the sign by recording the distance from the center to the edges. This makes the assumption that the centroid is roughly in the middle of the sign. Depending on the ML algorithm used, and on the training data, different levels of robustness can be obtained.
Also, it assumes that (some of) the edges can be reliably identified. See how in my image, the algorithm was not able to detect the upper left edge?
The good news is that this doesn't have to be perfect. ML algorithms know how to handle this variation (up to some extent) provided that they are appropriately trained. It doesn't have to be perfect, but it has to be good enough. In order to answer what good enough means, what are the actual limitations of the algorithm, some more testing needs to be done, as well as some understanding of the ML algorithm used. But this is also why ML is so popular in vision: it can handle a lot of variation quite well.
At the end, we basically get an array of 36 numbers, one for each of the 36 bins of 10 degrees, representing the maximum distance of a point in the bin. I assume this is because the developer of the algorithm wanted a way to capture the shape of the sign, by looking at distances from center in various directions. This assumes that no edges are detected in the background, and the sign looks something like:
The max distance is used to pick the border, rather than any other symbols on the sign.
It is not directly used here, but a possibly related reading is the Hough transform, which uses a similar parameterization to detect straight lines in an image.

OpenCV Camera to Object Horizontal Angle Calculation

So, I'm a high school student and the lead programmer on my local robotics team, and this year I decided to try out OpenCV and do some vision processing on our robot.
From my vision code, I need to know a few things about some objects on our competition field. These things are: distance (ft), horizontal angle from camera, and horizontal distance from camera (ft). Essentially, one large right triangle.
I already have the camera successfully detecting these objects and putting a boundingRect around them. With a gyroscope on our robot, we should be able to get our robot to ~90 degree angle to the object once it is detected (as it's a set angle on the field). Thus, I can calculate distance just based on an empirically made function of the area of the boundingRect of the object.
The horizontal angle of the object from the camera, however, I'm not exactly sure how to approach. Once I have that, though, I can do some simple trig and get the horizontal distance.
So here's what we have/know: Distance to object in ft, object is at ~90 degrees to camera, camera has horizontal fov of 67 degrees w/ resolution of 800x600, the real world dimensions of the object, and a boundingRect around the object.
How would I, using all of this information, calculate the horizontal angle from the camera to the object?
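One common way to get this angle, offered only as a sketch, is the pinhole camera model: derive a focal length in pixels from the horizontal FOV and the image width, then take the arctangent of the pixel offset from the image center. It assumes an undistorted image with the optical axis at the image center. horizontalAngle is a name made up for this example.

```cpp
#include <cassert>
#include <cmath>

// Horizontal angle (radians, positive to the right) of image column xPx,
// for an ideal pinhole camera whose optical axis hits the image centre.
double horizontalAngle(double xPx, double imageWidthPx, double hfovRad) {
    assert(imageWidthPx > 0.0 && hfovRad > 0.0);
    // Focal length in pixels: half the image width subtends half the FOV.
    const double focalPx = (imageWidthPx / 2.0) / std::tan(hfovRad / 2.0);
    return std::atan((xPx - imageWidthPx / 2.0) / focalPx);
}
```

Here xPx would be the horizontal center of the boundingRect (rect.x + rect.width / 2.0), with imageWidthPx = 800 and hfovRad = 67 degrees converted to radians. A column at the image center gives 0, and the right edge gives half the FOV, 33.5 degrees.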

Grouping different scale bounding boxes

I've created an OpenCV application for human detection in images.
I run my algorithm on the same image over different scales, and when detections are made, at the end I have information about the bounding box position and at which scale it was taken from. Then I want to transform that rectangle to the original scale, given that position and size will vary.
I've wrapped my head around this and I've gotten nowhere. This should be rather simple, but at the moment I am clueless.
Help anyone?
Ok, got the answer elsewhere
"What you should do is store the scale where you are at for each detection. Then transforming should be rather easy right. Imagine you have the following.
X and Y coordinates (center of bounding box) at scale 1/2 of the original. This means that you should multiply by the inverse of the scale to get the location in the original, which would be 2X, 2Y (again for the bounding box center).
So first transform the center of the bounding box, then calculate the width and height of your bounding box in the original, again by multiplying by the inverse. Then from the center, your box will span ±width_double/2 and ±height_double/2."
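A minimal sketch of that transformation, assuming each detection stores its center, size and the scale it was found at. Box and toOriginalScale are hypothetical names, not OpenCV types.

```cpp
#include <cassert>

// Hypothetical centre-based bounding box.
struct Box {
    double cx;
    double cy;
    double width;
    double height;
};

// Map a box detected on an image resized by `scale` (e.g. 0.5) back to the
// original image by multiplying everything by the inverse of the scale.
Box toOriginalScale(const Box& b, double scale) {
    assert(scale > 0.0);
    const double inv = 1.0 / scale;
    return Box{b.cx * inv, b.cy * inv, b.width * inv, b.height * inv};
}

// Left and top edges of a centre-based box, e.g. for building a rectangle.
double leftEdge(const Box& b) { return b.cx - b.width / 2.0; }
double topEdge(const Box& b) { return b.cy - b.height / 2.0; }
```

A box centered at (50, 40) with size 20x10, detected at scale 0.5, maps to a center of (100, 80) and size 40x20 in the original image.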

how to detect center of a blurry circle with opencv

I have got the following image:
There are curves on the picture.
I would like to find the center of the circles containing the curves.
I tried OpenCV and the Hough circle transform but had no results.
The natural candidate would be cvHoughCircles. Each part of each curve adds a "vote" for an X/Y/R triplet which identifies the centrepoint. Now, you only have part of the circles, so the number of votes is limited and the accuracy reduced, but you probably suspected as much.
Here's what I would try first:
Observe that if you draw rays from the true center of the circles, the local maxima of the image intensity along them occur at intervals that are independent of the ray orientation. These intervals are the differences between the lengths of the radii of consecutive circles.
So fix a number of ray directions, say 16 equally spaced in [0, pi], and define a cost function parametrized on the (xc, yc) coordinates of the center and the radii ri of the circles, with cost equal to, for example, the variance of the maxima locations along the rays, taken across the different rays.
Threshold the image.
Erode it until there is little or no noise (small blobs).
Dilate it back.
Find the big blob. If there are still some small blobs, select the one with the max area.
Use cv::moments to find its centroid.
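In OpenCV these steps would map onto calls such as cv::threshold, cv::erode, cv::dilate, cv::findContours, cv::contourArea and cv::moments. To show what the last step computes, here is a library-free sketch of the centroid from raw image moments on a binary mask. maskCentroid is a name invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Centroid (x, y) of the non-zero pixels of a binary mask, from raw moments:
// m00 = pixel count, m10 = sum of x, m01 = sum of y, centroid = (m10/m00, m01/m00).
std::pair<double, double> maskCentroid(
        const std::vector<std::vector<std::uint8_t>>& mask) {
    double m00 = 0.0, m10 = 0.0, m01 = 0.0;
    for (std::size_t y = 0; y < mask.size(); ++y) {
        for (std::size_t x = 0; x < mask[y].size(); ++x) {
            if (mask[y][x] != 0) {
                m00 += 1.0;
                m10 += static_cast<double>(x);
                m01 += static_cast<double>(y);
            }
        }
    }
    assert(m00 > 0.0);  // an empty mask has no centroid
    return std::make_pair(m10 / m00, m01 / m00);
}
```

For a 2x2 blob covering pixels (1,1) to (2,2), the centroid comes out at (1.5, 1.5). Apart from cv::moments treating a cv::Mat differently from a nested vector, the same ratios m10/m00 and m01/m00 are what you would read from the cv::Moments struct.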