Computing real world distance using pixel distance - computer-vision

I have two images: one (here) where a vehicle entered the region, and another (here) where the vehicle exited the region.
I captured these images using a single CCTV camera mounted on a road.
Now I want to compute the real-world distance travelled by the vehicle in order to find its speed. I use object detection to get the bounding boxes of the car's number plate in both images, so I can compute the pixel distance between them. I can map pixel distance to real-world distance only when the image plane and the road plane are parallel to each other (I am using this technique, but it isn't giving accurate results). Since my camera is inclined at an angle to the road, I couldn't use that technique.
I have tried a few research papers but couldn't find any useful information relevant to my problem. If someone could share insights on how to do this, it would be helpful.

In this problem we have a single camera view, so there is no way to find the real-world distance between the objects using camera-view geometry alone. However, we can convert image pixels to real-world units by using reference objects whose lengths are known in real-world units.
In the captured sample image you can identify the road lane markers, as shown in the image below; knowing their lengths in real-world units, you can map pixel distances to real-world distances.
Below is a quick and basic implementation of the road-lane-marker detection approach. It will also give you contours on objects like cars and bikes in the image, but you can remove such contours by applying a mask over those objects once you know their bounding boxes.
import cv2

img = cv2.imread("road_lane.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
blur = cv2.blur(gray, (3, 3))

# Find Canny edges
edged = cv2.Canny(blur, 30, 200)

# Find contours
contours, hierarchy = cv2.findContours(edged, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)

boundRect = []
for i, c in enumerate(contours):
    # Ignore large and small contours
    if len(c) < 300 and len(c) > 100:
        box = cv2.boundingRect(c)
        # Check for vertical rectangles (lane markers are taller than wide)
        if box[2] < box[3]:
            boundRect.append(box)

for i in range(len(boundRect)):
    cv2.rectangle(img,
                  (int(boundRect[i][0]), int(boundRect[i][1])),
                  (int(boundRect[i][0] + boundRect[i][2]), int(boundRect[i][1] + boundRect[i][3])),
                  (255, 0, 0), 5)
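To go from the detected lane markers to a speed estimate, something along the following lines could be used. This is only a rough sketch, not part of the original answer: names such as marker_length_m, plate_box_entry, plate_box_exit and time_between_frames_s are hypothetical placeholders, and it assumes the vehicle moves roughly along the marker direction so that a single pixels-per-metre scale is a reasonable approximation.

import math

def box_center(box):
    # box = (x, y, w, h) as returned by cv2.boundingRect
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)

def estimate_speed(plate_box_entry, plate_box_exit, marker_box, marker_length_m, time_between_frames_s):
    # Pixels-per-metre scale from one detected lane marker of known real-world
    # length. The markers found above are vertical, so the box height is used.
    px_per_metre = marker_box[3] / marker_length_m
    (x1, y1) = box_center(plate_box_entry)
    (x2, y2) = box_center(plate_box_exit)
    distance_px = math.hypot(x2 - x1, y2 - y1)
    distance_m = distance_px / px_per_metre
    return distance_m / time_between_frames_s   # metres per second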

Related

Track Eye Pupil Position with Webcam, OpenCV, and Python

I am trying to build a robot that I can control with basic eye movements. I am pointing a webcam at my face, and depending on the position of my pupil, the robot would move a certain way. If the pupil is in the top, bottom, left corner, or right corner of the eye, the robot would move forwards, backwards, left, or right respectively.
My original plan was to use an eye Haar cascade to find my left eye. I would then use HoughCircles on the eye region to find the center of the pupil. I would determine where the pupil was in the eye by finding the distance from the center of the Hough circle to the borders of the general eye region.
So for the first part of my code, I'm hoping to be able to track the center of the eye pupil, as seen in this video: https://youtu.be/aGmGyFLQAFM?t=38
But when I run my code, it cannot consistently find the center of the pupil. The Hough circle is often drawn in the wrong area. How can I make my program consistently find the center of the pupil, even when the eye moves?
Is it possible/better/easier for me to tell my program where the pupil is at the beginning?
I've looked at some other eye tracking methods, but I cannot form a general algorithm. If anyone could help form one, that would be much appreciated!
https://arxiv.org/ftp/arxiv/papers/1202/1202.6517.pdf
import numpy as np
import cv2

face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
eye_cascade = cv2.CascadeClassifier('haarcascade_righteye_2splits.xml')

# The number signifies which camera to open
cap = cv2.VideoCapture(0)

while 1:
    ret, img = cap.read()
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    #faces = face_cascade.detectMultiScale(gray, 1.3, 5)
    eyes = eye_cascade.detectMultiScale(gray)
    for (ex, ey, ew, eh) in eyes:
        cv2.rectangle(img, (ex, ey), (ex+ew, ey+eh), (0, 255, 0), 2)
        roi_gray2 = gray[ey:ey+eh, ex:ex+ew]
        roi_color2 = img[ey:ey+eh, ex:ex+ew]
        circles = cv2.HoughCircles(roi_gray2, cv2.HOUGH_GRADIENT, 1, 20,
                                   param1=50, param2=30, minRadius=0, maxRadius=0)
        try:
            for i in circles[0, :]:
                # draw the outer circle
                cv2.circle(roi_color2, (int(i[0]), int(i[1])), int(i[2]), (255, 255, 255), 2)
                print("drawing circle")
                # draw the center of the circle
                cv2.circle(roi_color2, (int(i[0]), int(i[1])), 2, (255, 255, 255), 3)
        except Exception as e:
            print(e)
    cv2.imshow('img', img)
    k = cv2.waitKey(30) & 0xff
    if k == 27:
        break

cap.release()
cv2.destroyAllWindows()
I can see two alternatives, from some work that I did before:
Train a Haar detector to detect the eyeball, using training images with the center of the pupil at the center and the width of the eyeball as width. I found this better than using Hough circles or just the original eye detector of OpenCV (the one used in your code).
Use Dlib's face landmark points to estimate the eye region. Then use the contrast caused by the white and dark regions of the eyeball, together with contours, to estimate the center of the pupil. This produced much better results.
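For the second alternative, here is a rough, untested sketch of the idea. It assumes the standard Dlib 68-point model file shape_predictor_68_face_landmarks.dat (landmarks 36-41 outline one eye in that model); the threshold value of 40 is an illustrative assumption, not a tuned parameter.

import cv2
import numpy as np
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def pupil_center(gray):
    # Returns the (x, y) pupil position for the first detected face, or None.
    for face in detector(gray):
        shape = predictor(gray, face)
        # Landmarks 36-41 outline one eye in the 68-point model.
        pts = np.array([(shape.part(i).x, shape.part(i).y) for i in range(36, 42)], dtype=np.int32)
        x, y, w, h = cv2.boundingRect(pts)
        eye = gray[y:y+h, x:x+w]
        # The pupil/iris is the darkest part of the eye region: threshold it
        # and take the centroid of the largest dark contour.
        _, dark = cv2.threshold(eye, 40, 255, cv2.THRESH_BINARY_INV)
        contours, _ = cv2.findContours(dark, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        if not contours:
            continue
        m = cv2.moments(max(contours, key=cv2.contourArea))
        if m["m00"] == 0:
            continue
        return (x + int(m["m10"] / m["m00"]), y + int(m["m01"] / m["m00"]))
    return None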
Just replace the line where you create the HoughCircles with this:
circles = cv2.HoughCircles(roi_gray2,cv2.HOUGH_GRADIENT,1,200,param1=200,param2=1,minRadius=0,maxRadius=0)
I just changed a couple of parameters and it gives me more accuracy.
Detailed information about parameters here.

Rectangle detection / tracking using OpenCV

What I need
I'm currently working on an augmented reality kind of game. The controller that the game uses (I'm talking about the physical input device here) is a mono colored, rectangular piece of paper. I have to detect the position, rotation and size of that rectangle in the capture stream of the camera. The detection should be invariant to scale and invariant to rotation about the X and Y axes.
The scale invariance is needed in case the user moves the paper away from or towards the camera. I don't need to know the distance of the rectangle, so scale invariance translates to size invariance.
The rotation invariance is needed in case the user tilts the rectangle along its local X and/or Y axis. Such a rotation changes the shape of the paper from a rectangle to a trapezoid. In this case, the oriented bounding box can be used to measure the size of the paper.
What I've done
At the beginning there is a calibration step. A window shows the camera feed and the user has to click on the rectangle. On click, the color of the pixel the mouse is pointing at is taken as reference color. The frames are converted into HSV color space to improve color distinguishing. I have 6 sliders that adjust the upper and lower thresholds for each channel. These thresholds are used to binarize the image (using opencv's inRange function).
After that I'm eroding and dilating the binary image to remove noise and unite nearby chunks (using opencv's erode and dilate functions).
The next step is finding contours (using opencv's findContours function) in the binary image. These contours are used to detect the smallest oriented rectangles (using opencv's minAreaRect function). As final result I'm using the rectangle with the largest area.
A short conclusion of the procedure (a rough code sketch follows the list):
Grab a frame
Convert that frame to HSV
Binarize it (using the color that the user selected and the thresholds from the sliders)
Apply morph ops (erode and dilate)
Find contours
Get the smallest oriented bounding box of each contour
Take the largest of those bounding boxes as result
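A minimal sketch of this pipeline, assuming lower and upper are the HSV threshold arrays produced by the calibration step and the sliders (the kernel size and iteration counts are illustrative, not the values actually used):

import cv2
import numpy as np

def find_paper(frame_bgr, lower, upper):
    # lower/upper: np.array([h, s, v]) thresholds from the calibration step.
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, lower, upper)                  # binarize on the reference color
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.erode(mask, kernel, iterations=2)           # remove noise
    mask = cv2.dilate(mask, kernel, iterations=2)          # unite nearby chunks
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    return cv2.minAreaRect(largest)                        # ((cx, cy), (w, h), angle)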
As you may have noticed, I don't take advantage of the knowledge about the actual shape of the paper, simply because I don't know how to use this information properly.
I've also thought about using the tracking algorithms of opencv. But there were three reasons that prevented me from using them:
Scale invariance: as far as I read about some of the algorithms, some don't support different scales of the object.
Movement prediction: some algorithms use movement prediction for better performance, but the object I'm tracking moves completely random and therefore unpredictable.
Simplicity: I'm just looking for a mono colored rectangle in an image, nothing fancy like car or person tracking.
Here is a - relatively - good catch (binary image after erode and dilate)
and here is a bad one
The Question
How can I improve the detection in general and especially to be more resistant against lighting changes?
Update
Here are some raw images for testing.
Can't you just use thicker material?
Yes I can, and I already do (unfortunately I can't access these pieces at the moment). However, the problem still remains. Even if I use material like cardboard, which isn't bent as easily as paper, one can still bend it.
How do you get the size, rotation and position of the rectangle?
The minAreaRect function of opencv returns a RotatedRect object. This object contains all the data I need.
Note
Because the rectangle is mono colored, there is no possibility to distinguish between top and bottom or left and right. This means that the rotation is always in the range [0, 180], which is perfectly fine for my purposes. The ratio of the two sides of the rect is always w:h > 2:1. If the rectangle were a square, the range of rotation would change to [0, 90], but this can be considered irrelevant here.
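Since the long side always defines the orientation (w:h > 2:1), a small hypothetical helper like the following could fold the raw cv2.minAreaRect angle into that [0, 180) range. Note that OpenCV's angle convention for minAreaRect has changed between versions, so treat this purely as a sketch, not a drop-in:

def long_side_angle(rbox):
    # rbox = ((cx, cy), (w, h), angle) as returned by cv2.minAreaRect.
    (cx, cy), (w, h), angle = rbox
    if w < h:                 # re-express the angle relative to the long side
        angle += 90.0
    return angle % 180.0      # fold into [0, 180)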
As suggested in the comments I will try histogram equalization to reduce brightness issues and take a look at ORB, SURF and SIFT.
I will update on progress.
The H channel in the HSV space is the hue, and it is not sensitive to lighting changes. Red hues fall in a range of about [150, 180].
Based on the information above, I did the following:
Change into the HSV space, split out the H channel, then threshold and normalize it.
Apply morphological ops (open).
Find contours, then filter them by some properties (width, height, area, ratio and so on).
PS. I cannot fetch the images you uploaded to Dropbox because of network restrictions, so I just cropped the right side of your second image to use as the input.
import cv2
import numpy as np

imgname = "src.png"
img = cv2.imread(imgname)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

## Split the H channel in HSV, and keep only the red range
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
h, s, v = cv2.split(hsv)
h[h < 150] = 0
h[h > 180] = 0

## Normalize, then do the open morph-op
normed = cv2.normalize(h, None, 0, 255, cv2.NORM_MINMAX, cv2.CV_8UC1)
kernel = cv2.getStructuringElement(shape=cv2.MORPH_ELLIPSE, ksize=(3, 3))
opened = cv2.morphologyEx(normed, cv2.MORPH_OPEN, kernel)
res = np.hstack((h, normed, opened))
cv2.imwrite("tmp1.png", res)
Now, we get the result as this (h, normed, opened):
Then find contours and filter them.
contours = cv2.findContours(opened, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)[-2]
print(len(contours))
bboxes = []
rboxes = []
cnts = []
dst = img.copy()

for cnt in contours:
    ## Get the straight bounding rect
    bbox = cv2.boundingRect(cnt)
    x, y, w, h = bbox
    if w < 30 or h < 30 or w*h < 2000 or w > 500:
        continue

    ## Draw rect
    cv2.rectangle(dst, (x, y), (x+w, y+h), (255, 0, 0), 1, 16)

    ## Get the rotated rect
    rbox = cv2.minAreaRect(cnt)
    (cx, cy), (w, h), rot_angle = rbox
    print("rot_angle:", rot_angle)

    ## Backup
    bboxes.append(bbox)
    rboxes.append(rbox)
    cnts.append(cnt)
The result is like this:
rot_angle: -2.4540319442749023
rot_angle: -1.8476102352142334
Because of the blue rectangle tag in the source image, the card is split into two parts. A clean image, however, will pose no problem.
I know it's been a while since I asked the question. I recently continued on the topic and solved my problem (although not through rectangle detection).
Changes
Using wood to strengthen my controllers (the "rectangles") like below.
Placed 2 ArUco markers on each controller.
How it works
Convert the frame to grayscale,
downsample it (to increase performance during detection),
equalize the histogram using cv::equalizeHist,
find markers using cv::aruco::detectMarkers,
correlate markers (if multiple controllers),
analyze markers (position and rotation),
compute result and apply some error correction.
It turned out that the marker detection is very robust to lighting changes and different viewing angles which allows me to skip any calibration steps.
I placed 2 markers on each controller to increase the detection robustness even more. Both markers have to be detected only once (to measure how they correlate). After that, it's sufficient to find just one marker per controller, as the other can be extrapolated from the previously computed correlation.
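A rough Python sketch of the listed detection steps. The exact cv2.aruco API depends on the OpenCV version (the module was reorganised in 4.7), and the dictionary DICT_4X4_50 and the downscale factor are assumptions, not values from the original post:

import cv2

# Assumed dictionary; use whatever the markers were actually generated with.
aruco_dict = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
params = cv2.aruco.DetectorParameters_create()

def detect_controller_markers(frame_bgr, downscale=0.5):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    small = cv2.resize(gray, None, fx=downscale, fy=downscale)   # downsample for performance
    small = cv2.equalizeHist(small)                              # reduce lighting influence
    corners, ids, _ = cv2.aruco.detectMarkers(small, aruco_dict, parameters=params)
    # Scale the corner coordinates back to the original resolution.
    corners = [c / downscale for c in corners]
    return corners, ids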
Here is a detection result in a bright environment:
in a darker environment:
and when hiding one of the markers (the blue point indicates the extrapolated marker position):
Failures
The initial shape detection that I implemented didn't perform well. It was very fragile to lighting changes. Furthermore, it required an initial calibration step.
After the shape detection approach I tried SIFT and ORB in combination with brute-force and kNN matchers to extract and locate features in the frames. It turned out that mono colored objects don't provide many keypoints (what a surprise). The performance of SIFT was terrible anyway (ca. 10 fps @ 540p).
I drew some lines and other shapes on the controller, which resulted in more keypoints being available. However, this didn't yield huge improvements.

What accuracy should I expect from basic opencv ortho-rectification algorithms?

So, I'm taking over the work on an ortho-rectification algorithm that is intended to produce "accurate" results. I'm running into trouble trying to increase the accuracy and could use a little help.
Here is the basic approach.
Extract a calibration pattern from an image that was taken from a mobile phone.
Rectify the image based on a calibration pattern in the image
Scale the image to get the real world size of the scene around the pattern.
The calibration pattern is held against a flat surface, like a wall, counter, table, floor and the user takes a picture. With that picture, we want to measure artifacts on the same surface as the calibration pattern. We have tried this with calibration patterns ranging from the size of a credit card to a sheet of paper (8.5" x 11")
Here is an example input picture
With this resulting output image
Right now our measurements are usually within 1-2% of what we expect. This is sufficient for small areas (less than 25 cm away from the calibration pattern). However, we'd like the algorithm to scale so that we can accurately measure a 2x2 meter area. At that size, the current error is too much (2-4 cm).
Here is the algorithm we are following.
// convert original image to grayscale and perform morphological dilation to reduce false matches when finding circle grid
Mat imgGray;
cvtColor(imgOriginal, imgGray, CV_BGR2GRAY);
// find calibration pattern in original image
Size patternSize(4, 11);
vector <Point2f> circleCenters_OriginalImage;
if (!findCirclesGrid(imgGray, patternSize, circleCenters_OriginalImage, CALIB_CB_ASYMMETRIC_GRID))
{
    return false;
}
Point2f inputQuad[4];
inputQuad[0] = Point2f(circleCenters_OriginalImage[0].x, circleCenters_OriginalImage[0].y);
inputQuad[1] = Point2f(circleCenters_OriginalImage[3].x, circleCenters_OriginalImage[3].y);
inputQuad[2] = Point2f(circleCenters_OriginalImage[43].x, circleCenters_OriginalImage[43].y);
inputQuad[3] = Point2f(circleCenters_OriginalImage[40].x, circleCenters_OriginalImage[40].y);
// create model points for calibration pattern
vector <Point2f> circleCenters_ObjectSpace = GeneratePatternPointsInObjectSpace(circleCenters_OriginalImage[0], Distance(circleCenters_OriginalImage[0], circleCenters_OriginalImage[1]) / 2.0f, ioData.marker_up);
Point2f outputQuad[4];
outputQuad[0] = Point2f(circleCenters_ObjectSpace[0].x, circleCenters_ObjectSpace[0].y);
outputQuad[1] = Point2f(circleCenters_ObjectSpace[3].x, circleCenters_ObjectSpace[3].y);
outputQuad[2] = Point2f(circleCenters_ObjectSpace[43].x, circleCenters_ObjectSpace[43].y);
outputQuad[3] = Point2f(circleCenters_ObjectSpace[40].x, circleCenters_ObjectSpace[40].y);
Mat lambda(2,4,CV_32FC1);
lambda = Mat::zeros(imgOriginal.rows, imgOriginal.cols, imgOriginal.type());
lambda = getPerspectiveTransform(inputQuad, outputQuad);
warpPerspective(imgOriginal, imgOrthorectified, lambda, imgOrthorectified.size());
...
My Questions:
Is it reasonable to shoot for error < 0.25%? Is there a different algorithm that would yield more accurate results? What are the most valuable sources of error to identify and resolve?
As I've worked on this, I've also looked at removing pincushion/barrel distortion and at using homographies to find the perspective transform. The best approaches I have found so far still give 1-2% error.
Any suggestions of where to go next would be really helpful
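For reference, here is a minimal sketch of the lens-undistortion step mentioned above, written in Python for brevity. object_points and image_points are placeholders for circle-grid detections collected from several calibration photos; this illustrates the idea and is not the poster's actual code:

import cv2
import numpy as np

def undistort(img, object_points, image_points):
    # object_points / image_points: per-photo 3D model points and detected
    # 2D circle centers, e.g. collected with cv2.findCirclesGrid.
    h, w = img.shape[:2]
    ret, camera_matrix, dist_coeffs, rvecs, tvecs = cv2.calibrateCamera(
        object_points, image_points, (w, h), None, None)
    # Undistort before estimating the perspective transform, so that lens
    # distortion does not get baked into the rectified measurements.
    return cv2.undistort(img, camera_matrix, dist_coeffs)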

Can I create a transformation matrix from rotation/translation vectors?

I'm trying to deskew an image that has an element of known size. Given this image:
I can use aruco::estimatePoseBoard, which returns rotation and translation vectors. Is there a way to use that information to deskew everything that's in the same plane as the marker board? (Unfortunately my linear algebra is rudimentary at best.)
Clarification
I know how to deskew the marker board. What I want to be able to do is deskew the other things (in this case, the cloud-shaped object) in the same plane as the marker board. I'm trying to determine whether or not that's possible and, if so, how to do it. I can already put four markers around the object I want to deskew and use the detected corners as input to getPerspectiveTransform along with the known distance between them. But for our real-world application it may be difficult for the user to place markers exactly. It would be much easier if they could place a single marker board in the frame and have the software deskew the other objects.
Since you tagged OpenCV:
From the image I can see that you have detected the corners of all the black boxes. So just get the four outermost border points one way or another:
Then it is like this:
std::vector<cv::Point2f> src_points = {/*Fill your 4 corners here*/};
std::vector<cv::Point2f> dst_points = {cv::Point2f(0, 0), cv::Point2f(width, 0), cv::Point2f(width, height), cv::Point2f(0, height)};
auto H = cv::getPerspectiveTransform(src_points, dst_points);
cv::Mat cropped_image;
cv::warpPerspective(full_image, cropped_image, H, cv::Size(width, height));
I was stuck on the assumption that the destination points in the call to getPerspectiveTransform had to be the corners of the output image (as they are in Humam's suggestion). Once it dawned on me that the destination points could be somewhere within the output image I had my answer.
float boardX = 1240;
float boardY = 1570;
float boardWidth = 1730;
float boardHeight = 1400;

vector<Point2f> destinationCorners;
destinationCorners.push_back(Point2f(boardX + boardWidth, boardY));
destinationCorners.push_back(Point2f(boardX + boardWidth, boardY + boardHeight));
destinationCorners.push_back(Point2f(boardX, boardY + boardHeight));
destinationCorners.push_back(Point2f(boardX, boardY));

Mat h = getPerspectiveTransform(detectedCorners, destinationCorners);
Mat bigImage(image.size() * 3, image.type(), Scalar(0, 50, 50));
warpPerspective(image, bigImage, h, bigImage.size());
This fixed the perspective of the board and everything in its plane. (The waviness of the board is due to the fact that the paper wasn't lying flat in the original photo.)

filtering lines and curves in background subtraction in opencv

I am working on object tracking using background subtraction in OpenCV. I have taken a sample soccer video, and my goal is to track the players and filter out the bigger field markings. Due to the non-static camera, the big lines are also detected as moving, as in this image:
I made use of the Hough transform to detect lines and, after setting appropriate thresholds, was able to filter out the half-way line, so the image appeared like this:
Now I am concerned about filtering out these 2 arcs.
Question 1. What are the ways I can possibly do this? How can I make use of the difference in "properties" between an arc (long and thin) and a player (a compact blob)?
Moreover, the Hough transform function sometimes reports many false positives (detecting a tall thin player as a straight line, or even connecting 2 players to show a longer line).
Question 2. How can I specify the maximum thickness of the lines to be detected, and maintain strict enough standards to detect lines only?
Thanks.
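As an illustration of the line filtering described in the question, a minimal sketch using cv2.HoughLinesP is shown below; the fgmask name and the minLineLength / maxLineGap / thickness values are illustrative assumptions, not the asker's actual settings:

import cv2
import numpy as np

def remove_long_lines(fgmask, min_len=150, thickness=7):
    # fgmask: 8-bit binary foreground mask from background subtraction.
    lines = cv2.HoughLinesP(fgmask, rho=1, theta=np.pi / 180, threshold=80,
                            minLineLength=min_len, maxLineGap=10)
    cleaned = fgmask.copy()
    if lines is not None:
        for x1, y1, x2, y2 in lines[:, 0]:
            # Paint over each long detected segment; `thickness` caps the width
            # of the markings that get erased, so compact player blobs survive.
            cv2.line(cleaned, (x1, y1), (x2, y2), 0, thickness)
    return cleaned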
I had an old script lying around for a similar function. Unfortunately, it's Python and doesn't use the Hough transform function. Still, you may find it useful.
get_blobs is the important function while __main__ is example usage.
import cv2


def get_blobs(thresh, maxblobs, maxmu03, iterations=1):
    """
    Return a 2-tuple list of the locations of large white blobs.

    `thresh` is a black and white threshold image.
    No more than `maxblobs` will be returned.
    Moments with a mu03 larger than `maxmu03` are ignored.
    Before sampling for blobs, the image will be eroded `iterations` times.
    """
    # Kernel specifies an erosion on direct pixel neighbours.
    kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (3, 3))

    # Remove noise and thin lines by eroding/dilating blobs.
    thresh = cv2.erode(thresh, kernel, iterations=iterations)
    thresh = cv2.dilate(thresh, kernel, iterations=iterations-1)

    # Calculate the centers of the contours.
    contours = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[0]
    moments = map(cv2.moments, contours)

    # Filter out the moments that are too tall.
    moments = filter(lambda k: abs(k['mu03']) <= maxmu03, moments)

    # Select the largest moments.
    moments = sorted(moments, key=lambda k: k['m00'], reverse=True)[:maxblobs]

    # Return the centers of the moments.
    return [(m['m10'] / m['m00'], m['m01'] / m['m00']) for m in moments if m['m00'] != 0]


if __name__ == '__main__':
    # Load an image and mark the 14 largest blobs.
    image = cv2.imread('input.png')
    bwImage = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)

    trackers = get_blobs(bwImage, 14, 50000, 3)

    for tracker in trackers:
        cv2.circle(image, tuple(int(x) for x in tracker), 3, (0, 0, 255), -1)

    cv2.imwrite('output.png', image)
Starting from your first image:
The algorithm uses erosion to separate the blobs from the lines.
Moments are then used to filter out the tall and small blobs. Moments are also used to locate the center of each blob.
get_blobs returns a 2-tuple list of the locations of the players. You can see them painted on the last image.
As it stands, the script is really messy. Feel free to use it directly, but I posted it mainly to give you some ideas.