Intersection-over-union between two detections - computer-vision

I was reading the paper by Ferrari et al., specifically the "Affinity Measures" section. I understand that Ferrari et al. obtain affinity through:
Location affinity - using area of intersection-over-union between two detections
Appearance affinity - using Euclidean distances between Histograms
KLT point affinity measure
However, I have 2 main problems:
I cannot understand what is actually meant by intersection-over-union between 2 detections, or how to calculate it.
I tried a slightly different appearance affinity measure. I transformed the RGB detection into HSV, concatenated the Hue and Saturation channels into one vector, and used it to compare with other detections. However, this technique failed, as a detection of a bag had a better similarity score than a detection of the same person's head (with a different orientation).
Any suggestions or solutions to my problems described above? Thank you and your help is very much appreciated.

Try Intersection over Union
Intersection over Union is an evaluation metric used to measure the accuracy of an object detector on a particular dataset.
More formally, in order to apply Intersection over Union to evaluate an (arbitrary) object detector we need:
The ground-truth bounding boxes (i.e., the hand labeled bounding boxes from the testing set that specify where in the image our object is).
The predicted bounding boxes from our model.
Below I have included a visual example of a ground-truth bounding box versus a predicted bounding box:
The predicted bounding box is drawn in red while the ground-truth (i.e., hand labeled) bounding box is drawn in green.
In the figure above we can see that our object detector has detected the presence of a stop sign in an image.
Computing Intersection over Union is then simply the area of overlap between the two bounding boxes divided by the area of their union:
IoU = area of overlap / area of union
As long as we have these two sets of bounding boxes, we can apply Intersection over Union.
Here is the Python code:
# import the necessary packages
from collections import namedtuple
import numpy as np
import cv2

# define the `Detection` object
Detection = namedtuple("Detection", ["image_path", "gt", "pred"])

def bb_intersection_over_union(boxA, boxB):
    # determine the (x, y)-coordinates of the intersection rectangle
    xA = max(boxA[0], boxB[0])
    yA = max(boxA[1], boxB[1])
    xB = min(boxA[2], boxB[2])
    yB = min(boxA[3], boxB[3])
    # compute the area of the intersection rectangle
    # (clamped at zero so non-overlapping boxes give an IoU of 0)
    interArea = max(0, xB - xA) * max(0, yB - yA)
    # compute the area of both the prediction and ground-truth rectangles
    boxAArea = (boxA[2] - boxA[0]) * (boxA[3] - boxA[1])
    boxBArea = (boxB[2] - boxB[0]) * (boxB[3] - boxB[1])
    # compute the intersection over union by taking the intersection
    # area and dividing it by the sum of prediction + ground-truth
    # areas minus the intersection area
    iou = interArea / float(boxAArea + boxBArea - interArea)
    # return the intersection over union value
    return iou
The gt and pred fields are:
gt : The ground-truth bounding box.
pred : The predicted bounding box from our model.
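As a quick usage sketch (the image path and box coordinates below are made up for illustration; boxes are in (x1, y1, x2, y2) order):
det = Detection(image_path="example.jpg", gt=[39, 63, 203, 112], pred=[54, 66, 198, 114])
print(bb_intersection_over_union(det.gt, det.pred))  # approximately 0.80 for these boxes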
For more information, you can refer to this post.

1) You have two overlapping bounding boxes. You compute the intersection of the boxes, which is the area of the overlap. You compute the union of the overlapping boxes, which is the sum of the areas of the entire boxes minus the area of the overlap. Then you divide the intersection by the union. There is a function for that in the Computer Vision System Toolbox called bboxOverlapRatio.
2) Generally, you don't want to concatenate the color channels. What you want instead is a 3D histogram, where the dimensions are H, S, and V.
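Since the rest of this thread uses Python, here is a hedged OpenCV sketch of that idea; patch_a and patch_b are hypothetical BGR crops of two detections, and the bin counts and correlation metric are arbitrary choices:
import cv2
import numpy as np

def hsv_histogram(bgr_patch, bins=(8, 8, 8)):
    # convert the detection patch to HSV and build a 3D histogram over H, S and V
    hsv = cv2.cvtColor(bgr_patch, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1, 2], None, list(bins), [0, 180, 0, 256, 0, 256])
    return cv2.normalize(hist, hist).flatten()

# similarity between two detections (higher means more similar with the correlation metric)
score = cv2.compareHist(hsv_histogram(patch_a), hsv_histogram(patch_b), cv2.HISTCMP_CORREL)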

The answer above already explains IoU clearly. Here is a vectorized NumPy version that handles batches of boxes and returns 0 when two bounding boxes don't intersect.
import numpy as np

def IoU(box1: np.ndarray, box2: np.ndarray):
    """
    Calculate intersection over union.
    :param box1: box1 with shape (N,4) or (N,2,2) or (2,2) or (4,). First shape is preferred.
    :param box2: box2 with shape (N,4) or (N,2,2) or (2,2) or (4,). First shape is preferred.
    :return: IoU ratio if the boxes intersect, else 0
    """
    # first unify all boxes to shape (N,4)
    if box1.shape[-1] == 2 or len(box1.shape) == 1:
        box1 = box1.reshape(1, 4) if len(box1.shape) <= 2 else box1.reshape(box1.shape[0], 4)
    if box2.shape[-1] == 2 or len(box2.shape) == 1:
        box2 = box2.reshape(1, 4) if len(box2.shape) <= 2 else box2.reshape(box2.shape[0], 4)
    point_num = max(box1.shape[0], box2.shape[0])
    b1p1, b1p2, b2p1, b2p2 = box1[:, :2], box1[:, 2:], box2[:, :2], box2[:, 2:]
    # mask that zeroes out non-intersecting pairs
    base_mat = np.ones(shape=(point_num,))
    base_mat *= np.all(np.greater(b1p2 - b2p1, 0), axis=1)
    base_mat *= np.all(np.greater(b2p2 - b1p1, 0), axis=1)
    # intersection area
    intersect_area = np.prod(np.minimum(b2p2, b1p2) - np.maximum(b1p1, b2p1), axis=1)
    # union area
    union_area = np.prod(b1p2 - b1p1, axis=1) + np.prod(b2p2 - b2p1, axis=1) - intersect_area
    # IoU
    intersect_ratio = intersect_area / union_area
    return base_mat * intersect_ratio
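A quick usage sketch with made-up coordinates (boxes in (x1, y1, x2, y2) order):
boxes_a = np.array([[0, 0, 10, 10], [20, 20, 30, 30]], dtype=float)
boxes_b = np.array([[5, 5, 15, 15], [40, 40, 50, 50]], dtype=float)
print(IoU(boxes_a, boxes_b))  # approximately [0.143, 0.] - the second pair does not intersect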

Here's yet another solution I implemented that works for me.
Borrowed heavily from PyImageSearch
import numpy as np

def bbox_intersects(bbox_a, bbox_b):
    # corner-containment test: True if a corner of one box lies inside the other
    # (note: this misses the rare "cross" overlap where neither box contains a corner of the other)
    if bbox_b['x0'] >= bbox_a['x0'] and bbox_b['x0'] <= bbox_a['x1'] and \
            bbox_b['y0'] >= bbox_a['y0'] and bbox_b['y0'] <= bbox_a['y1']:
        # top-left of b within a
        return True
    elif bbox_b['x1'] >= bbox_a['x0'] and bbox_b['x1'] <= bbox_a['x1'] and \
            bbox_b['y1'] >= bbox_a['y0'] and bbox_b['y1'] <= bbox_a['y1']:
        # bottom-right of b within a
        return True
    elif bbox_a['x0'] >= bbox_b['x0'] and bbox_a['x0'] <= bbox_b['x1'] and \
            bbox_a['y0'] >= bbox_b['y0'] and bbox_a['y0'] <= bbox_b['y1']:
        # top-left of a within b
        return True
    elif bbox_a['x1'] >= bbox_b['x0'] and bbox_a['x1'] <= bbox_b['x1'] and \
            bbox_a['y1'] >= bbox_b['y0'] and bbox_a['y1'] <= bbox_b['y1']:
        # bottom-right of a within b
        return True
    return False

def bbox_area(x0, y0, x1, y1):
    return (x1 - x0) * (y1 - y0)

def get_bbox_iou(bbox_a, bbox_b):
    if bbox_intersects(bbox_a, bbox_b):
        x_left = max(bbox_a['x0'], bbox_b['x0'])
        x_right = min(bbox_a['x1'], bbox_b['x1'])
        y_top = max(bbox_a['y0'], bbox_b['y0'])
        y_bottom = min(bbox_a['y1'], bbox_b['y1'])
        inter_area = bbox_area(x0=x_left, x1=x_right, y0=y_top, y1=y_bottom)
        bbox_a_area = bbox_area(**bbox_a)
        bbox_b_area = bbox_area(**bbox_b)
        return inter_area / float(bbox_a_area + bbox_b_area - inter_area)
    else:
        return 0
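A usage sketch with hypothetical boxes stored as dicts:
box_a = {'x0': 0, 'y0': 0, 'x1': 10, 'y1': 10}
box_b = {'x0': 5, 'y0': 5, 'x1': 15, 'y1': 15}
print(get_bbox_iou(box_a, box_b))  # approximately 0.143 for these boxes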

Related

How to convert 2d(x,y) coordinates into 3d(x,y,z) coordinates using python and point cloud?

I have been using this GitHub repo: https://github.com/aim-uofa/AdelaiDepth/blob/main/LeReS/Minist_Test/tools/test_shape.py
and I am trying to figure out how this piece of code can be used to get x, y, z coordinates:
def reconstruct_3D(depth, f):
    """
    Reconstruct depth to 3D pointcloud with the provided focal length.
    Return:
        pcd: N X 3 array, point cloud
    """
    cu = depth.shape[1] / 2
    cv = depth.shape[0] / 2
    width = depth.shape[1]
    height = depth.shape[0]
    row = np.arange(0, width, 1)
    u = np.array([row for i in np.arange(height)])
    col = np.arange(0, height, 1)
    v = np.array([col for i in np.arange(width)])
    v = v.transpose(1, 0)
I want to use these coordinates to find the distance between 2 people in 3D for an object detection model. Does anyone have any advice?
I know how to use 2D images with YOLO to figure out the distance between 2 people, based on this link: Compute the centroid of a rectangle in python
My thinking is that I can use the bounding boxes to get the corners, find the centroid of each person's bounding box, and then compute the hypotenuse between the 2 points (which is their distance).
However, I am having a tricky time figuring out how to use a set of 3D coordinates to find the distance between 2 people. I can get the relative distance from my 2D model.
Given a 2D depth image and the camera's intrinsic matrix, you can convert each pixel (u, v) with depth d to a 3D point as:
z = d
x = (u - cx) * z / f
y = (v - cy) * z / f
where (cx, cy) is the principal point and f is the focal length.
Alternatively, you can use a third-party library like Open3D to do the same:
xyz = open3d.geometry.create_point_cloud_from_depth_image(depth, intrinsic)
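Once the two people's centroid pixels are known, their 3D positions and mutual distance follow directly. A sketch using the depth_to_pointcloud function above, with hypothetical centroid pixels:
pc = depth_to_pointcloud(depth, f)   # depth map and focal length assumed available
p1 = pc[200, 140]                    # 3D point at person 1's centroid pixel (row, col)
p2 = pc[500, 400]                    # hypothetical centroid pixel of person 2
distance = np.linalg.norm(p1 - p2)   # metric distance if the depth values are metric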

Determine if points are within a rotated rectangle (standard Python 2.7 library only) [duplicate]

This question already has answers here:
Finding whether a point lies inside a rectangle or not
(10 answers)
Closed 2 years ago.
I have a rotated rectangle with these coordinates as vertices:
1 670273 4879507
2 677241 4859302
3 670388 4856938
4 663420 4877144
And I have points with these coordinates:
670831 4867989
675097 4869543
Using only the Python 2.7 standard library, I want to determine if the points fall within the rotated rectangle.
I am not able to add additional Python libraries to my Jython implementation
What would it take to do this?
A line equation of the form ax+by+c==0 can be constructed from 2 points. For a given point to be inside a convex shape, we need to test whether it lies on the same side of every line defined by the shape's edges.
In pure Python code, taking care of writing the equations avoiding divisions, this could be as follows:
def is_on_right_side(x, y, xy0, xy1):
    x0, y0 = xy0
    x1, y1 = xy1
    a = float(y1 - y0)
    b = float(x0 - x1)
    c = - a*x0 - b*y0
    return a*x + b*y + c >= 0

def test_point(x, y, vertices):
    num_vert = len(vertices)
    is_right = [is_on_right_side(x, y, vertices[i], vertices[(i + 1) % num_vert]) for i in range(num_vert)]
    all_left = not any(is_right)
    all_right = all(is_right)
    return all_left or all_right
vertices = [(670273, 4879507), (677241, 4859302), (670388, 4856938), (663420, 4877144)]
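For the two query points from the question, this gives:
print(test_point(670831, 4867989, vertices))  # True  - the first point is inside
print(test_point(675097, 4869543, vertices))  # False - the second point is outside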
The following plot tests the code visually for several shapes. Note that for shapes with horizontal or vertical edges, the usual slope-based line equations could provoke division by zero.
import matplotlib.pyplot as plt
import numpy as np

vertices1 = [(670273, 4879507), (677241, 4859302), (670388, 4856938), (663420, 4877144)]
vertices2 = [(680000, 4872000), (680000, 4879000), (690000, 4879000), (690000, 4872000)]
vertices3 = [(655000, 4857000), (655000, 4875000), (665000, 4857000)]
k = np.arange(6)
r = 8000
vertices4 = np.vstack([690000 + r * np.cos(k * 2 * np.pi / 6), 4863000 + r * np.sin(k * 2 * np.pi / 6)]).T
all_shapes = [vertices1, vertices2, vertices3, vertices4]
for vertices in all_shapes:
    plt.plot([x for x, y in vertices] + [vertices[0][0]], [y for x, y in vertices] + [vertices[0][1]], 'g-', lw=3)
for x, y in zip(np.random.randint(650000, 700000, 1000), np.random.randint(4855000, 4880000, 1000)):
    color = 'turquoise'
    for vertices in all_shapes:
        if test_point(x, y, vertices):
            color = 'tomato'
    plt.plot(x, y, '.', color=color)
plt.gca().set_aspect('equal')
plt.show()
PS: In case you are running a 32-bit version of numpy, with this size of integers it might be necessary to convert the values to float to avoid overflow.
If this calculation needs to happen very often, the a,b,c values can be precalculated and stored. If the direction of the edges is known, only one of all_left or all_right is needed.
When the shape is fixed, a text version of the function can be generated:
def generate_test_function(vertices, is_clockwise=True, function_name='test_function'):
    ext_vert = list(vertices) + [vertices[0]]
    unequality_sign = '>=' if is_clockwise else '<='
    print(f'def {function_name}(x, y):')
    parts = []
    for (x0, y0), (x1, y1) in zip(ext_vert[:-1], ext_vert[1:]):
        a = float(y1 - y0)
        b = float(x0 - x1)
        c = a * x0 + b * y0
        parts.append(f'({a}*x + {b}*y {unequality_sign} {c})')
    print('    return', ' and '.join(parts))

vertices = [(670273, 4879507), (677241, 4859302), (670388, 4856938), (663420, 4877144)]
generate_test_function(vertices)
This would generate a function as:
def test_function(x, y):
    return (-20205.0*x + -6968.0*y >= -47543270741.0) and (-2364.0*x + 6853.0*y >= 31699798882.0) and (20206.0*x + 6968.0*y >= 47389003912.0) and (2363.0*x + -6853.0*y >= -31855406372.0)
This function can then be copy-pasted and optimized by the Jython compiler. Note that the shape doesn't need to be rectangular. Any convex shape will do, allowing you to use a tighter region.
Take three consecutive vertices A, B, C (your 1, 2, 3).
Find the lengths of sides AB and BC (lBC analogously):
lAB = sqrt((B.x - A.x)^2 + (B.y - A.y)^2)
Get the unit (normalized) direction vectors (uBC analogously):
uAB = ((B.x - A.x) / lAB, (B.y - A.y) / lAB)
For tested point P get vector BP
BP = ((P.x - B.x), (P.y - B.y))
And calculate signed distances from sides to point using cross product
SignedDistABP = Cross(BP, uAB) = BP.x * uAB.y - BP.y * uAB.x
SignedDistBCP = - Cross(BP, uBC) = - BP.x * uBC.y + BP.y * uBC.x
For points inside the rectangle, both distances should have the same sign - either negative or positive depending on the vertex order (CW or CCW) - and their absolute values should not be larger than lBC and lAB correspondingly:
Abs(SignedDistABP) <= lBC
Abs(SignedDistBCP) <= lAB
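In Python, a sketch of this recipe could look as follows; here the expected sign is fixed for the vertex order given in the question (instead of the extra minus sign above), so interior points yield non-negative signed distances:
from math import hypot

def point_in_rectangle(P, A, B, C):
    # A, B, C are three consecutive rectangle vertices (order as in the question), P is the tested point
    lAB = hypot(B[0] - A[0], B[1] - A[1])
    lBC = hypot(C[0] - B[0], C[1] - B[1])
    uAB = ((B[0] - A[0]) / lAB, (B[1] - A[1]) / lAB)   # unit direction of side AB
    uBC = ((C[0] - B[0]) / lBC, (C[1] - B[1]) / lBC)   # unit direction of side BC
    BP = (P[0] - B[0], P[1] - B[1])
    dist_from_AB = BP[0] * uAB[1] - BP[1] * uAB[0]     # signed distance of P from line AB
    dist_from_BC = BP[0] * uBC[1] - BP[1] * uBC[0]     # signed distance of P from line BC
    # with this vertex order, interior points give non-negative distances bounded by the opposite side lengths
    return 0 <= dist_from_AB <= lBC and 0 <= dist_from_BC <= lAB

A, B, C = (670273, 4879507), (677241, 4859302), (670388, 4856938)
print(point_in_rectangle((670831, 4867989), A, B, C))  # True  - inside
print(point_in_rectangle((675097, 4869543), A, B, C))  # False - outside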
As the shape is an exact rectangle, the easiest approach is to rotate all points by the angle
-arctan((4859302-4856938)/(677241-670388))
Doing so, the rectangle becomes axis-aligned and you just have to perform four coordinate comparisons. Rotations are easy to compute with complex numbers.
In fact you can simply represent all points as complex numbers, compute the vector defined by some side, and multiply everything by the conjugate.
A slightly different approach is to consider the change of coordinate frame that brings some corner to the origin and two incident sides to (1,0) and (0,1). This is an affine transformation. Then your test boils down to checking insideness to the unit square.
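A small sketch of the complex-number rotation idea, using only the standard library (the quadrilateral from the question is treated as an exact rectangle here):
def make_rect_test(vertices):
    # vertices: the four corners of the rectangle, in order
    zs = [complex(x, y) for x, y in vertices]
    rot = (zs[1] - zs[0]).conjugate()      # multiplying by this aligns side 0-1 with the x-axis
    ws = [z * rot for z in zs]
    xs, ys = [w.real for w in ws], [w.imag for w in ws]
    xmin, xmax, ymin, ymax = min(xs), max(xs), min(ys), max(ys)
    def inside(x, y):
        w = complex(x, y) * rot
        return xmin <= w.real <= xmax and ymin <= w.imag <= ymax
    return inside

inside = make_rect_test([(670273, 4879507), (677241, 4859302), (670388, 4856938), (663420, 4877144)])
print(inside(670831, 4867989), inside(675097, 4869543))  # True False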

finding shortest path given distance transform image

I am given a distance transform (below) and I need to write a program that finds the shortest path going from point A (140, 200) to point B (725, 1095), while making sure I am at least ten pixels away from any obstacle.
(Image: the distance transform of the map.)
This is what I have done so far:
I started off at the initial point and evaluated the grayscale intensity of every point around it (the 8 neighboring points, that is).
Then I moved to the point with the highest grayscale intensity of the 8 neighboring points.
Then I repeated this process, but I get random turns and not the shortest path.
Please do help me out.
Code of what I have done so far:
def find_max_neigh_location(np, img):
    maxi = 0
    x0 = 0
    y0 = 0
    for i in range(len(np)):
        if img[np[i][0]][np[i][1]][0] > maxi:
            maxi = img[np[i][0]][np[i][1]][0]
            x0 = np[i][0]
            y0 = np[i][1]
    return x0, y0
-----------------------------------------------------------------
def check_if_extremes(x, y):
    if (x == 1099 and y == 1174): return 1
    elif (y == 1174 and x != 1099): return 2
    elif (x == 1099 and y != 1174): return 3
    else: return 0
--------------------------------------------------------
def find_highest_neighbour(img, x, y, visited_points):
    val = check_if_extremes(x, y)
    if val == 1:
        neigh_points = [(x-1,y),(x-1,y-1),(x,y-1)]
        np = list(set(neigh_points) - set(visited_points))
        x0, y0 = find_max_neigh_location(np, img)
    elif val == 2:
        neigh_points = [(x-1,y),(x-1,y-1),(x,y-1),(x+1,y-1),(x+1,y)]
        np = list(set(neigh_points) - set(visited_points))
        x0, y0 = find_max_neigh_location(np, img)
    elif val == 3:
        neigh_points = [(x-1,y),(x-1,y-1),(x,y-1),(x,y+1),(x-1,y+1)]
        np = list(set(neigh_points) - set(visited_points))
        x0, y0 = find_max_neigh_location(np, img)
    elif val == 0:
        neigh_points = [(x-1,y),(x-1,y-1),(x,y-1),(x,y+1),(x+1,y),(x+1,y+1),(x,y+1),(x-1,y+1)]
        np = list(set(neigh_points) - set(visited_points))
        x0, y0 = find_max_neigh_location(np, img)
    for pt in neigh_points:
        visited_points.append(pt)
    return x0, y0, visited_points
---------------------------------------------------------
def check_if_neighbour_is_final_pt(img, x, y):
    l = [(x-1,y),(x+1,y),(x,y-1),(x,y+1),(x-1,y-1),(x+1,y+1),(x-1,y+1),(x+1,y-1)]
    if (725, 1095) in l:
        return True
    else:
        return False
--------------------------------------------------------------
x = 140
y = 200
pos = []
count = 0
visited_points = [(x, y)]
keyword = True
while keyword:
    val = check_if_neighbour_is_final_pt(img, x, y)
    if val == True:
        keyword = False
    if val == False:
        count = count + 1
        x, y, visited_points = find_highest_neighbour(img, x, y, visited_points)
        img[x][y] = [255, 0, 0]
        cv2.imwrite("img\distance_transform_result__" + str(count) + ".png", img)
As you did not comment your code at all I won't read through your code.
I'll stick to what you described as your approach.
The fact that you start at point A and move to the brightest point in A's neighbourhood shows that you don't know what the distance transform does or what it is you see in your distance map... Never start coding if you don't know what you're dealing with.
Distance transform transforms a binary image into an image where each pixel's value is the minimum distance of the input image's foreground pixel to the background.
Dark pixels mean close to background (obstacles in your problem) and bright pixels are further away.
So moving to the brightest pixel nearby will only lead you away from the obstacles but never to your target point.
First restriction:
Never get closer to an obstacle than 10 pixels!
This means, every pixel that is closer to the obstacle (darker than 10) cannot be part of your path. So apply a global threshold of 10 to your distance map.
Now every white pixel can be used for your path to B.
The rest is an optimization problem. There is plenty of literature on shortest path algorithms online. I'll leave that up to you...
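As one simple starting point, a minimal sketch: threshold the distance map and run a breadth-first search over the remaining free pixels. BFS gives a path that is shortest in the number of 8-connected steps, which only approximates the true Euclidean shortest path; dist_map, start and goal below are placeholder names, and dist_map is assumed to be the single-channel distance transform:
import numpy as np
from collections import deque

def shortest_path(dist_map, start, goal, min_clearance=10):
    # free space: every pixel whose distance-transform value is at least min_clearance
    free = dist_map >= min_clearance
    h, w = free.shape
    prev = np.full((h, w, 2), -1, dtype=int)
    seen = np.zeros((h, w), dtype=bool)
    q = deque([start])
    seen[start] = True
    while q:
        y, x = q.popleft()
        if (y, x) == goal:
            break
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and free[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    prev[ny, nx] = (y, x)
                    q.append((ny, nx))
    # walk back from the goal to recover the path (empty if the goal was never reached)
    path, node = [], goal
    while seen[node]:
        path.append(node)
        node = tuple(int(i) for i in prev[node])
        if node == (-1, -1):
            break
    return path[::-1]

# assuming the question's A(140, 200) and B(725, 1095) are (x, y), the (row, col) arguments become:
path = shortest_path(dist_map, start=(200, 140), goal=(1095, 725))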

Counting the points which intersect a line with opencv python

I am working on vehicle counting with OpenCV and Python. I have already completed these steps:
1. Detect moving vehicles with BackgroundSubtractorMOG2
2. Draw a rectangle around each one, then find its centroid
3. Draw a line (to indicate the counting boundary)
If a centroid crosses/intersects the line, I want to count it as 1. But in my code, sometimes it counts and sometimes it doesn't. Here is the line code:
cv2.line(frame,(0,170),(300,170),(200,200,0),2)
and here is the centroid:
if w > 20 and h > 25:
    cv2.rectangle(frame, (x,y), (x+w,y+h), (180, 1, 0), 1)
    x1 = w/2
    y1 = h/2
    cx = x + x1
    cy = y + y1
    centroid = (cx, cy)
    cv2.circle(frame, (int(cx), int(cy)), 4, (0,255,0), -1)
my counting code:
if cy == 170:
    counter = counter + 1
Can anyone help me, please? Thank you for your advice!
Here is my approach that would work independently of the video frame rate. Assuming that you are able to track a car's centroid at each frame, I would save the last two centroids' position (last_centroid and centroid in my code) and process as follows:
compute the intercepting line equation's parameters ( (a,b,c) from aX + bY + c = 0)
compute the equation's parameters of the segment line between last_centroid and centroid
find if the two lines are intersecting
if so, increment your counter
Here is how I implemented it in OpenCV (Python):
import cv2
import numpy as np
import collections

Params = collections.namedtuple('Params', ['a', 'b', 'c'])  # to store the equation of a line

def calcParams(point1, point2):  # compute the line equation's Params
    if point2[1] - point1[1] == 0:
        a = 0
        b = -1.0
    elif point2[0] - point1[0] == 0:
        a = -1.0
        b = 0
    else:
        a = (point2[1] - point1[1]) / (point2[0] - point1[0])
        b = -1.0
    c = (-a * point1[0]) - b * point1[1]
    return Params(a, b, c)

def areLinesIntersecting(params1, params2, point1, point2):
    det = params1.a * params2.b - params2.a * params1.b
    if det == 0:
        return False  # lines are parallel
    else:
        x = (params2.b * -params1.c - params1.b * -params2.c) / det
        y = (params1.a * -params2.c - params2.a * -params1.c) / det
        if x <= max(point1[0], point2[0]) and x >= min(point1[0], point2[0]) and y <= max(point1[1], point2[1]) and y >= min(point1[1], point2[1]):
            print("intersecting in:", x, y)
            cv2.circle(frame, (int(x), int(y)), 4, (0, 0, 255), -1)  # intersection point
            return True  # lines are intersecting inside the line segment
        else:
            return False  # lines are intersecting but outside of the line segment

cv2.namedWindow('frame')
frame = np.zeros((240, 320, 3), np.uint8)
last_centroid = (200, 200)  # centroid of a car at t-1
centroid = (210, 180)  # centroid of a car at t
line_params = calcParams(last_centroid, centroid)
intercept_line_params = calcParams((0, 170), (300, 170))
print("Params:", line_params.a, line_params.b, line_params.c)
while(1):
    cv2.circle(frame, last_centroid, 4, (0, 255, 0), -1)  # last_centroid
    cv2.circle(frame, centroid, 4, (0, 255, 0), -1)  # current centroid
    cv2.line(frame, last_centroid, centroid, (0, 0, 255), 1)  # segment between the car centroid at t-1 and t
    cv2.line(frame, (0, 170), (300, 170), (200, 200, 0), 2)  # intercepting line
    print("AreLinesIntersecting: ", areLinesIntersecting(intercept_line_params, line_params, last_centroid, centroid))
    cv2.imshow('frame', frame)
    if cv2.waitKey(15) & 0xFF == ord('q'):
        break
cv2.destroyAllWindows()
And here are some results:
Fig1. Segment is intersecting the line (intercepting line in blue - segment line between last_centroid and centroid in red)
Fig2. Segment is NOT intersecting the line
N.B. I found the formulas to calculate the intersection point here.
I hope my approach will help to address your problem.
Assuming that the centroid will land exactly on position 170 (in x or y) is wrong, because videos generally run at 30 fps. That means you only get 30 centroid locations per second, so even when the object crosses the line, its centroid may never be exactly at 170!
To counter this, one method is to define a line margin: a band of x pixels before the actual line (y = 170) and x pixels after it.
If your object falls anywhere inside that margin, you can increment the counter, as in the sketch below. The next big part is to build a tracking mechanism in which you collect the list of points for each object and check whether it fell inside the margin.
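A minimal sketch of the margin idea; track_id is assumed to come from whatever tracking mechanism associates detections across frames, and the names and margin width are illustrative:
LINE_Y = 170      # y coordinate of the counting line
MARGIN = 10       # tolerance band in pixels around the line

counted_ids = set()   # tracks that have already been counted

def update_count(track_id, cy, counter):
    # count each tracked object once, when its centroid falls inside the margin band
    if abs(cy - LINE_Y) <= MARGIN and track_id not in counted_ids:
        counted_ids.add(track_id)
        counter += 1
    return counter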

Complex cross spectral density

mlab.csd from matplotlib: http://matplotlib.org/api/mlab_api.html#matplotlib.mlab.csd can be used to get a real-valued cross spectral density. If I want to get the phase information from the spectral density, I need a csd calculation which returns complex values. Is there one?
This is discussed e.g. in this answer: https://stackoverflow.com/a/29306730/3920342
If you use csd of the mlab library you will get complex values, so you can calculate phase angles (and the real-valued coherence). In the following code s1 and s2 contain the two signals (in the time domain) to be correlated.
from matplotlib import mlab
import matplotlib.pyplot as plt
import numpy as np

# First create power spectral densities for normalization
(ps1, f) = mlab.psd(s1, Fs=1./dt, scale_by_freq=False)
(ps2, f) = mlab.psd(s2, Fs=1./dt, scale_by_freq=False)
plt.plot(f, ps1)
plt.plot(f, ps2)
# Then calculate cross spectral density
(csd, f) = mlab.csd(s1, s2, NFFT=256, Fs=1./dt,sides='default', scale_by_freq=False)
fig = plt.figure()
ax1 = fig.add_subplot(1, 2, 1)
# Normalize cross spectral absolute values by auto power spectral density
ax1.plot(f, np.absolute(csd)**2 / (ps1 * ps2))
ax2 = fig.add_subplot(1, 2, 2)
angle = np.angle(csd, deg=True)
angle[angle<-90] += 360
ax2.plot(f, angle)
# zoom in on frequency with maximum coherence
ax1.set_xlim(9, 11)
ax1.set_ylim(0, 1e-0)
ax1.set_title("Cross spectral density: Coherence")
ax2.set_xlim(9, 11)
ax2.set_ylim(0, 90)
ax2.set_title("Cross spectral density: Phase angle")
Here are the real and imaginary(!) parts of the cross spectral density:
The following code, taken from the question How to use the cross-spectral density to calculate the phase shift of two related signals, creates the two signals s1 and s2:
"""
Compute the coherence of two signals
"""
import numpy as np
import matplotlib.pyplot as plt
# make a little extra space between the subplots
plt.subplots_adjust(wspace=0.5)
nfft = 256
dt = 0.01
t = np.arange(0, 30, dt)
nse1 = np.random.randn(len(t)) # white noise 1
nse2 = np.random.randn(len(t)) # white noise 2
r = np.exp(-t/0.05)
cnse1 = np.convolve(nse1, r, mode='same')*dt # colored noise 1
cnse2 = np.convolve(nse2, r, mode='same')*dt # colored noise 2
# two signals with a coherent part and a random part
s1 = 0.01*np.sin(2*np.pi*10*t) + cnse1
s2 = 0.01*np.sin(2*np.pi*10*t) + cnse2