Why add a small number on the bounding box? - computer-vision

I found that on the implementation of Fast(er) RCNN, there is always a small value added to the width and height of the bounding box. Why add a small number to the width and height?
For example, in Fast RCNN, cfg.EPS (default is 1e-14) is added:
ex_widths = ex_rois[:, 2] - ex_rois[:, 0] + cfg.EPS
ex_heights = ex_rois[:, 3] - ex_rois[:, 1] + cfg.EPS
ex_ctr_x = ex_rois[:, 0] + 0.5 * ex_widths
ex_ctr_y = ex_rois[:, 1] + 0.5 * ex_heights
gt_widths = gt_rois[:, 2] - gt_rois[:, 0] + cfg.EPS
gt_heights = gt_rois[:, 3] - gt_rois[:, 1] + cfg.EPS
gt_ctr_x = gt_rois[:, 0] + 0.5 * gt_widths
gt_ctr_y = gt_rois[:, 1] + 0.5 * gt_heights
In Faster-RCNN, 1.0 is added to widths and heights.
ex_widths = ex_rois[:, 2] - ex_rois[:, 0] + 1.0
ex_heights = ex_rois[:, 3] - ex_rois[:, 1] + 1.0
ex_ctr_x = ex_rois[:, 0] + 0.5 * ex_widths
ex_ctr_y = ex_rois[:, 1] + 0.5 * ex_heights
gt_widths = gt_rois[:, 2] - gt_rois[:, 0] + 1.0
gt_heights = gt_rois[:, 3] - gt_rois[:, 1] + 1.0
gt_ctr_x = gt_rois[:, 0] + 0.5 * gt_widths
gt_ctr_y = gt_rois[:, 1] + 0.5 * gt_heights

I don’t know what is happening in the first case, but in the second case looks like the left and right locations are both inside the bounding box. The number of pixels spanned must therefore include both left and right locations. This is why 1 is added.

I have a strong feeling the code will continue with the areas of the boxes and use those for calculating the IoU. If so, you need to be sure the bounding box actually has some non-zero area.

My opinion.Actually,The pixels on the "bounding box border" are need to be calculated.SO you need to exclude them(by add 1).

Related

SymPy unable to simplify solution

I am trying to solve the following system of equation using sympy.
from sympy import *
n = 4
K = 2
a = symbols(f"a_:{int(n)}", real=True)
b = symbols(f"b_:{int(n)}", real=True)
X = symbols(f"X_:{int(K)}", real=True)
Y = symbols(f"Y_:{int(K)}", real=True)
lambda_ = symbols("lambda",real=True)
mu = symbols(f"mu_:{int(K)}", real=True)
list_eq = [
# (1)
Eq(a[0] + a[1] + a[2] + a[3], 0),
Eq(a[0] + a[1], X[0]),
Eq(a[2] + a[3], X[1]),
# (2)
Eq(b[0] + b[1] + b[2] + b[3], 0),
Eq(b[0] + b[1], Y[0]),
Eq(b[2] + b[3], Y[1]),
# (3)
Eq(b[0], a[0] - lambda_ - mu[0]),
Eq(b[1], a[1] - lambda_ - mu[0]),
Eq(b[2], a[2] - lambda_ - mu[1]),
Eq(b[3], a[3] - lambda_ - mu[1]),
]
solve(list_eq, dict=True)
[{X_0: -b_2 - b_3 + mu_0 - mu_1,
X_1: b_2 + b_3 - mu_0 + mu_1,
Y_0: -b_2 - b_3,
Y_1: b_2 + b_3,
a_0: -b_1 - b_2 - b_3 + mu_0/2 - mu_1/2,
a_1: b_1 + mu_0/2 - mu_1/2,
a_2: b_2 - mu_0/2 + mu_1/2,
a_3: b_3 - mu_0/2 + mu_1/2,
b_0: -b_1 - b_2 - b_3,
lambda: -mu_0/2 - mu_1/2}]
The analytical solution for b is
b_0 = a_0 + (1/2)*(Y_0 - X_0)
b_1 = a_1 + (1/2)*(Y_0 - X_0)
b_2 = a_2 + (1/2)*(Y_1 - X_1)
b_3 = a_3 + (1/2)*(Y_1 - X_1)
However sympy does not manage to simplify the results and is still using mu_0 and mu_1 in the solution.
Is it possible to simplify those variables in the solution ?
For more details, the system i'm trying to solve is an optimization problem under constraints:
min_b || a - b ||^2 such that b_0 + b_1 + b_2 + b_3 = 0 and b_0 + b_1 = Y_0 and b_2 + b_3 = Y_1.
We assume that a_0 + a_1 + a_2 + a_3 = 0 and a_0 + a_1 = X_0 and a_2 + a_3 = X_1.
Therefore, the equations (1) are the assumptions on a and the equations (2) and (3) are the KKT equations.
You can eliminate variables from a system of linear or polynomial equations using a Groebner basis:
In [61]: G = groebner(list_eq, [*mu, lambda_, *b, *a, *X, *Y])
In [62]: for eq in G: pprint(eq)
X₁ - Y₁ + 2⋅λ + 2⋅μ₀
-X₁ + Y₁ + 2⋅λ + 2⋅μ₁
X₁ + Y₁ + 2⋅a₁ + 2⋅b₀
-X₁ + Y₁ - 2⋅a₁ + 2⋅b₁
-X₁ - Y₁ + 2⋅a₃ + 2⋅b₂
X₁ - Y₁ - 2⋅a₃ + 2⋅b₃
X₁ + a₀ + a₁
-X₁ + a₂ + a₃
X₀ + X₁
Y₀ + Y₁
Here the first two equations have mu and lambda but the others have these symbols eliminated. You can use G[2:] to get the equations that do not involve mu and lambda. The order of the symbols in a lex Groebner basis determines which symbols are eliminated first from the equations. You can solve specifically for b in terms of a, X and Y by picking out the equations involving b:
In [63]: solve(G[2:6], b)
Out[63]:
⎧ X₁ Y₁ X₁ Y₁ X₁ Y₁ X₁ Y₁ ⎫
⎨b₀: - ── - ── - a₁, b₁: ── - ── + a₁, b₂: ── + ── - a₃, b₃: - ── + ── + a₃⎬
⎩ 2 2 2 2 2 2 2 2 ⎭
This is not exactly the form you suggested but the form of solution for the problem is not unique because of the constraints among the variables it is expressed in. There are many equivalent ways to express b in terms of a, X and Y even after eliminating mu and lambda because a, X and Y are not independent (they are 8 symbols connected by 4 constraints).
Sometimes adding auxiliary equations with the pattern you desire and indicating what you don't want as a solution variable can help you get closer to what you desired:
[38] eqs = list_eq + [Y[0]-X[0]-var('z0'), Y[1]-X[1]-var('z1')]
[39] sol = Dict(solve(eqs, exclude=a, dict=True)[0]); sol

glCullFace is not correctly working, do you have answers for that?

I am rendering a terrain and because the underside will never be seen I though face culling would bring a nice performance boost. I looked up a tutorial and came to these settings:
glCullFace(GL_FRONT);
glFrontFace(GL_CCW);
The problem is, that this is only working partially. The terrain is made out of quadrilaterals which each consist out of two right triangles and I only see one of the two triangles, if I use clockwise winding, I see the other triangle. I am using indices to safe memory and I have drawn out on paper how the windings of the indices should be but somehow it isn't working.
I generate my indices like this:
mapIndices[arrayIdx + 0] = i;
mapIndices[arrayIdx + 1] = i + 1;
mapIndices[arrayIdx + 2] = i + CHUNK_SIDE_LENGHT + 1;
mapIndices[arrayIdx + 5] = mapIndices[arrayIdx + 1];
mapIndices[arrayIdx + 4] = mapIndices[arrayIdx + 2];
mapIndices[arrayIdx + 3] = mapIndices[arrayIdx + 2] + 1;
Don't get confused over the CHUNK_SIDE_LENGHT + 1, I want CHUNK_SIDE_LENGHT triangles, so I need CHUNK_SIDE_LENGHT + 1 vertices for one row.
As I said, I have drawn this order on paper and the winding is correct but OpenGL doesn't like it. Could this have something to do with the index rendering?
I cannot see any issue in your code.
The winding order of the 1st triangle is the same
0 : i
1 : i + 1
2 : i + CHUNK_SIDE_LENGHT + 1
2
+
| \
| \
| \
+-------+
0 1
as the winding order of the 2nd triangle:
3 : i + CHUNK_SIDE_LENGHT + 2
4 : i + CHUNK_SIDE_LENGHT + 1
5 : i + 1
4 3
+-------+
\ |
\ |
\ |
+
5

Normalizing data and applying colormap results in rotated image using matplotlib?

So I wanted to see if I could make fractal flames using matplotlib and figured a good test would be the sierpinski triangle. I modified a working version I had that simply performed the chaos game by normalizing the x range from -2, 2 to 0, 400 and the y range from 0, 2 to 0, 200. I also truncated the x and y coordinates to 2 decimal places and multiplied by 100 so that the coordinates could be put in to a matrix that I could apply a color map to. Here's the code I'm working on right now (please forgive the messiness):
import numpy as np
import matplotlib.pyplot as plt
import math
import random
def f(x, y, n):
N = np.array([[x, y]])
M = np.array([[1/2.0, 0], [0, 1/2.0]])
b = np.array([[.5], [0]])
b2 = np.array([[0], [.5]])
if n == 0:
return np.dot(M, N.T)
elif n == 1:
return np.dot(M, N.T) + 2*b
elif n == 2:
return np.dot(M, N.T) + 2*b2
elif n == 3:
return np.dot(M, N.T) - 2*b
def norm_x(n, minX_1, maxX_1, minX_2, maxX_2):
rng = maxX_1 - minX_1
n = (n - minX_1) / rng
rng_2 = maxX_2 - minX_2
n = (n * rng_2) + minX_2
return n
def norm_y(n, minY_1, maxY_1, minY_2, maxY_2):
rng = maxY_1 - minY_1
n = (n - minY_1) / rng
rng_2 = maxY_2 - minY_2
n = (n * rng_2) + minY_2
return n
# Plot ranges
x_min, x_max = -2.0, 2.0
y_min, y_max = 0, 2.0
# Even intervals for points to compute orbits of
x_range = np.arange(x_min, x_max, (x_max - x_min) / 400.0)
y_range = np.arange(y_min, y_max, (y_max - y_min) / 200.0)
mat = np.zeros((len(x_range) + 1, len(y_range) + 1))
random.seed()
x = 1
y = 1
for i in range(0, 100000):
n = random.randint(0, 3)
V = f(x, y, n)
x = V.item(0)
y = V.item(1)
mat[norm_x(x, -2, 2, 0, 400), norm_y(y, 0, 2, 0, 200)] += 50
plt.xlabel('x0')
plt.ylabel('y')
fig = plt.figure(figsize=(10,10))
plt.imshow(mat, cmap="spectral", extent=[-2,2, 0, 2])
plt.show()
The mathematics seem solid here so I suspect something weird is going on with how I'm handling where things should go into the 'mat' matrix and how the values in there correspond to the colormap.
If I understood your problem correctly, you need to transpose your matrix using the method .T. So just replace
fig = plt.figure(figsize=(10,10))
plt.imshow(mat, cmap="spectral", extent=[-2,2, 0, 2])
plt.show()
by
fig = plt.figure(figsize=(10,10))
ax = gca()
ax.imshow(mat.T, cmap="spectral", extent=[-2,2, 0, 2], origin="bottom")
plt.show()
The argument origin=bottom tells to imshow to have the origin of your matrix at the bottom of the figure.
Hope it helps.

Color picking in the openGL

i've been trying to implement color picking and it just aint working right. the problem is that if initially paint my model in the different colors that are used for the picking (i mean, i give each triangle different color, which is his id color), it works fine (without texture or anything .. ), but if i put texture of the model, and that when the mouse is clicked i paint the model by giving each triangle a different color, it doesnt work..
here is the code:
public int selection(int x, int y) {
GL11.glDisable(GL11.GL_LIGHTING);
GL11.glDisable(GL11.GL_TEXTURE_2D);
IntBuffer viewport = BufferUtils.createIntBuffer(16);
ByteBuffer pixelbuff = BufferUtils.createByteBuffer(16);
GL11.glGetInteger(GL11.GL_VIEWPORT, viewport);
this.render(this.mesh);
GL11.glReadPixels(x, y, 1, 1, GL11.GL_RGB, GL11.GL_UNSIGNED_BYTE, pixelbuff);
for (int m = 0; m < 3; m++)
System.out.println(pixelbuff.get(m));
GL11.glEnable(GL11.GL_TEXTURE_2D);
GL11.glEnable(GL11.GL_LIGHTING);
return 0;
}
public void render(GL_Mesh m, boolean inPickingMode)
{
GLMaterial[] materials = m.materials; // loaded from the .mtl file
GLMaterial mtl;
GL_Triangle t;
int currMtl = -1;
int i = 0;
// draw all triangles in object
for (i=0; i < m.triangles.length; ) {
t = m.triangles[i];
// activate new material and texture
currMtl = t.materialID;
mtl = (materials != null && materials.length>0 && currMtl >= 0)? materials[currMtl] : defaultMtl;
mtl.apply();
GL11.glBindTexture(GL11.GL_TEXTURE_2D, mtl.textureHandle);
// draw triangles until material changes
for ( ; i < m.triangles.length && (t=m.triangles[i])!=null && currMtl == t.materialID; i++) {
drawTriangle(t, i, inPickingMode);
}
}
}
private void drawTriangle(GL_Triangle t, int i, boolean inPickingMode) {
if (inPickingMode) {
byte[] triColor = this.triangleToColor(i);
GL11.glColor3ub((byte)triColor[2], (byte)triColor[1], (byte)triColor[0]);
}
GL11.glBegin(GL11.GL_TRIANGLES);
GL11.glTexCoord2f(t.uvw1.x, t.uvw1.y);
GL11.glNormal3f(t.norm1.x, t.norm1.y, t.norm1.z);
GL11.glVertex3f( (float)t.p1.pos.x, (float)t.p1.pos.y, (float)t.p1.pos.z);
GL11.glTexCoord2f(t.uvw2.x, t.uvw2.y);
GL11.glNormal3f(t.norm2.x, t.norm2.y, t.norm2.z);
GL11.glVertex3f( (float)t.p2.pos.x, (float)t.p2.pos.y, (float)t.p2.pos.z);
GL11.glTexCoord2f(t.uvw3.x, t.uvw3.y);
GL11.glNormal3f(t.norm3.x, t.norm3.y, t.norm3.z);
GL11.glVertex3f( (float)t.p3.pos.x, (float)t.p3.pos.y, (float)t.p3.pos.z);
GL11.glEnd();
}
as you can see, i have a selection function that's called everytime the mouse is clicked, i then disable the lightining and the texture, and then i render the scene again in the unique colors, and then read the pixles buffer, and the call of:
GL11.glReadPixels(x, y, 1, 1, GL11.GL_RGB, GL11.GL_UNSIGNED_BYTE, pixelbuff);
gives me wrong values .. and its driving me nutz !
btw, the main render function is render(mesh m, boolean inPickingMode) as u can see, you can also see that there is texture on the model before the mouse clicking ..
there are several problems with the example.
First, you're not clearing the color and depth-buffer when clicking the mouse (that causes the scene with color polygons to be mixed into the scene with textured polygons - and then it doesn't work). you need to call:
GL11.glClear(GL11.GL_COLOR_BUFFER_BIT | GL11.GL_DEPTH_BUFFER_BIT);
Second, it is probably a bad idea to use materials when color-picking. I'm not familiar with the GLMaterial class, but it might enable GL_COLOR_MATERIAL or some other stuff, which modifies the final color, even if lighting is disabled. Try this:
if(!inPickingMode) { // === add this line ===
// activate new material and texture
currMtl = t.materialID;
mtl = (materials != null && materials.length>0 && currMtl >= 0)? materials[currMtl] : defaultMtl;
mtl.apply();
GL11.glBindTexture(GL11.GL_TEXTURE_2D, mtl.textureHandle);
} // === and this line ===
Next, and that is not related to color picking, you call glBegin() too often for no good reason. You can call it in render(), before the triangle drawing loop (but that shouldn't change how the result looks like):
GL11.glBegin(GL11.GL_TRIANGLES);
// draw triangles until material changes
for ( ; i < m.triangles.length && (t=m.triangles[i])!=null && currMtl == t.materialID; i++) {
drawTriangle(t, i, inPickingMode);
}
GL11.glEnd();
--- now i am answering a little beyond the original question ---
The thing about color picking is, that the renderer has only limited number of bits to represent the colors (like as little as 5 bits per channel), so you need to use colors that do not have these bits set. It might be a bad idea to do this on a mobile device.
If your objects are simple enough (can be represented by, say a sphere, for picking), it might be a good idea to use raytracing for picking objects. It is pretty simple, the idea is that you take inverse of modelview-projection matrix, and transform points (mouse_x, mouse_y, -1) and (mouse_x, mouse_y, +1) by it, which will give you position of mouse at the near and at the far view plane, in object space. All you need to do is to subtract them to get direction of ray (origin is at the near plane), and you can pick your objects using this ray (google ray - sphere intersection).
float[] mvp = new float[16]; // this is your modelview-projection
float mouse_x, mouse_y; // those are mouse coordinates (in -1 to +1 range)
// inputs
float[] mvp_inverse = new float[16];
Matrix.invertM(mvp_inverse, 0, mvp, 0);
// inverse the matrix
float nearX = mvp_inverse[0 * 4 + 0] * mouse_x +
mvp_inverse[1 * 4 + 0] * mouse_y +
mvp_inverse[2 * 4 + 0] * -1 +
mvp_inverse[3 * 4 + 0];
float nearY = mvp_inverse[0 * 4 + 1] * mouse_x +
mvp_inverse[1 * 4 + 1] * mouse_y +
mvp_inverse[2 * 4 + 1] * -1 +
mvp_inverse[3 * 4 + 1];
float nearZ = mvp_inverse[0 * 4 + 2] * mouse_x +
mvp_inverse[1 * 4 + 2] * mouse_y +
mvp_inverse[2 * 4 + 2] * -1 +
mvp_inverse[3 * 4 + 2];
float nearW = mvp_inverse[0 * 4 + 3] * mouse_x +
mvp_inverse[1 * 4 + 3] * mouse_y +
mvp_inverse[2 * 4 + 3] * -1 +
mvp_inverse[3 * 4 + 3];
// transform the near point
nearX /= nearW;
nearY /= nearW;
nearZ /= nearW;
// dehomogenize the coordinate
float farX = mvp_inverse[0 * 4 + 0] * mouse_x +
mvp_inverse[1 * 4 + 0] * mouse_y +
mvp_inverse[2 * 4 + 0] * +1 +
mvp_inverse[3 * 4 + 0];
float farY = mvp_inverse[0 * 4 + 1] * mouse_x +
mvp_inverse[1 * 4 + 1] * mouse_y +
mvp_inverse[2 * 4 + 1] * +1 +
mvp_inverse[3 * 4 + 1];
float farZ = mvp_inverse[0 * 4 + 2] * mouse_x +
mvp_inverse[1 * 4 + 2] * mouse_y +
mvp_inverse[2 * 4 + 2] * +1 +
mvp_inverse[3 * 4 + 2];
float farW = mvp_inverse[0 * 4 + 3] * mouse_x +
mvp_inverse[1 * 4 + 3] * mouse_y +
mvp_inverse[2 * 4 + 3] * +1 +
mvp_inverse[3 * 4 + 3];
// transform the far point
farX /= farW;
farY /= farW;
farZ /= farW;
// dehomogenize the coordinate
float rayX = farX - nearX, rayY = farY - nearY, rayZ = farZ - nearZ;
// ray direction
float orgX = nearX, orgY = nearY, orgZ = nearZ;
// ray origin
And finally - a debugging suggestion: try to render with inPickingMode set to true so you can see what is it that you are actually drawing, on screen. If you see texture or lighting, then something went wrong.

How is make_heap in C++ implemented to have complexity of 3N?

I wonder what's the algorithm of make_heap in in C++ such that the complexity is 3*N? Only way I can think of to make a heap by inserting elements have complexity of O(N Log N). Thanks a lot!
You represent the heap as an array. The two elements below the i'th element are at positions 2i+1 and 2i+2. If the array has n elements then, starting from the end, take each element, and let it "fall" to the right place in the heap. This is O(n) to run.
Why? Well for n/2 of the elements there are no children. For n/4 there is a subtree of height 1. For n/8 there is a subtree of height 2. For n/16 a subtree of height 3. And so on. So we get the series n/22 + 2n/23 + 3n/24 + ... = (n/2)(1 * (1/2 + 1/4 + 1/8 + . ...) + (1/2) * (1/2 + 1/4 + 1/8 + . ...) + (1/4) * (1/2 + 1/4 + 1/8 + . ...) + ...) = (n/2) * (1 * 1 + (1/2) * 1 + (1/4) * 1 + ...) = (n/2) * 2 = n. Or, formatted maybe more readably to see the geometric series that are being summed:
n/2^2 + 2n/2^3 + 3n/2^4 + ...
= (n/2^2 + n/2^3 + n/2^4 + ...)
+ (n/2^3 + n/2^4 + ...)
+ (n/2^4 + ...)
+ ...
= n/2^2 (1 + 1/2 + 1/2^4 + ...)
+ n/2^3 (1 + 1/2 + 1/2^3 + ...)
+ n/2^4 (1 + 1/2 + 1/2^3 + ...)
+ ...
= n/2^2 * 2
+ n/2^3 * 2
+ n/2^4 * 2
+ ...
= n/2 + n/2^2 + n/2^3 + ...
= n(1/2 + 1/4 + 1/8 + ...)
= n
And the trick we used repeatedly is that we can sum the geometric series with
1 + 1/2 + 1/4 + 1/8 + ...
= (1 + 1/2 + 1/4 + 1/8 + ...) (1 - 1/2)/(1 - 1/2)
= (1 * (1 - 1/2)
+ 1/2 * (1 - 1/2)
+ 1/4 * (1 - 1/2)
+ 1/8 * (1 - 1/2)
+ ...) / (1 - 1/2)
= (1 - 1/2
+ 1/2 - 1/4
+ 1/4 - 1/8
+ 1/8 - 1/16
+ ...) / (1 - 1/2)
= 1 / (1 - 1/2)
= 1 / (1/2)
= 2
So the total number of "see if I need to fall one more, and if so which way do I fall? comparisons comes to n. But you get round-off from discretization, so you always come out to less than n sets of swaps to figure out. Each of which requires at most 3 comparisons. (Compare root to each child to see if it needs to fall, then the children to each other if the root was larger than both children.)