Intuition on YOLO bounding box width & height normalization formula


I found an article on how to compute YOLO bounding box coordinates (x, y, width, height):
http://christopher5106.github.io/object/detectors/2017/08/10/bounding-box-object-detectors-understanding-yolo.html
I don't understand the idea and intuition behind this computation. How does it lead to the width and height being normalized to (0, 1) relative to the original image?
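The question does not quote the formula, so the following is only a sketch under the assumption that the article uses the original YOLO (v1) encoding: the box centre is expressed as an offset inside the grid cell that contains it, while the width and height are divided by the full image width and height. Since a box cannot be wider or taller than the image, that division is what keeps w and h in (0, 1]. The function name, the parameter names and the numbers in the usage note are hypothetical.

```python
def encode_yolo_v1(cx, cy, box_w, box_h, img_w, img_h, grid_size):
    """Encode a pixel-space box (centre cx, cy; size box_w, box_h).

    Assumed convention (YOLO v1 style):
      - x, y: centre offset inside its grid cell, in [0, 1)
      - w, h: box size as a fraction of the whole image, in (0, 1]
    """
    cell_w = img_w / grid_size
    cell_h = img_h / grid_size

    # Index of the grid cell that contains the box centre.
    col = min(int(cx / cell_w), grid_size - 1)
    row = min(int(cy / cell_h), grid_size - 1)

    # Centre position measured in cell units, minus the cell index.
    x = cx / cell_w - col
    y = cy / cell_h - row

    # Size relative to the full image, not to the cell.
    w = box_w / img_w
    h = box_h / img_h
    return row, col, x, y, w, h
```

For a 448x448 image with a 7x7 grid, a 224x112 box centred at (224, 224) lands in cell (3, 3) with x = y = 0.5, w = 0.5 and h = 0.25.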

Related

Measure actual size of object in image given pixel dimension and distance from camera

I'm working on a computer vision application, and I need to know the actual size (height) of an object in an image, given its pixel height and its distance from the camera.
I have this setup:
The black circle on the left is the object, with a known height of 2 cm, at a distance of 18 cm from the camera (shown in red). Using the Y-component of the bounding rectangle, I got the object's pixel height, which is 50 px.
In the second setup I placed the object closer to the camera:
and I got 50 px in height. I know that the pixel height is inversely proportional to the distance. But given this information, I cannot connect these numbers to get the actual height of an unknown object placed like this:
The distance from the camera is not fixed; I need to know the actual size given the distance and the pixel height.
Any ideas? Thank you.
By the way, can I use this? Say:
y = actual height, x = pixel height, z = distance
The relationship would be: y = xz/k
Is that right?
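Under an ideal pinhole camera model (no lens distortion, object roughly parallel to the sensor), the proposed relationship y = xz/k does hold, with k playing the role of the focal length expressed in pixels. A minimal sketch, assuming k can be calibrated once from the reference object described above (2 cm tall, 18 cm away, 50 px high); the function names are hypothetical:

```python
def calibrate_k(known_height, distance, pixel_height):
    """Solve y = x * z / k for k using one reference measurement."""
    return pixel_height * distance / known_height


def estimate_height(pixel_height, distance, k):
    """Actual height y = x * z / k, in the same unit as `distance`."""
    return pixel_height * distance / k


# Reference object from the question: 2 cm tall, 18 cm away, 50 px high.
k = calibrate_k(known_height=2.0, distance=18.0, pixel_height=50.0)
```

With these numbers k is 450, so an object that measures 100 px at 9 cm would also come out as 2 cm tall. The calibration is only valid for the same camera, resolution and zoom setting.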

Using OpenCV how could I crop an image based on the x and y coordinates, and allow that x and y coordinate to be the center of the crop?

I have attempted to crop images using OpenCV. I already have the coordinates of the specific parts of the images that I want cropped. The problem is that when you crop with a rectangle in OpenCV, the coordinates you give it are the top-left corner of the rectangle, so in my case it cuts off half of a face, because my coordinates are centered on the middle of the face. Is there any way to make the given x and y coordinates the center of the rectangle, so that the crop grows from the inside out rather than from the outside in? I am also open to other suggestions for how to achieve this.

I don't know of a way to make OpenCV build the crop rectangle around a center point, but the solution is pretty simple anyway.
Right now you have something like: cv::Rect imageToCrop(X, Y, Width, Height);
Change it to: cv::Rect imageToCrop(X - (Width/2), Y - (Height/2), Width, Height);
That will center the rectangle around your X and Y.
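The same arithmetic can be sketched in Python. The clamping step is an addition that is not part of the answer: it assumes you would rather shift the rectangle back inside the image than let it run past the border. The function name and its optional img_w / img_h parameters are hypothetical.

```python
def centered_crop_rect(cx, cy, width, height, img_w=None, img_h=None):
    """Return (x, y, width, height) of a rectangle centred on (cx, cy).

    (x, y) is the top-left corner, which is what a top-left based
    rectangle type expects. If the image size is given, the rectangle
    is shifted so that it stays inside the image where possible.
    """
    x = cx - width // 2
    y = cy - height // 2

    if img_w is not None and img_h is not None:
        # Shift (not shrink) the rectangle back inside the image.
        x = max(0, min(x, img_w - width))
        y = max(0, min(y, img_h - height))

    return x, y, width, height
```

With OpenCV's Python bindings, where an image is a NumPy array, the resulting rectangle would be applied as image[y:y + height, x:x + width].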

What's a Texel-Space Height Derivative and how do I calculate it?

The specific context is here:
http://www.rorydriscoll.com/2012/01/11/derivative-maps/
It's an article about bump mapping. Specifically, the author mentions "texel-space height derivatives". I'm not sure what this means.
I believe texel space refers to the width and height of the texture/image. The texture/image itself doesn't hold colour values; it holds height values. So I imagine the texel-space height derivative would be the (x, y) slope at each pixel relative to the neighbouring pixels in the x and y directions?
How would I even calculate this? I don't understand what the size of the texture has to do with the local slope, especially considering that if a pixel is the lowest of its neighbours in the x and y directions, what would its slope be?
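One plausible reading, offered as an assumption rather than as the article's exact definition, is that a texel-space derivative is the change in height per texel step, which can be estimated with central differences. The texture size then only matters when converting to UV space: because u = x / width, the derivative with respect to u is the per-texel derivative multiplied by the width (and likewise the height for v). The sketch below uses a plain list of rows as a hypothetical height map and clamps at the borders.

```python
def texel_space_derivatives(height_map):
    """Return (dx, dy): per-texel height slopes via central differences.

    height_map is a list of rows of height values. Lookups outside the
    map are clamped to the nearest edge texel (an assumed border policy).
    """
    rows = len(height_map)
    cols = len(height_map[0])

    def h(r, c):
        r = max(0, min(r, rows - 1))
        c = max(0, min(c, cols - 1))
        return height_map[r][c]

    dx = [[(h(r, c + 1) - h(r, c - 1)) / 2.0 for c in range(cols)]
          for r in range(rows)]
    dy = [[(h(r + 1, c) - h(r - 1, c)) / 2.0 for c in range(cols)]
          for r in range(rows)]
    return dx, dy
```

With this estimate, a texel that is lower than all of its neighbours by the same amount gets a slope of zero in both directions, which matches the intuition that the bottom of a symmetric pit is locally flat.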

Problem with Multigradient brush implementation from scratch in C++ and GDI

I am trying to implement a gradient brush from scratch in C++ with GDI. I don't want to use GDI+ or any other graphics framework. I want the gradient to support any direction (arbitrary angle).
My algorithm in pseudocode:
For each pixel in the x direction
    For each pixel in the y direction
        current position = current pixel - centre // translate origin
        rotate this pixel according to the given angle
        scalingFactor = (rotated pixel + centre) / extentDistance // translate origin back
        rgbColor = startColor + scalingFactor * (endColor - startColor)
extentDistance is the length of the line that passes through the centre of the rectangle with a slope equal to the angle of the gradient.
So far so good. I can draw this and it looks nice. Unfortunately, because of the rotation step, the rectangle corners have the wrong colour. The result is perfect only for angles that are multiples of 90 degrees. The problem appears to be that the scaling factor doesn't scale over the entire size of the rectangle.
I am not sure I got my point across, because it's really hard to explain the problem without a visualisation of it.
If anyone can help or point me to some helpful material, I'd be grateful.

OK, fixed it. The problem was that when I was rotating the gradient fill (not the rectangle), I wasn't calculating the scaling factor correctly. The distance over which the gradient is scaled changes with the gradient direction. What must be done is to find where the corner points of the rectangle end up after the rotation; from those you can find the distance over which the gradient should be scaled. So what needs to be corrected in my algorithm is the extentDistance.
How to do it:
• Transform the coordinates of all four corners
• Find the smallest of the four x's and call it minX
• Find the largest of the four x's and call it maxX
• Do the same for the y's
• The distance between these two points (max and min) is the extentDistance
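The steps above can be sketched in Python. This is an illustration under one assumption that the answer does not spell out: the gradient runs along the rotated x axis, so only minX and maxX are needed for the extent. All function names are hypothetical.

```python
import math


def gradient_extent(width, height, angle_deg):
    """Return (min_x, max_x) of the rectangle's corners after rotating
    them about the centre so the gradient direction lies along +x."""
    a = math.radians(angle_deg)
    cx, cy = width / 2.0, height / 2.0
    xs = []
    for px, py in ((0, 0), (width, 0), (0, height), (width, height)):
        dx, dy = px - cx, py - cy
        # Rotated x coordinate: projection onto the gradient direction.
        xs.append(dx * math.cos(a) + dy * math.sin(a))
    return min(xs), max(xs)


def gradient_factor(px, py, width, height, angle_deg):
    """Scaling factor in [0, 1] for pixel (px, py) inside the rectangle."""
    a = math.radians(angle_deg)
    min_x, max_x = gradient_extent(width, height, angle_deg)
    dx, dy = px - width / 2.0, py - height / 2.0
    rx = dx * math.cos(a) + dy * math.sin(a)
    return (rx - min_x) / (max_x - min_x)


def lerp_color(start, end, t):
    """startColor + t * (endColor - startColor), per channel."""
    return tuple(round(s + t * (e - s)) for s, e in zip(start, end))
```

Because the extent is taken from the rotated corners, the factor reaches exactly 0 and 1 at the two extreme corners for any angle, which is what the original fixed extentDistance failed to guarantee.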

Opengl: fit a quad to screen, given the value of Z

Short Version of the question:
I want to place a quad. I know the width and height of the screen in window coordinates, and I know the Z-coordinate of the quad in 3D. I know the FOVY and the aspect ratio. The quad will be placed along the Z-axis, and my camera doesn't move (it sits at 0, 0, 0). I want to find the width and height of the quad IN 3D COORDINATES that will fit my screen exactly.
Long Version of the question:
I would like to place a quad along the Z-axis at a specified offset Z, and I would like to find the width and height of the quad that will exactly fill the entire screen.
I once saw a post on gamedev.net that used a formula similar to the following:
dist = Z * tan(FOV / 2)
Now I can't find that post any more. The formula above is similar but not the same: as I remember it, the working formula also made use of screenWidth and screenHeight, the width and height of the screen in window coordinates.
I am not really familiar with concepts like frustum, FOV and aspect, which is why I can't work out the formula on my own. Besides, I am sure I don't need gluUnProject (I tried it, but the results were way off). It's not a matter of GL calls; it's just a math formula that gives the width and height in 3D space that fill the entire screen, given the Z offset and the width and height in window coordinates.

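Assuming the FOV is the vertical field of view (measured in the Y-Z plane) and Z is the distance from the camera to the quad, then:
height = 2 * Z * tan(fov / 2)
width = height * aspect_ratio
Note that Z * tan(fov / 2) on its own is only the half-height, measured from the view axis to the top edge of the frustum.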
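As a rough check of the geometry, here is a small Python sketch that computes the full quad size, taking the full height as twice the half-height Z * tan(fov / 2). It assumes a symmetric perspective frustum (as set up by gluPerspective), a vertical FOV given in degrees, and a quad centred on the view axis; the function name is hypothetical.

```python
import math


def fullscreen_quad_size(z_distance, fovy_deg, aspect):
    """Return (width, height) of a quad that fills the view at z_distance.

    z_distance: distance from the camera to the quad along the view axis
    fovy_deg:   vertical field of view in degrees
    aspect:     viewport width divided by viewport height
    """
    half_height = abs(z_distance) * math.tan(math.radians(fovy_deg) / 2.0)
    height = 2.0 * half_height
    width = height * aspect
    return width, height
```

Note that the window size in pixels enters only through the aspect ratio (screenWidth / screenHeight); with a 90 degree FOV, a quad at distance 1 is 2 units tall.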
