Let's say I annotated every image in my dataset with 20 bounding boxes.
I want my predicted bounding boxes to number exactly 20 as well. After training, however, I get varying numbers of bounding boxes rather than 20.
I'm trying to detect the same 20 objects in an image. All the objects are identical, so I only have 1 class for all 20 bounding boxes.
I'm currently using YOLOv5, but is there a better model for a use case like this?
I suggest keeping only the 20 detected objects with the highest confidence. You can do that easily: append all detected boxes to a list together with their confidences and labels, sort the list by confidence, iterate over only the first 20 entries, and then draw the bounding boxes of those filtered objects.
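A minimal sketch of that filtering step, assuming the detections have already been collected as `(box, confidence, label)` tuples with `box = (x1, y1, x2, y2)`. The name `keep_top_k` is hypothetical, not part of YOLOv5.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[Box, float, int]       # (box, confidence, label)


def keep_top_k(detections: List[Detection], k: int = 20) -> List[Detection]:
    """Return at most k detections, ordered by descending confidence."""
    # Sort a copy so the caller's list is left untouched.
    ranked = sorted(detections, key=lambda det: det[1], reverse=True)
    # Slicing is the "range limitation": it never fails if fewer than k exist.
    return ranked[:k]


if __name__ == "__main__":
    dets = [
        ((10, 10, 50, 50), 0.91, 0),
        ((60, 10, 90, 40), 0.35, 0),
        ((15, 70, 40, 95), 0.78, 0),
    ]
    for box, conf, label in keep_top_k(dets, k=2):
        print(box, conf, label)
```

Note that this only caps the count: if the model returns fewer than 20 boxes above its confidence threshold, you will still end up with fewer than 20.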
I'm currently training a YOLO object detection model. I have two versions of the same dataset:
1) full images, with bounding boxes and labels
2) segmented instances, with labels
Which version is better to use? I'm inclined to go with the second, but I'm worried that the pixels around the object that are still inside the bounding box may be important.
As I understand it, in the YOLO algorithm we divide the input image into a grid, for example 19x19, and we need an output vector (pc, bx, by, bh, bw, c) for each cell. Then we can train the network. My question is: why do we give the network an XML file with only one bounding box, label, etc. (if there is only one object in the image) instead of giving it 19*19 = 361 of them? Does the implementation divide the image and create a vector for each cell automatically? (How does it do that?)
The same question applies to the sliding window algorithm: why do we give the network only one vector with a label and bounding box instead of giving it a vector for each sliding window?
Let's say that the output of YOLO is a 19 by 19 grid of cells, and each grid cell has some depth. Each grid cell can predict several bounding boxes, up to a maximum that depends on the configuration of the model. For example, if one grid cell can predict up to 5 bounding boxes, the model can predict 19x19x5 = 1805 bounding boxes in total.
Since this number is too large, we train the model so that only the grid cell containing the center of a bounding box predicts that box with high confidence. During training, we first find the grid cell into which the center of the true bounding box falls. We then train the model so that this cell predicts a bounding box similar to the true one with high probability, while all other grid cells predict bounding boxes with as low a probability as possible (when the probability is lower than a threshold, the prediction is discarded).
The image below shows the grid cell that contains the box center when the output has 13 by 13 grid cells.
This works the same way when a training image contains more than one object. If there are two objects in a training image, we update the two grid cells that contain the centers of the two true boxes so that they produce bounding boxes with high probability.
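To make this concrete, here is a minimal sketch of how a training pipeline might expand a short list of ground-truth boxes (one entry per object, as in the annotation file) into one target vector per grid cell. It assumes a simplified YOLOv1-style layout with one box per cell, normalized `(center_x, center_y, width, height)` coordinates in [0, 1), and the `(pc, bx, by, bh, bw, c...)` vector from the question. The name `encode_targets` and this exact layout are illustrative; real YOLO versions add anchors and differ in detail.

```python
from typing import List, Tuple

# (center_x, center_y, width, height, class_id), coordinates normalized to [0, 1)
GroundTruth = Tuple[float, float, float, float, int]


def encode_targets(boxes: List[GroundTruth], grid: int = 19, num_classes: int = 1):
    """Build a grid x grid table of target vectors [pc, bx, by, bh, bw, c_0..c_{C-1}]."""
    depth = 5 + num_classes
    # Every cell starts as "no object": pc = 0 and all other entries 0.
    target = [[[0.0] * depth for _ in range(grid)] for _ in range(grid)]
    for cx, cy, w, h, cls in boxes:
        # The responsible cell is the one containing the object's center.
        col = min(int(cx * grid), grid - 1)
        row = min(int(cy * grid), grid - 1)
        # In this simplified version, a second center in the same cell overwrites the first.
        vec = target[row][col]
        vec[0] = 1.0                # pc: an object center lies in this cell
        vec[1] = cx * grid - col    # bx: center offset within the cell
        vec[2] = cy * grid - row    # by
        vec[3] = h                  # bh: relative to the whole image
        vec[4] = w                  # bw
        vec[5 + cls] = 1.0          # one-hot class
    return target
```

For an image with two objects, only two cells end up with `pc = 1`; the loss then pushes every other cell's confidence toward 0. The per-cell vectors are derived from the annotations, which is why the annotation file needs only one entry per object.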
What is the fastest bounding box prediction algorithm that doesn't involve classification?
For example, I have a flat surface with objects on top of it. I don't need to know the type of the objects, only their bounding boxes. Something like pixel-wise segmentation into two types: ground and item.
I think that what you are looking for are models for "salient object detection" ("dealing with locating and segmenting the most salient object or region in a scene").
The output of such a model is a map of the same size as the input image, where each pixel's value is the probability that it is part of a salient object. By applying a threshold to this map, you can locate the boundaries of the object.
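As a rough illustration of that last step, the sketch below thresholds a probability map (a plain nested list standing in for the model's output) and returns one bounding box per 4-connected foreground region. The 0.5 threshold and the name `boxes_from_saliency` are assumptions for illustration; in practice a library routine such as OpenCV's `connectedComponentsWithStats` would usually replace this hand-written breadth-first search.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive pixel coordinates


def boxes_from_saliency(prob: List[List[float]], threshold: float = 0.5) -> List[Box]:
    """Threshold a per-pixel probability map and box each 4-connected region."""
    h, w = len(prob), len(prob[0])
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for y in range(h):
        for x in range(w):
            if seen[y][x] or prob[y][x] < threshold:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel,
            # tracking the extent of the region as we go.
            x0, y0, x1, y1 = x, y, x, y
            queue = deque([(x, y)])
            seen[y][x] = True
            while queue:
                cx, cy = queue.popleft()
                x0, y0 = min(x0, cx), min(y0, cy)
                x1, y1 = max(x1, cx), max(y1, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < w and 0 <= ny < h and not seen[ny][nx]
                            and prob[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes
```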
You can probably find information about the processing requirements of different models in the article "Salient Object Detection: A Survey" (it was updated just recently).
I'm processing images like this one:
The images are very large (8192x8192). I need to do two things with them.
First: get all the unique colors into an array.
Second: get the bounding box dimensions of each unique color.
I have no idea how to do either of these things on the GPU. For example, in my CPU version I use an unordered_set to record the unique pixel values. I assume the GPU doesn't have anything like that, so how do I do it? Any suggestions?
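One common way to avoid a hash set on the GPU is to recast the problem as sort-then-reduce: pair every pixel's color with its coordinates, sort by color so equal colors become adjacent, then reduce each run of equal colors to its min/max x and y. On CUDA, Thrust provides `thrust::sort_by_key` and `thrust::reduce_by_key` for this pattern. The Python sketch below only mimics that data flow on the CPU to show the logic; the name `color_bounding_boxes` and the list-of-rows image layout are assumptions. (An alternative, not shown, is a first pass that collects the unique colors and a second pass that updates per-color boxes with `atomicMin`/`atomicMax`.)

```python
from itertools import groupby
from typing import Dict, List, Tuple

Color = Tuple[int, int, int]     # (r, g, b)
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def color_bounding_boxes(image: List[List[Color]]) -> Dict[Color, Box]:
    """Return each unique color of an image mapped to its bounding box."""
    # Step 1: one (color, x, y) record per pixel (the GPU's keys and values).
    records = [(color, x, y) for y, row in enumerate(image) for x, color in enumerate(row)]
    # Step 2: sort by color so identical colors form contiguous runs
    # (the role sort_by_key would play on the GPU).
    records.sort(key=lambda rec: rec[0])
    # Step 3: reduce each run to its min/max coordinates (the role of reduce_by_key).
    boxes: Dict[Color, Box] = {}
    for color, run in groupby(records, key=lambda rec: rec[0]):
        xs, ys = [], []
        for _, x, y in run:
            xs.append(x)
            ys.append(y)
        boxes[color] = (min(xs), min(ys), max(xs), max(ys))
    return boxes
```

The array of unique colors is then simply `list(boxes)`, so both requirements come out of the same sort-and-reduce pass.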
I have a list of bounding boxes, and I'm wondering how I could determine which ones are redundant or duplicates.
The reason is that I send 2 million of these to an API, and I want to know which ones overlap others so I can reduce them until each box covers only a unique area of land, i.e. no two bounding boxes cover the same piece of geographic space.
How would I compute this so that each bounding box covers its own unique area of land?
I am writing this program in C++ btw.
I think this task is more complex than you think.
You would have to split the existing boxes until no overlaps remain, and then remove the boxes that are totally contained in another.
Instead of giving you a solution to that, I recommend checking whether you can live with:
1) remove the boxes that are totally contained in another box.
2) leave (partly-)overlapping boxes as they are.
For 2 million boxes you need a spatial index (e.g. a QuadTree) to get the list of all boxes near a given box.
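A sketch of option 1) under simplifying assumptions: boxes are planar `(min_x, min_y, max_x, max_y)` tuples, and a uniform grid of buckets stands in for the QuadTree purely because it is shorter to write. `cell_size` is a hypothetical parameter you would tune to your typical box size; a QuadTree handles widely varying box sizes better, since large boxes here get registered in many cells. Exact duplicates keep only their first occurrence.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Rect = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def contains(outer: Rect, inner: Rect) -> bool:
    """True if inner lies entirely inside outer (shared edges allowed)."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def remove_contained(boxes: List[Rect], cell_size: float) -> List[Rect]:
    """Drop every box that lies entirely inside another box."""
    def cell(v: float) -> int:
        return math.floor(v / cell_size)

    # Register each box index in every grid cell its extent touches.
    grid: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for i, (x0, y0, x1, y1) in enumerate(boxes):
        for cx in range(cell(x0), cell(x1) + 1):
            for cy in range(cell(y0), cell(y1) + 1):
                grid[(cx, cy)].append(i)

    removed: Set[int] = set()
    for i, box in enumerate(boxes):
        # Any box that contains `box` must also cover the cell of box's min corner,
        # so that single bucket holds every candidate container.
        for j in grid.get((cell(box[0]), cell(box[1])), []):
            if j != i and contains(boxes[j], box):
                # For exact duplicates, drop only the later one.
                if boxes[j] != box or j < i:
                    removed.add(i)
                    break
    return [b for i, b in enumerate(boxes) if i not in removed]
```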
If you have to avoid any overlap at all, then you must think further about what the result should be:
A) a union of overlapping rectangles, which is therefore no longer a rectangle but a polygon,
or B) a set of rectangles.
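For B), the splitting described at the start of this answer can be built from one primitive: subtracting one axis-aligned rectangle from another, which leaves at most four non-overlapping rectangles. A minimal sketch, assuming planar `(min_x, min_y, max_x, max_y)` tuples (it ignores antimeridian wrap-around for longitude/latitude boxes) and treating boxes that merely touch as non-overlapping; `subtract` is a hypothetical name.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def subtract(a: Rect, b: Rect) -> List[Rect]:
    """Return disjoint rectangles that together cover a minus b."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    # Intersection of a and b; if it is empty, a is unchanged.
    ix0, iy0 = max(ax0, bx0), max(ay0, by0)
    ix1, iy1 = min(ax1, bx1), min(ay1, by1)
    if ix0 >= ix1 or iy0 >= iy1:
        return [a]
    pieces: List[Rect] = []
    if ay0 < iy0:  # strip below the intersection, full width of a
        pieces.append((ax0, ay0, ax1, iy0))
    if iy1 < ay1:  # strip above the intersection, full width of a
        pieces.append((ax0, iy1, ax1, ay1))
    if ax0 < ix0:  # left strip, limited to the intersection's height
        pieces.append((ax0, iy0, ix0, iy1))
    if ix1 < ax1:  # right strip, limited to the intersection's height
        pieces.append((ix1, iy0, ax1, iy1))
    return pieces
```

Subtracting every already-kept box from each new box yields disjoint rectangles, but the number of pieces can grow quickly, which is part of why this task is more complex than it first looks.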
You could check whether X% of a box's vertices lie inside another box to find out whether it's overlapped, but I suppose this isn't the optimal solution.
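A vertex test can miss some overlaps: two boxes arranged like a plus sign overlap in the middle, yet neither has a corner inside the other. For axis-aligned boxes, an exact check is that their x-intervals and y-intervals both overlap, and the overlapping area gives a direct version of the "X%" idea. A minimal sketch, with boxes as `(min_x, min_y, max_x, max_y)` tuples and touching edges treated as non-overlapping (an assumption you may want to change):

```python
from typing import Tuple

Rect = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def overlaps(a: Rect, b: Rect) -> bool:
    """True if two axis-aligned boxes share a region of positive area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def overlap_fraction(a: Rect, b: Rect) -> float:
    """Fraction of a's area that is also covered by b (0.0 when disjoint)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    if w <= 0 or h <= 0:
        return 0.0
    return (w * h) / ((a[2] - a[0]) * (a[3] - a[1]))
```

For example, a wide box `(0, 4, 10, 6)` and a tall box `(4, 0, 6, 10)` have no corners inside each other, yet `overlaps` reports them as overlapping and `overlap_fraction` gives 0.2 of the wide box's area.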