About bounding boxes of objects - computer-vision

I'm trying to compose a dataset for the detection of soccer players, the ball, etc. in a soccer game, and I'm using the AlexeyAB Darknet framework.
In the labeling phase, each image contains at least 8 players, a ball and other things. At some point it is logical to expect that I will have enough instances of players, but not enough of the ball or the goalkeeper, for example.
So can I mark bounding boxes only for the ball and the other rare things, and skip the players, so as not to waste time?

If you are training the model on your own dataset, I would recommend limiting the number of labels/classes in your data to the ones you actually need. For example, if you only want your model to see balls and goal-posts, and not players, simply keep the classes as ball and goal-post. (This is reminiscent of a classification problem where 0 stands for ball and 1 stands for goal-post.) P.S. you mentioned object detection and not localization, which is what the YOLO models are designed for.
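For reference, a minimal sketch of what the AlexeyAB Darknet label files could look like if you keep only the classes you care about; the file names and numbers below are made up for illustration (each annotation line is class-id, then x-center, y-center, width, height, all relative to the image size):

    obj.names (one class name per line):
        ball
        goalkeeper

    img_0001.txt (one labeled box per line, <class-id> <x_center> <y_center> <width> <height>):
        0 0.512 0.703 0.031 0.044
        1 0.250 0.480 0.060 0.210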

Related

OpenGL C++ Circle vs Circle Collision

I am making a 2D game and I have created two circles, one named player and one named enemy. The circles are drawn by calling a drawPlayerCircle and a drawEnemyCircle function, both of which are called in my display() function. At the moment my project isn't object-oriented, as you can probably tell. My collision works for both circles, as I have calculated penetration vectors, normalised vectors, etc.
However, my question is: if I expand this game further to have 10-20 enemies, how can I work out the collision against every single enemy quickly, rather than manually calculating the penetration and normalised vectors between my player and every enemy in the game, which is obviously very inefficient? I'm guessing I should split my player and enemy entities into their own classes, but I'm still struggling to visualise how the collision will work.
Any help will be greatly appreciated! :)
Collision between two circles is very quick to test. Just check if the distance between the circle centre points is less than the sum of the two radii.
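A minimal sketch of that test in C++ (the Circle struct here is hypothetical; comparing squared distances avoids a square root):

    struct Circle {
        float x, y, r;   // centre position and radius
    };

    // True if the two circles overlap or touch.
    bool circlesCollide(const Circle& a, const Circle& b) {
        float dx = b.x - a.x;
        float dy = b.y - a.y;
        float radiusSum = a.r + b.r;
        // Compare squared distances so no sqrt is needed.
        return dx * dx + dy * dy <= radiusSum * radiusSum;
    }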
I would just start by comparing the player against every other enemy. Doing 20 comparisons is not going to tax any computer from the last 10 years. I'm a big believer in just writing the code that you need right now and not complicating things, especially when you are just starting out.
However if you end up having millions of circles and you've profiled your code and found that testing all circles is slow, there are quite a few options. The simplest might be to have a grid over the play area.
Each grid cell covers a fixed amount of space over the play area and stores a list of circles. Whenever a circle moves calculate which grid cell it should be in. Remove it from the grid cell that it is currently in, and add it to the grid cell that it should be in. Then you can compare only the circles that overlap the player's grid cell.
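If you do reach that point, here is a rough sketch of such a uniform grid in C++ (the Circle struct, the cell size and the 3x3-neighbourhood query are assumptions for illustration, not a definitive design):

    #include <algorithm>
    #include <cmath>
    #include <cstdint>
    #include <unordered_map>
    #include <vector>

    struct Circle { float x, y, r; int id; };

    // Uniform grid: each cell stores the ids of the circles whose centre lies in it.
    class Grid {
    public:
        explicit Grid(float cellSize) : cellSize_(cellSize) {}

        void insert(const Circle& c) { cells_[keyFor(c.x, c.y)].push_back(c.id); }

        // Call whenever a circle moves: remove it from its old cell, add it to the new one.
        void move(const Circle& c, float oldX, float oldY) {
            std::uint64_t oldKey = keyFor(oldX, oldY), newKey = keyFor(c.x, c.y);
            if (oldKey == newKey) return;
            std::vector<int>& bucket = cells_[oldKey];
            bucket.erase(std::remove(bucket.begin(), bucket.end(), c.id), bucket.end());
            cells_[newKey].push_back(c.id);
        }

        // Ids of circles in the 3x3 block of cells around (x, y);
        // only these need the exact circle-circle test.
        std::vector<int> candidatesNear(float x, float y) const {
            std::vector<int> out;
            int cx = cellIndex(x), cy = cellIndex(y);
            for (int gx = cx - 1; gx <= cx + 1; ++gx)
                for (int gy = cy - 1; gy <= cy + 1; ++gy) {
                    auto it = cells_.find(pack(gx, gy));
                    if (it != cells_.end())
                        out.insert(out.end(), it->second.begin(), it->second.end());
                }
            return out;
        }

    private:
        int cellIndex(float v) const { return static_cast<int>(std::floor(v / cellSize_)); }
        std::uint64_t keyFor(float x, float y) const { return pack(cellIndex(x), cellIndex(y)); }
        static std::uint64_t pack(int gx, int gy) {
            // Combine the two cell coordinates into one 64-bit map key.
            return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(gx)) << 32)
                 | static_cast<std::uint32_t>(gy);
        }
        float cellSize_;
        std::unordered_map<std::uint64_t, std::vector<int>> cells_;
    };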
Space partitioning. Specifically, if you're in 2D, you can use a quadtree to speed up collision detection.

OpenCV - Detection of moving object C++

I am working on a Traffic Surveillance System, an OpenCV project; I need to detect moving cars and people. I am using a background subtraction method to detect moving objects and then drawing contours around them.
I have a problem:
When two cars are moving closely together on the road, my system detects them as one car. I have tried approaches like Canny edge detection, transformations, etc. Can anyone suggest a particular methodology for solving this type of problem?
Plenty of solutions are possible.
A geometric approach would detect that the one moving blob is too big to be a single passenger car. Still, this may indicate a car with a caravan. That leads us to another question: if you have two blobs moving close together, how do you know it's two cars and not one car towing a caravan? You may need to add some elementary shape detection.
Another trivial approach is to observe that cars do not suddenly multiply. If you have 5 video frames, and in 4 of them you spot two cars, then it's very very likely that the 5th frame also has two cars.
A CV system tracks objects as moving blobs ("clouds" of moving pixels), identifies them, and distinguishes one from another in cases of occlusion. When two (or more) blobs intersect, the system merges them into one combined object and marks it with the IDs of all the source objects currently included in the combination. When one of the objects separates from the combination, the CV system recognizes which one has left and re-assigns the IDs appropriately.
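A minimal OpenCV C++ sketch of the background-subtraction-plus-contours pipeline described in the question, with a simple area check that could flag blobs too large to be a single car; the file name, area thresholds and morphology settings are made-up values you would tune:

    #include <opencv2/opencv.hpp>

    int main() {
        cv::VideoCapture cap("traffic.mp4");            // hypothetical input video
        cv::Ptr<cv::BackgroundSubtractorMOG2> bgsub = cv::createBackgroundSubtractorMOG2();

        cv::Mat frame, fgMask;
        while (cap.read(frame)) {
            bgsub->apply(frame, fgMask);

            // Clean up the foreground mask a little before finding contours.
            cv::threshold(fgMask, fgMask, 200, 255, cv::THRESH_BINARY);
            cv::morphologyEx(fgMask, fgMask, cv::MORPH_OPEN,
                             cv::getStructuringElement(cv::MORPH_RECT, cv::Size(5, 5)));

            std::vector<std::vector<cv::Point>> contours;
            cv::findContours(fgMask, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

            for (const auto& c : contours) {
                double area = cv::contourArea(c);
                if (area < 500) continue;               // ignore small noise blobs (tune)
                // A blob much larger than a typical single-car blob may be two cars
                // (or a car with a trailer) and deserves further analysis.
                bool suspiciouslyLarge = area > 20000;  // tune for your camera setup
                cv::rectangle(frame, cv::boundingRect(c),
                              suspiciouslyLarge ? cv::Scalar(0, 0, 255) : cv::Scalar(0, 255, 0), 2);
            }
            cv::imshow("detections", frame);
            if (cv::waitKey(30) == 27) break;           // Esc to quit
        }
        return 0;
    }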

Object Detection: Training Requried or No Training Required?

This question is related to object detection and, basically, to detecting any "known" object. For example, imagine I have the objects below.
Table
Bottle
Camera
Car
I will take 4 photos of each of these individual objects: one from the left, another from the right, and the other two from above and below. I originally thought it would be possible to recognize these objects with these 4 photos each, because you have photos from all 4 angles; no matter how you see the object, you can detect it.
But I got confused by someone's idea about training the engine with thousands of positive and negative images of each object. I really don't think this is required.
So, simply speaking, my question is: in order to identify an object, do I need those thousands of positive and negative images? Or are 4 photos from 4 angles simply enough?
I am expecting to use OpenCV for this.
Update
Actually, the main thing is something like this: imagine that I have 2 laptops, one a Dell and the other an HP. Both are laptops, but they have clearly visible differences, including the logo. Can we do this using feature description? If not, how "hard" is the "training" process? How many pictures are needed?
Update 2
I need to detect "specific" objects, not all cars, all bottles, etc. For example, the "Maruti Car Model 123" and the "Ferrari Car Model 234" are both cars, but they are different. Imagine I have pictures of the Maruti and the Ferrari of the above-mentioned models; then I need to detect them. I don't have to worry about other cars or vehicles, or even about other models of Maruti and Ferrari. But the above-mentioned "Maruti Car Model 123" should be identified as "Maruti Car Model 123", and the above-mentioned "Ferrari Car Model 234" should be identified as "Ferrari Car Model 234". How many pictures do I need for this?
Answers:
If you want to detect a specific object and you don't need to account for view point changes, you can use 2D features:
http://docs.opencv.org/doc/tutorials/features2d/feature_homography/feature_homography.html
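A minimal sketch of that 2D-feature approach with OpenCV C++ (ORB keypoints, brute-force matching, then a RANSAC homography to locate the reference object in a test image); the file names are placeholders:

    #include <opencv2/opencv.hpp>

    int main() {
        // Reference image of the specific object, and a test image to search in.
        cv::Mat object = cv::imread("object.jpg", cv::IMREAD_GRAYSCALE);
        cv::Mat scene  = cv::imread("scene.jpg",  cv::IMREAD_GRAYSCALE);

        cv::Ptr<cv::ORB> orb = cv::ORB::create(1000);
        std::vector<cv::KeyPoint> kpObj, kpScene;
        cv::Mat descObj, descScene;
        orb->detectAndCompute(object, cv::noArray(), kpObj, descObj);
        orb->detectAndCompute(scene,  cv::noArray(), kpScene, descScene);

        // Hamming distance suits binary ORB descriptors; cross-check keeps stricter matches.
        cv::BFMatcher matcher(cv::NORM_HAMMING, true);
        std::vector<cv::DMatch> matches;
        matcher.match(descObj, descScene, matches);

        // Collect matched point pairs and estimate a homography with RANSAC.
        std::vector<cv::Point2f> ptsObj, ptsScene;
        for (const auto& m : matches) {
            ptsObj.push_back(kpObj[m.queryIdx].pt);
            ptsScene.push_back(kpScene[m.trainIdx].pt);
        }
        if (ptsObj.size() >= 4) {
            cv::Mat H = cv::findHomography(ptsObj, ptsScene, cv::RANSAC);
            // H maps object-image coordinates into the scene; projecting the object's
            // corners with cv::perspectiveTransform gives its location in the scene.
        }
        return 0;
    }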
To distinguish between 2 logos, you'll probably need to build a detector for each logo which will be trained on a set of images. For example, you can train a Haar cascade classifier.
To distinguish between different models of cars, you'll probably need to train a classifier using training images of each car. However, I have encountered an application that does this using a nearest-neighbour approach: it just extracts features from the given test image and compares them to a known set of images of different car models.
Also, I can recommend some approaches and packages if you explain more about the application.
To answer the question you asked in the title, if you want to be able to determine what the object in the picture is you need a supervised algorithm (a.k.a. trained). Otherwise you would be able to determine, in some cases, the edges or the presence of an object, but not what kind of an object it is. In order to tell what the object is you need a labelled training set.
Regarding the contents of the question, the number of possible angles in a picture of an object is infinite. If you have just four pictures in your training set, the test example could be taken at an angle that falls halfway between training example A and training example B, making it hard for your algorithm to recognize. The larger the training set, the higher the probability of recognizing the object. Be careful: you never reach absolute certainty that your algorithm will recognize the object; it just becomes more likely.

Detect person in bed

Suppose I want to find out if there is a person in a bed or not using cameras and computer vision algorithms. One can assume that the camera provides RGB, infrared and depth data.
I don't really have a good idea how to solve this. So far I came up with this:
Estimate a plane of the bed object using RANSAC. This plane should be further away from the ground plane if there is a person in the bed. This seems very unstable though: it assumes that the normal height of a bed is known, and it can easily break if the bed has an adjustable head section (e.g. in a hospital).
Face detection. Try to detect a face in the bed. This probably also isn't very reliable, since the face can be turned sideways to the camera and partly covered.
Use the infrared image. I am not sure how much you would see through the blanket, and what would happen if the person has just left the bed and the bed is still warm.
Is there a good way to do this? Or, to be reliable, would you have to use pressure sensors in the bed?
Thanks!
I don't know about infrared images, but for camera-based video processing this kind of problem is widely studied.
If your problem is to detect a person in a bed which is "normally empty", then I think the simplest algorithm would be to capture successive frames and calculate their difference.
The presence of a human in the frame would make it differ from a frame capturing only the empty bed. Depending on which variation of this idea you use, you would get different reliability.
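A minimal sketch of that frame-differencing idea with OpenCV C++; the reference "empty bed" image, the camera index and the decision threshold are assumptions used only to illustrate the idea:

    #include <iostream>
    #include <opencv2/opencv.hpp>

    int main() {
        cv::VideoCapture cap(0);                                               // hypothetical camera
        cv::Mat emptyBed = cv::imread("empty_bed.png", cv::IMREAD_GRAYSCALE);  // reference frame, same size as the camera frames

        cv::Mat frame, gray, diff, mask;
        while (cap.read(frame)) {
            cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
            cv::GaussianBlur(gray, gray, cv::Size(5, 5), 0);

            cv::absdiff(gray, emptyBed, diff);        // pixel-wise difference to the empty-bed frame
            cv::threshold(diff, mask, 30, 255, cv::THRESH_BINARY);

            // If a large fraction of the pixels changed, someone is probably in the bed.
            double changedFraction = static_cast<double>(cv::countNonZero(mask)) / mask.total();
            bool occupied = changedFraction > 0.15;   // threshold to tune on real data
            std::cout << (occupied ? "bed occupied" : "bed empty") << std::endl;

            if (cv::waitKey(30) == 27) break;         // Esc to quit
        }
        return 0;
    }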
Otherwise you can go directly for human detection in video frames. One possible algorithm is described here.
Edit:
Your problem is harder than I thought. The following approach might handle these cases.
The main idea is to use a bunch of features at once to get higher accuracy and to remove false positives.
Use a HOG person detector at the top level to detect a person's entry into the scene (a minimal detection sketch follows this list). If the positions of the possible entry doors are known, or are detectable using edge lines in the scene, use them to increase accuracy. (At the point of entry, the difference between successive frames will be located near the doors.)
Use edge lines to track the human, and use the bed edges to track the human's position. The edges of the human should be bounded by the edges of the bed.
If the difference is located within the bed, this implies the human is in the bed but moving.
If needed, include as a preprocessing step an analysis of texture and connected components to remove other possible moving objects in the room, for higher accuracy (for example, movement of clothes caused by air).
Also use face detectors to increase accuracy.
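A minimal sketch of the HOG person detector mentioned in the first step, using OpenCV's built-in pre-trained people detector; the camera index and display handling are assumptions:

    #include <opencv2/opencv.hpp>

    int main() {
        cv::VideoCapture cap(0);                              // hypothetical camera
        cv::HOGDescriptor hog;
        hog.setSVMDetector(cv::HOGDescriptor::getDefaultPeopleDetector());

        cv::Mat frame;
        while (cap.read(frame)) {
            std::vector<cv::Rect> people;
            // Multi-scale sliding-window detection of upright people.
            hog.detectMultiScale(frame, people, 0, cv::Size(8, 8), cv::Size(32, 32), 1.05, 2);
            for (const auto& r : people)
                cv::rectangle(frame, r, cv::Scalar(0, 255, 0), 2);
            cv::imshow("people", frame);
            if (cv::waitKey(30) == 27) break;                 // Esc to quit
        }
        return 0;
    }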
The infrared that the camera uses has a different frequency than the infrared signal emitted by a warm object. Unless you are using military-grade IR scanners, you can forget about the IR-warmth connection. But IR is still useful if there is limited light, or if you use it for depth maps.
Go with depth (Kinect style) and estimate the bed as a segment in your image. It should have some characteristic features in depth (certain dimensions, flatness, etc.). The bed is usually surrounded by walls or floor that are easy to segment out. Your algorithm can also be tuned to the distance to the bed and cut it out based just on depth range.
As other people said, it would be useful to learn more about your particular goal or application. What is the background or environment around the bed? How does it look when there is no person in it? Can a person simulate his/her presence (as in a prison-escape scenario)? Etc.

Lane Detection in an artificial Environment

I'm writing an app that can detect lanes in a driving simulator. The environment is relatively simple: it's mostly straight multi-lane roads with almost no curvature at all. At the moment, I can successfully detect lines using the (classical) Hough Transform, but the issue is that the HT naturally also detects lines that are not lanes.
How can I be more selective? I already exclude horizontal lines from being drawn, but some spurious lines still creep in. Ideally, I would like to detect the lane boundaries that the vehicle is traveling in. The following is a typical image of the environment.
Here is what I'm doing so far:
1. Because the environment is more or less the same wherever I drive, I set the region of interest (RoI) to exclude the horizon and anything above it.
2. Threshold the image (I'll explain my reason for thresholding in a bit)
3. Canny Edge Detection
4. Apply a Hough Transform
5. Draw the detected lines excluding those which have a gradient of 0.0 or nearly 0.0
The reason for thresholding the image is as follows. If you take a look at the environment photograph linked above, you'll see a grayish line running parallel to the road. Because it's a continuous line, unlike the lane markers, the HT ends up detecting it. I cannot exclude it based on gradient, as it has the same gradient as the lane markers. With thresholding, I can remove it and therefore only detect lines that are the actual lane markers.
Here is the result of the above operations
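A minimal sketch of steps 1-5 with OpenCV C++; the region of interest, the threshold value and the slope cut-off are made-up numbers you would tune for your simulator:

    #include <cmath>
    #include <opencv2/opencv.hpp>

    int main() {
        cv::Mat frame = cv::imread("simulator_frame.png");    // hypothetical input image

        // 1. Region of interest: keep only the lower half of the image (below the horizon).
        cv::Rect roi(0, frame.rows / 2, frame.cols, frame.rows / 2);
        cv::Mat road = frame(roi);

        // 2. Threshold so that only the bright lane markers survive.
        cv::Mat gray, binary;
        cv::cvtColor(road, gray, cv::COLOR_BGR2GRAY);
        cv::threshold(gray, binary, 180, 255, cv::THRESH_BINARY);

        // 3. Canny edge detection.
        cv::Mat edges;
        cv::Canny(binary, edges, 50, 150);

        // 4. Probabilistic Hough transform.
        std::vector<cv::Vec4i> lines;
        cv::HoughLinesP(edges, lines, 1, CV_PI / 180, 40, 30, 10);

        // 5. Draw the detected lines, excluding near-horizontal ones.
        for (const auto& l : lines) {
            double dx = l[2] - l[0], dy = l[3] - l[1];
            double slope = (dx == 0) ? 1e9 : dy / dx;
            if (std::abs(slope) < 0.3) continue;               // skip near-horizontal lines
            cv::line(road, cv::Point(l[0], l[1]), cv::Point(l[2], l[3]), cv::Scalar(0, 0, 255), 2);
        }

        cv::imshow("lanes", frame);                            // road is a view into frame
        cv::waitKey(0);
        return 0;
    }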
I understand that there are many solutions to this problem, and I have read countless papers on it, but they all seem to handle environments vastly more complicated than this and/or are simply way over my head. For what it's worth, just a little more than a month ago I had no background in computer vision, so all of this is very new to me.
UPDATE 1:
I guess to put this in better terms, I'm looking for a way to model the lanes so that lines that do not fit the model are not included. Unfortunately, I do not have a clue about where to begin with models. Any suggestions?
For what it's worth, I have managed to identify the lanes that the vehicle is traveling within and can exclude the extra lines that are not part of the "active" lane, so to speak. Hopefully this photo will help.
It's not perfect, but it's something, I guess. My ultimate goal, after modeling, is to generate a heading/position of the vehicle. But I just want to get relatively robust lane detection first. I'm hoping there is a relatively simple technique that can help achieve this (something that does not depend on the system's parameters, such as focal length or field of view).
One way to go would be to use prior knowledge of the scene you are looking at. You could have a model with a hidden state, comprising more or less static parameters such as camera height, camera tilt or lane width, and dynamic parameters such as camera yaw, lateral displacement of the camera within the lane, road curvature, etc. You could handle such model in the frame of a Kalman filter. An advantage of such a model would be an ability to tolerate other road surface markings such as direction arrows, zebras and such. Good luck!
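As a toy reduction of that idea, here is a minimal sketch using OpenCV's cv::KalmanFilter to smooth just two of the dynamic parameters (lateral offset and yaw); the noise values and the per-frame measurements are placeholders, and a real model would carry more state:

    #include <opencv2/opencv.hpp>

    int main() {
        // Toy state: [lateral offset, yaw] of the camera within the lane,
        // both assumed to be measured each frame from the detected lane lines.
        cv::KalmanFilter kf(2, 2, 0, CV_32F);
        cv::setIdentity(kf.transitionMatrix);                            // state persists frame to frame
        cv::setIdentity(kf.measurementMatrix);
        cv::setIdentity(kf.processNoiseCov, cv::Scalar::all(1e-3));      // how fast the true state may drift
        cv::setIdentity(kf.measurementNoiseCov, cv::Scalar::all(1e-1));  // how noisy the per-frame estimate is
        cv::setIdentity(kf.errorCovPost, cv::Scalar::all(1.0));

        // Each frame: predict, then correct with the offset/yaw measured from the Hough lines.
        cv::Mat prediction = kf.predict();

        float measuredOffset = 0.2f;   // placeholder: compute from the detected lane lines
        float measuredYaw    = 0.01f;  // placeholder: compute from the detected lane lines
        cv::Mat measurement = (cv::Mat_<float>(2, 1) << measuredOffset, measuredYaw);

        cv::Mat estimate = kf.correct(measurement);
        // estimate.at<float>(0) is the smoothed lateral offset, estimate.at<float>(1) the smoothed yaw.
        return 0;
    }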
Perhaps you could try to find lines only on edges found at grey-white transitions, rather than on all edges in the entire image?