I was trying to run train_object_detector.cpp from the dlib library to train it for pedestrian detection. I'm using the INRIA dataset, and when I tried to train on it, this exception was thrown:
exception thrown!
Error! An impossible set of object boxes was given for training. All the boxes need to have a similar aspect ratio and also not be smaller than about 1600 pixels in area. The following images contain invalid boxes:
crop001002.png
crop001027.png
crop001038.png
crop001160.png
crop001612.png
crop001709.png
Try the -h option for more information.
When I removed these photos it did run and loaded all the photos, but then another exception was thrown:
exception thrown!
An impossible set of object labels was detected. This is happening because none of the object locations checked by the supplied image scanner is a close enough match to one of the truth boxes. To resolve this you need to either lower the match_eps or adjust the settings of the image scanner so that it hits this truth box. Or you could adjust the offending truth rectangle so it can be matched by the current image scanner. Also, if you are using the scan_image_pyramid object then you could try using a finer image pyramid or adding more detection templates. E.g. if one of your existing detection templates has a matching width/height ratio and smaller area than the offending rectangle then a finer image pyramid would probably help.
Please help me deal with this.
Did you label your images using ImgLab?
When you label your images with this tool, keep in mind that your bounding boxes must have a similar aspect ratio and that these bounding boxes must be smaller than the sliding window.
Usually, the example that you are running should dynamically calculate the size of the sliding window according to the provided boxes.
If none of these helps, I'd suggest that you modify the source code a bit to track down the source of the error.
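For example, here is a minimal sketch (my own, not part of the example, and the thresholds are only illustrative) that loads the dataset XML the same way train_object_detector.cpp does and prints the area and aspect ratio of every truth box, so you can find and fix the offending rectangles instead of deleting whole images. For the second error, if I recall correctly the structural_object_detection_trainer used by the example has a set_match_eps() setter you can lower, as the message itself suggests.

#include <iostream>
#include <vector>
#include <dlib/array.h>
#include <dlib/array2d.h>
#include <dlib/data_io.h>

int main(int argc, char** argv)
{
    if (argc < 2) { std::cout << "give the dataset XML file" << std::endl; return 1; }

    dlib::array<dlib::array2d<unsigned char> > images;
    std::vector<std::vector<dlib::rectangle> > boxes;
    dlib::load_image_dataset(images, boxes, argv[1]);   // e.g. training.xml

    for (unsigned long i = 0; i < boxes.size(); ++i)
    {
        for (unsigned long j = 0; j < boxes[i].size(); ++j)
        {
            const dlib::rectangle& b = boxes[i][j];
            double aspect = b.width() / (double)b.height();
            // flag boxes that are too small or have an unusual aspect ratio
            if (b.area() < 1600 || aspect < 0.2 || aspect > 5.0)
                std::cout << "image " << i << ", box " << j
                          << ": area=" << b.area()
                          << " aspect=" << aspect << std::endl;
        }
    }
    return 0;
}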
I am pretty new to CV, so forgive my stupid questions...
What I want to do:
I want to recognize an RC plane in live video (for now it's only a recorded video).
What I have done so far:
Differences between frames
Convert it to grey scale
GaussianBlur
Threshold
findContours
Here are some example frames:
But there are also frames with noise, so there are more objects in the frame.
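For reference, a rough sketch of the preprocessing steps listed above (frame difference, grayscale, blur, threshold, findContours), assuming OpenCV 3+ names; the blur kernel size and the threshold value of 25 are just placeholders, not tuned values.

#include <opencv2/opencv.hpp>
#include <vector>

std::vector<std::vector<cv::Point> > findMovingContours(const cv::Mat& prevFrame,
                                                        const cv::Mat& currFrame)
{
    cv::Mat diff, gray, blurred, mask;
    cv::absdiff(prevFrame, currFrame, diff);              // difference between frames
    cv::cvtColor(diff, gray, cv::COLOR_BGR2GRAY);         // convert to grayscale
    cv::GaussianBlur(gray, blurred, cv::Size(5, 5), 0);   // smooth out noise
    cv::threshold(blurred, mask, 25, 255, cv::THRESH_BINARY);

    std::vector<std::vector<cv::Point> > contours;
    cv::findContours(mask, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
    return contours;   // bounding rectangles come from cv::boundingRect(contours[i])
}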
I thought I could do something like this:
Use some object recognition algorithm for every contour that has been found, and compute the feature vector only for each of these bounding rectangles.
Is it possible to compute SURF/SIFT/... only for a specific patch (smaller part) of the image?
Since it will be important that the algorithm can process real-time video, I think this will only be possible if I don't look at the whole image all the time. Or maybe decide, for example, that if there are more than 10 bounding rectangles I check the whole image instead of every rectangle.
Then I will look at the next frame and try to match my feature vector with the previous one. That way I will be able to trace my objects. Once these objects cross the red line in the middle of the picture it will trigger another event. But that's not important here.
I need to make sure that not every object which is crossing or behind that red line triggers that event. So the object needs to appear in at least 2 or 3 consecutive frames, and only then, if it crosses, should the event be triggered.
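On the patch question: yes, any OpenCV feature detector can be run on just an ROI view of the frame. A minimal sketch below uses ORB as a stand-in for SIFT/SURF (it ships with the core features2d module; OpenCV 3+ API assumed); the keypoint shift at the end is only needed if you want coordinates in the full frame.

#include <opencv2/opencv.hpp>
#include <vector>

void describePatch(const cv::Mat& frame, const cv::Rect& box,
                   std::vector<cv::KeyPoint>& keypoints, cv::Mat& descriptors)
{
    cv::Ptr<cv::ORB> orb = cv::ORB::create();
    cv::Mat patch = frame(box);                      // a view of the ROI, no copy
    orb->detectAndCompute(patch, cv::noArray(), keypoints, descriptors);

    // keypoint coordinates are relative to the patch; shift them back to the
    // full-frame coordinate system if you need them there
    for (size_t i = 0; i < keypoints.size(); ++i)
        keypoints[i].pt += cv::Point2f((float)box.x, (float)box.y);
}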
There are so many variations of object recognition algorithms, I am a bit overwhelmed.
SIFT/SURF/ORB/... you get what I am saying.
Can anyone give me a hint which one I should choose, or whether what I am doing even makes sense?
Assuming the plane location doesn't change a lot from one frame to the next, I think you should look at object tracking instead of trying to estimate the location independently in each frame.
http://docs.opencv.org/modules/video/doc/motion_analysis_and_object_tracking.html
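As a concrete example of the tracking idea from that page, here is a small sketch (my own, not from the docs) that follows points from the previous frame with pyramidal Lucas-Kanade optical flow instead of re-detecting in every frame; the corner-detection parameters are placeholders.

#include <opencv2/opencv.hpp>
#include <vector>

void trackPoints(const cv::Mat& prevGray, const cv::Mat& currGray,
                 std::vector<cv::Point2f>& points)
{
    if (points.empty())
    {
        // seed the tracker once, e.g. with corners inside the plane's last bounding box
        cv::goodFeaturesToTrack(prevGray, points, 100, 0.01, 5);
        if (points.empty()) return;
    }

    std::vector<cv::Point2f> nextPoints;
    std::vector<unsigned char> status;
    std::vector<float> err;
    cv::calcOpticalFlowPyrLK(prevGray, currGray, points, nextPoints, status, err);

    // keep only the points that were tracked successfully
    std::vector<cv::Point2f> kept;
    for (size_t i = 0; i < nextPoints.size(); ++i)
        if (status[i])
            kept.push_back(nextPoints[i]);
    points = kept;
}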
I have a question about preparing the dataset of positive samples for a cascaded classifier that will be used for object detection.
As positive samples, I have been given 3 sets of images:
a set of colored images in full size (about 1200x600) with a white background and with the object displayed at a different angles in each image
another set with the same images in grayscale and with a white background, scaled down to the detection window size (60x60)
another set with the same images in grayscale and with a black background, scaled down to the detection window size (60x60)
My question is: in set 1, should the background really be white? Should it not instead be the kind of environment the object is likely to be found in in the test data? Or should I have a fourth set where the images are in their natural environments? How does the environment figure into the training samples?
The background should be a typical environment of the object, because when you actually try to detect the objects, the search window will always include some of the background. The best thing is to crop the objects from natural images.
If you use the trainCascadeObjectDetector function in MATLAB, you do not even have to crop the samples. It lets you specify multiple bounding boxes per image. You also do not have to worry about the size of the samples, because trainCascadeObjectDetector will resize them for you.
There is a very handy GUI app on MATLAB file exchange for labeling objects of interest in images designed for use with trainCascadeObjectDetector.
Edit: a couple of other points. Your negative images should also contain backgrounds typically associated with your objects of interest. Here is a tutorial that explains how to prepare training data and how to set some of the parameters.
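If you end up cropping the samples yourself (for example for OpenCV's cascade training rather than MATLAB), a rough sketch of cutting a labeled box out of a natural image and resizing it to the 60x60 detection window from the question could look like this; the function and its parameters are hypothetical:

#include <opencv2/opencv.hpp>
#include <string>

bool savePositiveSample(const std::string& imagePath, const cv::Rect& objectBox,
                        const std::string& outPath)
{
    cv::Mat img = cv::imread(imagePath);
    if (img.empty())
        return false;

    // clip the labeled box to the image in case the annotation slightly overflows
    cv::Rect box = objectBox & cv::Rect(0, 0, img.cols, img.rows);
    if (box.area() == 0)
        return false;

    cv::Mat sample;
    cv::resize(img(box), sample, cv::Size(60, 60));   // the 60x60 detection window
    return cv::imwrite(outPath, sample);
}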
I'm hoping you can help and I'm hoping this isn't a duplicate (I've already been searching for the past 2 hours trying to find a solution).
What I'm currently doing:
I am using the Pikachoose Photo Gallery - http://www.pikachoose.com - but I am open to others. I just need a very basic photo gallery that I can populate from a database. I have gotten the photo gallery looking 95% of how I need it to look, but I'm running into a big problem. Although the width is set, the height is not, as that would cause skewing of the image. I assume most of the images will be landscape. So as the slider rotates through the images, the height of the main (larger) viewable area changes based on the image that is loaded. It would be fine if all images were the same height/width, but that's not a possibility, as I will not have control over the images loaded.
They are loaded into an unordered list and when clicked, will open a page that has the full image.
What I need it to do:
I'm familiar with how sprites work, but I wanted to know if there is a way to incorporate that type of functionality into the slider (like Flickr does for thumbnails). I want a fixed height/width viewable area, such as 220px x 150px, and have the image load into it with its longest side at 220 + 40 pixels (in case it's a narrow image) and centered, while you still only see 220px x 150px. Same with the thumbnails: I want them to have a set width/height of 50x50 pixels, which they do right now, but the images are squished. If I can solve either the main image or the thumbnails, I think I can solve it for the other. But I don't even know where to start.
See below for a visual example. Note: I meant for the thumbnails to read "set shortest side to 50px" not longest...
I'm currently working on a computer vision application with OpenCV. The application involves target identification and characteristic determination. Generally, I'm going to have a target cross into the visible region and slowly move through it over a couple of seconds. This should give me upwards of 50-60 frames from the camera in which I'll be able to find the target.
We have successfully implemented the detection algorithms using SWT and OCR (the targets all have alphanumeric identifiers, which makes them relatively easy to pick out). What I want to do is use as much of the data as possible from all 50-60 shots of each target. To do this, I need some way to identify that a particular ROI of image 2 contains the same target as another ROI from image 1.
What I'm asking for is a little advice from anyone who may have come across this before. How can I easily/quickly identify, within a reasonable error margin, that ROI #2 contains the same target as ROI #1? My first instinct is something like this:
Detect targets in frame 1.
Calculate certain unique features of each of the targets in frame 1. Save.
Get frame 2.
Immediately look for ROIs which have the same features as those calc'd in step 2. Grab these and send them down the line for further processing, skipping step 5.
Detect new targets in frame 2.
Pass targets to a thread to calculate shape, color, GPS coordinates, etc.
Lather, rinse, repeat.
I'm thinking that SURF or SIFT features might be a way to accomplish this, but I'm concerned that they might have trouble identifying targets as the same from frame to frame due to distortion or color fade. I don't know how to set a threshold on SIFT/SURF features.
Thank you in advance for any light you can shed on this matter.
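For reference, one common way to put a threshold on descriptor matches is Lowe's ratio test plus a minimum match count. The sketch below uses ORB as a stand-in for SIFT/SURF (with SIFT/SURF you would use a float norm instead of Hamming); the 0.75 ratio and the match-count cutoff of 10 are illustrative, not tuned values.

#include <opencv2/opencv.hpp>
#include <vector>

bool sameTarget(const cv::Mat& roi1, const cv::Mat& roi2)
{
    cv::Ptr<cv::ORB> orb = cv::ORB::create();
    std::vector<cv::KeyPoint> kp1, kp2;
    cv::Mat d1, d2;
    orb->detectAndCompute(roi1, cv::noArray(), kp1, d1);
    orb->detectAndCompute(roi2, cv::noArray(), kp2, d2);
    if (d1.empty() || d2.empty())
        return false;

    // for each descriptor in roi1, find its two best matches in roi2
    cv::BFMatcher matcher(cv::NORM_HAMMING);
    std::vector<std::vector<cv::DMatch> > knn;
    matcher.knnMatch(d1, d2, knn, 2);

    // Lowe's ratio test: keep a match only if it is clearly better than the runner-up
    int good = 0;
    for (size_t i = 0; i < knn.size(); ++i)
        if (knn[i].size() == 2 && knn[i][0].distance < 0.75f * knn[i][1].distance)
            ++good;
    return good >= 10;   // arbitrary cutoff for "same target"
}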
One thing you can do is locally equalize brightness and possibly saturation levels. If you aren't using a color space such as YCrCb or HSV, I suggest you try them.
Can you assume that the object is not moving too fast? If you feed the previous position into the detection routine, you can decrease the size of the window you are looking at. The same goes for the speed and direction of movement.
I've successfully used histogram composition and shape descriptors of a region in order to reliably detect it, you can use that or add it to a SURF/SIFT classifier.
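As an illustration of the histogram part of that suggestion, here is a sketch that compares hue/saturation histograms of two ROIs with cv::compareHist (OpenCV 3+ names assumed); the bin counts are placeholders, and a correlation close to 1 means the regions have similar colour statistics.

#include <opencv2/opencv.hpp>

static cv::Mat hueSatHistogram(const cv::Mat& bgr)
{
    cv::Mat hsv, hist;
    cv::cvtColor(bgr, hsv, cv::COLOR_BGR2HSV);
    int histSize[] = {30, 32};                        // hue and saturation bins
    float hueRange[] = {0, 180}, satRange[] = {0, 256};
    const float* ranges[] = {hueRange, satRange};
    int channels[] = {0, 1};
    cv::calcHist(&hsv, 1, channels, cv::Mat(), hist, 2, histSize, ranges);
    cv::normalize(hist, hist, 0, 1, cv::NORM_MINMAX);
    return hist;
}

double roiSimilarity(const cv::Mat& roiBgr1, const cv::Mat& roiBgr2)
{
    // correlation close to 1.0 means the two regions look alike
    return cv::compareHist(hueSatHistogram(roiBgr1), hueSatHistogram(roiBgr2),
                           cv::HISTCMP_CORREL);
}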
I am familiar with OpenCV, a powerful open source library, and I am using it for a farm-industry project in which a mouse is injected with a drug and then kept on a so-called stage that is surrounded by a cylinder painted with alternating white and black stripes. I need to find out how many times the mouse rotates its head towards the rotation of the cylinder (it does this because of the effect of the drug). How can I achieve this? Perhaps some OpenCV experts can help me out here.
I have added an image below
Seems an interesting one, these are my preliminary suggestions...
It depends on the resolution of the camera and how far your object (the mouse) is from the camera; because the mouse is a small object, its image needs to cover a good number of pixels to differentiate head movement.
I don't think the mouse will stick to one position; it will keep moving in the cage, so you need to track the mouse.
At every position of the mouse you need to find the position of the head with respect to the body. That you can do using template matching (create templates of the head of the mouse).
Hence, more info and some sample pictures are necessary to get a clear idea of the scene.
EDIT AFTER IMAGE UPLOADED
Since the camera is fixed, create a circular region of interest so that only movement inside this circle concerns you, and not the moving cylinder outside the circle.
Subtract the present frame from the previous frame (frame differencing) and store the absolute difference in an image:
absdiff(frameNow,framePrevs,diffofFrames);
Threshold diffofFrames as required to get the current position of the rat.
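Putting those steps together, a rough sketch (assuming single-channel grayscale frames and OpenCV 3+ names; the circle centre/radius and the threshold of 30 are placeholders) could look like this:

#include <opencv2/opencv.hpp>

cv::Mat motionInsideCircle(const cv::Mat& frameNow, const cv::Mat& framePrevs,
                           cv::Point centre, int radius)
{
    // circular region of interest so the rotating cylinder outside is ignored
    cv::Mat circleMask = cv::Mat::zeros(frameNow.size(), CV_8UC1);
    cv::circle(circleMask, centre, radius, cv::Scalar(255), cv::FILLED);

    cv::Mat diffofFrames, motionMask;
    cv::absdiff(frameNow, framePrevs, diffofFrames);   // frame differencing
    cv::threshold(diffofFrames, motionMask, 30, 255, cv::THRESH_BINARY);

    cv::Mat result;
    motionMask.copyTo(result, circleMask);             // keep only pixels inside the circle
    return result;                                      // bright pixels mark the moving rat
}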
Now the task is easier if the image clearly shows the nose. Since the nose has a pointed shape, it can be detected by template matching; however, from the image you have given, it is difficult to make out the nose against the black background. I can only suggest the following process: the green circles denote the tip of the nose, and all I am trying to do is get the orientation of the head with respect to the body. For good results you need good images.