Is there any method to create a polygon (not a rectangle) around an object in an image for object recognition?
Please refer to the following images: the result I am looking for and the original image.
I am not looking for bounding rectangles like this. I know the concepts of transfer learning, using pre-trained models for object recognition, and other object detection concepts.
The main aim is object detection, but with the result given as a tighter-fitting polygon instead of a bounding box. Links to resources or papers would be helpful.
Here is a very simple (and a bit hacky) idea, but it might help: take a per-pixel scene labeling algorithm, e.g. SegNet, and then turn the resulting segmented image into a binary image, where the white pixels are the class of interest (in your example, white for cars and black for the rest). Now compute edges. You can add those edges to the original image to obtain a result similar to what you want.
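A minimal sketch of the "binary image, then edges" step, assuming the segmentation output has already been turned into a 0/1 mask stored as a list of rows. The function names `mask_boundary` and `overlay` are made up for illustration and are not part of SegNet or any library. The "edge" here is simply every foreground pixel that has a background pixel as a 4-neighbour.

```python
def mask_boundary(mask):
    """Return the set of (row, col) foreground pixels that touch the background.

    `mask` is a list of rows of 0/1 values (1 = class of interest).
    Pixels outside the image count as background, so an object touching
    the image border still gets an outline along that border.
    """
    h, w = len(mask), len(mask[0])
    boundary = set()
    for r in range(h):
        for c in range(w):
            if not mask[r][c]:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                outside = not (0 <= nr < h and 0 <= nc < w)
                if outside or not mask[nr][nc]:
                    boundary.add((r, c))
                    break
    return boundary


def overlay(image, boundary, color=(0, 255, 0)):
    """Copy `image` (rows of colour tuples) and paint the boundary pixels."""
    out = [list(row) for row in image]
    for r, c in boundary:
        out[r][c] = color
    return out
```

In practice you would probably do this with an image library's contour or edge functions; the loop above only shows what the step computes.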
What you want is called image segmentation, which is different from object detection. The best performing methods for common object classes (e.g. cars, bikes, people, dogs, ...) do this using trained CNNs, and are usually called semantic segmentation networks (awesome links). This will, in theory, give you regions in your image corresponding to the object you want. After that you can fit an enclosing polygon using what is called the convex hull.
What is the fastest bounding box prediction algorithm without the classification?
For example, I have a flat surface with objects on top of it. I don't need to know the type of the objects, I only need their bounding boxes. Something like pixel-wise segmentation for two types of objects: ground and item.
I think that what you are looking for are models for "salient object detection" ("dealing with locating and segmenting the most salient object or region in a scene").
The output of such a model is a map of the same size as the input image, where each pixel's value is the probability that it is part of a salient object. Using some threshold value you can locate the boundaries of the object.
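To illustrate the thresholding step, here is a rough sketch that turns such a probability map into bounding boxes, one per connected blob of above-threshold pixels. The function name `salient_boxes`, the default threshold of 0.5 and the use of 4-connectivity are my assumptions, not something prescribed by any particular saliency model.

```python
from collections import deque


def salient_boxes(saliency, threshold=0.5):
    """Return one (top, left, bottom, right) box, inclusive, per connected
    region of pixels whose saliency is >= threshold.

    `saliency` is a list of rows of probabilities in [0, 1].
    """
    h, w = len(saliency), len(saliency[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if seen[r][c] or saliency[r][c] < threshold:
                continue
            # Breadth-first flood fill over 4-connected salient pixels.
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                            and saliency[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes
```

The threshold trades missed objects against merged or spurious blobs, so it usually needs tuning on your own images.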
You can probably find information regarding the processing requirements of different models in an article named "Salient Object Detection: A Survey" (It was just recently updated).
[Images: thresholded image, BGR image, fitted thresholded image]
Hi all. I'm working on a computer vision project using the OpenCV C++ interface. My goal is to track a moving deformable object that is marked with colored tape. By processing each frame of the video I'm able to effectively isolate the color (as you can see in the thresholded image) and track its trajectory, movement and shape in the BGR image.
My problem is that I need to derive an equation or polynomial that describes the current shape assumed by my tracked object.
Is there an effective way to do this? I have no idea how to approach the problem.
Thanks in advance,
Cheers!
If your final goal is to detect your shape in various forms, I think you want to read about active shape models: https://en.wikipedia.org/wiki/Active_shape_model
If you just want a polynomial fit of the shape at each instant of time, I would use the suggestion of Cherkesgiller Tural and read about 2D curve fitting.
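For the plain polynomial fit, a minimal least-squares sketch could look like the following. It assumes you have already collected the (x, y) coordinates of the white pixels from the thresholded frame and that the tape can be described as y = f(x), i.e. it does not fold back over itself. The name `polyfit` and the solve via normal equations are my choices for illustration.

```python
def polyfit(xs, ys, degree):
    """Least-squares coefficients [a0, a1, ..., ad] of
    y = a0 + a1*x + ... + ad*x**d for the sample points (xs, ys).

    Needs at least degree + 1 distinct x values, otherwise the
    system is singular and the division below fails.
    """
    n = degree + 1
    # Normal equations (V^T V) a = V^T y for the Vandermonde matrix V.
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for k in range(col, n):
                A[r][k] -= f * A[col][k]
            b[r] -= f * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (b[i] - s) / A[i][i]
    return coeffs
```

If the shape can loop back on itself, one option is to order the points along the tape and fit x(t) and y(t) separately with the same function. For high degrees the normal equations get numerically poor, so a library least-squares routine is the safer choice there.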
If I understood correctly:
I would start by fitting a polygon to your shape. A common method for that is alpha shapes.
You can also try an optimization approach, which is enormously powerful because you can design your cost function and constraints however you want. But it can be computationally very costly (depending on the algorithm).
Have a look at this thread: It might help you.
I have a set of simple rigid 3D objects that I wish to detect and recognize in an image (let's say 5 to 10 classes). The objects are simple in the sense that they are cylinders of one color, or rectangles with simple patterns (stripes, for example), or some similarly simple shape. The objects are significantly different from one another (there aren't, for example, two classes where one is a large cylinder and the other is the same cylinder but smaller).
Because the textures are pretty simple (solids and/or simple patterns), a bag-of-words approach fails (the objects do not contain a significant number of unique edges).
While one possible approach is to code each classifier manually (manual feature extraction etc.), is there a simple data-driven approach (a Haar/LBP classifier, for example) that would work? If Haar or LBP are good for solving this problem, how would one handle the unknown relative viewpoint (and thus perspective distortion, rotation, etc.)? Would training converge if I just provided positive images of an object from all possible viewpoints, or is there something else that's usually done? The detection and recognition should run in real time.
Based on your description of your problem, I see several drawbacks of a Haar or LBP-based detector. First, these features do not use color, which seems to be important here. Second, a classifier using Haar or LBP features is sensitive to in-plane and out-of-plane rotation. If your objects can be in any 3D orientation, you would need to discretize the range of 3D rotations and train a separate detector for each one. For example, for face detection you typically use two detectors: one for frontal faces, and one for profile faces. Finally, if there is not enough texture for bag-of-words, there also may not be enough texture for Haar or LBP.
Since your objects are simple 3D shapes, I would start by trying to detect straight lines and circles using the Hough transform, and trying to group them to form the object's outlines.
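To give an idea of what the Hough transform does for straight lines, here is a bare-bones sketch that votes over (rho, theta) for a list of edge pixel coordinates. The function name `hough_lines`, the 1-pixel rho resolution and the default parameters are assumptions for illustration; a real-time system would rather use an optimized library implementation.

```python
import math


def hough_lines(edge_points, n_theta=180, min_votes=10):
    """Return (rho, theta, votes) tuples, strongest first, for lines
    x*cos(theta) + y*sin(theta) = rho supported by at least `min_votes`
    of the given (x, y) edge points.

    theta is sampled uniformly in [0, pi); rho is rounded to whole pixels.
    """
    thetas = [math.pi * t / n_theta for t in range(n_theta)]
    trig = [(math.cos(t), math.sin(t)) for t in thetas]
    votes = {}
    for x, y in edge_points:
        # Each edge point votes for every line that could pass through it.
        for ti, (ct, st) in enumerate(trig):
            rho = int(round(x * ct + y * st))
            votes[(rho, ti)] = votes.get((rho, ti), 0) + 1
    lines = [(rho, thetas[ti], v) for (rho, ti), v in votes.items()
             if v >= min_votes]
    lines.sort(key=lambda line: -line[2])
    return lines
```

Nearby (rho, theta) cells often collect almost the same votes for one physical line, so some non-maximum suppression is usually needed before grouping lines into outlines.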
A little introduction on what I'm doing ...
For academic purposes I am creating an application in C++ using OpenCV for the detection of static objects in a scene.
The application is based on a combined approach of background subtraction and tracking, and the detection of events related to abandoned objects works fine.
But at the moment I have a problem that I can't solve: I have to implement a finite state machine to detect the object-removal event, both before and after the object has been absorbed into the background.
To do this, my supervisors told me to use the edges of the objects.
And now the problem.
After detecting a vehicle illegally parked along a road, I need to compare the edges of several images (the background captured at the time of the alarm, the current background, the current frame) to understand what the vehicle does (starts moving, remains parked, or starts moving after having been absorbed into the background).
I run these comparisons on the region of the scene that contains the vehicle (vehicles typically have different sizes). I extract the edges with the Canny algorithm, which gives me a binarized CV_8UC1 cv::Mat.
At this point I have to compare them.
I tried to detect the contours with findContours and compare them with matchShapes, but that does not seem to be the right way: I would have to compare each contour of the first image with every contour of the second. In addition, the two images to compare typically have a different number of contours (for example the original background and the current background, because the number of edges in the current background increases once the vehicle is absorbed into it).
I also tried creating a new image in which each pixel is the absolute difference of the other two. I then counted the white pixels of the difference image (wPx) and used this number for the comparison as follows: I set two thresholds (thr1 and thr2) and computed the perimeter in pixels of the vehicle's bounding rect (perim); if wPx < thr1*perim the images are considered equal, and if wPx > thr2*perim the images are different.
(The thresholds are percentages that I multiply by the perimeter of the bounding box, so that they adapt to the vehicle's dimensions.)
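For reference, the difference-based comparison can be sketched as below. This is only my reading of the rule: the two-sided decision (below thr1*perim means equal, above thr2*perim means different, anything in between undecided), the default threshold values and the name `compare_edges` are all assumptions, and the edge maps are taken to be same-size lists of rows of 0/nonzero values cropped to the vehicle's bounding rect.

```python
def compare_edges(edges_a, edges_b, thr1=0.05, thr2=0.20):
    """Classify two same-size binary edge maps as 'same', 'different'
    or 'uncertain' from the number of pixels where they disagree.

    thr1 and thr2 are fractions of the bounding-rect perimeter
    (hypothetical default values, to be tuned).
    """
    h, w = len(edges_a), len(edges_a[0])
    # White pixels of the absolute-difference image.
    w_px = sum(
        1
        for r in range(h)
        for c in range(w)
        if bool(edges_a[r][c]) != bool(edges_b[r][c])
    )
    perim = 2 * (h + w)  # perimeter of the bounding rect, in pixels
    if w_px < thr1 * perim:
        return "same"
    if w_px > thr2 * perim:
        return "different"
    return "uncertain"
```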
This solution, however, does not seem to be very robust.
Do you have something simple to suggest?
Thank you very much in any case, more than once you StackOverflow users have helped me!
PS: THIS is an example of the images that I have to compare
The first is the background without the vehicle stationary, contains the edges of the street;
the second is the original background, the one captured when the stationary vehicle is detected;
the third is the current background (which in this case is equal to the original being the same frame, but then change);
the fourth is the current frame of the video;
You may want to take a look at this paper: A Novel SIFT-Like-Based Approach for FIR-VS Images Registration. Aguilera et al. propose an Edge Oriented Histogram descriptor (EOH-SIFT).
The paper aims to register multispectral images (a visible and an infrared image) to each other. Because of the different characteristics of the images, the authors first extract edges/contours in both images, which results in images similar to yours.
So, you can describe your image patches using this descriptor, illustrated in the following figure (taken from the above paper):
Subdivide your image patch into 4x4 zones
For each of the 16 subregions compose a histogram of contour's orientation (5 bins)
Put the histograms together into one descriptor vector of size 16x5=80 bins
Normalize the feature vector
So, every image you want to compare (in your case 4) is described by its 80-dimensional feature vector. You can compare them to each other by calculating and evaluating the Euclidean distance between them.
Note: Here a patch of size 80x80 or 100x100 (NxN) pixels is suggested. You may have to adjust the sizes to your image sizes.
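The four steps above can be sketched roughly as follows. This is a simplified stand-in, not the paper's implementation: I assume a square grayscale (or binary edge) patch given as a list of rows, estimate the local orientation from central differences, and use 5 uniform orientation bins over [0, pi), which may differ from how the paper defines its bins. The names `eoh_descriptor` and `euclidean` are made up for this example.

```python
import math


def eoh_descriptor(patch, grid=4, bins=5, min_mag=1e-6):
    """Edge-orientation histogram of a square patch.

    The patch is split into grid x grid zones; each zone gets a
    `bins`-bin orientation histogram; the concatenated vector
    (4*4*5 = 80 values by default) is L2-normalized.
    """
    n = len(patch)
    hist = [0.0] * (grid * grid * bins)
    for y in range(1, n - 1):
        for x in range(1, n - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            if math.hypot(gx, gy) < min_mag:
                continue  # flat area, no orientation to vote for
            # Orientation is undirected, so fold the angle into [0, pi).
            angle = math.atan2(gy, gx) % math.pi
            b = min(int(angle / math.pi * bins), bins - 1)
            zone = (y * grid // n) * grid + (x * grid // n)
            hist[zone * bins + b] += 1.0
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist


def euclidean(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
```

You would compute one descriptor per image (resized to a common N x N patch) and treat a small distance as "same edges". Where to put the decision threshold is something you have to determine from your own data.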
I came across an application called PhysicsEditor that traces images and produces the vertices that make up the shape. I'm interested in implementing something that does this, but I'm not sure what type of algorithm to use.
You get all of the points that make up the image (you might need to do this with edge detection or some kind of PCA if you're dealing with bitmaps)
Then you compute a convex hull: http://en.wikipedia.org/wiki/Convex_hull
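As an illustration of the hull step, here is a small sketch using Andrew's monotone chain algorithm (one of several standard convex hull algorithms, picked here because it is short). It assumes the shape's points are already available as (x, y) tuples; the function name `convex_hull` is mine.

```python
def convex_hull(points):
    """Convex hull of 2D points via Andrew's monotone chain.

    Returns the hull vertices in counter-clockwise order for a
    y-up coordinate system (clockwise on screen if y points down),
    without repeating the first vertex. Collinear points on the
    hull edges are dropped.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); > 0 means a left turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]
```

Keep in mind that a convex hull cannot follow concave parts of an outline, so for shapes with notches the resulting polygon will be looser than a traced contour.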