Scratch Detection with Limited Samples - computer-vision

Problem:
I'm trying to build a image processing program which detects scratches on a module.
As seen in the image below, a few scratches can be found.
One of the problems is that I have only two samples with scratches.
Question:
What would be the best way to find the scratches under the limited number of sample condition?
(if detection is too hard, accepted / not accepted classification is also fine)
What I tried:
I tried to detect the scratches by using GMM(gaussian mixture model)
-> It didn't work because of too many features. GMM is only effective on object such as textures.
I will try to implement Deep Learning, but I'm not sure if it will work or not.
Image Sample

Related

How can I use dlib for a neural network regression?

It seems that dlib needs a loss layer that dictates how the layers most distant to our input layer are treated. I cannot find any documentation towards the loss layers but it seems that there is no way to have just some summation layer.
Summing up all the values of the last layer would be exactly what I need for the regression, though (see also: https://deeplearning4j.org/linear-regression)
I was thinking along the lines of writing a custom loss layer but could not find information about this, either.
So, have I overseen some corresponding layer here or is there a possibility to have what I need?
The loss layers in dlib are listed in the menu on dlib's machine learning page. Look for the words "loss layers". There is lots of documentation.
The current released version of dlib doesn't include a regression loss. However, if you get the current code from github you can use the new loss_mean_squared layer to do regression. See: https://github.com/davisking/dlib/blob/master/dlib/dnn/loss_abstract.h

FFT based image registration (optionally using OpenCV) in cpp?

I'm trying to align two images taken from a handheld camera.
At first, I was trying to use the OpenCV warpPerspective method based on SIFT/SURF feature points. The problem is the feature-extract & matching process may be extremely slow when the image quality is high (3000x4000). I tried to scale-down the image before find feature-points, the result is not as good as before.(The Mat generated from findHomography shouldn't be affected by scaling down the image, right?) And sometimes, due to lack of good feature point matches, the result is quite strange.
After searching on this topic, it seems that solving the problem in Fourier domain will speed up the registration process. And I've found this question which leads me to the code here.
The only problem is the code is written in python with numpy (not even using OpenCV), which makes it quite hard to re-written to C++ code using OpenCV (In OpenCV, I can only find dft and there's no fftshift nor fft stuff, I'm not quite familiar with NumPy, and I'm not brave enough to simply ignore the missing methods). So I'm wondering why there is not such a Fourier-domain image registration implementation using C++?
Can you guys give me some suggestion on how to implement one, or give me a link to the already implemented C++ version? Or help me to turn the python code into C++ code?
Big thanks!
I'm fairly certain that the FFT method can only recover a similarity transform, that is, only a (2d) rotation, translation and scale. Your results might not be that great using a handheld camera.
This is not quite a direct answer to your question, but, as a suggestion for a speed improvement, have you tried using a faster feature detector and descriptor? In OpenCV SIFT/SURF are some of the slowest methods they have for feature extraction/matching. You could try testing some of their other methods first, they all work quite well and are faster than SIFT/SURF. Especially if you use their FLANN-based matcher.
I've had to do this in the past with similar sized imagery, and using the binary descriptors OpenCV has increases the speed significantly.
If you need only shift you can use OpenCV's phasecorrelate

Object Annotation in images with OpenCV

I am trying to develop an automatic(or semi-automatic) image annotator for my final year project with OpenCV. I have been studying many OpenCV resources and have come across cascade classification for training and detection purposes. I understood that part, and also tried the Face Detection tutorial provided with OpenCV. So, now I know how to train and detect objects.
However, I still cannot understand how can I annotate objects present in the image?
For example, the system will show that this is an object, but I want the system to show that it is a ball. How can i accomplish that?
Thanks in advance.
One binary classificator (detector) can separate objects by two classes:
positive - the object type classifier was trained for,
and negative - all others.
If you need detect several distinguished classes you should use one detector for each class, or you can train multiclass classifier ("one vs all" type of classifiers for example), but it usually works slower and with less accuracy (because detector better search for similar objects). You can also take a look at convolutional networks (by Yann LeCun).
This is a very hard task. I suggest simplifying it by using latent SVM detector and limiting yourself to the models it supplies:
http://docs.opencv.org/modules/objdetect/doc/latent_svm.html

Best algorithm for feature detection in urban environment - OpenCV

I'm using OpenCV library (C++) to extract detectors from 2 images coming from a video stream taker from an aerial camera in order to, afterwards, find the matching points in successive images. i'm wondering which is the best algorithm to find robust detectors of a urban environment??
Ps. Actually I'm using SURF but when the images changes a little (because the camera is translating very slowly) the matchings between these descriptors become very few!
If you want to try different aproaches give a try to RoboRealm , they have a trial version, you just put the algoritms and seems the results, for testing purposes even if you will use OpenCV its ok.

Augmented Reality-PC

I recently saw the virtual mirror concept on you tube, I tried it out and researched about it. It seems that the creators have used augmented reality so that people can see the output on their screens. On researching I found out that we identify a pattern on which a 3D image is superimposed.
Question 1:How are they able to superimpose the jewellery and track the face of the person without identifying any pattern?
I also tried to check various libraries that I can use to make a program similar to the one they show. Seems to me that a lot of people are using Android phones and iPhones and making apps that use augmented reality.
Question 2:Is there any way that I can use c++ and try to make a program that uses augmented reality?
Oh, and the most important thing, the link to the application is provided below:
http://www.boutiqueaccessories.com.au/virtual-mirror/w1/i1001664/
Do try it out. Its a good experience. :D
I'm not able to actually try the live demo, but the linked video suggests that they either use some simplified pattern recognition (get the person's outline), or they simply track you based on the initial image (with your position/texture being determined by the outline being shown.
Following the video, it's easy to see that there's no real/advanced AR behind this. The images are simply overlayed or hidden (e.g. in case it's missing track of one ear due to you looking to the side) and they're not transformed (no perspective or resizing happening). They definitely seem to track the head (or features like ears, neck, etc.). depending on your background and surroundings that's actually a rather trivial task.
Question 2: Sure! There are lots of premade toolsets out there, but you could as well use some general image processing library such as OpenCV to do the math. Augmented reality usually uses some kind of pattern (e.g. a card or page with a known pattern) to determine the correct position and transformation for the contents to be added to the image. There are also approaches using the device's orientation and perspective changes in camera images to determine depth/position (I really like this demo).