Any fast object detection method for binary images? - computer-vision

My current task is detecting abnormal spots in a binary image.
I trained a simple convolutional NN model to classify whether a single spot is abnormal.
I now need to detect abnormal points in a large image.
Inspecting every pixel requires thousands of classifications. (I'm using this sliding-window method, but it takes a tremendous amount of time; a sketch of it is shown below.)
An example of the large image is below.
* I need to inspect the white/black patterns (they seem hard to segment).
* The yellow box is the size of one CNN patch.
* The center of the yellow box is the location of the abnormal spot.
[Example of the large image]
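For reference, here is a minimal sketch (not the asker's actual code) of the brute-force sliding-window scan described above. It assumes a Keras-style patch classifier exposing model.predict, a hypothetical 32x32 patch size and stride, and an image already scaled to [0, 1]; batching all patches into one predict call is the one cheap speed-up over classifying each location separately.

import numpy as np

PATCH = 32   # assumed CNN patch size (the yellow box)
STRIDE = 4   # assumed step between patch centers

def scan_image(image, model, threshold=0.5):
    # Slide a PATCH x PATCH window over the binary image and return the
    # centers of the windows the CNN classifies as abnormal.
    h, w = image.shape[:2]
    patches, centers = [], []
    for y in range(0, h - PATCH + 1, STRIDE):
        for x in range(0, w - PATCH + 1, STRIDE):
            patches.append(image[y:y + PATCH, x:x + PATCH])
            centers.append((y + PATCH // 2, x + PATCH // 2))
    # One batched call instead of thousands of single-patch calls.
    batch = np.stack(patches)[..., np.newaxis].astype(np.float32)
    scores = model.predict(batch, verbose=0).ravel()
    return [c for c, s in zip(centers, scores) if s > threshold]

Even with batching this stays expensive, because overlapping patches are recomputed from scratch; that is the cost the question is trying to avoid.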

Related

How does a deep CNN recognize objects at a location not seen in the training images?

In the training set, the target object always appears in the bottom-left part of the image, even after augmentation.
Will a deep CNN (e.g., DenseNet) recognise the same object in the top-right part of the image? If so, how?
I know that CNNs can achieve a certain degree of translational invariance due to strides and max pooling, but I don't understand the mechanism for large translations.
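No answer is recorded here, but a tiny NumPy/SciPy sketch can at least demonstrate the mechanism mentioned in the last sentence: convolution is translation-equivariant (shifting the input shifts the feature map by the same amount), and a global max pool over the feature map is then insensitive to where the response occurred. The filter and blob below are arbitrary examples, not anything from the question.

import numpy as np
from scipy.ndimage import correlate

# A fixed 3x3 vertical-edge filter, standing in for a single conv layer.
kernel = np.array([[1., 0., -1.],
                   [1., 0., -1.],
                   [1., 0., -1.]])

def feature_map(img):
    return correlate(img, kernel, mode='constant')

img = np.zeros((32, 32))
img[5:8, 5:8] = 1.0                                   # blob near the top-left
shifted = np.roll(img, shift=(20, 20), axis=(0, 1))   # same blob, bottom-right

f1, f2 = feature_map(img), feature_map(shifted)

# Equivariance: the feature map of the shifted image is the shifted feature map.
print(np.allclose(np.roll(f1, (20, 20), axis=(0, 1)), f2))   # True

# Invariance: a global max pool does not care where the response is.
print(f1.max() == f2.max())                                  # True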

Better model for classifying image quality (separate sharp & well-lit images from blurry/out-of-focus/grainy images)

I have a dataset of around 20K human-labelled images. The labels are as follows:
Label = 1 if the image is sharp and well lit, and
Label = 0 for blurry/out-of-focus/grainy images.
The images are of documents such as identity cards.
I want to build a computer vision model that can do this classification task.
I tried transfer learning with VGG-16, but it did not give good results (precision = 0.65 and recall = 0.73). My sense is that VGG-16 is not suitable for this task: it is trained on ImageNet and has very different low-level features. Interestingly, the model is under-fitting.
We also tried EfficientNet-B7. Though that model performed decently on training and validation, the test performance remains bad.
Can someone suggest a more suitable model for this task?
I think your problem with VGG and the other networks is the resizing of the images:
VGG expects a 224x224 input image. I assume your dataset has a much larger resolution, so you significantly downscale the input images before feeding them to your network.
What happens to blur/noise when you downscale an image?
Blurry and noisy images become sharper and cleaner as you decrease the resolution. Therefore, in many of your training examples, the net sees a perfectly good image while you label it as "corrupt". This is not good for training.
An interesting experiment would be to see which types of degradation your net classifies correctly and which types it fails on. You report 0.65 precision and 0.73 recall; can you look at the classified images at that operating point and group them by degradation type?
That is, what is the precision/recall for only blurry images? What is it for noisy images? What about grainy images?
What can you do?
Do not resize the images at all! If the network needs a fixed-size input, then crop rather than resize.
Alternatively, take advantage of the "resizing" effect and approach the problem with a "discriminator": train a network that discriminates between an image and its downscaled version. If the image is sharp and clean, this discriminator will find the task difficult; for blurred/noisy images the task should be rather easy. A sketch of how such training pairs could be built is shown below.
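One hedged way to build those pairs with OpenCV is sketched here; the downscale factor, the interpolation modes, and the paths variable are assumptions for illustration, not part of the original answer.

import cv2
import numpy as np

def make_pair(path, factor=4):
    # Return (original, degraded): the degraded copy is the original
    # downscaled by `factor` and upscaled back to its original size.
    img = cv2.imread(path)
    h, w = img.shape[:2]
    small = cv2.resize(img, (w // factor, h // factor), interpolation=cv2.INTER_AREA)
    degraded = cv2.resize(small, (w, h), interpolation=cv2.INTER_LINEAR)
    return img, degraded

def build_dataset(paths):
    # Label 1 for originals, 0 for their degraded copies. A sharp original is
    # easy to tell apart from its degraded copy; a blurry/noisy original is not,
    # which is exactly the signal the discriminator idea above exploits.
    X, y = [], []
    for p in paths:
        original, degraded = make_pair(p)
        X += [original, degraded]
        y += [1, 0]
    return X, np.array(y)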
For this task, I think using OpenCV is sufficient. Comparing the variance of the Laplacian of the image (cv2.Laplacian(image, cv2.CV_64F).var()) against a threshold gives a decision on whether an image is blurred or not.
You can find an explanation of the method and the code in the following tutorial: detection with opencv
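A minimal sketch of that check; the threshold value is an assumption that would have to be tuned on the labelled data, not a recommended constant.

import cv2

def is_blurry(path, threshold=100.0):
    # Flag an image as blurry when the variance of its Laplacian falls
    # below a dataset-dependent threshold (low variance = few sharp edges).
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    focus_measure = cv2.Laplacian(gray, cv2.CV_64F).var()
    return focus_measure < threshold, focus_measure

In practice the threshold is picked by looking at the score distributions of the images labelled sharp versus blurry.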
I think that training a classifier that takes the output of one of your neural network models together with the variance of the Laplacian as features will improve the classification results.
I also recommend experimenting with ResNet and DenseNet.
I would look at the change in colour between pixels, then rank the photos by the median delta between adjacent pixels. A sharp change from RGB (0,0,0) to (255,255,255) on each of the adjoining pixels would be the maximum possible score; the more blur you have, the lower the score.
I have done this in the past to estimate areas of fields, with success.
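A rough sketch of that scoring idea, using grayscale differences as a stand-in for the RGB deltas the answer describes; this is one interpretation of the approach, not the author's code.

import cv2
import numpy as np

def sharpness_score(path):
    # Median absolute difference between horizontally and vertically adjacent
    # pixels; blur smooths neighbouring pixels together and lowers the score.
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE).astype(np.float32)
    dx = np.abs(np.diff(gray, axis=1))
    dy = np.abs(np.diff(gray, axis=0))
    return float(np.median(np.concatenate([dx.ravel(), dy.ravel()])))

# Rank photos from sharpest to blurriest:
# ranked = sorted(paths, key=sharpness_score, reverse=True)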

What is the best method to train the faces with FaceRecognizer OpenCV to have the best result?

I have tried many tutorials for implementing face recognition in OpenCV 3.2 using the FaceRecognizer class in the face module, but I did not get the results I hoped for.
What I want to ask is: what is the best way to do this, and what conditions need to be taken care of during training and recognition?
What I have done to improve the accuracy:
Create (at least) 10 training faces per person, in the best quality, size, and angle.
Try to fit the face in the image.
Equalize the histogram of the images.
I then tried all three face recognizers (EigenFaceRecognizer, FisherFaceRecognizer, LBPHFaceRecognizer); the results were all much the same, and the recognition rate was really very low. I trained on only three persons, yet the recognizers still could not tell them apart (the first person was recognized as the second, and so on).
Questions:
Must the training and the recognition images come from the same camera?
Should the training images be cropped manually (Photoshop -> read images -> train), or must this be done programmatically (detect -> crop -> resize -> train)?
What are the best parameters for each face recognizer (int num_components, double threshold)?
And how do I make the algorithm return -1 for an unknown person?
Expanding on my comment, Chapter 8 in Mastering OpenCV provides really helpful tips for pre-processing faces to aid the recognition process (a rough sketch of the last few steps follows the list), such as:
Taking a sample only when both eyes are detected (via Haar cascade).
Geometrical transformation and cropping: this process includes scaling, rotating, and translating the images so that the eyes are aligned, followed by the removal of the forehead, chin, ears, and background from the face image.
Separate histogram equalization for the left and right sides: this standardizes the brightness and contrast of the left- and right-hand sides of the face independently.
Smoothing: this reduces image noise using a bilateral filter.
Elliptical mask: the elliptical mask removes some remaining hair and background from the face image.
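A rough Python/OpenCV sketch of the last three steps (split histogram equalization, bilateral smoothing, elliptical mask), assuming the face has already been detected, aligned, and cropped to a grayscale patch; the filter and ellipse parameters are guesses for illustration, not values from the book.

import cv2
import numpy as np

def preprocess_face(face, size=(128, 128)):
    # `face`: an aligned grayscale face crop (uint8).
    face = cv2.resize(face, size)
    h, w = face.shape

    # Separate histogram equalization for the left and right halves.
    left = cv2.equalizeHist(face[:, :w // 2])
    right = cv2.equalizeHist(face[:, w // 2:])
    face = np.hstack([left, right])

    # Bilateral filter: reduce noise while preserving edges.
    face = cv2.bilateralFilter(face, 5, 50, 50)

    # Elliptical mask: black out the corners (hair/background).
    mask = np.zeros_like(face)
    cv2.ellipse(mask, (w // 2, h // 2), (w // 2 - 4, h // 2 - 4),
                0, 0, 360, 255, -1)
    return cv2.bitwise_and(face, face, mask=mask)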
I've added a hacky load/save to my fork of the example code; feel free to try it and tweak it as you need. Currently it's very limited, but it's a start.
Additionally, you should also check out OpenFace and its DNN face recognizer.
I haven't played with that yet, so I can't provide details, but it looks really cool.

Improving the accuracy of the OpenCV HOG people detector

I'm working on a project, part of which consists of integrating the OpenCV HOG people detector with a camera stream.
The camera and the basic HOG detector (C++ detectMultiScale -> http://docs.opencv.org/modules/gpu/doc/object_detection.html) are currently working, but not very well: the detections are very noisy and the algorithm isn't very accurate.
Why?
My camera image is 640 x 480 pixels.
The snippet of code I'm using is:
std::vector<cv::Rect> found, found_filtered;
cv::HOGDescriptor hog;
// Default people detector: a linear SVM trained on 64x128 windows.
hog.setSVMDetector(cv::HOGDescriptor::getDefaultPeopleDetector());
// image, results, hit threshold, win stride, padding, scale, group threshold
hog.detectMultiScale(image, found, 0, cv::Size(8,8), cv::Size(32,32), 1.05, 2);
Why doesn't it work properly? What do I need to improve the accuracy? Is a particular image size required?
PS: Do you know of a more precise people detection algorithm that is fast and written in C++?
The window size of the default people detector is 64x128, which means that the people you want to detect have to be at least 64x128 pixels. At your camera resolution, a person would have to take up quite some space in the frame before being properly detected.
Depending on your specific situation, you could try training your own HOG descriptor with a smaller window size. Take a look at this answer and the referenced library if you want to train your own HOG descriptor.
For the parameters:
win_stride:
Given that your input image is 640 x 480 and the default people detector has a window size of 64x128, the HOG detection window (the 64x128 window) fits into the input image many times.
win_stride tells HOG how far to move the detection window each step.
How does this work? HOG places the detection window at the top left of your input image and then moves it by win_stride each step, either in small steps (small win_stride) or in large jumps (large win_stride).
A smaller win_stride should improve accuracy but decreases performance, and vice versa.
padding:
Padding adds a certain number of extra pixels on each side of the input image, so that the detection window can be placed partly outside it. It is because of this padding that HOG can detect people who are very close to the edge of the input image.
group_threshold:
The group_threshold determines when overlapping detections should be merged into a single group.
A low value does no result grouping; a higher value groups results once enough overlapping detections fall inside the detection windows. (In my own experience, I have never needed to change the default value.)
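For reference, a minimal Python sketch of the same call with those parameters spelled out (the question uses C++, but the OpenCV Python binding mirrors it); the specific values and the file name are placeholders to tune, not recommendations from this answer.

import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

image = cv2.imread("frame.png")        # hypothetical 640x480 camera frame
rects, weights = hog.detectMultiScale(
    image,
    winStride=(4, 4),     # smaller stride: slower but usually more accurate
    padding=(8, 8),       # lets windows hang slightly over the image border
    scale=1.05,           # image pyramid scale step
    finalThreshold=2.0,   # grouping threshold for overlapping detections
)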
I hope this makes a bit of sense to you.
I've been working with HOG for the past few weeks and have read a lot of papers, but I lost some of the references, so I can't link you to the pages this information comes from, I'm sorry.

Prior-based image segmentation

I have a problem at hand in which my image is composed of strange objects that do not necessarily have closed contours (more like rivers and channels on a plain background).
I am also provided with a set of prior images of the same size, from different rivers, whose general orientation and structure match the river under study, while their position in the image might deviate.
I am looking for an image segmentation method (theory or practice; I am really looking for clues to start with) that can actually use my set of prior examples when segmenting my river. In my case there could be multiple rivers of the same general orientation present in the image.
I am also very interested in ways of statistically representing these complex structures. For example, if it were not a river image (a binary image) and I knew it had a Gaussian structure, I could use the information in the covariance estimated from the examples; but with binary or ternary images, I cannot.
Here is an outline for the image segmentation:
Sample a small region (possibly a rectangle) inside the river; the assumption is that these pixels belong to the foreground and provide a good estimate of its colour distribution. You need an algorithm that can find a small region inside the river with high confidence; this algorithm could probably be trained on the data you have.
Since you know little about the background, it would be ideal to choose pixels lying on the image border as background pixels.
The idea is to use these pre-selected foreground and background pixels as seeds in a graph-cut algorithm for segmentation. Selecting the seeds is the most important part of graph-cut segmentation; once you have good seeds, the segmentation will be more or less correct. There is plenty of literature/code available online on how to do segmentation using graph cuts.
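A minimal sketch of that seeded approach using OpenCV's GrabCut (a graph-cut based segmenter that accepts a seed mask). The fg_rect argument stands in for the automatically found rectangle inside the river and is a placeholder; GrabCut also expects an 8-bit 3-channel image, so a binary image would need to be converted first.

import cv2
import numpy as np

def segment_with_seeds(image, fg_rect):
    # image: 8-bit BGR image; fg_rect = (x, y, w, h), a small rectangle
    # known to lie inside the river (the foreground seed region).
    mask = np.full(image.shape[:2], cv2.GC_PR_BGD, dtype=np.uint8)

    # Pixels on the image border are definite background seeds.
    mask[0, :] = mask[-1, :] = cv2.GC_BGD
    mask[:, 0] = mask[:, -1] = cv2.GC_BGD

    # The sampled rectangle inside the river is a definite foreground seed.
    x, y, w, h = fg_rect
    mask[y:y + h, x:x + w] = cv2.GC_FGD

    bgd_model = np.zeros((1, 65), np.float64)
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(image, mask, None, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_MASK)

    # Everything labelled (probably) foreground is the segmented river.
    return ((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD)).astype(np.uint8)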