Robust algorithm for detecting vehicles before stop line - computer-vision

I need to write a program that uses a camera to detect the presence of a vehicle inside a defined region of the road before the stop line at an intersection (much like an inductive loop). The output will be true or false depending on whether a vehicle is visible in that region. The camera can be installed perpendicular to the road or above it. At this stage I am looking for an algorithm.
The following image shows a sample setup for detecting vehicles at the intersection:
After some study in this field I realized this technique is background subtraction: the program models the background, and when a vehicle enters the area it is detected. But by definition this detects moving vehicles, so what about the 50-60% of the time when cars stop on the sensor (when the signal turns red)? Will they become part of the background? Will they still be detected every time?
I've seen some algorithms in the background subtraction field, like Mixture of Gaussians, but I doubt they work in this real situation because of the problem above.
Currently I have programmed a simple averaging method using OpenCV under Linux. The program calculates the average pixel value inside the rectangle, saves it in a buffer, computes the mode, and compares it with the current frame. But there are problems such as vehicle headlights at night, vehicle shadows during the day, and cars stopping on my sensor because of the red signal.
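As a rough illustration of the averaging idea described above (the ROI coordinates, buffer length, threshold, and video source are made-up placeholders, not values from the question):

    import collections
    import statistics
    import cv2
    import numpy as np

    # Hypothetical ROI and tuning values, purely for illustration.
    ROI = (200, 300, 120, 80)      # x, y, width, height of the virtual loop
    BUFFER_LEN = 300               # ~10 s of history at 30 fps
    THRESHOLD = 15.0               # difference that counts as "occupied"

    history = collections.deque(maxlen=BUFFER_LEN)
    cap = cv2.VideoCapture("intersection.mp4")   # or a camera index

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        x, y, w, h = ROI
        gray = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2GRAY)
        mean_val = float(np.mean(gray))
        history.append(round(mean_val))

        # Compare the current frame against the most common historical value.
        background_estimate = statistics.mode(history)
        occupied = abs(mean_val - background_estimate) > THRESHOLD
        print(occupied)

    cap.release()

As the question notes, this kind of scheme still struggles with headlights, shadows, and cars that stay on the sensor long enough to dominate the buffer.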

Rather than separating the foreground from the background, I would recommend a detector-based approach. Background subtraction suffers from many lighting-condition problems and is rather old-fashioned.
In OpenCV you can use, for example, a Haar cascade or an LBP cascade for fast and simple vehicle detection. OpenCV 3.1 ships two utilities (opencv_createsamples and opencv_traincascade) for training the detector.
Using the detector is simple, just as in this tutorial.
There are also sources on the web where you can download an already pretrained cascade for car detection.
The detection code in OpenCV is short and easy to understand.
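A minimal sketch of such a detection loop (the cascade file name cars.xml and the video source are placeholders for whichever cascade and camera/stream you use):

    import cv2

    # "cars.xml" stands in for any Haar/LBP car cascade you have trained or downloaded.
    car_cascade = cv2.CascadeClassifier("cars.xml")
    cap = cv2.VideoCapture("intersection.mp4")

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # scaleFactor and minNeighbors usually need tuning for your camera view.
        cars = car_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=3)
        for (x, y, w, h) in cars:
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.imshow("detections", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

    cap.release()
    cv2.destroyAllWindows()

Checking whether any detected box overlaps your virtual-loop rectangle then gives the true/false output, regardless of whether the car is moving.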
You can find the examples on my blog. I also have a car dataset containing 2000 positive car samples. Just list these samples in a bash script to build the list of positives, then use the createsamples and traincascade utilities. LBP cascades are a little faster, with comparable performance.
I have trained cascades on Windows and also under Linux; the only difference is how the program is run. The training data (vec.vec, bg.dat) has to be prepared with the createsamples utility. If you already have a dataset, preparing the training takes about 20 minutes; the problem is where to find the data. I have a dataset on my blog. Also try to understand the script. My -w 32 -h 64 parameters are for people detection; for cars something like -w 32 -h 32 is better.
Run ./opencv_traincascade (opencv_traincascade.exe on Windows) with parameters such as:
opencv_traincascade -data v5 -vec vec.vec -bg bg.dat -numPos 540 -numNeg 700 -numStages 11 -numThreads 4 -stageType BOOST -featureType LBP -w 32 -h 64 -minHitRate 0.999995 -maxFalseAlarmRate 0.2 -maxDepth 10 -maxWeakCount 120 -mode ALL
I have also collected some datasets for training the detector.
You can also download the dataset from the Dataset link.

Related

Better model for classifying image quality (separate sharp & well-lit images from blurry/out of focus/grainy images)

I have a dataset of around 20K images that are human labelled. Labels are as follows:
Label = 1 if the image is sharp and well lit, and
Label = 0 for those blurry/out of focus/grainy images.
The images are of documents such as Identity cards.
I want to build a Computer Vision model that can do the classification task.
I tried using VGG-16 for transfer learning on this task but it did not give good results (precision = 0.65 and recall = 0.73). My sense is that VGG-16 is not suitable for this task: it is trained on ImageNet, whose low-level features are very different. Interestingly, the model is under-fitting.
We also tried EfficientNet-B7. Though the model performed decently on the training and validation data, test performance remains bad.
Can someone suggest more suitable model to try for this task?
I think your problem with VGG and the other networks is the resizing of the images:
VGG expects as input 224x224 size image. I assume your dataset has much larger resolution, and thus you significantly downscale the input images before feeding them to your network.
What happens to blur/noise when you downscale an image?
Blurry and noisy images become sharper and cleaner as you decrease the resolution. Therefore, in many of your training examples the net sees a perfectly good image that you have labelled as "corrupt". This is not good for training.
An interesting experiment would be to see which types of degradation your net classifies correctly and which types it fails on. You report 65% precision @ 73% recall; can you look at the classified images at that operating point and group them by degradation type?
That is, what is the precision/recall for only blurry images? What is it for noisy images? What about grainy images?
What can you do?
Do not resize the images at all! If the network needs a fixed-size input, then crop rather than resize.
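A minimal sketch of the cropping idea, assuming a fixed 224x224 input (the usual VGG input size) and images stored as numpy arrays:

    import numpy as np

    def random_crop(image: np.ndarray, size: int = 224) -> np.ndarray:
        """Take a random fixed-size crop instead of downscaling the whole image."""
        h, w = image.shape[:2]
        if h < size or w < size:
            raise ValueError("image is smaller than the crop size")
        top = np.random.randint(0, h - size + 1)
        left = np.random.randint(0, w - size + 1)
        return image[top:top + size, left:left + size]

Because no interpolation happens, the blur and noise characteristics of the original image are preserved in the crop.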
Taking advantage of the "resizing" effect, you could also approach the problem with a "discriminator": train a network that discriminates between an image and its downscaled version. If the image is sharp and clean, this discriminator will find the task difficult; for blurred/noisy images the task should be rather easy.
For this task, I think using OpenCV is sufficient to solve the issue. Comparing the variance of the Laplacian of the image (cv2.Laplacian(image, cv2.CV_64F).var()) with a threshold will give a decision on whether an image is blurred or not.
You can find an explanation of the method and the code in the following tutorial: detection with opencv.
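A minimal sketch of that check (the threshold of 100 is only a placeholder; you would tune it on your own documents):

    import cv2

    def is_blurry(image_path: str, threshold: float = 100.0) -> bool:
        """Return True when the variance of the Laplacian falls below the threshold."""
        image = cv2.imread(image_path)
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        focus_measure = cv2.Laplacian(gray, cv2.CV_64F).var()
        return focus_measure < threshold

The Laplacian responds to high-frequency content, so sharp images have a high variance and blurred images a low one.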
I think that training a classifier that takes the output of one of your neural network models plus the Laplacian variance as features would improve the classification results.
I also recommend experimenting with ResNet and DenseNet.
I would look at the change in color between adjacent pixels and then rank the photos by the median delta between pixels: a sharp change from RGB (0,0,0) to (255,255,255) on each of the adjoining pixels would be the maximum possible score, and the more blur you have, the lower the score.
I have done this in the past to estimate areas of fields, with success.
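A rough sketch of that ranking idea as I read it (median absolute difference between horizontally adjacent pixels; the file names are placeholders):

    import cv2
    import numpy as np

    def sharpness_score(image_path: str) -> float:
        """Median absolute difference between horizontally adjacent pixels (higher = sharper)."""
        image = cv2.imread(image_path).astype(np.float32)
        horizontal_delta = np.abs(np.diff(image, axis=1))  # per-channel change between neighbours
        return float(np.median(horizontal_delta))

    # Rank photos from sharpest to blurriest.
    paths = ["doc_a.jpg", "doc_b.jpg", "doc_c.jpg"]   # placeholder file names
    ranked = sorted(paths, key=sharpness_score, reverse=True)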

Image classification in a video stream with contours in OpenCV

Please, I need your help with this problem. I want to create a program that differentiates between two forms (2 images) with a camera in real time. I found some methods, but I am not sure they are going to work, because I want the detection to remain feasible if the object is rotated by 90 or 180 degrees, for example. I have to use machine learning for this problem, but I am open to any proposition; I also do not have many images in the database.
Here are the methods I found:
1 - Apply a Canny filter to extract contours.
2 - Use feature extractors such as SIFT, Fourier descriptors, Haralick features, or the Hough transform to extract more details, which can be summarised in a short vector.
3 - Then train an SVM or ANN on this vector (a rough sketch of this pipeline follows below).
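One illustrative (untested) sketch of steps 1-3: Hu moments computed from Canny edges are rotation-invariant, which fits the 90/180 degree concern, and they feed directly into an SVM. The image lists below are placeholders.

    import cv2
    import numpy as np
    from sklearn.svm import SVC

    def shape_features(image: np.ndarray) -> np.ndarray:
        """Canny edges -> Hu moments, which are invariant to rotation and scale."""
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        edges = cv2.Canny(gray, 50, 150)
        hu = cv2.HuMoments(cv2.moments(edges)).flatten()
        # Log-scale the Hu moments so they have comparable magnitudes.
        return -np.sign(hu) * np.log10(np.abs(hu) + 1e-12)

    # open_images / closed_images are placeholder lists of training images.
    # X = np.array([shape_features(img) for img in open_images + closed_images])
    # y = np.array([1] * len(open_images) + [0] * len(closed_images))
    # clf = SVC(kernel="rbf").fit(X, y)
    # prediction = clf.predict([shape_features(test_image)])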
The goal is to detect two cases: open or closed.
Also, I do not know whether contours are the best way to solve this problem, because the background changes a lot.
The original images are valves with different shapes; here is an example:

Creating training set from multiple images in Haar Cascade

I am currently working on detecting multiple fruits in a given image. For example, the image can contain fruits like bananas (yellow, red, and green), mangoes, oranges, etc. I was able to create a training set with only one image at a time using opencv_createsamples.
Sample Code:
C:\opencv\build\x64\vc14\bin\opencv_createsamples.exe -img redbanana.jpg -bg bg.txt -info info/info.lst -pngoutput info -maxxangle 0.5 -maxyangle 0.5 -maxzangle 0.5 -num 100
I have done the same for around 5 fruits, which creates a separate vec file for each fruit. It is tedious to do this for every fruit. Is there any way to create a training set from multiple images with a single vec file as the output?
Is there any other methodology to detect multiple fruits in a given image?
A Haar classifier is ideally suited to detecting one class of similar-looking objects quickly, as outlined in the OpenCV documentation (http://docs.opencv.org/2.4/modules/objdetect/doc/cascade_classification.html). For example, the OpenCV repository (https://github.com/opencv/opencv) has a list of classifiers (https://github.com/opencv/opencv/tree/master/data/haarcascades) trained for specific classes of objects.
Unless the objects to be detected are similar (like faces with different features, or cars of different makes and models), training is more effective with one classifier per fruit, e.g., bananas, oranges, mangoes, etc.
To create a training vector based on multiple positive sample images (and for any other aspect of Haar classifier training), I'd recommend the steps here (steps 5 and 6) and the details covered at http://coding-robin.de/2013/07/22/train-your-own-opencv-haar-classifier.html. In your case the positive images should include all types of bananas, oranges, mangoes, etc., including variation in color.
If you want to train the classifier with different variations of the same fruit, you can generate training samples from multiple images as described here.
However, do note that Haar classifiers work in greyscale and it is difficult to guarantee differentiation between objects like red and yellow bananas.
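As an illustration of the one-cascade-per-fruit approach (the cascade and image file names below are placeholders for cascades you would train yourself):

    import cv2

    # Hypothetical file names; each cascade would be trained separately per fruit.
    cascades = {
        "banana": cv2.CascadeClassifier("banana_cascade.xml"),
        "mango": cv2.CascadeClassifier("mango_cascade.xml"),
        "orange": cv2.CascadeClassifier("orange_cascade.xml"),
    }

    image = cv2.imread("fruit_bowl.jpg")
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)   # Haar/LBP cascades operate on greyscale

    for fruit, cascade in cascades.items():
        for (x, y, w, h) in cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=4):
            cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
            cv2.putText(image, fruit, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)

    cv2.imwrite("detections.jpg", image)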
If you want multiple classes in one classifier, I recommend YOLO (You Only Look Once) or SSD (Single Shot multibox Detector).

Setting up a CNN network in keras?

I am currently trying to implement a CNN that maps an input to an output.
The input consists of the STFT of audio files, and the output is a feature vector.
Because the audio files have different lengths, the total number of samples always differs, but each sample has a frame length of 25 ms with a 10 ms overlap. The input shape is (x, 2050).
The output is a feature vector with shape (x, 13).
A CNN seemed appropriate here because, due to the overlap, each input frame contains some information from the previous one.
Is it possible in Keras to design a model that makes use of this, so that a convolution is computed for each row of the matrix, and somehow make it aware of the 25 ms frame length and the 10 ms overlap?
Yes it is, see line 220 of this file [1]. This is an implementation of Wavenet in Keras using convolutions. Even though they've created wrapper layers, this should give you the intuition on how to model audio samples.
[1] https://github.com/basveeling/wavenet/blob/master/wavenet.py#L220
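As a rough sketch (separate from the linked repository), a stack of 1D convolutions over the frame axis can map (x, 2050) STFT frames to (x, 13) outputs; the filter counts and kernel sizes below are arbitrary placeholders:

    import tensorflow as tf

    # Input: a variable number of frames (None), each holding 2050 STFT values.
    inputs = tf.keras.Input(shape=(None, 2050))
    # A kernel_size of 3 lets each output row "see" its neighbouring frames,
    # roughly mirroring the 25 ms frames with 10 ms overlap.
    x = tf.keras.layers.Conv1D(256, kernel_size=3, padding="same", activation="relu")(inputs)
    x = tf.keras.layers.Conv1D(64, kernel_size=3, padding="same", activation="relu")(x)
    outputs = tf.keras.layers.Conv1D(13, kernel_size=1, padding="same")(x)  # one 13-dim vector per frame

    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")
    model.summary()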

Best way to train a pedestrian detector using dlib

I am trying to train a pedestrian detector using dlib and the INRIA Person Dataset.
So far I have used 27 images; the training is fast but the results are unsatisfactory (on other images pedestrians are rarely recognized). Here is the result of my training using the train_object_detector program that comes with dlib (in the examples directory):
Saving trained detector to object_detector.svm
Testing detector on training data...
Test detector (precision,recall,AP): 1 0.653061 0.653061
Parameters used:
threads: 4
C: 1
eps: 0.01
target-size: 6400
detection window width: 47
detection window height: 137
upsample this many times : 0
I am aware that more images need to be added to the training in order to get better results, but before doing that I want to be sure of the meaning of every value printed in the result (precision, recall, AP, C, eps, ...). I am also wondering if you have any recommendations regarding the training: Which images should I choose? How many images are needed? Do I need to annotate every object in the image? Do I need to ignore some regions in the image?
One last question, is there any trained detector (svm file) that I can use to compare my results ?
Thank you for your answers
I am not familiar with dlib in particular, but let me tell you that you will not get good results with 27 images. In order to generalize well, your classifier needs to see many images with a variety of data. It won't do you any good to supply it with 10,000 images of the same person, wearing the same outfit. You want different people, clothing, settings, angles, and lighting. The INRIA dataset should cover most of those.
Your detection window dimensions and upsample settings will determine how large people must look in the image in order for your trained classifier to detect them reliably. Your settings will detect only people at 1 scale where they are around 137/47 pixels tall/wide. If you upsample even once, you'll be able to detect people at a smaller scale (upsampling makes the person look bigger than they are). I suggest you use a larger dataset and increase the upsampling number (by how much you upsample is another discussion - that appears to be built into the library). Things will take longer, but that is the nature of training classifiers - tweak parameters, retrain, compare the results.
For precision/recall I'll refer you to this wikipedia article. These are not parameters, but results of your classifier. You want both to be as close to 1 as possible.
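For reference, precision and recall reduce to simple ratios over true positives, false positives, and false negatives; the counts below are illustrative (chosen so the ratios happen to match the log above), not values reported by dlib:

    # Precision = TP / (TP + FP): of the detections fired, how many were real pedestrians.
    # Recall    = TP / (TP + FN): of the real pedestrians, how many were detected.
    true_positives = 32    # illustrative counts, not taken from dlib's output
    false_positives = 0
    false_negatives = 17

    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    print(f"precision={precision:.3f} recall={recall:.3f}")
    # precision=1.000 recall=0.653

AP (average precision) summarizes the precision/recall trade-off across detection thresholds into a single number, again with 1 as the best possible value.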