OpenCV and C++ - Shape and road signs detection - c++
I have to write a program that detects three types of road signs (speed limit, no parking and warning). I know how to detect a circle using HoughCircles, but I have several images and the HoughCircles parameters that work are different for each image. Is there a general way to detect circles without changing the parameters for each image?
Moreover, I need to detect triangles (warning signs), so I'm looking for a general shape detector. Do you have any suggestions or code that could help me with this task?
Finally, to detect the number on speed limit signs, I thought of using SIFT and comparing the image with some templates in order to identify the number on the sign. Could that be a good approach?
Thank you for the answer!
I know this is a pretty old question, but I went through the same problem and here I show how I solved it.
The following images show some of the most accurate results produced by the OpenCV program.
In these images the detected street signs are circled in three different colors that distinguish the three kinds of street sign (warning, no parking, speed limit).
Red for warning signs
Blue for no parking signs
Fuchsia for speed limit signs
The speed limit value is written in green above the speed limit signs
[![example][1]][1]
[![example][2]][2]
[![example][3]][3]
[![example][4]][4]
As you can see, the program performs quite well: it is able to detect and distinguish the three kinds of sign and to recognize the speed limit value on speed limit signs. It does so without producing too many false positives when, for instance, the image contains signs that do not belong to any of the three categories.
To achieve this result the software performs the detection in three main steps.
The first step is a color-based approach in which the red objects in the image are detected and their regions are extracted for analysis. This step is particularly useful for preventing false positives, because only a small part of the image is processed.
The second step uses a machine learning algorithm: in particular, a cascade classifier performs the detection. This requires first training the classifiers and then using them to detect the signs.
In the last step the speed limit values inside the speed limit signs are read, again through machine learning, but this time using the k-nearest neighbors algorithm.
Now let's look at each step in detail.
COLOR BASED STEP
Since these street signs are always surrounded by a red frame, we can afford to extract and analyze only the regions where red objects are detected.
To select the red objects we work in HSV space and consider both ranges of red hue: even if this produces some false positives, they are easily discarded in the next steps.
inRange(image, Scalar(0, 70, 50), Scalar(10, 255, 255), mask1);
inRange(image, Scalar(170, 70, 50), Scalar(180, 255, 255), mask2);
In the image below we can see an example of the red objects detected with this method.
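Two inRange calls are needed because red sits at both ends of the hue axis. As a minimal sketch of the same test applied to a single pixel, assuming OpenCV's 8-bit HSV convention (hue stored as 0 to 179) and a hypothetical helper name isRedHsv that is not part of the program in this answer:

```cpp
#include <cstdint>

// Hypothetical helper (not part of the original program): returns true when an
// HSV pixel falls in either of the two red ranges used above. Assumes OpenCV's
// 8-bit convention: H in [0, 180), S and V in [0, 255].
bool isRedHsv(std::uint8_t h, std::uint8_t s, std::uint8_t v) {
    const bool saturatedEnough = s >= 70;        // lower S bound shared by both ranges
    const bool brightEnough = v >= 50;           // lower V bound shared by both ranges
    const bool lowRed = h <= 10;                 // first range: H in [0, 10]
    const bool highRed = h >= 170 && h <= 180;   // second range: H in [170, 180]
    return saturatedEnough && brightEnough && (lowRed || highRed);
}
```

The union of the two hue intervals corresponds to mask1 | mask2 in the full code.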
After finding the red pixels, we can group them into regions using a clustering algorithm. I use OpenCV's partition function:
partition(points, labels, predicate)
After running it, we can save all the points of the same cluster in a vector (one per cluster) and extract the bounding boxes, which represent the regions to be analyzed in the next step.
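To make the clustering step concrete without depending on OpenCV, here is a rough sketch of the same idea: points closer than a distance threshold (directly or through a chain of neighbours) end up in one group, and each group gets a bounding box. The types Pt and Box and the function clusterBoundingBoxes are hypothetical names for illustration; this is a simple union-find version, not OpenCV's implementation of partition.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

struct Pt { int x; int y; };
struct Box { int x; int y; int width; int height; };

// Find the representative of the set containing i (with path halving).
int findRoot(std::vector<int>& parent, int i) {
    while (parent[i] != i) {
        parent[i] = parent[parent[i]];
        i = parent[i];
    }
    return i;
}

// Groups points whose squared distance is below maxDistSq and returns one
// bounding box per group, in order of first appearance of each group.
std::vector<Box> clusterBoundingBoxes(const std::vector<Pt>& pts, int maxDistSq) {
    const int n = static_cast<int>(pts.size());
    std::vector<int> parent(pts.size());
    std::iota(parent.begin(), parent.end(), 0);
    // Pairwise comparison: quadratic in the number of points.
    for (int i = 0; i < n; ++i) {
        for (int j = i + 1; j < n; ++j) {
            const int dx = pts[i].x - pts[j].x;
            const int dy = pts[i].y - pts[j].y;
            if (dx * dx + dy * dy < maxDistSq) {
                parent[findRoot(parent, j)] = findRoot(parent, i);
            }
        }
    }
    // Grow one box per group; width/height count pixels (max - min + 1).
    std::vector<Box> boxes;
    std::vector<int> boxIndex(pts.size(), -1);
    for (int i = 0; i < n; ++i) {
        const int root = findRoot(parent, i);
        if (boxIndex[root] < 0) {
            boxIndex[root] = static_cast<int>(boxes.size());
            boxes.push_back({pts[i].x, pts[i].y, 1, 1});
        } else {
            Box& b = boxes[boxIndex[root]];
            const int right = std::max(b.x + b.width, pts[i].x + 1);
            const int bottom = std::max(b.y + b.height, pts[i].y + 1);
            b.x = std::min(b.x, pts[i].x);
            b.y = std::min(b.y, pts[i].y);
            b.width = right - b.x;
            b.height = bottom - b.y;
        }
    }
    return boxes;
}
```

In the full program the same roles are played by partition (grouping) and boundingRect (box per group).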
HAAR CASCADE CLASSIFIERS FOR SIGNS DETECTION
This is the actual detection step, where the street signs are detected. To train a cascade classifier, the first step consists of building a dataset of positive and negative images. Here is how I built my own datasets.
The first thing to note is that we need to train three different cascades in order to distinguish between the three kinds of sign, so the following steps must be repeated for each kind.
We need two datasets: one for the positive samples (a set of images that contain the road signs we want to detect) and another for the negative samples, which can be any kind of image without street signs.
After collecting a set of 100 images for the positive samples and a set of 200 images for the negatives in two different folders, we need to write two text files:
sign.info, which contains a list of entries like the one below,
one for each positive sample in the positive folder.
pos/image_name.png 1 0 0 50 45
Here, the numbers after the name are, respectively, the number
of street signs in the image, the x and y coordinates of the upper left
corner of the street sign, and its width and height.
bg.txt, which contains a list of file names like the one below, one
for each image in the negative folder.
neg/street15.png
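If the positive images are already cropped to the sign, the description file can be generated rather than typed by hand. A small sketch, assuming one sign per image; the helper name infoLine is hypothetical and only illustrates the field order (path, object count, x, y, width, height):

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: builds one line of the positive-samples description
// file for an image that contains exactly one sign.
// Field order: path, object count, x, y, width, height.
std::string infoLine(const std::string& path, int x, int y, int width, int height) {
    std::ostringstream out;
    out << path << " 1 " << x << ' ' << y << ' ' << width << ' ' << height;
    return out.str();
}
```

Calling infoLine("pos/image_name.png", 0, 0, 50, 45) reproduces the example entry shown above.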
With the command below we generate the .vec file, which contains all the information that the software extracts from the positive samples.
opencv_createsamples -info sign.info -num 100 -w 50 -h 50 -vec signs.vec
Afterwards we train the cascade classifier with the following command:
opencv_traincascade -data data -vec signs.vec -bg bg.txt -numPos 60 -numNeg 200 -numStages 15 -w 50 -h 50 -featureType LBP
where the number of stages is the number of classifiers that are trained to build the cascade. Note that -featureType LBP trains the cascade on LBP features rather than the default Haar features.
At the end of this process we obtain a file cascade.xml, which the CascadeClassifier class uses to detect the objects in the image.
Now that the classifiers are trained, we can declare a CascadeClassifier for each kind of street sign, then detect the signs in the image with
detectMultiScale(image, objects)
This method produces a Rect around each object that has been detected.
It is important to note that, as with every machine learning algorithm, a large number of samples is needed in the dataset for it to perform well. The dataset I built is not very large, so in some situations it fails to detect all the signs. This mostly happens when a small part of the street sign is not visible in the image, as in the warning sign below:
I expanded my dataset up to the point where I obtained a fairly accurate result without too many errors.
SPEED LIMIT VALUE DETECTION
As for the street sign detection, here too I used a machine learning algorithm, but with a different approach. After some work I realized that an OCR solution (Tesseract) did not perform well, so I decided to build my own OCR software.
For the machine learning algorithm I took the image below as training data, which contains some speed limit values:
The amount of training data is small, but since all digits on speed limit signs use the same font, this is not a huge problem.
To prepare the data for training, I wrote a small program in OpenCV. It does the following:
It loads the training image;
It selects the digits (by finding contours and applying constraints on the area and height of the digits to avoid false detections).
It draws a bounding rectangle around one digit and waits for a key press. The user then presses the digit key corresponding to the digit in the box.
Once the digit key is pressed, it saves the 100 pixel values (the digit resized to 10x10) in one array and the manually entered digit in another array.
Finally it saves both arrays in separate files.
After the manual classification, all the digits in the training data (train.png) are labeled, and the image looks like the one below.
Now we move on to the training and testing part.
For training we do as follows:
Load the files we saved earlier
Create an instance of the classifier we are going to use (KNearest)
Then use the KNearest train function to train on the data
Now the detection:
We load the image with the speed limit sign detected
Process the image as before and extract each digit using contour methods
Draw bounding box for it, then resize to 10x10, and store its pixel values in an array as done earlier.
Then we use the KNearest findNearest() function (find_nearest() in OpenCV 2.4) to find the nearest item to the one we gave.
And it recognizes the correct digit.
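For readers unfamiliar with k-nearest neighbors, the classification step can be sketched in plain C++ as follows. This is only an illustration of the technique under simple assumptions (squared Euclidean distance, majority vote, ties going to the smallest label); Sample and classifyKnn are hypothetical names, and this is not how OpenCV's KNearest is implemented internally.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// One training example: a flattened digit image (e.g. 10x10 = 100 values) and its label.
struct Sample {
    std::vector<float> pixels;
    int label;
};

// Squared Euclidean distance between two feature vectors of equal length.
float squaredDistance(const std::vector<float>& a, const std::vector<float>& b) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Returns the majority label among the k training samples closest to query,
// or -1 if there is no training data. Ties go to the smallest label.
int classifyKnn(const std::vector<Sample>& training,
                const std::vector<float>& query, std::size_t k) {
    if (training.empty() || k == 0) return -1;
    std::vector<std::pair<float, int>> byDistance;
    for (const Sample& s : training) {
        byDistance.emplace_back(squaredDistance(s.pixels, query), s.label);
    }
    k = std::min(k, byDistance.size());
    // Only the k smallest distances need to be ordered.
    std::partial_sort(byDistance.begin(),
                      byDistance.begin() + static_cast<std::ptrdiff_t>(k),
                      byDistance.end());
    std::map<int, int> votes;
    for (std::size_t i = 0; i < k; ++i) ++votes[byDistance[i].second];
    int best = -1;
    int bestVotes = 0;
    for (const auto& v : votes) {
        if (v.second > bestVotes) {
            best = v.first;
            bestVotes = v.second;
        }
    }
    return best;
}
```

With a single font and clean thresholding, digits of the same class sit close together in this 100-dimensional space, which is why even a small training set can work.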
I tested this little OCR on many images, and even with this small dataset I obtained an accuracy of about 90%.
CODE
Below I post all my OpenCV C++ code in a single file; following my instructions you should be able to achieve my result.
#include "opencv2/objdetect/objdetect.hpp"
#include "opencv2/imgproc/imgproc.hpp"
#include <iostream>
#include <stdio.h>
#include <cmath>
#include <stdlib.h>
#include "opencv2/core/core.hpp"
#include "opencv2/highgui.hpp"
#include <string>
#include <algorithm>
#include <opencv2/ml/ml.hpp>
using namespace std;
using namespace cv;
std::vector<cv::Rect> getRedObjects(cv::Mat image);
vector<Mat> detectAndDisplaySpeedLimit( Mat frame );
vector<Mat> detectAndDisplayNoParking( Mat frame );
vector<Mat> detectAndDisplayWarning( Mat frame );
void trainDigitClassifier();
string getDigits(Mat image);
vector<Mat> loadAllImage();
int getSpeedLimit(string speed);
//path of the haar cascade files
String no_parking_signs_cascade = "/Users/giuliopettenuzzo/Desktop/cascade_classifiers/no_parking_cascade.xml";
String speed_signs_cascade = "/Users/giuliopettenuzzo/Desktop/cascade_classifiers/speed_limit_cascade.xml";
String warning_signs_cascade = "/Users/giuliopettenuzzo/Desktop/cascade_classifiers/warning_cascade.xml";
CascadeClassifier speed_limit_cascade;
CascadeClassifier no_parking_cascade;
CascadeClassifier warning_cascade;
int main(int argc, char** argv)
{
//train the classifier for digit recognition; this requires manual labeling, see the explanation above for details
trainDigitClassifier();
cv::Mat sceneImage;
vector<Mat> allImages = loadAllImage();
for(size_t i = 0;i<allImages.size();i++){
sceneImage = allImages[i];
//load the haar cascade files
if( !speed_limit_cascade.load( speed_signs_cascade ) ){ printf("--(!)Error loading\n"); return -1; };
if( !no_parking_cascade.load( no_parking_signs_cascade ) ){ printf("--(!)Error loading\n"); return -1; };
if( !warning_cascade.load( warning_signs_cascade ) ){ printf("--(!)Error loading\n"); return -1; };
Mat scene = sceneImage.clone();
//detect the red objects
std::vector<cv::Rect> allObj = getRedObjects(scene);
//use the three cascade classifier for each object detected by the getRedObjects() method
for(int j = 0;j<allObj.size();j++){
Mat img = sceneImage(Rect(allObj[j]));
vector<Mat> warningVec = detectAndDisplayWarning(img);
if(warningVec.size()>0){
Rect box = allObj[j];
}
vector<Mat> noParkVec = detectAndDisplayNoParking(img);
if(noParkVec.size()>0){
Rect box = allObj[j];
}
vector<Mat> speedLitmitVec = detectAndDisplaySpeedLimit(img);
if(speedLitmitVec.size()>0){
Rect box = allObj[j];
for(int i = 0; i<speedLitmitVec.size();i++){
//get the speed limit and draw it on the image
int digit = getSpeedLimit(getDigits(speedLitmitVec[i]));
if(digit > 0){
Point point = box.tl();
point.y = point.y + 30;
cv::putText(sceneImage,
"SPEED LIMIT " + to_string(digit),
point,
cv::FONT_HERSHEY_COMPLEX_SMALL,
0.7,
cv::Scalar(0,255,0),
1,
cv::LINE_AA);
}
}
}
}
imshow("currentobj",sceneImage);
waitKey(0);
}
}
/*
* detect the red object in the image given in the param,
* return a vector containing all the Rect of the red objects
*/
std::vector<cv::Rect> getRedObjects(cv::Mat image)
{
Mat3b res = image.clone();
std::vector<cv::Rect> result;
cvtColor(image, image, COLOR_BGR2HSV);
Mat1b mask1, mask2;
//ranges of red color
inRange(image, Scalar(0, 70, 50), Scalar(10, 255, 255), mask1);
inRange(image, Scalar(170, 70, 50), Scalar(180, 255, 255), mask2);
Mat1b mask = mask1 | mask2;
//collect the coordinates of all red pixels
vector<Point> pts;
findNonZero(mask, pts);
int th_distance = 2; // radius tolerance
// Apply partition
// All pixels within the radius tolerance distance will belong to the same class (same label)
vector<int> labels;
// With lambda function (require C++11)
int th2 = th_distance * th_distance;
int n_labels = partition(pts, labels, [th2](const Point& lhs, const Point& rhs) {
return ((lhs.x - rhs.x)*(lhs.x - rhs.x) + (lhs.y - rhs.y)*(lhs.y - rhs.y)) < th2;
});
// You can save all points in the same class in a vector (one for each class), just like findContours
vector<vector<Point>> contours(n_labels);
for (int i = 0; i < pts.size(); ++i){
contours[labels[i]].push_back(pts[i]);
}
// Get bounding boxes
vector<Rect> boxes;
for (int i = 0; i < contours.size(); ++i)
{
Rect box = boundingRect(contours[i]);
if(contours[i].size()>500){//previously 1000
boxes.push_back(box);
Rect enlarged_box = box + Size(100,100);
enlarged_box -= Point(30,30);
if(enlarged_box.x<0){
enlarged_box.x = 0;
}
if(enlarged_box.y<0){
enlarged_box.y = 0;
}
if(enlarged_box.height + enlarged_box.y > res.rows){
enlarged_box.height = res.rows - enlarged_box.y;
}
if(enlarged_box.width + enlarged_box.x > res.cols){
enlarged_box.width = res.cols - enlarged_box.x;
}
Mat img = res(Rect(enlarged_box));
result.push_back(enlarged_box);
}
}
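//no sufficiently large red region was found: nothing to draw or return
if(boxes.empty()){
return result;
}
Rect largest_box = *max_element(boxes.begin(), boxes.end(), [](const Rect& lhs, const Rect& rhs) {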
return lhs.area() < rhs.area();
});
//draw the rects in case you want to see them
for(size_t j=0;j<boxes.size();j++){
if(boxes[j].area() > largest_box.area()/3){
rectangle(res, boxes[j], Scalar(0, 0, 255));
Rect enlarged_box = boxes[j] + Size(20,20);
enlarged_box -= Point(10,10);
rectangle(res, enlarged_box, Scalar(0, 255, 0));
}
}
rectangle(res, largest_box, Scalar(0, 0, 255));
Rect enlarged_box = largest_box + Size(20,20);
enlarged_box -= Point(10,10);
rectangle(res, enlarged_box, Scalar(0, 255, 0));
return result;
}
/*
* detects the speed limit signs and draws a circle around each of them
*/
vector<Mat> detectAndDisplaySpeedLimit( Mat frame )
{
std::vector<Rect> signs;
vector<Mat> result;
Mat frame_gray;
cvtColor( frame, frame_gray, COLOR_BGR2GRAY );
//normalizes the brightness and increases the contrast of the image
equalizeHist( frame_gray, frame_gray );
//-- Detect signs
speed_limit_cascade.detectMultiScale( frame_gray, signs, 1.1, 3, 0|CASCADE_SCALE_IMAGE, Size(30, 30) );
cout << speed_limit_cascade.getFeatureType();
for( size_t i = 0; i < signs.size(); i++ )
{
Point center( signs[i].x + signs[i].width*0.5, signs[i].y + signs[i].height*0.5 );
ellipse( frame, center, Size( signs[i].width*0.5, signs[i].height*0.5), 0, 0, 360, Scalar( 255, 0, 255 ), 4, 8, 0 );
Mat resultImage = frame(Rect(center.x - signs[i].width*0.5,center.y - signs[i].height*0.5,signs[i].width,signs[i].height));
result.push_back(resultImage);
}
return result;
}
/*
* detects the warning signs and draws a circle around each of them
*/
vector<Mat> detectAndDisplayWarning( Mat frame )
{
std::vector<Rect> signs;
vector<Mat> result;
Mat frame_gray;
cvtColor( frame, frame_gray, COLOR_BGR2GRAY );
equalizeHist( frame_gray, frame_gray );
//-- Detect signs
warning_cascade.detectMultiScale( frame_gray, signs, 1.1, 3, 0|CASCADE_SCALE_IMAGE, Size(30, 30) );
cout << warning_cascade.getFeatureType();
Rect previus;
for( size_t i = 0; i < signs.size(); i++ )
{
Point center( signs[i].x + signs[i].width*0.5, signs[i].y + signs[i].height*0.5 );
Rect newRect = Rect(center.x - signs[i].width*0.5,center.y - signs[i].height*0.5,signs[i].width,signs[i].height);
if((previus & newRect).area()>0){
previus = newRect;
}else{
ellipse( frame, center, Size( signs[i].width*0.5, signs[i].height*0.5), 0, 0, 360, Scalar( 0, 0, 255 ), 4, 8, 0 );
Mat resultImage = frame(newRect);
result.push_back(resultImage);
previus = newRect;
}
}
return result;
}
/*
* detects the no parking signs and draws a circle around each of them
*/
vector<Mat> detectAndDisplayNoParking( Mat frame )
{
std::vector<Rect> signs;
vector<Mat> result;
Mat frame_gray;
cvtColor( frame, frame_gray, COLOR_BGR2GRAY );
equalizeHist( frame_gray, frame_gray );
//-- Detect signs
no_parking_cascade.detectMultiScale( frame_gray, signs, 1.1, 3, 0|CASCADE_SCALE_IMAGE, Size(30, 30) );
cout << no_parking_cascade.getFeatureType();
Rect previus;
for( size_t i = 0; i < signs.size(); i++ )
{
Point center( signs[i].x + signs[i].width*0.5, signs[i].y + signs[i].height*0.5 );
Rect newRect = Rect(center.x - signs[i].width*0.5,center.y - signs[i].height*0.5,signs[i].width,signs[i].height);
if((previus & newRect).area()>0){
previus = newRect;
}else{
ellipse( frame, center, Size( signs[i].width*0.5, signs[i].height*0.5), 0, 0, 360, Scalar( 255, 0, 0 ), 4, 8, 0 );
Mat resultImage = frame(newRect);
result.push_back(resultImage);
previus = newRect;
}
}
return result;
}
/*
* train the classifier for digit recognition; this only needs to be done once, since the result is saved to a file
* that can be reused in later executions.
* To train, the user must manually enter the digit that the program highlights; press space if the red box is just a point (false positive)
*/
void trainDigitClassifier(){
Mat thr,gray,con;
Mat src=imread("/Users/giuliopettenuzzo/Desktop/all_numbers.png",1);
cvtColor(src,gray,COLOR_BGR2GRAY);
threshold(gray,thr,125,255,THRESH_BINARY_INV); //Threshold to find contour
imshow("ci",thr);
waitKey(0);
thr.copyTo(con);
// Create sample and label data
vector< vector <Point> > contours; // Vector for storing contour
vector< Vec4i > hierarchy;
Mat sample;
Mat response_array;
findContours( con, contours, hierarchy,RETR_CCOMP, CHAIN_APPROX_SIMPLE ); //Find contour
for( int i = 0; i >= 0 && i < (int)contours.size(); i=hierarchy[i][0] ) // iterate through first hierarchy level contours (the next index is -1 after the last one)
{
Rect r= boundingRect(contours[i]); //Find bounding rect for each contour
rectangle(src,Point(r.x,r.y), Point(r.x+r.width,r.y+r.height), Scalar(0,0,255),2,8,0);
Mat ROI = thr(r); //Crop the image
Mat tmp1, tmp2;
resize(ROI,tmp1, Size(10,10), 0,0,INTER_LINEAR ); //resize to 10X10
tmp1.convertTo(tmp2,CV_32FC1); //convert to float
imshow("src",src);
int c=waitKey(0); // Read corresponding label for contour from keyboard
c-=0x30; // Convert ASCII to integer value
response_array.push_back(c); // Store label to a mat
rectangle(src,Point(r.x,r.y), Point(r.x+r.width,r.y+r.height), Scalar(0,255,0),2,8,0);
sample.push_back(tmp2.reshape(1,1)); // Store sample data
}
// Store the data to file
Mat response,tmp;
tmp=response_array.reshape(1,1); //make continuous
tmp.convertTo(response,CV_32FC1); // Convert to float
FileStorage Data("TrainingData.yml",FileStorage::WRITE); // Store the sample data in a file
Data << "data" << sample;
Data.release();
FileStorage Label("LabelData.yml",FileStorage::WRITE); // Store the label data in a file
Label << "label" << response;
Label.release();
cout<<"Training and Label data created successfully....!! "<<endl;
imshow("src",src);
waitKey(0);
}
/*
* get digit from the image given in param, using the classifier trained before
*/
string getDigits(Mat image)
{
Mat thr1,gray1,con1;
Mat src1 = image.clone();
cvtColor(src1,gray1,COLOR_BGR2GRAY);
threshold(gray1,thr1,125,255,THRESH_BINARY_INV); // Threshold to create input
thr1.copyTo(con1);
// Read stored sample and label for training
Mat sample1;
Mat response1,tmp1;
FileStorage Data1("TrainingData.yml",FileStorage::READ); // Read training data to a Mat
Data1["data"] >> sample1;
Data1.release();
FileStorage Label1("LabelData.yml",FileStorage::READ); // Read label data to a Mat
Label1["label"] >> response1;
Label1.release();
Ptr<ml::KNearest> knn(ml::KNearest::create());
knn->train(sample1, ml::ROW_SAMPLE,response1); // Train with sample and responses
cout<<"Training completed.....!!"<<endl;
vector< vector <Point> > contours1; // Vector for storing contour
vector< Vec4i > hierarchy1;
//Create input sample by contour finding and cropping
findContours( con1, contours1, hierarchy1,RETR_CCOMP, CHAIN_APPROX_SIMPLE );
Mat dst1(src1.rows,src1.cols,CV_8UC3,Scalar::all(0));
string result;
for( int i = 0; i >= 0 && i < (int)contours1.size(); i=hierarchy1[i][0] ) // iterate through each contour of the first hierarchy level
{
Rect r= boundingRect(contours1[i]);
Mat ROI = thr1(r);
Mat tmp1, tmp2;
resize(ROI,tmp1, Size(10,10), 0,0,INTER_LINEAR );
tmp1.convertTo(tmp2,CV_32FC1);
Mat bestLabels;
float p=knn -> findNearest(tmp2.reshape(1,1),4, bestLabels);
char name[4];
sprintf(name,"%d",(int)p);
cout << "num = " << (int)p;
result = result + to_string((int)p);
putText( dst1,name,Point(r.x,r.y+r.height) ,0,1, Scalar(0, 255, 0), 2, 8 );
}
imwrite("dest.jpg",dst1);
return result ;
}
/*
* from the digits detected, it returns a speed limit if it is detected correctly, -1 otherwise
*/
int getSpeedLimit(string numbers){
if ((numbers.find("30") != std::string::npos) || (numbers.find("03") != std::string::npos)) {
return 30;
}
if ((numbers.find("50") != std::string::npos) || (numbers.find("05") != std::string::npos)) {
return 50;
}
if ((numbers.find("80") != std::string::npos) || (numbers.find("08") != std::string::npos)) {
return 80;
}
if ((numbers.find("70") != std::string::npos) || (numbers.find("07") != std::string::npos)) {
return 70;
}
if ((numbers.find("90") != std::string::npos) || (numbers.find("09") != std::string::npos)) {
return 90;
}
if ((numbers.find("100") != std::string::npos) || (numbers.find("001") != std::string::npos)) {
return 100;
}
if ((numbers.find("130") != std::string::npos) || (numbers.find("031") != std::string::npos)) {
return 130;
}
return -1;
}
/*
* load all the images in the folder whose path is hard coded below
*/
vector<Mat> loadAllImage(){
vector<cv::String> fn;
glob("/Users/giuliopettenuzzo/Desktop/T1/dataset/*.jpg", fn, false);
vector<Mat> images;
size_t count = fn.size(); //number of jpg files in the dataset folder
for (size_t i=0; i<count; i++)
images.push_back(imread(fn[i]));
return images;
}
Maybe you should try implementing the RANSAC algorithm. If you are using color images, it might be a good idea (if you are in Europe) to take only the red channel, since speed limit signs are surrounded by a red circle (or, I think, sometimes a thin white one).
For that you need to filter the image to get the edges (Canny filter).
Here are some useful links:
OpenCV detect partial circle with noise
https://hal.archives-ouvertes.fr/hal-00982526/document
Finally, for the number detection I think your approach is OK. Another approach is to use something like the Viola-Jones algorithm to detect the signs, with existing pretrained models... It's up to you!
Related
Opencv - How to get number of vertical lines present in image (count of lines)
First I integrated the OpenCV framework into Xcode. All the OpenCV code is in Objective-C, and I am using it from Swift through a bridging header. I am new to the OpenCV framework and am trying to count the vertical lines in an image.
Here is my code:
First I convert the image to grayscale:
+ (UIImage *)convertToGrayscale:(UIImage *)image {
    cv::Mat mat;
    UIImageToMat(image, mat);
    cv::Mat gray;
    cv::cvtColor(mat, gray, CV_RGB2GRAY);
    UIImage *grayscale = MatToUIImage(gray);
    return grayscale;
}
Then I detect edges so I can find the gray lines:
+ (UIImage *)detectEdgesInRGBImage:(UIImage *)image {
    cv::Mat mat;
    UIImageToMat(image, mat);
    //Prepare the image for findContours
    cv::threshold(mat, mat, 128, 255, CV_THRESH_BINARY);
    //Find the contours. Use the contourOutput Mat so the original image doesn't get overwritten
    std::vector<std::vector<cv::Point> > contours;
    cv::Mat contourOutput = mat.clone();
    cv::findContours( contourOutput, contours, CV_RETR_EXTERNAL, CV_CHAIN_APPROX_SIMPLE );
    NSLog(@"Count =>%lu", contours.size());
    //For Blue
    /*cv::GaussianBlur(mat, gray, cv::Size(11, 11), 0); */
    UIImage *grayscale = MatToUIImage(mat);
    return grayscale;
}
Both functions are written in Objective-C. Here I call them from Swift:
override func viewDidLoad() {
    super.viewDidLoad()
    let img = UIImage(named: "imagenamed")
    let img1 = Wrapper.convert(toGrayscale: img)
    self.capturedImageView.image = Wrapper.detectEdges(inRGBImage: img1)
}
I have been working on this for some days and found some useful documents (reference links):
OpenCV - how to count objects in photo?
How to count number of lines (Hough Trasnform) in OpenCV
OpenCV documentation
https://docs.opencv.org/2.4/modules/imgproc/doc/structural_analysis_and_shape_descriptors.html?#findcontours
Basically, I understand that first we need to convert the image to black and white, and then, using cvtColor, threshold and findContours, we can find the colors or lines.
I am attaching the image with the vertical lines I want to count.
Original Image
Output image that I am getting
I got a line count of 10, which is not accurate. Please guide me on this. Thank you!
Since you want to count the vertical lines, there is a very simple approach I can suggest. You already got a clear output, and I used that output in my code. Here are the steps before the code:
1. Preprocess the input image to get the lines clearly
2. Scan each row until you find a pixel whose value is higher than 100 (the threshold value I chose)
3. Increase the line counter for that row
4. Continue along the row until you find a pixel whose value is lower than 100
5. Restart from step 2 until the row ends, and do this for every row of the image
6. At the end, find the most repeated element in the array in which you stored the line count of each row. This number is the number of vertical lines.
Note: If the steps are difficult to understand, think of it this way: "I am checking the first row. I found a pixel which is higher than 100, so a line edge is starting: increase the counter for this row. Search along this row until I find a pixel smaller than 100, and then search again for a pixel bigger than 100. When the row is finished, store the line count for this row in a big array. Do this for the whole image. At the end, since some lines look like two lines at the top and some noise can also occur, take the most repeated element in the big array as the number of lines."
Here is the code part in C++:
#include <vector>
#include <iostream>
#include <opencv2/opencv.hpp>
#include <opencv2/highgui/highgui.hpp>
int main()
{
    cv::Mat img = cv::imread("/ur/img/dir/img.jpg",cv::IMREAD_GRAYSCALE);
    std::vector<int> numberOfVerticalLinesForEachRow;
    cv::Rect r(0,0,img.cols-10,200);
    img = img(r);
    bool blackCheck = 1;
    for(int i=0; i<img.rows; i++)
    {
        int numberOfLines = 0;
        for(int j=0; j<img.cols; j++)
        {
            if((int)img.at<uchar>(cv::Point(j,i))>100 && blackCheck)
            {
                numberOfLines++;
                blackCheck = 0;
            }
            if((int)img.at<uchar>(cv::Point(j,i))<100)
                blackCheck = 1;
        }
        numberOfVerticalLinesForEachRow.push_back(numberOfLines);
    }
    // In this part you need a simple algorithm to check the most repeated element
    for(int k:numberOfVerticalLinesForEachRow)
        std::cout<<k<<std::endl;
    cv::namedWindow("WinWin",0);
    cv::imshow("WinWin",img);
    cv::waitKey(0);
}
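The code above leaves the "most repeated element" step to the reader. One possible way to fill it in, as a sketch: the function name mostRepeated is hypothetical, and breaking ties toward the smallest count is an arbitrary choice rather than something the answer specifies.

```cpp
#include <map>
#include <vector>

// Returns the value that occurs most often in counts (the smallest such value
// on ties), or -1 if counts is empty.
int mostRepeated(const std::vector<int>& counts) {
    std::map<int, int> frequency;   // value -> number of rows with that value
    for (int c : counts) ++frequency[c];
    int best = -1;
    int bestFrequency = 0;
    for (const auto& entry : frequency) {
        if (entry.second > bestFrequency) {
            best = entry.first;
            bestFrequency = entry.second;
        }
    }
    return best;
}
```

Passing numberOfVerticalLinesForEachRow to this function would give the final line count.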
Here's another possible approach. It relies mainly on the cv::ximgproc::thinning function from the extended image processing module to reduce the lines to a width of 1 pixel. We can crop a ROI from this image and count the number of transitions from 255 (white) to 0 (black). These are the steps:
Threshold the image using Otsu's method
Apply some morphology to clean up the binary image
Get the skeleton of the image
Crop a ROI from the center of the image
Count the number of jumps from 255 to 0
This is the code; be sure to include the extended image processing module (ximgproc) and to link it before compiling:
#include <iostream>
#include <opencv2/opencv.hpp>
#include <opencv2/ximgproc.hpp> // The extended image processing module
// Read Image:
std::string imagePath = "D://opencvImages//";
cv::Mat inputImage = cv::imread( imagePath+"IN2Xh.png" );
// Convert BGR to Grayscale:
cv::cvtColor( inputImage, inputImage, cv::COLOR_BGR2GRAY );
// Get binary image via Otsu:
cv::threshold( inputImage, inputImage, 0, 255, cv::THRESH_OTSU );
The above snippet produces the following image:
Note that there's a little bit of noise due to the thresholding. Let's try to remove those isolated blobs of white pixels by applying some morphology, for example an opening, which is an erosion followed by a dilation. The structuring elements and iterations, though, are not the same for the two operations; these were found by experimentation. I wanted to remove the majority of the isolated blobs without modifying the original image too much:
// Apply Morphology. Erosion + Dilation:
// Set rectangular structuring element of size 3 x 3:
cv::Mat SE = cv::getStructuringElement( cv::MORPH_RECT, cv::Size(3, 3) );
// Set the iterations:
int morphoIterations = 1;
cv::morphologyEx( inputImage, inputImage, cv::MORPH_ERODE, SE, cv::Point(-1,-1), morphoIterations);
// Set rectangular structuring element of size 5 x 5:
SE = cv::getStructuringElement( cv::MORPH_RECT, cv::Size(5, 5) );
// Set the iterations:
morphoIterations = 2;
cv::morphologyEx( inputImage, inputImage, cv::MORPH_DILATE, SE, cv::Point(-1,-1), morphoIterations);
This combination of structuring elements and iterations yields the following filtered image:
It's looking alright. Now comes the main idea of the algorithm. If we compute the skeleton of this image, we "normalize" all the lines to a width of 1 pixel, which is very handy, because we can then reduce the image to a single-row matrix and count the number of jumps. Since the lines are "normalized", we can get rid of possible overlaps between lines. Now, skeletonized images sometimes produce artifacts near the borders of the image. These artifacts resemble thickened anchors at the first and last row of the image. To prevent them we can extend the borders prior to computing the skeleton:
// Extend borders to avoid skeleton artifacts, extend 5 pixels in all directions:
cv::copyMakeBorder( inputImage, inputImage, 5, 5, 5, 5, cv::BORDER_CONSTANT, 0 );
// Get the skeleton:
cv::Mat imageSkelton;
cv::ximgproc::thinning( inputImage, imageSkelton );
This is the skeleton obtained:
Nice. Before we count jumps, though, we must observe that the lines are skewed. If we reduced this image directly to one row, some overlapping could indeed happen between two lines that are too skewed. To prevent this, I crop a middle section of the skeleton image and count transitions there. Let's crop the image:
// Crop middle ROI:
cv::Rect linesRoi;
linesRoi.x = 0;
linesRoi.y = 0.5 * imageSkelton.rows;
linesRoi.width = imageSkelton.cols;
linesRoi.height = 1;
cv::Mat imageROI = imageSkelton( linesRoi );
This would be the new ROI, which is just the middle row of the skeleton image:
Let me prepare a BGR copy of this just to draw some results:
// BGR version of the Grayscale ROI:
cv::Mat colorROI;
cv::cvtColor( imageROI, colorROI, cv::COLOR_GRAY2BGR );
Ok, let's loop through the image and count the transitions between 255 and 0. A transition happens when the current pixel is 0 and the pixel from the previous iteration is 255. There's more than one way to loop through a cv::Mat in C++. I prefer to use cv::MatIterator_s and pointer arithmetic:
// Set the loop variables (the ROI is single-channel, so iterate over uchar):
cv::MatIterator_<uchar> it, end;
uchar pastPixel = 0;
int jumpsCounter = 0;
int i = 0;
// Loop thru image ROI and count 255-0 jumps:
for (it = imageROI.begin<uchar>(), end = imageROI.end<uchar>(); it != end; ++it) {
    // Get current pixel
    uchar &currentPixel = *it;
    // Compare it with past pixel:
    if ( (currentPixel == 0) && (pastPixel == 255) ){
        // We have a jump:
        jumpsCounter++;
        // Draw the point on the BGR version of the image:
        cv::line( colorROI, cv::Point(i, 0), cv::Point(i, 0), cv::Scalar(0, 0, 255), 1 );
    }
    // current pixel is now past pixel:
    pastPixel = currentPixel;
    i++;
}
// Show image and print number of jumps found:
cv::namedWindow( "Jumps Found", cv::WINDOW_NORMAL );
cv::imshow( "Jumps Found", colorROI );
cv::waitKey( 0 );
std::cout<<"Jumps Found: "<<jumpsCounter<<std::endl;
The points where the jumps were found are drawn in red, and the number of total jumps printed is:
Jumps Found: 9
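The counting rule itself can be checked in isolation from OpenCV. A minimal sketch on a plain vector of pixel values, where countJumps is a hypothetical name; note that, like the loop above, it does not count a white run that reaches the end of the row without a following black pixel.

```cpp
#include <cstdint>
#include <vector>

// Counts transitions from 255 (white) to 0 (black) along one row of a binary image.
int countJumps(const std::vector<std::uint8_t>& row) {
    int jumps = 0;
    std::uint8_t past = 0;   // value of the previous pixel; starts as black
    for (std::uint8_t current : row) {
        if (current == 0 && past == 255) ++jumps;
        past = current;
    }
    return jumps;
}
```

Because the skeleton is one pixel wide, each line crossing the row contributes exactly one such transition.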
Lines and edges detector, opencv
I'm trying to process the following images from a maze. My Question is about how to process the edges. I'm using OpenCV 2.4 with c++. I'd like to know if there is any way to discriminate the edges between the floor and the wall from the lines painted in the floor? The floor is black, the walls are white and the lines painted in the floor are white too. What I am trying to do is distinguish between wall and marks in floor. The lines on the floor will give me a distance reference and if I can turn in the maze. While the walls just tell the limit of the halls of the maze. here you'll find the process images I've done. I'm using Canny and HoughLinesP functions to detect and save the lines. But as you can see in the images the program doesn't separate the lines from the edges. The code: vector<Vec4i> get_lines(Mat dst, Mat cdst) { vector<Vec4i> lines; HoughLinesP(dst, lines, 1, CV_PI/180, 100, 50, 10 ); for( size_t i = 0; i < lines.size(); i++ ) { Vec4i l = lines[i]; double size = norm(Mat(Point(l[0], l[1])), Mat(Point(l[2], l[3])) ); if(size > 100) line( cdst, Point(l[0], l[1]), Point(l[2], l[3]), Scalar(0,0,255), 3, CV_AA); } return lines; } And main function is: int main(int argc, char** argv) { const char* filename = argc >= 2 ? argv[1] : "pic1.jpg"; Mat src = imread(filename, 0); if(src.empty()) { help(); cout << "can not open " << filename << endl; return -1; } Mat dst, cdst; Canny(src, dst, 50, 200, 3); cvtColor(dst, cdst, CV_GRAY2BGR); vector<Vec4i> lines = get_lines(dst, cdst); imshow("source W&B", src); imshow("edges", dst); imshow("detected lines", cdst); imwrite("lines.jpg",cdst); imwrite("src.jpg",src); imwrite("canny.jpg",dst); waitKey(); return 0; }
The obvious thing to try would be to compare the brightness of pixels on either side of the line. Make three regions: pixels a little distance to one side of the line, pixels a little distance to the other side of the line, and pixels close to the line. Calculate the average brightness in each region. The walls are light grey, the floor is black and the lines are white, so if one side is significantly brighter than the other, it is probably an edge (and you can even tell which side is the floor); if both sides are significantly darker than the middle, it is probably a marking on the floor. (And if the line is vertical, it is a wall-wall edge.)
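The decision rule described above can be sketched as a small function. This is only an illustration under assumptions: the three mean brightness values are taken to be already measured (for example by averaging pixels in three strips along a detected segment), and the names `classifyLine`, `LineKind` and the default `margin` of 40 grey levels are hypothetical and would need tuning for a real camera.

```cpp
#include <cassert>
#include <cmath>

enum class LineKind { WallFloorEdge, FloorMarking, Unknown };

// meanA / meanB: average brightness a little distance to either side of the line.
// meanMid: average brightness close to the line. All values in [0, 255].
// margin: hypothetical threshold for "significantly brighter".
LineKind classifyLine(double meanA, double meanMid, double meanB,
                      double margin = 40.0) {
    assert(margin > 0.0);
    // One side much brighter than the other: wall on one side, floor on the other.
    if (std::abs(meanA - meanB) > margin) {
        return LineKind::WallFloorEdge;
    }
    // Both sides much darker than the middle: white paint on the black floor.
    if (meanMid - meanA > margin && meanMid - meanB > margin) {
        return LineKind::FloorMarking;
    }
    return LineKind::Unknown;
}
```

For a wall-floor edge, the darker of `meanA` and `meanB` indicates which side is the floor.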
Lane Detector divider lines c ++ with OpenCV
I have been working on image analysis with OpenCV. What I'm trying to do is recognize the lane dividing lines. My pipeline is the following:
1. I receive an image.
2. I convert it to grayscale.
3. I apply GaussianBlur.
4. I restrict processing to the ROI.
5. I apply Canny.
6. I look for lines with the Hough transform.
7. I draw the lines obtained from Hough.
But I've run into a problem: the program does not recognize both dividing lines of the lane, nor does it recognize the yellow lines. I hope you can help me solve this problem; I would be very grateful. Here is the code: #include "opencv2/highgui/highgui.hpp" #include <opencv2/objdetect/objdetect.hpp> #include <opencv2/imgproc/imgproc.hpp> #include <iostream> #include <vector> #include <stdio.h> #include "linefinder.h" using namespace cv; int main(int argc, char* argv[]) { int houghVote = 200; string arg = argv[1]; Mat image; image = imread(argv[1]); Mat gray; cvtColor(image,gray,CV_RGB2GRAY); GaussianBlur( gray, gray, Size( 5, 5 ), 0, 0 ); vector<string> codes; Mat corners; findDataMatrix(gray, codes, corners); drawDataMatrixCodes(image, codes, corners); //Mat image = imread(""); //Rect region_of_interest = Rect(x, y, w, h); //Mat image_roi = image(region_of_interest); std::cout << image.cols << "\n"; std::cout << image.rows << "\n"; Rect roi(0,290,640,190);// set the ROI for the image Mat imgROI = image(roi); // Display the image imwrite("original.bmp", imgROI); // Canny algorithm Mat contours; Canny(imgROI, contours, 120, 300, 3); imwrite("canny.bmp", contours); Mat contoursInv; threshold(contours,contoursInv,128,255,THRESH_BINARY_INV); // Display Canny image imwrite("contours.bmp", contoursInv); /* Hough transform for line detection with feedback Increase by 25 for the next frame if we found some lines. This is so we don't miss other lines that may crop up in the next frame but at the same time we don't want to start the feed back loop from scratch. 
*/ std::vector<Vec2f> lines; if (houghVote < 1 or lines.size() > 2){ // we lost all lines. reset houghVote = 200; }else{ houghVote += 25; } while(lines.size() < 5 && houghVote > 0){ HoughLines(contours,lines,1,PI/180, houghVote); houghVote -= 5; } std::cout << houghVote << "\n"; Mat result(imgROI.size(),CV_8U,Scalar(255)); imgROI.copyTo(result); // Draw the limes std::vector<Vec2f>::const_iterator it= lines.begin(); Mat hough(imgROI.size(),CV_8U,Scalar(0)); while (it!=lines.end()) { float rho= (*it)[0]; // first element is distance rho float theta= (*it)[1]; // second element is angle theta if ( theta > 0.09 && theta < 1.48 || theta < 3.14 && theta > 1.66 ) { // filter to remove vertical and horizontal lines // point of intersection of the line with first row Point pt1(rho/cos(theta),0); // point of intersection of the line with last row Point pt2((rho-result.rows*sin(theta))/cos(theta),result.rows); // draw a white line line( result, pt1, pt2, Scalar(255), 8); line( hough, pt1, pt2, Scalar(255), 8); } ++it; } // Display the detected line image std::cout << "line image:"<< "\n"; namedWindow("Detected Lines with Hough"); imwrite("hough.bmp", result); // Create LineFinder instance LineFinder ld; // Set probabilistic Hough parameters ld.setLineLengthAndGap(60,10); ld.setMinVote(4); // Detect lines std::vector<Vec4i> li= ld.findLines(contours); Mat houghP(imgROI.size(),CV_8U,Scalar(0)); ld.setShift(0); ld.drawDetectedLines(houghP); std::cout << "First Hough" << "\n"; imwrite("houghP.bmp", houghP); // bitwise AND of the two hough images bitwise_and(houghP,hough,houghP); Mat houghPinv(imgROI.size(),CV_8U,Scalar(0)); Mat dst(imgROI.size(),CV_8U,Scalar(0)); threshold(houghP,houghPinv,150,255,THRESH_BINARY_INV); // threshold and invert to black lines namedWindow("Detected Lines with Bitwise"); imshow("Detected Lines with Bitwise", houghPinv); Canny(houghPinv,contours,100,350); li= ld.findLines(contours); // Display Canny image imwrite("contours.bmp", contoursInv); // Set 
probabilistic Hough parameters ld.setLineLengthAndGap(5,2); ld.setMinVote(1); ld.setShift(image.cols/3); ld.drawDetectedLines(image); std::stringstream stream; stream << "Lines Segments: " << lines.size(); putText(image, stream.str(), Point(10,image.rows-10), 2, 0.8, Scalar(0,0,255),0); imwrite("processed.bmp", image); char key = (char) waitKey(10); lines.clear(); }
The following are the input images respectively:
Here I show two photos: one in which the white line is recognized and another in which the yellow line is not. What I need is to recognize the dividing lines, because I monitor the lane, but this is proving difficult and the program does not detect all the dividing lines. I hope you can help me, because I have honestly tried everything without good results.
I think it's because you are doing a bitwise AND of the probabilistic Hough and regular Hough transform outputs. This means that the output image will only contain lines that appear in both of these transforms. I'm pretty sure the line is not detected in the regular transform but is detected in the probabilistic Hough output. Your best bet is to output both transforms separately and debug. I'm doing a similar project; I imagine you could define a separate ROI to exclude from the bitwise AND, and that area would be along the center of the lane markings.
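To see why the AND can drop a lane marking, here is a minimal sketch on flattened binary masks, independent of OpenCV. The names `Mask`, `andMasks` and `countSet` are hypothetical helpers for illustration; in the question's code the same role is played by `bitwise_and(houghP, hough, houghP)`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Mask = std::vector<std::uint8_t>;  // flattened binary image, 0 or 255

// Pixel-wise AND of two equally sized masks:
// a pixel survives only if it is set in both inputs.
Mask andMasks(const Mask& a, const Mask& b) {
    assert(a.size() == b.size());
    Mask out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = static_cast<std::uint8_t>(a[i] & b[i]);
    }
    return out;
}

// Number of set pixels, handy for comparing each transform's output on its own.
std::size_t countSet(const Mask& m) {
    std::size_t n = 0;
    for (std::uint8_t v : m) {
        if (v != 0) ++n;
    }
    return n;
}
```

If the probabilistic output has pixels for a line and the regular Hough output has none at those positions, the combined mask loses that line entirely, which matches the suggestion to inspect each output separately first.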
Extracting text OpenCV
I am trying to find the bounding boxes of text in an image and am currently using this approach: // calculate the local variances of the grayscale image Mat t_mean, t_mean_2; Mat grayF; outImg_gray.convertTo(grayF, CV_32F); int winSize = 35; blur(grayF, t_mean, cv::Size(winSize,winSize)); blur(grayF.mul(grayF), t_mean_2, cv::Size(winSize,winSize)); Mat varMat = t_mean_2 - t_mean.mul(t_mean); varMat.convertTo(varMat, CV_8U); // threshold the high variance regions Mat varMatRegions = varMat > 100; When given an image like this: Then when I show varMatRegions I get this image: As you can see, it somewhat combines the left block of text with the header of the card. For most cards this method works great, but on busier cards it can cause problems. The reason it is bad for those contours to connect is that it makes the bounding box of the contour take up nearly the entire card. Can anyone suggest a different way to find the text that ensures proper detection? 200 points to whoever can find the text in the card above and these two.
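The two `blur` calls in the question rely on the identity Var(x) = E[x^2] - (E[x])^2, evaluated per pixel over a 35 x 35 window. A minimal one-dimensional sketch of the same identity, assuming a plain vector of intensities and a hypothetical helper name `windowVariance`:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Variance of the values inside [begin, begin + win), computed as
// E[x^2] - (E[x])^2, the same identity the two blur() calls implement.
double windowVariance(const std::vector<double>& v,
                      std::size_t begin, std::size_t win) {
    assert(win > 0 && begin + win <= v.size());
    double sum = 0.0;
    double sumSq = 0.0;
    for (std::size_t i = begin; i < begin + win; ++i) {
        sum += v[i];
        sumSq += v[i] * v[i];
    }
    const double n = static_cast<double>(win);
    const double mean = sum / n;
    return sumSq / n - mean * mean;
}
```

A flat window gives zero variance, while a window alternating between dark and bright values (as text strokes do) gives a large one. This is why thresholding the variance highlights text, and also why a large window tends to merge nearby text blocks.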
I used a gradient based method in the program below. Added the resulting images. Please note that I'm using a scaled down version of the image for processing. c++ version The MIT License (MIT) Copyright (c) 2014 Dhanushka Dangampola Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
#include "stdafx.h" #include <opencv2/core/core.hpp> #include <opencv2/highgui/highgui.hpp> #include <opencv2/imgproc/imgproc.hpp> #include <iostream> using namespace cv; using namespace std; #define INPUT_FILE "1.jpg" #define OUTPUT_FOLDER_PATH string("") int _tmain(int argc, _TCHAR* argv[]) { Mat large = imread(INPUT_FILE); Mat rgb; // downsample and use it for processing pyrDown(large, rgb); Mat small; cvtColor(rgb, small, CV_BGR2GRAY); // morphological gradient Mat grad; Mat morphKernel = getStructuringElement(MORPH_ELLIPSE, Size(3, 3)); morphologyEx(small, grad, MORPH_GRADIENT, morphKernel); // binarize Mat bw; threshold(grad, bw, 0.0, 255.0, THRESH_BINARY | THRESH_OTSU); // connect horizontally oriented regions Mat connected; morphKernel = getStructuringElement(MORPH_RECT, Size(9, 1)); morphologyEx(bw, connected, MORPH_CLOSE, morphKernel); // find contours Mat mask = Mat::zeros(bw.size(), CV_8UC1); vector<vector<Point>> contours; vector<Vec4i> hierarchy; findContours(connected, contours, hierarchy, CV_RETR_CCOMP, CV_CHAIN_APPROX_SIMPLE, Point(0, 0)); // filter contours for(int idx = 0; idx >= 0; idx = hierarchy[idx][0]) { Rect rect = boundingRect(contours[idx]); Mat maskROI(mask, rect); maskROI = Scalar(0, 0, 0); // fill the contour drawContours(mask, contours, idx, Scalar(255, 255, 255), CV_FILLED); // ratio of non-zero pixels in the filled region double r = (double)countNonZero(maskROI)/(rect.width*rect.height); if (r > .45 /* assume at least 45% of the area is filled if it contains text */ && (rect.height > 8 && rect.width > 8) /* constraints on region size */ /* these two conditions alone are not very robust. 
better to use something like the number of significant peaks in a horizontal projection as a third condition */ ) { rectangle(rgb, rect, Scalar(0, 255, 0), 2); } } imwrite(OUTPUT_FOLDER_PATH + string("rgb.jpg"), rgb); return 0; } python version The MIT License (MIT) Copyright (c) 2017 Dhanushka Dangampola Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
import cv2 import numpy as np large = cv2.imread('1.jpg') rgb = cv2.pyrDown(large) small = cv2.cvtColor(rgb, cv2.COLOR_BGR2GRAY) kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3)) grad = cv2.morphologyEx(small, cv2.MORPH_GRADIENT, kernel) _, bw = cv2.threshold(grad, 0.0, 255.0, cv2.THRESH_BINARY | cv2.THRESH_OTSU) kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (9, 1)) connected = cv2.morphologyEx(bw, cv2.MORPH_CLOSE, kernel) # using RETR_EXTERNAL instead of RETR_CCOMP contours, hierarchy = cv2.findContours(connected.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE) #For opencv 3+ comment the previous line and uncomment the following line #_, contours, hierarchy = cv2.findContours(connected.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE) mask = np.zeros(bw.shape, dtype=np.uint8) for idx in range(len(contours)): x, y, w, h = cv2.boundingRect(contours[idx]) mask[y:y+h, x:x+w] = 0 cv2.drawContours(mask, contours, idx, (255, 255, 255), -1) r = float(cv2.countNonZero(mask[y:y+h, x:x+w])) / (w * h) if r > 0.45 and w > 8 and h > 8: cv2.rectangle(rgb, (x, y), (x+w-1, y+h-1), (0, 255, 0), 2) cv2.imshow('rects', rgb)
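The comment in the C++ version suggests using the number of significant peaks in a horizontal projection as a third condition. One possible reading of that idea is sketched below, under assumptions: the region is a small binary grid (1 = foreground), the projection is taken as per-column sums, and a "peak" is a maximal run of columns whose sum reaches a hypothetical `minHeight`. The orientation of the projection and both function names are assumptions, not something the answer specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Per-column sums of a binary region stored row by row (values 0 or 1).
std::vector<int> columnProjection(const std::vector<std::vector<int>>& region) {
    assert(!region.empty());
    std::vector<int> profile(region[0].size(), 0);
    for (const auto& row : region) {
        assert(row.size() == profile.size());
        for (std::size_t c = 0; c < row.size(); ++c) {
            profile[c] += row[c];
        }
    }
    return profile;
}

// Count maximal runs of consecutive columns whose sum reaches minHeight.
int countSignificantPeaks(const std::vector<int>& profile, int minHeight) {
    int peaks = 0;
    bool inside = false;
    for (int value : profile) {
        const bool high = value >= minHeight;
        if (high && !inside) ++peaks;  // a new run starts here
        inside = high;
    }
    return peaks;
}
```

A text-like region should produce several separated peaks (roughly one per character stroke group), whereas a solid blob yields a single wide run, so a minimum peak count could serve as the extra filter.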
You can detect text by finding close edge elements (inspired from a LPD): #include "opencv2/opencv.hpp" std::vector<cv::Rect> detectLetters(cv::Mat img) { std::vector<cv::Rect> boundRect; cv::Mat img_gray, img_sobel, img_threshold, element; cvtColor(img, img_gray, CV_BGR2GRAY); cv::Sobel(img_gray, img_sobel, CV_8U, 1, 0, 3, 1, 0, cv::BORDER_DEFAULT); cv::threshold(img_sobel, img_threshold, 0, 255, CV_THRESH_OTSU+CV_THRESH_BINARY); element = getStructuringElement(cv::MORPH_RECT, cv::Size(17, 3) ); cv::morphologyEx(img_threshold, img_threshold, CV_MOP_CLOSE, element); //Does the trick std::vector< std::vector< cv::Point> > contours; cv::findContours(img_threshold, contours, 0, 1); std::vector<std::vector<cv::Point> > contours_poly( contours.size() ); for( int i = 0; i < contours.size(); i++ ) if (contours[i].size()>100) { cv::approxPolyDP( cv::Mat(contours[i]), contours_poly[i], 3, true ); cv::Rect appRect( boundingRect( cv::Mat(contours_poly[i]) )); if (appRect.width>appRect.height) boundRect.push_back(appRect); } return boundRect; } Usage: int main(int argc,char** argv) { //Read cv::Mat img1=cv::imread("side_1.jpg"); cv::Mat img2=cv::imread("side_2.jpg"); //Detect std::vector<cv::Rect> letterBBoxes1=detectLetters(img1); std::vector<cv::Rect> letterBBoxes2=detectLetters(img2); //Display for(int i=0; i< letterBBoxes1.size(); i++) cv::rectangle(img1,letterBBoxes1[i],cv::Scalar(0,255,0),3,8,0); cv::imwrite( "imgOut1.jpg", img1); for(int i=0; i< letterBBoxes2.size(); i++) cv::rectangle(img2,letterBBoxes2[i],cv::Scalar(0,255,0),3,8,0); cv::imwrite( "imgOut2.jpg", img2); return 0; } Results: a. element = getStructuringElement(cv::MORPH_RECT, cv::Size(17, 3) ); b. element = getStructuringElement(cv::MORPH_RECT, cv::Size(30, 30) ); Results are similar for the other image mentioned.
Here is an alternative approach that I used to detect the text blocks:
1. Converted the image to grayscale.
2. Applied a threshold (simple binary threshold, with a handpicked value of 150 as the threshold value).
3. Applied dilation to thicken lines in the image, leading to more compact objects and fewer white-space fragments. Used a high value for the number of iterations, so dilation is very heavy (13 iterations, also handpicked for optimal results).
4. Identified contours of objects in the resulting image using the opencv findContours function.
5. Drew a bounding box (rectangle) circumscribing each contoured object; each of them frames a block of text.
6. Optionally discarded areas that are unlikely to be the object you are searching for (e.g. text blocks) given their size, as the algorithm above can also find intersecting or nested objects (like the entire top area for the first card), some of which could be uninteresting for your purposes.
Below is the code written in python with pyopencv; it should be easy to port to C++.
import cv2
image = cv2.imread("card.png")
gray = cv2.cvtColor(image,cv2.COLOR_BGR2GRAY) # grayscale
_,thresh = cv2.threshold(gray,150,255,cv2.THRESH_BINARY_INV) # threshold
kernel = cv2.getStructuringElement(cv2.MORPH_CROSS,(3,3))
dilated = cv2.dilate(thresh,kernel,iterations = 13) # dilate
_, contours, hierarchy = cv2.findContours(dilated,cv2.RETR_EXTERNAL,cv2.CHAIN_APPROX_NONE) # get contours
# for each contour found, draw a rectangle around it on original image
for contour in contours:
    # get rectangle bounding contour
    [x,y,w,h] = cv2.boundingRect(contour)
    # discard areas that are too large
    if h>300 and w>300:
        continue
    # discard areas that are too small
    if h<40 or w<40:
        continue
    # draw rectangle around contour on original image
    cv2.rectangle(image,(x,y),(x+w,y+h),(255,0,255),2)
# write original image with added contours to disk
cv2.imwrite("contoured.jpg", image)
The original image is the first image in your post.
After preprocessing (grayscale, threshold and dilate, i.e. after step 3) the image looked like this:
Below is the resulting image ("contoured.jpg" in the last line); the final bounding boxes for the objects in the image look like this:
You can see the text block on the left is detected as a separate block, delimited from its surroundings. Using the same script with the same parameters (except for the thresholding type, which was changed for the second image as described below), here are the results for the other 2 cards:
Tuning the parameters
The parameters (threshold value, dilation parameters) were optimized for this image and this task (finding text blocks) and can be adjusted, if needed, for other card images or other types of objects to be found. For thresholding (step 2), I used a black threshold. For images where the text is lighter than the background, such as the second image in your post, a white threshold should be used, so replace the thresholding type with cv2.THRESH_BINARY. For the second image I also used a slightly higher value for the threshold (180). Varying the threshold value and the number of iterations for dilation will result in different degrees of sensitivity in delimiting objects in the image.
Finding other object types:
For example, decreasing the dilation to 5 iterations in the first image gives a finer delimitation of objects in the image, roughly finding all words in the image (rather than text blocks):
Knowing the rough size of a word, here I discarded areas that were too small (below 20 pixels width or height) or too large (above 100 pixels width or height) to ignore objects that are unlikely to be words, to get the results in the above image.
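Since the answer notes the script should be easy to port to C++, here is a minimal sketch of just the size filter, written without OpenCV so it stays self-contained. The `Box` struct and `keepPlausible` are hypothetical names; the logic mirrors the Python script (discard when both sides exceed the upper limit, or when either side is below the lower limit), and the limits are the handpicked values from the answer passed in as parameters.

```cpp
#include <cassert>
#include <vector>

struct Box {
    int x, y, w, h;  // top-left corner, width, height (pixels)
};

// Keep only boxes whose size is plausible for the object being searched for.
// Mirrors the script: "too large" needs both sides above maxSide,
// "too small" needs either side below minSide.
std::vector<Box> keepPlausible(const std::vector<Box>& boxes,
                               int minSide, int maxSide) {
    assert(minSide >= 0 && maxSide >= minSide);
    std::vector<Box> kept;
    for (const Box& b : boxes) {
        const bool tooLarge = b.h > maxSide && b.w > maxSide;
        const bool tooSmall = b.h < minSide || b.w < minSide;
        if (!tooLarge && !tooSmall) {
            kept.push_back(b);
        }
    }
    return kept;
}
```

With `minSide = 40` and `maxSide = 300` this corresponds to the text-block settings; the word-level run described above would use roughly 20 and 100 instead.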
@dhanushka's approach showed the most promise but I wanted to play around in Python so went ahead and translated it for fun:
import cv2
import numpy as np
from cv2 import boundingRect, countNonZero, cvtColor, drawContours, findContours, getStructuringElement, imread, morphologyEx, pyrDown, rectangle, threshold
large = imread(image_path)
# downsample and use it for processing
rgb = pyrDown(large)
# apply grayscale
small = cvtColor(rgb, cv2.COLOR_BGR2GRAY)
# morphological gradient
morph_kernel = getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
grad = morphologyEx(small, cv2.MORPH_GRADIENT, morph_kernel)
# binarize
_, bw = threshold(src=grad, thresh=0, maxval=255, type=cv2.THRESH_BINARY+cv2.THRESH_OTSU)
morph_kernel = getStructuringElement(cv2.MORPH_RECT, (9, 1))
# connect horizontally oriented regions
connected = morphologyEx(bw, cv2.MORPH_CLOSE, morph_kernel)
mask = np.zeros(bw.shape, np.uint8)
# find contours
im2, contours, hierarchy = findContours(connected, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_SIMPLE)
# filter contours
for idx in range(0, len(hierarchy[0])):
    rect = x, y, rect_width, rect_height = boundingRect(contours[idx])
    # fill the contour
    mask = drawContours(mask, contours, idx, (255, 255, 255), cv2.FILLED)
    # ratio of non-zero pixels in the filled region
    r = float(countNonZero(mask)) / (rect_width * rect_height)
    if r > 0.45 and rect_height > 8 and rect_width > 8:
        rgb = rectangle(rgb, (x, y+rect_height), (x+rect_width, y), (0,255,0),3)
Now to display the image:
from PIL import Image
Image.fromarray(rgb).show()
Not the most Pythonic of scripts, but I tried to resemble the original C++ code as closely as possible for readers to follow. It works almost as well as the original. I'll be happy to read suggestions on how it could be improved/fixed to resemble the original results fully.
You can try this method that is developed by Chucai Yi and Yingli Tian. They also share a software (which is based on Opencv-1.0 and it should run under Windows platform.) that you can use (though no source code available). It will generate all the text bounding boxes (shown in color shadows) in the image. By applying to your sample images, you will get the following results: Note: to make the result more robust, you can further merge adjacent boxes together. Update: If your ultimate goal is to recognize the texts in the image, you can further check out gttext, which is an OCR free software and Ground Truthing tool for Color Images with Text. Source code is also available. With this, you can get recognized texts like:
Above Code JAVA version: Thanks #William public static List<Rect> detectLetters(Mat img){ List<Rect> boundRect=new ArrayList<>(); Mat img_gray =new Mat(), img_sobel=new Mat(), img_threshold=new Mat(), element=new Mat(); Imgproc.cvtColor(img, img_gray, Imgproc.COLOR_RGB2GRAY); Imgproc.Sobel(img_gray, img_sobel, CvType.CV_8U, 1, 0, 3, 1, 0, Core.BORDER_DEFAULT); //at src, Mat dst, double thresh, double maxval, int type Imgproc.threshold(img_sobel, img_threshold, 0, 255, 8); element=Imgproc.getStructuringElement(Imgproc.MORPH_RECT, new Size(15,5)); Imgproc.morphologyEx(img_threshold, img_threshold, Imgproc.MORPH_CLOSE, element); List<MatOfPoint> contours = new ArrayList<MatOfPoint>(); Mat hierarchy = new Mat(); Imgproc.findContours(img_threshold, contours,hierarchy, 0, 1); List<MatOfPoint> contours_poly = new ArrayList<MatOfPoint>(contours.size()); for( int i = 0; i < contours.size(); i++ ){ MatOfPoint2f mMOP2f1=new MatOfPoint2f(); MatOfPoint2f mMOP2f2=new MatOfPoint2f(); contours.get(i).convertTo(mMOP2f1, CvType.CV_32FC2); Imgproc.approxPolyDP(mMOP2f1, mMOP2f2, 2, true); mMOP2f2.convertTo(contours.get(i), CvType.CV_32S); Rect appRect = Imgproc.boundingRect(contours.get(i)); if (appRect.width>appRect.height) { boundRect.add(appRect); } } return boundRect; } And use this code in practice : System.loadLibrary(Core.NATIVE_LIBRARY_NAME); Mat img1=Imgcodecs.imread("abc.png"); List<Rect> letterBBoxes1=Utils.detectLetters(img1); for(int i=0; i< letterBBoxes1.size(); i++) Imgproc.rectangle(img1,letterBBoxes1.get(i).br(), letterBBoxes1.get(i).tl(),new Scalar(0,255,0),3,8,0); Imgcodecs.imwrite("abc1.png", img1);
This is a C# version of the answer from dhanushka using OpenCVSharp Mat large = new Mat(INPUT_FILE); Mat rgb = new Mat(), small = new Mat(), grad = new Mat(), bw = new Mat(), connected = new Mat(); // downsample and use it for processing Cv2.PyrDown(large, rgb); Cv2.CvtColor(rgb, small, ColorConversionCodes.BGR2GRAY); // morphological gradient var morphKernel = Cv2.GetStructuringElement(MorphShapes.Ellipse, new OpenCvSharp.Size(3, 3)); Cv2.MorphologyEx(small, grad, MorphTypes.Gradient, morphKernel); // binarize Cv2.Threshold(grad, bw, 0, 255, ThresholdTypes.Binary | ThresholdTypes.Otsu); // connect horizontally oriented regions morphKernel = Cv2.GetStructuringElement(MorphShapes.Rect, new OpenCvSharp.Size(9, 1)); Cv2.MorphologyEx(bw, connected, MorphTypes.Close, morphKernel); // find contours var mask = new Mat(Mat.Zeros(bw.Size(), MatType.CV_8UC1), Range.All); Cv2.FindContours(connected, out OpenCvSharp.Point[][] contours, out HierarchyIndex[] hierarchy, RetrievalModes.CComp, ContourApproximationModes.ApproxSimple, new OpenCvSharp.Point(0, 0)); // filter contours var idx = 0; foreach (var hierarchyItem in hierarchy) { idx = hierarchyItem.Next; if (idx < 0) break; OpenCvSharp.Rect rect = Cv2.BoundingRect(contours[idx]); var maskROI = new Mat(mask, rect); maskROI.SetTo(new Scalar(0, 0, 0)); // fill the contour Cv2.DrawContours(mask, contours, idx, Scalar.White, -1); // ratio of non-zero pixels in the filled region double r = (double)Cv2.CountNonZero(maskROI) / (rect.Width * rect.Height); if (r > .45 /* assume at least 45% of the area is filled if it contains text */ && (rect.Height > 8 && rect.Width > 8) /* constraints on region size */ /* these two conditions alone are not very robust. better to use something like the number of significant peaks in a horizontal projection as a third condition */ ) { Cv2.Rectangle(rgb, rect, new Scalar(0, 255, 0), 2); } } rgb.SaveImage(Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "rgb.jpg"));
Python Implementation for #dhanushka's solution: def process_rgb(rgb): hasText = False gray = cv2.cvtColor(rgb, cv2.COLOR_BGR2GRAY) morphKernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3,3)) grad = cv2.morphologyEx(gray, cv2.MORPH_GRADIENT, morphKernel) # binarize _, bw = cv2.threshold(grad, 0.0, 255.0, cv2.THRESH_BINARY | cv2.THRESH_OTSU) # connect horizontally oriented regions morphKernel = cv2.getStructuringElement(cv2.MORPH_RECT, (9, 1)) connected = cv2.morphologyEx(bw, cv2.MORPH_CLOSE, morphKernel) # find contours mask = np.zeros(bw.shape[:2], dtype="uint8") _,contours, hierarchy = cv2.findContours(connected, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_SIMPLE) # filter contours idx = 0 while idx >= 0: x,y,w,h = cv2.boundingRect(contours[idx]) # fill the contour cv2.drawContours(mask, contours, idx, (255, 255, 255), cv2.FILLED) # ratio of non-zero pixels in the filled region r = cv2.contourArea(contours[idx])/(w*h) if(r > 0.45 and h > 5 and w > 5 and w > h): cv2.rectangle(rgb, (x,y), (x+w,y+h), (0, 255, 0), 2) hasText = True idx = hierarchy[0][idx][0] return hasText, rgb
You can utilize a python implementation SWTloc. Full Disclosure : I am the author of this library To do that :- First and Second Image Notice that the text_mode here is 'lb_df', which stands for Light Background Dark Foreground i.e the text in this image is going to be in darker color than the background from swtloc import SWTLocalizer from swtloc.utils import imgshowN, imgshow swtl = SWTLocalizer() # Stroke Width Transform swtl.swttransform(imgpaths='img1.jpg', text_mode = 'lb_df', save_results=True, save_rootpath = 'swtres/', minrsw = 3, maxrsw = 20, max_angledev = np.pi/3) imgshow(swtl.swtlabelled_pruned13C) # Grouping respacket=swtl.get_grouped(lookup_radii_multiplier=0.9, ht_ratio=3.0) grouped_annot_bubble = respacket[2] maskviz = respacket[4] maskcomb = respacket[5] # Saving the results _=cv2.imwrite('img1_processed.jpg', swtl.swtlabelled_pruned13C) imgshowN([maskcomb, grouped_annot_bubble], savepath='grouped_img1.jpg') Third Image Notice that the text_mode here is 'db_lf', which stands for Dark Background Light Foreground i.e the text in this image is going to be in lighter color than the background from swtloc import SWTLocalizer from swtloc.utils import imgshowN, imgshow swtl = SWTLocalizer() # Stroke Width Transform swtl.swttransform(imgpaths=imgpaths[1], text_mode = 'db_lf', save_results=True, save_rootpath = 'swtres/', minrsw = 3, maxrsw = 20, max_angledev = np.pi/3) imgshow(swtl.swtlabelled_pruned13C) # Grouping respacket=swtl.get_grouped(lookup_radii_multiplier=0.9, ht_ratio=3.0) grouped_annot_bubble = respacket[2] maskviz = respacket[4] maskcomb = respacket[5] # Saving the results _=cv2.imwrite('img1_processed.jpg', swtl.swtlabelled_pruned13C) imgshowN([maskcomb, grouped_annot_bubble], savepath='grouped_img1.jpg') You will also notice that the grouping done is not so accurate, to get the desired results as the images might vary, try to tune the grouping parameters in swtl.get_grouped() function.
this is a VB.NET version of the answer from dhanushka using EmguCV. A few functions and structures in EmguCV need different consideration than the C# version with OpenCVSharp Imports Emgu.CV Imports Emgu.CV.Structure Imports Emgu.CV.CvEnum Imports Emgu.CV.Util Dim input_file As String = "C:\your_input_image.png" Dim large As Mat = New Mat(input_file) Dim rgb As New Mat Dim small As New Mat Dim grad As New Mat Dim bw As New Mat Dim connected As New Mat Dim morphanchor As New Point(0, 0) '//downsample and use it for processing CvInvoke.PyrDown(large, rgb) CvInvoke.CvtColor(rgb, small, ColorConversion.Bgr2Gray) '//morphological gradient Dim morphKernel As Mat = CvInvoke.GetStructuringElement(ElementShape.Ellipse, New Size(3, 3), morphanchor) CvInvoke.MorphologyEx(small, grad, MorphOp.Gradient, morphKernel, New Point(0, 0), 1, BorderType.Isolated, New MCvScalar(0)) '// binarize CvInvoke.Threshold(grad, bw, 0, 255, ThresholdType.Binary Or ThresholdType.Otsu) '// connect horizontally oriented regions morphKernel = CvInvoke.GetStructuringElement(ElementShape.Rectangle, New Size(9, 1), morphanchor) CvInvoke.MorphologyEx(bw, connected, MorphOp.Close, morphKernel, morphanchor, 1, BorderType.Isolated, New MCvScalar(0)) '// find contours Dim mask As Mat = Mat.Zeros(bw.Size.Height, bw.Size.Width, DepthType.Cv8U, 1) '' MatType.CV_8UC1 Dim contours As New VectorOfVectorOfPoint Dim hierarchy As New Mat CvInvoke.FindContours(connected, contours, hierarchy, RetrType.Ccomp, ChainApproxMethod.ChainApproxSimple, Nothing) '// filter contours Dim idx As Integer Dim rect As Rectangle Dim maskROI As Mat Dim r As Double For Each hierarchyItem In hierarchy.GetData rect = CvInvoke.BoundingRectangle(contours(idx)) maskROI = New Mat(mask, rect) maskROI.SetTo(New MCvScalar(0, 0, 0)) '// fill the contour CvInvoke.DrawContours(mask, contours, idx, New MCvScalar(255), -1) '// ratio of non-zero pixels in the filled region r = CvInvoke.CountNonZero(maskROI) / (rect.Width * rect.Height) '/* assume at 
least 45% of the area Is filled if it contains text */ '/* constraints on region size */ '/* these two conditions alone are Not very robust. better to use something 'Like the number of significant peaks in a horizontal projection as a third condition */ If r > 0.45 AndAlso rect.Height > 8 AndAlso rect.Width > 8 Then 'draw green rectangle CvInvoke.Rectangle(rgb, rect, New MCvScalar(0, 255, 0), 2) End If idx += 1 Next rgb.Save(IO.Path.Combine(Application.StartupPath, "rgb.jpg"))
Using OpenCV to detect parking spots
I am trying to use opencv to automatically find and locate all parking spots in an empty parking lot. Currently, I have a code that thresholds the image, applies canny edge detection, and then uses probabilistic hough lines to find the lines that mark each parking spot. The program then draws the lines and the points that make up the lines Here is the code: #include "opencv2/highgui/highgui.hpp" #include "opencv2/imgproc/imgproc.hpp" #include <iostream> using namespace cv; using namespace std; int threshold_value = 150; int threshold_type = 0;; int const max_value = 255; int const max_type = 4; int const max_BINARY_value = 255; int houghthresh = 50; char* trackbar_value = "Value"; char* window_name = "Find Lines"; int main(int argc, char** argv) { const char* filename = argc >= 2 ? argv[1] : "pic1.jpg"; VideoCapture cap(0); Mat src, dst, cdst, tdst, bgrdst; namedWindow( window_name, CV_WINDOW_AUTOSIZE ); createTrackbar( trackbar_value, window_name, &threshold_value, max_value); while(1) { cap >> src; cvtColor(src, dst, CV_RGB2GRAY); threshold( dst, tdst, threshold_value, max_BINARY_value,threshold_type ); Canny(tdst, cdst, 50, 200, 3); cvtColor(tdst, bgrdst, CV_GRAY2BGR); vector<Vec4i> lines; HoughLinesP(cdst, lines, 1, CV_PI/180, houghthresh, 50, 10 ); for( size_t i = 0; i < lines.size(); i++ ) { Vec4i l = lines[i]; line( bgrdst, Point(l[0], l[1]), Point(l[2], l[3]), Scalar(0,255,0), 2, CV_AA); circle( bgrdst, Point(l[0], l[1]), 5, Scalar( 0, 0, 255 ), -1, 8 ); circle( bgrdst, Point(l[2], l[3]), 5, Scalar( 0, 0, 255 ), -1, 8 ); } imshow("source", src); imshow(window_name, bgrdst); waitKey(1); } return 0; } Currently, my main problem is figuring out how to extrapolate the line data to find the locations of each parking space. My goal is to have opencv find the parking spaces and draw out rectangles on each parking space with a number labeling the spots. 
I think there are some major problems with the method I am currently using, because as shown in the output images, opencv is detecting multiple points on the line other than the 2 endpoints. That might make it very hard to use opencv to connect 2 adjacent endpoints. I read something about using convex hull, but I am not exactly sure what it does and how it works. Any help will be appreciated. Here are the output images from my program: http://imageshack.us/photo/my-images/22/test1hl.png/ http://imageshack.us/photo/my-images/822/test2lw.png/
Consider thinning your binary image, and then detect the end points and the branch points. Here is one such result based on the images provided; end points are in red and branch points are in blue. Now you can find the locations of the parking spaces. A pair of blue dots is always connected by a single edge. Each blue dot is connected to either two or three red points. Then there are several ways to find the parking space formed by two blue dots and two red dots, the simplest is along the lines: find the closest pair of red dots where one dot is connected to a certain blue dot, and the other red point is connected to the other blue dot. This step can also be complemented by checking how close to parallel lines are the edges considered.
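Detecting end points and branch points on a thinned image can be sketched by counting 8-connected skeleton neighbours: one neighbour suggests an end point, three or more suggest a branch point. This is a minimal sketch under assumptions, not necessarily the method used to produce the result above; the grid representation and the names `neighbourCount`, `classify` and `PixelKind` are hypothetical.

```cpp
#include <cassert>
#include <vector>

using Grid = std::vector<std::vector<int>>;  // 1 = skeleton pixel, 0 = background

enum class PixelKind { Background, EndPoint, Ordinary, BranchPoint };

// Number of 8-connected skeleton neighbours of pixel (r, c).
int neighbourCount(const Grid& g, int r, int c) {
    const int rows = static_cast<int>(g.size());
    const int cols = static_cast<int>(g[0].size());
    int count = 0;
    for (int dr = -1; dr <= 1; ++dr) {
        for (int dc = -1; dc <= 1; ++dc) {
            if (dr == 0 && dc == 0) continue;
            const int rr = r + dr;
            const int cc = c + dc;
            if (rr < 0 || rr >= rows || cc < 0 || cc >= cols) continue;
            count += g[rr][cc];
        }
    }
    return count;
}

// One neighbour: end point. Three or more: branch point candidate.
PixelKind classify(const Grid& g, int r, int c) {
    assert(!g.empty() && !g[0].empty());
    if (g[r][c] == 0) return PixelKind::Background;
    const int n = neighbourCount(g, r, c);
    if (n == 1) return PixelKind::EndPoint;
    if (n >= 3) return PixelKind::BranchPoint;
    return PixelKind::Ordinary;
}
```

On real skeletons, pixels adjacent to a junction can also have three or more neighbours, so branch candidates tend to appear in small clusters; merging each cluster into one point (for example by taking its centroid) is a likely follow-up step before pairing red and blue dots.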