OpenCV grooving detection - C++

I have pictures of a surface with many grooves. In most cases the edges of the grooving form parallel lines, so Canny and the Hough transformation work very well to detect the lines and to do some characterization. However, in several places the grooving is damaged and the edges aren't parallel anymore.
I am looking for an easy way to check if a certain edge is a straight line or if there are any gaps or deviations from a straight line. I am thinking of something like the R-squared parameter in linear regression, but here I need a parameter which is more location-dependent. Do you have any other thoughts on how to characterize the edges?
I attached a picture of the grooving after Canny edge detection. Here the edges are straight lines and the grooving is fine. Unfortunately I don't have access to pictures with damaged grooving at the moment; however, in such pictures the lines would have major gaps (at least 10% of the picture's size) or wouldn't be parallel.

The core of the technique I'm sharing below uses cv::HoughLinesP() to find line segments in a grayscale image.
The application starts by loading the input image as grayscale. Then it performs a basic pre-processing operation to enhance certain characteristics of the image, aiming to improve the detection performed by cv::HoughLinesP():
#include <cv.h>
#include <highgui.h>
#include <algorithm>
#include <iostream>
#include <vector>
// Custom sort method adapted from: http://stackoverflow.com/a/328959/176769
// This is used later by std::sort()
struct sort_by_y_coord
{
bool operator ()(cv::Vec4i const& a, cv::Vec4i const& b) const
{
return a[1] < b[1];
}
};
int main()
{
/* Load input image as grayscale */
cv::Mat src = cv::imread("13531682.jpg", 0);
/* Pre-process the image to enhance the characteristics we are interested at */
medianBlur(src, src, 5);
int erosion_size = 2;
cv::Mat element = cv::getStructuringElement(cv::MORPH_CROSS,
cv::Size(2 * erosion_size + 1, 2 * erosion_size + 1),
cv::Point(erosion_size, erosion_size) );
cv::erode(src, src, element);
cv::dilate(src, src, element);
/* Identify all the lines in the image */
cv::Size size = src.size();
std::vector<cv::Vec4i> total_lines;
cv::HoughLinesP(src, total_lines, 1, CV_PI/180, 100, size.width / 2.f, 20);
int n_lines = total_lines.size();
std::cout << "* Total lines: "<< n_lines << std::endl;
cv::Mat disp_lines(size, CV_8UC1, cv::Scalar(0, 0, 0));
// For debugging purposes, the block below writes all the lines into disp_lines
// for (unsigned i = 0; i < n_lines; ++i)
// {
// cv::line(disp_lines,
// cv::Point(total_lines[i][0], total_lines[i][1]),
// cv::Point(total_lines[i][2], total_lines[i][3]),
// cv::Scalar(255, 0 ,0));
// }
// cv::imwrite("total_lines.png", disp_lines);
The commented block above can be enabled to write all the detected line segments to a file for visualization purposes.
At this point we need to sort our vector of lines, because cv::HoughLinesP() doesn't do that, and we need the vector sorted to be able to identify groups of lines by measuring and comparing the distance between them:
/* Sort lines according to their Y coordinate.
The line closest to Y == 0 is at the first position of the vector.
*/
sort(total_lines.begin(), total_lines.end(), sort_by_y_coord());
/* Separate them according to their (visible) groups */
// Figure out the number of groups by distance between lines
std::vector<int> idx_of_groups; // stores the index position where a new group starts
idx_of_groups.push_back(0); // the first line indicates the start of the first group
// The loop jumps over the first line, since it was already added as a group
int y_dist = 35; // the next groups are identified by a minimum of 35 pixels of distance
for (unsigned i = 1; i < n_lines; i++)
{
if ((total_lines[i][1] - total_lines[i-1][1]) >= y_dist)
{
// current index marks the position of a new group
idx_of_groups.push_back(i);
std::cout << "* New group located at line #"<< i << std::endl;
}
}
int n_groups = idx_of_groups.size();
std::cout << "* Total groups identified: "<< n_groups << std::endl;
The last part of the code above simply stores the index positions of the vector of lines in a new vector<int> so we know which line starts a new group.
For instance, assume that the indexes stored in the new vector are: 0 4 8 12. Remember: they define the start of each group. That means the groups span the lines (0, 4-1), (4, 8-1), (8, 12-1), and (12, last line).
Knowing that, we write the following code:
/* Mark the beginning and end of each group */
for (unsigned i = 0; i < n_groups; i++)
{
// To do this, we discard the X coordinates of the 2 points from the line,
// so we can draw a line from X=0 to X=size.width
// beginning
cv::line(disp_lines,
cv::Point(0, total_lines[ idx_of_groups[i] ][1]),
cv::Point(size.width, total_lines[ idx_of_groups[i] ][3]),
cv::Scalar(255, 0 ,0));
// end
if (i != n_groups-1)
{
cv::line(disp_lines,
cv::Point(0, total_lines[ idx_of_groups[i+1]-1 ][1]),
cv::Point(size.width, total_lines[ idx_of_groups[i+1]-1 ][3]),
cv::Scalar(255, 0 ,0));
}
}
// mark the end position of the last group (not done by the loop above)
cv::line(disp_lines,
cv::Point(0, total_lines[n_lines-1][1]),
cv::Point(size.width, total_lines[n_lines-1][3]),
cv::Scalar(255, 0 ,0));
/* Save the output image and display it on the screen */
cv::imwrite("groups.png", disp_lines);
cv::imshow("groove", disp_lines);
cv::waitKey(0);
cv::destroyWindow("groove");
return 0;
}
And the resulting image is:
It's not a perfect match, but it's close. With a little bit of tweaking here and there this approach can get much better. I would start by writing smarter logic for sort_by_y_coord, which should discard lines that have a small distance between their X coordinates (i.e. small line segments), and also lines that are not well aligned on the X axis (like the one from the second group in the output image). This suggestion makes much more sense after you take the time to evaluate the first image generated by the application.
Good luck.

What immediately comes to mind is a Hough transform. This is a voting scheme in line space, which takes each possible line and gives you a score for it. With the code linked above, you could simply set a threshold that tolerates roughly 10% of damaged grooves/lines.
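A minimal sketch of that idea (the edge-image file name and the 10% tolerance are placeholders): by asking cv::HoughLinesP() for segments that span most of the image width while allowing only small gaps, a groove edge that fails to produce such a segment, or produces a strongly tilted one, can be flagged as damaged:

#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>
#include <cmath>

int main()
{
    // Hypothetical Canny edge image of the grooving.
    cv::Mat edges = cv::imread("edges.png", 0);
    if (edges.empty()) return 1;

    double minLineLength = 0.9 * edges.cols;  // segment must cover ~90% of the width
    double maxLineGap    = 0.1 * edges.cols;  // tolerate gaps up to ~10% of the width

    std::vector<cv::Vec4i> lines;
    cv::HoughLinesP(edges, lines, 1, CV_PI / 180, 100, minLineLength, maxLineGap);

    // Straight, continuous groove edges should yield long, nearly horizontal
    // segments; an edge with no such segment (or a strongly tilted one) is a
    // candidate for damaged grooving.
    for (size_t i = 0; i < lines.size(); i++)
    {
        double dx = lines[i][2] - lines[i][0];
        double dy = lines[i][3] - lines[i][1];
        double angle = std::atan2(dy, dx) * 180.0 / CV_PI;
        std::cout << "segment " << i << ": angle " << angle << " degrees" << std::endl;
    }
    return 0;
}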


Fastest way to calculate 'percentage matched' of two trapezoids

I asked the following question here and got a good solution, but I have found it to be too slow performance-wise (it takes 2-300 ms with a 640x480 image). Now I would like to consider how it can be optimized.
Problem:
Given two polygons (ALWAYS trapezoids that are parallel to the X axis), I would like to calculate by some means how much they match up. By this, I mean that overlapping area is not sufficient, because if one polygon has excess area, that somehow needs to count against it. Optimally, I would like to know what percent of the area created by both polygons is common. See the image for an example of what is desired.
A working (but slow) solution:
- Draw polygon one on an empty image (cv::fillConvexPoly)
- Draw polygon two on an empty image (cv::fillConvexPoly)
- Perform a bitwise and to create an image of all the pixels that overlapped
- Count all nonzero pixels --> overlapped Pixels
- Invert the first image and repeat with non-inverted second --> excessive pixels
- Invert the second image and repeat with non-inverted first --> more excessive pixels
- Take the 'overlapped pixels' over the sum of the 'excessive pixels'
As you can see, the current solution is computationally intensive because it evaluates/operates on every single pixel of the image ~12 times or so. I would rather have a solution that calculates this area without going through the tedious construction and evaluation of several images.
Existing code:
#define POLYGONSCALING 0.05
typedef std::array<cv::Point, 4> Polygon;
float PercentMatch( const Polygon& polygon,
const cv::Mat optimalmat )
{
//Create blank mats
cv::Mat polygonmat{ cv::Mat(optimalmat.rows, optimalmat.cols, CV_8UC1, cv::Scalar(0)) };
cv::Mat resultmat{ cv::Mat(optimalmat.rows, optimalmat.cols, CV_8UC1, cv::Scalar(0)) };
//Draw polygon
cv::Point cvpointarray[4];
for (int i =0; i < 4; i++ ) {
cvpointarray[i] = cv::Point(POLYGONSCALING * polygon[i].x, POLYGONSCALING *
polygon[i].y);
}
cv::fillConvexPoly( polygonmat, cvpointarray, 4, cv::Scalar(255) );
//Find overlapped pixels
cv::bitwise_and(polygonmat, optimalmat, resultmat);
int overlappedpixels { countNonZero(resultmat) };
//Find excessive pixels
cv::bitwise_not(optimalmat, resultmat);
cv::bitwise_and(polygonmat, resultmat, resultmat);
int excessivepixels { countNonZero(resultmat) };
cv::bitwise_not(polygonmat, resultmat);
cv::bitwise_and(optimalmat, resultmat, resultmat);
excessivepixels += countNonZero(resultmat);
return (100.0 * overlappedpixels) / (overlappedpixels + excessivepixels);
}
Currently the only performance improvements I've devised are drawing the 'optimalmat' outside of the function so it isn't redrawn (it gets compared to many other polygons), and adding a POLYGONSCALING factor to size down the polygons, losing some resolution but gaining some performance. Still too slow.
I may have misunderstood what you want but I think you should be able to do it faster like this...
Fill your first trapezoid with 1's on a background of zeroes.
Fill your second trapezoid with 2's on a background of zeroes.
Add the two Mats together.
Now each pixel must be either 0, 1, 2 or 3. Create an array with 4 elements and in a single pass over all elements, with no if statements, simply increment the corresponding array element according to the value of each pixel.
Then the count at index 0 of the array is the number of pixels where neither trapezoid is present, the counts at indices 1 and 2 are the pixels where only trapezoid 1 or only trapezoid 2 is present, and the count at index 3 is the overlap.
Also, try benchmarking the filling of the two trapezoids; if that is a significant proportion of your time, maybe have a second thread filling the second trapezoid.
Benchmark
I wrote some code to try out the above theory, and with a 640x480 image it takes:
181 microseconds to draw the first polygon
84 microseconds to draw the second polygon
481 microseconds to calculate the overlap
So the total time is 740 microseconds on my iMac.
You could draw the second polygon in parallel with the first, but the thread creation and joining time is around 20 microseconds, so you would only save 60 microseconds there which is 8% or so - probably not worth it.
Most of the code is timing and debug:
#include "opencv2/highgui.hpp"
#include "opencv2/imgproc/imgproc.hpp"
#include <iostream>
#include <stdio.h>
#include <stdlib.h>
#include <chrono>
using namespace cv;
using namespace std;
const int COLS=640;
const int ROWS=480;
typedef std::chrono::high_resolution_clock hrclock;
hrclock::time_point t1,t2;
std::chrono::nanoseconds elapsed;
int e;
int
main(int argc,char*argv[]){
Mat canvas1(ROWS,COLS,CV_8UC1,Scalar(0));
Mat canvas2(ROWS,COLS,CV_8UC1,Scalar(0));
Mat sum(ROWS,COLS,CV_8UC1,Scalar(0));
//Draw polygons on canvases
Point vertices1[4]={Point(10,10),Point(400,10),Point(400,460),Point(10,460)};
Point vertices2[4]={Point(300,50),Point(600,50),Point(600,400),Point(300,400)};
t1 = hrclock::now();
// fillConvexPoly takes around 150 microseconds here
fillConvexPoly(canvas1,vertices1,4,cv::Scalar(1));
t2 = hrclock::now();
elapsed = t2-t1;
e=elapsed.count();
cout << "fillConvexPoly: " << e << "ns" << std::endl;
imwrite("canvas1.png",canvas1);
t1 = hrclock::now();
// fillConvexPoly takes around 80 microseconds here
fillConvexPoly(canvas2,vertices2,4,cv::Scalar(2));
t2 = hrclock::now();
elapsed = t2-t1;
e=elapsed.count();
cout << "fillConvexPoly: " << e << "ns" << std::endl;
imwrite("canvas2.png",canvas2);
sum=canvas1+canvas2;
imwrite("sum.png",sum);
long totals[4]={0,0,0,0};
uchar* p1=sum.data;
t1 = hrclock::now();
for(int j=0;j<ROWS;j++){
uchar* data= sum.ptr<uchar>(j);
for(int i=0;i<COLS;i++) {
totals[data[i]]++;
}
}
t2 = hrclock::now();
elapsed = t2-t1;
e=elapsed.count();
cout << "Count overlap: " << e << "ns" << std::endl;
for(int i=0;i<4;i++){
cout << "totals[" << i << "]=" << totals[i] << std::endl;
}
}
Sample run
fillConvexPoly: 181338ns
fillConvexPoly: 84759ns
Count overlap: 481830ns
totals[0]=60659
totals[1]=140890
totals[2]=70200
totals[3]=35451
Verified using ImageMagick as follows:
identify -verbose sum.png | grep -A4 Histogram:
Histogram:
60659: ( 0, 0, 0) #000000 gray(0)
140890: ( 1, 1, 1) #010101 gray(1)
70200: ( 2, 2, 2) #020202 gray(2)
35451: ( 3, 3, 3) #030303 gray(3)

Image to ASCII art conversion

Prologue
This subject pops up here on Stack Overflow from time to time, but it is usually removed because it is a poorly written question. I saw many such questions and then silence from the OP (usually low rep) when additional information is requested. From time to time, if the input is good enough for me, I decide to respond with an answer, and it usually gets a few up-votes per day while active, but then after a few weeks the question gets removed/deleted and everything starts from the beginning. So I decided to write this Q&A so I can reference such questions directly without rewriting the answer over and over again …
Another reason is this meta thread targeted at me, so if you have additional input, feel free to comment.
Question
How can I convert a bitmap image to ASCII art using C++?
Some constraints:
gray scale images
using mono-spaced fonts
keeping it simple (not using too advanced stuff for beginner level programmers)
Here is a related Wikipedia page on ASCII art (thanks to @RogerRowland).
Here is a similar maze-to-ASCII-art conversion Q&A.
There are more approaches for image-to-ASCII-art conversion, mostly based on using mono-spaced fonts. For simplicity, I stick only to the basics:
Pixel/area intensity based (shading)
This approach handles each pixel, or an area of pixels, as a single dot. The idea is to compute the average grayscale intensity of this dot and then replace it with the character whose intensity is closest to the computed one. For that we need a list of usable characters, each with a precomputed intensity. Let's call it a character map. To choose more quickly which character is best for which intensity, there are two ways:
Linearly distributed intensity character map
So we use only characters whose intensities differ by the same step. In other words, when sorted in ascending order:
intensity_of(map[i])=intensity_of(map[i-1])+constant;
Also, when our character map is sorted, we can compute the character directly from the intensity (no search needed):
character = map[intensity_of(dot)/constant];
Arbitrary distributed intensity character map
So we have an array of usable characters and their intensities. We need to find the intensity closest to intensity_of(dot). So again, if we sort map[], we can use binary search; otherwise we need an O(n) minimum-distance search loop or an O(1) dictionary. Sometimes, for simplicity, the character map[] can be handled as linearly distributed, causing a slight gamma distortion, usually unseen in the result unless you know what to look for.
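For the sorted-map case, here is a small sketch of the nearest-intensity lookup with std::lower_bound (the entry type and function name are illustrative, and the map is assumed non-empty and sorted by intensity):

#include <algorithm>
#include <vector>
#include <utility>

// One entry of an arbitrarily distributed character map: (intensity, character),
// kept sorted by intensity so it can be binary-searched.
typedef std::pair<int, char> MapEntry;

char closestChar(const std::vector<MapEntry>& map, int intensity)
{
    if (map.empty()) return ' ';
    // First entry whose intensity is not smaller than the requested one.
    std::vector<MapEntry>::const_iterator it =
        std::lower_bound(map.begin(), map.end(), MapEntry(intensity, 0));
    if (it == map.begin()) return it->second;
    if (it == map.end())   return (it - 1)->second;
    // Pick whichever neighbour is closer in intensity.
    return (it->first - intensity < intensity - (it - 1)->first) ? it->second
                                                                 : (it - 1)->second;
}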
Intensity-based conversion is also great for grayscale images (not just black and white). If you select the dot as a single pixel, the result gets large (one pixel -> one character), so for larger images an area (a multiple of the font size) is selected instead, to preserve the aspect ratio and not enlarge the output too much.
How to do it:
Evenly divide the image into (grayscale) pixels or (rectangular) area dots
Compute the intensity of each pixel/area
Replace it with the character from the character map with the closest intensity
As the character map you can use any characters, but the result gets better if the character's pixels are dispersed evenly across the character area. For starters you can use:
char map[]=" .,:;ox%##";
sorted by descending intensity (brightest first) and treated as linearly distributed.
So if intensity of pixel/area is i = <0-255> then the replacement character will be
map[(255-i)*10/256];
If i==0 then the pixel/area is black, if i==127 then the pixel/area is gray, and if i==255 then the pixel/area is white. You can experiment with different characters inside map[] ...
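For readers without VCL, here is a tiny, self-contained C++ sketch of this mapping (OpenCV is used only to load the image; the file name is a placeholder), before the original VCL example below:

#include <opencv2/opencv.hpp>
#include <iostream>
#include <string>

int main()
{
    // Hypothetical grayscale input image.
    cv::Mat img = cv::imread("input.png", 0);
    if (img.empty()) return 1;

    const std::string map = " .,:;ox%##"; // 10 characters, lightest to darkest ink

    for (int y = 0; y < img.rows; y++)
    {
        std::string line;
        for (int x = 0; x < img.cols; x++)
        {
            int i = img.at<uchar>(y, x);       // 0 = black, 255 = white
            line += map[(255 - i) * 10 / 256]; // darker pixel -> denser character
        }
        std::cout << line << std::endl;
    }
    return 0;
}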
Here is an ancient example of mine in C++ and VCL:
AnsiString m = " .,:;ox%##";
Graphics::TBitmap *bmp = new Graphics::TBitmap;
bmp->LoadFromFile("pic.bmp");
bmp->HandleType = bmDIB;
bmp->PixelFormat = pf24bit;
int x, y, i, c, l;
BYTE *p;
AnsiString s, endl;
endl = char(13); endl += char(10);
l = m.Length();
s ="";
for (y=0; y<bmp->Height; y++)
{
p = (BYTE*)bmp->ScanLine[y];
for (x=0; x<bmp->Width; x++)
{
i = p[x+x+x+0];
i += p[x+x+x+1];
i += p[x+x+x+2];
i = (i*l)/768;
s += m[l-i];
}
s += endl;
}
mm_log->Lines->Text = s;
mm_log->Lines->SaveToFile("pic.txt");
delete bmp;
You need to replace/ignore VCL stuff unless you use the Borland/Embarcadero environment.
mm_log is the memo where the text is outputted
bmp is the input bitmap
AnsiString is a VCL type string indexed from 1, not from 0 as char*!!!
This is the result: Slightly NSFW intensity example image
On the left is the ASCII art output (font size 5 pixels), and on the right is the input image zoomed a few times. As you can see, the output is larger (pixel -> character). If you use larger areas instead of pixels then the zoom is smaller, but of course the output is less visually pleasing. This approach is very easy and fast to code/process.
When you add more advanced things like:
automated map computations
automatic pixel/area size selection
aspect ratio corrections
Then you can process more complex images with better results:
Here is the result in a 1:1 ratio (zoom to see the characters):
Of course, for area sampling you lose the small details. This is an image of the same size as the first example sampled with areas:
Slightly NSFW intensity advanced example image
As you can see, this is more suited for bigger images.
Character fitting (hybrid between shading and solid ASCII art)
This approach tries to replace an area (no more single-pixel dots) with a character of similar intensity and shape. This leads to better results, even with bigger fonts, in comparison with the previous approach. On the other hand, this approach is of course a bit slower. There are more ways to do this, but the main idea is to compute the difference (distance) between an image area (dot) and a rendered character. You can start with a naive sum of the absolute differences between pixels, but that will not lead to very good results because even a one-pixel shift will make the distance big. Instead you can use correlation or other metrics. The overall algorithm is almost the same as in the previous approach:
Evenly divide the image into (grayscale) rectangular area dots,
ideally with the same aspect ratio as the rendered font characters (this preserves the aspect ratio; do not forget that characters usually overlap a bit on the x-axis)
Compute the intensity of each area (dot)
Replace it with the character from the character map with the closest intensity/shape
How can we compute the distance between a character and a dot? That is the hardest part of this approach. While experimenting, I developed this compromise between speed, quality, and simplicity:
Divide the character area into zones
Compute a separate intensity for the left, right, up, down, and center zones of each character from your conversion alphabet (map).
Normalize all intensities so they are independent of the area size: i=(i*256)/(xs*ys).
Process the source image in rectangular areas
(with the same aspect ratio as the target font)
For each area, compute the intensity in the same manner as in bullet #1
Find the closest match from intensities in the conversion alphabet
Output the fitted character
This is the result for font size = 7 pixels
As you can see, the output is visually pleasing, even with a bigger font size used (the previous approach example was with a 5 pixel font size). The output is roughly the same size as the input image (no zoom). The better results are achieved because the characters are closer to the original image, not only by intensity, but also by overall shape, and therefore you can use larger fonts and still preserve details (up to a point of course).
Here is the complete code for the VCL-based conversion application:
//---------------------------------------------------------------------------
#include <vcl.h>
#pragma hdrstop
#include "win_main.h"
//---------------------------------------------------------------------------
#pragma package(smart_init)
#pragma resource "*.dfm"
TForm1 *Form1;
Graphics::TBitmap *bmp=new Graphics::TBitmap;
//---------------------------------------------------------------------------
class intensity
{
public:
char c; // Character
int il, ir, iu ,id, ic; // Intensity of part: left,right,up,down,center
intensity() { c=0; reset(); }
void reset() { il=0; ir=0; iu=0; id=0; ic=0; }
void compute(DWORD **p,int xs,int ys,int xx,int yy) // p source image, (xs,ys) area size, (xx,yy) area position
{
int x0 = xs>>2, y0 = ys>>2;
int x1 = xs-x0, y1 = ys-y0;
int x, y, i;
reset();
for (y=0; y<ys; y++)
for (x=0; x<xs; x++)
{
i = (p[yy+y][xx+x] & 255);
if (x<=x0) il+=i;
if (x>=x1) ir+=i;
if (y<=y0) iu+=i;
if (y>=y1) id+=i;
if ((x>=x0) && (x<=x1) &&
(y>=y0) && (y<=y1))
ic+=i;
}
// Normalize
i = xs*ys;
il = (il << 8)/i;
ir = (ir << 8)/i;
iu = (iu << 8)/i;
id = (id << 8)/i;
ic = (ic << 8)/i;
}
};
//---------------------------------------------------------------------------
AnsiString bmp2txt_big(Graphics::TBitmap *bmp,TFont *font) // Character sized areas
{
int i, i0, d, d0;
int xs, ys, xf, yf, x, xx, y, yy;
DWORD **p = NULL,**q = NULL; // Bitmap direct pixel access
Graphics::TBitmap *tmp; // Temporary bitmap for single character
AnsiString txt = ""; // Output ASCII art text
AnsiString eol = "\r\n"; // End of line sequence
intensity map[97]; // Character map
intensity gfx;
// Input image size
xs = bmp->Width;
ys = bmp->Height;
// Output font size
xf = font->Size; if (xf<0) xf = -xf;
yf = font->Height; if (yf<0) yf = -yf;
for (;;) // Loop to simplify the dynamic allocation error handling
{
// Allocate and initialise buffers
tmp = new Graphics::TBitmap;
if (tmp==NULL)
break;
// Allow 32 bit pixel access as DWORD/int pointer
tmp->HandleType = bmDIB; bmp->HandleType = bmDIB;
tmp->PixelFormat = pf32bit; bmp->PixelFormat = pf32bit;
// Copy target font properties to tmp
tmp->Canvas->Font->Assign(font);
tmp->SetSize(xf, yf);
tmp->Canvas->Font ->Color = clBlack;
tmp->Canvas->Pen ->Color = clWhite;
tmp->Canvas->Brush->Color = clWhite;
xf = tmp->Width;
yf = tmp->Height;
// Direct pixel access to bitmaps
p = new DWORD*[ys];
if (p == NULL) break;
for (y=0; y<ys; y++)
p[y] = (DWORD*)bmp->ScanLine[y];
q = new DWORD*[yf];
if (q == NULL) break;
for (y=0; y<yf; y++)
q[y] = (DWORD*)tmp->ScanLine[y];
// Create character map
for (x=0, d=32; d<128; d++, x++)
{
map[x].c = char(DWORD(d));
// Clear tmp
tmp->Canvas->FillRect(TRect(0, 0, xf, yf));
// Render tested character to tmp
tmp->Canvas->TextOutA(0, 0, map[x].c);
// Compute intensity
map[x].compute(q, xf, yf, 0, 0);
}
map[x].c = 0;
// Loop through the image by zoomed character size step
xf -= xf/3; // Characters are usually overlapping by 1/3
xs -= xs % xf;
ys -= ys % yf;
for (y=0; y<ys; y+=yf, txt += eol)
for (x=0; x<xs; x+=xf)
{
// Compute intensity
gfx.compute(p, xf, yf, x, y);
// Find the closest match in map[]
i0 = 0; d0 = -1;
for (i=0; map[i].c; i++)
{
d = abs(map[i].il-gfx.il) +
abs(map[i].ir-gfx.ir) +
abs(map[i].iu-gfx.iu) +
abs(map[i].id-gfx.id) +
abs(map[i].ic-gfx.ic);
if ((d0<0)||(d0>d)) {
d0=d; i0=i;
}
}
// Add fitted character to output
txt += map[i0].c;
}
break;
}
// Free buffers
if (tmp) delete tmp;
if (p ) delete[] p;
if (q ) delete[] q;
return txt;
}
//---------------------------------------------------------------------------
AnsiString bmp2txt_small(Graphics::TBitmap *bmp) // pixel sized areas
{
AnsiString m = " `'.,:;i+o*%&$##"; // Constant character map
int x, y, i, c, l;
BYTE *p;
AnsiString txt = "", eol = "\r\n";
l = m.Length();
bmp->HandleType = bmDIB;
bmp->PixelFormat = pf32bit;
for (y=0; y<bmp->Height; y++)
{
p = (BYTE*)bmp->ScanLine[y];
for (x=0; x<bmp->Width; x++)
{
i = p[(x<<2)+0];
i += p[(x<<2)+1];
i += p[(x<<2)+2];
i = (i*l)/768;
txt += m[l-i];
}
txt += eol;
}
return txt;
}
//---------------------------------------------------------------------------
void update()
{
int x0, x1, y0, y1, i, l;
x0 = bmp->Width;
y0 = bmp->Height;
if ((x0<64)||(y0<64)) Form1->mm_txt->Text = bmp2txt_small(bmp);
else Form1->mm_txt->Text = bmp2txt_big (bmp, Form1->mm_txt->Font);
Form1->mm_txt->Lines->SaveToFile("pic.txt");
for (x1 = 0, i = 1, l = Form1->mm_txt->Text.Length();i<=l;i++) if (Form1->mm_txt->Text[i] == 13) { x1 = i-1; break; }
for (y1=0, i=1, l=Form1->mm_txt->Text.Length();i <= l; i++) if (Form1->mm_txt->Text[i] == 13) y1++;
x1 *= abs(Form1->mm_txt->Font->Size);
y1 *= abs(Form1->mm_txt->Font->Height);
if (y0<y1) y0 = y1; x0 += x1 + 48;
Form1->ClientWidth = x0;
Form1->ClientHeight = y0;
Form1->Caption = AnsiString().sprintf("Picture -> Text (Font %ix%i)", abs(Form1->mm_txt->Font->Size), abs(Form1->mm_txt->Font->Height));
}
//---------------------------------------------------------------------------
void draw()
{
Form1->ptb_gfx->Canvas->Draw(0, 0, bmp);
}
//---------------------------------------------------------------------------
void load(AnsiString name)
{
bmp->LoadFromFile(name);
bmp->HandleType = bmDIB;
bmp->PixelFormat = pf32bit;
Form1->ptb_gfx->Width = bmp->Width;
Form1->ClientHeight = bmp->Height;
Form1->ClientWidth = (bmp->Width << 1) + 32;
}
//---------------------------------------------------------------------------
__fastcall TForm1::TForm1(TComponent* Owner):TForm(Owner)
{
load("pic.bmp");
update();
}
//---------------------------------------------------------------------------
void __fastcall TForm1::FormDestroy(TObject *Sender)
{
delete bmp;
}
//---------------------------------------------------------------------------
void __fastcall TForm1::FormPaint(TObject *Sender)
{
draw();
}
//---------------------------------------------------------------------------
void __fastcall TForm1::FormMouseWheel(TObject *Sender, TShiftState Shift, int WheelDelta, TPoint &MousePos, bool &Handled)
{
int s = abs(mm_txt->Font->Size);
if (WheelDelta<0) s--;
if (WheelDelta>0) s++;
mm_txt->Font->Size = s;
update();
}
//---------------------------------------------------------------------------
It is a simple form application (Form1) with a single TMemo mm_txt in it. It loads an image, "pic.bmp", and then, according to the resolution, chooses which approach to use to convert it to text, which is saved to "pic.txt" and sent to the memo for visualization.
For those without VCL, ignore the VCL stuff and replace AnsiString with any string type you have, and Graphics::TBitmap with any bitmap or image class you have at your disposal with pixel-access capability.
A very important note is that this uses the settings of mm_txt->Font, so make sure you set:
Font->Pitch = fpFixed
Font->Charset = OEM_CHARSET
Font->Name = "System"
to make this work properly; otherwise, the font will not be handled as mono-spaced. The mouse wheel just changes the font size up/down so you can see the results at different font sizes.
[Notes]
See Word Portraits visualization
Use a language with bitmap/file access and text output capabilities
I strongly recommend starting with the first approach, as it is very easy, straightforward, and simple, and only then moving to the second (which can be done as a modification of the first, so most of the code stays as is anyway).
It is a good idea to compute with inverted intensity (black pixels have the maximum value), because the standard text preview is on a white background, hence leading to much better results.
You can experiment with the size, count, and layout of the subdivision zones, or use some grid like 3x3 instead.
Comparison
Finally here is a comparison between the two approaches on the same input:
The green dot marked images are done with approach #2 and the red ones with #1, all on a six-pixel font size. As you can see on the light bulb image, the shape-sensitive approach is much better (even if the #1 is done on a 2x zoomed source image).
Cool application
While reading today's new questions, I got an idea for a cool application that grabs a selected region of the desktop, continuously feeds it to the ASCII art converter, and views the result. After an hour of coding it was done, and I am so satisfied with the result that I simply have to add it here.
OK, the application consists of just two windows. The first, master window is basically my old converter window without the image selection and preview (all the stuff above is in it). It has just the ASCII preview and the conversion settings. The second window is an empty form with a transparent interior for the grabbing-area selection (no functionality whatsoever).
Now, on a timer, I just grab the selected area via the selection form, pass it to the conversion, and preview the ASCII art.
So you enclose the area you want to convert with the selection window and view the result in the master window. It can be a game, a viewer, etc. It looks like this:
So now I can even watch videos in ASCII art for fun. Some look really nice :).
If you want to try to implement this in GLSL, take a look at this:
Convert floating-point numbers to decimal digits in GLSL?

Template Matching with Mask

I want to perform template matching with a mask. In general, template matching can be made faster by converting the image from the spatial domain into the frequency domain. But is there any method I can apply if I want to perform the same with a mask? I'm using OpenCV C++. Is there any matching function already available in OpenCV for this task?
My current Approach:
Bitwise Xor Image A & Image B with Mask.
Count the Non-Zero Pixels.
Fill the Resultant matrix with this count.
Search for maxima.
A few parameters I'm guessing at now are:
Skip the tile position if the matches are less than 25%.
Skip the tile position if the previous tile's matches are less than 50%.
My question: is there any algorithm to do this matching already? Is there any mathematical operation which can speed up this process?
With binary images, you can directly use Hu moments and the Mahalanobis distance to find out whether image A is similar to image B. If the distance tends to 0, then the images are the same.
Of course you can also use feature detectors to see what matches, but for pictures like these, Hu moments and feature detectors will give approximately the same results, and Hu moments are more efficient.
Using findContours, you can extract the black regions inside the white star and fill them, in order to have image A = image B.
Another approach: using findContours on your mask and applying the result to image A (extracting the region of interest), you can extract what's inside the star and count how many black pixels you have (the mismatching ones).
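A minimal sketch of the Hu-moment comparison on two binary images (the file names are placeholders; cv::matchShapes already compares images via their Hu moments, so it is used here instead of a hand-rolled Mahalanobis distance):

#include <opencv2/opencv.hpp>
#include <iostream>

int main()
{
    // Hypothetical binary input images (e.g. the star and the mask).
    cv::Mat a = cv::imread("a.png", 0);
    cv::Mat b = cv::imread("b.png", 0);
    if (a.empty() || b.empty()) return 1;

    // The raw Hu moments, in case you want to build your own distance on them.
    double huA[7], huB[7];
    cv::HuMoments(cv::moments(a, true), huA);
    cv::HuMoments(cv::moments(b, true), huB);

    // matchShapes compares the Hu moments directly; a value close to 0 means
    // the two shapes are very similar.
    double d = cv::matchShapes(a, b, CV_CONTOURS_MATCH_I1, 0);
    std::cout << "Shape distance: " << d << std::endl;
    return 0;
}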
I have the same requirement and I have tried almost the same approach. As in the image, I want to match the castle. The castle has a different shield image and a variable-length clan name, and also a grass background (this image comes from the game Clash of Clans). The normal OpenCV matchTemplate does not work, so I wrote my own.
I follow the approach of matchTemplate to create a result image, but with a different algorithm.
The core idea is to count the matched pixels under the mask. The code follows; it is simple.
This works fine, but the time cost is high. As you can see, it costs 457 ms.
Now I am working on the optimization.
The source and template images are both CV_8UC3, and the mask image is CV_8U. Matching one channel is OK. It is faster, but it still costs a lot.
Mat tmp(matTempl.rows, matTempl.cols, matTempl.type());
int matchCount = 0;
float maxVal = 0;
double areaInvert = 1.0 / countNonZero(matMask);
for (int j = 0; j < resultRows; j++)
{
float* data = imgResult.ptr<float>(j);
for (int i = 0; i < resultCols; i++)
{
Mat matROI(matSource, Rect(i, j, matTempl.cols, matTempl.rows));
tmp.setTo(Scalar(0));
bitwise_xor(matROI, matTempl, tmp);
bitwise_and(tmp, matMask, tmp);
data[i] = 1.0f - float(countNonZero(tmp) * areaInvert);
if (data[i] > matchingDegree)
{
SRect rc;
rc.left = i;
rc.top = j;
rc.right = i + imgTemplate.cols;
rc.bottom = j + imgTemplate.rows;
rcOuts.push_back(rc);
if ( data[i] > maxVal)
{
maxVal = data[i];
maxIndex = rcOuts.size() - 1;
}
if (++matchCount == maxMatchs)
{
Log_Warn("Too many matches, stopped at: " << matchCount);
return true;
}
}
}
}
The result image: http://i.stack.imgur.com/mJrqU.png
Newly added:
I succeeded in optimizing the algorithm by using key points. Calculating all the points is costly, but it is faster to calculate only several key points. See the picture; the cost decreases greatly, and it is now about 7 ms.
The optimized result: http://i.stack.imgur.com/ePcD9.png
There is a formulation for template matching with a mask in the OpenCV documentation, which works well. It can be used by calling cv::matchTemplate, and its source code is also available under the Intel License.
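For reference, newer OpenCV versions (3.x onward, if I recall correctly) expose this directly: cv::matchTemplate accepts an optional mask argument for the TM_SQDIFF and TM_CCORR_NORMED methods. A minimal sketch (the file names are placeholders):

#include <opencv2/opencv.hpp>
#include <iostream>

int main()
{
    // Hypothetical input files: scene, template, and a mask marking which
    // template pixels should take part in the matching. The mask must have
    // the same size as the template (and, in older OpenCV versions, the same
    // type, which is why it is loaded as a 3-channel image here).
    cv::Mat image = cv::imread("scene.png");
    cv::Mat templ = cv::imread("castle.png");
    cv::Mat mask  = cv::imread("castle_mask.png");
    if (image.empty() || templ.empty() || mask.empty()) return 1;

    cv::Mat result;
    cv::matchTemplate(image, templ, result, cv::TM_CCORR_NORMED, mask);

    double minVal, maxVal;
    cv::Point minLoc, maxLoc;
    cv::minMaxLoc(result, &minVal, &maxVal, &minLoc, &maxLoc);
    std::cout << "Best match at (" << maxLoc.x << ", " << maxLoc.y
              << ") with score " << maxVal << std::endl;
    return 0;
}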

Optimizing the Dijkstra's algorithm

I need a graph-search algorithm that is good enough for our robot-navigation application, and I chose Dijkstra's algorithm.
We are given a grid map which contains free, occupied, and unknown cells, where the robot is only permitted to pass through the free cells. The user will input the starting position and the goal position. In return, I will retrieve the sequence of free cells leading the robot from the starting position to the goal position, which corresponds to the path.
Since executing Dijkstra's algorithm from start to goal would give us a reverse path going from goal to start, I decided to execute Dijkstra's algorithm backwards, such that I retrieve the path from start to goal.
Starting from the goal cell, I have 8 neighbors whose cost is 1 horizontally and vertically and sqrt(2) diagonally, but only if the cells are reachable (i.e., not out of bounds and free).
Here are the rules that should be observed when updating the neighboring cells. The current cell can only assume its 8 neighboring cells to be reachable (at a distance of 1 or sqrt(2)) under the following conditions:
The neighboring cell is not out of bounds
The neighboring cell is unvisited.
The neighboring cell is a free cell which can be checked via the 2-D grid map.
Here is my implementation:
#include <opencv2/opencv.hpp>
#include <algorithm>
#include "Timer.h"
/// CONSTANTS
static const int UNKNOWN_CELL = 197;
static const int FREE_CELL = 255;
static const int OCCUPIED_CELL = 0;
/// STRUCTURES for easier management.
struct vertex {
cv::Point2i id_;
cv::Point2i from_;
vertex(cv::Point2i id, cv::Point2i from)
{
id_ = id;
from_ = from;
}
};
/// To be used for finding an element in std::multimap STL.
struct CompareID
{
CompareID(cv::Point2i val) : val_(val) {}
bool operator()(const std::pair<double, vertex> & elem) const {
return val_ == elem.second.id_;
}
private:
cv::Point2i val_;
};
/// Some helper functions for dijkstra's algorithm.
uint8_t get_cell_at(const cv::Mat & image, int x, int y)
{
assert(x < image.rows);
assert(y < image.cols);
return image.data[x * image.cols + y];
}
/// Some helper functions for dijkstra's algorithm.
bool checkIfNotOutOfBounds(cv::Point2i current, int rows, int cols)
{
return (current.x >= 0 && current.y >= 0 &&
current.x < cols && current.y < rows);
}
/// Brief: Finds the shortest possible path from starting position to the goal position
/// Param gridMap: The stage where the tracing of the shortest possible path will be performed.
/// Param start: The starting position in the gridMap. It is assumed that start cell is a free cell.
/// Param goal: The goal position in the gridMap. It is assumed that the goal cell is a free cell.
/// Param path: Returns the sequence of free cells leading to the goal starting from the starting cell.
bool findPathViaDijkstra(const cv::Mat& gridMap, cv::Point2i start, cv::Point2i goal, std::vector<cv::Point2i>& path)
{
// Clear the path just in case
path.clear();
// Create working and visited set.
std::multimap<double,vertex> working, visited;
// Initialize working set. We are going to perform the djikstra's
// backwards in order to get the actual path without reversing the path.
working.insert(std::make_pair(0, vertex(goal, goal)));
// Conditions in continuing
// 1.) Working is empty implies all nodes are visited.
// 2.) If the start is still not found in the working visited set.
// The Dijkstra's algorithm
while(!working.empty() && std::find_if(visited.begin(), visited.end(), CompareID(start)) == visited.end())
{
// Get the top of the STL.
// It is already given that the top of the multimap has the lowest cost.
std::pair<double, vertex> currentPair = *working.begin();
cv::Point2i current = currentPair.second.id_;
visited.insert(currentPair);
working.erase(working.begin());
// Check all arcs
// Only insert the cells into working under these 3 conditions:
// 1. The cell is not in visited cell
// 2. The cell is not out of bounds
// 3. The cell is free
for (int x = current.x-1; x <= current.x+1; x++)
for (int y = current.y-1; y <= current.y+1; y++)
{
if (checkIfNotOutOfBounds(cv::Point2i(x, y), gridMap.rows, gridMap.cols) &&
get_cell_at(gridMap, x, y) == FREE_CELL &&
std::find_if(visited.begin(), visited.end(), CompareID(cv::Point2i(x, y))) == visited.end())
{
vertex newVertex = vertex(cv::Point2i(x,y), current);
double cost = currentPair.first + sqrt(2);
// Cost is 1
if (x == current.x || y == current.y)
cost = currentPair.first + 1;
std::multimap<double, vertex>::iterator it =
std::find_if(working.begin(), working.end(), CompareID(cv::Point2i(x, y)));
if (it == working.end())
working.insert(std::make_pair(cost, newVertex));
else if(cost < (*it).first)
{
working.erase(it);
working.insert(std::make_pair(cost, newVertex));
}
}
}
}
// Now, recover the path.
// Path is valid!
if (std::find_if(visited.begin(), visited.end(), CompareID(start)) != visited.end())
{
std::pair <double, vertex> currentPair = *std::find_if(visited.begin(), visited.end(), CompareID(start));
path.push_back(currentPair.second.id_);
do
{
currentPair = *std::find_if(visited.begin(), visited.end(), CompareID(currentPair.second.from_));
path.push_back(currentPair.second.id_);
} while(currentPair.second.id_.x != goal.x || currentPair.second.id_.y != goal.y);
return true;
}
// Path is invalid!
else
return false;
}
int main()
{
// cv::Mat image = cv::imread("filteredmap1.jpg", CV_LOAD_IMAGE_GRAYSCALE);
cv::Mat image = cv::Mat(100,100,CV_8UC1);
std::vector<cv::Point2i> path;
for (int i = 0; i < image.rows; i++)
for(int j = 0; j < image.cols; j++)
{
image.data[i*image.cols+j] = FREE_CELL;
if (j == image.cols/2 && (i > 3 && i < image.rows - 3))
image.data[i*image.cols+j] = OCCUPIED_CELL;
// if (image.data[i*image.cols+j] > 215)
// image.data[i*image.cols+j] = FREE_CELL;
// else if(image.data[i*image.cols+j] < 100)
// image.data[i*image.cols+j] = OCCUPIED_CELL;
// else
// image.data[i*image.cols+j] = UNKNOWN_CELL;
}
// Goal is at the top right
cv::Point2i goal(image.cols-1, 0);
// Start is at the bottom left
cv::Point2i start(0, image.rows-1);
// Time the algorithm.
Timer timer;
timer.start();
findPathViaDijkstra(image, start, goal, path);
std::cerr << "Time elapsed: " << timer.getElapsedTimeInMilliSec() << " ms";
// Add the path in the image for visualization purpose.
cv::cvtColor(image, image, CV_GRAY2BGRA);
int cn = image.channels();
for (int i = 0; i < path.size(); i++)
{
image.data[path[i].x*cn*image.cols+path[i].y*cn+0] = 0;
image.data[path[i].x*cn*image.cols+path[i].y*cn+1] = 255;
image.data[path[i].x*cn*image.cols+path[i].y*cn+2] = 0;
}
cv::imshow("Map with path", image);
cv::waitKey();
return 0;
}
For the algorithm implementation, I decided to have two sets namely the visited and working set whose each elements contain:
The location of itself in the 2D grid map.
The accumulated cost
Through what cell did it get its accumulated cost (for path recovery)
And here is the result:
The black pixels represent obstacles, the white pixels represent free space and the green line represents the path computed.
In this implementation, I only search within the current working set for the minimum value and do NOT need to scan the whole cost matrix (where initially the cost of all cells is set to infinity and the starting point to 0). Maintaining a separate container for the working set, I think, promises better performance, because all the cells whose cost is infinity are surely not included in the working set, only the cells that have been touched.
I also took advantage of the STL which C++ provides. I decided to use std::multimap, since it can store duplicate keys (the cost) and it sorts itself automatically. However, I was forced to use std::find_if() to find the id (the row/col of the current cell) in the visited set in order to check whether the current cell is in it, which has linear complexity. I really think this is the bottleneck of my Dijkstra's implementation.
I am well aware that the A* algorithm is much faster than Dijkstra's algorithm, but what I wanted to ask is: is my implementation of Dijkstra's algorithm optimal? Even if I implemented the A* algorithm using my current Dijkstra's implementation, which I believe is suboptimal, the A* algorithm would consequently also be suboptimal.
What improvements can I make? Which STL container is the most appropriate for this algorithm? Particularly, how do I improve the bottleneck?
You're using a std::multimap for 'working' and 'visited'. That's not great.
The first thing you should do is change visited into a per-vertex flag so you can do your find_if in constant time instead of linear time, and also so that operations on the set of visited vertices take constant instead of logarithmic time. You know what all the vertices are and you can map them to small integers trivially, so you can use either a std::vector or a std::bitset.
The second thing you should do is turn working into a priority queue, rather than a balanced binary tree structure, so that operations are a (largish) constant factor faster. std::priority_queue is a barebones binary heap. A higher-radix heap---say quaternary for concreteness---will probably be faster on modern computers due to its reduced depth. Andrew Goldberg suggests some bucket-based data structures; I can dig up references for you if you get to that stage. (They're not too complicated.)
Once you've taken care of these two things, you might look at A* or meet-in-the-middle tricks to speed things up even more.
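A minimal sketch of those two suggestions applied to the grid map from the question (FREE_CELL follows the question's convention; everything else, including the function name, is illustrative rather than a drop-in replacement, and path recovery via a parent array is omitted):

#include <opencv2/opencv.hpp>
#include <queue>
#include <vector>
#include <functional>
#include <cmath>

// One entry of the priority queue: accumulated cost plus a flat cell index.
struct Node {
    double cost;
    int idx;
    bool operator>(const Node& other) const { return cost > other.cost; }
};

// Sketch: returns the cost from 'goal' back to 'start' on a CV_8UC1 grid
// (x = column, y = row), or -1 if unreachable. A parent-index array would
// be added to recover the actual path.
double dijkstraGrid(const cv::Mat& grid, cv::Point2i start, cv::Point2i goal)
{
    const int FREE_CELL = 255;                            // same convention as the question
    std::vector<char> visited(grid.rows * grid.cols, 0);  // per-vertex flag: O(1) lookup
    std::vector<double> dist(grid.rows * grid.cols, 1e18);
    std::priority_queue<Node, std::vector<Node>, std::greater<Node> > pq;

    int goalIdx = goal.y * grid.cols + goal.x;            // search backwards, as in the question
    dist[goalIdx] = 0.0;
    Node first = { 0.0, goalIdx };
    pq.push(first);

    while (!pq.empty()) {
        Node cur = pq.top();
        pq.pop();
        if (visited[cur.idx]) continue;                   // stale queue entry, skip it
        visited[cur.idx] = 1;
        int cx = cur.idx % grid.cols, cy = cur.idx / grid.cols;
        if (cx == start.x && cy == start.y) return cur.cost;

        for (int dy = -1; dy <= 1; dy++)
            for (int dx = -1; dx <= 1; dx++) {
                if (dx == 0 && dy == 0) continue;
                int nx = cx + dx, ny = cy + dy;
                if (nx < 0 || ny < 0 || nx >= grid.cols || ny >= grid.rows) continue;
                if (grid.at<uchar>(ny, nx) != FREE_CELL) continue;
                int ni = ny * grid.cols + nx;
                double step = (dx == 0 || dy == 0) ? 1.0 : std::sqrt(2.0);
                if (!visited[ni] && dist[ni] > cur.cost + step) {
                    dist[ni] = cur.cost + step;           // relax the edge
                    Node next = { dist[ni], ni };
                    pq.push(next);
                }
            }
    }
    return -1.0;                                          // start not reachable from goal
}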
Your performance is several orders of magnitude worse than it could be because you're using graph search algorithms for what looks like geometry. This geometry is much simpler and less general than the problems that graph search algorithms can solve. Also, with a vertex for every pixel your graph is huge even though it contains basically no information.
I heard you asking "how can I make this better without changing what I'm thinking" but nevertheless I'll tell you a completely different and better approach.
It looks like your robot can only go horizontally, vertically or diagonally. Is that for real or just a side effect of you choosing graph search algorithms? I'll assume the latter and let it go in any direction.
The algorithm goes like this:
(0) Represent your obstacles as polygons by listing the corners. Work in real numbers so you can make them as thin as you like.
(1) Try for a straight line between the end points.
(2) Check whether that line goes through an obstacle or not. To do that for any line, show that all corners of any particular obstacle lie on the same side of the line. To do that, translate all points by (-X,-Y) of one end of the line so that that point is at the origin, then rotate until the other point is on the X axis. Now all corners should have the same sign of Y if there's no obstruction. There might be a quicker way just using gradients; a cross-product sign test achieving the same thing is sketched after this list.
(3) If there's an obstruction, propose N two-segment paths going via the N corners of the obstacle.
(4) Recurse for all segments, culling any paths with segments that go out of bounds. That won't be a problem unless you have obstacles that go out of bounds.
(5) When it stops recursing, you should have a list of locally optimised paths from which you can choose the shortest.
(6) If you really want to restrict bearings to multiples of 45 degrees, then you can do this algorithm first and then replace each segment by any 45-only wiggly version that avoids obstacles. We know that such a version exists because you can stay extremely close to the original line by wiggling very often. We also know that all such wiggly paths have the same length.
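Here is a minimal sketch of the same-side test from step (2), done with cross-product signs instead of an explicit translate-and-rotate (the point type and helper names are illustrative only):

#include <vector>

struct Pt { double x, y; };

// Sign of the cross product (b - a) x (p - a):
// > 0 if p is left of the directed line a->b, < 0 if right, 0 if collinear.
double side(const Pt& a, const Pt& b, const Pt& p)
{
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// True if the segment a->b cannot intersect the obstacle because all of the
// obstacle's corners lie on one side of the (infinite) line through a and b.
// Note: this is the conservative test from step (2); a full segment/polygon
// intersection test would also consider the segment's endpoints.
bool obstacleClearOfLine(const Pt& a, const Pt& b, const std::vector<Pt>& corners)
{
    bool anyLeft = false, anyRight = false;
    for (size_t i = 0; i < corners.size(); i++) {
        double s = side(a, b, corners[i]);
        if (s > 0) anyLeft = true;
        if (s < 0) anyRight = true;
    }
    return !(anyLeft && anyRight); // corners on both sides -> possible obstruction
}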

Using time in OpenCV for frame processes and other tasks

I want to count the vehicles in a video. After frame differencing I get a grayscale image, or a kind of binary image. I have defined a region of interest to work on a specific area of the frames; the pixel values of the vehicles passing through the region of interest are higher than 0, or even higher than 40 or 50, because they are white.
My idea is that when a certain number of pixels are white within a specific interval of time (say 1-2 seconds), then there must be a vehicle passing, so I will increment the counter.
What I want is to check whether white pixels are still coming or not after 1-2 seconds. If no white pixels are coming, it means that the vehicle has passed and the next vehicle is going to come; this is when the counter must be incremented.
One method that came to my mind is to count the frames of the video and store the count in a variable called No_of_frames. Then, using that variable, I think I can estimate the time passed. If the value of No_of_frames is greater than, let's say, 20, it means that nearly 1 second has passed, given that my video's frame rate is 25-30 fps.
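A small sketch of that frame-counting idea (the video file name and the pixel thresholds are placeholders taken from the question, and the grayscale frame here stands in for the already frame-differenced ROI). The frame rate is read from the capture so the frame counter converts directly to seconds:

#include <opencv2/opencv.hpp>
#include <iostream>

int main()
{
    cv::VideoCapture cap("traffic2.mp4");      // placeholder video file
    if (!cap.isOpened()) return -1;

    double fps = cap.get(CV_CAP_PROP_FPS);     // frame rate of the video
    if (fps <= 0) fps = 25.0;                  // fall back to a typical rate

    cv::Mat frame, gray;
    int framesWithoutWhite = 0;
    bool vehiclePresent = false;
    int counter = 0;

    while (cap.read(frame))
    {
        // In the real application this would be the frame-differenced ROI.
        cv::cvtColor(frame, gray, CV_BGR2GRAY);
        // Same idea as the loop above: pixels brighter than 100 count as white.
        int whitePixels = cv::countNonZero(gray > 100);

        if (whitePixels > 100)                 // a vehicle is currently in the ROI
        {
            vehiclePresent = true;
            framesWithoutWhite = 0;
        }
        else if (vehiclePresent)
        {
            framesWithoutWhite++;
            // No white pixels for roughly one second -> the vehicle has passed.
            if (framesWithoutWhite >= (int)fps)
            {
                counter++;
                vehiclePresent = false;
            }
        }
    }
    std::cout << "Vehicles counted: " << counter << std::endl;
    return 0;
}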
I am using Qt Creator with windows 7 and OpenCV 2.3.1
My code is something like:
for(int i=0; i<matFrame.rows; i++)
{
for(int j=0;j<matFrame.cols;j++)
if (matFrame.at<uchar>(i,j)>100)//values of pixels greater than 100
//will be considered as white.
{
whitePixels++;
}
if ()// here I want to use time. The 'if' statement must be like:
//if (total_no._of_whitepixels>100 && no_white_pixel_came_after 2secs)
//which means that a vehicle has just passed so increment the counter.
{
counter++;
}
}
Any other idea for counting the vehicles, better than mine, will be most welcome. Thanks in advance.
For background segmentation I am using the following algorithm, but it is very slow; I don't know why. The whole code is as follows:
// opencv2/video/background_segm.hpp OPENCV header file must be included.
IplImage* tmp_frame = NULL;
CvCapture* cap = NULL;
bool update_bg_model = true;
Mat element = getStructuringElement( 0, Size( 2,2 ),Point() );
Mat eroded_frame;
Mat before_erode;
if( argc > 2 )
cap = cvCaptureFromCAM(0);
else
// cap = cvCreateFileCapture( "C:\\4.avi" );
cap = cvCreateFileCapture( "C:\\traffic2.mp4" );
if( !cap )
{
printf("can not open camera or video file\n");
return -1;
}
tmp_frame = cvQueryFrame(cap);
if(!tmp_frame)
{
printf("can not read data from the video source\n");
return -1;
}
cvNamedWindow("BackGround", 1);
cvNamedWindow("ForeGround", 1);
CvBGStatModel* bg_model = 0;
for( int fr = 1;tmp_frame; tmp_frame = cvQueryFrame(cap), fr++ )
{
if(!bg_model)
{
//create BG model
bg_model = cvCreateGaussianBGModel( tmp_frame );
// bg_model = cvCreateFGDStatModel( temp );
continue;
}
double t = (double)cvGetTickCount();
cvUpdateBGStatModel( tmp_frame, bg_model, update_bg_model ? -1 : 0 );
t = (double)cvGetTickCount() - t;
printf( "%d. %.1f\n", fr, t/(cvGetTickFrequency()*1000.) );
before_erode= bg_model->foreground;
cv::erode((Mat)bg_model->background, (Mat)bg_model->foreground, element );
//eroded_frame=bg_model->foreground;
//frame=(IplImage *)erode_frame.data;
cvShowImage("BackGround", bg_model->background);
cvShowImage("ForeGround", bg_model->foreground);
// cvShowImage("ForeGround", bg_model->foreground);
char k = cvWaitKey(5);
if( k == 27 ) break;
if( k == ' ' )
{
update_bg_model = !update_bg_model;
if(update_bg_model)
printf("Background update is on\n");
else
printf("Background update is off\n");
}
}
cvReleaseBGStatModel( &bg_model );
cvReleaseCapture(&cap);
return 0;
A great deal of research has been done on vehicle tracking and counting. The approach you describe appears to be quite fragile, and is unlikely to be robust or accurate. The main issue is using a count of pixels above a certain threshold, without regard for their spatial connectivity or temporal relation.
Frame differencing can be useful for separating a moving object from its background, provided the object of interest is the only (or largest) moving object.
What you really need is to first identify the object of interest, segment it from the background, and track it over time using an adaptive filter (such as a Kalman filter). Have a look at the OpenCV video reference. OpenCV provides background subtraction and object segmentation to do all the required steps.
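A rough sketch of that pipeline using the OpenCV 2.x C++ API (the MOG2 background subtractor, the video file name, and the minimum blob area of 500 pixels are illustrative choices, not tuned values):

#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>

int main()
{
    cv::VideoCapture cap("traffic2.mp4");          // placeholder video file
    if (!cap.isOpened()) return -1;

    // Adaptive background model: it learns the static road over time.
    cv::BackgroundSubtractorMOG2 bg;
    cv::Mat frame, fgMask;

    while (cap.read(frame))
    {
        bg(frame, fgMask);                         // update the model, get the foreground mask
        cv::erode(fgMask, fgMask, cv::Mat());      // remove small noise blobs
        cv::dilate(fgMask, fgMask, cv::Mat());

        // Each sufficiently large connected blob is a moving-object candidate
        // that could then be handed to a tracker (e.g. a Kalman filter).
        std::vector<std::vector<cv::Point> > contours;
        cv::Mat contourInput = fgMask.clone();     // findContours modifies its input
        cv::findContours(contourInput, contours, CV_RETR_EXTERNAL, CV_CHAIN_APPROX_SIMPLE);

        int blobs = 0;
        for (size_t i = 0; i < contours.size(); i++)
            if (cv::contourArea(contours[i]) > 500) blobs++;

        std::cout << "Moving blobs in this frame: " << blobs << std::endl;
        if (cv::waitKey(5) == 27) break;
    }
    return 0;
}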
I suggest you read up on OpenCV - Learning OpenCV is a great read. And also on more general computer vision algorithms and theory - http://homepages.inf.ed.ac.uk/rbf/CVonline/books.htm has a good list.
Normally they just put a small pneumatic pipe across the road (soft pipe semi filled with air). It is attached to a simple counter. Each vehicle passing over the pipe generates two pulses (first front, then rear wheels). The counter records the number of pulses in specified time intervals and divides by 2 to get the approx vehicle count.