C++: Segmentation fault on pthread_create

I'm relatively new to C in general and I'm trying to make a small image filter using pthreads. After a few hours of playing around with pointers and references, it gets through the compiler, but then I get a segmentation fault. The code is the following:
#include <iostream>
#include <opencv2/core.hpp>
#include <opencv2/imgcodecs.hpp>
using namespace std;
using namespace cv;
#define WIDTH 3
#define HEIGHT 4
#define NUM_THREADS 4
struct readThreadParams{
Mat img;
Mat out;
int yStart;
int xEnd;
int yEnd;
int xRad;
int yRad;
};
//Find average of all pixels in WXH area
uchar getAverage(Mat &img, Mat &out, const float x1, const float y1, const int xRad, const int yRad){
//x1, y1: Pixel position being checked. xRad, yRad: how many pixels are being checked in x and y, relative to starting point.
uchar blue;
uchar green;
uchar red;
Vec3b outColor;
for (int c = 0; c < xRad; c++){
for (int r = 0; r < yRad; r++){
Vec3b intensity = img.at<Vec3b>(r, c);
blue =+ intensity.val[0];
green =+ intensity.val[1];
red =+ intensity.val[2];
}
}
outColor[0] = (blue/(xRad*yRad*4));
outColor[1] = (green/(xRad*yRad*4));
outColor[2] = (red/(xRad*yRad*4));
for (int c = 0; c< xRad; c++){
for (int r = 0; r< yRad; r++)
out.at<Vec3b>(Point(c, r)) = outColor;
}
}
void* parallel_processing_task(void * param){
//This is what each thread should do:
struct readThreadParams *input = (struct readThreadParams*)param;
Mat img = input->img;
Mat out = input->out;
const float yStart = input->yStart;
const float xEnd = input->xEnd;
const float yEnd = input->yEnd;
const float xRad = input->xRad;
const float yRad = input->yRad;
for (int c = 0; c < xEnd; c + xRad){
for (int r=yStart; r < yEnd; r + yRad){
getAverage(img, out, c, r, xRad, yRad);
}
}
}
int main(int argc, char *argv[]){
//prepare variables
pthread_t threads[NUM_THREADS];
void* return_status;
struct readThreadParams input;
int t;
Mat img = imread("image.jpg", IMREAD_COLOR);
int ROWS = img.rows;
int COLS = img.cols;
Mat out(ROWS, COLS, CV_8UC3);
input.img = img;
input.out = out;
input.xEnd = COLS;
input.xRad = WIDTH;
input.yRad = HEIGHT;
double t2 = (double) getTickCount();
for (int r = 0; r<ROWS ; ceil(ROWS/NUM_THREADS)){
input.yStart = r;
input.yEnd = r + ceil(ROWS/NUM_THREADS);
pthread_create(&threads[t], NULL, parallel_processing_task, (void *)&input);
}
for(t=0; t<NUM_THREADS; t++){
pthread_join(threads[t], &return_status);
}
t2 = ((double) getTickCount() - t2) / getTickFrequency();
//print execution time
cout << "Execution time: " << t2 << " s" << endl;
//result image
imwrite("output.png", out);
return(0);
}
I used GDB to find the culprit and managed to get as far as finding out it's on line 107:
pthread_create(&threads[t], NULL, parallel_processing_task, (void *)&input);
At this point, I went looking all over the place for solutions and tried the following:
Changing the way I defined the struct so that it receives pointers, which I later found out didn't work.
Changing the way I pass arguments (such as adding or removing (void *) where it seemed proper), which ended up in a bigger mess of errors or simply the same error at the end.
Furthermore, being new to this language doesn't really help me out when trying to read the gdb bt output:
#0 __pthread_create_2_1(newthread=<optimized out>, attr=<optimized out>, start_routine=<optimized out>, arg=<optimized out>) at pthread_create.c:601
#1 0x00011a00 in main(argc=1, argv=0x7efff394) at file.cpp:107
A part of me wants to think the problem is related to the optimized out parts, but looking it up yields no results, or at least, I may not be looking properly.
Any thoughts as to what I may be doing wrong here? I would very much appreciate the help!

You have not initialised t before using it in
pthread_create(&threads[t], NULL, parallel_processing_task, (void *)&input);
t therefore holds an indeterminate value, so reading it is undefined behaviour, and &threads[t] may well point outside the threads array, which is the likely cause of the segmentation fault.
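Beyond the uninitialised t, the posted loop also hands the same input struct to every thread and never advances r (its third clause only evaluates ceil(ROWS/NUM_THREADS)). A minimal sketch of one way to avoid all three problems is to compute one parameter record per thread up front. The names RowSlice, makeRowSlices and worker are hypothetical and not part of the original code, and the thread launch itself is only indicated in a comment:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical per-thread parameters: each worker gets its own record,
// so later loop iterations cannot overwrite what an earlier thread reads.
struct RowSlice {
    int yStart;  // first row handled by this worker (inclusive)
    int yEnd;    // one past the last row handled by this worker
};

// Split `rows` image rows into at most `numThreads` contiguous slices.
std::vector<RowSlice> makeRowSlices(int rows, int numThreads) {
    std::vector<RowSlice> slices;
    if (rows <= 0 || numThreads <= 0) return slices;
    // Integer ceiling division. ceil(ROWS/NUM_THREADS) on two ints
    // truncates before ceil() ever sees the value.
    const int step = (rows + numThreads - 1) / numThreads;
    for (int t = 0; t < numThreads; ++t) {
        const int start = t * step;
        if (start >= rows) break;  // tiny images may need fewer slices than threads
        slices.push_back({start, std::min(start + step, rows)});
    }
    return slices;
}

// In main(), the loop itself then initialises the thread index, e.g.
//   for (size_t t = 0; t < slices.size(); ++t)
//       pthread_create(&threads[t], NULL, worker, &slices[t]);
// and only slices.size() threads are joined afterwards.
```

For example, makeRowSlices(10, 4) yields the row ranges [0,3), [3,6), [6,9) and [9,10), so the last thread simply gets a shorter slice.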

Related

problem with sending a float number in a stream in vivado_hls

I am trying to do a simple image processing filter where the pixel values will be divided by half to reduce the intensity and I am trying to develop the hardware for the same. hence I am using vivado hls to generate the IP. As explained here https://forums.xilinx.com/t5/High-Level-Synthesis-HLS/Float-numbers-with-hls-stream/m-p/942747 to send floating numbers in a hls stream , an union needs to be used and I did the same. However, the results don't seem to be matching for the red and green components of the image whereas it is matching for the blue component of the image. It is a very simple algorithm where a pixel value will be divided by half.
I have been trying to resolve it but I am not able to see where the problem is. I have attached all the files below; can someone help me resolve it?
////header file
#include "ap_fixed.h"
#include "hls_stream.h"
typedef union {
unsigned int i;
float r;
float g;
float b;
} conv;
typedef hls::stream <unsigned int> Stream_t;
void ftest(Stream_t& Sin,Stream_t& Sout);
////testbench
#include "stream_check_h.hpp"
int main()
{
Mat img_rev = imread("C:/Users/20181217/Desktop/images/imgs/output_fwd_v3.png");//(256x512)
Mat final_img(img_rev.rows,img_rev.cols,CV_8UC3);
Mat ref_img(img_rev.rows,img_rev.cols,CV_8UC3);
Stream_t S1,S2;
int err_r = 0;
int err_g = 0;
int err_b = 0;
for(int i=0;i<256;i++)
{
for(int j=0;j<512;j++)
{
conv c;
c.r = (float)img_rev.at<Vec3b>(i,j)[0];
c.g = (float)img_rev.at<Vec3b>(i,j)[1];
c.b = (float)img_rev.at<Vec3b>(i,j)[2];
S1 << c.i;
}
}
ftest(S1,S2);
conv c;
for(int i=0;i<256;i++)
{
for(int j=0;j<512;j++)
{
S2 >> c.i;
final_img.at<Vec3b>(i,j)[0]=(unsigned char)c.r;
final_img.at<Vec3b>(i,j)[1]=(unsigned char)c.g;
final_img.at<Vec3b>(i,j)[2]=(unsigned char)c.b;
ref_img.at<Vec3b>(i,j)[0] = (unsigned char)(((float)img_rev.at<Vec3b>(i,j)[0])/2.0);
ref_img.at<Vec3b>(i,j)[1] = (unsigned char)(((float)img_rev.at<Vec3b>(i,j)[1])/2.0);
ref_img.at<Vec3b>(i,j)[2] = (unsigned char)(((float)img_rev.at<Vec3b>(i,j)[2])/2.0);
}
}
Mat diff;
cout<<diff;
diff= abs(final_img-ref_img);
for(int i=0;i<256;i++)
{
for(int j=0;j<512;j++)
{
if((int)diff.at<Vec3b>(i,j)[0] > 0)
{
err_r++;
cout<<"expected value: "<<(int)ref_img.at<Vec3b>(i,j)[0]<<", final_value: "<<(int)final_img.at<Vec3b>(i,j)[0]<<", actual value:"<<(int)img_rev.at<Vec3b>(i,j)[0]<<endl;
}
if((int)diff.at<Vec3b>(i,j)[1] > 0)
err_g++;
if((int)diff.at<Vec3b>(i,j)[2] > 0)
err_b++;
}
}
cout<<"number of errors: "<<err_r<<", "<<err_g<<", "<<err_b;
return 0;
}
////core
#include "stream_check_h.hpp"
void ftest(Stream_t& Sin,Stream_t& Sout)
{
conv cin,cout;
for(int i=0;i<256;i++)
{
for(int j=0;j<512;j++)
{
Sin >> cin.i;
cout.r = cin.r/2.0 ;
cout.g = cin.g/2.0 ;
cout.b = cin.b/2.0 ;
Sout << cout.i;
}
}
}
When I debugged, it showed that the blue components of the pixels match. For one red pixel it showed me the following:
expected value: 22, final_value: 14, actual value:45
and the total errors for red, green, and blue are:
number of errors: 126773, 131072, 0
I am not able to see why it is going wrong for red and green. I posted here hoping a fresh set of eyes would help my problem.
Thanks in advance
I'm assuming you're using a 32-bit-wide stream carrying RGB pixels with 8-bit unsigned channels (CV_8UC3). I believe the problem with the union in your case is that its three float members overlap (unlike the single float value in the example you cite): r, g and b all occupy the same 32 bits, so each assignment overwrites the previous one and only the last value written survives. This means that by doing the division, you're actually doing it over the whole 32-bit word you're receiving.
A possible workaround I quickly came up with would be to convert the unsigned int you're getting from the stream into an ap_uint<32> type, then chop it into the R, G, B chunks (with the range() method) and divide. Finally, assemble the result and stream it back.
unsigned int packet;
Sin >> packet;
ap_uint<32> packet_uint32 = packet; // ap_uint<32> can be constructed directly from an unsigned int
ap_uint<8> b = packet_uint32.range(7, 0); // unsigned 8-bit channels: ap_int<8> would read values above 127 as negative
ap_uint<8> g = packet_uint32.range(15, 8);
ap_uint<8> r = packet_uint32.range(23, 16); // In case they are in the wrong bit range/order, just flip the r, g, b assignments
b /= 2;
g /= 2;
r /= 2;
packet_uint32.range(7, 0) = b;
packet_uint32.range(15, 8) = g;
packet_uint32.range(23, 16) = r;
packet = packet_uint32.to_int();
Sout << packet;
NOTE: I've reused the same variables in the code above: HLS shouldn't complain about it and should produce good RTL anyway. If it does complain, just create new ones.
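To check the unpack, divide and repack idea on the host before synthesis, the same arithmetic can be sketched in plain C++ with shifts and masks instead of ap_uint. The function name halveChannels and the 0x00RRGGBB byte layout are assumptions for illustration only. Swap the shifts if your stream packs the channels differently:

```cpp
#include <cstdint>

// Halve each 8-bit channel of a pixel packed as 0x00RRGGBB (assumed layout).
// The top byte is passed through unchanged.
std::uint32_t halveChannels(std::uint32_t packet) {
    const std::uint32_t b = packet & 0xFFu;          // bits 7..0
    const std::uint32_t g = (packet >> 8) & 0xFFu;   // bits 15..8
    const std::uint32_t r = (packet >> 16) & 0xFFu;  // bits 23..16
    // Unsigned integer division truncates, matching the (unsigned char)
    // cast used for the reference image in the testbench.
    return (packet & 0xFF000000u) | ((r / 2) << 16) | ((g / 2) << 8) | (b / 2);
}
```

With this layout a channel value of 45 comes back as 22, which is the expected value printed by the testbench in the question.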

CUDA Speed Slower than expected - Image Processing

I am new to CUDA development and wanted to write a simple benchmark to test some image processing feasibility. I have 32 images that are each 720x540, one byte per pixel greyscale.
I am running benchmarks for 10 seconds, and counting how many times they are able to process. There are three benchmarks I am running:
The first is just transferring the images into the GPU global memory, via cudaMemcpy
The second is transferring and processing the images.
The third is running the equivalent test on a CPU.
For a starting, simple test, the image processing is just counting the number of pixels above a certain greyscale value. I'm finding that accessing global memory on the GPU is very slow. I have my benchmark structured such that it creates one block per image, and one thread per row in each image. Each thread counts its pixels into a shared memory array, after which the first thread sums them up (See below).
The issue I am having is that this all runs very slowly - about 50fps. Much slower than a CPU version - about 230fps. If I comment out the pixel value comparison, resulting in just a count of all pixels, I get 6x the performance. I tried using texture memory but didn't see a performance gain. I am running a Quadro K2000. Also: the image copy only benchmark is able to copy at around 330fps, so that doesn't appear to be the issue.
Any help / pointers would be appreciated. Thank you.
__global__ void ThreadPerRowCounter(int Threshold, int W, int H, U8 **AllPixels, int *AllReturns)
{
extern __shared__ int row_counts[];//this parameter to kernel call "<<<, ,>>>" sets the size
//see here for indexing https://blog.usejournal.com/cuda-thread-indexing-fb9910cba084
int myImage = blockIdx.y * gridDim.x + blockIdx.x;
int myStartRow = (threadIdx.y * blockDim.x + threadIdx.x);
unsigned char *imageStart = AllPixels[myImage];
unsigned char *pixelStart = imageStart + myStartRow * W;
unsigned char *pixelEnd = pixelStart + W;
unsigned char *pixelItr = pixelStart;
int row_count = 0;
while(pixelItr < pixelEnd)
{
if (*pixelItr > Threshold) //REMOVING THIS LINE GIVES 6x PERFORMANCE
{
row_count++;
}
pixelItr++;
}
row_counts[myStartRow] = row_count;
__syncthreads();
if (myStartRow == 0)
{//first thread sums up for the whole image
int image_count = 0;
for (int i = 0; i < H; i++)
{
image_count += row_counts[i];
}
AllReturns[myImage] = image_count;
}
}
extern "C" void cuda_Benchmark(int nImages, int W, int H, U8** AllPixels, int *AllReturns, int Threshold)
{
ThreadPerRowCounter<<<nImages, H, sizeof(int)*H>>> (
Threshold,
W, H,
AllPixels,
AllReturns);
//wait for all blocks to finish
checkCudaErrors(cudaDeviceSynchronize());
}
Two changes to your kernel design can result in a significant speedup:
Perform the operations column-wise instead of row-wise. The general background for why this matters/helps is described here.
Replace your final operation with a canonical parallel reduction.
According to my testing, those 2 changes result in ~22x speedup in kernel performance:
$ cat t49.cu
#include <iostream>
#include <helper_cuda.h>
typedef unsigned char U8;
__global__ void ThreadPerRowCounter(int Threshold, int W, int H, U8 **AllPixels, int *AllReturns)
{
extern __shared__ int row_counts[];//this parameter to kernel call "<<<, ,>>>" sets the size
//see here for indexing https://blog.usejournal.com/cuda-thread-indexing-fb9910cba084
int myImage = blockIdx.y * gridDim.x + blockIdx.x;
int myStartRow = (threadIdx.y * blockDim.x + threadIdx.x);
unsigned char *imageStart = AllPixels[myImage];
unsigned char *pixelStart = imageStart + myStartRow * W;
unsigned char *pixelEnd = pixelStart + W;
unsigned char *pixelItr = pixelStart;
int row_count = 0;
while(pixelItr < pixelEnd)
{
if (*pixelItr > Threshold) //REMOVING THIS LINE GIVES 6x PERFORMANCE
{
row_count++;
}
pixelItr++;
}
row_counts[myStartRow] = row_count;
__syncthreads();
if (myStartRow == 0)
{//first thread sums up for the whole image
int image_count = 0;
for (int i = 0; i < H; i++)
{
image_count += row_counts[i];
}
AllReturns[myImage] = image_count;
}
}
__global__ void ThreadPerColCounter(int Threshold, int W, int H, U8 **AllPixels, int *AllReturns, int rsize)
{
extern __shared__ int col_counts[];//this parameter to kernel call "<<<, ,>>>" sets the size
int myImage = blockIdx.y * gridDim.x + blockIdx.x;
unsigned char *imageStart = AllPixels[myImage];
int myStartCol = (threadIdx.y * blockDim.x + threadIdx.x);
int col_count = 0;
for (int i = 0; i < H; i++) if (imageStart[myStartCol+i*W]> Threshold) col_count++;
col_counts[threadIdx.x] = col_count;
__syncthreads();
for (int i = rsize; i > 0; i>>=1){
if ((threadIdx.x+i < W) && (threadIdx.x < i)) col_counts[threadIdx.x] += col_counts[threadIdx.x+i];
__syncthreads();}
if (!threadIdx.x) AllReturns[myImage] = col_counts[0];
}
void cuda_Benchmark(int nImages, int W, int H, U8** AllPixels, int *AllReturns, int Threshold)
{
ThreadPerRowCounter<<<nImages, H, sizeof(int)*H>>> (
Threshold,
W, H,
AllPixels,
AllReturns);
//wait for all blocks to finish
checkCudaErrors(cudaDeviceSynchronize());
}
unsigned next_power_of_2(unsigned v){
v--;
v |= v >> 1;
v |= v >> 2;
v |= v >> 4;
v |= v >> 8;
v |= v >> 16;
v++;
return v;}
void cuda_Benchmark1(int nImages, int W, int H, U8** AllPixels, int *AllReturns, int Threshold)
{
int rsize = next_power_of_2((W+1)/2); // round W/2 up so that widths of the form 2^k+1 are fully reduced
ThreadPerColCounter<<<nImages, W, sizeof(int)*W>>> (
Threshold,
W, H,
AllPixels,
AllReturns, rsize);
//wait for all blocks to finish
checkCudaErrors(cudaDeviceSynchronize());
}
int main(){
const int my_W = 720;
const int my_H = 540;
const int n_img = 128;
const int my_thresh = 10;
U8 **img_p, **img_ph;
U8 *img, *img_h;
int *res, *res_h, *res_h1;
img_ph = (U8 **)malloc(n_img*sizeof(U8*));
cudaMalloc(&img_p, n_img*sizeof(U8*));
cudaMalloc(&img, n_img*my_W*my_H*sizeof(U8));
img_h = new U8[n_img*my_W*my_H];
for (int i = 0; i < n_img*my_W*my_H; i++) img_h[i] = rand()%20;
cudaMemcpy(img, img_h, n_img*my_W*my_H*sizeof(U8), cudaMemcpyHostToDevice);
for (int i = 0; i < n_img; i++) img_ph[i] = img+my_W*my_H*i;
cudaMemcpy(img_p, img_ph, n_img*sizeof(U8*), cudaMemcpyHostToDevice);
cudaMalloc(&res, n_img*sizeof(int));
cuda_Benchmark(n_img, my_W, my_H, img_p, res, my_thresh);
res_h = new int[n_img];
cudaMemcpy(res_h, res, n_img*sizeof(int), cudaMemcpyDeviceToHost);
cuda_Benchmark1(n_img, my_W, my_H, img_p, res, my_thresh);
res_h1 = new int[n_img];
cudaMemcpy(res_h1, res, n_img*sizeof(int), cudaMemcpyDeviceToHost);
for (int i = 0; i < n_img; i++) if (res_h[i] != res_h1[i]) {std::cout << "mismatch at: " << i << " was: " << res_h1[i] << " should be: " << res_h[i] << std::endl; return 0;}
}
$ nvcc -o t49 t49.cu -I/usr/local/cuda/samples/common/inc
$ cuda-memcheck ./t49
========= CUDA-MEMCHECK
========= ERROR SUMMARY: 0 errors
$ nvprof ./t49
==1756== NVPROF is profiling process 1756, command: ./t49
==1756== Profiling application: ./t49
==1756== Profiling result:
            Type  Time(%)      Time  Calls       Avg       Min       Max  Name
 GPU activities:   72.02%  54.325ms      1  54.325ms  54.325ms  54.325ms  ThreadPerRowCounter(int, int, int, unsigned char**, int*)
                   24.71%  18.639ms      2  9.3195ms  1.2800us  18.638ms  [CUDA memcpy HtoD]
                    3.26%  2.4586ms      1  2.4586ms  2.4586ms  2.4586ms  ThreadPerColCounter(int, int, int, unsigned char**, int*, int)
                    0.00%  3.1040us      2  1.5520us  1.5360us  1.5680us  [CUDA memcpy DtoH]
      API calls:   43.63%  59.427ms      3  19.809ms  18.514us  59.159ms  cudaMalloc
                   41.70%  56.789ms      2  28.394ms  2.4619ms  54.327ms  cudaDeviceSynchronize
                   14.02%  19.100ms      4  4.7749ms  17.749us  18.985ms  cudaMemcpy
                    0.52%  705.26us     96  7.3460us     203ns  327.21us  cuDeviceGetAttribute
                    0.05%  69.268us      1  69.268us  69.268us  69.268us  cuDeviceTotalMem
                    0.04%  50.688us      1  50.688us  50.688us  50.688us  cuDeviceGetName
                    0.04%  47.683us      2  23.841us  14.352us  33.331us  cudaLaunchKernel
                    0.00%  3.1770us      1  3.1770us  3.1770us  3.1770us  cuDeviceGetPCIBusId
                    0.00%  1.5610us      3     520ns     249ns     824ns  cuDeviceGetCount
                    0.00%  1.0550us      2     527ns     266ns     789ns  cuDeviceGet
$
(Quadro K2000, CUDA 9.2.148, Fedora Core 27)
(The next_power_of_2 code is lifted from this answer)
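The shared-memory reduction in ThreadPerColCounter can be hard to follow because the block width (720) is not a power of two. As a rough aid, here is a host-side C++ sketch that replays the same stride-halving loop sequentially. It is not CUDA code and says nothing about performance. The names nextPowerOf2 and simulateBlockReduction are made up for this illustration, and the start stride is computed from (W+1)/2 so that widths of the form 2^k+1 are also covered:

```cpp
#include <cstddef>
#include <vector>

// Same bit trick as next_power_of_2 in the listing (valid for 1 <= v <= 2^31).
unsigned nextPowerOf2(unsigned v) {
    v--;
    v |= v >> 1; v |= v >> 2; v |= v >> 4; v |= v >> 8; v |= v >> 16;
    return v + 1;
}

// Replays the tree reduction on a copy of the per-column counts.
// counts[tid] plays the role of col_counts[threadIdx.x].
int simulateBlockReduction(std::vector<int> counts) {
    const std::size_t w = counts.size();
    if (w == 0) return 0;
    for (std::size_t i = nextPowerOf2(static_cast<unsigned>((w + 1) / 2)); i > 0; i >>= 1) {
        // One pass of this inner loop stands in for all threads of the block
        // executing a single step between two __syncthreads() calls.
        // Writes go to indices < i and reads come from indices >= i,
        // so the order of the "threads" within a step does not matter.
        for (std::size_t tid = 0; tid < i; ++tid) {
            if (tid + i < w) counts[tid] += counts[tid + i];
        }
    }
    return counts[0];  // thread 0 ends up holding the total
}
```

For 720 columns the first stride is 512, so only the first 208 "threads" add a partner in that step. From stride 256 downwards it behaves like an ordinary power-of-two reduction.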
I don't claim correctness for this code or any other code that I post. Anyone using any code I post does so at their own risk. I merely claim that I have attempted to address the questions in the original posting, and provide some explanation thereof. I am not claiming my code is defect-free, or that it is suitable for any particular purpose. Use it (or not) at your own risk.

Average Filter with OpenCV

I'm trying to implement an Averaging filter with a 5x5 kernel, although there is a function within OpenCV for this, I need to do it without it.
Something is wrong and I suspect the uchar variables, but I have also tried int, float and double and the resulting image is still not correct. I use an image with a padding of 7.
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/opencv.hpp>
#include "filter.h"
#include <iostream>
#include <fstream>
using namespace std;
using namespace cv;
cv::Mat filter::mean_filter(cv::Mat& image_in){
int centro = 7;
float total = 0.0;
double window[25];
double mean= 0.0;
int final=0;
int nlines, ncols;
cv::Mat kernel = cv::Mat::ones(5, 5, CV_32S);
nlines=image_in.size().height;
ncols=image_in.size().width;
cv::Mat image_out = cv::Mat::zeros(nlines,ncols,CV_32S);
for (unsigned int j=centro; j<nlines - centro; j++){
for (unsigned int z=centro; z<ncols - centro; z++){
window[0]=image_in.at<uchar>(j-2,z-2);
window[1]=image_in.at<uchar>(j-1,z-2);
window[2]=image_in.at<uchar>(j ,z-2);
window[3]=image_in.at<uchar>(j+1,z-2);
window[4]=image_in.at<uchar>(j+2,z-2);
window[5]=image_in.at<uchar>(j-2,z-1);
window[6]=image_in.at<uchar>(j-1,z-1);
window[7]=image_in.at<uchar>(j ,z-1);
window[8]=image_in.at<uchar>(j+1,z-1);
window[9]=image_in.at<uchar>(j+2,z-1);
window[10]=image_in.at<uchar>(j-2,z);
window[11]=image_in.at<uchar>(j-1,z);
window[12]=image_in.at<uchar>(j ,z);
window[13]=image_in.at<uchar>(j+1,z);
window[14]=image_in.at<uchar>(j+2,z);
window[15]=image_in.at<uchar>(j-2,z+2);
window[16]=image_in.at<uchar>(j-1,z+2);
window[17]=image_in.at<uchar>(j ,z+2);
window[18]=image_in.at<uchar>(j+1,z+2);
window[19]=image_in.at<uchar>(j+2,z+2);
window[20]=image_in.at<uchar>(j-2,z+1);
window[21]=image_in.at<uchar>(j-1,z+1);
window[22]=image_in.at<uchar>(j ,z+1);
window[23]=image_in.at<uchar>(j+1,z+1);
window[24]=image_in.at<uchar>(j+2,z+1);
mean=0.0;
final=0;
for (unsigned int k=0; k<25; k++){
mean+=window[k];
}
mean=mean/25;
final=round(mean);
image_out.at<int>(j,z)=final;
}
}
return image_out;
}
I changed your code a bit and have a working solution. It is quite a primitive approach, but it works.
Possible improvements could be to reuse some of the already accumulated pixel-values by tracking which pixels leave the kernel area and which pixels enter it.
Another possibility for improvement is to parallelise the loop over the image.
cv::Mat mean_filter(cv::Mat& image_in, int kernel)
{
// Make sure you get a grayscale image.
assert(image_in.type() == CV_8UC1);
// Make sure your kernel size is an odd number
assert(kernel % 2 == 1);
// Make sure your kernel size is at least 1
assert(kernel >= 1);
// for padding calculate the border needed
int padding = (kernel - 1) / 2;
double mean = 0.0; // accumulate in floating point so the division below is not truncated before rounding
int final = 0;
int nlines, ncols;
cv::Mat img_temp;
nlines = image_in.size().height;
ncols = image_in.size().width;
// Make proper padding. Here it is done with 0. Padding means adding a border to the image so that applying the filter mask does not crop it.
copyMakeBorder(image_in, img_temp, padding, padding, padding, padding, BORDER_CONSTANT, 0);
// allocate the output image as grayscale as the input is grayscale as well
cv::Mat image_out = cv::Mat::zeros(nlines, ncols, CV_8UC1);
// loop over whole image
for (unsigned int j = padding; j<nlines + padding; j++){
for (unsigned int z = padding; z<ncols + padding; z++){
mean = 0.0;
// loop over kernel area
for (int x = -padding; x <= padding; x++){
for (int y = -padding; y <= padding; y++){
// accumulate all pixel-values
mean += img_temp.at<uchar>(j + x, z + y);
}
}
mean = mean / (kernel * kernel);
final = round(mean);
// cast result to uchar and set pixel in output image
image_out.at<uchar>(j - padding, z - padding) = (uchar)final;
}
}
return image_out;
}
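The improvement mentioned above, reusing the accumulated values by tracking which pixels leave and enter the kernel area, can be sketched in one dimension without OpenCV. This is only an illustration under stated assumptions: slidingMean is a hypothetical helper, it works on a single row stored in a std::vector, and it uses the same zero padding as the function above. A 2-D box filter can apply the same idea along rows and then along columns.

```cpp
#include <cstddef>
#include <vector>

// 1-D box filter with zero padding that keeps a running sum instead of
// re-adding all `kernel` values for every output pixel.
std::vector<unsigned char> slidingMean(const std::vector<unsigned char>& row, int kernel) {
    const int n = static_cast<int>(row.size());
    std::vector<unsigned char> out(row.size());
    if (n == 0 || kernel < 1 || kernel % 2 == 0) return out;  // zeros for invalid input
    const int pad = (kernel - 1) / 2;
    // Pixel value at index i, treating everything outside the row as 0.
    auto at = [&](int i) { return (i >= 0 && i < n) ? static_cast<int>(row[i]) : 0; };
    int sum = 0;
    for (int i = -pad; i <= pad; ++i) sum += at(i);  // window centred on index 0
    for (int x = 0; x < n; ++x) {
        out[x] = static_cast<unsigned char>((sum + kernel / 2) / kernel);  // rounded mean
        sum += at(x + pad + 1) - at(x - pad);  // add the entering pixel, drop the leaving one
    }
    return out;
}
```

Each output pixel then costs one addition and one subtraction regardless of the kernel size, instead of kernel additions.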

calling c function from MATLAB?

I want to call a C function from MATLAB, so I tried writing a wrapper function using MEX. While compiling I am getting
error C2109: subscript requires array or pointer type
and error C2440: 'function' : cannot convert from 'double *' to 'double'
Can anyone help me find where I made the mistake?
#include "mex.h"
#include "matrix.h"
#include "CVIPtoolkit.h"
#include "CVIPtools.h"
#include "CVIPmatrix.h"
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
void midd(double outdata, int type, int height, int width){
Image *outputImage;
byte **output;
int r,c;
mexPrintf("type %d\n", type);
mexPrintf("height %d\n", height);
mexPrintf("width %d\n", width);
outputImage=new_Image (PGM, GRAY_SCALE, 0, height, width, CVIP_BYTE, REAL );
outputImage = h_image(type, height,width);
output = getData_Image(outputImage, 0);
for(r=0; r < height; r++) {
for(c=0; c < width; c++)
{
mexPrintf("type %d\n", type);
mexPrintf("height %d\n", height);
mexPrintf("width %d\n", width);
outdata[r+height*c+height*width] =output[r][c]; /* passing data back to MATLAB variable from CVIPtools variable */
}
}
}
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
double *outdata;
int type, height, width;
// double *indata = (double *)mxGetData(prhs[0]);
type = mxGetScalar(prhs[0]);
height = mxGetScalar(prhs[1]);
width = mxGetScalar(prhs[2]);
mexPrintf("type %d\n", type);
mexPrintf("height %d\n", height);
mexPrintf("width %d\n", width);
plhs[0] = mxCreateDoubleMatrix(height,width,mxREAL);
outdata = mxGetData(plhs[0]);
midd(outdata, type, height, width);
}
The C function I am trying to call is as follows:
#include "CVIPtoolkit.h"
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
Image *
h_image(int type, unsigned int height, unsigned int width){
/* type = 1, Constant
* type = 2, Fixed mask
* type = 3, Gaussian
*/
unsigned int r, c, hf_w = width/2, hf_h = height/2;
Image *outimage;
float **outdata, sum = 0.0, sigma, tmp1, tmp2, tmp;
if (height < 3 || width < 3) {
fprintf(stderr, "Masksize too small, at least 3x3\n");
return (Image *)NULL;
}
outimage = new_Image(PGM, GRAY_SCALE, 1, height, width, CVIP_FLOAT, REAL);
outdata = (float **)getData_Image(outimage, 0);
switch (type) {
case 1:
for (r = 0; r < height; r++)
for (c = 0; c < width; c++) {
outdata[r][c] = 1.0;
sum += outdata[r][c];
}
break;
case 2:
for (r = 0; r < height; r++)
for (c = 0; c < width; c++) {
outdata[r][c] = 1.0;
sum += outdata[r][c];
}
outdata[height/2][width/2] = height * width;
sum = sum - 1.0 + outdata[height/2][width/2];
break;
case 3:
c = (width + height) /4;
r = (width + height) /2;
sigma = sqrt(c*c / (2 * log(2) + (r - 3) * log(3)));
sigma = 1.0 / 2.0 /sigma/sigma;
tmp = width * height;
for (r = 0; r < height; r++)
for (c = 0; c < width; c++) {
tmp1 = (r-hf_h)*(r-hf_h); tmp2 = (c-hf_w)*(c-hf_w);
outdata[r][c] = tmp*exp(- (tmp1 + tmp2) * sigma);
sum += outdata[r][c];
}
break;
default:
fprintf(stderr, "Incorrect mask type number: %d\n", type);
return (Image *)NULL;
}
return outimage;
}
In your main function, outdata is a pointer to a double, yet your function midd takes in an actual double itself. That's why you're getting that error in type.
Simply change your function declaration so that the first input accepts a pointer to a double:
void midd(double *outdata, int type, int height, int width)
// ^^^^^^^^
Minor Note
I question your copying of your image data back to a MEX array here:
outdata[r+height*c+height*width] =output[r][c];
You don't need height*width as the offset. r + height*c is enough to access a single channel 2D matrix in column-major order. You only need to offset by height*width if you have a multi-channel image. That offset allows you to access image data in other channels... and since you only have single channel data (it looks like so...), this offset isn't required.
Therefore, you simply need to do:
outdata[r + height*c] = output[r][c];
If you don't do this, I suspect you will eventually get segmentation faults because you'll eventually access parts of memory you aren't allowed to access.
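To make the column-major indexing concrete, here is a small standalone sketch in C++ (not MEX code). toColumnMajor is a hypothetical helper that copies a row-major image into a flat buffer laid out the way MATLAB stores a single-channel matrix:

```cpp
#include <cstddef>
#include <vector>

// Copy a row-major image (rows[r][c]) into a column-major buffer,
// i.e. element (r, c) lands at index r + height * c.
std::vector<double> toColumnMajor(const std::vector<std::vector<unsigned char>>& rows) {
    const std::size_t height = rows.size();
    const std::size_t width = height ? rows[0].size() : 0;
    std::vector<double> out(height * width);  // exactly height*width elements, no extra offset
    for (std::size_t r = 0; r < height; ++r)
        for (std::size_t c = 0; c < width; ++c)
            out[r + height * c] = rows[r][c];
    return out;
}
```

For the 2x3 input {{1,2,3},{4,5,6}} the buffer becomes 1, 4, 2, 5, 3, 6: each column is stored contiguously. The largest index used is height*width - 1, whereas adding height*width to every index would write entirely past the end of the buffer.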
Also, once you fully test your code, get rid of the mexPrintf statements. It's going to unnecessarily flood your Command Prompt with print messages since you have it inside a nested for loop. I suspect you did this for debugging, and that's perfectly fine, but I would suggest you attach the MEX function to an actual debugger and debug your code properly instead of the print statements.
See my post on how to get that set up here: Preventing a MEX file from crashing in MATLAB

Preparing for OCR OpenCV

I am making an application that uses OCR, and I am using OpenCV to threshold the image to improve the OCR results. I have gotten pretty good results, but I want to know if anyone has any suggestions for improvement.
Here is what I've done so far:
// Convert to grayscale.
cv::cvtColor(cvMat, cvMat, CV_RGB2GRAY);
// Apply adaptive threshold.
cv::adaptiveThreshold(cvMat, cvMat, 255, CV_ADAPTIVE_THRESH_GAUSSIAN_C, CV_THRESH_BINARY, 3, 5);
// Attempt to sharpen the image.
cv::GaussianBlur(cvMat, cvMat, cv::Size(0, 0), 3);
cv::addWeighted(cvMat, 1.5, cvMat, -0.5, 0, cvMat);
Let me know if you have any suggestions to improve results, thanks.
Sample Images:
After:
One of the best algorithms for the thresholding problem in the OCR field is the Sauvola method. You can use the code below.
#ifndef _THRESHOLDER
#define _THRESHOLDER
#include <cv.h>
#include "type.h"
using namespace cv;
enum class BhThresholdMethod{OTSU,NIBLACK,SAUVOLA,WOLFJOLION};
class BhThresholder
{
public :
void doThreshold(InputArray src ,OutputArray dst,const BhThresholdMethod &method);
private:
};
#endif //_THRESHOLDER
thresholder.cpp
#include "stdafx.h"
#define uget(x,y) at<unsigned char>(y,x)
#define uset(x,y,v) at<unsigned char>(y,x)=v;
#define fget(x,y) at<float>(y,x)
#define fset(x,y,v) at<float>(y,x)=v;
// *************************************************************
// glide a window across the image and
// create two maps: mean and standard deviation.
// *************************************************************
//#define BINARIZEWOLF_VERSION "2.3 (February 26th, 2013)"
double calcLocalStats (Mat &im, Mat &map_m, Mat &map_s, int win_x, int win_y) {
double m,s,max_s, sum, sum_sq, foo;
int wxh = win_x / 2;
int wyh = win_y / 2;
int x_firstth = wxh;
int y_lastth = im.rows-wyh-1;
int y_firstth= wyh;
double winarea = win_x*win_y;
max_s = 0;
for (int j = y_firstth ; j<=y_lastth; j++)
{
// Calculate the initial window at the beginning of the line
sum = sum_sq = 0;
for (int wy=0 ; wy<win_y; wy++)
for (int wx=0 ; wx<win_x; wx++) {
foo = im.uget(wx,j-wyh+wy);
sum += foo;
sum_sq += foo*foo;
}
m = sum / winarea;
s = sqrt ((sum_sq - (sum*sum)/winarea)/winarea);
if (s > max_s)
max_s = s;
map_m.fset(x_firstth, j, m);
map_s.fset(x_firstth, j, s);
// Shift the window, add and remove new/old values to the histogram
for (int i=1 ; i <= im.cols -win_x; i++) {
// Remove the left old column and add the right new column
for (int wy=0; wy<win_y; ++wy) {
foo = im.uget(i-1,j-wyh+wy);
sum -= foo;
sum_sq -= foo*foo;
foo = im.uget(i+win_x-1,j-wyh+wy);
sum += foo;
sum_sq += foo*foo;
}
m = sum / winarea;
s = sqrt ((sum_sq - (sum*sum)/winarea)/winarea);
if (s > max_s)
max_s = s;
map_m.fset(i+wxh, j, m);
map_s.fset(i+wxh, j, s);
}
}
return max_s;
}
void NiblackSauvolaWolfJolion (InputArray _src, OutputArray _dst,const BhThresholdMethod &version,int winx, int winy, double k, double dR) {
Mat src = _src.getMat();
Mat dst = _dst.getMat();
double m, s, max_s;
double th=0;
double min_I, max_I;
int wxh = winx/2;
int wyh = winy/2;
int x_firstth= wxh;
int x_lastth = src.cols-wxh-1;
int y_lastth = src.rows-wyh-1;
int y_firstth= wyh;
int mx, my;
// Create local statistics and store them in a double matrices
Mat map_m = Mat::zeros (src.size(), CV_32FC1);
Mat map_s = Mat::zeros (src.size(), CV_32FC1);
max_s = calcLocalStats (src, map_m, map_s, winx, winy);
minMaxLoc(src, &min_I, &max_I);
Mat thsurf (src.size(), CV_32FC1);
// Create the threshold surface, including border processing
// ----------------------------------------------------
for (int j = y_firstth ; j<=y_lastth; j++) {
// NORMAL, NON-BORDER AREA IN THE MIDDLE OF THE WINDOW:
for (int i=0 ; i <= src.cols-winx; i++) {
m = map_m.fget(i+wxh, j);
s = map_s.fget(i+wxh, j);
// Calculate the threshold
switch (version) {
case BhThresholdMethod::NIBLACK:
th = m + k*s;
break;
case BhThresholdMethod::SAUVOLA:
th = m * (1 + k*(s/dR-1));
break;
case BhThresholdMethod::WOLFJOLION:
th = m + k * (s/max_s-1) * (m-min_I);
break;
default:
cerr << "Unknown threshold type in ImageThresholder::surfaceNiblackImproved()\n";
exit (1);
}
thsurf.fset(i+wxh,j,th);
if (i==0) {
// LEFT BORDER
for (int i=0; i<=x_firstth; ++i)
thsurf.fset(i,j,th);
// LEFT-UPPER CORNER
if (j==y_firstth)
for (int u=0; u<y_firstth; ++u)
for (int i=0; i<=x_firstth; ++i)
thsurf.fset(i,u,th);
// LEFT-LOWER CORNER
if (j==y_lastth)
for (int u=y_lastth+1; u<src.rows; ++u)
for (int i=0; i<=x_firstth; ++i)
thsurf.fset(i,u,th);
}
// UPPER BORDER
if (j==y_firstth)
for (int u=0; u<y_firstth; ++u)
thsurf.fset(i+wxh,u,th);
// LOWER BORDER
if (j==y_lastth)
for (int u=y_lastth+1; u<src.rows; ++u)
thsurf.fset(i+wxh,u,th);
}
// RIGHT BORDER
for (int i=x_lastth; i<src.cols; ++i)
thsurf.fset(i,j,th);
// RIGHT-UPPER CORNER
if (j==y_firstth)
for (int u=0; u<y_firstth; ++u)
for (int i=x_lastth; i<src.cols; ++i)
thsurf.fset(i,u,th);
// RIGHT-LOWER CORNER
if (j==y_lastth)
for (int u=y_lastth+1; u<src.rows; ++u)
for (int i=x_lastth; i<src.cols; ++i)
thsurf.fset(i,u,th);
}
cerr << "surface created" << endl;
for (int y=0; y<src.rows; ++y)
for (int x=0; x<src.cols; ++x)
{
if (src.uget(x,y) >= thsurf.fget(x,y))
{
dst.uset(x,y,255);
}
else
{
dst.uset(x,y,0);
}
}
}
void BhThresholder::doThreshold(InputArray _src ,OutputArray _dst,const BhThresholdMethod &method)
{
Mat src = _src.getMat();
int winx = 0;
int winy = 0;
float optK=0.5;
if (winx==0 || winy==0) {
winy = (int) (2.0 * src.rows - 1)/3;
winx = (int) src.cols-1 < winy ? src.cols-1 : winy;
// if the window is too big, then we assume that the image
// is not a single text box, but a document page: set
// the window size to a fixed constant.
if (winx > 100)
winx = winy = 40;
}
// Threshold
_dst.create(src.size(), CV_8UC1);
Mat dst = _dst.getMat();
//medianBlur(src,dst,5);
GaussianBlur(src,dst,Size(5,5),0);
//#define _BH_SHOW_IMAGE
#ifdef _BH_DEBUG
#define _BH_SHOW_IMAGE
#endif
//medianBlur(src,dst,7);
switch (method)
{
case BhThresholdMethod::OTSU :
threshold(dst,dst,128,255,CV_THRESH_OTSU);
break;
case BhThresholdMethod::SAUVOLA :
case BhThresholdMethod::WOLFJOLION :
NiblackSauvolaWolfJolion (src, dst, method, winx, winy, optK, 128);
}
bitwise_not(dst,dst);
#ifdef _BH_SHOW_IMAGE
#undef _BH_SHOW_IMAGE
#endif
}
Here is a comparison table for thresholding methods: http://clweb.csa.iisc.ernet.in/rahulsharma/binarize/set1.php?id=set1%2Fimage00b
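The core of the SAUVOLA case in the code above is the per-window threshold th = m * (1 + k*(s/dR - 1)), where m and s are the local mean and standard deviation. As a minimal sketch, assuming the window's pixels are already gathered in a std::vector, the calculation for a single window looks like this. sauvolaThreshold is a hypothetical helper, and the defaults k = 0.5 and R = 128 mirror optK and the dR argument used in doThreshold:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Sauvola threshold for one window: T = m * (1 + k * (s / R - 1)).
double sauvolaThreshold(const std::vector<unsigned char>& window,
                        double k = 0.5, double R = 128.0) {
    if (window.empty()) return 0.0;
    double sum = 0.0, sumSq = 0.0;
    for (unsigned char p : window) {
        sum += p;
        sumSq += static_cast<double>(p) * p;
    }
    const double n = static_cast<double>(window.size());
    const double m = sum / n;                             // local mean
    const double var = std::max(0.0, sumSq / n - m * m);  // clamp tiny negative rounding errors
    const double s = std::sqrt(var);                      // local standard deviation
    return m * (1.0 + k * (s / R - 1.0));
}
```

In a flat region (s = 0) the threshold drops to m * (1 - k), so uniform background stays above it, while high local contrast pushes the threshold back up towards the mean.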
A few thoughts:
Since you're starting with a rectangular object that may be viewed at a non-normal angle, use an affine transform to warp the image so that it appears rectangular with right angle corners.
Before the affine transform, you should probably remove barrel distortion (the curviness of the card edges).
Consider using an adaptive threshold rather than a simple global binarization threshold.
If you can find a proper OCR algorithm that doesn't require binary images, use that. Although binarization will work well for black text on a white background, in general binarization presents a lot of problems if you want to achieve high accuracy (i.e., character recognition approaching 98%+ for arbitrary strings of characters)
Try to sample with better resolution.