I am using the OpenTLD C++ implementation as a library, only including the libopentld folder. I've successfully compiled the main executable many times and it runs without a hitch. But using the library, I hit what seems to be a weirdly specific bug.
I'm using OpenCV 3.0 for both the default OpenTLD build and my own project.
Running with -g -O0 and through gdb gives the following output:
Program terminated with signal SIGSEGV, Segmentation fault.
#0 calcVariance (off=0x7f3e060f45b0, this=0x15568a0) at libs/opentld/src/libopentld/tld/VarianceFilter.cpp:67
67 float mX = (ii1[off[3]] - ii1[off[2]] - ii1[off[1]] + ii1[off[0]]) / (float) off[5]; //Sum of Area divided by area
(gdb) bt
#0 calcVariance (off=0x7f3e060f45b0, this=0x15568a0) at libs/opentld/src/libopentld/tld/VarianceFilter.cpp:67
#1 tld::VarianceFilter::filter (this=0x15568a0, i=23100) at libs/opentld/src/libopentld/tld/VarianceFilter.cpp:89
#2 0x00000000004141cd in tld::DetectorCascade::detect (this=0x1556780, img=...) at libs/opentld/src/libopentld/tld/DetectorCascade.cpp:317
#3 0x00000000004115bc in tld::TLD::initialLearning (this=0x15437c0) at libs/opentld/src/libopentld/tld/TLD.cpp:248
#4 0x0000000000411e0c in tld::TLD::selectObject (this=<optimized out>, img=..., bb=bb@entry=0x7ffcbe8caa70)
This occurs in the stack when I call TLD::selectObject(img, roi).
I've isolated the array accesses, and it looks like off[5] is the culprit, but I'm not certain; it seems they all access memory that isn't defined for them. In IntegralImage, the width and height are never set, but by convention the data array has size width*height (and the array accesses I'm logging seem to fall outside that range).
I don't know why this works in the normal executable but not when called from my own program. I've looked many times, and even stripped the normal executable down to just a few calls; it still works. Is it possible that it has something to do with using only Mat objects instead of IplImage?
Here's my code that calls opentld:
using namespace cv;
Target OpenTLD::findTarget(cv::Mat HSV, bool restart) {
Target t;
cvtColor(HSV, t.image, COLOR_HSV2RGB);
Mat BGR;
cvtColor(t.image, BGR, COLOR_RGB2BGR);
Mat grey(HSV.size(), CV_8UC1);
int ch[] = {2, 0};
mixChannels(&HSV, 1, &grey, 1, ch, 1);
if (restart) {
started = true;
Rect roi = selectedROI();
tld->detectorCascade->imgWidth = HSV.cols;
tld->detectorCascade->imgHeight = HSV.rows;
tld->detectorCascade->imgWidthStep = HSV.step;
tld->processImage(BGR);
tld->selectObject(grey, &roi);
} else if (started) {
t.roi = ROI(*tld->currBB);
tld->processImage(BGR);
}
return t;
}
I've verified that the images and ROIs are valid values.
This turned out to be HSV.step giving the wrong value. I used the width value instead, and it works perfectly fine.
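For anyone hitting the same thing, a minimal sketch of the corrected setup (an assumption on my part: grey, the single-channel image passed to selectObject, is what the stride should describe; since HSV is 3-channel, HSV.step is 3*cols, which is the wrong stride for it):
tld->detectorCascade->imgWidth = grey.cols;
tld->detectorCascade->imgHeight = grey.rows;
tld->detectorCascade->imgWidthStep = grey.step; // equals grey.cols for a continuous CV_8UC1 Mat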
I'm using:
GeForce GTX 1080 Ti, which has compute capability 6.1.
OpenCV 3.2 (built for VS2013, x64, Release and Debug configurations separately).
CUDA 8.0.
Visual Studio 2013, Release and Debug configurations, x64 platform.
My goal is to process only part of the input image.
The image part is declared by an upper-left coordinate plus a width and height.
Problem description:
A CUDA "invalid configuration argument" error is raised only when I run the Release build standalone (without debugging) via Visual Studio's Debug menu (Ctrl+F5).
If I run the same Release executable under the debugger (F5), the error isn't raised.
Likewise, the Debug build of the same code works properly with both F5 and Ctrl+F5, and the error isn't raised.
Here is my code:
struct sRect
{
unsigned int m_StartRow;
unsigned int m_StartCol;
unsigned int m_SizeRows;
unsigned int m_SizeCols;
};
__global__ void CleanNoisePreparation(unsigned char * SrcImage, size_t iStep, const sRect ImageSlice)
{
int iXPos = threadIdx.x + blockIdx.x*blockDim.x;
int iYPos = threadIdx.y + blockIdx.y*blockDim.y;
if (!(iXPos < ImageSlice.m_SizeCols && iYPos < ImageSlice.m_SizeRows))
return;
/*If the pixel value is less than or equal to 127, set it to black (0); otherwise set it to white (255)*/
SrcImage[iYPos * iStep + iXPos] = (SrcImage[iYPos * iStep + iXPos] <= (unsigned char)127) ? ((unsigned char)0) : ((unsigned char)255);
}
void PerformCleanNoisePreparationOnGPU(cv::cuda::GpuMat& Image,
const sRect &ImageSlice,
const dim3 &dimGrid,
const dim3 &dimBlock,
const cudaStream_t &Stream)
{
/*Calculate the required start address based on the requested image slice characteristics*/
unsigned char * pImageData = (unsigned char*)(Image.data + ImageSlice.m_StartRow * Image.step + ImageSlice.m_StartCol);
CleanNoisePreparation<<<dimGrid, dimBlock, 0, Stream>>>(pImageData, Image.step, ImageSlice);
CUDA(cudaGetLastError());
}
int main()
{
sRect ResSliceParams;
ResSliceParams.m_StartRow = 0;
ResSliceParams.m_StartCol = 4854;
ResSliceParams.m_SizeRows = 7096;
ResSliceParams.m_SizeCols = 5146;
cv::cuda::GpuMat MyFrame(cv::Size(10000, 7096), CV_8U);
//Image step size is 10240
dim3 dimBlock(32, 32, 1);
dim3 dimGrid(161, 222, 1);
cudaStream_t cudaStream;
cudaStreamCreateWithFlags(&cudaStream, cudaStreamNonBlocking);
PerformCleanNoisePreparationOnGPU(MyFrame,
ResSliceParams,
dimGrid,
dimBlock,
cudaStream);
}
The error is also raised when:
The kernel is completely empty (all lines commented out).
The kernel's parameter list is empty.
The default stream is used instead of a specific stream.
The problem source was found.
Because the problem appeared only when the application ran in Release mode without debugging, I could only use print statements to learn the variables' values and the actual flow of the code.
I found that dimGrid.y was mistakenly set to a negative value only in this execution mode; under all other execution modes it was positive, as I expected.
Because of this negative value, CUDA raised the "invalid configuration argument" error.
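A cheap guard that would have surfaced this immediately is an explicit sanity check of the launch configuration just before the launch. This is a sketch, not part of the original code; I use an explicit check rather than assert, because assert is compiled out in Release builds, which is exactly where this bug hid. Note that dim3 fields are unsigned, so a negative int assigned to dimGrid.y wraps to a huge value that exceeds the device's grid limits:
cudaDeviceProp prop;
cudaGetDeviceProperties(&prop, 0);
if (dimGrid.x < 1 || dimGrid.x > (unsigned int)prop.maxGridSize[0] ||
    dimGrid.y < 1 || dimGrid.y > (unsigned int)prop.maxGridSize[1] ||
    dimGrid.z < 1 || dimGrid.z > (unsigned int)prop.maxGridSize[2])
{
    printf("Invalid grid size: (%u, %u, %u)\n", dimGrid.x, dimGrid.y, dimGrid.z);
}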
More details:
I have code that calculates the required dimGrid values based on the input image resolution and whether it is portrait or landscape.
I use a bool class member to hold this indication, and I pass its value to several sub-objects through the member initializer list of the main class that contains them all as members.
It turned out that only in Release-without-debugging mode was the bool false instead of true (true representing landscape mode) inside the sub-objects, in contrast to its value inside the main class.
I verified that it was initialized to true (in the member initializer list) before being passed to the other constructors, but class members are initialized in the order of their declarations in the class, not in the order of the member initializer list, so an uninitialized value was passed to them.
On my system, only in Release-without-debugging mode does an uninitialized bool get the value 0; in all other execution modes it gets a positive value.
When an if condition tests an uninitialized bool, 0 translates to false but any positive value translates to true.
This caused a wrong calculation of the dimGrid values.
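A minimal sketch of the pitfall (all names here are made up for illustration):
struct Sub
{
    bool m_Landscape;
    explicit Sub(bool Landscape) : m_Landscape(Landscape) {}
};

class MainClass
{
    Sub m_Sub;           // declared first, therefore constructed first...
    bool m_IsLandscape;  // ...before this flag has been initialized
public:
    // The order written below does NOT matter: m_Sub is constructed from an
    // uninitialized m_IsLandscape, because declaration order wins.
    MainClass() : m_IsLandscape(true), m_Sub(m_IsLandscape) {}
};
GCC and Clang can flag this with -Wreorder; declaring m_IsLandscape before m_Sub fixes it.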
The basic problem was as follows:
When I run the kernel below with N threads and don't include the four lines that instantiate and populate the ScaledLLA variable, everything works fine.
When I run it with N threads and do include those four lines, the GPU locks up and Windows throws a "display driver not responding" error.
If I reduce the number of threads by reducing the grid size, everything works fine.
I'm new to CUDA and have been incrementally building out some GIS functionality. My host code looks like this at the kernel call:
MapperKernel<<<g_CUDAControl->aGetGridSize(), g_CUDAControl->aGetBlockSize()>>>(g_Deltas.lat, g_Deltas.lon, 32.2,
g_DataReader->aGetMapper().aGetRPCBoundingBox()[0], g_DataReader->aGetMapper().aGetRPCBoundingBox()[1],
g_CUDAControl->aGetBlockSize().x,
g_CUDAControl->aGetThreadPitch(),
LLA_Offset,
LLA_ScaleFactor,
RPC_XN,RPC_XD,RPC_YN,RPC_YD,
Pixel_Offset, Pixel_ScaleFactor,
device_array);
cudaDeviceSynchronize(); //code crashes here
host_array = (point3D*)malloc(num_bytes);
cudaMemcpy(host_array, device_array, num_bytes, cudaMemcpyDeviceToHost);
the Kernel that is being called looks like this:
__global__ void MapperKernel(double deltaLat, double deltaLon, double passedAlt,
double minLat, double minLon,
int threadsperblock,
int threadPitch,
point3D LLA_Offset,
point3D LLA_ScaleFactor,
double * RPC_XN, double * RPC_XD, double * RPC_YN, double * RPC_YD,
point2D pixelOffset, point2D pixelScaleFactor,
point3D * rValue)
{
//calculate thread's LLA
int latindex = threadIdx.x + blockIdx.x*threadsperblock;
int lonindex = threadIdx.y + blockIdx.y*threadsperblock;
point3D LLA;
LLA.lat = ((double)(latindex))*deltaLat + minLat;
LLA.lon = ((double)(lonindex))*deltaLon + minLon;
LLA.alt = passedAlt;
//scale threads LLA - adding these four lines is what causes the problem
point3D ScaledLLA;
ScaledLLA.lat = (LLA.lat - LLA_Offset.lat) * LLA_ScaleFactor.lat;
ScaledLLA.lon = (LLA.lon - LLA_Offset.lon) * LLA_ScaleFactor.lon;
ScaledLLA.alt = (LLA.alt - LLA_Offset.alt) * LLA_ScaleFactor.alt;
rValue[lonindex*threadPitch + latindex] = ScaledLLA; //if I assign LLA without calculating ScaledLLA everything works fine
}
If I assign LLA to rValue, everything executes quickly and I get the expected behavior; however, when I add those four lines for ScaledLLA and try to assign it to rValue, CUDA takes too long for Windows's liking at the cudaDeviceSynchronize() call and I get a "display driver not responding" error that then proceeds to reset the GPU. From looking around, the error appears to be a Windows thing that occurs when Windows believes the GPU isn't being responsive. I am certain that the kernel is running and performing the right calculations, because I have stepped through it with the NSIGHT debugger.
Does anybody have a good explanation for why adding those four lines to the kernel would cause the execution time to spike?
I'm running Win7 VS 2013 and have nsight 4.5 installed.
For those who get here later via a search engine: it turns out the problem was the card running out of memory.
That should probably have been one of the first things to consider, since the problem occurred only after the instantiation was added.
The card only had so much memory (~2GB), and my rValue buffer was taking up most of it (~1.5GB). With every thread trying to instantiate its own point3D variable, the card simply ran out of memory.
For those interested, NSight's profiler said that it was a cudaUknownError.
The fix was to lower the number of threads running the kernel.
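One quick way to confirm this failure mode (a sketch using the standard CUDA runtime call cudaMemGetInfo, not code from the question) is to log free device memory just before the big allocation and the launch:
size_t freeBytes = 0, totalBytes = 0;
cudaMemGetInfo(&freeBytes, &totalBytes);
printf("GPU memory: %zu MB free of %zu MB\n", freeBytes >> 20, totalBytes >> 20);
// If num_bytes for the rValue buffer approaches freeBytes, allocations made
// on behalf of the kernel's threads are likely to fail.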
Hi, I've written this simple program:
Main.cpp
std::vector<cv::Mat> PD_Classifier_VEC;
#define Folder_Address ""
int main()
{
int overall_counter=0;
for(int j = 0 ; j < 600 ; j++)
{
QString address = Folder_Address + QString::number(overall_counter++) +".jpg";
cv::Mat image = cv::imread(address.toUtf8().constData(),0);
PD_Classifier_VEC.push_back(image);
PD();
}
}
PD Function
void PD()
{
static int Total_Frame_Number=0;
Total_Frame_Number++;
cv::Mat Point_MAT = cv::Mat(PD_Classifier_VEC[0].size(),CV_8UC1,cv::Scalar::all(0));
....//Some Calculation //
PD_Classifier_VEC[0].release();
PD_Classifier_VEC.erase(PD_Classifier_VEC.begin());
}
This code works fine until j=56; after that, Qt displays this error and quits:
*** Error in `/home/parsa/QtProjects/QtVLPR/QtVLPR': corrupted double-linked list: 0x0000000000dcf880 ***
I ran the code in the debugger and added this if statement to the PD() function:
void PD()
{
static int Total_Frame_Number=0;
Total_Frame_Number++;
cv::Mat Point_MAT = cv::Mat(PD_Classifier_VEC[0].size(),CV_8UC1,cv::Scalar::all(0));
....//Some Calculation //
if(Total_Frame_Number==56)
{
std::cout<<Point_MAT<<"\n"; //it displays the elements perfectly
int Nonz = cv::countNonZero(Point_MAT); //it runs too
cv::imshow("Point_MAT",Point_MAT); //here the error appears !!!
cv::waitKey();
}
PD_Classifier_VEC[0].release();
PD_Classifier_VEC.erase(PD_Classifier_VEC.begin());
}
As the comments above indicate, the first two lines work fine, but when I try to show the image using imshow, the program crashes and displays the corrupted double-linked list error. What's wrong here?
Why can't I display this image, and if the Point_MAT image is corrupted, how do the first two lines work fine?
P.S.
If I start the program from j=57, it works fine until it finishes and no error appears, so the
//some calculation
code works fine and I'm sure about it.
I've tested many other functions, such as threshold, subtract, etc., which work on the data part of the image, and they work fine; but when I add a function that works on the metadata + data parts, the corrupted double-linked list appears again:
cv::subtract(Point_MAT,Point_MAT,temp); //works fine because it only works on the data part
Point_MAT.copyTo(Temp_MAT); //gives an error because it works on the header part too
The error comes from the standard C library, and indicates that you've corrupted the heap.
Specifically, you should not call release() on a cv::Mat unless you have a matching addref() in your own code. cv::Mat behaves like the implicitly shared value classes common in Qt (such as QImage); you shouldn't manage its reference counts manually.
The obvious suggestion is to remove the "some calculation" part. The code you show should, as-is (without the release()) and without the calculation, work for many more images.
Convert the code that you show into a separate, single-file project, and make sure it runs, because it should. Then the error is limited to your calculations; the fact that it only manifests when you do the calculations is a symptom of a problem in the calculations.
Perhaps the calculations allocate memory?
Below is a working, self-contained example demonstrating that the shown code is not only OK, but that everything works even if you store a hundred images in memory at once. The images are 2MB each.
#include <QImage>
#include <QTemporaryFile>
#include <QDebug>
#include <vector>
#include <opencv2/opencv.hpp>
std::vector<cv::Mat> PD_Classifier_VEC;
void PD()
{
cv::Mat Point_MAT = cv::Mat(PD_Classifier_VEC[0].size(),CV_8UC1,cv::Scalar::all(0));
//Some Calculation //
std::stringstream stream;
stream<<Point_MAT<<"\n";
int Nonz = cv::countNonZero(Point_MAT);
cv::imshow("Point_MAT",Point_MAT);
PD_Classifier_VEC.erase(PD_Classifier_VEC.begin());
}
int main()
{
QTemporaryFile file;
file.setFileTemplate(file.fileTemplate() + ".jpg");
const int N = 100;
for(int j = 0 ; j < N ; j++)
{
file.open();
QImage img(800, 600, QImage::Format_RGB32);
img.save(&file);
file.close();
QString address = file.fileName();
cv::Mat image = cv::imread(address.toStdString(),0);
PD_Classifier_VEC.push_back(image);
}
while (!PD_Classifier_VEC.empty()) PD();
cv::waitKey();
return 0;
}
The underlying problem is well described in Kuba's answer.
In my case I was passing a cv::Mat as a parameter to a function, to display the image. But by the point where the main thread would access that Mat, the worker thread that took the image had already deleted it. (cv::Mat copies share the same underlying buffer via reference counting; clone() makes a deep copy.)
The solution was to pass a clone of the Mat.
...
cv::Mat const img = recordImage();
myWindow->setImage(img);
} // end of function, img will get deleted
...
cv::Mat const img = recordImage();
myWindow->setImage(img.clone());
} // img will get deleted, but the clone will persist
So I have the following lines in my code:
MatrixXd qdash = zeroCentredMeasurementPointCloud_.topLeftCorner(3, zeroCentredMeasurementPointCloud_.cols());
Matrix3d H = q * qdash.transpose();
Eigen::JacobiSVD<MatrixXd> svd(H, Eigen::ComputeThinU | Eigen::ComputeThinV);
Now I am sure that qdash and H are being initialised correctly (q is also, just elsewhere). The last line, involving Eigen::JacobiSVD, causes the program to throw this error when it is left in:
Program received signal SIGSEGV, Segmentation fault.
0xb0328af8 in _list_release () from /usr/qnx650/target/qnx6/x86/lib/libc.so.3
0 0xb0328af8 in _list_release () from /usr/qnx650/target/qnx6/x86/lib/libc.so.3
1 0xb032a464 in __free () from /usr/qnx650/target/qnx6/x86/lib/libc.so.3
2 0xb0329f7d in free () from /usr/qnx650/target/qnx6/x86/lib/libc.so.3
I.e. it is segfaulting when trying to free it, I guess. Now, according to the tutorial here, all I should have to do to use this functionality is this:
MatrixXf m = MatrixXf::Random(3,2);
JacobiSVD<MatrixXf> svd(m, ComputeThinU | ComputeThinV);
Can anyone see why it is failing in my case?
Ok, so this is super crazy. It turns out I was using Eigen alignment, which doesn't really work on my operating system. This caused an error whose location would change based simply on the size of the produced executable.
The moral of the story is be careful with your includes.
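If anyone else runs into this, one workaround (a sketch based on Eigen's documented EIGEN_DONT_ALIGN switch; note it also disables vectorization of fixed-size types) is to turn alignment off globally before any Eigen header is included:
// Must be defined before the first Eigen include in every translation unit,
// e.g. as a compiler flag: -DEIGEN_DONT_ALIGN
#define EIGEN_DONT_ALIGN
#include <Eigen/Dense>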
I'm developing an image processing application in C++. I've seen a lot of compiler errors and backtraces, but this one is new to me.
#0 0xb80c5430 in __kernel_vsyscall ()
#1 0xb7d1b6d0 in raise () from /lib/tls/i686/cmov/libc.so.6
#2 0xb7d1d098 in abort () from /lib/tls/i686/cmov/libc.so.6
#3 0xb7d5924d in ?? () from /lib/tls/i686/cmov/libc.so.6
#4 0xb7d62276 in ?? () from /lib/tls/i686/cmov/libc.so.6
#5 0xb7d639c5 in malloc () from /lib/tls/i686/cmov/libc.so.6
#6 0xb7f42f47 in operator new () from /usr/lib/libstdc++.so.6
#7 0x0805bd20 in Image<Color>::fft (this=0xb467640) at ../image_processing/image.cpp:545
What's happening here? The operator new is crashing, ok. But why? It's not out of memory (it tries to allocate about 128KB: a 128x64-pixel image with two floats per pixel). Also, it doesn't seem to be an error in my own code (the constructor never even gets called!).
The code in the mentioned line (#7) is:
Image<Complex> *result = new Image<Complex>(this->resX, resY);
// this->resX = 128, resY = 64 (both int), Complex is a typedef for std::complex<float>
Almost the same instantiation works in other places in my code. If I comment out this part of the code, it will crash a bit later on a similar part. I don't understand it, and I don't have any ideas how to debug it. Any help?
Compiler is gcc 4.3.3, libc is 2.9 (both from Ubuntu Jaunty)
Update:
I've included the following lines just above the faulty line in the same method and in main()
Image<Complex> *test = new Image<Complex>(128, 64);
delete test;
The strange thing: in the same method it will crash; in main() it won't. As I mentioned, Complex is a typedef of std::complex<float>. The constructor doesn't get called; I've inserted a cout just before this line and in the constructor itself.
Update 2:
Thanks to KPexEA for this tip! I tried this:
Image<Complex> *test = new Image<Complex>(128, 64);
delete test;
kiss_fft_cpx *output = (kiss_fft_cpx*) malloc( this->resX * this->resY/2 * sizeof(kiss_fft_cpx) );
kiss_fftndr( cfg, input, output );
Image<Complex> *test2 = new Image<Complex>(128, 64);
delete test2;
It crashes at, you guessed it, test2! So the malloc for my kissfft seems to be the faulty one. I'll take a look at it.
Final update:
Ok, it's done! Thanks to all of you!
Actually, I should have noticed it before. Last week I noticed that kissfft (a fast Fourier transform library) produced a 130x64-pixel FFT image from a 128x128-pixel source image. Yes, 130 pixels wide, not 128. Don't ask me why, I don't know! So 130x64x2xsizeof(float) bytes had to be allocated, not 128x64x... as I thought before. Strange that it didn't crash right after I fixed that bug, but only some days later.
For the record, my final code is:
int resY = (int) ceil(this->resY/2);
kiss_fft_cpx *output = (kiss_fft_cpx*) malloc( (this->resX+2) * resY * sizeof(kiss_fft_cpx) );
kiss_fftndr( cfg, input, output );
Image<Complex> *result = new Image<Complex>(this->resX, resY);
Thanks!
craesh
Perhaps a previously allocated chunk of memory has a buffer overflow that is corrupting the heap?
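To illustrate: a heap overflow usually corrupts the allocator's bookkeeping silently, and the crash only surfaces in a later, unrelated allocation. A minimal sketch of the failure mode (not the poster's code):
char *a = (char*)malloc(16);
memset(a, 0, 32);            // overflow: writes 16 bytes past the block,
                             // trampling the allocator's metadata
char *b = (char*)malloc(16); // the abort typically happens here, far from the bug
Tools like Valgrind usually point at the offending write directly.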
You are not allocating enough memory. The half-spectrum format of kissfft (and FFTW and IMKL for that matter) contains X*(Y/2+1) complex elements.
See the kiss_fftndr.h header file:
/*
input timedata has dims[0] X dims[1] X ... X dims[ndims-1] scalar points
output freqdata has dims[0] X dims[1] X ... X dims[ndims-1]/2+1 complex points
*/
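In terms of the sizes in the question, a 128x128 source therefore needs 128 x (128/2 + 1) = 128x65 complex output points, which is exactly the byte count of the "130x64" image the poster observed. A sketch of a correctly sized allocation (variable names are illustrative):
int nx = 128, ny = 128;                  // source image dimensions
size_t nOut = (size_t)nx * (ny / 2 + 1); // half-spectrum complex element count
kiss_fft_cpx *output = (kiss_fft_cpx*)malloc(nOut * sizeof(kiss_fft_cpx));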