Strange backtrace - where is the error? - c++

I'm developing an image processing application in C++. I've seen a lot of compiler errors and backtraces, but this one is new to me.
#0 0xb80c5430 in __kernel_vsyscall ()
#1 0xb7d1b6d0 in raise () from /lib/tls/i686/cmov/libc.so.6
#2 0xb7d1d098 in abort () from /lib/tls/i686/cmov/libc.so.6
#3 0xb7d5924d in ?? () from /lib/tls/i686/cmov/libc.so.6
#4 0xb7d62276 in ?? () from /lib/tls/i686/cmov/libc.so.6
#5 0xb7d639c5 in malloc () from /lib/tls/i686/cmov/libc.so.6
#6 0xb7f42f47 in operator new () from /usr/lib/libstdc++.so.6
#7 0x0805bd20 in Image<Color>::fft (this=0xb467640) at ../image_processing/image.cpp:545
What's happening here? operator new is crashing, OK. But why? It's not an out-of-memory condition (it tries to allocate about 64 KB: 128x64 pixels with two floats each). Also, it doesn't seem to be an error in my own code (the constructor doesn't get touched!).
The code in the mentioned line (#7) is:
Image<Complex> *result = new Image<Complex>(this->resX, resY);
// this->resX = 128, resY = 64 (both int), Complex is a typedef for std::complex<float>
Almost the same instantiation works in other places in my code. If I comment out this part of the code, it crashes a bit later at a similar spot. I don't understand it, and I have no idea how to debug it. Any help?
Compiler is gcc 4.3.3, libc is 2.9 (both from Ubuntu Jaunty)
Update:
I've included the following lines just above the faulty line in the same method and in main()
Image<Complex> *test = new Image<Complex>(128, 64);
delete test;
The strange thing: in the same method it crashes, in main() it doesn't. As I mentioned, Complex is a typedef of std::complex<float>. The constructor doesn't get called; I checked this by inserting a cout just before this line and in the constructor itself.
Update 2:
Thanks to KPexEA for this tip! I tried this:
Image<Complex> *test = new Image<Complex>(128, 64);
delete test;
kiss_fft_cpx *output = (kiss_fft_cpx*) malloc( this->resX * this->resY/2 * sizeof(kiss_fft_cpx) );
kiss_fftndr( cfg, input, output );
Image<Complex> *test2 = new Image<Complex>(128, 64);
delete test2;
It crashes at (you guessed it) test2! So the malloc for my kissfft output seems to be the faulty one. I'll take a look at it.
Final update:
Ok, it's done! Thanks to all of you!
Actually, I should have noticed it before. Last week, I noticed that kissfft (a fast Fourier transform library) made a 130x64 pixel FFT image from a 128x128 pixel source image. Yes, 130 pixels wide, not 128. Don't ask me why, I don't know! So 130x64x2xsizeof(float) bytes had to be allocated, not 128x64x... as I thought before. Strange that it didn't crash right after I fixed that bug, but some days later.
For the record, my final code is:
int resY = (int) ceil(this->resY / 2.0); // floating-point division, so ceil() can actually round up for odd heights
kiss_fft_cpx *output = (kiss_fft_cpx*) malloc( (this->resX+2) * resY * sizeof(kiss_fft_cpx) );
kiss_fftndr( cfg, input, output );
Image<Complex> *result = new Image<Complex>(this->resX, resY);
Thanks!
craesh

Perhaps a previously allocated chunk of memory has a buffer overflow that is corrupting the heap?

You are not allocating enough memory. The half-spectrum format of kissfft (and of FFTW and Intel MKL, for that matter) contains X*(Y/2+1) complex elements.
See the kiss_fftndr.h header file:
/*
input timedata has dims[0] X dims[1] X ... X dims[ndims-1] scalar points
output freqdata has dims[0] X dims[1] X ... X dims[ndims-1]/2+1 complex points
*/
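As a rough sketch of the sizing rule quoted from the header (this is not code from kissfft itself), the number of complex output points can be computed from the input dimensions. The helper name halfSpectrumPoints is made up for this illustration; only the formula comes from the header comment.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of kissfft): number of complex points in the
// half-spectrum output of a real N-dimensional FFT, following the rule
// dims[0] x dims[1] x ... x (dims[ndims-1]/2 + 1).
std::size_t halfSpectrumPoints(const std::vector<int>& dims) {
    if (dims.empty()) return 0;
    std::size_t points = 1;
    // All dimensions except the last are kept at full size.
    for (std::size_t i = 0; i + 1 < dims.size(); ++i) {
        points *= static_cast<std::size_t>(dims[i]);
    }
    // Only the last dimension is halved (integer division), plus one.
    points *= static_cast<std::size_t>(dims.back() / 2 + 1);
    return points;
}
```

For a 128x128 real input this gives 128 * 65 = 8320 complex points, so the output buffer would need halfSpectrumPoints(dims) * sizeof(kiss_fft_cpx) bytes.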

Related

Segfault caused in OpenTLD VarianceFilter

I am using the OpenTLD C++ implementation as a library - only including the libopentld folder. I've successfully compiled the main executable many times and it runs without a hitch. But using the library seems to have a weirdly specific bug.
I'm using opencv 3.0 for the default opentld and my own project.
Compiling with -g -O0 and running under gdb gives the following output:
Program terminated with signal SIGSEGV, Segmentation fault.
#0 calcVariance (off=0x7f3e060f45b0, this=0x15568a0) at libs/opentld/src/libopentld/tld/VarianceFilter.cpp:67
67 float mX = (ii1[off[3]] - ii1[off[2]] - ii1[off[1]] + ii1[off[0]]) / (float) off[5]; //Sum of Area divided by area
(gdb) bt
#0 calcVariance (off=0x7f3e060f45b0, this=0x15568a0) at libs/opentld/src/libopentld/tld/VarianceFilter.cpp:67
#1 tld::VarianceFilter::filter (this=0x15568a0, i=23100) at libs/opentld/src/libopentld/tld/VarianceFilter.cpp:89
#2 0x00000000004141cd in tld::DetectorCascade::detect (this=0x1556780, img=...) at libs/opentld/src/libopentld/tld/DetectorCascade.cpp:317
#3 0x00000000004115bc in tld::TLD::initialLearning (this=0x15437c0) at libs/opentld/src/libopentld/tld/TLD.cpp:248
#4 0x0000000000411e0c in tld::TLD::selectObject (this=<optimized out>, img=..., bb=bb@entry=0x7ffcbe8caa70)
This occurs in the stack when I call TLD::selectObject(img, roi).
I've isolated the array accesses, and it looks like off[5] is the culprit, but I'm not certain. It seems that they all access memory that isn't defined for them. In IntegralImage the width and height are never defined, but by convention the data array has size width*height, and the array accesses that I'm logging seem to be outside of that range.
I don't know why this works for the normal executable but not calling from my own program. I've looked many times, stripped the normal one to just a few calls and it still works. Is it possible that it has something to do with using only Mat objects instead of IplImage?
Here's my code that calls opentld:
using namespace cv;
Target OpenTLD::findTarget(cv::Mat HSV, bool restart) {
Target t;
cvtColor(HSV, t.image, COLOR_HSV2RGB);
Mat BGR;
cvtColor(t.image, BGR, COLOR_RGB2BGR);
Mat grey(HSV.size(), CV_8UC1);
int ch[] = {2, 0};
mixChannels(&HSV, 1, &grey, 1, ch, 1);
if (restart) {
started = true;
Rect roi = selectedROI();
tld->detectorCascade->imgWidth = HSV.cols;
tld->detectorCascade->imgHeight = HSV.rows;
tld->detectorCascade->imgWidthStep = HSV.step;
tld->processImage(BGR);
tld->selectObject(grey, &roi);
} else if (started) {
t.roi = ROI(*tld->currBB);
tld->processImage(BGR);
}
return t;
}
I've verified that the images and ROIs are valid values.
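This was due to HSV.step giving the wrong value for imgWidthStep: step is the row stride in bytes, so for a multi-channel image it is larger than the width in pixels. I used the width value instead, and it works perfectly fine.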
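One plausible reading of this fix, sketched without OpenCV: the step of a cv::Mat is the row stride in bytes, so for a three-channel 8-bit image it is three times the width in pixels. Using that stride to index a single-channel grey buffer walks far past its end. The function names below are invented for the illustration, and the sketch assumes tightly packed rows with no padding.

```cpp
#include <cstddef>

// Row stride in bytes of a tightly packed 8-bit image (assumes no row padding).
std::size_t rowStepBytes(std::size_t widthPixels, std::size_t channels) {
    return widthPixels * channels;
}

// Byte offset of the first byte of row y, given the row stride in bytes.
std::size_t rowOffset(std::size_t y, std::size_t stepBytes) {
    return y * stepBytes;
}

// True if the last row addressed with stepBytes still lies inside a
// single-channel 8-bit buffer of width * height bytes.
bool rowsFitInGreyBuffer(std::size_t width, std::size_t height,
                         std::size_t stepBytes) {
    if (height == 0) return true;
    return rowOffset(height - 1, stepBytes) + width <= width * height;
}
```

For a 640x480 frame, rowStepBytes(640, 3) is 1920 while the grey image only has 640 bytes per row, so rowsFitInGreyBuffer(640, 480, 1920) is false and rowsFitInGreyBuffer(640, 480, 640) is true.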

Eigen Jacobi causing odd segfault in c++

So I have the following lines in my code:
MatrixXd qdash = zeroCentredMeasurementPointCloud_.topLeftCorner(3, zeroCentredMeasurementPointCloud_.cols());
Matrix3d H = q * qdash.transpose();
Eigen::JacobiSVD<MatrixXd> svd(H, Eigen::ComputeThinU | Eigen::ComputeThinV);
Now I am sure that qdash and H are being initialised correctly (q is also, just elsewhere). The last line, involving Eigen::JacobiSVD causes the program to throw this error when it is left in:
Program received signal SIGSEGV, Segmentation fault.
0xb0328af8 in _list_release () from /usr/qnx650/target/qnx6/x86/lib/libc.so.3
0 0xb0328af8 in _list_release () from /usr/qnx650/target/qnx6/x86/lib/libc.so.3
1 0xb032a464 in __free () from /usr/qnx650/target/qnx6/x86/lib/libc.so.3
2 0xb0329f7d in free () from /usr/qnx650/target/qnx6/x86/lib/libc.so.3
I.e. it is segfaulting when trying to free something, I guess. Now, according to the tutorial, all I should have to do to use this functionality is this:
MatrixXf m = MatrixXf::Random(3,2);
JacobiSVD<MatrixXf> svd(m, ComputeThinU | ComputeThinV);
Can anyone see why it is failing in my case?
OK, so this is super crazy. It turns out I was using Eigen alignment, which doesn't really work on my operating system. This caused an error whose location would change based on the size of the executable that was produced.
The moral of the story is be careful with your includes.

this pointer is lost when calling a member method

I have encountered a strange problem when compiling my program using 64-bit g++ 4.7.0 on a Fedora 17 x86_64 machine (the same program works well on a 32-bit Fedora).
The program is too complicated and I cannot figure out an easy way to produce a small code sample. But from the following gdb record, you can see the problem.
Program received signal SIGSEGV, Segmentation fault.
0x000000000042a4b0 in boost::shared_ptr<cppPNML::details::ddObj>::operator!(this=0x100000007)
at /usr/include/boost/smart_ptr/detail/operator_bool.hpp:55
55 return px == 0;
Missing separate debuginfos, use: debuginfo-install gnome-keyring-3.4.1-3.fc17.x86_64
(gdb) bt
#0 0x000000000042a4b0 in boost::shared_ptr<cppPNML::details::ddObj>::operator! (this=0x100000007)
at /usr/include/boost/smart_ptr/detail/operator_bool.hpp:55
#1 0x00000000004202a5 in cppPNML::pnNode::getBBox (this=0xffffffff) at cpp_pnml.cpp:131
#2 0x000000000040eca4 in draw_page (g=..., painter=...) at pnml2pdf.cpp:178
#3 0x000000000040e3b9 in main (argc=2, argv=0x7fffffffe188) at pnml2pdf.cpp:106
(gdb) up
#1 0x00000000004202a5 in cppPNML::pnNode::getBBox (this=0xffffffff) at cpp_pnml.cpp:131
131 if(!p_) return pair<double, double>(0,0);
(gdb) up
#2 0x000000000040eca4 in draw_page (g=..., painter=...) at pnml2pdf.cpp:178
178 boost::tie(w, h) = node.getBBox();
(gdb) p node
$1 = {<cppPNML::pnObj> = {_vptr.pnObj = 0x79a490, p_ = {px = 0x7c40a0, pn = {pi_ = 0x7c4170}}}, <No data fields>}
(gdb) l
173 QRectF bound(0,0,0,0);
174
175 // nodes
176 for(pnNode node = g.front<pnNode>(); node.valid(); node = node.next()) {
177 double h, w, x, y, wa, ha, xa, ya, angle;
178 boost::tie(w, h) = node.getBBox();
179 angle = atan2(h, w);
180 boost::tie(x, y) = node.getPosition();
181 wa = 0; ha = 0; xa = 0; ya = 0;
182
(gdb)
The program being debugged is a graphics printing program (pnml2pdf) that draws a graph to PDF using Qt4.
The object node belongs to class pnNode, which is defined by my own graphic data structure library (quite complex, https://github.com/wsong83/cppPNML).
gdb shows a segmentation fault where the smart pointer looks uninitialized.
From the backtrace you can see that the this pointer of node.getBBox() is invalid.
However, printing node from one frame up shows that the node is actually OK.
I am totally confused here.
Does anyone have a clue, or need more code? Thanks in advance!
Update:
Thanks to the advice from @atzz, I am now certain that the calculation of the this pointer in the member method getBBox() produced a wrong address. The problem is not caused by any source code error (directly linking the object files eliminates the segmentation fault), but by the 64-bit static library generation command "ar" (pnNode is defined in a static library rather than in an object file). It now seems that the static library is wrong and causes the wrong this calculation.
Still digging... Will update the result if anyone is still interested to know.
Is this an optimised build or a debug build? Looks to me like it should be failing on line 176 not line 178.
Are you sure the loop is right? Looks like you are going over the end. I suspect your implementation of node.valid() either doesn't do the right thing, or is the wrong thing for the loop test.
The value 0xffffffff looks like a std::iterator end() value, so I think you either need to test your loop against that, or make sure valid() rejects it, e.g. pnObj::valid() const { return p_ != NULL && p_ != 0xffffffff; }
Also the way you are implementing next() just looks wrong. Creating an iterator, searching for the string ID and then calling next() on the iterator?

Segmentation fault in _dl_runtime_resolve()

I am doing simple string operations in code where I am getting a segmentation fault. I could not work out what the exact problem is.
Please take a look if someone can help.
The backtrace of the core is
(gdb) bt
#0 0x00007f595dee41da in _dl_fixup () from /lib64/ld-linux-x86-64.so.2
#1 0x00007f595deea105 in _dl_runtime_resolve () from /lib64/ld-linux-x86-64.so.2
#2 0x0000000000401d04 in getNodeInfo (node=0x7fffbfb4ba83 "TCU-0")
at hwdetails.cpp:294
#3 0x0000000000402178 in main (argc=3, argv=0x7fffbfb4aef8)
at hwdetails.cpp:369
The crash occurs at line 294, where the cout statement is.
LdapDN is char * and is not NULL.
if ( Epath && (Epath->Entry[0].EntityType == SAHPI_ENT_UNSPECIFIED ||
Epath->Entry[0].EntityType == SAHPI_ENT_ROOT )) {
// nothing is mapped. Degrade the ldap dn path to slot.
if(LdapDN){
std::cout << "LdapDN " << LdapDN << std::endl;
}
std::string ldapDN;
ldapDN = LdapDN;
std::string slot = LDAP_PIU_ID;
if ( ldapDN.compare(0, slot.length(), slot) != 0 ) {
size_t pos = ldapDN.find(slot);
if ( pos != std::string::npos ) {
ldapDN = ldapDN.substr(pos);
LdapDN = (char *)ldapDN.c_str();
//getEntityPathFromLdapDn(ldapDN.c_str(), epath, domid);
}
}
}
A crash in _dl_fixup generally means that you have corrupted the state of the runtime loader.
The two most common causes are:
Heap corruption (overflow) or
Mismatched parts of glibc itself.
If you are not setting e.g. LD_LIBRARY_PATH to point to a non-standard glibc, then we can forget about reason #2.
For #1, run your program under Valgrind, and make sure it detects no errors.
If in fact it doesn't, use disas and info registers GDB commands, update your question with their output, and you may receive additional help.
This is a problem with the GOT. _dl_runtime_resolve is the procedure that updates the GOT (global offset table) when a function from a dynamic library is called for the first time. Subsequent calls use the updated GOT entry.
When a function from a dynamic library (for example printf() from libc.so) is called in your code for the first time:
go to the PLT (procedure linkage table). The PLT is a trampoline that gets the correct address of the function being called from the GOT.
from the PLT, go to the GOT
return to the PLT
call _dl_runtime_resolve
store the actual function address in the GOT
call the function from the dynamic library
The second time the function is called:
go to the PLT
go to the GOT
the GOT entry holds a direct jump to the function's address in the dynamic library.
Since the GOT entry now refers to the function itself, later calls are fast and do not go through _dl_runtime_resolve again.
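The lazy-binding sequence described in this answer can be imitated in plain C++ with a function pointer that is patched on first use. This is only an analogy: the names gotSlot, resolverStub and callViaPlt are invented here, and the real PLT/GOT mechanism lives in the linker and loader, not in user code.

```cpp
// Counts how often the simulated resolver runs.
int resolveCount = 0;

// Stand-in for a function that lives in a shared library.
int realFunction(int x) { return x * 2; }

int resolverStub(int x);

// Simulated GOT slot: before the first call it points at the resolver stub.
int (*gotSlot)(int) = resolverStub;

// Stand-in for _dl_runtime_resolve: looks up the real function, stores its
// address in the slot, then forwards the original call.
int resolverStub(int x) {
    ++resolveCount;
    gotSlot = realFunction;
    return gotSlot(x);
}

// Simulated PLT entry: every call jumps through the GOT slot.
int callViaPlt(int x) { return gotSlot(x); }
```

The first callViaPlt() goes through resolverStub and patches the slot; every later call jumps straight to realFunction. If something overwrites the slot or the data the resolver relies on (for example through heap corruption), the first call is where the crash shows up.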
I see a memory leak here:
You are essentially losing your previous string LdapDN when you do
if ( pos != std::string::npos ) {
ldapDN = ldapDN.substr(pos);
LdapDN = (char *)ldapDN.c_str();
//getEntityPathFromLdapDn(ldapDN.c_str(), epath, domid);
}
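Whether or not this is what causes the crash, the quoted lines also leave LdapDN pointing into the buffer of the local std::string ldapDN, and that pointer is only usable while the string is alive and unmodified. A minimal sketch of the same "degrade to slot" logic that returns an owning std::string instead is shown below. The function name degradeToSlot and the prefix argument are assumptions for illustration; the real value of LDAP_PIU_ID is not given in the question.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the part of dn that starts at the first
// occurrence of slotPrefix. If dn already starts with slotPrefix, or does not
// contain it at all, dn is returned unchanged. The result owns its own
// storage, so no raw pointer into a temporary string escapes.
std::string degradeToSlot(const std::string& dn, const std::string& slotPrefix) {
    if (dn.compare(0, slotPrefix.length(), slotPrefix) == 0) {
        return dn;  // already starts with the slot prefix
    }
    const std::size_t pos = dn.find(slotPrefix);
    if (pos == std::string::npos) {
        return dn;  // prefix not present, keep the original
    }
    return dn.substr(pos);
}
```

The caller keeps the returned std::string in scope for as long as the result is needed and calls c_str() only at the point where a C API requires a char pointer.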

GDB execution error : argc=<value temporarily unavailable, due to optimizations>, argv=0x7fff5fbff8f8

I've got a problem of execution with a C++ program. First of all, I'm working on a MacBook Pro, using native g++ to compile.
My program builds an array of Record*. Each record has a multidimensional key. Then it iterates over each record to find its unidimensional float key.
In the end, given an interval of two multidimensional keys, it determines if a given float corresponds to a multidimensional key in this interval. The algorithm is taken from a research paper, and it is quite simple in implementation.
Up to 100,000 computed values there is no problem; the program does its job. But when I go to 1,000,000 values, execution crashes. Here is the error reported in gdb:
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00007fff5f08dcd0
0x00000001000021ab in TestPyramid () at include/indextree_test.cc:444
Here is the full backtrace given by gdb :
(gdb) backtrace full
#0 0x00000001000021ab in TestPyramid () at include/indextree_test.cc:444
test_records = #1 0x00000001000027be in main (argc=<value temporarily unavailable, due to optimizations>, argv=0x7fff5fbff8f8) at include/indextree_test.cc:83
rc = <value temporarily unavailable, due to optimizations>
progName = 0x7fff5fbff9f8 "/Users/Max/Documents/indextree_test"
testNum = 4
Given lines are calls to the function.
Here is a sample of code :
Record* test_records[1000000];
float values[1000000];
int base = 0;
for (int i(0); i < 1000000; i++)
{
test_records[i] = CreateRecordBasic(i%30+10,i+i%100,"ab","Generic Payload");
if (i%30+10 > base)
base = i%30+10;
if (i+10*i > base)
base = i+10*i;
if (i > base)
base = i;
}
for (int i(0); i < 1000000; i++)
values[i] = floatValueFromKey(test_records[i]->key, base,num_char);
And in the end, I put the relevant float keys in a list.
Is the problem a limitation of my computer? Did I allocate the memory incorrectly?
Thanks for your help,
Max.
Edit :
Here is the code of CreateRecordBasic :
Record *CreateRecordBasic(int32_t attribute_1, int64_t attribute_2, const char* attribute_3, const char* payload){
Attribute** a = new Attribute*[3];
a[0] = ShortAttribute(attribute_1);
a[1] = IntAttribute(attribute_2);
a[2] = VarcharAttribute(attribute_3);
Record *record = new Record;
record->key.value = a;
record->key.attribute_count = 3;
SetValue(record->payload,payload);
return record;
}
Record* test_records[1000000];
float values[1000000];
IMHO, these variables are too big to be stored on the stack, whose size is defined by your environment. values takes up 4 megabytes and test_records may take 4-8 megabytes (depending on pointer size); that is a pretty big amount of stack space. The compiler does not know the exact size of the system-allocated stack (this may change from system to system), so you get the error at run time. Try allocating them on the heap...
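A minimal sketch of the heap-based variant, using std::vector so the storage is released automatically. Record is only forward-declared here as a stand-in for the question's type, and TestBuffers and makeTestBuffers are names made up for this example.

```cpp
#include <cstddef>
#include <vector>

struct Record;  // opaque stand-in for the Record type from the question

// Both arrays live on the heap, owned by the vectors.
struct TestBuffers {
    std::vector<Record*> records;
    std::vector<float> values;
};

// Allocates count entries for each buffer; only the small vector headers
// occupy stack space in the caller.
TestBuffers makeTestBuffers(std::size_t count) {
    TestBuffers buffers;
    buffers.records.assign(count, nullptr);
    buffers.values.assign(count, 0.0f);
    return buffers;
}
```

With this, the loops from the question can index buffers.records[i] and buffers.values[i] exactly as before, and 1,000,000 entries no longer depend on the stack limit.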