this pointer is lost when calling a member method - c++

I have encountered a strange problem when compiling my program using 64-bit g++ 4.7.0 on a Fedora 17 x86_64 machine (the same program works well on a 32-bit Fedora).
The program is too complicated and I cannot figure out an easy way to produce a small code sample. But from the following gdb record, you can see the problem.
Program received signal SIGSEGV, Segmentation fault.
0x000000000042a4b0 in boost::shared_ptr<cppPNML::details::ddObj>::operator!(this=0x100000007)
at /usr/include/boost/smart_ptr/detail/operator_bool.hpp:55
55 return px == 0;
Missing separate debuginfos, use: debuginfo-install gnome-keyring-3.4.1-3.fc17.x86_64
(gdb) bt
#0 0x000000000042a4b0 in boost::shared_ptr<cppPNML::details::ddObj>::operator! (this=0x100000007)
at /usr/include/boost/smart_ptr/detail/operator_bool.hpp:55
#1 0x00000000004202a5 in cppPNML::pnNode::getBBox (this=0xffffffff) at cpp_pnml.cpp:131
#2 0x000000000040eca4 in draw_page (g=..., painter=...) at pnml2pdf.cpp:178
#3 0x000000000040e3b9 in main (argc=2, argv=0x7fffffffe188) at pnml2pdf.cpp:106
(gdb) up
#1 0x00000000004202a5 in cppPNML::pnNode::getBBox (this=0xffffffff) at cpp_pnml.cpp:131
131 if(!p_) return pair<double, double>(0,0);
(gdb) up
#2 0x000000000040eca4 in draw_page (g=..., painter=...) at pnml2pdf.cpp:178
178 boost::tie(w, h) = node.getBBox();
(gdb) p node
$1 = {<cppPNML::pnObj> = {_vptr.pnObj = 0x79a490, p_ = {px = 0x7c40a0, pn = {pi_ = 0x7c4170}}}, <No data fields>}
(gdb) l
173 QRectF bound(0,0,0,0);
174
175 // nodes
176 for(pnNode node = g.front<pnNode>(); node.valid(); node = node.next()) {
177 double h, w, x, y, wa, ha, xa, ya, angle;
178 boost::tie(w, h) = node.getBBox();
179 angle = atan2(h, w);
180 boost::tie(x, y) = node.getPosition();
181 wa = 0; ha = 0; xa = 0; ya = 0;
182
(gdb)
The program being debugged is a graphics printing program (pnml2pdf) that draws a graph to PDF using Qt4.
The object node belongs to class pnNode, which is defined in my own graph data structure library (quite complex, https://github.com/wsong83/cppPNML).
The segmentation fault occurs where the smart pointer appears to be uninitialized.
From the backtrace you can see that the this pointer of node.getBBox() is invalid.
However, printing node from one frame up shows that the node is actually fine.
I am totally confused here.
Does anyone have a clue, or need more code? Thanks in advance!
Update:
Thanks to the advice from @atzz, I am now certain that the this pointer computed for the member method getBBox() is a wrong address. The problem is not caused by an error in the source code (linking the object files directly eliminates the segmentation fault) but by the static library generated with the 64-bit "ar" command (pnNode is defined in a static library rather than in an object file). It now seems that the static library is wrong and causes the wrong this calculation.
Still digging... Will update the result if anyone is still interested to know.

Is this an optimised build or a debug build? Looks to me like it should be failing on line 176 not line 178.
Are you sure the loop is right? Looks like you are going over the end. I suspect your implementation of node.valid() either doesn't do the right thing, or is the wrong thing for the loop test.
The value 0xffffffff looks like a std::iterator end() value, so I think you either need to test your loop against that, or make sure that valid() rejects it: pnObj::valid() const { return p_ != NULL && p_ != 0xffffffff; }
Also the way you are implementing next() just looks wrong. Creating an iterator, searching for the string ID and then calling next() on the iterator?

Related

Mysterious segmentation fault in C++?

I have been searching all over and cannot find anything like this. Now, I won't bore you with my whole program. It's incredibly long. But, here's your basic overview:
int main()
{
    int i=0;
    int h=5;
    cout << "h(IS) = " << h << endl;
    cout << "testing comment.";
    while(i < 10)
    {
        cout << "I'm in the loop!";
        i++;
    }
    return 0;
}
Looks great, right? Okay, so here's the problem. I run it, and I get a segmentation fault. The weirdest part is where I'm getting it. That testing comment doesn't even print. Oh, and if I comment out all the lines before the loop, I still get the fault.
So, here's my output, so you understand:
h(IS) = 5
Segmentation fault
I am completely, and utterly, perplexed. In my program, h calls a function - but commenting out both the line that prints h and the function call have no effect, in fact, all it does is give the segmentation fault where the line ABOVE the printing h line used to be.
What is causing this fault? Anything I can do to test where it's coming from?
Keep your answers simple please, I'm only a beginner compared to most people here :)
Note: I can provide my full code upon request, but it's 600 lines long.
EDIT: I have pasted the real code here: http://pastebin.com/FGNbQ2Ka
Forgive the weird comments all over the place - and the arrays. It's a school assignment and we have to use them, not pointers. The goal is to print out solutions to the 15-Puzzle. And it's 1 AM, so I'm not going to fix my annoyed comments throughout the thing.
I most recently got irritated and commented out the whole first printing just because I thought it was something in there...but no...it's not. I still get the fault. Just with nothing printed.
For those interested, my input information is 0 6 2 4 1 10 3 7 5 9 14 8 13 15 11 12
THANK YOU SO MUCH, EVERYONE WHO'S HELPING! :)
You slip over array boundaries, causing the corruption:
for (i=0; i<=4; i++)
{
    for (j=0; j<=4; j++)
    {
        if (cur[i][j] == 0)
        {
            row = i;
            col = j;
        }
    }
}
Your i and j indices must not reach 4.
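A minimal sketch of the corrected loop, assuming cur is a 4x4 int array as in the posted program. The function name find_zero and the constant N are hypothetical, introduced only for illustration. Unlike the original loop, this version returns at the first zero it finds.

```cpp
// Hypothetical helper; N and find_zero are not names from the posted code.
const int N = 4;  // the grid is 4x4, so valid indices are 0..N-1

// Stores the position of the first zero tile in row/col.
// Returns false (leaving row/col untouched) if the grid has no zero.
bool find_zero(const int (&cur)[N][N], int &row, int &col) {
    for (int i = 0; i < N; i++) {        // i < N, not i <= N
        for (int j = 0; j < N; j++) {    // j < N, not j <= N
            if (cur[i][j] == 0) {
                row = i;
                col = j;
                return true;
            }
        }
    }
    return false;
}
```

Using a named constant for the bound keeps the loop condition and the array declaration from drifting apart.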
valgrind is a great tool for debugging memory access problems. It's very easy to use on Linux. Just install G++ and valgrind, and then run (without the $ signs):
$ g++ -g -o prog prog.cpp
$ valgrind ./prog
It will print very detailed error messages about memory access problems, with source code line numbers. If those still don't make sense to you, please post the full source code (prog.cpp) and the full output of valgrind.
I've run valgrind for you, its output is here: http://pastebin.com/J13dSCjw
It seems that you use some values which you don't initialize:
==21408== Conditional jump or move depends on uninitialised value(s)
==21408== at 0x8048E9E: main (prog.cpp:61)
...
==21408== Conditional jump or move depends on uninitialised value(s)
==21408== at 0x804A809: zero(int (*) [4], int (*) [4], int*, int, int, int, int, int, int) (prog.cpp:410)
==21408== by 0x804A609: lowest(int (*) [4], int (*) [4], int, int, int, int, int, int) (prog.cpp:354)
==21408== by 0x804932C: main (prog.cpp:125)
...
To fix these problems, add code which initializes the variables depicted in the error lines above (e.g. line 61, 410), then recompile, and rerun with valgrind again, until all errors disappear.
If your program behaves weirdly even after fixing all problems reported by valgrind, please let us know.
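To illustrate what "initialize the variables" means for this kind of report, here is a small sketch. The helper smallest is hypothetical and is not taken from prog.cpp. It only shows the pattern of giving a variable a value before the first branch that reads it.

```cpp
// Hypothetical helper, not from the posted program.
// Returns the smallest value in a 4x4 grid.
int smallest(const int (&grid)[4][4]) {
    // Initialized before the first comparison. Declaring "int best;" and then
    // testing "grid[i][j] < best" is the kind of branch valgrind reports as
    // "Conditional jump or move depends on uninitialised value(s)".
    int best = grid[0][0];
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
            if (grid[i][j] < best) {
                best = grid[i][j];
            }
        }
    }
    return best;
}
```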
Lines 57 - 67:
for (i=0; i<=4; i++)
{
    for (j=0; j<=4; j++)
    {
        if (cur[i][j] == 0)
        {
            row = i;
            col = j;
        }
    }
}
At least one of your errors is in this code. cur is declared as int cur[4][4]; which means that when j == 4 (or i == 4) you are outside the bounds of the array (some of those accesses still land within the array's memory, but not all). Valid index values are 0 to 3.

Eigen Jacobi causing odd segfault in c++

So I have the following lines in my code:
MatrixXd qdash = zeroCentredMeasurementPointCloud_.topLeftCorner(3, zeroCentredMeasurementPointCloud_.cols());
Matrix3d H = q * qdash.transpose();
Eigen::JacobiSVD<MatrixXd> svd(H, Eigen::ComputeThinU | Eigen::ComputeThinV);
Now I am sure that qdash and H are being initialised correctly (q is also, just elsewhere). The last line, involving Eigen::JacobiSVD causes the program to throw this error when it is left in:
Program received signal SIGSEGV, Segmentation fault.
0xb0328af8 in _list_release () from /usr/qnx650/target/qnx6/x86/lib/libc.so.3
0 0xb0328af8 in _list_release () from /usr/qnx650/target/qnx6/x86/lib/libc.so.3
1 0xb032a464 in __free () from /usr/qnx650/target/qnx6/x86/lib/libc.so.3
2 0xb0329f7d in free () from /usr/qnx650/target/qnx6/x86/lib/libc.so.3
I.e. it is segfaulting when trying to free memory, I guess. Now, according to the tutorial here, all I should have to do to use this functionality is this:
MatrixXf m = MatrixXf::Random(3,2);
JacobiSVD<MatrixXf> svd(m, ComputeThinU | ComputeThinV);
Can anyone see why it is failing in my case?
OK, so this is super crazy. It turns out I was using Eigen alignment, which doesn't really work on my operating system. This caused an error whose location would change depending on the size of the executable that was produced.
The moral of the story is be careful with your includes.

gdb: break when a particular object is altered

I have an object defined in C++ with a pointer to it used in various functions and files throughout the project. I am having an issue with the data being updated, so I want to debug it to see what is happening. Ideally, I want to break every time the object is accessed. However, watch requires a specific memory address. So, for example, if I have:
class data{
public:
    int a;
    int b;
};
then gdb will only break when a is altered, since the pointer to data is pointed at a, but not when b is altered.
Is there a way to break whenever the entire range of memory covered by the data class is altered?
Is there a way to break whenever the entire range of memory covered by the data class is altered?
Perhaps.
GDB hardware watchpoints use special debug registers in the hardware, and there is usually a limit on how such registers work. On x86, you can set up to 4 word-sized hardware watchpoints, so for the example you gave, you can set watchpoints on &data->a and &data->b, and that will "cover" the entire memory of the data.
I am guessing that your actual data has many more members though, and so 4 word-sized watch points will not suffice.
If you are on a platform that has Valgrind support, and if your program can execute under Valgrind, then you can use Valgrind's built-in gdbserver to set watchpoints on arbitrary regions of memory.
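The arithmetic behind "4 word-sized watchpoints will not suffice" can be sketched as follows. The helper names and the constant are hypothetical; the limit of 4 is the x86 figure stated above, and the calculation assumes the watched region starts on a word boundary.

```cpp
#include <cstddef>

// The two-int class from the question.
class data {
public:
    int a;
    int b;
};

// Hypothetical helper: how many word-sized watchpoints are needed to cover
// `bytes` bytes, assuming the region starts on a word boundary.
constexpr std::size_t watchpoints_needed(std::size_t bytes, std::size_t word) {
    return (bytes + word - 1) / word;  // round up
}

// Assumed hardware limit, taken from the x86 figure quoted in the answer.
constexpr std::size_t kMaxHardwareWatchpoints = 4;

// True if the whole region can be covered with hardware watchpoints alone.
constexpr bool fits_in_hardware(std::size_t bytes, std::size_t word) {
    return watchpoints_needed(bytes, word) <= kMaxHardwareWatchpoints;
}
```

For the two-int class, two int-sized watchpoints are enough. A 1024-byte buffer would need far more than four, which is where the Valgrind gdbserver approach becomes useful.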
Update:
I looked through the page you linked to and couldn't find what I was looking for
I am not sure what you were looking for. Here is a sample session showing how it works:
#include <stdlib.h>

void foo(char *p)
{
    *p = 'a';
}

typedef struct {
    char buf[1024];
} data;

int main()
{
    data *d = calloc(1, sizeof(data));
    foo(d->buf + 999);
}
gcc -g main.c
valgrind --vgdb-error=0 ./a.out
...
==10345== TO DEBUG THIS PROCESS USING GDB: start GDB like this
==10345== /path/to/gdb ./a.out
==10345== and then give GDB the following command
==10345== target remote | vgdb --pid=10345
... Valgrind now waits for debugger to attach.
In another window:
gdb ./a.out
GNU gdb (GDB) 7.4
...
(gdb) target remote | vgdb --pid=10345
relaying data between gdb and process 10345
[Switching to Thread 10345]
0x0000000004000af0 in _start () from /lib64/ld-linux-x86-64.so.2
(gdb) b main
Breakpoint 1 at 0x40053d: file main.c, line 14.
(gdb) c
Breakpoint 1, main () at main.c:14
14 data *d = calloc(1, sizeof(data));
(gdb) n
15 foo(d->buf + 999);
(gdb) watch *d
Hardware watchpoint 2: *d
Note that a "hardware" watchpoint has been set on entire *d.
It's a hardware watchpoint only in the sense that Valgrind is the hardware.
(gdb) p d.buf[999]
$1 = 0 '\000'
(gdb) c
Hardware watchpoint 2: *d
Old value = {buf = '\000' <repeats 1023 times>}
New value = {buf = '\000' <repeats 999 times>, "a", '\000' <repeats 23 times>}
foo (p=0x51b6457 "a") at main.c:6
6 }
(gdb) q
Voilà: the debugger stopped when buf[999] was modified, proving that the watchpoint "covered" the entire structure.

Segmentation fault in _dl_runtime_resolve()

I am doing simple string operations in code where I am getting a segmentation fault. I cannot work out what the exact problem is.
Please take a look if you can help.
The backtrace of the core is
(gdb) bt
#0 0x00007f595dee41da in _dl_fixup () from /lib64/ld-linux-x86-64.so.2
#1 0x00007f595deea105 in _dl_runtime_resolve () from /lib64/ld-linux-x86-64.so.2
#2 0x0000000000401d04 in getNodeInfo (node=0x7fffbfb4ba83 "TCU-0")
at hwdetails.cpp:294
#3 0x0000000000402178 in main (argc=3, argv=0x7fffbfb4aef8)
at hwdetails.cpp:369
The crash occurs at line 294, at the cout statement.
LdapDN is a char * and is not NULL.
if ( Epath && (Epath->Entry[0].EntityType == SAHPI_ENT_UNSPECIFIED ||
               Epath->Entry[0].EntityType == SAHPI_ENT_ROOT )) {
    // nothing is mapped. Degrade the ldap dn path to slot.
    if(LdapDN){
        std::cout << "LdapDN " << LdapDN << std::endl;
    }
    std::string ldapDN;
    ldapDN = LdapDN;
    std::string slot = LDAP_PIU_ID;
    if ( ldapDN.compare(0, slot.length(), slot) != 0 ) {
        size_t pos = ldapDN.find(slot);
        if ( pos != std::string::npos ) {
            ldapDN = ldapDN.substr(pos);
            LdapDN = (char *)ldapDN.c_str();
            //getEntityPathFromLdapDn(ldapDN.c_str(), epath, domid);
        }
    }
}
A crash in _dl_fixup generally means that you have corrupted the state of runtime loader.
The two most common causes are:
Heap corruption (overflow) or
Mismatched parts of glibc itself.
If you are not setting e.g. LD_LIBRARY_PATH to point to a non-standard glibc, then we can forget about reason #2.
For #1, run your program under Valgrind, and make sure it detects no errors.
If in fact it doesn't, use disas and info registers GDB commands, update your question with their output, and you may receive additional help.
This is a problem with the GOT. _dl_runtime_resolve is the procedure that updates the GOT (global offset table) when a function from a dynamic library is called for the first time. Subsequent calls use the updated GOT entry.
When a function from a dynamic library (for example printf() from libc.so) is called in your code for the first time:
go to the PLT (procedure linkage table). The PLT is a trampoline that gets the address of the function being called from the GOT.
from the PLT, jump through the GOT entry
return to the PLT, because the entry has not been resolved yet
call _dl_runtime_resolve
store the actual function address in the GOT
call the function from the dynamic library
The second time the function is called:
go to the PLT
go to the GOT
the GOT entry now holds the function's address in the dynamic library, so the jump goes straight there.
In other words, once the GOT entry is resolved, later calls reach the function quickly without going through _dl_runtime_resolve again.
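The lazy-binding sequence above can be modelled with a plain function pointer. This is only a toy model under stated assumptions, not how glibc implements it: got_slot, resolver_stub, call_via_plt, real_function and resolve_count are all hypothetical names standing in for the GOT entry, the _dl_runtime_resolve path, the PLT stub and the library function.

```cpp
// Toy model of lazy binding. Every name here is hypothetical.
using Fn = int (*)(int);

// Stands in for the function that lives in the dynamic library.
int real_function(int x) { return x * 2; }

// Stands in for the resolver path (the role played by _dl_runtime_resolve).
int resolver_stub(int x);

// The "GOT entry": it starts out pointing at the resolver path.
Fn got_slot = resolver_stub;

// Counts how often the resolver ran, to show it runs only once.
int resolve_count = 0;

int resolver_stub(int x) {
    ++resolve_count;
    got_slot = real_function;  // patch the GOT entry with the real address
    return got_slot(x);        // then continue into the real function
}

// The "PLT stub": always an indirect jump through the GOT entry.
int call_via_plt(int x) { return got_slot(x); }
```

The first call_via_plt call goes through resolver_stub and patches got_slot; every later call jumps straight to real_function. If the memory holding the real table is corrupted, the resolver reads garbage, which is consistent with a crash inside _dl_fixup.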
I see a memory leak here:
You essentially lose your previous string LdapDN when you do
if ( pos != std::string::npos ) {
    ldapDN = ldapDN.substr(pos);
    LdapDN = (char *)ldapDN.c_str();
    //getEntityPathFromLdapDn(ldapDN.c_str(), epath, domid);
}
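A separate hazard in the same lines, independent of whether anything leaks: the pointer returned by c_str() is only valid while the std::string it came from is alive and unmodified, and ldapDN is a local variable of the enclosing block. A minimal sketch that avoids keeping a raw pointer by returning an owning std::string instead; degradeToSlot and the sample DN strings in the usage note are hypothetical, not part of the posted code.

```cpp
#include <string>

// Hypothetical helper: if `dn` does not already start with `slot`, cut it
// down to the part beginning at the first occurrence of `slot`.
// The result owns its characters, so nothing dangles when the caller's
// temporaries go out of scope.
std::string degradeToSlot(const std::string& dn, const std::string& slot) {
    if (dn.compare(0, slot.length(), slot) != 0) {
        std::string::size_type pos = dn.find(slot);
        if (pos != std::string::npos) {
            return dn.substr(pos);
        }
    }
    return dn;  // already starts with slot, or slot not present
}
```

For example, degradeToSlot("rack=1,slot=2,port=3", "slot=") yields "slot=2,port=3". The caller keeps the returned std::string for as long as the text is needed and calls c_str() only at the point of use.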

Strange backtrace - where is the error?

I'm developing an image processing application in C++. I've seen a lot of compiler errors and backtraces, but this one is new to me.
#0 0xb80c5430 in __kernel_vsyscall ()
#1 0xb7d1b6d0 in raise () from /lib/tls/i686/cmov/libc.so.6
#2 0xb7d1d098 in abort () from /lib/tls/i686/cmov/libc.so.6
#3 0xb7d5924d in ?? () from /lib/tls/i686/cmov/libc.so.6
#4 0xb7d62276 in ?? () from /lib/tls/i686/cmov/libc.so.6
#5 0xb7d639c5 in malloc () from /lib/tls/i686/cmov/libc.so.6
#6 0xb7f42f47 in operator new () from /usr/lib/libstdc++.so.6
#7 0x0805bd20 in Image<Color>::fft (this=0xb467640) at ../image_processing/image.cpp:545
What's happening here? The operator new is crashing, OK. But why? It's not an out-of-memory condition (it tries to allocate about 128Kb, a 128x64 pixel image with two floats each). Also, it doesn't seem to be an error in my own code (the constructor doesn't get touched!).
The code in the mentioned line (#7) is:
Image<Complex> *result = new Image<Complex>(this->resX, resY);
// this->resX = 128, resY = 64 (both int), Complex is a typedef for std::complex<float>
Almost the same instantiation works in other places in my code. If I comment out this part of the code, it will crash a bit later on a similar part. I don't understand it, and I don't have any idea how to debug it. Any help?
Compiler is gcc 4.3.3, libc is 2.9 (both from Ubuntu Jaunty)
Update:
I've included the following lines just above the faulty line in the same method and in main()
Image<Complex> *test = new Image<Complex>(128, 64);
delete test;
The strange thing: in the same method it will crash, in main() it won't. As I mentioned, Complex is a typedef of std::complex<float>. The constructor doesn't get called, I've inserted a cout just before this line and in the constructor itself.
Update 2:
Thanks to KPexEA for this tip! I tried this:
Image<Complex> *test = new Image<Complex>(128, 64);
delete test;
kiss_fft_cpx *output = (kiss_fft_cpx*) malloc( this->resX * this->resY/2 * sizeof(kiss_fft_cpx) );
kiss_fftndr( cfg, input, output );
Image<Complex> *test2 = new Image<Complex>(128, 64);
delete test2;
It crashes at - can you guess? - test2! So the malloc for my kissfft seems to be the faulty one. I'll take a look at it.
Final update:
Ok, it's done! Thanks to all of you!
Actually, I should have noticed it before. Last week, I noticed that kissfft (a fast Fourier transform library) made a 130x64 pixel fft image from a 128x128 pixel source image. Yes, 130 pixels wide, not 128. Don't ask me why, I don't know! So, 130x64x2xsizeof(float) bytes had to be allocated, not 128x64x... as I thought before. Strange that it didn't crash just after I fixed that bug, but some days later.
For the record, my final code is:
int resY = (int) ceil(this->resY/2);
kiss_fft_cpx *output = (kiss_fft_cpx*) malloc( (this->resX+2) * resY * sizeof(kiss_fft_cpx) );
kiss_fftndr( cfg, input, output );
Image<Complex> *result = new Image<Complex>(this->resX, resY);
Thanks!
craesh
Perhaps a previously allocated chunk of memory has a buffer overflow that is corrupting the heap?
You are not allocating enough memory. The half-spectrum format of kissfft (and FFTW and IMKL for that matter) contains X*(Y/2+1) complex elements.
See the kiss_fftndr.h header file:
/*
input timedata has dims[0] X dims[1] X ... X dims[ndims-1] scalar points
output freqdata has dims[0] X dims[1] X ... X dims[ndims-1]/2+1 complex points
*
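The size rule quoted from the header can be turned into a small calculation. This is a sketch only: half_spectrum_points is a hypothetical helper, not part of kissfft, and it assumes dims is passed in the same order as to the transform, so that the last dimension is the one that is halved.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not a kissfft function.
// Number of complex output points for a real N-dimensional transform in
// half-spectrum layout: dims[0] x dims[1] x ... x (dims[ndims-1]/2 + 1).
std::size_t half_spectrum_points(const std::vector<int>& dims) {
    if (dims.empty()) {
        return 0;
    }
    std::size_t points = 1;
    // All dimensions except the last keep their full length.
    for (std::size_t i = 0; i + 1 < dims.size(); ++i) {
        points *= static_cast<std::size_t>(dims[i]);
    }
    // The last dimension shrinks to n/2 + 1 complex points.
    points *= static_cast<std::size_t>(dims.back() / 2 + 1);
    return points;
}
```

Multiplying the result by sizeof(kiss_fft_cpx) gives the number of bytes to pass to malloc for the output buffer. For dims of 128 and 64 this is 128 * 33 = 4224 complex points, more than the 128 * 32 = 4096 a plain "half the size" estimate would give.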