printing variables in GDB seems to change code behavior? - c++

I'm debugging a large, ancient, piece of code that we just upgraded the OS/driver for, the entire thing is running 32 bit. The original developers of the code are long gone and much of it is still a black box to me.
I'm running it on the debugger. I narrowed down on a particular if statement within a larger loop, I need the 'else' part of the for loop to run to update some variables, but it was never running; implying that the variable that is being checked in the 'if' statement is always true.
Eventually I stepped into the method call (a simple getter on a private boolean) and printed the content of the variable. When I print the variable it is false, and the 'else' method will be entered when I return.
To experiment I've tried allowing the loop to run for 10 minutes, the 'else' method is never entered (as indicated by a breakpoint not being hit). Then when I print the variable being checked it's false and the variable is entered. It doesn't matter how long I let it run, or how many times I break and continue before printing the variable, the same pattern holds, I enter the 'else' method IFF I print the content of the variable that is being checked first.
To rule out some sort of datarace I've tried sitting at the breakpoint in question for the length of time it takes to do a print statement, a delay without a print doesn't result in entering the 'else' method.
What could cause such an odd behavior? Since we had issues with different architectures, running a 32 bit program on a 64 bit OS and, more importantly, the driver that it uses was not tested for 32 bit for years until they recompiled the driver for me under a 32 bit architecture, which would make me suspect the driver except that particular line of code that is misbehaving isn't touching the driver in any way. Still I suspect some sort of overflow or underflow may be happening due to a confusion caused by trying to force an old 32 bit program to run.
However, even assuming this could cause such an odd behavior, I don't know how to confirm if that is happening or otherwise debug a program where the act of looking at it changes it's behavior. I'd love any tip on what could cause such a problem or how I could move forward with debugging it.
Dammit Jim I'm a programmer, not a Quantum Mechanic!

Related

Writing out arrays fixes a problem for a reason unknown to me

When I run certain programs using FORTRAN 77 with the GNU gfortran compiler I have come across the same problem several times and I'm hoping someone has insight. The value I want should be ~ 1, but it writes out at the end of a program as something well over 10^100. This is generally a problem restricted to arrays for me. The improper value often goes away when I write out the value of this at some stage in the program before (which inevitably happens in trying to troubleshoot the issue).
I have tried initializing arrays explicitly and have tried some array bound checking as well as some internal logical checks in my program. The issue, to me and my research supervisor, seems to be pathological.
WRITE(*,*)"OK9999", INV,INVV
vs
WRITE(*,*)"OK9999", INV,INVV,NP,I1STOR(NP),I2STOR(NP)
The former gives incorrect results for what I would expect for the variables INV and INVV, the latter correct. This is the 'newest' example of this problem, which has on and off affected me for about a year.
The greater context of these lines is:
WRITE(*,*)"AFTER ENERGY",I1STOR(1),I2STOR(1)
DO 167 NP=1!,NV
IF(I1STOR(NP).NE.0) THEN
INV = I1STOR(NP)
INVV = I2STOR(NP)
WRITE(*,*)"OK9999", INV,INVV,NP,I1STOR(NP),I2STOR(NP)
PAUSE
ENDIF
I1STOR(1) and I2STOR(1) are written correctly in the first case "AFTER ENERGY" above. If I write out the value of NP after the DO 167 line, this will also remedy the situation.
My expectation would be that writing out a variable wouldn't affect its value. Often times I am doing large, time-intensive calculations where the ultimate value is way off and in many cases it has traced back to the situation where writing the value out (to screen or file) magically alleviates the problem. Any help would be sincerely appreciated.

GDB step until function returns error?

I have a function that calls a big chunk of code that isn't mine. I'm trying to debug it, but just stepping until it returns an error has been slow and painful.
Is there a way to break when the function is "about to" return a negative number? I think I can set a breakpoint on the return register, but I only want it for the current thread.. and ideally, it would be nice to break at "return -1;" so the stack and function variables are still readable.
(I tried using the reverse debugging, which would be perfect, but gdb 8.1 crashes when I enable reverse debugging on my ubuntu 16 system.)

can the return value from finish in gdb be different from the actual one in execution

I am a gdb novice, and I was trying to debug some GSSAPI code, and was using fin to see the return value from the frame. As seen in the snip pasted below, the call from gssint_mechglue_initialize_library() seems to be 0 but the actual check seems to fail. Can someone please point out if I am missing something obvious here?
Thanks in advance!
One possible explanation for the observed behavior is that you are debugging optimized code, and that line 1001 isn't really executed.
You can confirm this with a few nexts, or by executing fin again and observing whether GSS_S_COMPLETE or something else is returned from gssint_select_mech_type.
When optimization is on, code motion performed by the optimizer often prevents correct assignment of actual code sequences to line numbers (as instructions "belonging" to different lines are mixed and re-ordered). This often causes the code to "jump around" when e.g. doing nexti command.
For ease of debugging, recompile with -O0, or make sure to remove any -O2 and the like from your compile lines.

std::cout printing forbidden system info to the console

I have a c++ program which runs continuously and in order to debug it I print some values to the console using cout. This has been working fine for the past week.
Now however, after about 30 seconds it will print information that it should not have access to, and stops printing what I tell it to completely.
Here is an example:
Object 104 dist: 0.111448
Object 104 dist: 0.113334
Object 104 dist: 0.0950752
JRE_HOME=/jreLS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;3^C
In the above example, I tell the program to print the "Object dist" on every iteration, then suddenly it spits out this JRE_HOME info that it should not have access to, then stops printing all together. Apart from the printing stopping, the program continues to run fine.
I am worried that I might have some kind of memory leak or buffer overflow, but I do not know how to diagnose this as I am fairly new to C++ and have no idea what might be causing this (I have looked through my recent code changes and nothing stands out as an obvious cause).
I have restarted my computer but each time it just prints out more random system info which it should not have access to which again leads me to believe it is reading from some part of system memory.
What could be causing this?
that it should not have access to
You are mistaken, each process has a copy of the environment block in its memory space.
You are seeing snippets of environment variables because you are using an invalid pointer, that happens to land within the environment block instead of the intended data.

c++ crashing on incrementing an unsigned long int

This one is WTF city.
The below program is crashing after a few thousand loops.
unsigned long int nTurn = 1;
bool quit = false;
int main(){
while(!quit){
doTurn();
++nTurn;
}
}
That's, of course, simplified from my game, but nTurn is at the moment used nowhere but the incrementing of it, and when I comment out the ++nTurn line, the program will reliably loop forever. Shouldn't it run into the millions?
WTF, stackoverflow?
Your problem is elsewhere.
Some other part of the program is reading from a wild pointer that ends up pointing to nTurn, and when this loop changes the value the other code acts different. Or there's a race condition, and the increment makes this loop take just a tiny bit longer so the race-y thing doesn't cause trouble. There are an infinite number of things you could have wrong elsewhere.
Can you run your program under valgrind? Some errors it won't find, but a lot it will.
may sound silly, but, if I were looking at this, I'd perhaps output the nTurn var and see if it were always crashing out on the value. then, perhaps initialize nTurn to that & see if that also causes it. you could always put this into debug and see what is happening with various registers and so forth. did you try different compilers ?
I'd use a debugger to catch the fault and see the value of nTurn. Or if you have core dump from the crash load it into the debugger to see the var values at the crash time.
One more point, could the problem be when nTurn wraps round and goes to zero?
++nTurn can't be the source of the crash directly. You may have some sort of buffer overflow causing the memory for the nTurn variable to be accessed by pointer arithmetic when it shouldn't be. That would cause weird behavior when combined with the incrementing.