I am developing a computational geometry application in C++. It runs in parallel using threads and OpenMP. It takes geometric input (nodes, edges, etc.) and produces an output. This almost always works perfectly. However, in roughly 1% of runs I get a messed-up result: the application doesn't crash, but the output contains what look like random memory values. Even if I run on the same data again, the second run works fine. I used valgrind and helgrind, but they didn't detect any related error, so I am starting to run out of ideas for how to trace it. Is there another tool that detects threading errors better than helgrind? Or is there any way to reproduce such a problem and record the exact state that led to the bug?
Disclaimer: I have not used the approach below with OpenMP, but based on what I just looked up it seems to be possible.
I have had a similar bug I needed to reproduce in GDB. This post helped me to run the application indefinitely until a segmentation fault occurred.
We could adapt that answer to your question by adding a conditional breakpoint that hits when the output value is not as expected.
set pagination off
break exit
commands
run
end
break file.cpp:123 if some_condition_holds
Now, if you run the above in GDB, the program is restarted every time it exits and keeps running until the bad result occurs (some_condition_holds is true). Then we can switch to the relevant thread by using the thread commands (inferiors are GDB's term for processes, not threads):
info threads
thread thread_num
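For the breakpoint condition, it can help to have a single function that decides whether the output "is not as expected". The following is only a sketch under assumptions: the `Node` type, the function name `output_looks_corrupt`, and the `limit` parameter are all hypothetical, and the check (non-finite or implausibly large coordinates) is just one guess at what "random memory values" look like in geometric output.

```cpp
#include <cmath>
#include <initializer_list>
#include <vector>

// Hypothetical output type: one coordinate triple per node.
struct Node {
    double x, y, z;
};

// Returns true if any coordinate is NaN, infinite, or lies outside the
// expected range [-limit, limit]. The range check is an assumption about
// what valid data looks like; adjust it to the real application.
bool output_looks_corrupt(const std::vector<Node>& nodes, double limit) {
    for (const Node& n : nodes) {
        for (double v : {n.x, n.y, n.z}) {
            if (!std::isfinite(v) || std::fabs(v) > limit) {
                return true;
            }
        }
    }
    return false;
}
```

One way to use it is to store the result in a local variable right before the breakpoint line, for example `bool corrupt = output_looks_corrupt(nodes, 1e6);`, and then use `corrupt` as the breakpoint condition in place of some_condition_holds.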
My project is quite large and multithreaded. There seems to be a bug that crashes the whole program.
In the release build it sometimes gets stuck, but this does not happen very often.
In the debug build it is more likely to appear. The stack trace from gdb is the following.
#0 clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:81
#1 0x00007dff8270c700 in ?? ()
#2 0x00007ffff6dde38d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
This information is not enough for me to locate the buggy code.
So my question is: how can I get more information from the crash? Is there any advanced use of gdb, or are there other advanced tools?
============= Update ==============
One more piece of information: after printing out all the thread ids, I figured out which thread crashed. The only thing different about this thread is that it is detached from its std::thread object. If anyone has any experience with this, please tell me.
============= Update2 ================
This problem is not solved yet, and it turns out to be a severe one.
If I run the program in a terminal, it crashes the whole terminal and all other programs currently running under my username.
The system is then down and not accessible by ssh for a while. Some other users get broken pipes, and it seems my program has made sshd unresponsive.
After a while I'm able to log in again, and I find that the binary file of the program is broken (truncated) and needs to be recompiled.
To me it looks like a memory or stack overwrite, or an access to dead pointers or objects.
To catch these kinds of errors I like to use tools like Electric Fence (efence) or valgrind. With current compilers you can also use ThreadSanitizer (-fsanitize=thread) or AddressSanitizer (-fsanitize=address), both of which work with clang and g++. MemorySanitizer (-fsanitize=memory) is available in clang only.
If you cannot catch the problem with those, you should also install the debug versions of the standard libraries. Sometimes a wrong value causes a crash inside libstdc++ or some other library, which is hard to debug. With the debug info installed you can track this down much more easily.
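One common source of "dead object" accesses, which may or may not be the cause here, is a detached thread that keeps a reference or pointer to something owned by the function that started it. The sketch below shows one way to avoid that, assuming the thread's input can be copied or moved: the detached thread owns its data through a `std::shared_ptr` and reports back through a `std::promise`. The function name `start_detached_length` and the string workload are hypothetical stand-ins.

```cpp
#include <cstddef>
#include <future>
#include <memory>
#include <string>
#include <thread>
#include <utility>

// Hypothetical helper: starts a detached thread that owns everything it
// touches, so it never reads the caller's stack after the caller returns.
std::future<std::size_t> start_detached_length(std::string text) {
    // The thread shares ownership of the data instead of referencing a local.
    auto data = std::make_shared<const std::string>(std::move(text));

    std::promise<std::size_t> done;
    std::future<std::size_t> result = done.get_future();

    // Capture by value / by move only; a [&] capture here would dangle
    // as soon as this function returns.
    std::thread([data, done = std::move(done)]() mutable {
        done.set_value(data->size());
    }).detach();

    return result;
}
```

Calling `start_detached_length("abc").get()` blocks until the detached thread has delivered its result. Capturing by reference instead would be the kind of bug that AddressSanitizer or valgrind are designed to report.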
I have a really large code and when I try to run it in codelite, the codelite interface becomes non-responsive and I have to kill it. This usually happens in case of infinite loops.
I tried to put breakpoints in multiple places of the code to find the problem, but no luck so far. The execution halts after a while from the time that I start running the program. What is the best way of detecting such infinite loops? Codelite doesn't have a "stop" button AFAIK.
EDIT:
I ended up adding a lot of cout statements and ran the executable in a terminal rather than gdb. This helped finding what the program is doing after a really long time.
The simplest approach is to run the code for a while and then use the debugger to suspend execution without using breakpoints. If you are lucky, the call stack should indicate the bit of code that you are getting stuck in.
Failing that you will need to pepper your code with logging statements.
I'm writing a software renderer in g++ under mingw32 in Windows 7, using NetBeans 7 as my IDE.
I've been needing to profile it of late, and this need has reached critical mass now that I'm past laying down the structure. I looked around, and to me this answer shows the most promise in being simultaneously cross-platform and keeping things simple.
The gist of that approach is that possibly the most basic (and in many ways, the most accurate) way to profile/optimise is to simply sample the stack directly every now and then by halting execution... Unfortunately, NetBeans won't pause. So I'm trying to find out how to do this sampling with gdb directly.
I don't know a great deal about gdb. What I can tell from the man pages though, is that you set breakpoints before running your executable. That doesn't help me.
Does anyone know of a simple approach to getting gdb (or other gnu tools) to either:
Sample the stack when I say so (preferable)
Take a whole bunch of samples at random intervals over a given period
...given my stated configuration?
Have you tried simply running your executable in gdb, and then just hitting ^C (Ctrl+C) when you want to interrupt it? That should drop you to gdb's prompt, where you can simply run the where command to see where you are, and then carry on execution with continue.
If you find yourself in an irrelevant thread (e.g. a looping UI thread), use thread, info threads and thread n to go to the correct one, then execute where.
I am trying to compile a rather big application on Solaris. Compiling it on AIX caused a problem that the command line buffer was too small (ARG_MAX).
On Solaris it compiles most of the application successfully, but then it just hangs: no error is reported, and nothing happens for at least an hour.
I am running it on SunOS 5.10 Sparc 32 bit.
Any ideas on how to find out what's going on or what might be causing such behavior?
I can't tell if the compilation is hanging, or your app itself.
If the app is hanging just follow the usual debugging steps: Either run it in your debugger and watch when it dies, or add print statements.
If the compiler dies, does it always die on the same file? If you compile that file by itself does it still hang? If so, try trussing the compiler when you try to build the file that hangs. You may find that it's blocking on I/O waiting for some nonexistent file or something similar.
What you may have to do is:
1. Comment out or delete 99% of the code and compile that.
2. Add around 5% of the code back in and compile that.
3. If the last thing you added caused the hour-long hang, split it up.
4. Go back to step 2.
Just for those who encounter this in future.
The problem was that the optimization flag causes compilation to take a REALLY long time. I am talking 1+ hour for one cpp file.
This is a big project.
In addition, there was an issue with the sysadmin of the Sun box not giving me enough CPU share.
Increasing that solved the problem, or at least made compilation finish within reasonable time bounds.
I hope this helps
I am debugging an iPhone program with the simulator in Xcode and I have one last issue to resolve, but I need help with it for the following reason: when it happens, the program drops into the debugger but no error appears (no BAD ACCESS) and it does not show where the code fails. Making some variables global helps me see their values and start pinpointing where the bug is, but before I go into this fully I would like to know what techniques/tools you guys use to debug these situations.
If it helps, I'm debugging the following: I merged some code into the SpeakHere demo. The code was added in the C++ modules of the program (AQRecorder.h and .mm). I seem to have pinpointed the problem code in a function I wrote.
My favourite is always to add debugging code and log it to a file. This allows me to report any and all information I need to resolve the issue if the debugger is not working properly.
I normally control the debugging code by use of a flag which I can manipulate at run time or by the command line.
If the error is (and it probably is) a memory management issue, printing log entries is really not going to help.
I would recommend learning how to use Instruments, and using its tools to track down the memory problem when it occurs rather than waiting until the application crashes later on.