There is a weird issue with some code I'm working with:
As far as I know, and as far as I have checked, nothing in the code relies on anything random, such as the system time.
Yet just adding a breakpoint is enough to change the behavior of the program. It doesn't do anything wrong, it just behaves differently: the code selects among several objects by some value, which is 0 for all of them, and when breakpoints are added it simply picks a different one of those objects.
Unfortunately it's not possible to post any code, because there is way too much of it.
What could cause this kind of behavior?
Edit:
I spent some more time on the problem. It doesn't seem to be related to breakpoints themselves; the "easiest" way to get different results is this: when I start the debug process from a different .cpp file of the project, the output already varies. When I disable "Build automatically", this no longer happens. I conclude from this that something is rebuilt even when there are no changes in the code.
I also narrowed it down to the class that actually causes the different behavior: it's an implementation of a Fibonacci heap.
If you are not running on Windows, consider using Valgrind (check its list of supported platforms). Its Memcheck tool runs your program (more slowly), watching every memory reference to track down uses of uninitialized variables.
Memcheck can detect if your program:
Accesses memory it shouldn't (areas not yet allocated, areas that have been freed, areas past the end of heap blocks, inaccessible areas of the stack).
Uses uninitialised values in dangerous ways.
Leaks memory.
Does bad frees of heap blocks (double frees, mismatched frees).
Passes overlapping source and destination memory blocks to memcpy() and related functions.
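To illustrate the "uninitialised values" case, here is a minimal sketch of a Fibonacci-heap node in which every field is initialized. The struct and its field names are hypothetical, not taken from the asker's class. If a field such as marked were declared without an initializer, a node created with new Node would branch on leftover memory, which can differ between builds and between runs; Memcheck (for example valgrind --track-origins=yes ./program) reports such reads as "Conditional jump or move depends on uninitialised value(s)".

```cpp
#include <cassert>

// Hypothetical Fibonacci-heap node (not the asker's class). Every field has a
// default member initializer, so no field is ever read before it is written.
// Without "= false" on marked, a test like if (node->marked) on a node made
// with new Node would depend on whatever bytes were left in that memory.
struct Node {
    int key = 0;
    int degree = 0;
    bool marked = false;
    Node* parent = nullptr;
    Node* child = nullptr;
    Node* left = this;   // circular sibling list initially contains only this node
    Node* right = this;
};
```

With every field initialized, the node's state no longer depends on where it happens to be allocated, so tie-breaking among equal keys can only change if the code itself changes.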
Related
I have a very unscientific observation about memory overwrites and was curious whether anyone else has noticed something similar, knows why, and/or can tell me why I wasn't really seeing what I thought I was seeing.
What I noticed was that for some C++ programs, when I have a memory overwrite bug in that program, it would usually (if not always) show up in a specific section of code which was often unrelated to the section of code with the bug. This is not a blanket observation. Not all C++ programs behave this way. But when I have one, it is pretty consistent. (No comment on why my code has enough memory overwrites that I have the opportunity to notice consistent-anything :) )
I'm not asking why a memory overwrite in function1 can show up in function2; that is understood. My observation is that over the life of a given program, we have discovered memory overwrites in function1, function2, function3, function4, and function5. But in each case, we discovered the problem because the code would crash in function6. Always in function6 and only in function6. None of those functions is related to function6, and none of them touches anything function6 uses.
Over my lifetime, I've encountered two C programs and one C++ program that behaved this way. These were years apart in unrelated systems and hardware. I just found it weird and wondered if anyone else has seen this. Plus, I suspect that I may be seeing the same pattern in a C++/JNI/Java program that I'm working on now, but it is young enough that I've not had enough hits to be sure of a pattern.
Perhaps the real question here is about what escalates "silent" memory corruption into an actual/formal crash (as opposed to "just" more subtle problems such as unexpected data values, that the user might or might not notice or recognize). I don't think that question can be answered generally, as it depends a lot on the specifics of the compiler, the code, the memory layout of the in-memory data structures, etc.
It can be said that in most modern (non-embedded) systems there is an MMU that handles translating virtual (i.e. per-process) memory addresses into physical memory addresses, and that many user-space crashes are the result of the MMU generating an unrecoverable page fault when the user program tries to dereference a virtual address that has no defined physical equivalent. So perhaps in this case function6() was trying to dereference a pointer whose value had been corrupted in such a way that the MMU couldn't translate it to a physical address. Note that the compiler often places its own pointers on the stack (to remember where the program's control flow should return to when a function returns), so a bad pointer dereference can happen even in code that doesn't explicitly dereference any pointers.
Another common cause for a crash would be a deliberately induced crash invoked by code that notices that the data it is working with is "in a state that should never happen" and calls abort() or similar. This can happen in user code that has assert() calls in it, or in system-provided code such as the code that manages the process's heap. So it could be that function6() tried to allocate or free heap memory, and in so doing gave the heap manager the chance to detect an "impossible state" in one of its data structures and crash out. Keep in mind that the process's heap is really just one big data structure that is shared by all parts of the program that use the heap, so it's not terribly surprising that heap corruption caused by one part of the program might result in a crash later on by another (mostly unrelated) part of the program that also uses the heap.
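Some debug heaps detect overwrites in a similar way, by placing guard bytes around each block and checking them later. A minimal sketch of the same idea applied to a single buffer in user code; GuardedBuffer, fill(), intact() and check() are hypothetical names. Calling check() at strategic points turns a silent overwrite into a deliberate abort() close to the code that caused it, rather than a crash later in some unrelated function.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Hypothetical buffer wrapper with guard words ("canaries") on both sides of
// the payload. An overrun off either end of data changes a canary.
struct GuardedBuffer {
    static constexpr std::uint32_t kCanary = 0xDEADBEEFu;
    std::uint32_t front = kCanary;
    char data[64] = {};
    std::uint32_t back = kCanary;

    void fill(char c) { std::memset(data, c, sizeof data); }

    bool intact() const { return front == kCanary && back == kCanary; }

    // Call at strategic points: turns silent corruption into an immediate,
    // deliberate crash close to the code that caused it.
    void check() const {
        if (!intact()) {
            std::fputs("GuardedBuffer: canary overwritten\n", stderr);
            std::abort();
        }
    }
};
```

The check is deliberately not an assert(), so it still fires in release builds compiled with NDEBUG.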
I am debugging some code and there are two issues.
The debugger used to show me the inner fields of each pointer, but suddenly it just won't. I don't know what changed or what I clicked. When I try to access the inner fields (like writing something into the pointed-to variable), it does show me the correct variable, so it is stored there.
As you can see, last clearly points to something, but the debugger doesn't show the variable that the pointer points to.
10 minutes ago it showed them, though.
For some reason my program runs in debug mode but runs into some sort of unknown infinite loop when I run it normally. How come?
I'm using the MinGW debugger (I think it's called GDB) in the CLion IDE.
I have no idea about the first part of the question, but for the second part:
For some reason my program runs in debug mode but runs into some sort of unknown infinite loop when I run it normally. How come?
This is very common.
In 99.999% of instances this happens because your program exercises undefined behavior of some sort, such as using uninitialized data, accessing an array out of bounds, accessing memory after it has been deallocated, etc.
In the remaining 0.001% of the cases it's due to a compiler bug.
On non-Windows OSes there are tools that help find such problems quickly, such as AddressSanitizer and MemorySanitizer. It looks like AddressSanitizer is also available on Windows, for example under MSVC.
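A minimal sketch of how AddressSanitizer is typically used with GCC or Clang: compile with -fsanitize=address (plus -g for readable stack traces) and run the program normally. The two functions are hypothetical; sum_buggy contains a deliberate off-by-one read that ASan reports as a heap-buffer-overflow when it is called on memory from new[], and sum_fixed is the corrected loop.

```cpp
// Build: g++ -g -fsanitize=address example.cpp   (clang++ accepts the same flags)
#include <cassert>
#include <cstddef>

// Deliberately buggy: i <= n reads data[n], one element past the end.
// Under -fsanitize=address, calling this on a new[]'d array stops the program
// with a heap-buffer-overflow report that points at this loop.
int sum_buggy(const int* data, std::size_t n) {
    int total = 0;
    for (std::size_t i = 0; i <= n; ++i) total += data[i];
    return total;
}

// Corrected version: i < n stays in bounds.
int sum_fixed(const int* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    int total = 0;
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
}
```

For example, with int* a = new int[3]{1, 2, 3}, calling sum_buggy(a, 3) should trigger the report, while sum_fixed(a, 3) returns 6. Without the sanitizer, the buggy version may well appear to work, which is exactly the "runs in the debugger, fails outside it" pattern.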
Update:
What can I usually do in order to find those memory bugs that the debugger won't pick up on?
The usual techniques are:
Leave no variable uninitialized.
Add assert()ions to verify that indices are in bounds, etc.
Have a very clear model of what dynamically allocated memory is owned by which object, so it's clear that no memory is accessed after it has been deleted, etc.
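The three habits above can be combined in a small class. This is a minimal sketch with hypothetical names (Grid, at, make_grid): all storage is initialized, indices are checked with assert() in debug builds, and ownership is explicit because the vector owns the cells and a std::unique_ptr owns the Grid.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical class illustrating the three habits.
class Grid {
public:
    Grid(std::size_t rows, std::size_t cols)
        : rows_(rows), cols_(cols), cells_(rows * cols, 0) {}  // every cell starts at 0

    int& at(std::size_t r, std::size_t c) {
        assert(r < rows_ && c < cols_);  // catch out-of-bounds indices in debug builds
        return cells_[r * cols_ + c];
    }

private:
    std::size_t rows_ = 0;
    std::size_t cols_ = 0;
    std::vector<int> cells_;  // the vector owns the storage; no manual delete
};

// Ownership made explicit: the returned unique_ptr is the single owner and
// frees the Grid when it goes out of scope.
std::unique_ptr<Grid> make_grid(std::size_t rows, std::size_t cols) {
    return std::make_unique<Grid>(rows, cols);
}
```

Because nothing is deleted by hand, there is no place where the Grid could be used after being freed or freed twice.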
If my code is, let's say, 1500 lines long,
That is a very small program. Learning how to debug such programs will serve you well.
I haven't been able to create a Qt GUI app that didn't have over 1K of 'definitely lost' bytes in valgrind. I have experimented with this by making minimal apps: ones that just show one QWidget, ones that extend QMainWindow, and ones that just create a QApplication object without showing anything, without calling exec(), or both. They always leak.
Trying to figure this out, I have read that it's because X11 or glibc has bugs, or because valgrind gives false positives. And one forum thread seemed to imply that creating a QApplication object in the main function and returning the result of its exec() function, as is done in tutorials, is a "simplified" way to make GUIs (and perhaps not necessarily a good one?).
The valgrind output does indeed mention libX11 and libglibc, and also libfontconfig. The rest of the memory losses, 5 loss records, occur at ??? in libQtCore.so during QLibrary::setFileNameAndVersion.
If there is a more appropriate way to create GUI apps that prevents even just some of this from happening, what is it?
And if any of the valgrind output is just noise, how do I create a suppression file that suppresses the right things?
EDIT: Thank you for comments and answers!
I'm not worrying about the few lost kB themselves, but it'll be easier to find my own memory leaks if I don't have to filter several screens of errors but can normally get an "OK" from valgrind. And if I'm going to suppress warnings, I'd better know what they are, right?
Interesting to see how accepted leaks can be!
It is not uncommon for large-scale, multi-thread-capable libraries such as Qt, wxWidgets, X11, etc. to set up singleton-type objects that are initialized once when a process starts and then make no effort to clean up those allocations when the process shuts down.
I can assure you that anything "leaked" from a function such as QLibrary::setFileNameAndVersion() has been left so intentionally. The bits of memory left behind by X11/glibc/fontconfig are probably not bugs either.
It could be seen as bad coding practice or etiquette, but it can also greatly simplify certain types of tasks. Operating systems these days offer a very strong guarantee that any memory or resources left open by a process are cleaned up when it is killed (either gracefully or by force). If the allocation in question is very likely to be needed for the whole lifetime of the application, including shutdown procedures (and various core components of Qt would qualify), then it can be a boon to performance to have the library set up some memory allocations as soon as it is loaded/initialized and allow those to persist indefinitely. Among other things, this keeps the memory available to any other C++ destructors that might reference it.
Since those allocations are only set up once, and from one point in the code, there is no risk of a meaningful memory leak. It's just memory that belongs to the process and is thus reclaimed by the operating system when the process exits.
Conclusion: if the memory leak isn't in your code, and it doesn't appear to grow significantly over time (and by significant these days, think megabytes), and/or it clearly originates from first-time initialization code that is only ever invoked once within your app, then don't worry about it. It is probably intentional.
One way to test this is to run your code inside a loop and vary the number of iterations. If the difference between allocations and frees is independent of the number of iterations, you are likely to be safe.
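The loop test can be sketched in C++ by replacing the global operator new and operator delete with versions that count calls. Everything here (the counters, outstanding_after, and the three example workloads) is hypothetical and only suitable for a single-threaded test program, since the counters are not atomic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical allocation counters for a single-threaded test program.
static std::size_t g_allocs = 0;
static std::size_t g_frees = 0;

// Replacing the global operator new/delete counts every allocation made through
// new and new[] (the default array forms forward to these).
void* operator new(std::size_t size) {
    ++g_allocs;
    if (void* p = std::malloc(size != 0 ? size : 1)) return p;
    throw std::bad_alloc();
}
void operator delete(void* p) noexcept {
    if (p == nullptr) return;
    ++g_frees;
    std::free(p);
}
void operator delete(void* p, std::size_t) noexcept { ::operator delete(p); }

// Runs work the given number of times and returns allocations minus frees.
long long outstanding_after(std::size_t iterations, void (*work)()) {
    const std::size_t a0 = g_allocs, f0 = g_frees;
    for (std::size_t i = 0; i < iterations; ++i) work();
    return static_cast<long long>(g_allocs - a0) -
           static_cast<long long>(g_frees - f0);
}

// Example workloads (all hypothetical):
void balanced_work() { int* p = new int(42); delete p; }  // frees what it allocates

int* g_last = nullptr;
void leaky_work() { g_last = new int(7); }  // previous block is lost on every call

void uses_singleton() {
    static int* cache = new int(0);  // allocated once, never freed on purpose
    ++*cache;
}
```

After one warm-up call, uses_singleton shows a difference of 0 for any iteration count, matching the one-time initialization pattern described above, whereas leaky_work's difference grows with the number of iterations, which is the signature of a real leak.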
I am having a lot of trouble debugging a segmentation fault in a C++ project in Xcode 4.
I only get a segfault when I build with the "LLVM 2.0" compiler option and use -O3 optimization. From what I understand, there are limited debugging options when using optimization, but here is the debug output I get after I run in Xcode with gdb turned on:
warning: Got an error handling event: "Dwarf Error: Cannot find DIE at 0x3be2 referenced from DIE at 0x11d [in module /Users/imran/Library/Developer/Xcode/DerivedData/cgo-hczcifktgscxjigfphieegbpxxsq/Build/Products/Debug/cgo]".
No memory available to program now: unsafe to call malloc
I can't get gdb to give me any useful info after that (like a trace), but I'm not sure I really know how to use it properly. When I try to use the "LLDB" debugger Xcode just crashes (which has been a common theme since I started using it).
My program is deterministic, but when I try to isolate the problem with print statements the behavior will change. For example if I add cout << "hello"; at one point the segfault goes away. Other print statements cause my program to segfault in a different iteration of its main loop. And naturally when I put in enough print statements to supposedly pinpoint the offending code, the segfault seems to occur after one line but before the next (i.e. nowhere).
I am using pointers and dynamic memory allocation, which is likely the cause of the problem, but since I can't narrow down the block of code causing the error I don't know what code to show here.
I tried profiling with the "Leaks" tool in Instruments, but it didn't find any leaks.
Any advice? I am very inexperienced with debugging so anything would help, really.
EDIT: Solved. Given certain inputs, my program would try to read past the end of an array.
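Out-of-range reads like this one are easier to catch when the lookup is checked. A minimal sketch, assuming the array is (or can be replaced by) a std::vector; value_at is a hypothetical helper. std::vector::at() checks the index and throws std::out_of_range instead of silently reading whatever lies past the end. With GCC's libstdc++, compiling with -D_GLIBCXX_DEBUG also makes operator[] check its index.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical lookup that, given certain inputs, used to index past the end.
// at() throws std::out_of_range for a bad index instead of reading garbage,
// so the failure happens at the faulty access rather than much later.
int value_at(const std::vector<int>& values, std::size_t index) {
    return values.at(index);
}
```

An uncaught std::out_of_range terminates the program at the bad access, which is far easier to locate in a debugger than a segfault that moves around when print statements are added.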
I don't think there's enough information that I can help you with the DWARF issue. I am not familiar enough with that toolchain to know how robust it is.
Your crashing symptoms however smell a lot like heap corruption. I don't know what allocator OSX uses by default, but common optimizations store metadata inline with objects and/or thread the freelist through empty objects, which makes them very sensitive to buffer overflows on the heap. Freeing an object twice or using a dangling pointer (a pointer that has been freed but whose space may now be in use by another allocation) can also cause seemingly nondeterministic and hard to track errors, since the layout of the heap is likely to change between runs. Print statements also use the allocator, which means changing the print statements can change when and where the problem will appear.
A tool that you may find helpful in determining whether this is a heap problem or something unrelated is a heap replacement called DieHard, by my advisor (http://prisms.cs.umass.edu/emery/index.php?page=download-diehard). I believe it will build on OS X, and you can substitute it for the default allocator at runtime by preloading it with LD_PRELOAD=/path/to/libdiehard.so (on OS X the equivalent environment variable is DYLD_INSERT_LIBRARIES). Its sole purpose is to resist memory errors and heap corruption, so if your application runs correctly with it, that's probably where you need to look.
My application uses GLUtesselator to tessellate complex concave polygons. It randomly crashes when I run the plain release exe, but it never crashes if I start it with debugging in VS. I found this explanation, which basically describes my problem:
The multi-thread debug CRT (/MTd) masks the problem, because, like Windows does with processes spawned by a debugger, it provides to your program a debug heap, that is initialized to the 0xCD pattern. Probably somewhere you use some uninitialized area of memory from the heap as a pointer and you dereference it; with the two debug heaps you get away with it for some reason (maybe because at address 0xbaadf00d and 0xcdcdcdcd there's valid allocated memory), but with the "normal" heap (which is often initialized to 0) you get an access violation, because you dereference a NULL pointer.
The problem is that the crash occurs in GLU32.dll, and I have no way to find out why it's sometimes trying to dereference a null pointer. It seems to happen when my polygons get fairly large and have lots of points. What can I do?
Thanks
It's a fact of life that sometimes programs behave differently in the debugger. In your case, some memory is initialized differently, and it's probably laid out differently as well. Another common case in concurrent programs is that the timing is different, and race conditions often show up less frequently under a debugger.
You could try to manually initialize the heap to a different value (or see if there is an option for this in Visual Studio). Usually initializing to nonzero catches more bugs, but that may not be the case in your situation. You could also try to play with your program's memory mapping to arrange that the page 0xcdcdc000 is unmapped.
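A minimal sketch of the first suggestion, assuming you control the allocation call sites; patterned_malloc is a hypothetical helper, not a Visual Studio option. Each fresh block is filled with a chosen byte, so code that reads memory before writing it sees the same value in debug and release runs, and trying a few different patterns can make the crash reproducible outside the debugger. On Linux, glibc's MALLOC_PERTURB_ environment variable provides a similar effect without code changes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical wrapper: every fresh block is filled with a chosen byte pattern,
// so reads of "uninitialized" memory return a known value in every build and
// every run, instead of whatever happened to be there before.
void* patterned_malloc(std::size_t size, unsigned char pattern) {
    void* p = std::malloc(size);
    if (p != nullptr) std::memset(p, pattern, size);
    return p;
}
```

Blocks from patterned_malloc are released with std::free as usual.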
Visual Studio can set a breakpoint on accesses to a particular memory address; you could try this, though it may slow your program down significantly more than a variable breakpoint.
but it never crashes if I do start debugging in VS.
Well, I'm not sure exactly why, but while debugging in Visual Studio a program can sometimes get away with accessing memory regions that would crash it without the debugger. Sometimes 0xcdcdcdcd and 0xbaadf00d have nothing to do with it; accessing certain addresses just doesn't cause problems. When this happens, you'll need to find other ways of working out what the problem is.
What can I do?
Possible solutions:
1) Install an exception handler in your program (_set_se_translator, if I remember correctly). On an access violation, try MiniDumpWriteDump. Debug the dump later using Visual Studio (AFAIK, crash dump debugging isn't available in the Express edition) or WinDbg.
2) Use just-in-time debuggers. Non-Express editions of Visual Studio have this feature. There are probably alternatives.
3) Write a custom memory manager that overrides new/delete (and provides malloc/free alternatives, if you use them), grabs a large chunk of memory, and locks all unused memory with VirtualProtect. Then every invalid access causes a crash, even in debug mode. You'll need a lot of memory for such a memory manager, because each block has to be page-aligned to be protected.
4) Add extensive logging to all suspicious function calls. Dump a lot of text/debug information to a file (or stderr): parameter values, arrays, everything you suspect could be related to the crash. Flush after every write, otherwise some information will be lost during the crash. This way you'll be able to work out what happened before the program crashed.
5) Try debugging the release build. You should be able to do this to some extent if you enable debug information for the release build in the project settings.
6) Try switching "Basic Runtime Checks" and "Buffer Security Check" on and off in the project properties (Configuration Properties -> C/C++ -> Code Generation).
7) Try some kind of external tool, such as Valgrind or BoundsChecker. In my experience, though, approach 3) is more reliable, but that really depends on the problem.
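The VirtualProtect idea is Windows-specific. On POSIX systems the same guard-page technique (the one used by tools such as Electric Fence) can be sketched with mmap and mprotect. guarded_alloc and guarded_free are hypothetical helpers, and this is a debugging aid only: every allocation costs at least two pages, and the returned pointer is only as aligned as the requested size allows.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

// POSIX sketch of the guard-page idea (Linux/macOS), not a Windows API.
static std::size_t page_size() {
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);
    return static_cast<std::size_t>(page);
}

// Hypothetical helper: returns size bytes whose last byte sits right before a
// PROT_NONE guard page, so writing even one byte past the end raises SIGSEGV
// at the faulty instruction. Round size up to alignof(std::max_align_t) for
// general use (at the cost of missing overruns of a few bytes).
void* guarded_alloc(std::size_t size) {
    const std::size_t page = page_size();
    const std::size_t data_pages = (size + page - 1) / page;  // round up
    const std::size_t total = (data_pages + 1) * page;        // + 1 guard page
    void* base = mmap(nullptr, total, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return nullptr;
    char* guard = static_cast<char*>(base) + data_pages * page;
    if (mprotect(guard, page, PROT_NONE) != 0) {
        munmap(base, total);
        return nullptr;
    }
    return guard - size;
}

// Releases a block from guarded_alloc; size must match the original request.
void guarded_free(void* p, std::size_t size) {
    if (p == nullptr) return;
    const std::size_t page = page_size();
    const std::size_t data_pages = (size + page - 1) / page;
    char* base = static_cast<char*>(p) + size - data_pages * page;
    munmap(base, (data_pages + 1) * page);
}
```

Because the fault happens on the overrunning instruction itself, a debugger or core dump then points at the real culprit instead of at a later victim such as GLU32.dll.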
A link to an earlier question and two thoughts.
First off, you may want to look at a previous question about Valgrind substitutes for Windows. It has lots of good hints on programs that will help you.
Now the thoughts:
1) The debugger may stop your program from crashing in the code you're testing, but it's not fixing the problem. At best you're just kicking the can down the road: there's still corruption, it just isn't evident in the way you're running. When you ship, you can be assured someone will run into the problem again.
2) What often happens in cases like this is that the error isn't near where the problem occurs. While you may be noticing the problem in GLU32.dll, there was probably corruption earlier, maybe even in a different thread or function, which didn't cause a problem and at some later point the program came back to the corrupted region and failed.