STLport crash (race condition, Darwin only?) - c++

When I run STLport on Darwin I get a strange crash. (Haven't seen it anywhere else than on Mac, but exactly same thing crash on both i686 and PowerPC.) This is what it looks like in gdb:
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: 13 at address: 0x0000000000000000
[Switching to process 21097]
0x000000010120f47c in stlp_std::__node_alloc_impl::_M_allocate ()
It may be some setting in STLport, I noticed that Mac.h and MacOSX.h seemed far behind on features. I also know that it it must be some type of race condition, since it doesn't occur just by calling this method (implicity called). The crash happens mainly when I push the system, running 10 simultaneous threads that do a lot of string handling.
Other theories I come up with have to do with compiler flags (configure script) and g++ 4.2 bugs (seems like 4.4.3 isn't on Mac yet with Objective-C support, which I need to link with).
HELP!!! :)
Edit: I run unit tests, which do all sorts of things. This problem arise when I start 10 threads that push the system; and it always comes down to std::string::append which eventually boils down to _M_allocate. Since I can't even get a descent dump of the code that's causing the problem, I figure I'm doing something bad. Could it be so since it's trying to execute at instruction pointer 0x000...000? Are dynlibs built as DLLs in Windows with a jump table? Could it perhaps be that such a jump table has been overwritten for some reason? That would probably explain this behavior. (The code is huge, if I run out of other ideas, I'll post a minimum crashing sample here.)

hmm.. STLPort uses allocaters which takes platform memory and pool it internally to the data structures when needed.
Just check when the crash happens, the heap given to the executing thread is enough for a alloc to happen. Even if an alloc fails, this crash can happen.
I am not sure about the granularity of the STL allocator in the current configuration you are working on. Check stl_config.h for that.

This problem was caused by an unrelated crash bug, causing memory overwrites leading to an STLport crash in my case.

Related

C++ Program freezes esoterically

I wrote a C++ CLI program with MS VC++ 2010 and GCC 4.2.1 (for Mac OS X 10.6 64 bit, in Eclipse).
The program works well under GCC+OS X and most times under Windows. But sometimes it silently freezes. The command line cursor keeps blinking, but the program refuses to continue working.
The following configurations work well:
GCC with 'Release' and 'Debug' configuration.
VC++ with 'Debug' configuration
The error only occurs in the configuration 'VC++ with 'Release' configuration' under Win 7 32 bit and 64 bit. Unfortunately this is the configuration my customer wants to work with ;-(
I already checked my program high and low and fixed all memory leaks. But this error still occurs. Do you have any ideas how I can find the error?
Use logging to narrow down which part of code the program is executing when it crashes. Keep adding log until you narrow it down enough to see the issue.
Enable debug information in the release build (both compiler and linker); many variables won't show up correctly, but it should at least give you sensible backtrace (unless the freeze is due to stack smashing or stack overflow), which is usually enough if you keep functions short and doing just one thing.
Memory leaks can't cause freezes. Other forms of memory misuse are however quite likely to. In my experience overrunning a buffer often cause freezes when that buffer is freed as the free function follows the corrupted block chains. Also watch for any other kind of Undefined Behaviour. There is a lot of it in C/C++ and it usually behaves as you expect in debug and completely randomly when optimized.
Try building and running the program under DUMA library to check for buffer overruns. Be warned though that:
It requires a lot of memory. I mean easily like thousand times more. So you can only test on simple cases.
Microsoft headers tend to abuse their internal allocation functions and mismatch e.g. regular malloc and internal __debug_free (or the other way 'round). So might get a few cases that you'll have to carefully workaround by including those system headers into the duma one before it redefines the functions.
Try building the program for Linux and run it under Valgrind. That will check more problems in addition to buffer overruns and won't use that much memory (only twice as normal, but it is slower, approximately 20 times).
Debug versions usually initialize all allocated memory (MSVC fills them with 0xCD with the debug configuration). Maybe you have some uninitialized values in your classes, with the GCC configurations and MSVC Debug configuration it gets a "lucky" value, but in MSVC Release it doesn't.
Here are the rest of the magic numbers used by MSVC.
So look for uninitialized variables, attributes and allocated memory blocks.
Thank you all, especially Cody Gray and MikMik, I found it!
As some of you recommended I told VS to generate debug information and disabled optimizations in the release configuration. Then I started the program and paused it. Alternatively I remotely attached to the running process. This helped me finding the region where the error was.
The reasons were infinite loops, caused by reads behind the boundaries of an array and a missing exclusion of an invalid case. Both led to unreachable stopping conditions at runtime. The esoteric part came from the fact, that my program uses some randomized values.
That's life...

What is the difference between valgrind and regular c++ run

I'm trying to identify a bug in I have in my code where I get seg. fault while trying to assign value to a pointer from a vector (it is describe better in the link). When I run the code using valgrind I don't get the seg.fault.
What does valgrind do differently. I think that I need to consider the memory management differences between valgrind session and regular c++ session but I don't really know
From Valgrind FAQ:
4.4. My program crashes normally, but doesn't under Valgrind, or vice
versa. What's happening?
When a program runs under Valgrind, its environment is slightly
different to when it runs natively. For example, the memory layout is
different, and the way that threads are scheduled is different.
Most of the time this doesn't make any difference, but it can,
particularly if your program is buggy. For example, if your program
crashes because it erroneously accesses memory that is unaddressable,
it's possible that this memory will not be unaddressable when run
under Valgrind. Alternatively, if your program has data races, these
may not manifest under Valgrind.
There isn't anything you can do to change this, it's just the nature
of the way Valgrind works that it cannot exactly replicate a native
execution environment. In the case where your program crashes due to a
memory error when run natively but not when run under Valgrind, in
most cases Memcheck should identify the bad memory operation.
So you can nothing to do with it. Actually you need not worry that you program not crashes under Valgrind. You should read error messages from it and fix them. Start with Invalid read/Invalid write errors. They are almost always indicate the bug in code. In this particular case you can also run your code in infinite loop from simple bash script utill it produces error message. Most likely you are working with invalid iterators and it is Undefined Behaviour in C++.
Maybe the issue might be timing dependent, When you run your code with valgrind it runs a little slower because valgrind collects and diagnoses your code at run time.
Valgrind keeps track of your programs memory usage. This is how it tells you about leaks. What this means is that it hijacks the malloc and such and uses its own so that it can achieve this. This means that probably when you run your code normally you read/write to some data you have freed accidentally causing segfault whereas it could be that valgrind is keeping this memory around to see if it is truly lost etc thereby meaning by (un)luck the memory is still valid. Just a guess.
Valgrind runs you program on a virtual CPU, that is, it executes every assembly instruction in software (apart from kernel calls). Multi-threaded programs get serialized, i.e. only one thread of execution is making progress at one time.
If your application is multi-threaded, when it is executed under valgrind race condition and the lack of synchronization may be masked by the thread serialization, so that the effects of such bugs are not observed.

C++: Where to start when my application crashes at random places?

I'm developing a game and when I do a specific action in the game, it crashes.
So I went debugging and I saw my application crashed at simple C++ statements like if, return, ... Each time when I re-run, it crashes randomly at one of 3 lines and it never succeeds.
line 1:
if (dynamic) { ... } // dynamic is a bool member of my class
line 2:
return m_Fixture; // a line of the Box2D physical engine. m_Fixture is a pointer.
line 3:
return m_Density; // The body of a simple getter for an integer.
I get no errors from the app nor the OS...
Are there hints, tips or tricks to debug more efficient and get known what is going on?
That's why I love Java...
Thanks
Random crashes like this are usually caused by stack corruption, since these are branching instructions and thus are sensitive to the condition of the stack. These are somewhat hard to track down, but you should run valgrind and examine the call stack on each crash to try and identify common functions that might be the root cause of the error.
Are there hints, tips or tricks to debug more efficient and get known what is going on?
Run game in debugger, on the point of crash, check values of all arguments. Either using visual studio watch window or using gdb. Using "call stack" check parent routines, try to think what could go wrong.
In suspicious(potentially related to crash) routines, consider dumping all arguments to stderr (if you're using libsdl or on *nixlike systems), or write a logfile, or send dupilcates of all error messages using (on Windows) OutputDebugString. This will make them visible in "output" window in visual studio or debugger. You can also write "traces" (log("function %s was called", __FUNCTION__))
If you can't debug immediately, produce core dumps on crash. On windows it can be done using MiniDumpWriteDump, on linux it is set somewhere in configuration variables. core dumps can be handled by debugger. I'm not sure if VS express can deal with them on Windows, but you still can debug them using WinDBG.
if crash happens within class, check *this argument. It could be invalid or zero.
If the bug is truly evil (elusive stack corruption in multithreaded app that leads to delayed crash), write custom memory manager, that will override new/delete, provide alternative to malloc(if your app for some reason uses it, which may be possible), AND that locks all unused memory memory using VirtualProtect (windows) or OS-specific alternative. In this case all potentially dangerous operation will crash app instantly, which will allow you to debug the problem (if you have Just-In-Time debugger) and instantly find dangerous routine. I prefer such "custom memory manager" to boundschecker and such - since in my experience it was more useful. As an alternative you could try to use valgrind, which is available on linux only. Note, that if your app very frequently allocates memory, you'll need a large amount of RAM in order to be able to lock every unused memory block (because in order to be locked, block should be PAGE_SIZE bytes big).
In areas where you need sanity check either use ASSERT, or (IMO better solution) write a routine that will crash the application (by throwing an std::exception with a meaningful message) if some condition isn't met.
If you've identified a problematic routine, walk through it using debugger's step into/step over. Watch the arguments.
If you've identified a problematic routine, but can't directly debug it for whatever reason, after every statement within that routine, dump all variables into stderr or logfile (fprintf or iostreams - your choice). Then analyze outputs and think how it could have happened. Make sure to flush logfile after every write, or you might miss the data right before the crash.
In general you should be happy that app crashes somewhere. Crash means a bug you can quickly find using debugger and exterminate. Bugs that don't crash the program are much more difficult (example of truly complex bug: given 100000 values of input, after few hundreds of manipulations with values, among thousands of outputs, app produces 1 absolutely incorrect result, which shouldn't have happened at all)
That's why I love Java...
Excuse me, if you can't deal with language, it is entirely your fault. If you can't handle the tool, either pick another one or improve your skill. It is possible to make game in java, by the way.
These are mostly due to stack corruption, but heap corruption can also affect programs in this way.
stack corruption occurs most of the time because of "off by one errors".
heap corruption occurs because of new/delete not being handled carefully, like double delete.
Basically what happens is that the overflow/corruption overwrites an important instruction, then much much later on, when you try to execute the instruction, it will crash.
I generally like to take a second to step back and think through the code, trying to catch any logic errors.
You might try commenting out different parts of the code and seeing if it affects how the program is compiled.
Besides those two things you could try using a debugger like Visual Studio or Eclipse etc...
Lastly you could try to post your code and the error you are getting on a website with a community that knows programming and could help you work through the error (read: stackoverflow)
Crashes / Seg faults usually happen when you access a memory location that it is not allowed to access, or you attempt to access a memory location in a way that is not allowed (for example, attempting to write to a read-only location).
There are many memory analyzer tools, for example I use Valgrind which is really great in telling what the issue is (not only the line number, but also what's causing the crash).
There are no simple C++ statements. An if is only as simple as the condition you evaluate. A return is only as simple as the expression you return.
You should use a debugger and/or post some of the crashing code. Can't be of much use with "my app crashed" as information.
I had problems like this before. I was trying to refresh the GUI from different threads.
If the if statements involve dereferencing pointers, you're almost certainly corrupting the stack (this explains why an innocent return 0 would crash...)
This can happen, for instance, by going out of bounds in an array (you should be using std::vector!), trying to strcpy a char[]-based string missing the ending '\0' (you should be using std::string!), passing a bad size to memcpy (you should be using copy-constructors!), etc.
Try to figure out a way to reproduce it reliably, then place a watch on the corrupted pointer. Run through the code line-by-line until you find the very line that corrupts the pointer.
Look at the disassembly. Almost any C/C++ debugger will be happy to show you the machine code and the registers where the program crashed. The registers include the Instruction Pointer (EIP or RIP on x86/x64) which is where the program was when it stopped. The other registers usually have memory addresses or data. If the memory address is 0 or a bad pointer, there is your problem.
Then you just have to work backward to find out how it got that way. Hardware breakpoints on memory changes are very helpful here.
On a Linux/BSD/Mac, using GDB's scripting features can help a lot here. You can script things so that after the breakpoint is hit 20 times it enables a hardware watch on the address of array element 17. Etc.
You can also write debugging into your program. Use the assert() function. Everywhere!
Use assert to check the arguments to every function. Use assert to check the state of every object before you exit the function. In a game, assert that the player is on the map, that the player has health between 0 and 100, assert everything that you can think of. For complicated objects write verify() or validate() functions into the object itself that checks everything about it and then call those from an assert().
Another way to write in debugging is to have the program use signal() in Linux or asm int 3 in Windows to break into the debugger from the program. Then you can write temporary code into the program to check if it is on iteration 1117321 of the main loop. That can be useful if the bug always happens at 1117322. The program will execute much faster this way than to use a debugger breakpoint.
some tips :
- run your application under a debugger, with the symbol files (PDB) together.
- How to set Visual Studio as the default post-mortem debugger?
- set default debugger for WinDbg Just-in-time Debugging
- check memory allocations Overriding new and delete, and Overriding malloc and free
One other trick: turn off code optimization and see if the crash points make more sense. Optimization is allowed to float little bits of your code to surprising places; mapping that back to source code lines can be less than perfect.
Check pointers. At a guess, you're dereferencing a null pointer.
I've found 'random' crashes when there are some reference to a deleted object. As the memory is not necessarily overwritten, in many cases you don't notice it and the program works correctly, and than crashes after the memory was updated and is not valid anymore.
JUST FOR DEBUGGING PURPOSES, try commenting out some suspicious 'deletes'. Then, if it doesn't crash anymore, there you are.
use the GNU Debugger
Refactoring.
Scan all the code, make it clearer if not clear at first read, try to understand what you wrote and immediately fix what seems incorrect.
You'll certainly discover the problem(s) this way and fix a lot of other problems too.

What causes a Sigtrap in a Debug Session

In my c++ program I'm using a library which will "send?" a Sigtrap on a certain operations when
I'm debugging it (using gdb as a debugger). I can then choose whether I wish to Continue or Stop the program. If I choose to continue the program works as expected, but setting custom breakpoints after a Sigtrap has been caught causes the debugger/program to crash.
So here are my questions:
What causes such a Sigtrap? Is it a leftover line of code that can be removed, or is it caused by the debugger when he "finds something he doesn't like" ?
Is a sigtrap, generally speaking, a bad thing, and if so, why does the program run flawlessly when I compile a Release and not a Debug Version?
What does a Sigtrap indicate?
This is a more general approach to a question I posted yesterday Boost Filesystem: recursive_directory_iterator constructor causes SIGTRAPS and debug problems.
I think my question was far to specific, and I don't want you to solve my problem but help me (and hopefully others) to understand the background.
Thanks a lot.
With processors that support instruction breakpoints or data watchpoints, the debugger will ask the CPU to watch for instruction accesses to a specific address, or data reads/writes to a specific address, and then run full-speed.
When the processor detects the event, it will trap into the kernel, and the kernel will send SIGTRAP to the process being debugged. Normally, SIGTRAP would kill the process, but because it is being debugged, the debugger will be notified of the signal and handle it, mostly by letting you inspect the state of the process before continuing execution.
With processors that don't support breakpoints or watchpoints, the entire debugging environment is probably done through code interpretation and memory emulation, which is immensely slower. (I imagine clever tricks could be done by setting pagetable flags to forbid reading or writing, whichever needs to be trapped, and letting the kernel fix up the pagetables, signaling the debugger, and then restricting the page flags again. This could probably support near-arbitrary number of watchpoints and breakpoints, and run only marginally slower for cases when the watchpoint or breakpoint aren't frequently accessed.)
The question I placed into the comment field looks apropos here, only because Windows isn't actually sending a SIGTRAP, but rather signaling a breakpoint in its own native way. I assume when you're debugging programs, that debug versions of system libraries are used, and ensure that memory accesses appear to make sense. You might have a bug in your program that is papered-over at runtime, but may in fact be causing further problems elsewhere.
I haven't done development on Windows, but perhaps you could get further details by looking through your Windows Event Log?
While working in Eclipse with minGW/gcc compiler, I realized it's reacting very bad with vectors in my code, resulting to an unclear SIGTRAP signal and sometimes even showing abnormal debugger behavior (i.e. jumping somewhere up in the code and continuing execution of the code in reverse order!).
I have copied the files from my project into the VisualStudio and resolved the issues, then copied the changes back to eclipse and voila, worked like a charm. The reasons were like vector initialization differences with reserve() and resize() functions, or trying to access elements out of the bounds of the vector array.
Hope this will help someone else.
I received a SIGTRAP from my debugger and found out that the cause was due to a missing return value.
string getName() { printf("Name!");};

What can cause an abnormal program termination?

MFC application (uses SQLite3.dll for DB access, along with other DLLs for accessing hardware) terminates abnormally. There is no particular sequence of termination :(
My application is a
Single threaded Application
Uses exception handling
Uses more than 6 DLLs to access different hardwares
Runs on WinXP SP2
Initially i thought it might be because of Stack Overflow, later i discovered its not. Can someone tell me what are all the general causes for an abnormal program termination? If someone has come across similar problems or has any hints or clues, please pass them on.
Thanks in Advance
Generally speaking, the general causes of crashes are when you:
read memory that isn't yours
write memory that isn't yours
divide by zero
do something inside an interrupt that you shouldn't
free() a pointer more than once
Possibly also:
have an unhanded exception
found a bug in your MFC
one of your >6 hardware-access DLLs is doing any of the above
You are encountering some kind of hardware fault
Maybe you're passing a bad buffer to one of your hardware DLLs, or are forgetting to lock some memory, or you could even have a version mismatch between the DLLs and their headers.
There are so many choices :P
Since this is a run-time issue, I suggest you send debug statements to a log file. Include the function name and perhaps a timestamp. Always flush the output buffer after writing to the file, as this provides better probability that the last line was written to the file before the exception occurs.