I created a thread using the C++11 std::thread class, and I want the thread to sleep in a loop.
When std::this_thread::sleep_for() is called, I get an exception saying:
Run-Time Check Failure #2 - Stack around the variable '_Now' was
corrupted.
My code is below:
std::chrono::milliseconds duration( 5000 );
while (m_connected)
{
    this->CheckConnection();
    std::this_thread::sleep_for(duration);
}
I presume _Now is a local variable somewhere deep in the implementation of sleep_for. If it gets corrupted, either there is a bug in that function (unlikely) or some other part of your application is writing through dangling pointers (much more likely).
The most likely cause is that, some time before calling sleep_for, you hand out a pointer to a local variable; the pointer outlives the variable, and another thread writes through it while this thread sleeps.
If you were on Linux, I'd recommend trying valgrind (though I am not certain it can catch invalid access to the stack), but on Windows I don't know of any tool for debugging this kind of problem. You can do a careful review, and you can try disabling various parts of the functionality to see when the problem goes away, to narrow down where it might be.
I also used to use duma library with some success, but it can only catch invalid access to heap, not stack.
Note: Both clang and gcc are further along in implementing C++11 than MSVC++, so if you don't use much Windows-specific stuff, it might be easy to port the code and try valgrind on it. Gcc and especially clang are also known for giving much better static diagnostics than MSVC++, so if you compile it with gcc or clang, you may get a warning that points you to the problem.
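To illustrate the lifetime rule behind that diagnosis, here is a minimal sketch of a polling thread whose state cannot go out of scope underneath it. ConnectionMonitor, its members, and the counter standing in for CheckConnection() are hypothetical names, not taken from the question's code. The only point is that everything the worker touches is owned by an object that joins the thread before it is destroyed.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical wrapper: all state the worker thread touches lives in this
// object, and the destructor joins the thread before that state goes away.
class ConnectionMonitor {
public:
    explicit ConnectionMonitor(std::chrono::milliseconds interval)
        : interval_(interval) {
        assert(interval.count() >= 0);
    }

    ~ConnectionMonitor() { stop(); }

    ConnectionMonitor(const ConnectionMonitor&) = delete;
    ConnectionMonitor& operator=(const ConnectionMonitor&) = delete;

    void start() {
        if (worker_.joinable()) return;  // already running
        connected_ = true;
        worker_ = std::thread([this] { run(); });
    }

    void stop() {
        connected_ = false;                      // ask the loop to finish
        if (worker_.joinable()) worker_.join();  // wait until it has
    }

    int checks() const { return checks_.load(); }

private:
    void run() {
        while (connected_) {
            ++checks_;  // stand-in for CheckConnection()
            std::this_thread::sleep_for(interval_);
        }
    }

    const std::chrono::milliseconds interval_;
    std::atomic<bool> connected_{false};
    std::atomic<int> checks_{0};
    std::thread worker_;
};
```

With this shape, stop() may block for up to one sleep interval, which is the price of never leaving a thread running against destroyed data.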
I have an std::string which appears to be getting corrupted somehow. Sometimes the string destructor will trigger an access violation, and sometimes printing it via std::cout will produce a crash.
If I pad the string in a struct as follows, the back_padding becomes slightly corrupted at a relatively consistent point in my code:
struct Test {
    int front_padding[128] = {0};
    std::string my_string;
    int back_padding[128] = {0};
};
Is there a way to protect the front and back padding arrays so that writing to them will cause an exception or something? Or perhaps some tool which can be used to catch the culprit writing to this memory?
Platform: Windows x64 built with MSVC.
In general you have to solve the problem of code sanitation, which is quite a broad topic. It sounds like you may have an out-of-bounds write, a use of a dangling pointer, or even a race condition in using a pointer; in the latter case the bug's visibility is affected by observation, like the proverbial cat in a quantum superposition state.
A dirty way to find the source of such a rogue write is to create a data breakpoint. It is especially effective if the bug appears to be deterministic and isn't a "heisenbug". In Visual Studio this is possible during a debug session; in gdb the equivalent is a watchpoint (the watch command).
You can point it at the std::string storage or, in your experimental case, at one of the padding arrays, in an attempt to trigger the breakpoint where the write operation occurs.
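When a data breakpoint is impractical, the padding from the question can also be turned into checkable canaries. The following is only a sketch under assumptions: GuardedString, kCanary and intact() are made-up names, and the check detects corruption after the fact instead of trapping the write itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <string>

// Arbitrary, easily recognisable fill value (an assumption of this sketch).
constexpr std::uint32_t kCanary = 0xC0FFEE42u;

struct GuardedString {
    std::uint32_t front_padding[128];
    std::string my_string;
    std::uint32_t back_padding[128];

    GuardedString() {
        std::fill(std::begin(front_padding), std::end(front_padding), kCanary);
        std::fill(std::begin(back_padding), std::end(back_padding), kCanary);
    }

    // True while both padding areas still hold the canary pattern.
    bool intact() const {
        auto ok = [](std::uint32_t v) { return v == kCanary; };
        return std::all_of(std::begin(front_padding), std::end(front_padding), ok) &&
               std::all_of(std::begin(back_padding), std::end(back_padding), ok);
    }
};
```

Calling intact() at a few checkpoints and moving those checkpoints closer together is a way to bisect toward the code that performs the stray write.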
How can you catch memory corruption in C++?
The best way with a modern compiler is to compile with an address sanitizer. This inserts exactly the sort of guard areas you describe around automatic (stack) and dynamic (heap) allocations, and detects when they're trampled. It's built into Clang, GCC and MSVC.
If you don't have compiler support, or need to diagnose the problem in an existing binary without recompiling, you can use Valgrind.
The sanitized executable runs at full speed, although it's doing more work and deliberately has a less cache-friendly memory layout; expect it to be about 2x slower than an equivalent un-instrumented build.
Running under valgrind is much slower (expect 10x-30x for memcheck), but will catch more types of error, and is your only option if you can't recompile.
In the following code, *(long*)0=0; is used along with the if clause, but what is its purpose?
if(r.wid*r.ht < tot)
*(long*)0=0;
It writes 0 to 0 interpreted as the address of a long, i.e. the NULL pointer. It's not a valid thing to be doing, since NULL is never an address at which you can validly have data that your program can access. This code triggers undefined behavior; you cannot rely on it to have any particular effect, in general.
However, often code like this is used to force a segmentation fault-type crash, which is sometimes handy to drop into a debugger.
Again, this is undefined behavior; there is no guarantee that it will cause such a fault, but on systems that have segmentation faults, the above code is pretty likely to generate one. On other systems it might do something completely different.
If you do get a segfault, it's sometimes more convenient to trigger one this way than to set a breakpoint manually in the debugger. For instance, if you're not using an IDE, it's often easier to type those few tokens into the code at the desired place than to give the (textual) command to the debugger; specifying the exact source file and line number manually can be a bit annoying.
In textbook C, abort is the way to deliberately crash the program. However, when you're programming close to the metal, you might have to worry about the possibility of abort not working as intended! The standard POSIXy implementation of abort calls getpid and kill (via raise) to deliver SIGABRT to the process, which in turn may cause execution of a signal handler, which can do as it likes. There are situations, e.g. deep in the guts of malloc, in response to catastrophic, possibly-adversarial memory corruption, where you need to force a crash without touching the stack at all (specifically, without executing a return instruction, which might jump to malicious code). *(long *)0 = 0 is not the craziest thing to try in those circumstances. It does still risk executing a signal handler, but that's unavoidable; there is no way to trigger SIGKILL without making a function call. More seriously (IMHO) modern compilers are a little too likely to see that, observe that it has undefined behavior, delete it, and delete the test as well, because the test can't possibly ever be true, because no one would deliberately invoke undefined behavior, would they? If this kind of logic seems perverse, please read the LLVM group's discourse on undefined behavior and optimization (part 2, part 3).
There are better ways to achieve this goal. Many compilers nowadays have an intrinsic (e.g. gcc, clang: __builtin_trap()) that generates a machine instruction that is guaranteed to cause a hardware fault and delivery of SIGILL; unlike undefined tricks with pointers, the compiler won't optimize that out. If your compiler doesn't have that, but does have assembly inserts, you can manually insert such an instruction—this is probably low-level enough code that the additional bit of machine dependence isn't a big deal. Or, you could just call _exit. This is arguably the safest way to play it, because it doesn't risk running signal handlers, and it involves no function returns even internally. But it does mean you don't get a core dump.
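As a rough sketch of the intrinsic-based approach, the check from the question could be written as below. crash_now(), Rect and area_or_crash() are hypothetical names introduced for illustration; __builtin_trap() is the GCC/Clang intrinsic mentioned above, and std::abort() is used as a fallback for other compilers.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: stop the program at once, in a way the optimizer
// will not silently remove.
[[noreturn]] inline void crash_now() {
#if defined(__GNUC__) || defined(__clang__)
    __builtin_trap();  // emits a trapping instruction
#else
    std::abort();      // fallback when no trap intrinsic is known
#endif
}

// Hypothetical stand-in for the type of `r` in the question.
struct Rect {
    long wid;
    long ht;
};

// Returns the area, or crashes if it is smaller than `tot`.
long area_or_crash(const Rect& r, long tot) {
    if (r.wid * r.ht < tot)
        crash_now();
    return r.wid * r.ht;
}
```

Unlike the null-pointer store, nothing here relies on undefined behavior, so the compiler has no licence to delete the test.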
To cause a program to 'exit abnormally', use the abort() function (http://pubs.opengroup.org/onlinepubs/9699919799/functions/abort.html).
The standard C/C++ idiom for "if condition X is not true, make the program exit abnormally" is the assert() macro. The code above would be better written:
assert( !(r.wid*r.ht < tot) );
or (if you're happy to ignore edge cases), it reads more cleanly as:
assert( r.wid*r.ht >= tot );
If width times height of r is less than total, crash the program.
I am working with a program where my code calls a third party library which uses boost and shared_pointers to create a large and complex structure. This structure is created in a method that I call and at the end of the method I know that the program is finished.
For a large sample that I am handling the code to handle the processing takes 30 minutes and the boost code called automatically at exit takes many hours. Exiting the program without releasing the memory and spending all that time would be a perfectly acceptable outcome.
I tried
vector *iddListV = new vector(); // this WILL leak memory
with all the relevant structures added to the vector but this does not help.
I also tried calling exit(0); before reaching the end of the subroutine. This also causes the boost code to spend many hours trying to release pointers.
How do I get a C++ program (Microsoft C++ on Windows, if that matters) to exit abruptly without calling the boost destructors?
My constraints are that I can call any function before the boost structures are allocated, but cannot modify the code once it starts running.
_Exit quits without calling any destructors.
If you're unconcerned about portability, you can call TerminateProcess(). But remember to take care that you are absolutely sure that your program is in a state which is ready to terminate. For example, if you terminate before I/O has had a chance to flush, then your file data and network streams may become invalid.
It is possible, in a portable manner, to do:
#include <exception>
...
std::terminate();
However, there's a big gotcha: at least on Linux, this may cause a core dump. (I'm really not sure what the behavior is on Windows.)
It should be noted that whether or not destructors are called is implementation-defined. Citing §15.5.1 P2:
In the situation where the search for a handler (15.3) encounters the
outermost block of a function with a noexcept-specification that does
not allow the exception (15.4), it is implementation-defined whether
the stack is unwound, unwound partially, or not unwound at all before
std::terminate() is called.
Additionally in §18.8.3.4 P1:
Remarks: Called by the implementation when exception handling must be
abandoned for any of several reasons (15.5.1), in effect immediately
after evaluating the throw-expression (18.8.3.1). May also be called
directly by the program.
C++11 also defines the function std::quick_exit(int status) that can be used in a similar manner (presumably without a coredump). This function is available from <cstdlib>.
I have a C++ application cross-compiled for Linux running on an ARM CortexA9 processor which is crashing with a SIGFPE/Arithmetic exception. Initially I thought that it's because of some optimizations introduced by the -O3 flag of gcc but then I built it in debug mode and it still crashes.
I debugged the application with gdb which catches the exception but unfortunately the operation triggering exception seems to also trash the stack so I cannot get any detailed information about the place in my code which causes that to happen. The only detail I could finally get was the operation triggering the exception(from the following piece of stack trace):
3 raise() 0x402720ac
2 __aeabi_uldivmod() 0x400bb0b8
1 __divsi3() 0x400b9880
__aeabi_uldivmod() performs an unsigned long long division and remainder, so I tried the brute-force approach and searched my code for places that might use that operation, but without much success, as it proved to be a daunting task. I also tried to check for potential divisions by zero, but again the code base is pretty large, and checking every division operation is a cumbersome and somewhat dumb approach. So there must be a smarter way to figure out what's happening.
Are there any techniques to track down the causes of such exceptions when the debugger cannot do much to help?
UPDATE: After crunching on hex numbers, dumping memory and doing stack forensics(thanks Crashworks) I came across this gem in the ARM Compiler documentation(even though I'm not using the ARM Ltd. compiler):
Integer division-by-zero errors can be trapped and identified by
re-implementing the appropriate C library helper functions. The
default behavior when division by zero occurs is that when the signal
function is used, or
__rt_raise() or __aeabi_idiv0() are re-implemented, __aeabi_idiv0() is
called. Otherwise, the division function returns zero.
__aeabi_idiv0() raises SIGFPE with an additional argument, DIVBYZERO.
So I put a breakpoint at __aeabi_idiv0 (and __aeabi_ldiv0), et voilà! I had my complete stack trace before it was completely trashed. Thanks, everybody, for the very informative answers!
Disclaimer: the "winning" answer was chosen solely and subjectively taking into account the weight of its suggestions into my debugging efforts, because more than one was informative and really helpful.
My first suggestion would be to open a memory window looking at the region around your stack pointer, and go digging through it to see if you can find uncorrupted stack frames nearby that might give you a clue as to where the crash was. Usually stack-trashes only burn a couple of the stack frames, so if you look upwards a few hundred bytes, you can get past the damaged area and get a general sense of where the code was. You can even look down the stack, on the assumption that the dead function might have called some other function before it died, and thus there might be an old frame still in memory pointing back at the current IP.
In the comments, I linked some presentation slides that illustrate the technique on a PowerPC — look at around #73-86 for a case study in a similar botched-stack crash. Obviously your ARM's stack frames will be laid out differently, but the general principle holds.
(Using the basic idea from Fedor Skrynnikov, but with compiler help instead)
Compile your code with -pg. This will insert calls to mcount and mcountleave() in every function. Do not link against the GCC profiling lib, but provide your own. The only thing you want to do in your mcount and mcountleave() is to keep a copy of the current stack, so just copy the top 128 bytes or so of the stack to a fixed buffer. Both the stack and the buffer will be in cache all the time so it's fairly cheap.
You can implement special guards in the functions that can cause the exception. A guard is a simple class: in its constructor you put the name of the file and the line (__FILE__, __LINE__) into a file/array/whatever. The main condition is that this storage must be shared by all instances of the class (a kind of stack). In the destructor you remove that entry. To make it work, you need to create the guard on the first line of each function, and only on the stack. When execution leaves the current block, the destructor is called. So at the moment of your exception you will know from this improvised call stack which function is causing the problem.
Of course, you may put the creation of this guard under a debug condition.
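A minimal sketch of such a guard could look like the following. All names (TraceGuard, TraceEntry, TRACE_GUARD, kMaxDepth) are invented for illustration, the storage is a fixed static array, and the sketch is single-threaded only; a multithreaded program would need one such stack per thread.

```cpp
#include <cassert>
#include <cstddef>

// One recorded call site.
struct TraceEntry {
    const char* file;
    int line;
};

constexpr std::size_t kMaxDepth = 64;  // arbitrary limit for this sketch

class TraceGuard {
public:
    TraceGuard(const char* file, int line) {
        if (depth_ < kMaxDepth)
            entries_[depth_] = {file, line};  // push the call site
        ++depth_;
    }

    ~TraceGuard() {
        assert(depth_ > 0);
        --depth_;  // pop on scope exit
    }

    TraceGuard(const TraceGuard&) = delete;
    TraceGuard& operator=(const TraceGuard&) = delete;

    static std::size_t depth() { return depth_; }
    static const TraceEntry* entries() { return entries_; }

private:
    // Shared by all instances: this is the improvised call stack.
    static TraceEntry entries_[kMaxDepth];
    static std::size_t depth_;
};

TraceEntry TraceGuard::entries_[kMaxDepth];
std::size_t TraceGuard::depth_ = 0;

// Place on the first line of every function of interest.
#define TRACE_GUARD() TraceGuard trace_guard_instance(__FILE__, __LINE__)

// Two hypothetical functions showing the nesting.
std::size_t inner() { TRACE_GUARD(); return TraceGuard::depth(); }
std::size_t outer() { TRACE_GUARD(); return inner(); }
```

Because the entries live in static storage and not on the stack, they should survive a trashed stack and can be read from the debugger or dumped from a signal handler.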
Enable generation of core files, and open the core file with the debugger.
Since it uses raise() to raise the exception, I would expect that signal() should be able to catch it. Is this not the case?
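A minimal sketch of that idea, using only the standard signal interface, is shown here. The names on_sigfpe, g_fpe_seen and install_fpe_handler are made up for this example. The handler only sets a flag, which is fine when the signal arrives through raise() as in the stack trace above; returning from a handler for a hardware-generated SIGFPE is undefined, so a real handler on such a platform should log and terminate instead.

```cpp
#include <cassert>
#include <csignal>

// Written from the handler; volatile sig_atomic_t is safe to set from a
// portable signal handler.
volatile std::sig_atomic_t g_fpe_seen = 0;

extern "C" void on_sigfpe(int /*signum*/) {
    g_fpe_seen = 1;  // convenient place for a debugger breakpoint
}

// Returns true if the handler was installed.
bool install_fpe_handler() {
    return std::signal(SIGFPE, on_sigfpe) != SIG_ERR;
}
```

Installing the handler early in main and breaking inside it gives the debugger a chance to stop while the frames above raise() are still readable.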
Alternatively, you can set a conditional breakpoint at __aeabi_uldivmod to break when divisor (r1) is 0.
Possible Duplicate:
Common reasons for bugs in release version not present in debug mode
Sometimes I encounter a strange situation: the program terminates with a pop-up termination dialog when run normally, but runs correctly under the debugger. This is really frustrating when I want to use the debugger to find the bug in my code.
Have you ever met this kind of situation, and why does it happen?
Update:
To show there are logical reasons that lead to such a frustrating situation:
I think one big possibility is a heap access violation. I once wrote a function that allocated a small buffer, but later I stepped outside its boundary. It ran correctly within gdb, cdb, etc. (I do not know why, but it did run correctly), but terminated abnormally when run normally.
I am using C++.
I do not think my problem duplicates the one above.
That one is a comparison between release mode and debug mode, but mine is between debugging and not debugging, for which there is a word, heisenbug, as many others noted.
thanks.
You have a heisenbug.
Debugger might be initializing values
Some environments initialize variables and/or memory to known values like zero in debug builds but not release builds.
Release might be built with optimizations
Modern compilers are good, but it could hypothetically happen that optimized code functions differently than non-optimized code. Edit: These days, compiler bugs are rare. If you find yourself thinking you have one, exhaust all other ideas first.
There can be other reasons for heisenbugs.
Here's a common gotcha that can lead to a Heisenbug (love that name!):
// Sanity check - this should never fail
ASSERT( ReleaseResources() == SUCCESS);
In a debug build, this will work as expected, but the ASSERT macro's argument is ignored in a release build. By ignored, I mean that not only won't the result be reported, but the expression won't be evaluated at all (i.e. ReleaseResources() won't be called).
This is a common mistake, and it's why the Windows SDK defines a VERIFY() macro in addition to the ASSERT() macro. They both generate an assertion dialog at runtime in a debug build if the argument evaluates to false. Their behavior is different for a release build, however. Here's the difference:
ASSERT( foo() == true ); // Confirm that call to foo() was successful
VERIFY( bar() == true ); // Confirm that call to bar() was successful
In a debug build, the above two macros behave identically. In a release build, however, they are essentially equivalent to:
; // Confirm that call to foo() was successful
bar(); // Confirm that call to bar() was successful
By the way, if your environment defines an ASSERT() macro, but not a VERIFY() macro, you can easily define your own:
#ifdef _DEBUG
// DEBUG build: Define VERIFY simply as ASSERT
# define VERIFY(expr) ASSERT(expr)
#else
// RELEASE build: Define VERIFY as the expression, without any checking
# define VERIFY(expr) ((void)(expr))
#endif
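To make the difference concrete, here is a small sketch that hard-codes the two release-build expansions and counts how often the wrapped function runs. RELEASE_STYLE_ASSERT, RELEASE_STYLE_VERIFY and ReleaseResourcesStub are hypothetical names used only for this illustration; they are not part of any SDK.

```cpp
#include <cassert>

// Hypothetical stand-ins for how the two macros typically expand in a
// release build.
#define RELEASE_STYLE_ASSERT(expr) ((void)0)       // expression never evaluated
#define RELEASE_STYLE_VERIFY(expr) ((void)(expr))  // evaluated, result ignored

static int g_release_calls = 0;

// Hypothetical function with a side effect we care about.
bool ReleaseResourcesStub() {
    ++g_release_calls;
    return true;
}

// Number of calls that actually happen when wrapped in the ASSERT form.
int calls_made_via_assert() {
    g_release_calls = 0;
    RELEASE_STYLE_ASSERT(ReleaseResourcesStub() == true);
    return g_release_calls;
}

// Number of calls that actually happen when wrapped in the VERIFY form.
int calls_made_via_verify() {
    g_release_calls = 0;
    RELEASE_STYLE_VERIFY(ReleaseResourcesStub() == true);
    return g_release_calls;
}
```

The first function reports zero calls and the second reports one, which is exactly the behavioral gap between the two macros in a release build.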
Hope that helps.
Apparently stackoverflow won't let me post a response which contains only a single word :)
VALGRIND
When using a debugger, sometimes memory gets initialized (e.g. zero'ed) whereas without a debugging session, memory can be random. This could explain the behavior you are seeing.
You have dialogs, so there may be threads in your application. If there are threads, there is a possibility of race conditions.
Let's say your main thread initializes a structure that another thread uses. When you run your program inside the debugger, the initializing thread may be scheduled before the other thread, while in your real-life situation the thread that uses the structure is scheduled before the other thread has actually initialized it.
In addition to what JeffH said, you have to consider if the deploying computer (or server) has the same environment/libraries/whatever_related_to_the_program.
Sometimes it's very difficult to debug correctly if you debug with other conditions.
Giovanni
Also, debuggers might add some padding around allocated memory changing the behaviour. This has caught me out a number of times, so you need to be aware of it. Getting the same memory behaviour in debug is important.
For MSVC, this can be disabled with the env-var _NO_DEBUG_HEAP=1. (The debug heap is slow, so this helps if your debug runs are hideously slow too..).
Another way to get the same effect is to start the process outside the debugger, so you get a normal startup, then wait on the first line in main and attach the debugger to the process. That should work for "any" system, provided that you don't crash before main. (You could then wait in the constructor of a static object that is constructed before main...)
But I've no experience with gcc/gdb in this matter, but things might be similar there... (Comments welcome.)
One real-world example of a heisenbug, from Raymond Zhang:
/*--------------------------------------------------------------
GdPage.cpp : a real example to illustrate Heisenberg Effect
related with guard page by Raymond Zhang, Oct. 2008
--------------------------------------------------------------*/
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
int main()
{
    LPVOID lpvAddr; // address of the test memory
    lpvAddr = VirtualAlloc(NULL, 0x4096,
                           MEM_RESERVE | MEM_COMMIT,
                           PAGE_READONLY | PAGE_GUARD);
    if(lpvAddr == NULL)
    {
        printf("VirtualAlloc failed with %ld\n", GetLastError());
        return -1;
    }
    return *(long *)lpvAddr;
}
The program terminates abnormally whether compiled as Debug or Release, because specifying the PAGE_GUARD flag has this effect:
Pages in the region become guard
pages. Any attempt to read from or
write to a guard page causes the
system to raise a STATUS_GUARD_PAGE
exception and turn off the guard page
status. Guard pages thus act as a
one-shot access alarm.
So you'd get STATUS_GUARD_PAGE when trying to access *lpvAddr. But if you load the program in a debugger and watch *lpvAddr, or step through the last statement, return *(long *)lpvAddr, instruction by instruction, the debugger reads the guard page itself to determine the value of *lpvAddr. So the debugger will have cleared the guard alarm for us before we access *lpvAddr.
Which programming language are you using? Certain languages, such as C++, behave slightly differently between release and debug builds. In the case of C++, this means that when you declare a variable, such as int i;, in debug builds it may be initialised to a known value (0 or a fill pattern, depending on the toolchain), while in release builds it may take any value (whatever was stored in its memory location before).
One big reason is that debug code may define the _DEBUG macro that one may use in the code to add extra stuff in debug builds.
For multithreaded code, optimization may affect ordering which may influence race conditions.
I do not know if debug code adds code on the stack to mark stack frames. Any extra stuff on the stack may hide the effects of buffer overruns.
Try using the same command options as your release build and just add the -g (or equivalent debug flag). gcc allows the debug option together with the optimization options.
If your logic depends on data from the system clock, you could see serious probe effects. If you break into the debugger, you will obviously affect the values returned from clock functions such as timeGetTime(). The same is true if your program takes longer to execute. As other people have said, debug builds insert NOPs. Also, simply running under the debugger (without hitting breakpoints) might slow things down.
An example of where this might happen is a real-time physics simulation with a variable time step, based off elapsed system time. This is why there are articles like this:
http://gafferongames.com/game-physics/fix-your-timestep/
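For illustration, a minimal sketch of the accumulator approach commonly used to decouple a simulation from measured frame time is given here. FixedStepper and its members are hypothetical names, and integer microseconds are used so the arithmetic is exact; this is one possible shape, not a reproduction of the linked article's code.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: converts irregular elapsed time into a whole number
// of fixed-size simulation steps, carrying the remainder forward.
class FixedStepper {
public:
    using Duration = std::chrono::microseconds;

    explicit FixedStepper(Duration step) : step_(step) {
        assert(step.count() > 0);
    }

    // Feed the measured elapsed time; returns how many fixed steps to run now.
    int advance(Duration elapsed) {
        accumulator_ += elapsed;
        int steps = 0;
        while (accumulator_ >= step_) {
            accumulator_ -= step_;
            ++steps;
        }
        total_steps_ += steps;
        return steps;
    }

    long total_steps() const { return total_steps_; }

private:
    Duration step_;
    Duration accumulator_{0};
    long total_steps_ = 0;
};
```

Each simulation step then always sees the same time delta, whatever the debugger does to frame timing. A long pause at a breakpoint still shows up as one large elapsed value, so callers may want to clamp it before passing it to advance().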