In the following code, *(long*)0=0; is used along with the if clause, but what is its purpose?
if(r.wid*r.ht < tot)
*(long*)0=0;
It writes 0 to 0 interpreted as the address of a long, i.e. the NULL pointer. It's not a valid thing to be doing, since NULL is never an address at which you can validly have data that your program can access. This code triggers undefined behavior; you cannot rely on it to have any particular effect, in general.
However, often code like this is used to force a segmentation fault-type crash, which is sometimes handy to drop into a debugger.
Again, this is undefined behavior; there is no guarantee that it will cause such a fault, but on systems that have segmentation faults, the above code is pretty likely to generate one. On other systems it might do something completely different.
If you get a segfault, it's sometimes more convenient to trigger one this way than by manually setting a breakpoint in the debugger. For instance if you're not using an IDE, it's often easier to type those few tokens into the code in the desired place, than it is to give the (textual) command to the debugger, specifying the exact source code file and line number manually can be a bit annoying.
In textbook C, abort is the way to deliberately crash the program. However, when you're programming close to the metal, you might have to worry about the possibility of abort not working as intended! The standard POSIXy implementation of abort calls getpid and kill (via raise) to deliver SIGABRT to the process, which in turn may cause execution of a signal handler, which can do as it likes. There are situations, e.g. deep in the guts of malloc, in response to catastrophic, possibly-adversarial memory corruption, where you need to force a crash without touching the stack at all (specifically, without executing a return instruction, which might jump to malicious code). *(long *)0 = 0 is not the craziest thing to try in those circumstances. It does still risk executing a signal handler, but that's unavoidable; there is no way to trigger SIGKILL without making a function call. More seriously (IMHO) modern compilers are a little too likely to see that, observe that it has undefined behavior, delete it, and delete the test as well, because the test can't possibly ever be true, because no one would deliberately invoke undefined behavior, would they? If this kind of logic seems perverse, please read the LLVM group's discourse on undefined behavior and optimization (part 2, part 3).
There are better ways to achieve this goal. Many compilers nowadays have an intrinsic (e.g. gcc, clang: __builtin_trap()) that generates a machine instruction that is guaranteed to cause a hardware fault and delivery of SIGILL; unlike undefined tricks with pointers, the compiler won't optimize that out. If your compiler doesn't have that, but does have assembly inserts, you can manually insert such an instruction—this is probably low-level enough code that the additional bit of machine dependence isn't a big deal. Or, you could just call _exit. This is arguably the safest way to play it, because it doesn't risk running signal handlers, and it involves no function returns even internally. But it does mean you don't get a core dump.
To cause a program to 'exit abnormally', use the abort() function (http://pubs.opengroup.org/onlinepubs/9699919799/functions/abort.html).
The standard C/C++ idiom for "if condition X is not true, make the program exit abnormally" is the assert() macro. The code above would be better written:
assert( !(r.wid*r.ht < tot) );
or (if you're happy to ignore edge cases), it reads more cleanly as:
assert( r.wid*r.ht >= tot );
If width times height of r is less than total, crash the program.
Related
I'm currently dealing with one of the strangest bugs I have ever seen. I have this "else if" statement and inside the else-if I have a line of code that is causing a bug to happen elsewhere (my program is kind of complicated so I don't think it would help to post a short snippet of code here because it would be too difficult to explain -- so I apologize in advance if this post seems rather vague).
The issue is that the line of code that is causing the bug is not being called at all. I put a break point at the line and also put a print statement before it but the program never enters that particular "if-else" statement. The bug goes away when I comment out the line and shows up again when I uncomment it. This leads me to believe that the line must be getting called somehow but my break point and prints suggest otherwise.
Has anyone ever heard of something like this happening? Why would a line of code that is not even being called affect the rest of my program? Are there other ways to detect if the line is being called somehow besides using breakpoints and print statements?
I'm using XCode as my IDE and this is a single threaded program (so it's not some weird asynchronous bug)
PROBLEM HAS BEEN SOLVED. SEE TOP ANSWER
It actually may happen in some cases indeed and I already saw it before. If the bug is some slight buffer overflow then the presence/abasence of that line may make the compiler differently optimize the memory layout (ie. not allocate some variables or arrange them in a different way or place segments differently for example) that will by chance not trigger the problem anymore.
The same applies if the bug is a strange race condition: the lack of that line may change slightly the timings (due to differently optimized code) and make the bug come out.
Very long shot: that code may even somehow trigger a compiler bug. But this may be less the case, but it may.
So: yes it's definitely possible and I already saw it. And if something like this is happening to you and you're 100% sure your code is correct then be very careful since something quite nasty may be hiding in the code.
Not that there is enough information here to really answer this question, but perhaps I can try to provide some pointers.
Optimization. Try turning it off if it is turned on at all. When the compiler is optimizing your code, it may make different decisions when a statement is present vs. when it isn't. Whenever I am dealing with bugs that go away when a a seemingly unrelated code construct is changed, it is usually that the optimizer is doing something differently. I suppose a very simple example would be if the statement accesses something that the compiler would otherwise think is never accessed and can be folded away. The "unrelated" line may take an address of something which affects aliasing information, etc. All this isn't to say that the optimizer is wrong, your code likely still does have a bug, but it just explains the weird behaviour.
Debugging. It is very unlikely that the bug is in this line that is not reached. What I would focus on is setting watch points for the variables that are getting incorrect values (assuming you can narrow it down to what is receiving the wrong value).
These very weird issues often indicate an uninitialized variable or something along those lines. If the underlying compiler supports it, you can try providing an option that will cause the program to trap when an uninitialized memory location is being accessed.
Finally, this could be an indication that something is overwriting areas of the stack (and perhaps less likely other memory areas - heap, data). If you have anything that is writing to arrays allocated on the stack, check if you're walking past the end of the array.
Make sure you're using { and } around the bodies in your if() and else clauses. Without them it's easy to wind up with something like this:
if (a)
do_a_work();
if (b)
do_a_and_b_work();
else
do_not_a_work();
Which actually equates to this (and is not what's implied by the indenting):
if (a) {
do_a_work();
}
if (b) {
do_a_and_b_work();
} else {
do_not_a_work();
}
because the brace-less if (a) only takes the first statement below it for its body.
Commenting out your mystery line may be changing which code belongs to which if-else.
I'm on amd64 architecture. The following code gets rejected by the g++ compiler at the "if" statement:
void * newmem=malloc(n);
if(newmem==0xefbeaddeefbeadde){
with the error message:
error: ISO C++ forbids comparison between pointer and integer [-fpermissive]
I can't seem to find the magic incantation needed to get it going (and I don't want to use -fpermissive). Any help appreciated.
Background:
I'm hunting an ugly bug which crashes my program while requesting memory in some STL new operation (at least gdb told me that). Thinking it could be some memory overrun of one of the allocated memory chunks being, by bad luck, adjacent to memory used by the OS to manage memory lists of my program, I quickly overrode new(), new plus their delete counterparts with own routines that added memory fencing; and while the application still crashes (all fences intact (sigh)), gdb now told me this:
Program received signal SIGSEGV, Segmentation fault.
0x0000000000604f5e in construct (__val=..., __p=0xefbeaddeefbeadde, this=<optimized out>)
at /usr/include/c++/4.6/ext/new_allocator.h:108
108 { ::new((void *)__p) _Tp(__val); }
I noted that the pointer __p has as value the pointer representation of the value I used for one of my memory fences (0xdeadbeef), hence my wish to catch this earlier in my new() to try and dump out some more complex values in my program.
Additional note: the function where it crashes runs flawlessly a couple of million times before it crashes (intermixed with dozens of other routines which all also run a couple of thousand to million times), using valgrind does not seem like an option atm because it takes 6hrs and 11 Gb before my program crashes.
One of them is an integer, and the other a pointer. Perhaps an ugly cast would do
if(newmem==(void*)0xefbeaddeefbeadde){
For your question: you have 2 options - cast the pointer to 64bit number or cast the number to void*
For the background: as this takes too long, it could be a memory leak. So, try valgrind - no need to wait for crash, just run your program through valgrind and use the appropriate options. valgrind could help you to catch undefined behavior, too.
Another thing - compile your program with -O0 optimization level and with max level of debug symbols - -ggdb3. Also, if your executable does not generate a core dump, when this error occurs, make it generate. Then leave it working, while you're trying different approaches to catch the error.
In case you don't succeed, you'll at least have a (hopefully) nice core dump, that you can examine with gdb your_exe the_core_dump. Then, watch the core frame by frame, watch the variables, etc.
If it's not a memory leak, it could be a memory corruption somewhere, undefined behavior or something like this. If a good core dump is generated, you'll probably catch the error. Or, at least, it could give you useful information about the case.
Another thing, that could be useful, if you generate a core dump (no matter if it's a good or a bad one) - the size of the dump. If it's too large (the definition of "large" this depends on your program), then you most probably have a memory leak (it could be some kind of cross-reference issue, if you use smart pointers and if so - valgrind will not catch it)
That probably won't work that way. Next time when your program runs, malloc may never return that value. There is something else you must do, rather than just checking on the return value of new!
I have a C++ application cross-compiled for Linux running on an ARM CortexA9 processor which is crashing with a SIGFPE/Arithmetic exception. Initially I thought that it's because of some optimizations introduced by the -O3 flag of gcc but then I built it in debug mode and it still crashes.
I debugged the application with gdb which catches the exception but unfortunately the operation triggering exception seems to also trash the stack so I cannot get any detailed information about the place in my code which causes that to happen. The only detail I could finally get was the operation triggering the exception(from the following piece of stack trace):
3 raise() 0x402720ac
2 __aeabi_uldivmod() 0x400bb0b8
1 __divsi3() 0x400b9880
The __aeabi_uldivmod() is performing an unsigned long long division and reminder so I tried the brute force approach and searched my code for places that might use that operation but without much success as it proved to be a daunting task. Also I tried to check for potential divisions by zero but again the code base it's pretty large and checking every division operation it's a cumbersome and somewhat dumb approach. So there must be a smarter way to figure out what's happening.
Are there any techniques to track down the causes of such exceptions when the debugger cannot do much to help?
UPDATE: After crunching on hex numbers, dumping memory and doing stack forensics(thanks Crashworks) I came across this gem in the ARM Compiler documentation(even though I'm not using the ARM Ltd. compiler):
Integer division-by-zero errors can be trapped and identified by
re-implementing the appropriate C library helper functions. The
default behavior when division by zero occurs is that when the signal
function is used, or
__rt_raise() or __aeabi_idiv0() are re-implemented, __aeabi_idiv0() is
called. Otherwise, the division function returns zero.
__aeabi_idiv0() raises SIGFPE with an additional argument, DIVBYZERO.
So I put a breakpoint at __aeabi_idiv0(_aeabi_ldiv0) et Voila!, I had my complete stack trace before being completely trashed. Thanks everybody for their very informative answers!
Disclaimer: the "winning" answer was chosen solely and subjectively taking into account the weight of its suggestions into my debugging efforts, because more than one was informative and really helpful.
My first suggestion would be to open a memory window looking at the region around your stack pointer, and go digging through it to see if you can find uncorrupted stack frames nearby that might give you a clue as to where the crash was. Usually stack-trashes only burn a couple of the stack frames, so if you look upwards a few hundred bytes, you can get past the damaged area and get a general sense of where the code was. You can even look down the stack, on the assumption that the dead function might have called some other function before it died, and thus there might be an old frame still in memory pointing back at the current IP.
In the comments, I linked some presentation slides that illustrate the technique on a PowerPC — look at around #73-86 for a case study in a similar botched-stack crash. Obviously your ARM's stack frames will be laid out differently, but the general principle holds.
(Using the basic idea from Fedor Skrynnikov, but with compiler help instead)
Compile your code with -pg. This will insert calls to mcount and mcountleave() in every function. Do not link against the GCC profiling lib, but provide your own. The only thing you want to do in your mcount and mcountleave() is to keep a copy of the current stack, so just copy the top 128 bytes or so of the stack to a fixed buffer. Both the stack and the buffer will be in cache all the time so it's fairly cheap.
You can implement special guards in functions that can cause the exception. Guard is a simple class, in constractor of this class you put the name of the file and line (_FILE_, _LINE_) into file/array/whatever. The main condition is that this storage should be the same for all instances of this class(kind of stack). In the destructor you remove this line. To make it works you need to put the creation of this guard on the first line of each function and to create it only on stack. When you will be out of current block deconstructor will be called. So in the moment of your exception you will know from this improvised callstack which function is causing a problem.
Ofcaurse you may put creation of this class under debug condition
Enable generation of core files, and open the core file with the debuger
Since it uses raise() to raise the exception, I would expect that signal() should be able to catch it. Is this not the case?
Alternatively, you can set a conditional breakpoint at __aeabi_uldivmod to break when divisor (r1) is 0.
I'm working on a library where I'm farming various tasks out to some third-party libraries that do some relatively sketchy or dangerous platform-specific work. (In specific, I'm writing a mathematical function parser that calls JIT-compilers, like LLVM or libjit, to build machine code.) In practice, these third-party libraries have a tendency to be crashy (part of this is my fault, of course, but I still want some insurance).
I'd like, then, to be able to very gracefully deal with a job dying horribly -- SIGSEGV, SIGILL, etc. -- without bringing down the rest of my code (or the code of the users calling my library functions). To be clear, I don't care if that particular job can continue (I'm not going to try to repair a crash condition), nor do I really care about the state of the objects after such a crash (I'll discard them immediately if there's a crash). I just want to be able to detect that a crash has occurred, stop the crash from taking out the entire process, stop calling whatever's crashing, and resume execution.
(For a little more context, the code at the moment is a for loop, testing each of the available JIT-compilers. Some of these compilers might crash. If they do, I just want to execute continue; and get on with testing another compiler.)
Currently, I've got a signal()-based implementation that fails pretty horribly; of course, it's undefined behavior to longjmp() out of a signal handler, and signal handlers are pretty much expected to end with exit() or terminate(). Just throwing the code in another thread doesn't help by itself, at least the way I've tested it so far. I also can't hack out a way to make this work using C++ exceptions.
So, what's the best way to insulate a particular set of instructions / thread / job from crashes?
Spawn a new process.
What output do you collect when a job succeeds?
I ask because if the output is low bandwidth I would be tempted to run each job in its own process.
Each of these crashy jobs you fire up has a high chance of corrupting memory used elsewhere in your process.
Processes offer the best protection.
Processes offer the best protection, but it's possible you can't do that.
If your threads' entry points are functions you wrote, (for example, ThreadProc in the Windows world), then you can wrap them in try{...}catch(...) blocks. If you want to communicate that an exception has occurred, then you can communicate specific error codes back to the main thread or use some other mechanism. If you want to log not only that an exception has occured but what that exception was, then you'll need to catch specific exception types and extract diagnostic information from them to communicate back to the main thread. A'la:
int my_tempermental_thread()
{
try
{
// ... magic happens ...
return 0;
}
catch( const std::exception& ex )
{
// ... or maybe it doesn't ...
string reason = ex.what();
tell_main_thread_what_went_wong(reason);
return 1;
}
catch( ... )
{
// ... definitely not magical happenings here ...
tell_main_thread_what_went_wrong("uh, something bad and undefined");
return 2;
}
}
Be aware that if you go this way you run the risk of hosing the host process when the exceptions do occur. You say you're not trying to correct the problem, but how do you know the malignant thread didn't eat your stack for example? Catch-and-ignore is a great way to create horribly confounding bugs.
On Windows, you might be able to use VirtualProtect(YourMemory, PAGE_READONLY) when calling the untrusted code. Any attempt to modify this memory would cause a Structured Exception. You can safely catch this and continue execution. However, memory allocated by that library will of course leak, as will other resources. The Linux equivalent is mprotect(YorMemory, PROT_READ), which causes a SEGV.
So, I have a big piece of legacy software, coded in C. It's for an embedded system, so if something goes wrong, like divide by zero, null pointer dereference, etc, there's not much to do except reboot.
I was wondering if I could implement just main() as c++ and wrap it's contents in try/catch. That way, depending on the type of exception thrown, I could log some debug info just before reboot.
Hmm, since there are multiple processes I might have to wrap each one, not just main(), but I hope that you see what I mean...
Is it worthwhile to leave the existing C code (several 100 Klocs) untouched, except for wrapping it with try/catch?
Division by zero or null pointer dereferencing don't produce exceptions (using the C++ terminology). C doesn't even have a concept of exceptions. If you are on an UNIX-like system, you might want to install signal handlers (SIGFPE, SIGSEGV, etc.).
Since it's an embedded application it probably runs only on one platform.
If so its probably much simpler and less intrusive to install a proper interrupt handler / kernel trap handler for the division by zero and other hard exceptions.
In most cases this can be done with just a few lines of code. Check if the OS supports such installable exception handlers. If you're working on an OS-less system you can directly hook the CPU interrupts that are called during exceptions.
First of all, divide by zero or null pointer dereference won't throw exceptions. Fpe exceptions are implemented in the GNU C library with signals (SIGFPE). And are part of the C99 standard, not the C++ standard.
A few hints based in my experience in embedded C and C++ development. Depending on your embedded OS:
Unless you are absolutely sure of what you are doing, don't try/catch between threads . Only inside the same thread. Each thread usually has its own stack.
Throwing in the middle of legacy C code and catching, is going to lead to problems, since for example when you have jumped to the catch block out of the normal program flow, allocated memory may not be reclaimed, thus you have leaks which is far more problematic than divide by zero or null pointer dereference in an embedded system.
If you are going to use exceptions, throw them inside your C++ code with allocation of other "C" resources in constructors and deallocation in destructors.
Catch them just above the C++ layer.
I don't think machine code compiled from C does the same thing with regard to exceptions, as machine code compiled for C++ which "knows" about exceptions. That is, I don't think it's correct to expect to be able to catch "exceptions" from C code using C++ constructs. C doesn't throw exceptions, and there's nothing that guarantees that errors that happen when C code does something bad are caught by C++'s exception mechanism.
Divide by zero, null pointer dereference are not C or C++ exceptions they are hardware exceptions.
On Windows you can get hardware exceptions like these wrapped in a normal C++ exception if _set_se_translator() is used.
__try/__catch may also help. Look up __try/__catch, GetExceptionCode(), _set_se_translator() on MSDN, you may find what you need.
I think I remember researching on how to handle SIGSEGV ( usually generated when derefing invalid pointer, or dividing by 0 ) and the general advise I found on the net was: do nothing, let the program die. The explanation, I think a correct one, is that by the time you get to a SIGSEGV trap, the stack of the offending thread is trashed beyond repair.
Can someone comment if this is true?
Only work on msvc
__try
{
int p = 0;
p=p / 0;
}
__except (1)
{
printf("Division by 0 \n");
}