In Visual Studio, through the dialog at Debug > Exceptions..., you can set specific C++ exception types to break on or skip past. In Windbg, turning on breaking for C++ exceptions with sxe eh is all or nothing.
Is there any way to skip breaking on specific C++ exception types? Conversely, is there a way to break on only specific types?
Note: This answer is 32-bit specific, as I haven't yet done much 64-bit debugging. I don't know how much applies to 64-bit.
Assume the following code:
class foo_exception : public std::exception {};
void throw_foo()
{
throw foo_exception();
}
And let's assume you've turned on breaking on first chance exceptions for C++ exceptions: sxe eh
Now when the debugger breaks, your exception record will be on the top of the stack. So if you just want to see what the type is, you can display the exception record info:
0:000> .exr @esp
ExceptionAddress: 751dc42d (KERNELBASE!RaiseException+0x00000058)
ExceptionCode: e06d7363 (C++ EH exception)
ExceptionFlags: 00000001
NumberParameters: 3
Parameter[0]: 19930520
Parameter[1]: 0027f770
Parameter[2]: 0122ada0
pExceptionObject: 0027f770
_s_ThrowInfo : 0122ada0
Type : class foo_exception
Type : class std::exception
Take a look at the current stack, and you can see where this stuff is sitting:
0027f6c4 e06d7363
0027f6c8 00000001
0027f6cc 00000000
0027f6d0 751dc42d KERNELBASE!RaiseException+0x58
0027f6d4 00000003
0027f6d8 19930520
0027f6dc 0027f770
0027f6e0 0122ada0 langD!_TI2?AVfoo_exception
...
So the exception itself is sitting at 0027f770 in this example, as you can see from the .exr output next to pExceptionObject. And you can see that value on the stack at 0027f6dc, or offset from the top of the stack by 0x18, so @esp+18. Let's see what the debugger tells us about that location.
0:000> dpp @esp+18 L1
0027f6dc 0027f770 01225ffc langD!foo_exception::`vftable'
This command says: starting at @esp+18, dump one pointer-sized value, then deref the value found there as a pointer, too, and write the name of any symbol matching that second address. And in this case it found the vtable for the foo_exception class. That tells us that the object at address 0027f770 is a foo_exception. And we can use that information to create an expression for a conditional breakpoint.
We need a way to get the address of the vtable directly, and that looks like this:
@!"langD!foo_exception::`vftable'"
We have to quote it because of the back tick and apostrophe. We also need to pull the desired stack value:
poi(poi(@esp+18))
The poi operator takes an address and returns a pointer-sized value stored there. The first evaluation turns the stack address into the object address, and the second evaluation turns the object address into the vtable address, which we need for comparison. The whole condition looks like this:
@!"langD!foo_exception::`vftable'" == poi(poi(@esp+18))
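To make the double dereference concrete, here is a minimal C++ sketch of what the inner poi() reads once it has the object address. It assumes a common ABI (MSVC or Itanium) in which a simple polymorphic object stores its vtable pointer in its first pointer-sized word; the standard does not guarantee that layout. The helper first_word and the class bar_exception are hypothetical names introduced only for this illustration.

```cpp
#include <cstdint>
#include <cstring>
#include <exception>

class foo_exception : public std::exception {};
class bar_exception : public std::exception {};  // hypothetical second type

// Copies the first pointer-sized word out of an object, which is roughly what
// poi(<object address>) does in the debugger. On common ABIs that word is the
// vtable pointer of a simple polymorphic class (an assumption, not a guarantee).
template <typename T>
std::uintptr_t first_word(const T& object) {
    static_assert(sizeof(T) >= sizeof(std::uintptr_t), "object too small");
    std::uintptr_t word = 0;
    std::memcpy(&word, static_cast<const void*>(&object), sizeof(word));
    return word;
}
```

Under that assumption, two foo_exception objects yield the same word and a bar_exception yields a different one, which is why comparing against the vftable symbol identifies the thrown type.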
Now that we can tell if it's a foo_exception, we can skip breaking on them by setting a command to run automatically when the debugger breaks on C++ exceptions:
sxe -c".if ( @!\"langD!foo_exception::`vftable'\" == poi(poi(@esp+18)) ) {gc}" eh
Translation:
break on first chance for C++ exceptions and run a command that:
compares the foo_exception vtable address to the vtable address of the object at @esp+18
if they are the same, issue the gc command, which continues running if the debugger was running when this command was reached
(don't forget to escape the inner quotes)
And if you want to break only for a foo_exception, change the condition from == to !=.
Something to keep in mind is that sometimes exceptions are thrown as a pointer instead of by value, which means you'll need one more poi() around the @esp part of the expression. You'll be able to tell because when you dump the exception record with .exr, the Type will be class foo_exception *. This is completely dependent on the code that throws the exception and not the exception type itself, so you may need to tailor your .if-condition for the situation.
Lastly, if you want to break on or skip several exception types, it is doable. I would suggest writing a script with chained .if and .elsif commands and setting the sxe automatic command to $$><path\to\script. Doing a ton of if-condition chaining on one line can be very difficult to read and get right, especially with the extra escaping. A script won't need the extra escaping. Here's a small example:
.if ( @!"langD!foo_exception::`vftable'" == poi(poi(@esp+0x18)) )
{
$$ skip foo_exceptions
gc
}
.elsif ( @!"langD!bar_exception::`vftable'" == poi(poi(@esp+0x18)) )
{
$$ dump the exception to see the error message, then continue running
dt poi(@esp+18) langD!bar_exception
gc
}
.elsif ( @!"langD!baz_exception::`vftable'" == poi(poi(@esp+0x18)) )
{
$$ show the top 10 frames of the stack and then break (because we don't `gc`)
kc 10
}
(Note: Windbg will complain about a script error whenever this runs because it doesn't like a gc command followed by anything else. But it still runs fine)
I've been working with C and C++ for a fairly long time. I have a computer science minor. I'm familiar with the pitfalls intrinsic to the low level access to process memory these languages provide. I've spent days and weeks in them.
Learning to use valgrind about a decade ago was a lifesaver in terms of catching minor access errors and such. Currently, I also use ASAN with clion, and mistakes of this sort are usually caught and dealt with quickly.
I presume nothing is bulletproof, however, and a recent problem has me completely stumped.[1]
I have an object that includes a non-public sockaddr_storage field named from. This can be accessed via:
const sockaddr_storage* getSockAddr () {
return &from;
}
But the address returned is wrong. Starting from a breakpoint on the return line in gdb:
Breakpoint 3, socketeering::Socket::getSockAddr (this=0x617000000400) at Socket.hpp:81
81 return &from;
(gdb) p this
$1 = (socketeering::UDPsocket * const) 0x617000000400
(gdb) p &from
$2 = (sockaddr_storage *) 0x617000000600
(gdb) p (const sockaddr_storage*)&from
$3 = (const sockaddr_storage *) 0x617000000600
Seems pretty clear the value returned has to be 0x617000000600. But no:
(gdb) fin
Run till exit from #0 socketeering::Socket::getSockAddr (this=0x617000000400) at Socket.hpp:81
0x00000000004290ab in udpHandler::dataReady (this=0x631000014810, iod=0x617000000400, con=0x60e0000249b0) at /opt/cogware/C++/Socketeering2/demo/echo_server.cpp:66
66 auto sa = sock->getSockAddr();
Value returned is $4 = (const sockaddr_storage *) 0x617000000618
^^
(gdb) p sock
$5 = (socketeering::UDPsocket *) 0x617000000400
That's no good -- it is 18 bytes inside the structure. Even worse, I CANNOT reproduce it with a simple SSCCE:
class foo {
sockaddr_storage ss;
public:
foo () { cout << &ss << "\n"; }
const sockaddr_storage* getSockAddr () { return &ss; }
};
Meaning it's not some misunderstanding of the rules, etc. It's obviously not a logic error either.
It has to be corruption, right?
This is a single threaded process, and if instead of fin I just keep stepping to see what's happening, there is literally nothing to see. One step to the function close, and the next one is at the assignment with the wrong value. Neither valgrind nor ASAN indicate any hijinx.
What can I look at to find out what is happening? Obviously something is going wrong here in between:
return &from;
And the actual return of a value. Is looking at assembly dumps for clues the only route left to me (presuming that would help at all, I'm no ASM guy)?
The answer I dread is there's nothing beyond scouring the code for mistakes that valgrind and ASAN didn't catch. Finding out under what circumstances they would not catch corruption is a starting place for that.
[1] I did raise this earlier in a now-deleted question. All anyone could say was exactly what I would say if I read a question like that: we need an SSCCE, and the corruption could be in other parts of the code. The point is, nothing in the information I can show explains the problem, but, short of inviting everyone onto a 10-20K LOC project, that's all I can do. So what I am asking now is not what's wrong, but "How can I determine what's wrong?"
Is looking at assembly dumps for clues the only route left to me
Yes, using a disas command is the appropriate approach here.
(presuming that would help at all, I'm no ASM guy)?
Even if you can't write assembly, it's often pretty easy to read assembly. Especially if it's something like x86_64 and doesn't involve complicated bit twiddling or floating point. And it's a skill that will serve you well.
Usually a problem of this sort is the result of an ODR violation: somewhere in your program you have a different definition of socketeering::Socket, one in which the offset between this and from is 24 (it's not 18 bytes, it's 0x18 bytes!) instead of 0.
Often such an ODR violation comes from using different #defines in different parts of the code, e.g.
class Socket {
#if defined(TRACING_ON)
char trace_buf[24];
#endif
sockaddr_storage from;
};
Compile the above class in one .cc file with -DTRACING_ON, compile another .cc without it, link them together into a single binary and BOOM: you may see exactly the bug you've described.
Sometimes, the problem comes from not recompiling all code (e.g. you may have an old object or a shared library lying around).
It could also come from linking together code built by different compilers, though this is rare (usually if the compilers are not ABI-compatible, they use different name mangling to preclude the program from linking).
Note: if Socket inherits from some other class, the difference may be coming from the superclass and not the Socket itself.
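As a rough illustration of how such a mismatch moves a member, the sketch below puts the two conflicting layouts side by side in one file. The namespaces built_without_tracing and built_with_tracing are hypothetical stand-ins for two translation units compiled with different flags, and FakeSockaddrStorage is an assumed stand-in for sockaddr_storage so the example does not depend on socket headers.

```cpp
#include <cstddef>

// Assumed stand-in for sockaddr_storage: 128 bytes with 8-byte alignment.
struct FakeSockaddrStorage {
    alignas(8) char bytes[128];
};

namespace built_without_tracing {
// What a .cc file compiled WITHOUT -DTRACING_ON would see.
struct Socket {
    FakeSockaddrStorage from;
};
}  // namespace built_without_tracing

namespace built_with_tracing {
// What a .cc file compiled WITH -DTRACING_ON would see.
struct Socket {
    char trace_buf[24];
    FakeSockaddrStorage from;
};
}  // namespace built_with_tracing

// Offset of `from` as each "translation unit" believes it to be.
constexpr std::size_t offset_without =
    offsetof(built_without_tracing::Socket, from);
constexpr std::size_t offset_with =
    offsetof(built_with_tracing::Socket, from);

static_assert(offset_with - offset_without == 0x18,
              "the two layouts disagree by exactly 0x18 bytes");
```

In a real ODR violation both structs share one name, so code from one translation unit computes &from with one offset while code from the other uses a different one. In gdb, p sizeof(socketeering::Socket) or ptype /o evaluated in different frames can expose the disagreement.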
0x004069f1 in Space::setPosition (this=0x77733cee, x=-65, y=-49) at space.h:44
0x00402679 in Checkers::make_move (this=0x28cbb8, move=...) at checkers.cc:351
0x00403fd2 in main_savitch_14::game::make_computer_move (this=0x28cbb8) at game.cc:153
0x00403b70 in main_savitch_14::game::play (this=0x28cbb8) at game.cc:33
0x004015fb in _fu0___ZSt4cout () at checkers.cc:96
0x004042a7 in main () at main.cc:34
Hello, I am coding a game for a class and I am running into a segfault. The checker pieces are held in a two-dimensional array, so the offending bit appears to be an invalid x/y for the array. The moves are passed as strings, which are converted to integers, so the x and y were somehow ASCII NUL. I noticed that in the function call make_move it says move=...
Why does it say move=...? Also, any other quick tips of solving a segfault? I am kind of new to GDB.
Basically, the backtrace is a trace of the calls that led to the crash. In this case:
game::play called game::make_computer_move which called Checkers::make_move which called Space::setPosition which crashed in line 44 in file space.h.
Looking at this backtrace, it appears you passed -65 and -49 to Space::setPosition, which are presumably invalid coordinates (they sure look suspicious to me, being negative and all). You should then look in the calling functions to see why they have those values and correct them.
I would suggest using assert liberally in the code to enforce contracts, pretty much any time you can say "this parameter or variable should only have values which meet certain criteria", then you should assert that it is the case.
A common example is if I have a function which takes a pointer (or more likely smart pointer) which is not allowed to be NULL. I'll have the first line of the function assert(p);. If a NULL pointer is ever passed, I know right away and can investigate.
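For the checkers case, a contract check of that kind might look like the sketch below. Board, in_bounds, and the 8x8 size are hypothetical names and values chosen for illustration; they are not taken from the asker's code.

```cpp
#include <cassert>

constexpr int kBoardSize = 8;  // assumed board dimension

// True when (x, y) addresses a square that exists on the board.
constexpr bool in_bounds(int x, int y) {
    return x >= 0 && x < kBoardSize && y >= 0 && y < kBoardSize;
}

class Board {
public:
    void set_position(int x, int y, char piece) {
        // Fails loudly at the call that passed bad coordinates,
        // instead of corrupting memory and crashing somewhere later.
        assert(in_bounds(x, y) && "coordinates must lie on the board");
        squares_[y][x] = piece;
    }

    char at(int x, int y) const {
        assert(in_bounds(x, y) && "coordinates must lie on the board");
        return squares_[y][x];
    }

private:
    char squares_[kBoardSize][kBoardSize] = {};
};
```

With a check like this, a call such as set_position(-65, -49, 'x') aborts at the assert in a debug build, and the backtrace then points at the caller that produced the bad values.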
Finally, run the application in gdb. When it crashes, type up to inspect the calling stack frame and see what the variables looked like (you can usually write things like print x in the console). Likewise, down will move down the call stack if you need to.
As for the segfault, I would recommend running the application in valgrind. If you compile with debugging information (-g), it can often tell you the line of code that is causing the error (and can even catch errors that, for unfortunate reasons, don't crash right away).
I am not allowed to comment, but just wanted to reply for anyone looking more recently on the issue trying to find where the variables become (-65, -49). If you are getting a segfault you can get a core dump. Here is a pretty good source for making sure you can set up gdb to get a core dump. Then you can open your core file with gdb:
gdb -c myCoreFile
Then set a breakpoint on your function call you'd like to step into:
b MyClass::myFunctionCall
Then step through with next or step to maneuver through lines of code:
step
or
next
When you are at a place in your code that you'd like to evaluate a variable you can print it:
p myVariable
or you can print all arguments:
info args
I hope this helps someone else looking to debug!
I want to write a DTrace script so that I can analyse whether an overflow_error is happening in a process I am executing. I just know that this is an error thrown as std::overflow_error. I don't have much idea about how to write a DTrace script. I need a beginner's guide, and it would help if someone could tell me how to write it. The process I am running is named, say, superbug_returns. How can I write a DTrace script that checks whether the above scenario is happening or not? I am working on Solaris.
It would be probably much easier to run the program in the debugger (dbx), and have it stop on thrown exceptions.
I second the suggestion to try the debugger with this - there's usually a command to break-on-C++-exception. It's simpler to go that way.
If you insist on DTrace:
A few years ago, Sun published a whitepaper how to use DTrace with C++ - read that.
It's not trivial to apply the techniques described there to the "trace exceptions" usecase, unfortunately, because exception throwing/handling is in the C++ runtime and done through internal (nonexposed) function calls. In gcc-compiled code, throw ... becomes __cxa_throw(...) whereas in SunStudio-compiled code (which uses a different name mangling scheme) a function (unmangled / mangled):
void __Crun::ex_throw(void*,const __Crun::static_type_info*,void(*)(void*))
__1cG__CrunIex_throw6Fpvpkn0AQstatic_type_info_pF1_v_v_
is called. Note that this depends on your compiler version; SunStudio changed their mangling scheme / C++ runtime at some point in the past. In both cases though, std::... would be passed as an argument, so if you want to trace a specific exception class only, you'd need secondary filtering (a D probe predicate that tests whether the exception thrown is really the one you're interested in). You'd need to find out what args to the above function[s] correspond to std::overflow_error being thrown and filter for those.
Without your actual object file, I can't give more advice than that. For a start, try:
gcc:
dtrace -n '
pid<your-process-pid>::__cxa_throw:entry
{
@exloc[ustack()] = count();
}'
SunStudio:
dtrace -n '
pid<your-process-pid>::__1cG__CrunIex_throw6Fpvpkn0AQstatic_type_info_pF1_v_v_:entry
{
@exloc[ustack()] = count();
}'
to find places in your code where exceptions are being thrown (terminating the DTrace with Ctrl+C gives you the statistics). Then iterate from there (try to dump the args, see if you can identify std::overflow_error, filter for that by adding a /arg0 == .../ or similar to the probe).
I have this program written in C++ Builder 6. I didn't write all the code, just some of it. The language, however, is not C++ (as far as I'm aware) - it looks more like Delphi or Pascal. So that's why I included them all in the tags.
I have an int called Oversteering.
try
{
Oversteering=HoursCounter.ToInt();
}
catch(EConvertError &convertError)
{
Oversteering=0;
}
HoursCounter is an AnsiString, and it is in the form of an int.
Since this is the only try/catch statement in the whole code (that's not too good, I know), and I couldn't find any good example of such in Delphi/Pascal/???, I don't know if it's correctly written.
Well, I try to convert the string to an int. Sometimes I get this error:
That is, an exception called EConvertError has occurred.
So my question is: why is this exception NOT caught by the catch statement?
This error is shown by the debugger when running through the code. If you run the exe directly in the same situation, the error message will not be shown to you.
The exception is caught, but the debugger is notifying you about the error in the code here:
try
{
Oversteering=HoursCounter.ToInt();
}
When running in the debugger, you are trying to convert a blank string ('') to an integer, so the debugger shows the exception. When running the exe on its own, the catch block simply sets
Oversteering=0
check this from about.com
Break On Exceptions
When building a program with exception handling, you may not want Delphi to break on Exceptions. This is a great feature if you want Delphi to show where an exception has occurred; however, it can be annoying when you test your own exception handling.
As @PresleyDias explained, it is the debugger that is displaying the exception, not your app. The exception is being caught (you should be catching it by a const reference, though), but the debugger sees it before your app does, that's all. You can configure the debugger to ignore EConvertError, if you like.
A better solution is to avoid the exception in the first place. If you use AnsiString::ToIntDef() instead, you can remove the try/catch block completely:
Oversteering = HoursCounter.ToIntDef(0);
Alternatively, you can use TryStrToInt() instead:
if (!TryStrToInt(HoursCounter, Oversteering))
{
...;
}
If 0 is a valid value for your counter, use TryStrToInt():
if (TryStrToInt(HoursCounter, Oversteering))
{
// use Oversteering as needed, even zeros...
}
else
ShowMessage("Cannot convert HoursCounter to a valid integer!");
If 0 always represents an error, then use ToIntDef():
Oversteering = HoursCounter.ToIntDef(0);
if (Oversteering != 0)
{
// use Oversteering as needed, except zeros...
}
else
ShowMessage("Cannot convert HoursCounter to an acceptable integer!");
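For readers outside C++ Builder, the same "try to convert, fall back on failure" pattern can be sketched in standard C++. This is only an analogue built on std::strtol, not the VCL implementation; try_str_to_int and to_int_def are hypothetical helper names, and details such as whitespace handling may differ from TryStrToInt() and ToIntDef().

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

// Returns true and stores the value only if the whole string parses as an int.
// No exception is involved, so there is nothing for a debugger to break on.
bool try_str_to_int(const std::string& text, int& out) {
    if (text.empty()) return false;
    errno = 0;
    char* end = nullptr;
    const long value = std::strtol(text.c_str(), &end, 10);
    if (end == text.c_str() || *end != '\0') return false;  // no digits, or trailing junk
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX) return false;
    out = static_cast<int>(value);
    return true;
}

// Returns the parsed value, or `fallback` when the text is not a valid int.
int to_int_def(const std::string& text, int fallback) {
    int value = 0;
    return try_str_to_int(text, value) ? value : fallback;
}
```

The boolean form suits the case where 0 is a legitimate value, and the default form suits the case where 0 (or any other sentinel) already signals an error.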
How do I get a C++ application that loads an Ada shared library to generate a core dump when crashing?
I have a C++ application which loads an Ada shared library. Inside the Ada code I get a stack overflow error which causes program termination along with the console output:
raised STORAGE ERROR
No core dump file is generated even though I issued "ulimit -c unlimited" before starting the application.
Same thing happens if I send a kill SIGSEGV to the application.
Sending kill SIGSEGV to another application that does not use the ada dll generates a core dump file just the way I want it to.
Found some information here: http://objectmix.com/ada/301203-gnat-fstack-check-does-work.html
UPDATED!
As mentioned by Adrien, there is no contradiction, -s sets the stack limit while -c sets the core file limit.
Still the problem remains. I checked the flags when building the ada library and the fstack-check flag was not set, so it should generate a core dump.
Although I haven't tried it yet, it seems somewhat strange.
It mentions the -fstack-check compiler option plus setting the GNAT_STACK_LIMIT variable, but at the same time refers to the ulimit command, which seems like a contradiction. Setting "ulimit -c" is the only way I know of to get a core dump generated at the time of a crash; if this interferes with the -fstack-check option, then we have a catch-22.
Now, almost 2 years later (I'm still working at the same company Kristofer was at when he asked the question), the question was raised again, and I think I finally understand why no core dump is generated!
The problem is caused by the Ada run-time, which by default implements a signal handler for some POSIX-signals (for Linux: SIGABRT, SIGFPE, SIGILL, SIGSEGV and SIGBUS). For GNAT/linux the signal handler is called __gnat_error_handler in a-init.c, which looks something like this:
static void
__gnat_error_handler (int sig)
{
struct Exception_Data *exception;
char *msg;
static int recurse = 0;
...
switch (sig)
{
case SIGSEGV:
if (recurse)
{
exception = &constraint_error;
msg = "SIGSEGV";
}
else
{
...
msg = "stack overflow (or erroneous memory access)";
exception = &storage_error;
}
break;
}
recurse = 0;
Raise_From_Signal_Handler (exception, msg);
}
This handler is "process wide", and will be called by any triggered signal, no matter which part of the process it originates from (no matter if coded in Ada/C/C++...).
When called, the handler raises an Ada exception and leaves it to the Ada runtime to find an appropriate exception handler. If no such handler is found (e.g. when a SIGSEGV is generated by any part of the C++ code), the Ada runtime falls back to just terminating the process, leaving only a simple printout from __gnat_error_handler (e.g. "stack overflow (or erroneous memory access)").
http://www2.adacore.com/gap-static/GNAT_Book/html/node25.htm
To prevent the Ada runtime from handling a POSIX signal and converting it to an Ada exception, it is possible to disable the default behaviour by using
pragma Interrupt_State (Name => value, State => SYSTEM | RUNTIME | USER);
e.g. to disable handling of SIGSEGV, define
pragma Interrupt_State (SIGSEGV, SYSTEM);
in your Ada code. Now the system's default behaviour will be triggered when a SIGSEGV is raised, and a core dump will be generated that allows you to trace down the origin of the problem!
I think this is quite an important issue to be aware of when mixing Ada and C/C++ on *NIX platforms, since it may mislead you into thinking that problems originate from the Ada code (since the printout indicates an exception generated from Ada) when the real source of the problem lies in the C/C++ code...
Although it is probably safe to disable the Ada runtime's default handling of SIGSEGV (I guess no sane programmer relies on this in any "expected" error handling... well, maybe in aviation software or similar, where some sort of "last resort" functionality needs to be maintained to avoid something really bad happening), I guess a bit of caution should be taken when "overriding" the Ada runtime's handling of signals.
One issue may be the SIGFPE signal, which also raises an Ada Constraint_Error exception by default. This type of exception may be used by the Ada code as "expected behaviour".
Disabling SIGFPE handling with pragma Interrupt_State may seriously affect the execution of the Ada code, and crash your application during "normal circumstances". On the other hand, any division by zero in the C/C++ code will trigger the Ada exception handling mechanism and leave you without any real trace of the origin of the problem...
This looks to me like a really good use for your AdaCore support. You aren't liable to find a whole lot of folk outside that company who are that familiar with the implications of the interactions between Gnu Ada's runtime and C++'s.
I would suggest for debugging the Ada code that you try putting in a last-ditch exception handler around everything, which in turn dumps the exception stack. Most vendors have some way of doing that, usually based off of Ada.Exceptions.Exception_Information and Ada.Exceptions.Exception_Message.
I found a discussion from a security perspective (finding malware). Basically there are 10 signals that you can try, SIGSEGV is only one of them.
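It seems you can simply call signal(SIGSEGV, SIG_DFL); to restore the default signal behavior. (sigaction() can do the same, but it takes a pointer to a struct sigaction whose sa_handler is set to SIG_DFL, not SIG_DFL itself.)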
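A minimal sketch of restoring the default SIGSEGV disposition with POSIX sigaction() follows. The helper names restore_default_sigsegv and sigsegv_is_default are hypothetical, and the sketch assumes it is called from the C++ side after the Ada runtime has installed its own handler; whether the runtime reinstalls the handler later is not something this example can tell you.

```cpp
#include <signal.h>

// Resets SIGSEGV to the system default action (terminate, with a core dump
// when the core file limit allows it). Returns true on success.
bool restore_default_sigsegv() {
    struct sigaction action = {};
    action.sa_handler = SIG_DFL;
    sigemptyset(&action.sa_mask);
    action.sa_flags = 0;
    return sigaction(SIGSEGV, &action, nullptr) == 0;
}

// Queries the current SIGSEGV disposition without changing it.
bool sigsegv_is_default() {
    struct sigaction current = {};
    if (sigaction(SIGSEGV, nullptr, &current) != 0) return false;
    return current.sa_handler == SIG_DFL;
}
```

Calling sigsegv_is_default() before and after loading the Ada library is one way to check whether the runtime has replaced the handler.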