arm64 - how to debug EXC_BAD_ACCESS with LLDB - c++

Hello, I use a MacBook M1 (macOS) and I have a dangling pointer which it seems I can't catch.
I am using CLion with LLDB as the debugger.
When I run my code I get:
Exception: EXC_BAD_ACCESS (code=1, address=0x18)
However, this does not really show me, or I can't understand, where exactly the bad pointer is.
I am also attaching a screenshot of my editor and the debugger window:
I have read something about zombie objects which when enabled allows you to catch dangling pointers. How can I do that?

EXC_BAD_ACCESS (defined in /usr/include/mach/exception_types.h) has a code (which is a kern_return_t) and a subcode. kern_return_t is defined in /usr/include/mach/kern_return.h and 1 means KERN_INVALID_ADDRESS, so this was not a protection problem but an actual invalid address access. The subcode (0x18) is the address accessed.
A small number like 0x18 usually means that your code accessed a field that was 0x18 bytes into an object, but the object pointer was null. So the first thing to do is look at all the accesses in that line of code (or around it if you are debugging optimized code) and make sure none of them are null. This might also be a null vtable, and the 0x18 the vtable offset, i.e. one of the methods of the object, so look for calls as well. However, I didn't see any suspicious looking pointer values in your locals, so maybe it's some subobject?
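To make the arithmetic concrete, here is a minimal sketch using a hypothetical Node struct (invented for illustration, not taken from your code) and assuming the usual 64-bit layout. Reading p->weight through a null p asks for address 0 plus the member's offset, which is where a small number like 0x18 comes from.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical type, invented for illustration only.
struct Node {
    std::uint64_t id;      // offset 0x00
    std::uint64_t parent;  // offset 0x08
    std::uint64_t flags;   // offset 0x10
    std::int32_t  weight;  // offset 0x18
};

// Address the CPU would try to load for `p->weight` if p were null:
// the null base address (0) plus the member's offset inside the object.
constexpr std::size_t fault_address_for_null_weight() {
    return 0 + offsetof(Node, weight);
}

// Checked at compile time, so nothing here actually dereferences null.
static_assert(offsetof(Node, weight) == 0x18,
              "weight sits 0x18 bytes into Node");
```

So when the reported address is small, it can help to compare it against the member offsets of the types used on the crashing line.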
If it isn't obvious from there which pointer is bad, you could run your code under ASan (AddressSanitizer, enabled by building with -fsanitize=address). If the bad pointer access is caused by a use-after-free, ASan will often find it quickly. Note that zombie objects are an Objective-C-only feature, so they don't look relevant to your code.
If that doesn't get it, the most straightforward way to diagnose this sort of error is to look at the disassembly, for instance just run:
(lldb) disassemble
The current PC will be marked in the output. That instruction will be some kind of memory access, often dereferencing a register with an offset or something like that. For instance:
ldr w9, [x9, #0x18]
is loading memory 0x18 bytes off from the value in register x9. If this were the instruction, the next question is what program entity is currently occupying x9? lldb might know, you can ask it by doing:
(lldb) image lookup -va $pc
That will tell you everything lldb knows about that pc, among other things the last set of entries will be where all the known variables are currently located. Look for one that is in x9. If there isn't one listed in x9, then maybe one of the currently visible variables was temporarily copied into x9, in which case you have to look up in the instruction stream to see what was the last value that got copied into x9.

Related

Debugging a Corrupted Object on the Heap

I'm debugging a non-trivial software project where I have a bunch of objects located on the heap. At some point in time (at least) one of these objects gets corrupted.
I added a const member to my class to serve as a canary and indeed, it gets corrupted during executing. Typically I'd add a watchpoint to this variable to figure out when the memory is written to. However, I don't know which instance gets overwritten, as any information stored in the class gets corrupted as well.
I have too many objects to set a watchpoint on each of them and I haven't been able to reproduce with a smaller input set. Running valgrind I see "Invalid read of size 4", which is my canary int of 4 bytes being read but at this point it's already too late.
Any suggestions on how to proceed from here?
Probably this won't be specific enough, but when I had a similar problem, here is what I ended up doing. I'm assuming you can reproduce your problem in a deterministic fashion.
My strategy was to find which instance caused the problem first. This I did with a counter on a specific line that exposes the symptom. For example, in Visual Studio, I would set up a breakpoint that triggers on the 100000th hit, so that it never does; Visual Studio still tells you how many times the breakpoint was encountered during execution. By trial and error, I would find that the problem occurs on, say, the 20th time the breakpoint is encountered, and so I would set the breakpoint to trigger on the 19th iteration, to be able to identify the right instance before the corruption occurred.
Starting from there, I could have the address of the variable that was corrupted before it was, and play with the debugger to find out what is going on: gather enough information about the faulty instance.
Then I set up breakpoints at strategic places, triggered by conditions, e.g. only for an instance with the appropriate address, or with specific values in its members.
You'll probably get to when the symptom occurs precisely, but not to the problem, but that's still something.
Hope this helps!
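If conditional breakpoints are awkward or slow in your debugger, the same idea can be sketched in code. This is only an illustration under my own assumptions: the dbg namespace, watch(), on_target_hit() and the Item class are hypothetical names, not part of any library. You put a plain breakpoint on on_target_hit() and set dbg::target to the address you found in the previous step.

```cpp
#include <cassert>

namespace dbg {
// Address of the instance under suspicion; can be set from the debugger
// or from code once the interesting instance is known.
const void* target = nullptr;
int hits = 0;

// Put an unconditional breakpoint here: it is only reached for the target.
void on_target_hit() { ++hits; }

// Call at strategic places, passing `this`.
inline void watch(const void* self) {
    if (self == target) on_target_hit();
}
}  // namespace dbg

// Hypothetical class standing in for the real one.
struct Item {
    int canary = 0x5AFE;

    void update() {
        dbg::watch(this);          // fires only for the watched instance
        assert(canary == 0x5AFE);  // the canary check from the question
        // ... real work would go here ...
    }
};
```

The comparison costs almost nothing per call, so it can stay in place while you narrow down where the symptom first appears.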
Running valgrind I see "Invalid read of size 4", which is my canary int of 4 bytes being read but at this point it's already too late.
You are confused: if valgrind told you that you are doing an invalid read (presumably because the object has been freed), then you are reading a dangling (already freed) object, and that is exactly your problem.
You shouldn't try to access such objects, and the fact that your canary has been changed / corrupted after you freed / deleted the object is irrelevant.
I managed to find out what was causing my issue. It turns out the object I was looking at never existed in the first place. Like @employed-russian, I wondered whether my object might have been deleted somewhere I wasn't aware of. Putting a breakpoint on the destructor yielded nothing, so the only reasonable explanation was that the pointer itself was invalid, pointing to memory that wasn't a valid instance of my class.
Lo and behold; the pointer I was dereferencing was left uninitialized by some constructor of another class. I figured it out when I added an explicit check for null and Valgrind's error became Conditional jump or move depends on uninitialised value(s). By using --track-origins=yes, I quickly figured out the source of the uninitialized data, i.e. the pointer missing from the initialization list.
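For anyone hitting the same thing, here is a minimal sketch of the bug shape and one way to guard against it. Car and Engine are hypothetical names, not my real classes. A default member initializer gives the pointer a known value even when a constructor forgets to mention it in its initialization list.

```cpp
#include <cassert>

struct Engine { int rpm = 0; };  // hypothetical pointee type

// Buggy shape (kept as a comment so nothing indeterminate is ever read):
//   struct Car { Engine* engine; int wheels; Car() : wheels(4) {} };
// Car() leaves `engine` uninitialized, so any later use reads garbage.

// Safer shape: the default member initializer covers every constructor.
struct Car {
    Engine* engine = nullptr;
    int wheels = 0;

    Car() : wheels(4) {}  // engine still becomes nullptr
    explicit Car(Engine* e) : engine(e), wheels(4) {}

    bool has_engine() const { return engine != nullptr; }
};

// Small self-check showing both constructors leave `engine` well defined.
void demo() {
    Car without;
    assert(!without.has_engine());

    Engine e;
    Car with(&e);
    assert(with.has_engine());
}
```

With the pointer reliably null, a missing object shows up as a clean null check failure instead of a read through a random address.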
(I know uninitialized values can be detected by the compiler with -Wuninitialized but apparently my version of clang (apple) didn't feel like mentioning it with -Wall enabled.)

Pop{pc} in assembly

This may be a stupid question, but in my assembly code, during debugging, I have
pop{r2-r6,pc}
and I think it is giving me a hard fault exception. I understand what pop does, but I am unsure what the pc part means. I cannot find it explained anywhere on the internet and it is not a variable anywhere in my code.
I am using Keil on an STM32 in C++.
pc or r15 is the program counter, the register which gives the address that the processor fetches instructions from. Changing it to another address makes the program execution jump to that address.
In this case, the address is read off the stack to return from a function call; the return address would have been pushed onto the stack (from the link register lr or r14) at the start of the function.
If that's causing a crash, then it's probably because the address on the stack has been corrupted. Perhaps you're writing outside the bounds of a local array, or overflowing the stack with too deep a function call level.
The PC register is the program counter, it holds the address of the next instruction to be executed on an ARM architecture (STM32 uses the ARM architecture).
The default in ARM assembly is to simply overwrite the PC register when a function is to return. What you are seeing with the pop statement is just a direct way to return.
The rest of your question is neatly explained in Mike's post.

"this" pointer getting corrupted in stack trace

I have seen this thread. My case is slightly different and I'm struggling to figure out how "this" pointer is getting corrupted.
I'm using the Qt 4.6.2 framework, using their QTreeView with my own model. The backtrace I get (86 frames long, with a lot of recursion, which is why I haven't pasted the whole thing in; it's in this pastebin) only involves their code.
It finally segfaults on some assembler in QBasicAtomicInt::deref, but it's obvious that it has died further down, evidenced by these three frames:
#15 0x01420fd3 in QFrame::event (this=0x942bba0, e=0xbf8eb624) at widgets/qframe.cpp:557
#16 0x014bb382 in QAbstractScrollArea::viewportEvent (this=0x4, e=0x93f9240) at widgets/qabstractscrollarea.cpp:1036
#17 0x0156fbd7 in QAbstractItemView::viewportEvent (this=0x942bba0, event=0xbf8eb624) at itemviews/qabstractitemview.cpp:1610
In frame 17, this is 0x942bba0. In frame 16, this should be the same, as frame 17 is calling its ancestor's implementation of the same method. However, this becomes 0x4.
Interestingly enough in frame 15 (again, frame 16 has called its ancestor's implementation of the same function), the 'this' pointer is restored to 0x942bba0.
If you looked at the pastebin of the full backtrace, you might see some 'value optimized out'. I had the application compiled with optimization on; I now have gcc set to -g3 -O0 so when it happens next time I might have something more. But of course now I can't make it crash -- it is a fairly difficult bug to make happen (but very important to fix nonetheless) so I don't think that's too suspicious.
Given the optimizations, is that this pointer=0x4 unusual or definitely wrong? What is odd is that there's no real code in any of these viewportEvent frames -- they simply do a switch on the event's type, it falls through the switch statement, and it returns its ancestor's implementation.
Valgrind doesn't seem to be throwing up any issues, although I haven't made it crash in Valgrind yet.
Has anybody seen this behaviour before? What could be causing it?
I have seen this sort of thing before when debugging optimized builds and it has never been an indication of what the real bug is for me.
It is easier to first think about a local variable. In a non-optimized build, everything has its designated place in memory and must be stored after every line of code. This is so the debugger can find it. In an optimized build, values can live in registers without being written to memory. This is a major part of the improved performance of an optimized build. The debugger doesn't understand this and will always look at memory, so you will often see the wrong value.
The same can happen with parameters. If the optimizer decides to pass a parameter in a register, the debugger is still going to look at the stackframe. More specifically, at the location where the parameter would be according to the rules of the calling convention.
The fact that the next frame of the stack has the value properly restored indicates that the generated instructions are dealing with the this parameter correctly, but the debugger just doesn't know where to look for it.

gdb interpret memory address as an object

I am investigating a crash, based on the available core dump. The application crashing is a C++ program, built with gcc and running on RH5. The backtrace seems valid till the #1 frame. There trying to print an object I get
<invalid address>, <error reading variable>
Since I have the address of the object from the #2 frame is it a valid presumption that I can somehow 'dump' the memory in which the object is allocated and still collect some info. Furthermore, instead of trying to guess how the object is aligned, can I force gdb to print the address as if it is an object, even though it detects some error. My idea is that maybe the object has already been deleted, but just maybe the memory is still there and I can print some member variable.
Please comment on whether that is possible and, if so, how it should be done in gdb terms. Thanks.
Well, if you have an address you can always do:
print *(class MyClass*)pointer_var

Variable Name from Memory address in Visual Studio 2008

My C++ application developed in Visual Studio 2008 crashes at some memory location. It shows a memory access violation error like:
Unhandled exception at 0x650514aa in XXX.exe: 0xC0000005: Access violation reading location 0x00000004.
How can I get the name of the variable assigned to this memory location, 0x650514aa? Or how should I debug this issue?
Thanks,
Nilesh
0x650514aa is the address of code (instruction pointer), not of a variable. If you're lucky, it's your code. Then a map file would help. If you're unlucky, it's some third-party code (blowing because you called it passing in nonsense). But it ain't pretty to dig through map files and it won't tell you your variables' values anyway.
However, if you run it from the debugger, the debugger should intercept and let you examine the stack. And even if you run it without debugger, just-in-time debugging should pop up a dialog asking whether you want to attach a debugger.
The other answers here have useful information. But if for some reason you cannot get the debugger to assist you, here's some more detail you can get out of that error message that may help you locate the problem.
Access violation reading location 0x00000004.
That memory location is what the program is incorrectly trying to read. Typically, reading or writing to a very low memory location like that is caused by code trying to use a NULL pointer as if it pointed to a valid object.
If you happen to know roughly what part of your program is executing when this error occurs, then examine it for any possibilities for NULL pointers to slip through unexpectedly.
Furthermore, 0x00000004 would be the location of a member variable 4 bytes from the start of the object. If the object has virtual functions, then it would probably be the first member variable in the object (because those first 4 bytes are the hidden pointer to the virtual function table). Otherwise, without virtual functions involved, there must be 4 bytes worth of other member variables and/or padding bytes before it. So if you can't immediately tell which pointer is going NULL and causing the problem, then consider which pointers are being used to read such a member variable.
(Note: Technically, the exact memory layout of non-POD objects, particularly when virtual functions are involved, is not guaranteed by any standard. Byte alignment settings in your project can also affect memory layouts. However, in this case it's fairly safe to assume that what I've described is what your compiler is actually doing.)
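If you want to see this on your own compiler, here is a small sketch that measures where the first data member lands with and without a virtual function. Plain and Poly are hypothetical types, and the asserted values assume the layout used by the common ABIs (hidden vtable pointer first); as noted above, the standard does not guarantee it.

```cpp
#include <cassert>
#include <cstddef>

struct Plain { int value = 0; };                          // no virtual functions
struct Poly  { virtual ~Poly() = default; int value = 0; };  // has a vtable pointer

// Distance in bytes from the start of the object to its `value` member,
// measured on a real object rather than assumed.
template <typename T>
std::ptrdiff_t value_offset(const T& obj) {
    return reinterpret_cast<const char*>(&obj.value) -
           reinterpret_cast<const char*>(&obj);
}

// Layout check; the second assert reflects typical compilers, not the standard.
void check_layout() {
    Plain p;
    Poly q;
    assert(value_offset(p) == 0);
    assert(value_offset(q) == static_cast<std::ptrdiff_t>(sizeof(void*)));
}
```

On a 32-bit build sizeof(void*) is 4, which matches the 0x00000004 in the error message: a read of the first member of a polymorphic object through a NULL pointer.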
Usually, if you debug your application inside Visual Studio 2008, at the time of the crash it will stop right at the offending line. Be sure to compile in the Debug configuration, then click Debug | Start.
For further checking, you can go to Debug | Exceptions and check the boxes "Break when an exception is thrown".
If you're running in debug, you should be able to have the system break at that point and be able to see the source code.
If you're running in release mode however, you may need to use the .map file that can be generated. (Link switch /MAP, and you'll need to specify the export files too)
There's a description of how to for v6 here: http://www.codeproject.com/KB/debug/mapfile.aspx
2008 is pretty similar, I believe, although I tend to prefer to run in debug mode if possible.
The map file will allow you to translate your crash address into an exact location in the source code (line number), which may well be helpful. However it will only tell you where the error manifested - not what actually caused it (e.g. a stack corruption wouldn't tell you when you corrupted the stack, only when the corrupted stack was discovered.)
Still, it should help point you in the right direction.