I am getting unhandled exception at some functionality due to enabling one of control and i am unable to find the exact reason.It gives me error at assembly instruction
00451901 add dword ptr [eax],eax but i can't figure out the basic reason of unhandled exception.Please suggest some software or any other thing to know the impact of enabling and disabling the control.
You get exception, because most certainly, eax contains value that is not an address to writable memory area.
So, the question is why this instruction was executed. Here's the hint:
Machine code for instructions add dword ptr[eax], eax is 01 00.
That is, unexpected executing of this instructions usually means that you happen to execute some data (e.g. 32-bit constant '1').
This usually happens because of buffer or stack overflow in your code or calling function by pointer that wasn't properly assigned.
Check your array access and function pointers calls.
Related
I have a C++ program that reports a crash. Analyzing it further, I am able to figure out the exact place where the data on stack gets corrupted. But, I am not able to understand why that happens or what would cause such corruption. At a point in time
value at [rsp+40h] is 00000249 15ed4de0
Then the following assembly instruction gets executed
mov rsi,qword ptr [rsp+40h]
Now, the value at
[rsp+40h] is 00000000 15ed4de0
The higher order DWORD has changed from 00000249 to 00000000. There are no changes in the addresses nearby. I do not understand why a read from [rsp+40h] would end up corrupting the value present at that address.
Any pointers are welcome. Thanks in advance.
How to handle this exception?
__asm
{
mov esp, 0
mov eax, 0
div eax
}
This is not handled with try/except or SetUnhandledExceptionFilter().
Assuming this is running in an operating system, the operating system will catch the divide by zero, and then ATTEMPT to form an exception/signal stackframe for the application code. However, since the user-mode stack is "bad", it can't.
There is really no way for the operating system to deal with this, other than kill the application. [Theoretically, the could make up a new stack from some dynamically allocated memory, but it's pretty pointless, as there is no (always working) way for the application itself to recover to a sane state].
Don't set the stack pointer to something that isn't the stack - or if you do store "random" data in the stack pointer register, do not have exceptions. It's the same as "don't aim a gun at your foot and pull the trigger, unless you want to be without your foot".
Edit:
If the code is running in "kernel mode" rather than "usermode", it's even more "game over", since it will "double-fault" - the processor hits a divide by zero exception handler, which tries to write to the stack, and when it does so, it faults. This is now a "fault within a fault handler", aka a "double-fault". The typical setup of the double-fault handler is to have a separate stack, which then recovers the fault handler. But it's still game over - we don't know how to return to the original fault handler [or how to find out what the original fault handler was].
If there is no "new stack" with the double fault handler, it will triple fault a x86 processor - typically, a triple fault will make the processor restart [technically, it halts the processor with a special combination of bits signalled on the address bus to indicate that it's a "triple fault". The typical PC northbridge then resets the processor in recognition that the triple fault is an unrecoverable situation - this is why sometimes your PC simply reboots when you have poor quality drivers].
It's not a good idea to try to interact with a higher-level language's exception mechanism from embedded assembly. The compiler can do "magic" that you cannot match, and there's no (portable) way to tell the compiler that "this assembly code might throw an exception".
The last few days have been spent debugging a very strange problem. An application built for i386 running on Windows crashed, with the top of the callstack completely corrupted and the instruction pointer in a nonsense location.
After some effort, I rebuilt the callstack and was able to determine how the IP ended up in the nonsense location. An instruction in boost shared pointer code attempts to call a function defined in my DLLs import address table using an incorrect offset. The instruction looks like:
call dword ptr [nonsense offset into import address table]
As a result, execution ended up in a bad location that was, unfortunately, executable. Execution then proceeded, gobbling up the top of the stack until eventually crashing.
By launching the identical application on my PC, and stepping into the problematic code, I can find the same call instruction and see it's supposed to by calling msvc100's 'new' operator.
Further comparing the minidump from the client's PC to my PC, I found that my PC was calls a function with an offset of 0x0254 into the address table. On the clients PC, the code is trying to invoke a function with an offset of 0x8254.
What's even more confusing is that this offset is not coming from a register or another memory location. The offset is a constant in the disassembly. So, the disassembly looks like:
call dword ptr [ 0x50018254 ]
not like:
call dword ptr [ edx ]
Does anyone know how this might happen?
That's a single bit flip:
0x0254 = 0b0000001001010100
0x8254 = 0b1000001001010100
Perhaps corrupt memory, corrupt disk, gamma ray from the sun...?
If this specific case is reproducible and their on-disk binary matches yours, I'd investigate further. If it's not specifically reproducible, I'd encourage the client to run some machine diagnostics.
Thats seems to me a hardware error for sure, mainly memory error. As #Hostile_Fork pointed out, is just a bit flip.
Does your memory have ECC feature? it it does it, make sure is enabled. I would pass a burn-in memory test with memtest86 to see what happens, I bet you have a faulty memory chip, doesn't look like a bug.
We see a couple of below mentioned messages in /var/log/messages for one of our application:
Sep 18 03:24:23 <machine_name> kernel: application_name[14682] trap invalid opcode rip:f6c6e3ce rsp:ffc366bc error:0
...
Sep 18 03:19:35 <machine_name> kernel: application_name[4434] general protection rip:f6cd43a2 rsp:ffdfab0c error:7b2
I am not able to make what’s these output means and how we can track the function / code that is causing the issue. Further what is 'trap invalid opcode' and 'general protection' means?
Usually that means that your program's instruction pointer points to data or garbage. That's commonly caused by writing to stray pointers and such.
One scenario would be that your code writes (through a stray pointer) over some class' virtual table, replacing the member function addresses with nonsense. The next time you call one of the class' virtual functions, your program will interpret the garbage as an address and jump to that address. If whatever data lies at this address happens to not to be a valid machine code instruction for your processor, you would see this error.
There is another possibility that can cause 'invalid' op codes, that would be hardware not supporting newer opcode/instruction sets(SSE 4/5) or it not being from the right manufacturer(both AMD and Intel have some specific opcodes that work only on their processors) or just not having permission to exectute certain ops(though this would probably show up as something else).
From the above I would take RIP to be 'register(?) instruction pointer' and RSP to be 'register stack pointer', in which case you could use a debugger and set an execution hardware breakpoint on the specified address(RIP) and trace back what is calling it.(it seems your using linux or unix, so this is quite vague). if you are on windows, try using a custom exception filter to capture the EXCEPTION_ILLEGAL_INSTRUCTION event to get a little more information
I have a C++ tool that walks the call stack at one point. In the code, it first gets a copy of the live CPU registers (via RtlCaptureContext()), then uses a few "#ifdef ..." blocks to save the CPU-specific register names into stackframe.AddrPC.Offset, ...AddrStack..., and ...AddrFrame...; also, for each of the 3 Addr... members above, it sets stackframe.Addr....Mode = AddrModeFlat. (This was borrowed from some example code I came across a while back.)
With an x86 binary, this works great. With an x64 binary, though, StackWalk64() passes back bogus addresses. (The first time the API is called, the only blatantly bogus address value appears in AddrReturn ( == 0xFFFFFFFF'FFFFFFFE -- aka StackWalk64()'s 3rd arg, the pseudo-handle returned by GetCurrentThread()). If the API is called a second time, however, all Addr... variables receive bogus addresses.) This happens regardless of how AddrFrame is set:
using either of the recommended x64 "base/frame pointer" CPU registers: rbp (= 0xf), or rdi (= 0x0)
using rsp (didn't expect it to work, but tried it anyway)
setting AddrPC and AddrStack normally, but leaving AddrFrame zeroed out (seen in other example code)
zeroing out all Addr... values, to let StackWalk64() fill them in from the passed-in CPU-register context (seen in other example code)
FWIW, the physical stack buffer's contents are also different on x64 vs. x86 (after accounting for different pointer widths & stack buffer locations, of course). Regardless of the reason, StackWalk64() should still be able to walk the call stack correctly -- heck, the debugger is still able to walk the call stack, and it appears to use StackWalk64() itself behind the scenes. The oddity there is that the (correct) call stack reported by the debugger contains base-address & return-address pointer values whose constituent bytes don't actually exist in the stack buffer (below or above the current stack pointer).
(FWIW #2: Given the stack-buffer strangeness above, I did try disabling ASLR (/dynamicbase:no) to see if it made a difference, but the binary still exhibited the same behavior.)
So. Any ideas why this would work fine on x86, but have problems on x64? Any suggestions on how to fix it?
Given that fs.sf is a STACKFRAME64 structure, you need to initialize it like this before passing it to StackWalk64: (c is a CONTEXT structure)
DWORD machine = IMAGE_FILE_MACHINE_AMD64;
RtlCaptureContext (&c);
fs.sf.AddrPC.Offset = c.Rip;
fs.sf.AddrFrame.Offset = c.Rsp;
fs.sf.AddrStack.Offset = c.Rsp;
fs.sf.AddrPC.Mode = AddrModeFlat;
fs.sf.AddrFrame.Mode = AddrModeFlat;
fs.sf.AddrStack.Mode = AddrModeFlat;
This code is taken from ACE (Adaptive Communications Environment), adapted from the StackWalker project on CodeProject.
SymInitialize(process, nullptr, TRUE) must be called (once) before StackWalk64().
FWIW, I've switched to using CaptureStackBackTrace(), and now it works just fine.