Valgrind stack misses a function completely - c++

i have two c files:
a.c
void main(){
...
getvtable()->function();
}
the vtable is pointing to a function that is located in b.c:
void function(){
malloc(42);
}
now if i trace the program in valgrind I get the following:
==29994== 4,155 bytes in 831 blocks are definitely lost in loss record 26 of 28
==29994== at 0x402CB7A: malloc (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
==29994== by 0x40A24D2: (below main) (libc-start.c:226)
so the call to function is completely ommited on the stack! How is it possible? In case I use GDB, a correct stack including "function" is shown.
Debug symbols are included, Linux, 32-bit.
Upd:
Answering the first question, I get the following output when debugging valgrind's GDB server. The breakpoint is not coming, while it comes when i debug directly with GDB.
stasik#gemini:~$ gdb -q
(gdb) set confirm off
(gdb) target remote | vgdb
Remote debugging using | vgdb
relaying data between gdb and process 11665
[Switching to Thread 11665]
0x040011d0 in ?? ()
(gdb) file /home/stasik/leak.so
Reading symbols from /home/stasik/leak.so...done.
(gdb) break function
Breakpoint 1 at 0x110c: file ../../source/leakclass.c, line 32.
(gdb) commands
Type commands for breakpoint(s) 1, one per line.
End with a line saying just "end".
>silent
>end
(gdb) continue
Continuing.
Program received signal SIGTRAP, Trace/breakpoint trap.
0x0404efcb in ?? ()
(gdb) source thread-frames.py
Stack level 0, frame at 0x42348a0:
eip = 0x404efcb; saved eip 0x4f2f544c
called by frame at 0x42348a4
Arglist at 0x4234898, args:
Locals at 0x4234898, Previous frame's sp is 0x42348a0
Saved registers:
ebp at 0x4234898, eip at 0x423489c
Stack level 1, frame at 0x42348a4:
eip = 0x4f2f544c; saved eip 0x6e492056
called by frame at 0x42348a8, caller of frame at 0x42348a0
Arglist at 0x423489c, args:
Locals at 0x423489c, Previous frame's sp is 0x42348a4
Saved registers:
eip at 0x42348a0
Stack level 2, frame at 0x42348a8:
eip = 0x6e492056; saved eip 0x205d6f66
called by frame at 0x42348ac, caller of frame at 0x42348a4
Arglist at 0x42348a0, args:
Locals at 0x42348a0, Previous frame's sp is 0x42348a8
Saved registers:
eip at 0x42348a4
Stack level 3, frame at 0x42348ac:
eip = 0x205d6f66; saved eip 0x61746144
---Type <return> to continue, or q <return> to quit---
called by frame at 0x42348b0, caller of frame at 0x42348a8
Arglist at 0x42348a4, args:
Locals at 0x42348a4, Previous frame's sp is 0x42348ac
Saved registers:
eip at 0x42348a8
Stack level 4, frame at 0x42348b0:
eip = 0x61746144; saved eip 0x65736162
called by frame at 0x42348b4, caller of frame at 0x42348ac
Arglist at 0x42348a8, args:
Locals at 0x42348a8, Previous frame's sp is 0x42348b0
Saved registers:
eip at 0x42348ac
Stack level 5, frame at 0x42348b4:
eip = 0x65736162; saved eip 0x70616d20
called by frame at 0x42348b8, caller of frame at 0x42348b0
Arglist at 0x42348ac, args:
Locals at 0x42348ac, Previous frame's sp is 0x42348b4
Saved registers:
eip at 0x42348b0
Stack level 6, frame at 0x42348b8:
eip = 0x70616d20; saved eip 0x2e646570
called by frame at 0x42348bc, caller of frame at 0x42348b4
Arglist at 0x42348b0, args:
---Type <return> to continue, or q <return> to quit---
Locals at 0x42348b0, Previous frame's sp is 0x42348b8
Saved registers:
eip at 0x42348b4
Stack level 7, frame at 0x42348bc:
eip = 0x2e646570; saved eip 0x0
called by frame at 0x42348c0, caller of frame at 0x42348b8
Arglist at 0x42348b4, args:
Locals at 0x42348b4, Previous frame's sp is 0x42348bc
Saved registers:
eip at 0x42348b8
Stack level 8, frame at 0x42348c0:
eip = 0x0; saved eip 0x0
caller of frame at 0x42348bc
Arglist at 0x42348b8, args:
Locals at 0x42348b8, Previous frame's sp is 0x42348c0
Saved registers:
eip at 0x42348bc
(gdb) continue
Continuing.
Program received signal SIGTRAP, Trace/breakpoint trap.
0x0404efcb in ?? ()
(gdb) continue
Continuing.

I see two possible reasons:
Valgrind is using a different stack unwind method than GDB
The address space layout is different while running your program under the two environments and you're only hitting stack corruption under Valgrind.
We can gain more insight by using Valgrind's builtin gdbserver.
Save this Python snippet to thread-frames.py
import gdb
f = gdb.newest_frame()
while f is not None:
f.select()
gdb.execute('info frame')
f = f.older()
t.gdb
set confirm off
file MY-PROGRAM
break function
commands
silent
end
run
source thread-frames.py
quit
v.gdb
set confirm off
target remote | vgdb
file MY-PROGRAM
break function
commands
silent
end
continue
source thread-frames.py
quit
(Change MY-PROGRAM, function in the scripts above and the commands below as required)
Get details about the stack frames under GDB:
$ gdb -q -x t.gdb
Breakpoint 1 at 0x80484a2: file valgrind-unwind.c, line 6.
Stack level 0, frame at 0xbffff2f0:
eip = 0x80484a2 in function (valgrind-unwind.c:6); saved eip 0x8048384
called by frame at 0xbffff310
source language c.
Arglist at 0xbffff2e8, args:
Locals at 0xbffff2e8, Previous frame's sp is 0xbffff2f0
Saved registers:
ebp at 0xbffff2e8, eip at 0xbffff2ec
Stack level 1, frame at 0xbffff310:
eip = 0x8048384 in main (valgrind-unwind.c:17); saved eip 0xb7e33963
caller of frame at 0xbffff2f0
source language c.
Arglist at 0xbffff2f8, args:
Locals at 0xbffff2f8, Previous frame's sp is 0xbffff310
Saved registers:
ebp at 0xbffff2f8, eip at 0xbffff30c
Get the same data under Valgrind:
$ valgrind --vgdb=full --vgdb-error=0 ./MY-PROGRAM
In another shell:
$ gdb -q -x v.gdb
relaying data between gdb and process 574
0x04001020 in ?? ()
Breakpoint 1 at 0x80484a2: file valgrind-unwind.c, line 6.
Stack level 0, frame at 0xbe88e2c0:
eip = 0x80484a2 in function (valgrind-unwind.c:6); saved eip 0x8048384
called by frame at 0xbe88e2e0
source language c.
Arglist at 0xbe88e2b8, args:
Locals at 0xbe88e2b8, Previous frame's sp is 0xbe88e2c0
Saved registers:
ebp at 0xbe88e2b8, eip at 0xbe88e2bc
Stack level 1, frame at 0xbe88e2e0:
eip = 0x8048384 in main (valgrind-unwind.c:17); saved eip 0x4051963
caller of frame at 0xbe88e2c0
source language c.
Arglist at 0xbe88e2c8, args:
Locals at 0xbe88e2c8, Previous frame's sp is 0xbe88e2e0
Saved registers:
ebp at 0xbe88e2c8, eip at 0xbe88e2dc
If GDB can successfully unwind the stack while connecting to "valgrind --gdb" then it's a problem with Valgrind's stack unwind algorithm. You can inspect the "info frame" output carefully for inline and tail call frames or some other reason that could throw Valgrind off. Otherwise it's probably stack corruption.

Ok, compiling all .so parts and the main program with an explicit -O0 seems to solve the problem. It seems that some of the optimizations of the 'core' program that was loading the .so (so was always compiled unoptimized) was breaking the stack.

This is Tail-call optimization in action.
The function function calls malloc as the last thing it does. The compiler sees this and kills the stack frame for function before it calls malloc. The advantage is that when malloc returns it returns directly to whichever function called function. I.e. it avoids malloc returning to function only to hit yet another return instruction.
In this case the optimization has prevented an unnecessary jump and made stack usage slightly more efficient, which is nice, but in the case of a recursive tail call then this optimization is a huge win as it turns a recursion into something more like iteration.
As you've discovered already, disabling optimization makes debugging much easier. If you want to debug optimized code (for performance testing, perhaps), then, as #Zang MingJie already said, you can disable this one optimization with -fno-optimize-sibling-calls.

Related

Stack allocation fail when close to heap

Issue Description
C++ server got several crashes at the same stack frame as below
enter image description here
Investigation
Memory was not used up, 400M-800M
Stack space used is not large, about 270K-350K
Haven't found any other code issue caused crash
The distance between stack and heap is about 1M.
Binary is built with PIE
OS is 2.6.32-696.16.1.el6.i686 #1 SMP Wed Nov 15 16:16:47 UTC 2017 i686 i686 i386 GNU/Linux
Stack space used in one crash
(gdb) f 0
#0 0xb717f616 in WDMS_Byte_Stream::write (this=Cannot access memory at
address 0xbfa16b1c) at wdms_bs.cpp:63 in wdms_bs.cpp
(gdb) set $top=$esp
(gdb) f 38
#38 0xb6a1bcbf in main (argc=1, argv=0xbfa5a8b4) at main.cpp:66
main.cpp: No such file or directory.
in main.cpp
(gdb) p $esp-$top
$1 = 277728
(gdb)
Distance between stack and heap
Memory allocated
enter image description here
One heap segment close to stack space, 0xb8a8f000->0xbf917000
enter image description here
Stack space 0xbfa18000->0xbfa5c000
enter image description here
the distance between them
(gdb) p 0xbfa18000- 0xbf917000
$3 = 1052672
Why crash
Program need to allocate 8268 bytes (0x204c) stack size at last frame
OS only expand stack size to 0xbfa18000, actual size is
(gdb) p $esp+0x204c-0xbfa18000
$19 = (void *) 0xb4c(2892 bytes)
So, when execute next instruction
call 0xb6a19ec9 <__i686.get_pc_thunk.bx>, where need access $esp, which is out of accessible memory space.
Why crash at the same stack frame? It could be because this call sequence has most call frames and to use most stack space.
enter image description here
Question-1
Is this related CVE-2017-1000364 Fix ?
enter image description here
Question-2
Why does OS allocate heap segment so close to stack? how does it avoid stack and heap collision and not allocation failure?

Invalid cast in gdb (armv7)

I have a similar issue like in this question GDB corrupted stack frame - How to debug?, like this:
(gdb) bt
#0 0x76bd6978 in fputs () from /lib/libc.so.6
#1 0x0000b080 in getfunction1 ()
#2 0x0000b080 in getfunction1 ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Chris Dodd wrote an answer to point the top of the stack to the program counter (PC). In a 32bit machine it shall be
(gdb) set $pc = *(void **)$esp
(gdb) set $esp = $esp + 4
However, after run the first line I got invalid cast:
(gdb) set $pc = *(void **)$esp
Invalid cast.
(gdb) set $esp = $esp + 4
Argument to arithmetic operation not a number or boolean.
Why do I get this messages? and how can I make a workaround to figure out where the crash occurs? I work on a armv7 machine with Linux.
ESP does not exist in ARM. It's MSP (Main Stack Pointer) or PSP (Stack pointer).
ARM Registers
As ESP does not exists, that's why you get invalid cast. If you do the same command with another valid ARM register there is no error

arm_data abort failure in case of running my program for the second time and thereafter

I add my program (load a file and do some computation) into the app of TizenRT on ARTIK053. The program can run successfully in the first time, but the data abort failure will be met when running it second time. The specific error info is as follows:
arm_dataabort:
Data abort. PC: 040d25a0 DFAR: 00000011 DFSR: 0000080d
up_assert: Assertion failed at file:armv7-r/arm_dataabort.c line: 111 task: ghsom_test
up_dumpstate: Current sp: 020c3eb0
up_dumpstate: User stack:
up_dumpstate: base: 020c3fd0
up_dumpstate: size: 00000fd4
up_dumpstate: used: 00000220
up_dumpstate: User Stack
up_stackdump: 020c3ea0: 00000003 020c3eb0 040c9638 041d38b8 00000000 040c9644 00000011 0000080
.....
.....
up_taskdump: Idle Task: PID=0 Stack Used=1024 of 1024
up_taskdump: hpwork: PID=1 Stack Used=164 of 2028
up_taskdump: lpwork: PID=2 Stack Used=164 of 2028
up_taskdump: logm: PID=3 Stack Used=300 of 2028
up_taskdump: LWIP_TCP/IP: PID=4 Stack Used=228 of 4068
up_taskdump: tash: PID=6 Stack Used=948 of 4076
up_taskdump: ghsom_test: PID=10 Stack Used=616 of 4052
I checked the remaining free RAM space, it is enough for my program. And I added some printing info into my main function to check on which line the error come out. I found that if I commented some lines before the line that the error come out, in the next time I running the program, the error line will move downward some lines. It seems like I released some stack space. So I guess it might be an issue related with the stack size that I can assign to a single proc. Anyone knows the reason, and how to solve the issue? To be mentioned, it only happens for the second time and thereafter I running the program.
With the stackdump you can almost always figure out where the error originated from.
Since you have the image file for your build you can do
arm-none-eabi-addr2line -f -p -i -b build/out/bin/tinyara 0xADDR
where ADDR would be the addr is one of the relevant addresses in the stack dump.
You can usually check the "current sp" (stack pointer) but often it points to the arm_dataabort shown in the failure above.
Then you can check the PC address and also look for addresses in the stack dump (starting from the back of it) that looks like the PC in value.
In your case it could be addresses like (in that order): 040c9644, 041d38b8, 040c9638
So basically:
arm-none-eabi-addr2line -f -p -i -b build/out/bin/tinyara 0x040c9644
notice the 0x in front of the address.
The command will give you a good indication for where this address is coming from in your binary like:
up_idlepm_static at /home/user/tizenrt/os/arch/arm/src/chip/s5j_idle.c:111
(inlined by) up_idle at /home/user/tizenrt/os/arch/arm/src/chip/s5j_idle.c:254
if the address is not pointing to code lines then it will look like:
?? ??:0
hope that helps

How to retain a stacktrace when Cortex-M3 gone in hardfault?

Using the following setup:
Cortex-M3 based µC
gcc-arm cross toolchain
using C and C++
FreeRtos 7.5.3
Eclipse Luna
Segger Jlink with JLinkGDBServer
Code Confidence FreeRtos debug plugin
Using JLinkGDBServer and eclipse as debug frontend, I always have a nice stacktrace when stepping through my code. When using the Code Confidence freertos tools (eclipse plugin), I also see the stacktraces of all threads which are currently not running (without that plugin, I see just the stacktrace of the active thread). So far so good.
But now, when my application fall into a hardfault, the stacktrace is lost.
Well, I know the technique on how to find out the code address which causes the hardfault (as seen here).
But this is very poor information compared to full stacktrace.
Ok, some times when falling into hardfault there is no way to retain a stacktrace, e.g. when the stack is corrupted by the faulty code. But if the stack is healty, I think that getting a stacktrace might be possible (isn't it?).
I think the reason for loosing the stacktrace when in hardfault is, that the stackpointer would be swiched from PSP to MSP automatically by the Cortex-M3 architecture. One idea is now, to (maybe) set the MSP to the previous PSP value (and maybe have to do some additional stack preperation?).
Any suggestions on how to do that or other approaches to retain a stacktrace when in hardfault?
Edit 2015-07-07, added more details.
I uses this code to provocate a hardfault:
__attribute__((optimize("O0"))) static void checkHardfault() {
volatile uint32_t* varAtOddAddress = (uint32_t*)-1;
(*varAtOddAddress)++;
}
When stepping into checkHardfault(), my stacktrace looks good like this:
gdb-> backtrace
#0 checkHardfault () at Main.cxx:179
#1 0x100360f6 in GetOneEvent () at Main.cxx:185
#2 0x1003604e in executeMainLoop () at Main.cxx:121
#3 0x1001783a in vMainTask (pvParameters=0x0) at Main.cxx:408
#4 0x00000000 in ?? ()
When run into the hardfault (at (*varAtOddAddress)++;) and find myself inside of the HardFault_Handler(), the stacktrace is:
gdb-> backtrace
#0 HardFault_Handler () at Hardfault.c:312
#1 <signal handler called>
#2 0x10015f36 in prvPortStartFirstTask () at freertos/portable/GCC/ARM_CM3/port.c:224
#3 0x10015fd6 in xPortStartScheduler () at freertos/portable/GCC/ARM_CM3/port.c:301
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
The quickest way to get the debugger to give you the details of the state prior to the hard fault is to return the processor to the state prior to the hard fault.
In the debugger, write a script that takes the information from the various hardware registers and restore PC, LR, R0-R14 to the state just prior to causing the hard fault, then do your stack dump.
Of course, this isn't always helpful when you end up at the hard fault because of popping stuff off of a blown stack or stomping on stuff in memory. You generally tend to corrupt a bunch of the important registers, return back to some crazy spot in memory, and then execute whatever's there. You can end up hard faulting many thousands (millions?) of cycles after your real problem happens.
Consider using the following gdb macro to restore the register contents:
define hfstack
set $frame_ptr = (unsigned *)$sp
if $lr & 0x10
set $sp = $frame_ptr + (8 * 4)
else
set $sp = $frame_ptr + (26 * 4)
end
set $lr = $frame_ptr[5]
set $pc = $frame_ptr[6]
bt
end
document hfstack
set the correct stack context after a hard fault on Cortex M
end

GDB + Core file dump

Can some please help me to understand this:-
Below is an extract from gdb. After my program crashed, I opened the binary and core file in gdb and issued the command info frame:
(gdb) info frame
Stack level 0, frame at 0xb75f7390:
eip = 0x804877f in base::func() (testing.cpp:16); saved eip 0x804869a
called by frame at 0xb75f73b0
source language c++.
Arglist at 0xb75f7388, args: this=0x0
Locals at 0xb75f7388, Previous frame's sp is 0xb75f7390
Saved registers:
ebp at 0xb75f7388, eip at 0xb75f738c
What do the lines "ebp", "eip", "Locals at" and "Previous Frame's sp " mean? Please explain
This diagram from the Wikipedia article Call stack may help:
GDB's info frame corresponds to functions being called in your program at run time. From the output, we can infer this about the stack frame layout:
0xb75f7388: The 4 bytes starting here stores the old EBP value, 0xb75f73a8. The first value pushed by the function prologue of base::func()
0xb75f738c: The 4 bytes starting here stores the return address, 0x804869a. Pushed by the call instruction in the previous frame
0xb75f7390: The 4 bytes starting here stores the implicit this argument to base::func(), 0x00000000.
I'll explain the info frame output line by line:
Stack level 0, frame at 0xb75f7390:
Stack level 0 means this is the newest frame. The address after frame at is called the Canonical Frame Address (CFA). On x86, this is defined to be the value of the stack pointer (ESP) at the previous frame, before the call instruction is executed.
eip = 0x804877f in base::func() (testing.cpp:16); saved eip 0x804869a
EIP is the x86 instruction pointer. saved eip is the return address.
If you try to look up the function that contains 0x804869a with info symbol 0x804869a, it should point inside the function calling base::func().
called by frame at 0xb75f73b0
called by shows the canonical frame address of the previous frame. We can see that the stack pointer advanced 32 bytes (0xb75f73b0 - 0xb75f7390 = 32) between the two frames.
source language c++.
Arglist at 0xb75f7388, args: this=0x0
Locals at 0xb75f7388, Previous frame's sp is 0xb75f7390
The x86 ABI passes arguments on the stack. base::func() only has the single implicit this argument. The fact that it's 0x0 i.e. NULL bodes ill. On a side note, Arglist and Locals seem to always have the same value in info frame on x86 and x86-64.
Saved registers:
ebp at 0xb75f7388, eip at 0xb75f738c
Saved registers reflects the registers that were saved at function entry. It lists where the old register values are saved on the stack. Saved EIP is the return address so if you examine the address stored at 0xb75f738c with x/a 0xb75f738c it should give 0x804869a. The fact that EBP is listed here implies that your code probably was not compiled with -fomit-frame-pointer and has a standard function prologue:
push %ebp
movl %esp, %ebp
at the very begging of base::func() which sets up EBP to act as the frame pointer.
To analyze core file, execute:
$gdb executable core
gdb$ bt -- backtrace
or gdb$ fr 0 -- the uppermost frame in the run-time stack
gdb$ fr 1 & so on will give you the order of function calls which led to Seg Fault.
Here, gdb$info frame is providing you info about frame 0.