GDB + Core file dump - gdb

Can someone please help me understand this?
Below is an extract from gdb. After my program crashed, I opened the binary and core file in gdb and issued the command info frame:
(gdb) info frame
Stack level 0, frame at 0xb75f7390:
eip = 0x804877f in base::func() (testing.cpp:16); saved eip 0x804869a
called by frame at 0xb75f73b0
source language c++.
Arglist at 0xb75f7388, args: this=0x0
Locals at 0xb75f7388, Previous frame's sp is 0xb75f7390
Saved registers:
ebp at 0xb75f7388, eip at 0xb75f738c
What do the lines "ebp", "eip", "Locals at" and "Previous frame's sp" mean? Please explain.

This diagram from the Wikipedia article Call stack may help:
Each frame that GDB's info frame describes corresponds to a function call that is active in your program at run time. From the output, we can infer this about the stack frame layout:
0xb75f7388: The 4 bytes starting here store the old EBP value, 0xb75f73a8. This is the first value pushed by the function prologue of base::func().
0xb75f738c: The 4 bytes starting here store the return address, 0x804869a, pushed by the call instruction in the previous frame.
0xb75f7390: The 4 bytes starting here store the implicit this argument to base::func(), 0x00000000.
I'll explain the info frame output line by line:
Stack level 0, frame at 0xb75f7390:
Stack level 0 means this is the newest frame. The address after frame at is called the Canonical Frame Address (CFA). On x86, this is defined to be the value of the stack pointer (ESP) at the previous frame, before the call instruction is executed.
eip = 0x804877f in base::func() (testing.cpp:16); saved eip 0x804869a
EIP is the x86 instruction pointer. saved eip is the return address.
If you try to look up the function that contains 0x804869a with info symbol 0x804869a, it should point inside the function calling base::func().
called by frame at 0xb75f73b0
called by shows the canonical frame address of the previous frame. We can see that the stack pointer advanced 32 bytes (0xb75f73b0 - 0xb75f7390 = 32) between the two frames.
source language c++.
Arglist at 0xb75f7388, args: this=0x0
Locals at 0xb75f7388, Previous frame's sp is 0xb75f7390
The 32-bit x86 ABI passes arguments on the stack. base::func() only has the single implicit this argument. The fact that it is 0x0, i.e. NULL, bodes ill: the member function was called through a null pointer. On a side note, Arglist and Locals seem to always have the same value in info frame on x86 and x86-64.
Saved registers:
ebp at 0xb75f7388, eip at 0xb75f738c
Saved registers reflects the registers that were saved at function entry. It lists where the old register values are saved on the stack. Saved EIP is the return address so if you examine the address stored at 0xb75f738c with x/a 0xb75f738c it should give 0x804869a. The fact that EBP is listed here implies that your code probably was not compiled with -fomit-frame-pointer and has a standard function prologue:
push %ebp
movl %esp, %ebp
at the very beginning of base::func(), which sets up EBP to act as the frame pointer.
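The layout described above can be reproduced with a little arithmetic. This is a minimal sketch, assuming 4-byte stack slots and the standard push %ebp / movl %esp, %ebp prologue; the function name frame_layout is made up for illustration.

```python
# Sketch: derive the i386 frame layout from the CFA that `info frame` prints.
# Assumes 4-byte slots and the standard frame-pointer prologue.
WORD = 4

def frame_layout(cfa):
    """Return the addresses of the stack slots around a canonical frame address."""
    return {
        "saved_ebp": cfa - 2 * WORD,  # pushed by the callee's prologue
        "saved_eip": cfa - WORD,      # return address pushed by `call`
        "first_arg": cfa,             # first stack argument (here: `this`)
    }

layout = frame_layout(0xb75f7390)
for name, addr in layout.items():
    print(f"{name}: {addr:#x}")

# Distance between this frame's CFA and the caller's CFA, in bytes
frame_distance = 0xb75f73b0 - 0xb75f7390
print(frame_distance)
```

The three addresses match the "Saved registers" and "frame at" lines of the info frame output, and the distance is the 32 bytes mentioned above.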

To analyze a core file, run:
$ gdb executable core
gdb$ bt -- backtrace
or gdb$ fr 0 -- the newest (innermost) frame on the run-time stack
gdb$ fr 1 and so on will walk you through the function calls that led to the segfault.
Here, gdb$ info frame is giving you information about frame 0.
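If you want the same information without an interactive session, GDB can run the commands in batch mode. The sketch below only builds the command line; the helper name gdb_batch_command and the paths ./executable and core are placeholders, not something from the question.

```python
import shlex

def gdb_batch_command(executable, core, commands=("bt", "info frame")):
    """Build an argv list that runs GDB non-interactively on a core file."""
    argv = ["gdb", "--batch"]
    for cmd in commands:
        argv += ["-ex", cmd]      # each -ex runs one GDB command in order
    argv += [executable, core]
    return argv

argv = gdb_batch_command("./executable", "core")
print(shlex.join(argv))
```

The resulting list can be passed to subprocess.run(argv) on a machine where gdb is installed.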

Related

Stack allocation fail when close to heap

Issue Description
The C++ server crashed several times at the same stack frame (the screenshot of the backtrace is not available).
Investigation
Memory was not exhausted: usage was 400M-800M
Stack usage is not large, about 270K-350K
I haven't found any other code issue that could cause the crash
The distance between the stack and the heap is about 1M
The binary is built with PIE
OS is 2.6.32-696.16.1.el6.i686 #1 SMP Wed Nov 15 16:16:47 UTC 2017 i686 i686 i386 GNU/Linux
Stack space used in one crash
(gdb) f 0
#0 0xb717f616 in WDMS_Byte_Stream::write (this=Cannot access memory at
address 0xbfa16b1c) at wdms_bs.cpp:63 in wdms_bs.cpp
(gdb) set $top=$esp
(gdb) f 38
#38 0xb6a1bcbf in main (argc=1, argv=0xbfa5a8b4) at main.cpp:66
main.cpp: No such file or directory.
in main.cpp
(gdb) p $esp-$top
$1 = 277728
(gdb)
Distance between stack and heap
Memory allocated
One heap segment close to stack space, 0xb8a8f000->0xbf917000
Stack space 0xbfa18000->0xbfa5c000
the distance between them
(gdb) p 0xbfa18000- 0xbf917000
$3 = 1052672
Why it crashes
The program needs to allocate 8268 bytes (0x204c) of stack for the last frame
The OS only expanded the stack down to 0xbfa18000, so the space actually available is
(gdb) p $esp+0x204c-0xbfa18000
$19 = (void *) 0xb4c(2892 bytes)
So, when the next instruction executes,
call 0xb6a19ec9 <__i686.get_pc_thunk.bx>, it has to access memory at $esp, which is outside the accessible memory space.
Why does it crash at the same stack frame? It could be because this call sequence has the most call frames and uses the most stack space.
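The numbers quoted above can be cross-checked with a short calculation. This is only a sketch built from the addresses in the question; the value of esp after the allocation is inferred from the gdb expression $esp+0x204c-0xbfa18000, not read from the core file.

```python
STACK_LOW = 0xbfa18000    # lowest mapped stack address
STACK_HIGH = 0xbfa5c000   # top of the stack mapping
HEAP_END = 0xbf917000     # end of the heap segment nearest the stack
FRAME_SIZE = 0x204c       # bytes the last frame tries to reserve (8268)
AVAILABLE = 0xb4c         # bytes left above STACK_LOW before the reservation (2892)

gap = STACK_LOW - HEAP_END                      # room between heap and stack
mapped = STACK_HIGH - STACK_LOW                 # size of the stack mapping
shortfall = FRAME_SIZE - AVAILABLE              # bytes that fall below STACK_LOW
esp_after = STACK_LOW + AVAILABLE - FRAME_SIZE  # inferred esp after the reservation

print(gap, mapped, shortfall, hex(esp_after))
```

With these inputs the inferred esp lies below 0xbfa18000, which is consistent with gdb reporting that it cannot access memory at 0xbfa16b1c.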
Question-1
Is this related to the fix for CVE-2017-1000364?
Question-2
Why does the OS allocate a heap segment so close to the stack? How does it avoid a collision between the stack and the heap without causing allocation failures?

Invalid cast in gdb (armv7)

I have an issue similar to the one in the question "GDB corrupted stack frame - How to debug?":
(gdb) bt
#0 0x76bd6978 in fputs () from /lib/libc.so.6
#1 0x0000b080 in getfunction1 ()
#2 0x0000b080 in getfunction1 ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Chris Dodd wrote an answer that loads the program counter (PC) from the top of the stack. On a 32-bit machine it would be
(gdb) set $pc = *(void **)$esp
(gdb) set $esp = $esp + 4
However, after running the first line I got an invalid cast:
(gdb) set $pc = *(void **)$esp
Invalid cast.
(gdb) set $esp = $esp + 4
Argument to arithmetic operation not a number or boolean.
Why do I get these messages, and how can I work around them to figure out where the crash occurs? I am working on an ARMv7 machine running Linux.
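ESP does not exist on ARM. The stack pointer is SP (r13); on Cortex-M cores it is banked as MSP (Main Stack Pointer) and PSP (Process Stack Pointer).
ARM Registers
Because ESP does not exist, GDB treats $esp as an unset convenience variable, which is why you get "Invalid cast". If you run the same command with a valid ARM register, there is no error.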

arm_data abort failure in case of running my program for the second time and thereafter

I added my program (it loads a file and does some computation) as an app in TizenRT on the ARTIK053. The program runs successfully the first time, but a data abort occurs when I run it a second time. The specific error output is as follows:
arm_dataabort:
Data abort. PC: 040d25a0 DFAR: 00000011 DFSR: 0000080d
up_assert: Assertion failed at file:armv7-r/arm_dataabort.c line: 111 task: ghsom_test
up_dumpstate: Current sp: 020c3eb0
up_dumpstate: User stack:
up_dumpstate: base: 020c3fd0
up_dumpstate: size: 00000fd4
up_dumpstate: used: 00000220
up_dumpstate: User Stack
up_stackdump: 020c3ea0: 00000003 020c3eb0 040c9638 041d38b8 00000000 040c9644 00000011 0000080
.....
.....
up_taskdump: Idle Task: PID=0 Stack Used=1024 of 1024
up_taskdump: hpwork: PID=1 Stack Used=164 of 2028
up_taskdump: lpwork: PID=2 Stack Used=164 of 2028
up_taskdump: logm: PID=3 Stack Used=300 of 2028
up_taskdump: LWIP_TCP/IP: PID=4 Stack Used=228 of 4068
up_taskdump: tash: PID=6 Stack Used=948 of 4076
up_taskdump: ghsom_test: PID=10 Stack Used=616 of 4052
I checked the remaining free RAM, and it is enough for my program. I also added some print statements to my main function to find the line where the error occurs. I found that if I comment out some lines before the failing line, then on the next run the failure moves down by a few lines. It looks as if I had freed some stack space. So I guess it might be related to the stack size that can be assigned to a single process. Does anyone know the reason, and how to solve it? Note that it only happens from the second run of the program onwards.
With the stackdump you can almost always figure out where the error originated from.
Since you have the image file for your build you can do
arm-none-eabi-addr2line -f -p -i -e build/out/bin/tinyara 0xADDR
where ADDR is one of the relevant addresses in the stack dump.
You can usually check the "Current sp" (stack pointer), but it often points to arm_dataabort, as in the failure above.
Then check the PC address, and also look through the stack dump (starting from the back) for values that look like the PC.
In your case these could be (in that order): 040c9644, 041d38b8, 040c9638
So basically:
arm-none-eabi-addr2line -f -p -i -e build/out/bin/tinyara 0x040c9644
Notice the 0x in front of the address.
The command will give you a good indication of where this address comes from in your binary, for example:
up_idlepm_static at /home/user/tizenrt/os/arch/arm/src/chip/s5j_idle.c:111
(inlined by) up_idle at /home/user/tizenrt/os/arch/arm/src/chip/s5j_idle.c:254
if the address is not pointing to code lines then it will look like:
?? ??:0
hope that helps
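Picking the candidate addresses out of the dump can be automated. The sketch below treats "looks like the PC" as "has the same top byte as the PC", which is my own heuristic and not part of TizenRT; the function name code_candidates is made up, and the image file is named with addr2line's -e option.

```python
def code_candidates(stackdump_line, pc):
    """Pick words from one up_stackdump line that share the PC's top byte."""
    _, _, rest = stackdump_line.partition("up_stackdump:")
    _, _, data = rest.partition(":")        # drop the "020c3ea0:" address label
    values = [int(word, 16) for word in data.split()]
    hits = [v for v in values if v >> 24 == pc >> 24]
    return hits[::-1]                       # scan from the back of the dump

LINE = ("up_stackdump: 020c3ea0: 00000003 020c3eb0 040c9638 041d38b8 "
        "00000000 040c9644 00000011 0000080")
PC = 0x040d25a0                             # from "Data abort. PC: 040d25a0"

candidates = code_candidates(LINE, PC)
for addr in candidates:
    # Print one addr2line invocation per candidate address
    print(f"arm-none-eabi-addr2line -f -p -i -e build/out/bin/tinyara {addr:#010x}")
```

For the dump in the question this yields 0x040c9644, 0x041d38b8 and 0x040c9638, in that order.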

Valgrind stack misses a function completely

I have two C files:
a.c
void main(){
    ...
    getvtable()->function();
}
The vtable points to a function located in b.c:
void function(){
    malloc(42);
}
Now if I trace the program in Valgrind I get the following:
==29994== 4,155 bytes in 831 blocks are definitely lost in loss record 26 of 28
==29994== at 0x402CB7A: malloc (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
==29994== by 0x40A24D2: (below main) (libc-start.c:226)
So the call to function is completely omitted from the stack! How is that possible? If I use GDB, a correct stack including "function" is shown.
Debug symbols are included, Linux, 32-bit.
Upd:
Answering the first question: I get the following output when debugging through Valgrind's GDB server. The breakpoint is never hit, although it is hit when I debug directly with GDB.
stasik@gemini:~$ gdb -q
(gdb) set confirm off
(gdb) target remote | vgdb
Remote debugging using | vgdb
relaying data between gdb and process 11665
[Switching to Thread 11665]
0x040011d0 in ?? ()
(gdb) file /home/stasik/leak.so
Reading symbols from /home/stasik/leak.so...done.
(gdb) break function
Breakpoint 1 at 0x110c: file ../../source/leakclass.c, line 32.
(gdb) commands
Type commands for breakpoint(s) 1, one per line.
End with a line saying just "end".
>silent
>end
(gdb) continue
Continuing.
Program received signal SIGTRAP, Trace/breakpoint trap.
0x0404efcb in ?? ()
(gdb) source thread-frames.py
Stack level 0, frame at 0x42348a0:
eip = 0x404efcb; saved eip 0x4f2f544c
called by frame at 0x42348a4
Arglist at 0x4234898, args:
Locals at 0x4234898, Previous frame's sp is 0x42348a0
Saved registers:
ebp at 0x4234898, eip at 0x423489c
Stack level 1, frame at 0x42348a4:
eip = 0x4f2f544c; saved eip 0x6e492056
called by frame at 0x42348a8, caller of frame at 0x42348a0
Arglist at 0x423489c, args:
Locals at 0x423489c, Previous frame's sp is 0x42348a4
Saved registers:
eip at 0x42348a0
Stack level 2, frame at 0x42348a8:
eip = 0x6e492056; saved eip 0x205d6f66
called by frame at 0x42348ac, caller of frame at 0x42348a4
Arglist at 0x42348a0, args:
Locals at 0x42348a0, Previous frame's sp is 0x42348a8
Saved registers:
eip at 0x42348a4
Stack level 3, frame at 0x42348ac:
eip = 0x205d6f66; saved eip 0x61746144
---Type <return> to continue, or q <return> to quit---
called by frame at 0x42348b0, caller of frame at 0x42348a8
Arglist at 0x42348a4, args:
Locals at 0x42348a4, Previous frame's sp is 0x42348ac
Saved registers:
eip at 0x42348a8
Stack level 4, frame at 0x42348b0:
eip = 0x61746144; saved eip 0x65736162
called by frame at 0x42348b4, caller of frame at 0x42348ac
Arglist at 0x42348a8, args:
Locals at 0x42348a8, Previous frame's sp is 0x42348b0
Saved registers:
eip at 0x42348ac
Stack level 5, frame at 0x42348b4:
eip = 0x65736162; saved eip 0x70616d20
called by frame at 0x42348b8, caller of frame at 0x42348b0
Arglist at 0x42348ac, args:
Locals at 0x42348ac, Previous frame's sp is 0x42348b4
Saved registers:
eip at 0x42348b0
Stack level 6, frame at 0x42348b8:
eip = 0x70616d20; saved eip 0x2e646570
called by frame at 0x42348bc, caller of frame at 0x42348b4
Arglist at 0x42348b0, args:
---Type <return> to continue, or q <return> to quit---
Locals at 0x42348b0, Previous frame's sp is 0x42348b8
Saved registers:
eip at 0x42348b4
Stack level 7, frame at 0x42348bc:
eip = 0x2e646570; saved eip 0x0
called by frame at 0x42348c0, caller of frame at 0x42348b8
Arglist at 0x42348b4, args:
Locals at 0x42348b4, Previous frame's sp is 0x42348bc
Saved registers:
eip at 0x42348b8
Stack level 8, frame at 0x42348c0:
eip = 0x0; saved eip 0x0
caller of frame at 0x42348bc
Arglist at 0x42348b8, args:
Locals at 0x42348b8, Previous frame's sp is 0x42348c0
Saved registers:
eip at 0x42348bc
(gdb) continue
Continuing.
Program received signal SIGTRAP, Trace/breakpoint trap.
0x0404efcb in ?? ()
(gdb) continue
Continuing.
I see two possible reasons:
Valgrind is using a different stack unwind method than GDB
The address space layout is different while running your program under the two environments and you're only hitting stack corruption under Valgrind.
We can gain more insight by using Valgrind's built-in gdbserver.
Save this Python snippet to thread-frames.py:
import gdb
f = gdb.newest_frame()
while f is not None:
    f.select()
    gdb.execute('info frame')
    f = f.older()
t.gdb
set confirm off
file MY-PROGRAM
break function
commands
silent
end
run
source thread-frames.py
quit
v.gdb
set confirm off
target remote | vgdb
file MY-PROGRAM
break function
commands
silent
end
continue
source thread-frames.py
quit
(Change MY-PROGRAM, function in the scripts above and the commands below as required)
Get details about the stack frames under GDB:
$ gdb -q -x t.gdb
Breakpoint 1 at 0x80484a2: file valgrind-unwind.c, line 6.
Stack level 0, frame at 0xbffff2f0:
eip = 0x80484a2 in function (valgrind-unwind.c:6); saved eip 0x8048384
called by frame at 0xbffff310
source language c.
Arglist at 0xbffff2e8, args:
Locals at 0xbffff2e8, Previous frame's sp is 0xbffff2f0
Saved registers:
ebp at 0xbffff2e8, eip at 0xbffff2ec
Stack level 1, frame at 0xbffff310:
eip = 0x8048384 in main (valgrind-unwind.c:17); saved eip 0xb7e33963
caller of frame at 0xbffff2f0
source language c.
Arglist at 0xbffff2f8, args:
Locals at 0xbffff2f8, Previous frame's sp is 0xbffff310
Saved registers:
ebp at 0xbffff2f8, eip at 0xbffff30c
Get the same data under Valgrind:
$ valgrind --vgdb=full --vgdb-error=0 ./MY-PROGRAM
In another shell:
$ gdb -q -x v.gdb
relaying data between gdb and process 574
0x04001020 in ?? ()
Breakpoint 1 at 0x80484a2: file valgrind-unwind.c, line 6.
Stack level 0, frame at 0xbe88e2c0:
eip = 0x80484a2 in function (valgrind-unwind.c:6); saved eip 0x8048384
called by frame at 0xbe88e2e0
source language c.
Arglist at 0xbe88e2b8, args:
Locals at 0xbe88e2b8, Previous frame's sp is 0xbe88e2c0
Saved registers:
ebp at 0xbe88e2b8, eip at 0xbe88e2bc
Stack level 1, frame at 0xbe88e2e0:
eip = 0x8048384 in main (valgrind-unwind.c:17); saved eip 0x4051963
caller of frame at 0xbe88e2c0
source language c.
Arglist at 0xbe88e2c8, args:
Locals at 0xbe88e2c8, Previous frame's sp is 0xbe88e2e0
Saved registers:
ebp at 0xbe88e2c8, eip at 0xbe88e2dc
If GDB can successfully unwind the stack while connected to "valgrind --vgdb", then it's a problem with Valgrind's stack unwind algorithm. You can inspect the "info frame" output carefully for inline and tail call frames, or anything else that could throw Valgrind off. Otherwise it's probably stack corruption.
OK, compiling all the .so parts and the main program with an explicit -O0 seems to solve the problem. It seems that some of the optimizations applied to the 'core' program that loads the .so (the .so itself was always compiled unoptimized) were breaking the stack.
This is Tail-call optimization in action.
The function function calls malloc as the last thing it does. The compiler sees this and kills the stack frame for function before it calls malloc. The advantage is that when malloc returns it returns directly to whichever function called function. I.e. it avoids malloc returning to function only to hit yet another return instruction.
In this case the optimization has avoided an unnecessary jump and made stack usage slightly more efficient, which is nice. For a recursive tail call, though, this optimization is a huge win, as it turns recursion into something more like iteration.
As you've discovered already, disabling optimization makes debugging much easier. If you want to debug optimized code (for performance testing, perhaps), then, as @Zang MingJie already said, you can disable this one optimization with -fno-optimize-sibling-calls.
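To see why the frame disappears from the trace, here is a toy model of a call stack in which a tail-calling function releases its frame before jumping to its callee. It is purely illustrative and says nothing about how GCC or Valgrind work internally; visible_frames and the idea that main also tail-calls function are assumptions made for the example.

```python
def visible_frames(chain, tail_callers=frozenset()):
    """Frames an unwinder would see while the last function in `chain` runs.

    chain: function names, outermost first.
    tail_callers: functions whose call to the next one is a tail call.
    """
    stack = []
    for i, name in enumerate(chain):
        stack.append(name)
        is_last = i == len(chain) - 1
        if not is_last and name in tail_callers:
            stack.pop()        # frame released before jumping to the callee
    return stack[::-1]         # innermost first, like a backtrace

CHAIN = ["(below main)", "main", "function", "malloc"]

print(visible_frames(CHAIN))                        # unoptimized build
print(visible_frames(CHAIN, {"main", "function"}))  # both calls are tail calls
```

In the second case only malloc and (below main) remain, which has the same shape as the Valgrind record in the question.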

How to print register values in GDB?

How do I print the value of %eax and %ebp?
(gdb) p $eax
$1 = void
info registers shows all the registers; info registers eax shows just the register eax. The command can be abbreviated as i r
If you're trying to print a specific register in GDB, you have to omit the % sign. For example,
info registers eip
If your executable is 64 bit, the registers start with r. Starting them with e is not valid.
info registers rip
Those can be abbreviated to:
i r rip
There is also:
info all-registers
Then you can get the register name you are interested in -- very useful for finding platform-specific registers (like NEON Q... on ARM).
If you only want to check once, info registers shows the registers.
If you only want to watch one register, display $esp, for example, keeps showing the esp register at the GDB command line after each stop.
If you want to watch all registers, layout regs shows them continuously in TUI mode.
Gdb commands:
i r <register_name>: print a single register, e.g. i r rax, i r eax
i r <register_name_1> <register_name_2> ...: print multiple registers, e.g. i r rdi rsi
i r: print all registers except floating-point and vector registers (xmm, ymm, zmm).
i r a: print all registers, including floating-point and vector registers (xmm, ymm, zmm).
i r f: print all FPU floating-point registers (st0-7 and a few other f*)
Other register groups besides a (all) and f (float) can be found with:
maint print reggroups
as documented at: https://sourceware.org/gdb/current/onlinedocs/gdb/Registers.html#Registers
Tips:
xmm0 ~ xmm15 are 128 bits wide; almost every modern machine has them. They were introduced in 1999.
ymm0 ~ ymm15 are 256 bits wide; newer machines usually have them. They were introduced in 2011.
zmm0 ~ zmm31 are 512 bits wide; an ordinary PC probably doesn't have them (as of 2016). They were introduced in 2013 and have mainly been used in servers so far.
Only one of the xmm / ymm / zmm series is shown, because they are the same registers in different modes. On my machine ymm is shown.
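The "same registers" remark can be made concrete: xmmN is the low 128 bits of ymmN, which in turn is the low 256 bits of zmmN. The sketch below models a register as a plain Python integer; sub_registers is a made-up helper, not a GDB function.

```python
def sub_registers(zmm_value):
    """Return the (ymm, xmm) views of a 512-bit zmm value held as an int."""
    ymm = zmm_value & ((1 << 256) - 1)   # low 256 bits
    xmm = zmm_value & ((1 << 128) - 1)   # low 128 bits
    return ymm, xmm

# A value with a distinct marker in each 128-bit lane
zmm = (3 << 256) | (2 << 128) | 1
ymm, xmm = sub_registers(zmm)
print(hex(ymm))
print(hex(xmm))
```

Since the narrower views are just slices of the widest register, showing one series is enough to recover the others.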
p $eax works as of GDB 7.7.1
Tested as of GDB 7.7.1, the command you've tried works:
set $eax = 0
p $eax
# $1 = 0
set $eax = 1
p $eax
# $2 = 1
This syntax can also be used to select between different union members e.g. for ARM floating point registers that can be either floating point or integers:
p $s0.f
p $s0.u
From the docs:
Any name preceded by ‘$’ can be used for a convenience variable, unless it is one of the predefined machine-specific register names.
and:
You can refer to machine register contents, in expressions, as variables with names starting with ‘$’. The names of registers are different for each machine; use info registers to see the names used on your machine.
But I haven't had much luck with control registers so far: OSDev 2012 http://f.osdev.org/viewtopic.php?f=1&t=25968 || 2005 feature request https://www.sourceware.org/ml/gdb/2005-03/msg00158.html || alt.lang.asm 2013 https://groups.google.com/forum/#!topic/alt.lang.asm/JC7YS3Wu31I
ARM floating point registers
See: https://reverseengineering.stackexchange.com/questions/8992/floating-point-registers-on-arm/20623#20623
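What p $s0.f and p $s0.u do is show the same 32 bits under two interpretations. The following sketch mimics that outside GDB, assuming a 32-bit register holding an IEEE 754 single-precision value; float_bits and bits_float are illustrative names.

```python
import struct

def float_bits(value):
    """Reinterpret a single-precision float as an unsigned 32-bit integer."""
    return struct.unpack("<I", struct.pack("<f", value))[0]

def bits_float(bits):
    """Reinterpret an unsigned 32-bit integer as a single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

print(hex(float_bits(1.0)))      # the .u view of a register holding 1.0
print(bits_float(0x40000000))    # the .f view of the bit pattern 0x40000000
```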
Easiest for me is:
(gdb) x/x $eax
First x stands for examine and second x is hex. You can see other formats using:
(gdb) help x
You can easily print strings with x/s $eax or return addresses with x/a $ebp+4.