Understanding output of gdb `where` command

I am debugging the following program (which is correct).
[OP#localhost 04]$ cat factorial.s
.section .text
.globl _start
.globl _factorial
_start:
push $4
call factorial
add $8, %rsp
mov %rax, %rdi
mov $60, %rax
syscall
.type factorial, @function
factorial:
# Parameters
# n : int
# Number to take factorial of
push %rbp
mov %rsp, %rbp
mov 0x10(%rbp), %rax
if:
cmp $1, %rax
jne else
jmp end_if
else:
dec %rax
push %rax
call factorial
add $8, %rsp
imul 0x10(%rbp), %rax
end_if:
pop %rbp
ret
I've set a breakpoint at the factorial function, and continued twice. Examining the value of %rsp, I find that it is
(gdb) print/x $rsp
$1 = 0x7fffffffd698
Examining the region surrounding that, I find
(gdb) x /10xg 0x7fffffffd690
0x7fffffffd690: 0x0000000000000000 0x00007fffffffd6b0
0x7fffffffd6a0: 0x0000000000401030 0x0000000000000002
0x7fffffffd6b0: 0x00007fffffffd6c8 0x0000000000401030
0x7fffffffd6c0: 0x0000000000000003 0x0000000000000000
0x7fffffffd6d0: 0x0000000000401007 0x0000000000000004
Roughly, this is as one would expect. However, the output of where is the following:
(gdb) where
#0 0x000000000040101b in factorial ()
#1 0x0000000000401030 in else ()
#2 0x0000000000000002 in ?? ()
#3 0x00007fffffffd6c8 in ?? ()
#4 0x0000000000401030 in else ()
#5 0x0000000000000003 in ?? ()
#6 0x0000000000000000 in ?? ()
I cannot seem to understand this. It seems to be reading the stack in order, but I don't know where the number 0x40101b came from (that number is nowhere on the stack), and I'm not sure why it stopped there, as it does not print a stack frame for the initial call to factorial.

On x86_64, GDB expects a program to have proper DWARF unwind info (which your program completely lacks). (See the documentation on how to insert such info using .cfi directives.)
Without DWARF info, GDB makes guesses using certain heuristics. Here GDB treats else as if it were a function and tries to find its caller, with disastrous results. Effectively, where will not work well for a program without DWARF unwind info unless that program uses only C-style function labels and frame pointers.
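For illustration, a minimal sketch (assuming the push %rbp / mov %rsp, %rbp prologue shown in the question) of what such .cfi annotations could look like for the factorial routine above:
.type factorial, @function
factorial:
    .cfi_startproc
    push %rbp
    .cfi_def_cfa_offset 16      # return address + saved %rbp are now on the stack
    .cfi_offset %rbp, -16       # the saved %rbp lives at CFA-16
    mov %rsp, %rbp
    .cfi_def_cfa_register %rbp  # from here on, unwind through %rbp
    mov 0x10(%rbp), %rax
if:
    cmp $1, %rax
    jne else
    jmp end_if
else:
    dec %rax
    push %rax
    call factorial
    add $8, %rsp
    imul 0x10(%rbp), %rax
end_if:
    pop %rbp
    .cfi_def_cfa %rsp, 8        # %rbp restored; the CFA is %rsp+8 again
    ret
    .cfi_endproc
    .size factorial, .-factorial
Compilers emit these directives automatically; hand-written assembly has to spell them out.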

Related

Debugging issue: why JE-command is "stuck"?

I have a multithreaded app. I need to mention at the very beginning that the target CPU has 8 virtual cores, and my worker pool consists of 7 threads in total, where one thread is "stuck" in the runtime on a simple if-condition like:
if (enumerator::e1 == data_member || enumerator::e2 == data_member) {
return function_member();
}
// ...
enum class enumerator : char {
    e1 = 0,
    e2 = 'W' ///< 0x57
};
What I see if I dig deeper:
0x883c90 <+0>: pushq %r15
0x883c92 <+2>: pushq %r14
0x883c94 <+4>: pushq %rbx
0x883c95 <+5>: subq $0x20, %rsp
0x883c99 <+9>: movq %rsi, %rbx
0x883c9c <+12>: movq %rdi, %r15
0x883c9f <+15>: movb 0x8(%r15), %al
0x883ca3 <+19>: cmpb $0x57, %al
-> 0x883ca5 <+21>: je 0x883cab
0x883ca7 <+23>: testb %al, %al
0x883ca9 <+25>: jne 0x883d08
0x883cab <+27>: movq (%rbx), %rax
This makes me confused: as far as I understand, je is just a jump to 0x883cab, which never seems to happen, because thread step-next|in|over do not lead anywhere, and even later lldb (after a manual break via process interrupt) says that execution is still at the same point.
I have also noticed that the stop reason is "next-branch-location":
(lldb) thread select 3
* thread #3, name = 'myapp', stop reason = next-branch-location
...but I'm not really sure what this actually means, because all I could google was the lldb repo, where this reason is mentioned just once, in /Target/ThreadPlanStepRange.cpp
Just in case:
(lldb) register read
General Purpose Registers:
rax = 0x0000000002c8c000
rbx = 0x00007fef312568b0
rcx = 0x0000000000000000
rdx = 0x00007fef31256988
rdi = 0x00000000025662a0
rsi = 0x00007fef312568b0
rbp = 0x0000000002579b48
rsp = 0x00007fef31256850
r8 = 0x0000000000000000
r9 = 0x00000000ffffffff
r10 = 0x0000000000000000
r11 = 0x0000000000000000
r12 = 0x00007fef31256920
r13 = 0x0000000002c5a030
r14 = 0x00007fef31256920
r15 = 0x00000000025662a0
rip = 0x0000000000883ca5
rflags = 0x0000000000000297
cs = 0x0000000000000033
fs = 0x0000000000000000
gs = 0x0000000000000000
ss = 0x000000000000002b
ds = 0x0000000000000000
es = 0x0000000000000000
I thought about thread starvation, but my app uses 8 threads in total, and the virtual machine (on the Intel Ice Lake platform in the cloud) is configured with exactly 8 cores.
Happy to learn something new, thank you in advance.

Segfault sharing array between assembly and C++

I am writing a program that has shared state between assembly and C++. I declared a global array in the assembly file and accessed that array in a function within C++. When I call that function from within C++, there are no issues, but when I call that same function from within assembly, I get a segmentation fault. I believe I preserved the right registers across function calls.
Strangely, when I change the type of the pointer within C++ to a uint64_t pointer, it correctly outputs the values but then segmentation faults again after casting it to a uint64_t.
In the following code, the array which keeps giving me errors is currentCPUState.
//CPU.cpp
extern uint64_t currentCPUState[6];
extern "C" {
void initInternalState(void* instructions, int indexSize);
void printCPUState();
}
void printCPUState() {
uint64_t b = currentCPUState[0];
printf("%d\n", b); //this line DOESNT crash ???
std::cout << b << "\n"; //this line crashes
//omitted some code for the sake of brevity
std::cout << "\n";
}
CPU::CPU() {
//set initial cpu state
currentCPUState[AF] = 0;
currentCPUState[BC] = 0;
currentCPUState[DE] = 0;
currentCPUState[HL] = 0;
currentCPUState[SP] = 0;
currentCPUState[PC] = 0;
printCPUState(); //this has no issues
initInternalState(instructions, sizeof(void*));
}
//cpu.s
.section .data
.balign 8
instructionArr:
.space 8 * 1024, 0
//stores values of registers
//used for transitioning between C and ASM
//uint64_t currentCPUState[6]
.global currentCPUState
currentCPUState:
.quad 0, 0, 0, 0, 0, 0
.section .text
.global initInternalState
initInternalState:
push %rdi
push %rsi
mov %rcx, %rdi
mov %rdx, %rsi
push %R12
push %R13
push %R14
push %R15
call initGBCpu
pop %R15
pop %R14
pop %R13
pop %R12
pop %rsi
pop %rdi
ret
//omitted unimportant code
//initGBCpu(rdi: void* instructions, rsi:int size)
//function initializes the array of opcodes
initGBCpu:
pushq %rdx
//move each instruction into the array in proper order
//also fill the instructionArr
leaq instructionArr(%rip), %rdx
addop inst0x00
addop inst0x01
addop inst0x02
addop inst0x03
addop inst0x04
call loadCPUState
call inst0x04 //inc BC
call saveCPUState
call printCPUState //CRASHES HERE
popq %rdx
ret
Additional details:
OS: Windows 64 bit
Compiler (MinGW64-w)
Architecture: x64
Any insight would be much appreciated
Edit:
addop is a macro:
//adds an opcode to the array of functions
.macro addop lbl
leaq \lbl (%rip), %rcx
mov %rcx, 0(%rdi)
mov %rcx, 0(%rdx)
add %rsi, %rdi
add %rsi, %rdx
.endm
Some x86-64 calling conventions require the stack to be aligned to a 16-byte boundary before calling a function.
When a function is called, an 8-byte return address is pushed onto the stack, so another 8 bytes of data have to be added to the stack to satisfy this alignment requirement. Otherwise, instructions with alignment requirements (like some of the SSE instructions) may crash.
Assuming such a calling convention applies, the initGBCpu function looks OK, but the initInternalState function has to add one more 8-byte item to the stack before calling initGBCpu.
For example:
initInternalState:
push %rdi
push %rsi
mov %rcx, %rdi
mov %rdx, %rsi
push %R12
push %R13
push %R14
push %R15
sub $8, %rsp // adjust stack alignment
call initGBCpu
add $8, %rsp // undo the stack pointer movement
pop %R15
pop %R14
pop %R13
pop %R12
pop %rsi
pop %rdi
ret
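As a debugging aid, a hypothetical macro like the following (not part of the code above) traps into the debugger if %rsp is misaligned; placing assert_aligned immediately before a call catches the problem at the call site instead of inside whatever SSE instruction eventually faults:
# Debug-only sketch: break into the debugger if %rsp is not 16-byte
# aligned at the point where a CALL is about to be issued.
.macro assert_aligned
    testq $0xf, %rsp       # the low four bits must be zero before a CALL
    jz 1f
    int3                   # stop here: the stack is misaligned
1:
.endm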

i=j=0; or i=0; j=0; which is more efficient?

I'm new to programming (just after my first year of using C++, with the addition of some other languages). I've come across a small dilemma: which of these pieces of code is better, even if only by a little?
i = j = 0;
And the second solution:
i = 0;
j = 0;
I'm using it in a 'for' loop. That's why this is important for me to know.
These pieces of code literally describe the same program. There is no difference between them beyond syntax.
Thus, there will be no performance difference at runtime.
Remember, your C++ code is a description of a program, not a sequence of instructions for a computer to perform. It's your compiler's job to create one of those, after reading and understanding your source code.
With modern compilers, these will not make a difference.
See here for a read-up on common compiler optimizations: https://queue.acm.org/detail.cfm?id=3372264
However, as @asteroids-with-wings rightly points out, these don't even come into play here.
What actually happens is very likely to be compiler-specific, but you can check what they create by looking at the assembly code.
Example code:
test.cpp:
int main(int argc, char **argv) {
int i, j;
i = j = 0;
}
test2.cpp:
int main(int argc, char **argv) {
int i, j;
i = 0;
j = 0;
}
I compiled them with the following options:
clang test.cpp -O0 -save-temps=obj -o test_exec
clang test2.cpp -O0 -save-temps=obj -o test_exec2
-O0 is to disable optimizations, -save-temps=obj will keep the generated assembly around for inspection.
This provides the following two assembly files:
test.s:
.text
.file "test.cpp"
.globl main # -- Begin function main
.p2align 4, 0x90
.type main,@function
main: # @main
.cfi_startproc
# %bb.0:
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq %rsp, %rbp
.cfi_def_cfa_register %rbp
xorl %eax, %eax
movl %edi, -4(%rbp)
movq %rsi, -16(%rbp)
movl $0, -24(%rbp)
movl $0, -20(%rbp)
popq %rbp
.cfi_def_cfa %rsp, 8
retq
.Lfunc_end0:
.size main, .Lfunc_end0-main
.cfi_endproc
# -- End function
.ident "clang version 11.0.0"
.section ".note.GNU-stack","",#progbits
.addrsig
test2.s:
.text
.file "test2.cpp"
.globl main # -- Begin function main
.p2align 4, 0x90
.type main,@function
main: # @main
.cfi_startproc
# %bb.0:
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq %rsp, %rbp
.cfi_def_cfa_register %rbp
xorl %eax, %eax
movl %edi, -4(%rbp)
movq %rsi, -16(%rbp)
movl $0, -20(%rbp)
movl $0, -24(%rbp)
popq %rbp
.cfi_def_cfa %rsp, 8
retq
.Lfunc_end0:
.size main, .Lfunc_end0-main
.cfi_endproc
# -- End function
.ident "clang version 11.0.0"
.section ".note.GNU-stack","",#progbits
.addrsig
As you can see in the diff:
2c2
< .file "test.cpp"
---
> .file "test2.cpp"
17d16
< movl $0, -24(%rbp)
18a18
> movl $0, -24(%rbp)
there is very little difference between the two codes.
The only real difference is in lines 17 + 18, where these two lines are swapped:
movl $0, -20(%rbp)
movl $0, -24(%rbp)
Even without optimization, the only difference here is the order in which the variables are initialized, otherwise the same thing happens.
Note: this holds true for your specific case of assigning a compile-time constant (0). Results may differ when assigning run-time values from other variables.
As always in performance questions: Investigate what your compiler does, and profile the result - there may not be a single true answer.
It depends!
Your compiler should be smart enough to handle that and in the end will produce the same assembler instructions for both!
To see what's going on, you can use the assembler output of your compiler. For GCC that means using the '-S' option.
g++ -S assignment_efficiency.cpp
If you run that command on the two different versions, you'll find that the 'separate' variant results in one asm instruction less than the 'combined' variant, so you could call it 'more efficient' to some extent.
BUT
If you tell your compiler to optimize, using the -O option, you'll get the exact same asm instructions.
g++ -S -O3 assignment_efficiency.cpp
You can verify that by saving both variants to separate files and running a diff on them, e.g.
diff single.s sep.s
diff single.o3.s sep.o3.s
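For illustration, once optimization is on both variants typically collapse to the same trivial output; a rough sketch of what g++ or clang emit for the empty main above (exact directives vary by compiler and version):
main:                      # same output for both test.cpp and test2.cpp at -O1 and above
    xorl %eax, %eax        # i and j are unused, so the dead stores are removed; just return 0
    retq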
Compilers try to optimize the code they produce, so it probably depends on the compiler.
There are ways to find out, like generating the asm and reading the output code.
For getting the asm, see:
How do you get assembler output from C/C++ source in gcc?
Another way is to look at the duration of a big loop with your compiler.

Where is the string stored with functions that return a hard-coded string literal?

I've seen this in C/C++ code:
char * GetName()
{
return "Aurian";
}
What is exactly going on here under the hood? Where in memory is "Aurian" stored such that it survives when I leave the GetName() scope, AND I get a char * to it? I'm guessing it doesn't follow the same rules as, say, returning an int. And how does this relate to
char * name = "Aurian";
Is this implementation dependent? Also, would GetName() just be compiled away to "Aurian"?
This thread seems to suggest that some sort of jump table might be used for all string literals, for GCC anyway.
It looks like the string constants are stored in the read-only part of the data segment (along with other non-zero-initialized static variables). Check the assembly!
I compile this
#include<stdio.h>
char * GetName()
{
return "Aurian";
}
int main()
{
printf("%s", GetName());
return 0;
}
and the assembly looks like
.section .rodata
.LC0:
.string "Aurian"
.text
.globl GetName
.type GetName, @function
GetName:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $.LC0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size GetName, .-GetName
.section .rodata
.LC1:
.string "%s"
.text
.globl main
.type main, @function
main:
.LFB1:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $0, %eax
call GetName
movq %rax, %rsi
movl $.LC1, %edi
movl $0, %eax
call printf
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
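For comparison, a file-scope char * name = "Aurian"; typically compiles to something like the following sketch (hypothetical labels, assuming a non-PIE build): the pointer itself becomes a writable object in .data, while the characters still live in .rodata:
    .section .rodata
.LC2:
    .string "Aurian"
    .data
    .globl name
    .align 8
name:
    .quad .LC2          # writable pointer, initialized to the address of the read-only bytes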
It is used as the exit code for the function, so its storage location is operating-system dependent. The C language doesn't specify these low-level details; they vary from platform to platform.
In UNIX-flavored systems, this value will be the return value of the wait() or waitpid() system call.
Until these functions are called, the return code is stored in the process's PID entry in the Linux kernel.
As we know:
global variables, static variables, and dynamic variables have heap storage
pointers and parameters of the function have stack storage
constant variables are stored in the code itself (data segments)
So, based on these types, these variables are stored on the stack until they are returned from the stack frame.
In your case the function finds space on the stack, and the stack frame returns implicitly.
When they are returned, they are stored in CPU registers; it is possible that two or more CPU registers are consumed, which is OS dependent.

I have a core dump of an executable that was NOT built with debug symbols. Can I recover argv contents?

I have a core dump of an executable that was NOT built with debug symbols.
Can I recover argv contents to see what the command line was?
If I run gdb, I can see a backtrace, and I can navigate to the main() frame. Once there, is there a way to recover argv, without knowing its exact address?
I am on x86_64 (Intel Xeon CPU) running a CentOS Linux distro/kernel.
One reason I am hopeful is that the core dump seems to show a partial argv.
(The program is postgres, and when I load the core file, gdb prints a message that includes the postgres db user name, client IP address, and the first 10 characters of the query.)
On x86_64 the arguments are passed in %rdi, %rsi, etc. registers (calling convention).
Therefore, when you step into the main frame, you should be able to:
(gdb) p $rdi # == argc
(gdb) p (char**) $rsi # == argv
(gdb) set $argv = (char**)$rsi
(gdb) set $i = 0
(gdb) while $argv[$i]
> print $argv[$i++]
> end
Unfortunately, GDB will not normally restore $rdi and $rsi when you switch frames. So this example doesn't work:
cat t.c
#include <stdlib.h>
int bar() { abort(); }
int foo() { return bar(); }
int main()
{
foo();
return 0;
}
gcc t.c && ./a.out
Aborted (core dumped)
gdb -q ./a.out core
Core was generated by `./a.out'.
Program terminated with signal 6, Aborted.
#0 0x00007fdc8284aa75 in *__GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
in ../nptl/sysdeps/unix/sysv/linux/raise.c
(gdb) bt
#0 0x00007fdc8284aa75 in *__GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x00007fdc8284e5c0 in *__GI_abort () at abort.c:92
#2 0x000000000040052d in bar ()
#3 0x000000000040053b in foo ()
#4 0x000000000040054b in main ()
(gdb) fr 4
#4 0x000000000040054b in main ()
(gdb) p $rdi
$1 = 5524 ### clearly not the right value
So you'll have to work some more ...
What you can do is use the knowledge of how the Linux stack is set up at process startup, combined with the fact that GDB will restore the stack pointer:
(gdb) set backtrace past-main
(gdb) bt
#0 0x00007ffff7a8da75 in *__GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x00007ffff7a915c0 in *__GI_abort () at abort.c:92
#2 0x000000000040052d in bar ()
#3 0x000000000040053b in foo ()
#4 0x0000000000400556 in main ()
#5 0x00007ffff7a78c4d in __libc_start_main (main=<optimized out>, argc=<optimized out>, ubp_av=<optimized out>, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffdad8) at libc-start.c:226
#6 0x0000000000400469 in _start ()
(gdb) frame 6
(gdb) disas
Dump of assembler code for function _start:
0x0000000000400440 <+0>: xor %ebp,%ebp
0x0000000000400442 <+2>: mov %rdx,%r9
0x0000000000400445 <+5>: pop %rsi
0x0000000000400446 <+6>: mov %rsp,%rdx
0x0000000000400449 <+9>: and $0xfffffffffffffff0,%rsp
0x000000000040044d <+13>: push %rax
0x000000000040044e <+14>: push %rsp
0x000000000040044f <+15>: mov $0x400560,%r8
0x0000000000400456 <+22>: mov $0x400570,%rcx
0x000000000040045d <+29>: mov $0x40053d,%rdi
0x0000000000400464 <+36>: callq 0x400428 <__libc_start_main@plt>
=> 0x0000000000400469 <+41>: hlt
0x000000000040046a <+42>: nop
0x000000000040046b <+43>: nop
End of assembler dump.
So now we expect the original %rsp to be $rsp+8 (one POP, two PUSHes), but it could be at $rsp+16 due to alignment that was done at instruction 0x0000000000400449
Let's see what's there ...
(gdb) x/8gx $rsp+8
0x7fffbe5d5e98: 0x000000000000001c 0x0000000000000004
0x7fffbe5d5ea8: 0x00007fffbe5d6eb8 0x00007fffbe5d6ec0
0x7fffbe5d5eb8: 0x00007fffbe5d6ec4 0x00007fffbe5d6ec8
0x7fffbe5d5ec8: 0x0000000000000000 0x00007fffbe5d6ecf
That looks promising: 4 (suspected argc), followed by 4 non-NULL pointers, followed by NULL.
Let's see if that pans out:
(gdb) x/s 0x00007fffbe5d6eb8
0x7fffbe5d6eb8: "./a.out"
(gdb) x/s 0x00007fffbe5d6ec0
0x7fffbe5d6ec0: "foo"
(gdb) x/s 0x00007fffbe5d6ec4
0x7fffbe5d6ec4: "bar"
(gdb) x/s 0x00007fffbe5d6ec8
0x7fffbe5d6ec8: "bazzzz"
Indeed, that's how I invoked the binary. As a final sanity check, does 0x00007fffbe5d6ecf look like part of the environment?
(gdb) x/s 0x00007fffbe5d6f3f
0x7fffbe5d6f3f: "SSH_AGENT_PID=2874"
Yep, that's the beginning (or the end) of the environment.
So there you have it.
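For reference, the startup layout the walkthrough above relies on is roughly the following (a sketch of the System V x86-64 process entry stack, not taken from the dump):
# What the kernel leaves at %rsp when _start first gains control:
#   %rsp      -> argc
#   %rsp+8    -> argv[0] ... argv[argc-1]   (pointers into the strings below)
#                NULL                       (end of argv)
#                envp[0] ... envp[m-1]
#                NULL                       (end of envp)
#                auxiliary vector entries, terminated by AT_NULL
#                argument and environment strings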
Final notes: if GDB didn't print <optimized out> so much, we could have recovered argc and argv from frame #5. There is work on both GDB and GCC sides to make GDB print much less of "optimized out" ...
Also, when loading the core, my GDB prints:
Core was generated by `./a.out foo bar bazzzz'.
negating the need for this whole exercise. However, that only works for short command lines, while the solution above will work for any command line.