Debugging issue: why is the JE instruction "stuck"?

I have a multithreaded app. I should mention up front that the target CPU has 8 virtual cores and that my worker pool consists of 7 threads. One of those threads gets "stuck" at runtime on a simple if-condition like:
if (enumerator::e1 == data_member || enumerator::e2 == data_member) {
return function_member();
}
// ...
enum class enumerator : char {
e1 = 0,
e2 = 'W' ///< 0x57
};
What I see if I dig deeper:
0x883c90 <+0>: pushq %r15
0x883c92 <+2>: pushq %r14
0x883c94 <+4>: pushq %rbx
0x883c95 <+5>: subq $0x20, %rsp
0x883c99 <+9>: movq %rsi, %rbx
0x883c9c <+12>: movq %rdi, %r15
0x883c9f <+15>: movb 0x8(%r15), %al
0x883ca3 <+19>: cmpb $0x57, %al
-> 0x883ca5 <+21>: je 0x883cab
0x883ca7 <+23>: testb %al, %al
0x883ca9 <+25>: jne 0x883d08
0x883cab <+27>: movq (%rbx), %rax
This confuses me: as far as I understand, je is just a conditional jump to 0x883cab, yet the jump never seems to happen. Stepping the thread (next, in, or over) does not lead to anything, and even later, after a manual break via process interrupt, lldb says that execution is still at the same point.
I have also noticed that stop reason is "next-branch-location":
(lldb) thread select 3
* thread #3, name = 'myapp', stop reason = next-branch-location
...but I am not really sure what this actually means: all I could find was the lldb repo, where this reason is mentioned just once, in /Target/ThreadPlanStepRange.cpp
Just in case:
(lldb) register read
General Purpose Registers:
rax = 0x0000000002c8c000
rbx = 0x00007fef312568b0
rcx = 0x0000000000000000
rdx = 0x00007fef31256988
rdi = 0x00000000025662a0
rsi = 0x00007fef312568b0
rbp = 0x0000000002579b48
rsp = 0x00007fef31256850
r8 = 0x0000000000000000
r9 = 0x00000000ffffffff
r10 = 0x0000000000000000
r11 = 0x0000000000000000
r12 = 0x00007fef31256920
r13 = 0x0000000002c5a030
r14 = 0x00007fef31256920
r15 = 0x00000000025662a0
rip = 0x0000000000883ca5
rflags = 0x0000000000000297
cs = 0x0000000000000033
fs = 0x0000000000000000
gs = 0x0000000000000000
ss = 0x000000000000002b
ds = 0x0000000000000000
es = 0x0000000000000000
I thought about thread starvation, but my app uses 8 threads in total and the virtual machine (on the Intel Ice Lake platform in the cloud) is configured with exactly 8 cores.
Happy to learn something new, thank you in advance.

Related

Segfault sharing array between assembly and C++

I am writing a program that has a shared state between assembly and C++. I declared a global array in the assembly file and accessed that array in a function within C++. When I call that function from within C++, there are no issues, but then I call that same function from within assembly and I get a segmentation fault. I believe I preserved the right registers across function calls.
Strangely, when I change the type of the pointer within C++ to a uint64_t pointer, it correctly outputs the values but then segmentation faults again after casting it to a uint64_t.
In the following code, the array which keeps giving me errors is currentCPUState.
//CPU.cpp
extern uint64_t currentCPUState[6];
extern "C" {
void initInternalState(void* instructions, int indexSize);
void printCPUState();
}
void printCPUState() {
uint64_t b = currentCPUState[0];
printf("%d\n", b); //this line DOESNT crash ???
std::cout << b << "\n"; //this line crashes
//omitted some code for the sake of brevity
std::cout << "\n";
}
CPU::CPU() {
//set initial cpu state
currentCPUState[AF] = 0;
currentCPUState[BC] = 0;
currentCPUState[DE] = 0;
currentCPUState[HL] = 0;
currentCPUState[SP] = 0;
currentCPUState[PC] = 0;
printCPUState(); //this has no issues
initInternalState(instructions, sizeof(void*));
}
//cpu.s
.section .data
.balign 8
instructionArr:
.space 8 * 1024, 0
//stores values of registers
//used for transitioning between C and ASM
//uint64_t currentCPUState[6]
.global currentCPUState
currentCPUState:
.quad 0, 0, 0, 0, 0, 0
.section .text
.global initInternalState
initInternalState:
push %rdi
push %rsi
mov %rcx, %rdi
mov %rdx, %rsi
push %R12
push %R13
push %R14
push %R15
call initGBCpu
pop %R15
pop %R14
pop %R13
pop %R12
pop %rsi
pop %rdi
ret
//omitted unimportant code
//initGBCpu(rdi: void* instructions, rsi:int size)
//function initializes the array of opcodes
initGBCpu:
pushq %rdx
//move each instruction into the array in proper order
//also fill the instructionArr
leaq instructionArr(%rip), %rdx
addop inst0x00
addop inst0x01
addop inst0x02
addop inst0x03
addop inst0x04
call loadCPUState
call inst0x04 //inc BC
call saveCPUState
call printCPUState //CRASHES HERE
popq %rdx
ret
Additional details:
OS: Windows 64 bit
Compiler: MinGW-w64
Architecture: x64
Any insight would be much appreciated
Edit:
addop is a macro:
//adds an opcode to the array of functions
.macro addop lbl
leaq \lbl (%rip), %rcx
mov %rcx, 0(%rdi)
mov %rcx, 0(%rdx)
add %rsi, %rdi
add %rsi, %rdx
.endm
Some x86-64 calling conventions require the stack to be aligned to a 16-byte boundary before calling a function.
When a function is called, an 8-byte return address is pushed onto the stack, so another 8 bytes have to be added to the stack to satisfy this alignment requirement. Otherwise, instructions with alignment requirements (such as some SSE instructions) may crash.
Assuming such a calling convention applies, the initGBCpu function looks OK, but the initInternalState function has to put one more 8-byte item on the stack before calling initGBCpu: it pushes six 8-byte registers, which leaves the stack pointer 8 bytes off a 16-byte boundary at the call.
For example:
initInternalState:
push %rdi
push %rsi
mov %rcx, %rdi
mov %rdx, %rsi
push %R12
push %R13
push %R14
push %R15
sub $8, %rsp // adjust stack alignment
call initGBCpu
add $8, %rsp // undo the stack pointer movement
pop %R15
pop %R14
pop %R13
pop %R12
pop %rsi
pop %rdi
ret

GDB can't create a breakpoint [duplicate]

I am working on implementing a simple stack overflow, which I am examining with gdb. A problem I keep running into is gdb not accepting my breakpoints. My C code is quite simple:
void function(int a, int b, int c) {
...//stuff
}
void main() {
int x;
x = 0;
function(1,2,3);
x = 1;
printf("%d\n",x);
}
And I'm using gcc -m32 -fno-stack-protector -o example3test example3test.c to compile it.
I have tried just setting a simple breakpoint on the line <+42> just to test if it works.
(gdb) disass main
Dump of assembler code for function main:
0x000005d1 <+0>: lea 0x4(%esp),%ecx
0x000005d5 <+4>: and $0xfffffff0,%esp
0x000005d8 <+7>: pushl -0x4(%ecx)
0x000005db <+10>: push %ebp
0x000005dc <+11>: mov %esp,%ebp
0x000005de <+13>: push %ebx
0x000005df <+14>: push %ecx
0x000005e0 <+15>: sub $0x10,%esp
0x000005e3 <+18>: call 0x470 <__x86.get_pc_thunk.bx>
0x000005e8 <+23>: add $0x1a18,%ebx
0x000005ee <+29>: movl $0x0,-0xc(%ebp)
0x000005f5 <+36>: push $0x3
0x000005f7 <+38>: push $0x2
0x000005f9 <+40>: push $0x1
0x000005fb <+42>: call 0x5a0 <function>
0x00000600 <+47>: add $0xc,%esp
0x00000603 <+50>: movl $0x1,-0xc(%ebp)
0x0000060a <+57>: sub $0x8,%esp
0x0000060d <+60>: pushl -0xc(%ebp)
0x00000610 <+63>: lea -0x1950(%ebx),%eax
0x00000616 <+69>: push %eax
0x00000617 <+70>: call 0x400 <printf@plt>
0x0000061c <+75>: add $0x10,%esp
0x0000061f <+78>: nop
0x00000620 <+79>: lea -0x8(%ebp),%esp
0x00000623 <+82>: pop %ecx
0x00000624 <+83>: pop %ebx
0x00000625 <+84>: pop %ebp
0x00000626 <+85>: lea -0x4(%ecx),%esp
0x00000629 <+88>: ret
End of assembler dump.
(gdb) break *0x000005fb
Breakpoint 1 at 0x5fb
(gdb) run
Starting program: /home/jasmine/tutorials/smashingTheStackForFun/example3test
Warning:
Cannot insert breakpoint 1.
Cannot access memory at address 0x5fb
I'm lost as to why it won't accept this breakpoint. Most of the answers already on here involve not using the * or using the wrong notation. From what I can see mine looks right, but I could be wrong.
I'm lost as to why it won't accept this breakpoint.
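You have a position-independent executable, which is relocated to a different address at runtime. The addresses that disass printed before the program was started (such as 0x5fb) are offsets within the file, not the addresses the code occupies once it is loaded, so gdb cannot insert a breakpoint there.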
This will work:
(gdb) start
# GDB stops at main
(gdb) break *&main+42
(gdb) continue
See also this answer.

Cause of EXC_BREAKPOINT crash

I've got a crash occurring on some users' computers in a C++ audiounit component running inside Logic X. Unfortunately I can't reproduce it locally, and in the process of trying to work out how it might occur I've got some questions.
Here's the relevant info from the crash dump:
Exception Type: EXC_BREAKPOINT (SIGTRAP)
Exception Codes: 0x0000000000000001, 0x0000000000000000
Exception Note: EXC_CORPSE_NOTIFY
Termination Signal: Trace/BPT trap: 5
Termination Reason: Namespace SIGNAL, Code 0x5
Terminating Process: exc handler [0]
The questions are:
What might cause an EXC_BREAKPOINT in the situation I'm looking at? Is this information from Apple complete and accurate: "Similar to an Abnormal Exit, this exception is intended to give an attached debugger the chance to interrupt the process at a specific point in its execution. You can trigger this exception from your own code using the __builtin_trap() function. If no debugger is attached, the process is terminated and a crash report is generated."
Why would it occur at SharedObject + 200 (see disassembly)?
Is RBX the 'this' pointer at the moment the crash occurs?
The crash occurs here:
juce::ValueTree::SharedObject::SharedObject(juce::ValueTree::SharedObject const&) + 200
The C++ is as follows:
SharedObject (const SharedObject& other)
: ReferenceCountedObject(),
type (other.type), properties (other.properties), parent (nullptr)
{
for (int i = 0; i < other.children.size(); ++i)
{
SharedObject* const child = new SharedObject (*other.children.getObjectPointerUnchecked(i));
child->parent = this;
children.add (child);
}
}
The disassembly:
-> 0x127167950 <+0>: pushq %rbp
0x127167951 <+1>: movq %rsp, %rbp
0x127167954 <+4>: pushq %r15
0x127167956 <+6>: pushq %r14
0x127167958 <+8>: pushq %r13
0x12716795a <+10>: pushq %r12
0x12716795c <+12>: pushq %rbx
0x12716795d <+13>: subq $0x18, %rsp
0x127167961 <+17>: movq %rsi, %r12
0x127167964 <+20>: movq %rdi, %rbx
0x127167967 <+23>: leaq 0x589692(%rip), %rax ; vtable for juce::ReferenceCountedObject + 16
0x12716796e <+30>: movq %rax, (%rbx)
0x127167971 <+33>: movl $0x0, 0x8(%rbx)
0x127167978 <+40>: leaq 0x599fe9(%rip), %rax ; vtable for juce::ValueTree::SharedObject + 16
0x12716797f <+47>: movq %rax, (%rbx)
0x127167982 <+50>: leaq 0x10(%rbx), %rdi
0x127167986 <+54>: movq %rdi, -0x30(%rbp)
0x12716798a <+58>: leaq 0x10(%r12), %rsi
0x12716798f <+63>: callq 0x12711cf70 ; juce::Identifier::Identifier(juce::Identifier const&)
0x127167994 <+68>: leaq 0x18(%rbx), %rdi
0x127167998 <+72>: movq %rdi, -0x38(%rbp)
0x12716799c <+76>: leaq 0x18(%r12), %rsi
0x1271679a1 <+81>: callq 0x12711c7b0 ; juce::NamedValueSet::NamedValueSet(juce::NamedValueSet const&)
0x1271679a6 <+86>: movq $0x0, 0x30(%rbx)
0x1271679ae <+94>: movl $0x0, 0x38(%rbx)
0x1271679b5 <+101>: movl $0x0, 0x40(%rbx)
0x1271679bc <+108>: movq $0x0, 0x48(%rbx)
0x1271679c4 <+116>: movl $0x0, 0x50(%rbx)
0x1271679cb <+123>: movl $0x0, 0x58(%rbx)
0x1271679d2 <+130>: movq $0x0, 0x60(%rbx)
0x1271679da <+138>: cmpl $0x0, 0x40(%r12)
0x1271679e0 <+144>: jle 0x127167aa2 ; <+338>
0x1271679e6 <+150>: xorl %r14d, %r14d
0x1271679e9 <+153>: nopl (%rax)
0x1271679f0 <+160>: movl $0x68, %edi
0x1271679f5 <+165>: callq 0x12728c232 ; symbol stub for: operator new(unsigned long)
0x1271679fa <+170>: movq %rax, %r13
0x1271679fd <+173>: movq 0x30(%r12), %rax
0x127167a02 <+178>: movq (%rax,%r14,8), %rsi
0x127167a06 <+182>: movq %r13, %rdi
0x127167a09 <+185>: callq 0x127167950 ; <+0>
0x127167a0e <+190>: movq %rbx, 0x60(%r13) // MY NOTES: child->parent = this
0x127167a12 <+194>: movl 0x38(%rbx), %ecx
0x127167a15 <+197>: movl 0x40(%rbx), %eax
0x127167a18 <+200>: cmpl %eax, %ecx
Update 1:
It looks like RIP is suggesting we are in the middle of the 'add' call which is this function, inlined:
/** Appends a new object to the end of the array.
This will increase the new object's reference count.
@param newObject the new object to add to the array
@see set, insert, addIfNotAlreadyThere, addSorted, addArray
*/
ObjectClass* add (ObjectClass* const newObject) noexcept
{
data.ensureAllocatedSize (numUsed + 1);
jassert (data.elements != nullptr);
data.elements [numUsed++] = newObject;
if (newObject != nullptr)
newObject->incReferenceCount();
return newObject;
}
Update 2:
At the point of crash register values of relevant registers:
this == rbx: 0x00007fe5bc37c950
&other == r12: 0x00007fe5bc348cc0
rax = 0
rcx = 0
There may be a few problems in this code:
As SM mentioned, other.children.getObjectPointerUnchecked(i) could return nullptr.
In ObjectClass* add (ObjectClass* const newObject) noexcept, you check that newObject isn't null before calling incReferenceCount (which means a null can reach this method), but there is no null check before the object is stored by data.elements [numUsed++] = newObject;. A nullptr may therefore end up in the array, and any code that later reads the array without checking could crash.
We don't really know the type of data.elements (I assume an array). If it is not an array of pointers, or if an operator overload is involved, the pointer assignment could trigger a copy operation, and that could crash here (but it's unlikely).
There is a circular reference between children and parent, so there may be a problem during object destruction.
I suspect that the problem is that a shared, ref-counted object that should have been placed on the heap was inadvertently allocated on the stack. If the stack unwinds and is overwritten afterwards, and a reference to the ref-counted object still exists, then random things will happen at unpredictable times when that reference is accessed. (The proximity in address-space of this and &other also makes me suspect that both are on the stack.)

Why is there a difference in the address of a function when using gdb break and gdb print?

When I execute the following commands I get different addresses for function()
(gdb) break function()
Breakpoint 1 at function() 0x804834a.
(gdb) print function()
Breakpoint 1 at function() 0x8048344.
Why is there a difference between the two addresses?
This output can't be correct as shown. Here is what it would look like with a function such as:
int func(void) {
int a = 10;
printf("%d\n", a);
return 1;
}
after loading it into gdb:
(gdb) p func
$1 = {int (void)} 0x4016b0 <func>
(gdb) b func
Breakpoint 1 at 0x4016b6: file file.c, line 4.
(gdb) disassemble func
Dump of assembler code for function func:
0x004016b0 <+0>: push %ebp
0x004016b1 <+1>: mov %esp,%ebp
0x004016b3 <+3>: sub $0x28,%esp
0x004016b6 <+6>: movl $0xa,-0xc(%ebp)
0x004016bd <+13>: mov -0xc(%ebp),%eax
0x004016c0 <+16>: mov %eax,0x4(%esp)
0x004016c4 <+20>: movl $0x405064,(%esp)
0x004016cb <+27>: call 0x403678 <printf>
0x004016d0 <+32>: mov $0x1,%eax
0x004016d5 <+37>: leave
0x004016d6 <+38>: ret
End of assembler dump.
(gdb)
Here func points to the very first instruction in the function, push %ebp, but when you set up a breakpoint, gdb places it after the stack frame initialization instructions:
0x004016b0 <+0>: push %ebp
0x004016b1 <+1>: mov %esp,%ebp
0x004016b3 <+3>: sub $0x28,%esp
at the point where the body of the function actually begins:
=> 0x004016b6 <+6>: movl $0xa,-0xc(%ebp)
0x004016bd <+13>: mov -0xc(%ebp),%eax
0x004016c0 <+16>: mov %eax,0x4(%esp)
0x004016c4 <+20>: movl $0x405064,(%esp)
0x004016cb <+27>: call 0x403678 <printf>
0x004016d0 <+32>: mov $0x1,%eax
0x004016d5 <+37>: leave
0x004016d6 <+38>: ret
Here, this instruction:
movl $0xa,-0xc(%ebp) ; 0xa = 10
corresponds to this line:
int a = 10;
gdb sets a breakpoint after the function prologue because, before the stack frame is properly set up, it could not show the expected state, such as local variables.
break therefore sets the breakpoint at, and prints the address of, the first instruction after the prologue, whereas print prints the address of the actual first instruction in the function.
You can set a breakpoint on the actual first instruction with break *0x8048344, then compare the values of local variables there and after the prologue.

OSX 64 bit C++ Disassembly line by line

I have been reading through the following series of articles: http://www.altdevblogaday.com/2011/11/09/a-low-level-curriculum-for-c-and-c
The disassembled code shown and the disassembled code I am managing to produce whilst running the same code vary quite significantly and I lack the understanding to explain the differences.
Is there anyone who can step through it line by line and perhaps explain what it's doing at each step? I get the feeling from the searching around I have done that the first few lines have something to do with frame pointers. There also seem to be a few extra lines in my disassembled code that ensure registers are empty before placing new values into them (absent from the code in the article).
I am running this on OSX (the original author is using Windows) using the g++ compiler from within Xcode 4. I am really clueless as to whether or not these variances are due to the OS, the architecture (32 bit vs 64 bit maybe?) or the compiler itself. It could even be the code I guess: mine is wrapped inside the main function declaration whereas the original code makes no mention of this.
My code:
int main(int argc, const char * argv[])
{
int x = 1;
int y = 2;
int z = 0;
z = x + y;
}
My disassembled code:
0x100000f40: pushq %rbp
0x100000f41: movq %rsp, %rbp
0x100000f44: movl $0, %eax
0x100000f49: movl %edi, -4(%rbp)
0x100000f4c: movq %rsi, -16(%rbp)
0x100000f50: movl $1, -20(%rbp)
0x100000f57: movl $2, -24(%rbp)
0x100000f5e: movl $0, -28(%rbp)
0x100000f65: movl -20(%rbp), %edi
0x100000f68: addl -24(%rbp), %edi
0x100000f6b: movl %edi, -28(%rbp)
0x100000f6e: popq %rbp
0x100000f6f: ret
The disassembled code from the original article:
mov dword ptr [ebp-8],1
mov dword ptr [ebp-14h],2
mov dword ptr [ebp-20h],0
mov eax, dword ptr [ebp-8]
add eax, dword ptr [ebp-14h]
mov dword ptr [ebp-20h],eax
A full line by line breakdown would be extremely enlightening but any help in understanding this would be appreciated.
All of the code from the original article is in your code, there's just some extra stuff around it. This:
0x100000f50: movl $1, -20(%rbp)
0x100000f57: movl $2, -24(%rbp)
0x100000f5e: movl $0, -28(%rbp)
0x100000f65: movl -20(%rbp), %edi
0x100000f68: addl -24(%rbp), %edi
0x100000f6b: movl %edi, -28(%rbp)
Corresponds directly to the 6 instructions talked about in the article.
There are two major differences between your disassembled code and the article's code.
One is that the article is using the Intel assembler syntax, while your disassembled code is using the traditional Unix/AT&T assembler syntax. Some differences between the two are documented on Wikipedia.
The other difference is that the article omits the function prologue, which sets up the stack frame, and the function epilogue, which destroys the stack frame and returns to the caller. The program he's disassembling has to contain instructions to do those things, but his disassembler isn't showing them. (Actually the stack frame could and probably would be omitted if the optimizer were enabled, but it's clearly not enabled.)
There are also some minor differences: your code is using a slightly different layout for local variables, and your code is computing the sum in a different register.
On the Mac, g++ doesn't support emitting Intel mnemonics, but clang does:
:; clang -S -mllvm --x86-asm-syntax=intel t.c
:; cat t.s
.section __TEXT,__text,regular,pure_instructions
.globl _main
.align 4, 0x90
_main: ## @main
.cfi_startproc
## BB#0:
push RBP
Ltmp2:
.cfi_def_cfa_offset 16
Ltmp3:
.cfi_offset rbp, -16
mov RBP, RSP
Ltmp4:
.cfi_def_cfa_register rbp
mov EAX, 0
mov DWORD PTR [RBP - 4], EDI
mov QWORD PTR [RBP - 16], RSI
mov DWORD PTR [RBP - 20], 1
mov DWORD PTR [RBP - 24], 2
mov DWORD PTR [RBP - 28], 0
mov EDI, DWORD PTR [RBP - 20]
add EDI, DWORD PTR [RBP - 24]
mov DWORD PTR [RBP - 28], EDI
pop RBP
ret
.cfi_endproc
.subsections_via_symbols
If you add the -g flag, the compiler will add debug information including source filenames and line numbers. It's too big to put here in its entirety, but this is the relevant part:
.loc 1 4 14 prologue_end ## t.c:4:14
Ltmp5:
mov DWORD PTR [RBP - 20], 1
.loc 1 5 14 ## t.c:5:14
mov DWORD PTR [RBP - 24], 2
.loc 1 6 14 ## t.c:6:14
mov DWORD PTR [RBP - 28], 0
.loc 1 8 5 ## t.c:8:5
mov EDI, DWORD PTR [RBP - 20]
add EDI, DWORD PTR [RBP - 24]
mov DWORD PTR [RBP - 28], EDI
First of all, the assembler listed as "from original article" is using Intel syntax, whereas the "disassembled output" in your post is AT&T syntax. This explains why the order of arguments to instructions is "back to front" [let's not argue about which is right or wrong, ok?], why register names are prefixed by a %, and why constants are prefixed by $. There is also a difference in how memory locations/offsets to registers are referenced: dword ptr [reg+offs] in Intel assembler translates to an l suffix on the instruction plus offs(%reg).
The move from 32-bit to 64-bit also renames some of the registers: %rbp is the 64-bit counterpart of ebp in the article's code.
The actual offsets (e.g. -20) are different partly because the registers are bigger in 64-bit, but also because you have argc and argv as function arguments, which are stored on the stack at the start of the function (movl %edi, -4(%rbp) and movq %rsi, -16(%rbp)). I have a feeling the original article is actually disassembling a different function than main.