Related
I have a C++ function which has many return statements at various places. How to set a breakpoint at the return statement where the function actually returns ?
And what does "break" command without argument means?
Contrary to answers so far, most compilers will create a single return assembly instruction, regardless of how many return statements are in the function (it is convenient for the compiler to do that, so there is only a single place to perform all the stack frame cleanup).
If you wanted to stop on that instruction, all you have to do is disas and look for retq (or whatever the return instruction for your processor is), and set a breakpoint on it. For example:
int foo(int x)
{
switch(x) {
case 1: return 2;
case 2: return 3;
default: return 42;
}
}
int main()
{
return foo(0);
}
(gdb) disas foo
Dump of assembler code for function foo:
0x0000000000400448 <+0>: push %rbp
0x0000000000400449 <+1>: mov %rsp,%rbp
0x000000000040044c <+4>: mov %edi,-0x4(%rbp)
0x000000000040044f <+7>: mov -0x4(%rbp),%eax
0x0000000000400452 <+10>: mov %eax,-0xc(%rbp)
0x0000000000400455 <+13>: cmpl $0x1,-0xc(%rbp)
0x0000000000400459 <+17>: je 0x400463 <foo+27>
0x000000000040045b <+19>: cmpl $0x2,-0xc(%rbp)
0x000000000040045f <+23>: je 0x40046c <foo+36>
0x0000000000400461 <+25>: jmp 0x400475 <foo+45>
0x0000000000400463 <+27>: movl $0x2,-0x8(%rbp)
0x000000000040046a <+34>: jmp 0x40047c <foo+52>
0x000000000040046c <+36>: movl $0x3,-0x8(%rbp)
0x0000000000400473 <+43>: jmp 0x40047c <foo+52>
0x0000000000400475 <+45>: movl $0x2a,-0x8(%rbp)
0x000000000040047c <+52>: mov -0x8(%rbp),%eax
0x000000000040047f <+55>: leaveq
0x0000000000400480 <+56>: retq
End of assembler dump.
(gdb) b *0x0000000000400480
Breakpoint 1 at 0x400480
(gdb) r
Breakpoint 1, 0x0000000000400480 in foo ()
(gdb) p $rax
$1 = 42
You can use reverse debugging to find out where function actually returns. Finish executing current frame, do reverse-step and then you should stop at just returned statement.
(gdb) record
(gdb) fin
(gdb) reverse-step
Break on all retq of current function
This Python command puts a breakpoint on every retq instruction of the current function:
class BreakReturn(gdb.Command):
def __init__(self):
super().__init__(
'break-return',
gdb.COMMAND_RUNNING,
gdb.COMPLETE_NONE,
False
)
def invoke(self, arg, from_tty):
frame = gdb.selected_frame()
# TODO make this work if there is no debugging information, where .block() fails.
block = frame.block()
# Find the function block in case we are in an inner block.
while block:
if block.function:
break
block = block.superblock
start = block.start
end = block.end
arch = frame.architecture()
pc = gdb.selected_frame().pc()
instructions = arch.disassemble(start, end - 1)
for instruction in instructions:
if instruction['asm'].startswith('retq '):
gdb.Breakpoint('*{}'.format(instruction['addr']))
BreakReturn()
Source it with:
source gdb.py
and use the command as:
break-return
continue
You should now be at retq.
Step until retq
Just for fun, another implementation that stops when a retq is found (less efficient of because no hardware support):
class ContinueReturn(gdb.Command):
def __init__(self):
super().__init__(
'continue-return',
gdb.COMMAND_RUNNING,
gdb.COMPLETE_NONE,
False
)
def invoke(self, arg, from_tty):
thread = gdb.inferiors()[0].threads()[0]
while thread.is_valid():
gdb.execute('ni', to_string=True)
frame = gdb.selected_frame()
arch = frame.architecture()
pc = gdb.selected_frame().pc()
instruction = arch.disassemble(pc)[0]['asm']
if instruction.startswith('retq '):
break
ContinueReturn()
This will ignore your other breakpoints. TODO: can be avoided?
Not sure if it is faster or slower than reverse-step.
A version that stops at a given opcode can be found at: https://stackoverflow.com/a/31249378/895245
break without arguments stops execution at the next instruction in the currently selected stack frame. You select strack frames via the frame or up and down commands. If you want to debug the point where you are actually leaving the current function, select the next outer frame and break there.
rr reverse debugging
Similar to GDB record mentioned at https://stackoverflow.com/a/3649698/895245 , but much more functional as of GDB 7.11 vs rr 4.1.0 in Ubuntu 16.04.
Notably, it deals with AVX correctly:
gdb reverse debugging fails with "Process record does not support instruction 0xf0d at address"
"target record-full" in gdb makes "n" command fail on printf with "Process record does not support instruction 0xc5 at address 0x7ffff7dee6e7"?
which prevents it from working with the default standard library calls.
Install Ubuntu 16.04:
sudo apt-get install rr linux-tools-common linux-tools-generic linux-cloud-tools-generic
sudo cpupower frequency-set -g performance
But also consider compiling from source to get the latest updates, it was not hard.
Test program:
int where_return(int i) {
if (i)
return 1;
else
return 0;
}
int main(void) {
where_return(0);
where_return(1);
}
compile and run:
gcc -O0 -ggdb3 -o reverse.out -std=c89 -Wextra reverse.c
rr record ./reverse.out
rr replay
Now you are left inside a GDB session, and you can properly reverse debug:
(rr) break main
Breakpoint 1 at 0x56057c458619: file a.c, line 9.
(rr) continue
Continuing.
Breakpoint 1, main () at a.c:9
9 where_return(0);
(rr) step
where_return (i=0) at a.c:2
2 if (i)
(rr) finish
Run till exit from #0 where_return (i=0) at a.c:2
main () at a.c:10
10 where_return(1);
Value returned is $1 = 0
(rr) reverse-step
where_return (i=0) at a.c:6
6 }
(rr) reverse-step
5 return 0;
We are now on the correct return line.
If you can change the source code, you might use some dirty trick with the preprocessor:
void on_return() {
}
#define return return on_return(), /* If the function has a return value != void */
#define return return on_return() /* If the function has a return value == void */
/* <<<-- Insert your function here -->>> */
#undef return
Then set a breakpoint to on_return and go one frame up.
Attention: This will not work, if a function does not return via a return statement. So ensure, that it's last line is a return.
Example (shamelessly copied from C code, but will work also in C++):
#include <stdio.h>
/* Dummy function to place the breakpoint */
void on_return(void) {
}
#define return return on_return()
void myfun1(int a) {
if (a > 10) return;
printf("<10\n");
return;
}
#undef return
#define return return on_return(),
int myfun2(int a) {
if (a < 0) return -1;
if (a > 0) return 1;
return 0;
}
#undef return
int main(void)
{
myfun1(1);
myfun2(2);
}
The first macro will change
return;
to
return on_return();
Which is valid, since on_return also returns void.
The second macro will change
return -1;
to
return on_return(), -1;
Which will call on_return() and then return -1 (thanks to the ,-operator).
This is a very dirty trick, but despite using backwards-stepping, it will work in multi-threaded environments and inlined functions, too.
Break without argument sets a breakpoint at the current line.
There is no way for a single breakpoint to catch all return paths. Either set a breakpoint at the caller immediately after it returns, or break at all return statements.
Since this is C++, I suppose you could create a local sentry object, and break on its destructor, though.
I have the following code:
#include <windows.h>
#include <minidumpapiset.h>
#include <strsafe.h>
#include <fileapi.h>
#include <iostream>
#include <signal.h>
#include <minwinbase.h>
#include <new.h>
#include "StackWalker.h"
int minidumpId = 0;
#ifndef _AddressOfReturnAddress
// Taken from: http://msdn.microsoft.com/en-us/library/s975zw7k(VS.71).aspx
#ifdef __cplusplus
#define EXTERNC extern "C"
#else
#define EXTERNC
#endif
// _ReturnAddress and _AddressOfReturnAddress should be prototyped before use
EXTERNC void* _AddressOfReturnAddress(void);
EXTERNC void* _ReturnAddress(void);
EXTERNC int __cdecl _purecall();
#endif
EXCEPTION_POINTERS ExceptionPointers;
EXCEPTION_RECORD ExceptionRecord;
CONTEXT ContextRecord;
void GetExceptionPointers(DWORD exceptionCode, EXCEPTION_POINTERS** exceptionPointers)
{
// The following code was taken from VC++ 8.0 CRT (invarg.c: line 104)
ZeroMemory(&ExceptionPointers, sizeof(EXCEPTION_POINTERS));
ZeroMemory(&ExceptionRecord, sizeof(EXCEPTION_RECORD));
ZeroMemory(&ContextRecord, sizeof(CONTEXT));
// Looks like a workaround for some bug in RtlCaptureContext. But no description.
#ifdef _X86_
__asm {
mov dword ptr[ContextRecord.Eax], eax
mov dword ptr[ContextRecord.Ecx], ecx
mov dword ptr[ContextRecord.Edx], edx
mov dword ptr[ContextRecord.Ebx], ebx
mov dword ptr[ContextRecord.Esi], esi
mov dword ptr[ContextRecord.Edi], edi
mov word ptr[ContextRecord.SegSs], ss
mov word ptr[ContextRecord.SegCs], cs
mov word ptr[ContextRecord.SegDs], ds
mov word ptr[ContextRecord.SegEs], es
mov word ptr[ContextRecord.SegFs], fs
mov word ptr[ContextRecord.SegGs], gs
pushfd
pop[ContextRecord.EFlags]
}
ContextRecord.ContextFlags = CONTEXT_CONTROL;
#pragma warning(push)
#pragma warning(disable : 4311)
ContextRecord.Eip = (ULONG)_ReturnAddress();
ContextRecord.Esp = (ULONG)_AddressOfReturnAddress();
#pragma warning(pop)
ContextRecord.Ebp = *(static_cast<ULONG*>(_AddressOfReturnAddress()) - 1);
#elif defined(_IA64_) || defined(_AMD64_) || defined(_ARM_) || defined(_ARM64_)
CaptureContext(&ContextRecord);
#else /* defined (_IA64_) || defined (_AMD64_) || defined(_ARM_) || defined(_ARM64_) */
ZeroMemory(&ContextRecord, sizeof(ContextRecord));
#endif /* defined (_IA64_) || defined (_AMD64_) || defined(_ARM_) || defined(_ARM64_) */
ExceptionRecord.ExceptionCode = exceptionCode;
ExceptionRecord.ExceptionAddress = _ReturnAddress();
ExceptionRecord.ExceptionFlags = EXCEPTION_NONCONTINUABLE;
*exceptionPointers = &ExceptionPointers;
(*exceptionPointers)->ExceptionRecord = &ExceptionRecord;
(*exceptionPointers)->ContextRecord = &ContextRecord;
}
class DbgLibrary final
{
public:
DbgLibrary()
{
dbgLibrary = LoadLibraryW(L"dbghelp.dll");
}
~DbgLibrary()
{
FreeLibrary(dbgLibrary);
}
explicit operator bool() const
{
return dbgLibrary != NULL;
}
bool WriteMinidump(HANDLE file, EXCEPTION_POINTERS* exceptionPointers) const
{
MINIDUMP_EXCEPTION_INFORMATION exceptionInformation;
exceptionInformation.ThreadId = GetCurrentThreadId();
exceptionInformation.ExceptionPointers = exceptionPointers;
exceptionInformation.ClientPointers = FALSE;
MINIDUMP_CALLBACK_INFORMATION callbackInformation;
callbackInformation.CallbackRoutine = NULL;
callbackInformation.CallbackParam = NULL;
typedef BOOL(WINAPI* LPMINIDUMPWRITEDUMP)(HANDLE processHandle, DWORD ProcessId, HANDLE fileHandle,
MINIDUMP_TYPE DumpType, CONST PMINIDUMP_EXCEPTION_INFORMATION ExceptionParam,
CONST PMINIDUMP_USER_STREAM_INFORMATION UserEncoderParam,
CONST PMINIDUMP_CALLBACK_INFORMATION CallbackParam);
LPMINIDUMPWRITEDUMP pfnMiniDumpWriteDump =
(LPMINIDUMPWRITEDUMP)GetProcAddress(dbgLibrary, "MiniDumpWriteDump");
if (NULL == pfnMiniDumpWriteDump)
{
return false;
}
BOOL isWriteSucceed = pfnMiniDumpWriteDump(GetCurrentProcess(), GetCurrentProcessId(), file, MiniDumpNormal,
&exceptionInformation, NULL, &callbackInformation);
return isWriteSucceed;
}
private:
HMODULE dbgLibrary;
};
inline HANDLE CreateNativeFile(const wchar_t* filePath)
{
HANDLE file = NULL;
file = CreateFileW(filePath, GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
return file;
}
void CreateMiniDump(PEXCEPTION_POINTERS exceptionPointers)
{
const DbgLibrary dbgLibrary;
if (dbgLibrary)
{
wchar_t FILE_PATH[4096];
// Write `exceptionPointers` to the minidump file
StringCbPrintfW(FILE_PATH, sizeof(FILE_PATH), L"%ls\\%ls_%ld.dmp", ".",
L"minidump", minidumpId++);
HANDLE hMinidump = CreateNativeFile(FILE_PATH);
if (hMinidump != INVALID_HANDLE_VALUE)
{
dbgLibrary.WriteMinidump(hMinidump, exceptionPointers);
CloseHandle(hMinidump);
}
}
}
LONG WINAPI SehHandler(PEXCEPTION_POINTERS exceptionPointers)
{
std::cerr << "SehHandler\n";
CreateMiniDump(exceptionPointers);
return EXCEPTION_EXECUTE_HANDLER;
}
void SigsegvHandler(int)
{
std::cerr << "SigsegvHandler\n";
PEXCEPTION_POINTERS exceptionPointers = static_cast<PEXCEPTION_POINTERS>(_pxcptinfoptrs);
// Write minidump file
CreateMiniDump(exceptionPointers);
}
int __cdecl NewHandler(size_t size)
{
std::cerr << "NewHandler\n";
// 'new' operator memory allocation exception
PEXCEPTION_POINTERS exceptionPointers;
GetExceptionPointers(STATUS_NO_MEMORY, &exceptionPointers);
CreateMiniDump(exceptionPointers);
return 0;
}
struct A5 {
void F()
{
while (true)
{
int* a = new int[50000000];
}
}
};
struct A4 {
A5 a;
void F()
{
a.F();
}
};
struct A3 {
A4 a;
void F()
{
a.F();
}
};
struct A2 {
A3 a;
void F()
{
a.F();
}
};
struct A1 {
A2 a;
void F()
{
a.F();
}
};
int main()
{
SetUnhandledExceptionFilter(SehHandler);
signal(SIGSEGV, SigsegvHandler);
_set_new_handler(NewHandler);
A1().F();
return 0;
}
Here two handlers would be invoked: NewHandler and SehHandler. The first one because of bad_alloc in operator new[], the second one because of unhandled exception. In both handlers I create minidump with information about crash.
NewHandler:
Thread 0 (crashed)
0 StackWalker_VC2017.exe!_callnewh [new_handler.cpp : 79 + 0x2]
eip = 0x0040a636 esp = 0x0019fefc ebp = 0x0019ff08 ebx = 0x00311000
esi = 0x00401d10 edi = 0x00655368 eax = 0x0042eed0 ecx = 0x00000000
edx = 0x00655368 efl = 0x00000202
Found by: given as instruction pointer in context
1 StackWalker_VC2017.exe!operator new(unsigned int) [new_scalar.cpp : 40 + 0x8]
eip = 0x00404a05 esp = 0x0019ff10 ebp = 0x0019ff14
Found by: call frame info
2 StackWalker_VC2017.exe!A5::F() [main.cpp : 197 + 0xa]
eip = 0x00401d0a esp = 0x0019ff1c ebp = 0x0019ff28
Found by: call frame info
3 StackWalker_VC2017.exe!main [main.cpp : 239 + 0x8]
eip = 0x00402500 esp = 0x0019ff24 ebp = 0x0019ff28
Found by: call frame info
4 StackWalker_VC2017.exe!static int __scrt_common_main_seh() [exe_common.inl : 288 + 0x1c]
eip = 0x00404c5d esp = 0x0019ff30 ebp = 0x0019ff70
Found by: call frame info
5 kernel32.dll + 0x1fa29
eip = 0x7712fa29 esp = 0x0019ff78 ebp = 0x0019ff80
Found by: call frame info
6 ntdll.dll + 0x67a9e
eip = 0x77c97a9e esp = 0x0019ff88 ebp = 0x0019ffdc
Found by: previous frame's frame pointer
7 ntdll.dll + 0x67a6e
eip = 0x77c97a6e esp = 0x0019ffe4 ebp = 0x0019ffec
Found by: previous frame's frame pointer
SehHandler:
Thread 0 (crashed)
0 KERNELBASE.dll + 0x12b812
eip = 0x76ddb812 esp = 0x0019fe68 ebp = 0x0019fec4 ebx = 0x19930520
esi = 0x00645a90 edi = 0x0042c754 eax = 0x0019fe68 ecx = 0x00000003
edx = 0x00000000 efl = 0x00000212
Found by: given as instruction pointer in context
1 StackWalker_VC2017.exe!_CxxThrowException [throw.cpp : 74 + 0x19]
eip = 0x00405a98 esp = 0x0019fecc ebp = 0x0019fef4
Found by: previous frame's frame pointer
2 StackWalker_VC2017.exe!__scrt_throw_std_bad_alloc() [throw_bad_alloc.cpp : 35 + 0x16]
eip = 0x0040509c esp = 0x0019fefc ebp = 0x0019ff10
Found by: call frame info
3 StackWalker_VC2017.exe!main [main.cpp : 239 + 0x8]
eip = 0x00402500 esp = 0x0019ff24 ebp = 0x0019ff14
Found by: call frame info with scanning
Extracted stacks using breakpad minidump_stackwalk:
The question is why SehHandler stacktrace does not have all function calls?
The main problem is that in project I use crash handlers for logging information in dumps. But creating minidump on each NewHandler call is not inappropriate solution, because sometimes bad_alloc could be fixed and exception thrown in try/catch block, that means that it is expected behaviour. So I want to handle bad_alloc in unhandled exception handler, so that it would definitely be crash. Also problem occurs only in release builds.
As mentioned in https://developercommunity.visualstudio.com/t/stdbad-alloc-failures-are-undebuggable/542559?viewtype=solutions it is bug in msvc. Unfortunately there is no good solution for release builds.
I've made this sample code:
#include <vector>
struct POD {
int a;
int b;
int c;
inline static POD make_pod_with_default()
{
POD p{ 41, 51, 61 };
return p;
}
inline void change_pod_a(POD &p, int a) {
p.a = a;
}
inline void change_pod_b(POD &p, int b) {
p.b = b;
}
static POD make_pod_with_a(int a) {
POD p = make_pod_with_default();
p.change_pod_a(p, a);
return p;
}
static POD make_pod_with_b(int a) {
POD p = make_pod_with_default();
p.change_pod_b(p, a);
return p;
}
};
int main()
{
std::vector<POD> vec{};
vec.reserve(2);
vec.push_back(POD::make_pod_with_a(71));
vec.push_back(POD::make_pod_with_b(81));
return vec[0].a + vec[0].b + vec[0].c + vec[1].a + vec[1].b + vec[1].c;
}
In the compiled assembly code we can see the following instructions are being generated for the first vec.push_back(...) call:
...
mov DWORD PTR $T2[esp+32], 41 ; 00000029H
...
mov DWORD PTR $T2[esp+36], 51 ; 00000033H
...
mov DWORD PTR $T5[esp+32], 71 ; 00000047H
...
mov DWORD PTR $T6[esp+44], 61 ; 0000003dH
...
There's a mov to [esp+32] for the 71, but the mov to [esp+32] for the 41 is still there, being useless! How can I write code for MSVC that will enable this kind of optimization, is MSVC even capable of it?
Both GCC and CLANG give more optimized versions, but CLANG defeats by a large margin with literally no overhead, in a very clean and logical fashion:
CLANG generated code:
main: # #main
push rax
mov edi, 24
call operator new(unsigned long)
mov rdi, rax
call operator delete(void*)
mov eax, 366
pop rcx
ret
Everything is done at compile time as 71 + 51 + 61 + 41 + 81 + 61 = 366!
I must admit its painful to see my program being computed at compile time and still throw in that call to vec.reserve() in the assembly... but CLANG still takes the cake, by far! Come on MSVC, this is not a vector of volatile.
If you turn your methods constexpr, you might do:
constexpr POD step_one()
{
POD p{2, 5, 11};
p.b = 3;
return p;
}
constexpr void step_two(POD &p)
{
p.c = 5;
}
constexpr POD make_pod(){
POD p = step_one();
step_two(p);
return p;
}
POD make_pod_final()
{
constexpr POD res = make_pod();
return res;
}
resulting to:
make_pod_final PROC
mov eax, DWORD PTR $T1[esp-4]
mov DWORD PTR [eax], 2
mov DWORD PTR [eax+4], 3
mov DWORD PTR [eax+8], 5
ret 0
Demo
When I compile following code with gcc 6 -O3 -std=c++14, I get nice and empty main:
Dump of assembler code for function main():
0x00000000004003e0 <+0>: xor %eax,%eax
0x00000000004003e2 <+2>: retq
But uncommenting last line in main "breaks" optimization:
Dump of assembler code for function main():
0x00000000004005f0 <+0>: sub $0x78,%rsp
0x00000000004005f4 <+4>: lea 0x40(%rsp),%rdi
0x00000000004005f9 <+9>: movq $0x400838,0x10(%rsp)
0x0000000000400602 <+18>: movb $0x0,0x18(%rsp)
0x0000000000400607 <+23>: mov %fs:0x28,%rax
0x0000000000400610 <+32>: mov %rax,0x68(%rsp)
0x0000000000400615 <+37>: xor %eax,%eax
0x0000000000400617 <+39>: movl $0x0,(%rsp)
0x000000000040061e <+46>: movq $0x400838,0x30(%rsp)
0x0000000000400627 <+55>: movb $0x0,0x38(%rsp)
0x000000000040062c <+60>: movl $0x0,0x20(%rsp)
0x0000000000400634 <+68>: movq $0x400838,0x50(%rsp)
0x000000000040063d <+77>: movb $0x0,0x58(%rsp)
0x0000000000400642 <+82>: movl $0x0,0x40(%rsp)
0x000000000040064a <+90>: callq 0x400790 <ErasedObject::~ErasedObject()>
0x000000000040064f <+95>: lea 0x20(%rsp),%rdi
0x0000000000400654 <+100>: callq 0x400790 <ErasedObject::~ErasedObject()>
0x0000000000400659 <+105>: mov %rsp,%rdi
0x000000000040065c <+108>: callq 0x400790 <ErasedObject::~ErasedObject()>
0x0000000000400661 <+113>: mov 0x68(%rsp),%rdx
0x0000000000400666 <+118>: xor %fs:0x28,%rdx
0x000000000040066f <+127>: jne 0x400678 <main()+136>
0x0000000000400671 <+129>: xor %eax,%eax
0x0000000000400673 <+131>: add $0x78,%rsp
0x0000000000400677 <+135>: retq
0x0000000000400678 <+136>: callq 0x4005c0 <__stack_chk_fail#plt>
Code
#include <type_traits>
#include <new>
namespace
{
struct ErasedTypeVTable
{
using destructor_t = void (*)(void *obj);
destructor_t dtor;
};
template <typename T>
void dtor(void *obj)
{
return static_cast<T *>(obj)->~T();
}
template <typename T>
static const ErasedTypeVTable erasedTypeVTable = {
&dtor<T>
};
}
struct ErasedObject
{
std::aligned_storage<sizeof(void *)>::type storage;
const ErasedTypeVTable& vtbl;
bool flag = false;
template <typename T, typename S = typename std::decay<T>::type>
ErasedObject(T&& obj)
: vtbl(erasedTypeVTable<S>)
{
static_assert(sizeof(T) <= sizeof(storage) && alignof(T) <= alignof(decltype(storage)), "");
new (object()) S(std::forward<T>(obj));
}
ErasedObject(ErasedObject&& other) = default;
~ErasedObject()
{
if (flag)
{
::operator delete(object());
}
else
{
vtbl.dtor(object());
}
}
void *object()
{
return reinterpret_cast<char *>(&storage);
}
};
struct myType
{
int a;
};
int main()
{
ErasedObject c1(myType{});
ErasedObject c2(myType{});
//ErasedObject c3(myType{});
}
clang can optimize-out both versions.
Any ideas what's going on? Am I hitting some optimization limit? If so, is it configurable?
I ran g++ with -fdump-ipa-inline to get more information about why functions are or are not inlined.
For the testcase with main() function and three objects created I got:
(...)
150 Deciding on inlining of small functions. Starting with size 35.
151 Enqueueing calls in void {anonymous}::dtor(void*) [with T = myType]/40.
152 Enqueueing calls in int main()/35.
153 not inlinable: int main()/35 -> ErasedObject::~ErasedObject()/33, call is unlikely and code size would grow
154 not inlinable: int main()/35 -> ErasedObject::~ErasedObject()/33, call is unlikely and code size would grow
155 not inlinable: int main()/35 -> ErasedObject::~ErasedObject()/33, call is unlikely and code size would grow
(...)
This error code is set in gcc/gcc/ipa-inline.c:
else if (!e->maybe_hot_p ()
&& (growth >= MAX_INLINE_INSNS_SINGLE
|| growth_likely_positive (callee, growth)))
{
e->inline_failed = CIF_UNLIKELY_CALL;
want_inline = false;
}
Then I discovered, that the smallest change to make g++ inline these functions is to add a declaration:
int main() __attribute__((hot));
I wasn't able to find in code why int main() isn't considered hot, but probably this should be left for another question.
More interesting is the the second part of the conditional I pasted above. The intent was to not inline when the code will grow and you produced an example when the code shrinks after complete inlining.
I think this deserves to be reported on GCC's bugzilla, but I'm not sure if you can call it a bug - estimation of inline impact is a heuristic and as such it is expected to work correctly in most cases, not all of them.
I have a C++ function which has many return statements at various places. How to set a breakpoint at the return statement where the function actually returns ?
And what does "break" command without argument means?
Contrary to answers so far, most compilers will create a single return assembly instruction, regardless of how many return statements are in the function (it is convenient for the compiler to do that, so there is only a single place to perform all the stack frame cleanup).
If you wanted to stop on that instruction, all you have to do is disas and look for retq (or whatever the return instruction for your processor is), and set a breakpoint on it. For example:
int foo(int x)
{
switch(x) {
case 1: return 2;
case 2: return 3;
default: return 42;
}
}
int main()
{
return foo(0);
}
(gdb) disas foo
Dump of assembler code for function foo:
0x0000000000400448 <+0>: push %rbp
0x0000000000400449 <+1>: mov %rsp,%rbp
0x000000000040044c <+4>: mov %edi,-0x4(%rbp)
0x000000000040044f <+7>: mov -0x4(%rbp),%eax
0x0000000000400452 <+10>: mov %eax,-0xc(%rbp)
0x0000000000400455 <+13>: cmpl $0x1,-0xc(%rbp)
0x0000000000400459 <+17>: je 0x400463 <foo+27>
0x000000000040045b <+19>: cmpl $0x2,-0xc(%rbp)
0x000000000040045f <+23>: je 0x40046c <foo+36>
0x0000000000400461 <+25>: jmp 0x400475 <foo+45>
0x0000000000400463 <+27>: movl $0x2,-0x8(%rbp)
0x000000000040046a <+34>: jmp 0x40047c <foo+52>
0x000000000040046c <+36>: movl $0x3,-0x8(%rbp)
0x0000000000400473 <+43>: jmp 0x40047c <foo+52>
0x0000000000400475 <+45>: movl $0x2a,-0x8(%rbp)
0x000000000040047c <+52>: mov -0x8(%rbp),%eax
0x000000000040047f <+55>: leaveq
0x0000000000400480 <+56>: retq
End of assembler dump.
(gdb) b *0x0000000000400480
Breakpoint 1 at 0x400480
(gdb) r
Breakpoint 1, 0x0000000000400480 in foo ()
(gdb) p $rax
$1 = 42
You can use reverse debugging to find out where function actually returns. Finish executing current frame, do reverse-step and then you should stop at just returned statement.
(gdb) record
(gdb) fin
(gdb) reverse-step
Break on all retq of current function
This Python command puts a breakpoint on every retq instruction of the current function:
class BreakReturn(gdb.Command):
def __init__(self):
super().__init__(
'break-return',
gdb.COMMAND_RUNNING,
gdb.COMPLETE_NONE,
False
)
def invoke(self, arg, from_tty):
frame = gdb.selected_frame()
# TODO make this work if there is no debugging information, where .block() fails.
block = frame.block()
# Find the function block in case we are in an inner block.
while block:
if block.function:
break
block = block.superblock
start = block.start
end = block.end
arch = frame.architecture()
pc = gdb.selected_frame().pc()
instructions = arch.disassemble(start, end - 1)
for instruction in instructions:
if instruction['asm'].startswith('retq '):
gdb.Breakpoint('*{}'.format(instruction['addr']))
BreakReturn()
Source it with:
source gdb.py
and use the command as:
break-return
continue
You should now be at retq.
Step until retq
Just for fun, another implementation that stops when a retq is found (less efficient of because no hardware support):
class ContinueReturn(gdb.Command):
def __init__(self):
super().__init__(
'continue-return',
gdb.COMMAND_RUNNING,
gdb.COMPLETE_NONE,
False
)
def invoke(self, arg, from_tty):
thread = gdb.inferiors()[0].threads()[0]
while thread.is_valid():
gdb.execute('ni', to_string=True)
frame = gdb.selected_frame()
arch = frame.architecture()
pc = gdb.selected_frame().pc()
instruction = arch.disassemble(pc)[0]['asm']
if instruction.startswith('retq '):
break
ContinueReturn()
This will ignore your other breakpoints. TODO: can be avoided?
Not sure if it is faster or slower than reverse-step.
A version that stops at a given opcode can be found at: https://stackoverflow.com/a/31249378/895245
break without arguments stops execution at the next instruction in the currently selected stack frame. You select strack frames via the frame or up and down commands. If you want to debug the point where you are actually leaving the current function, select the next outer frame and break there.
rr reverse debugging
Similar to GDB record mentioned at https://stackoverflow.com/a/3649698/895245 , but much more functional as of GDB 7.11 vs rr 4.1.0 in Ubuntu 16.04.
Notably, it deals with AVX correctly:
gdb reverse debugging fails with "Process record does not support instruction 0xf0d at address"
"target record-full" in gdb makes "n" command fail on printf with "Process record does not support instruction 0xc5 at address 0x7ffff7dee6e7"?
which prevents it from working with the default standard library calls.
Install Ubuntu 16.04:
sudo apt-get install rr linux-tools-common linux-tools-generic linux-cloud-tools-generic
sudo cpupower frequency-set -g performance
But also consider compiling from source to get the latest updates, it was not hard.
Test program:
int where_return(int i) {
if (i)
return 1;
else
return 0;
}
int main(void) {
where_return(0);
where_return(1);
}
compile and run:
gcc -O0 -ggdb3 -o reverse.out -std=c89 -Wextra reverse.c
rr record ./reverse.out
rr replay
Now you are left inside a GDB session, and you can properly reverse debug:
(rr) break main
Breakpoint 1 at 0x56057c458619: file a.c, line 9.
(rr) continue
Continuing.
Breakpoint 1, main () at a.c:9
9 where_return(0);
(rr) step
where_return (i=0) at a.c:2
2 if (i)
(rr) finish
Run till exit from #0 where_return (i=0) at a.c:2
main () at a.c:10
10 where_return(1);
Value returned is $1 = 0
(rr) reverse-step
where_return (i=0) at a.c:6
6 }
(rr) reverse-step
5 return 0;
We are now on the correct return line.
If you can change the source code, you might use some dirty trick with the preprocessor:
void on_return() {
}
#define return return on_return(), /* If the function has a return value != void */
#define return return on_return() /* If the function has a return value == void */
/* <<<-- Insert your function here -->>> */
#undef return
Then set a breakpoint to on_return and go one frame up.
Attention: This will not work, if a function does not return via a return statement. So ensure, that it's last line is a return.
Example (shamelessly copied from C code, but will work also in C++):
#include <stdio.h>
/* Dummy function to place the breakpoint */
void on_return(void) {
}
#define return return on_return()
void myfun1(int a) {
if (a > 10) return;
printf("<10\n");
return;
}
#undef return
#define return return on_return(),
int myfun2(int a) {
if (a < 0) return -1;
if (a > 0) return 1;
return 0;
}
#undef return
int main(void)
{
myfun1(1);
myfun2(2);
}
The first macro will change
return;
to
return on_return();
Which is valid, since on_return also returns void.
The second macro will change
return -1;
to
return on_return(), -1;
Which will call on_return() and then return -1 (thanks to the ,-operator).
This is a very dirty trick, but despite using backwards-stepping, it will work in multi-threaded environments and inlined functions, too.
Break without argument sets a breakpoint at the current line.
There is no way for a single breakpoint to catch all return paths. Either set a breakpoint at the caller immediately after it returns, or break at all return statements.
Since this is C++, I suppose you could create a local sentry object, and break on its destructor, though.