C++ inline function definition position in relation to call

In C++, considering a single compilation unit, does the definition of a function have to come above the call(s) to it in order for it to be inlined, or is it enough that the definition appears somewhere in the compilation unit?
In other words, is there any difference between:
class A {
public:
void f();
};
class B {
A a;
public:
void g();
};
inline void A::f() {
printf("Func'ing A!\n");
}
void B::g() {
//...
a.f();
}
and
class A {
public:
void f();
};
class B {
A a;
public:
void g();
};
void B::g() {
//...
a.f();
}
inline void A::f() {
printf("Func'ing A!\n");
}
regarding A::f() being inlined inside B::g() ?
Thanks

I think this is a reasonable question. There are several cases in C++ where the order of text in the file does matter. Fortunately, this is not one of them - your two code samples are equivalent. As covered by Claudio, writing 'inline' in the source makes no difference either way.
Questions of the form "does this optimisation happen?" are usually compiler-dependent, so they are best answered by asking the compiler, for example:
# clang++ -c first.cpp -O3 -S -emit-llvm -o first.ll
; ModuleID = 'first.cpp'
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"
%class.B = type { %class.A }
%class.A = type { i8 }
@str = private unnamed_addr constant [12 x i8] c"Func'ing A!\00"
; Function Attrs: nounwind uwtable
define void @_ZN1B1gEv(%class.B* nocapture readnone %this) #0 align 2 {
%puts.i = tail call i32 @puts(i8* getelementptr inbounds ([12 x i8]* @str, i64 0, i64 0)) #1
ret void
}
; Function Attrs: nounwind
declare i32 @puts(i8* nocapture readonly) #1
attributes #0 = { nounwind uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { nounwind }
!llvm.ident = !{!0}
!0 = metadata !{metadata !"Debian clang version 3.5.0-10 (tags/RELEASE_350/final) (based on LLVM 3.5.0)"}
Or, if you prefer x86-64,
# clang++ -c first.cpp -O3 -S -o first.s
.text
.file "first.cpp"
.globl _ZN1B1gEv
.align 16, 0x90
.type _ZN1B1gEv,@function
_ZN1B1gEv: # @_ZN1B1gEv
.cfi_startproc
# BB#0:
movl $.Lstr, %edi
jmp puts # TAILCALL
.Ltmp0:
.size _ZN1B1gEv, .Ltmp0-_ZN1B1gEv
.cfi_endproc
.type .Lstr,@object # @str
.section .rodata.str1.1,"aMS",@progbits,1
.Lstr:
.asciz "Func'ing A!"
.size .Lstr, 12
.ident "Debian clang version 3.5.0-10 (tags/RELEASE_350/final) (based on LLVM 3.5.0)"
.section ".note.GNU-stack","",#progbits
Both snippets compile to exactly the same intermediate representation with clang 3.5 - most easily verified using a diff tool - so we can be confident that relative position in the source made no difference.
This is actually the case without optimisation as well (using -O0), at least for the compiler I'm using.

inline is a hint to the compiler, which can choose whether or not to actually inline the function.
You may want to have a look at this answer.
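To make the distinction concrete, here is a minimal sketch (using GCC/Clang-specific attributes, which are extensions rather than standard C++) of how the inline hint compares to actually forcing the decision:
// Sketch only: `inline` merely permits inlining; GCC and Clang provide
// attributes that override the optimizer's own heuristics.
struct A {
    inline void f() { }                          // hint: the compiler decides
    __attribute__((always_inline)) void g() { }  // inlined wherever possible
    __attribute__((noinline)) void h() { }       // never inlined
};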

Related

weak symbols and custom sections in inline assembly

I'm stuck with a problem which is illustrated by the following g++ code:
frob.hpp:
template<typename T> T frob(T x);
template<> inline int frob<int>(int x) {
asm("1: nop\n"
".pushsection \"extra\",\"a\"\n"
".quad 1b\n"
".popsection\n");
return x+1;
}
foo.cpp:
#include "frob.hpp"
extern int bar();
int foo() { return frob(17); }
int main() { return foo() + bar(); }
bar.cpp:
#include "frob.hpp"
int bar() { return frob(42); }
I'm doing these quirky custom-section things as a way to mimic the mechanism used here in the Linux kernel (but in a userland, C++ way).
My problem is that the instantiation of frob<int> is recognized as a weak symbol, which is fine, and one of the two instantiations is eventually elided by the linker, which is fine too. Except that the linker is not disturbed by the fact that the extra section has references to that symbol (via .quad 1b), and it still wants to resolve them locally. I get:
localhost /tmp $ g++ -O3 foo.cpp bar.cpp
localhost /tmp $ g++ -O0 foo.cpp bar.cpp
`.text._Z4frobIiET_S0_' referenced in section `extra' of /tmp/ccr5s7Zg.o: defined in discarded section `.text._Z4frobIiET_S0_[_Z4frobIiET_S0_]' of /tmp/ccr5s7Zg.o
collect2: error: ld returned 1 exit status
(-O3 is fine because no symbol is emitted at all.)
I don't know how to work around this.
Would there be a way to tell the linker to also pay attention to symbol resolution in the extra section?
Perhaps one could trade the local labels for .weak global labels? E.g. like this:
asm(".weak exception_handler_%=\n"
"exception_handler_%=: nop\n"
".pushsection \"extra\",\"a\"\n"
".quad exception_handler_%=\n"
".popsection\n"::);
However, I fear that if I go this way, distinct asm statements in distinct compilation units may get the same symbol via this mechanism (may they?).
Is there a way around this that I've overlooked?
g++ (5 and 6, at least) compiles an inline function with external linkage - such as template<> inline int frob<int>(int x) - as a weak global symbol in a COMDAT function-section in its own section group. See:
g++ -S -O0 bar.cpp
bar.s
.file "bar.cpp"
.section .text._Z4frobIiET_S0_,"axG",@progbits,_Z4frobIiET_S0_,comdat
.weak _Z4frobIiET_S0_
.type _Z4frobIiET_S0_, @function
_Z4frobIiET_S0_:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl %edi, -4(%rbp)
#APP
# 8 "frob.hpp" 1
1: nop
.pushsection "extra","a"
.quad 1b
.popsection
# 0 "" 2
#NO_APP
movl -4(%rbp), %eax
addl $1, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
...
...
The relevant directives are:
.section .text._Z4frobIiET_S0_,"axG",@progbits,_Z4frobIiET_S0_,comdat
.weak _Z4frobIiET_S0_
(The compiler-generated #APP and #NO_APP delimit your inline assembly).
Do as the compiler does by making extra likewise a COMDAT section in
a section-group:
frob.hpp (fixed)
template<typename T> T frob(T x);
template<> inline int frob<int>(int x) {
asm("1: nop\n"
".pushsection \"extra\", \"axG\", #progbits,extra,comdat" "\n"
".quad 1b\n"
".popsection\n");
return x+1;
}
and the linkage error will be cured:
$ g++ -O0 foo.cpp bar.cpp
$ ./a.out; echo $?
61

LLVM IR main function returning void

I'm testing a main function that simply returns void and am getting core dump errors (signal 65 or 73) when running the bitcode with lli:
define void @main() {
entry:
ret void
}
Is it a limitation of lli or just plain illegal in LLVM?
I'm well aware that in C++ the declaration of a main function with a return type of void is incorrect. In fact I've tried this with Clang (it's just a warning to do so) and get almost the same code (not exactly the same because of the #0 attributes, but close enough that I believe the differences are not causing this problem):
; Function Attrs: nounwind
define void @main() #0 {
entry:
ret void
}
It doesn't crash for me, so the culprit must be something else:
$ echo "define void #main() {entry: ret void}" | lli -
$
In any case, lli supports void main methods, as you can see in ExecutionEngine::runFunctionAsMain().
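For context, here is a rough sketch of how a driver like lli hands a module's main to the execution engine; this assumes the LLVM 3.x-era C++ API (EngineBuilder taking a raw Module*), so treat the exact signatures as approximate rather than authoritative:
#include "llvm/ExecutionEngine/ExecutionEngine.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Module.h"
#include <string>
#include <vector>
// Sketch: run a module's main through the same entry point lli uses,
// ExecutionEngine::runFunctionAsMain(), which tolerates a void-returning main.
int runModuleMain(llvm::Module *M, const char *const *envp) {
    llvm::Function *MainFn = M->getFunction("main"); // may return void or i32
    if (!MainFn)
        return -1;
    llvm::ExecutionEngine *EE = llvm::EngineBuilder(M).create();
    if (!EE)
        return -1;
    std::vector<std::string> Args = {"program"}; // hypothetical argv[0]
    return EE->runFunctionAsMain(MainFn, Args, envp);
}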

Adding intrinsics using an LLVM pass

I've added an intrinsic to some input code using an LLVM pass. I'm able to see the intrinsic call, yet I can't figure out how to compile the code for my target architecture (x86_64). I'm running the following command:
clang++ $(llvm-config --ldflags --libs all) ff.s -o foo
But the linker complains about undefined references:
/tmp/ff-2ada42.o: In function `fact(unsigned int)':
/home/rubens/Desktop/ff.cpp:9: undefined reference to `llvm.x86.sse3.mwait.i32.i32'
/tmp/ff-2ada42.o: In function `fib(unsigned int)':
/home/rubens/Desktop/ff.cpp:16: undefined reference to `llvm.x86.sse3.mwait.i32.i32'
/home/rubens/Desktop/ff.cpp:16: undefined reference to `llvm.x86.sse3.mwait.i32.i32'
/home/rubens/Desktop/ff.cpp:16: undefined reference to `llvm.x86.sse3.mwait.i32.i32'
Despite using ldflags from llvm-config, the compilation does not proceed. Any ideas on what should be done for the code to compile properly?
To generate the assembly code, I've done the following:
# Generating optimized code
clang++ $(llvm-config --cxxflags) -emit-llvm -c ff.cpp -o ff.bc
opt ff.bc -load path/to/mypass.so -mypass > opt_ff.bc
# Generating assembly
llc opt_ff.bc -o ff.s
I'm currently using llvm version 3.4.2; clang version 3.4.2 (tags/RELEASE_34/dot2-final); gcc version 4.9.2 (GCC); and Linux 3.17.2-1-ARCH x86_64.
Edit: adding the IR with the intrinsic:
File ~/llvm/include/llvm/IR/IntrinsicsX86.td:
...
589 // Thread synchronization ops.
590 let TargetPrefix = "x86" in { // All intrinsics start with "llvm.x86.".
591 def int_x86_sse3_monitor : GCCBuiltin<"__builtin_ia32_monitor">,
592 Intrinsic<[], [llvm_ptr_ty,
593 llvm_i32_ty, llvm_i32_ty], []>;
594 def int_x86_sse3_mwait : GCCBuiltin<"__builtin_ia32_mwait">,
595 Intrinsic<[], [llvm_i32_ty,
596 llvm_i32_ty], []>;
597 }
...
And calls (from file ff.s):
...
.Ltmp2:
callq llvm.x86.sse3.mwait.i32.i32
movl $_ZStL8__ioinit, %edi
callq _ZNSt8ios_base4InitC1Ev
movl $_ZNSt8ios_base4InitD1Ev, %edi
movl $_ZStL8__ioinit, %esi
movl $__dso_handle, %edx
callq __cxa_atexit
popq %rax
ret
...
Edit 2: Here's how I'm adding the intrinsic during the opt pass:
Function *f(bb->getParent());
Module *m(f->getParent());
std::vector<Type *> types(2, Type::getInt32Ty(getGlobalContext()));
Function *mwait = Intrinsic::getDeclaration(m, Intrinsic::x86_sse3_mwait, types);
std::vector<Value *> args;
IRBuilder<> builder(&bb->front());
for (uint32_t i : {1, 2}) args.push_back(builder.getInt32(i));
ArrayRef<Value *> args_ref(args);
builder.CreateCall(mwait, args_ref);
EDIT:
I am currently writing an LLVM pass that is basically doing what you tried to do in this question. The problem with your code is the following:
std::vector<Type *> types(2, Type::getInt32Ty(getGlobalContext()));
Function *mwait = Intrinsic::getDeclaration(m, Intrinsic::x86_sse3_mwait, types);
You are trying to get the declaration of an intrinsic function with the name llvm.x86.sse3.mwait.i32.i32, and this intrinsic does not exist. However, llvm.x86.sse3.mwait exists, and therefore you have to write this:
Function *mwait = Intrinsic::getDeclaration(m, Intrinsic::x86_sse3_mwait);
Notice the missing type argument in the call. This is because llvm.x86.sse3.mwait has no overloads.
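For reference, the fragment from the question would then look roughly like this (a sketch reusing the same bb basic-block pointer as above, not a tested pass):
Function *f = bb->getParent();
Module *m = f->getParent();
// No type list here: llvm.x86.sse3.mwait is not overloaded, so passing one
// would request the non-existent llvm.x86.sse3.mwait.i32.i32.
Function *mwait = Intrinsic::getDeclaration(m, Intrinsic::x86_sse3_mwait);
IRBuilder<> builder(&bb->front());
std::vector<Value *> args = {builder.getInt32(1), builder.getInt32(2)};
builder.CreateCall(mwait, args);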
I hope you figured it out in the meantime.
OK, since I won't be able to answer you for a while, here is a wild-guess answer.
The problem is the way you add the intrinsic in your optimizer pass. It looks like you are just creating a function with the same name as the intrinsic, not the intrinsic itself.
Here is a little C++ program that just uses the Clang built-in to get the intrinsic into the IR (I use Clang 3.5, but this should not make a difference).
int main ()
{
__builtin_ia32_mwait(4,2);
}
Compiling it with clang -emit-llvm -S I get:
; ModuleID = 'intrin.cpp'
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"
; Function Attrs: nounwind uwtable
define i32 @main() #0 {
call void @llvm.x86.sse3.mwait(i32 4, i32 2)
ret i32 0
}
; Function Attrs: nounwind
declare void @llvm.x86.sse3.mwait(i32, i32) #1
attributes #0 = { nounwind uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { nounwind }
!llvm.ident = !{!0}
!0 = metadata !{metadata !"clang version 3.5.0 "}
Please note that the SSE3 intrinsic has no type overloads, unlike in your version.
Using llc on the generated file gives me:
.Ltmp2:
.cfi_def_cfa_register %rbp
movl $4, %ecx
movl $2, %eax
mwait
xorl %eax, %eax
popq %rbp
retq
Proper assembly was created.
So I assume the way you are introducing the intrinsic into the function is wrong in your opt pass.
Get the intrinsic function and call it:
vector<Type*> types;
types.push_back(IntegerType::get(/*LLVM context*/, 32));
types.push_back(IntegerType::get(/*LLVM context*/, 32));
Function* func = Intrinsic::getDeclaration(/* module */, Intrinsic::x86_sse3_mwait, types);
CallInst* call = CallInst::Create(func, /* arguments */);

Clang - Compiling a C header to LLVM IR/bitcode

Say I have the following trivial C header file:
// foo1.h
typedef int foo;
typedef struct {
foo a;
char const* b;
} bar;
bar baz(foo*, bar*, ...);
My goal is to take this file, and produce an LLVM module that looks something like this:
%struct.bar = type { i32, i8* }
declare { i32, i8* } @baz(i32*, %struct.bar*, ...)
In other words, convert a C .h file with declarations into the equivalent LLVM IR, including type resolution, macro expansion, and so on.
Passing this through Clang to generate LLVM IR produces an empty module (as none of the definitions are actually used):
$ clang -cc1 -S -emit-llvm foo1.h -o -
; ModuleID = 'foo1.h'
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-darwin13.3.0"
!llvm.ident = !{!0}
!0 = metadata !{metadata !"clang version 3.5 (trunk 200156) (llvm/trunk 200155)"}
My first instinct was to turn to Google, and I came across two related questions: one from a mailing list, and one from StackOverflow. Both suggested using the -femit-all-decls flag, so I tried that:
$ clang -cc1 -femit-all-decls -S -emit-llvm foo1.h -o -
; ModuleID = 'foo1.h'
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-darwin13.3.0"
!llvm.ident = !{!0}
!0 = metadata !{metadata !"clang version 3.5 (trunk 200156) (llvm/trunk 200155)"}
Same result.
I've also tried disabling optimizations (both with -O0 and -disable-llvm-optzns), but that made no difference for the output. Using the following variation did produce the desired IR:
// foo2.h
typedef int foo;
typedef struct {
foo a;
char const* b;
} bar;
bar baz(foo*, bar*, ...);
void doThings() {
foo a = 0;
bar myBar;
baz(&a, &myBar);
}
Then running:
$ clang -cc1 -S -emit-llvm foo2.h -o -
; ModuleID = 'foo2.h'
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-darwin13.3.0"
%struct.bar = type { i32, i8* }
; Function Attrs: nounwind
define void @doThings() #0 {
entry:
%a = alloca i32, align 4
%myBar = alloca %struct.bar, align 8
%coerce = alloca %struct.bar, align 8
store i32 0, i32* %a, align 4
%call = call { i32, i8* } (i32*, %struct.bar*, ...)* @baz(i32* %a, %struct.bar* %myBar)
%0 = bitcast %struct.bar* %coerce to { i32, i8* }*
%1 = getelementptr { i32, i8* }* %0, i32 0, i32 0
%2 = extractvalue { i32, i8* } %call, 0
store i32 %2, i32* %1, align 1
%3 = getelementptr { i32, i8* }* %0, i32 0, i32 1
%4 = extractvalue { i32, i8* } %call, 1
store i8* %4, i8** %3, align 1
ret void
}
declare { i32, i8* } @baz(i32*, %struct.bar*, ...) #1
attributes #0 = { nounwind "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-realign-stack" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-realign-stack" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
!llvm.ident = !{!0}
!0 = metadata !{metadata !"clang version 3.5 (trunk 200156) (llvm/trunk 200155)"}
Besides the placeholder doThings, this is exactly what I want the output to look like! The problem is that this requires 1.) using a modified version of the header, and 2.) knowing the types of things in advance. Which leads me to...
Why?
Basically, I'm building an implementation for a language using LLVM to generate code. The implementation should support C interop by specifying C header files and associated libs only (no manual declarations), which will then be used by the compiler before link-time to ensure that function invocations match their signatures. Hence, I've narrowed the problem down to 2 possible solutions:
Turn the header files into LLVM IR/bitcode, which can then get the type signature of each function
Use libclang to parse the headers, then query the types from the resulting AST (my 'last resort' in case there is no sufficient answer for this question)
TL;DR
I need to take a C header file (such as the above foo1.h) and, without changing it, generate the aforementioned expected LLVM IR using Clang, OR find another way to get function signatures from C header files (preferably using libclang or building a C parser).
Perhaps a less elegant solution, but one that stays with the idea of a doThings function that forces the compiler to emit IR because the definitions are used:
The two problems you identify with this approach are that it requires modifying the header, and that it requires a deeper understanding of the types involved in order to generate "uses" to put in the function. Both of these can be overcome relatively simply:
Instead of compiling the header directly, #include it (or more likely, a preprocessed version of it, or multiple headers) from a .c file that contains all the "uses" code. Straightforward enough:
// foo.c
#include "foo.h"
void doThings(void) {
...
}
You don't need detailed type information to generate specific usages of the names, matching up struct instantiations to parameters and all that complexity as you have in the "uses" code above. You don't actually need to gather the function signatures yourself.
All you need is the list of the names themselves and to keep track of whether they're for a function or for an object type. You can then redefine your "uses" function to look like this:
void * doThings(void) {
typedef void * (*vfun)(void);
typedef union v { void * o; vfun f; } v;
return (v[]) {
(v){ .o = &(bar){0} },
(v){ .f = (vfun)baz },
};
}
This greatly simplifies the necessary "uses" of a name to either casting it to a uniform function type (and taking its pointer rather than calling it), or wrapping it in &( and ){0} (instantiating it regardless of what it is). This means you don't need to store actual type information at all, only the kind of context from which you extracted the name in the header.
(obviously give the dummy function and the placeholder types extended unique names so they don't clash with the code you actually want to keep)
This simplifies the parsing step tremendously since you only have to recognise the context of a struct/union or function declaration, without actually needing to do very much with the surrounding information.
A simple but hackish starting point (which I would probably use because I have low standards :D ) might be:
grep through the headers for #include directives that take an angle-bracketed argument (i.e. an installed header you don't want to also generate declarations for).
use this list to create a dummy include folder with all of the necessary include files present but empty
preprocess it in the hope that'll simplify the syntax (clang -E -I local-dummy-includes/ -D"__attribute__(...)=" foo.h > temp/foo_pp.h or something similar)
grep through for struct or union followed by a name, } followed by a name, or name (, and use this ridiculously simplified non-parse to build the list of uses in the dummy function, and emit the code for the .c file.
It won't catch every possibility; but with a bit of tweaking and extension, it probably will actually deal with a large subset of realistic header code. You could replace this with a dedicated simplified parser (one built to only look at the patterns of the contexts you need) at a later stage.

Preprocessor based exclusion of namespace qualified function calls

I'm currently working on a reporting library as part of a large project. It contains a collection of logging and system-message functions. I'm trying to use preprocessor macros to strip out a subset of the function calls that are intended strictly for debugging, as well as the function definitions and implementations themselves, using conditional compilation and function-like macros defined to nothing (similar to the way that assert() calls are removed when NDEBUG is defined).
I'm running into a problem. I prefer to fully qualify namespaces (I find it improves readability), and I have my reporting functions wrapped in a namespace. Because the colon character can't be part of a macro token, I am unable to include the namespace in the stripping of the function calls. If I define the function names alone to nothing, I end up with a dangling Namespace::. I've considered just using conditional compilation to empty out the bodies of those functions, but I am worried that the compiler might not optimize out the empty functions.
namespace Reporting
{
const extern std::string logFileName;
void Report(std::string msg);
void Report(std::string msg, std::string msgLogAdd);
void Log(std::string msg);
void Message(std::string msg);
#ifdef DEBUG
void Debug_Log(std::string message);
void Debug_Message(std::string message);
void Debug_Report(std::string message);
void Debug_Assert(bool test, std::string message);
#else
#define Debug_Log(x);
#define Debug_Message(x);
#define Debug_Report(x);
#define Debug_Assert(x);
#endif
};
Any ideas on how to deal with the namespace qualifiers in the preprocessor?
Thoughts on, or problems with, just removing the function code?
Any other ways to accomplish my goal?
This is how I did it when I wrote a similar library several months back. And yes, your optimizer will remove calls to empty inline functions. If you declare them out-of-line (not in the header file), your compiler will NOT inline them unless you use LTO.
namespace Reporting
{
const extern std::string logFileName;
void Report(std::string msg);
void Report(std::string msg, std::string msgLogAdd);
void Log(std::string msg);
void Message(std::string msg);
#ifdef DEBUG
inline void Debug_Log(std::string message) { return Log(message); }
inline void Debug_Message(std::string message) { return Message(message); }
inline void Debug_Report(std::string message) { return Report(message); }
inline void Debug_Assert(bool test, std::string message) { /* Not sure what to do here */ }
#else
inline void Debug_Log(std::string) {}
inline void Debug_Message(std::string) {}
inline void Debug_Report(std::string) {}
inline void Debug_Assert(bool, std::string) {}
#endif
};
Just as a side note, don't pass strings by value unless you need to make a copy anyway. Use a const reference instead. It prevents an expensive allocation plus copy of the string on EVERY function call.
EDIT: Actually, now that I think about it, just use a const char*. Looking at the assembly, it's a LOT faster, especially for empty function bodies.
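To make that concrete, one possible shape for the wrappers with const char* parameters (assuming Reporting::Log and friends accept, or can be given, a const char* overload) would be:
#ifdef DEBUG
inline void Debug_Log(const char* message) { Log(message); }
inline void Debug_Message(const char* message) { Message(message); }
#else
// Nothing here ever constructs a std::string temporary at the call site.
inline void Debug_Log(const char*) {}
inline void Debug_Message(const char*) {}
#endif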
GCC optimizes this out at -O1, so I don't think there's much of an issue with this:
clark@clark-laptop /tmp $ cat t.cpp
#include <cstdio>
inline void do_nothing()
{
}
int main()
{
do_nothing();
return 0;
}
clark@clark-laptop /tmp $ g++ -O1 -S t.cpp
clark@clark-laptop /tmp $ cat t.s
.file "t.cpp"
.text
.globl main
.type main, @function
main:
.LFB32:
.cfi_startproc
movl $0, %eax
ret
.cfi_endproc
.LFE32:
.size main, .-main
.ident "GCC: (Gentoo 4.5.0 p1.2, pie-0.4.5) 4.5.0"
.section .note.GNU-stack,"",@progbits
After a bit of tweaking, it seems that this will only be a FULL removal if you use const char*, NOT std::string or const std::string&. Here's the assembly for the const char*:
clark@clark-laptop /tmp $ cat t.cpp
inline void do_nothing(const char*)
{
}
int main()
{
do_nothing("test");
return 0;
}
clark@clark-laptop /tmp $ g++ -O1 -S t.cpp
clark@clark-laptop /tmp $ cat t.s
.file "t.cpp"
.text
.globl main
.type main, @function
main:
.LFB1:
.cfi_startproc
movl $0, %eax
ret
.cfi_endproc
.LFE1:
.size main, .-main
.ident "GCC: (Gentoo 4.5.0 p1.2, pie-0.4.5) 4.5.0"
.section .note.GNU-stack,"",@progbits
And here's with const std::string&...
.file "t.cpp"
.section .rodata.str1.1,"aMS",@progbits,1
.LC0:
.string "test"
.text
.globl main
.type main, @function
main:
.LFB591:
.cfi_startproc
subq $24, %rsp
.cfi_def_cfa_offset 32
leaq 14(%rsp), %rdx
movq %rsp, %rdi
movl $.LC0, %esi
call _ZNSsC1EPKcRKSaIcE
movq (%rsp), %rdi
subq $24, %rdi
cmpq $_ZNSs4_Rep20_S_empty_rep_storageE, %rdi
je .L11
movl $_ZL22__gthrw_pthread_cancelm, %eax
testq %rax, %rax
je .L3
movl $-1, %eax
lock xaddl %eax, 16(%rdi)
jmp .L4
.L3:
movl 16(%rdi), %eax
leal -1(%rax), %edx
movl %edx, 16(%rdi)
.L4:
testl %eax, %eax
jg .L11
leaq 15(%rsp), %rsi
call _ZNSs4_Rep10_M_destroyERKSaIcE
.L11:
movl $0, %eax
addq $24, %rsp
.cfi_def_cfa_offset 8
ret
.cfi_endproc
.LFE591:
.size main, .-main
[Useless stuff removed...]
.ident "GCC: (Gentoo 4.5.0 p1.2, pie-0.4.5) 4.5.0"
.section .note.GNU-stack,"",@progbits
Huge difference, eh?
I am not sure if I fully understand your problem. Would the following help?
namespace X
{
namespace{int dummy;}
void debug_check(int);
}
#ifdef DEBUG
#define DEBUG_CHECK(ARG) debug_check(ARG)
#else
#define DEBUG_CHECK(ARG) dummy // just ignore args
#endif
int main()
{
X::DEBUG_CHECK(1);
}
This solution might not work, because it can generate a "statement without effect" warning. A potentially better solution would be to gobble the namespace prefix up in a function declaration:
// debug_check and "#ifdef DEBUG" part omitted
namespace X
{
typedef void dummy_type;
}
namespace Y
{
typedef void dummy_type;
}
typedef void dummy_type;
#define DEBUG(X) dummy_type dummy_fn();
int main()
{
X::DEBUG(1);
Y::DEBUG(2);
X::DEBUG(3);
Y::DEBUG(4);
DEBUG(5);
DEBUG(6);
};
As long as every definition of dummy_type yields the same type, this should be legal, because typedefs are not distinct types.
You could just have your logging function replaced by a function that does nothing, no?
I know that this question was answered ages ago, but I came across this problem when I put a log macro into a namespace. You were suggesting empty functions and optimization levels. Clark Gaebel's answer made me think, because of the different results using const char* or const std::string&. The following code gives me no noticeable changes in the assembly even with no optimization enabled:
#include <iostream>
#undef _DEBUG // undefine to use __NOJOB
namespace Debug
{
typedef void __NOJOB;
class Logger
{
public:
static void Log( const char* msg, const char* file, int line )
{
std::cout << "Log: " << msg << " in " <<
file << ":" << line << std::endl;
}
};
}
#ifdef _DEBUG
#define Log( msg ) Logger::Log( msg, __FILE__, __LINE__ );
#else
#define Log( msg )__NOJOB(0);
#endif
int main()
{
Debug::Log( "please skip me" );
return 0;
}
Assembly created by http://assembly.ynh.io/:
main:
.LFB972:
.cfi_startproc
0000 55 pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
0001 4889E5 movq %rsp, %rbp
.cfi_def_cfa_register 6 // <- stack main
// no code for void( 0 ) here
0004 B8000000 movl $0, %eax // return
00
0009 5D popq %rbp // -> end stack main
.cfi_def_cfa 7, 8
000a C3 ret
Maybe I made a mistake or understood something wrong? It would be nice to hear from you.