So that:
template <bool Mode>
void doIt()
{
//many lines
template_if(Mode)
{
doSomething(); // and not waste resources on if
}
//many other lines
}
I know there is std::enable_if, which can be used to enable a function conditionally, but I don't think I can use that option here.
Essentially what I need is a template construct that acts like an #ifdef macro.
Before trying something complex it's often worth checking if the simple solution already achieves what you want.
The simplest thing I can think of is to just use an if:
#include <iostream>
void doSomething()
{
std::cout << "doing it!" << std::endl;
}
template <bool Mode>
void doIt()
{
//many lines
if(Mode)
{
doSomething(); // and not waste resources on if
}
//many other lines
}
void dont()
{
doIt<false>();
}
void actuallyDoIt()
{
doIt<true>();
}
So what does that give:
gcc 5.3 with no optimizations enabled gives:
void doIt<false>():
pushq %rbp
movq %rsp, %rbp
nop
popq %rbp
ret
void doIt<true>():
pushq %rbp
movq %rsp, %rbp
call doSomething()
nop
popq %rbp
ret
Note there is no doSomething() call in the false case, just the bare overhead of the doIt function call itself. Turning optimizations on would eliminate even that.
So we already get what we want and are not wasting anything in the if. It's probably good to leave it at that rather than adding any unneeded complexity.
It can sort of be done.
If the code inside your "if" is syntactically and semantically valid for the full set of template arguments that you intend to provide, then you can basically just write an if statement. Thanks to basic optimisations, if (someConstant) { .. } is not going to survive compilation when someConstant is false. And that's that.
However, if the conditional code is actually not valid when the condition isn't met, then you can't do this. That's because class templates and function templates are instantiated ... in full. Your entire function body is instantiated so it all has to be valid. There's no such thing as instantiating an arbitrary block of code.†
So, in that case, you'd have to go back to messy old function specialisation with enable_if or whatever.
† C++17 is likely to have if constexpr which essentially gives you exactly this. But that's future talk.
You could specialize your template so that your code is only used when the template parameter is true:
template <bool Cond> struct condition;
template <> struct condition<false> {
static /* constexpr */ void do_something() {}
};
template <> struct condition<true> {
static void do_something() {
// Actual code
}
};
// Usage:
condition<true>::do_something();
condition<compiletime_constant>::do_something();
I have the following function
template <bool c>
void func()
{
...
if (c) {
// do something
} else {
// do something else
}
}
This function is to be used several time inside a loop:
for (...) {
func<true>();
}
I would like to know whether the if inside func is resolved at compile time, at run time, or whether it is left to the compiler.
The compiler may or may not optimize your code since it knows what c is at compile time. The only way to actually know would be to look at the generated assembly code to see if the branch was removed or not. That said, C++17 introduced constexpr if which is guaranteed to evaluate the condition at compile time and discard the branch that is not taken. That would make your function look like
template <bool c>
void func()
{
// ...
if constexpr(c) {
// do something if c is true, discarded otherwise
} else {
// do something if c is false, discarded otherwise
}
}
I've come across a few scenarios where I want to say a function's return value is likely inside the body of a function, not the if statement that will call it.
For example, say I want to port code from using a LIKELY macro to using the new [[likely]] annotation. But these go in syntactically different places:
#define LIKELY(...) __builtin_expect(!!(__VA_ARGS__),0)
if(LIKELY(x)) { ... }
vs
if(x) [[likely]] { ... }
There's no easy way to redefine the LIKELY macro to use the annotation. Would defining a function like
inline bool likely(bool x) {
if(x) [[likely]] return true;
else return false;
}
propagate the hint out to an if? Like in
if(likely(x)) { ... }
Similarly, in generic code, it can be difficult to directly express algorithmic likelihood information in the actual if statement, even if this information is known elsewhere. For example, a copy_if where the predicate is almost always false. As far as I know, there is no way to express that using attributes, but if branch weight info can propagate through functions, this is a solved problem.
So far I haven't been able to find documentation about this and I don't know a good setup to test this by looking at the outputted assembly.
The story appears to be mixed for different compilers.
On GCC, I think your inline likely function works, or at least has some effect. Using Compiler Explorer to test differences on this code:
inline bool likely(bool x) {
if(x) [[likely]] return true;
else return false;
}
//#define LIKELY(x) likely(x)
#define LIKELY(x) x
int f(int x) {
if (LIKELY(!x)) {
return -3548;
}
else {
return x + 1;
}
}
This function f adds 1 to x and returns it, unless x is 0, in which case it returns -3548. The LIKELY macro, when it's active, indicates to the compiler that the case where x is zero is more common.
This version, with no change, produces this assembly under GCC 10 -O1:
f(int):
test edi, edi
je .L3
lea eax, [rdi+1]
ret
.L3:
mov eax, -3548
ret
With the #define changed to the inline function with the [[likely]], we get:
f(int):
lea eax, [rdi+1]
test edi, edi
mov edx, -3548
cmove eax, edx
ret
That's a conditional move instead of a conditional jump. A win, I guess, albeit for a simple example.
This indicates that branch weights propagate through inline functions, which makes sense.
On clang, however, there is limited support for the likely and unlikely attributes, and where support exists it does not seem to propagate through inline function calls, according to @Peter Cordes's report.
There is, however, a hacky macro solution that I think also works:
#define EMPTY()
#define LIKELY(x) x) [[likely]] EMPTY(
Then anything like
if ( LIKELY(x) ) {
becomes like
if ( x) [[likely]] EMPTY( ) {
which then becomes
if ( x) [[likely]] {
Example: https://godbolt.org/z/nhfehn
Note however that this probably only works in if-statements, or in other cases that the LIKELY is enclosed in parentheses.
gcc 10.2 at least is able to make this deduction (with -O2).
If we consider the following simple program:
void foo();
void bar();
void baz(int x) {
if (x == 0)
foo();
else
bar();
}
then it compiles to:
baz(int):
test edi, edi
jne .L2
jmp foo()
.L2:
jmp bar()
However if we add [[likely]] on the else clause, the generated code changes to
baz(int):
test edi, edi
je .L4
jmp bar()
.L4:
jmp foo()
so that the not-taken case of the conditional branch corresponds to the "likely" case.
Now if we pull the comparison out into an inline function:
void foo();
void bar();
inline bool is_zero(int x) {
if (x == 0)
return true;
else
return false;
}
void baz(int x) {
if (is_zero(x))
foo();
else
bar();
}
we are again back to the original generated code, taking the branch in the bar() case. But if we add [[likely]] on the else clause in is_zero, we see the branch reversed again.
clang 10.0.1 however does not demonstrate this behavior and seems to ignore [[likely]] altogether in all versions of this example.
Yes, it will probably inline, but this is quite pointless.
The __builtin_expect will continue to work even after you upgrade to a compiler that supports those C++ 20 attributes. You can refactor them later, but it will be for purely aesthetic reasons.
Also, your implementation of the LIKELY macro is erroneous (it is actually UNLIKELY); the correct implementations are below.
#define LIKELY( x ) __builtin_expect( !! ( x ), 1 )
#define UNLIKELY( x ) __builtin_expect( !! ( x ), 0 )
Naturally, C++ compilers can inline function calls made from within a function template, when the inner function call is directly known in that scope (ref).
#include <iostream>
void holyheck()
{
std::cout << "!\n";
}
template <typename F>
void bar(F foo)
{
foo();
}
int main()
{
bar(holyheck);
}
Now what if I'm passing holyheck into a class, which stores the function pointer (or equivalent) and later invokes it? Do I have any hope of getting this inlined? How?
template <typename F>
struct Foo
{
Foo(F f) : f(f) {};
void calledLater() { f(); }
private:
F f;
};
void sendMonkeys();
void sendTissues();
int main()
{
Foo<void(*)()> f(sendMonkeys);
Foo<void(*)()> g(sendTissues);
// lots of interaction with f and g, not shown here
f.calledLater();
g.calledLater();
}
My type Foo is intended to isolate a ton of logic; it will be instantiated a few times. The specific function invoked from calledLater is the only thing that differs between instantiations (though it never changes during the lifetime of a Foo), so half of the purpose of Foo is to abide by DRY. (The rest of its purpose is to keep this mechanism isolated from other code.)
But I don't want to introduce the overhead of an actual additional function call in doing so, because this is all taking place in a program bottleneck.
I don't speak ASM so analysing the compiled code isn't much use to me.
My instinct is that I have no chance of inlining here.
If you don't really need to use a function pointer, then a functor should make the optimisation trivial:
struct CallSendMonkeys {
void operator()() {
sendMonkeys();
}
};
struct CallSendTissues {
void operator()() {
sendTissues();
}
};
(Of course, C++11 has lambdas, but you tagged your question C++03.)
By having different instantiations of Foo with these classes, and having no internal state in these classes, f() does not depend on how f was constructed, so it's not a problem if a compiler can't tell that it remains unmodified.
With your example, which after some fiddling to make it compile looks like this:
template <typename F>
struct Foo
{
Foo(F f) : f(f) {};
void calledLater() { f(); }
private:
F f;
};
void sendMonkeys();
void sendTissues();
int main()
{
Foo<__typeof__(&sendMonkeys)> f(sendMonkeys);
Foo<__typeof__(&sendTissues)> g(sendTissues);
// lots of interaction with f and g, not shown here
f.calledLater();
g.calledLater();
}
clang++ (3.7 as of a few weeks back, which means I'd expect clang++ 3.6 to do this too, as it's only a few weeks older in source base) generates this code:
.text
.file "calls.cpp"
.globl main
.align 16, 0x90
.type main,#function
main: # #main
.cfi_startproc
# BB#0: # %entry
pushq %rax
.Ltmp0:
.cfi_def_cfa_offset 16
callq _Z11sendMonkeysv
callq _Z11sendTissuesv
xorl %eax, %eax
popq %rdx
retq
.Ltmp1:
.size main, .Ltmp1-main
.cfi_endproc
Of course, without a definition of sendMonkeys and sendTissues, we can't really inline any further.
If we implement them like this:
void request(const char *);
void sendMonkeys() { request("monkeys"); }
void sendTissues() { request("tissues"); }
the assembler code becomes:
main: # #main
.cfi_startproc
# BB#0: # %entry
pushq %rax
.Ltmp2:
.cfi_def_cfa_offset 16
movl $.L.str, %edi
callq _Z7requestPKc
movl $.L.str1, %edi
callq _Z7requestPKc
xorl %eax, %eax
popq %rdx
retq
.L.str:
.asciz "monkeys"
.size .L.str, 8
.type .L.str1,#object # #.str1
.L.str1:
.asciz "tissues"
.size .L.str1, 8
Which, if you can't read assembler code, is request("monkeys") and request("tissues") inlined, as expected.
I'm simply amazed that g++ 4.9.2 doesn't do the same thing (I got this far and expected to continue with "and g++ does the same, I'm not going to post the code for it"). [It does inline sendTissues and sendMonkeys, but doesn't go the next step and inline request as well.]
Of course, it's entirely possible to make tiny changes to this and NOT get the code inlined - such as adding some conditions that depend on variables that the compiler can't determine at compile-time.
Edit:
I did add a string and an integer to Foo and updated these with an external function, at which point the inlining went away for both clang and gcc. Using JUST an integer and calling an external function, it does inline the code.
In other words, it really depends on what the code is in the section
// lots of interaction with f and g, not shown here. And I think you (Lightness) have been around here long enough to know that for 80%+ of the questions, it's the code that isn't posted in the question that is the most important part for the actual answer ;)
To make your original approach work, use
template< void(&Func)() >
struct Foo
{
void calledLater() { Func(); }
};
In general, I've had better luck getting gcc to inline things by using function references rather than function pointers.
Using policy based design, an EncapsulatedAlgorithm:
template< typename Policy>
class EncapsulatedAlgorithm : public Policy
{
double x = 0;
public:
using Policy::subCalculate;
void calculate()
{
Policy::subCalculate(x);
}
protected:
~EncapsulatedAlgorithm() = default;
};
may have a policy Policy that performs a sub-calculation. The sub-calculation is not necessary for the algorithm: it can be used in some cases to speed up algorithm convergence. So, to model that, let's say there are three policies.
One that just "logs" something:
struct log
{
static void subCalculate(double& x)
{
std::cout << "Doing the calculation" << std::endl;
}
};
one that calculates:
struct calculate
{
static void subCalculate(double& x)
{
x = x * x;
}
};
and one to bring them all and in the darkness bind them :D - that does absolutely nothing:
struct doNothing
{
static void subCalculate(double& x)
{
// Do nothing.
}
};
Here is the example program:
typedef EncapsulatedAlgorithm<doNothing> nothingDone;
typedef EncapsulatedAlgorithm<calculate> calculationDone;
typedef EncapsulatedAlgorithm<log> calculationLogged;
int main(int argc, const char *argv[])
{
nothingDone n;
n.calculate();
calculationDone c;
c.calculate();
calculationLogged l;
l.calculate();
return 0;
}
And here is the live example. I tried examining the assembly code produced by gcc with the optimization turned on:
g++ -S -O3 -std=c++11 main.cpp
but I do not know enough about Assembly to interpret the result with certainty - the resulting file was tiny and I was unable to recognize the function calls, because the code of the static functions of all policies was inlined.
What I could see is that, when no optimization is enabled, within the main function there is a call and a subsequent leave related to 'doNothing::subCalculate':
call _ZN9doNothing12subCalculateERd
leave
Here are my questions:
Where do I start to learn in order to be able to read what g++ -S spews out?
Is the empty function optimized away or not and where in main.s are those lines?
Is this design O.K.? Usually, implementing a function that does nothing is a bad thing, as the interface is saying something completely different (subCalculate instead of doNothing), but in the case of policies, the policy name clearly states that the function will not do anything. Otherwise I need to do type traits stuff like enable_if, etc, just to exclude a single function call.
I went to http://assembly.ynh.io/, which shows assembly output. I entered:
template< typename Policy>
struct EncapsulatedAlgorithm : public Policy
{
void calculate(double& x)
{
Policy::subCalculate(x);
}
};
struct doNothing
{
static void subCalculate(double& x)
{
}
};
void func(double& x) {
EncapsulatedAlgorithm<doNothing> a;
a.calculate(x);
}
and got these results:
.Ltext0:
.globl _Z4funcRd
_Z4funcRd:
.LFB2:
.cfi_startproc #void func(double& x) {
.LVL0:
0000 F3 rep #not sure what this is
0001 C3 ret #}
.cfi_endproc
.LFE2:
.Letext0:
Well, I only see two opcodes in the assembly there: rep (together with the following ret this forms 'rep ret', a two-byte return gcc emits to help branch prediction on some AMD processors) and the return itself. It appears that the g++ compiler can easily optimize out the function bodies.
Where do I start to learn in order to be able to read what g++ -S spews out?
This site's not for recommending reading material. Google "x86 assembly language".
Is the empty function optimized away or not and where in main.s are those lines?
It will have been optimized away when the optimiser was enabled, so there won't be any such lines in the generated main.s. You've already found the call in the unoptimised output...
In fact, even the policy that's meant to do a multiplication may be removed, as the compiler should be able to work out you're not using the resultant value. Add code to print the value of x, and seed x from some value that can't be known at compile time (it's often convenient to use argc in a little experimental program like this); then you'll be forcing the compiler to at least leave in the functionally significant code.
Is this design O.K.?
That depends on a lot of things (like whether you want to use templates given the implementation needs to be exposed in the header file, whether you want to deal with having distinct types for every instantiation...), but you're implementing the design correctly.
Usually, implementing a function that does nothing is a bad thing, as the interface is saying something completely different (subCalculate instead of doNothing), but in the case of policies, the policy name clearly states that the function will not do anything. Otherwise I need to do type traits stuff like enable_if, etc, just to exclude a single function call.
You may want to carefully consider your function names... do_any_necessary_calculations(), ensure_exclusivity() instead of lock_mutex(), after_each_value() instead of print_breaks etc..
I have an array of call backs like this void (*callbacks[n])(void* sender) and I'm wondering which one of these codes will preform faster :
//Method A
void nullcallback(void* sender){};
void callbacka(void* sender)
{
printf("Hello ");
}
void callbackb(void* sender)
{
printf("world\n");
}
int main()
{
void (*callbacks[5])(void* sender);
unsigned i;
for (i=0;i<5;++i)
callbacks[i] = nullcallback;
callbacks[2] = callbacka;
callbacks[4] = callbackb;
for (i=0;i<5;++i)
callbacks[i](NULL);
}
or
//Method B
void callbacka(void* sender)
{
printf("Hello ");
}
void callbackb(void* sender)
{
printf("world\n");
}
int main()
{
void (*callbacks[5])(void* sender);
unsigned i;
for (i=0;i<5;++i)
callbacks[i] = NULL;
callbacks[2] = callbacka;
callbacks[4] = callbackb;
for (i=0;i<5;++i)
if (callbacks[i])
callbacks[i](NULL);
}
some conditions:
Does it matter if I know most of my callbacks are valid or not?
Does it make a difference if I'm compiling my code using C or C++ compiler?
Does the target platform (Windows, Linux, Mac, iOS, Android) change anything in the results? (The whole reason for this callback array is to manage callbacks in a game.)
You'd have to look into the assembler code for that. On my platform (gcc, 32bit) I found that the compiler is not able to optimize the call to nullcallback out. But if I improve your method A to the following
int main(void) {
static void (*const callbacks[5])(void* sender) = {
[0] = nullcallback,
[1] = nullcallback,
[2] = callbacka,
[3] = nullcallback,
[4] = callbackb,
};
for (unsigned i=0;i<5;++i)
callbacks[i](0);
};
the compiler is able to unroll the loop and optimize the calls; the result is just:
.type main, #function
main:
pushl %ebp
movl %esp, %ebp
andl $-16, %esp
subl $16, %esp
movl $0, (%esp)
call callbacka
movl $0, (%esp)
call callbackb
xorl %eax, %eax
leave
ret
.size main, .-main
This totally depends on your actual situation. If possible I would prefer method A, because it is simply easier to read and produces cleaner code, in particular if your function has a return value:
ret = callbacks[UPDATE_EVENT](sender);
// is nicer than
if (callbacks[UPDATE_EVENT])
ret = callbacks[UPDATE_EVENT](sender);
else
ret = 0;
Of course method A becomes tedious when you have not just one function signature but, say, 100 different signatures; for each one you have to write a null function.
For the performance consideration it depends on whether nullcallback() is a rare case or not. If it is rare, method A is obviously faster. If not, method B could be slightly faster, but that depends on many factors: which platform you use, how many arguments your functions have, etc. But in any case if your callbacks are doing "real work", i.e. not only some simple calculations, it shouldn't matter at all.
Where your method B could really be faster is when you call the callback not just for one sender but for very many:
extern void *senders[SENDERS_COUNT]; // SENDERS_COUNT is a large number
if (callbacks[UPDATE_EVENT])
{
for (int i = 0; i < SENDERS_COUNT; i++)
callbacks[UPDATE_EVENT](senders[i]);
}
Here the entire loop is skipped when there is no valid callback. This tweak can also be done with method A if the nullcallback() address is known in the calling translation unit, i.e. not hidden away in some other module.
You could optimize your code further by simply zero-initializing the array to start with like:
void (*callbacks[5])(void* sender) = { 0 };
Then you've completely eliminated the need for your for-loop to set each pointer to NULL. You now just have to make assignments for callbacka and callbackb.
For the general case method B is preferred, but for function-pointer LUTs where a NULL entry is the exception, method A is microscopically faster.
The primary example is the Linux system call table: NULL entries should only be hit in rare circumstances, such as running binaries built on newer systems, or programmer error. System calls occur often enough that even nanosecond improvements can help.
Other instances where it may prove worthwhile are opcode LUTs inside emulators such as MAME.