Do branch likelihood hints carry through function calls? - c++

I've come across a few scenarios where I want to say that a function's return value is likely, inside the body of the function rather than in the if statement that will call it.
For example, say I want to port code from using a LIKELY macro to using the new [[likely]] annotation. But these go in syntactically different places:
#define LIKELY(...) __builtin_expect(!!(__VA_ARGS__),0)
if(LIKELY(x)) { ... }
vs
if(x) [[likely]] { ... }
There's no easy way to redefine the LIKELY macro to use the annotation. Would defining a function like
inline bool likely(bool x) {
    if(x) [[likely]] return true;
    else return false;
}
propagate the hint out to an if? Like in
if(likely(x)) { ... }
Similarly, in generic code, it can be difficult to directly express algorithmic likelihood information in the actual if statement, even if this information is known elsewhere. For example, a copy_if where the predicate is almost always false. As far as I know, there is no way to express that using attributes, but if branch weight info can propagate through functions, this is a solved problem.
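For concreteness, here is a sketch of the copy_if case; unlikely_wrap is a name I made up, and whether the hint inside it survives inlining is exactly what I'm asking about:
#include <algorithm>
#include <iterator>
#include <vector>
// Hypothetical wrapper: the likelihood hint lives inside the wrapper,
// not at the branch inside copy_if that actually tests the predicate.
template <class Pred>
auto unlikely_wrap(Pred p) {
    return [p](const auto& v) -> bool {
        if (p(v)) [[unlikely]] return true;
        else return false;
    };
}
std::vector<int> rare_matches(const std::vector<int>& in) {
    std::vector<int> out;
    // The predicate is almost never true for my data.
    std::copy_if(in.begin(), in.end(), std::back_inserter(out),
                 unlikely_wrap([](int x) { return x == 42; }));
    return out;
}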
So far I haven't been able to find documentation about this, and I don't know a good setup to test it by looking at the generated assembly.

The story appears to be mixed for different compilers.
On GCC, I think your inline likely function works, or at least has some effect. Using Compiler Explorer to test differences on this code:
inline bool likely(bool x) {
    if(x) [[likely]] return true;
    else return false;
}
//#define LIKELY(x) likely(x)
#define LIKELY(x) x
int f(int x) {
    if (LIKELY(!x)) {
        return -3548;
    }
    else {
        return x + 1;
    }
}
This function f adds 1 to x and returns it, unless x is 0, in which case it returns -3548. The LIKELY macro, when it's active, indicates to the compiler that the case where x is zero is more common.
This version, with no change, produces this assembly under GCC 10 -O1:
f(int):
test edi, edi
je .L3
lea eax, [rdi+1]
ret
.L3:
mov eax, -3548
ret
With the #define changed to the inline function with the [[likely]], we get:
f(int):
lea eax, [rdi+1]
test edi, edi
mov edx, -3548
cmove eax, edx
ret
That's a conditional move instead of a conditional jump. A win, I guess, albeit for a simple example.
This indicates that branch weights propagate through inline functions, which makes sense.
On clang, however, support for the likely and unlikely attributes is limited, and where it exists the hint does not seem to propagate through inline function calls, according to @Peter Cordes's report.
There is, however, a hacky macro solution that I think also works:
#define EMPTY()
#define LIKELY(x) x) [[likely]] EMPTY(
Then anything like
if ( LIKELY(x) ) {
expands to
if ( x) [[likely]] EMPTY( ) {
which then becomes
if ( x) [[likely]] {
Example: https://godbolt.org/z/nhfehn
Note however that this probably only works in if-statements, or in other cases where the LIKELY is enclosed in parentheses.

GCC 10.2, at least, is able to make this deduction (with -O2).
If we consider the following simple program:
void foo();
void bar();
void baz(int x) {
    if (x == 0)
        foo();
    else
        bar();
}
then it compiles to:
baz(int):
test edi, edi
jne .L2
jmp foo()
.L2:
jmp bar()
However if we add [[likely]] on the else clause, the generated code changes to
baz(int):
test edi, edi
je .L4
jmp bar()
.L4:
jmp foo()
so that the not-taken case of the conditional branch corresponds to the "likely" case.
Now if we pull the comparison out into an inline function:
void foo();
void bar();
inline bool is_zero(int x) {
    if (x == 0)
        return true;
    else
        return false;
}
void baz(int x) {
    if (is_zero(x))
        foo();
    else
        bar();
}
we are again back to the original generated code, taking the branch in the bar() case. But if we add [[likely]] on the else clause in is_zero, we see the branch reversed again.
Clang 10.0.1, however, does not demonstrate this behavior and seems to ignore [[likely]] altogether in all versions of this example.

Yes, such a wrapper will probably be inlined, but this is quite pointless.
The __builtin_expect will continue to work even after you upgrade to a compiler that supports the C++20 attributes. You can refactor it later, but that will be for purely aesthetic reasons.
Also, your implementation of the LIKELY macro is erroneous (it is actually UNLIKELY); the correct implementations are below.
#define LIKELY( x ) __builtin_expect( !! ( x ), 1 )
#define UNLIKELY( x ) __builtin_expect( !! ( x ), 0 )
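For example, a minimal sketch of the corrected macro at the original call site (this is the same f as earlier in the thread; nothing here needs C++20):
#define LIKELY( x ) __builtin_expect( !! ( x ), 1 )
int f(int x) {
    if (LIKELY(!x)) {   // hints that the x == 0 path is the common one
        return -3548;
    }
    return x + 1;
}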

Related

C++ Nullptr vs Null-Object for potential noop function arguments?

TL;DR : Should we use fn(Interface* pMaybeNull) or fn(Interface& maybeNullObject) -- specifically in the case of "optional" function arguments of a virtual/abstract base class?
Our code base contains various forms of the following pattern:
struct CallbackBase {
virtual ~CallbackBase() = default;
virtual void Hello(/*omitted ...*/) = 0;
};
...
void DoTheThing(..., CallbackBase* pOpt) {
...
if (pOpt) { pOpt->Hello(...); }
}
where the usage site would look like:
... {
auto greet = ...;
...
DoTheThing(..., &greet);
// or if no callback is required from call site:
DoTheThing(..., nullptr);
}
It has been proposed that, going forward, we should use a form of the Null Object pattern, like so:
struct NoopCall : public CallbackBase {
virtual void Hello(/*omitted ...*/) { /*noop*/ }
};
void DoTheThing2(..., CallbackBase& opt) {
...
opt.Hello(...);
}
... {
NoopCall noop;
// if no callback is required from call site:
DoTheThing2(..., noop);
}
Note: Searching yields lots of results about the Null Object pattern (many not in the C++ space) and a lot of very basic treatment of pointers vs. references; if you include the word "optional", as in the parameter being optional, you mostly get hits about std::optional, which, afaik, is unsuitable for this virtual interface use case.
I couldn't find a decent comparison of the two variants present here, so here goes:
Given C++17/C++20 and a halfway modern compiler, is there any expected difference in the runtime characteristics of the two approaches? (this factor being just a corollary to the overall design choice.)
The "Null Object" approach certainly "seems" more modern and safer to me -- is there anything in favor of the pointer approach?
Note:
I think it is orthogonal to the question posed, whether it stands as posted, or uses a variant of overloading or default arguments.
That is, the question should be valid, regardless of:
//a
void DoTheThing(arg);
// vs b
void DoTheThing(arg=nullthing);
// vs c
void DoTheThing(arg); // overload1
void DoTheThing(); // overload0 (calling 1 internally)
Performance:
I inspected the code on godbolt and while MSVC shows "the obvious", the gcc output is interesting (see below).
// Gist for a MCVE.
"The obvious" is that the version with the Noop object contains an unconditional virtual call to Hello and the pointer version has an additional pointer test, eliding the call if the pointer is null.
So, if the function is "always" called with a valid callback, the pointer version is a pessimization, paying an additional null check.
If the function is "never" called with a valid callback, the NullObject version is a (worse) pessimization, paying a virtual call that does nothing.
However, the object version in the gcc code contains this:
WithObject(int, CallbackBase&):
...
mov rax, QWORD PTR [rsi]
...
mov rax, QWORD PTR [rax+16]
(!) cmp rax, OFFSET FLAT:NoopCaller::Hello(HelloData const&)
jne .L31
.L25:
...
.L31:
mov rdi, rsi
mov rsi, rsp
call rax
jmp .L25
And while my understanding of assembly is certainly near non-existent, this looks like gcc is comparing the virtual call target against NoopCaller::Hello and eliding the call in that case!
Conclusion
In general, the pointer version should produce slightly better code at the micro level. However, compiler optimizations might make any difference nearly unobservable.
Think about using the pointer version if you have a very hot path where the callback is null.
Use the null object version otherwise, as it is arguably safer and more maintainable.
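If call-site ergonomics matter, variant (b) from the question combines well with the null object: a shared no-op instance as the default argument. A minimal sketch, assuming a shared stateless instance is acceptable (g_noopCallback is a made-up name; the inline variable needs C++17):
struct CallbackBase {
    virtual ~CallbackBase() = default;
    virtual void Hello() = 0;
};
struct NoopCall : CallbackBase {
    void Hello() override { /* noop */ }
};
inline NoopCall g_noopCallback;  // one shared, stateless null object
void DoTheThing(int arg, CallbackBase& opt = g_noopCallback) {
    opt.Hello();  // always callable, no null check
}
// Call sites:
//   DoTheThing(42);              // no callback wanted
//   DoTheThing(42, myCallback);  // real callback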

Passing integer to x86 ASM in C++

I am trying to do some script hooking in C++, and have set up a simple test function for this case.
void __declspec(naked) testFunct()
{
    int myInt;
    myInt = 2000;
    __asm{
        mov eax, myInt
        jmp [jmp_back_address]
    }
}
When using this to pass in the integer, the function fails when it is called and the program crashes. However, when using this instead, without the integer variable, it successfully passes through.
void __declspec(naked) testFunct()
{
    __asm{
        mov eax, 2000
        jmp [jmp_back_address]
    }
}
How can I successfully pass the integer?
The correct solution for my situation was to simply do everything within ourFunct() in ASM instead, as mixing C++ and ASM for passing variables was creating buggy assembly code. Example with a function call that works:
int CalculateTotalScore()
{
    return (int)*Game::current_speed_score;
}
DWORD jmpBackAddress;
void __declspec(naked) ourFunct()
{
    __asm{
        call CalculateTotalScore
        jmp [jmpBackAddress]
    }
}
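Another workaround in the same spirit, sketched here and untested in your hook: a naked function never sets up a stack frame, so locals are unusable, but a global or static has a fixed address and can be referenced from MSVC inline asm directly.
DWORD jmp_back_address;     // assumed to be filled in by the hooking code
static int g_myInt = 2000;  // global/static: addressable without a stack frame
void __declspec(naked) testFunct()
{
    __asm {
        mov eax, g_myInt        // loads the value from its fixed address
        jmp [jmp_back_address]
    }
}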
The assembler doesn't know what "myInt" means. Most compilers support inline assembly with the possibility to pass values. For instance, with GCC, you may try to define a macro like
#define MY_ASM_MACRO(myInt) ({ asm volatile("mov eax,%0\n\t \
jmp [jmp_back_address]" : : "r"(myInt) : ); })
And use it like
void __declspec(naked) testFunct()
{
    int myInt;
    myInt = 2000;
    MY_ASM_MACRO(myInt)
}

Can C++ templates be used for conditional code inclusion?

So that:
template <bool Mode>
void doIt()
{
    //many lines
    template_if(Mode)
    {
        doSomething(); // and not waste resources on if
    }
    //many other lines
}
I know there is std::enable_if, which can be used to enable a function conditionally, but I do not think I can use such an option here.
Essentially what I need is a template construct that acts like an #ifdef macro.
Before trying something complex it's often worth checking if the simple solution already achieves what you want.
The simplest thing I can think of is to just use an if:
#include <iostream>
void doSomething()
{
    std::cout << "doing it!" << std::endl;
}
template <bool Mode>
void doIt()
{
    //many lines
    if(Mode)
    {
        doSomething(); // and not waste resources on if
    }
    //many other lines
}
void dont()
{
    doIt<false>();
}
void actuallyDoIt()
{
    doIt<true>();
}
So what does that give:
gcc 5.3 with no optimizations enabled gives:
void doIt<false>():
pushq %rbp
movq %rsp, %rbp
nop
popq %rbp
ret
void doIt<true>():
pushq %rbp
movq %rsp, %rbp
call doSomething()
nop
popq %rbp
ret
Note there is no doSomething() call in the false case, just the bare overhead of the doIt function call itself. Turning optimizations on would eliminate even that.
So we already get what we want and are not wasting anything in the if. It's probably good to leave it at that rather than adding any unneeded complexity.
It can sort of be done.
If the code inside your "if" is syntactically and semantically valid for the full set of template arguments that you intend to provide, then you can basically just write an if statement. Thanks to basic optimisations, if (someConstant) { .. } is not going to survive compilation when someConstant is false. And that's that.
However, if the conditional code is actually not valid when the condition isn't met, then you can't do this. That's because class templates and function templates are instantiated ... in full. Your entire function body is instantiated so it all has to be valid. There's no such thing as instantiating an arbitrary block of code.†
So, in that case, you'd have to go back to messy old function specialisation with enable_if or whatever.
† C++17 is likely to have if constexpr which essentially gives you exactly this. But that's future talk.
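For anyone reading this on C++17 or later, that looks like the following sketch; the discarded branch is not instantiated, so code in it that depends on the template parameter only has to be valid when Mode is true:
template <bool Mode>
void doIt()
{
    //many lines
    if constexpr (Mode)
    {
        doSomething(); // only instantiated when Mode is true
    }
    //many other lines
}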
You could specialize your template so that your code is only used when the template parameter is true:
template <bool Cond> struct condition {};
template <> struct condition<false> {
    static /* constexpr */ void do_something() {}
};
template <> struct condition<true> {
    static void do_something() {
        // Actual code
    }
};
// Usage:
condition<true>::do_something();
condition<compiletime_constant>::do_something();

Local Variables vs. Class Variables Compiler Optimization; Works vs. Doesn't Work

I have an example of code where a straightforward optimization is not working when structured as class variables, yet works as local variables; I want to know: why is the optimization not happening on the class variables formulation?
The intent of my example code is to have a class that is either enabled or disabled at construction and possibly changed during its lifetime. I expect that, when the object is disabled for its whole lifetime, the compiler would optimize away all code that conditionally executes when the object is enabled.
Specifically, I have a std::ofstream that I only want to write to when "enabled". When disabled, I want all formatted output to be skipped. ( My real class does its own, non-trivial message formatting. )
I discovered that when I formulate this as a class, I don't get the optimizations I expect. However, if I replicate the code all as local variables, I do see the expected behavior.
Additionally, I discovered that if I don't make std::ofstream calls like 'open', 'exceptions', or 'clear' anywhere in the body of the example class's methods, I also get the expected optimizations. ( However, my design requires making such calls on std::ofstream, so for me it's a moot point. ) The code below uses the macro DISABLE_OPEN_OFSTREAM_AFTER_CONSTRUCTOR to allow one to try this case.
My example code uses 'asm' expressions to insert comments into the generated assembly-code. If one inspects the output of the compiler in assembly, I expect there to be no assembly between the 'disabled-test' comments. I'm observing assembly between the 'class disabled-test' comments, yet no assembly between the 'locals disabled-test' comments.
The input C++ code:
#include <fstream> // ofstream
#define DISABLE_OPEN_OFSTREAM_AFTER_CONSTRUCTOR 0
class Test_Ofstream
{
public:
Test_Ofstream( const char a_filename[],
bool a_b_enabled )
#if DISABLE_OPEN_OFSTREAM_AFTER_CONSTRUCTOR
: m_ofstream( a_filename ),
m_b_enabled( a_b_enabled )
{
}
#else
: m_ofstream(),
m_b_enabled( a_b_enabled )
{
m_ofstream.open( a_filename );
}
#endif
void write_test()
{
if( m_b_enabled )
{
m_ofstream << "Some text.\n";
}
}
private:
std::ofstream m_ofstream;
bool m_b_enabled;
};
int main( int argc, char* argv[] )
{
{
Test_Ofstream test_ofstream( "test.txt", true );
asm( "# BEGIN class enabled-test" );
test_ofstream.write_test();
asm( "# END class enabled-test" );
}
{
Test_Ofstream test_ofstream( "test.txt", false );
asm( "# BEGIN class disabled-test" );
test_ofstream.write_test();
asm( "# END class disabled-test" );
}
{
bool b_enabled = true;
#if DISABLE_OPEN_OFSTREAM_AFTER_CONSTRUCTOR
std::ofstream test_ofstream( "test.txt" );
#else
std::ofstream test_ofstream;
test_ofstream.open( "test.txt" );
#endif
asm( "# BEGIN locals enabled-test" );
if( b_enabled )
{
test_ofstream << "Some text.\n";
}
asm( "# END locals enabled-test" );
}
{
bool b_enabled = false;
#if DISABLE_OPEN_OFSTREAM_AFTER_CONSTRUCTOR
std::ofstream test_ofstream( "test.txt" );
#else
std::ofstream test_ofstream;
test_ofstream.open( "test.txt" );
#endif
asm( "# BEGIN locals disabled-test" );
if( b_enabled )
{
test_ofstream << "Some text.\n";
}
asm( "# END locals disabled-test" );
}
return 0;
}
The output assembly code:
##### Cut here. #####
#APP
# 53 "test_ofstream_optimization.cpp" 1
# BEGIN class disabled-test
# 0 "" 2
#NO_APP
cmpb $0, 596(%esp)
je .L22
movl $.LC1, 4(%esp)
movl %ebx, (%esp)
.LEHB9:
call _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
.LEHE9:
.L22:
#APP
# 55 "test_ofstream_optimization.cpp" 1
# END class disabled-test
# 0 "" 2
#NO_APP
##### Cut here. #####
#APP
# 116 "test_ofstream_optimization.cpp" 1
# BEGIN locals disabled-test
# 0 "" 2
# 121 "test_ofstream_optimization.cpp" 1
# END locals disabled-test
# 0 "" 2
#NO_APP
##### Cut here. #####
I realize that this is possibly tied to the compiler I'm using, which is: g++-4.6 (Debian 4.6.1-4) 4.6.1; compiler flags: -Wall -S -O2. However, this seems like such a simple optimization I find it hard to believe it could be the compiler messing up.
Any help, insight or guidance is greatly appreciated.
Pretty simple. When you write the code directly with a local variable, the code is inlined and the compiler performs the constant folding. When the flag lives in the class, the code is not inlined, the value of m_b_enabled is unknown, and the compiler has to perform the call. To prove that the code was semantically equal and perform this optimization, not just that call but every access to the class would have to be inlined. The compiler may well decide that inlining the class would not yield sufficient benefit. Compilers can also choose not to inline code because they don't know how, and inline asm expressions are exactly the kind of thing that can cause that, as the compiler does not know how to handle your assembly code.
Usually, you would place a breakpoint and inspect the disassembly. That's what I'd do in Visual Studio, anyway. Inline assembler of any kind can be so damaging to the optimizer.
When I removed the assembler expressions, Visual Studio inlined the code, and promptly didn't perform the optimization anyway. The problem with stacking optimization passes is that you can never get the right order to find all potential optimizations.
As you say, this will depend on the compiler. But my guess:
The optimizer can prove that no other code can ever modify the local bool b_enabled, since it's local and you never take its address or bind a reference to it. The local version is easily optimized.
When DISABLE_OPEN_OFSTREAM_AFTER_CONSTRUCTOR is true, the Test_Ofstream constructor:
Calls the constructor ofstream(const char*)
Initializes member m_b_enabled
Since there are no operations between initializing test_ofstream.m_b_enabled and testing it, this optimization is only a bit trickier, but it sounds like g++ still manages it.
When DISABLE_OPEN_OFSTREAM_AFTER_CONSTRUCTOR is false, the Test_Ofstream constructor:
Calls the ofstream default constructor
Initializes member m_b_enabled
Calls m_ofstream.open(const char*)
The optimizer is not allowed to assume that ofstream::open will not change test_ofstream.m_b_enabled. We know it shouldn't, but in theory that non-inline library function could reach the complete object test_ofstream containing its 'this' argument and modify it that way.
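A minimal sketch of that point (hypothetical names, same shape as Test_Ofstream): once a pointer to the whole object has been passed to an opaque, out-of-line function, the optimizer must assume any member may have changed:
struct S {
    int  stream_state;  // stands in for the ofstream subobject
    bool enabled;
    void open();        // defined in another translation unit, like ofstream::open
};
void sink(int);         // also opaque
void demo(bool e) {
    S s{0, e};
    s.open();           // receives &s, so it could legally write s.enabled
    if (s.enabled)      // therefore this test cannot be folded away
        sink(1);
}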

Arbitrary pointer to unknown class function - invalid type conversion

I have a hack program; it injects some functions into a target process to control it. The program is written in C++ with inline assembly.
class GameProcMain {
// this just a class
};
GameProcMain* mainproc; // there is no problem I can do =(GameProcMain*)0xC1EA90
Now I want to define a member function (which sets ecx to the class pointer) instead of writing assembly.
PPLYDATA GetNearblyMob(__Vector3* cordinate) {
    __asm {
        mov ecx, 0xC1EA90
        push cordinate
        mov edi, 0x4A8010
        call edi
    }
}
I want to define it and call it like:
PPLYDATA (DLPL::*GetNearblyMob)(__Vector3* cordinate);
mainproc->GetNearblyMob(ADDR_CHRB->kordinat)
When I try GetNearblyMob=(PPLYDATA (DLPL::*)(__Vector3*)) 0x4A8010;
It says something like error: invalid type conversion: "int" to "PPLYDATA (DLPL::*)(int, int)"
but I can do this to set the pointer:
void initializeHack() {
__asm {
LEA edi, GetNearblyMob
MOV eax, 0x4A8010
MOV [edi], eax
}
}
Now I want to learn "how I can set GetNearblyMob without using assembly and legitimately in C++".
The problem is that member functions automatically get an extra parameter for the this pointer. Sometimes you can cast between member and non-member functions, but I don't see the need to cast anything.
Typically it's easier to reverse-engineer into C functions than into C++. C typically has a more straightforward ABI, so you can keep the data structures straight as you work them out.
So, I would recommend
PPLYDATA (*GetNearblyMob)(DLPL *main_obj, __Vector3* cordinate) = (PPLYDATA (*)(DLPL*, __Vector3*)) 0x12345UL;
and then define your own function
class DLPL {
public:
    PPLYDATA GetNearblyMob( __Vector3* cordinate ) {
        return ::GetNearblyMob( this, cordinate );
    }
    // ... other program functions
};
I am a bit surprised that it won't let you cast like that.
You can try to do something like
GetNearblyMob=reinterpret_cast<PPLYDATA (DLPL::*)(__Vector3*)> (0x4A8010);
If that still does not work, try
*(int*)(&GetNearblyMob) = 0x4A8010;