It is fairly common knowledge that the most powerful tool in a compiler's tool-belt is the inlining of functions into their call sites. But what about the reverse? Is that ever done, and if so, when? For example, given:
void foo(int x)
{
auto y = bar(x);
baz(y);
}
void bop()
{
int x;
auto y = bar(x);
baz(y);
}
Does it ever make sense for the compiler to factor this out into
void qux(int x)
{
auto y = bar(x);
baz(y);
}
void foo(int x)
{
qux(x);
}
void bop()
{
int x;
qux(x);
}
Yes, for example LLVM has a MachineOutliner optimization pass.
Outlining makes sense even without repeated code, when the outlined section is [[unlikely]]. The extra function call is a cost, but one paid only on the unlikely path, while the more likely code becomes compact enough to fit in the instruction cache.
Compilers might also assume that an exception is unlikely, and outline the catch block.
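As a minimal sketch of the idea (C++20 attribute syntax; handleRareError is a made-up cold-path helper):
#include <cstdio>
void handleRareError(int code) // hypothetical cold-path helper
{
    std::fprintf(stderr, "error %d\n", code);
}
void process(int status)
{
    if (status != 0) [[unlikely]]
    {
        // Cold path: the compiler may outline this block so that the
        // hot path below stays compact in the instruction cache.
        handleRareError(status);
        return;
    }
    // Hot path: taken on almost every call.
    std::puts("ok");
}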
Related
What are the methods to achieve inline code in C++? I can only think of macros and inline functions. Are there more alternatives in C++11/17/20 (e.g. lambdas)? What are the advantages and disadvantages?
// do macros still make sense in modern C++ standards?
#define square(x) ((x)*(x))
// is this a good alternative to macros?
template <class T> inline T square(T x) { return x * x; }
EDIT: changed comment from "are macros still encouraged...?" to "do macros still make sense...?"
// is this a good alternative to macros?
template <class T> inline T square(T x) { return x * x; }
Yes, this is the preferred way. (Templates in general don't require inline, whereas explicit template specializations and instantiations do; still, it is fine to be consistent and write what one means.)
Also note that constexpr functions and constructors are implicitly inline.
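For instance, a minimal sketch of that point, plus the lambda the question mentions as a possible alternative (assuming C++14 or later for the generic lambda):
#include <iostream>
// constexpr functions are implicitly inline and usable at compile time.
template <class T>
constexpr T square(T x) { return x * x; }
// A generic lambda is another macro-free alternative.
auto square_l = [](auto x) { return x * x; };
int main()
{
    static_assert(square(4) == 16, "evaluated at compile time");
    std::cout << square_l(5) << '\n'; // prints 25
}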
Also note that using final (where appropriate) in the context of virtual overriding can help the compiler inline even some virtual member functions (check this post for some examples and an explanation).
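A minimal sketch of the final case (class and function names are made up for illustration):
struct Shape {
    virtual ~Shape() = default;
    virtual int sides() const { return 0; }
};
// final: nothing can derive from Circle, so a call through a
// Circle reference cannot dispatch to anything but Circle::sides.
struct Circle final : Shape {
    int sides() const override { return 1; }
};
int useCircle(const Circle& c)
{
    // The compiler knows the dynamic type is exactly Circle here,
    // so it can devirtualize the call and then inline sides().
    return c.sides();
}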
Macros were never encouraged. Consider that the macro does not do the same thing as the function, e.g. here:
#include <iostream>
int foo() {
    static int x = 0;
    ++x;
    return x;
}
// With the macro, this expands to ((foo())*(foo())):
// foo() is called twice, and x is incremented twice.
std::cout << square(foo());
And that is just one downside of the macro. If you want something that takes a parameter and returns a value, that is a function, not a macro.
Macros have a big disadvantage: they are namespace-agnostic.
Imagine what happens if I extend your sample:
// are macros still encouraged in modern C++ standards?
#define square(x) ((x)*(x))
namespace My {
int square(int x) { return x * x; }
} // namespace My
So, IMHO, the answer is NO.
Remember that in C (where the preprocessor was introduced) there were no namespaces, and to this day they have not been added.
Demo:
#define square(x) ((x)*(x))
namespace My {
int square(int x) { return x * x; }
} // namespace My
Preprocessed:
namespace My {
int ((int x)*(int x)) { return x * x; }
}
int main()
{
std::cout << My::((10)*(10));
}
Demo on coliru
As opposed to:
#include <iostream>
template <typename T>
T square(T x) { return x * x; }
namespace My {
int square(int x) { return x * x; }
} // namespace My
int main()
{
std::cout << My::square(10);
}
Output:
100
Demo on coliru
This smells like "opinion-based"; some people dislike macros more than others.
Usual disadvantages of macros:
they evaluate their argument as many times as it appears in the expansion (twice in your case)
more error-prone (don't forget the parentheses)
they ignore program structure (they don't belong to a namespace, and now you cannot have anything else named square, not even a class method)
To mitigate the last one, name macros in ALL_CAPS, and don't name anything else like that.
The advantages of macros are:
C compatibility
Ability to have arbitrary pieces of code in macros, not just (inline) functions; see the sketch below
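For instance, a small sketch of the second point: a macro can capture the caller's file and line, which a plain function cannot (LOG is a made-up name):
#include <cstdio>
// Expanded at the call site, so __FILE__ and __LINE__ refer to the caller.
#define LOG(msg) std::printf("%s:%d: %s\n", __FILE__, __LINE__, (msg))
int main()
{
    LOG("starting up"); // prints this file name and this line number
}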
Macros should be the last resort in C++.
They're not inherently bad, but they're not checked by the compiler.
Macro expansion is little more than text replacement done before compiling, so you are writing code that will be checked only after the expansion.
That means that if you make a mistake, finding the error can be harder; debugging will be harder too, because you cannot step into the macro code.
Also, you have to be careful with parentheses and with multiple evaluation.
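For example, a small sketch of the parentheses pitfall (SQUARE_BAD is deliberately broken):
#include <iostream>
#define SQUARE_BAD(x) x * x         // missing parentheses
#define SQUARE_OK(x)  ((x) * (x))
int main()
{
    std::cout << SQUARE_BAD(1 + 2) << '\n'; // expands to 1 + 2 * 1 + 2, prints 5
    std::cout << SQUARE_OK(1 + 2) << '\n';  // expands to ((1 + 2) * (1 + 2)), prints 9
}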
Last but not least, all the other things pointed out by the other answers apply too.
As I understand it from (https://learn.microsoft.com/en-us/cpp/cpp/noalias?view=vs-2019), __declspec(noalias) means that the function only modifies memory inside its own body or through its parameters, so it is not modifying static variables or memory reached through double pointers. Is that correct?
static int g = 3;
class Test
{
    int x;
public:
    Test& __declspec(noalias) operator +(const int b) // is noalias correct?
    {
        x += b;
        return *this;
    }
    void __declspec(noalias) test2(int& x) { // correct here?
        x = 3;
    }
    void __declspec(noalias) test3(int** x) { // not correct here!?
        **x = 5; // writes through two levels of indirection
    }
};
Given something like:
extern int x;
extern int bar(void);
int foo(void)
{
if (x)
bar();
return x;
}
a compiler that knows nothing about bar() would need to generate code that allows for the possibility that it might change the value of x, and would thus have to load the value of x both before and after the function call. Some systems use so-called "link-time optimization" to defer code generation for a function until every function it calls has been analyzed to see what external objects, if any, it might access. MS uses a simpler approach: a function prototype can simply declare that the function doesn't access any outside objects the calling code might want to cache. This is a crude approach, but it allows compilers to reap low-hanging fruit cheaply and easily.
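A minimal sketch of that idea (assuming MSVC, since __declspec(noalias) is Microsoft-specific; whether the reload is actually elided depends on the compiler):
extern int x;
// Promises that bar2 reads and writes memory only through its
// arguments and its own locals, so it cannot modify the global x.
__declspec(noalias) int bar2(void);
int foo2(void)
{
    if (x)
        bar2();
    return x; // the compiler may reuse the value of x loaded for the if
}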
Consider the following bar functions
#include <iostream>
void foo(){
std::cout << "Hello" << std::endl;
}
void bar1(){
return foo();
}
void bar2(){
foo();
}
void bar3(){
foo();
return;
}
int main()
{
bar1();
bar2();
bar3();
return 1;
}
These functions do exactly the same thing, and indeed godbolt produces the same code for all three (as one would hope). The question I have is simply whether there are any software-engineering paradigms/guidelines that advocate one form over the others, and whether there is any reason to prefer one over the others. They seem to produce the same machine code, but I imagine one might be viewed as "easier to maintain", or something like that.
This is quite opinion-based, though I'd say the general consensus is to write it like bar2(). Don't return explicitly unless you have to return early, and don't write return func() if func() returns void; that just confuses readers, because you're not actually returning a value.
I totally agree with Sombrero Chicken's answer. But I'll also add that a construct like
void bar1(){
return foo();
}
doesn't make much sense for ordinary functions that return void, but may be useful in template code where you don't know the actual return type, e.g.:
template <typename T>
auto SomeTemplateFunction(...)
{
// do some works
...
return SomeOtherTemplateFunction<T>(...);
}
This will work regardless of whether SomeOtherTemplateFunction<T>'s return type is void or not.
It's quite opinion-based; what I can say is that (3) is flagged by the clang-tidy check readability-redundant-control-flow.
The idea is that the control flow here is already well defined: the return is superfluous and should be removed.
I have a loop like the one below which has an invariant, here the never-changing value of scaleEveryValueByTwo. Can I rely on the compiler finding this invariant and not checking the condition in every iteration (essentially compiling it to something analogous to the code at the bottom)?
void loadValuesFromDisk(const bool scaleEveryValueByTwo)
{
std::vector<MyValueType> xs;
while(fileHasNewValues())
{
auto x = loadNextValue();
if (scaleEveryValueByTwo)
{
x *= 2;
}
xs.push_back(x);
}
}
I can of course split this into two loops manually (see below) or put the scaling part in a separate function, but in many cases this makes the code much longer and, in my opinion, harder to read (for example, with nested loops over all dimensions of 3D data, I would duplicate all three loop headers and up to six lines of curly braces).
void loadValuesFromDisk(const bool scaleEveryValueByTwo)
{
std::vector<MyValueType> xs;
while(fileHasNewValues())
{
auto x = loadNextValue();
xs.push_back(x);
}
if (scaleEveryValueByTwo)
{
for(auto &x : xs)
{
x *= 2;
}
}
}
I'm primarily interested in whether I can rely on (or, even better, enforce) this optimization with commonly used compilers like GCC or MSVC, not with exotic ones that might be missing optimizations that are de facto standard in most compilers.
MSVC used to have an /Og (global optimizations) switch, which is now enabled by default.
My guess is that other compilers do this as well.
To see how loop optimization is done, look at the link below and search for "Loop optimization":
https://learn.microsoft.com/en-us/cpp/build/reference/og-global-optimizations?view=vs-2019
As this is enabled by default now, you can rely on the compiler.
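For what it's worth, GCC performs this transformation under the name "loop unswitching" (-funswitch-loops, enabled at -O3): instead of scanning the data twice, the compiler duplicates the loop and hoists the test. A rough sketch of the result, reusing the question's helpers (not actual compiler output):
// Conceptual result of loop unswitching inside loadValuesFromDisk:
if (scaleEveryValueByTwo)
{
    while (fileHasNewValues())
    {
        auto x = loadNextValue();
        x *= 2; // the condition has been hoisted out of both loops
        xs.push_back(x);
    }
}
else
{
    while (fileHasNewValues())
    {
        auto x = loadNextValue();
        xs.push_back(x);
    }
}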
You can make scaleEveryValueByTwo a template parameter to be sure that the condition is evaluated only once.
In C++17 you can use if constexpr as follows
template <bool scaleEveryValueByTwo>
void loadValuesFromDisk()
{
std::vector<MyValueType> xs;
while(fileHasNewValues())
{
auto x = loadNextValue();
if constexpr (scaleEveryValueByTwo)
{
x *= 2;
}
xs.push_back(x);
}
}
If you do not yet have C++17, the code above can be emulated, for example, by introducing an auxiliary template function multiply, as follows:
template <bool activate>
void multiply(decltype(loadNextValue())& x);
template <>
void multiply<true>(decltype(loadNextValue())& x) { x *= 2; }
template <>
void multiply<false>(decltype(loadNextValue())& x) { }
template <bool scaleEveryValueByTwo>
void loadValuesFromDisk()
{
std::vector<MyValueType> xs;
while(fileHasNewValues())
{
auto x = loadNextValue();
multiply<scaleEveryValueByTwo>(x);
xs.push_back(x);
}
}
(Note: I am using decltype because I do not know what your routine loadNextValue() returns.)
Then you call either loadValuesFromDisk<true>() or loadValuesFromDisk<false>(). If scaleEveryValueByTwo is only known at runtime, you can branch to the appropriate function:
void loadValuesFromDisk(bool const scaleEveryValueByTwo)
{
if (scaleEveryValueByTwo)
loadValuesFromDisk<true>();
else
loadValuesFromDisk<false>();
}
I wanted to know how fast a single-inheritance virtual function call is compared to an equivalent boost::function call. Are they almost the same in performance, or is boost::function slower?
I'm aware that performance may vary from case to case, but, as a general rule, which is faster, and to how large a degree?
-- edit
KennyTM's test was sufficiently convincing for me. boost::function doesn't seem to be that much slower than a vcall for my own purposes. Thanks.
As a very special case, consider calling an empty function 10⁸ times.
Code A:
struct X {
virtual ~X() {}
virtual void do_x() {};
};
struct Y : public X {}; // for the paranoid.
int main () {
Y* x = new Y;
for (int i = 100000000; i >= 0; -- i)
x->do_x();
delete x;
return 0;
}
Code B: (with boost 1.41):
#include <boost/function.hpp>
struct X {
void do_x() {};
};
int main () {
X* x = new X;
boost::function<void (X*)> f;
f = &X::do_x;
for (int i = 100000000; i >= 0; -- i)
f(x);
delete x;
return 0;
}
Compile with g++ -O3, then time with time:
Code A takes 0.30 seconds.
Code B takes 0.54 seconds.
Inspecting the assembly code, it seems that the slowness may be due to exception handling and to checking whether f is empty (NULL). But given that the price of one boost::function call is only about 2.4 nanoseconds on my 2 GHz machine ((0.54 - 0.30) s over roughly 10⁸ calls), the actual code in your do_x() could easily overshadow it. I would say it's not a reason to avoid boost::function.