I've been refactoring a codebase for an embedded chip that uses a lot of convenient overloaded functions with default parameters, like this:
int state, prevState;
float time;
void setState(int newState, float newTime)
{
if (newState != state)
{
prevState = state;
time = newTime;
}
state = newState;
}
inline void setState(int newState)
{
setState(newState, time);
}
In its current implementation, the second function is manually optimized:
void setState(int newState)
{
if (newState != state)
prevState = state;
state = newState;
}
If I use the new implementation (the one with inline) is there a way for the compiler to recognize and remove the code involving time, or is the old manual way the best practice?
I've tried GCC on Godbolt's Compiler Explorer, but I can't find a code arrangement or compile flag that neither obfuscates everything nor leaves the calls in place.
GCC generates exactly the same code for both variants with -O3: godbolt.
Note that the inline keyword is at best only a suggestion in the compiler's decision about inlining. Its main effect is something completely different.
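To see why the optimizer has an easy job here, it helps to write out by hand what the one-argument overload looks like once the two-argument call is inlined. This is only a sketch: the namespace sketch wrapper and the name setStateAfterInlining are assumptions made for illustration, not compiler output. After inlining, the store becomes time = time, which has no observable effect for a plain (non-volatile) float, so what remains is the manually optimized version.

```cpp
namespace sketch {

int state = 0, prevState = 0;
float time = 0.0f;

void setState(int newState, float newTime)
{
    if (newState != state)
    {
        prevState = state;
        time = newTime;
    }
    state = newState;
}

// Convenience overload: forwards the current time unchanged.
inline void setState(int newState)
{
    setState(newState, time);
}

// Hand-written picture of the overload above after inlining (hypothetical name).
inline void setStateAfterInlining(int newState)
{
    if (newState != state)
    {
        prevState = state;
        // time = time;  // self-assignment with no observable effect; can be dropped
    }
    state = newState;
}

} // namespace sketch
```

If time were declared volatile, the store would be observable and the compiler would have to keep it.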
I have a piece of code with this structure:
__forceinline void problemFunction(bool condition, int & value)
{
if (condition)
{
value = 0;
return;
}
// ...
// a lot of calculations
// ...
}
void caller()
{
bool condition;
int value;
// ...
problemFunction(condition, value);
someOtherStuff();
}
But after building Release configuration with optimization turned on in Disassembly I get something like this:
void caller()
{
bool condition;
int value;
// ...
if (condition)
value = 0;
else
goto CalculateLabel;
ReturnProblemFunctionLabel:
someOtherStuff();
goto ReturnLabel;
CalculateLabel:
// ...
// a lot of calculations
// ...
goto ReturnProblemFunctionLabel;
ReturnLabel:
}
problemFunction was split into two parts, and the problem is that the second part is located after the someOtherStuff function call.
How can I locally suppress this kind of optimization?
I am using Visual Studio 2019 Version 16.4.2 on Windows 10.
In C++20 you can mark branches with the likely/unlikely attribute (see this question and cppreference), which can give the compiler a hint on how to better optimize the code. In your original post I'm assuming that the condition passed to problemFunction is usually false, which would mean an unnecessary jump in most cases.
As you can see on godbolt, if you mark your if statement with [[unlikely]] g++ will output your desired result, but msvc will not change the generated code. Note that this example is just a basic demo. Compiling your actual program may give different results.
Also note that jumps do not necessarily mean worse performance, because of branch prediction. You have to measure your execution time to make any meaningful conclusions.
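As a rough illustration of the attribute syntax, here is a minimal sketch of problemFunction with the early-exit branch marked [[unlikely]]. It assumes a C++20 compiler, and the loop is a made-up stand-in for "a lot of calculations". The attribute is only a hint, so the resulting code layout still depends on the compiler.

```cpp
inline void problemFunction(bool condition, int& value)
{
    if (condition) [[unlikely]]   // C++20 hint: this branch is rarely taken
    {
        value = 0;
        return;
    }

    // Hypothetical stand-in for the expensive calculations.
    unsigned acc = static_cast<unsigned>(value);
    for (unsigned i = 0; i < 100; ++i)
        acc = acc * 31u + i;      // unsigned arithmetic wraps, so no overflow UB
    value = static_cast<int>(acc % 1000u);
}
```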
Question: is it OK to rely on compiler optimizations while coding?
Let's say I need to calculate calculateF and calculateG, which both depend on another value returned by getValue. Sometimes I need both values; at other times I only need one of them.
// some function
double getValue(double value)
{
double val(0.0);
// do some math with value
return val;
}
// calculateF depends on getValue
double calculateF(double value)
{
double f(0.0);
auto val = getValue(value);
// calculate f which depends on val (and value)
return f;
}
// calculateG depends on getValue
double calculateG(double value)
{
double g(0.0);
auto val = getValue(value);
// calculate g which depends on val (and value)
return g;
}
Now, I could write this more elegantly:
std::pair<double,double> calculateFG(double value)
{
auto val = getValue(value);
double f(0.0), g(0.0);
// calculate f and g which depend on val (and value)
return {f,g};
}
If I want both values:
double value(5.3);
auto [f,g] = calculateFG(value); // since C++17
// do things with f and g
If I want only 1 value, say f, I just don't use g and it will be optimized out. So, the performance of calculateFG is exactly the same as calculateF if I don't use g. Furthermore, if I need both f and g, I only need to call getValue once instead of twice.
The code is cleaner (only 1 function calculateFG instead of calculateF and calculateG), and faster if both f and g are required. But is relying on the compiler optimization a wise choice?
It is hard to say whether it is wise, because it depends on one compiler optimization in particular: function inlining.
If calculateFG is inlined, the compiler can optimize out the unused result. Once inlined, g is unused, so all the code that computes g is dead code[1]. (The compiler may not be able to remove it if, for example, the calculation has side effects.)
If it is not inlined, I don't think the optimization can be applied, so both f and g are always calculated.
Now you may wonder if it is possible to always inline specific functions.
Please note that the inline keyword does not force the compiler to inline the function. It is just a hint. With or without the keyword, it is the compiler's call. There does seem to be a non-standard way, though: How do I force gcc to inline a function?
[1] Relevant compiler options: -fdce -fdse -ftree-dce -ftree-dse
Modern C++ compilers are pretty good at optimization choices, given the chance.
That is to say, if you declare a function inline, that does not mean the optimizer will actually inline it 100% of the time. The effect is more subtle: inline relaxes the One Definition Rule, so the function definition can go into header files and appear in several translation units. That makes it a lot easier for the optimizer, because the function body is visible at every call site.
Now with your example of auto [f,g], optimizers are very good at tracking the use of simple scalar values, and will be able to eliminate write-only operations. Inlining allows the optimizer to eliminate unnecessary writes in called functions too. For you, that means the optimizer can eliminate writes to f in calculateFG when the calling code does not use f later on.
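A minimal sketch of the situation described above, with made-up arithmetic standing in for the real calculations (the formulas and the helper name onlyF are assumptions for illustration). Because calculateFG is defined inline and visible to the caller, an optimizing build is free to drop the computation of g inside onlyF. Whether a particular compiler does so is best confirmed by inspecting the generated assembly.

```cpp
#include <utility>

inline double getValue(double value)
{
    return value * value + 1.0;   // placeholder math
}

inline std::pair<double, double> calculateFG(double value)
{
    const double val = getValue(value);
    const double f = val * 2.0 + value;   // placeholder for the real f
    const double g = val * 3.0 - value;   // placeholder for the real g
    return {f, g};
}

// Only f is used here; after inlining, the computation of g is a dead store.
double onlyF(double value)
{
    return calculateFG(value).first;
}
```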
Perhaps it is best to turn the logic inside-out. Instead of computing a value (getValue()), passing it to both calculateF() and calculateG(), and passing the results to another place, you can change the code to pass the functions instead of computed values.
This way, if the client code does not need calculateF's value, it won't call it. The same with calculateG. If getValue is also expensive, you can call it once and bind or capture the value.
These are concepts used extensively in functional programming paradigm.
You could rewrite your calculateFG() function more or less like this:
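// Requires <utility>. calculateF and calculateG are assumed here to take the precomputed val.
auto getFG(double value)
{
auto val = getValue(value);
// A braced list cannot be deduced as an auto return type, so build the pair explicitly.
return std::make_pair(
[val]{ return calculateF(val); },
[val]{ return calculateG(val); });
}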
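For reference, a self-contained sketch of this "return the functions, not the values" idea might look like the following. The helpers calculateFFromVal and calculateGFromVal and their formulas are hypothetical; they stand for versions of the calculations that accept the already computed val. It assumes C++14 or later for the deduced return type.

```cpp
#include <utility>

inline double getValue(double value)        { return value * value + 1.0; } // placeholder math
inline double calculateFFromVal(double val) { return val * 2.0; }           // hypothetical helper
inline double calculateGFromVal(double val) { return val * 3.0; }           // hypothetical helper

// Computes val once and returns two deferred calculations that capture it.
inline auto getFG(double value)
{
    const double val = getValue(value);
    return std::make_pair(
        [val] { return calculateFFromVal(val); },
        [val] { return calculateGFromVal(val); });
}
```

A caller that needs only f writes auto fg = getFG(x); double f = fg.first(); and the g calculation is never run, with no reliance on the optimizer.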
It sounds like your goal is to only perform the (potentially expensive) calculations of getValue(), f, and g as few times as possible given the caller's needs -- i.e. you don't want to perform any computations that the caller isn't going to use the results of.
In that case, it might be simplest to just implement a little class that does the necessary on-demand computations and caching, something like this:
#include <stdio.h>
#include <math.h>
class MyCalc
{
public:
MyCalc(double inputValue)
: _inputValue(inputValue), _vCalculated(false), _fCalculated(false), _gCalculated(false)
{
/* empty */
}
double getF() const
{
if (_fCalculated == false)
{
_f = calculateF();
_fCalculated = true;
}
return _f;
}
double getG() const
{
if (_gCalculated == false)
{
_g = calculateG();
_gCalculated = true;
}
return _g;
}
private:
const double _inputValue;
double getV() const
{
if (_vCalculated == false)
{
_v = calculateV();
_vCalculated = true;
}
return _v;
}
mutable bool _vCalculated;
mutable double _v;
mutable bool _fCalculated;
mutable double _f;
mutable bool _gCalculated;
mutable double _g;
// Expensive math routines below; we only want to call these (at most) one time
double calculateV() const {printf("calculateV called!\n"); return _inputValue*sin(2.14159);}
double calculateF() const {printf("calculateF called!\n"); return getV()*cos(2.14159);}
double calculateG() const {printf("calculateG called!\n"); return getV()*tan(2.14159);}
};
// unit test/demo
int main()
{
{
printf("\nTest 1: Calling only getF()\n");
MyCalc c(1.5555);
printf("f=%f\n", c.getF());
}
{
printf("\nTest 2: Calling only getG()\n");
MyCalc c(1.5555);
printf("g=%f\n", c.getG());
}
{
printf("\nTest 3: Calling both getF and getG()\n");
MyCalc c(1.5555);
printf("f=%f g=%f\n", c.getF(), c.getG());
}
return 0;
}
I think that it's best to write your code in a way that expresses what you are trying to accomplish.
If your goal is to make sure that certain calculations are only done once, use something like Jeremy's answer.
A good function should do only one thing. I would design it like this:
class Calc {
public:
Calc(double value) : value{value}, val{getValue(value)} {
}
double calculateF() const;
double calculateG() const;
//If it is really a common usecase to call both together
std::pair<double, double> calculateFG() const {
return {calculateF(), calculateG()};
}
static double getValue(double value);
private:
double value;
double val;
};
Whether the compiler will optimize this depends on the rest of the code. For example, a debug message like log_debug(...) could prevent dead code removal. The compiler can only get rid of dead code if it can prove at compile time that the code has no side effects (even if you force inlining).
Another option is to mark the getValue function with compiler-specific attributes such as pure or const. This allows the compiler to optimize away the second call to getValue. https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-g_t_0040code_007bpure_007d-function-attribute-3348
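A small sketch of how the attribute might be applied, assuming GCC or Clang (the macro name SKETCH_CONST_FN, the helper calculateBoth, and the placeholder math are made up for illustration). In GCC's terminology, const promises that the result depends only on the arguments, while pure additionally allows reading global memory. With either one, the compiler is permitted to merge the two identical calls into one.

```cpp
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_CONST_FN __attribute__((const))
#else
#define SKETCH_CONST_FN   // other compilers: no attribute, behavior is unchanged
#endif

// Declaration only: the attribute tells callers the function has no side effects.
SKETCH_CONST_FN double getValue(double value);

// Two identical calls; with the attribute the compiler may evaluate getValue once.
double calculateBoth(double value)
{
    return getValue(value) + getValue(value);
}

// The definition would normally live in another translation unit.
double getValue(double value)
{
    return value * value + 1.0;   // placeholder math
}
```

The attribute is a promise from the programmer. If getValue actually had side effects, the optimized program could behave differently.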
I want to confirm whether a certain kind of optimization that came to my mind is possible.
// test.h
class Test
{
public:
static void Main();
private:
__forceinline static bool func1()
{
return ((externalCond1 && externalCond2) ? true : false);
}
};
// test.cpp
#include "test.h"
void Test::Main()
{
if(func1() == true)
{
//Do something
}
}
Would the condition at Main be optimized away thanks to the inlined func1, and prevent it from actually being tested? (resulting in only testing the conditions within func1).
This is only an example code. But, since my real inlined function is about that short anyways, I would simply copy the conditions within func1 to all places that wanted to call this function, if it turns out that this optimization is impossible.
Finally, I would like to know (only if the optimization is possible) whether it would simply be a case of the "Return Value Optimization" paradigm.
If the __forceinline attribute is honored, your code is equivalent to
void Test::Main()
{
if (((externalCond1 && externalCond2) ? true : false) == true)
{
//Do something
}
}
If your conditions are external in the sense of being variables external to this compilation unit, then they cannot be optimized away as their values are unknown at compile-time.
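To make the equivalence concrete, here is a small sketch with two plain globals standing in for the external conditions (the names mainOriginal, mainSimplified and hits are assumptions for illustration). After inlining, compilers typically fold (c ? true : false) == true down to c, so the two functions below should end up testing the same thing: the two conditions are still read and tested at run time, but there is no separate test of func1's return value.

```cpp
namespace sketch {

bool externalCond1 = false;   // stand-ins for conditions defined elsewhere
bool externalCond2 = false;
int hits = 0;                 // counts how often the "Do something" body runs

inline bool func1()
{
    return (externalCond1 && externalCond2) ? true : false;
}

// As written in the question.
void mainOriginal()
{
    if (func1() == true)
        ++hits;
}

// What the condition reduces to once func1 is inlined and simplified.
void mainSimplified()
{
    if (externalCond1 && externalCond2)
        ++hits;
}

} // namespace sketch
```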
Let's say you have a function in C/C++, that behaves a certain way the first time it runs. And then, all other times it behaves another way (see below for example). After it runs the first time, the if statement becomes redundant and could be optimized away if speed is important. Is there any way to make this optimization?
bool val = true;
void function1() {
if (val == true) {
// do something
val = false;
}
else {
// do other stuff, val is never set to true again
}
}
gcc has a builtin function that lets you inform the implementation about branch prediction:
__builtin_expect
http://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
For example in your case:
bool val = true;
void function1()
{
if (__builtin_expect(val, 0)) {
// do something
val = false;
}
else {
// do other stuff, val is never set to true again
}
}
You should only make the change if you're certain that it truly is a bottleneck. With branch prediction, the if statement is probably close to free, since it follows a very predictable pattern.
That said, you can use callbacks:
#include <iostream>
using namespace std;
typedef void (*FunPtr) (void);
FunPtr method;
void subsequentRun()
{
std::cout << "subsequent call" << std::endl;
}
void firstRun()
{
std::cout << "first run" << std::endl;
method = subsequentRun;
}
int main()
{
method = firstRun;
method();
method();
method();
}
produces the output:
first run
subsequent call
subsequent call
You could use a function pointer but then it will require an indirect call in any case:
void firstCall();
void otherCalls();
// Starts out pointing at the first-call version.
void (*yourFunction)(void) = &firstCall;
void firstCall() {
// ...
yourFunction = &otherCalls;
}
void otherCalls() {
// ...
}
int main()
{
yourFunction();
}
One possible method is to compile two different versions of the function (this can be done from a single function in the source with templates), and use a function pointer or object to decide at runtime. However, the pointer overhead will likely outweigh any potential gains unless your function is really expensive.
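A minimal sketch of that idea, assuming C++17 for if constexpr. All names here (functionImpl, current, the two counters) are made up for illustration. A single template is the source for both versions, and a function pointer selects which instantiation runs.

```cpp
namespace sketch {

int firstRuns = 0;   // counters stand in for "do something" / "do other stuff"
int laterRuns = 0;

using FunPtr = void (*)();

template <bool FirstRun>
void functionImpl();

// Starts out pointing at the first-run instantiation.
FunPtr current = &functionImpl<true>;

template <bool FirstRun>
void functionImpl()
{
    if constexpr (FirstRun) {
        ++firstRuns;                      // first-call behavior
        current = &functionImpl<false>;   // all later calls skip this branch entirely
    } else {
        ++laterRuns;                      // steady-state behavior, no flag test
    }
}

// Callers pay for one indirect call instead of one branch.
inline void function1() { current(); }

} // namespace sketch
```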
You could use a static member variable instead of a global variable.
Or, if the code you're running the first time changes something for all future uses (e.g., opening a file), you could use that change as the check that determines whether to run the code (i.e., check whether the file is open). This would save you the extra variable. It might also help with error checking: if for some reason the initial change is undone by another operation (e.g., the file is on removable media that is removed improperly), your check could try to redo the change.
A compiler can only optimize what is known at compile time.
In your case, the value of val is only known at runtime, so it can't be optimized.
The if test is very quick, you shouldn't worry about optimizing it.
If you'd like to make the code a little bit cleaner you could make the variable local to the function using static:
void function() {
static bool firstRun = true;
if (firstRun) {
firstRun = false;
...
}
else {
...
}
}
On entering the function for the first time, firstRun would be true, and it would persist so each time the function is called, the firstRun variable will be the same instance as the ones before it (and will be false each subsequent time).
This could be used well with @ouah's solution.
Compilers like g++ (and I'm sure msvc) support generating profile data upon a first run, then using that data to better guess what branches are most likely to be followed, and optimizing accordingly. If you're using gcc, look at the -fprofile-generate option.
The expected behavior is that the compiler will lay out that if statement so that the else branch comes first, thus avoiding the jmp on all your subsequent calls and making it pretty much as fast as if the test weren't there, especially if you return somewhere in that else (thus avoiding having to jump past the 'if' block).
One way to make this optimization is to split the function in two. Instead of:
void function1()
{
if (val == true) {
// do something
val = false;
} else {
// do other stuff
}
}
Do this:
void function1()
{
// do something
}
void function2()
{
// do other stuff
}
One thing you can do is put the logic into the constructor of an object, which is then defined static. If such a static object occurs in a block scope, the constructor is run the first time that scope is executed. The once-only check is emitted by the compiler.
You can also put static objects at file scope, and then they are initialized before main is called.
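A short sketch of the block-scope variant, with counters standing in for the real work (the names FirstCallInit, initRuns and bodyRuns are made up for illustration). Note one difference from the original if/else: on the first call, both the constructor and the rest of the function body run.

```cpp
namespace sketch {

int initRuns = 0;   // how many times the one-time logic ran
int bodyRuns = 0;   // how many times the regular body ran

struct FirstCallInit
{
    FirstCallInit() { ++initRuns; }   // the "do something" part, runs once
};

void function1()
{
    // Constructed the first time control passes through this declaration;
    // the compiler emits the once-only guard.
    static FirstCallInit init;
    (void)init;

    ++bodyRuns;                       // the "do other stuff" part, runs every call
}

} // namespace sketch
```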
I'm giving this answer because perhaps you're not making effective use of C++ classes.
(Regarding C/C++, there is no such language. There is C and there is C++. Are you working in C that has to also compile as C++ (sometimes called, unofficially, "Clean C"), or are you really working in C++?)
What is "Clean C" and how does it differ from standard C?
To remain compiler-independent, you can put the body of the if branch in one function and the body of the else branch in another. Almost all compilers optimize if/else, and here the else branch is the likely one, so keep the occasionally executed code in the if and move the rest into a separate function that is called from the else.
Suppose I have a function named caller, which will call a function named callee:
void caller()
{
callee();
}
Now caller might be called many times in the application, and you want to make sure callee is only called once (a kind of lazy initialization). You could implement this using a flag:
void caller()
{
static bool bFirst = true;
if(bFirst)
{
callee();
bFirst = false;
}
}
My concern is that this needs more code, and it adds one more check to every call of caller.
A better solution, to me, is as follows (suppose callee returns int):
void caller()
{
static int ret = callee();
}
But this can't handle the case where callee returns void. My solution is to use the comma operator:
void caller()
{
static int ret = (callee(), 1);
}
But the problem with this is that the comma operator is not commonly used, and people may get confused when they see this line of code, which causes problems for maintenance.
Do you have any good ideas for making sure a function is only called once?
You could use this:
void caller()
{
static class Once { public: Once(){callee();}} Once_;
}
Thread-safe:
static boost::once_flag flag = BOOST_ONCE_INIT;
boost::call_once([]{callee();}, flag);
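If Boost is not available, one alternative (a different technique from boost::call_once, named here plainly) is to rely on the language itself: since C++11, initialization of a block-scope static is thread-safe, so an immediately invoked lambda as the initializer runs callee exactly once even with concurrent callers. A minimal sketch, with a counter standing in for the real callee:

```cpp
namespace sketch {

int calleeRuns = 0;               // stand-in for the real work

void callee() { ++calleeRuns; }

void caller()
{
    // The lambda runs during the one-time initialization of `once`;
    // the language guarantees that initialization happens exactly once.
    static const bool once = [] { callee(); return true; }();
    (void)once;                   // silence unused-variable warnings
}

} // namespace sketch
```

This also handles a callee that returns void, without the comma operator.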
You could hide the function through a function pointer.
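static void real_function();
static void noop_function()
{
}
// Callers invoke function(); it initially points at real_function.
static void (*function)(void) = real_function;
static void real_function()
{
//do stuff
function = noop_function;
}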
Callers just call the function which will do the work the first time, and do nothing on any subsequent calls.
Your first variant with a boolean flag bFirst is nothing more than an explicit manual implementation of what the compiler will do for you implicitly in your other variants.
In other words, in a typical implementation, all of the variants you have presented so far will contain an additional check of a boolean flag in the generated machine code. The performance of all these variants will be the same (if that's your concern). The extra code in the first variant might look less elegant, but that doesn't seem to be a big deal to me. (Wrap it.)
Anyway, what you have as your first variant is basically how it is normally done (until you start dealing with such issues as multithreading etc.)
Inspired by some of the answers, I think just using a macro to wrap the comma expression would also make the intention clear:
#define CALL_ONCE(func) do {static bool dummy = (func, true);} while(0)
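A usage sketch for a macro of this shape. The parameter is renamed to expr here (an assumption, to stress that the argument must be a full call expression such as callee(), since passing just the function name would not call anything), and a (void) cast is added to avoid unused-variable warnings. The counter is a stand-in for the real work.

```cpp
// Evaluates `expr` once, during the one-time initialization of the static flag.
#define CALL_ONCE(expr) \
    do { static bool dummy = ((expr), true); (void)dummy; } while (0)

namespace sketch {

int calleeRuns = 0;               // stand-in for the real work

void callee() { ++calleeRuns; }

void caller()
{
    CALL_ONCE(callee());          // note the parentheses: pass the call, not the name
}

} // namespace sketch
```

Each expansion of the macro has its own static flag, so two CALL_ONCE lines in different places each run their expression once.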