can compiler reorganize instructions over sleep call? - c++

Is there a difference if it is the first use of the variable or not. For example are a and b treated differently?
void f(bool&a, bool& b)
{
...
a=false;
boost::this_thread::sleep...//1 sec sleep
a=true;
b=true;
...
}
EDIT: people asked why I want to know this.
1. I would like to have some way to tell the compiler not to optimize (swap the order of execution of the instructions) in some function, and using atomics and/or mutexes is much more complicated than using sleep (and in my case sleeping is not a performance problem).
2. Like I said this is generally important to know.

We can't really tell. One scenario could be that the compiler has full visibility into your function at the call site (and possibly inlines it), in which case it can merge your function with the caller and then optimize accordingly.
It could then e.g. completely optimize away a and b because there is no code that depends on a and b. Or it might see that you violate aliasing rules so that a and b refer to the same entity, and then merge them according to your program flow.
But it could also be that you tell the compiler to not optimize at all, e.g. with g++'s -O0 flag, in which case not much will happen.
The only proof for your particular platform *, can be made by looking at the generated assembly, or by telling the compiler to please output some log about what it optimizes (g++ has many flags for that).
* compiler+flags used to compile compiler+version+add-ons, hardware, operating system; even the weather might be relevant if your compiler omits some optimizations when it takes too long [which would actually be a cool feature for debug builds, imho]

They are not local (because they are references), so it can't: it cannot tell whether the called function sees them, and has to assume that it does. If they were local variables, it could, because local variables are not visible to the called function unless a pointer or reference to them was created.
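If the goal is a guaranteed order of the stores rather than an order that merely happens to survive because the sleep call is opaque, a minimal sketch with std::atomic could look like this. The name set_flags is made up for illustration, std::this_thread::sleep_for stands in for the boost call, and the sleep is shortened to 10 ms.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical variant of f() from the question. The flags are atomics, and
// the default (sequentially consistent) stores cannot be observed by another
// thread in an order that contradicts program order.
void set_flags(std::atomic<bool>& a, std::atomic<bool>& b)
{
    a.store(false);
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    a.store(true);
    b.store(true);   // a thread that loads b == true will then load a == true
}
```

The sleep plays no role in the ordering here; it is kept only to mirror the original function.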

Related

Can the extra conditional check at the call site be optimised out by the compiler easily?

In the case where you need to check the return value at the call site, is it easy for the compiler to optimise it out if the value is checked in the function itself? Does it make a difference whether the function is inline? I tried looking at the assembly code to check for jumps but I'm afraid I don't understand it at all. I'm talking about a situation like this:
int* try_get()
{
static int anint;
anint = rand() % 2;
if (anint) return &anint;
else return nullptr;
}
int main()
{
int* p = try_get();
if (p) // The value was already tested in the function.
// Is optimisation of this easy? Does it depend on whether the function is inline?
{
std::cout << "Hello";
}
}
A C++ compiler is allowed to perform any optimization that has no observable effects, however the C++ standard does not require any C++ compiler to perform any such optimization (except those that are required by the C++ specification itself, such as mandatory copy elision). Except for the required optimizations, everything else is entirely at your C++ compiler's discretion.
If the compiler has access both to the function definition and its call site, and the compiler can work out that this particular optimization has no observable effects, then the compiler can certainly optimize it out. Whether your compiler will do that can only be answered by looking at your compiler's compiled code. And even after determining what your compiler actually does, that will not, of course, have any bearing on what any other compiler would do.
Whether the function in question is inline may or may not be a factor that your compiler considers when deciding whether to perform this optimization.
And, finally, even looking at what your compiler produced, for a particular translation unit, may not even paint the entire picture as well. Many current C++ compilers feature link-time optimizations, where the combined mighty forces of the compiler and the linker produce additional optimizations and code transformations in the final, linked executable.
So the only definitive answer here is to go actually look at the actual linked code in your final executable, in order to figure out whether any particular optimization took place, and, of course, that is a highly technical matter.

If a function is only called from one place, is it always better to inline it? [duplicate]

This question already has answers here:
When to use the inline function and when not to use it?
If a function is only used in one place and some profiling shows that it's not being inlined, will there always be a performance advantage in forcing the compiler to inline it?
Obviously "profile and see" (and in the case of the function in question, it did prove to be a small perf boost). I'm mostly asking out of curiosity -- are there any performance disadvantages to this with a reasonably smart compiler?
No, there are notable exceptions. Take this code for example:
void do_something_often(void) {
x++;
if (x == 100000000) {
do_a_lot_of_work();
}
}
Let's say do_something_often() is called very often and from many places. do_a_lot_of_work() is called very rarely (one out of every one hundred million calls). Inlining do_a_lot_of_work() into do_something_often() doesn't gain you anything. Since do_something_often() does almost nothing, it would be much better if it got inlined into the functions that call it, and in the rare case that they need to call do_a_lot_of_work(), they call it out of line. In that way, they are saving a function call almost every time, and saving code bloat at every call site.
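A rough sketch of that split is shown below. The threshold parameter and the big_jobs_done counter are additions for illustration only, and noinline and cold are GCC/Clang attributes rather than standard C++.

```cpp
// Shared counter, standing in for the undeclared x in the snippet above.
static long x = 0;
static long big_jobs_done = 0;   // hypothetical, only to make the effect visible

// Rarely needed: ask GCC/Clang to keep it out of line and off the hot path.
#if defined(__GNUC__)
__attribute__((noinline, cold))
#endif
static void do_a_lot_of_work()
{
    ++big_jobs_done;   // placeholder for the expensive part
}

// Tiny and called from many places: a good candidate for inlining.
static inline void do_something_often(long threshold)
{
    ++x;
    if (x == threshold) {
        do_a_lot_of_work();   // stays an ordinary call at every call site
    }
}
```

Whether the compiler actually inlines do_something_often() is still its own decision; the attributes only keep the rare path from being pulled in with it.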
One legitimate case where it makes sense not to inline a function, even if it's only called from a single location, is if the call to the function is rare and almost always skipped. Keeping the instructions before the function call and the instructions after the function call closely together in memory may allow those instructions to be kept in the processor cache, when that would be impossible if those blocks of instructions were separated in memory.
It would still be possible for the compiler to compile the function call as if using goto, avoiding having to keep track of a return address, but if the compiler has already determined that the function call is rare, then it makes sense not to spend as much time optimising that call.
You can't "force" the compiler to inline it, unless you are considering some implementation-specific tools that you have not mentioned, so the question is entirely moot.
If your compiler is already not doing so then it has a reason.
If the function is called only once, there should be no performance disadvantages in inlining it. However, that does not mean you should blindly inline all functions. For example, if the code in question is Linux kernel code and you're using the BUG_ON or WARN_ON statement to print a stack trace, you don't get the full stack trace which includes the inline function. Instead, the stack trace contains only the name of the calling function.
And, as the other answer explained, the "inline" doesn't actually force the compiler to inline the function, it just is a hint to the compiler. However, there is actually an attribute __attribute__((always_inline)) in GCC which should force the compiler to inline the function.
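As a sketch of how such an attribute is typically wrapped, assuming GCC or Clang (with MSVC's __forceinline as a fallback), one might write the following. FORCE_INLINE, square and sum_of_squares are hypothetical names.

```cpp
// FORCE_INLINE is a made-up macro name; what it expands to is a compiler
// extension in each case, not standard C++.
#if defined(__GNUC__)            // GCC and Clang
#  define FORCE_INLINE inline __attribute__((always_inline))
#elif defined(_MSC_VER)          // MSVC
#  define FORCE_INLINE __forceinline
#else
#  define FORCE_INLINE inline    // fall back to the plain hint
#endif

FORCE_INLINE int square(int v)
{
    return v * v;
}

int sum_of_squares(int a, int b)
{
    // With the attribute in effect, both calls are expanded in place.
    return square(a) + square(b);
}
```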
Make sure that the function definition is not exported. If it is, it obviously needs to be compiled, and that means that if your function is big probably the call will not be inlined. (Remember, it's the call that gets inlined, not the function. A function might get inlined in one place and called in another, etc.)
So even if you know that the function is called only from one place, the compiler might not. Make sure to hide the definition of your function from the other object files, for example by defining it in the anonymous namespace.
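A minimal sketch of the anonymous-namespace approach, with made-up function names:

```cpp
namespace {

// Internal linkage: no other translation unit can call this, so the compiler
// can see every caller and need not emit a standalone copy if it inlines them.
int scale(int v)
{
    return v * 3;
}

} // namespace

// The only caller of scale() in the whole program.
int only_caller(int v)
{
    return scale(v) + 1;
}
```

Declaring the function static has the same effect on linkage.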
That being said, even if it is called from only one place, it does not mean that it is always a good idea to inline it. If your function is called rarely, it might waste a lot of memory in the CPU cache.
Depending on how you wrote your function.
In some cases, yes!
void doSomething(int *src, int *dst,
const int loopCounterInner, const int loopCounterOuter)
{
int i, j;
for(i=0; i<loopCounterOuter; i++){
for(j=0; j<loopCounterInner; j++){
*dst = someCalculations(*src);
src++;
dst++;
}
}
}
In this example, if this function is compiled as non-inlined, then the compiler basically has no knowledge about the trip count of the two loops. This is a big deal for implementations that rely strongly on compile-time optimizations.
I came across an even worse case: the compiler assumed loopCounterInner to be a large value and optimized for that case, but loopCounterInner was actually 3 or 5, so the best choice would have been to fully unroll the inner loop!
For C++ probably the best way to do it is to make them template parameters, but for C, the only way to generate differently optimized code for different use cases is to inline the function.
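A sketch of the template approach for C++ follows. The body of someCalculations() is a stand-in, since the answer leaves it undefined, and only the inner trip count is made a template parameter here.

```cpp
// Stand-in for the someCalculations() that the answer does not define.
inline int someCalculations(int v)
{
    return v * 2;
}

// The inner trip count is a template parameter, so every instantiation sees
// a compile-time constant that the optimizer is free to unroll completely.
template <int LoopCounterInner>
void doSomething(const int* src, int* dst, int loopCounterOuter)
{
    for (int i = 0; i < loopCounterOuter; ++i) {
        for (int j = 0; j < LoopCounterInner; ++j) {
            *dst++ = someCalculations(*src++);
        }
    }
}
```

A caller that knows the inner count is 3 would write doSomething<3>(src, dst, outerCount), and a second use case with a count of 5 gets its own separately optimized instantiation.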
No, if the code is a rarely used function then keeping it off the 'hot path' will be beneficial. An inline function will use up cache space [instruction cache] whether or not the code is actually used. Tools like LTCG combined with Profile Guided optimisation (in the MSFT world, not sure about Linux) go to great pains to keep rarely used code off the hot path and this can make a significant difference

Is there a reason why not to use link-time optimization (LTO)?

GCC, MSVC, LLVM, and probably other toolchains have support for link-time (whole program) optimization to allow optimization of calls among compilation units.
Is there a reason not to enable this option when compiling production software?
I assume that by "production software" you mean software that you ship to the customers / goes into production. The answers at Why not always use compiler optimization? (kindly pointed out by Mankarse) mostly apply to situations in which you want to debug your code (so the software is still in the development phase -- not in production).
6 years have passed since I wrote this answer, and an update is necessary. Back in 2014, the issues were:
Link time optimization occasionally introduced subtle bugs, see for example Link-time optimization for the kernel. I assume this is less of an issue as of 2020. Safeguard against these kinds of compiler and linker bugs: Have appropriate tests to check the correctness of your software that you are about to ship.
Increased compile time. There are claims that the situation has significantly improved since 2014, for example thanks to slim objects.
Large memory usage. This post claims that the situation has drastically improved in recent years, thanks to partitioning.
As of 2020, I would try to use LTO by default on any of my projects.
This recent question raises another possible (but rather specific) case in which LTO may have undesirable effects: if the code in question is instrumented for timing, and separate compilation units have been used to try to preserve the relative ordering of the instrumented and instrumenting statements, then LTO has a good chance of destroying the necessary ordering.
I did say it was specific.
If you have well-written code, it should only be advantageous. You may hit a compiler/linker bug, but that goes for all types of optimisation, and it is rare.
The biggest downside is that it drastically increases link time.
Apart from this,
Consider a typical example from embedded system,
void function1(void) { /*Do something*/} //located at address 0x1000
void function2(void) { /*Do something*/} //located at address 0x1100
void function3(void) { /*Do something*/} //located at address 0x1200
With predefined addresses, the functions can be called through those fixed addresses, like below,
((void (*)(void))0x1000)(); //expected to call function1
((void (*)(void))0x1100)(); //expected to call function2
((void (*)(void))0x1200)(); //expected to call function3
LTO can lead to unexpected behavior.
updated:
In automotive embedded SW development, multiple parts of the SW are compiled and flashed onto separate sections.
The boot-loader, application(s), and application-configurations are independently flashable units. The boot-loader has special capabilities to update the application and the application-configuration. At every power-on cycle the boot-loader ensures the compatibility and consistency of the SW application and application-configuration via hard-coded locations for SW versions, CRCs and many more parameters. Linker-definition files are used to hard-code the location of variables and of some functions.
Given that the code is implemented correctly, then link time optimization should not have any impact on the functionality. However, there are scenarios where not 100% correct code will typically just work without link time optimization, but with link time optimization the incorrect code will stop working. There are similar situations when switching to higher optimization levels, like, from -O2 to -O3 with gcc.
That is, depending on your specific context (like, age of the code base, size of the code base, depth of tests, are you starting your project or are you close to final release, ...) you would have to judge the risk of such a change.
One scenario where link-time-optimization can lead to unexpected behavior for wrong code is the following:
Imagine you have two source files read.c and client.c which you compile into separate object files. In the file read.c there is a function read that does nothing but read from a specific memory address. The content at this address should be marked as volatile, but unfortunately that was forgotten. In client.c the function read is called several times from the same function. Since read performs only a single read from the address and there is no optimization beyond the boundaries of the read function, every call to read will access the respective memory location. Consequently, every time read is called from client.c, the code in client.c gets a freshly read value from the address, just as if volatile had been used.
Now, with link-time optimization, the tiny function read from read.c is likely to be inlined wherever it is called from client.c. Due to the missing volatile, the compiler will now realize that the code reads several times from the same address, and may therefore optimize away the memory accesses. Consequently, the code starts to behave differently.
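A sketch of what the corrected function could look like, written in C++ syntax here although the scenario is about C files. The names read_reg and read_twice_sum are hypothetical, and the address is passed in rather than hard-coded so the snippet can run on a host.

```cpp
#include <cstdint>

// Hypothetical fixed version of read(): the pointee is volatile, so every
// call has to perform a real load, even after cross-file inlining.
inline std::uint32_t read_reg(const volatile std::uint32_t* addr)
{
    return *addr;
}

// Stand-in for the code in client.c: two reads of the same location.
inline std::uint32_t read_twice_sum(const volatile std::uint32_t* addr)
{
    return read_reg(addr) + read_reg(addr);   // two loads must be emitted
}
```

Without the volatile qualifier the two loads could legally be merged into one once the optimizer sees both calls together.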
Rather than mandating that all implementations support the semantics necessary to accomplish all tasks, the Standard allows implementations intended to be suitable for various tasks to extend the language by defining semantics in corner cases beyond those mandated by the C Standard, in ways that would be useful for those tasks.
An extremely popular extension of this form is to specify that cross-module function calls will be processed in a fashion consistent with the platform's Application Binary Interface without regard for whether the C Standard would require such treatment.
Thus, if one makes a cross-module call to a function like:
uint32_t read_uint32_bits(void *p)
{
return *(uint32_t*)p;
}
the generated code would read the bit pattern in a 32-bit chunk of storage at address p, and interpret it as a uint32_t value using the platform's native 32-bit integer format, without regard for how that chunk of storage came to hold that bit pattern. Likewise, if a compiler were given something like:
uint32_t read_uint32_bits(void *p);
uint32_t f1bits, f2bits;
void test(void)
{
float f;
f = 1.0f;
f1bits = read_uint32_bits(&f);
f = 2.0f;
f2bits = read_uint32_bits(&f);
}
the compiler would reserve storage for f on the stack, store the bit pattern for 1.0f to that storage, call read_uint32_bits and store the returned value, store the bit pattern for 2.0f to that storage, call read_uint32_bits and store that returned value.
The Standard provides no syntax to indicate that the called function might read the storage whose address it receives using type uint32_t, nor to indicate that the pointer the function was given might have been written using type float, because implementations intended for low-level programming already extended the language to support such semantics without using special syntax.
Unfortunately, adding in Link Time Optimization will break any code that relies upon that popular extension. Some people may view such code as broken, but if one recognizes the Spirit of C principle "Don't prevent programmers from doing what needs to be done", the Standard's failure to mandate support for a popular extension cannot be viewed as intending to deprecate its usage if the Standard fails to provide any reasonable alternative.
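Where copying the bytes is acceptable, one way to get the bit pattern that does not depend on the call staying opaque is std::memcpy. The sketch below assumes a 32-bit float, and read_float_bits is a made-up name; whether this counts as a reasonable alternative for a given low-level task is a separate question.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes a 32-bit float");

// Copies the object representation instead of reading the float through a
// uint32_t lvalue, so the result does not rely on the optimizer being unable
// to see into the function.
inline std::uint32_t read_float_bits(const float* p)
{
    std::uint32_t bits;
    std::memcpy(&bits, p, sizeof bits);
    return bits;
}
```

On a platform with IEEE 754 single-precision floats, read_float_bits applied to 1.0f yields 0x3F800000.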
LTO could also reveal edge-case bugs in code-signing algorithms. Consider a code-signing algorithm based on certain expectations about the TEXT portion of some object or module. Now LTO optimizes the TEXT portion away, or inlines stuff into it in a way the code-signing algorithm was not designed to handle. Worst case scenario, it only affects one particular distribution pipeline but not another, due to a subtle difference in which encryption algorithm was used on each pipeline. Good luck figuring out why the app won't launch when distributed from pipeline A but not B.
LTO support is buggy and LTO-related issues have the lowest priority for compiler developers. For example: mingw-w64-x86_64-gcc-10.2.0-5 works fine with LTO, while mingw-w64-x86_64-gcc-10.2.0-6 segfaults with a bogus address. We have just noticed that our Windows CI stopped working.
Please refer to the following issue as an example.

Why do we use volatile keyword? [duplicate]

Possible Duplicate:
Why does volatile exist?
I have never used it, but I wonder why people use it. What exactly does it do? I searched the forum, but I found only C# and Java topics.
Consider this code,
int some_int = 100;
while(some_int == 100)
{
//your code
}
When this program gets compiled, the compiler may optimize this code, if it finds that the program never ever makes any attempt to change the value of some_int, so it may be tempted to optimize the while loop by changing it from while(some_int == 100) to something which is equivalent to while(true) so that the execution could be fast (since the condition in while loop appears to be true always). (if the compiler doesn't optimize it, then it has to fetch the value of some_int and compare it with 100, in each iteration which obviously is a little bit slow.)
However, sometimes, optimization (of some parts of your program) may be undesirable, because it may be that someone else is changing the value of some_int from outside the program which compiler is not aware of, since it can't see it; but it's how you've designed it. In that case, compiler's optimization would not produce the desired result!
So, to ensure the desired result, you need to somehow stop the compiler from optimizing the while loop. That is where the volatile keyword plays its role. All you need to do is this,
volatile int some_int = 100; //note the 'volatile' qualifier now!
In other words, I would explain this as follows:
volatile tells the compiler that,
"Hey compiler, I'm volatile and, you
know, I can be changed by some XYZ
that you're not even aware of. That
XYZ could be anything. Maybe some
alien outside this planet called
program. Maybe some lightning, some
form of interrupt, volcanoes, etc can
mutate me. Maybe. You never know who
is going to change me! So O you
ignorant, stop playing an all-knowing
god, and don't dare touch the code
where I'm present. Okay?"
Well, that is how volatile prevents the compiler from optimizing code. Now search the web to see some sample examples.
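One standard case of a value being changed from outside the normal flow of the program is a signal handler. A minimal sketch, with made-up names and with std::raise standing in for an external event:

```cpp
#include <csignal>

// Written by the signal handler, i.e. from outside the normal control flow.
volatile std::sig_atomic_t some_flag = 0;

extern "C" void on_signal(int)
{
    some_flag = 1;
}

// Spins until the handler has run; volatile forces a fresh load on each pass.
int wait_for_flag()
{
    std::signal(SIGINT, on_signal);
    std::raise(SIGINT);          // stands in for an event the compiler cannot see
    while (some_flag == 0) {
        // busy-wait
    }
    return some_flag;
}
```

Note that volatile only constrains the compiler's handling of these accesses; it is not a substitute for std::atomic when several threads share a variable.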
Quoting from the C++ Standard (§7.1.5.1/8)
[..] volatile is a hint to the
implementation to avoid aggressive
optimization involving the object
because the value of the object might
be changed by means undetectable by an
implementation.[...]
Related topic:
Does making a struct volatile make all its members volatile?
In computer programming, particularly in the C, C++, and C# programming languages, a variable or object declared with the volatile keyword usually has special properties related to optimization and/or threading. Generally speaking, the volatile keyword is intended to prevent the (pseudo)compiler from applying any optimizations on the code that assume values of variables cannot change "on their own." (c) Wikipedia
http://en.wikipedia.org/wiki/Volatile_variable

c++ optimization

I'm working on some existing c++ code that appears to be written poorly, and is very frequently called. I'm wondering if I should spend time changing it, or if the compiler is already optimizing the problem away.
I'm using Visual Studio 2008.
Here is an example:
void someDrawingFunction(....)
{
GetContext().DrawSomething(...);
GetContext().DrawSomething(...);
GetContext().DrawSomething(...);
.
.
.
}
Here is how I would do it:
void someDrawingFunction(....)
{
MyContext &c = GetContext();
c.DrawSomething(...);
c.DrawSomething(...);
c.DrawSomething(...);
.
.
.
}
Don't guess at where your program is spending time. Profile first to find your bottlenecks, then optimize those.
As for GetContext(), that depends on how complex it is. If it's just returning a class member variable, then chances are that the compiler will inline it. If GetContext() has to perform a more complicated operation (such as looking up the context in a table), the compiler probably isn't inlining it, and you may wish to only call it once, as in your second snippet.
If you're using GCC, you can also tag the GetContext() function with the pure attribute. This will allow it to perform more optimizations, such as common subexpression elimination.
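A sketch of how that attribute might be applied, assuming GCC or Clang. MyContext, g_context and the empty parameter lists are placeholders, because the question elides the real ones; in real code the attribute would go on the declaration in a header, with the definition in another file.

```cpp
struct MyContext {
    int draws = 0;
    void DrawSomething() { ++draws; }   // placeholder body
};

static MyContext g_context;   // hypothetical global, for illustration only

// pure: the function has no side effects and its result depends only on its
// arguments and on global memory, so GCC/Clang may merge repeated calls.
#if defined(__GNUC__)
__attribute__((pure))
#endif
MyContext& GetContext()
{
    return g_context;
}

void someDrawingFunction()
{
    GetContext().DrawSomething();
    GetContext().DrawSomething();
    GetContext().DrawSomething();
}
```

The attribute is a promise made to the compiler, not something it checks: if GetContext() ever gains side effects, the annotation has to go.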
If you're sure it's a performance problem, change it. If GetContext is a function call (as opposed to a macro or an inline function), then the compiler is going to HAVE to call it every time, because the compiler can't necessarily see what it's doing, and thus, the compiler probably won't know that it can eliminate the call.
Of course, you'll need to make sure that GetContext ALWAYS returns the same thing, and that this 'optimization' is safe.
If it is logically correct to do it the second way, i.e. calling GetContext() once or multiple times does not affect your program logic, I'd do it the second way even if you profile it and prove that there is no performance difference either way, so the next developer looking at this code will not ask the same question again.
Obviously, if GetContext() has side effects (I/O, updating globals, etc.) then the suggested optimization will produce different results.
So unless the compiler can somehow detect that GetContext() is pure, you should optimize it yourself.
If you're wondering what the compiler does, look at the assembly code.
That is such a simple change, I would do it.
It is quicker to fix it than to debate it.
But do you actually have a problem?
Just because it's called often doesn't mean it's called TOO often.
If it seems qualitatively piggy, sample it to see what it's spending time at.
Chances are excellent that it is not what you would have guessed.