Chandler Carruth introduced two functions in his CppCon 2015 talk that can be used for fine-grained inhibition of the optimizer. They are useful for writing micro-benchmarks that the optimizer won't simply nuke into meaninglessness.
void clobber() {
    asm volatile("" : : : "memory");
}

void escape(void* p) {
    asm volatile("" : : "g"(p) : "memory");
}
These use inline assembly statements to change the assumptions of the optimizer.
The assembly statement in clobber states that the assembly code in it can read and write anywhere in memory. The actual assembly code is empty, but the optimizer won't look into it because it's asm volatile. It believes it when we tell it the code might read and write everywhere in memory. This effectively prevents the optimizer from reordering or discarding memory writes prior to the call to clobber, and forces memory reads after the call to clobber†.
The one in escape additionally makes the pointer p visible to the assembly block. Again, because the optimizer won't look into the actual inline assembly code, that code can be empty, and the optimizer will still assume that the block uses the address pointed to by p. This effectively forces whatever p points to to be in memory and not in a register, because the assembly block might perform a read from that address.
(This is important because the clobber function won't force reads or writes for anything that the compiler decides to put in a register, since the assembly statement in clobber doesn't state that anything in particular must be visible to the assembly.)
All of this happens without any additional code being generated directly by these "barriers". They are purely compile-time artifacts.
These use language extensions supported in GCC and in Clang, though. Is there a way to have similar behaviour when using MSVC?
†To understand why the optimizer has to think this way, imagine if the assembly block were a loop adding 1 to every byte in memory.
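For context, here is roughly how the two barriers are used in practice (a sketch in the spirit of the vector example from the talk; the benchmarked operation and the iteration count are illustrative):

#include <vector>

void escape(void* p);  // as defined above
void clobber();        // as defined above

void benchmark_push_back() {
    for (int i = 0; i < 1000000; ++i) {
        std::vector<int> v;
        v.reserve(1);
        escape(v.data());  // the storage is now "observed", so the allocation can't be elided
        v.push_back(42);
        clobber();         // the write of 42 must actually happen before this point
    }
}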
Given your approximation of escape(), you should also be fine with the following approximation of clobber() (note that this is a draft idea, deferring some of the solution to the implementation of the function nextLocationToClobber()):
// always returns false, but in an undeducible way
bool isClobberingEnabled();

// The challenge is to implement this function in a way
// that will make even the smartest optimizer believe that
// it can deliver a valid pointer pointing anywhere in the heap,
// stack or the static memory.
volatile char* nextLocationToClobber();

const bool clobberingIsEnabled = isClobberingEnabled();
volatile char* clobberingPtr;

inline void clobber() {
    if ( clobberingIsEnabled ) {
        // This will never be executed, but the compiler
        // cannot know about it.
        clobberingPtr = nextLocationToClobber();
        *clobberingPtr = *clobberingPtr;
    }
}
UPDATE
Question: How would you ensure that isClobberingEnabled returns false "in an undeducible way"? Certainly it would be trivial to place the definition in another translation unit, but the minute you enable LTCG, that strategy is defeated. What did you have in mind?
Answer: We can take advantage of a hard-to-prove property from number theory, for example, Fermat's Last Theorem:
#include <cstdint>  // std::uint32_t, std::uintptr_t
#include <cstdlib>  // std::rand
#include <ctime>    // std::clock

bool undeducible_false() {
    // It took mathematicians more than 3 centuries to prove Fermat's
    // last theorem in its most general form. That knowledge has hardly
    // been put into compilers (nor will a compiler try hard enough to
    // check all one million possible combinations below).
    // Caveat: avoid integer overflow (Fermat's theorem
    // doesn't hold for modular arithmetic).
    std::uint32_t a = std::clock() % 100 + 1;
    std::uint32_t b = std::rand() % 100 + 1;
    std::uint32_t c = reinterpret_cast<std::uintptr_t>(&a) % 100 + 1;
    return a*a*a + b*b*b == c*c*c;
}
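With that in place, isClobberingEnabled() from the draft above can simply forward to it (a sketch):

bool isClobberingEnabled() {
    return undeducible_false();  // always false, but the compiler can't prove it
}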
I have used the following in place of escape.
#ifdef _MSC_VER
#pragma optimize("", off)
template <typename T>
inline void escape(T* p) {
    *reinterpret_cast<char volatile*>(p) =
        *reinterpret_cast<char const volatile*>(p); // thanks, @milleniumbug
}
#pragma optimize("", on)
#endif
It's not perfect but it's close enough, I think.
Sadly, I don't have a way to emulate clobber.
Sometimes a local variable is used for the sole purpose of checking it in an assert(), like so -
int Result = Func();
assert( Result == 1 );
When compiling code in a Release build, assert()s are usually disabled, so this code may produce a warning about Result being set but never read.
A possible workaround is -
int Result = Func();
if ( Result != 1 )
{
    assert( 0 );
}
But it requires too much typing, isn't easy on the eyes, and causes the condition to always be checked (yes, the compiler may optimize the check away, but still).
I'm looking for an alternative way to express this assert() in a way that wouldn't cause the warning, but still be simple to use and avoid changing the semantics of assert().
(disabling the warning using a #pragma in this region of code isn't an option, and lowering warning levels to make it go away isn't an option either...).
We use a macro to specifically indicate when something is unused:
#define _unused(x) ((void)(x))
Then in your example, you'd have:
int Result = Func();
assert( Result == 1 );
_unused( Result ); // make production build happy
That way (a) the production build succeeds, and (b) it is obvious in the code that the variable is unused by design, not that it's just been forgotten about. This is especially helpful when parameters to a function are not used.
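For instance, with a callback whose signature is fixed but whose parameter isn't needed (the function and parameter names here are made up for illustration):

void on_event( int id, void* context )
{
    _unused( context );  // required by the callback signature, unused by design
    // ... handle id ...
}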
I wouldn't be able to give a better answer than this one, which addresses that problem and many more:
Stupid C++ Tricks: Adventures in assert
#ifdef NDEBUG
// sizeof does not evaluate its operand, so the check costs nothing at run
// time, yet the variables in it still count as referenced, which is
// generally enough to silence the warning.
#define ASSERT(x) do { (void)sizeof(x); } while (0)
#else
#include <assert.h>
#define ASSERT(x) assert(x)
#endif
As of C++17, the variable can be decorated with an attribute.
[[maybe_unused]] int Result = Func();
assert( Result == 1 );
See https://en.cppreference.com/w/cpp/language/attributes/maybe_unused for details.
This is better than the (void)Result trick because you directly decorate the variable declaration, rather than add something as an afterthought.
You could create another macro that allows you to avoid using a temporary variable:
#ifndef NDEBUG
#define Verify(x) assert(x)
#else
#define Verify(x) ((void)(x))
#endif
// asserts that Func()==1 in debug mode, or calls Func() and ignores return
// value in release mode (any braindead compiler can optimize away the comparison
// whose result isn't used, and the cast to void suppresses the warning)
Verify(Func() == 1);
int Result = Func();
assert( Result == 1 );
This situation means that in release mode, you really want:
Func();
But Func is non-void, i.e. it returns a result, i.e. it is a query.
Presumably, besides returning a result, Func modifies something (otherwise, why bother calling it and not using its result?), i.e. it is a command.
By the command-query separation principle (1), Func shouldn't be a command and a query at the same time. In other words, queries shouldn't have side effects, and the "result" of commands should be represented by the available queries on the object's state.
Cloth c;
c.Wash(); // Wash is void
assert(c.IsClean());
Is better than
Cloth c;
bool is_clean = c.Wash(); // Wash returns a bool
assert(is_clean);
The former doesn't give you any warning of the kind you describe; the latter does.
So, in short, my answer is: don't write code like this :)
Update (1): You asked for references about the Command-Query Separation Principle. Wikipedia is rather informative. I read about this design technique in Object Oriented Software Construction, 2nd Edition by Bertrand Meyer.
Update (2): j_random_hacker comments "OTOH, every "command" function f() that previously returned a value must now set some variable last_call_to_f_succeeded or similar". This is only true for functions that don't promise anything in their contract, i.e. functions that might "succeed" or not, or a similar concept. With Design by Contract, a relevant number of functions will have postconditions, so after "Empty()" the object will be "IsEmpty()", and after "Encode()" the message string will be "IsEncoded()", with no need to check. In the same way, and somewhat symmetrically, you don't call a special function "IsXFeasible()" before each and every call to a procedure "X()", because you usually know by design that you're fulfilling X's preconditions at the point of your call.
You could use:
Check( Func() == 1 );
And implement your Check( bool ) function as you want. It may either use assert, or throw a particular exception, write in a log file or to the console, have different implementations in debug and release, or a combination of all.
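For example, one possible implementation (a sketch; whether to assert, throw, or log is entirely up to you):

#include <cassert>
#include <stdexcept>

inline void Check( bool expression )
{
#ifndef NDEBUG
    assert( expression );                            // debug: break into the debugger
#else
    if ( !expression )
        throw std::runtime_error( "Check failed" );  // release: report the failure
#endif
}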
With C++17 we can do:
[[maybe_unused]] int Result = Func();
though it involves a bit of extra typing compared to an assert substitution. See this answer.
Note: Added this because this is the first Google hit for "c++ assert unused variable".
You should move the assert inside the function, before the return statement(s). That way you know the return value is not an unreferenced local variable.
Plus it makes more sense to be inside the function anyway, because it creates a self contained unit that has its OWN pre- and post-conditions.
Chances are that if the function is returning a value, you should be doing some kind of error checking in release mode on this return value anyway. So it shouldn't be an unreferenced variable to begin with.
Edit: but in this case the postcondition should be X (see comments):
I strongly disagree with this point; one should be able to determine the postcondition from the input parameters and, if it's a member function, any object state. If a global variable modifies the output of the function, then the function should be restructured.
Most answers suggest using the static_cast<void>(expression) trick in Release builds to suppress the warning, but this is actually suboptimal if your intention is to make checks truly Debug-only. The goals of the assertion macro in question are:
Perform checks in Debug mode
Do nothing in Release mode
Emit no warnings in all cases
The problem is that void-cast approach fails to reach the second goal. While there is no warning, the expression that you've passed to your assertion macro will still be evaluated. If you, for example, just do a variable check, that is probably not a big deal. But what if you call some function in your assertion check like ASSERT(fetchSomeData() == data); (which is very common in my experience)? The fetchSomeData() function will still be called. It may be fast and simple or it may be not.
What you really need is not only warning suppression but, perhaps more importantly, non-evaluation of the debug-only check expression. This can be achieved with a simple trick that I took from a specialized Assert library:
void myAssertion(bool checkSuccessful)
{
    if (!checkSuccessful)
    {
        // debug break, log or what not
    }
}

#define DONT_EVALUATE(expression)                                    \
{                                                                    \
    true ? static_cast<void>(0) : static_cast<void>((expression));   \
}

#ifdef DEBUG
#  define ASSERT(expression) myAssertion((expression))
#else
#  define ASSERT(expression) DONT_EVALUATE((expression))
#endif // DEBUG

int main()
{
    int a = 0;
    ASSERT(a == 1);
    ASSERT(performAHeavyVerification());
    return 0;
}
All the magic is in the DONT_EVALUATE macro. It is obvious that, at least logically, the evaluation of your expression is never needed inside of it. To strengthen that, the C++ standard guarantees that only one of the branches of the conditional operator will be evaluated. Here is the quote:
5.16 Conditional operator [expr.cond]
logical-or-expression ? expression : assignment-expression
Conditional expressions group right-to-left. The first expression is
contextually converted to bool. It is evaluated and if it is true, the
result of the conditional expression is the value of the second
expression, otherwise that of the third expression. Only one of these
expressions is evaluated.
I have tested this approach in GCC 4.9.0, clang 3.8.0, VS2013 Update 4, VS2015 Update 4 with the harshest warning levels. In all cases there are no warnings and the checking expression is never evaluated in Release build (in fact the whole thing is completely optimized away). Bear in mind though that with this approach you will get in trouble really fast if you put expressions that have side effects inside the assertion macro, though this is a very bad practice in the first place.
Also, I would expect that static analyzers may warn about "result of an expression is always constant" (or something like that) with this approach. I've tested for this with clang, VS2013, VS2015 static analysis tools and got no warnings of that kind.
The simplest thing is to only declare/assign those variables if the asserts will exist. The NDEBUG macro is specifically defined if asserts won't be in effect (done that way round just because -DNDEBUG is a convenient way to disable debugging, I think), so this tweaked copy of @Jardel's answer should work (cf. comment by @AdamPeterson on that answer):
#ifndef NDEBUG
int Result =
#endif
Func();
assert(Result == 1);
or, if that doesn't suit your tastes, all sorts of variants are possible, e.g. this:
#ifndef NDEBUG
int Result = Func();
assert(Result == 1);
#else
Func();
#endif
In general with this stuff, be careful that there's never a possibility for different translation units to be built with different NDEBUG macro states -- especially regarding asserts or other conditional content in public header files. The danger is that you, or users of your library, might accidentally instantiate a different definition of an inline function from the one used inside the compiled part of the library, quietly violating the one definition rule and making the runtime behaviour undefined.
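To make the hazard concrete, a sketch (the header and function are hypothetical):

// some_header.h
#include <cassert>

inline int twice( int x )
{
    assert( x >= 0 );  // compiled out when NDEBUG is defined
    return 2 * x;
}

// If one translation unit includes this header with NDEBUG defined and
// another without, the two inline definitions differ: a quiet violation
// of the one definition rule.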
This is a bad use of assert, IMHO. Assert is not meant as an error reporting tool, it's meant to assert preconditions. If Result is not used elsewhere, it's not a precondition.
Certainly you use a macro to control your assert definition, such as "_ASSERT". So, you can do this:
#ifdef _ASSERT
int Result =
#endif /*_ASSERT */
Func();
assert(Result == 1);
int Result = Func();
assert( Result == 1 );
Result;
This will make the compiler stop complaining about Result not being used.
But you should think about using a version of assert that does something useful at run-time, like log descriptive errors to a file that can be retrieved from the production environment.
I'd use the following:
#ifdef _DEBUG
#define ASSERT(FUNC, CHECK) assert(FUNC == CHECK)
#else
#define ASSERT(FUNC, CHECK)
#endif
...
ASSERT(Func(), 1);
This way, for release builds, the compiler doesn't even need to produce any code for the assert.
If this code is inside a function, then act on and return the result:
bool bigPicture() {
    // Check the results
    bool success = (Func() == 1);
    assert(success && "Bad times");
    // Success is given, so...
    actOnIt();
    // and
    return success;
}
// Value is always computed. We also call assert(value) if assertions are
// enabled. Value is discarded either way. You do not get a warning either
// way. This is useful when (a) a function has a side effect (b) the function
// returns true on success, and (c) failure seems unlikely, but we still want
// to check sometimes.
#include <cassert>

template < class T >
void assertTrue(T const &value)
{
    assert(value);
}

template < class T >
void assertFalse(T const &value)
{
    assert(!value);
}
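Usage then looks like this (Func() as in the question). The call always happens; the check only fires in debug builds:

assertTrue( Func() == 1 );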
I haven't succeeded in using [[maybe_unused]], but you can use the unused attribute:
int Result __attribute__((__unused__)) = Func();
gcc Variable-Attributes
Related: Is Loop Hoisting still a valid manual optimization for C code?
My question is specifically about the gcc compiler.
Often, inside a loop, I have to use a value returned by a function that is constant during the whole loop.
I would like to know if it's better to store this constant return value in a variable beforehand (let's imagine a long loop), or if a compiler like gcc is able to perform some optimizations to cache the constant value, because it would recognize it as constant through the loop.
For example when I loop over chars in a string, I often write something like that:
bool find_something(string s, char something)
{
    size_t sz = s.size();
    for (size_t i = 0; i != sz; i++)
        if (s[i] == something) return true;
    return false;
}
but with a clever compiler, I could use the following (which is shorter and clearer):
bool find_something(string s, char something)
{
    for (size_t i = 0; i != s.size(); i++)
        if (s[i] == something) return true;
    return false;
}
then the compiler could detect that the code inside the loop doesn't perform any change on the string object, and would generate code that caches the value returned by s.size(), instead of making a (slower) function call on each iteration.
Is there such optimization with gcc?
Generally there's nothing in your example that makes it impossible for the compiler to move the .size() computation before the loop. And in fact GCC 5.2.0 will produce exactly the same code for both implementations that you showed.
I would, however, strongly suggest against relying on optimizations like that in really performance-critical code, because a small change somewhere (GCC's optimizer, the implementation details of std::string, ...) could break GCC's ability to do this optimization.
However, I don't see a point in writing the more verbose version in the usual 90% of code that's not really performance-critical.
Given a current C++ compiler though I would go for the even more concise:
bool find_something(std::string s, char something)
{
    for (char ch : s)
        if (ch == something) return true;
    return false;
}
Which BTW also yields very similar machine code with GCC 5.2.0.
The compiler has to know the object is not modified in a different thread. It can tell that the function won't change if the object doesn't change, but it can't tell that the object won't change from some other stimulus.
The compiler will inline the call to the size() member function if you enable some form of whole-program optimization.
It depends on whether or not the compiler can identify the function call as constant. Consider the following function that may reside in an external library which cannot be analyzed by the compiler.
int odd_size(string s) {
    static int a = 0;
    return a++;
}
This function will return distinct values regardless of the input parameter. The compiler can therefore not assume a constant return value even if the passed string object remains constant. No optimization will be applied.
On the other hand, if the compiler detects a constant function call, which may be the case in your example, it probably moves the constant expression out of the loop.
Older versions of gcc had an explicit option -floop-optimize which was responsible for that task. From the gcc-3.4.5 documentation:
-floop-optimize
Perform loop optimizations: move constant expressions out of loops, simplify exit test conditions and optionally do strength-reduction and loop unrolling as well.
Enabled at levels -O, -O2, -O3, -Os.
I cannot find this option in current versions of gcc, but I'm quite sure that they include this type of optimization as well.
I have code in my C++ application that generally does this:
bool myFlag = false;
while (/*some finite condition unrelated to myFlag*/) {
    if (...) {
        // statements, unrelated to myFlag
    } else {
        // set myFlag to true, perhaps only if it was false before?
    }
}
if (myFlag) {
    // Do something...
}
The question I have pertains to the else statement of my code. Basically, my loop may set the value of myFlag from false to true, based on a certain condition not being met. Never will the flag be unset from true to false. I would like to know what statement makes more sense performance-wise and perhaps if this issue is actually a non-issue due to compiler optimization.
myFlag = true;
OR
if (!myFlag) myFlag = true;
I would normally choose the former because it requires writing less code. However, I began to wonder if it involved a needless write to memory, and therefore whether the latter would prevent a needless write when myFlag is already true. But would using the latter take more time, because there is a conditional statement and it therefore compiles to more instructions?
Or maybe I am over-thinking this too much...
UPDATE 1
Just to clarify a bit...the purpose of my latter case is to not write to memory if the variable was already true. Thus, only write to memory if the variable is false.
You're almost certainly better off just using myFlag = true;.
About the best you can hope for from the if (!myFlag) myFlag = true; is that the compiler will notice that the if is irrelevant and optimize it away. In particular, the if statement needs to read the current value of myFlag. If the value isn't already in the cache, that means the instruction will stall while waiting for the data to be read from memory.
By contrast, if you just write (without testing first) the value can be written to a write queue, and then more instructions can execute immediately. You won't get a stall until/unless you read the value of myFlag (and assuming it's read reasonably soon after writing, it'll probably still be in the cache, so stalling will be minimal).
CPU-cycle-wise, prefer myFlag = true;. Think about it: even if the compiler makes no optimization (not really likely), just setting the flag takes one asm instruction, while going through the if takes at least one asm instruction before any assignment even happens.
So just go with the assignment.
And more importantly, don't try to make hypotheses on such low-level details, specific compiler optimizations can totally go against intuition.
You do realize that the check is moot, right? If you blindly set it to true and it was not set, you are setting it. If it was already true, then there is no change and you are not setting it, so you can effectively implement it as:
myFlag = true;
Regarding the potential optimizations, to be able to test, the value must be in the cache, so most of the cost is already paid. On the other hand, the branch (if the compiler does not optimize the if away, which most will) can have a greater impact in performance.
You are most likely over-thinking the problem as others already mentioned, so let me do the same. The following might be faster if you can afford to double the statements unrelated to myFlag. In fact, you can get rid of myFlag. OK, here we go:
while (/*some finite condition*/) {
    if (...) {
        // statements
    } else {
        while (/*some finite condition*/) {
            if (...) {
                // statements, repeated
            }
        }
        // Do something (as if myFlag were true in the OP's example)
        break;
    }
}
As with all performance optimization: Measure, measure, measure!
It is architecture-specific whether if (!myFlag) myFlag = true; will take more time to execute than the simple myFlag = true;, even without any optimization. There are architectures (e.g., https://developer.qualcomm.com/hexagon-processor) where both statements will take only one cycle each to execute.
The only way to figure out on your machine would be by measurement.
In any case, myFlag = true; will always be at least as fast as if (!myFlag) myFlag = true;.
This question gave me a headache too, so I simply tested it myself with the following code (C#):
System.Diagnostics.Stopwatch time = new System.Diagnostics.Stopwatch();
int i = 0;
int j = 1;

time.Start();
if (i != 0)
    i = 0;
time.Stop();
Console.WriteLine("compare + set - {0} ticks", time.ElapsedTicks);
time.Reset();

time.Start();
if (j != 0)
    j = 0;
time.Stop();
Console.WriteLine("compare - {0} ticks", time.ElapsedTicks);
time.Reset();

time.Start();
i = 0;
time.Stop();
Console.WriteLine("set - {0} ticks", time.ElapsedTicks);

Console.ReadLine();
result:
compare + set - 1 ticks
compare - 1 ticks
set - 0 ticks
While the time used to set the value surely isn't zero, it shows that even a single query needed more time than just setting the variable.
Consider an example like this:
if (flag)
    for (condition)
        do_something();
else
    for (condition)
        do_something_else();
If flag doesn't change inside the for loops, this should be semantically equivalent to:
for (condition)
    if (flag)
        do_something();
    else
        do_something_else();
However, in the first case the code might be much longer (e.g. if several for loops are used or if do_something() is a code block that is mostly identical to do_something_else()), while in the second case the flag gets checked many times.
I'm curious whether current C++ compilers (most importantly, g++) would be able to optimize the second example to get rid of the repeated tests inside the for loop. If so, under what conditions is this possible?
Yes, if it is determined that flag doesn't change and can't be changed by do_something or do_something_else, it can be pulled outside the loop. I've heard of this called loop hoisting, but Wikipedia has an entry called "loop invariant code motion".
If flag is a local variable, the compiler should be able to do this optimization, since it's guaranteed to have no effect on the behavior of the generated code.
If flag is a global variable and you call functions inside your loop, it might not perform the optimization - it may not be able to determine whether those functions modify the global.
This can also be affected by the sort of optimization you do - optimizing for size would favor the non-hoisted version while optimizing for speed would probably favor the hoisted version.
In general, this isn't the sort of thing that you should worry about, unless profiling tells you that the function is a hotspot and you see that less than efficient code is actually being generated by going over the assembly the compiler outputs. Micro-optimizations like this you should always just leave to the compiler unless you absolutely have to.
Tried with GCC and -O3:
void foo();
void bar();

int main()
{
    bool doesnt_change = true;
    for (int i = 0; i != 3; ++i) {
        if (doesnt_change) {
            foo();
        }
        else {
            bar();
        }
    }
}
Result for main:
_main:
    pushl   %ebp
    movl    %esp, %ebp
    andl    $-16, %esp
    call    ___main
    call    __Z3foov
    call    __Z3foov
    call    __Z3foov
    xorl    %eax, %eax
    leave
    ret
So it does optimize away the choice (and unrolls smaller loops).
This optimization is not done if doesnt_change is global.
I'm sure that if the compiler can determine that the flag will remain constant, it can do some shuffling:
const bool flag = /* ... */;

for (..; ..; ..)
{
    if (flag)
    {
        // ...
    }
    else
    {
        // ...
    }
}
If the flag is not const, the compiler cannot necessarily optimize the loop, because it can't be sure flag won't change. It can if it does static analysis, but not all compilers do, I think. const is the sure-fire way of telling the compiler the flag won't change, after that it's up to the compiler.
As usual, profile and find out if it's really a problem.
I would be wary to say that it will. Can it guarantee that the value won't be modified by this, or another thread?
That said, the second version of the code is generally more readable and it would probably be the last thing to optimize in a block of code.
As many have said: it depends.
If you want to be sure, you should try to force a compile-time decision. Templates often come in handy for this:
for (condition)
    do_it<flag>();
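Spelled out a bit more (a sketch: do_something and do_something_else are the placeholders from the question, and the run-time flag is turned into a template argument by a single test outside the loop):

void do_something();
void do_something_else();

template <bool Flag>
void do_it()
{
    if (Flag)   // a compile-time constant: the dead branch is eliminated
        do_something();
    else
        do_something_else();
}

// One run-time test, outside the loop, picks the instantiation:
if (flag)
    for (condition) do_it<true>();
else
    for (condition) do_it<false>();

(With C++17 you could write if constexpr (Flag) instead, making the compile-time intent explicit.)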
Generally, yes. But there is no guarantee, and the places where the compiler will do it are probably rare.
What most compilers do without a problem is hoisting immutable evaluations out of the loop, e.g. if your condition is
if (a<b) ....
when a and b are not affected by the loop, the comparison will be made once before the loop.
This means that if the compiler can determine the condition does not change, the test is cheap and the jump well predicted. This in turn means the test itself costs one cycle or no cycle at all (really).
In which cases splitting the loop would be beneficial?
a) a very tight loop where the 1 cycle is a significant cost
b) the entire loop with both parts does not fit the code cache
Now, the compiler can only make assumptions about the code cache, and usually can order the code in a way that one branch will fit the cache.
Without any testing, I'd expect (a) to be the only case where such an optimization would be applied, because it's not always the better choice:
In which cases splitting the loop would be bad?
When splitting the loop increases code size beyond the code cache, you will take a significant hit. Now, that only affects you if the loop itself is called within another loop, but that's something the compiler usually can't determine.
[edit]
I couldn't get VC9 to split the following loop (one of the few cases where it might actually be beneficial)
extern volatile int vflag = 0;

int foo(int count)
{
    int sum = 0;
    int flag = vflag;
    for (int i = 0; i < count; ++i)
    {
        if (flag)
            sum += i;
        else
            sum -= i;
    }
    return sum;
}
[edit 2]
Note that with int flag = true; the second branch does get optimized away (and no, const doesn't make a difference here ;)).
What does that mean? Either it doesn't support that, it doesn't matter, or my analysis is wrong ;-)
Generally, I'd assume this is an optimization that is valuable only in very few cases, and can be done by hand easily in most scenarios.
It's called a loop invariant and the optimization is called loop invariant code motion and also code hoisting. The fact that it's in a conditional will definitely make the code analysis more complex and the compiler may or may not invert the loop and the conditional depending on how clever the optimizer is.
There is a general answer for any specific case of this kind of question, and that's to compile your program and look at the generated code.
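For instance, assuming the function lives in example.cpp:

g++ -O2 -S example.cpp   # writes the generated assembly to example.s

Then check in example.s whether the test was hoisted out of the loop.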