Consider the following code snippet:
#include <limits>
#include <stdexcept>

void g(unsigned) {
    // ...
}

template<typename UIntT>
void f(UIntT n)
{
    if constexpr (std::numeric_limits<UIntT>::max() > std::numeric_limits<unsigned>::max())
    {
        if (n > std::numeric_limits<unsigned>::max())
            throw std::length_error("Too long.");
    }
    g(n);
}
I wonder whether the 'if constexpr' clause is really useful here. Aren't compilers smart enough to find out whether the 'if' clause can ever be true for a given UIntT? If so, is this mandated by the standard?
Aren't compilers smart enough to find out whether the if clause can ever be true for a given UIntT?
Most are.
If so, is this mandated by the standard?
No. Some optimizations have been given names (RVO and the like) and were later incorporated into the language standard, but dead-code elimination isn't standardized (to my knowledge).
... but constexpr is
There's no way a conforming compiler would keep that block (if the condition is false) in your resulting binary - however you decide to optimize your code.
This use of if constexpr has no observable difference from an if according to the C++ standard.
However, slightly different variants of it could differ observably in which symbols a compilation unit odr-uses, and that can surface as a visible difference, e.g. at link time.
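For example (a minimal sketch; helper is a hypothetical function that is declared but never defined): with if constexpr the discarded branch of the template is never instantiated, so helper is not odr-used and the program links. Swap the if constexpr for a plain if and, at least in an unoptimized build, the linker will demand a definition of helper.

#include <limits>

void helper(unsigned long long); // hypothetical: declared but never defined

template<typename UIntT>
void f(UIntT n)
{
    if constexpr (std::numeric_limits<UIntT>::max() > std::numeric_limits<unsigned>::max())
    {
        helper(n); // discarded for small UIntT: not instantiated, not odr-used
    }
}

int main()
{
    f(42u); // the condition is false for unsigned, so this links without helper
}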
Most modern compilers can and will reduce that to if (false) during optimization even without constexpr, and dead-branch elimination is a pretty simple optimization. In a debug build they might leave the dead code of a plain if alone, whereas with if constexpr it is removed even there.
Compiler Explorer is great for answering specific cases of this kind of question, as it makes it pretty easy to see the generated assembly of every major compiler. So if you want to know whether there is a difference in a default MSVC 2015 debug or release setup, you can see it there.
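For example, a plain-if variant of the snippet from the question makes a convenient minimal pair to paste in (a sketch; g is given a trivial body here so the fragment is self-contained):

#include <limits>
#include <stdexcept>

void g(unsigned) {}

template<typename UIntT>
void f_plain(UIntT n) // same logic as f above, but with a plain if
{
    if (std::numeric_limits<UIntT>::max() > std::numeric_limits<unsigned>::max())
    {
        if (n > std::numeric_limits<unsigned>::max())
            throw std::length_error("Too long.");
    }
    g(n);
}

// force two instantiations so both show up in the assembly view
template void f_plain<unsigned short>(unsigned short);
template void f_plain<unsigned long long>(unsigned long long);

At -O2 you would expect both instantiations to match the if constexpr version; at -O0 the plain-if variant may keep the dead comparison.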
Related
In the case where you need to check the return value at the call site, is it easy for the compiler to optimise it out if the value is already checked in the function itself? Does it make a difference whether the function is inline? I tried looking at the assembly code to check for jumps, but I'm afraid I don't understand it at all. I'm talking about a situation like this:
#include <cstdlib>
#include <iostream>

int* try_get()
{
    static int anint;
    anint = std::rand() % 2;
    if (anint) return &anint;
    else return nullptr;
}

int main()
{
    int* p = try_get();
    if (p) // The value was already tested in the function.
           // Is optimisation of this easy? Does it depend on whether the function is inline?
    {
        std::cout << "Hello";
    }
}
A C++ compiler is allowed to perform any optimization that has no observable effects; however, the C++ standard does not require any compiler to perform any such optimization (except those required by the C++ specification itself, such as mandatory copy elision). Everything else is entirely at your C++ compiler's discretion.
If the compiler has access both to the function definition and its call site, and it can work out that this particular optimization has no observable effects, then it can certainly optimize the check out. Whether your compiler will do so can only be answered by looking at the code it generates. And, of course, whatever you determine about your compiler bears no relevance to what any other compiler would do.
Whether or not the function in question is inline, or not, may or may not be a factor that your compiler considers when deciding whether to perform this optimization.
And, finally, even looking at what your compiler produced for a particular translation unit may not paint the entire picture. Many current C++ compilers feature link-time optimization, where the combined forces of the compiler and the linker produce additional optimizations and code transformations in the final, linked executable.
So the only definitive answer here is to go actually look at the actual linked code in your final executable, in order to figure out whether any particular optimization took place, and, of course, that is a highly technical matter.
We recently discovered that compilers (GCC, in our case) can optimize code in conjunction with asserts written by developers.
This code, for example:
#include <cassert>

int getBatteryLevel() {
    return 0;
}

int process(int level);

int main() {
    [[maybe_unused]] const auto level = getBatteryLevel();
    assert(level > 0);
    process(level);
}
This links with -O2 even though process has no implementation. It does not link without optimizations.
Is this documented anywhere?
Is this documented anywhere?
The documentation of optimisations isn't very thorough. The optimisations probably at work here are "inline expansion", "constant folding" and "dead code elimination".
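A hand-expanded sketch of what the optimizer effectively sees after those three steps (not actual compiler output, and assuming a glibc-style noreturn assert handler):

#include <cassert>

int process(int level); // still never defined

int main() {
    const int level = 0; // getBatteryLevel() inlined, then constant-folded
    assert(level > 0);   // folds to assert(false); the assert handler never
                         // returns, so everything after it is dead
    process(level);      // eliminated as dead code, so no undefined reference
                         // to process() is emitted
}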
Where can I learn about compilers
Books are a good place to start, unless you intend to invent computing from ground up. Academic papers can also contain good information, but filtering out irrelevant stuff, and figuring out required preliminary knowledge can be a lot of work. Plus, they tend to be quite expensive per unit of information.
P.S. If you don't define a function that is odr-used, the program is ill-formed. Language implementations are allowed to accept ill-formed programs, but are not required to. Usually they are required to diagnose such bugs, but this particular case is one where implementations are not required to issue a diagnostic.
tl;dr: Can it be ensured somehow (e.g. by writing a unit test) that some things are optimized away, e.g. whole loops?
The usual approach to making sure something is not included in the production build is to wrap it in #if...#endif. But I prefer to stay with C++ mechanisms instead. Even there, instead of complicated template specializations I like to keep implementations simple and argue "hey, the compiler will optimize this out anyway".
The context is embedded automotive software (binary size matters), often with poor compilers. They are certified in the sense of safety, but usually not good at optimization.
Example 1: In a container the destruction of elements is typically a loop:
for (size_t i = 0; i < elements; i++)
    buffer[i].~T();
This also works for built-in types such as int, since the standard allows an explicit destructor call even for scalar types (C++11 12.4/15). In that case the loop does nothing and should be optimized out. In GCC it is, but with another compiler (for the Aurix) it was not: I saw a literally empty loop in the disassembly! That needed a template specialization to fix, as sketched below.
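A C++11 sketch of that kind of fix (names hypothetical): tag dispatch selects an empty overload for trivially destructible element types, so the loop is never even instantiated and no optimizer smarts are required.

#include <cstddef>
#include <type_traits>

template<typename T>
void destroy_all_impl(T* buffer, std::size_t elements, std::false_type)
{
    for (std::size_t i = 0; i < elements; i++)
        buffer[i].~T(); // real destructor calls for non-trivial types
}

template<typename T>
void destroy_all_impl(T*, std::size_t, std::true_type)
{
    // trivially destructible: nothing to do, no loop is ever emitted
}

template<typename T>
void destroy_all(T* buffer, std::size_t elements)
{
    destroy_all_impl(buffer, elements, std::is_trivially_destructible<T>{});
}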
Example 2: Code that is intended only for debugging, profiling, fault injection, etc.:
#include <iostream>

constexpr bool isDebugging = false; // somehow a global flag

void foo(int arg) {
    if (isDebugging) {
        // Albeit a 'dead' section, it may not appear in the production binary!
        // (size, security, safety...)
        // 'if constexpr' is not an option (C++11)
        std::cout << "Arg was " << arg << std::endl;
    }
    // normal code here...
}
I can look at the disassembly, sure. But being upstream platform software, it's hard to control all the targets, compilers and options downstream projects might use. The fear is real that for some reason a downstream project ends up with code bloat or a performance issue.
Bottom line: Is it possible to write the software in such a way that certain code is known to be optimized away as safely as a #if would do it? Or to write a unit test which fails if the optimization is not as expected?
[Timing tests come to mind for the first problem, but being bare-metal I don't have convenient tools yet.]
There may be a more elegant way, and it's not a unit test, but if you're just looking for that particular string, and you can make it unique,
strings $COMPILED_BINARY | grep "Arg was"
should show you if the string is being included
if constexpr is the canonical C++ expression (since C++17) for this kind of test.
#include <iostream>

constexpr bool DEBUG = /*...*/;

int main() {
    if constexpr (DEBUG) {
        std::cerr << "We are in debugging mode!" << std::endl;
    }
}
If DEBUG is false, then the code that prints to the console won't be generated at all. So if you have things like log statements that you need for checking the behavior of your code, but which you don't want in production code, you can hide them inside if constexpr blocks to eliminate the code entirely once it moves to production.
Looking at your question, I see several (sub-)questions in it that require an answer. Not all answers might be possible with your bare-metal compilers as hardware vendors don't care that much about C++.
The first question is: how do I write code in a way that I can be sure it gets optimized? The obvious answer here is to put everything in a single compilation unit so the caller can see the implementation.
The second question is: how can I force the compiler to optimize? Here constexpr is a blessing. Depending on whether you have support for C++11, C++14, C++17 or even the upcoming C++20, you get different feature sets for what you can do in a constexpr function. For the usage:
constexpr char c = std::string_view{"my_very_long_string"}[7];
With the code above, c is defined as a constexpr variable. Because you apply constexpr to the variable, you require several things:
Your compiler must evaluate the code so that the value of c is known at compile time. This holds true even for -O0 builds!
All functions used to calculate c must be constexpr and available (which, as a result, enforces the behaviour asked for in the first question).
No undefined behaviour may be triggered in the calculation of c (for the given values).
The downside of this: your input needs to be known at compile time.
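For instance, the first point can be checked directly with a static_assert (index 7 of that string is the second underscore):

#include <string_view>

constexpr char c = std::string_view{"my_very_long_string"}[7];
static_assert(c == '_', "computed at compile time, even at -O0");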
C++17 also provides if constexpr, which has similar requirements: the condition needs to be evaluable at compile time. The result is that the discarded branch of the code is not even instantiated (inside a template it may even contain code that wouldn't compile for the type you are using).
Which then brings us to the question: how do I ensure sufficient optimizations for my program to run fast enough, even if my compiler isn't well behaved? Here the only relevant answer is: create benchmarks and compare the results. Take the effort to set up a CI job that automates this for you. (And yes, you can even use external hardware, although that isn't easy.) In the end you have requirements: handling A should take less than X seconds, so do A several times and time it. Even if the benchmarks don't cover everything, as long as they stay within the requirements, it's fine.
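A minimal sketch of such a check (handle_A and the 100 ms budget are assumptions standing in for the real operation and requirement):

#include <chrono>

void handle_A() { /* stand-in for the real operation under test */ }

// hypothetical CI check: fail the job if handling A blows its time budget
bool handling_A_within_budget()
{
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < 1000; ++i)
        handle_A();
    return (clock::now() - start) < std::chrono::milliseconds(100);
}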
Note: As this is about debugging, you most likely can track the size of the executable as well. As soon as you start using streams, you get a lot of conversions to string, and your exe size will grow. (And you'll find this a blessing, as you will immediately spot commits which add 10% to the image size.)
And then the final question: you have a buggy compiler that doesn't meet your requirements. Here the only answer is: replace it. In the end, you can use any compiler to compile your code to bare metal, as long as the linker scripts work. If you need a starting point, C++Now 2018: Michael Caisse “Modern C++ in Embedded Systems” gives you a very good idea of what it takes to use a different compiler (like a recent Clang or GCC, against which you can even file bugs if the optimization isn't good enough).
Insert a reference to an external function or to external data into the block that should be verified to be optimised away. Like this:
#include <iostream>

extern void nop();

constexpr bool isDebugging = false; // somehow a global flag

void foo(int arg) {
    if (isDebugging) {
        nop();
        std::cout << "Arg was " << arg << std::endl; // may not appear in production binary!
    }
    // normal code here...
}
In debug builds, link with an implementation of nop() in an extra compilation unit, nop.cpp:
void nop() {}
In release builds, don't provide an implementation.
Release builds will only link if the optimisable code is eliminated.
- kisch
Here's another nice solution using inline assembly.
This uses assembler directives only, so it might even be kind of portable (checked with clang).
#include <iostream>

constexpr bool isDebugging = false; // somehow a global flag

void foo(int arg) {
    if (isDebugging) {
        asm(".globl _marker\n_marker:\n");
        std::cout << "Arg was " << arg << std::endl; // may not appear in production binary!
    }
    // normal code here...
}
This would leave an exported linker symbol in the compiled executable, if the code isn't optimised away. You can check for this symbol using nm(1).
clang can even stop the compilation right away:
#include <iostream>

constexpr bool isDebugging = false; // somehow a global flag

void foo(int arg) {
    if (isDebugging) {
        asm("_marker=1\n");
        std::cout << "Arg was " << arg << std::endl; // may not appear in production binary!
    }
    asm volatile (
        ".ifdef _marker\n"
        ".err \"code not optimised away\"\n"
        ".endif\n"
    );
    // normal code here...
}
This is not an answer to "How to ensure some code is optimized away?" but to your summary line "Can a unit test be written that checks that e.g. whole loops are optimized away?".
First, the answer depends on how broadly you define the scope of unit testing: if you include performance tests, you might have a chance.
If, in contrast, you understand unit testing as a way to test the functional behaviour of the code, you don't. For one thing, optimizations (if the compiler works correctly) must not change the behaviour of standard-conforming code.
With incorrect code (code that has undefined behaviour), optimizers can do what they want. (Well, for code with undefined behaviour the compiler can do so in the non-optimizing case too, but sometimes only the deeper analyses performed during optimization make it possible for the compiler to detect that some code has undefined behaviour.) Thus, if you write unit tests for some piece of code with undefined behaviour, the test results may differ when you run them with and without optimization. But, strictly speaking, this only tells you that the compiler translated the code differently the two times - it does not guarantee you that the code is optimized in the way you want it to be.
Here's another different way that also covers the first example.
You can verify (at runtime) that the code has been eliminated, by comparing two labels placed around it.
This relies on the GCC extension "Labels as Values" https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html
before:
    for (size_t i = 0; i < elements; i++)
        buffer[i].~T();
behind:
    if (intptr_t(&&behind) != intptr_t(&&before)) abort();
It would be nice if you could check this in a static_assert(), but sadly the difference of two &&label expressions is not accepted as a compile-time constant.
GCC insists on inserting a runtime comparison, even though both labels are in fact at the same address.
Interestingly, if you compare the addresses (type void*) directly, without casting them to intptr_t, GCC falsely optimises away the if() as "always true", whereas clang correctly optimises away the complete if() as "always false", even at -O1.
constexpr permits expressions which can be evaluated at compile time to be ... evaluated at compile time.
Why is this keyword even necessary? Why not permit or require that compilers evaluate all expressions at compile time if possible?
The standard library has an uneven application of constexpr which causes a lot of inconvenience. Making constexpr the "default" would address that and likely improve a huge amount of existing code.
It already is permitted to evaluate side-effect-free computations at compile time, under the as-if rule.
What constexpr does is provide guarantees on what data-flow analysis a compliant compiler is required to do to detect[1] compile-time-computable expressions, and also allow the programmer to express that intent so that they get a diagnostic if they accidentally do something that cannot be precomputed.
Making constexpr the default would eliminate that very useful diagnostic ability.
[1] In general, requiring "evaluate all expressions at compile time if possible" is a non-starter, because detecting the "if possible" requires solving the Halting Problem, and computer scientists know that this is not possible in the general case. So instead a relaxation is used where the outputs are { "Computable at compile-time", "Not computable at compile-time or couldn't decide" }. And the ability of different compilers to decide would depend on how smart their test was, which would make this feature non-portable. constexpr defines the exact test to use. A smarter compiler can still pre-compute even more expressions than the Standard test dictates, but if they fail the test, they can't be marked constexpr.
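A sketch of that diagnostic in action (under pre-C++23 rules mainstream compilers reject this at the point of definition, because no invocation could ever be a constant expression):

#include <iostream>

constexpr int square(int x)
{
    std::cout << x; // error: call to a non-constexpr function; the constexpr
                    // marker is what lets the compiler flag this immediately
    return x * x;
}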
Note: despite the below, I admit to liking the idea of making constexpr the default. But you asked why it wasn't already done, so to answer that I will simply elaborate on mattnewport's last comment:
Consider the situation today. You're trying to use some function from the standard library in a context that requires a constant expression. It's not marked as constexpr, so you get a compiler error. This seems dumb, since "clearly" the ONLY thing that needs to change for this to work is to add the word constexpr to the definition.
Now consider life in the alternate universe where we adopt your proposal. Your code now compiles, yay! Next year you decide to add Windows support to whatever project you're working on. How hard can it be? You'll compile using Visual Studio for your Windows users and keep using gcc for everyone else, right?
But the first time you try to compile on Windows, you get a bunch of compiler errors: this function can't be used in a constant expression context. You look at the code of the function in question, and compare it to the version that ships with gcc. It turns out that they are slightly different, and that the version that ships with gcc meets the technical requirements for constexpr by sheer accident, and likewise the one that ships with Visual Studio does not meet those requirements, again by sheer accident. Now what?
No problem you say, I'll submit a bug report to Microsoft: this function should be fixed. They close your bug report: the standard never says this function must be usable in a constant expression, so we can implement however we want. So you submit a bug report to the gcc maintainers: why didn't you warn me I was using non-portable code? And they close it too: how were we supposed to know it's not portable? We can't keep track of how everyone else implements the standard library.
Now what? No one did anything really wrong. Not you, not the gcc folks, nor the Visual Studio folks. Yet you still end up with un-portable code and are not a happy camper at this point. All else being equal, a good language standard will try to make this situation as unlikely as possible.
And even though I used an example of different compilers, it could just as well happen when you try to upgrade to a newer version of the same compiler, or even try to compile with different settings. For example: the function contains an assert statement to ensure it's being called with valid arguments. If you compile with assertions disabled, the assertion "disappears" and the function meets the rules for constexpr; if you enable assertions, then it doesn't meet them. (This is less likely these days now that the rules for constexpr are very generous, but was a bigger issue under the C++11 rules. But in principle the point remains even today.)
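A sketch of that assert problem (clamp_index is a hypothetical function; the point is that the expansion of assert changes with NDEBUG):

#include <cassert>

// With NDEBUG defined, assert expands to a no-op and this function could meet
// the (C++11-era) constexpr requirements; with assertions enabled it expands
// to code that couldn't appear in a C++11 constexpr function body. A
// "constexpr by default" rule would therefore flip with a build flag.
int clamp_index(int i, int n)
{
    assert(i >= 0 && i < n);
    return i;
}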
Lastly we get to the admittedly minor issue of error messages. In today's world, if I try to do something like stick a cout statement in a constexpr function, I get a nice, simple error right away. In your world, we would have the same situation that we have with templates: deep error traces all the way down to the very bottom of the implementation of output streams. Not fatal, but surely annoying.
This is a year and a half late, but I still hope it helps.
As Ben Voigt points out, compilers are already allowed to evaluate anything at compile time under the as-if rule.
What constexpr also does is lay out clear rules for expressions that can be used in places where a compile time constant is required. That means I can write code like this and know it will be portable:
constexpr int square(int x) { return x * x; }
...
int a[square(4)] = {};
...
Without the keyword and clear rules in the standard I'm not sure how you could specify this portably and provide useful diagnostics on things the programmer intended to be constexpr but don't meet the requirements.
Let's say I have a function:
#include <iostream>

template <bool stuff>
inline void doSomething() {
    if (stuff) {
        std::cout << "Hello" << std::endl;
    }
    else {
        std::cout << "Goodbye" << std::endl;
    }
}
And I call it like this:
doSomething<true>();
doSomething<false>();
It would print out:
Hello
Goodbye
What I'm really wondering is does the compiler fully optimize this?
When I call the templated function with true, will it create a function that just outputs "Hello" and avoids the if statement and the code for "Goodbye"?
This would be really useful for this one giant function I just wrote that's supposed to be very optimized and to avoid as many unnecessary if-statement checks as possible. I have a very good feeling it would, at least in a release build with optimizations, if not in a debug build without optimizations.
Disclaimer: no one can guarantee anything.
That said, this is an obvious and easy optimization for any compiler. It's quite safe to say that it will be optimized away, unless the optimizer is, well, practically useless.
Since your "true" and "false" are constants, you are unambiguously creating an obvious dead branch in each class, and the compiler should optimize it away. Should is taken literally here - I would consider it a major, major problem if an "optimising" compiler did not do dead branch removal.
In other words, if your compiler cannot optimize this, it is the use of that compiler that should be evaluated, not the code.
So, I would say your gut feeling is correct: while yes, no "guarantees" as such can be made on each and every compiler, I would not use a compiler incapable of performing simplistic optimizations in any production environment, and of course not in any performance critical one. (In release builds of course).
So, use it. Any modern optimizing compiler will optimize it away because it is a trivial optimization. If in doubt, check disassembly, and if it is not optimized, change the compiler to something more modern.
In general, if you are writing any kind of performance-critical code, you must rely, at least to some extent, on compiler optimizations.
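If it helps intuition: after the constant is substituted and the dead branch removed, the two instantiations are effectively equivalent to this hand-written sketch (not compiler output):

#include <iostream>

// hypothetical hand-written equivalents of doSomething<true> and doSomething<false>
void doSomething_true()  { std::cout << "Hello"   << std::endl; }
void doSomething_false() { std::cout << "Goodbye" << std::endl; }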
This is inherently up to the compiler, so you'd have to check the compiler's documentation or the generated code. But in simple cases like this, you can easily implement the optimization yourself:
#include <iostream>

template <bool stuff>
inline void doSomething();

template <>
inline void doSomething<true>() {
    std::cout << "Hello" << std::endl;
}

template <>
inline void doSomething<false>() {
    std::cout << "Goodbye" << std::endl;
}
But "optimization" isn't really the right word to use since this might actually degrade performance. It's only an optimization if it actually benefits your code performance.
Indeed, it really creates two functions, but
premature optimization is the root of all evil
especially if you're changing your code structure because of a simple if statement. I doubt that this will affect performance. Also, the boolean must be a compile-time constant; you can't take a runtime-evaluated variable and pass it to the function - how should the compiler know which instantiation to call? In that case you'll have to evaluate it manually and call the appropriate instantiation yourself.
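The manual dispatch looks something like this (a sketch, assuming the doSomething template from the question is in scope):

template <bool stuff> void doSomething(); // the template from the question

// a runtime flag can't be a template argument, so branch by hand:
void dispatch(bool stuff)
{
    if (stuff) doSomething<true>();
    else       doSomething<false>();
}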
Compilers are really good at constant folding. That is, in this case it would surprise me if the check survived optimization. A non-optimized build might still have the check. The easiest way to verify is to create assembler output and check.
That said, it is worth noting that the compiler has to check both branches for correctness, even if it only ever uses one of them. This frequently shows up, e.g., when using slightly different algorithms for random access iterators and other iterators: the condition would depend on a type trait, and one of the branches may fail to compile depending on the operations tested for by the trait. The committee has discussed turning off this checking under the term static if, although there is no consensus yet on how the feature would look exactly (if it gets added).
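A sketch of that iterator situation: with an ordinary if, the it += n branch would have to compile for every iterator type, so the classic workaround is tag dispatch into separate overloads that are only instantiated when applicable.

#include <iterator>

template<typename It>
void advance_impl(It& it, int n, std::random_access_iterator_tag)
{
    it += n; // only instantiated for random access iterators
}

template<typename It>
void advance_impl(It& it, int n, std::input_iterator_tag)
{
    while (n-- > 0) ++it; // fallback for iterators without +=
}

template<typename It>
void my_advance(It& it, int n)
{
    advance_impl(it, n, typename std::iterator_traits<It>::iterator_category{});
}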
If I understand you correctly, you want (in essence) to end up with 'two' functions that are each optimised for either a true or a false input, so that they don't need to check that flag?
Aside from any trivial optimisation that may yield (I'm against premature optimisation; I believe in maintainability before measurement before optimisation), I would say: why not refactor your function to actually be two functions? If they have common code, then that code could be refactored out too. However, if the requirements make that refactoring non-optimal, I'd fall back to a #define-based approach instead.