Are const constant operations evaluated at run time? - c++

I'm doing some chess programming in C++, so there are a lot of bitwise operations that I have to do with some large numbers. I was wondering, for performance's sake, whether operations on constants are done at runtime or evaluated during compilation. e.g. Suppose I have to AND the following 2 constants:
const unsigned long long FILE_A = ~0x8080808080808080;
const unsigned long long FILE_B = ~0x4040404040404040;
In a function like this
unsigned long long join(){
return (FILE_A & FILE_B);
}
Is the AND operation on FILE_A and FILE_B done at runtime? Or does the compiler do it?

In general: a C++ compiler is allowed to perform any optimization, as long as the result behaves "as if" the code had been executed literally.
In the example you gave, doing the calculation at compile time is indistinguishable from doing it at run time, so modern C++ compilers will do exactly that. In fact, if join() is defined in a header file (with the inline keyword) and a moderate optimization level is selected, modern compilers will not only do the calculation at compile time but optimize join() away entirely, injecting the computed constant directly wherever join() gets used and enabling further compile-time optimizations. That's allowed because the result is indistinguishable from the result if nothing had been optimized away.
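For completeness: C++11's constexpr turns that likely optimization into a guarantee, at least in constant-expression contexts. A minimal sketch of the same code written that way (0x3F3F3F3F3F3F3F3F is the fold of the two masks, i.e. 4557430888798830399):
constexpr unsigned long long FILE_A = ~0x8080808080808080ULL;
constexpr unsigned long long FILE_B = ~0x4040404040404040ULL;

constexpr unsigned long long join() { return FILE_A & FILE_B; }

// Fails to compile unless the fold happens at compile time.
static_assert(join() == 0x3F3F3F3F3F3F3F3FULL, "folded at compile time");
One caveat: an ordinary runtime call to a constexpr function is still only permitted, not required, to be folded; the guarantee applies where a constant expression is required, as in the static_assert above.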

From the look of things, it does. I put the code above into this converter, https://assembly.ynh.io/, and for the line return (FILE_A & FILE_B); it outputs the following assembly:
movabsq $4557430888798830399, %rax
And yes, 4557430888798830399 is the bitwise AND of ~0x8080808080808080 and ~0x4040404040404040.

Related

Are arithmetic operations on literals in C++ evaluated at compile time?

Similar questions have been asked for C#:
Are arithmetic operations on literals in C# evaluated at compile time?,
and Java:
Are arithmetic operations on literals calculated at compile time or run time?.
Considering C++, will the following calculations be evaluated at run time or at compile time? The first initializes a built-in type; the second is a constructor argument.
Please consider them for all 4 basic arithmetic operations, as well as with other built-in types, e.g. an int instead of the double below.
double testDouble = 2.0 + 2.0;
aUserDefinedType testUserDefinedTypeObject(
    aMemberVariable * std::pow(someOtherVariable, 1.0/8.0)
);
It depends on your compiler and its optimization level when building the code.
There is no guarantee of compile-time evaluation, but most compilers will evaluate constant expressions at compile time when optimizations are turned on.
There is also constexpr, which helps the compiler know what can be evaluated at compile time.
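For example, a minimal sketch (C++11) where the fold is guaranteed rather than merely likely:
constexpr double testDouble = 2.0 + 2.0;   // must be computed at compile time
constexpr int testInt = (3 * 7 - 1) / 4;   // likewise for other built-in types
static_assert(testInt == 5, "evaluated during compilation");
(std::pow, as used in the question, is not constexpr in C++11, so that particular expression still relies on the optimizer.)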

Is it good practice to construct long circuit statements?

Question Context: [C++] I want to know what is theoretically fastest, and what the compiler will do. I don't want to hear that premature optimization is the root of all evil, etc.
I was writing some code like this:
bool b0 = ...;
bool b1 = ...;
if (b0 && b1)
{
...
}
But then I was thinking: the code, as-is, will compile into two TEST instructions if no optimization is applied. That means two branches. So I was thinking that it might be better to write:
if (b0 & b1)
which will produce only one TEST instruction if no optimization is done by the compiler. But this goes against my code style; I usually write && and ||.
Q: What will the compiler do if I turn on optimization flags (-O1, -O2, -O3, -Os and -Ofast)? Will the compiler automatically compile it like &, even if I have used a && in the code? And what is theoretically faster? Does the behavior change if I do this:
if (b0 && b1)
{ ... }
else if (b0)
{ ... }
else if (b1)
{ ... }
else
{ ... }
Q: As I could have guessed, this is very dependent on the situation, but is it a common trick for a compiler to replace a && with a &?
Q: What will the compiler do if I turn on optimization flags (-O1, -O2, -O3, -Os and -Ofast)?
Most likely nothing further.
As stated in my comments, you really can't optimize the evaluation any further than:
AND B0 WITH B1 (sets condition flags)
JUMP ZERO TO ...
That said, if you have a lot of simple Boolean logic or data operations, some processors may execute them conditionally.
Will the compiler automatically compile it like &, even if I have used a && in the code?
And what is theoretically faster?
On most platforms, there is no difference in the evaluation of A & B versus A && B.
In the final evaluation, either a compare or an AND instruction is executed, then a jump based on the status. Two instructions.
Most processors don't have Boolean registers. It's all numbers and bits.
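As an illustration (a sketch assuming GCC or Clang at -O2; action() is a made-up stand-in for the if-body): once both operands are plain, already-computed bools with no side effects, the compiler is free to drop the short circuit, and both spellings typically produce the same AND-plus-one-jump code:
void action();   // hypothetical

void with_short_circuit(bool b0, bool b1) {
    if (b0 && b1) action();   // right operand has no side effects...
}

void with_bitwise_and(bool b0, bool b1) {
    if (b0 & b1) action();    // ...so both typically compile identically
}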
Optimize By Boolean Logic
Your best option is to review the design and set up your algorithms to use Boolean algebra. You can then simplify the Boolean expressions, as in the sketch below.
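A sketch of that kind of simplification (the flag names are invented for illustration):
bool handle(bool ready, bool has_data, bool timed_out) {
    // Before: (ready && has_data) || (ready && timed_out)
    // -- up to three tests and two short-circuit branches.

    // After the distributive law, A&&B || A&&C == A && (B||C):
    return ready && (has_data || timed_out);   // at most two tests
}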
Another option is to implement the code so that the compiler can generate conditional assembly instructions, if the platform supports them.
Optimize: Reduce jumps
Processors favor arithmetic and data transfers over jumps.
Many processors are always feeding an instruction pipeline. When it comes to a conditional branch instruction, the processor has to wait (suspend the instruction prefetching) until the condition status is determined. Then it can determine where the next instruction will be fetched.
If you can't remove the jumps, such as in a loop, increase the ratio of data processing to jumping. Search for "loop unrolling"; many compilers will perform this when optimization levels are increased.
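A sketch of manual unrolling, purely to illustrate what the compiler does at higher optimization levels (the function is invented):
// Four additions per loop branch instead of one.
long sum(const int* a, int n) {
    long s = 0;
    int i = 0;
    for (; i + 4 <= n; i += 4)
        s += a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    for (; i < n; ++i)   // remaining 0-3 elements
        s += a[i];
    return s;
}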
Optimize: Data Cache
You may notice increased performance by organizing your data for best data cache usage.
For example, instead of 3 large arrays, use one array of a structure containing 3 elements. This allows the elements in use to be close to each other (and reduce the likelihood of accessing data outside of the cache).
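A sketch of that layout change (sizes and names invented):
// Three parallel arrays: the i-th elements live far apart in memory.
// double xs[10000], ys[10000], zs[10000];

// One array of structures: x, y and z for the same index share a cache line.
struct Point { double x, y, z; };
Point points[10000];
(This wins when the three elements really are used together; if they are processed in separate passes, parallel arrays can be the better layout.)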
Summary
The difference in evaluation of A && B versus A & B as conditional expressions is a micro-optimization. You will achieve improved performance by using Boolean algebra to reduce the number of conditional expressions. Jumps, or changes in execution path, slow down instruction execution. Fetching data outside of the data cache also slows down execution. You will most likely get better performance by redesigning your code to help the compiler reduce the branches and make more effective use of the data cache.
If you care about what's fastest, why do you care what the compiler will do without optimisation?
Q: As I could have guessed, this is very dependent on the situation, but is it a common trick for a compiler to replace a && with a &?
This question seems to assume that the compiler transforms C++ code into more C++ code. It doesn't. It transforms your code into machine instructions (including the assembler as part of the compiler for argument's sake). You should not assume there is a one-to-one mapping from a C++ operator like && or & to a particular instruction.
With optimisation the compiler will do whatever it thinks will be faster. If a single instruction would be faster the compiler will generate a single instruction for if (b0 && b1), you don't need to bugger up your code with micro-optimisations to help it make such a simple transformation.
The compiler knows the instruction set it's using, it knows the context the condition is in and whether it can be removed entirely as dead code, or moved elsewhere to help the pipeline, or simplified by constant propagation, etc. etc.
And if you really care about what's fastest, why would you compute b1 until you know it's actually needed? If obtaining the value of b1 has no side effects the compiler could even transform your code to:
bool b0 = ...;
if (b0)
{
    bool b1 = ...;
    if (b1)
    {
        ...
    }
}
Does that mean two if conditions are faster than a &?! Of course not.
In other words, the whole premise of the question is flawed. Do not compromise the readability and simplicity of your code in the misguided pursuit of the "theoretically fastest" micro-optimisation. Spend your time improving the algorithms and data structures used not trying to second guess which instructions the compiler will generate.

Do these macros evaluate to the same code using gcc at compile-time?

Of course this is going to be a function of the compiler you are using, but I figured this would be a simple question to answer.
#define UBRRVAL(baud) (F_CPU/(16*baud)-1)
As compared with
#define UBRRVAL(baud) (F_CPU/16/baud-1)
I know that the latter is going to evaluate to (assuming F_CPU = 20000000):
#define UBRRVAL(baud) (1250000/baud-1)
Considering the precedence forced by the parentheses, I was curious to know whether most compilers (gcc in particular) would evaluate the former expression equivalently to the latter at compile time.
This is code that is going into an embedded system, so if these expressions are not evaluated equivalently at compile time, then the latter is more efficient; a single division at run time is cheaper than a division and a multiplication, of course.
Simple answer, no.
Because neither macro is fully parenthesized, there are cases where the two are very different.
Consider UBRRVAL(2+1). The first would expand to (F_CPU/(16*2+1)-1), which is equivalent to F_CPU/33 - 1. The second would expand to (F_CPU/16/2+1-1), which is equivalent to F_CPU/32. Not the same at all.
Of course, it probably isn't meant to be called with an expression, just with a single constant value, but there's nothing to prevent it, and as such, someone will do it sometime in the future. One of the many evils of macros. I would recommend using a short (static) inline function (or constexpr as suggested in comments, if this is using a recent enough C++ compiler) instead...
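A sketch of that replacement, assuming C++11 and the 20 MHz F_CPU from the question:
#ifndef F_CPU
#define F_CPU 20000000UL   // assumed, as in the question
#endif

constexpr unsigned long ubrrval(unsigned long baud) {
    return F_CPU / 16 / baud - 1;   // baud is a real parameter: evaluated once
}
// ubrrval(2 + 1) is F_CPU/16/3 - 1, as intended (no expansion surprises).
With a constant argument this still folds to a single constant at compile time (guaranteed in constant-expression contexts), while the precedence pitfalls of the macro are gone.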
Simple answer, yes. Within the specific constraints given, both will be fully evaluated at compile time.
Parentheses force precedence, but they do not force order of evaluation except to the extent defined by the "as if" rule. If the expression were slightly more complicated, so that it was not evaluated at compile time, you could not be sure what code would be emitted; that may well depend on the specific processor.
As a side point, on most processors a 4-bit shift left and a 4-bit shift right cost the same, and if the baud rate is a power of two the compiler is likely to generate shift operations.
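For example, with an unsigned operand the substitution is straightforward:
unsigned long div16(unsigned long x) {
    return x / 16;   // typically emitted as x >> 4
}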
[And be careful about parenthesising macro arguments. You got away with it this time, but only just.]

C++ conversion from int to bool

I want to know if the compiled code of a bool-to-int conversion contains a branch (jump) operation.
For example, given void func(bool b) and int i:
Is the compiled code of calling func(i) equivalent to the compiled code of func(i ? 1 : 0)?
Or is there a more elaborate way for the compiler to perform this without the branch operation?
Update:
In other words, what code does the compiler generate in order to push 1 or 0 onto the stack before jumping to the function's address?
I assume that it really comes down to the architecture of the CPU at hand, and that some specific processors (certain DSPs, for example) may support this. So my question refers to "conventional" general-purpose CPUs (assuming that this definition is acceptable).
In terms of pure software, the question can also be phrased as: is there an efficient way for converting an integer value to 1 when it's not 0, and to 0 otherwise, without using a conditional statement?
Thanks
It's not your job (as the compiler's user) to make built-in type conversions efficient. If the compiler is not dumb, it will map them as closely as possible onto the CPU's own representation.
On most commercial CPUs, bool and int are the exact same thing, and if (x) { ... }
translates into bitwise-ANDing (or bitwise-ORing, whichever is faster; these are normally single-cycle instructions) x with itself, followed by a conditional jump past the } if the zero flag is set. (Note that the AND is just a trick to force the zero-flag computation, which falls out of the arithmetic unit's electronics for free.)
The variants are much more a matter of CPU electronics than of code, so don't worry about it. ifs are not triggered by a bool but by the result of the last arithmetic operation.
Every arithmetic operation a CPU performs produces a result and sets flags that describe certain attributes of that result: whether it is zero, whether it produced a carry or borrow, whether it has an odd or even number of bits set to 1, and so on. The result and the flags are two registers, and can be loaded from and stored to memory.
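As for the update's question: yes, on conventional CPUs the normalization needs no conditional statement. A minimal sketch (x86-64 compilers typically emit TEST followed by SETNE here, with no jump):
bool to_bool(int i) {
    return i != 0;   // same result as the implicit conversion, and as !!i
}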

Efficiency of repeated arithmetic between two macros

In an ANSI C project I am working on, I have two macros defined: PERIOD_IN_MS and CYCLES_PER_MS. In the actual period-handling logic, I do many comparisons between a counter that is incremented every "cycle" and PERIOD_IN_MS * CYCLES_PER_MS. I'm concerned that this arithmetic operation is repeatedly evaluated during each comparison.
Does anyone know if this is true, or if the compiler will evaluate the product of the two integer literals at compile time and use that instead?
I realize that this particular example would probably only remove one instruction out of the generated assembly code, but now I'm curious about this.
The standard doesn't impose any requirement to do this, but any sensible compiler will fold these constants down into one at compile-time. See e.g. http://en.wikipedia.org/wiki/Constant_propagation.
If you're curious to know whether this has actually happened, you can always take a look at the assembler generated by the compiler.
The compiler should evaluate the constant expression at compile time (though I believe in C it is not required to). A good compiler will almost certainly do it when optimization is turned on.
If you want to enforce single evaluation, perhaps just to speed up compilation, and your constants fit into an int, you can use an enumeration constant instead:
enum { cycles_per_period = PERIOD_IN_MS * CYCLES_PER_MS };
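The period-handling comparison then uses the pre-folded constant directly; a hypothetical sketch (cycle_counter is an invented name):
if (++cycle_counter >= cycles_per_period) {
    cycle_counter = 0;
    /* ... one period has elapsed ... */
}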