In C++, if the value of a variable never changes after it is assigned anywhere in the whole program, versus making that variable const, in which case is the executable code faster?
How does the compiler optimize the executable code in case 1?
A clever compiler can see that the value of a variable is never changed and optimize the related code accordingly, even without an explicit const keyword from the programmer.
As for your second question, when you mark a variable as const, the following might happen: the "compiler can optimize away this const by not providing storage for this variable, instead adding it to the symbol table. So a subsequent read just needs an indirection into the symbol table rather than instructions to fetch the value from memory". Read more in What kind of optimization does const offer in C/C++? (if any).
I said might, because const does not guarantee a constant expression; that is what constexpr is for, as I explain below.
In general, think about safer code rather than faster code when it comes to the const keyword. If you are not doing it for safer and more readable code, you are likely a victim of premature optimization.
Bonus:
C++ offers the constexpr keyword, which allows the programmer to mark a variable as what the Standard calls constant expressions. A constant expression is more than merely constant.
Read more in Difference between `constexpr` and `const` and When should you use constexpr capability in C++11?
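For illustration, here is a minimal sketch of the difference (runtime_value is a made-up name used only for this example):
#include <array>

int runtime_value() { return 42; }     // value known only at run time

const int a = runtime_value();         // const, but not a constant expression
constexpr int b = 42;                  // guaranteed usable at compile time

// std::array<int, a> bad;             // error: a is not a constant expression
std::array<int, b> ok;                 // fine: b is

int main() { return 0; }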
PS: Constness prevents moving, so using const too liberally may make your code slower.
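A minimal sketch of that last point (not a benchmark, just the mechanics): std::move on a const object silently falls back to a copy, because a const rvalue cannot bind to the move constructor's non-const rvalue reference.
#include <string>
#include <utility>

int main() {
    const std::string big(1000, 'x');
    std::string a = std::move(big);    // copies: big is const, no move possible
    std::string other(1000, 'y');
    std::string b = std::move(other);  // actually moves
    return a.size() == b.size() ? 0 : 1;
}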
In which case executable code is faster?
The code is faster in the case of using const, because the compiler has more room for optimization. Consider this snippet:
int c = 5;
[...]
int x = c + 5;
If c is constant, the compiler will simply assign 10 to x. If c is not constant, it depends on whether the compiler can deduce from the code that c is de facto constant.
How compiler optimize executable code in case 1?
The compiler has a harder time optimizing the code when the variable is not constant. The broader the scope of the variable, the harder it is for the compiler to make sure the variable is not changing.
For simple cases, like local variables, a compiler with basic optimizations will be able to deduce that the variable is a constant, so it will treat it like one.
if (...) {
int c = 5;
[...]
int x = c + 5;
}
For broader scopes, like global variables, external variables etc., if the compiler is not able to analyze the whole scope, it will treat it like a normal variable, i.e. allocate some space, generate load and store operations etc.
file1.c
int c = 5;
file2.c
extern int c;
[...]
int x = c + 5;
There are more aggressive optimization options, like link time optimizations, which might help in such cases. But still, performance-wise, the const keyword helps, especially for variables with wide scopes.
EDIT:
Simple example
File const.C:
const int c = 5;
volatile int x;
int main(int argc, char **argv)
{
x = c + 5;
}
Compilation:
$ g++ const.C -O3 -g
Disassembly:
5 {
6 x = c + 5;
0x00000000004003e0 <+0>: movl $0xa,0x200c4a(%rip) # 0x601034 <x>
7 }
So we just move 10 (0xa) to x.
File nonconst.C:
int c = 5;
volatile int x;
int main(int argc, char **argv)
{
x = c + 5;
}
Compilation:
$ g++ nonconst.C -O3 -g
Disassembly:
5 {
6 x = c + 5;
0x00000000004003e0 <+0>: mov 0x200c4a(%rip),%eax # 0x601030 <c>
0x00000000004003e6 <+6>: add $0x5,%eax
0x00000000004003e9 <+9>: mov %eax,0x200c49(%rip) # 0x601038 <x>
7 }
We load c, add 5 and store to x.
So as you can see even with quite aggressive optimization (-O3) and the shortest program you can write, the effect of const is quite obvious.
g++ version 5.4.1
Related
Consider some dead-simple code (or a more complicated example; see footnote 1 below) that uses an uninitialized stack variable, e.g.:
int main() { int x; return 17 / x; }
Here's what GCC emits (-O3):
mov eax, 17
xor ecx, ecx
cdq
idiv ecx
ret
Here's what MSVC emits (-O2):
mov eax, 17
cdq
idiv DWORD PTR [rsp]
ret 0
For reference, here's what Clang emits (-O3):
ret
The thing is, all three compilers detect that this is an uninitialized variable just fine (-Wall), but only one of them actually performs any optimizations based on it.
This is kind of stumping me... I thought all the decades of fighting over undefined behavior was to allow compiler optimizations, and yet I'm seeing only one compiler cares to optimize even the most basic cases of UB.
Why is this? What do I do if I want compilers other than Clang to optimize such cases of UB? Is there any way for me to actually get the benefits of UB instead of just the downsides with either compiler?
Footnotes
1 Apparently this was too much of an SSCCE for some folks to appreciate the actual issue. If you want a more complicated example of this problem that isn't undefined on every execution of the program, just massage it a bit. e.g.:
int main(int argc, char *[])
{
int x;
if (argc) { x = 100 + (argc * argc + x); }
return x;
}
On GCC you get:
main:
xor eax, eax
test edi, edi
je .L1
imul edi, edi
lea eax, [rdi+100]
.L1:
ret
On Clang you get:
main:
ret
Same issue, just more complicated.
Optimizing for actually reading uninitialized data is not the point.
Optimizing for assuming the data you read must have been initialized is.
So if you have some variable that can only ever be written as 3 or 1, the compiler can assume it is odd.
Or, if you add a positive signed constant to a signed value, the compiler can assume the result is larger than the original signed value (this makes some loops faster).
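A hedged sketch of those two assumptions (typical GCC/Clang behaviour at -O2, not something the Standard guarantees; the names are made up for illustration):
// Because signed overflow is UB, the compiler may assume v + 1 > v:
bool larger_after_add(int v) {
    return v + 1 > v;          // commonly folded to 'true'
}

// If whole-program analysis shows 'mode' is only ever assigned 1 or 3,
// reads of it may be assumed to be odd:
static int mode = 1;
bool mode_is_odd() {
    return mode % 2 != 0;      // may be folded to 'true' under that analysis
}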
What the optimizer does when it proves an uninitialized value is read isn't important; making UB or indeterminate-value calculations faster is not the point. Well-behaved programs don't do that on purpose, and spending effort making it faster (or slower, or caring at all) is a waste of compiler writers' time.
It may fall out of other efforts. Or it may not.
Consider this example:
int foo(bool x) {
int y;
if (x) y = 3;
return y;
}
GCC realizes that the only way the function can return something well defined is when x is true. Hence, when optimizations are turned on, there is no branch:
foo(bool):
mov eax, 3
ret
Calling foo(true) is not undefined behavior. Calling foo(false) is undefined behavior. There is nothing in the standard that specifies why foo(false) returns 3. There is also nothing in the standard that mandates that foo(false) does not return 3. Compilers do not optimize code that has undefined behavior, but compilers can optimize code without UB (eg remove the branch in foo) because it is not specified what happens when there is UB.
What do I do if I want compilers other than Clang to optimize such cases of UB?
Compilers do that by default. GCC is no different from Clang in that respect.
In your example of
int main() { int x; return 17 / x; }
there is no missed optimization, because it is not defined what the code will do in the first place.
Your second example can be considered a missed opportunity for optimization. Though, again: UB grants opportunities to optimize code that does not have UB. The idea is not that you introduce UB in your code to gain optimizations. Your second example can (and should) be rewritten as
int main(int argc, char *[])
{
int x = 100 + (argc * argc + x);
return x;
}
It isn't a big issue in practice that GCC doesn't bother to remove the branch in your version. If you don't need the branch, you don't have to write it and then expect the compiler to remove it.
The Standard uses the term "Undefined Behavior" to refer to actions which in some contexts might be non-portable but correct, but in other contexts would be erroneous, without making any effort to distinguish when a particular action should be viewed one way or the other.
In C89 and C99, if it would be possible for a type's underlying storage to hold an invalid bit pattern, attempting to use an uninitialized automatic-duration or malloc-allocated object of that type would invoke Undefined Behavior, but if all possible bit patterns would be valid, accessing such an object would simply yield an Unspecified Value of that type. This meant, for example, that a program could do something like:
#include <stdint.h>

struct ushorts256 { uint16_t dat[256]; } x,y;
void test(void)
{
struct ushorts256 temp;
int i;
for (i=0; i<86; i++)
temp.dat[i*3]=i;
x=temp;
y=temp;
}
and if the callers only cared about what was in multiple-of-3 elements of the structures, there would be no need to have the code worry about the other 170 values of temp.
C11 changed the rules so that compiler writers wouldn't have to follow the C89 and C99 behavior if they felt there was something more useful they could do. For example, depending upon what calling code would do with the arrays, it might be more efficient to simply have the code write every third item of x and every third item of y, leaving the remaining items alone. A consequence of this would be that non-multiple-of-3 items of x might not match the corresponding items of y, but people seeking to sell compilers were expected to be able to judge their particular customers' needs better than the Committee ever could.
Some compilers treat uninitialized objects in a manner consistent with C89 and C99. Some may exploit the freedom to have the values behave non-deterministically (as in the example above) but without otherwise disrupting program behavior. Some may opt to treat any program that accesses uninitialized variables in gratuitously meaningless fashion. Portable programs may not rely upon any particular treatment, but the authors of the Standard have expressly stated they did not wish to "demean" useful programs that happened to be non-portable (see http://www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf page 13).
This question already has answers here: Two different values at the same memory address (7 answers)
#include <iostream>
using namespace std;
int main() {
const int a = 10;
auto *b = const_cast<int *>(&a);
*b = 20;
cout << a << " " << *b;
cout << endl << &a << " " << b;
}
The output looks like:
10 20
0x7ffeeb1d396c 0x7ffeeb1d396c
a and *b are at the same address, so why do they have different values?
Most probably this is caused by optimization.
As molbdnilo already said in his comment: "Compilers trust you blindly. If you lie to them, strange things can happen."
So when optimization is enabled, the compiler finds the declaration
const int a = 10;
and thinks "ah, this will never change, so we don't need a "real" variable and can replace all appearances of a in the code simply with 10". This behavior is called constant folding.
Now in the next line you "cheat" the compiler:
auto *b = const_cast<int *>(&a);
*b = 20;
and change a, although you have promised not to do so.
The result is confusion.
Like a lot of people have already mentioned and Christian Hackl has thoroughly analyzed in his excellent in-depth answer, it's undefined behavior. The compiler is generally allowed to apply constant folding on constants, which are explicitly declared const.
Your question is a very good example (I don't know how anyone can vote it down!) why const_cast is very dangerous (and even more when combined with raw pointers), should be only used if absolutely necessary, and should be at least thoroughly commented why it was used if it was unavoidable. Comments are important in that case because it's not only the compiler you are "cheating":
Also your co-workers will rely on the information const and rely on the information that it won't change. If it DOES change though, and you didn't inform them, and did not comment, and did not exactly know what you were doing, you'll have a bad day at work :)
Try it out: Perhaps your program will even behave different in debug build (not optimized) and release build (optimized), and those bugs are usually pretty annoying to fix.
I think it helps to view const like this:
If you declare something as const, it is actually not the compiler that ensures that you don't change it, but rather the opposite: you make a promise to the compiler that you won't change it.
The language helps you a bit to keep your promise, e.g. you cannot do:
const int a = 5;
a = 6; // error
but as you discovered, you can indeed attempt to modify something that you declared const. However, then you have broken your promise, and as always in C++, once you break the rules you are on your own (a.k.a. undefined behaviour).
The a and *b are at the same address, why do they have different
value?
They don't, not according to C++.
How can that be? The answer is that the C++ language does not define the behaviour of your program because you modify a const object. That's not allowed, and there are no rules for what will happen in such a case.
You are only allowed to use const_cast in order to modify something which wasn't const in the first place:
int a = 123;
int const* ptr1 = &a;
// *ptr1 = 456; // compilation error
int* ptr2 = const_cast<int*>(ptr1);
*ptr2 = 456; // OK
So you've ended up with a program whose behaviour is not defined by C++. The behaviour is instead defined by compiler writers, compiler settings and pure chance.
That's also what's wrong with your question. It's impossible to give you a correct answer without knowing exactly how you compiled this and on which system it runs, and perhaps even then there are random factors involved.
You can inspect your binary to find out what the compiler thought it was doing; there are even online tools like https://gcc.godbolt.org/ for that. But try to use printf instead of std::cout for such an analysis; it will produce easier-to-read assembly code. You may also use easier-to-spot numbers, like 123 and 456.
For example, assuming that you even use GCC, compare these two programs; the only difference is the const modifier for a:
#include <stdio.h>
int main() {
int const a = 123;
auto *b = const_cast<int *>(&a);
*b = 456;
printf("%d", a);
printf("%d", *b);
}
#include <stdio.h>
int main() {
int a = 123;
auto *b = const_cast<int *>(&a);
*b = 456;
printf("%d", a);
printf("%d", *b);
}
Look at what the printf("%d", a); call becomes. In the const version, it becomes:
mov esi, 123
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
In the non-const version:
mov eax, DWORD PTR [rbp-12]
mov esi, eax
mov edi, OFFSET FLAT:.LC0
mov eax, 0
call printf
Without going too much into the details of assembly code, one can see that the compiler optimised the const variable such that the value 123 is pushed directly to printf. It doesn't really treat a as a variable for the printf call, but it does do so for the pointer initialisation.
Such chaos is the nature of undefined behaviour. The moral of the story is that you should always avoid undefined behaviour, and thinking twice before casting is an important part of allowing the compiler to help you in that endeavor.
Compiling your code once with const and once without const using GCC shows that either no space is allocated for a, or its contents are ignored in the const version (even with optimization disabled). The only difference in the assembly output is the line where you print a: the compiler simply prints 10 instead of referring to memory and fetching the contents of a (left side is the const int a version, right side is the int a version):
movl $10, %esi | movl -36(%rbp), %eax
> movl %eax, %esi
As you can see, there is an extra memory access for the non-const version, which we know is expensive in terms of time. So the compiler decided to trust your code, concluded that a const is never to be changed, and replaced its value wherever the variable is referenced.
Compiler options: --std=c++11 -O0 -S
Example:
#define Var1 35
static const int Var1( 35);
So while #define replaces every place where I've used Var1 with 35 at compile time (which I presume makes compilation slightly longer if you have a lot of them, as it parses the code), using a static const int makes the compiler treat it as a variable.
Does this mean that when using static const int it'll increase the memory imprint of my program because it has to use memory for all those constants, or is this overhead pretty much optimised out by the compiler anyway?
The reason I ask is because I'm wondering if it'd be better, for situations like this, to have them as static const ints in debug mode (so you can easily see the values while debugging) but make them #defines in release mode so it would make the program smaller.
Using macros to “make the program smaller” is ungood for several reasons:
Use of macros may instead make the program larger, or have no effect.
Macros don't follow the scoping rules in C++. You risk inadvertent text replacement.
Depending on the quality of the tools you may lose debug information.
The advantageous effect, if it occurs, is marginal.
The common convention for avoiding macro name clashes, namely ALL UPPERCASE, is an eyesore.
In short this is an example of a premature optimization.
And as Donald Knuth observed, premature optimizations are Evil™.
In passing, note that the static in
static const int Var1( 35);
… is redundant if this is at namespace scope. A namespace scope constant has internal linkage by default. Just write
const int Var1 = 35;
… for the same effect, but IMHO more clear.
If it is static, then the compiler can see that it is only used inside that translation unit and does not have to wonder how it is used externally, which is an advantage. If you don't do anything that forces it to be an actual variable (such as creating a pointer to it), the compiler will often optimize it out, as sketched below.
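A hedged sketch of that point (typical behaviour, not a guarantee): taking the address forces the constant to exist in memory, while plain uses of the value can simply be folded.
static const int Var1 = 35;

const int *addr() { return &Var1; }   // &Var1 requires Var1 to have storage

int twice() { return Var1 * 2; }      // here the compiler can simply fold this to 70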
A friendlier approach could be using enums
enum { Var1 = 35 };
or in C++11, constexpr
constexpr int Var1 = 35;
These also have the advantage of not messing with a variable of the same name in another scope, if you later had
void f() {
int Var1;
}
The #define would turn it into int 35;
But the difference in memory used will be very small, likely so insignificant it will never have any measurable impact on performance unless you're in an extremely limited environment.
"Does this mean that when using static const int it'll increase the memory imprint of my program because it has to use memory for all those constants, or is this overhead pretty much optimised out by the compiler anyway?"
That's totally dependent on your actual compiler's implementation and how well its optimization features are done.
For simple numerical constants that come within a logical context, I'd prefer using enum declarations anyway.
And most of the time I find using a static const int Var1( 35); is the better choice vs a #define'd value, because I have full control over the scope where it should be seen.
Every decent compiler performs constant propagation to see which expressions will remain constant. const helps the compiler in this job.
The next thing most compilers do very well is remove unused parts of the code. This is why const variables that are not visible to the outside, neither directly (local variables, static variables) nor indirectly (i.e. the address of the variable was never used as a reference or for assigning a value to a pointer), are removed by the optimizer.
Example:
static const int e = 29;
int main()
{
int x = e;
return x + 1;
}
Will be compiled by MSVC 2013 in release mode to :
PUBLIC _main
_TEXT SEGMENT
_main PROC
mov eax, 30 ; optimized the code to return 30
ret 0
_main ENDP
_TEXT ENDS
END ; no space is reserved anywhere for the static.
According to the GCC manual, the -fipa-pta optimization does:
-fipa-pta: Perform interprocedural pointer analysis and interprocedural modification and reference analysis. This option can cause excessive
memory and compile-time usage on large compilation units. It is not
enabled by default at any optimization level.
What I assume is that GCC tries to differentiate mutable and immutable data based on pointers and references used in a procedure. Can someone with more in-depth GCC knowledge explain what -fipa-pta does?
I think the word "interprocedural" is the key here.
I'm not intimately familiar with gcc's optimizer, but I've worked on optimizing compilers before. The following is somewhat speculative; take it with a small grain of salt, or confirm it with someone who knows gcc's internals.
An optimizing compiler typically performs analysis and optimization only within each individual function (or subroutine, or procedure, depending on the language). For example, given code like this contrived example:
double *ptr = ...;
void foo(void) {
...
*ptr = 123.456;
some_other_function();
printf("*ptr = %f\n", *ptr);
}
the optimizer will not be able to determine whether the value of *ptr has been changed by the call to some_other_function().
If interprocedural analysis is enabled, then the optimizer can analyze the behavior of some_other_function(), and it may be able to prove that it can't modify *ptr. Given such analysis, it can determine that the expression *ptr must still evaluate to 123.456, and in principle it could even replace the printf call with puts("*ptr = 123.456");.
(In fact, with a small program similar to the above code snippet I got the same generated code with -O3 and -O3 -fipa-pta, so I'm probably missing something.)
Since a typical program contains a large number of functions, with a huge number of possible call sequences, this kind of analysis can be very expensive.
As quoted from this article:
The "-fipa-pta" optimization takes the bodies of the called functions into account when doing the analysis, so compiling
void __attribute__((noinline))
bar(int *x, int *y)
{
*x = *y;
}
int foo(void)
{
int a, b = 5;
bar(&a, &b);
return b + 10;
}
with -fipa-pta makes the compiler see that bar does not modify b, and the compiler optimizes foo by changing b+10 to 15
int foo(void)
{
int a, b = 5;
bar(&a, &b);
return 15;
}
A more relevant example is the “slow” code from the “Integer division is slow” blog post
std::random_device entropySource;
std::mt19937 randGenerator(entropySource());
std::uniform_int_distribution<int> theIntDist(0, 99);
for (int i = 0; i < 1000000000; i++) {
volatile auto r = theIntDist(randGenerator);
}
Compiling this with -fipa-pta makes the compiler see that theIntDist is not modified within the loop, and the inlined code can thus be constant-folded in the same way as the “fast” version – with the result that it runs four times faster.
I've read that the C++ standard allows optimization to a point where it can actually interfere with expected functionality. When I say this, I'm talking about return value optimization, where you might actually have some logic in the copy constructor, yet the compiler optimizes the call out.
I find this somewhat bad, as someone who doesn't know about it might spend quite some time fixing a bug resulting from it.
What I want to know is whether there are any other situations where over-optimization from the compiler can change functionality.
For example, something like:
int x = 1;
x = 1;
x = 1;
x = 1;
might be optimized to a single x=1;
Suppose I have:
class A;
A a = b;
a = b;
a = b;
Could this possibly also be optimized? Probably not the best example, but I hope you know what I mean...
Eliding copy operations is the only case where a compiler is allowed to optimize to the point where side effects visibly change. Do not rely on copy constructors being called, the compiler might optimize away those calls.
For everything else, the "as-if" rule applies: The compiler might optimize as it pleases, as long as the visible side effects are the same as if the compiler had not optimized at all.
("Visible side effects" include, for example, stuff written to the console or the file system, but not runtime and CPU fan speed.)
It might be optimized, yes. But you still have some control over the process; for example, consider this code:
int x = 1;
x = 1;
x = 1;
x = 1;
volatile int y = 1;
y = 1;
y = 1;
y = 1;
Provided that neither x nor y is used below this fragment, VS 2010 generates this code:
int x = 1;
x = 1;
x = 1;
x = 1;
volatile int y = 1;
010B1004 xor eax,eax
010B1006 inc eax
010B1007 mov dword ptr [y],eax
y = 1;
010B100A mov dword ptr [y],eax
y = 1;
010B100D mov dword ptr [y],eax
y = 1;
010B1010 mov dword ptr [y],eax
That is, optimization strips all lines with "x" and leaves all four lines with "y". This is how volatile works, but the point is that you still have control over what the compiler does for you.
Whether it is a class or a primitive type, it all depends on the compiler and how sophisticated its optimization capabilities are.
Another code fragment for study:
class A
{
private:
int c;
public:
A(int b)
{
*this = b;
}
A& operator = (int b)
{
c = b;
return *this;
}
};
int _tmain(int argc, _TCHAR* argv[])
{
int b = 0;
A a = b;
a = b;
a = b;
return 0;
}
Visual Studio 2010 optimization strips all the code to nothing; in a release build with "full optimization", _tmain does nothing and immediately returns zero.
This will depend on how class A is implemented, whether the compiler can see the implementation, and whether it is smart enough. For example, if operator=() in class A has side effects, optimizing it out would change the program behavior and is not possible; see the sketch below.
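For example, here is a hedged variant of the class from the previous answer whose operator= has a visible side effect; under the as-if rule the compiler must preserve the three lines of output, so the assignments cannot simply be stripped:
#include <iostream>

class A {
    int c = 0;
public:
    A(int b) { *this = b; }
    A& operator=(int b) {
        std::cout << "assign " << b << '\n';   // visible side effect
        c = b;
        return *this;
    }
};

int main() {
    int b = 0;
    A a = b;   // constructor delegates to operator=, printing once
    a = b;     // printed
    a = b;     // printed
    return 0;
}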
Optimization does not (properly speaking) "remove calls to copies or assignments".
It converts a finite state machine into another finite state machine with the same external behaviour.
Now, if you repeatedly call
a=b; a=b; a=b;
what the compiler does depends on what operator= actually is.
If the compiler finds that a call has no chance of altering the state of the program (where the "state of the program" is "everything that lives longer than a scope and that the scope can access"), it will strip it out.
If this cannot be "demonstrated", the call will stay in place.
Whatever the compiler does, don't worry too much about it: the compiler cannot (by contract) change the external logic of a program or of part of it.
I don't know C++ that much, but I am currently reading Compilers: Principles, Techniques, and Tools.
Here is a snippet from its section on code optimization:
the machine-independent code-optimization phase attempts to improve intermediate code so that better target code will result. Usually better means faster, but other objectives may be desired, such as shorter code, or target code that consumes less power. For example, a straightforward algorithm generates the intermediate code (1.3), using an instruction for each operator in the tree representation that comes from the semantic analyzer. A simple intermediate code generation algorithm followed by code optimization is a reasonable way to generate good target code. The optimizer can deduce that the conversion of 60 from integer to floating point can be done once and for all at compile time, so the inttofloat operation can be eliminated by replacing the integer 60 by the floating-point number 60.0. Moreover, t3 is used only once to transmit its value to id1, so the optimizer can transform (1.3) into the shorter sequence (1.4).
(1.3)
t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
(1.4)
t1 = id3 * 60.0
id1 = id2 + t1
All in all, I mean to say that code optimization happens at a much deeper level, and because your code is in such a simple state it doesn't affect what your code does.
I had some trouble with const variables and const_cast. The compiler produced incorrect results when the value was used to calculate something else: the const variable was optimized away and its old value was turned into a compile-time constant. Truly "unexpected behavior". Okay, perhaps not ;)
Example:
const int x = 2;
const_cast<int&>(x) = 3;   // undefined behaviour: modifying an object declared const
int y = x * 2;             // the compiler may still fold x to 2 here
cout << y << endl;         // so this typically prints 4, not 6