How to instruct VC++ compiler to not inline a constant? - c++

I have the following global constant in my C++ program:
const int K = 123456 ;
When I compile the program, the resulting executable contains the literal value 123456 in all the places where the value is used (dozens of times).
But, if I remove the const qualifier, the value 123456 appears only once in the entire executable (in the .data section).
This is the result I'm looking for. I want the value 123456 to appear only once so that it can be changed simply by editing the .exe file with a HEX editor.
However, I don't want to remove the const qualifier because I want the compiler to prevent me from accidentally modifying the constant in the source code.
Is it possible to instruct the compiler somehow to not inline the value of said constant?
The reason I need to do this is so that the executable is easily modifiable by students who will be tasked with "cracking" an example program to alter its behavior. The exercise must be simple enough for inexperienced people.

If you don't want K to be inlined then put this in a header file:
extern const int K;
This means "K is defined somewhere else". Then put this in a cpp file:
const int K = 123456;
In all the places where K is used, the compiler only knows that K is a const int declared externally. The compiler doesn't know the value of K, so it cannot be inlined. The linker will find the definition of K in the cpp file and put it in the .data section of the executable.
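A minimal single-file sketch of that layout (in real code the three parts would live in separate files, e.g. a hypothetical config.h, main.cpp, and config.cpp); note that extern is what gives K external linkage:

```cpp
extern const int K;            // "config.h": declaration only; in the real
                               // multi-file layout the value is unknown here

int get_k() { return K; }      // "main.cpp": with separate files, the
                               // compiler must emit a load, not an immediate

extern const int K = 123456;   // "config.cpp": the single definition
                               // (extern needed: a namespace-scope const
                               // has internal linkage in C++ by default)
```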
Alternatively, you could define K like this:
const volatile int K = 123456;
This means "K might magically change so you better not assume its value". It has a similar effect to the previous approach as the compiler won't inline K because it can't assume that K will always be 123456. The previous approach would fail if LTO was enabled but using volatile should work in that case.
I must say, this is a really weird thing to do. If you want to make your program configurable, you should put the value of K into a text file and then read the file at startup.

The simplest option is probably to declare it as global without const, so the compiler can't assume that it still has the value of the static initializer.
int K = 123456;
Even link-time optimization can't know that a library function doesn't access this global, assuming you call any in your program.
If you used static int K = 123456;, the compiler could notice that no functions in the compilation unit write the value, and none of them pass or return its address, so escape analysis for the whole compilation unit could discover that it was effectively a constant and could be optimized away.
(If you really wanted it to be static int K;, include a global function like void setK(int x){K=x;} that you never actually call. Without Link-Time Optimization, the compiler will have to assume that something outside this compilation unit could have called this function and changed K, and that any call to a function whose definition isn't visible might result in such a call.)
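A compilable sketch of that never-called-setter trick (setK is the name suggested above; this is illustrative and does not hold up under LTO):

```cpp
static int K = 123456;

int read_k() { return K; }

// Exported but never called. Without LTO, the compiler must assume that
// any call to an opaque function might reach setK from another
// translation unit, so it cannot fold K to a constant across such calls.
void setK(int x) { K = x; }
```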
Beware that volatile const int K = 123456; can hurt optimization significantly more than making it non-const, especially if you have expressions that use K multiple times.
(But either of these can hurt a lot, depending on what optimizations were possible. Constant-propagation can be a huge win.)
The compiler is required to emit asm that loads exactly K once for each time the C abstract machine reads it. (e.g. reading K is considered a visible side-effect, like a read from an MMIO port or a location you have a hardware watchpoint on.)
If you want to let a compiler load it once per loop, and assume K is a loop invariant, then code that uses it should do int local_k = K;. It's up to you how often you want to re-read K, i.e. what scope you do / redo local_k = K at.
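The local-copy idiom can be sketched like this (a minimal example; the counting loop is just a stand-in for real work that treats K as loop-invariant):

```cpp
const volatile int K = 123456;     // every direct read of K reloads memory

long count_up() {
    int local_k = K;               // exactly one volatile read per call
    long total = 0;
    for (int i = 0; i < local_k; ++i)
        ++total;                   // the loop bound is now a plain local
    return total;
}
```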
On x86, using a memory source operand that stays hot in L1d cache is probably not much of a performance problem, but it will prevent auto-vectorization.
The reason I need to do this is so that the executable is easily modifiable by students who will be tasked with "cracking" an example program to alter its behavior. The exercise must be simple enough for inexperienced people.
For this use-case, yes volatile is exactly what you want. Having all uses re-read from memory on the spot makes it slightly simpler than following the value cached in a register.
And performance is essentially irrelevant, and you won't want auto-vectorization. Probably just light optimization so the students don't have to wade through store/reload of everything after every C++ statement. Like gcc's -Og would be ideal.
With MSVC, maybe try -O1 or -O2 and see if it does anything confusing. I don't think it has an option for light-but-not-too-aggressive optimization; it's either a debug build (nice for single-stepping the C++ source, bad for reading asm) or fully optimized for size or speed.

Try declaring the constant as volatile. That should result in a single and changeable value that won't be inlined.

Related

Is it possible to make variable truly read-only in C++?

By using the const qualifier a variable is supposed to be made read-only. As an example, an int marked const cannot be assigned to:
const int number = 5; //fine to initialize
number = 3; //error, number is const
At first glance it looks like this makes it impossible to modify the contents of number. Unfortunately, it actually can be done. As an example, const_cast could be used (*const_cast<int*>(&number) = 3). This is undefined behavior, but undefined behavior doesn't guarantee that number is left unmodified. It could cause the program to crash, but it could also simply modify the value and continue.
Is it possible to make it actually impossible to modify a variable?
A possible need for this might be security concerns. It might need to be of highest importance that some very valuable data must not be changed or that a piece of data being sent must not be modified.
No, this is not the concern of a programming language. Any "access" protection is only superficial and only exists at compile-time.
Memory of a computer can always be modified at runtime if you have the corresponding rights. Your OS might provide you with facilities to secure pages of memory though, e.g. VirtualProtect() under Windows.
(Notice that an "attacker" could use the same facilities to restore the access if he has the privilege to do so)
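A hedged POSIX sketch of such page protection (Windows would use VirtualAlloc/VirtualProtect instead; make_readonly_int is a made-up helper name):

```cpp
#include <sys/mman.h>
#include <unistd.h>

// Give a value its own page, then drop write permission: any later
// store through the pointer faults (SIGSEGV) instead of succeeding.
int* make_readonly_int(int value) {
    const long page = sysconf(_SC_PAGESIZE);
    void* mem = mmap(nullptr, static_cast<size_t>(page),
                     PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return nullptr;
    int* p = static_cast<int*>(mem);
    *p = value;                                           // still writable here
    mprotect(mem, static_cast<size_t>(page), PROT_READ);  // reads only from now on
    return p;
}
```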
Also I assume that there might be hardware solutions for this.
There is also the option of encrypting the data in question. Yet it appears to be a chicken-and-egg situation as the private key for the encryption and decryption has to be stored somewhere in memory as well (with a software-only solution).
Most of the answers in this thread are correct, but they are about const, while the OP is asking for a way to have a constant value defined and used in the source code. My crystal ball says the OP is looking for symbolic constants (preprocessor #define statements).
#define NUMBER 3
//... some other code
std::cout<<NUMBER;
This way, the developer is able to parametrize values and maintain them easily, while there's virtually no (easy) way to alter it once the program is compiled and launched.
Just keep in mind that const variables are visible to debuggers while symbolic constants are not, though the latter require no additional memory. Another criterion is type checking, which is absent for symbolic constants, as for all macros.
const is not intended to make a variable read-only.
The meaning of const x is basically:
Hey compiler, please prevent me from casually writing code in this scope which changes x.
That's very different from:
Hey compiler, please prevent any changes to x in this scope.
Even if you don't write any const_cast's yourself, the compiler will still not assume that const'ed entities won't change. Specifically, if you use the function
int foo(const int* x);
the compiler cannot assume that foo() doesn't change the memory pointed to by x.
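A small sketch of why that assumption would be unsound (observe is a hypothetical name): casting away const and writing through the pointer is legal as long as the original object isn't itself const:

```cpp
void foo(const int* x) {
    *const_cast<int*>(x) = 42;  // legal: the pointee below is non-const
}

int observe() {
    int y = 1;
    foo(&y);     // the compiler must assume y may change here
    return y;    // so it has to re-read y after the call
}
```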
You could use your value without a variable
Variables vary... so, naturally, a way to prevent that is using values which aren't stored in variables. You can achieve that by using...
an enumeration with a single value: enum : int { number = 1 }.
the preprocessor: #define NUMBER 1 <- Not recommended
a function: inline int get_number() { return 1; }
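The three options above, side by side in one compilable sketch; none of them creates an addressable variable that could be overwritten at runtime:

```cpp
enum : int { number = 1 };             // enumerator: a pure value
#define NUMBER 1                       // macro: textual substitution
inline int get_number() { return 1; }  // function: value exists only as a return
```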
You could use implementation/platform-specific features
As @SebastianHoffman suggests, typical platforms allow marking some of a process' virtual memory space as read-only, so that attempts to change it result in an access violation signal to the process and the suspension of its execution. This is not a solution within the language itself, but it is often useful. Example: When you use string literals, e.g.:
const char* my_str = "Hello world";
const_cast<char*>(my_str)[0] = 'Y';
Your process will likely fail, with a message such as:
Segmentation fault (core dumped)
If you know the program at compile-time, you can place the data in read-only memory. Sure, someone could get around this, but security is about layers rather than absolutes. This makes it harder. C++ has no concept of this, so you'll have to inspect the resulting binary to see if it's happened (this could be scripted as a post-build check).
If you don't have the value at compile-time, your program depends on being able to change / set it at runtime, so you fundamentally cannot stop that from happening.
Of course, you can make it harder though things like const so the code is compiled assuming it won't change / programmers have a harder time accidentally changing it.
You may also find constexpr an interesting tool to explore here.
There is no way for the language to specify what code that does not adhere to the specification will do.
In your example, number is truly constant. You correctly note that modifying it after a const_cast would be undefined behavior. And indeed it is impossible to modify it in a correct program.

Does the inline asm compiler barrier (memory clobber) count as an external function, or as static function call?

Introduction/confirmation of basic facts
It is well known that with GCC style C and C++ compilers, you can use inline assembly with a "memory" clobber:
asm("":::"memory");
to prevent reordering of (most) code past it, acting as a (thread local) "memory barrier" (for example for the purpose of interacting with async signals).
Note: these "compiler barriers" do NOT accomplish inter-threads synchronization.
It does the equivalent of a call to a non-inline function, potentially reading all objects that can be read outside of the current scope and altering all those that can be altered (non-const objects):
int i;

void f() {
    int j = i;
    asm("":::"memory"); // can change i
    j += i;             // not optimized to j *= 2
    // ... (assume j isn't unused)
}
Essentially it's the same as calling a NOP function that's separately compiled, except that the non-inline NOP function call is later (1) inlined so nothing survives from it.
(1) say, after compiler middle pass, after analysis
So here j cannot be changed as it's local, and is still the copy of the old i value, but i might have changed, so the compilation is pretty much the same as:
volatile int vi;

int f2() {
    int j = vi;
    ;          // can "change" vi
    j += vi;   // not j *= 2
    return j;
}
Both reads of vi are needed (for a different reason) so the compiler doesn't change that into 2*vi.
Is my understanding correct up to that point? (I presume it is. Otherwise the question doesn't make sense.)
The real issue: extern or static
The above was just the preamble. The issue I have is with static variables, possible calls to static functions (or the C++ equivalent, anonymous namespaces):
Can a memory clobber access static data that isn't otherwise accessible via non static functions, and call static functions that aren't otherwise callable, as none of these are visible at link stage, from other modules, if they aren't named explicitly in the input arguments of the asm directive?
static int si;

int f3() {
    int j = si;
    asm("":::"memory"); // can access si?
    j += si;            // or optimized to j = si*2 ?
    return j;
}
[Note: the use of static is a little ambiguous. The suggestion is that the boundary of the TU is important, and that the static variable is TU-private, but I have not described how it was manipulated. Let's assume it really is manipulated in that TU, or the compiler might assume it's effectively a constant.]
In other words, is that "clobber" the equivalent of a call to:
an external NOP function, which wouldn't be able to name si directly, nor to access it in any indirect way, as no function in the TU either communicates the address of si, or makes si indirectly modifiable
a locally defined NOP function that can access si
?
Bonus question: global optimization
If the answer is that static variables aren't treated like extern variables in that case, what is the impact when compiling the program at once? More specifically:
During global compilation of the whole program, with global analysis and inference over variables values, is the knowledge of the fact that for example a global variable is never modified (or never assigned a negative value...), except possibly in an asm "clobber", an input of the optimizer?
In other words, if non static i is only named in one TU, can it be optimized as if it was a static int even if there are asm statements? Should global variables be explicitly listed as clobbers in that case?
It does the equivalent of a call to a non-inline function, potentially reading all objects that can be read outside of the current scope and altering all those that can be altered (non const objects):
No.
The compiler can decide to inline any function in the same compilation unit (and then, if the function wasn't static, also provide a separate "not inlined" copy for callers in other compilation units so that the linker can find one); and with link-time code optimization/link-time code generation the linker can decide to inline any functions in different compilation units. The only case where it's currently impossible for any function to be inlined is when it is in a shared library; this limitation exists only because operating systems aren't currently capable of "load-time optimization".
In other words, any appearance of any kind of barrier for any function is an unintended side-effect of optimizer weaknesses and not guaranteed; and therefore cannot and should not be relied on.
The real issue: inline assembly
There are 5 possibilities:
a) The compiler understands all assembly, and is able to examine the inline assembly and determine what is/isn't clobbered; there is no clobber list (and no need for one). In this case (depending on how advanced the compiler/optimiser is) the compiler may be able to determine things like "this area of memory may be clobbered but that area of memory won't be clobbered" and avoid the cost of reloading data from the area of memory that wasn't clobbered.
b) The compiler doesn't understand any assembly and there is no clobber list, so the compiler has to assume everything will be clobbered; which means that the compiler has to generate code that saves everything (e.g. currently in-use values in registers, etc.) to memory before the inline assembly is executed and reloads everything afterwards, which will give extremely bad performance.
c) The compiler doesn't understand any assembly, and expects the programmer to provide a clobber list to avoid (some of) the performance disaster of having to assume everything will be clobbered.
d) The compiler understands some assembly but not all assembly, and doesn't have a clobber list. If it doesn't understand the assembly it assumes everything may have been clobbered.
e) The compiler understands some assembly but not all assembly, and does have an (optional?) clobber list. If it doesn't understand the assembly it relies on the clobber list (and/or falls back to "assume everything is clobbered" if there is no clobber list), and if it does understand the assembly it ignores the clobber list.
Of course a compiler that uses "option c)" can be improved to use "option e)"; and a compiler that uses "option e)" can be improved to use "option a)".
In other words, any appearance of any kind of barrier for something like "asm("":::"memory");" is an unintended side-effect of the compiler being "improvable"; and therefore cannot and should not be relied on.
Summary
None of the things you've mentioned are actually a barrier of any kind. It's all just "unintended and undesired failure to optimize".
If you do need a barrier, then use an actual barrier (e.g. asm("mfence":::"memory");). However (unless you need inter-thread synchronization and aren't using atomics) it's extremely likely that you do not need a barrier in the first place.

Does C optimize the check portion of for loops?

With the following code, how many times would the min function actually be called?
for (int i = 0; i < min(size, max_size); i++) {
    // Do something cool that does not involve changing the value of size or max_size
}
Would the compiler notice that it could just calculate the minimum once and keep it in a register, or should I explicitly create a variable to hold the value before entering the loop? What kinds of languages would be able to optimize this?
As an extension if I were in an object oriented language with a similar loop except it looked more like this
for (int i = 0; i < object.coolFunc(); i++) {
    // Code that may change parameters and state of object but does not
    // change the return value of coolFunc()
}
What would be optimized?
Any good compiler will optimize the controlling expression of a for loop by evaluating visibly invariant subexpressions in it just once, provided optimization is enabled. Here, “invariant” means the value of the subexpression does not change while the loop is executing. “Visibly” means the compiler can see that the expression is invariant. There are things that can interfere with this:
Suppose, inside the loop, some function is called and the address of size is passed as an argument. Since the function has the address of size, it could change the contents of size. Maybe the function does not do this, but the compiler might not be able to see the contents of the function. Its source code could be in another file. Or the function could be so complicated the compiler cannot analyze it. Then the compiler cannot see that size does not change.
min is not a standard C function, so your program must define it somewhere. As above, if the compiler does not know what min does or if it is too complicated for the compiler to analyze (not likely in this particular case, but in general), the compiler might not be able to see that it is a pure function.
Of course, the C standard does not guarantee this optimization. However, as you become experienced in programming, your knowledge of compilers and other tools should grow, and you will become familiar with what is expected of good tools, and you will also learn to beware of issues such as those above. For simple expressions, you can expect the compiler to optimize. But you need to remain alert to things that can interfere with optimization.
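When you don't want to rely on the optimizer, hoisting the invariant manually guarantees a single evaluation. A minimal sketch (count_iterations is a made-up name):

```cpp
#include <algorithm>

int count_iterations(int size, int max_size) {
    const int bound = std::min(size, max_size);  // evaluated exactly once
    int n = 0;
    for (int i = 0; i < bound; ++i)
        ++n;                   // the body cannot invalidate `bound`
    return n;
}
```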

Why is it impossible to build a compiler that can determine if a C++ function will change the value of a particular variable?

I read this line in a book:
It is provably impossible to build a compiler that can actually determine whether or not a C++ function will change the value of a particular variable.
The paragraph was talking about why the compiler is conservative when checking for const-ness.
Why is it impossible to build such a compiler?
The compiler can always check if a variable is reassigned, a non-const function is being invoked on it, or if it is being passed in as a non-const parameter...
Why is it impossible to build such a compiler?
For the same reason that you can't write a program that will determine whether any given program will terminate. This is known as the halting problem, and it's one of those things that's not computable.
To be clear, you can write a compiler that can determine that a function does change the variable in some cases, but you can't write one that reliably tells you that the function will or won't change the variable (or halt) for every possible function.
Here's an easy example:
void foo() {
    if (bar() == 0) this->a = 1;
}
How can a compiler determine, just from looking at that code, whether foo will ever change a? Whether it does or doesn't depends on conditions external to the function, namely the implementation of bar. There's more than that to the proof that the halting problem isn't computable, but it's already nicely explained at the linked Wikipedia article (and in every computation theory textbook), so I'll not attempt to explain it correctly here.
Imagine such a compiler exists. Let's also assume that for convenience it provides a library function that returns 1 if the passed function modifies a given variable and 0 when the function doesn't. Then what should this program print?
int variable = 0;

void f() {
    if (modifies_variable(f, variable)) {
        /* do nothing */
    } else {
        /* modify variable */
        variable = 1;
    }
}

int main(int argc, char **argv) {
    if (modifies_variable(f, variable)) {
        printf("Modifies variable\n");
    } else {
        printf("Does not modify variable\n");
    }
    return 0;
}
Don't confuse "will or will not modify a variable given these inputs" for "has an execution path which modifies a variable."
The former is called opaque predicate determination, and is trivially impossible to decide - aside from reduction from the halting problem, you could just point out the inputs might come from an unknown source (eg. the user). This is true of all languages, not just C++.
The latter statement, however, can be determined by looking at the parse tree, which is something that all optimizing compilers do. The reason they do is that pure functions (and referentially transparent functions, for some definition of referentially transparent) have all sorts of nice optimizations that can be applied, like being easily inlinable or having their values determined at compile-time; but to know if a function is pure, we need to know if it can ever modify a variable.
So, what appears to be a surprising statement about C++ is actually a trivial statement about all languages.
I think the key word in "whether or not a C++ function will change the value of a particular variable" is "will". It is certainly possible to build a compiler that checks whether or not a C++ function is allowed to change the value of a particular variable, but you cannot say with certainty that the change is going to happen:
void maybe(int& val) {
    cout << "Should I change value? [Y/N] >";
    string reply;
    cin >> reply;
    if (reply == "Y") {
        val = 42;
    }
}
I don't think it's necessary to invoke the halting problem to explain that you can't algorithmically know at compile time whether a given function will modify a certain variable or not.
Instead, it's sufficient to point out that a function's behavior often depends on run-time conditions, which the compiler can't know about in advance. E.g.
int y;

int main(int argc, char *argv[]) {
    if (argc > 2) y++;
}
How could the compiler predict with certainty whether y will be modified?
It can be done and compilers are doing it all the time for some functions, this is for instance a trivial optimisation for simple inline accessors or many pure functions.
What is impossible is to know it in the general case.
Whenever there is a system call or a function call coming from another module, or a call to a potentially overridden method, anything could happen, including a hostile takeover from some hacker's use of a stack overflow to change an unrelated variable.
However, you should use const, avoid globals, prefer references to pointers, avoid reusing variables for unrelated tasks, etc.; that will make the compiler's life easier when performing aggressive optimisations.
There are multiple avenues to explaining this, one of which is the Halting Problem:
In computability theory, the halting problem can be stated as follows: "Given a description of an arbitrary computer program, decide whether the program finishes running or continues to run forever". This is equivalent to the problem of deciding, given a program and an input, whether the program will eventually halt when run with that input, or will run forever.
Alan Turing proved in 1936 that a general algorithm to solve the halting problem for all possible program-input pairs cannot exist.
If I write a program that looks like this:
do tons of complex stuff
if (condition on result of complex stuff)
{
change value of x
}
else
{
do not change value of x
}
Does the value of x change? To determine this, you would first have to determine whether the do tons of complex stuff part causes the condition to fire - or even more basic, whether it halts. That's something the compiler can't do.
Really surprised that there isn't an answer that uses the halting problem directly! There's a very straightforward reduction from this problem to the halting problem.
Imagine that the compiler could tell whether or not a function changed the value of a variable. Then it would certainly be able to tell whether the following function changes the value of y or not, assuming that the value of x can be tracked in all the calls throughout the rest of the program:
void foo(int x) {
    if (x)
        y = 1;
}
Now, for any program we like, let's rewrite it as:
int y;

int main() {
    int x;
    ...
    run the program normally
    ...
    foo(x);
}
Notice that our program changes the value of y if, and only if, it terminates: foo() is the last thing it does before exiting. This means that if we could decide whether y changes, we'd have solved the halting problem!
What the above reduction shows us is that the problem of determining whether a variable's value changes is at least as hard as the halting problem. The halting problem is known to be incomputable, so this one must be also.
As soon as a function calls another function that the compiler doesn't "see" the source of, it either has to assume that the variable is changed, or things may well go wrong further below. For example, say we have this in "foo.cpp":
void foo(int& x)
{
    ifstream f("f.dat", ifstream::binary);
    f.read((char *)&x, sizeof(x));
}
and we have this in "bar.cpp":
void bar(int& x)
{
    foo(x);
}
How can the compiler "know" that x is not changing (or IS changing, more appropriately) in bar?
I'm sure we can come up with something more complex, if this isn't complex enough.
It is impossible in general for the compiler to determine if the variable will be changed, as has been pointed out.
When checking const-ness, the question of interest seems to be whether the variable can be changed by a function. Even this is hard in languages that support pointers. You can't control what other code does with a pointer; it could even be read from an external source (though unlikely). In languages that restrict access to memory, these types of guarantees can be possible and allow for more aggressive optimization than C++ does.
To make the question more specific I suggest the following set of constraints may have been what the author of the book may have had in mind:
Assume the compiler is examining the behavior of a specific function with respect to const-ness of a variable. For correctness, a compiler would have to assume (because of aliasing, as explained below) that if the function calls another function, the variable is changed, so assumption #1 only applies to code fragments that don't make function calls.
Assume the variable isn't modified by an asynchronous or concurrent activity.
Assume the compiler is only determining if the variable can be modified, not whether it will be modified. In other words the compiler is only performing static analysis.
Assume the compiler is only considering correctly functioning code (not considering array overruns/underruns, bad pointers, etc.)
In the context of compiler design, I think assumptions 1,3,4 make perfect sense in the view of a compiler writer in the context of code gen correctness and/or code optimization. Assumption 2 makes sense in the absence of the volatile keyword. And these assumptions also focus the question enough to make judging a proposed answer much more definitive :-)
Given those assumptions, a key reason why const-ness can't be assumed is due to variable aliasing. The compiler can't know whether another variable points to the const variable. Aliasing could be due to another function in the same compilation unit, in which case the compiler could look across functions and use a call tree to statically determine that aliasing could occur. But if the aliasing is due to a library or other foreign code, then the compiler has no way to know upon function entry whether variables are aliased.
You could argue that if a variable/argument is marked const then it shouldn't be subject to change via aliasing, but for a compiler writer that's pretty risky. It can even be risky for a human programmer to declare a variable const as part of, say, a large project where he doesn't know the behavior of the whole system, or the OS, or a library, to really know a variable won't change.
Even if a variable is declared const, that doesn't mean some badly written code can't overwrite it.
// g++ -o foo foo.cc
#include <iostream>

void const_func(const int& a, int* b)
{
    b[0] = 2;
    b[1] = 2; // out-of-bounds write that may clobber a on the stack
}

int main() {
    int a = 1;
    int b = 3;
    std::cout << a << std::endl;
    const_func(a, &b);
    std::cout << a << std::endl;
}
output:
1
2
To expand on my comments, that book's text is unclear, which obfuscates the issue.
As I commented, that book is trying to say, "let's get an infinite number of monkeys to write every conceivable C++ function which could ever be written. There will be cases where if we pick a variable that (some particular function the monkeys wrote) uses, we can't work out whether the function will change that variable."
Of course for some (even many) functions in any given application, this can be determined by the compiler, and very easily. But not for all (or necessarily most).
This function can be easily so analysed:
static int global;
void foo()
{
}
"foo" clearly does not modify "global". It doesn't modify anything at all, and a compiler can work this out very easily.
This function cannot be so analysed:
static int global;

int foo()
{
    if ((rand() % 100) > 50)
    {
        global = 1;
    }
    return 1;
}
Since "foo"'s actions depends on a value which can change at runtime, it patently cannot be determined at compile time whether it will modify "global".
This whole concept is far simpler to understand than computer scientists make it out to be. If the function can do something different based on things that can change at runtime, then you can't work out what it'll do until it runs, and each time it runs it may do something different. Whether it's provably impossible or not, it's obviously impossible.

Optimizing g++ when indexing an array with an injective function

I have a for loop where, at each step i, it processes an array element p[f(i)], where f(i) is an injective (one-to-one) map from 1...n to 1...m (m > n). So there is no data coupling in the loop and all compiler optimization techniques such as pipelining can be used. But how can I inform g++ of the injectivity of f(i)? Or do I even need to (can g++ figure that out)?
Assuming that f doesn't rely on any global state and produces no side effects, you can tag it with the const attribute:
int f(int i) __attribute__((const));
If f does rely on global state but still has the property that it's a pure function of its inputs and global state (and produces no side effects), you can use the slightly weaker pure attribute.
These attributes let gcc make more optimizations than it otherwise could, although I don't know if these will be helpful in your case. Take a look at the generated assembly code and see if they help.
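A minimal GCC/Clang sketch of the attribute in use (f here is a stand-in for the real mapping function):

```cpp
// `const` promises that f inspects nothing but its arguments and has no
// side effects, so the optimizer may merge repeated calls.
__attribute__((const)) int f(int i);

int f(int i) { return 2 * i + 1; }  // an injective map, no global state

int sum_twice(int i) {
    return f(i) + f(i);             // eligible to compile as a single call
}
```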
You could also try processing the loop with a temporary storage array, i.e.:
temp[i] = process(p[f(i)]);
then copy the results back:
p[f(i)] = temp[i];
Assuming you declared p and temp to be restricted pointers, the compiler has enough information to optimize a little more aggressively.
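A sketch of the restricted-pointer version (GCC and Clang spell C99's restrict as __restrict__ in C++; process_into and the doubling are stand-ins for the real process and f):

```cpp
// Promising that `out` and `p` never alias lets the compiler reorder
// and vectorize the gather/process loop more aggressively.
void process_into(int* __restrict__ out, const int* __restrict__ p,
                  const int* idx, int n) {
    for (int i = 0; i < n; ++i)
        out[i] = p[idx[i]] * 2;  // stand-in for process(p[f(i)])
}
```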
If the definition of f() is in scope and is inlinable, most any good compiler should first inline it into the calling function; then the next optimization passes should be able to rewrite the code as if the function call weren't there.