Marking a function as having no side-effects with Visual C++ - c++

Consider the following (a bit conceived) example:
// a.cpp
int mystrlen(const char* a) {
int l = 0;
while (a[l]) ++l;
return l;
}
// b.cpp
extern int mystrlen(const char*);
int foo(const char* text) {
return mystrlen(text) + mystrlen(text);
}
It would be very nice to be able to tell the compiler that mystrlen() doesn't have side-effects and thus it can re-use the old result from mystrlen(text) instead of calling it twice.
I don't find anything in the docs about it and restrict or one of its variances doesn't seem to do the job, either. A look at the output code with all optimizations on (switch /Ox) shows that the compiler really generates two calls. It even does so if I put both functions in one module.
Any solution to this or can anyone confirm that there is no solution in VC++?

MSVC has no support for pure/const attributes, and also has no intention to support them. See https://connect.microsoft.com/VisualStudio/feedback/details/804288/msvc-add-const-and-pure-like-function-attributes. Other compilers, such as GCC and Clang do support such attributes. Also see pure/const function attributes in different compilers.

What you are looking for won't help you.
In general, the compiler can't elide the calls, even if if believes the function is without side effects. Where did you get that pointer? Does a signal handler have access to that same pointer? Maybe another thread? How does the compiler know the memory pointed to by the pointer isn't going to change out from underneath it?
Compiler's do frequently eliminate redundant fetches within a function body, even for things fetched through pointers. But there is a limit to how much of this they can or even should do.
Here you're asking the compiler to believe that a bare pointer you have to who knows where will maintain consistent contents between two function calls. That's a lot of assuming. Sure you haven't declared your pointer volatile, but still, it's a lot of assuming.
Now, if you had a buffer on the stack and were passing it to that function twice in a row, the compiler can pretty safely assume that if you haven't passed the pointer out somewhere else that the contents of that buffer aren't going to be changed by some random other thing in the program at some unexpected time.
For example:
// b.cpp
extern int mystrlen(const char*);
int foo(int bar) {
char number[20];
snsprintf(number, 20, "%d", bar);
return mystrlen(number) + mystrlen(number);
}
Given assumptions the compiler could make about what snprintf did with number given that it's a library function, it could then elide the second call to mystrlen if there was a way to declare the mystrlen had no side effects.

Because C++ is an imperative language rather than a functional one, what you're trying to achieve is not possible.
It looks like the behaviour that you're expecting here is is that of referential transparency, which there isn't a way to tell the compiler about in C++ (but in a purely functional programming language like Haskell would be implicit).
Hopefully a future standard of C++ will introduce a keyword that will allow us to mark functions as 'pure' or 'without side effect'.

Related

Passing structs by-value while conforming to the C calling convention in LLVM IR

I would like to pass structs by-value between C++ and a JIT'd LLVM program. I've seen a lot of discussion about this and even a few questions on SO. I've read that I need to do something called "argument coercion" if I want my program to really pass-by-value. Using byval and sret looks like the easy cross-platform solution. It's still a bit of a pain and the C++ code has to remember to pass pointers instead of values (although, the calling code is C++ so I could do some templating magic).
The more I read about this problem, the less I seem to understand it. Calling convention is a platform-specific issue that should be dealt with by the code generator, right? I don't understand why the platform-specific code generator doesn't just deal with the platform-specific way of handling structs (while conforming to the platform's C ABI). The front-end should be platform-agnostic!
Is there a pass that does argument coercion for me? A pass that visits every function declaration and every function call and transforms all of the structs so that they are compatible with the platform's C ABI? I feel like that's something that all frontends would be using if it existed and Clang doesn't use it so maybe it's not possible. Why isn't this a viable solution? If a pass can just deal with this then I would expect it to be part of LLVM.
I don't understand why every frontend has to do argument coercion. I don't even understand how to do argument coercion. I've seen a few instances of people taking the Clang code generation code and factoring out the part that does argument coercion. Unfortunately, this seems like the best solution if I want real C ABI compatibility. The fact that it's even possible to reuse part of another frontend for a completely different language makes me continue to wonder why this has to be done in the frontend?
Something has to be done about this! We can't just keep writing the same C ABI compatibility code in every frontend. It's ridiculous! Maybe I simply don't understand.
Could someone clear this up for me? I'm thinking about using byval and sret simply because it's easier than modifying the clang code generator. Is there an easier way?
When passing around structs by value in LLVM IR, you have to make up your own rules. I chose the simplest set of rules I could.
Let's say I have a program like this:
struct MyStruct {
int a;
char b, c, d, e;
};
MyStruct identityImpl(MyStruct s) {
return s;
}
MyStruct identity(MyStruct s) {
return identityImpl(s);
}
The LLVM IR for this program is equivalent to this:
void identityImpl(MyStruct *ret, const MyStruct *s) {
MyStruct localS = *s;
*ret = localS;
}
void identity(MyStruct *ret, const MyStruct *s) {
MyStruct localS = *s;
MyStruct localRet;
identityImpl(&localRet, &localS);
*ret = localRet;
}
It's not the most efficient way of passing the struct because MyStruct can fit in a 64-bit register. However, the optimizer can remove localS and use s directly if it can prove that localS is never written to. Both of those functions optimize down to a single call to memcpy.
This only took half a day. Going the Clang route probably would have taken at least a week. I still think it's rather unfortunate that I had to do this but I understand the problem now. The passing of structs is not specified by the platform's C ABI.

Why is it impossible to build a compiler that can determine if a C++ function will change the value of a particular variable?

I read this line in a book:
It is provably impossible to build a compiler that can actually
determine whether or not a C++ function will change the value of a
particular variable.
The paragraph was talking about why the compiler is conservative when checking for const-ness.
Why is it impossible to build such a compiler?
The compiler can always check if a variable is reassigned, a non-const function is being invoked on it, or if it is being passed in as a non-const parameter...
Why is it impossible to build such a compiler?
For the same reason that you can't write a program that will determine whether any given program will terminate. This is known as the halting problem, and it's one of those things that's not computable.
To be clear, you can write a compiler that can determine that a function does change the variable in some cases, but you can't write one that reliably tells you that the function will or won't change the variable (or halt) for every possible function.
Here's an easy example:
void foo() {
if (bar() == 0) this->a = 1;
}
How can a compiler determine, just from looking at that code, whether foo will ever change a? Whether it does or doesn't depends on conditions external to the function, namely the implementation of bar. There's more than that to the proof that the halting problem isn't computable, but it's already nicely explained at the linked Wikipedia article (and in every computation theory textbook), so I'll not attempt to explain it correctly here.
Imagine such compiler exists. Let's also assume that for convenience it provides a library function that returns 1 if the passed function modifies a given variable and 0 when the function doesn't. Then what should this program print?
int variable = 0;
void f() {
if (modifies_variable(f, variable)) {
/* do nothing */
} else {
/* modify variable */
variable = 1;
}
}
int main(int argc, char **argv) {
if (modifies_variable(f, variable)) {
printf("Modifies variable\n");
} else {
printf("Does not modify variable\n");
}
return 0;
}
Don't confuse "will or will not modify a variable given these inputs" for "has an execution path which modifies a variable."
The former is called opaque predicate determination, and is trivially impossible to decide - aside from reduction from the halting problem, you could just point out the inputs might come from an unknown source (eg. the user). This is true of all languages, not just C++.
The latter statement, however, can be determined by looking at the parse tree, which is something that all optimizing compilers do. The reason they do is that pure functions (and referentially transparent functions, for some definition of referentially transparent) have all sorts of nice optimizations that can be applied, like being easily inlinable or having their values determined at compile-time; but to know if a function is pure, we need to know if it can ever modify a variable.
So, what appears to be a surprising statement about C++ is actually a trivial statement about all languages.
I think the key word in "whether or not a C++ function will change the value of a particular variable" is "will". It is certainly possible to build a compiler that checks whether or not a C++ function is allowed to change the value of a particular variable, you cannot say with certainty that the change is going to happen:
void maybe(int& val) {
cout << "Should I change value? [Y/N] >";
string reply;
cin >> reply;
if (reply == "Y") {
val = 42;
}
}
I don't think it's necessary to invoke the halting problem to explain that you can't algorithmically know at compile time whether a given function will modify a certain variable or not.
Instead, it's sufficient to point out that a function's behavior often depends on run-time conditions, which the compiler can't know about in advance. E.g.
int y;
int main(int argc, char *argv[]) {
if (argc > 2) y++;
}
How could the compiler predict with certainty whether y will be modified?
It can be done and compilers are doing it all the time for some functions, this is for instance a trivial optimisation for simple inline accessors or many pure functions.
What is impossible is to know it in the general case.
Whenever there is a system call or a function call coming from another module, or a call to a potentially overriden method, anything could happen, included hostile takeover from some hacker's use of a stack overflow to change an unrelated variable.
However you should use const, avoid globals, prefer references to pointers, avoid reusing variables for unrelated tasks, etc. that will makes the compiler's life easier when performing aggressive optimisations.
There are multiple avenues to explaining this, one of which is the Halting Problem:
In computability theory, the halting problem can be stated as follows: "Given a description of an arbitrary computer program, decide whether the program finishes running or continues to run forever". This is equivalent to the problem of deciding, given a program and an input, whether the program will eventually halt when run with that input, or will run forever.
Alan Turing proved in 1936 that a general algorithm to solve the halting problem for all possible program-input pairs cannot exist.
If I write a program that looks like this:
do tons of complex stuff
if (condition on result of complex stuff)
{
change value of x
}
else
{
do not change value of x
}
Does the value of x change? To determine this, you would first have to determine whether the do tons of complex stuff part causes the condition to fire - or even more basic, whether it halts. That's something the compiler can't do.
Really surprised that there isn't an answer that using the halting problem directly! There's a very straightforward reduction from this problem to the halting problem.
Imagine that the compiler could tell whether or not a function changed the value of a variable. Then it would certainly be able to tell whether the following function changes the value of y or not, assuming that the value of x can be tracked in all the calls throughout the rest of the program:
foo(int x){
if(x)
y=1;
}
Now, for any program we like, let's rewrite it as:
int y;
main(){
int x;
...
run the program normally
...
foo(x);
}
Notice that, if, and only if, our program changes the value of y, does it then terminate - foo() is the last thing it does before exiting. This means we've solved the halting problem!
What the above reduction shows us is that the problem of determining whether a variable's value changes is at least as hard as the halting problem. The halting problem is known to be incomputable, so this one must be also.
As soon as a function calls another function that the compiler doesn't "see" the source of, it either has to assume that the variable is changed, or things may well go wrong further below. For example, say we have this in "foo.cpp":
void foo(int& x)
{
ifstream f("f.dat", ifstream::binary);
f.read((char *)&x, sizeof(x));
}
and we have this in "bar.cpp":
void bar(int& x)
{
foo(x);
}
How can the compiler "know" that x is not changing (or IS changing, more appropriately) in bar?
I'm sure we can come up with something more complex, if this isn't complex enough.
It is impossible in general to for the compiler to determine if the variable will be changed, as have been pointed out.
When checking const-ness, the question of interest seems to be if the variable can be changed by a function. Even this is hard in languages that support pointers. You can't control what other code does with a pointer, it could even be read from an external source (though unlikely). In languages that restrict access to memory, these types of guarantees can be possible and allows for more aggressive optimization than C++ does.
To make the question more specific I suggest the following set of constraints may have been what the author of the book may have had in mind:
Assume the compiler is examining the behavior of a specific function with respect to const-ness of a variable. For correctness a compiler would have to assume (because of aliasing as explained below) if the function called another function the variable is changed, so assumption #1 only applies to code fragments that don't make function calls.
Assume the variable isn't modified by an asynchronous or concurrent activity.
Assume the compiler is only determining if the variable can be modified, not whether it will be modified. In other words the compiler is only performing static analysis.
Assume the compiler is only considering correctly functioning code (not considering array overruns/underruns, bad pointers, etc.)
In the context of compiler design, I think assumptions 1,3,4 make perfect sense in the view of a compiler writer in the context of code gen correctness and/or code optimization. Assumption 2 makes sense in the absence of the volatile keyword. And these assumptions also focus the question enough to make judging a proposed answer much more definitive :-)
Given those assumptions, a key reason why const-ness can't be assumed is due to variable aliasing. The compiler can't know whether another variable points to the const variable. Aliasing could be due to another function in the same compilation unit, in which case the compiler could look across functions and use a call tree to statically determine that aliasing could occur. But if the aliasing is due to a library or other foreign code, then the compiler has no way to know upon function entry whether variables are aliased.
You could argue that if a variable/argument is marked const then it shouldn't be subject to change via aliasing, but for a compiler writer that's pretty risky. It can even be risky for a human programmer to declare a variable const as part of, say a large project where he doesn't know the behavior of the whole system, or the OS, or a library, to really know a variable won't change.
Even if a variable is declared const, doesn't mean some badly written code can overwrite it.
// g++ -o foo foo.cc
#include <iostream>
void const_func(const int&a, int* b)
{
b[0] = 2;
b[1] = 2;
}
int main() {
int a = 1;
int b = 3;
std::cout << a << std::endl;
const_func(a,&b);
std::cout << a << std::endl;
}
output:
1
2
To expand on my comments, that book's text is unclear which obfuscates the issue.
As I commented, that book is trying to say, "let's get an infinite number of monkeys to write every conceivable C++ function which could ever be written. There will be cases where if we pick a variable that (some particular function the monkeys wrote) uses, we can't work out whether the function will change that variable."
Of course for some (even many) functions in any given application, this can be determined by the compiler, and very easily. But not for all (or necessarily most).
This function can be easily so analysed:
static int global;
void foo()
{
}
"foo" clearly does not modify "global". It doesn't modify anything at all, and a compiler can work this out very easily.
This function cannot be so analysed:
static int global;
int foo()
{
if ((rand() % 100) > 50)
{
global = 1;
}
return 1;
Since "foo"'s actions depends on a value which can change at runtime, it patently cannot be determined at compile time whether it will modify "global".
This whole concept is far simpler to understand than computer scientists make it out to be. If the function can do something different based on things can change at runtime, then you can't work out what it'll do until it runs, and each time it runs it may do something different. Whether it's provably impossible or not, it's obviously impossible.

calling qsort with a pointer to a c++ function

I have found a book that states that if you want to use a function from the C standard library which takes a function pointer as an argument (for example qsort), the function of which you want to pass the function pointer needs to be a C function and therefore declared as extern "C".
e.g.
extern "C" {
int foo(void const* a, void const* b) {...}
}
...
qsort(some_array, some_num, some_size, &foo);
I would not be surprised if this is just wrong information, however - I'm not sure, so: is this correct?
A lot depends on whether you're interested in a practical answer for the compiler you're using right now, or whether you care about a theoretical answer that covers all possible conforming implementations of C++. In theory it's necessary. In reality, you can usually get by without it.
The real question is whether your compiler uses a different calling convention for calling a global C++ function than when calling a C function. Most compilers use the same calling convention either way, so the call will work without the extern "C" declaration.
The standard doesn't guarantee that though, so in theory there could be a compiler that used different calling conventions for the two. At least offhand, I don't know of a compiler like that, but given the number of compilers around, it wouldn't surprise me terribly if there was one that I don't know about.
OTOH, it does raise another question: if you're using C++, why are you using qsort at all? In C++, std::sort is almost always preferable -- easier to use and usually faster as well.
This is incorrect information.
extern C is needed when you need to link a C++ library into a C binary; it allows the C linker to find the function names. This is not an issue with function pointers (as the function is not referenced by name in the C code).

Compiler Optimization with Parameters

Lets say you have some functions in some classes are called together like this
myclass::render(int offset_x, int offset_y)
{
otherClass.render(offset_x, offset_y)
}
This pattern will repeat for a while possibly through 10+ classes, so my question is:
Are modern C++ compilers smart enough to recognise that wherever the program stores function parameters - From what wikipedia tells me it seems to vary depending on parameter size, but that for a 2 parameter function the processor register seems likely - doesn't need to be overridden with new values?
If not I might need to look at implementing my own methods
I think it's more likely that the compiler will make a larger-scale optimization. You'd have to examine the actual machine code produced, but for example the following trivial attempt:
#include <iostream>
class B {
public:
void F( int x, int y ) {
std::cout << x << ", " << y << std::endl;
}
};
class A {
B b;
public:
void F( int x, int y ) {
b.F( x, y );
}
};
int main() {
A a;
a.F( 32, 64 );
}
causes the compiler (cl.exe from VS 2010, empty project, vanilla 'Release' configuration) to produce assembly that completely inlines the call tree; you basically get "push 40h, push 20h, call std::operator<<."
Abusing __declspec(noinline) causes cl.exe to realize that A::F just forwards to B::F and the definition of A::F is nothing but "call A::F" without stack or register manipulation at all (so in that case, it has performed the optimization you're asking about). But do note that my example is extremely contrived and so says nothing about the compiler's ability to do this well in general, only that it can be done.
In your real-world scenario, you'll have to examine the disassembly yourself. In particular, the 'this' parameter needs to be accounted for (cl.exe usually passes it via the ECX register) -- if you do any manipulation of the class member variables that may impact the results.
Yes, it is. The compiler performs dataflow analysis before register allocation, keeping track of which data is where at which time. And it will see that the arg0 location contains the value that needs to be in the arg0 location in order to call the next function, and so it doesn't need to move the data around.
I'm not a specialist, but it looks a lot like the perfect forwarding problem that will be solved in the next standard (C++0x) by using rvalue-references.
Currently I'd say it depend on the compiler, but I guess if the function and the parametters are simple enough then yes the function will serve as a shortcut.
If this function is imlpemented directly in the class definition (and then becoming implicitely candidate for inlining) it might be inligned, making the call directly call the wanted function instead of having two runtime calls.
In spite of your comment, I think that inlining is germane to this discussion. I don't believe that C++ compilers will do what you're asking (reuse parameters on the stack) UNLESS it also inlines the method completely.
The reason is that if it's making a real function call it still has to put the return address onto the stack, thus making the previous call's parameters no longer at the expected place on the stack. Thus in turn is has to put the parameters back on the stack a second time.
However I really wouldn't worry about that. Unless you're making a ridiculous number of function calls like this AND profiling shows that it's spending a large proportion of its time on these calls they're probably extremely minimal overhead and you shouldn't worry about it. For a function that small however, mark it inline and let the compiler decide if it can inline it away completely.
If I understand the question correctly, you are asking "Are most compilers smart enough to inline a simple function like this", and the answer to that question is yes. Note however the implicit this paremeter which is part of your function (because your function is part of a class), so it might not be completely inlineable if the call level is deep enough.
The problem with inlining is that the compiler will probably only be able to do this for a given compilation unit. The linker is probably less likely to be clever enough to inline from one compilation unit to another.
But given the total trivial nature of the function and that both functions have exactly the same arguments in the same order, the cost of the function call will probably be only one machine instruction viz. an additional branch (or jump) to the true implementation. There is no need to even push the return address onto the stack.

How can I trust the behavior of C++ functions that declare const?

This is a C++ disaster, check out this code sample:
#include <iostream>
void func(const int* shouldnotChange)
{
int* canChange = (int*) shouldnotChange;
*canChange += 2;
return;
}
int main() {
int i = 5;
func(&i);
std::cout << i;
return 0;
}
The output was 7!
So, how can we make sure of the behavior of C++ functions, if it was able to change a supposed-to-be-constant parameter!?
EDIT: I am not asking how can I make sure that my code is working as expected, rather I am wondering how to believe that someone else's function (for instance some function in some dll library) isn't going to change a parameter or posses some behavior...
Based on your edit, your question is "how can I trust 3rd party code not to be stupid?"
The short answer is "you can't." If you don't have access to the source, or don't have time to inspect it, you can only trust the author to have written sane code. In your example, the author of the function declaration specifically claims that the code will not change the contents of the pointer by using the const keyword. You can either trust that claim, or not. There are ways of testing this, as suggested by others, but if you need to test large amounts of code, it will be very labour intensive. Perhaps moreso than reading the code.
If you are working on a team and you have a team member writing stuff like this, then you can talk to them about it and explain why it is bad.
By writing sane code.
If you write code you can't trust, then obviously your code won't be trustworthy.
Similar stupid tricks are possible in pretty much any language. In C#, you can modify the code at runtime through reflection. You can inspect and change private class members. How do you protect against that? You don't, you just have to write code that behaves as you expect.
Apart from that, write a unittest testing that the function does not change its parameter.
The general rule in C++ is that the language is designed to protect you from Murphy, not Machiavelli. In other words, its meant to keep a maintainance programmer from accidentally changing a variable marked as const, not to keep someone from deliberatly changing it, which can be done in many ways.
A C-style cast means all bets are off. It's sort of like telling the compiler "Trust me, I know this looks bad, but I need to do this, so don't tell me I'm wrong." Also, what you've done is actually undefined. Casting off const-ness and then modifying the value means the compiler/runtime can do anything, including e.g. crash your program.
The only thing I can suggest is to allocate the variable shouldNotChange from a memory page that is marked as read-only. This will force the OS/CPU to raise an error if the application attempts to write to that memory. I don't really recommend this as a general method of validating functions just as an idea you may find useful.
The simplest way to enforce this would be to just not pass a pointer:
void func(int shouldnotChange);
Now a copy will be made of the argument. The function can change the value all it likes, but the original value will not be modified.
If you can't change the function's interface then you could make a copy of the value before calling the function:
int i = 5;
int copy = i
func(&copy);
Don't use C style casts in C++.
We have 4 cast operators in C++ (listed here in order of danger)
static_cast<> Safe (When used to 'convert numeric data types').
dynamic_cast<> Safe (but throws exceptions/returns NULL)
const_cast<> Dangerous (when removing const).
static_cast<> Very Dangerous (When used to cast pointer types. Not a very good idea!!!!!)
reinterpret_cast<> Very Dangerous. Use this only if you understand the consequences.
You can always tell the compiler that you know better than it does and the compiler will accept you at face value (the reason being that you don't want the compiler getting in the way when you actually do know better).
Power over the compiler is a two edged sword. If you know what you are doing it is a powerful tool the will help, but if you get things wrong it will blow up in your face.
Unfortunately, the compiler has reasons for most things so if you over-ride its default behavior then you better know what you are doing. Cast is one the things. A lot of the time it is fine. But if you start casting away const(ness) then you better know what you are doing.
(int*) is the casting syntax from C. C++ supports it fully, but it is not recommended.
In C++ the equivalent cast should've been written like this:
int* canChange = static_cast<int*>(shouldnotChange);
And indeed, if you wrote that, the compiler would NOT have allowed such a cast.
What you're doing is writing C code and expecting the C++ compiler to catch your mistake, which is sort of unfair if you think about it.