I have the following piece of code:
int f(int &x, int c) {
    c = c - 1;
    if (c == 0) return 1;
    x = x + 1;
    return f(x, c)*x;
}
Now, suppose I call the above function like this:
int p = 5;
std::cout << f(p, p) << std::endl;
The output is 9^4 (6561): since x is passed by reference, x has already reached its final value of 9 by the time each multiplication reads it. But when the return statement of the above function is changed to:
return x*f(x, c);
the output is 3024 (6*7*8*9). Why is there a difference in output? Does it have anything to do with the order of evaluation of operator*? If we are asked to predict the output of the above piece of code, is it fixed, compiler-dependent or unspecified?
When you write:
f(x,c)*x
the compiler may choose to retrieve the stored value of x (for the second operand) either before or after calling f. So there are several possible ways that execution could proceed, and the compiler does not have to be consistent in this choice.
To avoid the problem you could write:
auto x_temp = x;
return f(x, c) * x_temp;
Note: it is unspecified behaviour, not undefined behaviour, because there is a sequence point before and after any function call (or, in C++11 terminology, evaluations inside a called function are indeterminately sequenced with respect to the calling code, not unsequenced).
The cause is that the f() function has a side effect on its x parameter: the variable bound to x is incremented once per recursive call, c - 1 times in total, by the time the outermost call returns.
Therefore, when you swap the order of the operands, you get different results, as x contains different values before and after the function is called.
However, note that the behaviour of code written this way is unspecified: the compiler is free to evaluate the operands in either order, so it can behave differently on different platforms, with different compilers, or even with different optimization settings. Because of that, it's generally best to avoid such side effects. For details see http://en.cppreference.com/w/c/language/eval_order
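To make the two outcomes concrete, here is a worked trace for p = 5 (each recursive call drives x from 6 up to 9). This is only a sketch of the two permitted orders, not something you can rely on:

// With "return f(x, c)*x", if x is read AFTER the recursive call returns,
// the deeper calls have already driven x up to 9, so every factor is 9:
//   f(p,5) = f(x,4)*9 = (f(x,3)*9)*9 = ... = 1*9*9*9*9 = 6561 (9^4)
//
// With "return x*f(x, c)", x is read BEFORE each recursive call, so the
// factors are the successive values 6, 7, 8, 9:
//   f(p,5) = 6*f(x,4) = 6*(7*f(x,3)) = ... = 6*7*8*9*1 = 3024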
GCC can suggest functions for attribute pure and attribute const with the flags -Wsuggest-attribute=pure and -Wsuggest-attribute=const.
The GCC documentation says:
Many functions have no effects except the return value and their return value depends only on the parameters and/or global variables. Such a function can be subject to common subexpression elimination and loop optimization just as an arithmetic operator would be. These functions should be declared with the attribute pure.
But what can happen if you attach __attribute__((__pure__)) to a function that doesn't match the above description, and does have side effects? Is it simply the possibility that the function will be called fewer times than you would want it to be, or is it possible to create undefined behaviour or other kinds of serious problems?
Similarly for __attribute__((__const__)) which is stricter again - the documentation states:
Basically this is just slightly more strict class than the pure attribute below, since function is not allowed to read global memory.
But what can actually happen if you attach __attribute__((__const__)) to a function that does access global memory?
I would prefer technical answers with explanations of actual possible scenarios within the scope of GCC / G++, rather than the usual "nasal demons" handwaving that appears whenever undefined behaviour gets mentioned.
But what can happen if you attach __attribute__((__pure__)) to a function that doesn't match the above description, and does have side effects?
Exactly. Here's a short example:
extern __attribute__((pure)) int mypure(const char *p);

int call_pure() {
    int x = mypure("Hello");
    int y = mypure("Hello");
    return x + y;
}
My version of GCC (4.8.4) is clever enough to remove the second call to mypure (the result is computed as 2*mypure(...)). Now imagine if mypure were printf: the side effect of printing the string "Hello" would be lost.
Note that if I replace call_pure with

char s[16];

int call_pure() {
    int x = mypure("Hello");
    s[0] = 1;
    int y = mypure("Hello");
    return x + y;
}

both calls will be emitted (because the assignment to s[0] may change the return value of mypure).
Is it simply the possibility that the function will be called fewer times than you would want it to be, or is it possible to create undefined behaviour or other kinds of serious problems?
Well, it can cause UB indirectly. E.g. here:

extern __attribute__((pure)) int get_index();

char a[1];
int i;

void foo() {
    i = get_index();     // Returns -1
    a[get_index()] = 0;  // Returns 0
}
The compiler will most likely drop the second call to get_index() and reuse the first returned value, -1, which will result in a buffer overflow (well, technically an underflow).
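If get_index() really does have side effects, the correct fix is simply to drop the attribute, so the compiler has to emit every call; a minimal sketch:

extern int get_index();  // no pure attribute: every call must be emitted

void foo() {
    i = get_index();     // first call, returns -1
    a[get_index()] = 0;  // second call, returns 0, so the index is in bounds
}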
But what can actually happen if you attach __attribute__((__const__)) to a function that does access global memory?
Let's again take the above example with
int call_pure() {
    int x = mypure("Hello");
    s[0] = 1;
    int y = mypure("Hello");
    return x + y;
}
If mypure were annotated with __attribute__((const)), the compiler would again drop the second call and optimize the return value to 2*mypure(...). If mypure actually reads s, this would produce a wrong result.
EDIT
I know you asked to avoid hand-waving, but here's some generic explanation. By default, a function call blocks a lot of optimizations inside the compiler, as it has to be treated as a black box which may have arbitrary side effects (modify any global variable, etc.). Annotating a function with const or pure instead allows the compiler to treat it more like an expression, which allows for more aggressive optimization.
Examples are really too numerous to give. The one which I gave above is common subexpression elimination, but we could just as easily demonstrate benefits for loop-invariant code motion, dead code elimination, alias analysis, etc.
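As a sketch of the loop-invariant case (square is a hypothetical function, not a real API; the attribute is what licenses hoisting the call out of the loop):

extern __attribute__((const)) int square(int x);  // hypothetical

int sum_repeated(int n, int k) {
    int total = 0;
    // square(k) does not depend on i; thanks to the const attribute the
    // compiler may compute it once before the loop instead of n times.
    for (int i = 0; i < n; ++i)
        total += square(k);
    return total;
}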
#include <cstdio>

int& foo() {
    printf("Foo\n");
    static int a;
    return a;
}

int bar() {
    printf("Bar\n");
    return 1;
}

int main() {
    foo() = bar();
}
I am not sure which one should be evaluated first.
I have tried this in VC, and the bar function is executed first. However, with g++ (on FreeBSD), the foo function is evaluated first.
A more interesting question derives from the above problem. Suppose I have a dynamic array (std::vector):
#include <vector>

std::vector<int> vec;

int foobar() {
    vec.resize( vec.size() + 1 );
    return vec.size();
}

int main() {
    vec.resize( 2 );
    vec[0] = foobar();
}
Based on the previous result, VC evaluates foobar() first and then performs the vector operator[], so there is no problem in that case. With gcc, however, vec[0] is evaluated first, and foobar() may cause the vector's internal buffer to be reallocated, so the reference obtained from vec[0] can be invalidated by the execution of foobar().
Does it mean that we need to separate the code, like this?
int main() {
    vec.resize( 2 );
    int a = foobar();
    vec[0] = a;
}
Yes. The order of evaluation is unspecified in the original form, so don't write such code.
Similar example here
The concept in C++ that governs whether the order of evaluation is defined is called the sequence point.
Basically, at a sequence point, it is guaranteed that all expressions prior to that point (with observable side effects) have been evaluated, and that no expressions beyond that point have been evaluated yet.
Though some might find it surprising, the assignment operator is not a sequence point. A full list of all sequence points is in the Wikipedia article.
C++17 guarantees that bar() will be executed before foo().
Before C++17 this was unspecified behaviour, and different compilers would evaluate the two sides in different orders. If both sides of the expression modify the same memory location, the behaviour is undefined.
The order of evaluation of the operands of an expression is unspecified.
It is up to the compiler which order it chooses.
You should refrain from writing such code.
If there is no side effect, then the order doesn't matter.
If the order does matter, then your code is wrong, not portable, and may give different results across different compilers.
In my C++ program, I have this function:
char MostFrequentCharacter(ifstream &ifs, int &numOccurances);
and in main(), is this code,
ifstream in("file.htm");
int maxOccurances = 0;
cout <<"Most freq char is "<<MostFrequentCharacter(in, maxOccurances)<<" : "<<maxOccurances;
But this is not working: though I am getting the correct char, maxOccurances remains zero.
But if I replace the above code in main with this,
ifstream in("file.htm");
int maxOccurances = 0;
char maxFreq = MostFrequentCharacter(in, maxOccurances);
cout <<"Most freq char is "<<maxFreq<<" : "<<maxOccurances;
Then it works correctly. My question is: why is it not working in the first case?
In C++,
cout << a << b
by associativity evaluates to:
(cout << a) << b
but the compiler is free to evaluate them in any order.
I.e., the compiler can evaluate b first, then a, then the first << operation and then the second << operation. This is because there is no sequence point associated with <<.
For the sake of simplicity let us consider the following code, which is equivalent:
#include <iostream>

int main()
{
    int i = 0;
    std::cout << i << i++;
    return 0;
}
In the above source code:
std::cout << i << i++;
evaluates to the function call:
operator<<(operator<<(std::cout,i),i++);
In this function call, whether operator<<(std::cout,i) or i++ gets evaluated first is unspecified. I.e.:
operator<<(std::cout,i) may be evaluated first, or
i++ may be evaluated first, or
some magic ordering may be implemented by the compiler.
Given the above, there is no way to predict this ordering, and hence no single explanation is possible either. (Strictly speaking, because i is both read and modified with no sequencing between the two, this simplified example is undefined before C++17; the original expression, whose side effect happens inside a function call, is merely unspecified.)
Relevant Quote from the C++03 Standard:
Section 1.9
Certain other aspects and operations of the abstract machine are described in this International Standard as unspecified (for example, order of evaluation of arguments to a function). Where possible, this International Standard defines a set of allowable behaviors. These define the nondeterministic aspects of the abstract machine.
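The deterministic rewrite is the usual one: give the side effect its own statement so that it is sequenced before the print. A minimal sketch:

#include <iostream>

int main()
{
    int i = 0;
    int old = i++;            // the increment is fully sequenced here
    std::cout << i << old;    // prints "10" with every compiler
    return 0;
}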
Because in the first case, the value of maxOccurances in the expression is resolved before the call to MostFrequentCharacter. It doesn't have to be that way, though; it is unspecified behavior.
You may experience different results with different compilers, or compiler options. If you try that same thing on VC++ for example, I believe you will see different results.
You just have to note that where you see << you are actually calling the operator<< function, so the compiler is working out the values of the arguments to pass into that function, and it may do so before your variable is modified.
In other words, what you have is similar to
operator<<(operator<<(cout, f(x)), x);
...and since the evaluation order of function arguments is unspecified, it depends on the compiler.
With your compiler, the chained << happens to be evaluated right to left, so the rightmost operand is evaluated first, then the left ones. :)
That is why the value of the referenced variable hasn't changed yet by the time it is read.
I recently saw a piece of code at comp.lang.c++ moderated returning a reference to a static integer from a function. The code was something like this:
#include <iostream>

int& f()
{
    static int x;
    x++;
    return x;
}

int main()
{
    f() += 1;      //A
    f() = f() + 1; //B
    std::cout << f();
}
When I debugged the application using my cool Visual Studio debugger, I saw just one call to f() for statement A, and guess what, I was shocked. I always thought i+=1 was equal to i=i+1, so f()+=1 would be equal to f()=f()+1 and I would see two calls to f(), but I saw only one. What the heck is this? Am I crazy, has my debugger gone crazy, or is this a result of premature optimization?
This is what The Standard says about += and friends:
5.17-7: The behavior of an expression of the form E1 op= E2 is equivalent to E1 = E1 op E2 except that E1 is evaluated only once. [...]
So the compiler is right on that.
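You can observe the "evaluated only once" rule directly by counting calls; a minimal sketch (the calls counter and g are mine, not from the question):

#include <iostream>

static int calls = 0;

int& g()
{
    static int x = 0;
    ++calls;                      // counts how often the lvalue is produced
    return x;
}

int main()
{
    g() += 1;                     // E1 evaluated once: one call
    std::cout << calls << '\n';   // prints 1
    g() = g() + 1;                // E1 and E2 each call g(): two more calls
    std::cout << calls << '\n';   // prints 3
}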
i+=1 is functionally the same as i=i+1, though it may be implemented differently (basically, it's designed to take advantage of CPU-level optimizations).
But essentially the left side is evaluated only once. It yields a non-const lvalue, which is all that is needed to read the value, add one and write it back.
This is more obvious when you create an overloaded operator for a custom type: operator+= modifies *this, while operator+ returns a new instance. It is generally recommended (in C++) to write operator+= first, and then write operator+ in terms of it.
(Note this applies only to C++; in C#, op+= is exactly as you assumed: just shorthand for op+, and you cannot create your own op+=; it is automatically created for you from the op+.)
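A minimal sketch of that convention, using a made-up Length type:

struct Length {
    int value = 0;

    // operator+= mutates this instance and returns a reference to it.
    Length& operator+=(const Length& rhs) {
        value += rhs.value;
        return *this;
    }
};

// operator+ returns a new instance and is written in terms of operator+=.
Length operator+(Length lhs, const Length& rhs) {
    lhs += rhs;
    return lhs;
}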
Your thinking is logical but not correct.
i += 1;
// This is logically equivalent to:
i = i + 1;
But logically equivalent and identical are not the same.
The code should be looked at as if it were written like this:

int& x = f();
x += 1;

// Now you can use logical equivalence:
int& x = f();
x = x + 1;
The compiler will not make two function calls unless you explicitly put two function calls into the code. If you have side effects in your functions (like you do) and the compiler started adding extra, hard-to-see implicit calls, it would be very hard to actually understand the flow of the code, and that would make maintenance very hard.
f() returns a reference to the static integer. Then += 1 adds one to this memory location – there's no need to call it twice in statement A.
In every language I've seen which supports a += operator, the compiler evaluates the left-hand operand once to yield some sort of address, which is then used both to read the old value and to write the new one. The += operator is not just syntactic sugar; as you note, it can achieve expression semantics which would be awkward to achieve by other means.
Incidentally, the "With" statements in vb.net and Pascal both have a similar feature. A statement like:
' Assume Foo is an array of some type of structure, Bar is a function, and Boz is a variable.
With Foo(Bar(Boz))
    .Fnord = 9
    .Quack = 10
End With
will compute the address of Foo(Bar(Boz)), and then set two fields of that structure to the values nine and ten. It would be equivalent in C to
{
    FOOTYPE *tmp = &Foo[Bar(Boz)];
    tmp->Fnord = 9;
    tmp->Quack = 10;
}
but VB.NET and Pascal do not expose the temporary pointer. While one could achieve the same effect in VB.NET without "With" by storing the result of Bar() in a temporary variable, using "With" allows one to avoid that variable.
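In C++ the same evaluate-the-address-once idiom is usually spelled with a reference; a minimal sketch with made-up names (FooType, bar):

struct FooType { int fnord; int quack; };

FooType foo[16];
int bar(int boz) { return boz & 15; }  // stand-in; keeps the index in bounds

void set_fields(int boz) {
    FooType& tmp = foo[bar(boz)];  // bar() is evaluated exactly once
    tmp.fnord = 9;
    tmp.quack = 10;
}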
I have a quick question about the following expression:
int a_variable = 0;
if (0 != a_variable)
    a_variable = 1;
What is the difference between (0 != a_variable) and (a_variable != 0)?
I don't have any errors for now, but is this a wrong way to use it?
If you forget the !, the first will give a compile error (0 = a_variable), while the second will wreak havoc (a_variable = 0 compiles and silently assigns).
Also, with user-defined operators the second form can be implemented as a member function, while the first can only be a non-member (possibly friend) function. And it's possible, although a REALLY bad idea, to define the two forms in different ways. Of course, since a_variable is an int, there are no user-defined operators in effect in this example.
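A minimal sketch of that distinction, with a made-up Value type:

struct Value {
    int v;
    // Handles value != 0: a member operator requires the left operand
    // to be a Value.
    bool operator!=(int rhs) const { return v != rhs; }
};

// Handles 0 != value: the left operand is an int, so this one must be a
// non-member function.
bool operator!=(int lhs, const Value& rhs) { return lhs != rhs.v; }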
There is no difference between 0 != x and x != 0.
The difference it may make is the order in which the operands are evaluated: a != b might evaluate a, then b, and compare them, while b != a would do it the other way round. However, the order of evaluation of the operands is unspecified, so you cannot rely on either order.
It doesn't make a big difference with variables or numbers (unless the variable is of a class type with an overloaded != operator), but it may make a difference when you're comparing the results of function calls.
Consider
int x = 1;

int f() {
    x = -1;
    return x;
}

int g() {
    return x;
}
Assuming the operands are evaluated from left to right, (f() != g()) would yield false, because f() will evaluate to -1 and then g() will also evaluate to -1, while (g() != f()) would yield true, because g() will evaluate to 1 and f() to -1.
This is just an example; it's better to avoid writing such code in real life!
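If the order does matter, the portable fix is to sequence the calls yourself with named temporaries; a minimal sketch:

bool compare_deterministically() {
    int a = f();     // f() definitely runs first
    int b = g();     // then g()
    return a != b;   // the result no longer depends on the compiler
}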