I was trying to figure out how lambda works in C++.
And something strange happened. It's so weird that I don't know how to describe it correctly. I tried googling several keywords, but didn't find anything that mentioned this behavior.
I first tried this code.
#include <iostream>
#include <utility>
using namespace std ;
auto func() {
int a = 0 ;
auto increase = [ &a ]( int i = 1 ){ a += i ; } ;
auto print = [ &a ](){ cout << a << '\n' ; } ;
pair< decltype(increase), decltype(print) >
p = make_pair( increase, print ) ;
return p ;
}
int main() {
auto lambdas = func() ;
auto increase = lambdas.first ;
auto print = lambdas.second ;
print() ;
increase() ;
print() ;
increase( 123456 ) ;
print() ;
return 0;
}
The output is, as I expected, garbage:
-1218965939
-1218965938
-1218842482
However, after I added this to func():
cout << typeid( decltype( print ) ).name() << '\n'
<< typeid( decltype( increase ) ).name() << '\n' ;
the output became:
Z4funcvEUlvE0_
Z4funcvEUliE_
0
1
123457
I did not expect this to happen.
[UPDATE]
The variable a should be "dead" because its lifetime has ended.
But I'm curious: why does adding code that examines typeid and decltype make a seem to be resurrected?
You are binding to a by reference. But this is a local variable which gets stored on the stack. It's undefined behavior to access it once the function finishes executing.
It's the same as if you returned a pointer to a and then started using it from the caller.
None of the output from your program is "as expected".
The lambdas in func() capture by reference a locally-scoped variable that goes out of scope as soon as func() returns.
After func() returns, a no longer exists, like any other function-local object. The captured references now refer to an object that went out of scope and was destroyed, and any use of the referenced value is undefined behavior.
Worse, the code also sets the value via the no-longer-valid reference. On traditional implementations, this will scribble over some random part of the stack, which can lead to the entire process crashing.
Pure chance.
As I suspect you know, you are printing unspecified values through a dangling reference.
In your first example, the dangling reference tries to "read" from a memory location that has since been re-used for something else.
In your second example, the couts and/or typeids have affected the bloody guts of the implementation of your compiled program such that the memory location of a happens to be untouched by the time you illegally print its value.
But there is no point in trying to rationalise about this any further, and you could get a different result the next time you run the program. Or your computer could explode. Or the timeline could be altered such that you had never been born. Don't try to explain the symptoms of UB — just avoid it.
I have this Visual Studio 2010 C++ program:
#include <algorithm>
#include <iostream>
#include <vector>
using namespace std;
int main() {
vector<double> v(10);
double start = 0.0; double increment = 10.0;
auto f = [&start, increment]() { return start += increment; };
generate(v.begin(), v.end(), f);
for(auto it = v.cbegin(); it != v.cend(); ++it) { cout << *it << ", "; }
cout << endl << "Changing vars to try again..." << endl;
start = 15; increment = -1.5;
generate(v.begin(), v.end(), f);
for(auto it = v.cbegin(); it != v.cend(); ++it) { cout << *it << ", "; }
return 0;
}
When I compile this in MS Visual Studio, the first generate does what I expected, resulting in "10, 20, ... 100, ". The second does not; the lambda "sees" the change in start but not the change in increment, so I get "25, 35, ... 115, ".
MSDN explains that
The Visual C++ compiler binds a lambda expression to its captured variables when the expression is declared instead of when the expression is called. ... [T]he reassignment of [a variable captured by value] later in the program does not affect the result of the expression.
So my question is: is this standards-compliant C++11 behavior, or is it Microsoft's own eccentric implementation? Bonus: if it is standard behavior, why was the standard written that way? Does it have to do with enforcing referential transparency for functional programming?
With a lambda expression, the bound variables are captured at the time of declaration.
This sample will make it very clear: https://ideone.com/Ly38P
#include <functional>
std::function<int()> dowork()
{
int answer = 42;
auto lambda = [answer] () { return answer; };
// can do what we want
answer = 666;
return lambda;
}
int main()
{
auto ll = dowork();
return ll(); // 42
}
It is clear that the capture must happen before the invocation, since the variables being captured no longer exist (neither in scope nor in lifetime) at that later time.
It's bound at creation time. Consider:
#include <functional>
#include <iostream>
std::function<int(int)> foo;
void sub()
{
int a = 42;
foo = [a](int x) -> int { return x + a; };
}
int main()
{
sub();
int abc = 54;
abc = foo(abc); // Note a no longer exists here... but it was captured by
// value, so the caller shouldn't have to care here...
std::cout << abc; //96
}
There's no a here when the function is called -- there'd be no way for the compiler to go back and update it. If you capture a by reference, then you have undefined behavior. But if you capture it by value, any reasonable programmer would expect this to work.
I think you are confusing the mechanism of capture with the mechanism of variable passing. They are not the same thing even if they bear some superficial resemblance to one another. If you need the current value of a variable inside a lambda expression, capture it by reference (though, of course, that reference is bound to a particular variable at the point the lambda is declared).
When you 'capture' a variable, you are creating something very like a closure. And closures are always statically scoped (i.e. the 'capture' happens at the point of declaration). People familiar with the concept of a lambda expression would find C++'s lambda expressions highly strange and confusing if it were otherwise. Adding a brand new feature to a programming language that is different from the same feature in other programming languages in some significant way would make C++ even more confusing and difficult to understand than it already is. Also, everything else in C++ is statically scoped, so adding some element of dynamic scoping would be very strange for that reason as well.
Lastly, if capture always happened by reference, then that would mean a lambda would only be valid as long as the stack frame was valid. Either you would have to add garbage collected stack frames to C++ (with a huge performance hit and much screaming from people who are depending on the stack being largely contiguous) or you would end up creating yet another feature where it was trivially easy to blow your foot off with a bazooka by accident as the stack frame referenced by a lambda expression would go out of scope and you'd basically be creating a lot of invisible opportunities to return local variables by reference.
Yes, it has to capture by value at that point, because otherwise you could attempt to capture a variable (by reference, for example) that no longer exists when the lambda/function is actually called.
The standard supports capturing both by value AND by reference to address both possible use cases. If you tell the compiler to capture by value it's captured at the point the lambda is created. If you ask to capture by reference, it will capture a reference to the variable which will then be used at the point the lambda is called (requiring of course that the referenced variable must still exist at the point the call is made).
#include<iostream>
using namespace std;
int main( )
{
int *p;
double *q;
cout << p << " " << q << endl;
p++;
q++;
cout << p << " " << q << endl;
//*p = 5; // should be wrong!
}
This function prints
0x7ffe6c0591a0 0
0x7ffe6c0591a4 0x8
Why does p point to some random address and q to zero? Also, when I uncomment the line *p=5, shouldn't it throw an error? It still works fine. With that line uncommented, the output is:
0x7ffc909a2f70 0
0x7ffc909a2f74 0x8
What can explain this weird behaviour?
When local (auto) variables of basic type (int, double, pointers to them, etc) are uninitialised, any operation that accesses their value yields undefined behaviour.
Printing a variable accesses its value, so both the statements with cout << ... give undefined behaviour. Incrementing a variable also accesses its value (it is not possible to give the result of incrementing without accessing the previous value), so both increment operations give undefined behaviour. Dereferencing an uninitialised pointer (as in *p) gives undefined behaviour, as does assigning a value to the result, *p = 5.
So every statement you have shown after the definitions of p and q gives undefined behaviour.
Undefined behaviour means there are no constraints on what is permitted to happen - or, more simply, that anything can happen. That allows any result from "appear to do nothing" to "crash" to "reformat your hard drive".
The particular output you are getting therefore doesn't really matter. You may get completely different behaviour when the code is built with a different compiler, or even during a different phase of the moon.
In terms of a partial explanation of what you are seeing .... The variables p and q will probably receive values corresponding to whatever happens to be in memory at the location where they are created - and therefore to whatever some code (within an operating system driver, within your program, even within some other program) happened to write at that location previously. But that is only one of many possible explanations.
As you have not initialised the variables - the code resorts to undefined behaviour. So anything can happen including what you have experienced.
C++ gives you more than enough rope to hang yourself. Compile with all the warnings switched on to avoid some of the perils.
When you do not initialize a variable, it is set to a random or specific value depending on the compiler's policy.
As for ++: if you have a pointer to class A and apply ++, the pointer is incremented by sizeof(A) bytes.
And as for your last question, *p=5 is a good example of undefined behavior, since you did not allocate any memory for p.
Please consider this simple example:
#include <iostream>
const int CALLS_N = 3;
int * hackPointer;
void test()
{
static int callCounter = 0;
int local = callCounter++;
hackPointer = &local;
}
int main()
{
for(int i = 0; i < CALLS_N; i++)
{
test();
std::cout << *hackPointer << "(" << hackPointer << ")";
std::cout << *hackPointer << "(" << hackPointer << ")";
std::cout << std::endl;
}
}
The output (VS2010, MinGW without optimization) has the same structure:
0(X) Y(X)
1(X) Y(X)
2(X) Y(X)
...
[CALLS_N](X) Y(X)
where X is some address in memory and Y is some rubbish number.
This is a case of undefined behaviour. However, I want to understand why this behaviour occurs under these conditions (and it is quite stable across two compilers).
It seems that after the test() call, the first read of hackPointer gives the valid value, but a second read immediately afterwards gives rubbish. Also, the address of local is the same on every call. I always thought that memory for a stack variable is allocated on every function call and released after return, but I can't explain the program's output from this point of view.
"Releasing" automatic storage doesn't make the memory go away, or change the pattern of bits stored there. It just makes it available for reuse, and causes undefined behaviour if you try to access the object that used to be there.
Immediately after returning from the function, the memory occupied by the local probably hasn't been overwritten, so reading it will probably give the value that was assigned within the function.
After calling another function (in this case, operator<<()), the memory is likely to have been reused for a variable within that function, so probably has a different value.
You are quite right that this is undefined behaviour.
That aside, what's happening is that std::cout << *hackPointer involves a function call: operator<<() gets called after the value of *hackPointer has been read. In all likelihood, operator<<() uses its own local variables that end up on the stack where local was, wiping out the latter.
A few days ago I learned that cout has a buffer, and when I googled it, I read that the buffer is something like a stack: it takes the output of cout and printf from right to left, then puts it out (to the console or a file) from top to bottom. Like this:
a = 1; b = 2; c = 3;
cout<<a<<b<<c<<endl;
buffer:|3|2|1|<- (take “<-” as a pointer)
output:|3|2|<- (output 1)
|3|<- (output 2)
|<- (output 3)
Then I wrote the code below:
#include <cstdio>
#include <cstdlib>
#include <iostream>
using namespace std;
int c = 6;
int f()
{
c+=1;
return c;
}
int main()
{
int i = 0;
cout <<"i="<<i<<" i++="<<i++<<" i--="<<i--<<endl;
i = 0;
printf("i=%d i++=%d i--=%d\n" , i , i++ ,i-- );
cout<<f()<<" "<<f()<<" "<<f()<<endl;
c = 6;
printf("%d %d %d\n" , f() , f() ,f() );
system("pause");
return 0;
}
Under VS2005, the output is
i=0 i++=-1 i--=0
i=0 i++=-1 i--=0
9 8 7
9 8 7
It seems that the stack explanation is right.
However, yesterday I read C++ Primer Plus, which says that cout works from left to right, returning an object (cout) each time, so "That’s the feature that lets you concatenate output by using insertion". But the left-to-right explanation cannot explain cout<
Then Alnitak told me, "The << operator is really ostream& operator<<(ostream& os, int), so another way of writing this is:
operator<< ( operator<< ( operator<< ( cout, a ), b ), c )",
If the rightmost argument is evaluated first, that would explain it somewhat.
Now I'm confused about how cout's buffer work, can somebody help me?
You are mixing up a lot of things. So far:
Implementation details of cout
Chained calls
Calling conventions
Try to read up on them separately. And don't think about all of them in one go.
printf("i=%d i++=%d i--=%d\n" , i , i++ ,i-- );
The above line invokes undefined behavior. Read the FAQ 3.2. Note, what you observe is a side-effect of the function's calling convention and the way parameters are passed in the stack by a particular implementation (i.e. yours). This is not guaranteed to be the same if you were working on other machines.
I think you are confusing the order of function calls with buffering. When you have a cout statement followed by multiple insertions << you are actually invoking multiple function calls, one after the other. So, if you were to write:
cout << 42 << 0;
It really means: you call
cout.operator<<(42)
which returns a reference to cout, and then that returned reference is used in another call to the same operator:
cout.operator<<(42).operator<<(0)
What you have tested above will not tell you anything about cout's internal representation. I suggest you take a look at the header files to learn more.
Just as a general tip, never ever use i++ in the same line as another usage of i or i--.
The issue is that function arguments can be evaluated in any order, so if your function arguments have any side-effects (such as the increment and decrement operations) you can't guarantee that they will operate in the order you expect. This is something to avoid.
The same goes for this case, which is similar to the actual expansion of your cout usage:
function1 ( function2 ( foo ), bar );
The compiler is free to evaluate bar before calling function2, or vice versa. You can guarantee that function2 will return before function1 is called, for example, but not that their arguments are evaluated in a specific order.
This becomes a problem when you do something like:
function1 ( function2 ( i++), i );
You have no way to specify whether the "i" is evaluated before or after the "i++", so you're likely to get results that are different than you expect, or different results with different compilers or even different versions of the same compiler.
Bottom line, avoid statements with side-effects. Only use them if they're the only statement on the line or if you know you're only modifying the same variable once. (A "line" means a single statement plus semicolon.)
What you see is undefined behavior.
Local i and global c are added/subtracted multiple times without an intervening sequence point. This means that the values you get can be just about anything. It depends on the compiler, and possibly also on the processor architecture and the number of cores.
The cout buffer can be thought of as a queue, so Alnitak is right.
In addition to the other answers, which correctly point out that you are seeing undefined behavior, I figured I'd mention that std::cout uses an object of type std::streambuf to do its internal buffering. Basically it is an abstract class which represents a buffer (the size is implementation-specific and can even be 0 for unbuffered stream buffers). The one for std::cout is written such that when it "overflows" it is flushed into stdout.
In fact, you can change the std::streambuf associated with std::cout (or any stream, for that matter). This is often useful if you want to do something clever like make all std::cout output end up in a log file.
And as dirkgently said you are confusing calling convention with other details, they are entirely unrelated to std::cout's buffering.
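In addition, whether printf and cout output interleave in program order depends on whether the C++ standard streams are synchronized with C stdio; they are by default, unless std::ios_base::sync_with_stdio(false) is called.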