Is Lambda Expression just capture objects before it? [duplicate] - c++

I have a Visual Studio 2010 C++ program, the main function of which is:
vector<double> v(10);
double start = 0.0; double increment = 10.0;
auto f = [&start, increment]() { return start += increment; };
generate(v.begin(), v.end(), f);
for(auto it = v.cbegin(); it != v.cend(); ++it) { cout << *it << ", "; }
cout << endl << "Changing vars to try again..." << endl;
start = 15; increment = -1.5;
generate(v.begin(), v.end(), f);
for(auto it = v.cbegin(); it != v.cend(); ++it) { cout << *it << ", "; }
return 0;
When I compile this in MS Visual Studio, the first generate does what I expected, resulting in "10, 20, ... 100, ". The second does not; the lambda "sees" the change in start but not the change in increment, so I get "25, 35, ... 115, ".
MSDN explains that
The Visual C++ compiler binds a lambda expression to its captured variables when the expression is declared instead of when the expression is called. ... [T]he reassignment of [a variable captured by value] later in the program does not affect the result of the expression.
So my question is: is this standards-compliant C++11 behavior, or is it Microsoft's own eccentric implementation? Bonus: if it is standard behavior, why was the standard written that way? Does it have to do with enforcing referential transparency for functional programming?

With a lambda expression, the bound variables are captured at the time of declaration.
This sample will make it very clear: https://ideone.com/Ly38P
std::function<int()> dowork()
{
int answer = 42;
auto lambda = [answer] () { return answer; };
// can do what we want
answer = 666;
return lambda;
}
int main()
{
auto ll = dowork();
return ll(); // 42
}
It is clear that the capture must be happening before the invocation, since the variables being captured don't even exist (not in scope, neither in lifetime) anymore at a later time.

It's bound at creation time. Consider:
#include <functional>
#include <iostream>
std::function<int(int)> foo;
void sub()
{
int a = 42;
foo = [a](int x) -> int { return x + a; };
}
int main()
{
sub();
int abc = 54;
abc = foo(abc); // Note a no longer exists here... but it was captured by
// value, so the caller shouldn't have to care here...
std::cout << abc; //96
}
There's no a here when the function is called -- there'd be no way for the compiler to go back and update it. If you pass a by reference, then you have undefined behavior. But if you pass by value any reasonable programmer would expect this to work.

I think you are confusing the mechanism of capture with the mechanism of variable passing. They are not the same thing even if they bear some superficial resemblance to one another. If you need the current value of a variable inside a lambda expression, capture it by reference (though, of course, that reference is bound to a particular variable at the point the lambda is declared).
When you 'capture' a variable, you are creating something very like a closure. And closures are always statically scoped (i.e. the 'capture' happens at the point of declaration). People familiar with the concept of a lambda expression would find C++'s lambda expressions highly strange and confusing if it were otherwise. Adding a brand new feature to a programming language that is different from the same feature in other programming languages in some significant way would make C++ even more confusing and difficult to understand than it already is. Also, everything else in C++ is statically scoped, so adding some element of dynamic scoping would be very strange for that reason as well.
Lastly, if capture always happened by reference, then that would mean a lambda would only be valid as long as the stack frame was valid. Either you would have to add garbage collected stack frames to C++ (with a huge performance hit and much screaming from people who are depending on the stack being largely contiguous) or you would end up creating yet another feature where it was trivially easy to blow your foot off with a bazooka by accident as the stack frame referenced by a lambda expression would go out of scope and you'd basically be creating a lot of invisible opportunities to return local variables by reference.

Yes, it has to capture by value at the point because otherwise you could attempt to capture a variable (by reference for example) that no longer exists when the lambda/function is actually called.
The standard supports capturing both by value AND by reference to address both possible use cases. If you tell the compiler to capture by value it's captured at the point the lambda is created. If you ask to capture by reference, it will capture a reference to the variable which will then be used at the point the lambda is called (requiring of course that the referenced variable must still exist at the point the call is made).

Related

Limiting the scope of a temporary variable

Without resorting to the heap, I would like a temporary variable to pass out of scope, freeing its storage on the stack. However, I can think of no neat way to achieve the desired effect in a case like this:
#include <cstdlib>
#include <iostream>
int main()
{
const int numerator {14}, denominator {3};
// ...
// The desired scope begins here.
const std::div_t quotient_and_remainder {std::div(numerator, denominator)};
const int x_lo {
quotient_and_remainder.quot
};
const int x_hi {
quotient_and_remainder.quot + (quotient_and_remainder.rem ? 1 : 0)
};
// The desired scope ends here.
// The quotient_and_remainder should pass out of scope.
// ...
std::cout
<< "The quotient " << numerator << "/" << denominator
<< " is at least " << x_lo
<< " and is not more than " << x_hi << "." << "\n";
return 0;
}
At the low machine level, one desires the following.
Storage is first reserved on the stack for x_lo and x_hi.
The quotient_and_remainder is then pushed temporarily onto the stack.
The quotient_and_remainder is used to calculate values which are written into x_lo and x_hi.
Having done its duty, the quotient_and_remainder is popped off the stack.
At the high programming level, one desires the following.
The compiler enforces that x_lo and x_hi remain constant.
The name quotient_and_remainder does not clutter the function's entire local namespace.
At both levels, one would like to avoid wasting memory and runtime. (To waste compilation time is okay.)
In my actual program—which is too long to post here—the values of numerator and denominator are unknown at compile time, so constexpr cannot be used.
Ordinarily, I can think of various sound ways to achieve such effects using, for example, extra braces or maybe an anonymous lambda; but this time, I'm stumped.
(One should prefer to solve the problem without anti-idiomatic shenanigans like const_cast. However, if shenanigans were confined to a small part of the program to avert maintainabilty problems, then shenanigans might be acceptable if there existed no better choice.)
In C++17, you can use a function and a structured binding declaration:
auto x_lo_x_hi = [](int numerator, int denominator) {
std::div_t quotient_and_remainder {std::div(numerator, denominator)};
return std::pair{
quotient_and_remainder.quot,
quotient_and_remainder.quot + (quotient_and_remainder.rem ? 1 : 0)
};
};
const auto [x_lo, x_hi] = x_lo_x_hi(numerator, denominator);
Demo
x_lo_x_hi doesn't have to be a local lambda. It could be some function declared elsewhere or it could even be an IIFE (an anonymous lambda that you call immediately on the same line).
That said, this only serves to remove the name quotient_and_remainder from the local scope. Like MSalters said in a comment, any decent compiler will optimize that variable away in your original code. And even if it didn't, removing one variable from the stack will not improve the runtime, and if you have stack memory problems, the actual issue is somewhere else.

What happens to a reference when the object is deleted?

I did a bit of an experiment to try to understand references in C++:
#include <iostream>
#include <vector>
#include <set>
struct Description {
int a = 765;
};
class Resource {
public:
Resource(const Description &description) : mDescription(description) {}
const Description &mDescription;
};
void print_set(const std::set<Resource *> &resources) {
for (auto *resource: resources) {
std::cout << resource->mDescription.a << "\n";
}
}
int main() {
std::vector<Description> descriptions;
std::set<Resource *> resources;
descriptions.push_back({ 10 });
resources.insert(new Resource(descriptions.at(0)));
// Same as description (prints 10)
print_set(resources);
// Same as description (prints 20)
descriptions.at(0).a = 20;
print_set(resources);
// Why? (prints 20)
descriptions.clear();
print_set(resources);
// Object is written to the same address (prints 50)
descriptions.push_back({ 50 });
print_set(resources);
// Create new array
descriptions.reserve(100);
// Invalid address
print_set(resources);
for (auto *res : resources) {
delete res;
}
return 0;
}
https://godbolt.org/z/TYqaY6Tz8
I don't understand what is going on here. I have found this excerpt from C++ FAQ:
Important note: Even though a reference is often implemented using an address in the underlying assembly language, please do not think of a reference as a funny looking pointer to an object. A reference is the object, just with another name. It is neither a pointer to the object, nor a copy of the object. It is the object. There is no C++ syntax that lets you operate on the reference itself separate from the object to which it refers.
This creates some questions for me. So, if reference is the object itself and I create a new object in the same memory address, does this mean that the reference "becomes" the new object? In the example above, vectors are linear arrays; so, as long as the array points to the same memory range, the object will be valid. However, this becomes a lot trickier when other data sets are being used (e.g sets, maps, linked lists) because each "node" typically points to different parts of memory.
Should I treat references as undefined if the original object is destroyed? If yes, is there a way to identify that the reference is destroyed other than a custom mechanism that tracks the references?
Note: Tested this with GCC, LLVM, and MSVC
The note is misleading, treating references as syntax sugar for pointers is fine as a mental model. In all the ways a pointer might dangle, a reference will also dangle. Accessing dangling pointers/references is undefined behaviour (UB).
int* p = new int{42};
int& i = *p;
delete p;
void f(int);
f(*p); // UB
f(i); // UB, with the exact same reason
This also extends to the standard containers and their rules about pointer/reference invalidation. The reason any surprising behaviour happens in your example is simply UB.
The way I explain this to myself is:
Pointer is like a finger on your hands. It can point to memory blocks, think of them as a keyboard. So pointer literally points to a keypad that holds something or does something.
Reference is a nickname for something. Your name may be for example Michael Johnson, but people may call you Mike, MJ, Mikeson etc. Anytime you hear your nickname, person who called REFERED to the same thing - you. If you do something to yourself, reference will show the change too. If you point at something else, it won't affect what you previously pointed on (unless you're doing something weird), but rather point on something new. That being said, as in the accepted answer above, if you do something weird with your fingers and your nicknames, you'll see weird things happening.
References are likely the most important feature that C++ has that is critical in coding for beginners. Many schools today start with MATLAB which is insanely slow when you wish to do things seriously. One of the reasons is the lack of controlling references in MATLAB (yes it has them, make a class and derive from the handle - google it out) as you would in C++.
Look these two functions:
double fun1(std::valarray<double> &array)
{
return array.max();
}
double fun2(std::valarray<double> array)
{
return array.max();
}
These simple two functions are very different. When you have some STL array and use fun1, function will expect nickname for that array, and will process it directly without making a copy. fun2 on the other hand will take the input array, create its copy, and process the copy.
Naturally, it is much more efficient to use references when making functions to process inputs in C++. That being said, you must be certain not to change your input in any way, because that will affect original input array in another piece of code where you generated it - you are processing the same thing, just called differently.
This makes references useful for a bit controversial coding, called side-effects.
In C++ you can't make a function with multiple outputs directly without making a custom data type. One workaround is a side effect in example like this:
#include <stdio.h>
#include <valarray>
#include <iostream>
double fun3(std::valarray<double> &array, double &min)
{
min = array.min();
return array.max();
}
int main()
{
std::valarray<double> a={1, 2, 3, 4, 5};
double sideEffectMin;
double max = fun3(a,sideEffectMin);
std::cout << "max of array is " << max << " min of array is " <<
sideEffectMin<<std::endl;
return 0;
}
So fun3 is expecting a reference to a double data type. In other words, it wants the second input to be a nickname for another double variable. This function then goes to alter the reference, and this will also alter the input. Both name and nickname get altered by the function, because it's the same "thing".
In main function, variable sideEffectMin is initialized to 0, but it will get a value when fun3 function is called. Therefore, you got 2 outputs from fun3.
The example shows you the trick with side effect, but also to be ware not to alter your inputs, specially when they are references to something else, unless you know what you are doing.

Return Reference to An Array of Strings

I am currently working my way through C++ Primer Fifth Edition. I have gone through a couple of other C++ books, but they weren't very detailed and were quite complicated.
This book has been helping me a lot with everything that I have missed. I've just hit a wall.
One of the exercises asks me to write a declaration for a function that returns a reference to an array of ten strings, without using trailing return, decltype, or type alias.
I know it only says write a declaration, which I have done, like so:
string (&returnArray()) [10];
I wanted to write a function definition as well, like so:
string (&returnString(int i, string s)) [10]
{
string s1[10];
s1[i] = s;
return s1;
}
In my main function, I have a for loop which passes a string literal through and stores that string inside a pointer to an array of ten strings. It should then output the results to the screen.
The problem I am having is, when I dereference my pointer to an array, once, it will output the address. If I dereference it twice, the program outputs nothing and stops responding.
Here is my main function, I have changed it multiple times, yet can't figure out why it's not outputting properly. I've probably got it all wrong...
int main()
{
string (*s)[10];
for(int i = 0; i != 10; ++i)
{
s = &returnString(i, "Hello");
cout << s[i] << endl;
}
return 0;
}
Your function returns a reference to a local variable – after the call, it’s a dangling reference. You cannot do that.
You can only return references to storage that goes on existing after the end of the function call.
Returning a reference to a temporary local object invokes undefined behavior.
A short fix is making it static:
string (&returnString(int i, string s)) [10]
{
static string s1[10];
^^^^^^
s1[i] = s;
return s1;
}
int main()
{
string(&s)[10] = returnString(0, "Hello");
for (int i = 0; i != 10; ++i)
{
s[i] = "Hello";
cout << s[i] << endl;
}
}
Returning pointers/references to local variables is bad, undefined behavior. You can follow your exercise and not forget that constraint, your exercise tells you to do something but it's not necessarily telling you to do it the wrong way, in fact it's a good one that will lead you to pitfalls and hence make you a better programmer once you figure them out.
So what's left given that constraint of not returning addresses to local variables but still addressing the task given? You have the static fix as M M. already mentioned, or you could think you're creating something useful like a function rotate, for example, that accepts an string(&)[10] and returns itself rotated ;-)
Look that, iostream insertion and extraction operators already work like that, returning references to parameters passed by reference.

In C++11, when are a lambda expression's bound variables supposed to be captured-by-value?

I have a Visual Studio 2010 C++ program, the main function of which is:
vector<double> v(10);
double start = 0.0; double increment = 10.0;
auto f = [&start, increment]() { return start += increment; };
generate(v.begin(), v.end(), f);
for(auto it = v.cbegin(); it != v.cend(); ++it) { cout << *it << ", "; }
cout << endl << "Changing vars to try again..." << endl;
start = 15; increment = -1.5;
generate(v.begin(), v.end(), f);
for(auto it = v.cbegin(); it != v.cend(); ++it) { cout << *it << ", "; }
return 0;
When I compile this in MS Visual Studio, the first generate does what I expected, resulting in "10, 20, ... 100, ". The second does not; the lambda "sees" the change in start but not the change in increment, so I get "25, 35, ... 115, ".
MSDN explains that
The Visual C++ compiler binds a lambda expression to its captured variables when the expression is declared instead of when the expression is called. ... [T]he reassignment of [a variable captured by value] later in the program does not affect the result of the expression.
So my question is: is this standards-compliant C++11 behavior, or is it Microsoft's own eccentric implementation? Bonus: if it is standard behavior, why was the standard written that way? Does it have to do with enforcing referential transparency for functional programming?
With a lambda expression, the bound variables are captured at the time of declaration.
This sample will make it very clear: https://ideone.com/Ly38P
std::function<int()> dowork()
{
int answer = 42;
auto lambda = [answer] () { return answer; };
// can do what we want
answer = 666;
return lambda;
}
int main()
{
auto ll = dowork();
return ll(); // 42
}
It is clear that the capture must be happening before the invocation, since the variables being captured don't even exist (not in scope, neither in lifetime) anymore at a later time.
It's bound at creation time. Consider:
#include <functional>
#include <iostream>
std::function<int(int)> foo;
void sub()
{
int a = 42;
foo = [a](int x) -> int { return x + a; };
}
int main()
{
sub();
int abc = 54;
abc = foo(abc); // Note a no longer exists here... but it was captured by
// value, so the caller shouldn't have to care here...
std::cout << abc; //96
}
There's no a here when the function is called -- there'd be no way for the compiler to go back and update it. If you pass a by reference, then you have undefined behavior. But if you pass by value any reasonable programmer would expect this to work.
I think you are confusing the mechanism of capture with the mechanism of variable passing. They are not the same thing even if they bear some superficial resemblance to one another. If you need the current value of a variable inside a lambda expression, capture it by reference (though, of course, that reference is bound to a particular variable at the point the lambda is declared).
When you 'capture' a variable, you are creating something very like a closure. And closures are always statically scoped (i.e. the 'capture' happens at the point of declaration). People familiar with the concept of a lambda expression would find C++'s lambda expressions highly strange and confusing if it were otherwise. Adding a brand new feature to a programming language that is different from the same feature in other programming languages in some significant way would make C++ even more confusing and difficult to understand than it already is. Also, everything else in C++ is statically scoped, so adding some element of dynamic scoping would be very strange for that reason as well.
Lastly, if capture always happened by reference, then that would mean a lambda would only be valid as long as the stack frame was valid. Either you would have to add garbage collected stack frames to C++ (with a huge performance hit and much screaming from people who are depending on the stack being largely contiguous) or you would end up creating yet another feature where it was trivially easy to blow your foot off with a bazooka by accident as the stack frame referenced by a lambda expression would go out of scope and you'd basically be creating a lot of invisible opportunities to return local variables by reference.
Yes, it has to capture by value at the point because otherwise you could attempt to capture a variable (by reference for example) that no longer exists when the lambda/function is actually called.
The standard supports capturing both by value AND by reference to address both possible use cases. If you tell the compiler to capture by value it's captured at the point the lambda is created. If you ask to capture by reference, it will capture a reference to the variable which will then be used at the point the lambda is called (requiring of course that the referenced variable must still exist at the point the call is made).

Returning a reference in C++ [duplicate]

This question already has answers here:
Can a local variable's memory be accessed outside its scope?
(20 answers)
Closed 6 years ago.
Consider the following code where I am returning double& and a string&. It works fine in the case of a double but not in the case of a string. Why does the behavior differ?
In both cases the compiler does not even throw the Warning: returning address of local variable or temporary as I am returning a reference.
#include <iostream>
#include <string>
using namespace std;
double &getDouble(){
double h = 46.5;
double &refD = h;
return refD;
}
string &getString(){
string str = "Devil Jin";
string &refStr = str;
return refStr;
}
int main(){
double d = getDouble();
cout << "Double = " << d << endl;
string str = getString();
cout << "String = " << str.c_str() << endl;
return 0;
}
Output:
$ ./a.exe
Double = 46.5
String =
You should never return a reference to a local variable no matter what the compiler does or does not do. The compiler may be fooled easily. you should not base the correctness of your code on some warning which may not have fired.
The reason it didn't fire here is probably that you're not literally returning a reference to a local variable, you are returning a variable that is a reference to a local variable. The compiler probably doesn't detect this somewhat more complex situation. It only detects things like:
string &getString(){
string str = "Devil Jin";
return str;
}
The case of the double is simpler because it doesn't involve constructing and destructing a complex object so in this situation the flow control analysis of the compiler probably did a better job at detecting the mistake.
The reference to a double refers to a location that is still physically in memory but no longer on the stack. You're only getting away with it because the memory hasn't been overwritten yet. Whereas the double is a primitive, the string is an object and has a destructor that may be clearing the internal string to a zero length when it falls out of scope. The fact that you aren't getting garbage from your call to c_str() seems to support that.
GCC used to have an extension called Named Returns that let you accomplish the same thing, but allocated the space outside the function. Unfortunately it doesn't exist anymore; I'm not sure why they took it out
Classic case of a Dangling reference in respect to C++.The double variable was not on call stack while returning reference was trying to access it invoking the compiler to set the guarding flags. String however has explicit Garbage Collection mechanism which lets your compiler to overlook the scenario.