Lambdas and capture by reference local variables : Accessing after the scope - c++

I am passing my local variables by reference to two lambdas, and I call these lambdas outside of the function's scope. Is this undefined behavior?
std::pair<std::function<int()>, std::function<int()>> addSome() {
    int a = 0, b = 0;
    return std::make_pair([&a,&b] {
        ++a; ++b;
        return a+b;
    }, [&a, &b] {
        return a;
    });
}
int main() {
    auto f = addSome();
    std::cout << f.first() << " " << f.second();
    return 0;
}
If it is not undefined, why are changes made in one lambda not reflected in the other?
Am I misunderstanding capture by reference in the context of lambdas?
I am writing to the variables and it seems to work fine, with no runtime errors; the output is 2 0.
If it works, I would expect the output to be 2 1.

Yes, this causes undefined behavior. The lambdas will reference stack-allocated objects that have gone out of scope. (Technically, as I understand it, the behavior is defined until the lambdas access a and/or b. If you never invoke the returned lambdas then there is no UB.)
This is undefined behavior the same way that it's undefined behavior to return a reference to a stack-allocated local and then use that reference after the local goes out of scope, except that in this case it's being obfuscated a bit by the lambda.
Further, note that before C++17 the order in which the lambdas are invoked is unspecified: the compiler is free to invoke f.second() before f.first() because both are part of the same full-expression. Therefore, even if we fix the undefined behavior caused by using references to destroyed objects, both 2 0 and 2 1 are valid outputs from this program, and which one you get depends on the order in which your compiler decides to execute the lambdas. (Since C++17, the operands of << are evaluated left to right, so f.first() is called first.) Note that unspecified order is not undefined behavior: the compiler cannot do just anything, it simply has some freedom in deciding the order in which to do some things.
(Keep in mind that << in your main() function is invoking a custom operator<< function, and the order in which function arguments are evaluated is unspecified. Compilers are free to emit code that evaluates all of the function arguments within the same full-expression in any order, with the constraint that all arguments to a function must be evaluated before that function is invoked.)
To fix the first problem, use std::shared_ptr to create a reference-counted object. Capture this shared pointer by value, and the lambdas will keep the pointed-to object alive as long as they (and any copies thereof) exist. This heap-allocated object is where we will store the shared state of a and b.
To fix the second problem, evaluate each lambda in a separate statement.
Here is your code rewritten with the undefined behavior fixed, and with f.first() guaranteed to be invoked before f.second():
#include <functional>
#include <iostream>
#include <memory>
#include <utility>

std::pair<std::function<int()>, std::function<int()>> addSome() {
    // We store the "a" and "b" ints instead in a shared_ptr containing a pair.
    auto numbers = std::make_shared<std::pair<int, int>>(0, 0);
    // a becomes numbers->first
    // b becomes numbers->second
    // And we capture the shared_ptr by value.
    return std::make_pair(
        [numbers] {
            ++numbers->first;
            ++numbers->second;
            return numbers->first + numbers->second;
        },
        [numbers] {
            return numbers->first;
        }
    );
}
int main() {
    auto f = addSome();
    // We break apart the output into two statements to guarantee that f.first()
    // is evaluated prior to f.second().
    std::cout << f.first();
    std::cout << " " << f.second();
    return 0;
}

Unfortunately C++ lambdas can capture by reference but don't solve the "upwards funarg problem".
Doing so would require allocating captured locals in "cells" and garbage collection or reference counting for deallocation. C++ does not do this, and unfortunately this makes C++ lambdas a lot less useful and more dangerous than in other languages like Lisp, Python or Javascript.
More specifically, in my experience you should avoid at all costs implicit capture by reference (i.e. using the [&](…){…} form) for lambda objects that survive the local scope, because that's a recipe for random segfaults later during maintenance.
Always think carefully about what to capture and how, and about the lifetime of captured references.
Of course, it's safe to capture everything by reference with [&] if you only use the lambda in the same scope, for example to pass code to algorithms like std::sort without having to define a named comparator function outside of the function, or as a locally used utility function. I find this use very readable and nice because you get a lot of context implicitly and there is no need to 1. make up a global name for something that will never be reused anywhere else, or 2. pass a lot of context or create extra classes just for that context.
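As a rough illustration of the same-scope case, here is a minimal sketch. The helper name sort_by_distance and its parameters are made up for this example; the point is only that the lambda never leaves the function that owns pivot.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical helper: orders values by their distance from pivot.
std::vector<int> sort_by_distance(std::vector<int> values, int pivot) {
    // [&] is safe here: the lambda is only used inside std::sort,
    // which returns before pivot goes out of scope.
    std::sort(values.begin(), values.end(), [&](int lhs, int rhs) {
        return std::abs(lhs - pivot) < std::abs(rhs - pivot);
    });
    return values;
}

// Usage sketch; call it from main().
void sort_by_distance_demo() {
    const std::vector<int> sorted = sort_by_distance({10, 1, 7, 4}, 5);
    assert((sorted == std::vector<int>{4, 7, 1, 10}));
}
```

The comparator picks up pivot from the enclosing scope, so there is no need for a named functor class that carries it around.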
An approach that can work sometimes is capturing by value a shared_ptr to a heap-allocated state. This is basically implementing by hand what Python does automatically (but pay attention to reference cycles to avoid memory leaks: Python has a garbage collector, C++ doesn't).

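When the lambda will outlive the scope, make a copy of the locals you use with capture by value ([=]):
auto func() // deduced return type, needs C++14
{
    int x = 5;
    // When called, local x will no longer be in scope; so, use capture by value.
    // mutable is required because the lambda modifies its own copy of x.
    return [=]() mutable {
        x += 2;
        return x;
    };
}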
When you are in the same scope, better to use capture by reference ([&]):
void func(void)
{
    int x = 5;
    // When called, local x will still be in scope; safe to use capture by reference.
    ([&] {
        x += 2;
    })(); // Lambda is immediately invoked here, in the same scope as x, with ().
}
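To see both rules side by side, here is a small sketch. The names make_stepper and bump_in_place are invented for illustration, and std::function is used only so the factory has a nameable return type.

```cpp
#include <cassert>
#include <functional>

// Hypothetical factory: the lambda outlives the call, so it owns a copy of x.
std::function<int()> make_stepper() {
    int x = 5;
    // mutable lets the lambda modify its own copy of x.
    return [x]() mutable {
        x += 2;
        return x;
    };
}

// The lambda runs while x is still alive, so capture by reference is safe.
int bump_in_place() {
    int x = 5;
    [&x] { x += 2; }();
    return x;
}

// Usage sketch; call it from main().
void capture_modes_demo() {
    auto step = make_stepper();
    assert(step() == 7);
    assert(step() == 9);  // the copy keeps its state between calls
    assert(bump_in_place() == 7);
}
```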

Related

How to keep lambda parameter in memory? [duplicate]

Today I encountered a very unintuitive behavior (for me, at least) in C++11 lambdas. The code in question is the following:
#include <stdio.h>
auto sum(int x) {
    return [&x](int y) {
        return x + y;
    };
}
int main() {
    int a = sum(2)(3);
    printf("%d\n", a);
}
Instead of printing 5, this prints gibberish. Actually, at least in my version of GCC, if I turn on the -O2 optimization flag, it actually prints 5. Since the output depends on the optimization level of the compiler, it is undefined behavior. After a while, I think I understood what is happening.
When the function sum is called, a stack variable corresponding to the argument x is set to 2, then the function sum returns, and this stack variable might be overwritten by anything that the compiler needs to put there to execute following code, and by the time the lambda eventually gets executed, the place where x was no longer holds 2, and the program adds 3 to an arbitrary integer.
Is there any elegant way to do currying in C++ guaranteeing that the variable gets captured correctly?
int x has a limited lifetime. References to automatic storage variables (what you call "the stack") are only valid over the variable's lifetime. In this case, only until the end of the stack frame (the scope) where the variable exists, or the function for function arguments.
[&] captures any mentioned ("local") variable by reference, except this (which is captured by value if used or implicitly used). [=] captures any mentioned variable by value. [x] would capture x by value explicitly, and [&x] by reference explicitly. In C++17, [*this] also works and captures a copy of the object.
There is also [x=std::move(x)], or [blah=expression].
In general, if the lambda will outlive the current scope don't use [&]: be explicit about what you capture.
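Applied to the question, a minimal sketch of the fix could look like the following. The second function, make_offset_total, is a made-up example of the [name=expression] form and is not from the question.

```cpp
#include <cassert>
#include <numeric>
#include <utility>
#include <vector>

// Capturing x by value copies it into the closure, so the closure stays
// valid after sum() returns. (Deduced return types need C++14.)
auto sum(int x) {
    return [x](int y) { return x + y; };
}

// Hypothetical init-capture example: the vector is moved into the
// closure instead of being copied.
auto make_offset_total(std::vector<int> values) {
    return [values = std::move(values)](int offset) {
        return std::accumulate(values.begin(), values.end(), offset);
    };
}

// Usage sketch; call it from main().
void currying_demo() {
    assert(sum(2)(3) == 5);
    auto total = make_offset_total({1, 2, 3});
    assert(total(10) == 16);
}
```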

When are capture variables captured?

I am making a std::vector of callback std::functions, and I'm having a little trouble understanding the captures. They seem to be going out of scope when I try to use them if I capture by reference. If I capture by value, everything works.
The code that uses these callback functions expects a certain signature, so assuming I can't modify the code that's using these, I need to stick with capture variables instead of passing things as function arguments.
When is localVar being captured? Is it when the lambda is defined, or when it is called? Does the answer change depending on whether I capture by value or reference?
Here's a little example that I would like to understand:
#include <iostream>
#include <functional>
#include <vector>
int main(int argc, char **argv)
{
    int n(5);
    // make a vector of lambda functions
    std::vector<std::function<const int(void)> > fs;
    for(size_t i = 0; i < n; ++i){
        int localVar = i;
        auto my_lambda = [&localVar]()->int // change &localVar to localVar and it works
        {
            return localVar+100;
        };
        fs.push_back(my_lambda);
    }
    // use the vector of lambda functions
    for(size_t i = 0; i < n; ++i){
        std::cout << fs[i]() << "\n";
    }
    return 0;
}
The reference is captured when you create the lambda. The value of the referred object is never captured. When you call the lambda, it will use the reference to determine the referred object's value whenever you use it (like using any other reference). If you use the reference after the referred object ceases to exist, you are using a dangling reference, it's undefined behavior.
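A small sketch may make the timing concrete. Everything here lives in one scope, so the reference never dangles; the function name capture_timing_demo is just for illustration.

```cpp
#include <cassert>

// Usage sketch; call it from main().
void capture_timing_demo() {
    int value = 1;
    auto by_value = [value] { return value; };       // copies 1 right now
    auto by_reference = [&value] { return value; };  // binds to value right now

    value = 2;  // changed after both lambdas were created

    assert(by_value() == 1);      // still sees the copy taken at creation
    assert(by_reference() == 2);  // reads the referred object at call time
}
```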
In this case, auto my_lambda = [&localVar]()->int creates a lambda with a reference named localVar to the local variable localVar.
std::cout << fs[i]() << "\n"; calls one of the lambdas. However, when the lambda executes return localVar+100;, it's trying to use the reference localVar to the local variable localVar (local to the first for loop), but that local variable no longer exists. You have undefined behavior.
If you drop the ampersand and take localVar by value (auto my_lambda = [localVar]()->int), you will instead capture a copy of the value as it is at the moment the lambda is created. Since it's a copy, it doesn't matter what happens to the original localVar.
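For reference, a minimal sketch of the by-value version of the loop might look like this. The wrapper function make_callbacks is an assumption added so the loop can be exercised on its own; the callback signature is simplified to int().

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Builds n callbacks; each one owns a copy of its loop-local value.
std::vector<std::function<int()>> make_callbacks(std::size_t n) {
    std::vector<std::function<int()>> fs;
    for (std::size_t i = 0; i < n; ++i) {
        int localVar = static_cast<int>(i);
        // Captured by value: the copy lives inside the closure.
        fs.push_back([localVar] { return localVar + 100; });
    }
    return fs;
}

// Usage sketch; call it from main().
void callbacks_demo() {
    auto fs = make_callbacks(5);
    assert(fs.size() == 5);
    assert(fs[0]() == 100);
    assert(fs[4]() == 104);
}
```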
You can read about this at http://en.cppreference.com/w/cpp/language/lambda#Lambda_capture
"They seem to be going out of scope when I try to use them if I capture by reference"
That's right. You created a lambda that encapsulates a reference to a local variable. The variable went out of scope, leaving that reference dangling. This is no different to any other reference.
Capturing "happens" at the point where you define the lambda — that is the purpose of it! If it occurred later, when you call the lambda (which time?), the things you wanted to capture would be long gone, or at least unreachable.
Capturing allows us to "save" things that we can name now, for later. But if you capture by reference, you'd better ensure the thing referred-to still exists when you come to use that reference.
Watch out for weirdnesses like this, though.

How to return object more efficient without copying [duplicate]

When a function (callee) returns a quantity to the caller function, is it returned by value or by reference?
The thing is, I have written a function which builds a very large vector when called. I want to return this big vector to the calling function (in this case main()) by constant reference so I can do some further processing on it.
I was in doubt because I was told that when a C++ function returns and terminates, all the variables/memory associated with that function, get wiped clean.
struct node {
    string key;
    int pnum;
    node* ptr;
};
vector< vector<node> > myfun1(/*Some arguments*/)
{
    vector< vector<node> > v;
    /*Build the vector of vectors. Call it v*/
    return v;
}
int main(void)
{
    vector< vector<node> > a = myfun1(/* Some arguments */);
}
C++ functions can return by value, by reference (but don't return a local variable by reference), or by pointer (again, don't return a local by pointer).
When returning by value, the compiler can often do optimizations that make it equally as fast as returning by reference, without the problem of dangling references. These optimizations are commonly called "Return Value Optimization (RVO)" and/or "Named Return Value Optimization (NRVO)".
Another way is for the caller to provide an empty vector (by reference), and have the function fill it in. Then it doesn't need to return anything.
You definitely should read this blog posting: Want Speed? Pass by value.
By default, everything in C/C++ is passed by value, including return values, as in the example below:
T foo() ;
In C++, where the types are usually considered value-types (i.e. they behave like int or double types), the extra copy can be costly if the object's construction/destruction is not trivial.
With C++03
If you want to return by reference, or by pointer, you need to change the return type to either:
T & foo() ; // return a reference
T * foo() ; // return a pointer
but in both cases, you need to make sure the object returned still exists after the return. For example, if the object returned was allocated on stack in the body of the function, the object will be destroyed, and thus, its reference/pointer will be invalid.
If you can't guarantee the object still exists after the return, your only solution is to either:
accept the cost of an extra copy, and hope for a Return Value Optimization
pass instead a variable by reference as a parameter to the function, as in the following:
void foo(T & t) ;
This way, inside the function, you set the t value as necessary, and after the function returns, you have your result.
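A minimal sketch of this out-parameter style, assuming a made-up function fill_squares that writes into a vector owned by the caller:

```cpp
#include <cassert>
#include <vector>

// Hypothetical out-parameter version: the caller owns the vector,
// so nothing needs to outlive the function.
void fill_squares(std::vector<int>& out, int count) {
    out.clear();
    for (int i = 0; i < count; ++i) {
        out.push_back(i * i);
    }
}

// Usage sketch; call it from main().
void out_parameter_demo() {
    std::vector<int> result;
    fill_squares(result, 4);
    assert((result == std::vector<int>{0, 1, 4, 9}));
}
```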
With C++11
Now, if you have the chance to work with C++0x/C++11, that is, with a compiler that supports rvalue references and move semantics, and your object has a move constructor and move assignment operator (types from the standard library do), then the extra temporary copy is replaced by a cheap move or elided entirely, and you can keep the notation:
T foo() ;
Knowing that the compiler will not generate an unnecessary temporary value.
C++ can return either by reference or by value. If you want to return a reference, you must specify that as part of the return type:
std::vector<int> my_func(); // returns value
std::vector<int>& my_func(); // returns reference
std::vector<int> const& my_func(); // returns constant reference
All local (stack) variables created inside of a function are destroyed when the function returns. That means you should absolutely not return locals by reference or const reference (or pointers to them). If you return the vector by value it may be copied before the local is destroyed, which could be costly. (Certain types of optimizations called "return value optimization" can sometimes remove the copy, but that's out of the scope of this question. It's not always easy to tell whether the optimization will happen on a particular piece of code.)
If you want to "create" a large vector inside of a function and then return it without copying, the easiest way is to pass the vector in to the function as a reference parameter:
void fill_vector(std::vector<int> &vec) {
// fill "vec" and don't return anything...
}
Also note that in the recently ratified new version of the C++ standard (known as C++0x or C++11) returning a local vector by value from a function will not actually copy the vector, it will be efficiently moved into its new location. The code that does this looks identical to code from previous versions of C++ which could be forced to copy the vector. Check with your compiler to see whether it supports "move semantics" (the portion of the C++11 standard that makes this possible).
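One way to convince yourself, sketched under the assumption of a C++11 (or later) compiler and the default allocator: record the address of the vector's buffer inside the function and compare it after the return. The name build_large and the extra reference parameter exist only for this experiment.

```cpp
#include <cassert>
#include <vector>

// Hypothetical builder: remembers where the local vector's buffer lives,
// then returns the vector by value.
std::vector<int> build_large(const int*& buffer_seen_inside) {
    std::vector<int> result(1000000, 42);
    buffer_seen_inside = result.data();
    return result;  // elided (NRVO) or moved, not copied element by element
}

// Usage sketch; call it from main().
void return_by_value_demo() {
    const int* inside = nullptr;
    std::vector<int> v = build_large(inside);
    assert(v.size() == 1000000);
    // Same heap buffer as inside the function: the elements were not copied.
    assert(v.data() == inside);
}
```

Whether the compiler elides the return or moves the vector, the caller ends up owning the original buffer, which is why the two addresses match on common implementations.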
It's returned by whatever you declare the return type to be. vector<int> f(); and vector<int>& f(); return by value and reference respectively. However, it would be a grave error to return a reference to a local variable in the function as it will have been blown away when the function scope exits.
For good tips on how to efficiently return large vectors from a function, see this question (in fact this one is arguably a duplicate of that).
The function will return what you tell it to return. If you want to return a vector, then it will be copied to the variable held by the caller, unless you capture that result by const reference, in which case there is no need to copy it. There are optimizations that allow functions to avoid this extra copy construction by placing the result directly in the object that will hold the return value. You should read this before changing your design for performance:
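As a sketch of the const-reference case (make_names is a made-up function): binding the returned temporary to a const reference extends the temporary's lifetime to that of the reference, so no second object is created. With the optimizations mentioned above, a plain by-value variable is typically just as cheap.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical function returning a vector by value.
std::vector<std::string> make_names() {
    return {"ada", "grace"};
}

// Usage sketch; call it from main().
void const_ref_demo() {
    // The temporary returned by make_names() lives as long as names does.
    const std::vector<std::string>& names = make_names();
    assert(names.size() == 2);
    assert(names[1] == "grace");
}
```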
http://cpp-next.com/archive/2009/08/want-speed-pass-by-value/
Like most things in C++, the answer is "it depends on how you defined the function".
The default for the language is return-by-value. A simple declaration like "double f()" is always going to return the floating-point number by value. However, you CAN return values by pointer or by reference: you just add the extra symbols '&' or '*' to the return type:
// Return by pointer (*)
T* f();
// Return by reference (a single '&')
T& f();
However, these are ridiculously unsafe in many situations. If the value the function is returning was declared within the function, the returned reference or pointer will point to random garbage instead of valid data. Even if you can guarantee that the pointed-to data is still around, this kind of return is usually more trouble than it is worth given the optimizations all modern C++ compilers will do for you. The idiomatic, safe way to return something by reference is to pass a named reference in as a parameter:
// Return by 'parameter' (a logical reference return)
void f(T& output);
Now the output has a real name, and we KNOW it will survive the call because it has to exist before the call to 'f' is even made. This is a pattern you will see often in C++, especially for things like populating an STL std::vector. It's ugly, but until the advent of C++11 it was often faster than simply returning the vector by value. Now that return by value is both simpler and faster even for many complex types, you will probably not see many functions following the reference return parameter pattern outside of older libraries.
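All variables defined on the stack are destroyed on exit.
If you need to hand out a pointer or reference to an object that outlives the function, allocate it on the heap, which you do with the new keyword (or, preferably, with std::make_unique or std::make_shared).
Unlike in some other languages, classes and structs in C++ are passed around by value, just like the primitive types, unless you explicitly use a pointer or a reference.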

Local Variables Being Passed ( C++)

I have encountered a problem in my learning of C++, where a local variable in a function is being passed to the local variable with the same name in another function, both of these functions run in main().
When this is run,
#include <iostream>
using namespace std;
void next();
void again();
int main()
{
    int a = 2;
    cout << a << endl;
    next();
    again();
    return 0;
}
void next()
{
    int a = 5;
    cout << a << endl;
}
void again()
{
    int a;
    cout << a << endl;
}
it outputs:
2
5
5
I expected that again() would say null or 0 since 'a' is declared again there, and yet it seems to use the value that 'a' was assigned in next().
Why does next() pass the value of local variable 'a' to again() if 'a' is declared another time in again()?
http://en.cppreference.com/w/cpp/language/ub
You're correct, an uninitialized variable is a no-no. However, you are allowed to declare a variable and not initialize it until later. Memory is set aside to hold the integer, but what value happens to be in that memory until you do so can be anything at all. Some compilers will auto-initialize variables to junk values (to help you catch bugs), some will auto-initialize to default values, and some do nothing at all. C++ itself promises nothing, hence it's undefined behavior. In your case, with your simple program, it's easy enough to imagine how the compiler created assembly code that reused that exact same piece of memory without altering it. However, that's blind luck, and even in your simple program isn't guaranteed to happen. These types of bugs can actually be fairly insidious, so make it a rule: Be vigilant about uninitialized variables.
An uninitialized non-static local variable of *built-in type (phew! that was a mouthful) has an indeterminate value. Except for the char types, using that value yields formally Undefined Behavior, a.k.a. UB. Anything can happen, including the behavior that you see.
Apparently with your compiler and options, the stack area that was used for a in the call of next, was not used for something else until the call of again, where it was reused for the a in again, now with the same value as before.
But you cannot rely on that. With UB anything, or nothing, can happen.
* Or more generally of POD type, Plain Old Data. The standard's specification of this is somewhat complicated. In C++11 it starts with §8.5/11, “If no initializer is specified for an object, the object is default-initialized; if no initialization is performed, an object with automatic or dynamic storage duration has indeterminate value.”. Where “automatic … storage duration” includes the case of local non-static variable. And where the “no initialization” can occur in two ways via §8.5/6 that defines default initialization, namely either via a do-nothing default constructor, or via the object not being of class or array type.
This is completely coincidental and undefined behavior.
What's happened is that you have two functions called immediately after one another. Both will have more or less identical function prologs and both reserve a variable of exactly the same size on the stack.
Since there are no other variables in play and the stack is not modified between the calls, you just happen to end up with the local variable in the second function "landing" in the same place as the previous function's local variable.
Clearly, this is not good to rely upon. In fact, it's a perfect example of why you should always initialize variables!
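For completeness, a minimal sketch of the fix: give the local a value at the point of declaration. The function name read_initialized is invented for this example.

```cpp
#include <cassert>

// Both locals have a well-defined value before they are read.
int read_initialized() {
    int a = 0;  // explicit initializer
    int b{};    // value-initialization, also zero for int
    return a + b;
}

// Usage sketch; call it from main().
void initialization_demo() {
    assert(read_initialized() == 0);
}
```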