I searched everywhere but could not find an answer to my question. I am trying to write an example that shows that capturing a local variable of the enclosing function by reference is dangerous because it may not exist anymore when it is actually referenced. Here's my example:
#include <functional>
#include <iostream>
std::function<int (int)> test2(int l) {
int k = 10;
return [&] (int y) { return ++k + 100; };
}
void test(std::function<int (int)> k) {
std::cout << k(100);
}
int main() {
test(test2(100));
std::function<int (int)> func = test2(100);
test(func);
return 0;
}
I tried to reproduce stack corruption from trying to access and modify a local variable that doesn't exist on the stack frame by returning a lambda function from test2 that captures a local variable k and modifies it.
std::function<int (int)> func = test2(100);
test(func);
prints out a garbage value which indicates something went wrong as expected. However,
test(test2(100));
prints out "111". This is confusing to me as I thought when test2(100) returns a lambda function of type std::function, the stack frame for test2 will be gone, and when test is invoked, it should not be able to access the value of k. I'd appreciate any ideas or keywords I can use to search for answers.
I ran your test on my machine and, as expected, the results were total garbage in both cases. Getting a correct-looking answer once in a while in this situation is very misleading. A dangling reference or pointer may occasionally still yield the old value, as long as the memory it refers to hasn't been overwritten by something else yet.
In a nutshell, C++ lambdas do not extend the lifetime of objects captured by reference or by pointer once the enclosing stack frame unwinds. The same applies to capturing the 'this' pointer of a class: if the object has been destroyed, any access through 'this->' is undefined behaviour.
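To make the point about 'this' concrete, here is a minimal sketch. The Widget class and the function names are hypothetical and only serve as an illustration; the init-capture syntax assumes C++14 or later.

```cpp
#include <functional>

// Hypothetical class used only for illustration.
struct Widget {
    int value = 42;

    // Risky if the returned callable outlives the Widget:
    // [this] stores only a pointer to the object.
    std::function<int()> getter_by_this() {
        return [this] { return value; };
    }

    // Safer: the init-capture copies the member into the closure,
    // so the closure does not depend on the Widget's lifetime.
    std::function<int()> getter_by_copy() {
        return [v = value] { return v; };
    }
};

// Returns a callable that remains valid after the local Widget is destroyed.
std::function<int()> make_safe_getter() {
    Widget w;
    return w.getter_by_copy();  // w dies here; the closure keeps its own copy
}
```

Calling the result of getter_by_this() is only valid while the Widget it came from is still alive; the result of make_safe_getter() can be called at any time.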
Related
Today I encountered a very unintuitive behavior (for me, at least) in C++11 lambdas. The code in question is the following:
#include <stdio.h>
auto sum(int x) {
return [&x](int y) {
return x + y;
};
}
int main() {
int a = sum(2)(3);
printf("%d\n",a);
}
Instead of printing 5, this prints gibberish. Actually, at least in my version of GCC, if I turn on the -O2 optimization flag, it actually prints 5. Since the output depends on the optimization level of the compiler, it is undefined behavior. After a while, I think I understood what is happening.
When the function sum is called, a stack variable corresponding to the argument x is set to 2, then the function sum returns, and this stack variable might be overwritten by anything that the compiler needs to put there to execute following code, and by the time the lambda eventually gets executed, the place where x was no longer holds 2, and the program adds 3 to an arbitrary integer.
Is there any elegant way to do currying in C++ guaranteeing that the variable gets captured correctly?
int x has a limited lifetime. References to automatic storage variables (what you call "the stack") are only valid during the variable's lifetime. For a local variable that means until the end of the scope in which it is declared; for a function argument, until the function returns.
[&] captures any mentioned ("local") variable by reference, except this (which is captured by value if used or implicitly used). [=] captures any mentioned variable by value. [x] captures x explicitly by value, and [&x] captures it explicitly by reference. In C++17, [*this] also works.
There is also [x=std::move(x)], or [blah=expression].
In general, if the lambda will outlive the current scope don't use [&]: be explicit about what you capture.
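As a minimal sketch of that advice applied to the currying question (assuming C++14 or later for return type deduction and init-captures; the greeter function is a hypothetical addition to show a moved-in capture):

```cpp
#include <string>
#include <utility>

// Capture x by value: the closure owns its own copy of x,
// so it stays valid after sum() has returned.
auto sum(int x) {
    return [x](int y) { return x + y; };
}

// Hypothetical example of an init-capture: the string is moved
// into the closure instead of being referenced or copied.
auto greeter(std::string greeting) {
    return [g = std::move(greeting)](const std::string& name) {
        return g + ", " + name;
    };
}
```

With this version, sum(2)(3) no longer reads a destroyed parameter, because the closure carries its own copy of x.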
I am making a std::vector of callback std::functions, and I'm having a little trouble understanding the captures. They seem to be going out of scope when I try to use them if I capture by reference. If I capture by value, everything works.
The code that uses these callback functions expects a certain signature, so assuming I can't modify the code that's using these, I need to stick with capture variables instead of passing things as function arguments.
When is localVar being captured? Is it when the lambda is defined, or when it is called? Does the answer change depending on whether I capture by value or reference?
Here's a little example that I would like to understand:
#include <iostream>
#include <functional>
#include <vector>
int main(int argc, char **argv)
{
int n(5);
// make a vector of lambda functions
std::vector<std::function<const int(void)> > fs;
for(size_t i = 0; i < n; ++i){
int localVar = i;
auto my_lambda = [&localVar]()->int // change &localVar to localVar and it works
{
return localVar+100;
};
fs.push_back(my_lambda);
}
// use the vector of lambda functions
for(size_t i = 0; i < n; ++i){
std::cout << fs[i]() << "\n";
}
return 0;
}
The reference is captured when you create the lambda. The value of the referred object is never captured. When you call the lambda, it will use the reference to determine the referred object's value whenever you use it (like using any other reference). If you use the reference after the referred object ceases to exist, you are using a dangling reference, which is undefined behavior.
In this case, auto my_lambda = [&localVar]()->int creates a lambda with a reference named localVar to the local variable localVar.
std::cout << fs[i]() << "\n"; calls one of the lambdas. However, when the lambda executes return localVar+100;, it tries to use the reference localVar to the local variable localVar (local to the first for loop), but that local variable no longer exists. You have undefined behavior.
If you drop the ampersand and take localVar by value (auto my_lambda = [localVar]()->int), you will instead capture a copy of the value as it is at the moment the lambda is created. Since it's a copy, it doesn't matter what happens to the original localVar.
You can read about this at http://en.cppreference.com/w/cpp/language/lambda#Lambda_capture
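A minimal sketch of the by-value variant, wrapped in a hypothetical helper function make_callbacks (not part of the question's code) so that the callbacks clearly outlive the loop that created them:

```cpp
#include <functional>
#include <vector>

// Builds n callbacks; each one owns a copy of its loop-local value.
std::vector<std::function<int()>> make_callbacks(int n) {
    std::vector<std::function<int()>> fs;
    for (int i = 0; i < n; ++i) {
        int localVar = i;
        // [localVar] copies the current value into the closure, so the
        // closure stays valid after this loop iteration ends.
        fs.push_back([localVar] { return localVar + 100; });
    }
    return fs;
}
```

Each callback keeps the expected signature (no parameters), and the state it needs travels inside the closure.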
"They seem to be going out of scope when I try to use them if I capture by reference"
That's right. You created a lambda that encapsulates a reference to a local variable. The variable went out of scope, leaving that reference dangling. This is no different to any other reference.
Capturing "happens" at the point where you define the lambda — that is the purpose of it! If it occurred later, when you call the lambda (which time?), the things you wanted to capture would be long gone, or at least unreachable.
Capturing allows us to "save" things that we can name now, for later. But if you capture by reference, you'd better ensure the thing referred-to still exists when you come to use that reference.
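The timing can be seen in a small sketch (the function name capture_timing is hypothetical). Both lambdas are created while n is 1, and n is changed afterwards while it is still alive, so nothing here dangles:

```cpp
#include <utility>

// Returns {value seen by the by-value closure,
//          value seen by the by-reference closure}.
std::pair<int, int> capture_timing() {
    int n = 1;
    auto by_value = [n] { return n; };       // copies n now, while n == 1
    auto by_reference = [&n] { return n; };  // binds to n; reads it at call time
    n = 2;                                   // n is still in scope here
    return {by_value(), by_reference()};
}
```

The by-value closure keeps the snapshot taken at definition time, while the by-reference closure observes the later assignment.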
Watch out for weirdnesses like this, though.
I am capturing my local variables by reference in two lambdas. I call these lambdas outside of the function's scope. Is this undefined?
#include <functional>
#include <iostream>
#include <utility>
std::pair<std::function<int()>, std::function<int()>> addSome() {
int a = 0, b = 0;
return std::make_pair([&a,&b] {
++a; ++b;
return a+b;
}, [&a, &b] {
return a;
});
}
int main() {
auto f = addSome();
std::cout << f.first() << " " << f.second();
return 0;
}
If it is not undefined, why are changes made in one lambda not reflected in the other lambda?
Am I misunderstanding pass-by-reference in the context of lambdas?
I am writing to the variables and it seems to work fine, with no runtime errors and the output 2 0. If it worked, I would expect the output 2 1.
Yes, this causes undefined behavior. The lambdas will reference stack-allocated objects that have gone out of scope. (Technically, as I understand it, the behavior is defined until the lambdas access a and/or b. If you never invoke the returned lambdas then there is no UB.)
This is undefined behavior the same way that it's undefined behavior to return a reference to a stack-allocated local and then use that reference after the local goes out of scope, except that in this case it's being obfuscated a bit by the lambda.
Further, note that before C++17 the order in which the lambdas are invoked is unspecified: the compiler is free to invoke f.second() before f.first() because both are part of the same full-expression. Therefore, even if we fix the undefined behavior caused by using references to destroyed objects, both 2 0 and 2 1 are valid outputs from this program under those rules, and which one you get depends on the order in which your compiler decides to execute the lambdas. This is not undefined behavior: the compiler cannot do just anything, it simply has some freedom in deciding the order in which to do some things. (Since C++17, the operands of << are evaluated left to right, so f.first() is called before f.second().)
(Keep in mind that << in your main() function invokes an overloaded operator<< function, and before C++17 the order in which its arguments are evaluated is unspecified. Compilers are free to emit code that evaluates all of the function arguments within the same full-expression in any order, with the constraint that all arguments to a function must be evaluated before that function is invoked.)
To fix the first problem, use std::shared_ptr to create a reference-counted object. Capture this shared pointer by value, and the lambdas will keep the pointed-to object alive as long as they (and any copies thereof) exist. This heap-allocated object is where we will store the shared state of a and b.
To fix the second problem, evaluate each lambda in a separate statement.
Here is your code rewritten with the undefined behavior fixed, and with f.first() guaranteed to be invoked before f.second():
#include <functional>
#include <iostream>
#include <memory>
#include <utility>
std::pair<std::function<int()>, std::function<int()>> addSome() {
// We store the "a" and "b" ints instead in a shared_ptr containing a pair.
auto numbers = std::make_shared<std::pair<int, int>>(0, 0);
// a becomes numbers->first
// b becomes numbers->second
// And we capture the shared_ptr by value.
return std::make_pair(
[numbers] {
++numbers->first;
++numbers->second;
return numbers->first + numbers->second;
},
[numbers] {
return numbers->first;
}
);
}
int main() {
auto f = addSome();
// We break apart the output into two statements to guarantee that f.first()
// is evaluated prior to f.second().
std::cout << f.first();
std::cout << " " << f.second();
return 0;
}
Unfortunately C++ lambdas can capture by reference but don't solve the "upwards funarg problem".
Doing so would require allocating captured locals in "cells" and using garbage collection or reference counting for deallocation. C++ does not do this, and unfortunately this makes C++ lambdas a lot less useful and more dangerous than in other languages like Lisp, Python or Javascript.
More specifically in my experience you should avoid at all costs implicit capture by reference (i.e. using the [&](…){…} form) for lambda objects that survive the local scope because that's a recipe for random segfaults later during maintenance.
Always plan carefully about what to capture and how and about the lifetime of captured references.
Of course it's safe to capture everything by reference with [&] if all you are doing is using the lambda in the same scope, for example to pass code to algorithms like std::sort without having to define a named comparator outside of the function, or as a locally used utility function. I find this use very readable and nice because you get a lot of context implicitly and there is no need to 1. make up a global name for something that will never be reused anywhere else, or 2. pass a lot of context or create extra classes just for that context.
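A minimal sketch of that safe, same-scope use (the function sort_by_distance and its parameters are hypothetical):

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// Sorts values by their distance from pivot, closest first.
std::vector<int> sort_by_distance(std::vector<int> values, int pivot) {
    // [&] is fine here: the lambda is only used inside std::sort,
    // which finishes before pivot and values go out of scope.
    std::sort(values.begin(), values.end(), [&](int a, int b) {
        return std::abs(a - pivot) < std::abs(b - pivot);
    });
    return values;
}
```

The comparator never escapes the function, so the references it holds cannot dangle.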
An approach that can work sometimes is capturing by value a shared_ptr to a heap-allocated state. This is basically implementing by hand what Python does automatically (but pay attention to reference cycles to avoid memory leaks: Python has a garbage collector, C++ doesn't).
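One way such a reference cycle can arise, and one possible way around it, is sketched below. The Node type and make_node function are hypothetical; the sketch assumes the callback is stored inside the very object it refers to, in which case capturing a std::weak_ptr instead of the std::shared_ptr avoids the cycle.

```cpp
#include <functional>
#include <memory>

// Hypothetical object that stores a callback referring back to itself.
struct Node {
    int value = 0;
    std::function<int()> on_poll;
};

std::shared_ptr<Node> make_node() {
    auto node = std::make_shared<Node>();
    // Capturing node itself by value would create a cycle
    // (node -> on_poll -> node) and the Node would never be freed.
    std::weak_ptr<Node> weak = node;
    node->on_poll = [weak] {
        if (auto self = weak.lock()) {
            return ++self->value;  // object still alive: update the state
        }
        return -1;                 // object already destroyed
    };
    return node;
}
```

When the closure is not stored inside the shared state itself, capturing the shared_ptr by value as described above is the simpler choice.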
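When the lambda will outlive the current scope, make a copy of the locals you use with capture by value ([=]):
#include <functional>
std::function<void()> func()
{
int x = 5;
// When called, local x will no longer be in scope; so, use capture by value.
// `mutable` is required because the lambda modifies its own copy of x.
return [=]() mutable {
x += 2;
};
}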
When the lambda is only used within the same scope, capture by reference ([&]) is fine:
void func(void)
{
int x = 5;
//When called, local x will still be in scope; safe to use capture by reference.
([&] {
x += 2;
})(); //Lambda is immediately invoked here, in the same scope as x, with ().
}
Even knowing what happens as a result of the following snippets, it would be helpful to understand how it happens. Four questions follow.
Given:
int& foo()
{
int i = 1;
return i;
}
And knowing that, in the following, the returned reference to the local named i is dereferenced into a temporary that is assigned to intVal, and that the local i disappears at the end of foo():
int intVal = foo();
First question - in the following, the right-hand side of the expression is the same as above, so is this a case where the compiler sees the left-hand side and, based on context, knows not to dereference the returned reference, and instead creates a new reference that is initialized with it?
Second question - and this alone makes the local i stick around while intRef is in scope?
int& intRef = foo();
Third question - below, intPtr gets the address of local i. So, is the compiler using the context of the assignment and deciding not to dereference to get a value before taking the address of the reference (rather than, say, taking the address of a temporary containing the dereferenced value)?
Fourth question - does local i stick around while intPtr is in scope?
int* intPtr = &foo();
Nope, none of those will extend the lifetime of the local variable. Nothing in C++ will have that effect. Local objects in C++ live until the end of the scope in which they are declared, end of story.
The only case which, at first glance, seems to follow different rules is this:
int foo() {
return 42;
}
int main() {
const int& i = foo();
// here, `i` is a reference to the temporary that was returned from `foo`, and whose lifetime has been extended
}
That is, a const reference can extend the lifetime of a temporary bound to it.
But that requires the function to return a value, not a reference, and the caller to bind the return value to a const reference, neither of which is done in your code.
In no case (not intVal, not intRef, and not intPtr) does i necessarily stick around after foo returns.
The stack location previously occupied by i may or may not be overwritten at any time after foo returns.
For example (on some CPUs and O/Ses), it is likely to be changed by any subsequent call to a subroutine, and may be changed if a hardware interrupt occurs.
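For contrast, here is a minimal sketch of a few ways to hand the value out of a function without leaving a dangling reference. The function names are hypothetical, and std::make_unique assumes C++14 or later.

```cpp
#include <memory>

// Return by value: the caller receives its own copy.
int foo_by_value() {
    int i = 1;
    return i;
}

// Heap allocation: the object lives until the unique_ptr is destroyed.
std::unique_ptr<int> foo_on_heap() {
    return std::make_unique<int>(1);
}

// Static storage: the object lives for the rest of the program, so a
// reference to it stays valid (all callers share the one instance).
int& foo_static() {
    static int i = 1;
    return i;
}
```

Returning by value is usually the simplest choice for something as small as an int; the other two only make sense when the caller really needs to refer to the same object.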