Today I encountered a very unintuitive behavior (for me, at least) in C++11 lambdas. The code in question is the following:
#include <stdio.h>
auto sum(int x) {
return [&x](int y) {
return x + y;
};
}
int main() {
int a = sum(2)(3);
printf("%d\n",a);
}
Instead of printing 5, this prints gibberish. Actually, at least in my version of GCC, if I turn on the -O2 optimization flag, it actually prints 5. Since the output depends on the optimization level of the compiler, it is undefined behavior. After a while, I think I understood what is happening.
When the function sum is called, a stack variable corresponding to the argument x is set to 2, then the function sum returns, and this stack variable might be overwritten by anything that the compiler needs to put there to execute following code, and by the time the lambda eventually gets executed, the place where x was no longer holds 2, and the program adds 3 to an arbitrary integer.
Is there any elegant way to do currying in C++ guaranteeing that the variable gets captured correctly?
int x has a limited lifetime. References to automatic storage variables (what you call "the stack") are only valid over the variable's lifetime. In this case, only until the end of the stack frame (the scope) where the variable exists, or the function for function arguments.
[&] captures any mentioned ("local") variable by reference, except this (which is captured by value if used or implicitly used). [=] captures any mentioned variable by value. [x] would capture x explicitly, and [&x] by reference explicitly. In C++17, [*this] also works.
There is also [x=std::move(x)], or [blah=expression].
In general, if the lambda will outlive the current scope don't use [&]: be explicit about what you capture.
Related
I searched everywhere but could not find an answer to my question. I am trying to write an example that shows that capturing a local variable of the enclosing function by reference is dangerous because it may not exist anymore when it is actually referenced. Here's my example:
#include <iostream>
std::function<int (int)> test2(int l) {
int k = 10;
return [&] (int y) { return ++k + 100; };
}
void test(std::function<int (int)> k) {
std::cout << k(100);
}
int main() {
test(test2(100));
std::function<int (int)> func = test2(100);
test(func);
return 0;
}
I tried to reproduce stack corruption from trying to access and modify a local variable that doesn't exist on the stack frame by returning a lambda function from test2 that captures a local variable k and modifies it.
std::function<int (int)> func = test2(100);
test(func);
prints out a garbage value which indicates something went wrong as expected. However,
test(test2(100));
prints out "111". This is confusing to me as I thought when test2(100) returns a lambda function of type std::function, the stack frame for test2 will be gone, and when test is invoked, it should not be able to access the value of k. I'd appreciate any ideas or keywords I can use to search for answers.
I have run your test on my machine and the results are as expected total garbage in both cases. Having a correct answer once in a while in this capacity is very misleading. A dangling reference or pointer might occasionally point out to the same value as long as the pointed memory hasn't been occupied by a different value yet.
In a nutshell, The C++ lambdas do not extend the lifetimes of captured references/pointers shall their reference stack unwind. Same thing applies to capturing the 'this' pointer of a class. If the class goes out of scope, the 'this->' will result in a completely undefined behaviour.
Today I encountered a very unintuitive behavior (for me, at least) in C++11 lambdas. The code in question is the following:
#include <stdio.h>
auto sum(int x) {
return [&x](int y) {
return x + y;
};
}
int main() {
int a = sum(2)(3);
printf("%d\n",a);
}
Instead of printing 5, this prints gibberish. Actually, at least in my version of GCC, if I turn on the -O2 optimization flag, it actually prints 5. Since the output depends on the optimization level of the compiler, it is undefined behavior. After a while, I think I understood what is happening.
When the function sum is called, a stack variable corresponding to the argument x is set to 2, then the function sum returns, and this stack variable might be overwritten by anything that the compiler needs to put there to execute following code, and by the time the lambda eventually gets executed, the place where x was no longer holds 2, and the program adds 3 to an arbitrary integer.
Is there any elegant way to do currying in C++ guaranteeing that the variable gets captured correctly?
int x has a limited lifetime. References to automatic storage variables (what you call "the stack") are only valid over the variable's lifetime. In this case, only until the end of the stack frame (the scope) where the variable exists, or the function for function arguments.
[&] captures any mentioned ("local") variable by reference, except this (which is captured by value if used or implicitly used). [=] captures any mentioned variable by value. [x] would capture x explicitly, and [&x] by reference explicitly. In C++17, [*this] also works.
There is also [x=std::move(x)], or [blah=expression].
In general, if the lambda will outlive the current scope don't use [&]: be explicit about what you capture.
#include<iostream>
using namespace std;
void fun(int a) {
int x;
cout << x << endl;
x = a;
}
int main() {
fun(12);
fun(1);
return 0;
}
The output of this code is as follows:
178293 //garbage value
12
why are we getting 12 and not garbage value instead??
why are we getting 12 and not garbage value instead??
In theory, the value of x could be anything. However, what's happening in practice is that two calls to fun one after the other is responsible for the previous value of x to be still on the stack frame.
Let's say the stack frame is structured as below:
arguments
return value
local variables
In your case,
Memory used for arguments is equal to sizeof(int).
The compiler may omit using any memory for the return value since the return type is void.
Memory used for local variables is equal to sizeof(int).
When the function call is made the first time, the value in the arguments part is set to 12. The value in the local variables is garbage, as you noticed. However, before you return from the function, you set the value of the local variable to 12.
The second time the function is made, the the value of the argument is set to 1. The value in the local variable is still left over from the previous call. Hence, it is still 12. If you call the function a third time, you are likely to see the value 1 in the local variable.
Anyway, that's one plausible explanation. Once again, remember that this is undefined behavior. Don't count on any specific behavior. The compiler could decide to scrub the stack frame before using it. The compiler could decide to scrub the stack frame immediately after it is used. The compiler could do whatever it wants to do with the stack frame after it is used. If there is another call between the calls to fun, you will most likely get completely different values.
You have not initialized the value x when printing it. Reading from uninitialized memory is UB, i.e., there are no guarantees as to what will happen. It could output a random number, or invoke an unlikely combination of bits that will do something unexpected.
Reading an uninitialized integer is UNDEFINED BEHAVIOR, that means that it can do literally anything, can print anything. It can format your hard drive or collapse the observable universe! Basically I dont know compiler implementations that do so but theoretically they can!
I have encountered a problem in my learning of C++, where a local variable in a function is being passed to the local variable with the same name in another function, both of these functions run in main().
When this is run,
#include <iostream>
using namespace std;
void next();
void again();
int main()
{
int a = 2;
cout << a << endl;
next();
again();
return 0;
}
void next()
{
int a = 5;
cout << a << endl;
}
void again()
{
int a;
cout << a << endl;
}
it outputs:
2
5
5
I expected that again() would say null or 0 since 'a' is declared again there, and yet it seems to use the value that 'a' was assigned in next().
Why does next() pass the value of local variable 'a' to again() if 'a' is declared another time in again()?
http://en.cppreference.com/w/cpp/language/ub
You're correct, an uninitialized variable is a no-no. However, you are allowed to declare a variable and not initialize it until later. Memory is set aside to hold the integer, but what value happens to be in that memory until you do so can be anything at all. Some compilers will auto-initialize variables to junk values (to help you catch bugs), some will auto-initialize to default values, and some do nothing at all. C++ itself promises nothing, hence it's undefined behavior. In your case, with your simple program, it's easy enough to imagine how the compiler created assembly code that reused that exact same piece of memory without altering it. However, that's blind luck, and even in your simple program isn't guaranteed to happen. These types of bugs can actually be fairly insidious, so make it a rule: Be vigilant about uninitialized variables.
An uninitialized non-static local variable of *built-in type (phew! that was a mouthful) has an indeterminate value. Except for the char types, using that value yields formally Undefined Behavior, a.k.a. UB. Anything can happen, including the behavior that you see.
Apparently with your compiler and options, the stack area that was used for a in the call of next, was not used for something else until the call of again, where it was reused for the a in again, now with the same value as before.
But you cannot rely on that. With UB anything, or nothing, can happen.
* Or more generally of POD type, Plain Old Data. The standard's specification of this is somewhat complicated. In C++11 it starts with §8.5/11, “If no initializer is specified for an object, the object is default-initialized; if no initialization is performed, an object with automatic or dynamic storage duration has indeterminate value.”. Where “automatic … storage duration” includes the case of local non-static variable. And where the “no initialization” can occur in two ways via §8.5/6 that defines default initialization, namely either via a do-nothing default constructor, or via the object not being of class or array type.
This is completely coincidental and undefined behavior.
What's happened is that you have two functions called immediately after one another. Both will have more or less identical function prologs and both reserve a variable of exactly the same size on the stack.
Since there are no other variables in play and the stack is not modified between the calls, you just happen to end up with the local variable in the second function "landing" in the same place as the previous function's local variable.
Clearly, this is not good to rely upon. In fact, it's a perfect example of why you should always initialize variables!
I am passing my local-variables by reference to two lambda. I call these lambdas outside of the function scope. Is this undefined ?
std::pair<std::function<int()>, std::function<int()>> addSome() {
int a = 0, b = 0;
return std::make_pair([&a,&b] {
++a; ++b;
return a+b;
}, [&a, &b] {
return a;
});
}
int main() {
auto f = addSome();
std::cout << f.first() << " " << f.second();
return 0;
}
If it is not, however, changes in one lambda are not reflected in other lambda.
Am i misunderstanding pass-by-reference in context of lambdas ?
I am writing to the variables and it seems to be working fine with no runtime-errors with output
2 0. If it works then i would expect output 2 1.
Yes, this causes undefined behavior. The lambdas will reference stack-allocated objects that have gone out of scope. (Technically, as I understand it, the behavior is defined until the lambdas access a and/or b. If you never invoke the returned lambdas then there is no UB.)
This is undefined behavior the same way that it's undefined behavior to return a reference to a stack-allocated local and then use that reference after the local goes out of scope, except that in this case it's being obfuscated a bit by the lambda.
Further, note that the order in which the lambdas are invoked is unspecified -- the compiler is free to invoke f.second() before f.first() because both are part of the same full-expression. Therefore, even if we fix the undefined behavior caused by using references to destroyed objects, both 2 0 and 2 1 are still valid outputs from this program, and which you get depends on the order in which your compiler decides to execute the lambdas. Note that this is not undefined behavior, because the compiler can't do anything at all, rather it simply has some freedom in deciding the order in which to do some things.
(Keep in mind that << in your main() function is invoking a custom operator<< function, and the order in which function arguments are evaluated is unspecified. Compilers are free to emit code that evaluates all of the function arguments within the same full-expression in any order, with the constraint that all arguments to a function must be evaluated before that function is invoked.)
To fix the first problem, use std::shared_ptr to create a reference-counted object. Capture this shared pointer by value, and the lambdas will keep the pointed-to object alive as long as they (and any copies thereof) exist. This heap-allocated object is where we will store the shared state of a and b.
To fix the second problem, evaluate each lambda in a separate statement.
Here is your code rewritten with the undefined behavior fixed, and with f.first() guaranteed to be invoked before f.second():
std::pair<std::function<int()>, std::function<int()>> addSome() {
// We store the "a" and "b" ints instead in a shared_ptr containing a pair.
auto numbers = std::make_shared<std::pair<int, int>>(0, 0);
// a becomes numbers->first
// b becomes numbers->second
// And we capture the shared_ptr by value.
return std::make_pair(
[numbers] {
++numbers->first;
++numbers->second;
return numbers->first + numbers->second;
},
[numbers] {
return numbers->first;
}
);
}
int main() {
auto f = addSome();
// We break apart the output into two statements to guarantee that f.first()
// is evaluated prior to f.second().
std::cout << f.first();
std::cout << " " << f.second();
return 0;
}
(See it run.)
Unfortunately C++ lambdas can capture by reference but don't solve the "upwards funarg problem".
Doing so would require allocating captured locals in "cells" and garbage collection or reference counting for deallocation. C++ is not doing it and unfortunately this make C++ lambdas a lot less useful and more dangerous than in other languages like Lisp, Python or Javascript.
More specifically in my experience you should avoid at all costs implicit capture by reference (i.e. using the [&](…){…} form) for lambda objects that survive the local scope because that's a recipe for random segfaults later during maintenance.
Always plan carefully about what to capture and how and about the lifetime of captured references.
Of course it's safe to capture everything by reference with [&] if all you are doing is simply using the lambda in the same scope to pass code for example to algorithms like std::sort without having to define a named comparator function outside of the function or as locally used utility functions (I find this use very readable and nice because you can get a lot of context implicitly and there is no need to 1. make up a global name for something that will never be reused anywhere else, 2. pass a lot of context or creating extra classes just for that context).
An approach that can work sometimes is capturing by value a shared_ptr to a heap-allocated state. This is basically implementing by hand what Python does automatically (but pay attention to reference cycles to avoid memory leaks: Python has a garbage collector, C++ doesn't).
When you are going out of scope, make a copy of the locals you use with capture by value ([=]):
MyType func(void)
{
int x = 5;
//When called, local x will no longer be in scope; so, use capture by value.
return ([=] {
x += 2;
});
}
When you are in the same scope, better to use capture by reference ([&]):
void func(void)
{
int x = 5;
//When called, local x will still be in scope; safe to use capture by reference.
([&] {
x += 2;
})(); //Lambda is immediately invoked here, in the same scope as x, with ().
}