Where are lambda captured variables stored? - c++

How is it possible that this example works? It prints 6:
#include <iostream>
#include <functional>
using namespace std;
void scopeIt(std::function<int()> &fun) {
int val = 6;
fun = [=](){return val;}; //<-- this
}
int main() {
std::function<int()> fun;
scopeIt(fun);
cout << fun();
return 0;
}
Where is the value 6 stored after scopeIt is done being called? If I replace the [=] with a [&], it prints 0 instead of 6.

It is stored within the closure, which - in your code - is then stored within std::function<int()> &fun.
A lambda generates what's equivalent to an instance of a compiler generated class.
This code:
[=](){return val;}
Generates what's effectively equivalent to this... this would be the "closure":
struct UNNAMED_TYPE
{
    UNNAMED_TYPE(int val) : val(val) {}
    int val;
    // Above, your [=] "equals/copy" syntax means "find what variables
    // are needed by the lambda and copy them into this object"
    int operator() () const { return val; }
    // Above, here is the code you provided; the call operator is const,
    // which is why the copy cannot be modified unless the lambda is "mutable"
} closure(val);
// ^^^ note that this DECLARED type is being INSTANTIATED (constructed) too!!

Lambdas in C++ are really just "anonymous" struct functors. So when you write this:
int val = 6;
fun = [=](){return val;};
What the compiler is translating that into is this:
int val = 6;
struct __anonymous_struct_line_8 {
int val;
__anonymous_struct_line_8(int v) : val(v) {}
int operator() () const {
return val; // returns this->val
}
};
fun = __anonymous_struct_line_8(val);
Then, std::function stores that functor via type erasure.
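A small sketch of that equivalence, assuming C++11 or later. ValFunctor is a hypothetical hand-written stand-in for the compiler-generated class, and scopeItManual/scopeItLambda are illustrative variants of the question's scopeIt. Both store a callable into the std::function from inside a function whose local val is gone by the time the callable runs; each callable carries its own copy of val, so calling either one afterwards yields 6.

```cpp
#include <functional>

// Hypothetical hand-written stand-in for the compiler-generated closure class.
struct ValFunctor {
    int val;                                  // the captured copy lives here
    explicit ValFunctor(int v) : val(v) {}
    int operator()() const { return val; }    // returns this->val
};

// Stores a hand-written functor built from a local that dies on return.
void scopeItManual(std::function<int()>& fun) {
    int val = 6;
    fun = ValFunctor(val);                    // std::function keeps its own copy of the functor
}

// Same thing, written as a lambda.
void scopeItLambda(std::function<int()>& fun) {
    int val = 6;
    fun = [=]() { return val; };              // the closure copies val
}
```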
When you use [&] instead of [=], it changes the struct to:
struct __anonymous_struct_line_8 {
int& val; // Notice this is a reference now!
...
So now the object stores a reference to the function's val object, which becomes a dangling (invalid) reference after the function exits (and you get undefined behavior).
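Capturing by reference is not wrong in itself; the problem above is only that val dies before the closure is called. The sketch below (snapshot_vs_live is a purely illustrative name) keeps the referenced variable alive for the whole lifetime of both lambdas and shows the practical difference: the [=] closure holds a snapshot taken at creation, while the [&] closure reads the current value through its reference.

```cpp
#include <utility>

// Returns {value seen by the by-copy lambda, value seen by the by-reference lambda}.
std::pair<int, int> snapshot_vs_live() {
    int val = 6;
    auto by_copy = [=]() { return val; };  // closure stores its own int
    auto by_ref  = [&]() { return val; };  // closure refers to val, no copy
    val = 7;                               // modify after both lambdas were created
    // val is still alive here, so calling by_ref is well-defined.
    return {by_copy(), by_ref()};
}
```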

The so-called closure type (which is the class type of the lambda expression) has members for each captured entity. Those members are objects for capture by value, and references for capture by reference. They are initialized with the captured entities and live independently within the closure object (the particular object of closure type that this lambda designates).
The unnamed member that corresponds to the value capture of val is initialized with val and accessed from the inside of the closure type's operator(), which is fine. The closure object may easily have been copied or moved multiple times until that happens, and that's fine too - closure types have implicitly defined move and copy constructors just as normal classes do.
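To illustrate the point about copies, here is a sketch (C++14, because it uses an init-capture; copy_demo is a hypothetical name) in which a mutable counter lambda is copied, called, and stored into a std::function. Every copy ends up with its own independent captured n.

```cpp
#include <functional>
#include <tuple>

// Returns {original after three calls, copy taken before any call, std::function copy}.
std::tuple<int, int, int> copy_demo() {
    auto counter = [n = 0]() mutable { return ++n; };  // n lives inside the closure
    auto copy = counter;                    // implicitly defined copy constructor: copies n (0)
    counter();                              // counter's n == 1
    counter();                              // counter's n == 2
    std::function<int()> stored = counter;  // std::function keeps its own copy (n == 2)
    int from_original = counter();          // 3
    int from_copy = copy();                 // 1
    int from_stored = stored();             // 3
    return std::make_tuple(from_original, from_copy, from_stored);
}
```

copy_demo() returns (3, 1, 3): the copy made before the calls still starts from 0, while the std::function copy picked up n == 2.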
However, when capturing by reference, the lvalue-to-rvalue conversion that is implicitly performed when calling fun in main induces undefined behavior as the object which the reference member referred to has already been destroyed - i.e. we are using a dangling reference.

The value of a lambda expression is an object of class type, and:
"For each entity captured by copy, an unnamed non-static data member is declared in the closure type."
([expr.prim.lambda]/14 in C++11)
That is, the object created by the lambda
[=](){return val;}
actually contains a non-static member of int type, whose value is 6, and this object is copied into the std::function object.

Related

Forwarding additional arguments via lambda function

Here is the minimal reproducible code,
#include <iostream>
#include <string>
void bar(std::string s, int x){
std::cout <<std::endl<< __FUNCTION__<<":"<<s<<" "<<x;
}
using f_ptr = void(*)(std::string);
void foo(f_ptr ptr){
ptr("Hello world");
}
template<typename T> void fun(T f){
static int x;
std::cout <<std::endl<<x++<<std::endl;
f("Hello World");
}
int main()
{
//case:1
f_ptr ptr1 = [](std::string s){bar(s,10);};
// foo(ptr1);
//case:2
static int x =10;
f_ptr ptr2 = [x](std::string s){bar(s,x);};
//foo(ptr2);
//case:3
int y =10;
f_ptr ptr3 = [y](std::string s){bar(s,y);}; /* error*/
foo(ptr3);
//case:4
int z = 12;
fun([z](std::string s){bar(s,z);});
return 0;
}
Error:
main.cpp:25:50: error: cannot convert ‘main()::’ to ‘f_ptr {aka void (*)(std::basic_string)}’ in initialization
f_ptr ptr3 = [y](std::string s){bar(s,y);}; /* error*/
My questions are,
Is there any way to forward additional arguments like case:3 via lambda?
What conversion is causing error in case:3?
In case:4, typename T is deduced to what?
Is there any way to forward additional arguments like case:3 via lambda?
What conversion is causing error in case:3?
Lambdas with a capture list can't be implicitly converted to a function pointer; only lambdas without captures can. You can use std::function (from <functional>) instead:
void foo(std::function<void(std::string)> f){
    f("Hello world");
}
Or take the lambda directly, as fun does.
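Applied to case 3, a sketch might look like this. It assumes a hypothetical bar variant that returns a string instead of printing, so the result is easy to check; foo_template and case3 are likewise made-up names. The extra argument y travels inside the closure, not through the function signature.

```cpp
#include <functional>
#include <string>

// Hypothetical variant of bar(): returns the text instead of printing it.
std::string bar(std::string s, int x) {
    return s + " " + std::to_string(x);
}

// Accepts any callable with a matching signature, including capturing lambdas.
std::string foo(std::function<std::string(std::string)> f) {
    return f("Hello world");
}

// Hypothetical alternative: take the closure type directly, as fun() does.
template <typename F>
std::string foo_template(F f) {
    return f("Hello world");
}

// Case 3 rewritten: y reaches bar() through the capture, not through the signature.
std::string case3() {
    int y = 10;
    return foo([y](std::string s) { return bar(s, y); });
}
```

The std::function version accepts any compatible callable at the cost of type erasure (and possibly a heap allocation), while the template keeps the exact closure type visible to the compiler.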
In case:4, typename T is deduced to what?
The type would be the unique closure type; the lambda expression is a prvalue expression of that type.
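A quick way to see this is to compare the deduced T with decltype of the lambda. In the sketch below, deduced_as and check_deduction are illustrative helpers; the second lambda has the very same text as the first, yet gets a different closure type.

```cpp
#include <string>
#include <type_traits>

// True when the T deduced from the argument is exactly Expected.
template <typename Expected, typename T>
bool deduced_as(T) {
    return std::is_same<T, Expected>::value;
}

bool check_deduction() {
    int z = 12;
    auto lam  = [z](std::string s) { return s.size() + z; };
    auto twin = [z](std::string s) { return s.size() + z; };  // identical text, new type

    bool t_is_closure_type = deduced_as<decltype(lam)>(lam);  // T == decltype(lam)
    bool types_differ = !std::is_same<decltype(lam), decltype(twin)>::value;
    return t_is_closure_type && types_differ;
}
```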
Some details on why case 2 compiles while case 3 does not:
//case:2
static int x =10;
f_ptr ptr2 = [x](std::string s){bar(s,x);};
It compiles (with GCC, at least) since it effectively doesn't capture anything, so the lambda can be converted to the function pointer, which is not allowed when something is actually captured. The standard says:
5.1.1/2 A name in the lambda-capture shall be in scope in the context of the lambda expression, and shall be this or refer to a local
variable or reference with automatic storage duration.
So naming a static variable in the capture-list actually violates this rule, which makes the code ill-formed rather than merely unspecified; compilers differ in how strictly they enforce it (Clang rejects it, GCC accepts it with a warning), so do not rely on it.
Be aware that capturing static variables can lead to nasty surprises: the semantics of the capture-list suggest that a value is copied, but nothing is actually copied!
The same applies to global variables, which also lack automatic storage duration!
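A sketch of the difference, assuming C++14: the static x is used by a capture-less lambda, which therefore sees later changes (and still converts to a function pointer), while an init-capture such as [snap = x] really does make a copy. static_demo and int_fn are hypothetical names used only for this illustration.

```cpp
#include <utility>

using int_fn = int (*)();

// Returns {value read by the capture-less lambda, value held by the init-captured copy}.
std::pair<int, int> static_demo() {
    static int x;
    x = 10;                                   // reset so repeated calls behave the same
    int_fn live = []() { return x; };         // no capture needed: refers to x itself,
                                              // so it still converts to a function pointer
    auto snapshot = [snap = x]() { return snap; };  // init-capture: an actual copy
    x = 20;                                   // change x after both lambdas exist
    return {live(), snapshot()};              // {20, 10}
}
```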

Why does a lambda have a size of 1 byte?

I am working with the memory of some lambdas in C++, but I am a bit puzzled by their size.
Here is my test code:
#include <iostream>
#include <string>
int main()
{
auto f = [](){ return 17; };
std::cout << f() << std::endl;
std::cout << &f << std::endl;
std::cout << sizeof(f) << std::endl;
}
The output is:
17
0x7d90ba8f626f
1
This suggests that the size of my lambda is 1.
How is this possible?
Shouldn't the lambda be, at minimum, a pointer to its implementation?
The lambda in question actually has no state.
Examine:
struct lambda {
auto operator()() const { return 17; }
};
If we had lambda f;, f would be an instance of an empty class. Not only is the above lambda functionally similar to your lambda, it is (basically) how your lambda is implemented! (It also needs an implicit conversion operator to a function pointer, and the name lambda would be replaced with some compiler-generated pseudo-GUID.)
In C++, objects are not pointers. They are actual things. They only use up the space required to store the data in them. A pointer to an object can be larger than an object.
While you might think of that lambda as a pointer to a function, it isn't. You cannot reassign the auto f = [](){ return 17; }; to a different function or lambda!
auto f = [](){ return 17; };
f = [](){ return -42; };
The above is illegal. There is no room in f to store which function is going to be called -- that information is stored in the type of f, not in the value of f!
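A small sketch of that point (the names ret17 and ret_minus42 are illustrative): two lambdas with the same signature still have distinct, unrelated closure types, so the identity of the code to run is encoded in the type. The commented-out assignment would not compile.

```cpp
#include <type_traits>

// Two stateless lambdas with the same signature and shape.
auto ret17 = []() { return 17; };
auto ret_minus42 = []() { return -42; };

// Each lambda expression has its own unique closure type; which body runs
// is part of the type, not stored in the object.
static_assert(!std::is_same<decltype(ret17), decltype(ret_minus42)>::value,
              "distinct lambda expressions have distinct closure types");

// ret17 = ret_minus42;  // would not compile: unrelated types
```

On GCC, Clang and MSVC, sizeof(ret17) is 1, although the standard leaves the exact size of a closure type implementation-defined.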
If you did this:
int(*f)() = [](){ return 17; };
or this:
std::function<int()> f = [](){ return 17; };
you are no longer storing the lambda directly. In both of these cases, f = [](){ return -42; } is legal, so in these cases we are storing which function we are invoking in the value of f. And sizeof(f) is no longer 1, but rather sizeof(int(*)()) or larger (basically pointer-sized or larger, as you expect). The standard encourages std::function implementations to store small callables "inside themselves" rather than on the heap, so in practice a std::function is at least as large as a function pointer.
In the int(*f)() case, you are storing a function pointer to a function that behaves as-if you called that lambda. This only works for stateless lambdas (ones with an empty [] capture list).
In the std::function<int()> f case, you are creating an instance of the type-erasure class std::function<int()> that (in this case, with typical implementations) uses placement new to store a copy of the size-1 lambda in an internal buffer; if a larger lambda with more state were passed in, it would use heap allocation instead.
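Both alternatives can be sketched like this (the function names are hypothetical): the function pointer and the std::function each store "which function" in their value, so both can be reassigned to a different lambda.

```cpp
#include <functional>
#include <utility>

// Returns {result before reassignment, result after reassignment}.
std::pair<int, int> reassign_function_pointer() {
    int (*f)() = []() { return 17; };  // stateless lambda converts to a function pointer
    int before = f();
    f = []() { return -42; };          // OK: the value of f says which function to call
    return {before, f()};
}

// Same idea with std::function, which type-erases whatever closure it holds.
std::pair<int, int> reassign_std_function() {
    std::function<int()> f = []() { return 17; };
    int before = f();
    f = []() { return -42; };          // OK: a different closure type is accepted
    return {before, f()};
}
```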
As a guess, something like this is probably what you think is going on: that a lambda is an object whose type is described by its signature. In C++, it was decided to make lambdas zero cost abstractions over the manual function object implementation. This lets you pass a lambda into a std algorithm (or similar) and have its contents be fully visible to the compiler when it instantiates the algorithm template. If a lambda had a type like std::function<void(int)>, its contents would not be fully visible, and a hand-crafted function object might be faster.
The goal of C++ standardization is high level programming with zero overhead over hand-crafted C code.
Now that you understand that your f is in fact stateless, there should be another question in your head: if the lambda has no state, why doesn't it have size 0?
Here is the short answer.
All objects in C++ must have a minimum size of 1 under the standard, and two objects of the same type cannot have the same address. These are connected, because an array of type T will have the elements placed sizeof(T) apart.
Now, as it has no state, sometimes it can take up no space. This cannot happen when it is "alone", but in some contexts it can happen. std::tuple and similar library code exploits this fact. Here is how it works:
As a lambda is equivalent to a class with operator() overloaded, stateless lambdas (with a [] capture list) are all empty classes. They have a sizeof of 1. In fact, if you inherit from them (which is allowed!), they will take up no space so long as it doesn't cause a same-type address collision. (This is known as the empty base optimization.)
template<class T>
struct toy:T {
toy(toy const&)=default;
toy(toy &&)=default;
toy(T const&t):T(t) {}
toy(T &&t):T(std::move(t)) {}
int state = 0;
};
template<class Lambda>
toy<Lambda> make_toy( Lambda const& l ) { return {l}; }
The sizeof(make_toy( []{std::cout << "hello world!\n"; } )) is sizeof(int). (Well, before C++20 the above is ill-formed because a lambda cannot appear in an unevaluated operand such as sizeof: you have to create a named auto toy = make_toy(blah); and then take sizeof(toy), but that is just noise.) sizeof([]{std::cout << "hello world!\n"; }) is still 1 (similar qualifications).
If we create another toy type:
template<class T>
struct toy2:T {
toy2(toy2 const&)=default;
toy2(T const&t):T(t), t2(t) {}
T t2;
};
template<class Lambda>
toy2<Lambda> make_toy2( Lambda const& l ) { return {l}; }
This has two copies of the lambda. As they cannot share the same address, sizeof(make_toy2(some_lambda)) is 2!
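To check these numbers yourself without the make_toy wrappers, here is a cut-down sketch; WithBase and TwoCopies are hypothetical names playing the roles of toy and toy2. The expected sizes in the comments rely on the empty base optimization, which GCC and Clang apply here but which the standard does not guarantee in every case.

```cpp
// A stateless lambda; its closure type is an empty class.
auto hello = []() { return 1; };
using Lambda = decltype(hello);

// Hypothetical name: the lambda as an empty base plus one int of state.
// Expected: sizeof(WithBase) == sizeof(int), the base takes no space.
struct WithBase : Lambda {
    WithBase() : Lambda(hello) {}
    int state = 0;
};

// Hypothetical name: the lambda both as a base and as a member.
// Two subobjects of the same type need distinct addresses, so expected size is 2.
struct TwoCopies : Lambda {
    TwoCopies() : Lambda(hello), member(hello) {}
    Lambda member;
};
```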
A lambda is not a function pointer.
A lambda is an instance of a class. Your code is approximately equivalent to:
class f_lambda {
public:
    auto operator()() const { return 17; }
};
f_lambda f;
std::cout << f() << std::endl;
std::cout << &f << std::endl;
std::cout << sizeof(f) << std::endl;
The internal class that represents a lambda has no class members, hence its sizeof() is 1 (it cannot be 0, for reasons adequately stated elsewhere).
If your lambda were to capture some variables, they'll be equivalent to class members, and your sizeof() will indicate accordingly.
Your compiler more or less translates the lambda to the following struct type:
struct _SomeInternalName {
int operator()() const { return 17; }
};
int main()
{
_SomeInternalName f;
std::cout << f() << std::endl;
}
Since that struct has no non-static members, it has the same size as an empty struct, which is 1.
That changes as soon as you add a non-empty capture list to your lambda:
int i = 42;
auto f = [i]() { return i; };
Which will translate to
struct _SomeInternalName {
int i;
_SomeInternalName(int outer_i) : i(outer_i) {}
int operator()() const { return i; }
};
int main()
{
int i = 42;
_SomeInternalName f(i);
std::cout << f() << std::endl;
}
Since the generated struct now needs to store a non-static int member for the capture, its size will grow to sizeof(int). The size will keep growing as you capture more stuff.
(Please take the struct analogy with a grain of salt. While it's a nice way to reason about how lambdas work internally, this is not a literal translation of what the compiler will do)
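Keeping that caveat in mind, a short sketch shows both effects described above: the closure holds a snapshot of i taken when the lambda is created, and adding captures grows the closure. capture_demo, two_captures_size and CaptureInfo are hypothetical names; the size checks are best treated as lower bounds, because the exact size is up to the implementation.

```cpp
#include <cstddef>

struct CaptureInfo {
    int result;        // what the lambda returns
    std::size_t size;  // sizeof the closure object
};

// One int captured by copy.
CaptureInfo capture_demo() {
    int i = 42;
    auto f = [i]() { return i; };  // the closure gets its own int, copied now
    i = 0;                         // does not affect the copy inside f
    return {f(), sizeof(f)};       // result == 42; size typically sizeof(int)
}

// Two ints captured by copy.
std::size_t two_captures_size() {
    int i = 1, j = 2;
    auto g = [i, j]() { return i + j; };
    return sizeof(g);              // typically 2 * sizeof(int)
}
```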
Shouldn't the lambda be, at minimum, a pointer to its implementation?
Not necessarily. According to the standard, the size of the unique, unnamed class is implementation-defined. Excerpt from [expr.prim.lambda], C++14 (emphasis mine):
The type of the lambda-expression (which is also the type of the closure object) is a unique, unnamed nonunion class type — called the closure type — whose properties are described below.
[ ... ]
An implementation may define the closure type differently from what is described below provided this does not alter the observable behavior of the program other than by changing:
— the size and/or alignment of the closure type,
— whether the closure type is trivially copyable (Clause 9),
— whether the closure type is a standard-layout class (Clause 9), or
— whether the closure type is a POD class (Clause 9)
In your case -- for the compiler you use -- you get a size of 1, which doesn't mean it's fixed. It can vary between different compiler implementations.
From http://en.cppreference.com/w/cpp/language/lambda:
The lambda expression constructs an unnamed prvalue temporary object of unique unnamed non-union non-aggregate class type, known as closure type, which is declared (for the purposes of ADL) in the smallest block scope, class scope, or namespace scope that contains the lambda expression.
If the lambda-expression captures anything by copy (either implicitly with capture clause [=] or explicitly with a capture that does not include the character &, e.g. [a, b, c]), the closure type includes unnamed non-static data members, declared in unspecified order, that hold copies of all entities that were so captured.
For the entities that are captured by reference (with the default capture [&] or when using the character &, e.g. [&a, &b, &c]), it is unspecified if additional data members are declared in the closure type
From http://en.cppreference.com/w/cpp/language/sizeof
When applied to an empty class type, always returns 1.

lambda capture by value mutable doesn't work with const &?

Consider the following:
void test( const int &value )
{
auto testConstRefMutableCopy = [value] () mutable {
value = 2; // compile error: Cannot assign to a variable captured by copy in a non-mutable lambda
};
int valueCopy = value;
auto testCopyMutableCopy = [valueCopy] () mutable {
valueCopy = 2; // compiles OK
};
}
Why is the first version a compile error when I've declared the lambda as mutable and captured value by value (which I thought made a copy of it)?
Tested with clang (x86_64-apple-darwin14.3.0), which is where the error message comes from, and Visual C++ (vc120).
[C++11: 5.1.2/14]: An entity is captured by copy if it is implicitly captured and the capture-default is = or if it is explicitly captured with a capture that does not include an &. For each entity captured by copy, an unnamed non-static data member is declared in the closure type. The declaration order of these members is unspecified. The type of such a data member is the type of the corresponding captured entity if the entity is not a reference to an object, or the referenced type otherwise. [..]
The type of value inside your lambda is const int, because it was captured by copy from a const int&.
Thus, even though the lambda's call operator function is not const (you marked the lambda mutable), the actual implicit member value is of type const int and cannot be mutated.
Frankly, this seems absurd; I would expect this rule to say that the referenced type loses constness, as it's a copy. The presence or absence of the mutable keyword on the lambda itself (and, thus, the presence or absence of the const keyword on the generated call operator function) should be the only access control here.
In C++14 you can work around this by capturing as [value=value], which uses the same rules as auto and thus drops the const. C++'s great, ain't it?
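A sketch of that C++14 workaround (mutate_copy is a hypothetical name): the init-capture value = value is deduced like auto, so the closure member is a plain int and the mutable lambda may assign to it, while the caller's variable is untouched.

```cpp
#include <utility>

// Returns {value seen inside the lambda, caller's value afterwards}.
std::pair<int, int> mutate_copy(const int& value) {
    // Init-capture: deduced like "auto value = value;", so the member is int, not const int.
    auto lam = [value = value]() mutable {
        value = 2;     // OK: modifies the closure's own copy
        return value;
    };
    int inside = lam();
    return {inside, value};  // here value is the caller's const int&, unchanged
}
```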
mutable allows a lambda to modify its copy of a non-const variable captured by copy, but it does not allow this for const variables.
So this code works (and outputs inside 2 outside 1):
int a = 1;
[a]() mutable {
a = 2; // compiles OK
cout << "inside " << a << "\n";
}();
cout << " outside " << a << "\n";
But if we omit mutable, or declare a as const int, the compiler gives an error.
In our case, the first lambda gives an error because value is const:
void test( const int &value )
If we make valueCopy const:
const int valueCopy = value;
then the same error will occur with the second lambda.

Is it safe to cast a lambda function to a function pointer?

I have this code:
void foo(void (*bar)()) {
bar();
}
int main() {
foo([] {
int x = 2;
});
}
However, I'm worried that this will suffer the same fate as:
struct X { int i; };
void foo(X* x) {
x->i = 2;
}
int main() {
foo(&X());
}
Which takes the address of a local variable.
Is the first example completely safe?
A lambda that captures nothing is implicitly convertible to a function pointer with its same argument list and return type. Only capture-less lambdas can do this; if it captures anything, then they can't.
Unless you're using VS2010, which didn't implement that part of the standard, since it didn't exist yet when they were writing their compiler.
Yes, I believe the first example is safe, regardless of the lifetime of all the temporaries created during the evaluation of the full-expression that involves the capture-less lambda-expression.
Per the working draft (n3485) 5.1.2 [expr.prim.lambda] p6
The closure type for a lambda-expression with no lambda-capture has a
public non-virtual non-explicit const conversion function to pointer
to function having the same parameter and return types as the closure
type’s function call operator. The value returned by this conversion
function shall be the address of a function that, when invoked, has
the same effect as invoking the closure type’s function call operator.
The above paragraph says nothing about the pointer-to-function's validity expiring after evaluation of the lambda-expression.
For example, I would expect the following to work:
auto L = []() {
return [](int x, int y) { return x + y; };
};
int foo( int (*sum)(int, int) ) { return sum(3, 4); }
int main() {
foo( L() );
}
While implementation details of clang are certainly not the final word on C++ (the standard is), if it makes you feel any better, the way this is implemented in clang is that when the lambda expression is parsed and semantically analyzed, a closure type for the lambda expression is invented, and a static function is added to the class with semantics similar to the function call operator of the lambda. So even though the lifetime of the lambda object returned by 'L()' is over within the body of 'foo', the conversion to pointer-to-function returns the address of a static function that is still valid.
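The same reasoning can be sketched in a few lines (make_sum_pointer, call_later and binop are illustrative names): the closure object is destroyed when make_sum_pointer returns, but the pointer it produced designates a function, not that object, so calling it later is fine.

```cpp
using binop = int (*)(int, int);

// The closure object dies when this function returns; the returned pointer does not refer to it.
binop make_sum_pointer() {
    auto sum = [](int x, int y) { return x + y; };
    return sum;  // implicit conversion of a capture-less lambda to a function pointer
}

int call_later() {
    binop p = make_sum_pointer();
    return p(3, 4);  // still valid: p designates a function, not the closure object
}
```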
Consider the somewhat analogous case:
struct B {
static int f(int, int) { return 0; }
typedef int (*fp_t)(int, int);
operator fp_t() const { return &f; }
};
int main() {
int (*fp)(int, int) = B{};
fp(3, 4); // You would expect this to be ok.
}
I am certainly not a core-c++ expert, but FWIW, this is my interpretation of the letter of the standard, and I feel it is defendable.
Hope this helps.
In addition to Nicol's perfectly correct general answer, I would add some views on your particular fears:
However, I'm worried that this will suffer the same fate as ..., which
takes the address of a local variable.
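Of course it does, but this is absolutely no problem when you just call it inside foo, since the surrounding function (main in this case) that defined the local variable/lambda will outlive the called function (foo) anyway. (Your struct example works the same way lifetime-wise, although &X() itself is rejected by conforming compilers, because you cannot take the address of a temporary.) It could only ever be a problem if you saved a pointer to that local object for later use; the function pointer obtained from a capture-less lambda does not even have that problem, since it points to a function rather than to the closure object. So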
Is the first example completely safe?
Yes, it is, as is the second example, too.

Reference to global functor in another global functor

The following code yields a Segmentation Fault on the y = anotherFunctor() line. As far as I understand, this happens because the globalFunctor variable does not exist when anotherFunctor is created. But why does it work if I replace std::function<int()> with GlobalFunctor? How would I fix it?
#include <functional>
struct GlobalFunctor
{
int operator()() const { return 42; }
};
extern GlobalFunctor globalFunctor;
struct AnotherFunctor
{
AnotherFunctor() : g_(globalFunctor) {}
int operator()() const { return g_(); }
const std::function<int()>& g_;
} anotherFunctor;
GlobalFunctor globalFunctor;
int main()
{
AnotherFunctor af;
int x = af();
int y = anotherFunctor();
int z = x + y;
return 0;
}
Edit: I tried compiling this with clang instead of gcc and it warns me about binding reference member 'g_' to a temporary value -- but it crashes when compiling this. Would the cast to std::function create a temporary reference?
At g_(globalFunctor), globalFunctor has to be converted to an std::function because it is of type GlobalFunctor. So a temporary is produced and this is bound to the constant reference. You could think of the code as doing g_(std::function<int()>(globalFunctor)). However, this temporary only lives until the end of the constructor, as there is a special rule in C++ saying that temporaries in member initializer lists only live until the end of the constructor. This leaves you with a dangling reference.
The code works when you replace std::function<int()> with GlobalFunctor because no conversion is involved. Therefore, no temporaries are produced and the reference directly refers to the global object.
You either need to not use references and store a std::function by value internally, or make the global itself a std::function and keep a reference to that.
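A sketch of the first fix, keeping the question's names: g_ becomes a std::function held by value, so the converted copy lives as long as the AnotherFunctor object instead of dying at the end of the constructor. This assumes, as in the question, that GlobalFunctor is an empty, trivially constructible class, which is why copying globalFunctor before its definition point is harmless here.

```cpp
#include <functional>

struct GlobalFunctor
{
    int operator()() const { return 42; }
};

extern GlobalFunctor globalFunctor;

struct AnotherFunctor
{
    // The conversion still creates a std::function, but it is now stored in
    // the member itself rather than in a temporary bound to a reference.
    AnotherFunctor() : g_(globalFunctor) {}
    int operator()() const { return g_(); }
    std::function<int()> g_;  // by value, not const std::function<int()>&
} anotherFunctor;

GlobalFunctor globalFunctor;
```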