Why does a lambda have a size of 1 byte? - c++

I am working with the memory of some lambdas in C++, but I am a bit puzzled by their size.
Here is my test code:
#include <iostream>
#include <string>
int main()
{
auto f = [](){ return 17; };
std::cout << f() << std::endl;
std::cout << &f << std::endl;
std::cout << sizeof(f) << std::endl;
}
The output is:
17
0x7d90ba8f626f
1
This suggests that the size of my lambda is 1.
How is this possible?
Shouldn't the lambda be, at minimum, a pointer to its implementation?

The lambda in question actually has no state.
Examine:
struct lambda {
auto operator()() const { return 17; }
};
If we had lambda f;, f would be an instance of an empty class. Not only is the above lambda functionally similar to your lambda, it is (basically) how your lambda is implemented! (The real closure type also has an implicit conversion operator to a function pointer, and the name lambda is replaced with some compiler-generated pseudo-GUID.)
In C++, objects are not pointers. They are actual things. They only use up the space required to store the data in them. A pointer to an object can be larger than an object.
While you might think of that lambda as a pointer to a function, it isn't. You cannot reassign the auto f = [](){ return 17; }; to a different function or lambda!
auto f = [](){ return 17; };
f = [](){ return -42; };
The above is illegal. There is no room in f to store which function is going to be called -- that information is stored in the type of f, not in the value of f!
If you did this:
int(*f)() = [](){ return 17; };
or this:
std::function<int()> f = [](){ return 17; };
you are no longer storing the lambda directly. In both of these cases, f = [](){ return -42; } is legal, so in these cases we are storing which function we are invoking in the value of f. And sizeof(f) is no longer 1, but rather sizeof(int(*)()) or larger (basically pointer-sized or larger, as you expect). The size of std::function is implementation-specific; the standard encourages implementations to store small callables such as plain function pointers "inside themselves" without allocating, so in practice it is at least as large as a function pointer.
In the int(*f)() case, you are storing a function pointer to a function that behaves as-if you called that lambda. This only works for stateless lambdas (ones with an empty [] capture list).
In the std::function<int()> f case, you are creating a type-erasure class std::function<int()> instance that (in this case) uses placement new to store a copy of the size-1 lambda in an internal buffer (and, if a larger lambda was passed in (with more state), would use heap allocation).
My guess is that something like one of these is what you think is going on: that a lambda is an object whose type is described by its signature. In C++, it was decided to make lambdas zero-cost abstractions over the manual function object implementation. This lets you pass a lambda into a std algorithm (or similar) and have its contents be fully visible to the compiler when it instantiates the algorithm template. If a lambda had a type like std::function<void(int)>, its contents would not be fully visible, and a hand-crafted function object might be faster.
The goal of C++ standardization is high level programming with zero overhead over hand-crafted C code.
Now that you understand that your f is in fact stateless, there should be another question in your head: the lambda has no state, so why is its size not 0?
Here is the short answer.
All objects in C++ must have a minimum size of 1 under the standard, and two distinct objects of the same type cannot have the same address. These are connected, because an array of type T will have its elements placed sizeof(T) apart.
Now, as it has no state, sometimes it can take up no space. This cannot happen when it is "alone", but in some contexts it can happen. std::tuple and similar library code exploits this fact. Here is how it works:
As a lambda is equivalent to a class with operator() overloaded, stateless lambdas (with a [] capture list) are all empty classes. They have sizeof of 1. In fact, if you inherit from them (which is allowed!), they will take up no space so long as it doesn't cause a same-type address collision. (This is known as the empty base optimization).
#include <utility> // for std::move
template<class T>
struct toy:T {
toy(toy const&)=default;
toy(toy &&)=default;
toy(T const&t):T(t) {}
toy(T &&t):T(std::move(t)) {}
int state = 0;
};
template<class Lambda>
toy<Lambda> make_toy( Lambda const& l ) { return {l}; }
The sizeof(make_toy( []{std::cout << "hello world!\n"; } )) is sizeof(int). (Well, before C++20 the above is illegal because you cannot create a lambda in an unevaluated context: you have to create a named auto toy = make_toy(blah); and then do sizeof(toy), but that is just noise.) sizeof([]{std::cout << "hello world!\n"; }) is still 1 (similar qualifications).
If we create another toy type:
template<class T>
struct toy2:T {
toy2(toy2 const&)=default;
toy2(T const&t):T(t), t2(t) {}
T t2;
};
template<class Lambda>
toy2<Lambda> make_toy2( Lambda const& l ) { return {l}; }
this has two copies of the lambda. As they cannot share the same address, sizeof(make_toy2(some_lambda)) is 2!

A lambda is not a function pointer.
A lambda is an instance of a class. Your code is approximately equivalent to:
class f_lambda {
public:
auto operator()() const { return 17; }
};
f_lambda f;
std::cout << f() << std::endl;
std::cout << &f << std::endl;
std::cout << sizeof(f) << std::endl;
The internal class that represents a lambda has no class members, hence its sizeof() is 1 (it cannot be 0, for reasons adequately stated elsewhere).
If your lambda were to capture some variables, they would be equivalent to class members, and sizeof() would grow accordingly.

Your compiler more or less translates the lambda to the following struct type:
struct _SomeInternalName {
int operator()() { return 17; }
};
int main()
{
_SomeInternalName f;
std::cout << f() << std::endl;
}
Since that struct has no non-static members, it has the same size as an empty struct, which is 1.
That changes as soon as you add a non-empty capture list to your lambda:
int i = 42;
auto f = [i]() { return i; };
which translates to something like:
struct _SomeInternalName {
int i;
_SomeInternalName(int outer_i) : i(outer_i) {}
int operator()() { return i; }
};
int main()
{
int i = 42;
_SomeInternalName f(i);
std::cout << f() << std::endl;
}
Since the generated struct now needs to store a non-static int member for the capture, its size will grow to sizeof(int). The size will keep growing as you capture more stuff.
(Please take the struct analogy with a grain of salt. While it's a nice way to reason about how lambdas work internally, this is not a literal translation of what the compiler will do)

Shouldn't the lambda be, at minimum, a pointer to its implementation?
Not necessarily. According to the standard, the size of the unique, unnamed class is left to the implementation. Excerpt from [expr.prim.lambda], C++14:
The type of the lambda-expression (which is also the type of the closure object) is a unique, unnamed nonunion class type — called the closure type — whose properties are described below.
[ ... ]
An implementation may define the closure type differently from what is described below provided this does not alter the observable behavior of the program other than by changing:
— the size and/or alignment of the closure type,
— whether the closure type is trivially copyable (Clause 9),
— whether the closure type is a standard-layout class (Clause 9), or
— whether the closure type is a POD class (Clause 9)
In your case, with the compiler you use, you get a size of 1, but that doesn't mean it's fixed. It can vary between compiler implementations.

From http://en.cppreference.com/w/cpp/language/lambda:
The lambda expression constructs an unnamed prvalue temporary object of unique unnamed non-union non-aggregate class type, known as closure type, which is declared (for the purposes of ADL) in the smallest block scope, class scope, or namespace scope that contains the lambda expression.
If the lambda-expression captures anything by copy (either implicitly with capture clause [=] or explicitly with a capture that does not include the character &, e.g. [a, b, c]), the closure type includes unnamed non-static data members, declared in unspecified order, that hold copies of all entities that were so captured.
For the entities that are captured by reference (with the default capture [&] or when using the character &, e.g. [&a, &b, &c]), it is unspecified if additional data members are declared in the closure type.
From http://en.cppreference.com/w/cpp/language/sizeof
When applied to an empty class type, always returns 1.

Related

Are function pointers function objects in C++?

The C++ standard defines function objects as:
A function object type is an object type that can be the type of the
postfix-expression in a function call.
First I was thinking that function objects were functors, but then I realized that for a function pointer ptr of type P (not a function, but a function pointer), std::is_object_v<P> is true and can be called with the ptr(Args...) syntax.
Am I right that function pointers are considered function objects by the standard? And if they are not, what part of the definition do function pointers fail to satisfy?
Yes, they are. The term "object" in C++ standard does not mean "object" in the OOP sense. An int is an object.
A function pointer is what it sounds like: a pointer to a function. By itself it is storage holding a pointer value; dereferencing it yields a function, which can be called.
If you take the time to read the first chapters of the standard, you'll understand that any variable declaration declares some storage that contains an object. Those can be objects of primitive types or of class types. Essentially, in C++ anything that can be stored is an object.
By declaring a function pointer you create storage that can hold the address of a function, and the function call syntax () can be applied to it.
Another kind of callable can be created by a lambda expression. Closures are not function pointers: each lambda expression creates an object of a unique callable type. Captureless lambdas, however, can be converted to a function pointer, e.g. assigned to one:
double (*square)(double) = [](double a)->double { return a*a; };
after this you can call it using expression like square(3.6);
For functions the call operation is supplied by the language, and for lambdas the compiler generates operator(). By defining operator() for a class you create what people often call a "functor", which is a misnomer, because actual functors in mathematics or in languages like Haskell do not store state. The result of a lambda expression is such a "functor" created by the compiler, which stores the state of the captured objects.
Calling those objects "callable" might be a little misleading too, because as a concept in C++, a callable object is any object that can be used with the INVOKE operation, which includes pointers to data members even though no function call happens.
That leaves only one criterion: if a function call expression can be applied to the object, it is a function object. It can be a function pointer, a closure created by a lambda expression, an instance of a class with operator(), or a member function pointer combined with a class instance (obj.*memberptr or objptr->*memberptr; calling a member function this way is a very special case).
Function pointers are function objects, but they also have a funny quirk dating back to C:
#include <iostream>
int foo(int a)
{
std::cout << "Hello, " << a << " stars\n";
return a;
}
int (*pr) (int) = foo;
int main()
{
pr(0);
(*pr)(1);
(**pr)(2);
(***pr)(3);
return 0;
}
The postfix expression in a function call shall have function type or pointer-to-function type, and a function is implicitly converted to a pointer to itself wherever such a conversion is required. So repeated dereference operators bounce the type "to and fro" between function and function pointer, and the call still works. A captureless lambda would do too:
// also can write `auto pr2 = ...`
int (*pr2) (int) = [](int a)->int {
std::cout << "Hello, lambda " << a << " stars\n"; return a;
};
// in main()
(**pr2)(2);
Naturally, that code is ill-formed if pr is a general "functor", such as a lambda with captures:
int b = 2;
auto pr3 = [=]()->int {
std::cout << "Hello, general lambda " << b << std::endl;
return b;
};
pr3();
//(*pr3)(); Ill-formed: not a pointer!

How to define a lambda expression without both std::function and auto?

I have read Item 31 of "Effective Modern C++" and the page http://en.cppreference.com/w/cpp/language/lambda, and I wonder whether I can define a lambda by its concrete type instead of wrapping it in std::function or using the auto keyword, and if so, how.
For instance, for the type int:
auto x_1 = 5; // type deduction
int x_2 = 5; // defined by definite type
// both x_1, x_2 are int variables of value 5
Now, when it comes to lambdas:
auto f_1_0 = []()->int{return 5;};
std::function<int(void)> f_1_1 = []()->int{return 5;};
SomeType f_2 = []()->int{return 5;}; // what's the SomeType here?
Each lambda expression has its own unique type.
Here the variables f_1 and f_2 have different types.
auto f_1 = []()->int {return 5; };
auto f_2 = []()->int {return 5; };
Assigning f_2 = f_1 is illegal.
The standard says the types are "unnamed." In practice, the compiler probably makes up a new, hidden type name for each lambda. Visual C++17 gave them the following names:
class main::<lambda_7e9d7fb093569d78a8c871761cbb39d7>
class main::<lambda_8f061a3967cd210147d6a4978ab6e125>
Not very useful information.
The standard says that the type of the lambda is unnamed, so the implementation creates an implementation-defined name for it, as it does for other unnamed classes, structs, enums, etc.
ISO C++: 5.1.2 Lambda expressions [expr.prim.lambda]
3 The type of the lambda-expression (which is also the type of the closure object) is a unique, unnamed non-union class type — called the closure type — whose properties are described below.
The standard also says that the lambda 'behaves like a function', so it can be used with the std::function template:
[Note: A closure object behaves like a function object (20.9).—end note]
And if you really want named types, you can use old-fashioned functors and do by hand all the work the compiler does for you with lambdas.
In some cases you can use function pointers (using MSVC, C++17):
auto usingAuto = []() {
cout << "autoMagick" << endl;
};
void(*pureCpp)() = []() {
cout << "pureCpp" << endl;
};
//pureCpp = usingAuto; //you can even assign function pointers from lambdas
//usingAuto = pureCpp; //error
pureCpp();
usingAuto();

Where are lambda captured variables stored?

How is it possible that this example works? It prints 6:
#include <iostream>
#include <functional>
using namespace std;
void scopeIt(std::function<int()> &fun) {
int val = 6;
fun = [=](){return val;}; //<-- this
}
int main() {
std::function<int()> fun;
scopeIt(fun);
cout << fun();
return 0;
}
Where is the value 6 stored after scopeIt is done being called? If I replace the [=] with a [&], it prints 0 instead of 6.
It is stored within the closure, which - in your code - is then stored within std::function<int()> &fun.
A lambda generates what's equivalent to an instance of a compiler generated class.
This code:
[=](){return val;}
Generates what's effectively equivalent to this... this would be the "closure":
struct UNNAMED_TYPE
{
UNNAMED_TYPE(int val) : val(val) {}
const int val;
// Above, your [=] "equals/copy" syntax means "find what variables
// are needed by the lambda and copy them into this object"
int operator() () const { return val; }
// Above, here is the code you provided
};
UNNAMED_TYPE closure(val);
// ^^^ the lambda expression both DECLARES this type and INSTANTIATES (constructs) it!!
Lambdas in C++ are really just "anonymous" struct functors. So when you write this:
int val = 6;
fun = [=](){return val;};
The compiler translates that into something like this:
int val = 6;
struct __anonymous_struct_line_8 {
int val;
__anonymous_struct_line_8(int v) : val(v) {}
int operator() () const {
return val; // returns this->val
}
};
fun = __anonymous_struct_line_8(val);
Then, std::function stores that functor via type erasure.
When you use [&] instead of [=], it changes the struct to:
struct __anonymous_struct_line_8 {
int& val; // Notice this is a reference now!
...
So now the object stores a reference to the function's val object, which becomes a dangling (invalid) reference after the function exits (and you get undefined behavior).
The so-called closure type (which is the class type of the lambda expression) has members for each captured entity. Those members are objects for capture by value, and references for capture by reference. They are initialized with the captured entities and live independently within the closure object (the particular object of closure type that this lambda designates).
The unnamed member that corresponds to the value capture of val is initialized with val and accessed from inside the closure type's operator(), which is fine. The closure object may easily have been copied or moved multiple times before that happens, and that's fine too: closure types have implicitly defined move and copy constructors just as normal classes do.
However, when capturing by reference, the lvalue-to-rvalue conversion that is implicitly performed when calling fun in main induces undefined behavior as the object which the reference member referred to has already been destroyed - i.e. we are using a dangling reference.
The value of a lambda expression is an object of class type, and
For each entity captured by copy, an unnamed non-static data member is declared in the closure type.
([expr.prim.lambda]/14 in C++11)
That is, the object created by the lambda
[=](){return val;}
actually contains a non-static member of int type, whose value is 6, and this object is copied into the std::function object.

What is the purpose of std::function and how to use it?

It is necessary for me to use std::function but I don't know what the following syntax means.
std::function<void()> f_name = []() { FNAME(); };
What is the goal of using std::function? Is it to make a pointer to a function?
std::function is a type erasure object. That means it erases the details of how some operations happen, and provides a uniform run time interface to them. For std::function, the primary1 operations are copy/move, destruction, and 'invocation' with operator() -- the 'function like call operator'.
In less abstruse English, it means that std::function can contain almost any object that acts like a function pointer in how you call it.
The signature it supports goes inside the angle brackets: std::function<void()> takes zero arguments and returns nothing. std::function< double( int, int ) > takes two int arguments and returns double. In general, std::function supports storing any function-like object whose arguments can be converted-from its argument list, and whose return value can be converted-to its return value.
It is important to know that std::function and lambdas are different, if compatible, beasts.
The next part of the line is a lambda. This is new syntax in C++11 to add the ability to write simple function-like objects -- objects that can be invoked with (). Such objects can be type erased and stored in a std::function at the cost of some run time overhead.
[](){ code } in particular is a really simple lambda. It corresponds to this:
struct some_anonymous_type {
some_anonymous_type() {}
void operator()() const {
code
}
};
an instance of the above simple pseudo-function type. An actual class like the above is "invented" by the compiler, with an implementation defined unique name (often including symbols that no user-defined type can contain) (I do not know if it is possible that you can follow the standard without inventing such a class, but every compiler I know of actually creates the class).
The full lambda syntax looks like:
[ capture_list ]( argument_list ) optional_mutable
-> return_type
{
code
}
But many parts can be omitted or left empty. The capture_list corresponds to both the constructor of the resulting anonymous type and its member variables, the argument_list the arguments of the operator(), and the return type the return type. The constructor of the lambda instance is also magically called when the instance is created with the capture_list.
[ capture_list ]( argument_list ) -> return_type { code }
basically becomes
struct some_anonymous_type {
// capture_list turned into member variables
some_anonymous_type( /* capture_list turned into arguments */ ):
/* member variables initialized */
{}
return_type operator()( argument_list ) const {
code
}
};
Note that C++20 added template parameter lists to lambdas, which the above does not cover:
[]<typename T>( std::vector<T> const& v ) { return v.size(); }
1 In addition, RTTI is stored (typeid), and the cast-back-to-original-type operation is included.
Let's break the line apart:
std::function<void()>
This is the type of a wrapper for a callable that takes no parameters and returns no value. If the callable returned an int, it would look like this:
std::function<int()>
Likewise, if it took an int parameter as well:
std::function<int(int)>
I suspect your main confusion is the next part.
[]() { FNAME(); };
The [] part is called a capture clause. Here you list variables local to the scope where the lambda is declared that you want to be available inside the lambda body. An empty [] says "I don't want anything to be captured". If this were inside a member function and you wanted the lambda to access the object's members, you might do:
[this]() { FNAME(); };
The next part is the parameter list of the lambda, exactly as for a regular function. As mentioned earlier, the signature void() in std::function<void()> describes a callable that takes no parameters, so this is empty too.
The rest of it is the body of the lambda itself, as if it was a regular function, which we can see just calls the function FNAME.
Another Example
Let's say you had the following signature, that is for something that can sum two numbers.
std::function<int(int, int)> sumFunc;
We could now declare a lambda thusly:
sumFunc = [](int a, int b) { return a + b; };
Not sure if you're using MSVC, but here's a link anyway to the lambda expression syntax:
http://msdn.microsoft.com/en-us/library/dd293603.aspx
Lambdas, including stateful lambdas with captures, cannot be assigned to one another because each lambda expression has its own unique type, even if two of them look exactly the same.
To store and pass around lambdas with captures, we can use std::function to hold the function object created by a lambda expression.
Basically, std::function lets you assign lambdas with different captures (and therefore different closure types) to the same object.
Example:
auto func = [](int a){
cout << "a:" << a << endl;
};
func(40);
//
int x = 10;
func = [x](int a){ //ATTENTION(ERROR!): assigning a new structure to the same object
cout << "x:" << x << ",a:" << a << endl;
};
func(2);
So the above usage will be incorrect.
But if we define a function object with "std::function":
auto func = std::function<void(int)>{};
func = [](int a){
cout << "a:" << a << endl;
};
func(40);
//
int x = 10;
func = [x](int a){ //CORRECT. because of std::function
//...
};
int y = 11;
func = [x,y](int a){ //CORRECT
//...
};