Can lambdas translate into functions? - c++

Common knowledge dictated that lambda functions are functors under the hood.
In this video (# about 45:43) Bjarne says:
I mentioned that a lambda translates into a function object, or into a function if that's convenient
I can see how this is a compiler optimization (ie it doesn't change the perception of lambdas as unnamed functors which means that eg lambdas still won't overload) but are there any rules that specify when this is applicable?
Edit
The way I understand the term translate (and that's what I'm asking about) has nothing to do with conversion (I'm not asking whether lambdas are convertible to function ptr etc). By translate I mean "compile lambda expressions into functions instead of function objects".
As mentioned in cppreference :
The lambda expression constructs an unnamed prvalue temporary object of unique unnamed non-union non-aggregate type, known as closure type.
The question is : can this object be ommited and have a plain function instead? If yes, then when and how?
Note: I imagine one such rule being "don't capture anything" but I can't find any reliable sources to confirm it

TLDR: if you only use lambda to convert it to a function pointer (and only invoke it via that function pointer), it is always profitable to omit the closure object. Optimizations which enable this are inlining and dead code elimination. If you do use the lambda itself, it is still possible to optimize the closure away, but requires somewhat more aggressive interprocedural optimization.
I will now try to show how that works under the hood. I will use GCC in my examples, because I'm more familiar with it. Other compilers should do something similar.
Consider this code:
#include <stdio.h>
typedef int (* fnptr_t)(int);
void use_fnptr(fnptr_t fn)
{
printf("fn=%p, fn(1)=%d\n", fn, fn(1));
}
int main()
{
auto lam = [] (int x) { return x + 1; };
use_fnptr((fnptr_t)lam);
}
Now, I compile it and dump intermediate representation (for versions prior to 6, you should add -std=c++11):
g++ test.cc -fdump-tree-ssa
A little cleaned-up and edited (for brevity) dump looks like this:
// _ZZ4mainENKUliE_clEi
main()::<lambda(int)> (const struct __lambda0 * const __closure, int x)
{
return x_1(D) + 1;
}
// _ZZ4mainENUliE_4_FUNEi
static int main()::<lambda(int)>::_FUN(int) (int D.2780)
{
return main()::<lambda(int)>::operator() (0B, _2(D));
}
// _ZZ4mainENKUliE_cvPFiiEEv
main()::<lambda(int)>::operator int (*)(int)() const (const struct __lambda0 * const this)
{
return _FUN;
}
int main() ()
{
struct __lambda0 lam;
int (*<T5c1>) (int) _3;
_3 = main()::<lambda(int)>::operator int (*)(int) (&lam);
use_fnptr (_3);
}
That is, lambda has 2 member functions: function call operator and a conversion operator and one static member function _FUN, which simply invokes lambda with this set to zero. main calls the conversion operator and passes the result to use_fnptr - exactly as it is written in the source code.
I can write:
extern "C" int _ZZ4mainENKUliE_clEi(void *, int);
int main()
{
auto lam = [] (int x) { return x + 1; };
use_fnptr((fnptr_t)lam);
printf("%d %d %d\n", lam(10), _ZZ4mainENKUliE_clEi(&lam, 11), __lambda0::_FUN(12));
printf("%p %p\n", &__lambda0::_FUN, (fnptr_t)lam);
return 0;
}
This program outputs:
fn=0x4005fc, fn(1)=2
11 12 13
0x4005fc 0x4005fc
Now, I think it's pretty obvious, that the compiler should inline lambda (_ZZ4mainENKUliE_clEi) into _FUN (_ZZ4mainENUliE_4_FUNEi), because _FUN is the only caller. And inline operator int (*)(int) into main (because this operator simply returns a constant). GCC does exactly this, when compiling with optimization (-O). You can check this like:
g++ test.cc -O -fdump-tree-einline
Dump file:
// Considering inline candidate main()::<lambda(int)>.
// Inlining main()::<lambda(int)> into static int main():<lambda(int)>::_FUN(int).
static int main()::<lambda(int)>::_FUN(int) (int D.2822)
{
return _2(D) + 1;
}
The closure object is gone. Now, a more complicated case, when lambda itself is used (not a function pointer). Consider:
#include <stdio.h>
#define PRINT(x) printf("%d", (x))
#define PRINT1(x) PRINT(x); PRINT(x); PRINT(x); PRINT(x);
#define PRINT2(x) do { PRINT1(x) PRINT1(x) PRINT1(x) PRINT1(x) } while(0)
__attribute__((noinline)) void use_lambda(auto t)
{
t(1);
}
int main()
{
auto lam = [] (int x) { PRINT2(x); };
use_lambda(lam);
return 0;
}
GCC will not inline lambda, because it is rather huge (that is what I used printf's for):
g++ test2.cc -O2 -fdump-ipa-inline -fdump-tree-einline -fdump-tree-esra
Early inliner's dump:
Considering inline candidate main()::<lambda(int)>
will not early inline: void use_lambda(auto:1) [with auto:1 = main()::<lambda(int)>]/16->main()::<lambda(int)>/19, growth 46 exceeds --param early-inlining-insns
But "early interprocedural scalar replacement of aggregates" pass will do what we want:
;; Function main()::<lambda(int)> (_ZZ4mainENKUliE_clEi, funcdef_no=14, decl_uid=2815, cgraph_uid=12, symbol_order=12)
IPA param adjustments: 0. base_index: 0 - __closure, base: __closure, remove_param
1. base_index: 1 - x, base: x, copy_param
The first parameter (i.e., closure) is not used, and it gets removed. Unfortunately interprocedural SRA is not able to optimize away indirection, which is introduced for captured values (though there are cases when it would be obviously profitable), so there is still some room for enhancements.

From Lambda expressions §5.1.2 p6 (draft N4140)
The closure type for a non-generic lambda-expression with no lambda-capture has a public non-virtual non-
explicit const conversion function to pointer to function with C ++ language linkage having the same
parameter and return types as the closure type’s function call operator.

The standard quote has already been posted, I want to add some examples.
You can assign lambdas to function pointers as long as there are no captured variables:
Legal:
int (*f)(int) = [] (int x) { return x + 1; }; // assign lambda to function pointer
int z = f(3); // use the function pointer
Illegal:
int y = 5;
int (*g)(int) = [y] (int x) { return x + y; }; // error
Legal:
int y = 5;
int z = ([y] (int x) { return x + y; })(2); // use lambda directly
(Edit)
Since we can not ask Bjarne what he meant exactly, I want to try a few interpretations.
"translate" meaning "convert"
This is what I understood initially, but it is clear now that the question is not about this possible meaning.
"translate" as used in the C++ standard, meaning "compile" (more or less)
As Sebastian Redl already commented, there are no function objects on the binary level. There is just opcodes and data, and the standard does not talk about, or specify, any binary formats.
"translate" meaning "being semantically equivalent"
This would imply that if A and B are semantically equivalent, the produced binary code for A and B could be the same. The rest of my answer uses this interpretation.
A closure consists of two parts:
the statements in the lambda body, "code"
the captured variable values or references, "data"
This is equivalent to a functor, as already stated in the question.
Functors can be seen as a subset of objects, because they have code and data, but only one member function: the call operator. So closures could be seen as semantically equivalent to a restricted form of objects.
A function on the other hand, has no data associated with it. There are the arguments of course, but these must be supplied by the caller and can change from one invocation to the other. This is a semantic difference to a closure, where the bound variables can not be changed and are not supplied by the caller.
A member function is not something independent, as it can not work without its object, so I think the question refers to a freestanding function.
So no, a lambda is in general not semantically equivalent to a function.
There is the obvious special case of a lambda with no captured variables, where the functor consists only of the code, and this is equivalent to a function.
But, a lambda could be said to be semantically equivalent to a set of functions. Each possible closure (distinct combination of values/references for the bound variables) would be equivalent to one function in that set.
Of course this can only be useful when the bound variables can only have a very limited set of values / are references to only a few different variables, if at all.
For example I see no reason why a compiler could not treat the following two snippets as (almost*) equivalent:
void Test(bool cond, int x)
{
int y;
if(cond) y = 5;
else y = 3;
auto f = [y](int x) { return x + y; };
// more code that
// uses f
}
A clever compiler could see that y can only have the values 5 or 3, and compile as if it would be written like this:
int F1(int x)
{
return x + 5;
}
int F2(int x)
{
return x + 3;
}
void Test(bool cond, int x)
{
int (*f)(int);
if(cond) f = F1;
else f = F2;
// more code that
// uses f
}
(*) Of course it depends on what more code that uses f does exactly.
Another (maybe better) example would be a lambda that always binds the same variable by reference. Then, there is only one possible closure, and so it is equivalent to a function, if the function has access to that variable by other means than by passing it as an argument.
Another observation that might be helpful is that asking
can this object [closure] be ommited and have a plain function instead? If yes,
then when and how?
is more or less the same as asking when and how a member function can be used without the object. Since lambdas are functors, and functors are objects, the two questions are closely related.
The bound variables of the lambda correspond to the data members of the object, and the lambda body corresponds to the body of the member function.

To give another kind of insight, let have a look to the code produced by clang when compiling the following snippet:
int (*f) = []() { return 0; }
If you compile this with:
clang++ -std=c++11 -S -o- -emit-llvm a.cc
You get the following LLVM bytecode for the lambda definition:
define internal i32 #"_ZNK3$_0clEv"(%class.anon* %this) #0 align 2 {
%1 = alloca %class.anon*, align 8
store %class.anon* %this, %class.anon** %1, align 8
%2 = load %class.anon** %1
ret i32 0
}
define internal i32 #"_ZN3$_08__invokeEv"() #1 align 2 {
%1 = call i32 #"_ZNK3$_0clEv"(%class.anon* undef)
ret i32 %1
}
The first function takes an instance of %class.anon* and return 0: that's the call operator. The second creates an instance of this class (undef) and then call its call operator and return the value.
When compiled with -O2, the whole lambda is turned into:
define internal i32 #"_ZN3$_08__invokeEv"() #0 align 2 {
ret i32 0
}
So that's a single function that returns 0.
I mentioned that a lambda translates into a function object, or into a function if that's convenient
That's exactly what clang does! It transforms the lambda into a function object, and when possible optimize it to a function.

No, it can't be done. Lambdas are defined to be functors, and I don't see the as-if rule helping here.
[C++14: 5.1.2/6]: The closure type for a non-generic lambda-expression with no lambda-capture has a public non-virtual non-explicit const conversion function to pointer to function with C++ language linkage (7.5) having the same parameter and return types as the closure type’s function call operator. [..]
…followed by similar wording for generic lambdas.

Related

Why design a language with unique anonymous types?

This is something that has always been bugging me as a feature of C++ lambda expressions: The type of a C++ lambda expression is unique and anonymous, I simply cannot write it down. Even if I create two lambdas that are syntactically exactly the same, the resulting types are defined to be distinct. The consequence is, that a) lambdas can only be passed to template functions that allow the compile time, unspeakable type to be passed along with the object, and b) that lambdas are only useful once they are type erased via std::function<>.
Ok, but that's just the way C++ does it, I was ready to write it off as just an irksome feature of that language. However, I just learned that Rust seemingly does the same: Each Rust function or lambda has a unique, anonymous type. And now I'm wondering: Why?
So, my question is this:
What is the advantage, from a language designer point of view, to introduce the concept of a unique, anonymous type into a language?
Many standards (especially C++) take the approach of minimizing how much they demand from compilers. Frankly, they demand enough already! If they don't have to specify something to make it work, they have a tendency to leave it implementation defined.
Were lambdas to not be anonymous, we would have to define them. This would have to say a great deal about how variables are captured. Consider the case of a lambda [=](){...}. The type would have to specify which types actually got captured by the lambda, which could be non-trivial to determine. Also, what if the compiler successfully optimizes out a variable? Consider:
static const int i = 5;
auto f = [i]() { return i; }
An optimizing compiler could easily recognize that the only possible value of i that could be captured is 5, and replace this with auto f = []() { return 5; }. However, if the type is not anonymous, this could change the type or force the compiler to optimize less, storing i even though it didn't actually need it. This is a whole bag of complexity and nuance that simply isn't needed for what lambdas were intended to do.
And, on the off-case that you actually do need a non-anonymous type, you can always construct the closure class yourself, and work with a functor rather than a lambda function. Thus, they can make lambdas handle the 99% case, and leave you to code your own solution in the 1%.
Deduplicator pointed out in comments that I did not address uniqueness as much as anonymity. I am less certain of the benefits of uniqueness, but it is worth noting that the behavior of the following is clear if the types are unique (action will be instantiated twice).
int counter()
{
static int count = 0;
return count++;
}
template <typename FuncT>
void action(const FuncT& func)
{
static int ct = counter();
func(ct);
}
...
for (int i = 0; i < 5; i++)
action([](int j) { std::cout << j << std::endl; });
for (int i = 0; i < 5; i++)
action([](int j) { std::cout << j << std::endl; });
If the types were not unique, we would have to specify what behavior should happen in this case. That could be tricky. Some of the issues that were raised on the topic of anonymity also raise their ugly head in this case for uniqueness.
Lambdas are not just functions, they are a function and a state. Therefore both C++ and Rust implement them as an object with a call operator (operator() in C++, the 3 Fn* traits in Rust).
Basically, [a] { return a + 1; } in C++ desugars to something like
struct __SomeName {
int a;
int operator()() {
return a + 1;
}
};
then using an instance of __SomeName where the lambda is used.
While in Rust, || a + 1 in Rust will desugar to something like
{
struct __SomeName {
a: i32,
}
impl FnOnce<()> for __SomeName {
type Output = i32;
extern "rust-call" fn call_once(self, args: ()) -> Self::Output {
self.a + 1
}
}
// And FnMut and Fn when necessary
__SomeName { a }
}
This means that most lambdas must have different types.
Now, there are a few ways we could do that:
With anonymous types, which is what both languages implement. Another consequence of that is that all lambdas must have a different type. But for language designers, this has a clear advantage: Lambdas can be simply described using other already existing simpler parts of the language. They are just syntax sugar around already existing bits of the language.
With some special syntax for naming lambda types: This is however not necessary since lambdas can already be used with templates in C++ or with generics and the Fn* traits in Rust. Neither language ever force you to type-erase lambdas to use them (with std::function in C++ or Box<Fn*> in Rust).
Also note that both languages do agree that trivial lambdas that do not capture context can be converted to function pointers.
Describing complex features of a languages using simpler feature is pretty common. For example both C++ and Rust have range-for loops, and they both describe them as syntax sugar for other features.
C++ defines
for (auto&& [first,second] : mymap) {
// use first and second
}
as being equivalent to
{
init-statement
auto && __range = range_expression ;
auto __begin = begin_expr ;
auto __end = end_expr ;
for ( ; __begin != __end; ++__begin) {
range_declaration = *__begin;
loop_statement
}
}
and Rust defines
for <pat> in <head> { <body> }
as being equivalent to
let result = match ::std::iter::IntoIterator::into_iter(<head>) {
mut iter => {
loop {
let <pat> = match ::std::iter::Iterator::next(&mut iter) {
::std::option::Option::Some(val) => val,
::std::option::Option::None => break
};
SemiExpr(<body>);
}
}
};
which while they seem more complicated for a human, are both simpler for a language designer or a compiler.
Cort Ammon's accepted answer is good, but I think there's one more important point to make about implementability.
Suppose I have two different translation units, "one.cpp" and "two.cpp".
// one.cpp
struct A { int operator()(int x) const { return x+1; } };
auto b = [](int x) { return x+1; };
using A1 = A;
using B1 = decltype(b);
extern void foo(A1);
extern void foo(B1);
The two overloads of foo use the same identifier (foo) but have different mangled names. (In the Itanium ABI used on POSIX-ish systems, the mangled names are _Z3foo1A and, in this particular case, _Z3fooN1bMUliE_E.)
// two.cpp
struct A { int operator()(int x) const { return x + 1; } };
auto b = [](int x) { return x + 1; };
using A2 = A;
using B2 = decltype(b);
void foo(A2) {}
void foo(B2) {}
The C++ compiler must ensure that the mangled name of void foo(A1) in "two.cpp" is the same as the mangled name of extern void foo(A2) in "one.cpp", so that we can link the two object files together. This is the physical meaning of two types being "the same type": it's essentially about ABI-compatibility between separately compiled object files.
The C++ compiler is not required to ensure that B1 and B2 are "the same type." (In fact, it's required to ensure that they're different types; but that's not as important right now.)
What physical mechanism does the compiler use to ensure that A1 and A2 are "the same type"?
It simply burrows through typedefs, and then looks at the fully qualified name of the type. It's a class type named A. (Well, ::A, since it's in the global namespace.) So it's the same type in both cases. That's easy to understand. More importantly, it's easy to implement. To see if two class types are the same type, you take their names and do a strcmp. To mangle a class type into a function's mangled name, you write the number of characters in its name, followed by those characters.
So, named types are easy to mangle.
What physical mechanism might the compiler use to ensure that B1 and B2 are "the same type," in a hypothetical world where C++ required them to be the same type?
Well, it couldn't use the name of the type, because the type doesn't have a name.
Maybe it could somehow encode the text of the body of the lambda. But that would be kind of awkward, because actually the b in "one.cpp" is subtly different from the b in "two.cpp": "one.cpp" has x+1 and "two.cpp" has x + 1. So we'd have to come up with a rule that says either that this whitespace difference doesn't matter, or that it does (making them different types after all), or that maybe it does (maybe the program's validity is implementation-defined, or maybe it's "ill-formed no diagnostic required"). Anyway, mangling lambda types the same way across multiple translation units is certainly a harder problem than mangling named types like A.
The easiest way out of the difficulty is simply to say that each lambda expression produces values of a unique type. Then two lambda types defined in different translation units definitely are not the same type. Within a single translation unit, we can "name" lambda types by just counting from the beginning of the source code:
auto a = [](){}; // a has type $_0
auto b = [](){}; // b has type $_1
auto f(int x) {
return [x](int y) { return x+y; }; // f(1) and f(2) both have type $_2
}
auto g(float x) {
return [x](int y) { return x+y; }; // g(1) and g(2) both have type $_3
}
Of course these names have meaning only within this translation unit. This TU's $_0 is always a different type from some other TU's $_0, even though this TU's struct A is always the same type as some other TU's struct A.
By the way, notice that our "encode the text of the lambda" idea had another subtle problem: lambdas $_2 and $_3 consist of exactly the same text, but they should clearly not be considered the same type!
By the way, C++ does require the compiler to know how to mangle the text of an arbitrary C++ expression, as in
template<class T> void foo(decltype(T())) {}
template void foo<int>(int); // _Z3fooIiEvDTcvT__EE, not _Z3fooIiEvT_
But C++ doesn't (yet) require the compiler to know how to mangle an arbitrary C++ statement. decltype([](){ ...arbitrary statements... }) is still ill-formed even in C++20.
Also notice that it's easy to give a local alias to an unnamed type using typedef/using. I have a feeling that your question might have arisen from trying to do something that could be solved like this.
auto f(int x) {
return [x](int y) { return x+y; };
}
// Give the type an alias, so I can refer to it within this translation unit
using AdderLambda = decltype(f(0));
int of_one(AdderLambda g) { return g(1); }
int main() {
auto f1 = f(1);
assert(of_one(f1) == 2);
auto f42 = f(42);
assert(of_one(f42) == 43);
}
EDITED TO ADD: From reading some of your comments on other answers, it sounds like you're wondering why
int add1(int x) { return x + 1; }
int add2(int x) { return x + 2; }
static_assert(std::is_same_v<decltype(add1), decltype(add2)>);
auto add3 = [](int x) { return x + 3; };
auto add4 = [](int x) { return x + 4; };
static_assert(not std::is_same_v<decltype(add3), decltype(add4)>);
That's because captureless lambdas are default-constructible. (In C++ only as of C++20, but it's always been conceptually true.)
template<class T>
int default_construct_and_call(int x) {
T t;
return t(x);
}
assert(default_construct_and_call<decltype(add3)>(42) == 45);
assert(default_construct_and_call<decltype(add4)>(42) == 46);
If you tried default_construct_and_call<decltype(&add1)>, t would be a default-initialized function pointer and you'd probably segfault. That's, like, not useful.
(Adding to Caleth's answer, but too long to fit in a comment.)
The lambda expression is just syntactic sugar for an anonymous struct (a Voldemort type, because you can't say its name).
You can see the similarity between an anonymous struct and the anonymity of a lambda in this code snippet:
#include <iostream>
#include <typeinfo>
using std::cout;
int main() {
struct { int x; } foo{5};
struct { int x; } bar{6};
cout << foo.x << " " << bar.x << "\n";
cout << typeid(foo).name() << "\n";
cout << typeid(bar).name() << "\n";
auto baz = [x = 7]() mutable -> int& { return x; };
auto quux = [x = 8]() mutable -> int& { return x; };
cout << baz() << " " << quux() << "\n";
cout << typeid(baz).name() << "\n";
cout << typeid(quux).name() << "\n";
}
If that is still unsatisfying for a lambda, it should be likewise unsatisfying for an anonymous struct.
Some languages allow for a kind of duck typing that is a little more flexible, and even though C++ has templates that doesn't really help in making a object from a template that has a member field that can replace a lambda directly rather than using a std::function wrapper.
Why design a language with unique anonymous types?
Because there are cases where names are irrelevant and not useful or even counter-productive. In this case the ability abstract out their existence is useful because it reduces name pollution, and solves one of the two hard problems in computers science (how to name things). For the same reason, temporary objects are useful.
lambda
The uniqueness is not a special lambda thing, or even special thing to anonymous types. It applies to named types in the language as well. Consider following:
struct A {
void operator()(){};
};
struct B {
void operator()(){};
};
void foo(A);
Note that I cannot pass B into foo, even though the classes are identical. This same property applies to unnamed types.
lambdas can only be passed to template functions that allow the compile time, unspeakable type to be passed along with the object ... erased via std::function<>.
There's a third option for a subset of lambdas: Non-capturing lambdas can be converted to function pointers.
Note that if the limitations of an anonymous type are a problem for a use case, then the solution is simple: A named type can be used instead. Lambdas don't do anything that cannot be done with a named class.
C++ lambdas need distinct types for distinct operations, as C++ binds statically. They are only copy/move-constructable, so mostly you don't need to name their type. But that's all somewhat of an implementation detail.
I'm not sure if C# lambdas have a type, as they are "anonymous function expressions", and they immediately get converted to a compatible delegate type or expression tree type. If the do, it's probably an unpronouncable type.
C++ also has anonymous structs, where each definition leads to a unique type. Here the name isn't unpronouncable, it simply doesn't exist as far as the standard is concerned.
C# has anonymous data types, which it carefully forbids from escaping from the scope they are defined. The implementation gives a unique, unpronouncable name to those too.
Having an anonymous type signals to the programmer that they shouldn't poke around inside their implementation.
Aside:
You can give a name to a lambda's type.
auto foo = []{};
using Foo_t = decltype(foo);
If you don't have any captures, you can use a function pointer type
void (*pfoo)() = foo;
Why use anonymous types?
For types that are automatically generated by the compiler, the choice is to either (1) honor a user's request for the name of the type, or (2) let the compiler choose one on its own.
In the former case, the user is expected to explicitly provide a name each time such a construct appears (C++/Rust: whenever a lambda is defined; Rust: whenever a function is defined). This is a tedious detail for the user to provide each time, and in the majority of cases the name is never referred to again. Thus it make sense to let the compiler figure out a name for it automatically, and use existing features such as decltype or type inference to reference the type in the few places where it is needed.
In the latter case, the compiler need to choose a unique name for the type, which would probably be an obscure, unreadable name such as __namespace1_module1_func1_AnonymousFunction042. The language designer could specify precisely how this name is constructed in glorious and delicate detail, but this needlessly exposes an implementation detail to the user that no sensible user could rely upon, since the name is no doubt brittle in the face of even minor refactors. This also unnecessarily constrains the evolution of the language: future feature additions may cause the existing name generation algorithm to change, leading to backward compatibility issues. Thus, it makes sense to simply omit this detail, and assert that the auto-generated type is unutterable by the user.
Why use unique (distinct) types?
If a value has a unique type, then an optimizing compiler can track a unique type across all its use sites with guaranteed fidelity. As a corollary, the user can then be certain of the places where the provenance of this particular value is full known to the compiler.
As an example, the moment the compiler sees:
let f: __UniqueFunc042 = || { ... }; // definition of __UniqueFunc042 (assume it has a nontrivial closure)
/* ... intervening code */
let g: __UniqueFunc042 = /* some expression */;
g();
the compiler has full confidence that g must necessarily originate from f, without even knowing the provenance of g. This would allow the call to g to be devirtualized. The user would know this too, since the user has taken great care to preserve the unique type of f through the flow of data that led to g.
Necessarily, this constrains what the user can do with f. The user is not at liberty to write:
let q = if some_condition { f } else { || {} }; // ERROR: type mismatch
as that would lead to the (illegal) unification of two distinct types.
To work around this, the user could upcast the __UniqueFunc042 to the non-unique type &dyn Fn(),
let f2 = &f as &dyn Fn(); // upcast
let q2 = if some_condition { f2 } else { &|| {} }; // OK
The trade-off made by this type erasure is that uses of &dyn Fn() complicate the reasoning for the compiler. Given:
let g2: &dyn Fn() = /*expression */;
the compiler has to painstakingly examine the /*expression */ to determine whether g2 originates from f or some other function(s), and the conditions under which that provenance holds. In many circumstances, the compiler may give up: perhaps human could tell that g2 really comes from f in all situations but the path from f to g2 was too convoluted for the compiler to decipher, resulting in a virtual call to g2 with pessimistic performance.
This becomes more evident when such objects delivered to generic (template) functions:
fn h<F: Fn()>(f: F);
If one calls h(f) where f: __UniqueFunc042, then h is specialized to a unique instance:
h::<__UniqueFunc042>(f);
This enables the compiler to generate specialized code for h, tailored for the particular argument of f, and the dispatch to f is quite likely to be static, if not inlined.
In the opposite scenario, where one calls h(f) with f2: &Fn(), the h is instantiated as
h::<&Fn()>(f);
which is shared among all functions of type &Fn(). From within h, the compiler knows very little about an opaque function of type &Fn() and so could only conservatively call f with a virtual dispatch. To dispatch statically, the compiler would have to inline the call to h::<&Fn()>(f) at its call site, which is not guaranteed if h is too complex.
First, lambda without capture are convertible to a function pointer. So they provide some form of genericity.
Now why lambdas with capture are not convertible to pointer? Because the function must access the state of the lambda, so this state would need to appear as a function argument.
To avoid name collisions with user code.
Even two lambdas with same implementation will have different types. Which is okay because I can have different types for objects too even if their memory layout is equal.

Declaring variable with name `this` inside lambda inside parentheses leads to different results on 3 different compilers

In C++ it's possible to declare variable inside parentheses like int (x) = 0;. But it seems that if you use this instead of variable name, then constructor is used instead: A (this); calls A::A(B*). So the first question is why it's different for this, is it because variables can't be named this? And to complicate matters a bit lets put this inside a lambda -
struct B;
struct A
{
A (B *) {}
};
struct B
{
B ()
{
[this] { A (this); } ();
}
};
Now gcc calls A::A(B*), msvc prints error about missing default constructor and clang prints expected expression (https://godbolt.org/g/Vxe0fF). It's even funnier in msvc - it really creates variable with name this that you can use, so it's definitely a bug (https://godbolt.org/g/iQaaPH). Which compiler is right and what are the reasons for such behavior?
In the C++ standard §5.1.5 (article 7 for C++11, article 8 later standard) [expr.prim.lambda]:
The lambda-expression’s compound-statement yields the function-body (8.4) of the function call operator, but
for purposes of name lookup (3.4), determining the type and value of this (9.2.2.1) and transforming id-
expressions referring to non-static class members into class member access expressions using (*this) (9.2.2),
the compound-statement is considered in the context of the lambda-expression. [ Example:
struct S1 {
int x, y;
int operator()(int);
void f() {
[=]()->int {
return operator()(this->x + y); // equivalent to S1::operator()(this->x + (*this).y)
// this has type S1*
};
}
};
— end example ]
Thus, gcc is right. You will notice that their is no exception about the fact that you are capturing this. Their is, however a precision since C++14 in the case where you capture *this, still in §5.1.5 (article 17):
If *this is captured by copy, each odr-use of this is transformed into a pointer to the corresponding
unnamed data member of the closure type, cast (5.4) to the type of this.

Forward declaration of lambdas in C++

In C++, it is possible to separate the declaration and definition of functions. For example, it is quite normal to declare a function:
int Foo(int x);
in Foo.h and implement it in Foo.cpp. Is it possible to do something similar with lambdas? For example, define a
std::function<int(int)> bar;
in bar.h and implement it in bar.cpp as:
std::function<int(int)> bar = [](int n)
{
if (n >= 5)
return n;
return n*(n + 1);
};
Disclaimer: I have experience with lambdas in C#, but I have not used them in C++ very much.
You can't separate declaration and definition of lambdas, neither forward declare it. Its type is a unique unnamed closure type which is declared with the lambda expression. But you could do that with std::function objects, which is designed to be able to store any callable target, including lambdas.
As your sample code shown you've been using std::function, just note that for this case bar is a global variable indeed, and you need to use extern in header file to make it a declaration (not a definition).
// bar.h
extern std::function<int(int)> bar; // declaration
and
// bar.cpp
std::function<int(int)> bar = [](int n) // definition
{
if (n >= 5) return n;
return n*(n + 1);
};
Note again that this is not separate declaration and definition of lambda; It's just separate declaration and definition of a global variable bar with type std::function<int(int)>, which is initialized from a lambda expression.
Strictly speaking you can't
Quoting from cpp reference
The lambda expression is a prvalue expression whose value is (until
C++17)whose result object is (since C++17) an unnamed temporary object
of unique unnamed non-union non-aggregate class type, known as closure
type, which is declared (for the purposes of ADL) in the smallest
block scope, class scope, or namespace scope that contains the lambda
expression
So the lambda is a unnamed temporary object. You can bind the lambda to a l-value object(for eg std::function) and by regular rules about variable declaration you can separate the declaration and definition.
Forward declaration is not the correct term because lambdas in C++ are objects, not functions. The code:
std::function<int(int)> bar;
declares a variable and you're not forced to assign it (that type has a default value of "pointer to no function"). You can compile even calls to it... for example the code:
#include <functional>
#include <iostream>
int main(int argc, const char *argv[]) {
std::function<int(int)> bar;
std::cout << bar(21) << "\n";
return 0;
}
will compile cleanly (but of course will behave crazily at runtime).
That said you can assign a lambda to a compatible std::function variable and adding for example:
bar = [](int x){ return x*2; };
right before the call will result in a program that compiles fine and generates as output 42.
A few non-obvious things that can be surprising about lambdas in C++ (if you know other languages that have this concept) are that
Each lambda [..](...){...} has a different incompatible type, even if the signature is absolutely identical. You for example cannot declare a parameter of lambda type because the only way would be to use something like decltype([] ...) but then there would be no way to call the function as any other []... form at a call site would be incompatible. This is solved by std::function so if you have to pass lambdas around or store them in containers you must use std::function.
Lambdas can capture locals by value (but they're const unless you declare the lambda mutable) or by reference (but guaranteeing the lifetime of the referenced object will not be shorter than the lifetime of the lambda is up to the programmer). C++ has no garbage collector and this is something needed to solve correctly the "upward funarg" problem (you can work-around by capturing smart pointers, but you must pay attention to reference loops to avoid leaks).
Differently from other languages lambdas can be copied and when you copy them you're taking a snapshot of their internal captured by-value variables. This can be very surprising for mutable state and this is I think the reason for which captured by-value values are const by default.
A way to rationalize and remember many of the details about lambdas is that code like:
std::function<int(int)> timesK(int k) {
return [k](int x){ return x*k; };
}
is basically like
std::function<int(int)> timesK(int k) {
struct __Lambda6502 {
int k;
__Lambda6502(int k) : k(k) {}
int operator()(int x) {
return x * k;
}
};
return __Lambda6502(k);
}
with one subtle difference that even lambda capturing references can be copied (normally classes containing references as members cannot).

Is it safe to cast a lambda function to a function pointer?

I have this code:
void foo(void (*bar)()) {
bar();
}
int main() {
foo([] {
int x = 2;
});
}
However, I'm worried that this will suffer the same fate as:
struct X { int i; };
void foo(X* x) {
x->i = 2;
}
int main() {
foo(&X());
}
Which takes the address of a local variable.
Is the first example completely safe?
A lambda that captures nothing is implicitly convertible to a function pointer with its same argument list and return type. Only capture-less lambdas can do this; if it captures anything, then they can't.
Unless you're using VS2010, which didn't implement that part of the standard, since it didn't exist yet when they were writing their compiler.
Yes I believe the first example is safe, regardless of the life-time of all the temporaries created during the evaluation of the full-expression that involves the capture-less lambda-expression.
Per the working draft (n3485) 5.1.2 [expr.prim.lambda] p6
The closure type for a lambda-expression with no lambda-capture has a
public non-virtual non-explicit const conversion function to pointer
to function having the same parameter and return types as the closure
type’s function call operator. The value returned by this conversion
function shall be the address of a function that, when invoked, has
the same effect as invoking the closure type’s function call operator.
The above paragraph says nothing about the pointer-to-function's validity expiring after evaluation of the lambda-expression.
For e.g., I would expect the following to work:
auto L = []() {
return [](int x, int y) { return x + y; };
};
int foo( int (*sum)(int, int) ) { return sum(3, 4); }
int main() {
foo( L() );
}
While implementation details of clang are certainly not the final word on C++ (the standard is), if it makes you feel any better, the way this is implemented in clang is that when the lambda expression is parsed and semantically analyzed a closure-type for the lambda expression is invented, and a static function is added to the class with semantics similar to the function call operator of the lambda. So even though the life-time of the lambda object returned by 'L()' is over within the body of 'foo', the conversion to pointer-to-function returns the address of a static function that is still valid.
Consider the somewhat analagous case:
struct B {
static int f(int, int) { return 0; }
typedef int (*fp_t)(int, int);
operator fp_t() const { return &f; }
};
int main() {
int (*fp)(int, int) = B{};
fp(3, 4); // You would expect this to be ok.
}
I am certainly not a core-c++ expert, but FWIW, this is my interpretation of the letter of the standard, and I feel it is defendable.
Hope this helps.
In addition to Nicol's perfectly correct general answer, I would add some views on your particular fears:
However, I'm worried that this will suffer the same fate as ..., which
takes the address of a local variable.
Of course it does, but this is absolutely no problem when you just call it inside foo (in the same way your struct example is perfectly working), since the surrounding function (main in this case) that defined the local variable/lambda will outlive the called function (foo) anyway. It could only ever be a problem if you would safe that local variable or lambda pointer for later use. So
Is the first example completely safe?
Yes, it is, as is the second example, too.

Conversion of lambda expression to function pointer

This is a follow up question to this question: Lambda how can I pass as a parameter
MSDN supposedly has marked the item as fixed. I took a look at the specifications, but I'm having trouble converting their specifications into what the syntax should be.
So for example:
void printOut(int(*eval)(int))
{
for(int x = 0; x < 4; ++x)
{
std::cout << eval(x) << std::endl;
}
}
Now say I have the lambda:
auto lambda1 = [](int x)->int{return x;};
What is the syntax to convert lambda1 into the functional pointer equivalent so it can be passed to printOut?
Also, what about lambdas which actually have something in the brackets? For example:
int y = 5;
auto lambda2 = [y](void)->int{return y;};
If this kind of lambda can't be converted to a function pointer, is there an alternative method for passing this type of lambda expression to printOut (or even a modified version of printOut, if so what's the syntax)?
There is no syntax per se, it's an implicit conversion. Simply cast it (explicitly or implicitly) and you'll get your function pointer. However, this was fixed after Visual Studio 2010 was released, so is not present.†
You cannot make a capture-full lambda into a function pointer ever, as you noted, so it's the function printOut that'll have to change. You can either generalize the function itself:
// anything callable
template <typename Func>
void printOut(Func eval)
{
// ...
}
Or generalize the function type in particular:
// any function-like thing that fits the int(int) requirement
void printOut(std::function<int(int)> eval)
{
// ...
}
Each has their own trade-off.
†As far as I know, it's unknown of we'll get it in a service pack, or if we need to wait until a new release.