Why is this recursive lambda function unsafe? - c++

This question comes from Can lambda functions be recursive? . The accepted answer says the recursive lambda function shown below works.
std::function<int (int)> factorial = [&] (int i)
{
return (i == 1) ? 1 : i * factorial(i - 1);
};
However, it is pointed out by a comment that
such a function cannot be returned safely
, and the reason is supplied in this comment:
returning it destroys the local variable, and the function has a reference to that local variable.
I don't understand the reason. As far as I know, capturing variables is equivalent to retaining them as data members (by value or by reference, according to the capture list). So what is the "local variable" in this context? Also, the code below compiles and works correctly even with the -Wall -Wextra -std=c++11 options on g++ 7.4.0.
#include <iostream>
#include <functional>
int main() {
std::function<int (int)> factorial = [&factorial] (int i)
{
return (i == 1) ? 1 : i * factorial(i - 1);
};
std::cout << factorial(5) << "\n";
}
Why is the function unsafe? Is this problem limited to this function, or lambda expression as a whole?

This is because, in order to be recursive, the lambda relies on type erasure and captures the type-erased container by reference.
This allows the lambda to be used inside itself, by referring to it indirectly through the std::function.
However, for this to work, it must capture the std::function by reference, and that object has automatic storage duration.
Your lambda contains a reference to a local std::function. Even if you return the std::function by copy, the lambda inside the copy still refers to the original one, which has been destroyed.
To make a recursive lambda that is safe to return, you can pass the lambda to itself as an auto parameter (a C++14 generic lambda) and wrap that in another lambda:
auto factorial = [](auto self, int i) -> int {
return (i == 1) ? 1 : i * self(self, i - 1);
};
return [factorial](int i) { return factorial(factorial, i); };

Related

When is it safe to capture a lambda inside another lambda by reference?

Suppose you have the following program:
static std::function<int(int)> pack_a_lambda( std::function<int(int)> to_be_packed ) {
return [=]( int value ) {
return to_be_packed( value * 4 );
};
}
int main() {
auto f = pack_a_lambda( []( int value ) {
return value * 2;
} );
int result = f( 2 );
std::cout << result << std::endl; // should print 16
return 0;
}
I haven't tried the exact code above, because I tested it in Google Test and then slightly edited it into the form shown here. So, the function pack_a_lambda takes a lambda by value as input. Here, I believe the temporary lambda is copied. Then, when we create the new lambda, we again capture the copied lambda to_be_packed by value. It works, and it seems to me that it should be safe.
Now suppose we capture that lambda by reference instead:
static std::function<int(int)> pack_a_lambda( std::function<int(int)> to_be_packed ) {
return [&]( int value ) {
return to_be_packed( value * 4 );
};
}
In my specific use case, the resulting lambda executes four times faster. In the simplified example above I couldn't reproduce this difference, though. In fact, here it seems that capturing the lambda by reference makes it ever-so-slightly slower. So there is clearly some performance difference.
But is it safe? The argument to_be_packed is copied, but it's still a temporary, right? That should make it unsafe. But I'm not sure. My UB sanitizer and my AddressSanitizer do not complain, but I concede that doesn't prove anything. If I pass to_be_packed by reference...
static std::function<int(int)> pack_a_lambda( const std::function<int(int)> &to_be_packed ) {
return [&]( int value ) {
return to_be_packed( value * 4 );
};
}
...the AddressSanitizer complains, which is not surprising, because the lambda I pass into the function is also a temporary. So that leaves example two: Is it safe or not, and what are possible reasons it might be faster to execute in some cases?
static std::function<int(int)> pack_a_lambda( std::function<int(int)> to_be_packed ) {
return [&]( int value ) {
return to_be_packed( value * 4 );
};
}
is undefined behavior, because the returned lambda holds a reference to the local variable to_be_packed, which is destroyed when the function returns.
Capturing by value is the safe way here.
static std::function<int(int)> pack_a_lambda(const std::function<int(int)>& to_be_packed ) {
return [&]( int value ) {
return to_be_packed( value * 4 );
};
}
might be correct, but you have to ensure that the lifetime of the argument passed in is longer than that of the returned std::function.
auto func = std::function([]( int value ) {
return value * 2;
});
auto f = pack_a_lambda(func); // OK
// auto f2 = pack_a_lambda([](int){ return 42;}); // KO: temporary std::function created
Since a temporary can bind to a const reference, it is safer in that case to delete the rvalue overload:
static std::function<int(int)> pack_a_lambda(std::function<int(int)>&&) = delete;
When is it safe to capture a lambda inside another lambda by reference?
Same as with any captured object: it is safe when the lifetime of the captured object is longer than the capturing lambda.
In your example, you capture a function argument. Its lifetime ends when the function returns. But you return the capturing lambda to the outside of the function. There, the captured reference will be invalid.

How to make lambdas work with std::nullopt

Background
I have a series of lambdas that perform different checks on the captured variables and return std::nullopt if the check failed. return std::nullopt is the first return statement. Then, if the check succeeded, they go on and compute the value.
Problem
The types of the return expressions are not consistent: the first return statement deduces std::nullopt_t and the second std::optional<int>. A std::optional<T> cannot be converted to std::nullopt_t, even though the other way around works. In particular, I'd like the following code to compile and run, printing 2:
#include <functional>
#include <utility>
#include <optional>
int x = 3;
auto lambda = [](){
if (x == 2)
return std::nullopt;
return std::optional(2);
};
#include <iostream>
int main () {
using return_type = std::invoke_result_t<decltype(lambda)>;
static_assert(std::is_same<return_type, std::optional<int>>{},
"return type is still std::nullopt_t");
std::cout << lambda().value() << '\n';
}
Thoughts
I believe that I need to use std::common_type<Args...> somewhere, but I can neither enforce its presence nor deduce Args, as that might require language support.
Instead of using template type deduction to infer the return type of the lambda, why not explicitly specify that return type?
auto lambda = []() -> std::optional<int> {
if (x == 2)
return std::nullopt;
return 2;
};
std::common_type is commonly used with templates, which you don't have.
I suggest sticking with a single return statement and an explicitly typed result variable, without using nullopt at all. It looks somewhat misleading when a function returns either an integer or nullopt, especially if the function is longer. Also, if the value type has an explicit constructor, using emplace avoids typing the value type name again.
auto lambda = []()
{
std::optional<int> result{};
if(2 != x)
{
result.emplace(2);
}
return result;
};

Lambda: A by-reference capture that could dangle

Scott Meyers, in Effective Modern C++, says in the chapter on lambdas:
Consider the following code:
void addDivisorFilter()
{
auto calc1 = computeSomeValue1();
auto calc2 = computeSomeValue2();
auto divisor = computeDivisor(calc1, calc2);
filters.emplace_back(
[&](int value) { return value % divisor == 0; }
);
}
This code is a problem waiting to happen. The lambda refers to the local variable divisor, but that variable ceases to exist when addDivisorFilter returns. That's immediately after filters.emplace_back returns, so the function that's added to filters is essentially dead on arrival. Using that filter yields undefined behaviour from virtually the moment it's created.
The question is: why is this undefined behaviour? From what I understand, filters.emplace_back only returns after the lambda expression is complete, and, during its execution, divisor is valid.
Update
An important detail that I forgot to include is:
using FilterContainer = std::vector<std::function<bool(int)>>;
FilterContainer filters;
That's because the scope of the vector filters outlives the one of the function. At function exit, the vector filters still exists, and the captured reference to divisor is now dangling.
From what I understand, filters.emplace_back only returns after the lambda expression is complete, and, during its execution, divisor is valid.
That's not true. The vector stores the closure object created by the lambda expression; it does not "execute" the lambda. You execute the lambda after the function exits. Technically, the lambda expression creates a closure object of an unnamed, compiler-generated class that holds a reference internally, like
#include <vector>
#include <functional>
struct _AnonymousClosure
{
int& _divisor; // this is what the lambda captures
bool operator()(int value) { return value % _divisor == 0; }
};
int main()
{
std::vector<std::function<bool(int)>> filters;
// local scope
{
int divisor = 42;
filters.emplace_back(_AnonymousClosure{divisor});
}
// UB here when using filters, as the reference to divisor dangles
}
You are not evaluating the lambda function while addDivisorFilter is active. You are simply adding "the function" to the collection, not knowing when it might be evaluated (possibly long after addDivisorFilter returned).
In addition to @vsoftco's answer, the following modified example code lets you experience the problem:
#include <iostream>
#include <functional>
#include <vector>
void addDivisorFilter(std::vector<std::function<bool(int)>>& filters)
{
int divisor = 5;
filters.emplace_back(
[&](int value) { return value % divisor == 0; }
);
}
int main()
{
std::vector<std::function<bool(int)>> filters;
addDivisorFilter(filters);
std::cout << std::boolalpha << filters[0](10) << std::endl;
return 0;
}
When run, this example can fail with a floating point exception, since the reference to divisor is no longer valid when the lambda is evaluated in main. Because the behaviour is undefined, other outcomes are possible too.

What is this C++14 construct called which seems to chain lambdas?

This is a follow-up question on this one: Lambda-Over-Lambda in C++14, where the answers explain the code.
It is about a lambda (terminal) that returns a second lambda. When the second lambda is called with a function, it calls that function on the stored value and passes the return value back to the original lambda, thus returning a new instance of the second lambda.
The example shows how this way lambdas can be chained.
Copy from the original question:
#include <cstdio>
auto terminal = [](auto term) // <---------+
{ // |
return [=] (auto func) // | ???
{ // |
return terminal(func(term)); // >---------+
};
};
auto main() -> int
{
auto hello =[](auto s){ fprintf(s,"Hello\n"); return s; };
auto world =[](auto s){ fprintf(s,"World\n"); return s; };
terminal(stdout)
(hello)
(world) ;
return 0;
}
Is there already a name for this construct and if not what should it be called?
Does it resemble constructs in other languages?
Remark: I'm not interested in whether it is actually useful.
I looked around a bit, and it turns out the main functionality is reordering the function calls, as explained in the answers to the original question.
So world(hello(stdout)); is rewritten to terminal(stdout)(hello)(world); which more generally could be written as compose(stdout)(hello)(world);.
In Haskell this would be written as world . hello $ stdout and is called function composition.
In Clojure it would be (-> stdout hello world) and is called the "thread-first" macro.
I think it is only useful with decent partial application which lambdas provide a little bit, so we could have compose(4)([](int x){ return x + 7; })([](int x){ return x * 2; })([](int x){ return x == 22; }); which should return true if my calculation (and blind coding) is any good.
or to emphasize the partial application:
auto add7 = [](int x){ return x + 7; };
auto dbl = [](int x){ return x * 2; };
auto equal22 = [](int x){ return x == 22; };
assert(compose(4)(add7)(dbl)(equal22));
One major issue with this implementation is probably that the result can't be evaluated, because in the end a lambda is returned, so the construction in this answer might be better suited (functions separated by commas instead of parentheses).
terminal(x) returns an applicator that method-chains its return value into terminal for repeated invocation.
But we could instead generalize it.
Suppose you have a function F. F takes an argument, and stuffs it on a stack.
It then examines the stack. If the top of the stack, evaluated on some subset of the stack, would work for invocation, it does it, and pushes the result back onto the stack. In general, such invocation could return a tuple of results.
So:
F(3)(2)(add)(2)(subtract)(7)(3)(multiply)(power)
would evaluate to:
((3+2)-2)^(7*3)
Your terminal does this with 0 argument functions (the first argument) and with 1 argument functions (every argument after that), and only supports 1 return value per invocation.
Doing this with a lambda would be tricky, but what I described is doable in C++.
So one name for it would be stack-based programming.
As far as I know there is no "official" name, yet.
Suggestions:
Lambda chain
Lambda sausage
Curry sausage

What is wrong with my Phoenix lambda expression?

I would expect the following example Boost Phoenix expression to compile.
What am I missing?
int plus(int a,int b)
{
return a+b;
}
int main(int argc,char** argv)
{
auto plus_1 = phx::bind(&plus,1,arg1);
auto value = phx::lambda[phx::val(plus_1)(arg1)]()(1);
std::cout << value << std::endl;
}
auto plus_1 = phx::bind(&plus,1,arg1);
After this line, plus_1 is a function object that takes one int argument and adds one to it.
phx::lambda[plus_1(arg1)](1);
Whoops. This isn't going to work because (as we said above) plus_1 is a function object that takes one int argument and adds one to it. Here, you're trying to invoke it with arg1.
It isn't obvious from your code what you expect it to do. Can you clarify?
====EDIT====
I see you've edited the code in your question. Your code is still wrong but for a different reason now. This:
phx::val(plus_1)(arg1)
... uses val to create a nullary function that returns the plus_1 unary function. You then try to invoke the nullary function with arg1. Boom.
Here is code that executes and does (what I believe) you intend:
#include <iostream>
#include <boost/phoenix/phoenix.hpp>
namespace phx = boost::phoenix;
using phx::arg_names::arg1;
int plus(int a,int b)
{
return a+b;
}
int main()
{
auto plus_1 = phx::bind(&plus, 1, arg1);
int value = phx::bind(phx::lambda[plus_1], arg1)(1);
std::cout << value << std::endl;
}
The first bind takes the binary plus and turns it into a unary function with the first argument bound to 1. The second bind creates a new unary function that is equivalent to the first, but it does so by safely wrapping the first function using lambda. Why is that necessary? Consider the code below, which is equivalent, but without the lambda:
// Oops, wrong:
int value = phx::bind(phx::bind(&plus, 1, arg1), arg1)(1);
Notice that arg1 appears twice. All expressions get evaluated from the inside out. First, we'll bind the inner arg1 to 1, then evaluate the inner bind yielding 2, which we then try to bind and invoke. That's not going to work because 2 isn't callable.
The use of lambda creates a scope for the inner arg1 so it isn't eagerly substituted. But like I said, the use of the second bind, which forces the need for lambda, yields a function that is equivalent to the first. So it's needlessly complicated. But maybe it helped you understand about bind, lambda and Phoenix scopes.
It's not clear to me what you're trying to accomplish by using lambda here, but if you just want to call plus_1 with 1 (resulting in 2), it's much simpler than your attempt:
#include <iostream>
#include <boost/phoenix.hpp>
int plus(int a, int b)
{
return a + b;
}
int main()
{
namespace phx = boost::phoenix;
auto plus_1 = phx::bind(plus, 1, phx::arg_names::arg1);
std::cout << plus_1(1) << '\n';
}
Online demo
If this isn't what you're trying to accomplish, then you need to describe what you actually want. :-]
Perhaps this can explain it better.
Phoenix is not magic; it is first and foremost C++. It therefore follows the rules of C++.
phx::bind is a function that returns a function object, an object which has an overloaded operator() that calls the function that was bound. Your first statement stores this object into plus_1.
Given all of this, anytime you have the expression plus_1(...), this is a function call. That's what it is; you are saying that you want to call the overloaded operator() function on the type of that object, and that you are going to pass some values to that function.
It doesn't matter whether that expression is in the middle of a [] or not. phx::lambda cannot make C++ change its rules. It can't make plus_1(...) anything other than an immediate function call. Nor can arg1 make plus_1(...) not an immediate function call.