Calling std::function destructor while still executing it - c++

I want to change the behaviour of a class method dynamically, so I implemented the method so that it calls the operator() of a std::function. At any given time the std::function holds a copy of one lambda, and which lambda it holds depends on values known only after the class has been constructed.
The lambdas change the state of the class, so they reset a container that holds the behaviours of all the dynamic methods.
When I implemented this, I could no longer access the lambda's captures after resetting the container.
The following snippet reproduces the problem:
#include <functional>
#include <iostream>
#include <string>
#include <vector>
std::vector< std::function<void(std::string)> > vector;
int main() {
//Change class state when variable value will be known
std::string variableValue = "hello";
auto function = [variableValue](std::string arg) {
std::cout <<"From capture list, before: "<< variableValue << std::endl;
std::cout <<"From arg, before: " << arg << std::endl;
vector.clear();
std::cout << "From capture list, after: " << variableValue << std::endl;
std::cout << "From arg, after: " << arg << std::endl;
};
vector.push_back(function);
//Dynamic method execution
vector[0](variableValue);
return 0;
}
Producing output:
From capture list, before: hello
From arg, before: hello
From capture list, after:
From arg, after: hello
where variableValue is invalidated after the vector was cleared.
Is the capture list invalidation an expected result?
Is it safe to use any other local variable, not only those in the capture list, after calling the std::function destructor?
Is there a suggested way / pattern to accomplish the same behaviour in a safer way (excluding huge switches/ifs on class states)?

We can get rid of the std::function, lambda and vector for this question. Since lambdas are just syntactic sugar for classes with a function-call operator, your testcase is effectively the same as this:
#include <iostream>
#include <string>
struct Foo
{
std::string variableValue = "hello";
void bar(std::string arg)
{
std::cout <<"From capture list, before: "<< variableValue << std::endl;
std::cout <<"From arg, before: " << arg << std::endl;
delete this; // ugrh
std::cout << "From capture list, after: " << variableValue << std::endl;
std::cout << "From arg, after: " << arg << std::endl;
}
};
int main()
{
Foo* ptr = new Foo();
ptr->bar("hello"); // bar() deletes *ptr, so ptr must not be used afterwards
}
The function argument is fine because it's a copy, but after delete this the member Foo::variableValue no longer exists, so your program has undefined behaviour from trying to use it.
Common wisdom is that continuing to run the function itself is legal (because function definitions aren't objects and cannot be "deleted"; they are just a fundamental property of your program), as long as you leave the encapsulating class's members well enough alone.
I would, however, advise avoiding this pattern unless you really need it. It can easily confuse people about the ownership responsibilities of your class (even when "your class" is automatically generated from a lambda expression!).
Is the capture list invalidation an expected result?
Yes.
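Is it safe to use any other local variable, not only those in the capture list, after calling the std::function destructor?
Yes. Locals and parameters of operator() live in the call's own stack frame rather than in the closure object, so destroying the closure does not touch them.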
Is there a suggested way / pattern to accomplish the same behaviour in a safer way (excluding huge switches/if on class states)?
That's impossible to say for sure without understanding what it is that you're trying to do. But you could try storing shared_ptrs to the function objects in your vector instead. Just be careful not to capture a shared_ptr to the function object in the lambda itself, or the resulting reference cycle means it'll never be cleaned up! Capturing a weak_ptr instead can be good for this; it can be "converted" to a shared_ptr inside the lambda body, which will protect the lambda's life for the duration of said body.

std::function's destructor destroys the object's target if the object is non-empty, where the target is the wrapped callable object.
In your case, the target is the closure object created by a lambda expression. When you use a lambda expression, the compiler generates a "non-union non-aggregate class type" that contains the captures-by-value as data members and has operator() as a member function.
When you execute vector.clear(), the destructors of its elements are run, and therefore the destructors of the closure's captures-by-value, which are member variables, are run.
As for captures-by-reference, "the reference variable's lifetime ends when the lifetime of the closure object ends."
So, it is not safe to access any capture, whether by value or by reference, after std::function's destructor runs.
What about the actual operator()? "Functions are not objects," so they don't have lifetimes. So, the mere execution of the operator() after the destructor has been run should be fine, as long as you don't access any captures. See the conditions under which one can safely delete this.

Lambda capturing - what is the difference between & and this? [duplicate]

If I need to generate a lambda that calls a member function, should I capture by reference or capture 'this'? My understanding is that '&' captures only the variables used, but 'this' captures all member variables. So is it better to use '&'?
#include <functional>
class MyClass {
public:
int mFunc() {
// accesses member variables
return value_;
}
std::function<int()> get() {
return [this] () { return this->mFunc(); };
// or
//return [&] () { return this->mFunc(); };
}
private:
// member variables (value_ is a placeholder so that the example compiles)
int value_ = 0;
};
For the specific example you've provided, capturing by this is what you want. Conceptually, capturing this by reference doesn't make a whole lot of sense, since you can't change the value of this, you can only use it as a pointer to access members of the class or to get the address of the class instance. Inside your lambda function, if you access things which implicitly use the this pointer (e.g. you call a member function or access a member variable without explicitly using this), the compiler treats it as though you had used this anyway. You can list multiple captures too, so if you want to capture both members and local variables, you can choose independently whether to capture them by reference or by value. The following article should give you a good grounding in lambdas and captures:
https://crascit.com/2015/03/01/lambdas-for-lunch/
Also, your example uses std::function as the return type through which the lambda is passed back to the caller. Be aware that std::function isn't always as cheap as you may think, so if you are able to use a lambda directly rather than having to wrap it in a std::function, it will likely be more efficient. The following article, while not directly related to your original question, may still give you some useful material relating to lambdas and std::function (see the section An alternative way to store the function object, but the article in general may be of interest):
https://crascit.com/2015/06/03/on-leaving-scope-part-2/
Here is a good explanation of what &, this and the others indicate when used in the capture list.
In your case, assuming all you have to do is call a member function on the instance referred to by the this of the currently executing method, putting this in your capture list should be enough.
Capturing this and capturing by reference are two orthogonal concepts. You can use one, both, or none. It doesn't make sense to capture this by reference but you can capture other variables by reference while capturing this by value.
It's not a clear-cut situation where one is better than the other. Rather, the two (at least potentially) accomplish slightly different things. For example, consider code like this:
#include <iostream>
class foo {
int bar = 0;
public:
void baz() {
int bar = 1;
auto thing1 = [&] { bar = 2; };
auto thing2 = [this] { this->bar = 3; };
std::cout << "Before thing1: local bar: " << bar << ", this->bar: " << this->bar << "\n";
thing1();
std::cout << "After thing1: local bar: " << bar << ", this->bar: " << this->bar << "\n";
thing2();
std::cout << "After thing2: local bar: " << bar << ", this->bar: " << this->bar << "\n";
}
};
int main() {
foo f;
f.baz();
}
As you can see, capturing this captures only the variables that can be referred to via this. In this case, we have a local variable that shadows an instance variable (yes, that's often a bad idea, but in this case we're using it to show part of what each does). As we see when we run the program, we get different results from capturing this vs. an implicit capture by reference:
Before thing1: local bar: 1, this->bar: 0
After thing1: local bar: 2, this->bar: 0
After thing2: local bar: 2, this->bar: 3
As to the specifics of capturing everything vs. only what you use: neither will capture any variable you don't use. But, since this is a pointer, capturing that one variable gives you access to everything it points at. That's not unique to this though. Capturing any pointer will give you access to whatever it points at.

C++ method in thread. Difference between passing: object, object's address, std::ref of object

I am trying to execute an object's method in a C++ thread.
I am able to do it, by passing the method's address and the object (or the object's address, or std::ref(my_obj)) to the thread's constructor.
I observed that if I pass the object, rather than the object's address or std::ref(my_obj), then the object gets copied twice (I'm printing some info in the copy constructor to see that).
Here is the code:
#include <iostream>
#include <string>
#include <thread>
using namespace std;
class Warrior{
string _name;
public:
// constructor
Warrior(string name): _name(name) {}
// copy constructor (prints every time the object is copied)
Warrior(const Warrior & other): _name("Copied " + other._name){
cout << "Copying warrior: \"" << other._name;
cout << "\" into : \"" << _name << "\"" << endl;
}
void attack(int damage){
cout << _name << " is attacking for " << damage << "!" << endl;
}
};
int main(){
Warrior conan("Conan");
// run conan.attack(5) in a separate thread
thread t(&Warrior::attack, conan, 5);
t.join(); // wait for thread to finish
}
The output I get in this case is
Copying warrior: "Conan" into : "Copied Conan"
Copying warrior: "Copied Conan" into : "Copied Copied Conan"
Copied Copied Conan is attacking for 5!
While if I simply pass &conan or std::ref(conan) as a second argument to thread t(...) (instead of passing conan), the output is just:
Conan is attacking for 5!
I have 4 questions:
Why is it that I have 2 copies of the object instead of 1?
I was expecting that by passing the instance of the object to the thread's constructor, the object would get copied once in the thread's own stack, and then the attack() method would be called on that copy.
What is the exact reason why the thread's constructor can accept an object, an address, or a std::ref? Is it using this version of the constructor (which I admit I do not fully understand)
template< class Function, class... Args >
explicit thread( Function&& f, Args&&... args );
in all 3 cases?
If we exclude the first case (since it's inefficient), what should I use between &conan and std::ref(conan)?
Is this somehow related to the syntax required by std::bind?
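Why is it that I have 2 copies of the object instead of 1?
When you spin up a thread, the arguments are copied into the thread object. Those arguments are then transferred into the actual thread that gets created, so you have two copies. (The second transfer would typically be a move if Warrior had a move constructor; declaring a copy constructor suppresses the implicit one, so it falls back to a copy.) This is also why you have to use std::ref when you want to pass an argument that the function takes by reference.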
What is the exact reason why the thread's constructor can accept an object, an address, or a std::ref? Is it using this version of the constructor (which I admit I do not fully understand)
std::thread basically starts the new thread with a call like
std::invoke(decay_copy(std::forward<Function>(f)),
decay_copy(std::forward<Args>(args))...);
std::invoke is built to handle all different sorts of callables and one of those is when it has a member function pointer and an object, and it calls the function appropriately. It also knows about std::reference_wrapper and can handle calling a pointer to a member function on a std::reference_wrapper to an object.
If we exclude the first case (since it's inefficient), what should I use between &conan and std::ref(conan)?
This is primarily opinion based. They both essentially do the same thing, although the first version is shorter to write.
Is this somehow related to the syntax required by std::bind?
Kind of. std::bind's operator() is also implemented using std::invoke so they have a very common interface.
All of that said you can use a lambda to give yourself a common interface.
thread t(&Warrior::attack, conan, 5);
can be rewritten as
thread t([&](){ return conan.attack(5); });
And you can use this form for pretty much any other function you want to call. I find it is easier to parse when seeing a lambda.

How does std::future affect the lifetime of an associated std::packaged_task?

I have an std::packaged_task containing a lambda that captures a variable by copy. When this std::packaged_task is deleted, I would expect the variable living inside the lambda to be destructed, but I noticed that if I get the associated std::future for this std::packaged_task, the future object extends the lifetime of the variable inside the lambda.
For example:
#include <iostream>
#include <future>
class Dummy
{
public:
Dummy() {std::cout << this << ": default constructed;" << std::endl;}
Dummy(const Dummy&) {std::cout << this << ": copy constructed;" << std::endl;}
Dummy(Dummy&&) {std::cout << this << ": move constructed;" << std::endl;}
~Dummy() {std::cout << this << ": destructed;" << std::endl;}
};
int main()
{
std::packaged_task<void()>* p_task;
{
Dummy ScopedDummy;
p_task = new std::packaged_task<void()>([ScopedDummy](){std::cout << "lambda call with: " << &ScopedDummy << std::endl;});
std::cout << "p_task completed" << std::endl;
}
{
std::future<void> future_result;
{
future_result = p_task->get_future();
(*p_task)();
delete p_task;
}
std::cout << "after p_task has been deleted, the scope of future_result determines the scope of the dummy inside p_task" << std::endl;
}
std::cout << "p_task cleans up when future_result dies" << std::endl;
}
A possible output is:
0x7fff9cf873fe: default constructed;
0x7fff9cf873ff: copy constructed;
0x1904b38: move constructed;
0x7fff9cf873ff: destructed;
0x7fff9cf873fe: destructed;
lambda call with: 0x1904b38
after p_task has been deleted, the scope of future_result determines the scope of the dummy inside p_task
0x1904b38: destructed;
p_task cleans up when future_result dies
So the object inside the lambda has its lifetime extended by the scope of future_result.
If we comment out the line future_result = p_task->get_future();, a possible output is:
0x7fff57087896: default constructed;
0x7fff57087897: copy constructed;
0x197cb38: move constructed;
0x7fff57087897: destructed;
0x7fff57087896: destructed;
lambda call with: 0x197cb38
0x197cb38: destructed;
after p_task has been deleted, the scope of future_result determines the scope of the dummy inside p_task
p_task cleans up when future_result dies
I have been wondering what mechanism comes into play here. Does std::future contain some link that keeps associated objects alive?
Looking at the gcc 7.2.0 packaged_task sources, we read:
packaged_task(allocator_arg_t, const _Alloc &__a, _Fn &&__fn)
: _M_state(__create_task_state<_Res(_ArgTypes...)>(std::forward<_Fn>(__fn), __a)){}
~packaged_task()
{
if (static_cast<bool>(_M_state) && !_M_state.unique())
_M_state->_M_break_promise(std::move(_M_state->_M_result));
}
where _M_state is a shared_ptr to the internal packaged_task shared state. So it turns out that gcc stores the callable as part of the packaged_task shared state, which ties the callable's lifetime to whichever of the packaged_task, future and shared_future dies last.
In comparison, clang does not: it destroys the callable when the packaged_task is destroyed (in fact, my copy of clang stores the callable as a proper member).
Who's right? The standard is not very clear about the stored task's lifetime. On the one hand, we have
[futures.task]
packaged_task defines a type for wrapping a function or callable object so that the return value of the function or callable object is stored in a future when it is invoked.
packaged_task(F&& f)[...]Constructs a new packaged_task object with a shared state and initializes the object’s stored task with std::forward<F>(f).
packaged_task(packaged_task&& rhs)[...]Moves the stored task from rhs to *this.
reset()[...]Effects: As if *this = packaged_task(std::move(f)), where f is the task stored in *this.
that suggests the callable is owned by the packaged_task, but we also have
[futures.state]
-Many of the classes introduced in this subclause use some state to communicate results. This shared state consists of some state information and some (possibly not yet evaluated) result, which can be a (possibly void) value or an exception. [ Note: Futures, promises, and tasks defined in this clause reference such shared state. - end note ]
-[ Note: The result can be any kind of object including a function to compute that result, as by async [...]]
and
[futures.task.members]
-packaged_task(F&& f);[...]Invoking a copy of f shall behave the same as invoking f[...]
-~packaged_task(); Effects: Abandons any shared state
suggesting that a callable can be stored in the shared state and that one should not rely on any per-instance behaviour of the callable (this may be interpreted to include the side effects of the callable's end of lifetime; by the way, this also implies that your callable is not strictly valid, because it behaves differently from its copy). Moreover, nothing is said about the stored task in the destructor's description.
All in all, I think clang follows the wording more consistently, although nothing seems to explicitly forbid the gcc behaviour. That said, I agree this should be better documented, because it may otherwise result in surprising bugs.

Why can't I change the value of a variable captured by copy in a lambda function?

I'm reading the C++ Programming Language by B. Stroustrup in its section 11.4.3.4 "mutable Lambdas", which says the following:
Usually, we don’t want to modify the state of the function object (the
closure), so by default we can’t. That is, the operator()() for the
generated function object (§11.4.1) is a const member function. In the
unlikely event that we want to modify the state (as opposed to
modifying the state of some variable captured by reference; §11.4.3),
we can declare the lambda mutable.
I don't understand why the default for the operator()() is const when the variable is captured by value. What's the rationale for this? What could go wrong when I change the value of a variable that has been copied into the function object?
One can think of lambdas as classes with operator()(), which by default is defined as const. That is, it cannot change the state of the object. Consequently, the lambda will behave as a regular function and produce the same result every time it is called. If instead, we declare the lambda as mutable, it is possible for the lambda to modify the internal state of the object, and provide a different result for different calls depending on that state. This is not very intuitive and therefore discouraged.
For example, with mutable lambda, this can happen:
#include <iostream>
int main()
{
int n = 0;
auto lam = [=]() mutable {
n += 1;
return n;
};
std::cout << lam() << "\n"; // Prints 1
std::cout << n << "\n"; // Prints 0
std::cout << lam() << "\n"; // Prints 2
std::cout << n << "\n"; // Prints 0
}
It is easier to reason about const data.
By defaulting to const, brief lambdas are easier to reason about. If you want mutability, you can ask for it.
Many function objects in std are copied around; const objects that are copied have simpler state to track.

Creating std::function with lambda causes superfluous copying of the lambda object - why?

When I construct a std::function from a lambda with captured values, it makes an additional copy (move) of those values (actually of the whole lambda object, I guess).
The code:
#include <iostream>
#include <functional>
// Testing class - just to see constructing/destructing.
class T {
private:
static int idCounter; // The global counter of the constructed objects of this type.
public:
const int id; // Unique object ID
inline T() : id(++idCounter) {
std::cout << " Constuctor Id=" << id << std::endl;
};
inline T(const T& src) : id(++idCounter) {
std::cout << " Copy constructor Id=" << id << std::endl;
}
inline T(T&& src) : id(++idCounter) {
std::cout << " Move constructor Id=" << id << std::endl;
}
inline void print() const {
std::cout << " Print is called for object with id=" << id << std::endl;
}
inline ~T() {
std::cout << " Destructor Id=" << id << std::endl;
}
};
int T::idCounter=0;
// Declare type of the std::function to store our lambda.
typedef std::function<int (void)> Callback;
int main()
{
std::cout << "Let's the game begin!" << std::endl;
T obj; // Construct the first object.
std::cout << "Let's create a pointer to the lambda." << std::endl;
// Make a lambda with a captured object. (The lambda prints and returns the object's id.)
// It should make one (local) copy of the captured object, but it makes two - why?!
const Callback* pcb= new Callback( [obj]() -> int {
obj.print();
return obj.id;
} );
std::cout << "Now let's print lambda execution result." << std::endl;
std::cout << "The functor's id is " << (*pcb)() << std::endl;
std::cout << "Destroying the lambda." << std::endl;
delete pcb;
std::cout << "Terminating." << std::endl;
return 0;
}
The output is:
Let's the game begin!
Constuctor Id=1
Let's create a pointer to the lambda.
Copy constructor Id=2
Move constructor Id=3
Destructor Id=2
Now let's print lambda execution result.
Print is called for object with id=3
The functor's id is 3
Destroying the lambda.
Destructor Id=3
Terminating.
Destructor Id=1
I made a std::function from a lambda with a captured object. It should make one local copy of the object for the lambda, but it copies it twice (look at the move constructor call in the output). Actually, it makes a copy of the whole lambda object. Why? How can I avoid that?
I am using lambdas for inter-thread event processing, and they may capture noticeable amounts of data, so I am trying to find a way to avoid unnecessary copying. The task is simple: pass a constructed lambda into a function at minimal expense. If the data is copied twice for every constructed lambda, I will look for another way to work with events.
I am using GCC v4.7.2 in GNU C++11 mode.
Well, the output is confusing because there is one copy-elision performed by the compiler. So in order to understand the behaviour, we need to disable the copy-elision for a while. Use -fno-elide-constructors flag when compiling the code:
$ g++ -std=c++11 -fno-elide-constructors main.cpp
Now it gives this output (demo-without-copy-elision):
Let's create a pointer to the lambda.
Copy constructor Id=2
Move constructor Id=3
Move constructor Id=4
Destructor Id=3
Destructor Id=2
Well, that is expected. The copy is done when creating the lambda:
[obj]() -> int {
//^^^^ COPY!
obj.print();
return obj.id;
}
Well, that is too obvious!
Now coming to the non-obvious thing : the two move operations!
The first move is done when passing the lambda to the constructor of std::function, because the lambda is an rvalue, hence move-constructor is called. Note that -fno-elide-constructors disables move-elision also (which is just a supposedly faster version of copy, after all!).
The second move is done, when writing (by moving of course) to the member data of std::function in the constructor initialization-list.
So far so good.
Now if you remove -fno-elide-constructors, the compiler optimizes away the first move (because of which it doesn't invoke the move constructor), which is why the output is this:
Let's create a pointer to the lambda.
Copy constructor Id=2
Move constructor Id=3
Destructor Id=2
See demo-with-copy-elision.
The move you still see comes from moving the lambda into the member data of std::function. You cannot avoid this move.
Also note that copying/moving the lambda also causes copying/moving the captured data (i.e recursively copying/moving).
Anyway, if you're worried about copying the captured object (assuming it is a huge object), then I would suggest creating the captured object on the heap (with new, or preferably through a smart pointer such as std::shared_ptr so that it is released automatically), so that copying the captured object means copying a pointer (4 or 8 bytes!). That should work great!
Hope that helps.
It does not copy twice: the second operation is a move. Moving is considered a cheap operation, and in practice it is in 99% of cases. For 'plain old data' types (structs, ints, doubles, ...) the double copying is a non-issue, as most compilers eliminate redundant copies (data-flow analysis). For containers, moving is a very cheap operation.
As mentioned by Nawaz in the comments, the extra move operation that you are worried about is performed when the lambda expression is moved into the std::function<int(void)> (typedef'ed as Callback).
const Callback* pcb= new Callback( [obj]() -> int {
obj.print();
return obj.id;
} );
Here the object obj is passed by value (copy constructed) to the lambda expression, but additionally, the entire lambda expression is passed as an r-value to the constructor of Callback (std::function) and is therefore move-constructed into the std::function object. When the lambda is moved, all of its state must be moved along with it, and hence obj is also moved (there are actually two move constructions of obj involved, but one of them is usually optimized out by the compiler).
Equivalent code:
auto lambda = [obj]() -> int { // Copy obj into lambda.
obj.print();
return obj.id;
};
const Callback* pcb= new Callback(std::move(lambda)); // Move lambda (and obj).
Move operations are considered cheap and won't cause any costly copying of data (in most cases).
You can read more about move semantics here: What are move semantics?.
Finally, if you don't want to copy obj, simply capture it by reference in the lambda (obj must then outlive every call to the callback):
const Callback* pcb= new Callback( [&obj]() -> int {
obj.print();
return obj.id;
} );