Is passing by reference to thread here bad practice? - c++

This is a code listing from a C++ book.
void edit_document(std::string const& filename)
{
    open_document_and_display_gui(filename);
    while(!done_editing())
    {
        user_command cmd=get_user_input();
        if(cmd.type==open_new_document)
        {
            std::string const new_name=get_filename_from_user();
            std::thread t(edit_document,new_name);
            t.detach();
        }
        else
        {
            process_user_input(cmd);
        }
    }
}
As you can see, the edit_document function can launch another thread from within itself. But the thread entry function takes filename by const reference. Is that wrong in this case? Consider the case where the new thread gets blocked in some way while the new_name variable is destroyed, so some garbage value is read. Is that possible here?

There's nothing wrong with having a thread function that takes its argument by reference. The constructor for std::thread doesn't forward its arguments by reference; it copies the arguments that are passed to it. So, internally, when you create a thread with
std::thread t(edit_document, new_name);
it, in effect, generates code that spins up a thread which does this:
std::string first_arg(new_name);
edit_document(first_arg);
and first_arg lives until after edit_document returns. (Don't take that code literally -- the actual implementation is much more subtle. The constructor for std::thread won't return until first_arg has been constructed, so there's no risk that new_name will go away before the copy has been made)
You have to go out of your way to pass an actual reference to the thread function. That's what std::reference_wrapper does:
std::thread t(edit_document, std::cref(new_name));
If you do that, of course, you have to be sure that the lifetime of new_name will be longer than the thread. That's not common, for obvious reasons.
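A minimal sketch (not from the book, the names are illustrative) of both behaviours: the copy is taken inside the std::thread constructor, before the caller reassigns the string, while std::cref makes the thread share the caller's object:
#include <functional>
#include <iostream>
#include <string>
#include <thread>

void print_name(std::string const& name)
{
    std::cout << name << '\n';
}

int main()
{
    std::string file = "notes.txt";

    // The thread works on its own copy, made inside the std::thread
    // constructor, so this reassignment cannot affect it: prints "notes.txt".
    std::thread copied(print_name, file);
    file = "other.txt";
    copied.join();

    // With std::cref the thread sees the caller's object, so file must
    // outlive the thread; hence the join before file goes away.
    std::thread referenced(print_name, std::cref(file));
    referenced.join();
}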

Related

Do objects captured by a lambda exist for as long as the lambda?

I have always assumed lambdas were just function pointers, but I've never thought to use capture statements seriously...
If I create a lambda that captures by copy, and then move that lambda to a completely different thread and make no attempt to save the original objects used in the lambda, will it retain those copies for me?
#include <cstdio>
#include <string>
#include <thread>

std::thread createThread() {
    std::string str("Success");
    auto func = [=]() {
        printf("%s", str.c_str());
    };
    str = "Failure";
    return std::thread(func);
}

int main() {
    std::thread thread = createThread();
    thread.join();
    // assuming the thread doesn't execute anything until here...
    // would it print "Success", "Failure", or dereference a dangling pointer?
    return 0;
}
It is guaranteed to print Success. Capture-by-copy does exactly what it says: it makes a copy of the object right there and stores that copy as part of the closure object. The member of the closure object created from the capture lives as long as the closure object itself.
A lambda is not a function pointer. Lambdas are general function objects that can have internal state, which a function pointer can't have. In fact, only capture-less lambdas can be converted to function pointers and so may behave like one sometimes.
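A tiny sketch of that conversion rule (a made-up example, not from the question):
int main()
{
    int x = 1;
    void (*fp)() = []() {};       // OK: a captureless lambda converts to a function pointer
    // void (*fp2)() = [x]() {};  // error: a capturing lambda has state and does not convert
    fp();
    (void)x;
}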
The lambda expression produces a closure type that basically looks something like this:
struct /*unnamed1*/ {
/*unnamed1*/(const /*unnamed1*/&) = default;
/*unnamed1*/(/*unnamed1*/&&) = default;
/*unnamed1*/& operator=(const /*unnamed1*/&) = delete;
void operator()() const {
printf("%s", /*unnamed2*/.c_str());
};
std::string /*unnamed2*/;
};
and the lambda expression produces an object of this type, with /*unnamed2*/ direct-initialized to the current value of str. (Direct-initialized meaning as if by std::string /*unnamed2*/(str);)
You have three situations:
1. You can by design guarantee that the variables live longer than the thread, because you synchronize with the end of the thread before the variables go out of scope.
2. You know your thread may outlive the scope/life cycle of your variables, but you don't need access to the variables from any other thread anymore.
3. You can't say which thread lives longest, you have multiple threads accessing your data, and you want to extend the lifetime of your variables.
In case 1: capture by reference.
In case 2: capture by value (or even move the variables into the lambda).
In case 3: make the data shared (std::shared_ptr) and capture that by value.
Case 3 will extend the lifetime of the data to the lifetime of the longest-living thread.
Note: I prefer using std::async over std::thread, since it returns a RAII object (a future) whose destructor will synchronize with the thread. So you can use it as a member in an object that owns a thread and make sure the object's destruction waits for the thread to finish.
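A rough sketch of case 3, assuming the worker may outlive the creating scope (the names here are illustrative):
#include <iostream>
#include <memory>
#include <thread>
#include <vector>

int main()
{
    auto data = std::make_shared<std::vector<int>>(std::vector<int>{1, 2, 3});

    // The lambda copies the shared_ptr, so the vector stays alive as long
    // as the longest-living owner (here, the worker thread's copy).
    std::thread worker([data] {
        std::cout << data->size() << '\n';
    });

    data.reset();   // the creating scope lets go; the vector is still alive
    worker.join();
}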

Common std::make_unique, std::packaged_task and std::promise Problem - c++

The Problem
When creating schedulers, the last copy or move of a function object is the last place that the function object is ever referenced (by a worker thread). If you were to use a std::function to store functions in the scheduler, then any std::promise or std::packaged_task or other similar move-only types don't work, as they cannot be copied by std::function.
Similarly, if you were to use std::packaged_task in the scheduler, it imposes unnecessary overhead, as many tasks do not require the std::future returned by packaged_task at all.
The common and not great solution is to use a std::shared_ptr<std::promise> or a std::shared_ptr<std::packaged_task>, which works but imposes quite a lot of overhead.
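For concreteness, a sketch of the shared_ptr workaround being described (a standalone toy, not the actual scheduler):
#include <functional>
#include <future>
#include <iostream>
#include <memory>
#include <thread>

int main()
{
    auto prom = std::make_shared<std::promise<int>>();
    auto fut = prom->get_future();

    // The shared_ptr is copyable, so the capturing lambda is copyable,
    // which is what lets it live inside a std::function.
    std::function<void()> task = [prom] { prom->set_value(42); };

    std::thread worker(task);
    worker.join();
    std::cout << fut.get() << '\n';
}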
The Solution
A make_owner, similar to make_unique, with one key difference: a move OR copy simply transfers control of destruction of the object. It is basically identical to std::unique_ptr, except that it is copyable (it basically always moves, even on a copy). Gross...
This means that moving a std::function around doesn't require copies of the std::shared_ptr, which require reference counting, and it also means there is significantly less overhead from the reference counting etc. A single atomic pointer to the object would be needed, and a move OR copy would transfer control. The major difference is that a copy also transfers control; this might be a bit of a no-no in terms of strict language rules, but I don't see another way around it.
This solution is bad because:
It ignores copy semantics.
It casts away const (in copy constructor and operator =)
Grrr
It isn't as nice of a solution as I'd like so if anybody knows another way to avoid using a shared pointer or only using packaged_tasks in a scheduler I'd love to hear it because I'm stumped...
I am pretty unsatisfied with this solution.... Any ideas?
I am able to re-implement std::function with move semantics, but this seems like a massive pain in the arse, and it has its own problems regarding object lifetime (but those already exist when using std::function with reference capture).
Some examples of the problem:
EDIT
Note that in the target application I cannot do std::thread a(std::move(a)), as the scheduler threads are always running; at most they are put in a sleep state, never joined, never stopped. A fixed number of threads are in the thread pool; I cannot create a thread for each task.
auto proms = std::make_unique<std::promise<int>>();
auto future = proms->get_future();
// Does not compile: std::function requires a copyable callable, and the
// lambda capturing the unique_ptr is move-only.
std::thread runner(std::move(std::function( [prom = std::move(proms)]() mutable noexcept
{
    prom->set_value(80085);
})));
std::cout << future.get() << std::endl;
std::cin.get();
And an example with a packaged_task
auto pack = std::packaged_task<int(void)>([]
{
    return 1;
});
auto future = pack.get_future();
// Same problem: std::packaged_task is move-only, so the wrapping
// std::function cannot be constructed.
std::thread runner(std::move(std::function( [pack = std::move(pack)]() mutable noexcept
{
    pack();
})));
std::cout << future.get() << std::endl;
std::cin.get();
EDIT
I need to do this from the context of a scheduler; I won't be able to move it into the thread.
Please note that the above is a minimal reproducible example; std::async is not adequate for my application.
The main question is: why do you want to wrap the lambda in std::function before passing it to the std::thread constructor?
It is perfectly fine to do this:
std::thread runner([prom = std::move(proms)]() mutable noexcept
{
    prom->set_value(80085);
});
You can find the explanation of why std::function does not allow you to store a move-only lambda here.
If you were going to pass a std::function wrapping the lambda to some function, then instead of:
void foo(std::function<void()> f)
{
    std::thread runner(std::move(f));
    /* ... */
}

foo(std::function<void()>([](){}));
You can do this:
void foo(std::thread runner)
{
    /* ... */
}

foo(std::thread([](){}));
Update: It can be done in an old-fashioned way.
std::thread runner([prom_deleter = proms.get_deleter(), prom = proms.release()]() mutable noexcept
{
    prom->set_value(80085);
    // if the deleter of `proms` is of type `std::default_delete`,
    // the next line can be simplified to `delete prom;`
    prom_deleter(prom);
});

Why does std::thread take function to run by rvalue?

There is one thing about std::thread which I don't understand:
why does the constructor of std::thread take the function to run by rvalue?
I usually want to run a functor with some members on another thread, like this:
struct Functor
{
    void operator()( /* some args */ )
    {
        /* some code */
    }
    /* some members */
};

void run_thread()
{
    Functor f( /* some data */ );
    std::thread thread(f, /* some data */ );
    /* do something and wait for thread to finish */
}
With the current implementation of std::thread I must be sure my object implements move semantics. I don't get why I cannot pass it by reference.
An extra question: what does it mean to refer to a function by rvalue? A lambda expression?
In your run_thread method, f is an automatic (local) variable. That means at the bottom of the scope f will be destroyed. You claim that you will "wait for the thread to finish", but the compiler/runtime system does not know that! It has to assume that f will be deleted, possibly before the thread that is supposed to call its method has a chance to start.
By copying (or moving) f, the runtime system gains control of the lifetime of its copy of f and can avoid some really nasty, hard-to-debug problems.
std::reference_wrapper will expose an operator() to the wrapped object. If you are willing to do the manual lifetime maintenance, std::thread t(std::ref(f)); will run f by reference.
Of course in your code, this induces undefined behaviour as you did not manage lifetimes properly.
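A small sketch of the std::ref route under the assumption that the caller joins before f dies (the Functor here is made up):
#include <functional>
#include <iostream>
#include <thread>

struct Functor
{
    int calls = 0;
    void operator()() { ++calls; }
};

int main()
{
    Functor f;
    std::thread t(std::ref(f));    // the thread runs the caller's f, no copy is made
    t.join();                      // f must outlive the thread, so join before f dies
    std::cout << f.calls << '\n';  // join() synchronizes, so this reliably prints 1
}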
Finally, note that raw thread is a poor "client code" tool. async is a touch better, but really you want a task queue with packaged_tasks and futures and condition variables. C++11 added enough threading support to write a decent threading system, but it provides primitives, not good "client code" tools.
In a toy program it may be enough.
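As a rough illustration of the packaged_task-and-future style the answer alludes to (a toy, not a real task queue):
#include <future>
#include <iostream>
#include <thread>

int main()
{
    std::packaged_task<int(int)> task([](int x) { return x * 2; });
    std::future<int> result = task.get_future();

    std::thread worker(std::move(task), 21);  // packaged_task is move-only
    worker.join();

    std::cout << result.get() << '\n';        // prints 42
}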

Is it wise to lock a mutex to just return a value?

class Foo {
public:
    // ...
    const int &getBar() const noexcept;
    void doSomethingWithBar(); // (2)
private:
    mutable std::mutex barMutex; // mutable so the const getter can lock it
    int bar = 7;
};

const int &Foo::getBar() const noexcept {
    std::lock_guard<std::mutex> lock(this->barMutex); // (1)
    return this->bar;
}

void Foo::doSomethingWithBar() {
    std::lock_guard<std::mutex> lock(this->barMutex); // necessary here
    this->bar++;
}
In terms of thread-safety, is line 1 necessary, considering that another thread might interfere and call the function in line 2 and thus change the value of bar?
Note that int might be any type here.
Seeing as you're returning a reference, locking is entirely useless for you in this scenario. You may want a lock when you use the reference that is returned though.
However, if you were returning a value, it would have more of an effect; take a look at this for an example of a torn read.
Line 1 does not have the effect that you expect: apparently, you are trying to lock the object and keep it locked while the caller works on the returned reference.
But the lock_guard you create is a local object of getBar() and gets destroyed as soon as you return, thus unlocking the lock that you just acquired.
There are several alternatives, for example:
You may change your function to return the value of bar, bearing in mind that this value is a snapshot that may already be stale by the time you use it (see the sketch after this list).
You may also change your function to get and process bar in one shot (for example by accepting a function as a parameter).
You could also manage the lock as a class member and provide a function releaseBar() to unlock the mutex as soon as it isn't needed anymore. This is, however, a more dangerous approach, because the caller may forget to release the lock.
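A sketch of the first two alternatives, reusing the question's names (the withBar helper is invented for illustration):
#include <mutex>

class Foo {
public:
    // Alternative 1: return a snapshot by value, copied while the mutex is held.
    int getBar() const {
        std::lock_guard<std::mutex> lock(barMutex);
        return bar;
    }

    // Alternative 2: let the caller work on bar while the mutex is held.
    template <typename F>
    void withBar(F&& f) {
        std::lock_guard<std::mutex> lock(barMutex);
        f(bar);
    }

private:
    mutable std::mutex barMutex;
    int bar = 7;
};

// Usage: foo.withBar([](int& b) { ++b; });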

C++11 std::thread::detach and access to shared data

If you have shared variables between a std::thread and the main thread (or any other thread for that matter), can you still access those shared variables even if you execute the thread::detach() method immediately after creating the thread?
Yes! Global, captured and passed-in variables are still accessible after calling detach().
However, if you are calling detach, it is likely that you want to return from the function that created the thread, allowing the thread object to go out of scope. If that is the case, you will have to take care that none of the locals of that function were passed to the thread either by reference or through a pointer.
You can think of detach() as a declaration that the thread does not need anything local to the creating thread.
In the following example, a thread keeps accessing an int on the stack of the starting thread after it has gone out of scope. This is undefined behaviour!
void start_thread()
{
    int someInt = 5;
    std::thread t([&]() {
        while (true)
        {
            // Will print someInt (5) repeatedly until we return. Then,
            // undefined behavior!
            std::cout << someInt << std::endl;
        }
    });
    t.detach();
}
Here are some possible ways to keep the rug from being swept out from under your thread:
Declare the int somewhere that will not go out of scope during the lifetime of any threads that need it (perhaps a global).
Declare shared data as a std::shared_ptr and pass that by value into the thread (see the sketch after this list).
Pass by value (performing a copy) into the thread.
Pass by rvalue reference (performing a move) into the thread.
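A sketch of the std::shared_ptr option, reworking the example above (the sleep in main is just a crude way to let the detached thread run before the program exits):
#include <chrono>
#include <iostream>
#include <memory>
#include <thread>

void start_thread()
{
    auto someInt = std::make_shared<int>(5);
    // The lambda holds its own shared_ptr copy, so the int outlives
    // start_thread() and remains valid for the detached thread.
    std::thread t([someInt]() {
        std::cout << *someInt << std::endl;
    });
    t.detach();
}

int main()
{
    start_thread();
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
}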
Yes. Detaching a thread just means that it cleans up after itself when it is finished and you no longer need to, nor are you allowed to, join it.