As the title of the question says, why are C++ threads (std::thread and pthread) movable but not copyable? What would the consequences be if we made them copyable?
Regarding copying, consider the following snippet:
void foo();
std::thread first (foo);
std::thread second = first; // (*)
When the line marked (*) takes place, presumably some of foo already executed. What would the expected behavior be, then? Execute foo from the start? Halt the thread, copy the registers and state, and rerun it from there?
In particular, given that function objects are now part of the standard, it's very easy to launch another thread that performs exactly the same operation as some earlier thread, by reusing the function object.
There is therefore not much motivation for copying to begin with.
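As a rough sketch of that point, the same function object can be handed to two std::thread constructors, which gives two independent threads doing the same work. The names Counter and run_same_work_twice are made up for illustration.

```cpp
#include <atomic>
#include <thread>

// Hypothetical function object; each call adds `amount` to a shared total.
struct Counter {
    std::atomic<int>* total;
    int amount;
    void operator()() const { total->fetch_add(amount); }
};

// Launches two independent threads from the same function object and
// returns the accumulated total once both have finished.
int run_same_work_twice() {
    std::atomic<int> total{0};
    Counter work{&total, 5};

    std::thread first(work);   // std::thread stores its own copy of `work`
    std::thread second(work);  // a second, independent thread of execution
    first.join();
    second.join();
    return total.load();
}
```

Each std::thread object still owns exactly one thread of execution; only the callable is copied.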
Regarding moves, though, consider the following:
std::vector<std::thread> threads;
Without move semantics, this would be problematic: when the vector needs to resize internally, how would it move its elements to another buffer? See more on this here.
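A minimal sketch of that use case, assuming a hypothetical helper run_workers: the vector grows while threads are being added, and each reallocation moves the existing std::thread objects into the new buffer.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Starts `count` threads, stores them in a vector and joins them all.
// Returns how many threads actually ran.
int run_workers(int count) {
    std::atomic<int> finished{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < count; ++i) {
        // emplace_back constructs the thread in place; when the vector
        // reallocates, existing elements are moved to the new buffer.
        threads.emplace_back([&finished] { finished.fetch_add(1); });
    }
    for (std::thread& t : threads) {
        t.join();
    }
    return finished.load();
}
```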
If the thread objects are copyable, who is finally responsible for the single thread of execution associated with the thread objects? In particular, what would join() do for each of the thread objects?
There are several possible outcomes, and that is the problem: they have no real overlap that could be codified (standardised) as a general use case.
Hence, the most reasonable outcome is that 1 thread of execution is associated with at most 1 thread object.
That is not to say some shared state cannot be provided, it is just that the user then needs to take further action in this regard, such as using a std::shared_ptr.
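The one-to-one relationship can be checked directly. The sketch below (ownership_moves is a hypothetical name) shows that std::thread is not copy-constructible and that a move leaves the source object without an associated thread.

```cpp
#include <thread>
#include <type_traits>
#include <utility>

static_assert(!std::is_copy_constructible<std::thread>::value,
              "std::thread cannot be copied");
static_assert(std::is_move_constructible<std::thread>::value,
              "std::thread can be moved");

// Returns true if ownership moved from the first object to the second.
bool ownership_moves() {
    std::thread first([] {});
    std::thread second = std::move(first);

    // After the move, only `second` refers to the thread of execution.
    bool moved = !first.joinable() && second.joinable();
    second.join();
    return moved;
}
```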
Related
I want to do roughly this:
Initial thread:
write some values to global vars (they will never be written again)
This could be moderately large data (arrays, strings, etc). Cannot simply be made std::atomic<>.
spawn other threads
Other threads:
read the global state
do work, etc.
Now, I know I can pass arguments to std::thread, but I'm trying to understand the memory guarantees of C++ through this example.
Also, I am pretty confident that on any real-world implementation, creating a thread will cause a memory barrier ensuring that the thread can "see" everything the parent thread wrote up until that point.
But my question is: is this guaranteed by the standard?
Aside: I suppose I could add some dummy std::atomic<int> or so, and write to that before starting the other threads, then on the other threads, read that once on startup. I believe all the happens-before machinery would then guarantee that the previously-written global state is properly visible.
But my question is if something like that is technically required, or is thread creation enough?
Thread creation is enough. There is a synchronization point between the thread constructor and the start of the new thread per [thread.thread.constr]/6:
Synchronization: The completion of the invocation of the constructor synchronizes with the beginning of the invocation of the copy of f.
This means that everything the parent thread wrote before the thread constructor completed is visible to the spawned thread.
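A small sketch of the scenario from the question, relying only on that guarantee. The globals g_table and g_label and the function sum_in_workers are illustrative names; the globals are plain, non-atomic objects written once before any thread is started.

```cpp
#include <string>
#include <thread>
#include <vector>

// Plain, non-atomic globals (hypothetical names), written once before spawn.
std::vector<int> g_table;
std::string g_label;

// Writes the globals, then lets `workers` threads read them.
// Returns the sum of all per-thread results.
int sum_in_workers(int workers) {
    g_table = {1, 2, 3, 4};
    g_label = "ready";

    std::vector<int> results(workers, 0);  // one slot per thread, no sharing
    std::vector<std::thread> threads;
    for (int i = 0; i < workers; ++i) {
        threads.emplace_back([i, &results] {
            int sum = 0;
            for (int v : g_table) sum += v;  // read-only access
            results[i] = (g_label == "ready") ? sum : -1;
        });
    }
    for (std::thread& t : threads) t.join();

    int total = 0;
    for (int r : results) total += r;
    return total;
}
```

No extra atomic flag is involved; the reads are ordered after the writes by the thread constructor, and join() orders the reads of results after the workers' writes.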
Let's say I have a small operation which I want to perform in a separate thread. I do not need to know when it completes, nor do I need to wait for its completion, but I do not want the operation blocking my current thread. When I write the following code, I will get a crash:
void myFunction() {
    // do other stuff
    std::thread([]()
    {
        // do thread stuff
    });
}
This crash is solved by assigning the thread to a variable, and detaching it:
void myFunction() {
    // do other stuff
    std::thread t([]()
    {
        // do thread stuff
    });
    t.detach();
}
Why is this step necessary? Or is there a better way to create a small single-use thread?
Because the std::thread::~thread() specification says so:
A thread object does not have an associated thread (and is safe to destroy) after
it was default-constructed
it was moved from
join() has been called
detach() has been called
It looks like detach() is the only one of these that makes sense in your case, unless you want to return the thread object (by moving) to the caller.
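For comparison, here is a sketch of the other option mentioned, returning the thread object to the caller by move so that the caller decides when to join. The names start_background_work, caller_joins and g_done are hypothetical.

```cpp
#include <atomic>
#include <thread>

std::atomic<bool> g_done{false};  // hypothetical flag set by the worker

// Starts the work and hands ownership of the thread to the caller.
std::thread start_background_work() {
    return std::thread([] { g_done.store(true); });
}

// The caller now owns the thread and must join or detach it.
bool caller_joins() {
    std::thread t = start_background_work();
    t.join();
    return g_done.load();
}
```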
Why is this step necessary?
Consider that the thread object represents a long-running "thread" of execution (a lightweight process or kernel schedulable entity or similar).
Allowing you to destroy the object while the thread is still executing would leave you no way to subsequently join (and find the result of) that thread. This may be a logical error, but it can also make it hard even to exit your program correctly.
Or is there a better way to create a small single-use thread?
Not obviously, but it's frequently better to use a thread pool for running tasks in the background, instead of starting and stopping lots of short-lived threads.
You might be able to use std::async() instead, but the future it returns may block in the destructor in some circumstances, if you try to discard it.
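A minimal sketch of the std::async route, with a hypothetical function name. With the std::launch::async policy the destructor of the returned future waits for the task, so the sketch keeps the future and reads the result rather than discarding it.

```cpp
#include <future>

// Runs a small task on another thread and returns its result.
int compute_in_background() {
    std::future<int> result =
        std::async(std::launch::async, [] { return 6 * 7; });
    // get() blocks until the task has finished and yields its value.
    return result.get();
}
```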
See the documentation of the destructor of std::thread:
If *this has an associated thread (joinable() == true), std::terminate() is called.
You should explicitly say that you don't care what's going to happen with the thread, and that you're OK with losing any control over it. And that is what detach is for.
In general, this looks like a design problem so crashing makes sense: it's hard to propose a general and not surprising rule about what should happen in such a case (e.g. your program might as well normally end its execution - what should happen with the thread?).
Basically, your use case requires a call to detach() because your use case is pretty weird, and not what C++ is trying to make easy.
While Java and .Net blithely let you toss away a Thread object whose associated thread is still running, in the C++ model the Thread is closer to being the thread, in the sense that the existence of the Thread object coincides with the lifetime, or at least joinability, of the execution it refers to. Note how it's not possible to create a Thread without starting it (except in the case of the default constructor, which is really just there in the service of move semantics), or to copy it or to make one from a thread id. C++ wants Thread to outlive the thread.
Maintaining that condition has various benefits. Final cleanup of a thread's control data doesn't have to be done automagically by the OS, because once a Thread goes away, nothing can ever try to join it. It's easier to ensure that variables with thread storage get destroyed in time, since the main thread is the last to exit (barring some move shenanigans). And a missing join -- which is an extremely common type of bug -- gets properly flagged at runtime.
Letting some thread wander off into the distance, in contrast, is allowed, but it's an unusual thing to do. Unless it's interacting with your other threads through sync objects, there's no way to ensure it's done whatever it was meant to do. A detached thread is on the level of reinterpret_cast: You're allowed to tell the compiler that you know something it doesn't, but that has to be explicit, not just the consequence of the function you didn't call.
Consider this: thread A creates thread B, and thread A leaves its scope of execution. The handle for thread B is about to be lost. What should happen now? There are several possibilities, the most obvious being:
1. Thread B is detached and continues its execution independently.
2. Thread A waits for (joins) thread B before leaving its own scope.
Now you can argue which is better: 1 or 2? How should we (the compiler) decide which one of these is better?
So the designers did something different: terminate the program, so that the developer has to pick one of these solutions explicitly and implicit (perhaps unwanted) behaviour is avoided. It's a signal for you: "hey, pay attention now, this piece of code is important and I (the compiler) don't want to decide for you".
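If option 2 is what you want everywhere, the choice can be made explicit once in a small wrapper. The class below is a sketch under that assumption; JoiningThread and scoped_work are made-up names, and C++20's std::jthread offers similar join-on-destruction behaviour.

```cpp
#include <thread>
#include <utility>

// Hypothetical RAII wrapper: joins the owned thread when it goes out of scope.
class JoiningThread {
public:
    explicit JoiningThread(std::thread t) : thread_(std::move(t)) {}
    ~JoiningThread() {
        if (thread_.joinable()) thread_.join();
    }
    JoiningThread(const JoiningThread&) = delete;
    JoiningThread& operator=(const JoiningThread&) = delete;

private:
    std::thread thread_;
};

// The worker is guaranteed to have finished when the inner scope ends.
int scoped_work() {
    int value = 0;
    {
        JoiningThread guard(std::thread([&value] { value = 42; }));
    }  // ~JoiningThread joins here
    return value;
}
```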
Assume that I have code like:
void InitializeComplexClass(ComplexClass* c);
class Foo {
public:
    Foo() {
        i = 0;
        InitializeComplexClass(&c);
    }
private:
    ComplexClass c;
    int i;
};
If I now do something like Foo f; and hand a pointer to f over to another thread, what guarantees do I have that any stores done by InitializeComplexClass() will be visible to the CPU executing the other thread that accesses f? What about the store writing zero into i? Would I have to add a mutex to the class, take a writer lock on it in the constructor and take corresponding reader locks in any methods that access the member?
Update: Assume I hand a pointer over to a bunch of other threads once the constructor has returned. I'm not assuming that the code is running on x86, but could be instead running on something like PowerPC, which has a lot of freedom to do memory reordering. I'm essentially interested in what sorts of memory barriers the compiler has to inject into the code when the constructor returns.
In order for the other thread to know about your new object, you have to hand over the object or signal the other thread somehow. To signal a thread you write to memory. Both x86 and x64 perform all memory writes in order; the CPU does not reorder these operations with regard to each other. This is called "Total Store Ordering" (TSO), so the CPU write queue works "first in, first out".
Given that you create the object first and then pass it on to another thread, these changes to memory will also occur in order, and the other thread will always see them in the same order. By the time the other thread learns about the new object, the contents of this object were guaranteed to be available to that thread even earlier (if the thread had only known where to look).
In conclusion, on x86 and x64 you do not have to synchronise anything this time. Handing over the object after it has been initialised is all the synchronisation you need.
Update: On non-TSO architectures you do not have this TSO guarantee, so you need to synchronise. Use the MemoryBarrier() macro (or any interlocked operation), or some synchronisation API. Signalling the other thread through the corresponding API also synchronises; otherwise it would not be a synchronisation API.
x86 and x64 CPUs may reorder writes past reads, but that is not relevant here. Just for better understanding: writes can be ordered after reads because writes to memory go through a write queue, and flushing that queue may take some time. On the other hand, the read cache is always consistent with the latest updates from other processors (those that have gone through their own write queues).
This topic has been made unbelievably confusing for many people, but in the end there are only a few things an x86/x64 programmer has to worry about:
- First, the existence of the write queue (and one should not worry about the read cache at all!).
- Second, concurrent writing and reading of the same variable from different threads when the variable does not have an atomic length, which may cause data tearing and for which you would need synchronisation mechanisms.
- And finally, concurrent updates to the same variable from multiple threads, for which we have interlocked operations, or again synchronisation mechanisms.
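In standard C++ the hand-over can be written portably, without relying on what a particular CPU does, as a release store paired with an acquire load on the pointer that announces the object. The sketch below assumes a hypothetical Payload type and a global g_published pointer; it is one possible way to do it, not the only one.

```cpp
#include <atomic>
#include <thread>

struct Payload {  // hypothetical stand-in for the initialised object
    int a = 0;
    int b = 0;
};

std::atomic<Payload*> g_published{nullptr};

// Initialises a Payload, publishes its address and lets another thread read it.
int publish_and_consume() {
    Payload p;
    int seen = 0;

    std::thread consumer([&seen] {
        Payload* ptr = nullptr;
        // The acquire load pairs with the release store below.
        while ((ptr = g_published.load(std::memory_order_acquire)) == nullptr) {
            std::this_thread::yield();
        }
        seen = ptr->a + ptr->b;  // sees the fully initialised object
    });

    p.a = 40;  // plain writes made before publication
    p.b = 2;
    g_published.store(&p, std::memory_order_release);

    consumer.join();  // p outlives the consumer
    g_published.store(nullptr);
    return seen;
}
```

Handing the pointer over through a mutex-protected queue or a condition variable gives the same ordering guarantee.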
If you do:
Foo f;
// HERE: InitializeComplexClass() and "i" member init are guaranteed to be completed
passToOtherThread(&f);
/* From this point, you cannot guarantee the state/members
of 'f' since another thread can modify it */
If you're passing an instance pointer to another thread, you need to implement guards in order for both threads to interact with the same instance. If you ONLY plan to use the instance on the other thread, you do not need to implement guards. However, do not pass a stack pointer like in your example, pass a new instance like this:
passToOtherThread(new Foo());
And make sure to delete it when you are done with it.
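One way to make the "delete it when you are done" step automatic is to move a std::unique_ptr into the thread, so the receiving thread owns the instance. This is a sketch only: Job and run_job are hypothetical names standing in for Foo and passToOtherThread, and it needs C++14 for std::make_unique and the init-capture.

```cpp
#include <memory>
#include <thread>
#include <utility>

struct Job {  // hypothetical stand-in for Foo
    int input = 0;
    int output = 0;
};

// Creates a heap-allocated Job and transfers ownership to a worker thread.
int run_job(int input) {
    auto job = std::make_unique<Job>();
    job->input = input;

    int result = 0;
    std::thread worker([owned = std::move(job), &result] {
        owned->output = owned->input * 2;
        result = owned->output;
    });  // the Job is deleted together with the thread's copy of the lambda

    worker.join();
    return result;
}
```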
The C++11 standard says:
30.6.6 Class template future
(3) "The effect of calling any member function other than the destructor,
the move-assignment operator, or valid on a future object for which
valid() == false is undefined."
So, does it mean that the following code might encounter undefined behaviour?
void wait_for_future(std::future<void> & f)
{
    if (f.valid()) {
        // what if another thread meanwhile calls get() on f (which invalidates f)?
        f.wait();
    }
    else {
        return;
    }
}
Q1: Is this really a possible undefined behaviour?
Q2: Is there any standard compliant way to avoid the possible undefined behaviour?
Note that the standard has an interesting note [also in 30.6.6 (3)]:
"[Note: Implementations are encouraged
to detect this case and throw an object of type future_error with an
error condition of future_errc::no_state. —endnote]"
Q3: Is it ok if I just rely on the standard's note and just use f.wait() without checking f's validity?
void wait_for_future(std::future<void> & f)
{
    try {
        f.wait();
    }
    catch (std::future_error const & err) {
        return;
    }
}
EDIT: Summary after receiving the answers and further research on the topic
As it turned out, the real problem with my example was not directly due to parallel modifications (a single modifying get was called from a single thread, the other thread called valid and wait which shall be safe).
The real problem was that the std::future object's get function was accessed from a different thread, which is not the intended use case! The std::future object shall only be used from a single thread!
The only other thread that is involved is the thread that sets the shared state: via return from the function passed to std::async or calling set_value on the related std::promise object, etc.
More: even waiting on a std::future object from another thread is not intended behaviour (due to the very same UB as in my example #1). We shall use std::shared_future for this use case, with each thread having its own copy of a std::shared_future object. Note that these accesses do not go through the same shared std::future object, but through separate (related) objects!
Bottom line:
These objects shall not be shared between threads. Use a separate (related) object in each thread.
A normal std::future is not threadsafe by itself. So yes it is UB, if you call modifying functions from multiple threads on a single std::future as you have a potential race condition. Though, calling wait from multiple threads is ok as it's const/non-modifying.
However, if you really need to access the return value of a std::future from multiple threads you can first call std::future::share on the future to get a std::shared_future which you can copy to each thread and then each thread can call get. Note that it's important that each thread has its own std::shared_future object.
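A short sketch of that pattern, using a std::promise as the producer and a hypothetical function name. Each reader thread captures its own copy of the std::shared_future.

```cpp
#include <future>
#include <thread>
#include <vector>

// Lets `readers` threads read the same result, each through its own copy
// of a std::shared_future. Returns the sum of the values they saw.
int read_from_many_threads(int readers) {
    std::promise<int> promise;
    std::shared_future<int> shared = promise.get_future().share();

    std::vector<int> seen(readers, 0);  // one slot per thread
    std::vector<std::thread> threads;
    for (int i = 0; i < readers; ++i) {
        // `shared` is captured by value: every thread gets its own object.
        threads.emplace_back([shared, i, &seen] { seen[i] = shared.get(); });
    }

    promise.set_value(7);  // makes the shared state ready
    for (std::thread& t : threads) t.join();

    int total = 0;
    for (int v : seen) total += v;
    return total;
}
```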
You only need to check valid if it is somehow possible that your future might be invalid, which is not the case for the normal use cases (std::async etc.) with proper usage (e.g. not calling get twice).
Futures allow you to store the state from one thread and retrieve it from another. They don't provide any further thread safety.
Is this really a possible undefined behaviour?
If you have two threads trying to get the future's state without synchronisation, yes. I've no idea why you might do that though.
Is there any standard compliant way to avoid the possible undefined behaviour?
Only try to get the state from one thread; or, if you genuinely need to share it between threads, use a mutex or other synchronisation.
Is it ok if I just rely on the standard's note
If you know that the only implementations you need to support follow that recommendation, yes. But there should be no need.
and just use f.wait() without checking f's validity?
If you're not doing any weird shenanigans with multiple threads accessing the future, then you can just assume that it's valid until you've retrieved the state (or moved it to another future).
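The single-threaded lifecycle described here can be illustrated with a small sketch (the function name is hypothetical): a future obtained from std::async is valid until get() retrieves the state, and invalid afterwards.

```cpp
#include <future>

// Returns true if the future was valid before get() and invalid after it.
bool validity_changes_after_get() {
    std::future<int> f = std::async(std::launch::async, [] { return 1; });

    bool before = f.valid();  // has a shared state
    f.get();                  // retrieves the value and releases the state
    bool after = f.valid();   // no shared state any more

    return before && !after;
}
```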
I would like to create a thread pool. I have a class called ServerThread (in ServerThread.cpp), whose constructor should do something like this:
ServerThread::ServerThread()
{
    for (int i = 0; i < init_thr_num; i++)
    {
        // create a pool of threads
        // suspend them, they will wake up when requests arrive for them to process
    }
}
I was wondering if creating pthreads inside a constructor can cause any undefined behavior that one should avoid running into.
Thanks
You can certainly do that in a constructor, but you should be aware of a problem that is clearly explained by Scott Meyers in his Effective/More Effective C++ books.
In short, his point is that if any kind of exception is raised within a constructor, the destructor of your half-baked object will not be called. This can lead to leaks of whatever the constructor had acquired by hand. So Meyers' suggestion is to have "light" constructors and then do the "heavy" work in an init method called after the object has been fully created.
This argument is not strictly related to creating a pool of pthreads within a constructor (whereby you might argue that no exception will be raised if you simply create them and then immediately suspend them), but is a general consideration about what to do in a constructor (read: good practices).
Another consideration is that a constructor has no return value. While it is true that (if no exceptions are thrown) you can leave the object in a consistent state even if the thread creation fails, it would possibly be better to return a status from some kind of init or start method.
You could also read this thread on S.O. about the topic, and this one.
From a strictly formal point of view, a constructor is really just a
function like any other, and there shouldn't be any problem.
Practically, there could be an issue: the threads may actually start
running before the constructor has finished. If the threads need a
fully constructed ServerThread to operate, then you're in
trouble—this is often the case when ServerThread is a base
class, and the threads need to interact with the derived class. (This
is a very difficult problem to spot, because with the most frequently
used thread scheduling algorithms, the new thread will usually not
start executing immediately.)
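Both answers point towards the same shape: a light constructor plus a separate start step that runs once the object (including any derived part) is fully constructed and that can report failure. The sketch below illustrates this with std::thread instead of raw pthreads, which is a substitution made for brevity; WorkerPool, start, stop and start_pool are hypothetical names.

```cpp
#include <atomic>
#include <system_error>
#include <thread>
#include <vector>

class WorkerPool {
public:
    WorkerPool() = default;   // light constructor: no threads yet
    ~WorkerPool() { stop(); }

    // Starts `count` workers. Called only after construction has finished,
    // so the workers never observe a partially constructed object.
    bool start(int count) {
        try {
            for (int i = 0; i < count; ++i) {
                workers_.emplace_back([this] { ++started_; });
            }
        } catch (const std::system_error&) {
            stop();           // join whatever did start
            return false;     // report failure through the return value
        }
        return true;
    }

    // Joins all workers; safe to call more than once.
    void stop() {
        for (std::thread& t : workers_) {
            if (t.joinable()) t.join();
        }
        workers_.clear();
    }

    int started() const { return started_.load(); }

private:
    std::atomic<int> started_{0};
    std::vector<std::thread> workers_;
};

// Usage: construct, start, stop, then inspect the result.
int start_pool(int count) {
    WorkerPool pool;
    if (!pool.start(count)) return -1;
    pool.stop();
    return pool.started();
}
```

A real pool would have its workers block on a queue and a condition variable instead of returning immediately; that part is left out here.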