I would like to create a thread pool. I have a class called ServerThread.cpp, whose constructor should do something like this:
ServerThread::ServerThread()
{
for( int i=0 ; i<init_thr_num ; i++ )
{
//create a pool of threads
//suspend them, they will wake up when requests arrive for them to process
}
}
I was wondering if creating pthreads inside a constructor can cause any undefined behavior that one should avoid running into.
Thanks
You can certainly do that in a constructor but should be aware of a problem that is clearly explained by Scott Meyers ins his Effective/More Effective C++ books.
In short his point is that if any kind of exception is raised within a constructor, then your half-backed object will not be destroyed. This leads to memory leaks. So Meyers' suggestion is to have "light" constructors and then do the "heavy" work in an init method called after the object has been fully created.
This argument is not strictly related to creating a pool of pthreads within a constructor (whereby you might argue that no exception will be raised if you simply create them and then immediately suspend them), but is a general consideration about what to do in a constructor (read: good practices).
Another considerations to be done is that a constructor has no return value. While it is true that (if no exceptions are thrown) you can leave the object is a consistent state even if the thread creation fails, it would be possibly better to manage a return value from a kind of init or start method.
You could also read this thread on S.O. about the topic, and this one.
From a strictly formal point of view, a constructor is really just a
function like any other, and there shouldn't be any problem.
Practically, there could be an issue: the threads may actually start
running before the constructor has finished. If the threads need a
fully constructed ServerThread to operate, then you're in
trouble—this is often the case when ServerThread is a base
class, and the threads need to interact with the derived class. (This
is a very difficult problem to spot, because with the most frequently
used thread scheduling algorithms, the new thread will usually not
start executing immediately.)
Related
Let's say I have a small operation which I want to perform in a separate thread. I do not need to know when it completes, nor do I need to wait for its completion, but I do not want the operation blocking my current thread. When I write the following code, I will get a crash:
void myFunction() {
// do other stuff
std::thread([]()
{
// do thread stuff
});
}
This crash is solved by assigning the thread to a variable, and detaching it:
void myFunction() {
// do other stuff
std::thread t([]()
{
// do thread stuff
});
t.detach();
}
Why is this step necessary? Or is there a better way to create a small single-use thread?
Because the std::thread::~thread() specification says so:
A thread object does not have an associated thread (and is safe to destroy) after
it was default-constructed
it was moved from
join() has been called
detach() has been called
It looks like detach() is the only one of these that makes sense in your case, unless you want to return the thread object (by moving) to the caller.
Why is this step necessary?
Consider that the thread object represents a long-running "thread" of execution (a lightweight process or kernel schedulable entity or similar).
Allowing you to destroy the object while the thread is still executing, leaves you no way to subsequently join (and find the result of) that thread. This may be a logical error, but it can also make it hard even to correctly exit your program.
Or is there a better way to create a small single-use thread?
Not obviously, but it's frequently better to use a thread pool for running tasks in the background, instead of starting and stopping lots of short-lived threads.
You might be able to use std::async() instead, but the future it returns may block in the destructor in some circumstances, if you try to discard it.
See the documentation of the destructor of std:thread:
If *this has an associated thread (joinable() == true), std::terminate() is called.
You should explicitly say that you don't care what's going to happen with the thread, and that you're OK with loosing any control over it. And that is what detach is for.
In general, this looks like a design problem so crashing makes sense: it's hard to propose a general and not surprising rule about what should happen in such a case (e.g. your program might as well normally end its execution - what should happen with the thread?).
Basically, your use case requires a call to detach() because your use case is pretty weird, and not what C++ is trying to make easy.
While Java and .Net blithely let you toss away a Thread object whose associated thread is still running, in the C++ model the Thread is closer to being the thread, in the sense that the existence of the Thread object coincides with the lifetime, or at least joinability, of the execution it refers to. Note how it's not possible to create a Thread without starting it (except in the case of the default constructor, which is really just there in the service of move semantics), or to copy it or to make one from a thread id. C++ wants Thread to outlive the thread.
Maintaining that condition has various benefits. Final cleanup of a thread's control data doesn't have to be done automagically by the OS, because once a Thread goes away, nothing can ever try to join it. It's easier to ensure that variables with thread storage get destroyed in time, since the main thread is the last to exit (barring some move shenanigans). And a missing join -- which is an extremely common type of bug -- gets properly flagged at runtime.
Letting some thread wander off into the distance, in contrast, is allowed, but it's an unusual thing to do. Unless it's interacting with your other threads through sync objects, there's no way to ensure it's done whatever it was meant to do. A detached thread is on the level of reinterpret_cast: You're allowed to tell the compiler that you know something it doesn't, but that has to be explicit, not just the consequence of the function you didn't call.
Consider this: thread A creates thread B and thread A leaves its scope of execution. The handle for thread B is about to be lost. What should happen now? There are several possibilities, with most obvious as follows:
Thread B is detached and continues its execution indempedently
Thread A waits (joins) thread B before quiting its own scope
Now you can argue which is better: 1 or 2? How should we (the compiler) decide on which one of these is better?
So what the designers did was something different: crash terminate the code so that the developer picks one of these solutions explicitely. In order to avoid implicit (perhaps unwanted) behaviuor. It's a signal for you: "hey, pay attention now, this piece of code is important and I (the compiler) don't want to decide for you".
As the title of the question says, why C++ threads (std::thread and pthread) are movable but not copiable? What consequences are there, if we do make it copiable?
Regarding copying, consider the following snippet:
void foo();
std::thread first (foo);
std::thread second = first; // (*)
When the line marked (*) takes place, presumably some of foo already executed. What would the expected behavior be, then? Execute foo from the start? Halt the thread, copy the registers and state, and rerun it from there?
In particular, given that function objects are now part of the standard, it's very easy to launch another thread that performs exactly the same operation as some earlier thread, by reusing the function object.
There's not much motivation to begin with for this, therefore.
Regarding moves, though, consider the following:
std::vector<std::thread> threads;
without move semantics, it would be problematic: when the vector needs to internally resize, how would it move its elements to another buffer? See more on this here.
If the thread objects are copyable, who is finally responsible for the single thread of execution associated with the thread objects? In particular, what would join() do for each of the thread objects?
There are several possible outcomes, but that is the problem, there are several possible outcomes with no real overlap that can be codified (standardised) as a general use case.
Hence, the most reasonable outcome is that 1 thread of execution is associated with at most 1 thread object.
That is not to say some shared state cannot be provided, it is just that the user then needs to take further action in this regard, such as using a std::shared_ptr.
A few days ago my friend told me about the situation, they had in their project.
Someone decided, that it would be good to destroy the object of NotVerySafeClass in parallel thread (like asynchronously). It was implemented some time ago.
Now they get crashes, because some method is called in main thread, while object is destroyed.
Some workaround was created to handle the situation.
Ofcourse, this is just an example of not very good solution, but still the question:
Is there some way to prevent the situation internally in NotVerySafeClass (deny running the methods, if destructor was called already, and force the destructor to wait, until any running method is over (let's assume there is only one method))?
No, no and no. This is a fundamental design issue, and it shows a common misconception in thinking about multithreaded situations and race conditions in general.
There is one thing that can happen equally likely, and this is really showing that you need an ownership concept: The function calling thread could call the function just right after the object has been destroyed, so there is no object anymore and try to call a function on it is UB, and since the object does not exist anymore, it also has no chance to prevent any interaction between the dtor and a member function.
What you need is a sound ownership policy. Why is the code destroying the object when it is still needed?
Without more details about the code, a std::shared_ptr would probably solve this issue. Depending on your specific situation, you may be able to solve it with a more lightweight policy.
Sounds like a horrible design. Can't you use smart pointer to make sure the object is destroyed only when no-one holds any references to it?
If not, I'd use some external synchronization mechanism. Synchronizing the destructor with a method is really awkward.
There is no methods that can be used to prevent this scenario.
In multithread programming, you need to make sure that an object will not be deleted if there are some others thread still accessing it.
If you are dealing with such code, it needs fundamental fix
(Not to promote bad design) but to answer your two questions:
... deny running the methods, if destructor was called already
You can do this with the solution proposed by #snemarch and #Simon (a lock). To handle the situation where one thread is inside the destructor, while another one is waiting for the lock at the beginning of your method, you need to keep track of the state of the object in a thread-safe way in memory shared between threads. E.g. a static atomic int that is set to 0 by the destructor before releasing the lock. The method checks for the int once it acquires the lock and bails if its 0.
... force the destructor to wait, until any running method is over
The solution proposed by #snemarch and #Simon (a lock) will handle this.
No. Just need to design the program propertly so that it is thread safe.
Why not make use of a mutex / semaphore ? At the beginning of any method the mutex is locked, and the destructor wait until the mutex is unlocked. It's a fix, not a solution. Maybe you should change the design of a part of your application.
Simple answer: no.
Somewhat longer answer: you could guard each and every member function and the destructor in your class with a mutex... welcome to deadlock opportunities and performance nightmares.
Gather a mob and beat some design sense into the 'someone' who thought parallel destruction was a good idea :)
I have the following sample C++ code:
class Factory
{
public:
static Factory& createInstance()
{
static Factory fac;
return fac;
}
private:
Factory()
{
//Does something non-trivial
}
};
Let's assume that createInstance is called by two threads at the same time. So will the resulting object be created properly? What happens if the second thread enters the createInstance call when the first thread is in the constructor of Factory?
C++11 and above: local static creation is thread-safe.
The standard guarantees that:
The creation is synchronized.
Should the creation throws an exception, the next time the flow of execution passes the variable definition point, creation will be attempted again.
It is generally implemented with double-checking:
first a thread-local flag is checked, and if set, then the variable is accessed.
if not yet set, then a more expensive synchronized path is taken, and if the variable is created afterward, the thread-local flag is set.
C++03 and C++98: the standard knows no thread.
There are no threads as far as the Standard is concerned, and therefore there is no provision in the Standard regarding synchronization across threads.
However some compilers implement more than the standard mandates, either in the form of extensions or by giving stronger guarantees, so check out for the compilers you're interested in. If they are good quality ones, chances are that they will guarantee it.
Finally, it might not be necessary for it to be thread-safe. If you call this method before creating any thread, then you ensures that it will be correctly initialized before the real multi-threading comes into play, and you'll neatly side-step the issue.
Looking at this page, I'd say that this is not thread-safe, because the constructor could get called multiple times before the variable is finally assigned. An InterlockedCompareExchange() might be needed, where you create a local copy of the variable, then atomically assign the pointer to a static field via the interlocked function, if the static variable is null.
Of course it's thread safe! Unless you are a complete lunatic and spawn threads from constructors of static objects, you won't have any threads until after main() is called, and the createInstance method is just returning a reference to an already constructed object, there's no way this can fail. ISO C++ guarantees that the object will be constructed before the first use after main() is called: there's no assurance that will be before main is called, but is has to be before the first use, and so all systems will perform the initialisation before main() is called. Of course ISO C++ doesn't define behaviour in the presence of threads or dynamic loading, but all compilers for host level machines provide this support and will try to preserve the semantics specified for singly threaded statically linked code where possible.
The instantiation (first call) itself is threadsafe.
However, subsequent access will not be, in general. For instance, suppose after instantiation, one thread calls a mutable Factory method and another calls some accessor method in Factory, then you will be in trouble.
For example, if your factory keeps a count of the number of instances created, you will be in trouble without some kind of mutex around that variable.
However, if Factory is truly a class with no state (no member variables), then you will be okay.
In general, if you have a class that inherits from a Thread class, and you want instances of that class to automatically deallocate after they are finished running, is it okay to delete this?
Specific Example:
In my application I have a Timer class with one static method called schedule. Users call it like so:
Timer::schedule((void*)obj, &callbackFunction, 15); // call callbackFunction(obj) in 15 seconds
The schedule method creates a Task object (which is similar in purpose to a Java TimerTask object). The Task class is private to the Timer class and inherits from the Thread class (which is implemented with pthreads). So the schedule method does this:
Task *task = new Task(obj, callback, seconds);
task->start(); // fork a thread, and call the task's run method
The Task constructor saves the arguments for use in the new thread. In the new thread, the task's run method is called, which looks like this:
void Timer::Task::run() {
Thread::sleep(this->seconds);
this->callback(this->obj);
delete this;
}
Note that I can't make the task object a stack allocated object because the new thread needs it. Also, I've made the Task class private to the Timer class to prevent others from using it.
I am particularly worried because deleting the Task object means deleting the underlying Thread object. The only state in the Thread object is a pthread_t variable. Is there any way this could come back to bite me? Keep in mind that I do not use the pthread_t variable after the run method finishes.
I could bypass calling delete this by introducing some sort of state (either through an argument to the Thread::start method or something in the Thread constructor) signifying that the method that is forked to should delete the object that it is calling the run method on. However, the code seems to work as is.
Any thoughts?
I think the 'delete this' is safe, as long as you don't do anything else afterwards in the run() method (because all of the Task's object's member variables, etc, will be freed memory at that point).
I do wonder about your design though... do you really want to be spawning a new thread every time someone schedules a timer callback? That seems rather inefficient to me. You might look into using a thread pool (or even just a single persistent timer thread, which is really just a thread pool of size one), at least as an optimization for later. (or better yet, implement the timer functionality without spawning extra threads at all... if you're using an event loop with a timeout feature (like select() or WaitForMultipleObjects()) it is possible to multiplex an arbitrary number of independent timer events inside a single thread's event loop)
There's nothing particularly horrible about delete this; as long as you assure that:the object is always dynamically allocated, andno member of the object is ever used after it's deleted.
The first of these is the difficult one. There are steps you can take (e.g. making the ctor private) that help, but nearly anything you do can be bypassed if somebody tries hard enough.
That said, you'd probably be better off with some sort of thread pool. It tends to be more efficient and scalable.
Edit: When I talked about being bypassed, I was thinking of code like this:
class HeapOnly {
private:
HeapOnly () {} // Private Constructor.
~HeapOnly () {} // A Private, non-virtual destructor.
public:
static HeapOnly * instance () { return new HeapOnly(); }
void destroy () { delete this; } // Reclaim memory.
};
That's about as good of protection as we can provide, but getting around it is trivial:
int main() {
char buffer[sizeof(HeapOnly)];
HeapOnly *h = reinterpret_cast<HeapOnly *>(buffer);
h->destroy(); // undefined behavior...
return 0;
}
When it's direct like this, this situation's pretty obvious. When it's spread out over a larger system, with (for example) an object factory actually producing the objects, and code somewhere else entirely allocating the memory, etc., it can become much more difficult to track down.
I originally said "there's nothing particularly horrible about delete this;", and I stand by that -- I'm not going back on that and saying it shouldn't be used. I am trying to warn about the kind of problem that can arise with it if other code "Doesn't play well with others."
delete this frees the memory you have explicitly allocated for the thread to use, but what about the resources allocated by the OS or pthreads library, such as the thread's call stack and kernel thread/process structure (if applicable)? If you never call pthread_join() or pthread_detach() and you never set the detachstate, I think you still have a memory leak.
It also depends on how your Thread class is designed to be used. If it calls pthread_join() in its destructor, that's a problem.
If you use pthread_detach() (which your Thread object might already be doing), and you're careful not to dereference this after deleting this, I think this approach should be workable, but others' suggestions to use a longer-lived thread (or thread pool) are well worth considering.
If all you ever do with a Task object is new it, start it, and then delete it, why would you need an object for it anyway? Why not simply implement a function which does what start does (minus object creation and deletion)?