To start another program, I use fork() and exec() in my code. Since my program uses the Threading Building Blocks library for task management, it initializes the scheduler with a thread pool beforehand.
Whenever I fork, it seems that all the threads are forked too (I checked the number of threads with top). From what I've read on the Internet, only the calling thread should be duplicated.
How do I achieve this behaviour, and is Threading Building Blocks causing the fork of multiple threads?
I believe the Internet is correct in this regard, i.e. right after fork the newly created process has only a single thread, the one that called fork. The problem with fork in a multithreaded program is state integrity for the other threads (those not doing the fork), e.g. if a lock is held during the fork, it must be released in both processes, new and old. TBB has some support for dealing with this, but I'm not sure it is what you need, since an exec right after fork replaces all memory, so held locks should not be an issue.
If you are doing something special between fork and exec (say, taking a lock possibly held by TBB workers), then the first obstacle with TBB is the state of the workers. TBB allows you to wait for worker termination (note this is preview functionality):
#define TBB_PREVIEW_WAITING_FOR_WORKERS 1
#include "tbb/task_scheduler_init.h"

{
    tbb::task_scheduler_init sch(threads, 0, /*wait_workers=*/true);
    tbb::parallel_for(…);
} // waits for the workers here; no worker threads exist after this point
Without this special argument to task_scheduler_init(), there is no guarantee of worker termination.
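For illustration, here is a minimal fork+exec sketch under the assumption discussed above: nothing but async-signal-safe calls run between fork() and exec, so whatever lock state the single-threaded child inherits becomes irrelevant once exec replaces the image. The spawned command ("ls") is just a placeholder.

#include <unistd.h>
#include <sys/wait.h>
#include <cstdio>

int main() {
    pid_t pid = fork();
    if (pid == 0) {
        // Child: only the thread that called fork() exists here.
        execlp("ls", "ls", "-l", (char *)nullptr);
        _exit(127);                 // reached only if exec failed
    } else if (pid > 0) {
        int status;
        waitpid(pid, &status, 0);   // parent (and its TBB workers) continue
    } else {
        std::perror("fork");
    }
    return 0;
}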
I have a main linux thread (th1) that runs a number of boost fibers that are scheduled using the boost priority scheduler.
Every so often, I would like to launch a fiber from another thread (th2) that will run in th1 and be scheduled along with the other th1 fibers. The code I use to launch fibers in th1 looks like:
void launchFiber()
{
    boost::fibers::use_scheduling_algorithm< priority_scheduler >();
    boost::fibers::fiber *fib = new boost::fibers::fiber(fb_fiberFunction);
    priority_props & props( fib->properties< priority_props >() );
    props.set_priority(FiberPriorityValue);
    props.name = "Fiber Name";
    fib->detach();
}
The launch code works fine when I call the launchFiber function from th1 but it does not work when I call it from th2--it looks like the fiber is not added to the th1 fiber queue. I have added a mutex to the th1 priority_scheduler routine to protect the fiber queue but this doesn't seem to help.
It seems to me that I don't really understand how the fiber system is working when there is more than one thread involved. I have tried to look at the library source code but it is not really clear to me.
My guess is that this would be simple if I understood it correctly. Could someone provide an example of how I might do this?
Contrary to system threads, fibers are based on cooperative scheduling. This means that you must explicitly tell the scheduler when another fiber can be scheduled. At such a user-defined scheduling point, the scheduler chooses the best fiber to run; here, it will choose the one with the highest priority. If there is no fiber with a higher priority, the same fiber resumes its execution. The documentation states:
Each fiber has its own stack.
A fiber can save the current execution state, including all registers and CPU flags, the instruction pointer, and the stack pointer and later restore this state. The idea is to have multiple execution paths running on a single thread using cooperative scheduling (versus threads, which are preemptively scheduled). The running fiber decides explicitly when it should yield to allow another fiber to run (context switching).
Control is cooperatively passed between fibers launched on a given thread. At a given moment, on a given thread, at most one fiber is running. Spawning additional fibers on a given thread does not distribute your program across more hardware cores, though it can make more effective use of the core on which it's running.
this_fiber::yield() is meant to perform the actual yield operation on the current fiber.
Note that fibers are not safely compatible with thread-local storage if they are moved between threads (not the case by default), and using plain mutexes/condition variables is not safe either, particularly if a yield can appear in the middle of a protected section (critical section), as this can cause deadlocks. It can also be sub-optimal, because a mutex can cause the current thread to be pre-empted or to wait passively while another fiber could be doing computation. Boost provides alternative synchronisation mechanisms for fibers that are safer and more efficient, though one still needs to take care with them. This is why fibers cannot be used to blindly execute arbitrary code.
For more information, you can take a look at the examples, starting from the simplest one.
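As a minimal illustration of cooperative scheduling (not the poster's priority_scheduler), here are two fibers on one thread that take turns by explicitly calling boost::this_fiber::yield():

#include <boost/fiber/all.hpp>
#include <iostream>

int main() {
    boost::fibers::fiber a([] {
        for (int i = 0; i < 3; ++i) {
            std::cout << "a" << i << "\n";
            boost::this_fiber::yield();   // hand control to another fiber
        }
    });
    boost::fibers::fiber b([] {
        for (int i = 0; i < 3; ++i) {
            std::cout << "b" << i << "\n";
            boost::this_fiber::yield();
        }
    });
    a.join();
    b.join();
    return 0;
}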
I did spend some time looking into this problem. It turns out that executing the command: boost::fibers::use_scheduling_algorithm< priority_scheduler >() creates a new priority_scheduler object with its own fiber queue. And this scheduler is associated with a context that is specific to the thread it is running in. So, in my circumstance, when I created a new fiber it ended up in the queue specific to the calling thread (th2, which wasn't running fibers) instead of the thread that was running all my fibers, th1.
So, I abandoned my idea of creating a fiber to run in th1 via a call from th2. I am now using a queue that holds fiber launch requests from external threads. The fiber thread (th1) checks this queue when it executes the scheduler's pick_next() function and, if requests exist, creates the fibers and adds them to th1's scheduler queue. It works fine, though I have an intermediate queue which I would prefer not to have (for aesthetic reasons only). A sketch of the approach follows.
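For reference, here is a minimal sketch of that intermediate queue; FiberLaunchQueue is an illustrative name, and priority handling is omitted. An external thread posts a callable, and th1 drains the queue from within its scheduler (e.g. in pick_next()) so the fibers are created in th1's context:

#include <boost/fiber/all.hpp>
#include <functional>
#include <mutex>
#include <queue>

class FiberLaunchQueue {
    std::mutex mtx_;
    std::queue<std::function<void()>> requests_;

public:
    // Called from any thread (e.g. th2) to request a fiber on th1.
    void post(std::function<void()> fn) {
        std::lock_guard<std::mutex> lk(mtx_);
        requests_.push(std::move(fn));
    }

    // Called on th1 (e.g. from the scheduler's pick_next()) so the
    // fibers are created in th1's context and land in th1's queue.
    void drain() {
        std::queue<std::function<void()>> pending;
        {
            std::lock_guard<std::mutex> lk(mtx_);
            pending.swap(requests_);
        }
        while (!pending.empty()) {
            boost::fibers::fiber(std::move(pending.front())).detach();
            pending.pop();
        }
    }
};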
I'm designing a thread library. So far I have a method that initializes the library, one that creates threads, and one that yields the current thread to the next one on a queue of ready threads.
Before I move on to implementing semaphores for the threads, I figured I should probably kill the threads as soon as they are done and free up their allocated memory, but I'm having trouble figuring out how to do that. How do I tell when a thread has "finished"?
You can't just kill threads safely or reliably; let them exit naturally (when their entry function returns).
Although the system provides a means to kill a thread, nearly any C++ program would exhibit undefined behavior if it continued running afterwards. You could dream up cases where killing could be accomplished without side effects (to the rest of the program), but such a program would not at all resemble idiomatic C++; it would be very exotic, with many unusual and severe restrictions.
When you want to know whether a thread has exited, you can add some cleanup before it exits in order to track its status.
When you want the ability to request a thread exit (naturally), consider run loops and messages.
You don't explicitly kill the threads when they have finished running their forked procedures, because the code doing the killing would still be executing in the context of the thread to be killed.
Instead, you have a scheduler/interrupt handler that handles the context switching of the threads and maintains a few queues for managing this. You can have it save a reference to the threads to be killed, something like scheduler->SetThreadToKill( currentThread ), probably inside your finish() method (or similar), which sets a flag for the corresponding threads.
When a context switch occurs and you have swapped out all data structures of the current thread for those of the next thread, your scheduler can call the destructor for all the threads that have the toBeKilled flag set. A sketch is below.
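Here is a sketch of that deferred-reaping idea; Thread, Scheduler, and the method names are hypothetical, since the details depend on your library:

// Illustrative only: a home-grown thread library's control structures.
struct Thread {
    bool toBeKilled = false;
    // stack pointer, saved registers, run-queue links, ...
};

struct Scheduler {
    // Called from the wrapper that invoked the thread's entry function,
    // after that function has returned (i.e. the thread is "finished").
    void setThreadToKill(Thread *t) { t->toBeKilled = true; }

    // Called right after a context switch, on the *next* thread's stack,
    // so freeing the previous thread's stack is now safe.
    void reapAfterSwitch(Thread *prev) {
        if (prev->toBeKilled)
            delete prev;   // frees the control block and its stack
    }
};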
The best policy, by far, for killing threads is not to do it explicitly at all (unless you are an OS, i.e. on app shutdown). Queue messages and tasks to threads that loop around some queue to perform more work. If you don't write any code to continually new, create, start, terminate, delete, test, check, enlist, delist, enqueue, dequeue and otherwise micro-manage threads, then that code cannot contain bugs.
I have a program that should get the maximum out of my cpu.
It is multithreaded via pthreads; the threads do their job well apart from the fact that they "only" get my cores to about 60% load, which is not enough in my opinion.
I am searching for the reason and am asking myself (and hereby you) whether the blocking functions mutex_lock/cond_wait are candidates.
What happens when a thread blocks in such a function?
Does pthread switch to another thread it handles, or
does the thread yield its time to the system? And if the latter is the case, can I change this behavior?
More Information
The setting is one main thread that fills the task pool and countless workers that fetch jobs from there and wait on a condition variable that is signalled via broadcast when a serialized calculation is done. They then continue with the values from this calculation until they are done, deliver their results, and fetch the next job...
On a typical modern pthreads implementation, each thread is managed by the kernel not unlike a separate process. Any blocking call like pthread_mutex_lock or pthread_cond_wait (but also, say, read) will yield its time to the system. The system will then find another eligible thread to schedule, whether in your process or another process, and run it.
If your program is only taking 60% of the CPU, it is more likely blocked on I/O than on pthread operations, unless you have done something way too granular with your pthread operations.
If a thread is waiting on a mutex/condition variable, it doesn't use resources (well, it uses just a tiny amount). Whenever a thread enters the waiting state, control switches to other threads. When the mutex is released (or the condition variable signalled), the thread wakes up and may acquire the mutex (if no other thread grabs it first) and continue to run. If, however, some other thread acquires the mutex first (which can happen if several threads are waiting on it), the thread goes back to sleep.
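A minimal sketch (not the poster's code) of the worker pattern in question; while the job queue is empty the worker sleeps inside pthread_cond_wait and consumes essentially no CPU:

#include <pthread.h>
#include <deque>

static pthread_mutex_t pool_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  pool_cv  = PTHREAD_COND_INITIALIZER;
static std::deque<int> jobs;   // stand-in for real job objects

void *worker(void *) {
    for (;;) {
        pthread_mutex_lock(&pool_mtx);
        while (jobs.empty())                        // guards against spurious wakeups
            pthread_cond_wait(&pool_cv, &pool_mtx); // sleeps here, ~zero CPU use
        int job = jobs.front();
        jobs.pop_front();
        pthread_mutex_unlock(&pool_mtx);
        (void)job;   // do the real work here, outside the lock
    }
    return nullptr;
}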
I am developing a C++ application that needs to process a large amount of data. I am not in a position to partition the data so that multiple processes can handle each partition independently. I am hoping to get ideas on frameworks/libraries that can manage threads and allocate work among worker threads.
Thread management should include at least the functionality below.
1. Decide how many worker threads are required. We may need to provide a user-defined function to calculate the number of threads.
2. Create the required number of threads.
3. Kill/stop unnecessary threads to reduce resource wastage.
4. Monitor the health of each worker thread.
Work allocation should include the functionality below.
1. Using callback functionality, the library should get a piece of work.
2. Allocate the work to an available worker thread.
3. A master/slave configuration or a pipeline of worker threads should be possible.
Many thanks in advance.
Your question essentially boils down to "how do I implement a thread pool?"
Writing a good thread pool is tricky. I recommend hunting for a library that already does what you want rather than trying to implement it yourself. Boost has a thread-pool library in the review queue, and both Microsoft's concurrency runtime and Intel's Threading Building Blocks contain thread pools.
With regard to your specific questions, most platforms provide a function to obtain the number of processors. In C++0x this is std::thread::hardware_concurrency(). You can then use this in combination with information about the work to be done to pick a number of worker threads.
Since creating threads is actually quite time consuming on many platforms, and blocked threads do not consume significant resources beyond their stack space and thread info block, I would recommend that you just block worker threads with no work to do on a condition variable or similar synchronization primitive rather than killing them in the first instance. However, if you end up with a large number of idle threads, it may be a signal that your pool has too many threads, and you could reduce the number of waiting threads.
Monitoring the "healthiness" of each thread is tricky, and typically platform dependent. The simplest way is just to check that (a) the thread is still running, and hasn't unexpectedly died, and (b) the thread is processing tasks at an acceptable rate.
The simplest means of allocating work to threads is just to use a single shared job queue: all tasks are added to the queue, and each thread takes a task when it has completed the previous one. A more complex alternative is to have a queue per thread, with a work-stealing scheme that allows a thread to take work from others when it has run out of tasks. A minimal sketch of the shared-queue approach follows.
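Here is a minimal sketch of the single-shared-queue approach, built on the C++11 facilities mentioned above; SimplePool is an illustrative name, and a production pool would also need exception handling and a way to return results:

#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

class SimplePool {
    std::mutex mtx_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    std::vector<std::thread> workers_;
    bool done_ = false;

public:
    explicit SimplePool(unsigned n = std::thread::hardware_concurrency()) {
        for (unsigned i = 0; i < n; ++i)
            workers_.emplace_back([this] {
                for (;;) {
                    std::function<void()> task;
                    {
                        std::unique_lock<std::mutex> lk(mtx_);
                        cv_.wait(lk, [this] { return done_ || !tasks_.empty(); });
                        if (done_ && tasks_.empty())
                            return;              // pool is shutting down
                        task = std::move(tasks_.front());
                        tasks_.pop();
                    }
                    task();                      // run outside the lock
                }
            });
    }

    void submit(std::function<void()> f) {
        {
            std::lock_guard<std::mutex> lk(mtx_);
            tasks_.push(std::move(f));
        }
        cv_.notify_one();
    }

    ~SimplePool() {
        {
            std::lock_guard<std::mutex> lk(mtx_);
            done_ = true;
        }
        cv_.notify_all();
        for (auto &w : workers_)
            w.join();
    }
};

Workers block on the condition variable when the queue is empty, which matches the earlier advice to park idle threads rather than kill them.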
If your threads can submit tasks to the work queue and wait for the results then you need to have a scheme for ensuring that your worker threads do not all get stalled waiting for tasks that have not yet been scheduled. One option is to spawn a new thread when a task gets blocked, and another is to run the not-yet-scheduled task that is blocking a given thread on that thread directly in a recursive manner. There are advantages and disadvantages with both these schemes, and with other alternatives.
I have a program which:
has a main thread (1) which starts a server thread (2) and another thread (4).
the server thread (2) does an accept(), then creates a new thread (3) to handle the connection.
At some point, thread (4) does a fork/exec to run another program which should connect to the socket that thread (2) is listening on. Occasionally this fails or takes an unreasonably long time, and it's extremely difficult to diagnose. If I strace the system, it appears that the fork/exec has worked, the accept has happened, and the new connection-handling thread (3) has been created... but nothing happens in that thread (using strace -ff, the file for the relevant pid is blank).
Any ideas?
I came to the conclusion that it was probably this phenomenon:
http://kerneltrap.org/mailarchive/linux-kernel/2008/8/15/2950234/thread
as the bug is difficult to trigger on our development systems but is generally reported by users running on large shared machines; also, the forked application starts a JVM, which itself allocates a lot of threads. The problem is also associated with the machine being loaded and with extensive memory usage (we have a machine with 128GB of RAM, and processes may be 10-100GB in size).
I've been reading the O'Reilly pthreads book, which explains pthread_atfork(), and suggests the use of a "surrogate parent" process forked from the main process at startup from which subprocesses are run. It also suggests the use of a pre-created thread pool. Both of these seem like good ideas, so I'm going to implement at least one of them.
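A hedged sketch of the surrogate-parent idea: fork the helper at startup, before any threads exist, then ask it over a pipe to spawn children. The one-byte protocol, the function name, and the spawned command ("ls") are illustrative only:

#include <unistd.h>
#include <sys/wait.h>

static int surrogate_fd = -1;   // main process writes here to request a spawn

void start_surrogate() {
    int pipefd[2];
    if (pipe(pipefd) != 0)
        return;
    if (fork() == 0) {                  // surrogate: forked while still single-threaded
        close(pipefd[1]);
        char cmd;
        while (read(pipefd[0], &cmd, 1) == 1) {
            if (fork() == 0) {          // grandchild runs the actual program
                execlp("ls", "ls", (char *)nullptr);
                _exit(127);             // reached only if exec failed
            }
            while (waitpid(-1, nullptr, WNOHANG) > 0) {}   // reap finished children
        }
        _exit(0);
    }
    close(pipefd[0]);
    surrogate_fd = pipefd[1];           // later: write(surrogate_fd, "x", 1);
}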
It looks like a deadlock condition. Look at blocking functions, like accept(); the problem should be there.
Reduce the code to the smallest possible size that still exhibits the behavior and post it here. Either you will find the answer yourself or we will be able to track it down.
BTW, see http://lists.samba.org/archive/linux/2002-February/002171.html: it seems that pthread behavior across exec is not well defined and may depend on your OS.
Do you have any code between fork and exec? This may be a problem.
Be very careful with multiple threads and fork. Most of glibc/libstdc++ is thread-safe, but if a thread other than the forking thread is holding a lock when the fork executes, the forked process inherits the mutexes in their current locked state, and the new process will never see those mutexes unlocked. For more information, see man pthread_atfork.
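A minimal sketch of the pthread_atfork pattern: the prepare handler takes the lock before fork so it is in a known state, and the parent and child handlers both release it afterwards; g_lock is an illustrative name:

#include <pthread.h>
#include <unistd.h>

static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;

static void prepare()   { pthread_mutex_lock(&g_lock);   } // runs just before fork
static void in_parent() { pthread_mutex_unlock(&g_lock); } // runs in the parent
static void in_child()  { pthread_mutex_unlock(&g_lock); } // runs in the new child

int main() {
    pthread_atfork(prepare, in_parent, in_child);
    pid_t pid = fork();
    // Both processes now see g_lock unlocked, regardless of timing.
    (void)pid;
    return 0;
}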
I've just run into the same problem and found that fork() appears to duplicate all the threads. Now imagine what your program does after a fork() with all the threads running in duplicate...
The following rules are from "A Mini-guide regarding fork() and Pthreads":
1. You DO NOT WANT to do that.
2. If you need to fork(), then, whenever possible, fork() all your children prior to starting any threads.
Edit: I tried it; fork() does not duplicate threads.