Continue executing another thread - c++

I'm currently playing with WinAPI and I have proceeded to threads. My program has n threads which work with each other. There is one main thread, which writes some data to a specific memory location and waits until the thread working with that memory location processes the data. Then the whole procedure repeats, without terminating any thread.
My problem is that busy-waiting for the second thread wastes too much time.
Is there any way to suspend the current thread (to leave time for other threads), or to solve this problem differently?
Please help.

I'm guessing that you're currently polling / busy-waiting in your main thread, constantly checking the state of some completion flag the worker thread will set. As you note, this isn't desirable, as you waste some proportion of CPU bandwidth just waiting for the worker to complete. In some cases, this will reduce the amount of time the worker itself is scheduled for, delaying its completion.
Rather than doing this, you can use a synchronisation object such as an Event or Semaphore to have your main thread sleep until the worker signals its completion.

You can use synchronization objects like mutexes, semaphores, events, etc. for synchronization, and the WaitForSingleObject/WaitForMultipleObjects APIs for thread waiting.
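For illustration, here is a minimal sketch of the Event-based handshake suggested above, assuming a single worker; the names g_work_done and worker_proc are placeholders, not from the question. The main thread writes the data, starts the worker, and then sleeps in WaitForSingleObject instead of polling:

#include <windows.h>
#include <stdio.h>

HANDLE g_work_done;   // auto-reset event, signaled by the worker when it finishes
int    g_shared_data; // the "specific memory location" from the question

DWORD WINAPI worker_proc(LPVOID)
{
    g_shared_data *= 2;      // process the data
    SetEvent(g_work_done);   // wake the waiting main thread
    return 0;
}

int main()
{
    // auto-reset (FALSE), initially non-signaled (FALSE)
    g_work_done = CreateEvent(NULL, FALSE, FALSE, NULL);

    g_shared_data = 21;      // main thread writes the data
    HANDLE worker = CreateThread(NULL, 0, worker_proc, NULL, 0, NULL);

    // Blocks without consuming CPU until the worker calls SetEvent.
    WaitForSingleObject(g_work_done, INFINITE);
    printf("result: %d\n", g_shared_data);

    WaitForSingleObject(worker, INFINITE);
    CloseHandle(worker);
    CloseHandle(g_work_done);
    return 0;
}

For the repeating handshake described in the question (workers that are never terminated), you would typically add a second event signaled in the other direction, so the worker in turn sleeps until the main thread has written the next batch of data.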

Related

Does an Event respond faster than a Semaphore?

In a project I ran into a case like this (on Windows 7):
When several threads are busy (all my CPU cores are busy working), there is a delay before a thread
receives a semaphore (which has been incremented from 0 to 1). It may be as long as 1.5 ms.
I solved this by caching a few things and incrementing the semaphore earlier.
So to me it seems that signaling a semaphore is slow; it's not received immediately by threads (especially when the CPUs are busy), but if you signal it earlier, before some thread begins to wait on it, there is no delay.
I once thought an event was just a semaphore with a maximum value of 1. Well, having now met this case, I'm beginning to wonder whether an event is faster than a semaphore at notifying threads to 'wake up'.
Sorry, I tried, but couldn't come up with a demo; I'm not very good at threading yet.
EDIT:
Is it true that an Event is faster than a Semaphore on Windows?
1.5 milliseconds is not explained by just the overhead between different multithreading primitives.
To simplify, threads have three states:
blocked
runnable
running
If a thread is waiting on a semaphore or an event, then it's blocked. When the event is signalled, it becomes runnable.
So the real question is, "When does a runnable thread actually run?" This varies according to scheduler algorithms, etc., but obviously it needs to run on a core, and that means nothing else can be "running" on that core at the same time. The scheduler will normally 'remove' the current running thread from a core when one of the following happens:
it waits on a semaphore/event, and so becomes 'blocked'
it has been running continually for a certain time (time-based, or round-robin scheduling)
a higher-priority thread becomes runnable.
The 1.5 milliseconds is probably round-robin or time-based scheduling. Your thread is runnable but just hasn't started yet. If the thread must start, and should boot out the current thread, then you can try to increase its priority via SetThreadPriority:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms686277(v=vs.85).aspx
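As a hedged sketch (worker_proc is a stand-in for your own thread function, not anything from the question), raising the priority looks like this:

#include <windows.h>

DWORD WINAPI worker_proc(LPVOID) { /* latency-sensitive work */ return 0; }

int main()
{
    HANDLE worker = CreateThread(NULL, 0, worker_proc, NULL, 0, NULL);
    // Ask the scheduler to prefer this thread over normal-priority ones,
    // so it is dispatched sooner once it becomes runnable.
    SetThreadPriority(worker, THREAD_PRIORITY_ABOVE_NORMAL);
    WaitForSingleObject(worker, INFINITE);
    CloseHandle(worker);
    return 0;
}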
If a thread is waiting on a semaphore and it gets signaled, the thread will, in my limited testing, become running in ~10 µs on a box that is not overloaded.
Signaling, and subsequent dispatching onto a core, will take longer if:
The signaled thread is in a different process from the thread it preempts.
Running the signaled thread requires a thread running on another core to be preempted.
The box is already overloaded with higher-priority threads.
1.5ms must represent an extreme case where your box is very busy.
In such a case, replacing the semaphore with an event is unlikely to result in any significant improvement to overall signaling latency, because the bulk of the work/delay required by the inter-thread signaling is tied up in the scheduling/dispatching, which is required in either case.

Synchronizing worker threads

I have a scenario for which I am trying to come up with the best synchronization approach. We assume that std::thread in C++11 is present, so no need to worry about differences between various threading libraries etc.
The scenario is this. Thread a, the main thread, wants to hand out tasks to a bunch of worker threads. Then, after giving out its final instruction for the time being, it needs to wait for all the threads to complete their work. We don't want to join them, just wait for them to finish their given task. Then thread a has to analyze the collected data from all threads, and then send out commands to the workers to begin the procedure again.
In short, these are the steps.
Thread a sends command x to all worker threads.
Thread a waits until all the workers have finished.
Thread a does processing.
Go back to 1.
What would you suggest that I use? Simple mutexes? Condition variables? A combination of the two? Any tips on how to structure the synchronization to be as efficient as possible would be appreciated.
You have n worker threads and one main thread a, which delegates tasks to workers and must wait for them to complete these tasks before assigning them a new batch of tasks.
The basic technique is to use a barrier (like boost::barrier) to synchronize the end of the worker threads and a.
The barrier is initialized to n+1. Main thread a waits on the barrier, and each worker thread does the same at the end of its task. When the last thread calls wait on the barrier, all the threads are woken up, and the main thread can continue its work. You may want to add a second barrier to block the worker threads until a new task is assigned to them.
The body of a worker thread may look like the following pseudocode:
while (running) {
    startbarrier.wait(); // wait for the main thread to signal start
    do_work();
    endbarrier.wait();   // signal the end of work back to the main thread
}
The same thing can also be implemented with semaphores. Both semaphores and barriers can be implemented with a mutex and a condition variable.
See this SO question for more details.
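If you don't want the Boost dependency, here is a minimal C++11 sketch of such a barrier built from exactly those two primitives. The class name and the generation-counter approach are illustrative, not the actual boost::barrier implementation:

#include <condition_variable>
#include <mutex>

class barrier {
public:
    explicit barrier(unsigned count)
        : threshold_(count), count_(count), generation_(0) {}

    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        unsigned gen = generation_;
        if (--count_ == 0) {        // last thread to arrive
            ++generation_;          // begin a new cycle
            count_ = threshold_;    // reset so the barrier is reusable
            cv_.notify_all();       // wake all waiting threads
        } else {
            // The predicate guards against spurious wakeups.
            cv_.wait(lock, [this, gen] { return gen != generation_; });
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    unsigned threshold_;
    unsigned count_;
    unsigned generation_;
};

With n workers you would construct both startbarrier and endbarrier as barrier(n + 1), and thread a would call wait on each of them once per cycle, mirroring the worker pseudocode above.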

Sleeping a thread in the middle of execution

What happens when a thread is put to sleep by another thread, possibly the main thread, in the middle of its execution?
Assume I have a function Producer. What if Consumer sleep()s the Producer in the middle of the production of one unit?
Suppose the unit is half-produced, and then the Producer is put to sleep. The integrity of the system may be compromised.
The thread that sleep is invoked on is put in the idle queue by the thread scheduler and is context-switched out of the CPU it is running on, so other threads can take its place.
All context (registers, stack pointer, base pointer, etc.) is saved on the thread stack, so when the thread next runs, it can continue from where it left off.
The OS is constantly doing context switches between threads in order to make your system seem like it's doing multiple things. The OS thread scheduler algorithm takes care of that.
Thread scheduling and threading is a big subject, if you want to really understand it, I suggest you start reading up on it. :)
EDIT: Using sleep for thread synchronization purposes is not advised; you should use proper synchronization mechanisms to tell the thread to wait for other threads, etc.
There is no problem associated with this, unless some state is mutated while the thread sleeps, so it wakes up with a different set of values than before going to sleep.
Threads are switched in and out of execution by the CPU all the time, but that does not affect the overall outcome of their execution, assuming no data races or other bugs are present.
It would be inadvisable for one thread to forcibly and synchronously interfere with the execution of another thread. One thread could send an asynchronous message to another requesting that it reschedule itself in some way, but that would be handled by the other thread when it was in a suitable state to do so.
Assuming they communicate using channels that are thread-safe, nothing bad should happen, as the sleeping thread will eventually wake up and grab data from its task queue, or see that some semaphore has been set and read the produced data.
If the threads communicate using unsynchronized shared variables or direct function calls that change state, that's when Bad Things occur.
I don't know of a way for one thread to forcibly cause another thread to sleep. If two threads are accessing a shared resource (like an input/output queue, which seems likely for your Producer/Consumer example), then both threads may contend for the same lock. The losing thread must wait for the other thread to release the lock if the contention is not of the "trylock" variety. The waiting thread is placed into a waiting queue associated with the lock and is removed from the scheduler's run queue. When the winning thread releases the lock, the code checks the queue to see if there are threads still waiting to acquire it. If there are, one is chosen as the winner, given the lock, and placed in the scheduler's run queue.
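A small sketch of that contention, assuming a shared std::queue guarded by one mutex (the names queue_lock and shared_queue are illustrative): whichever thread loses the race for the lock simply blocks, exactly as described above.

#include <mutex>
#include <queue>
#include <thread>

std::mutex      queue_lock;
std::queue<int> shared_queue;

void producer() {
    for (int i = 0; i < 100; ++i) {
        std::lock_guard<std::mutex> guard(queue_lock); // may block here
        shared_queue.push(i);
    }
}

void consumer() {
    int consumed = 0;
    while (consumed < 100) {
        std::lock_guard<std::mutex> guard(queue_lock); // may block here
        if (!shared_queue.empty()) {
            shared_queue.pop();
            ++consumed;
        }
    }
}

int main() {
    std::thread p(producer), c(consumer);
    p.join();
    c.join();
    return 0;
}

Note that this consumer still polls the queue while it is empty; pairing the mutex with a condition variable, as in the next two questions, lets the consumer block instead.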

What happens when pthreads wait in mutex_lock/cond_wait?

I have a program that should get the maximum out of my CPU.
It is multithreaded via pthreads, and the threads do their job well, apart from the fact that they "only" get my cores to about 60% load, which is not enough in my opinion.
I am searching for the reason and am asking myself (and hereby you) whether the blocking functions mutex_lock/cond_wait are candidates.
What happens when a thread cannot proceed in such a function?
Does pthread switch to another thread it handles, or
does the thread yield its time to the system? And if the latter is the case, can I change this behavior?
More Information
The setting is one main thread that fills the task pool and countless workers that fetch jobs from there and wait on a condition variable that is signaled via broadcast when a serialized calculation is done. They go on with the values from this calculation until they are done, deliver their results, and fetch the next job...
On a typical modern pthreads implementation, each thread is managed by the kernel not unlike a separate process. Any blocking call like pthread_mutex_lock or pthread_cond_wait (but also, say, read) will yield its time to the system. The system will then find another eligible thread to schedule, whether in your process or another process, and run it.
If your program is only taking 60% of the CPU, it is more likely blocked on I/O than on pthread operations, unless you have done something way too granular with your pthread operations.
If a thread is waiting on a mutex/condition variable, it doesn't use resources (well, just a tiny amount). Whenever a thread enters the waiting state, control switches to other threads. When the mutex is released (or the condition variable is signalled), the thread wakes up and may acquire the mutex (if no other thread grabs it first) and continue to run. If, however, some other thread acquires the mutex first (this can happen if several threads are waiting for it), the thread goes back to sleep.
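Here is a sketch of the worker-side wait from the question's setting, with illustrative names (tasks_pending, work_ready, publish_tasks are assumptions, not from the question). The worker is blocked inside pthread_cond_wait, consuming no CPU, until the main thread broadcasts:

#include <pthread.h>

pthread_mutex_t pool_lock     = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  work_ready    = PTHREAD_COND_INITIALIZER;
int             tasks_pending = 0;

void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&pool_lock);
        while (tasks_pending == 0)                      // loop guards against
            pthread_cond_wait(&work_ready, &pool_lock); // spurious wakeups
        --tasks_pending;                                // claim one task
        pthread_mutex_unlock(&pool_lock);
        // ... do the actual calculation outside the lock ...
    }
    return NULL;
}

// Main thread, after queueing n new tasks:
void publish_tasks(int n)
{
    pthread_mutex_lock(&pool_lock);
    tasks_pending += n;
    pthread_cond_broadcast(&work_ready); // wake all waiting workers
    pthread_mutex_unlock(&pool_lock);
}

If the cores still only reach 60% with this pattern, the usual suspects are tasks that are too small (so the workers spend most of their time contending for the lock) or the serialized calculation itself limiting parallelism, rather than the blocking calls as such.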

Conditional wait overhead

When using boost::condition_variable, ACE_Conditional, or pthread_cond_wait directly, is there any overhead for the waiting itself? These are the more specific issues that trouble me:
After the waiting thread is unscheduled, will it be scheduled back before the wait expires and then unscheduled again, or will it stay unscheduled until signaled?
Does wait periodically acquire the mutex? In that case, I guess each iteration wastes some CPU time on system calls to lock and release the mutex. Is it the same as constantly acquiring and releasing a mutex?
Also, how much time passes between the signal and the return from wait?
AFAIK, when using semaphores, the responsiveness of acquire calls depends on the scheduler's time slice size. How does it work in pthread_cond_wait? I assume this is platform dependent. I am most interested in Linux, but if someone knows how it works on other platforms, that will help too.
And one more question: are there any additional system resources allocated for each conditional? I won't create 30000 mutexes in my code, but should I worry about 30000 conditionals that all use the same mutex?
Here's what is written in the pthread_cond man page:
pthread_cond_wait atomically unlocks the mutex and waits for the condition variable cond to be signaled. The thread execution is suspended and does not consume any CPU time until the condition variable is signaled.
So from here I'd answer to the questions as following:
The waiting thread won't be scheduled back before the wait is signaled or canceled.
There are no periodic mutex acquisitions. The mutex is reacquired only once before wait returns.
The time that passes between the signal and the wait return is similar to that of thread scheduling due to mutex release.
Regarding the resources, on the same man page:
In the LinuxThreads implementation, no resources are associated with condition variables, thus pthread_cond_destroy actually does nothing except checking that the condition has no waiting threads.
Update: I dug into the sources of pthread_cond_* functions and the behavior is as follows:
All the pthread conditionals in Linux are implemented using futex.
When a thread calls wait it is suspended and unscheduled. The thread id is inserted at the tail of a list of waiting threads.
When a thread calls signal the thread at the head of the list is scheduled back.
So, the waking is as efficient as the scheduler, no OS resources are consumed and the only memory overhead is the size of the waiting list (see futex_wake function).
You should only call pthread_cond_wait if the variable is already in the "wrong" state. Since it always waits, there is always the overhead associated with putting the current thread to sleep and switching.
When the thread is unscheduled, it is unscheduled. It should not use any resources, but of course an OS can in theory be implemented badly. It is allowed to re-acquire the mutex, and even to return, before the signal (which is why you must double-check the condition), but the OS will be implemented so this doesn't impact performance much, if it happens at all. It doesn't happen spontaneously, but rather in response to another, possibly-unrelated signal.
30000 mutexes shouldn't be a problem, but some OSes might have a problem with 30000 sleeping threads.
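For completeness, here is the double-check pattern mentioned above in C++ form, using std::condition_variable (Boost's condition variable is analogous). The wait(lock, pred) overload re-tests the predicate after every wakeup, so an early or spurious return is harmless. Names (data_ready, consumer_wait, producer_signal) are illustrative:

#include <condition_variable>
#include <mutex>

std::mutex              m;
std::condition_variable cv;
bool                    data_ready = false;

void consumer_wait()
{
    std::unique_lock<std::mutex> lock(m);
    // Suspended, consuming no CPU, until signaled AND the predicate holds.
    cv.wait(lock, [] { return data_ready; });
    // ... consume the data; the mutex is held again here ...
}

void producer_signal()
{
    {
        std::lock_guard<std::mutex> guard(m);
        data_ready = true;
    }
    cv.notify_one(); // wakes the consumer, which re-acquires the mutex
}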