I am a newbie to the multithread programming. After some research about the functions, e.g. join, sleep, joinable, I am still confused how does the computer decide when to switch operation from one thread into another thread? Let's say that you have one core with multiple threads, have an infinite while loop in each thread (e.g. while(true)), and forget about the mutex locks for now. If the while loop is always on, how does the computer switch?
Is this where the sleep_for function comes in?
Before the operating system starts running a thread, it arranges for the hardware to interrupt it when the thread's time slice is over. If nothing else happens and the thread does get to use up its full timeslice, this interrupt will trigger the operating system to switch threads.
Related
I'm curious about the behavior of cpu during a thread waiting for a mutex. Now I can imagine two possibilities:
The cpu stay on the current thread and check if the mutex had been unlocked continually.
The cpu will switch to another thread(or process) for a moment and switch back to the origin thread and check temporary.
Which one is right or the stl implement in another way?
To understand this you first need to understand the difference between thread and cpu core. Thread is an abstract thing, a data structure, that is used to represent some sequence of operations to be executed. The OS assigns threads to cpu cores, and those cores then execute those operations. The OS (and also hardware) can also interrupt this execution at any time (although not in the middle of a single instruction), save such thread's state, suspend it, and assign some other thread to that core. This is also known as context switch. The OS sometimes does that on so called syscalls (when a program calls some OS's functionality, e.g. asks for the access to disk, network, etc.) as well. It is important because mutexes utilize some syscalls under the hood.
So what happens when a thread tries to access a locked mutex? First of all, no periodical checks happen. While possible, that would be a waste of cpu cycles and extremely unlikely that any serious OS does that. What actually happens is that each mutex internally has a queue associated. When it is locked, the OS will add current thread to this queue and will suspend it. Afterwards the OS will assign some other thread to this cpu core, if available.
Now if a mutex is locked, then there's a thread that actually locked that mutex. Let's call that thread an owner. This thread is not suspended, and it does some work. When it finishes whatever it is doing, it has to unlock the mutex (which is a syscall as well), otherwise those pending threads will never resume. When that (i.e. the unlocking) happens the OS will look at the associated queue, and pick a thread from it (which one is an implementation detail, it will often be some priority queue). This newly picked thread will be the new owner of the mutex, and the OS will resume it, meaning schedule the thread for execution. Schedule, because all cores may be busy at the moment.
Note that this is a brief overview of the topic. There are lots of other things and optimizations in play, like futexes and how to actually implement thread-safe (or rather core-safe) code without mutexes (these are not hardware features, mutexes are implemented in the OS). But that's more or less how things are.
Typically the thread will attempt to acquire the mutex, and if it can't (e.g. because another thread already acquired it) it will inform the scheduler and the scheduler will block the waiting thread and switch to a different thread, and then (later, when the lock is released) the scheduler will unblock the waiting thread and give it CPU time again.
On single-CPU systems; this is almost required. All CPU time spent (e.g. "spinning"/polling the lock again) between finding out the lock can't be acquired and doing a task switch (to a thread that may release the lock) is a waste of CPU time that will achieve nothing (because no other thread can release the lock until a task switch occurs).
However, research on multi-CPU systems (that I vaguely remember from about 20 years ago that may or may not have been done by Sun for Solaris) indicates that a small amount of "spinning" (in the hope that a thread running on a different CPU releases the lock in time) can be beneficial (by avoiding the cost of task switch/es). My intuition is that "time spent spinning before blocking" should be roughly equal to the cost of a task switch (or, if a task switch costs 123 microseconds, it'd probably be worthwhile spinning for 123 microseconds before the scheduler is told to block your thread); but this would depend heavily on scenario (e.g. how heavily contended the lock is, etc).
Typically,
The hardware thread (your "CPU") will be switched to running a different software thread by the kernel, and the original software thread will be set aside until the mutex it is waiting on becomes signaled. At that point the kernel will place it among the set of software threads that it seeks to schedule for execution on one of the hardware threads in the system.
Your option 1 applies to what is called a critical section on Microsoft's platforms and more generally a spinlock. See pthread_spin_lock().
Your option 2 is most similar to what usually happens.
In the Microsoft world, the Mutex is waited on with WaitForSingleObject(), which is described as
If the object's state is nonsignaled, the calling thread enters the wait state until the object is signaled or the time-out interval elapses.
Now you need to know that the "wait state" is a state where the thread is not active. We call it "blocking", which is the opposite of a busy wait where CPU time is used.
At that beginning, the kernel will immediately give the CPU to another thread and never give it back to your thread, unless the Mutex is becoming "signaled". So it will really use 0 CPU cycles during the wait.
When the kernel notices that the Mutex has changed, it can "wake up" the thread and might even boost its priority because it was waiting friendly all the time.
The cpu stay on the current thread and check if the mutex had been unlocked continually.
It's not the CPU that picks a thread to be executed. The thread scheduler of Windows will pick a thread that gets executed.
If a Mutex could block a CPU that way, you need to only 8 or 12 Mutexes to fully brick your system.
The cpu will switch to another thread(or process) for a moment [...]
Almost. There will be an interrupt by a timer. The interrupt will be handled by an interrupt service routine by the Windows kernel. At that time, the kernel can decide which thread will be executed next.
[...] and switch back to the origin thread and check temporary.
No. Because the Mutex is a kernel object, the kernel already knows that there's no used in letting the thread check again unless the Mutex has been signaled.
I'm working with a user mode driver for small scale USB devices. My usb reading loop should be very responsive and operations it performs should be very small ( not necessary to be atomic). Like an interrupt service routine in a kernel mode driver. In one processing I need to create a thread and pass some parameters to that thread inside that reading loop.
So I need to know the exact upper limit of that operation. It will not take more than 200mS , or something like that.
Next alternative is to do the thread initialization at the device initialization time ( probing time ) and then sleep that thread waiting till I signal it from the reading thread. But in this scenario the thread is always running and it would be costly.
What is the best option ? My platform is linux, and they said in linux, thread creation have very short operation. I need to decide what is best. Keep the thread alive at all-time or create the thread when necessary.
Modern machines have hundreds, sometimes thousands of threads instantiated and in "ready" state at all times. "Ready" does not mean "Actually Running".
So, there is no problem with starting one more thread at device initialization and keeping it in "Ready" state most of the time, and giving it some work to do every once in a rare while.
The trick to getting this to work smoothly this is to make sure that the thread is block-waiting for an event to occur. When a thread is block-waiting for a signal it is consuming zero, or near-zero, CPU.
Starting a new thread each time you need to do something can be quite costly. A new thread usually needs to allocate memory, and this can be a time consuming operation, especially in a system that is running low on memory, where memory allocation can cause swapping.
Just create thread once and make it block on some semaphore or mutex until you signal it. This way it won't be "always running" and it won't "be costly". This way you don't need to handle case like: "What if thread didn't start when I needed some processing" or "What if system was busy and thread startup was slow"?..
Just a minor thing: if the thread doesn't do much I would initialize it with smaller stack size.
What happens when a thread is put to sleep by other thread, possible by main thread, in the middle of its execution?
assuming I've a function Producer. What if Consumer sleep()s the Producer in the middle of production of one unit ?
Suppose the unit is half produced. and then its put on sleep(). The integrity of system may be in a problem
The thread that sleep is invoked on is put in the idle queue by the thread scheduler and is context switched out of the CPU it is running on, so other threads can take it's place.
All context (registers, stack pointer, base pointer, etc) are saved on the thread stack, so when it's run next time, it can continue from where it left off.
The OS is constantly doing context switches between threads in order to make your system seem like it's doing multiple things. The OS thread scheduler algorithm takes care of that.
Thread scheduling and threading is a big subject, if you want to really understand it, I suggest you start reading up on it. :)
EDIT: Using sleep for thread synchronization purposes not advised, you should use proper synchronization mechanisms to tell the thread to wait for other threads, etc.
There is no problem associated with this, unless some state is mutated while the thread sleeps, so it wakes up with a different set of values than before going to sleep.
Threads are switched in and out of execution by the CPU all the time, but that does not affect the overall outcome of their execution, assuming no data races or other bugs are present.
It would be unadvisable for one thread to forcibly and synchronously interfere with the execution of another thread. One thread could send an asynchronous message to another requesting that it reschedule itself in some way, but that would be handled by the other thread when it was in a suitable state to do so.
Assuming they communicate using channels that are thread-safe, nothing bad shoudl happen, as the sleeping thread will wake up eventually and grab data from its task queue or see that some semaphore has been set and read the prodced data.
If the threads communicate using nonvolatile variables or direct function calls that change state, that's when Bad Things occur.
I don't know of a way for a thread to forcibly cause another thread to sleep. If two threads are accessing a shared resource (like an input/output queue, which seems likely for you Produce/Consumer example), then both threads may contend for the same lock. The losing thread must wait for the other thread to release the lock if the contention is not the "trylock" variety. The thread that waits is placed into a waiting queue associated with the lock, and is removed from the schedulers run queue. When the winning thread releases the lock, the code checks the queue to see if there are threads still waiting to acquire it. If there are, one is chosen as the winner and is given the lock, and placed in the scheduler run queue.
I have a program that should get the maximum out of my cpu.
It is multithreaded via pthreads that do their job well apart from the fact that they "only" get my cores to about 60% load which is not enough in my opinion.
I am searching for the reason and am asking myself (and hereby you) if the blocking functions mutex_lock/cond_wait are candidates?
What happens when a thread cannot run on in such a function?
Does pthread switch to another thread it handles or
does the thread yield its time to the system and if the latter is the case, can I change this behavior?
Regards,
Nobody
More Information
The setting is one mainthread that fills the taskpool and countless workers that fetch jobs from there and wait on a conditional that is signaled via broadcast when a serialized calculation is done. They go on with the values from this calculation until they are done, deliver their mail and fetch the next job...
On a typical modern pthreads implementation, each thread is managed by the kernel not unlike a separate process. Any blocking call like pthread_mutex_lock or pthread_cond_wait (but also, say, read) will yield its time to the system. The system will then find another eligible thread to schedule, whether in your process or another process, and run it.
If your program is only taking 60% of the CPU, it is more likely blocked on I/O than on pthread operations, unless you have done something way too granular with your pthread operations.
If a thread is waiting on a mutex/condition, it doesn't use resources (well, uses just a tiny amount). Whenever the thread enters waiting state, control switches to other threads. When the mutex is released (or condition variable signalled), the thread wakes up and may acquire the mutex (if no other thread grabs it first), and continue to run. If however some other thread acquires the mutex (this can happen if several threads are waiting for it), the thread returns to sleeping state.
I am creating a program with multiple threads using pthreads.
Is sleep() causing the process (all the threads) to stop executing or just the thread where I am calling sleep?
Just the thread. The POSIX documentation for sleep() says:
The sleep() function shall cause the calling thread to be suspended from execution...
Try this,
#include <unistd.h>
usleep(microseconds);
I usually use nanosleep and it works fine.
Nanosleep supends the execution of the calling thread. I have had the same doubt because in some man pages sleep refers to the entire process.
In practice, there are few cases where you just want to sleep for a small delay (milliseconds). For Linux, read time(7), and see also this answer. For a delay of more than a second, see sleep(3), for a small delay, see nanosleep(2). (A counter example might be a RasPerryPi running some embedded Linux and driving a robot; in such case you might indeed read from some hardware device every tenth of seconds). Of course what is sleeping is just a single kernel-scheduled task (so a process or thread).
It is likely that you want to code some event loop. In such a case, you probably want something like poll(2) or select(2), or you want to use condition variables (read a Pthread tutorial about pthread_cond_init etc...) associated with mutexes.
Threads are expensive resources (since each needs a call stack, often of a megabyte at least). You should prefer having one or a few event loops instead of having thousands of threads.
If you are coding for Linux, read also Advanced Linux Programming and syscalls(2) and pthreads(7).
Posix sleep function is not thread safe.
https://clang.llvm.org/extra/clang-tidy/checks/concurrency/mt-unsafe.html
sleep() function does not cease a specific thread, but it stops the whole process for the specified amount of time. For stopping the execution of a particular thread, we can use one pthread condition object and use pthread_cond_timedwait() function for making the thread wait for a specific amount of time. Each thread will have its own condition object and it will never receive a signal from any other thread.