Concurrency Fairness & Deadlock


Can somebody give appropriate definitions for the terms fairness and deadlock? I am told that these terms are used in the context of concurrent processes.

In a nutshell, concurrent processes share the CPU, and the operating system schedules CPU bursts for each process to run. Fairness is one of the things that needs to be considered in order to guarantee progress and to prevent starvation: under a fair policy, every waiting process eventually gets its turn.
Deadlock is a situation in which there is a cycle of dependencies, where each process waits for another process in the cycle to make progress, so none of them ever does. You will also need to read about mutexes and critical sections.
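To make the "cycle of dependencies" concrete, here is a minimal C++11 sketch. The `Account` type and the function names are made up for illustration. If each thread simply locked its own `from` mutex first and then `to`, two opposite transfers could each hold one mutex and wait forever for the other. Acquiring both with `std::lock` avoids that cycle.

```cpp
#include <mutex>
#include <thread>

// Hypothetical example type: a balance guarded by its own mutex.
struct Account {
    std::mutex m;
    int balance;
    explicit Account(int b) : balance(b) {}
};

// Moves money between two accounts. std::lock acquires both mutexes
// using a deadlock-avoidance algorithm, so transfer(a, b) and
// transfer(b, a) may run at the same time without a circular wait.
void transfer(Account& from, Account& to, int amount) {
    if (&from == &to) return;  // never lock the same std::mutex twice
    std::lock(from.m, to.m);
    std::lock_guard<std::mutex> g1(from.m, std::adopt_lock);
    std::lock_guard<std::mutex> g2(to.m, std::adopt_lock);
    from.balance -= amount;
    to.balance += amount;
}

// Two threads transfer in opposite directions; returns the total balance.
int run_transfers() {
    Account a(1000), b(1000);
    std::thread t1([&] { for (int i = 0; i < 10000; ++i) transfer(a, b, 1); });
    std::thread t2([&] { for (int i = 0; i < 10000; ++i) transfer(b, a, 1); });
    t1.join();
    t2.join();
    return a.balance + b.balance;
}
```

Calling `run_transfers()` should finish and leave the total unchanged. Another common way to break the cycle is to always acquire the mutexes in one agreed global order.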

Related

Reduce Context Switches Between Threads With Same Priority

I am writing an application that uses a third-party library to perform heavy computations.
This library implements parallelism internally and spawns a given number of threads. I want to run several instances of this library (the count is dynamic) and therefore end up heavily oversubscribing the CPU.
Is there any way I can increase the "time quantum" of all the threads in a process so that, e.g., all the threads with normal priority rarely context switch unless they yield explicitly, e.g. by waiting on a semaphore?
That way I could possibly avoid most of the performance overhead of oversubscribing the CPU. Note that in this case I don't care if a thread is starved for a few seconds.
EDIT:
One complicated way of doing this is to perform thread scheduling manually.
Enumerate all the threads with a specific priority (e.g. normal).
Suspend all of them.
Create a loop which resumes/suspends the threads every 40 ms or so and makes sure that no more threads than the current CPU count are running.
Any major drawbacks with this approach? Not sure what the overhead of resume/suspending a thread is?

There is nothing special you need to do. Any decent scheduler will not allow unforced context switches to consume a significant fraction of CPU resources. Any operating system that doesn't have a decent scheduler should not be used.
The performance overhead of oversubscribing the CPU is not the cost of unforced context switches. Why? Because the scheduler can simply avoid those. The scheduler only performs an unforced context switch when that has a benefit. The performance costs are:
It can take longer to finish a job because more work will be done on other jobs between when the job is started and when the job finishes.
Additional threads consume memory for their stacks and related other tracking information.
More threads generally means more contention (for example, when memory is allocated) which can mean more forced context switches where a thread has to be switched out because it can't make forward progress.
You only want to try to change the scheduler's behavior when you know something significant that the scheduler doesn't know. There is nothing like that going on here. So the default behavior is what you want.

Any major drawbacks with this approach? Not sure what the overhead of
resume/suspending a thread is?
Yes, suspending and resuming threads from user-mode code is very dangerous and should almost never be done. Moreover, we should not use these calls to achieve something that any modern scheduler already does for us, as the other answer to this question also points out.
The above applies to any operating system, but from the tags on the question it appears to have been asked about Microsoft Windows. If we read the MSDN documentation for SuspendThread(), we find the following:
"This function is primarily designed for use by debuggers. It is not intended to be used for thread synchronization. Calling SuspendThread on a thread that owns a synchronization object, such as a mutex or critical section, can lead to a deadlock if the calling thread tries to obtain a synchronization object owned by a suspended thread".
So consider the scenario in which a thread has acquired some resource implicitly, i.e. not in our own code but inside a library or in kernel mode. If we suspend that thread, the result is a mysterious deadlock, because other threads of the process will be waiting for that resource. Since our program can never be sure which resources a running thread holds at any given moment, suspending and resuming threads is not a good idea.

Do deadlocks cause high CPU utilization?

Do deadlocks put processes into a high rate of CPU usage, or do these two processes both "sleep", waiting on the other to finish?
I am trying to debug a multithreaded program written in C++ on a Linux system. I have noticed excessive CPU utilization from one particular process, and am wondering if it could be due to a deadlock issue. I have identified that one process consistently uses more of the CPU than I would anticipate (using top), and the process works, but it works slowly. If deadlocks cause the processes to sleep and do not cause high CPU usage, then at least I know this is not a deadlocking issue.

A deadlock typically does not cause high CPU usage, at least not if the deadlock occurs in synchronization primitives that are backed by the OS such that processes sleep while they wait.
If the deadlock occurs with, e.g., lockless synchronization mechanisms (such as a compare-exchange in a busy loop), CPU usage will be high.
Also, there is the notion of a livelock, which occurs when a program with multiple threads is unable to advance to some intended state because some condition (that depends on interaction between threads) cannot be fulfilled, even though none of the threads is explicitly waiting for something.

It depends on the type of lock. A lock that is implemented as a spin loop could run up 100% CPU usage in a deadlock situation.
On the other hand, a signalling lock such as a kernel mutex does not consume CPU cycles while waiting, so a deadlock on such a lock would not peg the CPU at 100%.
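For illustration, a spin lock of the kind described above can be sketched in C++11 with `std::atomic_flag`. The `SpinLock` class and the counting helper are made-up names, not a library API. A thread stuck in `lock()` keeps executing the loop, which is why waiting forever on such a lock shows up as high CPU usage, whereas a thread blocked on `std::mutex` typically sleeps.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Minimal spin lock: a waiter stays on the CPU until the flag is cleared.
class SpinLock {
    std::atomic_flag flag = ATOMIC_FLAG_INIT;
public:
    void lock() {
        while (flag.test_and_set(std::memory_order_acquire)) {
            // busy-wait: consumes CPU for as long as the lock is held
        }
    }
    void unlock() { flag.clear(std::memory_order_release); }
};

// Hypothetical demo: several threads increment one counter under the lock.
int count_with_spinlock(int threads, int iterations) {
    SpinLock lock;
    int counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

If a thread never released this lock, every waiter would spin at full speed, so `top` would show the process as busy even though it makes no progress.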

How to prevent threads from starvation in C++11

I am just wondering if there is any locking policy in C++11 which would prevent threads from starvation.
I have a bunch of threads which are competing for one mutex. Now, my problem is that the thread which is leaving a critical section immediately starts competing for the same mutex again and wins most of the time. Therefore the other threads waiting on the mutex are starving.
I do not want to make the thread that leaves the critical section sleep for some minimal amount of time just to give other threads a chance to lock the mutex.
I thought that there must be some parameter which would enable fair locking for threads waiting on the mutex, but I wasn't able to find any appropriate solution.
I did find the std::this_thread::yield() function, which is supposed to let other threads be rescheduled, but it is only a hint to the scheduler, and whether the threads are actually rescheduled depends on the implementation.
Is there any way to provide a fair locking policy for the threads waiting on the same mutex in C++11?
What are the usual strategies?
Thanks

This is a common optimization in mutexes designed to avoid wasting time switching tasks when the same thread can take the mutex again. If the current thread still has time left in its time slice then you get more throughput in terms of user-instructions-executed-per-second by letting it take the mutex rather than suspending it, and switching to another thread (which likely causes a big reload of cache lines and various other delays).
If you have so much contention on a mutex that this is a problem then your application design is wrong. You have all these threads blocked on a mutex, and therefore not doing anything: you are probably better off without so many threads.
You should design your application so that if multiple threads compete for a mutex then it doesn't matter which thread gets the lock. Direct contention should also be a rare thing, especially direct contention with lots of threads.
The only situation where I can think this is an OK scenario is where every thread is waiting on a condition variable, which is then broadcast to wake them all. Every thread will then contend for the mutex, but if you are doing this right then they should all do a quick check that this isn't a spurious wake and then release the mutex. Even then, this is called a "thundering herd" situation, and is not ideal, precisely because it serializes all these threads.
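If FIFO fairness really is required, one common strategy is a "ticket" lock built on top of `std::mutex` and `std::condition_variable`. This is not something the C++11 standard library provides; the `TicketMutex` class below is a hypothetical sketch. Each caller takes a ticket and waits until its number is served, so a thread that has just unlocked queues behind everyone already waiting.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Sketch of a fair (FIFO) lock: threads are served in ticket order.
class TicketMutex {
    std::mutex m;
    std::condition_variable cv;
    unsigned long next_ticket = 0;  // next ticket to hand out
    unsigned long now_serving = 0;  // ticket currently allowed in
public:
    void lock() {
        std::unique_lock<std::mutex> lk(m);
        const unsigned long my_ticket = next_ticket++;
        // The predicate also guards against spurious wake-ups.
        cv.wait(lk, [&] { return now_serving == my_ticket; });
    }
    void unlock() {
        {
            std::lock_guard<std::mutex> lk(m);
            ++now_serving;
        }
        cv.notify_all();  // wakes every waiter; only one finds its ticket
    }
};

// Hypothetical demo: threads increment a counter under the fair lock.
int count_fairly(int threads, int iterations) {
    TicketMutex fair;
    int counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                std::lock_guard<TicketMutex> guard(fair);
                ++counter;
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

Note the cost: `notify_all` is exactly the thundering-herd pattern described above, and forcing a hand-off on every unlock gives up the throughput optimization that the default mutex behavior provides.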

Relationship between shared memory concurrency algorithms and mutexes/semaphores

I am trying to figure out the relationship between shared memory based concurrency algorithms (Peterson's / Bakery) and the use of semaphores and mutexes.
In the first case, we have a system without OS intervention, and processes can synchronize themselves using shared memory and busy waiting.
In the second case, the OS provides processes/threads with the ability to block, and not have to busy wait.
Is there ever a situation where we'd like to use shared memory in addition to semaphores (to ensure fairness / lack of starvation), or does the OS offer a better way to do this?
(I am wondering about the general concepts, but answers specific to POSIX/Win32/JAVA threads are also interesting).
Thanks a lot!

I can't think of any circumstances where what you actually want is a busy wait. Busy waiting just consumes processor time without achieving anything. That's not to say that "busy wait" algorithms aren't useful (they are), but the "busy wait" part is not the desired property, it is just a necessary consequence of a property that is desired.
Peterson's lock algorithm and Lamport's bakery algorithm are fundamentally just implementations of the mutex concept. OS facilities provide implementations of the same concept, but with different trade-offs.
The "ideal" implementation of a mutex would have "zero overhead" --- acquiring a lock on a mutex would not take any time at all if it was not currently owned, a waiting thread would wake the instant that the prior owner released the lock, and in the mean time, the waiting thread would not consume any processor time.
A "busy wait" or "spin lock" algorithm trades processor time used by the waiting thread for a reduced wake-up time. Provided the thread is currently scheduled on a processor, a busy-waiter will wake as fast as the processor can transfer the necessary data for acquiring the lock and synchronizing the threads, but whilst it is waiting it will consume its maximum allotment of processor time. If the number of threads exceeds the number of available processors, this may well take time from the thread that currently owns the mutex, thus making the wait longer. However, in some cases the low latency between unlocking and locking is worth the trade-off.
On the other hand, a "blocking" mutex that uses OS facilities to put a waiting thread to sleep has a different trade-off. In this case, the time between unlocking a mutex and a waiting thread acquiring it can be quite large, possibly several hundred times larger than with a busy-wait algorithm. The benefit is that the waiting thread really does consume no processor time whilst waiting, so the OS can schedule other work whilst the thread is waiting. This can thus potentially reduce the overall wait time, and increase the overall throughput of the system.
Some mutex implementations use a combination of busy-waiting and blocking: they busy-wait for a short time, and then switch to blocking if the lock cannot be acquired in the short time. This has the benefits of the fast wake if the lock is released shortly after the thread began waiting, whilst consuming no processor time if the thread has to wait a long time. It also has the downsides of high processor usage for short waits, and slow wake-ups for long waits.
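The combined approach can be sketched in C++11 on top of `std::mutex`. The class name and the spin limit below are arbitrary illustrative choices, not taken from any real implementation: try the lock a bounded number of times without blocking, then fall back to a blocking `lock()`.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Sketch of a spin-then-block lock wrapped around std::mutex.
class SpinThenBlockMutex {
    std::mutex m;
    int spin_limit;  // hypothetical tuning knob: attempts before blocking
public:
    explicit SpinThenBlockMutex(int spins = 100) : spin_limit(spins) {}
    void lock() {
        for (int i = 0; i < spin_limit; ++i) {
            if (m.try_lock()) return;  // fast path: acquired while spinning
        }
        m.lock();  // slow path: let the OS put this thread to sleep
    }
    void unlock() { m.unlock(); }
};

// Hypothetical demo: threads increment a counter under the hybrid lock.
int count_with_hybrid(int threads, int iterations) {
    SpinThenBlockMutex lock;
    int counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                std::lock_guard<SpinThenBlockMutex> guard(lock);
                ++counter;
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

A suitable spin limit depends on how long the lock is typically held and on the hardware, so it has to be measured rather than guessed.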

Choosing between Critical Sections, Mutex and Spin Locks

What are the factors to keep in mind while choosing between Critical Sections, Mutex and Spin Locks? All of them provide for synchronization but are there any specific guidelines on when to use what?
EDIT: I did mean the windows platform as it has a notion of Critical Sections as a synchronization construct.

In Windows parlance, a critical section is a hybrid between a spin lock and a non-busy wait. It spins for a short time, then--if it hasn't yet grabbed the resource--it sets up an event and waits on it. If contention for the resource is low, the spin lock behavior is usually enough.
Critical Sections are a good choice for a multithreaded program that doesn't need to worry about sharing resources with other processes.
A mutex is a good general-purpose lock. A named mutex can be used to control access among multiple processes. But it's usually a little more expensive to take a mutex than a critical section.

General points to consider:
1. The performance cost of using the mechanism.
2. The complexity introduced by using the mechanism.
In any given situation 1 or 2 may be more important.
E.g.
If you are using multi-threading to write a high performance algorithm that makes use of many cores, and you need to guard some data for safe access, then 1 is probably very important.
If you have an application where a background thread is used to poll for some information on a timer and on the rare occasion it notices an update you need to guard some data for access then 2 is probably more important than 1.
1 will be down to the underlying implementation and probably scales with the scope of the protection e.g. a lock that is internal to a process is normally faster than a lock across all processes on a machine.
2 is easy to misjudge. First attempts to use locks to write thread safe code will normally miss some cases that lead to a deadlock. A simple deadlock would occur for example if thread A was waiting on a lock held by thread B but thread B was waiting on a lock held by thread A. Surprisingly easy to implement by accident.
On any given platform the naming and qualities of locking mechanisms may vary.
On Windows, critical sections are fast and process specific; mutexes are slower but work across processes. Semaphores cover more complicated use cases. Some problems, e.g. allocation from a pool, may be solved very efficiently using atomic functions rather than locks, e.g. InterlockedIncrement on Windows, which is very fast indeed.
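As a rough portable illustration of the pool-allocation idea, `std::atomic` with `fetch_add` plays the role that InterlockedIncrement plays on Windows. The `SlotAllocator` class and the demo function are hypothetical names: each caller gets a unique slot index from a fixed-size pool without taking any lock.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Lock-free hand-out of slot indices from a fixed-size pool.
class SlotAllocator {
    std::atomic<std::size_t> next{0};
    const std::size_t capacity;
public:
    explicit SlotAllocator(std::size_t cap) : capacity(cap) {}
    // Returns a unique index in [0, capacity), or capacity when exhausted.
    std::size_t allocate() {
        const std::size_t i = next.fetch_add(1, std::memory_order_relaxed);
        return i < capacity ? i : capacity;
    }
};

// Hypothetical demo: returns how many slots were claimed exactly once.
std::size_t claim_all_slots(std::size_t threads, std::size_t per_thread) {
    const std::size_t total = threads * per_thread;
    SlotAllocator pool(total);
    std::vector<int> claims(total, 0);
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (std::size_t k = 0; k < per_thread; ++k) {
                const std::size_t slot = pool.allocate();
                if (slot < total) ++claims[slot];  // each slot has one owner
            }
        });
    }
    for (auto& w : workers) w.join();
    std::size_t unique = 0;
    for (int c : claims) {
        if (c == 1) ++unique;
    }
    return unique;
}
```

This sketch only hands slots out; returning slots to the pool needs a more elaborate scheme (for example a lock-free free list), which is where such code becomes hard to get right.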

A Mutex in Windows is actually an interprocess concurrency mechanism, making it incredibly slow when used for intraprocess threading. A Critical Section is the Windows analogue to the mutex you normally think of.
Spin Locks are best used when the resource being contested is usually not held for a significant number of cycles, meaning the thread that has the lock is probably going to give it up soon.
EDIT : My answer is only relevant provided you mean 'On Windows', so hopefully that's what you meant.
