If I need to synchronize two threads that both call a function that uses send() on a specific socket, would it be more useful to wrap a critical section around the send() call or to look into using a mutex? (since a socket is a kernel object)
Assuming Windows platform (that's where we have a choice between critical sections and mutexes).
A mutex (as created by CreateMutex) is far slower: locking and unlocking always involve a system call, even when there is no contention. The cost of send(), though, is likely to be large enough to make this difference unnoticeable.
As pointed out by another answer, mutexes can be shared between processes (if named/reopened or inherited), whereas critical sections are process-local.
I am assuming that this is about Windows (can't recall seeing critical section elsewhere).
It doesn't really matter which synchronization object you use if all the locking is within one process. If you want to lock across a process boundary, then you should use a mutex, because a critical section only works within a single process, whereas a named mutex can be shared between many processes.
I think a critical section should work faster, though.
A coworker had an issue recently that boiled down to what we believe was the following sequence of events in a C++ application with two threads:
Thread A holds a mutex.
While thread A is holding the mutex, thread B attempts to lock it. Since it is held, thread B is suspended.
Thread A finishes the work that it was holding the mutex for, thus releasing the mutex.
Very shortly thereafter, thread A needs to touch a resource that is protected by the mutex, so it locks it again.
It appears that thread A is given the mutex again; thread B is still waiting, even though it "asked" for the lock first.
Does this sequence of events fit with the semantics of, say, C++11's std::mutex and/or pthreads? I can honestly say I've never thought about this aspect of mutexes before.
Are there any fairness guarantees to prevent starvation of other threads for too long, or any way to get such guarantees?
Known problem. C++ mutexes are a thin layer on top of OS-provided mutexes, and OS-provided mutexes are often not fair: they do not guarantee FIFO ordering.
The other side of the same coin is that threads are usually not pre-empted until they run out of their time slice. As a result, thread A in this scenario was likely to continue to be executed, and got the mutex right away because of that.
The guarantee of a std::mutex is to enable exclusive access to shared resources. Its sole purpose is to eliminate the race condition when multiple threads attempt to access shared resources.
The implementer of a mutex may choose to favor the current thread acquiring a mutex (over another thread) for performance reasons. Allowing the current thread to acquire the mutex and make forward progress without requiring a context switch is often a preferred implementation choice supported by profiling/measurements.
Alternatively, the mutex could be constructed to prefer another (blocked) thread for acquisition (perhaps chosen according to FIFO order). This likely requires a thread context switch (on the same or another processor core), increasing latency/overhead. NOTE: FIFO mutexes can behave in surprising ways. E.g. thread priorities must be considered in FIFO support, so acquisition won't be strictly FIFO unless all competing threads have the same priority.
Adding a FIFO requirement to a mutex's definition constrains implementers to provide suboptimal performance in nominal workloads. (see above)
Protecting a queue of callable objects (std::function) with a mutex would enable sequenced execution. Multiple threads can acquire the mutex, enqueue a callable object, and release the mutex. The callable objects can be executed by a single thread (or a pool of threads if synchrony is not required).
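The queue-of-callables idea above can be sketched roughly as follows. This is only an illustration under assumptions: the class name SerialExecutor, its post() method and the demo function are hypothetical, and a condition variable is added (beyond the mutex the answer mentions) so the single worker can sleep while the queue is empty.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: a mutex-protected queue of callables drained by one worker thread.
class SerialExecutor {
public:
    SerialExecutor() : worker_([this] { run(); }) {}

    ~SerialExecutor() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        ready_.notify_one();
        worker_.join();  // the worker drains any remaining tasks before exiting
    }

    // Any thread may call this; tasks run in the order they were enqueued.
    void post(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        ready_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // only reached once stopping_ is set
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so producers are not blocked
        }
    }

    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so the other members exist before it starts
};

// Hypothetical demo: tasks posted in order 0..4 are executed in that same order.
std::vector<int> demo_serial_executor() {
    std::vector<int> order;
    {
        SerialExecutor exec;
        for (int i = 0; i < 5; ++i) exec.post([&order, i] { order.push_back(i); });
    }  // destructor joins the worker after the queue is drained
    return order;
}
```

The ordering guarantee here comes from the queue and the single worker, not from any fairness property of the mutex itself.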
• Thread A finishes the work that it was holding the mutex for, thus releasing the mutex.
• Very shortly thereafter, thread A needs to touch a resource that is protected by the mutex, so it locks it again.
In the real world, when the program is running, no threading library or OS provides such a guarantee. Here "shortly thereafter" may mean a lot to the OS and the hardware. If you say 2 minutes, then thread B would definitely get it. If you say 200 ms or less, there is no promise of either A or B getting it.
Number of cores, load on different processors/cores/threading units, contention, thread switching, kernel/user switches, pre-emption, priorities, deadlock detection schemes, et al. will make a lot of difference. Just by seeing a green light from far away, you cannot guarantee that it will still be green when you reach it.
If you want thread B to get the resource, you may use an IPC mechanism to instruct thread B to acquire it.
You are inadvertently suggesting that threads should synchronise access to the synchronisation primitive. Mutexes are, as the name suggests, about Mutual Exclusion. They are not designed for control flow. If you want to signal a thread to run from another thread you need to use a synchronisation primitive designed for control flow i.e. a signal.
You can use a fair mutex to solve your task, i.e. a mutex that guarantees FIFO order for your operations. Unfortunately, the C++ standard library doesn't have a fair mutex.
Thankfully, there are open-source implementations, for example yamc (a header-only library).
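To illustrate what "fair" means here, below is a minimal ticket-style sketch built from std::mutex and std::condition_variable. It is not yamc's implementation; TicketMutex and the demo function are hypothetical names. One assumption to note: "arrival order" is the order in which threads manage to take a ticket under the internal mutex, which is itself not guaranteed to be fair.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical FIFO ("ticket") mutex: threads are served in the order they took a ticket.
class TicketMutex {
public:
    void lock() {
        std::unique_lock<std::mutex> guard(state_);
        const unsigned long my_ticket = next_ticket_++;
        turn_changed_.wait(guard, [&] { return my_ticket == now_serving_; });
    }

    void unlock() {
        {
            std::lock_guard<std::mutex> guard(state_);
            ++now_serving_;
        }
        turn_changed_.notify_all();  // all waiters wake; only the next ticket proceeds
    }

private:
    std::mutex state_;                  // protects the two counters below
    std::condition_variable turn_changed_;
    unsigned long next_ticket_ = 0;
    unsigned long now_serving_ = 0;
};

// Hypothetical demo: four threads increment a shared counter under the ticket mutex.
int demo_ticket_mutex() {
    TicketMutex m;
    int counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < 4; ++t) {
        threads.emplace_back([&] {
            for (int i = 0; i < 1000; ++i) {
                std::lock_guard<TicketMutex> hold(m);  // works with any BasicLockable type
                ++counter;
            }
        });
    }
    for (auto& th : threads) th.join();
    return counter;
}
```

With this design, a thread that releases and immediately re-locks takes a new ticket behind any thread already waiting, at the cost of waking every waiter on each unlock.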
The logic here is very simple - the thread is not preempted based on mutexes, because that would require a cost incurred for each mutex operation, which is definitely not what you want. The cost of grabbing a mutex is high enough without forcing the scheduler to look for other threads to run.
If you want to fix this you can always yield the current thread. You can use std::this_thread::yield() - http://en.cppreference.com/w/cpp/thread/yield - and that might offer the chance to thread B to take over the mutex. But before you do that, allow me to tell you that this is a very fragile way of doing things, and offers no guarantee. You could, alternatively, investigate the issue deeper:
Why is it a problem that the B thread is not started when A releases the resource? Your code should not depend on such logic.
Consider using alternative thread synchronization objects like barriers (boost::barrier or http://linux.die.net/man/3/pthread_barrier_wait ) instead, if you really need this sort of logic.
Investigate whether you really need to release the mutex from A at that point. I find the practice of locking and releasing a mutex several times in quick succession a code smell; it usually hurts performance badly. See if you can group the extraction of data into immutable structures which you can play around with.
Ambitious, but try to work without mutexes: use lock-free structures instead and a more functional approach, including a lot of immutable structures. I have often found quite a performance gain from updating my code to not use mutexes (while still working correctly from a multithreading point of view).
How do you know this:
While thread A is holding the mutex, thread B attempts to lock it.
Since it is held, thread B is suspended.
How do you know thread B is suspended? How do you know that it has not just finished the line of code before trying to grab the lock, but not yet grabbed the lock:
Thread B:
x = 17; // is the thread here?
// or here? ('between' lines of code)
mtx.lock(); // or suspended in here?
// how can you tell?
You can't tell. At least not in theory.
Thus the order of acquiring the lock is, to the abstract machine (ie the language), not definable.
I was reading this post on performance differences in C# between critical sections and mutexes for a given test case. I'm wondering if there is any further documentation out there that gives performance overheads for the various locking classes for a C++ application, specifically MFC running on a Windows 32 or 64 bit platform?
The reason that I'm asking is that the profiler results I get across broad automated tests show a lot of time spent in mutex code. What I'm trying to figure out is how much of this is reasonable delay while waiting for a resource to become available, and how much is due to the implementation and specifics of the locking structure. I'm only dealing with a single process, which includes multiple threads, and am considering changing to critical sections. Long term automated testing shows that I don't need the time-outs offered by the mutex class.
Hence the question, is anyone aware of any reference documentation relating to the performance overheads of different MFC locking mechanisms on different Windows platforms?
As far as I can understand, a Win32 Mutex is a full blown kernel object. This means that any call to a Mutex will involve a system call. This will often invalidate the cache and therefore can be quite expensive.
Critical Sections are user-side objects that make no use of the kernel in cases where there is no contention. This is probably done using the x86 LOCK assembler instruction or similar to guarantee atomicity. Since no system call is made, it will be faster, but because it is not a kernel object, there is no way to access a critical section from another process.
The crucial difference between Critical Sections and Mutexes in Windows is that you can create a named mutex and use it from multiple processes, whereas there is no way to access a critical section of one process from another.
A consequence of a mutex being available in multiple processes is that access to it must be controlled by the kernel.
Read the following support article from Microsoft: http://support.microsoft.com/kb/105678.
Critical sections and mutexes provide synchronization that is very similar, except that critical sections can be used only by the threads of a single process. There are two areas to consider when choosing which method to use within a single process:
Speed. The Synchronization overview says the following about critical sections:
... critical section objects provide a slightly faster, more efficient
mechanism for mutual-exclusion synchronization. Critical sections use
a processor-specific test and set instruction to determine mutual
exclusion.
Deadlock. The Synchronization overview says the following about mutexes:
If a thread terminates without releasing its ownership of a mutex
object, the mutex is considered to be abandoned. A waiting thread can
acquire ownership of an abandoned mutex, but the wait function's
return value indicates that the mutex is abandoned.
WaitForSingleObject() will return WAIT_ABANDONED for a mutex that has
been abandoned. However, the resource that the mutex is protecting is
left in an unknown state.
There is no way to tell whether a critical section has been abandoned.
So I have called CreateMutex like so:
while(1){
    HANDLE h;
    h=CreateMutex(NULL,TRUE,"mutex1");
    y=WaitForSingleObject(h,INFINITE);
    ///random code
    ReleaseMutex(h);
}
It runs fine after looping twice, but deadlocks on WaitForSingleObject(h,INFINITE) after the third loop. This is with two threads running concurrently. How can it deadlock when ReleaseMutex is called? Is the CreateMutex function called correctly?
You're waiting on a mutex that's already owned... please don't do that.
Also, you're not destroying the mutex, only releasing it. The next call should give you ERROR_ALREADY_EXISTS. The complete quote from MSDN is "If the mutex is a named mutex and the object existed before this function call, the return value is a handle to the existing object, GetLastError returns ERROR_ALREADY_EXISTS, bInitialOwner is ignored, and the calling thread is not granted ownership."
If any of the "random code" waits for the other thread to make progress, it could deadlock while owning the mutex. In which case the other thread will wait forever trying to acquire the mutex, which is the behavior you're seeing.
I suspect you are trying to implement mutual exclusion within a single process. If that is so, then the correct synchronization object is the critical section. The naming of these objects is a little confusing because both mutexes and critical sections perform mutual exclusion.
The interface for the critical section is much simpler to use, it being essentially an acquire function and a corresponding release function. If you are synchronizing within a single process, and you need a simple lock (rather than, say, a semaphore), you should use critical sections rather than mutexes.
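The acquire/release pair described above might be wrapped as follows so that it works with std::lock_guard. This is a sketch under assumptions: ProcessLocalLock and the demo function are hypothetical names, and the non-Windows branch simply substitutes std::mutex so that the example builds on other platforms. The Win32 calls used are InitializeCriticalSection, EnterCriticalSection, LeaveCriticalSection and DeleteCriticalSection.

```cpp
#include <mutex>   // std::lock_guard, and std::mutex for the fallback
#include <thread>

#ifdef _WIN32
#include <windows.h>
// Thin wrapper so a CRITICAL_SECTION can be used with std::lock_guard.
class ProcessLocalLock {
public:
    ProcessLocalLock() { InitializeCriticalSection(&cs_); }
    ~ProcessLocalLock() { DeleteCriticalSection(&cs_); }
    ProcessLocalLock(const ProcessLocalLock&) = delete;
    ProcessLocalLock& operator=(const ProcessLocalLock&) = delete;
    void lock() { EnterCriticalSection(&cs_); }    // acquire
    void unlock() { LeaveCriticalSection(&cs_); }  // release
private:
    CRITICAL_SECTION cs_;
};
#else
// Assumption: fall back to std::mutex so the sketch builds off Windows.
using ProcessLocalLock = std::mutex;
#endif

// Hypothetical demo: two threads of one process share a counter.
int demo_process_local_lock() {
    ProcessLocalLock lock;
    int counter = 0;
    auto work = [&] {
        for (int i = 0; i < 10000; ++i) {
            std::lock_guard<ProcessLocalLock> guard(lock);  // released at end of scope
            ++counter;
        }
    };
    std::thread a(work), b(work);
    a.join();
    b.join();
    return counter;
}
```

The lock object is created once and reused by both threads, rather than being created inside the loop.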
In fact, very recently here on Stack Overflow, I wrote a more detailed answer to a question which described the standard usage pattern for critical sections. That post has lots of links to the pertinent sections of MSDN documentation.
You only need to use a mutex when you are performing cross process synchronization. Indeed you should only use a mutex when you are synchronizing across a process because critical sections perform so much better (i.e. faster).
What are the factors to keep in mind while choosing between Critical Sections, Mutex and Spin Locks? All of them provide for synchronization but are there any specific guidelines on when to use what?
EDIT: I did mean the windows platform as it has a notion of Critical Sections as a synchronization construct.
In Windows parlance, a critical section is a hybrid between a spin lock and a non-busy wait. It spins for a short time, then--if it hasn't yet grabbed the resource--it sets up an event and waits on it. If contention for the resource is low, the spin lock behavior is usually enough.
Critical Sections are a good choice for a multithreaded program that doesn't need to worry about sharing resources with other processes.
A mutex is a good general-purpose lock. A named mutex can be used to control access among multiple processes. But it's usually a little more expensive to take a mutex than a critical section.
General points to consider:
The performance cost of using the mechanism.
The complexity introduced by using the mechanism.
In any given situation 1 or 2 may be more important.
E.g.
If you are using multi-threading to write a high-performance algorithm that makes use of many cores, and need to guard some data for safe access, then 1 is probably very important.
If you have an application where a background thread is used to poll for some information on a timer and on the rare occasion it notices an update you need to guard some data for access then 2 is probably more important than 1.
1 will be down to the underlying implementation and probably scales with the scope of the protection e.g. a lock that is internal to a process is normally faster than a lock across all processes on a machine.
2 is easy to misjudge. First attempts to use locks to write thread safe code will normally miss some cases that lead to a deadlock. A simple deadlock would occur for example if thread A was waiting on a lock held by thread B but thread B was waiting on a lock held by thread A. Surprisingly easy to implement by accident.
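The A-waits-for-B, B-waits-for-A deadlock described above typically arises when two threads take the same two locks in opposite order. One common remedy in standard C++ is std::lock, which acquires several mutexes without deadlocking. The sketch below assumes a hypothetical Account type and transfer function purely for illustration.

```cpp
#include <mutex>
#include <thread>

// Hypothetical type: each account carries its own mutex.
struct Account {
    std::mutex m;
    int balance = 0;
};

// Locking from.m and then to.m one after the other could deadlock when another
// thread transfers in the opposite direction. std::lock acquires both mutexes
// using a deadlock-avoidance algorithm instead.
void transfer(Account& from, Account& to, int amount) {
    if (&from == &to) return;  // never lock the same mutex twice
    std::lock(from.m, to.m);
    std::lock_guard<std::mutex> hold_from(from.m, std::adopt_lock);
    std::lock_guard<std::mutex> hold_to(to.m, std::adopt_lock);
    from.balance -= amount;
    to.balance += amount;
}

// Hypothetical demo: two threads transfer in opposite directions.
int demo_transfers() {
    Account x, y;
    x.balance = 1000;
    y.balance = 1000;
    std::thread t1([&] { for (int i = 0; i < 1000; ++i) transfer(x, y, 1); });
    std::thread t2([&] { for (int i = 0; i < 1000; ++i) transfer(y, x, 1); });
    t1.join();
    t2.join();
    return x.balance + y.balance;  // money is conserved
}
```

Another option is to always take the locks in one agreed global order.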
On any given platform the naming and qualities of locking mechanisms may vary.
On Windows, critical sections are fast and process-specific; mutexes are slower but work across processes. Semaphores offer more complicated use cases. Some problems, e.g. allocation from a pool, may be solved very efficiently using atomic functions rather than locks, e.g. on Windows InterlockedIncrement, which is very fast indeed.
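As a rough illustration of the pool-allocation point, the sketch below hands out distinct slot indices with an atomic increment instead of a lock. It uses std::atomic as a portable stand-in for InterlockedIncrement; SlotAllocator and the demo function are hypothetical names, and a real pool would also need to handle exhaustion and reuse.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical allocator: each call returns a distinct index without taking a lock.
class SlotAllocator {
public:
    int next() { return next_.fetch_add(1, std::memory_order_relaxed); }
private:
    std::atomic<int> next_{0};
};

// Hypothetical demo: four threads claim slots concurrently; every slot must be claimed exactly once.
bool demo_slots_are_unique() {
    const int kThreads = 4;
    const int kPerThread = 1000;
    SlotAllocator alloc;
    std::vector<int> claims(kThreads * kPerThread, 0);
    std::vector<std::thread> threads;
    for (int t = 0; t < kThreads; ++t) {
        threads.emplace_back([&] {
            for (int i = 0; i < kPerThread; ++i) ++claims[alloc.next()];
        });
    }
    for (auto& th : threads) th.join();
    for (int c : claims) {
        if (c != 1) return false;
    }
    return true;
}
```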
A Mutex in Windows is actually an interprocess concurrency mechanism, making it incredibly slow when used for intraprocess threading. A Critical Section is the Windows analogue to the mutex you normally think of.
Spin Locks are best used when the resource being contested is usually not held for a significant number of cycles, meaning the thread that has the lock is probably going to give it up soon.
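For reference, a minimal spin lock can be sketched in standard C++ with std::atomic_flag. This is an illustration only: SpinLock and the demo function are hypothetical names, and the yield inside the loop is a courtesy to the scheduler that a pure spin lock would omit.

```cpp
#include <atomic>
#include <mutex>
#include <thread>

// Hypothetical spin lock: loops until the flag is acquired instead of sleeping.
class SpinLock {
public:
    void lock() {
        while (flag_.test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield();  // optional; a pure spin lock would just loop
        }
    }
    void unlock() { flag_.clear(std::memory_order_release); }
private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Hypothetical demo: the critical section is only a few cycles long,
// which is the case where spinning tends to make sense.
int demo_spin_lock() {
    SpinLock lock;
    int counter = 0;
    auto work = [&] {
        for (int i = 0; i < 10000; ++i) {
            std::lock_guard<SpinLock> guard(lock);
            ++counter;
        }
    };
    std::thread a(work), b(work);
    a.join();
    b.join();
    return counter;
}
```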
EDIT : My answer is only relevant provided you mean 'On Windows', so hopefully that's what you meant.
For example the c++0x interfaces
I am having a hard time figuring out when to use which of these things (cv, mutex and lock).
Can anyone please explain or point to a resource?
Thanks in advance.
On the page you refer to, "mutex" is the actual low-level synchronizing primitive. You can take a mutex and then release it, and only one thread can take it at any single time (hence it is a synchronizing primitive). A recursive mutex is one which can be taken by the same thread multiple times, and then it needs to be released as many times by the same thread before others can take it.
A "lock" here is just a C++ wrapper class that takes a mutex in its constructor and releases it at the destructor. It is useful for establishing synchronizing for C++ scopes.
A condition variable is a more advanced / high-level form of synchronizing primitive which combines a lock with a "signaling" mechanism. It is used when threads need to wait for a resource to become available. A thread can "wait" on a CV and then the resource producer can "signal" the variable, in which case the threads who wait for the CV get notified and can continue execution. A mutex is combined with CV to avoid the race condition where a thread starts to wait on a CV at the same time another thread wants to signal it; then it is not controllable whether the signal is delivered or gets lost.
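To make the first two terms concrete, here is a small sketch of a recursive mutex used through the lock wrapper std::lock_guard. The Counter class is hypothetical and exists only for illustration.

```cpp
#include <mutex>

// Hypothetical class whose public methods each take the same recursive mutex.
class Counter {
public:
    void add(int n) {
        std::lock_guard<std::recursive_mutex> lock(m_);  // taken in the constructor
        value_ += n;
    }  // released in the destructor

    void add_twice(int n) {
        std::lock_guard<std::recursive_mutex> lock(m_);  // first acquisition
        add(n);  // re-locks m_ on the same thread; allowed for a recursive mutex
        add(n);
    }  // the mutex becomes free once every acquisition has been released

    int value() {
        std::lock_guard<std::recursive_mutex> lock(m_);
        return value_;
    }

private:
    std::recursive_mutex m_;
    int value_ = 0;
};
```

With a plain std::mutex in place of std::recursive_mutex, the nested call in add_twice would be undefined behavior, typically a self-deadlock.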
I'm not too familiar w/ C++0x so take this answer w/ a grain of salt.
re: Mutex vs. locks: From the documentation you posted, it looks like a mutex is an object representing an OS mutex, whereas a lock is an object that holds a mutex to facilitate the RAII pattern.
Condition variables are a handy mechanism to associate a blocking/signaling mechanism (signal+wait) with a mutual exclusion mechanism, yet keep them decoupled in the OS so that you as system programmer can choose the association between condvar and mutex. (useful for dealing with multiple sets of concurrently-accessed objects) Rob Krten has some good explanations on condvars in one of the online chapters of his book on QNX.
As far as general references: This book (not out yet) looks interesting.
This question has been answered; I just want to add something that may help to decide WHEN to use these synchronization primitives.
Simply put, a mutex is used to guarantee mutually exclusive access to a shared resource in the critical section of multiple threads. Lock is a general term, but a binary mutex can be used as a lock. In modern C++ we use lock_guard and similar objects to utilize RAII to simplify mutex usage and make it safe. The condition variable is another primitive that is often combined with a mutex to make something known as a monitor.
I am having a hard time figuring out when to use which of these things
(cv, mutex and lock). Can anyone please explain or point to a
resource?
Use a mutex to guarantee mutually exclusive access to something. It's the default solution for a broad range of concurrency problems. Use lock_guard if you have a scope in C++ that you want to guard with a mutex. The mutex is handled by the lock_guard: you just create a lock_guard in the scope and initialize it with a mutex, and then C++ does the rest for you. The mutex is released when the scope is exited for any reason, including throwing an exception or returning from a function. That is the idea behind RAII, and lock_guard is just another resource handler.
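The release-on-exception behavior can be seen in a small sketch like the one below. The Setting class and the demo function are hypothetical; the point is only that the second call succeeds because the first call's lock_guard released the mutex while the exception propagated.

```cpp
#include <mutex>
#include <stdexcept>

// Hypothetical class guarding one value with a mutex.
class Setting {
public:
    void update(int v) {
        std::lock_guard<std::mutex> lock(m_);  // locked here
        if (v < 0) throw std::invalid_argument("negative value");
        value_ = v;
    }  // unlocked here, whether we return normally or throw

    int get() {
        std::lock_guard<std::mutex> lock(m_);
        return value_;
    }

private:
    std::mutex m_;
    int value_ = 0;
};

// Hypothetical demo: an update that throws does not leave the mutex locked.
int demo_update_after_throw() {
    Setting s;
    try {
        s.update(-1);
    } catch (const std::invalid_argument&) {
        // the lock_guard destructor already released the mutex during unwinding
    }
    s.update(7);  // would block forever if the mutex were still held
    return s.get();
}
```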
Some concurrency issues are not easily solvable using only a mutex, or the simple solution leads to complexity or inefficiency. The producer-consumer problem is one of them. If we want to implement a consumer thread reading items from a buffer shared with a producer, we should protect the buffer with a mutex; but without a condition variable we would have to lock the mutex, check the buffer and read an item if it's not empty, unlock it, wait for some time period, lock it again, and so on. That wastes time if the buffer is often empty (busy waiting), and there will be lots of locking, unlocking and sleeping.
The solution we need for the producer-consumer problem must be simpler and more efficient. A monitor (a mutex + a condition variable) helps us here. We still need a mutex to guarantee mutually exclusive access, but a condition variable lets us sleep and wait for a certain condition. The condition here is the producer adding an item to the buffer. The producer thread notifies the consumer thread that there is an item in the buffer, and the consumer wakes up and gets the item. Simply put, the producer locks the mutex, puts something in the buffer, and notifies the consumer. The consumer locks the mutex, sleeps while waiting for the condition, wakes up when there is something in the buffer, and gets the item from the buffer. It's a simpler and more efficient solution.
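The monitor described above might look roughly like this in standard C++. The Buffer class and the demo function are hypothetical, and the buffer is unbounded for brevity; a bounded buffer would need a second condition for "not full".

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical monitor: a mutex plus a condition variable around a queue.
class Buffer {
public:
    void put(int item) {
        {
            std::lock_guard<std::mutex> lock(m_);
            items_.push(item);
        }
        not_empty_.notify_one();  // wake a consumer waiting for an item
    }

    int take() {
        std::unique_lock<std::mutex> lock(m_);
        // wait() releases the mutex while sleeping and re-acquires it on wake-up;
        // the predicate guards against spurious wake-ups and missed notifications.
        not_empty_.wait(lock, [this] { return !items_.empty(); });
        int item = items_.front();
        items_.pop();
        return item;
    }

private:
    std::mutex m_;
    std::condition_variable not_empty_;
    std::queue<int> items_;
};

// Hypothetical demo: the consumer sums the numbers 1..100 sent by the producer.
int demo_producer_consumer() {
    Buffer buffer;
    int sum = 0;
    std::thread consumer([&] { for (int i = 0; i < 100; ++i) sum += buffer.take(); });
    std::thread producer([&] { for (int i = 1; i <= 100; ++i) buffer.put(i); });
    producer.join();
    consumer.join();
    return sum;
}
```

The consumer never polls: it sleeps inside wait() until the producer signals that the buffer is non-empty.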
The next time you face a concurrency problem, think this way: if you need mutually exclusive access to something, use a mutex. Use lock_guard if you want to be safer and simpler. If the problem involves waiting for a condition that must happen in another thread, you MIGHT need a condition variable.
As a general rule of thumb, first analyze your problem and try to find a famous concurrency problem similar to yours (for example, see the classic problems of synchronization section in this page). Read about the solutions proposed for the well-known problem to pick the best one. You may need some customization.