Semaphore Mutex Concurrency Issue in multithreaded C program - c++

How do I design a multithreaded C program to avoid semaphore and mutex concurrency issues?

There are a few ways.
The best way is to eliminate all shared data between threads. While this isn't always practical, it's always good to eliminate as much shared data as possible.
After that, you need to start looking into lockless programming. Lockless programming is a bit of a fad right now, but the dirty secret is that it's often a much better idea to use lock-based concurrency like mutexes and semaphores. Lockless programming is very hard to get correct. Look up Herb Sutter's articles on the subject, or the Wikipedia page. There are a lot of good resources about lockless synchronization out there.
Somewhere in between are critical sections. If you're programming on Windows, critical sections should be preferred to mutexes, as they do some work to avoid the overhead of full mutex locks and unlocks. Try those out first, and if your performance is unacceptable (or you're targeting platforms without critical sections), then you can look into lockless techniques.

It would be best to write lock-free code. It's hard, but possible.
GCC Atomic Builtins
Article by Andrei Alexandrescu (C++)
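To give a feel for what lock-free code built on atomic primitives looks like, here is a minimal sketch. It uses C++11 std::atomic rather than the GCC builtins themselves, purely for portability, and the names atomic_store_max and max_of_parallel_updates are made up for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Raise `target` to `value` if `value` is larger, without taking a lock.
// On failure compare_exchange_weak reloads `current`, so the loop retries
// against whatever another thread has written in the meantime.
void atomic_store_max(std::atomic<int>& target, int value) {
    int current = target.load();
    while (current < value &&
           !target.compare_exchange_weak(current, value)) {
        // retry with the freshly observed `current`
    }
}

// Hypothetical demo: thread t proposes the values t*1000 .. t*1000+999.
int max_of_parallel_updates(int thread_count) {
    assert(thread_count > 0);
    std::atomic<int> maximum{0};
    std::vector<std::thread> threads;
    for (int t = 0; t < thread_count; ++t) {
        threads.emplace_back([&maximum, t] {
            for (int i = 0; i < 1000; ++i)
                atomic_store_max(maximum, t * 1000 + i);
        });
    }
    for (auto& th : threads) th.join();
    return maximum.load();
}
```

With four threads the largest proposed value is 3999, and that is what the function returns regardless of how the threads interleave.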

Be sure to pass data structures between threads in a way that always makes it clear which thread exclusively owns the data. If you use lockless queues (e.g., as mentioned by Dan before) to pass your data around, you shouldn't run into too many concurrency issues, because your code then behaves much like any other code waiting for some data to arrive.
If you are, however, migrating single- to multithreaded code - this is an entirely different beast. It's very hard. And most of the time there are no elegant solutions.
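A rough sketch of the "hand the data over and forget about it" idea follows. To keep it short it uses a plain mutex-and-condition-variable queue rather than a lockless one, and std::unique_ptr to make the ownership transfer explicit. HandoffQueue and sum_via_queue are hypothetical names, and the null pointer as an end-of-stream marker is just a convention chosen for this example.

```cpp
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// A blocking queue that transfers ownership of heap-allocated items.
// After push() the producer no longer holds a pointer to the item.
template <typename T>
class HandoffQueue {
public:
    void push(std::unique_ptr<T> item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(item));
        }
        ready_.notify_one();
    }

    // Blocks until an item is available, then hands it to the caller.
    std::unique_ptr<T> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty(); });
        assert(!items_.empty());
        std::unique_ptr<T> item = std::move(items_.front());
        items_.pop();
        return item;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::unique_ptr<T>> items_;
};

// Hypothetical demo: the producer sends three batches, the consumer sums them.
int sum_via_queue() {
    HandoffQueue<std::vector<int>> queue;
    int total = 0;
    std::thread consumer([&] {
        while (auto batch = queue.pop()) {   // a null pointer ends the stream
            for (int v : *batch) total += v;
        }
    });
    for (int i = 1; i <= 3; ++i)
        queue.push(std::make_unique<std::vector<int>>(4, i)); // four copies of i
    queue.push(nullptr);
    consumer.join();
    return total;
}
```

Only the queue itself is shared; each batch is touched by exactly one thread at a time, so the batch contents need no locking of their own.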

Also, have a look at the InterlockedXXX family of functions for performing atomic operations in a multithreaded environment.
InterlockedIncrement Function
InterlockedDecrement Function
InterlockedExchange Function
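The Interlocked functions are Windows-only, so here is a portable sketch of the same three operations using std::atomic instead. The function name count_with_atomics is invented for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical demo: each worker increments a shared counter `per_thread`
// times and then decrements it once.
long count_with_atomics(int workers, int per_thread) {
    std::atomic<long> counter{0};
    std::vector<std::thread> threads;
    for (int w = 0; w < workers; ++w) {
        threads.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i)
                counter.fetch_add(1);   // counterpart of InterlockedIncrement
            counter.fetch_sub(1);       // counterpart of InterlockedDecrement
        });
    }
    for (auto& t : threads) t.join();
    // Counterpart of InterlockedExchange: store a new value and get the
    // previous one back in a single atomic step.
    long previous = counter.exchange(0);
    assert(counter.load() == 0);
    return previous;
}
```

One difference to keep in mind: fetch_add and fetch_sub return the value from before the operation, whereas InterlockedIncrement and InterlockedDecrement return the resulting value.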

Related

Why mutex (std::mutex) is heavy?

Quite often on this site and on other forums I read phrases like "mutex is heavy, better use something else". But I can't really find an explanation of why it's heavy. Also, if we are talking about standard C++ from C++11 up to (but not including) C++20, we basically have only std::mutex, used with locks or condition_variable, to make something thread-safe. I would expect something from std to be quite efficient, especially if it's the only tool (before C++20) for the task, thread safety in this case.
So why are mutexes, and std::mutex in particular, heavy? And what should we as C++ developers use instead? Something from Boost?
Mutexes are considered "heavy" because they are often believed to result in a syscall, i.e. a round-trip to the kernel. A trip to the kernel takes on the order of 1,000+ CPU cycles due to context switching between privileged and unprivileged code.
In many OSes these days mutexes are optimized to not go to the kernel until contention occurs. For example, on Linux a mutex is implemented using a futex ("fast userspace mutex"), and on Windows using an SRW lock. However, once there is contention, there will be a trip to the kernel. And once a thread needs to wait, it will be "put to sleep" by the OS and there will be a significant delay between the moment the lock is released and the time the thread is scheduled to be executed again.
If you need synchronization, sometimes looping on a simple atomic can be sufficient. If contention is rare and short-lived, then you can achieve better performance with a "spin lock", i.e. looping until a certain condition is met. Even if you loop 10000 times, it can be faster than a single syscall.
In practice, however, a mutex will provide adequate balance between performance and convenience. So I wouldn't worry about it unless you are counting nanoseconds (as in HFT or real-time applications).
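As a rough illustration of "looping on a simple atomic", here is a sketch in which one thread publishes a value and another spins until it becomes visible. The name spin_until_ready is hypothetical, and the yield() call is an addition of mine so that the loop behaves on an oversubscribed machine. A pure spin would leave it out.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical demo: wait for another thread's result without a mutex.
int spin_until_ready() {
    int payload = 0;                    // plain data, published via `ready`
    std::atomic<bool> ready{false};

    std::thread producer([&] {
        payload = 42;                                   // write the data first
        ready.store(true, std::memory_order_release);   // then publish it
    });

    // Loop until the producer's release store becomes visible.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();      // give up the time slice while waiting
    }
    int seen = payload;                 // safe: the acquire pairs with the release
    assert(seen == 42);
    producer.join();
    return seen;
}
```

This only pays off when the wait is expected to be very short. If the producer can take a long time, the waiting thread burns CPU the whole while, which is exactly the case where a mutex or condition variable is the better tool.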
std::mutex was designed to be a light-weight portable wrapper around the operating system's native mutex facility. If your goal was to invoke those facilities, mutex introduces only a very negligible overhead over calling the OS-native API directly.
However, depending on what your use case is, using an OS facility might not be the optimal solution. For instance, to protect data from concurrent access, you could also write your own lock from lower-level primitives like std::atomic. This will however be a different kind of lock algorithm. In particular, std::mutex will put a waiting thread to sleep if the mutex can't be obtained right away, which is something you cannot do without talking to the OS. In some cases though, such a simpler locking algorithm is sufficient to get the job done. A popular example here are cases where lock contention is expected to occur only in rare cases.
That being said, such thoughts get you quite deep into expert-level concurrency programming. Unless you have specific concerns that require worrying about micro-optimizations like rolling your own locking, std::mutex is the way to go and its overhead is well within reasonable bounds for what it's doing.
All kinds of synchronization are "heavy", and lock-based synchronization is heavier than atomic operations.
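For reference, the plain std::mutex approach looks roughly like this. GuardedCounter and count_with_mutex are names invented for the example.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared state: a plain counter guarded by a std::mutex.
struct GuardedCounter {
    std::mutex mutex;
    long value = 0;

    void increment() {
        std::lock_guard<std::mutex> lock(mutex); // released when `lock` goes out of scope
        ++value;
    }
};

long count_with_mutex(int workers, int per_thread) {
    assert(workers >= 0 && per_thread >= 0);
    GuardedCounter counter;
    std::vector<std::thread> threads;
    for (int w = 0; w < workers; ++w) {
        threads.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) counter.increment();
        });
    }
    for (auto& t : threads) t.join();
    return counter.value; // every writer has been joined, so no lock is needed here
}
```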
https://github.com/markwaterman/MutexShootout
This person did a comparison between various mutex implementations. A raw Windows SRW lock was the fastest option, but the most recent std mutex they compared it to was the MSVC 2017 one.
I believe that std::shared_mutex is a Windows SRW lock under the hood.
Do you need every last tiny percent of performance? Then you should be profiling and swapping out mutexes. If not, std::mutex is within 10s of percents of best options, and will probably continue to be iterated on and supported.
Atomic integer operations are generally cheaper than a mutex lock, but the rules are more complex. In addition, atomic operations cause non-local slowdowns in your code, as they cause cache lines to be invalidated on other cores so that nobody else is left holding a stale value.
In my experience, until you get to extreme situations, algorithm changes can get you far more than tens of percent of performance improvement. And when you really, really need performance, you will probably be stripping out mutexes as much as possible anyhow; even the fastest mutex isn't fast enough for really high performance situations.
Optimization is fungible; you can spend your development effort making code faster when you identify a bottleneck. Don't write code that is prematurely pessimized; but 10s of percent hit from using std mutex over the alternative locks is not usually large enough to be that problem.
Mutexes are expensive in the same sense that copying is expensive. Meaning if you can obviate the need for a copy, that is better than having to copy. But if you need a copy, there is no way around that. The same goes for std::mutex. Not because std::mutex is inefficient, but because mutexes are inherently expensive.

Does Boost have support for Windows EnterCriticalSection API?

I know Boost has support for mutexes and lock_guard, which can be used to implement critical sections.
But Windows has a special API for critical sections (see EnterCriticalSection and LeaveCriticalSection) which is a LOT faster than a mutex (for rarely contended, short sections of code).
Hence my question - it is possible in Boost to take advantage of this API, and fallback to spinlock/mutex/futex-based implementation on other platforms?
The simple answer is no.
Here's some relevant background from an old mailing list thread:
BTW. I am agree that mutex is more universal solution from a
performance point of view. But to be fair - CS are faster in simple
design. I believe that possibility to support them should be at
least
taken in account.
This was the article that someone pointed me to. The conclusion was
that CS are only faster if:
There are less than 8 threads total in the process.
You weren't running in the background.
You weren't on an dual processor machine.
To me this means that simple testing yields good CS performance
results, but any real world program is better off with a full blown
mutex.
I'm not adverse to supporting a CS implementation. However, I
originally chose not to for the following reasons:
You get either construction and destruction hits from using a PIMPL
idiom or you must include Windows.h in the Boost.Threads headers,
which I simply don't want to do. (This can be worked around by
emulating a CS ala OPTEX from the MSDN.)
According to this research paper most programs won't benefit from
a CS design.
It's trivial to code a (non-portable) critical_section class that
follows the Mutex model if you truly can make use of this.
For now I think I've made the right choice, though down the road we
may change the implementation to use a critical section or OPTEX.
Bill Kempf
Speaking as someone who helps out maintaining Boost.Thread, and as someone who failed to get an event object into Boost.Thread, I don't think critical sections have ever been added nor would be added to Boost for these reasons:
A Win32 critical section is trivially easy to build using a boost::atomic and a boost::condition_variable, so much so it isn't really worth having an official one. Here is probably the most complex one you could imagine, but extremely configurable including being constexpr ready (don't ask!): https://github.com/ned14/boost.outcome/blob/master/include/boost/outcome/v1/spinlock.hpp#L331
You can build your own simply by matching (Basic)Lockable concept and using atomic compare_exchange (non-x86/x64) or atomic exchange (x86/x64) and then grab it using a lock_guard around the critical section.
Some may object that a win32 critical section is not this. I am afraid it is: it simply spins on an atomic for a spin count, and then lazily tries to allocate a win32 event object which it then waits upon. Nothing special.
As much as you might think critical sections (really user mode mutexes) are better/faster/whatever, they probably are not as great as you might think. boost::mutex is a big vast heavyweight thing on Windows internally using a win32 semaphore as the kernel wait object because of the need to emulate thread cancellation and to behave well in a general purpose use context. It's easy to write a concurrency structure which is faster than another for some single use case, but it is very very hard to write a concurrency structure which is all of:
Faster than a standard implementation in the uncontended case.
Faster than a standard implementation in the lightly contended case.
Faster than a standard implementation in the heavily contended case.
Even if you manage all three of the above, that still isn't enough: you also need some guarantees on worst case progression ordering, so whether certain patterns of locks, waits and unlocks produce predictable outcomes. This is why threading facilities can appear to look slow in narrow use case scenarios, so Boost.Thread much as the STL can appear to be much slower than hand rolled locking code in say an uncontended use case.
Boost.Thread already does substantial work in user mode to avoid going to kernel sleep on Windows. On POSIX any of the major pthreads implementations also does substantial work to avoid kernel sleeps and hence Boost.Thread doesn't replicate that work. In other words, critical sections don't gain you anything in terms of scaling to load behaviours, though inevitably Boost.Thread v4 especially on Windows does a ton load of work a naive implementation does not (the planned rewrite of Boost.Thread is vastly more efficient on Windows as it can assume Windows Vista or above).
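To make the "build your own by matching the (Basic)Lockable concept and using atomic exchange" remark concrete, here is a minimal sketch. It uses std::atomic and std::lock_guard in place of the Boost equivalents, it only spins (there is no fallback to a kernel wait object as in a real Win32 critical section), and the names SpinLock and count_with_spinlock are made up.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>   // std::lock_guard
#include <thread>
#include <vector>

// Minimal user-mode lock satisfying BasicLockable (lock() and unlock()).
class SpinLock {
public:
    void lock() {
        // exchange returns the previous value: false means we took the lock
        while (locked_.exchange(true, std::memory_order_acquire)) {
            std::this_thread::yield();
        }
    }
    void unlock() {
        assert(locked_.load(std::memory_order_relaxed));
        locked_.store(false, std::memory_order_release);
    }
private:
    std::atomic<bool> locked_{false};
};

// Hypothetical demo: guard a plain counter with the spin lock.
long count_with_spinlock(int workers, int per_thread) {
    SpinLock spin;
    long value = 0;
    std::vector<std::thread> threads;
    for (int w = 0; w < workers; ++w) {
        threads.emplace_back([&spin, &value, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                std::lock_guard<SpinLock> guard(spin); // works with any BasicLockable
                ++value;
            }
        });
    }
    for (auto& t : threads) t.join();
    return value;
}
```

This covers only the uncontended and lightly contended cases; it makes no attempt at fairness or at the worst-case ordering guarantees discussed above.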
So, it looks like the default Boost mutex doesn't support it, but asio::detail::mutex does.
So I ended up using that:
#include <boost/asio/detail/mutex.hpp>
#include <boost/thread.hpp>

using boost::asio::detail::mutex;
using boost::lock_guard;

int myFunc()
{
    static mutex mtx;
    lock_guard<mutex> lock(mtx);
    . . .
}

What should I know about multithreading and when to use it, mainly in c++

I have never come across multithreading but I hear about it everywhere. What should I know about it and when should I use it? I code mainly in c++.
Mostly, you will need to learn about MT libraries on OS on which your application needs to run. Until and unless C++0x becomes a reality (which is a long way as it looks now), there is no support from the language proper or the standard library for threads. I suggest you take a look at the POSIX standard pthreads library for *nix and Windows threads to get started.
This is my opinion, but the biggest issue with multithreading is that it is difficult. I don't mean that from an experienced programmer point of view, I mean it conceptually. There really are a lot of difficult concurrency problems that appear once you dive into parallel programming. This is well known, and there are many approaches taken to make concurrency easier for the application developer. Functional languages have become a lot more popular because of their lack of side effects and idempotency. Some vendors choose to hide the concurrency behind API's (like Apple's Core Animation).
Multithreaded programs can see some huge gains in performance (both in user perception and actual amount of work done), but you do have to spend time to understand the interactions that your code and data structures make.
MSDN Multithreading for Rookies article is probably worth reading. Being from Microsoft, it's written in terms of what Microsoft OSes support(ed in 1993), but most of the basic ideas apply equally to other systems, with suitable renaming of functions and such.
That is a huge subject.
A few points...
With multi-core, the importance of multi-threading is now huge. If you aren't multithreading, you aren't getting the full performance capability of the machine.
Multi-threading is hard. Communicating and synchronization between threads is tricky to get right. Problems are often intermittent, hard to diagnose, and if the design isn't right for multi-threading, hard to fix.
Multi-threading is currently mostly non-portable and platform specific.
There are portable libraries with wrappers around threading APIs. Boost is one. wxWidgets (mainly a GUI library) is another. It can be done reasonably portably, but you won't have all the options you get from platform-specific APIs.
I've got an introduction to multithreading that you might find useful.
In this article there isn't a single
line of code and it's not aimed at
teaching the intricacies of
multithreaded programming in any given
programming language but to give a
short introduction, focusing primarily
on how and especially why and when
multithreaded programming would be
useful.
Here's a link to a good tutorial on POSIX threads programming (with diagrams) to get you started. While this tutorial is pthread specific, many of the concepts transfer to other systems.
To understand more about when to use threads, it helps to have a basic understanding of parallel programming. Here's a link to a tutorial on the very basics of parallel computing intended for those who are just becoming acquainted with the subject.
The other replies covered the how part, I'll briefly mention when to use multithreading.
The main alternative to multithreading is using a timer. Consider for example that you need to update a little label on your form with the existence of a file. If the file exists, you need to draw a special icon or something. Now if you use a timer with a low timeout, you can achieve basically the same thing, a function that polls if the file exists very frequently and updates your ui. No extra hassle.
But your function is doing a lot of unnecessary work, isn't it. The OS provides a "hey this file has been created" primitive that puts your thread to sleep until your file is ready. Obviously you can't use this from the ui thread or your entire application would freeze, so instead you spawn a new thread and set it to wait on the file creation event.
Now your application is using as little cpu as possible because of the fact that threads can wait on events (be it with mutexes or events). Say your file is ready however. You can't update your ui from different threads because all hell would break loose if 2 threads try to change the same bit of memory at the same time. In fact this is so bad that windows flat out rejects your attempts to do it at all.
So now you need a synchronization mechanism of some sort to communicate with the UI thread one at a time (serially) so you don't step on each other's toes, but you can't code the main thread's part of it because the UI loop is hidden deep inside Windows.
The other alternative is to use another way to communicate between threads. In this case, you might use PostMessage to post a message to the main ui loop that the file has been found and to do its job.
Now if your work can't be waited upon and can't be split nicely into little bits (for use in a short-timeout timer), all you have left is another thread and all the synchronization issues that arise from it.
It might be worth it. Or it might bite you in the ass after days and days, potentially weeks, of debugging the odd race condition you missed. It might pay off to spend a long time first to try to split it up into little bits for use with a timer. Even if you can't, the few cases where you can will outweigh the time cost.
You should know that it's hard. Some people think it's impossibly hard, that there's no practical way to verify that a program is thread safe. Dr. Hipp, author of SQLite, states that threads are evil. This article covers the problems with threads in detail.
The Chrome browser uses processes instead of threads, and tools like Stackless Python avoid hardware-supported threads in favor of interpreter-supported "micro-threads". Even things like web servers, where you'd think threading would be a perfect fit, are moving towards event-driven architectures.
I myself wouldn't say it's impossible: many people have tried and succeeded. But there's no doubt writing production quality multi-threaded code is really hard. Successful multi-threaded applications tend to use only a few, predetermined threads with just a few carefully analyzed points of communication. For example a game with just two threads, physics and rendering, or a GUI app with a UI thread and background thread, and nothing else. A program that's spawning and joining threads throughout the code base will certainly have many impossible-to-find intermittent bugs.
It's particularly hard in C++, for two reasons:
The current version of the standard doesn't mention threads at all. All threading libraries are platform and implementation specific.
The scope of what's considered an atomic operation is rather narrow compared to a language like Java.
Cross-platform libraries like Boost.Thread mitigate this somewhat. The future C++0x will introduce some threading support. But Boost also has good interprocess communication support you could use to avoid threads altogether.
If you know nothing else about threading than that it's hard and should be treated with respect, then you know more than 99% of programmers.
If after all that, you're still interested in starting down the long hard road towards being able to write a multi-threaded C++ program that won't segfault at random, then I recommend starting with Boost threads. They're well documented, high level, and work cross platform. The concepts (mutexes, locks, futures) are the same few key concepts present in all threading libraries.
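As a taste of the "future" concept mentioned above, here is a small sketch. It uses the standard-library std::async and std::future, which mirror the Boost facilities, rather than Boost itself, and parallel_sum is a hypothetical helper.

```cpp
#include <cassert>
#include <future>
#include <numeric>
#include <vector>

// Sum the two halves of a vector in parallel and combine the results.
long parallel_sum(const std::vector<int>& data) {
    auto middle = data.begin() + data.size() / 2;
    // Run the first half on another thread; the future will hold its result.
    std::future<long> first_half =
        std::async(std::launch::async, [&data, middle] {
            return std::accumulate(data.begin(), middle, 0L);
        });
    assert(first_half.valid());
    long second_half = std::accumulate(middle, data.end(), 0L);
    return first_half.get() + second_half; // get() blocks until the task is done
}
```

Note that the two tasks only read the shared vector and write to separate results, so no mutex is needed at all. This is the kind of "few carefully analyzed points of communication" design that tends to work.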

Read write mutex in C++

This is an interview question. How do you implement a read/write mutex? There will be multiple threads reading and writing to a resource. I'm not sure how to go about it. If there's any information needed, please let me know.
Update: I'm not sure if my statement above is valid/understandable. But what I really want to know is how do you implement multiple read and multiple writes on a single object in terms of mutex and other synchronization objects needed?
Check out Dekker's algorithm.
Dekker's algorithm is the first known
correct solution to the mutual
exclusion problem in concurrent
programming. The solution is
attributed to Dutch mathematician Th.
J. Dekker by Edsger W. Dijkstra in his
manuscript on cooperating sequential
processes. It allows two threads to
share a single-use resource without
conflict, using only shared memory for
communication.
Note that Dekker's algorithm uses a busy-waiting (spinning) technique rather than blocking.
(Th. J. Dekker's solution, mentioned by E. W. Dijkstra in his EWD1303 paper)
The short answer is that it is surprisingly difficult to roll your own read/write lock. It's very easy to miss a very subtle timing problem that could result in deadlock, two threads both thinking they have an "exclusive" lock, etc.
In a nutshell, you need to keep a count of how many readers are active at any particular time. Only when the number of active readers is zero, should you grant a thread write access. There are some design choices as to whether readers or writers are given priority. (Often, you want to give writers the priority, on the assumption that writing is done less frequently.) The (surprisingly) tricky part is to ensure that no writer is given access when there are readers, or vice versa.
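The counting scheme just described could be sketched as follows, using a std::mutex and a std::condition_variable as the building blocks. This is one simple writer-priority variant under my own assumptions, not a production-quality lock, and the names ReadWriteLock and run_read_write_demo are invented here.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

class ReadWriteLock {
public:
    void lock_shared() {
        std::unique_lock<std::mutex> guard(mutex_);
        // Writer priority: new readers also wait while a writer is queued.
        changed_.wait(guard, [this] {
            return !writer_active_ && waiting_writers_ == 0;
        });
        ++active_readers_;
    }
    void unlock_shared() {
        std::lock_guard<std::mutex> guard(mutex_);
        assert(active_readers_ > 0);
        if (--active_readers_ == 0) changed_.notify_all();
    }
    void lock() {
        std::unique_lock<std::mutex> guard(mutex_);
        ++waiting_writers_;
        // A writer gets in only when nobody is reading or writing.
        changed_.wait(guard, [this] {
            return !writer_active_ && active_readers_ == 0;
        });
        --waiting_writers_;
        writer_active_ = true;
    }
    void unlock() {
        std::lock_guard<std::mutex> guard(mutex_);
        writer_active_ = false;
        changed_.notify_all();
    }
private:
    std::mutex mutex_;
    std::condition_variable changed_;
    int active_readers_ = 0;
    int waiting_writers_ = 0;
    bool writer_active_ = false;
};

// Hypothetical demo: writers keep two fields equal, readers check that they
// never see a half-finished update. Returns the final value, or -1 if a
// reader observed the fields out of step.
int run_read_write_demo() {
    ReadWriteLock lock;
    int first = 0, second = 0;
    std::atomic<bool> torn{false};
    std::vector<std::thread> threads;
    for (int w = 0; w < 2; ++w)
        threads.emplace_back([&] {
            for (int i = 0; i < 500; ++i) {
                lock.lock();
                ++first;
                ++second;
                lock.unlock();
            }
        });
    for (int r = 0; r < 4; ++r)
        threads.emplace_back([&] {
            for (int i = 0; i < 500; ++i) {
                lock.lock_shared();
                if (first != second) torn = true;
                lock.unlock_shared();
            }
        });
    for (auto& t : threads) t.join();
    return torn ? -1 : first;
}
```

With writer priority, a steady stream of writers can starve readers. Whether that is acceptable is one of the design choices mentioned above.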
There is an excellent MSDN article, "Compound Win32 Synchronization Objects" that takes you through the creation of a reader/writer lock. It starts simple, then grows more complicated to handle all the corner cases. One thing that stood out was that they showed a sample that looked perfectly good-- then they would explain why it wouldn't actually work. Had they not pointed out the problems, you might have never noticed. Well worth a read.
Hope this is helpful.
This sounds like a rather difficult question for an interview; I would not "implement" a read/write mutex, in the sense of writing one from scratch--there are much better off-the-shelf solutions available. The sensible real world thing would be to use an existing mutex type. Perhaps what they really wanted to know was how you would use such a type?
AFAIK you need either an atomic compare-and-swap instruction, or you need to be able to disable interrupts. See Compare-and-swap on Wikipedia. At least, that's how an OS would implement it. If you have an operating system, stand on its shoulders, and use an existing library (Boost, for example).
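For instance, with a C++17 compiler the off-the-shelf route could look roughly like this, using std::shared_mutex. The SettingsCache class and the demo function are hypothetical and only there to show the two lock types in use.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <thread>

// Hypothetical read-mostly cache guarded by std::shared_mutex (C++17).
class SettingsCache {
public:
    void set(const std::string& key, int value) {
        std::unique_lock<std::shared_mutex> lock(mutex_); // exclusive: one writer
        values_[key] = value;
    }
    // Returns `fallback` when the key is absent.
    int get(const std::string& key, int fallback) const {
        std::shared_lock<std::shared_mutex> lock(mutex_); // shared: many readers
        auto it = values_.find(key);
        return it == values_.end() ? fallback : it->second;
    }
private:
    mutable std::shared_mutex mutex_;
    std::map<std::string, int> values_;
};

// Hypothetical demo: one writer counts up while a reader polls the value.
int last_value_after_writes(int writes) {
    SettingsCache cache;
    std::thread writer([&] {
        for (int i = 1; i <= writes; ++i) cache.set("n", i);
    });
    std::thread reader([&] {
        for (int i = 0; i < writes; ++i) {
            int seen = cache.get("n", 0);
            assert(seen >= 0 && seen <= writes); // only values the writer stored
        }
    });
    writer.join();
    reader.join();
    return cache.get("n", 0);
}
```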

Why do libraries implement their own basic locks on windows?

Windows provides a number of objects useful for synchronising threads, such as event (with SetEvent and WaitForSingleObject), mutexes and critical sections.
Personally I have always used them, especially critical sections since I'm pretty certain they incur very little overhead unless already locked. However, looking at a number of libraries, such as Boost, people tend to go to a lot of trouble to implement their own locks using the interlocked methods on Windows.
I can understand why people would write lock-less queues and such, since that's a specialised case, but is there any reason why people choose to implement their own versions of the basic synchronisation objects?
Libraries aren't implementing their own locks. That is pretty much impossible to do without OS support.
What they are doing is simply wrapping the OS-provided locking mechanisms.
Boost does it for a couple of reasons:
They're able to provide a much better designed locking API, taking advantage of C++ features. The Windows API is C only, and not very well-designed C, at that.
They are able to offer a degree of portability. The same Boost API can be used if you run your application on a Linux machine or on Mac. Windows' own API is obviously Windows-specific.
The Windows-provided mechanisms have a glaring disadvantage: They require you to include windows.h, which you may want to avoid for a large number of reasons, not least its extreme macro abuse polluting the global namespace.
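As a rough idea of what "taking advantage of C++ features" means here, the sketch below wraps a C locking API in a class plus a scoped guard. It assumes a POSIX platform and uses pthread_mutex_t as a stand-in for the OS primitive; a Windows build would wrap CRITICAL_SECTION in the same way. OsMutex, ScopedLock and withdraw are invented names, not Boost's.

```cpp
#include <cassert>
#include <pthread.h>

// Thin C++ wrapper over a C locking API.
class OsMutex {
public:
    OsMutex() {
        int rc = pthread_mutex_init(&handle_, nullptr);
        assert(rc == 0);
        (void)rc;
    }
    ~OsMutex() { pthread_mutex_destroy(&handle_); }
    OsMutex(const OsMutex&) = delete;
    OsMutex& operator=(const OsMutex&) = delete;

    void lock()   { pthread_mutex_lock(&handle_); }
    void unlock() { pthread_mutex_unlock(&handle_); }
private:
    pthread_mutex_t handle_;
};

// Scoped guard: the destructor releases the lock on every exit path,
// including early returns and exceptions.
class ScopedLock {
public:
    explicit ScopedLock(OsMutex& m) : mutex_(m) { mutex_.lock(); }
    ~ScopedLock() { mutex_.unlock(); }
    ScopedLock(const ScopedLock&) = delete;
    ScopedLock& operator=(const ScopedLock&) = delete;
private:
    OsMutex& mutex_;
};

// Hypothetical use: the early return cannot leak the lock.
bool withdraw(OsMutex& m, int& balance, int amount) {
    ScopedLock lock(m);
    if (amount > balance) return false;
    balance -= amount;
    return true;
}
```

With the raw C API, every return path in withdraw would need its own unlock call, and forgetting one leaves the mutex locked forever.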
One particular reason I can think of is portability. Windows locks are just fine on their own but they are not portable to other platforms. A library which wishes to be portable must implement their own lock to guarantee the same semantics across platforms.
In many libraries (e.g. Boost) you need to write cross-platform code, so using WaitForSingleObject and SetEvent is a no-go. Also, there are common idioms, like monitors and condition variables, that the Win32 API lacks (though they can be implemented using these basic primitives).
Some lock-free data structures like atomic counters are very useful. For example, boost::shared_ptr uses them to be thread-safe without the overhead of a critical section, and most compilers (not MSVC) use atomic counters to implement a thread-safe copy-on-write std::string.
Some things, like queues, can be implemented very efficiently in a thread-safe way without locks at all, which may give a significant performance boost in certain applications.
There may occasionally be good reasons for implementing your own locks that don't use the Windows OS synchronization objects. But doing so is a "sharp stick." It's easy to poke yourself in the foot.
Here's an example: If you know that you are running the same number of threads as there are hardware contexts, and if the latency of waking up one of those threads which is waiting for a lock is very important to you, you might choose a spin lock implemented completely in user space. If the waiting thread is the only thread spinning on the lock, the latency of transferring the lock from the thread that owns it to the waiting thread is just the latency of moving the cache line to the owner thread and back to the waiting thread -- orders of magnitude faster than the latency of signaling a thread with an OS lock under the same circumstances.
But the scenarios where you want to do this are pretty narrow. As soon as you start having more software threads than hardware threads, you'll likely regret it. In that scenario, you could spend entire OS scheduling quanta doing nothing but spinning on your spin lock. And, if you care about power, spinlocks are bad because they prevent the processor from going into a low-power state.
I'm not sure I buy the portability argument. Portable libraries often have an OS portability layer that abstracts the different OS APIs for synchronization. If you're dealing with locks, a pthread_mutex can be made semantically the same as a Windows Mutex or Critical Section under an abstraction layer. There are some exceptions here, but for most people this is true. If you're dealing with Windows Events or POSIX condition variables, well, those are tougher to abstract. (Vista did introduce POSIX-style condition variables, but not many Windows software developers are in a position to require Vista...)
Writing locking code for a library is useful if that library is meant to be cross platform. Users of the library can use the library's locking functionality and not have to care about the underlying platform implementation. Assuming the library has versions for all the platforms being targeted, it's one less bit of code that has to be ported.