C++ synchronization guidelines

Does anyone know of a decent reference for synchronization issues in C++? I'm thinking of something similar to the C++ FAQ lite (and the FQA lite) but with regard to concurrency, locking, threading, performance issues, guidelines, when locks are needed and when they aren't, dealing with multithreaded library code that you can't control, etc. I don't care about the inner issues of how different lock types can be implemented, etc.; I just use Boost for that.
I'm sure there are a lot of good books out there, but I'd prefer something (preferably online) that I can use as a go-to when a question or an issue pops up in my mind. I'm not really a beginner to this, so I would like a concise reference for all those different types of situations that can pop up when writing multithreaded libraries that use other multithreaded libraries.
Like:
When is it better to have one big lock protecting a bunch of data vs a bunch of smaller locks protecting each piece of data? (what are the costs associated with having lots of locks? Resource acquisition costs? Locking time performance costs?)
What's the performance hit of pushing something onto a queue and having another thread pop the queue vs dealing with that data in the original thread?
Are there any simple idioms to make sure things just work when you're not so concerned about performance?
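To make the kind of thing I mean concrete, here is a rough sketch (my own illustration, not from any reference) contrasting one big lock with per-member locks; the "one big mutex" version is also the simple idiom I tend to reach for when performance isn't the main concern. I'm using std::mutex below, but the same shape works with boost::mutex:

#include <mutex>
#include <string>
#include <vector>

// Coarse-grained: one mutex guards every member. Simple to reason about,
// but every accessor serializes against every other one.
class AccountsCoarse {
    std::mutex m_;
    std::vector<int> balances_;
    std::string owner_;
public:
    void setOwner(std::string o) { std::lock_guard<std::mutex> lk(m_); owner_ = std::move(o); }
    void addBalance(int v)       { std::lock_guard<std::mutex> lk(m_); balances_.push_back(v); }
};

// Fine-grained: one mutex per independent piece of data. Unrelated
// operations no longer block each other, at the cost of more mutexes to
// create and acquire, and more ways to deadlock if you ever need two at once.
class AccountsFine {
    std::mutex balancesMutex_;
    std::vector<int> balances_;
    std::mutex ownerMutex_;
    std::string owner_;
public:
    void setOwner(std::string o) { std::lock_guard<std::mutex> lk(ownerMutex_);    owner_ = std::move(o); }
    void addBalance(int v)       { std::lock_guard<std::mutex> lk(balancesMutex_); balances_.push_back(v); }
};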
Anyway, I just want to know if there are any decent references out there that people use.

I'd recommend two resources:
Herb Sutter's Effective Concurrency articles
Anthony Williams's C++ Concurrency In Action (not yet published, but available as a PDF)

Related

What is the relationship among the different approaches to F# concurrency?

I've recently been learning F# asynchronous workflows, which are an important feature of F# concurrency. What confuses me is how many approaches there are to writing concurrent code in F#. I read Expert F# and some blogs about F# concurrency, so I know about things like background workers and IAsyncResult; if programming on a local machine, there is shared-memory concurrency in F#; if programming on a distributed system, there is message-passing concurrency. But I'm really not sure what the relationship between these techniques is, or how to classify them. I understand it is quite a "big" question that cannot be answered in one or two sentences, so I would definitely appreciate it if anyone could give me a specific answer or recommend some useful references.
I'm also rather new to F#, so I hope more answers come to complement this one :)
The first thing you need to do is distinguish between .NET classes (which can be used from any .NET language) and F#-specific ways to deal with asynchronous operations. In the first case, as you mention, among others you have:
System.ComponentModel.BackgroundWorker: This was used mainly in the first .NET versions with Windows Forms and it's not recommended anymore.
System.IAsyncResult: This is also an old .NET interface implemented by several classes (also Task) but I don't usually use it directly.
Windows.Foundation.IAsyncOperation: Another interface, but used only in Windows Store apps. Most of the time you translate it directly to Task, so you don't have to worry too much about it.
System.Threading.Tasks.Task: This is the recommended way now to handle .NET asynchronous and parallel (with the Parallel Task Library) operations. It's the hidden force behind C# async/await keywords, which are just syntactic sugar to pass continuations to Tasks.
Now for the F#-specific ways: Asynchronous Workflows and MailboxProcessor. It can roughly be said that the former corresponds to parallelism while the latter deals with concurrency.
Asynchronous Workflows: This is just a computation expression (aka monad) which happens to deal with asynchrony: operations that run in the background to prevent blocking the UI thread or parallel operations to get the maximum performance in multi-core systems.
It's more or less the equivalent of C# async/await, but we F# fans like to think it's a more elegant solution because it uses a more generic and flexible mechanism (computation expressions) which can be adapted, for example, to asynchronous sequences, events or even JavaScript callbacks. It also has other advantages, as Tomas Petricek explains here.
Within an asynchronous workflow most of the time you'll be using the methods in Control.Async or the extensions to .NET classes (like WebRequest.AsyncGetResponse) from the F# Core Library. If necessary, you can also interact directly with .NET Tasks (Async.AwaitTask and Async.StartAsTask) or even easily create your own async operations with Async.StartWithContinuations.
To learn more about asynchronous workflows you can consult the MSDN documentation, the magnificent Scott Wlaschin's site, Tomas Petricek's blog or the F# Wikibook.
Control.MailboxProcessor: Designed to deal with concurrency, that is, several processes running at the same time which usually need to share some information. The traditional .NET way to prevent memory corruption when several threads try to write a variable at the same time was the lock statement. Besides the fact that functional style prefers to use immutable values, memory locks are complicated to use properly and can also have a high performance penalty. So instead of this, MailboxProcessor uses an Erlang-like message-based (or actor-based) approach to concurrency.
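(For readers coming from the C++ questions on this page, the mailbox idea boils down to a worker thread that owns its state and only ever receives messages, so nothing else needs to lock that state. The sketch below is a made-up C++ analogue just to illustrate the shape; it is not how MailboxProcessor is implemented.)

#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

// Illustrative only: a thread that owns its state and processes queued
// messages one at a time, so no other thread ever touches that state.
class Mailbox {
    std::queue<std::function<void()>> messages_;
    std::mutex m_;
    std::condition_variable cv_;
    bool done_ = false;
    std::thread worker_;
public:
    Mailbox() : worker_([this] { run(); }) {}
    ~Mailbox() {
        { std::lock_guard<std::mutex> lk(m_); done_ = true; }
        cv_.notify_one();
        worker_.join();
    }
    void post(std::function<void()> msg) {
        { std::lock_guard<std::mutex> lk(m_); messages_.push(std::move(msg)); }
        cv_.notify_one();
    }
private:
    void run() {
        for (;;) {
            std::function<void()> msg;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return done_ || !messages_.empty(); });
                if (messages_.empty()) return;          // done_ set and queue drained
                msg = std::move(messages_.front());
                messages_.pop();
            }
            msg();   // state owned by this thread is only ever touched here
        }
    }
};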
I have not used MailboxProcessor myself that much, but for more info you can check Scott Wlaschin's site or the F# Wikibook.
I hope this helps! If someone sees something not completely correct in this answer, please feel free to edit it.
Cheers!

Implementing Semaphores, locks and condition variables

I wanted to know how to go about implementing semaphores, locks and condition variables in C/C++. I am learning OS concepts but want to get around to implementing the concepts in C.
Any tutorials?
Semaphores, locks, condition variables etc. are operating system concepts and must typically be implemented in terms of features of the operating system kernel. It is therefore not generally possible to study them in isolation - you need to consider the kernel code too. Probably the best way of doing this is to take a look at the Linux Kernel, with the help of a book such as Understanding The Linux Kernel.
A semaphore, at its very simplest, is just a counter you can add to and subtract from with a single atomic operation. Wikipedia has an easy-to-understand explanation that pretty much covers your question about them:
http://en.wikipedia.org/wiki/Semaphore_(programming)
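To make the counter idea concrete, here is a minimal (and deliberately naive) counting semaphore built only from a mutex and a condition variable. This is just a sketch of the idea above, not production code, and it assumes a C++11 compiler:

#include <condition_variable>
#include <mutex>

// Naive counting semaphore: a counter protected by a mutex, plus a
// condition variable to sleep on while the counter is zero.
class Semaphore {
    std::mutex m_;
    std::condition_variable cv_;
    unsigned count_;
public:
    explicit Semaphore(unsigned initial) : count_(initial) {}

    void release() {                       // "V" / signal: increment and wake a waiter
        std::lock_guard<std::mutex> lk(m_);
        ++count_;
        cv_.notify_one();
    }

    void acquire() {                       // "P" / wait: block until count_ > 0, then decrement
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return count_ > 0; });
        --count_;
    }
};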
A good starting point for learning OS concepts is probably Andrew Tanenbaum's "Modern Operating Systems". He also has another book on his own OS (Minix), which is called "Operating Systems: Design and Implementation" which goes more into detail about coding. You should be able to find those books in your local library.
Related topics you might want to look up to get a grip on how and why to use semaphores: race conditions, synchronization, multithreading, the producer-consumer problem.
At the ground level, if you want to implement that sort of thing, you're going to need to use assembly language. C and C++ simply don't expose the sort of features necessary to write concurrent code --- except by using libraries, which use assembler in their implementation.
The Minix stuff is pretty good. A simpler example is MicroC/OS. It comes with a textbook that goes into good detail, and all the source is there. It has the basic elements, and the code is small enough that you can understand it in a relatively short period of time.
http://www.micrium.com/products/rtos/kernel/rtos.html
http://en.wikipedia.org/wiki/MicroC/OS-II
Another thing you can do is make a faked-out OS in an application on Linux. I did this by setting up the basic tick with an itimer, then swapping threads around with the function call swapcontext (man 2 swapcontext), which will save the registers on the stack. That gets the ugly stuff out of the way, and you are left to implement the semaphores/mutexes/timers and all that. It was quite fun.
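If you want to play with that approach, here is a bare-bones sketch of just the context-switch part (Linux/glibc, error handling omitted; the itimer-driven preemption described above is left out):

#include <cstdio>
#include <ucontext.h>

static ucontext_t mainCtx, taskCtx;
static char taskStack[64 * 1024];          // stack for the "task" context

static void task() {
    std::printf("inside task context\n");
    // returning here resumes uc_link, i.e. mainCtx
}

int main() {
    getcontext(&taskCtx);                  // initialise with the current state
    taskCtx.uc_stack.ss_sp   = taskStack;  // give the task its own stack
    taskCtx.uc_stack.ss_size = sizeof taskStack;
    taskCtx.uc_link          = &mainCtx;   // where to go when task() returns
    makecontext(&taskCtx, task, 0);

    std::printf("switching to task\n");
    swapcontext(&mainCtx, &taskCtx);       // save the registers here, jump into task()
    std::printf("back in main\n");
    return 0;
}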
Despite what some of the posts say, assembler is not required. A knowledge of it will always help; it never hurts to understand how the internals/compilers/etc. work when you are writing even high-level applications.
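For example, with C++11 atomics (or compiler intrinsics / Boost before that), a minimal test-and-set spinlock needs no assembly at all. This is only a sketch of the idea, not something to use as-is:

#include <atomic>
#include <thread>

// Minimal test-and-set spinlock: no assembly, just std::atomic.
class Spinlock {
    std::atomic<bool> locked_{false};
public:
    void lock() {
        // exchange() returns the previous value; spin until we flip false -> true.
        while (locked_.exchange(true, std::memory_order_acquire))
            std::this_thread::yield();     // be polite while spinning
    }
    void unlock() {
        locked_.store(false, std::memory_order_release);
    }
};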
For basic understanding, you can refer to the book Operating System Concepts by Avi Silberschatz, Peter Baer Galvin, and Greg Gagne, which is really good.
You can also visit Dave Marshall's site for some support. Refer to the Semaphore section there.
Funny, Stevens' book is one of the classic texts for describing synchronisation primitives and their uses. He certainly seems to think they can be used to control inter-process communication. I tend to agree with him. Networking, no; IPC, yes. Most certainly yes.
You can catch up with a lot of IPC (Inter-Process Communication) books that can explain the ins and outs of what you need. There is one classic book: Unix Network Programming: Interprocess Communication by Richard Stevens. You will get all you need. :)

Is there a decent C++ wrapper around Win32's lockless SList out there?

Windows provides a lockless singly linked list, as documented at this page:
Win32 SList
I'm wondering if there's an existing good C++ wrapper around this functionality. When I say good, I mean it exports the usual STL interface as much as possible, supports iterators, etc. I'd much rather use someone else's implementation than sit down to write an STL-type container.
You won't be able to layer an STL style interface on top of SList ever. In order to avoid memory management problems the only node in the list that is accessible is the head of the list. And the only way to access that node is to pop it off the list. This prevents two threads from having the same node and then one thread deleting that node while the other thread is still using it. This is what I mean by "memory management issues" and is a common problem in lock free programming. You could always pop the first node and then follow the "Next" pointers in the SLIST_ENTRY structs but this is an exceedingly bad idea unless you can guarantee that the list will not shrink, with nodes being deallocated, while you are reading it. Of course this still removes the head node from the list.
Basically, you are trying to use SList the wrong way. For what it sounds like you want to do, you just need to use an STL container and protect access to it with a lock. STL algorithms will not work with mutable lock-free data structures like SList.
All that being said, you could create a C++ wrapper around SList, but it wouldn't be STL-compatible.
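To give a feel for what such a non-STL wrapper might look like, here is a rough sketch of a push/pop-only stack over the Win32 calls (the class and member names are made up, and error handling and copy-control are omitted):

#include <windows.h>
#include <malloc.h>    // _aligned_malloc / _aligned_free

// Push/pop-only wrapper over the Win32 lock-free SList. Deliberately NOT an
// STL container: no iterators, no size(), just what SList can do safely.
class IntStack {
    // Entries handed to SList must be aligned to MEMORY_ALLOCATION_ALIGNMENT.
    struct Node {
        SLIST_ENTRY entry;   // first member, so a PSLIST_ENTRY is also a Node*
        int value;
    };

    alignas(MEMORY_ALLOCATION_ALIGNMENT) SLIST_HEADER head_;

public:
    IntStack() { InitializeSListHead(&head_); }

    ~IntStack() {                          // not thread-safe: drain what's left
        int dummy;
        while (tryPop(dummy)) {}
    }

    void push(int value) {
        Node* n = static_cast<Node*>(
            _aligned_malloc(sizeof(Node), MEMORY_ALLOCATION_ALIGNMENT));
        n->value = value;
        InterlockedPushEntrySList(&head_, &n->entry);
    }

    bool tryPop(int& out) {
        PSLIST_ENTRY e = InterlockedPopEntrySList(&head_);
        if (!e) return false;
        Node* n = reinterpret_cast<Node*>(e);   // valid because entry is the first member
        out = n->value;
        _aligned_free(n);
        return true;
    }
};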
It's worth noting that the published interface at the page quoted in the question does not in fact implement a linked list (though that may be the underlying structure) - it implements a stack. So if you want the features of a linked list that classes such as std::list provide, this may not be for you.
Also notice that stacks cannot support iterators (they basically only support push and pop), so much of the talk of supporting iterators and algorithms is wishful thinking.
You could quickly get up and running with boost and ::boost::iterator_facade.
No, it wouldn't be optimal or portable, and iterator semantics are something you should hear Alexandrescu suddenly come out against at DevCon. You are not locking the container, you are locking (and potentially re-locking and unlocking) the operations. And locking the operation means serial execution, very simple. There is plenty of iterator manipulation that would be an unnecessary penalty for the abstraction being created.
From a Mars-eye view, the iterator is hiding the pointer, and hiding it under a semi-OO concept that is as much at odds here as OO-vs-distributed development is. I'd use a 'procedural' interface for sure and make the users/maintainers pay attention to why it is necessary. Lock-free ops are only as good as all the parallel code surrounding them. And as for the classic examples people keep citing, crediting scoped_lock-wrapping reinvention since '96, they produce pretty serial code.
Or use atomics and Sutter's DDJ entries as a reference for the poor man's way forward (and more than 10 years of Pentium Pro-style out-of-order execution later).
(All that is really happening is that Boost and DDJ are chasing a .NET and MS CCR train that is chasing immutability, as well as an Intel train that is chasing a good OO-like abstraction for lock-free development. The problem is that it cannot be done well, and some people fight it time and time again; much like the concurrent_vector nonsense in TBB. It's the same reason exceptions never materialised as non-problematic, especially across environments, and the same reason vector processing in CPUs is underutilised by C++ compilers, and so on..)
I think a thin wrapper should be very easy to write: something like 1-2 pages, possibly all in the .h file. Instead of combing Google, I'd just write it myself.

Which concurrent programming concepts do hiring managers expect developers to understand?

When I hire developers for general mid-to-senior web app development positions, I generally expect them to understand core concurrent programming concepts such as liveness vs. safety, race conditions, thread synchronization and deadlocks. I'm not sure whether to consider topics like fork/join, wait/notify, lock ordering, memory model basics (just the basics) and so forth to be part of what every reasonably seasoned developer ought to know, or whether these are topics that are more for semi-specialists (i.e. developers who have made a conscious decision to know more than the average developer about concurrent programming).
I'd be curious to hear your thoughts.
I tend to think that at this point in time concurrent programming at any serious level of depth is still a specialist skill. Many will claim to know about it through study, but many will also make an almighty mess of it when they come to apply it.
In addition to the considerations listed, I would also look at resource implications and the various overheads of using processes, threads and fibers. In some contexts, e.g. mobile devices, excessive multithreading can have serious performance implications. This can lead to portability issues with multithreaded code.
I guess if I was interviewing a candidate in this situation, I would work with a real world example rather than hitting on more general topics which can be quoted back verbatim from a text book. I say this having done a fair bit of multithreaded work myself and remembering how badly I screwed up the first couple of times. Many can talk the talk... ;)
I know all these topics, but I studied them. I also know many competent senior programmers who don't know them. So unless you expect these programmers to be using those concepts actively, there is no reason to turn down a perfectly good candidate because they don't understand every aspect of concurrency.
The real question is:
In what ways does it matter to the code they will be developing?
You should know which concepts the development position you're hiring for needs to know to be able to work on the projects that they will be responsible for.
As with anything in the programming world, the devil is in the details, and you can't know everything. Would you expect them to know Perl if you were hiring for a Java position?
Also, concurrency, at this stage, while well described in generalized theory, is heavily implementation and platform dependent. Concurrency in Perl on an AIX box is not the same game as concurrency in a C++ Winforms app. They can have all the theory in the world under their belts, but if it's required for the job, then they should have intimate knowledge of the platform they are expected to use it on as well.
I interview folks for concurrency-related positions frequently and I look for three general aspects:
General understanding of core concepts like the ones you list (language-independent)
Specific understanding of Java concurrency libraries and primitives (specific to the work they'd be doing)
Ability to design the solution to a concurrent problem in a reasonable way.
I consider #1 a requirement (for my positions). I consider #2 a nice to have. If they understand it and can describe it in terms of pthreads or whatever other library, it's no biggie to learn the latest Java concurrency libraries (the concepts are the hard part). And #3 tends to separate the hires from the maybe-hires.
Per your question, I wouldn't expect fork/join to be known by almost anyone, especially someone applying for a web app developer position. I would look for developers to have experience with some (but not all) of those topics. Most developers I've interviewed have not used the Java 5+ concurrency libraries at all, but they can typically describe things like data races or deadlock.

Has anyone tried transactional memory for C++?

I was checking out Intel's "whatif" site and their transactional memory compiler (each thread has to make atomic commits or roll back the system's memory, like a database would).
It seems like a promising way to replace locks and mutexes but I can't find many testimonials. Does anyone here have any input?
I have not used Intel's compiler, however, Herb Sutter had some interesting comments on it...
From Sutter Speaks: The Future of Concurrency
Do you see a lot of interest in and usage of transactional memory, or is the concept too difficult for most developers to grasp?
It's not yet possible to answer who's using it because it hasn't been brought to market yet. Intel has a software transactional memory compiler prototype. But if the question is "Is it too hard for developers to use?" the answer is that I certainly hope not. The whole point is it's way easier than locks. It is the only major thing on the research horizon that holds out hope of greatly reducing our use of locks. It will never replace locks completely, but it's our only big hope to replacing them partially.
There are some limitations. In particular, some I/O is inherently not transactional—you can't take an atomic block that prompts the user for his name and read the name from the console, and just automatically abort and retry the block if it conflicts with another transaction; the user can tell the difference if you prompt him twice. Transactional memory is great for stuff that is only touching memory, though.
Every major hardware and software vendor I know of has multiple transactional memory tools in R&D. There are conferences and academic papers on theoretical answers to basic questions. We're not at the Model T stage yet where we can ship it out. You'll probably see early, limited prototypes where you can't do unbounded transactional memory—where you can only read and write, say, 100 memory locations. That's still very useful for enabling more lock-free algorithms, though.
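For a feel of what the programming model looks like in code, GCC later added experimental language-level support (the -fgnu-tm option and __transaction_atomic blocks). The toy sketch below uses that, not the Intel prototype discussed above:

// Toy illustration of the transactional-memory programming model.
// Compile with:  g++ -fgnu-tm -pthread  (GCC's experimental TM support)
#include <iostream>
#include <thread>
#include <vector>

static long counter = 0;

void work() {
    for (int i = 0; i < 100000; ++i) {
        __transaction_atomic {       // the block commits atomically or is retried
            ++counter;
        }
    }
}

int main() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 4; ++i) threads.emplace_back(work);
    for (auto& t : threads) t.join();
    std::cout << counter << '\n';    // 400000 if the transactions behave correctly
    return 0;
}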
Dr. Dobb's had an article on the concept last year: Transactional Programming by Calum Grant -- http://www.ddj.com/cpp/202802978
It includes some examples, comparisons, and conclusions using his example library.
I've built a combinatorial STM library on top of some functional programming ideas. It doesn't require any compiler support (other than C++17) and doesn't introduce new syntax. In general, it adopts the interface of the STM library from Haskell.
So, my library has several nice properties:
Monadically combinatorial. Every transaction is a computation inside the custom monad named STML. You can combine monadic transactions into bigger monadic transactions.
Transactions are separated from the data model. You construct your concurrent data model with transactional variables (TVars) and run transactions over it.
There is a retry combinator. It allows you to rerun the transaction, which is very useful for building short and understandable transactions.
There are different monadic combinators to express computations concisely.
There is a Context. Every computation is run in some context rather than in a global runtime, so you can have many different contexts if you need several independent STM clusters.
The implementation is conceptually quite simple. At least, the reference implementation in Haskell is, but I had to reinvent several approaches for the C++ implementation due to the lack of good support for functional programming.
The library shows very nice stability and robustness, even if we consider it experimental. Moreover, my approach opens up a lot of possibilities to improve the library's performance, features, comprehensiveness, etc.
To demonstrate how it works, I've solved the Dining Philosophers problem. You can find the code in the links below. Sample transaction:
STML<bool> takeFork(const TVar<Fork>& tFork)
{
    STML<bool> alreadyTaken = withTVar(tFork, isForkTaken);    // read: is the fork already taken?
    STML<Unit> takenByUs    = modifyTVar(tFork, setForkTaken); // write: mark the fork as taken
    STML<bool> success      = sequence(takenByUs, pure(true)); // take it, then yield true
    STML<bool> fail         = pure(false);
    STML<bool> result       = ifThenElse(alreadyTaken, fail, success);
    return result;
}
UPDATE
I've written a tutorial; you can find it here.
Dining Philosophers task
My C++ STM library
Sun Microsystems have announced that they're releasing a new processor next year, codenamed Rock, that has hardware support for transactional memory. It will have some limitations, but it's a good first step that should make it easier for programmers to replace locks/mutexes with transactions and expect good performance out of it.
For an interesting talk on the subject, given by Mark Moir, one of the researchers at Sun working on Transactional Memory and Rock, check out this link.
For more information and announcements from Sun about Rock and Transactional Memory in general, this link.
The obligatory wikipedia entry :)
Finally, this link, at the University of Wisconsin-Madison, contains a bibliography of most of the research that has been and is being done about Transactional Memory, whether it's hardware related or software related.
In some cases I can see this as being useful and even necessary.
However, even if the processor has special instructions that make this process easier, there is still a large overhead compared to a mutex or semaphore. Depending on how it's implemented, it may also impact realtime performance (you have to either stop interrupts or prevent them from writing into your shared areas).
My expectation is that if this was implemented, it would only be needed for portions of a given memory space, though, and so the impact could be limited.
-Adam