Has anyone tried transactional memory for C++? - c++

I was checking out Intel's "whatif" site and their Transactional Memory compiler (each thread either commits its changes to memory atomically or rolls them back, much as a database would).
It seems like a promising way to replace locks and mutexes but I can't find many testimonials. Does anyone here have any input?

I have not used Intel's compiler, however, Herb Sutter had some interesting comments on it...
From Sutter Speaks: The Future of Concurrency
Do you see a lot of interest in and usage of transactional memory, or is the concept too difficult for most developers to grasp?
It's not yet possible to answer who's using it because it hasn't been brought to market yet. Intel has a software transactional memory compiler prototype. But if the question is "Is it too hard for developers to use?" the answer is that I certainly hope not. The whole point is it's way easier than locks. It is the only major thing on the research horizon that holds out hope of greatly reducing our use of locks. It will never replace locks completely, but it's our only big hope to replacing them partially.
There are some limitations. In particular, some I/O is inherently not transactional—you can't take an atomic block that prompts the user for his name and read the name from the console, and just automatically abort and retry the block if it conflicts with another transaction; the user can tell the difference if you prompt him twice. Transactional memory is great for stuff that is only touching memory, though.
Every major hardware and software vendor I know of has multiple transactional memory tools in R&D. There are conferences and academic papers on theoretical answers to basic questions. We're not at the Model T stage yet where we can ship it out. You'll probably see early, limited prototypes where you can't do unbounded transactional memory—where you can only read and write, say, 100 memory locations. That's still very useful for enabling more lock-free algorithms, though.
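To make the programming model concrete, an atomic block in a transactional-memory-enabled compiler looks roughly like the sketch below. This uses the __transaction_atomic keyword from GCC's experimental -fgnu-tm support rather than Intel's prototype, so treat the exact syntax as illustrative only:
// Illustrative only: compiled with g++ -fgnu-tm; Intel's prototype uses its own annotations.
static long balance = 0;
void deposit(long amount)
{
    __transaction_atomic {          // all reads/writes inside either commit atomically
        balance += amount;          // or the whole block is rolled back and retried
    }
}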

Dr. Dobb's had an article on the concept last year: Transactional Programming by Calum Grant -- http://www.ddj.com/cpp/202802978
It includes some examples, comparisons, and conclusions using his example library.

I've built a combinatorial STM library on top of some functional programming ideas. It doesn't require any compiler support (though it does use C++17) and doesn't introduce any new syntax. In general, it adopts the interface of the STM library from Haskell.
So, my library has several nice properties:
Monadically combinatorial. Every transaction is a computation inside a custom monad named STML. You can combine monadic transactions into bigger monadic transactions.
Transactions are separated from the data model. You construct your concurrent data model with transactional variables (TVars) and run transactions over it.
There is a retry combinator. It allows you to rerun the transaction. Very useful for building short and understandable transactions.
There are different monadic combinators to express computations concisely.
There is a Context. Every computation is run in some context rather than in a global runtime, so you can have many different contexts if you need several independent STM clusters.
The implementation is conceptually quite simple. At least, the reference implementation in Haskell is; for the C++ implementation I had to reinvent several approaches due to the language's weaker support for functional programming.
The library shows very good stability and robustness, even if we consider it experimental. Moreover, my approach opens up a lot of possibilities to improve the library's performance, features, comprehensiveness, etc.
To demonstrate how it works, I've solved the Dining Philosophers problem. You can find the code in the links below. Sample transaction:
STML<bool> takeFork(const TVar<Fork>& tFork)
{
    // Check whether the fork is already taken.
    STML<bool> alreadyTaken = withTVar(tFork, isForkTaken);
    // Transaction that marks the fork as taken by us.
    STML<Unit> takenByUs = modifyTVar(tFork, setForkTaken);
    // Take the fork, then yield true.
    STML<bool> success = sequence(takenByUs, pure(true));
    // Yield false without touching the fork.
    STML<bool> fail = pure(false);
    // If the fork is already taken, fail; otherwise take it.
    STML<bool> result = ifThenElse(alreadyTaken, fail, success);
    return result;
}
UPDATE
I've written a tutorial; you can find it here.
Dining Philosophers task
My C++ STM library

Sun Microsystems have announced that they're releasing a new processor next year, codenamed Rock, that has hardware support for transactional memory. It will have some limitations, but it's a good first step that should make it easier for programmers to replace locks/mutexes with transactions and expect good performance out of it.
For an interesting talk on the subject, given by Mark Moir, one of the researchers at Sun working on Transactional Memory and Rock, check out this link.
For more information and announcements from Sun about Rock and Transactional Memory in general, see this link.
The obligatory wikipedia entry :)
Finally, this link, at the University of Wisconsin-Madison, contains a bibliography of most of the research that has been and is being done about Transactional Memory, whether it's hardware related or software related.

In some cases I can see this as being useful and even necessary.
However, even if the processor has special instructions that make this process easier there is still a large overhead compared to a mutex or semaphore. Depending on how it's implemented it may also impact realtime performance (have to either stop interrupts, or prevent them from writing into your shared areas).
My expectation is that, if this were implemented, it would only be needed for portions of a given memory space, so the impact could be limited.
-Adam

Related

Tools to experiment with weakly ordered concurrency

What tools exist to help one to experiment with weakly ordered concurrency? That is, in what sandbox can one play while teaching oneself about partial fences, weak atomics, acquire/consume/release semantics, lock-free algorithms and the like?
The tool or sandbox one wants would exercise and stress one's weakly ordered, threaded algorithm, exposing the various ways in which the algorithm might theoretically fail. Physically running on an x86, for example, the tool would nevertheless be able to expose ARM-type failures.
An open-source tool would be preferable. Please advise.
References:
the C++11 draft standard (PDF, see clauses 1, 29 and 30);
Hans-J. Boehm's overview of the subject;
McKenney, Boehm and Crowl on the subject;
GCC's developmental notes on the subject;
the Linux kernel's notes on the subject;
a related question with answers here on Stackoverflow
another question, this one comparing fences against atomics;
Cppmem (on the advice of @KerrekSB);
Cppmem's help page;
Spin (a tool for analyzing the logical consistency of concurrent systems, on the advice of @JohnZwinck).
(The references are oriented toward C++11 because this is how I happen to have approached the subject. However, for all I know, a non-C++ answer might be best, so feel free to extend your answer beyond C++ as you see fit.)
This is quite a bit more general than what your question directly asks, but take a look at "Spin," a "model checker" for concurrent systems. An online manual is here: http://spinroot.com/spin/Man/Manual.html
You will probably find it to be a bit "old school" in feel, but I see no reason why it wouldn't be suitable for the jobs you're interested in. Since it is quite general, however, you may need to do a bit of work to teach the tool about the problem space. The good news is that it is platform-independent. The bad news is you'd probably need to model each computer architecture explicitly (Spin doesn't intrinsically know about the guarantees of ARM vs. x86, for example). But maybe some of that work has been done elsewhere (I didn't check), and/or you could share pieces of what you do so others may benefit. The tool is open-source, after all.
You might be interested in having a look at http://www.cprover.org/wmm/ and following the links there to tools and corresponding papers about weak memory; in particular, the CAV 2013 paper Partial Orders for Efficient BMC of Concurrent Software and the CAV 2014 paper Don't Sit on the Fence: A Static Analysis Approach to Automatic Fence Insertion might be good starting points. You will also find lots of real-world example code and benchmarks there.
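For concreteness, the kind of litmus test these tools are built to stress is the classic message-passing idiom; here is a minimal C++11 sketch of my own (not taken from those benchmark suites). The assertion only holds because of the release/acquire pairing - weaken either ordering to relaxed and a weak-memory tool should be able to show you a failing execution, even though an x86 machine will probably never produce one.
#include <atomic>
#include <cassert>
#include <thread>
std::atomic<int>  payload{0};
std::atomic<bool> ready{false};
void producer() {
    payload.store(42, std::memory_order_relaxed);
    ready.store(true, std::memory_order_release);    // publishes the payload
}
void consumer() {
    while (!ready.load(std::memory_order_acquire))   // synchronizes-with the release store
        ;                                            // spin until published
    assert(payload.load(std::memory_order_relaxed) == 42);
}
int main() {
    std::thread t1(producer), t2(consumer);
    t1.join();
    t2.join();
}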

When knowledge of "multithreading" is specified in a C++ job description, what is the expectation?

I understand it should cover threading primitives (mutex, semaphore, condition variables, etc.) plus design patterns (such as those specified in POSA2). But what more? Every project has its own multithreading scenarios, and one may not have dealt with the ones a particular job expects. So how does one build up this knowledge and prove that they have the ability?
Regardless of the specifics, solid, detailed and very deep knowledge is required. One should understand how bottlenecks form, how to deal with scalability problems, and how to diagnose cases where synchronization is required but has been erroneously omitted.
If for example you had a job experience with multithreading and I ran an interview to assess you I'd ask detailed questions on typical scenarios that arise when developing multithreaded programs. I wouldn't expect you knew many technologies or some specific technology, but I'd expect you to have mastered the technology you claim you're familiar with in great detail and to understand which fundamental problems it solves and how.
I would expect the candidate has knowledge and experience of the issues that arise when multiple threads access shared resources. What problems can be caused by concurrent access and what problems the solutions (such as locking etc) present.
At the very least, an understanding of how to write and read async code on the platform of choice.
After this comes understanding the specifics of the platform - e.g. how to access the primary window in a Windows system while many things need to update the display at the same time.
Fundamentally, it is about understanding what trade-offs are needed and when.
May I present a different view. I think you should understand the basics, but never give up on a job based on a flyer description. I have not met a programming concept that could not be figured out in half a day. So, basically: read a tutorial before the interview, do not try to misrepresent your actual experience with threading, but make sure they know the things you have more hands-on experience with, and see if there is a mutual interest in you working for the company. They may like you even if you know nothing about threading, if they are confident that you can pick it up at full speed.

Is Communicating Sequential Processes ever used in large multi threaded C++ programs?

I'm currently writing a large multi threaded C++ program (> 50K LOC).
As such I've been motivated to read up a lot on various techniques for handling multi-threaded code. One theory I've found to be quite cool is:
http://en.wikipedia.org/wiki/Communicating_sequential_processes
And it's invented by a slightly famous guy, who's made other non-trivial contributions to concurrent programming.
However, is CSP used in practice? Can anyone point to any large application written in a CSP style?
Thanks!
CSP, as a process calculus, is fundamentally a theoretical thing that enables us to formalize and study some aspects of a parallel program.
If you instead want a theory that enables you to build distributed programs, then you should take a look at parallel structured programming.
Parallel structured programming is the basis of current HPC (high-performance computing) research and provides a methodology for approaching and designing parallel programs (essentially, flowcharts of communicating computing nodes), as well as runtime systems to implement them.
A central idea in parallel structured programming is that of the algorithmic skeleton, developed initially by Murray Cole. A skeleton is something like a parallel design pattern with an associated cost model and (usually) a runtime system that supports it. A skeleton models, studies, and supports a class of parallel algorithms that have a certain "shape".
As a notable example, mapreduce (made popular by Google) is just a kind of skeleton that addresses data parallelism, where a computation can be described by a map phase (apply a function f to all elements of the input data) and a reduce phase (take all the transformed items and "combine" them using an associative operator +).
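As a rough illustration of that map+reduce "shape" in plain C++ - a minimal sketch of mine using the C++17 parallel algorithms rather than a dedicated skeleton library:
#include <execution>
#include <numeric>
#include <vector>
// Compile with -std=c++17; with GCC/libstdc++ the parallel policies also need -ltbb.
int main()
{
    std::vector<int> input(1'000'000, 2);
    // map: square each element; reduce: combine with the associative operator +
    long long sum = std::transform_reduce(
        std::execution::par, input.begin(), input.end(),
        0LL, std::plus<>{},
        [](int x) { return static_cast<long long>(x) * x; });
    return sum == 4'000'000 ? 0 : 1;
}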
I found the idea of parallel structured programming both theoretically sound and practically useful, so I suggest giving it a look.
A word about multi-threading: since skeletons address massive parallelism, they are usually implemented on distributed memory rather than shared memory. Intel has developed a library, TBB (Threading Building Blocks), which addresses multi-threading and (partially) follows the parallel structured programming approach. It is a C++ library, so you can probably just start using it in your projects.
Yes and no. The basic idea of CSP is used quite a bit. For example, thread-safe queues in one form or another are frequently used as the primary (often only) communication mechanism to build a pipeline out of individual processes (threads).
Hoare being Hoare, however, there's quite a bit more to his original theory than that. He invented a notation for talking about the processes, defined a specific set of signals that can be sent between the processes, and so on. The notation has since been refined in various ways, quite a bit of work has been put into proving various aspects, and so on.
Application of that relatively formal model of CSP (as opposed to just the general idea) is much less common. It's been used in a few systems where high reliability was considered extremely important, but few programmers appear interested in learning (yet another) formal design notation.
When I've designed systems like this, I've generally used an approach that's less rigorous, but (at least to me) rather easier to understand: a fairly simple diagram, with boxes representing the processes, and arrows representing the lines of communication. I doubt I could really offer much in the way of a proof about most of the designs (and I'll admit I haven't designed a really huge system this way), but it's worked reasonably well nonetheless.
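For the informal version of the idea, the thread-safe queue that serves as the only communication channel between stages can be as small as the sketch below (the names are my own, not from any particular library); each stage just loops on receive() and send()s downstream.
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
template <typename T>
class Channel {                        // one CSP-style channel between two stages
public:
    void send(T value) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(value));
        }
        cv_.notify_one();
    }
    void close() {                     // signal that no more values will arrive
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }
    std::optional<T> receive() {       // blocks; empty optional means "channel closed"
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return !q_.empty() || closed_; });
        if (q_.empty()) return std::nullopt;
        T value = std::move(q_.front());
        q_.pop();
        return value;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<T> q_;
    bool closed_ = false;
};
A producing stage calls ch.send(item) and finally ch.close(); the consuming stage runs while (auto v = ch.receive()) { /* process *v */ }.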
Take a look at the website for a company called Verum. Their ASD technology is based on CSP and is used by companies like Philips Healthcare, Ericsson and NXP semiconductors to build software for all kinds of high-tech equipment and applications.
So to answer your question: Yes, CSP is used on large software projects in real-life.
Full disclosure: I do freelance work for Verum
Answering a very old question, yet it seems important to add one more point: there is Go, where CSP is a fundamental part of the language. In the Go FAQ, the authors write:
Concurrency and multi-threaded programming have a reputation for difficulty. We believe this is due partly to complex designs such as pthreads and partly to overemphasis on low-level details such as mutexes, condition variables, and memory barriers. Higher-level interfaces enable much simpler code, even if there are still mutexes and such under the covers.
One of the most successful models for providing high-level linguistic support for concurrency comes from Hoare's Communicating Sequential Processes, or CSP. Occam and Erlang are two well known languages that stem from CSP. Go's concurrency primitives derive from a different part of the family tree whose main contribution is the powerful notion of channels as first class objects. Experience with several earlier languages has shown that the CSP model fits well into a procedural language framework.
Projects implemented in Go are:
Docker
Google's download server
Many more
This style is ubiquitous on Unix, where many tools are designed to process from standard in to standard out. I don't have any first-hand knowledge of large systems built that way, but I've seen many small one-off systems that are.
For instance, this simple command line uses (at least) 3 processes:
cat list-1 list-2 list-3 | sort | uniq > final.list
The next example is only moderately sized: I wrote a protocol processor that strips away and interprets successive layers of protocol in a message, and it used a style very similar to this. It was an event-driven system using something akin to cooperative threading, but I could have used multithreading fairly easily with a couple of added tweaks.
The program is proprietary (unfortunately) so I can't show off the source code.
In my opinion, this style is useful for some things, but usually best mixed with some other techniques. Often there is a core part of your program that represents a processing bottleneck, and applying various concurrency increasing techniques there is likely to yield the biggest gains.
Microsoft had a technology called ActiveMovie (if I remember correctly) that did sequential processing on audio and video streams. Data got passed from one filter to another to go from input to output format (and source/sink). Maybe that's a practical example??
The Wikipedia article looks to me like a lot of funny symbols used to represent somewhat pedestrian concepts. For very large or extensible programs, the formalism can be very important to check how the (sub)processes are allowed to interact.
For a program in the 50,000-line class, you're probably better off architecting it as you see fit.
In general, following ideas such as these is a good idea in terms of performance. Persistent threads that process data in stages will tend not to contend, and exploit data locality well. Also, it is easy to throttle the threads to avoid data piling up as a fast stage feeds a slow stage: just block the fast one if its output buffer grows too big.
A little bit off-topic, but for my thesis I used a tool framework called TERRA/LUNA, which aims at software development for embedded control systems but is used heavily for all sorts of software development at my institute (so only academic use here).
TERRA is a graphical CSP and software architecture editor and LUNA is both the name for a C++ library for CSP based constructs and the plugin you'll find in TERRA to generate C++ code from your CSP models.
It becomes very handy in combination with FDR3 (a CSP refinement checker) for detecting any sort of (dead/live/etc.) lock, or even for profiling.

can one make concurrent scalable reliable programs in C as in erlang?

A theoretical question: after reading Armstrong's 'Programming Erlang' book, I was wondering the following:
It will take some time to learn Erlang. Let alone master it. It really is fundamentally different in a lot of respects.
So my question: is it possible to write 'like Erlang' in C/C++, or with some 'Erlang-like' framework, such that, provided you take care not to create functions with side effects, you can build apps as scalable and reliable as in Erlang? Perhaps with the same message-passing, loads-of-'mini-processes' paradigm.
The advantage would be to not throw all your accumulated C/C++ knowledge over the fence.
Any thoughts about this would be welcome
Yes, it is possible, but...
Probably the best answer for this question is given by Robert Virding’s First Rule:
“Any sufficiently complicated concurrent program in another language contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Erlang.”
A very good rule is to use the right tool for the task. Erlang excels at concurrency and reliability; C/C++ was not designed with these properties in mind.
If you don't want to throw away your C/C++ knowledge and experience, and your project allows this kind of division, a good approach is to create a mixed solution: write the concurrency, communication and error-handling code in Erlang, then add C/C++ parts, which do the CPU- and IO-bound work.
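To give a flavour of the C/C++ side of such a mixed solution: if the Erlang side opens the port with open_port({spawn, "./worker"}, [{packet, 2}, binary]), the C++ worker only has to read and write length-prefixed messages on stdin/stdout, roughly as in this sketch (what it does with each message is up to you):
#include <cstdint>
#include <cstdio>
#include <vector>
// Read one {packet, 2} framed message from stdin (2-byte big-endian length prefix).
static bool read_packet(std::vector<unsigned char>& buf) {
    unsigned char hdr[2];
    if (std::fread(hdr, 1, 2, stdin) != 2) return false;   // port closed by Erlang
    std::size_t len = (hdr[0] << 8) | hdr[1];
    buf.resize(len);
    if (len == 0) return true;
    return std::fread(buf.data(), 1, len, stdin) == len;
}
// Write one framed message back; assumes it fits in the 64 KiB {packet, 2} limit.
static void write_packet(const std::vector<unsigned char>& buf) {
    unsigned char hdr[2] = { static_cast<unsigned char>(buf.size() >> 8),
                             static_cast<unsigned char>(buf.size() & 0xFF) };
    std::fwrite(hdr, 1, 2, stdout);
    std::fwrite(buf.data(), 1, buf.size(), stdout);
    std::fflush(stdout);
}
int main() {
    std::vector<unsigned char> msg;
    while (read_packet(msg)) {
        // ... CPU- or IO-bound work on msg goes here ...
        write_packet(msg);             // reply to the Erlang side
    }
    return 0;
}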
You clearly can - the Erlang/OTP system is largely written in C (and Erlang). The question is 'why would you want to?'
In 'ye olde days' people used to write their own operating system - but why would you want to?
If you elect to use an operating system your unwritten software has certain properties - it can persist to hard disk, it can speak to a network, it can draw on screens, it can run from the command line, it can be invoked in batch mode, etc, etc...
The Erlang/OTP system is 1.5M lines of code which has been demonstrated to give 99.9999999% uptime in large systems (the UK phone system) - that's 31ms downtime a year.
With Erlang/OTP your unwritten software has high reliability, it can hot-swap itself, your unwritten application can failover when a physical computer dies.
Why would you want to rewrite that functionality?
I would break this into 2 questions
Can you write concurrent, scalable C++ applications
Yes. It's certainly possible to create the low level constructs needed in order to achieve this.
Would you want to write concurrent, scalable, C++ applications
Perhaps. But if I were going for a highly concurrent application, I would choose a language that was either designed to fill that void or easily lends itself to doing so (Erlang, F# and possibly C#).
C++ was not designed to build highly concurrent applications. But it can certainly be tweaked into doing so. The cost might be higher than you expect though once you factor in memory management.
Yes, but you will be doing some extra work.
Regarding side effects, consider how the .NET/PLINQ team is approaching it. PLINQ won't be able to enforce that you hand it code with no side effects, but it will assume you do and play by its rules, so we get to use a simpler API. Even if the language doesn't have built-in support for it, it will still simplify things, as you can break up the operations more easily.
What I can do in one Turing complete language I can do in any other Turing complete language.
So I interpret your question to read, is it as easy to write a reliable and scalable application in C++ as it is in Erlang?
The answer to that is highly subjective. For me it is easier to write it in C++ for the following reasons:
I have already done it in C++ (at least three times).
I don't know Erlang.
I have read a great deal about Stackless Python, which feels to me like a highly concurrent message based cooperative multitasking system in python, but of course python is written on top of C.
Having said that. If you already know both languages, and you have the problem well defined, you can then make the best choice based on all the information you have at hand.
The main 'problem' with C (or C++) for writing reliable and easy-to-extend programs is that in C you can do anything. So the first step would be to write a simple framework that restricts things just a bit. Most good programmers do that anyway.
In this case, the restrictions would mostly be there to make it easy to define a 'process' within whatever level of isolation you want. fork() has a reputation for being slow, and threads also take significant time to spawn, so you might want cooperative multitasking, which can be far more efficient; you could even make it preemptive (I think that's what Erlang does). To get multi-core efficiency, set up a pool of threads and make all of them compete to run the tasks.
Another important part would be to create an appropriate library of immutable data structures, so that by using them (instead of the standard library) your functions would be (mostly) side-effect-free.
Then it's just a matter of designing a good API for message passing and futures... not easy, but at least it doesn't require changing the language itself.
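To make the 'mini process' idea slightly more concrete, here is a minimal sketch of mine (the names are hypothetical, not an existing framework): each process owns a mailbox and a thread, the rest of the program only talks to it by posting messages, and futures carry the replies.
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>
// A "mini process": a thread that only ever touches its own state,
// driven by messages posted to its mailbox.
class Process {
public:
    Process() : worker_([this] { run(); }) {}
    ~Process() {
        post([this] { done_ = true; });   // poison message, handled on the worker thread
        worker_.join();
    }
    // Send a task to the process; the returned future carries the reply.
    template <typename F>
    auto call(F f) -> std::future<decltype(f())> {
        auto task = std::make_shared<std::packaged_task<decltype(f())()>>(std::move(f));
        auto result = task->get_future();
        post([task] { (*task)(); });
        return result;
    }
private:
    void post(std::function<void()> msg) {
        {
            std::lock_guard<std::mutex> lock(m_);
            mailbox_.push(std::move(msg));
        }
        cv_.notify_one();
    }
    void run() {
        while (!done_) {
            std::unique_lock<std::mutex> lock(m_);
            cv_.wait(lock, [this] { return !mailbox_.empty(); });
            auto msg = std::move(mailbox_.front());
            mailbox_.pop();
            lock.unlock();
            msg();                        // all state changes happen on this thread
        }
    }
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> mailbox_;
    bool done_ = false;
    std::thread worker_;
};
Usage would look like auto answer = proc.call([]{ return do_work(); }); answer.get(); - since all the real work happens on the process's own thread, its internal state needs no further locking.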

Which concurrent programming concepts do hiring managers expect developers to understand?

When I hire developers for general mid-to-senior web app development positions, I generally expect them to understand core concurrent programming concepts such as liveness vs. safety, race conditions, thread synchronization and deadlocks. I'm not sure whether to consider topics like fork/join, wait/notify, lock ordering, memory model basics (just the basics) and so forth to be part of what every reasonably seasoned developer ought to know, or whether these are topics that are more for semi-specialists (i.e. developers who have made a conscious decision to know more than the average developer about concurrent programming).
I'd be curious to hear your thoughts.
I tend to think that at this point in time concurrent programming at any serious level of depth is still a specialist skill. Many will claim to know about it through study, but many will also make an almighty mess of it when they come to apply it.
In addition to the considerations listed, I would also look at resource implications and the various overheads of using processes, threads and fibers. In some contexts, e.g. mobile devices, excessive multithreading can have serious performance implications. This can lead to portability issues with multithreaded code.
I guess if I was interviewing a candidate in this situation, I would work with a real world example rather than hitting on more general topics which can be quoted back verbatim from a text book. I say this having done a fair bit of multithreaded work myself and remembering how badly I screwed up the first couple of times. Many can talk the talk... ;)
I know all these topics, but I studied them. I also know many competent senior programmers who don't. So unless you expect these programmers to be actively using those concepts, there is no reason to turn down a perfectly good candidate just because they don't understand every aspect of concurrency.
The real question is:
In what ways does it matter to the code they will be developing?
You should know which concepts the development position you're hiring for needs to know to be able to work on the projects that they will be responsible for.
As with anything in the programming world.. The devil is in the details, and you can't know everything. Would you expect them to know Perl if you were hiring for a Java position?
Also, concurrency, at this stage, while well described in generalized theory, is heavily implementation and platform dependent. Concurrency in Perl on an AIX box is not the same game as concurrency in a C++ Winforms app. They can have all the theory in the world under their belts, but if it's required for the job, then they should have intimate knowledge of the platform they are expected to use it on as well.
I interview folks for concurrency-related positions frequently and I look for three general aspects:
General understanding of core concepts like the ones you list (language-independent)
Specific understanding of Java concurrency libraries and primitives (specific to the work they'd be doing)
Ability to design the solution to a concurrent problem in a reasonable way.
I consider #1 a requirement (for my positions). I consider #2 a nice to have. If they understand it and can describe it in terms of pthreads or whatever other library, it's no biggie to learn the latest Java concurrency libraries (the concepts are the hard part). And #3 tends to separate the hires from the maybe-hires.
Per your question, I wouldn't expect fork/join to be known by almost anyone, especially someone applying for a web app developer position. I would look for developers to have experience with some (but not all) of those topics. Most developers I've interviewed have not used the Java 5+ concurrency libs at all, but they can typically describe things like a data race or a deadlock.
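(For calibration, the kind of deadlock description I mean is the classic lock-ordering bug sketched below - written in C++ for brevity, though the same shape exists with Java's intrinsic locks; the names are hypothetical.)
#include <mutex>
struct Account {
    std::mutex m;
    long balance = 0;
};
// Deadlock-prone: thread 1 calls transfer_buggy(a, b, x) while thread 2 calls
// transfer_buggy(b, a, y); each holds one lock and waits forever for the other.
void transfer_buggy(Account& from, Account& to, long amount) {
    std::lock_guard<std::mutex> l1(from.m);
    std::lock_guard<std::mutex> l2(to.m);
    from.balance -= amount;
    to.balance   += amount;
}
// Fix: acquire both locks together / in one consistent global order.
void transfer_fixed(Account& from, Account& to, long amount) {
    std::scoped_lock lock(from.m, to.m);   // C++17; std::lock() works pre-C++17
    from.balance -= amount;
    to.balance   += amount;
}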