Tools to experiment with weakly ordered concurrency

What tools exist to help one to experiment with weakly ordered concurrency? That is, in what sandbox can one play while teaching oneself about partial fences, weak atomics, acquire/consume/release semantics, lock-free algorithms and the like?
The tool or sandbox one wants would exercise and stress one's weakly ordered, threaded algorithm, exposing the various ways in which the algorithm might theoretically fail. Physically running on an x86, for example, the tool would nevertheless be able to expose ARM-type failures.
An open-source tool would be preferable. Please advise.
References:
the C++11 draft standard (PDF, see clauses 1, 29 and 30);
Hans-J. Boehm's overview of the subject;
McKenney, Boehm and Crowl on the subject;
GCC's developmental notes on the subject;
the Linux kernel's notes on the subject;
a related question with answers here on Stack Overflow;
another question, this one comparing fences against atomics;
Cppmem (on the advice of @KerrekSB);
Cppmem's help page;
Spin (a tool for analyzing the logical consistency of concurrent systems, on the advice of @JohnZwinck).
(The references are oriented toward C++11 because this is how I happen to have approached the subject. However, for all I know, a non-C++ answer might be best, so feel free to extend your answer beyond C++ as you see fit.)
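For concreteness, the kind of test case such a tool needs to stress is a tiny message-passing litmus test like the one below (my own sketch in C++11; the names are arbitrary). Run natively on x86 the assert essentially never fires, yet with the relaxed orderings shown it is allowed to fire on ARM-like machines, and that gap is exactly what the tool should expose:

#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> data{0};
std::atomic<bool> ready{false};

void producer() {
    data.store(42, std::memory_order_relaxed);
    // With memory_order_relaxed here, a weakly ordered machine may let the
    // consumer observe ready == true while data is still 0. Changing this
    // store to memory_order_release and the load below to
    // memory_order_acquire rules that outcome out.
    ready.store(true, std::memory_order_relaxed);
}

void consumer() {
    while (!ready.load(std::memory_order_relaxed)) { }
    // Allowed to fire under the relaxed version on weak hardware, yet it will
    // almost never fire when run natively on x86 -- hence the need for a tool
    // that explores the memory model rather than the host machine.
    assert(data.load(std::memory_order_relaxed) == 42);
}

int main() {
    std::thread t1(producer), t2(consumer);
    t1.join();
    t2.join();
}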

This is quite a bit more general than what your question directly asks, but take a look at "Spin," a "model checker" for concurrent systems. An online manual is here: http://spinroot.com/spin/Man/Manual.html
You will probably find it to be a bit "old school" in feel, but I see no reason why it wouldn't be suitable for the jobs you're interested in. Since it is quite general, however, you may need to do a bit of work to teach the tool about the problem space. The good news is that it is platform-independent. The bad news is you'd probably need to model each computer architecture explicitly (Spin doesn't intrinsically know about the guarantees of ARM vs. x86, for example). But maybe some of that work has been done elsewhere (I didn't check), and/or you could share pieces of what you do so others may benefit. The tool is open-source, after all.

You might be interested in having a look at http://www.cprover.org/wmm/ and following the links there to tools and corresponding papers about weak memory. In particular, the CAV 2013 paper "Partial Orders for Efficient BMC of Concurrent Software" and the CAV 2014 paper "Don't Sit on the Fence: A Static Analysis Approach to Automatic Fence Insertion" might be good starting points. You will also find lots of real-world example code and benchmarks there.
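As a taste of what those fence-insertion tools reason about, here is a store-buffering (Dekker-style) litmus test, sketched by me rather than taken from the papers; the commented-out fences are the sort of repair such a tool would propose:

#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> x{0}, y{0};
int r1, r2;

void thread1() {
    x.store(1, std::memory_order_relaxed);
    // An automatic fence-insertion tool would place a full fence here:
    // std::atomic_thread_fence(std::memory_order_seq_cst);
    r1 = y.load(std::memory_order_relaxed);
}

void thread2() {
    y.store(1, std::memory_order_relaxed);
    // ...and here, to forbid the r1 == 0 && r2 == 0 outcome.
    // std::atomic_thread_fence(std::memory_order_seq_cst);
    r2 = x.load(std::memory_order_relaxed);
}

int main() {
    std::thread a(thread1), b(thread2);
    a.join();
    b.join();
    std::printf("r1=%d r2=%d\n", r1, r2);  // r1=0 r2=0 stays possible until the fences go in
}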

Related

Why is C++ template use not recommended in a space/radiated environment?

By reading this question, I understood, for instance, why dynamic allocation or exceptions are not recommended in environments where radiation is high, like in space or in a nuclear power plant.
Concerning templates, I don't see why. Could you explain it to me?
This answer, for instance, says that they are quite safe to use.
Note: I'm not talking about complex standard library stuff, but purpose-made custom templates.
Notice that space-compatible (radiation-hardened, aeronautics-compliant) computing devices are very expensive, including to launch into space because of their mass, and that a single space mission costs perhaps a hundred million € or US$. Losing the mission because of software or computer concerns generally has a prohibitive cost, so it is unacceptable, and it justifies costly development methods and procedures that you would not even dream of using for your mobile phone applet; using probabilistic reasoning and engineering approaches is recommended, since cosmic rays are still a somewhat "unusual" event. From a high-level point of view, a cosmic ray and the bit flip it produces can be considered as noise in some abstract form of signal or input. You could look at that "random bit-flip" problem as a signal-to-noise-ratio problem; randomized algorithms may then provide a useful conceptual framework (notably at the meta level, that is, when analyzing your safety-critical source code or compiled binary, but also, at critical-system run time, in some sophisticated kernel or thread scheduler), together with an information-theory viewpoint.
Why is C++ template use not recommended in a space/radiated environment?
That recommendation is a generalization, to C++, of MISRA C coding rules, of Embedded C++ rules, and of DO-178C recommendations, and it is not related to radiation but to embedded systems. Because of radiation and vibration constraints, the embedded hardware of any space rocket computer has to be very small (for economical and energy-consumption reasons, it is, in computing power, more like a Raspberry Pi than a big x86 server system). Space-hardened chips cost 1000x as much as their civilian counterparts. And computing the WCET on space-embedded computers is still a technical challenge (e.g. because of CPU-cache-related issues). Hence, heap allocation is frowned upon in safety-critical embedded software-intensive systems (how would you handle out-of-memory conditions there? Or how would you prove that you have enough RAM for all real run-time cases?).
Remember that in the safety-critical software world, you not only somehow "guarantee" or "promise", and certainly assess (often with some clever probabilistic reasoning), the quality of your own software, but also that of all the software tools used to build it (in particular your compiler and your linker; Boeing or Airbus won't change the version of the GCC cross-compiler used to compile their flight control software without prior written approval from, e.g., the FAA or the DGAC). Most of your software tools need to be somehow approved or certified.
Be aware that, in practice, most C++ templates (but certainly not all) internally use the heap, and standard C++ containers certainly do. Writing templates which never use the heap is a difficult exercise. If you are capable of that, you can use templates safely (assuming you trust your C++ compiler and its template expansion machinery, which is the trickiest part of the C++ front end of most recent C++ compilers, such as GCC or Clang).
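For illustration only (this is my sketch, not a pattern mandated by any standard), a template that never touches the heap can be a fixed-capacity container whose storage lives inside the object and which reports failure instead of allocating:

#include <cstddef>
#include <new>

template <typename T, std::size_t Capacity>
class static_vector {
    alignas(T) unsigned char storage_[Capacity * sizeof(T)];  // in-object storage, no heap
    std::size_t size_ = 0;
public:
    bool push_back(const T& value) {
        if (size_ == Capacity) return false;                  // report failure, never allocate
        ::new (storage_ + size_ * sizeof(T)) T(value);        // placement-construct in place
        ++size_;
        return true;
    }
    T& operator[](std::size_t i) { return *reinterpret_cast<T*>(storage_ + i * sizeof(T)); }
    std::size_t size() const { return size_; }
    ~static_vector() {
        for (std::size_t i = 0; i < size_; ++i) (*this)[i].~T();
    }
};

Rejecting the insertion, rather than throwing or allocating, keeps the out-of-memory question answerable at review time.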
I guess that for similar (toolset reliability) reasons, it is frowned upon to use many source code generation tools (doing some kind of metaprogramming, e.g. emitting C++ or C code). Observe, for example, that if you use bison (or RPCGEN) in some safety critical software (compiled by make and gcc), you need to assess (and perhaps exhaustively test) not only gcc and make, but also bison. This is an engineering reason, not a scientific one. Notice that some embedded systems may use randomized algorithms, in particular to cleverly deal with noisy input signals (perhaps even random bit flips due to rare-enough cosmic rays). Proving, testing, or analyzing (or just assessing) such random-based algorithms is a quite difficult topic.
Look also into Frama-Clang and CompCert and observe the following:
C++11 (and later) is a horribly complex programming language with no complete formal semantics. The people expert enough in C++ number only a few dozen worldwide (probably most of them sit on its standards committee). I am capable of coding in C++, but not of explaining all the subtle corner cases of move semantics or of the C++ memory model. Also, C++ in practice requires many optimizations to be used efficiently.
It is very difficult to make an error-free C++ compiler, in particular because C++ practically requires tricky optimizations and because of the complexity of the C++ specification. But current compilers (like recent GCC or Clang) are in practice quite good, with few (but still some) residual bugs. There is no CompCert++ for C++ yet, and making one would require several million € or US$ (but if you can collect such an amount of money, please contact me by email, e.g. at basile.starynkevitch@cea.fr, my work email). And the space software industry is extremely conservative.
It is difficult to make a good C or C++ heap memory allocator; coding one is a matter of trade-offs. As a joke, consider adapting this C heap allocator to C++.
Proving safety properties of template-related C++ code (in particular the absence of race conditions or of undefined behavior such as a run-time buffer overflow) is still, in 2Q2019, slightly ahead of the state of the art of static program analysis of C++ code. My draft Bismon technical report (a draft H2020 deliverable, so please skip the pages aimed at European bureaucrats) has several pages explaining this in more detail. Be aware of Rice's theorem.
A whole-system test of C++ embedded space software could require a rocket launch (à la Ariane 5 test flight 501), or at least complex and heavy experimentation in the lab. It is very expensive. Even testing a Mars rover on Earth takes a lot of money.
Think of it: you are coding some safety-critical embedded software (e.g. for train braking, autonomous vehicles, autonomous drones, a big oil platform or oil refinery, missiles, etc.). You naively use some C++ standard container, e.g. some std::map<std::string,long>. What should happen on out-of-memory conditions? How do you "prove", or at least "convince", the people in the organizations funding a 100M€ space rocket that your embedded software (including the compiler used to build it) is good enough? A decades-old rule was to forbid any kind of dynamic heap allocation.
I'm not talking about complex standard library stuff but purpose-made custom templates.
Even these are difficult to prove, or more generally to assess the quality of (and you'll probably want to use your own allocator inside them). In space, code size is a strong constraint, so you would compile with, for example, g++ -Os -Wall or clang++ -Os -Wall. But how did you prove, or simply test, all the subtle optimizations done by -Os (and these are specific to your version of GCC or of Clang)? Your space funding organization will ask you that, since any run-time bug in embedded C++ space software can crash the mission (read again about the Ariane 5 first flight failure, coded in a dialect of Ada which had, at that time, a "better" and "safer" type system than C++17 today; but don't laugh too much at Europeans: the Boeing 737 MAX with its MCAS is a similar mess).
My personal recommendation (but please don't take it too seriously; in 2019 it is more a pun than anything else) would be to consider coding your space embedded software in Rust, because it is slightly safer than C++. Of course, you'll have to spend 5 to 10 M€ (or MUS$) over 5 to 7 years to get a fine Rust compiler suitable for space computers (again, please contact me professionally if you are capable of spending that much on a free-software CompCert/Rust-like compiler). But that is just a matter of software engineering and software project management (read both The Mythical Man-Month and Bullshit Jobs for more; be also aware of the Dilbert principle: it applies as much to the space software industry, or to the embedded compiler industry, as to anything else).
My strong and personal opinion is that the European Commission should fund (e.g. through Horizon Europe) a free-software CompCert++ (or, even better, CompCert/Rust) project, and such a project would need more than 5 years and more than 5 top-class PhD researchers. But, at the age of 60, I sadly know it is not going to happen (because the E.C. ideology, mostly inspired by German policies for obvious reasons, is still the illusion of the End of History, so H2020 and Horizon Europe are, in practice, mostly a way to implement tax optimization for corporations in Europe through European tax havens), and I say that after several private discussions with several members of the CompCert project. I sadly expect DARPA or NASA to be much more likely to fund some future CompCert/Rust project than the E.C.
NB: the European avionics industry (mostly Airbus) uses many more formal-methods approaches than the North American one (Boeing). Hence some (not all) unit tests are avoided, since they are replaced by formal proofs of source code, perhaps with tools like Frama-C or Astrée (neither has been certified for C++, only for a subset of C forbidding dynamic memory allocation and several other features of C). And this is permitted by DO-178C (not by its predecessor DO-178B) and approved by the French regulator, the DGAC (and, I guess, by other European regulators).
Also notice that many SIGPLAN conferences are indirectly related to the OP's question.
The argument against the use of templates in safety-related code is that they are considered to increase the complexity of your code without real benefit. This argument is valid if you have bad tooling and a classic idea of safety. Take the following example:
template<class T> void fun(T t) {
    do_some_thing(t);
}
In the classic way to specify a safety system you have to provide a complete description of each and every function and structure of your code. That means you are not allowed to have any code without specification. That means you have to give a complete description of the functionality of the template in its general form. For obvious reasons that is not possible. That is BTW the same reason why function-like macros are also forbidden. If you change the idea in a way that you describe all actual instantiations of this template, you overcome this limitation, but you need proper tooling to prove that you really described all of them.
The second problem is that one:
fun(b);
This line is not self-contained: you need to look up the type of b to know which function is actually called. Proper tooling which understands templates helps here, but it is true that this makes the code harder to check manually.
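To make both points concrete, here is a small self-contained illustration (mine): the call fun(b) only resolves once the type of b is known, and explicit instantiation definitions are one way to enumerate every instantiation so that each can be specified and reviewed:

#include <iostream>

void do_some_thing(int x)    { std::cout << "int overload: "    << x << '\n'; }
void do_some_thing(double x) { std::cout << "double overload: " << x << '\n'; }

template<class T> void fun(T t) {
    do_some_thing(t);
}

// Explicit instantiation definitions: the complete set of generated
// functions is now spelled out in the source, so reviewers and tools
// can enumerate and describe each one.
template void fun<int>(int);
template void fun<double>(double);

int main() {
    double b = 2.5;
    fun(b);   // resolves to fun<double>, which calls do_some_thing(double)
}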
This statement about templates being a cause of vulnerability seems completely surreal to me, for two main reasons:
templates are "compiled away", i.e. instantiated and code-generated like any other function or member, and there is no behavior specific to them; it is just as if they never existed;
no construct in any language is inherently safe or vulnerable: if an ionizing particle changes a single bit of memory, be it in code or in data, anything is possible (from no noticeable problem up to a processor crash). The way to shield a system against this is by adding hardware memory error detection/correction capabilities, not by modifying the code!

Are there proposals for modeling cache in standard C++? Or any plan?

As I learn more and more about standard C++, I see more and more speakers, authors, and bloggers emphasize the importance of cache hits for performant programs. Yet I haven't seen any effort, in the standard or in any proposal, to deal with this issue, beyond the usual suggestion of "use vectors, because the memory is contiguous".
My observation may certainly be biased, and of course different hardware platforms have different memory hierarchies; PCs and embedded systems are totally different worlds (my experience is with PCs only). Striving to be portable and to avoid assumptions that would restrict the use cases is the core philosophy of C++. But cache use is too important a topic to be left unaddressed, and, in my primitive understanding, as multicore becomes (or already is) the main hardware platform programs run on, cache utilization becomes even more important.
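The effect itself is easy to demonstrate even though the standard says nothing about caches; a small sketch (mine, with arbitrary sizes) that sums the same elements with a cache-friendly and a cache-hostile access pattern typically shows a several-fold difference on a PC:

#include <chrono>
#include <cstdio>
#include <vector>

int main() {
    const std::size_t n = 4096;                    // 4096 x 4096 ints, ~64 MiB
    std::vector<int> grid(n * n, 1);

    auto time_sum = [&](bool row_major) {
        auto t0 = std::chrono::steady_clock::now();
        long long sum = 0;
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j)
                sum += row_major ? grid[i * n + j]   // consecutive addresses
                                 : grid[j * n + i];  // stride of n ints per access
        auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                      std::chrono::steady_clock::now() - t0).count();
        std::printf("%s: sum=%lld, %lld ms\n",
                    row_major ? "row-major   " : "column-major", sum, (long long)ms);
    };

    time_sum(true);
    time_sum(false);   // usually several times slower, purely due to cache misses
}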
So, does anyone know whether there is any plan to address this topic? Or should it not be addressed in the standard at all, because it is an implementation-level concern?
Thank you.

Design patterns commonly used for RTOS (VXworks)

Can anyone help me on design patterns commonly used for RTOS?
In VXworks, which pattern is more preferable?
Can we ignore the second sentence in your question? It is meaningless, and perhaps points to a misunderstanding of design patterns. The first part is interesting however. That said, I would generalise it to cover real-time systems rather than RTOS.
Many of the most familiar patterns are mechanistic, but in real-time systems higher-level architectural patterns are also important.
Bruce Powell Douglass is probably the foremost author on the subject of patterns for real-time systems. If you want a flavour of what he has to say on the subject, read this article on Embedded.com (it is part three of a series of three; be sure to read the first two as well, since they also touch on the subject: (1) (2)). You could also do worse than to visit Embedded.com and enter "design patterns" into the search box; there are a number of articles on specific patterns as well as general articles on the subject.
While I think you are being far too specific in requesting patterns for "RTOS (VxWorks)", the patterns I have used specifically with VxWorks are Facade and Adapter, partly to provide an OO API and partly to provide a level of RTOS-agnostic abstraction, as in the sketch below. The resulting classes were then implemented for SEGGER embOS (to allow us to run a smaller, lower-cost, royalty-free RTOS), and for both Windows and Linux, to allow test, debug and simulation of the code in a richer environment with more powerful tools.
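A rough sketch of that Adapter idea (my illustration, not the actual code; the BUILD_FOR_VXWORKS switch is a hypothetical project define, and the VxWorks branch assumes the classic semLib calls semMCreate/semTake/semGive/semDelete):

// RTOS-neutral interface; application code depends only on this.
class IMutex {
public:
    virtual void lock() = 0;
    virtual void unlock() = 0;
    virtual ~IMutex() = default;
};

#if defined(BUILD_FOR_VXWORKS)   // hypothetical project-defined switch
#include <vxWorks.h>
#include <semLib.h>

// Adapter over the VxWorks mutual-exclusion semaphore API.
class VxWorksMutex : public IMutex {
    SEM_ID sem_;
public:
    VxWorksMutex() : sem_(semMCreate(SEM_Q_PRIORITY)) {}
    ~VxWorksMutex() override { semDelete(sem_); }
    void lock() override   { semTake(sem_, WAIT_FOREVER); }
    void unlock() override { semGive(sem_); }
};
#else
#include <mutex>

// Host-side adapter, so the same application code can be tested and
// debugged on Windows or Linux with richer tools.
class HostMutex : public IMutex {
    std::mutex m_;
public:
    void lock() override   { m_.lock(); }
    void unlock() override { m_.unlock(); }
};
#endif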
A non-exhaustive list of many patterns is provided on Wikipedia, many of which will be applicable to real-time systems. The listed concurrency patterns are most obviously relevant.
As Mike DeSimone commented, this is way too generic. However, here are a couple of things to keep in mind for an RTOS (not just VxWorks).
Avoid doing too much in the ISR. If possible, pass some of the processing on to a waiting task (a rough sketch of this follows below).
Keep multithreading optimal. Too much and you have context switching overhead. Too little and your problem solution may be complicated.
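A minimal sketch of the first tip, deferring ISR work to a task; the rtos_* calls and read_adc_registers are hypothetical placeholders (any RTOS, VxWorks included, offers equivalents such as message queues or binary semaphores):

// Hypothetical primitives, declared only to keep the sketch complete.
struct Sample { int channel; int value; };
struct MsgQueue;
extern MsgQueue* adcQueue;
Sample read_adc_registers();                            // hypothetical hardware access
void rtos_queue_send_from_isr(MsgQueue*, const Sample*);
void rtos_queue_receive(MsgQueue*, Sample*);            // blocks until a message arrives
void filter_and_log(const Sample&);                     // the "heavy" work

extern "C" void adc_isr() {
    Sample s = read_adc_registers();        // ISR stays short: grab the data...
    rtos_queue_send_from_isr(adcQueue, &s); // ...hand it off and return at once
}

void processing_task() {
    for (;;) {
        Sample s;
        rtos_queue_receive(adcQueue, &s);   // wake only when the ISR posts
        filter_and_log(s);                  // heavy work runs at task priority
    }
}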
Another important aspect is keeping the RTOS predictable and understandable for the user. Typically you see fixed-priority schedulers that do not try to be fair or adaptive, but rather do exactly as told and if you mess up with priorities and starve some task, so be it. Time to complete kernel operations tend to be short and predictable, often documented with their worst-case execution times.

Which concurrent programming concepts do hiring managers expect developers to understand?

When I hire developers for general mid-to-senior web app development positions, I generally expect them to understand core concurrent programming concepts such as liveness vs. safety, race conditions, thread synchronization and deadlocks. I'm not sure whether to consider topics like fork/join, wait/notify, lock ordering, memory model basics (just the basics) and so forth to be part of what every reasonably seasoned developer ought to know, or whether these are topics that are more for semi-specialists (i.e. developers who have made a conscious decision to know more than the average developer about concurrent programming).
I'd be curious to hear your thoughts.
I tend to think that at this point in time concurrent programming at any serious level of depth is still a specialist skill. Many will claim to know about it through study, but many will also make an almighty mess of it when they come to apply it.
In addition to the considerations listed, I would also look at resource implications and the various overheads of using processes, threads and fibers. In some contexts, e.g. mobile devices, excessive multithreading can have serious performance implications. This can lead to portability issues with multithreaded code.
I guess if I was interviewing a candidate in this situation, I would work with a real world example rather than hitting on more general topics which can be quoted back verbatim from a text book. I say this having done a fair bit of multithreaded work myself and remembering how badly I screwed up the first couple of times. Many can talk the talk... ;)
I know all these topics, but only because I studied them. I also know many competent senior programmers who don't know them. So unless you expect these programmers to be using those concepts actively, there is no reason to turn down a perfectly good candidate because they don't understand every aspect of concurrency.
The real question is:
In what ways does it matter to the code they will be developing?
You should know which concepts the development position you're hiring for needs to know to be able to work on the projects that they will be responsible for.
As with anything in the programming world, the devil is in the details, and you can't know everything. Would you expect them to know Perl if you were hiring for a Java position?
Also, concurrency, at this stage, while well described in generalized theory, is heavily implementation and platform dependent. Concurrency in Perl on an AIX box is not the same game as concurrency in a C++ Winforms app. They can have all the theory in the world under their belts, but if it's required for the job, then they should have intimate knowledge of the platform they are expected to use it on as well.
I interview folks for concurrency-related positions frequently and I look for three general aspects:
General understanding of core concepts like the ones you list (language-independent)
Specific understanding of Java concurrency libraries and primitives (specific to the work they'd be doing)
Ability to design a solution to a concurrent problem in a reasonable way.
I consider #1 a requirement (for my positions). I consider #2 a nice-to-have: if they understand the concepts and can describe them in terms of pthreads or whatever other library, it's no biggie to learn the latest Java concurrency libraries (the concepts are the hard part). And #3 tends to separate the hires from the maybe-hires.
Per your question, I wouldn't expect fork/join to be known by almost anyone, especially someone applying for a web app developer position. I would look for developers to have experience with some (but not all) of those topics. Most developers I've interviewed have not used the Java 5+ concurrency libraries at all, but they can typically describe things like a data race or a deadlock.

Has anyone tried transactional memory for C++?

I was checking out Intel's "whatif" site and their transactional memory compiler (each thread has to make atomic commits or roll back the system's memory, like a database would).
It seems like a promising way to replace locks and mutexes but I can't find many testimonials. Does anyone here have any input?
I have not used Intel's compiler, however, Herb Sutter had some interesting comments on it...
From Sutter Speaks: The Future of Concurrency
Do you see a lot of interest in and usage of transactional memory, or is the concept too difficult for most developers to grasp?
It's not yet possible to answer who's using it because it hasn't been brought to market yet. Intel has a software transactional memory compiler prototype. But if the question is "Is it too hard for developers to use?" the answer is that I certainly hope not. The whole point is it's way easier than locks. It is the only major thing on the research horizon that holds out hope of greatly reducing our use of locks. It will never replace locks completely, but it's our only big hope to replacing them partially.
There are some limitations. In particular, some I/O is inherently not transactional—you can't take an atomic block that prompts the user for his name and read the name from the console, and just automatically abort and retry the block if it conflicts with another transaction; the user can tell the difference if you prompt him twice. Transactional memory is great for stuff that is only touching memory, though.
Every major hardware and software vendor I know of has multiple transactional memory tools in R&D. There are conferences and academic papers on theoretical answers to basic questions. We're not at the Model T stage yet where we can ship it out. You'll probably see early, limited prototypes where you can't do unbounded transactional memory—where you can only read and write, say, 100 memory locations. That's still very useful for enabling more lock-free algorithms, though.
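To give a concrete feel for what this looks like in code: outside Intel's prototype, GCC (since 4.7) ships a software transactional memory implementation behind -fgnu-tm. A minimal sketch using that extension, not Intel's syntax:

// Compile with: g++ -fgnu-tm example.cpp -pthread
#include <thread>

int counter = 0;   // shared, but only ever touched inside a transaction

void add(int n) {
    __transaction_atomic {   // reads/writes inside are committed atomically
        counter += n;        // no mutex, no std::atomic; a conflict causes a retry
    }
}

int main() {
    std::thread a(add, 1), b(add, 2);
    a.join();
    b.join();
    return counter == 3 ? 0 : 1;
}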
Dr. Dobb's had an article on the concept last year: Transactional Programming by Calum Grant -- http://www.ddj.com/cpp/202802978
It includes some examples, comparisons, and conclusions using his example library.
I've built a combinatorial STM library on top of some functional programming ideas. It doesn't require any compiler support (beyond C++17) and doesn't introduce new syntax; in general, it adopts the interface of the STM library from Haskell.
So, my library has several nice properties:
Monadically combinatorial. Every transaction is a computation inside a custom monad named STML. You can combine monadic transactions into bigger monadic transactions.
Transactions are separated from the data model. You construct your concurrent data model with transactional variables (TVars) and run transactions over it.
There is a retry combinator. It allows you to rerun a transaction; very useful for building short and understandable transactions.
There are different monadic combinators to express computations concisely.
There is a Context. Every computation runs in some context rather than in a global runtime, so you can have many different contexts if you need several independent STM clusters.
The implementation is conceptually quite simple. At least the reference implementation in Haskell is; I had to reinvent several approaches for the C++ implementation due to the lack of good support for functional programming.
The library shows very nice stability and robustness, even if we consider it experimental. Moreover, my approach opens a lot of possibilities to improve the library by performance, features, comprehensiveness, etc.
To demonstrate how it works, I've solved the Dining Philosophers problem; you can find the code in the links below. A sample transaction:
STML<bool> takeFork(const TVar<Fork>& tFork)
{
    STML<bool> alreadyTaken = withTVar(tFork, isForkTaken);
    STML<Unit> takenByUs    = modifyTVar(tFork, setForkTaken);
    STML<bool> success      = sequence(takenByUs, pure(true));
    STML<bool> fail         = pure(false);
    STML<bool> result       = ifThenElse(alreadyTaken, fail, success);
    return result;
}
UPDATE
I've written a tutorial; you can find it here.
Dining Philosophers task
My C++ STM library
Sun Microsystems have announced that they're releasing a new processor next year, codenamed Rock, that has hardware support for transactional memory. It will have some limitations, but it's a good first step that should make it easier for programmers to replace locks/mutexes with transactions and expect good performance out of it.
For an interesting talk on the subject, given by Mark Moir, one of the researchers at Sun working on Transactional Memory and Rock, check out this link.
For more information and announcements from Sun about Rock and Transactional Memory in general, see this link.
The obligatory wikipedia entry :)
Finally, this link, at the University of Wisconsin-Madison, contains a bibliography of most of the research that has been and is being done about Transactional Memory, whether it's hardware related or software related.
In some cases I can see this as being useful and even necessary.
However, even if the processor has special instructions that make this process easier, there is still a large overhead compared to a mutex or semaphore. Depending on how it's implemented, it may also impact real-time performance (you have to either stop interrupts or prevent them from writing into your shared areas).
My expectation is that if this was implemented, it would only be needed for portions of a given memory space, though, and so the impact could be limited.
-Adam