can one make concurrent scalable reliable programs in C as in erlang? - c++

a theoretical question. After reading Armstrongs 'programming erlang' book I was wondering the following:
It will take some time to learn Erlang. Let alone master it. It really is fundamentally different in a lot of respects.
So my question: Is it possible to write 'like erlang' or with some 'erlang like framework', which given that you take care not to create functions with sideffects, you can create scaleable reliable apps as well as in Erlang? Maybe with the same msgs sending, loads of 'mini processes' paradigm.
The advantage would be to not throw all your accumulated C/C++ knowledge over the fence.
Any thoughts about this would be welcome

Yes, it is possible, but...
Probably the best answer for this question is given by Robert Virding’s First Rule:
“Any sufficiently complicated
concurrent program in another language
contains an ad hoc,
informally-specified, bug-ridden, slow
implementation of half of Erlang.”
Very good rule is use the right tool for the task. Erlang excels in concurrency and reliability. C/C++ was not designed with these properties in mind.
If you don't want to throw away your C/C++ knowledge and experience and your project allows this kind of division, good approach is to create a mixed solution. Write concurrent, communication and error handling code in Erlang, then add C/C++ parts, which will do CPU and IO bound stuff.

You clearly can - the Erlang/OTP system is largely written in C (and Erlang). The question is 'why would you want to?'
In 'ye olde days' people used to write their own operating system - but why would you want to?
If you elect to use an operating system your unwritten software has certain properties - it can persist to hard disk, it can speak to a network, it can draw on screens, it can run from the command line, it can be invoked in batch mode, etc, etc...
The Erlang/OTP system is 1.5M lines of code which has been demonstrated to give 99.9999999% uptime in large systems (the UK phone system) - that's 31ms downtime a year.
With Erlang/OTP your unwritten software has high reliability, it can hot-swap itself, your unwritten application can failover when a physical computer dies.
Why would you want to rewrite that functionality?

I would break this into 2 questions
Can you write concurrent, scalable C++ applications
Yes. It's certainly possible to create the low level constructs needed in order to achieve this.
Would you want to write concurrent, scalable, C++ applications
Perhaps. But if I was going for a highly concurrent application, I would choose a language that was either designed to fill that void or easily lent itself to doing so (Erlang, F# and possibly C#).
C++ was not designed to build highly concurrent applications. But it can certainly be tweaked into doing so. The cost might be higher than you expect though once you factor in memory management.

Yes, but you will be doing some extra work.
Regarding side effects, consider how the .net/plinq team is approaching. Plinq won't be able to enforce you hand it stuff with no side effects, but it will assume you do so and play by its rules so we get to use a simpler api. Even if the language doesn't have built-in support for it, it will still simplify things as you can break the operations more easily.

What I can do in one Turing complete language I can do in any other Turing complete language.
So I interpret your question to read, is it as easy to write a reliable and scalable application in C++ as it is in Erlang?
The answer to that is highly subjective. For me it is easier to write it in C++ for the following reasons:
I have already done it in C++ (at least three times).
I don't know Erlang.
I have read a great deal about Stackless Python, which feels to me like a highly concurrent message based cooperative multitasking system in python, but of course python is written on top of C.
Having said that. If you already know both languages, and you have the problem well defined, you can then make the best choice based on all the information you have at hand.

the main 'problem' with C (or C++) for writing reliable and easy to extend programs is that in C you can do anything. so, the first step would be to write a simple framework that restricts just a bit. most good programmers do that anyway.
in this case, the restrictions would be mostly to make it easy to define a 'process' within whatever level of isolation you want. fork() has a reputation of being slow, and threads also need significant time to spawn, so you might want to use a cooperative multitasking, which can be far more efficient, and you could even make it preemptive (i think that's what Erlang does). to get multi-core efficiency, set a pool of threads and make all of them complete to run the tasks.
another important part would be to create an appropriate library of immutable data structures, so that using them (instead of the standard lib) your functions would be (mostly) side-effect-free.
then it's just a matter of setting a good API for message passing and futures... not easy, but at least it doesn't seem like changing the language itself.

Related

What's a good and safe language for drawing intensive particle systems over a long period of time?

Another one of my rather ambiguous question today, sorry.
Currently I have written some half decent software that has a 'roll your own' RESTful client, which pulls data from twitter. This data is then visualized with a number of particle systems using Open FrameWorks (a framework that works with c++).
My plans for this were to run the software indefinitely on my VPS, and build some kind of front end GUI allowing users to explore the pretty particles and so on. Between the JSON library I am using, C/C++, OpenFrameworks, and freaking Xcode4 I have produced way too many SIGBIRT and GDB errors to care for. I have go to the ends of the virtual world to fix them, and re wrote everything over and over. I even managed to SIGBIRT the openframeworks draw circle method, HAH!
(TL;DR starts here) Ok so anyway I am starting from scratch, looking for a powerful language that can crunch maths and blast through a good set of particles, and run quite well over the longest periods of time. Right now I am thinking about haskell, any ideas?
Thanks in advance all!
Haskell's (or more specifically GHC's) number crunching speed is approaching that of C++ but it's a little way behind. However, it's certainly not terrible, and Haskell's advantages in parallelism may become important. That is, if you write it in straight Haskell first, there's a good chance that it'll be easy to refactor it to run in parallel now or in the future. That isn't so true of C++.
The 'vector' package (on Hackage) would be a good choice for arrays suitable for number crunching. It supports mutable arrays in case that sort of approach is needed. However, if you're prepared to go more on the bleeding edge and your algorithm can be parallelized, you might want to look at the 'repa' package, and for extreme performance on a GPU, take a look at 'Accelerate' (which works but is still categorized as experimental).
The crashes you mention sound like they could be an indication of a bit of complexity in your problem. Where Haskell does well is in managing the complexity of... well, anything. So, if the problem is complex, then Haskell will serve you very well.
The foreign function interface in Haskell is well designed, though you will need to write C glue between Haskell and C++. So, that's another option for your number crunching.
For the web interface, take a look at 'yesod' which is seeing very active development and advertises itself as doing RESTful.
AFAIK, number crunching speed is not Haskell's strongest point - it's a highly abstract language, far from the 'metal'; its strength in a numeric processing context lies in the "mathiness" of its semantics - Haskell code often reads much like a Mathematical proof, and many of its concepts are borrowed from various fields of Mathematics.
For plain old number crunching, C++ is probably still your best choice, as it allows you to stay close to the hardware and optimize tight loops at the machine level, while offering higher-level programming constructs to manage complexity.
OTOH, if you have a library in place for the heavy lifting, and you merely need to write the glue to make the various parts work together, then go with whatever you're most comfortable with - python, C#, java, haskell, C++, ... - as long as they have bindings for all your libraries, you're good. If you don't have a library, then you might also consider writing the performance critical parts in C, and then pull them into your favorite high-level language - this is trivial in C++, slightly harder in python or haskell, and pretty damn inconvenient in java.

Concurrency model: Erlang vs Clojure

We are going to write a concurrent program using Clojure, which is going to extract keywords from a huge amount of incoming mail which will be cross-checked with a database.
One of my teammates has suggested to use Erlang to write this program.
Here I want to note something that I am new to functional programming so I am in a little doubt whether clojure is a good choice for writing this program, or Erlang is more suitable.
Do you really mean concurrent or distributed?
If you mean concurrent (multi-threaded, multi-core etc.), then I'd say Clojure is the natural solution.
Clojure's STM model is perfectly designed for multi-core concurrency since it is very efficient at storing and managing shared state between threads. If you want to understand more, well worth looking at this excellent video.
Clojure STM allows safe mutation of data by concurrent threads. Erlang sidesteps this problem by making everything immutable, which is fine in itself but doesn't help when you genuinely need shared mutable state. If you want shared mutable state in Erlang, you have to implement it with a set of message interactions which is neither efficient nor convenient (that's the price of a nothing shared model....)
You will get inherently better performance with Clojure if you are in a concurrent setting in a large machine, since Clojure doesn't rely on message passing and hence communication between threads can be much more efficient.
If you mean distributed (i.e. many different machines sharing work over a network which are effectively running as isolated processes) then I'd say Erlang is the more natural solution:
Erlang's immutable, nothing-shared, message passing style forces you to write code in a way that can be distributed. So idiomatic Erlang automatically can be distributed across multiple machines and run in a distributed, fault-tolerant setting.
Erlang is therefore very well optimised for this use case, so would be the natural choice and would certainly be the quickest to get working.
Clojure could do it as well, but you will need to do much more work yourself (i.e. you'd either need to implement or choose some form of distributed computing framework) - Clojure does not currently come with such a framework by default.
In the long term, I hope that Clojure develops a distributed computing framework that matches Erlang - then you can have the best of both worlds!
The two languages and runtimes take different approaches to concurrency:
Erlang structures programs as many lightweight processes communicating between one another. In this case, you will probably have a master process sending jobs and data to many workers and more processes to handle the resulting data.
Clojure favors a design where several threads share data and state using common data structures. It sounds particularly suitable for cases where many threads access the same data (read-only) and share little mutable state.
You need to analyze your application to determine which model suits you best. This may also depend on the external tools you use -- for example, the ability of the database to handle concurrent requests.
Another practical consideration is that clojure runs on the JVM where many open source libraries are available.
Clojure is Lisp running on the Java JVM. Erlang is designed from the ground up to be highly fault tolerant and concurrent.
I believe the task is doable with either of these languages and many others as well. Your experience will depend on how well you understand the problem and how well you know the language. If you are new to both, I'd say the problem will be challenging no matter which one you choose.
Have you thought about something like Lucene/Solr? It's great software for indexing and searching documents. I don't know what "cross checking" means for your context, but this might be a good solution to consider.
My approach would be to write a simple test in each language and test the performance of each one. Both languages are somewhat different to C style languages and if you aren't used to them (and you don't have a team that is used to them) you may end up with a maintenance nightmare.
I'd also look at using something like Groovy 1.8. Groovy now includes GPars to enable parallel computing. String and file manipulation in Groovy is very easy indeed.
It depends what you mean by huge.
Strings in erlang are painful..
but:
If huge means tens of distributed machines, than go with erlang and write workers in text friendly languages (python?, perl?). You will have distributed layer on the top with highly concurrent local workers. Each worker would be represented by erlang process. If you need more performance, rewrite your worker into C. In Erlang it is super easy to talk to another languages.
If huge still means one strong machine go with JVM. It is not huge then.
If huge is hundreds of machines, I think you will need something stronger google-like (bigtable, map/reduce) probably on C++ stack. Erlang still OK, however you will need good devs to code it.

Safe c++ in mission critical realtime apps

I'd want to hear various opinions how to safely use c++ in mission critical realtime applications.
More precisely, it is probably possible to create some macros/templates/class library for safe data manipulation (sealing for overflows, zerodivides produce infinity values or division is possible only for special "nonzero" data types), arrays with bound checking and foreach loops, safe smartpointers (similar to boost shared_ptr, for instance) and even safe multithreading/distributed model (message passing and lightweight processes like ones are defined in Erlang languge).
Then we prohibit some dangerous c/c++ constructions such as raw pointers, some raw types, native "new" operator and native c/c++ arrays ( for application programmer, not for library writer, of course). Ideally, we should create a special preprocessor/checker, at least we must have some formal checking procedure, which can be applyed to sources using some tool or manualy by some person.
So, my questions:
1) Are there any existing libraries/projects that utilize such an idea? (Embedded c++ is apparently not of desired kind) ?
2) Is it a good idea at all or not? Or it may be useful only for prototyping some another hipothetical language? Or it is totally unusable?
3) Any other thoughts (or links) on this matter also welcome
Sorry if this question is not actually a question, offtopic, duplicate, etc.,
but I haven't found more appropriate place to ask it
For good rules on how to write C++ for mission critical real-time applications have a look at the Joint Strike Fighter coding standards. Many of the rules there are based on the MISRA C coding standards, which I believe are proprietary. PC-Lint is a C++ code checker with rule sets like what you want (including the MISRA rules). I believe you can customize your own rules as well.
We use C++ in mission-critical real-time applications, although I suppose we have it easy (in theory) because we have to only provide real-time guarantees as good as the hardware our clients use. Thus, sufficient profiling lets us get by without mlockall() or stack pre-loading or any other RT traditions. As for the language itself, I think everyday modern C++ coding practices (ones that discourage C concepts) are entirely sufficient to write robust applications that can be used in RT contexts, given 21st century hardware.
Unit tests and QA should be the main focus of effort, instead of in-house libraries that duplicate existing language features.
If you're writing critical high-performance realtime S/W in C++, you probably need every microsecond you can get out of the hardware. As such, I wouldn't necessarily suggest implementing all the extra checks such as ones that you mentioned, at least the ones with overhead implications on program execution. You can obviously mask floating point exceptions to prevent divide by zero from crashing the program.
Some observations:
Peer review all code (possibly multiple reviewers). This will go a long way to improving quality without requiring lots of runtime checks.
DO make use of diagnostic tools and non-release-only asserts.
Do make use of simulation systems to test on non-embedded hardware.
C++ was specifically designed without things like bounds checking for performance reasons.
In general I don't suggest arbitrarily restricting the language, although making use of RAII and smart pointers should have minimal overhead and provides a nice benefit.
Someone else pointed out that if you want Ada, just use Ada.

Erlang - C and Erlang

There are certain common library functions in erlang that are much slower than their c equivalent.
Is it possible to have c code do the binary parsing and number crunching, and have erlang spawn processes to run the c code?
Of course C would be faster, in the extreme case, after optimizations. If by faster you mean faster to run.
Erlang would be by far, faster to write. Depending on the speed requirements you have Erlang is probably "fast enough", and it will save you days of searching for bugs in C.
C code will only be faster after optimizations. If you spend the same amount of time on C and Erlang you will come out with about the same speed (note that I count time spent debugging and error fixing in this time estimation. Which will be a lot less in Erlang).
So:
faster writing = Erlang
faster running (after optimisations) = C
faster running without optimisations = any of the two
Take your pick.
There are two rough rules of thumb based on Erlang FAQ:
Code which involves mainly number crunching and data processing will run about 10 times slower than an equivalent C program. This includes almost all "micro benchmarks".
Large systems which spent most of their time communicating with other systems, recovering from faults and making complex decisions run at least as fast as equivalent C programs.
However there are some official solutions to the lack of number crunching performance of Erlang:
Native Implemented Function (NIF):
Implementing a function in C and loading its object code into Erlang virtual machine to be like a standard Erlang function but with native performance.
Examples: Evedis, Bitcask, ElevelDB
Port:
A byte-oriented interface from Erlang virtual machine to external OS processes through standard input and output file descriptors. The communication with this port is through message passing from Erlang's point of view.
Port Driver:
A dynamically linked C object file which is loaded into Erlang virtual machine and acts like a port. The communication with this port driver is through message passing from Erlang's point of view.
Examples: OTP_Inet, ENanomsg, P1_TLS
C Node:
You can simply promote your Erlang runtime to a distributed node. This way there is a specification to implement an Erlang runtime in C and communicate with Erlang nodes with a single interface.
All of aforementioned solutions have its own pros and cons and need to be used with extreme care.
First of all write whole logic of the system in Erlang, then implement handling binaries in C. Using NIFs (it is kind of interface to C) is pretty straight forward and transparent to the rest of the system. Here is another thread about talking to C Run C Code Block in Erlang.
Before hacking C, make sure you benchmarked current implementation. It is possible that it will satisfy your needs especially with the latest Erlang/OTP release (R14) which introduces great enhancements to binary handling.
easy threading is not so interesting to erlang. Easy threading + Message passing and the OTP framework is what's awesome about erlang. If you need number crunching use something like ocaml, python, haskell. Erlang is all that good at number crunching.
Parsing binaries is one of the things erlang is best at though, probably the best for it. Joe's book programming erlang covers everything really well, and is not so expensive used. It also talks about integrating C code and gives an example. the source is available from pragmatic programming without needing to buy the book, you can grep #include or something.
If you really look for speed you should try OpenMP or MPI parallel programming frameworks for C and C++. I recommend you to take a look at Patterns for Parallel Programming (link to amazon.com) for the details of OpenMP and MPI programming patterns.
The section of erl_nif in Erlang ERTS reference manual will be helpful.
If you like Erlang, but want C speed, why not go for JOCAML. It is an extension for OCAML (which is similar to Erlang but is near C in terms of speed) designed for the multicore revolution going on at the moment. I love it (and I know more than 10 programming languages...)
I used C over 20 years.
I am using Erlang almost exclusively the recently years.
C is faster to run for obvious reason.
Hower, Erlang is fast enough for most things when you do it right.
Also, writing Erlang is much faster and more of fun.
For the piece of algorithms for which the run-time speed is critical, it surely can be written in C, which is the way of Erlang BIFs.
Yes,
But there's more than one way to this, loosely speaking, some or all of which are already listed.
We should ask:
Are those procedures really equivalent (how do the Erlang and C differ)?
Is there a better way to write Erlang for this task (other procedures/libraries or data-types)?
It may be helpful to consider this post: Scaling & Speed with Erlang.
To address the question, yes it is possible to have erlang call some c function to handle a specific task. The most common way is to use a NIF - http://erlang.org/doc/tutorial/nif.html. NIFs were recommended only for short running functions before Erlang version 20 or so, few ms, because they were blocking, which couldn't work with Erlangs preemptive scheduler. Now with dirty threads it is more flexible, you can read up on that.
Just to note, C may be faster at parsing binary, though you should run tests, Erlang is by far faster to write the code. Erlang does a great job parsing binaries by pattern matching.

Is Communicating Sequential Processes ever used in large multi threaded C++ programs?

I'm currently writing a large multi threaded C++ program (> 50K LOC).
As such I've been motivated to read up alot on various techniques for handling multi-threaded code. One theory I've found to be quite cool is:
http://en.wikipedia.org/wiki/Communicating_sequential_processes
And it's invented by a slightly famous guy, who's made other non-trivial contributions to concurrent programming.
However, is CSP used in practice? Can anyone point to any large application written in a CSP style?
Thanks!
CSP, as a process calculus, is fundamentally a theoretical thing that enables us to formalize and study some aspects of a parallel program.
If you instead want a theory that enables you to build distributed programs, then you should take a look to parallel structured programming.
Parallel structural programming is the base of the current HPC (high-performance computing) research and provides to you a methodology about how to approach and design parallel programs (essentially, flowcharts of communicating computing nodes) and runtime systems to implements them.
A central idea in parallel structured programming is that of algorithmic skeleton, developed initially by Murray Cole. A skeleton is a thing like a parallel design pattern with a cost model associated and (usually) a run-time system that supports it. A skeleton models, study and supports a class of parallel algorithms that have a certain "shape".
As a notable example, mapreduce (made popular by Google) is just a kind of skeleton that address data parallelism, where a computation can be described by a map phase (apply a function f to all elements that compose the input data), and a reduce phase (take all the transformed items and "combine" them using an associative operator +).
I found the idea of parallel structured programming both theoretical sound and practical useful, so I'll suggest to give a look to it.
A word about multi-threading: since skeletons addresses massive parallelism, usually they are implemented in distributed memory instead of shared. Intel has developed a tool, TBB, which address multi-threading and (partially) follows the parallel structured programming framework. It is a C++ library, so probably you can just start using it in your projects.
Yes and no. The basic idea of CSP is used quite a bit. For example, thread-safe queues in one form or another are frequently used as the primary (often only) communication mechanism to build a pipeline out of individual processes (threads).
Hoare being Hoare, however, there's quite a bit more to his original theory than that. He invented a notation for talking about the processes, defined a specific set of signals that can be sent between the processes, and so on. The notation has since been refined in various ways, quite a bit of work put into proving various aspects, and so on.
Application of that relatively formal model of CSP (as opposed to just the general idea) is much less common. It's been used in a few systems where high reliability was considered extremely important, but few programmers appear interested in learning (yet another) formal design notation.
When I've designed systems like this, I've generally used an approach that's less rigorous, but (at least to me) rather easier to understand: a fairly simple diagram, with boxes representing the processes, and arrows representing the lines of communication. I doubt I could really offer much in the way of a proof about most of the designs (and I'll admit I haven't designed a really huge system this way), but it's worked reasonably well nonetheless.
Take a look at the website for a company called Verum. Their ASD technology is based on CSP and is used by companies like Philips Healthcare, Ericsson and NXP semiconductors to build software for all kinds of high-tech equipment and applications.
So to answer your question: Yes, CSP is used on large software projects in real-life.
Full disclosure: I do freelance work for Verum
Answering a very old question, yet it seems important that one
There is Go where CSPs are a fundamental part of the language. In the FAQ to Go, the authors write:
Concurrency and multi-threaded programming have a reputation for difficulty. We believe this is due partly to complex designs such as pthreads and partly to overemphasis on low-level details such as mutexes, condition variables, and memory barriers. Higher-level interfaces enable much simpler code, even if there are still mutexes and such under the covers.
One of the most successful models for providing high-level linguistic support for concurrency comes from Hoare's Communicating Sequential Processes, or CSP. Occam and Erlang are two well known languages that stem from CSP. Go's concurrency primitives derive from a different part of the family tree whose main contribution is the powerful notion of channels as first class objects. Experience with several earlier languages has shown that the CSP model fits well into a procedural language framework.
Projects implemented in Go are:
Docker
Google's download server
Many more
This style is ubiquitous on Unix where many tools are designed to process from standard in to standard out. I don't have any first hand knowledge of large systems that are build that way, but I've seen many small once-off systems that are
for instance this simple command line uses (at least) 3 processes.
cat list-1 list-2 list-3 | sort | uniq > final.list
This system is only moderately sized, but I wrote a protocol processor that strips away and interprets successive layers of protocol in a message that used a style very similar to this. It was an event driven system using something akin to cooperative threading, but I could've used multithreading fairly easily with a couple of added tweaks.
The program is proprietary (unfortunately) so I can't show off the source code.
In my opinion, this style is useful for some things, but usually best mixed with some other techniques. Often there is a core part of your program that represents a processing bottleneck, and applying various concurrency increasing techniques there is likely to yield the biggest gains.
Microsoft had a technology called ActiveMovie (if I remember correctly) that did sequential processing on audio and video streams. Data got passed from one filter to another to go from input to output format (and source/sink). Maybe that's a practical example??
The Wikipedia article looks to me like a lot of funny symbols used to represent somewhat pedestrian concepts. For very large or extensible programs, the formalism can be very important to check how the (sub)processes are allowed to interact.
For a 50,000 line class program, you're probably better off architecting it as you see fit.
In general, following ideas such as these is a good idea in terms of performance. Persistent threads that process data in stages will tend not to contend, and exploit data locality well. Also, it is easy to throttle the threads to avoid data piling up as a fast stage feeds a slow stage: just block the fast one if its output buffer grows too big.
A little bit off-topic but for my thesis I used a tool framework called TERRA/LUNA which aims for software development for Embedded Control Systems but is used heavily for all sorts of software development at my institute (so only academical use here).
TERRA is a graphical CSP and software architecture editor and LUNA is both the name for a C++ library for CSP based constructs and the plugin you'll find in TERRA to generate C++ code from your CSP models.
It becomes very handy in combination with FDR3 (a CSP refinement checker) to detect any sort of (dead/life/etc) lock or even profiling.