Erlang - C and Erlang - C++

There are certain common library functions in Erlang that are much slower than their C equivalents.
Is it possible to have C code do the binary parsing and number crunching, and have Erlang spawn processes to run the C code?

Of course C would be faster, in the extreme case after optimizations, if by faster you mean faster to run.
Erlang would be, by far, faster to write. Depending on your speed requirements, Erlang is probably "fast enough", and it will save you days of searching for bugs in C.
C code will only be faster after optimizations. If you spend the same amount of time on C and on Erlang, you will come out with about the same speed (note that I count time spent debugging and error fixing in this estimate, which will be a lot less in Erlang).
So:
faster writing = Erlang
faster running (after optimisations) = C
faster running without optimisations = either of the two
Take your pick.

There are two rough rules of thumb, based on the Erlang FAQ:
Code which involves mainly number crunching and data processing will run about 10 times slower than an equivalent C program. This includes almost all "micro benchmarks".
Large systems which spend most of their time communicating with other systems, recovering from faults, and making complex decisions run at least as fast as equivalent C programs.
However, there are some official solutions to Erlang's lack of number-crunching performance:
Native Implemented Function (NIF):
Implementing a function in C and loading its object code into the Erlang virtual machine, so that it can be called like a standard Erlang function but with native performance (a minimal C sketch follows this list).
Examples: Evedis, Bitcask, ElevelDB
Port:
A byte-oriented interface from the Erlang virtual machine to an external OS process through the standard input and output file descriptors. From Erlang's point of view, communication with the port happens through message passing (see the second sketch after this list).
Port Driver:
A dynamically linked C object file which is loaded into the Erlang virtual machine and acts like a port. From Erlang's point of view, communication with the port driver also happens through message passing.
Examples: OTP_Inet, ENanomsg, P1_TLS
C Node:
You can simply promote your C program to a distributed Erlang node. There is a specification for implementing the Erlang distribution protocol in C, so your C program can talk to Erlang nodes through the usual node interface.
All of the aforementioned solutions have their own pros and cons and need to be used with extreme care.
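To make the NIF option concrete, here is a minimal sketch. The erl_nif calls and the ERL_NIF_INIT macro are the real API; the module name my_math and the function add are made up for illustration:

    /* my_nif.c - a minimal NIF sketch. Compile as a shared library and
       load it from an Erlang module named my_math via erlang:load_nif/2. */
    #include <erl_nif.h>

    /* Called from Erlang as my_math:add(A, B) with two integers. */
    static ERL_NIF_TERM add_nif(ErlNifEnv* env, int argc,
                                const ERL_NIF_TERM argv[])
    {
        int a, b;
        if (!enif_get_int(env, argv[0], &a) || !enif_get_int(env, argv[1], &b))
            return enif_make_badarg(env);
        return enif_make_int(env, a + b);
    }

    static ErlNifFunc nif_funcs[] = {
        {"add", 2, add_nif}
    };

    ERL_NIF_INIT(my_math, nif_funcs, NULL, NULL, NULL, NULL)

On the Erlang side, the my_math module calls erlang:load_nif/2 and exports an add/2 stub; once the library is loaded, my_math:add(1, 2) runs the C code as if it were an ordinary Erlang function.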
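And to make the Port option concrete as well, here is a sketch of the external program's side of the conversation. The 2-byte big-endian length header is the standard framing you get when the port is opened with the {packet, 2} option; the program name my_port and the echo behaviour are just for illustration:

    /* my_port.c - skeleton for an external OS process driven by an Erlang
       port opened as open_port({spawn, "./my_port"}, [{packet, 2}]).
       Every message on stdin/stdout is preceded by a 2-byte length. */
    #include <stdio.h>

    int main(void)
    {
        unsigned char len_buf[2];
        unsigned char buf[65536];

        /* Read framed messages until Erlang closes the port. */
        while (fread(len_buf, 1, 2, stdin) == 2) {
            size_t len = ((size_t)len_buf[0] << 8) | len_buf[1];
            if (fread(buf, 1, len, stdin) != len)
                break;

            /* "Process" the message; here we just echo it back. */
            fwrite(len_buf, 1, 2, stdout);
            fwrite(buf, 1, len, stdout);
            fflush(stdout);
        }
        return 0;
    }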

First of all, write the whole logic of the system in Erlang, then implement the binary handling in C. Using NIFs (a kind of interface to C) is pretty straightforward and transparent to the rest of the system. Here is another thread about talking to C: Run C Code Block in Erlang.
Before hacking C, make sure you have benchmarked the current implementation. It may well satisfy your needs, especially with the latest Erlang/OTP release (R14), which introduces great enhancements to binary handling.

Easy threading is not what is so interesting about Erlang. Easy threading plus message passing and the OTP framework is what's awesome about Erlang. If you need pure number crunching, use something like OCaml, Python, or Haskell; Erlang is not all that good at number crunching.
Parsing binaries, though, is one of the things Erlang is best at, probably the best language for it. Joe Armstrong's book Programming Erlang covers everything really well, and is not so expensive used. It also talks about integrating C code and gives an example. The source code is available from Pragmatic Programmers without needing to buy the book; you can grep for #include or something like that.

If you are really looking for speed, you should try the OpenMP or MPI parallel programming frameworks for C and C++. I recommend taking a look at Patterns for Parallel Programming for the details of OpenMP and MPI programming patterns.
The erl_nif section of the Erlang ERTS reference manual will also be helpful.

If you like Erlang, but want C speed, why not go for JoCaml? It is an extension of OCaml (which is similar to Erlang but near C in terms of speed), designed for the multicore revolution going on at the moment. I love it (and I know more than 10 programming languages...).

I used C for over 20 years.
In recent years I have been using Erlang almost exclusively.
C is faster to run, for obvious reasons.
However, Erlang is fast enough for most things when you do it right.
Also, writing Erlang is much faster and more fun.
For the pieces of an algorithm where run-time speed is critical, the code can surely be written in C, which is how Erlang's BIFs are implemented.

Yes,
But there's more than one way to do this, loosely speaking, some or all of which are already listed.
We should ask:
Are those procedures really equivalent (how do the Erlang and C versions differ)?
Is there a better way to write Erlang for this task (other procedures/libraries or data-types)?
It may be helpful to consider this post: Scaling & Speed with Erlang.

To address the question: yes, it is possible to have Erlang call a C function to handle a specific task. The most common way is to use a NIF - http://erlang.org/doc/tutorial/nif.html. Before Erlang version 20 or so, NIFs were recommended only for short-running functions (a few milliseconds), because they were blocking, which didn't work with Erlang's preemptive scheduler. Now, with dirty schedulers, it is more flexible; you can read up on that.
Just to note: C may be faster at parsing binaries, though you should run tests; Erlang is by far faster for writing the code. Erlang does a great job of parsing binaries through pattern matching.
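As a sketch of what the dirty-scheduler route looks like: the fourth field of the ErlNifFunc entry and the ERL_NIF_DIRTY_JOB_CPU_BOUND flag are the real erl_nif API (on OTP releases with dirty scheduler support); the module my_parser and the function parse_blob are hypothetical:

    #include <erl_nif.h>

    /* A deliberately long-running, CPU-bound NIF over a binary. */
    static ERL_NIF_TERM parse_blob_nif(ErlNifEnv* env, int argc,
                                       const ERL_NIF_TERM argv[])
    {
        ErlNifBinary bin;
        if (!enif_inspect_binary(env, argv[0], &bin))
            return enif_make_badarg(env);
        /* ... heavy parsing over bin.data / bin.size goes here ... */
        return enif_make_ulong(env, bin.size);
    }

    /* The flag marks the NIF as dirty, so it runs on a dirty scheduler
       and does not block the normal Erlang schedulers. */
    static ErlNifFunc nif_funcs[] = {
        {"parse_blob", 1, parse_blob_nif, ERL_NIF_DIRTY_JOB_CPU_BOUND}
    };

    ERL_NIF_INIT(my_parser, nif_funcs, NULL, NULL, NULL, NULL)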

Related

What's a good and safe language for drawing intensive particle systems over a long period of time?

Another one of my rather ambiguous questions today, sorry.
Currently I have written some half-decent software that has a 'roll your own' RESTful client, which pulls data from Twitter. This data is then visualized with a number of particle systems using openFrameworks (a framework for C++).
My plan was to run the software indefinitely on my VPS and build some kind of front-end GUI allowing users to explore the pretty particles and so on. Between the JSON library I am using, C/C++, openFrameworks, and freaking Xcode 4, I have produced way too many SIGABRT and GDB errors to care for. I have gone to the ends of the virtual world to fix them, and rewrote everything over and over. I even managed to SIGABRT the openFrameworks draw-circle method, hah!
(TL;DR starts here) OK, so anyway, I am starting from scratch, looking for a powerful language that can crunch maths, blast through a good set of particles, and run well over long periods of time. Right now I am thinking about Haskell; any ideas?
Thanks in advance all!
Haskell's (or more specifically GHC's) number crunching speed is approaching that of C++ but it's a little way behind. However, it's certainly not terrible, and Haskell's advantages in parallelism may become important. That is, if you write it in straight Haskell first, there's a good chance that it'll be easy to refactor it to run in parallel now or in the future. That isn't so true of C++.
The 'vector' package (on Hackage) would be a good choice for arrays suitable for number crunching. It supports mutable arrays in case that sort of approach is needed. However, if you're prepared to go more on the bleeding edge and your algorithm can be parallelized, you might want to look at the 'repa' package, and for extreme performance on a GPU, take a look at 'Accelerate' (which works but is still categorized as experimental).
The crashes you mention sound like they could be an indication of a bit of complexity in your problem. Where Haskell does well is in managing the complexity of... well, anything. So, if the problem is complex, then Haskell will serve you very well.
The foreign function interface in Haskell is well designed, though you will need to write C glue between Haskell and C++. So, that's another option for your number crunching.
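To give an idea of what that glue looks like at the boundary, here is a sketch of a shared C header; the function name simulate_step is hypothetical. The C++ side implements the function inside the extern "C" block, and the Haskell side imports it with a foreign import ccall declaration:

    /* glue.h - the C ABI both sides agree on (names hypothetical).
       C++ implements these functions with C linkage; Haskell binds to
       them through its FFI. */
    #ifndef GLUE_H
    #define GLUE_H

    #ifdef __cplusplus
    extern "C" {
    #endif

    /* Advance the simulation by dt and return the new energy, say. */
    double simulate_step(double dt);

    #ifdef __cplusplus
    }
    #endif

    #endif /* GLUE_H */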
For the web interface, take a look at 'yesod' which is seeing very active development and advertises itself as doing RESTful.
AFAIK, number crunching speed is not Haskell's strongest point - it's a highly abstract language, far from the 'metal'; its strength in a numeric processing context lies in the "mathiness" of its semantics - Haskell code often reads much like a Mathematical proof, and many of its concepts are borrowed from various fields of Mathematics.
For plain old number crunching, C++ is probably still your best choice, as it allows you to stay close to the hardware and optimize tight loops at the machine level, while offering higher-level programming constructs to manage complexity.
OTOH, if you have a library in place for the heavy lifting, and you merely need to write the glue to make the various parts work together, then go with whatever you're most comfortable with - Python, C#, Java, Haskell, C++, ... - as long as they have bindings for all your libraries, you're good. If you don't have a library, then you might also consider writing the performance-critical parts in C and then pulling them into your favorite high-level language - this is trivial in C++, slightly harder in Python or Haskell, and pretty damn inconvenient in Java.

Mixing Haskell and C++

Suppose you had the possibility of building an application that would use both Haskell and C++.
Which layers would you have Haskell manage, and which would you have C++ manage?
Has anyone ever done such a combination (surely someone has)?
(The Haskell site says it's really easy, because Haskell has a mode in which it can be compiled to C and built by gcc.)
At first I think I would keep all I/O operations in the C++ layers, as well as GUI management.
It is a pretty vague question, but as I am planning to learn Haskell, I was thinking about delegating some work to Haskell code (I learn by actually coding), and I want to choose a part where I will see Haskell's benefits.
The benefit of Haskell is the powerful abstractions it allows you to use. You're not thinking in terms of ones and zeros and addresses and registers but computations and type properties and continuations.
The benefit of C++ is how tightly you can optimize it when necessary. You aren't thinking about high-minded monads, arrows, partial application, and composing pure functions: with C++, you can get right down to the bare metal!
There's tension between these two statements. In his paper “Structured Programming with go to statements,” Donald Knuth wrote
I have felt for a long time that a talent for programming consists largely of the ability to switch readily from microscopic to macroscopic views of things, i.e., to change levels of abstraction fluently.
Knowing how to use Haskell and C++ but also how and when to combine them well will knock down all sorts of problems.
The last big project I wrote that used FFI involved using an in-house radar modeling library written in C. Reimplementing it would have been silly, and expressing the high-level logic of the rest of the application would have been a pain. I kept the “brains” of it in Haskell and called the C library when I needed it.
You're wanting to do this as an exercise, so I'd recommend the same approach: write the smarts in Haskell. Shackling Haskell as a slave to C++ will probably end up frustrating you or making you feel as though you wasted your time. Use each language where its strengths lie.
Here is how I see things:
Functional languages excel at transforming things. Whenever you write programs which take an input and map/filter/reduce it, use functional constructs. Wonderful real world examples where Haskell should excel are given by web applications (you basically transform things stored in a database to web pages).
Procedural languages (OOP languages are procedural) excel at side effects and communication between objects. They are cumbersome to use to transform data, but whenever you want to do system programming or (bidirectional) interaction with humans (user interfaces of any kind, including client-side web programming), they do the job cleanly.
To those who may argue that user interfaces should have a functional description, I answer that well-established frameworks are easy enough to use from OOP languages that one should use them this way. After all, it is natural to think of UI components in terms of objects and communication between objects.
IO is only a tool: whenever you transform input to output, Haskell (or whatever FP language) should do the IO. Whenever you speak to a human, C++ (or whatever OOP language) should do the IO.
Don't care about speed. When you use Haskell for the right job, it's efficient. When you use C++ or Python for the right job, it's efficient.
Therefore, given that I'm fluent in Haskell, C, C++ and Python, here's how I write applications:
If my application's main role is to transform data, I write it in Haskell, with possibly some low-level parts written in C (which may, in turn, call some high-tech low level parts written in C++, but I'd stick with C as an interface for portability reasons).
If my application's main role is to interact with a user, I write it in Python (PyQt for instance), and let Python call performance-critical routines written in C++ (boost::python is pretty good as a binding generator). I may also have to call subroutines which transform or fetch data, which will be written in Haskell.
If I have to write a part of an application in Haskell, I segregate it into a C-callable API.
Sometimes, a Haskell application which reads things on stdin and write back on stdout is useful as a submodule (that you call with fork/exec, or whatever on your platform). Sometimes, a shell script is the right wrapper for such applications.
This answer is more a story than a comprehensive answer, but I used a mix of Haskell, Python and C++ for my dissertation in computational linguistics, as well as several C and Java tools that I didn't write. I found it simplest to run everything as a separate process, using Python as glue code to start the Haskell, C++ and Java programs.
The C++ was a fairly simple, tight loop that counted feature occurrences. Basically all it did was math and simple I/O. I actually controlled options by having the Python glue code write out a header full of #defines and recompiling. Kind of hacky, but it worked.
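The trick was roughly like the following sketch, translated to plain C here; all the names (options.h, MIN_COUNT, USE_BIGRAMS) are hypothetical, and the two #defines are inlined so the sketch is a single file:

    /* In the real setup, the next two #defines lived in a small options.h
       that the glue code regenerated before each recompile. */
    #define MIN_COUNT   5      /* ignore features seen fewer times */
    #define USE_BIGRAMS 1      /* also count bigram features? */

    #include <stdio.h>

    int main(void)
    {
        /* The tight counting loop branches on compile-time constants,
           so option handling costs nothing at run time. */
    #if USE_BIGRAMS
        printf("counting bigrams, min count = %d\n", MIN_COUNT);
    #else
        printf("counting unigrams, min count = %d\n", MIN_COUNT);
    #endif
        return 0;
    }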
The Haskell was all the intermediate processing: taking the complex output from the various C and Java parsers that I used, filtering extraneous data, and transforming it into the simple format the C++ code expected. Then I took the C++ output and transformed it into LaTeX markup (among other formats).
This is an area where you would expect Python to be strong, but I found that Haskell makes manipulation of complex structures easier; Python is probably better for simple line-to-line transformations, but I was slicing and dicing parse trees, and I found that I lost track of the input and output types when I wrote code in Python.
Since I was using Haskell a lot like a more-structured scripting language, I ended up writing a few file I/O utilities, but beyond that, the built-in libraries for tree and list manipulation sufficed.
In summary, if you have a problem like mine, I would suggest C++ for the memory-constrained, speed-critical part, Haskell for the high-level transformations, and Python to run it all.
I have never mixed both languages but your approach feels a little upside down to me.
Haskell is more apt at high-level operations while C++ can be optimized and can be most beneficial for tight loops and other performance critical code.
One of the largest benefits of Haskell is the encapsulation of IO into monads. As long as this IO isn't time critical I don't see any reason to do it in C++.
For the GUI part you are probably right. There is a plethora of Haskell GUI libraries but C++ has powerful tools such as QtCreator which simplify the tedious tasks a lot.

F# performance in scientific computing

I am curious as to how F# performance compares to C++ performance. I asked a similar question with regard to Java, and the impression I got was that Java is not suitable for heavy number crunching.
I have read that F# is supposed to be more scalable and more performant, but how does this real-world performance compare to C++? Specific questions about the current implementation are:
How well does it do floating-point?
Does it allow vector instructions?
How friendly is it towards optimizing compilers?
How big a memory footprint does it have? Does it allow fine-grained control over memory locality?
Does it have capacity for distributed-memory processors, for example Cray?
What features does it have that may be of interest to computational science where heavy number processing is involved?
Are there actual scientific computing implementations that use it?
Thanks
I am curious as to how F# performance compares to C++ performance?
Varies wildly depending upon the application. If you are making extensive use of sophisticated data structures in a multi-threaded program then F# is likely to be a big win. If most of your time is spent in tight numerical loops mutating arrays then C++ might be 2-3× faster.
Case study: Ray tracer. My benchmark here uses a tree for hierarchical culling and numerical ray-sphere intersection code to generate an output image. This benchmark is several years old and the C++ code has been improved upon dozens of times over the years and read by hundreds of thousands of people. Don Syme at Microsoft managed to write an F# implementation that is slightly faster than the fastest C++ code when compiled with MSVC and parallelized using OpenMP.
I have read that F# is supposed to be more scalable and more performant, but how does this real-world performance compare to C++?
Developing code is much easier and faster with F# than C++, and this applies to optimization as well as maintenance. Consequently, when you start optimizing a program the same amount of effort will yield much larger performance gains if you use F# instead of C++. However, F# is a higher-level language and, consequently, places a lower ceiling on performance. So if you have infinite time to spend optimizing you should, in theory, always be able to produce faster code in C++.
This is exactly the same benefit that C++ had over Fortran and Fortran had over hand-written assembler, of course.
Case study: QR decomposition. This is a basic numerical method from linear algebra provided by libraries like LAPACK. The reference LAPACK implementation is 2,077 lines of Fortran. I wrote an F# implementation in under 80 lines of code that achieves the same level of performance. But the reference implementation is not fast: vendor-tuned implementations like Intel's Math Kernel Library (MKL) are often 10x faster. Remarkably, I managed to optimize my F# code well beyond the performance of Intel's implementation running on Intel hardware whilst keeping my code under 150 lines of code and fully generic (it can handle single and double precision, and complex and even symbolic matrices!): for tall thin matrices my F# code is up to 3× faster than the Intel MKL.
Note that the moral of this case study is not that you should expect your F# to be faster than vendor-tuned libraries but, rather, that even experts like Intel's will miss productive high-level optimizations if they use only lower-level languages. I suspect Intel's numerical optimization experts failed to exploit parallelism fully because their tools make it extremely cumbersome whereas F# makes it effortless.
How well does it do floating-point?
Performance is similar to ANSI C but some functionality (e.g. rounding modes) is not available from .NET.
Does it allow vector instructions?
No.
How friendly is it towards optimizing compilers?
This question does not make sense: F# is a proprietary .NET language from Microsoft with a single compiler.
How big a memory footprint does it have?
An empty application uses 1.3 MB here.
Does it allow fine-grained control over memory locality?
Better than most memory-safe languages but not as good as C. For example, you can unbox arbitrary data structures in F# by representing them as "structs".
Does it have capacity for distributed-memory processors, for example Cray?
Depends what you mean by "capacity for". If you can run .NET on that Cray then you could use message passing in F# (just like the next language) but F# is intended primarily for desktop multicore x86 machines.
What features does it have that may be of interest to computational science where heavy number processing is involved?
Memory safety means you do not get segmentation faults and access violations. The support for parallelism in .NET 4 is good. The ability to execute code on-the-fly via the F# interactive session in Visual Studio 2010 is extremely useful for interactive technical computing.
Are there actual scientific computing implementations that use it?
Our commercial products for scientific computing in F# already have hundreds of users.
However, your line of questioning indicates that you think of scientific computing as high-performance computing (e.g. Cray) and not interactive technical computing (e.g. MATLAB, Mathematica). F# is intended for the latter.
In addition to what others said, there is one important point about F#, and that's parallelism. The performance of ordinary F# code is determined by the CLR, although you may be able to use LAPACK from F#, or you may be able to make native calls using C++/CLI as part of your project.
However, well-designed functional programs tend to be much easier to parallelize, which means that you can easily gain performance by using multi-core CPUs, which are definitely available to you if you're doing some scientific computing. Here are a couple of relevant links:
F# and Task-Parallel library (blog by Jurgen van Gael, who is doing machine-learning stuff)
Another interesting answer on SO regarding parallelism
An example of using Parallel LINQ from F#
Chapter 14 of my book discusses parallelism (source code is available)
Regarding distributed computing, you can use any distributed computing framework that's available for the .NET platform. There is an MPI.NET project, which works well with F#, but you may also be able to use DryadLINQ, which is an MSR project.
Some articles: F# MPI tools for .NET, Concurrency with MPI.NET
DryadLINQ project homepage
F# does floating point computation as fast as the .NET CLR will allow it. Not much difference from C# or other .NET languages.
F# does not allow vector instructions by itself, but if your CLR has an API for these, F# should not have problems using it. See for instance Mono.
As far as I know, there is only one F# compiler for the moment, so maybe the question should be "how good is the F# compiler when it comes to optimisation?". The answer is in any case "potentially as good as the C# compiler, probably a little bit worse at the moment". Note that F# differs from e.g. C# in its support for inlining at compile time, which potentially allows for more efficient code that relies on generics.
Memory foot prints of F# programs are similar to that of other .NET languages. The amount of control you have over allocation and garbage collection is the same as in other .NET languages.
I don't know about the support for distributed memory.
F# has very nice primitives for dealing with flat data structures, e.g. arrays and lists. Look for instance at the content of the Array module: map, map2, mapi, iter, fold, zip... Arrays are popular in scientific computing, I guess due to their inherently good memory locality properties.
For scientific computation packages using F#, you may want to look at what Jon Harrop is doing.
As with all language/performance comparisons, your mileage depends greatly on how well you can code.
F# is a derivative of OCaml. I was surprised to find out that OCaml is used a lot in the financial world, where number crunching performance is very important. I was further surprised to find out that OCaml is one of the faster languages, with performance on par with the fastest C and C++ compilers.
F# is built on the CLR. In the CLR, code is expressed in a form of bytecode called the Common Intermediate Language. As such, it benefits from the optimizing capabilities of the JIT, and has performance comparable to C# (but not necessarily C++), if the code is written well.
CIL code can be compiled to native code in a separate step prior to runtime by using the Native Image Generator (NGEN). This speeds up all later runs of the software as the CIL-to-native compilation is no longer necessary.
One thing to consider is that functional languages like F# benefit from a more declarative style of programming. In a sense, you are over-specifying the solution in imperative languages such as C++, and this limits the compiler's ability to optimize. A more declarative programming style can theoretically give the compiler additional opportunities for algorithmic optimization.
It depends on what kind of scientific computing you are doing.
If you are doing traditional heavy computing, e.g. linear algebra or various optimizations, then you should not put your code on the .NET framework; at least, F# is not suitable. This is at the algorithm level, and most such algorithms must be coded in imperative languages to get good performance in running time and memory usage. Others mentioned parallelism; I must say it is probably useless when you are doing low-level stuff like parallelizing an SVD implementation, because when you know how to parallelize an SVD, you simply won't use a high-level language: Fortran, C or modified C (e.g. Cilk) are your friends.
However, a lot of scientific computing today is not of this kind; it consists of higher-level applications, e.g. statistical computing and data mining. In these tasks, aside from some linear algebra or optimization, there are also a lot of data flows, IO, preprocessing, graphics, etc. For these tasks, F# is really powerful, for its succinctness, functional style, safety, ease of parallelization, and so on.
As others have mentioned, .NET supports Platform Invoke well; actually quite a few projects inside MS use .NET and P/Invoke together to improve performance at the bottlenecks.
I don't think that you'll find a lot of reliable information, unfortunately. F# is still a very new language, so even if it were ideally suited for performance heavy workloads there still wouldn't be that many people with significant experience to report on. Furthermore, performance is very hard to accurately gauge and microbenchmarks are hard to generalize. Even within C++, you can see dramatic differences between compilers - are you wondering whether F# is competitive with any C++ compiler, or with the hypothetical "best possible" C++ executable?
As to specific benchmarks against C++, here are some possibly relevant links: O'Caml vs. F#: QR decomposition; F# vs Unmanaged C++ for parallel numerics. Note that as an author of F#-related material and as the vendor of F# tools, the writer has a vested interest in F#'s success, so take these claims with a grain of salt.
I think it's safe to say that there will be some applications where F# is competitive on execution time and likely some others where it isn't. F# will probably require more memory in most cases. Of course the ultimate performance will also be highly dependent on the skill of the programmer - I think F# will almost certainly be a more productive language to program in for a moderately competent programmer. Furthermore, I think that at the moment, the CLR on Windows performs better than Mono on most OSes for most tasks, which may also affect your decisions. Of course, since F# is probably easier to parallelize than C++, it will also depend on the type of hardware you're planning to run on.
Ultimately, I think that the only way to really answer this question is to write F# and C++ code representative of the type of calculations that you want to perform and compare them.
Here are two examples I can share:
Matrix multiplication:
I have a blog post comparing different matrix multiplication implementations.
LBFGS
I have a large-scale logistic regression solver using LBFGS optimization, which is coded in C++. The implementation is well tuned. I ported some of the code to C++/CLI, i.e. I compiled the code into .NET. The .NET version is 3 to 5 times slower than the natively compiled one on different datasets. If you code LBFGS in F#, the performance cannot be better than C++/CLI or C# (but it would be very close).
I have another post on why F# is the language for data mining; although not quite related to the performance issue you are concerned with here, it is quite related to scientific computing in F#.
If I say "ask again in 2-3 years" I think that will answer your question completely :-)
First, don't expect F# to be any different from C# perf-wise, unless you are doing some convoluted recursions on purpose, and I'd guess you are not, since you asked about numerics.
Floating-point-wise it is bound to be better than Java, since the CLR doesn't aim at cross-platform uniformity, meaning that the JIT will go to 80 bits whenever it can. On the other side, you don't have control over that beyond watching the number of variables to make sure there are enough FP registers.
Vector-wise, if you scream loudly enough, maybe something will happen in 2-3 years, since Direct3D is entering .NET as a general API anyway, and C# code done in XNA runs on the Xbox, which is as close to the bare metal as you can get with the CLR. That still means that you'd need to do some intermediary code on your own.
So don't expect CUDA or even the ability to just link NVIDIA libs and get going. You'd have much more luck trying that approach with Haskell if for some reason you really, really need a "functional" language, since Haskell was designed to be linking-friendly out of pure necessity.
Mono.Simd has been mentioned already, and while it should be back-portable to the CLR, it might be quite some work to actually do it.
There's quite some code in a social.msdn posting on using SSE3 in .NET, with C++/CLI and C#: some array blitting, injecting SSE3 code for perf, etc.
There was some talk about running Cecil on compiled C# to extract parts into HLSL, compile them into shaders, and link in glue code to schedule it (CUDA is doing the equivalent anyway), but I don't think anything runnable has come out of that.
A thing that might be worth more to you, if you want to try something soon, is PhysX.Net on CodePlex. Don't expect it to just unpack and do the magic, but it currently has an active author, and the code is both normal C++ and C++/CLI, so you can probably get some help from the author if you want to go into details and maybe use a similar approach for CUDA. For full-speed CUDA you'll still need to compile your own kernels and then just interface with .NET, so the easier that part goes, the happier you'll be.
There is a CUDA.NET lib which is supposed to be free, but the page gives just an e-mail address, so expect some strings attached; and while the author writes a blog, he's not particularly talkative about what's inside the lib.
Oh, and if you have the budget, you might give Psi Lambda a look (KappaCUDAnet is the .NET part). Apparently they are going to jack up the prices in November (if it's not a sales trick :-).
Firstly, C is significantly faster than C++, so if you need that much speed you should make the lib etc. in C.
With regards to F#, most benchmarks use Mono, which is up to 2× slower than the MS CLR, due partially to its use of the Boehm GC (they have a new GC and LLVM, but these are still immature and don't support generics, etc.).
.NET languages themselves are compiled to an IR (the CIL), which compiles to native code about as efficiently as C++. There is one problem set that most GC'd languages suffer in, and that is large amounts of mutable writes (this includes C++ on .NET, as mentioned above). And there is a certain scientific problem set that requires this; when needed, it should probably use a native library or the Flyweight pattern to reuse objects from a pool (which reduces writes). The reason is that there is a write barrier in the .NET CLR: when updating a reference field (including a box), it sets a bit in a card table saying that region is modified. If your code consists of lots of such writes, it will suffer.
That said, a .NET app in C# using lots of static code, structs, and ref/out on the structs can produce C-like performance, but it is very difficult to write or maintain code like this (like C).
Where F# shines, however, is parallelism over immutable data, which goes hand in hand with more read-based problems. It's worth noting that most benchmarks are much heavier in mutable writes than real-life applications.
With regard to floating point, you should use an alternative lib (i.e. the .NET one) to the OCaml ones, the latter being slow. C/C++ allows faster operation at lower precision, which OCaml doesn't by default.
Lastly, I would argue that a high-level language like C# or F#, plus proper profiling, will give you better performance than C and C++ for the same developer time. If you change a bottleneck into a C lib P/Invoke call, you will also end up with C-like performance for critical areas. That said, if you have an unlimited budget and care more about speed than maintenance, then C is the way to go (not C++).
Last I knew, most scientific computing was still done in FORTRAN. It's still faster than anything else for linear algebra problems - not Java, not C, not C++, not C#, not F#. LINPACK is nicely optimized.
But the remark about "your mileage may vary" is true of all benchmarks. Blanket statements (except mine) are rarely true.

Is Communicating Sequential Processes ever used in large multi-threaded C++ programs?

I'm currently writing a large multi-threaded C++ program (> 50K LOC).
As such, I've been motivated to read up a lot on various techniques for handling multi-threaded code. One theory I've found quite cool is:
http://en.wikipedia.org/wiki/Communicating_sequential_processes
And it was invented by a slightly famous guy, who has made other non-trivial contributions to concurrent programming.
However, is CSP used in practice? Can anyone point to any large application written in a CSP style?
Thanks!
CSP, as a process calculus, is fundamentally a theoretical thing that enables us to formalize and study some aspects of a parallel program.
If you instead want a theory that enables you to build distributed programs, then you should take a look at parallel structured programming.
Parallel structured programming is the basis of current HPC (high-performance computing) research. It provides a methodology for how to approach and design parallel programs (essentially, flowcharts of communicating computing nodes) and runtime systems to implement them.
A central idea in parallel structured programming is that of the algorithmic skeleton, developed initially by Murray Cole. A skeleton is a thing like a parallel design pattern with an associated cost model and (usually) a run-time system that supports it. A skeleton models, studies and supports a class of parallel algorithms that have a certain "shape".
As a notable example, mapreduce (made popular by Google) is just a kind of skeleton that addresses data parallelism, where a computation can be described by a map phase (apply a function f to all elements that compose the input data) and a reduce phase (take all the transformed items and "combine" them using an associative operator +). A concrete sketch follows below.
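To make the shape concrete, here is a sequential C sketch of the map/reduce skeleton; a real skeleton implementation would distribute the independent map applications across workers. The function names are illustrative:

    /* A map-reduce "skeleton" in miniature: map applies f to each input
       element; reduce folds the results with an associative operator. */
    #include <stdio.h>

    typedef double (*map_fn)(double);
    typedef double (*reduce_op)(double, double);

    static double square(double x)        { return x * x; }
    static double add(double a, double b) { return a + b; }

    static double map_reduce(const double *in, int n,
                             map_fn f, reduce_op op, double init)
    {
        double acc = init;
        for (int i = 0; i < n; i++)   /* every f(in[i]) is independent,  */
            acc = op(acc, f(in[i]));  /* so the map phase parallelizes   */
        return acc;
    }

    int main(void)
    {
        double xs[] = {1, 2, 3, 4};
        printf("%g\n", map_reduce(xs, 4, square, add, 0.0)); /* 30 */
        return 0;
    }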
I found the idea of parallel structured programming both theoretically sound and practically useful, so I suggest giving it a look.
A word about multi-threading: since skeletons address massive parallelism, they are usually implemented on distributed memory rather than shared memory. Intel has developed a tool, TBB, which addresses multi-threading and (partially) follows the parallel structured programming framework. It is a C++ library, so you can probably just start using it in your projects.
Yes and no. The basic idea of CSP is used quite a bit. For example, thread-safe queues in one form or another are frequently used as the primary (often only) communication mechanism to build a pipeline out of individual processes (threads).
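A minimal sketch of such a thread-safe queue in C with pthreads, a bounded, blocking channel between pipeline stages; all the names (channel, channel_put, channel_get, CAP) are illustrative:

    /* A bounded blocking queue: the CSP-flavoured "channel" between
       pipeline stages. put() blocks when the buffer is full and get()
       blocks when it is empty, which also throttles a fast producer
       feeding a slow consumer. */
    #include <pthread.h>

    #define CAP 64

    typedef struct {
        void *items[CAP];
        int head, tail, count;
        pthread_mutex_t lock;
        pthread_cond_t not_full, not_empty;
    } channel;

    void channel_init(channel *c) {
        c->head = c->tail = c->count = 0;
        pthread_mutex_init(&c->lock, NULL);
        pthread_cond_init(&c->not_full, NULL);
        pthread_cond_init(&c->not_empty, NULL);
    }

    void channel_put(channel *c, void *msg) {
        pthread_mutex_lock(&c->lock);
        while (c->count == CAP)               /* block while full */
            pthread_cond_wait(&c->not_full, &c->lock);
        c->items[c->tail] = msg;
        c->tail = (c->tail + 1) % CAP;
        c->count++;
        pthread_cond_signal(&c->not_empty);
        pthread_mutex_unlock(&c->lock);
    }

    void *channel_get(channel *c) {
        pthread_mutex_lock(&c->lock);
        while (c->count == 0)                 /* block while empty */
            pthread_cond_wait(&c->not_empty, &c->lock);
        void *msg = c->items[c->head];
        c->head = (c->head + 1) % CAP;
        c->count--;
        pthread_cond_signal(&c->not_full);
        pthread_mutex_unlock(&c->lock);
        return msg;
    }

Note that the blocking put is exactly the back-pressure mechanism mentioned further down: a fast stage feeding a slow stage simply blocks when the buffer fills up.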
Hoare being Hoare, however, there's quite a bit more to his original theory than that. He invented a notation for talking about the processes, defined a specific set of signals that can be sent between the processes, and so on. The notation has since been refined in various ways, quite a bit of work has been put into proving various aspects, and so on.
Application of that relatively formal model of CSP (as opposed to just the general idea) is much less common. It's been used in a few systems where high reliability was considered extremely important, but few programmers appear interested in learning (yet another) formal design notation.
When I've designed systems like this, I've generally used an approach that's less rigorous, but (at least to me) rather easier to understand: a fairly simple diagram, with boxes representing the processes, and arrows representing the lines of communication. I doubt I could really offer much in the way of a proof about most of the designs (and I'll admit I haven't designed a really huge system this way), but it's worked reasonably well nonetheless.
Take a look at the website for a company called Verum. Their ASD technology is based on CSP and is used by companies like Philips Healthcare, Ericsson and NXP Semiconductors to build software for all kinds of high-tech equipment and applications.
So to answer your question: Yes, CSP is used on large software projects in real-life.
Full disclosure: I do freelance work for Verum
Answering a very old question, but it seems an important one:
There is Go where CSPs are a fundamental part of the language. In the FAQ to Go, the authors write:
Concurrency and multi-threaded programming have a reputation for difficulty. We believe this is due partly to complex designs such as pthreads and partly to overemphasis on low-level details such as mutexes, condition variables, and memory barriers. Higher-level interfaces enable much simpler code, even if there are still mutexes and such under the covers.
One of the most successful models for providing high-level linguistic support for concurrency comes from Hoare's Communicating Sequential Processes, or CSP. Occam and Erlang are two well known languages that stem from CSP. Go's concurrency primitives derive from a different part of the family tree whose main contribution is the powerful notion of channels as first class objects. Experience with several earlier languages has shown that the CSP model fits well into a procedural language framework.
Projects implemented in Go are:
Docker
Google's download server
Many more
This style is ubiquitous on Unix, where many tools are designed to process from standard in to standard out. I don't have any first-hand knowledge of large systems that are built that way, but I've seen many small once-off systems that are.
For instance, this simple command line uses (at least) 3 processes:
cat list-1 list-2 list-3 | sort | uniq > final.list
The following system was only moderately sized: I wrote a protocol processor that strips away and interprets successive layers of protocol in a message, and it used a style very similar to this. It was an event-driven system using something akin to cooperative threading, but I could have used multithreading fairly easily with a couple of added tweaks.
The program is proprietary (unfortunately) so I can't show off the source code.
In my opinion, this style is useful for some things, but usually best mixed with some other techniques. Often there is a core part of your program that represents a processing bottleneck, and applying various concurrency increasing techniques there is likely to yield the biggest gains.
Microsoft had a technology called ActiveMovie (if I remember correctly) that did sequential processing on audio and video streams. Data got passed from one filter to another to go from input to output format (and source/sink). Maybe that's a practical example?
The Wikipedia article looks to me like a lot of funny symbols used to represent somewhat pedestrian concepts. For very large or extensible programs, the formalism can be very important, to check how the (sub)processes are allowed to interact.
For a program in the 50,000-line class, you're probably better off architecting it as you see fit.
In general, following ideas such as these is a good idea in terms of performance. Persistent threads that process data in stages will tend not to contend, and exploit data locality well. Also, it is easy to throttle the threads to avoid data piling up as a fast stage feeds a slow stage: just block the fast one if its output buffer grows too big.
A little bit off-topic, but for my thesis I used a tool framework called TERRA/LUNA, which aims at software development for embedded control systems but is used heavily for all sorts of software development at my institute (so only academic use here).
TERRA is a graphical CSP and software architecture editor, and LUNA is both the name of a C++ library for CSP-based constructs and of the plugin you'll find in TERRA to generate C++ code from your CSP models.
It becomes very handy in combination with FDR3 (a CSP refinement checker) to detect any sort of lock (dead/live/etc.), or even for profiling.

Can one make concurrent, scalable, reliable programs in C as in Erlang?

A theoretical question: after reading Armstrong's Programming Erlang book, I was wondering the following.
It will take some time to learn Erlang, let alone master it. It really is fundamentally different in a lot of respects.
So my question: is it possible to write 'like Erlang', or with some 'Erlang-like framework', such that, given that you take care not to create functions with side effects, you can create scalable, reliable apps as well as in Erlang? Maybe with the same message-passing, loads-of-mini-processes paradigm.
The advantage would be not having to throw all your accumulated C/C++ knowledge over the fence.
Any thoughts about this would be welcome
Yes, it is possible, but...
Probably the best answer for this question is given by Robert Virding's First Rule:
"Any sufficiently complicated concurrent program in another language contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Erlang."
A very good rule is to use the right tool for the task. Erlang excels at concurrency and reliability; C/C++ was not designed with these properties in mind.
If you don't want to throw away your C/C++ knowledge and experience, and your project allows this kind of division, a good approach is to create a mixed solution: write the concurrency, communication and error-handling code in Erlang, then add the C/C++ parts, which will do the CPU- and IO-bound work.
You clearly can - the Erlang/OTP system is largely written in C (and Erlang). The question is 'why would you want to?'
In 'ye olde days' people used to write their own operating system - but why would you want to?
If you elect to use an operating system your unwritten software has certain properties - it can persist to hard disk, it can speak to a network, it can draw on screens, it can run from the command line, it can be invoked in batch mode, etc, etc...
The Erlang/OTP system is 1.5M lines of code which has been demonstrated to give 99.9999999% uptime in large systems (the UK phone system) - that's 31ms downtime a year.
With Erlang/OTP your unwritten software has high reliability, it can hot-swap itself, your unwritten application can failover when a physical computer dies.
Why would you want to rewrite that functionality?
I would break this into two questions:
Can you write concurrent, scalable C++ applications
Yes. It's certainly possible to create the low-level constructs needed in order to achieve this.
Would you want to write concurrent, scalable, C++ applications
Perhaps. But if I was going for a highly concurrent application, I would choose a language that was either designed to fill that void or easily lends itself to doing so (Erlang, F#, and possibly C#).
C++ was not designed to build highly concurrent applications. But it can certainly be tweaked into doing so. The cost might be higher than you expect though once you factor in memory management.
Yes, but you will be doing some extra work.
Regarding side effects: consider the approach the .NET/PLINQ team is taking. PLINQ can't enforce that you hand it code with no side effects, but it assumes you do and plays by those rules, so we get to use a simpler API. Even if the language doesn't have built-in support for this, it still simplifies things, as you can break up the operations more easily.
What I can do in one Turing complete language I can do in any other Turing complete language.
So I interpret your question to read, is it as easy to write a reliable and scalable application in C++ as it is in Erlang?
The answer to that is highly subjective. For me it is easier to write it in C++ for the following reasons:
I have already done it in C++ (at least three times).
I don't know Erlang.
I have read a great deal about Stackless Python, which feels to me like a highly concurrent, message-based, cooperative multitasking system in Python, but of course Python is written on top of C.
Having said that. If you already know both languages, and you have the problem well defined, you can then make the best choice based on all the information you have at hand.
The main 'problem' with C (or C++) for writing reliable and easy-to-extend programs is that in C you can do anything. So the first step would be to write a simple framework that restricts things just a bit; most good programmers do that anyway.
In this case, the restrictions would mostly be there to make it easy to define a 'process' within whatever level of isolation you want. fork() has a reputation for being slow, and threads also need significant time to spawn, so you might want to use cooperative multitasking, which can be far more efficient, and you could even make it preemptive (I think that's what Erlang does). To get multi-core efficiency, set up a pool of threads and make all of them compete to run the tasks.
Another important part would be to create an appropriate library of immutable data structures, so that by using them (instead of the standard lib) your functions would be (mostly) side-effect-free.
Then it's just a matter of designing a good API for message passing and futures... not easy, but at least it doesn't mean changing the language itself. A rough sketch of the process-and-mailbox part follows.
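Here is what that 'process' layer might look like in C, using one OS thread per process for simplicity; a real framework would multiplex many lightweight processes over a thread pool. All the names (process, proc_send, proc_receive, proc_spawn) are illustrative:

    /* An Erlang-flavoured "process": a thread that owns a mailbox and is
       affected by other processes only through messages, never through
       shared mutable state. */
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct msg { struct msg *next; int payload; } msg;

    typedef struct {
        msg *head, *tail;
        pthread_mutex_t lock;
        pthread_cond_t nonempty;
        pthread_t thread;
    } process;

    /* The only way to affect another process: put a message in its mailbox. */
    void proc_send(process *p, int payload) {
        msg *m = malloc(sizeof *m);
        m->payload = payload;
        m->next = NULL;
        pthread_mutex_lock(&p->lock);
        if (p->tail) p->tail->next = m; else p->head = m;
        p->tail = m;
        pthread_cond_signal(&p->nonempty);
        pthread_mutex_unlock(&p->lock);
    }

    /* Blocks until a message arrives, like Erlang's receive. */
    int proc_receive(process *p) {
        pthread_mutex_lock(&p->lock);
        while (!p->head)
            pthread_cond_wait(&p->nonempty, &p->lock);
        msg *m = p->head;
        p->head = m->next;
        if (!p->head) p->tail = NULL;
        pthread_mutex_unlock(&p->lock);
        int v = m->payload;
        free(m);
        return v;
    }

    process *proc_spawn(process *self, void *(*body)(void *)) {
        self->head = self->tail = NULL;
        pthread_mutex_init(&self->lock, NULL);
        pthread_cond_init(&self->nonempty, NULL);
        pthread_create(&self->thread, NULL, body, self);
        return self;
    }

    static void *echo(void *arg) {
        process *self = arg;
        for (;;) {
            int v = proc_receive(self);
            if (v < 0) break;             /* negative payload = "stop" */
            printf("got %d\n", v);
        }
        return NULL;
    }

    int main(void) {
        process p;
        proc_spawn(&p, echo);
        proc_send(&p, 42);
        proc_send(&p, -1);
        pthread_join(p.thread, NULL);
        return 0;
    }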