RPC from C++ code to Common Lisp code - c++

I have two codebases: one written in C++ and the other in Common Lisp. There is a particular functionality implemented in the Lisp codebase that I would like to access from my C++ code. I searched for Foreign Function Interfaces to call Lisp functions from C++, but couldn't seem to find any (I found FFIs for the other direction mostly). So I decided to implement some form of RPC that fits my requirements, which are:
both codes are going to run on the same machine, so extensibility to remote machine calls is not important.
the input from C++ is going to be a Lisp-style list, which is what the function from the Lisp code is going to take as input.
this call is going to be made 1000s of times per execution of the code, so performance per remote call is critical.
So far, I've learnt from various resources on the web that possible solutions are:
Sockets - set up an instance of the Lisp code that will listen for function calls from the C++ code, run the function on the given input, and return the result to the C++ code.
XML-RPC - set up an XML-RPC server on the Lisp side (which will be easy since I use Allegro Common Lisp, which provides an API that supports XML-RPC) and then use one of the many XML-RPC libraries for C++ to make the client-side call.
The pros and cons I see with these approaches seem to be the following:
Sockets are a low-level construct, so it looks like I would need to do most of the connection management, reading and parsing the data on the sockets, etc on my own.
XML-RPC seems to suit my needs much better, but I read that it always uses HTTP, and there is no way to use UNIX domain sockets. So, it feels like XML-RPC might be overkill for what I have in mind.
Does anyone have any experience in achieving some similar integration of codes? Are there significant differences in performance between sockets and XML-RPC for local RPC? Any advice on which approach might be better would be extremely helpful. Also, suggestions on a different technique to do this would also be appreciated.
EDIT: Here are a few more details on the shared functionality. There is a function f available in the Lisp code (which is complex enough to make reimplementation in C++ prohibitively expensive). It takes as input two lists L1 and L2. How I envision this happening is the following:
L1 and L2 are constructed in C++ and sent over to the Lisp side, and the C++ side waits for the results,
f is invoked on the Lisp side on inputs L1 and L2 and returns results back to the C++ side,
the C++ side takes in the results and continues with its computation.
The sizes of L1 and L2 are typically not big:
L1 is a list containing typically 100s of elements, each element being a list of at most 3-4 atoms.
L2 is also a list, containing < 10 elements, each element being a list of at most 3-4 atoms.
So the total amount of data per RPC is probably a string of 100s/1000s of bytes. This call is made at the start of each while loop in my C++ code, so it's hard to give concrete numbers on the number of calls per second. But from my experiments, I can say that it's typically done 10s-100s of times per second. f is not a numerical computation: it's symbolic. If you're familiar with AI, it's essentially doing symbolic unification in first-order logic. So it is free of side-effects.

If you look at some Common Lisp implementations, their FFIs allow calling Lisp from the C side. That's not remote, but local. Sometimes it makes sense to include Lisp directly, and not call it remotely.
Commercial Lisps like LispWorks or Allegro CL can also deliver shared libraries, which you can use from your application code.
For example define-foreign-callable allows a LispWorks function to be called.
Franz ACL can do it: http://www.franz.com/support/documentation/9.0/doc/foreign-functions.htm#lisp-from-c-1
Also something like ECL should be usable from the C side.

I've started working recently on a project that requires similar functionality. Here are some things I've researched so far with some commentary:
cl-mpi would in principle allow (albeit very low-level) direct inter-process communication, but encoding data is a nightmare! You have a very uncomfortable design on the C/C++ side (just very, very limited, and there's no way around sending variable-length arrays). And on the other side, the Lisp library is both dated and seems to be at a very early stage in its development.
Apache Thrift, which is more of a language than a program. Slow, memory hog. Protobuf, BSON are the same. Protobuf might be the most efficient in this group, but you'd need to roll your own communication solution; it's only the encoding/decoding protocol.
XML, JSON, S-expressions. S-expressions win in this category because they are more expressive and one side already has a very efficient parser. Alas, this is even worse than Thrift / Protobuf in terms of speed / memory.
CFFI. Sigh... Managing pointers on both sides will be a nightmare. It is possible in theory, but must be very difficult in practice. This will also inevitably tax the performance of Lisp garbage collector, because you would have to get in its way.
Finally, I switched to ECL. So far so good. I'm researching mmapped files as a means of sharing data. My conclusion so far is that this will be the way to go. At least I can't think of anything better at the moment.

There are many other ways to make two processes communicate. You could read the inter-process communication wikipage.
One of the parameters is the asynchronous or synchronous character. Is your remote processing a remote procedure call (every request from the client has exactly one response from the server) or is it asynchronous message passing (both sides send messages, but there is no notion of request and response; each side handles incoming messages as events)?
The other parameter is the latency and bandwidth i.e. the volume of data exchanged (per message and e.g. per second).
Bandwidth does matter, even on the same machine. Of course, pipes or Unix sockets give you a very big bandwidth, e.g. 100 Megabytes/second. But there are scenarios where that might not be enough. In the pipe case, the data is usually copied (often twice) from memory to memory (e.g. from one process address space to another one).
But you might consider e.g. CORBA (see e.g. CLORB on the lisp side, and this tutorial on OmniORB), or RPC/XDR, or XML-RPC (with S-XML-RPC on the lisp side), or JSON-RPC etc...
If you don't have a lot of data and don't need a lot of bandwidth (or many requests or messages per second), I would suggest using a textual protocol (perhaps serializing with JSON or YAML or XML) because it is easier than a binary protocol (BSON, protobuf, etc...)
The socket layer (which could use unix(7) AF_UNIX sockets, plain anonymous or named pipe(7)-s, or tcp(7) i.e. TCP/IP, which has the advantage of giving you the ability to distribute the computation on two machines communicating by a network) is probably the simplest, as soon as you have on both (C++ and Lisp) sides a multiplexing syscall like poll(2). You need to buffer messages on both sides.
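To make the socket suggestion a bit more concrete, here is a minimal sketch, assuming a POSIX system. It uses socketpair(2) inside a single process as a stand-in for the two programs, so the helper name send_and_receive_locally is purely hypothetical; in a real setup one end would live in the C++ process and the other in the Lisp process. It also assumes the request is small (hundreds or thousands of bytes, as in the question), so the write cannot fill the kernel buffer before anything is read.

```cpp
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: pushes `request` into one end of an AF_UNIX socket
// pair and reads it back from the other end, waiting with poll(2).
std::string send_and_receive_locally(const std::string& request) {
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        throw std::runtime_error("socketpair failed");

    // write(2) may accept fewer bytes than requested, so loop until done.
    std::size_t sent = 0;
    while (sent < request.size()) {
        ssize_t n = write(fds[0], request.data() + sent, request.size() - sent);
        if (n <= 0) {
            close(fds[0]);
            close(fds[1]);
            throw std::runtime_error("write failed");
        }
        sent += static_cast<std::size_t>(n);
    }
    shutdown(fds[0], SHUT_WR);  // tell the reader that the message is complete

    // Buffer incoming data until the writer's end-of-stream is seen.
    std::string received;
    char buffer[4096];
    pollfd pfd{fds[1], POLLIN, 0};
    for (;;) {
        if (poll(&pfd, 1, 1000) <= 0) break;  // timeout or error: stop waiting
        ssize_t n = read(fds[1], buffer, sizeof buffer);
        if (n <= 0) break;                    // 0 means the writer has finished
        received.append(buffer, static_cast<std::size_t>(n));
    }
    close(fds[0]);
    close(fds[1]);
    return received;
}
```

Calling send_and_receive_locally("((a b) (c d))") should hand back the same text. Using shutdown to mark the end of a message only works for one message per connection; for a long-lived connection you would need some framing convention instead.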
Maybe you want MPI (with CL-MPI on the lisp side).
We can't help you more, unless you explain really well and much more in the details what is the "functionality" to be shared from C++ to Lisp (what is it doing, how many remote calls per second, what volume and kind of data, what computation time, etc etc....). Is the remote function call idempotent or nullipotent, does it have side-effects? Is it a stateless protocol...
The actual data types involved in the remote procedure call matter a lot: it is much more costly to serialize a complex [mathematical] cyclic graph with shared nodes than a plain human-readable string....
Given your latest details, I would suggest using JSON... It is quite fit to transmit abstract syntax tree like data. Alternatively, transmit just s-expressions (you may be left with the small issue in C++ to parse them, which is really easy once you specified and documented your conventions; if your leaf or symbolic names have arbitrary characters, you just need to define a convention to encode them.).
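As an illustration of how little C++ the s-expression route needs, here is a rough sketch of a printer and a recursive-descent parser. Everything in it (SExpr, make_atom, make_list, print_sexpr, parse_sexpr) is a made-up name for this example, and it assumes the simplest possible convention: atoms contain no whitespace or parentheses. Atoms with arbitrary characters would need the extra encoding convention mentioned above.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// A node is either an atom (plain text) or a list of child nodes.
struct SExpr {
    bool is_atom = false;
    std::string atom;
    std::vector<SExpr> items;
};

SExpr make_atom(const std::string& text) {
    SExpr e;
    e.is_atom = true;
    e.atom = text;
    return e;
}

SExpr make_list(std::vector<SExpr> items) {
    SExpr e;
    e.items = std::move(items);
    return e;
}

// Print as Lisp text: atoms verbatim, lists in parentheses separated by spaces.
std::string print_sexpr(const SExpr& e) {
    if (e.is_atom) return e.atom;
    std::string out = "(";
    for (std::size_t i = 0; i < e.items.size(); ++i) {
        if (i > 0) out += ' ';
        out += print_sexpr(e.items[i]);
    }
    return out + ")";
}

static void skip_spaces(const std::string& text, std::size_t& pos) {
    while (pos < text.size() && std::isspace(static_cast<unsigned char>(text[pos]))) ++pos;
}

// Parse one expression starting at `pos`, advancing `pos` past it.
SExpr parse_sexpr(const std::string& text, std::size_t& pos) {
    skip_spaces(text, pos);
    if (pos >= text.size()) throw std::runtime_error("unexpected end of input");
    if (text[pos] == ')') throw std::runtime_error("unexpected ')'");
    if (text[pos] == '(') {
        ++pos;
        SExpr list;
        for (;;) {
            skip_spaces(text, pos);
            if (pos >= text.size()) throw std::runtime_error("missing ')'");
            if (text[pos] == ')') { ++pos; return list; }
            list.items.push_back(parse_sexpr(text, pos));
        }
    }
    std::size_t start = pos;
    while (pos < text.size() && text[pos] != '(' && text[pos] != ')' &&
           !std::isspace(static_cast<unsigned char>(text[pos]))) ++pos;
    return make_atom(text.substr(start, pos - start));
}

SExpr parse_sexpr(const std::string& text) {
    std::size_t pos = 0;
    return parse_sexpr(text, pos);
}
```

With this, the C++ side can build L1 and L2 as SExpr values, send print_sexpr(...) as text, and run parse_sexpr on whatever the Lisp side prints back.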

Related

What are the approaches and trade-offs involved in a platform-independent representation?

Before saying anything I have to say that, albeit I'm an experienced programmer in Java, I'm rather new to C / C++ programming.
I have to save a binary file in a format that makes it accessible from different operating systems & platforms. It should be very efficient because I have to deal with a lot of data. What approaches should I investigate for that? What are the main advantages and disadvantages?
Currently I'm thinking about using the network notation (something like htonl that is available both under unix and windows ). Is there a better way?
Network order (big-endian) is something of a de facto standard. However, if your program will be used mostly on x86 (which is little-endian), you may want to stick with that for performance reasons (the protocol will still be usable on big-endian machines, but they will instead have the performance impact).
Besides htonl (which converts 32-bit values), there's also htons (16-bit), and bswap_64 (non-standard for 64-bit).
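Since the 64-bit swap is non-standard, one portable alternative is to write the bytes out with shifts, which gives the same on-disk layout regardless of the host's endianness. A small sketch (store_be64 and load_be64 are names invented for this example):

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Store a 64-bit value as 8 bytes, most significant byte first (network order).
std::array<std::uint8_t, 8> store_be64(std::uint64_t value) {
    std::array<std::uint8_t, 8> bytes{};
    for (std::size_t i = 0; i < 8; ++i)
        bytes[i] = static_cast<std::uint8_t>(value >> (8 * (7 - i)));
    return bytes;
}

// Rebuild the value from 8 big-endian bytes.
std::uint64_t load_be64(const std::array<std::uint8_t, 8>& bytes) {
    std::uint64_t value = 0;
    for (std::uint8_t b : bytes) value = (value << 8) | b;
    return value;
}
```

Because the code never reinterprets memory, it behaves identically on little-endian and big-endian hosts; whether it is fast enough for a lot of data is something to measure rather than assume.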
If you want a binary format, but you'd like to abstract away some of the details to ease serialization and deserialization, consider Protocol Buffers or Thrift. Protocol Buffers are updatable (you can add optional or repeated (0 or more) fields to the schema without breaking existing code); not sure about Thrift.
However, before premature optimization, consider whether parsing is really the bottleneck. If reading every line of the file will require a database query or calculation, you may be able to use a more readable format without any noticeable performance impact.
I think there are a couple of decent choices for this kind of task.
In most cases, my first choice would probably be Sun's (now Oracle's) XDR. It's used in Sun's implementation of RPC, so it's been pretty heavily tested for quite a while. It's defined in RFC 1832 (since superseded by RFC 4506), so documentation is widely available. There are also libraries (portable and otherwise) that know how to convert to/from this format. On-wire representation is reasonably compact and conversion fairly efficient.
The big potential problem with XDR is that you do need to know what the data represents to decode it -- i.e., you have to (by some outside means) ensure that the sender and receiver agree on (for example) the definition of the structs they'll send over the wire, before the receiver can (easily) understand what's being sent.
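To show what that means in practice, here is a hand-rolled sketch of two XDR rules: unsigned integers are four bytes, big-endian, and strings are a four-byte length followed by the bytes, zero-padded to a multiple of four. The helpers put_uint32 and put_string are hypothetical names for this example, not functions from any XDR library.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Append a 32-bit unsigned integer as four big-endian bytes.
void put_uint32(std::vector<std::uint8_t>& out, std::uint32_t value) {
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<std::uint8_t>(value >> shift));
}

// Append a string: length, then the bytes, then zero padding up to a
// multiple of four bytes.
void put_string(std::vector<std::uint8_t>& out, const std::string& text) {
    put_uint32(out, static_cast<std::uint32_t>(text.size()));
    out.insert(out.end(), text.begin(), text.end());
    std::size_t padding = (4 - text.size() % 4) % 4;
    out.insert(out.end(), padding, 0);
}
```

Encoding the number 7 followed by the string "abc" gives twelve bytes: 00 00 00 07, 00 00 00 03, 61 62 63 00. Nothing in those bytes says "an integer, then a string"; the receiver has to know that order in advance.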
If you need to create a stream that's entirely self-describing, so somebody can figure out what it contains based only on the content of the stream itself, then you might consider ASN.1. It's crufty and nasty in some ways, but it does produce self-describing streams, is publicly documented, and it's used pretty widely (though mostly in rather limited domains). There are a fair number of libraries that implement encoding and decoding. I doubt anybody really likes it much, but if you need what it does, it's probably the first choice, if only because it's already known and somewhat accepted.
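For a feel of what "self-describing" looks like, here is a rough sketch of the tag-length-value layout used by DER, one of the ASN.1 encoding rules. It only handles INTEGER (tag 0x02) and UTF8String (tag 0x0C) with short-form lengths (content under 128 bytes); the function names are made up for this example, and a real project would use one of the existing libraries.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// One-byte tag, one-byte short-form length, then the content.
Bytes tlv(std::uint8_t tag, const Bytes& content) {
    if (content.size() >= 128)
        throw std::length_error("long-form lengths are not handled in this sketch");
    Bytes out{tag, static_cast<std::uint8_t>(content.size())};
    out.insert(out.end(), content.begin(), content.end());
    return out;
}

// DER INTEGER: big-endian two's complement using as few bytes as possible.
Bytes der_integer(std::int64_t value) {
    Bytes body;
    for (int shift = 56; shift >= 0; shift -= 8)
        body.push_back(static_cast<std::uint8_t>(static_cast<std::uint64_t>(value) >> shift));
    std::size_t start = 0;
    while (start + 1 < body.size()) {
        bool next_sign_bit = (body[start + 1] & 0x80) != 0;
        bool redundant = (body[start] == 0x00 && !next_sign_bit) ||
                         (body[start] == 0xFF && next_sign_bit);
        if (!redundant) break;
        ++start;
    }
    return tlv(0x02, Bytes(body.begin() + static_cast<std::ptrdiff_t>(start), body.end()));
}

// DER UTF8String: the bytes of the string as-is.
Bytes der_utf8(const std::string& text) {
    return tlv(0x0C, Bytes(text.begin(), text.end()));
}

// Walk a stream and name each element using only the tags found in it.
std::vector<std::string> describe(const Bytes& stream) {
    std::vector<std::string> kinds;
    std::size_t pos = 0;
    while (pos + 2 <= stream.size()) {
        std::uint8_t tag = stream[pos];
        std::size_t length = stream[pos + 1];
        if (length >= 128 || pos + 2 + length > stream.size())
            throw std::runtime_error("malformed or long-form element");
        kinds.push_back(tag == 0x02 ? "INTEGER" : tag == 0x0C ? "UTF8String" : "other");
        pos += 2 + length;
    }
    return kinds;
}
```

The point of describe is that it recovers the element types without any schema, which is exactly what a plain XDR stream cannot offer.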
My first choice for a situation like this would be ASN.1 since it gives you the flexibility of using whatever programming language you desire on either end, as well as being platform independent. It hides the endianness issues from you so you don't have to worry about them. One end can use Java while the other end uses C or C++ or C#. It also supports multiple encoding rules you can choose from depending on your needs. There is PER (Packed Encoding Rules) if the goal is making the encoding as small as possible, or there is E-XER (Extended XML Encoding Rules) if you prefer to exchange information using XML, or there is DER (Distinguished Encoding Rules) if your application involves digital signatures or certificates. ASN.1 is widely used in telephony, but also used in banking, automobiles, aviation, medical devices, and several other areas. It is a mature, proven technology that has stood the test of time and continues to be adopted in new areas where communication between disparate machines and programming languages is needed.
An excellent resource where you can try ASN.1 free is http://asn1-playground.oss.com where you can play with some existing ASN.1 specifications, or try creating your own, and see what the various encoding rules produce.
There are some excellent books available as a free download from http://www.oss.com/asn1/resources/books-whitepapers-pubs/asn1-books.html where the first one is titled "ASN.1 — Communication Between Heterogeneous Systems".

Recommendations for C/C++ remote message queues

I am working on a project which involves several C++ programs that each take input and generate output. The data (tens to hundreds of bytes, probably JSON) essentially flows (asynchronously) in one direction, and the programs will need to be located on different Linux computers around the LAN.
Since the data flows in only one direction, I don't believe I need a transactional model like HTTP. I think a message queue model (fire and forget) makes the most sense and should simplify the logic of each program. It is probably sufficient to merely note that the message was added to the remote queue successfully.
What I am looking for are recommendations for how to implement this message queue in C or C++. It seems like POSIX and Boost message queues are limited to a single host, and RabbitMQ seems to have weak C/C++ support, and MQ4CPP seems inadequately supported for a business-critical role. Am I wrong about this? What about Boost ASIO or ACE or writing socket code myself? I look forward to your suggestions.
In terms of simple messaging support, ZeroMQ is hard to beat. It's available in many language bindings and supports everything from simple send and receive to pub/sub, fanout, or even a messaging pipeline. The code is also easy to digest and makes it pretty easy to switch between patterns.
Looking at their Weather Update Server sample (in 20 some odd languages) shows how easy it can be to create publish/subscribe setups:
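#include <zmq.hpp>
#include <cstdio>

int main() {
    zmq::context_t context (1);
    zmq::socket_t publisher (context, ZMQ_PUB);
    // Publish on both a TCP port and a local IPC endpoint
    publisher.bind("tcp://*:5556");
    publisher.bind("ipc://weather.ipc");
    // Placeholder readings; the original sample generates random values here
    int zipcode = 10001, temperature = 20, relhumidity = 40;
    while (1) {
        // Send message to all subscribers
        zmq::message_t message(20);
        snprintf ((char *) message.data(), 20 ,
            "%05d %d %d", zipcode, temperature, relhumidity);
        // Newer cppzmq versions prefer send(message, zmq::send_flags::none)
        publisher.send(message);
    }
    return 0;
}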
I've used it on some mixed C# and Python processes without much hassle.
Personally, if I understand the question, I think that you should use a lower-level TCP connection. It has all of the guaranteed delivery that you want, and has a rather good Berkeley Sockets API.
I've found that if you're willing to implement a very simple protocol (e.g. four-byte NBO message length, n bytes of data), you can get something very simple and very customizable. If you go with this, you also (as mentioned) get great C support (which means C++ support, although things aren't in classes and methods). The socket code is also very easy, and you get asynchronous IO with the standard async flags for the Linux/UNIX/POSIX IO functions (that's one of the other benefits: if you know anything about POSIX programming, you basically know the socket API).
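A minimal sketch of that kind of framing, independent of the actual socket calls (frame and extract_frames are names invented for this example). The receiving side keeps appending whatever recv hands it to a buffer and pulls out complete messages as they become available:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Prefix the payload with its length as four bytes in network byte order.
std::string frame(const std::string& payload) {
    std::uint32_t length = static_cast<std::uint32_t>(payload.size());
    std::string out;
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<char>((length >> shift) & 0xFF));
    return out + payload;
}

// Remove every complete message from the front of `buffer` and return them.
// A partial message at the end stays in the buffer until more bytes arrive.
std::vector<std::string> extract_frames(std::string& buffer) {
    std::vector<std::string> messages;
    while (buffer.size() >= 4) {
        std::uint32_t length = 0;
        for (std::size_t i = 0; i < 4; ++i)
            length = (length << 8) | static_cast<unsigned char>(buffer[i]);
        if (buffer.size() - 4 < length) break;  // body not fully received yet
        messages.push_back(buffer.substr(4, length));
        buffer.erase(0, 4 + static_cast<std::size_t>(length));
    }
    return messages;
}
```

TCP is a byte stream, so one recv may deliver half a message or three at once; keeping the leftover bytes in the buffer is what makes the length prefix work.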
Two of the best resources for learning the socket API are:
Beej's Guide to Network Programming: http://beej.us/guide/bgnet/, this is very good if you need the overall programming model in addition to specifics
Man Pages: If you just need function signatures, return values, and arguments, these are all you need. I find the Linux ones to be very well written and useful (Proof: Look at my console: man, man, man, man, man, make, man, ...)
Also, for making data network-sendable, if your data is JSON, you have no worries. Because JSON is just text (ASCII or UTF-8), it can be sent raw over the network with only a length header. Unless you're trying to send something complicated in binary, this should be perfect (if you need complicated binary data, either look at serialization or prepare for a lot of segmentation faults).
Also, if you go the socket path, you probably want to use TCP. Although UDP would give you the one-way aspect, making it reliable pits your home-baked solution against the top-of-the-line TCP implementation in the Linux kernel, so TCP is the obvious option.
RabbitMQ is just one implementation of AMQP. You might want to investigate Apache Qpid or other variants that might be more C/C++ friendly. There is a libamqp for C though I have no first hand experience with it. I don't know exactly what your requirements are but AMQP, properly implemented, is industrial strength and should be orders of magnitude faster and more stable than anything you are going to build by hand in a short amount of time.
I am using Boost Serialization and socket sending for a similar application. You can find an example of serialization here :
http://code.google.com/p/cloudobserver/wiki/TutoriaslBoostSerialization
And on this page:
http://www.boost.org/doc/libs/1_38_0/doc/html/boost_asio/examples.html
under serialization you will find examples on how to make servers and clients. Make one server on a particular port and you can generate multiple clients on multiple computers which can communicate with that port.
The downside to using boost serialization is that it has a large overhead if you have a simple data structure to be serialized but it does make it easy.
Another recommendation is the distributed framework OpenCL. The document The OpenCL C++ Wrapper for API provides further information on the library. In particular, the API function cl::CommandQueue could be of interest for creating queues on devices within a network setup.
Another messaging solution is ICE (http://www.zeroc.com/). It is multi-platform, multi-language. It uses more of an RPC approach.

Concurrency model: Erlang vs Clojure

We are going to write a concurrent program using Clojure, which is going to extract keywords from a huge amount of incoming mail which will be cross-checked with a database.
One of my teammates has suggested to use Erlang to write this program.
I should note that I am new to functional programming, so I am in some doubt whether Clojure is a good choice for writing this program, or whether Erlang is more suitable.
Do you really mean concurrent or distributed?
If you mean concurrent (multi-threaded, multi-core etc.), then I'd say Clojure is the natural solution.
Clojure's STM model is perfectly designed for multi-core concurrency since it is very efficient at storing and managing shared state between threads. If you want to understand more, well worth looking at this excellent video.
Clojure STM allows safe mutation of data by concurrent threads. Erlang sidesteps this problem by making everything immutable, which is fine in itself but doesn't help when you genuinely need shared mutable state. If you want shared mutable state in Erlang, you have to implement it with a set of message interactions which is neither efficient nor convenient (that's the price of a nothing shared model....)
You will get inherently better performance with Clojure if you are in a concurrent setting in a large machine, since Clojure doesn't rely on message passing and hence communication between threads can be much more efficient.
If you mean distributed (i.e. many different machines sharing work over a network which are effectively running as isolated processes) then I'd say Erlang is the more natural solution:
Erlang's immutable, nothing-shared, message passing style forces you to write code in a way that can be distributed. So idiomatic Erlang automatically can be distributed across multiple machines and run in a distributed, fault-tolerant setting.
Erlang is therefore very well optimised for this use case, so would be the natural choice and would certainly be the quickest to get working.
Clojure could do it as well, but you will need to do much more work yourself (i.e. you'd either need to implement or choose some form of distributed computing framework) - Clojure does not currently come with such a framework by default.
In the long term, I hope that Clojure develops a distributed computing framework that matches Erlang - then you can have the best of both worlds!
The two languages and runtimes take different approaches to concurrency:
Erlang structures programs as many lightweight processes communicating between one another. In this case, you will probably have a master process sending jobs and data to many workers and more processes to handle the resulting data.
Clojure favors a design where several threads share data and state using common data structures. It sounds particularly suitable for cases where many threads access the same data (read-only) and share little mutable state.
You need to analyze your application to determine which model suits you best. This may also depend on the external tools you use -- for example, the ability of the database to handle concurrent requests.
Another practical consideration is that clojure runs on the JVM where many open source libraries are available.
Clojure is Lisp running on the Java JVM. Erlang is designed from the ground up to be highly fault tolerant and concurrent.
I believe the task is doable with either of these languages and many others as well. Your experience will depend on how well you understand the problem and how well you know the language. If you are new to both, I'd say the problem will be challenging no matter which one you choose.
Have you thought about something like Lucene/Solr? It's great software for indexing and searching documents. I don't know what "cross checking" means for your context, but this might be a good solution to consider.
My approach would be to write a simple test in each language and test the performance of each one. Both languages are somewhat different to C style languages and if you aren't used to them (and you don't have a team that is used to them) you may end up with a maintenance nightmare.
I'd also look at using something like Groovy 1.8. Groovy now includes GPars to enable parallel computing. String and file manipulation in Groovy is very easy indeed.
It depends what you mean by huge.
Strings in erlang are painful..
but:
If huge means tens of distributed machines, then go with Erlang and write workers in text-friendly languages (Python? Perl?). You will have a distributed layer on top with highly concurrent local workers. Each worker would be represented by an Erlang process. If you need more performance, rewrite your worker in C. In Erlang it is super easy to talk to other languages.
If huge still means one strong machine, go with the JVM. It is not huge then.
If huge is hundreds of machines, I think you will need something stronger and Google-like (Bigtable, map/reduce), probably on a C++ stack. Erlang is still OK, however you will need good devs to code it.

Erlang style concurrency in the D programming language

I think Erlang-style concurrency is the answer to exponential growth of core count. You can kind of fake it with other main stream languages. But the solutions always leave me wanting. I am not willing to give up multi-paradigm programming (C++/D) to switch to Erlang's draconian syntax.
What is Erlang-style concurrency:
From one of the language authors(What is Erlang's concurrency model actually ?):
Lightweight concurrency.
Cheap to create threads and cheap to maintain insane numbers.
Asynchronous communication.
Threads only communicate via messages.
Error handling.
Process isolation.
Or from an informed blogger (What is Erlang-Style Concurrency?):
Fast process creation/destruction
Ability to support >> 10 000 concurrent processes with largely unchanged characteristics.
Fast asynchronous message passing.
Copying message-passing semantics (share-nothing concurrency).
Process monitoring.
Selective message reception.
I think D's message passing can accomplish most of these features. The ones I wonder about are ">>10,000 concurrent processes(threads)" and "fast process creation/destruction".
How well does D handle these requirements?
I think that to support them correctly you'd have to use green threads. Can D's message passing features be used with green threads library?
Storage is thread-local by default in D, so nothing is shared between threads unless it is specifically marked as shared. If you mark a variable as shared, you can then use the traditional mutexes and conditions as well as synchronized objects and the like to deal with concurrency. However, the preferred means of communicating between threads is to use the message passing facilities in std.concurrency and let all data stay thread-local, only using shared when you must. All objects passed between threads using std.concurrency must either be passed by value or be immutable, so no sharing occurs there and it is completely thread-safe. However, it can currently be a bit of a pain to get an immutable reference type which isn't an array (idup generally makes it easy for arrays), so it can be a bit annoying to pass anything other than value types or arrays (though hopefully that situation improves soon as compiler and standard library bugs relating to const and immutable get fixed and more code is made const-correct).
Now, while message passing in D will definitely result in cleaner, safer code than what you'd get in languages like C++ or Java, it is built on top of normal, C threads (e.g. Linux uses pthreads), so it does not have the kind of light-weight threads that Erlang does, and so dealing with multiple threads is not going to be as efficient as Erlang.
Of course, I don't see any reason why a more efficient thread system could not be written using D, at which point you might be able to get thread efficiency similar to that of Erlang, and it could presumably use an API similar to that of std.concurrency, but all of D's standard threading stuff is built on top of normal, C threads, so you'd have to do all of that yourself, and depending on how you implemented it and depending on how exactly the thread-local/shared stuff is dealt with by the compiler and druntime, it could be difficult to get the type system to enforce that everything be thread-local with your "green" threads. I'm afraid that I don't know enough about exactly how shared is implemented or how "green" threads work to know for sure.
Regardless, D's message passing system will certainly result in dealing with threads being more pleasant than C++ or even Java, but it's not designed to be streamlined in the same way that Erlang is. D is a general purpose systems language, not a language specifically designed to use threads for everything and thus to use them absolutely as efficiently as possible. A large portion of D's standard facilities are built on top of C, so a lot of its efficiency characteristics will be similar to those of C.
This functionality is frequently used in combination with async I/O to efficiently communicate with external sources of data as well. The vibe.d framework seems to offer both the many-fibers-on-a-few-OS-threads threading model and async I/O libraries (in addition to a whole bunch of web application libraries and project management tools).
As an unrelated side note, it's pretty freaking cool that D is both low-level enough that you could write this framework in it and high-level enough to be a compelling language to write your web applications in on top of the framework. Other popular languages with similar frameworks (node.js, Ruby's EventMachine, coroutines in Python and Go) are unable to compete with D on low-level systems coding. Other popular languages with similar systems programming facilities (C, C++) can't compete on high-level application coding.
I'm new to D, but I gotta say, I like what I see.
From whatever little I know about D: its message passing infrastructure is built on top of its threading facilities. If the core threading library is a wrapper on OS threads, there is little chance that concurrency in D will reach the magnitude (>> 10000) of Erlang. Moreover, D does not enforce immutability on objects, so it is easy to mess things up. So, Erlang is the best choice for heavy concurrency. You can probably write the concurrency stuff in Erlang and the rest of the project in D. Still, it is possible to have efficient green threads in C-like languages (C++, D etc) - have a look at Protothreads and ZeroMQ. You can implement very efficient messaging frameworks using these, and call them via a C shim or directly from D.

can one make concurrent scalable reliable programs in C as in erlang?

A theoretical question. After reading Armstrong's 'Programming Erlang' book I was wondering the following:
It will take some time to learn Erlang. Let alone master it. It really is fundamentally different in a lot of respects.
So my question: is it possible to write 'like Erlang', or with some 'Erlang-like framework', so that, provided you take care not to create functions with side effects, you can create scalable, reliable apps as well as in Erlang? Maybe with the same message sending and loads of 'mini processes' paradigm.
The advantage would be to not throw all your accumulated C/C++ knowledge over the fence.
Any thoughts about this would be welcome
Yes, it is possible, but...
Probably the best answer for this question is given by Robert Virding’s First Rule:
“Any sufficiently complicated concurrent program in another language contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Erlang.”
A very good rule is to use the right tool for the task. Erlang excels in concurrency and reliability. C/C++ was not designed with these properties in mind.
If you don't want to throw away your C/C++ knowledge and experience and your project allows this kind of division, a good approach is to create a mixed solution. Write the concurrent, communication and error handling code in Erlang, then add C/C++ parts, which will do the CPU and IO bound stuff.
You clearly can - the Erlang/OTP system is largely written in C (and Erlang). The question is 'why would you want to?'
In 'ye olde days' people used to write their own operating system - but why would you want to?
If you elect to use an operating system your unwritten software has certain properties - it can persist to hard disk, it can speak to a network, it can draw on screens, it can run from the command line, it can be invoked in batch mode, etc, etc...
The Erlang/OTP system is 1.5M lines of code which has been demonstrated to give 99.9999999% uptime in large systems (the UK phone system) - that's 31ms downtime a year.
With Erlang/OTP your unwritten software has high reliability, it can hot-swap itself, your unwritten application can failover when a physical computer dies.
Why would you want to rewrite that functionality?
I would break this into 2 questions
Can you write concurrent, scalable C++ applications
Yes. It's certainly possible to create the low level constructs needed in order to achieve this.
Would you want to write concurrent, scalable, C++ applications
Perhaps. But if I was going for a highly concurrent application, I would choose a language that was either designed to fill that void or easily lent itself to doing so (Erlang, F# and possibly C#).
C++ was not designed to build highly concurrent applications. But it can certainly be tweaked into doing so. The cost might be higher than you expect though once you factor in memory management.
Yes, but you will be doing some extra work.
Regarding side effects, consider how the .NET/PLINQ team approaches this. PLINQ can't enforce that the code you hand it has no side effects, but it assumes that it doesn't; as long as you play by its rules, you get to use a simpler API. Even if the language doesn't have built-in support for it, it will still simplify things, as you can break up the operations more easily.
What I can do in one Turing complete language I can do in any other Turing complete language.
So I interpret your question to read, is it as easy to write a reliable and scalable application in C++ as it is in Erlang?
The answer to that is highly subjective. For me it is easier to write it in C++ for the following reasons:
I have already done it in C++ (at least three times).
I don't know Erlang.
I have read a great deal about Stackless Python, which feels to me like a highly concurrent message based cooperative multitasking system in python, but of course python is written on top of C.
Having said that. If you already know both languages, and you have the problem well defined, you can then make the best choice based on all the information you have at hand.
the main 'problem' with C (or C++) for writing reliable and easy to extend programs is that in C you can do anything. so, the first step would be to write a simple framework that restricts just a bit. most good programmers do that anyway.
in this case, the restrictions would be mostly to make it easy to define a 'process' within whatever level of isolation you want. fork() has a reputation of being slow, and threads also need significant time to spawn, so you might want to use cooperative multitasking, which can be far more efficient, and you could even make it preemptive (i think that's what Erlang does). to get multi-core efficiency, set up a pool of threads and make all of them compete to run the tasks.
another important part would be to create an appropriate library of immutable data structures, so that using them (instead of the standard lib) your functions would be (mostly) side-effect-free.
then it's just a matter of setting a good API for message passing and futures... not easy, but at least it doesn't seem like changing the language itself.
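As a rough idea of what such a message-passing API could look like in standard C++17, here is a sketch of a mailbox plus a worker that only ever sees the messages sent to it, with the final result handed back through a future. Mailbox and sum_with_worker are names invented for this example, and it uses ordinary OS threads, so it shows the programming model rather than Erlang-level cheapness of processes.

```cpp
#include <condition_variable>
#include <future>
#include <mutex>
#include <optional>
#include <queue>
#include <utility>
#include <vector>

// A thread-safe queue of messages owned by one receiving "process".
template <typename T>
class Mailbox {
public:
    void send(T message) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(message));
        }
        ready_.notify_one();
    }

    // No more messages will be sent; wakes up a blocked receiver.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Blocks until a message arrives; returns nullopt once closed and drained.
    std::optional<T> receive() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !queue_.empty() || closed_; });
        if (queue_.empty()) return std::nullopt;
        T message = std::move(queue_.front());
        queue_.pop();
        return message;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> queue_;
    bool closed_ = false;
};

// The worker keeps its running total private and talks only via the mailbox.
int sum_with_worker(const std::vector<int>& values) {
    Mailbox<int> inbox;
    std::future<int> result = std::async(std::launch::async, [&inbox] {
        int total = 0;
        while (auto message = inbox.receive()) total += *message;
        return total;
    });
    for (int value : values) inbox.send(value);
    inbox.close();
    return result.get();
}
```

Messages are copied or moved into the mailbox, so the sender and the worker never share mutable state; that is the part of the Erlang model this sketch tries to imitate.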