Dining philosophers from Rust documentation do not eat concurrently - concurrency

I'm trying to follow the dining philosophers example from the Rust documentation. Final code from the link:
use std::thread;
use std::sync::{Mutex, Arc};

struct Philosopher {
    name: String,
    left: usize,
    right: usize,
}

impl Philosopher {
    fn new(name: &str, left: usize, right: usize) -> Philosopher {
        Philosopher {
            name: name.to_string(),
            left: left,
            right: right,
        }
    }

    fn eat(&self, table: &Table) {
        let _left = table.forks[self.left].lock().unwrap();
        thread::sleep_ms(150);
        let _right = table.forks[self.right].lock().unwrap();

        println!("{} is eating.", self.name);

        thread::sleep_ms(1000);

        println!("{} is done eating.", self.name);
    }
}

struct Table {
    forks: Vec<Mutex<()>>,
}

fn main() {
    let table = Arc::new(Table { forks: vec![
        Mutex::new(()),
        Mutex::new(()),
        Mutex::new(()),
        Mutex::new(()),
        Mutex::new(()),
    ]});

    let philosophers = vec![
        Philosopher::new("Judith Butler", 0, 1),
        Philosopher::new("Gilles Deleuze", 1, 2),
        Philosopher::new("Karl Marx", 2, 3),
        Philosopher::new("Emma Goldman", 3, 4),
        Philosopher::new("Michel Foucault", 0, 4),
    ];

    let handles: Vec<_> = philosophers.into_iter().map(|p| {
        let table = table.clone();

        thread::spawn(move || {
            p.eat(&table);
        })
    }).collect();

    for h in handles {
        h.join().unwrap();
    }
}
Running this produces the following output:
Michel Foucault is eating.
Michel Foucault is done eating.
Emma Goldman is eating.
Emma Goldman is done eating.
Karl Marx is eating.
Karl Marx is done eating.
Gilles Deleuze is eating.
Gilles Deleuze is done eating.
Judith Butler is eating.
Judith Butler is done eating.
According to the documentation, the philosophers should be able to eat at the same time. Desired result is something like this:
Gilles Deleuze is eating.
Emma Goldman is eating.
Emma Goldman is done eating.
Gilles Deleuze is done eating.
Judith Butler is eating.
Karl Marx is eating.
Judith Butler is done eating.
Michel Foucault is eating.
Karl Marx is done eating.
Michel Foucault is done eating.
Unfortunately, this does not happen no matter how often the code is being executed.
I'm currently using rustc 1.5.0 (3d7cd77e4 2015-12-04) on Windows, but the problem occurs on the Rust playground as well. Feel free to try it yourself.

The implementation of the problem and the suggested output do not match because of the sleep between picking up the forks.
I am unsure as to why Michel Foucault always starts first (probably the way thread dispatch works), but the rest is easily explained.
Due to the pause (*) between grabbing the main-hand and off-hand forks, there are two phases:
Phase 1: grab your main-hand fork
Phase 2: grab your off-hand fork
After phase 1:
Fork 0 is in the hand of either Michel Foucault or Judith Butler
Fork 1 is in the hand of Gilles Deleuze
Fork 2 is in the hand of Karl Marx
Fork 3 is in the hand of Emma Goldman
Now, note that only Fork 4 is still up for grabs!
We have two cases in Phase 2:
a) Judith grabbed the Fork 0
b) Michel grabbed the Fork 0
Starting with (a):
All philosophers are blocked except Emma, who grabs Fork 4
When Emma is done, she releases Fork 3, which Karl immediately grabs
When Karl is done...
Finally, Judith is done, she releases Fork 0, and Michel eats
In case (a), only one philosopher can eat at any given time.
Note: I forced the case by pausing Michel for 150ms before letting him grab his first fork.
Case (b) is more complicated: once again we have a race, this time between Emma and Michel for Fork 4. Being gentlemen, we let Emma go first; the case where Michel grabs Fork 4 instead becomes case (c):
Emma grabs Fork 4, all other philosophers are now blocked
When Emma is done, she releases Fork 3 and 4, both Michel and Karl jump on them
When Michel is done, he releases Forks 0 and 4; Judith immediately grabs Fork 0... and starts waiting for Fork 1; nobody cares about Fork 4 now
When Karl is done, he releases Fork 2, which Gilles immediately grabs
When Gilles is done, he releases Fork 1, which Judith immediately grabs
When Judith is done, all 5 have eaten
We observe very limited concurrency here: Emma eats first, and only when she is finished do we have two parallel streams, one with Michel, and one going Karl > Gilles > Judith.
Note: I forced the case by pausing Michel for 150ms before letting him grab his second fork.
Finally, we have case (c):
Michel grabs Fork 4, all other philosophers are now blocked
When Michel is done, he releases Fork 4 and 0, which are grabbed respectively by Emma and Judith; Judith is still blocked (first sleeping, then waiting for Fork 1) but Emma starts eating
When Emma is done...
And here again, no concurrency at all.
(*) This is not actually guaranteed, but 150 ms is a long time computer-wise, so unless the machine is heavily loaded it will just happen.
While the solution proposed by the book does work (there is no deadlock whatever the circumstances), it does not exhibit much concurrency, so it is more a demonstration of Rust than of concurrency... but then, it is the Rust book and not the concurrency one!
I do not understand why Michel's thread is consistently scheduled first on the playpen, but it can easily be countered by making him sleep explicitly before his first grab.
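For what it is worth, here is a minimal sketch (my code, not the book's) of a variant that does show visible concurrency: drop the 150 ms sleep so each philosopher grabs both forks back to back. Michel still takes Fork 0 before Fork 4, so the "left-handed philosopher" trick keeps preventing deadlock.
use std::thread;
use std::time::Duration;
use std::sync::{Arc, Mutex};

struct Philosopher {
    name: String,
    left: usize,
    right: usize,
}

struct Table {
    forks: Vec<Mutex<()>>,
}

impl Philosopher {
    fn new(name: &str, left: usize, right: usize) -> Philosopher {
        Philosopher { name: name.to_string(), left: left, right: right }
    }

    fn eat(&self, table: &Table) {
        // Take both forks back to back; no artificial pause in between.
        let _left = table.forks[self.left].lock().unwrap();
        let _right = table.forks[self.right].lock().unwrap();

        println!("{} is eating.", self.name);
        thread::sleep(Duration::from_millis(1000));
        println!("{} is done eating.", self.name);
    }
}

fn main() {
    let table = Arc::new(Table {
        forks: (0..5).map(|_| Mutex::new(())).collect(),
    });

    let philosophers = vec![
        Philosopher::new("Judith Butler", 0, 1),
        Philosopher::new("Gilles Deleuze", 1, 2),
        Philosopher::new("Karl Marx", 2, 3),
        Philosopher::new("Emma Goldman", 3, 4),
        Philosopher::new("Michel Foucault", 0, 4), // still "left-handed" to avoid deadlock
    ];

    let handles: Vec<_> = philosophers.into_iter().map(|p| {
        let table = table.clone();
        thread::spawn(move || p.eat(&table))
    }).collect();

    for h in handles {
        h.join().unwrap();
    }
}
The exact interleaving still depends on the scheduler, but with the pause gone, two non-adjacent philosophers are usually eating at the same time.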

This is a semi-common question for this example. Programmers tend to think of threads as "random" because threads usually have differing start times and run lengths. Most usages of threads also don't lock a shared resource for the entire life of the thread. Remember that threads are sort-of deterministic, because they are scheduled by an algorithm.
In this example, the main thread creates a whole bunch of threads and adds them to a queue managed by the operating system. Eventually, the main thread is blocked or is interrupted by the scheduler. The scheduler looks through the queue of threads and asks the "first" one if it can run. If it is runnable, then it is run for a time slice or until it is blocked.
The "first" thread is up to the OS. Linux, for example, has multiple tweakable schedulers that allow you to prioritize which threads run. The scheduler can also choose to interrupt a thread earlier or later.
If you add a print at the very beginning of the thread, you can see that the threads do start in a different order. Here's a table of which thread starts first, based on 100 runs:
| Position | Emma Goldman | Gilles Deleuze | Judith Butler | Karl Marx | Michel Foucault |
|----------+--------------+----------------+---------------+-----------+-----------------|
| 1 | 4 | 9 | 81 | 5 | 1 |
| 2 | 5 | 66 | 9 | 17 | 3 |
| 3 | 19 | 14 | 5 | 49 | 13 |
| 4 | 46 | 9 | 3 | 20 | 22 |
| 5 | 26 | 2 | 2 | 9 | 61 |
If I'm doing my statistics correctly, the most common starting order is:
Judith Butler
Gilles Deleuze
Karl Marx
Emma Goldman
Michel Foucault
Note that this matches the sequence of philosophers defined in the code!
Also note that the algorithm itself imposes an ordering. All but one philosopher picks up the fork on the left hand first, then waits a bit. If the threads run in order, then each one in turn is waiting on the one before it. Most of the threads have a dependence on the thread sitting to the "left". If we pictured a circular table with everyone holding a left fork (a deadlock), and we picked one person to give an extra fork to (breaking the deadlock), then you can see there would be a cascade of people able to eat.
Also remember that println! uses standard output, a mutable global resource that must be protected by a mutex. As such, printing can cause the thread to be blocked and rescheduled.
I am on OS X, which likely explains the order that I semi-consistently get that is different from yours.
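For reference, here is a sketch of the probe mentioned above (my wording, not the original answer's code): print as the very first thing each spawned thread does, before it touches any fork. Only the spawning loop from the question changes:
let handles: Vec<_> = philosophers.into_iter().map(|p| {
    let table = table.clone();

    thread::spawn(move || {
        println!("{} sat down.", p.name); // start-order probe
        p.eat(&table);
    })
}).collect();
Bear in mind that this extra println! itself contends for stdout, so it can perturb the very ordering it is trying to observe.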

Related

Sequentially consistent but not linearizable execution

I'm trying to understand the difference between linearizability and sequential consistency. More specifically, I would like to have an example of an execution which is sequentially consistent but not linearizable.
My professor gave me the following example of such execution:
Alice and bob write checks to each other.
Alice’s Statement:
-10 Check Alice -> Bob
0 Check Bob -> Alice
Bob’s Statement
-10 Check Bob -> Alice
0 Check Alice -> Bob
Both overdraft.
It is sequentially consistent: each client sees a consistent order
It is not linearizable: no globally linear story
But I didn't get it. Line
n Check A -> B
is supposed to be interpreted as "A writes a check to B and its account after the operation is n".
I don't understand why the operation shouldn't be linearisable: both Alice and Bob end up with 0 in the end which is a consistent value, so maybe I didn't get the definition of 'linearisability' properly.
First off, what your professor gave you is not the history/execution explicitly, but the projection of the history on both of its threads.
A history H is linearizable if, by removing pending invocations and/or adding responses to the pending invocations you kept, you can obtain a history that is equivalent to a sequential history S which does not contradict any of the precedences implied by H.
In other words, the reason your example is not linearizable is because the operations (financial transactions) can't be assigned a single point in time. First money is deducted, and then later added, and this behaviour is observed by the threads/individual bank statements.
If the bank statements were
Alice’s Statement:
-10 Check Alice -> Bob
0 Check Bob -> Alice
Bob’s Statement
10 Check Alice -> Bob
0 Check Bob -> Alice
then we would have a history S as follows:
Alice: Send Bob 10
Alice: Send completed
Bob: Send Alice 10
Bob: Send completed
But in your example the history could be
Alice: Send Bob 10 (i.e. money is gone)
Bob: Send Alice 10 (i.e. money is gone)
Bob: Send completed (i.e. money arrived)
Alice: Send completed (i.e. money arrived)
(or any combination of lines 1/2 switched and 3/4 switched) and you can't reorder that sequentially (i.e. pairs of started/completed together) without changing the account balance observed by each thread in between.
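To picture that point (a purely illustrative diagram of mine, not part of the original answer), draw each transfer as an interval whose debit takes effect near the start and whose credit takes effect near the end:
time ----------------------------------------------->
Alice: [ debit Alice -10 .............. credit Bob +10 ]
Bob:       [ debit Bob -10 ....... credit Alice +10 ]
In a linearizable system each transfer would have to take effect at a single instant somewhere inside its interval; whichever transfer came second would then show the other party's credit before its own debit on the corresponding statement, which is exactly what neither statement in the example does.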

Fixed Round-Robin scheduling with two processes and a quantum of 1

Morning,
I'm using fixed RR algorithm with a quantum of 1. P1 arrives at 0 and P5 arrives at 1. P1 has a burst time of 10 and P5 has a burst time of 5.
P1 executes from 0 to 1. P5 arrives at 1, but it goes to the back of the queue. Since there are only two processes at the start of 1, I believe P1 would execute from 1 to 2, P5 would wait one tick and first execute from 2 to 3.
Is this correct? If not, would P5 execute immediately from 1 to 2?
Thank you
Your understanding is correct: when the end of P1's quantum coincides with P5's arrival (end time of P1 = start time of P5), the OS prefers the just-preempted process over the newly entered one, so P1 runs from 1 to 2 and P5 first runs from 2 to 3. The following question may also be useful: Special case scheduling
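For concreteness, here is the full timeline that rule produces for these two processes (my own working, assuming a quantum of 1 and no other processes):
P1 runs 0-1, 1-2, 3-4, 5-6, 7-8, 9-10, then 11-15 (completes at 15)
P5 runs 2-3, 4-5, 6-7, 8-9, 10-11 (completes at 11)
At time 1 the preempted P1 is queued ahead of the newly arrived P5, so P1 gets the second slice; after that the two simply alternate until P5 completes, and P1 finishes the remainder alone.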

Wrong process getting killed on other node?

I wrote a simple program ("controller") to run some computation on a separate node ("worker"). The reason being that if the worker node runs out of memory, the controller still works:
-module(controller).
-compile(export_all).

p(Msg, Args) -> io:format("~p " ++ Msg, [time() | Args]).

progress_monitor(P, N) ->
    timer:sleep(5*60*1000),
    p("killing the worker which was using strategy #~p~n", [N]),
    exit(P, took_to_long).

start() ->
    start(1).

start(Strat) ->
    P = spawn('worker@localhost', worker, start, [Strat, self(), 60000000000]),
    p("starting worker using strategy #~p~n", [Strat]),
    spawn(controller, progress_monitor, [P, Strat]),
    monitor(process, P),
    receive
        {'DOWN', _, _, P, Info} ->
            p("worker using strategy #~p died. reason: ~p~n", [Strat, Info]);
        X ->
            p("got result: ~p~n", [X])
    end,
    case Strat of
        4 -> p("out of strategies. giving up~n", []);
        _ -> timer:sleep(5000), % wait for node to come back
             start(Strat + 1)
    end.
To test it, I deliberately wrote 3 factorial implementations that will use up lots of memory and crash, and a fourth implementation which uses tail recursion to avoid taking too much space:
-module(worker).
-compile(export_all).

start(1, P, N) -> P ! factorial1(N);
start(2, P, N) -> P ! factorial2(N);
start(3, P, N) -> P ! factorial3(N);
start(4, P, N) -> P ! factorial4(N, 1).

factorial1(0) -> 1;
factorial1(N) -> N * factorial1(N-1).

factorial2(N) ->
    case N of
        0 -> 1;
        _ -> N * factorial2(N-1)
    end.

factorial3(N) -> lists:foldl(fun(X, Y) -> X * Y end, 1, lists:seq(1, N)).

factorial4(0, A) -> A;
factorial4(N, A) -> factorial4(N-1, A*N).
Note even with the tail recursive version, I'm calling it with 60000000000, which will probably take days on my machine even with factorial4. Here is the output of running the controller:
$ erl -sname 'controller@localhost'
Erlang R16B (erts-5.10.1) [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false]
Eshell V5.10.1 (abort with ^G)
(controller@localhost)1> c(worker).
{ok,worker}
(controller@localhost)2> c(controller).
{ok,controller}
(controller@localhost)3> controller:start().
{23,24,28} starting worker using strategy #1
{23,25,13} worker using strategy #1 died. reason: noconnection
{23,25,18} starting worker using strategy #2
{23,26,2} worker using strategy #2 died. reason: noconnection
{23,26,7} starting worker using strategy #3
{23,26,40} worker using strategy #3 died. reason: noconnection
{23,26,45} starting worker using strategy #4
{23,29,28} killing the worker which was using strategy #1
{23,29,29} worker using strategy #4 died. reason: took_to_long
{23,29,29} out of strategies. giving up
ok
It almost works, but worker #4 was killed too early (it should have been killed close to 23:31:45, not 23:29:29). Looking deeper, an attempt was made to kill worker #1 only, and no others. So worker #4 should not have died, yet it did. Why? We can even see that the reason was took_to_long, and that progress_monitor #1 started at 23:24:28, five minutes before 23:29:29. So it looks like progress_monitor #1 killed worker #4 instead of worker #1. Why did it kill the wrong process?
Here is the output of the worker when I ran the controller:
$ while true; do erl -sname 'worker@localhost'; done
Erlang R16B (erts-5.10.1) [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false]
Eshell V5.10.1 (abort with ^G)
(worker@localhost)1>
Crash dump was written to: erl_crash.dump
eheap_alloc: Cannot allocate 2733560184 bytes of memory (of type "heap").
Aborted
Erlang R16B (erts-5.10.1) [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false]
Eshell V5.10.1 (abort with ^G)
(worker@localhost)1>
Crash dump was written to: erl_crash.dump
eheap_alloc: Cannot allocate 2733560184 bytes of memory (of type "heap").
Aborted
Erlang R16B (erts-5.10.1) [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false]
Eshell V5.10.1 (abort with ^G)
(worker@localhost)1>
Crash dump was written to: erl_crash.dump
eheap_alloc: Cannot allocate 2733560184 bytes of memory (of type "old_heap").
Aborted
Erlang R16B (erts-5.10.1) [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false]
Eshell V5.10.1 (abort with ^G)
(worker@localhost)1>
There are several issues, and eventually you ran into creation-number wraparound.
Since you do not cancel the progress_monitor process, it will always send an exit signal after 5 minutes.
The computation is long and/or the VM is slow, hence process 4 is still running 5 minutes after the progress monitor for process 1 was started.
The 4 worker nodes were started sequentially with the same name worker@localhost, and the creation numbers of the first and the fourth node are the same.
Creation numbers (the creation field in references and pids) are a mechanism to prevent pids and references created by a crashed node from being interpreted by a new node with the same name. This is exactly what you want in your code: when you try to kill worker 1 after its node is long gone, you do not intend to kill a process on a restarted node.
When a node sends a pid or a reference, it encodes its creation number. When it receives a pid or a reference from another node, it checks that the creation number in the pid matches its own creation number. Creation numbers are assigned by epmd, cycling through the sequence 1, 2, 3.
Here, unfortunately, when the 4th node gets the exit message, the creation number matches because the sequence wrapped around. Since each node did exactly the same thing before spawning the worker (initializing Erlang), the pid of node 4's worker matches the pid of node 1's worker.
As a result, the controller eventually kills worker 4 believing it is worker 1.
To avoid this, you need something more robust than the creation number if there can be 4 workers within the lifespan of a pid or a reference in the controller.
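One way to address the first point is sketched below (my own code, not tested against the question's setup): make the watchdog cancellable and cancel it as soon as the worker's 'DOWN' message arrives, so a stale timer can never fire against a pid that has been reused after a creation-number wrap. p/2 is the print helper from the controller module above.
progress_monitor(P, N) ->
    receive
        cancel ->
            ok
    after 5*60*1000 ->
        p("killing the worker which was using strategy #~p~n", [N]),
        exit(P, took_to_long)
    end.
In start/1 this means keeping the monitor's pid, e.g. Monitor = spawn(controller, progress_monitor, [P, Strat]), and sending Monitor ! cancel right after the receive block.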

Way to force file descriptor to close so that pclose() will not block?

I am creating a pipe using popen() and the process is invoking a third party tool which in some rare cases I need to terminate.
::popen(thirdPartyCommand.c_str(), "w");
If I just throw an exception and unwind the stack, my unwind attempts to call pclose() on the third party process whose results I no longer need. However, pclose() never returns as it blocks with the following stack trace on Centos 4:
#0 0xffffe410 in __kernel_vsyscall ()
#1 0x00807dc3 in __waitpid_nocancel () from /lib/libc.so.6
#2 0x007d0abe in _IO_proc_close@@GLIBC_2.1 () from /lib/libc.so.6
#3 0x007daf38 in _IO_new_file_close_it () from /lib/libc.so.6
#4 0x007cec6e in fclose@@GLIBC_2.1 () from /lib/libc.so.6
#5 0x007d6cfd in pclose@@GLIBC_2.1 () from /lib/libc.so.6
Is there any way to ensure the call to pclose() will succeed before making it, so I can programmatically avoid my process getting hung up waiting for pclose() when it will never return, because I've stopped supplying input to the popen()ed process and wish to throw away its work?
Should I write an end of file somehow to the popen()ed file descriptor before trying to close it?
Note that the third party software is forking itself. At the point where pclose() has hung, there are four processes, one of which is defunct:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
abc 6870 0.0 0.0 8696 972 ? S 04:39 0:00 sh -c /usr/local/bin/third_party /home/arg1 /home/arg2 2>&1
abc 6871 0.0 0.0 10172 4296 ? S 04:39 0:00 /usr/local/bin/third_party /home/arg1 /home/arg2
abc 6874 99.8 0.0 10180 1604 ? R 04:39 141:44 /usr/local/bin/third_party /home/arg1 /home/arg2
abc 6875 0.0 0.0 0 0 ? Z 04:39 0:00 [third_party] <defunct>
I see two solutions here:
The neat one: you fork(), pipe() and execve() (or anything in the exec family of course...) "manually", then it is going to be up to you to decide if you want to let your children become zombies or not. (i.e. to wait() for them or not)
The ugly one: if you're sure you only have one of this child process running at any given time, you could use sysctl() to check if there is any process running with this name before you call pclose()... yuk.
I strongly advise the neat way here, or you could just ask whoever is responsible to fix that infinite loop in your third party tool, haha.
Good luck!
EDIT:
For your first question: I don't know. Doing some research on how to find processes by name using sysctl() should tell you what you need to know; I myself have never pushed it that far.
For your second and third question: popen() is basically a wrapper to fork() + pipe() + dup2() + execl().
fork() duplicates the process, execl() replaces the duplicated process' image with a new one, pipe() handles inter process communication and dup2() is used to redirect the output... And then pclose() will wait() for the duplicated process to die, which is why we're here.
If you want to know more, you should check this answer where I've recently explained how to perform a simple fork with standard IPC. In this case, it's just a bit more complicated as you have to use dup2() to redirect the standard output to your pipe.
You should also take a look at popen()/pclose() source codes, as they are of course open source.
Finally, here's a brief example; I cannot make it clearer than that:
int pipefd[2];
char buf;

pipe(pipefd);

if (fork() == 0) // I'm the child
{
    close(pipefd[0]); // I'm not going to read from this pipe
    dup2(pipefd[1], 1); // redirect standard output to the pipe
    close(pipefd[1]); // it has been duplicated, close it as we don't need it anymore
    execve()/execl()/execsomething()... // execute the program you want
}
else // I'm the parent
{
    close(pipefd[1]); // I'm not going to write to this pipe
    while (read(pipefd[0], &buf, 1) > 0) // read until EOF
        write(1, &buf, 1);
    close(pipefd[0]); // cleaning
}
And as always, remember to read the man pages and to check all your return values.
Again, good luck!
Another solution is to kill all your children. If you know that the only child processes you have are processes that get started when you do popen(), then it's easy enough. Otherwise you may need some more work or use the fork() + execve() combo, in which case you will know the first child's PID.
Whenever you run a child process, its PPID (parent process ID) is your own PID. It is easy enough to read the list of currently running processes and gather those that have their PPID = getpid(). Repeat the loop looking for processes whose PPID equals one of your children's PIDs. In the end you build a whole tree of child processes.
Since your child processes may end up creating other child processes, to make it safe, you will want to block those processes by sending a SIGSTOP. That way they will stop creating new children. As far as I know, you can't prevent the SIGSTOP from doing its deed.
The process is therefore:
void kill_all_children()
{
    std::vector<pid_t> me_and_children;
    me_and_children.push_back(getpid());

    bool found_child = false;
    do
    {
        found_child = false;
        std::vector<process> processes(get_processes());
        for(auto p : processes)
        {
            // i.e. if p is a child of me or of one of the children found so far,
            // and we have not collected it yet
            if(std::find(me_and_children.begin(),
                         me_and_children.end(),
                         p.ppid()) != me_and_children.end()
            && std::find(me_and_children.begin(),
                         me_and_children.end(),
                         p.pid()) == me_and_children.end())
            {
                kill(p.pid(), SIGSTOP);
                me_and_children.push_back(p.pid());
                found_child = true;
            }
        }
    }
    while(found_child);

    for(auto c : me_and_children)
    {
        // ignore ourselves
        if(c == getpid())
        {
            continue;
        }
        kill(c, SIGTERM);
        kill(c, SIGCONT); // make sure it continues now
    }
}
This is probably not the best way to close your pipe, though, since you probably need to give the command time to handle your data. So what you want is to execute that code only after a timeout. Your regular code could then look something like this:
void handle_alarm(int sig);

void send_data(...)
{
    signal(SIGALRM, handle_alarm);
    FILE *f = popen("command", "w");
    // do some work...
    alarm(60); // give it a minute
    pclose(f);
    alarm(0); // remove alarm
}

void handle_alarm(int sig)
{
    (void)sig; // signature required by signal(); the value is unused here
    kill_all_children();
}
-- About the alarm(60) call: its exact location is up to you; it could also be placed before the popen() if you're afraid that the popen() itself or the work after it could fail (e.g. I've had problems where the pipe fills up and I never even reach the pclose() because the child process loops forever.)
Note that the alarm() may not be the best idea in the world. You may prefer using a thread with a sleep made of a poll() or select() on an fd which you can wake up as required. That way the thread would call the kill_all_children() function after the sleep, but you can send it a message to wake it up early and let it know that the pclose() happened as expected.
Note: I left the implementation of get_processes() out of this answer. You can read that from /proc or with the libprocps library. I have such an implementation in my snapwebsites project. It's called process_list. You could just lift that class.
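For completeness, here is a rough idea of what such a get_processes() could look like on Linux, reading the pid and ppid of every process from /proc (my own hypothetical sketch, not the snapwebsites implementation; the process and get_processes names simply mirror the ones used above):
// Hypothetical sketch: walk /proc and read pid/ppid from /proc/<pid>/stat.
#include <dirent.h>
#include <sys/types.h>
#include <cstdlib>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

struct process
{
    pid_t f_pid;
    pid_t f_ppid;
    pid_t pid() const  { return f_pid; }
    pid_t ppid() const { return f_ppid; }
};

std::vector<process> get_processes()
{
    std::vector<process> result;
    DIR *proc = opendir("/proc");
    if(proc == nullptr)
    {
        return result;
    }
    struct dirent *ent;
    while((ent = readdir(proc)) != nullptr)
    {
        char *end = nullptr;
        long const pid = std::strtol(ent->d_name, &end, 10);
        if(pid <= 0 || *end != '\0')
        {
            continue; // not a /proc/<pid> directory
        }
        std::ifstream stat_file("/proc/" + std::string(ent->d_name) + "/stat");
        std::string line;
        if(!std::getline(stat_file, line))
        {
            continue;
        }
        // The command name (2nd field) may contain spaces, so parse from the
        // last ')'; after it come the state (3rd field) and the ppid (4th).
        std::string::size_type const close_paren = line.rfind(')');
        if(close_paren == std::string::npos)
        {
            continue;
        }
        std::istringstream rest(line.substr(close_paren + 1));
        char state = '?';
        long ppid = 0;
        rest >> state >> ppid;
        result.push_back(process{ static_cast<pid_t>(pid),
                                  static_cast<pid_t>(ppid) });
    }
    closedir(proc);
    return result;
}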
I'm using popen() to invoke a child process which doesn't need any stdin or stdout; it just runs for a short time to do its work, then it stops all by itself. Arguably, invoking this type of child process should rather be done with system()? Anyway, pclose() is used afterwards to verify that the child process exited cleanly.
Under certain conditions, this child process keeps on running indefinitely. pclose() blocks forever, so then my parent process is also stuck. CPU usage runs to 100%, other executables get starved, and my whole embedded system crumbles. I came here looking for solutions.
Solution 1 by @cmc: decomposing popen() into fork(), pipe(), dup2() and execl().
It might just be a matter of personal taste, but I'm reluctant to rewrite perfectly fine system calls myself. I would just end up introducing new bugs.
Solution 2 by @cmc: verifying that the child process actually exists with sysctl(), to make sure that pclose() will return successfully. I find that this somehow sidesteps the problem from the OP @WilliamKF - there is definitely a child process, it just has become unresponsive. Forgoing the pclose() call won't solve that. [As an aside, in the 7 years since @cmc wrote this answer, sysctl() seems to have become deprecated.]
Solution 3 by @Alexis Wilke: killing the child process. I like this approach best. It basically automates what I did when I stepped in manually to resuscitate my dying embedded system. The problem with my stubborn adherence to popen() is that I get no PID from the child process. I have been trying in vain with
waitid(P_PGID, getpgrp(), &child_info, WNOHANG);
but all I get on my Debian Linux 4.19 system is EINVAL.
So here's what I cobbled together. I'm searching for the child process by name; I can afford to take a few shortcuts, as I'm sure there will only be one process with this name. Ironically, the command-line utility ps is invoked by yet another popen(). This won't win any elegance prizes, but at least my embedded system stays afloat now.
FILE* child = popen("child", "r");
if (child)
{
    int nr_loops;
    int child_pid;

    for (nr_loops = 10; nr_loops; nr_loops--)
    {
        FILE* ps = popen("ps | grep child | grep -v grep | grep -v \"sh -c \" | sed \'s/^ *//\' | sed \'s/ .*$//\'", "r");
        child_pid = 0;
        int found = fscanf(ps, "%d", &child_pid);
        pclose(ps);
        if (found != 1)
            // The child process is no longer running, no risk of blocking pclose()
            break;
        syslog(LOG_WARNING, "child running PID %d", child_pid);
        usleep(1000000); // 1 second
    }

    if (!nr_loops)
    {
        // Time to kill this runaway child
        syslog(LOG_ERR, "killing PID %d", child_pid);
        kill(child_pid, SIGTERM);
    }

    pclose(child); // Even after it had to be killed
} /* if (child) */
I learned the hard way that I have to pair every popen() with a pclose(), otherwise I pile up zombie processes. I find it remarkable that this is needed even after a direct kill; I figure that's because, according to the manpage, popen() actually launches sh -c with the child process in it, and it's this surrounding sh that becomes a zombie.

Is F# really faster than Erlang at spawning and killing processes?

Updated: This question contains an error which makes the benchmark meaningless. I will attempt a better benchmark comparing F# and Erlang's basic concurrency functionality and inquire about the results in another question.
I am trying to understand the performance characteristics of Erlang and F#. I find Erlang's concurrency model very appealing but am inclined to use F# for interoperability reasons. While out of the box F# doesn't offer anything like Erlang's concurrency primitives -- from what I can tell, async and MailboxProcessor only cover a small portion of what Erlang does well -- I've been trying to understand what is possible in F# performance-wise.
In Joe Armstrong's Programming Erlang book, he makes the point that processes are very cheap in Erlang. He uses (roughly) the following code to demonstrate this fact:
-module(processes).
-export([max/1]).

%% max(N)
%%   Create N processes then destroy them
%%   See how much time this takes

max(N) ->
    statistics(runtime),
    statistics(wall_clock),
    L = for(1, N, fun() -> spawn(fun() -> wait() end) end),
    {_, Time1} = statistics(runtime),
    {_, Time2} = statistics(wall_clock),
    lists:foreach(fun(Pid) -> Pid ! die end, L),
    U1 = Time1 * 1000 / N,
    U2 = Time2 * 1000 / N,
    io:format("Process spawn time=~p (~p) microseconds~n",
              [U1, U2]).

wait() ->
    receive
        die -> void
    end.

for(N, N, F) -> [F()];
for(I, N, F) -> [F()|for(I+1, N, F)].
On my Macbook Pro, spawning and killing 100 thousand processes (processes:max(100000)) takes about 8 microseconds per processes. I can raise the number of processes a bit further, but a million seems to break things pretty consistently.
Knowing very little F#, I tried to implement this example using async and MailBoxProcessor. My attempt, which may be wrong, is as follows:
#r "System.dll"
open System.Diagnostics

type waitMsg =
    | Die

let wait =
    MailboxProcessor.Start(fun inbox ->
        let rec loop =
            async { let! msg = inbox.Receive()
                    match msg with
                    | Die -> return() }
        loop)

let max N =
    printfn "Started!"
    let stopwatch = new Stopwatch()
    stopwatch.Start()
    let actors = [for i in 1 .. N do yield wait]
    for actor in actors do
        actor.Post(Die)
    stopwatch.Stop()
    printfn "Process spawn time=%f microseconds." (stopwatch.Elapsed.TotalMilliseconds * 1000.0 / float(N))
    printfn "Done."
Using F# on Mono, starting and killing 100,000 actors/processors takes under 2 microseconds per process, roughly 4 times faster than Erlang. More importantly, perhaps, is that I can scale up to millions of processes without any apparent problems. Starting 1 or 2 million processes still takes about 2 microseconds per process. Starting 20 million processors is still feasible, but slows to about 6 microseconds per process.
I have not yet taken the time to fully understand how F# implements async and MailBoxProcessor, but these results are encouraging. Is there something I'm doing horribly wrong?
If not, is there some place Erlang will likely outperform F#? Is there any reason Erlang's concurrency primitives can't be brought to F# through a library?
EDIT: The above numbers are wrong, due to the error Brian pointed out. I will update the entire question when I fix it.
In your original code, you only started one MailboxProcessor. Make wait() a function, and call it with each yield. Also you are not waiting for them to spin up or receive the messages, which I think invalidates the timing info; see my code below.
That said, I had some success; on my box I can do 100,000 at about 25us each. Much beyond that, I think you possibly start fighting the allocator/GC as much as anything, but I was able to do a million too (at about 27us each, though at that point it was using about 1.5G of memory).
Basically each 'suspended async' (which is the state when a mailbox is waiting on a line like
let! msg = inbox.Receive()
) only takes some number of bytes while it's blocked. That's why you can have way, way, way more asyncs than threads; a thread typically takes like a megabyte of memory or more.
Ok, here's the code I'm using. You can use a small number like 10, and --define DEBUG to ensure the program semantics are what is desired (printf outputs may be interleaved, but you'll get the idea).
open System.Diagnostics

let MAX = 100000

type waitMsg =
    | Die

let mutable countDown = MAX
let mre = new System.Threading.ManualResetEvent(false)

let wait(i) =
    MailboxProcessor.Start(fun inbox ->
        let rec loop =
            async {
#if DEBUG
                printfn "I am mbox #%d" i
#endif
                if System.Threading.Interlocked.Decrement(&countDown) = 0 then
                    mre.Set() |> ignore
                let! msg = inbox.Receive()
                match msg with
                | Die ->
#if DEBUG
                    printfn "mbox #%d died" i
#endif
                    if System.Threading.Interlocked.Decrement(&countDown) = 0 then
                        mre.Set() |> ignore
                    return() }
        loop)

let max N =
    printfn "Started!"
    let stopwatch = new Stopwatch()
    stopwatch.Start()
    let actors = [for i in 1 .. N do yield wait(i)]
    mre.WaitOne() |> ignore // ensure they have all spun up
    mre.Reset() |> ignore
    countDown <- MAX
    for actor in actors do
        actor.Post(Die)
    mre.WaitOne() |> ignore // ensure they have all got the message
    stopwatch.Stop()
    printfn "Process spawn time=%f microseconds." (stopwatch.Elapsed.TotalMilliseconds * 1000.0 / float(N))
    printfn "Done."

max MAX
All this said, I don't know Erlang, and I have not thought deeply about whether there's a way to trim down the F# any more (though it's pretty idiomatic as-is).
Erlang's VM does not use OS threads or processes to switch to a new Erlang process. The VM simply counts reductions (roughly, function calls) in your code/process and jumps to another VM process after a certain number of them, staying within the same OS process and the same OS thread.
The CLR uses mechanics based on OS processes and threads, so F# pays a much higher overhead cost for each context switch.
So the answer to your question is "No, Erlang is much faster at spawning and killing processes".
P.S. You may find the results of this practical contest interesting.