Is F# really faster than Erlang at spawning and killing processes? - concurrency

Updated: This question contains an error which makes the benchmark meaningless. I will attempt a better benchmark comparing F# and Erlang's basic concurrency functionality and inquire about the results in another question.
I am trying do understand the performance characteristics of Erlang and F#. I find Erlang's concurrency model very appealing but am inclined to use F# for interoperability reasons. While out of the box F# doesn't offer anything like Erlang's concurrency primitives -- from what I can tell async and MailboxProcessor only cover a small portion of what Erlang does well -- I've been trying to understand what is possible in F# performance wise.
In Joe Armstrong's Programming Erlang book, he makes the point that processes are very cheap in Erlang. He uses the (roughly) the following code to demonstrate this fact:
-module(processes).
-export([max/1]).
%% max(N)
%% Create N processes then destroy them
%% See how much time this takes
max(N) ->
statistics(runtime),
statistics(wall_clock),
L = for(1, N, fun() -> spawn(fun() -> wait() end) end),
{_, Time1} = statistics(runtime),
{_, Time2} = statistics(wall_clock),
lists:foreach(fun(Pid) -> Pid ! die end, L),
U1 = Time1 * 1000 / N,
U2 = Time2 * 1000 / N,
io:format("Process spawn time=~p (~p) microseconds~n",
[U1, U2]).
wait() ->
receive
die -> void
end.
for(N, N, F) -> [F()];
for(I, N, F) -> [F()|for(I+1, N, F)].
On my Macbook Pro, spawning and killing 100 thousand processes (processes:max(100000)) takes about 8 microseconds per processes. I can raise the number of processes a bit further, but a million seems to break things pretty consistently.
Knowing very little F#, I tried to implement this example using async and MailBoxProcessor. My attempt, which may be wrong, is as follows:
#r "System.dll"
open System.Diagnostics
type waitMsg =
| Die
let wait =
MailboxProcessor.Start(fun inbox ->
let rec loop =
async { let! msg = inbox.Receive()
match msg with
| Die -> return() }
loop)
let max N =
printfn "Started!"
let stopwatch = new Stopwatch()
stopwatch.Start()
let actors = [for i in 1 .. N do yield wait]
for actor in actors do
actor.Post(Die)
stopwatch.Stop()
printfn "Process spawn time=%f microseconds." (stopwatch.Elapsed.TotalMilliseconds * 1000.0 / float(N))
printfn "Done."
Using F# on Mono, starting and killing 100,000 actors/processors takes under 2 microseconds per process, roughly 4 times faster than Erlang. More importantly, perhaps, is that I can scale up to millions of processes without any apparent problems. Starting 1 or 2 million processes still takes about 2 microseconds per process. Starting 20 million processors is still feasible, but slows to about 6 microseconds per process.
I have not yet taken the time to fully understand how F# implements async and MailBoxProcessor, but these results are encouraging. Is there something I'm doing horribly wrong?
If not, is there some place Erlang will likely outperform F#? Is there any reason Erlang's concurrency primitives can't be brought to F# through a library?
EDIT: The above numbers are wrong, due to the error Brian pointed out. I will update the entire question when I fix it.

In your original code, you only started one MailboxProcessor. Make wait() a function, and call it with each yield. Also you are not waiting for them to spin up or receive the messages, which I think invalidates the timing info; see my code below.
That said, I have some success; on my box I can do 100,000 at about 25us each. After too much more, I think possibly you start fighting the allocator/GC as much as anything, but I was able to do a million too (at about 27us each, but at this point was using like 1.5G of memory).
Basically each 'suspended async' (which is the state when a mailbox is waiting on a line like
let! msg = inbox.Receive()
) only takes some number of bytes while it's blocked. That's why you can have way, way, way more asyncs than threads; a thread typically takes like a megabyte of memory or more.
Ok, here's the code I'm using. You can use a small number like 10, and --define DEBUG to ensure the program semantics are what is desired (printf outputs may be interleaved, but you'll get the idea).
open System.Diagnostics
let MAX = 100000
type waitMsg =
| Die
let mutable countDown = MAX
let mre = new System.Threading.ManualResetEvent(false)
let wait(i) =
MailboxProcessor.Start(fun inbox ->
let rec loop =
async {
#if DEBUG
printfn "I am mbox #%d" i
#endif
if System.Threading.Interlocked.Decrement(&countDown) = 0 then
mre.Set() |> ignore
let! msg = inbox.Receive()
match msg with
| Die ->
#if DEBUG
printfn "mbox #%d died" i
#endif
if System.Threading.Interlocked.Decrement(&countDown) = 0 then
mre.Set() |> ignore
return() }
loop)
let max N =
printfn "Started!"
let stopwatch = new Stopwatch()
stopwatch.Start()
let actors = [for i in 1 .. N do yield wait(i)]
mre.WaitOne() |> ignore // ensure they have all spun up
mre.Reset() |> ignore
countDown <- MAX
for actor in actors do
actor.Post(Die)
mre.WaitOne() |> ignore // ensure they have all got the message
stopwatch.Stop()
printfn "Process spawn time=%f microseconds." (stopwatch.Elapsed.TotalMilliseconds * 1000.0 / float(N))
printfn "Done."
max MAX
All this said, I don't know Erlang, and I have not thought deeply about whether there's a way to trim down the F# any more (though it's pretty idiomatic as-is).

Erlang's VM doesn't uses OS threads or process to switch to new Erlang process. It's VM simply counts function calls into your code/process and jumps to other VM's process after some (into same OS process and same OS thread).
CLR uses mechanics based on OS process and threads, so F# has much higher overhead cost for each context switch.
So answer to your question is "No, Erlang is much faster than spawning and killing processes".
P.S. You can find results of that practical contest interesting.

Related

Futures and Promises in Erlang

Does Erlang has equivalents for Future and Promises? Or since future and promises are solving a problem that doesn't exist in Erlang systems (synchronisation orchestrating for example), and hence we don't need them in Erlang.
If I want the semantics of futures and promises in Erlang, they could be emulated via Erlang processes/actors?
You could easily implement a future in Erlang like this:
F = fun() -> fancy_function() end,
% fancy code
Pid = self(),
Other = spawn(fun() -> X = F(), Pid ! {future, self(), X} end).
% more fancy code
Value = receive {future, Other, Val} -> Val end.
Having this functionality in a module and building chains from it should be easy too, but to be honest I never actually missed something like this. You are more flexible if you just freely send messages around.
The RPC module contains the rpc:async_call/4 function that does what you need. It will run a computation anywhere in a cluster (including on node(), the local node), and allow to have the result waited on with rpc:yield/1:
1> MaxTime = rpc:async_call(node(), timer, sleep, [30000]).
<0.48.0>
2> lists:sort([a,c,b]).
[a,b,c]
3> rpc:yield(MaxTime).
... [long wait] ...
ok
You can also poll in non-blocking ways by using rpc:nb_yield/1, or for a limited number of milliseconds with rpc:nb_yield/2:
4> Key2 = rpc:async_call(node(), timer, sleep, [30000]).
<0.52.0>
5> rpc:nb_yield(Key2).
timeout
6> rpc:nb_yield(Key2).
timeout
7> rpc:nb_yield(Key2).
timeout
8> rpc:nb_yield(Key2, 1000).
timeout
9> rpc:nb_yield(Key2, 100000).
... [long wait] ...
{value,ok}
That's all in the standard library and ready to be used.

Getting result of a spawned function in Erlang

My objective at the moment is to write Erlang code calculating a list of N elements, where each element is a factorial of it's "index" (so, for N = 10 I would like to get [1!, 2!, 3!, ..., 10!]). What's more, I would like every element to be calculated in a seperate process (I know it is simply inefficient, but I am expected to implement it and compare its efficiency with other methods later).
In my code, I wanted to use one function as a "loop" over given N, that for N, N-1, N-2... spawns a process which calculates factorial(N) and sends the result to some "collecting" function, which packs received results into a list. I know my concept is probably overcomplicated, so hopefully the code will explain a bit more:
messageFactorial(N, listPID) ->
listPID ! factorial(N). %% send calculated factorial to "collector".
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
nProcessesFactorialList(-1) ->
ok;
nProcessesFactorialList(N) ->
spawn(pFactorial, messageFactorial, [N, listPID]), %%for each N spawn...
nProcessesFactorialList(N-1).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
listPrepare(List) -> %% "collector", for the last factorial returns
receive %% a list of factorials (1! = 1).
1 -> List;
X ->
listPrepare([X | List])
end.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
startProcessesFactorialList(N) ->
register(listPID, spawn(pFactorial, listPrepare, [[]])),
nProcessesFactorialList(N).
I guess it shall work, by which I mean that listPrepare finally returns a list of factorials. But the problem is, I do not know how to get that list, how to get what it returned? As for now my code returns ok, as this is what nProcessesFactorialList returns at its finish. I thought about sending the List of results from listPrepare to nProcessesFactorialList in the end, but then it would also need to be a registered process, from which I wouldn't know how to recover that list.
So basically, how to get the result from a registered process running listPrepare (which is my list of factorials)? If my code is not right at all, I would ask for a suggestion of how to get it better. Thanks in advance.
My way how to do this sort of tasks is
-module(par_fact).
-export([calc/1]).
fact(X) -> fact(X, 1).
fact(0, R) -> R;
fact(X, R) when X > 0 -> fact(X-1, R*X).
calc(N) ->
Self = self(),
Pids = [ spawn_link(fun() -> Self ! {self(), {X, fact(X)}} end)
|| X <- lists:seq(1, N) ],
[ receive {Pid, R} -> R end || Pid <- Pids ].
and result:
> par_fact:calc(25).
[{1,1},
{2,2},
{3,6},
{4,24},
{5,120},
{6,720},
{7,5040},
{8,40320},
{9,362880},
{10,3628800},
{11,39916800},
{12,479001600},
{13,6227020800},
{14,87178291200},
{15,1307674368000},
{16,20922789888000},
{17,355687428096000},
{18,6402373705728000},
{19,121645100408832000},
{20,2432902008176640000},
{21,51090942171709440000},
{22,1124000727777607680000},
{23,25852016738884976640000},
{24,620448401733239439360000},
{25,15511210043330985984000000}]
The first problem is that your listPrepare process doesn't do anything with the result. Try to print it in the end.
The second problem is that you don't wait for all the processes to finish, but for process that sends 1 and this is the quickest factorial to calculate. So this message will surely be received before the more complex will be calculated, and you'll end up with only a few responses.
I had answered a bit similar question on the parallel work with many processes here: Create list across many processes in Erlang Maybe that one will help you.
I propose you this solution:
-export([launch/1,fact/2]).
launch(N) ->
launch(N,N).
% launch(Current,Total)
% when all processes are launched go to the result collect phase
launch(-1,N) -> collect(N+1);
launch(I,N) ->
% fact will be executed in a new process, so the normal way to get the answer is by message passing
% need to give the current process pid to get the answer back from the spawned process
spawn(?MODULE,fact,[I,self()]),
% loop until all processes are launched
launch(I-1,N).
% simply send the result to Pid.
fact(N,Pid) -> Pid ! {N,fact_1(N,1)}.
fact_1(I,R) when I < 2 -> R;
fact_1(I,R) -> fact_1(I-1,R*I).
% init the collect phase with an empty result list
collect(N) -> collect(N,[]).
% collect(Remaining_result_to_collect,Result_list)
collect(0,L) -> L;
% accumulate the results in L and loop until all messages are received
collect(N,L) ->
receive
R -> collect(N-1,[R|L])
end.
but a much more straight (single process) solution could be:
1> F = fun(N) -> lists:foldl(fun(I,[{X,R}|Q]) -> [{I,R*I},{X,R}|Q] end, [{0,1}], lists:seq(1,N)) end.
#Fun<erl_eval.6.80484245>
2> F(6).
[{6,720},{5,120},{4,24},{3,6},{2,2},{1,1},{0,1}]
[edit]
On a system with multicore, cache and an multitask underlying system, there is absolutly no guarantee on the order of execution, same thing on message sending. The only guarantee is in the message queue where you know that you will analyse the messages according to the order of message reception. So I agree with Dmitry, your stop condition is not 100% effective.
In addition, using startProcessesFactorialList, you spawn listPrepare which collect effectively all the factorial values (except 1!) and then simply forget the result at the end of the process, I guess this code snippet is not exactly the one you use for testing.

Map-style parallel processing?

I'm trying to do some very simple parallel processing in python. I have a list of data, and I want to compute the exact same thing for each element, and return it as a list, so I looked into some simple map-style modules available (https://wiki.python.org/moin/ParallelProcessing).
I previously used the pprocess module, but it does not seem work this time. I looked into using either forkmap or forkfun, but I haven't really found some nice examples on how to use them.
What would you recommend as the easiest to use map-style parallel processing module? Preferably with a tutorial of some sort.
First off Im not sure having multiple threads would make your program go much faster (but would be curious to see what is the speed up)
I would not use a special tutorial/module and just use basic process/threading stuff
from multiprocessing import Process, Lock, Queue
Output values out of map into the queue (Queue())
processess = []
results_queue = Queue()
for i in xrange(50):
p = Process(target=MyMapFunction, args=tab[i*50:(i+1) * 50])
processess.append(p)
p.start()
# Waiting and Reducing...
all_key_values = {}
for _ in xrange(50):
for k, v in results_queue.get():
all_key_values.setdefault(k, []).append(v)
# Some sort of check that threads are done but they should be
for p in processess:
p.join()
def MyMapFunction(tab):
return [(x, 2 * x) for x in tab]
I let you do the reduce the same way I did the map and correct the shitty i*50 : (i+1) * 50 that I wrote quickly to give an example
This is the pattern I use when I want to multithread in Python

How to fully utilise `lwt` in this case

Here is what I am going to do:
I have a list of task and I need to run them all every 1 hour (scheduling).
All those tasks are similar. for example, for one task, I need to download some data from a server (using http protocol and would take 5 - 8 seconds) and then do a computation on the data (would take 1 - 5 seconds).
I think I can use lwt to achieve these, but can't figure out the best way for efficiency.
For the task scheduling part, I can do like this (How to schedule a task in OCaml?):
let rec start () =
(Lwt_unix.sleep 1.) >>= (fun () -> print_endline "Hello, world !"; start ())
let _ = Lwt_main.run (start())
The questions come from the actual do_task part.
So a task involves http download and computation.
The http download part would have to wait for 5 to 8 seconds. If I really execute each task one by one, then it wastes the bandwidth and of course, I wish the download process of all tasks to be in parallel. So should I put this download part to lwt? and will lwt handle all the downloads in parallel?
By code, should I do like this?:
let content = function
| Some (_, body) -> Cohttp_lwt_unix.Body.string_of_body body
| _ -> return ""
let download task =
Cohttp_lwt_unix.Client.get ("http://dataserver/task?name="^task.name)
let get_data task =
(download task) >>= (fun response -> Lwt.return (Content response))
let do_task task =
(get_data task) >>= (fun data -> Lwt.return_unit (calculate data))
So, through the code above, will all tasks be executed in parallel, at least for the http download part?
For calculation part, will all calculations be executed in sequence?
Furthermore, can any one briefly describe the mechanism of lwt? Internally, what is the logic of light weight thread? Why can it handle IO in parallel?
To do parallel computation using lwt, you can check the lwt_list module, and especially iter_p.
val iter_p : ('a -> unit Lwt.t) -> 'a list -> unit Lwt.t
iter_p f l call the function f on each element of l, then waits for all the threads to terminate. For your purpose, it would look like :
let do_tasks tasks = List.iter_p do_task tasks
Assuming that "tasks" is a list of task.

How to schedule a task in OCaml?

I have a task need to be done every 4 hours or once a day.
In Java, it has quartz or spring or timer.
But in OCaml, how do I do that? Any good lib for that?
I don't know any library to do that, but I think you can easily implement that kind of behavior using the Lwt library.
Little example, to print Hello world every 4 hours :
let rec hello () =
Lwt.bind (Lwt_unix.sleep 14400.)
(fun () -> print_endline "Hello, world !"; hello ())
Lwt.async (hello)
The Lwt.async function call the function given (here, hello) in an asynchronous light weight thread, so you're free to do other stuff in your program. As long as your program doesn't exit, "Hello world" will be printed every 4 hours.
If you want to be able to stop it, you can also launch the thread like this instead of Lwt.async :
let a = hello ()
And then, to stop the thread :
Lwt.cancel a
Be aware that Lwt.cancel throws a "Lwt.canceled" exception !
Then, to be able to launch a task at a particular time of day, I can only encourage you to use functions from the Unix module, like localtime and mktime.