Out-of-order list iteration in Haskell

I'm implementing a (toy) stack machine in Haskell. I've defined a step function, step :: State -> Instruction -> State, which applies a given instruction to a given state and returns the resulting state of the machine. Naturally, I'd like a function run :: State -> Program -> State (where Program is a synonym for [Instruction]) that calls step as many times as needed to execute the given input program.
My initial, naive solution was to foldl, like so:
run :: State -> Program -> State
run st prog = foldl step st prog
Obviously, this can't support jumps, which would change whereabouts in the list I need to be. All this implementation does is iterate left-to-right through the program. For additional context, the machine's state, State, is as follows:
data State = State
  { pc :: Word
  , reg :: Word
  , stack :: [Word]
  , memory :: [Word]
  } deriving (Show, Eq)
and instructions are as follows:
data Opcode = Add | Sub | Mul | Div | Mod | Jump | Push | Pop | Load | Store | Set | Call | Ret | Pos | Dup | Swap | Halt | Nop
  deriving (Enum, Show, Eq)

data Instruction = Instruction
  { opcode :: Opcode
  , arg :: Maybe Word
  } deriving (Show, Eq)
How do I iterate through the list in an arbitrary order (and potentially forever, of course) so that I can support jumps?

I guess your step function will need to report whether it's time to halt. For example, suppose you modify its type to step :: State -> Instruction -> Maybe State. Then you can implement run just by delegating to it:
run :: State -> Program -> State
run state prog = case step state (prog !! fromIntegral (pc state)) of
  Nothing     -> state
  Just state' -> run state' prog
(You can avoid the fromIntegral by making your pc have type Int instead of Word.)
Note that (!!) is O(n)* (fight me in the comments ;-). You should consider switching from [Instruction] to Array Word Instruction so that you can use (!) instead, which is O(1).
* Okay, to be precise, (!!) is technically O(1) because Int has a fixed size -- but, ohhhh, that constant factor! So let's say a suitable generalization of (!!) to Integer is O(n). Similar quibbles apply to (!).
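As a concrete sketch of the array-based approach, here is a minimal, self-contained program. Note that the State and instruction types below are deliberately simplified stand-ins for illustration, not the full types from the question:

```haskell
import Data.Array

-- Simplified types for illustration only (not the question's full State/Instruction)
data Instr = Push Word | Jump Word | Halt deriving (Show, Eq)
data St = St { pc :: Word, stack :: [Word] } deriving (Show, Eq)

-- step returns Nothing to signal a halt, as suggested above
step :: St -> Instr -> Maybe St
step st (Push w) = Just st { pc = pc st + 1, stack = w : stack st }
step st (Jump n) = Just st { pc = n }
step _  Halt     = Nothing

-- (!) on an immutable Array is constant-time indexing, so jumps are cheap
run :: St -> Array Word Instr -> St
run st prog = case step st (prog ! pc st) of
  Nothing  -> st
  Just st' -> run st' prog

main :: IO ()
main = print (run (St 0 []) (listArray (0, 3) [Push 1, Jump 3, Push 2, Halt]))
-- prints St {pc = 3, stack = [1]}  (the Jump skips Push 2)
```

The same shape works with Data.Vector if you prefer; the key point is only that indexing is O(1) and jumps become a plain update of pc.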

Using an array is probably a good way to go, but there's also a way to step through your program that's a bit more like a fold by doing away with the program counter altogether.
The program counter is simply a pointer to the instruction in your program that you're going to run next. So, instead of having a program counter, we can just put the instruction to run next right into the state. Consider this alternate implementation of State:
data State = State
{ reg :: Word
, stack :: [Word]
, memory :: [Word]
, program :: Program
} deriving (Show, Eq)
Now we must reconsider step. In particular, each step no longer increments (or arbitrarily modifies) the pc but instead changes the program still to be run. Furthermore, step doesn't need to take an instruction as an argument, because the State already knows what is about to be run. Thus, we have something like:
step :: State -> State
step st = case program st of
  []                       -> st
  Instruction Halt _ : _   -> st { program = [] }
  Instruction Add _ : rest -> st { program = rest, ... } -- do whatever you do for Add, too
  ...
But what do we do for Jump? How can we jump to an arbitrary position in the program when the program is disappearing out from under us as we run it? One option is to additionally keep track of the original program. We could put this in the State as another field, but for variety, I'm going to pass it as an additional argument to step:
step :: Program -> State -> State
step originalProgram st = case program st of
  []                            -> st
  Instruction Halt _ : _        -> st { program = [] }
  Instruction Jump (Just n) : _ -> st { program = drop (fromIntegral n) originalProgram }
  ...
(Note that here I'm assuming your Jump is absolute rather than relative. If you have relative jumps, then instead of keeping track of the original program, you'd keep track of the "already executed" part of the program, probably as a list of instructions in reverse order. That is, every time you execute an instruction, you pop it off the program list and push it onto the executed list. When you hit a jump that goes backwards n instructions, you pop n instructions off the executed list and push them back onto the program list.)
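That bookkeeping for relative backward jumps is exactly a list zipper. A small sketch (the helpers below are hypothetical names, not part of the answer's code):

```haskell
-- A zipper over the program: (executed instructions in reverse order, remaining program).
type Zipper a = ([a], [a])

-- Execute one instruction: move it from "remaining" onto "executed".
forward :: Zipper a -> Zipper a
forward (done, x : rest) = (x : done, rest)
forward z                = z

-- Jump back n instructions: pop n off "executed", push them back onto "remaining".
jumpBack :: Int -> Zipper a -> Zipper a
jumpBack n (done, rest) =
  let (popped, done') = splitAt n done
  in (done', reverse popped ++ rest)

main :: IO ()
main = do
  let z = forward (forward (forward (([], "abcd") :: Zipper Char)))
  print z              -- ("cba","d")
  print (jumpBack 2 z) -- ("a","bcd")
```

Because splitAt is total, jumping back further than the executed prefix simply rewinds to the start rather than crashing, which may or may not be the semantics you want.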
Now all that's left is to run the whole program:
run :: State -> Program -> State
run startState originalProgram = go (startState { program = originalProgram })
  where
    go st = case program st of
      [] -> st
      _  -> go (step originalProgram st)
As far as performance goes, this is certainly worse than using an array; most steps will be fast, but jumps may take longer. On the other hand, you can run infinitely long programs this way, and you don't have to worry about out-of-bounds indexes.
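Putting the pieces together, here is a minimal runnable sketch of this program-in-the-state approach. It uses a stripped-down instruction type rather than the question's full Opcode, and Jump here is absolute:

```haskell
-- Stripped-down instruction set for illustration (not the question's Opcode type)
data Instr = Push Word | Add | Jump Int | Halt deriving (Show, Eq)
type Prog = [Instr]

data St = St { stack :: [Word], program :: Prog } deriving (Show, Eq)

step :: Prog -> St -> St
step orig st = case program st of
  []            -> st
  Halt : _      -> st { program = [] }
  Push w : rest -> st { stack = w : stack st, program = rest }
  Add : rest    -> case stack st of
    a : b : s -> st { stack = a + b : s, program = rest }
    _         -> st { program = [] } -- stack underflow: just halt
  Jump n : _    -> st { program = drop n orig }

run :: St -> Prog -> St
run st0 orig = go st0 { program = orig }
  where
    go st = case program st of
      [] -> st
      _  -> go (step orig st)

main :: IO ()
main = print (run (St [] []) [Push 2, Push 3, Add, Jump 5, Push 99, Halt])
-- prints St {stack = [5], program = []}  (the Jump skips Push 99)
```

The underflow handling above is a placeholder; a real machine would probably report an error state instead of silently halting.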

Related

Run the same code with different parameterizations on multiple nodes of a slurm cluster

I have a Fortran code with 3 scenarios.
I set a flag at the beginning of the code for the scenario I want to run:
integer :: scenario_no = 1 !Set 1, 2 or 3
I usually change this flag manually, compile the code, and run it on a cluster node.
Is there any way to create an sbatch file to run each of the 3 scenarios on a different node without having to recompile each time?
I recommend reading in the command line arguments to the program.
integer :: clen, status, scenario_no
character(len=4) :: buffer

! Body of runcfdsim
scenario_no = 1
clen = command_argument_count()
if (clen > 0) then
  call get_command_argument(1, buffer, clen, status)
  if (status == 0) then
    read(buffer, '(BN,I4)') scenario_no
    if (scenario_no < 1 .or. scenario_no > 4) then
      scenario_no = 1
    end if
  end if
end if
This way you can call
runcfdsim 1
runcfdsim 2
runcfdsim 3
runcfdsim 4
and each run will have a different value for the scenario_no variable.
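With the command-line-argument version above, a single job-array sbatch script can launch all scenarios on separate nodes without recompiling. This is only a sketch; the executable name and resource directives are placeholders you would adapt to your cluster:

```shell
#!/bin/bash
#SBATCH --job-name=runcfdsim
#SBATCH --array=1-3                # one array task per scenario
#SBATCH --ntasks=1
#SBATCH --output=scenario_%a.out   # %a = array index, so outputs don't collide

# Each array task gets its own SLURM_ARRAY_TASK_ID (1, 2 or 3),
# which we pass to the program as the scenario number.
srun ./runcfdsim "${SLURM_ARRAY_TASK_ID}"
```

Submitted once with sbatch, Slurm schedules the three array tasks independently, each potentially on a different node.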
No, if the variable is hard-coded, you have to recompile every time you change it.
EDIT: As pointed out in the comment, reading from a file may lead to problems when running in parallel. Better to read from an argument:
program test
  character(len=12) :: arg
  integer :: scenario, ios
  if (command_argument_count() >= 1) then
    call get_command_argument(1, arg)
    read(unit=arg, fmt=*, iostat=ios) scenario
    if (ios /= 0) then
      write(*,"('Invalid argument: ',a)") trim(arg)
      stop
    end if
    write(*,"('Running for scenario: ',i0)") scenario
  else
    write(*,"('Invalid argument')")
  end if
end program test
srun test.exe 1 &
srun test.exe 2 &
srun test.exe 3 &
wait
Make sure the outputs are done in different files, so they are not overwritten. You may also need to pin the tasks on different cores.

Erlang register error

I'm writing a program which takes two strings and concatenates them, as a simulation of a shared dropbox. I'm reusing code from a different application, which did a similar thing with a joint bank account, so the errors may be because I haven't changed some line of code properly, but I just can't work out what's wrong.
The code is written in two separate files and they link together, the basic dropbox is first and then the code which links that and displays the answer is below.
-module(dropbox).
-export([account/1, start/0, stop/0, deposit/1, get_bal/0, set_bal/1]).
account(Balance) ->
receive
{set, NewBalance} ->
account(NewBalance);
{get, From} ->
From ! {balance, Balance},
account(Balance);
stop -> ok
end.
start() ->
Account_PID = spawn(dropbox, account, [0]),
register(account_process, Account_PID).
stop() ->
account_process ! stop,
unregister(account_process).
set_bal(B) ->
account_process ! {set, B}.
get_bal() ->
account_process ! {get, self()},
receive
{balance, B} -> B
end.
deposit(Amount) ->
OldBalance = get_bal(),
NewBalance = OldBalance ++ Amount,
set_bal(NewBalance).
-module(dropboxtest).
-export([start/0, client/1]).
start() ->
dropbox:start(),
mutex:start(),
register(tester_process, self()),
loop("hello ", "world", 100),
unregister(tester_process),
mutex:stop(),
dropbox:stop().
loop(_, _, 0) ->
true;
loop(Amount1, Amount2, N) ->
dropbox:set_bal(" "),
spawn(dropboxtest, client, [Amount1]),
spawn(dropboxtest, client, [Amount2]),
receive
done -> true
end,
receive
done -> true
end,
io:format("Expected balance = ~p, actual balance = ~p~n~n",
[Amount1 ++ Amount2, dropbox:get_bal()]),
loop(Amount1, Amount2, N-1).
client(Amount) ->
dropbox:deposit(Amount),
tester_process ! done.
This is the error I'm getting. All of the other ones I've managed to work out, but I don't quite get this one, so I'm not sure how to solve it.
** exception error: bad argument
in function register/2
called as register(account_process,<0.56.0>)
in call from dropbox:start/0 (dropbox.erl, line 16)
in call from dropboxtest:start/0 (dropboxtest.erl, line 5)
Also I know that this is going to come up with errors due to concurrency issues, I need to show these errors to prove what’s wrong before I can fix it. Some of the functions haven't been changed from the bank program hence balance etc.
As per the documentation, register can fail with badarg for a number of reasons:
If PidOrPort is not an existing local process or port.
If RegName is already in use.
If the process or port is already registered (already has a name).
If RegName is the atom undefined.
In this case I suspect it's the second reason, that there's already a process with the name account_process, from a previous run. You could try restarting the Erlang shell, or you could change the spawn call in dropbox:start to spawn_link, which would cause the old process to crash in case of any error in the shell.

Getting result of a spawned function in Erlang

My objective at the moment is to write Erlang code calculating a list of N elements, where each element is the factorial of its "index" (so, for N = 10 I would like to get [1!, 2!, 3!, ..., 10!]). What's more, I would like every element to be calculated in a separate process (I know it is simply inefficient, but I am expected to implement it and compare its efficiency with other methods later).
In my code, I wanted to use one function as a "loop" over given N, that for N, N-1, N-2... spawns a process which calculates factorial(N) and sends the result to some "collecting" function, which packs received results into a list. I know my concept is probably overcomplicated, so hopefully the code will explain a bit more:
messageFactorial(N, listPID) ->
listPID ! factorial(N). %% send calculated factorial to "collector".
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
nProcessesFactorialList(-1) ->
ok;
nProcessesFactorialList(N) ->
spawn(pFactorial, messageFactorial, [N, listPID]), %%for each N spawn...
nProcessesFactorialList(N-1).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
listPrepare(List) -> %% "collector", for the last factorial returns
receive %% a list of factorials (1! = 1).
1 -> List;
X ->
listPrepare([X | List])
end.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
startProcessesFactorialList(N) ->
register(listPID, spawn(pFactorial, listPrepare, [[]])),
nProcessesFactorialList(N).
I guess it should work, by which I mean that listPrepare finally returns a list of factorials. But the problem is, I do not know how to get that list -- how do I get what it returned? As for now my code returns ok, as this is what nProcessesFactorialList returns when it finishes. I thought about sending the list of results from listPrepare to nProcessesFactorialList at the end, but then that would also need to be a registered process, from which I wouldn't know how to recover the list.
So basically, how to get the result from a registered process running listPrepare (which is my list of factorials)? If my code is not right at all, I would ask for a suggestion of how to get it better. Thanks in advance.
The way I do this sort of task is:
-module(par_fact).
-export([calc/1]).
fact(X) -> fact(X, 1).
fact(0, R) -> R;
fact(X, R) when X > 0 -> fact(X-1, R*X).
calc(N) ->
Self = self(),
Pids = [ spawn_link(fun() -> Self ! {self(), {X, fact(X)}} end)
|| X <- lists:seq(1, N) ],
[ receive {Pid, R} -> R end || Pid <- Pids ].
and result:
> par_fact:calc(25).
[{1,1},
{2,2},
{3,6},
{4,24},
{5,120},
{6,720},
{7,5040},
{8,40320},
{9,362880},
{10,3628800},
{11,39916800},
{12,479001600},
{13,6227020800},
{14,87178291200},
{15,1307674368000},
{16,20922789888000},
{17,355687428096000},
{18,6402373705728000},
{19,121645100408832000},
{20,2432902008176640000},
{21,51090942171709440000},
{22,1124000727777607680000},
{23,25852016738884976640000},
{24,620448401733239439360000},
{25,15511210043330985984000000}]
The first problem is that your listPrepare process doesn't do anything with the result. Try printing it at the end.
The second problem is that you don't wait for all the processes to finish, but only for the process that sends 1, and 1! is the quickest factorial to calculate. That message will almost surely be received before the more complex ones have been calculated, and you'll end up with only a few responses.
I had answered a bit similar question on the parallel work with many processes here: Create list across many processes in Erlang Maybe that one will help you.
I propose you this solution:
-export([launch/1,fact/2]).
launch(N) ->
launch(N,N).
% launch(Current,Total)
% when all processes are launched go to the result collect phase
launch(-1,N) -> collect(N+1);
launch(I,N) ->
% fact will be executed in a new process, so the normal way to get the answer is by message passing
% need to give the current process pid to get the answer back from the spawned process
spawn(?MODULE,fact,[I,self()]),
% loop until all processes are launched
launch(I-1,N).
% simply send the result to Pid.
fact(N,Pid) -> Pid ! {N,fact_1(N,1)}.
fact_1(I,R) when I < 2 -> R;
fact_1(I,R) -> fact_1(I-1,R*I).
% init the collect phase with an empty result list
collect(N) -> collect(N,[]).
% collect(Remaining_result_to_collect,Result_list)
collect(0,L) -> L;
% accumulate the results in L and loop until all messages are received
collect(N,L) ->
receive
R -> collect(N-1,[R|L])
end.
but a much more straight (single process) solution could be:
1> F = fun(N) -> lists:foldl(fun(I,[{X,R}|Q]) -> [{I,R*I},{X,R}|Q] end, [{0,1}], lists:seq(1,N)) end.
#Fun<erl_eval.6.80484245>
2> F(6).
[{6,720},{5,120},{4,24},{3,6},{2,2},{1,1},{0,1}]
[edit]
On a system with multiple cores, caches, and a multitasking underlying OS, there is absolutely no guarantee on the order of execution, and the same goes for message sending. The only guarantee is the message queue: you know that you will process messages in the order they were received. So I agree with Dmitry: your stop condition is not 100% effective.
In addition, using startProcessesFactorialList, you spawn listPrepare, which effectively collects all the factorial values (except 1!) and then simply forgets the result when the process ends. I guess this code snippet is not exactly the one you use for testing.

How to fully utilise `lwt` in this case

Here is what I am going to do:
I have a list of tasks and I need to run them all every hour (scheduling).
All the tasks are similar. For example, one task might download some data from a server (over HTTP, taking 5-8 seconds) and then run a computation on the data (taking 1-5 seconds).
I think I can use lwt to achieve these, but can't figure out the best way for efficiency.
For the task scheduling part, I can do like this (How to schedule a task in OCaml?):
let rec start () =
(Lwt_unix.sleep 1.) >>= (fun () -> print_endline "Hello, world !"; start ())
let _ = Lwt_main.run (start())
The questions come from the actual do_task part.
So a task involves http download and computation.
The HTTP download part has to wait for 5 to 8 seconds. If I really execute the tasks one by one, I waste bandwidth; of course, I'd like the downloads of all tasks to run in parallel. So should I put the download part into lwt, and will lwt handle all the downloads in parallel?
By code, should I do like this?:
let content = function
| Some (_, body) -> Cohttp_lwt_unix.Body.string_of_body body
| _ -> return ""
let download task =
Cohttp_lwt_unix.Client.get ("http://dataserver/task?name="^task.name)
let get_data task =
(download task) >>= (fun response -> Lwt.return (Content response))
let do_task task =
(get_data task) >>= (fun data -> Lwt.return_unit (calculate data))
So, through the code above, will all tasks be executed in parallel, at least for the http download part?
For calculation part, will all calculations be executed in sequence?
Furthermore, can any one briefly describe the mechanism of lwt? Internally, what is the logic of light weight thread? Why can it handle IO in parallel?
To do parallel computation using lwt, you can check the Lwt_list module, and especially iter_p.
val iter_p : ('a -> unit Lwt.t) -> 'a list -> unit Lwt.t
iter_p f l calls the function f on each element of l, then waits for all the threads to terminate. For your purpose, it would look like:
let do_tasks tasks = Lwt_list.iter_p do_task tasks
assuming that tasks is a list of task values.

Is F# really faster than Erlang at spawning and killing processes?

Updated: This question contains an error which makes the benchmark meaningless. I will attempt a better benchmark comparing F# and Erlang's basic concurrency functionality and inquire about the results in another question.
I am trying to understand the performance characteristics of Erlang and F#. I find Erlang's concurrency model very appealing but am inclined to use F# for interoperability reasons. While out of the box F# doesn't offer anything like Erlang's concurrency primitives -- from what I can tell async and MailboxProcessor only cover a small portion of what Erlang does well -- I've been trying to understand what is possible in F# performance-wise.
In Joe Armstrong's Programming Erlang book, he makes the point that processes are very cheap in Erlang. He uses roughly the following code to demonstrate this:
-module(processes).
-export([max/1]).
%% max(N)
%% Create N processes then destroy them
%% See how much time this takes
max(N) ->
statistics(runtime),
statistics(wall_clock),
L = for(1, N, fun() -> spawn(fun() -> wait() end) end),
{_, Time1} = statistics(runtime),
{_, Time2} = statistics(wall_clock),
lists:foreach(fun(Pid) -> Pid ! die end, L),
U1 = Time1 * 1000 / N,
U2 = Time2 * 1000 / N,
io:format("Process spawn time=~p (~p) microseconds~n",
[U1, U2]).
wait() ->
receive
die -> void
end.
for(N, N, F) -> [F()];
for(I, N, F) -> [F()|for(I+1, N, F)].
On my Macbook Pro, spawning and killing 100 thousand processes (processes:max(100000)) takes about 8 microseconds per process. I can raise the number of processes a bit further, but a million seems to break things pretty consistently.
Knowing very little F#, I tried to implement this example using async and MailBoxProcessor. My attempt, which may be wrong, is as follows:
#r "System.dll"
open System.Diagnostics
type waitMsg =
| Die
let wait =
MailboxProcessor.Start(fun inbox ->
let rec loop =
async { let! msg = inbox.Receive()
match msg with
| Die -> return() }
loop)
let max N =
printfn "Started!"
let stopwatch = new Stopwatch()
stopwatch.Start()
let actors = [for i in 1 .. N do yield wait]
for actor in actors do
actor.Post(Die)
stopwatch.Stop()
printfn "Process spawn time=%f microseconds." (stopwatch.Elapsed.TotalMilliseconds * 1000.0 / float(N))
printfn "Done."
Using F# on Mono, starting and killing 100,000 actors/processes takes under 2 microseconds per process, roughly 4 times faster than Erlang. More importantly, perhaps, I can scale up to millions of processes without any apparent problems. Starting 1 or 2 million processes still takes about 2 microseconds per process. Starting 20 million processes is still feasible, but slows to about 6 microseconds per process.
I have not yet taken the time to fully understand how F# implements async and MailBoxProcessor, but these results are encouraging. Is there something I'm doing horribly wrong?
If not, is there some place Erlang will likely outperform F#? Is there any reason Erlang's concurrency primitives can't be brought to F# through a library?
EDIT: The above numbers are wrong, due to the error Brian pointed out. I will update the entire question when I fix it.
In your original code, you only started one MailboxProcessor. Make wait() a function, and call it with each yield. Also you are not waiting for them to spin up or receive the messages, which I think invalidates the timing info; see my code below.
That said, I have some success; on my box I can do 100,000 at about 25us each. After too much more, I think possibly you start fighting the allocator/GC as much as anything, but I was able to do a million too (at about 27us each, but at this point was using like 1.5G of memory).
Basically each 'suspended async' (which is the state when a mailbox is waiting on a line like
let! msg = inbox.Receive()
) only takes some number of bytes while it's blocked. That's why you can have way, way, way more asyncs than threads; a thread typically takes like a megabyte of memory or more.
Ok, here's the code I'm using. You can use a small number like 10, and --define DEBUG to ensure the program semantics are what is desired (printf outputs may be interleaved, but you'll get the idea).
open System.Diagnostics
let MAX = 100000
type waitMsg =
| Die
let mutable countDown = MAX
let mre = new System.Threading.ManualResetEvent(false)
let wait(i) =
MailboxProcessor.Start(fun inbox ->
let rec loop =
async {
#if DEBUG
printfn "I am mbox #%d" i
#endif
if System.Threading.Interlocked.Decrement(&countDown) = 0 then
mre.Set() |> ignore
let! msg = inbox.Receive()
match msg with
| Die ->
#if DEBUG
printfn "mbox #%d died" i
#endif
if System.Threading.Interlocked.Decrement(&countDown) = 0 then
mre.Set() |> ignore
return() }
loop)
let max N =
printfn "Started!"
let stopwatch = new Stopwatch()
stopwatch.Start()
let actors = [for i in 1 .. N do yield wait(i)]
mre.WaitOne() |> ignore // ensure they have all spun up
mre.Reset() |> ignore
countDown <- MAX
for actor in actors do
actor.Post(Die)
mre.WaitOne() |> ignore // ensure they have all got the message
stopwatch.Stop()
printfn "Process spawn time=%f microseconds." (stopwatch.Elapsed.TotalMilliseconds * 1000.0 / float(N))
printfn "Done."
max MAX
All this said, I don't know Erlang, and I have not thought deeply about whether there's a way to trim down the F# any more (though it's pretty idiomatic as-is).
Erlang's VM doesn't use OS threads or processes to switch to a new Erlang process. The VM simply counts function calls (reductions) in your code and jumps to another VM process after a certain number, within the same OS process and the same OS thread.
The CLR uses mechanics based on OS processes and threads, so F# pays a much higher overhead cost for each context switch.
So the answer to your question is: no, Erlang is much faster at spawning and killing processes.
P.S. You may find the results of this practical contest interesting.