Can I catch an external exit in Erlang? - concurrency

I have two processes linked; let's say they're A and B, with A set to trap exits. I want to be able to recover a piece of B's process data if someone calls exit/2 on it, e.g. exit(B, diediedie).
In B's module, let's call it bmod.erl, I have some code that looks like this:
-module(bmod).
-export([b_start/2]).

b_start(A, X) ->
    spawn(fun() -> b_main(A, X) end).

b_main(A, X) ->
    try
        A ! {self(), doing_stuff},
        do_stuff()
    catch
        exit:_ -> exit({terminated, X})
    end,
    b_main(A, X).

do_stuff() -> io:format("doing stuff.~n", []).
And in A's module, let's call it amod.erl, I have some code that looks like this:
-module(amod).
-export([a_start/0]).

a_start() ->
    process_flag(trap_exit, true),
    link(bmod:b_start(self(), some_stuff_to_do)),
    a_main().

a_main() ->
    receive
        {Pid, doing_stuff} ->
            io:format("Process ~p did stuff.~n", [Pid]),
            exit(Pid, diediedie),
            a_main();
        {'EXIT', Pid, {terminated, X}} ->
            io:format("Process ~p was terminated, had ~p.~n", [Pid, X]),
            fine;
        {'EXIT', Pid, _Reason} ->
            io:format("Process ~p was terminated, can't find what it had.~n", [Pid]),
            woops
    end.
(I realize that I should normally use spawn_link, but in my original program there is code between the spawn and the link, so I modeled this example that way.)
Now when I run the code, I get this.
2> c(amod).
{ok,amod}
3> c(bmod).
{ok,bmod}
4> amod:a_start().
doing stuff.
Process <0.44.0> did stuff.
doing stuff.
Process <0.44.0> did stuff.
Process <0.44.0> was terminated, can't find what it had.
woops
5>
How do I get b_main() to catch this external exit so it can report its state X?

For b_main/2 to catch the external exit, the process has to trap exits by calling process_flag(trap_exit, true). The exit signal then arrives as a message, and the process can exit with the state X itself. The code is below:
b_start(A, X) ->
    spawn(fun() -> process_flag(trap_exit, true), b_main(A, X) end).

b_main(A, X) ->
    try
        A ! {self(), doing_stuff},
        do_stuff()
    catch
        exit:_ ->
            io:format("exit inside do_stuff().~n"),
            exit({terminated, X})
    end,
    receive
        {'EXIT', Pid, Reason} ->
            io:format("Process received exit ~p ~p.~n", [Pid, Reason]),
            exit({terminated, X})
    after 0 ->
        ok
    end,
    b_main(A, X).
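With this change, B turns the diediedie exit signal into an {'EXIT', ...} message, reports its state, and terminates with {terminated, X}, which A then matches. An illustrative run (the pid and the number of "doing stuff" lines depend on scheduling, so your output will differ) looks roughly like this:
5> amod:a_start().
doing stuff.
Process <0.50.0> did stuff.
doing stuff.
Process <0.50.0> did stuff.
Process <0.50.0> was terminated, had some_stuff_to_do.
fine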

The short answer: you should do trap_exit in b_main/2 as well, and receive {'EXIT', ...} messages. It was outlined by @vinod right before my attempt. I, instead, will try to explain some things about what is going on.
If a process is trapping exits and would otherwise die, for example because someone called exit(Pid, die) or because a linked process terminated with exit(die), it instead gets an {'EXIT', ...} message in its mailbox rather than dying silently with the same reason. It is the runtime system that delivers exit signals to every linked process, and a process may trap them instead of dying.
The only exception to this rule is when an exit(Pid, kill) call is issued: no matter whether the process is trapping exits or not, it is killed unconditionally.
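A minimal shell sketch of that exception (the pids and shell prompts are illustrative, not from the original post):
1> P1 = spawn(fun() ->
           process_flag(trap_exit, true),
           receive M -> io:format("trapped: ~p~n", [M]) end
       end).
<0.84.0>
2> exit(P1, diediedie).
trapped: {'EXIT',<0.82.0>,diediedie}
true
3> P2 = spawn(fun() ->
           process_flag(trap_exit, true),
           receive M -> io:format("trapped: ~p~n", [M]) end
       end).
<0.87.0>
4> exit(P2, kill).    %% kill cannot be trapped; P2 simply dies
true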
So, to avoid a silent death caused by an external exit signal, the process must trap exits. Likewise, if a process wants to know why something linked to it just died, and to make some effort to recover, that process must trap exits. Every trapped exit signal appears as a message in the process's mailbox.
So your try ... catch exit:_ -> ... expression has no effect when it comes to trapping exits: an external exit signal is not an exception raised inside the process, so catch cannot intercept it.
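You can see the difference in the shell (output illustrative): an exit raised inside the process is an exception that catch sees, while an external exit signal only becomes visible once you trap it:
1> try exit(internal) catch exit:Why -> {caught, Why} end.
{caught,internal}
2> process_flag(trap_exit, true).
false
3> exit(self(), external).
true
4> flush().
Shell got {'EXIT',<0.82.0>,external}
ok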
Generally, trap_exit is considered bad practice. Here is a simple example that shows why:
18> self().
<0.42.0>
19> Pid = spawn_link(fun () ->
        process_flag(trap_exit, true),
        Loop = fun (F) ->
                   receive Any -> io:format("Any: ~p~n", [Any]) end,
                   F(F)
               end,
        Loop(Loop)
    end).
<0.58.0>
20> exit(Pid, grenade).
Any: {'EXIT',<0.42.0>,grenade}
true
21> exit(Pid, grenade).
Any: {'EXIT',<0.42.0>,grenade}
true
...
As you can see, this process is linked, traps exits, and refuses to die. That is unexpected and, obviously, potentially dangerous. It can also break a chain of exits issued to a set of linked processes, since exit signals normally propagate through every link.
There are a bunch of subtleties, which are laid out wonderfully in this book chapter.

Related

Concurrent process execution order

Trying to figure out how Erlang concurrency works. For testing, I have the following modules:
server.erl:
-module(server).
-export([loop/0]).

loop() ->
    receive
        {foo, Msg_foo} ->
            io:format("~w~n", [Msg_foo]),
            loop();
        {bar, Msg_bar} ->
            io:format("~w~n", [Msg_bar]),
            loop();
        stop ->
            io:format("~s~n", ["End server process"]),
            true
    end.
process_a.erl
-module(process_a).
-export([go_a/0]).

go_a() ->
    receive
        {foo, Pid1} ->
            Pid1 ! {foo, 'Message foo from process A'},
            go_a();
        {bar, Pid2} ->
            Pid2 ! {bar, 'Message bar from process A'},
            go_a()
    end.
process_b.erl
-module(process_b).
-export([go_b/0]).

go_b() ->
    receive
        {foo, Pid1} ->
            Pid1 ! {foo, 'Message foo from process B'},
            go_b();
        {bar, Pid2} ->
            Pid2 ! {bar, 'Message bar from process B'},
            go_b()
    end.
client.erl
-module(client).
-export([start/0]).
-import(server, [loop/0]).
-import(process_a, [go_a/0]).
-import(process_b, [go_b/0]).

go() ->
    Server_Pid = spawn(server, loop, []),
    Pid_A = spawn(process_a, go_a, []),
    Pid_B = spawn(process_b, go_b, []),
    Pid_A ! {foo, Server_Pid},
    Pid_B ! {bar, Server_Pid},
    Pid_A ! {bar, Server_Pid},
    Pid_B ! {foo, Server_Pid},
    Pid_A ! {foo, Server_Pid},
    Pid_B ! {foo, Server_Pid},
    Pid_A ! {bar, Server_Pid},
    Pid_B ! {bar, Server_Pid}.

start() ->
    go().
The client sends messages to process A and process B which in turn send messages to the server. The order of the messages is:
A foo
B bar
A bar
B foo
A foo
B foo
A bar
B bar
but the program output is:
'Message foo from process A'
'Message bar from process A'
'Message foo from process A'
'Message bar from process A'
'Message bar from process B'
'Message foo from process B'
'Message foo from process B'
'Message bar from process B'
The server first processes all messages from process A, then all the messages from process B. My question is, what does determine the message processing order? I thought that it was the order in which the messages were received.
It all depends on process scheduling. After your client code starts the server and procs A and B, those processes are newly created but might not even have been given any time to execute yet (and if they have, they will immediately be suspended in their receives). The client code keeps executing and quickly sends off a bunch of messages to A and B. These are asynchronous operations and the client process will not have to suspend at all before returning from the call to go().
As soon as a suspended process gets a message, it becomes ready to be scheduled for execution, but it can take a little while before that happens. Meanwhile, more messages may keep arriving in their mailboxes, so when A or B actually starts running, it is likely to have all four messages from the client already in its mailbox. In general you also cannot be sure which of A and B will start to execute first, even though the scheduling is probably very predictable in a simple case like this.
So in your case, A gets scheduled before B, it starts executing, and in very short time it consumes all its messages. This does not take much work, so A won't even spend a whole time slice. Then it suspends due to its mailbox being empty. Then B gets scheduled and does the same thing.
If there had been many processes, and/or a lot of work, the Erlang VM could have split the processes up across schedulers on different OS threads (running in truly parallel fashion if you have a multicore CPU). But since the example is so simple, these processes are probably handled within a single scheduler, and thus the ordering becomes even more predictable. If both A and B had thousands of messages in their queue, or each message took a lot of computational effort to process, you would see the messages getting interleaved.
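If the test needs a deterministic order, one option (a sketch, not part of the original code; go_direct/0 is a hypothetical helper for client.erl) is to let a single process do all the sending, because Erlang only guarantees message ordering between one sender and one receiver:
go_direct() ->
    Server_Pid = spawn(server, loop, []),
    %% all messages come from the same process, so the server receives
    %% and prints them in exactly the order they were sent
    Server_Pid ! {foo, 'A foo'},
    Server_Pid ! {bar, 'B bar'},
    Server_Pid ! {bar, 'A bar'},
    Server_Pid ! {foo, 'B foo'},
    Server_Pid ! stop.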
(By the way, your import declarations in the client do nothing, since you are using spawn(Module, Fname, Args). If you had written e.g. spawn(fun() -> loop() end) they would be needed.)

Erlang register error

I'm writing a program which will take two strings and concatenate them in a shared dropbox simulation. I'm using code from a different application, which did a similar thing with a joint bank account, so the errors may be because I haven't changed some line of code properly, but I just can't work out what's wrong.
The code is written in two separate files that work together; the basic dropbox module is first, and the code which drives it and displays the answer is below.
-module(dropbox).
-export([account/1, start/0, stop/0, deposit/1, get_bal/0, set_bal/1]).

account(Balance) ->
    receive
        {set, NewBalance} ->
            account(NewBalance);
        {get, From} ->
            From ! {balance, Balance},
            account(Balance);
        stop -> ok
    end.

start() ->
    Account_PID = spawn(dropbox, account, [0]),
    register(account_process, Account_PID).

stop() ->
    account_process ! stop,
    unregister(account_process).

set_bal(B) ->
    account_process ! {set, B}.

get_bal() ->
    account_process ! {get, self()},
    receive
        {balance, B} -> B
    end.

deposit(Amount) ->
    OldBalance = get_bal(),
    NewBalance = OldBalance ++ Amount,
    set_bal(NewBalance).
-module(dropboxtest).
-export([start/0, client/1]).

start() ->
    dropbox:start(),
    mutex:start(),
    register(tester_process, self()),
    loop("hello ", "world", 100),
    unregister(tester_process),
    mutex:stop(),
    dropbox:stop().

loop(_, _, 0) ->
    true;
loop(Amount1, Amount2, N) ->
    dropbox:set_bal(" "),
    spawn(dropboxtest, client, [Amount1]),
    spawn(dropboxtest, client, [Amount2]),
    receive
        done -> true
    end,
    receive
        done -> true
    end,
    io:format("Expected balance = ~p, actual balance = ~p~n~n",
              [Amount1 ++ Amount2, dropbox:get_bal()]),
    loop(Amount1, Amount2, N-1).

client(Amount) ->
    dropbox:deposit(Amount),
    tester_process ! done.
This is the error I'm getting. All of the other ones I've managed to work out, but I don't quite get this one, so I'm not sure how to solve it.
** exception error: bad argument
in function register/2
called as register(account_process,<0.56.0>)
in call from dropbox:start/0 (dropbox.erl, line 16)
in call from dropboxtest:start/0 (dropboxtest.erl, line 5)
Also, I know that this is going to produce errors due to concurrency issues; I need to show these errors to prove what's wrong before I can fix them. Some of the functions haven't been changed from the bank program, hence balance etc.
As per the documentation, register can fail with badarg for a number of reasons:
If PidOrPort is not an existing local process or port.
If RegName is already in use.
If the process or port is already registered (already has a name).
If RegName is the atom undefined.
In this case I suspect it's the second reason: a process named account_process is still registered from a previous run. You could restart the Erlang shell, or you could change the spawn call in dropbox:start to spawn_link, so that the account process is taken down along with the shell process whenever an error occurs in the shell, freeing the name for the next run.
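If restarting the shell gets tedious, a hedged sketch of a more forgiving dropbox:start/0 (not part of the original code) could clean up a leftover registration first:
start() ->
    case whereis(account_process) of
        undefined ->
            ok;
        OldPid ->
            %% a process from a previous run is still registered:
            %% free the name, then ask the old loop to finish
            unregister(account_process),
            OldPid ! stop
    end,
    Account_PID = spawn(dropbox, account, [0]),
    register(account_process, Account_PID).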

How to synchronize multiple goroutines to the termination of a selected goroutine (ie. Thread.join())

I asked this in a previous question, but some people felt that my original question was not detailed enough ("Why would you ever want a timed condition wait??") so here is a more specific one.
I have a goroutine running, call it server. It is already started, will execute for some amount of time, and do its thing. Then, it will exit since it is done.
During its execution some large number of other goroutines start. Call them "client" threads if you like. They run step A and step B. Then they must wait, for a specified amount of time, for the "server" goroutine to finish: they exit with a status if the server is not finished, and run, say, step C if it finishes.
(Please do not tell me how to restructure this workflow. It is hypothetical and a given. It cannot be changed.)
A normal, sane way to do this is to have the server thread signal a condition variable with a signalAll or Broadcast function, and have the other threads in a timed wait state monitoring the condition variable.
func (s *Server) Join(timeMillis int) error {
    s.mux.Lock()
    defer s.mux.Unlock()
    for !s.isFinished {
        err = s.cond.Wait(timeMillis) // hypothetical timed wait; sync.Cond.Wait takes no timeout
        if err != nil {
            stepC()
        }
    }
    return err
}
Here the server would enter a state where isFinished becomes true and would broadcast on the condition variable, atomically with respect to the mutex. Except this is impossible, since Go does not support timed condition waits. (But there is a Broadcast().)
So, what is the "Go-centric" way to do this? I've read all of the Go blogs and documentation, and this pattern or its equivalent, despite its obviousness, never comes up, nor does any equivalent "reframing" of the basic problem, which is that IPC-style channels are between one routine and one other routine. Yes, there is fan-in/fan-out, but remember these threads are constantly appearing and vanishing. This should be simple, and crucially it should not leave thousands of "wait-state" goroutines hanging around waiting for the server to die when the other "branch" of the mux channel (the timer) has already signalled.
Note that some of the "clients" above might be started before the server goroutine has started (this is when the channel is usually created), some might appear during its execution, and some might appear after... in all cases they should run stepC if and only if the server has run and exited after timeMillis milliseconds from entering the Join() function...
In general the channels facility seems sorely lacking when there's more than one consumer. "First build a registry of channels to which listeners are mapped" and "there's this really nifty recursive data structure which sends itself over a channel it holds as a field" are so.not.ok as replacements for the nice, reliable, friendly, obvious wait(forSomeTime).
I think what you want can be done by selecting on a single shared channel, and then having the server close it when it's done.
Say we create a global "Exit channel", that's shared across all goroutines. It can be created before the "server" goroutine is created. The important part is that the server goroutine never sends anything down the channel, but simply closes it.
Now the client goroutines, simply do this:
select {
case <-ch:
    fmt.Println("Channel closed, server is done!")
case <-time.After(time.Second):
    fmt.Println("Timed out. do recovery stuff")
}
and the server goroutine just does:
close(ch)
More complete example:
package main

import (
    "fmt"
    "time"
)

func waiter(ch chan struct{}) {
    fmt.Println("Doing stuff")
    fmt.Println("Waiting...")
    select {
    case <-ch:
        fmt.Println("Channel closed")
    case <-time.After(time.Second):
        fmt.Println("Timed out. do recovery stuff")
    }
}

func main() {
    ch := make(chan struct{})
    go waiter(ch)
    go waiter(ch)
    time.Sleep(100 * time.Millisecond)
    fmt.Println("Closing channel")
    close(ch)
    time.Sleep(time.Second)
}
This can be abstracted as the following utility API:
type TimedCondition struct {
    ch chan struct{}
}

func NewTimedCondition() *TimedCondition {
    return &TimedCondition{
        ch: make(chan struct{}),
    }
}

func (c *TimedCondition) Broadcast() {
    close(c.ch)
}

func (c *TimedCondition) Wait(t time.Duration) error {
    select {
    // channel closed, meaning broadcast was called
    case <-c.ch:
        return nil
    case <-time.After(t):
        return errors.New("Time out")
    }
}

File Server erlang response

I am trying to follow the very first example given in the Programming Erlang, Software for a concurrent world by Joe Armstrong. Here is the code:
-module(afile_server).
-export([start/1, loop/1]).

start(Dir) -> spawn(afile_server, loop, [Dir]).

loop(Dir) ->
    receive
        {Client, list_dir} ->
            Client ! {self(), file:list_dir(Dir)};
        {Client, {get_file, File}} ->
            Full = filename:join(Dir, File),
            Client ! {self(), file:read_file(Full)}
    end,
    loop(Dir).
Then I run this in the shell:
c(afile_server).
FileServer = spawn(afile_server, start, ".").
FileServer ! {self(), list_dir}.
receive X -> X end.
In the book a list of the files is returned as expected; however, in my shell it looks as if the program has frozen. Nothing gets returned, yet the program is still running. I'm not familiar with Erlang at all, but I can understand how this should work.
I'm running this in Windows 7 64-bit. The directory is not empty as it contains a bunch of other erlang files.
So, what does the start/1 function do here? It spawns a process that runs loop/1, and instead of just calling it in your shell you spawn it as well! You end up with a chain of two spawned processes, and the process bound to FileServer dies immediately because its only job is to spawn the actual file server, whose pid is unknown to you.
Just change the line:
FileServer = spawn(afile_server, start, ".").
to:
FileServer = afile_server:start(".").
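With that change, a session along these lines (pids and file names are illustrative) behaves the way the book describes:
1> c(afile_server).
{ok,afile_server}
2> FileServer = afile_server:start(".").
<0.47.0>
3> FileServer ! {self(), list_dir}.
{<0.33.0>,list_dir}
4> receive X -> X end.
{<0.47.0>,{ok,["afile_server.beam","afile_server.erl"]}}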

Getting result of a spawned function in Erlang

My objective at the moment is to write Erlang code calculating a list of N elements, where each element is the factorial of its "index" (so, for N = 10 I would like to get [1!, 2!, 3!, ..., 10!]). What's more, I would like every element to be calculated in a separate process (I know it is simply inefficient, but I am expected to implement it and compare its efficiency with other methods later).
In my code, I wanted to use one function as a "loop" over given N, that for N, N-1, N-2... spawns a process which calculates factorial(N) and sends the result to some "collecting" function, which packs received results into a list. I know my concept is probably overcomplicated, so hopefully the code will explain a bit more:
messageFactorial(N, listPID) ->
    listPID ! factorial(N).    %% send calculated factorial to "collector".

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
nProcessesFactorialList(-1) ->
    ok;
nProcessesFactorialList(N) ->
    spawn(pFactorial, messageFactorial, [N, listPID]), %% for each N spawn...
    nProcessesFactorialList(N-1).

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
listPrepare(List) ->       %% "collector", for the last factorial returns
    receive                %% a list of factorials (1! = 1).
        1 -> List;
        X ->
            listPrepare([X | List])
    end.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
startProcessesFactorialList(N) ->
    register(listPID, spawn(pFactorial, listPrepare, [[]])),
    nProcessesFactorialList(N).
I guess it should work, by which I mean that listPrepare finally returns a list of factorials. But the problem is, I do not know how to get that list, how to get what it returned. As of now my code returns ok, as this is what nProcessesFactorialList returns when it finishes. I thought about sending the list of results from listPrepare to nProcessesFactorialList at the end, but then that would also need to be a registered process, and I still wouldn't know how to recover the list from it.
So basically, how do I get the result (my list of factorials) from the registered process running listPrepare? If my code is not right at all, I would ask for a suggestion of how to do it better. Thanks in advance.
My way to do this sort of task is:
-module(par_fact).
-export([calc/1]).

fact(X) -> fact(X, 1).

fact(0, R) -> R;
fact(X, R) when X > 0 -> fact(X-1, R*X).

calc(N) ->
    Self = self(),
    Pids = [ spawn_link(fun() -> Self ! {self(), {X, fact(X)}} end)
             || X <- lists:seq(1, N) ],
    [ receive {Pid, R} -> R end || Pid <- Pids ].
and result:
> par_fact:calc(25).
[{1,1},
{2,2},
{3,6},
{4,24},
{5,120},
{6,720},
{7,5040},
{8,40320},
{9,362880},
{10,3628800},
{11,39916800},
{12,479001600},
{13,6227020800},
{14,87178291200},
{15,1307674368000},
{16,20922789888000},
{17,355687428096000},
{18,6402373705728000},
{19,121645100408832000},
{20,2432902008176640000},
{21,51090942171709440000},
{22,1124000727777607680000},
{23,25852016738884976640000},
{24,620448401733239439360000},
{25,15511210043330985984000000}]
The first problem is that your listPrepare process doesn't do anything with the result. Try printing it at the end.
The second problem is that you don't wait for all the processes to finish, only for the process that sends 1, and that is the quickest factorial to calculate. That message will most likely be received before the more complex ones have been calculated, and you'll end up with only a few responses.
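One way around both problems (a sketch, not the original code: the extra arity and the CallerPid argument are new) is to count down the expected number of results and send the finished list back to whoever asked for it:
%% collect exactly N results, then hand the finished list to the caller
listPrepare(0, List, CallerPid) ->
    CallerPid ! {factorials, List};
listPrepare(N, List, CallerPid) ->
    receive
        X -> listPrepare(N - 1, [X | List], CallerPid)
    end.
The caller can then simply do receive {factorials, L} -> L end to obtain the list.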
I answered a somewhat similar question about parallel work with many processes here: Create list across many processes in Erlang. Maybe that one will help you.
I propose you this solution:
-module(fact_list).   %% module name not given in the original; chosen for illustration
-export([launch/1, fact/2]).

launch(N) ->
    launch(N, N).

% launch(Current, Total)
% when all processes are launched, go to the result collection phase
launch(-1, N) -> collect(N+1);
launch(I, N) ->
    % fact will be executed in a new process, so the normal way to get the
    % answer is by message passing; we need to give the current process pid
    % to get the answer back from the spawned process
    spawn(?MODULE, fact, [I, self()]),
    % loop until all processes are launched
    launch(I-1, N).

% simply send the result to Pid
fact(N, Pid) -> Pid ! {N, fact_1(N, 1)}.

fact_1(I, R) when I < 2 -> R;
fact_1(I, R) -> fact_1(I-1, R*I).

% init the collect phase with an empty result list
collect(N) -> collect(N, []).

% collect(Remaining_results_to_collect, Result_list)
collect(0, L) -> L;
% accumulate the results in L and loop until all messages are received
collect(N, L) ->
    receive
        R -> collect(N-1, [R|L])
    end.
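A hypothetical usage sketch, assuming the code above is saved under the fact_list module name chosen above; the collected order depends on message arrival, so sort by the index if a stable order matters:
1> lists:keysort(1, fact_list:launch(5)).
[{0,1},{1,1},{2,2},{3,6},{4,24},{5,120}]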
but a much more straightforward (single-process) solution could be:
1> F = fun(N) -> lists:foldl(fun(I,[{X,R}|Q]) -> [{I,R*I},{X,R}|Q] end, [{0,1}], lists:seq(1,N)) end.
#Fun<erl_eval.6.80484245>
2> F(6).
[{6,720},{5,120},{4,24},{3,6},{2,2},{1,1},{0,1}]
[edit]
On a multicore system with caches and a multitasking underlying OS, there is absolutely no guarantee on the order of execution, and the same goes for message sending. The only guarantee is within a message queue, where you know you will process the messages in the order they were received. So I agree with Dmitry: your stop condition is not 100% reliable.
In addition, in startProcessesFactorialList you spawn listPrepare, which effectively collects all the factorial values (except 1!) and then simply forgets the result when the process ends. I guess this code snippet is not exactly the one you use for testing.