Some httpc requests neither complete nor trigger an error - concurrency

This Erlang-based web client issues N concurrent requests to a given URL; e.g., escript client.erl http://127.0.0.1:1234/random?num=1000 3200 issues 3200 concurrent requests. For completeness, a corresponding server can be found here.
The client uses spawn_link to spawn N processes that issue one request each. The success/failure of the httpc:request call is handled in the spawned process (within the dispatch_request function) and propagated to the parent process. The parent process handles both data messages from a spawned process and EXIT messages corresponding to the abnormal termination of a spawned process. So the parent process waits to receive N messages from/about the spawned processes before it terminates normally.
Given the above context, the client hangs on some executions (e.g., when the server is terminated forcefully) because the parent process never receives N messages from the child processes. I am observing this behavior on a Raspberry Pi 3B running Raspbian 9.9 and esl-erlang 22.0-1.
The parent process does not seem to be handling all cases of child-process termination. If so, what are these cases? If not, what might be the reason for fewer than N messages?
Client code:
% escript client.erl http://127.0.0.1:1234/random?num=5 30
-module(client).
-export([main/1]).
-import(httpc, [request/1]).
-mode(compile).

dispatch_request(Url, Parent) ->
    Start = erlang:monotonic_time(microsecond),
    {Status, Value} = httpc:request(get, {Url, []}, [{timeout, 60000}], []),
    Elapsed_Time = (erlang:monotonic_time(microsecond) - Start) / 1000,
    Msg = case Status of
              ok ->
                  io_lib:format("~pms OK", [Elapsed_Time]);
              error ->
                  io_lib:format("~pms REQ_ERR ~p", [Elapsed_Time, element(1, Value)])
          end,
    Parent ! {Status, Msg}.

wait_on_children(0, NumOfSucc, NumOfFail) ->
    io:format("Success: ~p~n", [NumOfSucc]),
    io:format("Failure: ~p~n", [NumOfFail]);
wait_on_children(Num, NumOfSucc, NumOfFail) ->
    receive
        {'EXIT', ChildPid, {ErrorCode, _}} ->
            io:format("Child ~p crashed ~p~n", [ChildPid, ErrorCode]),
            wait_on_children(Num - 1, NumOfSucc, NumOfFail);
        {Verdict, Msg} ->
            io:format("~s~n", [Msg]),
            case Verdict of
                ok -> wait_on_children(Num - 1, NumOfSucc + 1, NumOfFail);
                error -> wait_on_children(Num - 1, NumOfSucc, NumOfFail + 1)
            end
    end.

main(Args) ->
    inets:start(),
    Url = lists:nth(1, Args),
    Num = list_to_integer(lists:nth(2, Args)),
    Parent = self(),
    process_flag(trap_exit, true),
    [spawn_link(fun() -> dispatch_request(Url, Parent) end) ||
        _ <- lists:seq(1, Num)],
    wait_on_children(Num, 0, 0).
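One detail worth checking (an assumption based on the code shown, not something confirmed in the post): the {'EXIT', ChildPid, {ErrorCode, _}} clause only matches exit reasons that are two-element tuples. A child that exits with a bare atom reason (for example killed, or an atom reason propagated from a linked process) leaves an EXIT message in the mailbox that matches neither clause, so the counter is never decremented and the parent hangs. A minimal sketch of a catch-all wait loop, keeping the original message protocol:

% Sketch only: same protocol as above, but the EXIT clause matches any exit
% reason, so no termination case is left unmatched in the mailbox.
wait_on_children(0, NumOfSucc, NumOfFail) ->
    io:format("Success: ~p~nFailure: ~p~n", [NumOfSucc, NumOfFail]);
wait_on_children(Num, NumOfSucc, NumOfFail) ->
    receive
        {'EXIT', _ChildPid, normal} ->
            % A child that already sent its data message exits normally;
            % that data message is counted separately, so don't decrement here.
            wait_on_children(Num, NumOfSucc, NumOfFail);
        {'EXIT', ChildPid, Reason} ->
            io:format("Child ~p crashed ~p~n", [ChildPid, Reason]),
            wait_on_children(Num - 1, NumOfSucc, NumOfFail);
        {ok, Msg} ->
            io:format("~s~n", [Msg]),
            wait_on_children(Num - 1, NumOfSucc + 1, NumOfFail);
        {error, Msg} ->
            io:format("~s~n", [Msg]),
            wait_on_children(Num - 1, NumOfSucc, NumOfFail + 1)
    end.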

Related

ValueError('Cannot invoke RPC: Channel closed!') when multiple processes are launched together

I get this error when I launch, from zero, more than 4 processes at the same time:
{
  "insertId": "61a4a4920009771002b74809",
  "jsonPayload": {
    "asctime": "2021-11-29 09:59:46,620",
    "message": "Exception in callback <bound method ResumableBidiRpc._on_call_done of <google.api_core.bidi.ResumableBidiRpc object at 0x3eb1636b2cd0>>: ValueError('Cannot invoke RPC: Channel closed!')",
    "funcName": "handle_event",
    "lineno": 183,
    "filename": "_channel.py"
  }
}
This is the pub-sub schema:
[pub-sub schema diagram]
The error seems to happen at step 9 or 10.
The actual code is:
future = publisher.publish(
    topic_path,
    encoded_message,
    msg_attribute=message_key
)
future.add_done_callback(
    callback=lambda f: logging.info(...)
)

subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(
    PROJECT_ID,
    "..."
)
streaming_pull_future = subscriber.subscribe(
    subscription_path,
    callback=aggregator_callback_handler.handle_message
)
aggregator_callback_handler.callback = streaming_pull_future

wait_result(
    timeout=300,
    pratica=...,
    f_check_res_condition=lambda: aggregator_callback_handler.response is not None
)
streaming_pull_future.cancel()
subscriber.close()
The module aggregator_callback_handler handles .nack and .ack.
The error is returned for some seconds, then the VMs on which the services are hosted scale and the error stops. The same happens if, instead of launching the processes all together, I launch them one by one manually with some sleep in between.
I've already checked the timeouts and put the subscriber outside of the context manager, but those solutions don't work.
Any idea on how to handle this?

Erlang: looping a function for each element in a list

I have a calls.txt:
{john, [jill,joe,bob]}.
{jill, [bob,joe,bob]}.
{sue, [jill,jill,jill,bob,jill]}.
{bob, [john]}.
{joe, [sue]}.
I want to loop through Receiver1 and call the spawner() function for each element in Receiver1. However, when I call read() in the console, nothing happens. Please help.
Here's my code:
read() ->
    {ok, [List1, List2, List3, List4, List5]} =
        file:consult("calls.txt"),
    {Sender1, Receiver1} = List1,
    {Sender2, Receiver2} = List2,
    {Sender3, Receiver3} = List3,
    {Sender4, Receiver4} = List4,
    {Sender5, Receiver5} = List5,
    [spawner(Sender1, H) || H <- Receiver1].

spawner(Sname, Rname) ->
    Sender = spawn(calling, intercept, []),
    Sender ! {self(), {Sname, Rname}},
    get_feedback().

get_feedback() ->
    receive
        {Msg} ->
            io:fwrite(Msg)
    end.
get_feedback() executes in the main/original process. Just because the call to get_feedback() occurs after the call to spawn() does not mean that get_feedback() executes in the spawned process. The only thing that executes in the spawned process is the function calling:intercept().
As a result, you end up sending a message to the spawned process, then you enter a receive in the main process when you call get_feedback()--which means the main process waits for a message from the spawned process, yet the spawned process doesn't send any messages (or at least you haven't shown any code where the spawned process sends a message to the main process).
If you call get_feedback() inside the function calling:intercept(), then get_feedback() will execute in the spawned process--and the receive will try to match messages sent by the main process to the spawned process.
Also, if for some reason a receive won't match a message and you think it should, then rewrite the receive like this:
receive
    Msg -> io:format("~w~n", [Msg])
end
That receive will match any message, then you can examine the output to determine why your actual receive isn't matching.
Or, you can have intercept() send a msg back to the main process, and then get_feedback() can receive the message in the main process:
-module(my).
-compile(export_all).

read() ->
    {ok, Pairs} = file:consult("calls.txt"),
    [spawner(Sender, Receivers) || {Sender, Receivers} <- Pairs].

spawner(Sender, Receivers) ->
    io:format("~w, ~w~n", [Sender, Receivers]),
    Pid = spawn(my, intercept, []),
    io:format("The main process is: ~w.~n", [self()]),
    io:format(
        "The newly spawned process where intercept() is executing is: ~w~n",
        [Pid]
    ),
    Pid ! {self(), {Sender, Receivers}},
    get_feedback().

intercept() ->
    receive
        {From, Pair} ->
            io:format("intercept() received message: ~w, from: ~w~n", [Pair, From]),
            From ! {self(), "hello"}
    end.

get_feedback() ->
    receive
        {From, Msg} ->
            io:format("Main process received message: ~p, from: ~w~n", [Msg, From])
    end.
In the shell:
18> c(my).
my.erl:2: Warning: export_all flag enabled - all functions will be exported
{ok,my}
19> my:read().
john, [jill,joe,bob]
The main process is: <0.64.0>.
The newly spawned process where intercept() is executing is: <0.168.0>
intercept() received message: {john,[jill,joe,bob]}, from: <0.64.0>
Main process received message: "hello", from: <0.168.0>
jill, [bob,joe,bob]
The main process is: <0.64.0>.
The newly spawned process where intercept() is executing is: <0.169.0>
intercept() received message: {jill,[bob,joe,bob]}, from: <0.64.0>
Main process received message: "hello", from: <0.169.0>
sue, [jill,jill,jill,bob,jill]
The main process is: <0.64.0>.
The newly spawned process where intercept() is executing is: <0.170.0>
intercept() received message: {sue,[jill,jill,jill,bob,jill]}, from: <0.64.0>
Main process received message: "hello", from: <0.170.0>
bob, [john]
The main process is: <0.64.0>.
The newly spawned process where intercept() is executing is: <0.171.0>
intercept() received message: {bob,[john]}, from: <0.64.0>
Main process received message: "hello", from: <0.171.0>
joe, [sue]
The main process is: <0.64.0>.
The newly spawned process where intercept() is executing is: <0.172.0>
intercept() received message: {joe,[sue]}, from: <0.64.0>
Main process received message: "hello", from: <0.172.0>
[ok,ok,ok,ok,ok]

How to apply timeout on method in Erlang?

Is there a way to apply a timeout to a simple module function call?
For example:
my_method(Name) ->
    timer:sleep(2000),
    io:format("hello world ~p!~n", [Name]).
I want to add a timeout option to the above function; is there a way to do it?
You could spawn your function and wait for a message back. You can set a timeout while you wait on the receive.
my_method(Name) ->
    YourTimeOut = 10,
    Self = self(),
    _Pid = spawn(fun() ->
                     timer:sleep(2000),
                     io:format("hello world ~p!~n", [Name]),
                     Self ! {self(), ok}
                 end),
    receive
        {_PidSpawned, ok} -> ok
    after
        YourTimeOut -> timeout
    end.
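With YourTimeOut = 10 the after clause will practically always fire, because the spawned fun sleeps for 2000 ms before replying; raising the value above 2000 lets the ok branch win instead. A quick illustration of the behaviour (the module name timeout_demo is assumed here, the answer does not name one):

% 1> timeout_demo:my_method(world).
% timeout                 % the 10 ms timeout expires long before the 2000 ms sleep
% hello world world!      % ~2 s later the spawned fun still prints and sends its
%                         % {Pid, ok} message, which is left in the caller's mailbox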
See the gen:call/3,4 implementation. There it is done by:
do_call(Process, Label, Request, Timeout) ->
    try erlang:monitor(process, Process) of
        Mref ->
            %% If the monitor/2 call failed to set up a connection to a
            %% remote node, we don't want the '!' operator to attempt
            %% to set up the connection again. (If the monitor/2 call
            %% failed due to an expired timeout, '!' too would probably
            %% have to wait for the timeout to expire.) Therefore,
            %% use erlang:send/3 with the 'noconnect' option so that it
            %% will fail immediately if there is no connection to the
            %% remote node.
            catch erlang:send(Process, {Label, {self(), Mref}, Request},
                              [noconnect]),
            receive
                {Mref, Reply} ->
                    erlang:demonitor(Mref, [flush]),
                    {ok, Reply};
                {'DOWN', Mref, _, _, noconnection} ->
                    Node = get_node(Process),
                    exit({nodedown, Node});
                {'DOWN', Mref, _, _, Reason} ->
                    exit(Reason)
            after Timeout -> %% <-- HERE
                    erlang:demonitor(Mref, [flush]),
                    exit(timeout)
            end
    catch
        error:_ ->
            %% Node (C/Java?) is not supporting the monitor.
            %% The other possible case -- this node is not distributed
            %% -- should have been handled earlier.
            %% Do the best possible with monitor_node/2.
            %% This code may hang indefinitely if the Process
            %% does not exist. It is only used for featureweak remote nodes.
            Node = get_node(Process),
            monitor_node(Node, true),
            receive
                {nodedown, Node} ->
                    monitor_node(Node, false),
                    exit({nodedown, Node})
            after 0 ->
                    Tag = make_ref(),
                    Process ! {Label, {self(), Tag}, Request},
                    wait_resp(Node, Tag, Timeout) %% <-- HERE for C/Java nodes
            end
    end.
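The same monitor-plus-after pattern can be applied to an arbitrary fun. The sketch below is an illustration rather than part of the code above; the name with_timeout/2 is made up here:

% Minimal sketch: run Fun in a separate process and give up after Timeout ms.
% The monitor ensures we also return if the worker crashes instead of replying.
with_timeout(Fun, Timeout) ->
    {Pid, Mref} = spawn_monitor(fun() -> exit({result, Fun()}) end),
    receive
        {'DOWN', Mref, process, Pid, {result, Value}} ->
            {ok, Value};
        {'DOWN', Mref, process, Pid, Reason} ->
            {error, Reason}
    after Timeout ->
            erlang:demonitor(Mref, [flush]),
            exit(Pid, kill),   % stop the worker; its result is no longer wanted
            timeout
    end.

For example, with_timeout(fun() -> timer:sleep(2000), done end, 100) returns timeout, while a 3000 ms limit returns {ok, done}.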

Concurrent Erlang Code

From an old exam, there is a question I don't understand the answer to. The module for the question looks like this:
-module(p4).
-export([start/0, init/0, f1/1, f2/1, f3/2]).

start() ->
    spawn(fun() -> init() end).

init() -> loop(0).

loop(N) ->
    receive
        {f1, Pid} ->
            Pid ! {f1r, self(), N},
            loop(N);
        {f2, Pid} ->
            Pid ! {f2r, self()},
            loop(N+1);
        {f3, Pid, M} ->
            Pid ! {f3r, self()},
            loop(M)
    end.

f1(Serv) ->
    Serv ! {f1, self()},
    receive {f1r, Serv, N} -> N end.

f2(Serv) ->
    Serv ! {f2, self()},
    receive {f2r, Serv} -> ok end.

f3(Serv, N) ->
    Serv ! {f3, self(), N},
    receive {f3r, Serv} -> ok end.
The question asks you to consider the following function as part of the code and to state what its result would be. The correct answer is 2. I would have thought it would be 3, since the "increase" call f2(Server) comes after the response self() ! {f1r, Server, 2}.
test3() ->
    Server = start(),
    self() ! {f1r, Server, 2},
    f2(Server),
    f1(Server).
My questions are:
Why is the answer 2 and not 3, AND how does the self()!{f1r, Server, 2} work?
Isn't the self()!{f1r, Server, 2} a response from the receive clause in the loop(N) of the f1(Serv) function?
self() ! {f1r, Server, 2} sends the message {f1r, Server, 2} to the process running test3() itself.
This message waits in the mailbox until it is received.
Then f2 is executed, and then f1.
It is in this last function, when the line receive {f1r, Serv, N} -> N end is executed, that the process running test3 receives the message waiting in its mailbox (the one it sent to itself) and returns the 2 in that message.
Note that at the end of the program there is still a {f1r, <server pid>, N} message waiting in the mailbox, with N having a value of 1: the server's actual reply to the f1 request.
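One way to convince yourself of that leftover message (a debugging sketch, not part of the exam code) is to inspect the mailbox after test3/0 returns:

% Debugging sketch: run test3() and then look at what is still queued.
% process_info/2 returns the mailbox contents without consuming them.
test3_and_inspect() ->
    Result = test3(),
    timer:sleep(100),   % give the server's reply time to arrive
    {messages, Leftover} = erlang:process_info(self(), messages),
    io:format("test3() returned ~p, mailbox still holds ~p~n", [Result, Leftover]),
    Result.

Running it should print something like: test3() returned 2, mailbox still holds [{f1r,<0.x.0>,1}].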

Akka 2.2 ClusterSingletonManager sending HandOverToMe to [None]

I'm using the Akka 2.2 contrib project's ClusterSingletonManager to guarantee there is always one, and only one, specific type of actor (master) in a cluster. However, I've observed an odd behaviour (which, incidentally, may be expected, but I can't understand why). Whenever a master drops out of the cluster and joins again later, the following sequence of actions occurs:
[INFO] [04/30/2013 17:47:35.805] [ClusterSystem-akka.actor.default-dispatcher-9] [akka://ClusterSystem/system/cluster/core/daemon] Cluster Node [akka.tcp://ClusterSystem#127.0.0.1:2551] - Welcome from [akka.tcp://ClusterSystem#127.0.0.1:2552]
[INFO] [04/30/2013 17:47:48.703] [ClusterSystem-akka.actor.default-dispatcher-8] [akka://ClusterSystem/user/singleton] Member removed [akka.tcp://ClusterSystem#127.0.0.1:52435]
[INFO] [04/30/2013 17:47:48.712] [ClusterSystem-akka.actor.default-dispatcher-2] [akka://ClusterSystem/user/singleton] ClusterSingletonManager state change [Start -> BecomingLeader]
[INFO] [04/30/2013 17:47:49.752] [ClusterSystem-akka.actor.default-dispatcher-9] [akka://ClusterSystem/user/singleton] Retry [1], sending HandOverToMe to [None]
[INFO] [04/30/2013 17:47:50.850] [ClusterSystem-akka.actor.default-dispatcher-21] [akka://ClusterSystem/user/singleton] Retry [2], sending HandOverToMe to [None]
[INFO] [04/30/2013 17:47:51.951] [ClusterSystem-akka.actor.default-dispatcher-20] [akka://ClusterSystem/user/singleton] Retry [3], sending HandOverToMe to [None]
[INFO] [04/30/2013 17:47:53.049] [ClusterSystem-akka.actor.default-dispatcher-3]
...
[INFO] [04/30/2013 17:48:10.650] [ClusterSystem-akka.actor.default-dispatcher-21] [akka://ClusterSystem/user/singleton] Retry [20], sending HandOverToMe to [None]
[INFO] [04/30/2013 17:48:11.751] [ClusterSystem-akka.actor.default-dispatcher-4] [akka://ClusterSystem/user/singleton] Timeout in BecomingLeader. Previous leader unknown, removed and no TakeOver request.
[INFO] [04/30/2013 17:48:11.752] [ClusterSystem-akka.actor.default-dispatcher-4] [akka://ClusterSystem/user/singleton] Singleton manager [akka.tcp://ClusterSystem#127.0.0.1:2551] starting singleton actor
[INFO] [04/30/2013 17:48:11.754] [ClusterSystem-akka.actor.default-dispatcher-4] [akka://ClusterSystem/user/singleton] ClusterSingletonManager state change [BecomingLeader -> Leader]
Why is it attempting to send a HandOverToMe to [None]? It takes about 20 seconds (20 retries) until it becomes the new leader, even though in this particular situation the previous leader was well known...
I'm not sure if this will answer your question, but in looking at the source code for ClusterSingletonManager, you can see the chain of events that leads to this scenario. This class uses the Finite State Machine logic in Akka, and the behavior you are seeing is kicked off due to a state transition from Start -> BecomingLeader. First, look at the Start state:
when(Start) {
  case Event(StartLeaderChangedBuffer, _) ⇒
    leaderChangedBuffer = context.actorOf(Props[LeaderChangedBuffer].withDispatcher(context.props.dispatcher))
    getNextLeaderChanged()
    stay
  case Event(InitialLeaderState(leaderOption, memberCount), _) ⇒
    leaderChangedReceived = true
    if (leaderOption == selfAddressOption && memberCount == 1)
      // alone, leader immediately
      gotoLeader(None)
    else if (leaderOption == selfAddressOption)
      goto(BecomingLeader) using BecomingLeaderData(None)
    else
      goto(NonLeader) using NonLeaderData(leaderOption)
}
The part to look at here is:
else if (leaderOption == selfAddressOption)
goto(BecomingLeader) using BecomingLeaderData(None)
To me, it looks like this piece is saying: "If I'm the leader, go from Start to BecomingLeader with None as the previous-leader option."
Then, if you look at the BecomingLeader state:
when(BecomingLeader) {
  ...
  case Event(HandOverRetry(count), BecomingLeaderData(previousLeaderOption)) ⇒
    if (count <= maxHandOverRetries) {
      logInfo("Retry [{}], sending HandOverToMe to [{}]", count, previousLeaderOption)
      previousLeaderOption foreach { peer(_) ! HandOverToMe }
      setTimer(HandOverRetryTimer, HandOverRetry(count + 1), retryInterval, repeat = false)
    } else if (previousLeaderOption forall removed.contains) {
      // can't send HandOverToMe, previousLeader unknown for new node (or restart)
      // previous leader might be down or removed, so no TakeOverFromMe message is received
      logInfo("Timeout in BecomingLeader. Previous leader unknown, removed and no TakeOver request.")
      gotoLeader(None)
    } else
      throw new ClusterSingletonManagerIsStuck(
        s"Becoming singleton leader was stuck because previous leader [${previousLeaderOption}] is unresponsive")
}
This is the block that keeps producing the message you see in the log. It is attempting to get the previous leader to hand over responsibility without knowing who the previous leader was, because the state transition passed in None as the previous leader (and with None, previousLeaderOption foreach never actually sends anything; it only logs and retries). The million-dollar question is: if it doesn't know who the previous leader is, why keep attempting hand-overs that will never succeed?