Sending parallel requests Erlang - concurrency

I am implementing a Twitter-like application in Erlang. I have both its distributed and non-distributed implementations. I am doing a benchmark but it seems I cannot find a way to send parallel requests to each user process for the distributed implementation. I am using a lists:foreach function to send "get tweets" to a list of client processes.My understanding is that the lists:foreach function steps into each element of the list one at a time realizing a sequential behavior which ultimately makes my distributed implementation result in an equal execution time with the non-distributed implementation. Is it possible to send the "get tweets" requests to different client processes all at once? This to me seems like a rather specific case and it has been difficult to search for a solution inside and outside StackOverflow.
test_get_tweets_Bench() ->
{ServerPid, UserInfos} = initializeForBench_server(),
run_benchmark("timeline",
fun () ->
lists:foreach(fun (_) ->
UserChoice = pick_random(UserInfos),
server:get_tweets(element(2, UserChoice), element(1, UserChoice), 1)
end,
lists:seq(1, 10000))
end,
30).
pick_random(List) ->
lists:nth(rand:uniform(length(List)), List).
userinfos is a list of the following form: [{userId,client_process},...]
After trying rpc:pmap instead of the lists:foreach, my benchmark has become approximately 3 times slower. The changes are as follows:
test_get_tweets_Bench2() ->
{ServerPid, UserInfos} = initializeForBench_server(),
run_benchmark("get_tweets 2",
fun () ->
rpc:pmap({?MODULE,do_apply},
[fun (_) ->
UserChoice = pick_random(UserInfos),
server:get_tweets(element(2, UserChoice), element(1, UserChoice), 1)
end],
lists:seq(1, 10000))
end,
30).
pick_random(List) ->
lists:nth(rand:uniform(length(List)), List).
do_apply(X,F)->
F(X).
I thought rpc:pmap would make my benchmark faster as it would send the get_tweet requests in parallel.
Below is my server module which is the API between my benchmark and my Twitter-like application. The API sends the requests from my benchmark to my Twitter-like application.
%% This module provides the protocol that is used to interact with an
%% implementation of a microblogging service.
%%
%% The interface is design to be synchrounous: it waits for the reply of the
%% system.
%%
%% This module defines the public API that is supposed to be used for
%% experiments. The semantics of the API here should remain unchanged.
-module(server).
-export([register_user/1,
subscribe/3,
get_timeline/3,
get_tweets/3,
tweet/3]).
%%
%% Server API
%%
% Register a new user. Returns its id and a pid that should be used for
% subsequent requests by this client.
-spec register_user(pid()) -> {integer(), pid()}.
register_user(ServerPid) ->
ServerPid ! {self(), register_user},
receive
{ResponsePid, registered_user, UserId} -> {UserId, ResponsePid}
end.
% Subscribe/follow another user.
-spec subscribe(pid(), integer(), integer()) -> ok.
subscribe(ServerPid, UserId, UserIdToSubscribeTo) ->
ServerPid ! {self(), subscribe, UserId, UserIdToSubscribeTo},
receive
{_ResponsePid, subscribed, UserId, UserIdToSubscribeTo} -> ok
end.
% Request a page of the timeline of a particular user.
% Request results can be 'paginated' to reduce the amount of data to be sent in
% a single response. This is up to the server.
-spec get_timeline(pid(), integer(), integer()) -> [{tweet, integer(), erlang:timestamp(), string()}].
get_timeline(ServerPid, UserId, Page) ->
ServerPid ! {self(), get_timeline, UserId, Page},
receive
{_ResponsePid, timeline, UserId, Page, Timeline} ->
Timeline
end.
% Request a page of tweets of a particular user.
% Request results can be 'paginated' to reduce the amount of data to be sent in
% a single response. This is up to the server.
-spec get_tweets(pid(), integer(), integer()) -> [{tweet, integer(), erlang:timestamp(), string()}].
get_tweets(ServerPid, UserId, Page) ->
ServerPid ! {self(), get_tweets, UserId, Page},
receive
{_ResponsePid, tweets, UserId, Page, Tweets} ->
Tweets
end.
% Submit a tweet for a user.
% (Authorization/security are not regarded in any way.)
-spec tweet(pid(), integer(), string()) -> erlang:timestamp().
tweet(ServerPid, UserId, Tweet) ->
ServerPid ! {self(), tweet, UserId, Tweet},
receive
{_ResponsePid, tweet_accepted, UserId, Timestamp} ->
Timestamp
end.

In Erlang, a message is exchanged form a process A to a process B. There is no feature available like a broadcast, or a selective broadcast. In your application I see 3 steps:
send a request to get the tweets from the users,
the user process prepare the answer and send it back to the requester
the initial process collects the answers
Sending the requests to the user processes and collecting the tweets (steps 1 and 3) cannot use parallelism. Of course you can use multiple processes to send the requests and collect the answers, up to 1 per user, but I guess that it is not the subject of your question.
What is feasible, is to ensure that the 3 steps are not done in sequence for each user process, but in parallel. I guess that the function server:get_tweets is responsible to send the request and collect the answers. If I am correct (I cannot know since You don't provide the code, and you ignore the returned values), you can use parallelism by splitting this function in 2, the first send the requests, the second collects the answers. (here is an example of code, I don't have tried or even compiled, so consider it with care :o)
test_get_tweets_Bench() ->
{ServerPid, UserInfos} = initializeForBench_server(),
run_benchmark("timeline",
fun () ->
% send the requests
List = lists:map(fun (_) ->
{UserId,Pid} = pick_random(UserInfos),
Ref = server:request_tweets(Pid,UserId),
{Ref,UserId}
end,
lists:seq(1, 10000)),
% collects the answers
collect(L,[])
end,
30).
collect([],Result) -> {ok,Result};
collect(List,ResultSoFar) ->
receive
{Ref,UserId,Tweets} ->
{ok,NewList} = remove_pending_request(Ref,UserId,List),
collect(Newlist,[{UserId,Tweets}|ResultSoFar])
after ?TIMEOUT
{error,timeout,List,ResultSoFar}
end.
remove_pending_request(Ref,UserId,List) ->
{value,{Ref,UserId},NewList} = lists:keytake(Ref,1,List),
{ok,NewList}.
pick_random(List) ->
lists:nth(rand:uniform(length(List)), List).
This is my other attempt at implementing a parallel benchmark which does not achieve any speed up.
get_tweets(Sender, UserId, Node) ->
server:get_tweets(Node, UserId, 0),
Sender ! done_get_tweets.
test_get_tweets3() ->
{_ServerId, UserInfos} = initializeForBench_server(),
run_benchmark("parallel get_tweet",
fun () ->
lists:foreach(
fun (_) ->
{UserId,Pid} = pick_random(UserInfos),
spawn(?MODULE, get_tweets, [self(), UserId, Pid])
end,
lists:seq(1, ?NUMBER_OF_REQUESTS)),
lists:foreach(fun (_) -> receive done_get_tweets -> ok end end, lists:seq(1, ?NUMBER_OF_REQUESTS))
end,
?RUNS).

Related

Why my interface for learnyousomeerlang trade_fsm is failing?

I'm slowly learning erlang language using learnyousomeerlang site and I'm currently at "Rage Against The Finite-State Machines" chapter, which builds and describes how trade_fsm.erl works. As a part of my learning process I've decided to write an interface for this system, where you can control both trading sides by typing console commands. I think I've done a decent job at writing that, however for some reason I cannot understand, whenever I try to start trading, the clients crash. Here's how it goes:
5> z3:init("a", "b").
true
6> z3:display_pids().
First player pid: {<0.64.0>}
Second player pid: {<0.65.0>}.
done
7> z3:p1_propose_trade().
{a}: asking user <0.65.0> for a trade
{b}: <0.64.0> asked for a trade negotiation
done
8> z3:display_pids().
done
9>
And here's my code:
-module(z3).
-compile(export_all).
-record(state, {player1,
player2,
p1items=[],
p2items=[],
p1state,
p2state,
p1name="Carl",
p2name="FutureJim"}).
init(FirstName, SecondName) ->
{ok, Pid1} = trade_fsm:start_link(FirstName),
{ok, Pid2} = trade_fsm:start_link(SecondName),
S = #state{p1name=FirstName, p2name=SecondName,
player1=Pid1, player2=Pid2,
p1state=idle, p2state=idle},
register(?MODULE, spawn(?MODULE, loop, [S])).
display_pids() ->
?MODULE ! display_pids,
done.
p1_propose_trade() ->
?MODULE ! {wanna_trade, p1},
done.
p2_accept_trade() ->
?MODULE ! {accept_trade, p2},
done.
loop(S=#state{}) ->
receive
display_pids ->
io:format("First player pid: {~p}~nSecond player pid: {~p}.~n", [S#state.player1, S#state.player2]),
loop(S);
{wanna_trade, Player} ->
case Player of
p1 ->
trade_fsm:trade(S#state.player1, S#state.player2);
p2 ->
trade_fsm:trade(S#state.player2, S#state.player1);
_ ->
io:format("[Debug:] Invalid player.~n")
end,
loop(S);
{accept_trade, Player} ->
case Player of
p1 ->
trade_fsm:accept_trade(S#state.player1);
p2 ->
trade_fsm:accept_trade(S#state.player2);
_ ->
io:format("[Debug:] Invalid player.~n")
end,
loop(S);
_ ->
io:format("[Debug:] Received invalid command.~n"),
loop(S)
end.
Can anyone tell me why this code fails and how it should be implemented?
when you call z3:p1_propose_trade(). it sends the message {wanna_trade, p1} to registered process z3.
The message is interpreted in the loop function which calls trade_fsm:trade(S#state.player1, S#state.player2); converted into gen_fsm:sync_send_event(S#state.player1, {negotiate, S#state.player2}, 30000).. This call is a synchronous call which is waiting for a reply from the fsm, and which timeout after 30 seconds if it did not receive any answer.
In the state wait, you have caught the message in the statement:
idle({negotiate, OtherPid}, From, S=#state{}) ->
ask_negotiate(OtherPid, self()),
notice(S, "asking user ~p for a trade", [OtherPid]),
Ref = monitor(process, OtherPid),
{next_state, idle_wait, S#state{other=OtherPid, monitor=Ref, from=From}};
No reply value is returned to the caller. You should have used in the last line something like
{reply, Reply, idle_wait, S#state{other=OtherPid, monitor=Ref, from=From}};
or an explicit call to gen_fsm:reply/2.
I didn't dig too much in the code, but if you change it to:
idle({negotiate, OtherPid}, From, S=#state{}) ->
Reply = ask_negotiate(OtherPid, self()),
notice(S, "asking user ~p for a trade", [OtherPid]),
Ref = monitor(process, OtherPid),
{reply, Reply, idle_wait, S#state{other=OtherPid, monitor=Ref, from=From}};
it doesn't stop and seems to work properly.
Maybe some one knowing perfectly the behavior of the gen_fsm can give an explanation of what is going behind the scene (why is there nothing printout when the timeout ends, why the shell is ready for a new command while it should be waiting for an answer?):
If you call manually the function trade(OwnPid, OtherPid) you will see that it doesn't return until the 30 second timeout is reached, and then you get an error message.
when it is called by z3:p1_propose_trade()., after 30 seconds the error message is not shown but the registered process z3 dies.
[EDIT]
I have checked how the code should work, and, in fact, it doesn't seem necessary to modify the fsm code. The reply should come from the process 2, when the second user accept to negotiate. So you can't do the test this way (loop is waiting for an answer, and it cannot send the accept_trade). here is a session that works:
{ok,P1} = trade_fsm:start("a1").
{ok,P2} = trade_fsm:start("a2").
T = fun() -> io:format("~p~n",[trade_fsm:trade(P1,P2)]) end.
A = fun() -> io:format("~p~n",[trade_fsm:accept_trade(P2)]) end.
spawn(T). % use another process to avoid the shell to be locked
A().
You can change the "wanna_trade" interface to avoid the blocking issue
{wanna_trade, Player} ->
case Player of
p1 ->
spawn(fun() -> trade_fsm:trade(S#state.player1, S#state.player2) end);
p2 ->
spawn(fun() -> trade_fsm:trade(S#state.player2, S#state.player1) end);
_ ->
io:format("[Debug:] Invalid player.~n")
end,
loop(S);

how can multiple processes use one common list concurrently in Erlang?

I understand that Erlang is all about concurrency and we use spawn/spawn_link to create a process what I don't understand is how can all processes use one common list of users concurrently? say an ordict/dict storage.
What I am trying to do is;
1. A Spawned User process Subscribes/Listens to registered process A
2. Registered process A stores {Pid, Userid} of all online users
3. When some user sends a message user's process asks process A wether recipient is online or not.
sending a message in erlang is asynchronous but is it also asynchronous when a user is being sent messages by multiple users?
You can make process A a gen_server process and keep any data structure storing online users as the process state. Storing a new user or deleting one could be done with gen_server:cast/2, and checking to see if a user is online could be done with gen_server:call/2. Alternatively, you could have a gen_server create a publicly-readable ets table to allow any process to read it to check for online users, but storing and deleting would still require casts to the gen_server. You could even make the table publicly readable and writable so that any process could store, delete, or check users. But keep in mind that an ets table is by default destroyed when the process that creates it dies, so if you need it to stay around even if the gen_server that created it dies, you must arrange for it to be inherited by some other process, or give it to a supervisor.
A serious solution should use the OTP behaviors (gen_server, supervisor...) as suggested by Steve.
Anyway I wrote a small example module that implement both a server and clients and that can be started on one node using the command erl -sname test for example (or several nodes using erl -sname node1, erl -sname node2...) .
it includes also an example of a shell session that illustrates most of the cases, I hope it can help you to follow the exchanges, synchronous or asynchronous between processes.
NOTE : the access to the user list is not concurrent, it is not possible if the list is owned by a server process like it is in this example. It is why Steve propose to use an ETS to store the information and do real concurrent accesses. I have tried to write the example with interfaces that should allow a quick refactoring with ETS instead of tuple list.
-module(example).
-export([server/0,server_stop/1,server_register_name/2,server_get_address/2, server_quit/2, % server process and its interfaces
client/1,quit/1,register_name/2,get_address/2,send_message/3,print_messages/1, % client process and its interfaces
trace/0]). % to call the tracer for a nice message view
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Client interface
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
client(Node) ->
% connect the current node to the servernode given in parameter
% it will fail if the connection cannot be established
true = net_kernel:connect_node(Node),
% spawn a client process
spawn(fun () -> client([],unregistered,{server,Node}) end).
register_name(ClientPid,Name) ->
% use a helper to facilitate the trace of everything
send_trace(ClientPid,{register_name,self(),Name}),
% wait for an answer, it is then a synchronous call
receive
% no work needed, simply return any value
M -> M
after 1000 ->
% this introduce a timeout, if no answer is received after 1 second, consider it has failed
no_answer_from_client
end.
get_address(ClientPid,UserName) ->
send_trace(ClientPid,{get_address,self(),UserName}),
% wait for an answer, it is then a synchronous call
receive
% in this case, if the answer is tagged with ok, extract the Value (will be a Pid)
{ok,Value} -> Value;
M -> M
after 1000 ->
no_answer_from_client
end.
send_message(ClientPid,To,Message) ->
% simply send the message, it is asynchronous
send_trace(ClientPid,{send_message,To,Message}).
print_messages(ClientPid) ->
send_trace(ClientPid,print_messages).
quit(ClientPid) ->
send_trace(ClientPid,quit).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% client local functions
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
client(Messages,Name,Server) ->
receive
{register_name,From,UserName} when Name == unregistered ->
% if not yet registered send the request to the server and
% backward the answer to the requester
Answer = server_register_name(Server,UserName),
send_trace(From,Answer),
NName = case Answer of
registered -> UserName;
_ -> Name
end,
client(Messages,NName,Server);
{register_name,From,_} ->
% if already registered reject the request
send_trace(From,{already_registered_as,Name}),
client(Messages,Name,Server);
{get_address,From,UserName} when Name =/= unregistered ->
Answer = server_get_address(Server,UserName),
send_trace(From,Answer),
client(Messages,Name,Server);
{send_message,To,Message} ->
% directly send the message to the user, the server is not concerned
send_trace(To,{new_message,{erlang:date(),erlang:time(),Name,Message}}),
client(Messages,Name,Server);
print_messages ->
% print all mesages and empty the queue
do_print_messages(Messages),
client([],Name,Server);
quit ->
server_quit(Server,Name);
{new_message,M} ->
% append the new message
client([M|Messages],Name,Server);
_ ->
client(Messages,Name,Server)
end.
do_print_messages(Messages) ->
lists:foreach(fun({D,T,W,M}) -> io:format("from ~p, at ~p on ~p, received ~p~n",[W,T,D,M]) end,Messages).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Server interface
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
server() ->
true = register(server,spawn(fun () -> server([]) end)),
node().
server_stop(Server) ->
send_trace(Server,stop).
server_register_name(Server,User) ->
send_trace(Server,{register_name,self(),User}),
receive
M -> M
after 900 ->
no_answer_from_server
end.
server_get_address(Server,User) ->
send_trace(Server,{get_address,self(),User}),
receive
M -> M
after 900 ->
no_answer_from_server
end.
server_quit(Server,Name) ->
send_trace(Server,{quit,Name}).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% server local functions
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
server(Users) ->
receive
stop ->
ok;
{register_name,From,User} ->
case lists:keyfind(User,1,Users) of
false ->
send_trace(From,registered),
server([{User,From}|Users]);
_ ->
send_trace(From,{already_exist,User}),
server(Users)
end;
{get_address,From,User} ->
case lists:keyfind(User,1,Users) of
false ->
send_trace(From,{does_not_exist,User}),
server(Users);
{User,Pid} ->
send_trace(From,{ok,Pid}),
server(Users)
end;
{quit,Name} ->
server(lists:keydelete(Name,1,Users))
end.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% global
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
trace() ->
% start a collector, a viewer and trace the "trace_me" ...
et_viewer:start([{trace_global, true}, {trace_pattern, {et,max}},{max_actors,20}]).
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% helpers
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
send_trace(To,Message) ->
% all messages will be traced by "et"
et:trace_me(50,self(),To,Message,[]),
To ! Message.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% shell commands
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% c(example).
% example:trace().
% N = node().
% C1 = example:client(N).
% example:register_name(pid(0,5555,0),"fails").
% example:register_name(C1,"fails_again").
% example:server().
% example:register_name(C1,"Joe").
% C2 = example:client(N).
% example:register_name(C2,"Bob").
% example:print_messages(C1).
% C2 = example:get_address(C1,"Bob").
% example:send_message(C1,C2,"Hi Bob!").
% example:send_message(C1,C2,"Hi Bob! are you there?").
% example:print_messages(C2).
% example:send_message(C2,C1,"Hi Joe! Got your message.").
% example:print_messages(C2).
% example:print_messages(C1).
% example:quit(C1).
% example:get_address(C2,"Joe").
% example:server_stop({server,N}).
% example:get_address(C2,"Joe").
% example:get_address(C1,"Bob").
here an extract of the event viewer:

Infinite loop in Erlang process

I'm very new to Erlang and tried to implement a simple class that has some methods to simulate a database. insert() just inserts a key -> value in the process map, and retrieve() just returns the value from the map. However, I am getting stuck in the loop(). What am I doing wrong?
-module(db).
-export([start/0,stop/0,retrieve/1,insert/2]).
start() ->
register(db, spawn(fun() ->
loop()
end)
),
{started}.
insert(Key, Value) ->
rpc({insert, Key, Value}).
retrieve(Key) ->
rpc({retrieve, Key}).
stop() ->
rpc({stop}).
rpc(Request) ->
db ! {self(), Request},
receive
{db, Reply} ->
Reply
end.
loop() ->
receive
{rpc, {insert, Key, Value}} ->
put(Key, Value),
rpc ! {db, done},
loop();
{rpc, {retrieve, Key}} ->
Val = get(Key),
rpc ! {db, Val},
loop();
{rpc, {stop}} ->
exit(db,ok),
rpc ! {db, stopped}
end.
So, after compiling:
I first call db:start().
and then when trying db:insert("A", 1)., it gets stucked.
Thank you
The problem is in loop/0 function. You're using rpc atom to pattern match the messages received ({rpc, {insert, Key, Value}}), but, as you can see on rpc/1 function, you always send messages with the format {self(), Request} to db process.
self() function returns a PID in the format <X.Y.Z>, which will never match against the atom rpc
For example, let's say you're trying to insert some data using the function insert/2 and self() would return the PID <0.36.0>. When rpc/1 sends the message, on the line db ! {self(), {insert, Key, Value}}, loop/0 will receive {<0.36.0>, {insert, Key, Value}} message, which will never match against {rpc, {insert, Key, Value}}, because rpc is an atom.
The solution is to change rpc atom to a variable, like this:
loop() ->
receive
{Rpc, {insert, Key, Value}} ->
put(Key, Value),
Rpc ! {db, done},
loop();
{Rpc, {retrieve, Key}} ->
Val = get(Key),
Rpc ! {db, Val},
loop();
{Rpc, {stop}} ->
Rpc ! {db, stopped},
exit(whereis(db),ok)
end.
Erlang variables start with capital letters, that's why I used Rpc, instead of rpc.
P.S.: Actually, you had two other problems:
In the last part of loop/0, where you handle stop message, you call exit(db, ok) before you actually answer to rpc. In that case, you'd never receive the {db, stopped} message back from db process, which would be dead by that time. That's why I've changed the order, putting the exit/2 call after Rpc ! {db, stopped}.
When you call exit/2, you were passing db, which is an atom, as the first argument, but exit/2 function expects an PID as first argument, which would raise a badarg error. That's why I've changed it to exit(whereis(db), ok).
Let's walk through this a bit more carefully. What do you mean by "rpc"? "Remote Procedure Call" -- sure. But everything in Erlang is an rpc, so we tend not to use that term. Instead we distinguish between synchronous messages (where the caller blocks, waiting on a response) and aynchronous messages (where the caller just fires off a message and runs off without a care in the world). We tend to use the term "call" for a synch message and "cast" for an asynch message.
We can write that easily, as a call looks a lot like your rpc above, with the added idiom in Erlang of adding a unique reference value to tag the message and monitoring the process we sent a message to just in case it crashes (so we don't get left hanging, waiting for a response that will never come... which we'll touch on in your code in a bit):
% Synchronous handler
call(Proc, Request) ->
Ref = monitor(process, Proc),
Proc ! {self(), Ref, Request},
receive
{Ref, Res} ->
demonitor(Ref, [flush]),
Res;
{'DOWN', Ref, process, Proc, Reason} ->
{fail, Reason}
after 1000 ->
demonitor(Ref, [flush]),
{fail, timeout}
end.
Cast is a bit easier:
cast(Proc, Message) ->
Proc ! Message,
ok.
The definition of call above means that the process we are sending to will receive a message of the form {SenderPID, Reference, Message}. Note that this is different than {sender, reference, message}, as lower-case values are atoms, meaning they are their own values.
When we receive messages we are matching on the shape and values of the message received. That means if I have
receive
{number, X} ->
do_stuff(X)
end
in my code and the process sitting in that receive get a message {blah, 25} it will not match. If it receives another message {number, 26} then it will match, that receive will call do_stuff/1 and the process will continue on. (These two things -- the difference between atoms and Variables and the way matching in receive works -- is why your code is hanging.) The initial message, {blah, 25} will still be in the mailbox, though, at the front of the queue, so the next receive has a chance to match on it. This property of mailboxes is immensely useful sometimes.
But what does a catch-all look like?
Above you are expecting three kinds of messages:
{insert, Key, Value}
{retrieve, Key}
stop
You dressed them up differently, but that's the business end of what you are trying to do. Running the insert message through the call/2 function I wrote above it would wind up looking like this: {From, Ref, {insert, Key, Value}}. So if we expect any response from the process's receive loop we will need to match on that exact form. How do we catch unexpected messages or badly formed ones? At the end of the receive clause we can put a single naked variable to match on anything else:
loop(State) ->
receive
{From, Ref, {insert, Key, Value}} ->
NewState = insert(Key, Value, State),
From ! {Ref, ok},
loop(NewState);
{From, Ref, {retrieve, Key}} ->
Value = retrieve(Key, State),
From ! {Ref, {ok, Value}},
loop(State);
{From, Ref, stop} ->
ok = io:format("~tp: ~tp told me to stop!~n", [self(), From]),
From ! {Ref, shutting_down},
exit(normal)
Unexpected ->
ok = io:format("~tp: Received unexpected message: ~tp~n",
[self(), Unexpected]),
loop(State)
end.
You will notice that I am not using the process dictionary. DO NOT USE THE PROCESS DICTIONARY. This isn't what it is for. You'll overwrite something important. Or drop something important. Or... bleh, just don't do it. Use a dict or map or gb_tree or whatever instead, and pass it through as the process' State variable. This will become a very natural thing for you once you start writing OTP code later on.
Toy around with these things a bit and you will soon be happily spamming your processes to death.

Handling Cookies in Ocamlnet

I'm trying to write a bot pulling some data which is only available to authenticated users. I settled for ocaml (v. 3.12.1) and ocamlnet (v. 3.6.5). The first part of the script sends a POST request to the website and by the html I receive back, I can tell that the authentication worked (p1 and p2's values in this code sample are obviously not the ones I'm using).
open Http_client
open Nethttp
let pipeline = new pipeline
let () =
let post_call = new post
"http://www.kraland.org/main.php?p=1&a=100"
[("p1", "username");
("p2", "password");
("Submit", "Ok!")]
in
pipeline#add post_call;
pipeline#run();
Then I extract the cookies where the php session id, the account name, a hash of the password, etc. are stored, put them in the header of the next request and run it. And this is where I run into troubles: I systematically get the boring page every anonymous visitor gets.
let cookies = Header.get_set_cookie post_call#response_header in
let get_call = new get "http://www.kraland.org/main.php?p=1" in
let header = get_call#request_header `Base in
Header.set_set_cookie header cookies;
pipeline#add get_call;
pipeline#run();
When I print the content of the cookies, I do get something weird: I would expect the domain of the cookies to be kraland.org but it does not seem to be the case. This is the printing command I use together with the output:
List.iter (fun c -> Printf.printf "%.0f [%s%s:%b] %s := %s\n"
(match c.cookie_expires with None -> -1. | Some f -> f)
(match c.cookie_domain with None -> "" | Some s -> s)
(match c.cookie_path with None -> "" | Some s -> s)
c.cookie_secure c.cookie_name c.cookie_value)
cookies;
-1 [/:false] PHPSESSID := 410b97b0536b3e949df17edd44965926
1372719625 [:false] login := username
1372719625 [:false] id := myid
1372719625 [:false] password := fbCK/0M+blFRLx3oDp+24bHlwpDUy7x885sF+Q865ms=
1372719625 [:false] pc_id := 872176495311
Edit: I had a go at the problem using Haskell's Http-conduit-browser and it works like a charm using something very much like the doc's example.

Getting responses from erlang processes

I have an erlang project that makes a lot of concurrent SOAP requests to my application. Currently, it's limited by how many nodes are available, but I would like to adjust it so that each node can send more than one message at a time.
I've figured that problem out, but I don't know how to get a response back from process running the SOAP request.
This is my function that I'm attempting to use to do multiple threads:
batch(Url, Message, BatchSize) ->
inets:start(),
Threads = for(1, BatchSize, fun() -> spawn(fun() -> attack_thread() end) end),
lists:map(fun(Pid) -> Pid ! {Url, Message, self()} end, Threads).
This function gets called by the person who initiated the stress tester, it is called on every node in our network. It's called continually until all the requested number of SOAP requests have been sent and timed.
This is the attack_thread that is sent the message by the batch method:
attack_thread() ->
receive
{Url, Message, FromPID} ->
{TimeTaken, {ok, {{_, 200, _}, _, _}}} = timer:tc(httpc, request, [post, {Url, [{"connection", "close"}, {"charset", "utf-8"}], "text/xml", Message}, [], []]),
TimeTaken/1000/1000.
end
As you can see, I want it to return the number of seconds the SOAP request took. However, erlang's message passing (Pid ! Message) doesn't return anything useful.
How can I get a result back?
Each of your attack_thread() threads can simply drop a message in the mailbox of the process operating the batch/3 function:
FromPid ! {time_taken, self(), TimeTaken / 1000 / 1000}.
but then you need to collect the results:
batch(Url, Message, BatchSize) ->
inets:start(),
Pids = [spawn_link(fun attack_thread/0) || _ <- lists:seq(1, BatchSize],
[Pid ! {Url, Message, self()} || Pid <- Pids],
collect(Pids).
collect([]) -> [];
collect(Pids) ->
receive
{time_taken, P, Time} ->
[Time | collect(Pids -- [P])]
end.
Some other comments: you probably want spawn_link/1 here. If something dies along the way, you want the whole thing to die. Also, be sure to tune inets httpc a bit so it is more effective. You might also want to look at basho_bench or tsung.
Finally, you can use a closure directly rather than pass the url and message:
attack_thread(Url, Message, From) -> ...
So your spawn is:
Self = self(),
Pids = [spawn_link(fun() -> attack_thread(Url, Message, Self) end) || _ <- ...]
It avoids passing in the message in the beginning.