Can someone describe in very simple terms how you would scale up a service? (Let's assume the service is very simple and is the function X().)
To make this scalable, would you just fire off a new node (up to a maximum depending on your hardware) for each client who wants to run X?
So if I had four hardware boxes, I might fire up to four nodes to run service X(); on the 5th client request I would just run X() on the first node, the 6th client on the second node, and so on?
Following on from this, I know how to spawn processes locally, but how would you get both the 1st and 5th clients to use the same Node 1? Would it be by spawning a process remotely on the node for the client each time?
Any simple examples are most welcome!
This depends very much on what X is. If X is fully independent, for instance x() -> 37. then you don't even need to connect your nodes. Simply place a standard load balancer in front of your system (HAProxy, Varnish, etc.) and forget about any kind of distributed communication. In fact, there is no need to use Erlang for that; any other language of your choice is equally good.
Where Erlang shines is when several X functions depend on each other's results and when an X might live on another physical machine. In that case Erlang can communicate with the other X seamlessly, even if it lives on a different node.
If you want to implement a round-robin scheme in Erlang, the easiest way is to have a single point of entry and let it forward the requests out to multiple nodes. But this works badly if there is a pattern where a certain node ends up with all the long-running processes. You need to build a feedback mechanism so you know how to weight the round-robin queue.
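To make the feedback idea concrete, here is a minimal sketch, in Python rather than Erlang purely for illustration. The class name FeedbackBalancer and its acquire/release methods are hypothetical. It assumes each node reports back when a request finishes, and it picks the node with the fewest in-flight requests, falling back to round-robin order on ties.

```python
class FeedbackBalancer:
    """Single point of entry: round-robin, weighted by in-flight requests."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.in_flight = {n: 0 for n in self.nodes}
        self._next = 0  # rotating start index, so ties are broken round-robin

    def acquire(self):
        """Choose a node for a new request and count it as in flight."""
        n = len(self.nodes)
        # Walk once around the ring, starting at the rotating index.
        order = [self.nodes[(self._next + i) % n] for i in range(n)]
        # min() keeps the first minimal element, i.e. the next one in ring order.
        chosen = min(order, key=lambda node: self.in_flight[node])
        self._next = (self.nodes.index(chosen) + 1) % n
        self.in_flight[chosen] += 1
        return chosen

    def release(self, node):
        """Feedback: the node reports that one of its requests has finished."""
        self.in_flight[node] -= 1
```

With four idle nodes this behaves like plain round-robin. Once one node is stuck with a long-running request, new requests skip it until the other nodes are equally loaded.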
I've just learned about CAF, the C++ Actor Framework.
The one thing that surprised me is that the way to make an actor available over the network is to "publish" it to a specific TCP port.
This basically means that the number of actors that you can publish is limited by the number of ports you have ( 64k ). Since you need both one port to publish an actor and one port to access a remote actor, I assume that two processes would each be able to share at best about 32k actors each, while they could probably each hold a million actors on a commodity server. This would be even worse, if the cluster had, say, 10 nodes.
To make the publishing scalable, each process should only need to open 1 port, for each and every actor in one system, and open 1 connection to each actor system that they want to access.
Is there a way to publish one actor as a proxy for all actors in an actor system ( preferably without any significant performance loss )?
Let me add some background. The middleman::publish/middleman::remote_actor function pair does two things: connecting two CAF instances and giving you a handle for communicating to a remote actor. The actor you "publish" to a given port is meant to act as an entry point. This is a convenient rendezvous point, nothing more.
All you need to communicate between two actors is a handle. Of course you need to somehow learn new handles if you want to talk to more actors. The remote_actor function is simply a convenient way to implement a rendezvous between two actors. However, after you learn the handle you can freely pass it around in your distributed system. Actor handles are network transparent.
Also, CAF will always maintain a single TCP connection between two actor systems. If you publish 10 actors on host A and "connect" to all 10 actors from host B via remote_actor, you'll see that CAF will initially open 10 connections (because the target node could run multiple actor systems), but all but one connection will get closed.
If you don't care about the rendezvous for actors offered by publish/remote_actor then you can also use middleman::open and middleman::connect instead. This will only connect two CAF instances without exchanging actor handles. Instead, connect will return a node_id on success. This is all you need for some features. For example remote spawning of actors.
Is there a way to publish one actor as a proxy for all actors in an actor system ( preferably without any significant performance loss )?
You can publish one actor at a port whose sole purpose is to model a rendezvous point. If that actor sends 1000 more actor handles to a remote actor, this will not cause any additional network connections.
Writing a custom actor that explicitly models the rendezvous between multiple systems by offering some sort of dictionary is the recommended way.
Just for the sake of completeness: CAF also has a registry mechanism. However, keys are limited to atom values, i.e., 10-characters-or-less. Since the registry is generic it also only stores strong_actor_ptr and leaves type safety to you. However, if that's all you need: you put handles to the registry (see actor_system::registry) and then access this registry remotely via middleman::remote_lookup (you only need a node_id to do this).
Smooth scaling with ( almost ) no limits is alpha & omega
One way, used in agent-based systems ( not sure if CAF has implemented tools for going this way ) is to use multiple transport-classes { inproc:// | ipc:// | tcp:// | .. | vmci:// } and thus be able to pick from, on an as needed basis.
While building a proxy may sound attractive, welding together two different actor-models one "atop" the other does not mean that it is as simple to achieve as it sounds ( event loops are fragile to get tuned / blocking-prevented / event-handled in a fair manner - they do not like any other master to try to take their own Hat ... ).
In case CAF provides at the moment no other transport-means but TCP:
still one may resort to use O/S-level steps and measures and harness the features of the ISO-OSI-model up to the limits or as necessary:
sudo ip address add 172.16.100.17/24 dev eth0
or better, make the additional IP-addresses permanent - i.e. edit the file /etc/network/interfaces ( on Ubuntu ) and add as many stanzas as needed, so that it looks like:
iface eth0 inet static
address 172.16.100.17/24
iface eth0 inet static
address 172.16.24.11/24
This way the configuration-space could get extended for cases the CAF does not provide any other means for such actors but the said TCP (address:port#)-transport-class.
I'm trying to find a workaround to the following limitation: When starting an Akka Cluster from scratch, one has to make sure that the first seed node is started. It's a problem to me, because if I have an emergency to restart all my system from scratch, who knows if the one machine everything relies on will be up and running properly? And I might not have the luxury to take time changing the system configuration. Hence my attempt to create the cluster manually, without relying on a static seed node list.
Now it's easy for me to have all Akka systems registering themselves somewhere (e.g. a network filesystem, by touching a file periodically). Therefore, when starting up, a new system could:
1. Look up the list of all systems that are supposedly alive (i.e. who touched the file system recently).
2. a. If there is none, then the new system joins itself, i.e. starts the cluster alone. b. Otherwise it tries to join the cluster with Cluster(system).joinSeedNodes using all the other supposedly alive systems as seeds.
3. If 2. b. doesn't succeed in reasonable time, the new system tries again, starting from 1. (looking up again the list of supposedly alive systems, as it might have changed in the meantime; in particular all other systems might have died and we'd ultimately fall into 2. a.).
I'm unsure how to implement 3.: How do I know whether joining has succeeded or failed? (Do I need to subscribe to cluster events?) And is it possible, in case of failure, to call Cluster(system).joinSeedNodes again? The official documentation is not very explicit on this point and I'm not 100% sure how to interpret the following in my case (can I make several attempts, using different seeds?):
An actor system can only join a cluster once. Additional attempts will
be ignored. When it has successfully joined it must be restarted to be
able to join another cluster or to join the same cluster again.
Finally, let me clarify that I'm building a small cluster (it's just 10 systems for the moment and it won't grow very big) and it has to be restarted from scratch now and then (I cannot assume the cluster will be alive forever).
Thx
I'm answering my own question to let people know how I sorted out my issues in the end. Michal Borowiecki's answer mentioned the ConstructR project and I built my answer on their code.
How do I know whether joining has succeeded or failed? After issuing Cluster(system).joinSeedNodes I subscribe to cluster events and start a timeout:
private case object JoinTimeout
...
Cluster(context.system).subscribe(self, InitialStateAsEvents, classOf[MemberUp], classOf[MemberLeft])
system.scheduler.scheduleOnce(15.seconds, self, JoinTimeout)
The receive is:
val address = Cluster(system).selfAddress
...
case MemberUp(member) if member.address == address =>
  // Hooray, I joined the cluster!
case JoinTimeout =>
  // Oops, couldn't join
  system.terminate()
Is it possible in case of failure to call Cluster(system).joinSeedNodes again? Maybe, maybe not. But actually I simply terminate the actor system if joining didn't succeed and restart it for another try (so it's a "let it crash" pattern at the actor system level).
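The overall retry loop (look up live peers, join them or start alone, start over on timeout) can be sketched as below. Python is used only to show the control flow. list_alive_peers and try_join are hypothetical callables, not Akka APIs: try_join stands in for "start a fresh actor system, call joinSeedNodes, wait for our own MemberUp or the timeout, and terminate the system on failure".

```python
def bootstrap(self_addr, list_alive_peers, try_join, max_attempts=5):
    """Return the seed list that was joined successfully, or None if all attempts fail."""
    for _ in range(max_attempts):
        # Step 1: look up who is supposedly alive (excluding ourselves).
        peers = [p for p in list_alive_peers() if p != self_addr]
        # Step 2a: nobody else is alive, so join ourselves and start the cluster alone.
        # Step 2b: otherwise use every other live system as a seed.
        seeds = peers if peers else [self_addr]
        if try_join(seeds):
            return seeds
        # Step 3: joining timed out; loop again with a fresh lookup,
        # because the set of live systems may have changed meanwhile.
    return None
```

Because every attempt uses a brand-new actor system, the "can only join once" restriction quoted in the question never comes into play in this scheme.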
You don't need seed nodes. You only need them if you want the cluster to start up automatically.
You can start your individual applications and then have them "manually" join the cluster at any point in time. For example, if you have HTTP enabled, you can use the akka-management library (or implement a subset of it yourself; these are all basic cluster library functions, just nicely wrapped).
I strongly discourage the touch approach. How do you synchronize reading and writing the touch files between nodes? What if someone reads a transient state (while someone else is writing it)?
I'd say either go full auto (with multiple seed-nodes), or go full "manual" and have another system be in charge of managing the clusterization of your nodes. By that I mean you start them up individually, and they join the cluster only when ordered to do so by the external supervisor (also very helpful to manage split-brains).
We've started using Constructr extension instead of the static list of seed-nodes:
https://github.com/hseeberger/constructr
This doesn't have the limitation of a statically-configured 1st seed-node having to be up after a full cluster restart.
Instead, it relies on a highly-available lookup service. Constructr supports etcd natively and there are extensions for (at least) zookeeper and consul available. Since we already have a zookeeper cluster for kafka, we went for zookeeper:
https://github.com/typesafehub/constructr-zookeeper
Well, my problem is the following. I have a piece of code that runs on several virtual machines, and each virtual machine has N interfaces (one thread per interface). The problem itself is receiving a message on one interface and redirecting it through another interface as fast as possible.
What I'm doing is this: when I receive a message on one interface (unicast), I calculate which interface I want to redirect it through and save all the information about the message (the datagram and all the extra info I want) with a function I made. Then, on the next iteration, the program checks whether there are new messages to redirect and whether the correct interface is reading them. And so on... But this makes the program exchange information very slowly...
Is there any mechanism that can speed things up?
Somebody has already invented this particular wheel - it's called MPI
Take a look at either openMPI or MPICH
Why don't you use queuing? As the messages come in, put them on a queue and notify each processing module to pick them up from the queue.
For example:
MSG comes in
Module 1 puts it on queue
Module 2,3 get notified
Module 2 picks it up from queue and saves it in the database
In parallel, Module 3 picks it up from queue and processes it
The key is "in parallel". Since these modules are different threads, while Module 2 is saving to the db, Module 3 can massage your message.
You could use JMS or MQ or make your own queue.
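As a rough illustration of the "make your own queue" option, here is a minimal sketch in Python using only the standard library. The function name run_pipeline is hypothetical, and the two handlers are stand-ins for "save to the database" and "process the message". Each module gets its own queue and its own thread, so the two run in parallel.

```python
import queue
import threading

def run_pipeline(messages):
    """Fan each incoming message out to two consumer threads."""
    q_store, q_process = queue.Queue(), queue.Queue()
    stored, processed = [], []
    STOP = object()  # sentinel telling a consumer to exit

    def consumer(q, handle):
        # Blocks on q.get(), so the thread sleeps until it is "notified".
        while True:
            msg = q.get()
            if msg is STOP:
                break
            handle(msg)

    workers = [
        # Module 2: pretend to save the message to a database.
        threading.Thread(target=consumer, args=(q_store, stored.append)),
        # Module 3: pretend to process the message.
        threading.Thread(target=consumer,
                         args=(q_process, lambda m: processed.append(m.upper()))),
    ]
    for w in workers:
        w.start()
    # Module 1: put every incoming message on both queues.
    for msg in messages:
        q_store.put(msg)
        q_process.put(msg)
    for q in (q_store, q_process):
        q.put(STOP)
    for w in workers:
        w.join()
    return stored, processed
```

The receiving thread never waits for either consumer, which is the point: it only enqueues and goes back to reading the next message.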
It sounds like you're trying to do parallel computing across multiple "machines" (even if virtual). You may want to look at existing protocols, such as MPI (Message Passing Interface), to handle this domain, as they have quite a few features that help in this type of scenario.
I am coding a workload scheduler. I would like my piece of software to be a peer-to-peer scheduler, i.e. a node only knows some neighbours (other nodes) and uses them to reach other nodes.
Each node would have its own weighted-routing table to send messages to other peers (basically based on the number of hops), ie. "I want the master to give me my schedule" or "is resource A available on node B ?" : which neighbor is the closest to my target ?
For instance I have written my own routing protocol using XML-RPC (xmlrpc-c) and std::multimaps / std::maps.
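For readers unfamiliar with the idea, a hop-count routing table of this kind can be sketched as follows. This is Python for brevity and is not the xmlrpc-c based implementation mentioned above. The function name build_routes and the shape of its input are assumptions: each neighbour is assumed to advertise how many hops it needs to reach each target.

```python
def build_routes(neighbour_tables):
    """Build {target: (next_hop, total_hops)}, preferring the fewest hops.

    neighbour_tables maps each direct neighbour to the hop counts it
    advertises, e.g. {"B": {"master": 2}} means B reaches master in 2 hops.
    """
    routes = {}
    for neighbour, table in neighbour_tables.items():
        # The neighbour reaches itself in 0 hops, so it is 1 hop from us.
        candidates = {neighbour: 0, **table}
        for target, hops in candidates.items():
            total = hops + 1  # one extra hop to get to the neighbour first
            if target not in routes or total < routes[target][1]:
                routes[target] = (neighbour, total)
    return routes
```

Answering "which neighbour is the closest to my target?" is then a single dictionary lookup on the result.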
I am thinking of using ZeroMQ to optimize my data streams :
queueing can reduce the network load between peers ;
subscriptions can be used to publish upgrades.
As a consequence :
I would need to open as many sockets as I would create new types of connections ;
Each node would need to be a client, a server, a publisher, a subscriber, a broker and a directory ;
I am not sure that my "peer-to-peer architecture" is compatible with the main purpose of ZeroMQ.
Do you think that ZeroMQ can be a helpful concept to use ?
It would be helpful to know exactly what you mean by "routing protocol".
That sounds like you mean the business logic of routing to a particular peer.
Knowing more fully what you're looking to achieve with ZeroMQ would also be helpful.
Have you read the ZeroMQ Guide?
ZeroMQ is a pretty different beast and without spending some time to play with it, you'll
likely find yourself confused. As a bonus, reading the guide will also help you answer
this question for yourself, since you know your requirements better.
ZeroMQ was designed to build robust distributed and multi-threaded applications. Since distributed applications can often take the form of "peer-to-peer", ZeroMQ could indeed be a good fit for your needs.
could anyone give some advice for how to implement a master machine controlling some slave machines via C++?
I am trying to implement a simple program that can distribute tasks from master to slaves. It is easy to implement one master + one slave machine. However, when there are more than one slave machine, I don't know how to design.
If the solution can be used for both Linux and Windows, it would be much better.
You should use a framework rather than make your own. What you need to search for is Cluster Computing. One that might work easily is Boost.MPI.
With n machines, you need to keep track of which ones are free. If none are free, look at the load across your slaves (i.e. how many tasks have been queued up at each) and queue on the least loaded machine, or whichever your algorithm deems best; for example, better hardware may mean that some slaves perform better than others. I'd start with a simple distribution algorithm, and then tweak once it's working...
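A simple starting point for such a distribution algorithm might look like the sketch below. It is Python rather than C++ to keep it short. The class name Master, the relative speed weights, and the fail method are all assumptions for illustration. A task goes to the slave whose queue, scaled by its speed, would be shortest, and the queue of a dead slave is redistributed.

```python
class Master:
    """Tracks queued tasks per slave and assigns new tasks to the least loaded one."""

    def __init__(self, speeds):
        # speeds: {slave: relative speed}; 1.5 means 1.5x the throughput of 1.0.
        self.speeds = dict(speeds)
        self.queued = {s: [] for s in self.speeds}

    def submit(self, task):
        """Queue the task on the slave with the lowest speed-adjusted load."""
        slave = min(
            self.queued,
            key=lambda s: (len(self.queued[s]) + 1) / self.speeds[s],
        )
        self.queued[slave].append(task)
        return slave

    def fail(self, slave):
        """A slave died: forget it and hand its queued tasks to the survivors."""
        orphans = self.queued.pop(slave)
        del self.speeds[slave]
        return [(task, self.submit(task)) for task in orphans]
```

In a real system the queue lengths would have to be updated as tasks complete; that bookkeeping is left out here.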
More interesting problems will arise in exceptional circumstances (i.e. slaves dying, and various such issues.)
I would use an existing messaging bus to make your life easier (rather than re-inventing), the real intelligence is in the distribution algorithm and management of failed nodes.
We need to know more, but basically you just need to make sure the slaves don't block each other. Details of doing that in C++ will get involved, but the first thing to do is ask yourself what the algorithm is. The simplest case is going to be if you don't care about waiting for the slaves, in which case you have
while still tasks to do
    launch a task on a slave
If you have to have just one job running on a slave then you'll need something like an array of flags, one per slave
slaves : array 0 to (number of slaves - 1)
initialize slaves to all FALSE
while not done
    find the first FALSE slave -- it's not in use
    set that slave to TRUE
    launch a job on that slave
    check for slaves that are done
        set that slave to FALSE
Now, if you have multiple threads, you can make that into two threads
while not done
    find the first FALSE slave -- it's not in use
    set that slave to TRUE
    launch a job on that slave

while not done
    check for slaves that are done
        set that slave to FALSE
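The two-thread version can be sketched as runnable code like this. It is Python rather than C++ to keep it short, and the "slaves" are simulated by a local thread pool standing in for remote machines. The name run_jobs is hypothetical, and failure handling (a job raising an exception, a slave dying) is deliberately left out.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

def run_jobs(jobs, n_slaves):
    """Run every job with at most one job per slave; return the results."""
    busy = [False] * n_slaves          # one flag per slave
    running = {}                       # slave index -> Future of its current job
    cond = threading.Condition()       # protects busy/running/results
    launched_all = threading.Event()
    results = []

    with ThreadPoolExecutor(max_workers=n_slaves) as pool:
        def launcher():
            for job in jobs:
                with cond:
                    # Wait until some flag is FALSE, then claim that slave.
                    cond.wait_for(lambda: False in busy)
                    idx = busy.index(False)
                    busy[idx] = True
                    running[idx] = pool.submit(job)
            launched_all.set()

        def checker():
            while True:
                with cond:
                    for idx, fut in list(running.items()):
                        if fut.done():
                            results.append(fut.result())
                            del running[idx]
                            busy[idx] = False   # slave is free again
                            cond.notify_all()
                    if launched_all.is_set() and not running:
                        return
                time.sleep(0.001)  # poll outside the lock

        threads = [threading.Thread(target=launcher),
                   threading.Thread(target=checker)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return results
```

The shared flag array is the only state both loops touch, so it is the only thing that needs a lock; the launcher sleeps on the condition variable instead of spinning when every slave is busy.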