Akka configuration to distribute Map workload across different network nodes? - mapreduce

I prepared a working Akka/Java implementation of MapReduce (akka_mapreduce_example) that I currently use in multiple projects.
I would now like to distribute the Map step workload across many separate network nodes, and I'm wondering what I have to change in the Akka configuration to achieve exactly that. I would need configuration changes on the "Master" machine (the one that triggers the MapReduce and also runs the reduce step) and on the "Slave" machines that help distribute the Map workload.
My current application.conf works for a single machine and is quite simple. I'm hoping this requires only configuration changes and no code changes. Is that right?

Depending on your code, configuration changes may be all that is required. You'll need something like this in your config to set up remoting:
akka {
  actor {
    provider = "akka.remote.RemoteActorRefProvider"
  }
  remote {
    enabled-transports = ["akka.remote.netty.tcp"]
    netty.tcp {
      hostname = "127.0.0.1"
      port = 2552
    }
  }
}
Then you need to tell Akka which of your actors will be created on the remote node:
akka {
  actor {
    deployment {
      /sampleActor {
        remote = "akka.tcp://sampleActorSystem@127.0.0.1:2553"
      }
    }
  }
}
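On each Slave machine you would then run an actor system with the name referenced in that deployment address, with remoting enabled on the host and port the Master points at. As a minimal sketch (sampleActorSystem, 127.0.0.1 and port 2553 are just the placeholder values from the snippet above, not names from your project), the Slave's application.conf could look like this:
akka {
  actor {
    provider = "akka.remote.RemoteActorRefProvider"
  }
  remote {
    enabled-transports = ["akka.remote.netty.tcp"]
    netty.tcp {
      hostname = "127.0.0.1"   # the address the Master's deployment section points at
      port = 2553
    }
  }
}
Note that the classes of the remotely deployed actors (and of the messages they exchange) still have to be on the Slave's classpath, so you typically deploy the same application JAR on every node.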
For more info you can look up the excellent documentation. I linked the docs for the 2.2.1 version; if you use a different version, make sure to look in the docs for that version. There tend to be changes between versions, and while things from an older version will usually still work in a newer one (though they might be deprecated), the other way round is obviously a problem.
You'll notice I said "Depending on your code". What I mean by that is that all messages that will be sent to the remote node need to be Serializable, and that your actors must not rely on any static members.
If you want more flexibility you can check out the brand new Clustering support. This will give you a nice flexible, dynamic Peer-To-Peer system which you can scale up and down as you want. You might need some slight code changes for that.
Hope that helps.

The workload can be distributed across multiple nodes, but you have to choose between two modes of processing: pull-based or push-based.
Both have pros and cons, but pull is more attractive as it provides fault tolerance and lets you track the assignment of work to worker actors. To get started, look at http://blog.goconspire.com/post/64901258135/akka-at-conspire-part-5-the-importance-of-pulling.
The sample working code is at https://github.com/typesafehub/activator-akka-distributed-workers.
To handle the fast-producer/slow-consumer problem, Akka Streams' back pressure can be used.
For dynamic creation of worker actors when the system is under extreme load, you have to come up with your own design; the approach suggested there is to add more nodes to the cluster so that work can be distributed to them.

Related

TraCIScenarioManagerForker vs veins-launchd

I currently use TraCIScenarioManagerForker to spawn SUMO for each simulation, the "forker" method. However, the official VEINS documentation recommends launching the SUMO daemon separately using the veins-launchd script and then running simulations, the "launchd" method.
Using the forker method makes running simulations a one-command job, since SUMO is killed when the simulation ends. With the launchd method, however, one has to take care of setting up the SUMO daemon and killing it when the simulation ends.
What are the advantages and disadvantages of each method? I'm trying to understand the recommended best practices when using VEINS.
Indeed, Veins 5.1 provides three (four, if you count an experimental one) ways of connecting a running OMNeT++ to SUMO:
assuming SUMO is already running and connecting there directly (TraCIScenarioManager)
running SUMO directly from the process - on Linux: as a fork, on Windows: as a process in the same context (TraCIScenarioManagerForker)
connecting to a Proxy (veins_launchd) that launches an isolated instance of SUMO for every client that connects to it (TraCIScenarioManagerLaunchd)
if you are feeling adventurous, the veins_libsumo fork of Veins offers a fourth option: including the SUMO engine directly in your OMNeT++ simulation and using it via method calls (instead of remote procedure calls via a network socket). Contrast, for example, TraCI based code vs. libsumo based code. This can be orders of magnitude faster with none of the drawbacks discussed below. At the time of writing (Mar 2021) this fork is just a proof of concept, though.
Each of these has unique benefits and drawbacks:
TraCIScenarioManager is the most flexible: you can connect to a long-running instance of SUMO which is only rolled backwards/forwards in time through the use of snapshots, connect multiple clients to a single instance, etc., but it requires you to manually take care of running exactly as many instances of SUMO as you need, at exactly the time when you need them.
TraCIScenarioManagerForker is very convenient, but:
it requires the simulation (as opposed to the person running the simulation) to "know" how to launch SUMO, so a simulation that works on one machine won't work on another because SUMO might be installed in a different path there, etc.
it launches SUMO in the directory of the simulation, so file output from multiple SUMO instances overwrites each other and file output is stored in the directory storing the simulation (which might be a slow or write-protected disk, etc.)
it results in both SUMO and OMNeT++ writing console output into what is potentially the same console window, requiring experience in telling one from the other when debugging crashes (and things get even more messy if one wants to debug SUMO)
TraCIScenarioManagerLaunchd does not suffer from any of these problems, but it requires the user to remember to start the proxy before starting the simulations.

What is the best way to handle concurrent sharing of configuration data

I am thinking of rewriting a monolithic C/C++ application (initially it was a single C++ executable) and I am trying to design my application to be more modular. I was thinking that I could deliver all my modules as DLLs (or .so objects on a Linux platform) and compose my application at runtime instead of shipping a single executable. While I am seeking modularity, I am doing my best to keep in mind that speed is also important for this application, so my design should be a trade-off between modularity and performance.
This is an IoT application, aimed at collecting various data depending on the geolocation of the vehicle.
In my current application, there are 3 components :
AntennaService: this is the main service component of my application. This module would load the others. On every move of the vehicle, it queries the ConfigurationDataService, asking it to return the closest geo point indexed in a flat configuration file. And when needed it fires events to log asynchronously via the log service.
LogService: a service module that uses the pub/sub mechanism to log data asynchronously, either locally or online depending on the internet connection.
ConfigurationDataService: this is a service module that could potentially be called simultaneously by various other components querying it for the configuration to use. This module reads protocol buffers files in which read-only configuration data is indexed. Depending on the query criteria given by the other modules, it filters or computes the static configuration data before answering.
The problem I am facing is that I cannot find the best way to model the ConfigurationDataService so that it responds quickly and optimally:
should it use locks and critical sections in order to serve the parallel queries from the other modules, or do you think that there could be a far better design?
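To illustrate the lock-based option named in the question, here is a minimal C++17 sketch (the GeoPoint type and the query method are placeholders, not the actual code): read-only queries take a shared lock, so they run in parallel without blocking each other, and an exclusive lock is only needed if the indexed data ever has to be replaced.
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

// Placeholder for whatever the protocol buffers file is parsed into.
struct GeoPoint { double lat = 0.0; double lon = 0.0; };

class ConfigurationDataService {
public:
    // Called concurrently by AntennaService and other modules:
    // a shared (read) lock lets queries proceed in parallel.
    GeoPoint closestPoint(double lat, double lon) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        GeoPoint best{};
        double bestDist = 1e300;
        for (const auto& kv : points_) {
            const GeoPoint& p = kv.second;
            double d = (p.lat - lat) * (p.lat - lat) + (p.lon - lon) * (p.lon - lon);
            if (d < bestDist) { bestDist = d; best = p; }
        }
        return best;
    }

    // Called rarely (e.g. at startup or on reload): exclusive (write) lock.
    void load(std::map<std::string, GeoPoint> points) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        points_ = std::move(points);
    }

private:
    mutable std::shared_mutex mutex_;
    std::map<std::string, GeoPoint> points_;
};
Since the data is read-only after loading, an alternative with even less contention is to publish it as an immutable snapshot behind a std::shared_ptr that readers copy, so queries need no lock at all.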

Tensorflow Setup for Distributed Computing

Can anyone provide guidance on how to setup tensorflow to work on many CPUs across a network? All of the examples I have found thus far use only one local box and multi-gpus at best. I have found that I can pass in a list of targets in the session_opts, but I'm not sure how to setup tensorflow on each box to listen for networked nodes/tasks. Any example would be greatly appreciated!
The open-source version (currently 0.6.0) of TensorFlow supports single-process execution only: in particular, the only valid target in the tensorflow::SessionOptions is the empty string, which means "current process."
The TensorFlow whitepaper describes the structure of the distributed implementation (see Figure 3) that we use inside Google. The basic idea is that the Session interface can be implemented using RPC to a master; and the master can partition the computation across a set of devices in multiple worker processes, which also communicate using RPC. Alas, the current version depends heavily on Google-internal technologies (like Borg), so a lot of work remains to make it ready for external consumption. We are currently working on this, and you can follow the progress on this GitHub issue.
EDIT on 2/26/2016: Today we released an initial version of the distributed runtime to GitHub. It supports multiple machines and multiple GPUs.

ZeroC ICE vs 0MQ/ZeroMQ vs Crossroads IO vs Open Source DDS

How does ZeroC ICE compare to 0MQ? I know that 0MQ/Crossroads and DDS are very similar, but I can't seem to figure out where ICE comes in.
I need to quickly implement a system that offloads real-time market-data from C++ to C#, as a first phase of my project. The next phase will be to implement an Event Based architecture with an underlying Pub/Sub design.
I am willing to use TCP, but the system is currently running on a single 24-core server, so an IPC option would be nice. From what I understand, ICE is TCP only, while DDS and 0MQ have an IPC option.
Currently, I am leaning towards using Protobuf with either ICE or Crossroads IO. I got turned off by the OpenSplice DDS website. I've done lots of research on the various options; I was originally considering OpenMPI + boost::mpi, but there does not seem to be an MPI for .NET.
My question is:
How does ICE compare to 0MQ? I can't wrap my head around this. I was unable to find anything online that compares the two.
thanks in advance.
........
More about my project:
Currently using CMake with C++ on Windows, but the plan is to move to CentOS at some point. An additional desired feature is to store the tick data and all the messages in a "NoSQL" database such as HBase/Hadoop or HDF5. Do any of these middleware/messaging/pub-sub libraries have any database integration?
Some thoughts about ZeroC:
Very fast.
Able to have multiple endpoints.
Able to load balance across the endpoints.
Able to reconnect to a different endpoint in case one of the nodes goes down; this is transparent to the end user.
Has a good tool chain (IceGrid, IceStorm, IceBox, etc.).
Distributed, high availability, multiple failover, etc.
Apart from that, I have used it for hot-swapping code modules (something similar to Erlang) by having the client create the proxy with multiple endpoints, and later bringing down each endpoint one by one for a quick upgrade. With the transparent retry to a different endpoint, I could have the system up and running the whole time I did an upgrade. Not sure if this is an advertised feature or an unadvertised side effect :)
Overall, it is very easy to scale out your servers if need be using ZeroC Ice.
I know ZeroMQ provides a fantastic set of tools and messaging patterns and I would keep using it for my pet projects. However, the problem that I see is that it is very easy to go overboard and lose track of all your distributed components, and keeping track of them is a must-have in a distributed environment. How will you know where your clients/servers are when you need to upgrade? If one of the components down the chain does not receive a message, how do you identify where the issue is? The publisher? The client? Or any one of the bridges (REP/REQ, XREP/XREQ, etc.) in between?
Overall, ZeroC provides a much better toolset and ecosystem for enterprise solutions.
And it is open source :)
Jaybny,
ZMQ:
If you want really good performance and the only job for Phase 1 of your project is to move data from C++ to C#, then ZMQ is the best option.
Having a pub/sub model for an event-driven architecture is also something that ZMQ can help you with, through its built-in messaging patterns.
ZMQ also supports your IPC requirements in this case. E.g. you can have one instance of your application that consumes 24 cores by multithreading and communicating via IPC.
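As a rough sketch of the Phase 1 idea (the endpoint "tcp://127.0.0.1:5556" and the message format are assumptions, not part of the question), the C++ side could publish ticks on a PUB socket, and the C# side would connect a SUB socket to the same endpoint through one of the .NET libraries (e.g. NetMQ or clrzmq):
// Minimal libzmq (C API) publisher sketch.
#include <zmq.h>
#include <cstdio>
#include <cstring>

int main() {
    void *ctx = zmq_ctx_new();
    void *pub = zmq_socket(ctx, ZMQ_PUB);
    if (zmq_bind(pub, "tcp://127.0.0.1:5556") != 0) {
        std::perror("zmq_bind");
        return 1;
    }
    // In the real feed handler this would sit in the market-data callback.
    const char *tick = "AAPL 123.45 1000";
    zmq_send(pub, tick, std::strlen(tick), 0);  // fire-and-forget; subscribers filter by topic prefix
    zmq_close(pub);
    zmq_ctx_term(ctx);
    return 0;
}
On Linux the same code works with an ipc:// endpoint instead of tcp://, which avoids the TCP stack on a single box.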
ZeroC Ice:
Ice is an RPC framework, very much like CORBA.
E.g.:
Socket/ZMQ - you send a message over the wire, read it at the other end, parse the message, do some action, etc.
ZeroC Ice - you create a contract between client and server. The contract is nothing but a template of a class. The client then calls a proxy method of that class, and the server implements/actions it and returns the value. Thus, int result = mathClass.Add(10,20) is what the client calls. The method, parameters, etc. are marshalled and sent to the server, the server implements the Add method and returns the result, and the client gets 30 as the result. So on the client side, the API is nothing but a proxy for a servant running on a remote host.
Conclusion:
ZeroC ICE has some nice enterprisy features which are really good. However, for your project requirements, ZMQ is the right tool.
Hope this helps.
For me, the correct answer was Crossroads I/O. It does everything I need, though I am still unable to do pub/sub when using protobufs. I'm sure ZeroC ICE is great for distributed IPC, but 0MQ/Crossroads gives you the added flexibility of inter-thread communication.
Note: on Windows, 0MQ does not have IPC.
So, all in all, the Crossroads fork of 0MQ is the best, but you will have to roll your own Windows IPC (or use TCP on the 127.0.0.1 loopback) and your own publisher-side topic filtering for pub/sub.
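For the inter-thread case mentioned above, here is a minimal sketch (the endpoint name is made up) using the inproc:// transport, which also works on Windows; both sockets must be created from the same ZeroMQ context:
#include <zmq.h>
#include <cstdio>
#include <cstring>
#include <thread>

int main() {
    void *ctx = zmq_ctx_new();

    // Bind before the consumer thread connects; inproc endpoints live inside the context.
    void *push = zmq_socket(ctx, ZMQ_PUSH);
    zmq_bind(push, "inproc://ticks");

    std::thread consumer([ctx]() {
        void *pull = zmq_socket(ctx, ZMQ_PULL);
        zmq_connect(pull, "inproc://ticks");
        char buf[64];
        int n = zmq_recv(pull, buf, sizeof(buf) - 1, 0);
        if (n >= 0) {
            if (n > (int)sizeof(buf) - 1) n = (int)sizeof(buf) - 1;
            buf[n] = '\0';
            std::printf("consumer got: %s\n", buf);
        }
        zmq_close(pull);
    });

    const char *tick = "AAPL 123.45";
    zmq_send(push, tick, std::strlen(tick), 0);  // blocks until the PULL peer has connected

    consumer.join();
    zmq_close(push);
    zmq_ctx_term(ctx);
    return 0;
}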
nanomsg, from the guy who wrote Crossroads and 0MQ (I think).
http://nanomsg.org/

How can I turn a big chunk of native code into a scalable service?

Greetings,
I have a large piece of software developed in Eiffel. It is possible to use this code from C++, but it loads the Eiffel runtime, and I can't trust the Eiffel code and runtime to be thread-safe when accessed by multiple threads from C++.
I need to turn this native code into a service, but I would like to scale to multiple servers in case of high load. I don't want to delegate the scaling aspect to Eiffel code & runtime, so I'm looking into wrapping this code with existing scalability options.
Is there anything under Apache web server that'd let me provide thread safe access to this chunk of code? How about a pool of Eiffel code instances? What I have in mind is something like this:
[lots of client requests over network] ---> [Some scalable framework] --> [One or more instances of expensive to create Eiffel code]
I'd like the framework to let me wrap multiple instances of expensive chunks of code and I'd like to scale this up just like a web farm, by adding more machines.
Best Regards
Seref
If you're not tied to Apache but any other framework would suffice, I suggest you check out the ZeroMQ message passing framework. Its ZMQ_PUSH/ZMQ_PULL model with zmq_tcp transport seems to do what you want.
Your setup would be something like: one "master" process servicing outside requests (in any language/platform, perhaps an Apache mod) and a runtime-configurable number of C++ worker processes that call into Eiffel code and push results back.
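A minimal sketch of one such worker process is below; the endpoint addresses and the eiffel_process wrapper function are placeholders, not an existing API. The worker PULLs a request from the master, calls into the Eiffel code from this single process only (so the runtime is never touched concurrently), and PUSHes the result back; scaling out is then just a matter of starting more worker processes on the same or additional machines.
#include <zmq.h>
#include <cstring>

// Hypothetical single-threaded wrapper around the Eiffel library.
extern "C" const char *eiffel_process(const char *request);

int main() {
    void *ctx = zmq_ctx_new();
    void *in  = zmq_socket(ctx, ZMQ_PULL);
    void *out = zmq_socket(ctx, ZMQ_PUSH);
    zmq_connect(in,  "tcp://master-host:5557");   // master binds a PUSH socket here
    zmq_connect(out, "tcp://master-host:5558");   // master binds a PULL socket here

    char buf[4096];
    while (true) {
        int n = zmq_recv(in, buf, sizeof(buf) - 1, 0);
        if (n < 0) break;                          // context terminated or error
        if (n > (int)sizeof(buf) - 1) n = (int)sizeof(buf) - 1;
        buf[n] = '\0';
        const char *result = eiffel_process(buf);  // one request at a time: no concurrent access to the Eiffel runtime
        zmq_send(out, result, std::strlen(result), 0);
    }

    zmq_close(in);
    zmq_close(out);
    zmq_ctx_term(ctx);
    return 0;
}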