TensorFlow Setup for Distributed Computing - C++

Can anyone provide guidance on how to set up TensorFlow to work on many CPUs across a network? All of the examples I have found thus far use only one local box, with multiple GPUs at best. I have found that I can pass in a list of targets in the session_opts, but I'm not sure how to set up TensorFlow on each box to listen for networked nodes/tasks. Any example would be greatly appreciated!

The open-source version (currently 0.6.0) of TensorFlow supports single-process execution only: in particular, the only valid target in the tensorflow::SessionOptions is the empty string, which means "current process."
The TensorFlow whitepaper describes the structure of the distributed implementation (see Figure 3) that we use inside Google. The basic idea is that the Session interface can be implemented using RPC to a master; and the master can partition the computation across a set of devices in multiple worker processes, which also communicate using RPC. Alas, the current version depends heavily on Google-internal technologies (like Borg), so a lot of work remains to make it ready for external consumption. We are currently working on this, and you can follow the progress on this GitHub issue.
EDIT on 2/26/2016: Today we released an initial version of the distributed runtime to GitHub. It supports multiple machines and multiple GPUs.

Related

Creating an SNMP agent with Qt and C++

I'm considering adding SNMP support to a simple daemon I wrote under linux. My daemon is written in C++ and Qt5.
I'm looking for an easy way to add this support. I found several MIB creation tools; the problem is writing the agent (or subagent). I'd rather not code this in C. Would anyone know of a Qt library that helps out? I found mib2c, which will create a skeleton in C (but I'd rather use C++ with Qt).
You can try using CIMPLE, which I've forked on GitHub from its original website. I've done some cleanup on GitHub and I've attempted to contact the original authors, but they've never returned any of my emails, which makes me wonder whether or not they intend to continue supporting the library.
Regardless, it does work and it plays fairly nicely with both Windows and Linux, which have very different styles of implementing SNMP agents. If you google around for "WBEM" you will find some other libraries as well. CIMPLE is the one we used at Fusion-io for SNMP support. It supported C++ fairly well and uses a code generator to handle lots of the boilerplate that's really boring to write and not specific to your application.

Isis2 in ns-3 and bridge tap

So I need to simulate Isis2 in ns-3. (I am also to modify Isis2 slightly, wrapping it with some C/C++ code since I need at least a quasi real-time mission-critical behavior)
Since I am far from having any of that implemented, it would be interesting to know whether this is a suitable approach. I specifically need to monitor the performance of the consensus during sporadic wifi (ad hoc) behavior.
Would it make sense to virtualize a machine for each instance of Isis2 and then use the tap bridge model and analyze the traffic in the ns-3 channel?
(I also am to log the events on each instance; composing the various data into a unified presentation)
You need to start by building an Isis2 application program, and this would have to be done using C/CLI or C++/CLI. C++/CLI will be easier because the match with the Isis2 type system is closer. But as I type these words, I'm trying to remember whether Mono actually supports C++/CLI. If there isn't a Mono compiler for C++/CLI, you might be forced to use C# or IronPython. Basically, you have to work with what the compiler will support.
You'll build this and the library on your Mono platform and should test it out, which you can do on any Linux system. Once you have it working, that's the thing you'll experiment with on ns-3. Notice that if you work on Windows, you would be able to use C++/CLI (for sure) and could then just make a Windows VM for ns-3. So this would mean working on Windows, but not needing to learn C#.
This is because Isis2 is a library for group communication, multicast, file replication and sharing, DHTs and so forth and to access any particular functionality you need an application program to "drive" it. I wouldn't expect performance issues if you follow the recommendations in the video tutorials and the user manual; even for real-time uses the system is probably both fast enough and steady enough in its behavior.
Then yes, I would take a virtual machine with the needed binaries for Mono (Mono is loaded from DLLs, so they need to be available at the right virtual file system locations) and your Isis2 test program, and run that within ns-3. I haven't tried this but don't see any reason it wouldn't work.
Keep in mind that the default timer settings for timeout and retransmission are very slow and tuned for running on Amazon AWS, inside a data center. So once you have this working, but before simulating your wifi setup, you may want to experiment with tuning the system to be more responsive in that setting. I'm thinking that ISIS_DEFAULTTIMEOUT will probably be way too long for you, and the RTDELAY setting may also be too long. Amazon AWS is a peculiar environment, and what makes Isis2 stable in AWS might not be ideal in a wifi setting with very different goals. All of those parameters can be tuned by just setting the desired values in the environment, which can be done in bash on the line that launches your test program, or by using the bash "export" command.

ZerocICE vs DDS

I know this question may seem like a duplicate, but I think the newer versions of these RPC frameworks are worth comparing again. I'm also a newbie in RPC and HLA.
My requirements:
A real-time pub/sub messaging architecture. I have 12 nodes connected to each other, and I want each process of my application to run multiple times in different VM servers on each node.
Each process must know about its replicas too; if memory use in one VM goes up, a replica must help that process in parallel.
A log of the errors that occurred in each process, for tracing problems, plus the number of lost messages.
Support for RTI and HLA for my simulation objects.
Why is DDS used more for critical systems like military or air traffic management? Is OpenSplice DDS really that good, or is it because it is OMG-supported and created by military and DARPA guys too :D ?
Do these frameworks provide such options (for DDS, the open-source DDS based on TAO/ACE)?
What are my other options (like Thrift)?
Is there a good comparison of these frameworks? Thanks a lot.

PCL library and concurrency

I have started working in a project using the PCL library under Windows 7. My question is if PCL provides any structures or algorithms for concurrent work. For example, creating a new point cloud from a data set concurrently; something like pcl::io::loadPCDFileKCores.
I have searched around in the API documentation and Google but find nothing.
Thanks a lot!
PCL 1.7 does offer some facilities for exploiting multiple processing cores on a system, using either the GPU or the CPU.
PCL uses multiple CPU cores on a system through the OpenMP API. You can check for multi-core-enabled classes by searching the PCL documentation for the "OpenMP" keyword (naive, but effective!). At the time of writing, the reported OpenMP-enabled classes are:
pcl::RangeImage
pcl::tracking::ParticleFilterOMPTracker
pcl::FPFHEstimationOMP
pcl::NormalEstimationOMP
pcl::Narf
pcl::tracking::KLDAdaptiveParticleFilterOMPTracker
pcl::SHOTColorEstimationOMP
pcl::SHOTEstimationOMP
pcl::NormalEstimationOMP< PointInT, Eigen::MatrixXf >
If you search the PCL documentation for the GPU or CUDA keywords a similar, but much longer, list of GPU-enabled classes is reported.
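To give a feel for what "OpenMP-enabled" means in practice, here is a minimal sketch of the loop-level parallelism OpenMP provides. It is not PCL code: `Point3` and `distancesFrom` are hypothetical names made up for this illustration, and the per-point work is just a distance computation standing in for something like normal estimation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a PCL point type; not part of PCL.
struct Point3 {
    float x, y, z;
};

// Computes the distance from every point in the cloud to the point at refIndex.
// Each iteration touches only its own output slot, so the loop can be split
// across cores. When built with OpenMP enabled (e.g. -fopenmp on GCC/Clang)
// the pragma parallelizes the loop; otherwise the pragma is ignored and the
// loop runs serially with the same result.
std::vector<float> distancesFrom(const std::vector<Point3>& cloud, std::size_t refIndex) {
    assert(refIndex < cloud.size());
    const Point3 ref = cloud[refIndex];
    std::vector<float> out(cloud.size());
    const long n = static_cast<long>(cloud.size());
#pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        const Point3& p = cloud[static_cast<std::size_t>(i)];
        const float dx = p.x - ref.x;
        const float dy = p.y - ref.y;
        const float dz = p.z - ref.z;
        out[static_cast<std::size_t>(i)] = std::sqrt(dx * dx + dy * dy + dz * dz);
    }
    return out;
}
```

The classes listed above apply the same idea to their own per-point work, so for those steps you get the speed-up without writing the loop yourself.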

ZeroC ICE vs 0MQ/ZeroMQ vs Crossroads IO vs Open Source DDS

How does ZeroC ICE compare to 0MQ? I know that 0MQ/Crossroads and DDS are very similar, but I can't seem to figure out where ICE comes in.
I need to quickly implement a system that offloads real-time market data from C++ to C#, as a first phase of my project. The next phase will be to implement an event-based architecture with an underlying pub/sub design.
I am willing to use TCP, but the system is currently running on a single 24-core server, so an IPC option would be nice. From what I understand, ICE is TCP only, while DDS and 0MQ have an IPC option.
Currently, I am leaning towards using Protobuf with either ICE or Crossroads I/O. I got turned off by the OpenSplice DDS website. I've done lots of research on the various options and was originally considering OpenMPI + Boost.MPI, but there does not seem to be MPI for .NET.
My question is:
How does ICE compare to 0MQ? I can't wrap my head around this. I was unable to find anything online that compares the two.
thanks in advance.
........
More about my project:
Currently using CMake and C++ on Windows, but the plan is to move to CentOS at some point. An additional desired feature is to store the tick data and all the messages in a "NoSQL" database such as HBase/Hadoop or HDF5. Do any of these middleware/messaging/pub-sub libraries have any database integration?
Some thoughts about ZeroC:
Very fast; able to have multiple endpoints; able to load balance across the endpoints; able to reconnect to a different endpoint in case one of the nodes goes down (this is transparent to the end user); has a good tool chain (IceGrid, IceStorm, IceBox, etc.); distributed, high availability, multiple failover, etc.
Apart from that, I have used it for hot swapping code modules (something similar to Erlang) by having the client create the proxy with multiple endpoints, and later on bringing down each endpoint one by one for a quick upgrade. With the transparent retry to a different endpoint, I could have the system up and running the whole time I did an upgrade. Not sure if this is an advertised feature or an unadvertised side effect :)
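The "transparent retry to a different endpoint" idea can be sketched in plain C++ as follows. This is only an illustration of the pattern and not ZeroC Ice's API: `Endpoint`, `FailoverProxy`, and `invoke` are hypothetical names, and a callable stands in for the network request.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical endpoint: a name plus a callable standing in for a network request.
// The callable throws std::runtime_error when the endpoint is unreachable.
struct Endpoint {
    std::string name;
    std::function<std::string(const std::string&)> send;
};

// Holds several endpoints for the same service and hides failures from the caller
// by retrying the request on the next endpoint in the list.
class FailoverProxy {
public:
    explicit FailoverProxy(std::vector<Endpoint> endpoints)
        : endpoints_(std::move(endpoints)) {
        assert(!endpoints_.empty());
    }

    // Returns the reply from the first endpoint that answers.
    // Throws only if every endpoint fails.
    std::string invoke(const std::string& request) const {
        for (const Endpoint& ep : endpoints_) {
            try {
                return ep.send(request);
            } catch (const std::runtime_error&) {
                // This endpoint is down (e.g. being upgraded); try the next one.
            }
        }
        throw std::runtime_error("all endpoints failed");
    }

private:
    std::vector<Endpoint> endpoints_;
};
```

With a proxy like this, taking one endpoint down for an upgrade only costs the caller a retry, which is the behaviour described above.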
Overall, it is very easy to scale out your servers if need be using ZeroC Ice.
I know ZeroMQ provides a fantastic set of tools and messaging patterns, and I would keep using it for my pet projects. However, the problem that I see is that it is very easy to go overboard and lose track of all your distributed components, and keeping track of them is a must-have in a distributed environment. How will you know where your clients/servers are when you need to upgrade? If one of the components down the chain does not receive a message, how do you identify where the issue is? The publisher? The client? Or any one of the bridges (REP/REQ, XREP/XREQ, etc.) in between?
Overall, ZeroC provides a much better toolset and ecosystem for enterprise solutions.
And it is open source :)
Jaybny,
ZMQ:
If you want really good performance and the only job for Phase 1 of your project is to move data from C++ to C#, then ZMQ is the best option.
Having a pub/sub model for an event-driven architecture is also something that ZMQ can help you with, through its built-in messaging patterns.
ZMQ also supports your IPC requirements in this case. For example, you can have one instance of your application that uses all 24 cores by multithreading and communicating via IPC.
ZeroC Ice:
Ice is an RPC framework very much like CORBA.
For example:
Socket/ZMQ - You send a message over the wire, read it at the other end, parse the message, do some action, etc.
ZeroC Ice - You create a contract between client and server. The contract is nothing but a template of a class. The client calls a proxy method of that class, and the server implements it and returns the value. Thus, int result = mathClass.Add(10,20) is what the client calls. The method name and parameters are marshalled and sent to the server, the server executes its Add implementation and returns the result, and the client gets 30. So on the client side, the API is nothing but a proxy for a servant running on a remote host.
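The contract/proxy/servant split can be sketched in a few lines of plain C++. This is not Ice-generated code or the Ice API: `Math`, `MathServant`, `MathProxy`, `dispatch`, and the text wire format are all made up for the illustration, and the "network" is just a callable.

```cpp
#include <cassert>
#include <functional>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>

// The "contract": an interface that client and server both agree on.
class Math {
public:
    virtual ~Math() = default;
    virtual int add(int a, int b) = 0;
};

// Server side: the servant holds the real implementation.
class MathServant : public Math {
public:
    int add(int a, int b) override { return a + b; }
};

// Server side: unmarshal a request, call the servant, marshal the reply.
std::string dispatch(Math& servant, const std::string& request) {
    std::istringstream in(request);
    std::string method;
    int a = 0, b = 0;
    if (!(in >> method >> a >> b) || method != "add") {
        throw std::runtime_error("bad request: " + request);
    }
    return std::to_string(servant.add(a, b));
}

// Stand-in for the wire: takes a marshalled request, returns a marshalled reply.
using Transport = std::function<std::string(const std::string&)>;

// Client side: implements the same interface, but every call is marshalled
// and handed to the transport instead of being computed locally.
class MathProxy : public Math {
public:
    explicit MathProxy(Transport transport) : transport_(std::move(transport)) {
        assert(static_cast<bool>(transport_));
    }
    int add(int a, int b) override {
        std::ostringstream out;
        out << "add " << a << ' ' << b;           // marshal method name and arguments
        return std::stoi(transport_(out.str()));  // send, then unmarshal the reply
    }

private:
    Transport transport_;
};
```

The caller only ever sees `proxy.add(10, 20)`; with the socket/ZMQ style, the marshalling and dispatch code above is what you would write and maintain by hand.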
Conclusion:
ZeroC ICE has some nice enterprisy features which are really good. However, for your project requirements, ZMQ is the right tool.
Hope this helps.
For me, the correct answer was Crossroads I/O. It does everything I need, but I am still unable to pub/sub when using protobufs. I'm sure ZeroC ICE is great for distributed IPC, but 0MQ/Crossroads gives you the added flexibility to use inter-thread communication.
Note: on Windows, 0MQ does not have IPC.
So, all in all, the Crossroads fork of 0MQ is the best, but you will have to roll your own Windows IPC (or use tcp::127..) and your own publisher-side topic filtering for pub/sub.
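For what "rolling your own publisher-side topic filtering" might look like, here is a minimal in-process sketch. It is not the 0MQ or Crossroads API: `Publisher`, `subscribe`, and `publish` are hypothetical names, and subscribers are plain callbacks rather than sockets. Matching is by topic prefix, which is how 0MQ subscriptions match as well.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical publisher that filters on its own side: a message is handed
// only to the subscribers whose prefix matches the start of the topic.
class Publisher {
public:
    using Handler = std::function<void(const std::string& topic, const std::string& payload)>;

    // Registers a handler for every topic that starts with `prefix`.
    // An empty prefix matches all topics.
    void subscribe(std::string prefix, Handler handler) {
        assert(static_cast<bool>(handler));
        subs_.push_back({std::move(prefix), std::move(handler)});
    }

    // Delivers the message to matching subscribers and returns how many received it.
    std::size_t publish(const std::string& topic, const std::string& payload) const {
        std::size_t delivered = 0;
        for (const Subscription& s : subs_) {
            // compare() == 0 only when the first prefix.size() characters of topic equal prefix.
            if (topic.compare(0, s.prefix.size(), s.prefix) == 0) {
                s.handler(topic, payload);
                ++delivered;
            }
        }
        return delivered;
    }

private:
    struct Subscription {
        std::string prefix;
        Handler handler;
    };
    std::vector<Subscription> subs_;
};
```

In a real setup the handler would write to the subscriber's connection, so messages that match nobody never leave the publishing process.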
There is also nanomsg, from the guy who wrote Crossroads and 0MQ (I think).
http://nanomsg.org/