Suggested network topology for this smallish data/command broadcasting application? - c++

We're putting together a system that reads ~32 voltage signals through an analog-to-digital converter card, does some preliminary processing on them, and passes the results (still separated into 32 channels) to the network as UDP packets, where they are picked up by another computer and variously (a) displayed, (b) further processed, (c) searched for criteria to change the state of the acquisition system, or (d) some combination of A-C. Simultaneously a GUI process is running on the computer doing those latter processes (the vis computer), which changes state in both the data-generating computer and the vis computer's multiple processes, through UDP-packeted command messages.
I'm new to network programming and am struggling to pick a network topology. Are there any heuristics (or book chapters, papers) about network topology for relatively small applications that need to pass data, commands, and command acknowledgments flexibly?
System details:
Raw data acquisition happens on a single Linux box. Simply processing the data, saving to disk, and pushing to the network uses about 25% of the CPU capacity, and a tiny amount of memory. Less than 0.5 Mb/sec of data go to the network. All code for data-generation is in c++.
Another Linux machine runs several visualization / processing / GUI processes. The GUI controls both the acquisition machine and the processes on the vis/processing/GUI computer itself. This code is mostly in c++, with a couple of little utilities in Python.
We will be writing other applications that will want to listen in on the raw data, the processed data, and all the commands being passed around; those applications will want to issue commands as well. We can't anticipate how many such modules we will want to write, but we expect 3 or 4 data-heavy processes that transform all 32 input streams into a single output, as well as 3 or 4 one-off small applications like a "command logger". The modularity requirement means that we want the old data-generators and command-issuers to be agnostic about how many listeners are out there. We also want commands to be acknowledged by their recipients.
The two machines are connected by a switch, and packets (both data and commands, and acknowledgments) are sent in UDP.
The five possibilities we're thinking of:
1. Data streams, commands, and acknowledgements are targeted by port number. The data-generator sends independent data streams as UDP packets to different port numbers bound by independent visualizer processes on the visualization computer. Each process also binds a listening port for incoming commands, and another port for incoming acknowledgments to outgoing commands. This option seems good because the kernel does the work of trafficking/filtering the packets; but bad because it's hard to see how processes address each other in the face of unpredicted added modules; it also seems to lead to an explosion of bound ports. (A minimal sketch of this port-per-channel scheme follows the list.)
2. Data streams are targeted to their respective visualizers by port number, and each process binds a port for listening for commands. But all command-issuers send their commands to a packet-forwarder process which knows the command-in ports of all processes, and forwards each command to all of them. Acknowledgements are also sent to this universal command-in port and forwarded to all processes. We pack information about the intended target of each command and each acknowledgment into the command/ack packets, so the processes themselves have to sift through all the commands/acks to find the ones that pertain to them.
3. The packet-forwarder process is also the target of all data packets. All data packets and all command packets are forwarded to perhaps 40 different processes. This obviously puts a whole lot more traffic on the subnet; it also cleans up the explosion of bound ports.
4. Two packet-distributors could run on the vis computer - one broadcasts commands/acks to all ports. The other broadcasts data to only the ports that would possibly want data.
5. Our 32 visualization processes could be bundled into 1 process that draws data for the 32 signals, greatly reducing the extra traffic that option 3 causes.
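For concreteness, here is a minimal sketch of option 1's port-per-channel scheme using plain POSIX UDP sockets (the base port, the channel-to-port mapping, and the payload format are all assumptions made for illustration; error handling is omitted):

#include <arpa/inet.h>
#include <sys/socket.h>
#include <cstdint>
#include <vector>

const uint16_t kBaseDataPort = 5000;   // assumption: channel i -> port 5000 + i

// Acquisition box: push one processed sample block for channel `ch` to the vis host.
void send_channel(int sock, const sockaddr_in& vis_host, int ch,
                  const std::vector<float>& samples) {
    sockaddr_in dst = vis_host;
    dst.sin_port = htons(kBaseDataPort + ch);
    sendto(sock, samples.data(), samples.size() * sizeof(float), 0,
           reinterpret_cast<const sockaddr*>(&dst), sizeof dst);
}

// Visualizer for channel `ch`: bind "its" port and read blocks with recvfrom().
int open_channel_listener(int ch) {
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(kBaseDataPort + ch);
    bind(sock, reinterpret_cast<const sockaddr*>(&addr), sizeof addr);
    return sock;
}

The awkward part - every new listener has to know this hard-coded port arithmetic - is exactly the addressing problem noted in option 1.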
If you've experimented with passing data around among multiple processes on a small number of machines, and have some wisdom or rules of thumb about which strategies are robust, I'd greatly appreciate the advice! (Requests for clarification about the diagrams are welcome.)

I don't have enough rep to move this question to programmers.stackexchange.com so I will answer it here.
First I will throw quite a few technologies at you, each of which you need to take a look at.
Hadoop - a map-reduce framework: the ability to take a large amount of data and process it across distributed nodes.
Kafka - an extremely high-performance messaging system. I would suggest looking at this as your message bus.
ZooKeeper - a distributed coordination service that would allow you to "figure out" all the different aspects of your distributed system.
Pub/Sub Messaging - the publish/subscribe pattern itself.
ØMQ (ZeroMQ) - another socket library that allows pub/sub messaging and other N-to-N message-passing arrangements.
Now that I've thrown a few technologies at you I'll explain what I would do.
Create a system that allows you to create N connectors. These connectors can handle Data/Command N in your diagram, where N is a specific signal. Meaning, if you had 32 signals you can set up your system with 32 connectors to "connect". These connectors can handle two-way communications, which addresses your command/acknowledgment requirement. A single connector will publish its data to something such as Kafka, on a topic specific to that signal.
Use a publish/subscribe system. Essentially what happens is that the connectors publish their results to a specified topic. This topic is something you choose. Then processors - UI, business logic, etc. - listen on a specific topic. These are all arbitrary and you can set them up however you want.
Signal 1 <---> Connector <---> +-------+ ---> "signal 1" ---> Processor
Signal 2 <---> Connector <---> | Kafka | ---> "signal 2" ---> Processor
Signal 3 <---> Connector <---> +-------+ ---> "signal 3" -------^
In this example the first connector "publishes" its results to topic "signal 1", to which the first processor is listening. Any data sent to that topic is sent to the first processor. The second processor is listening for both "signal 2" and "signal 3" data. This represents something like a user interface retrieving different signals at the same time.
One thing to keep in mind is that this can happen across whatever topics you choose. A "processor" can listen to all topics if you deem it important.
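As a rough illustration of the topic idea - using ZeroMQ from the list above instead of Kafka, with made-up endpoints and topic strings - a connector and a processor could look something like this:

#include <zmq.h>
#include <string>

// Connector side: publish one result on topic "signal 1" (topic prefix + payload in one frame).
void connector_publish_example() {
    void* ctx = zmq_ctx_new();
    void* pub = zmq_socket(ctx, ZMQ_PUB);
    zmq_bind(pub, "tcp://*:6000");                      // endpoint chosen arbitrarily
    std::string msg = "signal 1 42.7";
    zmq_send(pub, msg.data(), msg.size(), 0);
    zmq_close(pub);
    zmq_ctx_term(ctx);
}

// Processor side: the second processor in the diagram listens to two topics.
void processor_subscribe_example() {
    void* ctx = zmq_ctx_new();
    void* sub = zmq_socket(ctx, ZMQ_SUB);
    zmq_connect(sub, "tcp://acq-host:6000");            // hypothetical host name
    zmq_setsockopt(sub, ZMQ_SUBSCRIBE, "signal 2", 8);  // prefix-based topic filter
    zmq_setsockopt(sub, ZMQ_SUBSCRIBE, "signal 3", 8);
    char buf[256];
    int n = zmq_recv(sub, buf, sizeof buf, 0);          // blocks until a matching topic arrives
    if (n > 0) { /* parse "signal N <value>" out of buf[0..n) */ }
    zmq_close(sub);
    zmq_ctx_term(ctx);
}

Kafka gives you the same topic semantics plus persistence; from the connector's and processor's point of view the wiring looks the same.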

How do I use ZeroMQ to listen to and parse UDP-data on a specific port?

I am trying to build a c++ application that must use ZeroMQ to listen to encoded packets being forwarded to port 8080 via UDP on my machine at a rate of 10 [Hz].
How do I set up a zmq socket/server/etc. such that I can receive and decode the incoming data?
I am on a Linux machine, running Ubuntu 16.04.
UPDATE + ANSWER:
ZMQ does not listen to generic UDP packets, as @tadman stated. Therefore, considering I was unable to modify the system that was sending the packets, this would not be an appropriate use for ZMQ. I ended up using a generic UDP endpoint as @tadman recommended.
How do I use ZeroMQ to listen to and parse UDP-data on a specific port?
Greetings to Dearborn/UoM, let's first demystify the problem, ok?
ZeroMQ is not a self-isolating tool, it can and does talk or listen to non-ZeroMQ sockets too.
@tadman was right and wrong at the same time.
ZeroMQ doesn't listen to UDP packets. // == True; ( as of known in 2018-Q2, API ~ 4.2.2 )
It listens to ZeroMQ packets. // == False;
Since ZeroMQ native API ~ 4.+, ZeroMQ can both listen and talk to non-ZeroMQ sockets, i.e. your wish may lead to a ZeroMQ Context()-engine working with a plain socket.
If new to ZeroMQ distributed-system design eco-systems, you may first like a brief disambiguation read into the main conceptual differences in the [ ZeroMQ hierarchy in less than a five seconds ] Section, so as to better touch the roots of the problem to solve.
ZeroMQ has a udp:// <transport-class>, but it can be used for { ZMQ_RADIO | ZMQ_DISH } archetypes only
While ZeroMQ has the udp:// transport-class ready to use for both unicast and multicast AccessPoint addresses, it is not yet possible to make the Context() instantiate such a data-pump for non-ZeroMQ, plain-socket peers.
ZeroMQ can talk to non-ZeroMQ peers, yet just over a tcp:// <transport-class>
Non-ZeroMQ peers can get connected using a plain socket, redressed ( due to many architecture / API design reasons ) inside the ZeroMQ implementation into a ZeroMQ-compliant Scalable Formal Communication Archetype named ZMQ_STREAM. This is cool and permits using homogeneous strategies to also handle these types of communicating peers, yet it requires using the tcp:// transport-class.
How to ?
Given your source of the dataflow is under your control, try to make it use the ZeroMQ eco-system, since then it can be comfortably served as any other ZeroMQ udp://-cross-connected AccessPoint.
If design or "political" constraints prevent you from doing so, the receiving side cannot be ZeroMQ directly, so decide about making an application-specific protocol gateway, mediating Non-ZeroMQ-udp traffic to any form of ZeroMQ "consumable", be it a ZMQ_STREAM over plain-tcp: ( if decided to make a functionally minimalistic design of the proxy, or decide to equip such proxy straight with any other, smarter ZeroMQ archetype, to communicate on way higher level of comfort with your main data-collector / processor ).
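Assumptions in the sketch below: POSIX sockets on Ubuntu 16.04, libzmq 4.x, the question's UDP port 8080, and an arbitrarily chosen ZMQ_PUB socket bound to tcp://*:5556; error handling is omitted.

#include <zmq.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main() {
    // Plain UDP receiver for the non-ZeroMQ sender.
    int udp = socket(AF_INET, SOCK_DGRAM, 0);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);
    bind(udp, reinterpret_cast<const sockaddr*>(&addr), sizeof addr);

    // ZeroMQ side: republish every datagram to any number of subscribers.
    void* ctx = zmq_ctx_new();
    void* pub = zmq_socket(ctx, ZMQ_PUB);
    zmq_bind(pub, "tcp://*:5556");

    char buf[2048];
    for (;;) {
        ssize_t n = recvfrom(udp, buf, sizeof buf, 0, nullptr, nullptr);
        if (n > 0)
            zmq_send(pub, buf, static_cast<size_t>(n), 0);   // one datagram -> one ZeroMQ message
    }
}

Any number of ZeroMQ subscribers can then connect to tcp://<gateway-host>:5556 without the original sender ever knowing about them.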
If audio is the intended payload and the accumulating latency is a concern, best also read more details on how easily the main engine can get performance tuned - scaling up the number of I/O threads, wisely mapping ZMQ_AFFINITY and ZMQ_PRIORITY settings - all of that can influence the target latency + throughput performance envelopes.
Last, but not least, the 10 [Hz] requirement
This one is indeed a nice part, that will test one's insights into asynchronous process coordination. The ZeroMQ main engine ( the Context()-instance(s) ) works in an asynchronous and uncoordinated manner.
This means there is no direct way to avoid accumulated latency or to inspect any of the broker-less, per-peer-managed, async-by-design message-queue buffers, so as to "travel"-"back"-in-time upon a Hard-Real-Time 10 [Hz] probing.
If this is going to work in a weak / "soft" ( not a strict R/T ) flow-of-time system coordination ( having no control-system stability constraints / critical-system / life-supporting or similar system responsibility, as hard R/T system designs do have ), thus tolerating a certain amount of code-execution-related jitter and RTT- / [ transport + (re-)processing ]-latencies, then smart-designed .poll()-based non-blocking inspections and possibly some fast queue pre-emptying policies may help you get into acceptably fast, soft-RT behaviour, making the 10 [Hz] monitor robust enough.
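If it helps, a hedged sketch of such a .poll()-based loop - assuming the 10 [Hz] monitor holds a ZMQ_SUB socket connected to the gateway's PUB socket above, and taking a 100 ms budget per cycle as an assumption:

#include <zmq.h>

void poll_loop_10hz(void* sub) {
    zmq_pollitem_t items[] = { { sub, 0, ZMQ_POLLIN, 0 } };
    char buf[2048];
    for (;;) {
        zmq_poll(items, 1, 100);                 // wait at most one 10 [Hz] period, then move on
        if (items[0].revents & ZMQ_POLLIN) {
            // Drain everything that has queued up so latency does not accumulate,
            // keeping only the most recent sample for the monitor.
            while (zmq_recv(sub, buf, sizeof buf, ZMQ_DONTWAIT) >= 0) { /* decode, overwrite */ }
        }
        // ... update the 10 [Hz] monitor with the latest decoded sample ...
    }
}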
So, indeed cool days with ZeroMQ in front of you - Good Luck, Sir. If not already overdue with Project's Plan or deadline coming on Monday, best take a read of a fabulous Pieter HINTJENS' book "Code Connected, Volume 1", where most gems of the Zen-of-Zero are well discussed and inspected for distributed-systems designs.

How to "publish" a large number of actors in CAF?

I've just learned about CAF, the C++ Actor Framework.
The one thing that surprised me is that the way to make an actor available over the network is to "publish" it to a specific TCP port.
This basically means that the number of actors that you can publish is limited by the number of ports you have ( 64k ). Since you need both one port to publish an actor and one port to access a remote actor, I assume that two processes would each be able to share at best about 32k actors, while they could probably each hold a million actors on a commodity server. This would be even worse if the cluster had, say, 10 nodes.
To make the publishing scalable, each process should only need to open 1 port for all the actors in its actor system, and open 1 connection to each actor system that it wants to access.
Is there a way to publish one actor as a proxy for all actors in an actor system ( preferably without any significant performance loss )?
Let me add some background. The middleman::publish/middleman::remote_actor function pair does two things: connecting two CAF instances and giving you a handle for communicating with a remote actor. The actor you "publish" to a given port is meant to act as an entry point. This is a convenient rendezvous point, nothing more.
All you need to communicate between two actors is a handle. Of course you need to somehow learn new handles if you want to talk to more actors. The remote_actor function is simply a convenient way to implement a rendezvous between two actors. However, after you learn the handle you can freely pass it around in your distributed system. Actor handles are network transparent.
Also, CAF will always maintain a single TCP connection between two actor systems. If you publish 10 actors on host A and "connect" to all 10 actors from host B via remote_actor, you'll see that CAF will initially open 10 connections (because the target node could run multiple actor systems) but all but one connection will get closed.
If you don't care about the rendezvous for actors offered by publish/remote_actor then you can also use middleman::open and middleman::connect instead. This will only connect two CAF instances without exchanging actor handles. Instead, connect will return a node_id on success. This is all you need for some features, for example remote spawning of actors.
Is there a way to publish one actor as a proxy for all actors in an actor system ( preferably without any significant performance loss )?
You can publish one actor at a port whose sole purpose is to model a rendezvous point. If that actor sends 1000 more actor handles to a remote actor this will not cause any additional network connections.
Writing a custom actor that explicitly models the rendezvous between multiple systems by offering some sort of dictionary is the recommended way.
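A rough sketch of such a dictionary / rendezvous actor, written against a CAF 0.17-era API ( stateful_actor, put_atom / get_atom ); exact names and signatures differ between CAF versions, so treat it as an illustration rather than the framework's prescribed way:

#include <map>
#include <string>
#include "caf/all.hpp"

using dict = std::map<std::string, caf::actor>;

caf::behavior rendezvous(caf::stateful_actor<dict>* self) {
  return {
    // Local actors register themselves under a name.
    [=](caf::put_atom, const std::string& name, const caf::actor& who) {
      self->state[name] = who;
    },
    // Remote peers ask for a handle by name; the returned handle can be used
    // directly and does not open any additional network connection.
    [=](caf::get_atom, const std::string& name) {
      return self->state[name];
    }
  };
}

// Publishing only this one actor is enough; everything else is reached through
// the handles it hands out (I/O module assumed to be loaded):
//   auto reg = system.spawn(rendezvous);
//   system.middleman().publish(reg, 4242);
// and on the remote side:
//   auto reg = system.middleman().remote_actor("host-a", 4242);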
Just for the sake of completeness: CAF also has a registry mechanism. However, keys are limited to atom values, i.e., 10 characters or less. Since the registry is generic it also only stores strong_actor_ptr and leaves type safety to you. However, if that's all you need: you put handles into the registry (see actor_system::registry) and then access this registry remotely via middleman::remote_lookup (you only need a node_id to do this).
Smooth scaling with ( almost ) no limits is alpha & omega
One way, used in agent-based systems ( not sure if CAF has implemented tools for going this way ), is to use multiple transport-classes { inproc:// | ipc:// | tcp:// | .. | vmci:// } and thus be able to pick from them on an as-needed basis.
While building a proxy may sound attractive, welding together two different actor-models, one "atop" the other, does not mean that it is as simple to achieve as it sounds ( event loops are fragile to get tuned / blocking-prevented / event-handled in a fair manner - they do not like any other master trying to take their own Hat ... ).
In case CAF provides at the moment no other transport-means but TCP:
still one may resort to using O/S-level steps and measures and harness the features of the ISO-OSI model up to the limits, or as necessary:
sudo ip address add 172.16.100.17/24 dev eth0
or better, make the additional IP-addresses permanent - i.e. edit the file /etc/network/interfaces ( on Ubuntu ) and add as many stanzas as needed, so that it looks like:
iface eth0 inet static
    address 172.16.100.17/24

iface eth0 inet static
    address 172.16.24.11/24
This way the configuration-space could get extended for cases where CAF does not provide any other means for such actors but the said TCP (address:port#) transport-class.

NTPD synchronization with 1PPS signal

I have an AHRS (attitude heading reference system) that interfaces with my C++ application. I receive a 50Hz stream of messages via Ethernet from the AHRS, and as part of this message, I get UTC time. My system will also have NTPD running as the time server for our embedded network. The AHRS also has a 1PPS output that indicates the second roll-over time for UTC. I would like to synchronize the NTPD time with the UTC. After some research, I have found that there are techniques that utilize a serial port as input for the 1PPS. From what I can find, these techniques use GPSD to read the 1PPS and communicate with NTPD to synchronize the system time. However, GPSD is expecting a NMEA formatted message from a GPS. I don't have that.
The way I see it now, I have a couple of possible approaches:
Don't use GPSD. Write a program that reads the 1PPS and the Ethernet message containing UTC, and then somehow communicates this information to NTPD.
Use GPSD. Write a program that repackages the Ethernet message into something that can be sent to GPSD, and let it handle the interaction with NTPD.
Something else?
Any suggestions would be very much appreciated.
EDIT:
I apologize for this poorly constructed question.
My solution to this problem is as follows:
1 - interface the 1PPS to an RS232 port, which, as it turns out, is a standard approach that is handled by GPSD.
2 - write a custom C++ application to read the Ethernet messages containing UTC, and from that build an NMEA message containing the UTC (a sketch follows below).
3 - feed the NMEA message to GPSD, which in turn interfaces with NTPD to synchronize the GPS/1PPS information with system time.
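For step 2, here is a hedged sketch of building such an NMEA sentence from the decoded UTC fields - an RMC sentence with the position fields left empty (whether GPSD is happy with a fix-less sentence depends on its configuration, so treat this as illustration only; the function names are made up):

#include <cstdio>
#include <string>

// NMEA checksum: XOR of all characters between '$' and '*'.
static unsigned nmea_checksum(const std::string& body) {
    unsigned cs = 0;
    for (char c : body) cs ^= static_cast<unsigned char>(c);
    return cs;
}

// hh/mm/ss and dd/mo/yy come from the decoded AHRS Ethernet message.
std::string make_gprmc(int hh, int mm, int ss, int dd, int mo, int yy) {
    char body[128];
    std::snprintf(body, sizeof body,
                  "GPRMC,%02d%02d%02d,A,,,,,,,%02d%02d%02d,,",
                  hh, mm, ss, dd, mo, yy);
    char sentence[160];
    std::snprintf(sentence, sizeof sentence, "$%s*%02X\r\n", body, nmea_checksum(body));
    return sentence;   // write this to the port/pty that GPSD is reading
}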
I don't know why you would want to drive a PPS device with a signal that is delivered via Ethernet frames. Moreover, PPS does not work the way you seem to think it does. There is no timecode in a PPS signal, so you can't sync the time to the PPS signal. The PPS signal is simply used to inform the computer of how long a second is.
There are examples that show how a PPS signal can be read in using a serial port, e.g. by attaching it to an interrupt-capable pin - that might be Ring Indicator (RI) or something else with comparable features. The problem I am seeing there is that any sort of code-driven service of an interrupt has its latencies and jitter. This is defined by your system design (and, if you are doing it, by your own system-tailored special interrupt handler routine - on a PC even good old ISA-bus NMI handlers might see such effects).
To my best understanding, people doing time sync on a "computer" use a true hardware timer-counter (with e.g. 64 bits) and a latch that gets triggered to sample and hold the value of the timer on every incoming 1PPS pulse. Folks are doing that already with PTP over Ethernet, with the small variation that a specific edge of the incoming data is used as the trigger; by this, sender and receiver can be synchronized using further program logic that grabs the resulting value from the built-in PTP hardware latch.
see here: https://en.wikipedia.org/wiki/Precision_Time_Protocol
along with e.g. 802.1AS: http://www.ieee802.org/1/pages/802.1as.html
described on Wikipedia in the section "Related initiatives" as:
"IEEE 802.1AS-2011 is part of the IEEE Audio Video Bridging (AVB) group of standards, further extended by the IEEE 802.1 Time-Sensitive Networking (TSN) Task Group. It specifies a profile for use of IEEE 1588-2008 for time synchronization over a virtual bridged local area network (as defined by IEEE 802.1Q). In particular, 802.1AS defines how IEEE 802.3 (Ethernet), IEEE 802.11 (Wi-Fi), and MoCA can all be parts of the same PTP timing domain."
some article (in German): https://www.elektronikpraxis.vogel.de/ethernet-fuer-multimediadienste-im-automobil-a-157124/index4.html
and some presentation: http://www.ieee802.org/1/files/public/docs2008/as-kbstanton-8021AS-overview-for-dot11aa-1108.pdf
My rationale regarding your question is:
Yes, it's possible. But it is a precision-limited design due to various internal things like the latency and jitter of the interrupt handler you are forced to use. The achievable overall precision per pulse and over a long-term run is hard to say, but might be in the range of some 10 ms at startup with a single pulse, down to maybe (guessed) 0.1 ms. Doing it means proving it; long-term observations should help you unveil the true practical caps with your very specific computer and selected software environment.

How does EtherCAT support different network topologies?

How does EtherCAT support different network topologies?
Assume a pure EtherCAT network without any standard ethernet switches, hubs, etc... to complicate things, and with one master and multiple slaves.
Some sources describe it as only supporting ring topologies (e.g. Wikipedia), and this makes sense given the theory of operation, but the EtherCAT website says it supports other topologies as well.
100BaseTX Ethernet cables contain two half-duplex links, one in each direction; is it true that when viewed as a graph of half-duplex links, EtherCAT is always a ring bus, but when viewed as a graph of physical Ethernet cables, the graph can be almost arbitrary?
That's right.
When viewed physically, there can be lots of topologies: daisy chain, star, tree, etc. For example, you can use a Beckhoff EK1122 module to create a three-branch star topology. Logically, there is a single determined path around all the nodes (master and slaves) that EtherCAT frames go through. That forms a ring, because the master is the source that initiates all frames and is also the final destination that all frames will go back to.
An EtherCAT "loop" is a connected set of slave devices, which can each connect to at most four neighboring devices. These four possible connections are called ports and are numbered 0-3. Port 0 is the "upstream" connection, which I usually describe as connecting to the slave's parent device, port 1 is usually whatever the "straight through" path would be.
If you take a bus coupler (EK1100) for example, it has:
port 0: RJ45 socket (for Ethernet 8P8C connector) labelled "X1 In"
port 1: EBUS-Out (for EBUS slice connections)
port 2: RJ45 socket labelled "X2 Out"
For comparison an EBUS junction has:
port 0: EBUS-In (for connection to upstream EBUS slices)
port 1: EBUS-Out
port 2: RJ45 socket labelled "X1"
port 3: RJ45 socket labelled "X2"
And a bus extension (EK1110) has:
port 0: EBUS-In
port 1: RJ45 socket labelled "X1 Out"
These connections form a graph where every slave is a node having exactly one parent and at most three children. Each edge in the graph represents a bidirectional Ethernet connection between two ports. Once you have built up this connected graph of slaves, the auto-increment numbering scheme results from a depth-first traversal of the tree, numbering each new slave with the next free number. Sub-graphs are explored along port 1, port 3, then port 2 (no clue why it's that order).
So, yes, each half duplex link is traversed only once during a packet transmission through the network, meaning that it can be viewed as a ring of half duplex links, with each slave-to-slave connection appearing on the ring in two places (once for each direction of traversal).
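To illustrate the auto-increment numbering described above, here is a small sketch of that depth-first walk (the Slave struct and pointer layout are made up purely for illustration):

struct Slave {
    // Downstream neighbours reachable via ports 1, 2 and 3 (nullptr = port unused).
    Slave* port1 = nullptr;
    Slave* port2 = nullptr;
    Slave* port3 = nullptr;
    int auto_inc_address = -1;
};

void number_slaves(Slave* s, int& next) {
    if (s == nullptr) return;
    s->auto_inc_address = next++;     // number the slave on first visit
    number_slaves(s->port1, next);    // then recurse along port 1 ...
    number_slaves(s->port3, next);    // ... port 3 ...
    number_slaves(s->port2, next);    // ... and port 2 last
}

// Usage, starting from the slave attached to the master's port:
//   int next = 0;
//   number_slaves(first_slave, next);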
(Some additional info)
If you have a look at the way EtherCAT masters address their slaves, you will see that even if you have a daisy chain topology, the telegram transport behaves like a line topology. It is because the master counts all the slaves which are present on the bus and assigns an auto-incremented address to them (in the first phase). That's the order in which the telegram will be processed by the slaves. So the master passes the telegram to slave 1, which puts its data into its section on the fly and passes it to slave 2, and so on. The last slave closes the bus and sends the telegram back. In the user manual they sometimes use the word "shortcutted".
So physically you can have nearly every topology you want, but logically you have a line. If you'd like to have redundancy, you could connect the last slave to a second EtherCAT port at the master. This would give you a real ring topology, and the bus would still work in the case that a slave goes down (excluding the defective slave).
As Eric Z has answered above, it may be a physical line, ring, star or tree. And he says that the packet will go through a logical ring. But he did not say how this is achieved (see my comments on his answer). Therefore I dug a little deeper and found this article:
http://digital.ni.com/public.nsf/allkb/3399C1A0211EDC14862580140065286B
which describes that a "dedicated EtherCAT junction" is needed to build a star (or a tree):
Star:
This is the most familiar topology for many new to EtherCAT®, as it resembles a regular Ethernet network using hubs. However, to implement this, you will need a dedicated EtherCAT® junction. Because of this, it is potentially costlier than ring or line. Also, this topology will be marginally slower than others, as there are more interstitial nodes that must repeat the message between the end nodes (e.g. for an EtherCAT® packet to go from master to slave, it must go through the junction/hub first, which will introduce a small delay). In fact, EtherCAT® star topology is not like traditional star topologies - it is actually a line topology in which data goes through junction port 1, reaches its end slave and comes back to the junction, and then goes through junction port 2 the same way. This topology is best for systems in locations with physical constraints that make it difficult to implement line or ring.
Searching for "EtherCAT junction" I found
https://www.beckhoff.com/english.asp?ethercat/ek1122.htm
which actually is the product that Eric Z mentioned, a 2-port EtherCAT junction. There are 8-port devices as well, https://www.beckhoff.com/english.asp?pc_cards_switches/cu1128.htm

Options for inter-service one-way communication

I'm searching for different options for implementing communication between a service and other services/applications.
What I would like to do:
I have a service that is constantly running, polling a device connected to a serial port. At certain points, this service should send a message to interested clients containing data retrieved from the device. Data is uncomplicated, most likely just a single string.
Ideally, the clients would not have to subscribe to receive these messages, which leads me to some sort of event 'broadcast' setup (similar to Windows events). The message sending process should not block, and does not need a response from any clients (or that there even are any clients for that matter).
I've been reading about IPC (COM in particular) and Windows events, but have yet to come across something that really fits with what I want to do.
So is this possible? If so, what technologies should I be using? If not, what are some viable communication alternatives?
Here's the particulars of the setup:
Windows 2000/XP environments
'Server' service is a windows service, using VC++2005
Clients would vary, but always be in the windows environment (usual clients would be VC++6 windows services, VB6 applications)
Any help would be appreciated!
Windows supports broadcasting messages, check here. You can SendMessage to HWND_BROADCAST from the service, and receive it in each client.
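A small, hedged sketch of that approach (the registered message name is made up; SendNotifyMessage is shown instead of SendMessage so the service never blocks on an unresponsive client window):

#include <windows.h>

// Server (service) side: tell every top-level window that new data is ready.
void BroadcastDataReady()
{
    static const UINT msg = RegisterWindowMessage(TEXT("MyService.DataReady"));
    SendNotifyMessage(HWND_BROADCAST, msg, 0, 0);
}

// Client side, inside the window procedure of any interested application:
LRESULT CALLBACK WndProc(HWND hwnd, UINT uMsg, WPARAM wParam, LPARAM lParam)
{
    static const UINT dataReady = RegisterWindowMessage(TEXT("MyService.DataReady"));
    if (uMsg == dataReady) {
        // New data is available; fetch the payload through some other channel,
        // e.g. the shared memory scheme described in the next answer.
        return 0;
    }
    return DefWindowProc(hwnd, uMsg, wParam, lParam);
}

Note that this only reaches clients that own a window and pump messages; windowless clients (e.g. other services) would need one of the alternatives below.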
There are a number of ways to do a broadcast system, but you'll have to either give up reliability (i.e., some messages may be lost) or use a proper subscription system.
If you're willing to give up reliability, you can create a shared memory segment and a named manual-reset event object. When a new message arrives, write it to the shared memory segment, signal the event object, then close the event object and create a new one with a different name (the name should be stored in the shmem segment somewhere). Clients open the shmem segment, find the current event object, wait for it to be signaled, then read off the message and the name of the new event object.
In this option, you must be careful to properly handle the case of a client reading at the same time as the shmem segment is updated. One way to do this is to have two sequence number fields in the shmem segment - one is updated before the new message is written, one after. Clients read the second sequence number prior to reading the message, then re-read both sequence numbers after, and check that they are all equal (and discard the message and retry after a delay if they are not). Be sure to place memory barriers around accesses to these sequence numbers to ensure the compiler does not reorder them!
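A hedged sketch of that layout and check, assuming the block below lives in a file mapping (CreateFileMapping / MapViewOfFile) opened by both sides; field names and sizes are made up and error handling is omitted:

#include <windows.h>
#include <cstring>

struct SharedBlock {
    volatile LONG seqBefore;    // bumped before the message is written
    volatile LONG seqAfter;     // bumped after the message is written
    char eventName[64];         // name of the current manual-reset event
    char message[512];
};

// Writer (the service):
void publish(SharedBlock* blk, const char* msg, const char* nextEventName) {
    InterlockedIncrement(&blk->seqBefore);   // readers now know an update is in flight
    MemoryBarrier();
    strncpy(blk->message, msg, sizeof blk->message - 1);
    strncpy(blk->eventName, nextEventName, sizeof blk->eventName - 1);
    MemoryBarrier();
    InterlockedIncrement(&blk->seqAfter);    // both counters equal again -> block is consistent
}

// Reader (a client); returns false if it raced with the writer and should retry after a delay.
bool read_message(const SharedBlock* blk, char* out, size_t outSize) {
    LONG seen = blk->seqAfter;               // the "second" sequence number
    MemoryBarrier();
    strncpy(out, blk->message, outSize - 1);
    out[outSize - 1] = '\0';
    MemoryBarrier();
    return seen == blk->seqBefore && seen == blk->seqAfter;
}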
Of course, this is all a bit hairy. Named pipes are a lot simpler, but a subscription (of a sort) is required. The server calls CreateNamedPipe, then accepts connections with ConnectNamedPipe. Clients use CreateFile to connect to the server's pipe. The server then just loops to send data (using WriteFile) to all of its clients. Note that you will need to create an additional instance of the pipe using CreateNamedPipe each time you accept a connection. An example of a named pipe server can be found here: http://msdn.microsoft.com/en-us/library/aa365588(v=vs.85).aspx