I'm looking for an IPC mechanism that would allow high-throughput data updates from one process to many (thousands).
The 'server' process would be responsible for updating a data structure at a high frequency. Upon update, I'd like to notify the 'client' processes of the update, and allow those processes to read the new data.
Under a Linux or FreeBSD environment, what would be a good way to go about this?
I would recommend using ZeroMQ. It's a fast, lightweight, cross-platform, cross-language messaging system that already does all you're asking for. It's easy to use, and very robust. It can operate in many, many modes, one of which is one-to-many messaging (publish/subscribe, or broadcast in CS-speak).
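As a minimal sketch of the one-to-many pattern with ZeroMQ's PUB/SUB sockets (the endpoint, port, and payload below are placeholders, not anything from your setup):

```cpp
// Publisher ("server") side -- minimal PUB/SUB sketch using the libzmq C API.
#include <zmq.h>
#include <cstring>

int main() {
    void *ctx = zmq_ctx_new();
    void *pub = zmq_socket(ctx, ZMQ_PUB);
    zmq_bind(pub, "tcp://*:5556");                      // clients connect here

    const char *update = "state-update payload";
    zmq_send(pub, update, std::strlen(update), 0);      // fanned out to every subscriber

    // Each client does roughly:
    //   void *sub = zmq_socket(ctx, ZMQ_SUB);
    //   zmq_connect(sub, "tcp://server-host:5556");
    //   zmq_setsockopt(sub, ZMQ_SUBSCRIBE, "", 0);     // subscribe to everything
    //   zmq_recv(sub, buf, sizeof(buf), 0);

    zmq_close(pub);
    zmq_ctx_term(ctx);
    return 0;
}
```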
While your setup limitations and requirements aren't clear (are all processes on the same machine?), it looks like the most versatile solution would be to use MPI, which is platform-independent and distributed. In particular, it provides broadcasting functionality.
The downside is that you would have to model your design a bit after the MPI API.
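For example, a rough sketch of MPI's broadcast primitive (the data layout and sizes are made up for illustration):

```cpp
// Rank 0 plays the "server" role and broadcasts the updated structure to every rank.
#include <mpi.h>
#include <vector>
#include <cstdio>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    std::vector<double> data(1024);                      // the shared data structure
    if (rank == 0) {
        for (std::size_t i = 0; i < data.size(); ++i)    // the "server" rank updates it...
            data[i] = 0.5 * i;
    }
    // ...and MPI_Bcast delivers the same contents to every rank.
    MPI_Bcast(data.data(), static_cast<int>(data.size()), MPI_DOUBLE, 0, MPI_COMM_WORLD);

    std::printf("rank %d sees data[1] = %f\n", rank, data[1]);
    MPI_Finalize();
    return 0;
}
```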
I am designing an application that requires a distributed set of processing workers that need to asynchronously consume and produce data in a specific flow. For example:
Component A fetches pages.
Component B analyzes pages from A.
Component C stores analyzed bits and pieces from B.
There are obviously more than just three components involved.
Further requirements:
Each component needs to be a separate process (or set of processes).
Producers don't know anything about their consumers. In other words, component A just produces data, not knowing which components consume that data.
This is a kind of data flow solved by topology-oriented systems like Storm. While Storm looks good, I'm skeptical; it's a Java system and it's based on Thrift, neither of which I am a fan of.
I am currently leaning towards a pub/sub-style approach which uses AMQP as the data transport, with HTTP as the protocol for data sharing/storage. This means the AMQP queue model becomes a public API (in other words, a consumer needs to know which AMQP host and queue the producer uses), which I'm not particularly happy about, but it might be worth the compromise.
Another issue with the AMQP approach is that each component will have to have very similar logic for:
Connecting to the queue
Handling connection errors
Serializing/deserializing data into a common format
Running the actual workers (goroutines or forking subprocesses)
Dynamic scaling of workers
Fault tolerance
Node registration
Processing metrics
Queue throttling
Queue prioritization (some workers are less important than others)
…and many other little details that each component will need.
Even if a consumer is logically very simple (think MapReduce jobs, something like splitting text into tokens), there is a lot of boilerplate. Certainly I can do all this myself — I am very familiar with AMQP and queues and everything else — and wrap all this up in a common package shared by all the components, but then I am already on my way to inventing a framework.
Does a good framework exist for this kind of stuff?
Note that I am asking specifically about Go. I want to avoid Hadoop and the whole Java stack.
Edit: Added some points for clarity.
Because Go has CSP channels, I suggest that Go provides a special opportunity to implement a framework for parallelism that is simple, concise, and yet completely general. It should be possible to do rather better than most existing frameworks with rather less code. Java and the JVM can have nothing like this.
It requires just the implementation of channels using configurable TCP transports. This would consist of
a writing channel-end API, including some general specification of the intended server for the reading end
a reading channel-end API, including listening port configuration and support for select
marshalling/unmarshalling glue to transfer data - probably encoding/gob
A success acceptance test of such a framework should be that a program using channels should be divisible across multiple processors and yet retain the same functional behaviour (even if the performance is different).
There are quite a few existing transport-layer networking projects in Go. Notable is ZeroMQ (0MQ) (gozmq, zmq2, zmq3).
I guess you are looking for a message queue, like beanstalkd, RabbitMQ, or ØMQ (pronounced zero-MQ). The essence of all of these tools is that they provide push/receive methods for FIFO (or non-FIFO) queues and some even have pub/sub.
So, one component puts data in a queue and another one reads. This approach is very flexible in adding or removing components and in scaling each of them up or down.
Most of these tools already have libraries for Go (ØMQ is very popular among Gophers) and other languages, so your overhead code is very little. Just import a library and start receiving and pushing messages.
And to decrease this overhead and avoid dependency on a particular API, you can write a thin package of your own which uses one of these message queue systems to provide very simple push/receive calls, and use this package in all of your tools.
I understand that you want to avoid Hadoop+Java, but instead of spending time developing your own framework, you may want to have a look at Cascading. It provides a layer of abstraction over underlying MapReduce jobs.
It's best summarized on Wikipedia: "It [Cascading] follows a 'source-pipe-sink' paradigm, where data is captured from sources, follows reusable 'pipes' that perform data analysis processes, where the results are stored in output files or 'sinks'. Pipes are created independent from the data they will process. Once tied to data sources and sinks, it is called a 'flow'. These flows can be grouped into a 'cascade', and the process scheduler will ensure a given flow does not execute until all its dependencies are satisfied. Pipes and flows can be reused and reordered to support different business needs."
You may also want to have a look at some of their examples, Log Parser, Log Analysis, TF-IDF (especially this flow diagram).
I'm working on an application (C++ combined with Qt for the graphical part) to be run on an embedded Linux platform. I need to know how to divide the application into different "cores", each one taking care of a different part of the application, in such a way as to improve the stability, efficiency and security of the application itself.
My doubt is: is it more convenient to divide functionalities into threads or to fork different processes?
Let me provide a functional view of the application: there are different user interfaces, each one allowing users to do more or less the same things (don't worry about data consistency, I've already solved this problem). Each of these interfaces must act as a stand-alone unit (like different terminals of the same system). I want all of them to send and receive messages from the same "core", which will take care of updating application data or doing other appropriate work.
What's the best way to implement the division between the inner "core" and a user interface?
For sure I'm missing some knowledge, but so far I have come up with two alternatives:
1 - fork a child from the parent "core" and let the child execute a specific UI program. I have no practical experience of doing this, so how, in this case, can I make parent and child communicate, bearing in mind that the child is a new process? (A minimal sketch follows below.)
2 - create different threads within one process for the core and each UI.
I need this division because the application is required to be as stable as possible and capable of restarting a UI in the case of a crash. Keep in mind also that the overall application won't have infinite memory and resources available.
Thanks in advance for your help, regards.
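For option 1, here is a minimal sketch of the fork-plus-pipe approach (the message text is made up; in a real setup the child would exec the UI program and keep the inherited pipe descriptor, or you could use socketpair() for two-way traffic):

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>

int main() {
    int fds[2];
    if (pipe(fds) == -1) return 1;           // fds[0] = read end, fds[1] = write end

    pid_t pid = fork();
    if (pid == 0) {                          // child: would exec the UI program here
        close(fds[1]);
        char buf[128] = {0};
        ssize_t n = read(fds[0], buf, sizeof(buf) - 1);
        std::printf("UI child received: %.*s\n", static_cast<int>(n), buf);
        _exit(0);
    }

    // parent ("core"): push an update down the pipe
    close(fds[0]);
    const char *msg = "data-updated";
    write(fds[1], msg, std::strlen(msg));
    close(fds[1]);
    waitpid(pid, nullptr, 0);
    return 0;
}
```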
There are several reasons why going down the separate-process route is a good choice in an embedded system:
Decoupling of components: running components as separate processes is the ultimate decoupling, which is often useful when projects become very large.
Security and privilege management: in an embedded system it is quite likely that some components need elevated privileges in order to control devices, whereas others are potential security hazards (for instance, network-facing components) that you want to run with as little privilege as possible. Other likely scenarios are components that need real-time threading or need to mmap() a lot of system memory; over-allocation of either will lock your system up in a way it won't recover from.
Reliability: you can potentially respawn parts of the system if they fail, leaving the remainder running.
Building such an arrangement is actually easier than others here are suggesting - Qt has really good support for D-Bus, which nicely takes care of your IPC and is used extensively in the Linux desktop for system-management functionality.
As for the scenario you describe, you might want to daemonise the 'core' of the application and then allow client connections over D-Bus from the UI components.
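As a very rough sketch of that arrangement (the service name, object path, and slot below are hypothetical, and the UI side would use QDBusInterface to call into the core):

```cpp
// Core daemon side: expose a QObject on the bus so UI processes can call its slots.
// "org.example.Core" and status() are made-up names for illustration.
#include <QCoreApplication>
#include <QDBusConnection>
#include <QObject>

class Core : public QObject {
    Q_OBJECT
public slots:
    QString status() { return QStringLiteral("ok"); }    // callable from any UI process
};

int main(int argc, char **argv) {
    QCoreApplication app(argc, argv);
    Core core;
    QDBusConnection bus = QDBusConnection::sessionBus();  // or systemBus() on an appliance
    bus.registerService("org.example.Core");
    bus.registerObject("/core", &core, QDBusConnection::ExportAllSlots);
    return app.exec();
}

// A UI process would then do something like:
//   QDBusInterface core("org.example.Core", "/core", QString(), QDBusConnection::sessionBus());
//   QDBusMessage reply = core.call("status");
```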
Running the UI in a different thread won't give you much in the way of additional stability -- the other thread can still trash the engine's heap, and even if you terminate the thread, any resources it holds won't be recycled.
You can improve stability a bit by having a really strong wall of abstraction between the Engine and the UI. So this isn't completely futile.
Multiple processes require lots of hoops to jump through -- you need a method of IPC (interprocess communication).
Note that IPC and to a lesser extent walls of abstraction can add to the overhead of your program.
An important question to ask is "how much data has to pass between the UI and the Engine?" -- if it is little enough data (like "start the task" from UI to engine, and "this task is 50% done" from engine to UI), IPC is less of a hassle. If you are writing an interactive painting application with real-time full-screen updates of an image, IPC is more annoying and less practical.
Now, a quick Google on Qt and IPC tells me that there is a Qt extension for embedded Linux that allows Qt signals and slots to pass messages between processes: the Qt Communications Protocol (QCOP). One issue I have had with frameworks like this is that they can easily lead to entanglements between client and server state that compromise stability on the other end of the communications pipe, compared to relatively simple protocols.
I have read that working with more than 64 sockets in a thread is dangerous(?). But, at least for me, non-blocking sockets are used to avoid complicated threading. Since there is only one listener socket, how am I supposed to split sockets across threads and use them with select()? Should I create an fd_set for each thread, or what? And how am I supposed to assign a client to a thread, since I can only pass values at the beginning with CreateThread()?
No no no, you got a few things wrong there.
First, the ideal way to handle many sockets is to have a thread pool which does the work on behalf of the sockets (clients).
Another thread, or two (usually as many as there are CPUs, as far as I know), handle accepting connections.
Now, when an event occurs, such as a new connection, it is dispatched to the thread pool to be processed.
Second, it depends on the actual implementation and environment.
For example, in Windows there's something called IOCP.
If you ask me - do not bother with the lower implementation but instead use a framework such as BOOST::ASIO or ACE.
I personally like ASIO. The best thing about those frameworks is that they are usually cross-platform (*nix, Windows, etc.).
So, my answer is a bit broad, but I think it's for the best that you take these facts into consideration before diving into code/manuals/implementation.
Good luck!
Well, what you have read is wrong. Many powerful single-threaded applications have been written with non-blocking sockets and high-performance I/O demultiplexers like epoll(4) and kqueue(2). Their advantage is that you set up your wait events upfront, so the kernel does not have to copy a ton of file descriptors and re-set up lots of state on each poll.
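As a sketch of that single-threaded style (Linux epoll, error handling omitted):

```cpp
#include <sys/epoll.h>
#include <unistd.h>

// One thread, many sockets: register the descriptors once,
// then let the kernel tell us which ones are readable.
void event_loop(int listen_fd) {
    int epfd = epoll_create1(0);

    epoll_event ev{};
    ev.events = EPOLLIN;
    ev.data.fd = listen_fd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);

    epoll_event events[64];
    for (;;) {
        int n = epoll_wait(epfd, events, 64, -1);         // blocks until activity
        for (int i = 0; i < n; ++i) {
            int fd = events[i].data.fd;
            if (fd == listen_fd) {
                // accept() the new client and epoll_ctl(ADD) it here
            } else {
                // read() / write() the ready client socket here
            }
        }
    }
    close(epfd);
}
```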
Then there are advantages to threading if your primary goal is throughput, and not latency.
Check out this great overview of available techniques: The C10K problem.
The "ideal way to handle many sockets" is not always - as Poni seems to believe - to "have a thread pool."
What does "ideal" pertain to? Is it ease of programming? Best performance?
Since he recommends not bothering "with the lower implementation" and "use a framework such as BOOST::ASIO or ACE" I guess he means ease of programming.
Had he had a performance angle on Windows, he would have recommended "something called IOCPs." IOCPs are I/O Completion Ports, which allow the implementation of super-fast I/O applications using just a handful of threads (one per available core is recommended). IOCP applications run circles around any thread-pool equivalent, which he would have known had he ever written code using them. IOCPs are not used alongside thread pools but instead of them.
There is no IOCP equivalent in Linux.
Using a framework on Windows may result in a faster "time to market" product but the performance will be far from what it might have been had a pure IOCP implementation been chosen.
The performance difference is such that OS-specific code implementations should be considered. If a generic solution is chosen anyway, at least performance would "not have been given away accidentally."
I am currently involved in the development of a software using distributed computing to detect different events.
The current approach is: a dozen threads are running simultaneously on different (physical) computers. Each event is assigned a number, and every thread broadcasts its detected events to the others and filters the relevant events from the incoming stream.
I feel very bad about that, because it looks awful, is hard to maintain and could lead to performance issues when the system will be upgraded.
So I am looking for a flexible and elegant way to handle this IPC, and I think Boost::Signals seems like a good candidate; but I have never used it, and I would like to know whether it is possible to provide encapsulation for network communication.
Since I don't know of any solution that will do that other than Open MPI, if I had to do it, I would first use Google's Protocol Buffers as my message container. With it, I could just create an abstract base message with fields like source, dest, type, id, etc. Then I would use Boost ASIO to distribute those across the network, or over a named pipe/loopback for local messages. Perhaps, on each physical computer, a dedicated process could run just for distribution. Each thread registers with it which types of messages it is interested in, and what its named pipe is called. This process would know the IPs of all the other services.
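As a rough sketch of that idea, assuming a protobuf-generated events::Envelope message (the type, header file, and field names are hypothetical), one way to push it over a Boost.Asio TCP socket is to length-prefix the serialized bytes:

```cpp
#include <boost/asio.hpp>
#include <arpa/inet.h>      // htonl
#include <cstdint>
#include <string>
#include "envelope.pb.h"    // hypothetical: generated from a .proto with source, dest, type, id, ...

using boost::asio::ip::tcp;

// Length-prefix the serialized message so the receiver knows where one envelope ends.
void send_envelope(tcp::socket &sock, const events::Envelope &msg) {
    std::string body;
    msg.SerializeToString(&body);
    std::uint32_t len = htonl(static_cast<std::uint32_t>(body.size()));
    boost::asio::write(sock, boost::asio::buffer(&len, sizeof(len)));  // 4-byte length prefix
    boost::asio::write(sock, boost::asio::buffer(body));               // the Envelope bytes
}
```

The receiver reads the 4-byte prefix first, then exactly that many bytes, and calls ParseFromString on the result.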
If you need IPC over the network then boost::signals won't help you, at least not entirely by itself.
You could try using Open MPI.
I'm working on an instant-messenger client in C++ (Win32) and I'm experimenting with different asynchronous socket models. So far I've been using WSAAsyncSelect for receiving notifications via my main window. However, I've been experiencing some unexpected results, with Winsock spawning an additional 5-6 threads (on top of the initial thread created when calling WSAAsyncSelect) for one single socket.
I have plans to revamp the client to support additional protocols via DLLs, and I'm afraid that my current solution won't be suitable, based on my experiences with WSAAsyncSelect and on my dislike of mixing network code with UI code (in the message loop).
I'm looking for advice on what a suitable asynchronous socket model could be for a multi-protocol IM client which needs to be able to handle roughly 10-20+ connections (depending on amount of protocols and protocol design etc.), while not using an excessive amount of threads -- I am very interested in performance and keeping the resource usage down.
I've been looking at I/O Completion Ports, but from what I've gathered, they seem like overkill. I'd very much appreciate some input on what a suitable socket solution could be!
Thanks in advance! :-)
There are four basic ways to handle multiple concurrent sockets.
Multiplexing, that is using select() to poll the sockets.
AsyncSelect which is basically what you're doing with WSAAsyncSelect.
Worker Threads, creating a single thread for each connection.
I/O Completion Ports, or IOCP. dp mentions them above; basically they are an OS-specific way to handle asynchronous I/O, which has very good performance but is a little more confusing to use.
Which you choose often depends on where you plan to go. If you plan to port the application to other platforms, you may want to choose #1 or #3, since select is not terribly different from models used on other OSes, and most other OSes also have the concept of threads (though they may operate differently). IOCP is typically Windows-specific (although Linux now has some async I/O functions as well).
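For reference, option #1 boils down to a loop like this (POSIX-flavoured sketch; on Windows the first argument to select() is ignored and the headers differ):

```cpp
#include <sys/select.h>
#include <algorithm>
#include <vector>

// Option #1: one thread polls every socket with select().
void poll_once(int listen_fd, std::vector<int> &clients) {
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(listen_fd, &readfds);
    int maxfd = listen_fd;
    for (int fd : clients) {
        FD_SET(fd, &readfds);
        maxfd = std::max(maxfd, fd);
    }

    if (select(maxfd + 1, &readfds, nullptr, nullptr, nullptr) > 0) {
        if (FD_ISSET(listen_fd, &readfds)) { /* accept() a new client here */ }
        for (int fd : clients)
            if (FD_ISSET(fd, &readfds)) { /* recv() from this client here */ }
    }
}
```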
If your app is Windows only, then you basically want to choose the best model for what you're doing. This would likely be either #3 or #4. #4 is the most efficient, as it calls back into your application (similar to WSAAsyncSelect, but with better performance and fewer issues).
The big thing you have to deal with when using threads (either IOCP or WorkerThreads) is marshaling the data back to a thread that can update the UI, since you can't call UI functions on worker threads. Ultimately, this will involve some messaging back and forth in most cases.
If you were developing this in managed code, I'd tell you to look at Jeffrey Richter's AsyncEnumerator, but you've chosen C++, which has its pros and cons. Lots of people have written various network libraries for C++; maybe you should spend some time researching some of them.
Consider using the ASIO library you can find in Boost (www.boost.org).
Just use synchronous models. Modern operating systems handle multiple threads quite well. Async IO is really needed in rare situations, mostly on servers.
In some ways I/O Completion Ports (IOCP) are overkill, but to be honest I find the model easier to use for asynchronous sockets than the alternatives (select, non-blocking sockets, overlapped I/O, etc.).
The IOCP API could be clearer, but once you get past that it's actually easier to use, I think. Back then, the biggest obstacle was platform support (it needed an NT-based OS; Windows 9x did not support IOCP). With that restriction long gone, I'd consider it.
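For what it's worth, the basic shape of an IOCP receive path looks roughly like this (a sketch only: WSAStartup, accept handling, resubmitting reads, and error checks are all omitted):

```cpp
#include <winsock2.h>
#include <windows.h>

// One OVERLAPPED (plus buffer) per outstanding operation.
struct PerIo {
    WSAOVERLAPPED ov;
    WSABUF        wsabuf;
    char          data[4096];
};

DWORD WINAPI worker(LPVOID portParam) {
    HANDLE port = static_cast<HANDLE>(portParam);
    DWORD bytes = 0;
    ULONG_PTR key = 0;                 // whatever you associated with the socket
    LPOVERLAPPED ov = nullptr;
    while (GetQueuedCompletionStatus(port, &bytes, &key, &ov, INFINITE)) {
        PerIo *io = reinterpret_cast<PerIo *>(ov);
        // 'bytes' of io->data are now valid: parse them, then post the next WSARecv.
    }
    return 0;
}

void attach_socket(HANDLE port, SOCKET s, PerIo *io) {
    // Associate the socket with the port, then keep a receive outstanding;
    // its completion is delivered to whichever worker thread is free.
    CreateIoCompletionPort(reinterpret_cast<HANDLE>(s), port,
                           reinterpret_cast<ULONG_PTR>(io), 0);
    ZeroMemory(&io->ov, sizeof(io->ov));
    io->wsabuf.buf = io->data;
    io->wsabuf.len = static_cast<ULONG>(sizeof(io->data));
    DWORD flags = 0;
    WSARecv(s, &io->wsabuf, 1, nullptr, &flags, &io->ov, nullptr);
}

// Setup (in main): HANDLE port = CreateIoCompletionPort(INVALID_HANDLE_VALUE, nullptr, 0, 0);
// then CreateThread(nullptr, 0, worker, port, 0, nullptr) for a small number of worker threads.
```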
If you do decide to use IOCP (which, IMHO, is the best option if you're writing for Windows) then I've got some free code available which takes away a lot of the work that you need to do.
Latest version of the code and links to the original articles are available from here.
And my views on how my framework compares to Boost::ASIO can be found here: http://www.lenholgate.com/blog/2008/09/how-does-the-socket-server-framework-compare-to-boostasio.html.