Dataflow Programming - Patterns and Frameworks - c++

I just came across the proposed Boost::Dataflow library.
It seems like an interesting approach and I was wondering if there are other such alternative frameworks for C++, and if there are any related design patterns.
I have not ruled out Boost::Dataflow, I am just looking into any available alternatives so I can understand the domain and my options better (or roll my own if necessary).

Wikipedia
There are a couple of good articles in the Wikipedia about the theory of the dataflow programming:
Dataflow
Dataflow programming
Flow based programming
Actor model
Visual programming
These articles are written by various authors, so there is some overlap and some important material is missing, but they are a very good starting point.
TinyOS
This is an open source operating system based on the dataflow principle. I have a bad feeling about it: they don't even mention the term "dataflow". It is a dataflow system all the same, though, and it may be worth studying.

Look at Intel Threading Building Blocks, particularly its tbb::flow namespace.

You can also look at the two main open source robotics frameworks, ROS and Orocos. There is also Rock, but it is based on Orocos, so it is equivalent if you're just looking for a C++ component framework.

There are some dataflow C++ libraries I have found:
cellspp - lets you use a spreadsheet evaluation model.
DSPatch and Route11 - C++ dataflow frameworks that let you write programs in a dataflow manner. They look interesting.
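
To give a feel for what "writing programs in a dataflow manner" means, here is a minimal sketch in plain standard C++. It is not the API of cellspp, DSPatch, Route11, or Boost::Dataflow; the names Node, connect, push and run_pipeline are made up for illustration. Each node transforms a value and pushes the result to whatever is connected downstream.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical minimal dataflow node (illustrative only, not a real library API):
// it receives a value, transforms it, and pushes the result to every
// connected downstream consumer.
template <typename In, typename Out>
class Node {
public:
    explicit Node(std::function<Out(In)> fn) : fn_(std::move(fn)) {
        assert(fn_ && "a node needs a transform function");
    }

    // Connect this node's output to a downstream consumer.
    void connect(std::function<void(Out)> consumer) {
        consumers_.push_back(std::move(consumer));
    }

    // Feed one value into the node; the result flows downstream.
    void push(In value) {
        Out result = fn_(std::move(value));
        for (auto& consumer : consumers_) consumer(result);
    }

private:
    std::function<Out(In)> fn_;
    std::vector<std::function<void(Out)>> consumers_;
};

// Builds the graph  doubler -> add_one -> sink  and runs the inputs through it.
std::vector<int> run_pipeline(const std::vector<int>& inputs) {
    Node<int, int> doubler([](int x) { return x * 2; });
    Node<int, int> add_one([](int x) { return x + 1; });
    std::vector<int> sink;

    doubler.connect([&](int x) { add_one.push(x); });
    add_one.connect([&](int x) { sink.push_back(x); });

    for (int v : inputs) doubler.push(v);
    return sink;
}
```

Calling run_pipeline({1, 2, 3}) collects 3, 5, 7 in the sink. A real framework would add things this sketch leaves out, such as multiple input pins per node, buffering, and scheduling across threads.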

If you want this design for image processing or visualization, you can find a good resource in ITK. And if you want a GUI for this (data/work)flow, you can use DeVIDE.
My 2cents,
Johan

Just for the record, you can also consider gstreamermm, which is a C++ wrapper around GStreamer.

Dataflow programming is one of those things that's been lurking around for decades and never quite taken off... for software anyway; in the VHDL/Verilog world you find yourself naturally adopting the dataflow mindset much more readily. But in the software world... somehow it just never seems to scale beyond toy systems, perhaps because people insist on tying it together with visual programming (and I see Boost dataflow also treads this path). Some people look to dataflow programming to solve the software crisis by making it more like HW design, with pluggable components and interconnectable pins... but hang on, HW design is really hard too! (Interestingly, while visual programming systems do exist in the HW world, no one actually uses them to build anything big.)
The most interesting, active modern example I'm aware of using dataflow principles is the PureData audio-visual programming environment.

Visual Studio Concurrency Runtime contains an asynchronous dataflow framework in C++.
An example of image processing dataflow: http://msdn.microsoft.com/en-us/library/ff398050.aspx

You might check my implementation of dataflow here: http://ambient.comp-phys.org
It supports MPI and threading and is based upon custom dataflow types (i.e. ambient::vector) that work through run-time object versioning system.

If your area is sound generation/processing, use http://www.synthedit.com/
It looks promising; I found good answers to a deep problem (polyphony) in the SDK docs. Funnily enough, they don't mention the word dataflow.

Maybe Pure Data (pd) has a C++ API...
http://en.wikipedia.org/wiki/Pure_Data

Related

will Spark support Clojure?

I am about to start learning functional programming, and Clojure appeals to me the most: I love its community, syntax and concept of immutable data structures. I am also interested in bio-inspired ML for rich data (Numenta).
However, my huge concern is that Spark does not support it as yet, and Spark rocks!!!
There is Flambo, but is it a satisfactory solution?
My course and job are in Scala. Should I admit defeat and enter the Scala world, or do you think that I should focus solely on Clojure?
Being the author of Sparkling (thanks to Josh Rosen for pointing that out), I can tell you that we use it at our company for ETL processing.
Here's what's good:
it provides a Clojuristic way of interacting with Spark, as you can see in the presentation "Big Data with style - the Clojure/Spark way"
it's optimized for performance
it's used in production and there are others also using it
Here's what's missing:
There's currently no support for Spark Streaming, Spark SQL, Spark DataFrames or Spark ML. That might come in the future, and I'm happy to accept pull requests, but it's currently not the main focus (at the time of writing, April 2015).
I hope this helps you make up your mind on going with Clojure or starting to learn Scala.
It's hard to say that something like Spark doesn't support Clojure. It would make more sense to ask whether there are good libraries for that project that are easy to use from Clojure. From googling around, Flambo looks like a viable option, and at the various Clojure conferences I hear incidental talk of using Spark in several contexts.
I would say that there is fairly low technical risk in using Spark from Clojure, so you are free to make this choice based on the other constraints of your working environment and projects. Being particularly biased toward Clojure, I wholeheartedly recommend at least trying it and seeing what parts of the language and ecosystem work well for you.

discrete event simulators for C++

I am currently looking for a discrete event simulator written for C++. I did not find much on the web written specifically in OO-style; there are some, but outdated. Some others, such as Opnet, Omnet and ns3 are way too complicated for what I need to do. And besides, I need to simulate agent-based algorithms capable of simulating systems of thousands of nodes.
Does anybody know anything suitable for my needs?
Others have good direct answers, but I'm going to suggest an alternative. If I understand you right, you want a system in C++ or such where you can post events that fire in the future, and code is run when those events fire.
I had a project to do like this, and I started out trying to write such an event system in C++ and then quickly realized I had a better solution.
Have you considered writing your program in behavioral Verilog? That may seem strange to write software in a hardware description language, but a Verilog simulator is an event-based system underneath, and behavioral Verilog is a very convenient way to express events, timing, triggers, etc. There is a free Verilog simulator (which is what I used) called Icarus Verilog. If you're not using Ubuntu or some Linux distro with Icarus already in a package, building from source is straightforward.
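
For comparison, the kind of C++ system described at the start of this answer (post events that fire in the future, run code when they fire) can be sketched with a priority queue ordered by timestamp. This is only a minimal illustration under my own assumptions, not any particular simulator's API; Simulator, schedule, run and demo_ticks are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical minimal discrete event scheduler (illustrative names only).
class Simulator {
public:
    using Action = std::function<void()>;

    // Post an event that fires `delay` time units after the current time.
    void schedule(double delay, Action action) {
        assert(delay >= 0.0);
        queue_.push(Event{now_ + delay, next_id_++, std::move(action)});
    }

    // Run until no events remain; handlers may schedule further events.
    void run() {
        while (!queue_.empty()) {
            Event ev = queue_.top();  // copy: top() returns a const reference
            queue_.pop();
            now_ = ev.time;           // advance simulated time to the event
            ev.action();
        }
    }

    double now() const { return now_; }

private:
    struct Event {
        double time;
        std::uint64_t id;  // tie-breaker: equal times fire in posting order
        Action action;
    };
    struct Later {
        bool operator()(const Event& a, const Event& b) const {
            if (a.time != b.time) return a.time > b.time;
            return a.id > b.id;
        }
    };

    std::priority_queue<Event, std::vector<Event>, Later> queue_;
    double now_ = 0.0;
    std::uint64_t next_id_ = 0;
};

// Example: a handler that re-posts itself, firing three times in total.
std::vector<double> demo_ticks() {
    Simulator sim;
    std::vector<double> fired;
    std::function<void()> tick = [&] {
        fired.push_back(sim.now());
        if (fired.size() < 3) sim.schedule(1.0, tick);
    };
    sim.schedule(0.5, tick);
    sim.run();
    return fired;
}
```

In demo_ticks the handler fires at simulated times 0.5, 1.5 and 2.5. Simulated time only jumps from one event to the next, which is why this style can handle thousands of agents as long as each handler is cheap.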
I would recommend having a second look at OmNet++. At first sight it may look quite complex, but if you look at it in more detail you will find that most of the complexity is in the network add-on (the INET Framework). Unless you are going to do a detailed network simulation, you do not need INET.
Using the OmNet++ core is not especially difficult, and it may be simpler than other similar tools.
You may want to have a look at an intro.
One of the things that makes OmNet++ attractive to me is its scalability. It is possible to run large simulations on a desktop. Besides, it is possible to scale the same simulation to a cluster without rewriting the code.
You should consider SystemC, although I'd also recommend taking a second look at OmNet++.
We use SIMLIB at my school. It is a very fast, easy to understand, object-oriented simulator for both discrete and continuous models. It might look outdated, but it is still maintained.
There is CSIM from Mesquite Software which supports developing models in C, C++ and Java. However, it is paid-commercial, AFAIK.
Take a look at the GBL library. It's written in modern C++ and even supports C++0x features like move semantics and lambda functions. It offers several modeling mechanisms: synchronous and asynchronous event handlers, preemptive threads, and fibers. You can create purely behavioral, cycle-accurate, and real-time models, or any mixture of those.

state-of-the-art C++ projects

I like to go through existing software projects as a source of learning and new ideas.
Doing so, I discover things that I did not think were possible.
In your opinion, what is the top state-of-the-art C++ project that you have used, developed, or extended? Can you state the reasons why you consider it state of the art and what one can learn from it?
My latest craze is boost::phoenix, http://www.boost.org/doc/libs/1_43_0/libs/spirit/phoenix/doc/html/index.html, which is a very comprehensive functional programming library.
Despite its capabilities, it is straightforward and easy to extend. After some tweaking I was able to write multithreaded lambda parallel loops and a mathematical domain-specific language, probably within 2 weeks.
What is yours? (Please do not just say Boost, as it is a huge collection of projects.)
Personally, I like to look at the code in Qt. I use it every day, and it seems like every day I use it, I find something new. In terms of total code, it is probably as big as Boost. But it comes with excellent documentation, examples and complete source code, and is free in its LGPL and GPL versions.
For me, what I liked about Qt was that its concepts matched the way C# works, so it was a fairly easy transition back into C++ for me. Looking at their code has really given me many ways (although not as many as SO) to understand some of the complexity in C++.
From what I've seen, the code-sources that I have learned the most from have been from fairly complex 3rd party software libraries. Havok is an excellent example from which I've not only learned programming practices and solutions, but also quite a few mathematical and philosophical discussions. I've also seen some other code-sources which have not been open sourced from which I've learned how to not solve things.
Game engines for AAA-titles in general tend to involve a lot of complex code that tries to push as much as possible through a piece of hardware. I guess that the recommendation goes for all software that tries to achieve something similar but I've only dived into game engines when it comes to such software. AAA-titled game engines often have good or bad solutions to study and I would generally recommend looking into those. There are some that are open source... I think Source/Valve have released theirs in different stages.

Node.js or Erlang

I really like these tools when it comes to the concurrency level it can handle.
Erlang/OTP looks like a much more stable solution, but it requires much more learning and a lot of diving into the functional language paradigm. It also looks like Erlang/OTP does much better when it comes to multi-core CPUs (correct me if I am wrong).
But which should I choose? Which one is better in the short and long term perspective?
My goal is to learn a tool which makes scaling my Web projects under high load easier than traditional languages.
I would give Erlang a try. Even though it will be a steeper learning curve, you will get more out of it since you will be learning a functional programming language. Also, since Erlang is specifically designed to create reliable, highly concurrent systems, you will learn plenty about creating highly scalable services at the same time.
I can't speak for Erlang, but a few things that haven't been mentioned about node:
Node uses Google's V8 engine to actually compile javascript into machine code. So node is actually pretty fast. So that's on top of the speed benefits offered by event-driven programming and non-blocking io.
Node has a pretty active community. Hop onto their IRC group on freenode and you'll see what I mean
I've noticed the above comments push Erlang on the basis that it will be useful to learn a functional programming language. While I agree it's important to expand your skillset and get one of those under your belt, you shouldn't base a project on the fact that you want to learn a new programming style
On the other hand, Javascript is already in a paradigm you feel comfortable writing in! Plus it's javascript, so when you write client side code it will look and feel consistent.
node's community has already pumped out tons of modules! There are modules for redis, mongodb, couch, and what have you. Another good module to look into is Express (think Sinatra for node)
Check out the video on yahoo's blog by Ryan Dahl, the guy who actually wrote node. I think that will help give you a better idea where node is at, and where it's going.
Keep in mind that node is still in the late stages of development, and so has been undergoing quite a few changes, some of which have broken earlier code. However, supposedly it's at a point where you can expect the API not to change too much more. So if you're looking for something fun, I'd say node is a great choice.
I'm a long-time Erlang programmer, and this question prompted me to take a look at node.js. It looks pretty damn good.
It does appear that you need to spawn multiple processes to take advantage of multiple cores. I can't see anything about setting processor affinity though. You could use taskset on linux, but it probably should be parametrized and set in the program.
I also noticed that the platform support might be a little weaker. Specifically, it looks like you would need to run under Cygwin for Windows support.
Looks good though.
Edit
Node.js now has native support for Windows.
I'm looking at the same two alternatives you are, gotts, for multiple projects.
So far, the best razor I've come up with to decide between them for a given project is whether I need to use Javascript. One existing system I'm looking to migrate is already written in Javascript, so its next version is likely to be done in node.js. Other projects will be done in some Erlang web framework because there is no existing code base to migrate.
Another consideration is that Erlang scales well beyond just multiple cores, it can scale to a whole datacenter. I don't see a built-in mechanism in node.js that lets me send another JS process a message without caring which machine it is on, but that's built right into Erlang at the lowest levels. If your problem isn't big enough to need multiple machines or if it doesn't require multiple cooperating processes, this advantage isn't likely to matter, so you should ignore it.
Erlang is indeed a deep pool to dive into. I would suggest writing a standalone functional program first before you start building web apps. An even easier first step, since you seem comfortable with Javascript, is to try programming JS in a more functional style. If you use jQuery or Prototype, you've already started down this path. Try bouncing between pure functional programming in Erlang or one of its kin (Haskell, F#, Scala...) and functional JS.
Once you're comfortable with functional programming, seek out one of the many Erlang web frameworks; you probably shouldn't be writing your app directly to something low-level like inets at this late stage. Look at something like Nitrogen, for instance.
While I'd personally go for Erlang, I'll admit that I'm a little biased against JavaScript. My advice is that you evaluate a few points:
Are you reusing existing code in either of those languages (both in terms of source code, and programmer experience!)
Do you need/want on-the-fly updates without stopping the application (This is where Erlang wins by default - its runtime was designed for that case, and OTP contains all the tools necessary)
How big is the expected traffic, in terms of separate, concurrent operations, not bandwidth?
How "parallel" are the operations you do for each request?
Erlang has a really fine-tuned concurrency model and a network-transparent parallel distributed system. Depending on what exactly the project is, the availability of a mature implementation of such a system might outweigh any issues regarding learning a new language. There are also two other languages that work on the Erlang VM which you can use, the Ruby/Python-like Reia and Lisp-Flavored Erlang.
Yet another option is to use both, especially with Erlang being used as a kind of "hub". I'm unsure if Node.js has a Foreign Function Interface, but if it has, Erlang has a C library that lets external processes interface with the system just like any other Erlang process.
It looks like Erlang performs better for deployment in a relatively low-end server (512MB 4-core 2.4GHz AMD VM). This is from SyncPad's experience of comparing Erlang vs Node.js implementations of their virtual whiteboard server application.
There is one more language that runs on the same VM as Erlang: Elixir.
It's a very interesting alternative to Erlang, so check it out.
It also has a fast-growing web framework built on it: the Phoenix Framework.
WhatsApp could never have achieved its level of scalability and reliability without Erlang: https://www.youtube.com/watch?v=c12cYAUTXXs
I would prefer Erlang over Node.
If you want concurrency, Node can be substituted by Erlang or Golang because of their lightweight processes.
Erlang is not easy to learn, so it requires a lot of effort, but its community is active, so you can get help there. That learning curve is the only reason people prefer Node.

Where can I find documentation for publishing data to perfmon in C++?

Years ago I wrote some code to "publish" data for perfmon to consume. Using those counters is pretty well documented, but I found it challenging to find (at the time) good documentation and sample code to publish the data for perfmon.
Does anyone know where I can get this documentation? I also seem to recall some class wrappers, but I may be mistaken.
EDIT:
I did find this, and I will keep looking for "custom application performance counters".
You're bringing back old memories!
Back in 1998, Jeffrey Richter wrote an article in Microsoft Systems Journal describing how to create your own perfmon counters. It's very easy: after cutting and pasting his template code, just add shared-memory variables in a DLL and update them as needed.
Are you looking for Managed or native wrappers? The link you posted is managed, but your question is native (c++).
In the managed world it is fairly easy and straightforward to publish counters using PerformanceCounter and its relatives: http://msdn.microsoft.com/en-us/library/system.diagnostics.performancecounter.aspx. For moderate volumes they can also be used for reading counters. For high volumes, though, you must use PDH.DLL, because in my experience the overhead of the managed counters reading one counter at a time will be overwhelming.
Personally, I developed XSLT transformations to generate all the perfmon counters in my apps. I blogged about this here: http://rusanu.com/2009/04/11/using-xslt-to-generate-performance-counters-code/ and I have more material on the way.
If your question is about the unmanaged API, I don't have any pointers handy, but personally I would again go down the path of using XSLT to generate all my perfmon code, as so much of it is repetitive.