TBB Dynamic Flow Graphs - c++

I'm trying to come up with a way to define a flow graph (think TBB) defined at runtime. Currently, we use TBB to define the nodes and the edges between the nodes at compile time. This is sort of annoying because we have people who want to add processing steps and modify the processing chain without recompiling the whole application or really having to know anything about the application beyond how to add processing kernels. In an ideal world I would have some sort of plugin framework using dlls. We already have the software architected so that each node in TBB represents a processing step so it's pretty easy to add stuff if you're willing to recompile.
As a first step, I was trying to come up with a way to define a TBB flow graph in YAML but it was a massive rabbit hole. Does anyone know if something like this exists before I go all in on implementing this from scratch? It will be a fun project but no point in duplicating work.

I am not sure if anything like this exists in a TBB companion library but it is definitely doable to implement a small subset of the functionalities of Flow Graph configurable at runtime.
If the data that transit through your graph have a well defined type, aka your nodes are basically function_node<T, T> things are manageable. If the Graph transforms data from one type to another it gets more complicated -one solution would be to use a variant of these types and handle the possibly incompatible types at runtime. That really depends on the level of flexibility required.
With:
$ cat nodes.txt
# type concurrency params...
multiply unlimited 2
affine serial 5 -3
and
$ cat edges.txt
# src dst
0 1
1 2
where index 0 is a source node, here is a scaffold of how I would implement it:
using data_t = double;
using node_t = function_node<data_t , data_t >;
graph g;
std::vector<node_t> nodes;
auto node_factory = [&g](std::string type, std::string concurrency, std::string params) -> node_t {
// Implement a dynamic factory of nodes
};
// Add source node first
nodes.push_back(flow::input_node<data_t>(g,
[&]( oneapi::tbb::flow_control &fc ) -> data_t { /*...*/ });
// Parse the node description file and populate the node vector using the factory
for (auto&& n : nodes)
nodes.push_back(node_factory(n.type, n.concurrency, n.params));
// Parse the edge description file and call make_edge accordingly
for (auto&& e : edges)
flow::make_edge(nodes[e.src], nodes[e.dst]);
// Run the graph
nodes[0].activate();
g.wait_for_all();

Related

How to build a graph of specific function calls?

I have a project where I want to dynamically build a graph of specific function calls. For example if I have 2 template classes, A and B, where A have a tracked method (saved as graph node) and B has 3 methods (non-tracked method, tracked method and a tracked method which calls A's tracked method), then I want to be able to only register the tracked method calls into the graph object as nodes. The graph object could be a singleton.
template <class TA>
class A
{
public:
void runTracked()
{
// do stuff
}
};
template <class TB>
class B
{
public:
void runNonTracked()
{
// do stuff
}
void runTracked()
{
// do stuff
}
void callATracked()
{
auto a = A<TB>();
a.runTracked();
// do stuff
}
};
void root()
{
auto b1 = B<int>();
auto b2 = B<double>();
b1.runTracked();
b2.runNonTracked();
b2.callATracked();
}
int main()
{
auto b = B<int>();
b.runTracked()
root();
return 0;
}
This should output a similar graph object to the below:
root()
\-- B<int>::runTracked()
\-- B<double>::callATracked()
\-- A<double>::runTracked()
The tracked functions should be adjustable. If the root would be adjustable (as in the above example) that would be the best.
Is there an easy way to achieve this?
I was thinking about introducing a macro for the tracked functionalities and a Singleton graph object which would register the tracked functions as nodes. However, I'm not sure how to determine which is the last tracked function in the callstack, or (from the graphs perspective) which graph node should be the parent when I want to add a new node.
In general, you have 2 strategies:
Instrument your application with some sort of logging/tracing framework, and then try to replicate some sort of tracing mixin-like functionality to apply global/local tracing depending on which parts of code you apply the mixins.
Recompile your code with some sort of tracing instrumentation feature enabled for your compiler or runtime, and then use the associated tracing compiler/runtime-specific tools/frameworks to transform/sift through the data.
For 1, this will require you to manually insert more code or something like _penter/_pexit for MSVC manually or create some sort of ScopedLogger that would (hopefully!) log async to some external file/stream/process. This is not necessarily a bad thing, as having a separate process control the trace tracking would probably be better in the case where the traced process crashes. Regardless, you'd probably have to refactor your code since C++ does not have great first-class support for metaprogramming to refactor/instrument code at a module/global level. However, this is not an uncommon pattern anyways for larger applications; for example, AWS X-Ray is an example of a commercial tracing service (though, typically, I believe it fits the use case of tracing network calls and RPC calls rather than in-process function calls).
For 2, you can try something like utrace or something compiler-specific: MSVC has various tools like Performance Explorer, LLVM has XRay, GCC has gprof. You essentially compile in a sort of "debug++" mode or there is some special OS/hardware/compiler magic to automatically insert tracing instructions or markers that help the runtime trace your desired code. These tracing-enabled programs/runtimes typically emit to some sort of unique tracing format that must then be read by a unique tracing format reader.
Finally, to dynamically build the graph in memory is a a similar story. Like the tracing strategies above, there are a variety of application and runtime-level libraries to help trace your code that you can interact with programmatically. Even the simplest version of creating ScopedTracer objects that log to a tracing file can then be fitted with a consumer thread that owns and updates the trace graph with whatever desired latency and data durability requirements you have.
Edit: If you would like, OpenTelemetry/Jaeger may be a good place to start visualizing traces once you have extracted the data (and you can also report directly to it if you want), although it prefers a tree presentation format: Jaeger documentation for Trace Detail View

How to add a Branch to An Already Existing TTree: ROOT

I have an existing TTree after doing a simulation. I would like to add a Branch to this TTree and I want to call it Muon.Mass to the tree. I would also want to give the Muon.Mass branch a value of 0.1.
How can I write that?
I have seen how to create TTrees from scratch and to have branches of different variables. But I am not sure exactly what to do when I already have a TTree.
You can call the TTree::Branch method on an existing TTree the same way as for a new TTree. Just for filling you need to ensure you only fill the branch. (this is a strongly cut down example from https://github.com/pseyfert/tmva-branch-adder)
void AddABranch(TTree* tree) {
Float_t my_local_variable;
TBranch* my_new_branch = tree->AddBranch( ... /* use address of my_local_variable */ );
for (Long64_t entry = 0 ; entry < tree->GetEntries() ; ++e ) {
tree->GetEntry();
/* something to compute my_local_variable */
my_new_branch->Fill();
}
}
As alternative you might want to look at the root tutorials for tree friends.
As a side note, depending what you want to do with the tree / whom you give the tree to, I advise against using . in branch names as they cause headache when running MakeClass (branch names can contain periods, but c++ variables can't, so the automatically generated class members for each branch will undergo character replacement).

Connecting nodes of different GraphDef's

From Python, I have a frozen graph.pb that I'm currently using in a C++ environment. Now the data for the input tensor are currently preprocessed on the CPU, but I would like to do this step in another GraphDef to run it on the GPU, but I can't seem to find a way to connect nodes between two GraphDef's.
Lets assume my frozen graph have an input/placeholder named mid that I'd like to connect with the preprocessing steps below
tf::GraphDef create_graph_extension() {
tf::Scope root = tf::Scope::NewRootScope();
auto a = tf::ops::Const(root.WithOpName("in"), {(float) 23.0, (float) 31.0});
auto b = tf::ops::Identity(root.WithOpName("mid"), a);
tf::GraphDef graph;
TF_CHECK_OK(root.ToGraphDef(&graph));
return graph;
}
I usually use session->Extend() to run multiple graphs in the same session, but always making sure their node names are unique. With non-unique node names, that I hoped to connect, I get an error
Failed to install graph:
Invalid argument: GraphDef argument to Extend includes node 'mid', which
was created by a previous call to Create or Extend in this session.
P.s. It seems like it is possible in python at least (link)
You can achieve what you're looking for using the same idea that was suggested for Python - import one GraphDef into another and remap inputs.
In case you do use the C API (which has stability guarantees), you'd want to look at:
TF_GraphImportGraphDef (which is parallel to the tf.import_graph_def call in Python), and
TF_ImportGraphDefOptionsAddInputMapping which serves the same purpose as the input_map argument in Python.
These are implemented on top of the C++ ImportGraphDef function, which you might be able to use directly instead (though that doesn't seem to yet be part of the exported C++ API)
Hope that helps.

How to exchange custom data between Ops in Nuke?

This questions is addressed to developers using C++ and the NDK of Nuke.
Context: Assume a custom Op which implements the interfaces of DD::Image::NoIop and
DD::Image::Executable. The node iterates of a range of frames extracting information at
each frame, which is stored in a custom data structure. An custom knob, which is a member
variable of the above Op (but invisible in the UI), handles the loading and saving
(serialization) of the data structure.
Now I want to exchange that data structure between Ops.
So far I have come up with the following ideas:
Expression linking
Knobs can share information (matrices, etc.) using expression linking.
Can this feature be exploited for custom data as well?
Serialization to image data
The custom data would be serialized and written into a (new) channel. A
node further down the processing tree could grab that and de-serialize
again. Of course, the channel must not be altered between serialization
and de-serialization or else ... this is a hack, I know, but, hey, any port
in a storm!
GeoOp + renderer
In cases where the custom data is purely point-based (which, unfortunately,
it isn't in my case), I could turn the above node into a 3D node and pass
point data to other 3D nodes. At some point a render node would be required
to come back to 2D.
I am going into the correct direction with this? If not, what is a sensible
approach to make this data structure available to other nodes, which rely on the
information contained in it?
This question has been answered on the Nuke-dev mailing list:
If you know the actual class of your Op's input, it's possible to cast the
input to that class type and access it directly. A simple example could be
this snippet below:
//! #file DownstreamOp.cpp
#include "UpstreamOp.h" // The Op that contains your custom data.
// ...
UpstreamOp * upstreamOp = dynamic_cast< UpstreamOp * >( input( 0 ) );
if ( upstreamOp )
{
YourCustomData * data = yourOp->getData();
// ...
}
// ...
UPDATE
Update with reference to a question that I received via email:
I am trying to do this exact same thing, pass custom data from one Iop
plugin to another.
But these two plugins are defined in different dso/dll files.
How did you get this to work ?
Short answer:
Compile your Ops into a single shared object.
Long answer:
Say
UpstreamOp.cpp
DownstreamOp.cpp
define the depending Ops.
In a first attempt I compiled the first plugin using only UpstreamOp.cpp,
as usual. For the second plugin I compiled both DownstreamOp.cpp and
UpstreamOp.cpp into that plugin.
Strangely enough that worked (on Linux; didn't test Windows).
However, by overriding
bool Op::test_input( int input, Op * op ) const;
things will break. Creating and saving a Comp using the above plugins still
works. But loading that same Comp again breaks the connection in the node graph
between UpstreamOp and DownstreamOp and it is no longer possible to connect
them again.
My hypothesis is this: since both plugins contain symbols for UpstreamOp it
depends on the load order of the plugins if a node uses instances of UpstreamOp
from the first or from the second plugin. So, if UpstreamOp from the first plugin
is used then any dynamic_cast in Op::test_input() will fail and the two Op cannot
be connected anymore.
It is still surprising that Nuke would even bother to start at all with the above
configuration, since it can be rather picky about symbols from plugins, e.g if they
are missing.
Anyway, to get around this problem I did the following:
compile both Ops into a single shared object, e.g. myplugins.so, and
add TCL script or Python script (init.py/menu.py)which instructs Nuke how to load
the Ops correctly.
An example for a TCL scripts can be found in the dev guide and the instructions
for your menu.py could be something like this
menu = nuke.menu( 'Nodes' ).addMenu( 'my-plugins' )
menu.addCommand('UpstreamOp', lambda: nuke.createNode('UpstreamOp'))
menu.addCommand('DownstreamOp', lambda: nuke.createNode('DownstreamOp'))
nuke.load('myplugins')
So far, this works reliably for us (on Linux & Windows, haven't tested Mac).

How do I use AdaBoost for feature selection?

I want to use AdaBoost to choose a good set features from a large number (~100k). AdaBoost works by iterating though the feature set and adding in features based on how well they preform. It chooses features that preform well on samples that were mis-classified by the existing feature set.
Im currently using in Open CV's CvBoost. I got an example working, but from the documentation it is not clear how to pull out the feature indexes that It has used.
Using either CvBoost, a 3rd party library or implementing it myself, how can pull out a set of features from a large feature set using AdaBoot?
With the help of #greeness answer I made a subclass of CvBoost
std::vector<int> RSCvBoost::getFeatureIndexes() {
CvSeqReader reader;
cvStartReadSeq( weak, &reader );
cvSetSeqReaderPos( &reader, 0 );
std::vector<int> featureIndexes;
int weak_count = weak->total;
for( int i = 0; i < weak_count; i++ ) {
CvBoostTree* wtree;
CV_READ_SEQ_ELEM( wtree, reader );
const CvDTreeNode* node = wtree->get_root();
CvDTreeSplit* split = node->split;
const int index = split->condensed_idx;
// Only add features that are not already added
if (std::find(featureIndexes.begin(),
featureIndexes.end(),
index) == featureIndexes.end()) {
featureIndexes.push_back(index);
}
}
return featureIndexes;
}
Claim: I am not a user of opencv. From the documentation, opencv's adaboost is using the decision tree (either classification tree or regression tree) as the fundamental weak learner.
It seems to me this is the way to get the underline weak learners:
CvBoost::get_weak_predictors
Returns the sequence of weak tree classifiers.
C++: CvSeq* CvBoost::get_weak_predictors()
The method returns the sequence of weak classifiers.
Each element of the sequence is a pointer to the CvBoostTree class or
to some of its derivatives.
Once you have access to the sequence of CvBoostTree*, you should be able to inspect which features are contained in the tree and what are the split value etc.
If each tree is only a decision stump, only one feature is contained in each weak learner. But if we allow deeper depth of tree, a combination of features could exist in each individual weak learner.
I further took a look at the CvBoostTree class; unfortunately the class itself does not provide a public method to check the internal features used. But you might want to create your own sub-class inheriting from CvBoostTree and expose whatever functionality.