Graph Library API design - C++

I am writing my own graph library, Graph++, and I have a question about what the interface should return. For example, what should my BFS return? I am unsure whether it should return the set of vertices visited, in order, or whether I should accept a callback function that gets invoked on each visit.
What would be the best option so that my library is easily consumable?

A recurring pattern in the STL is to offer iterators. Your traversal algorithms might return a start iterator, and the library user could increment it as desired, while comparing against an end() iterator that either it or the graph provides.
The visitor pattern may also be relevant to your interests.
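To sketch the iterator idea, here is a minimal BFS iterator over a bare-bones adjacency-list graph. The Graph type and all names are illustrative, not an existing API:

```cpp
#include <cstddef>
#include <iostream>
#include <queue>
#include <unordered_set>
#include <vector>

// Minimal adjacency-list graph, just enough to demonstrate the idea.
struct Graph {
    std::vector<std::vector<std::size_t>> adj; // adj[v] = neighbours of v
};

// Lazy BFS: each operator++ visits one more vertex, so the user
// controls the pace of the traversal and can stop whenever they like.
class BfsIterator {
public:
    BfsIterator() = default; // default-constructed = end sentinel
    BfsIterator(const Graph& g, std::size_t start) : graph_(&g) {
        queue_.push(start);
        visited_.insert(start);
    }
    std::size_t operator*() const { return queue_.front(); }
    BfsIterator& operator++() {
        std::size_t v = queue_.front();
        queue_.pop();
        for (std::size_t n : graph_->adj[v])
            if (visited_.insert(n).second) // true if newly visited
                queue_.push(n);
        return *this;
    }
    // Equality is only meaningful against the end sentinel.
    bool operator==(const BfsIterator& o) const {
        return queue_.empty() && o.queue_.empty();
    }
    bool operator!=(const BfsIterator& o) const { return !(*this == o); }
private:
    const Graph* graph_ = nullptr;
    std::queue<std::size_t> queue_;
    std::unordered_set<std::size_t> visited_;
};

int main() {
    Graph g{{{1, 2}, {3}, {3}, {}}}; // edges: 0->1, 0->2, 1->3, 2->3
    for (BfsIterator it(g, 0), end; it != end; ++it)
        std::cout << *it << ' ';     // prints: 0 1 2 3
}
```

Note that the caller can abandon the traversal early at no cost, which a return-the-whole-visited-set design cannot offer.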

I don’t want to be unhelpful or sound arrogant. This is just a personal opinion and you should take it for what it is worth. You did not say why you are writing this library, so I’ll assume you have a specific problem to solve. If you are doing it just for fun or to learn, please go ahead and disregard the remainder of this reply.
Graphs are extremely generic abstractions. Any data structure more complex than a tree is a graph. Most programs have such data structures. For example, a web site containing linked pages is a graph. So is a representation of a map. However, when you think of these as graphs, you ignore all differences between web sites and street maps and focus on the only thing they have in common.
In the vast majority of cases, the details you are trying to abstract away, the fact that web pages are HTML, links are URLs, streets have speed limits, intersections have traffic lights, and so on, are more important. If you start your implementation with a graph abstraction, by the time you implement these other details on top of it you’ve got yourself into quite a mess. It is much better to start with other, more important abstractions as building blocks and connect those together to form a graph. Sure, you won’t get the shortest path algorithm for free for your street map, for example, but you are likely interested in the fastest route anyway, for which you need speed limits, traffic lights, and other information.
I guess what I’m trying to say is that I see very limited uses for a generic graph library. It is nice that we have the Boost Graph Library, but AFAIK, not many people are using it.

In C++11, prefer a functional approach (callbacks, typically lambdas) to an iterative one. In C++03, use iterator strategies.
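For instance, a callback-based BFS in C++11 might look like the following sketch; the adjacency-list representation is an assumption for illustration, not anything from the question:

```cpp
#include <cstddef>
#include <functional>
#include <iostream>
#include <queue>
#include <unordered_set>
#include <vector>

using AdjacencyList = std::vector<std::vector<std::size_t>>;

// The library drives the traversal and invokes the user-supplied
// callback once per visited vertex, in BFS order.
void bfs(const AdjacencyList& adj, std::size_t start,
         const std::function<void(std::size_t)>& visit) {
    std::unordered_set<std::size_t> seen{start};
    std::queue<std::size_t> q;
    q.push(start);
    while (!q.empty()) {
        std::size_t v = q.front();
        q.pop();
        visit(v); // user callback
        for (std::size_t n : adj[v])
            if (seen.insert(n).second) // true if newly seen
                q.push(n);
    }
}

int main() {
    AdjacencyList adj{{1, 2}, {3}, {3}, {}}; // 0->1, 0->2, 1->3, 2->3
    bfs(adj, 0, [](std::size_t v) { std::cout << v << ' '; }); // 0 1 2 3
}
```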

Related

Manually manage and update Intel TBB flow graph?

I have successfully prototyped an application using Intel's awesome TBB flow graph library. It seems to work quite well, but now I need to refactor the code into a production-ready version.
Previously, I have worked with some larger and more "over-developed" frameworks for this particular domain (the work is in image processing and previous applications have used ITK/VTK). For this application, however, I am trying to take a lower-level and more focused approach.
Currently, I am just assembling my entire graph in main() which is obviously not sustainable. I'd like to allow the pipeline to run iteratively so that I can grab output data from each stage and display it for debug/analysis purposes.
My idea so far is to abstract each logical "stage" of the application into a class that accepts a tbb::flow::graph& as a constructor argument and internally stores a reference to the graph node it controls. I can have the wrapper class allocate an additional tbb::flow::broadcast_node at the output and an async node after that to fire off events.
Is this a sensible design concept? Generally speaking, how have others integrated the TBB flow graph concepts into the structure of their application? The examples and documentation are quite scant for this particular portion of the TBB library.
As with any design, I don't think there is a clearly right or wrong way to do it. I don't know your code well enough, but breaking the code up by logical stages is probably a good idea.
When it comes to frameworks like TBB, a design decision that you have to make is whether you should hide all framework aspects behind an interface. The advantage is that you could later swap out the implementation for another one (e.g., replace TBB with OpenMP). On the other hand, introducing an additional layer might not be needed in all cases, especially if it is unlikely that you will ever replace TBB.
The design decision that you describe in your question is how to structure the framework-dependent part. It depends very much on the concrete algorithm that you are implementing. For instance, if it consists of applying separate transformations to one image, creating one class per transformation step could be a good approach.
In addition, it might make sense to wrap everything in a function or class. If the operation that you are implementing takes one image as an input and produces one image as an output, this is something that can be hidden behind an interface that hides the implementation details (in this case TBB).
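To make the wrapper idea from the question concrete, here is a rough sketch; Image, blur, and BlurStage are placeholder names, not part of TBB or the asker's code:

```cpp
#include <iostream>
#include <tbb/flow_graph.h>

// Placeholder image type and operation; a real application would use
// its own types here.
struct Image {};
Image blur(const Image& in) { return in; }

// One logical pipeline stage wrapped in a class: the constructor takes
// the graph by reference, and a broadcast_node is attached to the
// output so debug consumers can tap the stage's result.
class BlurStage {
public:
    explicit BlurStage(tbb::flow::graph& g)
        : node_(g, tbb::flow::unlimited,
                [](const Image& in) { return blur(in); }),
          output_(g) {
        tbb::flow::make_edge(node_, output_);
    }
    tbb::flow::function_node<Image, Image>& input() { return node_; }
    tbb::flow::broadcast_node<Image>& output() { return output_; }
private:
    tbb::flow::function_node<Image, Image> node_;
    tbb::flow::broadcast_node<Image> output_;
};

int main() {
    tbb::flow::graph g;
    BlurStage stage(g);
    // A debug tap attached to the stage's broadcast output.
    tbb::flow::function_node<Image, tbb::flow::continue_msg> sink(
        g, tbb::flow::serial, [](const Image&) {
            std::cout << "stage produced a frame\n";
            return tbb::flow::continue_msg{};
        });
    tbb::flow::make_edge(stage.output(), sink);
    stage.input().try_put(Image{});
    g.wait_for_all();
}
```

The broadcast_node lets any number of display or debug consumers attach to a stage's output without the stage itself knowing about them.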

How to make proper design/architecture of partially reusable algorithm? [closed]

I am very sorry for the long explanation, but it is required for proper understanding.
I am working on computer vision algorithms for industrial tasks. Computer vision algorithms tend to be very complicated. Usually they involve calls to dozens (at the very least) of simpler algorithms (which are not simple either). Those calls form a certain hierarchy: bigger tasks call smaller ones, which in turn call even smaller ones, and so on.
Let's take, for example, a typical computer vision task: find an object in an image under certain conditions. This is a task that should be performed in dozens of different applications. Each application has its own set of conditions, and thus it is impossible to create a single algorithm that works for all of them. But they are pretty similar. Usually it is enough to replace one or two lower-level functions. For example: use a different method for detecting points of interest in the image.
And here comes the problem: for each new application I had to copy the whole code from one of the existing applications and adapt the relevant parts, which is bad practice. I am trying to eliminate those duplications by creating a system of algorithms that can be used in all applications without changing the code itself. Here is the list of issues the system has to deal with (at least the ones I have identified so far):
1) Arguments provided to the main algorithm should be able to set the 'algorithmic flow' inside the system, i.e., they determine which lower-level algorithms are used and how.
2) Different sub-algorithms that perform the same task may require different inputs. One may need an array of ints, another a pair of doubles, and so on... Algorithms on the higher level should be oblivious to the replacement of one sub-algorithm with another. That means they should not be aware of what arguments they receive and pass down to sub-algorithms. The same is true for the output of a sub-algorithm: it may vary if a different combination of sub-algorithms is used.
3) The system must be extendable. If a new sub-algorithm becomes available (for example: yet another way to find points of interest), the system should be able to call it. I understand that changes might be unavoidable at this point, but I would like to keep them to a minimum. And in any case, the system should be able to work in the same way with previous sets of arguments.
4) The system must be debuggable. The end user of the system should have a reasonable way to dump debug information about the 'algorithmic flow' in his system, so that the algorithm developer will be able to recreate the situation. That is not trivial, considering requirement (3).
5) There should be a reasonable way to sanity-check the flow of algorithms.
6) I am not going to throw exceptions, but there should be a reasonable way to return the success/fail status of each algorithm. Again, this is not easy because of requirement (3).
7) This one is more 'good to have' than 'must have', but it may be important. Some calculations may be performed by multiple sub-algorithms. For example, the calculation of gradients in an image may (or may not) be required for multiple different tasks. It would be good to have an option to store the results of those calculations in order to reuse them later.
I created some kind of solution to this but it is far from being good. Do you have any recommendations about how this should be done?
Used language: C++
Thank you
I'd just use some tried and true design patterns.
Use a strategy pattern to represent an algorithm that you may wish to swap out for alternatives.
Use a factory to instantiate different algorithm (strategy) instances based on some input parameter or runtime context - I'm a fan of the prototype factory where you have "inert" instances of each object in some lookup table, and based on a key you pass in you can request a clone of the one needed. I like it mainly because it's easiest to extend - you can even add new configured prototype instances to such a factory at runtime.
Note that the same "strategy" model does not have to serve for everything - it sounds like you might have some higher-level/fuzzy operations which then assemble or chain together low-level/detailed operations. The high level operations could be one type of abstract object while the detailed algorithms are the more concrete strategy instances.
As far as the inputs to the various algorithms, if it varies a lot from algorithm to algorithm you could use an extensible object like a dictionary for parameters so that each algorithm can use just the parameters it needs and ignore the others for an operation. If the dictionary is modifiable during the operation this would also permit upstream algorithms to add parameters for downstream algorithms. Key-value pairs are pretty easy to dump to a log or view in a debugger.
If each strategy instance has a unique semantic identifier you could easily debug the algorithms that get instantiated and chained together. (I use an audio DSP library that has a function to dump a description of the whole chain of configured audio processors, it's very handy).
If you use a system with strategy patterns and extensible parameters you should also be able to segregate shared algorithms from application-specific algorithms, but still have the same basic framework for instantiating and running them.
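To make that concrete, here is a minimal C++ sketch combining a strategy interface, a prototype factory, and a key-value parameter bag; all class and key names are invented for illustration:

```cpp
#include <cstddef>
#include <iostream>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Extensible parameter bag: each strategy reads only the keys it needs.
using Params = std::map<std::string, double>;

// Strategy interface for one swappable step, e.g. interest-point detection.
struct InterestPointDetector {
    virtual ~InterestPointDetector() = default;
    virtual std::unique_ptr<InterestPointDetector> clone() const = 0;
    virtual std::vector<int> detect(const Params& p) const = 0;
};

struct CornerDetector : InterestPointDetector {
    std::unique_ptr<InterestPointDetector> clone() const override {
        return std::make_unique<CornerDetector>(*this);
    }
    std::vector<int> detect(const Params& p) const override {
        // Reads only the parameters it cares about; others are ignored.
        double threshold = p.count("threshold") ? p.at("threshold") : 0.5;
        // Dummy result standing in for real corner detection.
        return std::vector<int>(static_cast<std::size_t>(threshold * 10), 0);
    }
};

// Prototype factory: a lookup table of "inert" instances, cloned on request.
class DetectorFactory {
public:
    void registerPrototype(const std::string& key,
                           std::unique_ptr<InterestPointDetector> proto) {
        prototypes_[key] = std::move(proto);
    }
    std::unique_ptr<InterestPointDetector> create(const std::string& key) const {
        return prototypes_.at(key)->clone();
    }
private:
    std::map<std::string, std::unique_ptr<InterestPointDetector>> prototypes_;
};

int main() {
    DetectorFactory factory;
    factory.registerPrototype("corners", std::make_unique<CornerDetector>());
    auto detector = factory.create("corners"); // key could come from config
    auto points = detector->detect({{"threshold", 0.8}});
    std::cout << points.size() << " points\n";
}
```

New algorithm variants can then be registered with the factory at startup (or even at runtime) without touching any higher-level code.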
hth
I'm going to assume that you are a competent OO programmer with good domain knowledge, and your problem is more about a higher level of organisation of software components (implementing algorithms) than OO generally provides.
The patterns mentioned by @orpheist make perfect sense. Consider them. They will not solve all the problems you list. You should also consider the following.
In what form will the data be for algorithms to access?
Will you need adapters to connect one component to another?
Do you pass the data to the component or the component to the data?
Do you want to assemble a pipeline or group of components to build higher ones, which can then be applied to the data?
Do you need a language (XML, DSL) to express connections and to allow for easy experimentation?
Is performance a dominant issue already, or can you afford more interpretive techniques at this stage?
I think you need to refine some of your questions and provide some more concrete specifics. I also think your questions would be a better fit on programmers.stackexchange than here.

How does one best integrate with clojure abstractions?

I am implementing an ordered set in Clojure, where I retrieve elements based on their rank. This means that I can retrieve the 4th element (according to the set's ordering), the 3rd, or the 7th, all in logarithmic time.
In order to get my new data structure integrated with Clojure's common methods (or "abstractions") such as conj, get, nth, etc., which is the better way to do it:
Actually implement conj, for example, in my datatype's protocol, or
Implement Rich Hickey's clojure.lang.IPersistentSet or some interface like it.
The first seems easier, but also easier to mess up the semantics of the function. The second seems like I am implementing an interface that was never meant to be part of the public API, and the actual methods associated with that interface (protocol) are confusingly different. For example, it seems that in order to implement conj with my set, I must implement a cons method of clojure.lang.IPersistentSet, which has a different name. There seems to be little documentation on how this all works, which poses a large challenge in implementing this ranked set.
Which one should I choose? Should I implement my own or the methods of a clojure.lang interface? If I should do the latter, where is some good documentation that can guide me through the process?
EDIT: I want to make it clear that I am trying to make a set from which you can retrieve any element (or "remove" it) in logarithmic time by specifying the element's rank (e.g., "give me the 5th element, mr. set."). To my knowledge, no such set yet exists in clojure.
Firstly, I have just released a library called avl.clj which implements persistent sorted maps and sets with support for the standard Clojure API (they are drop-in replacements for the built-in sorted collections), as well as transients and logarithmic-time rank queries (via clojure.core/nth)¹. Both Clojure and ClojureScript are supported; performance on the Clojure side is mostly on a par with the built-in variants in my preliminary benchmarking. Follow the link above if you'd like to give it a try. Any experience reports would be greatly appreciated!
As for the actual question: I'm afraid there isn't much in the way of documentation on Clojure's internal interfaces, but still, implementing them is the only way of making one's custom data structures fit in with the built-ins. core.rrb-vector (which I have written and now maintain) takes this approach, as do other Contrib libraries implementing various data structures. This is also what I've done with avl.clj, as well as sorted.clj (which is basically the ClojureScript port of the red-black-tree-based sorted collections backported to Clojure). All of these libraries, as well as Clojure's own gvec.clj file which implements the primitive-storing vectors produced by clojure.core/vector-of, can serve as examples of what's involved. (Though I have to say it's easy to miss a method here and there...)
The situation is much simpler in ClojureScript, where all the core protocols are defined at the top of core.cljs, so you can just look at the list and pick the ones relevant to your data structure. Hopefully the same will be true on the Clojure side one day.
¹ Removal by rank is (disj my-set (nth my-set 123)) for now. I might provide a direct implementation later on if it turns out to make enough of a difference performance-wise. (I'll definitely write one to check if it does.)

Is there a good graph layout library callable from C++?

The (directed) graphs represent finite automata. Up until now my test program has been writing out dot files for testing. This is pretty good both for regression testing (keep the verified output files in subversion, ask it if there has been a change) and for visualisation. However, there are some problems...
Basically, I want something callable from C++ and which plans a layout for my states and transitions but leaves the drawing to me - something that will allow me to draw things however I want and draw on GUI (wxWidgets) windows.
I also want a license which will allow commercial use - I don't need that at present, and I may very well release as open source, but I don't want to limit my options ATM.
The problems with GraphViz are (1) the warnings about building from source on Windows, (2) all the unnecessary dependencies for rendering and parsing, and (3) the (presumed) lack of a documented API specifically and purely for layout.
Basically, I want to be able to specify my states (with bounding rectangle sizes) and transitions, and read out positions for the states and waypoints for each transition, then draw based on those co-ordinates myself. I haven't really figured out how annotations on transitions should be handled, but there should be some kind of provision for specifying bounding-box-sizes for those, associating them with transitions, and reading out positions.
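To illustrate, the kind of layout-only interface I have in mind would look roughly like this sketch (all names are placeholders, and the one-row placement is just a stand-in for a real layout algorithm):

```cpp
#include <cstddef>
#include <iostream>
#include <utility>
#include <vector>

// Hypothetical layout-only API: the caller supplies node sizes and
// edges, the engine returns coordinates, and all drawing stays on the
// caller's side.
struct Size  { double w, h; };
struct Point { double x, y; };

class LayoutEngine {
public:
    std::size_t addState(Size box) {
        boxes_.push_back(box);
        return boxes_.size() - 1; // state id
    }
    void addTransition(std::size_t from, std::size_t to) {
        edges_.push_back({from, to});
    }
    // Trivial placeholder: lay states out on one row. A real engine
    // would run a layered (Sugiyama-style) or force-directed algorithm.
    void computeLayout() {
        positions_.clear();
        double x = 0;
        for (const Size& b : boxes_) {
            positions_.push_back({x + b.w / 2, b.h / 2});
            x += b.w + 40; // fixed horizontal gap
        }
    }
    Point statePosition(std::size_t id) const { return positions_[id]; }
    // Waypoints for drawing the edge; here just source and target centres.
    std::vector<Point> transitionWaypoints(std::size_t i) const {
        return {positions_[edges_[i].first], positions_[edges_[i].second]};
    }
private:
    std::vector<Size> boxes_;
    std::vector<std::pair<std::size_t, std::size_t>> edges_;
    std::vector<Point> positions_;
};

int main() {
    LayoutEngine engine;
    auto a = engine.addState({80, 40});
    auto b = engine.addState({80, 40});
    engine.addTransition(a, b);
    engine.computeLayout();
    Point p = engine.statePosition(b);
    std::cout << "state b at (" << p.x << ", " << p.y << ")\n";
}
```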
Does anyone know of a library that can handle those requirements?
I'm not necessarily against implementing something for myself, but in this case I'd rather avoid it if possible.
Hmm, GDToolkit (or GDT) looks okay: many of the images in the tutorial look pretty nice, and it doesn't look like it's terribly complicated to use.
Edit: But checking the license, it looks like it's commercial software :-(. Whoops!
OGDF is under the GPL.
Pigale is also under the GPL.
GoVisual is commercial software, but it looks like it starts at $1800 for one developer.
I was dealing with a similar problem earlier this year. One important input for the decision, however, is the expected number of nodes.
I decided to use the browser as the GUI and therefore looked for nice JavaScript libraries. One I came across was WireIt; it is very well suited for technical layouts (and also editing with drag and drop and "on the fly" layouting). You could easily connect that to your C++ by running a small web server in a thread (you will need some kind of event loop/thread thingie for the GUI anyway).
Well, just my 2 cents.
Although the answers so far were worth an upvote, I can't really accept any of them. I've still been searching, though.
One thing I found is AGLO. The code is GPL v1, but there are papers that describe the algorithms, so it should be easy enough to re-implement from scratch if necessary.
There's also the paper by Gansner, Koutsofios, North and Vo - "A Technique for Drawing Directed Graphs" - available from here on the Graphviz site.
I've also been looking closely at the BSD-licensed (but Java) JGraph.
One way or the other, it looks like I might be re-implementing the wheel, if not actually re-inventing it.
Here is a good collection of graph libraries with comparison and search functionality:
http://gvsr.polytech.univ-nantes.fr/GVSR/task?action=browse#
Maybe you'll find a lib that fits your needs.

Testing approach for algorithms with complex outputs

How do you test the result of a program that is basically a black box? For example, one year ago I had to write a B-tree as homework, and I really struggled with testing its correctness. What strategies do you use in such scenarios? Visualization? Robust input-->result sets of testing data? What do you do when it is hard to get such data because the only way to get it is your properly working program?
EDIT: I think my question was misunderstood. There was no problem with understanding how a B-tree works. That is trivial. But writing robust tests to validate its proper functionality is not so trivial. I think this school problem is similar to many practical REAL world scenarios and test cases. And sometimes understanding the domain is quite different from delivering a working and correct program...
EDIT2: And yes, with a B-tree it is possible to validate proper behavior with pen and paper. But this is really dirty and not fun :) It does not work well with problems that require huge amounts of data for their validation...
I'm not sure these answers really capture the problem at hand. A B-tree's input and output aren't any different from those of any other dictionary; the algorithm just performs better, if it's implemented correctly. It's only really got two functions to test (add and find), so theoretically, "black-box" testing of this single component should be fine. Designing for testability isn't the issue, since no matter how you do it the whole algorithm will be one component.
So the question is: when you have to implement subtle algorithms, the kinds with complicated output that you can't always understand in your head so well, how do you test them? I think there are three different strategies you can use:
Black-box test basic functionality. For the B-tree case, this is things like cwash suggested, and also, things like making sure that when you add an item, you can then find it, etc.
Test certain invariants that your algorithm should maintain (the B-tree should be balanced, values within nodes should be sorted, etc.); a sketch of this follows the list.
A few small "pencil-and-paper" tests may be necessary -- work the algorithm out by hand and check that it matches what your code does. But the big-data tests can all be of type 2. These can also be brittle, so unless you need to be really sure about your algorithm, you may want to avoid them.
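As a sketch of strategy 2, an invariant check might look like the following; the Node layout is a hypothetical one, not from the question:

```cpp
#include <algorithm>
#include <climits>
#include <cstddef>
#include <vector>

// Hypothetical B-tree node layout for demonstrating invariant checks.
struct Node {
    std::vector<int>   keys;      // must be sorted within the node
    std::vector<Node*> children;  // empty for leaf nodes
};

// Verify that keys are sorted within each node and that every key in
// child i lies between the surrounding separator keys of its parent.
bool checkInvariants(const Node* n, int lo, int hi) {
    if (n == nullptr) return true;
    if (!std::is_sorted(n->keys.begin(), n->keys.end())) return false;
    for (int k : n->keys)
        if (k < lo || k > hi) return false;
    if (n->children.empty()) return true; // leaf
    if (n->children.size() != n->keys.size() + 1) return false;
    for (std::size_t i = 0; i < n->children.size(); ++i) {
        int newLo = (i == 0) ? lo : n->keys[i - 1];
        int newHi = (i == n->keys.size()) ? hi : n->keys[i];
        if (!checkInvariants(n->children[i], newLo, newHi)) return false;
    }
    return true;
}

int main() {
    Node leaf1{{1, 3}, {}}, leaf2{{7, 9}, {}};
    Node root{{5}, {&leaf1, &leaf2}};
    return checkInvariants(&root, INT_MIN, INT_MAX) ? 0 : 1;
}
```

A complete check would also verify that all leaves sit at the same depth and that nodes respect the minimum-occupancy rule.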
If you do not grasp the problem at hand, how can you develop a solution to it? My suggestion would be to understand the domain enough to be able to work out the problem on paper and ensure that your program matches.
Consult with an expert on the subject.
I know if I have a convoluted procedure I'm trying to fix, I have no idea what the output should be after my changes, so I need to consult a fellow developer with more knowledge of the business need, and they are able to verify what I've done is correct.
I would focus on constructing test cases that exercise the functionality of your B-tree algorithm. I haven't looked at it for years, but I'm fairly sure you'll be able to find a documented sequence of steps to insert a set of values in a specific order, then validate that the leaf nodes are as they should be. If you construct your testing along these lines, you should be able to prove your implementation is correct.
The key is to know there is a balance between testing something to death and doing tests that adequately cover what should be covered. Edge cases, e.g. null inputs, or checking that inputs are numeric by testing an alphabet character or a punctuation character, are likely most of the tests you'd need. To complement this, there may be one or two common cases to handle to show the program can handle a non-edge case as well. Covering all valid input in most programs is overkill and would result in an overwhelmingly large number of tests.
I think the answer to the question you're asking boils down to designing for testability. Often you get a testable design for free when you test-drive the development of the solution. But let's face it, when you're implementing a highly mathematical algorithm, this just doesn't fall out.
To make sure you have a testable design, you need to understand what a seam is. Then you need to know a few rules of thumb, such as avoiding statics, using polymorphism, and properly decomposing problems and separating concerns.
Watch "The Clean Code Talks -- Unit Testing" by Misko Hevery, I think it will help you wrap your head around it.
Try looking at it from a requirements point of view, rather than an implementation point of view. Before you write code, you must understand exactly what you want it to do.
Testing and requirements should be a matching pair. If you're having trouble defining tests, maybe it's because the requirements are not well-defined. That in turn implies that you may have bugs that aren't so much implementation bugs, but "lack of clear requirements" bugs. The code writer in that case would be working to a mental list of requirements that he/she thinks is requirements, but can't be sure, and they're not written down for independent understanding and verification.
I've struggled with software where the requirements weren't clear, because the customer couldn't even tell us what they wanted. But when we delivered to them, they sure could tell us then what they didn't like about it! A big part of software engineering is getting the requirements right before the coding begins. This is true on the high-level (overall product, with requirement input from customer) and also the smaller level (modules, individual functions, where requirements are internally defined by software team or individuals). It is still true to some degree I think for iterative development, although the high-level requirements are more fluid.
@Bystrik Jurina,
I often get involved in projects which involve conversions between disparate data formats. Most answers have focused on testing a B-tree or similar algorithm, but it seems that you're looking for a more general answer.
Most of my work is based on the command line. It may sound like a contradiction, but one of the first tools I use is visualization. I'll write some methods to write out my data structures in a format that's easy to consume. This can (and usually does) include something that's visually clear. But often it also means something that I could easily parse with a smaller test program, or even import into Excel.
I'll start by focusing on the basic outline, and write a program that does the bare minimum of what I need to accomplish. If it's a multi-step process, this might mean implementing one step at a time and validating the results of each step before moving on. Or writing something that works only in specific cases, and then expanding the set of cases where it's expected to work. At first you can validate that the code works in a limited set of cases, such as for known input data. As the project moves forward, you can start logging warnings for cases you might not have tested, or for unexpected types of input data. This has drawbacks, but is a nice approach when you're dealing with a known set of input data.
Validation techniques can include formal test cases, or informal programs that work to challenge your assumptions. It could mean writing a basic driver program to exercise the "core" routines. A good example would be to add a record to a database, then read it back and compare the original object against the one loaded from the database.
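A minimal sketch of such a round-trip check, with a hypothetical in-memory store standing in for the database:

```cpp
#include <cassert>
#include <map>
#include <string>

// Placeholder record type; a real application would use its own.
struct Record {
    int id;
    std::string name;
    bool operator==(const Record& o) const {
        return id == o.id && name == o.name;
    }
};

// Hypothetical in-memory store standing in for a real database.
class Store {
public:
    void add(const Record& r) { rows_[r.id] = r; }
    Record load(int id) const { return rows_.at(id); }
private:
    std::map<int, Record> rows_;
};

int main() {
    Store store;
    Record original{42, "sample"};
    store.add(original);
    // The driver program: write, read back, compare against the original.
    assert(store.load(42) == original);
}
```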
If you have trouble wrapping your head around the way a program functions, think about what it needs to accomplish. It might be easier to write code that tests the way different inputs produce different outputs. Producing visualizations is a good help, because the act of deciding how to display the data can make you think about different conditions and focus in on the most critical parts of your data structures.
Often I've found that building a visualization brings me to admit that the way the data is being stored just isn't very clear. For a B-tree, the representation isn't very flexible. But for other cases, you may be using parallel arrays when a nested tree of objects would be more natural.