Topological Sorting of Basic Blocks in LLVM

I would like to be able to get the basic blocks in a function in topological order. There is an iterator available which iterates over the basic blocks in a function, however I am not sure if it does it in topological order. I wasn't able to get the next basic block for a particular basic block and wasn't able to do topological sorting myself.
You can assume there are no loops in the CFG.

In the general case, this is impossible because BBs don't form a DAG. A topological order is only defined for a DAG - a graph without cycles; BBs within a function may form cycles (loops, etc).
Your best approximation IMHO is to decompose the BB graph into SCCs (Strongly Connected Components). LLVM already has tools to do that: see include/llvm/ADT/SCCIterator.h, and also tools/opt/PrintSCC.cpp.
The latter, in fact, will already print the SCCs of a function in reverse topological order, invoked like this:
$ opt -print-cfg-sccs <bitcode file>
Update (16-Sep-2013): See also this blog post.
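Since the question rules out loops, a reverse post-order walk from the entry block is one way to obtain a topological order. The sketch below is a minimal illustration on a plain adjacency list rather than on real LLVM types: Graph, visitPostOrder and reversePostOrder are hypothetical names, and block indices stand in for BasicBlock pointers. LLVM also provides a reverse post-order iterator in llvm/ADT/PostOrderIterator.h, which may be a better fit inside an actual pass.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a CFG (assumption): succ[b] lists the successors of block b.
using Graph = std::vector<std::vector<std::size_t>>;

// Post-order DFS helper: a block is appended only after all of its
// reachable successors have been appended.
static void visitPostOrder(const Graph &succ, std::size_t block,
                           std::vector<bool> &seen,
                           std::vector<std::size_t> &order) {
  seen[block] = true;
  for (std::size_t next : succ[block])
    if (!seen[next])
      visitPostOrder(succ, next, seen, order);
  order.push_back(block);
}

// Returns the blocks reachable from `entry` in reverse post-order. For a
// graph without cycles this is a topological order.
std::vector<std::size_t> reversePostOrder(const Graph &succ,
                                          std::size_t entry) {
  assert(entry < succ.size() && "entry block must exist");
  std::vector<bool> seen(succ.size(), false);
  std::vector<std::size_t> order;
  visitPostOrder(succ, entry, seen, order);
  std::reverse(order.begin(), order.end());
  return order;
}
```

If the CFG does contain loops, the same walk still terminates, but the result is only an approximation of a topological order, which is where the SCC decomposition comes in.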

Related

Does partition strategy help Gremlin traversal performance?

I tried experimenting with the partition strategy described here: https://tinkerpop.apache.org/docs/current/reference/. Initially, I expected that when I defined a specific partition key for a zone and wrote some vertices to it, it would index that zone and improve vertex lookup. Eventually, I realized that the partition key is just another property value defined on a vertex. In other words, this code is nothing more than a property value lookup, which leads to a full graph scan:
g.withStrategies(new PartitionStrategy(partitionKey: "_partition", writePartition: "a",
readPartitions: ["a"]));
I'm not sure what the underlying logic of PartitionStrategy is, but it does not seem to improve the lookup if it really does a full graph scan. Correct me if I'm wrong.
From TinkerPop's perspective, PartitionStrategy just automatically modifies your Gremlin to take advantage of a particular property in the graph. TinkerPop doesn't know anything about your graph database's underlying indexing features, nor does it implement any. It is up to your graph to optimize such things. Some graphs might do that on their own, some might offer you the opportunity to create indices that would help improve the speed of PartitionStrategy, and others might do nothing at all, in which case PartitionStrategy will not work well for all use cases.
Going back to TinkerPop's perspective, the goal of PartitionStrategy (and SubgraphStrategy for that matter) is more to ease the manner with which Gremlin is written for use cases where parts of the graph need to be hidden. Without it, you would have lots and lots of repetitive filters mixed into your traversal which would muddy its readability.
Consider this bit of code:
graph = TinkerGraph.open()
strategy = new PartitionStrategy(partitionKey: "_partition", writePartition: "a", readPartitions: ["a"])
g = traversal().withEmbedded(graph).withStrategies(strategy)
g.addV().addE('link')
g.V().out().out().out()
The traversal is quite readable and straightforward. It is easy to understand the intent - a three step hop. But that's not really the traversal that executed. What executed was:
g.V().out().has('_partition',within("a")).
out().has('_partition',within("a")).
out().has('_partition',within("a"))
If you are using PartitionStrategy then you need to be sure it suits your graph database as well as your use case.

Traverse through a DAG-like structure to produce another DAG-like structure in Clojure

I have a DAG-like structure that is essentially a deeply-nested map. The maps in this structure can have common values, so the overall structure is not a tree but a directed acyclic graph. I'll refer to this structure as a DAG for brevity.
The nodes in this graph belong to a finite number of different categories. Each category can have its own structure/keywords/number-of-children. There is one unique node that is the source of this DAG, meaning that from this node we can reach all nodes in the DAG.
The task is to traverse the DAG from the source node and convert each node into one or more nodes in a newly constructed graph. I'll give an example for illustration.
The graph in the upper half is the input one. The lower half is the one after transformation. For simplicity, the transformation is only done on node A where it is split into node 1 and A1. The children of node A are also reallocated.
What I have tried (or have in mind):
Write a function that converts one object, handling each of the different types. Inside this function, recursively call itself to convert each of the children. This method suffers from the problem that data are immutable: the nodes in the transformed graph cannot be changed arbitrarily to add children. To overcome this, I would need to wrap every node in a ref/atom/agent.
Do a topological sort on the original graph, then convert the nodes in reverse order, i.e., bottom-up. This method requires an extra traversal of the graph, but at least the data need not be mutable. Regarding the topological sort algorithm, I'm considering the DFS-based method described on the wiki page, which requires neither knowledge of the full graph nor of a node's parents.
My questions are:
Are there any other approaches you might consider, possibly more elegant/efficient/idiomatic?
I'm more in favour of the second method; are there any flaws or potential problems?
Thanks!
EDIT: On second thought, a topological sort is not necessary. The transformation can already be done during a post-order traversal.
This looks like a perfect application of Zippers. They have all the capabilities you described as needed and can produce the edited 'new' DAG. There are also a number of libraries that ease the search and replace capability using predicate threads.
I've used zippers when working with OWL ontologies defined in nested vector or map trees.
Another option would be to take a look at Walkers although I've found these a bit more tedious to use.

What's the most efficient method of obtaining a topologically ordered Function list?

Suppose we have a module containing some functions with no recursive calls (so the call graph is a DAG). What is the most efficient method of obtaining a vector of Function*'s from the module, sorted topologically by call order?
By topological order I mean that if foo() calls bar() then foo will appear before bar in the sorted list.
Is there any analysis pass which can give me this info, or do I have to write my own sorting routine?
While I'm not familiar with an existing pass that does exactly what you want, there's code in LLVM that's very close and I'm sure you can use it to solve your problem quickly. It's in the IPA (Inter-procedural analysis) library, in lib/Analysis/IPA. In particular, look at lib/Analysis/IPA/CallGraph.cpp - it builds a call graph in a module. Sorting such a graph topologically should be fairly easy.
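As a rough illustration of the sorting step, here is a minimal sketch of Kahn's algorithm on a plain adjacency list. It does not use LLVM's CallGraph API: CallEdges and callersFirstOrder are hypothetical names, and function indices stand in for Function* values. The idea is that once the caller-to-callee edges have been extracted from the call graph, a function can be emitted as soon as all of its callers have been emitted.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical stand-in for a call graph: callees[f] lists the functions
// (by index) that function f calls.
using CallEdges = std::vector<std::vector<std::size_t>>;

// Kahn's algorithm: repeatedly emit a function with no remaining callers,
// so every caller appears before the functions it calls.
std::vector<std::size_t> callersFirstOrder(const CallEdges &callees) {
  std::vector<std::size_t> remainingCallers(callees.size(), 0);
  for (const auto &targets : callees)
    for (std::size_t callee : targets) {
      assert(callee < callees.size() && "callee index out of range");
      ++remainingCallers[callee];
    }

  std::queue<std::size_t> ready;
  for (std::size_t f = 0; f < callees.size(); ++f)
    if (remainingCallers[f] == 0)
      ready.push(f);

  std::vector<std::size_t> order;
  while (!ready.empty()) {
    std::size_t f = ready.front();
    ready.pop();
    order.push_back(f);
    for (std::size_t callee : callees[f])
      if (--remainingCallers[callee] == 0)
        ready.push(callee);
  }
  // With recursion (a cycle) some functions are never emitted, so the
  // result is shorter than the input.
  return order;
}
```

Comparing the size of the result with the number of functions is a cheap way to detect that the "no recursive calls" assumption does not hold.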

Dijkstra's algorithm with a 2D array

For the past few days I've tried to implement this algorithm. So far I've managed to make a dynamic 2D array and insert the distances between nodes, a function to remove a path between nodes, and a function that tells me if there is a path between two nodes.
Now I would like to implement a function that returns the shortest path from node A to node B. I know how Dijkstra's algorithm works and I've read the pseudocode on the wiki, but I haven't been able to write any code myself. I'm really stuck here.
I've been thinking about what the code should look like and what should happen, which is why I wrote the function that tells me if there's a path between two nodes. Do I need any more helper functions that would make implementing Dijkstra's algorithm easier?
For now I have only 3 nodes but the code I would like to write needs to work in general for n nodes.
Any kind of help is appreciated.
You are probably thinking too much.
You need two things: a clean graph structure you understand, and a good description of the algorithm you understand.
If you have both, just start writing some code. The helpers you need will become obvious along the way.
-- edit --
You will probably need some of the following data structures:
std::vector
std::list
std::priority_queue
I found several implementations of this algorithm, but it is probably best to start with the simplest one in order to understand it better. You can then check the differences between yours and this one, and complete yours. It is always better to program it your own way.
Have a look at this one and see if it helps.
http://vinodcse.wordpress.com/2006/05/19/code-for-dijkstras-algorithm-in-c-2/
Good luck.
Edit: Code deleted; here are some hints instead:
Store the graph as a list of adjacency lists, one per vertex (something like vector < vector < pair<int,int> > > g (n);).
Use some data structure to keep track of the vertex with the minimal distance in the current state (maybe a set or a priority_queue, to get O(m log(n)) complexity).
Each time, take the highest-priority vertex (the one with the minimal current distance), delete it from your data structure, and update the distances of the vertices adjacent to it.
Note: If you want the minimal path as well, keep a vector<int> previous, and each time you update the distance of a vertex (say v), set previous[v] to the index of the vertex you came from. Your path is then last, previous[last], previous[previous[last]], ..., first, read in reverse order.
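The hints above can be put together roughly as follows. This is a minimal sketch, not the deleted code: it assumes non-negative edge weights and directed edges stored as (neighbor, weight) pairs, and the names AdjList, kUnreachable, dijkstra and shortestPath are made up for the illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// g[u] holds (neighbor, edge weight) pairs, as in the hints.
using AdjList = std::vector<std::vector<std::pair<int, int>>>;

const long long kUnreachable = std::numeric_limits<long long>::max();

// Fills `dist` with the shortest distances from `source` and `previous`
// with each vertex's predecessor on its shortest path (-1 if none).
// Assumes all edge weights are non-negative.
void dijkstra(const AdjList &g, int source, std::vector<long long> &dist,
              std::vector<int> &previous) {
  assert(source >= 0 && source < static_cast<int>(g.size()));
  dist.assign(g.size(), kUnreachable);
  previous.assign(g.size(), -1);

  using Entry = std::pair<long long, int>; // (distance, vertex)
  std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> pq;
  dist[source] = 0;
  pq.push(Entry(0, source));

  while (!pq.empty()) {
    Entry top = pq.top(); // vertex with the minimal current distance
    pq.pop();
    long long d = top.first;
    int u = top.second;
    if (d > dist[u])
      continue; // stale queue entry; a shorter distance was found already
    for (const std::pair<int, int> &edge : g[u]) {
      int v = edge.first;
      long long candidate = d + edge.second;
      if (candidate < dist[v]) {
        dist[v] = candidate;
        previous[v] = u; // remember where we came from
        pq.push(Entry(candidate, v));
      }
    }
  }
}

// Walks `previous` back from `target` and reverses the result. An empty
// vector means `target` is not reachable from `source`.
std::vector<int> shortestPath(const std::vector<int> &previous, int source,
                              int target) {
  std::vector<int> path;
  for (int v = target; v != -1; v = previous[v])
    path.push_back(v);
  std::reverse(path.begin(), path.end());
  if (path.front() != source)
    path.clear();
  return path;
}
```

A 2D distance array can be converted to this representation by adding one pair per non-empty cell. Instead of deleting outdated entries from the queue, the sketch leaves them in place and skips them when they are popped.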

Is it possible to implement a recursive Algorithm with an Iterator?

I have been given a tree like this:
http://www.seqan.de/dddoc/html/streePreorder.png
I can access each node with the next operator.
// postorder dfs
Iterator< Index<String<char> >, BottomUp<> >::Type myIterator(myIndex);
for (; !atEnd(myIterator); goNext(myIterator))
// do something with myIterator
But I want to use a recursive algorithm on the tree.
Is there a way I can make the recursive algorithm (exclude the biggest subtree on each node) iterative?
Or how can I access the elements non-recursively?
Edit:
The actual problem:
I have been given a recursive algorithm that works on trees.
I also use a library where I can only access the items with an iterator (non-standard, iterative).
recursive <-> iterative.
How can I solve this?
You can convert that recursive function to an iterative function with the help of a stack.
// depth-first traversal pseudo-code (a stack gives depth-first order; a queue would give breadth-first)
push root onto a stack
while (stack isn't empty)
    pop element off stack
    push its children
    perform action on current node
Depending on how you want to traverse the nodes, the implementation will differ. All recursive functions can be transformed into iterative ones, but the exact transformation depends on the specific problem. Using a stack or queue, or rewriting the recursion as a loop, are common methods that should cover most situations.
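To make the stack-based conversion concrete, here is a minimal sketch that puts a recursive pre-order traversal next to an equivalent iterative one. TreeNode, preorderRecursive and preorderIterative are hypothetical names for this illustration and are not part of SeqAn. Both functions assume that child pointers are non-null.

```cpp
#include <cassert>
#include <vector>

// Hypothetical tree node used only for this illustration.
struct TreeNode {
  int value;
  std::vector<TreeNode *> children;
};

// Recursive pre-order traversal: visit the node, then each subtree.
void preorderRecursive(const TreeNode *node, std::vector<int> &out) {
  assert(node != nullptr && "nodes must be non-null");
  out.push_back(node->value);
  for (const TreeNode *child : node->children)
    preorderRecursive(child, out);
}

// The same traversal with an explicit stack replacing the call stack.
std::vector<int> preorderIterative(const TreeNode *root) {
  std::vector<int> out;
  std::vector<const TreeNode *> stack;
  stack.push_back(root);
  while (!stack.empty()) {
    const TreeNode *node = stack.back();
    stack.pop_back();
    assert(node != nullptr && "nodes must be non-null");
    out.push_back(node->value);
    // Push children right to left so the leftmost child is popped first.
    for (auto it = node->children.rbegin(); it != node->children.rend(); ++it)
      stack.push_back(*it);
  }
  return out;
}
```

Swapping the stack for a queue would turn the iterative version into a breadth-first traversal. A post-order variant needs slightly more bookkeeping, because a node must stay on the stack until its children are done.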
You should also look into tail recursion and how to identify it, as tail-recursive functions translate nicely into a loop; many compilers even do this for you.
Some more mathematically oriented recursive functions can be solved with recurrence relations. You are unlikely to come across one that hasn't been solved yet, but it might interest you.
//edit, performance?
It really depends on your implementation and the size of the tree. If your recursion goes very deep, you may get a stack overflow, while an iterative version will perform fine. I would get a better grasp of how recursion uses memory; then you should be able to decide which is better for your situation. Here is an example of this type of analysis with the Fibonacci numbers.
If your iterator only supports forward (and possibly backward) traversal, but not following links on the tree or fast random access, you will have a very hard time adapting a tree algorithm to it. However, in the end any answer will depend on the interface presented by your custom iterators, which you have not provided.
For example, consider the easy algorithm of tree search. If the only operation given by your iterator is "start from the first element and move on one-by-one", you obviously cannot implement tree search efficiently. Nor can you implement binary search. So you must provide a list of exactly what operations are supported, and (critically) the complexity bounds for each.
If this is the question you are asking: any recursive function can alternatively be implemented with stacks.
Here is an article by Phil Haack on the subject.
Performance gains one way or the other are speculative; the compiler does things with our code behind the scenes that we can't always predict. Implement both and get some real numbers. If they are similar, use the one that you find more readable.
Even with recursive iteration, you end up with a node-per-node visit.
What you need to know is: how can my iterator be told to go depth-first, and then: how will I be notified that one level has started/ended (i.e. the start/end of a recursion step).
That knowledge can be mapped onto the recursive algorithm.