I am currently learning to write LLVM passes. I have managed to create some simple passes, but now I want to get the loops in a function and their instructions. I just learned that this is not as simple as getting instructions from functions and basic blocks.
I have some starting code:
for (BasicBlock &B : F)
{
    for (Instruction &Inst : B)
    {
        //maybe get loops and insts here(?)
    }
}
What should I do to get the loops?
You can access the function from within a loop pass via BasicBlock::getParent() in case you need it. To do this in a function pass, you can initialize LoopInfo and use LoopInfo::getLoopFor(BasicBlock *). See: llvm.org/doxygen/DeadStoreElimination_8cpp_source.html
All the LoopPasses are invoked for each loop. For example: llvm/lib/Transforms/Scalar/LoopInstSimplify.cpp. You can make changes in one of the loop passes to get started. There are many examples in the llvm/lib/Transforms/Scalar/ directory.
now I want to get loops and its instructions
You can iterate through all the BasicBlocks of a loop using a graph traversal algorithm like Depth First, Reverse Post Order etc.
To get started with loop transformations, you should also learn about:
Natural loops
LoopAccessInfo
DominatorTree
Traversal algorithms like Depth First, Reverse Post Order
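To make the terms "natural loop" and "dominator" from the list above concrete, here is a minimal sketch on a toy CFG. It deliberately uses plain integers and std::vector rather than LLVM types, so CFG, computeDominators, naturalLoop and findNaturalLoops are all hypothetical names for illustration and not part of the LLVM API. It assumes block 0 is the entry and every block is reachable from it. A back edge is an edge tail -> header where the header dominates the tail; the natural loop is the header plus every block that can reach the tail without passing through the header.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy CFG (not LLVM types): succ[i] lists the successors of block i.
using CFG = std::vector<std::vector<int>>;

// Builds predecessor lists from successor lists.
static std::vector<std::vector<int>> predecessors(const CFG &succ) {
    std::vector<std::vector<int>> pred(succ.size());
    for (int u = 0; u < static_cast<int>(succ.size()); ++u)
        for (int v : succ[u]) pred[v].push_back(u);
    return pred;
}

// dom[b] = set of blocks that dominate b, computed by the classic
// iterative data-flow formulation (simple, not the fastest algorithm).
std::vector<std::set<int>> computeDominators(const CFG &succ) {
    const int n = static_cast<int>(succ.size());
    assert(n > 0);
    const std::vector<std::vector<int>> pred = predecessors(succ);
    std::set<int> all;
    for (int i = 0; i < n; ++i) all.insert(i);
    std::vector<std::set<int>> dom(n, all);
    dom[0] = {0};  // the entry block is dominated only by itself

    bool changed = true;
    while (changed) {
        changed = false;
        for (int b = 1; b < n; ++b) {
            // Intersect the dominator sets of all predecessors, then add b.
            std::set<int> meet = all;
            for (int p : pred[b]) {
                std::set<int> tmp;
                for (int d : meet)
                    if (dom[p].count(d)) tmp.insert(d);
                meet = tmp;
            }
            meet.insert(b);
            if (meet != dom[b]) { dom[b] = meet; changed = true; }
        }
    }
    return dom;
}

// Body of the natural loop for the back edge tail -> header.
std::set<int> naturalLoop(const CFG &succ, int tail, int header) {
    const std::vector<std::vector<int>> pred = predecessors(succ);
    std::set<int> body{header};
    std::vector<int> work;
    if (body.insert(tail).second) work.push_back(tail);
    while (!work.empty()) {
        int b = work.back();
        work.pop_back();
        // Walk backwards; the header is already in body, so the walk stops there.
        for (int p : pred[b])
            if (body.insert(p).second) work.push_back(p);
    }
    return body;
}

// Finds every back edge and returns the corresponding loop bodies.
std::vector<std::set<int>> findNaturalLoops(const CFG &succ) {
    const std::vector<std::set<int>> dom = computeDominators(succ);
    std::vector<std::set<int>> loops;
    for (int u = 0; u < static_cast<int>(succ.size()); ++u)
        for (int v : succ[u])
            if (dom[u].count(v)) loops.push_back(naturalLoop(succ, u, v));
    return loops;
}
```

For the CFG 0 -> 1 -> 2 -> 3 with an extra edge 2 -> 1, the only back edge is 2 -> 1 and the loop body is {1, 2}. In a real pass you would let LoopInfo and DominatorTree do this work instead.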
Related
I would like to be able to get the basic blocks in a function in topological order. There is an iterator available which iterates over the basic blocks in a function, however I am not sure if it does it in topological order. I wasn't able to get the next basic block for a particular basic block and wasn't able to do topological sorting myself.
You can assume there are no loops in the CFG.
In the general case, this is impossible because BBs don't form a DAG. A topological order is only defined for a DAG - a graph without cycles; BBs within a function may form cycles (loops, etc).
Your best approximation IMHO is to decompose the BB graph into SCCs (Strongly Connected Components). LLVM already has tools to do that: see include/llvm/ADT/SCCIterator.h, and also tools/opt/PrintSCC.cpp.
The latter, in fact, will already print the SCCs of a function in reverse topological order, invoked like this:
$ opt -print-cfg-sccs <bitcode file>
Update (16-Sep-2013): See also this blog post.
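If the CFG really is acyclic, as the question assumes, a reverse post-order walk from the entry block gives a topological order. The sketch below shows the idea on a toy graph of integer block ids; CFG, postOrder and reversePostOrder are hypothetical names, not LLVM types. LLVM has its own helper for this, ReversePostOrderTraversal in llvm/ADT/PostOrderIterator.h, which is what you would normally use inside a pass.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy CFG (not LLVM types): succ[i] lists the successors of block i.
using CFG = std::vector<std::vector<int>>;

// Depth-first post-order: a block is emitted after all of its successors.
static void postOrder(const CFG &succ, int b, std::vector<bool> &seen,
                      std::vector<int> &out) {
    seen[b] = true;
    for (int s : succ[b])
        if (!seen[s]) postOrder(succ, s, seen, out);
    out.push_back(b);
}

// Reverse post-order from the entry block. On an acyclic CFG this is a
// topological order of the blocks reachable from entry.
std::vector<int> reversePostOrder(const CFG &succ, int entry) {
    assert(entry >= 0 && entry < static_cast<int>(succ.size()));
    std::vector<bool> seen(succ.size(), false);
    std::vector<int> order;
    postOrder(succ, entry, seen, order);
    std::reverse(order.begin(), order.end());
    return order;
}
```

On a diamond-shaped CFG (0 branches to 1 and 2, both of which fall through to 3), every valid result lists 0 first and 3 last. If the graph does contain cycles, the result is still a useful visiting order, but it is no longer a topological order.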
I'm sorry for the tricky title, here is the example
graph main_graph = //initialize graph
graph sub_graph = //pick a subset of edges from the main_graph
while (sub_graph.size() != 0) {
    select_edge(); //here I pick an edge based on some heuristics
    reduce_graph(); //here I remove some edges from the main_graph
    sub_graph = //pick a subset of edges from the main_graph
}
So the point is that I have to write the very same code to define the sub_graph before entering the loop (because it could be already empty) and right before entering a new iteration.
This would not be that bad if I did not actually have three nested loops with the same problem, and the code to initialize the sub_graph is a bunch of lines, so my code would contain a lot of duplication.
Any suggestion on how to better design this loop(s)? I have no restrictions (can use for, do-while...)
This is pseudo-code because it is more of a 'design' question, but I'm coding in C++!
To avoid repeating lots of code, put the code in a function:
graph calc_subgraph(...) {...}
Then use it to initialize and recalculate your values:
for (graph subgraph = calc_subgraph(...); subgraph.size() != 0; subgraph = calc_subgraph(...))
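A runnable sketch of that for loop might look like the following. The question does not say what the graph type or the selection rule are, so Edge, Graph, the `limit` parameter and reduce_until_empty are assumptions made up for illustration; only the shape of the loop is the point.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;
using Graph = std::vector<Edge>;

// Hypothetical selection rule: keep edges whose endpoints differ by more than limit.
Graph calc_subgraph(const Graph &main_graph, int limit) {
    Graph sub;
    for (const Edge &e : main_graph)
        if (e.second - e.first > limit) sub.push_back(e);
    return sub;
}

// The subgraph is computed in exactly one place (calc_subgraph) and the for
// header calls it both before the first iteration and after every iteration.
// Returns the number of iterations performed.
std::size_t reduce_until_empty(Graph &main_graph, int limit) {
    assert(limit >= 0);
    std::size_t iterations = 0;
    for (Graph sub = calc_subgraph(main_graph, limit); !sub.empty();
         sub = calc_subgraph(main_graph, limit)) {
        const Edge picked = sub.front();  // stand-in for select_edge()
        // Stand-in for reduce_graph(): drop the picked edge from the main graph.
        main_graph.erase(
            std::remove(main_graph.begin(), main_graph.end(), picked),
            main_graph.end());
        ++iterations;
    }
    return iterations;
}
```

With nested loops, each level gets its own small helper in the same way, so the initialization code is written once per level.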
If the code to initialize sub_graph is a lot of lines, then write a function that returns an initialized graph, or a function that initializes sub_graph via a passed reference/pointer. Then just call the function inside the loop. That will reduce the amount of code that you have to write and read. Loops that need the same code twice show up from time to time.
Writing loops that output comma separated lists is a good example of this, since you want the commas to stay on the inside of the list items. So you can either do the first item before the loop, or remove a comma after the loop.
In these kinds of cases, calling the initializing code before the loop, and then again at the end of each loop iteration, may be faster than putting a conditional test in the loop to skip parts of it every time.
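The comma-separated list case can be sketched like this, handling the first item before the loop so that the loop body needs no conditional. The function name join is a hypothetical helper, not a standard library function.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Joins items with sep between them: the first item is written before the
// loop, and every later item is written as "separator, then item".
std::string join(const std::vector<std::string> &items, const std::string &sep) {
    std::string out;
    if (items.empty()) return out;
    out = items.front();
    for (std::size_t i = 1; i < items.size(); ++i) {
        out += sep;
        out += items[i];
    }
    return out;
}
```

Calling join with {"a", "b", "c"} and ", " gives "a, b, c", and an empty input gives an empty string.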
I'd like to run two dependent non-nested for loops. In essence they are two simultaneous Markov chains, where one loop needs to check a value in the other loop. Is there a right way to do this? Is there a wrong/inefficient way to avoid?
Imaginary example:
Imagine two people are walking round a room and touching things: I record those things they touch in two separate arrays. Those are my two Chains or for loops. That's fine as long as their behaviour is independent. But I'd like to change that and so they will have to react (in real-time) to what the other person is doing. Is this possible to do (surely yes)?
For example, Loop 1 would look something like
for k=1:n
    do something
    %check loop 2
    if something is equivalent
        moves=n;
    end
end
NB. Technically it could be done one loop after the other, but I'm looking to run something in real-time if possible.
You probably want to construct this as one for loop that processes both chains simultaneously. In pseudocode:
for k = 1:n
    compute step k of chain 1
    compute step k of chain 2
    deal with interaction between chains
end
You will want to package each chain in a data structure that can be passed to a function, so that you do not have to repeat the "compute step k" code twice with variable names modified.
Worry about parallelizing only if this serial approach is too slow.
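A minimal C++ sketch of this single-loop layout follows. The Chain struct, the deterministic step rule and the "stop when both chains land on the same position" interaction are all assumptions chosen to keep the example reproducible; a real Markov chain would draw its next state at random inside step.

```cpp
#include <cassert>
#include <vector>

// One chain's state, packaged so the same step function serves both chains.
struct Chain {
    int position;              // current state
    int stride;                // hypothetical per-chain step size
    std::vector<int> history;  // every state visited so far
};

// Hypothetical transition rule on a ring of `states` positions.
void step(Chain &c, int states) {
    c.position = (c.position + c.stride) % states;
    c.history.push_back(c.position);
}

// Advances both chains in lockstep for at most n steps. Returns the 1-based
// step at which they first occupy the same position, or 0 if that never happens.
int runUntilMeet(Chain &a, Chain &b, int states, int n) {
    assert(states > 0);
    for (int k = 1; k <= n; ++k) {
        step(a, states);  // compute step k of chain 1
        step(b, states);  // compute step k of chain 2
        if (a.position == b.position) return k;  // interaction between chains
    }
    return 0;
}
```

Because both chains advance inside the same iteration, each one can read the other's current state directly, with no threads or synchronization involved.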
I have got this problem:
Find the first element in a list, for which a given condition holds.
Unfortunately, the list is quite long (100,000 elements), and evaluating the condition for every element takes about 30 seconds in total using a single thread.
Is there a way to cleanly parallelize this problem? I have looked through all the tbb patterns, but could not find any fitting.
UPDATE: for performance reason, I want to stop as early as possible when an item is found and stop processing the rest of the list. That's why I believe I cannot use parallel_while or parallel_do.
I'm not too familiar with libraries for this, but just thinking aloud, could you not have a group of threads iterating at the same stride from different starting points?
Say you decide to have n threads (= number of cores or whatever). Each thread is given its own starting offset below n: the first thread starts on begin(), and the next item it compares is begin() + n, and so on; the second thread starts on begin() + 1, and its next comparison is also n items later, and so on.
This way you can have a group of threads iterating in parallel through the list; the iteration itself is presumably not expensive, just the comparison. No node will be compared more than once, and you can have a flag that is set when any thread finds a match; every thread should check this flag before iterating/comparing.
I think it's pretty straightforward to implement(?)
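A sketch of the strided idea using only the C++ standard library is below. It makes two assumptions that go beyond the answer: the data is in a std::vector (so each thread can jump by n cheaply), and instead of a plain "found" flag it keeps the smallest matching index in an atomic, so that the result is the first match even when a later match is found earlier in wall-clock time. parallelFindFirst is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Returns the index of the first element satisfying pred, or data.size()
// if no element does.
std::size_t parallelFindFirst(const std::vector<int> &data,
                              const std::function<bool(int)> &pred,
                              unsigned numThreads) {
    assert(numThreads > 0);
    const std::size_t n = data.size();
    std::atomic<std::size_t> best(n);  // smallest matching index seen so far

    auto worker = [&](std::size_t start) {
        // Thread `start` checks indices start, start + numThreads, ...
        // and stops once its index is past the best known match.
        for (std::size_t i = start; i < best.load(); i += numThreads) {
            if (pred(data[i])) {
                // Atomically lower `best` to i if i is smaller.
                std::size_t cur = best.load();
                while (i < cur && !best.compare_exchange_weak(cur, i)) {
                }
                break;  // later indices in this stride are all larger
            }
        }
    };

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < numThreads; ++t) threads.emplace_back(worker, t);
    for (std::thread &th : threads) th.join();
    return best.load();
}
```

Each thread only gives up once its own index exceeds the best match so far, so every index smaller than the final answer has been checked by some thread. With an expensive predicate the atomic load per iteration should be negligible, but that is worth measuring.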
I think the best way to solve this problem with TBB is parallel_pipeline.
There should be (at least) two stages in the pipeline. The 1st stage is serial; it just reads the next element from the list and passes it to the 2nd stage. This 2nd stage is parallel; it evaluates the condition of interest for a given element. As soon as the condition is met, the second stage sets a flag (which should be either atomic or protected with a lock) to indicate that a solution is found. The first stage must check this flag and stop reading the list once the solution is found.
Since condition evaluation is performed in parallel for a few elements, it can happen that a found element is not the first suitable one in the list. If this is important, you also need to keep an index of the element, and when a suitable solution is found you detect whether its index is less than that of a previously known solution (if any).
HTH.
ok, I have done it this way:
Put all elements into a tbb::concurrent_bounded_queue<Element> elements.
Create an empty tbb::concurrent_vector<Element> results.
Create a boost::thread_group, and create several threads that run this logic:
logic to run in parallel:
Element e;
while (results.empty() && elements.try_pop(e)) {
    if (slow_and_painfull_check(e)) {
        results.push_back(e);
    }
}
So when the first element is found, all other threads will stop processing the next time they check results.empty().
It is possible that two or more threads are working on an element for which slow_and_painfull_check returns true, so I just put the result into a vector and deal with this outside of the parallel loop.
After all threads in the thread group have finished, I check all elements in the results and use the one that comes first.
you can take a look at http://gcc.gnu.org/onlinedocs/libstdc++/manual/parallel_mode.html for parallel algorithms implementations.
And in particular you need find_if algorithm http://www.cplusplus.com/reference/algorithm/find_if/
I see two opportunities for parallelism here: evaluating one element on multiple threads, or evaluating multiple elements at once on different threads.
There isn't enough information to determine the difficulty or the effectiveness of evaluating one element on multiple threads. If this is easy, the roughly 30 seconds of total evaluation time could be reduced.
I do not see a clean fit into TBB for this problem. There are issues with lists not having random access iterators, determining when to stop, and guaranteeing the first element is found. There may be some games you can play with the ranges to get it to work though.
You could use some lower-level thread constructs to implement this yourself as well, but there are a number of places where incorrect results could be returned. To prevent such errors, I would recommend using an existing algorithm. You could convert the list to an array (or some other structure with random access iterators) and use the experimental libstdc++ Parallel Mode find_if algorithm user383522 referenced.
If it's a linked list, a parallel search isn't going to add much speed. However, linked lists tend to perform poorly with caches. You may get a tiny performance increase if you have two threads: one does the find_first_element, and one simply iterates through the list, making sure not to get more than X (100?) ahead of the first thread. The second thread doesn't do any comparisons, but will ensure that the items are cached as well as possible for the first thread. This may help your time, or it might make little difference, or it might hinder. Test everything.
Can't you transform the list to a balanced tree or similar? Such data structures are easier to process in parallel, and usually you get back the overhead you paid for balancing it in the first place. For example, if you write functional-style code, check this paper: Balanced trees inhabiting functional parallel programming
If you are using GCC, GNU OpenMP provides parallel std functions
link
I've never heard of the Intel tbb library but a quick open and scan of the Tutorial led me to parallel_for which seems like it will do the trick.
For the past few days I've tried to implement this algorithm. So far I've managed to make a dynamic 2D array and insert the distances between nodes, a function to remove a path between nodes, and a function that tells me if there is a path between two nodes.
Now I would like to implement a function that returns the shortest path from node A to node B. I know how Dijkstra's algorithm works and I've read the pseudocode on the wiki, but I haven't been able to write any code myself. I'm really stuck here.
I've been thinking about what the code should look like and what should happen; that's why I've made the function that tells me if there's a path between two nodes. Do I need any more helper functions that would make implementing Dijkstra's algorithm easier?
For now I have only 3 nodes but the code I would like to write needs to work in general for n nodes.
Any kind of help is appreciated.
You are probably thinking too much.
You need two things: a clean graph structure you understand, and a good description of the algorithm you understand.
If you have both, just start writing some code. The helpers you need will become obvious on the way.
-- edit --
You will probably need some of the following data structures:
std::vector
std::list
std::priority_queue
I found several implementations of this algorithm, but it is probably best to start with the simplest one in order to understand it better. You can then check the differences between yours and this one, and complete yours. It is always better to program it your own way.
Have a look at this one and see if it helps.
http://vinodcse.wordpress.com/2006/05/19/code-for-dijkstras-algorithm-in-c-2/
Good luck.
Edit: Code deleted, and I'm going to give hints:
Store the graph as a list of adjacency lists, one per vertex (something like vector < vector < pair<int,int> > > g (n);).
Use some data structure to keep track of the vertex with the minimal distance in the current state (maybe set or priority_queue, to get O(m log(n)) complexity).
Each time, take the highest-priority vertex (the vertex with the minimal current distance), delete it from your data structure, and update the distances of the vertices adjacent to it.
Note: If you want to get the minimal path as well, keep a vector<int> previous, and each time you update the distance of a vertex (say v), set previous[v] = index of the vertex you came from. Your path is last, previous[last], previous[previous[last]], ..., first, in reversed order.
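Putting those hints together, one possible sketch looks like this. It assumes non-negative integer weights and a directed adjacency list of (neighbour, weight) pairs; the names Graph and shortestPath are made up for illustration. It uses std::priority_queue with "lazy deletion": outdated queue entries are skipped when popped rather than removed eagerly.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// g[u] holds (neighbour, weight) pairs; weights are assumed non-negative.
using Graph = std::vector<std::vector<std::pair<int, int>>>;

// Returns the vertices on a shortest path from src to dst, in order,
// or an empty vector if dst cannot be reached from src.
std::vector<int> shortestPath(const Graph &g, int src, int dst) {
    const int n = static_cast<int>(g.size());
    assert(src >= 0 && src < n && dst >= 0 && dst < n);

    const long long INF = std::numeric_limits<long long>::max();
    std::vector<long long> dist(n, INF);
    std::vector<int> previous(n, -1);  // previous[v] = vertex we reached v from

    using Item = std::pair<long long, int>;  // (distance, vertex)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    dist[src] = 0;
    pq.push({0, src});

    while (!pq.empty()) {
        const Item top = pq.top();
        pq.pop();
        const long long d = top.first;
        const int u = top.second;
        if (d > dist[u]) continue;  // stale entry: u was already settled
        for (const std::pair<int, int> &edge : g[u]) {
            const int v = edge.first;
            const long long nd = d + edge.second;
            if (nd < dist[v]) {
                dist[v] = nd;
                previous[v] = u;
                pq.push({nd, v});
            }
        }
    }

    std::vector<int> path;
    if (dist[dst] == INF) return path;
    // Walk back dst, previous[dst], ... until src, then reverse.
    for (int v = dst; v != -1; v = previous[v]) path.push_back(v);
    std::reverse(path.begin(), path.end());
    return path;
}
```

For an undirected graph, add each edge in both directions. If you keep the 2D distance array from the question instead, the same loop works with the inner for replaced by a scan over row u.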