I want to do the DFS(Depth First Search) traversal on the call graph generated by llvm i.e. I am using following code, but stuck at how to proceed further?
bool runOnModule(Module &M) override
{
CallGraph cg = CallGraph(M);
cg.dump();
CallGraph::iterator beg = cg.begin();
CallGraph::iterator end = cg.end();
return false;
}
The above code is only dumping the callGraph. But I want to do DFS traversal on it starting from main method. I am using clang as front end. How to do it?
The depth first traversal is very simple once you learn to use the LLVM graph iterators:
#include <llvm/ADT/DepthFirstIterator.h>
bool runOnModule(Module &M) override {
CallGraph CG = CallGraph(M);
for (auto IT = df_begin(&CG), EI = df_end(&CG); IT != EI; IT++) {
if (Function *F = IT->getFunction()) {
dbgs() << "Visiting function: " << F->getName() << "\n";
}
}
return false;
}
From the df_iterator you can use getPathLength() and getPath(unsigned) member functions to check your position during the visitation or even skip a specific sub-tree withskipChildren() (but be careful with mixing this with the operator++).
You can also use the post-order traversal or a strongly connected component iterator.
Related
I am facing a problem, where I am looking for a special type of node in a graph.
The algorithm works in following way:
bool findSpecial(Node n)
{
if(isSpecial(n))
return true;
bool isSpecial = false;
vector<Node> childs = getChildren(n);
foreach(child, childs)
{
isSpecial |= findSpecial(child);
}
if(isSpecial)
markCurrentNodeSpecial(n);
return isSpecial;
}
Above is a template of the algorithm, assuming that the input is DAG.
It looks for special nodes in the graph and marks current node special if any node in its the DFS tree is special.
The algorithm is basically populating this special attribute wherever it is reachable.
It can however lead to Stack Overflow in some rare cases.
I am trying to figure out if the same thing can be done iteratively or not.
The problem with iterative approach is that the information whether all the children of a Node are processed is not readily available.
Any suggestion?
1) The simplest solution - would std::stack<Node> work? You should make it work like a program stack works - push the starting Node there, then pop a node, check if it's special, if not - push its children, repeat.
2) You don't check if you have already visited a node - maybe you can speed it up a bit. (Edited: I can't read)
Update: yes, this was an interesting beast. How about this code? I didn't test it though, but the main idea should work: split visitation into two steps - before and after recursion.
struct StackNodeEntry
{
Node cur;
optional<Node> parent;
enum class Pos
{
before_recursion, after_recursion
} pos;
};
bool findSpecial(Node root)
{
stack<StackNodeEntry> stack;
stack.push({ root, {}, StackNodeEntry::Pos::before_recursion });
while (!stack.empty())
{
auto& e = stack.top();
switch (e.pos)
{
case StackNodeEntry::Pos::before_recursion:
if (isSpecial(e.cur))
{
if (e.parent)
markCurrentNodeSpecial(*e.parent);
stack.pop();
break;
}
for (auto n : getChildren(e.cur))
stack.push({ n, e.cur, StackNodeEntry::Pos::before_recursion });
e.pos = StackNodeEntry::Pos::after_recursion;
break;
case StackNodeEntry::Pos::after_recursion:
if (isSpecial(e.cur))
{
if (e.parent)
markCurrentNodeSpecial(*e.parent);
}
stack.pop(); // don't use e below this line
break;
}
}
return isSpecial(root);
}
So, I made the following code for DFS:
void dfs (graph * mygraph, int foo, bool arr[]) // here, foo is the source vertex
{
if (arr[foo] == true)
return;
else
{
cout<<foo<<"\t";
arr[foo] = true;
auto it = mygraph->edges[foo].begin();
while (it != mygraph->edges[foo].end())
{
int k = *it;
if (arr[k] == false)
{
//cout<<k<<"\n";
dfs(mygraph,k,arr);
//cout<<k<<"\t";
}
it++;
}
}
//cout<<"\n";
}
Now, I read up that in an undirected graph, if while DFS, it returns to the same vertex again, there is a cycle. Therefore, what I did was this,
bool checkcycle( graph * mygraph, int foo, bool arr[] )
{
bool result = false;
if (arr[foo] == true)
{
result = true;
}
else
{
arr[foo] = true;
auto it = mygraph->edges[foo].begin();
while (it != mygraph->edges[foo].end())
{
int k = *it;
result = checkcycle(mygraph,k,arr);
it++;
}
}
return result;
}
But, my checkcycle function returns true even if their is no cycle. Why is that? Is there something wrong with my function? There is no execution problem, otherwise I would have debugged, but their seems to be something wrong in my logic.
Notice that your function doesn't quite do what you think it does. Let me try to step through what's happening here. Assume the following relationships: (1,2), (1,3), (2,3). I'm not assuming reflexibility (that is, (1,2) does not imply (2,1)). Relationships are directed.
Start with node 1. Flag it as visited
Iterate its children (2 and 3)
When in node 2, recursively call check cycle. At this point 2 is also flagged as visited.
The recursive call now visits 3 (DEPTH search). 3 is also flagged as visited
Call for step 4 dies returning false
Call for step 3 dies returning false
We're back at step 2. Now we'll iterate node 3, which has already been flagged in step 4. It just returns true.
You need a stack of visited nodes or you ONLY search for the original node. The stack will detect sub-cycles as well (cycles that do not include the original node), but it also takes more memory.
Edit: the stack of nodes is not just a bunch of true/false values, but instead a stack of node numbers. A node has been visited in the current stack trace if it's present in the stack.
However, there's a more memory-friendly way: set arr[foo] = false; as the calls die. Something like this:
bool checkcycle( graph * mygraph, int foo, bool arr[], int previousFoo=-1 )
{
bool result = false;
if (arr[foo] == true)
{
result = true;
}
else
{
arr[foo] = true;
auto it = mygraph->edges[foo].begin();
while (it != mygraph->edges[foo].end())
{
int k = *it;
// This should prevent going back to the previous node
if (k != previousFoo) {
result = checkcycle(mygraph,k,arr, foo);
}
it++;
}
// Add this
arr[foo] = false;
}
return result;
}
I think it should be enough.
Edit: should now support undirected graphs.
Node: this code is not tested
Edit: for more elaborate solutions see Strongly Connected Components
Edit: this answer is market as accepted although the concrete solution was given in the comments. Read the comments for details.
are all of the bools in arr[] set to false before checkcycle begins?
are you sure your iterator for the nodes isn't doubling back on edges it has already traversed (and thus seeing the starting node multiple times regardless of cycles)?
What's a good algorithm to traverse the edges of a Voronoi diagram using boost without recursion?
I know it'd have to check for infinite edges in a cell then check its neighbours and repeat from there, but I'd prefer a method that doesn't require recursion since I'm dealing with large sets of data.
Is this possible without recursion?
Edit, for more clarification:
This is a way to obtain all edge cells:
voronoi_diagram vd;
boost::polygon::construct_voronoi(in.begin(), in.end(), &vd);
std::vector<const voronoi_diagram::cell_type *> edge_cells;
for(const voronoi_diagram::edge_type & e : vd.edges())
if (e.is_infinite())
edge_cells.push_back(e.cell());
The problem with the approach above is that it will not traverse the edge cells in any specific order, say clockwise for example.
A recursive implementation would do something akin to this (hastily written and untested) code:
bool findNext(const voronoi_diagram::cell_type * c,
std::list<const voronoi_diagram::cell_type *> & list)
{
const voronoi_diagram::edge_type * e = c->incident_edge();
do
{
// Follow infinite edges, adding cells encountered along the way
if (e->is_infinite() && e->twin()->cell() != list.front() &&
e->twin()->cell() != list.back())
{
list.push_back(c);
return findNext(e->twin()->cell(), list);
}
else if (e->twin()->cell() == list.front())
{
list.push_back(c);
return true; // we reached the starting point, return
}
e = e->next();
} while (e != c->incident_edge());
return false;
}
// ...
std::list<const voronoi_diagram::cell_type *> edge_cells;
// ...
for(const voronoi_diagram::edge_type & e : vd.edges())
{
// find first infinite edge
if (e.is_infinite())
{
if (findNext(e.cell(), edge_cells))
break;
else
edge_cells.clear();
}
}
This would traverse the edges of the Voronoi diagram until it traces its way back to the first cell and then stops, filling the stack all the way.
A non-recursive implementation would model the second example, producing a list of the edge cells in a clockwise or counter-clockwise order, without using recursion.
You only have one recursive call in findNext and it is return findNext(...) so the so-called tail-call optimization can be applied. Your compiler might do it at -O3. But if you don't trust the compiler to do this, you can do it by hand. Below is the transformed function, no longer recursive:
bool findNext(const voronoi_diagram::cell_type * c,
std::list<voronoi_diagram::cell_type *> & list)
{
const voronoi_diagram::edge_type * e = c->incident_edge();
bool justCalled; // true when we repalce the tail call
do
{
justCalled = false;
// Follow infinite edges, adding cells encountered along the way
if (e->is_infinite() && e->twin()->cell() != list.front() &&
e->twin()->cell() != list.back())
{
list.push_back(c);
c = e->twin()->cell(); // reassigns function argument
e = c->incident_edge(); // replay the initiaization (before do loop)
justCalled = true; // force the loop to continue
continue; // jump to start of loop
// everything happens as if we called findNext(e->twin()->cell(), list);
else if (e->twin()->cell() == list.front())
{
list.push_back(c);
return true; // we reached the starting point, return
}
e = e->next();
} while (justCalled || e != c->incident_edge());
return false;
}
This function is equivalent to the one you wrote, so you'd use it the same and you're sure there's no recursion involved. The bool flag is necessary because continue jumps to the test of the loop and not its body see here, thus the test might fail before the loop even begins when we change arguments and call continue.
This is a general technique, not specific to graph traversal but to all recursive algorithms. Of course if you have many function arguments and a large body of code, the transform is heavy lifting, but in this case I think it's a good match.
In more complex cases when the recursion is not a tail call, you still can "de-recusify" any function by maintaining your own stack. This has the advantage that replacing the stack structure with say a priority fifo may change the traversal order in ways more subtle that what recursion can (easily) achieve.
Just to clarify that I also think the title is a bit silly. We all know that most built-in functions of the language are really well written and fast (there are ones even written by assembly). Though may be there still are some advices for my situation. I have a small project which demonstrates the work of a search engine. In the indexing phase, I have a filter method to filter out unnecessary things from the keywords. It's here:
bool Indexer::filter(string &keyword)
{
// Remove all characters defined in isGarbage method
keyword.resize(std::remove_if(keyword.begin(), keyword.end(), isGarbage) - keyword.begin());
// Transform all characters to lower case
std::transform(keyword.begin(), keyword.end(), keyword.begin(), ::tolower);
// After filtering, if the keyword is empty or it is contained in stop words list, mark as invalid keyword
if (keyword.size() == 0 || stopwords_.find(keyword) != stopwords_.end())
return false;
return true;
}
At first sign, these functions (alls are member functions of STL container or standard function) are supposed to be fast and not take many time in the indexing phase. But after profiling with Valgrind, the inclusive cost of this filter is ridiculous high: 33.4%. There are three standard functions of this filter take most of the time for that percentage: std::remove_if takes 6.53%, std::set::find takes 15.07% and std::transform takes 7.71%.
So if there are any thing I can do (or change) to reduce the instruction times cost by this filter (like using parallellizing or something like that), please give me your advice. Thanks in advance.
UPDATE: Thanks for all your suggestion. So in brief, I've summarize what I need to do is:
1) Merge tolower and remove_if into one by construct my own loop.
2) Use unordered_set instead of set for faster find method.
Thus I've chosen Mark_B's as the right answer.
First, are you certain that optimization and inlining are enabled when you compile?
Assuming that's the case, I would first try writing my own transformer that combines removing garbage and lower-casing into one step to prevent iterating over the keyword that second time.
There's not a lot you can do about the find without using a different container such as unordered_set as suggested in a comment.
Is it possible for your application that doing the filtering really just is a really CPU-intensive part of the operation?
If you use a boost filter iterator you can merge the remove_if and transform into one, something like (untested):
keyword.erase(std::transform(boost::make_filter_iterator(!boost::bind(isGarbage), keyword.begin(), keyword.end()),
boost::make_filter_iterator(!boost::bind(isGarbage), keyword.end(), keyword.end()),
keyword.begin(),
::tolower), keyword.end());
This is assuming you want the side effect of modifying the string to still be visible externally, otherwise pass by const reference instead and just use count_if and a predicate to do all in one. You can build a hierarchical data structure (basically a tree) for the list of stop words that makes "in-place" matching possible, for example if your stop words are SELECT, SELECTION, SELECTED you might build a tree:
|- (other/empty accept)
\- S-E-L-E-C-T- (empty, fail)
|- (other, accept)
|- I-O-N (fail)
\- E-D (fail)
You can traverse a tree structure like that simultaneously whilst transforming and filtering without any modifications to the string itself. In reality you'd want to compact the multi-character runs into a single node in the tree (probably).
You can build such a data structure fairly trivially with something like:
#include <iostream>
#include <map>
#include <memory>
class keywords {
struct node {
node() : end(false) {}
std::map<char, std::unique_ptr<node>> children;
bool end;
} root;
void add(const std::string::const_iterator& stop, const std::string::const_iterator c, node& n) {
if (!n.children[*c])
n.children[*c] = std::unique_ptr<node>(new node);
if (stop == c+1) {
n.children[*c]->end = true;
return;
}
add(stop, c+1, *n.children[*c]);
}
public:
void add(const std::string& str) {
add(str.end(), str.begin(), root);
}
bool match(const std::string& str) const {
const node *current = &root;
std::string::size_type pos = 0;
while(current && pos < str.size()) {
const std::map<char,std::unique_ptr<node>>::const_iterator it = current->children.find(str[pos++]);
current = it != current->children.end() ? it->second.get() : nullptr;
}
if (!current) {
return false;
}
return current->end;
}
};
int main() {
keywords list;
list.add("SELECT");
list.add("SELECTION");
list.add("SELECTED");
std::cout << list.match("TEST") << std::endl;
std::cout << list.match("SELECT") << std::endl;
std::cout << list.match("SELECTOR") << std::endl;
std::cout << list.match("SELECTED") << std::endl;
std::cout << list.match("SELECTION") << std::endl;
}
This worked as you'd hope and gave:
0
1
0
1
1
Which then just needs to have match() modified to call the transformation and filtering functions appropriately e.g.:
const char c = str[pos++];
if (filter(c)) {
const std::map<char,std::unique_ptr<node>>::const_iterator it = current->children.find(transform(c));
}
You can optimise this a bit (compact long single string runs) and make it more generic, but it shows how doing everything in-place in one pass might be achieved and that's the most likely candidate for speeding up the function you showed.
(Benchmark changes of course)
If a call to isGarbage() does not require synchronization, then parallelization should be the first optimization to consider (given of course that filtering one keyword is a big enough task, otherwise parallelization should be done one level higher). Here's how it could be done - in one pass through the original data, multi-threaded using Threading Building Blocks:
bool isGarbage(char c) {
return c == 'a';
}
struct RemoveGarbageAndLowerCase {
std::string result;
const std::string& keyword;
RemoveGarbageAndLowerCase(const std::string& keyword_) : keyword(keyword_) {}
RemoveGarbageAndLowerCase(RemoveGarbageAndLowerCase& r, tbb::split) : keyword(r.keyword) {}
void operator()(const tbb::blocked_range<size_t> &r) {
for(size_t i = r.begin(); i != r.end(); ++i) {
if(!isGarbage(keyword[i])) {
result.push_back(tolower(keyword[i]));
}
}
}
void join(RemoveGarbageAndLowerCase &rhs) {
result.insert(result.end(), rhs.result.begin(), rhs.result.end());
}
};
void filter_garbage(std::string &keyword) {
RemoveGarbageAndLowerCase res(keyword);
tbb::parallel_reduce(tbb::blocked_range<size_t>(0, keyword.size()), res);
keyword = res.result;
}
int main() {
std::string keyword = "ThIas_iS:saome-aTYpe_Ofa=MoDElaKEYwoRDastrang";
filter_garbage(keyword);
std::cout << keyword << std::endl;
return 0;
}
Of course, the final code could be improved further by avoiding data copying, but the goal of the sample is to demonstrate that it's an easily threadable problem.
You might make this faster by making a single pass through the string, ignoring the garbage characters. Something like this (pseudo-code):
std::string normalizedKeyword;
normalizedKeyword.reserve(keyword.size())
for (auto p = keyword.begin(); p != keyword.end(); ++p)
{
char ch = *p;
if (!isGarbage(ch))
normalizedKeyword.append(tolower(ch));
}
// then search for normalizedKeyword in stopwords
This should eliminate the overhead of std::remove_if, although there is a memory allocation and some new overhead of copying characters to normalizedKeyword.
The problem here isn't the standard functions, it's your use of them. You are making multiple passes over your string when you obviously need to be doing only one.
What you need to do probably can't be done with the algorithms straight up, you'll need help from boost or rolling your own.
You should also carefully consider whether resizing the string is actually necessary. Yeah, you might save some space but it's going to cost you in speed. Removing this alone might account for quite a bit of your operation's expense.
Here's a way to combine the garbage removal and lower-casing into a single step. It won't work for multi-byte encoding such as UTF-8, but neither did your original code. I assume 0 and 1 are both garbage values.
bool Indexer::filter(string &keyword)
{
static char replacements[256] = {1}; // initialize with an invalid char
if (replacements[0] == 1)
{
for (int i = 0; i < 256; ++i)
replacements[i] = isGarbage(i) ? 0 : ::tolower(i);
}
string::iterator tail = keyword.begin();
for (string::iterator it = keyword.begin(); it != keyword.end(); ++it)
{
unsigned int index = (unsigned int) *it & 0xff;
if (replacements[index])
*tail++ = replacements[index];
}
keyword.resize(tail - keyword.begin());
// After filtering, if the keyword is empty or it is contained in stop words list, mark as invalid keyword
if (keyword.size() == 0 || stopwords_.find(keyword) != stopwords_.end())
return false;
return true;
}
The largest part of your timing is the std::set::find so I'd also try std::unordered_set to see if it improves things.
I would implement it with lower level C functions, something like this maybe (not checking this compiles), doing the replacement in place and not resizing the keyword.
Instead of using a set for garbage characters, I'd add a static table of all 256 characters (yeah, it will work for ascii only), with 0 for all characters that are ok, and 1 for those who should be filtered out. something like:
static const char GARBAGE[256] = { 1, 1, 1, 1, 1, ...., 0, 0, 0, 0, 1, 1, ... };
then for each character in offset pos in const char *str you can just check if (GARBAGE[str[pos]] == 1);
this is more or less what an unordered set does, but will have much less instructions. stopwords should be an unordered set if they're not.
now the filtering function (I'm assuming ascii/utf8 and null terminated strings here):
bool Indexer::filter(char *keyword)
{
char *head = pos;
char *tail = pos;
while (*head != '\0') {
//copy non garbage chars from head to tail, lowercasing them while at it
if (!GARBAGE[*head]) {
*tail = tolower(*head);
++tail; //we only advance tail if no garbag
}
//head always advances
++head;
}
*tail = '\0';
// After filtering, if the keyword is empty or it is contained in stop words list, mark as invalid keyword
if (tail == keyword || stopwords_.find(keyword) != stopwords_.end())
return false;
return true;
}
I am writing a solver for the N-Puzzle (see http://en.wikipedia.org/wiki/Fifteen_puzzle)
Right now I am using a unordered_map to store hash values of the puzzle board,
and manhattan distance as the heuristic for the algorithm, which is a plain DFS.
so I have
auto pred = [](Node * lhs, Node * rhs){ return lhs->manhattanCost_ < rhs->manhattanCost_; };
std::multiset<Node *, decltype(pred)> frontier(pred);
std::vector<Node *> explored; // holds nodes we have already explored
std::tr1::unordered_set<unsigned> frontierHashTable;
std::tr1::unordered_set<unsigned> exploredHashTable;
This works great for n = 2 and 3.
However, its really hit and miss for n=4 and above. (stl unable to allocate memory for a new node)
I also suspect that I am getting hash collisions in the unordered_set
unsigned makeHash(const Node & pNode)
{
unsigned int b = 378551;
unsigned int a = 63689;
unsigned int hash = 0;
for(std::size_t i = 0; i < pNode.data_.size(); i++)
{
hash = hash * a + pNode.data_[i];
a = a * b;
}
return hash;
}
16! = 2 × 10^13 (possible arrangements)
2^32 = 4 x 10^9 (possible hash values in a 32 bit hash)
My question is how can I optimize my code to solve for n=4 and n=5?
I know from here
http://kociemba.org/fifteen/fifteensolver.html
http://www.ic-net.or.jp/home/takaken/e/15pz/index.html
that n=4 is possible in less than a second on average.
edit:
The algorithm itself is here:
bool NPuzzle::aStarSearch()
{
auto pred = [](Node * lhs, Node * rhs){ return lhs->manhattanCost_ < rhs->manhattanCost_; };
std::multiset<Node *, decltype(pred)> frontier(pred);
std::vector<Node *> explored; // holds nodes we have already explored
std::tr1::unordered_set<unsigned> frontierHashTable;
std::tr1::unordered_set<unsigned> exploredHashTable;
// if we are in the solved position in the first place, return true
if(initial_ == target_)
{
current_ = initial_;
return true;
}
frontier.insert(new Node(initial_)); // we are going to delete everything from the frontier later..
for(;;)
{
if(frontier.empty())
{
std::cout << "depth first search " << "cant solve!" << std::endl;
return false;
}
// remove a node from the frontier, and place it into the explored set
Node * pLeaf = *frontier.begin();
frontier.erase(frontier.begin());
explored.push_back(pLeaf);
// do the same for the hash table
unsigned hashValue = makeHash(*pLeaf);
frontierHashTable.erase(hashValue);
exploredHashTable.insert(hashValue);
std::vector<Node *> children = pLeaf->genChildren();
for( auto it = children.begin(); it != children.end(); ++it)
{
unsigned childHash = makeHash(**it);
if(inFrontierOrExplored(frontierHashTable, exploredHashTable, childHash))
{
delete *it;
}
else
{
if(**it == target_)
{
explored.push_back(*it);
current_ = **it;
// delete everything else in children
for( auto it2 = ++it; it2 != children.end(); ++it2)
delete * it2;
// delete everything in the frontier
for( auto it = frontier.begin(); it != frontier.end(); ++it)
delete *it;
// delete everything in explored
explored_.swap(explored);
for( auto it = explored.begin(); it != explored.end(); ++it)
delete *it;
return true;
}
else
{
frontier.insert(*it);
frontierHashTable.insert(childHash);
}
}
}
}
}
Since this is homework I will suggest some strategies you might try.
First, try using valgrind or a similar tool to check for memory leaks. You may have some memory leaks if you don't delete everything you new.
Second, calculate a bound on the number of nodes that should be explored. Keep track of the number of nodes you do explore. If you pass the bound, you might not be detecting cycles properly.
Third, try the algorithm with depth first search instead of A*. Its memory requirements should be linear in the depth of the tree and it should just be a matter of changing the sort ordering (pred). If DFS works, your A* search may be exploring too many nodes or your memory structures might be too inefficient. If DFS doesn't work, again it might be a problem with cycles.
Fourth, try more compact memory structures. For example, std::multiset does what you want but std::priority_queue with a std::deque may take up less memory. There are other changes you could try and see if they improve things.
First i would recommend cantor expansion, which you can use as the hashing method. It's 1-to-1, i.e. the 16! possible arrangements would be hashed into 0 ~ 16! - 1.
And then i would implement map by my self, as you may know, std is not efficient enough for computation. map is actually a Binary Search Tree, i would recommend Size Balanced Tree, or you can use AVL tree.
And just for record, directly use bool hash[] & big prime may also receive good result.
Then the most important thing - the A* function, like what's in the first of your link, you may try variety of A* function and find the best one.
You are only using the heuristic function to order the multiset. You should use the min(g(n) + f(n)) i.e. the min(path length + heuristic) to order your frontier.
Here the problem is, you are picking the one with the least heuristic, which may not be the correct "next child" to pick.
I believe this is what is causing your calculation to explode.