dependency sort with detection of cyclic dependencies - c++

Before you start throwing links to wikipedia and blogs in my face, please hear me out.
I'm trying to find the optimal algorithm/function to do a dependency sort on... stuff. Each item has a list of its dependencies.
I would like to have something iterator-based, but that's not very important.
What is important is that the algorithm points out exactly which items are part of the dependency cycle. I'd like to give detailed error information in this case.
Practically, I'm thinking of subclassing my items from a "dependency node" class, which has the necessary booleans/functions to get the job done. Cool (but descriptive) names are welcome :)

It's normally called a topological sort. Most books/papers/whatever that cover topological sorting will also cover cycle detection as a matter of course.

I don't quite see why it would be hard to find the dependency cycle, if there is one. While applying a BFS to collect all the dependencies, you just have to check whether you reach a node you have already passed over. If you do, walk back up the way you came until you reach the first visit of that node, marking every node along the way. All the nodes marked on that pass are part of the cycle. (Just leave a comment and I'll post code for that if you need it.)
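A minimal sketch of a dependency sort that reports the members of a cycle. Note that it uses a depth-first search with an explicit "current path" rather than the BFS mentioned above, because a node that is merely reached twice through different routes (a diamond-shaped dependency) is not a cycle; only a node that is still on the current path is. The names `SortResult`, `Mark`, `visit` and `dependency_sort`, and the `deps[i]` index layout, are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Either a valid order, or the nodes that form one dependency cycle.
struct SortResult {
    std::vector<std::size_t> order;  // dependencies come before dependents
    std::vector<std::size_t> cycle;  // non-empty only if a cycle was found
};

enum class Mark { Unvisited, InProgress, Done };

// deps[i] lists the items that item i depends on (assumed layout).
// Returns false as soon as a cycle is found and fills result.cycle.
inline bool visit(std::size_t node,
                  const std::vector<std::vector<std::size_t>>& deps,
                  std::vector<Mark>& mark,
                  std::vector<std::size_t>& path,
                  SortResult& result) {
    mark[node] = Mark::InProgress;
    path.push_back(node);
    for (std::size_t dep : deps[node]) {
        assert(dep < deps.size());
        if (mark[dep] == Mark::InProgress) {
            // dep is still on the current path, so the cycle is the
            // part of the path that starts at dep.
            auto start = std::find(path.begin(), path.end(), dep);
            result.cycle.assign(start, path.end());
            return false;
        }
        if (mark[dep] == Mark::Unvisited &&
            !visit(dep, deps, mark, path, result)) {
            return false;
        }
    }
    path.pop_back();
    mark[node] = Mark::Done;
    result.order.push_back(node);  // all of its dependencies are already emitted
    return true;
}

inline SortResult dependency_sort(
        const std::vector<std::vector<std::size_t>>& deps) {
    SortResult result;
    std::vector<Mark> mark(deps.size(), Mark::Unvisited);
    std::vector<std::size_t> path;
    for (std::size_t i = 0; i < deps.size(); ++i) {
        if (mark[i] == Mark::Unvisited &&
            !visit(i, deps, mark, path, result)) {
            result.order.clear();  // no valid order exists
            break;
        }
    }
    return result;
}
```

If `cycle` comes back non-empty, its entries are the indices of the items on the offending cycle, in dependency order, which can be mapped back to item names for the error message. The recursion depth equals the longest dependency chain, so very deep graphs may call for an explicit stack instead.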

Related

Is boost::graph the right tool for this task?

I have the following simplified situation:
I need to create a tree like the following one:
I start at one node with a score of 3. From this node I calculate all possible next nodes, which have the scores 4, 6, 5 and 7. In the next step I only want to consider the two nodes with the highest scores, in our case 6 and 7. From these two nodes I again calculate all possible next nodes. The highest scores among all of those next nodes are 12 and 13, so in the next step I only want to consider these two nodes as my next starting points. This means all nodes descending from the previous score-6 node are ignored from now on.
And so on...
I have no idea about graph theory right now (but I guess I have to do some research)
I looked for some libraries which might help me implement this in C++. I came across boost::graph, which looks promising at first glance. The downside is that it also looks quite complex.
My question is:
Do you think boost::graph can be easily used to implement a tree like this? Is it worth spending some days trying to learn boost::graph, or is it not the right library for me?
Is there a better library? I quite like boost and have used it a lot, but not the graph library of it.
I think my requirements are
calculation of my nodes is quite expensive, therefore I need a tree/graph which saves the "score" of each node and can be easily extended later.
it should be possible to simply say "from now on let's ignore those nodes" and focus on those two best ones.
I need to somehow easily be able to get access to the "end of my tree" at each step, so that I know which nodes are the current nodes I have to calculate the next possible nodes.
to calculate the score of my next nodes I need to be able to easily follow the tree back to its root (I do not only need the previous score but more information. My implementation would somehow require that each node is an object and a score and I need to have access to the members of those objects as well as to the score). This means I need to be able to reconstruct my way through the tree.
Note:
I guess it should be a kind of standard problem for graph theory. But right now I do not really know where to start my research. If you have any good literature, papers, descriptions, key words for google, for this problem please tell me.
Edit:
Many people have pointed out that an array might be sufficient. First of all: you might be right, and thank you for your comments. My example was highly simplified. At each step I would need to calculate up to ten thousand new possible nodes, and I need to make roughly 1000 iterations. The idea of only following a couple of nodes (2 in my example, but in my application most likely ~100 or something like "the best x percent") was simply to reduce the number of possibilities. I think it is obvious that the problem would otherwise explode quite quickly.
Edit2:
seems like there is some confusion about the numbers:
Right now the code runs sequential (no graph, just an array). For every possible new node I calculated a score based on some metrics. At the end I select the node which has the highest score and go on with the new iteration.
The idea how to implement it with a graph is the following:
Again for every possible node I calculate the score (this is the number over the lines that connect two nodes) but the thing we are interested in is the sum of all these scores through the graph which is the number I wrote in the nodes. This means I am interested in the path through the graph which leads to the highest sum of scores.
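The "keep only the best few nodes at each step" strategy described above is commonly called beam search, which may be a useful search keyword. A minimal sketch without any graph library, assuming all nodes live in one `std::vector` and each node stores the index of its parent so that the path back to the root can be followed. `Node`, `Expand`, `beam_search` and `path_to_root` are hypothetical names, and the `expand` callback stands in for the expensive score calculation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

struct Node {
    double score;           // sum of the step scores from the root to this node
    std::ptrdiff_t parent;  // index into the node store, -1 for the root
};

// expand(nodes, i) returns the step scores of all children of node i
// (assumed callback; it may walk the parent chain of node i if needed).
using Expand =
    std::function<std::vector<double>(const std::vector<Node>&, std::size_t)>;

// Runs up to `steps` rounds and keeps the `width` best nodes after each one.
// nodes[0] must be the root. Returns the index of the best surviving node.
inline std::size_t beam_search(std::vector<Node>& nodes, const Expand& expand,
                               std::size_t width, std::size_t steps) {
    assert(!nodes.empty() && width > 0);
    std::vector<std::size_t> frontier = {0};  // the current "end of the tree"
    for (std::size_t s = 0; s < steps; ++s) {
        std::vector<std::size_t> candidates;
        for (std::size_t i : frontier) {
            for (double step_score : expand(nodes, i)) {
                nodes.push_back({nodes[i].score + step_score,
                                 static_cast<std::ptrdiff_t>(i)});
                candidates.push_back(nodes.size() - 1);
            }
        }
        if (candidates.empty()) break;  // nothing left to expand
        const std::size_t keep = std::min(width, candidates.size());
        // Move the `keep` highest-scoring candidates to the front.
        std::partial_sort(candidates.begin(), candidates.begin() + keep,
                          candidates.end(),
                          [&](std::size_t a, std::size_t b) {
                              return nodes[a].score > nodes[b].score;
                          });
        candidates.resize(keep);  // "from now on ignore the other nodes"
        frontier = std::move(candidates);
    }
    return *std::max_element(frontier.begin(), frontier.end(),
                             [&](std::size_t a, std::size_t b) {
                                 return nodes[a].score < nodes[b].score;
                             });
}

// Follows the parent links from `leaf` and returns the path root -> leaf.
inline std::vector<std::size_t> path_to_root(const std::vector<Node>& nodes,
                                             std::size_t leaf) {
    std::vector<std::size_t> path;
    for (std::ptrdiff_t i = static_cast<std::ptrdiff_t>(leaf); i != -1;
         i = nodes[static_cast<std::size_t>(i)].parent) {
        path.push_back(static_cast<std::size_t>(i));
    }
    std::reverse(path.begin(), path.end());
    return path;
}
```

In this sketch the discarded nodes stay in the vector, which keeps indices stable but costs memory. With roughly ten thousand candidates per step it may be worth storing only the survivors, or scoring candidates before materializing them as nodes.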

How to receive the current depth in RecursiveASTVisitor (clang)?

I have spent several weeks trying to understand how to work with the clang
AST, but I still have not answered the main question: how do I walk this
tree?
I have read every guide I could find, studied the doxygen documentation and
even watched a couple of lectures on YouTube, but I still do not understand
it.
What I have understood so far:
1) The AST has no common base type for its nodes.
2) To traverse the tree, you can use either RecursiveASTVisitor or the
matchers. The first performs a recursive depth-first traversal, and the
second lets you search for the nodes you are interested in.
The problem is that with either option I cannot tell which part of the tree
I am in. I do not know how to determine when my visitor has moved on to
another branch and when it is still descending.
Ideally I would like to know the depth of the node I am visiting. Is that
possible?
I really like the output of the dump() function because it shows the
relationships between nodes clearly. However, I do not know how to get that
structure in its raw form rather than as text.
In general, my question is this: can I build my own tree from the AST, but
with a uniform node type, and if so, how?

Testing important implementation details

I'm implementing a key -> value associative container. Internally it's a sorted binary tree, and I want to make sure it's balanced so that find operations are sure to be O(log n). The problem is that this is an implementation detail that is entirely private to the class, and I can't readily measure it from outside.
The best I can think to do is to benchmark my find operations - if they operate in linear time it's probably because the tree is unbalanced - but that seems far too inexact, and I'd feel better if I had a more direct way to measure.
What design/testing patterns are out there that might be helpful in these sorts of situations?
You could extract the balanced tree into its own class, and test that class. In that class, balance is a feature of the class, and it could expose something like depth, which would let you inspect it and assert that the tree remains balanced.
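A minimal sketch of what such an inspection could look like, assuming the extracted tree class makes its nodes reachable to the test. `TreeNode`, `make_node`, `balanced_height` and `is_balanced` are hypothetical names, and the balance criterion used here (subtree heights differ by at most one, as in an AVL tree) is an assumption; a red-black tree would need a different invariant.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <memory>

// Hypothetical node type of the extracted tree class.
struct TreeNode {
    int key = 0;
    std::unique_ptr<TreeNode> left;
    std::unique_ptr<TreeNode> right;
};

inline std::unique_ptr<TreeNode> make_node(int key) {
    auto node = std::make_unique<TreeNode>();
    node->key = key;
    return node;
}

// Returns the height of the subtree (0 for an empty one), or -1 if any node
// in it has subtrees whose heights differ by more than one.
inline int balanced_height(const TreeNode* node) {
    if (node == nullptr) return 0;
    const int left = balanced_height(node->left.get());
    if (left < 0) return -1;
    const int right = balanced_height(node->right.get());
    if (right < 0) return -1;
    if (std::abs(left - right) > 1) return -1;
    return 1 + std::max(left, right);
}

inline bool is_balanced(const TreeNode* root) {
    return balanced_height(root) >= 0;
}
```

A unit test for the tree class could then insert keys in sorted order, which is the usual worst case for an unbalanced binary search tree, and assert `is_balanced` after every insertion.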
You are correct in saying that you are testing an implementation detail. The problem here is that a bad implementation also produces the correct output, it just takes longer. This means that the only measurable unit is time.
What I would do is similar to what you propose: create a big collection of data and structure it in a way that a good implementation should be able to find what you're looking for in a matter of moments and a bad implementation has to go through your entire collection before finding it.
This could translate to having thousands of elements and searching for the element that's last in line. You could structure it in a way that a good implementation should find it at the top of the tree and thus find it very quickly while a bad implementation should find it somewhere at the bottom, thus taking time to find it.
Many frameworks have an option to specify a timeout so if you set this to a low enough value and you have plenty of data in your collection, you can weed out slow-running implementations like that.

How to implement TSP with dynamic programming in C++

Recently I asked a question on Stack Overflow asking for help to solve a problem. It is a travelling salesman problem where I have up to 40,000 cities but I only need to visit 15 of them.
I was advised to use Dijkstra with a priority queue to build a connectivity matrix for the 15 cities I need to visit, and then do TSP on that matrix with DP. I had previously only used the O(n^2) version of Dijkstra. After trying to figure out how to implement the priority-queue version, I finally did it (enough to get from 240 seconds down to 0.6 seconds for 40,000 cities). But now I am stuck at the TSP part.
Here are the materials I used for learning TSP :
Quora
GeeksForGeeks
I sort of understand the algorithm (but not completely), but I am having troubles implementing it. Before this I have done dynamic programming with arrays that would be dp[int] or dp[int][int]. But now when my dp matrix has to be dp[subset][int] I don't have any idea how should I do this.
My questions are :
How do I handle the subsets with dynamic programming? (an example in C++ would be appreciated)
Do the algorithms I linked to allow visiting cities more than once, and if they don't what should I change?
Should I perhaps use another TSP algorithm instead? (I noticed there are several ways to do it). Keep in mind that I must get the exact value, not approximate.
Edit:
After some more research I stumbled across some competitive programming contest lectures from Stanford and managed to find TSP here (slides 26-30). The key is to represent the subset as a bitmask. This still leaves my other questions unanswered though.
Can any changes be made to that algorithm to allow visiting a city more than once. If it can be done, what are those changes? Otherwise, what should I try?
I think you can use the dynamic programming solution and add, for each pair of nodes, a second edge with the shortest path between them. See also this question: Variation of TSP which visits multiple cities.
Here is a TSP implementation, you will find the link of the implemented problem in the post.
The algorithms you linked don't allow visiting cities more than once.
For your third question, I think Phpdna's answer was good.
Can cities be visited more than once? Yes and no. In your first step, you reduce the problem to the 15 relevant cities. This results in a complete graph, i.e. one where every node is connected to every other node. The connection between two such nodes might involve multiple cities on the original map, including some of the relevant ones, but that shouldn't be relevant to your algorithm in the second step.
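A minimal sketch of the subset DP over that reduced matrix, with each subset of cities encoded as a bitmask so that `dp[subset][int]` becomes an ordinary two-dimensional table. The function name `tsp_bitmask` is hypothetical, and the sketch assumes the tour starts at city 0 and returns to it; if no return trip is needed, the final loop would take the minimum of `dp[full][last]` alone.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// dist[i][j] is the shortest-path distance between relevant cities i and j,
// e.g. as computed by Dijkstra on the full map (assumed input).
// Returns the length of the shortest tour that starts at city 0, visits
// every city in the matrix once, and returns to city 0.
inline long long tsp_bitmask(const std::vector<std::vector<long long>>& dist) {
    const std::size_t n = dist.size();
    assert(n >= 1 && n <= 20);  // the table has 2^n * n entries
    const long long INF = std::numeric_limits<long long>::max() / 4;
    const std::size_t full = (std::size_t{1} << n) - 1;

    // dp[mask][last]: cheapest path that starts at city 0, visits exactly
    // the cities whose bits are set in mask, and ends at city `last`.
    std::vector<std::vector<long long>> dp(full + 1,
                                           std::vector<long long>(n, INF));
    dp[1][0] = 0;  // only city 0 visited, standing at city 0

    for (std::size_t mask = 1; mask <= full; ++mask) {
        if ((mask & 1) == 0) continue;  // every path contains city 0
        for (std::size_t last = 0; last < n; ++last) {
            if (((mask >> last) & 1) == 0 || dp[mask][last] == INF) continue;
            for (std::size_t next = 0; next < n; ++next) {
                if ((mask >> next) & 1) continue;  // already in the subset
                const std::size_t grown = mask | (std::size_t{1} << next);
                dp[grown][next] = std::min(dp[grown][next],
                                           dp[mask][last] + dist[last][next]);
            }
        }
    }

    long long best = INF;
    for (std::size_t last = 0; last < n; ++last) {
        if (dp[full][last] == INF) continue;
        best = std::min(best, dp[full][last] + dist[last][0]);
    }
    return best;
}
```

Because the matrix already holds shortest-path distances, a leg between two relevant cities may pass through other cities on the original map, including relevant ones, so revisits are handled implicitly. For 15 cities the table has 2^15 * 15 entries, which is small.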
As for whether to use a different algorithm: I would perhaps do a depth-first search through the graph. Using a minimum spanning tree, you can compute upper and lower bounds for the remaining cities, and use them to pick promising solutions and to discard hopeless ones (i.e. pruning). There has also been a lot of research on this topic; just search the web. For example, in cases where the map is actually Cartesian (i.e. the travelling costs are the distances between points on a plane), you can exploit this to improve the algorithms a bit.
Lastly, if you really intend to increase the number of visited cities, you will find that the time for computing it increases vastly, so you will have to abandon your requirement for an exact solution.

Graph - strongly connected components

Is there any fast way to determine the size of the largest strongly connected component in a graph?
I mean, like, the obvious approach would mean determining every SCC (could be done using two DFS calls, I suppose) and then looping through them and taking the maximum.
I'm pretty sure there has to be some better approach if I only need to have the size of that component and only the largest one, but I can't think of a good solution. Any ideas?
Thanks.
Let me answer your question with another question -
How can you determine which value in a set is the largest without examining all of the values?
Firstly, you could use Tarjan's algorithm, which needs only one DFS instead of two. If you understand the algorithm clearly, the SCCs form a DAG, and the algorithm finds them in reverse topological order. So if you have a sense of the graph (like a visual representation) and you know that relatively big SCCs occur at the end of the DAG, then you could stop the algorithm once the first few SCCs are found.
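A minimal sketch of Tarjan's algorithm that only tracks the size of the largest component instead of storing every SCC. It still visits the whole graph; the early exit suggested above would require extra knowledge about where the large components sit. `Tarjan`, `kUnvisited` and `largest_scc_size` are hypothetical names, and the graph is assumed to be an adjacency list indexed by vertex number.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kUnvisited = static_cast<std::size_t>(-1);

struct Tarjan {
    const std::vector<std::vector<std::size_t>>& adj;
    std::vector<std::size_t> index;  // DFS discovery number per vertex
    std::vector<std::size_t> low;    // lowest index reachable via the stack
    std::vector<bool> on_stack;
    std::vector<std::size_t> stack;
    std::size_t counter = 0;
    std::size_t largest = 0;

    explicit Tarjan(const std::vector<std::vector<std::size_t>>& graph)
        : adj(graph),
          index(graph.size(), kUnvisited),
          low(graph.size(), 0),
          on_stack(graph.size(), false) {}

    void visit(std::size_t v) {
        index[v] = low[v] = counter++;
        stack.push_back(v);
        on_stack[v] = true;
        for (std::size_t w : adj[v]) {
            if (index[w] == kUnvisited) {
                visit(w);
                low[v] = std::min(low[v], low[w]);
            } else if (on_stack[w]) {
                low[v] = std::min(low[v], index[w]);
            }
        }
        if (low[v] == index[v]) {
            // v is the root of an SCC: everything above it on the stack,
            // including v itself, belongs to that component.
            std::size_t size = 0;
            std::size_t w = v;
            do {
                w = stack.back();
                stack.pop_back();
                on_stack[w] = false;
                ++size;
            } while (w != v);
            largest = std::max(largest, size);
        }
    }
};

// Returns the number of vertices in the largest strongly connected component.
inline std::size_t largest_scc_size(
        const std::vector<std::vector<std::size_t>>& adj) {
    Tarjan tarjan(adj);
    for (std::size_t v = 0; v < adj.size(); ++v) {
        if (tarjan.index[v] == kUnvisited) tarjan.visit(v);
    }
    return tarjan.largest;
}
```

The recursion depth can reach the number of vertices, so for very large graphs an iterative version with an explicit stack may be needed.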