How to implement TSP with dynamic programming in C++ - c++

Recently I asked a question on Stack Overflow asking for help to solve a problem. It is a travelling salesman problem where I have up to 40,000 cities but I only need to visit 15 of them.
I was advised to use Dijkstra's algorithm with a priority queue to build a distance matrix for the 15 cities I need to visit, and then to solve TSP on that matrix with DP. I had previously only used the O(n^2) version of Dijkstra. After some effort I managed to implement the priority-queue version (enough to bring the run time down from 240 seconds to 0.6 seconds for 40,000 cities). But now I am stuck at the TSP part.
Here are the materials I used for learning TSP:
Quora
GeeksForGeeks
I sort of understand the algorithm (but not completely), and I am having trouble implementing it. Until now, my dynamic programming tables have always been arrays of the form dp[int] or dp[int][int]. Now that the table has to be dp[subset][int], I have no idea how I should do this.
My questions are:
How do I handle the subsets with dynamic programming? (an example in C++ would be appreciated)
Do the algorithms I linked to allow visiting cities more than once, and if they don't, what should I change?
Should I perhaps use another TSP algorithm instead? (I noticed there are several ways to do it). Keep in mind that I must get the exact value, not approximate.
Edit:
After some more research I stumbled across some competitive programming contest lectures from Stanford and managed to find TSP here (slides 26-30). The key is to represent the subset as a bitmask. This still leaves my other questions unanswered though.
Can any changes be made to that algorithm to allow visiting a city more than once? If so, what are those changes? Otherwise, what should I try?
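To make the dp[subset][int] idea concrete, here is a minimal sketch of the bitmask approach. It is not taken from the linked materials: the function name tspBitmask and the convention that the tour starts and ends at city 0 are assumptions made for illustration. Bit i of mask is set when city i has already been visited, so the "subset" index is just an int.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// Length of the shortest closed tour that starts at city 0, visits every
// city exactly once, and returns to city 0.
// dist[i][j] is the cost of going from i to j (it need not be symmetric).
long long tspBitmask(const std::vector<std::vector<long long>>& dist) {
    const int n = static_cast<int>(dist.size());
    assert(n >= 1 && n <= 20);  // the table has 2^n * n entries
    const long long INF = std::numeric_limits<long long>::max() / 4;
    const int full = 1 << n;

    // dp[mask][last]: cheapest path that starts at city 0, visits exactly
    // the cities whose bits are set in mask, and ends at city last.
    std::vector<std::vector<long long>> dp(full, std::vector<long long>(n, INF));
    dp[1][0] = 0;  // only city 0 visited, standing at city 0

    for (int mask = 1; mask < full; ++mask) {
        if (!(mask & 1)) continue;  // every path contains the start city
        for (int last = 0; last < n; ++last) {
            if (!(mask & (1 << last)) || dp[mask][last] == INF) continue;
            for (int next = 0; next < n; ++next) {
                if (mask & (1 << next)) continue;  // already visited
                const int grown = mask | (1 << next);
                dp[grown][next] = std::min(dp[grown][next],
                                           dp[mask][last] + dist[last][next]);
            }
        }
    }

    // Close the tour by returning to city 0 from the best final city.
    long long best = INF;
    for (int last = 0; last < n; ++last) {
        if (dp[full - 1][last] != INF) {
            best = std::min(best, dp[full - 1][last] + dist[last][0]);
        }
    }
    return best;
}
```

For 15 cities the table has 2^15 * 15 entries, which is small. Calling tspBitmask on the 15x15 matrix of pairwise distances returns the tour length; recovering the tour itself would additionally require storing, for each state, the predecessor that produced it.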

I think you can use the dynamic programming solution and, for each pair of nodes, add a second edge whose weight is the shortest-path distance between them. See also this question: Variation of TSP which visits multiple cities.

Here is a TSP implementation, you will find the link of the implemented problem in the post.
The algorithms you linked don't allow visiting cities more than once.
For your third question, I think Phpdna's answer was good.

Can cities be visited more than once? Yes and no. In your first step, you reduce the problem to the 15 relevant cities. This results in a complete graph, i.e. one where every node is connected to every other node. The connection between two such nodes might involve multiple cities on the original map, including some of the relevant ones, but that shouldn't be relevant to your algorithm in the second step.
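The reduction step described above can be sketched as follows: run Dijkstra once from each relevant city and keep only the distances to the other relevant cities. This is an illustrative sketch under assumptions, not the asker's code: the names Graph, dijkstra and reduceToRelevant are hypothetical, and edge weights are assumed to be non-negative.

```cpp
#include <cassert>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

using Edge = std::pair<int, long long>;        // (neighbour, weight)
using Graph = std::vector<std::vector<Edge>>;  // adjacency list

const long long kInf = std::numeric_limits<long long>::max() / 4;

// Shortest distances from src to every node, using a min-heap.
std::vector<long long> dijkstra(const Graph& g, int src) {
    assert(src >= 0 && src < static_cast<int>(g.size()));
    std::vector<long long> dist(g.size(), kInf);
    using Item = std::pair<long long, int>;  // (distance, node)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    dist[src] = 0;
    pq.push({0, src});
    while (!pq.empty()) {
        const Item top = pq.top();
        pq.pop();
        const long long d = top.first;
        const int u = top.second;
        if (d > dist[u]) continue;  // stale queue entry, skip it
        for (const Edge& e : g[u]) {
            if (d + e.second < dist[e.first]) {
                dist[e.first] = d + e.second;
                pq.push({dist[e.first], e.first});
            }
        }
    }
    return dist;
}

// k x k matrix of shortest-path distances between the relevant cities.
std::vector<std::vector<long long>> reduceToRelevant(
        const Graph& g, const std::vector<int>& relevant) {
    const std::size_t k = relevant.size();
    std::vector<std::vector<long long>> m(k, std::vector<long long>(k, kInf));
    for (std::size_t i = 0; i < k; ++i) {
        const std::vector<long long> all = dijkstra(g, relevant[i]);
        for (std::size_t j = 0; j < k; ++j) m[i][j] = all[relevant[j]];
    }
    return m;
}
```

Because each matrix entry is already a shortest path through the full map, a tour on the reduced matrix that visits each relevant city "once" may still pass through a city several times on the original map.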
As for whether to use a different algorithm: I would perhaps do a depth-first search through the graph. Using a minimum spanning tree, you can compute upper and lower bounds on the cost of visiting the remaining cities, and use those bounds to pick promising partial solutions and to discard hopeless ones (i.e. pruning). A lot of research has been done on this topic; just search the web. For example, when the map is actually Cartesian (i.e. the travelling cost is the distance between two points on a plane), you can exploit this to improve the algorithms a bit.
Lastly, if you really intend to increase the number of visited cities, you will find that the time for computing it increases vastly, so you will have to abandon your requirement for an exact solution.

Related

Is boost::graph the right tool for this task?

I have the following simplified situation:
I need to create a tree like the following one:
I start at one node with a score of 3. From this node I calculate all possible next nodes, which have the scores 4, 6, 5 and 7. In the next step I only want to consider the two nodes with the highest scores, in our case 6 and 7. From these two nodes I again calculate all possible next nodes. The highest scores among all the next nodes are 12 and 13, so in the next step I only want to consider these two nodes as my next starting points. This means all nodes descending from the previous score-6 node are ignored from now on.
And so on...
I have no idea about graph theory right now (but I guess I have to do some research)
I looked for some libraries which might help me implement this in C++. I came across boost::graph, which looks promising at first glance. The downside is that it also looks quite complex.
My question is:
Do you think boost::graph can be easily used to implement a tree like this? Is it worth spending some days trying to learn boost::graph, or is it not the right library for me?
Is there a better library? I quite like boost and have used it a lot, but not the graph library of it.
I think my requirements are
calculation of my nodes is quite expensive, therefore I need a tree/graph which saves the "score" of each node and can be easily extended later.
it should be possible to simply say "from now on let's ignore those nodes" and focus on those two best ones.
I need to somehow easily be able to get access to the "end of my tree" at each step, so that I know which nodes are the current nodes I have to calculate the next possible nodes.
to calculate the score of my next nodes I need to be able to easily follow the tree back to its root (I do not only need the previous score but more information. My implementation would somehow require that each node is an object and a score and I need to have access to the members of those objects as well as to the score). This means I need to be able to reconstruct my way through the tree.
Note:
I guess it should be a kind of standard problem for graph theory. But right now I do not really know where to start my research. If you have any good literature, papers, descriptions, key words for google, for this problem please tell me.
Edit:
Many people have pointed out that an array might be sufficient. First of all: you might be right, and thank you for your comments. My example was highly simplified. At each step I would need to calculate up to ten thousand new possible nodes, and I need to make roughly 1000 iterations. The idea of only following a couple of nodes (2 in my example, but in my application most likely ~100, or something like "the best x percent") was simply to reduce the number of possibilities. I think it is obvious that the problem would otherwise explode quite quickly.
Edit2:
It seems like there is some confusion about the numbers:
Right now the code runs sequentially (no graph, just an array). For every possible new node I calculate a score based on some metrics. At the end I select the node which has the highest score and go on with the next iteration.
The idea how to implement it with a graph is the following:
Again for every possible node I calculate the score (this is the number over the lines that connect two nodes) but the thing we are interested in is the sum of all these scores through the graph which is the number I wrote in the nodes. This means I am interested in the path through the graph which leads to the highest sum of scores.
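What is described here is usually called beam search, and it may not need a graph library at all. Below is a minimal sketch under stated assumptions: the names Node, Expand, beamSearch and pathToRoot are hypothetical, the int payload stands in for the real per-node object, and the expand callback stands in for the expensive score calculation. Kept nodes live in one vector (an "arena") and refer to their parent by index, so any node can be followed back to the root.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

struct Node {
    int payload;   // stand-in for the real per-node data (hypothetical)
    double score;  // sum of step scores along the path from the root
    int parent;    // index of the parent in the arena, -1 for the root
};

// Given the arena and the index of a frontier node, returns the possible
// next nodes as (payload, step score) pairs.
using Expand = std::function<std::vector<std::pair<int, double>>(
    const std::vector<Node>&, int)>;

struct BeamResult {
    std::vector<Node> arena;    // every node that was ever kept
    std::vector<int> frontier;  // indices of the current "end of the tree"
};

BeamResult beamSearch(Node root, const Expand& expand,
                      std::size_t beamWidth, int iterations) {
    assert(beamWidth > 0);
    BeamResult r;
    root.parent = -1;
    r.arena.push_back(root);
    r.frontier.push_back(0);
    for (int it = 0; it < iterations; ++it) {
        std::vector<Node> candidates;
        for (int idx : r.frontier) {
            for (const auto& child : expand(r.arena, idx)) {
                candidates.push_back(
                    Node{child.first, r.arena[idx].score + child.second, idx});
            }
        }
        if (candidates.empty()) break;
        // Keep only the beamWidth best candidates, highest score first.
        const std::size_t keep = std::min(beamWidth, candidates.size());
        std::partial_sort(
            candidates.begin(),
            candidates.begin() + static_cast<std::ptrdiff_t>(keep),
            candidates.end(),
            [](const Node& a, const Node& b) { return a.score > b.score; });
        r.frontier.clear();
        for (std::size_t i = 0; i < keep; ++i) {
            r.arena.push_back(candidates[i]);
            r.frontier.push_back(static_cast<int>(r.arena.size()) - 1);
        }
    }
    return r;
}

// Arena indices from the root down to the node at idx.
std::vector<int> pathToRoot(const std::vector<Node>& arena, int idx) {
    std::vector<int> path;
    for (int i = idx; i != -1; i = arena[i].parent) path.push_back(i);
    std::reverse(path.begin(), path.end());
    return path;
}
```

Candidates that do not make the cut are simply never stored, which is one way to "ignore those nodes from now on". Whether discarding them is acceptable, or whether they must be kept for later extension, depends on the application.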

Flavor of a Traveling Salesman(TSP) using Boost::Graph

I need to find an optimal solution for a Travelling Salesman Problem on graphs with a small number of vertices (< 10). Since this is an NP-hard problem I am ready to use a brute-force approach; for such a small number of vertices it should run in very little time.
I have slightly modified conditions for 2 problems:
(A)
The graph is bidirectional, with different weights in each direction.
All vertices are connected to all.
(Nice-to-have condition) You can visit the same vertex more than once and travel the same path more than once (however, so that the search eventually completes, you should not loop infinitely).
(B)
In addition to the conditions of (A), here you need to visit only a subset of the vertices, while you are still allowed to travel through all other vertices of the graph (if that gives a better solution).
A while back I implemented a brute-force solution and some heuristics like Lin–Kernighan (using a simple matrix of weights), but I have never used graph data structures like those in Boost. I was wondering if there is an existing implementation that I could use, or a set of algorithms that could help me get an optimal solution. I would also appreciate advice on how to get part (B) right.
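One possible way to handle both (A) and (B) with a plain weight matrix, sketched here under assumptions rather than taken from Boost: first replace every weight by the shortest-path distance (Floyd-Warshall), which accounts for revisiting vertices and for passing through vertices that are not required, then brute-force all orders of the required vertices. The names Matrix, floydWarshall and bestTourThrough are hypothetical; weights are assumed non-negative with a zero diagonal, and the tour is assumed to be closed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<long long>>;

// Replaces d[i][j] by the length of the shortest path from i to j.
// Works for directed graphs with different weights in each direction.
void floydWarshall(Matrix& d) {
    const std::size_t n = d.size();
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j)
                if (d[i][k] + d[k][j] < d[i][j]) d[i][j] = d[i][k] + d[k][j];
}

// Cheapest closed walk that visits every vertex in required at least once.
// Other vertices may be used as intermediate stops.
long long bestTourThrough(Matrix w, std::vector<int> required) {
    assert(!required.empty());
    floydWarshall(w);
    std::sort(required.begin(), required.end());
    const int start = required.front();
    // rest starts out sorted, so next_permutation enumerates every order.
    std::vector<int> rest(required.begin() + 1, required.end());
    long long best = std::numeric_limits<long long>::max();
    do {
        long long cost = 0;
        int cur = start;
        for (int v : rest) {
            cost += w[cur][v];
            cur = v;
        }
        cost += w[cur][start];  // return to the start vertex
        best = std::min(best, cost);
    } while (std::next_permutation(rest.begin(), rest.end()));
    return best;
}
```

With fewer than 10 vertices this enumerates at most 8! = 40320 orders, so brute force stays cheap. Passing all vertices as required gives problem (A); passing a subset gives problem (B).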
Thanks!

What algorithm is OpenCV's GCGRAPH (max flow) based on?

OpenCV has an implementation of a max-flow algorithm (class GCGRAPH in file gcgraph.hpp). It's available here.
Does anyone know which particular max-flow algorithm is implemented by this class?
I am not 100% confident about this, but I believe that the algorithm is based on this research paper describing max-flow algorithms for computer vision. Specifically, Section 3 describes a new algorithm for computing maximum flows.
I haven't lined up every detail of the paper's algorithm with the implementation of the algorithm, but many details seem to match:
The algorithm described works by using a bidirectional search from both s and t, which the implementation is doing as well: for example, there's a comment reading // grow S & T search trees, find an edge connecting them.
The algorithm described keeps track of a set of orphaned nodes, which the variable std::vector<Vtx*> orphans seems to track in the implementation.
The algorithm described works by building up a set of trees and reusing them; the algorithm implementation keeps track of a tree associated with each node.
I hope this helps!

Graph - strongly connected components

Is there any fast way to determine the size of the largest strongly connected component in a graph?
I mean, like, the obvious approach would mean determining every SCC (could be done using two DFS calls, I suppose) and then looping through them and taking the maximum.
I'm pretty sure there has to be some better approach if I only need to have the size of that component and only the largest one, but I can't think of a good solution. Any ideas?
Thanks.
Let me answer your question with another question -
How can you determine which value in a set is the largest without examining all of the values?
Firstly, you could use Tarjan's algorithm, which needs only one DFS instead of two. The SCCs of a graph form a DAG, and Tarjan's algorithm finds them in reverse topological order of that DAG. So if you have a sense of the graph (like a visual representation) and you know that the relatively big SCCs occur at the end of the DAG, then you could stop the algorithm once the first few SCCs are found.
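A minimal sketch of the single-DFS approach, assuming the graph is given as an adjacency list; the names TarjanState and largestSccSize are hypothetical. Instead of collecting the components, it only counts the size of each one as it is popped from the stack and keeps the maximum.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct TarjanState {
    const std::vector<std::vector<int>>& adj;
    std::vector<int> index;     // DFS discovery order, -1 if unvisited
    std::vector<int> low;       // lowest index reachable from the subtree
    std::vector<bool> onStack;
    std::vector<int> stack;
    int counter = 0;
    int largest = 0;

    explicit TarjanState(const std::vector<std::vector<int>>& graph)
        : adj(graph),
          index(graph.size(), -1),
          low(graph.size(), 0),
          onStack(graph.size(), false) {}

    // Recursive DFS; very deep graphs may need an iterative version.
    void visit(int u) {
        index[u] = low[u] = counter++;
        stack.push_back(u);
        onStack[u] = true;
        for (int v : adj[u]) {
            if (index[v] == -1) {
                visit(v);
                low[u] = std::min(low[u], low[v]);
            } else if (onStack[v]) {
                low[u] = std::min(low[u], index[v]);
            }
        }
        if (low[u] == index[u]) {  // u is the root of a component
            int size = 0;
            int v;
            do {
                v = stack.back();
                stack.pop_back();
                onStack[v] = false;
                ++size;
            } while (v != u);
            largest = std::max(largest, size);
        }
    }
};

// Size of the largest strongly connected component (0 for an empty graph).
int largestSccSize(const std::vector<std::vector<int>>& adj) {
    TarjanState state(adj);
    for (int u = 0; u < static_cast<int>(adj.size()); ++u) {
        if (state.index[u] == -1) state.visit(u);
    }
    return state.largest;
}
```

This still examines every node and edge once, in line with the point made above that the maximum cannot be known without looking at all components.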

Boost Graph Library: Is there a neat algorithm built into BGL for community detection?

Anybody out there using BGL for large production servers?
How many nodes does your network consist of?
How do you handle community detection?
Does BGL have any cool ways to detect communities?
Sometimes two communities might be linked together by one or two edges, but these edges are not reliable and can fade away. Sometimes there are no edges at all.
Could someone speak briefly on how to solve this problem.
Please open my mind and inspire me.
So far I have managed to work out whether two nodes are on the same island (in the same community) in a less expensive manner, but now I need to work out which two nodes on separate islands are closest to each other. We can only make minimal use of unreliable geographical data.
To put it figuratively, outside the social-distance context: think of a mainland and an island. I want to work out which two bits of land are closest together across the body of water.
I've used the BGL for graphs with millions of nodes, but the size of the graph you can use depends on what algorithm you are trying to run. You can quickly compute distances between nodes. There are 4 shortest-path algorithms, and which one is most applicable depends on your data (single pairs of points, all pairs of points, sparse or dense graphs, ...).
As for community detection, there aren't any algorithms built into the BGL specifically for that (but maybe you can contribute one when you are finished with your project). There are a few algorithms that might be helpful in building a community detection algorithm. The max-flow/min-cut algorithms are typically used in community detection: if a lot of flow is possible between two nodes, they are likely to be in the same community; if there isn't much flow, the min-cut is likely to represent the roads between communities. There are also heuristics that order the nodes of the graph to reduce bandwidth. Nodes making up "communities" are likely to be close to each other in such an ordering.
As far as I know BGL doesn't have any algorithms specifically for community detection.
By "island" do you mean a disconnected subgraph?
Also, graphs do not have any notion of 'distance'.
This 'social distance' is something that you are going to have to define. Once you've done that a large part of the work is done.
There are numerous methods listed on the page you linked to, most of those only require you to define something like a 'distance' metric, and then plug your definitions into the algorithm.
@David Nehme
Graphs without edge-weights are only about connectedness, they have no notion of distance. If you want to talk about a network then you can talk about distance. But a graph with no edge-weights does not have any distance, unless you want to assume an implied edge-weight of 1 for all edges. But this is really just turning the graph into a network.
Also, he is talking about the distance between two disconnected graphs. To model this, you have to introduce an external concept for distance between nodes, separate from the edge-distance.
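To illustrate the point about an external distance, here is a minimal sketch that does not use the BGL: it labels the connected components ("islands") with a breadth-first search and then finds the closest pair of nodes across two islands using a distance function supplied by the caller. The names labelIslands and closestAcross are hypothetical, and what externalDistance should compute (for example, something derived from the geographical data) is left entirely to the application.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Labels every node with the id of its connected component ("island").
// adj is an undirected adjacency list.
std::vector<int> labelIslands(const std::vector<std::vector<int>>& adj) {
    std::vector<int> label(adj.size(), -1);
    int next = 0;
    for (int s = 0; s < static_cast<int>(adj.size()); ++s) {
        if (label[s] != -1) continue;
        std::queue<int> q;
        q.push(s);
        label[s] = next;
        while (!q.empty()) {
            const int u = q.front();
            q.pop();
            for (int v : adj[u]) {
                if (label[v] == -1) {
                    label[v] = next;
                    q.push(v);
                }
            }
        }
        ++next;
    }
    return label;
}

// Pair (a, b) with a on islandA and b on islandB that minimises the
// externally defined distance; (-1, -1) if either island is empty.
// Checks every cross pair, so it costs O(|A| * |B|) distance evaluations.
std::pair<int, int> closestAcross(
        const std::vector<int>& label, int islandA, int islandB,
        const std::function<double(int, int)>& externalDistance) {
    assert(islandA != islandB);
    std::pair<int, int> best(-1, -1);
    double bestDistance = std::numeric_limits<double>::infinity();
    const int n = static_cast<int>(label.size());
    for (int a = 0; a < n; ++a) {
        if (label[a] != islandA) continue;
        for (int b = 0; b < n; ++b) {
            if (label[b] != islandB) continue;
            const double d = externalDistance(a, b);
            if (d < bestDistance) {
                bestDistance = d;
                best = std::make_pair(a, b);
            }
        }
    }
    return best;
}
```

For graphs with millions of nodes the quadratic scan would be too slow, and some spatial index over the external coordinates would be needed instead; that is beyond this sketch.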