Optimize Network Graph creation - c++

I have the following code that goes through a 188k x 188k matrix and attempts to create a network graph out of it. The problem is that my algorithm is extremely slow (as expected, since it's quadratic). Is there a better way of doing this that I'm not seeing? I'm already thinking of using OpenMP to parallelize it, but it would be great if I didn't have to.
Here's what's true about my matrix: it's symmetric, it's over 188 thousand by 188 thousand, and each value in the matrix is an edge weight. For example, an element aij is the weight of the edge between i and j. Here's my code:
The graph creation:
typedef boost::adjacency_list
<
boost::vecS,
boost::vecS,
boost::undirectedS,
boost::property<boost::vertex_name_t, std::string>,
boost::property<boost::edge_weight_t, float>,
boost::property<boost::graph_name_t, std::string>
> UGraph;
typedef UGraph::vertex_descriptor vertex_t;
typedef UGraph::edge_descriptor edge_t;
Now the function creating the network:
vertex_t u;
vertex_t v;
edge_t e;
bool found=0;
int idx =0;
float cos_similarity;
for(int p =1;p<=adj_matrix.cols();p++){
//using a previously created vector to track already created nodes
if(std::find(created_nodes.begin(), created_nodes.end(), nodes[idx]) == created_nodes.end()){
u = add_vertex(nodes[idx], ug);
created_nodes.push_back(nodes[idx]);
}else{
u = vertex(p,ug);
}
int jdx = 0;
for(int q =1;q<=adj_matrix.cols();q++){
if(p!=q){//NO LOOPS IN THIS GRAPH
//using a previously created vector to track already created nodes
if(std::find(created_nodes.begin(), created_nodes.end(), nodes[jdx]) == created_nodes.end()){
v = add_vertex(nodes[jdx], ug);
created_nodes.push_back(nodes[jdx]);
}else{
v = vertex(q,ug);
}
tie(e, found) = edge(u, v, ug);
if(!found){//check that edge does not already exist
cos_similarity = adj_matrix(p,q);
fil<<cos_similarity<<endl;
fil.flush();
if(cos_similarity >= 0.2609){ //only add edge if value of cell is greater than this threshold
boost::add_edge(u,v,cos_similarity, ug);
edge_out<<p<<" "<<q<<" "<<cos_similarity<<endl; //creating an edge-weight list for later use
}
}
}
jdx++;
}
idx++;
}

A simple tip:
I think your algorithm is cubic rather than quadratic: created_nodes is a vector, and the std::find(created_nodes.begin(), created_nodes.end(), ...) call inside the inner loop is a linear scan over it, so each of the n^2 iterations can cost up to O(n).
To avoid duplicates and keep the algorithm quadratic, you can just traverse the upper triangle of the matrix: it's symmetric, which means the graph is undirected.
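The upper-triangle idea can be sketched as follows. This is a minimal illustration, not the asker's code: the matrix is assumed to be a plain std::vector of rows rather than the adj_matrix type from the question, and the names WeightedEdge and upper_triangle_edges are made up for the example.

```cpp
#include <cstddef>
#include <vector>

// One weighted, undirected edge between vertices i and j (always i < j).
// Hypothetical helper type for this sketch.
struct WeightedEdge {
    std::size_t i;
    std::size_t j;
    float weight;
};

// Collect edges from the strict upper triangle of a symmetric matrix.
// Each unordered pair {i, j} is visited exactly once, so no duplicate
// check (std::find or an edge lookup) is needed, and the diagonal is
// skipped, so no self-loops are produced.
std::vector<WeightedEdge> upper_triangle_edges(
    const std::vector<std::vector<float>>& matrix, float threshold)
{
    std::vector<WeightedEdge> edges;
    const std::size_t n = matrix.size();
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = i + 1; j < n; ++j) {
            if (matrix[i][j] >= threshold) {
                edges.push_back({i, j, matrix[i][j]});
            }
        }
    }
    return edges;
}
```

With the graph type from the question, one could presumably create all vertices up front (for example UGraph ug(n)) and then call boost::add_edge(e.i, e.j, e.weight, ug) for each collected edge, which removes the need for the created_nodes vector altogether.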

Related

How to use Boost's Kruskal MST algorithm on CGAL's Triangulation_3?

I've been trying to puzzle out how to form edge descriptors for a CGAL Triangulation_3 such that I can use Boost's implementation of Kruskal's Minimum Spanning Tree on that Triangulation.
I have been reading through the reference documentation for a Triangulation_2 (provided here), but have observed that no implementation exists for boost::graph_traits<Triangulation_3>. While puzzling it out, I found that I could possibly provide my own implementation for edge descriptors through an adjacency list as shown in Boost's example for a Kruskal MST, but got lost and confused at this step, and didn't know if that would be a sufficient approach.
Ultimately, it seems that what I need to do is create a boost Graph implementation, but am lost at what resources I need to accomplish this step. From there, the desired use is to be able to traverse this MST to perform graph-based min-cuts at specific edges matching a predicate.
// EDIT :>
My current attempt revolves around creating the EMST via pushing simplex edges defined as a pair of vertex iterate indices, with weights defined as euclidean distance between vertices (a Point_3 with data attached), using the Graph construction shown in the Boost example.
My hope is to have BGL vertices (as a Point_3 with color information attached) be connected by BGL edges (as a simplex[!] edge after the triangulation). My ultimate use just requires that I traverse some sort of contiguous spatial ordering of my Point_3's (with RGB info), and split estimated planes into "patches" which meet a max-distance (complete linkage?) threshold, or a within-patch distance variance. It's not exactly segmentation, but similar.
// some defns...
using RGBA = std::array<uint16_t, 3>;
using PointData = boost::tuple<
Point_3, // Point location; Easting-Altitude-Northing
Vector_3, // Estimated Normal Vector at Point
RGBA, // Photo Color
RGBA, // RANSAC Shape Colorization
size_t, // Estimated Patch ID
RGBA>; // Estimated Patch Colorization
//
// Some operations on the points and RANSAC estimation occurs here
//
// iterate through shapes
while (it != shapes.end()) {
boost::shared_ptr<EfficientRANSAC::Shape> shape = *it;
std::cout << (*it)->info() << std::endl;
// generate a random color code for this shape
RGBA rgb;
for (int i=0; i<3; i++) {
rgb[i] = rand()%256;
}
// Form triangulation to later convert into Graph representation
using VertexInfoBase = CGAL::Triangulation_vertex_base_with_info_3<
PointData,
Kernel
>;
using TriTraits = CGAL::Triangulation_data_structure_3<
VertexInfoBase,
CGAL::Delaunay_triangulation_cell_base_3<Kernel>,
CGAL::Parallel_tag
>;
using Triangulation_3 = CGAL::Delaunay_triangulation_3<Kernel, TriTraits>;
Triangulation_3 tr;
// Iterate through point indices assigned to each detected shape.
std::vector<std::size_t>::const_iterator
index_it = (*it)->indices_of_assigned_points().begin();
while (index_it != (*it)->indices_of_assigned_points().end()) {
PointData& p = *(points.begin() + (*index_it));
// assign shape diagnostic color info
boost::get<3>(p) = rgb;
// insert Point_3 data for triangulation and attach PointData info
auto vertex = tr.insert(boost::get<0>(p));
vertex->info() = p;
index_it++; // next assigned point
}
std::cout << "Found triangulation with: \n\t" <<
tr.number_of_vertices() << "\tvertices\n\t" <<
tr.number_of_edges() << "\tedges\n\t" <<
tr.number_of_facets() << "\tfacets" << std::endl;
// build a Graph out of the triangulation that we can do a Minimum-Spanning-Tree on
using Graph = boost::adjacency_list<
boost::vecS, // OutEdgeList
boost::vecS, // VertexList
boost::undirectedS, // Directed
boost::no_property, // VertexProperties
boost::property< boost::edge_weight_t, int >, // EdgeProperties
boost::no_property, // GraphProperties
boost::listS // EdgeList
>;
using Edge = boost::graph_traits<Graph>::edge_descriptor;
using E = std::pair< size_t, size_t >; // <: TODO - should be iterator index of vertex in Triangulation_3 instead of size_t?
std::vector<E> edge_array; // edges should be between Point_3's with attached RGBA photocolor info.
// It is necessary to later access both the Point_3 and RGBA info for vertices after operations are performed on the EMST
std::vector<float> weights; // weights are `std::sqrt(CGAL::squared_distance(...))` between these Point_3's
// Question(?) :> Should be iterating over "finite" edges here?
for (auto edge : tr.all_edges()) {
// insert simplex (!!) edge (between-vertices) here
edge_array.push_back(...);
// generate weight using std::sqrt(CGAL::squared_distance(...))
weights.push_back(...);
}
// build Graph from `edge_array` and `weights`
Graph g(...);
// build Euclidean-Minimum-Spanning-Tree (EMST) as list of simplex edges between vertices
std::list<E> emst;
boost::kruskal_minimum_spanning_tree(...);
// - traverse EMST from start of list, performing "cuts" into "patches" when we have hit
// max patch distance (euclidean) from current "first" vertex of "patch".
// - have to be able to access Triangulation_3 vertex info (via `locate`?) here
// - foreach collection of PointData in patch, assign `patch_id` and diagnostic color info,
// then commit individual serialized "patches" collections of Point_3 and RGBA photocolor to database
// TODO: traverse the EMST and cut it into patches (not implemented yet)
it++; // next shape
}
The end goal of traversing each of the shapes using a Minimum Spanning Tree via Triangulation is to break each of the RANSAC estimated shapes into chunks, for other purposes.
Do you want the graph of vertices and edges, or the graph of the dual, where the tetrahedra would be BGL vertices and the faces between tetrahedra would be BGL edges?
For both, it is not that hard to write the specialization of the graph traits class and some free functions to navigate. Get inspired by the code for the 2D version of graph_traits.
Ultimately, it seems that what I need to do is create a boost Graph implementation, but am lost at what resources I need to accomplish this step.
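The documentation of kruskal_minimum_spanning_tree lists its concept requirements: the graph type must model Vertex List Graph and Edge List Graph.
You can zoom in on what that implies in the concept pages for VertexListGraph and EdgeListGraph.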
I found that I could possibly provide my own implementation for edge descriptors through an adjacency list as shown in Boost's example for a Kruskal MST, but got lost and confused at this step, and didn't know if that would be a sufficient approach.
It would be fine to show your attempt as a question, because it would help us know where you are stuck. Right now there is really no code to "go at", so I'll happily await a newer, more concrete question.
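To illustrate why those two concepts are all Kruskal needs, here is a minimal standalone sketch of the algorithm over a vertex count and an edge list. It is not Boost's implementation and uses no CGAL types; MstEdge, DisjointSets and kruskal_mst are names invented for this example, and vertices are assumed to be identified by indices 0..n-1.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical edge record: two vertex indices and a weight.
struct MstEdge {
    std::size_t u;
    std::size_t v;
    double weight;
};

// Disjoint-set forest with path halving, used to detect cycles.
class DisjointSets {
public:
    explicit DisjointSets(std::size_t n) : parent_(n) {
        std::iota(parent_.begin(), parent_.end(), std::size_t{0});
    }
    std::size_t find(std::size_t x) {
        while (parent_[x] != x) {
            parent_[x] = parent_[parent_[x]];  // path halving
            x = parent_[x];
        }
        return x;
    }
    // Returns true if a and b were in different sets and are now merged.
    bool unite(std::size_t a, std::size_t b) {
        a = find(a);
        b = find(b);
        if (a == b) return false;
        parent_[a] = b;
        return true;
    }
private:
    std::vector<std::size_t> parent_;
};

// Kruskal: sort edges by weight, keep every edge that joins two components.
std::vector<MstEdge> kruskal_mst(std::size_t vertex_count, std::vector<MstEdge> edges) {
    std::sort(edges.begin(), edges.end(),
              [](const MstEdge& a, const MstEdge& b) { return a.weight < b.weight; });
    DisjointSets sets(vertex_count);
    std::vector<MstEdge> tree;
    for (const MstEdge& e : edges) {
        if (sets.unite(e.u, e.v)) tree.push_back(e);
    }
    return tree;
}
```

The only inputs are "all vertices" (here just their count) and "all edges with weights", which is what the two BGL concepts expose. For a connected graph the result has vertex_count - 1 edges, so a different count is a quick hint that the edge list is incomplete or contains unexpected vertices.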
I was able to find an attempt at an answer. I added another property to my Point collection implementation (that included the index of that point in an array), and used this to iterate over edges in the triangulation to build the Graph, before running the EMST algorithm on it.
However, the real answer is don't do this -- it still is not working correctly (incorrect number of edges, including infinite vertices in the edge list, and other problems).
// Form triangulation to later convert into Graph representation
using VertexInfoBase = CGAL::Triangulation_vertex_base_with_info_3<
PointData,
Kernel
>;
using TriTraits = CGAL::Triangulation_data_structure_3<
VertexInfoBase,
CGAL::Delaunay_triangulation_cell_base_3<Kernel>,
CGAL::Parallel_tag
>;
using Triangulation_3 = CGAL::Delaunay_triangulation_3<Kernel, TriTraits>;
Triangulation_3 tr;
// Iterate through point indices assigned to each detected shape.
std::vector<std::size_t>::const_iterator
index_it = (*it)->indices_of_assigned_points().begin();
while (index_it != (*it)->indices_of_assigned_points().end()) {
PointData& p = *(points.begin() + (*index_it));
// assign shape diagnostic color info
boost::get<3>(p) = rgb;
// insert Point_3 data for triangulation and attach PointData info
TriTraits::Vertex_handle vertex = tr.insert(boost::get<0>(p));
vertex->info() = p;
index_it++; // next assigned point
}
std::cout << "Found triangulation with: \n\t" <<
tr.number_of_vertices() << "\tvertices\n\t" <<
tr.number_of_edges() << "\tedges\n\t" <<
tr.number_of_facets() << "\tfacets" << std::endl;
// build a Graph out of the triangulation that we can do a Minimum-Spanning-Tree on
// examples taken from https://www.boost.org/doc/libs/1_80_0/libs/graph/example/kruskal-example.cpp
using Graph = boost::adjacency_list<
boost::vecS, // OutEdgeList
boost::vecS, // VertexList
boost::undirectedS, // Directed
boost::no_property, // VertexProperties
boost::property< boost::edge_weight_t, double > // EdgeProperties
>;
using Edge = boost::graph_traits<Graph>::edge_descriptor;
using E = std::pair< size_t, size_t >; // <: TODO - should be iterator index of vertex in Triangulation_3 instead of size_t?
Graph g(tr.number_of_vertices());
boost::property_map< Graph, boost::edge_weight_t >::type weightmap = boost::get(boost::edge_weight, g);
// iterate over finite edges in the triangle, and add these
for (
Triangulation_3::Finite_edges_iterator eit = tr.finite_edges_begin();
eit != tr.finite_edges_end();
eit++
)
{
Triangulation_3::Segment s = tr.segment(*eit);
Point_3 vtx = s.point(0);
Point_3 n_vtx = s.point(1);
// locate the (*eit), get vertex handles?
// from https://www.appsloveworld.com/cplus/100/204/how-to-get-the-source-and-target-points-from-edge-iterator-in-cgal
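// In a Triangulation_3 an Edge is a (cell, i, j) triple whose endpoints are
// vertices i and j of that cell; the `(second + k) % 3` idiom only applies
// to Triangulation_2 edges, which are (face, i) pairs.
Triangulation_3::Vertex_handle vh1 = eit->first->vertex(eit->second);
Triangulation_3::Vertex_handle vh2 = eit->first->vertex(eit->third);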
double weight = std::sqrt(CGAL::squared_distance(vtx, n_vtx));
Edge e;
bool inserted;
boost::tie(e, inserted)
= boost::add_edge(
boost::get<6>(vh1->info()),
boost::get<6>(vh2->info()),
g
);
weightmap[e] = weight;
}
// build Euclidean-Minimum-Spanning-Tree (EMST) as list of simplex edges between vertices
//boost::property_map<Graph, boost::edge_weight_t>::type weight = boost::get(boost::edge_weight, g);
std::vector<Edge> spanning_tree;
boost::kruskal_minimum_spanning_tree(g, std::back_inserter(spanning_tree));
// we can use something like a hash table to go from source -> target
// for each of the edges, making traversal easier.
// from there, we can keep track or eventually find a source "key" which
// does not correspond to any target "key" within the table
std::unordered_map< size_t, std::vector<size_t> > map = {};
// iterate minimum spanning tree to build unordered_map (hashtable)
std::cout << "Found minimum spanning tree of " << spanning_tree.size() << " edges for #vertices " << tr.number_of_vertices() << std::endl;
for (std::vector< Edge >::iterator ei = spanning_tree.begin();
ei != spanning_tree.end(); ++ei)
{
size_t source = boost::source(*ei, g);
size_t target = boost::target(*ei, g);
// << " with weight of " << weightmap[*ei] << std::endl;
// operator[] default-constructs an empty vector the first time a key is seen
map[source].push_back(target);
}
// iterate over map to find an "origin" node
size_t origin = 0;
for (const auto& it : map) {
bool exit_flag = false;
std::vector<size_t> check_targets = it.second;
for (size_t target : check_targets) {
if (map.find(target) == map.end()) {
origin = target;
exit_flag = true;
break;
}
}
if (exit_flag) {
break;
}
}
std::cout << "Found origin of tree with value: " << origin << std::endl;
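Because the spanning tree is undirected, a source -> target table only captures one direction of each edge. A possible alternative, sketched here with standard containers only, is to store every tree edge in both directions and walk the tree breadth-first from any chosen vertex. The names Adjacency, tree_adjacency and bfs_order are hypothetical, and the edges are assumed to be pairs of vertex indices such as those returned by boost::source and boost::target.

```cpp
#include <cstddef>
#include <queue>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

using Adjacency = std::unordered_map<std::size_t, std::vector<std::size_t>>;

// Store each undirected tree edge in both directions.
Adjacency tree_adjacency(
    const std::vector<std::pair<std::size_t, std::size_t>>& tree_edges)
{
    Adjacency adj;
    for (const auto& e : tree_edges) {
        adj[e.first].push_back(e.second);
        adj[e.second].push_back(e.first);
    }
    return adj;
}

// Breadth-first visiting order of the tree, starting at `root`.
std::vector<std::size_t> bfs_order(const Adjacency& adj, std::size_t root) {
    std::vector<std::size_t> order;
    std::unordered_set<std::size_t> seen{root};
    std::queue<std::size_t> pending;
    pending.push(root);
    while (!pending.empty()) {
        const std::size_t current = pending.front();
        pending.pop();
        order.push_back(current);
        const auto it = adj.find(current);
        if (it == adj.end()) continue;  // vertex without any tree edge
        for (std::size_t next : it->second) {
            if (seen.insert(next).second) pending.push(next);
        }
    }
    return order;
}
```

With this layout any vertex can serve as the origin, so the separate search for a key that never appears as a target is not required.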

Using defined classes as edge weight for an undirected graph in graph boost library

I am new to Boost Graph Library and have questions concerning undirected graphs.
My problem is a 2D space in which several landmarks are positioned. At the beginning one Master-landmark is defined, and the goal is to compute the relative poses from the Master-landmark to all other landmarks. Since the grid is too big to see all at once, I only get local information, meaning the relative connection/pose between two or more landmarks (the Master-landmark does not have to be in every image). Given this local information I need to build a tree in order to get the paths from the Master-landmark to the other landmarks. So far so good.
Since the landmark observation is done by another program, I build a matrix in which the information about the landmark relations is encoded (were two landmarks seen together? If yes, the relative pose; if not, 0). This matrix is the so-called tag_relation_mat matrix of size N×N. N is the number of landmarks, and both the rows and the columns of the matrix represent all N landmarks to encode their relation. Each element (if there is a connection between the two landmarks, otherwise it is 0) is a rel_marker_pose consisting of a 2D pose (x- and y-values) and an error value (a high value is bad). In other words, tag_relation_mat is a matrix which encodes all valid connections between the landmarks.
Based on this matrix I'd like to generate an undirected graph and compute the relative poses from the Master-landmark to all others given the best error value.
So what I am basically doing right now (with the help of this link) is
to build up an undirected graph and
solve it with the dijkstra_shortest_paths algorithm.
The outcome is the combined weight (from master-landmark to landmark) and the parent of the landmark.
Given this information, I am recursively generating the paths from each landmark to the master-landmark by combining the transformations given by the N×N matrix:
.h file:
// Edge weight.
typedef boost::property<boost::edge_weight_t, double> EdgeWeightProperty;
typedef boost::adjacency_list < boost::setS, boost::vecS, boost::undirectedS,
boost::no_property, EdgeWeightProperty > Graph; //graph type
typedef typename boost::graph_traits<Graph>::vertex_descriptor Vertex; //vertex descriptor
typedef typename boost::graph_traits<Graph>::edge_descriptor Edge; //edge descriptor
typedef std::pair<int, int> _edge;
class GraphTree {
public:
GraphTree(int no_of_tags) {
this->num_vertices = no_of_tags;
//generate vertices
for(int i = 0; i < no_of_tags; i++) {
Vertex v = boost::add_vertex(this->undirected_graph);
idx_to_vertex.insert({i, v});
}
}
/**
* @brief generateTree generates tree graph with tags as edges and
*/
void generateTree();
/**
* @brief solveTree thin tree based on chosen solver
*/
std::vector<rel_marker_pose> solveTree(tree_solver solver, int master_vertex_idx);
/**
* @brief getRelPosesToMaster given a master tag (either defined or automatically chosen) rel. poses to the other existing tags are computed
*/
void getRelPosesToMaster(int master_node, std::vector<Vertex> parent_vec, int start_node, rel_marker_pose* rel_m_pose);
private:
// declare a graph object
Graph undirected_graph;
std::map<int, Vertex> idx_to_vertex;
int num_vertices;// const int num_vertices = N;
int num_edges;//int num_edges = sizeof(edge_array)/sizeof(edge_array[0]);
std::vector<Edge> edge_vec;
};
.cpp file:
void GraphTree::generateTree() {
for(int i = 0; i < this->num_vertices; ++i) { //iterate over rows (landmark_from)
for(int j = 0; j < this->num_vertices; ++j) { //iterate over columns (landmark_to)
double weight = this->tag_relation_mat.at(i).at(j).rel_pose.quality;
//check if tags have been seen together -> add only relevant edges
if(this->tag_relation_mat.at(i).at(j).landmark_from != 0 && this->tag_relation_mat.at(i).at(j).landmark_to != 0) {
boost::add_edge(this->idx_to_vertex[i], this->idx_to_vertex[j], EdgeWeightProperty(weight), this->undirected_graph);
}
}
}
}
std::vector<rel_marker_pose> GraphTree::solveTree(tree_solver solver, int master_vertex_idx) {
std::vector<rel_marker_pose> master_to_vertex_vec;
if(solver == tree_solver::Dijkstra) { //http://www.boost.org/doc/libs/1_46_1/libs/graph/example/dijkstra-example.cpp
//create vectors to store the predecessors (p) and the distances from the root (d)
std::vector<Vertex> p(boost::num_vertices(this->undirected_graph));
std::vector<double> d(boost::num_vertices(this->undirected_graph));
//create a descriptor for the source node
Vertex master_vertex = vertex(master_vertex_idx, this->undirected_graph);
//evaluate dijkstra on graph g with source s, predecessor_map p and distance_map d
boost::dijkstra_shortest_paths(this->undirected_graph, master_vertex, boost::predecessor_map(&p[0]).distance_map(&d[0]));
boost::graph_traits < Graph >::vertex_iterator vi, vend;
for (boost::tie(vi, vend) = boost::vertices(this->undirected_graph); vi != vend; ++vi) {
rel_marker_pose r_m_p;
this->getRelPosesToMaster(master_vertex_idx, p, *vi, &r_m_p);
master_to_vertex_vec.push_back(r_m_p);
}
}
return master_to_vertex_vec;
}
void GraphTree::getRelPosesToMaster(int master_node, std::vector<Vertex> parent_vec, int start_node, rel_marker_pose* rel_m_pose ) {
rel_m_pose->rel_pose.quality += this->tag_relation_mat.at(start_node).at(parent_vec[start_node]).rel_pose.quality;
// ...
// combine the transformations
// ...
rel_m_pose->landmark_from = parent_vec[start_node];
if(parent_vec[start_node] == master_node) {
return;
}
else {
this->getRelPosesToMaster(master_node, parent_vec, parent_vec[start_node], rel_m_pose);
}
}
Well, it works, but is definitely not the most elegant way! I was wondering:
Is there any possibility to add all relevant information directly into the undirected graph (landmark name to the vertices; pose and quality to the edges)? So far, it is just the quality, which is then used to solve the graph. That way, after generating the graph, I would only need the graph and would not have to merge information between graph and matrix after solving it.
Is there an implemented way (in the boost lib) to obtain all visited edges from master-landmark to landmark?
Thanks a lot for your help!
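Regarding the visited edges: the parent vector filled by Dijkstra already encodes them, so the recursion can be replaced by a simple walk. The following is a minimal sketch using plain indices instead of BGL descriptors; edges_from_master is a hypothetical helper, and it assumes the convention that a vertex which was never reached is its own parent.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Walk the predecessor vector from `target` back to `master` and return the
// traversed (parent, child) edges in master -> target order.
// Returns an empty vector if target == master or if target is unreachable.
std::vector<std::pair<std::size_t, std::size_t>> edges_from_master(
    const std::vector<std::size_t>& parent, std::size_t master, std::size_t target)
{
    std::vector<std::pair<std::size_t, std::size_t>> edges;
    std::size_t current = target;
    while (current != master) {
        const std::size_t previous = parent[current];
        // Assumption: an unreached vertex is its own parent. The size guard
        // also stops the walk if the parent vector is malformed.
        if (previous == current || edges.size() >= parent.size()) return {};
        edges.emplace_back(previous, current);
        current = previous;
    }
    std::reverse(edges.begin(), edges.end());
    return edges;
}
```

Each returned pair can then be used to look up the pose of that hop, whether it is stored in the N×N matrix or as a property on the corresponding graph edge.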

Obtain predecessors with boost BGL for an all-pair shortest path search

I am using boost's BGL and I managed to compute the distance matrix in a graph where all the weights are set to one as follows:
using EdgeProperty = boost::property<boost::edge_weight_t, size_t>;
using UGraph =
boost::adjacency_list<
boost::vecS,
boost::vecS,
boost::undirectedS,
boost::no_property,
EdgeProperty
>;
using DistanceProperty = boost::exterior_vertex_property<UGraph, size_t>;
using DistanceMatrix = DistanceProperty::matrix_type;
template<typename Matrix>
Matrix distance_matrix(const UGraph& ug){
const size_t n_vertices{ boost::num_vertices(ug) };
DistanceMatrix d{n_vertices};
boost::johnson_all_pairs_shortest_paths(ug, d);
Matrix dist{ linalg::zeros<Matrix>(n_vertices, n_vertices) };
for(size_t j{0}; j < n_vertices; j++){
for(size_t i{0}; i < n_vertices; i++){
dist(i,j) = d[i][j];
}
}
return dist;
}
The element (i,j) of the distance matrix returned by distance_matrix corresponds to the number of edges between i and j along the shortest path (since the weights are all set to one).
How can I obtain the information needed to reconstruct the shortest paths in an all-pairs problem? The list of predecessors seems to be available only for single-source problems (using dijkstra_shortest_paths), and I can't see how to obtain similar information in the case of johnson_all_pairs_shortest_paths.
I would like to get the same result as in Python with scipy.sparse.csgraph.shortest_path when setting return_predecessors=True (see SciPy doc).
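One possible workaround, sketched below without BGL, is to run a single-source search from every vertex and keep one predecessor row per source. Since all weights are one, a breadth-first search is enough here. The names AdjacencyList, all_pairs_predecessors and shortest_path are invented for this sketch, and -1 is used as the "no predecessor" marker, which is an arbitrary choice and not SciPy's value.

```cpp
#include <algorithm>
#include <cstddef>
#include <queue>
#include <vector>

using AdjacencyList = std::vector<std::vector<std::size_t>>;
constexpr long kNoPredecessor = -1;  // arbitrary marker chosen for this sketch

// pred[s][v] is the vertex visited just before v on a shortest path from s,
// or kNoPredecessor if v == s or v is unreachable from s.
std::vector<std::vector<long>> all_pairs_predecessors(const AdjacencyList& adj) {
    const std::size_t n = adj.size();
    std::vector<std::vector<long>> pred(n, std::vector<long>(n, kNoPredecessor));
    for (std::size_t s = 0; s < n; ++s) {
        std::vector<bool> seen(n, false);
        std::queue<std::size_t> pending;
        seen[s] = true;
        pending.push(s);
        while (!pending.empty()) {
            const std::size_t u = pending.front();
            pending.pop();
            for (std::size_t v : adj[u]) {
                if (!seen[v]) {
                    seen[v] = true;
                    pred[s][v] = static_cast<long>(u);
                    pending.push(v);
                }
            }
        }
    }
    return pred;
}

// Rebuild the vertex sequence from `from` to `to`; empty if unreachable.
std::vector<std::size_t> shortest_path(
    const std::vector<std::vector<long>>& pred, std::size_t from, std::size_t to)
{
    std::vector<std::size_t> path;
    if (from != to && pred[from][to] == kNoPredecessor) return path;
    for (std::size_t v = to; v != from; v = static_cast<std::size_t>(pred[from][v])) {
        path.push_back(v);
    }
    path.push_back(from);
    std::reverse(path.begin(), path.end());
    return path;
}
```

With weighted edges the same layout should carry over by calling dijkstra_shortest_paths once per source vertex with a predecessor_map that points at that source's row.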

Creating random undirected graph in C++

The issue is that I need to create a random undirected graph to benchmark Dijkstra's algorithm using an array and a heap to store vertices. AFAIK a heap implementation should be faster than an array on sparse and average graphs; however, on dense graphs the heap should become less efficient than an array.
I tried to write code that will produce a graph based on the input: the number of vertices and the total number of edges (the maximum number of edges in an undirected graph is n(n-1)/2).
On input, I divide the total number of edges by the number of vertices so that I have a constant number of edges coming out of every single vertex. The graph is represented by an adjacency list. Here is what I came up with:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <list>
#include <set>
#define MAX 1000
#define MIN 1
class Vertex
{
public:
int Number;
int Distance;
Vertex(void);
Vertex(int, int);
~Vertex(void);
};
Vertex::Vertex(void)
{
Number = 0;
Distance = 0;
}
Vertex::Vertex(int C, int D)
{
Number = C;
Distance = D;
}
Vertex::~Vertex(void)
{
}
int main()
{
int VertexNumber, EdgeNumber;
while(scanf("%d %d", &VertexNumber, &EdgeNumber) > 0)
{
int EdgesFromVertex = (EdgeNumber/VertexNumber);
std::list<Vertex>* Graph = new std::list<Vertex> [VertexNumber];
srand(time(NULL));
int Distance, Neighbour;
bool Exist, First;
std::set<std::pair<int, int>> Added;
for(int i = 0; i < VertexNumber; i++)
{
for(int j = 0; j < EdgesFromVertex; j++)
{
First = true;
Exist = true;
while(First || Exist)
{
Neighbour = rand() % (VertexNumber - 1) + 0;
if(!Added.count(std::pair<int, int>(i, Neighbour)))
{
Added.insert(std::pair<int, int>(i, Neighbour));
Exist = false;
}
First = false;
}
}
First = true;
std::set<std::pair<int, int>>::iterator next = Added.begin();
for(std::set<std::pair<int, int>>::iterator it = Added.begin(); it != Added.end();)
{
if(!First)
Added.erase(next);
Distance = rand() % MAX + MIN;
Graph[it->first].push_back(Vertex(it->second, Distance));
Graph[it->second].push_back(Vertex(it->first, Distance));
std::set<std::pair<int, int>>::iterator next = it;
First = false;
}
}
// Dijkstra's implementation
}
return 0;
}
I get an error:
"set iterator not dereferencable" when trying to create the graph from the set data.
I know it has something to do with erasing set elements on the fly; however, I need to erase them as soon as possible to reduce memory usage.
Maybe there's a better way to create an undirected graph? Mine is pretty raw, but that's the best I came up with. I was thinking about making a directed graph, which is an easier task, but it doesn't ensure that every two vertices will be connected.
I would be grateful for any tips and solutions!
Piotry had basically the same idea I did, but he left off a step.
Only read half the matrix, and ignore your diagonal when writing values. If you always want a node to have an edge to itself, put a one on the diagonal. If you never want a node to have an edge to itself, leave it as a zero.
You can read the other half of your matrix for a second graph for testing your implementation.
Look at the description of std::set::erase :
Iterator validity
Iterators, pointers and references referring to elements removed by
the function are invalidated.
All other iterators, pointers and
references keep their validity.
In your code, if next is equal to it and you erase the element of the std::set through next, then it is invalidated as well and you can no longer use it. In this case you must (at least) move it to another element before erasing, and only then keep using it.
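A common way to erase while iterating is to let erase hand back the next valid iterator, which std::set::erase does since C++11. The sketch below shows only that pattern; drain is a hypothetical helper, not part of the code in the question.

```cpp
#include <set>
#include <utility>
#include <vector>

// Move every pair out of `pending`, erasing each element right after use.
std::vector<std::pair<int, int>> drain(std::set<std::pair<int, int>>& pending) {
    std::vector<std::pair<int, int>> consumed;
    for (auto it = pending.begin(); it != pending.end(); /* advanced by erase */) {
        consumed.push_back(*it);  // use the element while the iterator is valid
        it = pending.erase(it);   // erase returns the iterator to the next element
    }
    return consumed;
}
```

In the loop from the question, the two push_back calls on Graph would take the place of consumed.push_back, and no separate next iterator or First flag would be needed.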

Algorithm for selecting all edges and vertices connected to one vertex

I'm using Boost Graph to try and make sense of some dependency graphs I have generated in Graphviz Dot format.
Unfortunately I don't know very much about graph theory, so I have a hard time framing what I want to know in terms of graph theory lingo.
From a directed dependency graph with ~150 vertices, I'd like to "zoom in" on one specific vertex V, and build a subgraph containing V, all its incoming edges and their incoming edges, all its outgoing edges and their outgoing edges, sort of like a longest path through V.
These dependency graphs are pretty tangled, so I'd like to remove clutter to make it clearer what might affect the vertex in question.
For example, given;
     g
     |
     v
a -> b -> c -> d
|    |         |
v    v         |
e    f <-------+
if I were to run the algorithm on c, I think I want;
     g
     |
     v
a -> b -> c -> d -> f
Not sure if b -> f should be included as well... I think of it as all vertices "before" c should have their in-edges included, and all vertices "after" c should have their out-edges included, but it seems to me that that would lose some information.
It feels like there should be an algorithm that does this (or something more sensible, not sure if I'm trying to do something stupid, cf b->f above), but I'm not sure where to start looking.
Thanks!
Ok, so I'll translate and adapt my tutorial to your specific question.
The documentation always assumes tons of "using namespace"; I won't use any so you know what is what.
Let's begin :
#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/astar_search.hpp>
First, define a Vertex and an Edge :
struct Vertex{
std::string name; // or whatever, maybe nothing
};
struct Edge{
// nothing, probably. Or a weight, a distance, a direction, ...
};
Create the type of your graph :
typedef boost::adjacency_list< // adjacency_list is a template depending on :
boost::listS, // The container used for edges : here, std::list.
boost::vecS, // The container used for vertices: here, std::vector.
boost::bidirectionalS, // Directed edges that also store in-edges (in_edges and inv_adjacent_vertices below need this; plain directedS only offers out-edges).
Vertex, // The type that describes a Vertex.
Edge // The type that describes an Edge
> MyGraph;
Now, you can use a shortcut to the type of the IDs of your Vertices and Edges :
typedef MyGraph::vertex_descriptor VertexID;
typedef MyGraph::edge_descriptor EdgeID;
Instantiate your graph :
MyGraph graph;
Read your Graphviz data, and feed the graph :
for (each Vertex V){
VertexID vID = boost::add_vertex(graph); // vID is the index of a new Vertex
graph[vID].name = whatever;
}
Notice that graph[ a VertexID ] gives a Vertex, but graph[ an EdgeID ] gives an Edge. Here's how to add one :
EdgeID edge;
bool ok;
boost::tie(edge, ok) = boost::add_edge(u,v, graph); // boost::add_edge gives a std::pair<EdgeID,bool>. It's complicated to write, so boost::tie does it for us.
if (ok) // make sure there wasn't any error (duplicates, maybe)
graph[edge].member = whatever you know about this edge
So now you have your graph. You want to get the VertexID for Vertex "c". To keep it simple, let's use a linear search :
MyGraph::vertex_iterator vertexIt, vertexEnd;
boost::tie(vertexIt, vertexEnd) = vertices(graph);
for (; vertexIt != vertexEnd; ++vertexIt){
VertexID vertexID = *vertexIt; // dereference vertexIt, get the ID
Vertex & vertex = graph[vertexID];
if (vertex.name == std::string("c")){} // Gotcha
}
And finally, to get the neighbours of a vertex :
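MyGraph::adjacency_iterator neighbourIt, neighbourEnd;
boost::tie(neighbourIt, neighbourEnd) = adjacent_vertices( vertexIdOfc, graph );
for (; neighbourIt != neighbourEnd; ++neighbourIt){
VertexID neighbourID = *neighbourIt; // ID of a vertex reached through an out-edge of c
}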
You can also get edges with
std::pair<out_edge_iterator, out_edge_iterator> out_edges(vertex_descriptor u, const adjacency_list& g)
std::pair<in_edge_iterator, in_edge_iterator> in_edges(vertex_descriptor v, const adjacency_list& g)
// don't forget boost::tie !
So, for your real question :
Find the ID of Vertex "c"
Find in_edges recursively
Find out_edges recursively
Example for in_edges (never compiled or tried, out of the top of my head):
void findParents(VertexID vID){
MyGraph::inv_adjacency_iterator parentIt, parentEnd;
boost::tie(parentIt, parentEnd) = inv_adjacent_vertices(vID, graph);
for(; parentIt != parentEnd; ++parentIt){
VertexID parentID = *parentIt;
Vertex & parent = graph[parentID];
add_edge_to_graphviz(vID, parentID); // or whatever
findParents(parentID);
}
}
For the other way around, just rename Parent into Children, and use adjacency_iterator / adjacent_vertices.
Here's how it ended up. I realized I needed to work entirely in terms of in-edges and out-edges:
// Graph-related types
typedef property < vertex_name_t, std::string > vertex_p;
typedef adjacency_list < vecS, vecS, bidirectionalS, vertex_p> graph_t;
typedef graph_t::vertex_descriptor vertex_t;
typedef std::set< graph_t::edge_descriptor > edge_set;
// Focussing algorithm
edge_set focus_on_vertex(graph_t& graph, const std::string& focus_vertex_name)
{
const vertex_t focus_vertex = find_vertex_named(graph, focus_vertex_name);
edge_set edges;
collect_in_edges(graph, focus_vertex, edges);
collect_out_edges(graph, focus_vertex, edges);
return edges;
}
// Helpers
void collect_in_edges(const graph_t& graph, vertex_t vertex, edge_set& accumulator)
{
typedef graph_t::in_edge_iterator edge_iterator;
edge_iterator begin, end;
boost::tie(begin, end) = in_edges(vertex, graph);
for (edge_iterator i = begin; i != end; ++i)
{
if (accumulator.find(*i) == accumulator.end())
{
accumulator.insert(*i);
collect_in_edges(graph, source(*i, graph), accumulator);
}
}
}
void collect_out_edges(const graph_t& graph, vertex_t vertex, edge_set& accumulator)
{
typedef graph_t::out_edge_iterator edge_iterator;
edge_iterator begin, end;
boost::tie(begin, end) = out_edges(vertex, graph);
for (edge_iterator i = begin; i != end; ++i)
{
if (accumulator.find(*i) == accumulator.end())
{
accumulator.insert(*i);
collect_out_edges(graph, target(*i, graph), accumulator);
}
}
}
vertex_t find_vertex_named(const graph_t& graph, const std::string& name)
{
graph_t::vertex_iterator begin, end;
boost::tie(begin, end) = vertices(graph);
for (graph_t::vertex_iterator i = begin; i != end; ++i)
{
if (get(vertex_name, graph, *i) == name)
return *i;
}
return boost::graph_traits<graph_t>::null_vertex(); // no vertex with that name
}
This also handles cycles before or after the vertex in question. My source dependency graph had cycles (shudder).
I made some attempts at generalizing collect_*_edges into a templated collect_edges, but I didn't have enough meta-programming debugging energy to spend on it.
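For what it's worth, the two helpers can also be merged without any template machinery by passing the direction as a run-time flag. The sketch below shows the idea on a small hand-rolled digraph instead of graph_t, so that it compiles without Boost; Digraph, Direction, EdgeSet and this version of collect_edges are all hypothetical names, not BGL API.

```cpp
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

using DirectedEdge = std::pair<std::size_t, std::size_t>;  // (source, target)
using EdgeSet = std::set<DirectedEdge>;

// Tiny stand-in for a bidirectional graph: both edge directions are stored.
struct Digraph {
    std::vector<std::vector<std::size_t>> successors;
    std::vector<std::vector<std::size_t>> predecessors;
    explicit Digraph(std::size_t n) : successors(n), predecessors(n) {}
    void add_edge(std::size_t from, std::size_t to) {
        successors[from].push_back(to);
        predecessors[to].push_back(from);
    }
};

enum class Direction { In, Out };

// Recursively collect all edges reachable from `vertex` in one direction.
// An edge that is already in the accumulator is not followed again, so
// cycles terminate.
void collect_edges(const Digraph& g, std::size_t vertex, Direction dir, EdgeSet& acc) {
    const auto& neighbours =
        dir == Direction::In ? g.predecessors[vertex] : g.successors[vertex];
    for (std::size_t other : neighbours) {
        const DirectedEdge e = dir == Direction::In ? DirectedEdge{other, vertex}
                                                    : DirectedEdge{vertex, other};
        if (acc.insert(e).second) collect_edges(g, other, dir, acc);
    }
}

// All edges leading into `vertex` plus all edges leading out of it.
EdgeSet focus_on_vertex(const Digraph& g, std::size_t vertex) {
    EdgeSet edges;
    collect_edges(g, vertex, Direction::In, edges);
    collect_edges(g, vertex, Direction::Out, edges);
    return edges;
}
```

The same shape should carry over to BGL by switching between in_edges with source and out_edges with target inside the single function.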