How to split a set of vertices with Boost Graph? - C++

I'm writing an algorithm in C++ for parallel graph coloring using Boost Graph and adjacency_list.
I'm working with very big graphs (the smallest has 32K vertices).
What I'm trying to do is take the whole set of vertices, split it into parts, assign each part to a different thread and work in parallel, but I'm struggling with a few steps.
The basic idea was this:
int step = g.m_vertices.size()/4;
int min = 0;
for(int i = 0; i < 4; i++){
// call the function
}
And the function I call inside is something like this:
for (vp = vertices(g); vp.first != vp.second; ++vp.first) {
cout << *vp.first << endl;
}
So I have two questions:
Is g.m_vertices.size()/4 the right solution? If initially I have 10 vertices and then I remove some vertices in the middle (e.g. 4), only 6 vertices are left (so this is the new size), but do the vertex indices go from 0 to 5 or from 0 to 9?
How can I pass only a subset of vertices to vp instead of passing vertices(g)?

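Is g.m_vertices.size()/4 the right solution?
That depends ONLY on your requirements. (As an aside, num_vertices(g) is the documented way to get this count; m_vertices is an implementation detail of adjacency_list.)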
If initially I have 10 vertices and then I remove some vertices in the middle (e.g. 4), only 6 vertices are left (so this is the new size), but do the vertex indices go from 0 to 5 or from 0 to 9?
That depends on your graph model. You don't specify the type of your graph (I know, you do say which template, but not the template parameters). Assuming vecS as the vertex container selector, yes: after 4 removals, the vertex descriptors (and indices) will be [0,6).
How can I pass only a subset of vertices to vp instead of passing vertices(g)?
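Whatever the split criterion, note that size()/4 rounds down, so four steps of that size miss up to 3 vertices unless the last part takes the remainder. Under the vecS assumption (descriptors are exactly 0..n-1), a contiguous split is easy to get right. This is a minimal, Boost-free sketch; `split_range` is a hypothetical helper, not part of Boost Graph:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Split the index range [0, n) into `parts` contiguous half-open chunks
// [begin, end) whose sizes differ by at most one, so integer division
// never drops an index. Assumes vecS-style descriptors 0..n-1.
std::vector<std::pair<std::size_t, std::size_t>> split_range(std::size_t n, std::size_t parts)
{
    std::vector<std::pair<std::size_t, std::size_t>> chunks;
    if (parts == 0) return chunks;
    const std::size_t base = n / parts;  // minimum chunk size
    const std::size_t extra = n % parts; // the first `extra` chunks get one more
    std::size_t begin = 0;
    for (std::size_t i = 0; i < parts; ++i) {
        const std::size_t len = base + (i < extra ? 1 : 0);
        chunks.emplace_back(begin, begin + len);
        begin += len;
    }
    return chunks;
}
```

With n = 32766 and 4 parts this gives chunk sizes 8192, 8192, 8191, 8191; each [begin, end) pair can then be handed to one thread.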
Many ways.
you can use std::for_each with a parallel execution policy
you can use OpenMP to create a parallel section from the plain loop
you can use the filtered_graph adapter to create 4 "views" of the underlying graph and operate on those
you can use the PBGL (Parallel Boost Graph Library), which was created precisely for dealing with huge graphs. This has the added benefit that it works with threading/interprocess/inter-host communication, can coordinate algorithms across segments, etc.
you can use subgraphs (boost::subgraph); this is mainly (only) interesting if the way your graphs get built has a natural segmentation
None of these solutions is trivial, but here's a naive demo using filtered_graph:
#include <boost/graph/adjacency_list.hpp>
#include <boost/graph/filtered_graph.hpp>
#include <boost/graph/random.hpp>
#include <functional>
#include <iostream>
#include <random>
#include <string_view>
using G = boost::adjacency_list<>;
using V = G::vertex_descriptor;
G make_graph() {
G g;
std::mt19937 prng(std::random_device{}());
generate_random_graph(g, 32 * 1024 - (prng() % 37), 64 * 1024, prng);
return g;
}
template <int NSegments, int Segment> struct SegmentVertices {
std::hash<V> _h;
bool operator()(V vd) const { return (_h(vd) % NSegments) == Segment; }
};
template <int N>
using Quart = boost::filtered_graph<G, boost::keep_all, SegmentVertices<4, N>>;
template <typename Graph>
void the_function(Graph const& g, std::string_view name)
{
std::cout << name << " " << size(boost::make_iterator_range(vertices(g)))
<< " vertices\n";
}
int main()
{
G g = make_graph();
the_function(g, "full graph");
Quart<0> f0(g, {}, {});
Quart<1> f1(g, {}, {});
Quart<2> f2(g, {}, {});
Quart<3> f3(g, {}, {});
the_function(f0, "f0");
the_function(f1, "f1");
the_function(f2, "f2");
the_function(f3, "f3");
}
This prints, e.g.:
full graph 32766 vertices
f0 8192 vertices
f1 8192 vertices
f2 8191 vertices
f3 8191 vertices

Related

About using vertices as indices in graphs in C++: why are we wasting space?

I have a question about the vertices of graphs in C++. Suppose I want a graph with the vertices 100, 200, 300 and 400, connected in some way (how is not important). If we create an adjacency list graph, what we do is:
adj[u].push_back(v);
adj[v].push_back(u);
and if 400 is connected with 200, we index adj[400], creating a large array of vectors, when all we needed was one of size 4, since there are only four vertices; here we go all the way up to 400. Can someone explain this? Is it the case that in graphs all vertices are consecutive and must start from some small number? The code works fine when the vertices are 1, 2, 3, 4, 5, but we are using vertices as indices, and depending on the vertex values this can take far more space than needed.
An adjacency list stores a list of the connected vertices for each vertex in the graph. For example, given this graph:
1---2
|\  |
| \ |
|  \|
3---4
You would store:
1: 2, 3, 4
2: 1, 4
3: 1, 4
4: 1, 2, 3
This can be done with a std::vector<std::vector<int>>. Note that you do not need to use the vertex values as the indices into these vectors. If the vertex values were instead 100, 200, 300, 400, you could use a separate map container to convert from vertex value to an index into the adjacency list (std::unordered_map<ValueType, IndexType>). You could also store a Vertex structure such as this:
struct Vertex {
int index; // 0, 1, 2, 3, 4, 5, etc.
int value; // 100, 200, or whatever value you want
};
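A minimal sketch of the std::unordered_map approach described above, assuming int vertex values and undirected edges (`CompactGraph` and `intern` are hypothetical names): each new value gets the next free index, so the vertices 100, 200, 300, 400 occupy indices 0 to 3 instead of forcing a vector of size 401.

```cpp
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Dense adjacency list for sparse vertex values such as 100, 200, 300, 400.
struct CompactGraph {
    std::unordered_map<int, std::size_t> index_of; // value -> dense index
    std::vector<int> value_of;                      // dense index -> value
    std::vector<std::vector<std::size_t>> adj;      // adjacency by dense index

    // Return the dense index of `value`, assigning the next free one if new.
    std::size_t intern(int value) {
        auto it = index_of.find(value);
        if (it != index_of.end()) return it->second;
        const std::size_t idx = value_of.size();
        index_of.emplace(value, idx);
        value_of.push_back(value);
        adj.emplace_back();
        return idx;
    }

    // Undirected edge, mirroring adj[u].push_back(v); adj[v].push_back(u);
    void add_edge(int u, int v) {
        const std::size_t iu = intern(u);
        const std::size_t iv = intern(v);
        adj[iu].push_back(iv);
        adj[iv].push_back(iu);
    }
};
```

For example, after add_edge(100, 200) and add_edge(400, 200), adj has 3 entries rather than 401, and value_of translates indices back to the original values.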
I'm not sure what the problem is exactly, but I guess it's about speed. The simplest fix is a "memory layout" like a pixel buffer: the index is an implicit value defined by the position, since every segment has the same size.
-------------------------------------------------------------------...
| float, float, float, float | float, float, float, float | float,
-------------------------------------------------------------------...
|          index 0           |          index 1           | index 2
-------------------------------------------------------------------...
As you didn't give sample code, the example assumes a lot of things, but it basically implements the layout idea. Using raw arrays is not required, it's just my preference; std::vector would carry almost no performance penalty, the biggest one being resizing. Some of the lines are not intuitive, e.g. why an index computation (a multiply and an add) into one array is faster than an array of arrays: it just is, because memory is slower than the CPU, and one contiguous block avoids an extra indirection per access.
A small note: because all the "small arrays" are really one big array, you need to worry about overflow and underflow or add a bounds check. If some vertex groups are smaller than the chunk size, just waste the space; the time to compact and uncompact the data is worse in most cases than the padding.
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <iostream>

// Flat storage: vertex group `object` occupies the slots
// [object * vertex_len, (object + 1) * vertex_len) of one big array.
template <typename VAL>
struct Ver_Map {
    VAL * base_ptr = nullptr;
    uint32_t map_size = 0;
    uint32_t vertex_len = 0;

    void alloc_map(uint32_t elem, uint32_t ver_len, VAL in){
        base_ptr = new VAL[elem * ver_len];
        std::fill_n(base_ptr, elem * ver_len, in); // initialize every slot, not just the first
        vertex_len = ver_len;
        map_size = elem;
    }
    void free_map(){
        delete[] base_ptr; // memory from new[] must be released with delete[]
        base_ptr = nullptr;
    }
    // pointer to the first value of group `object`
    VAL * operator()(uint32_t object){
        return &base_ptr[object * vertex_len];
    }
    // value number `vertex` of group `object` (no bounds check)
    VAL & operator()(uint32_t object, uint32_t vertex){
        return base_ptr[(object * vertex_len) + vertex];
    }
};

int main (void) {
    const uint32_t map_len = 10000;
    Ver_Map<float> ver_map;
    ver_map.alloc_map(map_len, 4, 0.0f);

    // Use case
    ver_map(0, 2) = 0.5f;
    std::cout << ver_map(0)[1] << std::endl;
    std::cout << ver_map(0)[2] << std::endl;
    std::cout << ver_map(0, 2) << std::endl;

    // Size in memory: map_len groups of vertex_len floats, plus the struct itself
    std::cout << "Size of struct -> "
              << (map_len * ver_map.vertex_len * sizeof(float)) + sizeof(Ver_Map<float>)
              << " bytes" << std::endl;

    // Time to overwrite every value
    auto start = std::chrono::steady_clock::now();
    for(uint32_t x = 0; x < map_len; x++){
        for(uint32_t y = 0; y < ver_map.vertex_len; y++){
            ver_map(x, y) = 1.0f;
        }
    }
    std::cout << "Full write time -> "
              << std::chrono::duration_cast<std::chrono::microseconds>(
                     std::chrono::steady_clock::now() - start).count()
              << " microseconds" << std::endl;

    ver_map.free_map();
    return 0;
}
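As the answer notes, std::vector works just as well as a raw array for this layout, and it removes the manual new[]/delete[] pairing. A minimal sketch under that assumption (`FlatGroups` is a hypothetical name):

```cpp
#include <cstddef>
#include <vector>

// std::vector-backed variant of the flat layout: group `object` owns the
// slots [object * stride, (object + 1) * stride) of one contiguous buffer.
template <typename T>
class FlatGroups {
public:
    FlatGroups(std::size_t groups, std::size_t stride, T init = T{})
        : stride_(stride), data_(groups * stride, init) {}

    // value `slot` of group `object`; unchecked, like the raw-array version
    T& operator()(std::size_t object, std::size_t slot) { return data_[object * stride_ + slot]; }
    const T& operator()(std::size_t object, std::size_t slot) const { return data_[object * stride_ + slot]; }

    // pointer to the first value of group `object`
    T* group(std::size_t object) { return data_.data() + object * stride_; }

    std::size_t groups() const { return stride_ == 0 ? 0 : data_.size() / stride_; }
    std::size_t stride() const { return stride_; }
    std::size_t bytes() const { return data_.size() * sizeof(T); } // payload only

private:
    std::size_t stride_;
    std::vector<T> data_;
};
```

Memory is released automatically when the object goes out of scope, and data_.at(...) can replace operator[] if you want the bounds check mentioned above.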

K-means clustering with Boost R-tree

I'm using the Boost R-tree. I added a hundred thousand points to it. Now I want to cluster and group my points as in this link. It seems that I should calculate k-means from the points. How can I calculate k-means from the R-tree's point geometries?
There are various clustering algorithms with different properties and inputs. What needs to be considered before choosing an algorithm is what you want to achieve. k-means, which you referred to in the question, aims to partition a set of points into k clusters, so the input is the desired number of clusters. On the other hand, the algorithm described in the blog you linked, a variant of the greedy clustering algorithm, aims to partition a set of points into circular clusters of some size, so the input is the radius of the desired clusters.
There are various algorithms performing k-means clustering, used for different data and applications, e.g. separating 2 n-dimensional subsets with a hyperplane, or clustering with a Voronoi diagram (Lloyd's algorithm), which is often simply called the k-means algorithm. There are also density-based clustering algorithms, mentioned by @Anony-Mousse in the comments under your question.
The article you mentioned uses a hierarchical version of greedy clustering. They have to calculate the clusters for multiple zoom levels, and to avoid analyzing all of the points each time, they use the centroids of the clusters from the previously analyzed level as the source points for clustering at the next level. However, in this answer I'll show how to implement this algorithm for one level only, so the input will be a set of points and a cluster size given as a radius. If you need the hierarchical version, you should calculate the centroids of the output clusters and use them as the input of the algorithm for the next level.
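To make the distinction concrete, here is a minimal, Boost-free sketch of Lloyd's algorithm on 2D points. It assumes the caller supplies the k initial centroids (real implementations typically use random or k-means++ seeding), and `Pt` and `lloyd_kmeans` are hypothetical names. Note that its input is k, not a radius, which is why it does not match the greedy radius clustering from the linked blog:

```cpp
#include <cstddef>
#include <limits>
#include <vector>

struct Pt { double x, y; };

// Lloyd iteration: alternate an assignment step (nearest centroid) and an
// update step (centroid = mean of its points) until labels stop changing.
// Returns the cluster index of each point; `centroids` is updated in place.
std::vector<std::size_t> lloyd_kmeans(const std::vector<Pt>& pts,
                                      std::vector<Pt>& centroids,
                                      int max_iter = 100)
{
    std::vector<std::size_t> label(pts.size(), 0);
    if (centroids.empty()) return label;
    for (int it = 0; it < max_iter; ++it) {
        bool changed = (it == 0);
        // assignment step: nearest centroid by squared Euclidean distance
        for (std::size_t i = 0; i < pts.size(); ++i) {
            std::size_t best = 0;
            double best_d = std::numeric_limits<double>::max();
            for (std::size_t c = 0; c < centroids.size(); ++c) {
                const double dx = pts[i].x - centroids[c].x, dy = pts[i].y - centroids[c].y;
                const double d = dx * dx + dy * dy;
                if (d < best_d) { best_d = d; best = c; }
            }
            if (label[i] != best) { label[i] = best; changed = true; }
        }
        if (!changed) break;
        // update step: move each centroid to the mean of its points
        std::vector<Pt> sum(centroids.size(), Pt{0.0, 0.0});
        std::vector<std::size_t> count(centroids.size(), 0);
        for (std::size_t i = 0; i < pts.size(); ++i) {
            sum[label[i]].x += pts[i].x;
            sum[label[i]].y += pts[i].y;
            ++count[label[i]];
        }
        for (std::size_t c = 0; c < centroids.size(); ++c)
            if (count[c] > 0) // an empty cluster keeps its old centroid
                centroids[c] = Pt{sum[c].x / count[c], sum[c].y / count[c]};
    }
    return label;
}
```

With the points (0,0), (0,1), (1,0), (10,10), (10,11), (11,10) and initial centroids (0,0) and (10,10), the first three points end up in cluster 0 and the last three in cluster 1.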
Using Boost.Geometry R-tree the algorithm for one level (so not hierarchical) could be implemented like this (in C++11):
#include <boost/geometry.hpp>
#include <boost/geometry/index/rtree.hpp>
#include <boost/range/adaptor/indexed.hpp>
#include <boost/range/adaptor/transformed.hpp>
#include <iostream>
#include <iterator>
#include <vector>
namespace bg = boost::geometry;
namespace bgi = boost::geometry::index;
typedef bg::model::point<double, 2, bg::cs::cartesian> point_t;
typedef bg::model::box<point_t> box_t;
typedef std::vector<point_t> cluster_t;
// used in the rtree constructor with Boost.Range adaptors
// to generate std::pair<point_t, std::size_t> from point_t on the fly
template <typename First, typename Second>
struct pair_generator
{
typedef std::pair<First, Second> result_type;
template<typename T>
inline result_type operator()(T const& v) const
{
return result_type(v.value(), v.index());
}
};
// used to hold point-related information during clustering
struct point_data
{
point_data() : used(false) {}
bool used;
};
// find clusters of points using cluster radius r
void find_clusters(std::vector<point_t> const& points,
double r,
std::vector<cluster_t> & clusters)
{
typedef std::pair<point_t, std::size_t> value_t;
typedef pair_generator<point_t, std::size_t> value_generator;
if (r < 0.0)
return; // or return error
// create rtree holding std::pair<point_t, std::size_t>
// from container of points of type point_t
bgi::rtree<value_t, bgi::rstar<4> >
rtree(points | boost::adaptors::indexed()
| boost::adaptors::transformed(value_generator()));
// create container holding point states
std::vector<point_data> points_data(rtree.size());
// for all pairs contained in the rtree
for(auto const& v : rtree)
{
// ignore points that were used before
if (points_data[v.second].used)
continue;
// current point
point_t const& p = v.first;
double x = bg::get<0>(p);
double y = bg::get<1>(p);
// find all points in circle of radius r around current point
std::vector<value_t> res;
rtree.query(
// return points that are in a box enclosing the circle
bgi::intersects(box_t{{x-r, y-r},{x+r, y+r}})
// and were not used before
// and are indeed in the circle
&& bgi::satisfies([&](value_t const& v){
return points_data[v.second].used == false
&& bg::distance(p, v.first) <= r;
}),
std::back_inserter(res));
// create new cluster
clusters.push_back(cluster_t());
// add points to this cluster and mark them as used
for(auto const& v : res) {
clusters.back().push_back(v.first);
points_data[v.second].used = true;
}
}
}
int main()
{
std::vector<point_t> points;
for (double x = 0.0 ; x < 10.0 ; x += 1.0)
for (double y = 0.0 ; y < 10.0 ; y += 1.0)
points.push_back(point_t{x, y});
std::vector<cluster_t> clusters;
find_clusters(points, 3.0, clusters);
for(size_t i = 0 ; i < clusters.size() ; ++i) {
std::cout << "Cluster " << i << std::endl;
for (auto const& p : clusters[i]) {
std::cout << bg::wkt(p) << std::endl;
}
}
}
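For the hierarchical version mentioned earlier, one pass per zoom level can be chained by feeding the centroids of each level into the next. A minimal sketch, assuming a brute-force O(n²) greedy pass in place of the R-tree query (so it visits points in input order rather than R-tree order, and is only suitable for small inputs); `P2`, `greedy_clusters`, `centroid` and `hierarchical_levels` are hypothetical names:

```cpp
#include <cstddef>
#include <vector>

struct P2 { double x, y; };

// Brute-force greedy pass: every still-unused point starts a cluster holding
// all unused points within distance r of it, itself included.
std::vector<std::vector<P2>> greedy_clusters(const std::vector<P2>& pts, double r)
{
    std::vector<std::vector<P2>> clusters;
    std::vector<bool> used(pts.size(), false);
    for (std::size_t i = 0; i < pts.size(); ++i) {
        if (used[i]) continue;
        clusters.emplace_back();
        // every point before i is already used, so the scan can start at i
        for (std::size_t j = i; j < pts.size(); ++j) {
            const double dx = pts[j].x - pts[i].x, dy = pts[j].y - pts[i].y;
            if (!used[j] && dx * dx + dy * dy <= r * r) {
                used[j] = true;
                clusters.back().push_back(pts[j]);
            }
        }
    }
    return clusters;
}

// Arithmetic mean of a non-empty cluster.
P2 centroid(const std::vector<P2>& c)
{
    P2 m{0.0, 0.0};
    for (const P2& p : c) { m.x += p.x; m.y += p.y; }
    m.x /= static_cast<double>(c.size());
    m.y /= static_cast<double>(c.size());
    return m;
}

// One greedy pass per radius (smallest first); the centroids produced at
// each level are the input points of the next, coarser level.
std::vector<std::vector<P2>> hierarchical_levels(std::vector<P2> pts,
                                                 const std::vector<double>& radii)
{
    std::vector<std::vector<P2>> levels;
    for (double r : radii) {
        std::vector<P2> next;
        for (const auto& cluster : greedy_clusters(pts, r))
            next.push_back(centroid(cluster));
        levels.push_back(next);
        pts = next;
    }
    return levels;
}
```

Swapping greedy_clusters for the R-tree based find_clusters keeps the same structure; only the per-level pass changes.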
See also their implementation: https://github.com/mapbox/supercluster/blob/master/index.js#L216
Furthermore, take into account the remarks of @Anony-Mousse about the accuracy of distance calculation on the globe. The solution above is for the Cartesian coordinate system. If you want to use a different coordinate system, then you have to define the point type differently, e.g. use bg::cs::spherical_equatorial<bg::degree> or bg::cs::geographic<bg::degree> instead of bg::cs::cartesian. You also have to generate the query bounding box differently. But bg::distance() will automatically return the correct distance after changing the point type.

How can I get the vertices of all cells in a container with Voro++ Library?

I'm trying to implement a simple 3D Voronoi application with Voro++. I have a container containing particles. After putting all the particles in the container, how can I get the vertices of all the Voronoi cells computed by Voro++?
From the documentation, it should be something like this:
http://math.lbl.gov/voro++/examples/polygons/
The documentation says:
On line 47, a call is made to the face_vertices routine, which returns information about which vertices comprise each face. It is a vector of integers with a specific format: the first entry is a number k corresponding to the number of vertices making up a face, and this is followed by k additional entries describing which vertices make up this face. For example, the sequence (3, 16, 20, 13) would correspond to a triangular face linking vertices 16, 20, and 13 together. On line 48, the vertex positions are returned: this corresponds to a vector of triplets (x, y, z) describing the position of each vertex.
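The flat formats described in that quote can be unpacked into per-face lists and (x, y, z) triplets with a few lines of standard C++. A sketch with `decode_faces` and `decode_positions` as hypothetical helper names; it only assumes the vector layouts quoted above, not any further Voro++ API:

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Decode the flat face_vertices() layout k, v_1..v_k, k', v'_1..v'_k', ...
// into one vector of vertex ids per face.
std::vector<std::vector<int>> decode_faces(const std::vector<int>& f_vert)
{
    std::vector<std::vector<int>> faces;
    std::size_t i = 0;
    while (i < f_vert.size()) {
        const int k = f_vert[i++]; // number of vertices of this face
        if (k < 0 || i + static_cast<std::size_t>(k) > f_vert.size())
            throw std::runtime_error("malformed face_vertices data");
        faces.emplace_back(f_vert.begin() + i, f_vert.begin() + i + k);
        i += static_cast<std::size_t>(k);
    }
    return faces;
}

// Group the flat (x, y, z, x, y, z, ...) vertex positions into triplets.
std::vector<std::array<double, 3>> decode_positions(const std::vector<double>& v)
{
    std::vector<std::array<double, 3>> pos;
    for (std::size_t i = 0; i + 2 < v.size(); i += 3)
        pos.push_back({v[i], v[i + 1], v[i + 2]});
    return pos;
}
```

The vertex ids in each decoded face index into the triplets from decode_positions for the same cell, so face_connections[counter] and vertex_positions[counter] in the loop below could be passed through these helpers together.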
I modified the example script so that it stores each particle's face connections and vertex positions in vectors, but please verify this!
Hope this helps!
#include "voro++.hh"
#include "container.hh"
#include <v_compute.hh>
#include <c_loops.hh>
#include <vector>
#include <iostream>
using namespace voro;
int main() {
// Set up constants for the container geometry
const double x_min=-5,x_max=5;
const double y_min=-5,y_max=5;
const double z_min=0,z_max=10;
unsigned int i,j;
int id,nx,ny,nz;
double x,y,z;
std::vector<int> neigh;
voronoicell_neighbor c;
// Set up the number of blocks that the container is divided into
const int n_x=6,n_y=6,n_z=6;
// Create a container with the geometry given above, and make it
// non-periodic in each of the three coordinates. Allocate space for
// eight particles within each computational block
container con(x_min,x_max,y_min,y_max,z_min,z_max,n_x,n_y,n_z,
false,false,false,8);
// Import the particles from the file pack_six_cube
con.import("pack_six_cube");
// Save the Voronoi network of all the particles to text files
// in gnuplot and POV-Ray formats
con.draw_cells_gnuplot("pack_ten_cube.gnu");
con.draw_cells_pov("pack_ten_cube_v.pov");
// Output the particles in POV-Ray format
con.draw_particles_pov("pack_ten_cube_p.pov");
// Loop over all particles in the container and compute each Voronoi
// cell
c_loop_all cl(con);
int dimension = 0;
if(cl.start()) do if(con.compute_cell(c,cl)) {
dimension+=1;
} while (cl.inc());
std::vector<std::vector<int> > face_connections(dimension);
std::vector<std::vector<double> > vertex_positions(dimension);
int counter = 0;
if(cl.start()) do if(con.compute_cell(c,cl)) {
cl.pos(x,y,z);id=cl.pid();
std::vector<int> f_vert;
std::vector<double> v;
// Gather information about the computed Voronoi cell
c.neighbors(neigh);
c.face_vertices(f_vert);
c.vertices(x,y,z,v);
face_connections[counter] = f_vert;
vertex_positions[counter] = v;
std::cout << f_vert.size() << std::endl;
std::cout << v.size() << std::endl;
counter += 1;
} while (cl.inc());
}
Regards, Lukas

Implement weighted graph

Say I have the following adjacency matrix:
A B C D
A 0 9 0 5
B 9 0 0 0
C 0 0 0 2
D 5 0 2 0
How would this actually be implemented? I realize I can use a 2D array to represent the weighted edges between vertices, but I'm not sure how to represent the vertices.
int edges[4][4];
string vertices[4];
Is this the way to do it? The index in the vertices array corresponds to the row index in the edges array.
You can use a two-dimensional std::map.
This approach allows the matrix to grow and shrink whenever you want.
#include <map>
#include <string>
#include <iostream>
int main()
{
std::map<std::string, std::map<std::string, int>> vertices;
vertices["A"]["A"] = 0; vertices["A"]["B"] = 9; vertices["A"]["C"] = 0; vertices["A"]["D"] = 5;
vertices["B"]["A"] = 9; vertices["B"]["B"] = 0; vertices["B"]["C"] = 0; vertices["B"]["D"] = 0;
vertices["C"]["A"] = 0; vertices["C"]["B"] = 0; vertices["C"]["C"] = 0; vertices["C"]["D"] = 2;
vertices["D"]["A"] = 5; vertices["D"]["B"] = 0; vertices["D"]["C"] = 2; vertices["D"]["D"] = 0;
std::cout << vertices["A"]["A"] << std::endl;
std::cout << vertices["A"]["B"] << std::endl;
}
Using the indices of the adjacency matrix as the vertices is common practice.
If there is a fixed number of vertices in the graph, you could declare the vertices as an enum and index the array directly using the enumerated vertex names. I think it makes the mapping a little clearer.
enum VERTEX
{
A = 0,
B,
C,
D,
LAST = D
};
int edge[LAST+1][LAST+1];
edge[A][A] = 0;
edge[A][B] = edge[B][A] = 9;
edge[A][C] = edge[C][A] = 0;
// etc.
This keeps things simple and quick by allowing you to use an array without any look-up penalty while keeping things easy to understand.
For most intents and purposes, it's usually more efficient to represent the graph as an adjacency list:
std::vector< std::list<int> > graph;
So graph[i] holds all neighboring vertices of i. The advantage is that when dealing with graphs we usually want to traverse i's neighbors. Also, for large sparse graphs, the space complexity is much lower. This could of course also be extended to include weights with something like:
std::vector< std::list<std::pair< int, int> > > graph;
or for more complex types, define a type Vertex...
EDIT: If you require indices as strings, this could easily be done by opting for std::map instead of std::vector:
std::map< std::string, std::list<std::pair< std::string /*vertex*/, int/*weight*/> > > graph;
But it appears from your original post that you just wish to map indices to vertex names, in which case I'd go with the first solution but also keep the vertex name mapped to the index:
std::vector< std::pair< std::string/*name*/, std::list<std::pair< int /*index*/, int/*weight*/> > > > graph;
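Whichever container you pick, the matrix from the question converts directly into the (neighbor, weight) form. A minimal sketch, assuming 0 means "no edge" as in the question's matrix and using std::vector instead of std::list for the inner container; `matrix_to_list` is a hypothetical name:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Build a weighted adjacency list of (neighbor index, weight) pairs from an
// adjacency matrix. 0 is treated as "no edge", so this convention cannot
// represent a genuine zero-weight edge.
std::vector<std::vector<std::pair<int, int>>>
matrix_to_list(const std::vector<std::vector<int>>& m)
{
    std::vector<std::vector<std::pair<int, int>>> graph(m.size());
    for (std::size_t i = 0; i < m.size(); ++i)
        for (std::size_t j = 0; j < m[i].size(); ++j)
            if (m[i][j] != 0)
                graph[i].emplace_back(static_cast<int>(j), m[i][j]);
    return graph;
}
```

For the A to D matrix in the question, graph[0] (A) becomes {(1, 9), (3, 5)}, i.e. B with weight 9 and D with weight 5.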

Calculating critical path of a DAG in C++

I'm calculating the critical path for the DAG in the image, according to this algorithm from another post. My teacher requires that arrays be used; to simplify the homework statement, it is a simple graph implemented with arrays.
This is my code, in which I have 3 arrays v, u and d, representing the origin node of each edge, the end node of each edge and the distance between each pair of vertices, as shown in the picture above. In the graph in the image, the duration of the project is 25, corresponding to the sum of the distances along the critical path.
My code fails to calculate the distances correctly according to the pseudocode at this link
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <queue>
#include <iostream>
#include <algorithm>
using namespace std;
int main (){
int numberVertex=6; //number of vertex
int numberActivities=9;//number of edges
int i, j;
/*vertices are of the form (v, u) */
//indegree of each vertex (that is, count the number of edges entering them)
int indegree [6]={0,0,0,0,0,0};
int v[9]={0,0,1,1,2,4,3,3,3};//array v holds the start vertex of each edge
int u[9]={1,2,2,3,4,5,4,5,5};//array u holds the end vertex of each edge
int d[9]={5,6,3,8,2,12,0,1,4};//array d holds the duration of activity (v,u)
int project_duration=0;//project duration
/*# Compute the indegree for each vertex v from the graph:
for each neighbor u of v: indegree[u] += 1*/
for (j=0; j<numberActivities; j++){
indegree[u[j]]++;
}
for (j=0;j<numberVertex; j++)
printf ("indegree %d= %d\n",j,indegree[j] );
queue<int> Q; //queue Q = empty queue
int distance [numberVertex];
memset(distance, 0, sizeof(int) * numberVertex);//distance = array filled with zeroes
//for each vertex v:
//if indegree[v] = 0:
//insert v on Q
for (j=0; j<numberVertex; j++)
{
if (indegree[j]==0)
Q.push(v[j]);
}
int first;
//printf ("first in the queue=%d\n", Q.front());
/*for each neighbor u of v:
distance[u] = max(distance[u], distance[v] + time(v, u))
indegree[u] -= 1
if indegree[u] = 0:
insert u on Q
*/
while (!Q.empty()){ //while Q is not empty:
first= Q.front (); //v = get front element from Q
Q.pop(); //remove the front element from the queue
distance[u[first]]=std::max(distance[u[first]],
distance[v[first]]+ d[first]);
indegree[u[first]]-=1;
if (indegree[u[first]]==0){
Q.push(u[first]);
}
}
for (j=0; j<numberVertex; j++)
{
printf ("dist [%d]= %d\n", j, distance[j]);
}
/*Now, select the vertex x with the largest distance.
This is the minimum total project_duration.*/
printf ("Total Project Duration %d\n", project_duration);
return (0);
}
What am I doing wrong, and how could I fix the code so that it tells me the duration of the project (the sum of the distances along the critical path)? It is only able to calculate the distance to the first 3 vertices.
Your queue contains vertices. Your arrays u, v, d, are indexed by edge numbers.
So you cannot write
first = Q.front();
... u[first] ...
since first is a vertex.
More generally, your code will be a lot easier to read (and the bug will be more obvious) if you use meaningful variable names. first is not very explicit (first what?), and u, v, d are also quite cryptic.
Writing something like
cur_vertex = todo.front();
distance[dest[cur_vertex]] = std::max(distance[dest[cur_vertex]],
distance[source[cur_vertex]]+ weight[cur_vertex]);
will immediately raise a question: the source of a vertex, what is that?
(Here we are using variable names as a substitute for proper type checking. An Ada programmer would have declared two different integer types to avoid the confusion between vertices and edge numbers.)
Another question: where did the loop over the successors of first go? It's in the pseudo-code, but not in your source.
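Putting both points together, here is a sketch of the pseudocode with the queue holding vertices and an explicit loop over each vertex's outgoing edges. It keeps the question's three edge arrays (renamed source, dest and weight) but adds a per-vertex list of outgoing edge numbers; `longest_distances` is a hypothetical name:

```cpp
#include <algorithm>
#include <cstddef>
#include <queue>
#include <vector>

// Longest-path ("critical path") distances in a DAG given as parallel edge
// arrays: edge e goes from source[e] to dest[e] with duration weight[e].
// The queue holds VERTICES; for each dequeued vertex we loop over its
// outgoing EDGES, which is the successor loop missing from the question.
std::vector<int> longest_distances(int num_vertices,
                                   const std::vector<int>& source,
                                   const std::vector<int>& dest,
                                   const std::vector<int>& weight)
{
    const std::size_t n = static_cast<std::size_t>(num_vertices);
    std::vector<std::vector<std::size_t>> out_edges(n); // edge numbers per vertex
    std::vector<int> indegree(n, 0);
    for (std::size_t e = 0; e < source.size(); ++e) {
        out_edges[source[e]].push_back(e);
        ++indegree[dest[e]];
    }
    std::vector<int> distance(n, 0);
    std::queue<int> todo;
    for (int vtx = 0; vtx < num_vertices; ++vtx)
        if (indegree[vtx] == 0) todo.push(vtx); // push the vertex itself, not v[j]
    while (!todo.empty()) {
        const int cur_vertex = todo.front();
        todo.pop();
        for (std::size_t e : out_edges[cur_vertex]) {
            const int next = dest[e];
            distance[next] = std::max(distance[next], distance[cur_vertex] + weight[e]);
            if (--indegree[next] == 0) todo.push(next);
        }
    }
    return distance;
}
```

With the question's arrays, the distances come out as 0, 5, 8, 13, 13, 25, and the project duration is the largest of them, 25 (for example via std::max_element).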