Boost: Big graphs & Multithreading - c++

I need to create a directed graph that can be quite large from a big dataset. I know these things for sure:
Each node has at most K outgoing edges
I have a list (unordered_map) of N >> K nodes
The graph is build by comparing all nodes with each other (yes, O(N^2) unfortunately)
Thinking about it, I would parallelize the graph creation using std::thread, and I was wondering if this could be done via Boost Graph Library.
If I use the adjacency matrix, it should be possible to preallocate the matrix (K*N elements), and hence it would be thread-safe to insert all adjacent nodes.
I've read that BGL could be thread-unsafe, but the posts I've found are three years old.
Do you know if it's possible to do what I'm thinking? Do you recommend doing otherwise?
Cheers!

Almost any graph algorithm in BGL needs a mapping: vertex -> int which assigns to each vertex a unique integer within the range [0, num_vertices(g) ). This mapping is known as "vertex_index" and is usually accessible as property_map.
Having said that, I can assume your vertices are already integers or associated with some integers (e.g. your unordered_map has some extra field in "mapped_type"). Even better (for performance and memory) if your input vertices are stored in continuous tight array, e.g. std::vector, then indexing is natural.
If vertices are [associated with] integers, your best choice for memory-tight graph is "Compressed Sparse Row Graph". The graph is immutable, so you need to populate edges container before you generate a graph.
As ravenspoint explained, your best choice is to equip each thread with its own local container of results and lock the central container only when merging the local result into the final one. Such strategy is implemented lock-less by TBB template tbb::parallel_reduce. So your full code for graph building can look roughly as below:
#include "tbb/blocked_range2d.h"
#include "tbb/parallel_reduce.h"
#include "boost/graph/compressed_sparse_row_graph.hpp"
typedef something vertex; //e.g.something is integer giving index of a real data
class EdgeBuilder
{
public:
typedef std::pair<int,int> edge;
typedef std::vector<edge> Edges;
typedef ActualStorage Input;
EdgeBuilder(const Input & input):_input(input){} //OPTIONAL: reserve some space in _edges
EdgeBuilder( EdgeBuilder& parent, tbb::split ): _input(parent.input){} // reserve something
void operator()( const const tbb::blocked_range2d<size_t>& r )
{
for( size_t i=r.rows().begin(); i!=r.rows().end(); ++i ){
for( size_t j=r.cols().begin(); j!=r.cols().end(); ++j ) {
//I assume you provide some function to compute existence
if (my_func_edge_exist(_input,i, j))
m_edges.push_back(edge(i,j));
}
}
}
//merges local results from two TBB threads
void join( EdgeBuilder& rhs )
{
m_edges.insert( m_edges.end(), rhs.m_edges.begin(), rhs.m_edges.end() );
}
Edges _edges; //for a given interval of vertices
const Input & _input;
};
//full flow:
boost::compressed_sparse_row_graph<>* build_graph( const Storage & vertices)
{
EdgeBuilder builder(vertices);
tbb::blocked_range2d<size_t,size_t> range(0,vertices.size(), 100, //row grain size
0,vertices.size(), 100); //col grain size
tbb::parallel_reduce(range, builder);
boost::compressed_sparse_row_graph<>
theGraph = new boost::compressed_sparse_row_graph<>
(boost::edges_are_unsorted_multi_pass_t,
builder._edges.begin(), builder._edges.end(),
vertices.size() );
return theGraph;
}

I think you should break your goal down into two separate sub-goals.
Create the links between nodes by doing the N * ( N - 1 ) tests of pairs of nodes. You appear to have an idea of how to break this up into independent threads. Store the results in a data structure that you know is thread safe, without worrying about the mysteries of boost:graph.
Create the boost::graph from your nodes and ( just created ) links.
A note about storing the links created in each thread: It is not so easy to find a suitable thread safe data structure. If you use a STL dynamically allocated structure, then you have to worry about making a thread safe allocator which is a challenge. If you pre-allocate, then there is a lot of meessy code to handle the allocations. So, I would suggest storing the links created by each thread in a separate data structure, so they do not have to be thread safe. When the links are all created, you can loop over the links created by each thread one by one.
A slightly more efficient design could be imagined, but will require a lot of arcane knowledge about thread safety. The design I propose can be implemented without arcane knowledge or tricky code and will therefore be implemented more quickly and more robustly and will be easier to maintain.

Related

Best datastructure for iterating over and moving elements to front

As part of a solution to a bigger problem that is finding the solution to a maximum flow problem. In my implementation of the relabel-to-front algorithm I'm having a performance bottleneck that I didn't expect.
The general structure for storing the graph data is as follows:
struct edge{
int destination;
int capacity;
};
struct vertex{
int e_flow;
int h;
vector<edge> edges;
};
The specifics of the algorithm are not that important to the question. In the main loop of the solution I'm looping over all vertices except the source and the sink. If at some point a change is made to a vertex then that vertex is put at the front of the list and the iteration starts again from the start. Until the end of the list is reached and we terminate. This part looks as follows now
//nodes are 0..nodeCount-1 with source=0 and sink=nodeCount-1
vector<int> toDischarge(nodeCount-2,0);
for(int i=1;i<sink;i++){
toDischarge[i-1]=i;
}//skip over source and sink
//custom pointer to the entry of toDischarge we are currently accessing
int point = 0;
while(point != nodeCount-2){
int val = toDischarge[point];
int oldHeight = graph[val].h;
discharge(val, graph, graph[val].e_flow);
if(graph[val].h != oldHeight){
rotate(toDischarge.begin(), toDischarge.begin()+point, toDischarge.begin()+point+1);
//if the value of the vertex has changed move it to the front and reset pointer
point = 0;
}
point++;
}
I tried using an std::list data structure before the vector solution but that was even slower even though conceptually that didn't make sense to me since (re)moving elements in a list should be easy. After some research I found out that it was probably horribly performant due to caching issues with list.
Even with the vector solution though I did some basic benchmarking using valgrind and have the following results.
If I understand this correctly then over 30% of my execution time is just spent doing vector element accesses.
Another solution I've tried is making a copy of the vertex needed for that iteration into a variable since it is accessed multiple times, but that was even worse performance because I think it is also making a copy of the whole edge list.
What data structure would improve the general performance of these operations? I'm also interested in other data structures for storing the graph data if that would help.
It seems to me that this is what std::deque<> is for. Imagine it as a 'non-continuous vector', or some vector-like batches tied together. You can use the same interface as vector, except that you cannot assume that adding an index to the first element's pointer results in the given element (or anything sensible other than UB); you need to use [] for indexing. Also, you have dq.insert(it, elem); that's quick if it is std::begin(it) or std::end(it).

Which data structure to use to store edges of graph so that I can access edge weight in constant time in c++?

I am using an adjacency list to store a graph. Since I am using adjacency list, I cannot access edge weight of a graph in constant time. So, I wonder which EXTRA data structure to use only to story edges indexed by the two nodes u and v?
Currently, I am trying with map<pair<int, int>, int> but it has log (N) complexity and unordered_map does not have a policy for pairs. I know that, an edge weight is independent of the order of {u,v}, but I am not able to use this feature anyhow.
Use an adjacency matrix; a 2D array where each element in the array[x][y] is the weight of the edge between nodes x and y.
A quite simple solution would be to create an array that stores the outgoing edges for each node plus the weight. You'd simply jump to one node, search the other node in it's outgoing edges and take the weight. Complexity is maximal degree, which I usually assume to be capped.
Only thing to make sure is that all the redundant information is kept consistent.
Like
class AdjacentWeightedEdges {
struct OutgoingWeightedEdge {
size_t target_node;
int weight;
}
vector<OutgoingWeightedEdge> edges;
int edge_weight(const size_t index) const {
iterate through edges
if an edge with index found, return it's weight
raise an error if not
}
}
class Graph {
//your stuff as it is right inserted here
vector<AdjacentWeightedEdges> adjacencies;
int edge_weight(const size_t index_1, const size_t index_2){
return adjacencies[index_1].edge_weight(index_2);
}
}
If even a 1d approach like this creates memory problems, consider only storing the edges for index_1 < index_2.
Another similar method:
Store an array of pointers to the nodes, have the edge weights in the adjacency list and do what I did directly. If you don't go with indices anyhow if memory is a problem.
Another answer here talks about the adjacency matrix - this one could even work if a certain structure for a sparse matrix class is used that stores first non-zero in row/column pointers and then pointers to every following non-zero entry. Although this essentially collapses to my approach. Might be worthwhile if you need a sparse matrix class anyway.

Looking for a data structure which provides both random and "sequential" access

This is a programming problem I come across very often and was wondering whether there is a data structure, either in the C++ STL or one I can implement myself which provides both random and sequential access.
An example of why I might need this:
Say there are n types of items, (n = 1000000, for example), and there's a fixed number of each type of item (for example, 0 or 10)
I store these items into an array, where the array index represents the type of the item, and the value represents how many items of that given type are there
Now, I have an algorithm which iterates over all EXISTING items. To obtain these items, it is very wasteful to iterate over the entire array when all the entries are 0, except for i.e. Array[99999] and Array[999999].
Normally, I solve this by using a linked list which saves the indices of all the nonzero array entries. I implement the standard operations in this way:
Insert(int t):
1) If Array[t] == 0, LinkedList.push_back(t);
2) Array[t]++;
Delete(int t):
1) If Array[t] == 1, find and remove t from LinkedList;
2) Array[t]--;
If I want O(1) complexity for the deletion operation, I make the array store containers instead of integers. Each container contains an integer and a pointer to the respective element of the LinkedList, so I don't have to search through the list.
I would love to know whether there is a data structure which formalizes/improves this approach, or whether there's a better way to do this altogether.
Given the following requirements:
Random access
Fast lookups
Fast insertions
Fast removals
Avoid wasted space
then you probably want something called a sparse array. Sparse arrays are not part of the standard library, so you'll have to emulate your own, using a std::map or std::unordered_map. In a sparse array, only non-zero elements occupy space in the collection.
An ordered_map will have O(1) lookups, insertions, and removals, but does not provide ordered iteration. A map will generally have slower operations, but will provide ordered iteration. I'm oversimplifying things when I say std::map is slower, as it depends on the number of elements and usage patterns (a topic probably already discussed in another question).
If you must absolutely have both O(1) lookups and ordered iteration, then you can combine both a map and ordered_map and keep them in sync. At that point, you'll want to consider using Boost.MultiIndex.
Here's a rough sketch showing how you can implement your own sparse vector class:
class SparseVector
{
public:
int get(size_t index) const
{
auto kv = map_.find(index);
return (kv == map_.end()) ? 0 : kv->second;
}
void put(size_t index, int value)
{
if (value == 0)
map_.erase(index);
else
map_.emplace(index, value);
}
// etc...
private:
std::unordered_map<size_t, int> map_;
};
In such a sparse vector class, you can overload operator[] if you wish to allow something like sparseVec[42] = 123.
Linear algebra libraries, such as Eigen or Boost.uBlas, already provide templates for sparse vectors and sparse matrices.

The knights tour. Chosing a container

I have been reading up on C++ lately, especially STL, and I decided to do the Knights Tour problem again. I'm thinking about the best way to implement this, and I'm looking for some help.
Just for fun and practice, I thought I'd start with a "Piece" base class, which a "Knight" class can inherit from. I want to do this so I later can try adding other pieces(even though most of the pieces can't walk over the whole board and complete the problem).
So the "piece class" will need some sort of container to store the coordinates of the piece on the board and the number of moves it has made in that specific step.
I'm thinking I need a linked list with 64 (8 * 8) places to do this most efficiently, containing x,y and moves.
Looking at the STL containers, I can't find anything except map that will hold more than one type.
What can I do to store the coordinate pair and an int for the number of moves in one container? Are there more efficient ways of doing this than using vector, list or map? Do I need a custom container?
Thanks!
You can use
struct CellInfo
{
int x, y, move_count;
}
And store it in std::vector for constant access.
Apart from STL and encapsulation, a very efficient way is to use arrays:
pair<int, int> piece_pos[N];
int piece_move[N];
This avoids the overhead of memory leakage and is faster than dynamic allocation.
If you stell want to use STL, then:
vector<pair<int, int> > piece_pos(N);
vector<int> piece(N);
The C++ STL now has static arrays as well. If you want to store the number of times a given x,y coordinate has been moved to, you can create an array of arrays like the following:
using container_type = std::array<std::array<int, 8>, 8>;
// ...
container_type c;
int moves = c[x][y]; // constant-time access.
If you don't need to look moves up based on x,y, and just want the data stored efficiently, use a flat array of size 8x8 = 64.
If your compiler is out of date, consider using std::vector instead.

Simulation design - flow of data, coupling

I am writing a simulation and need some hint on the design. The basic idea is that data for the given stochastic processes is being generated and later on consumed for various calculations. For example for 1 iteration:
Process 1 -> generates data for source 1: x1
Process 2 -> generates data for source 1: x2
and so on
Later I want to apply some transformations for example on the output of source 2, which results in x2a, x2b, x2c. So in the end up with the following vector: [x1, x2a, x2b, x2c].
I have a problem, as for N-multivariate stochastic processes (representing for example multiple correlated phenomenons) I have to generate N dimensional sample at once:
Process 1 -> generates data for source 1...N: x1...xN
I am thinking about the simple architecture that would allow to structuralize the simulation code and provide flexibility without hindering the performance.
I was thinking of something along these lines (pseudocode):
class random_process
{
// concrete processes would generate and store last data
virtual data_ptr operator()() const = 0;
};
class source_proxy
{
container_type<process> processes;
container_type<data_ptr> data; // pointers to the process data storage
data operator[](size_type number) const { return *(data[number]);}
void next() const {/* update the processes */}
};
Somehow I am not convinced about this design. For example, if I'd like to work with vectors of samples instead of single iteration, then above design should be changed (I could for example have the processes to fill the submatrices of the proxy-matrix passed to them with data, but again not sure if this is a good idea - if yes then it would also fit nicely the single iteration case). Any comments, suggestions and criticism are welcome.
EDIT:
Short summary of the text above to summarize the key points and clarify the situation:
random_processes contain the logic to generate some data. For example it can draw samples from multivariate random gaussian with the given means and correlation matrix. I can use for example Cholesky decomposition - and as a result I'll be getting a set of samples [x1 x2 ... xN]
I can have multiple random_processes, with different dimensionality and parameters
I want to do some transformations on individual elements generated by random_processes
Here is the dataflow diagram
random_processes output
x1 --------------------------> x1
----> x2a
p1 x2 ------------transform|----> x2b
----> x2c
x3 --------------------------> x3
p2 y1 ------------transform|----> y1a
----> y1b
The output is being used to do some calculations.
When I read this "the answer" doesn't materialize in my mind, but instead a question:
(This problem is part of a class of problems that various tool vendors in the market have created configurable solutions for.)
Do you "have to" write this or can you invest in tried and proven technology to make your life easier?
In my job at Microsoft I work with high performance computing vendors - several of which have math libraries. Folks at these companies would come much closer to understanding the question than I do. :)
Cheers,
Greg Oliver [MSFT]
I'll take a stab at this, perhaps I'm missing something but it sounds like we have a list of processes 1...N that don't take any arguments and return a data_ptr. So why not store them in a vector (or array) if the number is known at compile time... and then structure them in whatever way makes sense. You can get really far with the stl and the built in containers (std::vector) function objects(std::tr1::function) and algorithms (std::transform)... you didn't say much about the higher level structure so I'm assuming a really silly naive one, but clearly you would build the data flow appropriately. It gets even easier if you have a compiler with support for C++0x lambdas because you can nest the transformations easier.
//compiled in the SO textbox...
#include <vector>
#include <functional>
#include <numerics>
typedef int data_ptr;
class Generator{
public:
data_ptr operator()(){
//randomly generate input
return 42 * 4;
}
};
class StochasticTransformation{
public:
data_ptr operator()(data_ptr in){
//apply a randomly seeded function
return in * 4;
}
};
public:
data_ptr operator()(){
return 42;
}
};
int main(){
//array of processes, wrap this in a class if you like but it sounds
//like there is a distinction between generators that create data
//and transformations
std::vector<std::tr1::function<data_ptr(void)> generators;
//TODO: fill up the process vector with functors...
generators.push_back(Generator());
//transformations look like this (right?)
std::vector<std::tr1::function<data_ptr(data_ptr)> transformations;
//so let's add one
transformations.push_back(StochasticTransformation);
//and we have an array of results...
std::vector<data_ptr> results;
//and we need some inputs
for (int i = 0; i < NUMBER; ++i)
results.push_back(generators[0]());
//and now start transforming them using transform...
//pick a random one or do them all...
std::transform(results.begin(),results.end(),
results.begin(),results.end(),transformation[0]);
};
I think that the second option (the one mentioned in the last paragraph) makes more sense. In the one you had presented you are playing with pointers and indirect access to random process data. The other one would store all the data (either vector or a matrix) in one place - the source_proxy object. The random processes objects are then called with a submatrix to populate as a parameter, and themselves they do not store any data. The proxy manages everything - from providing the source data (for any distinct source) to requesting new data from the generators.
So changing a bit your snippet we could end up with something like this:
class random_process
{
// concrete processes would generate and store last data
virtual void operator()(submatrix &) = 0;
};
class source_proxy
{
container_type<random_process> processes;
matrix data;
data operator[](size_type source_number) const { return a column of data}
void next() {/* get new data from the random processes */}
};
But I agree with the other comment (Greg) that it is a difficult problem, and depending on the final application may require heavy thinking. It's easy to go into the dead-end resulting in rewriting lots of code...