What is the best standard data structure to build a Graph? - c++

First of all, I am a beginner at C++ and I am teaching myself, so please keep the answers simple.
I need to program a graph that contains nodes. Each node has an id and a list of edges, and each edge has the other node's id and the distance.
What should I use to build this graph, considering that I want to use Dijkstra's algorithm to get the shortest path from one point to another? I think search performance should be the most important factor.
I have searched a lot and I am quite confused now.
Thank you in advance for the help.

You can define an Edge structure like
struct Edge
{
int destination;
int weight;
};
And create a graph with n vertices as
vector<vector<Edge> > graph( n );
Then to access all the edges coming from the vertex u, you write something like
for( int i = 0; i < graph[u].size(); ++i ) {
Edge edge = graph[u][i];
// here edge.destination and edge.weight give you some details.
}
You can dynamically add new edges, for example an edge from 3rd vertex to 7th with a weight of 8:
Edge newEdge;
newEdge.destination = 7;
newEdge.weight = 8;
graph[3].push_back( newEdge );
etc.
For undirected graphs you should not forget to add the symmetric edge, of course.
This should do ok.
Edit
The choice of base containers (std::vector, std::list, std::map) depends on the use case, e.g. what are you doing with the graph more often: add/remove vertices/edges, just traversing. Once your graph is created, either std::list or std::vector is equally good for Dijkstra, std::vector being a bit faster thanks to sequential access pattern of the relaxation stage.
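To show how the vector<vector<Edge> > layout above fits Dijkstra, here is a minimal sketch. It assumes non-negative weights and vertex ids in the range 0..n-1; the function name dijkstra and the use of std::priority_queue are illustrative choices, not something the answer prescribes.

```cpp
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

struct Edge {
    int destination;
    int weight;
};

using Graph = std::vector<std::vector<Edge>>;

// Returns the shortest distance from `source` to every vertex.
// Unreachable vertices keep std::numeric_limits<int>::max().
// Assumption: all weights are non-negative.
std::vector<int> dijkstra(const Graph& graph, int source) {
    const int INF = std::numeric_limits<int>::max();
    std::vector<int> dist(graph.size(), INF);

    // Min-heap of (distance so far, vertex).
    using Item = std::pair<int, int>;
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;

    dist[source] = 0;
    pq.push({0, source});
    while (!pq.empty()) {
        const int d = pq.top().first;
        const int u = pq.top().second;
        pq.pop();
        if (d > dist[u]) continue;  // stale queue entry, a shorter path was already found

        // Relaxation stage: walk the edges of u sequentially.
        for (const Edge& e : graph[u]) {
            if (d + e.weight < dist[e.destination]) {
                dist[e.destination] = d + e.weight;
                pq.push({dist[e.destination], e.destination});
            }
        }
    }
    return dist;
}
```

The inner loop only reads graph[u] from front to back, which is the sequential access pattern mentioned above.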

Use unordered_map<int,vector<int>> to represent adjacency list if you have huge number of vertexes. If you're planning on implementing a small scale graph, then go with array of vectors. Eg: vector<int> v[20];
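A small sketch of the unordered_map variant, assuming an unweighted, undirected graph; the alias SparseGraph and the helper addEdge are made-up names for illustration.

```cpp
#include <unordered_map>
#include <vector>

// Adjacency list keyed by vertex id; ids do not have to be contiguous.
using SparseGraph = std::unordered_map<int, std::vector<int>>;

// Adds an undirected edge. operator[] creates the entry on first use.
void addEdge(SparseGraph& g, int u, int v) {
    g[u].push_back(v);
    g[v].push_back(u);
}
```

Only vertices that actually appear in an edge take up space, which is the point of using a map when ids are large or sparse.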

a graph that contains nodes each node has id and list of edges each edge has the other node id and the distance
If we consider each node id as an index, we can draw an n x n matrix of the edges as follows.
This can help you picture the graph with its edges.
[0][1][2][3]
[0] | 1 0 0 0|
[1] | 0 0 1 0|
[2] | 1 0 0 1|
[3] | 0 0 1 0|
So, a 2D array is a good representation of the matrix.
int matrix[4][4] = {};
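Since the question also asks for a distance on each edge, the matrix can hold the distance instead of 1. The following is a sketch under the assumption that 0 means "no edge"; the names Matrix, makeMatrix, setEdge and hasEdge are hypothetical.

```cpp
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Creates an n x n matrix filled with 0, where 0 is taken to mean "no edge".
Matrix makeMatrix(int n) {
    return Matrix(n, std::vector<int>(n, 0));
}

// Stores a directed edge u -> v with the given distance.
void setEdge(Matrix& m, int u, int v, int distance) {
    m[u][v] = distance;
}

// Checking for an edge is a single lookup.
bool hasEdge(const Matrix& m, int u, int v) {
    return m[u][v] != 0;
}
```

The memory use grows with n*n regardless of how many edges exist, so this is only practical for small graphs.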

I personally would use a std::map<Node*, std::set<Node*> >. This is extremely useful because each time you are at a node, you can quickly find out which nodes that node is connected to. It is also really easy to iterate over all the nodes if you need to. If you need to put weights on the edges, you could use std::map<Node*, std::set< std::pair<int, Node*> > >. This will give much better performance than using vectors, especially for large graphs.

Related

C++ : Storing weight for larger Graph

I was solving a graph problem. It requires storing weights for N nodes (N<=50000). I can't use a matrix to store the weights of the graph (as 50000x50000 can't be allocated). Do you know any other way? Thanks.
My preferred way of storing not too dense graphs is using adjacency lists.
The downside using adjacency lists is however that you can't directly check if node i is connected to node j. Instead you traverse all neighbors of node i (in which j would eventually show up if it is connected with node i). Also it's not practical to remove edges. I use it when doing breadth-first or depth-first searches on a graph, since one is only interested in the set of neighbors and not whether two specific nodes are connected.
In summary:
Takes only as much memory as you have edges (which is what you wanted) but at least as much memory as you have nodes.
Easy to traverse the edges of any node, i.e. always constant time per neighbor.
To check whether two nodes i and j are connected you need to traverse the whole neighborhood list of node i or j. This is expensive if the node is connected to almost all other nodes and cheap if it is connected to only a few.
Removing edges is also expensive for large neighborhoods (at worst linear time in the number of neighbors of a node) and cheap for small neighborhoods.
Inserting edges is very cheap (constant time)
To give you an example (first with all weights 1)
using Graph = std::vector<std::vector<int>>;
now you can create a graph with n nodes with:
Graph mygraph(n);
And if you want to connect node i and j just do
mygraph[i].push_back(j);
mygraph[j].push_back(i);
And to traverse all edges of some node, you can simply do
for (int neighbor : mygraph[i]) {
std::cout << i << " is connected with " << neighbor << std::endl;
}
And now for the harder problem with general weights:
using Graph = std::vector<std::vector<std::pair<int, double>>>;
Graph myWeightedgraph(n);
Now you can insert edges very easily
double w = 123.32424;
myWeightedgraph[i].push_back({j, w});
myWeightedgraph[j].push_back({i, w});
And for traversal:
for (auto& neighbor : myWeightedgraph[i]) {
std::cout << i << " is connected with " << neighbor.first << " with weight " << neighbor.second << std::endl;
}
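The two costly operations from the summary (checking a connection and removing an edge) could look roughly like this on the weighted adjacency list. The helper names isConnected and removeDirectedEdge are made up for this sketch.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

using Graph = std::vector<std::vector<std::pair<int, double>>>;

// Linear scan over the neighbours of i: O(degree of i).
bool isConnected(const Graph& g, int i, int j) {
    return std::any_of(g[i].begin(), g[i].end(),
                       [j](const std::pair<int, double>& e) { return e.first == j; });
}

// Removes every i -> j entry; also linear in the degree of i.
// For an undirected graph, call it a second time with i and j swapped.
void removeDirectedEdge(Graph& g, int i, int j) {
    auto& edges = g[i];
    edges.erase(std::remove_if(edges.begin(), edges.end(),
                               [j](const std::pair<int, double>& e) { return e.first == j; }),
                edges.end());
}
```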
If two nodes can't have multiple edges between them:
First, think of a scheme that gives each existing edge a unique number.
E.g. for N nodes and node numbers between 0 and N-1, an edge between node A and node B could simply have the number A*N+B (e.g. in a uint64_t variable).
Then make a std::map of edges, with the calculated number as key and the weight as value. Most operations there take logarithmic time, which is not as good as the 2D array but still good, and you need much less memory.
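A sketch of the A*N+B idea, assuming directed edges and integer weights. The names edgeKey, EdgeWeights, setWeight and getWeight are hypothetical.

```cpp
#include <cstdint>
#include <map>

// Packs a directed edge (a, b) of an n-node graph into one key: a * n + b.
std::uint64_t edgeKey(std::uint64_t a, std::uint64_t b, std::uint64_t n) {
    return a * n + b;
}

using EdgeWeights = std::map<std::uint64_t, int>;

// Inserts or overwrites the weight of edge (a, b): O(log E).
void setWeight(EdgeWeights& w, std::uint64_t a, std::uint64_t b, std::uint64_t n, int weight) {
    w[edgeKey(a, b, n)] = weight;
}

// Returns true and writes the weight to `out` if the edge exists: O(log E).
bool getWeight(const EdgeWeights& w, std::uint64_t a, std::uint64_t b, std::uint64_t n, int& out) {
    auto it = w.find(edgeKey(a, b, n));
    if (it == w.end()) return false;
    out = it->second;
    return true;
}
```

With N = 50000 the largest key is just under 2.5 * 10^9, which does not fit in a signed 32-bit int; that is why a 64-bit key is used.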
There are generally two ways to represent graphs. As you stated, the first one is to use an adjacency matrix. The pros are that you can easily see if two nodes i and j are connected. The downside is the space complexity (O(V²) where V is the number of vertices).
The other one is the adjacency list: for each vertex, you store an adjacency list that contains every edge coming out of that vertex. Obviously, the spatial complexity is O(V + E) where V is the number of vertices and E the number of edges.
Note that you can store the edges in adjacency maps instead of lists. Let's say you give each edge a unique integer key. If your graph is sparse, an std::unordered_map would fit well since the odds of collisions will be low. This gives you O(1) lookup and insertion on average for a given edge.
If your graph can have a huge number of edges, then just use a regular std::map, which relies on red-black trees. You'll then have logarithmic complexity for both inserting and looking up an edge.
Here is some sample code:
struct Edge {
int weight;
int start, end;
};
struct Vertex {
int key;
std::unordered_map<int, Edge> adjacency_map;
};
struct Graph {
std::vector<Edge> edges;
};
You can't allocate an array with a size on the order of 10^9 in static memory. You would have to allocate it dynamically (e.g. with malloc) instead. Better still, use an adjacency list to store the graph.

Dijkstra's algorithm - vertex as coordinate

I went through Dijkstra's shortest path algorithm. While I was practicing, I encountered a question in which a vertex is not a single number (say 1, 2, 3, and so on) but a pair, more specifically given as (x, y) coordinates. I have never done this type of question, nor have I seen one. Can you please help me with how to approach this kind of question? O(V^2) is heartily welcome.
Map the coordinates to integer vertices using a hashmap. Now you have a graph with nodes as single numbers. Apply dijkstra's algorithm. Time complexity : O(V) for converting to integer vertices. O(V^2) for running dijkstra's algorithm. Therefore O(V^2) total complexity.
Pseudo code:
int cntr = 0;
for(Edge e : graph){
int from = e.from;
int to= e.to;
if(!map.contains(from)){
map.put(from, cntr++);
}
if(!map.contains(to)){
map.put(to, cntr++);
}
}
Each vertex would still have an id (which you could assign, if not given). The Cartesian coordinates are just additional attributes of the vertex, which could be used to compute distances between connected vertices. (sqrt(delta_x^2 + delta_y^2))
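In C++ the two ideas above (assigning an integer id to each coordinate pair and computing the edge length from the coordinates) could be sketched as follows. Note that this uses std::map rather than a hash map, because std::pair has no standard hash; the names Point, VertexIds and euclidean are made up for the example.

```cpp
#include <cmath>
#include <map>
#include <utility>

using Point = std::pair<int, int>;  // (x, y)

// Hands out ids 0, 1, 2, ... in order of first appearance.
class VertexIds {
public:
    int idOf(const Point& p) {
        auto it = ids_.find(p);
        if (it != ids_.end()) return it->second;
        const int id = static_cast<int>(ids_.size());
        ids_[p] = id;
        return id;
    }
    int count() const { return static_cast<int>(ids_.size()); }

private:
    std::map<Point, int> ids_;
};

// Euclidean distance between two points, usable as an edge weight.
double euclidean(const Point& a, const Point& b) {
    return std::hypot(a.first - b.first, a.second - b.second);
}
```

After the mapping, any integer-indexed Dijkstra implementation can be run on the ids.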

The best way to store graph into the memory

The problem is that I have 150 000+ nodes with 200 000+ arcs (may vary up to 1 000 000 or even more), all of them written to a DB.
Now I'd like to create a normal graph which will open access to routing. So, I need to compose it using data from existing DB.
The idea is to build this huge graph, divide it into small pieces and write them to DB BLOBs for storage. I tried to build it recursively, but it seems that the stack couldn't hold so much data, and my algorithm keeps failing with an allocation error. So now I'm a bit confused about how to build this graph. I'm thinking about some kind of iterative method, but the main problem is the architecture, I mean the structures I'm going to use for storing nodes and arcs.
As I see it, the solution should be something like this:
struct Node; // forward declarations, because the structs refer to each other
struct Arcs;
struct Graph
{
unsigned int nodesAmount;
unsigned int arcsAmount;
vector<Node*> NodeArr; //Some kind of container to store all existing Nodes
};
struct Node
{
unsigned int id;
int dimension; //how many arcs use this node
vector<Arcs*> ArcArr;
};
struct Arcs
{
unsigned int id;
double cost;
Node* Node_from;
Node* Node_to;
};
I read lots of articles about method of storing graphs, but didn't find really good solution for such huge graphs.
I would be very pleased for any ideas. Thank you
You are on the right path.
Some small changes that I would suggest:
struct Node
{
unsigned int id;
int dimension; //how many arcs use this node
vector<int> Neighbours; // store neighbour IDs, saves memory
};
struct Graph
{
unsigned int nodesAmount;
unsigned int arcsAmount;
vector<Node> NodeArr; // Store the nodes directly, not pointers
};
Since you are moving between database and C I would strongly suggest not to use pointers because those do not translate. Use IDs and look up your nodes by ID. If you need to store the edges separately then also do this by ID, not by pointer.
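One possible way to do the lookup by ID, and to build the graph iteratively from database rows instead of recursively, is sketched below. It assumes each row yields a (from, to) pair of node IDs; the member names indexOfId, node and addArc are hypothetical.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

struct Node {
    unsigned int id;
    std::vector<unsigned int> neighbours;  // neighbour IDs, not pointers
};

struct Graph {
    std::vector<Node> nodes;
    std::unordered_map<unsigned int, std::size_t> indexOfId;  // ID -> position in nodes

    // Returns the node with this ID, creating it on first use.
    // The reference is only valid until the next node is created.
    Node& node(unsigned int id) {
        auto it = indexOfId.find(id);
        if (it == indexOfId.end()) {
            it = indexOfId.emplace(id, nodes.size()).first;
            nodes.push_back(Node{id, {}});
        }
        return nodes[it->second];
    }

    // Adds a directed arc as read from one (from, to) row.
    void addArc(unsigned int from, unsigned int to) {
        node(to);  // make sure the target exists before taking a reference
        node(from).neighbours.push_back(to);
    }
};
```

Because everything is stored as IDs and indices, the structure can be written out and read back without fixing up any addresses.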
I know that this solution has nothing to do with your snippet, but I'd like to show you another way.
An option that's used quite often is to have two arrays: one for the edges, one for the vertices.
The vertices array points into the edges array and says where the adjacent vertices of each vertex start. The edges array stores the adjacent vertices themselves.
For instance :
V = 6, E = 7
vertices = [0, 1, 1, 2, 5, 6]
edges = [1, 2, 3, 4, 5, 6, 0]
Considering the indexes, the edges array would look like :
| [1] | [] | [2] | [3, 4, 5] | [6] | [0] |
So the first vertex has a single adjacent vertex (with ID 1), the fourth vertex has 3 adjacent vertices with IDs 3, 4, 5, etc.
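The two-array layout can be built from a list of arcs with a counting pass and a prefix sum. This is a sketch with one deliberate difference from the example above: the offsets array has V + 1 entries, so that the end of the last vertex's range is stored as well. The names CsrGraph, buildCsr and degree are made up.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

struct CsrGraph {
    std::vector<std::size_t> offsets;  // offsets[v] = start of v's neighbours; size V + 1
    std::vector<int> targets;          // all neighbour ids, grouped by source vertex
};

// Builds the two arrays from a list of directed (from, to) pairs.
CsrGraph buildCsr(int vertexCount, const std::vector<std::pair<int, int>>& arcs) {
    CsrGraph g;
    g.offsets.assign(static_cast<std::size_t>(vertexCount) + 1, 0);
    for (const auto& a : arcs) ++g.offsets[a.first + 1];  // count out-degrees
    for (int v = 0; v < vertexCount; ++v) {
        g.offsets[v + 1] += g.offsets[v];                 // prefix sums give the start positions
    }
    g.targets.resize(arcs.size());
    std::vector<std::size_t> next(g.offsets.begin(), g.offsets.end() - 1);
    for (const auto& a : arcs) g.targets[next[a.first]++] = a.second;
    return g;
}

// Number of neighbours of v.
std::size_t degree(const CsrGraph& g, int v) {
    return g.offsets[v + 1] - g.offsets[v];
}
```

The neighbours of v are then targets[offsets[v]] up to, but not including, targets[offsets[v + 1]].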

Find distance from a node to the one farthest from it BOOST

I need to find the distance from every node to the node farthest from it in the minimum spanning tree. I have done this so far, but I have no clue how to find the longest distance from a node.
#include<iostream>
#include<boost/config.hpp>
#include<boost/graph/adjacency_list.hpp>
#include<boost/graph/kruskal_min_spanning_tree.hpp>
#include<boost/graph/prim_minimum_spanning_tree.hpp>
using namespace std;
using namespace boost;
int main()
{
typedef adjacency_list< vecS, vecS, undirectedS, property <vertex_distance_t,int>, property< edge_weight_t, int> > Graph;
int test=0,m,a,b,c,w,d,i,no_v,no_e,arr_w[100],arr_d[100];
cin>>test;
m=0;
while(m!=test)
{
cin>>no_v>>no_e;
Graph g(no_v);
property_map <Graph, edge_weight_t>:: type weightMap=get(edge_weight,g);
bool bol;
graph_traits<Graph>::edge_descriptor ed;
for(i=0;i<no_e;i++)
{
cin>>a>>b>>c;
tie(ed,bol)=add_edge(a,b,g);
weightMap[ed]=c;
}
property_map<Graph,edge_weight_t>::type weightM=get(edge_weight,g);
property_map<Graph,vertex_distance_t>::type distanceMap=get(vertex_distance,g);
property_map<Graph,vertex_index_t>::type indexMap=get(vertex_index,g);
vector< graph_traits<Graph>::edge_descriptor> spanning_tree;
kruskal_minimum_spanning_tree(g,back_inserter(spanning_tree));
vector<graph_traits<Graph>::vertex_descriptor>p(no_v);
prim_minimum_spanning_tree(g,0,&p[0],distanceMap,weightMap,indexMap,default_dijkstra_visitor());
w=0;
for(vector<graph_traits<Graph>::edge_descriptor>::iterator eb=spanning_tree.begin();eb!=spanning_tree.end();++eb) //spanning tree weight
{
w=w+weightM[*eb];
}
arr_w[m]=w;
d=0;
graph_traits<Graph>::vertex_iterator vb,ve;
for(tie(vb,ve)=vertices(g),.
arr_d[m]=d;
m++;
}
for( i=0;i<test;i++)
{
cout<<arr_w[i]<<endl;
}
return 0;
}
If I have a spanning tree with nodes 1 2 3 4, I need to find the longest distance from each of 1 2 3 4 in the spanning tree (and the longest distance can consist of many edges, not only one).
I'll not give you exact code for how to do this, but I'll give you an idea of how to do it.
First, the result of an MST (minimum spanning tree) algorithm is a so-called tree. Think about the definition. One can say it is a graph where a path exists from every node to every other node and there are no cycles. Alternatively, you can say that a given graph is a tree iff there exists exactly one path from vertex u to v for every u and v.
According to the definition you can define the following
function DFS_Farthest (Vertex u, Vertices P)
begin
define farthest is 0
define P0 as empty set
add u to P
foreach v from neighbours of u and v is not in P do
begin
( len, Ps ) = DFS_Farthest(v, P)
if L(u, v) + len > farthest then
begin
P0 is Ps union P
farthest is len + L(u, v)
end
end
return (farthest, P0)
end
Then, for every vertex v in the graph, you call DFS_Farthest(v, empty set). It gives you (farthest, P), where farthest is the distance to the farthest node and P is the set of vertices from which you can reconstruct the path from v to the farthest vertex.
Now to describe what it is doing. First, the signature. The first parameter is the vertex from which you want to find the farthest one. The second parameter is a set of banned vertices. So it says "Hey, give me the longest path from u to the farthest vertex such that the vertices from P are not on that path".
Next there is the foreach loop. There you look for the farthest vertices from the current vertex without visiting vertices already in P (the current vertex is already there). When you find a path longer than the one found so far, note it in farthest and P0. Note that L(u, v) is the length of the edge {u, v}.
At the end you return the length and the banned vertices (this is the path to the farthest vertex).
This is just a simple DFS (depth-first search) algorithm where you remember the already visited vertices.
Now about time complexity. Suppose you can get the neighbours of a given vertex in O(1) (this depends on the data structure you have). The function visits every vertex exactly once, so it is at least O(n). To know the farthest vertex from every vertex you have to call this function for every vertex. This gives a time complexity for your problem of at least O(n^2).
My guess is that a better solution might be possible using dynamic programming, but this is just a guess. In general, finding the longest path in a graph is an NP-hard problem. This makes me suspect that there might not be any significantly better solution. But that's another guess.
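A compact C++ sketch of the DFS described above follows. It is a simplification, not a translation of the pseudocode: it returns only the distance (not the path), and instead of the banned set P it passes the parent vertex, which is enough in a tree because there are no cycles. The plain adjacency-list type Tree and the function names are assumptions; the Boost graph from the question is not used here.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Tree as an adjacency list of (neighbour, edge length) pairs.
using Tree = std::vector<std::vector<std::pair<int, int>>>;

// Longest distance from u to any vertex reachable without stepping back to `parent`.
// Recursive, so very deep trees may need an explicit stack instead.
int farthest(const Tree& tree, int u, int parent = -1) {
    int best = 0;
    for (const auto& edge : tree[u]) {
        if (edge.first == parent) continue;  // do not walk back the way we came
        best = std::max(best, edge.second + farthest(tree, edge.first, u));
    }
    return best;
}

// One DFS per start vertex: O(n^2) in total for a tree with n vertices.
std::vector<int> farthestFromEach(const Tree& tree) {
    std::vector<int> result(tree.size());
    for (std::size_t v = 0; v < tree.size(); ++v) {
        result[v] = farthest(tree, static_cast<int>(v));
    }
    return result;
}
```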

Better method to search array?

I have an array (nodes[][]) that contains values of effective distances that looks something like this:
__ __
|1 0.4 3 |
|0.4 1 0 |
|3 3.2 1 ... |
|0.8 4 5 |
|0 0 1 |
-- --
Where the first value, node[0][0] is the distance from node 0 to node 0 which is 1.
So the distance from node 2 to node 1 is 3.2 (node[2][1]=3.2)
I need, given a node column, to search through the rows to find the farthest distance, while not picking itself (node[1][1])
The method I was thinking to do something like this:
int n=0;
currentnode=0; //this is the column I am searching now
if(currentnode==n)
n++;
best=node[n][currentnode];
nextbest=node[n++][currentnode];
if(nextbest>best)
best=nextbest;
else
for(int x=n;x<max;x++) //max is the last column
{
if(currentnode==n)
continue;
nextbest=node[x][currentnode];
if(nextbest>best)
best=nextbest;
}
I can't think of a better method to do this. I could use functions to make it shorter but this is GENERALLY what I am thinking about using. After this I have to loop this to go to the next column that the best distance returns and do this routine again.
As always when trying to optimize, you have to make a choice:
Do you want to pay the cost during insertion, or during search?
If you have few insertions and a lot of searches to do in the container, then you need a sorted container. Finding the maximum will be O(1), i.e. just pick the last element.
If you have a lot of insertions and only a few searches, then you can stay with an unsorted container, and finding the maximum is O(n), i.e. you have to check all values at least once to pick the maximum.
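As an illustration of the sorted-container option, here is a minimal sketch built on std::multiset, which keeps its elements ordered on insertion. The wrapper class SortedDistances is a made-up name.

```cpp
#include <set>

// Keeps distances sorted; duplicates are allowed.
class SortedDistances {
public:
    // Insertion pays the sorting cost: O(log n).
    void insert(double d) { values_.insert(d); }

    bool empty() const { return values_.empty(); }

    // Largest stored value, read from the end of the sorted order.
    // Must not be called on an empty container.
    double max() const { return *values_.rbegin(); }

private:
    std::multiset<double> values_;
};
```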
You can simplify it quite a bit. A lot of your checks and temporary variables are redundant. Here's a small function that performs your search. I've renamed most of the variables to be a little more precise what their roles are.
double maxDistance(int fromNode) {
double max = -1; // assumes all distances are non-negative
for (int toNode = 0; toNode < nodeCount; ++toNode)
{
if (fromNode != toNode && nodes[toNode][fromNode] > max) {
max = nodes[toNode][fromNode];
}
}
return max;
}
If you are willing to sacrifice some space, you could add additional arrays to keep track of the maximum distance seen so far for a particular column/row and the node that that distance corresponds to.
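One way this could look: a table that remembers, for each column, the row holding the largest off-diagonal value. This is a sketch under the stated assumption that entries are only set once or increased; the class and member names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Distance matrix that caches, per column, the row with the largest off-diagonal value.
class DistanceTable {
public:
    explicit DistanceTable(std::size_t n)
        : dist_(n, std::vector<double>(n, 0.0)), bestRow_(n, -1) {}

    // Sets dist[to][from] and updates the cached maximum of column `from`.
    // Assumption: values are only set once or increased. Lowering a cached
    // maximum would require rescanning the column.
    void set(std::size_t to, std::size_t from, double d) {
        dist_[to][from] = d;
        if (to == from) return;  // the diagonal never counts as "farthest"
        if (bestRow_[from] < 0 || d > dist_[bestRow_[from]][from]) {
            bestRow_[from] = static_cast<int>(to);
        }
    }

    // Row index of the farthest node from `from`, or -1 if no entry was set.
    int farthestFrom(std::size_t from) const { return bestRow_[from]; }

    // Cached maximum distance of column `from`; requires farthestFrom(from) >= 0.
    double maxDistance(std::size_t from) const { return dist_[bestRow_[from]][from]; }

private:
    std::vector<std::vector<double>> dist_;
    std::vector<int> bestRow_;
};
```

The lookup becomes O(1) at the price of one extra int per column and a little work on every write.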
Profile it. Unless this is a major bottleneck, I'd favour clarity (maintainability) over cleverness.
Looping linearly over arrays is something that modern processors do rather well, and the O(N) approach often works just fine.
With thousands of nodes, I'd expect your old Pentium III to be able to do a few gazillion a second! :)