C++ : Storing weight for larger Graph - c++

I was solving a graph problem that requires storing edge weights for N nodes (N <= 50000). I can't use a matrix to store the weights, since a 50000x50000 array can't be allocated. Do you know any other way? Thanks.

My preferred way of storing not too dense graphs is using adjacency lists.
The downside of adjacency lists, however, is that you can't directly check whether node i is connected to node j. Instead you traverse all neighbors of node i (in which j will eventually show up if it is connected to node i). It is also not practical to remove edges. I use adjacency lists for breadth-first or depth-first searches on a graph, since there one is only interested in the set of neighbors and not in whether two specific nodes are connected.
In summary:
Takes only as much memory as you have edges (which is what you wanted) but at least as much memory as you have nodes.
Easy to traverse the edges of any node, i.e. always constant time per neighbor.
To check whether two nodes i and j are connected, you need to traverse the whole neighbor list of node i or j. This is expensive if the node is connected to almost all other nodes and cheap if it is connected to only a few.
Removing edges is also expensive for large neighborhoods (at worst linear time in the number of neighbors of a node) and cheap for small neighborhoods.
Inserting edges is very cheap (constant time)
To give you an example (first with all weights 1)
using Graph = std::vector<std::vector<int>>;
now you can create a graph with n nodes with:
Graph mygraph(n);
And if you want to connect node i and j just do
mygraph[i].push_back(j);
mygraph[j].push_back(i);
And to traverse all edges of some node, you can simply do
for (int neighbor : mygraph[i]) {
std::cout << i << " is connected with " << neighbor << std::endl;
}
And now for the harder problem with general weights:
using Graph = std::vector<std::vector<std::pair<int, double>>>;
Graph myWeightedgraph(n);
Now you can insert edges very easily
double w = 123.32424;
myWeightedgraph[i].push_back({j, w});
myWeightedgraph[j].push_back({i, w});
And for traversal:
for (auto& neighbor : myWeightedgraph[i]) {
std::cout << i << " is connected with " << neighbor.first << " with weight " << neighbor.second << std::endl;
}
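The trade-offs from the summary above (cheap insertion, linear-time lookup and removal) can be sketched as follows. This is only a minimal illustration, assuming an undirected graph; the names WeightedGraph, addEdge, hasEdge and removeEdge are made up for this example and are not part of any library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Same layout as above: g[i] holds (neighbor, weight) pairs of node i.
using WeightedGraph = std::vector<std::vector<std::pair<int, double>>>;

// Insert an undirected edge: amortized constant time.
void addEdge(WeightedGraph& g, int i, int j, double w) {
    assert(i >= 0 && static_cast<std::size_t>(i) < g.size());
    assert(j >= 0 && static_cast<std::size_t>(j) < g.size());
    g[i].push_back({j, w});
    g[j].push_back({i, w});
}

// Connectivity check: linear scan over the neighbors of i.
bool hasEdge(const WeightedGraph& g, int i, int j) {
    for (const auto& neighbor : g[i]) {
        if (neighbor.first == j) return true;
    }
    return false;
}

// Remove all i-j entries in both directions: linear in both neighborhoods.
void removeEdge(WeightedGraph& g, int i, int j) {
    auto dropTarget = [&g](int from, int to) {
        auto& list = g[from];
        list.erase(std::remove_if(list.begin(), list.end(),
                                  [to](const std::pair<int, double>& e) {
                                      return e.first == to;
                                  }),
                   list.end());
    };
    dropTarget(i, j);
    dropTarget(j, i);
}
```

If hasEdge or removeEdge turn out to dominate the running time, that is usually a sign that a map-based representation fits the problem better.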

If two nodes can't have multiple edges between them:
First, think of a scheme that gives each possible edge a unique number.
E.g. for N nodes numbered 0 to N-1, an edge between node A and node B could simply get the number A*N+B (stored e.g. in a uint64_t variable).
Then make a std::map of edges, with the calculated number as key and the weight as value. Most operations on it take logarithmic time, which is not as good as the constant time of the 2D array but still good, and you need much less memory.
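A minimal sketch of this idea, assuming directed edges and double weights. The class name EdgeWeights and its member functions are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Stores edge weights in a std::map keyed by a * N + b.
class EdgeWeights {
public:
    explicit EdgeWeights(std::uint64_t nodeCount) : n_(nodeCount) {}

    // Insert or overwrite the weight of edge a -> b: O(log E).
    void set(std::uint64_t a, std::uint64_t b, double weight) {
        weights_[key(a, b)] = weight;
    }

    // Check whether edge a -> b exists: O(log E).
    bool contains(std::uint64_t a, std::uint64_t b) const {
        return weights_.count(key(a, b)) != 0;
    }

    // Weight of edge a -> b; throws std::out_of_range if the edge is missing.
    double get(std::uint64_t a, std::uint64_t b) const {
        return weights_.at(key(a, b));
    }

private:
    // Unique number per ordered pair, valid while a and b are below N.
    std::uint64_t key(std::uint64_t a, std::uint64_t b) const {
        assert(a < n_ && b < n_);
        return a * n_ + b;
    }

    std::uint64_t n_;
    std::map<std::uint64_t, double> weights_;
};
```

Note that a*N+b and b*N+a are different keys, so for an undirected graph you would either store both directions or always put the smaller node number first.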

There are generally two ways to represent graphs. As you stated, the first one is to use an adjacency matrix. The pros are that you can easily see if two nodes i and j are connected. The downside is the space complexity (O(V²) where V is the number of vertices).
The other one is the adjacency list: for each vertex, you store a list that contains every edge coming out of that vertex. The space complexity is O(V + E), where V is the number of vertices and E the number of edges.
Note that you can store the edges in adjacency maps instead of lists. Let's say you give each edge a unique integer key. If your graph is sparse, a std::unordered_map would fit well since the odds of collisions will be low. This grants you O(1) average lookup and insertion complexity for a given edge.
If your graph can have a huge number of edges, then just use a regular std::map, which is typically implemented as a red-black tree. You'll then have logarithmic complexity for both inserting and looking up an edge.
Here is some sample code:
struct Edge {
    int weight;
    int start, end;
};
struct Vertex {
    int key;
    std::unordered_map<int, Edge> adjacency_map;
};
struct Graph {
    std::vector<Edge> edges;
};
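One possible way to use adjacency maps in practice is sketched below. It is a simplified variant of the structs above and rests on an assumption: each vertex keeps its own std::unordered_map keyed by the neighbor's id rather than by a separate edge key. MapGraph and findWeight are names invented for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

struct MapGraph {
    // adjacency[u] maps a neighbor id to the weight of the edge u-neighbor.
    std::vector<std::unordered_map<int, int>> adjacency;

    explicit MapGraph(std::size_t vertexCount) : adjacency(vertexCount) {}

    bool valid(int v) const {
        return v >= 0 && static_cast<std::size_t>(v) < adjacency.size();
    }

    // Insert or overwrite an undirected edge: O(1) on average.
    void addEdge(int u, int v, int weight) {
        assert(valid(u) && valid(v));
        adjacency[u][v] = weight;
        adjacency[v][u] = weight;
    }

    // Average O(1) lookup; returns nullptr when u and v are not connected.
    const int* findWeight(int u, int v) const {
        assert(valid(u) && valid(v));
        auto it = adjacency[u].find(v);
        return it == adjacency[u].end() ? nullptr : &it->second;
    }
};
```

Memory stays proportional to V + E, at the price of a larger constant factor per edge than a plain vector of neighbors.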

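You can't allocate an array with a size on the order of 10^9 elements statically. You would have to allocate it dynamically (e.g. with malloc) instead, and even then a 50000x50000 matrix of 4-byte ints needs about 10 GB. Better still, use an adjacency list to store the graph.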

Related

How to calculate the total distance between various vertices in a graph?

Let's say I have a weighted, undirected, acyclic graph with no negative value weights, comprised of n vertices and n-1 edges. If I want to calculate the total distance between every single one of them (using edge weight) and then add it up, which algorithm should I use? If for example a graph has 4 vertices, connected like a-b, a-c, c-d then the program should output the total distance needed to go from a-d, a-c, a-b, b-c, b-d and so on. You could call it every possible path between the given vertices. The language I am using is C++.
I have tried using Dijkstra's and Prim's algorithms, but neither has worked for me. I have thought about using normal or multisource DFS, but I have been struggling with it for some time now. Is there really a fast way to calculate it, or have I misunderstood the problem entirely?
Since you have an acyclic graph, there is only one possible path between any two points. This makes things a lot simpler to compute and you don't need to use any real pathfinding algorithms.
Let's say we have an edge E that connects nodes A and B. Calculate how many nodes can be reached from node A, not using edge E (including A). Multiply that by the number of nodes that can be reached from node B, not using edge E (including B). Now you have the number of paths that travel through edge E. Multiply this by the weight of edge E, and you have the total contribution of edge E to the sum.
Do the same thing for every edge and add up the results.
To make the algorithm more efficient, each edge can store cached values that say the number of nodes that are reachable on each side of the edge.
You don't have to use a depth first search. Here is some pseudocode showing how you calculate the number of nodes reachable on a side of edge E very fast taking advantage of caching:
int count_nodes_reachable_on_edge_side(Edge e, Node a) {
    // assume edge e directly connects to node a
    if (answer is already cached in e) { return the answer; }
    answer = 1; // a is reachable
    for each edge f connected to node a {
        if (f is not e) {
            let b be other node f touches (not a)
            answer += count_nodes_reachable_on_edge_side(f, b)
        }
    }
    cache the answer in edge e;
    return answer;
}
I already presented an O(N^2) algorithm in my other answer, but I think you can actually do this in O(N) time with this pseudo code:
let root be an arbitrary node on the graph;
let total_count be the total number of nodes;
let total_cost be 0;
process(root, null);

// Returns the number of nodes reachable from node n without going
// through edge p. Also adds to total_cost the contribution from
// all edges touching node n, except for edge p.
int process(Node n, Edge p)
{
    count = 1
    for each edge q that touches node n {
        if (q != p) {
            let m be the other node connected to q (not n)
            sub_count = process(m, q)
            total_cost += weight(q) * sub_count * (total_count - sub_count)
            count += sub_count
        }
    }
    return count
}
The run time of this is O(N), where N is the number of nodes, because process will be called exactly once for each node.
(For the detail-oriented readers: the loop inside process does not change this. There are O(N) iterations that call process, because process is called on each node exactly once. There are O(N) iterations that don't do anything (because q == p), because such an iteration can happen only once per process call.)
Every edge will also be visited. After we recursively count the number of nodes on one side of the edge, we can do a simple subtraction (total_count - sub_count) to get the number of nodes on the other side of the edge. When we have these two node counts, we can just multiply them together to get the total number of paths going through the edge, then multiply that by the weight, and add it to the total cost.
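The pseudocode above might translate to C++ roughly as follows. This is a sketch under a few assumptions: the tree is stored as an adjacency list of (neighbor, weight) pairs, nodes are numbered from 0, and the parent node is passed instead of the parent edge (equivalent for a tree, which has no parallel edges). The names Tree, addTreeEdge and sumOfAllPairDistances are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// tree[u] holds (neighbor, weight) pairs for every edge touching u.
using Tree = std::vector<std::vector<std::pair<int, std::int64_t>>>;

void addTreeEdge(Tree& tree, int u, int v, std::int64_t weight) {
    assert(u >= 0 && static_cast<std::size_t>(u) < tree.size());
    assert(v >= 0 && static_cast<std::size_t>(v) < tree.size());
    tree[u].push_back({v, weight});
    tree[v].push_back({u, weight});
}

// Returns the number of nodes reachable from `node` without going back to
// `parent`, and adds the contribution of every edge below `node` to totalCost.
std::int64_t process(const Tree& tree, int node, int parent,
                     std::int64_t& totalCost) {
    const std::int64_t totalCount = static_cast<std::int64_t>(tree.size());
    std::int64_t count = 1;
    for (const auto& edge : tree[node]) {
        const int next = edge.first;
        if (next == parent) continue;
        const std::int64_t subCount = process(tree, next, node, totalCost);
        // subCount nodes on one side, totalCount - subCount on the other.
        totalCost += edge.second * subCount * (totalCount - subCount);
        count += subCount;
    }
    return count;
}

// Sum of distances over all unordered pairs of nodes in a tree.
std::int64_t sumOfAllPairDistances(const Tree& tree) {
    assert(!tree.empty());
    std::int64_t totalCost = 0;
    process(tree, 0, -1, totalCost);
    return totalCost;
}
```

For the four-vertex example from the question (a-b, a-c, c-d) with all weights equal to 1, the edges contribute 3, 4 and 3, so the total is 10. The recursion depth can reach N on a path-shaped tree, so for very deep trees an explicit stack may be the safer choice.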

How to find largest bi-partite subgraph in the given graph?

Given an undirected unweighted graph : it may be cyclic and each vertex has given value ,as shown in image.
Find the size of largest Bi-Partite sub-graph (Largest means maximum number of vertices (connected) in that graph) ?
Answer:
The largest graph is the orange-coloured one, so the answer is 8.
My approach:
#define loop(i,n) for(int i=0;i<n;i++)
int vis[N+1];
vector<int> adj[N+1]; // graph in adjacency vector list
int dfs(int current_vertex,int parent,int original_value,int other_value){
    int ans=0;
    vis[current_vertex]=1; // mark as visited
    // map for adding values from neighbours having same value
    map<int,int> mp;
    // if curr vertex has value original_value then look for the neighbours
    // having value other_value, but if other_value is not defined, define it
    if(value[current_vertex]==original_value){
        loop(i,adj[current_vertex].size()){
            int v=adj[current_vertex][i];
            if(v==parent)continue;
            if(!vis[v]){
                if(value[v]==other_value){
                    mp[value[v]]+=dfs(v,current_vertex,original_value,other_value);
                }
                else if(other_value==-1){
                    mp[value[v]]+=dfs(v,current_vertex,original_value,value[v]);
                }
            }
        }
    }
    // else the current_vertex has other_value, so look for original_value
    else{
        loop(i,adj[current_vertex].size()){
            int v=adj[current_vertex][i];
            if(v==parent)continue;
            if(!vis[v]){
                if(value[v]==original_value){
                    mp[value[v]]+=dfs(v,current_vertex,original_value,other_value);
                }
            }
        }
    }
    // find maximum length that can be found from neighbours of curr_vertex
    map<int,int> ::iterator ir=mp.begin();
    while(ir!=mp.end()){
        ans=max(ans,ir->second);
        ir++;
    }
    return ans+1;
}
calling :
// N is the number of vertices in original graph : n=|V|
for(int i=0;i<N;i++){
    ans=max(ans,dfs(i,-1,value[i],-1));
    memset(vis,0,sizeof(vis));
}
But I'd like to improve this to run in O(|V|+|E|) time, where |V| is the number of vertices and |E| is the number of edges. How do I do that?
This doesn't seem hard. Traverse the edge list and add each one to a multimap keyed by vertex label canonical pairs (the 1,2,3 in your diagram, e.g. with the lowest vertex label value first in the pair).
Now for each value in the multimap - which is a list of edges - accumulate the corresponding vertex set.
The biggest vertex set corresponds to the edges of the biggest bipartite graph.
This algorithm traverses each edge twice, doing a fixed number of map and set operations per edge. So its amortized run time and space is indeed O(|V|+|E|).
Note that it's probably simpler to implement this algorithm with an adjacency list representation than with a matrix because the list gives the edge explicitly. The matrix requires a more careful traversal (like a DFS) to avoid Omega(|V|^2) performance.
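A literal sketch of the steps described in this answer, assuming the graph is given as a vector of vertex values plus an edge list. The function name largestTwoValueVertexSet is made up for this example. Note that, exactly like the description, it groups edges only by their pair of values and does not check that the resulting vertex set is connected; it also uses std::map and std::set, which add a logarithmic factor compared to hash-based containers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <utility>
#include <vector>

// value[v] is the label of vertex v; edges holds undirected (u, v) pairs.
std::size_t largestTwoValueVertexSet(
        const std::vector<int>& value,
        const std::vector<std::pair<int, int>>& edges) {
    // Canonical (smaller label, larger label) pair -> vertices on such edges.
    std::map<std::pair<int, int>, std::set<int>> verticesByPair;
    for (const auto& e : edges) {
        assert(e.first >= 0 && static_cast<std::size_t>(e.first) < value.size());
        assert(e.second >= 0 && static_cast<std::size_t>(e.second) < value.size());
        const int a = value[e.first];
        const int b = value[e.second];
        auto& vertices =
            verticesByPair[std::make_pair(std::min(a, b), std::max(a, b))];
        vertices.insert(e.first);
        vertices.insert(e.second);
    }
    // The biggest accumulated vertex set wins.
    std::size_t best = 0;
    for (const auto& entry : verticesByPair) {
        best = std::max(best, entry.second.size());
    }
    return best;
}
```

If the connectivity requirement from the question matters, each group would additionally need a component search (for example a BFS restricted to the edges of that group).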

create pairs of vertices using adjacency list in linear time

I have n vertices numbered 1...n and want to pair every vertex with all other vertices. That would result in n*(n-1)/2 edges. Each vertex has some strength. The difference between the strengths of two vertices is the weight of the edge. I need to get the total weight. Using two loops I can do this in O(n^2) time, but I want to reduce the time. I could use an adjacency list to create a graph of n*(n-1)/2 edges, but how would I create the adjacency list without using two loops? The input consists only of the number of vertices and the strength of each vertex.
for(int i=0;i<n;i++)
for(int j=i+1;j<n;j++)
{
int w=abs((strength[i]-strength[j]));
sum+=w;
}
This is what I did earlier. I need a better way to do this.
If there are O(N*N) edges, then you can't list them all in linear time.
However, if all you need is to compute the sum, here's a solution in O(N*log(N)). You can improve on that by using an O(N) sorting algorithm instead, such as radix sort.
#include <algorithm>
#include <cstdint>
// ...
std::sort(strength, strength+n);
uint64_t sum = 0;
int64_t runSum = strength[0];
for(int i=1; i<n; i++) {
    sum += int64_t(i)*strength[i] - runSum;
    runSum += strength[i];
}
// Now "sum" contains the sum of weights over all edges
To explain the algorithm:
The idea is to avoid summing over all edges explicitly (requiring O(N*N)), but rather to add sums of several weights at once. Consider the last vertex n-1 and the average A[n-1] = (strength[0] + strength[1] + ... + strength[n-2])/(n-1): obviously we could add (strength[n-1] - A[n-1]) * (n-1), i.e. n-1 weights at once, if the other strengths were all smaller than strength[n-1] (or the negated amount if they were all larger). However, due to the abs operation, we would have to add different amounts depending on whether the strength of the other vertex is larger or smaller than the strength of the current vertex. So one solution is to sort the strengths first, to ensure that each strength is greater than or equal to the previous one. Then vertex i contributes i*strength[i] minus the sum of all earlier strengths, which is exactly what the loop accumulates.
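To make the argument above concrete, here is a self-contained sketch that wraps the sorted running-sum approach in a function and compares it against the quadratic double loop. The function names are made up for this example, and std::vector is used instead of a raw array purely for convenience.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// O(N log N): sort, then vertex i contributes i * strength[i] - (earlier sum).
std::int64_t pairwiseDifferenceSum(std::vector<std::int64_t> strength) {
    std::sort(strength.begin(), strength.end());
    std::int64_t sum = 0;
    std::int64_t runSum = 0;  // sum of strength[0..i-1]
    for (std::size_t i = 0; i < strength.size(); ++i) {
        sum += static_cast<std::int64_t>(i) * strength[i] - runSum;
        runSum += strength[i];
    }
    return sum;
}

// O(N^2) reference: sum |strength[i] - strength[j]| over all pairs i < j.
std::int64_t pairwiseDifferenceSumBruteForce(
        const std::vector<std::int64_t>& strength) {
    std::int64_t sum = 0;
    for (std::size_t i = 0; i < strength.size(); ++i) {
        for (std::size_t j = i + 1; j < strength.size(); ++j) {
            const std::int64_t a = strength[i];
            const std::int64_t b = strength[j];
            sum += a > b ? a - b : b - a;
        }
    }
    return sum;
}

// Small self-check: both versions must agree on a sample input.
void checkPairwiseDifferenceSum() {
    const std::vector<std::int64_t> sample = {3, 1, 4};
    assert(pairwiseDifferenceSum(sample) ==
           pairwiseDifferenceSumBruteForce(sample));
}
```

For the strengths 3, 1, 4 the sorted order is 1, 3, 4; the loop adds 0, then 1*3 - 1 = 2, then 2*4 - 4 = 4, for a total of 6, which matches |3-1| + |3-4| + |1-4|.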

How can I economically store a sparse matrix during the process of element filling?

I know there are quite a few good ways to store a sparse matrix without taking up much memory.
But I'm wondering whether there is a good way to store a sparse matrix during the construction of it? Here is the more detailed scenario: the program constructs a sparse matrix by figuring out where to put a non-zero value on each iteration; and since the coordinates of the non-zero value will not be known until runtime, they are totally random and unpredictable.
I'm programming in C++. So is there a way to implement this in C++? Solutions in other languages are also appreciated.
You could have three parallel lists and store the row ids in one, the column ids in another, and the values in the third. Once you are done with all entries, you could reorganize as needed, e.g. sort by rows and columns.
What is not described in your question is how you need or want to represent the sparse matrix in the end. What do you need to do with it? This would affect the representation.
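A minimal sketch of the collect-then-reorganize idea from this answer. As a small variation it keeps one vector of (row, column, value) triplets instead of three parallel lists, which makes sorting simpler; the names Entry and TripletBuilder are made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <tuple>
#include <vector>

// One non-zero element of the matrix under construction.
struct Entry {
    int row;
    int col;
    double value;
};

class TripletBuilder {
public:
    // Appending is amortized O(1), in whatever order the entries arrive.
    void add(int row, int col, double value) {
        assert(row >= 0 && col >= 0);
        entries_.push_back({row, col, value});
    }

    // Once all entries are known, sort them by (row, col) so they can be
    // converted to the final representation in a single pass.
    std::vector<Entry> finish() {
        std::sort(entries_.begin(), entries_.end(),
                  [](const Entry& a, const Entry& b) {
                      return std::tie(a.row, a.col) < std::tie(b.row, b.col);
                  });
        return entries_;
    }

private:
    std::vector<Entry> entries_;
};
```

Duplicate coordinates are kept as separate entries here; whether they should be summed or overwritten depends on what the finished matrix is used for.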
std::map might be what you're looking for, it's a key -> value map type. Combine this with std::set, which is a unique collection of elements. So, you could use a map of std::set, like so:
std::map<int, std::set<int> > sparseMatrix;
// Add some edges.
sparseMatrix[0].insert(1); // Add an edge from vertex 0 to 1.
sparseMatrix[4].insert(2); // Add an edge from vertex 4 to 2.
sparseMatrix[0].insert(1); // Edge already exists, no data added to the set.
This representation lets you represent a directed graph; it's analogous to an adjacency list. The behaviour of a set also prevents you from having two edges that are 'identical' (a->b and c->d, where a=c and b=d), which is nice and matches the behaviour you'd get if you used an adjacency matrix. You can iterate over all the edges like so:
for(std::map<int, std::set<int> >::const_iterator i = sparseMatrix.begin();
i != sparseMatrix.end();
++i)
{
for(std::set<int>::const_iterator j = i->second.begin();
j != i->second.end();
++j)
{
std::cout << "An edge exists from " << i->first << " to " << *j << ".";
}
}

What is the best standard data structure to build a Graph?

First of all, I am a beginner at C++ and I am teaching it to myself, so please keep the answers quite simple ...
I need to program a graph that contains nodes; each node has an id and a list of edges, and each edge has the other node's id and the distance.
What I am looking for is what I should use to build this graph, considering that I want to use Dijkstra's algorithm to get the shortest path from one point to another ... so search performance should be the most important thing, I think!
I have searched a lot and I am quite confused now.
Thank you in advance for the help.
You can define an Edge structure like
struct Edge
{
int destination;
int weight;
};
And create a graph as
vector<vector<Edge> > graph;
Then to access all the edges coming from the vertex u, you write something like
for( int i = 0; i < graph[u].size(); ++i ) {
Edge edge = graph[u][i];
// here edge.destination and edge.weight give you some details.
}
You can dynamically add new edges, for example an edge from 3rd vertex to 7th with a weight of 8:
Edge newEdge;
newEdge.destination = 7;
newEdge.weight = 8;
graph[3].push_back( newEdge );
etc.
For undirected graphs you should not forget to add the symmetric edge, of course.
This should do ok.
Edit
The choice of base containers (std::vector, std::list, std::map) depends on the use case, e.g. what are you doing with the graph more often: add/remove vertices/edges, just traversing. Once your graph is created, either std::list or std::vector is equally good for Dijkstra, std::vector being a bit faster thanks to sequential access pattern of the relaxation stage.
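Since the question is about Dijkstra's algorithm, here is a sketch of how it could run on the vector<vector<Edge>> layout above. It assumes non-negative weights and uses std::priority_queue with lazy deletion of stale entries; the function name dijkstra and the constant kUnreachable are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

struct Edge {
    int destination;
    int weight;  // assumed to be non-negative
};

using Graph = std::vector<std::vector<Edge>>;

// Marker for vertices that cannot be reached from the source.
const long long kUnreachable = std::numeric_limits<long long>::max();

// Returns the shortest distance from source to every vertex.
std::vector<long long> dijkstra(const Graph& graph, int source) {
    assert(source >= 0 && static_cast<std::size_t>(source) < graph.size());
    std::vector<long long> dist(graph.size(), kUnreachable);
    using Item = std::pair<long long, int>;  // (distance, vertex)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> queue;

    dist[source] = 0;
    queue.push({0, source});
    while (!queue.empty()) {
        const Item top = queue.top();
        queue.pop();
        const long long d = top.first;
        const int u = top.second;
        if (d > dist[u]) continue;  // stale entry, a shorter path was found
        // Relaxation stage: scan the edges of u sequentially.
        for (const Edge& e : graph[u]) {
            const long long candidate = d + e.weight;
            if (candidate < dist[e.destination]) {
                dist[e.destination] = candidate;
                queue.push({candidate, e.destination});
            }
        }
    }
    return dist;
}
```

With this layout the running time is O((V + E) log V), and the inner loop touches each neighbor list as one contiguous block of memory.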
Use unordered_map<int,vector<int>> to represent adjacency list if you have huge number of vertexes. If you're planning on implementing a small scale graph, then go with array of vectors. Eg: vector<int> v[20];
a graph that contains nodes each node has id and list of edges each edge has the other node id and the distance
If we consider each node id as an index. We can draw an nxn matrix of the edges as follows.
This can help you draw the graph with edges.
[0][1][2][3]
[0] | 1 0 0 0|
[1] | 0 0 1 0|
[2] | 1 0 0 1|
[3] | 0 0 1 0|
So, a 2D array is a good representation of the matrix.
int matrix[4][4] = {};
I personally would use a std::map<Node*, std::set<Node*> >. This is extremely useful because each time you are at a node, you can quickly find out which nodes that node is connected to. It is also really easy to iterate over all the nodes if you need to. If you need to put weights on the edges, you could use std::map<Node*, std::set< std::pair<int, Node*> > >. Checking whether a specific edge exists then takes logarithmic time instead of a linear scan through a vector, which can matter for large graphs, although plain iteration over the neighbors is usually faster with vectors.