The best way to store a graph in memory - C++

The problem is that I have 150 000+ nodes with 200 000+ arcs (the count may go up to 1 000 000 or even more), all of them stored in a DB.
Now I'd like to build a proper in-memory graph that can be used for routing, so I need to compose it from the data in the existing DB.
The idea is to build this huge graph, divide it into small pieces, and write them to DB BLOBs for storage. I tried to build it recursively, but it seems the stack couldn't hold that much data, and my algorithm kept failing with an allocation error. So now I'm a bit confused about how to build this graph. I'm thinking about some kind of iterative method, but the main problem is the architecture, I mean the structures I'm going to use for storing nodes and arcs.
As I see it, the solution should be something like this:
struct Node; // forward declarations, since the structs refer to each other
struct Arcs;
struct Graph
{
    unsigned int nodesAmount;
    unsigned int arcsAmount;
    vector<Node*> NodeArr; // Some kind of container to store all existing Nodes
};
struct Node
{
    unsigned int id;
    int dimension; // how many arcs use this node
    vector<Arcs*> ArcArr;
};
struct Arcs
{
    unsigned int id;
    double cost;
    Node* Node_from;
    Node* Node_to;
};
I have read lots of articles about methods of storing graphs, but didn't find a really good solution for such huge graphs.
I would be very grateful for any ideas. Thank you.

You are on the right path.
Some small changes that I would suggest:
struct Node
{
    unsigned int id;
    int dimension; // how many arcs use this node
    vector<int> Neighbours; // store neighbour IDs, saves memory
};
struct Graph
{
    unsigned int nodesAmount;
    unsigned int arcsAmount;
    vector<Node> NodeArr; // Store the nodes directly, not pointers (Node must be defined first)
};
Since you are moving between a database and C++, I would strongly suggest not using pointers, because they do not translate. Use IDs and look up your nodes by ID. If you need to store the edges separately, then also do this by ID, not by pointer.
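To illustrate the ID-based approach, here is a minimal sketch of building such a graph with a plain loop instead of recursion. The ArcRow record, the Arc struct, the indexOfId lookup table, and the buildGraph function are hypothetical names made up for this example; how the rows are actually read from the DB is left out.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical row shape: one record per arc, as it might come out of the DB.
struct ArcRow {
    unsigned int from;
    unsigned int to;
    double cost;
};

struct Arc {
    std::size_t to;  // index of the target node in Graph::nodes
    double cost;
};

struct Node {
    unsigned int id;        // original ID from the database
    std::vector<Arc> arcs;  // outgoing arcs
};

struct Graph {
    std::vector<Node> nodes;
    std::unordered_map<unsigned int, std::size_t> indexOfId;  // DB id -> index

    // Returns the index for a DB id, creating the node the first time it is seen.
    std::size_t indexFor(unsigned int id) {
        auto it = indexOfId.find(id);
        if (it != indexOfId.end()) return it->second;
        std::size_t idx = nodes.size();
        nodes.push_back(Node{id, {}});
        indexOfId.emplace(id, idx);
        return idx;
    }
};

// Builds the graph by iterating over the rows once; no recursion is involved.
Graph buildGraph(const std::vector<ArcRow>& rows) {
    Graph g;
    for (const ArcRow& r : rows) {
        std::size_t from = g.indexFor(r.from);
        std::size_t to = g.indexFor(r.to);
        g.nodes[from].arcs.push_back(Arc{to, r.cost});
    }
    return g;
}
```

Because nodes refer to each other by index rather than by pointer, the vector can grow freely while loading, and the same indices (or the original IDs) can be written back to the DB.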

I know that this solution has nothing to do with your snippet, but I'd like to show you another way.
An option that's used quite often is to have two arrays: one for the edges, one for the vertices.
Each entry of the vertices array is an index into the edges array and says where that vertex's adjacent vertices start; they end where the next vertex's entries begin. The edges array stores the adjacent vertex IDs themselves.
For instance :
V = 6, E = 7
vertices = [0, 1, 1, 2, 5, 6]
edges = [1, 2, 3, 4, 5, 6, 0]
Considering the indexes, the edges array would look like :
| [1] | [] | [2] | [3, 4, 5] | [6] | [0] |
So the first vertex has a single adjacent vertex (with ID 1), the fourth vertex has 3 adjacent vertices with IDs 3, 4, 5, etc.
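A rough sketch of this two-array layout in C++ follows. The names CsrGraph, buildCsr, and neighbours are made up for the example, and it assumes one extra sentinel entry at the end of the offsets array (so it has V + 1 entries), which makes the end of the last vertex's range explicit.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

struct CsrGraph {
    std::vector<int> offsets;  // V + 1 entries; offsets[v] is where v's neighbours start
    std::vector<int> targets;  // E entries; neighbour IDs grouped by source vertex
};

// Builds the two arrays from a list of (from, to) pairs with 0-based vertex IDs.
CsrGraph buildCsr(int numVertices, const std::vector<std::pair<int, int>>& edges) {
    CsrGraph g;
    g.offsets.assign(static_cast<std::size_t>(numVertices) + 1, 0);
    // Count the outgoing edges of each vertex, shifted by one slot.
    for (const auto& e : edges) ++g.offsets[e.first + 1];
    // A prefix sum turns the counts into start positions.
    for (int v = 0; v < numVertices; ++v) g.offsets[v + 1] += g.offsets[v];
    // Drop each target into the next free slot of its source vertex.
    g.targets.resize(edges.size());
    std::vector<int> next(g.offsets.begin(), g.offsets.end() - 1);
    for (const auto& e : edges) g.targets[next[e.first]++] = e.second;
    return g;
}

// Copies out the neighbours of vertex v (the slice offsets[v] .. offsets[v + 1]).
std::vector<int> neighbours(const CsrGraph& g, int v) {
    return std::vector<int>(g.targets.begin() + g.offsets[v],
                            g.targets.begin() + g.offsets[v + 1]);
}
```

Both arrays are flat and contiguous, so they can be written to and read from a BLOB without any pointer fix-ups.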

Related

Rotating array in C++ using single array

I've been recently solving various programming tasks. One that I found was rather easy - array circular rotation. I've chosen C++ as a language and I'd like to keep it.
The idea is to rotate each array element right by a given number of places: e.g. [3, 8, 9, 7, 6] rotated 3 times gives [9, 7, 6, 3, 8].
It wasn't difficult to figure out a solution using an extra array. The only thing required was calculating the new position:
(old_position + rotations)%array_size
However, I've started to wonder whether (and how) it can be achieved with only one array. My initial idea was to just swap two elements between the new and old positions (knowing what the old position was) and repeat this for all elements. My code looks like this:
int temp_el;
int temp_index = 0;
int new_pos;
for(int i=0;i<A.size();i++)
{
new_pos = (temp_index + rotations)%A.size();
temp_el = A[new_pos];
A[new_pos] = A[0];
A[0] = temp_el;
temp_index = new_pos;
}
However, I forgot about the case where, in the middle or at the beginning of the rotations, the element at index 0 is already correct. Which element do I need to pick next? I cannot just take the i-th element, because it might already have been moved.
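One known way to rotate with a single array, which avoids tracking which elements have already been moved, is the three-reversal trick: reverse the whole array, then reverse the first k elements and the remaining elements separately. This is a different technique from the swap loop above, shown only as a sketch; the function name rotateRight is made up for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Rotates the vector right by `rotations` places using only swaps inside the vector.
void rotateRight(std::vector<int>& a, std::size_t rotations) {
    if (a.empty()) return;
    // Rotating by the array size is a no-op, so only the remainder matters.
    const std::ptrdiff_t k = static_cast<std::ptrdiff_t>(rotations % a.size());
    std::reverse(a.begin(), a.end());      // [3,8,9,7,6] -> [6,7,9,8,3]
    std::reverse(a.begin(), a.begin() + k);  // first k:   -> [9,7,6,8,3]
    std::reverse(a.begin() + k, a.end());    // the rest:  -> [9,7,6,3,8]
}
```

The comments trace the example from the question with 3 rotations. The standard library's std::rotate can do the same job in one call.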

Find all of adjacent cells/cubes in 3D-space

I am dividing a part of a 3D space into a series of 1x1x1 cubes, and that part may have a volume of 100^3 up to 1000^3, however, the cubes/cells I am really interested in rarely exceed 5000-20000 in their numbers.
What I am trying to do is to find all the cells/cubes which satisfy my criteria, adjacent to the chosen one. However, I am not sure what algorithm is the best for such a task. First thing which comes to my mind is to use a regular flood fill algorithm, but the following problem arises: I have to store information about all of the cells in the working area, which as I said may have up to 1000^3 elements, but the ones I need are barely 5000-20000.
That said, my questions are:
If I should use flood fill, is there any data structure which can be used in my case?
If I shouldn't use flood fill, what should I?
I'll try to rephrase the need: you want to store some data (bool visited) for every cell, and for most cells it will be the same (not visited), so you want to save some memory.
Recently I heard about OpenVDB: http://www.openvdb.org/documentation/doxygen/
I haven't used it, but it looks like it matches the requirement - it stores sparse volumetric data and claims to be memory and time efficient.
I think this should illustrate my idea of how you can solve your problem. You can also consider transferring the set to a vector once you are done with the initial processing (though strictly speaking both structures are similar with respect to amortised full-iteration speed).
set<pair<int, int> > getAllPointsToProcess(const pair<int, int>& initialCell) {
    set<pair<int, int> > activatedCells; // these will be returned
    queue<pair<int, int> > toProcess;
    toProcess.push(initialCell);
    activatedCells.insert(initialCell);
    int adjacentOffsets[4][2] = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};
    pair<int, int> currentlyProcessed;
    pair<int, int> neighbourCell;
    while (!toProcess.empty()) {
        currentlyProcessed = toProcess.front();
        toProcess.pop();
        for (int i = 0; i < 4; i++) {
            neighbourCell.first = currentlyProcessed.first + adjacentOffsets[i][0];
            neighbourCell.second = currentlyProcessed.second + adjacentOffsets[i][1];
            // isActive() is your own check of whether the cell satisfies the criteria
            if (isActive(neighbourCell) && activatedCells.find(neighbourCell) == activatedCells.end()) {
                toProcess.push(neighbourCell);
                activatedCells.insert(neighbourCell);
            }
        }
    }
    return activatedCells;
}
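The same idea can be carried over to three dimensions with six face neighbours. The sketch below keeps only the visited cells in a hash set, so memory grows with the number of cells reached rather than with the size of the working volume. Cell3, packCell, and floodFill3D are names invented for the example, the isActive predicate is passed in by the caller, and the packing assumes every coordinate is non-negative and below 2^21.

```cpp
#include <cstdint>
#include <functional>
#include <queue>
#include <unordered_set>
#include <vector>

struct Cell3 { int x, y, z; };

// Packs three coordinates into one 64-bit key; assumes 0 <= x, y, z < 2^21.
inline std::uint64_t packCell(const Cell3& c) {
    return (static_cast<std::uint64_t>(c.x) << 42) |
           (static_cast<std::uint64_t>(c.y) << 21) |
           static_cast<std::uint64_t>(c.z);
}

// Breadth-first flood fill from `start`; returns every cell reached.
// As in the 2D version, the start cell is included without being tested.
std::vector<Cell3> floodFill3D(const Cell3& start,
                               const std::function<bool(const Cell3&)>& isActive) {
    static const int offsets[6][3] = {{1, 0, 0}, {-1, 0, 0}, {0, 1, 0},
                                      {0, -1, 0}, {0, 0, 1}, {0, 0, -1}};
    std::vector<Cell3> result;
    std::unordered_set<std::uint64_t> visited;  // only cells actually reached
    std::queue<Cell3> toProcess;
    visited.insert(packCell(start));
    toProcess.push(start);
    while (!toProcess.empty()) {
        Cell3 current = toProcess.front();
        toProcess.pop();
        result.push_back(current);
        for (const auto& o : offsets) {
            Cell3 next{current.x + o[0], current.y + o[1], current.z + o[2]};
            if (next.x < 0 || next.y < 0 || next.z < 0) continue;  // keep keys valid
            // insert() reports whether the key was new, so each cell is queued once.
            if (isActive(next) && visited.insert(packCell(next)).second) {
                toProcess.push(next);
            }
        }
    }
    return result;
}
```

With 5000-20000 matching cells, the set holds at most that many keys, and each membership check is O(1) on average.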
As you pointed out, the flood-fill algorithm seems relevant to this problem. The problem you are facing is storing, for all the cubes, whether they have been visited or not.
You have two options:
1. Keep a hash (a visited flag) for each cube. Space: O(1000^3), time: O(1) per lookup. This is the one you don't want.
2. Maintain a list of visited cubes. Space: O(10000), time: O(10000^2), because every time you need to check whether a cube has been visited you traverse the complete list of visited cubes.
That's just a space-time trade-off.
P.S.: I hope I understood your problem correctly!

C++ creating a graph out of dot coordinates and finding MST

I'm trying to make a program in which the user inputs n dot coordinates, 0 < n <= 100. The dots have to be connected, let's say with ink, in such a way that you can get, e.g., from point A to point X by following the inked line, using the least amount of ink possible.
I thought of using Prim's algorithm or something like that to get the MST, but for that I need a graph. None of the web pages I've looked at really explain that; they always start with a graph that already has its edges.
I need help specifically creating a graph in C++ out of a bunch of (x, y) coordinates, like the user inputs:
0 0
4 4
4 0
0 4
Please note I'm just starting with C++ and that I can't use any weird libraries since this would be for a page like CodeForces where you only get to use the native libraries.
(For the ones that are also doing this and are here for help, the correct output for this input would be 12)
Assuming a complete graph is probably the most appropriate approach, as suggested by "Beta".
The following code creates an edge between every pair of dots in the array dots and returns the number of edges created.
After running this code, you should be able to apply Prim's algorithm to find the MST.
#include <cmath>
// Definition of structure "node" and an array to store inputs
struct node {
    int x; // x-coordinate of dot
    int y; // y-coordinate of dot
};
node dots[100];
// Definition of structure "edge"
struct edge {
    int t1; // index of dot in dots[] for one end.
    int t2; // index of dot in dots[] for the other end.
    float weight; // weight (geometric distance between the two ends)
};
edge lines[100 * 99 / 2]; // room for every edge of a complete graph on 100 dots
// Function to create edges of complete graph from an array of nodes.
// Argument: number of nodes stored in the array dots.
// ReturnValue: number of edges created.
// Assumption: the array lines is large enough to store all edges of complete graph
int createCompleteGraph(int numberOfNodes) {
    int i, j, k, xDiff, yDiff;
    k = 0; // k is index of array lines
    for (i = 0; i < numberOfNodes - 1; i++) { // index of a node at one end
        for (j = i + 1; j < numberOfNodes; j++) { // index of a node at another end
            lines[k].t1 = i;
            lines[k].t2 = j;
            xDiff = dots[i].x - dots[j].x;
            yDiff = dots[i].y - dots[j].y;
            lines[k].weight = std::sqrt(static_cast<float>(xDiff * xDiff + yDiff * yDiff)); // calculate geometric distance
            k++;
        }
    }
    return k;
}
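For completeness, here is a sketch of the next step. Since the graph is complete, the simple O(n^2) form of Prim's algorithm can compute each distance on the fly instead of reading it from a stored edge list, which is plenty fast for n <= 100. The function name mstLength and the use of std::pair for the dots are choices made for this example, not part of the code above.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Returns the total length of a minimum spanning tree over the given dots,
// treating them as a complete graph with Euclidean edge weights.
double mstLength(const std::vector<std::pair<int, int>>& dots) {
    const std::size_t n = dots.size();
    if (n == 0) return 0.0;
    const double inf = std::numeric_limits<double>::infinity();
    std::vector<double> best(n, inf);   // cheapest known link from the tree to each dot
    std::vector<bool> inTree(n, false);
    best[0] = 0.0;                      // start the tree at dot 0
    double total = 0.0;
    for (std::size_t step = 0; step < n; ++step) {
        // Pick the dot outside the tree with the cheapest link.
        std::size_t u = n;
        for (std::size_t v = 0; v < n; ++v) {
            if (!inTree[v] && (u == n || best[v] < best[u])) u = v;
        }
        inTree[u] = true;
        total += best[u];
        // Update the cheapest link of every remaining dot through u.
        for (std::size_t v = 0; v < n; ++v) {
            if (inTree[v]) continue;
            const double dx = dots[u].first - dots[v].first;
            const double dy = dots[u].second - dots[v].second;
            const double d = std::sqrt(dx * dx + dy * dy);
            if (d < best[v]) best[v] = d;
        }
    }
    return total;
}
```

For the four dots in the question, (0,0), (4,4), (4,0), (0,4), this picks three sides of the square and returns 12, matching the expected output.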

Storing cost between nodes

I am making a maze solver using Uniform Cost Search and basically what I want to do is store random costs between rooms in my maze.
Data structure of rooms (named cells):
struct Cell
{
int row;
int column;
vector<Cell*> neighbors;
State state;
};
row and column are the position of the Cell in the maze vector, the vector<Cell*> neighbors lists the cells this particular cell is connected to, and state keeps the state of the cell (visited, empty, etc.).
What I tried doing is making a property of the Cell struct like this: vector<int> cost where every element of that array matches the neighbor element.
For example:
012345
0 ######
1 # ##
2 # # #
3 ######
maze[1][1] has in its neighbors vector:
neighbors[0] = *maze[1][2];
neighbors[1] = *maze[2][1];
its cost vector now is:
cost[0] = 5;
cost[1] = 10;
But that way of doing it created a lot of problems.
What I have thought is that I need a cost matrix which will match one node with another and store the cost in the matrix, something like this:
0 1 2
0[0][2][4]
1[2][0][6]
2[4][6][0]
But in order to do this, how will I make my matrix know which cell is which? Instead of 0's and 1's, how do I make it know that it's [0][0], [0][1], [0][2], etc.?
Do I need a 3D vector for something like this? If so, I would prefer to avoid it, since I am inexperienced with 3D vectors.
Couldn't you use a custom object for your link to another room? Eg:
struct Cell;
struct CellLink {
const Cell *cell;
const int weight;
..
};
struct Cell {
int row;
int column;
vector<CellLink> neighbors;
State state;
};
This would keep the cost and the cell coupled with no worries. The only drawback is that you will store each cost twice (assuming it's symmetric), but this is true of many other approaches (matrix included).
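As a small sketch of how the symmetric cost could be stored on both sides in one step: the connect helper below is a made-up name, the State values are assumed because the question does not list them, and the const qualifiers from the snippet above are dropped here only to keep the example short.

```cpp
#include <vector>

enum class State { Empty, Visited };  // assumed values; the original enum is not shown

struct Cell;

struct CellLink {
    Cell* cell;  // the neighbouring room
    int weight;  // cost of moving to that room
};

struct Cell {
    int row;
    int column;
    std::vector<CellLink> neighbors;
    State state;
};

// Links two rooms in both directions with the same cost.
void connect(Cell& a, Cell& b, int weight) {
    a.neighbors.push_back(CellLink{&b, weight});
    b.neighbors.push_back(CellLink{&a, weight});
}
```

Note that the stored pointers stay valid only as long as the cells themselves do not move in memory, for example when the maze vector is fully sized before any links are made.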

What is the best standard data structure to build a Graph?

First of all, I am a beginner at C++ and I am teaching myself, so please keep the answers simple.
I need to program a graph that contains nodes. Each node has an ID and a list of edges, and each edge has the other node's ID and the distance.
What I am looking for is what I should use to build this graph, considering that I want to use Dijkstra's algorithm to get the shortest path from one point to another. So I think search performance is the most important thing.
I have searched a lot and I am quite confused now.
Thank you in advance for the help.
You can define an Edge structure like
struct Edge
{
int destination;
int weight;
};
And create a graph as
vector<vector<Edge> > graph;
Then to access all the edges coming from the vertex u, you write something like
for( int i = 0; i < graph[u].size(); ++i ) {
Edge edge = graph[u][i];
// here edge.destination and edge.weight give you some details.
}
You can dynamically add new edges, for example an edge from 3rd vertex to 7th with a weight of 8:
Edge newEdge;
newEdge.destination = 7;
newEdge.weight = 8;
graph[3].push_back( newEdge );
etc.
For undirected graphs you should not forget to add the symmetric edge, of course.
This should do ok.
Edit
The choice of base containers (std::vector, std::list, std::map) depends on the use case, e.g. what are you doing with the graph more often: add/remove vertices/edges, just traversing. Once your graph is created, either std::list or std::vector is equally good for Dijkstra, std::vector being a bit faster thanks to sequential access pattern of the relaxation stage.
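To show how this representation fits the algorithm, here is a minimal sketch of Dijkstra over vector<vector<Edge> > using std::priority_queue. It assumes all weights are non-negative; the function name dijkstra and the use of long long for distances are choices made for this example.

```cpp
#include <cstddef>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

struct Edge {
    int destination;
    int weight;
};

// Returns the shortest distance from `source` to every vertex.
// Unreachable vertices keep the maximum long long value.
std::vector<long long> dijkstra(const std::vector<std::vector<Edge>>& graph, int source) {
    const long long inf = std::numeric_limits<long long>::max();
    std::vector<long long> dist(graph.size(), inf);
    typedef std::pair<long long, int> Item;  // (distance so far, vertex)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;  // smallest first
    dist[source] = 0;
    pq.push(Item(0, source));
    while (!pq.empty()) {
        const Item top = pq.top();
        pq.pop();
        const long long d = top.first;
        const int u = top.second;
        if (d > dist[u]) continue;  // stale entry; a shorter path was already found
        // Relaxation stage: try to improve every neighbour through u.
        for (std::size_t i = 0; i < graph[u].size(); ++i) {
            const Edge& edge = graph[u][i];
            const long long candidate = d + edge.weight;
            if (candidate < dist[edge.destination]) {
                dist[edge.destination] = candidate;
                pq.push(Item(candidate, edge.destination));
            }
        }
    }
    return dist;
}
```

The inner loop is the same sequential walk over graph[u] shown earlier, which is where std::vector's contiguous layout pays off.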
Use unordered_map<int,vector<int>> to represent the adjacency list if you have a huge number of vertices. If you're planning to implement a small-scale graph, then go with an array of vectors, e.g. vector<int> v[20];
a graph that contains nodes each node has id and list of edges each edge has the other node id and the distance
If we consider each node ID as an index, we can draw an n x n matrix of the edges as follows.
This can help you draw the graph with its edges.
[0][1][2][3]
[0] | 1 0 0 0|
[1] | 0 0 1 0|
[2] | 1 0 0 1|
[3] | 0 0 1 0|
So, a 2D array is a good representation of the matrix.
int matrix[4][4] = {}; // 4x4 adjacency matrix, all entries initialised to 0
I personally would use a std::map<Node*, std::set<Node*> >. This is useful because each time you are at a node, you can quickly find out which nodes that node is connected to. It is also easy to iterate over all the nodes if you need to. If you need to put weights on the edges, you could use std::map<Node*, std::set< std::pair<int, Node*> > >. Looking up, inserting, or removing an individual edge takes logarithmic time, which helps when the graph changes often; for plain traversal of a large static graph, vectors are usually at least as fast because their elements are contiguous in memory.