Dynamically adding to a graph data structure - c++

Let me first state that I just want direction, not necessarily actual code, unless a small snippet is the only way to get the point across.
I need to create a DIRECTED graph data structure using an adjacency list or matrix in C++, and add vertices/edges from standard input, which means dynamically.
I think I'd be able to create a graph fine if I was able to instantiate a set of Vertices first, then create edges and add them to the graph, but I don't understand how it is possible to add an edge which contains a vertex that hasn't been instantiated yet.
For example, the first line from standard input reads:
Miami -> New York/1100 -> Washington/1000 -> albuquerque/1700
How am I supposed to add an edge from Miami to New York if the New York vertex hasn't been added to the graph yet?
Thanks for the direction everyone!

"how it is possible to add an edge which contains a vertex that hasn't been instantiated yet."
Simple: instantiate it.
I do not see any issue with this. Assume V to be the vertex set seen so far. V is initially empty. As you read the input x->y, you get its end points (x and y). If either of them is not instantiated (i.e., not in V), you instantiate it and add it to the vertex set.
Another way to look at it: imagine we are defining the graph by its edge set E. By definition, any edge is a pair of vertices, so the edge set in turn defines the vertex set of the graph.
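The "instantiate on first sight" idea can be sketched as follows. This is only an illustration under assumptions: the `Graph` alias, the `addLine` helper, and the exact " -> " and "/" separators are inferred from the sample input line in the question, not taken from any real assignment spec.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Directed graph keyed by vertex name (hypothetical layout):
// each vertex maps to its outgoing (target, weight) edges.
using Graph = std::unordered_map<std::string, std::vector<std::pair<std::string, int>>>;

// Parse one line such as "Miami -> New York/1100 -> Washington/1000".
// The first field is the source; every later field is "target/weight".
void addLine(Graph& g, const std::string& line) {
    const std::string arrow = " -> ";
    std::size_t pos = line.find(arrow);
    const std::string source = line.substr(0, pos);
    g[source];  // operator[] creates the vertex if it is not in the graph yet
    while (pos != std::string::npos) {
        const std::size_t start = pos + arrow.size();
        const std::size_t next = line.find(arrow, start);
        const std::string item = line.substr(
            start, next == std::string::npos ? std::string::npos : next - start);
        pos = next;
        const std::size_t slash = item.rfind('/');
        if (slash == std::string::npos) continue;  // skip fields without a weight
        const std::string target = item.substr(0, slash);
        const int weight = std::stoi(item.substr(slash + 1));
        g[target];  // the target vertex is instantiated the first time it is seen
        g[source].emplace_back(target, weight);
    }
}
```

Calling `addLine(g, line)` once per input line is enough: a vertex that first appears as a target simply gets an empty edge list until its own line is read.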

How about resizing the adjacency list each time a new unique node comes in? You can maintain a set of unique node IDs and use it to adjust the size of the adjacency list each time you add a node. Because the IDs are used directly as indices, the list has to be at least as long as the largest ID plus one, not just the number of distinct IDs. Below is some code that does this.
#include <iostream>
#include <list>
#include <set>
#include <vector>
using namespace std;

class Graph
{
public:
    // Add a link to the graph
    void addLink(int id1, int id2){
        // Record both endpoints in the ordered set
        uniqueNodes.insert(id1);
        uniqueNodes.insert(id2);
        // Node IDs double as indices, so grow the list to hold the largest ID seen so far
        // (std::set is ordered, so rbegin() is the maximum); assumes non-negative IDs
        adjList.resize(*uniqueNodes.rbegin() + 1);
        // Make the connections assuming an undirected graph;
        // for a directed graph, keep only the first push_back
        adjList[id1].push_back(id2);
        adjList[id2].push_back(id1);
    }
    // Print the graph
    void printGraph(){
        for(size_t i = 0; i < adjList.size(); i++){
            cout << i << ":";
            for(auto it = adjList[i].begin(); it != adjList[i].end(); it++)
                cout << *it << "->";
            cout << "NULL\n";
        }
    }
private:
    // Adjacency list for the graph
    vector<list<int>> adjList;
    // Ordered set of node IDs seen so far; its largest element sets the adjacency list size
    set<int> uniqueNodes;
};

Related

How to sort overlapping octree positions

So I have a sorting problem. What I want to do is construct an octree around the camera position, copy the leaf node data to a temporary container, then reset the octree and enter a new camera position. After this, check if any of the new node positions are in the old leaf nodes, and only update the data in the positions that have changed. The result should be that the positions closest to the camera have a higher level of detail and a larger number of nodes, as the octree is subdivided based on how close the node's center is to the camera position.
The image below illustrates this simply. If the black boxes represent the octree leaf nodes from the old camera position, none of them would be included in the new set as their centers are not the same as any of the pink boxes.
Obviously, the result I'm getting with my code isn't what it should be. There are huge areas of terrain that don't load the new positions properly/overlap with existing terrain.
I'm using an unordered set for the leaf nodes, as each one's position should be unique, and an unordered map for the chunks I want to load, so that I can load the correct chunk based on its key, which is the position of the leaf node.
In the code, I update the list of chunks to create only if the number of nodes in the world changes: if (tempNodes.size() != _octree->leafNodes.size())
The rest of the logic pertains to what I've already been explaining. It might just be some simple mistake, but I also might just be misinterpreting the way certain member functions work.
void LandscapeManager::updateCenter(glm::ivec3 newCenter)
{
    std::unordered_set<Octree*> tempNodes;
    tempNodes = _octree->leafNodes;
    // reconstruct octree
    delete _octree;
    _octree = new Octree(glm::vec3(0, 0, 0), glm::vec3(pow(2, 8), pow(2, 8), pow(2, 8)));
    // update camera position in octree
    _octree->leafNodes.clear();
    _octree->insert(newCenter);
    std::cout << "Generating: " << _octree->leafNodes.size() << " chunks...\n";
    if (tempNodes.size() != _octree->leafNodes.size())
    {
        // clear chunk list and refresh with new list generated by code below
        _chunkLoader.Clear();
        for (auto i : _octree->leafNodes)
        {
            // add a chunk to the chunkloader if its position does not exist in the old octree's leaf nodes
            if (!tempNodes.count(i))
            {
                _chunkLoader.Add(i->getOrigin(), i->getHalfSize());
            }
            if (tempNodes.count(i))
            {
                // remove unchanged leaf nodes from the old list so that all
                // that remains is a list of changed leaf nodes, which will be used to delete old chunks
                tempNodes.erase(tempNodes.find(i));
            }
        }
        for (auto i : tempNodes)
        {
            if (_chunks.count(i->getOrigin()))
            {
                // unload and delete the meshes that have changed
                auto unload = _chunks.find(i->getOrigin());
                _chunks.at(i->getOrigin())->Unload();
                delete _chunks.at(i->getOrigin());
                _chunks.erase(unload);
            }
        }
    }
    _center = newCenter;
}
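One thing that may be worth checking in the code above: tempNodes holds Octree* pointers copied from the old tree, and the old tree is deleted before the comparison. Pointer equality between nodes of two separately built trees says nothing about whether their positions match, and calling getOrigin() through a pointer into a deleted tree is undefined behavior if the destructor frees the nodes (an assumption, since the destructor is not shown). A minimal sketch of comparing leaves by position instead is below. `Pos`, `PosHash`, `LeafDiff`, and `diffLeaves` are hypothetical names, and `Pos` stands in for whatever key the real code uses (for example the origin, possibly combined with the half size).

```cpp
#include <cstddef>
#include <functional>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a leaf's origin (e.g. the components of a glm::ivec3).
struct Pos {
    int x, y, z;
    bool operator==(const Pos& o) const { return x == o.x && y == o.y && z == o.z; }
};

// Simple hash combining the three coordinates.
struct PosHash {
    std::size_t operator()(const Pos& p) const {
        std::size_t h = std::hash<int>{}(p.x);
        h = h * 31 + std::hash<int>{}(p.y);
        h = h * 31 + std::hash<int>{}(p.z);
        return h;
    }
};

using PosSet = std::unordered_set<Pos, PosHash>;

struct LeafDiff {
    std::vector<Pos> toLoad;    // positions present only in the new tree
    std::vector<Pos> toUnload;  // positions present only in the old tree
};

// Compare the two leaf sets by value, so no pointer into the old tree is needed.
LeafDiff diffLeaves(const PosSet& oldLeaves, const PosSet& newLeaves) {
    LeafDiff d;
    for (const Pos& p : newLeaves)
        if (!oldLeaves.count(p)) d.toLoad.push_back(p);
    for (const Pos& p : oldLeaves)
        if (!newLeaves.count(p)) d.toUnload.push_back(p);
    return d;
}
```

With this shape, the old positions would be copied into a `PosSet` before the old octree is deleted, and the diff would be computed regardless of whether the two leaf counts happen to be equal.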

Boost connected components not working

I'm having trouble getting Boost's connected components algorithm working reliably. I want to use it to separate collections of 3D point data. I have a data set where there are two obvious 'clusters' of 3D points separated by a large distance.
I create an undirected Boost graph object whose bundled vertex property is a simple 'Point' class. It contains x, y and z properties and a method to calculate the distance from this point to another.
adjacency_list<vecS, vecS, undirectedS,Point> Graph;
I have a vector of point information (Point_dat) which I now transfer to vertices of the graph
for (size_t i = 0; i < Inter_dat.size(); i++)
{
    // Create a new vertex for every element of the original data
    add_vertex(Graph);
    (Graph)[i] = Point_dat[i];
}
I now iterate through the vertices of the graph (in two loops) to calculate the distance from each vertex to each other vertex (I know there are more efficient ways of doing this but this is a first stab)
If the distance between two points is below a threshold, I add an edge between the two vertices.
typedef adjacency_list<vecS, vecS, undirectedS, Point>::vertex_iterator iterator;
std::pair<iterator, iterator> p = vertices(Graph);
for (iterator it = p.first; it != p.second; it++)
{
    for (iterator it2 = it; it2 != p.second; it2++)
    {
        double thisdist = Graph[*it].Distance(Graph[*it2]);
        // logical && rather than bitwise &: both conditions must hold
        if (thisdist < distance_thresh && thisdist > 0)
        {
            add_edge(*it, *it2, Graph);
        }
    }
}
I then calculate the connected components.
std::vector<int> comp(num_vertices(Graph));
int num = connected_components(Graph, &comp[0]);
The problem is, the data isn't being separated into the two clusters. The main cluster still contains some of the second. In fact if I re-run the analysis using the largest component (as selected by Boost) it finds a number of new components.
Why isn't it working? Do I need to add different parameters?
UPDATE: To give more context, I'm trying to implement a process I developed in Matlab in C++. To try and see the root of the problem I saved the edges of my graph
std::ofstream myfile("D:\\edges.txt");
auto q = edges(Graph);
for (auto it = q.first; it != q.second; ++it)
myfile << *it << std::endl;
myfile.close();
and imported the resulting data into Matlab. I then used Matlab's connected components process on this edge information and it gave me exactly the right answer. It seems that I'm setting the graph up correctly, but Boost's connected components isn't giving me the right answer.
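The same cross-check can be done in C++ without Matlab by labelling components from the saved edge list with a small union-find. This is a swapped-in technique for comparison, not Boost's implementation, and `findRoot` and `labelComponents` are hypothetical helper names. Note that any vertex with no edge (no neighbour within the threshold) forms a component of its own, which may be worth checking when the component count looks too high.

```cpp
#include <numeric>
#include <utility>
#include <vector>

// Find the representative of v's set, shortening the path as we go.
int findRoot(std::vector<int>& parent, int v) {
    while (parent[v] != v) {
        parent[v] = parent[parent[v]];  // path halving
        v = parent[v];
    }
    return v;
}

// Labels every vertex 0..vertexCount-1 with a component id in comp
// and returns the number of components.
int labelComponents(int vertexCount,
                    const std::vector<std::pair<int, int>>& edges,
                    std::vector<int>& comp) {
    std::vector<int> parent(vertexCount);
    std::iota(parent.begin(), parent.end(), 0);  // every vertex starts alone
    for (const auto& e : edges) {
        const int a = findRoot(parent, e.first);
        const int b = findRoot(parent, e.second);
        if (a != b) parent[a] = b;  // merge the two sets
    }
    comp.assign(vertexCount, -1);
    std::vector<int> labelOfRoot(vertexCount, -1);
    int count = 0;
    for (int v = 0; v < vertexCount; ++v) {
        const int r = findRoot(parent, v);
        if (labelOfRoot[r] < 0) labelOfRoot[r] = count++;
        comp[v] = labelOfRoot[r];
    }
    return count;
}
```

If the labels from this sketch and from connected_components disagree on the same edge list, the difference is more likely in how the results are read back than in the graph itself.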

Linked-list-based graph: can a vertex have multiple edges?

I'm thinking about how to implement a linked-list-based graph. But, as far as I know, a linked list node can only access the next node (or the previous one too, if doubly linked), whereas a vertex in a graph can reach any other vertex it shares an edge with. This difference breaks my idea for building the graph.
If my vertex (or node) class (or structure) has a pointer to another vertex,
class Vertex
{
    Vertex *link; // edge to another vertex
    int item; // item in vertex
};
and my graph class looks like
class GraphClass
{
    Vertex **Graph; // graph itself
    int VertexQuantity; // number of vertices in graph
};
I can add a vertex to the graph with the function addVertex(), but when I try to connect two vertices, I start losing my head. The conditions I'm considering for the addEdge() function are:
the two vertices must exist in the graph
the two vertices shouldn't be connected yet
Below is the addEdge() function I'm working on now.
void addEdge(GraphClass *g, Vertex *source, Vertex *destination)
{
    // searchVertex() is compared against 0 below, so the index must be signed;
    // assign first, then compare, so the index is not overwritten by a boolean
    int sourceIndex = searchVertex(g, source);
    // if source or destination does not exist in the graph
    if (sourceIndex < 0 || searchVertex(g, destination) < 0)
        return;
    // if source and destination are already connected
    if (checkConnection(g, sourceIndex, source, destination) < 0)
        return;
    Vertex *temp = g->Graph[sourceIndex];
    temp->link = destination;
}
This is my question. Let's say v1 is a vertex connected to v2. What should I do if I want to make a connection between v1 and v3? v1 only has its single link pointer to some vertex, and if I change the vertex's next pointer into an array so it can point to multiple vertices, it breaks the rule of a linked list.
Usually this is done by giving each node a linked list of all the nodes it is connected to. If this is too confusing, start with a vector or array of pointers to all the nodes that your current node connects to, then implement that vector/array using a linked list.
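As a rough sketch of that suggestion, each vertex can own a singly linked list of pointers to its neighbours. Here std::forward_list plays the role of the hand-written linked list, and the free functions `isConnected` and `addEdge` are illustrative names rather than the asker's API.

```cpp
#include <forward_list>

struct Vertex {
    int item;                          // item in vertex
    std::forward_list<Vertex*> edges;  // singly linked list of outgoing edges
    explicit Vertex(int value) : item(value) {}
};

// True if source already has an edge to destination.
bool isConnected(const Vertex& source, const Vertex& destination) {
    for (const Vertex* v : source.edges)
        if (v == &destination) return true;
    return false;
}

// Adds a directed edge source -> destination unless it already exists.
// Returns true if a new edge was added.
bool addEdge(Vertex& source, Vertex& destination) {
    if (isConnected(source, destination)) return false;
    source.edges.push_front(&destination);
    return true;
}
```

With this layout v1 can point to both v2 and v3: the list belongs to the vertex, so no single link pointer has to be overwritten.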

How to get igraph vertex id after adding vertex using igraph_add_vertices in c++

I would like to get the VID (vertex ID) after I have added a single vertex to an existing graph. I currently get a vertex set after adding the new vertex and loop through to the end of it (assuming the last entry is always the most recently added vertex, even if an earlier one has been deleted?). I still need to test whether deleting a vertex from the middle of the set changes the VIDs. But I am sure there must be a better (read: more efficient) way of doing this. The code below is what I currently use.
Any help appreciated as I am new to igraph.
// add into graph
igraph_integer_t t = 1;
if(igraph_add_vertices(user_graph, t, 0) != 0)
{
    ::MessageBoxW(NULL, L"Failed to add vertex to iGraph, vertex not added.", L"Network Model", MB_ICONSTOP);
    return false;
}
/* get all vertices */
igraph_vs_t vertex_set;
igraph_vit_t vit;
igraph_integer_t vid = 0;
igraph_vs_all(&vertex_set);
igraph_vit_create(user_graph, vertex_set, &vit);
// must be a better way - look for starting from end.
while (!IGRAPH_VIT_END(vit))
{
    vid = IGRAPH_VIT_GET(vit);
    IGRAPH_VIT_NEXT(vit);
}
// add vid to vertex ca
ca->graphid = (int)vid;
// Add new vertex to local store
vm->CreateVertex(ca);
// cleanup
igraph_vit_destroy(&vit);
igraph_vs_destroy(&vertex_set);
Vertex IDs (and also edge IDs) in igraph are integers from zero up to the number of vertices/edges minus one. Therefore, if you add a new vertex or edge, its ID will always be equal to the number of vertices/edges before the addition. Also, if you delete some edges, the IDs of existing edges will be re-arranged to make the edge ID range continuous again. Same applies for the deletion of vertices, and note that deleting some vertices will also re-arrange the edge IDs unless the deleted vertices were isolated.
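The ID rule described above can be illustrated with a toy model. This is not igraph: `VertexTable` is a hypothetical class that only mimics contiguous IDs, and the assumption that later vertices shift down by one after a deletion belongs to the toy. In igraph itself, the count before the addition would come from igraph_vcount(), so the iterator loop in the question should not be needed.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a graph's vertex table with contiguous IDs 0..count-1.
class VertexTable {
public:
    // The new vertex's ID equals the number of vertices before the addition.
    std::size_t add(const std::string& label) {
        const std::size_t id = labels_.size();
        labels_.push_back(label);
        return id;
    }
    // Removing a vertex closes the gap, so later vertices get smaller IDs.
    void remove(std::size_t id) {
        labels_.erase(labels_.begin() + static_cast<std::ptrdiff_t>(id));
    }
    std::size_t count() const { return labels_.size(); }
    const std::string& label(std::size_t id) const { return labels_.at(id); }

private:
    std::vector<std::string> labels_;
};
```

The practical consequence is that an ID stored elsewhere (such as ca->graphid in the question) can go stale after any deletion, so it may need to be refreshed or replaced by a stable attribute.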

How to Store Very Large Graphs Space Efficiently Yet have Fast Indexing?

I am working on a graph with 875713 nodes and 5105039 edges. Using vector<bitset<875713>> vec(875713) or array<bitset<875713>, 875713> throws a segfault at me. I need to calculate all-pair-shortest-paths with path recovery. What alternative data structures do I have?
I found this SO Thread but it doesn't answer my query.
EDIT
I tried this after reading the suggestions, seems to work. Thanks everyone for helping me out.
vector<vector<uint>> neighboursOf; // An edge between i and j exists if
                                   // neighboursOf[i] contains j
neighboursOf.resize(nodeCount);
// Loop on getline() itself so a failed read at end of file is not processed as a line
while (getline(input, line))
{
    uint fromNodeId = 0;
    uint toNodeId = 0;
    // Skip comments in the input file
    if (line.size() > 0 && line[0] == '#')
        continue;
    // Each line is of the format "<fromNodeId> [TAB] <toNodeId>";
    // %u matches the unsigned ids, and lines that do not parse are skipped
    if (sscanf(line.c_str(), "%u\t%u", &fromNodeId, &toNodeId) != 2)
        continue;
    // Store the edge
    neighboursOf[fromNodeId].push_back(toNodeId);
}
Your graph is sparse, that is, |E| << |V|^2, so you should probably either use a sparse matrix to represent your adjacency matrix or, equivalently, store for each node a list of its neighbors (which results in a jagged array), like this:
vector<vector<int> > V (number_of_nodes);
// For each cell of V, which is a vector itself, push only the indices of adjacent nodes.
V[0].push_back(2); // Node number 2 is a neighbor of node number 0
...
V[number_of_nodes-1].push_back(...);
This way, your expected memory requirements are O(|E| + |V|) instead of O(|V|^2), which in your case should be around 50 MB instead of roughly 96 GB for the 875713 x 875713 bit matrix.
This will also result in a faster Dijkstra (or any other shortest-path algorithm) since you only need to consider the neighbors of a node at each step.
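For path recovery on such an adjacency list, one option is to store a predecessor per node for a single source and repeat per source as needed. The sketch below assumes unweighted edges (the input format in the question has none), so it uses breadth-first search rather than Dijkstra. `bfsParents` and `recoverPath` are hypothetical names.

```cpp
#include <algorithm>
#include <queue>
#include <vector>

// Breadth-first search from source; parent[v] is the node before v on a
// shortest path, or -1 for the source and for unreachable nodes.
std::vector<int> bfsParents(const std::vector<std::vector<int>>& adj, int source) {
    std::vector<int> parent(adj.size(), -1);
    std::vector<bool> seen(adj.size(), false);
    std::queue<int> q;
    seen[source] = true;
    q.push(source);
    while (!q.empty()) {
        const int u = q.front();
        q.pop();
        for (int v : adj[u]) {  // only the neighbours of u are considered
            if (!seen[v]) {
                seen[v] = true;
                parent[v] = u;
                q.push(v);
            }
        }
    }
    return parent;
}

// Walks the parent links back from target; returns an empty path if
// target is not reachable from source.
std::vector<int> recoverPath(const std::vector<int>& parent, int source, int target) {
    std::vector<int> path;
    for (int v = target; v != -1; v = parent[v]) path.push_back(v);
    std::reverse(path.begin(), path.end());
    if (path.front() != source) path.clear();
    return path;
}
```

Keeping predecessors for one source at a time costs O(|V|) memory. A full all-pairs predecessor table would be |V|^2 entries, which runs into the same size problem as the bit matrix.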
You could store lists of edges per node in a single array. If the number of edges per node is variable you can terminate the lists with a null edge. This will avoid the space overhead for many small lists (or similar data structures). The result could look like this:
enum {
    MAX_NODES = 875713,
    MAX_EDGES = 5105039,
};
int nodes[MAX_NODES+1];         // contains index into array edges[].
                                // index zero is reserved as null node
                                // to terminate lists.
int edges[MAX_EDGES+MAX_NODES]; // contains null terminated lists of edges.
                                // each edge occupies a single entry in the
                                // array. each list ends with a null node.
                                // there are MAX_EDGES entries and MAX_NODES
                                // lists.
[...]
/* find edges for node */
int node, edge, edge_index;
for (edge_index=nodes[node]; edges[edge_index]; edge_index++) {
    edge = edges[edge_index];
    /* do something with edge... */
}
Minimizing the space overhead is very important since you have a huge number of small data structures. The overhead for each list of nodes is just one integer, which is much less than the overhead of, e.g., an STL vector. Also, the lists are laid out contiguously in memory, which means that there is no wasted space between any two lists. With variable-sized vectors this will not be the case.
Reading all edges for any given node will be very fast because the edges for any node are stored contiguously in memory.
The downside of this data arrangement is that when you initialize the arrays and construct the edge lists, you need to have all the edges for a node at hand. This is not a problem if you get the edges sorted by node, but does not work well if the edges are in random order.
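If the edges do arrive in random order, a two-pass counting approach can still build a packed layout once the whole edge list is in memory. The sketch below is a variant of the idea above, not the same structure: it marks the end of each list with an offsets array instead of a null terminator (a layout often called compressed sparse row). `PackedGraph` and `buildPacked` are hypothetical names.

```cpp
#include <utility>
#include <vector>

struct PackedGraph {
    // Neighbours of node n are targets[offsets[n]] .. targets[offsets[n + 1] - 1].
    std::vector<int> offsets;  // nodeCount + 1 entries
    std::vector<int> targets;  // one entry per edge
};

// Builds the packed arrays from (from, to) pairs given in any order.
PackedGraph buildPacked(int nodeCount, const std::vector<std::pair<int, int>>& edges) {
    PackedGraph g;
    g.offsets.assign(nodeCount + 1, 0);
    // Pass 1: count the outgoing edges of each node.
    for (const auto& e : edges) ++g.offsets[e.first + 1];
    // Prefix sums turn the counts into start positions.
    for (int n = 0; n < nodeCount; ++n) g.offsets[n + 1] += g.offsets[n];
    // Pass 2: drop each edge into the next free slot of its source node.
    g.targets.resize(edges.size());
    std::vector<int> next(g.offsets.begin(), g.offsets.end() - 1);
    for (const auto& e : edges) g.targets[next[e.first]++] = e.second;
    return g;
}
```

This costs one extra integer per node during construction (the `next` cursor array), and node 0 no longer has to be reserved as a terminator.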
If we declare a Node as below:
struct Node {
    int node_id;
    vector<int> edges; // all the edges that start from this Node
};
Then all the nodes can be stored as below:
vector<Node> nodes;