Greetings code-gurus!
I am writing an algorithm that connects, for instance, node_A of Region_A with node_D of Region_D (node_A and node_D are just integers). There could be 100k+ such nodes.
Assume that the line segment between A and D passes through a number of other regions, B, C, Z. There will be a maximum of 20 regions between these two nodes.
Each region has its own properties, which may vary depending on the connection A-D. I want to access these later.
I am looking for a good data structure (perhaps an STL container) that can hold this information for a particular connection.
For example, for connection A-D I want to store:
node_A,
node_D,
cross-sectional area (computed elsewhere),
regionB,
regionB_thickness,
regionB other properties,
regionC, ...
The data can be double, int, string, and could also be an array/vector, etc.
At first I considered creating structs or classes for regionB, regionC, etc.
But for each connection A-D, certain properties, like the thickness of the region the connection passes through, are different.
There will only be 3 or 4 different things I need to store for each region.
Which data structure should I consider here (an STL container like vector?)? Could you please suggest one? (I would appreciate a code snippet.)
To access a connection between nodes A-D, I want to use int node_A (an index).
This probably means I need a hash map or a similar data structure.
Can anyone suggest a good data structure in C++ that can efficiently
hold this sort of data for the connection A-D described above? (I would appreciate a code snippet.)
thank you!
UPDATE
For certain reasons, I cannot use packages like Boost, so I want to know whether I can do this with the STL alone.
You should try to group stuff together when you can. You can group the information on each region together with something like the following:
struct Region;  // defined elsewhere

struct Region_Info {
    Region *ptr;
    int thickness;
    // Put other properties here.
};
Then, you can more easily create a data structure for your line segment, maybe something like the following:
#include <list>

struct Line_Segment {
    int node_A;
    int node_D;
    double crosssectional_area;        // computed elsewhere
    std::list<Region_Info> regions;    // regions crossed by this segment
};
If you are limited to only 20 regions, then a list should work fine. A vector is also fine if you would prefer.
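To tie this back to the question's wish to look connections up by node_A, here is a minimal sketch that keys a std::unordered_map on node_A. It assumes one node may start several connections, so each key maps to a vector; the names ConnectionStore, RegionInfo, LineSegment and the material field are hypothetical, and the Region pointer is replaced by a plain region ID to keep the example self-contained.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Per-region data for one particular connection (hypothetical fields).
struct RegionInfo {
    int region_id;
    double thickness;
    std::string material;  // stand-in for "other properties"
};

// One A-D connection and the (at most ~20) regions its segment crosses.
struct LineSegment {
    int node_a;
    int node_d;
    double cross_sectional_area;
    std::vector<RegionInfo> regions;
};

// Hash map from node_A to every connection that starts at that node.
class ConnectionStore {
public:
    void add(const LineSegment& seg) { by_start_[seg.node_a].push_back(seg); }

    // Returns the connection node_a -> node_d, or nullptr if none is stored.
    const LineSegment* find(int node_a, int node_d) const {
        auto it = by_start_.find(node_a);
        if (it == by_start_.end()) return nullptr;
        for (const LineSegment& seg : it->second)
            if (seg.node_d == node_d) return &seg;
        return nullptr;
    }

private:
    std::unordered_map<int, std::vector<LineSegment>> by_start_;
};
```

Lookup by node_A is an average O(1) hash lookup followed by a short scan of that node's connections; if a single node can have very many partners, a nested map keyed on node_D would be the next step.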
Have you considered an adjacency array for each node, which stores the nodes it is connected to, along with other data?
First, define a node
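#include <vector>

struct AdjacencyInfo {   // placeholder for the per-edge data (see below)
    int neighbor_id;
};

class Node
{
    int id_;
    std::vector<AdjacencyInfo> adjacency_;
};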
Here the class AdjacencyInfo can store the myriad data you need. You can change the std::vector to a hash map (such as std::unordered_map) with the node id as the key if lookup speed is an issue. For fancier access, you may want to overload operator[] if that is an essential requirement.
So, as an example:
#include <map>

class Graph
{
    std::map<int, Node> graph_;  // node id -> node
};
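Following the suggestion above to swap the vector for a hash map keyed by node id and to overload operator[], a minimal sketch could look like this. The weight field, connected_to(), and the create-on-access behaviour of both operator[] overloads are assumptions of this sketch; edges are stored on the source node only, so the graph is directed.

```cpp
#include <map>
#include <unordered_map>

// Hypothetical per-edge payload; put the "myriad data" here.
struct AdjacencyInfo {
    double weight = 0.0;
};

class Node {
public:
    explicit Node(int id = 0) : id_(id) {}
    int id() const { return id_; }

    // Access (creating if needed) the edge data towards neighbor_id.
    AdjacencyInfo& operator[](int neighbor_id) { return adjacency_[neighbor_id]; }

    bool connected_to(int neighbor_id) const { return adjacency_.count(neighbor_id) != 0; }

private:
    int id_;
    std::unordered_map<int, AdjacencyInfo> adjacency_;  // neighbor id -> edge data
};

class Graph {
public:
    // Returns the node with this id, creating it on first use.
    Node& operator[](int id) { return graph_.try_emplace(id, id).first->second; }

private:
    std::map<int, Node> graph_;  // node id -> node
};
```

With both overloads in place, g[a][b] reads as "the edge from a to b", for example g[1][2].weight = 3.5;. Because std::map never invalidates references on insertion, a Node& obtained this way stays valid as the graph grows.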
Boost has a graph library, Boost.Graph. Check it out if it is useful in your case.
Well, as everyone else has noticed, that's a graph. The question is: is it a sparse graph or a dense one? There are generally two ways of representing graphs (there are more, but you'll probably only need to consider these two):
adjacency matrix
adjacency list
An adjacency matrix is basically an NxN matrix whose rows and columns correspond to the nodes and whose cells hold the connection data (edges), so you can index an edge by its pair of vertices. You should only consider an adjacency matrix if you have a dense graph and need to find node->edge->node connections really fast. However, iterating through neighbours or adding/removing vertices in an adjacency matrix is slow: the first requires N iterations, and the second requires resizing the array/vector you use to store the matrix.
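A minimal sketch of the adjacency matrix described above, stored row-major in a single std::vector. Using 0.0 to mean "no edge" and storing one double per cell are assumptions of this example; has_edge is the O(1) query, while neighbours has to scan the whole row.

```cpp
#include <cstddef>
#include <vector>

// Dense N x N adjacency matrix stored row-major in one vector.
// A cell value of 0.0 means "no edge" in this sketch.
class AdjacencyMatrix {
public:
    explicit AdjacencyMatrix(std::size_t n) : n_(n), cells_(n * n, 0.0) {}

    void set_edge(std::size_t from, std::size_t to, double value) {
        cells_[from * n_ + to] = value;
    }

    // O(1): "is there an edge between A and B?"
    bool has_edge(std::size_t from, std::size_t to) const {
        return cells_[from * n_ + to] != 0.0;
    }

    // O(N): scans the whole row, even if the node has few neighbours.
    std::vector<std::size_t> neighbours(std::size_t from) const {
        std::vector<std::size_t> result;
        for (std::size_t to = 0; to < n_; ++to)
            if (has_edge(from, to)) result.push_back(to);
        return result;
    }

private:
    std::size_t n_;
    std::vector<double> cells_;
};
```

Keep in mind that the matrix always allocates N² cells: for 100,000 nodes that is 10^10 doubles, roughly 80 GB, regardless of how many edges actually exist.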
Your other option is to use an adjacency list. Basically, you have a class that represents a node, and one that represents an edge, which stores all the data for that edge and two pointers to the nodes it connects. The node class has a collection of some sort (a list will do) and keeps track of all the edges it's connected to. Then you'll need a manager class, or simply a bunch of functions that operate on your nodes. Adding/connecting nodes is trivial in this case, as is listing neighbours or connected edges. However, it's harder to iterate over all the edges. This structure is more flexible than the adjacency matrix, and it's better for sparse graphs.
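The node/edge/manager layout described above could be sketched as follows. The name and weight fields and the method names are hypothetical; the manager owns every node and edge through std::unique_ptr so the raw pointers held by nodes and edges stay valid. Keeping its own vector of all edges is one way the manager can sidestep the "harder to iterate over all edges" problem.

```cpp
#include <cstddef>
#include <list>
#include <memory>
#include <string>
#include <vector>

struct Node;

// An edge holds its own data and points at the two nodes it connects.
struct Edge {
    Node* a;
    Node* b;
    double weight;  // hypothetical edge data
};

struct Node {
    std::string name;
    std::list<Edge*> edges;  // every edge touching this node
};

// Manager class: owns all nodes and edges so raw pointers stay valid.
class Graph {
public:
    Node* add_node(const std::string& name) {
        nodes_.push_back(std::make_unique<Node>());
        nodes_.back()->name = name;
        return nodes_.back().get();
    }

    Edge* connect(Node* a, Node* b, double weight) {
        edges_.push_back(std::make_unique<Edge>(Edge{a, b, weight}));
        Edge* e = edges_.back().get();
        a->edges.push_back(e);
        b->edges.push_back(e);
        return e;
    }

    // Listing neighbours only touches this node's own edges.
    static std::vector<Node*> neighbours(const Node* n) {
        std::vector<Node*> result;
        for (Edge* e : n->edges) result.push_back(e->a == n ? e->b : e->a);
        return result;
    }

    std::size_t edge_count() const { return edges_.size(); }

private:
    std::vector<std::unique_ptr<Node>> nodes_;
    std::vector<std::unique_ptr<Edge>> edges_;  // also allows iterating over all edges
};
```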
I'm not sure I understood your question completely, but if I did, I think you'd be better off with an adjacency matrix: it seems like you have a dense graph with lots of interconnected nodes, and you only need connection info.
Wikipedia has a good article on graphs as a data structure, with good references and links, and finding examples shouldn't be hard. Hope this helps.
Related
Almost every book I look at only shows how to implement graphs with the adjacency matrix method or the adjacency list method.
I'm trying to create a node-based editor, where each node has only a few edges but there are a lot of vertices.
So I'm trying to implement the adjacency list method rather than the adjacency matrix method.
However, adjacency lists store each edge as an entry in a connection list.
Instead, I would like to structure the node as shown below.
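#include <vector>

template <typename DataType>
class GraphNode
{
    int x, y;                       // position in the editor
    DataType data;                  // payload
    std::vector<GraphNode*> in;     // nodes with edges into this node
    std::vector<GraphNode*> out;    // nodes this node has edges to
public:
    GraphNode(int x, int y, DataType data) : x(x), y(y), data(data) {}
};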
This way, the node itself acts as a vertex and has direct access to the other nodes it is connected to.
In a node-based editor program, I have to access and process the nodes connected to each node.
I want to implement this without using a linked list.
I'm also going to run graph algorithms on this structure.
Is this a bad method?
Lastly, I apologize for my poor English.
Thank you for reading.
You're just missing the point of the difference between adjacency list and adjacency matrix. The main point is the complexity of operations, like finding edges or iterating over them. If you compare a std::list and a std::vector as datatype implementing an adjacency list, both have a complexity of O(n) (n being the number of edges) for these operations, so they are equivalent.
Other considerations:
If you're modifying the graph, insertion and deletion may be relevant as well. In that case, you may prefer a linked list.
I said that the two are equivalent, but generally std::vector has a better locality of reference and less memory overhead, so it performs better. The general rule in C++ is to use std::vector for any sequential container, until profiling has shown that it is a bottleneck.
Short answer: It is probably a reasonable way for implementing a graph.
Long answer: Which graph data structure to use always depends on what you want to use it for. An adjacency matrix is good for very dense graphs, where it will not waste space on many 0 entries, and when we want to answer the question "Is there an edge between A and B?" quickly. Iterating over all neighbors of a node can take quite long, though, since it has to look at a whole row and not just the neighbors.
An adjacency list is good for sparse graphs and when we mostly want to look up all neighbors of a node (which is very often the case in graph algorithms). In a directed graph where we want to treat incoming and outgoing edges separately, it is probably a good idea to have separate adjacency lists for incoming and outgoing edges (as your code does).
Which container to use for the list depends on the use case. If you will iterate over the graph much more often than you delete from it, using a vector rather than a list is a very good idea (basically all the graph programs I have ever written were of this type). If your graph changes very often, you delete edges very often, and you don't want iterator invalidation and so on, a list may be better. But that is very seldom the case.
A good design would be to make it very easy to change between list and vector so you can easily profile both and then use what is better for your program.
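One way to make the switch between list and vector a one-word change is to pass the container as a template template parameter. This is a sketch of that idea applied to the in/out design from the question; the method names and the VectorNode/ListNode aliases are hypothetical.

```cpp
#include <cstddef>
#include <list>
#include <vector>

// The adjacency container is a template parameter, so switching between
// std::vector and std::list (e.g. to profile both) is a one-word change.
template <template <typename...> class Container>
class GraphNode {
public:
    void add_in(GraphNode* n) { in.push_back(n); }
    void add_out(GraphNode* n) { out.push_back(n); }

    std::size_t out_degree() const { return out.size(); }
    const Container<GraphNode*>& successors() const { return out; }

private:
    Container<GraphNode*> in;   // nodes with an edge into this node
    Container<GraphNode*> out;  // nodes this node has an edge to
};

using VectorNode = GraphNode<std::vector>;
using ListNode = GraphNode<std::list>;
```

Code written only against push_back, size, front and iteration compiles with either alias, so profiling both is a matter of changing the alias used by the rest of the program. A plain `using AdjList = std::vector<GraphNode*>;` inside the class is a simpler alternative if only one version is needed at a time.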
By the way, if you often delete single edges, this is also easy to do quickly with a vector, as long as you do not care about the order of the edges in the adjacency list (so do not do this carelessly while iterating over the vector):
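// Member of GraphNode; needs <utility> for std::swap and <cstddef> for std::size_t.
// Removes in[i] in O(1) without preserving the order of the remaining edges.
void delete_in_edge(std::size_t i) {
    std::swap(in[i], in.back()); // The element to be deleted is now at the last position,
                                 // the formerly last element is at position i
    in.pop_back();               // Delete the current last element
}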
This has O(1) complexity (and the swap is probably pretty fast).
I am designing an application which should be based on graphs.
I am not sure which is the best way to represent the graph adjacency list in memory. The requirements from the customer are quite vague, so I must make some assumptions. The nodes of the graph are identified by IDs, but I am not sure whether the IDs are sequential. What does graph theory say about this in general?
If they are sequential, the number of nodes (N) also bounds the maximum ID, and the IDs are guaranteed to cover the interval 1, 2, …, N. See option A below.
If they are not sequential, the IDs could jump from, e.g., 1 to 11, skipping some natural numbers. See option B below.
Besides the ID, each node also has a C++ data structure in which I store additional information (payload, connected edges, etc.).
There are two options left for my algorithm:
A. Represent the graph as a vector<Data>, where the vector index is the node ID.
B. Represent the graph as a map, where the node ID is the key and Data is the stored value.
A map would allow arbitrary IDs, e.g. if the input data is given in random order.
The literature (e.g. DFS, BFS or other graph articles) mostly considers option A, where the node IDs fully cover an interval [1..N]. I would also go for this option, as it follows a commonly agreed notation.
Then I would add this assumption to the documentation/precondition section of my application.
What is the best option to properly cover the customer's ambiguous specifications?
You could represent your graph as a combination of your two listed options: have a Node structure that contains two members, an integer label and the other struct you need.
The graph will store the nodes in a std::vector<Node*> nodes;. However, since a node's label will not necessarily match its position in that vector, you will need to store the correspondence between labels and vector indexes in a std::map<int, int> corresp;
Given this structure, if you need to access the Node* with a label value of 11, you would do Node* node = nodes[corresp[label]]; (note that std::map::operator[] inserts a default entry for a missing label, so use corresp.find(label) if the label may be absent).
Also, the label could be any other type, for instance a std::string. The only modification that needs to be done is to change the key type of the map to std::string.
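A minimal sketch of the vector-plus-correspondence-map idea above. Payload, LabeledGraph, add() and get() are hypothetical names, and the sketch uses std::unique_ptr<Node> instead of raw Node* so the graph owns its nodes; get() uses find() so an unknown label returns nullptr instead of silently inserting an entry.

```cpp
#include <map>
#include <memory>
#include <vector>

struct Payload { double value = 0.0; };  // hypothetical "other struct"

struct Node {
    int label;
    Payload data;
};

class LabeledGraph {
public:
    // Stores a new node and records label -> vector index.
    Node* add(int label) {
        corresp_[label] = static_cast<int>(nodes_.size());
        nodes_.push_back(std::make_unique<Node>(Node{label, {}}));
        return nodes_.back().get();
    }

    // Uses find() rather than operator[] so an unknown label is not inserted.
    Node* get(int label) const {
        auto it = corresp_.find(label);
        return it == corresp_.end() ? nullptr : nodes_[it->second].get();
    }

private:
    std::vector<std::unique_ptr<Node>> nodes_;  // dense storage, easy to enumerate
    std::map<int, int> corresp_;                // label -> index into nodes_
};
```

As the answer notes, switching to string labels only means changing the key type of corresp_ (and the label member) to std::string.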
Case 1: sequential IDs. Then you may store the nodes in an array in such a way that the indexes correspond to the IDs.
Case 2: sparse IDs.
Usually the representation of the nodes of a graph allows them to have a payload (attributes), such as the ID. If you don't need to access the nodes by ID, use an array and you are done.
If you do need to access the nodes by ID, use a dictionary (map) to establish the correspondence. You can also store the nodes directly in the dictionary, but node enumeration or sorting will be harder.
I usually recommend identifying things with (maybe smart) pointers if they are objects, since that's the mechanism that C/C++ provides to identify objects.
Fundamentally, your graph consists of a number of nodes and edges, so you would generally have something like:
#include <vector>

struct Data { /* node payload, defined elsewhere */ };

class Node {
    int id;
    Data data;
    std::vector<Node *> edges;  // adjacent nodes
};
Then, in your Graph class, you will need some kind of map for every other way you need to access nodes. You will probably need to be able to find nodes by id, so the graph class will need some kind of index for that -- a vector<Node *> nodesById for dense ids or a map<int,Node*> nodesById for sparse ids. Which one to choose should not be an important decision that has a lot of consequences. Add a method Node *getNodeById(int id), and then you can change the representation whenever you want. Always remember that, in software development, when a decision doesn't have an obvious answer, or when the best answer is likely to change in the future, then making it easy to change your mind is much better than making the right choice.
As people add requirements to your graph, you may need to access nodes in different ways and may have to add more kinds of indexes to support those particular use cases.
Two jobs you will need to do with your graph are construction and destruction. Construction will probably require that nodesById index. Destruction will definitely require some way to enumerate all the nodes, and whichever representation you choose for nodesById will suffice for that as well.
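Here is a sketch of the "hide the index behind getNodeById" advice. getNodeById comes from the answer; addNode, addEdge, size and the use of std::unique_ptr for ownership are assumptions of this example. Because the map owns the nodes, destroying the Graph frees them all, and replacing the map with a vector indexed by id (for dense ids) would only touch this class.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <utility>
#include <vector>

struct Data {};  // placeholder payload

struct Node {
    int id;
    Data data;
    std::vector<Node*> edges;
};

class Graph {
public:
    // Assumes ids are unique; re-adding an id would destroy the old node.
    Node* addNode(int id) {
        auto node = std::make_unique<Node>();
        node->id = id;
        Node* raw = node.get();
        nodesById_[id] = std::move(node);
        return raw;
    }

    // Assumes both ids already exist.
    void addEdge(int from, int to) {
        getNodeById(from)->edges.push_back(getNodeById(to));
    }

    // The only place that knows how ids are indexed.
    Node* getNodeById(int id) const {
        auto it = nodesById_.find(id);
        return it == nodesById_.end() ? nullptr : it->second.get();
    }

    std::size_t size() const { return nodesById_.size(); }

private:
    std::map<int, std::unique_ptr<Node>> nodesById_;  // sparse ids; owns the nodes
};
```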
You could use a map of vectors. Something like this:
std::map<int, std::vector<Node *>> graph;
The key in this map would be your node id. The first entry of the corresponding vector is the node with that id, followed by the nodes it has edges to.
Suppose your graph has a node with id 2, and this node has edges to the nodes with ids 3, 4 and 6.
Then the entry for key 2 in your map would be a vector whose first entry is the node with id 2, followed by the nodes with ids 3, 4 and 6.
Each Node entry in the vector could look similar to this:
struct InfoData { /* per-node information, defined elsewhere */ };

struct Node {
    int id;
    InfoData obj;
};
I am working on a graph implementation for a C++ class I am taking. This is what I came up with so far:
#include <list>

struct Vertex;  // forward declaration so Edge can point to it

struct Edge {
    int weight;
    Vertex *endpoints[2]; // always will have 2 endpoints, since I'm making this undirected
};

struct Vertex {
    int data; // or id
    std::list<Edge*> edges;
};

class Graph {
public:
    // constructor, destructor, methods, etc.
private:
    std::list<Vertex> vertices;
};
It's a bit rough at the moment, but I guess I'm wondering... Am I missing something basic? It seems a bit too easy at the moment, and usually that means I'm designing it wrong.
My thought is that a graph is just a list of vertices, each of which has a list of edges, each of which has two vertex endpoints.
Other than some functions I'll put into the graph (like: shortest distance, size, add a vertex, etc), am I missing something in the basic implementation of these structs/classes?
Sometimes you need to design something like this and it is not immediately apparent what the most useful implementation and data representation is (for example, is it better to store a collection of points, a collection of edges, or both?). You'll run into this all the time.
You might find, for example, that your first constructor isn't something you'd actually want. It might be easier to have the Graph class create the Vertices rather than passing them in.
Rather than working within the class itself and playing a guessing game, take a step back and work on the client code first. For example, you'll want to create a Graph object, add some points, connect the points with edges somehow, etc.
The ordering of the calls you make from the client will come naturally, as will the parameters of the functions themselves. With this understanding of what the client will look like, you can start to implement the functions themselves, and it will be more apparent what the actual implementation should be.
Comments about your implementation:
A graph is a collection of objects in which some pairs of objects are related. Therefore, your current implementation is one potential way of doing it; you model the objects and the relationship between them.
The advantages of your current implementation are primarily constant lookup time along an edge and generalizability. Lookup time: if you want to access the nth neighbor of node k, that can be done in constant time (with a random-access container such as std::vector; with the std::list you use, reaching the nth element takes O(n) steps). Generalizability: this represents almost any graph someone could think of, especially if you replace the data type of weight and data with an object (or a template).
The disadvantage of your current implementation is that it will probably be slower than ideal. Looking across an edge is cheap, but still takes two hops instead of one (node->edge->node). Furthermore, using a list of edges will take you O(d) time to look up a specific edge, where d is the degree of the vertex. (Your reliance on pointers also requires that the graph fits in the memory of one computer; you'd have trouble with Facebook's graphs or the US road network. I doubt that parallel computing is a concern of yours at this point.)
Concerns when implementing a graph:
However, your question asks whether this is the best way. That's a difficult question, as several specific qualities of a graph come into play.
Edge Information: If the way in which vertices are related doesn't matter (i.e., there is no weight or value to an edge), there is little point in using edge objects; this will only slow you down. Instead, each vertex can just keep a list of pointers to its neighbors, or a list of the IDs of its neighbors.
Construction: As you noticed in the comments, your current implementation requires that you have a vertex available before adding an edge. That is true in general. But you may want to create vertices on the fly as you add edges; this can make the construction look cleaner, but will take more time if the vertices have non-constant lookup time. If you know all vertices before constructing the graph, it may be beneficial to explicitly create them first, then the edges.
Density: If the graph is sparse (i.e., the number of edges per vertex is approximately constant), then an adjacency list is again a good method. However, if it is dense, you can often get better performance with an adjacency matrix: each vertex's row has an entry for every other vertex, in order, so accessing any edge is a constant-time operation.
Algorithm: What problems do you plan on solving on the graph? Several famous graph algorithms have different running times based on how the graph is represented.
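To illustrate the "Edge Information" point above: when edges carry no data, each vertex can simply keep the IDs of its neighbours, with no Edge objects at all. This sketch assumes dense vertex IDs 0..n-1 and an undirected graph; SimpleGraph and its methods are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Unweighted, undirected graph: no edge objects, only neighbour IDs.
// Vertex IDs are assumed to be dense (0..n-1).
class SimpleGraph {
public:
    explicit SimpleGraph(std::size_t n) : adj_(n) {}

    void add_undirected_edge(int u, int v) {
        adj_[u].push_back(v);
        adj_[v].push_back(u);
    }

    const std::vector<int>& neighbours(int u) const { return adj_[u]; }
    std::size_t degree(int u) const { return adj_[u].size(); }

private:
    std::vector<std::vector<int>> adj_;  // adj_[u] = IDs of u's neighbours
};
```

Following an edge is now a single hop (an index into adj_), at the cost of having nowhere to attach per-edge data such as a weight.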
Addendum:
Have a look at this question for many more comments that may help you out:
Graph implementation C++
I am learning C++, and I would appreciate your help in answering my question so that I understand the fundamental concepts. I am sure I have a lot to learn, but I need some advice to find the right way.
The problem I have is explained below.
I want to implement a class to create a graph in C++. I know I can use matrices, but I am not interested in matrices, as you will see later.
The graph is undirected and weighted. The graph is a vector of nodes and I use the standard library vector.
Each node (vertex) of the graph has the following parameters and some neighbors:
node_index, node_degree, X, Y, Z.
The neighbors are nodes too and I can define a vector of nodes for them.
However, there are 3 reasons why I don't want to create a vector of nodes.
First, I don't need Y and Z from a neighbor. I also need the weight between this node and each of its neighbors.
Second, I need to calculate node_degree and X for each node separately, and if I have duplicate nodes as neighbors, I need to update them manually, which is extra work.
Third, the graph will be large and I don't want to waste valuable memory on useless information.
Having said that, I was thinking of having a base class from which I can later derive the Node class and a Neighbor class. Then, for neighbors, I would keep a vector of pointers to each neighbor.
I don't know how, but I think I can cast that pointer to the base class and use it to retrieve the information I need from neighbor nodes.
In other words, I am trying to keep pointers to neighbors so that when I update a neighbor's parameters, I can access the latest information of the nodes directly through the pointers.
Would you please give a link to related topics that I should learn to implement it?
Please let me know if this is a very bad idea (by explaining the problems) and what is the better or best way to do this.
I advise you to use a Link structure, to represent an edge in the graph:
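struct Node;  // forward declaration; Node is defined elsewhere

struct Link
{
    Node *N;       // neighbor this link points to
    float weight;  // weight of the edge to that neighbor
};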
Then each Node can contain
std::vector<Link> neighbors;
This way there is no duplication of Nodes. There is a duplication of weights, since if Node A has a Link pointing to Node B, then Node B has a Link with the same weight pointing to Node A. If that duplication of weight is a problem (e.g. if the graph is so big that storage of the weights is expensive, or if weights are often updated), then you can make Link bidirectional (two Node* and one weight) and give each Node
std::vector<Link*>
The code will be slightly more complicated in that case, but it is the price of efficiency.
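A sketch of the bidirectional variant: one Link record per undirected edge, shared by both endpoints, so the weight is stored and updated exactly once. The other() helper, the Graph owner class and connect() are assumptions of this example; the owner keeps the links in std::unique_ptr so the Link* held by nodes stay valid.

```cpp
#include <memory>
#include <vector>

struct Node;

// One shared record per undirected edge: both endpoints point at it.
struct Link {
    Node* a;
    Node* b;
    float weight;
    // The endpoint on the far side of this link, seen from `from`.
    Node* other(const Node* from) const { return from == a ? b : a; }
};

struct Node {
    int node_index;
    std::vector<Link*> neighbors;
};

// Owns the links so the raw pointers held by nodes stay valid.
class Graph {
public:
    Link* connect(Node& a, Node& b, float weight) {
        links_.push_back(std::make_unique<Link>(Link{&a, &b, weight}));
        Link* link = links_.back().get();
        a.neighbors.push_back(link);
        b.neighbors.push_back(link);
        return link;
    }

private:
    std::vector<std::unique_ptr<Link>> links_;
};
```

Updating link->weight is immediately visible from both nodes, which is exactly the duplication problem the answer describes; the price is the extra other() call whenever you walk to a neighbor.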
I understand that we create an array of linked lists, representing each vertex by an index of the array. But how do we store the actual data associated with each vertex? For example, if I have a directed graph with the vertices 1. John, 2. Mary, 3. Sunny, and the edges (John, Mary) and (Mary, Sunny), we can create an adjacency list in the usual way with (1,2) and (2,3) as edges. But where do we store the names associated with 1, 2 and 3?
What did I do?
I created a class 'vertexnode' that stores a name and a pointer to an object of class 'edgenode'. Then I created an array containing objects of class 'vertexnode'. The 'edgenode' class contains 1) an index into this array, where the index represents the second endpoint of an edge, and 2) a pointer to the next 'edgenode' object. I then added vertices (names) and edges (pairs of names) to the graph. The program runs correctly.
I want to know whether this is a valid approach, whether it is better to store the names separately in an array, or whether there is some other method. Basically, I want to know how it is done conventionally.
P.S.: Please avoid using STL, Boost, or anything similar in your answer. I am new to graphs, and I want to know how things work at the basic level, not use some function that has already cooked the recipe for us. Thanks in advance.
I think you have too many classes for this case. I don't know if this is the best approach, but I would go with one class:
#include <list>
#include <string>

struct Vertex
{
    std::string name;
    std::list<Vertex*> adjacent;  // pointers, so adjacent vertices are shared rather than copied
};
That way you can know which vertex is linked to which, and you only have one class to store everything.
But I'd say that the best approach would be the one that fits your needs, and it depends on what you're gonna do with your graph