Modelling a graph with classes 'edge' and 'node'

Modelling a graph with classes 'edge' and 'node' - c++

I have to model an unweighted graph in C++ and do BFS and DFS. But, I have to create two separate classes: node and edge which would contain the two extremities of type node. I do not understand how would I be able to do any search in the graph using the edge class. If it were me, I would only create the node class and have the neighbours of node X in a list/array inside the class. Is there something I am missing? How could I take advantege of the class edge? Is there any way to do the searches without using an adjacency matrix or list? Thank you!

You can still use adjacency lists. But they will contain not references to other nodes, but rather references to edge instances, which in turn contain both endpoints as node references, as you mention. Of course, that seems kind of redundant because if you have, say source and target nodes in each edge and then you access some edge of a given node using something like node.edges[i].source, then you instantly know that this source is the node itself, otherwise this edge wouldn't even be in this node's adjacency list. But source may still be useful if you pass a reference to edge alone somewhere, where you don't have the source node at hand.
That aside, for the simplest graph this kind of approach seems like an overkill, because edges only store source and destination edges, the former being mostly redundant. But you may need to extend your edge class later with something like weights, labels, auxiliary data like residual flow or something like that. So having such class is a nice idea after all.
Moreover, some algorithms work directly on edges. For example, you may need to search for an edge satisfying some criterion. Having a separate class gives you freedom to create lists of edges without ad-hoc approaches like pair<Node, Node> or something.

You miss the constant of space complexity in storing the edges.
When you store the edges in adjacency matrix/list, you have to store both (node1, node2) and (node2, node1) for each edge.
You double the space, although the big-O complexity stays the same.
However, there're times when such space constant has to be considered.
You can take advantage of the class edge when you would like to save as much space as possible, and you would like to prioritize space over time.
Linear search on all the edge instances is a way.
Linear search is slow but you prioritize space over time.
Maybe parallel search when you have a distributed system is another way, but you have to verify.
Your homework question may have an artificial constraint of the design of the edge class. Artificial constraint idea comes from https://meta.stackexchange.com/questions/10811/how-do-i-ask-and-answer-homework-questions

Related

Is this a bad way to implement a graph?

When I look at a book, I only show examples of how to implement graphs in almost every book by adjacent matrix method and adjacent list method.
I'm trying to create a node-based editor, in which case the number of edges that stretch out on each node is small, but there's a lot of vertex.
So I'm trying to implement the adjacent list method rather than the adjacent matrix method.
However, adjacent lists store and use each edge as a connection list.
But, I would like to use the node in the form listed below.
class GraphNode
{
int x, y;
dataType data;
vector<GraphNode*> in;
vector<GraphNode*> out;
public:
GraphNode(var...) = 0;
};
So like this, I want to make the node act as a vertex and have access to other nodes that are connected.
Because when I create a node-based editor program, I have to access and process different nodes that are connected to each node.
I want to implement this without using a linked list.
And, I'm going to use graph algorithms in this state.
Is this a bad method?
Lastly, I apologize for my poor English.
Thank you for reading.

You're just missing the point of the difference between adjacency list and adjacency matrix. The main point is the complexity of operations, like finding edges or iterating over them. If you compare a std::list and a std::vector as datatype implementing an adjacency list, both have a complexity of O(n) (n being the number of edges) for these operations, so they are equivalent.
Other considerations:
If you're modifying the graph, insertion and deletion may be relevant as well. In that case, you may prefer a linked list.
I said that the two are equivalent, but generally std::vector has a better locality of reference and less memory overhead, so it performs better. The general rule in C++ is to use std::vector for any sequential container, until profiling has shown that it is a bottleneck.

Short answer: It is probably a reasonable way for implementing a graph.
Long answer: What graph data structure to use is always dependent on what you want to use it for. A adjacency matrix is good for very dense graphs were it will not waste space due to many 0 entries and if we want to answer the question "Is there an edge between A and B?" fast. The iteration over all members of a node can take pretty long, since it has to look at a whole row and not just the neighbors.
An adjacency list is good for sparse graphs and if we mostly want to look up all neighbors of a node (which is very often the case for graph mustering algorithms). In a directed graph were we want to treat ingoing and outgoing edges seperately, it is probably a good idea to have a seperate adjacency list for ingoing and outgoing egdes (as has your code).
Regarding what container to use for the list, it depends on the use case. If you will much more often iterate over the graph and not so often delete something from it, using a vector over a list is a very good idea (basically all graph programms I ever wrote were of this type). If you have a graph that changes very often, you have to delete edges very often, you don't want to have iterator invalidation and so on, maybe it is better having a list. But that is very seldom the case.
A good design would be to make it very easy to change between list and vector so you can easily profile both and then use what is better for your program.
Btw if you often delete one edge, this is also pretty easily done fast with a vector, if you do not care about the order of your edges in adjacency list (so do not do this without thinking while iterating over the vector):
void delte_in_edge(size_t index) {
std::swap(in[i], in.back()); // The element to be deleted is now at the last position,
// the formerly last element is at position i
in.pop_back(); // Delete the current last element
}
This has O(1) complexity (and the swap is probably pretty fast).

Rudimentary C++ Graph Implementation

I am working on a graph implementation for a C++ class I am taking. This is what I came up with so far:
struct Edge {
int weight;
Vertex *endpoints[2]; // always will have 2 endpoints, since i'm making this undirected
};
struct Vertex {
int data; // or id
list<Edge*> edges;
};
class Graph {
public:
// constructor, destructor, methods, etc.
private:
list<Vertex> vertices;
};
It's a bit rough at the moment, but I guess I'm wondering... Am I missing something basic? It seems a bit too easy at the moment, and usually that means I'm designing it wrong.
My thought is that a graph is just a list of vertices, which has a list edges, which will have a list of edges, which has two vertex end points.
Other than some functions I'll put into the graph (like: shortest distance, size, add a vertex, etc), am I missing something in the basic implementation of these structs/classes?

Sometimes you need to design stuff like this and it is not immediately apparent what the most useful implementation and data representation is (for example, is it better storing a collection of points, or a collection of edges, or both?), you'll run into this all the time.
You might find, for example, that your first constructor isn't something you'd actually want. It might be easier to have the Graph class create the Vertices rather than passing them in.
Rather than working within the class itself and playing a guessing game, take a step back and work on the client code first. For example, you'll want to create a Graph object, add some points, connect the points with edges somehow, etc.
The ordering of the calls you make from the client will come naturally, as will the parameters of the functions themselves. With this understanding of what the client will look like, you can start to implement the functions themselves, and it will be more apparent what the actual implementation should be

Comments about your implementation:
A graph is a collection of objects in which some pairs of objects are related. Therefore, your current implementation is one potential way of doing it; you model the objects and the relationship between them.
The advantages of your current implementation are primarily constant lookup time along an edge and generalizability. Lookup time: if you want to access the nth neighbor of node k, that can be done in constant time. Generalizability: this represents almost any graph someone could think of, especially if you replace the data type of weight and data with an object (or a Template).
The disadvantages of your current implementation are that it will probably be slower than ideal. Looking across an edge will be cheap, but still take two hops instead of one (node->edge->node). Furthermore, using a list of edges is going to take you O(d) time to look up a specific edge, where d is the degree of the graph. (Your reliance on pointers also require that the graph fits in the memory of one computer; you'd have trouble with Facebook's graphs or the US road network. I doubt that parallel computing is a concern of yours at this point.)
Concerns when implementing a graph:
However, your question asks whether this is the best way. That's a difficult question, as several specific qualities of a graph come in to play.
Edge Information: If the way in which vertices are related doesn't matter (i.e., there is no weight or value to an edge), there is little point in using edge objects; this will only slow you down. Instead, each vertex can just keep a list of pointers to its neighbors, or a list of the IDs of its neighbors.
Construction: As you noticed in the comments, your current implementation requires that you have a vertex available before adding an edge. That is true in general. But you may want to create vertices on the fly as you add edges; this can make the construction look cleaner, but will take more time if the vertices have non-constant lookup time. If you know all vertices before construction the graph, it may be beneficial to explicitly create them first, then the edges.
Density: If the graph is sparse (i.e., the number of edges per vertex is approximately constant), then an adjacency list is again a good method. However, if it is dense, you can often get increased performance if you use an adjacency matrix. Every vertex holds a list of all other vertices in order, and so accessing any edge is a constant time operation.
Algorithm: What problems do you plan on solving on the graph? Several famous graph algorithms have different running times based on how the graph is represented.
Addendum:
Have a look at this question for many more comments that may help you out:
Graph implementation C++

undirected weighted graph data structure in c++

I am learning C++ and I appreciate your support by answering my question to help me to understand fundamental concepts. I am sure I need to learn many stuff, but I need a some advice to help me to find the right way.
The problem I have is explained in below.
I want to implement a class to create a graph in C++. As I noticed, I can use matrices, but I am not interested in matrices as you can see later.
The graph is undirected and weighted. The graph is a vector of nodes and I use the standard library vector.
Each node(vertex) of the graph has below parameters and some neighbors.
node_index, node_degree, X, Y , Z.
The neighbors are nodes too and I can define a vector of nodes for them.
However, there are 3 reasons that I don't like to create a vector of nodes.
First,I don't need the Y,Z from a neighbor. I also need weight between this node and each of its neighbors.
Second, I need to calculate the node_degree, X for each node separately, and if I have duplicate nodes as neighbors, I need to update them manually that is extra work.
Third, the graph would be be large and I don't want to waste the valuable memory for useless information.
Having said that, I was thinking of having a base class that later I can derive the Node class and Neighbor class from it. Then for neighbors I keep a vector of pointers to beginning of each neighbor.
I don't know how, but I think I can cast that pointer to base class and by using it I can retrieve the information that I need from neighbor nodes.
In another words, I am trying to keep pointers to neighbors and when I update the neighbors parameters, I access to latest information of the nodes directly using pointers.
Would you please give a link to related topics that I should learn to implement it?
Please let me know if this is a very bad idea (by explaining the problems) and what is the better or best way to do this.

I advise you to use a Link structure, to represent an edge in the graph:
struct Link
{
Node *N;
float weight;
}
Then each Node can contain
vector<Link> neighbors;
This way there is no duplication of Nodes. There is a duplication of weights, since if Node A has a Link pointing to Node B, then Node B has a Link with the same weight pointing to Node A. If that duplication of weight is a problem (e.g. if the graph is so big that storage of the weights is expensive, or if weights are often updated), then you can make Link bidirectional (two Node* and one weight) and give each Node
vector<*Link>
The code will be slightly more complicated in that case, but it is the price of efficiency.

Traverse through a DAG-like structure to produce another DAG-like structure in Clojure

I have a DAG-like structure that is essentially a deeply-nested map. The maps in this structure can have common values, so the overall structure is not a tree but a direct acyclic graph. I'll refer to this structure as a DAG for brevity.
The nodes in this graph are of different but finite number of categories. Each category can have its own structure/keywords/number-of-children. There is one unique node that is the source of this DAG, meaning from this node we can reach all nodes in the DAG.
The task is to traverse through the DAG from the source node, and convert each node to another one or more nodes in a new constructed graph. I'll give an example for illustration.
The graph in the upper half is the input one. The lower half is the one after transformation. For simplicity, the transformation is only done on node A where it is split into node 1 and A1. The children of node A are also reallocated.
What I have tried (or in mind):
Write a function to convert one object for different types. Inside this function, recursively call itself to convert each of its children. This method suffers from the problem that data are immutable. The nodes in the transformed graph cannot be changed randomly to add children. To overcome this, I need to wrap every node in a ref/atom/agent.
Do a topological sort on the original graph. Then convert the nodes in the reversed order, i.e., bottom-up. This method requires a extra traverse of the graph but at least the data need not to be mutable. Regarding the topological sort algorithm, I'm considering DFS-based method as stated in the wiki page, which does not require the knowledge of the full graph nor a node's parents.
My question is:
Is there any other approaches you might consider, possibly more elegant/efficient/idiomatic?
I'm more in favour of the second method, is there any flaws or potential problems?
Thanks!
EDIT: On a second thought, a topological sorting is not necessary. The transformation can be done in the post-order traversal already.

This looks like a perfect application of Zippers. They have all the capabilities you described as needed and can produce the edited 'new' DAG. There are also a number of libraries that ease the search and replace capability using predicate threads.
I've used zippers when working with OWL ontologies defined in nested vector or map trees.
Another option would be to take a look at Walkers although I've found these a bit more tedious to use.

How can one create cyclic (and immutable) data structures in Clojure without extra indirection?

I need to represent directed graphs in Clojure. I'd like to represent each node in the graph as an object (probably a record) that includes a field called :edges that is a collection of the nodes that are directly reachable from the current node. Hopefully it goes without saying, but I would like these graphs to be immutable.
I can construct directed acyclic graphs with this approach as long as I do a topological sort and build each graph "from the leaves up".
This approach doesn't work for cyclic graphs, however. The one workaround I can think of is to have a separate collection (probably a map or vector) of all of the edges for an entire graph. The :edges field in each node would then have the key (or index) into the graph's collection of edges. Adding this extra level of indirection works because I can create keys (or indexes) before the things they (will) refer to exist, but it feels like a kludge. Not only do I need to do an extra lookup whenever I want to visit a neighboring node, but I also have to pass around the global edges collection, which feels very clumsy.
I've heard that some Lisps have a way of creating cyclic lists without resorting to mutation functions. Is there a way to create immutable cyclic data structures in Clojure?

You can wrap each node in a ref to give it a stable handle to point at (and allow you to modify the reference which can start as nil). It is then possible to possible to build cyclic graphs that way. This does have "extra" indirection of course.
I don't think this is a very good idea though. Your second idea is a more common implementation. We built something like this to hold an RDF graph and it is possible to build it out of the core data structures and layer indices over the top of it without too much effort.

I've been playing with this the last few days.
I first tried making each node hold a set of refs to edges, and each edge hold a set of refs to the nodes. I set them equal to each other in a (dosync... (ref-set...)) type of operation. I didn't like this because changing one node requires a large amount of updates, and printing out the graph was a bit tricky. I had to override the print-method multimethod so the repl wouldn't stack overflow. Also any time I wanted to add an edge to an existing node, I had to extract the actual node from the graph first, then do all sorts of edge updates and that sort of thing to make sure everyone was holding on to the most recent version of the other thing. Also, because things were in a ref, determining whether something was connected to something else was a linear-time operation, which seemed inelegant. I didn't get very far before determining that actually performing any useful algorithms with this method would be difficult.
Then I tried another approach which is a variation of the matrix referred to elsewhere. The graph is a clojure map, where the keys are the nodes (not refs to nodes), and the values are another map in which the keys are the neighboring nodes and single value of each key is the edge to that node, represented either as a numerical value indicating the strength of the edge, or an edge structure which I defined elsewhere.
It looks like this, sort of, for 1->2, 1->3, 2->5, 5->2
(def graph {node-1 {node-2 edge12, node-3 edge13},
node-2 {node-5 edge25},
node-3 nil ;;no edge leaves from node 3
node-5 {node-2 edge52}) ;; nodes 2 and 5 have an undirected edge
To access the neighbors of node-1 you go (keys (graph node-1)) or call the function defined elsewhere (neighbors graph node-1), or you can say ((graph node-1) node-2) to get the edge from 1->2.
Several advantages:
Constant time lookup of a node in the graph and of a neighboring node, or return nil if it doesn't exist.
Simple and flexible edge definition. A directed edge exists implicitly when you add a neighbor to a node entry in the map, and its value (or a structure for more information) is provided explicitly, or nil.
You don't have to look up the existing node to do anything to it. It's immutable, so you can define it once before adding it to the graph and then you don't have to chase it around getting the latest version when things change. If a connection in the graph changes, you change the graph structure, not the nodes/edges themselves.
This combines the best features of a matrix representation (the graph topology is in the graph map itself not encoded in the nodes and edges, constant time lookup, and non-mutating nodes and edges), and the adjacency-list (each node "has" a list of its neighboring nodes, space efficient since you don't have any "blanks" like a canonical sparse matrix).
You can have multiples edges between nodes, and if you accidentally define an edge which already exists exactly, the map structure takes care of making sure you are not duplicating it.
Node and edge identity is kept by clojure. I don't have to come up with any sort of indexing scheme or common reference point. The keys and values of the maps are the things they represent, not a lookup elsewhere or ref. Your node structure can be all nils, and as long as it's unique, it can be represented in the graph.
The only big-ish disadvantage I see is that for any given operation (add, remove, any algorithm), you can't just pass it a starting node. You have to pass the whole graph map and a starting node, which is probably a fair price to pay for the simplicity of the whole thing. Another minor disadvantage (or maybe not) is that for an undirected edge you have to define the edge in each direction. This is actually okay because sometimes an edge has a different value for each direction and this scheme allows you to do that.
The only other thing I see here is that because an edge is implicit in the existence of a key-value pair in the map, you cannot define a hyperedge (ie one which connects more than 2 nodes). I don't think this is a big deal necessarily since most graph algorithms I've come across (all?) only deal with an edge that connects 2 nodes.

I ran into this challenge before and concluded that it isn't possible using truly immutable data structures in Clojure at present.
However you may find one or more of the following options acceptable:
Use deftype with ":unsynchronized-mutable" to create a mutable :edges field in each node that you change only once during construction. You can treat it as read-only from then on, with no extra indirection overhead. This approach will probably have the best performance but is a bit of a hack.
Use an atom to implement :edges. There is a bit of extra indirection, but I've personally found reading atoms to be extremely efficient.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js