Implementing list position locator in C++? - c++

I am writing a basic Graph API in C++ (I know libraries already exist, but I am doing it for the practice/experience). The structure is basically that of an adjacency list representation. So there are Vertex objects and Edge objects, and the Graph class contains:
list<Vertex *> vertexList
list<Edge *> edgeList
Each Edge object has two Vertex* members representing its endpoints, and each Vertex object has a list of Edge* members representing the edges incident to the Vertex.
All this is quite standard, but here is my problem. I want to be able to implement deletion of Edges and Vertices in constant time, so for example each Vertex object should have a Locator member that points to the position of its Vertex* in the vertexList. The way I first implemented this was by saving a list::iterator, as follows:
vertexList.push_back(v);
v->locator = --vertexList.end();
Then if I need to delete this vertex later, then rather than searching the whole vertexList for its pointer, I can call:
vertexList.erase(v->locator);
This works fine at first, but it seems that if enough changes (deletions) are made to the list, the iterators will become out-of-date and I get all sorts of iterator errors at runtime. This seems strange for a linked list, because it doesn't seem like you should ever need to re-allocate the remaining members of the list after deletions, but maybe the STL does this to optimize by keeping memory somewhat contiguous?
In any case, I would appreciate it if anyone has any insight as to why this happens. Is there a standard way in C++ to implement a locator that will keep track of an element's position in a list without becoming obsolete?
Much thanks,
Jeff

(I am assuming you are single-threaded, obviously list isn't thread-safe)
but maybe the STL does this to optimize by keeping memory somewhat contiguous?
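Incorrect. list::insert, list::push_front and list::push_back do not affect the validity of any list::iterator, and list::erase invalidates only the iterators that point to the erased elements. If you are only calling these mutators on the list, then your stored iterators will remain valid.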
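As a minimal sketch of the locator idea under these rules (VertexRegistry and locators_survive_erasure are hypothetical names made up for illustration, not part of the question's code), the following erases vertices in a scattered order and checks that the remaining locators still point at the right elements:

```cpp
#include <cassert>
#include <cstddef>
#include <list>

struct Vertex {
    int id = 0;
    std::list<Vertex*>::iterator locator;  // position of this vertex in the owning list
};

// Hypothetical helper: tracks vertex pointers, owns nothing.
struct VertexRegistry {
    std::list<Vertex*> vertexList;

    void add(Vertex* v) {
        // insert() returns an iterator to the newly inserted element
        v->locator = vertexList.insert(vertexList.end(), v);
    }

    void remove(Vertex* v) {
        // Only v->locator is invalidated; all other stored locators stay valid.
        vertexList.erase(v->locator);
        v->locator = vertexList.end();  // sentinel meaning "not in the list"
    }
};

// Erases three of five vertices and checks the two surviving locators.
inline bool locators_survive_erasure() {
    Vertex v[5];
    VertexRegistry g;
    for (std::size_t i = 0; i < 5; ++i) {
        v[i].id = static_cast<int>(i);
        g.add(&v[i]);
    }

    g.remove(&v[2]);
    g.remove(&v[0]);
    g.remove(&v[4]);

    return g.vertexList.size() == 2 &&
           *v[1].locator == &v[1] &&
           *v[3].locator == &v[3];
}
```

If a pattern like this still produces iterator errors, likely candidates are erasing the same vertex twice, or copying the list (or the object that holds it), since the stored locators keep pointing into the original list rather than the copy.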
In any case, I would appreciate it if anyone has any insight as to why this happens. Is there a standard way in C++ to implement a locator that will keep track of an element's position in a list without becoming obsolete?
Your method should work, please post some code demonstrating it not working. In the meantime here are two alternative representations:
Why not use:
struct Graph
{
typedef unique_ptr<Vertex> PVertex;
typedef unique_ptr<Edge> PEdge;
unordered_set<PVertex> verticies;
unordered_set<PEdge> edges;
};
That way you can delete them in constant time, as you wish. unordered_set is generally implemented with a hash table, so it has O(1) average-case access time.
Also, unique_ptr means that the unordered_sets can "own" them.
If vertices are countable and have a fixed upper limit (N), another representation would be:
struct Graph
{
typedef unique_ptr<Vertex> PVertex;
typedef unique_ptr<Edge> PEdge;
array<PVertex, N> verticies;
array<array<PEdge, N>, N> edges;
};
Where edges[i][j] holds the edge between verticies[i] and verticies[j]
If verticies[x] or edges[x][y] is nullptr, it means the corresponding vertex or edge does not exist.
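A minimal sketch of how this fixed-size representation might be used, assuming C++14 for std::make_unique. The member functions (add_vertex, add_edge, has_edge, remove_edge, remove_vertex), the Vertex and Edge fields, and the name MatrixGraph are invented for illustration:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>

struct Vertex { int data = 0; };
struct Edge   { int weight = 0; };

// N is an assumed compile-time upper bound on the number of vertices.
template <std::size_t N>
struct MatrixGraph {
    std::array<std::unique_ptr<Vertex>, N> verticies;
    std::array<std::array<std::unique_ptr<Edge>, N>, N> edges;

    void add_vertex(std::size_t i, int data) {
        verticies[i] = std::make_unique<Vertex>();
        verticies[i]->data = data;
    }

    // Adds a directed edge i -> j; fails if either endpoint does not exist.
    bool add_edge(std::size_t i, std::size_t j, int weight) {
        if (!verticies[i] || !verticies[j]) return false;
        edges[i][j] = std::make_unique<Edge>();
        edges[i][j]->weight = weight;
        return true;
    }

    bool has_edge(std::size_t i, std::size_t j) const { return edges[i][j] != nullptr; }

    // O(1): resetting the unique_ptr destroys the edge.
    void remove_edge(std::size_t i, std::size_t j) { edges[i][j].reset(); }

    // O(N): every edge touching vertex i has to be cleared as well.
    void remove_vertex(std::size_t i) {
        for (std::size_t k = 0; k < N; ++k) {
            edges[i][k].reset();
            edges[k][i].reset();
        }
        verticies[i].reset();
    }
};
```

Note that in this sketch only edge removal is constant time; removing a vertex walks one row and one column of the matrix.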
Old C++ Versions:
unordered_set was introduced in TR1. If you don't have it, you can use Boost. If you don't want to use Boost, you can use a plain old set, which gives O(log n) access time, or you can implement your own hash table.
unique_ptr has no safe drop-in replacement in older versions: auto_ptr must not be stored in standard containers, so use boost::shared_ptr or raw pointers with manual cleanup instead.
array can be replaced with a regular array or with a vector.

Related

nested lists to implement adjacency lists in c++

I was going through the code for topological sorting on this website.
I understood the code except for one part, which is the declaration of the adjacency list (on line 15):
list<int> *adj;
To me, adj should be a pointer to a list of integers, but based on how they have used it, it is a list of lists. So shouldn't a list of lists be
list <list<int> > adj;
Can someone please explain this to me?
You could also do it like that, however what this website is creating is an array of lists, not exactly a list of lists. I think this approach (array of lists) is the usual way to represent adjacency lists in lots of programming languages.
In this way the vertices are numbered from 0 to V-1, and you can access the list of adjacents directly by using the index operator adj[i]
I don't really know the exact reason, but I imagine it's for efficiency purposes.
EDIT:
Notice that lists are, according to the C++ reference, doubly linked lists, so to access element i you have to iterate through the linked nodes until you reach element i. With an array, you access the list you are interested in directly, without iterating, and therefore more efficiently.
In www.cplusplus.com we can read:
The main drawback of lists and forward_lists compared to these other sequence containers is that they lack direct access to the elements by their position; For example, to access the sixth element in a list, one has to iterate from a known position (like the beginning or the end) to that position, which takes linear time in the distance between these. They also consume some extra memory to keep the linking information associated to each element (which may be an important factor for large lists of small-sized elements).
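To make the difference concrete, here is a small sketch with two hypothetical helper functions (degree_indexed and degree_walked are names chosen for illustration). The first indexes straight into an array-like container of lists; the second has to walk a list of lists to reach the same position:

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <vector>

// Array-of-lists style: adj[v] is reached by direct indexing, in constant time.
inline std::size_t degree_indexed(const std::vector<std::list<int>>& adj, std::size_t v) {
    return adj[v].size();
}

// List-of-lists style: std::next follows v links before the inner list is reached.
inline std::size_t degree_walked(const std::list<std::list<int>>& adj, std::size_t v) {
    auto it = std::next(adj.begin(), static_cast<std::ptrdiff_t>(v));
    return it->size();
}
```

Both functions return the same answer for the same graph; only the cost of reaching vertex v differs.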
It's important to note that the code you've linked to is rather unidiomatic. One prominent issue it has is exactly with that member you mentioned, list<int> *adj.
In modern C++, we tend to avoid using new and delete directly, when instead, we could use a smart pointer (e.g. std::unique_ptr) or a container. In this specific case, instead of:
list<int> *adj;
// ... etc. ...
Graph(int V) {
adj = new std::list<int>[V];
}
it would indeed be better to use:
std::vector<std::list<int>> vertex_adjacencies;
// ... etc. ...
Graph(std::size_t num_vertices) : vertex_adjacencies(num_vertices) { }
Now, as for your suggestion of a list-of-lists - that's also possible:
std::list<std::list<int>> vertex_adjacencies;
// ... etc. ...
Graph(std::size_t num_vertices) : vertex_adjacencies()
{
auto empty_adjacencies = std::list<int>{};
std::fill_n(
std::front_inserter(vertex_adjacencies),
num_vertices,
empty_adjacencies);
}
but it would require rewriting various other methods. Also note, that the graph is intended to have a fixed number of vertices, without vertices being added or removed, so placing the vertex-specific adjacency in a list does not make a lot of sense. (Not that a separate std::list for each vertex' adjacencies is such a good idea, performance-wise, anyway, but never mind that).

Is this a bad way to implement a graph?

Almost every book I look at only shows how to implement graphs with the adjacency matrix method or the adjacency list method.
I'm trying to create a node-based editor, in which the number of edges leaving each node is small, but there are a lot of vertices.
So I'm trying to implement the adjacency list method rather than the adjacency matrix method.
However, adjacency lists store each edge as an entry in a linked list.
But I would like to use nodes in the form listed below.
class GraphNode
{
int x, y;
dataType data;
vector<GraphNode*> in; // nodes with an edge into this node
vector<GraphNode*> out; // nodes this node has an edge to
public:
GraphNode(/* constructor arguments */);
};
So like this, I want to make the node act as a vertex and have access to other nodes that are connected.
Because when I create a node-based editor program, I have to access and process different nodes that are connected to each node.
I want to implement this without using a linked list.
And, I'm going to use graph algorithms in this state.
Is this a bad method?
Lastly, I apologize for my poor English.
Thank you for reading.
You're just missing the point of the difference between adjacency list and adjacency matrix. The main point is the complexity of operations, like finding edges or iterating over them. If you compare a std::list and a std::vector as datatype implementing an adjacency list, both have a complexity of O(n) (n being the number of edges) for these operations, so they are equivalent.
Other considerations:
If you're modifying the graph, insertion and deletion may be relevant as well. In that case, you may prefer a linked list.
I said that the two are equivalent, but generally std::vector has a better locality of reference and less memory overhead, so it performs better. The general rule in C++ is to use std::vector for any sequential container, until profiling has shown that it is a bottleneck.
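As a rough sketch of running a graph algorithm directly on such nodes, the following uses a simplified GraphNode (the x, y members are dropped and data is an int, both assumptions made to keep the example self-contained) with two hypothetical helpers, connect and count_reachable:

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <unordered_set>
#include <vector>

struct GraphNode {
    int data = 0;
    std::vector<GraphNode*> in;   // predecessors
    std::vector<GraphNode*> out;  // successors
};

// Adds a directed edge from -> to, recording it on both endpoints.
inline void connect(GraphNode& from, GraphNode& to) {
    from.out.push_back(&to);
    to.in.push_back(&from);
}

// Breadth-first search over outgoing links; returns the number of nodes
// reachable from start, including start itself.
inline std::size_t count_reachable(GraphNode& start) {
    std::unordered_set<GraphNode*> seen;
    std::queue<GraphNode*> pending;
    seen.insert(&start);
    pending.push(&start);
    while (!pending.empty()) {
        GraphNode* node = pending.front();
        pending.pop();
        for (GraphNode* next : node->out) {
            if (seen.insert(next).second) pending.push(next);  // first visit only
        }
    }
    return seen.size();
}
```

The traversal only ever iterates over the out vector of each node, which is the access pattern where std::vector tends to do well.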
Short answer: It is probably a reasonable way for implementing a graph.
Long answer: Which graph data structure to use always depends on what you want to use it for. An adjacency matrix is good for very dense graphs, where it will not waste space on many 0 entries, and when we want to answer the question "Is there an edge between A and B?" fast. Iterating over all neighbors of a node can take pretty long, since it has to look at a whole row and not just the neighbors.
An adjacency list is good for sparse graphs and when we mostly want to look up all neighbors of a node (which is very often the case for graph algorithms). In a directed graph where we want to treat incoming and outgoing edges separately, it is probably a good idea to have a separate adjacency list for incoming and outgoing edges (as your code does).
Regarding which container to use for the list, it depends on the use case. If you will iterate over the graph much more often than you delete something from it, using a vector over a list is a very good idea (basically all graph programs I ever wrote were of this type). If you have a graph that changes very often, you have to delete edges very often, you don't want iterator invalidation and so on, then maybe a list is better. But that is very seldom the case.
A good design would be to make it very easy to change between list and vector so you can easily profile both and then use what is better for your program.
Btw if you often delete one edge, this is also pretty easily done fast with a vector, if you do not care about the order of your edges in adjacency list (so do not do this without thinking while iterating over the vector):
void delete_in_edge(size_t i) {
std::swap(in[i], in.back()); // The element to be deleted is now at the last position,
// the formerly last element is at position i
in.pop_back(); // Delete the current last element
}
This has O(1) complexity (and the swap is probably pretty fast).
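The same swap-and-pop idea as a free function, in a minimal sketch (unordered_erase is a hypothetical name, not a standard library function):

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Removes v[i] in O(1) by overwriting it with the last element.
// The relative order of the remaining elements is NOT preserved.
template <typename T>
void unordered_erase(std::vector<T>& v, std::size_t i) {
    assert(i < v.size());
    std::swap(v[i], v.back());  // element to delete moves to the end
    v.pop_back();               // drop it
}
```

For example, erasing index 1 from {1, 2, 3, 4} leaves {1, 4, 3}: the former last element now sits at index 1, which is why doing this in the middle of a loop over the same vector needs care.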

Rudimentary C++ Graph Implementation

I am working on a graph implementation for a C++ class I am taking. This is what I came up with so far:
struct Vertex; // forward declaration so Edge can hold Vertex pointers
struct Edge {
int weight;
Vertex *endpoints[2]; // always will have 2 endpoints, since i'm making this undirected
};
struct Vertex {
int data; // or id
list<Edge*> edges;
};
class Graph {
public:
// constructor, destructor, methods, etc.
private:
list<Vertex> vertices;
};
It's a bit rough at the moment, but I guess I'm wondering... Am I missing something basic? It seems a bit too easy at the moment, and usually that means I'm designing it wrong.
My thought is that a graph is just a list of vertices, each of which has a list of edges, each of which has two vertex endpoints.
Other than some functions I'll put into the graph (like: shortest distance, size, add a vertex, etc), am I missing something in the basic implementation of these structs/classes?
Sometimes you need to design stuff like this and it is not immediately apparent what the most useful implementation and data representation is (for example, is it better storing a collection of points, or a collection of edges, or both?), you'll run into this all the time.
You might find, for example, that your first constructor isn't something you'd actually want. It might be easier to have the Graph class create the Vertices rather than passing them in.
Rather than working within the class itself and playing a guessing game, take a step back and work on the client code first. For example, you'll want to create a Graph object, add some points, connect the points with edges somehow, etc.
The ordering of the calls you make from the client will come naturally, as will the parameters of the functions themselves. With this understanding of what the client will look like, you can start to implement the functions themselves, and it will be more apparent what the actual implementation should be.
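As one possible illustration of working from the client side, suppose the client code you wanted to write looked like build_triangle below. Every name here (add_vertex, add_edge, VertexId, HalfEdge, degree) is hypothetical, and the index-based storage is just one implementation that happens to satisfy that interface, not the layout from the question:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class Graph {
public:
    using VertexId = std::size_t;

    // The graph creates the vertex and hands back a handle to it.
    VertexId add_vertex(int data) {
        data_.push_back(data);
        adjacency_.emplace_back();
        return data_.size() - 1;
    }

    // Undirected edge: recorded on both endpoints.
    void add_edge(VertexId a, VertexId b, int weight) {
        adjacency_[a].push_back({b, weight});
        adjacency_[b].push_back({a, weight});
    }

    std::size_t size() const { return data_.size(); }
    std::size_t degree(VertexId v) const { return adjacency_[v].size(); }

private:
    struct HalfEdge { VertexId to; int weight; };
    std::vector<int> data_;
    std::vector<std::vector<HalfEdge>> adjacency_;
};

// The "client code first" part: this is what drove the interface above.
inline Graph build_triangle() {
    Graph g;
    Graph::VertexId a = g.add_vertex(1);
    Graph::VertexId b = g.add_vertex(2);
    Graph::VertexId c = g.add_vertex(3);
    g.add_edge(a, b, 10);
    g.add_edge(b, c, 20);
    g.add_edge(c, a, 30);
    return g;
}
```

Writing build_triangle first makes it fairly obvious that the graph, not the caller, should create vertices, and that add_edge needs two handles plus a weight.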
Comments about your implementation:
A graph is a collection of objects in which some pairs of objects are related. Therefore, your current implementation is one potential way of doing it; you model the objects and the relationship between them.
The advantages of your current implementation are primarily constant lookup time along an edge and generalizability. Lookup time: given an edge, reaching the vertex at its other end takes constant time. (Accessing the nth neighbor of node k is constant time only with a random-access container such as std::vector; with std::list, reaching the nth edge takes O(n).) Generalizability: this represents almost any graph someone could think of, especially if you replace the data type of weight and data with an object (or a template).
The disadvantages of your current implementation are that it will probably be slower than ideal. Looking across an edge will be cheap, but still take two hops instead of one (node->edge->node). Furthermore, using a list of edges is going to take you O(d) time to look up a specific edge, where d is the degree of the vertex. (Your reliance on pointers also requires that the graph fits in the memory of one computer; you'd have trouble with Facebook's graphs or the US road network. I doubt that parallel computing is a concern of yours at this point.)
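The two hops and the O(d) lookup can be sketched with the structs from the question. The helper functions other_end and find_edge are hypothetical names added for illustration:

```cpp
#include <cassert>
#include <list>

struct Vertex;  // forward declaration so Edge can point at vertices

struct Edge {
    int weight;
    Vertex* endpoints[2];
};

struct Vertex {
    int data;
    std::list<Edge*> edges;
};

// Second hop: from an edge to the endpoint that is not `from`.
inline Vertex* other_end(const Edge& e, const Vertex* from) {
    return e.endpoints[0] == from ? e.endpoints[1] : e.endpoints[0];
}

// Linear scan over the edges incident to `a`: O(d) in the degree of `a`.
inline Edge* find_edge(const Vertex& a, const Vertex* b) {
    for (Edge* e : a.edges) {               // first hop: node -> edge
        if (other_end(*e, &a) == b) return e;  // second hop: edge -> node
    }
    return nullptr;
}
```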
Concerns when implementing a graph:
However, your question asks whether this is the best way. That's a difficult question, as several specific qualities of a graph come in to play.
Edge Information: If the way in which vertices are related doesn't matter (i.e., there is no weight or value to an edge), there is little point in using edge objects; this will only slow you down. Instead, each vertex can just keep a list of pointers to its neighbors, or a list of the IDs of its neighbors.
Construction: As you noticed in the comments, your current implementation requires that you have a vertex available before adding an edge. That is true in general. But you may want to create vertices on the fly as you add edges; this can make the construction look cleaner, but will take more time if the vertices have non-constant lookup time. If you know all vertices before constructing the graph, it may be beneficial to explicitly create them first, then the edges.
Density: If the graph is sparse (i.e., the number of edges per vertex is approximately constant), then an adjacency list is again a good method. However, if it is dense, you can often get increased performance if you use an adjacency matrix. Every vertex holds a list of all other vertices in order, and so accessing any edge is a constant time operation.
Algorithm: What problems do you plan on solving on the graph? Several famous graph algorithms have different running times based on how the graph is represented.
Addendum:
Have a look at this question for many more comments that may help you out:
Graph implementation C++

C/C++: Creating simple graph library

I've been thinking about creating a C++ class for graph theory. The idea is that it will hold an indefinite number of vertices and edges for a simple graph (at most one edge between a pair of vertices). The problem is how to store this indefinite number of vertices and edges in the most efficient way.
I came up with the idea of having a pointer to a dynamic array of vertices as a member of the class. However, that would be inefficient, and with this method I also run into the problem of how to determine which vertices connect with which. The alternative is to create a class Vertex that is supposed to contain information about its connectivity. However, because of the indefinite number of edges, I cannot think of any way around using dynamic variables inside Vertex, which would make my code even less efficient.
So is there a better approach?
If you do not plan to frequently add and remove items from inside the collections, I'd use STL vectors. They're fast to iterate through, but slow for inserts and removals in the middle, which are O(n).
If you want to add or remove anywhere frequently, I'd use STL lists. They're slower to iterate, but insertion and removal at a known position is O(1).
You can then define your vertex and edge as something like:
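class Edge;
class Vertex
{
// ...
public:
std::list<Edge*> incomingEdges; // edges that end at this vertex
std::list<Edge*> outgoingEdges; // edges that start at this vertex
};
class Edge
{
// ...
public:
Vertex* startpoint; // pointers, so an edge refers to its vertices instead of copying them
Vertex* endpoint;
};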
You'll pretty quickly find yourself wanting both a Vertex and Edge class -- there are too many algorithms that depend on coloring, or weighting, or marking edges, and it's also simpler to mix directed and undirected edges. The odds are good that you aren't going to really care a lot about storing the appropriate references dynamically, because that can be reduced to a vector of pointers.
Another issue to think about is if you will want to store this thing persistently.
Suggestion: try the Simplest Thing That Can Possibly Work first. Assuming an Array class that resizes itself as needed, that will look something like
class Edge; // forward declaration, defined below
class Vertex {
Array<Edge*> edges ;
VertexData vd ; // define this for the task.
public:
// ctor etc; quiz: what operations?
};
class Edge {
Vertex *v1, *v2; // endpoints, held by pointer rather than by value
EdgeData ed;
public:
// ctor etc
};
Construct all the vertices and edges with new, don't worry about performance, and write some code against these classes.
Then go back, think how you'd have liked to write the code, and re-implement the classes to have that interface.
I'm a little prejudiced, since I used to teach the book and worked for Marshall Cline and Mike Girou, but I think one of the best C++ books for someone trying to really use it effectively is The C++ FAQBook, by Cline, Girou, and Lomow.

how boost multi_index is implemented

I have some difficulty understanding how Boost.MultiIndex is implemented. Let's say I have the following:
typedef multi_index_container<
employee,
indexed_by<
ordered_unique<member<employee, std::string, &employee::name> >,
ordered_unique<member<employee, int, &employee::age> >
>
> employee_set;
I imagine that I have one array, Employee[], which actually stores the employee objects, and two maps
map<std::string, employee*>
map<int, employee*>
with name and age as keys. Each map has employee* value which points to the stored object in the array. Is this ok?
A short explanation on the underlying structure is given here, quoted below:
The implementation is based on nodes interlinked with pointers, just as say your favorite std::set implementation. I'll elaborate a bit on this: A std::set is usually implemented as an rb-tree where nodes look like
struct node
{
// header
color c;
pointer parent,left,right;
// payload
value_type value;
};
Well, a multi_index_container's node is basically a "multinode" with as many headers as indices as well as the payload. For instance, a multi_index_container with two so-called ordered indices uses an internal node that looks like
struct node
{
// header index #0
color c0;
pointer parent0,left0,right0;
// header index #1
color c1;
pointer parent1,left1,right1;
// payload
value_type value;
};
(The reality is more complicated, these nodes are generated through some metaprogramming etc. but you get the idea) [...]
Conceptually, yes.
From what I understand of Boost.MultiIndex (I've used it, but not seen the implementation), your example with two ordered_unique indices will indeed create two sorted associative containers (like std::map) which store pointers/references/indices into a common set of employees.
In any case, every employee is stored only once in the multi-indexed container, whereas a combination of map<string,employee> and map<int,employee> would store every employee twice.
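The conceptual picture (one copy of each employee, several lookup structures pointing at it) can be sketched by hand as follows. This is only a model of the idea under discussion, not how Boost.MultiIndex is implemented, and EmployeeSet, find_by_name and find_by_age are hypothetical names:

```cpp
#include <cassert>
#include <list>
#include <map>
#include <string>

struct employee {
    std::string name;
    int age;
};

// Hand-rolled stand-in for a container with two unique ordered indices.
class EmployeeSet {
public:
    // Returns false (and stores nothing) if either key is already taken.
    bool insert(const employee& e) {
        if (by_name_.count(e.name) || by_age_.count(e.age)) return false;
        storage_.push_back(e);                 // the only copy of the employee
        const employee* p = &storage_.back();
        by_name_[p->name] = p;
        by_age_[p->age] = p;
        return true;
    }

    const employee* find_by_name(const std::string& name) const {
        auto it = by_name_.find(name);
        return it == by_name_.end() ? nullptr : it->second;
    }

    const employee* find_by_age(int age) const {
        auto it = by_age_.find(age);
        return it == by_age_.end() ? nullptr : it->second;
    }

private:
    std::list<employee> storage_;  // std::list keeps element addresses stable
    std::map<std::string, const employee*> by_name_;
    std::map<int, const employee*> by_age_;
};
```

Compared with this sketch, which allocates a list node plus one map node per index for every employee, the quoted explanation describes a single node carrying one header per index alongside the value.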
It may very well be that there is indeed a (dynamic) array inside some multi-indexed containers, but there is no guarantee that this is true:
[Random access indices] do not provide memory contiguity, a property of std::vectors by which elements are stored adjacent to one another in a single block of memory.
Also, Boost.Bimap is based on Boost.MultiIndex and the former allows for different representations of its "backbone" structure.
Actually I do not think it is.
Based on what is located in detail/node_type.hpp. It seems to me that like a std::map the node will contain both the value and the index. Except that in this case the various indices differ from one another and thus the node interleaving would actually differ depending on the index you're following.
I am not sure about this though, Boost headers are definitely hard to parse, however it would make sense if you think in term of memory:
fewer allocations: faster allocation/deallocation
better cache locality
I would appreciate a definitive answer though, if anyone knows the gory details.