Deleting edges in a graph stored as adjacency list - c++

I am trying to implement an algorithm for finding an Eulerian path in an undirected graph stored as an adjacency list. I need a fast way (linear time) to remove an edge from the graph.
My initial idea was to use something like
vector<list<pair<Vertex, list<Vertex>::iterator>>> Graph;
so that when I delete the edge in one direction I have a fast way to delete it in the opposite direction, using the iterator to the place where the reverse edge is stored. However, several sources claim that those iterators won't be valid anymore: as I start deleting items the pointer structure changes, and the iterators no longer point to the right elements.
My question is: is there a way to delete an edge in O(1) time using adjacency lists, or is there a way to mark the edge somehow, so that when I am at the adjacent vertex I know for sure that the edge in the opposite direction has already been traversed? Thanks in advance.

I need a fast way (linear time) to remove an edge from the graph.
It's possible, but you have to change your graph representation because of the problems you have described.
Approach 1 -- guaranteed O(logE) complexity
Just use std::set instead of std::list:
std::vector<std::set<int>> Graph;
This lets you traverse and process all adjacent nodes in the same manner:
// adj is your graph,
// v is current vertex
for (auto &w : adj[v]) {
// process edge [v, w]
}
And you can remove the edge in both directions in O(log E):
// remove [v,w] and [w,v]
adj[v].erase(w);
adj[w].erase(v);
Approach 2 -- average O(1), worst case O(E)
Constant time complexity is possible with std::unordered_set, but only on average:
std::vector<std::unordered_set<int>> Graph;
Traversing and erasing patterns stay the same, but personally I would prefer approach 1.
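The question also asks whether an edge can simply be marked instead of erased. A minimal sketch of that idea follows, assuming every undirected edge gets an integer id that is stored in both adjacency entries, so marking it once covers both directions. The function name eulerian_walk and the edge-list input format are hypothetical, not part of the original posts. Used edges are skipped through a per-vertex cursor, so each adjacency entry is looked at a constant number of times.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: returns the vertices of an Eulerian path or circuit,
// or an empty vector if the graph has none. Vertices are 0..n-1.
std::vector<int> eulerian_walk(int n, const std::vector<std::pair<int, int>>& edges) {
    // adj[v] holds (neighbour, edge id); every undirected edge appears twice.
    std::vector<std::vector<std::pair<int, int>>> adj(n);
    for (std::size_t id = 0; id < edges.size(); ++id) {
        adj[edges[id].first].push_back({edges[id].second, static_cast<int>(id)});
        adj[edges[id].second].push_back({edges[id].first, static_cast<int>(id)});
    }

    // An Eulerian path needs 0 or 2 odd-degree vertices; start at an odd one if any.
    int start = -1, odd = 0;
    for (int v = 0; v < n; ++v) {
        if (adj[v].size() % 2 == 1) { ++odd; start = v; }
    }
    if (odd != 0 && odd != 2) return {};
    for (int v = 0; v < n && start == -1; ++v) {
        if (!adj[v].empty()) start = v;
    }
    if (start == -1) return {};  // no edges at all

    std::vector<bool> used(edges.size(), false);  // one flag per undirected edge
    std::vector<std::size_t> next(n, 0);          // per-vertex cursor into adj[v]
    std::vector<int> stack{start}, walk;
    while (!stack.empty()) {
        int v = stack.back();
        // Skip entries whose edge was already traversed from the other side.
        while (next[v] < adj[v].size() && used[adj[v][next[v]].second]) ++next[v];
        if (next[v] == adj[v].size()) {
            walk.push_back(v);  // dead end: emit vertex
            stack.pop_back();
        } else {
            std::pair<int, int> e = adj[v][next[v]++];
            used[e.second] = true;  // marks both directions at once
            stack.push_back(e.first);
        }
    }
    // If some edge was never reached, the graph is disconnected.
    if (walk.size() != edges.size() + 1) walk.clear();
    return walk;
}
```

For a triangle given as {{0,1},{1,2},{2,0}} this returns a closed walk of four vertices. Unlike the std::set approach, this layout also tolerates parallel edges, since each one has its own id.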

Related

what are the containers in C++ STL to store a small number of integer values and find them in O(1)

Suppose I want to create a vector of vectors to store and find the edges between the nodes in a graph. Many nodes in the graph have no edges, and I don't want to store those. For example, there are 2 million nodes, of which 1.5 million have no edges.
Moreover, each node that I do store could have anywhere from 1 to a couple of hundred edges.
After I have saved all the edges, I want to remove the edges which do not exist in both directions. So, if edge (i,j) exists but edge (j,i) doesn't, I want to erase (i,j).
I say "vector of vectors" only to communicate what I want to create; I know it doesn't scale, as the outer vector would be completely dense.
So, using the vector-of-vectors format, I go through the outer vector (call the index i), and for each item in its inner vector (call it j), I need to check whether i is present in the inner vector of the j'th outer element. I need this check to be fast, preferably constant time, something like a hash table; I think std::set might help.
At this point, if the other edge (j,i) does not exist, I need to remove the current edge (i,j).
What would be a good container for my scenario?
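One container combination that fits this description is a hash map from node to a hash set of neighbours, which stores only nodes that have edges and gives average constant-time membership checks. The sketch below is one possible way to do the pruning pass under that assumption; the alias SparseGraph and the function keep_mutual_edges are hypothetical names introduced for illustration.

```cpp
#include <iterator>
#include <unordered_map>
#include <unordered_set>

// Only nodes that actually have edges get an entry.
using SparseGraph = std::unordered_map<int, std::unordered_set<int>>;

// Erase every edge (i, j) whose reverse (j, i) is missing,
// then drop nodes that end up with no edges.
void keep_mutual_edges(SparseGraph& g) {
    for (auto node = g.begin(); node != g.end();) {
        const int i = node->first;
        auto& out = node->second;
        for (auto it = out.begin(); it != out.end();) {
            auto rev = g.find(*it);  // average O(1) lookup of node j
            const bool mutual = rev != g.end() && rev->second.count(i) > 0;
            // erase() returns the iterator following the removed element.
            it = mutual ? std::next(it) : out.erase(it);
        }
        node = out.empty() ? g.erase(node) : std::next(node);
    }
}
```

Erasing during the scan is safe here because an edge is only removed when its reverse is already absent, so a removal never changes the verdict for any other edge.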

Datastructure for undirected graph edges with constant time complexity

I have an undirected graph where the nodes are stored in a flat array. Now I am looking for a data structure for the edges. It should have constant time complexity for getting all edges of a given node. An edge contains two node indices and additional information such as a weight.
The only way I see is duplicating the data, one sorted by the left node and another sorted by the right node.
vector<vector<int>> left, right;
But I would like to prevent duplicating the edges.
It sounds like you just want an adjacency list representation.
In this representation, each node would store a list of all its connected edges.
For an undirected graph, you can have both endpoints store the edge.
There isn't really a way to get the connected edges for a node in constant time without some duplication. But you can just store a pointer, reference or unique ID (which can be an index in an edge array, for example) to the actual edge, preventing the need to actually have 2 copies of it floating around.
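The index-into-an-edge-array variant described above can be sketched as follows. Edge data such as the weight lives in exactly one place, and each node keeps only the indices of its incident edges. The names Edge, IndexedGraph, add_edge and other_end are assumptions made for this example.

```cpp
#include <cstddef>
#include <vector>

struct Edge {
    int a;
    int b;
    double weight;  // stored once, never duplicated
};

struct IndexedGraph {
    std::vector<Edge> edges;                          // the only copy of each edge
    std::vector<std::vector<std::size_t>> incident;   // per node: indices into edges

    explicit IndexedGraph(std::size_t node_count) : incident(node_count) {}

    // Adds an undirected edge and returns its index in the edge array.
    std::size_t add_edge(int a, int b, double weight) {
        edges.push_back({a, b, weight});
        const std::size_t id = edges.size() - 1;
        incident[a].push_back(id);
        if (b != a) incident[b].push_back(id);  // a self-loop is listed once
        return id;
    }

    // Given an edge and one endpoint, returns the opposite endpoint.
    int other_end(std::size_t id, int v) const {
        const Edge& e = edges[id];
        return e.a == v ? e.b : e.a;
    }
};
```

Getting all edges of node v is then just a read of incident[v], and the only duplicated data is one index per endpoint.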
Make a vector of vectors.
Each node will have a vector of all the nodes adjacent to it.
You should build this while creating the graph.

Comparison of Graph implementation in C++ using stl

To implement a graph we can use a vector of lists, std::vector<std::list<vertex>>,
but I have seen somewhere that if we use maps, like std::map<vertex, std::set<vertex>>, then we can do better. Can anybody explain how this is a better option than the first one, in terms of memory or speed?
There are two differences to note here.
std::vector<std::list<vertex>> is what is known as an "adjacency list", and std::map<vertex, std::set<vertex>> is known as an "adjacency set", with the added difference that each vertex is looked up through a map key instead of a vector index. I'll talk about the first difference first (that is, list<vertex> vs set<vertex>).
The first implementation is basically an array of linked lists, where each linked list gives all the vertices adjacent a vertex. The second implementation is an ordered map mapping each vertex to a set of adjacent vertices.
Comparison of adjacency list vs adjacency set, order of growth:
Space: O(E + V) vs O(E + V)
Add edge: O(1) vs O(log V)
Check adjacency: O(degree of vertex checked) vs O(log V)
Iterating through neighbours of a vertex: O(degree of vertex checked) vs O(log V + degree of vertex checked)
... where E is the number of edges and V the number of vertices, and degree of a vertex is the number of edges connected to it. (I'm using the language of an undirected graph but you can reason similarly for directed graphs). So if you have a very dense graph (each vertex has lots of edges, i.e. high degree) then you want to use adjacency sets.
Regarding the use of map vs vector: insert and erase are O(N) for vector and O(log N) for map. However lookup is O(1) for vector and O(log N) for map. Depending on your purposes you might use one over the other. Though you should note that there are cache optimizations and such when you use a contiguous memory space (as vector does). I don't know much about that however, but there are other answers that mention it: vector or map, which one to use?
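To make the "check adjacency" row concrete, here is a small sketch of the same query against both representations. It assumes vertex is an int, and the function names adjacent_list and adjacent_set are hypothetical. The list version scans the neighbours linearly, while the set version does two tree lookups.

```cpp
#include <algorithm>
#include <list>
#include <map>
#include <set>
#include <vector>

using vertex = int;  // assumption: vertices are small integers

// Adjacency list: linear scan over the neighbours of u.
bool adjacent_list(const std::vector<std::list<vertex>>& g, vertex u, vertex v) {
    return std::find(g[u].begin(), g[u].end(), v) != g[u].end();
}

// Adjacency set: logarithmic lookup of u in the map, then of v in its set.
bool adjacent_set(const std::map<vertex, std::set<vertex>>& g, vertex u, vertex v) {
    auto it = g.find(u);
    return it != g.end() && it->second.count(v) > 0;
}
```

Both functions answer the same question; only the cost profile differs, which is what the comparison above summarises.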

Hashmap to implement adjacency lists

I've implemented an adjacency list using the vector-of-vectors approach, where the nth element of the outer vector is the friend list of node n.
I was wondering whether a hash map would be more useful. I am hesitant because I cannot identify the difference between the two. For example, if I want to check or modify the neighbours of the nth element (search, delete), how would a hash map be more efficient than the vector-of-vectors approach?
A vector<vector<ID>> is a good approach if the set of nodes is fixed. If however you suddenly decide to remove a node, you'll be annoyed. You cannot shrink the vector because it would displace the elements stored after the node and you would lose the references. On the other hand, if you keep a list of free (reusable) IDs on the side, you can just "nullify" the slot and then reuse later. Very efficient.
An unordered_map<ID, vector<ID>> allows you to delete nodes much more easily. You can go ahead and assign new IDs to newly created nodes without leaving empty slots behind. It is not as compact, especially on collisions, but not so bad either. With older compilers there can be some slowdowns on rehashing when a vector needs to be moved.
Finally, a unordered_multimap<ID, ID> is probably one of the easiest to manage. It also scatters memory to the wind, but hey :)
Personally, I would start prototyping with a unordered_multimap<ID, ID> and switch to another representation only if it proves too slow for my needs.
Note: if the adjacency relationship is symmetric, you can halve the number of stored entries by establishing that the relation (x, y) is stored under min(x, y) only.
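As a rough illustration of the unordered_multimap<ID, ID> option, the sketch below shows the three operations that come up most: listing neighbours, removing one edge, and removing a node. It assumes ID is an int and that each directed pair (from, to) is stored as its own entry; the function names are hypothetical.

```cpp
#include <iterator>
#include <unordered_map>
#include <vector>

using ID = int;  // assumption: node identifiers are integers
using Adjacency = std::unordered_multimap<ID, ID>;

// Collects all neighbours stored under key v.
std::vector<ID> neighbours(const Adjacency& g, ID v) {
    std::vector<ID> out;
    auto range = g.equal_range(v);
    for (auto it = range.first; it != range.second; ++it) out.push_back(it->second);
    return out;
}

// Removes one (from, to) entry; returns false if it was not present.
bool remove_edge(Adjacency& g, ID from, ID to) {
    auto range = g.equal_range(from);
    for (auto it = range.first; it != range.second; ++it) {
        if (it->second == to) {
            g.erase(it);
            return true;
        }
    }
    return false;
}

// Removes node v: its own entries by key, then entries pointing at it.
void remove_node(Adjacency& g, ID v) {
    g.erase(v);
    for (auto it = g.begin(); it != g.end();) {
        it = (it->second == v) ? g.erase(it) : std::next(it);
    }
}
```

Note that remove_node still scans the whole container for incoming entries, so node removal is easy to write here but not free.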
Vector of vectors
A vector of vectors is a good solution when you don't need to delete edges.
You can add an edge in O(1), and you can iterate over a node's neighbours in O(number of neighbours).
You can delete an edge with vector[node].erase(it), but it will be slow: the complexity is O(number of vertices) in the worst case.
Hash map
I am not sure how you want to use a hash map. If inserting an edge means setting hash_map[edge] = 1, then note that you cannot iterate over a node's neighbours.
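If the order of neighbours does not matter, the cost of shifting elements during erase can be avoided by overwriting the removed entry with the last one and popping the back. This is a swapped-in technique, not something the answer above proposes, and finding the entry is still a linear scan. The function name remove_neighbour is hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Removes `neighbour` from adj[node] without preserving neighbour order.
// Returns false if the edge was not present.
bool remove_neighbour(std::vector<std::vector<int>>& adj, int node, int neighbour) {
    std::vector<int>& row = adj[node];
    auto it = std::find(row.begin(), row.end(), neighbour);  // linear in the degree
    if (it == row.end()) return false;
    *it = row.back();  // overwrite with the last element
    row.pop_back();    // O(1), nothing is shifted
    return true;
}
```

For an undirected graph the call has to be made twice, once for each endpoint.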

Graph Representation in C++

I am going through the book The Design and Analysis of Computer Algorithms. Reading through the graph chapter, I am trying to implement DFS. By the book's definition, a depth-first search of a graph G=(V,E) partitions the edges in E into two sets, T and B. An edge (v,w) is placed in set T if vertex w has not been previously visited when we are at vertex v considering edge (v,w); otherwise edge (v,w) is placed in set B.
Basically, this DFS algorithm will give me a new graph G=(V,T). I want to know how one would implement this in C++.
I tried using an adjacency list, but I am confused about whether I need to store the edges explicitly or whether just a map of lists would be fine.
In VTK, edges are stored in a vector, and each entry is always a pair (v,w). Alongside this vector there are two other vectors of vectors that store the in and out edges of the graph nodes. When a new edge is added, it is appended to the edge vector, and its nodes (v,w) are added to the in-edge and out-edge vectors of vectors, too.
I am not quite clear about what your exact question is. I assume you are asking how to maintain the two sets T and B, to distinguish edges that have been visited during DFS from edges that have not. I think the easiest way to do so is to add a bool field "visited" to the node struct in your adjacency list. The initial value of this field is "false" for all nodes. Suppose, in the case above, that DFS comes to v and the edge (v,w) has not been visited: the node on v's list that corresponds to w will then have "visited" set to "false". Otherwise it will be "true".
I think the author is just trying to convey that, at the end of DFS, edges fall into two kinds: visited and not visited. But I don't think keeping two explicit sets for those two kinds of edges is necessary. You can always print the visited edges after DFS according to their updated "visited" value.
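For comparison, here is a minimal sketch of the partition as the question states it: when edge (v,w) is considered from v, it goes to T if w is still unvisited and to B otherwise. It differs slightly from the answer above in that the flag is kept per edge id rather than per list node, so each undirected edge is classified exactly once. The names DfsResult, dfs_visit and classify_edges are assumptions, and the recursion depth grows with the length of the DFS path.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

struct DfsResult {
    std::vector<std::pair<int, int>> tree;  // set T
    std::vector<std::pair<int, int>> back;  // set B
};

using Adj = std::vector<std::vector<std::pair<int, int>>>;  // (neighbour, edge id)

void dfs_visit(int v, const Adj& adj, std::vector<bool>& visited,
               std::vector<bool>& edge_seen, DfsResult& out) {
    visited[v] = true;
    for (const auto& e : adj[v]) {
        const int w = e.first;
        const int id = e.second;
        if (edge_seen[id]) continue;  // already classified from the other endpoint
        edge_seen[id] = true;
        if (!visited[w]) {
            out.tree.push_back({v, w});  // w is new: tree edge
            dfs_visit(w, adj, visited, edge_seen, out);
        } else {
            out.back.push_back({v, w});  // w was seen before: back edge
        }
    }
}

// Classifies every edge of an undirected graph with vertices 0..n-1.
DfsResult classify_edges(int n, const std::vector<std::pair<int, int>>& edges) {
    Adj adj(n);
    for (std::size_t id = 0; id < edges.size(); ++id) {
        adj[edges[id].first].push_back({edges[id].second, static_cast<int>(id)});
        adj[edges[id].second].push_back({edges[id].first, static_cast<int>(id)});
    }
    std::vector<bool> visited(n, false), edge_seen(edges.size(), false);
    DfsResult out;
    for (int v = 0; v < n; ++v) {
        if (!visited[v]) dfs_visit(v, adj, visited, edge_seen, out);
    }
    return out;
}
```

The graph G=(V,T) mentioned in the question is then just the original vertex set together with the pairs in the tree vector.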