Should I use iterators or descriptors to keep a reference on an edge or vertex? - boost-graph

I am currently designing an application composed of both a Boost Graph (adjacency_list) and several classes referencing edges or vertices from this structure.
My question is: what is the recommended way to maintain a reference to an edge or a vertex?
I guess that in the iterator case the object access is faster, but the iterator can be invalidated by dynamic changes to the graph structure.
Conversely, a descriptor is an ID, which means that a search is necessary to retrieve the data, but it may be less prone to trigger a memory error if the graph changes.
Is that true?

Stability of iterators/descriptors and efficiency of iterators both depend on your vertex container.
With vecS, for example, the vertex descriptor is just the index of the vertex in the vector, so lookups in the container are as fast as indexing into a vector. The descriptor is unstable too in this instance: removing a vertex shifts the elements after it, so their indices change.
For listS I expect (read: 'guess') the descriptor is the address of the element, so both descriptor and iterator are likely to have the same stability guarantees. In this case, using the vertex descriptor to get access to the property may well be as efficient as the iterator.
For more information on adjacency_list iterator/descriptor stability, read the section titled Iterator and Descriptor Stability/Invalidation in the adjacency_list documentation. For performance concerns, you're best off profiling the two to compare, and only when it seems to be a bottleneck in your application.
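The vecS case can be modelled with a plain std::vector, since the descriptor there is just an index. The following is a minimal sketch that does not use BGL at all; Vertex, Descriptor, erase_vertex and name_after_erase are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a vertex carrying a name property.
struct Vertex {
    std::string name;
};

// Models a vecS-style vertex container: the "descriptor" is the index.
using Descriptor = std::size_t;

// Erasing shifts every later element down by one, so any stored
// descriptor greater than `d` now names a different vertex.
inline void erase_vertex(std::vector<Vertex>& vertices, Descriptor d) {
    vertices.erase(vertices.begin() + static_cast<std::ptrdiff_t>(d));
}

// Builds {a, b, c}, removes `erased`, and reports which name the
// previously stored descriptor `kept` refers to afterwards.
inline std::string name_after_erase(Descriptor erased, Descriptor kept) {
    std::vector<Vertex> vertices{Vertex{"a"}, Vertex{"b"}, Vertex{"c"}};
    erase_vertex(vertices, erased);
    return vertices.at(kept).name;
}
```

A descriptor that was stored for "b" (index 1) silently refers to "c" once index 0 is erased. Nothing crashes, which is arguably worse than a dangling iterator.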

Related

Why does a boost adjacency_list using vecS as OutEdgeList template parameter invalidate edges on traversal?

I'm reading the documentation about adjacency_list and the impact choosing the graph type has on memory consumption, big-O runtime, and descriptor stability.
Quoting the first link:
The following table summarizes which operations cause descriptors and iterators to become invalid. In the table, EL is an abbreviation for OutEdgeList and VL means VertexList. The Adj Iter category includes the out_edge_iterator, in_edge_iterator, and adjacency_iterator types.
A more detailed description of descriptor and iterator invalidation is given in the documentation for each operation.
I understand that if one uses vecS for the VertexList that on remove_vertex() all vertex descriptors will be adjusted so that they are still continuous.
I do not understand why, apparently, using vecS causes the edge iteration to invalidate edge descriptors.
Do I understand the table correctly in that it is saying "If you use vecS for the Edge List type on an adjacency_list graph that is directedS then you can not stably iterate over the edges because iterating over them will invalidate the edge iterators and edge descriptors"?
If I do understand this correctly, please explain why this is the case. If I misunderstand, please clarify the actual effect using vecS for the Edge List has.
Thank you.
You’re misreading it, as you suspected.
The confusion is that the column headings “Edge Iter”, “Vertex Iter” and “Adj Iter” abbreviate “Iterator”, not “iteration”.
The mere act of iteration never changes the adjacency_list, so it cannot invalidate descriptors or iterators.
There are graph models where the descriptors are more stable than the iterators. That’s actually the key reason for the descriptor concepts. Any container selector with reference stability (i.e. node-based containers) will naturally have iterator stability (only invalidating iterators to erased nodes).
The table is useful because performance on “immutable” (or change-rarely, query-often) graphs can be greatly enhanced by choosing contiguous storage (like vecS), and such storage naturally imposes more restrictive invalidation rules (e.g. vectors may reallocate, which invalidates all iterators, but the descriptor might remain stable up to the index of modification/insertion).
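The reallocation point can be illustrated with a plain std::vector, which is what contiguous storage boils down to. This is a sketch of the container behaviour only, not of BGL; force_reallocation and read_by_index_after_growth are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Appends elements until the vector has to move to a larger buffer.
// Every iterator obtained before this call is invalid afterwards.
inline void force_reallocation(std::vector<int>& values) {
    const std::size_t old_capacity = values.capacity();
    while (values.capacity() == old_capacity) {
        values.push_back(0);
    }
}

// An index recorded before the growth still names the same element,
// because appending at the end does not shift earlier positions.
inline int read_by_index_after_growth() {
    std::vector<int> values{10, 20, 30};
    const std::size_t index = 1;  // plays the role of a descriptor
    force_reallocation(values);
    return values[index];
}
```

The index survives because only positions at or after the point of modification can change, while iterators point into the old buffer and must be re-acquired.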
Tip
To get a rough compile-time check on basic invalidation issues, consider taking your graph by const reference. That will eliminate the off-chance of unintended modifications. Of course, in some cases you really want to walk the edge (for performance) and perform modifications to the graph, and then you’ll have to peruse the documentation to see exactly which invalidation rules apply to that modification.

Why does the Boost Graph Library invalidate all iterators when removing a vertex?

In the Boost Graph Library documentation it says that when you remove a vertex from a graph (when its vertices are stored in a vector at least), all iterators (and descriptors) are invalidated.
This surprised me, as it does not seem to be semantically necessary.
Is there a way to make adjacency_list work in a way that doesn't aggressively invalidate iterators in such a case? Can't I somehow just 'invalidate' the vertex and garbage-collect it at some convenient time?
It's just the underlying container semantics: Iterator invalidation rules
It's also clearly a trade-off: you get O(1) vertex indexing for free.
If you want something else, use a different container selector (e.g. listS ) for the vertex container.
Isn't it (generally) better to just 'invalidate' the vertex in the vector and garbage-collect it when the next resize is necessary?
It's indeed a common pattern to mark vertices as deleted. boost::filtered_graph<> is a handy adaptor for such cases.
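The mark-as-deleted pattern can be sketched without BGL as a vector with a parallel "alive" flag. This is a hand-rolled analogue of the idea, not how boost::filtered_graph<> is implemented; LazyVertexStore and its members are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical vertex store that marks vertices as deleted instead of
// erasing them, so indices handed out earlier keep their meaning.
class LazyVertexStore {
public:
    std::size_t add(std::string name) {
        names_.push_back(std::move(name));
        alive_.push_back(true);
        return names_.size() - 1;
    }

    void mark_deleted(std::size_t v) { alive_.at(v) = false; }
    bool is_alive(std::size_t v) const { return alive_.at(v); }
    const std::string& name(std::size_t v) const { return names_.at(v); }

    // Visits only live vertices; this plays the role of the filtered view.
    template <typename Visitor>
    void for_each_alive(Visitor visit) const {
        for (std::size_t v = 0; v < names_.size(); ++v) {
            if (alive_[v]) visit(v);
        }
    }

    std::size_t live_count() const {
        std::size_t n = 0;
        for_each_alive([&n](std::size_t) { ++n; });
        return n;
    }

private:
    std::vector<std::string> names_;
    std::vector<bool> alive_;
};
```

Compacting the store later (the garbage-collection step from the question) would renumber the survivors, so it invalidates stored indices in exactly the way an immediate erase does; the pattern only lets you choose when that happens.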

BGL: storing vertex descriptors in a way that they won't invalidate

The question is about the Boost Graph Library.
Suppose that we store an object with each vertex of the graph and there is a one-to-one correspondence between the vertices and the objects. Suppose further that we maintain an std::map to enable looking up a vertex descriptor that corresponds to a given object.
However, this solution seems to be prone to invalidation of vertex descriptors in the case of a vertex being deleted. Is there a way to get around this problem?
In this question, the following sentence appears:
I want to store vertex descriptors in a way that they wont invalidate in case I remove a vertex, so I use boost::listS
It seems like the author of that question has a solution to the problem of vertex invalidation, but I do not understand what it is.
EDIT: to clarify the reason for maintaining a map. The need to look up a vertex based on an object arises in the following scenario. Suppose that we have generated an object. We need to look up a vertex that corresponds to an object that's equal (in the sense of operator==) to an object that we've just generated.
Using a listS or setS for the vertex container selector makes the invalidation guarantees equal to the corresponding standard library container (Iterator invalidation rules).
In general, the node-based containers do not invalidate any iterators except those that refer to removed elements.
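Since listS gives the guarantees of the corresponding standard container, the object-to-vertex map from the question can be modelled with std::list and std::map. This is a sketch of the container semantics only, with std::list standing in for the vertex storage; Registry, VertexList and VertexHandle are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <map>
#include <string>

using VertexList = std::list<std::string>;
using VertexHandle = VertexList::iterator;  // stands in for a descriptor

// Keeps a lookup from object value to its handle in a node-based list.
class Registry {
public:
    VertexHandle add(const std::string& object) {
        VertexHandle it = vertices_.insert(vertices_.end(), object);
        lookup_[object] = it;
        return it;
    }

    // Erasing one node leaves every other stored handle valid.
    bool remove(const std::string& object) {
        auto found = lookup_.find(object);
        if (found == lookup_.end()) return false;
        vertices_.erase(found->second);
        lookup_.erase(found);
        return true;
    }

    // Returns nullptr when no vertex holds an equal object.
    const std::string* find(const std::string& object) const {
        auto found = lookup_.find(object);
        if (found == lookup_.end()) return nullptr;
        return &*found->second;
    }

    std::size_t size() const { return vertices_.size(); }

private:
    VertexList vertices_;
    std::map<std::string, VertexHandle> lookup_;
};
```

Only the map entry for the removed object has to be dropped; the handles stored for all other objects keep pointing at the right nodes.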
At this point I would like to suggest bundled properties too, where you don't have to maintain a mapping from object to vertex descriptor/index at all.
Sample (coming: https://www.livecoding.tv/sehe/)

Implementing a set cover data structure

I want to implement a data structure which represents the abstract data type "cover of a set". The elements of the set are represented by integer indices, and so are the subsets. Each element uint64_t e is assigned to at least one but possibly multiple subsets uint64_t s. This can be implemented by storing the subset indices in a std::vector. The number of subsets to which any element will be assigned is usually much smaller than the total number of elements.
Performance (time and memory) is important, so which implementation would you recommend?
std::vector<std::vector<uint64_t>>
std::vector<std::unordered_set<uint64_t>>
std::vector<std::set<uint64_t>>
anything else?
Frequent operations include:
assigning an element to a subset
removing an element from a subset (and possibly moving it to another)
checking whether an element is a member of a particular subset
getting all the subsets to which an element belongs
efficient iteration over all elements of a particular subset would be nice, but I believe this conflicts with other goals
The fastest (in terms of implementation time) would be a pair of data structures:
std::vector< std::unordered_set<X> > set_to_elements;
std::unordered_map< X, std::unordered_set<std::size_t> > element_to_sets;
with the two kept coherent. Boost.MultiIndex containers may be able to do this bidirectional bookkeeping a bit more efficiently.
assigning an element to a subset
set_to_elements[subset].insert(element);
element_to_sets[element].insert( subset );
removing an element from a subset (and possibly moving it to another)
set_to_elements[subset].erase(element);
element_to_sets[element].erase( subset );
checking whether an element is a member of a particular subset
return set_to_elements[subset].find(element) != set_to_elements[subset].end();
or
return element_to_sets[element].find(subset) != element_to_sets[element].end();
getting all the subsets to which an element belongs
return element_to_sets[element];
efficient iteration over all elements of a particular subset would be nice, but I believe this conflicts with other goals
return set_to_elements[subset];
All operations are average constant time (the containers are hash-based) and linear memory. Memory and time requirements are roughly double those of a more compact design that supports only one of the last two operations above.
Micro-optimizations such as caching the result of [] operations should be done in real code if it is actually performance sensitive. Storing iterators from one container into the other, to speed up operations #1 and #2, is optional and might make them a touch faster, but I wouldn't bother.
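The fragments above can be gathered into one class that keeps both structures coherent. This is a minimal sketch under a few assumptions: elements are std::uint64_t as in the question, the number of subsets is fixed at construction, and the class and member names are hypothetical. Unlike the snippets above it uses find() for lookups, because operator[] on an unordered_map inserts an empty entry for unknown keys.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <vector>

class SetCover {
public:
    using Element = std::uint64_t;
    using Subset = std::size_t;

    explicit SetCover(std::size_t subset_count)
        : set_to_elements_(subset_count) {}

    // Assigns element e to subset s in both directions.
    void assign(Element e, Subset s) {
        set_to_elements_.at(s).insert(e);
        element_to_sets_[e].insert(s);
    }

    // Removes e from s and drops the reverse entry once it is empty.
    void remove(Element e, Subset s) {
        set_to_elements_.at(s).erase(e);
        auto it = element_to_sets_.find(e);
        if (it == element_to_sets_.end()) return;
        it->second.erase(s);
        if (it->second.empty()) element_to_sets_.erase(it);
    }

    bool contains(Element e, Subset s) const {
        return set_to_elements_.at(s).count(e) != 0;
    }

    // Returns a copy; the result is empty for an unknown element.
    std::unordered_set<Subset> subsets_of(Element e) const {
        auto it = element_to_sets_.find(e);
        if (it == element_to_sets_.end()) return {};
        return it->second;
    }

    const std::unordered_set<Element>& elements_of(Subset s) const {
        return set_to_elements_.at(s);
    }

private:
    std::vector<std::unordered_set<Element>> set_to_elements_;
    std::unordered_map<Element, std::unordered_set<Subset>> element_to_sets_;
};
```

Moving an element is then remove() followed by assign(), and both membership directions stay in sync because no caller touches the two containers directly.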
You could try reading the paper Segmented Iterators and Hierarchical Algorithms by Matt Austern. He discusses how to efficiently process hierarchical data structures of the form container<container<T>>. One problem that needs to be solved is iterating as if you have a flat container<T>. For this, the Standard Library algorithms need to be specialized for so-called segmented iterators.
A segmented iterator is a two-level data structure that, apart from performing top-level iteration, also contains a local iterator to go one level deeper. Because these local iterators themselves can also be segmented iterators, this allows arbitrarily nested data structures (such as trees and graphs).
A set cover for a discrete set can be constructed as std::vector<std::set<T>>. Applying STL algorithms to such a container is either cumbersome or requires segmented iterators and hierarchical algorithms. Unfortunately, neither the Standard Library nor Boost actually implements this, so you have some work to do there.
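As a rough illustration of the hierarchical idea, the simplest such algorithm is a two-level for_each: an outer loop over segments and an inner loop over each segment's local range. This is not a segmented iterator in Austern's sense, only a sketch of the traversal it would enable; for_each_flat is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Visits every value of a container-of-containers as if it were flat.
template <typename Outer, typename Visitor>
void for_each_flat(const Outer& segments, Visitor visit) {
    for (const auto& segment : segments) {   // top-level iteration
        for (const auto& value : segment) {  // local iteration in one segment
            visit(value);
        }
    }
}
```

For a cover, an element that belongs to several subsets is visited once per subset, so callers that need each element exactly once still have to deduplicate.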

STL Container, Speed of Creation/Destruction

Background:
I am creating an efficient (hopefully) collision detection system for my game engine; it has introduced a slight problem when I place large numbers of objects on the screen. My problem is this:
I will be adding and removing objects regularly and I have several manager classes that keep track of the objects at any given time which means a lot of adding and removing these objects from containers. I've been using vectors and deques for most of this, which is fine, however I would greatly like to upgrade the core speed of the system.
Thus the question: Which container ((STL or not) [preferably the former]) gives me the quickest (order doesn't matter) addition, removal, and random access of elements?
I have been thinking that I'll use a set, I will iterate through the elements, though not as often as I'll be utilizing the other three functions.
Additional Info: essentially I'm splitting my screen into a grid of undefined size, and when an object moves I'm going to find the square that the upper left corner is currently in, then the square for the lower right corner (assuming the object is squareish, of course), so I'll know all current grid positions the object occupies. When I do collision detection, I'll only run checks on the grid positions with more than one object, so checking for collisions will hopefully be much faster than in my previous system =]
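The corner-to-cell mapping described here might look roughly like the following. It is a sketch under assumptions that the question does not state: cells are square with a fixed cell_size, screen coordinates grow to the right and downwards, and Box, Cell, cell_at and cells_covered are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Axis-aligned bounding box in screen coordinates (y grows downwards).
struct Box {
    float left, top, right, bottom;
};

struct Cell {
    int col, row;
};

// Maps a point to the grid cell that contains it.
inline Cell cell_at(float x, float y, float cell_size) {
    return Cell{static_cast<int>(std::floor(x / cell_size)),
                static_cast<int>(std::floor(y / cell_size))};
}

// Lists every cell between the upper-left and lower-right corners.
inline std::vector<Cell> cells_covered(const Box& box, float cell_size) {
    const Cell first = cell_at(box.left, box.top, cell_size);
    const Cell last = cell_at(box.right, box.bottom, cell_size);
    std::vector<Cell> cells;
    for (int row = first.row; row <= last.row; ++row) {
        for (int col = first.col; col <= last.col; ++col) {
            cells.push_back(Cell{col, row});
        }
    }
    return cells;
}
```

An object is then registered in each returned cell, and only cells holding more than one object need a pairwise check.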
std::set is unlikely to offer better performance: it is a node-based container, so each element requires a dynamic allocation, which can prove expensive.
Consider sticking with std::vector: it offers constant-time random access to all elements in the sequence and constant-time insertion and removal at the end of the sequence.
Since you say that order does not matter, you can also get constant-time removal of any element from the middle of the sequence by moving the element from the end of the sequence to have it replace the element being removed; something like this:
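void remove_element(std::vector<Entity>& v, std::vector<Entity>::iterator it)
{
    // Precondition: it points at an element of v, so v is not empty.
    std::vector<Entity>::iterator last_element_it = v.end() - 1;
    if (it != last_element_it) {
        // Order does not matter, so fill the hole with the last element.
        using std::swap;
        swap(*it, *last_element_it);
    }
    // Erasing the last element is constant time: nothing has to shift.
    v.erase(last_element_it);
}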
Which container ((STL or not) [preferably the former]) gives me the quickest (order doesn't matter) addition, removal, and random access of elements?
None of them. You have to pick which things you want.
Addition to a std::vector at the end is fast, as is removal from the end. But insertion/removal anywhere else will hurt at least somewhat.
Insertion and removal from a std::list will be very fast no matter where, but there's no random access.
A std::deque has std::vector-like insertion and removal from the beginning as well as the end, and it does have random access.
My question is this: how often do you need random access for a list of collision objects? Won't most of your operations be iterating through the list (for each object do X)? If so, I'd go for a std::list.
Alternatively, you could use a std::map, where the map's key is some kind of unique entity ID. This will give you slower insertion/deletion than std::list (due to the needs of a balanced binary tree), but you will get the ability to access an entity by identifier reasonably quickly, which can be important.
A std::map is probably halfway between a std::vector/deque and a std::list in this regard. Slower insertion/deletion than a list, slower random access than a vector/deque, but you do get some of both.
That being said, I highly doubt that this kind of optimization will be terribly useful to you. How many of these objects are you going to have, maybe a few thousand? How often are you touching them? Do you really think that the kind of container you use will be a significant factor in your collision system's performance?
Get your collision algorithms optimized before bothering with the container for them.