Should I use std::list or is there a better method? - c++

I am currently coding a 2D geometry editor in C++. I am having the user place nodes. Lines and arcs can be drawn by selecting 2 nodes.
Right now, I am storing the nodes in a std::deque container (same thing for the lines and arcs) because I would like to store the address of the node into each line/arc. This makes things very convenient coding wise when I implement a feature to move the node. If I were to store the actual node inside of each line/arc, then when I want to move a node, I would have to iterate through the entire line and arc structure to find the node that I just moved and update the parameters. That option is off the table. Hence, the need to be able to store the address of the node inside each line/arc.
However, I am running into an issue when I need to delete a node. Looking at the reference manual, it seems that all pointers and references are invalidated when you erase an element from the deque (unless that element is at the beginning or the end; for the sake of discussion, I will not be considering this case). This causes issues with erasing because now, all of my lines/arcs reconnect themselves to different nodes or are not drawn at all when a node is erased, and the program eventually crashes.
Continuing to look online, I come across std::list which (from my understanding of reading the documentation) does not invalidate any pointers or references when one of the elements is erased. This seems to be a very nice solution to my problem.
However, I have been looking a little bit on Stack Overflow to see what the benefits/disadvantages of using a list vs a deque are. And it seems like there is more of a preference to use a deque than a list. It seems as though the list is slower to access than the deque. This is not good because I am not sure how many nodes a user would like to draw. For all I know, there could be 10,000+ nodes in the geometry and if the user wants to move a node, I don't want the user to have to wait 30 sec for the program to iterate through all of the elements to find the node(s) to erase.
So on one hand, deques are a lot faster, but as soon as an element is removed, all of the pointers and references are invalidated. On the other hand, std::list allows me to erase whatever element I want without invalidating any of the pointers and references, but is slower compared to a deque.
I am considering switching to a list because, even if the list is slower, the deque's speed isn't much of a benefit if I can't erase an element without invalidating the pointers and references and the program doesn't work.
However, is using a list the best choice in my situation? Is there any way to use a deque? Or is there a third option that I haven't considered?
Edit:
I forgot to mention. One thing that I am not too fond of with lists is the inability to get an element's data directly (in std::deque and vector, I can use the at function to access elements). This isn't a huge deal breaker with my code. But it does make things convenient. For example, when a user selects a node when they want to create a line/arc, the code iterates over the entire node list to find out which one was selected and then, for the first selection, stores the index into a variable (called firstNodeIndex). For the second node, it does the same thing, but when both variables (firstNodeIndex and secondNodeIndex) are viable numbers, the function for creating the line/arc is called and the function uses the two stored indexes to re-access the node list to grab the address of each node. If I were to use the list, I would have to store the addresses of the two nodes in variables and then create some additional logic to make sure that the two variables containing the addresses of the two nodes are viable options.
Another alternate solution would be to reiterate through the entire node list again to grab the nodes that are selected (I would have a variable inside each node to indicate that it is selected). But I am afraid that this might not be a good idea given std::list limitations.
I am kind of in favor of my first way but I am open to change if need be or if there is a better method

So your problem is that you don't want your iterators invalidated when you insert or erase elements, but you want your data structure to be fast.
A linked list is only slow when you have to iterate over all elements frequently. It does not take advantage of contiguous data access like vector or deque. Also, linear search in a list is slow.
I had similar situations. Here are some options:
Use list and try to avoid linear searches. See if memory access speed of linked list affect your performance significantly and if it doesn't - use it.
Use map or set. Same cons as list except search, which is O(log n). Or you can use the unordered versions if you don't care about sorting elements.
Use non-standard data structure like plf::colony. If you don't care about order of insertion, this is probably your best option.
Create your own deque-like data structure that does not invalidate iterators (using skipfields or storing free elements somewhere). I wouldn't recommend it since you will probably end up writing something like plf::colony anyway.
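As a minimal sketch of the first option, assuming hypothetical Node and Line types (the question does not show its own), this is roughly how lines can keep raw pointers into a std::list of nodes while other nodes are erased. The eraseNode helper is made up for illustration and does a linear search; storing a std::list<Node>::iterator instead of a pointer would avoid that search.

```cpp
#include <cassert>
#include <list>

// Hypothetical types for illustration; the question does not show its own.
struct Node {
    double x;
    double y;
};

struct Line {
    Node* first;   // non-owning pointers into the node list
    Node* second;
};

// Erases the node stored at the given address and reports whether it was found.
// Pointers to all other nodes stay valid, because std::list never relocates
// its elements when another element is erased.
inline bool eraseNode(std::list<Node>& nodes, const Node* target) {
    for (auto it = nodes.begin(); it != nodes.end(); ++it) {
        if (&*it == target) {
            nodes.erase(it);
            return true;
        }
    }
    return false;
}
```

Any line that refers to the erased node still has to be removed or updated by the caller; the list only guarantees that the remaining nodes do not move.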

A rule of thumb:
will I want to add and delete items at random?
set, list, map, multimap and unordered versions of same
will I want to be able to name individual items and find them quickly?
map, set, multimap and unordered versions of same
does the thing I am storing have mutable data, or is it more detailed than just its name (key)?
map, multimap, unordered versions thereof
do I need the items to stay in order?
yes: map, no: unordered_map

Related

Can I create an iterator by copying the structure to a list and returning iterator of the list?

If I have a somewhat complex structure (such as hash table with chaining) and I want to create a custom iterator for the structure, is it valid to copy the contents of the complex structure into some sort of simple structure (such as a list) and then return the implicit iterator over the simple structure?
I realize it would take extra memory but are there any other reasons why I shouldn't just do that as opposed to creating my own iterator from a scratch?
Ultimately, yes you can do this if you don't need to edit elements in the original collection via your iterator.
You identify the memory issue; are there other reasons you shouldn't do this? There's the time taken to create the list. You'd either need to recreate this list copy every time you want to iterate or you'd have to make sure you keep the list up-to-date if the original collection can change.
That cost is particularly unfortunate if you wanted to use your iterator to do something like find the first element that meets some rule. If the first element meets the rule but there are a large number of elements then you end up doing a lot of copying in order to eventually only iterate up to the first element.
You can however write your own iterator to do the same job as your nested loops. It's hard to give a decent code example without knowing the structure you're trying to iterate, but in general you're likely to implement it using a class that holds an iterator over the elements within a subcollection. This is advanced until the current subcollection has been fully iterated, and then moves on to the start of the next subcollection. So your iterator also has an iterator over the collections, i.e. 2 iterators: one returns an element and one returns a (sub)collection.
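The two-iterator idea can be sketched in C++ as follows. The question does not name a language or a structure, so this assumes a hypothetical Buckets type (a vector of vectors of int) standing in for something like a chained hash table; FlatIterator, flatBegin and flatEnd are illustrative names, and the class is deliberately minimal rather than a full standard-conforming iterator.

```cpp
#include <cassert>
#include <vector>

// Hypothetical nested structure: a vector of buckets, each a vector of ints.
using Buckets = std::vector<std::vector<int>>;

class FlatIterator {
public:
    FlatIterator(const Buckets& buckets, Buckets::const_iterator outer)
        : end_(buckets.end()), outer_(outer), inner_() {
        if (outer_ != end_) {
            inner_ = outer_->begin();
            skipExhausted();
        }
    }

    int operator*() const { return *inner_; }

    FlatIterator& operator++() {
        ++inner_;
        skipExhausted();
        return *this;
    }

    bool operator!=(const FlatIterator& other) const {
        if (outer_ != other.outer_) return true;
        // Same bucket: differ only if not at the end and on different elements.
        return outer_ != end_ && inner_ != other.inner_;
    }

private:
    // When the current bucket is exhausted, move to the start of the next
    // non-empty bucket, or stop at the end of the outer collection.
    void skipExhausted() {
        while (outer_ != end_ && inner_ == outer_->end()) {
            ++outer_;
            if (outer_ != end_) inner_ = outer_->begin();
        }
    }

    Buckets::const_iterator end_;             // end of the outer collection
    Buckets::const_iterator outer_;           // current (sub)collection
    std::vector<int>::const_iterator inner_;  // current element in it
};

inline FlatIterator flatBegin(const Buckets& b) { return FlatIterator(b, b.begin()); }
inline FlatIterator flatEnd(const Buckets& b) { return FlatIterator(b, b.end()); }
```

A loop such as for (auto it = flatBegin(b); it != flatEnd(b); ++it) then visits every element without copying anything, and it can stop early as soon as the first matching element is found.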

C++/STL Structure for Indexed Linked List (Indices in Hash Table)

I'm looking for a way to remember locations in a doubly-linked list (in hash tables or other data structures).
In C, I would add prev and next pointers to my struct. Then, I could store references to elements of my struct wherever I wanted, and refer to them later. I need only maintain these prev/next pointers to manipulate my linked list, and stored references to locations in the list will stay updated.
What is the C++ approach to this problem?
The end goal is a data structure which is sequenced but not ordered (i.e. no comparison function exists, but elements are relatively sequenced based on where they are inserted). I need to cheaply insert, delete and move objects as the structure grows. But I also need to cheaply look up each element by some key unrelated to the ordering, and to look up meaningful locations (like head, tail, and various checkpoints in the structure called slices). I need to be able to traverse the sequenced list after looking up a starting place by key or by slice.
Head and tail will be free. I was planning a hash table that maps the keys to list elements, and another hash table that maps slices to list elements.
I asked a more specific question related to this here:
Using Both Map and List for Same Objects
The conclusion I made was that I would need to maintain both a List and various Maps pointing to the same data to get the performance I need. But doing this by storing iterators in C++ seemed subpar. Instead it seemed easier to reimplement linked list (building it into my class) and using STL maps to point to data.
I was hoping for some input about which is a more fruitful route, or if there is some third plan that better meets my needs. My assumption is that the STL implementation of unordered_map is faster than anything I would implement, but I could match or beat the performance of list since I'm only using a subset of its functionality.
Thanks!
More precise description of my data/performance requirements:
Data will come in with a unique key. I will add it into a queue.
I'll need to update/move/remove/delete this data in O(1) based on its unique key.
I'll need to insert new data/read data based on metadata stored in other data structures.
I was speaking imprecisely when I said very large list above. The list will definitely fit into memory. Space is cheap enough that it is worth using other data structures to index this list.
I understand your requirements as being:
the data has a unique key
update/move/remove/delete this data in constant time, using its unique key
According to this, the best fit would be unordered_map: it works with a key, and uses a hash table to access the elements. On average, insert, find and update are constant time (thanks to the hash table), unless the hash function is not appropriate (i.e. in the worst case, if all elements yield the same hash value, you would have linear time, as in a list, due to the collisions).
This seems also to match your initial intention:
Head and tail will be free. I was planning a hash table that maps the
keys to list elements, and another hash table that maps slices to list
elements.
Edit: If you need also to master sequencing of elements, independently of their key, you'd need to build a combined container, based on a list and an unordered_map which associates the key to an iterator to the element in the list. You'd then have to manage synchronisation, for example:
insert element: get iterator by inserting element into list, then add the iterator to the unordered_map using the element's key.
remove element: find iterator to element by searching for the key in the unordered_map, erase element in the list using this iterator, and finally erase the key in the unordered_map.
find element: find iterator to element by searching for the key in the unordered_map
sequential iteration: use the iterator to the begin of the list.
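The synchronisation steps above could look roughly like this. It is only a sketch: the class name IndexedSequence, the string key and the int payload are assumptions made for the example, not something from the question. It also adds a moveToBack operation based on std::list::splice, which relinks an element without invalidating the iterators stored in the map.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical combined container: the list keeps the sequence, and the
// unordered_map associates each key with the list iterator of its element.
class IndexedSequence {
public:
    using Item = std::pair<std::string, int>;  // (key, value)

    // Insert: add to the list, then record the iterator under the key.
    bool pushBack(const std::string& key, int value) {
        if (index_.count(key) != 0) return false;  // keys are unique
        items_.emplace_back(key, value);
        index_.emplace(key, std::prev(items_.end()));
        return true;
    }

    // Remove: look up the iterator by key, erase from the list, then the map.
    bool erase(const std::string& key) {
        auto found = index_.find(key);
        if (found == index_.end()) return false;
        items_.erase(found->second);
        index_.erase(found);
        return true;
    }

    // Move: splice relinks the node; the stored iterator stays valid.
    bool moveToBack(const std::string& key) {
        auto found = index_.find(key);
        if (found == index_.end()) return false;
        items_.splice(items_.end(), items_, found->second);
        return true;
    }

    // Find: returns a pointer to the value, or nullptr if the key is absent.
    int* find(const std::string& key) {
        auto found = index_.find(key);
        return found == index_.end() ? nullptr : &found->second->second;
    }

    // Sequential iteration: walk the list from its beginning.
    std::vector<std::string> keys() const {
        std::vector<std::string> out;
        for (const Item& item : items_) out.push_back(item.first);
        return out;
    }

private:
    std::list<Item> items_;
    std::unordered_map<std::string, std::list<Item>::iterator> index_;
};
```

A second map from slice names to list iterators could be added in the same way, as long as it is updated whenever the element it points to is erased.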
I'd route you to the STL containers to browse... but when you write the words 'very large' (and I'm currently a Big Data professional) everything changes.
Nobody usually gives you good advice for scalability, but ... here are some points.
What is 'very large' in your case? Does std::list fit your needs? Before the 3rd paragraph everything looks suitable if you are not too large. Does your structure fit in memory?
How well does your structure fit the memory manager? A simple C-like list with 'prev' and 'next' has a serious disadvantage: every element is usually allocated separately from the memory manager. If you are large, this matters and leads to memory over-usage.
What do you expect an external reference to an element to be? If you use pointers, you lose the ability to perform optimizations on your structure. But probably you don't need it.
Actually you definitely need to consider some 'pool' management if you are really large, and indices into such pools can be pretty good references if you modify your structure intensively.
Please think twice about 'large'. If you mean really large, you need a special solution, especially if your data is larger than your memory. If you are not so large, why not start with just std::list? Once you answer this question, your life could probably be much easier ;-).

What container should I use for the game objects that are created and deleted frequently?

I am doing a game where I create objects and kill them frequently. I must be able to loop the list of objects linearly, in a way that the next object is always newer than previous, so the rendering of the objects will be correct (they will overlap). I also need to be able to store the pointers of each object into a quadtree, to quickly find nearby objects.
My first thought was to use std::list, but I have never done anything like this before, so I am looking for experts' thoughts about this.
What container should I use?
Edit: I am not just deleting from the front: the objects can be killed at any order, but they are always added in the end of the list, so last item is newest.
std::vector is the recommended container to start with when you're not sure what you're doing. Only when you know that's not going to work for you should you choose something else.
That said, if you're regularly adding to the back of the container and deleting from the front, you probably want std::deque. [Edit] But it appears that's not what you're doing.
I'm thinking you might want two containers, one to maintain the insert order and one for your quadtree. There are lots of quadtree implementations on the Internet, so I'll focus on the other one. Using std::list as you suggest will make the delete operation itself faster than vector or deque. It also has the advantage of letting you store iterators, because list won't invalidate the other iterators when an element is removed. Your objects in the quadtree could maintain an iterator into the insert order list. When you remove an element from the quadtree, you can remove it from the list too in O(1) time.
As always, the decision about which container to use is all about tradeoffs. A list comes with increased memory footprint over vector and the loss of contiguous memory layout. You might be surprised how much cache locality matters when your data set is large. The only way to know for sure is to try various containers and see which one runs the best for your application.
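As a rough sketch of the iterator-storing approach described in this answer: the GameObject type and the spawnObject, killObject and drawOrder helpers below are hypothetical names, and the quadtree itself is left out. The handle returned on creation is what a quadtree entry would hold.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Hypothetical object type for illustration.
struct GameObject {
    int id;
};

using ObjectList = std::list<GameObject>;
// A list iterator stays valid until its own element is erased, so it can
// be stored elsewhere (for example in a quadtree entry) as a handle.
using Handle = ObjectList::iterator;

// New objects always go at the end, so the list stays oldest-to-newest.
inline Handle spawnObject(ObjectList& objects, int id) {
    return objects.insert(objects.end(), GameObject{id});
}

// O(1) removal from any position through the stored handle.
inline void killObject(ObjectList& objects, Handle handle) {
    objects.erase(handle);
}

// Linear pass in insertion order, as needed for rendering.
inline std::vector<int> drawOrder(const ObjectList& objects) {
    std::vector<int> ids;
    for (const GameObject& object : objects) ids.push_back(object.id);
    return ids;
}
```

Whether this beats a std::vector with occasional compaction depends on the object count and access pattern, which is why profiling is suggested above.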
I think boost::stable_vector fits your needs for deletion/iteration.
So, you want to be able to iterate through your container in the order in which the items have been added, but you want to be able to remove items from any point in the container. A simple queue obviously isn't going to hack it.
Happily, there are 4 containers that will do this job easily enough: std::vector, std::list, std::deque and std::set. If you use standard container idioms (e.g. begin, end, erase, insert, and to a lesser extent, push_back, pop_back, front, back) you can use whichever container you feel like. With those 8 operations, you could switch between std::vector, std::list and std::deque, and with just the first 4 you could use std::set, too (bearing in mind that std::set iterates in key order, so the key would have to reflect insertion order). Write your code, and then you can easily chop and change between the different container types and do a little profiling to compare performance and memory overheads or whatever.
Intuitively, std::list is probably a good bet, and perhaps std::set would work too. But rather than making assumptions, just use the general tools the template library gives you, and profile and optimise things later when you have some meaningful performance data to work with.

Removing specific node(s) from STL List

I need a list of items (some object) to be maintained with the following operations supported:
Insertion at end
Removal from any pos
STL list seems to be the right choice. However, how can I do the 2nd operation in constant time? I could save pointer to each Node somewhere and delete them directly, but erase takes an iterator so that wouldn't work.
Example usage:
items = list<myobj>..
items.push_back(obj1)
items.push_back(obj2)
items.push_back(obj3)
items.remove(obj2) // <- How to do this in constant time.
If push_back somehow gave access to node, i could've used:
map[obj1] = items.push_back(obj1)
items.remove(map[obj1])
One option is to use iterator in the map, Is there any simpler way?
A compromise would be a red-black tree like most std::set implementations. It's O(log n) for insert, search, and delete. Trees are basically the ultimate compromise data structure. They're not amazing for anything, but they always get the job done well enough.
Otherwise, profile your code with both a linked list and a vector, and find out if resizing the vector is really as terrible as it could be (this'll depend on how often you do it).
There may be a better (inelegant, but very effective) solution that might just take care of your problems. You haven't mentioned how you'll be consuming the list, but if you can set aside a value as being unused, you can use a vector and simply mark an element as deleted instead of actually deleting it. Or use a vector and have a separate bitset (or C++03 vector<bool>) telling you if an item is deleted or valid. To delete an item, you just have to flip its bit in the bitmap. When iterating, just remember to check the deletion bitmap before processing the content. If memory isn't an issue and only speed and ease/clarity matter, replace the vector<object> with vector<pair<bool, object>> where the bool is the deletion flag.
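A minimal sketch of the mark-as-deleted idea, using a separate vector<bool> as the deletion bitmap. FlaggedVector and its member names are made up for this example, and std::string stands in for the stored object type.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical wrapper: items are never physically erased, only flagged.
class FlaggedVector {
public:
    // Appends an item and returns its index. The index stays valid for the
    // lifetime of the container because elements are never shifted.
    std::size_t pushBack(const std::string& item) {
        items_.push_back(item);
        deleted_.push_back(false);
        return items_.size() - 1;
    }

    // Constant-time "removal": flip the flag for that index.
    void remove(std::size_t index) { deleted_.at(index) = true; }

    bool isLive(std::size_t index) const { return !deleted_.at(index); }

    // Iteration has to check the flag before processing each element.
    std::vector<std::string> liveItems() const {
        std::vector<std::string> out;
        for (std::size_t i = 0; i < items_.size(); ++i) {
            if (!deleted_[i]) out.push_back(items_[i]);
        }
        return out;
    }

private:
    std::vector<std::string> items_;
    std::vector<bool> deleted_;  // deletion bitmap, parallel to items_
};
```

The trade-off is that flagged elements keep occupying memory and are still visited during iteration, so a long-running program would probably want to compact the vector from time to time.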
You might want to look at Boost.MultiIndex. This allows you to combine several indices over a data structure. So you can have constant time lookup and removal of any element, at the expense of linear space overhead.

Is there a data structure that doesn't allow duplicates and also maintains order of entry?

Duplicate: Choosing a STL container with uniqueness and which keeps insertion ordering
I'm looking for a data structure that acts like a set in that it doesn't allow duplicates to be inserted, but also knows the order in which the items were inserted. It would basically be a combination of a set and list/vector.
I would just use a list/vector and check for duplicates myself, but we need that duplicate verification to be fast as the size of the structure can get quite large.
Take a look at Boost.MultiIndex. You may have to write a wrapper over this.
A Boost.Bimap with the insertion order as an index should work (e.g. boost::bimap < size_t, Foo > ). If you are removing objects from the data structure, you will need to track the next insertion order value separately.
Writing your own class that wraps a vector and a set would seem the obvious solution - there is no C++ standard library container that does what you want.
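Such a wrapper might look like the sketch below. The class name InsertionOrderedSet is hypothetical, std::string stands in for the element type, and std::unordered_set is used here for the duplicate check on the assumption that the elements are hashable; std::set would work the same way with O(log n) checks.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical wrapper: the set answers "have I seen this?" quickly,
// and the vector remembers the order of first insertion.
class InsertionOrderedSet {
public:
    // Returns true if the value was new and has been appended.
    bool insert(const std::string& value) {
        if (!seen_.insert(value).second) return false;  // duplicate
        order_.push_back(value);
        return true;
    }

    bool contains(const std::string& value) const {
        return seen_.count(value) != 0;
    }

    // Elements in the order they were first inserted.
    const std::vector<std::string>& items() const { return order_; }

private:
    std::unordered_set<std::string> seen_;
    std::vector<std::string> order_;
};
```

This version stores every value twice and does not support removal; erasing from the middle of the vector would be linear, which is where the list-based designs in the other answers come in.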
Java has this in the form of an ordered set. I don't think C++ has this, but it is not that difficult to implement yourself. What the Sun guys did with the Java class was to extend the hash table such that each item was simultaneously inserted into a hash table and kept in a doubly linked list. There is very little overhead in this, especially if you preallocate the items that are used to construct the linked list.
If I were you, I would write a class that either uses a private vector to store the items or implements a hash table itself. When an item is to be inserted into the set, check to see if it is in the hash table and optionally replace the item in there if such an item is in it. Then find the old item in the hash table, update the list to point to the new element and you are done.
To insert a new element you do the same, except you have to use a new element in the list - you can't reuse the old ones.
To delete an item, you reorder the list to point around it, and free the list element.
Note that it should be possible for you to get the part of the linked list where the element you are interested in is directly from the element so that you don't have to walk the chain each time you have to move or change an element.
If you anticipate having a lot of these items changed during the program run, you might want to keep a list of the list items, such that you can merely take the head of this list, rather than allocating memory each time you have to add a new element.
You might want to look at the dancing links algorithm.
I'd just use two data structures, one for order and one for identity. (One could point into the other if you store values, depending on which operation you want the fastest)
Sounds like a job for an OrderedDictionary.
Duplicate verification that's fast seems to be the critical part here. I'd use some type of a map/dictionary maybe, and keep track of the insertion order yourself as the actual data. So the key is the "data" you're shoving in (which is then hashed, and you don't allow duplicate keys), and put in the current size of the map as the "data". Of course this only works if you don't have any deletions. If you need that, just have an external variable you increment on every insertion, and the relative order will tell you when things were inserted.
Not necessarily pretty, but not that hard to implement either.
Assuming that you're talking ANSI C++ here, I'd either write my own or use composition and delegation to wrap a map for data storage and a vector of the keys for order of insertion. Depending on the characteristics of the data, you might be able to use the insertion index as your map key and avoid using the vector.