Is it possible to directly access a position (not a key) in a multimap - c++

Question: How do you access a value at a specified position in a key range without loops?
The only possible way I know to acquire this data is by incrementing the iterator however many times the position is from the beginning or end of a key range.
Edit: The reason I am reluctant to use loops is to reduce processing time: when a value's position within a key range is already known, I would like to fetch it directly.

As correctly stated in the comments, you basically cannot do this on a multimap. Or on a map. Or on any container that does not support random access. The simple answer to the question "Why can't I do this?" is "Because it is not in the interface".
The longer answer requires a minimal understanding of how different containers are implemented. The elements of, say, a vector are stored contiguously in memory. Knowing the address of the i-th element, you can add k and obtain the address of the (i+k)-th element.
Maps (and multimaps) are different. To allow efficient associative access they use some kind of tree as the underlying data structure. The simplest is the binary search tree. All nodes of the tree are allocated on the heap, and you don't know where a node is until you actually access it. But you can reach a node only through other nodes.
What you can do is go through all the elements and store their addresses in a vector, so that they can be accessed "randomly". However, this vector is out of date as soon as an element is added to or removed from the map. There is no magic data structure that gives you both efficient associative access and efficient random access.
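The snapshot idea above can be sketched as follows. This is a minimal illustration, not a recommendation: snapshot_range is a hypothetical helper name, and it stores iterators rather than raw addresses. Building the snapshot still costs one linear pass over the key range; only the lookups afterwards are by index.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: collect the iterators of one key range so that the
// n-th value stored under that key can be reached by index afterwards.
template <typename K, typename V>
std::vector<typename std::multimap<K, V>::const_iterator>
snapshot_range(const std::multimap<K, V>& m, const K& key) {
    std::vector<typename std::multimap<K, V>::const_iterator> index;
    auto range = m.equal_range(key);       // O(log n) to locate the range
    for (auto it = range.first; it != range.second; ++it) {
        index.push_back(it);               // one linear pass, done once
    }
    return index;
}
```

After `auto idx = snapshot_range(m, key);`, `idx[2]->second` is the third value stored under `key`. The snapshot has to be rebuilt whenever the multimap gains or loses an element in that range, because the positions no longer line up.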

Related

How to implement a map with limited size

I would like to implement a map whose number of elements never exceeds a certain limit L. When the L+1-th element is inserted, the oldest entry should be removed from the map to free up space.
I found something similar: Data Structure for Queue using Map Implementations in Java with Size limit of 5. There it is suggested to use a linked hash map, i.e., a hash map that also keeps a linked list of all elements. Unfortunately, that is for Java, and I need a solution for C++. I could not find anything like this in the standard library or in the Boost libraries.
Same story here: Add and remove from MAP with limited size
A possible solution for C++ is given here, but it does not address my questions below: C++ how to mix a map with a circular buffer?
I would implement it in a very similar way to what is described there: a hash map to store the key-value pairs and a linked list, or a double-ended queue, of keys to keep the index of the entries. To insert a new value I would add it to the hash map and its key at the end of the index; if the size of the hash map at this point exceeds the limit, I would pop the first element of the index and remove the entry with that key from the hash map. Simple, and the same complexity as adding to the hash map.
Removing an entry requires an iteration over the index to remove the key from there, which has linear complexity for both linked lists and double-ended queues. (Double-ended queues also have the disadvantage that removing an element from the middle itself has linear complexity.) So it looks like the remove operation on such a data structure does not preserve the complexity of the underlying hash map.
The question is: Is this increase in complexity necessary, or are there clever ways to implement a limited map data structure so that both insertion and removal keep the same complexity?
Ooops, just posted and immediately realized something important. The size of the index is also limited. If that limit is constant, then the complexity of iterating over it can also be considered constant.
Well, the limit gives an upper bound to the cost of the removal operation.
If the limit is very high one might still prefer a solution that does not involve a linear iteration over the index.
I would still use an associative container to have direct access and a sequential one to allow easy removal of the older item. Let us look at the required access methods:
access to an element given its key => ok, the associative container allows direct access
add a new key-value pair
if the map is not full it is easy: push_back on the sequence container, and simple addition to the associative one
if the map is full, the above actions must still happen, but the oldest element must also be removed => front on the sequence container gives that element, and pop_front and erase remove it, provided the key is stored in the sequence container
remove an element given by its key => trivial to remove from an associative container, but only list allows constant-time removal of an element, provided you have an iterator to it. The good news is that removing or inserting an element in a list does not invalidate iterators pointing to other elements.
As you did not give any requirement for keeping keys sorted, I would use an unordered_map for the associative container and a list for the sequence one. The additional requirement is that the list must contain the key, and that the unordered_map must contain an iterator to the corresponding element in the list. The value can be in either container. As I assume that the major access will be the direct one, I would store the value in the map.
It boils down to:
a list<K> to allow identification of the oldest key
an unordered_map<K, pair<V, list<K>::iterator>>
It doubles the storage for keys and adds an extra iterator per entry. But keys are expected not to be too big, and a list::iterator normally contains little more than a pointer: this trades a small amount of memory for speed.
That should be enough to provide constant time
key access
insertion of a new item
key removal of an item
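A minimal sketch of the layout described above, assuming C++11. The class name BoundedMap and its member names are made up for illustration, and the choice that overwriting an existing key keeps its original age is an assumption, since the question does not specify it.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <unordered_map>
#include <utility>

// Sketch of a size-limited map: the list keeps keys in insertion order,
// the hash map stores the value plus an iterator to the key's list node.
template <typename K, typename V>
class BoundedMap {
public:
    explicit BoundedMap(std::size_t limit) : limit_(limit) { assert(limit > 0); }

    // Inserts or overwrites; evicts the oldest key once the limit is exceeded.
    void put(const K& key, const V& value) {
        auto found = map_.find(key);
        if (found != map_.end()) {           // assumption: existing key keeps its age
            found->second.first = value;
            return;
        }
        order_.push_back(key);
        map_.emplace(key, std::make_pair(value, std::prev(order_.end())));
        if (map_.size() > limit_) {
            map_.erase(order_.front());      // the oldest key sits at the front
            order_.pop_front();
        }
    }

    // Returns a pointer to the value, or nullptr if the key is absent.
    V* find(const K& key) {
        auto found = map_.find(key);
        return found == map_.end() ? nullptr : &found->second.first;
    }

    // Removes a key without traversing the list; returns false if absent.
    bool erase(const K& key) {
        auto found = map_.find(key);
        if (found == map_.end()) return false;
        order_.erase(found->second.second);  // constant time via stored iterator
        map_.erase(found);
        return true;
    }

    std::size_t size() const { return map_.size(); }

private:
    using KeyList = std::list<K>;
    std::size_t limit_;
    KeyList order_;
    std::unordered_map<K, std::pair<V, typename KeyList::iterator>> map_;
};
```

With a limit of 2, putting keys 1, 2 and 3 in that order leaves only 2 and 3 in the map. All three operations cost one hash lookup plus constant-time list work, so they are constant time on average.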
You may want to take a look at Boost.MultiIndex MRU example.

Should I use std::list or is there a better method?

I am currently coding in a 2D geometry editor in c++. I am having the user place nodes. Lines and arcs can be drawn by selecting 2 nodes.
Right now, I am storing the nodes in a std::deque container (same for the lines and arcs) because I would like to store the address of the node in each line/arc. This makes things very convenient coding-wise when I implement a feature to move a node. If I were to store the actual node inside each line/arc, then whenever I move a node I would have to iterate through the entire line and arc structure to find the node that I just moved and update its parameters. That option is off the table. Hence the need to store the address of the node inside each line/arc.
However, I am running into an issue when I need to delete a node. According to the reference manual, all pointers are invalidated when you erase an element from the deque (unless that element is at the beginning or the end; for the sake of discussion, I will not be considering this case). This causes problems with erasing, because all of my lines/arcs now reconnect themselves to different nodes or are not drawn at all when a node is erased, and the program eventually crashes.
Continuing to look online, I come across std::list which (from my understanding of reading the documentation) does not invalidate any pointers or references when one of the elements is erased. This seems to be a very nice solution to my problem.
However, I have been looking a bit on Stack Overflow to see what the benefits/disadvantages of using a list vs. a deque are, and there seems to be more of a preference for a deque than for a list. It seems as though the list is slower to access than the deque. This is not good because I am not sure how many nodes a user would like to draw. For all I know, there could be 10,000+ nodes in the geometry, and if the user wants to move a node, I don't want the user to have to wait 30 seconds for the program to iterate through all of the elements to find the node(s) to erase.
So on one hand, a deque is a lot faster, but as soon as an element is removed, all of the pointers and references are invalidated. On the other hand, std::list allows me to erase whatever element I want without invalidating any of the pointers and references, but is slower compared to a deque.
I am considering to switch to a list because even if the list is slower, if I can't erase an element without invalidating the pointers and references, then there isn't much of a benefit speed wise if the program doesn't work.
However, is using a list the best choice in my situation? Is there any way to use a deque? Or is there a third option that I haven't considered?
Edit:
I forgot to mention: one thing that I am not too fond of with lists is the inability to get an element's data directly (with std::deque and vector, I can use the at function to access elements). This isn't a huge deal breaker for my code, but it does make things convenient. For example, when a user selects a node in order to create a line/arc, the code iterates over the entire node list to find out which one was selected and then, for the first selection, stores the index in a variable (called firstNodeIndex). For the second node, it does the same thing, but when both variables (firstNodeIndex and secondNodeIndex) hold valid numbers, the function for creating the line/arc is called, and that function uses the two stored indexes to re-access the node list and grab the address of each node. If I were to use the list, I would have to store the addresses of the two nodes in variables and then create some additional logic to make sure that the two variables containing those addresses are valid.
Another alternate solution would be to reiterate through the entire node list again to grab the nodes that are selected (I would have a variable inside each node to indicate that it is selected). But I am afraid that this might not be a good idea given std::list limitations.
I am kind of in favor of my first way but I am open to change if need be or if there is a better method
So your problem is that you don't want your iterators invalidated when you insert or erase an element, but you want your data structure to be fast.
A linked list is only slow when you have to iterate over all elements frequently. It does not take advantage of contiguous memory access the way vector or deque does. Linear search in a list is also slow.
I had similar situations. Here are some options:
Use list and try to avoid linear searches. See if the memory access speed of a linked list affects your performance significantly, and if it doesn't, use it.
Use map or set. Same cons as list except for search, which is O(log n). Or you can use the unordered versions if you don't care about keeping the elements sorted.
Use non-standard data structure like plf::colony. If you don't care about order of insertion, this is probably your best option.
Create your own deque-like data structure that does not invalidate iterators (using skipfields or storing free elements somewhere). I wouldn't recommend it since you will probably end up writing something like plf::colony anyway.
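As a sketch of the first option, here is one way the editor's data could look with std::list, assuming lines hold list iterators to their end nodes instead of raw addresses. Node, Line and Geometry are hypothetical names, not taken from the question's code.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <vector>

struct Node { double x; double y; };

// A list iterator stays valid until the element it refers to is erased.
using NodeRef = std::list<Node>::iterator;

struct Line { NodeRef first; NodeRef second; };

struct Geometry {
    std::list<Node> nodes;
    std::vector<Line> lines;

    NodeRef add_node(double x, double y) {
        return nodes.insert(nodes.end(), Node{x, y});
    }

    void add_line(NodeRef a, NodeRef b) { lines.push_back(Line{a, b}); }

    // Erases a node and every line attached to it. Lines between other
    // nodes keep working because their iterators are not invalidated.
    void erase_node(NodeRef target) {
        lines.erase(std::remove_if(lines.begin(), lines.end(),
                                   [target](const Line& l) {
                                       return l.first == target || l.second == target;
                                   }),
                    lines.end());
        nodes.erase(target);  // constant time, no search through the list
    }
};
```

Moving a node is then just `ref->x = ...;` through whichever handle was stored at selection time, with no index lookup. Removing the attached lines is still linear in the number of lines in this sketch.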
A rule of thumb:
will I want to add and delete items at random?
set, list, map, multimap and unordered versions of same
will I want to be able to name individual items and find them quickly?
map, set, multimap and unordered versions of same
does the thing I am storing have mutable data, or is it more detailed than just its name (key)?
map, multimap, unordered versions thereof
do I need the items to stay in order?
yes: map, no: unordered_map

C++/STL Structure for Indexed Linked List (Indices in Hash Table)

I'm looking for a way to remember locations in a doubly-linked list (in hash tables or other data structures).
In C, I would add prev and next pointers to my struct. Then, I could store references to elements of my struct wherever I wanted, and refer to them later. I need only maintain these prev/next pointers to manipulate my linked list, and stored references to locations in the list will stay updated.
What is the C++ approach to this problem?
The end goal is a data structure that is sequenced but not ordered (i.e. no comparison function exists, but the elements are sequenced relative to each other based on where they are inserted). I need to cheaply insert, delete, and move objects as the structure grows. But I also need to cheaply look up each element by some key unrelated to the ordering, and to look up meaningful locations (like head, tail, and various checkpoints in the structure called slices). I need to be able to traverse the sequenced list after looking up a starting place by key or by slice.
Head and tail will be free. I was planning a hash table that maps the keys to list elements, and another hash table that maps slices to list elements.
I asked a more specific question related to this here:
Using Both Map and List for Same Objects
The conclusion I made was that I would need to maintain both a List and various Maps pointing to the same data to get the performance I need. But doing this by storing iterators in C++ seemed subpar. Instead it seemed easier to reimplement linked list (building it into my class) and using STL maps to point to data.
I was hoping for some input about which is a more fruitful route, or if there is some third plan that better meets my needs. My assumption is that the STL implementation of unordered_map is faster than anything I would implement, but I could match or beat the performance of list since I'm only using a subset of its functionality.
Thanks!
More precise description of my data/performance requirements:
Data will come in with a unique key. I will add it into a queue.
I'll need to update/move/remove/delete this data in O(1) based on its unique key.
I'll need to insert new data/read data based on metadata stored in other data structures.
I was speaking imprecisely when I said very large list above. The list will definitely fit into memory. Space is cheap enough that it is worth using other data structures to index this list.
I understand your requirements as being:
the data has a unique key
update/move/remove/delete this data in constant time, using its unique key
According to this, the best fit would be unordered_map: it works with a key and uses a hash table to access the elements. On average, insert, find and update take constant time (thanks to the hash table), unless the hash function is inappropriate (in the worst case, where all elements yield the same hash value, you would have linear time, as in a list, because of the collisions).
This seems also to match your initial intention:
Head and tail will be free. I was planning a hash table that maps the
keys to list elements, and another hash table that maps slices to list
elements.
Edit: If you also need to control the sequencing of elements independently of their key, you'd need to build a combined container, based on a list and an unordered_map that associates each key with an iterator to the element in the list. You'd then have to keep the two in sync, for example:
insert element: get an iterator by inserting the element into the list, then add the iterator to the unordered_map under the element's key.
remove element: find the iterator to the element by searching for the key in the unordered_map, erase the element from the list using this iterator, and finally erase the key from the unordered_map.
find element: find the iterator to the element by searching for the key in the unordered_map.
sequential iteration: start from the list's begin() iterator.
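The synchronisation steps above could look roughly like this. It is a sketch under the assumption that keys are unique; KeyedSequence and move_to_front are illustrative names, and the move operation is an addition that uses std::list::splice to relink a node without invalidating the stored iterators.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

template <typename K, typename V>
class KeyedSequence {
public:
    using Item = std::pair<K, V>;
    using iterator = typename std::list<Item>::iterator;

    // insert element: append to the list, then record its iterator by key.
    bool push_back(const K& key, const V& value) {
        if (index_.count(key) != 0) return false;  // keys must be unique
        items_.emplace_back(key, value);
        index_[key] = std::prev(items_.end());
        return true;
    }

    // find element: returns end() when the key is absent.
    iterator find(const K& key) {
        auto found = index_.find(key);
        return found == index_.end() ? items_.end() : found->second;
    }

    // remove element: erase from the list first, then from the index.
    bool erase(const K& key) {
        auto found = index_.find(key);
        if (found == index_.end()) return false;
        items_.erase(found->second);
        index_.erase(found);
        return true;
    }

    // move element: splice relinks the node, so stored iterators stay valid.
    bool move_to_front(const K& key) {
        auto found = index_.find(key);
        if (found == index_.end()) return false;
        items_.splice(items_.begin(), items_, found->second);
        return true;
    }

    // sequential iteration starts from begin().
    iterator begin() { return items_.begin(); }
    iterator end() { return items_.end(); }

private:
    std::list<Item> items_;
    std::unordered_map<K, iterator> index_;
};
```

Because find returns a list iterator, a traversal can start at any element looked up by key and continue with ++it. A second unordered_map from slice names to iterators could be added in the same way.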
I would normally point you to the STL containers and let you browse... but once you write the words 'very large' (and I currently work in Big Data) everything changes.
People rarely give good advice on scalability, but here are some points.
What is 'very large' in your case? Does std::list fit your needs? Everything before your third paragraph looks suitable as long as you are not too large. Does your structure fit in memory?
How well does your structure play with the memory manager? A plain C-like list with 'prev' and 'next' pointers has a serious disadvantage: every element is usually allocated separately from the memory manager. If you are large, this matters and leads to memory overhead.
What do you expect an external reference to an element to be? If you use pointers, you lose the ability to optimize the layout of your structure. But perhaps you don't need that.
If you are really large, you definitely need to consider some kind of 'pool' management, and indices into such pools can be pretty good references if you modify your structure intensively.
Please think twice about what 'large' means. If you mean really large, you need a special solution, especially if your data is larger than your memory. If you are not that large, why not start with just std::list? Once you answer this question, your life will probably become much easier ;-).

STL What is Random access and Sequential access?

So I am curious to know, what is random access?
I searched a little bit, and couldn't find much. The understanding I have now is that the "blocks" in the container are placed randomly (as seen here). Random access then means I can access every block of the container no matter what position (so I can read what it says on position 5 without going through all blocks before that), while with sequential access, I have to go through 1st , 2nd, 3rd and 4th to get to the 5th block.
Am I right? Or if not, then can someone explain to me what random access is and sequential access is?
Sequential access means the cost of accessing the 5th element is 5 times the cost of accessing the first element, or at least that there is an increasing cost associated with an element's position in the set. This is because to access the 5th element of the set, you must first perform an operation to find the 1st, 2nd, 3rd, and 4th elements, so accessing the 5th element requires 5 operations.
Random access means that accessing any element in the set has the same cost as any other element in the set. Finding the 5th element of a set is still only a single operation.
So accessing a random element in a random access data structure has O(1) cost, whereas accessing a random element in a sequential data structure costs about n/2 operations on average, which is O(n). The n/2 comes from the fact that if you access a random element in a set 100 times, the average position of that element is about halfway through the set. So for a set of n elements that comes out to n/2, and since big O notation drops constant factors, that is O(n).
Something you might find cool:
Hashmaps are an example of a data structure which implements random access. A cool thing to note is that on hash collisions in a hash map, the two collided elements are stored in a sequential linked list in that bucket on the hash map. So that means that if you have 100% collisions for a hash map you actually end up with sequential storage.
Here's an image of a hashmap illustrating what I'm describing:
This means the worst-case cost of accessing an element in a hash map is actually O(n), the same order as the average case for sequential storage. Put more precisely, finding an element in a hash map takes O(1) time in the best and average cases and O(n) time in the worst case.
So:
Sequential access: finding a random element in a set of n elements takes 1 step in the best case, about n/2 steps on average, and n steps in the worst case, i.e. O(1), O(n), and O(n) respectively.
Random access: finding a random element in a set of n elements is O(1) in the best and average cases. For a hash map the worst case is O(n); for an array or vector, indexing is always O(1).
So random access has the benefit of giving better performance for accessing elements, however sequential storage data structures provide benefits in other areas.
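The 100% collision case mentioned above can be reproduced with std::unordered_set by plugging in a deliberately bad hash function. ConstantHash and bucket_load are names invented for this sketch. The standard only guarantees that keys with equal hash codes share a bucket; how that bucket is stored internally is up to the implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Deliberately bad hash, for illustration only: every key collides.
struct ConstantHash {
    std::size_t operator()(int) const { return 0; }
};

// Returns how many elements share a bucket with `key`.
template <typename Set>
std::size_t bucket_load(const Set& s, int key) {
    return s.bucket_size(s.bucket(key));
}
```

If 100 integers are inserted into a `std::unordered_set<int, ConstantHash>`, `bucket_load` reports 100 for any of them, so a lookup has to walk through up to all 100 elements. With the default `std::hash<int>` the same elements would normally be spread over many buckets.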
Second edit, for @sumsar1812:
I want to preface with this is how I understand the advantages/use cases of sequential storage, but I am not as certain about my understanding of benefits of sequential containers as I am about my answer above. So please correct me anywhere I am mistaken.
Sequential storage is useful because the data will actually be stored sequentially in memory.
You can actually access the next member of a sequentially stored data set by offsetting a pointer to the previous element of that set by the amount of bytes it takes to store a single element of that type.
So since an int typically requires 4 bytes to store (sizeof(int) on most current platforms), if you have a fixed array of integers:
int someInts[5];
someInts[1] = 5;
someInts decays to a pointer to the first element of that array. Adding 1 to that pointer offsets where it points in memory by sizeof(int) bytes.
*(someInts + 1) // evaluates to 5
This means that if you need to access every element in your data structure in that specific order, it's going to be much faster, since each lookup in sequential storage is just adding a constant value to the pointer.
For random access storage, there is no guarantee that each element is stored even near the other elements. This means each lookup will be more expensive than just adding a constant amount.
Random access storage containers can still simulate what appears to be an ordered list of elements by using an iterator. However, as long as you allow random access lookups for elements there will be no guarantee that those elements are stored sequentially in memory. This means that even though a container can exhibit behavior of both a random access container and a sequential container, it will not exhibit the benefits of a sequential container.
So if the order of the elements in your container is supposed to be meaningful, or you plan on iterating and operating on every element in a data set then you might benefit from a sequential container.
In truth it still gets a little more complicated because a linked list, which is a sequential container doesn't actually store sequentially in memory whereas a vector, another sequential container, does. Here's a good article that explains use cases for each specific container better than I can.
There are two main aspects to this, and it's unclear which of the two is more relevant to your question. One of those aspects is accessing the content of an STL container via iterators, where those iterators allow either random access or forward (sequential) access. The other aspect is that of accessing a container or even just memory itself in random or sequential orders.
Iterators - Random Access vs. Sequential Access
To start with iterators, take two examples: std::vector<T> and std::list<T>. A vector stores an array of values, whereas a list stores a linked list of values. The former is stored sequentially in memory, and this allows arbitrary random access: calculating the location of any element is just as fast as calculating the location of the next element. Thus the sequential storage gives you efficient random access, and the iterator is a random access iterator.
By contrast, a list performs a separate allocation for each node, and each node only knows where its neighbors are. Thus calculating the location of a random non-neighbor node cannot be done directly. Any attempt to do so must traverse all the intermediate nodes, and thus algorithms that attempt to skip nodes may perform badly. The non-sequential storage yields randomized locations and thus only efficient sequential access. Thus the iterator that list provides is a bidirectional iterator, one of a few different sequential iterators.
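The iterator categories can be checked at compile time. The following sketch uses two hypothetical helpers, is_random_access and nth, and shows that the same call to std::next works for both containers while costing O(1) for a vector and O(n) for a list.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <type_traits>
#include <vector>

// True when Iterator supports constant-time jumps such as it + n.
template <typename Iterator>
constexpr bool is_random_access() {
    return std::is_base_of<
        std::random_access_iterator_tag,
        typename std::iterator_traits<Iterator>::iterator_category>::value;
}

static_assert(is_random_access<std::vector<int>::iterator>(),
              "vector iterators are random access");
static_assert(!is_random_access<std::list<int>::iterator>(),
              "list iterators are only bidirectional");

// Returns the n-th element of a container (n must be less than its size).
// std::next is O(1) for random access iterators and walks node by node
// for everything else.
template <typename Container>
typename Container::value_type nth(const Container& c, std::size_t n) {
    return *std::next(c.begin(), static_cast<std::ptrdiff_t>(n));
}
```

Both `nth(vec, 4)` and `nth(lst, 4)` compile and return the fifth element; the difference is only in how many steps the library takes to get there.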
Memory - Random Access vs. Sequential Access
However there's another wrinkle in your question. The iterator parts only address the traversal of the container. Underneath that, however, the CPU will be accessing memory itself in a particular pattern. While at a high level the CPU is capable of addressing any random address with no overhead of calculating where it is (it's like a big vector), in practice reading memory involves caching and lots of subtleties that make accessing different parts of memory take different amounts of time.
For example, once you start working with a rather large data set, even if you're working with a vector, it's more efficient to access all elements in sequential order than to access all elements in some random order. By contrast, a list doesn't make this possible. Since the nodes of a list aren't necessarily located in sequential memory locations, even a sequential traversal of the list's items may not read memory sequentially, and can be more expensive because of this.
The terms themselves don't imply any performance characteristics as @echochamber says. The terms only refer to a method of access.
"Random Access" refers to accessing elements in a container in an arbitrary order. std::vector is an example of a C++ container that performs great for random access. std::stack is an example of a C++ container that doesn't even allow random access.
"Sequential Access" refers to accessing elements in order. This is only relevant for ordered containers. Some containers are optimized better for sequential access than random access, for example std::list.
Here's some code to show the difference:
// Random access: picking elements out, regardless of ordering or sequencing.
// The important factor is that we are selecting items by some kind of
// index or key.
std::vector<int> some_vector(30, 0);
std::map<std::string, int> some_map{{"hi", 1}};
auto a = some_vector[25];
auto b = some_vector[1];
auto c = some_map["hi"];
// Sequential access: here, there is no need for an index.
// This implies that the container has a concept of ordering, where an
// element has neighbors.
for (auto it = some_vector.begin(); it != some_vector.end(); ++it)
{
    auto d = *it;
}

Is there a linked hash set in C++?

Java has a LinkedHashSet, which is a set with a predictable iteration order. What is the closest available data structure in C++?
Currently I'm duplicating my data by using both a set and a vector. I insert my data into the set. If the data inserted successfully (meaning data was not already present in the set), then I push_back into the vector. When I iterate through the data, I use the vector.
If you can use it, then a Boost.MultiIndex with sequenced and hashed_unique indexes is the same data structure as LinkedHashSet.
Failing that, keep an unordered_set (or hash_set, if that's what your implementation provides) of some type with a list node in it, and handle the sequential order yourself using that list node.
The problems with what you're currently doing (set and vector) are:
Two copies of the data (might be a problem when the data type is large, and it means that your two different iterations return references to different objects, albeit with the same values. This would be a problem if someone wrote some code that compared the addresses of the "same" elements obtained in the two different ways, expecting the addresses to be equal, or if your objects have mutable data members that are ignored by the order comparison, and someone writes code that expects to mutate via lookup and see changes when iterating in sequence).
Unlike LinkedHashSet, there is no fast way to remove an element in the middle of the sequence. And if you want to remove by value rather than by position, then you have to search the vector for the value to remove.
set has different performance characteristics from a hash set.
If you don't care about any of those things, then what you have is probably fine. If duplication is the only problem then you could consider keeping a vector of pointers to the elements in the set, instead of a vector of duplicates.
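The last suggestion, a vector of pointers into the set instead of a vector of duplicates, could be sketched like this. It relies on the fact that std::set never moves its elements once inserted. OrderedUniqueStrings is a hypothetical name, and removal is left out, since it would still require a linear search of the vector.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

class OrderedUniqueStrings {
public:
    OrderedUniqueStrings() = default;
    // Copying would leave the pointers aimed at the original set, so this
    // sketch simply forbids it.
    OrderedUniqueStrings(const OrderedUniqueStrings&) = delete;
    OrderedUniqueStrings& operator=(const OrderedUniqueStrings&) = delete;

    // Returns true if the string was new; only then is its address recorded.
    bool insert(const std::string& s) {
        auto result = unique_.insert(s);
        if (result.second) order_.push_back(&*result.first);
        return result.second;
    }

    // Elements in insertion order, each stored exactly once (in the set).
    const std::vector<const std::string*>& in_order() const { return order_; }

private:
    std::set<std::string> unique_;
    std::vector<const std::string*> order_;
};
```

Iterating over `in_order()` visits the same objects that a lookup in the set would find, which removes the duplication problem but none of the others listed above.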
To replicate Java's LinkedHashSet in C++, I think you will need two plain std::map objects. (Please note that what you get is really a "LinkedTreeSet" rather than a true LinkedHashSet, since insert and delete take O(log n).)
The first uses the actual value as key and the insertion order (usually an int or long int) as value.
The second is the reverse: it uses the insertion order as key and the actual value as value.
When you insert, you use std::map::find on the first std::map to make sure that no identical object already exists in it.
If one already exists, ignore the new one.
If it does not, you add this object together with the incremented insertion order to both of the std::maps mentioned above.
When you iterate in order of insertion, you iterate through the second std::map, since it is sorted by insertion order (anything stored in a std::map or std::set is sorted automatically).
When you remove an element, you use std::map::find on the first std::map to get its insertion order. Use this insertion order to remove the element from the second std::map, then remove the object from the first one.
Please note that this solution is not perfect, if you are planning to use this on the long-term basis, you will need to "compact" the insertion order after a certain number of removals since you will eventually run out of insertion order (2^32 indexes for unsigned int or 2^64 indexes for unsigned long long int).
In order to do this, you will need to put all the "value" objects into a vector, clear all values from both maps and then re-insert values from vector back into both maps. This procedure takes O(nlogn) time.
If you're using C++11, you can replace the first std::map with std::unordered_map to improve efficiency. You won't be able to replace the second one, though: std::unordered_map indexes by hash code, so iterating over it does not visit the insertion-order keys in sorted order.
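The two-map scheme described in this answer might be sketched as below. InsertionOrderedSet is an illustrative name, both maps are plain std::map as in the answer, and the counter is a 64-bit integer so that the compaction step is left out of the sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

template <typename T>
class InsertionOrderedSet {
public:
    // Inserts the value unless it is already present.
    bool insert(const T& value) {
        if (order_of_.find(value) != order_of_.end()) return false;
        order_of_.emplace(value, next_order_);  // value -> insertion order
        value_at_.emplace(next_order_, value);  // insertion order -> value
        ++next_order_;
        return true;
    }

    // Looks up the insertion order, then removes the entry from both maps.
    bool erase(const T& value) {
        auto found = order_of_.find(value);
        if (found == order_of_.end()) return false;
        value_at_.erase(found->second);
        order_of_.erase(found);
        return true;
    }

    // The second map is sorted by insertion order, so a plain traversal
    // yields the values in the order they were inserted.
    std::vector<T> in_insertion_order() const {
        std::vector<T> out;
        for (const auto& entry : value_at_) out.push_back(entry.second);
        return out;
    }

private:
    std::uint64_t next_order_ = 0;
    std::map<T, std::uint64_t> order_of_;
    std::map<std::uint64_t, T> value_at_;
};
```

An element that is erased and inserted again moves to the end of the sequence, because it receives a fresh insertion number. Every operation is O(log n), in line with the "LinkedTreeSet" caveat above.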
You might want to know that std::map gives you O(log n) lookup, not constant-time lookup. And using the std::tr1::unordered containers is risky business here, because they give up any ordering to get constant lookup time.
Try a Boost multi-index container if you want more freedom in how the data is indexed.
The way you described your combination of std::set and std::vector sounds like what you should be doing, except by using std::unordered_set (equivalent to Java's HashSet) and std::list (doubly-linked list). You could also use std::unordered_map to store the key (for lookup) along with an iterator into the list where to find the actual objects you store (if the keys are different from the objects (or only a part of them)).
The boost library does provide a number of these types of combinations of containers and look-up indices. For example, this bidirectional list with fast look-ups example.