c++ set data structure which keeps inserted order - c++

Is there any C++ built-in set data structure which keeps the inserted order?
It doesn't matter whether the set is a hash set or a set implemented by a balanced binary tree.

In C++11, both std::multiset and std::multimap are guaranteed to preserve the insertion order of same-valued/same-keyed elements.
Quoting from the C++11 standard,
23.2.4 Associative containers
4 An associative container supports unique keys if it may contain at most one element for each key. Otherwise,
it supports equivalent keys. The set and map classes support unique keys; the multiset and multimap classes
support equivalent keys. For multiset and multimap, insert, emplace, and erase preserve the relative
ordering of equivalent elements.
It must be explicitly stated that their unordered (hash) variants, std::unordered_multiset and std::unordered_multimap, do not guarantee the relative order of insertion of elements (it is unspecified).
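As a minimal sketch of the guarantee quoted above, the helper below collects the values stored under one key of a std::multimap in iteration order. The function names values_for and make_sample are made up for this illustration and are not part of any library.

```cpp
#include <map>
#include <string>
#include <vector>

// Collects the values stored under `key`, in the container's iteration order.
std::vector<std::string> values_for(const std::multimap<int, std::string>& m, int key) {
    std::vector<std::string> out;
    auto range = m.equal_range(key);
    for (auto it = range.first; it != range.second; ++it) {
        out.push_back(it->second);
    }
    return out;
}

// Builds a sample multimap; key 1 receives "first", "second", "third" in that order.
std::multimap<int, std::string> make_sample() {
    std::multimap<int, std::string> m;
    m.emplace(1, "first");
    m.emplace(0, "zero");
    m.emplace(1, "second");
    m.emplace(1, "third");
    return m;
}
```

Calling values_for(make_sample(), 1) yields "first", "second", "third": the entries under key 1 keep their relative insertion order, while the entry with key 0 still sorts before them.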

I am not 100% sure if I understand what you are asking, but it sounds to me like a linked list would sufficiently suit your needs. You can just push and pop to keep the list in the order you placed it in. You can look here for a reference: http://www.cplusplus.com/reference/list/list/
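Furthermore, you can use the unique method to remove duplicates so that the list emulates a set. Note that unique only removes consecutive duplicates, so either the list has to be sorted first (which loses the insertion order) or duplicates have to be rejected at insertion time.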
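One way to reject duplicates at insertion time while keeping insertion order is to pair a sequence with a hash set. The class below is only a sketch under that assumption; InsertionOrderedSet is a hypothetical name, it is not a standard container, and it does not support erasing elements.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper type: a set of strings that remembers insertion order.
class InsertionOrderedSet {
public:
    // Returns true if `value` was newly inserted, false if it was already present.
    bool insert(const std::string& value) {
        if (!seen_.insert(value).second) {
            return false;  // duplicate: leave the recorded order untouched
        }
        order_.push_back(value);
        return true;
    }

    bool contains(const std::string& value) const {
        return seen_.count(value) != 0;
    }

    // Elements in the order in which they were first inserted.
    const std::vector<std::string>& items() const { return order_; }

private:
    std::unordered_set<std::string> seen_;  // fast membership test
    std::vector<std::string> order_;        // insertion order
};
```

The trade-off is that every element is stored twice, and removing an element from the middle would cost linear time in the vector.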

Boost.Container guarantees that its associative containers preserve the insertion order of equivalent elements.
a_eq.insert(t): If a range containing elements equivalent to t exists in a_eq, t is inserted at the end of that range.
ref: https://www.boost.org/doc/libs/1_68_0/doc/html/container/cpp_conformance.html#container.cpp_conformance.insertion_hints

Related

Use of equal_range for unordered_map

As it makes sense, lower_bound and upper_bound are absent for std::unordered_map because there is no order for elements.
However, std::unordered_map has an equal_range method. It returns iterators for the range corresponding to a key.
How does it make sense? There can be only one element with a given key in std::unordered_map, so it is just the find method.
Also, in std::unordered_multimap, does its presence mean all elements with the same key will always come together while iterating the unordered_multimap with an iterator?
How does it make sense?
It kind of does. The standard requires all associative containers to offer equal_range, so the non-multi containers need to provide it. It does make writing generic code easier, so I suspect that is the reason why all of the containers are required to provide it.
Does its presence mean all elements with the same key will always come together while iterating the unordered_multimap with an iterator?
Actually, yes. Since all of the keys are equal, they hash to the same value, which means they all get stored in the same bucket, and they are grouped together since the keys compare equal. From [unord.req]/6
An unordered associative container supports unique keys if it may contain at most one element for each key. Otherwise, it supports equivalent keys. unordered_set and unordered_map support unique keys. unordered_multiset and unordered_multimap support equivalent keys. In containers that support equivalent keys, elements with equivalent keys are adjacent to each other in the iteration order of the container. Thus, although the absolute order of elements in an unordered container is not specified, its elements are grouped into equivalent-key groups such that all elements of each group have equivalent keys. Mutating operations on unordered containers shall preserve the relative order of elements within each equivalent-key group unless otherwise specified.
emphasis mine
It's for consistency with the other containers.
It makes more sense in the _multi variants, but is present in all the associative (and unordered associative) containers in the standard library.
It allows you to write code like
template <typename MapLike, typename KeyLike>
void do_stuff(const MapLike & map, const KeyLike & key)
{
    auto range = map.equal_range(key);
    for (auto it = range.first; it != range.second; ++it)
    {
        // blah
    }
}
This code does not care which specific container it is dealing with.
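A concrete instance of that idea, as a small sketch: the function below counts the entries stored under a key and works unchanged for ordered, unordered, unique-key, and multi containers. The name count_matches is invented for this example.

```cpp
#include <cstddef>
#include <iterator>
#include <map>
#include <string>
#include <unordered_map>

// Counts the entries stored under `key` in any container that offers equal_range.
template <typename MapLike, typename KeyLike>
std::size_t count_matches(const MapLike& map, const KeyLike& key) {
    auto range = map.equal_range(key);
    return static_cast<std::size_t>(std::distance(range.first, range.second));
}
```

For a std::map or std::unordered_map the result is 0 or 1; for the multi variants it is the number of entries that share the key.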
cplusplus.com writes about std::unordered_map::equal_range:
Returns the bounds of a range that includes all the elements in the container with a key that compares equal to k. In unordered_map containers, where keys are unique, the range will include one element at most.
Also, in std::unordered_multimap, does its presence mean all elements with the same key will always come together while iterating the unordered_multimap with an iterator?
In general, the order, in which elements stored in a std::unordered_multimap are obtained while traversing it, is actually not defined. However, note that std::unordered_multimaps are usually implemented as hash tables. By analysing such an implementation you will realize that the ordering is not going to be as "undefined" as someone might initially think.
At element insertion (or hash table rehashing), the value that results from applying the hash function to an element's key is used to select the bucket where that element is going to be stored. Two elements with equal keys produce the same hash value, therefore they are stored in the same bucket, so they come together* while iterating a std::unordered_multimap.
* Note that even two elements with different keys might result in the same hash value (i.e., a collision). However, std::unordered_multimap handles these cases by comparing the keys for equality, and therefore still groups elements with equal keys together.
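The grouping described above can be checked with a short sketch. The function name equal_keys_are_adjacent is hypothetical; it walks a std::unordered_multimap once and reports whether any key shows up again after its run has ended.

```cpp
#include <string>
#include <unordered_map>
#include <unordered_set>

// Returns true if every key forms one contiguous run in iteration order.
bool equal_keys_are_adjacent(const std::unordered_multimap<std::string, int>& m) {
    std::unordered_set<std::string> finished;  // keys whose run has already ended
    const std::string* current = nullptr;      // key of the run being traversed
    for (const auto& entry : m) {
        if (current == nullptr || entry.first != *current) {
            if (current != nullptr) {
                finished.insert(*current);
            }
            if (finished.count(entry.first) != 0) {
                return false;  // the key reappeared after its run had ended
            }
            current = &entry.first;
        }
    }
    return true;
}
```

On a conforming implementation this returns true for any contents, because adjacency of equivalent keys is required by the standard even though the order of the groups themselves is unspecified.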

Difference between a list and multiset in C++

In C++:
A list is a collection that can contain non-unique values in sequence
A multiset is a collection that can contain non-unique elements in sequence
What then is the specific difference between the two? Why would I use one over the other?
I've tried finding this information online but most references (e.g. cplusplus.com) talk about the two containers in different ways, such that the difference is not apparent.
From multiset:
std::multiset is an associative container that contains a sorted
set of objects
Search, insertion, and removal operations have logarithmic complexity.
From list:
std::list is a container that supports constant time insertion and removal of elements from anywhere in the container
Fast random access is not supported
Thus, if you want to have a faster search, use multiset.
For faster insertion and removal: use list.
The biggest difference is that std::list is a linked list while std::multiset is a tree structure (typically a red-black tree). This means that finding an element by value is O(N) in a std::list but O(log N) in a std::multiset.
This also means iterating a std::multiset from begin() to end() will give you sorted data while iterating a std::list will give you the order the data was inserted.
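A small sketch of that difference, with function names chosen only for this illustration: both helpers insert the same values one by one and then return what a begin() to end() traversal produces.

```cpp
#include <list>
#include <set>
#include <vector>

// Inserts the values into a std::list and returns them in iteration order.
std::vector<int> iterate_as_list(const std::vector<int>& input) {
    std::list<int> l;
    for (int x : input) {
        l.push_back(x);
    }
    return std::vector<int>(l.begin(), l.end());
}

// Inserts the values into a std::multiset and returns them in iteration order.
std::vector<int> iterate_as_multiset(const std::vector<int>& input) {
    std::multiset<int> s;
    for (int x : input) {
        s.insert(x);
    }
    return std::vector<int>(s.begin(), s.end());
}
```

For the input 3, 1, 2, 1 the list version returns 3, 1, 2, 1 and the multiset version returns 1, 1, 2, 3.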
It's not that straightforward, but to be short:
If you need to query search by value many times, opt for std::multiset.
Otherwise std::list.
There are quite a few containers in the standard library; they have different properties and different implications for the time it takes to access, insert, and remove elements. You should choose the one that is most applicable in the particular case from the whole group. Your definitions of std::multiset and std::list are artificial and do not make any sense. If a chair and a horse both have legs, it does not mean they are similar.

loop a std::unordered_map, the sequence is always the sequence I insert elements?

I constructed a std::unordered_map and used a for loop to visit it.
I found that the iteration visits the elements in the order in which I created them, no matter how I inserted them.
Is it part of the C++ standard that iterating an unordered_map visits the elements in insertion order? Or is this an implementation preference?
I am asking because I wish to know whether this is something I can rely on in my C++ code.
No. The standard makes no guarantees about the order of elements in the unordered associative containers † (unordered map, set and their multivalued versions) and you can not rely on any particular ordering in your code.
† Except for a special case [unord.req]/6 (standard draft, emphasis mine):
An unordered associative container supports unique keys if it may contain at most one element for each key. Otherwise, it supports equivalent keys. unordered_set and unordered_map support unique keys. unordered_multiset and unordered_multimap support equivalent keys. In containers that support equivalent keys, elements with equivalent keys are adjacent to each other in the iteration order of the container. Thus, although the absolute order of elements in an unordered container is not specified, its elements are grouped into equivalent-key groups such that all elements of each group have equivalent keys. Mutating operations on unordered containers shall preserve the relative order of elements within each equivalent-key group unless otherwise specified.

Questions about STL containers in C++

How often do std::multimap and std::unordered_multimap shuffle entries around? I'm asking because my code passes around references to distinguish between entries with the same hash, and I want to know when to run the reference redirection function on them.
What happens if I do this:
std::multimap atable; //Type specification stuff left out
//Code that puts in two entries with the same key, call that key foo
int bar = atable[foo];
Is the result different if it's an unordered_multimap instead?
Back to passing references around to distinguish entries with the same hash. Is there a safer way to do that?
Do the entries move around if I remove one of the entries (That's what's suggested by a reading of the documentation for std::vector)?
NO, no elements will be harmed during any operation.
As is explained in this famous Q&A, for associative containers, there is no iterator invalidation upon insertion / erasure (except for the element being erased, of course). For unordered associative containers, there is iterator invalidation during rehashing, about which the Standard says (emphasis mine)
23.2.5 Unordered associative containers [unord.req]
9 The elements of an unordered associative container are organized into
buckets. Keys with the same hash code appear in the same bucket. The
number of buckets is automatically increased as elements are added to
an unordered associative container, so that the average number of
elements per bucket is kept below a bound. Rehashing invalidates
iterators, changes ordering between elements, and changes which
buckets elements appear in, but does not invalidate pointers or
references to elements. For unordered_multiset and unordered_multimap,
rehashing preserves the relative ordering of equivalent elements.
Again, this does not entail reshuffling the actually stored elements (the Key and Value types in unordered_map<Key, Value>), because unordered maps have buckets that are organized as linked lists, and iterators to stored elements (key-value pairs) have both an element pointer and a bucket pointer. Rehashing shuffles the buckets, which invalidates the iterators (because their bucket pointer is invalidated) but not pointers or references to the elements themselves. This is explained in detail in another Q&A.
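The pointer and reference stability stated in the quoted paragraph can be observed with a short sketch. The function name address_survives_growth and the filler keys starting at 1000000 are assumptions made for this example; the caller is expected to pass a key below that range.

```cpp
#include <string>
#include <unordered_map>

// Inserts `extra` filler elements and reports whether the element stored
// under `key` still lives at the same address afterwards.
// Assumes `key` is present and smaller than 1000000 (the first filler key).
bool address_survives_growth(std::unordered_map<int, std::string>& m, int key, int extra) {
    const std::string* before = &m.at(key);  // throws std::out_of_range if absent
    for (int i = 0; i < extra; ++i) {
        m.emplace(1000000 + i, "filler");  // enough insertions force rehashing
    }
    return before == &m.at(key);
}
```

Iterators obtained before the insertions would not be safe to reuse here, which is why the sketch keeps a pointer rather than an iterator.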
How often do std::multimap and std::unordered_multimap shuffle entries around?
Never. The iterators that point to elements of any associative container (including sets, maps, and their unordered or "multi" versions) are never invalidated (unless the specific element they point to is deleted). In other words, the actual elements are never "shuffled around". These are required to be implemented as linked structures (e.g., linked-tree), meaning they can be re-structured just by changing a few pointers, without having to physically move any element.
EDIT: Apparently (see TemplateRex' comment), this is not the case for unordered containers. In that case, the iterators can get invalidated, but the elements themselves do not move around. These requirements imply an indirect container with no back-pointer, which I guess is a reasonable choice, but not one I would have expected.
What happens if I do this: ... (get [] of multimap) ...
The operator[] is not defined for std::multimap (or unordered version). So, what would happen? A compiler error would happen.
Is the result different if it's an unordered_multimap instead?
No, it's the same, the operator[] does not exist.
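Since operator[] is not available, a lookup has to go through find or equal_range instead. The following is a sketch only; lookup_any is a hypothetical helper that works for both std::multimap and std::unordered_multimap and returns the value of some entry with the given key, without promising which one.

```cpp
#include <map>
#include <string>
#include <unordered_map>

// Copies the value of one entry stored under `key` into `out`.
// Returns false (leaving `out` untouched) when no such entry exists.
// When several entries share the key, which one is found is not specified.
template <typename MultiMapLike>
bool lookup_any(const MultiMapLike& table,
                const typename MultiMapLike::key_type& key,
                typename MultiMapLike::mapped_type& out) {
    auto it = table.find(key);
    if (it == table.end()) {
        return false;
    }
    out = it->second;
    return true;
}
```

When every entry under the key is needed, equal_range is the better fit because it returns the whole group rather than a single element.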
Back to passing references around to distinguish entries with the same hash. Is there a safer way to do that?
Yes, the recommended practice is to refer to elements of the map / set / whatever using iterators, not references. The iterators to elements are guaranteed to remain valid, and they are copyable and have the right const-ness protection on them, and that makes them the perfect objects to "refer to an entry".
EDIT: As per the same comment, I would have to recommend using pointers to the elements if dealing with a hashed container (unordered containers), because iterators are not guaranteed (by the standard) to remain valid.
All of the associative containers in the C++ standard library are node based, i.e., their elements stay put. However, whether the hash is computed on the object after copying it or on a temporary object passed to the container isn't specified. I would guess that generally the hash is computed before the object is copied/moved.
To distinguish elements with the same hash you need to have an equality function anyway: if the location of the object caused it to be different, it would mean that all objects are different and you wouldn't be able to look them up at all. You need to have an equality function for the elements in an unordered container which defines equivalence of keys. For the ordered associative containers, the equivalence class is based on the strict weak ordering, i.e., on an expression like this (using < rather than a binary predicate for readability; any binary predicate defining a strict weak order would work, too):
bool equivalent = !(a < b) && !(b < a);
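To make the equivalence expression concrete, here is a sketch with a custom ordering. CaseInsensitiveLess is a made-up comparator that ignores ASCII case, and equivalent is a hypothetical helper that applies the expression above with an arbitrary predicate.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <string>

// Hypothetical comparator: orders strings while ignoring ASCII case.
struct CaseInsensitiveLess {
    bool operator()(const std::string& a, const std::string& b) const {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](unsigned char x, unsigned char y) {
                return std::tolower(x) < std::tolower(y);
            });
    }
};

// Two keys are equivalent when neither one orders before the other.
template <typename T, typename Less>
bool equivalent(const T& a, const T& b, Less less) {
    return !less(a, b) && !less(b, a);
}
```

With this comparator, "Foo" and "fOO" are equivalent although they are not equal as strings, so a std::set<std::string, CaseInsensitiveLess> keeps only one of them.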

Best c++ container to strip items away from?

I have a list of files (stored as c style strings) that I will be performing a search on and I will remove those files that do not match my parameters. What is the best container to use for this purpose? I'm thinking Set as of now. Note the list of files will never be larger than when it is initialized. I'll only be deleting from the container.
I would definitely not use a set - you don't need to sort it so no point in using a set. Set is implemented as a self-balancing tree usually, and the self-balancing algorithm is unnecessary in your case.
If you're going to be doing this operation once, I would use a std::vector with remove_if (from <algorithm>), followed by an erase. If you haven't used remove_if before, what it does is go through and shifts all the relevant items down, overwriting the irrelevant ones in the process. You have to follow it with an erase to reduce the size of the vector. Like so:
std::vector<const char*> files;
files.erase(std::remove_if(files.begin(), files.end(), RemovePredicate()), files.end());
Writing the code to do the same thing with a std::list would be a little bit more difficult if you wanted to take advantage of its O(1) deletion time property. Seeing as you're just doing this one-off operation which will probably take so little time you won't even notice it, I'd recommend doing this as it's the easiest way.
And to be honest, I don't think you'll see that much difference in terms of speed between the std::list and std::vector approaches. The vector approach only copies each value once so it's actually quite fast, yet takes much less space. In my opinion, going up to a std::list and using three times the space is only really justified if you're doing a lot of addition and deletion throughout the entire application's lifetime.
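A fuller sketch of the remove_if plus erase approach, under the assumption that the search criterion is a filename suffix. Both ends_with and keep_matching are hypothetical helpers written for this example; the real criterion would go in their place.

```cpp
#include <algorithm>
#include <cstring>
#include <vector>

// Hypothetical criterion: a file matches when its name ends with `suffix`.
bool ends_with(const char* name, const char* suffix) {
    const std::size_t n = std::strlen(name);
    const std::size_t s = std::strlen(suffix);
    return n >= s && std::strcmp(name + (n - s), suffix) == 0;
}

// Removes every file that does not end with `suffix`; the survivors keep their order.
void keep_matching(std::vector<const char*>& files, const char* suffix) {
    files.erase(std::remove_if(files.begin(), files.end(),
                               [suffix](const char* f) { return !ends_with(f, suffix); }),
                files.end());
}
```

Given "a.txt", "b.bat", "c.txt" and the suffix ".txt", the vector ends up holding "a.txt" and "c.txt" after a single pass over the elements.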
Elements in a std::set must be unique, so unless the filenames are globally unique this won't suit your needs.
I would probably recommend a std::list.
From SGI:
A vector is a Sequence that supports random access to elements, constant time insertion and removal of elements at the end, and linear time insertion and removal of elements at the beginning or in the middle.
A list is a doubly linked list. That is, it is a Sequence that supports both forward and backward traversal, and (amortized) constant time insertion and removal of elements at the beginning or the end, or in the middle.
An slist is a singly linked list: a list where each element is linked to the next element, but not to the previous element. That is, it is a Sequence that supports forward but not backward traversal, and (amortized) constant time insertion and removal of elements.
Set is a Sorted Associative Container that stores objects of type Key. Set is a Simple Associative Container, meaning that its value type, as well as its key type, is Key. It is also a Unique Associative Container, meaning that no two elements are the same.
Multiset is a Sorted Associative Container that stores objects of type Key. Multiset is a Simple Associative Container, meaning that its value type, as well as its key type, is Key. It is also a Multiple Associative Container, meaning that two or more elements may be identical.
Hash_set is a Hashed Associative Container that stores objects of type Key. Hash_set is a Simple Associative Container, meaning that its value type, as well as its key type, is Key. It is also a Unique Associative Container, meaning that no two elements compare equal using the Binary Predicate EqualKey.
Hash_multiset is a Hashed Associative Container that stores objects of type Key. Hash_multiset is a simple associative container, meaning that its value type, as well as its key type, is Key. It is also a Multiple Associative Container, meaning that two or more elements may compare equal using the Binary Predicate EqualKey.
(Some containers have been omitted.)
I would go with hash_set if all you care to have is a container which is fast and doesn't contain multiple identical keys. hash_multiset if you do, set or multiset if you want the strings to be sorted, or list or slist if you want the strings to keep their insertion order.
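After you've built your list, use remove_if (the list::remove_if member function for a std::list) to filter out items based on your criteria. For a set-like container, erase the unwanted elements in a loop instead, since std::remove_if cannot rearrange the elements of an associative container.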
I will start by tossing out vector since it is a sequential container. Set, I believe, is close to being sequential or hashed. I would avoid that. A doubly-linked list (the STL list is one of these) has two pointers and the node. Basically, to remove an item, it breaks the chain and then rejoins the two parts with the pointers.
Assuming your search criteria do not depend on the filename (i.e. you search for content, file sizes etc.), so you cannot make use of a set, I'd go with a list. It will take you O(N) to construct the whole list, and O(1) per delete.
If you wanted to make it even faster, and didn't insist on using ready-made STL containers, I would:
use a vector
delete using false delete, i.e. marking an item as deleted
when the ratio of deleted/all items rises above a certain threshold, I would filter the items with remove_if
This should give you the best space/time/cache performance. (Although you should profile it to be sure)
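The three steps above could be sketched as follows. LazyFileList is a hypothetical wrapper, and the threshold of one half is an arbitrary assumption that should be tuned by profiling.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical wrapper: file names with a "deleted" flag and deferred compaction.
class LazyFileList {
public:
    explicit LazyFileList(const std::vector<const char*>& names) {
        for (const char* n : names) {
            entries_.push_back(Entry{n, false});
        }
    }

    // Marks the entry at `index` as deleted. Once more than half of the stored
    // entries are dead, they are physically removed, which shifts later indices.
    void mark_deleted(std::size_t index) {
        if (entries_[index].deleted) {
            return;
        }
        entries_[index].deleted = true;
        ++dead_;
        if (dead_ * 2 > entries_.size()) {
            compact();
        }
    }

    // Names that have not been marked as deleted, in their original order.
    std::vector<const char*> live() const {
        std::vector<const char*> out;
        for (const Entry& e : entries_) {
            if (!e.deleted) {
                out.push_back(e.name);
            }
        }
        return out;
    }

    // Number of slots currently held, including dead ones.
    std::size_t stored() const { return entries_.size(); }

private:
    struct Entry {
        const char* name;
        bool deleted;
    };

    void compact() {
        entries_.erase(std::remove_if(entries_.begin(), entries_.end(),
                                      [](const Entry& e) { return e.deleted; }),
                       entries_.end());
        dead_ = 0;
    }

    std::vector<Entry> entries_;
    std::size_t dead_ = 0;
};
```

Marking is O(1), and the linear cost of remove_if is paid only occasionally, so the vector stays contiguous and cache friendly between compactions.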
You can use two lists/vectors/whatever:
using namespace std;
vector<const char *> files;
files.push_back("foo.bat");
files.push_back("bar.txt");
vector<const char *> good_files; // Maybe reserve elements given files.size()?
for (vector<const char *>::const_iterator i = files.begin(); i != files.end(); ++i) {
    if (file_is_good(*i)) { // file_is_good is your own predicate
        good_files.push_back(*i);
    }
}