Difference between a list and multiset in C++ - c++

In C++:
A list is a collection that can contain non-unique values in sequence
A multiset is a collection that can contain non-unique elements in sequence
What then is the specific difference between the two? Why would I use on over the other?
I've tried finding this information online but most references (e.g. cplusplus.com) talk about the two containers in different ways, such that the difference is not apparent.

From multiset:
std::multiset is an associative container that contains a sorted
set of objects
Search, insertion, and removal operations have logarithmic complexity.
From list:
std::list is a container that supports constant time insertion and removal of elements from anywhere in the container
Fast random access is not supported
Thus, if you want to have a faster search, use multiset.
For faster insertion and removal: use list.

The biggest difference is std::list is a linked list while std::multiset is a tree structure (typically an RB Tree). This means element access in a std::list has O(N) access while a std::multiset has O(logN).
This also means iterating a std::multiset from begin() to end() will give you sorted data while iterating a std::list will give you the order the data was inserted.

Its not that straightforward, but to be short:
If you need to query search by value many times, opt for std::multiset.
Otherwise std::list.

There are quite a few containers in standard library, they have different properties and implication on timing for access, insert and remove elements. You should choose one that is most applicable in particular case from whole group. Your definition of std::multiset and std::list is artificial and does not make any sense. If chair and horse both have legs it does not mean they are similar.

Related

What is the most efficient std container for non-duplicated items?

What is the most efficient way of adding non-repeated elements into STL container and what kind of container is the fastest? I have a large amount of data and I'm afraid each time I try to check if it is a new element or not, it takes a lot of time. I hope map be very fast.
// 1- Map
map<int, int> Map;
...
if(Map.find(Element)!=Map.end()) Map[Element]=ID;
// 2-Vector
vector<int> Vec;
...
if(find(Vec.begin(), Vec.end(), Element)!=Vec.end()) Vec.push_back(Element);
// 3-Set
// Edit: I made a mistake: set::find is O(LogN) not O(N)
Both set and map has O(log(N)) performance for looking up keys. vector has O(N).
The difference between set and map, as far as you should be concerned, is whether you need to associate a key with a value, or just store a value directly. If you need the former, use a map, if you need the latter, use a set.
In both cases, you should just use insert() instead of doing a find().
The reason is insert() will insert the value into the container if and only if the container does not already contain that value (in the case of map, if the container does not contain that key). This might look like
Map.insert(std::make_pair(Element, ID));
for a map or
Set.insert(Element);
for a set.
You can consult the return value to determine whether or not an insertion was actually performed.
If you're using C++11, you have two more choices, which are std::unordered_map and std::unordered_set. These both have amortized O(1) performance for insertions and lookups. However, they also require that the key (or value, in the case of set) be hashable, which means you'll need to specialize std::hash<> for your key. Conversely, std::map and std::set require that your key (or value, in the case of set) respond to operator<().
If you're using C++11, you can use std::unordered_set. That would allow you O(1) existence-checking (technically amortized O(1) -- O(n) in the worst case).
std::set would probably be your second choice with O(lg n).
Basically, std::unordered_set is a hash table and std::set is a tree structure (a red black tree in every implementation I've ever seen)1.
Depending on how well your hashes distribute and how many items you have, a std::set might actually be faster. If it's truly performance critical, then as always, you'll want to do benchmarking.
1) Technically speaking, I don't believe either are required to be implemented as a hash table or as a balanced BST. If I remember correctly, the standard just mandates the run time bounds, not the implementation -- it just turns out that those are the only viable implementations that fit the bounds.
You should use a std::set; it is a container designed to hold a single (equivalent) copy of an object and is implemented as a binary search tree. Therefore, it is O(log N), not O(N), in the size of the container.
std::set and std::map often share a large part of their underlying implementation; you should check out your local STL implementation.
Having said all this, complexity is only one measure of performance. You may have better performance using a sorted vector, as it keeps the data local to one another and, therefore, more likely to hit the caches. Cache coherence is a large part of data structure design these days.
Sounds like you want to use a std::set. It's elements are unique, so you don't need to care about uniqueness when adding elements, and a.find(k) (where a is an std::set and k is a value) is defined as being logarithmic in complexity.
if your elements can be hashed for O(1), then better to use an index in a unordered_map or unordered_set (not in a map/set because they use RB tree in implementation which is O(logN) find complexity)
Your examples show a definite pattern:
check if the value is already in container
if not, add the value to the container.
Both of these operation would potentially take some time. First, looking up an element can be done in O(N) time (linear search) if the elements are not arranged in any particular manner (e.g., just a plain std::vector), it could be done in O(logN) time (binary search) if the elements are sorted (e.g., either std::map or std::set), and it could be done in O(1) time if the elements are hashed (e.g., either std::unordered_map or std::unordered_set).
The insertion will be O(1) (amortized) for a plain vector or an unordered container (hash container), although the hash container will be a bit slower. For a sorted container like set or map, you'll have log-time insertions because it needs to look for the place to insert it before inserting it.
So, the conclusion, use std::unordered_set or std::unordered_map (if you need the key-value feature). And you don't need to check before doing the insertion, these are unique-key containers, they don't allow duplicates.
If std::unordered_set / std::unordered_map (from C++11) or std::tr1::unordered_set / std::tr1::unordered_map (since 2007) are not available to you (or any equivalent), then the next best alternative is std::set / std::map.

what is the uses cases in which order in key value associative container is useful

I am wondering in which uses cases, we can be interested by having an ordered associative container.
In other terms, why use std::map and no std::unorderd_map
You use an ordered map when you need to be able to iterate over the keys in an ordered fashion.
As Mat pointed out, it depends on whether you need to access your elements in sorted order. Also, map has guaranteed, worst-case logarithmic access time (in the number of elements), whereas unordered map has average constant access time.
The main reason for using map instead of unordered_map is that map is in standard and for sure you have it. Some companies have policy not to use definitions from tr1.
But you ask when order is important - it is needed for operations like lower_bound, upper_bound, set_union, set_intersection etc.
If the comparison between unordered containers is needed, the complexity
might be an issue.
It seems to be difficult to implement operator==/operator!= between
unordered containers efficiently.
N3290 ยง23.2.5/11 says
the complexity of operator== ... is
proportional to ... N^2 in the worst
case, where N is a.size().
while other containers have linear complexity.
Unordered_map use more memory than map. If there is a constraint in memory then its recommended to use map. So the operations become slower when using unordered_map
EDIT
Quoting from the wiki article
It is similar to the map class in the C++ standard library but has different
constraints. As its name implies, unlike the map class, the elements of an
unordered_map are not ordered. This is due to the use of hashing to store objects.
unordered_map can still be iterated through like a regular map.

Simple and efficient container in C++ with characteristics of map and list containers

I'm looking for a C++ container that will enjoy both map container and list container benefits.
map container advantages I would like to maintain:
O(log(n)) access
operator[] ease of use
sparse nature
list container advantages I would like to maintain:
having an order between the items
being able to traverse the list easily UPDATE: by a sorting order based on the key or value
A simple example application would be to hold a list of certain valid dates (business dates, holidays, some other set of important dates...), once given a specific date, you could find it immediately "map style" and then find the next valid date "list style".
std::map is already a sorted container where you can iterate over the contained items in order. It only provides O(log(n)) access, though.
std::tr1::unordered_map (or std::unordered_map in C++0x) has O(1) access but is unsorted.
Do you really need O(1) access? You have to use large datasets and do many lookups for O(log(n)) not being fast enough.
If O(log(n)) is enough, std::map provides everything you are asking for.
If you don't consider the sparse nature, you can take a look at the Boost Multi-Index library. For the sparse nature, you can take a look at the Boost Flyweight library, but I guess you'll have to join both approaches by yourself. Note that your requirements are often contradictory and hard to achieve. For instance, O(1) and order between the items is difficult to maintain efficiently.
Maps are generally implemented as trees and thus have logarithmic look up time, not O(1), but it sounds like you want a sorted associative container. Hash maps have O(1) best case, O(N) worst case, so perhaps that is what you mean, but they are not sorted, and I don't think they are part of the standard library yet.
In the C++ standard library, map, set, multimap, and multiset are sorted associative containers, but you have to give up the O(1) look up requirement.
According to Stroustrup, the [] operator for maps is O(log(n)). That is much better than the O(n) you'd get if you were to try such a thing with a list, but it is definitely not O(1). The only container that gives you that for the [] operator is vector.
That aside, you can already do all your listy stuff with maps. Iterators work fine on them. So if I were you, I'd stick with map.
having an order between the items
being able to traverse the list easily
Maps already do both. They are sorted, so you start at begin() and traverse until you hit end(). You can, of course, start at any map iterator; you may find map's find, lower_bound, and related methods helpful.
You can store data in a list and have a map to iterators of your list enabling you to find the actual list element itself. This kind of thing is something I often use for LRU containers, where I want a list because I need to move the accessed element to the end to make it the most recently accessed. You can use the splice function to do this, and since the 2003 standard it does not invalidate the iterator as long as you keep it in the same list.
How about this one: all dates are stored in std::list<Date>, but you look it up with helper structure stdext::hash_map<Date, std::list<Date>::iterator>. Once you have iterator for the list access to the next element is simple. In your STL implementation it could be std::tr1::unordered_map instead of stdext::hash_map, and there is boost::unordered_map as well.
You will never find a container that satisfies both O(log n) access and an ordered nature. The reason is that if a container is ordered then inherently it must support an arbitrary order. That's what an ordered nature means: you get to decide exactly where any element is positioned. So to find any element you have to guess where it is. It can be anywhere, because you can place it anywhere!
Note that an ordered sequence is not the same as a sorted sequence. A sorted nature means there is one particular ordering relation between any two elements. An ordered nature means there may be more than one ordering relation among the elements.

Best c++ container to strip items away from?

I have a list of files (stored as c style strings) that I will be performing a search on and I will remove those files that do not match my parameters. What is the best container to use for this purpose? I'm thinking Set as of now. Note the list of files will never be larger than when it is initialized. I'll only be deleting from the container.
I would definitely not use a set - you don't need to sort it so no point in using a set. Set is implemented as a self-balancing tree usually, and the self-balancing algorithm is unnecessary in your case.
If you're going to be doing this operation once, I would use a std::vector with remove_if (from <algorithm>), followed by an erase. If you haven't used remove_if before, what it does is go through and shifts all the relevant items down, overwriting the irrelevant ones in the process. You have to follow it with an erase to reduce the size of the vector. Like so:
std::vector<const char*> files;
files.erase(remove_if(files.begin(), files.end(), RemovePredicate()), files.end());
Writing the code to do the same thing with a std::list would be a little bit more difficult if you wanted to take advantage of its O(1) deletion time property. Seeing as you're just doing this one-off operation which will probably take so little time you won't even notice it, I'd recommend doing this as it's the easiest way.
And to be honest, I don't think you'll see that much difference in terms of speed between the std::list and std::vector approaches. The vector approach only copies each value once so it's actually quite fast, yet takes much less space. In my opinion, going up to a std::list and using three times the space is only really justified if you're doing a lot of addition and deletion throughout the entire application's lifetime.
Elements in a std::set must be unique, so unless the filenames are globally unique this won't suit your needs.
I would probably recommend a std::list.
From SGI:
A vector is a Sequence that supports random access to elements, constant time insertion and removal of elements at the end, and linear time insertion and removal of elements at the beginning or in the middle.
A list is a doubly linked list. That is, it is a Sequence that supports both forward and backward traversal, and (amortized) constant time insertion and removal of elements at the beginning or the end, or in the middle.
An slist is a singly linked list: a list where each element is linked to the next element, but not to the previous element. That is, it is a Sequence that supports forward but not backward traversal, and (amortized) constant time insertion and removal of elements.
Set is a Sorted Associative Container that stores objects of type Key. Set is a Simple Associative Container, meaning that its value type, as well as its key type, is Key. It is also a Unique Associative Container, meaning that no two elements are the same.
Multiset is a Sorted Associative Container that stores objects of type Key. Multiset is a Simple Associative Container, meaning that its value type, as well as its key type, is Key. It is also a Multiple Associative Container, meaning that two or more elements may be identical.
Hash_set is a Hashed Associative Container that stores objects of type Key. Hash_set is a Simple Associative Container, meaning that its value type, as well as its key type, is Key. It is also a Unique Associative Container, meaning that no two elements compare equal using the Binary Predicate EqualKey.
Hash_multiset is a Hashed Associative Container that stores objects of type Key. Hash_multiset is a simple associative container, meaning that its value type, as well as its key type, is Key. It is also a Multiple Associative Container, meaning that two or more elements may compare equal using the Binary Predicate EqualKey.
(Some containers have been omitted.)
I would go with hash_set if all you care to have is a container which is fast and doesn't contain multiple identical keys. hash_multiset if you do, set or multiset if you want the strings to be sorted, or list or slist if you want the strings to keep their insertion order.
After you've built your list/set, use remove_if to filter out your items based on your criteria.
I will start by tossing out vector since it is a sequential container. Set, I believe is close to being sequential or hashed. I would avoid that. A doublely-linked list, the stl list is one of these, has two pointers and the node. Basically, to remove an item, it breaks the chain then rejoins the two parts with the pointers.
Assuming your search criteria does not depend on the filename (ie. you search for content, file sizes etc.), so you cannot make use of a set, I'd go with a list. It will take you O(N) to construct the whole list, and O(1) per one delete.
If you wanted to make it even faster, and didn't insist on using ready-made STL containers, I would:
use a vector
delete using false delete, ie. marking an item as deleted
when the ratio of deleted/all items raises above certain threshold, I would filter the items with remove_if
This should give you the best space/time/cache performance. (Although you should profile it to be sure)
You can use two lists/vectors/whatever:
using namespace std;
vector<const char *> files;
files.push_back("foo.bat");
files.push_back("bar.txt");
vector<const char *> good_files; // Maybe reserve elements given files.size()?
for(vector<const char *>::const_iterator i = files.begin(); i != files.end(); ++i) {
if(file_is_good(*i)) {
new_files.push_back(*i);
}
}

Dynamically sorted STL containers

I'm fairly new to the STL, so I was wondering whether there are any dynamically sortable containers? At the moment my current thinking is to use a vector in conjunction with the various sort algorithms, but I'm not sure whether there's a more appropriate selection given the (presumably) linear complexity of inserting entries into a sorted vector.
To clarify "dynamically", I am looking for a container that I can modify the sorting order at runtime - e.g. sort it in an ascending order, then later re-sort in a descending order.
You'll want to look at std::map
std::map<keyType, valueType>
The map is sorted based on the < operator provided for keyType.
Or
std::set<valueType>
Also sorted on the < operator of the template argument, but does not allow duplicate elements.
There's
std::multiset<valueType>
which does the same thing as std::set but allows identical elements.
I highly reccomend "The C++ Standard Library" by Josuttis for more information. It is the most comprehensive overview of the std library, very readable, and chock full of obscure and not-so-obscure information.
Also, as mentioned by 17 of 26, Effective Stl by Meyers is worth a read.
If you know you're going to be sorting on a single value ascending and descending, then set is your friend. Use a reverse iterator when you want to "sort" in the opposite direction.
If your objects are complex and you're going to be sorting in many different ways based on the member fields within the objects, then you're probably better off with using a vector and sort. Try to do your inserts all at once, and then call sort once. If that isn't feasible, then deque may be a better option than the vector for large collections of objects.
I think that if you're interested in that level of optimization, you had better be profiling your code using actual data. (Which is probably the best advice anyone here can give: it may not matter that you call sort after each insert if you're only doing it once in a blue moon.)
It sounds like you want a multi-index container. This allows you to create a container and tell that container the various ways you may want to traverse the items in it. The container then keeps multiple lists of the items, and those lists are updated on each insert/delete.
If you really want to re-sort the container, you can call the std::sort function on any std::deque, std::vector, or even a simple C-style array. That function takes an optional third argument to determine how to sort the contents.
The stl provides no such container. You can define your own, backed by either a set/multiset or a vector, but you are going to have to re-sort every time the sorting function changes by either calling sort (for a vector) or by creating a new collection (for set/multiset).
If you just want to change from increasing sort order to decreasing sort order, you can use the reverse iterator on your container by calling rbegin() and rend() instead of begin() and end(). Both vector and set/multiset are reversible containers, so this would work for either.
std::set is basically a sorted container.
You should definitely use a set/map. Like hazzen says, you get O(log n) insert/find. You won't get this with a sorted vector; you can get O(log n) find using binary search, but insertion is O(n) because inserting (or deleting) an item may cause all existing items in the vector to be shifted.
It's not that simple. In my experience insert/delete is used less often than find. Advantage of sorted vector is that it takes less memory and is more cache-friendly. If happen to have version that is compatible with STL maps (like the one I linked before) it's easy to switch back and forth and use optimal container for every situation.
in theory an associative container (set, multiset, map, multimap) should be your best solution.
In practice it depends by the average number of the elements you are putting in.
for less than 100 elements a vector is probably the best solution due to:
- avoiding continuous allocation-deallocation
- cache friendly due to data locality
these advantages probably will outperform nevertheless continuous sorting.
Obviously it also depends on how many insertion-deletation you have to do. Are you going to do per-frame insertion/deletation?
More generally: are you talking about a performance-critical application?
remember to not prematurely optimize...
The answer is as always it depends.
set and multiset are appropriate for keeping items sorted but are generally optimised for a balanced set of add, remove and fetch. If you have manly lookup operations then a sorted vector may be more appropriate and then use lower_bound to lookup the element.
Also your second requirement of resorting in a different order at runtime will actually mean that set and multiset are not appropriate because the predicate cannot be modified a run time.
I would therefore recommend a sorted vector. But remember to pass the same predicate to lower_bound that you passed to the previous sort as the results will be undefined and most likely wrong if you pass the wrong predicate.
Set and multiset use an underlying binary tree; you can define the <= operator for your own use. These containers keep themselves sorted, so may not be the best choice if you are switching sort parameters. Vectors and lists are probably best if you are going to be resorting quite a bit; in general list has it's own sort (usually a mergesort) and you can use the stl binary search algorithm on vectors. If inserts will dominate, list outperforms vector.
STL maps and sets are both sorted containers.
I second Doug T's book recommendation - the Josuttis STL book is the best I've ever seen as both a learning and reference book.
Effective STL is also an excellent book for learning the inner details of STL and what you should and shouldn't do.
For "STL compatible" sorted vector see A. Alexandrescu's AssocVector from Loki.