I couldn't find an answer but I am pretty sure I am not the first one looking for this.
Does anyone know of, use, or has anyone seen an STL-like container with a bidirectional iterator that has O(1) complexity for insert, erase, and lookup?
Thank you.
There is no abstract data type with O(1) complexity for Insert, Erase AND Lookup which also provides a bi-directional access iterator.
Edit:
This is true for an arbitrarily large domain. Given a sufficiently small domain you can implement a set with O(1) complexity for Insert, Erase and Lookup and a bidirectional access iterator using an array and a doubly linked list:
std::list<int> list;
std::list<int>::iterator array[MAX_VALUE];
Initialise:
for (int i=0;i<MAX_VALUE;i++)
array[i] = list.end();
Insert:
if (array[value] == list.end())  // only insert values that are not yet present
array[value] = list.insert(list.end(), value);
Erase:
if (array[value] != list.end()) {
list.erase(array[value]);
array[value] = list.end();
}
Lookup:
array[value] != list.end()
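Put together, the fragments above could look roughly like the following. This is only a sketch under assumptions: the element type is taken to be int, and the class name SmallDomainSet and its member names are made up for illustration. Copying is disabled because the stored iterators would otherwise refer to the original list.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Set of integers in [0, max_value) with O(1) insert, erase and lookup,
// plus bidirectional iteration through the underlying std::list.
// (Hypothetical class, written only to illustrate the answer above.)
class SmallDomainSet {
public:
    using iterator = std::list<int>::const_iterator;

    explicit SmallDomainSet(std::size_t max_value)
        : slots_(max_value, items_.cend()) {}

    // The slots hold iterators into items_, so a copy would be unsafe.
    SmallDomainSet(const SmallDomainSet&) = delete;
    SmallDomainSet& operator=(const SmallDomainSet&) = delete;

    bool contains(int value) const {
        return slots_[index(value)] != items_.cend();
    }

    // Returns true if the value was added, false if it was already present.
    bool insert(int value) {
        auto& slot = slots_[index(value)];
        if (slot != items_.cend()) return false;
        slot = items_.insert(items_.cend(), value);
        return true;
    }

    // Returns true if the value was removed, false if it was not present.
    bool erase(int value) {
        auto& slot = slots_[index(value)];
        if (slot == items_.cend()) return false;
        items_.erase(slot);
        slot = items_.cend();
        return true;
    }

    iterator begin() const { return items_.cbegin(); }
    iterator end() const { return items_.cend(); }

private:
    std::size_t index(int value) const {
        assert(value >= 0 && static_cast<std::size_t>(value) < slots_.size());
        return static_cast<std::size_t>(value);
    }

    std::list<int> items_;  // declared before slots_ so cend() is valid in the constructor
    std::vector<std::list<int>::const_iterator> slots_;
};
```

Iteration visits the values in insertion order, not sorted order, and the memory cost is proportional to the size of the domain rather than to the number of stored values.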
TR1's unordered_set (also available in Boost) is probably what you are looking for. You don't specify whether you want a sequence container, and you don't specify what the O(1) lookup is keyed on (i.e. vectors have O(1) lookup by index, while the unordered_set mentioned above has O(1) average-case lookup by the element itself).
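As a rough illustration of that suggestion, here is a small sketch using the C++11 std::unordered_set (the TR1 and Boost versions are similar). The helper name toggle is made up for this example. Note that the unordered containers only promise forward iterators, so this covers the O(1) average insert/erase/lookup part of the question but not the bidirectional iteration part.

```cpp
#include <iterator>
#include <type_traits>
#include <unordered_set>

using IntSet = std::unordered_set<int>;

// The standard only requires forward iterators here; decrementing an
// unordered_set iterator is not something portable code can rely on.
static_assert(
    std::is_base_of<std::forward_iterator_tag,
                    std::iterator_traits<IntSet::iterator>::iterator_category>::value,
    "unordered_set iterators are at least forward iterators");

// Hypothetical helper: removes `value` if present, inserts it otherwise.
// Returns true if the value is in the set afterwards.
bool toggle(IntSet& s, int value) {
    auto it = s.find(value);   // O(1) on average
    if (it != s.end()) {
        s.erase(it);           // O(1) on average
        return false;
    }
    s.insert(value);           // O(1) on average
    return true;
}
```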
In practice, it may be sufficient to use an array (vector) and defer the cost of inserts and deletes.
Delete an element by marking it as deleted; insert an element into a bin at the desired position and remember the offset for larger indices.
Inserts and deletes will be O(1) plus an O(N) cleanup at a convenient time; lookup will be O(1) on average and O(number of changes since the last cleanup) in the worst case.
Associative arrays (hashtable) have O(1) lookup complexity, while doubly linked lists have O(1) bidi iteration.
One trick I've used when messing about with storage optimization is to implement a linked list with an add of O(1)[1], then have a caching operation which provides a structure with a faster O(n) lookup[2]. The actual cache takes some O(n) time to build, and I didn't focus on erase. So I 'cheated' a bit and pushed the work into another operation. But if you don't have to do a ton of adds/deletes, it's not a bad way to do it.
[1] Store end pointer and only add onto the end. No traversal required.
[2] I created a dynamic array[3] and searched against it. Since the data wasn't sorted, I couldn't binsearch against it for O(lg n) time. Although I suppose I could have sorted it.
[3]Arrays have better cache performance than lists.
A full list of all the complexity guarantees for the STL can be found here:
What are the complexity guarantees of the standard containers?
Summary:
Insert: No container guarantees O(1) for generic insert.
The only container that has a generic insert guarantee is the 'Associative Container', and this is O(log n).
There are containers that provide limited insert guarantees:
Front sequences guarantee an insert at the head of O(1)
Back sequences guarantee an insert at the tail of O(1)
Erase:
The Associative containers guarantee O(1) for erase (if you have an iterator).
Lookup:
If by lookup you mean element access (as no container has O(1) find capabilities),
then the Random Access container is the only container with O(1) access.
So the answer is based on container types.
These are the concepts the standard's guarantees are defined for. Here is how they translate to real containers:
std::vector: Sequence, Back Sequence, Forward/Reverse/Random Container
std::deque: Sequence, Front/Back Sequence, Forward/Reverse/Random Container
std::list: Sequence, Front/Back Sequence, Forward/Reverse Container
std::set: Sorted/Simple/Unique Associative Container, Forward Container
std::map: Sorted/Pair/Unique Associative Container, Forward Container
std::multiset: Sorted/Simple/Multiple Associative Container, Forward Container
std::multimap: Sorted/Pair/Multiple Associative Container, Forward Container
You won't be able to fit all of your requirements into one container... something's gotta give ;)
However, maybe this is interesting for you:
http://www.cplusplus.com/reference/stl/
I'm trying to get a better understanding of the implementations of map-like std container. By map-like, I mean something with a key/value pair. Which container(s) performs the fewest copies during insert and erase (or which is better at each if it's not the same)?
There are two standard map-like containers. std::map and std::unordered_map. There're also their multi-map variants, but I presume that those don't count as "map-like" for the purpose of this question.
Neither standard map-like container copies existing elements during insert or erase. They do, however, perform operations such as copies on their internal structure.
std::map insert complexity is logarithmic unless you use a good hint in which case it's amortised constant time. std::map erase complexity is constant with an iterator; otherwise logarithmic. std::unordered_map insert and erase complexity is linear in worst case; constant on average.
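To illustrate the "good hint" case, here is a minimal sketch that builds a std::map from input that is assumed to be already sorted by key, so that end() is always the right hint. The function name build_from_sorted is made up for the example; emplace_hint itself is a standard member of std::map since C++11.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Builds a map from pairs that are assumed to be sorted by ascending key.
// (Hypothetical helper for illustration.)
std::map<int, std::string> build_from_sorted(
        std::vector<std::pair<int, std::string>> sorted_pairs) {
    std::map<int, std::string> result;
    for (auto& kv : sorted_pairs) {
        // end() is a good hint when each key is greater than every key
        // already in the map, which gives amortised constant-time insertion.
        // With a bad hint the element is still inserted correctly, just in
        // logarithmic time. The mapped value is moved rather than copied.
        result.emplace_hint(result.end(), kv.first, std::move(kv.second));
    }
    return result;
}
```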
The algorithm I'm implementing has the structure:
while C is not empty
select a random entry e from C
if some condition on e
append some new entries to C (I don't care where)
else
remove e from C
It's important that each iteration of the loop e is chosen at random (with uniform probability).
Ideally the select, append and remove steps are all O(1).
If I understand correctly, using std::list the append and remove steps will be O(1) but the random selection will be O(n) (e.g., using std::advance as in this solution).
And std::deque and std::vector seem to have complementary O(1) and O(n) operations.
I'm guessing that std::set will introduce some O(log n) complexity.
Is there any stl container that supports all three operations that I need in constant time (or amortized constant time)?
If you don't care about order and uniqueness of elements in your container, you can use the following:
std::vector<int> C;
while (!C.empty()) {
size_t pos = some_function_returning_a_number_between_zero_and_C_size_minus_one();
if (condition())
C.push_back(new_entry);
else {
C[pos] = std::move(C.back());  // overwrite the chosen entry with the last one
C.pop_back();
}
}
No such container exists if element order should be consistent. You can get O(1) selection and (amortized) append with vector or deque, but removal is O(n). You can get O(1) (average case) insertion and removal with unordered_map, but selection is O(n). list gets you O(1) for append and removal, but O(n) selection. There is no container that will get you O(1) for all three operations. Figure out the least commonly used one, choose a container which works for the others, and accept the one operation will be slower.
If the order of the container doesn't matter, per user3365922's comment, the removal step could be done in O(1) on a vector/deque by swapping the element to be removed with the final element, then performing a pop_back.
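The swap-and-pop removal, combined with uniform random selection, might be sketched like this. The function name take_random is an assumption for illustration, and the random engine is passed in by the caller (for example a std::mt19937).

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Removes and returns a uniformly chosen element in O(1).
// The relative order of the remaining elements is NOT preserved.
// Precondition: !items.empty(). (Hypothetical helper for illustration.)
template <typename T, typename Rng>
T take_random(std::vector<T>& items, Rng& rng) {
    assert(!items.empty());
    std::uniform_int_distribution<std::size_t> pick(0, items.size() - 1);
    const std::size_t pos = pick(rng);
    T chosen = std::move(items[pos]);
    if (pos != items.size() - 1) {
        items[pos] = std::move(items.back());  // fill the hole with the last element
    }
    items.pop_back();
    return chosen;
}
```

Appending with push_back stays amortized O(1), so all three steps of the loop in the question run in (amortized) constant time.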
I'm guessing that std::set will introduce some O(log n) complexity.
Not quite. Random selection in a set has linear complexity.
Is there any stl container that supports all three operations that I need in constant time (or amortized constant time)?
Strictly speaking no.
However, if you don't care about the order of the elements, then you can remove from a vector or deque in constant time. With this relaxation of requirements, all operations would have constant complexity.
In case you did need to keep the order between operations, constant complexity would still be possible as long as the order of the elements doesn't need to affect the random distribution (i.e. you want even distribution). The solution is to use a hybrid approach:
Store the values in a linked list, and store an iterator to each element in a vector. Use the vector for random selection. Erase the element from the list through its iterator, which preserves the order of the remaining elements, then erase the iterator from the vector without maintaining the order of the iterators. When adding an element to the list, remember to add its iterator to the vector.
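A minimal sketch of this hybrid, assuming int elements; the class name OrderedRandomBag and its members are invented for the example. Copying is disabled because the vector holds iterators into the list.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <random>
#include <vector>

// Keeps values in insertion order (std::list) while allowing O(1) uniform
// random removal through a vector of list iterators.
// (Hypothetical class for illustration.)
class OrderedRandomBag {
public:
    OrderedRandomBag() = default;
    OrderedRandomBag(const OrderedRandomBag&) = delete;
    OrderedRandomBag& operator=(const OrderedRandomBag&) = delete;

    void push_back(int value) {
        values_.push_back(value);
        handles_.push_back(std::prev(values_.end()));
    }

    bool empty() const { return values_.empty(); }
    const std::list<int>& values() const { return values_; }

    // Removes a uniformly chosen element and returns its value.
    // Precondition: !empty().
    template <typename Rng>
    int remove_random(Rng& rng) {
        assert(!empty());
        std::uniform_int_distribution<std::size_t> pick(0, handles_.size() - 1);
        const std::size_t pos = pick(rng);
        const std::list<int>::iterator it = handles_[pos];
        const int value = *it;
        values_.erase(it);                // the list keeps the order of the rest
        handles_[pos] = handles_.back();  // the handle order does not matter
        handles_.pop_back();
        return value;
    }

private:
    std::list<int> values_;
    std::vector<std::list<int>::iterator> handles_;
};
```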
I'm looking for data structure that behaves like a list, where we can insert an element at ANY given position and then read an element at ANY given position, where insertion and reading should be in logarithmic time. Is there something like this in the standard library or maybe I'm stuck with having to write this on my own (I know it can be implemented as a tree)?
std::multiset behaves pretty much like the logarithmic std::list that you are looking for
iteration is bidirectional
insertion / reading are O(log N)
Note, however (as pointed out by @SergeRogatch), that the "price" you pay for O(log N) lookup (instead of O(N) for list) is that multiset orders elements as they are inserted, so you cannot choose the position yourself. This behaves differently from std::list. It also means that your elements need to be comparable using std::less<>, or you need to provide your own comparator.
An alternative would be to use std::unordered_multiset (i.e. a hash table), which has average O(1) element access, but then there is no deterministic order either. Again, your elements then need to be usable with std::hash<>, or you need to write your own hash function.
What is the most efficient way of adding non-repeated elements into an STL container, and what kind of container is the fastest? I have a large amount of data and I'm afraid that checking each time whether an element is new takes a lot of time. I hope map will be very fast.
// 1- Map
map<int, int> Map;
...
if(Map.find(Element)==Map.end()) Map[Element]=ID;  // add only if not already present
// 2-Vector
vector<int> Vec;
...
if(find(Vec.begin(), Vec.end(), Element)==Vec.end()) Vec.push_back(Element);  // add only if not already present
// 3-Set
// Edit: I made a mistake: set::find is O(LogN) not O(N)
Both set and map have O(log(N)) performance for looking up keys. vector has O(N).
The difference between set and map, as far as you should be concerned, is whether you need to associate a key with a value, or just store a value directly. If you need the former, use a map, if you need the latter, use a set.
In both cases, you should just use insert() instead of doing a find().
The reason is insert() will insert the value into the container if and only if the container does not already contain that value (in the case of map, if the container does not contain that key). This might look like
Map.insert(std::make_pair(Element, ID));
for a map or
Set.insert(Element);
for a set.
You can consult the return value to determine whether or not an insertion was actually performed.
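For instance, a small sketch of checking that return value: insert() on std::set and std::map returns a std::pair whose second member is true only when an insertion actually happened. The function name add_if_new is made up for this example.

```cpp
#include <map>
#include <set>
#include <utility>

// Returns true if `element` was new and has been added to the set.
bool add_if_new(std::set<int>& seen, int element) {
    return seen.insert(element).second;
}

// Returns true if `element` was a new key; an existing key keeps its old id.
bool add_if_new(std::map<int, int>& ids, int element, int id) {
    return ids.insert(std::make_pair(element, id)).second;
}
```

The first member of the returned pair is an iterator to the element with that key, whether it was just inserted or was already there.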
If you're using C++11, you have two more choices, which are std::unordered_map and std::unordered_set. These both have average O(1) performance for insertions and lookups. However, they also require that the key (or value, in the case of set) be hashable, which means you'll need to specialize std::hash<> for your key if it is a user-defined type. Conversely, std::map and std::set require that your key (or value, in the case of set) respond to operator<().
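As a sketch of what specializing std::hash<> can look like, assume a made-up key type Point with two int members. The unordered containers also need an equality comparison (operator== by default). The mixing formula below is just one common way of combining two hashes, chosen for illustration, not something the standard prescribes.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_set>

// Hypothetical key type for illustration.
struct Point {
    int x;
    int y;
};

// Unordered containers compare keys with operator== by default.
inline bool operator==(const Point& a, const Point& b) {
    return a.x == b.x && a.y == b.y;
}

namespace std {
// Specializing std::hash for a user-defined type is permitted.
template <>
struct hash<Point> {
    size_t operator()(const Point& p) const noexcept {
        const size_t h1 = hash<int>()(p.x);
        const size_t h2 = hash<int>()(p.y);
        // One common way to mix two hash values (an illustrative choice).
        return h1 ^ (h2 + 0x9e3779b9 + (h1 << 6) + (h1 >> 2));
    }
};
}  // namespace std
```

With this in place, std::unordered_set<Point> and std::unordered_map<Point, T> can be used directly.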
If you're using C++11, you can use std::unordered_set. That would allow you O(1) existence-checking (technically average-case O(1), with O(n) in the worst case).
std::set would probably be your second choice with O(lg n).
Basically, std::unordered_set is a hash table and std::set is a tree structure (a red-black tree in every implementation I've ever seen) (1).
Depending on how well your hashes distribute and how many items you have, a std::set might actually be faster. If it's truly performance critical, then as always, you'll want to do benchmarking.
1) Technically speaking, I don't believe either are required to be implemented as a hash table or as a balanced BST. If I remember correctly, the standard just mandates the run time bounds, not the implementation -- it just turns out that those are the only viable implementations that fit the bounds.
You should use a std::set; it is a container designed to hold a single (equivalent) copy of an object and is implemented as a binary search tree. Therefore, it is O(log N), not O(N), in the size of the container.
std::set and std::map often share a large part of their underlying implementation; you should check out your local STL implementation.
Having said all this, complexity is only one measure of performance. You may get better performance using a sorted vector, as it keeps the elements next to one another in memory and is therefore more likely to hit the caches. Cache locality is a large part of data structure design these days.
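A sorted vector with unique elements could be maintained roughly as follows. This is a sketch assuming int elements; the function name insert_unique_sorted is made up. The lookup is O(log N) via binary search, but the insertion itself is O(N) because later elements must be shifted, so whether it beats std::set depends on the data and should be benchmarked.

```cpp
#include <algorithm>
#include <vector>

// Inserts `value` into an already sorted vector unless it is present.
// Returns true if the value was inserted. (Hypothetical helper.)
bool insert_unique_sorted(std::vector<int>& sorted, int value) {
    // Binary search for the first element that is not less than `value`.
    auto pos = std::lower_bound(sorted.begin(), sorted.end(), value);
    if (pos != sorted.end() && *pos == value) {
        return false;  // already present
    }
    sorted.insert(pos, value);  // shifts the tail: O(N) in the worst case
    return true;
}
```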
Sounds like you want to use a std::set. Its elements are unique, so you don't need to care about uniqueness when adding elements, and a.find(k) (where a is a std::set and k is a value) is defined as being logarithmic in complexity.
If your elements can be hashed in O(1), then it is better to index them in an unordered_map or unordered_set (not in a map/set, because those are implemented as red-black trees, which have O(log N) find complexity).
Your examples show a definite pattern:
check if the value is already in container
if not, add the value to the container.
Both of these operations could potentially take some time. First, looking up an element can be done in O(N) time (linear search) if the elements are not arranged in any particular manner (e.g., just a plain std::vector). It can be done in O(log N) time (binary search) if the elements are sorted (e.g., either std::map or std::set), and in O(1) time if the elements are hashed (e.g., either std::unordered_map or std::unordered_set).
The insertion will be O(1) (amortized) for a plain vector or an unordered container (hash container), although the hash container will be a bit slower. For a sorted container like set or map, you'll have log-time insertions because it needs to look for the place to insert it before inserting it.
So, the conclusion, use std::unordered_set or std::unordered_map (if you need the key-value feature). And you don't need to check before doing the insertion, these are unique-key containers, they don't allow duplicates.
If std::unordered_set / std::unordered_map (from C++11) or std::tr1::unordered_set / std::tr1::unordered_map (since 2007) are not available to you (or any equivalent), then the next best alternative is std::set / std::map.
I'm looking for a C++ implementation of a data structure (or a combination of data structures) that meets the following criteria:
items are accessed in the same way as in std::vector
provides random access iterator ( along with iterator comparison <,> )
average item access(:lookup) time is at worst of O(log(n)) complexity
items are iterated over in the same order as they were added to the container
given an iterator, i can find out the ordinal position of the item pointed to in the container, at worst of O(log(n)) complexity
provides item insertion and removal at specific position of at worst O(log(n)) complexity
removal/insertion of items does not invalidate previously obtained iterators
Thank you in advance for any suggestions
Dalibor
(Edit) Answers:
The answer I selected describes a data structure that meets all these requirements. However, boost::multi_index, as suggested by Maxim Yegorushkin, provides features very close to those above.
(Edit) Some of the requirements were not correctly specified. They are modified according to correction(:original)
(Edit) I've found an implementation of the data structure described in the accepted answer. So far, it works as expected. It's called counter tree
(Edit) Consider using the AVL-Array suggested by sp2danny
Based on your requirements boost::multi_index with two indices does the trick.
The first index is ordered index. It allows for O(log(n)) insert/lookup/remove. The second index is random access index. It allows for random access and the elements are stored in the order of insertion. For both indices iterators don't get invalidated when other elements are removed. Converting from one iterator to another is O(1) operation.
Let's go through these...
average item lookup time is at worst of O(log(n)) complexity
removal/insertion of items does not invalidate previously obtained iterators
provides item insertion and removal of at worst O(log(n)) complexity
That pretty much screams "tree".
provides random access iterator ( along with iterator comparison <,> )
given an iterator, i can find out the ordinal position of the item pointed to in the container, at worst of O(log(n)) complexity
items are iterated over in the same order as they were added to the container
I'm assuming that the index you're providing your random-access iterator is by order of insertion, so [0] would be the oldest element in the container, [1] would be the next oldest, etc. This means that, on deletion, for the iterators to be valid, the iterator internally cannot store the index, since it could change without notice. So just using a map with the key being the insertion order isn't going to work.
Given that, each node of your tree needs to keep track of how many elements are in each subtree, in addition to its usual members. This will allow random-access with O(log(N)) time. I don't know of a ready-to-go set of code, but subclassing std::rb_tree and std::rb_node would be my starting point.
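The subtree-size idea can be sketched as below. This is deliberately an unbalanced binary tree, so it only shows how the size counts drive positional insert and access; it does not by itself give the O(log N) bound, which needs a balancing scheme (red-black, AVL, ...) on top. The names Node, size_of, insert_at and at are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Tree node that also stores the number of nodes in its subtree.
// (Hypothetical type for illustration; no rebalancing is performed.)
struct Node {
    int value;
    std::size_t size;  // nodes in this subtree, including this node
    std::unique_ptr<Node> left;
    std::unique_ptr<Node> right;
    explicit Node(int v) : value(v), size(1) {}
};

inline std::size_t size_of(const std::unique_ptr<Node>& n) {
    return n ? n->size : 0;
}

// Inserts `value` so that it ends up at 0-based position `index`
// in the in-order sequence. Precondition: index <= size_of(root).
inline void insert_at(std::unique_ptr<Node>& root, std::size_t index, int value) {
    assert(index <= size_of(root));
    if (!root) {
        root.reset(new Node(value));
        return;
    }
    const std::size_t left_size = size_of(root->left);
    if (index <= left_size) {
        insert_at(root->left, index, value);
    } else {
        insert_at(root->right, index - left_size - 1, value);
    }
    ++root->size;  // one more node below this one
}

// Returns the element at 0-based position `index` in in-order sequence.
// Precondition: index < size_of(root).
inline const int& at(const std::unique_ptr<Node>& root, std::size_t index) {
    assert(index < size_of(root));
    const Node* n = root.get();
    for (;;) {
        const std::size_t left_size = size_of(n->left);
        if (index < left_size) {
            n = n->left.get();
        } else if (index == left_size) {
            return n->value;
        } else {
            index -= left_size + 1;  // skip the left subtree and this node
            n = n->right.get();
        }
    }
}
```

Both operations take time proportional to the height of the tree. Finding the ordinal position of a given node would additionally need parent pointers, so that the sizes of the left subtrees can be summed on the way up to the root.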
See here: STL Containers (scroll down the page to see information on algorithmic complexity) and I think std::deque fits your requirements.
AVL-Array should fit the bill.
Here's my "lv" container, which fits the requirements with O(log n) insert/delete/access time.
https://github.com/xhawk18/lv
The container is a header-only C++ library,
and it has the same iterators and functions as other C++ containers, such as list and vector.
The "lv" container is based on a red-black tree in which each node stores the number of nodes in its subtree. By checking the sizes of the left and right children of a node, we can quickly access any node by position.