STL container to select and remove a random item? - c++

The algorithm I'm implementing has the structure:
while C is not empty
    select a random entry e from C
    if some condition on e
        append some new entries to C (I don't care where)
    else
        remove e from C
It's important that in each iteration of the loop, e is chosen at random (with uniform probability).
Ideally the select, append and remove steps are all O(1).
If I understand correctly, using std::list the append and remove steps will be O(1) but the random selection will be O(n) (e.g., using std::advance as in this solution).
And std::deque and std::vector seem to have complementary O(1) and O(n) operations.
I'm guessing that std::set will introduce some O(log n) complexity.
Is there any STL container that supports all three operations I need in constant time (or amortized constant time)?

If you don't care about order and uniqueness of elements in your container, you can use the following:
std::vector<int> C;
while (!C.empty()) {
    // Pick a uniformly random index in [0, C.size() - 1].
    size_t pos = some_function_returning_a_number_between_zero_and_C_size_minus_one();
    if (condition(C[pos]))
        C.push_back(new_entry);
    else {
        // Swap-and-pop: overwrite the chosen element with the last one,
        // then shrink the vector. O(1), but element order is not preserved.
        C[pos] = std::move(C.back());
        C.pop_back();
    }
}

No such container exists if element order must be preserved. With vector or deque you get O(1) selection and (amortized) O(1) append, but O(n) removal. With unordered_map you get O(1) (average case) insertion and removal, but O(n) selection. With list you get O(1) append and removal, but O(n) selection. No single container gives you O(1) for all three operations. Figure out which operation you use least, choose a container that works for the other two, and accept that the remaining operation will be slower.
If the order of the container doesn't matter (per user 3365922's comment), the removal step can be done in O(1) on a vector/deque by swapping the element to be removed with the final element, then performing a pop_back, as in the sketch below.
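A minimal sketch of that swap-and-pop idiom (the helper name is illustrative):

#include <cstddef>
#include <utility>
#include <vector>

// Remove the element at index pos in O(1), sacrificing element order.
template <typename T>
void unordered_erase(std::vector<T>& v, std::size_t pos) {
    v[pos] = std::move(v.back());  // overwrite with the last element
    v.pop_back();                  // shrink by one
}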

I'm guessing that std::set will introduce some O(log n) complexity.
Not quite. Random selection in a set has linear complexity.
Is there any stl container that supports all three operations that I need in constant time (or amortized constant time)?
Strictly speaking, no.
However, if you don't care about the order of the elements, then you can remove from a vector or deque in constant time. With this relaxation of requirements, all operations would have constant complexity.
In case you do need to keep the order between operations, constant complexity is still possible as long as the order of the elements doesn't need to affect the random distribution (i.e. you want a uniform distribution). The solution is a hybrid approach: store the values in a linked list, and store an iterator to each element in a vector. Use the vector for random selection. Erase an element from the list via its iterator, which preserves the order of the remaining elements; erase that iterator from the vector without maintaining the order of the iterators (swap-and-pop). When adding an element to the list, remember to add its iterator to the vector. A sketch of this structure follows.
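A minimal sketch of the hybrid structure, assuming C++11 (all names are illustrative):

#include <cstddef>
#include <iterator>
#include <list>
#include <random>
#include <vector>

// Values live in a std::list (stable order); a vector of list iterators
// provides O(1) uniform random selection and O(1) unordered index erase.
template <typename T>
class random_pick_list {
    std::list<T> values_;
    std::vector<typename std::list<T>::iterator> index_;

public:
    void add(const T& v) {
        values_.push_back(v);
        index_.push_back(std::prev(values_.end()));
    }

    bool empty() const { return index_.empty(); }

    // Uniformly random position into the index; O(1). Precondition: !empty().
    std::size_t pick(std::mt19937& rng) const {
        return std::uniform_int_distribution<std::size_t>(0, index_.size() - 1)(rng);
    }

    T& at(std::size_t pos) { return *index_[pos]; }

    // Erase the selected element in O(1). List order is preserved;
    // the index is compacted with swap-and-pop, so its order is not.
    void remove(std::size_t pos) {
        values_.erase(index_[pos]);
        index_[pos] = index_.back();
        index_.pop_back();
    }
};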

Related

C++ - List with logarithmic read, insertion at given position

I'm looking for a data structure that behaves like a list, where we can insert an element at ANY given position and then read an element at ANY given position, with insertion and reading both in logarithmic time. Is there something like this in the standard library, or am I stuck with writing it on my own (I know it can be implemented as a tree)?
std::multiset behaves pretty much like the logarithmic std::list that you are looking for:
- iteration is bidirectional
- insertion / reading are O(log N)
Note however (as pointed out by @SergeRogatch) that the "price" you pay for O(log N) lookup (instead of O(N) for list) is that multiset keeps its elements sorted by value, not in insertion order. This behaves differently from std::list. It also means your elements need to be comparable using std::less<>, or you need to provide your own comparator.
An alternative would be to use std::unordered_multiset (i.e. a hash table), which has amortized O(1) element access, but then there is no deterministic order either. Again, your elements then need to be usable with std::hash<>, or you need to write your own hash function.
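For illustration, a minimal side-by-side sketch of the two options:

#include <set>
#include <unordered_set>

int main() {
    std::multiset<int> ms;            // needs operator< or a custom comparator
    ms.insert(42);                    // O(log N) insertion
    bool inTree = ms.count(42) > 0;   // O(log N) lookup

    std::unordered_multiset<int> ums; // needs std::hash<int> or a custom hash
    ums.insert(42);                   // amortized O(1) insertion on average
    bool inHash = ums.count(42) > 0;  // O(1) lookup on average
    (void)inTree;
    (void)inHash;
}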

std::vector vs std::list for insertion frequency and dynamic size

Consider the case where I have to add a random number of items to a container, i.e. the size of the container cannot be predicted, the frequency of insertion is high, and insertion happens at the end of the container. On top of that, I want to remove an element in close to constant time.
Note: I also want to be able to use the list or the vector in shared memory.
So in this case, which is better to use std::vector or std::list?
With the clarification in the comments, the answer is std::vector. This is not surprising, as std::list is rarely the best container for a job.
Adding elements to the end is amortized constant time, and removing an element at an arbitrary index is in practice as fast as with a list, since finding an element in a list typically takes longer than deleting it from a vector.
Note that if order does not matter, you can swap an arbitrary element of a vector with the end one, then pop_back for constant-time random-access erase.
If you regularly iterate over the container, you can use remove_if followed by erase to delete elements efficiently in the same pass, as in the sketch below. If erasure happens at a different time than iteration, marking elements as 'to be erased' and then erasing them during the next iteration can keep things sane.
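For instance, a minimal sketch of the erase-remove idiom:

#include <algorithm>
#include <vector>

int main() {
    std::vector<int> v{1, 2, 3, 4, 5, 6};
    // remove_if shifts the elements to keep toward the front and returns
    // the new logical end; erase then chops off the tail. One O(N) pass
    // instead of repeated O(N) single-element erases.
    v.erase(std::remove_if(v.begin(), v.end(),
                           [](int x) { return x % 2 == 0; }),
            v.end());
    // v now holds {1, 3, 5}
}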
Another container to consider if the element order does not matter is unordered_set.

C++ std::unordered_map complexity

I've read a lot about unordered_map (C++11) time complexity here on Stack Overflow, but I haven't found the answer to my question.
Let's assume indexing by integer (just for example):
Insert/at work in constant time (on average), so this example would take O(1) per element:
std::unordered_map<int, int> mymap = {
    { 1, 1 },
    { 100, 2 },
    { 100000, 3 }
};
What I am curious about is how long it takes to iterate through all the (unsorted) values stored in the map, e.g.
for ( auto it = mymap.begin(); it != mymap.end(); ++it ) { ... }
Can I assume that each stored value is accessed only once (or twice, or a constant number of times)? That would imply that iterating through all values of an N-element map is O(N). The other possibility is that my example with keys {1, 100, 100000} could take up to 100000 iterations (if the map were represented by a flat array indexed by key).
Is there any other container that can be iterated linearly and whose values can be accessed by key in constant time?
What I would really need is (pseudocode)
myStructure.add(key, value)         // O(1)
value = myStructure.at(key)         // O(1)
for (auto key : myStructure) {...}  // O(1) per key/value pair = O(N) for N values
Is std::unordered_map the structure I need?
Integer indexing is sufficient, and average complexity is fine as well.
Regardless of how they're implemented, standard containers provide iterators that meet the iterator requirements. Incrementing an iterator is required to be constant time, so iterating through all the elements of any standard container is O(N).
The complexity guarantees of all standard containers are specified in the C++ Standard.
std::unordered_map element access and element insertion is required to be of complexity O(1) on average and O(N) worst case (cf. Sections 23.5.4.3 and 23.5.4.4; pages 797-798).
A specific implementation (that is, a specific vendor's implementation of the Standard Library) can choose whatever data structure it wants. However, to be compliant with the Standard, its complexity must be at least as good as specified.
There are a few different ways that a hash table can be implemented, and I suggest you read more on them if you're interested, but the main two are chaining and open addressing.
In the first case you have an array of linked lists. Each entry in the array may be empty; every item in the hash table lives in some bucket. Iteration walks down the array and down each non-empty list in it. That is clearly O(N), but it can be quite memory-inefficient depending on how the linked lists themselves are allocated.
In the second case, you just have one very large array with lots of empty slots. Here iteration is again clearly linear, but it can be inefficient if the table is mostly empty (which it should be, for lookup performance), because the elements that are actually present will sit in different cache lines.
Either way, iteration is linear and touches every element exactly once. Note that this is true of std::map as well; iteration is linear there too. But in the case of the maps, iteration will definitely be far less efficient than iterating over a vector, so keep that in mind. If your use case requires BOTH fast lookup and fast iteration, and you insert all your elements up front and never erase, it can be much better to keep both the map and the vector, trading extra space for the added performance.
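A minimal sketch of that map-plus-vector combination (the struct name is illustrative; it assumes insert-only usage, per the caveat above):

#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// The vector provides cache-friendly O(N) iteration; the map provides
// average O(1) lookup by key. Valid only while no elements are erased,
// since erasing from the vector would invalidate the stored indices.
struct indexed_records {
    std::vector<std::pair<int, std::string>> items;  // iteration order
    std::unordered_map<int, std::size_t> by_key;     // key -> vector index

    void add(int key, std::string value) {
        by_key[key] = items.size();
        items.emplace_back(key, std::move(value));
    }

    const std::string& at(int key) const {
        return items[by_key.at(key)].second;
    }
};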

How to efficiently insert a range of consecutive integers into a std::set?

In C++, I have a std::set into which I would like to insert a range of consecutive integers. How can I do this efficiently, hopefully in O(n) time where n is the length of the range?
I'm thinking I'd use the InputIterator overload of std::set::insert, but am unclear on how to build the input iterator.
std::set<int> mySet;
// Insert [34 - 75):
mySet.insert(inputIteratorTo34, inputIteratorTo75);
How can I create the input iterator and will this be O(n) on the range size?
The efficient way of inserting already ordered elements into a set is to hint the library as to where the next element will be. For that you want to use the version of insert that takes an iterator:
std::set<int>::iterator it = mySet.end();
for (int x : input) {
    it = mySet.insert(it, x);
}
On the other hand, you might want to consider other containers. Whenever possible, use std::vector. If the amount of insertions is small compared to lookups, or if all inserts happen upfront, then you can build a vector, sort it and use lower_bound for lookups. In this case, since the input is already sorted, you can skip the sorting.
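For instance, a minimal sketch of the sorted-vector alternative for this input:

#include <algorithm>
#include <vector>

int main() {
    std::vector<int> v;
    for (int i = 34; i < 75; ++i)
        v.push_back(i);        // appended in order, so already sorted
    // O(log n) membership test via binary search:
    bool has50 = std::binary_search(v.begin(), v.end(), 50);
    // lower_bound returns an iterator to the first element >= 50:
    auto it = std::lower_bound(v.begin(), v.end(), 50);
    (void)has50;
    (void)it;
}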
If insertions (or removals) happen all over the place, you might want to consider using std::unordered_set<int> which has an average O(1) insertion (per element) and lookup cost.
For the particular case of tracking numbers that are all small (34 to 75 are small numbers), you can also consider using a bitset or even a plain array of bool in which you set an element to true when it is inserted. Either gives O(n) insertion (over all elements) and O(1) lookup (per lookup), which is better than the set.
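A minimal sketch of the bitset idea for the [34, 75) range (the bound of 100 is arbitrary):

#include <bitset>

int main() {
    std::bitset<100> present;      // one bit per possible value
    for (int i = 34; i < 75; ++i)
        present.set(i);            // O(1) per insertion
    bool has50 = present.test(50); // O(1) lookup
    (void)has50;
}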
A Boost way could be:
std::set<int> numbers(
    boost::counting_iterator<int>(0),
    boost::counting_iterator<int>(10));
A great link for other answers, especially @Mani's answer:
std::set is a kind of binary search tree, which means an insertion costs O(log n) on average.
C++98: If N elements are inserted, N log(size+N) in general, but linear in size+N if the elements are already sorted according to the same ordering criterion used by the container.
C++11: If N elements are inserted, N log(size+N). Implementations may optimize if the range is already sorted.
I think a C++98 implementation will track the current insertion node and check whether the next value to insert is larger than the current one, in which case there's no need to start from the root again.
In C++11 this is an optional optimization, so you could implement a skip-list structure and use this range-insert feature in your implementation, or you can optimize the program according to your scenario.
Taking the hint provided by aksham, I see the answer is:
#include <boost/iterator/counting_iterator.hpp>

std::set<int> mySet;
// Insert [34, 75):
mySet.insert(boost::counting_iterator<int>(34),
             boost::counting_iterator<int>(75));
It's not clear why you specifically want to insert using iterators to specify a range.
However, I believe you can use a simple for-loop to insert with the desired O(n) complexity.
Quoting from cppreference's page on std::set, the complexity is:
If N elements are inserted, Nlog(size+N) in general, but linear in size+N if the elements are already sorted according to the same ordering criterion used by the container.
So, using a for-loop:
std::set<int> mySet;
for (int i = 34; i < 75; ++i)
    mySet.insert(i);

A variation of priority queue

I need some kind of priority queue to store pairs <key, value>. Values are unique, but keys aren't. I will be performing the following operations (most common first):
1. random insertion;
2. retrieving (and removing) all elements with the least key;
3. random removal (by value).
I can't use std::priority_queue because it only supports removing the head.
For now, I'm using an unsorted std::list. Insertion is performed by just pushing new elements to the back (O(1)). Operation 2 sorts the list with list::sort (O(N log N)) before performing the actual retrieval. Removal, however, is O(N), which is a bit expensive.
Any idea of a better data structure?
When you need order, use an ordered container. There is no point in paying the cost of sorting later on.
Your current solution is:
- Insertion: O(1)
- Retrieval: O(N log N)
- Removal: O(N) (which is as good as you can get without keeping another index)
Simply using a std::multimap, you could have:
- Insertion: O(log N)
- Retrieval: O(log N) <- much better, isn't it? We need to find the end of the equal range
- Removal: O(N)
Now, you could do slightly better with a std::map<key, std::vector<value>>:
- Insertion: O(log M), where M is the number of distinct keys
- Retrieval: O(1) (begin is guaranteed to be amortized constant time)
- Removal: O(N)
You can't really push the random removal... unless you're willing to keep another index there. For example:
typedef std::vector<value_type> data_value_t;
typedef std::map<key_type, data_value_t> data_t;

// The iterator locates the right vector; the size_t is an index into it.
typedef std::pair<data_t::iterator, size_t> index_value_t;
typedef std::unordered_map<value_type, index_value_t> index_t;
But keeping this second index up to date is error-prone... and it comes at the expense of the other operations! For example, with this structure you would have:
- Insertion: O(log M) -> insertion into the hash map itself is O(1)
- Retrieval: O(N/M) -> we need to de-index all the values in the vector; there are N/M of them on average
- Removal: O(N/M) -> finding the entry in the hash map is O(1), dereferencing is O(1), and removing from the vector is O(N/M) because we need to shift approximately half the content of the vector. Using a list would yield O(1)... but might not be faster in practice (it depends on the number of elements, because of the memory trade-off).
Also bear in mind that hash map complexities are amortized. Trigger a rehash because you outgrew the load factor, and that particular insertion will take a very long time.
I'd really go with the std::map<key_type, std::vector<value_type> > in your stead. That's the best bang for the buck.
Can you reverse the order of the collection, i.e. store the pairs in <value, key> order?
Then you could just use std::map, with O(log n) time for insertion, O(n) for retrieval (traversing the whole collection), and O(log n) for random removal by value (the value being the key of said map).
If you could find a map implementation based on hashes instead of trees (as std::map is), the times would be even better: O(1), O(n), O(1).
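A minimal sketch of that reversed layout (types are illustrative):

#include <map>
#include <string>

int main() {
    // Index by the unique value; the priority key becomes the mapped type.
    std::map<std::string, int> by_value;
    by_value.emplace("job-a", 2);   // O(log n) insertion
    by_value.emplace("job-b", 1);
    by_value.erase("job-a");        // O(log n) removal by value
    // Retrieving all entries with the least key still needs an O(n) scan.
}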
If you're using Visual Studio, it has hash_multimap. I should also add that Boost has an unordered multimap, here. If you need an ordered multimap, there is the STL multimap; for an ordered multiset, the STL multiset.
std::multimap seems to be what you are searching for.
It will store your objects ordered by key, and allow you to retrieve the lowest/highest key value (begin(), rbegin()) and all the objects with a given key (equal_range, lower_bound, upper_bound).
(EDIT: if you have just a few items, say less than 30, you should also test the performance of just using a deque or a vector)
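For illustration, a minimal sketch of retrieving and removing all elements with the least key from a multimap:

#include <map>
#include <string>

int main() {
    std::multimap<int, std::string> pq;
    pq.emplace(2, "b");
    pq.emplace(1, "a");
    pq.emplace(1, "c");

    if (!pq.empty()) {
        // Elements with the least key occupy [begin, upper_bound(least)).
        int least = pq.begin()->first;
        auto end = pq.upper_bound(least);   // O(log n)
        // ... process [pq.begin(), end) here ...
        pq.erase(pq.begin(), end);          // then remove them
    }
}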
If I understood well, your performance target is to have fast (1) and (3), and (2) is not that important. In this case, and given that values are unique, why not just have a std::set<value> and do a sequential search for (2)? You'd have O(log n) for (1) and (3), and O(n) for (2). Better yet, if your STL has std::hash_set, you'd have close to O(1) for (1) and (3).
If you need something better than O(n) for (2), one alternative would be to have a set of priority queues.
OK, so I've tested many options and ended up with something based on the idea of Matthieu M. I'm currently using a std::map<key_type, std::list<value_type> >, where each value_type contains a std::list<value_type>::iterator to itself, which is useful for removal.
Removal must check whether the list is empty, which implies a map query and possibly a call to erase. The worst case occurs when keys are distinct: O(log N) for insertion, O(1) for retrieval, and O(log N) for removal. I've had very good experimental results compared to the other alternatives on my test machine.
Using a std::vector instead is less efficient both in terms of theoretical complexity (O(N) worst case for removal when keys are identical) and in the experiments I've run.
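For reference, a simplified sketch of such a structure (names are hypothetical; it keeps the iterators in a separate index keyed by the unique values rather than inside value_type, which changes the layout but preserves the stated complexities):

#include <iterator>
#include <list>
#include <map>
#include <utility>

// by_key groups values under their priority key; by_value lets us unlink
// any value in O(1) once its entry is found (values are unique).
using bucket_t = std::list<int>;
std::map<int, bucket_t> by_key;
std::map<int, std::pair<std::map<int, bucket_t>::iterator,
                        bucket_t::iterator>> by_value;

void insert(int key, int value) {
    auto map_it = by_key.insert({key, {}}).first;  // O(log N); reuses node if key exists
    map_it->second.push_back(value);
    by_value[value] = {map_it, std::prev(map_it->second.end())};
}

void remove_by_value(int value) {
    auto entry = by_value.at(value);
    entry.first->second.erase(entry.second);       // O(1) unlink from the list
    if (entry.first->second.empty())
        by_key.erase(entry.first);                 // drop the now-empty bucket
    by_value.erase(value);                         // O(log N) overall
}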