C++ std::set index of insert - c++

I've run into the following issue:
Suppose I have a std::set named Numbers containing n values. I want to insert an (n+1)th value, x, which I know in advance is not yet in the set. I need a way to find out at which position it will be inserted or, equivalently, how many elements less than x are already in Numbers.
I know ways of doing this in O(n), but I need O(log(n)). In theory it might be possible, since std::set is usually implemented as a binary search tree (presumably O(log(n)) is achievable only if each vertex stores the size of its subtree). The question is whether it's technically possible and, if so, how to do it.

There's no "position" in a set; there are only iterators, and set makes no promises about its implementation. You can probably use lower_bound/upper_bound and count elements, but I don't think that counting can take advantage of the internals.

All of the set functions are going to work with iterators; the iterator of a set is bidirectional, not random-access, so determining the position will be an O(n) operation.
You don't need to know the position to insert a new element in the set, and insertions are O(log n).

You can find the "position" where this new element would be inserted in O(log(n)) using set::lower_bound, but it's just an iterator. std::set::iterator is bidirectional, not random access, so you cannot count how many elements are smaller than the new one in O(log(n)).
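A minimal sketch of the point made above, assuming a set of int: lower_bound locates the insertion point in O(log n), but converting that iterator into a count still walks the tree node by node. The helper name rank_in_set is hypothetical, chosen only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>

// Hypothetical helper: number of elements in `s` that are less than `x`.
// lower_bound is O(log n), but std::distance on bidirectional iterators
// advances one node at a time, so the whole call is O(n).
std::size_t rank_in_set(const std::set<int>& s, int x) {
    auto pos = s.lower_bound(x);  // O(log n): first element not less than x
    return static_cast<std::size_t>(std::distance(s.begin(), pos));  // O(n)
}
```

So the count is available, just not at the cost the question asks for.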

Maybe you should use set::lower_bound(), whose running time, according to this document (http://lafstern.org/matt/col1.pdf), should be proportional to log N.

Instead of:
std::set<MyT> mySet;
use:
std::set<std::pair<MyT,int>> mySet;
Then, for example:
//inserting a std::vector<MyT> myVec:
for (int i=0; i<myVec.size(); i++)
mySet.insert( std::pair<MyT,int>(myVec[i], i) );
The sorted result:
for (auto it=mySet.begin(); it!=mySet.end(); ++it)
cout << it->first << " index=" << it->second << "\n";

Related

Where will a new element be inserted in a std::set?

I have a loop like this (where mySet is a std::set):
for(auto iter=mySet.begin(); iter!=mySet.end(); ++iter){
if (someCondition){mySet.insert(newElement);}
if (someotherCondition){mySet.insert(anothernewElement);}
}
I am experiencing some strange behavior, and I am asking myself if this could be due to the inserted element being inserted "before" the current iterator position in the loop. Namely, I have an Iteration where both conditions are true, but still the distance
distance(iter, mySet.end())
is only 1, not 2 as I would expect. Is my guess about set behavior right? And more importantly, can I still do what I want to do?
What I'm trying to do is build "chains" on a hexagonal board between fields of the same color. I have a set containing all fields of my color; the conditions check the color of neighboring fields and, if they are of the same color, copy those fields into mySet, and so into the chain.
I am trying to use std::set for this because it does not allow a field to be in the chain more than once. Reading the comments so far, I fear I need to switch to std::vector, where push_back() will surely add the element at the end, but then I will run into new problems because I have to think of a way to prevent duplicate elements. I am therefore hoping for advice on how best to solve this.
Depending on the new element's value, it may be inserted before or after current iterator value. Below is an example of inserting before and after an iterator.
#include <iostream>
#include <set>
int main()
{
std::set<int> s;
s.insert(3);
auto it = s.begin();
std::cout << std::distance(it, s.end()) << std::endl; // prints 1
s.insert(2); // 2 will be inserted before it
std::cout << std::distance(it, s.end()) << std::endl; // prints 1
s.insert(5); // 5 will be inserted after it
std::cout << std::distance(it, s.end()) << std::endl; // prints 2
}
Regarding your question in the comments ("In my particular case, modifying it while iterating is basically exactly what I want, but of course I need to add everything after the current position"): no, you cannot manually arrange the order of the elements. A new value's position is determined by comparing it with the existing elements. Below is the quote from cppreference.
std::set is an associative container that contains a sorted set of unique objects of type Key. Sorting is done using the key comparison function Compare. Search, removal, and insertion operations have logarithmic complexity. Sets are usually implemented as red-black trees.
Thus, the implementation of the set will decide where exactly it will be placed.
If you really need to add values after current position, you need to use a different container. For example, simply a vector would be suitable:
it = myvector.insert ( it+1 , 200 ); // +1 to add after it
If you have a small number of items, a brute-force check to see whether an item is already in a vector can actually be faster than checking a set. This is because vectors store their elements contiguously and so tend to have better cache locality than node-based containers such as std::set.
We can write a function to do this pretty easily:
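#include <algorithm> // std::find
#include <vector>
template<class T>
void insert_unique(std::vector<T>& vect, T const& elem) {
    // Append only when the element is not already present
    if(std::find(vect.begin(), vect.end(), elem) == vect.end()) {
        vect.push_back(elem);
    }
}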
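A rough sketch of how the vector approach could serve the chain-building loop from the question. Everything specific here is an assumption made for illustration: fields are plain ints, append_unique is a hypothetical helper, and build_chain uses a stand-in neighbour rule (field f is linked to f + 1 below a limit) instead of the real board logic. The part that carries over is looping by index, because push_back may reallocate and invalidate iterators.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: appends `elem` only if it is not already present.
// Returns true when the element was appended.
template <class T>
bool append_unique(std::vector<T>& vect, const T& elem) {
    if (std::find(vect.begin(), vect.end(), elem) != vect.end()) return false;
    vect.push_back(elem);
    return true;
}

// Stand-in for "neighbouring fields of the same colour": field f is linked
// to f + 1 as long as f + 1 < limit. The real rule would replace this.
std::vector<int> build_chain(int start, int limit) {
    std::vector<int> chain{start};
    // Loop by index, not by iterator: an index stays valid across push_back
    // and the loop naturally reaches the elements appended along the way.
    for (std::size_t i = 0; i < chain.size(); ++i) {
        const int next = chain[i] + 1;
        if (next < limit) append_unique(chain, next);
    }
    return chain;
}
```

New elements always land after the current position, and the linear duplicate check keeps each field in the chain at most once.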

Stable sort a C++ hash map - preserve the insertion order for equal elements

Say I have a std::unordered_map<std::string, int> that represents a word and the number of times that word appeared in a book, and I want to be able to sort it by the value.
The problem is, I want the sorting to be stable, so that in case two items have equal value I want the one who got inserted first to the map to be first.
It is simple to implement this by adding an additional field that keeps the time of insertion, then creating a comparator that uses both the time and the value. Using a simple std::sort will give me O(N log(N)) time complexity.
In my case, space is not an issue whenever time can be improved. I want to take advantage of that and do a bucket sort, which should give me O(N) time complexity. But with bucket sort there is no comparator, and when iterating over the items in the map the insertion order is not preserved.
How can I both make it stable and still keep the O(N) time complexity via bucket sorting or something else?
I guess that if I had some kind of hash map that preserves the order of insertion while iterating it, it would solve my issue.
Any other solutions with the same time complexity are acceptable.
Note - I already saw this and that and due to the fact that they are both from 2009 and that my case is more specific I think, I opened this question.
Here is a possible solution I came up with using an std::unordered_map and tracking the order of inserting using a std::vector.
Create a hash map with the string as key and count as value.
In addition, create a vector with iterators to that map type.
When counting elements, if the object is not yet in the map, add to both map and vector. Else, just increment the counter. The vector will preserve the order the elements got inserted to the map, and the insertion / update will still be in O(1) time complexity.
Apply bucket sort by iterating over the vector (instead of the map), this ensures the order is preserved and we'll get a stable sort. O(N)
Extract from the buckets to make a sorted array. O(N)
Implementation:
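std::unordered_map<std::string, int> map;
// Note: a rehash invalidates unordered_map iterators (pointers to elements stay valid),
// so reserve enough buckets up front or store pointers to the elements instead.
std::vector<std::unordered_map<std::string,int>::iterator> order;
int max_count = 0; // highest count seen so far; sizes the bucket array below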
// Lets assume this is my string stream
std::vector<std::string> words = {"a","b","a" ... };
// Insert elements to map and the corresponding iterator to order
for (auto& word : words){
auto it = map.emplace(word,1);
if (!it.second){
it.first->second++;
}
else {
order.push_back(it.first);
}
max_count = std::max(max_count,it.first->second);
}
// Bucket Sorting
/* We are iterating over the vector and not the map
this ensures we are iterating by the order they got inserted */
std::vector<std::vector<std::string>> buckets(max_count);
for (auto o : order){
int count = o->second;
buckets[count-1].push_back(o->first);
}
std::vector<std::string> res;
for (auto it = buckets.rbegin(); it != buckets.rend(); ++it)
for (auto& str : *it)
res.push_back(str);

What is the fastest way to compare set of strings to one string?

I have set of strings and I need to find if one specific string is in it. I need to do this only one time (next time strings are different).
I'm thinking to sort strings with bucket sort and then do binary search.
Time complexity: O(n+k)+O(log n)
Is there any faster/better solution?
With set I mean more strings not std::set.
To summarize the comments above in an answer: if you are loading the strings to be compared on the fly and do not need them in a specific order, then std::unordered_set is by far the fastest.
unordered_set is a hash set: it runs your string through a hash function and finds out whether it is already in the set in constant time, O(1), on average.
If you need to retain the order of the elements, then the question becomes whether it is faster to keep a vector and do a linear search through it, or whether it is still worth building the hash set.
Code:
std::unordered_set<std::string> theSet;
// Insert a few elements.
theSet.insert("Mango");
theSet.insert("Grapes");
theSet.insert("Bananas");
if ( theSet.find("Hobgoblins") == theSet.end() ) {
cout << "Could not find any hobgoblins in the set." << endl;
}
if ( theSet.find("Bananas") != theSet.end() ) {
cout << "But we did find bananas!!! YAY!" << endl;
}
For comparison:
If you use std::vector you will need O(n) time building the vector and then O(n) time finding an element.
If you use std::unordered_set you will still need O(n) time to build the set, but afterwards you can find an element in constant time, O(1), on average.
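The two options compared above can be sketched side by side. The function names contains_linear and build_index are hypothetical, chosen only for this illustration. For a single lookup the linear scan does no setup work at all, whereas the hash set only pays off once several lookups share the same collection.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// One-off lookup: a single O(n) scan, no extra structure is built.
bool contains_linear(const std::vector<std::string>& words,
                     const std::string& target) {
    return std::find(words.begin(), words.end(), target) != words.end();
}

// Repeated lookups: pay O(n) once to build the hash set; each later
// query (for example via count or find) is O(1) on average.
std::unordered_set<std::string> build_index(const std::vector<std::string>& words) {
    return std::unordered_set<std::string>(words.begin(), words.end());
}
```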

Inserting elements at desired positions in a STL map

map <int, string> rollCallRegister;
map <int, string> :: iterator rollCallRegisterIter;
map <int, string> :: iterator tempRollCallRegisterIter;
rollCallRegisterIter = rollCallRegister.begin ();
tempRollCallRegisterIter = rollCallRegister.insert (rollCallRegisterIter, pair <int, string> (55, "swati"));
rollCallRegisterIter++;
tempRollCallRegisterIter = rollCallRegister.insert (rollCallRegisterIter, pair <int, string> (44, "shweta"));
rollCallRegisterIter++;
tempRollCallRegisterIter = rollCallRegister.insert (rollCallRegisterIter, pair <int, string> (33, "sindhu"));
// Displaying contents of this map.
cout << "\n\nrollCallRegister contains:\n";
for (rollCallRegisterIter = rollCallRegister.begin(); rollCallRegisterIter != rollCallRegister.end(); ++rollCallRegisterIter)
{
cout << (*rollCallRegisterIter).first << " => " << (*rollCallRegisterIter).second << endl;
}
Output:
rollCallRegister contains:
33 => sindhu
44 => shweta
55 => swati
I have incremented the iterator. Why is it still getting sorted? And if the position is supposed to be changed by the map on its own, then what's the purpose of providing an iterator?
Because std::map is a sorted associative container.
In a map, the key value is generally used to uniquely identify the element, while the mapped value is some sort of value associated to this key.
According to the linked reference, the position parameter is:
the position of the first element to be compared for the insertion
operation. Notice that this does not force the new element to be in
that position within the map container (elements in a set always
follow a specific ordering), but this is actually an indication of a
possible insertion position in the container that, if set to the
element that precedes the actual location where the element is
inserted, makes for a very efficient insertion operation. iterator is
a member type, defined as a bidirectional iterator type.
So the purpose of this parameter is mainly to speed up insertion slightly by narrowing the range of elements that has to be searched.
You can use std::vector<std::pair<int,std::string>> if the order of insertion is important.
The interface is indeed slightly confusing, because it looks very much like std::vector<int>::insert (for example) and yet does not produce the same effect...
For associative containers, such as set, map and the new unordered_set and co, you completely relinquish the control over the order of the elements (as seen by iterating over the container). In exchange for this loss of control, you gain efficient look-up.
It would not make sense to suddenly give you control over the insertion, as it would let you break invariants of the container, and you would lose the efficient look-up that is the reason to use such containers in the first place.
And thus insert(It position, value_type&& value) does not insert at said position...
However, this gives us some room for optimization: when inserting an element in an associative container, a look-up needs to be performed to locate where to insert this element. By letting you specify a hint, the container gives you an opportunity to help it speed up the process.
This can be illustrated for a simple example: suppose that you receive elements already sorted by way of some interface, it would be wasteful not to use this information!
template <typename Key, typename Value, typename InputStream>
void insert(std::map<Key, Value>& m, InputStream& s) {
typename std::map<Key, Value>::iterator it = m.begin();
for (; s; ++s) {
it = m.insert(it, *s); // the hinted overload returns an iterator, not a pair
}
}
Some of the items might not be well sorted, but it does not matter, if two consecutive items are in the right order, then we will gain, otherwise... we'll just perform as usual.
The map is always sorted, but you give a "hint" as to where the element may go as an optimisation.
Insertion is O(log N), but if your hint correctly tells the container where the element goes, it is amortized constant time.
Thus if you are creating a large container of already-sorted values, then each value will get inserted at the end, although the tree will need rebalancing quite a few times.
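As an illustration of the hint, here is a small sketch that builds a map from input assumed to be sorted by key, passing end() as the hint on every call. The function name from_sorted is hypothetical. If the assumption fails for some element, the hint is simply wrong for that insertion: it costs the usual O(log N) and the map still ends up sorted.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Builds a map from (ideally key-sorted) input, hinting end() each time.
// When every new key is larger than all keys already present, the hint is
// correct and each insertion is amortized constant time.
std::map<int, std::string> from_sorted(
        const std::vector<std::pair<int, std::string>>& items) {
    std::map<int, std::string> m;
    for (const auto& kv : items) {
        m.insert(m.end(), kv);  // hinted insert: returns an iterator, not a pair
    }
    return m;
}
```

Feeding it unsorted input demonstrates the point from the question: the hint never overrides the ordering by key.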
As sad_man says, it's associative. If you set a value with an existing key, then you overwrite the previous value.
Now the iterators are necessary because you don't know what the keys are, usually.

What's an algorithm or code for obtaining the ordinal position of an element in a list sorted by value in C++

This is similar to a recent question.
I will be maintaining a sorted list of values and inserting items of arbitrary value into it. Each time I insert a value, I would like to determine its ordinal position in the list (is it 1st, 2nd, 1000th?). What is the most efficient data structure and algorithm for accomplishing this? There are obviously many algorithms that could do this, but I don't see any easy way to do it using simple STL or Qt template functionality. Ideally, I would like to know about existing open source C++ libraries or sample code that can do this.
I can imagine how to modify a B-tree or similar algorithm for this purpose but it seems like there should be an easier way.
Edit3:
Mike Seymour pretty well confirmed that, as I wrote in my original post, there is indeed no way to accomplish this task using simple STL. So I'm looking for a good B-tree, balanced-tree or similar open source C++ template that can accomplish it without modification or with the least modification possible. Pavel Shved showed this was possible, but I'd prefer not to dive into implementing a balanced tree myself.
(the history should show my unsuccessful efforts to modify Mathieu's code to be O(log N) using make_heap)
Edit 4:
I still give credit to Pavel for pointing out that a B-tree can give a solution to this, but I have to mention that the simplest way to achieve this kind of functionality without implementing a custom B-tree C++ template of your own is to use an in-memory database. This would give you log n and is fairly easy to implement.
Binary tree is fine with this. Its modification is easy as well: just keep in each node the number of nodes in its subtree.
After you inserted a node, perform a search for it again by walking from root to that node. And recursively update the index:
if (traverse to left subtree)
index = index_on_previous_stage;
if (traverse to right subtree)
index = index_on_previous_stage + left_subtree_size + 1;
if (found)
return index + left_subtree_size;
This will take O(log N) time, just like inserting.
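The idea above can be sketched as follows. This is an illustration under stated assumptions rather than a production structure: the tree is a plain unbalanced BST (so the O(log N) bound only holds for reasonably random input; a balanced tree would have to maintain the sizes through rotations as well), values are ints, duplicates go to the right, and the names Node, subtree_size and insert_with_rank are hypothetical. The index is accumulated during the insertion walk itself, which is equivalent to searching for the node again afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// BST node that also stores the number of nodes in its subtree.
struct Node {
    int value;
    std::size_t size = 1;  // nodes in the subtree rooted here, including this one
    std::unique_ptr<Node> left, right;
    explicit Node(int v) : value(v) {}
};

std::size_t subtree_size(const std::unique_ptr<Node>& n) {
    return n ? n->size : 0;
}

// Inserts `value` and returns its 0-based ordinal position in sorted order.
std::size_t insert_with_rank(std::unique_ptr<Node>& root, int value) {
    std::size_t index = 0;
    std::unique_ptr<Node>* cur = &root;
    while (*cur) {
        ++(*cur)->size;  // the new node will end up somewhere in this subtree
        if (value < (*cur)->value) {
            cur = &(*cur)->left;  // going left: index unchanged
        } else {
            // going right: skip the whole left subtree plus this node
            index += subtree_size((*cur)->left) + 1;
            cur = &(*cur)->right;
        }
    }
    cur->reset(new Node(value));  // attach the new leaf
    return index;
}
```

For example, inserting 50, 30, 70, 40 in that order returns positions 0, 0, 2 and 1 respectively.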
I think you can use std::set here. It keeps the values sorted, and insert returns an iterator to the inserted value. From that iterator you can get the index, although std::distance on set iterators takes O(n). For example:
std::set<int> s;
std::pair<std::set<int>::iterator, bool> aPair = s.insert(5);
size_t index = std::distance(s.begin(), aPair.first) ;
Note that the std::list insert(it, value) member function returns an iterator to the newly inserted element. Maybe it can help?
If, as you say in one of your comments, you only need an approximate ordinal position,
you could estimate this from the range of values you already have - you only need to read the first and last values in the collection in constant time, something like this:
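multiset<int> values;
values.insert(value);
// multiset has no front()/back(); read the first and last elements via iterators
const int lo = *values.begin(), hi = *values.rbegin();
int ordinal = (hi == lo) ? 0 : values.size() * (value - lo) / (hi - lo);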
To improve the approximation, you could keep track of statistical properties (mean and variance, and possibly higher-order moments for better accuracy) of the values as you add them to a multiset. This will still be constant time. Here's a vague sketch of the sort of thing you might do:
class SortedValues : public multiset<int>
{
public:
SortedValues() : sum(0), sum2(0) {}
int insert(int value)
{
// Insert the value and update the running totals
multiset<int>::insert(value);
sum += value;
sum2 += value*value;
// Calculate the mean and deviation.
const float mean = float(sum) / size();
const float deviation = sqrt(float(sum2)/size() - mean*mean); // variance = E[x^2] - mean^2
// This function is left as an exercise for the reader.
return size() * EstimatePercentile(value, mean, deviation);
}
private:
int sum;
int sum2;
};
If you want the ordinal position, you want a container that models the RandomAccessContainer concept... basically, a std::vector.
Operations on a sorted std::vector are relatively fast, and you can find the position you want using std::lower_bound or std::upper_bound. If you allow several equal values and want to retrieve all of them, use std::equal_range, which gives the same result as applying both the lower and upper bounds but in a single call.
Now, for the ordinal position, the great news is that std::distance has O(1) complexity on models of RandomAccessIterator.
typedef std::vector<int> ints_t;
typedef ints_t::iterator iterator;
ints_t myInts;
for (iterator it = another.begin(), end = another.end(); it != end; ++it)
{
int myValue = *it;
iterator search = std::lower_bound(myInts.begin(), myInts.end(), myValue);
search = myInts.insert(search, myValue); // insert invalidates the old iterator; keep the returned one
std::cout << "Inserted " << myValue << " at "
<< std::distance(myInts.begin(), search) << "\n";
// Not necessary to flush there, that would slow things down
}
// Find all values equal to 50
std::pair<iterator,iterator> myPair =
std::equal_range(myInts.begin(), myInts.end(), 50);
std::cout << "There are " << std::distance(myPair.first,myPair.second)
<< " values '50' in the vector, starting at index "
<< std::distance(myInts.begin(), myPair.first) << std::endl;
Easy, isn't it ?
std::lower_bound, std::upper_bound and std::equal_range have a O(log(n)) complexity and std::distance has a O(1) complexity, so everything there is quite efficient...
EDIT: as underlined in the comments >> inserting is actually O(n) since you have to move the elements around.
Why do you need the ordinal position? As soon as you insert another item in the list the ordinal positions of other items later in the list will change, so there doesn't seem to be much point in finding the ordinal position when you do an insert.
It may be better to simply append elements to a vector, sort and then use a binary search to find the ordinal position, but it depends on what you are really trying to achieve
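A minimal sketch of that alternative, assuming int values and that all insertions happen before the positions are queried. The names prepare and ordinal_of are hypothetical. Sorting once costs O(n log n), after which each position query is a binary search.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sorts `values` in place once, after all elements have been appended.
void prepare(std::vector<int>& values) {
    std::sort(values.begin(), values.end());
}

// 0-based position of the first element not less than `x` in a sorted
// vector: O(log n) search, O(1) iterator subtraction.
std::size_t ordinal_of(const std::vector<int>& sorted, int x) {
    auto pos = std::lower_bound(sorted.begin(), sorted.end(), x);
    return static_cast<std::size_t>(pos - sorted.begin());
}
```

This only fits when insertions and queries are not interleaved; otherwise every insertion into the sorted vector is O(n) again.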
If you have the iterator to the item (as suggested by dtrosset), you can use std::distance (e.g. std::distance(my_list.begin(), item_it))
If you have an iterator whose index you want to find, use std::distance,
which is either O(1) or O(n) depending on the container. However, the O(1) containers have O(n) inserts, so overall you are looking at an O(n) algorithm with any STL container.
As others have said, it's not immediately obvious why this is useful.