emplace_hint performance when hint is wrong


I am trying to determine if emplace_hint should be used to insert a key into a multimap (as opposed to regular emplace). I have already calculated the range of the key in an earlier operation (on the same key):
range = multimap.equal_range(key);
Should I use range.first, range.second, or nothing as a hint to insert the key, value pair? What if the range is empty?

As std::multimap::emplace_hint() states:
Inserts a new element into the container as close as possible to the position just before hint.
So you should use the second iterator of the range, which should make the insertion more efficient:
Complexity
Logarithmic in the size of the container in general, but amortized constant if the new element is inserted just before hint.
As for an empty range, it is still fine to use the second iterator, because it always points to the first element greater than key, or to end() if no such element exists.
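A minimal sketch of the range.second recommendation. The helper name append_with_hint is hypothetical, and it recomputes equal_range only to stay self-contained; in the question the range is already available. Because the new element goes "just before" range.second, it lands after every existing element with the same key.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: appends (key, value) using range.second as the hint,
// then returns all values stored under key, in container order.
std::vector<std::string> append_with_hint(std::multimap<std::string, std::string>& m,
                                          const std::string& key,
                                          const std::string& value)
{
    auto range = m.equal_range(key);  // in real code this was computed earlier
    // range.second is the first element with a greater key (or end()),
    // so inserting just before it places the value after all equal keys.
    m.emplace_hint(range.second, key, value);

    std::vector<std::string> values;
    auto after = m.equal_range(key);
    for (auto it = after.first; it != after.second; ++it)
        values.push_back(it->second);
    return values;
}
```

For a key that is not present yet, the same call still works: the range is empty and the hint is exactly the position where the key belongs.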

First, performance wise, it will not make any difference if you use range.first or range.second. Let's have a look at the return value of equal_range:
std::equal_range - return value
std::pair containing a pair of iterators defining the wanted range,
the first pointing to the first element that is not less than value
and the second pointing to the first element greater than value. If
there are no elements not less than value, last is returned as the
first element. Similarly if there are no elements greater than value,
last is returned as the second element
This means that, when obtained for a value key, both range.first and range.second represent positions right before which key may be correctly inserted. So performance-wise it should not matter whether you use range.first or range.second. Complexity should be "amortized constant", since the new element is inserted just before the hint.
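Second, when the range is "empty", range.first and range.second are equal: both point to the first element greater than key, or are one-past-the-end if no such element exists. Using either as a hint therefore gives identical performance and results, and the element ends up exactly where emplace without any hint would put it.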
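To check what an "empty" range actually points at, here is a small sketch; empty_range_target is a hypothetical helper name. It shows that for a missing key both iterators are equal and refer to the next greater key, and are end() only when no greater key exists.

```cpp
#include <map>
#include <string>

// Hypothetical helper: reports where an empty equal_range points.
// Returns the key at r.first, "<end>" for end(), or "<not empty>" if key exists.
std::string empty_range_target(const std::multimap<std::string, int>& m,
                               const std::string& key)
{
    auto r = m.equal_range(key);
    if (r.first != r.second)
        return "<not empty>";  // key is present, so the range is not empty
    // Empty range: r.first == r.second, the first element greater than key.
    return r.first == m.end() ? "<end>" : r.first->first;
}
```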
See the following program demonstrating this:
#include <iostream>
#include <map>
#include <string>
#include <utility>

int main()
{
    std::multimap<std::string, std::string> m;
    // some clutter:
    m.emplace(std::make_pair(std::string("k"), std::string("1")));
    m.emplace(std::make_pair(std::string("k"), std::string("2")));
    m.emplace(std::make_pair(std::string("z"), std::string("1")));
    m.emplace(std::make_pair(std::string("z"), std::string("2")));
    // relevant portion of demo data: insertion order a-c-b is preserved (C++11 and later)
    m.emplace(std::make_pair(std::string("x"), std::string("a")));
    m.emplace(std::make_pair(std::string("x"), std::string("c")));
    m.emplace(std::make_pair(std::string("x"), std::string("b")));
    auto r = m.equal_range("x");
    // will insert "x.zzzz" before "x.a":
    m.emplace_hint(r.first, std::make_pair(std::string("x"), std::string("zzzz")));
    // will insert "x.0" right after "x.b":
    m.emplace_hint(r.second, std::make_pair(std::string("x"), std::string("0")));
    auto rEmpty = m.equal_range("e");
    // "empty" range: rEmpty.first == rEmpty.second, both pointing to the first "k"
    m.emplace_hint(rEmpty.first, std::make_pair(std::string("e"), std::string("b")));
    m.emplace_hint(rEmpty.second, std::make_pair(std::string("e"), std::string("a")));
    // "wrong" hint: rWrong.first points to a "k" element, but "z" belongs elsewhere
    auto rWrong = m.equal_range("k");
    m.emplace_hint(rWrong.first, std::make_pair(std::string("z"), std::string("a")));
    for (const auto &p : m) {
        std::cout << p.first << " => " << p.second << '\n';
    }
}
Output:
e => b
e => a
k => 1
k => 2
x => zzzz
x => a
x => c
x => b
x => 0
z => a
z => 1
z => 2
In short: if you have a valid range for key pre-calculated, use it as the hint when inserting key. It can only help.
EDIT:
There have been discussions around whether an "invalid" hint might lead to an insertion at a position that does not then reflect the "order of insertion" for values with the same key. This might be concluded from a general multimap statement "The order of the key-value pairs whose keys compare equivalent is the order of insertion and does not change. (since C++11)".
I did not find support for either point of view in any normative document. I only found the following statement in the cplusplus.com multimap::emplace_hint documentation:
template <class... Args>
iterator emplace_hint (const_iterator position, Args&&... args);
position: Hint for the position where the element can be inserted. The function optimizes its insertion time if position points to the element that will follow the inserted element (or to the end, if it would be the last). Notice that this does not force the new element to be in that position within the multimap container (the elements in a multimap always follow a specific order). const_iterator is a member type, defined as a bidirectional iterator type that points to elements.
I know that this is not a normative reference, but at least my Apple LLVM 8.0 compiler adheres to this definition (see demo above):
If one inserts an element with a "wrong" hint, i.e. one pointing to an element before the position where the pair belongs, the algorithm recognizes this and chooses a valid position (see inserting "z"=>"a" where the hint points to a "k" element).
If we use a range for key "x" and use range.first, the position right before the first x is interpreted as a valid position.
So I think that m.emplace_hint(r.first, ...) makes the algorithm take a valid position as close as possible to the hint right away, and that this "close to the hint" rule overrules the general statement "The order of the key-value pairs whose keys compare equivalent is the order of insertion and does not change. (since C++11)".

Related

For loop exit condition with map iterator

I have a std::map<std::string, int> my_map.
Right now, the key-value mapping looks like this:
{["apple",3],["addition",2],["app",7],["adapt",8]}
Objective:
Calculate the sum of values of keys with a given prefix.
Example : sum("ap") should return 10 (3 + 7).
I could implement it with two loops and an if condition. But, I'm trying to understand the following code that's submitted by someone to implement this.
// n is the length of prefix, e.g. const auto n = prefix.size();
for (auto it = my_map.lower_bound(prefix);
     it != my_map.end() && it->first.substr(0, n) == prefix;
     it++)
Won't the loop condition become false in the middle of iterating through my_map, and hence produce an incorrect sum?
I don't understand how the code gives the right result. Why doesn't the loop exit when it reaches the key "addition" while looking for the prefix "ap"?
Any kind of help is appreciated.

The loop is completely correct, but not so readable at first sight.
We have std::map, which is an associative container sorted according to the provided compare function. Your map (i.e. std::map<std::string, int>) is sorted by its std::string key.
So your map is already ordered like :
{["adapt",8], ["addition",2], ....., ["app",7], ["apple",3], .... }
Now let's start with lower_bound (here std::map::lower_bound, which does the same job as std::lower_bound on the sorted keys):
Returns an iterator pointing to the first element in the range [first, last) that is not less than (i.e. greater or equal to) value, or last if no such element is found.
Meaning at the loop start:
auto it = my_map.lower_bound(prefix);
The iterator it points to the map entry ["app",7]. In other words, the iteration starts at the first key that could possibly begin with the prefix:
["app",7], ["apple",3], ....
Now the condition comes in to play:
it != my_map.end() && it->first.substr(0, n) == prefix;
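The first part checks whether the iterator is valid (i.e. it != my_map.end()).
The second part checks whether the key starts with the prefix (i.e. it->first.substr(0, n) == prefix, where n is the prefix length). Because the keys are sorted, all keys that start with the prefix form one contiguous run beginning at lower_bound(prefix), and the loop stops at the first key after that run (or at end()). "addition" sorts before "app", so it is never visited, and the sum is correct.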
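A self-contained sketch of the same loop, wrapped in a hypothetical prefix_sum function. One small deviation from the original: it uses std::string::compare(pos, len, str) instead of substr, which performs the same prefix test without building a temporary string.

```cpp
#include <map>
#include <string>

// Hypothetical helper: sums the values of all keys that start with prefix.
int prefix_sum(const std::map<std::string, int>& m, const std::string& prefix)
{
    const std::string::size_type n = prefix.size();
    int sum = 0;
    // Keys are sorted, so matching keys are contiguous starting at
    // lower_bound(prefix); the first non-matching key ends the run.
    for (auto it = m.lower_bound(prefix);
         it != m.end() && it->first.compare(0, n, prefix) == 0;
         ++it)
        sum += it->second;
    return sum;
}
```

With the map from the question, prefix_sum(my_map, "ap") visits only "app" and "apple".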

What does this std::upper_bound exactly do in this code?

Grade School - Exercism
There is a test suite attached to the exercise as well.
void school::add(std::string name, int grade)
{
roster_[grade].insert(std::upper_bound(roster_[grade].begin(), roster_[grade].end(), name), name);
}
roster_ is defined as std::map<int, std::vector<std::string>> roster_;.

I find this definition easier to remember/visualize:
it = std::upper_bound(beg, end, x) gives you the iterator to the last position where you can insert x in the container [beg, end) such that the container remains sorted if it is sorted;
it = std::lower_bound(beg, end, x) gives you the iterator to the first position where you can insert x in the container [beg, end) such that the container remains sorted if it is sorted.
Therefore, given std::vector<int> v{0,2,2,2,3};,
std::upper_bound(v.begin(), v.end(), 2) returns the iterator to the 3, because inserting 2 just before the 3 doesn't break the sorting;
std::lower_bound(v.begin(), v.end(), 1) returns the iterator to the first 2 because inserting 1 just before it doesn't break the sorting.
Therefore, that code (with some line breaks added for clarity) inserts name at the last place it can go without breaking the pre-existing sorting.
roster_[grade].insert(
std::upper_bound(roster_[grade].begin(), roster_[grade].end(), name),
name);
The definitions you find on cppreference are useful and necessary if you don't assume the container is sorted, in which case these functions are still useful, but in a less obvious way, IMHO.
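The "first/last insertion position" reading can be checked directly on v{0,2,2,2,3}. The helpers lower_index and upper_index are hypothetical names that just convert the returned iterators to indices.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// First index where x can be inserted into sorted v without breaking the order.
std::ptrdiff_t lower_index(const std::vector<int>& v, int x)
{
    return std::distance(v.begin(), std::lower_bound(v.begin(), v.end(), x));
}

// Last index where x can be inserted into sorted v without breaking the order.
std::ptrdiff_t upper_index(const std::vector<int>& v, int x)
{
    return std::distance(v.begin(), std::upper_bound(v.begin(), v.end(), x));
}
```

For v{0,2,2,2,3}, lower_index(v, 2) is 1 (the first 2) and upper_index(v, 2) is 4 (the 3); for a value that is absent, such as 1, both give the same index.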

Your code inserts name into a sorted list in the correct place. You've got:
roster_[grade].insert(
std::upper_bound(roster_[grade].begin(), roster_[grade].end(), name),
name);
where roster_[grade] is a std::vector. What happens is:
you use std::upper_bound to find the first item in the list that is larger than name, i.e. the item that name should be inserted before in order to keep the list sorted. This relies on your list already being sorted, and probably uses a binary search. If it doesn't find any larger values, i.e. name is larger than all values in the list, it'll return the end iterator.
you use std::vector::insert with the return value from std::upper_bound to insert name at that position. If we have the end iterator we'll append name to the end of the list.
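The two steps above, isolated from the school class. insert_sorted is a hypothetical name; it does the same thing as school::add for a single grade, assuming names is already sorted.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Keeps names sorted: insert name just before the first element greater than it
// (i.e. after any equal names), or at the end if no such element exists.
void insert_sorted(std::vector<std::string>& names, const std::string& name)
{
    names.insert(std::upper_bound(names.begin(), names.end(), name), name);
}
```

The search is logarithmic, but std::vector::insert still shifts the following elements, so each insertion is linear in the worst case.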

C++ - Modifying key for all map elements

Let's consider this code:
std::map< int, char > charMap;
for( auto& i : charMap )
{
charMap[ i.first + 1 ] = charMap[ i.first ];
charMap.erase( i.first );
}
Let's say that the map has some values with random keys. I am trying to shift the keys by 1.
This won't work because the loop goes on forever.
Is there a fast way to make it work?

In C++17, you can use node extraction and splicing (see P0083R3):
std::map<int, char> tmpMap;
for (auto it = charMap.begin(); it != charMap.end(); )
{
auto nh = charMap.extract(it++); // node handle
++nh.key();
tmpMap.insert(tmpMap.end(), std::move(nh));
}
tmpMap.swap(charMap);
The loop extracts consecutive map nodes, mutates them, and reinserts the node into tmpMap (now with the different key). At the end, charMap is empty and tmpMap contains all the elements with their modified keys, so we swap the two.
Before C++17, you would have to copy (or move) the value data around to insert a new element with a new key.
std::map<int, char> tmpMap;
for (auto & p : charMap)
tmpMap.emplace_hint(tmpMap.end(), p.first + 1, std::move(p.second));
tmpMap.swap(charMap);
This requires memory allocations for the nodes, though, so the new splicing-based solution is more efficient.
In either case we can use the hinted insertion, because we are reconstructing elements in the same order and so the newest element is always inserted at the end.
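The C++17 loop above, wrapped into a self-contained function. The name shift_keys and the delta parameter are additions for illustration; since every key moves by the same amount, the relative order is unchanged and end() remains the correct hint.

```cpp
#include <map>
#include <utility>

// Hypothetical helper (C++17): shifts every key by delta by re-linking nodes,
// without allocating new nodes or copying the mapped values.
void shift_keys(std::map<int, char>& charMap, int delta)
{
    std::map<int, char> tmpMap;
    for (auto it = charMap.begin(); it != charMap.end(); )
    {
        auto nh = charMap.extract(it++);  // unlink the node; it already points to the next one
        nh.key() += delta;                // the key of an extracted node is mutable
        tmpMap.insert(tmpMap.end(), std::move(nh));  // order preserved, so end() is the right hint
    }
    tmpMap.swap(charMap);
}
```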

Ad hoc solution using the known impact on order
You could simply opt for a backward iteration, starting from the last element:
for( auto pi = charMap.end(); pi-- != charMap.begin(); pi=charMap.erase( pi ))
charMap[ pi->first + 1 ] = charMap[ pi->first ];
This will not loop forever here, because the new element that you insert will always be after the current one and will hence not be reprocessed again and again.
More general solution
For a more general transformation where you can't be sure about the impact on element ordering, I'd rather go for a std::transform():
std::map<int, char> tmp;
std::transform(charMap.begin(), charMap.end(), std::inserter(tmp,tmp.begin()),
[](auto e) { return std::make_pair(e.first+1, e.second); });
std::swap(tmp, charMap); // the old map will be discarded when tmp goes out of scope

You cannot use this kind of range iteration for two fundamental reasons:
The first reason is that a fundamental property of a map is that iterating over the map iterates in key order.
You are iterating over the map. So, if the first key in the map is key 0, you will copy the value to key 1. Then, you iterate to the next key in the map, which is the key 1 that you just created, and then copy it to key 2. Lather, rinse, repeat.
There are several ways to solve this, but none of that matters because of a second fundamental aspect of the map:
charMap[1]=charMap[0];
This copies charMap[0] to charMap[1]. It does nothing to charMap[0]. It is still there. Nothing happened to it. So, presuming that the lowest key in the map is 0 and you shifted the keys correctly, you will still have a value in the map with key 0. Ditto for everything else in the map.
But let's say you solved the first problem in one of the several ways that it could be solved. Then, let's say your map has values for keys 0, 5, and 7.
After you copy key #0 to key #1, key #5 to key #6, and key #7 to key #8, take a paper and pencil, and figure out what you have now in your map.
Answer: it is not going to be keys 1, 6, and 8. It will be keys 0, 1, 5, 6, 7, and 8.
All you did was copy each value to the next key. This is because a computer does exactly what you tell it to do, no more, no less. A computer does not do what you want it to do.
The easiest way to do this is to create a new map, and copy the contents of the old map to the new map, with an updated key value. You can still use range iteration for that. Then, replace the old map with the new map.
Of course, this becomes impractical if the map is very large. In that case, it is still possible to do this without using a second map, but the algorithm is going to be somewhat complicated. The capsule summary is:
1) Iterate over the keys in reverse order. Can't use range iteration here.
2) After copying the key to the next value in the map, explicitly remove the value from its original key.
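A minimal sketch of that two-step recipe, assuming int keys where key + 1 never overflows (no key equals INT_MAX). shift_keys_in_place is a hypothetical name, and it walks backwards with an ordinary iterator rather than a reverse iterator so that erase can return the next position.

```cpp
#include <map>

// Hypothetical helper: shifts all keys up by 1 without a second map.
// Assumes no key equals INT_MAX.
void shift_keys_in_place(std::map<int, char>& m)
{
    for (auto it = m.end(); it != m.begin(); )
    {
        --it;                           // largest key not yet processed
        m[it->first + 1] = it->second;  // 1) copy the value to the next key; it stays valid
        it = m.erase(it);               // 2) remove the original key; returns the key+1 node
    }
}
```

Because the walk goes from the highest key down, the key + 1 slot is never an unprocessed element, so nothing is overwritten before it has been moved.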

Inserting elements at desired positions in a STL map

map <int, string> rollCallRegister;
map <int, string> :: iterator rollCallRegisterIter;
map <int, string> :: iterator tempRollCallRegisterIter;
rollCallRegisterIter = rollCallRegister.begin ();
tempRollCallRegisterIter = rollCallRegister.insert (rollCallRegisterIter, pair <int, string> (55, "swati"));
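rollCallRegisterIter++; // note: begin() of the then-empty map was end(), so this increments end(), which is undefined behavior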
tempRollCallRegisterIter = rollCallRegister.insert (rollCallRegisterIter, pair <int, string> (44, "shweta"));
rollCallRegisterIter++;
tempRollCallRegisterIter = rollCallRegister.insert (rollCallRegisterIter, pair <int, string> (33, "sindhu"));
// Displaying contents of this map.
cout << "\n\nrollCallRegister contains:\n";
for (rollCallRegisterIter = rollCallRegister.begin(); rollCallRegisterIter != rollCallRegister.end(); ++rollCallRegisterIter)
{
cout << (*rollCallRegisterIter).first << " => " << (*rollCallRegisterIter).second << endl;
}
Output:
rollCallRegister contains:
33 => sindhu
44 => shweta
55 => swati
I have incremented the iterator. Why is it still getting sorted? And if the position is supposed to be changed by the map on its own, then what's the purpose of providing an iterator?

Because std::map is a sorted associative container.
In a map, the key value is generally used to uniquely identify the element, while the mapped value is some sort of value associated to this key.
According to this reference, the position parameter is:
the position of the first element to be compared for the insertion operation. Notice that this does not force the new element to be in that position within the map container (elements in a set always follow a specific ordering), but this is actually an indication of a possible insertion position in the container that, if set to the element that precedes the actual location where the element is inserted, makes for a very efficient insertion operation. iterator is a member type, defined as a bidirectional iterator type.
(That wording describes the C++98 rule; since C++11 the efficient case is a hint pointing to the element that will follow the new one.)
So the purpose of this parameter is mainly slightly increasing the insertion speed by narrowing the range of elements.
You can use std::vector<std::pair<int,std::string>> if the order of insertion is important.

The interface is indeed slightly confusing, because it looks very much like std::vector<int>::insert (for example) and yet does not produce the same effect...
For associative containers, such as set, map and the new unordered_set and co, you completely relinquish the control over the order of the elements (as seen by iterating over the container). In exchange for this loss of control, you gain efficient look-up.
It would not make sense to suddenly give you control over the insertion, as it would let you break invariants of the container, and you would lose the efficient look-up that is the reason to use such containers in the first place.
And thus insert(It position, value_type&& value) does not insert at said position...
However, this gives us some room for optimization: when inserting an element into an associative container, a look-up needs to be performed to locate where to insert it. By letting you specify a hint, the container gives you an opportunity to help it speed up the process.
This can be illustrated with a simple example: suppose that you receive elements already sorted from some interface; it would be wasteful not to use this information!
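#include <iterator>
#include <map>

// InputStream: converts to true while items remain, ++s advances, *s yields a std::pair<Key, Value>
template <typename Key, typename Value, typename InputStream>
void insert(std::map<Key, Value>& m, InputStream& s) {
    typename std::map<Key, Value>::iterator it = m.begin();
    for (; s; ++s) {
        // the hinted insert returns an iterator (not a pair); since C++11 the hint
        // names the element just after the insertion point, so hint at the
        // element following the one just inserted
        it = std::next(m.insert(it, *s));
    }
}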
Some of the items might not be well sorted, but it does not matter: if two consecutive items are in the right order, we gain; otherwise, we just perform as usual.

The map is always sorted, but you give a "hint" as to where the element may go as an optimisation.
The insertion is O(log N), but if you are able to successfully tell the container where the element goes, it is amortized constant time.
Thus if you are creating a large container of already-sorted values, then each value will get inserted at the end, although the tree will need rebalancing quite a few times.
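A sketch of the "already-sorted input" case, using the roll-call data from the question. build_from_sorted is a hypothetical name; it assumes the input is sorted by key, and uses end() as the hint because since C++11 the hint is the element just after the insertion point. If the input turns out not to be sorted, the map is still sorted afterwards; only the speed-up is lost, which is also why the hint in the question did not change the output order.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: builds a map from input assumed to be sorted by key.
// For ascending keys, end() is the correct hint every time.
std::map<int, std::string>
build_from_sorted(const std::vector<std::pair<int, std::string>>& input)
{
    std::map<int, std::string> m;
    for (const auto& kv : input)
        m.insert(m.end(), kv);  // a wrong hint only costs time, never order
    return m;
}
```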

As sad_man says, it's associative. If you set a value with an existing key, then you overwrite the previous value.
Now the iterators are necessary because you don't know what the keys are, usually.

STL "closest" method?

I'm looking for an STL algorithm that returns the element "closest" to the target value if the exact value is not present in the container. It needs to be fast, so essentially I'm looking for a slightly modified binary search... I could write it, but it seems like something that should already exist...

Do you mean the lower_bound/upper_bound functions? These perform a binary search and return the closest element above the value you're looking for.
Clarification: The global versions of lower/upper_bound only work if the range is sorted, as they use some kind of binary search internally. (Obviously, the lower/upper_bound methods in std::map always work). You said in your question that you were looking for some kind of binary search, so I'll assume the range is sorted.
Also, neither lower_bound nor upper_bound returns the closest member. If the value X you're looking for isn't in the range, they both return the first element greater than X. Otherwise, lower_bound returns the first value equal to X, and upper_bound returns the element just after the last value equal to X (i.e. the first value greater than X).
So to find the closest value, you'd have to
call lower_bound
if it returns the end of the range, all values are less than X. The last (i.e. the highest) element is the closest one
if it returns the beginning of the range, no value is less than X. The first (i.e. the lowest) element is the closest one
if it returns an element in the middle of the range, check that element and the element before - the one that's closer to X is the one you're looking for
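The three steps above can be sketched as follows, assuming a sorted, non-empty std::vector<int> whose differences do not overflow. The name closest is hypothetical, and ties go to the smaller element (a choice the steps leave open).

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical helper: element of sorted, non-empty v closest to x.
std::vector<int>::const_iterator closest(const std::vector<int>& v, int x)
{
    auto it = std::lower_bound(v.begin(), v.end(), x);
    if (it == v.end())            // all values are less than x
        return std::prev(it);
    if (it == v.begin())          // no value is less than x
        return it;
    auto before = std::prev(it);  // x lies between *before and *it
    return (x - *before) <= (*it - x) ? before : it;
}
```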

So you're looking for an element which has a minimal distance from some value k?
Use std::transform to transform each x to x-k. Then use std::min_element with a comparison function that returns abs(l) < abs(r). Then add k back onto the result.
EDIT: Alternatively, you could just use std::min_element with a comparison function abs(l-k) < abs(r-k), and eliminate the std::transform.
EDIT2: This is good for unsorted containers. For sorted containers, you probably want nikie's answer.
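A sketch of the single-pass variant from the first EDIT, for unsorted data. closest_unsorted is a hypothetical name; it assumes v is non-empty and that l - k and r - k fit in an int.

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// Hypothetical helper: element of v with the smallest distance |e - k| (linear scan).
int closest_unsorted(const std::vector<int>& v, int k)
{
    return *std::min_element(v.begin(), v.end(),
        [k](int l, int r) { return std::abs(l - k) < std::abs(r - k); });
}
```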

If the container is already sorted (as implied) you should be able to use std::upper_bound and the item directly before to figure out which is closest:
// Untested.
template <class Iter, class T>
Iter closest_value(Iter begin, Iter end, T value)
{
Iter result = std::upper_bound(begin, end, value);
if(result != begin)
{
Iter lower_result = result;
--lower_result;
if(result == end || ((value - *lower_result) < (*result - value)))
{
result = lower_result;
}
}
return result;
}
If the container is not sorted, use min_element with a predicate as already suggested.

If your data is not sorted, use std::min_element with a comparison functor that calculates your distance.

