For loop exit condition with map iterator - c++

I have a std::map<std::string, int> my_map.
Right now, the key-value mapping looks like this -
{["apple",3],["addition",2],["app",7],["adapt",8]}
Objective:
Calculate the sum of values of keys with a given prefix.
Example : sum("ap") should return 10 (3 + 7).
I could implement it with two loops and an if condition. But, I'm trying to understand the following code that's submitted by someone to implement this.
for (auto it = my_map.lower_bound(prefix);
it != my_map.end() && it->first.substr(0, n) == prefix;
it++)
Won't the loop condition become false in the middle of iterating through my_map hence calculating an incorrect sum ?
I don't know how the code is able to give the right result. Why wouldn't the loop exit when it gets to key "addition" while looking for prefix "ap" ?
Any kind of help is appreciated.

The loop is completely correct, but not so readable at first sight.
We have std::map, which is an associative container sorted according to the compare function provided. Your map (i.e. std::map<std::string, int>) will be sorted by its std::string key.
So your map is already ordered like :
{["adapt",8], ["addition",2], ....., ["app",7], ["apple",3], .... }
Now let's start with std::lower_bound (the member function my_map.lower_bound does the same thing on the map's keys):
Returns an iterator pointing to the first element in the range [first,
last) that is not less than (i.e. greater or equal to) value, or last
if no such element is found.
Meaning at the loop start:
auto it = my_map.lower_bound(prefix);
iterator it is pointing to the map entry ["app",7]. In other words, the iteration starts from the first possible match.
["app",7], ["apple",3], ....
Now the condition comes in to play:
it != my_map.end() && it->first.substr(0, n) == prefix;
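The first part checks whether the iterator is valid (i.e. it != my_map.end()).
The second part checks whether the key starts with the prefix (i.e. it->first.substr(0, n) == prefix, where n is presumably the length of prefix). Because the keys are sorted, all keys that share the prefix sit next to each other, starting at the position lower_bound returns. The loop therefore never visits "addition" when looking for "ap": it starts at "app" and stops at the first key that does not start with the prefix, so the sum is correct.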
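To see the whole thing in one place, here is a minimal sketch of the loop wrapped in a function. The name sum_with_prefix is made up for illustration, and it assumes that n in the original loop is the length of the prefix.

```cpp
#include <map>
#include <string>

// Hypothetical helper: sums the values of all keys that start with `prefix`.
int sum_with_prefix(const std::map<std::string, int>& my_map,
                    const std::string& prefix) {
    const auto n = prefix.size();  // assumption: n is the prefix length
    int total = 0;
    // Start at the first key >= prefix; matching keys are contiguous from there.
    for (auto it = my_map.lower_bound(prefix);
         it != my_map.end() && it->first.substr(0, n) == prefix;
         ++it) {
        total += it->second;
    }
    return total;
}
```

With the map from the question, sum_with_prefix(my_map, "ap") visits only "app" and "apple" and stops at the end of the map.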

Related

Is decrementing std::vector::begin undefined, even if it is never used?

Note that, contrary to the many questions on the subject (which is probably why I cannot find a satisfactory answer to this question on either Google or Stack Overflow), I never dereference *(begin() - 1)
My requirements are to:
iterate backwards
use functions that do not take reverse iterators, in this example vector::erase()
try to keep the code clean, so try to avoid the mental juggling of:
vector.erase(rev_it.base() - 1)
(What should the reverse iterator be now to get the next element in the iteration? The iterator returned by erase()? + 1, probably? - 1, unlikely?)
What I’ve come up with is:
for (auto it = vector.end(); it-- != vector.begin(); ) {
if (condition(*it)) {
it = vector.erase(it);
}
}
This seems to work, as it-- returns the iterator’s value and then only decrements it, meaning the iterator is always decremented after the check but before entering the loop body.
In particular:
When entering the loop
if vector.end() == vector.begin() the vector is empty and we exit the loop immediately
if vector.end() != vector.begin() then we enter the loop, with the first loop body execution with it == vector.end() - 1
When erasing elements
vector.erase(it) returns the following element in the vector, so decrementing the iterator at every iteration gets us to consider exactly once every element in the vector.
When exiting the loop
At the last execution of the loop’s body, it == vector.begin(), so the next time we try the loop condition:
the condition returns false
it is decremented one last time
we exit the loop
That is, my code does seem to compute the iterator position begin() - 1 but never accesses it, nor uses it for comparisons or anything like that.
Is this undefined behaviour?
Do I risk a segfault or something? Or just accessing uninitialized data maybe? Nothing at all because the iterator is discarded before being used in any way? Is there no way of knowing?
How about
for (auto it = vector.end(); it != vector.begin(); ) {
--it;
// ... rest of the loop body
}
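A minimal sketch of that pattern, decrementing at the top of the loop body so that begin() - 1 is never computed. The function name and the "erase even values" condition are placeholders chosen for illustration only.

```cpp
#include <vector>

// Hypothetical helper: walks the vector backwards and erases even values.
std::vector<int> erase_even_backwards(std::vector<int> v) {
    for (auto it = v.end(); it != v.begin(); ) {
        --it;  // safe: it != v.begin() was checked just before
        if (*it % 2 == 0) {
            // erase returns the element after the erased one (or end());
            // the next --it then moves to the element before the erased one.
            it = v.erase(it);
        }
    }
    return v;
}
```

If the first element is erased, erase returns the new begin(), the loop condition fails, and the loop ends without ever stepping before begin().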

emplace_hint performance when hint is wrong

I am trying to determine if emplace_hint should be used to insert a key into a multimap (as opposed to regular emplace). I have already calculated the range of the key in an earlier operation (on the same key):
range = multimap.equal_range(key);
Should I use range.first, range.second, or nothing as a hint to insert the key, value pair? What if the range is empty?
Should I use range.first, range.second, or nothing as a hint to insert the key, value pair?
As std::multimap::emplace_hint() states:
Inserts a new element into the container as close as possible to the position just before hint.
(emphasis mine), you should use the second iterator from the range, which should make insertion more efficient:
Complexity
Logarithmic in the size of the container in general, but amortized constant if the new element is inserted just before hint.
As for an empty range, it is still fine to use the second iterator, as it always points to the first element greater than the key, or one past the last element if no such element exists.
First, performance wise, it will not make any difference if you use range.first or range.second. Let's have a look at the return value of equal_range:
std::equal_range - return value
std::pair containing a pair of iterators defining the wanted range,
the first pointing to the first element that is not less than value
and the second pointing to the first element greater than value. If
there are no elements not less than value, last is returned as the
first element. Similarly if there are no elements greater than value,
last is returned as the second element
This means that, when obtained for a value key, both range.first and range.second represent positions right before which key may be correctly inserted. So performance-wise it should not matter whether you use range.first or range.second. Complexity should be "amortized constant", since the new element is inserted just before hint.
Second, when the range is "empty", range.first and range.second are equal: both point to the first element greater than key, or one past the end if there is no such element. Performance as well as result are therefore identical for either hint, and the result is the same as if you used emplace without any hint.
See the following program demonstrating this:
#include <iostream>
#include <map>
#include <string>

int main()
{
std::multimap<std::string, std::string> m;
// some clutter:
m.emplace(std::make_pair(std::string("k"), std::string("1")));
m.emplace(std::make_pair(std::string("k"), std::string("2")));
m.emplace(std::make_pair(std::string("z"), std::string("1")));
m.emplace(std::make_pair(std::string("z"), std::string("2")));
// relevant portion of demo data: order a-c-b may be preserved
m.emplace(std::make_pair(std::string("x"), std::string("a")));
m.emplace(std::make_pair(std::string("x"), std::string("c")));
m.emplace(std::make_pair(std::string("x"), std::string("b")));
auto r = m.equal_range("x");
// will insert "x.zzzz" before "x.a":
m.emplace_hint(r.first, std::make_pair(std::string("x"), std::string("zzzz")));
// will insert "x.0" right after "x.b":
m.emplace_hint(r.second, std::make_pair(std::string("x"), std::string("0")));
auto rEmpty = m.equal_range("e");
// "empty" range, normal lookup:
m.emplace_hint(rEmpty.first, std::make_pair(std::string("e"), std::string("b")));
m.emplace_hint(rEmpty.second, std::make_pair(std::string("e"), std::string("a")));
auto rWrong = m.equal_range("k");
m.emplace_hint(rWrong.first, std::make_pair(std::string("z"), std::string("a")));
for (const auto &p : m) {
std::cout << p.first << " => " << p.second << '\n';
}
}
Output:
e => b
e => a
k => 1
k => 2
x => zzzz
x => a
x => c
x => b
x => 0
z => a
z => 1
z => 2
In short: if you have a valid range for key pre-calculated, then use it when inserting key. It will help anyway.
EDIT:
There have been discussions around whether an "invalid" hint might lead to an insertion at a position that does not then reflect the "order of insertion" for values with the same key. This might be concluded from a general multimap statement "The order of the key-value pairs whose keys compare equivalent is the order of insertion and does not change. (since C++11)".
I did not find support for the one or the other point of view in any normative document. I just found the following statement in cplusplus multimap/emplace_hint documentation:
template <class... Args>
iterator emplace_hint (const_iterator position, Args&&... args);
position Hint for the position where the element can be inserted. The function optimizes its insertion time if position points to the
element that will follow the inserted element (or to the end, if it
would be the last). Notice that this does not force the new element to
be in that position within the multimap container (the elements in a
multimap always follow a specific order). const_iterator is a member
type, defined as a bidirectional iterator type that points to
elements.
I know that this is not a normative reference, but at least my Apple LLVM 8.0 compiler adheres to this definition (see demo above):
If one inserts an element with a "wrong" hint, i.e. one pointing even before the position where a pair shall be inserted, the algorithm recognizes this and chooses a valid position (see inserting "z"=>"a" where a hint points to an "x"-element).
If we use a range for key "x" and use range.first, the position right before the first x is interpreted as a valid position.
So: I think that m.emplace_hint(r.first,... behaves in such a way that the algorithm chooses a valid position immediately, and that inserting at a position close to the hint overrules the general statement "The order of the key-value pairs whose keys compare equivalent is the order of insertion and does not change. (since C++11)".

Using C++ Iterator on a set of one only element

I'm working with iterators on a set of elements whose size is greater than three almost all the time, but sometimes the generated set contains only one element. In this case, the following loop:
for(i = data_set.begin(); i != data_set.end(); i++)
{
//do something with the data
}
will never be entered even though "data_set" is not empty because data_set.begin()==data_set.end()
I'm doing a test to handle this particular case alone but the code is turning to a mess and is no longer clean.
What should be done to handle this properly?
Thanks,
자스민
If the set contains only 1 element, then:
std::next( data_set.begin() ) == data_set.end(), because the begin() iterator points at the first element of the container, and end() points one past the last element. So data_set.begin() != data_set.end(), and the loop body runs exactly once.
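This is easy to check with a small sketch. The helper names below are made up, and std::set<int> is only an assumption about the element type.

```cpp
#include <iterator>
#include <set>

// Hypothetical helper: counts how often the loop body from the question runs.
int count_iterations(const std::set<int>& data_set) {
    int n = 0;
    for (auto i = data_set.begin(); i != data_set.end(); ++i) {
        ++n;  // stands in for "do something with the data"
    }
    return n;
}

// Hypothetical helper: true if the set holds exactly one element.
bool has_single_element(const std::set<int>& data_set) {
    return !data_set.empty() && std::next(data_set.begin()) == data_set.end();
}
```

For a set holding a single value, count_iterations returns 1, so no special case is needed; begin() == end() only holds for an empty set.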

Erasing an equal_range iterator

I've got a pair of iterators:
pair <multimap<CFile,Filetype>::iterator, multimap<CFile,Filetype>::iterator> range;
range = m_DirectoryMap.equal_range(obj);
That pair holds duplicated elements in a MultiMap - e.g. there is 1 object that has 2 more duplicates (so basically 3 objects) and I need to remove 2 of them, so only 1 is left.
I was doing this by a simple while loop, like this:
auto it = range.first;
++it;
while (it != range.second)
it = m_DirectoryMap.erase(it);
After that, only 1 object was left - which is my goal.
Later I found that I should probably erase the whole range with a single function call, without any need for a loop, like this:
m_DirectoryMap.erase(range.first, range.second);
This seems cleaner, but the problem is that it removes all objects.
Then I tried:
m_DirectoryMap.erase(++range.first, range.second);
This seems to leave the first object and remove the rest, so it is working for me, but my question is - is this the right way to do, what I'm looking for?
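For comparison, here is a hedged sketch of the same idea written with std::next and a guard for the empty range (incrementing range.first when it already equals range.second would step past the range). The helper name keep_first_only is invented for illustration.

```cpp
#include <iterator>
#include <map>
#include <string>

// Hypothetical helper: keeps the first element with the given key and
// erases all further elements with an equivalent key.
template <class Multimap, class Key>
void keep_first_only(Multimap& m, const Key& key) {
    auto range = m.equal_range(key);
    if (range.first == range.second) {
        return;  // key not present, nothing to erase
    }
    // Erase everything in the range except its first element.
    m.erase(std::next(range.first), range.second);
}
```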

STL "closest" method?

I'm looking for an STL sort that returns the element "closest" to the target value if the exact value is not present in the container. It needs to be fast, so essentially I'm looking for a slightly modified binary search... I could write it, but it seems like something that should already exist...
Do you mean the lower_bound/upper_bound functions? These perform a binary search and return the closest element above the value you're looking for.
Clarification: The global versions of lower/upper_bound only work if the range is sorted, as they use some kind of binary search internally. (Obviously, the lower/upper_bound methods in std::map always work). You said in your question that you were looking for some kind of binary search, so I'll assume the range is sorted.
Also, neither lower_bound nor upper_bound returns the closest member. If the value X you're looking for isn't a member of the range, they will both return the first element greater than X. Otherwise, lower_bound will return the first value equal to X, and upper_bound will return the first value greater than X (i.e. the position just past the last value equal to X).
So to find the closest value, you'd have to
call lower_bound
if it returns the end of the range, all values are less than X. The last (i.e. the highest) element is the closest one
if it returns the beginning of the range, all values are greater than or equal to X. The first (i.e. the lowest) element is the closest one
if it returns an element in the middle of the range, check that element and the element before - the one that's closer to X is the one you're looking for
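The steps above can be sketched as follows. This is only an illustration: the function name is made up, it is written for a sorted std::vector<int>, and it resolves ties in favour of the lower element, which the description above leaves open.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical helper: returns an iterator to the element of the sorted
// vector `v` closest to `x`, or v.end() if `v` is empty.
std::vector<int>::const_iterator closest_sorted(const std::vector<int>& v, int x) {
    if (v.empty()) return v.end();
    auto it = std::lower_bound(v.begin(), v.end(), x);
    if (it == v.end()) return std::prev(it);  // all values are less than x
    if (it == v.begin()) return it;           // no value is less than x
    auto before = std::prev(it);
    // Compare the distance to the neighbours on both sides of x.
    return (x - *before) <= (*it - x) ? before : it;
}
```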
So you're looking for an element which has a minimal distance from some value k?
Use std::transform to transform each x to x-k. Then use std::min_element with a comparison function which returns abs(l) < abs(r). Then add k back onto the result.
EDIT: Alternatively, you could just use std::min_element with a comparison function abs(l-k) < abs(r-k), and eliminate the std::transform.
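A minimal sketch of the std::min_element variant, written for std::vector<int> as an assumption; the function name is invented for this example.

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// Hypothetical helper: returns an iterator to the element of `v` with the
// smallest distance to `k` (v.end() if `v` is empty). Works on unsorted
// data, at the cost of a linear scan.
std::vector<int>::const_iterator closest_unsorted(const std::vector<int>& v, int k) {
    return std::min_element(v.begin(), v.end(), [k](int l, int r) {
        return std::abs(l - k) < std::abs(r - k);
    });
}
```

When two elements are equally close, std::min_element returns the one that appears first in the range.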
EDIT2: This is good for unsorted containers. For sorted containers, you probably want nikie's answer.
If the container is already sorted (as implied) you should be able to use std::upper_bound and the item directly before to figure out which is closest:
// Untested.
#include <algorithm>  // std::upper_bound
template <class Iter, class T>
Iter closest_value(Iter begin, Iter end, T value)
{
Iter result = std::upper_bound(begin, end, value);
if(result != begin)
{
Iter lower_result = result;
--lower_result;
if(result == end || ((value - *lower_result) < (*result - value)))
{
result = lower_result;
}
}
return result;
}
If the container is not sorted, use min_element with a predicate as already suggested.
If your data is not sorted, use std::min_element with a comparison functor that calculates your distance.