Efficient substitute for std::map::insert_or_assign with hint


I'm trying to write a substitute for std::map::insert_or_assign that takes the hint parameter, for build environments that don't support C++17.
I'd like for this substitute to be just as efficient, and not require that the mapped type be DefaultConstructible. The latter requirement rules out map[key] = value.
I've come up with this:
template <class M, class K, class T>
typename M::iterator insert_or_assign(M& map, typename M::const_iterator hint,
K&& key, T&& value)
{
using std::forward;
auto old_size = map.size();
auto iter = map.emplace_hint(hint, forward<K>(key), forward<T>(value));
// If the map didn't grow, the key already existed and we can directly
// assign its associated value.
if (map.size() == old_size)
iter->second = std::forward<T>(value);
return iter;
}
However, I don't know if I can trust std::map not to move-assign the value twice in the case where the key already existed. Is this safe? If not, is there a safe way to efficiently implement a substitute for std::map::insert_or_assign taking a hint parameter?

As per NathanOliver's comment, where he cited the cppreference documentation for std::map::emplace:
The element may be constructed even if there already is an element
with the key in the container, in which case the newly constructed
element will be destroyed immediately.
If we assume the same applies to std::map::emplace_hint, then the value could be moved away prematurely in the solution I proposed in my question.
I've come up with this other solution (NOT TESTED), which only forwards the value once. I admit it's not pretty. :-)
// Take 'hint' as a mutating iterator to avoid an O(N) conversion.
template <class M, class K, class T>
typename M::iterator insert_or_assign(M& map, typename M::iterator hint,
K&& key, T&& value)
{
using std::forward;
#ifdef __cpp_lib_map_try_emplace
return map.insert_or_assign(hint, forward<K>(key), forward<T>(value));
#else
// Check if the given key goes between `hint` and the entry just before
// hint. If not, check if the given key matches the entry just before hint.
if (hint != map.begin())
{
auto previous = hint;
--previous; // O(1)
auto comp = map.key_comp();
if (comp(previous->first, key)) // key follows previous
{
if (hint == map.end() || comp(key, hint->first)) // key precedes hint (never dereference end())
{
// Should be O(1)
return map.emplace_hint(hint, forward<K>(key),
forward<T>(value));
}
}
else if (!comp(key, previous->first)) // key equals previous
{
previous->second = forward<T>(value); // O(1)
return previous;
}
}
// If this is reached, then the hint has failed.
// Check if key already exists. If so, assign its associated value.
// If not, emplace the new key-value pair.
auto iter = map.find(key); // O(log(N))
if (iter != map.end())
iter->second = forward<T>(value);
else
iter = map.emplace(forward<K>(key), forward<T>(value)).first; // O(log(N)); emplace returns pair<iterator, bool>
return iter;
#endif
}
I hope somebody else will come up with a nicer solution!
Note that I check for the __cpp_lib_map_try_emplace feature test macro to test if std::map::insert_or_assign is supported before resorting to this ugly mess.
EDIT: Removed the slow iterator arithmetic silliness in attempting to check if the key already exists at hint.
EDIT 2: hint is now taken as a mutating iterator to avoid an expensive O(N) conversion if it was otherwise passed as a const_iterator. This allows me to manually check the hint and perform an O(1) insertion or assignment if the hint succeeds.
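On the const_iterator conversion mentioned in EDIT 2: a possible O(1) alternative, as far as I know, is to erase the empty range [pos, pos). Since C++11 the range overload of erase accepts const_iterators and returns a mutable iterator, and an empty range removes nothing. The sketch below assumes a C++11 standard library; to_mutable and demo_to_mutable are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper: converts a const_iterator into an iterator in O(1).
// Erasing the empty range [pos, pos) removes nothing; the C++11 range
// overload of erase returns a mutable iterator to the same position.
template <class M>
typename M::iterator to_mutable(M& map, typename M::const_iterator pos)
{
    return map.erase(pos, pos);
}

// Small demonstration: assign through a hint received as a const_iterator.
inline std::string demo_to_mutable()
{
    std::map<int, std::string> m;
    m.emplace(1, "one");
    m.emplace(2, "two");
    std::map<int, std::string>::const_iterator hint = m.find(2);
    auto it = to_mutable(m, hint);
    it->second = "TWO";
    assert(m.size() == 2);  // nothing was erased
    return m.at(2);
}
```

With such a helper, the substitute could keep taking the hint as a const_iterator, like the real insert_or_assign does.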

Related

unordered_map insert on past-the-end iterator

Is it defined and valid behavior to insert through the past-the-end iterator returned by find when the key is not found:
auto it = m.find(key);
if (it == m.end()) {
m.insert(it, make_pair(key, value));
}
because this would save an additional lookup compared to using:
m[key] = value;

While it's safe to pass an end iterator as a hint to unordered_map::insert, it doesn't actually accomplish anything.
Of the three major standard library implementations, only libstdc++ does anything with that hint, and even then it will only end up using it if it points to a valid entry.
If you want to avoid doing two lookups (one to determine if the element is present and another to insert it), you should just try to insert it. insert returns both a bool denoting whether a new element was inserted and an iterator to either the newly-inserted element or the existing element that prevented insertion. That means the most efficient way to insert an element if it doesn't exist and get an iterator to the element is to do something like this:
decltype(m)::iterator it;
bool inserted;
std::tie(it, inserted) = m.insert(std::make_pair(key, value));
if (inserted) {
// ...
}
If your mapped_type is expensive to construct, you can avoid building it with try_emplace (only available with C++17 or later):
auto [it, inserted] = m.try_emplace(key, args, to, value, constructor);
if (inserted) {
// ...
}
Pre C++17, you can just let operator[] default-construct the element and compare the container size to determine if a new element was added:
size_t size_before = m.size();
ValueType& element = m[key];
size_t size_after = m.size();
if (size_before != size_after) {
element = ValueType{args, to, value, constructor};
// ...
}
Obviously this has the drawback of default-constructing the element and only working with assignable types.

std::map insert() hint location: difference between c++98 and c++11

On cplusplus' entry on map::insert() I read about the location one could add as a hint for the function that the "function optimizes its insertion time if position points to the element that will precede the inserted element" for c++98, while for c++11 the optimization occurs "if position points to the element that will follow the inserted element (or to the end, if it would be the last)".
Does this mean that the performance of code snippets of the following form (which are abundant in the legacy code I'm working on and modeled after Scott Meyers' "Effective STL", item 24) was affected by switching to a C++11-compliant compiler?
auto pLoc = someMap.lower_bound(someKey);
if(pLoc != someMap.end() && !(someMap.key_comp()(someKey, pLoc->first)))
return pLoc->second;
else
{
auto newValue = expensiveCalculation();
someMap.insert(pLoc, make_pair(someKey, newValue)); // using the lower bound as hint
return newValue;
}
What would be the best way to improve this pattern for use with C++11?

The C++98 specification is a defect in the standard. See the discussion in LWG issue 233 and N1780.
Recall that lower_bound returns an iterator to the first element with key not less than the specified key, while upper_bound returns an iterator to the first element with key greater than the specified key. If there is no key equivalent to the specified key in the container, then lower_bound and upper_bound return the same thing - an iterator to the element that would be after the key if it were in the map.
So, in other words, your current code already works correctly under the C++11 spec, and in fact would be wrong under C++98's defective specification.
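To make the lower_bound/upper_bound point concrete, here is a small sketch. The keys, values and function names are made up for illustration. It checks that both bounds agree exactly when the key is absent, and that the lower bound then points at the element that will follow the new one, which is the hint C++11 asks for.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Returns true when lower_bound and upper_bound agree for `key`,
// which is the case exactly when `key` is absent from the map.
inline bool bounds_agree(const std::map<int, std::string>& m, int key)
{
    return m.lower_bound(key) == m.upper_bound(key);
}

inline std::map<int, std::string> demo_hinted_insert()
{
    std::map<int, std::string> m;
    m.insert(std::make_pair(10, std::string("ten")));
    m.insert(std::make_pair(30, std::string("thirty")));
    auto hint = m.lower_bound(20);  // points at 30, the element that will follow 20
    assert(hint->first == 30);
    m.insert(hint, std::make_pair(20, std::string("twenty")));
    return m;
}
```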

Yes, it will affect the complexity. Giving the correct hint will make insert() have amortized constant complexity, while giving an incorrect hint will force the map to search for the position from the beginning, giving logarithmic complexity. Basically, a good hint makes the insertion happen in constant time, no matter how big your map is; with a bad hint the insertion will be slower on larger maps.
The solution is, apparently, to search for the hint with upper_bound instead of lower_bound.

I am thinking the correct C++11-style hint insertion might be as follows:
auto it = table.upper_bound(key); // upper_bound returns an iterator pointing to the first element that is greater than key
if (it == table.begin() || std::prev(it)->first < key) {
// key not found path: 'it' still points to the element that will follow the new one
table.insert(it, make_pair(key, value));
}
else {
// key found path: the element just before 'it' holds the key
std::prev(it)->second = value;
}

A snapshot of working lambda function for your reference.
Note: m_map should not be empty. It is trivially known where to add the element if the map is empty.
auto create_or_get_iter = [this] (const K& key) {
auto it_upper = m_map.upper_bound(key);
auto it_effective = it_upper == m_map.begin() ? it_upper : std::prev(it_upper);
auto init_val = it_effective->second;
if (it_effective == m_map.begin() || it_effective->first < key) {
return m_map.insert(it_effective, std::make_pair(key, init_val));
} else {
it_effective->second = init_val;
return it_effective;
}
};

What's wrong with my vector<T>::erase here?

I have two vector<T> in my program, called active and non_active respectively. This refers to the objects it contains, as to whether they are in use or not.
I have some code that loops the active vector and checks for any objects that might have gone non active. I add these to a temp_list inside the loop.
Then after the loop, I take my temp_list and do non_active.insert of all elements in the temp_list.
After that, I do call erase on my active vector and pass it the temp_list to erase.
For some reason, however, the erase crashes.
This is the code:
non_active.insert(non_active.begin(), temp_list.begin(), temp_list.end());
active.erase(temp_list.begin(), temp_list.end());
I get this assertion:
Expression:("_Pvector == NULL || (((_Myvec*)_Pvector)->_Myfirst <= _Ptr && _Ptr <= ((_Myvect*)_Pvector)->_Mylast)",0)
I've looked online and seen that there is an erase-remove idiom, but I'm not sure how I'd apply it to removing a range of elements from a vector<T>.
I'm not using C++11.

erase expects a range of iterators passed to it that lie within the current vector. You cannot pass iterators obtained from a different vector to erase.
Here is a possible, but inefficient, C++11 solution supported by lambdas:
active.erase(std::remove_if(active.begin(), active.end(), [&](const T& x) // capture temp_list by reference
{
return std::find(temp_list.begin(), temp_list.end(), x) != temp_list.end();
}), active.end());
And here is the equivalent C++03 solution without the lambda:
template<typename Container>
class element_of
{
Container& container;
public:
element_of(Container& container) : container(container) {} // must be public to be constructible by callers
template<typename T>
bool operator()(const T& x) const
{
return std::find(container.begin(), container.end(), x)
!= container.end();
}
};
// ...
active.erase(std::remove_if(active.begin(), active.end(),
element_of<std::vector<T> >(temp_list)),
active.end());
If you replace temp_list with a std::set and the std::find with a find member function call on the set, the performance should be acceptable.
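A minimal sketch of that std::set variant, assuming the elements are plain ints and that C++11 lambdas are available (in C++03 a functor like element_of would take the lambda's place). The names erase_members and demo_erase_members are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Removes from `active` every element that also appears in `to_remove`.
// Each membership test is O(log M) thanks to std::set::find.
inline void erase_members(std::vector<int>& active, const std::set<int>& to_remove)
{
    active.erase(std::remove_if(active.begin(), active.end(),
                                [&to_remove](int x) {
                                    return to_remove.find(x) != to_remove.end();
                                }),
                 active.end());
}

inline std::vector<int> demo_erase_members()
{
    std::vector<int> active = {1, 2, 3, 4, 5};
    std::set<int> temp_list = {2, 4};
    erase_members(active, temp_list);
    return active;
}
```

The relative order of the surviving elements is preserved, because std::remove_if is stable for the elements it keeps.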

The erase method is intended to accept iterators to the same container object. You're trying to pass in iterators to temp_list to use to erase elements from active which is not allowed for good reasons, as a Sequence's range erase method is intended to specify a range in that Sequence to remove. It's important that the iterators are in that sequence because otherwise we're specifying a range of values to erase rather than a range within the same container which is a much more costly operation.
The type of logic you're trying to perform suggests to me that a set or list might be better suited for the purpose. That is, you're trying to erase various elements from the middle of a container that match a certain condition and transfer them to another container, and you could eliminate the need for temp_list this way.
With list, for example, it could be as easy as this:
for (ActiveList::iterator it = active.begin(); it != active.end();)
{
if (it->no_longer_active())
{
inactive.push_back(*it);
it = active.erase(it);
}
else
++it;
}
However, sometimes vector can outperform these solutions, and maybe you have need for vector for other reasons (like ensuring contiguous memory). In that case, std::remove_if is your best bet.
Example:
bool not_active(const YourObjectType& obj);
active_list.erase(
remove_if(active_list.begin(), active_list.end(), not_active),
active_list.end());
More info on this can be found under the topic, 'erase-remove idiom' and you may need predicate function objects depending on what external states are required to determine if an object is no longer active.

You can actually make the erase/remove idiom usable for your case. You just need to move the value over to the other container before std::remove_if possibly shuffles it around: in the predicate.
template<class OutIt, class Pred>
struct copy_if_predicate{
copy_if_predicate(OutIt dest, Pred p)
: dest(dest), pred(p) {}
template<class T>
bool operator()(T const& v){
if(pred(v)){
*dest++ = v;
return true;
}
return false;
}
OutIt dest;
Pred pred;
};
template<class OutIt, class Pred>
copy_if_predicate<OutIt,Pred> copy_if_pred(OutIt dest, Pred pred){
return copy_if_predicate<OutIt,Pred>(dest,pred);
}
Live example on Ideone. (I directly used bools to make the code shorter, not bothering with output and the likes.)

The function std::vector::erase requires the iterators to be iterators into this vector, but you are passing iterators from temp_list. You cannot erase elements from a container that are in a completely different container.

active.erase(temp_list.begin(), temp_list.end());
You are trying to erase elements from one container while passing iterators that belong to a second container. Iterators from the second container are not valid for the first, even if the elements they point to compare equal.

I would like to suggest that this is an example of where std::list should be used. You can splice members from one list to another. Look at std::list::splice() for this.
Do you need random access? If not then you don't need a std::vector.
Note that with list, when you splice, your iterators, and references to the objects in the list remain valid.
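A rough sketch of the splice approach, assuming a made-up Item type with an active flag (both the type and the function name move_inactive are hypothetical). Each inactive node is relinked into the other list in O(1), without copying, and pointers to the moved objects stay valid.

```cpp
#include <cassert>
#include <list>

// Hypothetical element type for illustration.
struct Item {
    int id;
    bool active;
};

// Moves every inactive item from `active` to the end of `inactive` without copying.
inline void move_inactive(std::list<Item>& active, std::list<Item>& inactive)
{
    for (std::list<Item>::iterator it = active.begin(); it != active.end();) {
        std::list<Item>::iterator next = it;
        ++next;  // remember the successor before `it` is relinked
        if (!it->active)
            inactive.splice(inactive.end(), active, it);  // O(1) relink of one node
        it = next;
    }
}
```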
If you don't mind making the implementation "intrusive", your objects can contain their own iterator value, so they know where they are. Then when they change state, they can automate their own "moving" from one list to the other, and you don't need to traverse the whole list for them. (If you want this sweep to happen later, you can get them to "register" themselves for later moving).
I will write an algorithm here now to run through one collection and if a condition exists, it will effect a std::remove_if but at the same time will copy the element into your "inserter".
//fwd iterator must be writable
template< typename FwdIterator, typename OutputIterator, typename Pred >
FwdIterator copy_and_remove_if( FwdIterator inp, FwdIterator end, OutputIterator outp, Pred pred )
{
for( FwdIterator test = inp; test != end; ++test )
{
if( pred(*test) ) // insert
{
*outp = *test;
++outp;
}
else // keep
{
if( test != inp )
{
*inp = *test;
}
++inp;
}
}
return inp;
}
This is a bit like std::remove_if but will copy the ones being removed into an alternative collection. You would invoke it like this (for a vector) where isInactive is a valid predicate that indicates it should be moved.
active.erase( copy_and_remove_if( active.begin(), active.end(), std::back_inserter(inactive), isInactive ), active.end() );

The iterators you pass to erase() should point into the vector itself; the assertion is telling you that they don't. This version of erase() is for erasing a range out of the vector.
You need to iterate over temp_list yourself, locate each of its elements in active (for example with std::find), and call active.erase() on the iterator you find at each step.
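A minimal sketch of that element-by-element approach, assuming int elements and no C++11 features (erase_each is a hypothetical name). The cost is O(N * M), so the erase-remove idiom is likely the better choice for large vectors.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Erases from `active` the first occurrence of each value listed in `temp_list`.
inline void erase_each(std::vector<int>& active, const std::vector<int>& temp_list)
{
    for (std::vector<int>::const_iterator t = temp_list.begin(); t != temp_list.end(); ++t) {
        std::vector<int>::iterator pos = std::find(active.begin(), active.end(), *t);
        if (pos != active.end())
            active.erase(pos);  // pos belongs to `active`, so this is valid
    }
}
```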

How should std::map be used with a value that does not have a default constructor?

I've got a value type that I want put into a map.
It has a nice default copy constructor, but does not have a default constructor.
I believe that so long as I stay away from using operator[] that everything will be OK.
However I end up with pretty ugly constructs like this to actually insert an object.
(I think insert just fails if there is already a value for that key).
// equivalent to m[5]=x but without default construction
std::map<int,X>::iterator it = m.find(5);
if( it != m.end() )
{
it->second = x;
}
else
{
m.insert( std::make_pair(5,x) );
}
Which I believe will scan the map twice, and also looks pretty ugly.
Is there a neater / more efficient way to do this?

You can simply "insert-or-overwrite" with the standard insert function:
auto p = mymap.insert(std::make_pair(key, new_value));
if (!p.second) p.first->second = new_value; // overwrite value if key already exists
If you want to pass the elements by reference, make the pair explicit:
insert(std::pair<K&, V&>(key, value));
If you have a typedef for the map like map_t, you can say std::pair<map_t::key_type &, map_t::mapped_type &>, or any suitable variation on this theme.
Maybe this is best wrapped up into a helper:
template <typename Map>
void insert_forcefully(Map & m,
typename Map::key_type const & key,
typename Map::mapped_type const & value)
{
std::pair<typename Map::iterator, bool> p = m.insert(std::pair<typename Map::key_type const &, typename Map::mapped_type const &>(key, value));
if (!p.second) { p.first->second = value; }
}

You could first get the position to insert the pair with lower_bound, then check if it's already there, and if not, insert it, providing the iterator where to insert. Something along those lines.
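Something along those lines could look like the sketch below. It assumes C++11 (emplace_hint, forwarding references); NoDefault and insert_or_assign_lb are hypothetical names for illustration. The value is forwarded exactly once on either path, and the mapped type never needs a default constructor.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Hypothetical value type without a default constructor.
struct NoDefault {
    int v;
    explicit NoDefault(int x) : v(x) {}
};

// Hypothetical helper: insert-or-assign with a single O(log N) lookup.
template <class M, class K, class T>
std::pair<typename M::iterator, bool> insert_or_assign_lb(M& map, K&& key, T&& value)
{
    auto pos = map.lower_bound(key);  // first element whose key is not less than `key`
    if (pos != map.end() && !map.key_comp()(key, pos->first)) {
        pos->second = std::forward<T>(value);  // key exists: assign in place
        return {pos, false};
    }
    // Key absent: `pos` is the element that will follow the new one,
    // which is the hint position C++11 expects.
    return {map.emplace_hint(pos, std::forward<K>(key), std::forward<T>(value)), true};
}
```

The returned bool mirrors what std::map::insert_or_assign reports in C++17: true for an insertion, false for an assignment.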

There are two things you missed in the interface of map (and the like):
insert(value_type) returns a std::pair<iterator, bool>, the .first member points to the element with the key you tried to insert and the .second member indicates whether it is actually the element you tried to insert or another that previously was in the container.
insert(iterator, value_type) allows you to give a hint as to where to insert
The latter is not necessarily useful in your situation though.
typedef std::map<int,X> Map;
// insert and check
std::pair<Map::iterator, bool> const result =
map.insert(std::make_pair(5, x)); // O(log N)
if (not result.second)
{
result.first->second = x; // O(1)
// OR
using std::swap;
swap(result.first->second, x);
}
If your type does not support assignment and there is no swap, however, you need to bite the bullet:
// locate and insert
Map::iterator position = map.lower_bound(5); // O(log N)
if (position != map.end() and position->first == 5)
{
position = map.erase(position); // O(1)
}
map.insert(position, std::make_pair(5, x)); // O(log N) if rebalancing
In C++11, the insert methods are doubled:
insert(value_type const&) // insertion by copy
insert(P&&) // insertion by move
and with perfect forwarding we get the new emplace method. It is similar to insert, but constructs the element in place by forwarding the arguments to its constructor. How it differentiates the arguments for the key from those for the value is a mystery to me though.
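On how emplace tells key arguments from value arguments: as far as I understand it, the arguments are simply forwarded to the constructor of the map's value_type, which is a std::pair. Two arguments go to the pair's (first, second) constructor, and std::piecewise_construct with two tuples lets each member take several constructor arguments. A small sketch, where X and demo_emplace are made-up names:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical value type: no default constructor, two-argument constructor.
struct X {
    int a;
    std::string b;
    X(int a_, std::string b_) : a(a_), b(std::move(b_)) {}
};

inline std::map<int, X> demo_emplace()
{
    std::map<int, X> m;
    // Two arguments: forwarded to pair's (first, second) constructor.
    m.emplace(1, X(10, "ten"));
    // Piecewise: the first tuple builds the key, the second builds the mapped value.
    m.emplace(std::piecewise_construct,
              std::forward_as_tuple(2),
              std::forward_as_tuple(20, "twenty"));
    return m;
}
```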

c++ map find() to possibly insert(): how to optimize operations?

I'm using the STL map data structure, and at the moment my code first invokes find(): if the key was not previously in the map, it inserts it; otherwise it does nothing.
map<Foo*, string>::iterator it;
it = my_map.find(foo_obj); // 1st lookup
if(it == my_map.end()){
my_map[foo_obj] = "some value"; // 2nd lookup
}else{
// ok do nothing.
}
I was wondering if there is a better way than this, because as far as I can tell, in this case when I want to insert a key that is not present yet, I perform 2 lookups in the map data structures: one for find(), one in the insert() (which corresponds to the operator[] ).
Thanks in advance for any suggestion.

Normally if you do a find and maybe an insert, then you want to keep (and retrieve) the old value if it already existed. If you just want to overwrite any old value, map[foo_obj]="some value" will do that.
Here's how you get the old value, or insert a new one if it didn't exist, with one map lookup:
typedef std::map<Foo*,std::string> M;
typedef M::iterator I;
std::pair<I,bool> const& r=my_map.insert(M::value_type(foo_obj,"some value"));
if (r.second) {
// value was inserted; now my_map[foo_obj]="some value"
} else {
// value wasn't inserted because my_map[foo_obj] already existed.
// note: the old value is available through r.first->second
// and may not be "some value"
}
// in any case, r.first->second holds the current value of my_map[foo_obj]
This is a common enough idiom that you may want to use a helper function:
template <class M,class Key>
typename M::mapped_type &
get_else_update(M &m,Key const& k,typename M::mapped_type const& v) {
return m.insert(typename M::value_type(k,v)).first->second;
}
get_else_update(my_map,foo_obj,"some value");
If you have an expensive computation for v you want to skip if it already exists (e.g. memoization), you can generalize that too:
template <class M,class Key,class F>
typename M::mapped_type &
get_else_compute(M &m,Key const& k,F f) {
typedef typename M::mapped_type V;
std::pair<typename M::iterator,bool> r=m.insert(typename M::value_type(k,V()));
V &v=r.first->second;
if (r.second)
f(v);
return v;
}
where e.g.
struct F {
void operator()(std::string &val) const
{ val=std::string("some value")+" that is expensive to compute"; }
};
get_else_compute(my_map,foo_obj,F());
If the mapped type isn't default constructible, then make F provide a default value, or add another argument to get_else_compute.
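One possible shape for the non-default-constructible case, sketched under the assumption that f returns the value instead of filling one in (get_else_compute_lazy and demo_lazy_calls are hypothetical names). It uses lower_bound plus a hinted insert, so there is still only one full lookup and f runs only when the key is missing.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical variant of get_else_compute: `f` returns the value and is
// called only when `k` is absent, so mapped_type needs no default constructor.
template <class M, class Key, class F>
typename M::mapped_type& get_else_compute_lazy(M& m, Key const& k, F f)
{
    typename M::iterator pos = m.lower_bound(k);              // single O(log N) lookup
    if (pos == m.end() || m.key_comp()(k, pos->first))        // k is absent
        pos = m.insert(pos, typename M::value_type(k, f()));  // hinted insert
    return pos->second;
}

// Counts how often the "expensive" computation actually runs.
inline int demo_lazy_calls()
{
    std::map<int, std::string> m;
    int calls = 0;
    auto make = [&calls]() { ++calls; return std::string("expensive"); };
    get_else_compute_lazy(m, 1, make);
    get_else_compute_lazy(m, 1, make);  // key present: make is not called again
    return calls;
}
```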

There are two main approaches. The first is to use the insert function that takes a value type and which returns an iterator and a bool which indicate if an insertion took place and returns an iterator to either the existing element with the same key or the newly inserted element.
map<Foo*, string>::iterator it;
it = my_map.insert( map<Foo*, string>::value_type(foo_obj, "some_value") ).first; // single lookup
The advantage of this is that it is simple. The major disadvantage is that you always construct a new value for the second parameter whether or not an insertion is required. In the case of a string this probably doesn't matter. If your value is expensive to construct this may be more wasteful than necessary.
A way round this is to use the 'hint' version of insert.
std::pair< map<Foo*, string>::iterator, map<Foo*, string>::iterator >
range = my_map.equal_range(foo_obj);
if (range.first == range.second)
{
if (range.first != my_map.begin())
--range.first;
my_map.insert(range.first, map<Foo*, string>::value_type(foo_obj, "some_value") );
}
The insertion is guaranteed to be in amortized constant time only if the element is inserted immediately after the supplied iterator, hence the --, if possible.
Edit
If this need for -- seems odd, that's because it is. There is an open defect (233) in the standard that highlights this issue, although the description of the issue as it applies to map is clearer in the duplicate issue 246.

In your example, you want to insert when it's not found. If default construction and setting the value after that is not expensive, I'd suggest simpler version with 1 lookup:
string& r = my_map[foo_obj]; // only lookup & insert if not existed
if (r == "") r = "some value"; // if default (obj wasn't in map), set value
// else existed already, do nothing
If your example reflects what you actually want, consider storing that value as a member (str Foo::s) instead: you already have the object, so no lookup would be needed; just check whether that member still has its default value, and keep the objects in a std::set. Even extending the class (FooWithValue2) may be cheaper than using a map.
But if joining data through the map like this is really needed, or if you want to update only if the key existed, then Jonathan has the answer.
