c++ map find() to possibly insert(): how to optimize operations?

c++ map find() to possibly insert(): how to optimize operations? - c++

I'm using the STL map data structure, and at the moment my code first invokes find(): if the key was not previously in the map, it calls insert() it, otherwise it does nothing.
map<Foo*, string>::iterator it;
it = my_map.find(foo_obj); // 1st lookup
if(it == my_map.end()){
my_map[foo_obj] = "some value"; // 2nd lookup
}else{
// ok do nothing.
}
I was wondering if there is a better way than this, because as far as I can tell, in this case when I want to insert a key that is not present yet, I perform 2 lookups in the map data structures: one for find(), one in the insert() (which corresponds to the operator[] ).
Thanks in advance for any suggestion.

Normally if you do a find and maybe an insert, then you want to keep (and retrieve) the old value if it already existed. If you just want to overwrite any old value, map[foo_obj]="some value" will do that.
Here's how you get the old value, or insert a new one if it didn't exist, with one map lookup:
typedef std::map<Foo*,std::string> M;
typedef M::iterator I;
std::pair<I,bool> const& r=my_map.insert(M::value_type(foo_obj,"some value"));
if (r.second) {
// value was inserted; now my_map[foo_obj]="some value"
} else {
// value wasn't inserted because my_map[foo_obj] already existed.
// note: the old value is available through r.first->second
// and may not be "some value"
}
// in any case, r.first->second holds the current value of my_map[foo_obj]
This is a common enough idiom that you may want to use a helper function:
template <class M,class Key>
typename M::mapped_type &
get_else_update(M &m,Key const& k,typename M::mapped_type const& v) {
return m.insert(typename M::value_type(k,v)).first->second;
}
get_else_update(my_map,foo_obj,"some value");
If you have an expensive computation for v you want to skip if it already exists (e.g. memoization), you can generalize that too:
template <class M,class Key,class F>
typename M::mapped_type &
get_else_compute(M &m,Key const& k,F f) {
typedef typename M::mapped_type V;
std::pair<typename M::iterator,bool> r=m.insert(typename M::value_type(k,V()));
V &v=r.first->second;
if (r.second)
f(v);
return v;
}
where e.g.
struct F {
void operator()(std::string &val) const
{ val=std::string("some value")+" that is expensive to compute"; }
};
get_else_compute(my_map,foo_obj,F());
If the mapped type isn't default constructible, then make F provide a default value, or add another argument to get_else_compute.

There are two main approaches. The first is to use the insert function that takes a value type and which returns an iterator and a bool which indicate if an insertion took place and returns an iterator to either the existing element with the same key or the newly inserted element.
map<Foo*, string>::iterator it;
it = my_map.find(foo_obj); // 1st lookup
my_map.insert( map<Foo*, string>::value_type(foo_obj, "some_value") );
The advantage of this is that it is simple. The major disadvantage is that you always construct a new value for the second parameter whether or not an insertion is required. In the case of a string this probably doesn't matter. If your value is expensive to construct this may be more wasteful than necessary.
A way round this is to use the 'hint' version of insert.
std::pair< map<foo*, string>::iterator, map<foo*, string>::iterator >
range = my_map.equal_range(foo_obj);
if (range.first == range.second)
{
if (range.first != my_map.begin())
--range.first;
my_map.insert(range.first, map<Foo*, string>::value_type(foo_obj, "some_value") );
}
The insertiong is guaranteed to be in amortized constant time only if the element is inserted immediately after the supplied iterator, hence the --, if possible.
Edit
If this need to -- seems odd, then it is. There is an open defect (233) in the standard that hightlights this issue although the description of the issue as it applies to map is clearer in the duplicate issue 246.

In your example, you want to insert when it's not found. If default construction and setting the value after that is not expensive, I'd suggest simpler version with 1 lookup:
string& r = my_map[foo_obj]; // only lookup & insert if not existed
if (r == "") r = "some value"; // if default (obj wasn't in map), set value
// else existed already, do nothing
If your example tells what you actually want, consider adding that value as str Foo::s instead, you already have the object, so no lookups would be needed, just check if it has default value for that member. And keep the objs in the std::set. Even extending class FooWithValue2 may be cheaper than using map.
But If joining data through the map like this is really needed or if you want to update only if it existed, then Jonathan has the answer.

Related

Efficient substitute for std::map::insert_or_assign with hint

I'm trying to write a substitute for std::map::insert_or_assign that takes the hint parameter, for build environments that don't support C++17.
I'd like for this substitute to be just as efficient, and not require that the mapped type be DefaultConstructible. The latter requirement rules out map[key] = value.
I've come up with this:
template <class M, class K, class T>
typename M::iterator insert_or_assign(M& map, typename M::const_iterator hint,
K&& key, T&& value)
{
using std::forward;
auto old_size = map.size();
auto iter = map.emplace_hint(hint, forward<K>(key), forward<T>(value));
// If the map didn't grow, the key already already existed and we can directly
// assign its associated value.
if (map.size() == old_size)
iter->second = std::forward<T>(value);
return iter;
}
However, I don't know if I can trust std::map not to move-assign the value twice in the case where the key already existed. Is this safe? If not, is there a safe way to efficiently implement a substitute for std::map::insert_or_assign taking a hint parameter?

As per NathanOliver's comment, where he cited the cppreference documentation for std::map::emplace:
The element may be constructed even if there already is an element
with the key in the container, in which case the newly constructed
element will be destroyed immediately.
If we assume the same applies for std::map::emplace_hint, then the value could moved away prematurely in the solution I proposed in my question.
I've come up with this other solution (NOT TESTED), which only forwards the value once. I admit it's not pretty. :-)
// Take 'hint' as a mutating iterator to avoid an O(N) conversion.
template <class M, class K, class T>
typename M::iterator insert_or_assign(M& map, typename M::iterator hint,
K&& key, T&& value)
{
using std::forward;
#ifdef __cpp_lib_map_try_emplace
return map.insert_or_assign(hint, forward<K>(key), forward<T>(value);
#else
// Check if the given key goes between `hint` and the entry just before
// hint. If not, check if the given key matches the entry just before hint.
if (hint != map.begin())
{
auto previous = hint;
--previous; // O(1)
auto comp = map.key_comp();
if (comp(previous->first, key)) // key follows previous
{
if (comp(key, hint->first)) // key precedes hint
{
// Should be O(1)
return map.emplace_hint(hint, forward<K>(key),
forward<T>(value));
}
}
else if (!comp(key, previous->first)) // key equals previous
{
previous->second = forward<T>(value); // O(1)
return previous;
}
}
// If this is reached, then the hint has failed.
// Check if key already exists. If so, assign its associated value.
// If not, emplace the new key-value pair.
auto iter = map.find(key); // O(log(N))
if (iter != map.end())
iter->second = forward<T>(value);
else
iter = map.emplace(forward<K>(key), forward<T>(value)); // O(log(N))
return iter;
#endif
}
I hope somebody else will come up with a nicer solution!
Note that I check for the __cpp_lib_map_try_emplace feature test macro to test if std::map::insert_or_assign is supported before resorting to this ugly mess.
EDIT: Removed the the slow iterator arithmetic silliness in attempting to check if the key already exists at hint.
EDIT 2: hint is now taken as a mutating iterator to avoid an expensive O(N) conversion if it was otherwise passed as a const_iterator. This allows me to manually check the hint and perform an O(1) insertion or assignment if the hint succeeds.

Comparator can be used to set a new key, isn't?

I need to change the "key" of a multiset:
multiset<IMidiMsgExt, IMidiMsgExtCompByNoteNumber> playingNotes;
such as that when I use the .find() function it search and return the first object (iterator) with that NoteNumber property value.
I said "first" because my multiset list could contains objects with the same "key". So I did:
struct IMidiMsgExtCompByNoteNumber {
bool operator()(const IMidiMsgExt& lhs, const IMidiMsgExt& rhs) {
return lhs.NoteNumber() < rhs.NoteNumber();
}
};
but when I try to do:
auto it = playingNotes.find(60);
the compiler says no instance of overloaded function "std::multiset<_Kty, _Pr, _Alloc>::find [with _Kty=IMidiMsgExt, _Pr=IMidiMsgExtCompByNoteNumber, _Alloc=std::allocator<IMidiMsgExt>]" matches the argument list
Am I misunderstanding the whole thing? What's wrong?

I do believe that you have some misunderstandings here:
Part of an associative container's type is it's key type and comparator. Because C++ is strongly typed the only way to change the comparator on a container is to create a new container, copying or moving all the elements into it
Creating a copy of all the elements in a container is a potentially expensive process
By creating a copy you are violating the Single Source of Truth best practice
multiset is used infrequently, I have used it once in my career, others have pointed out it's shortcomings and recommended that you use another container, write your own container, or in my case I'd suggests simply using vector and sorting it how you want when you have to
I'm going to catalog your comments to show how the answer I've already given you is correct:
We're going to assume that the multiset<IMidiMsgExt, IMidiMsgExtCompByNoteNumber> that you've selected is necessary and cannot be improved upon by using vector as suggested in 4, where:
struct IMidiMsgExtCompByNoteNumber {
bool operator()(const IMidiMsgExt& lhs, const IMidiMsgExt& rhs) {
return lhs.NoteNumber() < rhs.NoteNumber();
}
};
You cannot use multiset::find because that requires you tospecify the exact IMidiMsgExt you are searching for; so you'll need to use find_if(cbegin(playingNotes), cend(playingNotes), [value = int{60}](const auto& i){return i.mNote == value;}) to search for a specific property value. Which will be fine to use on to use directly on PlayingNotes without changing the sorting, because you say:
I want to delete the first note that has mNote of 60. No matter the mTime when deleting.
You'll need to capture the result of the [find_if], check if it is valid, and if so erase it as demonstrated in my answer, because you say:
The first element find will find for that, erase. [sic]
I would roll the code from my answer into a function because you say:
Ill recall find if I want another element, maybe with same value, to get deleted [sic]
Your final solution should be to write a function like this:
bool foo(const multiset<IMidiMsgExt, IMidiMsgExtCompByNoteNumber>& playingNotes, const int value) {
const auto it = find_if(cbegin(playingNotes), cend(playingNotes), [=](const auto& i){return i.mNote == value;});
const auto result = it != cend(playingNotes);
if(result) {
playingNotes.erase(it);
}
return result;
}
And you'd call it something like this: foo(playingNotes, 60) if you wish to know whether an element was removed you may test foo's return.

Optimization of a C++ code (that uses UnorderedMap and Vector)

I am trying to optimize some part of a C++ code that is taking a long time (the following part of the code takes about 19 seconds for X amount of data, and I am trying to finish the whole process in less than 5 seconds for the same amount of data - based on some benchmarks that I have). I have a function "add" that I have written and copied the code here. I will try to explain as much as possible that I think is needed to understand the code. Please let me know if I have missed something.
The following function add is called X times for X amount of data entries.
void HashTable::add(PointObject vector) // PointObject is a user-defined object
{
int combinedHash = hash(vector); // the function "hash" takes less than 1 second for X amount of data
// hashTableMap is an unordered_map<int, std::vector<PointObject>>
if (hashTableMap.count(combinedHash) == 0)
{
// if the hashmap does not contain the combinedHash key, then
// add the key and a new vector
std::vector<PointObject> pointVectorList;
pointVectorList.push_back(vector);
hashTableMap.insert(std::make_pair(combinedHash, pointVectorList));
}
else
{
// otherwise find the key and the corresponding vector of PointObjects and add the current PointObject to the existing vector
auto it = hashTableMap.find(combinedHash);
if (it != hashTableMap.end())
{
std::vector<PointObject> pointVectorList = it->second;
pointVectorList.push_back(vector);
it->second = pointVectorList;
}
}
}

You are doing a lot of useless operations... if I understand correctly, a simplified form could be simply:
void HashTable::add(const PointObject& vector) {
hashTableMap[hash(vector)].push_back(vector);
}
This works because
A map when accessed using operator[] will create a default-initialized value if it's not already present in the map
The value (an std::vector) is returned by reference so you can directly push_back the incoming point to it. This std::vector will be either a newly inserted one or a previously existing one if the key was already in the map.
Note also that, depending on the size of PointObject and other factors, it could be possibly more efficient to pass vector by value instead of by const PointObject&. This is the kind of micro optimization that however requires profiling to be performed sensibly.

Instead of calling hashTableMap.count(combinedHash) and hashTableMap.find(combinedHash), better just insert new element and check what insert() returned:
In versions (1) and (2), the function returns a pair object whose
first element is an iterator pointing either to the newly inserted
element in the container or to the element whose key is equivalent,
and a bool value indicating whether the element was successfully
inserted or not.
Moreover, do not pass objects by value, where you don't have to. Better pass it by pointer or by reference. This:
std::vector<PointObject> pointVectorList = it->second;
is inefficient since it will create an unnecessary copy of the vector.

This .count() is totally unecessary, you could simplify your function to:
void HashTable::add(PointObject vector)
{
int combinedHash = hash(vector);
auto it = hashTableMap.find(combinedHash);
if (it != hashTableMap.end())
{
std::vector<PointObject> pointVectorList = it->second;
pointVectorList.push_back(vector);
it->second = pointVectorList;
}
else
{
std::vector<PointObject> pointVectorList;
pointVectorList.push_back(vector);
hashTableMap.insert(std::make_pair(combinedHash, pointVectorList));
}
}
You are also performing copy operations everywhere. Copying an object is time consuming, avoid doing that. Also use references and pointers when possible:
void HashTable::add(PointObject& vector)
{
int combinedHash = hash(vector);
auto it = hashTableMap.find(combinedHash);
if (it != hashTableMap.end())
{
it->second.push_back(vector);
}
else
{
std::vector<PointObject> pointVectorList;
pointVectorList.push_back(vector);
hashTableMap.insert(std::make_pair(combinedHash, pointVectorList));
}
}
This code can probably be optimized further, but it would require knowing hash(), knowing the way hashTableMap works (by the way, why is it not a std::map?) and some experimentation.
If hashTableMap was a std::map<int, std::vector<pointVectorList>>, you could simplify your function to this:
void HashTable::add(PointObject& vector)
{
hashTableMap[hash(vector)].push_back(vector);
}
And if it was a std::map<int, std::vector<pointVectorList*>> (pointer) you can even avoid that last copy operation.

Without the if, try to insert an empty entry on the hash table:
auto ret = hashTableMap.insert(
std::make_pair(combinedHash, std::vector<PointObject>());
Either a new blank entry will be added, or the already present entry will be retrieved. In your case, you don't need to check which it the case, you just need to take the returned iterator and add the new element:
auto &pointVectorList = *ret.first;
pointVectorList.push_back(vector);

Assuming that PointObject is big and making copies of it is expensive, std::move is your friend here. You'll want to ensure that PointObject is move-aware (either don't define a destructor or copy operator, or provide a move-constructor and move-assignment operator yourself).
void HashTable::add(PointObject vector) // PointObject is a user-defined object
{
int combinedHash = hash(vector); // the function "hash" takes less than 1 second for X amount of data
// hashTableMap is an unordered_map<int, std::vector<PointObject>>
if (hashTableMap.count(combinedHash) == 0)
{
// if the hashmap does not contain the combinedHash key, then
// add the key and a new vector
std::vector<PointObject> pointVectorList;
pointVectorList.push_back(std::move(vector));
hashTableMap.insert(std::make_pair(combinedHash, std::move(pointVectorList)));
}
else
{
// otherwise find the key and the corresponding vector of PointObjects and add the current PointObject to the existing vector
auto it = hashTableMap.find(combinedHash);
if (it != hashTableMap.end())
{
std::vector<PointObject> pointVectorList = it->second;
pointVectorList.push_back(std::move(vector));
it->second = std::move(pointVectorList);
}
}
}

Using std::unordered_map doesn't seem appropriate here - you use the int from hash as the key (which presumably) is the hash of PointObject rather than PointObject itself. Essentially double hashing. And also if you need a PointObject in order to compute the map key then it's not really a key at all! Perhaps std::unordered_multiset would be a better choice?
First define the hash function form PointObject
namespace std
{
template<>
struct hash<PointObject> {
size_t operator()(const PointObject& p) const {
return ::hash(p);
}
};
}
Then something like
#include <unordered_set>
using HashTable = std::unordered_multiset<PointObject>;
int main()
{
HashTable table {};
PointObject a {};
table.insert(a);
table.emplace(/* whatever */);
return 0;
}

Your biggest problem is that you're copying the entire vector (and every element in that vector) twice in the else part:
std::vector<PointObject> pointVectorList = it->second; // first copy
pointVectorList.push_back(vector);
it->second = pointVectorList; // second copy
This means that every time you're adding an element to an existing vector you're copying that entire vector.
If you used a reference to that vector you'd do a lot better:
std::vector<PointObject> &pointVectorList = it->second;
pointVectorList.push_back(vector);
//it->second = pointVectorList; // don't need this anymore.
On a side note, in your unordered_map you're hashing your value to be your key.
You could use an unordered_set with your hash function instead.

C++ find and erase a multimap element

I need to add, store and delete some pairs of objects, e.g. Person-Hobby. Any person can have several hobbies and several persons can have the same hobby. So, multimap is a good container, right?
Before adding a pair I need to know, if it's not added yet. As I can see here there is no standard class-method to know, if the concrete pair e.g. Peter-Football exists in the MM. Thus, I've written a method which returns a positive integer (equal to the distance between mm.begin() and pair iterator) if the pair exists and -1 otherwise.
Then I need to delete some pair. I call my find method, which returns, some positive integer. I call myMultiMap.erase(pairIndex); but the pair is not being deleted for some reason. That is my problem. Obviously the erase method needs an iterator, not the int. The question is: how do I convert an integer to an iterator?
Thanks!
UPDATE:
I've tried this c.begin() + int_value but got an error error: no match for ‘operator+’ on this line....

Not that I favour your approach, but if the int is distance between begin() and the iterator in question, you can just use
c.begin() + int_value
or
std::advance(c.begin(), int_value)
to get the iterator. The second version is needed for iterators which are not random-access-iterators.
In the interest of your personal sanity (and the program's speed), I'd suggest you return the iterator directly in some form.
There are many possible interfaces that solve this one way or the other. What I would call the "old C way" would be returning by an out parameter:
bool find_stuff(stuff, container::iterator* out_iter) {
...
if(found && out_iter)
*out_iter = found_iter;
return found;
}
use it:
container::iterator the_iter;
if(find_stuff(the_stuff, &the_iter)) ...
or
if(find_stuff(the_stuff, 0)) // if you don't need the iterator
This is not idiomatic C++, but Linus would be pleased with it.
The second possible and theoretically sound version is using something like boost::optional to return the value. This way, you return either some value or none.
boost::optional<container::iterator> find_stuff(stuff) {
...
if(found && out_iter)
return found_iter;
return boost::none;
}
Use:
boost::optional<container::iterator> found = find_stuff(the_stuff);
if(found) {
do something with *found, which is the iterator.
}
or
if(find_stuff(the_stuff)) ...
Third possible solution would be going the std::set::insert way, ie. returning a pair consisting of a flag and a value:
std::pair<bool, container::iterator> find_stuff(stuff) {
...
return std::make_pair(found, found_iter);
}
Use:
std::pair<bool, container::iterator> found = find_stuff(the_stuff);
if(found.first) ...

Consider to change your mulitmap<Person,Hoobby> to set<pair<Person,Hobby> > - then you will not have problems you have now. Or consider to change to map<Person, set<Hobby> >. Both options will not allow to insert duplicate pairs.

Use 2 sets(not multi sets) one for hobbies and one for persons, these two acts as filters so you don't add the same person twice(or hobbie). the insert opertations on these sets gives the iterator for the element that is inserted(or the "right" iterator for element if it allready was inserted). The two iterators you get from inserting into hobbies_set and person_set are now used as key and value in a multimap
Using a third set(not multi_set) for the relation instead of a multi_map, may give the advantage of not needing to check before inserting a relation if it is allready there it will not be added again, and if it's not there it will be added. In both ways it will return an iterator and bool(tells if it was allready there or if it was added)
datastructures:
typedef std::set<Hobbie> Hobbies;
typedef std::set<Person> Persons;
typedef std::pair<Hobbies::iterator,bool> HobbiesInsertRes;
typedef std::pair<Persons::iterator,bool> PersonsInsertRes;
struct Relation {
Hobbies::iterator hobbieIter;
Persons::iterator personIter;
// needed operator<(left for the as an exercies for the reader);
};
typedef set<Relation> Relations;
Hobbies hobbies;
Persons persons;
Relations relations;
insert:
HobbiesInsertRes hres = hobbies.insert(Hobbie("foo"));
PersonsInsertRes pres = persons.insert(Person("bar"));
relations.insert(Relation(hres.first, pres.first));
// adds the relation if does not exists, if it allready did exist, well you only paid the same amount of time that you would have if you would to do a check first.
lookup:
// for a concrete Person-Hobbie lookup use
relations.find(Relation(Hobbie("foo"),Person("Bar")));
// to find all Hobbies of Person X you will need to do some work.
// the easy way, iterate all elements of relations
std::vector<Hobbie> hobbiesOfX;
Persons::iterator personX = persons.find(Person("bar"));
std::for_each(relations.begin(), relations.end(), [&hobbiesOfBar, personX](Relation r){
if(r.personIter = personX)
hobbiesOfX.push_back(r.hobbieIter);
});
// other way to lookup all hobbies of person X
Persons::iterator personX = persons.find(Person("bar"));
relations.lower_bound(Relation(personX,Hobbies.begin());
relations.upper_bound(Relation(personX,Hobbies.end());
// this needs operator< on Relation to be implemented in a way that does ordering on Person first, Hobbie second.

How should std::map be used with a value that does not have a default constructor?

I've got a value type that I want put into a map.
It has a nice default copy constructor, but does not have a default constructor.
I believe that so long as I stay away from using operator[] that everything will be OK.
However I end up with pretty ugly constructs like this to actually insert an object.
(I think insert just fails if there is already a value for that key).
// equivalent to m[5]=x but without default construction
std::map<int,X>::iterator it = m.find(5);
if( it != m.end() )
{
m->second = x;
}
else
{
m->insert( std::make_pair(5,x) );
}
Which I believe will scan the map twice, and also looks pretty ugly.
Is there a neater / more efficient way to do this?

You can simply "insert-or-overwrite" with the standard insert function:
auto p = mymap.insert(std::make_pair(key, new_value));
if (!p.second) p.first->second = new_value; // overwrite value if key already exists
If you want to pass the elements by rerference, make the pair explicit:
insert(std::pair<K&, V&>(key, value));
If you have a typedef for the map like map_t, you can say std::pair<map_t::key_type &, map_t::mapped_type &>, or any suitable variation on this theme.
Maybe this is best wrapped up into a helper:
template <typename Map>
void insert_forcefully(Map & m,
typename Map::key_type const & key,
typename Map::mapped_type const & value)
{
std::pair<typename Map::iterator, bool> p = m.insert(std::pair<typename Map::key_type const &, typename Map::mapped_type const &>(key, value));
if (!p.second) { p.first->second = value; }
}

You could first get the position to insert the pair with lower_bound, then check if it's already there, and if not, insert it, providing the iterator where to insert. Something along those lines.

There are two things you missed in the interface of map (and the like):
insert(value_type) returns a std::pair<iterator, bool>, the .first member points to the element with the key you tried to insert and the .second member indicates whether it is actually the element you tried to insert or another that previously was in the container.
insert(iterator, value_type) allows you to give a hint as to where insert
The latter is not necessarily useful in your situation though.
typedef std::map<int,X> Map;
// insert and check
std::pair<Map::iterator, bool> const result =
map.insert(std::make_pair(5, x)); // O(log N)
if (not result.second)
{
result->first.second = x; // O(1)
// OR
using std::swap;
swap(result->first.second, x);
}
If you type does not support assignment and there is no swap, however, you need to bite the bullet:
// locate and insert
Map::iterator position = map.lower_bound(5); // O(log N)
if (position != map.end() and position->first == 5)
{
position = map.erase(position); // O(1)
}
map.insert(position, std::make_pair(5, x)); // O(log N) if rebalancing
In C++11, the insert methods are doubled:
insert(value_type const&) // insertion by copy
insert(P&&) // insertion by move
and with perfect forwarding we get the new emplace method. Similar to insert, but which construct the element in place by forwarding the arguments to its constructor. How it differentiate arguments for the key and value is a mystery to me though.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js

c++ map find() to possibly insert(): how to optimize operations? - c++

Related

Efficient substitute for std::map::insert_or_assign with hint

Comparator can be used to set a new key, isn't?

Optimization of a C++ code (that uses UnorderedMap and Vector)

C++ find and erase a multimap element

How should std::map be used with a value that does not have a default constructor?

Categories

Resources