Efficiency of map::count followed by map::operator[] - c++

There is a well-known gap in std::map: you often need to access the element mapped to a key and handle the "element not found" case with a plain if-check, yet there is no standard method that says "give me access to the element mapped at this key, or tell me through some meaningful value that it doesn't exist".
My goal is a clean, easy method that reads well in the code and is, of course, efficient. This is the "official" way to do it:
map<int, vector<Command> > m;
map<int, vector<Command> >::iterator i = m.find(required_key);
if ( i == m.end() )
error_not_found();
i->second.SetCode(x);
This is what I use, and prefer for its readability. My question is: how much less efficient is this method than the official one above?
map<int, vector<Command> > m;
if ( !m.count(required_key) )
error_not_found();
m[required_key].SetCode(x);

The obvious efficiency hit is that the lookup is done twice, while the "official" code does it once. For many accesses on a large map, this will be noticeable. Note too that even though the "official" code might be less readable, it is well known and it also works on read-only maps. C++11's auto might help with readability.
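To illustrate the point about read-only maps: operator[] is a non-const member, so it cannot be called through a const reference, while find can. A minimal sketch, assuming a hypothetical lookup helper and an int-to-string map that are not part of the question:

```cpp
#include <map>
#include <string>

// Hypothetical helper: a single lookup that also works on a const map.
// Returns nullptr when the key is absent and never inserts anything.
const std::string* lookup(const std::map<int, std::string>& m, int key) {
    auto it = m.find(key);  // C++11 auto hides the long iterator type
    return it == m.end() ? nullptr : &it->second;
}
```

The caller gets the plain if-check the question asks for: test the returned pointer against nullptr, then dereference it.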

Your method does the lookup twice, whereas the official one does it only once.
If error_not_found(); throws, you may instead use std::map::at which throws when the key is not found and just have:
m.at(required_key).SetCode(x);
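A small sketch of how at behaves when the key is missing, assuming a hypothetical assign_existing helper and a plain int-to-int map in place of the question's types:

```cpp
#include <map>
#include <stdexcept>

// Hypothetical helper: overwrite an existing value, report a missing key.
bool assign_existing(std::map<int, int>& m, int key, int value) {
    try {
        m.at(key) = value;  // throws std::out_of_range if key is absent
        return true;
    } catch (const std::out_of_range&) {
        return false;       // the map was not modified
    }
}
```

Unlike operator[], a failed at leaves the map untouched, so its size stays the same.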
You can still write a helper function to improve readability while keeping the performance:
template<typename Map, typename Key>
auto value(Map& m, const Key& key)
{
auto it = m.find(key);
return it == m.end() ? nullptr : std::addressof(it->second);
}
and then
std::map<int, Command> m;
auto* command = value(m, required_key);
if (command == nullptr) {
return error_not_found();
}
command->SetCode(x);

Related

Why does std::set not have a "contains" member function?

I'm heavily using std::set<int> and often I simply need to check if such a set contains a number or not.
I'd find it natural to write:
if (myset.contains(number))
...
But because of the lack of a contains member, I need to write the cumbersome:
if (myset.find(number) != myset.end())
..
or the not as obvious:
if (myset.count(element) > 0)
..
Is there a reason for this design decision ?
I think it was probably because they were trying to make std::set and std::multiset as similar as possible. (And obviously count has a perfectly sensible meaning for std::multiset.)
Personally I think this was a mistake.
It doesn't look quite so bad if you pretend that count is just a misspelling of contains and write the test as:
if (myset.count(element))
...
It's still a shame though.
To be able to write if (s.contains()), contains() has to return a bool (or a type convertible to bool, which is another story), like binary_search does.
The fundamental reason behind the design decision not to do it this way is that contains() which returns a bool would lose valuable information about where the element is in the collection. find() preserves and returns that information in the form of an iterator, therefore is a better choice for a generic library like STL. This has always been the guiding principle for Alex Stepanov, as he has often explained (for example, here).
As to the count() approach in general, although it's often an okay workaround, the problem with it is that it does more work than a contains() would have to do.
That is not to say that a bool contains() isn't a very nice-to-have or even necessary. A while ago we had a long discussion about this very same issue in the ISO C++ Standard - Future Proposals group.
It lacks it because nobody added it. Nobody added it because the containers from the STL that the std library incorporated were designed to be minimal in interface. (Note that std::string did not come from the STL in the same way).
If you don't mind some strange syntax, you can fake it:
template<class K>
struct contains_t {
K&& k;
template<class C>
friend bool operator->*( C&& c, contains_t&& self ) {
// a friend function has no implicit object, so k is reached through self
auto range = std::forward<C>(c).equal_range(std::forward<K>(self.k));
return range.first != range.second;
// faster than:
// return std::forward<C>(c).count( std::forward<K>(self.k) ) != 0;
// for multi-meows with lots of duplicates
}
};
template<class K>
contains_t<K> contains( K&& k ) {
return {std::forward<K>(k)};
}
use:
if (some_set->*contains(some_element)) {
}
Basically, you can write extension methods for most C++ std types using this technique.
It makes a lot more sense to just do this:
if (some_set.count(some_element)) {
}
but I am amused by the extension method method.
The really sad thing is that writing an efficient contains could be faster on a multimap or multiset, as they just have to find one element, while count has to find each of them and count them.
A multiset containing 1 billion copies of 7 (you know, in case you run out) can have a really slow .count(7), but could have a very fast contains(7).
With the above extension method, we could make it faster for this case by using lower_bound, comparing to end, and then comparing to the element. Doing that for an unordered meow as well as an ordered meow would require fancy SFINAE or container-specific overloads however.
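The lower_bound idea can be sketched as a free function. This is only an illustration for ordered set-like containers (std::set, std::multiset); the name ordered_contains is made up here, and maps or unordered containers would need their own overloads, as noted above:

```cpp
#include <set>

// Sketch for ordered set-like containers only (std::set, std::multiset).
template <class OrderedSet, class Key>
bool ordered_contains(const OrderedSet& s, const Key& key) {
    // First element that is not less than key.
    auto it = s.lower_bound(key);
    // Present if such an element exists and key is not less than it either.
    return it != s.end() && !s.key_comp()(key, *it);
}
```

On a multiset with many duplicates this stops at the first match instead of counting all of them.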
You are looking at a particular case and not seeing the bigger picture. As stated in the documentation, std::set meets the requirements of the AssociativeContainer concept. For that concept a contains method makes little sense, as it is pretty much useless for std::multiset and std::multimap, whereas count works fine for all of them. A contains method could have been added as an alias for count for std::set, std::map and their hashed versions (like length for size() in std::string), but it looks like the library creators did not see a real need for it.
Although I don't know why std::set has no contains but count which only ever returns 0 or 1,
you can write a templated contains helper function like this:
template<class Container, class T>
auto contains(const Container& v, const T& x)
-> decltype(v.find(x) != v.end())
{
return v.find(x) != v.end();
}
And use it like this:
if (contains(myset, element)) ...
The true reason for set is a mystery for me, but one possible explanation for this same design in map could be to prevent people from writing inefficient code by accident:
if (myMap.contains("Meaning of universe"))
{
myMap["Meaning of universe"] = 42;
}
Which would result in two map lookups.
Instead, you are forced to get an iterator. This gives you a mental hint that you should reuse the iterator:
auto position = myMap.find("Meaning of universe");
if (position != myMap.cend())
{
position->second = 42;
}
which consumes only one map lookup.
When we realize that set and map are made from the same flesh, we can apply this principle also to set. That is, if we want to act on an item in the set only if it is present in the set, this design can prevent us from writing code as this:
struct Dog
{
std::string name;
void bark() const;
};
bool operator <(const Dog& left, const Dog& right)
{
return left.name < right.name;
}
std::set<Dog> dogs;
...
if (dogs.contains(Dog{"Husky"}))
{
dogs.find(Dog{"Husky"})->bark();
}
Of course all this is a mere speculation.
Since C++20,
bool contains( const Key& key ) const
is available.
I'd like to point out, as mentioned by Andy, that C++20 added the contains member function to maps and sets:
bool contains( const Key& key ) const; (since C++20)
Now I'd like to focus my answer on performance versus readability.
In terms of performance, compare the two versions:
#include <unordered_map>
#include <string>
using hash_map = std::unordered_map<std::string,std::string>;
hash_map a;
std::string get_cpp20(hash_map& x,std::string str)
{
if(x.contains(str))
return x.at(str);
else
return "";
};
std::string get_cpp17(hash_map& x,std::string str)
{
if(const auto it = x.find(str); it !=x.end())
return it->second;
else
return "";
};
You will find that the cpp20 version makes two calls to std::_Hash_find_last_result while the cpp17 version makes only one.
Now, I find myself with many data structures built from nested unordered_maps.
So you end up with something like this:
using my_nested_map = std::unordered_map<std::string,std::unordered_map<std::string,std::unordered_map<int,std::string>>>;
std::string get_cpp20_nested(my_nested_map& x,std::string level1,std::string level2,int level3)
{
if(x.contains(level1) &&
x.at(level1).contains(level2) &&
x.at(level1).at(level2).contains(level3))
return x.at(level1).at(level2).at(level3);
else
return "";
};
std::string get_cpp17_nested(my_nested_map& x,std::string level1,std::string level2,int level3)
{
if(const auto it_level1=x.find(level1); it_level1!=x.end())
if(const auto it_level2=it_level1->second.find(level2);it_level2!=it_level1->second.end())
if(const auto it_level3=it_level2->second.find(level3);it_level3!=it_level2->second.end())
return it_level3->second;
return "";
};
Now if you have plenty of conditions in between these ifs, using the iterators is really painful, very error prone and unclear. I often find myself looking back at the definition of the map to understand what kind of object is at level 1 or level 2, while with the cpp20 version you see at(level1).at(level2)... and understand immediately what you are dealing with.
So in terms of code maintenance and review, contains is a very nice addition.
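One possible middle ground, sketched here under the assumption of a hypothetical find_ptr helper (not a standard function): each level costs a single lookup, and each variable name says what it points at.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical helper: one lookup, yielding a pointer to the mapped value
// or nullptr when the key is missing.
template <class Map, class Key>
auto find_ptr(Map& m, const Key& key) -> decltype(&m.find(key)->second) {
    auto it = m.find(key);
    return it == m.end() ? nullptr : &it->second;
}

using nested_map = std::unordered_map<
    std::string,
    std::unordered_map<std::string, std::unordered_map<int, std::string>>>;

// Returns the value at x[level1][level2][level3], or "" if any level is missing.
std::string get_nested(const nested_map& x, const std::string& level1,
                       const std::string& level2, int level3) {
    if (auto* by_level2 = find_ptr(x, level1))
        if (auto* by_level3 = find_ptr(*by_level2, level2))
            if (auto* value = find_ptr(*by_level3, level3))
                return *value;
    return "";
}
```

This is a readability trade-off rather than a recommendation; whether the pointer names are clearer than chained at calls is a matter of taste.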
What about binary_search ?
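std::set<int> set1;
set1.insert(10);
set1.insert(40);
set1.insert(30);
// works because a set iterates in sorted order, but set iterators are not
// random access, so the traversal is linear; set1.find(30) stays logarithmic
bool found = std::binary_search(set1.begin(), set1.end(), 30);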
contains() has to return a bool. Using C++ 20 compiler I get the following output for the code:
#include<iostream>
#include<map>
using namespace std;
int main()
{
multimap<char,int>mulmap;
mulmap.insert(make_pair('a', 1)); //multiple similar key
mulmap.insert(make_pair('a', 2)); //multiple similar key
mulmap.insert(make_pair('a', 3)); //multiple similar key
mulmap.insert(make_pair('b', 3));
mulmap.insert({'a',4});
mulmap.insert(pair<char,int>('a', 4));
cout<<mulmap.contains('c')<<endl; //Output:0 as it doesn't exist
cout<<mulmap.contains('b')<<endl; //Output:1 as it exist
}
Another reason is that it would give a programmer the false impression that std::set is a set in the mathematical, set-theory sense. If they implement that, then many other questions would follow: if a std::set has contains() for a value, why doesn't it have it for another set? Where are union(), intersection() and other set operations and predicates?
The answer is, of course, that some of the set operations are already implemented as functions in <algorithm> (std::set_union() etc.) and others are as trivially implemented as contains(). Functions and function objects work better with math abstractions than object members, and they are not limited to a particular container type.
If one needs to implement full math-set functionality, one has a choice not only of the underlying container but also of implementation details: would a theory_union() function work with immutable objects, better suited for functional programming, or would it modify its operands and save memory? Would it be implemented as a function object from the start, or would it be better to implement it as a C function and use std::function<> if needed?
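As a rough illustration of the free-function style, here is std::set_union applied to two std::set objects. The wrapper name set_union_of is hypothetical, and returning a new set is just one of the design choices discussed in this answer:

```cpp
#include <algorithm>
#include <iterator>
#include <set>

// Hypothetical wrapper: builds a new set and leaves both operands untouched.
std::set<int> set_union_of(const std::set<int>& a, const std::set<int>& b) {
    std::set<int> result;
    // std::set_union needs sorted input ranges; std::set iterates in order.
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::inserter(result, result.end()));
    return result;
}
```

The same call works on any pair of sorted ranges, for example two sorted vectors, which is the point about not being tied to one container type.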
As it is now, std::set is just a container, well-suited for the implementation of set in math sense, but it is nearly as far from being a theoretical set as std::vector from being a theoretical vector.

Get all the keys which matches a query in a map

Say I have more than one key with the same value in a map. In that case, how do I retrieve all the keys that match a query?
Or is there any way to tell the find operation to search for a specific value?
I am using an std::map, C++.
Would something like this work for you:
void FindKeysWithValue(Value aValue, list<Key>& aList)
{
aList.clear();
for_each(iMap.begin(), iMap.end(), [&] (const pair<Key, Value>& aPair)
{
if (aPair.second == aValue)
{
aList.push_back(aPair.first);
}
});
}
The associative containers probably won't help you too much because for std::map<K, V> the key happens to be unique and chances that your chosen query matches the ordering relation you used may not be too high. If the order matches, you can use the std::map<K, V> members lower_bound() and upper_bound(). For std::multimap<K, V> you can also use equal_range().
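For the case where the query does match the key ordering, a minimal sketch with lower_bound() and upper_bound(). The function name and the int-to-string map are illustrative assumptions:

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical query: values whose keys fall in the closed range [lo, hi].
// Assumes lo <= hi.
std::vector<std::string> values_in_key_range(const std::map<int, std::string>& m,
                                             int lo, int hi) {
    std::vector<std::string> out;
    auto first = m.lower_bound(lo);  // first key >= lo
    auto last = m.upper_bound(hi);   // first key > hi
    for (auto it = first; it != last; ++it)
        out.push_back(it->second);
    return out;
}
```

Both bounds are found in logarithmic time, so only the matching elements are visited.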
In general, i.e., if your query isn't really related to the order, you can use std::copy_if() to get a sequence of objects matching a predicate:
Other other;
// ...
std::vector<Other::value_type> matches;
std::copy_if(other.begin(), other.end(),
std::back_inserter(matches), predicate);
When copying the elements is too expensive, you should probably consider using std::find_if() instead:
for (auto it(other.begin());
other.end() != (it = std::find_if(it, other.end(), predicate));
++it) {
// do something with it
}
The only way is to iterate over map.
this link may be useful: Reverse map lookup
Provided you want quick access and you don't mind using some more space, then you maintain another map that gets stored as value, key. In your case, you would need to handle the duplicate values (that you will be storing as keys).
Not a great idea but definitely an option.
A map is meant for efficient lookup of keys. Lookup based on values is not efficient, and you basically have to iterate through the map, extracting matches yourself:
for(map<A,B>::iterator i = m.begin(); i != m.end(); i++)
if(i->second == foo)
you_found_a_match();
If you intend to do this often, you can build up a multimap mapping the other way, so you can efficiently perform a value-based lookup:
multimap<B,A> reverse;
for(map<A,B>::iterator i = m.begin(); i != m.end(); i++)
reverse.insert(pair<B,A>(i->second,i->first));
You can now easily find the keys with a given value:
auto matches = reverse.equal_range(value);
for(multimap<B,A>::iterator i = matches.first; i != matches.second; i++)
A & key = i->second;
If these maps aren't going to grow continuously, it may be more efficient to simply maintain a vector<pair<A,B> > instead, define a comparator for it based on the value, and use equal_range on that instead.
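A sketch of the sorted-vector idea, with int keys and string values as illustrative stand-ins for A and B. It uses a lower_bound/upper_bound pair, which together delimit the same range equal_range would return, so each lambda only has to compare an entry with a value in one direction:

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<int, std::string>;  // (key, value); types are illustrative

// Sort once by value so that later queries can use binary search.
void sort_by_value(std::vector<Entry>& entries) {
    std::sort(entries.begin(), entries.end(),
              [](const Entry& a, const Entry& b) { return a.second < b.second; });
}

// Precondition: entries is sorted by value (see sort_by_value).
std::vector<int> keys_with_value(const std::vector<Entry>& entries,
                                 const std::string& value) {
    auto first = std::lower_bound(
        entries.begin(), entries.end(), value,
        [](const Entry& e, const std::string& v) { return e.second < v; });
    auto last = std::upper_bound(
        first, entries.end(), value,
        [](const std::string& v, const Entry& e) { return v < e.second; });
    std::vector<int> keys;
    for (auto it = first; it != last; ++it)
        keys.push_back(it->first);
    return keys;
}
```

The vector has to be re-sorted (or updated with a sorted insert) whenever entries change, which is why this suits data that rarely grows.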

stl map operator[] bad?

My code reviewers have pointed out that using operator[] of the map is very bad and leads to errors:
map[i] = new someClass; // potential dangling pointer when executed twice
Or
if (map[i]==NULL) ... // implicitly create the entry i in the map
Although I understand the risk after reading the API, and that insert() is better since it checks for duplicates and thus avoids the dangling pointer, I don't understand why [] cannot be used at all if it is handled properly.
I pick map as my internal container exactly because I want to use its quick and self-explaining indexing capability.
I hope someone can either argue more with me or stand on my side:)
The only time (that I can think of) where operator[] can be useful is when you want to set the value of a key (overwrite it if it already has a value), and you know that it is safe to overwrite (which it should be since you should be using smart pointers, not raw pointers) and is cheap to default construct, and in some contexts the value should have no-throw construction and assignment.
e.g. (similar to your first example)
std::map<int, std::unique_ptr<int>> m;
m[3] = std::unique_ptr<int>(new int(5));
m[3] = std::unique_ptr<int>(new int(3)); // No, it should be 3.
Otherwise there are a few ways to do it depending on context, however I would recommend to always use the general solution (that way you can't get it wrong).
Find a value and create it if it doesn't exist:
1. General Solution (recommended as it always works)
std::map<int, std::unique_ptr<int>> m;
auto it = m.lower_bound(3);
if(it == std::end(m) || m.key_comp()(3, it->first))
it = m.insert(it, std::make_pair(3, std::unique_ptr<int>(new int(3))));
2. With cheap default construction of value
std::map<int, std::unique_ptr<int>> m;
auto& obj = m[3]; // value is default constructed if it doesn't exists.
if(!obj)
{
try
{
obj = std::unique_ptr<int>(new int(3)); // default constructed value is overwritten.
}
catch(...)
{
m.erase(3);
throw;
}
}
3. With cheap default construction and no-throw insertion of value
std::map<int, my_object> m;
auto& obj = m[3]; // value is default constructed if it doesn't exist.
if(!obj)
obj = my_object(3);
Note: You could easily wrap the general solution into a helper method:
template<typename T, typename F>
typename T::iterator find_or_create(T& m, const typename T::key_type& key, const F& factory)
{
auto it = m.lower_bound(key);
if(it == std::end(m) || m.key_comp()(key, it->first))
it = m.insert(it, std::make_pair(key, factory()));
return it;
}
int main()
{
std::map<int, std::unique_ptr<int>> m;
auto it = find_or_create(m, 3, []
{
return std::unique_ptr<int>(new int(3));
});
return 0;
}
Note that I pass a templated factory method instead of a value for the create case, this way there is no overhead when the value was found and does not need to be created. Since the lambda is passed as a template argument the compiler can choose to inline it.
You are right that map::operator[] has to be used with care, but it can be quite useful: if you want to find an element in the map, and if not there create it:
someClass *&obj = map[x];
if (!obj)
obj = new someClass;
obj->doThings();
And there is just one lookup in the map.
If the new fails, you may want to remove the NULL pointer from the map, of course:
someClass *&obj = map[x];
if (!obj)
try
{
obj = new someClass;
}
catch (...)
{
map.erase(x);
throw;
}
obj->doThings();
Naturally, if you want to find something, but not to insert it:
std::map<int, someClass*>::iterator it = map.find(x); //or ::const_iterator
if (it != map.end())
{
someClass *obj = it->second;
obj->doThings();
}
Claims like "use of operator[] of the map is very bad" should always be a warning sign of almost religious belief. But as with most such claims, there is a bit of truth lurking somewhere. The truth here however is as with almost any other construct in the C++ standard library: be careful and know what you are doing. You can (accidentally) misuse almost everything.
One common problem is potential memory leaks (assuming your map owns the objects):
std::map<int,T*> m;
m[3] = new T;
...
m[3] = new T;
This will obviously leak memory, as it overwrites the pointer. Using insert here correctly isn't easy either, and many people make a mistake that will leak anyways, like:
std::map<int,T*> m;
m.insert(std::make_pair(3,new T));
...
m.insert(std::make_pair(3,new T));
While this will not overwrite the old pointer, it will not insert the new one either, so the new object leaks. The correct way with insert would be (possibly better enhanced with smart pointers):
std::map<int,T*> m;
m.insert(std::make_pair(3,new T));
....
T* tmp = new T;
if( !m.insert(std::make_pair(3,tmp)).second ) // .second is false when the key already exists
{
delete tmp;
}
But this is somewhat ugly too. I personally prefer for such simple cases:
std::map<int,T*> m;
T*& tp = m[3];
if( !tp )
{
tp = new T;
}
But this is maybe the same amount of personal preference as your code reviewers have for not allowing op[] usage...
operator [] is avoided for insertion for the same reason
you mentioned in your question: it doesn't check for a duplicate key
and overwrites the existing value.
operator [] is mostly avoided for searching in a std::map
because, if a key doesn't exist in the map, operator []
silently creates the key and value-initializes its value (typically to
0), which may not be desirable in all cases. One should use
[] only if the key needs to be created when it doesn't exist.
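The silent insertion is easy to observe through the map's size. A small sketch with made-up helper names:

```cpp
#include <cstddef>
#include <map>

// Reads a key through operator[] and reports the map size afterwards.
std::size_t size_after_bracket_read(std::map<int, int>& m, int key) {
    int value = m[key];  // inserts {key, 0} if key was absent
    (void)value;
    return m.size();
}

// Looks a key up through find, which never modifies the map.
std::size_t size_after_find(const std::map<int, int>& m, int key) {
    (void)m.find(key);
    return m.size();
}
```

On an empty map, the find version leaves the size at 0, while the bracket version grows it to 1.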
This is not a problem with [] at all. It's a problem with storing raw pointers in containers.
If your map is like for example this :
std::map< int, int* >
then you lose, because next code snippet would leak memory :
std::map< int, int* > m;
m[3] = new int( 5 );
m[3] = new int( 2 );
if handled properly, why [] can not be used at all?
If you properly tested your code, then your code should still fail the code review, because you used raw pointers.
Other than that, if used properly, there is nothing wrong with using map::operator[]. However, you would probably be better off using the insert/find methods, because of the possible silent map modification.
map[i] = new someClass; // potential dangling pointer when executed twice
here, the problem isn't map's operator[], but the lack of smart pointers.
Your pointer should be stored in some RAII object (such as a smart pointer), which immediately takes ownership of the allocated object and ensures it will be freed.
If your code reviewers ignore this, and instead say that you should avoid operator[], buy them a good C++ textbook.
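A sketch of why the smart pointer removes the problem: assigning to the same key twice destroys the first object instead of leaking it. Tracked is a made-up type that counts live instances, purely for illustration:

```cpp
#include <map>
#include <memory>

// Illustrative type that counts how many instances are alive.
struct Tracked {
    static int live;
    Tracked() { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

// Assigns to the same key twice; returns the live count while the map exists.
int live_after_overwrite() {
    std::map<int, std::unique_ptr<Tracked>> m;
    m[3] = std::unique_ptr<Tracked>(new Tracked);
    m[3] = std::unique_ptr<Tracked>(new Tracked);  // first object is destroyed here
    return Tracked::live;
}
```

With raw pointers the same two assignments would leave two live objects and only one reachable pointer.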
if (map[i]==NULL) ... // implicitly create the entry i in the map
That's true. But that's because operator[] is designed to behave differently. Obviously, you shouldn't use it in situations where it does the wrong thing.
Generally the problem is that operator[] implicitly creates a value associated with the passed-in key and inserts a new pair in the map if the key is not already present. This can break your logic from then on, e.g. when you check whether a certain key exists.
map<int, int> m;
if (m[4] != 0) {
cout << "The object exists" << endl; //furthermore this is not even correct 0 is totally valid value
} else {
cout << "The object does not exist" << endl;
}
if (m.find(4) != m.end()) {
cout << "The object exists" << endl; // We always happen to be in this case because m[4] creates the element
}
I recommend using the operator[] only when you know you will be referencing a key already existing in the map(this by the way proves to be not so infrequent case).
There's nothing wrong with operator[] of map, per se, as long as its
semantics correspond to what you want. The problem is defining what you
want (and knowing the exact semantics of operator[]). There are times
when implicitly creating a new entry with a default value when the entry
isn't present is exactly what you want (e.g. counting words in a text
document, where ++ countMap[word] is all that you need); there are
many other times that it's not.
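The word-counting case mentioned above, as a short sketch. The function name is an assumption, and words are simply split on whitespace:

```cpp
#include <map>
#include <sstream>
#include <string>

// Counts whitespace-separated words in text.
std::map<std::string, int> count_words(const std::string& text) {
    std::map<std::string, int> counts;
    std::istringstream in(text);
    std::string word;
    while (in >> word)
        ++counts[word];  // a new word is created with count 0, then incremented
    return counts;
}
```

Here the implicit insertion is exactly the wanted behavior, so no find or insert call is needed.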
A more serious problem in your code may be that you are storing pointers
in the map. A more natural solution might be to use a map <keyType,
someClass>, rather than a map <keyType, SomeClass*>. But again, this
depends on the desired semantics; for example, I use a lot of maps
which are initialized once, at program start up, with pointers to static
instances. If your map[i] = ... is in an initialization loop,
executed once at start up, there's probably no issue. If it's something
executed in many different places in the code, there probably is an
issue.
The solution to the problem isn't to ban operator[] (or maps to
pointers). The solution is to start by specifying the exact semantics
you need. And if std::map doesn't provide them directly (it rarely
does), write a small wrapper class which defines the exact semantics you
want, using std::map to implement them. Thus, your wrapper for
operator[] might be:
MappedType MyMap::operator[]( KeyType const& key ) const
{
MyMap::Impl::const_iterator elem = myImpl.find( key );
if ( elem == myImpl.end() )
throw EntryNotFoundError();
return elem->second;
}
or:
MappedType const* MyMap::operator[]( KeyType const& key ) const
{
MyMap::Impl::const_iterator elem = myImpl.find( key );
return elem == myImpl.end()
? NULL // or the address of some default value
: &elem->second;
}
Similarly, you might want to use insert rather than operator[] if
you really want to insert a value that isn't already present.
And I've almost never seen a case where you'd insert an immediately
newed object into a map. The usual reason for using new and
delete is that the objects in question have some specific lifetime of
their own (and are not copiable—although not an absolute rule, if
you're newing an object which supports copy and assignment, you're
probably doing something wrong). When the mapped type is a pointer,
then either the pointed to objects are static (and the map is more or
less constant after initialization), or the insertion and removal is
done in the constructor and destructor of the class. (But this is just
a general rule; there are certainly exceptions.)

Advantages of std::for_each over for loop

Are there any advantages of std::for_each over for loop? To me, std::for_each only seems to hinder the readability of code. Why do then some coding standards recommend its use?
The nice thing with C++11 (previously called C++0x), is that this tiresome debate will be settled.
I mean, no one in their right mind, who wants to iterate over a whole collection, will still use this
for(auto it = collection.begin(); it != collection.end() ; ++it)
{
foo(*it);
}
Or this
for_each(collection.begin(), collection.end(), [](Element& e)
{
foo(e);
});
when the range-based for loop syntax is available:
for(Element& e : collection)
{
foo(e);
}
This kind of syntax has been available in Java and C# for some time now, and actually there are way more foreach loops than classical for loops in every recent Java or C# code I saw.
Here are some reasons:
It seems to hinder readability just because you're not used to it and/or not using the right tools around it to make it really easy. (see boost::range and boost::bind/boost::lambda for helpers. Many of these will go into C++0x and make for_each and related functions more useful.)
It allows you to write an algorithm on top of for_each that works with any iterator.
It reduces the chance of stupid typing bugs.
It also opens your mind to the rest of the STL-algorithms, like find_if, sort, replace, etc and these won't look so strange anymore. This can be a huge win.
Update 1:
Most importantly, it helps you move beyond the idea that for_each versus for-loops is all there is, and look at the other STL algorithms, like find / sort / partition / replace_copy_if, parallel execution ... or whatever.
A lot of processing can be written very concisely using "the rest" of for_each's siblings, but if all you do is to write a for-loop with various internal logic, then you'll never learn how to use those, and you'll end up inventing the wheel over and over.
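As a small illustration of those siblings, two queries that would otherwise be hand-written loops with a flag or a counter. The function names are made up for this sketch:

```cpp
#include <algorithm>
#include <vector>

bool is_negative(int v) { return v < 0; }

// Named algorithms state the intent that a hand-written loop leaves implicit.
bool has_negative(const std::vector<int>& v) {
    return std::any_of(v.begin(), v.end(), is_negative);
}

int count_negative(const std::vector<int>& v) {
    return static_cast<int>(std::count_if(v.begin(), v.end(), is_negative));
}
```

A reader sees any_of or count_if and knows what the loop computes before reading the predicate.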
And (the soon-to-be available range-style for_each) + lambdas:
for_each(monsters, [](auto& m) { m.think(); });
is IMO more readable than:
for (auto i = monsters.begin(); i != monsters.end(); ++i) {
i->think();
}
Also this:
for_each(bananas, [&](auto& b) { my_monkey.eat(b); });
Is more concise than:
for (auto i = bananas.begin(); i != bananas.end(); ++i) {
my_monkey.eat(*i);
}
But new range based for is probably the best:
for (auto& b : bananas)
my_monkey.eat(b);
But the for_each could be useful, especially if you have several functions to call in order but need to run each method for all objects before next... but maybe that's just me. ;)
Update 2: I've written my own one-liner wrappers of stl-algos that work with ranges instead of pair of iterators. boost::range_ex, once released, will include that and maybe it will be there in C++0x too?
for_each is more generic. You can use it to iterate over any type of container (by passing in the begin/end iterators). You can potentially swap out containers underneath a function which uses for_each without having to update the iteration code. You need to consider that there are other containers in the world than std::vector and plain old C arrays to see the advantages of for_each.
The major drawback of for_each is that it takes a functor, so the syntax is clunky. This is fixed in C++11 (formerly C++0x) with the introduction of lambdas:
std::vector<int> container;
...
std::for_each(container.begin(), container.end(), [](int& i){
i+= 10;
});
This will not look weird to you in 3 years.
Personally, any time I'd need to go out of my way to use std::for_each (write special-purpose functors / complicated boost::lambdas), I find BOOST_FOREACH and C++0x's range-based for clearer:
BOOST_FOREACH(Monster* m, monsters) {
if (m->has_plan())
m->act();
}
vs
std::for_each(monsters.begin(), monsters.end(),
if_then(bind(&Monster::has_plan, _1),
bind(&Monster::act, _1)));
It's very subjective; some will say that using for_each makes the code more readable, as it lets you treat different collections with the same conventions.
for_each itself is implemented as a loop
template<class InputIterator, class Function>
Function for_each(InputIterator first, InputIterator last, Function f)
{
for ( ; first!=last; ++first ) f(*first);
return f;
}
so it's up to you to choose what is right for you.
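One detail of that implementation is worth a sketch: for_each returns the function object it applied, so a stateful functor can carry a result out. Sum is an illustrative type, not something from the standard library:

```cpp
#include <algorithm>
#include <vector>

// Stateful function object; for_each hands back the copy it applied.
struct Sum {
    int total = 0;
    void operator()(int v) { total += v; }
};

int sum_with_for_each(const std::vector<int>& v) {
    return std::for_each(v.begin(), v.end(), Sum()).total;
}
```

For a plain sum std::accumulate would be the more direct choice; the point here is only the returned functor.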
You're mostly correct: most of the time, std::for_each is a net loss. I'd go so far as to compare for_each to goto. goto provides the most versatile flow-control possible -- you can use it to implement virtually any other control structure you can imagine. That very versatility, however, means that seeing a goto in isolation tells you virtually nothing about what it's intended to do in this situation. As a result, almost nobody in their right mind uses goto except as a last resort.
Among the standard algorithms, for_each is much the same way -- it can be used to implement virtually anything, which means that seeing for_each tells you virtually nothing about what it's being used for in this situation. Unfortunately, people's attitude toward for_each is about where their attitude toward goto was in (say) 1970 or so -- a few people had caught onto the fact that it should be used only as a last resort, but many still consider it the primary algorithm, and rarely if ever use any other. The vast majority of the time, even a quick glance would reveal that one of the alternatives was drastically superior.
Just for example, I'm pretty sure I've lost track of how many times I've seen people writing code to print out the contents of a collection using for_each. Based on posts I've seen, this may well be the single most common use of for_each. They end up with something like:
class XXX {
// ...
public:
std::ostream &print(std::ostream &os) const { return os << "my data\n"; }
};
And their post is asking about what combination of bind1st, mem_fun, etc. they need to make something like:
std::vector<XXX> coll;
std::for_each(coll.begin(), coll.end(), XXX::print);
work, and print out the elements of coll. If it really did work exactly as I've written it there, it would be mediocre, but it doesn't -- and by the time you've gotten it to work, it's difficult to find those few bits of code related to what's going on among the pieces that hold it together.
Fortunately, there is a much better way. Add a normal stream inserter overload for XXX:
std::ostream &operator<<(std::ostream &os, XXX const &x) {
return x.print(os);
}
and use std::copy:
std::copy(coll.begin(), coll.end(), std::ostream_iterator<XXX>(std::cout, "\n"));
That does work -- and takes virtually no work at all to figure out that it prints the contents of coll to std::cout.
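A self-contained sketch of the same idea, using a made-up Point type instead of XXX and writing into a string so the result is easy to check:

```cpp
#include <algorithm>
#include <iterator>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

struct Point { int x; int y; };  // illustrative stand-in for XXX

// Normal stream inserter; std::ostream_iterator finds it through ADL.
std::ostream& operator<<(std::ostream& os, const Point& p) {
    return os << '(' << p.x << ", " << p.y << ')';
}

// Writes each element followed by a newline and returns the text.
std::string to_lines(const std::vector<Point>& points) {
    std::ostringstream out;
    std::copy(points.begin(), points.end(),
              std::ostream_iterator<Point>(out, "\n"));
    return out.str();
}
```

Swapping the ostringstream for std::cout gives the printing version described above.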
Like many of the algorithm functions, an initial reaction is to think it's more unreadable to use foreach than a loop. It's been a topic of many flame wars.
Once you get used to the idiom you may find it useful. One obvious advantage is that it forces the coder to separate the inner contents of the loop from the actual iteration functionality. (OK, I think it's an advantage. Others say you're just chopping up the code with no real benefit).
One other advantage is that when I see foreach, I know that either every item will be processed or an exception will be thrown.
A for loop allows several options for terminating the loop. You can let the loop run its full course, or you can use the break keyword to explicitly jump out of the loop, or use the return keyword to exit the entire function mid-loop. In contrast, foreach does not allow these options, and this makes it more readable. You can just glance at the function name and you know the full nature of the iteration.
Here's an example of a confusing for loop:
for(std::vector<widget>::iterator i = v.begin(); i != v.end(); ++i)
{
/////////////////////////////////////////////////////////////////////
// Imagine a page of code here by programmers who don't refactor
///////////////////////////////////////////////////////////////////////
if(i->Cost < calculatedAmountSofar)
{
break;
}
////////////////////////////////////////////////////////////////////////
// And then some more code added by a stressed out junior developer
// *#&$*)#$&#(#)$#(*$&#(&*^$#(*$#)($*#(&$^#($*&#)$(#&*$&#*$#*)$(#*
/////////////////////////////////////////////////////////////////////////
for(std::vector<widgetPart>::iterator ip = i->GetParts().begin(); ip != i->GetParts().end(); ++ip)
{
if(ip->IsBroken())
{
return false;
}
}
}
The readability advantage of writing in a functional style might not show up when you only compare for(...) with for_each(...).
If you use all the algorithms in <algorithm> instead of for loops, the code gets a lot more readable:
iterator longest_tree = std::max_element(forest.begin(), forest.end(), ...);
iterator first_leaf_tree = std::find_if(forest.begin(), forest.end(), ...);
std::transform(forest.begin(), forest.end(), firewood.begin(), ...);
std::for_each(forest.begin(), forest.end(), make_plywood);
is much more readable than:
Forest::iterator longest_tree = forest.begin();
for (Forest::iterator it = forest.begin(); it != forest.end(); ++it) {
if (*it > *longest_tree) {
longest_tree = it;
}
}
Forest::iterator leaf_tree = forest.begin();
for (Forest::iterator it = forest.begin(); it != forest.end(); ++it) {
if (it->type() == LEAF_TREE) {
leaf_tree = it;
break;
}
}
for (Forest::iterator it = forest.begin(), jt = firewood.begin();
it != forest.end();
++it, ++jt) {
*jt = boost::transformtowood(*it);
}
for (Forest::iterator it = forest.begin(); it != forest.end(); ++it) {
make_plywood(*it);
}
And that is what I think is so nice, generalize the for-loops to one line functions =)
Easy: for_each is useful when you already have a function to handle every array item, so you don't have to write a lambda. Certainly, this
for_each(a.begin(), a.end(), a_item_handler);
is better than
for(auto& item: a) {
a_item_handler(item);
}
Also, ranged for loop only iterates over whole containers from start to end, whilst for_each is more flexible.
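To illustrate the flexibility point, here is a small sketch that applies an existing handler to only part of a container, which a plain range-based for cannot express directly. The names double_item and double_interior are made up for this example.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical per-item handler: doubles one element in place.
void double_item(int &item) { item *= 2; }

// Applies the handler to every element except the first and the last.
void double_interior(std::vector<int> &a) {
    if (a.size() < 3) return;  // nothing lies strictly between the two ends
    std::for_each(a.begin() + 1, a.end() - 1, double_item);
}
```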
The for_each loop is meant to hide the iterators (detail of how a loop is implemented) from the user code and define clear semantics on the operation: each element will be iterated exactly once.
The problem with readability in the current standard is that it requires a functor as the last argument instead of a block of code, so in many cases you must write specific functor type for it. That turns into less readable code as functor objects cannot be defined in-place (local classes defined within a function cannot be used as template arguments) and the implementation of the loop must be moved away from the actual loop.
struct myfunctor {
void operator()( int arg1 ) { code }
};
void apply( std::vector<int> const & v ) {
// code
std::for_each( v.begin(), v.end(), myfunctor() );
// more code
}
Note that if you want to perform a specific operation on each object, you can use std::mem_fn, or boost::bind (std::bind in the next standard), or boost::lambda (lambdas in the next standard) to make it simpler:
void function( int value );
void apply( std::vector<int> const & v ) {
// code
std::for_each( v.begin(), v.end(), boost::bind( function, _1 ) );
// code
}
Which is not less readable and more compact than the hand rolled version if you do have function/method to call in place. The implementation could provide other implementations of the for_each loop (think parallel processing).
The upcoming standard takes care of some of the shortcomings in different ways. It will allow locally defined classes as arguments to templates:
void apply( std::vector<int> const & v ) {
// code
struct myfunctor {
void operator()( int ) { code }
};
std::for_each( v.begin(), v.end(), myfunctor() );
// code
}
Improving the locality of code: when you browse you see what it is doing right there. As a matter of fact, you don't even need to use the class syntax to define the functor, but use a lambda right there:
void apply( std::vector<int> const & v ) {
// code
std::for_each( v.begin(), v.end(),
[]( int ) { /* code */ } );
// code
}
Even so, for the specific case of for_each there will be a dedicated construct that makes it more natural:
void apply( std::vector<int> const & v ) {
// code
for ( int i : v ) {
// code
}
// code
}
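As a rough side-by-side of the three forms discussed in this answer, the sketch below computes the same sum with a named functor, a lambda, and a range-based for. AddTo and the three sum_with_* functions are names invented for illustration.

```cpp
#include <algorithm>
#include <vector>

// Named functor: the loop body lives away from the loop itself.
struct AddTo {
    int *total;
    explicit AddTo(int *t) : total(t) {}
    void operator()(int value) const { *total += value; }
};

int sum_with_functor(std::vector<int> const &v) {
    int total = 0;
    std::for_each(v.begin(), v.end(), AddTo(&total));
    return total;
}

// Lambda: the body sits right where the loop is.
int sum_with_lambda(std::vector<int> const &v) {
    int total = 0;
    std::for_each(v.begin(), v.end(), [&total](int value) { total += value; });
    return total;
}

// Range-based for: no iterators and no callable at all.
int sum_with_range_for(std::vector<int> const &v) {
    int total = 0;
    for (int value : v) total += value;
    return total;
}
```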
I tend to mix the for_each construct with hand rolled loops. When only a call to an existing function or method is what I need (for_each( v.begin(), v.end(), boost::bind( &Type::update, _1 ) )) I go for the for_each construct that takes away from the code a lot of boiler plate iterator stuff. When I need something more complex and I cannot implement a functor just a couple of lines above the actual use, I roll my own loop (keeps the operation in place). In non-critical sections of code I might go with BOOST_FOREACH (a co-worker got me into it)
Aside from readability and performance, one aspect commonly overlooked is consistency. There are many ways to implement a for (or while) loop over iterators, from:
for (C::iterator iter = c.begin(); iter != c.end(); iter++) {
do_something(*iter);
}
to:
C::iterator iter = c.begin();
C::iterator end = c.end();
while (iter != end) {
do_something(*iter);
++iter;
}
with many examples in between at varying levels of efficiency and bug potential.
Using for_each, however, enforces consistency by abstracting away the loop:
for_each(c.begin(), c.end(), do_something);
The only thing you have to worry about now is: do you implement the loop body as function, a functor, or a lambda using Boost or C++0x features? Personally, I'd rather worry about that than how to implement or read a random for/while loop.
I used to dislike std::for_each and thought that without lambda, it was done utterly wrong. However I did change my mind some time ago, and now I actually love it. And I think it even improves readability, and makes it easier to test your code in a TDD way.
The std::for_each algorithm can be read as do something with all elements in range, which can improve readability. Say the action that you want to perform is 20 lines long, and the function where the action is performed is also about 20 lines long. That would make a function 40 lines long with a conventional for loop, and only about 20 with std::for_each, thus likely easier to comprehend.
Functors for std::for_each are more likely to be more generic, and thus reusable, e.g:
struct DeleteElement
{
template <typename T>
void operator()(const T *ptr)
{
delete ptr;
}
};
And in the code you'd only have a one-liner like std::for_each(v.begin(), v.end(), DeleteElement()) which is slightly better IMO than an explicit loop.
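A small sketch of that one-liner in context, assuming a vector of owning raw pointers. Tracked and allocate_and_release are hypothetical names used only to make the effect observable.

```cpp
#include <algorithm>
#include <vector>

// The generic deleter from the answer above.
struct DeleteElement {
    template <typename T>
    void operator()(const T *ptr) const { delete ptr; }
};

// Hypothetical type that counts how many instances are currently alive.
struct Tracked {
    static int live;
    Tracked() { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

// Allocates n objects, frees them all with one for_each call,
// and returns how many objects that call destroyed.
int allocate_and_release(int n) {
    std::vector<Tracked *> v;
    for (int i = 0; i < n; ++i) v.push_back(new Tracked);
    int before = Tracked::live;
    std::for_each(v.begin(), v.end(), DeleteElement());
    v.clear();  // drop the now-dangling pointers
    return before - Tracked::live;
}
```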
All of those functors are normally easier to get under unit tests than an explicit for loop in the middle of a long function, and that alone is already a big win for me.
std::for_each is also generally more reliable, as you're less likely to make a mistake with range.
And lastly, the compiler might produce slightly better code for std::for_each than for certain types of hand-crafted for loop, as for_each always looks the same to the compiler, and compiler writers can put all of their knowledge into making it as good as they can.
Same applies to other std algorithms like find_if, transform etc.
If you frequently use other algorithms from the STL, there are several advantages to for_each:
It will often be simpler and less error prone than a for loop, partly because you'll be used to functions with this interface, and partly because it actually is a little more concise in many cases.
Although a range-based for loop can be even simpler, it is less flexible (as noted by Adrian McCarthy, it iterates over a whole container).
Unlike a traditional for loop, for_each forces you to write code that will work for any input iterator. Being restricted in this way can actually be a good thing because:
You might actually need to adapt the code to work for a different container later.
At the beginning, it might teach you something and/or change your habits for the better.
Even if you would always write for loops which are perfectly equivalent, other people that modify the same code might not do this without being prompted to use for_each.
Using for_each sometimes makes it more obvious that you can use a more specific STL function to do the same thing. (As in Jerry Coffin's example; it's not necessarily the case that for_each is the best option, but a for loop is not the only alternative.)
With C++11 and two simple templates, you can write
for ( auto x: range(v1+4,v1+6) ) {
x*=2;
cout<< x <<' ';
}
as a replacement for for_each or a loop. Why choose it? It boils down to brevity and safety: there's no chance of error in an expression that's not there.
For me, for_each was always better on the same grounds when the loop body is already a functor, and I'll take any advantage I can get.
You still use the three-expression for, but now when you see one you know there's something to understand there, it's not boilerplate. I hate boilerplate. I resent its existence. It's not real code, there's nothing to learn by reading it, it's just one more thing that needs checking. The mental effort can be measured by how easy it is to get rusty at checking it.
The templates are
template<typename iter>
struct range_ {
iter begin() {return beg_;} iter end(){return end_;}
range_(iter const&beg,iter const&end) : beg_(beg),end_(end) {}
iter beg_, end_; // names containing a double underscore are reserved for the implementation
};
template<typename iter>
range_<iter> range(iter const &begin, iter const &end)
{ return range_<iter>(begin,end); }
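Here is a compact sketch of those two templates in use, written with non-reserved member names. double_slice is a hypothetical helper added for the example; it takes elements by reference so the doubling is visible in the container.

```cpp
#include <cstddef>
#include <vector>

// Iterator-pair wrapper that a range-based for can walk.
template <typename Iter>
struct range_ {
    Iter begin() const { return beg_; }
    Iter end() const { return end_; }
    range_(Iter const &beg, Iter const &end) : beg_(beg), end_(end) {}
    Iter beg_, end_;
};

template <typename Iter>
range_<Iter> range(Iter const &begin, Iter const &end) {
    return range_<Iter>(begin, end);
}

// Hypothetical helper: doubles the elements at positions [first, last) in place.
void double_slice(std::vector<int> &v, std::ptrdiff_t first, std::ptrdiff_t last) {
    for (auto &x : range(v.begin() + first, v.begin() + last)) {
        x *= 2;
    }
}
```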
A for loop can iterate over each element, or every third element, and so on. for_each is only for iterating over each element, as its name makes clear. So it is clearer what you intend to do in your code.
for_each allows us to implement the fork-join pattern. Other than that, it supports a fluent interface.
fork-join pattern
We can add an implementation, gpu::for_each, that uses CUDA/GPU for heterogeneous parallel computing by calling the lambda task in multiple workers.
gpu::for_each(users.begin(),users.end(),update_summary);
// all summary is complete now
// go access the user-summary here.
And gpu::for_each may wait for the workers to finish all the lambda tasks before executing the next statements.
fluent-interface
It allows us to write human-readable code in a concise manner.
accounts.erase(std::remove_if(accounts.begin(),accounts.end(),used_this_year),accounts.end());
std::for_each(accounts.begin(),accounts.end(),mark_dormant);
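A self-contained sketch of the two lines above, assuming accounts is a std::vector. Account, was_used_this_year and retire_unused are hypothetical names; the point is that erase needs the container's end() as its second argument to actually drop the removed tail.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical account record.
struct Account {
    bool used_this_year;
    bool dormant;
};

bool was_used_this_year(Account const &a) { return a.used_this_year; }
void mark_dormant(Account &a) { a.dormant = true; }

// Drops the accounts used this year, marks the rest dormant,
// and returns how many accounts remain.
std::size_t retire_unused(std::vector<Account> &accounts) {
    accounts.erase(
        std::remove_if(accounts.begin(), accounts.end(), was_used_this_year),
        accounts.end());
    std::for_each(accounts.begin(), accounts.end(), mark_dormant);
    return accounts.size();
}
```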
Mostly you'll have to iterate over the whole collection. Therefore I suggest you write your own for_each() variant, taking only 2 parameters. This will allow you to rewrite Terry Mahaffey's example as:
for_each(container, [](int& i) {
i += 10;
});
I think this is indeed more readable than a for loop. However, this requires the C++0x compiler extensions.
I find for_each to be bad for readability. The concept is a good one, but C++ makes it very hard to write readably, at least for me. C++0x lambda expressions will help. I really like the idea of lambdas. However, at first glance I think the syntax is very ugly and I'm not 100% sure I'll ever get used to it. Maybe in 5 years I'll have got used to it and not give it a second thought, but maybe not. Time will tell :)
I prefer to use
vector<thing>::iterator istart = container.begin();
vector<thing>::iterator iend = container.end();
for(vector<thing>::iterator i = istart; i != iend; ++i) {
// Do stuff
}
I find an explicit for loop clearer to read, and explicitly using named variables for the start and end iterators reduces the clutter in the for loop.
Of course cases vary, this is just what I usually find best.
There are a lot of good reasons in other answers but all seem to forget that
for_each allows you to use reverse or pretty much any custom iterator, whereas a range-based for loop always starts with the begin() iterator.
Example with reverse iterator:
std::list<int> l {1,2,3};
std::for_each(l.rbegin(), l.rend(), [](auto o){std::cout<<o;});
Example with some custom tree iterator:
SomeCustomTree<int> a{1,2,3,4,5,6,7};
auto node = a.find(4);
std::for_each(node.breadthFirstBegin(), node.breadthFirstEnd(), [](auto o){std::cout<<o;});
You can have the iterator be a call to a function that is performed on each iteration through the loop.
See here:
http://www.cplusplus.com/reference/algorithm/for_each/
For loop can break;
I don't want to be a parrot for Herb Sutter, so here is the link to his presentation:
http://channel9.msdn.com/Events/BUILD/BUILD2011/TOOL-835T
Be sure to read the comments also :)
std::for_each is great when you don't have a range.
For example, consider std::istream_iterator:
using Iter = std::istream_iterator<int>;
for (Iter i(str); i != Iter(); ++i) {
f(*i);
}
It has no container, so you can't easily use a for (auto &&item: ...) loop, but you can do:
std::for_each(Iter(str), Iter(), [](int item) {
// ...
});
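A runnable sketch of the same idea, assuming the input is a string of whitespace-separated integers read through a std::istringstream. sum_ints is a name invented for the example.

```cpp
#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>

// Sums the whitespace-separated integers found in text.
int sum_ints(std::string const &text) {
    std::istringstream str(text);
    using Iter = std::istream_iterator<int>;
    int total = 0;
    // Iter(str) reads from the stream; a default-constructed Iter marks end of input.
    std::for_each(Iter(str), Iter(), [&total](int item) { total += item; });
    return total;
}
```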

In STL maps, is it better to use map::insert than []?

A while ago, I had a discussion with a colleague about how to insert values in STL maps. I preferred map[key] = value; because it feels natural and is clear to read whereas he preferred map.insert(std::make_pair(key, value)).
I just asked him and neither of us can remember the reason why insert is better, but I am sure it was not just a style preference; rather, there was a technical reason such as efficiency. The SGI STL reference simply says: "Strictly speaking, this member function is unnecessary: it exists only for convenience."
Can anybody tell me that reason, or am I just dreaming that there is one?
When you write
map[key] = value;
there's no way to tell if you replaced the value for key, or if you created a new key with value.
map::insert() will only create:
using std::cout; using std::endl;
typedef std::map<int, std::string> MyMap;
MyMap map;
// ...
std::pair<MyMap::iterator, bool> res = map.insert(MyMap::value_type(key,value));
if ( ! res.second ) {
cout << "key " << key << " already exists "
<< " with value " << (res.first)->second << endl;
} else {
cout << "created key " << key << " with value " << value << endl;
}
For most of my apps, I usually don't care if I'm creating or replacing, so I use the easier to read map[key] = value.
The two have different semantics when it comes to the key already existing in the map. So they aren't really directly comparable.
But the operator[] version requires default constructing the value, and then assigning, so if this is more expensive than copy construction, then it will be more expensive. Sometimes default construction doesn't make sense, and then it would be impossible to use the operator[] version.
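To make the last point concrete, here is a sketch with a value type that has no default constructor. Label and add_label are hypothetical names; with this type, m[key] = Label(text) does not compile, while insert works.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical value type with no default constructor.
struct Label {
    explicit Label(std::string const &t) : text(t) {}
    std::string text;
};

// operator[] cannot be used here because it would need Label().
// Returns true if the key was newly inserted, false if it already existed.
bool add_label(std::map<int, Label> &m, int key, std::string const &text) {
    return m.insert(std::make_pair(key, Label(text))).second;
}
```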
Another thing to note with std::map:
myMap[nonExistingKey]; will create a new entry in the map, keyed to nonExistingKey initialized to a default value.
This scared the hell out of me the first time I saw it (while banging my head against a nasty legacy bug). Wouldn't have expected it. To me, that looks like a get operation, and I didn't expect the "side-effect." Prefer map.find() when getting from your map.
If the performance hit of the default constructor isn't an issue, then please, for the love of god, go with the more readable version.
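The surprise can be reproduced in a few lines. The sketch below uses made-up helper names (read_with_brackets, read_with_find) to contrast the two kinds of lookup.

```cpp
#include <map>
#include <string>

typedef std::map<std::string, int> Counts;

// Lookup through operator[]: inserts the key with value 0 when it is missing.
int read_with_brackets(Counts &m, std::string const &key) { return m[key]; }

// Lookup through find: leaves the map untouched and also works on a const map.
int read_with_find(Counts const &m, std::string const &key, int fallback) {
    Counts::const_iterator it = m.find(key);
    return it == m.end() ? fallback : it->second;
}
```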
:)
insert is better from the point of exception safety.
The expression map[key] = value is actually two operations:
map[key] - creating a map element with default value.
= value - copying the value into that element.
An exception may happen at the second step. As a result, the operation will be only partially done (a new element was added to the map, but that element was not initialized with value). The situation where an operation is not complete but the system state is modified is called an operation with a "side effect".
The insert operation gives a strong guarantee, meaning it doesn't have side effects (https://en.wikipedia.org/wiki/Exception_safety). insert either completes fully or leaves the map in an unmodified state.
http://www.cplusplus.com/reference/map/map/insert/:
If a single element is to be inserted, there are no changes in the container in case of exception (strong guarantee).
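The difference can be observed with a value type whose copy operations throw on demand. Fragile and the two size_after_* functions below are hypothetical, built only to trigger the failure at the right moment; this is a sketch of the behaviour, not a statement about any particular implementation's internals.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>

// Hypothetical value whose copy operations throw when the source is poisoned.
struct Fragile {
    bool poisoned;
    Fragile() : poisoned(false) {}
    Fragile(Fragile const &other) : poisoned(other.poisoned) {
        if (other.poisoned) throw std::runtime_error("copy failed");
    }
    Fragile &operator=(Fragile const &other) {
        if (other.poisoned) throw std::runtime_error("assignment failed");
        poisoned = other.poisoned;
        return *this;
    }
};

typedef std::map<int, Fragile> FragileMap;

// m[1] creates a default element first; the assignment then throws.
std::size_t size_after_failed_assign() {
    FragileMap m;
    Fragile bad;
    bad.poisoned = true;
    try { m[1] = bad; } catch (std::runtime_error const &) {}
    return m.size();
}

// insert copies the pair into the map; that copy throws inside insert.
std::size_t size_after_failed_insert() {
    FragileMap m;
    FragileMap::value_type entry(1, Fragile());
    entry.second.poisoned = true;  // poison after the pair is safely built
    try { m.insert(entry); } catch (std::runtime_error const &) {}
    return m.size();
}
```

After the failed assignment the map holds a default-constructed element under the key, whereas after the failed insert it is still empty.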
If your application is speed critical, I would advise using the [] operator, because it creates a total of 3 copies of the original object, of which 2 are temporary objects that are sooner or later destroyed.
But with insert(), 4 copies of the original object are created, of which 3 are temporary objects (not necessarily "temporaries") and are destroyed.
Which means extra time for:
1. One objects memory allocation
2. One extra constructor call
3. One extra destructor call
4. One objects memory deallocation
If your objects are large, constructors are typical, and destructors do a lot of resource freeing, the above points count even more. Regarding readability, I think both are fair enough.
The same question came into my mind but not over readability but speed.
Here is a sample code through which I came to know about the point i mentioned.
class Sample
{
static int _noOfObjects;
int _objectNo;
public:
Sample() :
_objectNo( _noOfObjects++ )
{
std::cout<<"Inside default constructor of object "<<_objectNo<<std::endl;
}
Sample( const Sample& sample) :
_objectNo( _noOfObjects++ )
{
std::cout<<"Inside copy constructor of object "<<_objectNo<<std::endl;
}
~Sample()
{
std::cout<<"Destroying object "<<_objectNo<<std::endl;
}
};
int Sample::_noOfObjects = 0;
int main(int argc, char* argv[])
{
Sample sample;
std::map<int,Sample> map;
map.insert( std::make_pair( 1, sample ) ); // let make_pair deduce its types; explicit template arguments fail to compile with an lvalue since C++11
//map[1] = sample;
return 0;
}
Now in c++11 I think that the best way to insert a pair in a STL map is:
typedef std::map<int, std::string> MyMap;
MyMap map;
auto result = map.emplace(3, "Hello");
The result will be a pair with:
First element (result.first): an iterator pointing to the inserted pair, or to the existing pair with this key if the key already exists.
Second element (result.second): true if the insertion took place, or false if the key already existed and nothing was inserted.
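A short sketch of how that returned pair behaves on a first and a repeated emplace. emplace_and_read is a hypothetical wrapper written for this example.

```cpp
#include <map>
#include <string>
#include <utility>

typedef std::map<int, std::string> MyMap;

// Tries to emplace (key, value); reports through inserted whether a new
// element was created, and returns the value now stored under key.
std::string emplace_and_read(MyMap &map, int key, std::string const &value,
                             bool &inserted) {
    std::pair<MyMap::iterator, bool> result = map.emplace(key, value);
    inserted = result.second;
    return result.first->second;
}
```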
PS: If you don't care about the order you can use std::unordered_map ;)
Thanks!
A gotcha with map::insert() is that it won't replace a value if the key already exists in the map. I've seen C++ code written by Java programmers where they have expected insert() to behave the same way as Map.put() in Java where values are replaced.
One note is that you can also use Boost.Assign:
using namespace std;
using namespace boost::assign; // bring 'map_list_of()' into scope
void something()
{
map<int,int> my_map = map_list_of(1,2)(2,3)(3,4)(4,5)(5,6);
}
Here's another example, showing that operator[] overwrites the value for the key if it exists, but .insert does not overwrite the value if it exists.
void mapTest()
{
map<int,float> m;
for( int i = 0 ; i <= 2 ; i++ )
{
pair<map<int,float>::iterator,bool> result = m.insert( make_pair( 5, (float)i ) ) ;
if( result.second )
printf( "%d=>value %f successfully inserted as brand new value\n", result.first->first, result.first->second ) ;
else
printf( "! The map already contained %d=>value %f, nothing changed\n", result.first->first, result.first->second ) ;
}
puts( "All map values:" ) ;
for( map<int,float>::iterator iter = m.begin() ; iter !=m.end() ; ++iter )
printf( "%d=>%f\n", iter->first, iter->second ) ;
/// now watch this..
m[5]=900.f ; //using operator[] OVERWRITES map values
puts( "All map values:" ) ;
for( map<int,float>::iterator iter = m.begin() ; iter !=m.end() ; ++iter )
printf( "%d=>%f\n", iter->first, iter->second ) ;
}
This is a rather restricted case, but judging from the comments I've received I think it's worth noting.
I've seen people in the past use maps in the form of
map< const key, const val> Map;
to evade cases of accidental value overwriting, but then go ahead writing in some other bits of code:
const_cast< T >Map[]=val;
Their reason for doing this as I recall was because they were sure that in these certain bits of code they were not going to be overwriting map values; hence, going ahead with the more 'readable' method [].
I've never actually had any direct trouble from the code that was written by these people, but I strongly feel up until today that risks - however small - should not be taken when they can be easily avoided.
In cases where you're dealing with map values that absolutely must not be overwritten, use insert. Don't make exceptions merely for readability.
The fact that std::map insert() function doesn't overwrite value associated with the key allows us to write object enumeration code like this:
string word;
map<string, size_t> dict;
while(getline(cin, word)) {
dict.insert(make_pair(word, dict.size()));
}
It's a pretty common problem when we need to map different non-unique objects to some id's in range 0..N. Those id's can be later used, for example, in graph algorithms. Alternative with operator[] would look less readable in my opinion:
string word;
map<string, size_t> dict;
while(getline(cin, word)) {
size_t sz = dict.size();
if (!dict.count(word))
dict[word] = sz;
}
The difference between insert() and operator[] has already been well explained in the other answers. However, new insertion methods for std::map were introduced with C++11 and C++17 respectively:
C++11 offers emplace() as also mentioned in einpoklum's comment and GutiMac's answer.
C++17 offers insert_or_assign() and try_emplace().
Let me give a brief summary of the "new" insertion methods:
emplace(): When used correctly, this method can avoid unnecessary copy or move operations by constructing the element to be inserted in place. Similar to insert(), an element is only inserted if there is no element with the same key in the container.
insert_or_assign(): This method is an "improved" version of operator[]. Unlike operator[], insert_or_assign() doesn't require the map's value type to be default constructible. This overcomes the disadvantage mentioned e.g. in Greg Rogers' answer.
try_emplace(): This method is an "improved" version of emplace(). Unlike emplace(), try_emplace() doesn't modify its arguments (due to move operations) if insertion fails due to a key already existing in the map.
For more details on insert_or_assign() and try_emplace() please see my answer here.
Simple example code on Coliru
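As a rough illustration of the two C++17 members summarized above (it needs a C++17 compiler), the sketch below wraps each in a small helper. Registry, try_store and store_or_replace are names invented for this example.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

typedef std::map<int, std::unique_ptr<std::string>> Registry;

// try_emplace: if key already exists, value is left untouched (not moved from).
// Returns true if a new element was inserted.
bool try_store(Registry &r, int key, std::unique_ptr<std::string> &value) {
    return r.try_emplace(key, std::move(value)).second;
}

// insert_or_assign: inserts or overwrites, and reports which one happened.
// Returns true on insertion, false on assignment to an existing key.
bool store_or_replace(std::map<int, std::string> &m, int key,
                      std::string const &value) {
    return m.insert_or_assign(key, value).second;
}
```

Because try_emplace does not consume its arguments on failure, the caller still owns the unique_ptr when the key was already present.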