Why use a reference as an iterator - c++

I was learning about the emplace() of std::vector and stumbled upon this code:
// vector::emplace
#include <iostream>
#include <vector>

int main ()
{
    std::vector<int> myvector = {10,20,30};

    auto it = myvector.emplace ( myvector.begin()+1, 100 );
    myvector.emplace ( it, 200 );
    myvector.emplace ( myvector.end(), 300 );

    std::cout << "myvector contains:";
    for (auto& x: myvector)
        std::cout << ' ' << x;
    std::cout << '\n';

    return 0;
}
I am wondering why, in the for loop, they use a reference (auto& x) instead of a simple copy. I tried it without the & and it worked the same. Is this a safety measure to avoid a copy, or a performance trick?

The other difference between auto and auto& in this context is that auto& will let you modify the value in the vector. This may be an undesirable bug just waiting to happen. Ideally, if you are going to take a reference only for reading, you should take a const reference: const auto &
The benefit of using the reference when the vector contains objects larger than a fundamental numeric or pointer type is that the whole object is not copied into a temporary. If the object has deep-copy semantics, or is perhaps a shared_ptr, there can be significant overhead that the reference avoids entirely.
For a fundamental type the copy is usually very fast, so a plain copy is fine; and you can expect the compiler's optimiser to do the "right thing" if you take a reference to a fundamental type and then use that reference several times. So for template programming, where you don't know the type, favour the const reference over the copy to keep the code simple.
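For instance, a minimal sketch of that template-style default (the function and names here are illustrative, not from the question):
#include <iostream>
#include <string>
#include <vector>

// Generic code: take const T& because we don't know how expensive T is to copy.
template <typename Container>
void print_all(const Container& c)
{
    for (const auto& element : c)   // no copy, no accidental modification
        std::cout << element << '\n';
}

int main()
{
    std::vector<std::string> words = { "alpha", "beta", "gamma" };
    print_all(words);    // the strings are not copied

    std::vector<int> numbers = { 1, 2, 3 };
    print_all(numbers);  // still fine for cheap-to-copy ints
}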

It's as simple as you said: without the reference it would be a copy. So it is indeed a performance trick, but for an int it won't be any faster; it might even be slower. Had you had a std::vector<std::string> with a million elements, though, it would make a big difference. You can try it yourself.
But, it's needed if you want to modify the contents of the iterated container. Without the reference, you would be changing the copy, not the element inside the container. The difference would be seen here:
std::vector<int> numbers1 = {1,2,3,4};
std::vector<int> numbers2 = {1,2,3,4};
for(auto& x: numbers1) ++x;
for(auto x: numbers2) ++x;
assert(numbers1!=numbers2); // True
Also I would recommend using auto&& instead of auto& because it will work better with temporaries, see e.g. this answer.
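As an aside, one concrete case where auto& fails but auto&& works is std::vector<bool>, whose iterators hand back temporary proxy objects rather than real references (a small sketch, not part of the original question):
#include <vector>

int main()
{
    std::vector<bool> flags = { true, false, true };

    // for (auto& f : flags) f = false;   // error: cannot bind a non-const lvalue
    //                                    // reference to the temporary proxy object
    for (auto&& f : flags)                // auto&& binds to the proxy just fine
        f = false;                        // writes through the proxy into the vector
}
With auto& the loop does not even compile; with auto&& the assignment goes through the proxy and actually flips the bits.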

Related

What is the difference between 'const auto& element' and 'auto element'? [duplicate]

In C++11, I can iterate over some container like so:
for(auto i : vec){
std::cout << i << std::endl;
}
But I know that this needlessly makes a copy of each element of vec (needlessly, since I only need to print the values of vec), so instead I could do:
for(auto &i : vec){
std::cout << i << std::endl;
}
But I want to make sure that the values of vec are never modified and abide by const-correctness, so I can do:
for(const auto &i : vec){
std::cout << i << std::endl;
}
So my question is: if I only need to look at the values of some container, wouldn't the very last loop (const auto &i) always be preferred, due to the increased efficiency of not copying each element of vec?
I have a program that I'm developing in which I'm considering making this change throughout, since efficiency is critical in it (the reason I'm using C++ in the first place).
Yes. For the same reason that, if you only ever read an argument, you make the parameter a const&.
T // I'm copying this
T& // I'm modifying this
const T& // I'm reading this
Those are your "defaults". When T is a fundamental type (built-in), though, you generally just revert to const T (no reference) for reading, because a copy is cheaper than aliasing.
I have a program that I'm developing in which I'm considering making this change throughout, since efficiency is critical in it
1. Don't make blind sweeping changes. A working program is better than a fast but broken program.
2. How you iterate through your loops probably won't make much of a difference; you're looping for a reason, aren't you? The body of your loop is a much more likely culprit.
3. If efficiency is critical, use a profiler to find the parts of your program that are actually slow, rather than guessing at parts that might be slow. See #2 for why your guess may be wrong.
Disclaimer: In general the difference between auto and auto& is subtle, partly a matter of style, but sometimes also a matter of correctness. I am not going to cover the general case here!
In a range based for loop, the difference between
for (auto element : container) {}
and
for (auto& element_ref : container) {}
is that element is a copy of the elements in the container, while element_ref is a reference to the elements in the container.
To see the difference in action, consider this example:
#include <iostream>

int main(void) {
    int a[5] = { 23,443,16,49,66 };

    for (auto i : a) i = 5;
    for (const auto& i : a) std::cout << i << std::endl;

    for (auto& i : a) i = 5;
    for (const auto& i : a) std::cout << i << std::endl;
}
It will print
23
443
16
49
66
5
5
5
5
5
because the first loop works on copies of the array elements, while the second actually modifies the elements in the array.
If you don't want to modify the elements, then a const auto& is often more appropriate, because it avoids copying the elements (which can be expensive).
Imagine if your vector contains strings. Long strings. 5000 long strings. Copy them unnecessarily and you end up with a nicely written for loop that is awfully inefficient.
Make sure your code follows your intention. If you do not need a copy inside of the loop, do not make one.
Use a reference & as suggested above, or iterators.
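For completeness, a short sketch of the iterator alternative next to the reference-based loop (the names here are illustrative):
#include <string>
#include <vector>

int main()
{
    std::vector<std::string> names = { "ann", "bob" };

    // Modifying through iterators: *it is a reference into the vector.
    for (std::vector<std::string>::iterator it = names.begin(); it != names.end(); ++it)
        *it += "!";

    // The equivalent range-based loop with a reference.
    for (auto& name : names)
        name += "?";
}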

Disturbing order of evaluation

When I work with my favorite containers, I tend to chain operations. For instance, in the well-known Erase–remove idiom:
v.erase( std::remove_if(v.begin(), v.end(), is_odd), v.end() );
From what I know of the order of evaluation, v.end() (on the rhs) might be evaluated before the call to std::remove_if. This is not a problem here, since std::remove* only shuffles the vector without changing its end iterator.
But it could lead to really surprising constructs, like for instance (demo):
#include <iostream>

struct Data
{
    int v;
    int value() const { return v; }
};

auto inc(Data& data) { return ++data.v; }

void print_rhs(int, int value) { std::cout << value << '\n'; }

int main()
{
    Data data{0};
    print_rhs(inc(data), data.value()); // might print 0
}
This is surprising since print_rhs is called after inc has been called, which means data.v is already 1 by the time print_rhs runs. Nevertheless, since data.value() may be evaluated before inc(data), 0 is a possible output.
I think it would be a nice improvement if the order of evaluation were less surprising; in particular, if the arguments of a function with side effects were evaluated before those without.
My questions are then:
Has that change ever been discussed or suggested in a C++ committee?
Do you see any problem it could bring?
Has that change ever been discussed or suggested in a C++ committee?
Probably.
Do you see any problem it could bring?
Yes. It could reduce optimization opportunities which exist today, and brings no direct benefit other than the ability to write more one-liners. But one-liners are not a good thing anyway, so this proposal would probably never get past -99 points.
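In the meantime, the usual workaround is to force the order yourself by splitting the chain into separate statements, since each full expression is sequenced before the next (a sketch reusing the erase-remove example from the question):
#include <algorithm>
#include <vector>

bool is_odd(int x) { return x % 2 != 0; }

int main()
{
    std::vector<int> v = { 1, 2, 3, 4, 5 };

    // Each full expression is sequenced before the next, so there is no
    // question about when v.end() is evaluated relative to remove_if.
    std::vector<int>::iterator new_end = std::remove_if(v.begin(), v.end(), is_odd);
    v.erase(new_end, v.end());
}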

Iterating over a container of unique_ptr's

How does one access unique_ptr elements of a container (via an iterator) without taking ownership away from the container? When one gets an iterator to an element in the container is the element ownership still with the container? How about when one dereferences the iterator to gain access to the unique_ptr? Does that perform an implicit move of the unique_ptr?
I find I'm using shared_ptr a lot when I need to store elements in a container (not by value), even if the container conceptually owns the elements and other code simply wishes to manipulate elements in the container, because I'm afraid of not being able to actually access the unique_ptr elements in the container without ownership being taken from it.
Any insights?
With auto and the range-based for-loops of C++11 this becomes relatively elegant:
std::vector< std::unique_ptr< YourClass >> pointers;
for( auto&& pointer : pointers ) {
pointer->functionOfYourClass();
}
The reference (here auto&&) to the std::unique_ptr avoids the copy, and you can use the unique_ptr without any extra dereferencing.
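To make the ownership point concrete, here is a small sketch (not from the original answer): reading through the reference is fine, and taking a raw observing pointer with get() is fine, but copying the unique_ptr out of the container simply does not compile.
#include <iostream>
#include <memory>
#include <vector>

struct YourClass {
    void functionOfYourClass() { std::cout << "called\n"; }
};

int main()
{
    std::vector<std::unique_ptr<YourClass>> pointers;
    pointers.push_back(std::unique_ptr<YourClass>(new YourClass()));

    for (auto&& pointer : pointers)
        pointer->functionOfYourClass();           // use it; ownership stays in the vector

    // auto owned = pointers.front();             // error: unique_ptr is not copyable
    YourClass* observer = pointers.front().get(); // non-owning access is fine
    observer->functionOfYourClass();
}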
As long as you don't try to make a copy of the unique_ptr, you can just use it. You'll have to "double dereference" the iterator to get to the pointer's value, just as you would have to with shared_ptr. Here's a brief example:
#include <vector>
#include <memory>
#include <iterator>
#include <iostream>

template <class C>
void
display(const C& c)
{
    std::cout << '{';
    if (!c.empty())
    {
        std::cout << *c.front();
        for (auto i = std::next(c.begin()); i != c.end(); ++i)
            std::cout << ", " << **i;
    }
    std::cout << "}\n";
}

int main()
{
    typedef std::unique_ptr<int> Ptr;
    std::vector<Ptr> v;
    for (int i = 1; i <= 5; ++i)
        v.push_back(Ptr(new int(i)));
    display(v);
    for (auto i = v.begin(); i != v.end(); ++i)
        **i += 2;
    display(v);
}
If you do (accidentally) make a copy of the unique_ptr:
Ptr p = v[0];
then you'll find out at compile time. It won't cause a run time error. Your use case is why container<unique_ptr<T>> was built. Things should just work, and if they don't, the problem appears at compile time instead of run time. So code away, and if you don't understand the compile time error, then ask another question back here.

C++: Proper way to iterate over STL containers

In my game engine project, I make extensive use of the STL, mostly of the std::string and std::vector classes.
In many cases, I have to iterate through them. Right now, the way I'm doing it is:
for( unsigned int i = 0; i < theContainer.size(); i ++ )
{
}
Am I doing it the right way?
If not, why, and what should I do instead?
Is size() really executed every loop cycle with this implementation? Would the performance loss be negligible?
C++11 has a new container aware for loop syntax that can be used if your compiler supports the new standard.
#include <iostream>
#include <vector>
#include <string>

using namespace std;

int main()
{
    vector<string> vs;
    vs.push_back("One");
    vs.push_back("Two");
    vs.push_back("Three");

    for (const auto &s : vs)
    {
        cout << s << endl;
    }

    return 0;
}
You might want to look at the standard algorithms.
For example
vector<myclass> myvec;
// some code where you add elements to your vector
for_each(myvec.begin(), myvec.end(), do_something_with_a_vector_element);
where do_something_with_a_vector_element is a function that does what goes in your loop
for example
void
do_something_with_a_vector_element(const myclass& element)
{
// I use my element here
}
There are lots of standard algorithms - see http://www.cplusplus.com/reference/algorithm/ - so most things are supported
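If your compiler supports C++11, a lambda keeps the loop body right next to the call; a small sketch (myclass here is just a stand-in for your element type):
#include <algorithm>
#include <iostream>
#include <vector>

struct myclass { int value = 0; };

int main()
{
    std::vector<myclass> myvec(3);

    std::for_each(myvec.begin(), myvec.end(),
                  [](const myclass& element) {
                      std::cout << element.value << '\n';
                  });
}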
STL containers support Iterators
vector<int> v;
for (vector<int>::iterator it = v.begin(); it!=v.end(); ++it) {
cout << *it << endl;
}
size() would be re-computed every iteration.
For random-access containers, it's not wrong.
But you can use iterators instead.
for (string::const_iterator it = theContainer.begin();
it != theContainer.end(); ++it) {
// do something with *it
}
There are some circumstances under which a compiler may optimize away the .size() (or .end() in the iterator case) calls (e.g. only const access, function is pure). But do not depend on it.
Usually the right way to "iterate" over a container is using "iterators". Something like
string myStr = "hello";
for(string::iterator i = myStr.begin(); i != myStr.end(); ++i){
cout << "Current character: " << *i << endl;
}
Of course, if you aren't going to modify each element, it's best to use string::const_iterator.
And yes, size() gets called every time, but for std::string and std::vector it is O(1), so the performance loss is usually negligible. Still, it's good practice to compute the size once before the loop rather than calling size() on every iteration.
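If you do want to hoist the call out of the loop condition, one common pattern looks like this (a minimal sketch):
#include <iostream>
#include <string>

int main()
{
    std::string myStr = "hello";

    // size() is evaluated once, before the loop starts.
    for (std::string::size_type i = 0, n = myStr.size(); i < n; ++i)
        std::cout << "Current character: " << myStr[i] << std::endl;
}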
No, this is not the correct way to do it. For a ::std::vector or a ::std::string it works fine, but the problem is that if you ever use anything else, it won't work so well. Additionally, it isn't idiomatic.
And, to answer your other question... The size function is probably inline. This means it likely just fetches a value from the internals of ::std::string or ::std::vector. The compiler will optimize this away and only fetch it once in most cases.
But, here is the idiomatic way:
for (::std::vector<Foo>::iterator i = theContainer.begin();
     i != theContainer.end();
     ++i)
{
    Foo &cur_element = *i;
    // Do stuff
}
The ++i is very important. Again, for ::std::vector or ::std::string, where the iterator is basically a pointer, it's not so important. But for more complicated data structures it is. i++ has to make a copy and create a temporary because the old value needs to stick around. ++i has no such issue. Get into the habit of always using ++i unless you have a compelling reason not to.
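You can see why by looking at how a typical iterator implements the two operators; this is an illustrative skeleton, not any particular library's code:
#include <iostream>

// Illustrative iterator skeleton, not real library code.
struct Iterator {
    int* ptr;

    Iterator& operator++()      // prefix: advance and return *this, no copy
    {
        ++ptr;
        return *this;
    }

    Iterator operator++(int)    // postfix: must copy the old state first
    {
        Iterator old = *this;   // extra copy; can be costly for "fat" iterators
        ++ptr;
        return old;
    }
};

int main()
{
    int data[3] = { 1, 2, 3 };
    Iterator it{ data };

    Iterator old = it++;        // postfix: a copy of the old position is made
    ++it;                       // prefix: no copy

    std::cout << *old.ptr << ' ' << *it.ptr << '\n';  // prints "1 3"
}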
Lastly, theContainer.end() will also be generally optimized out of existence. But you can force things to be a little better by doing this:
const ::std::vector<Foo>::iterator theEnd = theContainer.end();
for (::std::vector<Foo>::iterator i = theContainer.begin(); i != theEnd; ++i)
{
    Foo &cur_element = *i;
    // Do stuff
}
Of course, C++0x simplifies all of this considerably with a new syntax for for loops:
for (Foo &i: theContainer)
{
// Do stuff with i
}
These will work on standard fixed-size arrays as well as any type that defines begin and end to return iterator-like things.
A native for loop (especially an index-based one) is the C way of doing it, not the C++ way.
Use BOOST_FOREACH for loops.
Compare, for container of integers:
typedef std::vector<int>::const_iterator It; // assuming theContainer is a std::vector<int>
for( It it = theContainer.begin(); it != theContainer.end(); ++it ) {
    std::cout << *it << std::endl;
}
and
BOOST_FOREACH ( int i, theContainer ) {
std::cout << i << std::endl;
}
But this is not a perfect way either. If you can do your work without an explicit loop, you should do it without a loop. For example, with algorithms and Boost.Phoenix:
boost::range::for_each( theContainer, std::cout << arg1 << std::endl );
I understand that these solutions bring additional dependencies into your code, but Boost is a 'must-have' for modern C++.
You're doing it OK for vectors, although that doesn't translate into the right way for other containers.
The more general way is
for(std::vector<foo>::const_iterator i = theContainer.begin(); i != theContainer.end(); ++i)
which is more typing than I really like, but will become a lot more reasonable with the redefinition of auto in the forthcoming Standard. This will work on all standard containers. Note that you refer to the individual foo as *i, and use &*i if you want its address.
In your loop, .size() is executed every time. However, it's constant time (Standard, 23.1/5) for all standard containers, so it won't slow you down much if at all. Addition: the Standard says "should" have constant complexity, so a particularly bad implementation could make it not constant. If you're using such a bad implementation, you've got other performance issues to worry about.

In STL maps, is it better to use map::insert than []?

A while ago, I had a discussion with a colleague about how to insert values in STL maps. I preferred map[key] = value; because it feels natural and is clear to read whereas he preferred map.insert(std::make_pair(key, value)).
I just asked him and neither of us can remember the reason why insert is better, but I am sure it was not just a style preference rather there was a technical reason such as efficiency. The SGI STL reference simply says: "Strictly speaking, this member function is unnecessary: it exists only for convenience."
Can anybody tell me that reason, or am I just dreaming that there is one?
When you write
map[key] = value;
there's no way to tell if you replaced the value for key, or if you created a new key with value.
map::insert() will only create:
using std::cout; using std::endl;
typedef std::map<int, std::string> MyMap;
MyMap map;
// ...
std::pair<MyMap::iterator, bool> res = map.insert(MyMap::value_type(key, value));
if ( ! res.second ) {
    cout << "key " << key << " already exists"
         << " with value " << (res.first)->second << endl;
} else {
    cout << "created key " << key << " with value " << value << endl;
}
For most of my apps, I usually don't care if I'm creating or replacing, so I use the easier to read map[key] = value.
The two have different semantics when it comes to the key already existing in the map. So they aren't really directly comparable.
But the operator[] version requires default constructing the value and then assigning, so if that is more expensive than copy construction, it will be more expensive. Sometimes default construction doesn't make sense, and then it is impossible to use the operator[] version.
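For example (a sketch with a made-up value type): with a type that has no default constructor, operator[] will not even compile, while insert() still works.
#include <map>
#include <string>
#include <utility>

struct Account {
    explicit Account(int balance) : balance(balance) {}   // no default constructor
    int balance;
};

int main()
{
    std::map<std::string, Account> accounts;

    accounts.insert(std::make_pair(std::string("alice"), Account(100)));  // OK
    // accounts["bob"] = Account(50);   // error: operator[] must default construct Account
}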
Another thing to note with std::map:
myMap[nonExistingKey]; will create a new entry in the map, keyed to nonExistingKey and initialized to a default value.
This scared the hell out of me the first time I saw it (while banging my head against a nasty legacy bug). I wouldn't have expected it. To me, that looks like a get operation, and I didn't expect the "side effect." Prefer map.find() when getting from your map.
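A lookup that can never insert looks like this (a minimal sketch):
#include <iostream>
#include <map>
#include <string>

int main()
{
    std::map<std::string, int> ages = { {"alice", 30} };

    auto it = ages.find("bob");        // pure lookup, never inserts
    if (it != ages.end())
        std::cout << it->second << '\n';
    else
        std::cout << "not found\n";
}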
If the performance hit of the default constructor isn't an issue, then please, for the love of god, go with the more readable version.
:)
insert is better from the point of exception safety.
The expression map[key] = value is actually two operations:
map[key] - creating a map element with default value.
= value - copying the value into that element.
An exception may happen at the second step. As a result, the operation may be only partially done (a new element was added to the map, but it was not initialized with value). A situation where an operation does not complete yet still modifies the program state is called an operation with a "side effect".
The insert operation gives the strong guarantee, meaning it has no such side effects (https://en.wikipedia.org/wiki/Exception_safety): insert either completes entirely or leaves the map in its unmodified state.
http://www.cplusplus.com/reference/map/map/insert/:
If a single element is to be inserted, there are no changes in the container in case of exception (strong guarantee).
If your application is speed critical, I would advise using the [] operator, because it creates a total of 3 copies of the original object, of which 2 are temporary objects that are sooner or later destroyed.
But with insert(), 4 copies of the original object are created, of which 3 are temporary objects (not necessarily "temporaries" in the language sense) that are destroyed.
Which means extra time for:
1. One object's memory allocation
2. One extra constructor call
3. One extra destructor call
4. One object's memory deallocation
If your objects are large, their constructors do real work, or their destructors free a lot of resources, the points above matter even more. Regarding readability, I think both are fair enough.
The same question came to my mind, but about speed rather than readability.
Here is a sample code through which I came to know about the point I mentioned.
#include <iostream>
#include <map>
#include <utility>

class Sample
{
    static int _noOfObjects;
    int _objectNo;

public:
    Sample() :
        _objectNo( _noOfObjects++ )
    {
        std::cout << "Inside default constructor of object " << _objectNo << std::endl;
    }

    Sample( const Sample& sample ) :
        _objectNo( _noOfObjects++ )
    {
        std::cout << "Inside copy constructor of object " << _objectNo << std::endl;
    }

    ~Sample()
    {
        std::cout << "Destroying object " << _objectNo << std::endl;
    }
};
int Sample::_noOfObjects = 0;

int main(int argc, char* argv[])
{
    Sample sample;
    std::map<int,Sample> map;

    map.insert( std::make_pair( 1, sample ) );
    //map[1] = sample;

    return 0;
}
Now in C++11 I think that the best way to insert a pair into an STL map is:
typedef std::map<int, std::string> MyMap;
MyMap map;
auto result = map.emplace(3, "Hello");
The result is a pair:
The first element (result.first) is an iterator to the inserted pair, or to the pair with this key if the key already exists.
The second element (result.second) is true if the insertion took place and false otherwise.
PS: If you don't care about the order, you can use std::unordered_map ;)
Thanks!
A gotcha with map::insert() is that it won't replace a value if the key already exists in the map. I've seen C++ code written by Java programmers where they have expected insert() to behave the same way as Map.put() in Java where values are replaced.
One note is that you can also use Boost.Assign:
using namespace std;
using namespace boost::assign; // bring 'map_list_of()' into scope
void something()
{
map<int,int> my_map = map_list_of(1,2)(2,3)(3,4)(4,5)(5,6);
}
Here's another example, showing that operator[] overwrites the value for the key if it exists, but .insert does not overwrite the value if it exists.
void mapTest()
{
    map<int,float> m;

    for( int i = 0 ; i <= 2 ; i++ )
    {
        pair<map<int,float>::iterator,bool> result = m.insert( make_pair( 5, (float)i ) ) ;

        if( result.second )
            printf( "%d=>value %f successfully inserted as brand new value\n", result.first->first, result.first->second ) ;
        else
            printf( "! The map already contained %d=>value %f, nothing changed\n", result.first->first, result.first->second ) ;
    }

    puts( "All map values:" ) ;
    for( map<int,float>::iterator iter = m.begin() ; iter != m.end() ; ++iter )
        printf( "%d=>%f\n", iter->first, iter->second ) ;

    /// now watch this..
    m[5] = 900.f ; // using operator[] OVERWRITES map values

    puts( "All map values:" ) ;
    for( map<int,float>::iterator iter = m.begin() ; iter != m.end() ; ++iter )
        printf( "%d=>%f\n", iter->first, iter->second ) ;
}
This is a rather restricted case, but judging from the comments I've received I think it's worth noting.
I've seen people in the past use maps in the form of
map< const key, const val> Map;
to evade cases of accidental value overwriting, but then go ahead and write, in some other bits of code, something along the lines of:
const_cast< val& >( Map[key] ) = new_value;
Their reason for doing this, as I recall, was that they were sure these particular bits of code were not going to overwrite map values; hence, they went ahead with the more 'readable' method [].
I've never actually had any direct trouble from the code that was written by these people, but I strongly feel up until today that risks - however small - should not be taken when they can be easily avoided.
In cases where you're dealing with map values that absolutely must not be overwritten, use insert. Don't make exceptions merely for readability.
The fact that std::map insert() function doesn't overwrite value associated with the key allows us to write object enumeration code like this:
string word;
map<string, size_t> dict;
while(getline(cin, word)) {
dict.insert(make_pair(word, dict.size()));
}
It's a pretty common problem when we need to map different non-unique objects to some id's in range 0..N. Those id's can be later used, for example, in graph algorithms. Alternative with operator[] would look less readable in my opinion:
string word;
map<string, size_t> dict;
while(getline(cin, word)) {
size_t sz = dict.size();
if (!dict.count(word))
dict[word] = sz;
}
The difference between insert() and operator[] has already been well explained in the other answers. However, new insertion methods for std::map were introduced with C++11 and C++17 respectively:
C++11 offers emplace() as also mentioned in einpoklum's comment and GutiMac's answer.
C++17 offers insert_or_assign() and try_emplace().
Let me give a brief summary of the "new" insertion methods:
emplace(): When used correctly, this method can avoid unnecessary copy or move operations by constructing the element to be inserted in place. Similar to insert(), an element is only inserted if there is no element with the same key in the container.
insert_or_assign(): This method is an "improved" version of operator[]. Unlike operator[], insert_or_assign() doesn't require the map's value type to be default constructible. This overcomes the disadvantage mentioned e.g. in Greg Rogers' answer.
try_emplace(): This method is an "improved" version of emplace(). Unlike emplace(), try_emplace() doesn't modify its arguments (due to move operations) if insertion fails due to a key already existing in the map.
For more details on insert_or_assign() and try_emplace() please see my answer here.
Simple example code on Coliru
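A minimal sketch of the two C++17 methods (assuming a C++17 compiler; the keys and values here are only illustrative):
#include <iostream>
#include <map>
#include <string>

int main()
{
    std::map<int, std::string> m = { {1, "one"} };

    // insert_or_assign: like operator[], but it doesn't need the mapped type
    // to be default constructible, and it reports what happened.
    auto r1 = m.insert_or_assign(1, "uno");
    std::cout << (r1.second ? "inserted\n" : "assigned\n");   // prints "assigned"

    // try_emplace: like emplace(), but its arguments are left untouched
    // when the key already exists.
    auto r2 = m.try_emplace(2, "two");
    std::cout << (r2.second ? "inserted\n" : "ignored\n");    // prints "inserted"
}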