Is there a performance difference between i++ and ++i in C++? - c++

We have the question "Is there a performance difference between i++ and ++i in C?"
What's the answer for C++?

[Executive Summary: Use ++i if you don't have a specific reason to use i++.]
For C++, the answer is a bit more complicated.
If i is a simple type (not an instance of a C++ class), then the answer given for C ("No there is no performance difference") holds, since the compiler is generating the code.
However, if i is an instance of a C++ class, then i++ and ++i are making calls to one of the operator++ functions. Here's a standard pair of these functions:
Foo& Foo::operator++() // called for ++i
{
this->data += 1;
return *this;
}
Foo Foo::operator++(int ignored_dummy_value) // called for i++
{
Foo tmp(*this); // variable "tmp" cannot be optimized away by the compiler
++(*this);
return tmp;
}
Since the compiler isn't generating code, but just calling an operator++ function, there is no way to optimize away the tmp variable and its associated copy constructor. If the copy constructor is expensive, then this can have a significant performance impact.

Yes. There is.
The ++ operator may or may not be defined as a function. For primitive types (int, double, ...) the operators are built in, so the compiler will probably be able to optimize your code. But in the case of an object that defines the ++ operator things are different.
The operator++(int) function must create a copy. That is because postfix ++ is expected to return the value from before the increment: it must save its value in a temporary, increment itself, and then return the temporary. In the case of operator++(), prefix ++, there is no need to create a copy: the object can increment itself and then simply return itself.
Here is an illustration of the point:
struct C
{
C& operator++(); // prefix
C operator++(int); // postfix
private:
int i_;
};
C& C::operator++()
{
++i_;
return *this; // self, no copy created
}
C C::operator++(int ignored_dummy_value)
{
C t(*this);
++(*this);
return t; // return a copy
}
Every time you call operator++(int) you must create a copy, and the compiler can't do anything about it. When given the choice, use operator++(); this way you avoid the copy. It can be significant in the case of many increments (a large loop?) and/or large objects.

Here's a benchmark for the case when the increment operators are in different translation units. Compiled with g++ 4.5.
Ignore the style issues for now.
// a.cc
#include <ctime>
#include <iostream>
#include <array>
class Something {
public:
Something& operator++();
Something operator++(int);
private:
std::array<int,PACKET_SIZE> data;
};
int main () {
Something s;
for (int i=0; i<1024*1024*30; ++i) ++s; // warm up
std::clock_t a = clock();
for (int i=0; i<1024*1024*30; ++i) ++s;
a = clock() - a;
for (int i=0; i<1024*1024*30; ++i) s++; // warm up
std::clock_t b = clock();
for (int i=0; i<1024*1024*30; ++i) s++;
b = clock() - b;
std::cout << "a=" << (a/double(CLOCKS_PER_SEC))
<< ", b=" << (b/double(CLOCKS_PER_SEC)) << '\n';
return 0;
}
O(n) increment
Test
// b.cc
#include <array>
class Something {
public:
Something& operator++();
Something operator++(int);
private:
std::array<int,PACKET_SIZE> data;
};
Something& Something::operator++()
{
for (auto it=data.begin(), end=data.end(); it!=end; ++it)
++*it;
return *this;
}
Something Something::operator++(int)
{
Something ret = *this;
++*this;
return ret;
}
Results
Results (timings are in seconds) with g++ 4.5 on a virtual machine:
Flags (--std=c++0x) ++i i++
-DPACKET_SIZE=50 -O1 1.70 2.39
-DPACKET_SIZE=50 -O3 0.59 1.00
-DPACKET_SIZE=500 -O1 10.51 13.28
-DPACKET_SIZE=500 -O3 4.28 6.82
O(1) increment
Test
Let us now take the following file:
// c.cc
#include <array>
class Something {
public:
Something& operator++();
Something operator++(int);
private:
std::array<int,PACKET_SIZE> data;
};
Something& Something::operator++()
{
return *this;
}
Something Something::operator++(int)
{
Something ret = *this;
++*this;
return ret;
}
The increment itself now does nothing; this simulates the case where incrementation has constant complexity. The postfix version still has to copy the object, however.
Results
The results now differ dramatically:
Flags (--std=c++0x) ++i i++
-DPACKET_SIZE=50 -O1 0.05 0.74
-DPACKET_SIZE=50 -O3 0.08 0.97
-DPACKET_SIZE=500 -O1 0.05 2.79
-DPACKET_SIZE=500 -O3 0.08 2.18
-DPACKET_SIZE=5000 -O3 0.07 21.90
Conclusion
Performance-wise
If you do not need the previous value, make it a habit to use pre-increment. Be consistent even with built-in types: you'll get used to it, and you don't run the risk of suffering an unnecessary performance loss if you ever replace a built-in type with a custom type.
Semantic-wise
i++ says increment i, I am interested in the previous value, though.
++i says increment i, I am interested in the current value or increment i, no interest in the previous value. Again, you'll get used to it, even if you are not right now.
Knuth.
Premature optimization is the root of all evil. As is premature pessimization.

It's not entirely correct to say that the compiler can't optimize away the temporary variable copy in the postfix case. A quick test with VC shows that it, at least, can do that in certain cases.
In the following example, the code generated is identical for prefix and postfix, for instance:
#include <stdio.h>
class Foo
{
public:
Foo() { myData=0; }
Foo(const Foo &rhs) { myData=rhs.myData; }
const Foo& operator++()
{
this->myData++;
return *this;
}
const Foo operator++(int)
{
Foo tmp(*this);
this->myData++;
return tmp;
}
int GetData() { return myData; }
private:
int myData;
};
int main(int argc, char* argv[])
{
Foo testFoo;
int count;
printf("Enter loop count: ");
scanf("%d", &count);
for(int i=0; i<count; i++)
{
testFoo++;
}
printf("Value: %d\n", testFoo.GetData());
}
Whether you do ++testFoo or testFoo++, you'll still get the same resulting code. In fact, without reading the count in from the user, the optimizer got the whole thing down to a constant. So this:
for(int i=0; i<10; i++)
{
testFoo++;
}
printf("Value: %d\n", testFoo.GetData());
Resulted in the following:
00401000 push 0Ah
00401002 push offset string "Value: %d\n" (402104h)
00401007 call dword ptr [__imp__printf (4020A0h)]
So while it's certainly the case that the postfix version could be slower, it may well be that the optimizer will be good enough to get rid of the temporary copy if you're not using it.

The Google C++ Style Guide says:
Preincrement and Predecrement
Use prefix form (++i) of the increment and decrement operators with
iterators and other template objects.
Definition: When a variable is incremented (++i or i++) or decremented (--i or
i--) and the value of the expression is not used, one must decide
whether to preincrement (decrement) or postincrement (decrement).
Pros: When the return value is ignored, the "pre" form (++i) is never less
efficient than the "post" form (i++), and is often more efficient.
This is because post-increment (or decrement) requires a copy of i to
be made, which is the value of the expression. If i is an iterator or
other non-scalar type, copying i could be expensive. Since the two
types of increment behave the same when the value is ignored, why not
just always pre-increment?
Cons: The tradition developed, in C, of using post-increment when the
expression value is not used, especially in for loops. Some find
post-increment easier to read, since the "subject" (i) precedes the
"verb" (++), just like in English.
Decision: For simple scalar (non-object) values there is no reason to prefer one
form and we allow either. For iterators and other template types, use
pre-increment.

++i - faster not using the return value
i++ - faster using the return value
When not using the return value the compiler is guaranteed not to use a temporary in the case of ++i. Not guaranteed to be faster, but guaranteed not to be slower.
When using the return value i++ allows the processor to push both the
increment and the left side into the pipeline since they don't depend on each other. ++i may stall the pipeline because the processor cannot start the left side until the pre-increment operation has meandered all the way through. Again, a pipeline stall is not guaranteed, since the processor may find other useful things to stick in.

I would like to point out an excellent recent post by Andrew Koenig on Code Talk:
http://dobbscodetalk.com/index.php?option=com_myblog&show=Efficiency-versus-intent.html&Itemid=29
At our company, too, we use the convention of ++iter for consistency and performance where applicable. But Andrew raises an overlooked point regarding intent vs. performance: there are times when we want to use iter++ instead of ++iter.
So, first decide your intent; if pre or post does not matter, go with pre, as it gives some performance benefit by avoiding the creation of an extra object that is then thrown away.

@Ketan
...raises over-looked detail regarding intent vs performance. There are times when we want to use iter++ instead of ++iter.
Obviously post- and pre-increment have different semantics, and I'm sure everyone agrees that when the result is used you should use the appropriate operator. I think the question is what one should do when the result is discarded (as in for loops). The answer to this question (IMHO) is that, since the performance considerations are negligible at best, you should do what is more natural. For myself ++i is more natural, but my experience tells me that I'm in a minority, and using i++ will cause less mental overhead for most people reading your code.
After all that's the reason the language is not called "++C".[*]
[*] Insert obligatory discussion about ++C being a more logical name.

Mark: Just wanted to point out that operator++'s are good candidates to be inlined, and if the compiler elects to do so, the redundant copy will be eliminated in most cases. (e.g. POD types, which iterators usually are.)
That said, it's still better style to use ++iter in most cases. :-)

The performance difference between ++i and i++ will be more apparent when you think of operators as value-returning functions and how they are implemented. To make it easier to understand what's happening, the following code examples will use int as if it were a struct.
++i increments the variable, then returns the result. This can be done in-place and with minimal CPU time, requiring only one line of code in many cases:
int& int::operator++() {
return *this += 1;
}
But the same cannot be said of i++.
Post-incrementing, i++, is often seen as returning the original value before incrementing. However, a function can only return a result when it is finished. As a result, it becomes necessary to create a copy of the variable containing the original value, increment the variable, then return the copy holding the original value:
int int::operator++(int) {
int _Original = *this; // copy the pre-increment value
*this += 1;
return _Original; // return the old value
}
When there is no functional difference between pre-increment and post-increment, the compiler can perform optimization such that there is no performance difference between the two. However, if a composite data type such as a struct or class is involved, the copy constructor will be called on post-increment, and it will not be possible to perform this optimization if a deep copy is needed. As such, pre-increment generally is faster and requires less memory than post-increment.

@Mark: I deleted my previous answer because it was a bit flip, and deserved a downvote for that alone. I actually think it's a good question in the sense that it asks what's on the minds of a lot of people.
The usual answer is that ++i is faster than i++, and no doubt it is, but the bigger question is "when should you care?"
If the fraction of CPU time spent in incrementing iterators is less than 10%, then you may not care.
If the fraction of CPU time spent in incrementing iterators is greater than 10%, you can look at which statements are doing that iterating. See if you could just increment integers rather than using iterators. Chances are you could, and while it may be in some sense less desirable, chances are pretty good you will save essentially all the time spent in those iterators.
I've seen an example where the iterator-incrementing was consuming well over 90% of the time. In that case, going to integer-incrementing reduced execution time by essentially that amount. (i.e. better than 10x speedup)

@wilhelmtell
The compiler can elide the temporary. Verbatim from the other thread:
The C++ compiler is allowed to eliminate stack based temporaries even if doing so changes program behavior. MSDN link for VC 8:
http://msdn.microsoft.com/en-us/library/ms364057(VS.80).aspx

Since you asked for C++ too, here is a benchmark for Java (made with JMH):
private static final int LIMIT = 100000;
@Benchmark
public void postIncrement() {
long a = 0;
long b = 0;
for (int i = 0; i < LIMIT; i++) {
b = 3;
a += i * (b++);
}
doNothing(a, b);
}
@Benchmark
public void preIncrement() {
long a = 0;
long b = 0;
for (int i = 0; i < LIMIT; i++) {
b = 3;
a += i * (++b);
}
doNothing(a, b);
}
The result shows that even when the value of the incremented variable (b) is actually used in some computation, forcing the need to store an additional value in the post-increment case, the time per operation is exactly the same:
Benchmark Mode Cnt Score Error Units
IncrementBenchmark.postIncrement avgt 10 0,039 0,001 ms/op
IncrementBenchmark.preIncrement avgt 10 0,039 0,001 ms/op

And the reason why you ought to use ++i even on built-in types, where there's no performance advantage, is to create a good habit for yourself.

Both are equally fast ;)
It is the same calculation for the processor; only the order in which it is done differs.
For example, the following code:
#include <stdio.h>
int main()
{
int a = 0;
a++;
int b = 0;
++b;
return 0;
}
produces the following assembly:
0x0000000100000f24 <main+0>: push %rbp
0x0000000100000f25 <main+1>: mov %rsp,%rbp
0x0000000100000f28 <main+4>: movl $0x0,-0x4(%rbp)
0x0000000100000f2f <main+11>: incl -0x4(%rbp)
0x0000000100000f32 <main+14>: movl $0x0,-0x8(%rbp)
0x0000000100000f39 <main+21>: incl -0x8(%rbp)
0x0000000100000f3c <main+24>: mov $0x0,%eax
0x0000000100000f41 <main+29>: leaveq
0x0000000100000f42 <main+30>: retq
You can see that for both a++ and ++b it's an incl instruction, so it's the same operation ;)

The intended question was about when the result is unused (that's clear from the question for C). Can somebody fix this, since the question is "community wiki"?
About premature optimization, Knuth is often quoted. That's right, but Donald Knuth would never use that to defend the horrible code you can see these days. Ever seen a = b + c among Java Integers (not int)? That amounts to 3 boxing/unboxing conversions. Avoiding stuff like that is important, and uselessly writing i++ instead of ++i is the same mistake.
EDIT: As phresnel nicely puts it in a comment, this can be summed up as "premature optimization is evil, as is premature pessimization".
Even the fact that people are more used to i++ is an unfortunate C legacy, caused by a conceptual mistake by K&R (if you follow the intent argument, that's the logical conclusion; and defending K&R because they're K&R is meaningless: they're great, but they aren't great as language designers; countless mistakes exist in the C design, ranging from gets() to strcpy() to the strncpy() API, which should have been the strlcpy() API since day 1).
Btw, I'm one of those not used enough to C++ to find ++i annoying to read. Still, I use that since I acknowledge that it's right.

i++ is sometimes faster than ++i!
For x86 architectures that use ILP (instruction-level parallelism), i++ might in some situations outperform ++i.
Why? Because of data dependencies. Modern CPUs parallelise a lot of work. If the next few CPU cycles don't have any direct dependency on the incremented value of i, the CPU may delay the increment of i and shove it into a "free slot". This means that you essentially get a "free" increment.
I don't know how far ILP goes in this case, but I suppose if the iterator becomes too complicated and does pointer dereferencing, this might not work.
Here's a talk by Andrei Alexandrescu explaining the concept: https://www.youtube.com/watch?v=vrfYLlR8X8k&list=WL&index=5

++i is faster than i = i + 1 because with i = i + 1 two operations take place: first the increment, then the assignment back to the variable. With ++i only the increment operation takes place.

Time to provide folks with gems of wisdom ;) - there is a simple trick to make C++ postfix increment behave pretty much the same as prefix increment (I invented this for myself, but then saw it in other people's code as well, so I'm not alone).
Basically, the trick is to use a helper class to postpone the increment until after the return, and RAII comes to the rescue:
#include <iostream>
class Data {
private: class DataIncrementer {
private: Data& _dref;
public: DataIncrementer(Data& d) : _dref(d) {}
public: ~DataIncrementer() {
++_dref;
}
};
private: int _data;
public: Data() : _data{0} {}
public: Data(int d) : _data{d} {}
public: Data(const Data& d) : _data{ d._data } {}
public: Data& operator=(const Data& d) {
_data = d._data;
return *this;
}
public: ~Data() {}
public: Data& operator++() { // prefix
++_data;
return *this;
}
public: Data operator++(int) { // postfix
DataIncrementer t(*this);
return *this;
}
public: operator int() {
return _data;
}
};
int
main() {
Data d(1);
std::cout << d << '\n';
std::cout << ++d << '\n';
std::cout << d++ << '\n';
std::cout << d << '\n';
return 0;
}
I invented this for some heavy custom iterator code, and it cuts down run-time. The cost of prefix vs. postfix is one reference now, and if this is a custom operator doing heavy moving around, prefix and postfix yielded the same run-time for me.

++i is faster than i++ because it doesn't return an old copy of the value.
It's also more intuitive:
x = i++; // x contains the old value of i
y = ++i; // y contains the new value of i
This C example prints "02" instead of the "12" you might expect:
#include <stdio.h>
int main(){
int a = 0;
printf("%d", a++);
printf("%d", ++a);
return 0;
}
Same for C++:
#include <iostream>
using namespace std;
int main(){
int a = 0;
cout << a++;
cout << ++a;
return 0;
}

Related

Performance with value semantics

I am very concerned about performance and readability of code, and I get most of my ideas from Chandler Carruth of Google. I would like to apply the following rules in C++ for clean code without losing any performance:
Pass all Builtin types as values
Pass all objects that you don't want to mutate by const reference
Pass all objects that your function needs to consume by value
Ban everything else. When in corner cases, pass a pointer.
That way, functions don't have side effects. That's a must for code readability, and it makes C++ somewhat functional. Now comes performance. What can you do if you want to write a function that adds 1 to every element of a std::vector? Here is my solution:
std::vector<int> add_one(std::vector<int> v) {
for (std::size_t k = 0; k < v.size(); ++k) {
v[k] += 1;
}
return v;
}
...
v = add_one(std::move(v));
...
I find this very elegant and makes only 2 moves. Here are my questions:
Is it legal C++11?
Do you think of any drawback with this design?
Couldn't a compiler automatically transform v = f(v) into this? A kind of copy elision.
PS: People ask me why I don't like to pass by reference. I have 2 arguments:
1 - It does not make explicit at the call site which argument might get mutated.
2 - It sometimes kills performance. References and pointers are hell for the compiler because of aliasing. Let's take the following code:
std::array<double, 2> fval(const std::array<double, 2>& v) {
std::array<double, 2> ans;
ans[0] = cos(v[0] + v[1]);
ans[1] = sin(v[0] + v[1]);
return ans;
}
The same code that takes ans as a reference is 2 times slower:
void fref(const std::array<double, 2>& v,
std::array<double, 2>& ans) {
ans[0] = cos(v[0] + v[1]);
ans[1] = sin(v[0] + v[1]);
}
Pointer aliasing prevents the compiler from computing sin and cos with a single machine instruction (Gcc does not currently do that optimization, but icpc makes the optimization with value semantics).
It appears to be legal C++11 to me. Drawbacks may possibly be opinion-based so I won't address that, but as for your third point, the compiler could only do that transformation if it can prove that the function doesn't alias v in any way. As such, compiler writers may have elected not to implement such an optimization even for the simple cases they can analyze, leaving that burden (proving no aliasing) on the programmer.
However also consider: When the function resists being written in a clear version with obvious performance characteristics perhaps you're writing the wrong function. Instead write an algorithm that operates on a range like the standard library and the problem goes away:
template <typename Iterator>
void add_one(Iterator first, Iterator last)
{
for(; first != last; ++first)
{
(*first) += 1;
}
}

Efficient way to return a std::vector in c++

How much data is copied when returning a std::vector from a function, and how big an optimization is it to place the std::vector on the free store (on the heap) and return a pointer instead? That is, is:
std::vector<int>* f()
{
std::vector<int>* result = new std::vector<int>();
/*
Insert elements into result
*/
return result;
}
more efficient than:
std::vector<int> f()
{
std::vector<int> result;
/*
Insert elements into result
*/
return result;
}
?
In C++11, this is the preferred way:
std::vector<X> f();
That is, return by value.
With C++11, std::vector has move-semantics, which means the local vector declared in your function will be moved on return and in some cases even the move can be elided by the compiler.
You should return by value.
The standard has a specific feature to improve the efficiency of returning by value. It's called "copy elision", and more specifically in this case the "named return value optimization (NRVO)".
Compilers don't have to implement it, but then again compilers don't have to implement function inlining (or perform any optimization at all). But the performance of the standard libraries can be pretty poor if compilers don't optimize, and all serious compilers implement inlining and NRVO (and other optimizations).
When NRVO is applied, there will be no copying in the following code:
std::vector<int> f() {
std::vector<int> result;
... populate the vector ...
return result;
}
std::vector<int> myvec = f();
But the user might want to do this:
std::vector<int> myvec;
... some time later ...
myvec = f();
Copy elision does not prevent a copy here because it's an assignment rather than an initialization. However, you should still return by value. In C++11, the assignment is optimized by something different, called "move semantics". In C++03, the above code does cause a copy, and although in theory an optimizer might be able to avoid it, in practice it's too difficult. So instead of myvec = f(), in C++03 you should write this:
std::vector<int> myvec;
... some time later ...
f().swap(myvec);
There is another option, which is to offer a more flexible interface to the user:
template <typename OutputIterator> void f(OutputIterator it) {
... write elements to the iterator like this ...
*it++ = 0;
*it++ = 1;
}
You can then also support the existing vector-based interface on top of that:
std::vector<int> f() {
std::vector<int> result;
f(std::back_inserter(result));
return result;
}
This might be less efficient than your existing code, if your existing code uses reserve() in a way more complex than just a fixed amount up front. But if your existing code basically calls push_back on the vector repeatedly, then this template-based code ought to be as good.
It's time I post an answer about RVO, me too...
If you return an object by value, the compiler often optimizes this so it doesn't get constructed twice, since it's superfluous to construct it in the function as a temporary and then copy it. This is called return value optimization: the created object will be moved instead of being copied.
A common pre-C++11 idiom is to pass a reference to the object being filled.
Then there is no copying of the vector.
void f( std::vector<int>& result )
{
/*
Insert elements into result
*/
}
If the compiler supports Named Return Value Optimization (http://msdn.microsoft.com/en-us/library/ms364057(v=vs.80).aspx), you can directly return the vector provided that there is no:
Different paths returning different named objects
Multiple return paths (even if the same named object is returned on
all paths) with EH states introduced.
The named object returned is referenced in an inline asm block.
NRVO optimizes out the redundant copy constructor and destructor calls and thus improves overall performance.
There should be no real diff in your example.
vector<string> getseq(char * db_file)
And if you want to print it in main(), you should do it in a loop:
int main(int argc, char* argv[]) {
vector<string> str_vec = getseq(argv[1]);
for(vector<string>::iterator it = str_vec.begin(); it != str_vec.end(); it++) {
cout << *it << endl;
}
}
The following code works without invoking the copy constructor:
Your routine:
std::vector<unsigned char> foo()
{
std::vector<unsigned char> v;
v.resize(16, 0);
return std::move(v); // move the vector
}
Afterwards, you can use the foo routine to get the vector without copying it:
std::vector<unsigned char>&& moved_v(foo()); // use move constructor
Result: moved_v's size is 16 and it is filled with zeros.
As nice as "return by value" might be, it's the kind of code that can lead one into error. Consider the following program:
#include <string>
#include <vector>
#include <iostream>
using namespace std;
static std::vector<std::string> strings;
std::vector<std::string> vecFunc(void) { return strings; };
int main(int argc, char * argv[]){
// set up the vector of strings to hold however
// many strings the user provides on the command line
for(int idx=1; (idx<argc); ++idx){
strings.push_back(argv[idx]);
}
// now, iterate the strings and print them using the vector function
// as accessor
for(std::vector<std::string>::iterator idx=vecFunc().begin(); (idx!=vecFunc().end()); ++idx){
cout << "Addr: " << idx->c_str() << std::endl;
cout << "Val: " << *idx << std::endl;
}
return 0;
};
Q: What will happen when the above is executed? A: A coredump.
Q: Why didn't the compiler catch the mistake? A: Because the program is
syntactically, although not semantically, correct.
Q: What happens if you modify vecFunc() to return a reference? A: The program runs to completion and produces the expected result.
Q: What is the difference? A: The compiler does not
have to create and manage anonymous objects. The programmer has instructed the compiler to use exactly one object for the iterator and for endpoint determination, rather than two different objects as the broken example does.
The above erroneous program will indicate no errors even if one uses the GNU g++ reporting options -Wall -Wextra -Weffc++
If you must produce a value, then the following would work in place of calling vecFunc() twice:
std::vector<std::string> lclvec(vecFunc());
for(std::vector<std::string>::iterator idx=lclvec.begin(); (idx!=lclvec.end()); ++idx)...
The above also produces no anonymous objects during iteration of the loop, but requires a possible copy operation (which, as some note, might be optimized away under some circumstances). But the reference method guarantees that no copy will be produced. Believing the compiler will perform RVO is no substitute for trying to build the most efficient code you can. If you can moot the need for the compiler to do RVO, you are ahead of the game.
vector<string> func1()
{
vector<string> parts;
return vector<string>(parts.begin(), parts.end());
}
This is still efficient from C++11 onwards, as the compiler automatically uses a move instead of making a copy.

Is there any difference between push_back() and resize(size()+1) [closed]

A standard vector can be expanded to hold one more member by doing either:
std::vector<int> v;
v.push_back(1);
or
int os = v.size();
v.resize(os+1);
v[os] = 1;
Apart from the terseness of the code using push_back(), are there any other differences? For example, is one more efficient than the other, or is the extra memory assigned differently in each case?
push_back will copy-construct the new element in place from its argument.
resize will default-construct the new element, and then v[os] will invoke the assignment operator.
Use push_back.
As Lightness says, the allocations in either case are equivalent. See his answer for more details.
Here's an example: http://stacked-crooked.com/view?id=43766666e5c72d282bd94c05e43e8897
Memory expansion
but do they expand memory in the same way?
It is not specified explicitly in C++11 (I checked), but yes.
resize in this case is defined to "append to the container", the only logical interpretation of which makes it equivalent to insert and push_back in terms of allocation.
The back-end of the container implementation usually allocates "more than it needs" in all cases when its memory block is expanded — this is done by the standard not prohibiting it rather than the standard mandating it, and there is no wording to suggest that this is not the case for resize also.
Ultimately, then, it's completely up to the implementation, but I'd be surprised to see a difference in memory allocation rules between the two approaches on any mainstream compiler.
This testcase shows one simple example:
#include <iostream>
#include <vector>
int main() {
std::vector<int> a, b;
for(int i = 0; i != 100; ++i)
a.push_back(0);
for(int i = 0; i != 100; ++i)
b.resize(i+1);
std::cout << a.capacity() << " , " << b.capacity() << std::endl;
}
// Output from my GCC 4.7.2:
// 128 , 128
Note that this one is subtly different:
int main() {
std::vector<int> a, b;
for(int i = 0; i != 100; ++i)
a.push_back(0);
b.resize(100);
std::cout << a.capacity() << " , " << b.capacity() << std::endl;
}
// Output from my GCC 4.7.2:
// 128 , 100
This is not a fair test between the two approaches, because in the latter example we're expanding memory in different increments, and the memory backing sequence containers tends to be expanded in multiples.
Performance concerns
Anyway, as @Pubby identifies, in the resize case you'll do the implicit construction followed by the explicit assignment, which is non-optimal.
In the real world, the real issue here is using the wrong function for the job and writing silly code.
They are not comparable, and you should not base the decision on performance, but that being said the performance might differ quite a lot depending on the implementation.
The standard requires push_back() to have amortized constant time, which basically means that the implementation must grow the buffer for the objects following a geometric series (i.e. when growing, the new size must be proportional to the previous size by a factor F > 1).
There is no such requirement for resize(). As a matter of fact some implementations assume that if you are calling resize() it is because you have better knowledge of what the final size of the vector will be. With that in mind, those implementations will grow the buffer (if needed) to exactly the size that you are requesting, not following a geometric progression. That means that the cost of appending following this mechanism can be O(N^2), instead of O(N) for the push_back case.
Yeah, there can be a significant difference, particularly when using custom data types. Observe:
#include <iostream>
#include <vector>
struct S
{
S()
{
std::cout << "S()\n";
}
S(const S&)
{
std::cout << "S(const S&)\n";
}
S& operator = (const S&)
{
std::cout << "operator =\n";
return *this;
}
~S()
{
std::cout << "~S()\n";
}
};
int main()
{
std::vector<S> v1;
std::cout << "push_back:\n";
v1.push_back(S());
std::vector<S> v2;
std::cout << '\n' << "resize:\n";
v2.resize(1);
v2[0] = S();
std::cout << "\nend\n"; // Ignore destructors after "end" (they're not pertinent to the comparison)
}
Output:
push_back:
S()
S(const S&)
~S()
resize:
S()
S(const S&)
~S()
S()
operator =
~S()
end
~S()
~S()
push_back FTW.
Edit: In response to @Lightness Races in Orbit's comment, this is, of course, just a silly example. S obviously isn't a really "useful" struct, and if you remove all the printing statements, or simply define S as struct S {};, then of course the compiler can optimize a lot of it away.
But since when do you only use incredibly trivial data types in a program? You'll use some trivial data types, but you'll also likely use some non-trivial data types someday too. These non-trivial data types may have expensive constructors, destructors, or assignment operators (or they may not be super expensive, but they may add up if you repeatedly use resize instead of push_back), and push_back would certainly be the right choice.
resize() may allocate memory better when you need to add a few elements.
E.g.:
std::vector<int> v;
for(int i = 0; i < n; ++i)
v.push_back(1);
or
int os = v.size();
v.resize(os + n);
for(int i = 0; i < n; ++i)
v[os + i] = 1;
But in this case it's better to use
v.reserve(os + n);
and then push_back; that avoids the default construction followed by assignment, as in your case.
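The reserve-then-push_back pattern can be sketched like this (the helper name `append_n` is illustrative):

```cpp
#include <cstddef>
#include <vector>

// Append n copies of value: reserve() performs at most one allocation,
// and push_back() then constructs each element directly, with no
// default construction followed by assignment.
void append_n(std::vector<int>& v, std::size_t n, int value) {
    v.reserve(v.size() + n); // capacity grows, size does not
    for (std::size_t i = 0; i < n; ++i)
        v.push_back(value);
}
```

Unlike resize(), reserve() leaves the size unchanged, so no elements are default-constructed only to be overwritten.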

Fastest way to check equality with tolerance within a range?

The following function compares two arrays and returns true if all elements are equal, taking a tolerance into account.
// Equal
template<typename Type>
bool eq(const unsigned int n, const Type* x, const Type* y, const Type tolerance)
{
bool ok = true;
for(unsigned int i = 0; i < n; ++i) {
if (std::abs(x[i]-y[i]) > std::abs(tolerance)) {
ok = false;
break;
}
}
return ok;
}
Is there a way to beat the performance of this function?
Compute abs(tolerance) outside the loop.
You might try unrolling the loop into a 'major' loop and a 'minor' loop where the 'minor' loop's only jump is to its beginning and the 'major' loop has the 'if' and 'break' stuff. Do something like ok &= (x[i]-y[i] < abstol) & (y[i]-x[i] < abstol); in the minor loop to avoid branching -- note & instead of &&.
Then partially unroll and vectorise the minor loop. Then specialise for whatever floating-point types you're actually using and use your platform's SIMD instructions to do the minor loop.
Think before doing this, of course, since it can increase code size and thereby have ill effects on maintainability and sometimes the performance of other parts of your system.
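The branch-free inner loop suggested above might look like the following sketch, assuming the tolerance has already been made non-negative (the name `eq_branchless` is illustrative):

```cpp
#include <cstddef>

// Branch-free tolerance comparison: the loop body accumulates the
// result with bitwise & so it contains no conditional jumps, at the
// cost of always scanning the whole range (no early break).
template <typename Type>
bool eq_branchless(std::size_t n, const Type* x, const Type* y,
                   const Type abstol) { // abstol assumed non-negative
    bool ok = true;
    for (std::size_t i = 0; i < n; ++i) {
        const Type d = x[i] - y[i];
        ok &= (d <= abstol) & (-d <= abstol);
    }
    return ok;
}
```

This trades the early exit for a branch-free body, which can win when mismatches are rare or when the loop vectorises; measure both variants on your data.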
You can avoid those return variable assignments, and precalculate the absolute value of tolerance:
// Equal
template<typename Type>
bool eq(const unsigned int n, const Type* x, const Type* y, const Type tolerance) {
const Type absTolerance = std::abs(tolerance);
for(unsigned int i = 0; i < n; ++i) {
if (std::abs(x[i]-y[i]) > absTolerance) {
return false;
}
}
return true;
}
Also, if you know the tolerance will always be positive, there's no need to calculate its absolute value. If not, you may state that as a precondition.
I would do it like this, you can roll a C++03 version with class functors also, it will be more verbose but should be equally efficient:
std::equal(x, x+n, y, [&tolerance](Type a, Type b) -> bool { return ((a-b) < tolerance) && ((a-b) > -tolerance); });
The major difference is dropping the abs: depending on Type and how abs is implemented, you might get an extra conditional execution path, with lots of branch mispredictions; this should avoid that. The duplicate calculation of a-b will likely be optimized away by the compiler (if it deems it necessary).
Of course, it introduces an extra operator requirement for Type, and if operators < or > are slow, it might be slower than abs (measure it).
Also, std::equal is a standard algorithm that does all that looping and early breaking for you; it's always a good idea to use the standard library for this. It's usually nicer to maintain (in C++11 at least) and may get optimized better because you clearly show intent.
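For reference, a self-contained version of the std::equal approach (the wrapper name `eq_tol` is illustrative):

```cpp
#include <algorithm>
#include <cstddef>

// Tolerance comparison via std::equal: the predicate replaces std::abs
// with two comparisons, avoiding any branch inside an abs call.
template <typename Type>
bool eq_tol(const Type* x, const Type* y, std::size_t n,
            const Type tolerance) {
    return std::equal(x, x + n, y,
        [tolerance](Type a, Type b) {
            return (a - b) < tolerance && (a - b) > -tolerance;
        });
}
```

Note the strict comparisons: at a difference exactly equal to the tolerance this differs from the question's <= semantics.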

Is the book wrong?

template <typename T>
class Table {
public:
Table();
Table(int m, int n);
Table(int m, int n, const T& value);
Table(const Table<T>& rhs);
~Table();
Table<T>& operator=(const Table& rhs);
T& operator()(int i, int j);
int numRows()const;
int numCols()const;
void resize(int m, int n);
void resize(int m, int n, const T& value);
private:
// Make private because this method should only be used
// internally by the class.
void destroy();
private:
int mNumRows;
int mNumCols;
T** mDataMatrix;
};
template <typename T>
void Table<T>::destroy() {
// Does the matrix exist?
if (mDataMatrix) {
for (int i = 0; i < _m; ++i) {
// Does the ith row exist?
if (mDataMatrix[i]) {
// Yes, delete it.
delete[] mDataMatrix[i];
mDataMatrix[i] = 0;
}
}
// Delete the row-array.
delete[] mDataMatrix;
mDataMatrix = 0;
}
mNumRows = 0;
mNumCols = 0;
}
This is a code sample I got from a book. It demonstrates how to destroy or free a 2x2 matrix where mDataMatrix is the pointer to array of pointers.
What I don't understand is this part:
for(int i = 0; i < _m; ++i) {
// Does the ith row exist?
if (mDataMatrix[i]) {
// ...
}
}
I don't know why the book uses _m for the maximum number of rows. It isn't even a variable defined in the class; the member for the row count is mNumRows. Maybe it is some compiler-predefined variable? Another thing I am quite confused about: why is it ++i, the prefix operator, and not i++? Will it make a difference if I change it to i++?
Another thing I am quite confused about: why is it ++i, the prefix operator, and not i++? Will it make a difference if I change it to i++?
Because ++i is more natural and easier to understand: increment i and then yield the variable i as a result. i++ on the other hand means copy the current value of i somewhere (let's call it temp), increment i, and then yield the value temp as a result.
Also, for user-defined types, i++ is potentially slower than ++i.
Note that ++i as a loop increment does not imply the increment happens before entering the loop body or something. (This seems to be a common misconception among beginners.) If you're not using ++i or i++ as part of a larger expression, the semantics are exactly the same, because prefix and postfix increment only differ in their result (incremented variable vs. old value), not in their side effect (incrementing the variable).
Without seeing the entire class code, it is hard to tell for your first question, but if it hasn't been defined as part of the class, my guess would be that it is a typo.
As for your second question, ++i vs. i++: the prefix increment operator (++i) returns the object you are incrementing, whereas the postfix increment operator returns a copy of the object in the object's original state, i.e.:
int i=1;
std::cout << i++ << std::endl; // output: 1
std::cout << i << std::endl;   // output: 2
std::cout << ++i << std::endl; // output: 3
As for whether the code changes with the postfix: no, it works the same in loops, and makes basically no difference in loops for integer types. For user-defined types, however, it may be more efficient to use the prefix increment, and that is the style many C++ programmers use by default.
If the _m variable isn't defined anywhere, this is an error. From the context it looks like it should contain the number of rows that are allocated with new somewhere (probably in the constructor, or there might be methods like addRow). If that number is always mNumRows, then mNumRows would be appropriate for the loop in destroy().
Whether you use ++i or i++ in that for loop makes no difference. Both variants increment the integer, and the return value of the expression (which would differ) isn't used anywhere.
I can't speak to the first part of the question, but I can explain the pre- versus post- increment dilemma.
The prefix versions of increment and decrement can be slightly more efficient for class types and are generally preferred. For built-in types there is no difference, and even for class types the extra overhead of i++ over ++i is usually negligible unless the loop executes many, many times.
As others have said, the prefix operator is preferred for performance reasons when dealing with user-defined types. The reason it has no impact on the for loop is that the test on the variable's value (i.e. i < _m) is evaluated before the increment is performed.
The real mess with this book is the way it represents the matrix. The problem is that for 4 elements you have 3 blocks of memory allocated, and not only does that slow down the program, it is also much trickier to handle.
The usual technique is much simpler:
T* mData = new T[2*2];
And then you access it like so:
T& operator()(size_t r, size_t c) { return mData[r * mNbColumns + c]; }
This is a bit more work (you have to multiply the row index by the number of columns if you are row major), but then the destroy is incredibly easy:
template <class T>
void Table<T>::destroy()
{
delete[] mData;
mData = 0;
mNbRows = 0;
mNbColumns = 0;
}
Also note that there is no need for an if here: it's fine to call delete[] on a null pointer; it just doesn't do anything.
Finally, I have no idea why your book is using int for coordinates; do negative coordinates have any meaning in the context of this Table class? If not, you're better off using an unsigned integral type (like size_t) and throwing the book away.
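Putting the pieces together, here is a minimal sketch of the single-allocation layout described above, assuming row-major storage (the class and member names are illustrative, not from the book):

```cpp
#include <cstddef>

// Single-allocation table: one contiguous block, row-major indexing.
template <class T>
class FlatTable {
public:
    FlatTable(std::size_t rows, std::size_t cols)
        : mRows(rows), mCols(cols), mData(new T[rows * cols]()) {}
    ~FlatTable() { delete[] mData; } // delete[] on null would be a no-op
    // Row-major: scale the row index by the number of columns.
    T& operator()(std::size_t r, std::size_t c) {
        return mData[r * mCols + c];
    }
    std::size_t numRows() const { return mRows; }
    std::size_t numCols() const { return mCols; }
private:
    FlatTable(const FlatTable&);            // copying omitted from the
    FlatTable& operator=(const FlatTable&); // sketch (rule of three)
    std::size_t mRows, mCols;
    T* mData;
};
```

One allocation and one deallocation replace the per-row loop in destroy(), and the elements are contiguous, which is also friendlier to the cache.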