I want to write code that compiles conditionally and according to the following two cases:
CASE_A:
for(int i = 1; i <= 10; ++i){
// do something...
}
CASE_B: ( == !CASE_A)
{
const int i = 0;
// do something...
}
That is, in case A, I want to have a normal loop over variable i but, in case B, i want to restrict local scope variable i to only a special case (designated here as i = 0). Obviously, I could write something along the lines:
for(int i = (CASE_A ? 1 : 0); i <= (CASE_A ? 10 : 0); ++i){
// do something
}
However, I do not like this design as it doesn't allow me to take advantage of the const declaration in the special case B. Such declaration would presumably allow for lots of optimization as the body of this loop benefits greatly from a potential compile-time replacement of i by its constant value.
Looking forward to any tips from the community on how to efficiently achieve this.
Thank you!
EDITS:
CASE_A vs CASE_B can be evaluated at compile-time.
i is not passed as reference
i is not re-evaluated in the body (otherwise const would not make sense), but I am not sure the compiler will go through the effort to certify that
Assuming, you aren't over-simplifying your example, it shouldn't matter. Assuming CASE_A can be evaluated at compile-time, the code:
for( int i = 0; i <= 0; ++i ) {
do_something_with( i );
}
is going to generate the same machine code as:
const int i = 0;
do_something_with( i );
for any decent compiler (with optimization turned on, of course).
In researching this, I find there is a fine point here. If i gets passed to a function via a pointer or reference, the compiler can't assume it doesn't change. This is true even if the pointer or reference is const! (Since the const can be cast away in the function.)
Seems to be the obvious solution:
template<int CASE>
void do_case();
template<>
void do_case<CASE_A>()
{
for(int i = 1; i <= 10; ++i){
do_something( i );
}
}
template<>
void do_case<CASE_B>()
{
do_something( 0 );
}
// Usage
...
do_case<CURRENT_CASE>(); // CURRENT_CASE is the compile time constant
If your CASE_B/CASE_B determination can be expressed as a compile time constant, then you can do what you want in a nice, readable format using something like the following (which is just a variation on your example of using the ?: operator for the for loop initialization and condition):
enum {
kLowerBound = (CASE_A ? 1 : 0),
kUpperBound = (CASE_A ? 10 : 0)
};
for (int i = kLowerBound; i <= kUpperBound; ++i) {
// do something
}
This makes it clear that the for loop bounds are compile time constants - note that I think most compilers today would have no problem making that determination even if the ?: expressions were used directly in the for statement's controlling clauses. However, I do think using enums makes it more evident to people reading the code.
Again, any compiler worth its salt today should recognize when i is invariant inside the loop, and in the CASE_B situation also determine that the loop will never iterate. Making i const won't benefit the compiler's optimization possibilities.
If you're convinced that the compiler might be able to optimize better if i is const, then a simple modification can help:
for (int ii = kLowerBound; ii <= kUpperBound; ++ii) {
const int i = ii;
// do something
}
I doubt this will help the compiler much (but check it's output - I could be wrong) if i isn't modified or has its address taken (even by passing it as a reference). However, it might help you make sure that i isn't inappropriately modified or passed by reference/address in the loop.
On the other hand, you might actually see a benefit to optimizations produced by the compiler if you use the const modifier on it - in the cases where the address of i is taken or the const is cast away, the compiler is still permitted to treat i as not being modified for its lifetime. Any modifications that might be made by something that cast away the const would be undefined behavior, so the compiler is allowed to ignore that they might occur. Of course, if you have code that might do this, you have bigger worries than optimization. So it's more important to make sure that there are no 'behind the back' modification attempts to i than to simply marking i as const 'for optimization', but using const might help you identify whether modifications are made (but remember that casts can continue to hide that).
I'm not quite sure that this is what you're looking for, but I'm using this macro version of the vanilla FOR loop which enforces the loop counter to be const to catch any modification of it in the body
#define FOR(type, var, start, maxExclusive, inc) if (bool a = true) for (type var##_ = start; var##_ < maxExclusive; a=true,var##_ += inc) for (const auto var = var##_;a;a=false)
Usage:
#include <stdio.h>
#define FOR(type, var, start, maxExclusive, inc) if (bool a = true) for (type var##_ = start; var##_ < maxExclusive; a=true,var##_ += inc) for (const auto var = var##_;a;a=false)
int main()
{
FOR(int, i, 0, 10, 1) {
printf("i: %d\n", i);
}
// does the same as:
for (int i = 0; i < 10; i++) {
printf("i: %d\n", i);
}
// FOR catches some bugs:
for (int i = 0; i < 10; i++) {
i += 10; // is legal but bad
printf("i: %d\n", i);
}
FOR(int, i, 0, 10, 1) {
i += 10; // is illlegal and will not compile
printf("i: %d\n", i);
}
return 0;
}
Related
I have a problem. I will use a hypothetical example as I do not think that I need to use my actual function as it is kind of complex. This is the example:
int GetNumberTimesTwo(int num)
{
return num * 2;
}
Now, let's assume that if the Number is bigger than two something bad happens. Is there any way how I can force num to be less or equal than 2? Of course, I could do
int GetNumberTimesTwo(int num)
{
if (num > 2)
return;
return num * 2;
}
The problem is that this would be annoying as it just prevents this from happening, but I would like to know about this error before compiling. Meaning, is there something like int num < 2 that I can do?
In my dreams, it could be done like that:
int GetNumberTimesTwo(int num < 2)
{
return num * 2;
}
But as I said, in my dreams, and because I know C++, I know that nothing ever works like I would like it to work, therefore I have to ask what the correct way to do this is.
C++ What would be the best way to set a maximum number for an integer in the function parameters
There are basically two alternatives:
Decide how to handle invalid input at runtime.
Accept a parameter type whose all possible values are valid, thereby making it impossible to pass invalid input.
For 1. there are many alternatives. Here are some:
Simply document the requirement as a precondition and violating the precondition results in undefined behaviour. Assume that input is always valid and don't check for invalid values. This is the most efficient choice, but least safe.
Throw an exception.
Return std::expected (proposed, not yet standard feature) or similar class which contains either result, or error state.
For 2. There are no trivial fool-proof choices. For a small set of valid values, an enum class is reasonably difficult to misuse:
enum class StrangeInput {
zero = 0,
one = 1,
two = 2,
};
int GetNumberTimesTwo(StrangeInput num)
There is no way the compiler knows how to validate this, and if it does what would be the error? stop compiling?
The problem is that you are passing a variable to the function, In the universe of possible values of ints there are for sure values greater than 2 so how would the compiler know that in your execution you are never passing that amount.
This is a typical runtime validation, you should be validating the preconditions in your function and handle these scenarios in case unexpected values comes.
With template and/or constexpr, you might have some kind of check, but requires value known at compile time and possibly restrict available functions to call inside the function:
template <int N>
int GetNumberTimesTwo()
{
static_assert(N <= 2);
return N * 2;
}
constexpr int GetNumberTimesTwo(int num)
{
// only code valid for constexpr
// the restrictions has changed with standard
if (num > 2) throw std::runtime_error("out of range");
return num * 2;
}
constexpr int ok = GetNumberTimesTwo(1);
constexpr int ko = GetNumberTimesTwo(42); // Compile time error
int no_check1 = GetNumberTimesTwo(1); // ok at runtime
int no_check2 = GetNumberTimesTwo(42); // throw at runtime
this is the way to go:
int GetNumberTimesTwo(int num)
{
if (num > 2)
{
return 0;
}
return num * 2;
}
or throw an error:
int GetNumberTimesTwo(int num)
{
if (num < 0)
{
throw std::invalid_argument("count cannot be negative!");
}
return num * 2;
}
I have function that receives an array of pointers like so:
void foo(int *ptrs[], int num, int size)
{
/* The body is an example only */
for (int i = 0; i < size; ++i) {
for (int j = 0; j < num-1; ++j)
ptrs[num-1][i] += ptrs[j][i];
}
}
What I want to convey to the compiler is that the pointers ptrs[i] are not aliases of each other and that the arrays ptrs[i] do not overlap. How shall I do this ? My ulterior motive is to encourage automatic vectorization.
Also, is there a way to get the same effect as __restrict__ on an iterator of a std::vector ?
restrict, unlike the more common const, is a property of the pointer rather than the data pointed to. It therefore belongs on the right side of the '*' declarator-modifier. [] in a parameter declaration is another way to write *. Putting these things together, you should be able to get the effect you want with this function prototype:
void foo(int *restrict *restrict ptrs, int num, int size)
{
/* body */
}
and no need for new names. (Not tested. Your mileage may vary. restrict is a pure optimization hint and may not actually do anything constructive with your compiler.)
Something like:
void foo(int *ptrs[], int num, int size)
{
/* The body is an example only */
for (int i = 0; i < size; ++i) {
for (int j = 0; j < num-1; ++j) {
int * restrict a = ptrs[num-1];
int * restrict b = ptrs[j];
a[i] += b[i];
}
}
... should do it, I think, in C99. I don't think there's any way in C++, but many C++ compilers also support restrict.
In C++, pointer arguments are assumed not to alias if they point to fundamentally different types ("strict aliasing" rules).
In C99, the "restrict" keyword specifies that a pointer argument does not alias any other pointer argument.
Call std::memcpy. Memcpy's definition will have restrict set up if your language/version and compiler support it, and most compilers will lower it into vector instructions if the size of the copied region is small.
int fkt(int &i)
{
return i++;
}
int main()
{
int i = 5;
printf("%d ", fkt(i));
printf("%d ", fkt(i));
printf("%d ", fkt(i));
}
prints '5 6 7 '. Say I want to print '5 7 9 ' like this, is it possible to do it in a similar way without a temporary variable in fkt()? (A temporary variable would marginally decrease efficiency, right?) I.e., something like
return i+=2
or
return i, i+=2;
which both first increases i and then return it, what is not what I need.
Thanks
EDIT: The main reason, I do it in a function and not outside is because fkt will be a function pointer. The original function will do something else with i. I just feel that using {int temp = i; i+=2; return temp;} does not seem as nice as {return i++;}.
I don't care about printf, this is just for illustration of the use of the result.
EDIT2: Wow, that appears to be more of a chat here than a traditional board :) Thanks for all the answers. My fkt is actually this. Depending on some condition, I will define get_it as either get_it_1, get_it_2, or get_it_4:
unsigned int (*get_it)(char*&);
unsigned int get_it_1(char* &s)
{return *((unsigned char*) s++);}
unsigned int get_it_2(char* &s)
{unsigned int tmp = *((unsigned short int*) s); s += 2; return tmp;}
unsigned int get_it_4(char* &s)
{unsigned int tmp = *((unsigned int*) s); s += 4; return tmp;}
For get_it_1, it is so simple... I'll try to give more background in the future...
"A temporary variable would marginally decrease efficiency, right?"
Wrong.
Have you measured it? Please be aware that ++ only has magical efficiency powers on a PDP-11. On most other processors it's just the same as +=1. Please measure the two to see what the actual differences actually are.
(A temporary variable would marginally decrease efficiency, right?)
If that's the main reason you're asking, then you're worrying far too early. Don't second-guess your compiler until you have an actual performance problem.
If you want fkt to be able to add different amounts to i, then you need to pass a parameter. There is no material reason to prefer ++ over +=.
You could increment twice, then subtract, something like:
return (i += 2) - 2
Just answering the question, though, I don't think you should be scared of using a temporary variable.
If you really need that speed, then check the assembly code the compiler outputs. If you do:
int t = i;
i+=2;
return t;
then the compiler may optimize it while inlining the function into something like:
printf("%d ", i);
i+=2;
Which is as good as it's gonna get.
EDIT: Now you say you're jumping to this function via a function pointer? Calling a function by a pointer is pretty slow compared to using a temporary variable. I'm not certain, but I think I remember the Intel docs saying it's somewhere around 20-30 cpu cycles on Core 2s and i7s, that is if your function code is all in cache. Since the compiler cannot inline your function when it is called by a pointer, the ++ will also create a temporary variable anyways.
A temporary variable would marginally
decrease efficiency, right?
Always take in account "premature optimization is the root of all evil". Spend time on other parts then try to optimize this away. += and ++ will probably result in the same thing. Your PC will manage ;)
Well, you could do
return i+=2, i-2;
That said, just follow this simple maxim:
Write code that is easy to understand.
Code that is easy for a human to understand is usually easy for the compiler to understand and hence the compiler can optimise it.
Writing crazy stuff in an effort to optimise your code often just confuses the compiler and makes it harder for the compiler to optimise your code.
I would recommend
int fkt(int &i)
{
int orig = i;
i += 2;
return orig;
}
A "temporary variable" will very unlikely decrease performance, as i++ has to hold the old state internally to (i.e. it implicitly uses what you call a temprorary variable). However, the instruction that is needed to increase by two may be slower than the one used for ++.
You can try:
int fkt(int &i)
{
++ii;
return i++;
}
You can compare this to:
int fkt(int &i)
{
int t = i;
i += 2;
return t;
}
That said, I don't think you should be doing any such performance considerations prematurely.
I think what you really want is a functor:
class Fkt
{
int num;
public:
Fkt(int i) :num(i-2) {}
operator()() { num+=2; return num; }
}
int main()
{
Fkt fkt(5);
printf("%d ", fkt());
printf("%d ", fkt());
printf("%d ", fkt());
}
What about
int a = 0;
++++a;
Increments +2 without temporary variables.
EDIT:
Oops, sorry: didn't notice you want to achieve 5,7,9. This requires temporary variables so prefix notation can't be used.
Alright, I'm guessing this is an easy question, so I'll take the knocks, but I'm not finding what I need on google or SO. I'd like to create an array in one place, and populate it inside a different function.
I define a function:
void someFunction(double results[])
{
for (int i = 0; i<100; ++i)
{
for (int n = 0; n<16; ++n) //note this iteration limit
{
results[n] += i * n;
}
}
}
That's an approximation to what my code is doing, but regardless, shouldn't be running into any overflow or out of bounds issues or anything. I generate an array:
double result[16];
for(int i = 0; i<16; i++)
{
result[i] = -1;
}
then I want to pass it to someFunction
someFunction(result);
When I set breakpoints and step through the code, upon entering someFunction, results is set to the same address as result, and the value there is -1.000000 as expected. However, when I start iterating through the loop, results[n] doesn't seem to resolve to *(results+n) or *(results+n*sizeof(double)), it just seems to resolve to *(results). What I end up with is that instead of populating my result array, I just get one value. What am I doing wrong?
EDIT
Oh fun, I have a typo: it wasn't void someFunction(double results[]). It was:
void someFunction(double result[])...
So perhaps this is turning into a scoping question. If my double result[16] array is defined in a main.cpp, and someFunction is defined in a Utils.h file that's included by the main.cpp, does the result variable in someFunction then wreak havoc on the result array in main?
EDIT 2:
#gf, in the process of trying to reproduce this problem with a fresh project, the original project "magically" started working.
I don't know how to explain it, as nothing changed, but I'm pretty sure of what I saw - my original description of the issue was pretty clear, so I don't think I was hallucinating. I appreciate the time and answers...sorry for wasting your time. I'll update again if it happens again, but for the meantime, I think I'm in the clear. Thanks again.
Just a point about the variable scope part of the question - there is no issue of variable scope here. result/results in your someFunction definition is a parameter -> it will take on the value passed in. There is no relation between variables in a called function and it's caller -> the variables in the caller function are unknown to the called function unless passed in. Also, variable scoping issues do not occur between routines in C++ because there are no nested routines. The following pieces of code would demonstrate scoping issues:
int i = 0;
{
int i = 0;
i = 5; //changes the second i, not the first.
//The first is aliased by the second i defined first.
}
i = 5; //now changes the first i; the inner block is gone and so is its local i
so if C++ did have nested routines, this would cause variable scoping
void main()
{
double results[16];
double blah[16];
doSomething(blah);
void doSomething(double * results)
{
//blah doing something here uses our parameter results,
//which refers to blah, but not to the results in the higher scope.
//The results in the higher scope is hidden.
}
}
void someFunction(double results[])
should be exactly equivalent to
void someFunction(double *results)
Try using the alternative declaration and see if the problem persists.
To me it seems that your code should simply work.
I just tried this in g++ and worked fine. I guess your problem is elsewhere? have you tried the snipped you posted?
#include <iostream>
void someFunction(double results[])
{
for (int i = 0; i<100; ++i)
{
for (int n = 0; n<16; ++n) //note this iteration limit
{
results[n] += i * n;
}
}
}
int main()
{
double result[16];
for(int i = 0; i<16; i++)
{
result[i] = -1;
}
someFunction(result);
for(int i = 0; i<16; i++)
std::cerr << result[i] << " ";
std::cerr << std::endl;
}
Have you perhaps double defined your results array in a couple places and then accidently refered to one copy in one place and another copy elsewhere? Perhaps the second is a pointer and not an array and that is why the debugger is confused?
To ensure this problem doesn't occur, you should never use global variables like that. If you absolutely must have one, put it in a namespace for clarity.
I am trying to show by example that the prefix increment is more efficient than the postfix increment.
In theory this makes sense: i++ needs to be able to return the unincremented original value and therefore store it, whereas ++i can return the incremented value without storing the previous value.
But is there a good example to show this in practice?
I tried the following code:
int array[100];
int main()
{
for(int i = 0; i < sizeof(array)/sizeof(*array); i++)
array[i] = 1;
}
I compiled it using gcc 4.4.0 like this:
gcc -Wa,-adhls -O0 myfile.cpp
I did this again, with the postfix increment changed to a prefix increment:
for(int i = 0; i < sizeof(array)/sizeof(*array); ++i)
The result is identical assembly code in both cases.
This was somewhat unexpected. It seemed like that by turning off optimizations (with -O0) I should see a difference to show the concept. What am I missing? Is there a better example to show this?
In the general case, the post increment will result in a copy where a pre-increment will not. Of course this will be optimized away in a large number of cases and in the cases where it isn't the copy operation will be negligible (ie., for built in types).
Here's a small example that show the potential inefficiency of post-increment.
#include <stdio.h>
class foo
{
public:
int x;
foo() : x(0) {
printf( "construct foo()\n");
};
foo( foo const& other) {
printf( "copy foo()\n");
x = other.x;
};
foo& operator=( foo const& rhs) {
printf( "assign foo()\n");
x = rhs.x;
return *this;
};
foo& operator++() {
printf( "preincrement foo\n");
++x;
return *this;
};
foo operator++( int) {
printf( "postincrement foo\n");
foo temp( *this);
++x;
return temp;
};
};
int main()
{
foo bar;
printf( "\n" "preinc example: \n");
++bar;
printf( "\n" "postinc example: \n");
bar++;
}
The results from an optimized build (which actually removes a second copy operation in the post-increment case due to RVO):
construct foo()
preinc example:
preincrement foo
postinc example:
postincrement foo
copy foo()
In general, if you don't need the semantics of the post-increment, why take the chance that an unnecessary copy will occur?
Of course, it's good to keep in mind that a custom operator++() - either the pre or post variant - is free to return whatever it wants (or even do whatever it wants), and I'd imagine that there are quite a few that don't follow the usual rules. Occasionally I've come across implementations that return "void", which makes the usual semantic difference go away.
You won't see any difference with integers. You need to use iterators or something where post and prefix really do something different. And you need to turn all optimisations on, not off!
I like to follow the rule of "say what you mean".
++i simply increments. i++ increments and has a special, non-intuitive result of evaluation. I only use i++ if I explicitly want that behavior, and use ++i in all other cases. If you follow this practice, when you do see i++ in code, it's obvious that post-increment behavior really was intended.
Several points:
First, you're unlikely to see a major performance difference in any way
Second, your benchmarking is useless if you have optimizations disabled. What we want to know is if this change gives us more or less efficient code, which means that we have to use it with the most efficient code the compiler is able to produce. We don't care whether it is faster in unoptimized builds, we need to know if it is faster in optimized ones.
For built-in datatypes like integers, the compiler is generally able to optimize the difference away. The problem mainly occurs for more complex types with overloaded increment iterators, where the compiler can't trivially see that the two operations would be equivalent in the context.
You should use the code that clearest expresses your intent. Do you want to "add one to the value", or "add one to the value, but keep working on the original value a bit longer"? Usually, the former is the case, and then a pre-increment better expresses your intent.
If you want to show the difference, the simplest option is simply to impement both operators, and point out that one requires an extra copy, the other does not.
This code and its comments should demonstrate the differences between the two.
class a {
int index;
some_ridiculously_big_type big;
//etc...
};
// prefix ++a
void operator++ (a& _a) {
++_a.index
}
// postfix a++
void operator++ (a& _a, int b) {
_a.index++;
}
// now the program
int main (void) {
a my_a;
// prefix:
// 1. updates my_a.index
// 2. copies my_a.index to b
int b = (++my_a).index;
// postfix
// 1. creates a copy of my_a, including the *big* member.
// 2. updates my_a.index
// 3. copies index out of the **copy** of my_a that was created in step 1
int c = (my_a++).index;
}
You can see that the postfix has an extra step (step 1) which involves creating a copy of the object. This has both implications for both memory consumption and runtime. That is why prefix is more efficient that postfix for non-basic types.
Depending on some_ridiculously_big_type and also on whatever you do with the result of the incrememt, you'll be able to see the difference either with or without optimizations.
In response to Mihail, this is a somewhat more portable version his code:
#include <cstdio>
#include <ctime>
using namespace std;
#define SOME_BIG_CONSTANT 100000000
#define OUTER 40
int main( int argc, char * argv[] ) {
int d = 0;
time_t now = time(0);
if ( argc == 1 ) {
for ( int n = 0; n < OUTER; n++ ) {
int i = 0;
while(i < SOME_BIG_CONSTANT) {
d += i++;
}
}
}
else {
for ( int n = 0; n < OUTER; n++ ) {
int i = 0;
while(i < SOME_BIG_CONSTANT) {
d += ++i;
}
}
}
int t = time(0) - now;
printf( "%d\n", t );
return d % 2;
}
The outer loops are there to allow me to fiddle the timings to get something suitable on my platform.
I don't use VC++ any more, so i compiled it (on Windows) with:
g++ -O3 t.cpp
I then ran it by alternating:
a.exe
and
a.exe 1
My timing results were approximately the same for both cases. Sometimes one version would be faster by up to 20% and sometimes the other. This I would guess is due to other processes running on my system.
Try to use while or do something with returned value, e.g.:
#define SOME_BIG_CONSTANT 1000000000
int _tmain(int argc, _TCHAR* argv[])
{
int i = 1;
int d = 0;
DWORD d1 = GetTickCount();
while(i < SOME_BIG_CONSTANT + 1)
{
d += i++;
}
DWORD t1 = GetTickCount() - d1;
printf("%d", d);
printf("\ni++ > %d <\n", t1);
i = 0;
d = 0;
d1 = GetTickCount();
while(i < SOME_BIG_CONSTANT)
{
d += ++i;
}
t1 = GetTickCount() - d1;
printf("%d", d);
printf("\n++i > %d <\n", t1);
return 0;
}
Compiled with VS 2005 using /O2 or /Ox, tried on my desktop and on laptop.
Stably get something around on laptop, on desktop numbers are a bit different (but rate is about the same):
i++ > 8xx <
++i > 6xx <
xx means that numbers are different e.g. 813 vs 640 - still around 20% speed up.
And one more point - if you replace "d +=" with "d = " you will see nice optimization trick:
i++ > 935 <
++i > 0 <
However, it's quite specific. But after all, I don't see any reasons to change my mind and think there is no difference :)
Perhaps you could just show the theoretical difference by writing out both versions with x86 assembly instructions? As many people have pointed out before, compiler will always make its own decisions on how best to compile/assemble the program.
If the example is meant for students not familiar with the x86 instruction set, you might consider using the MIPS32 instruction set -- for some odd reason many people seem to find it to be easier to comprehend than x86 assembly.
Ok, all this prefix/postfix "optimization" is just... some big misunderstanding.
The major idea that i++ returns its original copy and thus requires copying the value.
This may be correct for some unefficient implementations of iterators. However in 99% of cases even with STL iterators there is no difference because compiler knows how to optimize it and the actual iterators are just pointers that look like class. And of course there is no difference for primitive types like integers on pointers.
So... forget about it.
EDIT: Clearification
As I had mentioned, most of STL iterator classes are just pointers wrapped with classes, that have all member functions inlined allowing out-optimization of such irrelevant copy.
And yes, if you have your own iterators without inlined member functions, then it may
work slower. But, you should just understand what compiler does and what does not.
As a small prove, take this code:
int sum1(vector<int> const &v)
{
int n;
for(auto x=v.begin();x!=v.end();x++)
n+=*x;
return n;
}
int sum2(vector<int> const &v)
{
int n;
for(auto x=v.begin();x!=v.end();++x)
n+=*x;
return n;
}
int sum3(set<int> const &v)
{
int n;
for(auto x=v.begin();x!=v.end();x++)
n+=*x;
return n;
}
int sum4(set<int> const &v)
{
int n;
for(auto x=v.begin();x!=v.end();++x)
n+=*x;
return n;
}
Compile it to assembly and compare sum1 and sum2, sum3 and sum4...
I just can tell you... gcc give exactly the same code with -02.