Different ways to access an array's element - C++

As far as I know, there are two ways to access an array's element in C++:
int array[5]; //If we have an array of 5 integers
1) Using square brackets
array[i]
2) Using pointers
*(array+i)
My university teacher forces me to use the *(array+i) method, telling me that it's "more optimized".
So, can you please explain: is there any real difference between them? Does the second method have any advantages over the first one?

Is one option more optimized than the other?
Well, let's look at the assembler code generated in practice by MSVC 2013 (NON-OPTIMIZED debug mode):
; 21 : array[i] = 8;
mov eax, DWORD PTR _i$[ebp]
mov DWORD PTR _array$[ebp+eax*4], 8
; 22 : *(array + i) = 8;
mov eax, DWORD PTR _i$[ebp]
mov DWORD PTR _array$[ebp+eax*4], 8
Well, with the best will in the world, I cannot see any difference in the generated code.
By the way, someone recently wrote on SO: premature optimization is the root of all evil. Your teacher should know that!
Does one have an advantage over the other?
Clearly, option 1 has the advantage of being intuitive and readable. Option 2 quickly becomes UNREADABLE in mathematical applications.
Example 1: distance of a 2D mathematical vector implemented as an array.
double v[2] = { 2.0, 1.0 };
// option 1:
double d1 = sqrt(v[0] * v[0] + v[1] * v[1]);
//option 2:
double d2 = sqrt(*v**v + *(v + 1)**(v + 1));
In fact, the second option is really misleading due to the **, because you have to read the formula carefully to understand whether it's a double dereference or a multiplication by a dereferenced pointer. Not to speak of people who might be misled by other languages like Ada, in which ** means "power".
Example 2: calculation of the determinant of a 2x2 matrix
double m[2][2] = { { 1.0, 2.0 }, { 3.0, 4.0 } };
// option 1
double dt1 = m[0][0] * m[1][1] - m[1][0] * m[0][1];
// option 2
double *x = reinterpret_cast<double*>(m);
double dt2 = *x **(x+2*1+1) - *(x+2*1) * *(x+1);
With multidimensional arrays, option 2 is a nightmare. Note that:
I've used a temporary one-dimensional pointer x to be able to write the formula. Using m here would have caused misleading compilation error messages.
You have to know the precise layout of your object, and you have to introduce the size of the first dimension in every formula!
Imagine that later on you want to increase the number of elements in your 2D array: you'll have to rewrite everything, as the sketch below shows.
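To make that fragility concrete, here is a minimal sketch (the 3x3 values are my own) of what happens once the matrix grows:
double m[3][3] = { { 1.0, 2.0, 3.0 },
                   { 4.0, 5.0, 6.0 },
                   { 7.0, 8.0, 9.0 } };
// option 1: the subscript syntax is completely unchanged
double a = m[1][1];
// option 2: every formula must be rewritten with the new stride of 3
double *x = reinterpret_cast<double*>(m);
double b = *(x + 3*1 + 1); // same element as m[1][1]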
Semantic gap
What your teacher is missing here is that operator[] has a meaning that is well understood by both the compiler and the reader. It's an abstraction that does not depend on how your data structure is actually implemented.
Suppose you have an array and some very simple code:
int w[10] {0};
... // put something in w
int sum = 0;
for (int i = 0; i < 10; i++)
sum += w[i];
Later you decide to use a std::vector instead of an array, because you've learnt that it's much more flexible and powerful. All you have to do is change the definition (and initialisation) of w:
vector<int> w(10,0);
The rest of your code will work, because the semantics of [] are the same for the two data structures. I'll let you imagine what would have happened if you had followed your teacher's advice...
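As a minimal sketch (the function wrapper is mine), the same client code compiles against either definition:
#include <vector>
using std::vector;

int sum_all()
{
    //int w[10] {0};      // original definition
    vector<int> w(10, 0); // drop-in replacement
    // ... put something in w
    int sum = 0;
    for (int i = 0; i < 10; i++)
        sum += w[i];      // unchanged: [] means the same thing for both
    return sum;
}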

"My university's teacher forces me to use *(array+i) method, telling me that "it's more optimized"."
What are they telling you please? If you didn't got something completely wrong with this statement1, ask them for a proof regarding the generated assembler code (#Christophe was giving one in his answer here). I don't believe they could give you such, when looking in deeper.
You could easily check this out yourself using the e.g. the -S option of GCC to produce the assembler code, and compare the results achieved with one or the other version.
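For instance (the file layout and function names are mine; any recent GCC will do):
// check.cpp -- produce assembler with: g++ -S -O2 check.cpp
// then compare the bodies of the two functions: they come out identical.
void via_subscript(int *array, int i) { array[i] = 8; }
void via_pointer(int *array, int i) { *(array + i) = 8; }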
Any decent, modern C++ compiler will produce exactly the same assembler code for both of these statements, as long as they refer to C++ fundamental types.
"Does the second method have any advantages over the first one?"
No. The opposite appears to occur, because of less intuitive readability of the code.
1) For class/struct types there could be overloads of T& operator[](int index) that do things behind the scenes, but if so, *(array+i) should be implemented to behave consistently.

My university teacher forces me to use the *(array+i) method, telling me that it's "more optimized".
Your teacher is absolutely wrong.
The standard defines array[i] to be equivalent to *(array+i), and there is no reason for a compiler to treat them otherwise. They are the same. Neither will be "more optimized" than the other.
The only reason to recommend one over the other is convention and readability and, in those competitions, array[i] wins.
I wonder what else your teacher is getting wrong? :(

Related

Signed overflow in C++ and undefined behaviour (UB)

I'm wondering about the use of code like the following
int result = 0;
int factor = 1;
for (...) {
result = ...
factor *= 10;
}
return result;
If the loop iterates n times, then factor is multiplied by 10 exactly n times. However, factor is only ever used after having been multiplied by 10 a total of n-1 times. If we assume that factor never overflows except possibly on the last iteration of the loop, should such code be acceptable? In that case, the value of factor would provably never be used after the overflow has happened.
I'm having a debate about whether code like this should be accepted. It would be possible to put the multiplication inside an if-statement and simply skip it on the last iteration of the loop, when it could overflow. The downside is that this clutters the code and adds an unnecessary branch that would have to be checked on all the previous iterations. I could also run the loop one iteration fewer and replicate the loop body once after the loop, but again, this complicates the code.
The actual code in question is used in a tight inner-loop that consumes a large chunk of the total CPU time in a real-time graphics application.
Compilers do assume that a valid C++ program does not contain UB. Consider for example:
if (x == nullptr) {
*x = 3;
} else {
*x = 5;
}
If x == nullptr, then dereferencing it and assigning a value is UB. Hence the only way this could occur in a valid program is if x == nullptr never yields true, so under the as-if rule the compiler can assume the above is equivalent to:
*x = 5;
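A compilable version of that fragment (the function wrapper is mine) makes the deduction concrete:
int store(int *x)
{
    if (x == nullptr) {
        *x = 3;  // dereferencing a null pointer is UB, so this branch is "impossible"
    } else {
        *x = 5;
    }
    return *x;   // the optimizer may legally compile the whole body as: *x = 5; return 5;
}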
Now in your code
int result = 0;
int factor = 1;
for (...) { // Loop until factor overflows but not more
result = ...
factor *= 10;
}
return result;
The last multiplication of factor cannot happen in a valid program (signed overflow is undefined). Hence the assignment to result cannot happen either. And since there is no way to branch out before the last iteration, the previous iterations cannot happen either. Eventually, the part of the code that is correct (i.e., in which no undefined behaviour ever happens) is:
// nothing :(
The behaviour of int overflow is undefined.
It doesn't matter whether you read factor outside the loop body; if it has overflowed by then, the behaviour of your code on, after, and somewhat paradoxically before the overflow is undefined.
One issue that might arise in keeping this code is that compilers are getting more and more aggressive when it comes to optimisation. In particular, they have developed a habit of assuming that undefined behaviour never happens. Relying on that assumption, they may remove the for loop altogether.
Could you use an unsigned type for factor? You would then need to worry about unwanted conversions of int to unsigned in expressions containing both.
It might be insightful to consider real-world optimizers. Loop unrolling is a known technique. The basic idea of loop unrolling is that
for (int i = 0; i != 3; ++i)
foo()
might be better implemented behind the scenes as
foo()
foo()
foo()
This is the easy case, with a fixed bound. But modern compilers can also do this for variable bounds:
for (int i = 0; i != N; ++i)
foo();
becomes
__RELATIVE_JUMP(3-N)
foo();
foo();
foo();
Obviously this only works if the compiler knows that N<=3. And that's where we get back to the original question:
int result = 0;
int factor = 1;
for (...) {
result = ...
factor *= 10;
}
return result;
Because the compiler knows that signed overflow does not occur, it knows that the loop can execute a maximum of 9 times on 32-bit architectures: 10^10 > 2^31. It can therefore do a 9-iteration loop unroll. But the intended maximum was 10 iterations!
What might happen is that you get a relative jump to an assembly instruction (9-N) with N==10, so an offset of -1, which is the jump instruction itself. Oops. This is a perfectly valid loop optimization for well-defined C++, but the given example turns into a tight infinite loop.
Any signed integer overflow results in undefined behaviour, regardless of whether or not the overflowed value is or might be read.
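You can watch compilers exploit that rule directly. In the following sketch (my example, not from the question), mainstream compilers at -O2 fold the comparison to a constant, precisely because a signed overflow "cannot happen":
bool always_true(int x)
{
    // If x were INT_MAX, x + 1 would overflow; since that is UB, the
    // compiler may assume it never happens and return true unconditionally.
    return x + 1 > x;
}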
Maybe in your use-case you can lift the first iteration out of the loop, turning this
int result = 0;
int factor = 1;
for (int n = 0; n < 10; ++n) {
result += n + factor;
factor *= 10;
}
// factor "is" 10^10 > INT_MAX, UB
into this
int factor = 1;
int result = 0 + factor; // first iteration
for (int n = 1; n < 10; ++n) {
factor *= 10;
result += n + factor;
}
// factor is 10^9 < INT_MAX
With optimization enabled, the compiler might unroll the second loop above into one conditional jump.
This is UB; in ISO C++ terms, the entire behaviour of the program is completely unspecified for an execution that eventually hits UB. The classic example is that, as far as the C++ standard cares, it can make demons fly out of your nose. (I recommend against using an implementation where nasal demons are a real possibility.) See other answers for more details.
Compilers can "cause trouble" at compile time for paths of execution they can see leading to compile-time-visible UB, e.g. assume those basic blocks are never reached.
See also What Every C Programmer Should Know About Undefined Behavior (LLVM blog). As explained there, signed-overflow UB lets compilers prove that for(... i <= n ...) loops are not infinite loops, even for unknown n. It also lets them "promote" int loop counters to pointer width instead of redoing sign-extension. (So the consequence of UB in that case could be accessing outside the low 64k or 4G elements of an array, if you were expecting signed wrapping of i into its value range.)
In some cases compilers will emit an illegal instruction like x86 ud2 for a block that provably causes UB if ever executed. (Note that a function might not ever be called, so compilers can't in general go berserk and break other functions, or even possible paths through a function that don't hit UB. i.e. the machine code it compiles to must still work for all inputs that don't lead to UB.)
Probably the most efficient solution is to manually peel the last iteration so the unneeded factor*=10 can be avoided.
int result = 0;
int factor = 1;
for (... i < n-1) { // stop 1 iteration early
result = ...
factor *= 10;
}
result = ... // another copy of the loop body, using the last factor
// factor *= 10; // and optimize away this dead operation.
return result;
Or if the loop body is large, consider simply using an unsigned type for factor. Then you can let the unsigned multiplication overflow, and it will just do well-defined wrapping modulo a power of 2 (2 raised to the number of value bits in the unsigned type).
This is fine even if you use it with signed types, especially if your unsigned->signed conversion never overflows.
Conversion between unsigned and 2's complement signed is free (same bit-pattern for all values); the modulo wrapping for int -> unsigned specified by the C++ standard simplifies to just using the same bit-pattern, unlike for one's complement or sign/magnitude.
And unsigned->signed is similarly trivial, although it is implementation-defined for values larger than INT_MAX. If you aren't using the huge unsigned result from the last iteration, you have nothing to worry about. But if you are, see Is conversion from unsigned to signed undefined?. The value-doesn't-fit case is implementation-defined, which means that an implementation must pick some behaviour; sane ones just truncate (if necessary) the unsigned bit pattern and use it as signed, because that works for in-range values the same way with no extra work. And it's definitely not UB. So big unsigned values can become negative signed integers. e.g. after int x = u; gcc and clang don't optimize away x>=0 as always being true, even without -fwrapv, because they defined the behaviour.
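Putting that together, a minimal sketch; the 10-iteration bound and the result += n + factor body are borrowed from the earlier example, not from the asker's real code:
int compute()
{
    int result = 0;
    unsigned factor = 1;           // unsigned overflow wraps: well defined
    for (int n = 0; n < 10; ++n) {
        result += n + int(factor); // every factor value used here fits in int
        factor *= 10;              // wraps modulo 2^32 on the last pass, harmlessly
    }
    return result;
}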
If you can tolerate a few additional assembly instructions in the loop, instead of
int factor = 1;
for (int j = 0; j < n; ++j) {
...
factor *= 10;
}
you can write:
int factor = 0;
for (...) {
factor = 10 * factor + !factor;
...
}
to avoid the last multiplication. !factor will not introduce a branch:
xor ebx, ebx
L1:
xor eax, eax
test ebx, ebx
lea edx, [rbx+rbx*4]
sete al
add ebp, 1
lea ebx, [rax+rdx*2]
mov edi, ebx
call consume(int)
cmp r12d, ebp
jne .L1
This code
int factor = 0;
for (...) {
factor = factor ? 10 * factor : 1;
...
}
also results in branchless assembly after optimization:
mov ebx, 1
jmp .L1
.L2:
lea ebx, [rbx+rbx*4]
add ebx, ebx
.L1:
mov edi, ebx
add ebp, 1
call consume(int)
cmp r12d, ebp
jne .L2
(Compiled with GCC 8.3.0 -O3)
You didn't show what's in the parentheses of the for statement, but I'm going to assume it's something like this:
for (int n = 0; n < 10; ++n) {
result = ...
factor *= 10;
}
You can simply move the counter increment and loop termination check into the body:
for (int n = 0; ; ) {
result = ...
if (++n >= 10) break;
factor *= 10;
}
The number of assembly instructions in the loop will remain the same.
Inspired by Andrei Alexandrescu's presentation "Speed Is Found In The Minds of People".
Consider the function:
unsigned mul_mod_65536(unsigned short a, unsigned short b)
{
return (a*b) & 0xFFFFu;
}
According to the published Rationale, the authors of the Standard would have expected that if this function were invoked on (e.g.) a commonplace 32-bit computer with arguments of 0xC000 and 0xC000, promoting the operands of * to signed int would cause the computation to yield -0x70000000, which when converted to unsigned would yield 0x90000000u, the same answer as if they had made unsigned short promote to unsigned. Nonetheless, gcc will sometimes optimize that function in ways that would behave nonsensically if an overflow occurs. Any code where some combination of inputs could cause an overflow must be processed with the -fwrapv option, unless it would be acceptable to allow creators of deliberately malformed input to execute arbitrary code of their choosing.
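If wrapping semantics are acceptable, the usual defensive rewrite (my sketch, not from the answer above) forces the arithmetic into unsigned before the promotion to signed int can bite:
unsigned mul_mod_65536_safe(unsigned short a, unsigned short b)
{
    // 1u * a makes the multiplication happen in unsigned int, so the
    // product wraps modulo 2^32 instead of overflowing a signed int (UB).
    return (1u * a * b) & 0xFFFFu;
}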
Why not this:
int result = 0;
int factor = 10;
for (...) {
factor *= 10;
result = ...
}
return result;
There are many different faces of Undefined Behavior, and what's acceptable depends on the usage.
tight inner-loop that consumes a large chunk of the total CPU time in a real-time graphics application
That, by itself, is a bit of an unusual thing, but be that as it may... if this is indeed the case, then the UB is most probably within the realm "allowable, acceptable". Graphics programming is notorious for hacks and ugly stuff. As long as it "works" and it doesn't take longer than 16.6ms to produce a frame, usually, nobody cares. But still, be aware of what it means to invoke UB.
First, there is the standard. From that point of view, there's nothing to discuss and no way to justify it: your code is simply invalid. There are no ifs or whens; it just isn't valid code. You might as well say that's a middle-finger-up from your point of view, and 95-99% of the time you'll be good to go anyway.
Next, there's the hardware side. There are some uncommon, weird architectures where this is a problem. I'm saying "uncommon, weird" because on the one architecture that makes up 80% of all computers (or the two architectures that together make up 95% of all computers) overflow is a "yeah, whatever, don't care" thing on the hardware level. You sure do get a garbage (although still predictable) result, but no evil things happen.
That is not the case on every architecture, you might very well get a trap on overflow (though seeing how you speak of a graphics application, the chances of being on such an odd architecture are rather small). Is portability an issue? If it is, you may want to abstain.
Last, there is the compiler/optimizer side. One reason overflow is undefined is that simply leaving it at that was easiest to cope with on the hardware, once upon a time. But another reason is that, e.g., x+1 is guaranteed to always be larger than x, and the compiler/optimizer can exploit this knowledge. Now, for the previously mentioned case, compilers are indeed known to act this way and simply strip out complete blocks (there was a Linux exploit some years ago that was based on the compiler having dead-stripped some validation code for exactly this reason).
For your case, I would seriously doubt that the compiler does some special, odd, optimizations. However, what do you know, what do I know. When in doubt, try it out. If it works, you are good to go.
(And finally, there's of course code audit, you might have to waste your time discussing this with an auditor if you're unlucky.)

C++ pointer comparison with the [] operator for arrays?

I have been reading a book which says that accessing array elements with pointer arithmetic is much faster than using the [] operator. In short, the second snippet below is claimed to be faster than the first.
The book does not say why. Is it advisable to use such pointer arithmetic even if it provides a significant improvement in speed?
#include <iostream>
using namespace std;
int main() {
// your code goes here
double *array = new double[1000000];
for(int i = 0; i < 1000000; i++)
{
array[i] = 0;//slower?
}
delete[] array;
return 0;
}
#include <iostream>
using namespace std;
int main() {
// your code goes here
double *array = new double[1000000];
for(int i = 0; i < 1000000; i++)
{
*(array + i) = 0;//faster?
}
delete[] array;
return 0;
}
EDIT:
Quote from book pg 369, 2nd last line
The pointer accessing method is much faster than array indexing.
No, they are exactly the same thing. I definitely suggest you drop that book and pick up another one as soon as possible.
And even if there was any performance difference, the clarity of x[12] over *(x + 12) is much more important.
Array indices are just syntactic sugar for pointer arithmetic. Your compiler will boil down a[i] into *((a) + (i)). Agreed, run away from that book!
For more in-depth explanations, see
SO Answer
Eli Bendersky's explanation
There is no difference at all, if we go to the draft C++ standard section 5.2.1 Subscripting paragraph 1 says (emphasis mine):
[...]The expression E1[E2] is identical (by definition) to *((E1)+(E2)) [Note: see 5.3 and 5.7 for details of * and + and 8.3.4 for details of arrays. —end note ]
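One quirky consequence of that identity (my illustration, not from the standard text): because the built-in + commutes, E2[E1] names the very same element, however unreadable:
int array[5] = {10, 20, 30, 40, 50};
int a = array[2];     // 30
int b = *(array + 2); // 30, by definition the same expression
int c = 2[array];     // 30 as well: *((2) + (array))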
Utter rubbish. a[x] on a plain array decays into *(a + x). There will literally be 0 performance difference.
The book is just plain wrong - especially if those are the actual examples they gave. Decent compilers are likely to produce identical code for both methods, even without optimization and they will have identical performance.
Without optimization, or with compilers from the 80s, you might get performance differences with some types of pointer arithmetic, but the examples don't even represent that case. The examples are basically just different syntax for the same thing.
Here's an example that could plausibly have a performance difference (versus the array index case which is unchanged):
int main() {
// your code goes here
double *array = new double[1000000], *ptr = array;
for(; ptr < array + 1000000; ptr++)
{
*ptr = 0;
}
return 0;
}
Here, you aren't indexing against the base pointer each time through the loop, but are incrementing the pointer each time. In theory, you avoid the multiplication implicit in indexing, resulting in a faster loop. In practice, any decent compiler can reduce the indexed form to the additive form, and on modern hardware the multiplication by sizeof(double) implied by indexing is often free as part of an instruction like lea (load effective address), so even at the assembly level the indexed version may not be slower (and may in fact be faster since it avoids a loop-carried dependency and also lends itself better to aliasing analysis).
Your two forms are the same; you're not really doing pointer arithmetic.
The pointer form would be:
double *array = new double[10000000];
double *dp = array;
for (int i = 0; i < 10000000; i++, dp++)
{
    *dp = 0;
}
Here, the address in dp is moved to the next element via an add. In the other forms, the address is calculated on each pass through the loop by multiplying i by sizeof(double) and adding it to array. It's the multiplication that historically was slower than the add.

Is it always faster to store multiple class calls in a variable?

If you have a method such as this:
float method(myClass foo)
{
return foo.PrivateVar() + foo.PrivateVar();
}
is it always faster/better to do this instead?:
float method(myClass foo)
{
float temp = foo.PrivateVar();
return temp + temp;
}
I know you're not supposed to put a call like foo.PrivateVar() in a for loop, for example, because it is evaluated many times when (in some cases) you actually only need the value once:
for (int i = 0; i < foo.PrivateInt(); i++)
{
//iterate through stuff with 'i'
}
From this I assumed I should change code like the first example into the second, but then people have told me not to try to be smarter than the compiler, and that it could very well inline the calls.
I don't want to profile anything, I just want a few simple rules for good practice on this. I'm writing a demo for a job application and I don't want anyone to look at the code and see some rookie mistake.
That completely depends on what PrivateVar() is doing and where it's defined, etc. If the compiler has access to the code of PrivateVar() and can guarantee that there are no side effects from calling the function, it can do CSE, which is basically what you've done in your second code example.
Exactly the same is true for your for loop. So if you want to be sure it's only evaluated once, because it's a hugely expensive function (which also means that guaranteeing the absence of side effects gets tricky even if there aren't any), write it explicitly.
If PrivateVar() is just a getter, write the clearer code - even if the compiler may not do CSE the performance difference won't matter in 99.9999% of all cases.
Edit: CSE stands for common subexpression elimination and does exactly what it says ;) The wiki page shows an example for a simple multiplication, but we can do this for larger code constructs just as well, for example a function call.
In all cases we have to guarantee that only evaluating the code once doesn't change the semantics, i.e. doing CSE for this code:
a = b++ * c + g;
d = b++ * c * d;
and changing it to:
tmp = b++ * c;
a = tmp + g;
d = tmp * d;
would obviously be illegal (for function calls this is a bit more complex, but it's the same principle). The explicit form of the same idea for the question's loop is sketched below.
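Back to the loop from the question: if you cannot rely on the compiler proving PrivateInt() side-effect-free, hoisting the call yourself is the explicit version of this transformation (foo and PrivateInt are the question's names):
const int count = foo.PrivateInt(); // evaluated exactly once, by construction
for (int i = 0; i < count; i++)
{
    // iterate through stuff with 'i'
}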

Question about optimization in C++

I've read that the C++ standard allows optimization to the point where it can actually interfere with expected functionality. I'm talking about return value optimization, where you might actually have some logic in the copy constructor, yet the compiler optimizes the call away.
I find this somewhat bad: someone who doesn't know about it might spend quite some time tracking down a bug resulting from it.
What I want to know is whether there are any other situations where over-optimization from the compiler can change functionality.
For example, something like:
int x = 1;
x = 1;
x = 1;
x = 1;
might be optimized to a single x=1;
Suppose I have:
class A;
A a = b;
a = b;
a = b;
Could this possibly also be optimized? Probably not the best example, but I hope you know what I mean...
Eliding copy operations is the only case where a compiler is allowed to optimize to the point where side effects visibly change. Do not rely on copy constructors being called, the compiler might optimize away those calls.
For everything else, the "as-if" rule applies: The compiler might optimize as it pleases, as long as the visible side effects are the same as if the compiler had not optimized at all.
("Visible side effects" include, for example, stuff written to the console or the file system, but not runtime and CPU fan speed.)
It might be optimized, yes. But you still have some control over the process. For example, consider this code:
int x = 1;
x = 1;
x = 1;
x = 1;
volatile int y = 1;
y = 1;
y = 1;
y = 1;
Provided that neither x nor y is used below this fragment, VS 2010 generates this code:
int x = 1;
x = 1;
x = 1;
x = 1;
volatile int y = 1;
010B1004 xor eax,eax
010B1006 inc eax
010B1007 mov dword ptr [y],eax
y = 1;
010B100A mov dword ptr [y],eax
y = 1;
010B100D mov dword ptr [y],eax
y = 1;
010B1010 mov dword ptr [y],eax
That is, optimization strips all the lines with "x" and leaves all four lines with "y". This is how volatile works, but the point is that you still have control over what the compiler does for you.
Whether it is a class or a primitive type, it all depends on the compiler and how sophisticated its optimization capabilities are.
Another code fragment for study:
class A
{
private:
int c;
public:
A(int b)
{
*this = b;
}
A& operator = (int b)
{
c = b;
return *this;
}
};
int _tmain(int argc, _TCHAR* argv[])
{
int b = 0;
A a = b;
a = b;
a = b;
return 0;
}
Visual Studio 2010 optimization strips all this code down to nothing; in a release build with "full optimization", _tmain does nothing at all and immediately returns zero.
This will depend on how class A is implemented, whether the compiler can see the implementation, and whether it is smart enough. For example, if operator=() in class A has some side effects, then optimizing the calls out would change the program's behavior and is not possible.
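For instance, a small sketch (my example) of an assignment operator whose visible side effect pins the calls in place:
#include <cstdio>

struct A
{
    A& operator=(const A&)
    {
        std::puts("assigned"); // visible side effect: under the as-if rule
        return *this;          // the compiler may not optimize these calls away
    }
};

int main()
{
    A a, b;
    a = b;
    a = b;
    a = b; // must print "assigned" three times
}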
Optimization does not (properly speaking) "remove calls to copies or assignments".
It converts one finite state machine into another finite state machine with the same external behaviour.
Now, if you repeatedly call
a=b; a=b; a=b;
what the compiler does depends on what operator= actually is.
If the compiler finds that a call has no chance of altering the state of the program (where the "state of the program" is "everything that outlives a scope and that the scope can access"), it will strip the call out.
If this cannot be "demonstrated", the call will stay in place.
Whatever the compiler does, don't worry too much about it: the compiler cannot (by contract) change the external logic of a program or of any part of it.
I don't know C++ that well, but I am currently reading Compilers: Principles, Techniques, and Tools.
Here is a snippet from its section on code optimization:
the machine-independent code-optimization phase attempts to improve intermediate code so that better target code will result. Usually better means faster, but other objectives may be desired, such as shorter code, or target code that consumes less power. For example, a straightforward algorithm generates the intermediate code (1.3), using an instruction for each operator in the tree representation that comes from the semantic analyzer. A simple intermediate code generation algorithm followed by code optimization is a reasonable way to generate good target code. The optimizer can deduce that the conversion of 60 from integer to floating point can be done once and for all at compile time, so the inttofloat operation can be eliminated by replacing the integer 60 by the floating-point number 60.0. Moreover, t3 is used only once to transmit its value to id1, so the optimizer can transform (1.3) into the shorter sequence (1.4).
(1.3)
t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
(1.4)
t1 = id3 * 60.0
id1 = id2 + t1
All in all, I mean to say that code optimization happens at a much deeper level, and because this code is in such a simple state it doesn't affect what your code does.
I had some trouble with const variables and const_cast. The compiler produced incorrect results when the variable was used to calculate something else: the const variable was optimized away and its old value was turned into a compile-time constant. Truly "unexpected behavior". Okay, perhaps not ;)
Example:
const int x = 2;
const_cast<int&>(x) = 3; // undefined behaviour: modifying an object defined const
int y = x * 2;           // the compiler may fold x to its compile-time value of 2
cout << y << endl;       // so this may print 4 rather than 6

C and C++: Array element access pointer vs int

Is there a performance difference if you either do myarray[ i ] or store the adress of myarray[ i ] in a pointer?
Edit: The pointers are all calculated during an unimportant step of my program, where performance is not a criterion. During the critical parts the pointers remain static and are not modified. Now the question is whether these static pointers are faster than using myarray[ i ] all the time.
For this code:
int main() {
int a[100], b[100];
int * p = b;
for ( unsigned int i = 0; i < 100; i++ ) {
a[i] = i;
*p++ = i;
}
return a[1] + b[2];
}
when built with -O3 optimisation in g++, the statement:
a[i] = i;
produced the assembly output:
mov %eax,(%ecx,%eax,4)
and this statement:
*p++ = i;
produced:
mov %eax,(%edx,%eax,4)
So in this case there was no difference between the two. However, this is not and cannot be a general rule - the optimiser might well generate completely different code for even a slightly different input.
It will probably make no difference at all. The compiler will usually be smart enough to know when you are using an expression more than once and create a temporary itself, if appropriate.
Compilers can do surprising optimizations; the only way to know is to read the generated assembly code.
With GCC, use -S, with -masm=intel for Intel syntax.
With VC++, use /FA (IIRC).
You should also enable optimizations: -O2 or -O3 with GCC, and /O2 with VC++.
I prefer using myarray[ i ], since it is clearer and the compiler has an easier time compiling it to optimized code.
When using pointers, it is harder for the compiler to optimize the code, since it's harder to know exactly what you're doing with the pointer.
There should not be much difference, but by using indexing you avoid all sorts of pitfalls that the compiler's optimizer is prone to (aliasing being the most important one), and thus I'd say the indexing case should be easier for the compiler to handle. This doesn't mean that you should handle the aforementioned things before the loop, but pointers in a loop generally just add to the complexity.
Yes. With a pointer, the address isn't recalculated from the initial address of the array; it is accessed directly. So you get a small performance improvement if you save the address in a pointer.
But the compiler will usually optimize the code and use the pointer in both cases (at least for static arrays).
For dynamic arrays (created with new), the pointer may offer more performance, as the compiler cannot always optimize array accesses at compile time.
There will be no substantial difference. Premature optimization is the root of all evil - get a profiler before checking micro-optimizations like this. Also, the myarray[i] is more portable to custom types, such as a std::vector.
Okay, so your question is: what's faster?
#include <cstdio> // for printf

int main(int argc, char **argv)
{
int array[20];
array[0] = 0;
array[1] = 1;
int *value_1 = &array[1];
printf("%d", *value_1);
printf("%d", array[1]);
printf("%d", *(array + 1));
}
As someone else already pointed out, compilers can do clever optimization. Of course this depends on where an expression is used, but normally you shouldn't care about these subtle differences, and all your assumptions can be proven wrong by the compiler. Today you simply shouldn't need to care about them.
For example the above code produces the following (only snippet):
mov [ebp+var_54], 1 #store 1
lea eax, [ebp+var_58] # load the address of array[0]
add eax, 4 # add 4 (size of int)
mov [ebp+var_5C], eax
mov eax, [ebp+var_5C]
mov eax, [eax]
mov [esp+88h+var_84], eax
mov [esp+88h+var_88], offset unk_403000 # points to %d
call printf
mov eax, [ebp+var_54]
mov [esp+88h+var_84], eax
mov [esp+88h+var_88], offset unk_403000
call printf
mov eax, [ebp+var_54]
mov [esp+88h+var_84], eax
mov [esp+88h+var_88], offset unk_403000
call printf
Short answer: the only way to know for sure is to code up both versions and compare performance. I would personally be surprised if there was a measureable difference unless you were doing a lot of array accesses in a really tight loop. If this is something that happens once or twice over the lifetime of the program, or depends on user input, it's not worth worrying about.
Remember that the expression a[i] is evaluated as *(a+i), which is an addition plus a dereference, whereas *p is just a dereference. Depending on how the code is structured, though, it may not make a difference. Assume the following:
int a[N]; // for any arbitrary N > 1
int *p = a;
size_t i;
for (i = 0; i < N; i++)
printf("a[%d] = %d\n", i, a[i]);
for (i = 0; i < N; i++)
printf("*(%p) = %d\n", (void*) p, *p++);
Now we're comparing a[i] to *p++, which is a dereference plus a postincrement (in addition to the i++ in the loop control); that may turn out to be a more expensive operation than the array subscript. Not to mention we've introduced another variable that's not strictly necessary; we're trading a little space for what may or may not be an improvement in speed. It really depends on the compiler, the structure of the code, optimization settings, OS, and CPU.
Worry about correctness first, then worry about readability/maintainability, then worry about safety/reliability, then worry about performance. Unless you're failing to meet a hard performance requirement, focus on making your intent clear and easy to understand. It doesn't matter how fast your code is if it gives you the wrong answer or performs the wrong action, or if it crashes horribly at the first hint of bad input, or if you can't fix bugs or add new features without breaking something.
Yes: storing a pointer to myarray[i] will perform better (if used at a large scale...).
Why?
It will save you an addition and maybe a multiplication (or a shift).
Many compilers may optimize that for you in the case of static memory allocation.
If you are using dynamic memory allocation, the compiler will not optimize it, because it happens at runtime!