Closed 13 years ago.
Possible Duplicate:
In C arrays why is this true? a[5] == 5[a]
Is the possibility of writing both array[index] and index[array] a compiler feature or a language feature? How is the second one possible?
The compiler will turn
index[array]
into
*(index + array)
With the normal syntax it would turn
array[index]
into
*(array + index)
and thus you see that both expressions evaluate to the same value. This holds for both C and C++.
From the earliest days of C, the expression a[i] was simply the address of a[0] added to i (scaled up by the size of a[0]) and then de-referenced. In fact, all these were equivalent:
a[i]
i[a]
*(a+i)
====
The only thing I'd be concerned about is the actual de-referencing. Whilst they all produce the same address, de-referencing may be a concern if the types of a and i are different.
For example:
int i = 4;
long a[9];
long x = a[i]; //get the long at memory location X.
long x = i[a]; //get the int at memory location X?
I haven't actually tested that behavior but it's something you may want to watch out for. If it does change what gets de-referenced, it's likely to cause all sorts of problems with arrays of objects as well.
====
Update:
You can probably safely ignore the bit above between the ===== lines. I've tested it under Cygwin with a short and a long and it seems okay, so I guess my fears were unfounded, at least for the basic cases. I still have no idea what happens with more complicated ones because it's not something I'm ever likely to want to do.
As Matthew Wilson discusses in Imperfect C++, this can be used to enforce type safety in C++, by preventing use of DIMENSION_OF()-like macros with instances of types that define the subscript operator, as in:
#define DIMENSION_OF_UNSAFE(x) (sizeof(x) / sizeof((x)[0]))
#define DIMENSION_OF_SAFER(x) (sizeof(x) / sizeof(0[(x)]))
int ints[4];
DIMENSION_OF_UNSAFE(ints); // 4
DIMENSION_OF_SAFER(ints); // 4
std::vector<int> v(4);
DIMENSION_OF_UNSAFE(v); // compiles, but gives a bogus, implementation-dependent value
DIMENSION_OF_SAFER(v); // does not compile
There's more to this, for dealing with pointers, but that requires some additional template smarts. Check out the implementation of STLSOFT_NUM_ELEMENTS() in the STLSoft libraries, and read about it all in chapter 14 of Imperfect C++.
edit: some of the commenters suggest that the implementation does not reject pointers. It does (as well as user-defined types), as illustrated by the following program. You can verify this by uncommenting the two commented-out printf lines. (I just did this on Mac/GCC4, and it rejects both forms.)
#include <stlsoft/stlsoft.h>
#include <vector>
#include <stdio.h>

int main()
{
    int ar[1];
    int* p = ar;
    std::vector<int> v(1);
    printf("ar: %lu\n", STLSOFT_NUM_ELEMENTS(ar));
//  printf("p: %lu\n", STLSOFT_NUM_ELEMENTS(p));
//  printf("v: %lu\n", STLSOFT_NUM_ELEMENTS(v));
    return 0;
}
In C and C++ (with array being a pointer or an array) it is a language feature: pointer arithmetic. The operation a[b], where either a or b is a pointer, is converted into pointer arithmetic: *(a + b). With addition being symmetric, reordering does not change the meaning.
Now, there are differences for non-pointers. In fact, given a type A with an overloaded operator[], a[4] is a valid method call (it will call A::operator[](4)), but the opposite, 4[a], will not even compile.
I found something interesting when I tried to check the difference between the addresses of two variables (you can see it in the code below).
#include <stdio.h>
#include <conio.h>
int main() {
    int a, b;
    printf("%d", (int)&a - (int)&b);
    getch();
    return 0;
}
And each time, the result is 12. I do not know why the result is 12; I think it should be 4 (or -4). My computer is 64-bit. Please explain this to me.
No, there is no way you can say that the result must be 4 simply because you know sizeof(int) is 4 in your case.
There is no standard-compliant,
portable way of doing what you are looking for(getting the difference between address of two int variables not part of any array).
Declaring two int vars in two consecutive lines doesn't mean they will be placed in memory consecutively. It might very well be that the ordering is different from what you expected (in this case, for the int a, b I am talking about here). If you want the ints to be adjacent in memory, an array (like int ab[2]) is the only option that ISO C guarantees will give you that on all implementations. (On most C implementations you could also use a struct, but that's in theory not fully portable; see footnote 2.)
As pointed out this code is typecasting the pointer to int which invokes implementation defined behavior. Also note that signed integer overflow is UB, and there is no guarantee that int can hold the address in a particular system. Thus intptr_t should be a safe way to avoid UB and get a merely implementation-defined result from subtracting the integer value of addresses of separate objects.
The nice point, as mentioned, is that if the architecture implements flat addressing (like almost every C implementation in real use), then we can simply cast the pointers to intptr_t and subtract to get the result (see footnote 1). But as said above, the standard never constrained this particular memory layout (it does not demand the architecture be like this), in order to be more robust and applicable to a large number of systems. All of this holds until we consider an implementation on an architecture without a flat address space, which might have to access elements in more complicated ways.
Note: if you run this piece of code with gcc, with or without optimization flags (-O3, -O2, etc.), you will likely get the expected result of +4 or -4. The result of 12 must be specific to the compiler you used (most likely not gcc).
Footnotes
1. Converting an object's address to an integer is a 2-stage process: convert to a void * first, then to an integer like intptr_t/uintptr_t. To print the difference of two such integers, use PRIdPTR/PRIuPTR. intptr_t and uintptr_t are optional types, yet very commonly available since C99. If intptr_t/uintptr_t are not available, cast to the widest available type and use its matching specifier.
#include <inttypes.h>
// printf("%d", (int)&a - (int)&b);
printf("%" PRIdPTR, (intptr_t)(void*)&a - (intptr_t)(void*)&b);
// or pre-C99
printf("%ld", (long)(void*)&a - (long)(void*)&b);
2. Struct layout and type sizes:
In practice, struct intpair { int a,b; } ab; will also have consecutive a and b on mainstream implementations, but ISO C allows arbitrary amounts of padding in struct layouts. It does require that struct members have increasing addresses, though, so the compiler can pad but not reorder your structs. (Same for classes in C++; the rules are the same there.)
And thus to minimize padding (for speed / cache space efficiency), it's often a good idea to sort members from largest to smallest, because many types have an alignment requirement equal to their width. Or to group smaller members in pairs / quads if you want to place them before wider members. Keep in mind that many real implementations differ between 32 / 64-bit pointers, and / or 32 / 64-bit long. e.g. 64-bit pointers and 32-bit long on x86-64 Windows, but 64/64 on x86-64 everything else. Of course pure ISO C only sets minimum ranges of values that types must be able to represent, and that their minimum sizeof is 1, but most modern CPUs (and mainstream C implementations for them) have settled on 32-bit int.
Avoid writing code that depends on assumptions like that for correctness, but it's useful to keep in mind when considering performance.
Since the standard does not specify the arithmetic for pointers to unrelated objects, subtracting such pointers is undefined behavior, and the outcome is implementation-dependent.
Since it is implementation-dependent, one cannot count on the results.
Typically, in my experience, the compiler will allocate the two integers on the program stack close to each other (side by side). In that case, on a system with a flat memory architecture, subtracting the addresses gives the size of an int.
Here is a test to see what your program might give you:
#include <stdio.h>
#include <inttypes.h>

int main() {
    int a, b;
    // Print the addresses of the variables a and b
    printf("Address of b %p\n", (void *)&b);
    printf("Address of a %p\n", (void *)&a);
    // THIS IS PRONE TO UB since the pointers `&a` and `&b` are not related to each other:
    printf("Subtracting pointers: %td\n", &b - &a);
    // Now we subtract the addresses instead:
    // gives the distance in memory on any architecture with flat addressing.
    printf("\nSubtracting addr: %lld\n", (long long)(intptr_t)(void *)&b - (long long)(intptr_t)(void *)&a);
    printf("Subtracting addr: %" PRIdPTR "\n", (intptr_t)(void *)&b - (intptr_t)(void *)&a);
    printf("%" PRIdPTR, (intptr_t)(void *)&a - (intptr_t)(void *)&b);
    return 0;
}
Output:
Address of b 0x7ffc2d3cd2d4
Address of a 0x7ffc2d3cd2d0
Subtracting pointers: 1

Subtracting addr: 4
Subtracting addr: 4
-4
I have been reading a book which says that accessing array elements with pointer arithmetic is much faster than with the [] operator. In short, the second snippet below is claimed to be faster than the first.
The book does not say why. Is it advisable to use such pointer arithmetic if it really provides a significant improvement in speed?
#include <iostream>
using namespace std;
int main() {
    // your code goes here
    double *array = new double[1000000];
    for (int i = 0; i < 1000000; i++)
    {
        array[i] = 0; // slower?
    }
    delete[] array;
    return 0;
}
#include <iostream>
using namespace std;
int main() {
    // your code goes here
    double *array = new double[1000000];
    for (int i = 0; i < 1000000; i++)
    {
        *(array + i) = 0; // faster?
    }
    delete[] array;
    return 0;
}
EDIT:
Quote from book pg 369, 2nd last line
The pointer accessing method is much faster than array indexing.
No, they are exactly the same thing. I definitely suggest you drop that book and pick up another one as soon as possible.
And even if there was any performance difference, the clarity of x[12] over *(x + 12) is much more important.
Array indices are just syntactic sugar for pointer arithmetic. Your compiler will boil down a[i] into *((a) + (i)). Agreed, run away from that book!
For more in-depth explanations, see
SO Answer
Eli Bendersky's explanation
There is no difference at all. If we go to the draft C++ standard, section 5.2.1 Subscripting, paragraph 1 says (emphasis mine):
[...]The expression E1[E2] is identical (by definition) to *((E1)+(E2)) [Note: see 5.3 and 5.7 for details of * and + and 8.3.4 for details of arrays. —end note ]
Utter rubbish. a[x] on a plain array decays into *(a + x). There will be literally zero performance difference.
The book is just plain wrong - especially if those are the actual examples they gave. Decent compilers are likely to produce identical code for both methods, even without optimization and they will have identical performance.
Without optimization, or with compilers from the 80s, you might get performance differences with some types of pointer arithmetic, but the examples don't even represent that case. The examples are basically just different syntax for the same thing.
Here's an example that could plausibly have a performance difference (versus the array index case which is unchanged):
int main() {
    double *array = new double[1000000], *ptr = array;
    for (; ptr < array + 1000000; ptr++)
    {
        *ptr = 0;
    }
    delete[] array;
    return 0;
}
Here, you aren't indexing against the base pointer each time through the loop, but are incrementing the pointer each time. In theory, you avoid the multiplication implicit in indexing, resulting in a faster loop. In practice, any decent compiler can reduce the indexed form to the additive form, and on modern hardware the multiplication by sizeof(double) implied by indexing is often free as part of an instruction like lea (load effective address), so even at the assembly level the indexed version may not be slower (and may in fact be faster since it avoids a loop-carried dependency and also lends itself better to aliasing analysis).
Your two forms are the same, you're not really doing pointer arithmetic.
The pointer form would be:
double *array = new double[10000000];
double *dp = array;
for (int i = 0; i < 10000000; i++, dp++)
{
    *dp = 0;
}
Here, the address in dp is moved to the next element via an add. In the other forms, the address is calculated on each pass through the loop by multiplying i by sizeof(double) and adding it to array. It's the multiply that historically was slower than the add.
This question already has answers here:
Access array beyond the limit in C and C++ [duplicate]
(7 answers)
How dangerous is it to access an array out of bounds?
(12 answers)
Closed 9 years ago.
Say I have an array like so:
int val[10];
and I intentionally index it with everything from negative values to anything higher than 9, but WITHOUT using the resulting value in any way. This would be for performance reasons (perhaps it's more efficient to check the input index AFTER the array access has been made).
My questions are:
Is it safe to do so, or will I run into some sort of memory protection barriers, risk corrupting memory or similar for certain indices?
Is it perhaps inefficient to access data out of range like this? (Assuming the array has no built-in range check.)
Would it be considered bad practice? (Assuming a comment is written to indicate that we're aware of using out-of-range indices.)
It is undefined behavior. By definition, undefined means "anything could happen." Your code could crash, it could work perfectly, it could bring about peace and harmony amongst all humans. I wouldn't bet on the second or the last.
It is Undefined Behavior, and you might actually run afoul of the optimizers.
Imagine this simple code example:
int select(int i) {
    int values[10] = { .... };
    int const result = values[i];
    if (i < 0 or i > 9) throw std::out_of_range("out!");
    return result;
}
And now look at it from an optimizer point of view:
int values[10] = { ... };: valid indexes are in [0, 9].
values[i]: i is an index, thus i is in [0, 9].
if (i < 0 or i > 9) throw std::out_of_range("out!");: i is in [0, 9], never taken
And thus the function rewritten by the optimizer:
int select(int i) {
    int values[10] = { ... };
    return values[i];
}
For more amusing stories about forward and backward propagation of assumptions based on the fact that the developer is not doing anything forbidden, see What every C programmer should know about Undefined Behavior: Part 2.
EDIT:
Possible work-around: if you know that you will access from -M to +N you can:
declare the array with appropriate buffer: int values[M + 10 + N]
offset any access: values[M + i]
As verbose said, this yields undefined behavior. A bit more precision follows.
5.2.1/1 says
[...] The expression E1[E2] is identical (by definition) to *((E1)+(E2))
Hence, val[i] is equivalent to *((val)+i)). Since val is an array, the array-to-pointer conversion (4.2/1) occurs before the addition is performed. Therefore, val[i] is equivalent to *(ptr + i) where ptr is an int* set to &val[0].
Then, 5.7/2 explains what ptr + i points to. It also says (emphasis mine):
[...] If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
In the case of ptr + i, ptr is the pointer operand and the result is ptr + i. According to the quote above, both should point to an element of the array or to one past the last element. That is, in the OP's case ptr + i is a well defined expression for all i = 0, ..., 10. Finally, *(ptr + i) is well defined for 0 <= i < 10 but not for i = 10.
Edit:
I'm puzzled to whether val[10] (or, equivalently, *(ptr + 10)) yields undefined behavior or not (I'm considering C++ not C). In some circumstances this is true (e.g. int x = val[10]; is undefined behavior) but in others this is not so clear. For instance,
int* p = &val[10];
As we have seen, this is equivalent to int* p = &*(ptr + 10); which could be undefined behavior (because it dereferences a pointer to one past the last element of val) or the same as int* p = ptr + 10; which is well defined.
I found these two references which show how fuzzy this question is:
May I take the address of the one-past-the-end element of an array?
Take the address of a one-past-the-end array element via subscript: legal by the C++ Standard or not?
If you put it in a structure with some padding ints, it should be safe (since the pointer actually points to "known" destinations).
But it's better to avoid it.
struct SafeOutOfBoundsAccess
{
    int paddingBefore[6];
    int val[10];
    int paddingAfter[6];
};

void foo()
{
    SafeOutOfBoundsAccess a;
    bool maybeTrue1 = a.val[-1] == a.paddingBefore[5];
    bool maybeTrue2 = a.val[10] == a.paddingAfter[0];
}
Closed 11 years ago.
Possible Duplicate:
Pointer Arithmetic
The given code
int arr[12];
int * cur = arr;
cur++;
cout<<"cur-arr = "<<cur-arr<<endl;
Outputs 1, but I expected sizeof(int). Can someone please explain the nature of this behavior?
It's defined behavior of C pointer arithmetic: it uses the size of the pointed-to type as the unit. If you change the subtraction in the last line to
(char *)cur - (char *)arr
you get 4 in the output.
This is the number of elements (ints here) between arr and cur (which is arr + 1 at the time of the subtraction). The compiler takes note that cur is a pointer to an integer and arr is an integer array. To get the total number of bytes, try this:
(cur - arr) * sizeof(arr[0]);
cur is a pointer to int, initialized to some value (arr - the semantics of array-to-pointer conversion are irrelevant here), incremented (cur++) and compared to its old value. Unsurprisingly, it grew by one through the increment operation.
Pointer arithmetic with a given type works just like regular arithmetic. While the pointer is advanced by sizeof(int) bytes in this example, the difference between pointers is also calculated in units of sizeof(int), so you see plain simple arithmetic.
Addition and subtraction for pointers work in accordance with the pointer's type.
Can someone point me the to the implementation of sizeof operator in C++ and also some description about its implementation.
sizeof is one of the operators that cannot be overloaded.
So does that mean we cannot change its default behavior?
sizeof is not a real operator in C++. It is merely special syntax which inserts a constant equal to the size of the argument. sizeof doesn't need or have any runtime support.
Edit: do you want to know how to determine the size of a class/structure by looking at its definition? The rules for this are part of the ABI, and compilers merely implement them. Basically the rules consist of:
size and alignment definitions for primitive types;
structure, size and alignment of the various pointers;
rules for packing fields in structures;
rules about virtual table-related stuff (more esoteric).
However, ABIs are platform- and often vendor-specific; i.e., on x86 and (say) IA64 the size of A below will be different, because IA64 does not permit unaligned data access.
struct A
{
    char i;
    int j;
};

assert(sizeof(A) == 5);  // x86, MSVC #pragma pack(1)
assert(sizeof(A) == 8);  // x86, MSVC default
assert(sizeof(A) == 16); // IA64
http://en.wikipedia.org/wiki/Sizeof
Basically, to quote Bjarne Stroustrup's C++ FAQ:
Sizeof cannot be overloaded because built-in operations, such as incrementing a pointer into an array implicitly depends on it. Consider:
X a[10];
X* p = &a[3];
X* q = &a[3];
p++; // p points to a[4]
// thus the integer value of p must be
// sizeof(X) larger than the integer value of q
Thus, sizeof(X) could not be given a new and different meaning by the programmer without violating basic language rules.
No, you can't change it. What do you hope to learn from seeing an implementation of it?
What sizeof does can't be written in C++ using more basic operations. It's not a function, or part of a library header like e.g. printf or malloc. It's inside the compiler.
Edit: If the compiler is itself written in C or C++, then you can think of the implementation being something like this:
size_t calculate_sizeof(expression_or_type)
{
    if (is_type(expression_or_type))
    {
        if (is_array_type(expression_or_type))
        {
            return array_size(expression_or_type) *
                calculate_sizeof(underlying_type_of_array(expression_or_type));
        }
        else
        {
            switch (expression_or_type)
            {
            case int_type:
            case unsigned_int_type:
                return 4; // for example
            case char_type:
            case unsigned_char_type:
            case signed_char_type:
                return 1;
            case pointer_type:
                return 4; // for example
            // etc., for all the built-in types
            case class_or_struct_type:
            {
                int base_size = compiler_overhead(expression_or_type);
                for (/* loop over each class member */)
                {
                    base_size += calculate_sizeof(class_member) +
                        padding(class_member);
                }
                return round_up_to_multiple(base_size,
                    alignment_of_type(expression_or_type));
            }
            case union_type:
            {
                int max_size = 0;
                for (/* loop over each class member */)
                {
                    max_size = max(max_size,
                        calculate_sizeof(class_member));
                }
                return round_up_to_multiple(max_size,
                    alignment_of_type(expression_or_type));
            }
            }
        }
    }
    else
    {
        return calculate_sizeof(type_of(expression_or_type));
    }
}
Note that this is very much pseudo-code. There are lots of things I haven't included, but this is the general idea. The compiler probably doesn't actually do this: it probably calculates the size of a type (including a class) once and stores it, instead of recalculating it every time you write sizeof(X). It is also allowed to, e.g., have pointers of different sizes depending on what they point to.
sizeof does what it does at compile time. Operator overloads are simply functions, and do what they do at run time. It is therefore not possible to overload sizeof, even if the C++ Standard allowed it.
sizeof is a compile-time operator, which means that it is evaluated at compile-time.
It cannot be overloaded, because it already has a meaning on all user-defined types - the sizeof() a class is the size that the object the class defines takes in memory, and the sizeof() a variable is the size that the object the variable names occupies in memory.
Unless you need to see how C++-specific sizes are calculated (such as allocation for the v-table), you can look at Plan9's C compiler. It's much simpler than trying to tackle g++.
Variable:
#define getsize_var(x) ((char *)(&(x) + 1) - (char *)&(x))
Type:
#define getsize_type(type) ( (char*)((type*)(1) + 1) - (char*)((type *)(1)))
Take a look at the source for the GNU C++ compiler for a real-world look at how this is done.