C/C++ intentional out of range indexing [duplicate] - c++

Say I have an array like so:
int val[10];
and I intentionally index it with everything from negative values to anything higher than 9, but WITHOUT using the resulting value in any way. This would be for performance reasons (perhaps it's more efficient to check the input index AFTER the array access has been made).
My questions are:
Is it safe to do so, or will I run into some sort of memory protection barriers, risk corrupting memory or similar for certain indices?
Is it perhaps not at all efficient if I access data out of range like this? (assuming the array has no built in range check).
Would it be considered bad practice? (assuming a comment is written to indicate we're aware of using out of range indices).

It is undefined behavior. By definition, undefined means "anything could happen." Your code could crash, it could work perfectly, it could bring about peace and harmony amongst all humans. I wouldn't bet on the second or the last.

It is Undefined Behavior, and you might actually run afoul of the optimizers.
Imagine this simple code example:
int select(int i) {
int values[10] = { .... };
int const result = values[i];
if (i < 0 or i > 9) throw std::out_of_range("out!");
return result;
}
And now look at it from an optimizer point of view:
int values[10] = { ... };: valid indexes are in [0, 9].
values[i]: i is an index, thus i is in [0, 9].
if (i < 0 or i > 9) throw std::out_of_range("out!");: i is in [0, 9], never taken
And thus the function rewritten by the optimizer:
int select(int i) {
int values[10] = { ... };
return values[i];
}
For more amusing stories about forward and backward propagation of assumptions based on the fact that the developer is not doing anything forbidden, see What every C programmer should know about Undefined Behavior: Part 2.
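For contrast, here is a minimal sketch of the ordering that keeps the check alive: the index is validated before the array is touched, so there is no out-of-range access for the optimizer to reason from. The name select_checked and the table contents are made up for illustration.

```cpp
#include <stdexcept>

// Hypothetical variant of select(): the bounds check comes first, so no
// out-of-range access ever happens and the check cannot be optimized away.
int select_checked(int i) {
    static const int values[10] = {0, 10, 20, 30, 40, 50, 60, 70, 80, 90};
    if (i < 0 || i > 9) throw std::out_of_range("out!");
    return values[i];
}
```

With this ordering the cost is one predictable branch per call, which is usually negligible next to the memory access itself.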
EDIT:
Possible work-around: if you know that you will access from -M to +N you can:
declare the array with appropriate buffer: int values[M + 10 + N]
offset any access: values[M + i]
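The work-around above could look roughly like this. The slack sizes kM and kN and the accessor name at are assumptions chosen for the example, not part of the original answer.

```cpp
// Illustrative bounds: logical indices from -kM to 9 + kN are allowed.
constexpr int kM = 4;   // assumed slack before element 0
constexpr int kN = 6;   // assumed slack after element 9

int values[kM + 10 + kN] = {};   // zero-initialized, including the slack

// Logical index i maps to physical index kM + i, which stays inside the
// buffer for every i with -kM <= i <= 9 + kN.
int& at(int i) { return values[kM + i]; }
```

Every access now stays inside one array object, so the range check can be done afterwards without invoking undefined behavior, provided the caller never exceeds the chosen slack.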

As verbose said, this yields undefined behavior. A bit more precision follows.
5.2.1/1 says
[...] The expression E1[E2] is identical (by definition) to *((E1)+(E2))
Hence, val[i] is equivalent to *((val)+(i)). Since val is an array, the array-to-pointer conversion (4.2/1) occurs before the addition is performed. Therefore, val[i] is equivalent to *(ptr + i) where ptr is an int* set to &val[0].
Then, 5.7/2 explains what ptr + i points to. It also says (emphasis mine):
[...] If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
In the case of ptr + i, ptr is the pointer operand and the result is ptr + i. According to the quote above, both should point to an element of the array or to one past the last element. That is, in the OP's case ptr + i is a well defined expression for all i = 0, ..., 10. Finally, *(ptr + i) is well defined for 0 <= i < 10 but not for i = 10.
Edit:
I'm puzzled as to whether val[10] (or, equivalently, *(ptr + 10)) yields undefined behavior or not (I'm considering C++, not C). In some circumstances it clearly does (e.g. int x = val[10]; is undefined behavior), but in others it is not so clear. For instance,
int* p = &val[10];
As we have seen, this is equivalent to int* p = &*(ptr + 10); which could be undefined behavior (because it dereferences a pointer to one past the last element of val) or the same as int* p = ptr + 10; which is well defined.
I found these two references which show how fuzzy this question is:
May I take the address of the one-past-the-end element of an array?
Take the address of a one-past-the-end array element via subscript: legal by the C++ Standard or not?
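Whatever the answer for &val[10], the form val + 10 is uncontroversial. A small sketch, with a hypothetical function name, that forms the one-past-the-end pointer by arithmetic only and never dereferences it:

```cpp
// Sums the ten elements using a one-past-the-end pointer formed by pointer
// arithmetic only; `end` is compared against but never dereferenced.
int sum10(const int (&val)[10]) {
    const int* end = val + 10;   // well defined: one past the last element
    int total = 0;
    for (const int* p = val; p != end; ++p) total += *p;
    return total;
}
```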

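If you put it in a structure with some padding ints, the out-of-range accesses at least land in memory that belongs to your object (the pointer actually points to "known" destinations). Strictly speaking this is still undefined behavior, since val itself has only 10 elements, and the compiler is free to insert padding between members.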
But it's better to avoid it.
struct SafeOutOfBoundsAccess
{
int paddingBefore[6];
int val[10];
int paddingAfter[6];
};
void foo()
{
SafeOutOfBoundsAccess a;
bool maybeTrue1 = a.val[-1] == a.paddingBefore[5];
bool maybeTrue2 = a.val[10] == a.paddingAfter[0];
}

Related

Negative array index

I have a pointer which is defined as follows:
A ***b;
What does accessing it as follows do:
A** c = b[-1]
Is it an access violation because we are using a negative index to an array? Or is it a legal operation similar to *--b?
EDIT Note that negative array indexing has different support in C and C++. Hence, this is not a dupe.
X[Y] is identical to *(X + Y) as long as one of X and Y is of pointer type and the other has integral type. So b[-1] is the same as *(b - 1), which is an expression that may or may not be evaluated in a well-formed program – it all depends on the initial value of b! For example, the following is perfectly fine:
int q[24];
int * b = q + 13;
b[-1] = 9;
assert(q[12] == 9);
In general, it is your responsibility as a programmer to guarantee that pointers have permissible values when you perform operations with them. If you get it wrong, your program has undefined behaviour. For example:
int * c = q; // q as above
c[-1] = 0; // undefined behaviour!
Finally, just to reinforce the original statement, the following is fine, too:
std::cout << 2["Good morning"] << 4["Stack"] << 8["Overflow\n"];
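Applied to the A*** case from the question, the same rule might be sketched as follows. The empty struct A and the three-slot array are placeholders invented for the example.

```cpp
struct A {};   // placeholder type standing in for the question's A

// b points at the second element of a three-element array, so b - 1 still
// points into the same array and b[-1] is a valid access.
bool negative_index_is_first_slot() {
    A** slots[3] = {nullptr, nullptr, nullptr};
    A*** b = slots + 1;       // b points at slots[1]
    A** c = b[-1];            // same as *(b - 1), i.e. slots[0]
    return &b[-1] == &slots[0] && c == nullptr;
}
```

Had b been initialized to slots itself, b[-1] would point before the array and the behaviour would be undefined.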

C++ pointer comparison with []operator for arrays?

I have been reading a book which says that accessing array elements by pointer arithmetic is much faster than the [] operator. In short, it claims that the second listing below is faster than the first.
The book does not say why. Is it advisable to use such pointer arithmetic, even if it provides a significant improvement in speed?
#include <iostream>
using namespace std;
int main() {
// your code goes here
double *array = new double[1000000];
for(int i = 0; i < 1000000; i++)
{
array[i] = 0;//slower?
}
delete[] array;
return 0;
}
#include <iostream>
using namespace std;
int main() {
// your code goes here
double *array = new double[1000000];
for(int i = 0; i < 1000000; i++)
{
*(array + i) = 0;//faster?
}
delete[] array;
return 0;
}
EDIT:
Quote from book pg 369, 2nd last line
The pointer accessing method is much faster than array indexing.
No, they are exactly the same thing. I definitely suggest you drop that book and pick up another one as soon as possible.
And even if there was any performance difference, the clarity of x[12] over *(x + 12) is much more important.
Array indices are just syntactic sugar for pointer arithmetic. Your compiler will boil down a[i] into *((a) + (i)). Agreed, run away from that book!
For more in-depth explanations, see
SO Answer
Eli Bendersky's explanation
There is no difference at all. Section 5.2.1 (Subscripting), paragraph 1 of the draft C++ standard says (emphasis mine):
[...]The expression E1[E2] is identical (by definition) to *((E1)+(E2)) [Note: see 5.3 and 5.7 for details of * and + and 8.3.4 for details of arrays. —end note ]
Utter rubbish. a[x] on a plain array decays into *(a + x). There will literally be 0 performance difference.
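One way to convince yourself is to compare addresses rather than timings. The helper below is a sketch with a made-up name; it checks that both spellings name the same object for every index.

```cpp
#include <cstddef>

// Returns true if indexing and explicit pointer arithmetic name the same
// object for every i in [0, n).
bool same_element(double* a, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        if (&a[i] != &*(a + i)) return false;
        if (&a[i] != a + i) return false;
    }
    return true;
}
```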
The book is just plain wrong - especially if those are the actual examples they gave. Decent compilers are likely to produce identical code for both methods, even without optimization and they will have identical performance.
Without optimization, or with compilers from the 80s, you might get performance differences with some types of pointer arithmetic, but the examples don't even represent that case. The examples are basically just different syntax for the same thing.
Here's an example that could plausibly have a performance difference (versus the array index case which is unchanged):
int main() {
    // your code goes here
    double *array = new double[1000000], *ptr = array;
    for(; ptr < array + 1000000; ptr++)
    {
        *ptr = 0;
    }
    delete[] array; // release the buffer, as in the question's listings
    return 0;
}
Here, you aren't indexing against the base pointer each time through the loop, but are incrementing the pointer each time. In theory, you avoid the multiplication implicit in indexing, resulting in a faster loop. In practice, any decent compiler can reduce the indexed form to the additive form, and on modern hardware the multiplication by sizeof(double) implied by indexing is often free as part of an instruction like lea (load effective address), so even at the assembly level the indexed version may not be slower (and may in fact be faster since it avoids a loop-carried dependency and also lends itself better to aliasing analysis).
Your two forms are the same, you're not really doing pointer arithmetic.
The pointer form would be:
double * array= new double[10000000] ;
double * dp= array ;
for ( int i= 0 ; ( i < 10000000 ) ; i ++, dp ++ )
{
* dp= 0 ;
}
Here, the address in dp is moved to the next one via an add. In the other forms, the address is calculated on each pass through the loop by multiplying i times sizeof(double) and adding it to array. It's the multiply that historically was slower than the add.

Difference of two addresses in C [duplicate]

The given code
int arr[12];
int * cur = arr;
cur++;
cout<<"cur-arr = "<<cur-arr<<endl;
Outputs 1, but I expected sizeof(int). Can someone please explain the nature of this behavior?
This is the defined behavior of pointer arithmetic: it uses the size of the pointed-to type as its unit. If you change the subtraction in the last line to
(char *)cur - (char *)arr
you get sizeof(int) in the output (4 on most platforms).
This is the number of elements (ints here) between arr and cur (which is arr+1 at the time of subtraction). The compiler takes note that cur is a pointer to an integer and arr is an integer array. To get the total number of bytes, try this:
(cur - arr) * sizeof(arr[0]);
cur is a pointer to int, initialized to some value (arr - the semantics of array-to-pointer conversion are irrelevant here), incremented (cur++) and compared to its old value. Unsurprisingly, it grew by one through the increment operation.
Pointer arithmetic with a given type works just like regular arithmetic. While the pointer is advanced by sizeof(int) bytes in this example, the difference between pointers is also calculated in units of sizeof(int), so you see plain, simple arithmetic.
Addition and subtraction for pointers work according to the pointer type.
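The two units can be put side by side in a short sketch. The function names are invented for the example; the byte count is compared against sizeof(int) rather than a literal 4, since the size of int is platform dependent.

```cpp
#include <cstddef>

// Distance between two int pointers, measured in elements.
std::ptrdiff_t elements_between(const int* first, const int* last) {
    return last - first;
}

// The same distance measured in bytes, computed through char pointers.
std::ptrdiff_t bytes_between(const int* first, const int* last) {
    return reinterpret_cast<const char*>(last) -
           reinterpret_cast<const char*>(first);
}
```

For cur == arr + 1, the first function yields 1 and the second yields sizeof(int).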

assignment operation confusion

What is the output of the following code:
int main() {
int k = (k = 2) + (k = 3) + (k = 5);
printf("%d", k);
}
It does not give any error, why? I think it should give error because the assignment operations are on the same line as the definition of k.
What I mean is int i = i; cannot compile.
But it compiles. Why? What will be the output and why?
int i = i compiles because 3.3.1/1 (C++03) says
The point of declaration for a name is immediately after its complete declarator and before its initializer
So i is initialized with its own indeterminate value.
However the code invokes Undefined Behaviour because k is being modified more than once between two sequence points. Read this FAQ on Undefined Behaviour and Sequence Points
int i = i; first defines the variable and then assigns a value to it. In C you can read from an uninitialized variable. It's never a good idea, and some compilers will issue a warning message, but it's possible.
And in C, assignments are also expressions. The output will be "10", or it would be if you had a 'k' there, instead of an 'a'.
Wow, I got 11 too. I think k is getting assigned to 3 twice and then once to 5 for the addition. Making it just int k = (k=2)+(k=3) yields 6, and int k = (k=2)+(k=4) yields 8, while int k = (k=2)+(k=4)+(k=5) gives 13. int k = (k=2)+(k=4)+(k=5)+(k=6) gives 19 (4+4+5+6).
My guess? The addition is done left to right. The first two (k=x) expressions are added, and the result is stored in a register or on the stack. However, since it is k+k for this expression, both values being added are whatever k currently is, which is the second expression because it is evaluated after the other (overriding its assignment to k). However, after this initial add, the result is stored elsewhere, so it is now safe from tampering (changing k will not affect it). Moving from left to right, each successive addition reassigns k (not affecting the running sum), and adds k to the running sum.
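The numbers reported above describe what one compiler happened to do with undefined behaviour, not something to rely on. A sketch of a version with a defined result, assuming the intent was to assign 2, 3 and 5 in turn and add them up (the function name is hypothetical):

```cpp
// Each assignment is its own statement, so the order of evaluation is fixed
// and k is never modified twice within one expression.
int sum_of_assignments() {
    int k = 2;
    int sum = k;
    k = 3;
    sum += k;
    k = 5;
    sum += k;
    return sum;   // 2 + 3 + 5
}
```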

C++ array[index] vs index[array] [duplicate]

Is the possibility of both array[index] and index[array] a compiler feature or a language feature. How is the second one possible?
The compiler will turn
index[array]
into
*(index + array)
With the normal syntax it would turn
array[index]
into
*(array + index)
and thus you see that both expressions evaluate to the same value. This holds for both C and C++.
From the earliest days of C, the expression a[i] was simply the address of a[0] added to i (scaled up by the size of a[0]) and then de-referenced. In fact, all these were equivalent:
a[i]
i[a]
*(a+i)
====
The only thing I'd be concerned about is the actual de-referencing. Whilst they all produce the same address, de-referencing may be a concern if the types of a and i are different.
For example:
int i = 4;
long a[9];
long x = a[i]; //get the long at memory location X.
long x = i[a]; //get the int at memory location X?
I haven't actually tested that behavior but it's something you may want to watch out for. If it does change what gets de-referenced, it's likely to cause all sorts of problems with arrays of objects as well.
====
Update:
You can probably safely ignore the bit above between the ===== lines. I've tested it under Cygwin with a short and a long and it seems okay, so I guess my fears were unfounded, at least for the basic cases. I still have no idea what happens with more complicated ones because it's not something I'm ever likely to want to do.
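The concern about which type gets de-referenced can also be settled at compile time instead of by experiment. A sketch reusing the i and a from the example above: the result type of the built-in subscript comes from the pointer operand, whichever side it is written on.

```cpp
#include <type_traits>

int i = 4;
long a[9] = {};

// Both expressions are lvalues of type long (hence long& under decltype),
// regardless of which operand is written first.
static_assert(std::is_same<decltype(a[i]), long&>::value, "a[i] is a long");
static_assert(std::is_same<decltype(i[a]), long&>::value, "i[a] is a long");
```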
As Matthew Wilson discusses in Imperfect C++, this can be used to enforce type safety in C++, by preventing use of DIMENSION_OF()-like macros with instances of types that define the subscript operator, as in:
#define DIMENSION_OF_UNSAFE(x) (sizeof(x) / sizeof((x)[0]))
#define DIMENSION_OF_SAFER(x) (sizeof(x) / sizeof(0[(x)]))
int ints[4];
DIMENSION_OF_UNSAFE(ints); // 4
DIMENSION_OF_SAFER(ints); // 4
std::vector<int> v(4);
DIMENSION_OF_UNSAFE(v); // gives impl-defined value; very likely wrong
DIMENSION_OF_SAFER(v); // does not compile
There's more to this, for dealing with pointers, but that requires some additional template smarts. Check out the implementation of STLSOFT_NUM_ELEMENTS() in the STLSoft libraries, and read about it all in chapter 14 of Imperfect C++.
edit: some of the commenters suggest that the implementation does not reject pointers. It does (as well as user-defined types), as illustrated by the following program. You can verify this by uncommenting the two commented-out printf lines. (I just did this on Mac/GCC4, and it rejects both forms.)
#include <stlsoft/stlsoft.h>
#include <vector>
#include <stdio.h>
int main()
{
int ar[1];
int* p = ar;
std::vector<int> v(1);
printf("ar: %lu\n", STLSOFT_NUM_ELEMENTS(ar));
// printf("p: %lu\n", STLSOFT_NUM_ELEMENTS(p));
// printf("v: %lu\n", STLSOFT_NUM_ELEMENTS(v));
return 0;
}
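For readers without STLSoft at hand, the general idea of rejecting pointers and class types can be sketched with a function template. This is a hypothetical helper named dimension_of, not the STLSoft implementation.

```cpp
#include <cstddef>

// Hypothetical helper: it only accepts real arrays, because the parameter
// is a reference to an array of N elements and N is deduced from it.
template <typename T, std::size_t N>
constexpr std::size_t dimension_of(T (&)[N]) {
    return N;
}
```

Passing a pointer or a std::vector fails to compile because no array size can be deduced. Since C++17, std::size from the standard library covers both arrays and containers.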
In C and C++ (with array being a pointer or array) it is a language feature: pointer arithmetic. The operation a[b] where either a or b is a pointer is converted into pointer arithmetic: *(a + b). With addition being commutative, reordering does not change the meaning.
Now, there are differences for non-pointers. In fact, given a type A with an overloaded operator[], a[4] is a valid method call (it will call A::operator[]) but the reverse, 4[a], will not even compile.
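A minimal illustration of that asymmetry, using a made-up type A with a subscript operator:

```cpp
// Minimal user-defined type with a subscript operator (illustrative only).
struct A {
    int data[8] = {};
    int& operator[](int index) { return data[index]; }
};

int read_index_4(A& a) {
    return a[4];      // calls A::operator[](4)
    // return 4[a];   // would not compile: the built-in [] needs a pointer operand
}
```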