Possible duplicate of: Pointer Arithmetic (closed 11 years ago).
The following code

int arr[12];
int *cur = arr;
cur++;
cout << "cur-arr = " << cur - arr << endl;

outputs 1, but I expected sizeof(int). Can someone please explain this behavior?
This is the defined behavior of C pointer arithmetic: it uses the size of the pointed-to type as its unit. If you change the subtraction in the last line to

(char *)cur - (char *)arr

you get 4 in the output (assuming sizeof(int) is 4 on your platform).
This is the number of elements (ints here) between arr and cur (which is arr + 1 at the time of the subtraction). The compiler knows that cur is a pointer to int and that arr is an int array, so the difference is expressed in elements, not bytes. To get the total number of bytes, try this:

(cur - arr) * sizeof(arr[0]);
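A minimal sketch putting both forms together (the printed byte distance is whatever sizeof(int) happens to be on your platform, typically 4):

#include <iostream>

int main() {
    int arr[12];
    int *cur = arr;
    cur++;

    std::cout << "elements: " << (cur - arr) << '\n';                    // 1
    std::cout << "bytes:    " << (cur - arr) * sizeof(arr[0]) << '\n';   // sizeof(int)
    std::cout << "bytes:    " << ((char*)cur - (char*)arr) << '\n';      // same value
}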
cur is a pointer to int, initialized to some value (arr - the semantics of the array-to-pointer conversion are irrelevant here), incremented (cur++) and compared to its old value. Unsurprisingly, it grew by one through the increment operation.
Pointer arithmetic on a given type works just like regular arithmetic: while the pointer is advanced by sizeof(int) bytes in this example, the difference between pointers is also measured in units of sizeof(int), so what you see is plain, simple arithmetic.
Addition and subtraction for pointers work in units of the pointed-to type.
Related
I have a question about pointers, and memory addresses:
Supposing I have the following code:
int * array = (int *) malloc(sizeof(int) * 4);
Now array stores a memory address. I know that C++ already takes care of scaling: adding +1 to this pointer adds 4 bytes (here). But what if I want to add 4 bytes manually?
array + 0x004
If I'm correct, this will add 4*4 (16) bytes, but my idea is to add exactly those 4 bytes manually.
Why? Just playing around. I tried this and got a totally different result from what I expected; then I researched and saw that C++ already scales the addition when you add +1 to a pointer (it adds 4 bytes in this case).
Any idea?
For a pointer p to a type T holding the value v, the expression p+n will (on most systems, anyway) yield a pointer to the address v + n*sizeof(T). To apply a fixed byte offset to the pointer, you can first cast it to a character pointer, like this:
reinterpret_cast<T*>(reinterpret_cast<char*>(p) + n)
In C++, sizeof(char) is defined to be 1.
Do note that accessing improperly aligned values can carry a large performance penalty, and on some platforms can even be undefined behavior.
Another thing to note: in general, accessing an object through a pointer to an unrelated type is not allowed (the strict aliasing rule), but an exception is explicitly made for char pointers, so casting any pointer type to char* and back is fine.
The trick is to convert the pointer to a pointer type whose element size is 1 byte, or to store the pointer value in an integer:
#include <stdint.h>

int* increment_1(int* ptr) {
    // C-style cast through char*
    return (int*)(((char*)ptr) + 4);
}

int* increment_2(int* ptr) {
    // C++-style cast through char*
    char* result = reinterpret_cast<char*>(ptr);
    result += 4;
    return reinterpret_cast<int*>(result);
}

int* increment_3(int* ptr) {
    // Store the pointer value in an integer
    intptr_t result = reinterpret_cast<intptr_t>(ptr);
    result += 4;
    return reinterpret_cast<int*>(result);
}
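A hedged usage sketch: when sizeof(int) happens to be 4, a 4-byte offset lands on the next int, so all three helpers agree with ordinary pointer arithmetic:

#include <cassert>

int main() {
    int data[2] = {1, 2};
    if (sizeof(int) == 4) {
        // A 4-byte offset is exactly one element on this platform.
        assert(increment_1(data) == data + 1);
        assert(increment_2(data) == data + 1);
        assert(increment_3(data) == data + 1);
    }
}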
Consider that if you add an arbitrary number of bytes to an address of an object of type T, it no longer makes sense to use a pointer of type T, since there might not be an object of type T at the incremented memory address.
If you want to access a particular byte of an object, you can do so through a pointer to char, unsigned char or std::byte. Objects of those types have a size of one byte, so incrementing such a pointer behaves just as you would like. Furthermore, while the rules of C++ disallow accessing objects through incompatible pointers, these three types are exempt from that rule and may be used to access objects of any type.
So, given
int * array = ....
You can get a pointer to the byte at index 4 like this:
auto ptr = reinterpret_cast<unsigned char*>(array);
auto byte_at_index_4 = ptr + 4;
array + 0x004
If I'm correct, this will add 4*4 (16) bytes
Assuming sizeof(int) happens to be 4, then yes. But the size of int is not guaranteed to be 4.
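A quick sketch of that scaling, under the stated assumption that sizeof(int) is 4 on the target platform:

#include <cstdlib>

int main() {
    int *array = static_cast<int*>(std::malloc(sizeof(int) * 4));
    int *p = array + 0x004;                  // advances by 4 elements, not 4 bytes
    // With a 4-byte int, p sits 16 bytes past array:
    //   (char*)p - (char*)array == 16
    std::free(array);
}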
I am looking at some old code of mine that manipulates data buffers. I have many places that have:
char *ptr1;
char *ptr2;
And then I need to find the number of bytes between the two.
int distance = ptr2 - ptr1;
I am getting a lot of warnings about truncation. What is the type of
ptr2 - ptr1
I have found a number of answers dealing with pointer arithmetic but, oddly, not an answer to this particular question.
The result of subtracting two pointers† is a std::ptrdiff_t. It is an implementation-defined signed integer type; it could be larger than what an int can store.
See http://en.cppreference.com/w/cpp/types/ptrdiff_t for more information.
†You can only subtract pointers if they point to elements of the same array (or one past its end); otherwise it's UB.
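A minimal sketch of storing the difference without a truncation warning, using std::ptrdiff_t:

#include <cstddef>
#include <cstdio>

int main() {
    char buffer[100];
    char *ptr1 = buffer;
    char *ptr2 = buffer + 42;

    std::ptrdiff_t distance = ptr2 - ptr1;   // the subtraction's actual type
    std::printf("%td\n", distance);          // prints 42
}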
Possible duplicate of: Pointer Arithmetic In C (2 answers; closed 8 years ago).
A project I did last year involved pointer arithmetic. When I did that, I was able to treat pointers like memory addresses and add or subtract from them as I wanted. For example, if int* p = &array[0], then you'd know that p + sizeof(int) would find array[1]. That doesn't seem to be the case anymore, as I have a relatively well-known interview question in front of me in which I have to debug the following code:
void
ReverseTheArray( const short *pArrayStart, int nArrayByteLength )
{
    short const *pArrayEnd = (pArrayStart + nArrayByteLength);

    while(pArrayStart != pArrayEnd)
    {
        short tmp = *pArrayStart;
        *pArrayStart = *pArrayEnd;
        *pArrayEnd = tmp;
        pArrayStart++;
        pArrayEnd--;
    }
}
Note the last two lines - I would have bet that these were wrong because simply adding 1 to the pointer wouldn't work; you would need to add sizeof(short). But from testing the code it would seem I'm wrong - "pArrayStart++" adds sizeof(short) to the pointer, not 1.
When did this change? Can anyone give me some insight into what I'm wrong about so that I can not look stupid if I'm asked about this?
Edit: Okay - seems like it's always been that way. My bad.
The type of a pointer exists largely for this purpose: "pointer to int" means that adding one skips 4 bytes (if int is 4 bytes on that machine).
Update: Of course, in addition to describing what type of data it points to.
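For reference, here is one possible way to repair the interview snippet (a sketch, not necessarily the intended answer: it assumes nArrayByteLength really is a byte count, converts it to an element count, and fixes the const and loop-condition problems):

#include <cstddef>

void ReverseTheArray(short *pArrayStart, std::size_t nArrayByteLength)
{
    std::size_t count = nArrayByteLength / sizeof(short);  // bytes -> elements
    if (count == 0)
        return;

    short *pArrayEnd = pArrayStart + count - 1;            // last element, not one past the end

    while (pArrayStart < pArrayEnd)                        // '<' so the pointers cannot cross
    {
        short tmp = *pArrayStart;
        *pArrayStart = *pArrayEnd;
        *pArrayEnd = tmp;
        pArrayStart++;
        pArrayEnd--;
    }
}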
Possible duplicate of: Access array beyond the limit in C and C++ (7 answers) and How dangerous is it to access an array out of bounds? (12 answers). Closed 9 years ago.
Say I have an array like so:
int val[10];
and I intentionally index it with everything from negative values to anything higher than 9, but WITHOUT using the resulting value in any way. This would be for performance reasons (perhaps it's more efficient to check the input index AFTER the array access has been made).
My questions are:
Is it safe to do so, or will I run into some sort of memory protection barriers, risk corrupting memory or similar for certain indices?
Is it perhaps not at all efficient if I access data out of range like this? (assuming the array has no built in range check).
Would it be considered bad practice? (assuming a comment is written to indicate we're aware of using out of range indices).
It is undefined behavior. By definition, undefined means "anything could happen." Your code could crash, it could work perfectly, it could bring about peace and harmony amongst all humans. I wouldn't bet on the second or the last.
It is Undefined Behavior, and you might actually run afoul of the optimizers.
Imagine this simple code example:
int select(int i) {
    int values[10] = { .... };
    int const result = values[i];
    if (i < 0 or i > 9) throw std::out_of_range("out!");
    return result;
}
And now look at it from an optimizer point of view:
int values[10] = { ... };: valid indexes are in [0, 9].
values[i]: i is an index, thus i is in [0, 9].
if (i < 0 or i > 9) throw std::out_of_range("out!");: since i is in [0, 9], this branch is never taken and can be removed
And thus the function rewritten by the optimizer:
int select(int i) {
    int values[10] = { ... };
    return values[i];
}
For more amusing stories about forward and backward propagation of assumptions based on the fact that the developer is not doing anything forbidden, see What every C programmer should know about Undefined Behavior: Part 2.
EDIT:
Possible work-around: if you know that you will access indices from -M to +N, you can:
declare the array with an appropriate buffer: int values[M + 10 + N]
offset every access: values[M + i] (see the sketch below)
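A rough sketch of that workaround, with hypothetical bounds M and N:

#include <cassert>

int main() {
    constexpr int M = 2;             // how far below index 0 we may probe
    constexpr int N = 3;             // how far above index 9 we may probe

    int storage[M + 10 + N] = {};
    int *values = storage + M;       // values[-M] .. values[9 + N] all stay inside storage

    int probe = values[-1];          // well defined: it is really storage[M - 1]
    assert(probe == 0);
}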
As verbose said, this yields undefined behavior. A bit more precision follows.
5.2.1/1 says
[...] The expression E1[E2] is identical (by definition) to *((E1)+(E2))
Hence, val[i] is equivalent to *((val)+(i)). Since val is an array, the array-to-pointer conversion (4.2/1) occurs before the addition is performed. Therefore, val[i] is equivalent to *(ptr + i) where ptr is an int* set to &val[0].
Then, 5.7/2 explains what ptr + i points to. It also says (emphasis mine):
[...] If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
In the case of ptr + i, ptr is the pointer operand and the result is ptr + i. According to the quote above, both should point to an element of the array or to one past the last element. That is, in the OP's case ptr + i is a well defined expression for all i = 0, ..., 10. Finally, *(ptr + i) is well defined for 0 <= i < 10 but not for i = 10.
Edit:
I'm puzzled as to whether val[10] (or, equivalently, *(ptr + 10)) yields undefined behavior or not (I'm considering C++, not C). In some circumstances it does (e.g. int x = val[10]; is undefined behavior) but in others it is not so clear. For instance,
int* p = &val[10];
As we have seen, this is equivalent to int* p = &*(ptr + 10); which could be undefined behavior (because it dereferences a pointer to one past the last element of val) or the same as int* p = ptr + 10; which is well defined.
I found these two references which show how fuzzy this question is:
May I take the address of the one-past-the-end element of an array?
Take the address of a one-past-the-end array element via subscript: legal by the C++ Standard or not?
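A compact sketch of the distinction being discussed: forming the one-past-the-end pointer is fine, dereferencing it is not:

int main() {
    int val[10] = {};
    int *end = val + 10;   // one past the end: a valid pointer value to form and compare
    // int x = *end;       // dereferencing it would be undefined behavior
    return end - val;      // 10: subtracting pointers into the same array is well defined
}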
If you put the array in a structure with some padding ints around it, out-of-range accesses should be safe in practice (since the pointer then points into "known" storage).
But it's better to avoid it.
struct SafeOutOfBoundsAccess
{
    int paddingBefore[6];
    int val[10];
    int paddingAfter[6];
};

void foo()
{
    SafeOutOfBoundsAccess a;
    bool maybeTrue1 = a.val[-1] == a.paddingBefore[5];
    bool maybeTrue2 = a.val[10] == a.paddingAfter[0];
}
Possible duplicate of: In C arrays why is this true? a[5] == 5[a] (closed 13 years ago).
Is the possibility of writing both array[index] and index[array] a compiler feature or a language feature? How is the second one possible?
The compiler will turn
index[array]
into
*(index + array)
With the normal syntax it would turn
array[index]
into
*(array + index)
and thus you see that both expressions evaluate to the same value. This holds for both C and C++.
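A small sketch demonstrating the equivalence:

#include <cassert>

int main() {
    int a[5] = {10, 20, 30, 40, 50};
    assert(a[2] == 2[a]);      // both are *(a + 2)
    assert(&a[2] == &2[a]);    // same address, too
}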
From the earliest days of C, the expression a[i] was simply the address of a[0] added to i (scaled up by the size of a[0]) and then de-referenced. In fact, all these were equivalent:
a[i]
i[a]
*(a+i)
====
The only thing I'd be concerned about is the actual de-referencing. Whilst they all produce the same address, de-referencing may be a concern if the types of a and i are different.
For example:
int i = 4;
long a[9];
long x = a[i]; //get the long at memory location X.
long x = i[a]; //get the int at memory location X?
I haven't actually tested that behavior but it's something you may want to watch out for. If it does change what gets de-referenced, it's likely to cause all sorts of problems with arrays of objects as well.
====
Update:
You can probably safely ignore the bit above between the ===== lines. I've tested it under Cygwin with a short and a long and it seems okay, so I guess my fears were unfounded, at least for the basic cases. I still have no idea what happens with more complicated ones because it's not something I'm ever likely to want to do.
As Matthew Wilson discusses in Imperfect C++, this can be used to enforce type safety in C++, by preventing use of DIMENSION_OF()-like macros with instances of types that define the subscript operator, as in:
#define DIMENSION_OF_UNSAFE(x) (sizeof(x) / sizeof((x)[0]))
#define DIMENSION_OF_SAFER(x) (sizeof(x) / sizeof(0[(x)]))
int ints[4];
DIMENSION_OF_UNSAFE(ints);  // 4
DIMENSION_OF_SAFER(ints);   // 4

std::vector<int> v(4);
DIMENSION_OF_UNSAFE(v);     // compiles, but gives a meaningless implementation-defined value
DIMENSION_OF_SAFER(v);      // does not compile
There's more to this, for dealing with pointers, but that requires some additional template smarts. Check out the implementation of STLSOFT_NUM_ELEMENTS() in the STLSoft libraries, and read about it all in chapter 14 of Imperfect C++.
edit: some of the commenters suggest that the implementation does not reject pointers. It does (as well as user-defined types), as illustrated by the following program. You can verify this by uncommenting the two commented-out printf calls. (I just did this on Mac/GCC4, and it rejects both forms.)
#include <stlsoft/stlsoft.h>
#include <vector>
#include <stdio.h>

int main()
{
    int ar[1];
    int* p = ar;
    std::vector<int> v(1);

    printf("ar: %lu\n", STLSOFT_NUM_ELEMENTS(ar));
//  printf("p: %lu\n", STLSOFT_NUM_ELEMENTS(p));
//  printf("v: %lu\n", STLSOFT_NUM_ELEMENTS(v));

    return 0;
}
In C and C++ (with array being a pointer or an array) it is a language feature: pointer arithmetic. The operation a[b], where either a or b is a pointer, is converted into pointer arithmetic: *(a + b). Since addition is symmetric, reordering does not change the meaning.
Now, there are differences for non-pointers. In fact, given a type A with an overloaded operator[], a[4] is a valid method call (it will call A::operator[](4)), but the opposite will not even compile.
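A short sketch of that asymmetry, with a hypothetical class A providing operator[]:

struct A {
    int data[10] = {};
    int& operator[](int i) { return data[i]; }
};

int main() {
    A a;
    a[4] = 1;       // fine: calls A::operator[](4)
    // 4[a] = 1;    // error: the built-in [] needs a pointer or array operand
    return a[4];
}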