I am using a code sample to check the distance between the addresses of two integer variables, as in the answer to this question.
int i = 0, j = 0;
std::cout << &i - &j;
From my understanding of the memory representation, the addresses of these two variables should be next to each other, and the difference should be exactly 1.
To my surprise, running this code with the MS compiler in VS2017 prints 3, while running the same code with GCC prints 1.
Why does this happen? Is something wrong with VS?
The C++ standard does not place any requirements on how C++ compilers allocate variables with automatic storage duration, including whether they are contiguous in memory. In fact, a compiler may choose not to allocate any memory for a variable at all, optimizing it out completely.
That is why subtracting pointers only makes sense when they both point into memory of the same array, or one element past its end. In all other situations, including yours, you get undefined behavior.
The pointer arithmetic you tried has undefined behavior:
If the pointer P points to the i-th element of an array, and the pointer Q points at the j-th element of the same array, the expression P-Q has the value i-j, if the value fits in std::ptrdiff_t.
Both operands must point to elements of the same array (or one past the end), otherwise the behavior is undefined. If the result does not fit in std::ptrdiff_t, the behavior is undefined.
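For contrast, here is a minimal sketch of the one case where the subtraction is well-defined (the array and its contents are my own choice, not from the question):

#include <cstddef>
#include <iostream>

int main()
{
    int arr[8] = {1, 2, 3, 4, 5, 6, 7, 8};

    // Well-defined: both pointers point into the same array object.
    std::ptrdiff_t d = &arr[5] - &arr[2];
    std::cout << d << '\n'; // prints 3

    int i = 0, j = 0;
    // std::ptrdiff_t bad = &i - &j; // undefined behavior: i and j are not
    //                               // elements of the same array
    (void)i; (void)j; // silence unused-variable warnings
    return 0;
}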
I was writing code and realized that I can "access" elements of an array at indices equal to or greater than the array's size. Why doesn't this produce an error?
For example,
#include <iostream>
using namespace std;
int main ()
{
int b_array[5] = {1, 2, 3, 4, 5};
cout << b_array[5] << endl   // Prints 0
     << b_array[66] << endl; // Prints some apparently random value.
return 0;
}
The only technical answer is "because the C++ language specification says so". Accessing an out-of-bounds value is undefined behavior. Your personal taste is irrelevant.
Behind the "undefined behaviors" (there are many in the C++ specs) there is the need to let compiler developer to implement different optimizations depending on the platform they have to run on.
Consider that indexes are often used in loops: if you checked the bounds, you would end up with a check on every iteration that always succeeds, wasting processor time.
C++ does not implement boundary checking due to the performance penalty it would incur.
For example, the vector template provides an at() function which checks the boundary, but it is roughly 5 times slower than the [] operator (see the sketch below).
Low-level languages tend to make the programmer responsible for producing safe, error-free code in return for high performance.
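A small sketch of that trade-off (the vector contents are arbitrary): at() reports the error, while [] silently invokes undefined behavior:

#include <iostream>
#include <stdexcept>
#include <vector>

int main()
{
    std::vector<int> v = {1, 2, 3, 4, 5};

    std::cout << v[2] << '\n'; // unchecked: fast, but v[66] would be
                               // undefined behavior
    try
    {
        std::cout << v.at(66) << '\n'; // checked: throws instead of
                                       // invoking undefined behavior
    }
    catch (const std::out_of_range& e)
    {
        std::cout << "out of range: " << e.what() << '\n';
    }
    return 0;
}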
Although there are simple cases like yours where compilers and/or static analyzers could detect that an access is out of bounds, doing so in general at compile time is not feasible. For example, if you pass your array to a function, it immediately decays into a pointer and the compiler has no chance to do bounds checking at compile time (see the sketch below).
The alternative, run-time bounds checking, is comparatively expensive: doing a check upon each access would turn a simple memory dereference into a potentially stalling branch. To make things even harder, the subscript operator also works on arbitrary pointers, i.e., there is no easy place to locate the size of the actual array object being accessed.
As a result, the behavior of out-of-bounds array accesses is deliberately made undefined: a system can track these accesses, but it doesn't have to. Also, what the system actually does upon an out-of-bounds array access is not specified, i.e., it can do different things depending on the context. In many cases, it will just return junk, which isn't really too useful. However, especially with suitable debug settings, the system may instead assert() upon detecting a violation.
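A short sketch of the decay problem mentioned above (the function and its names are mine, for illustration only):

#include <iostream>

// Despite the [5], this parameter is really int*: the size information is
// gone, so neither the compiler nor the function can check bounds here.
void touch(int arr[5])
{
    std::cout << sizeof(arr) << '\n'; // size of a pointer, not of the array
}

int main()
{
    int a[5] = {1, 2, 3, 4, 5};
    std::cout << sizeof(a) << '\n';   // 5 * sizeof(int): the whole array
    touch(a);                         // 'a' decays to int* at the call
    return 0;
}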
C++ allows direct memory access to your program. There are no boundary checks done for you. This can be the cause of very nasty bugs, but it's also very efficient compared to other, "safer" languages.
An array is little more than a pointer to a memory location. The index that you are trying to access, such as index 66 in array[66], is resolved by adding 66 * sizeof(int) to the starting address of the array. Whether the resulting address lies within some bounds or not is not something the compiler checks.
In other words, array[i] is the same as *(array + i) in C++. In fact, you might be surprised that array[i] can also be written as i[array]!
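A quick sketch demonstrating the equivalence (the array contents are arbitrary):

#include <iostream>

int main()
{
    int arr[5] = {10, 20, 30, 40, 50};

    // arr[2] is defined as *(arr + 2); since addition commutes,
    // 2[arr], i.e. *(2 + arr), names the same element.
    std::cout << arr[2] << ' ' << *(arr + 2) << ' ' << 2[arr] << '\n'; // 30 30 30
    return 0;
}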
The following example is from Wikipedia.
int arr[4] = {0, 1, 2, 3};
int* p = arr + 5; // undefined behavior
If I never dereference p, then why is arr + 5 alone undefined behaviour? I expect pointers to behave like integers, with the exception that, when dereferenced, the value of a pointer is treated as a memory address.
That's because pointers don't behave like integers. It's undefined behavior because the standard says so.
On most platforms, however (if not all), you won't get a crash or run into dubious behavior if you don't dereference the pointer. But then, if you don't dereference it, what's the point of doing the addition?
That said, note that an expression going one past the end of an array is technically 100% "correct" and guaranteed not to crash per §5.7 ¶5 of the C++11 spec. However, the result of that expression is unspecified (just guaranteed not to be an overflow), while any other expression going more than one past the array bounds is explicitly undefined behavior.
Note: that does not mean it is safe to read or write at an over-by-one offset. You would likely be touching data that does not belong to that array, which will cause state/memory corruption. You just won't cause an overflow exception.
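To make the distinction concrete, here is a minimal sketch (my own loop, not from the question): the one-past-the-end pointer may be formed and compared, but never dereferenced:

#include <iostream>

int main()
{
    int arr[4] = {0, 1, 2, 3};

    int* end = arr + 4; // one past the end: fine to form and compare,
                        // never to dereference

    for (int* p = arr; p != end; ++p) // the idiomatic begin/end loop
        std::cout << *p << ' ';
    std::cout << '\n';

    // int* q = arr + 5; // undefined behavior: more than one past the end
    return 0;
}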
My guess is that it's like this because dereferencing is not the only thing that can go wrong: pointer arithmetic, comparing pointers, and so on can also be problematic. So it's easier to just say 'don't do this' than to enumerate the situations where it can be dangerous.
The original x86 can have issues with such statements. In 16-bit code, pointers are 16+16 bits. If you add an offset to the lower 16 bits, you might need to deal with overflow and change the upper 16 bits. That was a slow operation, best avoided.
On those systems, array_base+offset was guaranteed not to overflow, if offset was in range (<=array size). But array+5 would overflow if array contained only 3 elements.
The consequence of that overflow is that you got a pointer which doesn't point behind the array, but before. And that might not even be RAM, but memory-mapped hardware. The C++ standard doesn't try to limit what happens if you construct pointers to random hardware components, i.e. it's Undefined Behavior on real systems.
If arr happens to be right at the end of the machine's memory space, then arr+5 might be outside that memory space, so the pointer type might not be able to represent the value, i.e., it might overflow, and overflow is undefined.
"Undefined behavior" doesn't mean it has to crash on that line of code, but it does mean that you can't make any guaranteed about the result. For example:
#include <cassert>

int arr[4] = {0, 1, 2, 3};
int* p = arr + 5;     // I guess this is allowed to crash, but that would be a
                      // rather unusual implementation choice on most machines.
*p;                   // may cause a crash, or it may read data out of some
                      // other data structure
assert(arr < p);      // this statement may not be true (arr may be so close to
                      // the end of the address space that adding 5 overflowed
                      // the address space and wrapped around)
assert(p - arr == 5); // this statement may not be true: the compiler may have
                      // assigned p some other value
I'm sure there are many other examples you can throw in here.
Some systems (very rare systems; I can't name one) will cause traps when you increment past boundaries like that. Further, leaving it undefined allows implementations that provide boundary protection to exist, though again I can't think of one.
Essentially, you shouldn't be doing it, and therefore there's no reason to specify what happens when you do. Specifying what happens would put an unwarranted burden on the implementation provider.
The result you are seeing is because of the x86's segment-based memory protection. I find this protection justified: when you are incrementing a pointer and storing it, it usually means that at some future point your code will dereference the pointer and use the value. The compiler wants to avoid situations where you end up changing someone else's memory location, or freeing memory owned by some other part of your code. To avoid such scenarios, this restriction is in place.
In addition to hardware issues, another factor was the emergence of implementations which attempted to trap on various kinds of programming errors. Although many such implementations could be most useful if configured to trap on constructs which a program is known not to use, even when those constructs are defined by the C Standard, the authors of the Standard did not want to define the behavior of constructs which would, in many programming fields, be symptomatic of errors.
In many cases, it will be much easier to trap on actions which use pointer arithmetic to compute the addresses of unintended objects than to somehow record the fact that the pointers cannot be used to access the storage they identify, but could be modified so that they could access other storage. Except in the case of arrays within larger (two-dimensional) arrays, an implementation would be allowed to reserve space that's "just past" the end of every object. Given something like doSomethingWithItem(someArray+i);, an implementation could trap any attempt to pass any address which doesn't point to either an element of the array or the space just past the last element. If the allocation of someArray reserved space for an extra unused element, and doSomethingWithItem() only accesses the item to which it receives a pointer, the implementation could relatively inexpensively ensure that any non-trapped execution of the above code could, at worst, access otherwise-unused storage.
The ability to compute "just-past" addresses makes bounds checking more difficult than it otherwise would be (the most common erroneous situation would be passing doSomethingWithItem() a pointer just past the end of the array, but behavior would be defined unless doSomethingWithItem would try to dereference that pointer, something the caller may be unable to prove). Because the Standard would allow compilers to reserve space just past the array in most cases, however, such allowance would let implementations limit the damage caused by untrapped errors, something that would likely not be practical if more generalized pointer arithmetic were allowed.
If it is legal to take the address one past the end of an array, how would I do this if the array's last element is at address 0xFFFFFFFF?
How would this code work:
for (vector<char>::iterator it = vector_.begin(); it != vector_.end(); ++it)
{
}
Edit:
Before asking this question, I read here that taking such an address is legal: May I take the address of the one-past-the-end element of an array?
If this situation is a problem for a particular architecture (it may or may not be), then the compiler and runtime can be expected to arrange that allocated arrays never end at 0xFFFFFFFF. If they were to fail to do this, and something breaks when an array does end there, then they would not conform to the C++ standard.
Accessing outside the array boundaries is undefined behavior. You shouldn't be surprised if a demon flies out of your nose (or something like that).
What might actually happen is an overflow in the address, which could lead to you reading address zero and hence a segmentation fault.
If you always stay within the array's range, and the last ++it moves the iterator one past the array only to be compared against vector_.end(), then you are not really accessing anything and there should not be a problem.
I think there is a good argument for suggesting that a conformant C implementation cannot allow an array to end at (e.g.) 0xFFFFFFFF.
Let p be a pointer to one-element-off-the-end-of-the-array: if buffer is declared as char buffer[BUFFSIZE], then p = buffer+BUFFSIZE, or p = &buffer[BUFFSIZE]. (The latter means the same thing, and its validity was made explicit in the C99 standard document.)
We then expect the ordinary rules of pointer comparison to work, since the initialization of p was an ordinary bit of pointer arithmetic. (You cannot compare arbitrary pointers in standard C, but you can compare them if they are both based in a single array, memory buffer, or struct.) But if buffer ended at 0xFFFFFFFF, then p would be 0x00000000, and we would have the unlikely situation that p < buffer!
This would break a lot of existing code which assumes that, in valid pointer arithmetic done relative to an array base, the intuitive address-ordering property holds.
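Spelled out as code (BUFFSIZE is an arbitrary illustrative constant), the property such code relies on is:

#include <cassert>

int main()
{
    const int BUFFSIZE = 64;
    char buffer[BUFFSIZE];

    char* p = buffer + BUFFSIZE;    // one past the end: valid to form
    assert(buffer < p);             // the intuitive ordering property;
                                    // wraparound at 0xFFFFFFFF would break it
    assert(p == &buffer[BUFFSIZE]); // the equivalent form whose validity
                                    // C99 made explicit
    return 0;
}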
It's not legal to access one past the end of an array, but that code doesn't actually access that address. Also, you will never get an address like that for your objects on a real system.
The difference is between dereferencing that element and taking its address. In your example, the element past the end won't be dereferenced, and so it is valid. Although this was not really clear in the early days of C++, it is clear now. Also, the value you pass to the subscript does not really matter.
Sometimes the best thing you can do about corner cases is forbid them. I saw this class of problem with some bit field extraction instructions of the NS32032 in which the hardware would load 32 bits starting at the byte address and extract from that datum. So even single-bit fields anywhere in the last 3 bytes of mapped memory would fail. The solution was to never allow the last 4 bytes of memory to be available for allocation.
Quite a few architectures that would be affected by this solve the problem by reserving offset 0xFFFFFFFF (and a bit more) for the OS.
Obviously, dereferencing an invalid pointer causes undefined behavior. But what about simply storing an invalid memory address in a pointer variable?
Consider the following code:
const char* str = "abcdef";
const char* begin = str;
if (begin - 1 < str) { /* ... do something ... */ }
The expression begin - 1 evaluates to an invalid memory address. Note that we don't actually dereference this address; we simply use it in pointer arithmetic to test whether it is valid. Nonetheless, we still have to load an invalid memory address into a register.
So, is this undefined behavior? I never thought it was, since a lot of pointer arithmetic seems to rely on this sort of thing, and a pointer is really nothing but an integer anyway. But recently I heard that even the act of loading an invalid pointer into a register is undefined behavior, since certain architectures will automatically throw a bus error or something if you do that. Can anyone point me to the relevant part of the C or C++ standard which settles this either way?
I have the C Draft Standard here, and it makes it undefined by omission. It defines the case of ptr + I at 6.5.6/8:
If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression.
Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object.
Your case does not fit any of these. Neither is your array large enough for -1 to adjust the pointer to point to a different array element, nor does either the resulting or the original pointer point one past the end.
Your code is undefined behavior for a different reason:
the expression begin - 1 does not yield an invalid pointer; it is undefined behavior. You are not allowed to perform pointer arithmetic beyond the bounds of the array you're working on. So it is the subtraction itself that is invalid, not the act of storing the resulting pointer.
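One way to express such a check without ever forming an out-of-bounds pointer (a sketch of my own, not taken from the answer) is to track the position as an index and form a pointer only for valid positions:

#include <cstddef>

void check(const char* str)
{
    std::size_t pos = 0; // current position within str, instead of the
                         // pointer 'begin'

    if (pos == 0) { /* ... do something ... */ } // same test as before, but
                                                 // no out-of-bounds pointer
                                                 // is ever formed
    (void)str;
}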
Some architectures have dedicated registers for holding pointers. Putting the value of an unmapped address into such a register is allowed to crash. Integer overflow/underflow is allowed to crash. Because C aims to work on a broad variety of platforms, pointers provide a mechanism for safely programming unsafe circuits.
If you know you won't be running on exotic hardware with such finicky characteristics, you don't need to worry about what is undefined by the language. It is well-defined by the platform.
Of course, the example is poor style and there isn't a good reason to do it.
Any use of an invalid pointer yields undefined behaviour. I don't have the C Standard here at work, but see 'invalid pointers' in the Rationale: http://www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf
$5.7/6 - "Unless both pointers point
to elements of the same array object,
or one past the last element of the
array object, the behavior is
undefined.75)"
In summary, it is undefined even if you do not dereference the pointer.
The correct answers have been given years ago, but I find it interesting that the C99 rationale [sec. 6.5.6, last 3 paragraphs] explains why the standard endorses adding 1 to a pointer that points to the last element of an array (p+1):
An important endorsement of widespread practice is the requirement that a pointer can always be incremented to just past the end of an array, with no fear of overflow or wraparound
and why p-1 is not endorsed:
In the case of p-1, on the other hand, an entire object would have to be allocated prior to the array of objects that p traverses, so decrement loops that run off the bottom of an array can fail. This restriction allows segmented architectures, for instance, to place objects at the start of a range of addressable memory.
So if the pointer p points to an object at the start of a range of addressable memory, which is endorsed by this comment, then p-1 would generate an underflow.
Note that integer overflow is the standard's example for undefined behavior [sec. 3.4.3], as it depends on the translation environment and the operating environment. I believe it is easy to see that this dependence on the environment extends to pointer underflow.
This is why the standard explicitly makes it undefined behavior [in 6.5.6/8], as noted by other answers here. To cite that sentence:
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
See also [sec. 6.3.2.3, last 4 paragraphs] of the C99 rationale, which gives a more detailed description of how invalid pointers can be generated, and what effects that may have.
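As a sketch of the failure mode the rationale describes (the function and its names are mine), consider a backwards loop whose termination condition ends up forming p-1:

// Summing an array backwards; assumes n > 0. When p == arr, the final --p
// forms arr - 1, which is undefined behavior even though it is never
// dereferenced. Many implementations let this "work", but the standard
// does not guarantee it.
int sum_backwards(const int* arr, int n)
{
    int sum = 0;
    for (const int* p = arr + n - 1; p >= arr; --p)
        sum += *p;
    return sum;
}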
Yes, it's undefined behavior. See the accepted answer to this closely related question. Assigning an invalid pointer to a variable, comparing an invalid pointer, or casting an invalid pointer all trigger undefined behavior.