pointing to a memory address outside an array boundary [duplicate] - c++

This question already has answers here:
Take the address of a one-past-the-end array element via subscript: legal by the C++ Standard or not?
(13 answers)
Access element beyond the end of an array in C
(4 answers)
Closed 8 years ago.
I was wondering whether accessing an array outside its boundary (line 2 in the following code sample) would ever produce an error.
int a[20];
int* ptr = &a[20]; // line 2
int count=20;
do
{
ptr--;
printf("%d",*ptr);
}while(--count!=0);

According to C Traps and Pitfalls:
But how can it make sense to refer to an element that doesn't exist?
Fortunately we do not have to refer to this element, merely to its
address, and that address does exist in every C implementation we have
encountered. Moreover, ANSI C explicitly permits this usage: the address
of the nonexistent element just past the end of an array may be taken and
used for assignment and comparison purposes. Of course it is illegal
actually to refer to that element!

Trying to access memory beyond the end of the array is undefined behavior. However, it is perfectly legal to have a pointer that points one element past the end of the array.
The distinction between pointing to a certain address and accessing it is important.
For example, you can use the following to find the number of elements in an array arr:
int size = (&arr)[1] - arr;

Accessing a memory address outside an array boundary may not always crash the process. But if you modify the data there, it will certainly corrupt memory that belongs to something else. Also, the address outside the array boundary sometimes really lies outside the process address space, and accessing it results in a segfault.
Anyway, this is dangerous. It introduces errors that are hard to narrow down. The memory corruption or weird behavior of the program will manifest at some other place, and we will spend hours and hours finding the needle in the haystack.

Writing to an address past the array boundary will normally not produce a compile error, since the C language does not require the compiler to diagnose it (at most you may get a warning).
However, on some machines, if this address is not allowed to be accessed, you may get a runtime error (segfault).
This is similar to writing to address 0 (NULL).
On some microprocessor, microcontroller, or embedded architectures you may be allowed to write to address 0 and it is perfectly valid, but on other machines you may get a segfault.
That is why this is termed undefined behavior.

Your code does not access memory beyond the end of the array.
So I changed your code a little: count is changed from 20 to 100, so the loop now reads far outside the array.
#include<stdio.h>
int main() {
int a[20];
int* ptr = &a[20]; // line 2
int count=100;
do
{
ptr--;
printf("%d",*ptr);
}while(--count!=0);
return 0;
}
See http://codepad.org/X8yqrnDC
No, it will not give any error at compilation.
But at run time it might get a segmentation fault.
As per the standard, it is undefined behavior.

Related

hex subtraction result is not shown as expected [duplicate]

I am using a code sample to check the distance between two integers like in the answer of this question.
int i = 0, j = 0;
std::cout << &i - &j;
From my understanding of the memory representation, these memory addresses of these two variables should be next to each other and the difference should be exactly 1.
To my surprise, running this code with MS compiler in VS2017 prints 3 and running the same code with GCC prints 1.
Why does this happen? Is something wrong with VS?
The C++ standard does not require compilers to allocate variables with automatic storage duration in any particular way, including making them contiguous in memory. In fact, the compiler may choose not to allocate any memory for a variable at all, optimizing it out completely.
That is why subtracting pointers makes sense only when they both point to memory inside the same array, or one element past the end of it. In all other situations, including yours, you get undefined behavior.
The pointer arithmetic you tried has undefined behavior:
If the pointer P points to the ith element of an array, and the
pointer Q points at the jth element of the same array, the
expression P-Q has the value i-j, if the value fits in std::ptrdiff_t.
Both operands must point to the elements of the same array (or one
past the end), otherwise the behavior is undefined. If the result does
not fit in std::ptrdiff_t, the behavior is undefined.

Creating an array using new without declaring size [duplicate]

This question already has answers here:
Accessing an array out of bounds gives no error, why?
(18 answers)
Closed 3 years ago.
This has been bugging me for quite some time. I have a pointer. I declare an array of type int.
int* data;
data = new int[5];
I believe this creates an array of int with size 5. So I'll be able to store values from data[0] to data[4].
Now I create an array the same way, but without size.
int* data;
data = new int;
I am still able to store values in data[2] or data[3]. But I created an array of size 1. How is this possible?
I understand that data is a pointer pointing to the first element of the array. Though I haven't allocated memory for the next elements, I am still able to access them. How?
Thanks.
Normally, there is no need to allocate an array "manually" with new. It is much more convenient and also much safer to use std::vector<int> instead, and to leave the correct implementation of dynamic memory management to the authors of the standard library.
std::vector<int> optionally provides element access with bounds checking, via the at() method.
Example:
#include <vector>
int main() {
// create resizable array of integers and resize as desired
std::vector<int> data;
data.resize(5);
// element access without bounds checking
data[3] = 10;
// optionally: element access with bounds checking
// attempts to access out-of-range elements trigger runtime exception
data.at(10) = 0;
}
The default mode in C++ is usually to allow you to shoot yourself in the foot with undefined behavior, as you have seen in your case.
For reference:
https://en.cppreference.com/w/cpp/container/vector
https://en.cppreference.com/w/cpp/container/vector/at
https://en.cppreference.com/w/cpp/language/ub
Undefined, unspecified and implementation-defined behavior
What are all the common undefined behaviours that a C++ programmer should know about?
Also, in the second case you don't allocate an array at all, but a single object. Note that you must use the matching delete operator too.
int main() {
// allocate and deallocate an array
int *arr = new int[5];
delete[] arr;
// allocate and deallocate a single object
int *p = new int;
delete p;
}
For reference:
https://en.cppreference.com/w/cpp/language/new
https://en.cppreference.com/w/cpp/language/delete
How does delete[] know it's an array?
When you used new int then accessing data[i] where i!=0 has undefined behaviour.
But that doesn't mean the operation will fail immediately (or every time or even ever).
On most architectures it's very likely that the memory addresses just beyond the end of the block you asked for are mapped to your process and you can access them.
If you're not writing to them it's no surprise you can access them (though you shouldn't).
Even if you write to them most memory allocators have a minimum allocation and behind the scenes you may well have been allocated space for more (4 is realistic) integers even though the code only requests 1.
You may also be overwriting some area of memory but never get tripped up. A common consequence of writing beyond the end of an array is to corrupt the free-memory store itself. The consequence may be catastrophe but may only exhibit itself in a later allocation possibly of a similar sized object.
It's a dreadful idea to rely on such behaviour but it's not very surprising that it appears to work.
C++ doesn't (typically or by default) perform strict range checking and accessing invalid array elements may work or at least appear to work initially.
This is why C and C++ can be plagued with bizarre and intermittent errors. Not all code that provokes undefined behaviour fails catastrophically in every execution.
Going outside the bounds of an array in C++ is undefined behavior, so anything can happen, including things that appear to work "correctly".
In practical implementation terms on common systems, you can think of "virtual" memory as a large "flat" space from address 0 up to the largest address a pointer can hold, and pointers are indexes into this space.
The "virtual" memory for a process is mapped to physical memory, page file, etc. Now, if you access an address that is not mapped, or try to write a read-only part, you will get an error, such as an access violation or segfault.
But this mapping is done in fairly large chunks for efficiency, such as 4 KiB "pages". The allocators in a process, such as new and delete (or the stack), further split up these pages as required. So accessing other parts of a valid page is unlikely to raise an error.
This has the unfortunate result that it can be hard to detect such out of bounds access, use after free, etc. In many cases writes will succeed, only to corrupt some other seemingly unrelated object, which may cause a crash later, or incorrect program output, so best to be very careful about C and C++ memory management.
int* data = new int; // will be some virtual address
data[1000] = 5; // out of bounds (UB): possibly the start of a 4K page, potentially allowing a great deal beyond it
int* other_int = new int[5];
other_int[10] = 10; // out of bounds (UB)
data[10000] = 42; // with further pages beyond, so you can really make a mess of your program's memory
other_int[10] == 42; // perfectly possible to overwrite other things in unexpected ways
C++ provides many tools to help, such as std::string, std::vector and std::unique_ptr, and it is generally best to try and avoid manual new and delete entirely.
new int allocates 1 integer only. If you access offsets larger than 0, e.g. data[1], you overwrite memory you did not allocate.
int * is a pointer to something that's probably an int. When you allocate using new int, you're allocating one int and storing its address in the pointer. In reality, int * is just a pointer to some memory.
We can treat an int * as a pointer to a scalar element (i.e. new int) or to an array of elements; the language has no way of telling you what your pointer is really pointing to, which is a very good argument to stop using raw pointers and use only scalar values and std::vector.
When you say a[2], you access the memory 2 * sizeof(int) bytes after the address held in a. If a points to a scalar value, anything could be located there, and reading it causes undefined behaviour (your program might actually crash; this is an actual risk). Writing to that address will most likely cause problems; it is not merely a risk, but something you should actively guard against, i.e. use std::vector if you need an array and int or int& if you don't.
The expression a[b], where one of the operands is a pointer, is another way to write *(a+b). Let's for the sake of sanity assume that a is the pointer here (but since addition is commutative it can be the other way around! try it!); then the address in a is incremented by b times sizeof(*a), resulting in the address of the bth object after *a.
The resulting pointer is dereferenced, resulting in a "name" for the object whose address is a+b.
Note that a does not have to be an array; if it is one, it "decays" to a pointer before the operator [] is applied. The operation is taking place on a typed pointer. If that pointer is invalid, or if the memory at a+b does not in fact hold an object of the type of *a, or even if that object is unrelated to *a (e.g., because it is not in the same array or structure), the behavior is undefined.
In the real world, "normal" programs do not do any bounds checking but simply add the offset to the pointer and access that memory location. (Accessing out-of-bounds memory is, of course, one of the more common bugs in C and C++, and one of the reasons these languages are not recommended without restrictions for high-security applications.)
If the index b is small, the memory is probably accessible by your program. For plain old data like int the most likely result is then that you simply read or write the memory in that location. This is what happened to you.
Since you overwrite unrelated data (which may in fact be used by other variables in your program) the results are often surprising in more complex programs. Such errors can be hard to find, and there are tools out there to detect such out-of-bounds access.
For larger indices you'll at some point end up in memory which is not assigned to your program, leading to an immediate crash on modern systems like Windows NT and up, and unpredictable results on architectures without memory management.
I am still able to store values in data[2] or data[3]. But I created an array of size 1. How is this possible?
The behaviour of the program is undefined.
Also, you didn't create an array of size 1, but a single non-array object instead. The difference is subtle.

C++ - operator -= on a pointer [duplicate]

This question already has answers here:
C/C++: Pointer Arithmetic
(7 answers)
Closed 5 years ago.
Say I have a pointer to an array:
char* buf = new char[256];
What will happen to the array pointer's value/size if I do
buf -= 100;
+, +=, - and -= move a pointer around.
The pointer char* ptr = new char[256] points at the start of a block of 256 bytes.
So ptr+10 is now a pointer pointing 10 bytes in, and ptr+=10 is the same as ptr = ptr+10.
This is called pointer arithmetic.
In C++, pointer arithmetic is no longer valid if the result takes you outside the object the pointer points into, or beyond one-place-past-the-end. So ptr+0 through ptr+256 are the only pointer values you are allowed to generate from ptr.
ptr-=100 is undefined behaviour.
In C++, most implementations currently active implement pointers as unsigned integer indexes into a flat address space at runtime. This still doesn't mean you can rely on this fact while doing pointer arithmetic. Each pointer has a range of validity, and going outside of it the C++ standard no longer defines what anything in your program does (UB).
Undefined Behaviour doesn't just mean "could segfault"; the compiler is free to do anything, and there are instances of compilers optimizing entire branches of code out because the only way to reach them required UB, or because it proved that if you reached them UB would occur. UB makes the correctness of your program basically impossible to reason about.
That being said, 99/100+ times what will happen is that ptr-=100 now points to a different part of the heap than it did when initialized, and reading/writing to what it points at will result in getting junk, corrupting memory, and/or segfaulting. And doing a +=100 will bring ptr back to the valid range.
The block of memory won't be bothered by moving ptr, just ptr won't be pointing within it.
The standard says that even just trying to calculate a pointer that goes outside the actual boundaries of an array (and the "one past last" value, which is explicitly allowed, although not for dereferencing) is undefined behavior, i.e. anything can happen.
On some bizarre machines, even just this calculation may make the program crash (they had registers specifically for pointers, which trapped if they pointed to non-mapped pages or invalid addresses).
In practice, on non-pathological machines calculating that pointer won't do much: you'll probably just obtain a pointer to an invalid memory location (which may crash your program when you try to dereference it) or, worse, to memory you don't own (so you may overwrite unrelated data).
The only case where that code may be justified is if you have "insider knowledge" about the memory allocator (or you have actually replaced it with your own, e.g. by providing your own override of the global new operator), and you know that the actual array starts before the returned address, possibly storing extra data there.

How do I take the address of one past the end of an array if the last address is 0xFFFFFFFF?

If it is legal to take the address one past the end of an array, how would I do this if the last element of array's address is 0xFFFFFFFF?
How would this code work:
for (vector<char>::iterator it = vector_.begin(); it != vector_.end(); ++it)
{
}
Edit:
I read here that it is legal before making this question: May I take the address of the one-past-the-end element of an array?
If this situation is a problem for a particular architecture (it may or may not be), then the compiler and runtime can be expected to arrange that allocated arrays never end at 0xFFFFFFFF. If they were to fail to do this, and something breaks when an array does end there, then they would not conform to the C++ standard.
Accessing out of the array boundaries is undefined behavior. You shouldn't be surprised if a demon flies out of your nose (or something like that)
What might actually happen would be an overflow in the address which could lead to you reading address zero and hence segmentation fault.
If you always stay within the array range, and only the last ++it moves one past the end of the array before you compare it against vector_.end(), then you are not really accessing anything and there should not be a problem.
I think there is a good argument for suggesting that a conformant C implementation cannot allow an array to end at (e.g.) 0xFFFFFFFF.
Let p be a pointer to one-element-off-the-end-of-the-array: if buffer is declared as char buffer[BUFFSIZE], then p = buffer+BUFFSIZE, or p = &buffer[BUFFSIZE]. (The latter means the same thing, and its validity was made explicit in the C99 standard document.)
We then expect the ordinary rules of pointer comparison to work, since the initialization of p was an ordinary bit of pointer arithmetic. (You cannot compare arbitrary pointers in standard C, but you can compare them if they are both based in a single array, memory buffer, or struct.) But if buffer ended at 0xFFFFFFFF, then p would be 0x00000000, and we would have the unlikely situation that p < buffer!
This would break a lot of existing code which assumes that, in valid pointer arithmetic done relative to an array base, the intuitive address-ordering property holds.
It's not legal to access one past the end of an array,
but that code doesn't actually access that address,
and you will never get an address like that on a real system for your objects.
The difference is between dereferencing that element and taking its address. In your example the element past the end won't be dereferenced, so it is valid. Although this was not really clear in the early days of C++, it is clear now. Also, the value you pass to the subscript does not really matter.
Sometimes the best thing you can do about corner cases is forbid them. I saw this class of problem with some bit field extraction instructions of the NS32032 in which the hardware would load 32 bits starting at the byte address and extract from that datum. So even single-bit fields anywhere in the last 3 bytes of mapped memory would fail. The solution was to never allow the last 4 bytes of memory to be available for allocation.
Quite a few architectures that would be affected by this solve the problem by reserving offset 0xFFFFFFFF (and a bit more) for the OS.