Accessing unallocated memory in C++

I have this piece of code:
try
{
    int* myTestArray = new int[2];
    myTestArray[4] = 54;
    cout << "Should throw ex " << myTestArray[4] + 1 << endl;
}
catch (exception& exception)
{
    cout << "Exception content: " << exception.what() << endl;
}
What is really curious to me is why no exception is thrown here, since an index was accessed that was never allocated... and why is 55 printed? Did C++ automatically increase the size of the array?

Accessing unallocated memory is not guaranteed to throw exceptions.
It's actually not guaranteed to do anything, since that's undefined behavior. Anything could happen. Beware of nasal demons.
It prints 55 because you just stored 54, fetched it back and then printed 54+1. It's not at all guaranteed to print 55, although that's often what will happen in practice. This time it worked.

There is an unstated, and incorrect, assumption here. That assumption is that C++ actually gives a damn about what you do with memory. C++, like its C ancestor, has a completely unchecked model of memory. What you have here is classically called a buffer overflow, and it is a source of innumerable bugs, including some horrible security flaws.
Here's what your code really says:
myTestArray is the name of a location in memory big enough to hold the address of an int.
Two ints' worth of memory have been allocated on the heap for it. [And that address is put into the location myTestArray. Doesn't matter, but that probably makes it clearer.] (Along with probably 16 bytes of overhead, but we don't care about that now.)
You then stick the value 54 into the memory location 4 ints past the address contained in myTestArray.
Then you look at that location, add 1, and print the result.
You are demonstrating that C(++) indeed just doesn't care.
Now, under most conditions the underlying memory management and run-time system won't let you get away with it; you will violate its assumptions and get a segmentation error or something similar. But in this case, you are not hitting a boundary yet, most likely because you're piddling on the data structure that malloc is using under the covers to manage the heap. You're getting away with it because nothing is happening with the heap for the rest of the program. But for a real good time, write a little loop that does this code, freeing myTestArray and reallocating it. I'd lay long odds it won't run for more than 10 iterations before the program blows up, and it might not make two.
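For instance, here is a minimal sketch of the kind of loop described above (hypothetical code that deliberately invokes undefined behavior, so the exact failure mode depends on your compiler and allocator; on many allocators the corrupted bookkeeping is detected at delete[] and the program aborts well before the loop finishes):
#include <iostream>

int main()
{
    for (int i = 0; i < 10; ++i)
    {
        int* myTestArray = new int[2];
        myTestArray[4] = 54;   // out-of-bounds write: undefined behavior
        delete[] myTestArray;  // the allocator may now stumble over its corrupted bookkeeping
        std::cout << "iteration " << i << " survived" << std::endl;
    }
}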

Knowing what's going on here for sure is very hard to do. But I can give you a rough idea.
Most operating systems have a minimum size for memory allocations. In Unix it is the native page size. On x86 and amd64 systems this is 4 kB. In Windows it is 64 kB (I think).
The memory allocator used by malloc and new gets memory from the operating system in chunks of this size. It sets up data structures (often a linked list, sometimes a bitmap, or a tree) and hands out small pieces of the requested sizes.
One other confusing thing is that before your program even starts running main() it has run quite a bit of other code and allocated memory. For std::cout and other static and global objects, and for shared library linking.
But assume that when you call new your program first gets a chunk of 4 kB and gives you a pointer to 8 bytes of it (two integers). Your program has the entire 4 kB allocated and you can write there without crashing. However, what happens if you call new again? It is very likely that the memory allocator wrote some important tracking information somewhere into that 4 kB. The next bytes might be the size of the following block. Writing 54 into it might make it think it has more or less memory than it does. Or those bytes might be a pointer to the next block of free memory, and your 54 will cause the next memory allocation to crash the program.

You can write outside the array's range, but it is not guaranteed to work, and the data is not guaranteed to persist there, as something else can overwrite it.
It's simply not a good idea, and since there's no exception, it's a potentially hard-to-find bug.
When reading that memory, you will be pulling some random garbage left over from whatever used that memory before, so it can really be anything.

As others have said, this is undefined behavior, but I thought a bit more info might help. myTestArray is not an "Array" in the sense of a type, with special operators, etc. It is just a pointer to a location in memory. The expression myTestArray[4] is just short-hand for *(myTestArray+4) - it is returning a reference to the memory location that is 4 * sizeof(int) past myTestArray. If you want bounds checking, you'll have to use std::vector<int>::at().
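As a rough illustration of that last point (a sketch, not the questioner's code), at() performs the bounds check that raw indexing does not, so the original try/catch would actually have something to catch:
#include <iostream>
#include <stdexcept>
#include <vector>

int main()
{
    std::vector<int> myTestVector(2);
    try
    {
        myTestVector.at(4) = 54;   // bounds-checked access: throws std::out_of_range
    }
    catch (const std::exception& e)
    {
        std::cout << "Exception content: " << e.what() << std::endl;
    }
}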

Accessing array out of range is undefined behavior. Thus 55 is one of many possible results and there is nothing surprising here.
C++ Standard n3337 § 5.7 Additive operators
5) When an expression that has integral type is added to or subtracted
from a pointer, the result has the type of the pointer operand. If the
pointer operand points to an element of an array object, and the array
is large enough, the result points to an element offset from the
original element such that the difference of the subscripts of the
resulting and original array elements equals the integral expression.
In other words, if the expression P points to the i-th element of an
array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N
(where N has the value n) point to, respectively, the i + n-th and i −
n-th elements of the array object, provided they exist. Moreover, if
the expression P points to the last element of an array object, the
expression (P)+1 points one past the last element of the array object,
and if the expression Q points one past the last element of an array
object, the expression (Q)-1 points to the last element of the array
object. If both the pointer operand and the result point to elements
of the same array object, or one past the last element of the array
object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined.

Related

Why two variables declared one after another are not next to each other in memory?

I am using a code sample to check the distance between two integers like in the answer of this question.
int i = 0, j = 0;
std::cout << &i - &j;
From my understanding of the memory representation, the memory addresses of these two variables should be next to each other and the difference should be exactly 1.
To my surprise, running this code with MS compiler in VS2017 prints 3 and running the same code with GCC prints 1.
Why does this happen? Is something wrong with VS?
The C++ standard does not make any requirements for C++ compilers to allocate variables with automatic storage duration in any particular way, including making them contiguous in memory. In fact, the compiler may choose not to allocate any memory for a variable at all, optimizing it out completely.
That is why subtracting pointers makes sense only when they both point to memory inside the same array, or one element past the end of it. In all other situations, including yours, you get undefined behavior.
The pointer arithmetic you tried has undefined behavior:
If the pointer P points to the ith element of an array, and the
pointer Q points at the jth element of the same array, the
expression P-Q has the value i-j, if the value fits in std::ptrdiff_t.
Both operands must point to the elements of the same array (or one
past the end), otherwise the behavior is undefined. If the result does
not fit in std::ptrdiff_t, the behavior is undefined.
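For contrast, here is a small sketch (illustrative only) of the one case where pointer subtraction is well defined, namely two pointers into the same array:
#include <iostream>

int main()
{
    int a[4] = {0, 1, 2, 3};
    int* p = &a[3];
    int* q = &a[0];
    std::cout << p - q << std::endl;   // well defined: prints 3

    int i = 0, j = 0;
    // std::cout << &i - &j;           // undefined behavior: i and j are unrelated objects
    (void)i;
    (void)j;
}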

Why uninitialized char array is filled with random symbols?

I have this fragment of code in C++:
char x[50];
cout << x << endl;
which outputs some random symbols.
So my first question: what is the reason behind this output? Shouldn't it be spaces or at least the same symbols?
The reason I am concerned with this is that I am writing a program in CUDA and I'm doing some character manipulations inside a __global__ function, hence the use of string gives a "calling host function is not allowed" error.
But if I am using a "big enough" char array (each chunk of text I am operating with differs in size, meaning that it will not always utilize the char array fully), it's sometimes not fully filled and I am left with junk hanging at the end of the text:
So my second question: is there any way to avoid this?
what is the reason behind this output?
The values in an automatic variable are indeterminate. The standard doesn't specify it, so it might be spaces as you said, it might be random content.
[...] sometimes not fully filled and I left with junk [...]
Strings in C are null-terminated, so any routine dedicated to printing a string will loop as long as no null byte is encountered. In uninitialized memory, this null byte occurs randomly (or not at all). These weird, trailing characters are a result of that.
is there any way to avoid this?
Yes. Initialize it.
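A tiny sketch of both points (illustrative only): a zero-initialized buffer prints as an empty string, and printing stops at the first null byte, so there is no trailing junk:
#include <cstring>
#include <iostream>

int main()
{
    char x[50] = {};               // zero-initialized: every byte is '\0'
    std::cout << x << std::endl;   // prints an empty line, not random symbols

    std::strcpy(x, "short text");  // the copied text is followed by '\0'
    std::cout << x << std::endl;   // printing stops at the null byte: no junk at the end
}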
(will assume x86 in this post)
what is the reason behind this output?
Here's roughly what happens, in assembly, when you do char x[50];:
SUB ESP, 0x34 ; 52 bytes
Essentially, the stack pointer is moved by 0x34 bytes (an amount rounded up to be divisible by 4) to reserve space. Then, that space on the stack becomes x. There's no cleaning, no changes or pushes or pops, just this space becoming x. Anything that was there before (abandoned params, return addresses, variables from previous function calls) will be in x.
Here's roughly what happens when you do new char[50]:
1. Control gets passed to the allocator
2. The allocator looks for any heap of sufficient size (read as: an already allocated but uncommitted heap)
3. If 2 fails, the allocator makes a new heap
4. The allocator takes the heap (either the found or allocated one) and commits it
5. The address of that heap is returned to your code where it is used as a char*
The same as with the stack, you get whatever data is there. Some programs or systems may have allocators that zero out heaps when they are allocated or committed, while others may only zero when allocated but not committed, and some may not zero at all. Depending on the allocator, you may get clean memory or you may get re-used and dirty memory. This is why the values here can be non-zero and aren't predictable.
is there any way to avoid this?
In the case of heap memory, you can overload the new and delete operators in C++ and always zero newly allocated memory. You can see examples of overloading these operators here. As for memory on the stack, you just have to live with zeroing it out every time.
ZeroMemory(myArray, sizeof(myArray));
Alternatively, for both methods, you could stay away from naked arrays and use std::vector or other wrappers that take care of initialization for you. You'll still want to make sure to initialize integers and other numeric or pointer data-types, though.
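For the heap half of that suggestion, here is a minimal sketch of what a zero-initializing replacement for the global operator new might look like (an illustration under the usual replacement rules, not a complete production implementation; the array and nothrow forms would normally be replaced as well):
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <new>

// Replacement global operator new that zeroes every block it hands out.
void* operator new(std::size_t size)
{
    void* p = std::malloc(size ? size : 1);   // ask for at least 1 byte so a zero-size request still succeeds
    if (!p)
        throw std::bad_alloc();
    std::memset(p, 0, size);
    return p;
}

void operator delete(void* p) noexcept
{
    std::free(p);
}

int main()
{
    char* y = new char[50];   // the default operator new[] forwards to the replaced operator new
    std::cout << (y[49] == 0 ? "zeroed" : "not zeroed") << std::endl;
    delete[] y;
}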
No, there is no way to avoid it. C++ does not initialize automatic variables of built-in types (such as arrays of built-in types in your case) automatically, you need to initialize them yourself.
Why are you having issues with this code?
char x[50];
cout << new char[50] << endl;
cout << x << endl;
You're leaking memory with the new char[50], which has no corresponding delete[].
Also, the contents of uninitialized memory are undefined, as others have said, and in most cases you get garbage within that memory block. A better method is to initialize it:
char x[50] = {};
char* y = new char[50]();
Then just remember to call delete[] on y later to free the memory. Yes, the OS will reclaim it when your program exits, but relying on that is never the way to write good programs.

What is the purpose of allocating a specific amount of memory for arrays in C++?

I'm a student taking a class on Data Structures in C++ this semester and I came across something that I don't quite understand tonight. Say I were to create a pointer to an array on the heap:
int* arrayPtr = new int [4];
I can access this array using pointer syntax
int value = *(arrayPtr + index);
But if I were to add another value to the memory position immediately after the end of the space allocated for the array, I would then be able to access it
*(arrayPtr + 4) = 0;
int nextPos = *(arrayPtr + 4);
//the value of nextPos will be 0, or whatever value I previously filled that space with
The position in memory of *(arrayPtr + 4) is past the end of the space allocated for the array. But as far as I understand, the above still would not cause any problems. So aside from it being a requirement of C++, why even give arrays a specific size when declaring them?
When you go past the end of allocated memory, you are actually accessing memory of some other object (or memory that is free right now, but that could change later). So, it will cause you problems. Especially if you'll try to write something to it.
I can access this array using pointer syntax
int value = *(arrayPtr + index);
Yeah, but don't. Use arrayPtr[index]
The position in memory of *(arrayPtr + 4) is past the end of the space allocated for the array. But as far as I understand, the above still would not cause any problems.
You understand wrong. Oh so very wrong. You're invoking undefined behavior and undefined behavior is undefined. It may work for a week, then break one day next week and you'll be left wondering why. If you don't know the collection size in advance use something dynamic like a vector instead of an array.
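As a loose sketch of that suggestion (illustrative code, not the questioner's), a std::vector grows on demand and can bounds-check, so there is no fixed allocation to overrun:
#include <iostream>
#include <vector>

int main()
{
    std::vector<int> values;        // size not known in advance
    for (int i = 0; i < 5; ++i)
        values.push_back(i * 10);   // grows as needed

    std::cout << values[4] << std::endl;      // unchecked, like arrayPtr[4]
    std::cout << values.at(4) << std::endl;   // checked: throws std::out_of_range if out of bounds
}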
Yes, in C/C++ you can access memory outside of the space you claim to have allocated. Sometimes. This is what is referred to as undefined behavior.
Basically, you have told the compiler and the memory management system that you want space to store four integers, and the memory management system allocated space for you to store four integers. It gave you a pointer to that space. In the memory manager's internal accounting, those bytes of RAM are now occupied, until you call delete[] arrayPtr;.
However, the memory manager has not allocated that next byte for you. You don't have any way of knowing, in general, what that next byte is, or who it belongs to.
In a simple example program like your example, which just allocates a few bytes, and doesn't allocate anything else, chances are, that next byte belongs to your program, and isn't occupied. If that array is the only dynamically allocated memory in your program, then it's probably, maybe safe to run over the end.
But in a more complex program, with multiple dynamic memory allocations and deallocations, especially near the edges of memory pages, you really have no good way of knowing what any bytes outside of the memory you asked for contain. So when you write to bytes outside of the memory you asked for in new you could be writing to basically anything.
This is where undefined behavior comes in. Because you don't know what's in that space you wrote to, you don't know what will happen as a result. Here's some examples of things that could happen:
The memory was not allocated when you wrote to it. In that case, the data is fine, and nothing bad seems to happen. However, if a later memory allocation uses that space, anything you tried to put there will be lost.
The memory was allocated when you wrote to it. In that case, congratulations, you just overwrote some random bytes from some other data structure somewhere else in your program. Imagine replacing a variable somewhere in one of your objects with random data, and consider what that would mean for your program. Maybe a list somewhere else now has the wrong count. Maybe a string now has some random values for the first few characters, or is now empty because you replaced those characters with zeroes.
The array was allocated at the edge of a page, so the next bytes don't belong to your program. The address is outside your program's allocation. In this case, the OS detects you accessing random memory that isn't yours, and terminates your program immediately with SIGSEGV.
Basically, undefined behavior means that you are doing something illegal, but because C/C++ is designed to be fast, the language designers don't include an explicit check to make sure you don't break the rules, like other languages (e.g. Java, C#). They just list the behavior of breaking the rules as undefined, and then the people who make the compilers can have the output be simpler, faster code, since no array bounds checks are made, and if you break the rules, it's your own problem.
So yes, this sometimes works, but don't ever rely on it.
It would not cause any problems in a purely abstract setting, where you only worry about whether the logic of the algorithm is sound. In that case there's no reason to declare the size of an array at all. However, your computer exists in the physical world, and only has a limited amount of memory. When you're allocating memory, you're asking the operating system to let you use some of the computer's finite memory. If you go beyond that, the operating system should stop you, usually by killing your process/program.
Note that arrayptr[index] and *(arrayptr + index) mean exactly the same thing; either way, *(arrayptr + 4) is past the end of the space you allocated for the array. It's a flaw of raw C++ arrays that their size can't be extended once allocated.

How do I take the address of one past the end of an array if the last address is 0xFFFFFFFF?

If it is legal to take the address one past the end of an array, how would I do this if the last element of array's address is 0xFFFFFFFF?
How would this code work:
for (vector<char>::iterator it = vector_.begin(); it != vector_.end(); ++it)
{
}
Edit:
I read here that it is legal before making this question: May I take the address of the one-past-the-end element of an array?
If this situation is a problem for a particular architecture (it may or may not be), then the compiler and runtime can be expected to arrange that allocated arrays never end at 0xFFFFFFFF. If they were to fail to do this, and something breaks when an array does end there, then they would not conform to the C++ standard.
Accessing out of the array boundaries is undefined behavior. You shouldn't be surprised if a demon flies out of your nose (or something like that)
What might actually happen would be an overflow in the address, which could lead to you reading address zero and hence a segmentation fault.
If you always stay within the array range, and you do the last ++it which goes one past the end of the array and only compare it against vector_.end(), then you are not really accessing anything and there should not be a problem.
I think there is a good argument for suggesting that a conformant C implementation cannot allow an array to end at (e.g.) 0xFFFFFFFF.
Let p be a pointer to one-element-off-the-end-of-the-array: if buffer is declared as char buffer[BUFFSIZE], then p = buffer+BUFFSIZE, or p = &buffer[BUFFSIZE]. (The latter means the same thing, and its validity was made explicit in the C99 standard document.)
We then expect the ordinary rules of pointer comparison to work, since the initialization of p was an ordinary bit of pointer arithmetic. (You cannot compare arbitrary pointers in standard C, but you can compare them if they are both based in a single array, memory buffer, or struct.) But if buffer ended at 0xFFFFFFFF, then p would be 0x00000000, and we would have the unlikely situation that p < buffer!
This would break a lot of existing code which assumes that, in valid pointer arithmetic done relative to an array base, the intuitive address-ordering property holds.
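Here is a small sketch of the pattern that argument relies on (BUFFSIZE is just an illustrative constant): taking the one-past-the-end address is fine, dereferencing it is not, and the intuitive address ordering is what existing code expects:
#include <cstddef>
#include <iostream>

int main()
{
    const std::size_t BUFFSIZE = 16;
    char buffer[BUFFSIZE];

    char* p = buffer + BUFFSIZE;   // one past the end: a valid pointer value
    // *p = 'x';                   // dereferencing it, however, would be undefined behavior

    std::cout << (p > buffer) << std::endl;   // the address-ordering property: prints 1
}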
It's not legal to access one past the end of an array, but that code doesn't actually access that address, and you will never get an address like that for your objects on a real system.
The difference is between dereferencing that element and taking its address. In your example the element past the end won't be dereferenced, and so it is valid. Although this was not really clear in the early days of C++, it is clear now. Also, the value you pass to the subscript does not really matter.
Sometimes the best thing you can do about corner cases is forbid them. I saw this class of problem with some bit field extraction instructions of the NS32032 in which the hardware would load 32 bits starting at the byte address and extract from that datum. So even single-bit fields anywhere in the last 3 bytes of mapped memory would fail. The solution was to never allow the last 4 bytes of memory to be available for allocation.
Quite a few architectures that would be affected by this solve the problem by reserving offset 0xFFFFFFFF (and a bit more) for the OS.

Why the allocation succeeds for size zero bytes?

This is similar to What does zero-sized array allocation do/mean?
I have following code
int *p = new int[0];
delete []p;
p gets an address and gets deleted properly.
My question is: Why is allocation of zero bytes allowed by the C++ standard in the first place?
Why doesn't it throw bad_alloc or some special exception ?
I think it is just postponing the catastrophic failure, making the programmer's life difficult. Because if the size to be allocated is calculated at run time, and the programmer assumes it was allocated properly and tries to write something to that memory, they end up corrupting memory, and the crash may happen somewhere else in the code.
EDIT: How much memory does it allocate for a zero-size request?
Why would you want it to fail? If the programmer tries to read/write to non-existent elements, then that is an error. The initial allocation is not (this is no different to e.g. int *p = new int[1]; p[1] = 5;).
3.7.3.1/2:
[32. The intent is to have operator new() implementable by calling malloc() or calloc(), so the rules are substantially the same. C++ differs from C in requiring a zero request to return a non-null pointer.]
Compare dynamically allocated array to std::vector for example. You can have a vector of size 0, so why not allow the same for the array? And it is always an error to access past the end of the array whether its size is 0 or not.
A long time ago, before exceptions were used, the malloc function returned a NULL pointer if the allocation failed.
If allocating zero bytes would also return a NULL pointer, it would be hard to make the distinction between a failed allocation and a succeeding-zero-bytes allocation.
On the other hand if the allocation of zero bytes would return a non-NULL pointer, you end up with a situation in which two different allocations of zero bytes can have the same pointer.
Therefore, to keep things simple, many malloc implementations allocate 1 byte for a request of zero bytes.
The same can be said for int[N] where N>0:
Because if the size to be allocated is calculated at run time, and the programmer assumes it was allocated properly and tries to write something past the end of that memory, they end up corrupting memory, and the crash may happen somewhere else in the code.
Zero-sized array allocation is covered in the ISO C++ Standard under 5.3.4, paragraph 7:
When the value of the expression in a direct-new-declarator is zero, the allocation function is called to allocate an array with no elements.
This makes code that performs dynamic array allocation easier.
In general: If someone calls a function and asks it to return an array with n (0 in your case) elements, the code shouldn't be trying to read the returned array beyond its n elements anyway.
So, I don't really see the catastrophic failure, since the code would have been faulty to begin with for any n.
As you say:
Because if size to be allocated is calculated at run time and if programmer assumes its allocated properly
The calculated size would be "0"; if the programmer then tries to access more than the calculated size, well... I am repeating myself ;)
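To see why the uniform behavior is convenient, here is a small sketch (with a hypothetical make_copy helper) of code that allocates n elements and needs no special case for n == 0:
#include <algorithm>
#include <cstddef>
#include <iostream>

// Hypothetical helper: allocate and copy n ints; behaves uniformly for n == 0.
int* make_copy(const int* src, std::size_t n)
{
    int* dst = new int[n];          // for n == 0 this still returns a valid, non-null pointer
    std::copy(src, src + n, dst);   // copies nothing when n == 0
    return dst;
}

int main()
{
    int data[3] = {1, 2, 3};
    int* a = make_copy(data, 3);
    int* b = make_copy(data, 0);    // no special case needed at the call site

    std::cout << a[0] << std::endl; // reading b[0] would be an error, just as for any n
    delete[] a;
    delete[] b;                     // deleting a zero-sized allocation is fine
}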