Why does my dynamically allocated array get initialized to 0? - c++

I have some code that creates a dynamically allocated array with
int *Array = new int[size];
From what I understand, Array should be a pointer to the first item of Array in memory. When using gdb, I can call x Array to examine the value at the first memory location, x Array+1 to examine the second, etc. I expect to have junk values left over from whatever application was using those spots in memory prior to mine. However, using x Array returns 0x00000000 for all those spots. What am I doing wrong? Is my code initializing all of the values of the Array to zero?
EDIT: For the record, I ask because my program is an attempt to implement this: http://eli.thegreenplace.net/2008/08/23/initializing-an-array-in-constant-time/. I want to make sure that my algorithm isn't incrementing through the array to initialize every element to 0.

In most modern OSes, the OS gives zeroed pages to applications, as opposed to letting information seep between unrelated processes. That's important for security reasons, for example. Back in the old DOS days, things were a bit more casual. Today, with memory protected OSes, the OS generally gives you zeros to start with.
So, if this new happens early in your program, you're likely to get zeros. You'd be crazy to rely on that though; it's undefined behavior if you do.
If you keep allocating, filling, and freeing memory, eventually new will return memory that isn't zeroed. Rather, it'll contain remnants of your process' own earlier scribblings.
And there's no guarantee that any particular call to new, even at the beginning of your program, will return memory filled with zeros. You're just likely to see that for calls to new early in your program. Don't let that mislead you.

I expect to have junk values left over from whatever application was using those spots
It's certainly possible but by no means guaranteed. Particularly in debug builds, you're just as likely to have the runtime zero out that memory (or fill it with some recognisable bit pattern) instead, to help you debug things if you use the memory incorrectly.
And, really, "those spots" is a rather loose term, given virtual addressing.
The important thing is that, no, your code is not setting all those values to zero.
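If you actually want zeros, you have to ask for them. Here is a minimal sketch; the helper names `make_zeroed` and `make_uninitialized` are hypothetical. The trailing `()` value-initializes every element, which for int means zero. A plain `new int[size]` only default-initializes, which leaves indeterminate values that must not be read before they are written.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical helper: returns `size` ints that the language guarantees are zero.
// The trailing () in the new-expression requests value-initialization.
std::unique_ptr<int[]> make_zeroed(std::size_t size) {
    return std::unique_ptr<int[]>(new int[size]());
}

// Hypothetical helper: same allocation, but only default-initialized, like the
// question's `new int[size]`. The elements have indeterminate values, and
// reading one before writing it is undefined behavior, even if gdb shows 0.
std::unique_ptr<int[]> make_uninitialized(std::size_t size) {
    return std::unique_ptr<int[]>(new int[size]);
}
```

`std::make_unique<int[]>(size)` also value-initializes, so it gives the same guarantee as `make_zeroed`.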

Related

Can my program use unallocated memory on the free store without my knowledge?

When defining a variable without initialization on either the stack or the free store it usually has a garbage value, as assigning it to some default value e.g. 0 would just be a waste of time.
Examples:
int foo;//uninitialized foo may contain any value
int* fooptr=new int;//uninitialized *fooptr may contain any value
This, however, doesn't answer the question of where the garbage values come from.
The usual explanation to that is that new or malloc or whatever you use to get dynamically allocated memory don't initialize the memory to some value as I've stated above and the garbage values are just leftover from whatever program used the same memory prior.
So I put this explanation to the test:
#include <iostream>
int main()
{
    int* ptr = new int[10]{0}; // allocate memory and initialize everything to 0
    for (int i = 0; i < 10; ++i)
    {
        std::cout << *(ptr + i) << " " << ptr + i << std::endl;
    }
    delete[] ptr;
    ptr = new int[10]; // allocate memory without initialization
    for (int i = 0; i < 10; ++i)
    {
        std::cout << *(ptr + i) << " " << ptr + i << std::endl;
    }
    delete[] ptr;
}
Output:
0 0x1291a60
0 0x1291a64
0 0x1291a68
0 0x1291a6c
0 0x1291a70
0 0x1291a74
0 0x1291a78
0 0x1291a7c
0 0x1291a80
0 0x1291a84
19471096 0x1291a60
19464384 0x1291a64
0 0x1291a68
0 0x1291a6c
0 0x1291a70
0 0x1291a74
0 0x1291a78
0 0x1291a7c
0 0x1291a80
0 0x1291a84
In this code sample I allocated memory for 10 ints twice. The first time I do so I initialize every value to 0. I use delete[] on the pointer and proceed to immediately allocate the memory for 10 ints again but this time without initialization.
Yes, I know that the results of using an uninitialized variable are undefined, but I want to focus on the garbage values for now.
The output shows that the first two ints now contain garbage values in the same memory location.
If we take the explanation for garbage values into consideration this leaves me only one conclusion: Between deleting the pointer and allocating the memory again something must have tampered with the values in those memory locations.
But isn't the free store reserved for new and delete?
What could have tampered with those values?
Edit:
I removed the std::cout as a comment pointed it out.
I use the compiler Eclipse 2022-06 comes with (MinGW GCC) using default flags on Windows 10.
One of the things you need to understand about heap allocations is that there is usually a small control block also allocated when you do a new. The values in the control block tell the allocator how much space is being freed when delete is called.
When a block is deleted, the first part of the buffer is often overwritten by a control block. If you look at the two values you see from your program as hex values, you will note they appear to be addresses in the same general memory space. The first looks to be a pointer to the next allocated location, while the second appears to be a pointer to the start of the heap block.
Edit: One of the main reasons to add this kind of control block in a recently deallocated buffer is that it supports memory coalescing. That two-int signature will effectively show how much memory can be claimed if that space is reused, and it signals that the block is empty by pointing to the start of the frame.
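The effect of reusing the start of a freed block can be sketched with a toy allocator. This is a hypothetical illustration assuming a simple singly linked free list; the class `ToyPool` and its members are invented here and are not how MinGW's heap is actually implemented. Freeing a block writes a link to the next free block into its first bytes, so memory that was zeroed before it was freed comes back holding a pointer.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical toy allocator (NOT MinGW's real heap): fixed-size blocks carved
// from an internal pool, with free blocks chained through their own first bytes.
class ToyPool {
public:
    static constexpr std::size_t kBlockSize = 64;  // must be >= sizeof(void*)
    static constexpr std::size_t kBlocks = 4;

    ToyPool() {
        // Thread every block onto the free list; block 0 ends up at the head.
        for (std::size_t i = kBlocks; i-- > 0;) release(block(i));
    }

    void* allocate() {
        if (free_head_ == nullptr) return nullptr;      // pool exhausted
        void* p = free_head_;
        std::memcpy(&free_head_, p, sizeof free_head_);  // pop: read the stored link
        return p;                                        // the link bytes are left in place
    }

    void release(void* p) {
        std::memcpy(p, &free_head_, sizeof free_head_);  // push: overwrite the first bytes
        free_head_ = p;
    }

    void* block(std::size_t i) { return storage_ + i * kBlockSize; }

private:
    alignas(std::max_align_t) unsigned char storage_[kBlocks * kBlockSize];
    void* free_head_ = nullptr;
};
```

Usage mirrors the question: allocate `a` and `b`, zero `b`, release `a` then `b`, and allocate again. You get `b` back, but its first `sizeof(void*)` bytes now hold the address of `a` rather than zeros. Real allocators store more than one link (sizes, flags, back pointers), and exactly what goes where varies between implementations.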
When defining a variable without initialization on either the stack or the free store it usually has a garbage value, as assigning it to some default value e.g. 0 would just be a waste of time.
No. The initial value of a variable that is not initialized is always garbage. All garbage. This is inherent in "not initialized". The language semantics do not specify what the value of the variable is, and reading that value produces undefined behavior. If you do read it and it seems to make sense to you -- it is all zeroes, for example, or it looks like the value that some now-dead variable might have held -- that is meaningless.
This, however, doesn't answer the question of where the garbage values come from.
At the level of the language semantics, that question is nonsensical. "Garbage values" aren't a thing of themselves. The term is descriptive of values on which you cannot safely rely, precisely because the language does not describe where they come from or how they are determined.
The usual explanation to that is that new or malloc or whatever you use to get dynamically allocated memory don't initialize the memory [so the] values are just leftover from whatever program used the same memory prior.
That's an explanation derived from typical C and C++ implementation details. Read again: implementation details. These are what you are asking about, and unless your objective is to learn about writing C and / or C++ compilers or implementations of their standard libraries, it is not a particularly useful area to probe. The specifics vary across implementations and sometimes between versions of the same implementation, and if your programs do anything that exposes them to these details then those programs are wrong.
I know that the results of using an uninitialized variable are undefined, but I want to focus on the garbage values for now.
No, apparently you do not know that the results of using the value of an uninitialized variable are undefined. If you did, you would not present the results of your program as if they were somehow meaningful.
You also seem not to understand the term "garbage value", for in addition to thinking that the results of your program are meaningful, you appear to think that some of the values it outputs are not garbage.

Large number of null variables with the total size exceeding the memory in C

I have a very basic question related to NULL variables in C. Consider a hypothetical 64-bit system with very limited memory, say 4 KB, and a large number of integer pointers all set to NULL, such that their total size exceeds the available memory. Will such a program compile and execute?
Assume that the program doesn't have to do anything meaningful: it just declares a bunch of null integer pointers (of the sort int *x = NULL) and terminates.
Even though you did this:
int *x = NULL;
memory is still allocated for storing the pointer x (despite there being NULL on the right-hand side). If x is an automatic variable, that memory is allocated on the stack.
If you had used malloc on the right hand side you would additionally have claimed memory from the heap.
Now if you create so many such pointers that they exceed the available stack memory, you will get a stack overflow at run time; but if you don't use these pointers, they may well get optimized away.
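To make the storage point concrete, here is a small sketch; the helper `bytes_for_null_pointers` and the count of 1000 are illustrative choices, not from the question. A null pointer is just a value written into a pointer-sized slot, so the slots are needed whatever value they hold.

```cpp
#include <cstddef>

constexpr std::size_t kCount = 1000;  // illustrative count

// Hypothetical helper: defines kCount null int pointers as a local array and
// reports how many bytes of storage they occupy.
std::size_t bytes_for_null_pointers() {
    int* pointers[kCount] = {};  // automatic storage; every element is a null pointer
    // The storage is reserved regardless of the stored value.
    static_assert(sizeof(pointers) == kCount * sizeof(int*),
                  "null pointers still occupy pointer-sized slots");
    return sizeof(pointers);
}
```

On a typical 64-bit platform `sizeof(int*)` is 8, so this comes to 8000 bytes, already more than the 4 KB in the question. With optimizations on, an array like this that is never really used may not be materialized at all.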
If you declare but don't use a variable which has no side effects the compiler will optimize it out of existence. So no, this is not a way to go out of memory.
If you don't have optimizations turned on, you could create enough variables on the stack to cause a stack overflow. You could also just create a really big array on the stack.
That said, it's quite easy to run out of memory, and you don't need to do it with copious quantities of int pointers. No matter how you manage to run out of memory, it won't stop you from compiling the program successfully.

What is the purpose of allocating a specific amount of memory for arrays in C++?

I'm a student taking a class on Data Structures in C++ this semester and I came across something that I don't quite understand tonight. Say I were to create a pointer to an array on the heap:
int* arrayPtr = new int [4];
I can access this array using pointer syntax
int value = *(arrayPtr + index);
But if I were to add another value to the memory position immediately after the end of the space allocated for the array, I would then be able to access it
*(arrayPtr + 4) = 0;
int nextPos = *(arrayPtr + 4);
//the value of nextPos will be 0, or whatever value I previously filled that space with
The position in memory of *(arrayPtr + 4) is past the end of the space allocated for the array. But as far as I understand, the above still would not cause any problems. So aside from it being a requirement of C++, why even give arrays a specific size when declaring them?
When you go past the end of allocated memory, you are actually accessing memory of some other object (or memory that is free right now, but that could change later). So, it will cause you problems, especially if you try to write something to it.
I can access this array using pointer syntax
int value = *(arrayPtr + index);
Yeah, but don't. Use arrayPtr[index].
The position in memory of *(arrayPtr + 4) is past the end of the space allocated for the array. But as far as I understand, the above still would not cause any problems.
You understand wrong. Oh so very wrong. You're invoking undefined behavior and undefined behavior is undefined. It may work for a week, then break one day next week and you'll be left wondering why. If you don't know the collection size in advance use something dynamic like a vector instead of an array.
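A sketch of the vector approach. The helper `read_or` is hypothetical. `std::vector::at` checks the index and throws `std::out_of_range`, where `*(arrayPtr + 4)` would silently touch whatever lies past the end. `push_back` grows the container when you need a fifth element.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: returns v[index], or `fallback` if index is out of range.
// std::vector::at performs the bounds check that raw pointer access skips.
int read_or(const std::vector<int>& v, std::size_t index, int fallback) {
    try {
        return v.at(index);
    } catch (const std::out_of_range&) {
        return fallback;  // past the end: detected instead of undefined behavior
    }
}
```

For example, with `std::vector<int> values(4);` (four zero-initialized ints), `read_or(values, 4, -1)` returns -1. After `values.push_back(0);` the vector holds five elements and index 4 is valid.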
Yes, in C/C++ you can access memory outside of the space you claim to have allocated. Sometimes. This is what is referred to as undefined behavior.
Basically, you have told the compiler and the memory management system that you want space to store four integers, and the memory management system allocated space for you to store four integers. It gave you a pointer to that space. In the memory manager's internal accounting, those bytes of ram are now occupied, until you call delete[] arrayPtr;.
However, the memory manager has not allocated that next byte for you. You don't have any way of knowing, in general, what that next byte is, or who it belongs to.
In a simple example program like yours, which allocates only a few bytes and nothing else, chances are that the next byte belongs to your program and isn't occupied. If that array is the only dynamically allocated memory in your program, running over the end will probably appear to work.
But in a more complex program, with multiple dynamic memory allocations and deallocations, especially near the edges of memory pages, you really have no good way of knowing what any bytes outside of the memory you asked for contain. So when you write to bytes outside of the memory you asked for in new you could be writing to basically anything.
This is where undefined behavior comes in. Because you don't know what's in that space you wrote to, you don't know what will happen as a result. Here's some examples of things that could happen:
The memory was not allocated when you wrote to it. In that case, the data is fine, and nothing bad seems to happen. However, if a later memory allocation uses that space, anything you tried to put there will be lost.
The memory was allocated when you wrote to it. In that case, congratulations, you just overwrote some random bytes from some other data structure somewhere else in your program. Imagine replacing a variable somewhere in one of your objects with random data, and consider what that would mean for your program. Maybe a list somewhere else now has the wrong count. Maybe a string now has some random values for the first few characters, or is now empty because you replaced those characters with zeroes.
The array was allocated at the edge of a page, so the next bytes don't belong to your program. The address is outside your program's allocation. In this case, the OS detects you accessing random memory that isn't yours, and terminates your program immediately with SIGSEGV.
Basically, undefined behavior means that you are doing something illegal, but because C/C++ is designed to be fast, the language designers don't include an explicit check to make sure you don't break the rules, like other languages (e.g. Java, C#). They just list the behavior of breaking the rules as undefined, and then the people who make the compilers can have the output be simpler, faster code, since no array bounds checks are made, and if you break the rules, it's your own problem.
So yes, this sometimes works, but don't ever rely on it.
It would not cause any problems in a purely abstract setting, where you only worry about whether the logic of the algorithm is sound. In that case there's no reason to declare the size of an array at all. However, your computer exists in the physical world, and only has a limited amount of memory. When you're allocating memory, you're asking the operating system to let you use some of the computer's finite memory. If you go beyond that, the operating system should stop you, usually by killing your process/program.
Yes, you should write it as arrayPtr[index], but note that arrayPtr[4] means exactly the same thing as *(arrayPtr + 4): either way, that position in memory is past the end of the space you have allocated for the array. It's a flaw of C++ arrays that their size can't be extended once allocated.

dangers of heap overflows?

I have a question about heap overflows.
I understand that if a stack variable overruns its buffer, it could overwrite the EIP and ESP values and, for example, make the program jump to a place where the coder did not expect it to jump.
As I understand it, this happens because of the backward little-endian storage (where, e.g., the characters in an array are stored "backwards", from last to first).
If you, on the other hand, put that array on the heap, which grows opposite to the stack, and you overflowed it, would it just write random garbage into empty memory space then? (Unless you were on Solaris, which as far as I know is a big-endian system; side note.)
Would this basically be no danger, since it would just write into "empty space"?
So no aimed jumping to addresses and areas the code was not designed for?
Am I getting this wrong?
To specify my question:
I am writing a program where the user is meant to pass a string argument and a flag when executing it via command line, and I want to know if the user could perform a hack with this string argument when it is put on the heap with the malloc function.
If you, on the other hand, put that array on the heap, which grows opposite to the stack, and you overflowed it, would it just write random garbage into empty memory space then?
You are making a couple of assumptions:
You are assuming that the heap is at the end of the main memory segment. That ain't necessarily so.
You are assuming that the object in the heap is at the end of the heap. That ain't necessarily so. (In fact, it typically isn't so ...)
Here's an example that is likely to cause problems no matter how the heap is implemented:
char *a = malloc(100);
char *b = malloc(100);
char *c = malloc(100);
for (int i = 0; i < 200; i++) {  /* deliberate bug: b only has room for 100 chars */
    b[i] = 'Z';
}
Writing beyond the end of b is likely to trample either a or c ... or some other object in the heap, or the free list.
Depending on what objects you trample, you may overwrite function pointers, or you may do other damage that results in segmentation faults, unpredictable behaviour and so on. These things could be used for code injection, to cause the code to malfunction in other ways that are harmful from a security standpoint ... or just to implement a denial of service attack by crashing the target application / service.
There are various ways heap overflow could lead to code execution:
Most obvious - you overflow into another object that contains function pointers and get to overwrite one of them.
Slightly less obvious - the object you overflow into doesn't itself contain function pointers, but it contains pointers that will be used for writing, and you get to overwrite one of them to point to a function pointer so that a subsequent write overwrites a function pointer.
Exploiting heap bookkeeping structures - by overwriting the data that the heap allocator itself uses to track size and status of allocated/free blocks, you trick it into overwriting something valuable elsewhere in memory.
Etc.
For some advanced techniques, see:
http://packetstormsecurity.com/files/view/40638/MallocMaleficarum.txt
Even if you can't overwrite a return address, how do you feel about an attacker modifying the rest of your data? This shouldn't thrill you.
To answer your question generally: it is a very bad idea to let the user copy data anywhere without checking its size. You should absolutely never do that, especially on purpose.
If the user means no harm, they may crash your program, either by overwriting useful data, or by causing a page fault. If your user is malicious, you're potentially letting them hijack your system. Both are highly undesirable.
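For the command-line case in the question, checking the size before copying can be sketched as follows. The helper `copy_arg_checked` and the `max_len` policy are assumptions for illustration. It measures the argument first, rejects input longer than the limit, and allocates exactly enough room, so the copy can never run past the heap buffer.

```cpp
#include <cstdlib>
#include <cstring>

// Hypothetical helper: copies a user-supplied C string into a new malloc'd buffer.
// Returns nullptr if the input is longer than max_len or allocation fails.
// The caller must free() a non-null result.
char* copy_arg_checked(const char* arg, std::size_t max_len) {
    std::size_t len = std::strlen(arg);
    if (len > max_len) return nullptr;                      // reject oversized input
    char* buf = static_cast<char*>(std::malloc(len + 1));   // +1 for the terminator
    if (buf == nullptr) return nullptr;
    std::memcpy(buf, arg, len + 1);                         // copies the '\0' as well
    return buf;
}
```

In C++ code, `std::string s = argv[1];` sidesteps the problem entirely, since the string sizes its own storage.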
Endianness does not matter to buffer overflows. Big endian machines are just as vulnerable as little-endian machines. The only difference will be the byte order of the malicious data.
You may be thinking instead of the direction the stack grows in, which is independent of endianness. In the case where it grows up, you won't be able to hijack the return address of the function that declares the buffer. However, if you pass that buffer address to any other function, and this function overflows instead, an attacker may change this function's return address. This would be the case, for instance, if you called memcpy or scanf or any other function to modify your buffer (assuming that the compiler didn't inline them).
The stack usually grows downwards. In this case, an attacker can use an overflow to hijack the return address of the function that declares it.
In other words, neither the stack configuration nor endianness offer meaningful protection against stack buffer overflows.
As for the heap:
If you, on the other hand, put that array on the heap, which grows opposite to the stack, and you overflowed it, would it just write random garbage into empty memory space then?
The answer, as almost always, is that it depends, but probably not. The 32-bit implementation of malloc in glibc keeps bookkeeping structures at the end of the buffer (or at least, it used to). By overflowing onto the bookkeeping structures with the correct incantations, you could cause free, when the allocation was freed, to write four arbitrary bytes at an arbitrary location. This is a lot of power. This kind of exploit comes up regularly in capture-the-flag competitions and works very well in practice.

segfault with array

I have two questions regarding array:
First one is regarding following code:
int a[30]; //1
a[40]=1; //2
Why isn't line 2 giving a segfault? I think it should, because the array has been allocated space for only 30 ints, and any dereference outside its allocated space should give a segfault.
Second: assuming that the above code works, is there any chance that a[40] will get overwritten, since it isn't in the reserved range of the array?
Thanks in advance.
That's undefined behavior - it may crash, it may silently corrupt data, it may produce no observable results, anything. Don't do it.
In your example, the likely explanation is that the array is stack-allocated and so there's a wide range of addresses around the array accessible for writing, so there are no immediate observable results. However, depending on how (in which direction, toward larger or smaller addresses) the stack grows on your system, this might overwrite the return address and temporaries of functions up the call stack, and this will crash your program or make it misbehave when it tries to return from the function.
For performance reason, C will not check array size each time you access it. You could also access elements via direct pointers in which case there is no way to validate the access.
SEGFAULT will happen only if you are out of the memory allocated to your process.
For 2nd question, yes it can be overwritten as this memory is allocated to your process and is possibly used by other variables.
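The check that C skips for performance is easy to write by hand while the array's length is still part of its type. This is a sketch; the template `element_if_in_bounds` is hypothetical. Once the array decays to a plain pointer, as in the "direct pointers" case above, the length is gone and no such check is possible.

```cpp
#include <cstddef>

// Hypothetical helper: returns a pointer to arr[i] if i is in bounds, else nullptr.
// N is deduced from the array type, which is exactly the information a plain
// int* no longer carries.
template <typename T, std::size_t N>
T* element_if_in_bounds(T (&arr)[N], std::size_t i) {
    return i < N ? &arr[i] : nullptr;
}
```

With `int a[30];`, `element_if_in_bounds(a, 40)` returns nullptr instead of letting `a[40] = 1` write somewhere unknown.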
It depends on where the system has allocated that array: if by chance position 40 falls in memory reserved by the operating system, then you will receive a segfault.
Your application will crash only if you do something illegal for the rest of your system: if you try to access a virtual memory address that your program doesn't own, your hardware will notice, inform your operating system, and the OS will kill your application with a segmentation fault: you accessed a memory segment you were not supposed to.
However if you access a random memory address (which is what you did: for sure a[40] is outside of your array a, but it could be wherever), you could access a valid memory cell (which is what happened to you).
This is an error: you'll likely overwrite some memory area your program owns, thus risking breaking your program elsewhere, but the system cannot know whether you accessed it on purpose or by mistake and won't kill you.
Programs written in managed languages (i.e. programs that run in a protected environment that checks everything) would notice your erroneous memory access, but C is not a managed language: you're free to do whatever you want (as long as you don't create problems for the rest of the system).
The reason line 2 works and doesn't throw a segfault is that in C/C++ the array name decays to a pointer to its first element, and indexing is just pointer arithmetic with no bounds check. So your array variable a refers to some memory address, e.g. 1004. The array syntax tells your program how many bytes past the location of a to look for an array element.
This means that
printf("%p", (void*)a);
// prints out "1004"
and
printf("%p", (void*)&a[0]);
// prints out "1004"
should print the same value.
However,
printf("%p", (void*)&a[40]);
// prints out "1164" (assuming 4-byte ints)
prints the memory address that is sizeof(int) * 40 bytes past the address of a.
Yes, it will eventually be overwritten.
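The indexing-as-pointer-arithmetic relationship can be checked with an in-bounds index (10 here rather than 40, since even forming &a[40] for a 30-element array is undefined behavior). A small sketch; `byte_offset` is a hypothetical helper.

```cpp
#include <cstddef>

// Hypothetical helper: byte distance from base to element, for two pointers into
// the same array. Pointer subtraction gives an element count; multiplying by
// sizeof(int) converts it to the byte offset that a[i] implicitly adds to a.
std::size_t byte_offset(const int* base, const int* element) {
    return static_cast<std::size_t>(element - base) * sizeof(int);
}
```

With `int a[30];`, `&a[10] == a + 10` holds and `byte_offset(a, &a[10])` is `10 * sizeof(int)`. However, `sizeof(a)` is still `30 * sizeof(int)`, not the size of a pointer: an array decays to a pointer in most expressions, but it is not one.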
If you malloc the space, you should get a segfault (or at least I believe so), but when using an array without allocating space, you'll be able to overwrite memory for a while. It will crash eventually, possibly when the program does an array size check or maybe when you hit a memory block reserved for something else (not sure what's going on under the hood).
Funny thing is that, IIRC, efence won't catch this either :D.