Initializing C++ array in constant time - c++

char buffer[1000] = {0};
This initializes all 1000 elements to 0. Is this constant time? If not, why?
It seems like the compiler could optimize this to O(1) based on the following facts:
The array is of fixed size and known at compile time
The array is located on the stack, which means that presumably the executable could contain this data in the data segment of the executable (on Windows) as a chunk of data that is already filled with 0's.
Note that answers can be general to any compiler, but I'm specifically interested in answers tested on the MSVC compiler (any version) on Windows.
Bonus Points: Links to any articles, white papers, etc. on the details of this would be greatly appreciated.

If it's inside a function, no, it's not constant time.
Your second assumption isn't correct:
"The array is located on the stack, which means that presumably the executable could contain this data in the data segment of the executable (on Windows) as a chunk of data that is already filled with 0's."
The stack isn't already filled with zeros. It's filled with junk leftover from previous function calls.
So it's not possible to do it in O(1) because it will have to zero it.

It can be O(1) only as a global variable. If it is a local variable (on stack) it is O(n) where n is the size of array.
Stack is a shared memory, you need to actively zero always you want to have 1000 zeros in there. An array like you defined is not implemented as a pointer to data segment, it is 1000 variables on the stack and must be initialized in O(1000).
EDIT: Dani is right, I have to fix my statement: If it is a global array, it is initialized when program starts. And it is O(n) as well.

It will never be constant time, global or not. its true the compiler initializes that, but the operating system must load all the file into memory, which takes O(n) time.

The array is located on the stack, which means that presumably the executable could contain this data in the data segment of the executable (on Windows) as a chunk of data that is already filled with 0's.
What if you recurse into the function that defines array? The global DATA segment would need to have a copy of this array for each function call to allow each function to have its own array to work over. The compiler would have to run your code to decide the maximum recursions are going to happen.
Also what happens when you have multiple threads in your program and each calls foo? All the sudden you have shared stuff in DATA that has to be locked. The locking might cause more performance problems than getting rid of the initialization.
I wouldn't worry about it too much too. Most platforms have fairly efficient ways of zero filling memory. Unless you profile it and find a problem, don't sweat it.

As others have pointed out, assumption 2 is wrong. Stack variables are allocated at run-time in O(1) time but are not normally initialised unless you're running a debug build.
PUSH ebp
MOV ebp, esp
SUB esp, 10
// function body code goes here
Here, the stack pointer 'esp' is decremented by 10 to make room for some local function variables. They are not initalised... that would require a loop.
This article seems friendly enough.

If it's a global static, the "constant" here is ZERO - initialization is done at COMPILE TIME.

Related

C++ how are variables accessed in memory?

When I create a new variable in a C++ program, eg a char:
char c = 'a';
how does C++ then have access to this variable in memory? I would imagine that it would need to store the memory location of the variable, but then that would require a pointer variable, and this pointer would again need to be accessed.
See the docs:
When a variable is declared, the memory needed to store its value is
assigned a specific location in memory (its memory address).
Generally, C++ programs do not actively decide the exact memory
addresses where its variables are stored. Fortunately, that task is
left to the environment where the program is run - generally, an
operating system that decides the particular memory locations on
runtime. However, it may be useful for a program to be able to obtain
the address of a variable during runtime in order to access data cells
that are at a certain position relative to it.
You can also refer this article on Variables and Memory
The Stack
The stack is where local variables and function parameters reside. It
is called a stack because it follows the last-in, first-out principle.
As data is added or pushed to the stack, it grows, and when data is
removed or popped it shrinks. In reality, memory addresses are not
physically moved around every time data is pushed or popped from the
stack, instead the stack pointer, which as the name implies points to
the memory address at the top of the stack, moves up and down.
Everything below this address is considered to be on the stack and
usable, whereas everything above it is off the stack, and invalid.
This is all accomplished automatically by the operating system, and as
a result it is sometimes also called automatic memory. On the
extremely rare occasions that one needs to be able to explicitly
invoke this type of memory, the C++ key word auto can be used.
Normally, one declares variables on the stack like this:
void func () {
int i; float x[100];
...
}
Variables that are declared on the stack are only valid within the
scope of their declaration. That means when the function func() listed
above returns, i and x will no longer be accessible or valid.
There is another limitation to variables that are placed on the stack:
the operating system only allocates a certain amount of space to the
stack. As each part of a program that is being executed comes into
scope, the operating system allocates the appropriate amount of memory
that is required to hold all the local variables on the stack. If this
is greater than the amount of memory that the OS has allowed for the
total size of the stack, then the program will crash. While the
maximum size of the stack can sometimes be changed by compile time
parameters, it is usually fairly small, and nowhere near the total
amount of RAM available on a machine.
Assuming this is a local variable, then this variable is allocated on the stack - i.e. in the RAM. The compiler keeps track of the variable offset on the stack. In the basic scenario, in case any computation is then performed with the variable, it is moved to one of the processor's registers and the CPU performs the computation. Afterwards the result is returned back to the RAM. Modern processors keep whole stack frames in the registers and have multiple levels of registers, so it can get quite complex.
Please note the "c" name is no more mentioned in the binary (unless you have debugging symbols). The binary only then works with the memory locations. E.g. it would look like this (simple addition):
a = b + c
take value of memory offset 1 and put it in the register 1
take value of memory offset 2 and put in in the register 2
sum registers 1 and 2 and store the result in register 3
copy the register 3 to memory location 3
The binary doesn't know "a", "b" or "c". The compiler just said "a is in memory 1, b is in memory 2, c is in memory 3". And the CPU just blindly executes the commands the compiler has generated.
C++ itself (or, the compiler) would have access to this variable in terms of the program structure, represented as a data structure. Perhaps you're asking how other parts in the program would have access to it at run time.
The answer is that it varies. It can be stored either in a register, on the stack, on the heap, or in the data/bss sections (global/static variables), depending on its context and the platform it was compiled for: If you needed to pass it around by reference (or pointer) to other functions, then it would likely be stored on the stack. If you only need it in the context of your function, it would probably be handled in a register. If it's a member variable of an object on the heap, then it's on the heap, and you reference it by an offset into the object. If it's a global/static variable, then its address is determined once the program is fully loaded into memory.
C++ eventually compiles down to machine language, and often runs within the context of an operating system, so you might want to brush up a bit on Assembly basics, or even some OS principles, to better understand what's going on under the hood.
Lets say our program starts with a stack address of 4000000
When, you call a function, depending how much stack you use, it will "allocate it" like this
Let's say we have 2 ints (8bytes)
int function()
{
int a = 0;
int b = 0;
}
then whats gonna happen in assembly is
MOV EBP,ESP //Here we store the original value of the stack address (4000000) in EBP, and we restore it at the end of the function back to 4000000
SUB ESP, 8 //here we "allocate" 8 bytes in the stack, which basically just decreases the ESP addr by 8
so our ESP address was changed from
4000000
to
3999992
that's how the program knows knows the stack addresss for the first int is "3999992" and the second int is from 3999996 to 4000000
Even tho this pretty much has nothing to do with the compiler, it's really important to know because when you know how stack is "allocated", you realize how cheap it is to do things like
char my_array[20000];
since all it's doing is just doing sub esp, 20000 which is a single assembly instruction
but if u actually use all those bytes like memset(my_array,20000) that's a different history.
how does C++ then have access to this variable in memory?
It doesn't!
Your computer does, and it is instructed on how to do that by loading the location of the variable in memory into a register. This is all handled by assembly language. I shan't go into the details here of how such languages work (you can look it up!) but this is rather the purpose of a C++ compiler: to turn an abstract, high-level set of "instructions" into actual technical instructions that a computer can understand and execute. You could sort of say that assembly programs contain a lot of pointers, though most of them are literals rather than "variables".

What do C++ arrays init to?

So I can fix this manually so it isn't an urgent question but I thought it was really strange:
Here is the entirety of my code before the weird thing that happens:
int main(int argc, char** arg) {
int memory[100];
int loadCounter = 0;
bool getInput = true;
print_memory(memory);
and then some other unrelated stuff.
The print memory just prints the array which should've initialized to all zero's but instead the first few numbers are:
+1606636544 +32767 +1606418432 +32767 +1856227894 +1212071026 +1790564758 +813168429 +0000 +0000
(the plus and the filler zeros are just for formatting since all the numbers are supposed to be from 0-1000 once the array is filled. The rest of the list is zeros)
It also isn't memory leaking because I tried initializing a different array variable and on the first run it also gave me a ton of weird numbers. Why is this happening?
Since you asked "What do C++ arrays init to?", the answer is they init to whatever happens to be in the memory they have been allocated at the time they come into scope.
I.e. they are not initialized.
Do note that some compilers will initialize stack variables to zero in debug builds; this can lead to nasty, randomly occurring issues once you start doing release builds.
The array you are using is stack allocated:
int memory[100];
When the particular function scope exits (In this case main) or returns, the memory will be reclaimed and it will not leak. This is how stack allocated memory works. In this case you allocated 100 integers (32 bits each on my compiler) on the stack as opposed to on the heap. A heap allocation is just somewhere else in memory hopefully far far away from the stack. Anyways, heap allocated memory has a chance for leaking. Low level Plain Old Data allocated on the stack (like you wrote in your code) won't leak.
The reason you got random values in your function was probably because you didn't initialize the data in the 'memory' array of integers. In release mode the application or the C runtime (in windows at least) will not take care of initializing that memory to a known base value. So the memory that is in the array is memory left over from last time the stack was using that memory. It could be a few milli-seconds old (most likely) to a few seconds old (less likely) to a few minutes old (way less likely). Anyways, it's considered garbage memory and it's to be avoided at all costs.
The problem is we don't know what is in your function called print_memory. But if that function doesn't alter the memory in any ways, than that would explain why you are getting seemingly random values. You need to initialize those values to something first before using them. I like to declare my stack based buffers like this:
int memory[100] = {0};
That's a shortcut for the compiler to fill the entire array with zero's.
It works for strings and any other basic data type too:
char MyName[100] = {0};
float NoMoney[100] = {0};
Not sure what compiler you are using, but if you are using a microsoft compiler with visual studio you should be just fine.
In addition to other answers, consider this: What is an array?
In managed languages, such as Java or C#, you work with high-level abstractions. C and C++ don't provide abstractions (I mean hardware abstractions, not language abstractions like OO features). They are dessigned to work close to metal that is, the language uses the hardware directly (Memory in this case) without abstractions.
That means when you declare a local variable, int a for example, what the compiler does is to say "Ok, im going to interpret the chunk of memory [A,A + sizeof(int)] as an integer, which I call 'a'" (Where A is the offset between the beginning of that chunk and the start address of function's stack frame).
As you can see, the compiler only "assigns" memory-segments to variables. It does not do any "magic", like "creating" variables. You have to understand that your code is executed in a machine, and the machine has only a memory and a CPU. There is no magic.
So what is the value of a variable when the function execution starts? The value represented with the data which the chunk of memory of the variable has. Commonly, that data has no sense from our current point of view (Could be part of the data used previously by a string, for example), so when you access that variable you get extrange values. Thats what we call "garbage": Data previously written which has no sense in our context.
The same applies to an array: An array is only a bigger chunk of memory, with enough space to fit all the values of the array: [A,A + (length of the array)*sizeof(type of array elements)]. So as in the variable case, the memory contains garbage.
Commonly you want to initialize an array with a set of values during its declaration. You could achieve that using an initialiser list:
int array[] = {1,2,3,4};
In that case, the compiler adds code to the function to initialize the memory-chunk which the array is with that values.
Sidenote: Non-POD types and static storage
The things explained above only applies to POD types such as basic types and arrays of basic types. With non-POD types like classes the compiler adds calls to the constructor of the variables, which are designed to initialise the values (attributes) of a class instance.
In addition, even if you use POD types, if variables have static storage specification, the compiler initializes its memory with a default value, because static variables are allocated at program start.
the local variable on stack is not initialized in c/c++. c/c++ is designed to be fast so it doesn't zero stack on function calls.
Before main() runs, the language runtime sets up the environment. Exactly what it's doing you'd have to discover by breaking at the load module's entry point and watching the stack pointer, but at any rate your stack space on entering main is not guaranteed clean.
Anything that needs clean stack or malloc or new space gets to clean it itself. Plenty of things don't. C[++] isn't in the business of doing unnecessary things. In C++ a class object can have non-trivial constructors that run implicitly, those guarantee the object's set up for use, but arrays and plain scalars don't have constructors, if you want an inital value you have to declare an initializer.

How much memory does a function use?

I was asked this question in an interview- "how much memory does a function use?". So I tried to answer by saying you could add up all the memory taken by all the data variables , data structures it instantiates- for example add 4 bytes for long, 1 for char , 4 for int, 32 bits for a pointer on 32 bits system, and adding any inputs that were dynamically allotted. The interviewer was not happy with my answer.
I am learning C++, and will appreciate any insight.
Question is quite undefined. A function itself will occupy just the space for its activation record from the caller, for parameters and for its local variables on the stack. According to architecture the activation record will contain things like saved registers, address to return when the function is called and whatever.
But a function can allocate how much memory it requires on the heap so there is no a precise answer.
Oh in addition, if the function is recursive then it could use a lot of memory, always because of activation records which are needed between each call.
From point of view of static behavior,
1. Data used by it - Sum of all variables memory sizes
2. Size of instructions - Each instruction written inside a function will occupy some memory in binary. That is how size of your function will be identified. This is nothing but your compiled code size.
From point of view of dynamic behavior (run time),
1. Heap memory resulted because of a function call is function memory.
i think this guide on function footprints is what you were talking about. they were probably looking for "32/64 bits (integer) because its a pointer"...
I bet the right answer could be "Undefined". An empty function consumes nothing.
function func(){}
A chaining one takes more than we can actually estimate.
function funcA()
{
funcB();
funcC();
//...
}
A local object without being used in its scope will be optimized away by most compilers so it too takes zero memory in its container.
function func()
{
var IamIgnored=0;
//don't do anything with IamIgnored
}
And please don't miss the memory alignment so I think calculating memory used by an object or a function can't be simply done by accumulating all objects' memory sizes within their scopes.

Stack overflow with large array but not with equally large vector?

I ran into a funny issue today working with large data structures. I initially was using a vector to store upwards of 1000000 ints but later decided I didn't actually need the dynamic functionality of the vector (I was reserving 1000000 spots as soon as it was declared anyway) and it would be beneficial to, instead, be able to add values any place in the data structure. So I switched it to an array and BAM stack overflow. I'm guessing this is because declaring the size of the array at compile time puts it in the stack and making use of a dynamic vector instead placed it on the heap (which I'm guessing is larger?).
So what's the right answer here? Move back to a dynamic memory system just so it gets put on the heap? Increase the size of the stack? Or am I way off base on the whole thing here...?
Thanks!
I initially was using a vector to store upwards of 1000000 ints
Good idea.
but later decided I didn't actually need the dynamic functionality of the vector (I was reserving 1000000 spots as soon as it was declared anyway)
Not such a good idea. You did need it.
and it would be beneficial to, instead, be able to add values any place in the data structure.
I don't follow.
I'm guessing this is because declaring the size of the array at compile time puts it in the stack and making use of a dynamic vector instead placed it on the heap (which I'm guessing is larger?).
Much. The call stack is typically of the order of 1MB-2MB in size by default. Your "heap" (free store) is only really bounded by your available RAM.
So what's the right answer here? Move back to a dynamic memory system just so it gets put on the heap?
Yes.
[edit: Joachim's right — static is another possible answer.]
Increase the size of the stack?
You could but even if you could stretch 4MB out of it, you've left yourself no wiggle room for other local data variables. Best use dynamic memory — that's the appropriate thing to do.
Or am I way off base on the whole thing here...?
No.

Why does a C/C++ compiler need know the size of an array at compile time?

I know C standards preceding C99 (as well as C++) says that the size of an array on stack must be known at compile time. But why is that? The array on stack is allocated at run-time. So why does the size matter in compile time? Hope someone explain to me what a compiler will do with size at compile time. Thanks.
The example of such an array is:
void func()
{
/*Here "array" is a local variable on stack, its space is allocated
*at run-time. Why does the compiler need know its size at compile-time?
*/
int array[10];
}
To understand why variably-sized arrays are more complicated to implement, you need to know a little about how automatic storage duration ("local") variables are usually implemented.
Local variables tend to be stored on the runtime stack. The stack is basically a large array of memory, which is sequentially allocated to local variables and with a single index pointing to the current "high water mark". This index is the stack pointer.
When a function is entered, the stack pointer is moved in one direction to allocate memory on the stack for local variables; when the function exits, the stack pointer is moved back in the other direction, to deallocate them.
This means that the actual location of local variables in memory is defined only with reference to the value of the stack pointer at function entry1. The code in a function must access local variables via an offset from the stack pointer. The exact offsets to be used depend upon the size of the local variables.
Now, when all the local variables have a size that is fixed at compile-time, these offsets from the stack pointer are also fixed - so they can be coded directly into the instructions that the compiler emits. For example, in this function:
void foo(void)
{
int a;
char b[10];
int c;
a might be accessed as STACK_POINTER + 0, b might be accessed as STACK_POINTER + 4, and c might be accessed as STACK_POINTER + 14.
However, when you introduce a variably-sized array, these offsets can no longer be computed at compile-time; some of them will vary depending upon the size that the array has on this invocation of the function. This makes things significantly more complicated for compiler writers, because they must now write code that accesses STACK_POINTER + N - and since N itself varies, it must also be stored somewhere. Often this means doing two accesses - one to STACK_POINTER + <constant> to load N, then another to load or store the actual local variable of interest.
1. In fact, "the value of the stack pointer at function entry" is such a useful value to have around, that it has a name of its own - the frame pointer - and many CPUs provide a separate register dedicated to storing the frame pointer. In practice, it is usually the frame pointer from which the location of local variables is calculated, rather than the stack pointer itself.
It is not an extremely complicated thing to support, so the reason C89 doesn't allow this is not because it was impossible back then.
There are however two important reasons why it is not in C89:
The runtime code will get less efficient if the array size is not known at compile-time.
Supporting this makes life harder for compiler writers.
Historically, it has been very important that C compilers should be (relatively) easy to write. Also, it should be possible to make compilers simple and small enough to run on modest computer systems (by 80s standards). Another important feature of C is that the generated code should consistently be extremely efficient, without any surprises,
I think it is unfortunate that these values no longer hold for C99.
The compiler has to generate the code to create the space for the frame on the stack to hold the array and other local local variables. For this it needs the size of the array.
Depends on how you allocate the array.
If you create it as a local variable, and specify a length, then it matters because the compiler needs to know how much space to allocate on the stack for the elements of the array. If you don't specify a size of the array, then it doesn't know how much space to set aside for the array elements.
If you create just a pointer to an array, then all you need to do is allocate the space for the pointer itself, and then you can dynamically create array elements during run time. But in this form of array creation, you're allocating space for the array elements in the heap, not on the stack.
In C++ this becomes even more difficult to implement, because variables stored on the stack must have their destructors called in the event of an exception, or upon returning from a given function or scope. Keeping track of the exact number/size of variables to be destroyed adds additional overhead and complexity. While in C it's possible to use something like a frame pointer to make freeing of the VLA implicit, in C++ that doesn't help you, because those destructors need to be called.
Also, VLAs can cause Denial of Service security vulnerabilities. If the user is able to supply any value which is eventually used as the size for a VLA, then they could use a sufficiently large value to cause a stack overflow (and therefore failure) in your process.
Finally, C++ already has a safe and effective variable length array (std::vector<t>), so there's little reason to implement this feature for C++ code.
Let's say you create a variable sized array on the stack. The size of the stack frame needed by the function won't be known at compile time. So, C assumed that some run time environments require this to be known up front. Hence the limitation. C goes back to the early 1970's. Many languages at the time had a "static" look and feel (like Fortran)