What do C++ arrays init to? - c++

So I can fix this manually so it isn't an urgent question but I thought it was really strange:
Here is the entirety of my code before the weird thing that happens:
int main(int argc, char** arg) {
int memory[100];
int loadCounter = 0;
bool getInput = true;
print_memory(memory);
and then some other unrelated stuff.
The print memory just prints the array which should've initialized to all zero's but instead the first few numbers are:
+1606636544 +32767 +1606418432 +32767 +1856227894 +1212071026 +1790564758 +813168429 +0000 +0000
(the plus and the filler zeros are just for formatting since all the numbers are supposed to be from 0-1000 once the array is filled. The rest of the list is zeros)
It also isn't memory leaking because I tried initializing a different array variable and on the first run it also gave me a ton of weird numbers. Why is this happening?

Since you asked "What do C++ arrays init to?", the answer is they init to whatever happens to be in the memory they have been allocated at the time they come into scope.
I.e. they are not initialized.
Do note that some compilers will initialize stack variables to zero in debug builds; this can lead to nasty, randomly occurring issues once you start doing release builds.

The array you are using is stack allocated:
int memory[100];
When the particular function scope exits (In this case main) or returns, the memory will be reclaimed and it will not leak. This is how stack allocated memory works. In this case you allocated 100 integers (32 bits each on my compiler) on the stack as opposed to on the heap. A heap allocation is just somewhere else in memory hopefully far far away from the stack. Anyways, heap allocated memory has a chance for leaking. Low level Plain Old Data allocated on the stack (like you wrote in your code) won't leak.
The reason you got random values in your function was probably because you didn't initialize the data in the 'memory' array of integers. In release mode the application or the C runtime (in windows at least) will not take care of initializing that memory to a known base value. So the memory that is in the array is memory left over from last time the stack was using that memory. It could be a few milli-seconds old (most likely) to a few seconds old (less likely) to a few minutes old (way less likely). Anyways, it's considered garbage memory and it's to be avoided at all costs.
The problem is we don't know what is in your function called print_memory. But if that function doesn't alter the memory in any ways, than that would explain why you are getting seemingly random values. You need to initialize those values to something first before using them. I like to declare my stack based buffers like this:
int memory[100] = {0};
That's a shortcut for the compiler to fill the entire array with zero's.
It works for strings and any other basic data type too:
char MyName[100] = {0};
float NoMoney[100] = {0};
Not sure what compiler you are using, but if you are using a microsoft compiler with visual studio you should be just fine.

In addition to other answers, consider this: What is an array?
In managed languages, such as Java or C#, you work with high-level abstractions. C and C++ don't provide abstractions (I mean hardware abstractions, not language abstractions like OO features). They are dessigned to work close to metal that is, the language uses the hardware directly (Memory in this case) without abstractions.
That means when you declare a local variable, int a for example, what the compiler does is to say "Ok, im going to interpret the chunk of memory [A,A + sizeof(int)] as an integer, which I call 'a'" (Where A is the offset between the beginning of that chunk and the start address of function's stack frame).
As you can see, the compiler only "assigns" memory-segments to variables. It does not do any "magic", like "creating" variables. You have to understand that your code is executed in a machine, and the machine has only a memory and a CPU. There is no magic.
So what is the value of a variable when the function execution starts? The value represented with the data which the chunk of memory of the variable has. Commonly, that data has no sense from our current point of view (Could be part of the data used previously by a string, for example), so when you access that variable you get extrange values. Thats what we call "garbage": Data previously written which has no sense in our context.
The same applies to an array: An array is only a bigger chunk of memory, with enough space to fit all the values of the array: [A,A + (length of the array)*sizeof(type of array elements)]. So as in the variable case, the memory contains garbage.
Commonly you want to initialize an array with a set of values during its declaration. You could achieve that using an initialiser list:
int array[] = {1,2,3,4};
In that case, the compiler adds code to the function to initialize the memory-chunk which the array is with that values.
Sidenote: Non-POD types and static storage
The things explained above only applies to POD types such as basic types and arrays of basic types. With non-POD types like classes the compiler adds calls to the constructor of the variables, which are designed to initialise the values (attributes) of a class instance.
In addition, even if you use POD types, if variables have static storage specification, the compiler initializes its memory with a default value, because static variables are allocated at program start.

the local variable on stack is not initialized in c/c++. c/c++ is designed to be fast so it doesn't zero stack on function calls.

Before main() runs, the language runtime sets up the environment. Exactly what it's doing you'd have to discover by breaking at the load module's entry point and watching the stack pointer, but at any rate your stack space on entering main is not guaranteed clean.
Anything that needs clean stack or malloc or new space gets to clean it itself. Plenty of things don't. C[++] isn't in the business of doing unnecessary things. In C++ a class object can have non-trivial constructors that run implicitly, those guarantee the object's set up for use, but arrays and plain scalars don't have constructors, if you want an inital value you have to declare an initializer.

Related

stack overflow eror in c++ [duplicate]

I am using Dev C++ to write a simulation program. For it, I need to declare a single dimensional array with the data type double. It contains 4200000 elements - like double n[4200000].
The compiler shows no error, but the program exits on execution. I have checked, and the program executes just fine for an array having 5000 elements.
Now, I know that declaring such a large array on the stack is not recommended. However, the thing is that the simulation requires me to call specific elements from the array multiple times - for example, I might need the value of n[234] or n[46664] for a given calculation. Therefore, I need an array in which it is easier to sift through elements.
Is there a way I can declare this array on the stack?
No there is no(we'll say "reasonable") way to declare this array on the stack. You can however declare the pointer on the stack, and set aside a bit of memory on the heap.
double *n = new double[4200000];
accessing n[234] of this, should be no quicker than accessing n[234] of an array that you declared like this:
double n[500];
Or even better, you could use vectors
std::vector<int> someElements(4200000);
someElements[234];//Is equally fast as our n[234] from other examples, if you optimize (-O3) and the difference on small programs is negligible if you don't(+5%)
Which if you optimize with -O3, is just as fast as an array, and much safer. As with the
double *n = new double[4200000];
solution you will leak memory unless you do this:
delete[] n;
And with exceptions and various things, this is a very unsafe way of doing things.
You can increase your stack size. Try adding these options to your link flags:
-Wl,--stack,36000000
It might be too large though (I'm not sure if Windows places an upper limit on stack size.) In reality though, you shouldn't do that even if it works. Use dynamic memory allocation, as pointed out in the other answers.
(Weird, writing an answer and hoping it won't get accepted... :-P)
Yes, you can declare this array on the stack (with a little extra work), but it is not wise.
There is no justifiable reason why the array has to live on the stack.
The overhead of dynamically allocating a single array once is neglegible (you could say "zero"), and a smart pointer will safely take care of not leaking memory, if that is your concern.
Stack allocated memory is not in any way different from heap allocated memory (apart from some caching effects for small objects, but these do not apply here).
Insofar, just don't do it.
If you insist that you must allocate the array on the stack, you will need to reserve 32 megabytes of stack space first (preferrably a bit more). For that, using Dev-C++ (which presumes Windows+MingW) you will either need to set the reserved stack size for your executable using compiler flags such as -Wl,--stack,34000000 (this reserves somewhat more than 32MiB), or create a thread (which lets you specify a reserved stack size for that thread).
But really, again, just don't do that. There's nothing wrong with allocating a huge array dynamically.
Are there any reasons you want this on the stack specifically?
I'm asking because the following will give you a construct that can be used in a similar way (especially accessing values using array[index]), but it is a lot less limited in size (total max size depending on 32bit/64bit memory model and available memory (RAM and swap memory)) because it is allocated from the heap.
int arraysize= 4200000;
int *heaparray= new int[arraysize];
...
k= heaparray[456];
...
delete [] heaparray;
return;

Why uninitialized char array is filled with random symbols?

I have a this fragment of code in C++:
char x[50];
cout << x << endl;
which outputs some random symbols as seen here:
So my first question: what is the reason behind this output? Shouldn't it be spaces or at least same symbols?
The reason I am concerned with this is that I am writing program in CUDA and I'm doing some character manipulations inside __global__ function, hence the use of string gives a "calling host function is not allowed" error.
But if I am using "big enough" char array (each chunk of text I am operating with differs in size, meaning that it will not always utilize char array fully) it's sometimes not fully filled and I left with junk like in the picture below hanging at the end of text:
So my second question: is there any way to avoid this?
what is the reason behind this output?
The values in an automatic variable are indeterminate. The standard doesn't specify it, so it might be spaces as you said, it might be random content.
[...] sometimes not fully filled and I left with junk [...]
Strings in C are null-terminated, so any routine dedicated to printing a string will loop as long as no null byte is encountered. In uninitialized memory, this null byte occurs randomly (or not at all). These weird, trailing characters are a result of that.
is there any way to avoid this?
Yes. Initialize it.
(will assume x86 in this post)
what is the reason behind this output?
Here's roughly what happens, in assembly, when you do char x[50];:
ADD ESP, 0x34 ; 52 bytes
Essentially, the stack is moved up by 0x34 bytes (must be divisible by 4). Then, that space on the stack becomes x. There's no cleaning, no changes or pushes or pops, just this space becoming x. Anything that was there before (abandoned params, return addresses, variables from previous function calls) will be in x.
Here's roughly what happens when you do new char[50]:
1. Control gets passed to the allocator
2. The allocator looks for any heap of sufficient size (readas: an already allocated but uncommited heap)
3. If 2 fails, the allocator makes a new heap
4. The allocator takes the heap (either the found or allocated one) and commits it
5. The address of that heap is returned to your code where it is used as a char*
The same as with a stack, you get whatever data is there. Some programs or systems may have allocators that zero out heaps when they are allocated or committed, where others may only zero when allocated but not committed, and some may not zero at all. Depending on the allocator, you may get clean memory or you may get re-used and dirty memory. This is why the values here can be non-zero and aren't predictable.
is there any way to avoid this?
In the case of heap memory, you can overload the new and delete operators in C++ and always zero newly allocated memory. You can see examples of overloading these operators here. As for memory on the stack, you just have to live with zeroing it out every time.
ZeroMemory(myArray, sizeof(myarray));
Alternatively, for both methods, you could stay away from naked arrays and use std::vector or other wrappers that take care of initialization for you. You'll still want to make sure to initialize integers and other numeric or pointer data-types, though.
No, there is no way to avoid it. C++ does not initialize automatic variables of built-in types (such as arrays of built-in types in your case) automatically, you need to initialize them yourself.
Why are you having issues with this code?
char x[50];
cout << new char[50] << endl;
cout << x << endl;
You're leaking memory with the 'new char[50] without a corresponding delete.
Also, uninitialized memory is undefined as others have said and in most cases you get garbage within that memory block. A better method is to initialize it:
char x[50] = {};
char* y = new char[50]();
Then just remember to call delete on y later to free the memory. Yes, the OS will do it for you, but this is never a way to write good programs though.

What's the advantage of malloc?

What is the advantage of allocating a memory for some data. Instead we could use an array of them.
Like
int *lis;
lis = (int*) malloc ( sizeof( int ) * n );
/* Initialize LIS values for all indexes */
for ( i = 0; i < n; i++ )
lis[i] = 1;
we could have used an ordinary array.
Well I don't understand exactly how malloc works, what is actually does. So explaining them would be more beneficial for me.
And suppose we replace sizeof(int) * n with just n in the above code and then try to store integer values, what problems might i be facing? And is there a way to print the values stored in the variable directly from the memory allocated space, for example here it is lis?
Your question seems to rather compare dynamically allocated C-style arrays with variable-length arrays, which means that this might be what you are looking for: Why aren't variable-length arrays part of the C++ standard?
However the c++ tag yields the ultimate answer: use std::vector object instead.
As long as it is possible, avoid dynamic allocation and responsibility for ugly memory management ~> try to take advantage of objects with automatic storage duration instead. Another interesting reading might be: Understanding the meaning of the term and the concept - RAII (Resource Acquisition is Initialization)
"And suppose we replace sizeof(int) * n with just n in the above code and then try to store integer values, what problems might i be facing?"
- If you still consider n to be the amount of integers that it is possible to store in this array, you will most likely experience undefined behavior.
More fundamentally, I think, apart from the stack vs heap and variable vs constant issues (and apart from the fact that you shouldn't be using malloc() in C++ to begin with), is that a local array ceases to exist when the function exits. If you return a pointer to it, that pointer is going to be useless as soon as the caller receives it, whereas memory dynamically allocated with malloc() or new will still be valid. You couldn't implement a function like strdup() using a local array, for instance, or sensibly implement a linked representation list or tree.
The answer is simple. Local1 arrays are allocated on your stack, which is a small pre-allocated memory for your program. Beyond a couple thousand data, you can't really do much on a stack. For higher amounts of data, you need to allocate memory out of your stack.
This is what malloc does.
malloc allocates a piece of memory as big as you ask it. It returns a pointer to the start of that memory, which could be treated similar to an array. If you write beyond the size of that memory, the result is undefined behavior. This means everything could work alright, or your computer may explode. Most likely though you'd get a segmentation fault error.
Reading values from the memory (for example for printing) is the same as reading from an array. For example printf("%d", list[5]);.
Before C99 (I know the question is tagged C++, but probably you're learning C-compiled-in-C++), there was another reason too. There was no way you could have an array of variable length on the stack. (Even now, variable length arrays on the stack are not so useful, since the stack is small). That's why for variable amount of memory, you needed the malloc function to allocate memory as large as you need, the size of which is determined at runtime.
Another important difference between local arrays, or any local variable for that matter, is the life duration of the object. Local variables are inaccessible as soon as their scope finishes. malloced objects live until they are freed. This is essential in practically all data structures that are not arrays, such as linked-lists, binary search trees (and variants), (most) heaps etc.
An example of malloced objects are FILEs. Once you call fopen, the structure that holds the data related to the opened file is dynamically allocated using malloc and returned as a pointer (FILE *).
1 Note: Non-local arrays (global or static) are allocated before execution, so they can't really have a length determined at runtime.
I assume you are asking what is the purpose of c maloc():
Say you want to take an input from user and now allocate an array of that size:
int n;
scanf("%d",&n);
int arr[n];
This will fail because n is not available at compile time. Here comes malloc()
you may write:
int n;
scanf("%d",&n);
int* arr = malloc(sizeof(int)*n);
Actually malloc() allocate memory dynamically in the heap area
Some older programming environments did not provide malloc or any equivalent functionality at all. If you needed dynamic memory allocation you had to code it yourself on top of gigantic static arrays. This had several drawbacks:
The static array size put a hard upper limit on how much data the program could process at any one time, without being recompiled. If you've ever tried to do something complicated in TeX and got a "capacity exceeded, sorry" message, this is why.
The operating system (such as it was) had to reserve space for the static array all at once, whether or not it would all be used. This phenomenon led to "overcommit", in which the OS pretends to have allocated all the memory you could possibly want, but then kills your process if you actually try to use more than is available. Why would anyone want that? And yet it was hyped as a feature in mid-90s commercial Unix, because it meant that giant FORTRAN simulations that potentially needed far more memory than your dinky little Sun workstation had, could be tested on small instance sizes with no trouble. (Presumably you would run the big instance on a Cray somewhere that actually had enough memory to cope.)
Dynamic memory allocators are hard to implement well. Have a look at the jemalloc paper to get a taste of just how hairy it can be. (If you want automatic garbage collection it gets even more complicated.) This is exactly the sort of thing you want a guru to code once for everyone's benefit.
So nowadays even quite barebones embedded environments give you some sort of dynamic allocator.
However, it is good mental discipline to try to do without. Over-use of dynamic memory leads to inefficiency, of the kind that is often very hard to eliminate after the fact, since it's baked into the architecture. If it seems like the task at hand doesn't need dynamic allocation, perhaps it doesn't.
However however, not using dynamic memory allocation when you really should have can cause its own problems, such as imposing hard upper limits on how long strings can be, or baking nonreentrancy into your API (compare gethostbyname to getaddrinfo).
So you have to think about it carefully.
we could have used an ordinary array
In C++ (this year, at least), arrays have a static size; so creating one from a run-time value:
int lis[n];
is not allowed. Some compilers allow this as a non-standard extension, and it's due to become standard next year; but, for now, if we want a dynamically sized array we have to allocate it dynamically.
In C, that would mean messing around with malloc; but you're asking about C++, so you want
std::vector<int> lis(n, 1);
to allocate an array of size n containing int values initialised to 1.
(If you like, you could allocate the array with new int[n], and remember to free it with delete [] lis when you're finished, and take extra care not to leak if an exception is thrown; but life's too short for that nonsense.)
Well I don't understand exactly how malloc works, what is actually does. So explaining them would be more beneficial for me.
malloc in C and new in C++ allocate persistent memory from the "free store". Unlike memory for local variables, which is released automatically when the variable goes out of scope, this persists until you explicitly release it (free in C, delete in C++). This is necessary if you need the array to outlive the current function call. It's also a good idea if the array is very large: local variables are (typically) stored on a stack, with a limited size. If that overflows, the program will crash or otherwise go wrong. (And, in current standard C++, it's necessary if the size isn't a compile-time constant).
And suppose we replace sizeof(int) * n with just n in the above code and then try to store integer values, what problems might i be facing?
You haven't allocated enough space for n integers; so code that assumes you have will try to access memory beyond the end of the allocated space. This will cause undefined behaviour; a crash if you're lucky, and data corruption if you're unlucky.
And is there a way to print the values stored in the variable directly from the memory allocated space, for example here it is lis?
You mean something like this?
for (i = 0; i < len; ++i) std::cout << lis[i] << '\n';

Declare large array on Stack

I am using Dev C++ to write a simulation program. For it, I need to declare a single dimensional array with the data type double. It contains 4200000 elements - like double n[4200000].
The compiler shows no error, but the program exits on execution. I have checked, and the program executes just fine for an array having 5000 elements.
Now, I know that declaring such a large array on the stack is not recommended. However, the thing is that the simulation requires me to call specific elements from the array multiple times - for example, I might need the value of n[234] or n[46664] for a given calculation. Therefore, I need an array in which it is easier to sift through elements.
Is there a way I can declare this array on the stack?
No there is no(we'll say "reasonable") way to declare this array on the stack. You can however declare the pointer on the stack, and set aside a bit of memory on the heap.
double *n = new double[4200000];
accessing n[234] of this, should be no quicker than accessing n[234] of an array that you declared like this:
double n[500];
Or even better, you could use vectors
std::vector<int> someElements(4200000);
someElements[234];//Is equally fast as our n[234] from other examples, if you optimize (-O3) and the difference on small programs is negligible if you don't(+5%)
Which if you optimize with -O3, is just as fast as an array, and much safer. As with the
double *n = new double[4200000];
solution you will leak memory unless you do this:
delete[] n;
And with exceptions and various things, this is a very unsafe way of doing things.
You can increase your stack size. Try adding these options to your link flags:
-Wl,--stack,36000000
It might be too large though (I'm not sure if Windows places an upper limit on stack size.) In reality though, you shouldn't do that even if it works. Use dynamic memory allocation, as pointed out in the other answers.
(Weird, writing an answer and hoping it won't get accepted... :-P)
Yes, you can declare this array on the stack (with a little extra work), but it is not wise.
There is no justifiable reason why the array has to live on the stack.
The overhead of dynamically allocating a single array once is neglegible (you could say "zero"), and a smart pointer will safely take care of not leaking memory, if that is your concern.
Stack allocated memory is not in any way different from heap allocated memory (apart from some caching effects for small objects, but these do not apply here).
Insofar, just don't do it.
If you insist that you must allocate the array on the stack, you will need to reserve 32 megabytes of stack space first (preferrably a bit more). For that, using Dev-C++ (which presumes Windows+MingW) you will either need to set the reserved stack size for your executable using compiler flags such as -Wl,--stack,34000000 (this reserves somewhat more than 32MiB), or create a thread (which lets you specify a reserved stack size for that thread).
But really, again, just don't do that. There's nothing wrong with allocating a huge array dynamically.
Are there any reasons you want this on the stack specifically?
I'm asking because the following will give you a construct that can be used in a similar way (especially accessing values using array[index]), but it is a lot less limited in size (total max size depending on 32bit/64bit memory model and available memory (RAM and swap memory)) because it is allocated from the heap.
int arraysize= 4200000;
int *heaparray= new int[arraysize];
...
k= heaparray[456];
...
delete [] heaparray;
return;

Why does a C/C++ compiler need know the size of an array at compile time?

I know C standards preceding C99 (as well as C++) says that the size of an array on stack must be known at compile time. But why is that? The array on stack is allocated at run-time. So why does the size matter in compile time? Hope someone explain to me what a compiler will do with size at compile time. Thanks.
The example of such an array is:
void func()
{
/*Here "array" is a local variable on stack, its space is allocated
*at run-time. Why does the compiler need know its size at compile-time?
*/
int array[10];
}
To understand why variably-sized arrays are more complicated to implement, you need to know a little about how automatic storage duration ("local") variables are usually implemented.
Local variables tend to be stored on the runtime stack. The stack is basically a large array of memory, which is sequentially allocated to local variables and with a single index pointing to the current "high water mark". This index is the stack pointer.
When a function is entered, the stack pointer is moved in one direction to allocate memory on the stack for local variables; when the function exits, the stack pointer is moved back in the other direction, to deallocate them.
This means that the actual location of local variables in memory is defined only with reference to the value of the stack pointer at function entry1. The code in a function must access local variables via an offset from the stack pointer. The exact offsets to be used depend upon the size of the local variables.
Now, when all the local variables have a size that is fixed at compile-time, these offsets from the stack pointer are also fixed - so they can be coded directly into the instructions that the compiler emits. For example, in this function:
void foo(void)
{
int a;
char b[10];
int c;
a might be accessed as STACK_POINTER + 0, b might be accessed as STACK_POINTER + 4, and c might be accessed as STACK_POINTER + 14.
However, when you introduce a variably-sized array, these offsets can no longer be computed at compile-time; some of them will vary depending upon the size that the array has on this invocation of the function. This makes things significantly more complicated for compiler writers, because they must now write code that accesses STACK_POINTER + N - and since N itself varies, it must also be stored somewhere. Often this means doing two accesses - one to STACK_POINTER + <constant> to load N, then another to load or store the actual local variable of interest.
1. In fact, "the value of the stack pointer at function entry" is such a useful value to have around, that it has a name of its own - the frame pointer - and many CPUs provide a separate register dedicated to storing the frame pointer. In practice, it is usually the frame pointer from which the location of local variables is calculated, rather than the stack pointer itself.
It is not an extremely complicated thing to support, so the reason C89 doesn't allow this is not because it was impossible back then.
There are however two important reasons why it is not in C89:
The runtime code will get less efficient if the array size is not known at compile-time.
Supporting this makes life harder for compiler writers.
Historically, it has been very important that C compilers should be (relatively) easy to write. Also, it should be possible to make compilers simple and small enough to run on modest computer systems (by 80s standards). Another important feature of C is that the generated code should consistently be extremely efficient, without any surprises,
I think it is unfortunate that these values no longer hold for C99.
The compiler has to generate the code to create the space for the frame on the stack to hold the array and other local local variables. For this it needs the size of the array.
Depends on how you allocate the array.
If you create it as a local variable, and specify a length, then it matters because the compiler needs to know how much space to allocate on the stack for the elements of the array. If you don't specify a size of the array, then it doesn't know how much space to set aside for the array elements.
If you create just a pointer to an array, then all you need to do is allocate the space for the pointer itself, and then you can dynamically create array elements during run time. But in this form of array creation, you're allocating space for the array elements in the heap, not on the stack.
In C++ this becomes even more difficult to implement, because variables stored on the stack must have their destructors called in the event of an exception, or upon returning from a given function or scope. Keeping track of the exact number/size of variables to be destroyed adds additional overhead and complexity. While in C it's possible to use something like a frame pointer to make freeing of the VLA implicit, in C++ that doesn't help you, because those destructors need to be called.
Also, VLAs can cause Denial of Service security vulnerabilities. If the user is able to supply any value which is eventually used as the size for a VLA, then they could use a sufficiently large value to cause a stack overflow (and therefore failure) in your process.
Finally, C++ already has a safe and effective variable length array (std::vector<t>), so there's little reason to implement this feature for C++ code.
Let's say you create a variable sized array on the stack. The size of the stack frame needed by the function won't be known at compile time. So, C assumed that some run time environments require this to be known up front. Hence the limitation. C goes back to the early 1970's. Many languages at the time had a "static" look and feel (like Fortran)