Large array in C++ works, but why?

Large array in C++ works, but why? - c++

Why can I compile and run this code? Isn't the array too large? How is memory allocated to this array?
#include <iostream>
#define Nbig 10000000000000000
int main() {
int x[Nbig];
x[Nbig-1]=100;
std::cout <<"x[Nbig-1]= "<< x[Nbig-1] <<"\n\n";
return 0;
}
I thought when a static array is declared, a chunk of RAM should be allocated to it and when I assign a value to say x[1000], the memory bytes at the 'x+1000*4' address (4 for int and x the address of the first element) should represent the value. I tried googling and read about static and dynamics allocation, heap and stack, RAM itsel but didn't find my answer anywhere. Additional information that might help: I'm using linux with 32GB RAM and and compile the code with gcc.

I thought when a static array is declared, a chunk of RAM should be allocated to it
That is an implementation detail of the C++ implementation. You are asking for an array of the given size on the language level. The compiler has to compile the program in such a way that it behaves as the language specifies programs using arrays to behave. The language standard makes no mention of RAM or a stack and allows arrays of any arbitrary size up to an implementation-defined limit. How the compiler uses memory to provide for this behavior of the program is completely up to the compiler. If it can figure out that e.g. no RAM use is required to make the program behave equivalently to the specification, then it doesn't need to use any.
Since you use only one element of the array, there is clearly no need to reserve memory for the whole array and using less memory than asked for is also a desirable optimization, so it is not surprising that a compiler would choose to not allocate memory for the rest of the array. Even further it is obvious that you are only using the single element of the array to pass a constant to std::cout, so the compiler can completely avoid reserving memory for the array and just pass the constant directly to std::cout << in a register.

If the address of an automatic-duration object is never exposed to outside code or otherwise used in ways a compiler can't fully track, and if a compiler can "understand" everything that is done with the object, the compiler need not allocate actual storage for the object.
In this case, some compilers would probably be able to see that only one element of the array is ever read, and it's always written with the value 100, and thus there is no need to allocate any storage for the array. Instead, any operation that would read the array may be replaced with operation that loads the constant 100.

FYI, this code:
int main() {
int x[Nbig];
attempts to allocate x on the stack (it's local to the function main and is not explicitly declared static). Typically, the max size of the stack is somewhere around a couple of megabytes, so your array won't fit. Most compilers know this and will refuse to compile this code. If it does compile, it will almost certainly fail at runtime.
In your case, the compiler will have optimized the whole thing away, since you only use one element.
If you need a compile-time array that can be allocated, make it static, like this:
int main() {
static int x[Nbig];
Or:
int x[Nbig];
int main() {
(ignoring the fact that your #define may have been truncated)
This will allocate the array in the fixed-size data segment.

Related

How to check if a certain memory address is available for use in c++?

I'm working on my hobby project in c++, and want to test a continuous memory allocation for variables of different types, (Like array with variables of different types). How can i check if a specific memory address is available for use?
More Details:
Let's say we've got the following code: we have an integer int_var, (it doesn't matter in which memory address this variable seats), In order to allocate a variable of different type in the address right after the address of int_var i need to check if that address is available and then use it. i tried the following code:
int int_var = 5;
float* flt_ptr = (float*)(&int_var + (sizeof(int_var) / sizeof(int)));
// check if flt_ptr is successfully allocated
if (flt_ptr) { // successfully allocated
// use that address
} else { // not successfully allocated
cout << "ERROR";
}
The problem is: When i run the program, sometimes flt_ptr is successfully allocated and all right, and sometimes not - but when it is not successfully allocated it throws an exception that says "Read access violation ..." instead of printing "ERROR". Why is that? Maybe i missed something about checking if flt_ptr is successfully allocated? Or did something wrong? If so, How do i check if flt_ptr is successfully allocated before i use it?
Thanks!!

This memory model you are assuming was valid back in DOS, where, in real mode, the memory was a continuous stream of bytes.
Now that we have paging (either in x86 or in x64), this is not possible. Therefore, you can make no assumptions on the existance of memory "near" memory.
You have to allocate properly, which means using C++ shared_ptr/unique_ptr/STL. Or, new/malloc the old (bad) way.
If you want variables to be one near the other, allocate the whole memory at once (via a struct, for example).

You can't, C++ memory model does not work like that.
The only valid pointers are those obtained by the '&' operator, those returned from 'new/malloc' and static arrays. There is no mechanism for checking if the memory address is (still) valid or whether the object has been destroyed or not existed there at all. So it is up to the programmer to manage the correctness of the pointers.
Because of the reasons above your program has undefined behavior.
if(pointer) only checks whether pointer==0, nothing more. Note that int n=5; int array[n]; is not valid C++ either. Not sure if you are using it, but if you do, don't.
Based on the comments, you want a heterogenous container. In that case use an array of unions or better std::array<std::variant<int,double,char, float...>> array;. Or std::vector if you need dynamic size.
C++ guarantees that arrays ([], malloc or new[]) are contiguous, but they only contain one type. In general, you cannot store float, double, int, char continuously together because of the alignment issues. The array above is continuous in terms of std::variant object, but its size will be at least the size of the largest type. So chars will not be packed together.

That is not how you allocate memory. You need to do it properly using new.
See here.

want to test a continuous memory allocation for variables of a
different types
You may use a structure and declare required variables as members for continuous memory allocation for different datatypes.
For example:
struct eg_struct
{
unsigned char abc;
unsigned int xyz;
}
Note that if required you may need to pack the structure.
Here there is no need to check whether memory is free or not.

Why is it impossible to allocate an array of an arbitrary size on the stack?

Why can't I write the following?
char acBuf[nSize];
Only to prevent the stack from overgrowing?
Or is there a possibility to do something similar, if I can ensure that I always take just a few hundred kilobytes?
As far as I know, the std::string uses the memory of its members to store the assigned strings, as long as they are 15 characters or less. Only if the strings are longer, it uses this memory to store the address of some heap-allocated memory, which then takes the data.
It seems like it has to be 100%ly determined, during compile-time, how the stack will be aligned during runtime. Is that true? Why is that?

It has nothing to do with preventing stack overflow, you can overflow the stack just fine with char a[SOME_LARGE_CONSTANT]. In C++ the array size has to be known at compile time, this is among other things needed to compute the size of structures containing arrays.
C on the other hand had Variable Length Arrays since C99, which adds an exception and allow runtime dependant size for arrays within function scope. As to why C++ does not have this? It was never adopted by a C++ standard.

Why can't I write the following?
char acBuf[nSize];
Those are called Variable Length Arrays (VLA's) and aren't supported by C++. The reason being that the stack is very fast but tiny compared to the free store (the heap in your words). Which means that at any moment that you add lots of elements to a VLA your stack might just overflow and you get a vague runtime exception. This can also happen with compile-time sized stack arrays but these are way easier to catch because the behaviour of the program doesn't influence their size. Which means that x doesn't have to happen after y to create a stack overflow, it's just there right off the bat. This covers it in more detail and rage.
Containers like std::vector use the free store which is way bigger and has a way to deal with over-allocation (throws bad_alloc).

Unlike C, C++ doesn't support variable length arrays. If you want them, you can use non-standard extensions such as alloca or GNU extensions (supported by clang and GCC). They have their caveats, so be sure to read the manual to make sure you use them safely.
The reason the stack layout is mostly determined statically is so that the generated code has to perform fewer computations (additions, multiplications, and pointer dereferencing) to figure out where the data is on the stack. The offsets can instead be hardcoded into the generated machine code.

My advice is to take a look at alloca.h
void *alloca(size_t size);
The alloca() function allocates size bytes of space in the stack
frame of the caller. This temporary space is automatically freed
when the function that called alloca() returns to its caller.

One possible problem I see with VLA in C++ is the type.
What is the type of acBuf in char acBuf[nSize] or even worse in char acBuf[nSize][nSize] ?
template <typename T> void foo(const T&);
void foo(int n)
{
char mat[n][n];
foo(mat);
}
You cannot pass that array by reference to
template <typename T, std::size_t N>
void foo_on_array(const T (&a)[N]);

You should be happy that the C++ standard discourages a dangerous practice (variable-length arrays on the stack) and instead encourages a less dangerous practice (variable-length arrays and std::vector with heap allocation).
Variable-length arrays on the stack are more dangerous because:
The available stack space is typically 8 MB, much smaller than the 2 GB (or more) of available heap space.
When the stack space is exhausted, the program crashes with SIGSEGV, and it requires special software such as GNU libsigsegv to recover from such a situation.
In typical C++ programs, the programmer does not know whether the array length will definitely stay under a limit such as 4 MB.

Why can't I write the following?
char acBuf[nSize];
You can't do that because in C++ the lenght of the array has to be known at compile time, that's because the compiler reserves the specified memory for the array and it can not be modified during runtime. it's not about prevent a stack overflow, it's about memory layout.
If you want to make a dynamic array you should use the new operator so it will be stored in heap.
char *acBuf = new char[nsize];

What do C++ arrays init to?

So I can fix this manually so it isn't an urgent question but I thought it was really strange:
Here is the entirety of my code before the weird thing that happens:
int main(int argc, char** arg) {
int memory[100];
int loadCounter = 0;
bool getInput = true;
print_memory(memory);
and then some other unrelated stuff.
The print memory just prints the array which should've initialized to all zero's but instead the first few numbers are:
+1606636544 +32767 +1606418432 +32767 +1856227894 +1212071026 +1790564758 +813168429 +0000 +0000
(the plus and the filler zeros are just for formatting since all the numbers are supposed to be from 0-1000 once the array is filled. The rest of the list is zeros)
It also isn't memory leaking because I tried initializing a different array variable and on the first run it also gave me a ton of weird numbers. Why is this happening?

Since you asked "What do C++ arrays init to?", the answer is they init to whatever happens to be in the memory they have been allocated at the time they come into scope.
I.e. they are not initialized.
Do note that some compilers will initialize stack variables to zero in debug builds; this can lead to nasty, randomly occurring issues once you start doing release builds.

The array you are using is stack allocated:
int memory[100];
When the particular function scope exits (In this case main) or returns, the memory will be reclaimed and it will not leak. This is how stack allocated memory works. In this case you allocated 100 integers (32 bits each on my compiler) on the stack as opposed to on the heap. A heap allocation is just somewhere else in memory hopefully far far away from the stack. Anyways, heap allocated memory has a chance for leaking. Low level Plain Old Data allocated on the stack (like you wrote in your code) won't leak.
The reason you got random values in your function was probably because you didn't initialize the data in the 'memory' array of integers. In release mode the application or the C runtime (in windows at least) will not take care of initializing that memory to a known base value. So the memory that is in the array is memory left over from last time the stack was using that memory. It could be a few milli-seconds old (most likely) to a few seconds old (less likely) to a few minutes old (way less likely). Anyways, it's considered garbage memory and it's to be avoided at all costs.
The problem is we don't know what is in your function called print_memory. But if that function doesn't alter the memory in any ways, than that would explain why you are getting seemingly random values. You need to initialize those values to something first before using them. I like to declare my stack based buffers like this:
int memory[100] = {0};
That's a shortcut for the compiler to fill the entire array with zero's.
It works for strings and any other basic data type too:
char MyName[100] = {0};
float NoMoney[100] = {0};
Not sure what compiler you are using, but if you are using a microsoft compiler with visual studio you should be just fine.

In addition to other answers, consider this: What is an array?
In managed languages, such as Java or C#, you work with high-level abstractions. C and C++ don't provide abstractions (I mean hardware abstractions, not language abstractions like OO features). They are dessigned to work close to metal that is, the language uses the hardware directly (Memory in this case) without abstractions.
That means when you declare a local variable, int a for example, what the compiler does is to say "Ok, im going to interpret the chunk of memory [A,A + sizeof(int)] as an integer, which I call 'a'" (Where A is the offset between the beginning of that chunk and the start address of function's stack frame).
As you can see, the compiler only "assigns" memory-segments to variables. It does not do any "magic", like "creating" variables. You have to understand that your code is executed in a machine, and the machine has only a memory and a CPU. There is no magic.
So what is the value of a variable when the function execution starts? The value represented with the data which the chunk of memory of the variable has. Commonly, that data has no sense from our current point of view (Could be part of the data used previously by a string, for example), so when you access that variable you get extrange values. Thats what we call "garbage": Data previously written which has no sense in our context.
The same applies to an array: An array is only a bigger chunk of memory, with enough space to fit all the values of the array: [A,A + (length of the array)*sizeof(type of array elements)]. So as in the variable case, the memory contains garbage.
Commonly you want to initialize an array with a set of values during its declaration. You could achieve that using an initialiser list:
int array[] = {1,2,3,4};
In that case, the compiler adds code to the function to initialize the memory-chunk which the array is with that values.
Sidenote: Non-POD types and static storage
The things explained above only applies to POD types such as basic types and arrays of basic types. With non-POD types like classes the compiler adds calls to the constructor of the variables, which are designed to initialise the values (attributes) of a class instance.
In addition, even if you use POD types, if variables have static storage specification, the compiler initializes its memory with a default value, because static variables are allocated at program start.

the local variable on stack is not initialized in c/c++. c/c++ is designed to be fast so it doesn't zero stack on function calls.

Before main() runs, the language runtime sets up the environment. Exactly what it's doing you'd have to discover by breaking at the load module's entry point and watching the stack pointer, but at any rate your stack space on entering main is not guaranteed clean.
Anything that needs clean stack or malloc or new space gets to clean it itself. Plenty of things don't. C[++] isn't in the business of doing unnecessary things. In C++ a class object can have non-trivial constructors that run implicitly, those guarantee the object's set up for use, but arrays and plain scalars don't have constructors, if you want an inital value you have to declare an initializer.

What's the advantage of malloc?

What is the advantage of allocating a memory for some data. Instead we could use an array of them.
Like
int *lis;
lis = (int*) malloc ( sizeof( int ) * n );
/* Initialize LIS values for all indexes */
for ( i = 0; i < n; i++ )
lis[i] = 1;
we could have used an ordinary array.
Well I don't understand exactly how malloc works, what is actually does. So explaining them would be more beneficial for me.
And suppose we replace sizeof(int) * n with just n in the above code and then try to store integer values, what problems might i be facing? And is there a way to print the values stored in the variable directly from the memory allocated space, for example here it is lis?

Your question seems to rather compare dynamically allocated C-style arrays with variable-length arrays, which means that this might be what you are looking for: Why aren't variable-length arrays part of the C++ standard?
However the c++ tag yields the ultimate answer: use std::vector object instead.
As long as it is possible, avoid dynamic allocation and responsibility for ugly memory management ~> try to take advantage of objects with automatic storage duration instead. Another interesting reading might be: Understanding the meaning of the term and the concept - RAII (Resource Acquisition is Initialization)
"And suppose we replace sizeof(int) * n with just n in the above code and then try to store integer values, what problems might i be facing?"
- If you still consider n to be the amount of integers that it is possible to store in this array, you will most likely experience undefined behavior.

More fundamentally, I think, apart from the stack vs heap and variable vs constant issues (and apart from the fact that you shouldn't be using malloc() in C++ to begin with), is that a local array ceases to exist when the function exits. If you return a pointer to it, that pointer is going to be useless as soon as the caller receives it, whereas memory dynamically allocated with malloc() or new will still be valid. You couldn't implement a function like strdup() using a local array, for instance, or sensibly implement a linked representation list or tree.

The answer is simple. Local1 arrays are allocated on your stack, which is a small pre-allocated memory for your program. Beyond a couple thousand data, you can't really do much on a stack. For higher amounts of data, you need to allocate memory out of your stack.
This is what malloc does.
malloc allocates a piece of memory as big as you ask it. It returns a pointer to the start of that memory, which could be treated similar to an array. If you write beyond the size of that memory, the result is undefined behavior. This means everything could work alright, or your computer may explode. Most likely though you'd get a segmentation fault error.
Reading values from the memory (for example for printing) is the same as reading from an array. For example printf("%d", list[5]);.
Before C99 (I know the question is tagged C++, but probably you're learning C-compiled-in-C++), there was another reason too. There was no way you could have an array of variable length on the stack. (Even now, variable length arrays on the stack are not so useful, since the stack is small). That's why for variable amount of memory, you needed the malloc function to allocate memory as large as you need, the size of which is determined at runtime.
Another important difference between local arrays, or any local variable for that matter, is the life duration of the object. Local variables are inaccessible as soon as their scope finishes. malloced objects live until they are freed. This is essential in practically all data structures that are not arrays, such as linked-lists, binary search trees (and variants), (most) heaps etc.
An example of malloced objects are FILEs. Once you call fopen, the structure that holds the data related to the opened file is dynamically allocated using malloc and returned as a pointer (FILE *).
1 Note: Non-local arrays (global or static) are allocated before execution, so they can't really have a length determined at runtime.

I assume you are asking what is the purpose of c maloc():
Say you want to take an input from user and now allocate an array of that size:
int n;
scanf("%d",&n);
int arr[n];
This will fail because n is not available at compile time. Here comes malloc()
you may write:
int n;
scanf("%d",&n);
int* arr = malloc(sizeof(int)*n);
Actually malloc() allocate memory dynamically in the heap area

Some older programming environments did not provide malloc or any equivalent functionality at all. If you needed dynamic memory allocation you had to code it yourself on top of gigantic static arrays. This had several drawbacks:
The static array size put a hard upper limit on how much data the program could process at any one time, without being recompiled. If you've ever tried to do something complicated in TeX and got a "capacity exceeded, sorry" message, this is why.
The operating system (such as it was) had to reserve space for the static array all at once, whether or not it would all be used. This phenomenon led to "overcommit", in which the OS pretends to have allocated all the memory you could possibly want, but then kills your process if you actually try to use more than is available. Why would anyone want that? And yet it was hyped as a feature in mid-90s commercial Unix, because it meant that giant FORTRAN simulations that potentially needed far more memory than your dinky little Sun workstation had, could be tested on small instance sizes with no trouble. (Presumably you would run the big instance on a Cray somewhere that actually had enough memory to cope.)
Dynamic memory allocators are hard to implement well. Have a look at the jemalloc paper to get a taste of just how hairy it can be. (If you want automatic garbage collection it gets even more complicated.) This is exactly the sort of thing you want a guru to code once for everyone's benefit.
So nowadays even quite barebones embedded environments give you some sort of dynamic allocator.
However, it is good mental discipline to try to do without. Over-use of dynamic memory leads to inefficiency, of the kind that is often very hard to eliminate after the fact, since it's baked into the architecture. If it seems like the task at hand doesn't need dynamic allocation, perhaps it doesn't.
However however, not using dynamic memory allocation when you really should have can cause its own problems, such as imposing hard upper limits on how long strings can be, or baking nonreentrancy into your API (compare gethostbyname to getaddrinfo).
So you have to think about it carefully.

we could have used an ordinary array
In C++ (this year, at least), arrays have a static size; so creating one from a run-time value:
int lis[n];
is not allowed. Some compilers allow this as a non-standard extension, and it's due to become standard next year; but, for now, if we want a dynamically sized array we have to allocate it dynamically.
In C, that would mean messing around with malloc; but you're asking about C++, so you want
std::vector<int> lis(n, 1);
to allocate an array of size n containing int values initialised to 1.
(If you like, you could allocate the array with new int[n], and remember to free it with delete [] lis when you're finished, and take extra care not to leak if an exception is thrown; but life's too short for that nonsense.)
Well I don't understand exactly how malloc works, what is actually does. So explaining them would be more beneficial for me.
malloc in C and new in C++ allocate persistent memory from the "free store". Unlike memory for local variables, which is released automatically when the variable goes out of scope, this persists until you explicitly release it (free in C, delete in C++). This is necessary if you need the array to outlive the current function call. It's also a good idea if the array is very large: local variables are (typically) stored on a stack, with a limited size. If that overflows, the program will crash or otherwise go wrong. (And, in current standard C++, it's necessary if the size isn't a compile-time constant).
And suppose we replace sizeof(int) * n with just n in the above code and then try to store integer values, what problems might i be facing?
You haven't allocated enough space for n integers; so code that assumes you have will try to access memory beyond the end of the allocated space. This will cause undefined behaviour; a crash if you're lucky, and data corruption if you're unlucky.
And is there a way to print the values stored in the variable directly from the memory allocated space, for example here it is lis?
You mean something like this?
for (i = 0; i < len; ++i) std::cout << lis[i] << '\n';

Why does a C/C++ compiler need know the size of an array at compile time?

I know C standards preceding C99 (as well as C++) says that the size of an array on stack must be known at compile time. But why is that? The array on stack is allocated at run-time. So why does the size matter in compile time? Hope someone explain to me what a compiler will do with size at compile time. Thanks.
The example of such an array is:
void func()
{
/*Here "array" is a local variable on stack, its space is allocated
*at run-time. Why does the compiler need know its size at compile-time?
*/
int array[10];
}

To understand why variably-sized arrays are more complicated to implement, you need to know a little about how automatic storage duration ("local") variables are usually implemented.
Local variables tend to be stored on the runtime stack. The stack is basically a large array of memory, which is sequentially allocated to local variables and with a single index pointing to the current "high water mark". This index is the stack pointer.
When a function is entered, the stack pointer is moved in one direction to allocate memory on the stack for local variables; when the function exits, the stack pointer is moved back in the other direction, to deallocate them.
This means that the actual location of local variables in memory is defined only with reference to the value of the stack pointer at function entry1. The code in a function must access local variables via an offset from the stack pointer. The exact offsets to be used depend upon the size of the local variables.
Now, when all the local variables have a size that is fixed at compile-time, these offsets from the stack pointer are also fixed - so they can be coded directly into the instructions that the compiler emits. For example, in this function:
void foo(void)
{
int a;
char b[10];
int c;
a might be accessed as STACK_POINTER + 0, b might be accessed as STACK_POINTER + 4, and c might be accessed as STACK_POINTER + 14.
However, when you introduce a variably-sized array, these offsets can no longer be computed at compile-time; some of them will vary depending upon the size that the array has on this invocation of the function. This makes things significantly more complicated for compiler writers, because they must now write code that accesses STACK_POINTER + N - and since N itself varies, it must also be stored somewhere. Often this means doing two accesses - one to STACK_POINTER + <constant> to load N, then another to load or store the actual local variable of interest.
1. In fact, "the value of the stack pointer at function entry" is such a useful value to have around, that it has a name of its own - the frame pointer - and many CPUs provide a separate register dedicated to storing the frame pointer. In practice, it is usually the frame pointer from which the location of local variables is calculated, rather than the stack pointer itself.

It is not an extremely complicated thing to support, so the reason C89 doesn't allow this is not because it was impossible back then.
There are however two important reasons why it is not in C89:
The runtime code will get less efficient if the array size is not known at compile-time.
Supporting this makes life harder for compiler writers.
Historically, it has been very important that C compilers should be (relatively) easy to write. Also, it should be possible to make compilers simple and small enough to run on modest computer systems (by 80s standards). Another important feature of C is that the generated code should consistently be extremely efficient, without any surprises,
I think it is unfortunate that these values no longer hold for C99.

The compiler has to generate the code to create the space for the frame on the stack to hold the array and other local local variables. For this it needs the size of the array.

Depends on how you allocate the array.
If you create it as a local variable, and specify a length, then it matters because the compiler needs to know how much space to allocate on the stack for the elements of the array. If you don't specify a size of the array, then it doesn't know how much space to set aside for the array elements.
If you create just a pointer to an array, then all you need to do is allocate the space for the pointer itself, and then you can dynamically create array elements during run time. But in this form of array creation, you're allocating space for the array elements in the heap, not on the stack.

In C++ this becomes even more difficult to implement, because variables stored on the stack must have their destructors called in the event of an exception, or upon returning from a given function or scope. Keeping track of the exact number/size of variables to be destroyed adds additional overhead and complexity. While in C it's possible to use something like a frame pointer to make freeing of the VLA implicit, in C++ that doesn't help you, because those destructors need to be called.
Also, VLAs can cause Denial of Service security vulnerabilities. If the user is able to supply any value which is eventually used as the size for a VLA, then they could use a sufficiently large value to cause a stack overflow (and therefore failure) in your process.
Finally, C++ already has a safe and effective variable length array (std::vector<t>), so there's little reason to implement this feature for C++ code.

Let's say you create a variable sized array on the stack. The size of the stack frame needed by the function won't be known at compile time. So, C assumed that some run time environments require this to be known up front. Hence the limitation. C goes back to the early 1970's. Many languages at the time had a "static" look and feel (like Fortran)

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js