I thought using static variables might cause ambiguity in readability of the code but nothing special. But now, I know that there are 5 data sagments: text, data, bss, heap, stack. text segment is for code, data seg. is for declared variables, bss seg. is for undeclared variables, heap is for pointers, and stack if for variables of functions.
Would it be better not to use static variable over local variable to minimize the size the program takes on computer when running?
I'm pretty sure static variable and global variable are saved in bss or data segment. And the size of bss and data segment does not change after compiled. And for heap and stack, they get released once used, so there is nothing to worry about size.
Am I right in thinking this?
Text segment is for code, data seg. is for declared variables, bss seg. is for undeclared variables
So far you are right.
heap is for pointers
No. Heap is for data allocated via malloc() and, in the case of C++, new.
The pointers are stored wherever you put them (data, bss, stack).
and stack for variables of functions.
And for function arguments.
Would it be better not to use static variable over local variable to minimize the size the program takes on computer when running?
The size is quite the same while the variable exists (in data/bss vs. on stack); if it doesn't exist, the stack-based approach wins.
The stack-based approach wins as well concerning other aspects: reentrancy (as already was said) and readability.
And for heap and stack, they get released once used, so there is nothing to worry about size.
Of course you have to worry about the size here as well. Just go and try to allocate one million chunks of 16 MiB in size (at least on a 32 bit machine), and you'll see...
You should use static variables when you need them, and others if you don't.
When you declare a variable static it means it will have to be initialised with 0.So it
is the extra effort for the compiler to initialise it with 0 . So if you have say
100 variables then the work of the compiler will be 100 times than of declaring them
automatic.But automatic variables are not initialised with any value so they contain
garbage.So it is advisable not to use static as long as needed.
Related
I'm currently adapting some example Arduino code to fit my needs. The following snippet confuses me:
// Dont put this on the stack:
uint8_t buf[RH_RF95_MAX_MESSAGE_LEN];
What does it mean to put the buf variable on the stack? How can I avoid doing this? What bad things could happen if I did it?
The program stack has a limited size (even on desktop computers, it's typically capped in megabytes, and on an Arduino, it may be much smaller).
All function local variables for functions are stored there, in a LIFO manner; the variables of your main method are at the bottom of the stack, the variables of the functions called in main on top of that, and so on; space is (typically) reserved on entering a function, and not reclaimed until a function returns. If a function allocates a truly huge buffer (or multiple functions in a call chain allocate slightly smaller buffers) you can rapidly approach the stack limit, which will cause your program to crash.
It sounds like your array is being allocated outside of a function, putting it at global scope. The downside to this is there is only one shared buffer (so two functions can't use it simultaneously without coordinating access, while a stack buffer would be independently reserved for each function), but the upside is that it doesn't cost stack to use it; it's allocated from a separate section of program memory (a section that's usually unbounded, or at least has limits in the gigabyte, rather than megabyte range).
So to answer your questions:
What does it mean to put the buf variable on the stack?
It would be on the stack if it:
Is declared in function scope rather than global scope, and
Is not declared as static (or thread_local, though that's more complicated than you should care about right now); if it's declared static at function scope, it's basically global memory that can only be referenced directly in that specific function
How can I avoid doing this?
Don't declare huge non-static arrays at function scope.
What bad things could happen if I did it?
If the array is large enough, you could suffer a stack overflow from running out of available stack space, crashing your program.
Since global and static variables are initialized to 0 by default, why are local variables not initialized to 0 by default as well?
Because such zero-initializations take execution time. It would make your program significantly slower. Each time you call a function, the program would have to execute pointless overhead code, which sets the variables to zero.
Static variables persist for the whole lifetime of the program, so there you can afford the luxuary to zero-initialize them, because they are only initialized once. While locals are initialized in runtime.
It is not uncommon in realtime systems to enable a compiler option which stops the zero initialization of static storage objects as well. Such an option makes the program non-standard, but also makes it start up faster.
This is because global and static variables live in different memory regions than local variables.
uninitialized static and global variables live in the .bss segment, which is a memory region that is guaranteed to be initialized to zero on program startup, before the program enters `main'
explicitly initialized static and global variables are part of the actual application file, their value is determined at compile-time and loaded into memory together with the application
local variables are dynamically generated at runtime, by growing the stack. If your stack grows over a memory region that holds garbage, then your uninitialized local variables will contain garbage (garbage in, garbage out).
Because that would take time, and it's not always the case that you need them to be zero.
The allocation of local variables (typically on the CPU's hardware stack) is very fast, much less than one instruction per variable and basically independent of the size of the variables.
So any initialization code (which generally would not be independent of the size of the variables) would add a relatively massive amount of overhead, compared to the allocation, and since you cannot be sure that the initialization is needed, it would be very disruptive when optimizing for performance.
Global/static variables are different, they generally live in a segment of the program's binary that is set to 0 by the program loader anyway, so you get that "for free".
Mainly historical. Back when C was being defined, the zero
initialization of static variables was handled automatically by
the OS, and would occur anyway, where as the zero initialization
of local variables would require runtime. The first is still
true today on a lot of systmes (including all Unix and Windows).
The second is far less an issue, however; most compilers would
detect a superfluous initialization in most cases, and skip it,
and in the cases where the compiler couldn't do so, the rest of
the code would be complicated enough that the time required for
the initialization wouldn't be measurable. You can still
construct special cases where this wouldn't be the case, but
they're certainly very rare. However, the original C was
specified like this, and none of the committees have reviewed
the issue since.
Global and static variables are stored at Data Segment [Data in the uninitialised data segment is initialized by the kernel to arithmetic 0 before the program starts executing], while local variables are stored at call stack.
The global or static variables that are initialized by you explicitly will be stored in the .data segment (initialized data) and the uninitialized global or static variables are stored in the .bss (uninitialized data).
This .bss is not stored in the compiled .obj files because there is no data available for these variables (remember you have not initialized them with any certain data).
Now, when the OS loads an exe, it just looks at the size of the .bss segment, allocates that much memory, and zero-initializes it for you (exec). That's why it is necessary to initialize .bss segment to zero.
The local variables are not initialized because there is no such need to initialize them. They are stored at the stack level, so loading an exe will automatically load that amount of memory needed by the local variables. So, why to do extra initialization of local variables and make our program slower.
Suppose you need to call a function 100 times and if there is a local variable and suppose if it were
to initialise to 0 every time...oops there will be extra overhead and wastage of time.
on the other hand global variables are initialised only once.so we can afford its default initialising to 0.
I am new to c++ and have one question to global variables. I see in many examples that global variables are pointers with addresses of the heap. So the pointers are in the memory for global/static variables and the data behind the addresses is on the heap, right?
Instead of this you can declare global (no-pointer) variables that are stored the data. So the data is stored in the memory for global/static variables and not on the heap.
Has this solution any disadvantages over the first solution with the pointers and the heap?
Edit:
First solution:
//global
Sport *sport;
//somewhere
sport = new Sport;
Second solution:
//global
Sport sport;
A disadvantage of storing your data in a global/static variable is that the size is fixed at compile time and can't be changed as opposed to heap storage where the size can be determined at runtime and grow or shrink repeatedly over the run. The lifetime is also fixed as the complete run of the program from start to finish for global/static variables as opposed to heap storage where it can be acquired and released (even repeatedly) all through the runtime of the program. On the other hand, global and static storage management is all handled for you by the compiler where as heap storage has to be explicitly managed by your code. So in summary, global/static storage is easier but not as flexible as heap storage.
You are right in your hypothesis of where the objects are located. About usage,
It's horses for courses. There is no definite rule, it depends on the design & the type of functionality you want to implement. For example:
One may choose the pointer version to achieve lazy initialization or polymorphic behavior, neither of which is possible with global non pointer object approach.
Right. Declared variables go in the DataSegment. And they sit there for the life of the program. You cannot free them. You cannot reallocate them. In Windows, the DataSegment is a fixed size....if you put everything there you may run out of memory (at least it used to be this way).
I'm aware that questions about the stack vs. the heap have been asked several times, but I'm confused about one small aspect of choosing how to declare objects in C++.
I understand that the heap--accessed with the "new" operator--is used for dynamic memory allocation. According to an answer to another question on Stack Overflow, "the heap is for storage of data where the lifetime of the storage cannot be determined ahead of time". The stack is faster than the heap, and seems to be used for variables of local scope, i.e., the variables are automatically deleted when the relevant section of code is completed. The stack also has a relatively limited amount of available space.
In my case, I know prior to runtime that I will need an array of pointers to exactly 500 objects of a particular class, and I know I will need to store the pointers and the objects throughout the duration of runtime. The heap doesn't make sense because I know beforehand how long I will need the memory and I know exactly how man objects I will need. The stack also doesn't make sense if it is limited in scope; plus, I don't know if it can actually hold all of my objects/pointers.
What would be the best way to approach this situation and why? Thanks!
Objects allocated on the stack in main() have a lifetime of the entire run of the program, so that's an option. An array of 500 pointers is either 2000 or 4000 bytes depending on whether your pointers are 32 or 64 bits wide -- if you were programming in an environment whose stack limit was that small, you would know it (such environments do exist: for instance, kernel mode stacks are often 8192 bytes or smaller in total) so I wouldn't hesitate to put the array there.
Depending on how big your objects are, it might also be reasonable to put them on the stack -- the typical stack limit in user space nowadays is order of 8 megabytes, which is not so large that you can totally ignore it, but is not peanuts, either.
If they are too big for the stack, I would seriously consider making a global variable that was an array of the objects themselves. The major downside of this is you can't control precisely when they are initialized. If the objects have nontrivial constructors this is very likely to be a problem. An alternative is to allocate storage for the objects as a global variable, initialize them at the appropriate point within main using placement new, and explicitly call their destructors on the way out. This requires care in the presence of exceptions; I'd write a one-off RAII class that encapsulated the job.
It is not a matter of stack or heap (which to be accurate do not mean what you think in c++: they are just data structures like vector, set or queue). It is a matter of storage duration.
You most likely need here static duration objects, which can be either global, or members of a class. Automatic variables declared inside the main function could also do the job, if you design a way to access them from your other code.
There is some information about the different storage durations of C++ (automatic, static, dynamic) there. The accepted answer however uses the confusing stack/heap terminology, but the explanation is correct.
the heap is for storage of data where the lifetime of the storage cannot be determined ahead of time
While that is correct, it's also incomplete.
The stack unwinds when you exit its scope, so using it for global scoped variables is unfeasible, like you said. This however is where you stop being on the right track. While you know the lifetime (or more accurately the scope since that's the most important factor here), you also know it's above the stack frame, so given only two choices, you put it on the heap.
There is a third option, an actual static variable declared at the top scope, but this will only work if your objects have default constructors.
TL;DR: use global (static) storage for either a pointer to the array (dynamic allocation) or just the actual array (static allocation).
Note: Your assumption that the stack is somehow "faster" than the heap is wrong, they're both backed in the same RAM, you just access it relative to different registers. Also I'd like to mention yet again how much I dislike the use of the terms stack and heap.
The stack, as you mention, often has size limits. That leaves you with two choices - dynamically allocate your objects, or make them global. The time cost to allocate all of your objects once at application startup is almost certainly not of significant concern. So just pick whichever method you prefer and do it.
I think you are confusing the use of the stack (storage for local variables and parameters) and unscoped data (static class variables and data allocated via new or malloc). One appropriate solution based on your description would be a static class that has your array of pointers declared as a static class member. This would be allocated on a heap like structure (maybe even the heap depending on your C++ implementation). a "quick and dirty" solution would be to declare the array as a static variable (basically a global variable), however it isn't the best approach from a maintainability perspective.
The best concept to go by is "Use the stack when you can and the heap when you must." I don't see why the stack wouldn't be able to hold all of your objects unless they're large or you're working with a limited resources system. Try the stack and if it can't handle it, the time it would take to allocate the memory on the heap can all be done early in the program's execution and wouldn't be a significant problem.
Unless speed is an issue or you can't afford the overhead, you should stick the objects in a std::vector. If copy semantics aren't defined for the objects, you should use a std::vector of std::shared_ptrs.
Where are variables in C++ stored?
Inside the RAM or the processor's cache?
Named variables are stored:
On the stack, if they're function-local variables.
C++ calls this "automatic storage"1 and doesn't require it to actually be the asm call stack, and in some rare implementations it isn't. But in mainstream implementations it is.
In a per-process data area if they are global or static.
C++ calls this "static storage class"; it's implemented in asm by putting / reserving bytes in section .data, .bss, .rodata, or similar.
If the variable is a pointer initialized with int *p = new int[10]; or similar, the pointer variable p will go in automatic storage or static storage as above. The pointed-to object in memory is:
On the heap (what C++ calls dynamic storage), allocated with new or malloc, etc.
In asm, this means calling an allocator function, which may ultimately get new memory from the OS via some kind of system call if its free-list is empty. "The heap" isn't a single contiguous region in modern OSes / C++ implementations.
C and C++ don't do automatic garbage collection, and named variables can't themselves be in dynamic storage ("the heap"). Objects in dynamic storage are anonymous, other than being pointed-to by other objects, some of which may be proper variables. (An object of struct or class type, as opposed to primitive types like int, can let you refer to named class members in this anonymous object. In a member function they even look identical.)
This is why you can't (safely/usefully) return a pointer or reference to a local variable.
This is all in RAM, of course. Caching is transparent to userspace processes, though it may visibly affect performance.
Compilers may optimize code to store variables in registers. This is highly compiler and code-dependent, but good compilers will do so aggressively.
Footnote 1: Fun fact: auto in C++03 and earlier, and still in C, meant automatic storage-class, but now (C++11) it infers types.
For C++ in general, the proper answer is "wherever your compiler decides to put them". You should not make assumptions otherwise, unless you somehow direct your compiler otherwise. Some variables can be stored entirely in registers, and some might be totally optimized away and replaced by a literal somewhere. With some compilers on some platforms, constants might actually end up in ROM.
The part of your question about "the processor's cache" is a bit confused. There are some tools for directing how the processor handles its cache, but in general that is the processor's business and should be invisible to you. You can think of the cache as your CPU's window into RAM. Pretty much any memory access goes through the cache.
On the other end of the equation, unused RAM sometimes will get swapped out to disk on most OSes. So its possible (but unlikely) that at some moments your variables are actually being stored on disk. :-)
Variables are usually stored in RAM. This is either on the Heap (e.g. global variables, static variables in methods/functions) or on the Stack (e.g. non-static variables declared within a method/function). Stack and Heap are both RAM, just different locations.
Pointers are a bit special. Pointers themselves follow the rules above but the data they point to is typically stored on the Heap (memory blocks created with malloc, objects created with new). Yet you can create pointers pointing to stack memory: int a = 10; int * b = &a;; b points to the memory of a and a is stored on the stack.
What goes into CPU cache is beyond compilers control, the CPU decides itself what to cache and how to long to cache it (depending on factors like "Has this data been recently used?" or "Is it to be expected that the data is used pretty soon again?") and of course the size of the cache has a big influence as well.
The compiler can only decide which data goes into a CPU register. Usually data is kept there if it's accessed very often in a row since register access is faster than cache and much faster than RAM. Some operations on certain systems can actually only be performed if the data is in a register, in that case the compiler must move data to a register before performing the operation and can only decide when to move the data back to RAM.
Compilers will always try to keep the most often accessed data in a register. When a method/function is called, usually all register values are written back to RAM, unless the compiler can say for sure that the called function/method will not access the memory where the data came from. Also on return of a method/function it must write all register data back to RAM, otherwise the new values would be lost. The return value itself is passed in a register on some CPU architectures, it is passed via stack otherwise.
Variables in C++ are stored either on the stack or the heap.
stack:
int x;
heap:
int *p = new int;
That being said, both are structures built in RAM.
If your RAM usage is high though windows can swap this out to disk.
When computation is done on variables, the memory will be copied to registers.
C++ is not aware of your processor's cache.
When you are running a program, written in C++ or any other language, your CPU will keep a copy of "popular" chunks of RAM in a cache. That's done at the hardware level.
Don't think of CPU cache as "other" or "more" memory...it's just a mechanism to keep some chunks of RAM close by.
I think you are mixing up two concepts. One, how does the C++ language store variables in memory. Two, how does the computer and operating system manage that memory.
In C++, variables can be allocated on the stack, which is memory that is reserved for the program's use and is fixed in size at thread start or in dynamic memory which can be allocated on the fly using new. A compiler can also choose to store the variables on registers in the processor if analysis of the code will allow it. Those variables would never see the system memory.
If a variable ends up in memory, the OS and the processor chip set take over. Both stack based addresses and dynamic addresses are virtual. That means that they may or may not be resident in system memory at any given time. The in memory variable may be stored in the systems memory, paged onto disk or may be resident in a cache on or near the processor. So, it's hard to know where that data is actually living. If a program hasn't been idle for a time or two programs are competing for memory resources, the value can be saved off to disk in the page file and restored when it is the programs turn to run. If the variable is local to some work being done, it could be modified in the processors cache several times before it is finally flushed back to the system memory. The code you wrote would never know this happened. All it knows is that it has an address to operate on and all of the other systems take care of the rest.
Variables can be held in a number of different places, sometimes in more than one place. Most variables are placed in RAM when a program is loaded; sometimes variables which are declared const are instead placed in ROM. Whenever a variable is accessed, if it is not in the processor's cache, a cache miss will result, and the processor will stall while the variable is copied from RAM/ROM into the cache.
If you have any halfway decent optimizing compiler, local variables will often instead be stored in a processor's register file. Variables will move back and forth between RAM, the cache, and the register file as they are read and written, but they will generally always have a copy in RAM/ROM, unless the compiler decides that's not necessary.
The C++ language supports two kinds of memory allocation through the variables in C++ programs:
Static allocation is what happens when you declare a static or global variable. Each static or global variable defines one block of space, of a fixed size. The space is allocated once, when your program is started (part of the exec operation), and is never freed.
Automatic allocation happens when you declare an automatic variable, such as a function argument or a local variable. The space for an automatic variable is allocated when the compound statement containing the declaration is entered, and is freed when that compound statement is exited. The size of the automatic storage can be an expression that varies. In other CPP implementations, it must be a constant.
A third important kind of memory allocation, dynamic allocation, is not supported by C++ variables but is available Library functions.
Dynamic Memory Allocation
Dynamic memory allocation is a technique in which programs determine as they are running where to store some information. You need dynamic allocation when the amount of memory you need, or how long you continue to need it, depends on factors that are not known before the program runs.
For example, you may need a block to store a line read from an input file; since there is no limit to how long a line can be, you must allocate the memory dynamically and make it dynamically larger as you read more of the line.
Or, you may need a block for each record or each definition in the input data; since you can't know in advance how many there will be, you must allocate a new block for each record or definition as you read it.
When you use dynamic allocation, the allocation of a block of memory is an action that the program requests explicitly. You call a function or macro when you want to allocate space, and specify the size with an argument. If you want to free the space, you do so by calling another function or macro. You can do these things whenever you want, as often as you want.
Dynamic allocation is not supported by CPP variables; there is no storage class “dynamic”, and there can never be a CPP variable whose value is stored in dynamically allocated space. The only way to get dynamically allocated memory is via a system call , and the only way to refer to dynamically allocated space is through a pointer. Because it is less convenient, and because the actual process of dynamic allocation requires more computation time, programmers generally use dynamic allocation only when neither static nor automatic allocation will serve.
For example, if you want to allocate dynamically some space to hold a struct foobar, you cannot declare a variable of type struct foobar whose contents are the dynamically allocated space. But you can declare a variable of pointer type struct foobar * and assign it the address of the space. Then you can use the operators ‘*’ and ‘->’ on this pointer variable to refer to the contents of the space:
{
struct foobar *ptr
= (struct foobar *) malloc (sizeof (struct foobar));
ptr->name = x;
ptr->next = current_foobar;
current_foobar = ptr;
}
depending on how they are declared, they will either be stored in the "heap" or the "stack"
The heap is a dynamic data structure that the application can use.
When the application uses data it has to be moved to the CPU's registers right before they are consumed, however this is very volatile and temporary storage.