MASM locals: dynamically allocated data? - c++

I'm learning masm32, following some tutorials.
In one tutorial: http://win32assembly.online.fr/tut3.html
it is stated:
LOCAL directive allocates memory from the stack for local variables
used in the function. The bunch of LOCAL directives must be
immediately below the PROC directive. The LOCAL directive is
immediately followed by <the name of local variable>:<variable type>.
So LOCAL wc:WNDCLASSEX tells MASM to allocate memory from the stack
the size of WNDCLASSEX structure for the variable named wc. We can
refer to wc in our codes without any difficulty involved in stack
manipulation. That's really a godsend, I think. The downside is that
local variables cannot be used outside the function they're created
and will be automatically destroyed when the function returns to the
caller. Another drawback is that you cannot initialize local variables
automatically because they're just stack memory allocated dynamically
when the function is entered. You have to manually assign them with
desired values after LOCAL directives.
I've always been told stack memory is static, and any dynamic allocation is heap.
Can we really consider those as locals in the sense of C++ then?
When you create local variables in C++, will those variables be dynamically allocated on the stack as well?

Can we really consider those as locals in the sense of C++ then? When you create local variables in C++, will those variables be dynamically allocated on the stack as well?
In C++, local (automatic) variables live on the stack, so yes and yes.
They are allocated dynamically in the sense that they come and go as the function is entered/exited. However, as you rightly point out, this type of allocation is rather different from heap allocation.
In addition to the heap and the stack, there is a third area where variables can reside. It is the data segment. It's where global as well as function- and class-level static variables live.
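To make the difference between automatic and static storage concrete, here is a minimal C++ sketch. The function names bump_automatic and bump_static are made up for illustration, and the comments about where each variable lives describe what mainstream desktop compilers typically do, not something the language guarantees.

```cpp
#include <cassert>

// Automatic variable: a fresh `calls` is created every time the function
// is entered (typically in its stack frame) and is gone when it returns.
int bump_automatic() {
    int calls = 0;
    ++calls;
    return calls;  // always 1
}

// Function-level static: there is exactly one `calls` for the whole program
// (typically in the data segment), so its value survives between calls.
int bump_static() {
    static int calls = 0;
    ++calls;
    return calls;  // 1, 2, 3, ... on successive calls
}
```

Calling bump_automatic() repeatedly keeps returning 1, while bump_static() counts upward, which is the "comes and goes with the function" behaviour described above.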

Related

C++ variables and where they are stored in memory (stack, heap, static)

I've just begun learning C++ and I wanted to get my head around the different ways to create variables and what the different keywords mean. I couldn't find any description that really went through it, so I wrote this to try to understand what's going on. Have I missed anything? Am I wrong about anything?
Global Variables
Global variables are stored neither on the heap nor stack.
static global variables are non-exported (standard global variables can be accessed with extern, static globals cannot)
Dynamic Variables
Any variable that is accessed with a pointer is stored on the heap.
Heap variables are allocated with the new keyword, which returns a pointer to the memory address on the heap.
The pointer itself is a standard stack variable.
Variables inside {} that aren't created with new
Stored in the stack, which is limited in size so it should be used only for primitives and small data structures.
static keyword means the variable is essentially global and stored in the same memory space as global variables, but scope is restricted to this function/class.
const keyword means you can't change the variable.
thread_local is like static but each thread gets its own variable.
Register
A variable can be declared as register to hint to the compiler that it should be stored in a register.
The compiler will probably ignore this and apply it to whatever it thinks would be the best improvement.
Typical usage would be for an index or pointer being used as an iterator in a loop.
Good practice
Use const by default when applicable, it's faster.
Be wary of static and globals in multithreaded applications, instead use thread_local or mutex
Use register on iterators
Notes
Any variables created inside a function (non-global) that is not static or thread_local and is not created with new, will be on the stack. Stack variables should not exceed more than a few KB in memory, otherwise use new to put it on the heap.
The full available system memory can be used for variables with static keyword, thread_local keyword, created with new, or global.
Variables created with new need to be freed with delete. All others are automatically freed when they're out of scope, except static, thread_local and globals which are freed when the program ends.
Despite all the parroting about how globals should not be used, don't be bullied: they are great for some use cases, and more efficient than variables allocated on the heap. Mutexes will be needed to avoid race conditions in multi-threaded applications.
Mostly right.
Any variable that is accessed with a pointer is stored on the heap.
This isn't true. You can have pointers to stack-based or global variables.
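A small sketch of that point, with hypothetical names (global_value, add_one, demo): the same pointer-taking function works on a local variable and on a global one, and no heap allocation is involved anywhere.

```cpp
#include <cassert>

int global_value = 10;  // global: not on the heap, not on the stack

// Works on whatever int the pointer refers to, wherever it is stored.
void add_one(int* p) {
    ++*p;
}

int demo() {
    int local = 1;           // automatic variable, typically on the stack
    add_one(&local);         // pointer to a stack variable
    add_one(&global_value);  // pointer to a global variable
    return local + global_value;
}
```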
Also it's worth pointing out that global variables are generally unified by the linker (i.e. if two modules have "int i" at global scope, you'll only have one global variable called "i"). Dynamic libraries complicate that slightly; on Windows, DLLs don't have that behaviour (i.e. an "int i" in a Windows DLL will not be the same "int i" as in another DLL in the same process, or as in the main executable), while on most other platforms dynamic libraries do. There are some additional complications on Darwin (iOS/macOS), which has a hierarchical namespace for symbols; as long as you're linking with the flat_namespace option, what I just said will hold.
Additionally, it's worth talking about initialisation behaviour; global variables are initialised automatically by the runtime (typically either using special linker features or by means of a call that is inserted into the code for your main function). The order of initialisation of globals isn't guaranteed. However, static variables declared at function scope are initialised when that function is first executed, and not at program start-up as you might suppose, and that feature is commonly used by C++ programmers to do lazy initialisation.
(Similar concerns apply to destructors for global objects; those are best avoided entirely IMO, not least because on some platforms there are fast termination features that simply won't call them.)
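Here is a minimal sketch of the lazy-initialisation idiom with a function-scope static. Config, get_config and the counter g_init_count are illustrative names, not part of any library; the counter exists only to make the moment of construction observable.

```cpp
#include <cassert>

int g_init_count = 0;  // hypothetical counter: how often Config was constructed

struct Config {
    int value;
    Config() : value(42) { ++g_init_count; }
};

// The static object is constructed the first time control passes through
// its definition, not at program start-up. Later calls reuse the same object.
Config& get_config() {
    static Config instance;
    return instance;
}
```

Before the first call to get_config() the counter is still 0; after any number of calls it is 1.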
const keyword means you can't change the variable.
Almost. const affects the type, and there is a difference depending on where you write it exactly. For example
const char *foo;
should be read as foo is a pointer to a const char, i.e. foo itself is not const, but the thing it points at is. Contrast with
char * const foo;
which says that foo is a const pointer to char.
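The two declarations can be tried side by side. This is only a sketch; buffer_a, buffer_b and the two demo functions are names invented for the example, and the commented-out lines are the ones a compiler rejects.

```cpp
#include <cassert>
#include <type_traits>

char buffer_a[] = "abc";
char buffer_b[] = "xyz";

char demo_pointer_to_const() {
    const char* foo = buffer_a;  // pointee is const, the pointer is not
    foo = buffer_b;              // OK: the pointer can be retargeted
    // *foo = 'q';               // error: cannot write through foo
    return *foo;
}

char demo_const_pointer() {
    char* const foo = buffer_a;  // the pointer is const, the pointee is not
    *foo = 'q';                  // OK: the pointee can be modified
    // foo = buffer_b;           // error: foo itself cannot be changed
    return *foo;
}

// The same distinction, checked at compile time.
static_assert(!std::is_const<const char*>::value, "the pointer itself is not const");
static_assert(std::is_const<std::remove_pointer<const char*>::type>::value, "the pointee is const");
static_assert(std::is_const<char* const>::value, "the pointer itself is const");
static_assert(!std::is_const<std::remove_pointer<char* const>::type>::value, "the pointee is not const");
```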
Finally, you've missed out volatile, the point of which is to tell the compiler not to make assumptions about the thing to which it applies (e.g. it can't assume that it's safe to cache a volatile value in a register, or to optimise away accesses, or in general to optimise across any operation that affects a volatile value). Hopefully you'll never need to use volatile; it's most often useful if you're doing really low-level things that frankly a lot of people have no need to go anywhere near.
The other answer is correct, but doesn't mention the use of register.
The compiler will probably ignore this and apply it to whatever it thinks would be the best improvement.
This is correct. Compilers are so good at choosing which variables to put in registers (and the typical programmer is so bad at it) that the C++ committee decided the keyword is completely useless.
This keyword was deprecated in C++11 and removed in C++17 (but it's still reserved for possible future use).
Do not use it.
You need to differentiate between specification and implementation. The specification does not say anything about stack and heap, because that's an implementation detail. It purposely talks about storage duration instead.
How this storage duration is achieved depends on the target environment, and on whether the compiler needs to do those allocations at all or whether the values can be determined at compile time and then exist only as part of the machine code (which, of course, also sits somewhere in memory).
So most of your descriptions should really read: "For target platform X, it will generally allocate on the stack/heap if I do Y."
C++ could also be used as an interpreted language (e.g. cling), which could have completely different ways of handling memory.
It could be cross-compiled to some kind of byte interpreter in which every type is dynamically allocated.
And when it comes to embedded systems, the way memory is managed/handled might be even more different.
Heap variables are allocated with the new keyword, which returns a pointer to the memory address on the heap.
If the default operator new, operator new[] are mapped to something like malloc (or any other equivalent in the given OS) this is likely the case (if the object really needs to be allocated).
But for embedded systems, it might be the case that operator new and operator new[] aren't implemented at all. The "OS" might just provide you with a chunk of memory for the application that is handled like stack memory, from which you manually reserve a certain amount, and you implement an operator new and operator new[] that work with this preallocated memory. In such a case you only have stack memory.
Besides that, you can create a custom operator new for certain classes that allocates the memory on some hardware that is different to the "regular" memory provided by the OS.
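As a rough illustration of a class-specific operator new, the sketch below hands out memory from a fixed, preallocated buffer instead of the general-purpose heap. Everything here (the arena buffer, its size, the Sensor class, the lives_in_arena helper) is hypothetical and kept deliberately simple: it is a bump allocator that never reuses freed blocks and is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical preallocated memory, standing in for the chunk an
// embedded "OS" might hand to the application.
alignas(std::max_align_t) unsigned char arena[1024];
std::size_t arena_used = 0;

struct Sensor {
    int id;
    explicit Sensor(int i) : id(i) {}

    // Class-specific allocation: `new Sensor(...)` calls this instead of
    // the global operator new.
    static void* operator new(std::size_t size) {
        const std::size_t align = alignof(std::max_align_t);
        const std::size_t rounded = (size + align - 1) / align * align;
        if (arena_used + rounded > sizeof(arena)) throw std::bad_alloc();
        void* p = arena + arena_used;
        arena_used += rounded;
        return p;
    }

    // Bump allocator: individual blocks are never handed back.
    static void operator delete(void*) noexcept {}
};

// Helper for the demonstration: does this address fall inside the arena?
bool lives_in_arena(const void* p) {
    const auto addr  = reinterpret_cast<std::uintptr_t>(p);
    const auto begin = reinterpret_cast<std::uintptr_t>(arena);
    return addr >= begin && addr < begin + sizeof(arena);
}
```

With this in place, an object created with new Sensor(7) ends up inside arena, so "created with new" and "on the heap" are no longer the same thing.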
The std::vector is allocating the memory in the same memory space that new is allocating it, i.e. the heap, or is it not? This is important because it changes how I use it.
A std::vector is defined as template<class T, class Allocator = std::allocator<T>> class vector; so there is a default behavior (given by the implementation) for where the vector allocates memory. On common desktop OSes it uses something like malloc to allocate memory dynamically. But you could also provide a custom allocator that uses memory at any other addressable memory location (e.g. the stack).
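To show where the Allocator parameter plugs in, here is a minimal custom allocator sketch. CountingAllocator, g_bytes_requested and bytes_for_reserve are made-up names; this allocator still forwards to the global operator new and only records how many bytes the vector asked for, but the allocate function is exactly the place where a different memory source could be substituted.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

std::size_t g_bytes_requested = 0;  // hypothetical bookkeeping for the demo

template <class T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    // The vector calls this whenever it needs storage for n elements.
    T* allocate(std::size_t n) {
        g_bytes_requested += n * sizeof(T);
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }

    void deallocate(T* p, std::size_t) noexcept {
        ::operator delete(p);
    }
};

template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept {
    return true;
}
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept {
    return false;
}

// Returns how many bytes a vector<int> requested in order to reserve n elements.
std::size_t bytes_for_reserve(std::size_t n) {
    const std::size_t before = g_bytes_requested;
    std::vector<int, CountingAllocator<int>> v;
    v.reserve(n);
    return g_bytes_requested - before;
}
```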

Reason for not using static variables?

I thought using static variables might hurt the readability of the code, but nothing more. But now I know that there are 5 data segments: text, data, bss, heap, stack. The text segment is for code, the data seg. is for declared variables, the bss seg. is for undeclared variables, the heap is for pointers, and the stack is for variables of functions.
Would it be better not to use static variable over local variable to minimize the size the program takes on computer when running?
I'm pretty sure static variable and global variable are saved in bss or data segment. And the size of bss and data segment does not change after compiled. And for heap and stack, they get released once used, so there is nothing to worry about size.
Am I right in thinking this?
Text segment is for code, data seg. is for declared variables, bss seg. is for undeclared variables
So far you are right.
heap is for pointers
No. Heap is for data allocated via malloc() and, in the case of C++, new.
The pointers are stored wherever you put them (data, bss, stack).
and stack for variables of functions.
And for function arguments.
Would it be better not to use static variable over local variable to minimize the size the program takes on computer when running?
The size is quite the same while the variable exists (in data/bss vs. on stack); if it doesn't exist, the stack-based approach wins.
The stack-based approach wins as well concerning other aspects: reentrancy (as already was said) and readability.
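The reentrancy point can be sketched like this. label_static and label_local are hypothetical functions written for the example: the first returns a pointer into a single static buffer, so a second call silently overwrites the result of the first, while the second builds its result from a local buffer and returns it by value.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// One shared buffer for all calls: not reentrant.
const char* label_static(int n) {
    static char buf[16];
    std::snprintf(buf, sizeof buf, "id-%d", n);
    return buf;
}

// A fresh local buffer per call; the result is copied out before returning.
std::string label_local(int n) {
    char buf[16];
    std::snprintf(buf, sizeof buf, "id-%d", n);
    return std::string(buf);
}
```

After calling label_static(1) and then label_static(2), both returned pointers refer to the same buffer, which now holds the second label. The local version has no such interaction between calls.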
And for heap and stack, they get released once used, so there is nothing to worry about size.
Of course you have to worry about the size here as well. Just go and try to allocate one million chunks of 16 MiB in size (at least on a 32 bit machine), and you'll see...
You should use static variables when you need them, and others if you don't.
When you declare a variable static, it has to be initialised to 0 if you give it no
initialiser. That is extra work compared with an automatic variable: if you have, say,
100 such variables, there are 100 values to zero, whereas automatic variables are not
initialised at all and simply contain garbage. So it is advisable not to use static
unless it is needed.

C++: Global variable as pointer

I am new to C++ and have a question about global variables. I see in many examples that global variables are pointers with addresses of the heap. So the pointers are in the memory for global/static variables and the data behind the addresses is on the heap, right?
Instead, you can declare global (non-pointer) variables that store the data directly. Then the data is stored in the memory for global/static variables and not on the heap.
Does this solution have any disadvantages compared to the first solution with the pointers and the heap?
Edit:
First solution:
//global
Sport *sport;
//somewhere
sport = new Sport;
Second solution:
//global
Sport sport;
A disadvantage of storing your data in a global/static variable is that the size is fixed at compile time and can't be changed, as opposed to heap storage, where the size can be determined at runtime and can grow or shrink repeatedly over the run. The lifetime is also fixed as the complete run of the program from start to finish for global/static variables, as opposed to heap storage, which can be acquired and released (even repeatedly) all through the runtime of the program. On the other hand, global and static storage management is all handled for you by the compiler, whereas heap storage has to be explicitly managed by your code. So in summary, global/static storage is easier but not as flexible as heap storage.
You are right in your hypothesis of where the objects are located. About usage,
It's horses for courses. There is no definite rule, it depends on the design & the type of functionality you want to implement. For example:
One may choose the pointer version to achieve lazy initialization or polymorphic behavior, neither of which is possible with global non pointer object approach.
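A minimal sketch of both points, building on the Sport name from the question. The derived class Tennis, the global g_sport and the accessor current_sport are invented for illustration, and std::unique_ptr is used here in place of the raw pointer from the question so the object is released automatically.

```cpp
#include <cassert>
#include <memory>
#include <string>

struct Sport {
    virtual ~Sport() = default;
    virtual std::string name() const { return "sport"; }
};

// Hypothetical derived class, only here to show polymorphism.
struct Tennis : Sport {
    std::string name() const override { return "tennis"; }
};

// Global pointer: the pointer itself lives with the globals, and it starts
// out null. Nothing is allocated on the heap yet.
std::unique_ptr<Sport> g_sport;

// Lazy initialisation: the object is created on first use, and the concrete
// type is chosen at runtime. A plain global `Sport sport;` can do neither.
Sport& current_sport() {
    if (!g_sport) {
        g_sport.reset(new Tennis);
    }
    return *g_sport;
}
```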
Right. Declared variables go in the DataSegment. And they sit there for the life of the program. You cannot free them. You cannot reallocate them. In Windows, the DataSegment is a fixed size....if you put everything there you may run out of memory (at least it used to be this way).

How is the memory layout of a C/C++ program?

I know that there are sections like Stack, Heap, Code and Data. Stack/Heap do they use the same section of memory as they can grow independently?
What is this code section? When I have a function is it a part of the stack or the code section?
Also what is this initialized/uninitialized data segment?
Are there read only memory section available? When I have a const variable, what is actually happening is it that the compiler marks a memory section as read only or does it put into a read only memory section.
Where are static data kept?
Where are global data kept?
Any good references/articles for the same?
I thought the memory sections and layout are OS independent and it has more to do with compiler. Doesn't Stack, Heap, Code, Data [Initialized, Uninitialized] segment occur in all the OS? When there is a static data, what is happening the compiler has understood it is static, what next, what will it do? It is the compiler which is managing the program and it should know what to do right? All compilers shouldn't they follow common standards?
There's very little that's actually definitive about C++ memory layouts. However, most modern OS's use a somewhat similar system, and the segments are separated based on permissions.
Code has execute permission. The other segments don't. In a Windows application, you can't just put some native code on the stack and execute. Linux offers the same functionality- it's in the x86 architecture.
Data is data that's part of the result (.exe, etc) but can't be written to. This section is basically where literals go. Only read permission in this section.
Those two segments are part of the resulting file. Stack and Heap are runtime allocated, instead of mapped off the hard drive.
Stack is essentially one, large (1MB or so, many compilers offer a setting for it) heap allocation. The compiler manages it for you.
Heap memory is memory that the OS returns to you through some process. Normally, heap is a heap (the data structure) of pointers to free memory blocks and their sizes. When you request one, it's given to you. Both read and write permissions here, but no execute.
There is read-only memory (ROM). However, this is just the Data section. You can't alter it at runtime. When you make a const variable, nothing special happens to it in memory. All that happens is that the compiler refuses to generate instructions that modify it. That's it. x86 has no knowledge or notion of const; it's all in the compiler.
AFAIK:
Stack/Heap
do they use the same section of memory
as they can grow independently?
They can grow independently.
What is this code section?
A read-only segment where code and const data are stored.
When I have a function is it a part of the stack or
the code section?
The definition (code) of the function will be in the CS. The arguments of each call are passed on the stack.
Also what is this
initialized/uninitialized data
segment?
The data segment is where globals/static variables are stored.
Are there read only memory section
available?
The code segment. I suppose some OS's might offer primitives for creating custom read-only segments.
When I have a const variable, what is actually happening
is it that the compiler marks a memory
section as read only or does it put
into a read only memory section.
It goes into the CS.
Where are static data kept? Where are
global data kept?
The data segment.
I was in the same dilemma when I was reading about memory layouts of C/C++. Here is the link which I followed to get the questions cleared.
http://www.geeksforgeeks.org/memory-layout-of-c-program/
I hope this helps 'the one' finding answers to similar question.
(Note: The following applies to Linux)
The stack and heap of a process both exist in the "same" part of a process's memory. The stack and heap grow towards each other (initially, when the process is started, the stack occupies the entire area that can be occupied by the combination of the stack and the heap; each memory allocation (malloc/free/new/delete) can push the boundary between the stack and the heap either up or down). The BSS section, also located in the same OS-allocated process space, is in its own section and contains uninitialized (zero-initialized) global and static variables. Read-only data resides in the rodata section and contains such things as string literals. For example, if your code has the line:
char tmpStr[] = "hello";
Then, the portion of the source code containing "hello" will reside in the rodata section.
A good, thorough book on this is Computer Systems: A Programmer's Perspective by Randal E. Bryant and David R. O'Hallaron.
As an addendum to the answers, here is a quote from GotW that classifies some major memory areas (note the difference between free-store, which is what I would usually refer to as the heap, and the actual heap, which is the part managed through malloc/free). The article is a bit old so I don't know if it applies to modern C++; so far I haven't found a direct contradiction.
Const Data: The const data area stores string literals and other data whose values are known at compile time. No objects of class type can exist in this area. All data in this area is available during the entire lifetime of the program. Further, all of this data is read-only, and the results of trying to modify it are undefined. This is in part because even the underlying storage format is subject to arbitrary optimization by the implementation. For example, a particular compiler may store string literals in overlapping objects if it wants to.
Stack: The stack stores automatic variables. Typically allocation is much faster than for dynamic storage (heap or free store) because a memory allocation involves only pointer increment rather than more complex management. Objects are constructed immediately after memory is allocated and destroyed immediately before memory is deallocated, so there is no opportunity for programmers to directly manipulate allocated but uninitialized stack space (barring willful tampering using explicit dtors and placement new).
Free Store: The free store is one of the two dynamic memory areas, allocated/freed by new/delete. Object lifetime can be less than the time the storage is allocated; that is, free store objects can have memory allocated without being immediately initialized, and can be destroyed without the memory being immediately deallocated. During the period when the storage is allocated but outside the object's lifetime, the storage may be accessed and manipulated through a void* but none of the proto-object's nonstatic members or member functions may be accessed, have their addresses taken, or be otherwise manipulated.
Heap: The heap is the other dynamic memory area, allocated/freed by malloc/free and their variants. Note that while the default global new and delete might be implemented in terms of malloc and free by a particular compiler, the heap is not the same as free store and memory allocated in one area cannot be safely deallocated in the other. Memory allocated from the heap can be used for objects of class type by placement-new construction and explicit destruction. If so used, the notes about free store object lifetime apply similarly here.
Global/Static: Global or static variables and objects have their storage allocated at program startup, but may not be initialized until after the program has begun executing. For instance, a static variable in a function is initialized only the first time program execution passes through its definition. The order of initialization of global variables across translation units is not defined, and special care is needed to manage dependencies between global objects (including class statics). As always, uninitialized proto-objects' storage may be accessed and manipulated through a void* but no nonstatic members or member functions may be used or referenced outside the object's actual lifetime.
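The quote's remark that heap memory "can be used for objects of class type by placement-new construction and explicit destruction" looks roughly like this in code. Point, g_destroyed and sum_via_malloc are illustrative names; the counter is only there to show that the destructor really runs once.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

int g_destroyed = 0;  // hypothetical counter of destructor calls

struct Point {
    int x, y;
    Point(int a, int b) : x(a), y(b) {}
    ~Point() { ++g_destroyed; }
};

int sum_via_malloc() {
    // 1. Obtain raw storage from the malloc/free heap. No object exists yet.
    void* raw = std::malloc(sizeof(Point));
    if (raw == nullptr) return -1;

    // 2. Start the object's lifetime inside that storage with placement new.
    Point* p = new (raw) Point(3, 4);
    const int sum = p->x + p->y;

    // 3. End the object's lifetime explicitly; the storage is still allocated.
    p->~Point();

    // 4. Give the storage back with free, matching the malloc above.
    std::free(raw);
    return sum;
}
```

Note that the memory is released with free and not delete, in line with the warning that storage obtained from one area must not be deallocated through the other.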

Stack Frame Question: Java vs C++

Q1. In Java, all objects, arrays and class variables are stored on the heap? Is the same true for C++? Is data segment a part of Heap?
What about the following code in C++?
class MyClass{
private:
    static int counter;
    static int number;
};
MyClass::number = 100;
Q2. As far as my understanding goes, variables which are given a specific value by the compiler are stored in the data segment, and uninitialized global and static variables are stored in BSS (Block Started by Symbol). In this case, MyClass::counter, being static, is initialized to zero by the compiler and so it is stored in BSS, and MyClass::number, which is initialized to 100, is stored in the data segment. Am I correct in making this conclusion?
Q3. Consider following piece of codes:
void doHello(MyClass &localObj){
    // 3.1 localObj is a reference parameter, where will this get stored in Heap or Stack?
    // do something
}
void doHelloAgain(MyClass localObj){
    // 3.2 localObj is a parameter, where will this get stored in Heap or Stack?
    // do something
}
int main(){
    MyClass *a = new MyClass(); // stored in heap
    MyClass localObj;
    // 3.3 Where is this stored in heap or stack?
    doHello(localObj);
    doHelloAgain(localObj);
}
I hope I have made my questions clear to all
EDIT:
Please refer to this article for some understanding of BSS
EDIT1: Changed the class name from MyInstance to MyClass as it was a poor name. Sincere Apologies
EDIT2: Changed the class member variable number from non-static to static
This is somewhat simplified but mostly accurate to the best of my knowledge.
In Java, all objects are allocated on the heap (including all your member variables). Most other stuff (parameters) are references, and the references themselves are stored on the stack along with native types (ints, longs, etc) except string which is more of an object than a native type.
In C++, if you were to allocate all objects with the "new" keyword it would be pretty much the same situation as java, but there is one unique case in C++ because you can allocate objects on the stack instead (you don't always have to use "new").
Also note that Java's heap performance is closer to C's stack performance than C's heap performance, the garbage collector does some pretty smart stuff. It's still not quite as good as stack, but much better than a heap. This is necessary since Java can't allocate objects on the stack.
Q1
Java also stores variables on the stack but class instances are allocated on the heap. In C++ you are free to allocate your class instances either on the stack or on the heap. By using the new keyword you allocate the instance on the heap.
The data segment is not part of the heap, but is allocated when the process starts. The heap is used for dynamic memory allocations while the data segment is static and the contents is known at compile time.
The BSS segment is simply an optimization where all the data belonging to the data segment (e.g. strings, constant numbers etc.) that is not initialized or is initialized to zero is moved to the BSS segment. The data segment has to be embedded into the executable, and by moving "all the zeros" to the end they can be removed from the executable. When the executable is loaded, the BSS segment is allocated and initialized to zero, and the compiler is still able to know the addresses of the various buffers, variables etc. inside the BSS segment.
Q2
MyClass::number is stored where the instance of MyClass class is allocated. It could be either on the heap or on the stack. Notice in Q3 how a points to an instance of MyClass allocated on the heap while localObj is allocated on the stack. Thus a->number is located on the heap while localObj.number is located on the stack.
As MyClass::number is an instance variable you cannot assign it like this:
MyClass::number = 100;
However, you can assign MyClass::counter as it is static (except that it is private):
MyClass::counter = 100;
Q3
When you call doHello the variable localObj (in main) is passed by reference. The variable localObj in doHello refers back to that variable on the stack. If you change it the changes will be stored on the stack where localObj in main is allocated.
When you call doHelloAgain the variable localObj (in main) is copied onto the stack. Inside doHelloAgain the variable localObj is allocated on the stack and only exists for the duration of the call.
In C++, objects may be allocated on the stack...for example, localObj in your Q3 main routine.
I sense some confusion about classes versus instances. "MyInstance" makes more sense as a variable name than a class name. In your Q1 example, "number" is present in each object of type MyInstance. "counter" is shared by all instances. "MyInstance::counter = 100" is a valid assignment, but "MyInstance::number = 100" is not, because you haven't specified which object should have its "number" member assigned to.
Q1. In Java, all objects, arrays and
class variables are stored on the
heap? Is the same true for C++? Is
data segment a part of Heap?
No, the data section is separate from the heap. Basically, the data section is allocated at load time, everything there has a fixed location after that. In addition, objects can be allocated on the stack.
The only time objects are on the heap is if you use the new keyword, or if you use something from the malloc family of functions.
Q2. As far as my understanding goes,
variables which are given a specific
value by compiler are stored in data
segment, and uninitialized global and
static variables are stored in BSS
(Block started by symbol). In this
case, MyInstance::counter being static
is initialized to zero by the compiler
and so it is stored at BSS and
MyInstance::number which is
initialized to 100 is stored in the
data segment. Am I correct in making
the conclusion?
Yes, your understanding of the BSS section is correct. However, since number isn't static the code:
MyInstance::number = 100;
isn't legal, it needs to be either made static or initialized in the constructor properly. If you initialize it in the constructor, it will exist wherever the owning object is allocated. If you make it static, it will end up in the data section... if anywhere. Often static const int variables can be inlined directly into the code used such that a global variable isn't needed at all.
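For reference, this is a sketch of how static data members are usually defined outside the class, adapted from the question's class. The members are made public here and per_object is added purely so the example can be exercised; the comments about .bss and .data describe typical toolchain behaviour, not a language guarantee.

```cpp
#include <cassert>

class MyClass {
public:
    static int counter;  // declaration only: shared by all instances
    static int number;   // declaration only: shared by all instances
    int per_object = 0;  // hypothetical non-static member: one per object
};

// Definitions. Note the type in front; a bare `MyClass::number = 100;`
// at namespace scope does not compile.
int MyClass::counter;       // zero-initialized, typically placed in .bss
int MyClass::number = 100;  // non-zero initializer, typically placed in .data
```

The non-static member per_object lives wherever its owning object lives (stack, heap or global storage), whereas counter and number exist exactly once regardless of how many objects are created.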
Q3. Consider following piece of codes: ...
void doHello(MyInstance &localObj){
localObj is a reference to the passed object. As far as you know, there is no storage; it refers to wherever the variable being passed is. In reality, under the hood, a pointer may be passed on the stack to facilitate this. But the compiler may just as easily optimize that out if it can.
void doHelloAgain(MyInstance localObj){
a copy of the passed parameter is placed on the stack.
MyInstance localObj;
// 3.3 Where is this stored in heap or stack?
localObj is on the stack.
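The difference between the two parameter styles can be observed through object addresses. This sketch adapts the question's functions: the return values, the extra original parameter and the value member are additions made only so the behaviour is visible, and the struct is a stand-in for the question's class.

```cpp
#include <cassert>

struct MyClass {
    int value = 0;  // hypothetical member, added to make changes observable
};

// By reference: no new object is created. The parameter is another name
// for the caller's object, so its address is the caller's address.
const MyClass* doHello(MyClass& localObj) {
    localObj.value = 1;  // modifies the caller's object
    return &localObj;
}

// By value: the callee works on its own copy, which exists only for the
// duration of the call. Returns true if the copy is a distinct object.
bool doHelloAgain(MyClass localObj, const MyClass* original) {
    localObj.value = 2;  // modifies only the copy
    return &localObj != original;
}
```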
All memory areas in C++ are listed here