Allocation of global variables C++ - c++

It is unclear to me where global variables are stored when they are declared in different ways, and which way is best.
For example, where are the variables stored in each example and what is their scope?
//Example 1 (at the top of a cpp file):
Rectangle rect(15,12);
//Example 2:
Rectangle *rect = new Rectangle(15,12);

"Where are the variables stored" is usually the wrong question. It varies between platforms and the language is designed to provide an abstraction over such details anyway.
Example 1 creates a Rectangle object with static storage duration. It will normally be destroyed automatically after main returns.
Example 2 creates a Rectangle object with dynamic storage duration. It will be destroyed whenever you call delete on the pointer (or, perhaps, call the destructor explicitly); otherwise it won't be destroyed. Informally people say objects of dynamic storage duration are "on the heap", but the implementation detail this evokes has a platform-dependent meaning.
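The two storage durations described above can be made observable with a small sketch. The Rectangle class here is a hypothetical stand-in (the question does not show its definition), and live_count and counts_during_and_after_delete are names invented purely for illustration:

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for the Rectangle class from the question.
struct Rectangle {
    static int live_count;   // number of Rectangle objects currently alive
    int width;
    int height;
    Rectangle(int w, int h) : width(w), height(h) { ++live_count; }
    ~Rectangle() { --live_count; }
    Rectangle(const Rectangle&) = delete;
    Rectangle& operator=(const Rectangle&) = delete;
};
int Rectangle::live_count = 0;

// Static storage duration: normally constructed before main runs and
// destroyed automatically after main returns.
Rectangle global_rect(15, 12);

// Dynamic storage duration: the object lives until delete is called.
// Returns the live count while the heap object exists and after its deletion.
std::pair<int, int> counts_during_and_after_delete() {
    const int before = Rectangle::live_count;
    Rectangle* heap_rect = new Rectangle(15, 12);
    const int during = Rectangle::live_count;   // includes *heap_rect
    delete heap_rect;                           // destructor runs here
    const int after = Rectangle::live_count;
    assert(after == before);                    // delete ended the lifetime
    return {during, after};
}
```

Called from main, the function should report two live objects while the heap object exists and one (global_rect) afterwards. Nothing in the sketch depends on where the implementation actually places either object.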

If the first is defined outside a function, it is going to be stored in the DATA segment. If it's defined in a function, it is going to be stored on the stack.
With the second (the pointer itself) it's the same, but the object the pointer is pointing to is going to be allocated on the heap.

At the risk of oversimplification . . . .
A compiler will divide the compilation unit into sections:
- Executable data
- Read-only data
- Read/write data
The linker will collect all the sections with the same attributes together. At the end of the link process, global read/write data usually gets merged with the other read/write data.
This creates read/write data:
Rectangle rect(15,12);
This creates read/write data for rect as well as executable data that calls new at startup:
Rectangle *rect = new Rectangle(15,12);
Ignoring debug information, local variables have scope only during compilation. After compilation, local variables are only [relative] memory locations. Global variables remain identifiable after compilation. After linking, global variables essentially disappear as well (their names have been resolved to addresses).
(For simplicity I ignore universal symbols and shared libraries.)

Where the variables "get stored" is implementation defined, and is not in the scope of the C++ standard, except as to the specific semantics of their scope.
Assuming that both declarations are statically-scoped, in both cases 'rect' will be stored at the static scope. In the second case, rect will refer to a heap-allocated object, and throughout the application's lifetime, the application may delete the pointer, and/or reassign it to point to some other instance of this class.

Related

Why do C and C++ compilers place explicitly initialized and default initialized global variables in different segments?

I was reading this great post about the memory layout of C programs. It says that default-initialized global variables reside in the BSS segment, and if you explicitly provide a value for a global variable then it will reside in the data segment.
I've tested the following programs in C and C++ to examine this behaviour.
#include <iostream>
// Both i and s have static storage duration
int i; // i will be kept in the BSS segment, default initialized variable, default value=0
int s(5); // s will be kept in the data segment, explicitly initialized variable
int main()
{
std::cout<<&i<<' '<<&s;
}
Output:
0x488020 0x478004
So, from the output it clearly looks like the variables i and s reside in completely different segments. But if I remove the initializer (initial value 5 in this program) from the variable s and then run the program, it gives me the output below.
Output:
0x488020 0x488024
So, from the output it clearly looks like both variables i and s reside in the same (in this case BSS) segment.
This behaviour is also the same in C.
#include <stdio.h>
int i; // i will be kept in the BSS segment, default initialized variable, default value=0
int s=5; // s will be kept in the data segment, explicitly initialized variable
int main(void)
{
printf("%p %p\n",(void*)&i,(void*)&s);
}
Output:
004053D0 00403004
So, again, by examining the addresses of the variables in the output, we can say that i and s reside in completely different segments. But again, if I remove the initializer (initial value 5 in this program) from the variable s and then run the program, it gives me the output below.
Output:
004053D0 004053D4
So, from the output it clearly looks like both variables i and s reside in the same (in this case BSS) segment.
Why do C and C++ compilers place explicitly initialized and default initialized global variables in different segments? Why is there a distinction about where the global variable resides between default initialized and explicitly initialized variables? If I am not wrong, the C and C++ standards never talk about the stack, heap, data segment, code segment, BSS segment and other such implementation-specific things. So, is it possible for a C++ implementation to store explicitly initialized and default initialized variables in the same segment instead of keeping them in different segments?
Neither C nor C++ has any notion of "segments", and not all OSs do either, so your question is inevitably dependent on the platform and compiler.
That said, common implementations will treat initialized vs. uninitialized variables differently. The main difference is that uninitialized (or default 0-initialized) data does not have to be actually saved with the compiled module, but only declared/reserved for later use at run time. In practical "segment" terms, initialized data is saved to disk as part of the binary, while uninitialized data is not; instead, it's allocated at startup to satisfy the declared "reservations".
The really short answer is "because it takes up less space". (As noted by others, the compiler doesn't have to do this!)
In the executable file, the data section will contain data that has its value stored in the corresponding place. This means that for every byte of initialized data, the data section contains one byte.
For zero-initialized globals, there is no reason to store a lot of zeros. Instead, just store the size of the whole set of data in one single size value. So instead of storing 4132 bytes of zero in the data section, there is just a "BSS is 4132 bytes long", and it's up to the OS/runtime to set it up so that it is zero. In some cases, the runtime of the compiler will memset(BSSStart, 0, BSSSize) or similar. In Linux, for example, all "unused" memory is filled with zero anyway when the process is created, so setting BSS to zero is just a matter of allocating the memory in the first place.
And of course, shorter executable files have several benefits: Less space taken up on your hard-disk, faster loading time [extra bonus if the OS pre-fills the allocated memory with zero], faster compile time as the compiler/linker doesn't have to write the data to disk.
So there is an entirely practical reason for this.
By definition, BSS is not a different segment, it is a part of data-segment.
In C and C++, statically-allocated objects without an explicit
initializer are initialized to zero. An implementation may also assign
statically-allocated variables and constants initialized with a value
consisting solely of zero-valued bits to the BSS section.
A reason to store them in BSS is that variables with no initializer or with all-zero values can be set up at run time without taking up space in the binary file, unlike the variables that are placed in the data segment.
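As a minimal sketch of the distinction discussed in these answers, the following declares one global without an initializer and one with non-zero values. The names zero_filled, preset, kCount and read_zero_filled are made up for illustration, and the section placement mentioned in the comments is only what typical toolchains do, not something the language requires:

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kCount = 4096;

// No initializer: the language guarantees zero-initialization. A typical
// toolchain only records the size (in .bss) and stores no bytes in the file.
int zero_filled[kCount];

// Non-zero initializer: a typical toolchain stores these bytes in .data.
int preset[4] = {1, 2, 3, 4};

// Bounds-checked read, used to confirm the zero-initialization guarantee.
int read_zero_filled(std::size_t index) {
    assert(index < kCount);
    return zero_filled[index];
}
```

On a GNU toolchain, for example, running the size utility on the resulting binary would typically show the array counted under bss rather than data, but that is an observation about one implementation.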

How does the compiler know where are static fields allocated?

Imagine you have a class A with a static field int mstatic.
Imagine that class has a method mymethod that modifies mstatic. When compiling mymethod, how can the address of mstatic be known? I know that in the case of non-static fields, a pointer to the calling object (the famous "this") is implicitly passed to the method, so it is used to find the addresses, but how is it done for static fields?
Static fields are allocated similarly to namespace-scope or global variables: there are basically one or two areas (variables needing 0 initialisation may be separated from those needing initial non-0 values) sequentially populated with all such variables in the translation unit. If the variable is defined in another translation unit, the address will be patched in during linking or loading. Note that the addresses are typically effectively hard-coded (fixed address, perhaps from a specific data segment register), unlike stack variables (which may be stack register relative, but the stack register is modified as functions are called and return, unlike data segment registers, which may be set to the same value while the thread is running) or heap-hosted variables (where the address is determined during malloc or new).
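A small sketch of the point above, using the names A, mstatic and mymethod from the question. The helper functions and the per_object field are hypothetical additions that only expose the addresses so they can be compared:

```cpp
#include <cassert>

class A {
public:
    static int mstatic;   // one object for the whole program
    int per_object = 0;   // one per instance, found via an offset from `this`

    void mymethod() { ++mstatic; }   // `this` is not needed to locate mstatic

    const int* static_address() const { return &mstatic; }
    const int* member_address() const { return &per_object; }
};

// The single definition; its address is fixed at link or load time.
int A::mstatic = 0;

// Two instances have distinct non-static fields but share the static one.
bool static_address_is_shared() {
    A first;
    A second;
    assert(first.member_address() != second.member_address());
    return first.static_address() == second.static_address();
}
```

Since the address of A::mstatic does not depend on any instance, the compiler can refer to it the same way it refers to a global variable.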

Global object and creation order

I'm still learning C++. I have one problem. Let's say that your project has a global object which always exists, e.g. ApiManager, and all other modules have access to it (by #include). For now I'm doing it like this:
Header:
class ApiManager : public QObject
{
Q_OBJECT
public:
explicit ApiManager(QObject *parent = 0);
signals:
public slots:
};
extern ApiManager apiMng;
Source:
ApiManager apiMng;
The problem is that other objects need to have access to it while they are being initialized too, and I noticed that C++ global objects are created alphabetically. I'm wondering how you deal with that. Is there some trick for this? For example, in the Free Pascal world each class module has initialization and finalization sections:
Type
TApiManager = class
end;
var ApiMng: TApiManager;
initialization
ApiMng := TApiManager.Create;
finalization
ApiMng.Free;
... and the initialization order of project modules can be arranged in the project source in the uses clause (like #include in C++). I know that there are a lot of ways to do this (for example, initialize everything in main.cpp in a custom order) but I want to know what is a "good habit" in the C++ world.
Edit: Solved by Q_GLOBAL_STATIC (introduced in Qt 5.1 but works for Qt 4.8 too) but I still have two issues:
I still don't know how to manage the constructor order (and where to initialize it), because global objects created by Q_GLOBAL_STATIC are not created at application startup. They are created on first usage. So I need to "touch" these objects somewhere (in main.cpp?) in my custom order.
The documentation says that Q_GLOBAL_STATIC must be called in the body of a .cpp file, not in a header. But then other classes do not see this object. So I created a static function which exposes a pointer to this object:
.cpp:
Q_GLOBAL_STATIC(ApiManager, apiMng)
ApiManager *ApiManager::instance()
{
return apiMng();
}
But from this topic: http://qt-project.org/forums/viewthread/13977 Q_GLOBAL_STATIC should expose instance automatically, but it doesn't
They are not initialized in alphabetical order, and the initialization order among translation units is unspecified, as the standard guarantees nothing about it.
Why global variables are evil
Global variables should be avoided for several reasons, but the primary reason is because they increase your program’s complexity immensely. For example, say you were examining a program and you wanted to know what a variable named g_nValue was used for. Because g_nValue is a global, and globals can be used anywhere in the entire program, you’d have to examine every single line of every single file! In a computer program with hundreds of files and millions of lines of code, you can imagine how long this would take!
Second, global variables are dangerous because their values can be changed by any function that is called, and there is no easy way for the programmer to know that this will happen.
Why Global Variables Should Be Avoided When Unnecessary
Non-locality -- Source code is easiest to understand when the scope of its individual elements are limited. Global variables can be read or modified by any part of the program, making it difficult to remember or reason about every possible use.
No Access Control or Constraint Checking -- A global variable can be get or set by any part of the program, and any rules regarding its use can be easily broken or forgotten. (In other words, get/set accessors are generally preferable over direct data access, and this is even more so for global data.) By extension, the lack of access control greatly hinders achieving security in situations where you may wish to run untrusted code (such as working with 3rd party plugins).
Implicit coupling -- A program with many global variables often has tight couplings between some of those variables, and couplings between variables and functions. Grouping coupled items into cohesive units usually leads to better programs.
Concurrency issues -- if globals can be accessed by multiple threads of execution, synchronization is necessary (and too-often neglected). When dynamically linking modules with globals, the composed system might not be thread-safe even if the two independent modules tested in dozens of different contexts were safe.
Namespace pollution -- Global names are available everywhere. You may unknowingly end up using a global when you think you are using a local (by misspelling or forgetting to declare the local) or vice versa. Also, if you ever have to link together modules that have the same global variable names, if you are lucky, you will get linking errors. If you are unlucky, the linker will simply treat all uses of the same name as the same object.
Memory allocation issues -- Some environments have memory allocation schemes that make allocation of globals tricky. This is especially true in languages where "constructors" have side-effects other than allocation (because, in that case, you can express unsafe situations where two globals mutually depend on one another). Also, when dynamically linking modules, it can be unclear whether different libraries have their own instances of globals or whether the globals are shared.
Testing and Confinement - source that utilizes globals is somewhat more difficult to test because one cannot readily set up a 'clean' environment between runs. More generally, source that utilizes global services of any sort (e.g. reading and writing files or databases) that aren't explicitly provided to that source is difficult to test for the same reason. For communicating systems, the ability to test system invariants may require running more than one 'copy' of a system simultaneously, which is greatly hindered by any use of shared services - including global memory - that are not provided for sharing as part of the test.
In general, please avoid global variables as a rule of thumb. If you do need to have them, please use Q_GLOBAL_STATIC.
Creates a global and static object of type QGlobalStatic, of name VariableName and that behaves as a pointer to Type. The object created by Q_GLOBAL_STATIC initializes itself on the first use, which means that it will not increase the application or the library's load time. Additionally, the object is initialized in a thread-safe manner on all platforms.
You can also use Q_GLOBAL_STATIC_WITH_ARGS. Here you can find some inline highlight from the documentation:
Creates a global and static object of type QGlobalStatic, of name VariableName, initialized by the arguments Arguments and that behaves as a pointer to Type. The object created by Q_GLOBAL_STATIC_WITH_ARGS initializes itself on the first use, which means that it will not increase the application or the library's load time. Additionally, the object is initialized in a thread-safe manner on all platforms.
Some people also tend to create a function for wrapping them, but they do not reduce the complexity significantly, and they eventually either forget to make those functions thread-safe, or they put more complexity in. Forget about doing that as well when you can.
The initialization order of global objects is only defined within a translation unit (there it is top to bottom). There is no guarantee between translation units. The typical work-around is to wrap the object into a function and return a reference to a local object:
ApiManager& apiMng() {
static ApiManager rc;
return rc;
}
The local object is initialized the first time the function is called (and, when using C++11 also in a thread-safe fashion). This way, the order of construction of globally accessed objects can be ordered in a useful way.
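To show how this work-around orders construction, here is a minimal sketch with two such accessors. Settings, settings(), construction_log() and timeout_ms are hypothetical names added for illustration, and ApiManager is simplified to a plain struct without the QObject base:

```cpp
#include <cassert>
#include <string>

// Hypothetical record of construction order, itself behind an accessor so
// that it exists before anything writes to it.
std::string& construction_log() {
    static std::string log;
    return log;
}

struct Settings {   // hypothetical dependency of ApiManager
    int timeout_ms = 250;
    Settings() { construction_log() += "Settings;"; }
};

Settings& settings() {
    static Settings instance;   // constructed on the first call only
    return instance;
}

struct ApiManager {   // simplified: no QObject base
    int timeout_ms;
    // Calling settings() here forces Settings to be constructed first.
    ApiManager() : timeout_ms(settings().timeout_ms) {
        assert(construction_log().find("Settings;") != std::string::npos);
        construction_log() += "ApiManager;";
    }
};

ApiManager& apiMng() {
    static ApiManager instance;
    return instance;
}
```

Whichever translation unit calls apiMng() first, the dependency is constructed before the object that uses it, because construction is driven by the calls rather than by link order.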
That said, don't use global objects. They cause more harm than good.
A good habit in the C++ world is to avoid global objects at all costs: the more localized an object is, the better.
If you absolutely have to have a global object, I think the best approach is to initialize the objects in a custom order in main, to be explicit about the initialization order. The fact that you are using Qt is one more argument for initializing in main: you probably want to initialize QApplication (which requires argc and argv as input arguments) prior to any other QObject.
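One way to make that explicit ordering concrete is sketched below, without Qt. Config, g_config, g_api, init_globals and shutdown_globals are all hypothetical names, and ApiManager is again a simplified stand-in. The global pointers start out empty, so nothing is constructed until the init function is called from main:

```cpp
#include <cassert>
#include <memory>
#include <string>

struct Config {   // hypothetical dependency
    std::string base_url = "http://localhost";
};

struct ApiManager {   // simplified: no QObject base
    std::string url;
    explicit ApiManager(const Config& cfg) : url(cfg.base_url) {}
};

// Empty until init_globals() runs; nothing is built during static initialization.
std::unique_ptr<Config> g_config;
std::unique_ptr<ApiManager> g_api;

// Intended to be called once at the top of main(); the order is explicit.
void init_globals() {
    assert(!g_config && !g_api);   // guard against a second call
    g_config = std::make_unique<Config>();
    g_api = std::make_unique<ApiManager>(*g_config);
}

// Tear down in reverse order of construction.
void shutdown_globals() {
    g_api.reset();
    g_config.reset();
}
```

In a Qt program, the QApplication object would be created in main before init_globals() is called, so that every QObject is constructed after it.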

What does static variable in general mean for various programming language and circumstances?

Static variables are usually: (in most programming languages) shared, persistent, and allocated on the code section of the program
But what does that have to do with the word static? What is so static about that? I thought static meant "doesn't change".
For example, in vb.net static is written shared and that means a member function that can be accessed without object instantiation. Static within function usually means that the variable life time is the life time of the whole program. It seems that static variables are stored on the code section of the computer. Am I correct in my understanding based on the example?
Well, I think the keyword is appropriate. It means the variable you declare as static will remain stored at the same location throughout the whole execution of your program.
I thought static means doesn't change
This corresponds to the const keyword. Const implies it doesn't change; static implies it doesn't "move", that is, it stays stored at the same location.
In general, what doesn't change with something that is static in
a programming language is whether it is alive or not. Static
variables are always alive; they have a single instance which
comes into being either at the beginning of the program or the
first time they are visible, and lasts until the end of the
program. Non-static variables come and go, as blocks are
entered and left, or as class instances are created and
destroyed.
In C++, for reasons of C compatibility, static, when applied to
variables at namespace scope, has a completely unrelated
meaning: it means that the variable has internal, rather than
external linkage, and is not visible in other translation units.
Why the word static was adopted for this in early C, I don't
know; I can only guess that they needed something, and didn't
want to introduce a new keyword. (Originally, in the very
earliest versions of C, variables at file scope obeyed the rules
of a Fortran named common block: all variables of the same name
referred to the same storage.) Looking back, of course (with 20/20
hindsight), the default for variables at file scope should have
been internal linkage, with a special keyword (public?) to say
that the variable had external linkage. But this was a lot less
obvious in the early 1970's.
Static refers to the variable's storage. Inside a function call, every variable that you declare is pushed on the stack. Unlike other variables, a static variable isn't pushed on the stack; it's like a global variable that survives the whole execution of the program, with the difference that it is visible only inside the block where it is declared.
I think you just have to learn the meaning of "static" in computer science, and not relate it to spoken English. Especially as it applies to variables and functions, with slightly different outcomes in C.
The definition of the word from http://dictionary.reference.com/browse/static?s=t
pertaining to or characterized by a fixed or stationary condition.
showing little or no change: a static concept; a static relationship.
A static variable is one that maintains its state even after it goes out of scope, as opposed to a non-static variable, which would be re-initialised every time it came back into scope. So it can be thought of as having a "stationary condition" or exhibiting "no change".
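The difference described above can be seen in a two-function sketch. The names next_id and always_one are invented for illustration:

```cpp
#include <cassert>
#include <limits>

// `last` is initialized once and keeps its value between calls.
int next_id() {
    static int last = 0;
    assert(last < std::numeric_limits<int>::max());   // avoid signed overflow
    return ++last;
}

// `last` is an automatic variable: it is re-initialized on every call.
int always_one() {
    int last = 0;
    return ++last;
}
```

Successive calls to next_id() return 1, 2, 3 and so on, while always_one() returns 1 every time, even though both bodies are otherwise identical.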
If you can avoid it, just don't go into static for C++. In any modern language static just means there's only ever one instance and it's never destroyed. That's not too far a stretch from the English meaning, and leads nicely to a discussion of const/final/readonly and what that means.
A static variable means there is only one copy of the variable, even if you create multiple instances of the class. That is, all objects of the specified class use the same memory location. As an example, say we have two threads. On the first thread you create a progress bar and on the second you need to update it. In this case you can define a static variable in your progress bar's class to store the progress and create one instance of the class in each thread: one thread initialises it and the other changes the value of the static variable. Since both use the same copy, the progress will be available in the first thread.
So static means something that doesn't change its location when a new instance is created. Or we can say something that preserves its state ;)
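A single-threaded sketch of the progress bar idea follows. ProgressBar, advance and progress_seen_by_other_instance are hypothetical names. If the two instances really lived on different threads, the shared counter would additionally need synchronization (for example std::atomic<int>), which this sketch deliberately leaves out:

```cpp
#include <cassert>

class ProgressBar {   // hypothetical class, loosely following the description
public:
    static int progress;   // one copy shared by every ProgressBar
    int id;                // one copy per object

    explicit ProgressBar(int bar_id) : id(bar_id) {}

    void advance(int step) {
        assert(step >= 0);
        progress += step;
    }
};
int ProgressBar::progress = 0;

// One instance writes the static member, another instance reads it back.
int progress_seen_by_other_instance() {
    ProgressBar writer(1);
    ProgressBar reader(2);
    writer.advance(40);
    return reader.progress;   // the same object as writer.progress
}
```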

Copy static class member to local variable for optimization

While browsing open source code (from OpenCV), I came across the following type of code inside a method:
// copy class member to local variable for optimization
int foo = _foo; //where _foo is a class member
for (...) //a heavy loop that makes use of foo
From another question on SO I've concluded that the answer to whether or not this actually needs to be done or is done automatically by the compiler may be compiler/setting dependent.
My question is if it would make any difference if _foo were a static class member? Would there still be a point in this manual optimization, or is accessing a static class member no more 'expensive' than accessing a local variable?
P.S. - I'm asking out of curiosity, not to solve a specific problem.
Accessing a property means de-referencing the object, in order to access it.
As the property may change during the execution (read threads), the compiler will read the value from memory each time the value is accessed.
Using a local variable will allow the compiler to use a register for the value, as it can safely assume the value won't change from the outside. This way, the value is read only once from memory.
About your question concerning the static member, it's the same, as it can also be changed by another thread, for instance. The compiler will also need to read the value each time from memory.
I think a local variable is more likely to participate in some optimization, precisely because it is local to the function: this fact can be used by the compiler, for example if it sees that nobody modifies the local variable, then the compiler may load it once, and use it in every iteration.
In case of member data, the compiler may have to work more to conclude that nobody modifies the member. Think about multi-threaded application, and note that the memory model in C++11 is multi-threaded, which means some other thread might modify the member, so the compiler may not conclude that nobody modifies it, in consequence it has to emit code for load member for every expression which uses it, possibly multiple times in a single iteration, in order to work with the updated value of the member.
In this example, _foo will be copied into a new local variable, so both cases are the same.
Static values are like any other variable. They are just stored in a different memory segment dedicated to static memory.
Reading a static class member is effectively like reading a global variable. They both have a fixed address. Reading a non-static one means first reading the this-pointer, adding an offset to the result and then reading that address. In other words, reading a non-static one requires more steps and memory accesses.
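For reference, the pattern from the question can be sketched as two variants of the same loop. Scaler, scale_member and scale_local are hypothetical names, and only _foo comes from the question. One reason a compiler may reload the member is aliasing: the writes through out could, as far as the compiler can tell, modify _foo. Whether the local copy is measurably faster depends on the compiler and its settings:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class Scaler {   // hypothetical class for illustration
public:
    explicit Scaler(int factor) : _foo(factor) {}

    // Uses the member directly inside the loop. Because `out` is an int*,
    // it may alias _foo, so the compiler may have to reload it each time.
    void scale_member(const std::vector<int>& in, int* out) const {
        assert(out != nullptr);
        for (std::size_t i = 0; i < in.size(); ++i) {
            out[i] = in[i] * _foo;
        }
    }

    // Copies the member to a local first. The local's address is never
    // taken, so no store in the loop can change it.
    void scale_local(const std::vector<int>& in, int* out) const {
        assert(out != nullptr);
        const int foo = _foo;
        for (std::size_t i = 0; i < in.size(); ++i) {
            out[i] = in[i] * foo;
        }
    }

private:
    int _foo;
};
```

Both functions produce the same results for non-overlapping inputs; the same reasoning applies if _foo is made a static member, since a static int can be aliased by an int* just as well.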