I am trying to understand the implications / side effects / advantages of a recent code change someone made. The change is as follows:
Original
static List<type1> Data;
Modified
static List<type1> & getData (void)
{
static List<type1> * iList = new List<type1>;
return * iList;
}
#define Data getData()
What purpose could the change serve?
The benefit to the revision that I can see is an issue of 'initialization time'.
The old code triggered an initialization before main() is called.
The new code does not trigger initialization until getData() is called for the first time; if the function is never called, you never pay to initialize a variable you didn't use. The (minor) downside is that there is an initialization check in the generated code each time the function is used, and there is a function call every time you need to access the list of data.
If you have a variable with static duration, it is created when the application is initialized. When the application terminates the object is destroyed. It is not possible to control the order in which different objects are created.
The change will make the object be created when it is first used, and (as it is allocated dynamically) it will never be destroyed.
This can be a good thing if other objects need this objects when they are destroyed.
Update
The original code accessed the object using the variable Data. The new code does not have to be modified in any way. When the code use Data it will, in fact, be using the macro Data, which will be expanded into getData(). This function will return a reference to the actual (dynamically allocated object). In practice, the new code will work as a drop-in replacement for the old code, with the only noticable difference being what I described in the original answer above.
Delaying construction until the first use of Data avoids the "static initialization order fiasco".
Making some guesses about your List,... the default-constructed Data is probably an empty list of type1 items, so it's probably not at great risk of causing the fiasco in question. But perhaps someone felt it better to be safe than sorry.
There are several reasons why that change was made :
to prevent the static order initialization fiasco
to delay the initialization of the static variable (for whatever reason)
Related
Having recently got feedback from Code Review stating the impropriety of non-initialized variables, my class variable initialization now seems very ugly:
class MyClass
{
private:
int variable_one;
int variable_two;
int variable_three;
MyClass():variable_one(0),variable_two(0),variable_three(0){};
//...
};
Previously, I wouldn't define my variables until they are needed:
class MyClass
{
private:
int variable_one;
void MyFunction(int x)
{
variable_one = x;
}
};
Why is my second example frowned upon? What are the risks involved by bot initializing variables?
The risk with leaving variables uninitialized is that you might read them before they've been set up. That can lead to extremely hard-to-diagnose bugs and erratic behavior. You can also initialize them to sentinel values to make it easier to detect when they haven't been set up.
As a note, since C++11 (what's now supported by most compilers), you can just do this:
class MyClass
{
private:
int variable_one = 0;
int variable_two = 0;
int variable_three = 0;
};
Now there's little code overhead and it makes clear that they get default values unless you specifically set them to something else.
It is frowned upon because someone, someday is going to use your class and assume that internal state has been setup during initialization. You are perhaps worried about initialization not needed and wasting time? Instead, depending on how many various and assorted methods you have, you will be repeating the initialization code in every one of them, until you forget in just one of them, but it works fine because you ran in debug mode so all memory is cleared. Then a month later, someone compiles in release and all the sudden they think that their code is broken because you didn't initialize your code in a central location.
RAII - Resource Allocation Is Initialization whenever possible. When they create an instance of your class, make sure it is initialized.
The answers provided so far are all correct, but no one mentioned there's a more general OO principle at work: An object's methods transform it from one internally consistent state to another. It should never be possible to use an object where it does something silly (unless it's a silly object).
For example, if an object has an array and a count of currently active elements in the array, it should never be true that there are active elements when count is zero. The method that updates the array also updates the count, keeping the state consistent with itself. The momentary inconsistency -- after the element is added and before the count is updated -- is not visible to the user of the object.
In your example, MyClass gets off on the wrong foot by creating a nondeterministic initial state. Whatever relationship the member variables have to each other, their values are determined by compiler happenstance. The more it's used, the probability that that's what you want approaches zero.
The first method you've specified is called as initializer list, and it's the only way to initialize when you've const or reference data members in your class.
If you don't initialize your data members that are C++ objects, it will still be initialized to a default value by calling the corresponding constructor, and then the same exercise will be repeated when you try to initialize it at a later time (like how you're doing it inside a function, when you deem it to be necessary).
class MyClass {
private:
unsigned int currentTimeMS;
public:
void update() {
currentTimeMS = getTimeMS();
// ...
}
};
class MyClass {
public:
void update() {
unsigned int currentTimeMS = getTimeMS();
// ...
}
};
update() calls in main game loop so in the second case we get a lot of allocation operations (unsigned int currentTimeMS). In the first case we get only one allocate and use that allocated variable before.
Which of this code better to use and why?
I recommend the second variant because it is stateless and the scope of the variable is smaller. Use the first one only if you really experience a performance issue, which I consider unlikely.
If you do not modify the variable value later, you should also consider to make it const in order to express this intent in your code and to give the compiler additional optimization options.
It depends upon your needs. If currentTimeMS is needed only temporarily in the update(), then surely declare it there. (in your case, #option2)
But if it's value is needed for the instance of the class (i.e. being used in some other method), then you should declare it as a field (in your case, #option1).
In the first example, you are saving the state of this class object. In the second one, you're not, so the currentTime will be lost the instant update() is called.
It is really up to you to decide which one you need.
The first case is defining a member variable the second a local variable. Basic class stuff. A private member variable is available to any function (method) in that class. a local variable is only available in the function in which it is declared.
Which of this code better to use and why?
First and foremost, the cited code is at best a tiny micro-optimization. Don't worry about such things unless you have to.
In fact, this is most likely a disoptimization. Sometimes automatic variables are allocated on the stack. Stack allocation is extremely fast (and even free sometimes). There is no need to worry. Other times, the compiler may place a small automatic variable such the unsigned int used here in a register. There's no allocation whatsoever.
Compare that to making the variable a data member of the class, and solely for the purpose of avoiding that allocation. Accessing that variable involves going through the this pointer. Pointer dereference has a cost, potentially well beyond that of adding an offset to a pointer. The dereference might result in a cache miss. Even worse, this dereferencing may well be performed every time the variable is referenced.
That said, sometimes it is better to create data members solely for the purpose of avoiding automatic variables in various member functions. Large arrays declared as local automatic variables might well result in stack overflow. Note, however, that making double big_array[2000][2000] a data member of MyClass will most likely make it impossible to have a variable of type MyClass be declared as a local automatic variable in some function.
The standard solution to the problems created by placing large arrays on the stack is to instead allocate them on the heap. This leads to another place where creating a data member to avoid a local variable can be beneficial. While stack allocation is extremely fast, heap allocation (e.g., new) is quite slow. A member function that is called repeatedly may benefit by making the automatic variable std::unique_ptr<double> big_array = std::make_unique<double>(2000*2000) a data member of MyClass.
Note that neither of the above applies to the sample code in the question. Note also that the last concern (making an heap-allocated variable a data member so as to avoid repeated allocations and deallocations) means that the code has to go through the this pointer to access that memory. In tight code, I've sometimes been forced to create a local automatic pointer variable such as double* local_pointer = this->some_pointer_member to avoid repeated traversals through this.
Recent I read some C++ code using extensively following getInstance() method:
class S
{
private:
int some_int = 0;
public:
static S& getInstance()
{
static S instance; / (*) /
return instance;
}
};
From how this code fragment being used, I learned the getInstance() works like return this, returning the address(or ref) of the instance of class S. But I got confused.
1) where does the static variable S defined in line(*) allocated in memory? And why it can work like return this?
2) what if there exist more than one instance of class S, whose reference will be returned?
This is the so-called Singleton design pattern. Its distinguishing feature is that there can only ever be exactly one instance of that class and the pattern ensures that. The class has a private constructor and a statically-created instance that is returned with the getInstance method. You cannot create an instance from the outside and thus get the object only through said method.
Since instance is static in the getInstance method it will retain its value between multiple invocations. It is allocated and constructed somewhen before it's first used. E.g. in this answer it seems like GCC initializes the static variable at the time the function is first used. This answer has some excerpts from the C++ standard related to that.
Static variable with a function scope is allocated first time the function is called. The compiler keeps track of the fact that is is initialized the first time and avoids further creation during the next visit to the function. This property would be ideal to implement a singleton pattern since this ensures we just maintain a single copy of object. In addition newer compilers makes sure this allocation is also thread safe giving a bonus thread safe singleton implementation without using any dynamic memory allocation. [As pointed out in comments, it is c++11 standard which guarantees the thread safety so do your check about thread safety if you are using a different compiler]
1) where does the static variable S defined in line(*) allocated in
memory? And why it can work like return this?
There is specific regions in memory where static variables are stored, like heap for dynamically allocated variables and stack for the typical compile time defined variables. It could vary with compilers but for GCC there are specific sections called DATA and BSS segments with in the executable generated by the compiler where initialized and uninitialized static variables are stored.
2) what if there exist more than one instance of class S, whose
reference will be returned?
As mentioned on the top, since it is a static variable the compiler ensures there can only be one instance created the first time function is visited. Also since it has the scope with in the function it cannot clash with any other instances existing else where and getInstance ensures you see the same single instance.
I've just read that if I want to be sure about the initialization order it will be better to use some function which will turn global variable into local(but still static), my question, do I need to keep some identifier which tells me that my static object has already been created(the identifier inside function which prevent me from the intialization of the static object one more time) or not? cause I can use this function with initialization in different places, thanks in advance for any help
The first question is do your static lifetime objects care about the order they are initialized?
If true the second question is why?
The initialization is only a problem if a global object uses another global object during its initialization (i.e. when the constructor is running). Note: This is horrible proactive and should be avoided (globals should not be used and if they are they should be interdependent).
If they must be linked then they should be related (in which case you could potentially make a new object that includes the two old ones so that you can control their creation more precisely). If that is not possible you just put them in the same compilation unit (read *.cpp file).
As far as the standard is concerned, initialization of a function-scope static variable only happens once:
int *gettheint(bool set_to_four) {
static int foo = 3; // only happens once, ever
if (set_to_four) {
foo = 4; // happens as many times as the function is called with true
}
return &foo;
}
So there's no need for gettheint to check whether foo has already been initialized - the value won't be overwritten with 3 on the second and subsequent calls.
Threads throw a spanner in the works, being outside the scope of the standard. You can check the documentation for your threading implementation, but chances are that the once-onliness of the initialization is not thread-safe in your implementation. That's what pthread_once is for, or equivalent. Alternatively in a multi-threaded program, you could call the function before creating any extra threads.
Let's consider a C++ class. At the beginning of the execution I want to read a set of values from an XML file and assign them to 7 of the data members of this class. Those values do not change during the whole execution and they have to be shared by all the objects / instances of the class in question. Are static data members the most elegant way to achieve this behavior? (Of course, I do not consider global variables)
As others have mentioned, the use of static members in this case seems appropriate. Just remember that it is not foolproof; some of the issues with global data apply to static members:
You cannot control the order of initialization of your static members, so you need to make sure that no globals or other statics refer to these objects. See this C++ FAQ Question for more details and also for some tips for avoiding this problem.
If your accessing these in a multi-threaded environment you need to make sure that the members are fully initialized before you spawn any threads.
Sounds like a good use of static variables to me. You're using these more as fixed parameters than variables, and the values legitimately need to be shared.
static members would work here and are perfectly acceptable. Another option is to use a singleton pattern for the class that holds these members to ensure that they are constructed/set only once.
It is not a clean design. Static class members are global state and global state is bad.
It might not cause you trouble if this is a small- to medium-sized project and you do not have high goals for automatic testing, but since you ask: there are better ways.
A cleaner design would be to create a Factory for the class and have the Factory pass your seven variables to the class when it constructs it. It is then the Factory's responsility to ensure that all instances share the same values.
That way your class becomes testable and you have properly separated your concerns.
PS. Don't use singletons either.
sounds like an appropriate use of static class members. just don't forget that they're really global variables with a namespace and (maybe) some protection. therefore, if there's the possibility that your application could someday evolve separate 'environments' or something that would need a set of these globals for each, you'd have painted yourself into a corner.
as suggested by Rob, consider using a singleton, which is easier to turn later into some kind of managed environment variable.
At the beginning of the execution I want to read a set of values from an
XML file and assign them to 7 of the
data members of this class. Those
values do not change during the whole
execution and they have to be shared
by all the objects / instances of the
class in question.
The sentence in boldface is the kicker here. As long as that statement holds, the use of static variables is OK. How will this be enforced?
It's hard to. So, if for your use right now the statement is always true, go ahead. If you want to be same from some future developer (or you) using your classes wrong (like reading another XML file midway in the program), then do something like what Rasmus Farber says.
Yes, static datamembers are what you look for. But you have to take care for the initialization/destruction order of your static variables. There is no mechanism in C++ to ensure that your static variables are initialized before you use them across translation units. To be safe, use what looks like the singleton pattern and is well known to fix that issue. It works because:
All static objects are completely constructed after the complete construction of any xml_stuff instance.
The order of destruction of static objects in C++ is the exact opposite of the completion of their construction (when their constructor finishes execution).
Code:
class xml_stuff {
public:
xml_stuff() {
// 1. touch all members once
// => 2. they are created before used
// => 3. they are created before the first xml_stuff instance
// => 4. they will be destructed after the last xml_stuff instance is
// destructed at program exit.
get_member1();
get_member2();
get_member3();
// ...
}
// the first time their respective function is called, these
// objects will be created and references to them are returned.
static type1 & get_member1() { static type1 t; return t; }
static type2 & get_member2() { static type2 t; return t; }
static type1 & get_member3() { static type1 t; return t; }
// ... all other 7 members
};
Now, the objects returned by xml_stuff::get_memberN() are valid the whole lifetime of any xml_stuff instance, because any of those members were constructed before any xml_stuff instance. Using plain static data members, you cannot ensure that, because order of creation across translation units is left undefined in C++.
As long as you think of testability and you have another way to set the static variables besides reading in a file, plus you don't rely on the data benig unchanged for the entire execution time of the process - you should be fine.
I've found that thinking of writing tests when you design your code helps you keep the code well-factored and reusable.