Crazy talk (paranoid about initialization) - C++

I learned long ago that the only reliable way to be sure a static member is initialized is to do it in a function. Now, what I'm about to do is start returning static data by non-const reference, and I need someone to stop me.
int& dataSlot()
{
    static int dataMember = 0;
    return dataMember;
}
To my knowledge this is the only way to ensure that the static member is initialized to zero. However, it creates obscure code like this:
dataSlot() = 7; // perfectly normal?
The other way is to put the definition in a translation unit and keep it out of the header file. I have nothing against that per se, but I have no idea what the standard says regarding when and under what circumstances that is safe.
The absolute last thing I want to end up doing is accidentally accessing uninitialized data and losing control of my program.
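(For reference, by "put the definition in a translation unit" I mean something like the following sketch, with a hypothetical Widget class:)

// widget.h
class Widget {
public:
    static int dataMember;   // declaration only
};

// widget.cpp
int Widget::dataMember = 0;  // definition; zero-initialized before any code runs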

(With the usual cautions against indiscriminate use of globals...) Just declare the variable at global scope. It is guaranteed to be zero-initialized before any code runs.
You have to be more cunning when it comes to types with non-trivial constructors, but ints will work fine as globals.
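A minimal sketch (the variable name is illustrative):

int g_dataSlot;          // static storage duration: zero-initialized before any code runs

int main() {
    return g_dataSlot;   // guaranteed to be 0 here
}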

Returning a non-const reference in itself is fairly harmless, for example it's what vector::at() does, or vector::iterator::operator*.
If you don't like the syntax dataSlot() = 7;, you could define:
void setglobal(int i) {
    dataSlot() = i;
}

int getglobal() {
    return dataSlot();
}
Or you could define:
int *dataSlot() {
    static int dataMember = 0;
    return &dataMember;
}
*dataSlot() = 7; // better than dataSlot() = 7?
std::cout << *dataSlot(); // worse than std::cout << dataSlot()?
If you want someone to stop you, they need more information in order to propose an alternative to your use of mutable global state!

It is called a Meyers singleton, and it is almost perfectly safe.
The object is created the first time dataSlot() is called, but it is destroyed when the program exits (at some point during the destruction of objects with static storage duration), so you have to take special care. Using this function from destructors is especially dangerous and might cause random crashes.
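A minimal sketch of that destructor hazard (names are illustrative):

#include <iostream>
#include <string>

std::string& appName() {               // Meyers-style accessor
    static std::string name = "demo";
    return name;
}

struct Watcher {
    Watcher()  { std::cout << "watching\n"; }   // does not touch appName()
    ~Watcher() {
        // appName()'s string is first constructed inside main(), i.e. after w.
        // Statics are destroyed in reverse order of construction, so by the time
        // this destructor runs the string is already gone: undefined behavior.
        std::cout << appName() << "\n";
    }
};

Watcher w;   // constructed during static initialization, before main()

int main() {
    appName();   // first use of the block-scope static happens here
}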

I learned long ago that the only reliable way to be sure a static member is initialized is to do it in a function.
No, it isn't. The standard guarantees that:
All objects with static storage duration (block scope as well as file/global/class-static scope) and trivial constructors are initialized before any code runs. Any code of the program at all.
All objects with file/global/class-static scope and non-trivial constructors are then initialized before the main function is called. It is guaranteed that if objects A and B are defined in the same translation unit and A is defined before B, then A is initialized before B. However, the order of construction of objects defined in different translation units is unspecified and will often differ between compilations.
Block-static objects are initialized when their declaration is reached for the first time. Since the C++03 standard does not have any support for threads, this is NOT thread-safe!
All objects with static storage duration (both block and file/global/class-static scope) are destroyed in the reverse order of their constructors completing, after the main() function exits or the application terminates via exit().
Neither method is usable and reliable in all cases!
Now, what I'm about to do is start returning static data by non-const reference, and I need someone to stop me.
Nobody is going to stop you. It's a legal and perfectly reasonable thing to do. But make sure you don't fall into the threads trap.
E.g. any reasonable unit-test library for C++ automatically registers all test cases. It does it by having something like:
std::vector<TestCase *> &testCaseList() {
    static std::vector<TestCase *> test_cases;
    return test_cases;
}

TestCase::TestCase() {
    ...
    testCaseList().push_back(this);
}
Because that's one of only two ways to do it. The other is:
class TestCase;
TestCase *firstTest = NULL;

class TestCase {
    ...
    TestCase *nextTest;
};

TestCase::TestCase() {
    ...
    nextTest = firstTest;
    firstTest = this;
}
this time using the fact that firstTest has a trivial constructor and will therefore be initialized before any of the TestCase objects, which have a non-trivial one.
dataSlot() = 7; // perfectly normal?
Yes. But if you really want, you can do either:
The old C thing of
#define dataSlot _dataSlot()
in the way the errno "variable" is usually defined,
Or you can wrap it in a small class like
class dataSlot {
    Type &getSlot() {
        static Type slot;
        return slot;
    }
public:
    operator const Type &() { return getSlot(); }
    dataSlot &operator=(const Type &newValue) { getSlot() = newValue; return *this; }
};
(the disadvantage here is that the compiler won't look for Type's methods if you try to invoke them on dataSlot directly; that's why it needs the operator=)

You could make yourself two functions, dataslot() and set_dataslot(), which are wrappers around the actual data slot, a bit like this:
int &_dataslot() { static int val = 0; return val; }
int dataslot() { return _dataslot(); }
void set_dataslot(int n) { _dataslot() = n; }
You probably wouldn't want to inline that lot in a header, but I've found some C++ implementations do rather badly if you try that sort of thing anyway.

Related

Any risk of sharing local static variable of a method between instances?

Let's say I create:
class Hello {
public:
int World(int in)
{
static int var = 0; // <<<< This thing here.
if (in >= 0) {
var = in;
} else {
cout << var << endl;
}
}
};
Now, if I do:
Hello A;
Hello B;
A.World(10);
A.World(-1);
B.World(-1);
I'm getting output of "10" followed by another "10". The value of the local variable of a method just crossed over from one instance of a class to another.
It's not surprising - technically methods are just functions with a hidden this parameter, so a static local variable should behave just like in common functions. But is it guaranteed? Is it a behavior enforced by standard, or is it merely a happy byproduct of how the compiler handles methods? In other words - is this behavior safe to use? (...beyond the standard risk of baffling someone unaccustomed...)
Yes. It doesn't matter whether the function is a [non-static] member of a class or not; it's guaranteed to have only one instance of its static variables.
The proper technical explanation is that such a variable is an object with static storage duration and no linkage: the name is visible only inside the function, but the object lives until the program exits, and every use of that name refers to the same entity.
Just one thing to add to the correct answer: if your class were templated, the instance of var would only be shared amongst objects of the same instantiation type. So if you had:
template<typename C>
class Hello {
public:
    void World(int in)
    {
        static int var = 0; // <<<< This thing here.
        if (in >= 0) {
            var = in;
        } else {
            cout << var << endl;
        }
    }
};
And then:
Hello<int> A;
Hello<int> B;
Hello<unsigned> C;
A.World(10);
A.World(-1);
B.World(-1);
C.World(-1);
Then the final output would be "0" rather than "10", because the Hello<unsigned> instantiation would have its own copy of var.
If we are talking about the Microsoft compiler (MSVC), it's guaranteed:
https://msdn.microsoft.com/en-us/library/y5f6w579.aspx
The following example shows a local variable declared static in a member function. The static variable is available to the whole program; all instances of the type share the same copy of the static variable.
They use an example very similar to yours.
I don't know about GCC.
Yes, it is guaranteed. Now, to answer the question "Any risk of sharing local static variable of a method between instances?" it might be a bit less straightforward. There might be potential risks in the initialization and utilization of the variable and these risks are specific to variables local to the method (as opposed to class variables).
For the initialization, a relevant part in the standard is 6.7/4 [stmt.dcl]:
Dynamic initialization of a block-scope variable with static storage
duration (3.7.1) or thread storage duration (3.7.2) is performed the
first time control passes through its declaration; such a variable is
considered initialized upon the completion of its initialization. If
the initialization exits by throwing an exception, the initialization
is not complete, so it will be tried again the next time control
enters the declaration. If control enters the declaration concurrently
while the variable is being initialized, the concurrent execution
shall wait for completion of the initialization. If control
re-enters the declaration recursively while the variable is being
initialized, the behavior is undefined.
In the simple cases, everything should work as expected. When the construction and initialization of the variable is more complex, there will be risks specific to this case. For instance, if the constructor throws, it will have the opportunity to throw again on the next call. Another example would be recursive initialization which is undefined behavior.
Another possible risk is the performance of the method. The compiler will need to implement a mechanism to ensure compliant initialization of the variable. This is implementation-dependent and it could very well be a lock to check if the variable is initialized, and that lock could be executed every time the method is called. When that happens, it can have a significant adverse effect on performance.
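A minimal sketch of the first risk (the class and the flag are illustrative):

#include <stdexcept>

// Illustrative only: a constructor that can fail.
struct Resource {
    explicit Resource(bool ready) {
        if (!ready)
            throw std::runtime_error("not ready yet");
    }
};

void useResource(bool ready) {
    // If this constructor throws, the variable is NOT considered initialized,
    // so construction is attempted again on the next call to useResource().
    static Resource r(ready);
}

int main() {
    try { useResource(false); } catch (...) { /* first attempt failed */ }
    useResource(true);   // initialization is retried and succeeds this time
}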

Static thread safety and initialization order

I've come across a threading issue within some code I'm working on.
MyStruct gets constructed on multiple threads, which sometimes causes the program to crash inside the static function myFunc(). Big surprise, as the code is accessing the variable value without any synchronization (MSVC 2012 compiler).
struct MyStruct : baseClass
{
    MyStruct() : baseClass(myFunc())
    {
    }

    static int* myFunc()
    {
        static int value[const_size] = get_array();
        // Do some complex stuff to value.
        return &value[0];
    }
};
My question is, what is the best way to solve this?
My first thought would be to remove the static declarations all together.
This actually allowed the unit tests I wrote to pass.
struct MyStruct : baseClass
{
    MyStruct() : baseClass(myFunc())
    {
    }

    int* myFunc()
    {
        this->value = get_array();
        // Do some complex stuff to value.
        return &this->value[0];
    }

    int value[const_size];
};
But I'm concerned, isn't it undefined behavior if value is used before constructed/initialized?
Construction order:
baseClass constructor calls myFunc()
myFunc() uses value.
WAIT! value has not been initialized by MyStruct's constructor yet!
Is this what is happening, or am I missing something? And if so, what would be the correct way of solving this? A static mutex?
Edit: I am using MSVC 2012 (where static initialization is NOT thread-safe) and moving to a different compiler is NOT an option.
The first question is whether you want to share the value between different threads or whether it is OK for each thread to store its own value.
If you want to share the value across threads you will need to protect all access with a Critical Section.
If you are OK with different threads holding different values, you will need to look at thread-local storage. C++11 provides thread_local for this purpose, but I am not sure if MSVC supports it. Otherwise you can use TlsAlloc, TlsGetValue, TlsSetValue and TlsFree.
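A minimal sketch of the thread_local variant, mirroring the question's myFunc() and assuming a compiler with C++11 thread_local (which MSVC 2012 predates):

int* myFunc()
{
    // One copy per thread; each thread initializes its own copy the first
    // time it reaches this declaration, so there is no cross-thread race.
    thread_local int value[const_size] = {};
    // Do some complex stuff to value.
    return &value[0];
}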
Use std::call_once; it was literally invented to solve this problem. From memory, you don't need thread-safe statics for file-scope statics, as they must be initialized prior to main being entered, so there's no chance of anybody having threads in your code before they're initialized.
// Place the func_flag at file scope.
static std::once_flag my_func_flag;

class blah {
    static const int* myFunc()
    {
        static int value[const_size];
        std::call_once(my_func_flag, [] {
            // perform ALL mutations of value in here.
            value = get_array();
        });
        // Read-only from here on.
        return &value[0];
    }
};

Calling operator new at global scope

A colleague and I were arguing the compilability of writing this at global scope:
int* g_pMyInt = new int;
My arguments revolved around the fact that calling a function (which new is)
at global scope was impossible. To my surprise, the above line compiled just fine
(MS-VC8 & Apple's LLVM 3).
So I went on and tried:
int* foo()
{
return new int;
}
int* g_pMyInt = foo(); // Still global scope.
And that compiled as well and worked like a charm (tested later with a class whose constructor/destructor printed out a message; the ctor's message went through, the dtor's didn't. Less surprised that time.)
While this appears very wrong to me (no orderly/right way/time to call delete),
it's not prohibited by the compiler. Why?
Why shouldn't it be allowed? All you're doing is initializing a global variable, which you are perfectly welcome to do, even if the initialization involves a function call:
int i = 5 + 6;
double j(std::sin(1.25));
const Foo k = get_my_foo_on(i, 11, true);
std::ostream & os(std::cout << "hello world\n");
int * p(new int); // fine but very last-century
std::unique_ptr<int> q(new int); // ah, welcome to the real world
int main() { /* ... */ }
Of course you'll need to worry about deleting dynamically allocated objects, whether they were allocated at global scope or not... a resource-owning wrapper class such as unique_ptr would be the ideal solution.
C++ allows processing to happen before and after the main function, in particular for static objects with constructors & destructors (their constructors have to run before main, their destructors after it). And indeed, the execution order is not well defined.
If you are using GCC, see also its constructor function attribute (which may help to give an order).
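For instance, a minimal sketch of that attribute (the priority value is illustrative):

#include <cstdio>

// GCC/Clang extension: runs before main(); lower priorities run earlier.
__attribute__((constructor(101)))
static void early_init() {
    std::puts("runs before main()");
}

int main() {
    std::puts("main()");
}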
Of course you can call functions from global scope, as part of the initialization of global objects. If you couldn't, you couldn't define global variables of types with constructors, because constructors are also functions. However, be aware that the initialization order between different translation units is not well defined, so if your function relies on a global variable from another translation unit, you will be in trouble unless you take special precautions.
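One common precaution is the construct-on-first-use idiom, essentially the accessor discussed at the top of this page; a minimal sketch with illustrative names:

#include <map>
#include <string>

// Safe to call even during the dynamic initialization of globals in other
// translation units: the map is constructed on first use.
std::map<std::string, int>& registry() {
    static std::map<std::string, int> instance;
    return instance;
}

// some other translation unit:
int dummy = (registry()["answer"] = 42);   // fine regardless of TU order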

What makes a static variable initialize only once?

I noticed that if you initialize a static variable in C++ in code, the initialization only runs the first time you run the function.
That is cool, but how is that implemented? Does it translate to some kind of twisted if statement? (if given a value, then ..)
void go( int x )
{
static int j = x ;
cout << ++j << endl ; // see 6, 7, 8
}
int main()
{
go( 5 ) ;
go( 5 ) ;
go( 5 ) ;
}
Yes, it does normally translate into an implicit if statement with an internal boolean flag. So, in the most basic implementation your declaration normally translates into something like
void go( int x ) {
    static int j;
    static bool j_initialized;

    if (!j_initialized) {
        j = x;
        j_initialized = true;
    }
    ...
}
On top of that, if your static object has a non-trivial destructor, the language has to obey another rule: such static objects have to be destructed in the reverse order of their construction. Since the construction order is only known at run-time, the destruction order becomes defined at run-time as well. So, every time you construct a local static object with non-trivial destructor, the program has to register it in some kind of linear container, which it will later use to destruct these objects in proper order.
Needless to say, the actual details depend on implementation.
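Conceptually (real implementations use a runtime helper such as __cxa_atexit), the generated code behaves roughly like this sketch, ignoring thread safety:

#include <cstdlib>
#include <string>

void greet() {
    static bool s_initialized = false;
    static std::string* s_ptr = nullptr;   // real compilers use in-place storage

    if (!s_initialized) {
        s_ptr = new std::string("Hello World!");
        // Register the destruction so it runs in reverse order of construction.
        std::atexit([] { delete s_ptr; });
        s_initialized = true;
    }
    // ... use *s_ptr ...
}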
It is worth adding that when it comes to static objects of "primitive" types (like int in your example) initialized with compile-time constants, the compiler is free to initialize that object at startup. You will never notice the difference. However, if you take a more complicated example with a "non-primitive" object
void go( int x ) {
    static std::string s = "Hello World!";
    ...
then the above approach with if is what you should expect to find in the generated code even when the object is initialized with a compile-time constant.
In your case the initializer is not known at compile time, which means that the compiler has to delay the initialization and use that implicit if.
Yes, the compiler usually generates a hidden boolean "has this been initialized?" flag and an if that runs every time the function is executed.
There is more reading material here: How is static variable initialization implemented by the compiler?
While it is indeed "some kind of twisted if", the twist may be more than you imagined...
ZoogieZork's comment on AndreyT's answer touches on an important aspect: on some compilers, including GCC, the initialisation of static local variables is thread-safe by default (a compiler command-line option can disable it). Consequently, it uses some inter-thread synchronisation mechanism (a mutex or an atomic operation of some kind), which can be relatively slow. If you wouldn't be comfortable, performance-wise, with explicit use of such an operation in your function, then consider whether there's a lower-impact alternative to the lazy initialisation of the variable (i.e. explicitly construct it in a thread-safe way yourself somewhere, just once). Very few functions are so performance-sensitive that this matters, though: don't let it spoil your day or make your code more complicated unless your program's too slow and your profiler's fingering that area.
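One way to read that suggestion, as a sketch with illustrative names: construct the object at namespace scope so it is ready before main() starts any threads, and keep the accessor trivial:

#include <string>

namespace {
    // Built during static initialization, before main() can start any threads.
    const std::string kGreeting("Hello World!");
}

const std::string& greeting() {
    return kGreeting;   // no lazy-init flag, no per-call synchronization
}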
They are initialized only once because that's what the C++ standard mandates. How this happens is entirely up to compiler vendors. In my experience, a local hidden flag is generated and used by the compiler.

Lazy/multi-stage construction in C++

What's a good existing class/design pattern for multi-stage construction/initialization of an object in C++?
I have a class with some data members which should be initialized in different points in the program's flow, so their initialization has to be delayed. For example one argument can be read from a file and another from the network.
Currently I am using boost::optional for the delayed construction of the data members, but it's bothering me that optional is semantically different than delay-constructed.
What I need reminds features of boost::bind and lambda partial function application, and using these libraries I can probably design multi-stage construction - but I prefer using existing, tested classes. (Or maybe there's another multi-stage construction pattern which I am not familiar with).
The key issue is whether or not you should distinguish completely populated objects from incompletely populated objects at the type level. If you decide not to make a distinction, then just use boost::optional or similar as you are doing: this makes it easy to get coding quickly. OTOH you can't get the compiler to enforce the requirement that a particular function requires a completely populated object; you need to perform run-time checking of fields each time.
Parameter-group Types
If you do distinguish completely populated objects from incompletely populated objects at the type level, you can enforce the requirement that a function be passed a complete object. To do this I would suggest creating a corresponding type XParams for each relevant type X. XParams has boost::optional members and setter functions for each parameter that can be set after initial construction. Then you can force X to have only one (non-copy) constructor, that takes an XParams as its sole argument and checks that each necessary parameter has been set inside that XParams object. (Not sure if this pattern has a name -- anybody like to edit this to fill us in?)
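A minimal sketch of the idea (the X/XParams names follow the description above; the fields are illustrative):

#include <boost/optional.hpp>
#include <stdexcept>
#include <string>
#include <utility>

struct XParams {
    boost::optional<std::string> name;
    boost::optional<int> age;

    void setName(std::string n) { name = std::move(n); }
    void setAge(int a)          { age = a; }
};

class X {
public:
    // The single (non-copy) constructor refuses incomplete parameter sets,
    // so any X that exists is known to be fully populated.
    explicit X(const XParams& p)
        : name_(p.name ? *p.name : throw std::invalid_argument("name not set")),
          age_ (p.age  ? *p.age  : throw std::invalid_argument("age not set"))
    {}

private:
    std::string name_;
    int age_;
};

// Usage: XParams p; p.setName("Ada"); /* ... later ... */ p.setAge(36); X x(p);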
"Partial Object" Types
This works wonderfully if you don't really have to do anything with the object before it is completely populated (perhaps other than trivial stuff like get the field values back). If you do have to sometimes treat an incompletely populated X like a "full" X, you can instead make X derive from a type XPartial, which contains all the logic, plus protected virtual methods for performing precondition tests that test whether all necessary fields are populated. Then if X ensures that it can only ever be constructed in a completely-populated state, it can override those protected methods with trivial checks that always return true:
class XPartial {
    optional<string> name_;
public:
    void setName(string x) { name_.reset(x); }   // Can add getters and/or ctors

    string makeGreeting(string title) {
        if (checkMakeGreeting_()) {              // Is it safe?
            return string("Hello, ") + title + " " + *name_;
        } else {
            throw domain_error("ZOINKS");        // Or similar
        }
    }

    bool isComplete() const { return checkMakeGreeting_(); }   // All tests here

protected:
    virtual bool checkMakeGreeting_() const { return name_; }  // Populated?
};
class X : public XPartial {
    X();   // Forbid default-construction; or, you could supply a "full" ctor
public:
    explicit X(XPartial const& x) : XPartial(x) {   // Avoid implicit conversion
        if (!x.isComplete()) throw domain_error("ZOINKS");
    }

    X& operator=(XPartial const& x) {
        if (!x.isComplete()) throw domain_error("ZOINKS");
        return static_cast<X&>(XPartial::operator=(x));
    }

protected:
    virtual bool checkMakeGreeting_() const { return true; }   // No checking needed!
};
Although it might seem the inheritance here is "back to front", doing it this way means that an X can safely be supplied anywhere an XPartial& is asked for, so this approach obeys the Liskov Substitution Principle. This means that a function can use a parameter type of X& to indicate it needs a complete X object, or XPartial& to indicate it can handle partially populated objects -- in which case either an XPartial object or a full X can be passed.
Originally I had isComplete() as protected, but found this didn't work since X's copy ctor and assignment operator must call this function on their XPartial& argument, and they don't have sufficient access. On reflection, it makes more sense to publically expose this functionality.
I must be missing something here - I do this kind of thing all the time. It's very common to have objects that are big and/or not needed by a class in all circumstances. So create them dynamically!
struct Big {
    char a[1000000];
};

class A {
public:
    A() : big(0) {}
    ~A() { delete big; }

    void f() {
        makebig();
        big->a[42] = 66;
    }

private:
    Big * big;

    void makebig() {
        if ( ! big ) {
            big = new Big;
        }
    }
};
I don't see the need for anything fancier than that, except that makebig() should probably be const (and maybe inline), and the Big pointer should probably be mutable. And of course A must be able to construct Big, which may in other cases mean caching the contained class's constructor parameters. You will also need to decide on a copying/assignment policy - I'd probably forbid both for this kind of class.
I don't know of any patterns to deal with this specific issue. It's a tricky design question, and one somewhat unique to languages like C++. Another issue is that the answer to this question is closely tied to your individual (or corporate) coding style.
I would use pointers for these members, and when they need to be constructed, allocate them at the same time. You can use auto_ptr for these, and check against NULL to see if they are initialized. (I think of pointers as a built-in "optional" type in C/C++/Java; there are other languages where NULL is not a valid pointer.)
One issue as a matter of style is that you may be relying on your constructors to do too much work. When I'm coding OO, I have the constructors do just enough work to get the object in a consistent state. For example, if I have an Image class and I want to read from a file, I could do this:
image = new Image("unicorn.jpeg"); /* I'm not fond of this style */
or, I could do this:
image = new Image(); /* I like this better */
image->read("unicorn.jpeg");
It can get difficult to reason about how a C++ program works if the constructors have a lot of code in them, especially if you ask the question, "what happens if a constructor fails?" This is the main benefit of moving code out of the constructors.
I would have more to say, but I don't know what you're trying to do with delayed construction.
Edit: I remembered that there is a (somewhat perverse) way to call a constructor on an object at any arbitrary time. Here is an example:
#include <new>   // for placement new

class Counter {
public:
    Counter(int &cref) : c(cref) { }
    void incr(int x) { c += x; }
private:
    int &c;
};

void dontTryThisAtHome() {
    int i = 0, j = 0;
    Counter c(i);        // Call constructor first time on c
    c.incr(5);           // now i = 5
    new(&c) Counter(j);  // Call the constructor AGAIN on c
    c.incr(3);           // now j = 3
}
Note that doing something as reckless as this might earn you the scorn of your fellow programmers, unless you've got solid reasons for using this technique. This also doesn't delay the constructor, just lets you call it again later.
Using boost.optional looks like a good solution for some use cases. I haven't played much with it so I can't comment much. One thing I keep in mind when dealing with such functionality is whether I can use overloaded constructors instead of default and copy constructors.
When I need such functionality I would just use a pointer to the type of the necessary field like this:
public:
    MyClass() : field_(0) { }   // constructor, additional initializers and code omitted

    ~MyClass() {
        if (field_)
            delete field_;      // free the constructed object only if initialized
    }
    ...

private:
    ...
    field_type* field_;
Next, instead of using the pointer I would access the field through the following method:
private:
    ...
    field_type& field() {
        if (!field_)
            field_ = new field_type(...);
        return *field_;   // dereference: the function returns a reference, not the pointer
    }
(I have omitted const-access semantics.)
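For completeness, a const overload might look roughly like this (a sketch; a const accessor cannot lazily allocate unless field_ is mutable, so it assumes the field was constructed earlier):

    const field_type& field() const {
        assert(field_);   // requires prior initialization (needs <cassert>)
        return *field_;
    }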
The easiest way I know is similar to the technique suggested by Dietrich Epp, except it allows you to truly delay the construction of an object until a moment of your choosing.
Basically: reserve storage for the object using malloc instead of new (thereby bypassing the constructor), then construct the object when you truly want to, via placement new.
Example:
Object *x = (Object *) malloc(sizeof(Object));
//Use the object member items here. Be careful: no constructors have been called!
//This means you can assign values to ints, structs, etc... but nested objects can wreak havoc!
//Now we want to call the constructor of the object
new(x) Object(params);
//However, you must remember to also manually call the destructor!
x.~Object();
free(x);
//Note: if you're the malloc and new calls in your development stack
//store in the same heap, you can just call delete(x) instead of the
//destructor followed by free, but the above is the correct way of
//doing it
Personally, the only time I've ever used this syntax was when I had to use a custom C-based allocator for C++ objects. As Dietrich suggests, you should question whether you really, truly must delay the constructor call. The base constructor should perform the bare minimum to get your object into a serviceable state, whilst other overloaded constructors may perform more work as needed.
I don't know if there's a formal pattern for this. In places where I've seen it, we called it "lazy", "demand" or "on demand".