Is uninitialized data behavior well specified? - c++

Note: I am using the g++ compiler (which is I hear is pretty good and supposed to be pretty close to the standard).
I have the simplest class I could think of:
class BaseClass {
public:
int pub;
};
Then I have three equally simple programs to create BaseClass object(s) and print out the [uninitialized] value of their data.
Case 1
BaseClass B1;
cout<<"B1.pub = "<<B1.pub<<endl;
This prints out:
B1.pub = 1629556548
Which is fine. I actually thought it would get initialized to zero because it is a POD or Plain Old Datatype or something like that, but I guess not? So far so good.
Case 2
BaseClass B1;
cout<<"B1.pub = "<<B1.pub<<endl;
BaseClass B2;
cout<<"B2.pub = "<<B2.pub<<endl;
This prints out:
B1.pub = 1629556548
B2.pub = 0
This is definitely weird. I created two of the same objects the same exact way. One got initialized and the other did not.
Case 3
BaseClass B1;
cout<<"B1.pub = "<<B1.pub<<endl;
BaseClass B2;
cout<<"B2.pub = "<<B2.pub<<endl;
BaseClass* pB3 = new BaseClass;
cout<<"B3.pub = "<<pB3->pub<<endl;
This prints out:
B1.pub = 0
B2.pub = 0
B3.pub = 0
This is the most weird yet. They all get initialized to zero. All I did was add two lines of code and it changed the previous behavior.
So is this just a case of 'uninitialized data leads to unspecified behavior' or is there something more logical going on 'under the hood'?
I really want to understand the default constructor/destructor behavior because I have a feeling that it will be very important for completely understanding the inheritance stuff..

So is this just a case of 'uninitialized data leads to unspecified behavior'
Yes...
Sometimes if you call malloc (or new, which calls malloc) you will get data that is filled with zeroes because it is in a fresh page from the kernel. Other times it will be full of junk. If you put something on the stack (i.e., auto storage), you will almost certainly get garbage — but it can be hard to debug, because on your system that garbage might happen to be somewhat predictable. And with objects on the stack, you'll find that changing code in a completely different source file can change the values you see in an uninitialized data structure.
About POD: Whether or not something is POD is really a red herring here. I only explained it because the question mentioned POD, and the conversation derailed from there. The two relevant concepts are storage duration and constructors. POD objects don't have constructors, but not everything without a constructor is POD. (Technically, POD objects don't have non-trivial constructors nor members with non-trivial constructors.)
Storage duration: There are three kinds. Static duration is for globals, automatic is for local variables, and dynamic is for objects on the heap. (This is a simplification and not exactly correct, but you can read the C++ standard yourself if you need something exactly correct.)
Anything with static storage duration gets initialized to zero. So if you make a global instance of BaseClass, then its pub member will be zero (at first). Since you put it on the stack and the heap, this rule does not apply — and you don't do anything else to initialize it, so it is uninitialized. It happens to contain whatever junk was left in memory by the last piece of code to use it.
As a rule, any POD on the heap or the stack will be uninitialized unless you initialize it yourself, and the value will be undefined, possible changing when you recompile or run the program again. As a rule, any global POD will get initialized to zero unless you initialize it to something else.
Detecting uninitialized values: Try using Valgrind's memcheck tool, it will help you find where you use uninitialized values — these are usually errors.

It depends how you declare them:
// Assuming the POD type's
class BaseClass
{
public:
int pub;
};
Static Storage Duration Objects
These objects are always zero initialized.
// Static storage duration objects:
// PODS are zero initialized.
BaseClass global; // Zero initialized pub = 0
void plop()
{
static BaseClass functionStatic; // Zero initialized.
}
Automatic/Dynamic Storage Duration Objects
These objects may by default or zero initialized depending on how you declare them
void plop1()
{
// Dynamic
BaseClass* dynaObj1 = new BaseClass; // Default initialized (does nothing)
BaseClass* dynaObj2 = new BaseClass(); // Zero Initialized
// Automatic
BaseClass autoObj1; // Default initialized (does nothing)
BaseClass autoObj2 = BaseClass(); // Zero Initialized
// Notice that zero initialization of an automatic object is not the same
// as the zero initialization of a dynamic object this is because of the most
// vexing parse problem
BaseClass autoObj3(); // Unfortunately not a zero initialized object.
// Its a forward declaration of a function.
}
I use the term 'Zero-Initialized'/'Default-Initialization' but technically slightly more complex. 'Default-Initialization' will becomes 'no-initialization' of the pub member. While the () invokes 'Value-Initialization' that becomes 'Zero-Initialization' of the pub member.
Note: As BaseClass is a POD this class behaves just like the builtin types. If you swap BaseClass for any of the standard type the behavior is the same.

In all three cases, those POD objects could have indeterminate values.
POD objects without any initializer will NOT be initialized by default value. They just contain garbage..
From Standard 8.5 Initializers,
"If no initializer is specified for an object, and the object is of
(possibly cv-qualified) non-POD class type (or array thereof), the
object shall be default-initialized; if the object is of
const-qualified type, the underlying class type shall have a
user-declared default constructor. Otherwise, if no initializer is
specified for a nonstatic object, the object and its subobjects, if
any, have an indeterminate initial value; if the object or any of
its subobjects are of const-qualified type, the program is
ill-formed."
You can zero initialize all the members of a POD struct like this,
BaseClass object={0};

The way you wrote your class, the value of pub is undefined and can be anything. If you create a default constructor that will call the default constructor of pub - it will be guaranteed to be zero:
class BaseClass {
public:
BaseClass() : pub() {}; // calling pub() guarantees it to be zero.
int pub;
};
That would be a much better practice.

As a general rule, yes, uninitialized data leads to unspecified behavior. That's why other languages like C# take steps to insure that you don't use uninitialized data.

This is why you always, always, always, initialize a classes (or ANY variable) to a stable state instead of relying on the compiler to do it for you; especially since some compilers deliberately fill them with garbage. In fact, the only POD MSVC++ doesn't fill with garbage is bool, they get initialized to true. One would think it'd be safer to init it to false, but that's Microsoft for ya.

Case 1
Regardless of whether the encapsulating type is POD, data members of a built-in type are not default initialised by an encapsulating default constructor.
Case 2
No, neither got initialised. The underlying bytes at the memory position of one of them just happened to be 0.
Case 3
Same.
You seem to be expecting some guarantees about the "value" of uninitialised objects, whilst simultaneously professing the understanding that no such value exists.

Related

C++ initialize differently in container vs as local variable

This question came up while I was trying to write an RAII wrapper class for an OpenGL buffer object. The way that OpenGL creates buffer objects is by a call to void glGenBuffers(GLsizei n​, GLuint* buffers​) which "returns n buffer object names in buffers​." Also, note that the object name 0 is special to OpenGL, being a default value "like a NULL pointer."
So my first idea is to create a class buffer_object like
class buffer_object {
public:
// constructors, destructor
private:
unsigned int const name;
};
I want two different initialization behaviors:
If I create an array std::array<buffer_object, N> buffer_obj_array {};, I want the buffer_object array elements to have their names initialized to 0. Ditto for other containers.
If I create a local variable, don't initialize to 0, but actually give it a name by calling glGenBuffers(1, &name).
The problem is I don't fully understand list initialization, value initialization, and default initialization.
About std::array's default constructor, cppreference.com says it uses aggregate initialization, which is a type of list initialization, which value initializes the elements in an array that aren't given in an initialization list. About value initialization, cppreference.com says,
if T is a class type with at least one user-provided constructor of
any kind, the default constructor is called
So, does that mean there's no way to do the following?
{
buffer_object obj; // has some name given by glGenBuffers; ready to use
int const n = 3;
std::array<buffer_object, n> foo; // each element has name 0
glGenBuffers(n, foo.data()); // gives each element its own name
}
// everything destructed using buffer_object::~buffer_object
— because I'm asking for two different behaviors from the default constructor?
Maybe I'm taking the wrong approach.
C++ object initialization is not, and cannot be, predicated on where that object is created (with the exception of default-initializing objects of trivial types, which zero-initialize when used for global variables, but not for automatics).
If you give a type a non-trivial default constructor, you are making a strong statement that objects of this type need to have some real code executed before they can be considered valid objects of this type. C++ will therefore assume you are serious about this statement and ensure that said "real code" is executed in all cases when objects of that type are created.
If you give buffer_object a default constructor that generates a buffer object, you are saying that objects of this type should always own a valid OpenGL buffer object (except maybe when they are moved-from). If you give buffer_object a default constructor that zeros out the internal buffer object handle, you are saying that having no OpenGL buffer object handle is a normal, valid state for a buffer_object to be in.
There's no half-way with these two statements. Either the default is empty or it isn't.
Now, you can have alternate constructors. The default constructor could make the handle zero, with another constructor that takes a simple tag type that explicitly says that it will generate a handle. Or better yet, how about a factory function that makes buffer_objects that have generated handles:
buffer_object gen_buffer()
{
GLuint handle = 0;
glGenBuffers(1, &handle);
return buffer_object(handle);
};
auto obj = gen_buffer(); //Clearly states what is going on.
Now there is no question what is in obj: it has a generated buffer object handle. Because that's what the code says is going on.

Is circumventing a class' constructor legal or does it result in undefined behaviour?

Consider following sample code:
class C
{
public:
int* x;
};
void f()
{
C* c = static_cast<C*>(malloc(sizeof(C)));
c->x = nullptr; // <-- here
}
If I had to live with the uninitialized memory for any reason (of course, if possible, I'd call new C() instead), I still could call the placement constructor. But if I omit this, as above, and initialize every member variable manually, does it result in undefined behaviour? I.e. is circumventing the constructor per se undefined behaviour or is it legal to replace calling it with some equivalent code outside the class?
(Came across this via another question on a completely different matter; asking for curiosity...)
It is legal now, and retroactively since C++98!
Indeed the C++ specification wording till C++20 was defining an object as (e.g. C++17 wording, [intro.object]):
The constructs in a C++ program create, destroy, refer to, access, and
manipulate objects. An object is created by a definition (6.1), by a
new-expression (8.5.2.4), when implicitly changing the active member
of a union (12.3), or when a temporary object is created (7.4, 15.2).
The possibility of creating an object using malloc allocation was not mentioned. Making it a de-facto undefined behavior.
It was then viewed as a problem, and this issue was addressed later by https://wg21.link/P0593R6 and accepted as a DR against all C++ versions since C++98 inclusive, then added into the C++20 spec, with the new wording:
[intro.object]
The constructs in a C++ program create, destroy, refer to, access, and manipulate objects. An object is created by a definition, by a new-expression, by an operation that implicitly creates objects (see below)...
...
Further, after implicitly creating objects within a specified region of
storage, some operations are described as producing a pointer to a
suitable created object. These operations select one of the
implicitly-created objects whose address is the address of the start
of the region of storage, and produce a pointer value that points to
that object, if that value would result in the program having defined
behavior. If no such pointer value would give the program defined
behavior, the behavior of the program is undefined. If multiple such
pointer values would give the program defined behavior, it is
unspecified which such pointer value is produced.
The example given in C++20 spec is:
#include <cstdlib>
struct X { int a, b; };
X *make_x() {
// The call to std​::​malloc implicitly creates an object of type X
// and its subobjects a and b, and returns a pointer to that X object
// (or an object that is pointer-interconvertible ([basic.compound]) with it),
// in order to give the subsequent class member access operations
// defined behavior.
X *p = (X*)std::malloc(sizeof(struct X));
p->a = 1;
p->b = 2;
return p;
}
There is no living C object, so pretending that there is one results in undefined behavior.
P0137R1, adopted at the committee's Oulu meeting, makes this clear by defining object as follows ([intro.object]/1):
An object is created by a definition ([basic.def]), by a new-expression ([expr.new]), when implicitly changing the active member of a union ([class.union]), or when a temporary object is created ([conv.rval], [class.temporary]).
reinterpret_cast<C*>(malloc(sizeof(C))) is none of these.
Also see this std-proposals thread, with a very similar example from Richard Smith (with a typo fixed):
struct TrivialThing { int a, b, c; };
TrivialThing *p = reinterpret_cast<TrivialThing*>(malloc(sizeof(TrivialThing)));
p->a = 0; // UB, no object of type TrivialThing here
The [basic.life]/1 quote applies only when an object is created in the first place. Note that "trivial" or "vacuous" (after the terminology change done by CWG1751) initialization, as that term is used in [basic.life]/1, is a property of an object, not a type, so "there is an object because its initialization is vacuous/trivial" is backwards.
I think the code is ok, as long as the type has a trivial constructor, as yours. Using the object cast from malloc without calling the placement new is just using the object before calling its constructor. From C++ standard 12.7 [class.dctor]:
For an object with a non-trivial constructor, referring to any non-static member or base class of the object before the constructor begins execution results in undefined behavior.
Since the exception proves the rule, referrint to a non-static member of an object with a trivial constructor before the constructor begins execution is not UB.
Further down in the same paragraphs there is this example:
extern X xobj;
int* p = &xobj.i;
X xobj;
This code is labelled as UB when X is non-trivial, but as not UB when X is trivial.
For the most part, circumventing the constructor generally results in undefined behavior.
There are some, arguably, corner cases for plain old data types, but you don't win anything avoiding them in the first place anyway, the constructor is trivial. Is the code as simple as presented?
[basic.life]/1
The lifetime of an object or reference is a runtime property of the object or reference. An object is said to have non-vacuous initialization if it is of a class or aggregate type and it or one of its subobjects is initialized by a constructor other than a trivial default constructor. [ Note: initialization by a trivial copy/move constructor is non-vacuous initialization. — end note ] The lifetime of an object of type T begins when:
storage with the proper alignment and size for type T is obtained, and
if the object has non-vacuous initialization, its initialization is complete.
The lifetime of an object of type T ends when:
if T is a class type with a non-trivial destructor ([class.dtor]), the destructor call starts, or
the storage which the object occupies is reused or released.
Aside from code being harder to read and reason about, you will either not win anything, or land up with undefined behavior. Just use the constructor, it is idiomatic C++.
This particular code is fine, because C is a POD. As long as C is a POD, it can be initialized that way as well.
Your code is equivalent to this:
struct C
{
int *x;
};
C* c = (C*)malloc(sizeof(C));
c->x = NULL;
Does it not look like familiar? It is all good. There is no problem with this code.
While you can initialize all explicit members that way, you cannot initialize everything a class may contain:
references cannot be set outside an initializer list
vtable pointers cannot be manipulated by code at all
That is, the moment that you have a single virtual member, or virtual base class, or reference member, there is no way to correctly initialize your object except by calling its constructor.
I think it shouldn't be UB. You make your pointer point to some raw memory and are treating its data in a particular way, there's nothing bad here.
If the constructor of this class does something (initializes variables, etc), you'll end up with, again, a pointer to raw, uninitialized object, using which without knowing what the (default) constructor was supposed to be doing (and repeating its behavior) will be UB.

C++ static member initialization order

I have a code which has been in use for a long time which has a scenario like following.
There are two classes, A and B.
A class has a public static variable static B* pB;
B class has a static object of itself (static B instance;).
In B constructor I set A::pB = this;
My question is, since static variable initialization order is undefined, if b::instance got initialized before A::pB, making B constructor to get called first and it tries to set A::pB which is uninitialized yet, could it lead to a problem?
My current code runs without any unexpected behavior. Wanted to find out whether it is just my luck or not
(Initialization of A::pB and B::instance happens in different translation units)
There are three parts to initialization, zero initialization,
static initialization and dynamic initialization. (Zero
initialization is actually part of static initialization, but
it's often convenient to keep them separate.) They occur in
that order.
If you don't specify any initialization for static B* A::pb;,
it will be zero initialized (before anything else), and nothing
else. If you specify a constant initializer, e.g.
B* A::pb = nullptr;
, that will also occur before any dynamic initialization.
What happens in a constructor is dynamic initialization, so
there should be no problem with your code unless A::pb also
has dynamic initialization, something like:
B* A::pb = someFunctionReturningABStar();
And finally: the default constructor for pointer is trivial; the
pointer is effectively "constructed" when the program is loaded,
before any code is executed. So there could never be a problem
due to assigning to pB in a constructor. The only problem
could occur if there was dynamic initialization of the pointer,
which might occur after the assignment in the constructor of
B::instance, and overwrite it.
And of course, until B::instance is constructed, any other
code will see a null pointer in A::pb.

Pointers not associated to NULL by default in Qt

I discovered something playing with QtCreator and bugs I was having in a destructor working with gcc: pointers are not equal to NULL by default; this needs to be added manually (to the constructor in my case). Is this right? How come? What's their default value?
Pointers, as every other primitive type, start living uninitialized, which means that they may have any value (actually, it's undefined behavior to read from any uninitialized variable).
On the other hand, the constructor for structs and classes is guaranteed to run on creation of a class instance, which means that, if it is written correctly, the class starts living in a valid state.
Unless a constructor in place, there are no default values
Looking at int * x; x has an undefined value
std::string is refers to a class with a contructor so with std::string x there is a defined value for x: "empty string constructor (default constructor) Constructs an empty string, with a length of zero characters."

default initialization in C++

I have a question about the default initialization in C++. I was told the non-POD object will be initialized automatically. But I am confused by the code below.
Why when I use a pointer, the variable i is initialized to 0, however, when I declare a local variable, it's not. I am using g++ as the compiler.
class INT {
public: int i;
};
int main () {
INT* myint1 = new INT;
INT myint2;
cout<<"myint1.i is "<<myint1->i<<endl;
cout<<"myint2.i is "<<myint2.i<<endl;
return 0;
}
The output is
myint1.i is 0
myint2.i is -1078649848
You need to declare a c'tor in INT and force 'i' to a well-defined value.
class INT {
public:
INT() : i(0) {}
...
};
i is still a POD, and is thus not initialized by default. It doesn't make a difference whether you allocate on the stack or from the heap - in both cases, the value if i is undefined.
in both cases its not initialized, you were just lucky to get 0 in the first one
It depends on the compiler. The big difference here is that a pointer set to new Something refers to some area of memory in heap, while local variables are stored on the stack. Perhaps your compiler zeroes heap memory but doesn't bother zeroing stack memory; either way, you can't count on either method zeroing your memory. You should use something like ZeroMemory in Win32 or memset from the C standard libraries to zero your memory, or set i = 0 in the constructor of INT.
For a class or struct-type, if you don't tell it which constructor to use when defining a variable, then the default constructor is called. If you didn't define a default constructor, then the compiler creates one for you. If the type is not a class (or struct) type, then it is not initialized since it won't have a constructor, let alone a default constructor (so no built-in types like int will ever be default-initialized).
So, in your example, both myint1 and myint2 are default constructed with the default constuctor that the compiler declared for INT. But since that won't initialize any non-class/struct variables in an INT, the i member variable of INT is not initialized.
If you want i to be initialized, you need to write a default constructor for INT which initializes it.
Firstly, your class INT is POD.
Secondly, when something is "initialized automatically" (for automatic or dynamic objects), it means that the constructor is called automatically. There is no other "automatic" initialization scenario (for automatic or dynamic objects); all other scenarios require a "manually" specified initializer. However, if the constructor does not do anything to perform the desired initialization, then that initialization will not take place. It is your responsibility to write that constructor, when necessary.
In both cases in your example you should get garbage in your objects. The 0 that you observe in case of new-ed object is there purely by accident.
That's true for non-POD object's but your object is POD since it has no user defined constructor and only contains PODs itself.
We just had this topic, you can find some explinations here.
If you add more variables to the class, you will end up with non-initialized member variables.