Related
Consider following sample code:
class C
{
public:
int* x;
};
void f()
{
C* c = static_cast<C*>(malloc(sizeof(C)));
c->x = nullptr; // <-- here
}
If I had to live with the uninitialized memory for any reason (of course, if possible, I'd call new C() instead), I still could call the placement constructor. But if I omit this, as above, and initialize every member variable manually, does it result in undefined behaviour? I.e. is circumventing the constructor per se undefined behaviour or is it legal to replace calling it with some equivalent code outside the class?
(Came across this via another question on a completely different matter; asking for curiosity...)
It is legal now, and retroactively since C++98!
Indeed the C++ specification wording till C++20 was defining an object as (e.g. C++17 wording, [intro.object]):
The constructs in a C++ program create, destroy, refer to, access, and
manipulate objects. An object is created by a definition (6.1), by a
new-expression (8.5.2.4), when implicitly changing the active member
of a union (12.3), or when a temporary object is created (7.4, 15.2).
The possibility of creating an object using malloc allocation was not mentioned. Making it a de-facto undefined behavior.
It was then viewed as a problem, and this issue was addressed later by https://wg21.link/P0593R6 and accepted as a DR against all C++ versions since C++98 inclusive, then added into the C++20 spec, with the new wording:
[intro.object]
The constructs in a C++ program create, destroy, refer to, access, and manipulate objects. An object is created by a definition, by a new-expression, by an operation that implicitly creates objects (see below)...
...
Further, after implicitly creating objects within a specified region of
storage, some operations are described as producing a pointer to a
suitable created object. These operations select one of the
implicitly-created objects whose address is the address of the start
of the region of storage, and produce a pointer value that points to
that object, if that value would result in the program having defined
behavior. If no such pointer value would give the program defined
behavior, the behavior of the program is undefined. If multiple such
pointer values would give the program defined behavior, it is
unspecified which such pointer value is produced.
The example given in C++20 spec is:
#include <cstdlib>
struct X { int a, b; };
X *make_x() {
// The call to std::malloc implicitly creates an object of type X
// and its subobjects a and b, and returns a pointer to that X object
// (or an object that is pointer-interconvertible ([basic.compound]) with it),
// in order to give the subsequent class member access operations
// defined behavior.
X *p = (X*)std::malloc(sizeof(struct X));
p->a = 1;
p->b = 2;
return p;
}
There is no living C object, so pretending that there is one results in undefined behavior.
P0137R1, adopted at the committee's Oulu meeting, makes this clear by defining object as follows ([intro.object]/1):
An object is created by a definition ([basic.def]), by a new-expression ([expr.new]), when implicitly changing the active member of a union ([class.union]), or when a temporary object is created ([conv.rval], [class.temporary]).
reinterpret_cast<C*>(malloc(sizeof(C))) is none of these.
Also see this std-proposals thread, with a very similar example from Richard Smith (with a typo fixed):
struct TrivialThing { int a, b, c; };
TrivialThing *p = reinterpret_cast<TrivialThing*>(malloc(sizeof(TrivialThing)));
p->a = 0; // UB, no object of type TrivialThing here
The [basic.life]/1 quote applies only when an object is created in the first place. Note that "trivial" or "vacuous" (after the terminology change done by CWG1751) initialization, as that term is used in [basic.life]/1, is a property of an object, not a type, so "there is an object because its initialization is vacuous/trivial" is backwards.
I think the code is ok, as long as the type has a trivial constructor, as yours. Using the object cast from malloc without calling the placement new is just using the object before calling its constructor. From C++ standard 12.7 [class.dctor]:
For an object with a non-trivial constructor, referring to any non-static member or base class of the object before the constructor begins execution results in undefined behavior.
Since the exception proves the rule, referrint to a non-static member of an object with a trivial constructor before the constructor begins execution is not UB.
Further down in the same paragraphs there is this example:
extern X xobj;
int* p = &xobj.i;
X xobj;
This code is labelled as UB when X is non-trivial, but as not UB when X is trivial.
For the most part, circumventing the constructor generally results in undefined behavior.
There are some, arguably, corner cases for plain old data types, but you don't win anything avoiding them in the first place anyway, the constructor is trivial. Is the code as simple as presented?
[basic.life]/1
The lifetime of an object or reference is a runtime property of the object or reference. An object is said to have non-vacuous initialization if it is of a class or aggregate type and it or one of its subobjects is initialized by a constructor other than a trivial default constructor. [ Note: initialization by a trivial copy/move constructor is non-vacuous initialization. — end note ] The lifetime of an object of type T begins when:
storage with the proper alignment and size for type T is obtained, and
if the object has non-vacuous initialization, its initialization is complete.
The lifetime of an object of type T ends when:
if T is a class type with a non-trivial destructor ([class.dtor]), the destructor call starts, or
the storage which the object occupies is reused or released.
Aside from code being harder to read and reason about, you will either not win anything, or land up with undefined behavior. Just use the constructor, it is idiomatic C++.
This particular code is fine, because C is a POD. As long as C is a POD, it can be initialized that way as well.
Your code is equivalent to this:
struct C
{
int *x;
};
C* c = (C*)malloc(sizeof(C));
c->x = NULL;
Does it not look like familiar? It is all good. There is no problem with this code.
While you can initialize all explicit members that way, you cannot initialize everything a class may contain:
references cannot be set outside an initializer list
vtable pointers cannot be manipulated by code at all
That is, the moment that you have a single virtual member, or virtual base class, or reference member, there is no way to correctly initialize your object except by calling its constructor.
I think it shouldn't be UB. You make your pointer point to some raw memory and are treating its data in a particular way, there's nothing bad here.
If the constructor of this class does something (initializes variables, etc), you'll end up with, again, a pointer to raw, uninitialized object, using which without knowing what the (default) constructor was supposed to be doing (and repeating its behavior) will be UB.
Note: I am using the g++ compiler (which is I hear is pretty good and supposed to be pretty close to the standard).
I have the simplest class I could think of:
class BaseClass {
public:
int pub;
};
Then I have three equally simple programs to create BaseClass object(s) and print out the [uninitialized] value of their data.
Case 1
BaseClass B1;
cout<<"B1.pub = "<<B1.pub<<endl;
This prints out:
B1.pub = 1629556548
Which is fine. I actually thought it would get initialized to zero because it is a POD or Plain Old Datatype or something like that, but I guess not? So far so good.
Case 2
BaseClass B1;
cout<<"B1.pub = "<<B1.pub<<endl;
BaseClass B2;
cout<<"B2.pub = "<<B2.pub<<endl;
This prints out:
B1.pub = 1629556548
B2.pub = 0
This is definitely weird. I created two of the same objects the same exact way. One got initialized and the other did not.
Case 3
BaseClass B1;
cout<<"B1.pub = "<<B1.pub<<endl;
BaseClass B2;
cout<<"B2.pub = "<<B2.pub<<endl;
BaseClass* pB3 = new BaseClass;
cout<<"B3.pub = "<<pB3->pub<<endl;
This prints out:
B1.pub = 0
B2.pub = 0
B3.pub = 0
This is the most weird yet. They all get initialized to zero. All I did was add two lines of code and it changed the previous behavior.
So is this just a case of 'uninitialized data leads to unspecified behavior' or is there something more logical going on 'under the hood'?
I really want to understand the default constructor/destructor behavior because I have a feeling that it will be very important for completely understanding the inheritance stuff..
So is this just a case of 'uninitialized data leads to unspecified behavior'
Yes...
Sometimes if you call malloc (or new, which calls malloc) you will get data that is filled with zeroes because it is in a fresh page from the kernel. Other times it will be full of junk. If you put something on the stack (i.e., auto storage), you will almost certainly get garbage — but it can be hard to debug, because on your system that garbage might happen to be somewhat predictable. And with objects on the stack, you'll find that changing code in a completely different source file can change the values you see in an uninitialized data structure.
About POD: Whether or not something is POD is really a red herring here. I only explained it because the question mentioned POD, and the conversation derailed from there. The two relevant concepts are storage duration and constructors. POD objects don't have constructors, but not everything without a constructor is POD. (Technically, POD objects don't have non-trivial constructors nor members with non-trivial constructors.)
Storage duration: There are three kinds. Static duration is for globals, automatic is for local variables, and dynamic is for objects on the heap. (This is a simplification and not exactly correct, but you can read the C++ standard yourself if you need something exactly correct.)
Anything with static storage duration gets initialized to zero. So if you make a global instance of BaseClass, then its pub member will be zero (at first). Since you put it on the stack and the heap, this rule does not apply — and you don't do anything else to initialize it, so it is uninitialized. It happens to contain whatever junk was left in memory by the last piece of code to use it.
As a rule, any POD on the heap or the stack will be uninitialized unless you initialize it yourself, and the value will be undefined, possible changing when you recompile or run the program again. As a rule, any global POD will get initialized to zero unless you initialize it to something else.
Detecting uninitialized values: Try using Valgrind's memcheck tool, it will help you find where you use uninitialized values — these are usually errors.
It depends how you declare them:
// Assuming the POD type's
class BaseClass
{
public:
int pub;
};
Static Storage Duration Objects
These objects are always zero initialized.
// Static storage duration objects:
// PODS are zero initialized.
BaseClass global; // Zero initialized pub = 0
void plop()
{
static BaseClass functionStatic; // Zero initialized.
}
Automatic/Dynamic Storage Duration Objects
These objects may by default or zero initialized depending on how you declare them
void plop1()
{
// Dynamic
BaseClass* dynaObj1 = new BaseClass; // Default initialized (does nothing)
BaseClass* dynaObj2 = new BaseClass(); // Zero Initialized
// Automatic
BaseClass autoObj1; // Default initialized (does nothing)
BaseClass autoObj2 = BaseClass(); // Zero Initialized
// Notice that zero initialization of an automatic object is not the same
// as the zero initialization of a dynamic object this is because of the most
// vexing parse problem
BaseClass autoObj3(); // Unfortunately not a zero initialized object.
// Its a forward declaration of a function.
}
I use the term 'Zero-Initialized'/'Default-Initialization' but technically slightly more complex. 'Default-Initialization' will becomes 'no-initialization' of the pub member. While the () invokes 'Value-Initialization' that becomes 'Zero-Initialization' of the pub member.
Note: As BaseClass is a POD this class behaves just like the builtin types. If you swap BaseClass for any of the standard type the behavior is the same.
In all three cases, those POD objects could have indeterminate values.
POD objects without any initializer will NOT be initialized by default value. They just contain garbage..
From Standard 8.5 Initializers,
"If no initializer is specified for an object, and the object is of
(possibly cv-qualified) non-POD class type (or array thereof), the
object shall be default-initialized; if the object is of
const-qualified type, the underlying class type shall have a
user-declared default constructor. Otherwise, if no initializer is
specified for a nonstatic object, the object and its subobjects, if
any, have an indeterminate initial value; if the object or any of
its subobjects are of const-qualified type, the program is
ill-formed."
You can zero initialize all the members of a POD struct like this,
BaseClass object={0};
The way you wrote your class, the value of pub is undefined and can be anything. If you create a default constructor that will call the default constructor of pub - it will be guaranteed to be zero:
class BaseClass {
public:
BaseClass() : pub() {}; // calling pub() guarantees it to be zero.
int pub;
};
That would be a much better practice.
As a general rule, yes, uninitialized data leads to unspecified behavior. That's why other languages like C# take steps to insure that you don't use uninitialized data.
This is why you always, always, always, initialize a classes (or ANY variable) to a stable state instead of relying on the compiler to do it for you; especially since some compilers deliberately fill them with garbage. In fact, the only POD MSVC++ doesn't fill with garbage is bool, they get initialized to true. One would think it'd be safer to init it to false, but that's Microsoft for ya.
Case 1
Regardless of whether the encapsulating type is POD, data members of a built-in type are not default initialised by an encapsulating default constructor.
Case 2
No, neither got initialised. The underlying bytes at the memory position of one of them just happened to be 0.
Case 3
Same.
You seem to be expecting some guarantees about the "value" of uninitialised objects, whilst simultaneously professing the understanding that no such value exists.
Why is it that for user defined types when creating an array of objects every element of this array is initialized with the default constructor, but when I create an array of a built-in type that isn't the case?
And second question: Is it possible to specify default value to be used while initializing elements in the array? Something like this (not valid):
char* p = new char[size]('\0');
And another question in this topic while I'm with arrays. I suppose that when creating an array of user defined type, every element of this array will be initialized with default value. Why is this?
If arrays for built in types do not initialize their elements with their defaults, why do they do it for User Defined Types?
Is there a way to avoid/circumvent this default construction somehow? It seems like bit of a waste if I for example have created an array with size 10000, which forces 10000 default constructor calls, initializing data which I will (later on) overwrite anyway.
I think that behaviour should be consistent, so either every type of array should be initialized or none. And I think that the behaviour for built-in arrays is more appropriate.
That's how built-in types work in C++. In order to initialize them you have to supply an explicit initializer. If you don't, then the object will remain uninitialized. This behavior is in no way specific to arrays. Standalone objects behave in exactly the same way.
One problem here is that when you are creating an array using new[], you options for supplying an initializer (in the current version of the language) are very limited. In fact, the only initializer you can supply is the empty ()
char* p = new char[size]();
// The array is filled with zeroes
In case of char type (or any other scalar type), the () initializer will result in zero-initialization, which is incidentally what you tried to do.
Of course, if your desired default value in not zero, you are out of luck, meaning that you have to explicitly assign the default values to the elements of the new[]-ed array afterwards.
As for disabling the default constructor call for arrays of types with user-defined default constructor... well, there's no way to achieve that with ordinary new[]. However, you can do it by implementing your own array construction process (which is what std::vector does, for one example). You first allocate raw memory for the entire array, and then manually construct the elements one-by-one in any way you see fit. Standard library provides a number of primitive intended to be used specifically for that purpose. That includes std::allocator and functions like uninitialized_copy, uninitialized_fill and so on.
Something like this (not valid):
As far as I know that is perfectly valid.
Well not completely, but you can get a zero intialized character array:
#include <iostream>
#include <cstdlib>
int main(int argc, char* argv[])
{
//The extra parenthesis on the end call the "default constructor"
//of char, which initailizes it with zero.
char * myCharacters = new char[100]();
for(size_t idx = 0; idx != 100; idx++) {
if (!myCharacters[idx])
continue;
std::cout << "Error at " << idx << std::endl;
std::system("pause");
}
delete [] myCharacters;
return 0;
}
This program produces no output.
And another question in this topic while I'm with arrays. I suppose that when creating an array of user defined type and knowing the fact that every elem. of this array will be initialized with default value firstly why?
Because there's no good syntactic way to specialize each element allocated with new. You can avoid this problem by using a vector instead, and calling reserve() in advance. The vector will allocate the memory but the constructors will not be called until you push_back into the vector. You should be using vectors instead of user managed arrays anyway because new'd memory handling is almost always not exception safe.
I think that behaviour should be consistent, so either every type of array should be initialized or none. And I think that the behaviour for built-in arrays is more appropriate.
Well if you can think of a good syntax for this you can write up a proposal for the standard -- not sure how far you'll get with that.
Why is it that for user defined types when creating an array of objects every element of this array is initialized with the default constructor, but when I create an array of a built-in type that isn't the case? and
And another question in this topic while I'm with arrays. I suppose that when creating an array of user defined type, every element of this array will be initialized with default value. Why is this? and
If arrays for built in types do not initialize their elements with their defaults, why do they do it for User Defined Types?
Because a user defined type is never ever valid until its constructor is called. Built in types are always valid even if a constructor has not been called.
And second question: Is it possible to specify default value to be used while initializing elements in the array? Something like this (not valid):
Answered this above.
Is there a way to avoid/circumvent this default construction somehow? It seems like bit of a waste if I for example have created an array with size 10000, which forces 10000 default constructor calls, initializing data which I will (later on) overwrite anyway.
Yes, you can use a vector as I described above.
Currently (unless you're using the new C++0x), C++ will call the constructor that takes no arguments e.g. myClass::myClass(). If you want to initialise it to something, implement a constructor like this that initialises your variables. e.g.
class myChar {
public:
myChar();
char myCharVal;
};
myChar::myChar(): myCharVal('\0') {
}
The C++ philosophy is - don't pay for something you don't need.
And I think the behviour is pretty unified. If your UDT didn't have a default constructor, nothing would be run anyway and the behaviour would be the same as for built-in types (which don't have a default constructor).
Why is it that for user defined types when creating an array of objects every element of this array is initialized with the default constructor, but when I create an array of a built-in type that isn't the case?
But if the default constructor of an object does nothing then it is still not initialiized.
class X
{
public:
char y;
}
X* data = new X[size]; // defaut constructor called. But y is still undefined.
And second question: Is it possible to specify default value to be used while initializing elements in the array? > Something like this (not valid):
Yes:
char data1[size] = { 0 };
std::vector<char> data2(size,0);
char* data3 = new char[size];
memset(data3,0,size);
Is there a way to avoid/circumvent this default construction somehow? It seems like bit of a waste if I for example have created an array with size 10000, which forces 10000 default constructor calls, initializing data which I will (later on) overwrite anyway.
Yes. Use a std::vector.
You can reserve the space for all the elements you need without calling the constructor.
std::vector<char> data4;
data4.reserve(size);
I have a question about the default initialization in C++. I was told the non-POD object will be initialized automatically. But I am confused by the code below.
Why when I use a pointer, the variable i is initialized to 0, however, when I declare a local variable, it's not. I am using g++ as the compiler.
class INT {
public: int i;
};
int main () {
INT* myint1 = new INT;
INT myint2;
cout<<"myint1.i is "<<myint1->i<<endl;
cout<<"myint2.i is "<<myint2.i<<endl;
return 0;
}
The output is
myint1.i is 0
myint2.i is -1078649848
You need to declare a c'tor in INT and force 'i' to a well-defined value.
class INT {
public:
INT() : i(0) {}
...
};
i is still a POD, and is thus not initialized by default. It doesn't make a difference whether you allocate on the stack or from the heap - in both cases, the value if i is undefined.
in both cases its not initialized, you were just lucky to get 0 in the first one
It depends on the compiler. The big difference here is that a pointer set to new Something refers to some area of memory in heap, while local variables are stored on the stack. Perhaps your compiler zeroes heap memory but doesn't bother zeroing stack memory; either way, you can't count on either method zeroing your memory. You should use something like ZeroMemory in Win32 or memset from the C standard libraries to zero your memory, or set i = 0 in the constructor of INT.
For a class or struct-type, if you don't tell it which constructor to use when defining a variable, then the default constructor is called. If you didn't define a default constructor, then the compiler creates one for you. If the type is not a class (or struct) type, then it is not initialized since it won't have a constructor, let alone a default constructor (so no built-in types like int will ever be default-initialized).
So, in your example, both myint1 and myint2 are default constructed with the default constuctor that the compiler declared for INT. But since that won't initialize any non-class/struct variables in an INT, the i member variable of INT is not initialized.
If you want i to be initialized, you need to write a default constructor for INT which initializes it.
Firstly, your class INT is POD.
Secondly, when something is "initialized automatically" (for automatic or dynamic objects), it means that the constructor is called automatically. There is no other "automatic" initialization scenario (for automatic or dynamic objects); all other scenarios require a "manually" specified initializer. However, if the constructor does not do anything to perform the desired initialization, then that initialization will not take place. It is your responsibility to write that constructor, when necessary.
In both cases in your example you should get garbage in your objects. The 0 that you observe in case of new-ed object is there purely by accident.
That's true for non-POD object's but your object is POD since it has no user defined constructor and only contains PODs itself.
We just had this topic, you can find some explinations here.
If you add more variables to the class, you will end up with non-initialized member variables.
I have code like this:
class MapIndex
{
private:
typedef std::map<std::string, MapIndex*> Container;
Container mapM;
public:
void add(std::list<std::string>& values)
{
if (values.empty()) // sanity check
return;
std::string s(*(values.begin()));
values.erase(values.begin());
if (values.empty())
return;
MapIndex *&mi = mapM[s]; // <- question about this line
if (!mi)
mi = new MapIndex();
mi->add(values);
}
}
The main concern I have is whether the mapM[s] expression would return reference to NULL pointer if new item is added to the map?
The SGI docs say this: data_type& operator[](const key_type& k)
Returns a reference to the object that is associated with a particular key. If the map does not already contain such an object, operator[] inserts the default object data_type().
So, my question is whether the insertion of default object data_type() will create a NULL pointer, or it could create an invalid pointer pointing somewhere in the memory?
It'll create a NULL (0) pointer, which is an invalid pointer anyway :)
Yes it should be a zero (NULL) pointer as stl containers will default initialise objects when they aren't explicitly stored (ie accessing a non-existant key in a map as you are doing or resizing a vector to a larger size).
C++ Standard, 8.5 paragraph 5 states:
To default-initialize an object of
type T means:
If T is a non-POD class type (clause class), the default
constructor for T is called (and the
initialization is ill-formed if T has
no accessible default constructor)
If T is an array type, each element is default-initialized
Otherwise, the storage for the object iszero-initialized.
You should also note that default initialisation is different to simply ommiting the constructor. When you omit the constructor and simply declare a simple type you will get an indeterminate value.
int a; // not default constructed, will have random data
int b = int(); // will be initialised to zero
UPDATE: I completed my program and that very line I was asking about is causing it to crash sometimes, but at a later stage. The problem is that I'm creating a new object without changing the pointer stored in std::map. What is really needed is either reference or pointer to that pointer.
MapIndex *mi = mapM[s]; // <- question about this line
if (!mi)
mi = new MapIndex();
mi->add(values);
should be changed to:
MapIndex* &mi = mapM[s]; // <- question about this line
if (!mi)
mi = new MapIndex();
mi->add(values);
I'm surprised nobody noticed this.
The expression data_type() value-initializes an object. For a class type with a default constructor, it is invoked; if it doesn’t exist (or is defaulted), such as pointers, the object is zero-initialized.
So yes, you can rely on your map creating a NULL pointer.
Not sure about the crash, but there's definitely a memory leak as this statement:
if (!mi)
mi = new MapIndex();
always returns true, because pointer mi is not a reference to to what mapM is holding for a particular value of s.
I would also avoid using regular pointers and use boost::shared_ptr or some
other pointer that releases memory when destroyed. This allows you to call mapM.clear() or erase(), which should call destructors of keys and values stored in the map. Well, if the value is POD such as your pointer then no destructor is called therefor unless manually deleted, while iterating through a whole map will lead to memory leaks.