Does a member have to be initialized to take its address? - c++

Can I initialize a pointer to a data member before initializing the member? In other words, is this valid C++?
#include <string>
class Klass {
public:
Klass()
: ptr_str{&str}
, str{}
{}
private:
std::string *ptr_str;
std::string str;
};
this question is similar to mine, but the order is correct there, and the answer says
I'd advise against coding like this in case someone changes the order of the members in your class.
Which seems to mean reversing the order would be illegal but I couldn't be sure.

Does a member have to be initialized to take its address?
No.
Can I initialize a pointer to a data member before initializing the member? In other words, is this valid C++?
Yes. Yes.
There is no restriction that operand of unary & need to be initialised. There is an example in the standard in specification of unary & operator:
int a;
int* p1 = &a;
Here, the value of a is indeterminate and it is OK to point to it.
What that example doesn't demonstrate is pointing to an object before its lifetime has begun, which is what happens in your example. Using a pointer to an object before and after its lifetime is explicitly allowed if the storage is occupied. Standard draft says:
[basic.life] Before the lifetime of an object has started but after the storage which the object will occupy has been allocated or, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, any pointer that represents the address of the storage location where the object will be or was located may be used but only in limited ways ...
The rule goes on to list how the usage is restricted. You can get by with common sense. In short, you can treat it as you could treat a void*, except violating these restrictions is UB rather than ill-formed. Similar rule exists for references.
There are also restrictions on computing the address of non-static members specifically. Standard draft says:
[class.cdtor] ... To form a pointer to (or access the value of) a direct non-static member of an object obj, the construction of obj shall have started and its destruction shall not have completed, otherwise the computation of the pointer value (or accessing the member value) results in undefined behavior.
In the constructor of Klass, the construction of Klass has started and destruction hasn't completed, so the above rule is satisfied.
P.S. Your class is copyable, but the copy will have a pointer to the member of another instance. Consider whether that makes sense for your class. If not, you will need to implement custom copy and move constructors and assignment operators. A self-reference like this is a rare case where you may need custom definitions for those, but not a custom destructor, so it is an exception to the rule of five (or three).
P.P.S If your intention is to point to one of the members, and no object other than a member, then you might want to use a pointer to member instead of pointer to object.

Funny question.
It is legitimate and will "work", though barely. There is a little "but" related to types which makes the whole thing a bit awkward with a bad taste (but not illegitimate), and which might make it illegal some border cases involving inheritance.
You can, of course, take the address of any object whether it's initialized or not, as long as it exists in the scope and has a name which you can prepend operator& to. Dereferencing the pointer is a different thing, but that wasn't the question.
Now, the subtle problem is that the standard defines the result of operator& for non-static struct members as "“pointer to member of class C of type T” and is a prvalue designating C::m".
Which basically means that ptr_str{&str} will take the address of str, but the type is not pointer-to, but pointer-to-member-of. It is then implicitly and silently cast to pointer-to.
In other words, although you do not need to explicitly write &this->str, that's nevertheless what its type is -- it's what it is and what it means [1].
Is this valid, and is it safe to use it within the initializer list? Well yes, just... barely. It's safe to use it as long as it's not being used to access uninitialized members or virtual functions, directly or indirectly. Which, as it happens, is the case here (it might not be the case in a different, arguably contrived case).
[1] Funnily, paragraph 4 starts with a clause that says that no member pointer is formed when you put stuff in parentheses. That's remarkable because most people would probably do that just to be 100% sure they got operator precedence right. But if I read correctly, then &this->foo and &(this->foo) are not in any way the same!

Related

Is it legal to construct data members of a struct separately?

class A;
class B;
//we have void *p pointing to enough free memory initially
std::pair<A,B> *pp=static_cast<std::pair<A,B> *>(p);
new (&pp->first) A(/*...*/);
new (&pp->second) B(/*...*/);
After the code above get executed, is *pp guaranteed to be in a valid state? I know the answer is true for every compiler I have tested, but the question is whether this is legal according to the standard and hence. In addition, is there any other way to obtain such a pair if A or B is not movable in C++98/03? (thanks to #StoryTeller-UnslanderMonica , there is a piecewise constructor for std::pair since C++11)
“Accessing” the members of the non-existent pair object is undefined behavior per [basic.life]/5; pair is never a POD-class (having user-declared constructors), so a pointer to its storage out of lifetime may not be used for its members. It’s not clear whether forming a pointer to the member is already undefined, or if the new is.
Neither is there a way to construct a pair of non-copyable (not movable, of course) types in C++98—that’s why the piecewise constructor was added along with move semantics.
A more simple question: is using a literal string well defined?
Not even that is, as its lifetime is not defined. You can't use a string literal in conforming code.
So the committee that never took the time to make string literals well defined obviously did not bother with specifying which objects of class type can be made to exist by placement new of its subobjects - polymorphic objects obviously cannot be created that way!
That standard did not even bother describing the semantic of union.
About lifetime the standard is all over the place, and that isn't just editorial: it reflects a deep disagreement between serious people about what begins a lifetime, what an object is, what an lvalue is, etc.
Notably people have all sorts of false or contradicting intuitions:
an infinite number of objects cannot be created by one call to malloc
an lvalue refers to an object
overlapping objects are against the object model
a unnamed object can only be created by new or by the compiler (temporaries)
...

Could reinterpret_cast turn an invalid pointer value into a valid one?

Consider this union:
union A{
int a;
struct{
int b;
} c;
};
c and a are not layout-compatibles types so it is not possible to read the value of b through a:
A x;
x.c.b=10;
x.a+x.a; //undefined behaviour (UB)
Trial 1
For the case below I think that since C++17, I also get an undefined behavior:
A x;
x.a=10;
auto p = &x.a; //(1)
x.c.b=12; //(2)
*p+*p; //(3) UB
Let's consider [basic.type]/3:
Every value of pointer type is one of the following:
a pointer to an object or function (the pointer is said to point to the object or function), or
a pointer past the end of an object ([expr.add]), or
the null pointer value ([conv.ptr]) for that type, or
an invalid pointer value.
Let's call this 4 pointer values categories as pointer value genre.
The value of a pointer may transition from of the above mentioned genre to an other, but the standard is not really explicit about that. Fill free to correct me if I am wrong. So I suppose that at (1) the value of p is a pointer to value. Then in (2) a life ends and the value of p becomes an invalid pointer value. So in (3) I get UB because I try to access the value of an object (a) out of its lifetime.
Trial 2
Now consider this weird code:
A x;
x.a=10;
auto p = &x.a; //(1)
x.c.b=12; //(2)
p = reinterpret_cast<int*>(p); //(2')
*p+*p; //(3) UB?
Could the reinterpret_cast<int*>(p) change the pointer value genre from invalid pointer value to a pointer to value.
reinterpret_cast<int*>(p) is defined to be equivalent to static_cast<int*>(static_cast<void*>(p)), so let's consider how is defined the static_cast from void* to int*, [expr.static.cast]/13:
A prvalue of type “pointer to cv1 void” can be converted to a prvalue of type “pointer to cv2 T”, where T is an object type and cv2 is the same cv-qualification as, or greater cv-qualification than, cv1. If the original pointer value represents the address A of a byte in memory and A does not satisfy the alignment requirement of T, then the resulting pointer value is unspecified. Otherwise, if the original pointer value points to an object a, and there is an object b of type T (ignoring cv-qualification) that is pointer-interconvertible with a, the result is a pointer to b. Otherwise, the pointer value is unchanged by the conversion.
So in our case the original pointer pointed to the object a. So I suppose the reinterpret_cast will not help because a is not within its lifetime. Is my reading to strict? Could this code be well defined?
Then in (2) a life ends and the value of p becomes an invalid pointer value.
Incorrect. Pointers only become invalid when they point into memory that has ended its storage duration.
The pointer in this case becomes a pointer to an object outside of its lifetime. The object it points to is gone, but the pointer is not "invalid" in the way the specification means it. [basic.life] spends quite a bit of time explaining what you can and cannot do to pointers to objects outside of their lifetime.
reinterpret_cast cannot turn a pointer to an object outside of its lifetime into a pointer to a different object that is within its lifetime.
The notion of objects in the standard is rather abstract and differs somewhat from intuition. An object may be within its lifetime or not, and objects not within their lifetimes can have the same address, this is why unions work at all: the definition of active member is "the member that is within its lifetime".
A pointer to an object not within its lifetime is still a pointer to object. reinterpret_cast only casts between the type of the pointer, but not its validity. The UB you get with casting to non-pointer-interconvertible types are due to the strict-aliasing rule, not due to the validity of the pointer.
In all your trials, including your follow up question, you are using an object not within its lifetime in ways that aren't allowed, ie accessing it, and are consequently UB.
Every version to date of the C and C++ Standards has been ambiguous or contradictory with regard to what can be done with addresses of union members. The authors of the C Standard didn't want to require that compilers make pessimistic allowances for the possibility that functions might be invoked by constructs like:
someFunction(&myUnion.member1, &myUnion.member2);
in cases where function would cause the value one member of myUnion would be changed between access made via the other. While the ability to take union members' addresses would have been pretty useless if code couldn't do things like:
someFunction1(&myUnion.member1);
someFunction2(&myUnion.member2);
someFunction3(&myUnion.member1);
the authors of the Standard expected that quality implementations intended for various purposes would process constructs that Undefined Behavior "in a documented fashion characteristic of the environment" when doing so would best serve those purposes, and thus thought that making support for such constructs be a quality-of-implementation issue would be simpler than trying to formulate precise rules for which patterns must be supported. A compiler that generated code for the called functions in the second example without knowing their calling context wouldn't be able to interleave accesses performed by the two functions, and a quality compiler that expanded them inline while processing the above code would have no trouble noticing when each pointer was derived from myUnion.
The authors of the C89 Standard didn't think it necessary to define precise rules for how pointers to union members behave, because they thought compiler writers' desire to produce quality implementations would drive them to handle appropriate cases sensibly even without such rules. Unfortunately, some compiler writers were too lazy to handle cases like the second example above, and rather than recognizing that there was never any reason for quality compilers to be incapable of handling such cases, the authors of later C and C++ Standards have bent over backward to come up with weirdly contorted, ambiguous, and contradictory rules that justify such compiler behavior.
As a result, the address-of operator should only be regarded as meaningfully applicable to union members in cases where the resulting pointer will be used for accessing individual bytes of storage, either using character-types directly, or passing to functions like memcpy that are defined in such fashion. Unless or until there's a major revamp of the Standard, or an appendix that describes means by which implementations can offer optional guarantees beyond what the Standard requires, it would be best to pretend that union members are--like bitfields--lvalues that don't have addresses.

Managing trivial types

I have found the intricacies of trivial types in C++ non-trivial to understand and hope someone can enlighten me on the following.
Given type T, storage for T allocated using ::operator new(std::size_t) or ::operator new[](std::size_t) or std::aligned_storage, and a void * p pointing to a location in that storage suitably aligned for T so that it may be constructed at p:
If std::is_trivially_default_constructible<T>::value holds, is the code invoking undefined behavior when code skips initialization of T at p (i.e. by using T * tPtr = new (p) T();) before otherwise accessing *p as T? Can one just use T * tPtr = static_cast<T *>(p); instead without fear of undefined behavior in this case?
If std::is_trivially_destructible<T>::value holds, does skipping destruction of T at *p (i.e by calling tPtr->~T();) cause undefined behavior?
For any type U for which std::is_trivially_assignable<T, U>::value holds, is std::memcpy(&t, &u, sizeof(U)); equivalent to t = std::forward<U>(u); (for any t of type T and u of type U) or will it cause undefined behavior?
No, you can't. There is no object of type T in that storage, and accessing the storage as if there was is undefined. See also T.C.'s answer here.
Just to clarify on the wording in [basic.life]/1, which says that objects with vacuous initialization are alive from the storage allocation onward: that wording obviously refers to an object's initialization. There is no object whose initialization is vacuous when allocating raw storage with operator new or malloc, hence we cannot consider "it" alive, because "it" does not exist. In fact, only objects created by a definition with vacuous initialization can be accessed after storage has been allocated but before the vacuous initialization occurs (i.e. their definition is encountered).
Omitting destructor calls never per se leads to undefined behavior. However, it's pointless to attempt any optimizations in this area in e.g. templates, since a trivial destructor is just optimized away.
Right now, the requirement is being trivially copyable, and the types have to match. However, this may be too strict. Dos Reis's N3751 at least proposes distinct types to work as well, and I could imagine this rule being extended to trivial copy assignment across one type in the future.
However, what you've specifically shown does not make a lot of sense (not least because you're asking for assignment to a scalar xvalue, which is ill-formed), since trivial assignment can hold between types whose assignment is not actually "trivial", that is, has the same semantics as memcpy. E.g. is_trivially_assignable<int&, double> does not imply that one can be "assigned" to the other by copying the object representation.
Technically reinterpreting storage is not enough to introduce a new object as. Look at the note for Trivial default constructor states:
A trivial default constructor is a constructor that performs no action. All data types compatible with the C language (POD types) are trivially default-constructible. Unlike in C, however, objects with trivial default constructors cannot be created by simply reinterpreting suitably aligned storage, such as memory allocated with std::malloc: placement-new is required to formally introduce a new object and avoid potential undefined behavior.
But the note says it's a formal limitation, so probably it is safe in many cases. Not guaranteed though.
No. is_assignable does not even guarantee the assignment will be legal under certain conditions:
This trait does not check anything outside the immediate context of the assignment expression: if the use of T or U would trigger template specializations, generation of implicitly-defined special member functions etc, and those have errors, the actual assignment may not compile even if std::is_assignable::value compiles and evaluates to true.
What you describe looks more like is_trivially_copyable, which says:
Objects of trivially-copyable types are the only C++ objects that may be safely copied with std::memcpy or serialized to/from binary files with std::ofstream::write()/std::ifstream::read().
I don't really know. I would trust KerrekSB's comments.

Built-in datatypes versus User defined datatypes in C++

Constructors build objects from dust.
This is a statement which I have been coming across many times,recently.
While initializing a built-in datatype variable, the variable also HAS to be "built from dust" . So, are there also constructors for built in types?
Also, how does the compiler treat a BUILT IN DATATYPE and a USER DEFINED CLASS differently, while creating instances for each?
I mean details regarding constructors, destructors etc.
This query on stack overflow is regarding the same and it has some pretty intresting details , most intresting one being what Bjarne said ... !
Do built-in types have default constructors?
Simply put, according to the C++ standard:
12.1 Constructors [class.ctor]
2. A constructor is used to initialize objects of its class type...
so no, built-in datatypes (assuming you're talking about things like ints and floats) do not have constructors because they are not class types. Class types are are specified as such:
9 Classes [class]
1. A class is a type. Its name becomes a class-name (9.1) within
its scope.
class-name:
identifier
template-id
Class-specifiers and elaborated-type-specifiers (7.1.5.3) are used
to make class-names. An object of a class consists of a (possibly
empty) sequence of members and base class objects.
class-specifier:
class-head { member-specification (opt) }
class-head:
class-key identifieropt base-clauseopt
class-key nested-name-specifier identifier base-clauseopt
class-key nested-name-specifieropt template-id base-clauseopt
class-key:
class
struct
union
And since the built-in types are not declared like that, they cannot be class types.
So how are instances of built-in types created? The general process of bringing built-in and class instances into existance is called initialization, for which there's a huge 8-page section in the C++ standard (8.5) that lays out in excruciating detail about it. Here's some of the rules you can find in section 8.5.
As already mentioned, built-in data types don't have constructors.
But you still can use construction-like initialization syntax, like in int i(3), or int i = int(). As far as I know that was introduced to language to better support generic programming, i.e. to be able to write
template <class T>
T f() { T t = T(); }
f(42);
While initializing a built-in datatype variable, the variable also HAS to be "built from dust" . So, are there also constructors for built in types?
Per request, I am rebuilding my answer from dust.
I'm not particularly fond of that "Constructors build objects from dust" phrase. It is a bit misleading.
An object, be it a primitive type, a pointer, or a instance of a big class, occupies a certain known amount of memory. That memory must somehow be set aside for the object. In some circumstances, that set-aside memory is initialized. That initialization is what constructors do. They do not set aside (or allocate) the memory needed to store the object. That step is performed before the constructor is called.
There are times when a variable does not have to be initialized. For example,
int some_function (int some argument) {
int index;
...
}
Note that index was not assigned a value. On entry to some_function, a chunk of memory is set aside for the variable index. This memory already exists somewhere; it is just set aside, or allocated. Since the memory already exists somewhere, each bit will have some pre-existing value. If a variable is not initialized, it will have an initial value. The initial value of the variable index might be 42, or 1404197501, or something entirely different.
Some languages provide a default initialization in case the programmer did not specify one. (C and C++ do not.) Sometimes there is nothing wrong with not initializing a variable to a known value. The very next statement might be an assignment statement, for example. The upside of providing a default initialization is that failing to initialize variables is a typical programming mistake. The downside is that this initialization has a cost, albeit typically tiny. That tiny cost can be significant when it occurs in a time-critical, multiply-nested loop. Not providing a default initial value fits the C and C++ philosophy of not providing something the programmer did not ask for.
Some variables, even non-class variables, absolutely do need to be given an initial value. For example, there is no way to assign a value to a variable that is of a reference type except in the declaration statement. The same goes for variables that are declared to be constant.
Some classes have hidden data that absolutely do need to be initialized. Some classes have const or reference data members that absolutely do need to be initialized. These classes need to be initialized, or constructed. Not all classes do need to be initialized. A class or structure that doesn't have any virtual functions, doesn't have an explicitly-provided constructor or destructor, and whose member data are all primitive data types, is called plain old data, or POD. POD classes do not need to be constructed.
Bottom line:
An object, whether it is a primitive type or an instance of a very complex class, is not "built from dust". Dust is, after all, very harmful to computers. They are built from bits.
Setting aside, or allocating, memory for some object and initializing that set-aside memory are two different things.
The memory need to store an object is allocated, not created. The memory already exists. Because that memory already exists, the bits that comprise the object will have some pre-existing values. You should of course never rely on those preexisting values, but they are there.
The reason for initializing variables, or data members, is to give them a reliable, known value. Sometimes that initialization is just a waste of CPU time. If you didn't ask the compiler to provide such a value, C and C++ assume the omission is intentional.
The constructor for some object does not allocate the memory needed to store the object itself. That step has already been done by the time the constructor is called. What a constructor does do is to initialize that already allocated memory.
The initial response:
A variable of a primitive type does not have to be "built from dust". The memory to store the variable needs to be allocated, but the variable can be left uninitialized. A constructor does not build the object from dust. A constructor does not allocate the memory needed to store the to-be constructed object. That memory has already been allocated by the time the constructor is called. (A constructor might initialize some pointer data member to memory allocated by the constructor, but the bits occupied by that pointer must already exist.)
Some objects such as primitive types and POD classes do not necessarily need to be initialized. Declare a non-static primitive type variable without an initial value and that variable will be uninitialized. The same goes for POD classes. Suppose you know you are going to assign a value to some variable before the value of the variable is accessed. Do you need to provide an initial value? No.
Some languages do give an initial value to every variable. C and C++ do not. If you didn't ask for an initial value, C and C++ are not going to force an initial value on the variable. That initialization has a cost, typically tiny, but it exists.
Built In data types(fundamental types, arrays,references, pointers, and enums) do not have constructors.
A constructor is a member function. A member function can only be defined for a class type
C++03 9.3/1:
"Functions declared in the definition of a class, excluding those declared with a friend specifier, are called member functions of that class".
Many a times usage of an POD type in certain syntax's(given below) might give an impression that they are constructed using constructors or copy constructors but it just Initialization without any of the two.
int x(5);

C++: member pointer initialised?

Code sample should explain things:
class A
{
B* pB;
C* pC;
D d;
public :
A(int i, int j) : d(j)
{
pC = new C(i, "abc");
} // note pB is not initialised, e.g. pB(NULL)
...
};
Obviously pB should be initialised to NULL explicitly to be safe (and clear), but, as it stands, what is the value of pB after construction of A? Is it default initialised (which is zero?) or not (i.e. indeterminate and whatever was in memory). I realise initialisation in C++ has a fair few rules.
I think it isn't default initialised; as running in debug mode in Visual Studio it has set pB pointing to 0xcdcdcdcd - which means the memory has been new'd (on the heap) but not initialised. However in release mode, pB always points to NULL. Is this just by chance, and therefore not to be relied upon; or are these compilers initialising it for me (even if it's not in the standard)? It also seems to be NULL when compiled with Sun's compiler on Solaris.
I'm really looking for a specific reference to the standard to say one way or the other.
Thanks.
Here is the relevant passage fromt he standard:
12.6.2 Initializing bases and members [class.base.init]
4 If a given nonstatic data member or
base class is not named by a mem-
initializer-id in the
mem-initializer-list, then
--If the entity is a nonstatic data
member of (possibly cv-qualified)
class type (or array thereof) or a base class, and the entity class
is a non-POD class, the entity is default-initialized (dcl.init).
If the entity is a nonstatic data member of a const-qualified type,
the entity class shall have a user-declared default constructor.
--Otherwise, the entity is not
initialized. If the entity is of
const-qualified type or reference type, or of a (possibly cv-quali-
fied) POD class type (or array thereof) containing (directly or
indirectly) a member of a const-qualified type, the program is
ill-
formed.
After the call to a constructor for
class X has completed, if a member
of X is neither specified in the
constructor's mem-initializers, nor
default-initialized, nor initialized
during execution of the body of
the constructor, the member has
indeterminate value.
According to the C++0x standard section 12.6.2.4, in the case of your pointer variable, if you don't include it in the initializer list and you don't set it in the body of the constructor, then it has indeterminate value. 0xCDCDCDCD and 0 are two possible such values, as is anything else. :-)
I believe this is a artifact from the good old C days when you could not have expectations on what alloc'd memory contains. As the standards progressed to C++ this "convention" was maintained. As the C++ compilers developed the individual authors took it upon themselves to "fix" this problem. Therefore your mileage may vary depending on your compiler of choice.
The "0xcdcdcdcd" looks to be a readily identifiable pattern that "helps" in debugging you code. That is why it doesn't show in release mode.
I hope this helped in a little way and good luck.
Uninitialised pointers are allow to basically contain a random value, although some compilers tend to fill them with 0 or some other recognisable value, especially in debug mode.
IMHO this is due to C++'s "don't pay for what you don't use" design. If you don't consider it important, the compiler does not need to go through the expense of initialising the variable for you. Of course, once you've chased a random pointer you might find it prudent to initialise it the next time around...
The value of pB is undefined. It may or may not be consistently the same value - usually depends on what was previously at the same place in memory prior to the allocation of a particular instance of A.
Uninitialised pointers can point to anything. Some compiler vendors will help you out and make them point to 0 or 0xcdcdcdcd or whatever.
To make sure your code is safe and portable you should always initialise your pointers. either to 0 or to a valid value.
e.g.
C* pc = 0;
or
C* pc = new C(...);
If you always initialise pointers to 0 then this is safe :
if (!pc)
pc = new C(...);
If you don't initialise then you've got no way of telling initialised and uninitialised pointers apart.
As an aside, there's no such keyword in C++ as NULL. Most compilers define NULL as 0, but it's not considered portable to use it. The new c++0x standard will introduce a new keyword, nullptr, so when that comes out we'll finally have a portable null pointer constant.
It's rare that I'll recommend not learning something about the language you're using, but in this case, whether or not pB is initialized isn't useful information. Just initialize it. If it's automatically initialized, the compiler will optimize out the extra initialization. If it isn't, you've added one extra processor instruction and prevented a whole slew of potential bugs.