I am trying to determine whether the following code invokes undefined behavior:
#include <iostream>
class A;
void f(A& f)
{
char* x = reinterpret_cast<char*>(&f);
for (int i = 0; i < 5; ++i)
std::cout << x[i];
}
int main(int argc, char** argv)
{
A* a = reinterpret_cast<A*>(new char[5]);
f(*a);
}
My understanding is that reinterpret_casts to and from char* are compliant because the standard permits aliasing with char and unsigned char pointers (emphasis mine):
If a program attempts to access the stored value of an object through an lvalue of other than one of the following types the behavior is undefined:
the dynamic type of the object,
a cv-qualified version of the dynamic type of the object,
a type that is the signed or unsigned type corresponding to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union),
a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
a char or unsigned char type.
However, I am not sure whether f(*a) invokes undefined behavior by creating an A& reference from the invalid pointer. The deciding factor seems to be what the phrase "attempts to access" means in the context of the C++ standard.
My intuition is that this does not constitute an access, since an access would require A to be defined (it is declared, but not defined, in this example). Unfortunately, I cannot find a concrete definition of "access" in the C++ standard.
Does f(*a) invoke undefined behavior? What constitutes "access" in the C++ standard?
I understand that, regardless of the answer, it is likely a bad idea to rely on this behavior in production code. I am asking this question primarily out of a desire to improve my understanding of the language.
[Edit] @SergeyA cited this section of the standard. I've included it here for easy reference (emphasis mine):
5.3.1/1 [expr.unary.op]
The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points. If the type of the expression is “pointer to T,” the type of the result is “T.” [Note: indirection through a pointer to an incomplete type (other than cv void) is valid. The lvalue thus obtained can be used in limited ways (to initialize a reference, for example); this lvalue must not be converted to a prvalue, see 4.1. — end note ]
Tracing the reference to 4.1, we find:
4.1/1 [conv.lval]
A glvalue (3.10) of a non-function, non-array type T can be converted to a prvalue. If T is an incomplete type, a program that necessitates this conversion is ill-formed. If T is a non-class type, the type of the prvalue is the cv-unqualified version of T. Otherwise, the type of the prvalue is T.
When an lvalue-to-rvalue conversion is applied to an expression e, and either:
e is not potentially evaluated, or
the evaluation of e results in the evaluation of a member ex of the set of potential results of e, and ex names a variable x that is not odr-used by ex (3.2)
the value contained in the referenced object is not accessed.
I think our answer lies in whether *a satisfies the second bullet point. I am having trouble parsing that condition, so I am not sure.
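The note in [expr.unary.op] quoted above can be illustrated with a minimal sketch. It assumes a hypothetical class `Node` that is only declared while the two helper functions are compiled: dereferencing a `Node*` to bind a `Node&`, and taking the address of that lvalue, never needs the definition because no lvalue-to-rvalue conversion takes place. It does not settle the question's own case, where no `A` object exists at all.

```cpp
#include <cassert>

class Node;  // hypothetical type, still incomplete at this point

// Indirection through a pointer to an incomplete type: the resulting lvalue
// is only used to initialize a reference, so no value is read.
Node& deref(Node* p) { return *p; }

// The built-in & applied to an lvalue of incomplete class type is also allowed.
Node* address_of(Node& r) { return &r; }

// Node only becomes complete here; the functions above never needed it.
class Node {
public:
    int value = 0;
};
```

A caller with a real `Node n;` can write `address_of(deref(&n))` and get `&n` back.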
char* x = reinterpret_cast<char*>(&f); is valid. Or, more specifically, access through x is allowed - the cast itself is always valid.
A* a = reinterpret_cast<A*>(new char[5]) is not valid - or, to be precise, access through a will trigger undefined behaviour.
The reason for this is that while it's OK to access an object through a char*, it's not OK to access an array of chars through a pointer to some unrelated type. The standard allows the first, but not the second.
Or, in layman's terms, you can alias a type* through char*, but you can't alias char* through type*.
EDIT
I just noticed I didn't answer the direct question ("What constitutes "access" in the C++ standard?"). Apparently, the standard does not define access (at least, I was not able to find a formal definition), but dereferencing the pointer is commonly understood to qualify as an access.
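To make the direction of the rule concrete, here is a minimal sketch with a hypothetical complete type `Widget` (the question's `A` is deliberately incomplete). Reading any object's bytes through `unsigned char` is the permitted direction. To treat raw character storage as a `Widget`, the sketch creates a real `Widget` there with placement new instead of reinterpret_casting the buffer. The helper names `count_zero_bytes` and `construct_in` are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

struct Widget { int value; };  // hypothetical complete type

// Allowed direction: inspect any object's bytes through unsigned char.
template <class T>
std::size_t count_zero_bytes(const T& obj) {
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&obj);
    std::size_t zeros = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i)
        if (bytes[i] == 0) ++zeros;
    return zeros;
}

// Instead of reinterpret_cast<Widget*>(storage), start the lifetime of a real
// Widget in the (suitably aligned) storage and use the returned pointer.
Widget* construct_in(unsigned char* storage, int v) {
    return ::new (static_cast<void*>(storage)) Widget{v};
}
```

A caller would declare `alignas(Widget) unsigned char storage[sizeof(Widget)];` and then work only through the pointer returned by `construct_in`.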
This question has probably been asked many times, but I still cannot find a well-reasoned answer. Consider the following piece of code:
struct A {virtual int vfunc() = 0;};
struct B {virtual ~B() {}};
struct C {void *cdata;};
//...
struct Z{};
struct Parent:
public A,
virtual B,
private C,
//...
protected Z
{
int data;
virtual ~Parent(){}
virtual int vfunc() {return 0;} // implements A::vfunc interface
virtual void pvfunc() {};
double func() {return 0.0;}
//...etc
};
struct Child:
public Parent
{
virtual ~Child(){}
int more_data;
virtual int vfunc() {return 0;} // reimplements A::vfunc interface
virtual void pvfunc() {};// implements Parent::pvfunc interface
};
template<class T>
struct Wrapper: public T
{
// do nothing, just empty
};
int main()
{
Child ch;
Wrapper<Child> &wr = reinterpret_cast<Wrapper<Child>&/**/>(ch);
wr.data = 100;
wr.more_data = 200;
wr.vfunc();
//some more usage of wr...
Parent pr = wr;
pr.data == wr.data; // true?
//...
return 0;
}
Basically, this casts to a reference to the dummy derived class Wrapper and uses members of its ancestor classes.
The question is: is this code valid according to the standard? If not, what exactly does it violate?
PS: Please do not provide answers like "this is wrong on so many levels omg" and similar. I need exact quotes from the standard proving the point.
I surely hope this is something you are doing as an academic exercise. Please do not ever write any real code that resembles any of this in any way. I can't possibly point out all the issues with this snippet of code as there are issues with just about everything in here.
However, to answer the real question: this is completely undefined behavior. In C++17, it is section 8.2.10 [expr.reinterpret.cast]. Use the phrase in the brackets to find the relevant section in previous standards.
EDIT: I thought a succinct answer would suffice, but more details have been requested. I will not mention the other code issues, because they would just muddy the water.
There are several key issues here. Let's focus on the reinterpret_cast.
Child ch;
Wrapper<Child> &wr = reinterpret_cast<Wrapper<Child>&/**/>(ch);
Most of the wording in the spec uses pointers, so based on 8.2.10/11, we will change the example code slightly to this.
Child ch;
Wrapper<Child> *wr = reinterpret_cast<Wrapper<Child>*>(&ch);
Here is the quoted part of the standard for this justification.
A glvalue expression of type T1 can be cast to the type “reference to T2” if an expression of type “pointer to T1” can be explicitly converted to the type “pointer to T2” using a reinterpret_cast. The result refers to the same object as the source glvalue, but with the specified type. [ Note: That is, for lvalues, a reference cast reinterpret_cast<T&>(x) has the same effect as the conversion *reinterpret_cast<T*>(&x) with the built-in & and * operators (and similarly for reinterpret_cast<T&&>(x)). — end note ] No temporary is created, no copy is made, and constructors (15.1) or conversion functions (15.3) are not called.
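The note's equivalence can be checked with a short sketch. The function name is hypothetical, and `unsigned char` is used as the target type so that the resulting lvalues would also be safe to read under the aliasing rule.

```cpp
#include <cassert>

// reinterpret_cast<T&>(x) and *reinterpret_cast<T*>(&x) designate the same
// storage; here T is unsigned char, which may alias any object.
bool reference_and_pointer_casts_agree(int& x) {
    unsigned char& via_ref = reinterpret_cast<unsigned char&>(x);
    unsigned char& via_ptr = *reinterpret_cast<unsigned char*>(&x);
    return &via_ref == &via_ptr;
}
```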
One subtle part of the standard is 6.9.2/4, which allows certain special cases of treating a pointer to one object as if it were pointing to an object of a different type:
Two objects a and b are pointer-interconvertible if:
(4.1) — they are the same object, or
(4.2) - one is a standard-layout union object and the other is a non-static data member of that object (12.3), or
(4.3) — one is a standard-layout class object and the other is the first non-static data member of that object, or, if the object has no non-static data members, the first base class subobject of that object (12.2), or
(4.4) — there exists an object c such that a and c are pointer-interconvertible, and c and b are pointer-interconvertible.
If two objects are pointer-interconvertible, then they have the same address, and it is possible to obtain a pointer to one from a pointer to the other via a reinterpret_cast (8.2.10). [ Note: An array object and its first element are not pointer-interconvertible, even though they have the same address. — end note ]
However, your case does not meet these criteria, so we can't use this exception to treat a pointer to Child as if it were a pointer to Wrapper<Child>.
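A minimal sketch of why clause (4.3) cannot help here, using hypothetical types `Plain` and `Polymorphic`: the clause only applies to standard-layout classes, and a class with virtual functions (like `Child`) is never standard-layout. For `Plain`, by contrast, the object and its first member do share an address.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical types, not taken from the question.
struct Plain { int first; double second; };
struct Polymorphic { virtual ~Polymorphic() = default; int first = 0; };

// Clause (4.3) requires a standard-layout class.
static_assert(std::is_standard_layout<Plain>::value,
              "Plain qualifies for (4.3)");
// Virtual functions rule out standard layout, so (4.3) never applies.
static_assert(!std::is_standard_layout<Polymorphic>::value,
              "Polymorphic does not qualify for (4.3)");
```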
We will ignore the stuff about reinterpret_cast that does not deal with casting between two pointer types, since this case just deals with pointer types.
Note the last sentence of 8.2.10/1:
Conversions that can be performed explicitly using reinterpret_cast are listed below. No other conversion can be performed explicitly using reinterpret_cast.
There are 10 paragraphs that follow.
Paragraph 2 says reinterpret_cast can't cast away constness. Not our concern.
Paragraph 3 says that the result may or may not produce a different representation.
Paragraphs 4 and 5 are about casting between pointers and integral types.
Paragraph 6 is about casting function pointers.
Paragraph 8 is about converting between function pointers and object pointers.
Paragraph 9 is about converting null pointer values.
Paragraph 10 is about converting between member pointers.
Paragraph 11 is quoted above and basically says that casting references is akin to casting pointers.
That leaves paragraph 7, which states:
An object pointer can be explicitly converted to an object pointer of a different type. When a prvalue v of object pointer type is converted to the object pointer type “pointer to cv T”, the result is static_cast<cv T*>(static_cast<cv void*>(v)). [ Note: Converting a prvalue of type “pointer to T1” to the type “pointer to T2” (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value. — end note ]
This means that we can cast back and forth between those two pointer types all day long. However, that's all we can safely do. You are doing more than that, and yes, there are a few exceptions that allow for some other things.
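The one thing paragraph 7 does allow can be sketched with hypothetical, simplified stand-ins for the question's `Child` and `Wrapper`: form the `Wrapper<Child>*` and convert it straight back. The sketch assumes, as holds on common implementations, that `Wrapper<Child>` has no stricter alignment than `Child`, which is the condition the note relies on.

```cpp
#include <cassert>

// Simplified, hypothetical stand-ins for the question's types.
struct Child { int data = 0; };
template <class T>
struct Wrapper : T {};

// Forming the Wrapper<Child>* is fine; dereferencing it would not be.
// Converting it back yields the original pointer value.
Child* round_trip(Child* p) {
    Wrapper<Child>* w = reinterpret_cast<Wrapper<Child>*>(p);
    return reinterpret_cast<Child*>(w);
}
```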
Here is 6.10/8:
If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
(8.1) — the dynamic type of the object,
(8.2) — a cv-qualified version of the dynamic type of the object,
(8.3) — a type similar (as defined in 7.5) to the dynamic type of the object,
(8.4) — a type that is the signed or unsigned type corresponding to the dynamic type of the object,
(8.5) — a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
(8.6) — an aggregate or union type that includes one of the aforementioned types among its elements or non-static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
(8.7) — a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
(8.8) — a char, unsigned char, or std::byte type.
Your case does not satisfy any of those.
In your case, you are taking a pointer to one type and forcing the compiler to pretend that it points to a different type. It does not matter how similar the two look to you. Did you know that a fully standard-conforming compiler does not have to put the data for a derived class after the data for its base class? Those details are NOT part of the C++ standard, but part of the ABI your compiler implements.
In fact, other than carrying a pointer around and then casting it back to its original type, there are very few uses of reinterpret_cast that do not elicit undefined behavior.
As stated in another answer, this discussion relates to section
8.2.10 [expr.reinterpret.cast] of the C++17 standard.
Paragraph 11 of this section explains that for references to
objects we can apply the same reasoning as for pointers to
objects.
Wrapper<Child> &wr = reinterpret_cast<Wrapper<Child>&/**/>(ch);
or
Wrapper<Child> *wr = reinterpret_cast<Wrapper<Child>*/**/>(&ch);
Paragraph 7 of this section explains that for pointers to objects
reinterpret_cast can be seen as two static_casts in sequence
(through void *).
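The equivalence described in paragraph 7 can be checked directly. This sketch uses `int*` to `unsigned char*` rather than the question's types, and the two helper names are hypothetical.

```cpp
#include <cassert>

// reinterpret_cast between object pointer types...
unsigned char* via_reinterpret(int* p) {
    return reinterpret_cast<unsigned char*>(p);
}

// ...is specified as this pair of static_casts through void*.
unsigned char* via_void(int* p) {
    return static_cast<unsigned char*>(static_cast<void*>(p));
}
```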
In the specific case of this question, the type Wrapper<Child>
actually inherits from Child, so a single static_cast should
be sufficient (no need for two static_cast, nor reinterpret_cast).
So if reinterpret_cast can be seen here as the combination of a
useless static_cast through void * and a correct static_cast,
it should be considered equivalent to this correct static_cast.
Hmm...
On second thought, I think I'm totally wrong!
(the static_cast is incorrect, I have read it the wrong way)
If we had
Wrapper<Child> wc=...
Child *pc=&wc;
Wrapper<Child> *pwc=static_cast<Wrapper<Child>*>(pc);
the static_cast (then the reinterpret_cast) would be correct
because it goes back to the original type.
But in your example the original type was not
Wrapper<Child> but Child.
Even if it is very unlikely, nothing forbids the compiler from
adding some hidden data members to Wrapper<Child>.
Wrapper<Child> is not an empty structure: it participates
in a hierarchy with dynamic polymorphism, and the compiler
may use any implementation strategy under the hood.
So, after reinterpret_cast, it becomes undefined behavior because
the address stored in the pointer (or reference) will point to
some bytes with the layout of Child but the following code
will use these bytes with the layout of Wrapper<Child>
which may be different.
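The distinction drawn above can be sketched with hypothetical, simplified versions of the types. The downcast is only valid when the `Child` really is the base subobject of a `Wrapper<Child>`; because `Child` is polymorphic here, `dynamic_cast` can check that at run time instead of assuming it.

```cpp
#include <cassert>

// Simplified, hypothetical versions of the question's types.
struct Child {
    virtual ~Child() = default;
    int data = 0;
};
template <class T>
struct Wrapper : T {};

// Valid only when c is actually the Child base of a Wrapper<Child>;
// applying it to a plain Child is undefined behavior.
int read_data_as_wrapper(Child& c) {
    return static_cast<Wrapper<Child>&>(c).data;
}

// Child is polymorphic, so dynamic_cast can verify the dynamic type first.
bool is_really_wrapper(Child& c) {
    return dynamic_cast<Wrapper<Child>*>(&c) != nullptr;
}
```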
Let B be derived from class A. From reading various posts, I've got the impression that a cast like
const std::shared_ptr<const A> a(new B());
const std::shared_ptr<const B>& b = reinterpret_cast<const std::shared_ptr<const B>&>(a);
is for some reason discouraged and that one should use reinterpret_pointer_cast instead. However, I would like to avoid creating a new shared_ptr for performance reasons. Is the above code legal? Does it lead to undefined behavior? It seems to work in gcc and in Visual Studio.
You want static_pointer_cast.
const std::shared_ptr<const A> a(new B());
const std::shared_ptr<const B> b = std::static_pointer_cast<const B>(a);
I highly doubt the above will cause any performance issues. But if you have evidence that a shared_ptr creates a performance problem, then fall back to the raw pointer:
const B* pB = static_cast<const B*>(a.get());
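A self-contained sketch of both suggestions, assuming a minimal hypothetical hierarchy in which `B` derives from a polymorphic `A`. `std::static_pointer_cast` returns a new `shared_ptr` that shares the original control block (in typical implementations the cost is a reference-count increment), while the raw-pointer version creates no `shared_ptr` at all and relies on `a` staying alive.

```cpp
#include <cassert>
#include <memory>

// Hypothetical hierarchy matching "B derived from A".
struct A { virtual ~A() = default; };
struct B : A { int value = 42; };

// Shares ownership with `a`; no new control block is allocated.
std::shared_ptr<const B> as_b(const std::shared_ptr<const A>& a) {
    return std::static_pointer_cast<const B>(a);
}

// Creates no shared_ptr; the caller must keep `a` alive while using it.
const B* as_b_raw(const std::shared_ptr<const A>& a) {
    return static_cast<const B*>(a.get());
}
```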
Another hint. Please try to avoid reinterpret_cast between classes with an inheritance relationship. In cases where there are virtual methods and/or multiple inheritance, the static_cast will correctly adjust the pointer offset to the correct vtable or base offset. But reinterpret_cast will not. (Or technically: undefined behavior)
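The pointer adjustment mentioned above can be seen with a hypothetical hierarchy that has two non-empty bases. `static_cast` yields the address of the `Right` subobject, which cannot coincide with the `Left` subobject. `reinterpret_cast<const Right*>(&b)` would instead keep `b`'s address unchanged (on common ABIs that is where `Left` lives), and reading through it would be undefined behavior.

```cpp
#include <cassert>

// Hypothetical multiple-inheritance hierarchy with two non-empty bases.
struct Left { int l = 1; };
struct Right { int r = 2; };
struct Both : Left, Right {};

// static_cast knows the class relationship and adjusts the address
// so that the result points at the Right subobject.
const Right* to_right(const Both* b) {
    return static_cast<const Right*>(b);
}
```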
reinterpret_cast will usually lead to UB. Sometimes you are willing to take the risk of using it for performance reasons, but you should avoid this kind of thing as much as you can. In this case, it is better for you to use static_pointer_cast.
Note that even if you don't know which other cast you could use here and you are willing to take the risk with reinterpret_cast, you must add validation before and after the cast; otherwise you may end up with a lot of errors and spend a lot of time debugging them.
First, you create an object a of type const std::shared_ptr<const A> and initialize it with a pointer to some type B. This only works if a B* can be converted to an A*, so there must be a relationship such as inheritance. Setting that aside, you convert an object of one type to a reference to another type with reinterpret_cast:
A glvalue expression of type T1 can be cast to the type “reference to
T2” if an expression of type “pointer to T1” can be explicitly
converted to the type “pointer to T2” using a reinterpret_cast.
The result refers to the same object as the source glvalue, but with
the specified type. [ Note: That is, for lvalues, a reference cast
reinterpret_cast<T&>(x) has the same effect as the conversion
*reinterpret_cast<T*>(&x) with the built-in & and * operators (and similarly for reinterpret_cast<T&&>(x)). —end note ]
For pointers, reinterpret_cast boils down to conversion to void* and then to the target type:
An object pointer can be explicitly converted to an object pointer of
a different type. When a prvalue v of object pointer type is
converted to the object pointer type “pointer to cv T”, the result is
static_cast<cv T*>(static_cast<cv void*>(v)).
The semantics of the two static casts are defined as:
A prvalue of type “pointer to cv1 void” can be converted to a prvalue
of type “pointer to cv2 T,” where T is an object type and cv2 is the
same cv-qualification as, or greater cv-qualification than, cv1. The
null pointer value is converted to the null pointer value of the
destination type. If the original pointer value represents the address
A of a byte in memory and A satisfies the alignment requirement of T,
then the resulting pointer value represents the same address as the
original pointer value, that is, A. The result of any other such
pointer conversion is unspecified.
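The alignment precondition in the quoted rule can be expressed as a small check. This is a sketch with a hypothetical helper `is_aligned_for`; converting a pointer to `std::uintptr_t` is implementation-defined, so the modulo test is only meaningful on platforms where that integer is the plain byte address.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: does p satisfy the alignment requirement of T?
// Assumes reinterpret_cast<std::uintptr_t> yields the byte address
// (implementation-defined, but true on mainstream flat-memory platforms).
template <class T>
bool is_aligned_for(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}
```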
The platform I am working on has near and far pointers which are 16 or 32 bit. In that case, the types shared_ptr<A> and shared_ptr<B> are of different size and alignment, and casting one into the other is then unspecified behavior. If alignment matches, the result of the static casts is defined.
However, the first clause about reinterpret_cast to a reference also contains a note:
[ Note: That is, for lvalues, a reference
cast reinterpret_cast<T&>(x) has the same effect as the conversion *reinterpret_cast<T*>(&x) with
the built-in & and * operators (and similarly for reinterpret_cast<T&&>(x)). —end note ]
So basically, the cast is semantically identical to a pointer conversion followed immediately by a dereference. Even if the pointers are of identical size (and compatible alignment), using the cast pointer will violate the strict aliasing rule, since dereferencing is an access:
If a program attempts to access the stored value of an object
through a glvalue of other than one of the
following types the behavior is undefined:
— the dynamic type of the object,
— a cv-qualified version of the dynamic type of the object,
— a type similar (as defined in 4.4) to the dynamic type of the object,
— a type that is the signed or unsigned type corresponding to the dynamic type of the object,
— a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
— an aggregate or union type that includes one of the aforementioned types among its elements or nonstatic data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
Using the functionality of a shared_ptr, or of any other standard class or template, is only defined when you call its functions (including member functions) with objects of the types they are declared to take (including the implicit this argument):
Nothing in the standard defines what happens when you call a standard function expecting a Foo and passing a Bar, for any two standard types Foo and Bar (or even for user types.)
That is not defined; it is undefined, because it fails the most basic precondition: passing arguments of the correct type.
Sample code:
struct S { int x; };
int func()
{
S s{2};
return (int &)s; // Equivalent to *reinterpret_cast<int *>(&s)
}
I believe this is common and considered acceptable. The standard does guarantee that there is no initial padding in the struct. However this case is not listed in the strict aliasing rule (C++17 [basic.lval]/11):
If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
(11.1) the dynamic type of the object,
(11.2) a cv-qualified version of the dynamic type of the object,
(11.3) a type similar (as defined in 7.5) to the dynamic type of the object,
(11.4) a type that is the signed or unsigned type corresponding to the dynamic type of the object,
(11.5) a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
(11.6) an aggregate or union type that includes one of the aforementioned types among its elements or non-static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
(11.7) a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
(11.8) a char, unsigned char, or std::byte type.
It seems clear that the object s is having its stored value accessed.
The types listed in the bullet points are the type of the glvalue doing the access, not the type of the object being accessed. In this code the glvalue type is int which is not an aggregate or union type, ruling out 11.6.
My question is: Is this code correct, and if so, under which of the above bullet points is it allowed?
The behaviour of the cast comes down to [expr.static.cast]/13:
A prvalue of type “pointer to cv1 void” can be converted to a prvalue of type “pointer to cv2 T”, where T is an object type and cv2 is the same cv-qualification as, or greater cv-qualification than, cv1. If the original pointer value represents the address A of a byte in memory and A does not satisfy the alignment requirement of T, then the resulting pointer value is unspecified. Otherwise, if the original pointer value points to an object a, and there is an object b of type T (ignoring cv-qualification) that is pointer-interconvertible with a, the result is a pointer to b. Otherwise, the pointer value is unchanged by the conversion.
The definition of pointer-interconvertible is:
Two objects a and b are pointer-interconvertible if:
they are the same object, or
one is a union object and the other is a non-static data member of that object, or
one is a standard-layout class object and the other is the first non-static data member of that object, or, if the object has no non-static data members, the first base class subobject of that object, or
there exists an object c such that a and c are pointer-interconvertible, and c and b are pointer-interconvertible.
So in the original code, s and s.x are pointer-interconvertible, and it follows that (int &)s actually designates s.x.
So, in the strict aliasing rule, the object whose stored value is being accessed is s.x and not s and so there is no problem, the code is correct.
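The conclusion can be checked with a short sketch built on the question's `S`. The helper names are hypothetical; both the read and the write go through an `int` glvalue that designates `s.x`, which is an `int` object, so bullet (11.1) applies.

```cpp
#include <cassert>

struct S { int x; };

// s and s.x are pointer-interconvertible, so the cast designates s.x.
int read_x(S& s) { return (int&)s; }  // the question's form

// The same rule applied to a write through a reinterpret_cast pointer.
void write_x(S& s, int v) { *reinterpret_cast<int*>(&s) = v; }
```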
I think it's in expr.reinterpret.cast#11
A glvalue expression of type T1, designating an object x, can be cast
to the type “reference to T2” if an expression of type “pointer to T1”
can be explicitly converted to the type “pointer to T2” using a
reinterpret_cast. The result is that of *reinterpret_cast<T2 *>(p)
where p is a pointer to x of type “pointer to T1”. No temporary is
created, no copy is made, and no constructors or
conversion functions are called [1].
[1] This is sometimes referred to as a type pun when the result refers to the same object as the source glvalue
Supporting @M.M's answer about pointer-interconvertibility:
From cppreference:
Assuming that alignment requirements are met, a reinterpret_cast does
not change the value of a pointer outside of a few limited cases
dealing with pointer-interconvertible objects:
struct S { int a; } s;
int* p = reinterpret_cast<int*>(&s); // value of p is "pointer to s.a" because s.a
// and s are pointer-interconvertible
*p = 2; // s.a is also 2
versus
struct S { int a; };
S s{2};
int i = (int &)s; // Equivalent to *reinterpret_cast<int *>(&s)
// i is a copy; changing i doesn't change s.a
The cited rule is derived from a similar rule in C89, which would be nonsensical as written unless one stretches the meaning of the word "by", or recognizes what "Undefined Behavior" meant when C89 was written. Given something like struct S {unsigned dat[10];}s;, the statement s.dat[1]++; would clearly modify the stored value of s, but the only lvalue of type struct S in that expression is used solely for the purpose of producing a value of type unsigned*. The only lvalue which is used to modify any object is of type unsigned.
As I see it, there are two related ways of resolving this issue: (1) recognizing that the authors of the Standard wanted to allow cases where an lvalue of one type was visibly derived from one of another type, but didn't want to get hung up on details of what forms of visible derivation must be accounted for, especially since the range of cases compilers would need to recognize would vary considerably based upon the styles of optimization they performed and the tasks for which they were being used; (2) recognizing that the authors of the Standard had no reason to think it should matter whether the Standard actually required that a particular construct be processed usefully, if it would have been clear to everyone that there was no reason to do otherwise.
I don't think there has been consensus among the Committee members over whether a compiler given something like:
struct foo {int ct; int *dat;} it;
void test(void)
{
for (int i=0; i < it.ct; i++)
it.dat[i] = 0;
}
should be required to ensure that e.g. after it.ct = 1234; it.dat = &it.ct;, a call to test(); would zero it.ct and have no other effect. Parts of the Rationale would suggest that at least some committee members would have expected so, but the omission of any rule that would allow for an object of structure type to be accessed using an arbitrary lvalue of member type suggests otherwise. The C Standard has never really resolved this issue, and the C++ Standard cleans things up somewhat but doesn't really solve it either.
3.10/10
If a program attempts to access the stored value of an object through
a glvalue of other than one of the following types the behavior is
undefined:
the dynamic type of the object,
a cv-qualified version of the dynamic type of the object,
a type similar (as defined in 4.4) to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to the dynamic type of the object,
a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
an aggregate or union type that includes one of the aforementioned types among its elements or nonstatic data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
a char or unsigned char type.
Readers should be aware that the paragraph cited by the OP is something of a religious issue, as it is the basis of a disputed behavior of the g++ compiler. Any answer sowing doubt on the accuracy or completeness of this paragraph (and it's neither) will generally get downvoted on SO.
Here's an example of UB according to the paragraph you're citing:
struct X { int i; };
auto main() -> int
{
X o{ 0 };
return reinterpret_cast<int&>( o );
}
Considering each possibility in C++11 §3.10/10 in order:
Is “the dynamic type of the object” o an int?
No, the dynamic type is an X.
Is int perhaps “a cv-qualified version” of the dynamic type X?
No, X is not an int, cv-qualified or not.
Is int “a type similar (as defined in 4.4) to the dynamic type of the object”?
Again, no. 4.4 deals with multi-level cv-qualification.
Well, is int “a type that is the signed or unsigned type corresponding to the dynamic type of the object”?
No, there are no signed or unsigned versions of a class type like X.
So what about “a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object”?
No.
Well, is int perhaps “an aggregate or union type that includes one of the aforementioned types among its elements or nonstatic data members (including, recursively, an element or non-static data member of a subaggregate or contained union)”?
No, not that either.
So maybe int is “a type that is a (possibly cv-qualified) base class type of the dynamic type of the object”?
No, an int can’t be a base class.
Finally, is int “a char or unsigned char type”?
No.
And this exhausts all possibilities, proving that according to that paragraph in isolation, this code has Undefined Behavior.
However, this code is guaranteed to work by another part of the standard (I guess mainly for C compatibility).
So the paragraph you cite isn't 100% correct, even from a purely formal, platform-independent point of view.
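The other guarantee alluded to is presumably the C++11 rule in [class.mem] (9.2/20): a pointer to a standard-layout struct, suitably converted with reinterpret_cast, points to its initial member. A minimal sketch of that property for the answer's `X`; the helper `initial_member` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

struct X { int i; };

// X is standard-layout, so there is no padding before its first member...
static_assert(std::is_standard_layout<X>::value, "X is standard-layout");
static_assert(offsetof(X, i) == 0, "no padding before the initial member");

// ...and a pointer to X, converted with reinterpret_cast, points to that member.
int* initial_member(X* p) { return reinterpret_cast<int*>(p); }
```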
Edit: "dyp" asked in a comment how this relates to use of an xvalue. An xvalue is a glvalue, so one can just substitute an xvalue for the lvalue expression o. An example of such xvalue is an rvalue reference returned from a function, e.g. from std::move:
#include <utility>
using std::move;
struct X { int i; };
template< class T >
auto ref( T&& r ) -> T& { return r; }
auto main() -> int
{
X o{ 0 };
return reinterpret_cast<int&>( ref( move( o ) ) );
}
All this does, however, is mask the essentials.
Skimming through the standard draft (n3242), I found this sentence in Clause 9.2 (emphasis mine):
Non-static (9.4) data members shall not have incomplete types. In
particular, a class C shall not contain a non-static member of class
C, but it can contain a pointer or reference to an object of class
C.
From this I argue that it is fine to define a class like this:
class A {
public:
A(A& a) : a_(a){
}
private:
A& a_;
};
Then in clause 8.3.2 I found the following:
A reference shall be initialized to refer to a valid object or
function
Question 1: Is it permitted to define an object of this type passing its name as a reference:
A a(a);
or will this trigger undefined behavior?
Question 2: If yes, what are the parts of the standard that permit the initialization of the reference from a still-to-be-constructed object?
Question 3: If no, does this mean the definition of class A is well formed but no first object can be created without triggering UB? In this case what is the rationale behind this?
"valid object" is not defined anywhere in the standard, but it is intended to mean a region of memory with appropriate size and alignment that can contain an object of the specified type. It just means to exclude references to such things as dereferenced null pointers, misaligned regions of memory, etc. An uninitialised object is valid.
There is an open issue to clear up the wording, CWG 453.
n3337 § 3.8/6
Similarly, before the lifetime of an object has started but after the
storage which the object will occupy has been allocated or, after the
lifetime of an object has ended and before the storage which the
object occupied is reused or released, any glvalue that refers to the
original object may be used but only in limited ways. For an object
under construction or destruction, see 12.7. Otherwise, such a glvalue
refers to allocated storage (3.7.4.2), and using the properties of the
glvalue that do not depend on its value is well-defined. The program
has undefined behavior if:
— an lvalue-to-rvalue conversion (4.1) is
applied to such a glvalue,
— the glvalue is used to access a
non-static data member or call a non-static member function of the
object, or
— the glvalue is implicitly converted (4.10) to a reference
to a base class type, or
— the glvalue is used as the operand of a
static_cast (5.2.9) except when the conversion is ultimately to cv
char& or cv unsigned char&, or
— the glvalue is used as the operand of
a dynamic_cast (5.2.7) or as the operand of typeid.
So, to answer your questions:
Question 1: Is it permitted to define an object of this type passing
its name as a reference?
Yes. Using just the address does not seem to violate this (at least for a variable on the stack).
A a(a);
or will this trigger undefined behavior?
No.
Question 2: If yes, what are the parts of the standard that permit the
initialization of the reference from a still-to-be-constructed object?
§ 3.8/6 (above)
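A minimal sketch of Question 1 using the question's class, with a hypothetical accessor `referenced()` added only so the binding can be observed. The constructor merely binds `a_` to `a`, which uses the address of `a` and none of its value, in line with the answer's reading of § 3.8/6.

```cpp
#include <cassert>

// The question's class, plus a hypothetical accessor for checking the binding.
class A {
public:
    A(A& a) : a_(a) {}               // binds the reference; reads nothing from a
    A& referenced() { return a_; }
private:
    A& a_;
};
```

With this definition, `A a(a);` creates an object whose member refers to itself.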
The only question that remains is how this corresponds to
A reference shall be initialized to refer to a valid object or
function.
The problem lies in the term "valid object", because § 8.3.2/4 says that
It is unspecified whether or not a reference requires storage
so it seems that § 8.3.2 is problematic and should be reworded. The confusion led to the change proposed in the document C++ Standard Core Language Active Issues, Revision 87, dated 20.01.2014:
A reference shall be initialized to refer to an object or function.
Change 8.3.2 [dcl.ref] paragraph 4 as follows:
If an lvalue to which a reference is directly bound designates neither
an existing object or function of an appropriate type (8.5.3
[dcl.init.ref]), nor a region of memory of suitable size and alignment
to contain an object of the reference's type (1.8 [intro.object], 3.8
[basic.life], 3.9 [basic.types]), the behavior is undefined.
From n1905, 3.3.1.1
The point of declaration for a name is immediately after its complete
declarator (clause 8 ) and before its initializer (if any), except as
noted below.
[ Example:
int x = 12;
{ int x = x; }
Here the second x
is initialized with its own (indeterminate) value.
—end example ]
(Emphasis mine.) Correct me if I am wrong, but in your example
A a(a);
is equivalent to
A a = a; // Copy initialization
So, according to the standard, a is initialized with its own indeterminate value, and the member holds a reference to that not-yet-initialized object.