c++ access static members using null pointer - c++

Recently tried the following program and it compiles, runs fine and produces expected output instead of any runtime error.
#include <iostream>
class demo
{
public:
static void fun()
{
std::cout<<"fun() is called\n";
}
static int a;
};
int demo::a=9;
int main()
{
demo* d=nullptr;
d->fun();
std::cout<<d->a;
return 0;
}
If an uninitialized pointer is used to access class and/or struct members behaviour is undefined, but why it is allowed to access static members using null pointers also. Is there any harm in my program?

TL;DR: Your example is well-defined. Merely dereferencing a null pointer is not invoking UB.
There is a lot of debate over this topic, which basically boils down to whether indirection through a null pointer is itself UB.
The only questionable thing that happens in your example is the evaluation of the object expression. In particular, d->a is equivalent to (*d).a according to [expr.ref]/2:
The expression E1->E2 is converted to the equivalent form
(*(E1)).E2; the remainder of 5.2.5 will address only the first
option (dot).
*d is just evaluated:
The postfix expression before the dot or arrow is evaluated;65 the
result of that evaluation, together with the id-expression, determines
the result of the entire postfix expression.
65) If the class member access expression is evaluated, the subexpression evaluation happens even if the result is unnecessary
to determine the value of the entire postfix expression, for example if the id-expression denotes a static member.
Let's extract the critical part of the code. Consider the expression statement
*d;
In this statement, *d is a discarded value expression according to [stmt.expr]. So *d is solely evaluated1, just as in d->a.
Hence if *d; is valid, or in other words the evaluation of the expression *d, so is your example.
Does indirection through null pointers inherently result in undefined behavior?
There is the open CWG issue #232, created over fifteen years ago, which concerns this exact question. A very important argument is raised. The report starts with
At least a couple of places in the IS state that indirection through a
null pointer produces undefined behavior: 1.9 [intro.execution]
paragraph 4 gives "dereferencing the null pointer" as an example of
undefined behavior, and 8.3.2 [dcl.ref] paragraph 4 (in a note) uses
this supposedly undefined behavior as justification for the
nonexistence of "null references."
Note that the example mentioned was changed to cover modifications of const objects instead, and the note in [dcl.ref] - while still existing - is not normative. The normative passage was removed to avoid commitment.
However, 5.3.1 [expr.unary.op] paragraph 1, which describes the unary
"*" operator, does not say that the behavior is undefined if the
operand is a null pointer, as one might expect. Furthermore, at least
one passage gives dereferencing a null pointer well-defined behavior:
5.2.8 [expr.typeid] paragraph 2 says
If the lvalue expression is obtained by applying the unary * operator
to a pointer and the pointer is a null pointer value (4.10
[conv.ptr]), the typeid expression throws the bad_typeid exception
(18.7.3 [bad.typeid]).
This is inconsistent and should be cleaned up.
The last point is especially important. The quote in [expr.typeid] still exists and appertains to glvalues of polymorphic class type, which is the case in the following example:
int main() try {
// Polymorphic type
class A
{
virtual ~A(){}
};
typeid( *((A*)0) );
}
catch (std::bad_typeid)
{
std::cerr << "bad_exception\n";
}
The behavior of this program is well-defined (an exception will be thrown and catched), and the expression *((A*)0) is evaluated as it isn't part of an unevaluated operand. Now if indirection through null pointers induced UB, then the expression written as
*((A*)0);
would be doing just that, inducing UB, which seems nonsensical when compared to the typeid scenario. If the above expression is merely evaluated as every discarded-value expression is1, where is the crucial difference that makes the evaluation in the second snippet UB? There is no existing implementation that analyzes the typeid-operand, finds the innermost, corresponding dereference and surrounds its operand with a check - there would be a performance loss, too.
A note in that issue then ends the short discussion with:
We agreed that the approach in the standard seems okay: p = 0; *p;
is not inherently an error. An lvalue-to-rvalue conversion would give
it undefined behavior.
I.e. the committee agreed upon this. Although the proposed resolution of this report, which introduced so-called "empty lvalues", was never adopted…
However, “not modifiable” is a compile-time concept, while in fact
this deals with runtime values and thus should produce undefined
behavior instead. Also, there are other contexts in which lvalues can
occur, such as the left operand of . or .*, which should also be
restricted. Additional drafting is required.
…that does not affect the rationale. Then again, it should be noted that this issue even precedes C++03, which makes it less convincing while we approach C++17.
CWG-issue #315 seems to cover your case as well:
Another instance to consider is that of invoking a member function
from a null pointer:
struct A { void f () { } };
int main ()
{
A* ap = 0;
ap->f ();
}
[…]
Rationale (October 2003):
We agreed the example should be allowed. p->f() is rewritten as
(*p).f() according to 5.2.5 [expr.ref]. *p is not an error when
p is null unless the lvalue is converted to an rvalue (4.1
[conv.lval]), which it isn't here.
According to this rationale, indirection through a null pointer per se does not invoke UB without further lvalue-to-rvalue conversions (=accesses to stored value), reference bindings, value computations or the like. (Nota bene: Calling a non-static member function with a null pointer should invoke UB, albeit merely hazily disallowed by [class.mfct.non-static]/2. The rationale is outdated in this respect.)
I.e. a mere evaluation of *d does not suffice to invoke UB. The identity of the object is not required, and neither is its previously stored value. On the other hand, e.g.
*p = 123;
is undefined since there is a value computation of the left operand, [expr.ass]/1:
In all cases, the assignment is sequenced after the value computation
of the right and left operands
Because the left operand is expected to be a glvalue, the identity of the object referred to by that glvalue must be determined as mentioned by the definition of evaluation of an expression in [intro.execution]/12, which is impossible (and thus leads to UB).
1 [expr]/11:
In some contexts, an expression only appears for its side effects.
Such an expression is called a discarded-value expression. The
expression is evaluated and its value is discarded. […]. The lvalue-to-rvalue conversion (4.1) is
applied if and only if the expression is a glvalue of
volatile-qualified type and […]

From the C++ Draft Standard N3337:
9.4 Static members
2 A static member s of class X may be referred to using the qualified-id expression X::s; it is not necessary to use the class member access syntax (5.2.5) to refer to a static member. A static member may be referred
to using the class member access syntax, in which case the object expression is evaluated.
And in the section about object expression...
5.2.5 Class member access
4 If E2 is declared to have type “reference to T,” then E1.E2 is an lvalue; the type of E1.E2 is T. Otherwise,
one of the following rules applies.
— If E2 is a static data member and the type of E2 is T, then E1.E2 is an lvalue; the expression designates the named member of the class. The type of E1.E2 is T.
Based on the last paragraph of the standard, the expressions:
d->fun();
std::cout << d->a;
work because they both designate the named member of the class regardless of the value of d.

runs fine and produces expected output instead of any runtime error.
That's a basic assumption error. What you are doing is undefined behavior, which means that your claim for any kind of "expected output" is faulty.
Addendum: Note that, while there is a CWG defect (#315) report that is closed as "in agreement" of not making the above UB, it relies on the positive closing of another CWG defect (#232) that is still active, and hence none of it is added to the standard.
Let me quote a part of a comment from James McNellis to an answer to a similar Stack Overflow question:
I don't think CWG defect 315 is as "closed" as its presence on the "closed issues" page implies. The rationale says that it should be allowed because "*p is not an error when p is null unless the lvalue is converted to an rvalue." However, that relies on the concept of an "empty lvalue," which is part of the proposed resolution to CWG defect 232, but which has not been adopted.

The expressions d->fun and d->a() both cause evaluation of *d ([expr.ref]/2).
The complete definition of the unary * operator from [expr.unary.op]/1 is:
The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points.
For the expression d there is no "object or function to which the expression points" . Therefore this paragraph does not define the behaviour of *d.
Hence the code is undefined by omission, since the behaviour of evaluating *d is not defined anywhere in the Standard.

What you are seeing here is what I would consider an ill-conceived and unfortunate design choice in the specification of the C++ language and many other languages that belong to the same general family of programming languages.
These languages allow you to refer to static members of a class using a reference to an instance of the class. The actual value of the instance reference is of course ignored, since no instance is required to access static members.
So, in d->fun(); the the compiler uses the d pointer only during compilation to figure out that you are referring to a member of the demo class, and then it ignores it. No code is emitted by the compiler to dereference the pointer, so the fact that it is going to be NULL during runtime does not matter.
So, what you see happening is in perfect accordance to the specification of the language, and in my opinion the specification suffers in this respect, because it allows an illogical thing to happen: to use an instance reference to refer to a static member.
P.S. Most compilers in most languages are actually capable of issuing warnings for that kind of stuff. I do not know about your compiler, but you might want to check, because the fact that you received no warning for doing what you did might mean that you do not have enough warnings enabled.

Related

Is reading a variable outside its lifetime during constant evaluation diagnosable?

Shall one expect a reliable failure of in constant evaluation if it reads a variable outside of its lifetime?
For example:
constexpr bool g() {
int * p = nullptr;
{
int c = 0;
p = &c;
}
return *p == 0;
};
int main() {
static_assert( g() );
}
Here Clang stops with the error
read of object outside its lifetime is not allowed in a constant expression
But GCC accepts the program silently (Demo).
Are both compilers within their rights, or GCC must fail the compilation as well?
GCC dropped the ball.
[expr.const]
5 An expression E is a core constant expression unless the
evaluation of E, following the rules of the abstract machine
([intro.execution]), would evaluate one of the following:
...
an operation that would have undefined behavior as specified in [intro] through [cpp];
...
Indirection via dangling pointer has undefined behavior.
[basic.stc.general]
4 When the end of the duration of a region of storage is reached, the values of all pointers representing the address of any part of that region of storage become invalid pointer values. Indirection through an invalid pointer value and passing an invalid pointer value to a deallocation function have undefined behavior. Any other use of an invalid pointer value has implementation-defined behavior.
So the invocation of g() may not be a constant expression, and may not appear in the condition of a static_assert which must be constant evaluated.
The program is ill-formed.
The above quotes are from the C++20 standard draft, but C++17 has them too.
Shall one expect a reliable failure of in constant evaluation if it reads a variable outside of its lifetime?
Yes, but your example doesn't necessarily do this. Its behavior is implementation-defined.
When the block with the variable c exits ([basic.stc.auto]/1), the value of p becomes an invalid pointer value ([basic.stc.general]/4).
When *p is evaluated, the lvalue-to-rvalue conversion ([conv.lval]) is applied to p. And [conv.lval]/3 says:
The result of the conversion is determined according to the following rules:
...
— Otherwise, if the object to which the glvalue refers contains an invalid pointer value, the behavior is implementation-defined.
So.
Are both compilers within their rights, or GCC must fail the compilation as well?
AFAIK neither of the implementations define its behavior here, but I think it could theoretically be defined in such a way that neither the conversion nor the rest of the evaluation would make g() not a constant expression.

Is it undefined behavior to dereference nullptr like this? [duplicate]

First of all, I've seen this question about C99 and the accepted answer references operand is not evaluated wording in the C99 Standard draft. I'm not sure this answer applies to C++03. There's also this question about C++ that has an accepted answer citing similar wording and also In some contexts, unevaluated operands appear. An unevaluated operand is not evaluated. wording.
I have this code:
int* ptr = 0;
void* buffer = malloc( 10 * sizeof( *ptr ) );
The question is - is there a null pointer dereference (and so UB) inside sizeof()?
C++03 5.3.3/1 says The sizeof operator yields the number of bytes in the object representation of its operand. The operand is either an expression, which is not evaluated, or a parenthesized type-id.
The linked to answers cite this or similar wording and make use of "is not evaluated" part to deduce there's no UB.
However I cannot find where exactly the Standard links evaluation to having or not having UB in this case.
Does "not evaluating" the expression to which sizeof is applied make it legal to dereference a null or invalid pointer inside sizeof in C++?
I believe this is currently underspecified in the standard, like many issues such as What is the value category of the operands of C++ operators when unspecified?. I don't think it was intentional, like hvd points outs it is probably obvious to the committee.
In this specific case I think we have the evidence to show what the intention was. From GB 91 comment from the Rapperswil meeting which says:
It is mildly distasteful to dereference a null pointer as part of our specification, as we are playing on the edges of undefined behaviour. With the addition of the declval function template, already used in these same expressions, this is no longer necessary.
and suggested an alternate expression, it refers to this expression which is no longer in the standard but can be found in N3090:
noexcept(*(U*)0 = declval<U>())
The suggestion was rejected since this does not invoke undefined behavior since it is unevaluated:
There is no undefined behavior because the expression is an unevaluated operand. It's not at all clear that the proposed change would be clearer.
This rationale applies to sizeof as well since it's operands are unevaluated.
I say underspecified but I wonder if this is covered by section 4.1 [conv.lval] which says:
The value contained in the object indicated by the lvalue is the rvalue result. When an lvalue-to-rvalue conversion occurs
within the operand of sizeof (5.3.3) the value contained in the referenced object is not accessed, since that operator
does not evaluate its operand.
It says the value contained is not accessed, which if we follow the logic of issue 232 means there is no undefined behavior:
In other words, it is only the act of "fetching", of lvalue-to-rvalue conversion, that triggers the ill-formed or undefined behavior
This is somewhat speculative since the issue is not settled yet.
Since you explicitly asked for standard references - [expr.sizeof]/1:
The operand is either an expression, which is an unevaluated operand
(Clause 5), or a parenthesized type-id.
[expr]/8:
In some contexts, unevaluated operands appear (5.2.8, 5.3.3, 5.3.7,
7.1.6.2). An unevaluated operand is not evaluated.
Because the expression (i.e. the dereferenciation) is never evaluated, this expression is not subject to some constraints that it would normally be violating. Solely the type is inspected. In fact, the standard uses null references itself in an example in [dcl.fct]/12:
A trailing-return-type is most useful for a type that would be more
complicated to specify before the
declarator-id:
template <class T, class U> auto add(T t, U u) -> decltype(t + u);
rather than
template <class T, class U> decltype((*(T*)0) + (*(U*)0)) add(T t, U u);
— end note ]
The specification only says that dereferencing some pointer that is NULL is UB. Since sizeof() is not a real function, and it doesn't actually use the arguments for anything other than getting the type, it never references the pointer. That's WHY it works. Someone else can get all the points for looking up the spec wording that states that "the argument to sizeof doesn't get referenced".
Note that it's also entirely legal to do int arr[2]; size_t s = sizeof(arr[-111100000]); too - it doesn't matter what the index is, because sizeof never actually "does anything" to the argument passed.
Another example to show how it's "not doing anything" would be something like this:
int func()
{
int *ptr = reinterpret_cast<int*>(32);
*ptr = 7;
return 42;
}
size_t size = sizeof(func());
Again, this wouldn't crash, because func() is just resolved by the compiler to the type that it produces.
Equally, if sizeof actually "does something" with the argument, what would happen when you do this:
char *buffer = new sizeof(char[10000000000]);
Would it create a 10000000000 stack allocation, then give the size back after it crashed the code because there isn't enough megabytes of stack? [In some systems, stack size is counted in bytes, not megabytes]. And whilst nobody writes code like that, you could easily come up with something similar using typedef of either buffer_type as an array of char, or some kind of struct with large content.

Accessing field of NULL pointer to a struct works if address-of is used [duplicate]

Code sample:
struct name
{
int a, b;
};
int main()
{
&(((struct name *)NULL)->b);
}
Does this cause undefined behaviour? We could debate whether it "dereferences null", however C11 doesn't define the term "dereference".
6.5.3.2/4 clearly says that using * on a null pointer causes undefined behaviour; however it doesn't say the same for -> and also it does not define a -> b as being (*a).b ; it has separate definitions for each operator.
The semantics of -> in 6.5.2.3/4 says:
A postfix expression followed by the -> operator and an identifier designates a member
of a structure or union object. The value is that of the named member of the object to
which the first expression points, and is an lvalue.
However, NULL does not point to an object, so the second sentence seems underspecified.
Also relevant might be 6.5.3.2/1:
Constraints:
The operand of the unary & operator shall be either a function designator, the result of a
[] or unary * operator, or an lvalue that designates an object that is not a bit-field and is
not declared with the register storage-class specifier.
However I feel that the bolded text is defective and should read lvalue that potentially designates an object , as per 6.3.2.1/1 (definition of lvalue) -- C99 messed up the definition of lvalue, so C11 had to rewrite it and perhaps this section got missed.
6.3.2.1/1 does say:
An lvalue is an expression (with an object type other than void) that potentially
designates an object; if an lvalue does not designate an object when it is evaluated, the
behavior is undefined
however the & operator does evaluate its operand. (It doesn't access the stored value but that is different).
This long chain of reasoning seems to suggest that the code causes UB however it is fairly tenuous and it's not clear to me what the writers of the Standard intended. If in fact they intended anything, rather than leaving it up to us to debate :)
From a lawyer point of view, the expression &(((struct name *)NULL)->b); should lead to UB, since you could not find a path in which there would be no UB. IMHO the root cause is that at a moment you apply the -> operator on an expression that does not point to an object.
From a compiler point of view, assuming the compiler programmer was not overcomplicated, it is clear that the expression returns the same value as offsetof(name, b) would, and I'm pretty sure that provided it is compiled without error any existing compiler will give that result.
As written, we could not blame a compiler that would note that in the inner part you use operator -> on an expression than cannot point to an object (since it is null) and issue a warning or an error.
My conclusion is that until there is a special paragraph saying that provided it is only to take its address it is legal do dereference a null pointer, this expression is not legal C.
Yes, this use of -> has undefined behavior in the direct sense of the English term undefined.
The behavior is only defined if the first expression points to an object and not defined (=undefined) otherwise. In general you shouldn't search more in the term undefined, it means just that: the standard doesn't provide a meaning for your code. (Sometimes it points explicitly to such situations that it doesn't define, but this doesn't change the general meaning of the term.)
This is a slackness that is introduced to help compiler builders to deal with things. They may defined a behavior, even for the code that you are presenting. In particular, for a compiler implementation it is perfectly fine to use such code or similar for the offsetof macro. Making this code a constraint violation would block that path for compiler implementations.
Let's start with the indirection operator *:
6.5.3.2 p4:
The unary * operator denotes indirection. If the operand points to a function, the result is
a function designator; if it points to an object, the result is an lvalue designating the
object. If the operand has type "pointer to type", the result has type "type". If an
invalid value has been assigned to the pointer, the behavior of the unary * operator is
undefined. 102)
*E, where E is a null pointer, is undefined behavior.
There is a footnote that states:
102) Thus, &*E is equivalent to E (even if E is a null pointer), and &(E1[E2]) to ((E1)+(E2)). It is
always true that if E is a function designator or an lvalue that is a valid operand of the unary &
operator, *&E is a function designator or an lvalue equal to E. If *P is an lvalue and T is the name of
an object pointer type, *(T)P is an lvalue that has a type compatible with that to which T points.
Which means that &*E, where E is NULL, is defined, but the question is whether the same is true for &(*E).m, where E is a null pointer and its type is a struct that has a member m?
C Standard doesn't define that behavior.
If it were defined, new problems would arise, one of which is listed below. C Standard is correct to keep it undefined, and provides a macro offsetof that handles the problem internally.
6.3.2.3 Pointers
An integer constant expression with the value 0, or such an expression cast to type
void *, is called a null pointer constant. 66) If a null pointer constant is converted to a
pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal
to a pointer to any object or function.
This means that an integer constant expression with the value 0 is converted to a null pointer constant.
But the value of a null pointer constant is not defined as 0. The value is implementation defined.
7.19 Common definitions
The macros are
NULL
which expands to an implementation-defined null pointer constant
This means C allows an implementation where the null pointer will have a value where all bits are set and using member access on that value will result in an overflow which is undefined behavior
Another problem is how do you evaluate &(*E).m? Do the brackets apply and is * evaluated first. Keeping it undefined solves this problem.
First, let's establish that we need a pointer to an object:
6.5.2.3 Structure and union members
4 A postfix expression followed by the -> operator and an identifier designates a member
of a structure or union object. The value is that of the named member of the object to
which the first expression points, and is an lvalue.96) If the first expression is a pointer to
a qualified type, the result has the so-qualified version of the type of the designated
member.
Unfortunately, no null pointer ever points to an object.
6.3.2.3 Pointers
3 An integer constant expression with the value 0, or such an expression cast to type
void *, is called a null pointer constant.66) If a null pointer constant is converted to a
pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal
to a pointer to any object or function.
Result: Undefined Behavior.
As a side-note, some other things to chew over:
6.3.2.3 Pointers
4 Conversion of a null pointer to another pointer type yields a null pointer of that type.
Any two null pointers shall compare equal.
5 An integer may be converted to any pointer type. Except as previously specified, the
result is implementation-defined, might not be correctly aligned, might not point to an
entity of the referenced type, and might be a trap representation.67)
6 Any pointer type may be converted to an integer type. Except as previously specified, the
result is implementation-defined. If the result cannot be represented in the integer type,
the behavior is undefined. The result need not be in the range of values of any integer
type.
67) The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to be consistent with the addressing structure of the execution environment.
So even if the UB should happen to be benign this time, it might still result in some totally unexpected number.
Nothing in the C standard would impose any requirements on what a system could do with the expression. It would, when the standard was written, have been perfectly reasonable for it to to cause the following sequence of events at runtime:
Code loads a null pointer into the addressing unit
Code asks the addressing unit to add the offset of field b.
The addressing unit trigger a trap when attempting to add an integer to a null pointer (which should for robustness be a run-time trap, even though many systems don't catch it)
The system starts executing essentially random code after being dispatched through a trap vector that was never set because code to set it would have wasted been a waste of memory, as addressing traps shouldn't occur.
The very essence of what Undefined Behavior meant at the time.
Note that most of the compilers that have appeared since the early days of C would regard the address of a member of an object located at a constant address as being a compile-time constant, but I don't think such behavior was mandated then, nor has anything been added to the standard which would mandate that compile-time address calculations involving null pointers be defined in cases where run-time calculations would not.
No. Let's take this apart:
&(((struct name *)NULL)->b);
is the same as:
struct name * ptr = NULL;
&(ptr->b);
The first line is obviously valid and well defined.
In the second line, we calculate the address of a field relative to the address 0x0 which is perfectly legal as well. The Amiga, for example, had the pointer to the kernel in the address 0x4. So you could use a method like this to call kernel functions.
In fact, the same approach is used on the C macro offsetof (wikipedia):
#define offsetof(st, m) ((size_t)(&((st *)0)->m))
So the confusion here revolves around the fact that NULL pointers are scary. But from a compiler and standard point of view, the expression is legal in C (C++ is a different beast since you can overload the & operator).

What is the significance of special language in standard for lvalue-to-rvalue conversions for unsigned character types of indeterminate value

In the C++14 standard (n3797), the section on lvalue to rvalue conversions reads as follows (emphasis mine):
4.1 Lvalue-to-rvalue-conversion [conv.lval]
A glvalue (3.10) of a non-function, non-array type T can be converted to a prvalue. If T is an incomplete type, a program that necessitates this conversion is ill-formed. If T is a non-class type, the type of the prvalue is the cv-unqualified version of T. Otherwise the type of the prvalue is T.
When an lvalue-to-rvalue conversion occurs in an unevaluated operand
or a subexpression thereof (Clause 5) the value contained in the
referenced object is not accessed. In all other cases, the result of the
conversion is determined according to the following rules:
If T is a (possibly cv-qualified) std::nullptr_t then the result is a null pointer constant.
Otherwise, if T has class type, the conversion copy-initializes a temporary of type T from the glvalue and the result of the conversion is a prvalue for the temporary.
Otherwise, if the object to which the glvalue refers contains an invalid pointer value, the behavior is implementation-defined.
Otherwise, if T is a (possibly cv-qualified) unsigned character type, and the object to which the glvalue refers contains an indeterminate value, and that object does not have automatic storage duration or the glvalue was the operand of a unary & operator or it was bound to a reference, the result is an unspecified value.
Otherwise, if the object to which the glvalue refers has an indeterminate value, the behavior is undefined.
Otherwise, the object indicated by the glvalue is the prvalue result.
[Note: See also 3.10]
What's the significance of this paragraph (in bold)?
If this paragraph were not here, then the situations in which it applies would lead to undefined behavior. Normally, I would expect that accessing an unsigned char value while it has an indeterminate value leads to undefined behavior. But, with this paragraph it means that
If I'm not actually accessing the character value, i.e. I'm immediately passing it to & or binding it to a reference, or
If the unsigned char does not have automatic storage duration,
then the conversion yields an unspecified value, and not undefined behavior.
Am I correct to conclude that this program:
#include <new>
#include <iostream>
// using T = int;
using T = unsigned char;
int main() {
T * array = new T[500];
for (int i = 0; i < 500; ++i) {
std::cout << static_cast<int>(array[i]) << std::endl;
}
delete[] array;
}
is well-defined by the standard, and must output a sequence of 500 unspecified ints, while the same program where T = int, would have undefined behavior?
IIUC, one of the reasons to make it UB to read things with indeterminate values, is to allow aggressive dead store elimination by the optimizer. So, this paragraph may mean that a conforming compiler can't do as much optimization when working with unsigned char or arrays of unsigned char.
Assuming I understand correctly, what is the rationale for this rule? When is it useful to be able to read unsigned char that have indeterminate values, and get unspecified results instead of UB? I have this feeling that if they put this much effort into crafting this part of the rule, they had some motivation to help certain code examples that they cared about, or to be consistent with some other part of the standard, or simplify some other issue. But I have no idea what that might be.
In many situations, code will write some parts of a PODS or array without writing everything, and then use functions like memcpy or fwrite to copy or write the entire thing without regard for which parts had assigned values and which did not. Although it is not terribly common for C++ code to use byte-based operations to copy or write out the contents of aggregates, the ability to do so is a fundamental part of the language. Requiring that a program write definite values to all portions of an object, including those nothing will ever "care" about, would needlessly impair efficiency.

Lvalues which do not designate objects in C++14

I'm using N3936 as a reference here (please correct this question if any of the C++14 text differs).
Under 3.10 Lvalues and rvalues we have:
Every expression belongs to exactly one of the fundamental classifications in this taxonomy: lvalue, xvalue, or prvalue.
However the definition of lvalue reads:
An lvalue [...] designates a function or an object.
In 4.1 Lvalue-to-rvalue conversion the text appears:
[...] In all other cases, the result of the conversion is determined according to the following rules:
[...]
Otherwise, the value contained in the object indicated by the glvalue is the prvalue result.
My question is: what happens in code where the lvalue does not designate an object? There are two canonical examples:
Example 1:
int *p = nullptr;
*p;
int &q = *p;
int a = *p;
Example 2:
int arr[4];
int *p = arr + 4;
*p;
int &q = *p;
std::sort(arr, &q);
Which lines (if any) are ill-formed and/or cause undefined behaviour?
Referring to Example 1: is *p an lvalue? According to my first quote it must be. However, my second quote excludes it since *p does not designate an object. (It's certainly not an xvalue or a prvalue either).
But if you interpret my second quote to mean that *p is actually an lvalue, then it is not covered at all by the lvalue-to-rvalue conversion rules. You may take the catch-all rule that "anything not defined by the Standard is undefined behaviour" but then you must permit null references to exist, so long as there is no lvalue-to-rvalue conversion performed.
History: This issue was raised in DR 232 . In C++11 the resolution from DR232 did in fact appear. Quoting from N3337 Lvalue-to-rvalue conversion:
If the object to which the glvalue refers is not an object of type T and is not an object of a type derived from T, or if the object is uninitialized, a program that necessitates this conversion has undefined behavior.
which still appears to permit null references to exist - it only clears up the issue of performing lvalue-to-rvalue conversion on one. Also discussed on this SO thread
The resolution from DR232 no longer appears in N3797 or N3936 though.
It isn't possible to create a reference to null or a reference to the off-the-end element of an array, because section 8.3.2 says (reading from draft n3936) that
A reference shall be initialized to refer to a valid object or function.
However, it is not clear that forming an expression with a value category of lvalue constitutes "initialization of a reference". Quite the contrary, in fact, temporary objects are objects, and references are not objects, so it cannot be said that *(a+n) initializes a temporary object of reference type.
I think the answer to this although probably not the answer you really want, is that this is under-specified or ill-specified and therefore we can not really say whether the examples you have provided are ill-formed or invoke undefined behavior according the current draft standard.
We can see this by looking DR 232 and DR 453.
DR 232 tells us that the standard conflicts on whether derferencing a null pointer is undefined behavior:
At least a couple of places in the IS state that indirection through a
null pointer produces undefined behavior: 1.9 [intro.execution]
paragraph 4 gives "dereferencing the null pointer" as an example of
undefined behavior, and 8.3.2 [dcl.ref] paragraph 4 (in a note) uses
this supposedly undefined behavior as justification for the
nonexistence of "null references."
However, 5.3.1 [expr.unary.op] paragraph 1, which describes the unary
"*" operator, does not say that the behavior is undefined if the
operand is a null pointer, as one might expect. Furthermore, at least
one passage gives dereferencing a null pointer well-defined behavior:
5.2.8 [expr.typeid] paragraph 2 says
and introduces the concept of an empty lvalue which is the result of indiretion on a null pointer or one past the end of an array:
if any. If the pointer is a null pointer value (4.10 [conv.ptr]) or
points one past the last element of an array object (5.7 [expr.add]),
the result is an empty lvalue and does not refer to any object or
function.
and proposes that the lvaue-to-rvalue conversion of such is undefined behavior.
and DR 453 tell us that we don't know what a valid object is:
What is a "valid" object? In particular the expression "valid object"
seems to exclude uninitialized objects, but the response to Core Issue
363 clearly says that's not the intent.
and suggests that binding a reference to an empty value is undefined behavior.
If an lvalue to which a reference is directly bound designates neither
an existing object or function of an appropriate type (8.5.3
[dcl.init.ref]), nor a region of memory of suitable size and alignment
to contain an object of the reference's type (1.8 [intro.object], 3.8
[basic.life], 3.9 [basic.types]), the behavior is undefined.
and includes the following examples in the proposal:
int& f(int&);
int& g();
extern int& ir3;
int* ip = 0;
int& ir1 = *ip; // undefined behavior: null pointer
int& ir2 = f(ir3); // undefined behavior: ir3 not yet initialized
int& ir3 = g();
int& ir4 = f(ir4); // ill-formed: ir4 used in its own initializer
So if we want to restrict ourselves to dealing only with the intent then I feel that DR 232 and DR 453 provide the information we need to say that the intention is that lvalue-to-rvalue conversion of a null pointer is undefined behavior and a reference to a null pointer or an indeterminate value is also undefined behavior.
Now although it has taken a while for both of these report resolutions to be sorted out, they are both active with relatively recent updates and apparently the committee so far does not disagree with the main premise that the defects reported are actual defects. So it follows without knowing these two items it would imply it is not possible to provide an answer to your question using the current draft standards.