What happens if you dereference `new int`? - c++

Is the following safe?
*(new int);
I get 0 as the output.

It’s undefined behaviour because you’re reading an object with an indeterminate value. The expression new int() value-initialises the object, which for int means zero-initialisation and guarantees a zero value, while new int (without parentheses) default-initialises it, which for int leaves an indeterminate value. This is effectively the same as saying:
int x; // not initialised
cout << x << '\n'; // undefined behaviour
In addition, since you immediately dereference the pointer to the object you just allocated and do not store the pointer anywhere, the object can never be deleted, so this constitutes a memory leak.
Note that the presence of such an expression does not necessarily make a program's behaviour undefined; this is a perfectly valid program, because it sets the value of the object before reading it:
int& x = *(new int); // x is an alias for a nameless new int of indeterminate value
x = 42;
cout << x << '\n';
delete &x;
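For comparison, here is a minimal sketch of the well-defined way to get a zero from a dynamically allocated int: both new int() and new int{} value-initialise it, which for int means zero-initialisation. The function names are made up for illustration, and the pointer is kept so it can be deleted.

```cpp
// Both forms value-initialize the int, which for a scalar means
// zero-initialization, so reading it afterwards is well-defined.
int value_initialized_paren() {
    int* p = new int();   // guaranteed to hold 0
    int v = *p;
    delete p;             // pointer was stored, so there is no leak
    return v;
}

int value_initialized_brace() {
    int* p = new int{};   // C++11 list-initialization, also guaranteed 0
    int v = *p;
    delete p;
    return v;
}
```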

This is undefined behavior (UB) since you are accessing an indeterminate value, and C++14 makes this explicit. We can see that a new-expression without an initializer default-initializes the object from the draft C++14 standard section 5.3.4 New paragraph 17, which says:
If the new-initializer is omitted, the object is default-initialized
(8.5). [ Note: If no initialization is performed, the object has an
indeterminate value. —end note ]
For int this means an indeterminate value, from section 8.5 paragraph 7, which says:
To default-initialize an object of type T means:
— if T is a (possibly cv-qualified) class type (Clause 9), the default constructor (12.1) for T is called (and
the initialization is ill-formed if T has no default constructor or overload resolution (13.3) results in an
ambiguity or in a function that is deleted or inaccessible from the context of the initialization);
— if T is an array type, each element is default-initialized;
— otherwise, no initialization is performed.
We can see from section 8.5 paragraph 12 that producing an indeterminate value is undefined behavior:
If no initializer is specified for an object, the object is
default-initialized. When storage for an object with automatic or
dynamic storage duration is obtained, the object has an indeterminate
value, and if no initialization is performed for the object, that
object retains an indeterminate value until that value is replaced
(5.17). [ Note: Objects with static or thread storage duration are
zero-initialized, see 3.6.2. — end note ]
If an indeterminate value is produced by an evaluation, the behavior is undefined except in the following cases
and all of the exceptions concern unsigned narrow character types, which int is not.
Jon brings up an interesting example:
int& x = *(new int);
It may not be immediately obvious why this is not undefined behavior. The key point to notice is that it is undefined behavior to produce an indeterminate value, but in this case no value is produced. We can see this by going to section 8.5.3 References, which covers initialization of references and says:
A reference to type “cv1 T1” is initialized by an expression of type “cv2 T2” as follows:
— If the reference is an lvalue reference and the initializer expression
— is an lvalue (but is not a bit-field), and “cv1 T1” is reference-compatible with “cv2 T2,” or
and goes on to say:
then the reference is bound to the initializer expression lvalue in
the first case [...][ Note: The usual lvalue-to-rvalue (4.1),
array-to-pointer (4.2), and function-to-pointer (4.3) standard
conversions are not needed, and therefore are suppressed, when such
direct bindings to lvalues are done. —end note ]
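To illustrate the point, here is a small sketch (the function name is hypothetical, and std::unique_ptr is only there to free the int) of operations on an lvalue with an indeterminate value that never produce that value: direct reference binding, taking the address with unary &, and an unevaluated sizeof operand.

```cpp
#include <cstddef>
#include <memory>

// None of these operations performs an lvalue-to-rvalue conversion on *owner,
// so none of them reads the indeterminate value.
bool no_value_is_read() {
    std::unique_ptr<int> owner(new int);  // *owner has an indeterminate value
    int& ref = *owner;                    // direct reference binding: no value produced
    int* addr = &*owner;                  // unary & yields the address, not the value
    std::size_t size = sizeof *owner;     // unevaluated operand: nothing is read
    return &ref == addr && size == sizeof(int);
}
```

Writing int y = *owner; instead would perform the conversion and be undefined behavior.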

It is possible that a computer has "trapping" values of int: invalid bit patterns, such as one whose checksum bit doesn't match its expected state, which raise a hardware exception when read.
In general, uninitialized values lead to undefined behavior. Initialize it first.
Otherwise, no, there's nothing wrong or really unusual about dereferencing a new-expression. Here is some odd, but entirely valid code using your construction:
int & ir = * ( new int ) = 0;
…
delete & ir;

First of all, Shafik Yaghmour gave references to the Standard in his answer. That is the best, most complete and authoritative answer. Nonetheless, let me try to give you specific examples that should illustrate the aforementioned points.
This code is safe, well-formed and meaningful:
int *p = new int; // i.e. this is a local variable (p) that points
// to a heap-allocated block
You must not, however, read the int through the pointer before it has been given a value, as that results in undefined behavior. I.e. you may get 0x00 or 0xFFFFFFFF, or the instruction pointer (the RIP register on x86-64) may jump to a random location. The program may crash.
int *p = new int;
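std::cout << *p; // Very bad. Undefined behavior.
Run-time checkers such as Valgrind (Memcheck) and MemorySanitizer (MSan) can catch the issue and flag it with a helpful error message. Note that AddressSanitizer (ASan) does not detect reads of uninitialized memory.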
It is, however, perfectly fine to initialize the memory block you had allocated:
int *p = new int;
*p = 0;
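If you don't need a raw new at all, std::make_unique (C++14) avoids both problems discussed here: std::make_unique<int>() value-initialises the int to zero, and the unique_ptr frees it. A minimal sketch, with hypothetical function names:

```cpp
#include <memory>

// make_unique<int>() value-initializes the int (to zero) and owns it,
// so there is neither an indeterminate read nor a leak.
int make_unique_zero() {
    std::unique_ptr<int> p = std::make_unique<int>();
    return *p;
}

// Passing an argument initializes the int to that value instead.
int make_unique_with_value() {
    auto p = std::make_unique<int>(5);
    return *p;
}
```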
Background info: the Standard leaves such objects uninitialized for performance reasons, as it would be prohibitively expensive to mandate zero-initialization of every automatic and dynamic object.
Note that, as per the Standard references, objects with static storage duration are zero-initialized, because there the initialization is cheap, so you can do the following:
// at the file scope
int global1; // zero-initialized
int global2 = 1; // explicitly initialized
void f()
{
std::cout << global1;
}
These objects typically go into the executable's .bss (global1) and .data (global2) sections and are initialized by the OS loader.
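A small sketch extending that example to a function-local static (the name read_static_local is hypothetical): objects with static storage duration are zero-initialised even without an initializer, so all of these reads are well-defined.

```cpp
// File-scope objects have static storage duration.
int global1;       // no initializer: zero-initialized
int global2 = 1;   // explicitly initialized

// A function-local static also has static storage duration,
// so it starts at zero and keeps its value between calls.
int read_static_local() {
    static int counter;
    return counter++;   // returns 0 on the first call, 1 on the second, ...
}
```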

Related

Can I legally reuse fields in aggregate struct initialization?

The program below compiles and runs without warnings with gcc:
#include <iostream>
struct A { int a, b; };
int main() {
A x = {.a = 42, .b = x.a}; // <-- b is initialized from x.a
std::cout << x.a << ' ' << x.b << std::endl;
}
42 42
x is essentially used during its initialization. This is very handy for cases where .a is initialized by a large expression.
Is this a legal expression in C++, and am I guaranteed to always get the right answer?
[dcl.init.aggr]
The initializations of the elements of the aggregate are evaluated in the element order.
That is, all value computations and side effects associated with a given element are sequenced before those of any element that follows it in order.
Thus, x.a has been initialised before initialisation of x.b. So far, so good.
[basic.life]
The lifetime of an object or reference is a runtime property of the object or reference. A variable is said to have vacuous initialization if it is default-initialized and, if it is of class type or a (possibly multi-dimensional) array thereof, that class type has a trivial default constructor. The lifetime of an object of type T begins when:
storage with the proper alignment and size for type T is obtained, and
its initialization (if any) is complete (including vacuous initialization) ([dcl.init]),
The lifetime of x.a has begun although the lifetime of x has not.
Similarly, before the lifetime of an object has started ... any glvalue that refers to the original object may be used but only in limited ways.
For an object under construction or destruction, see [class.cdtor].
Otherwise, such a glvalue refers to allocated storage ([basic.stc.dynamic.allocation]), and using the properties of the glvalue that do not depend on its value is well-defined.
The program has undefined behavior if:
the glvalue is used to access the object, or
...
"Access" is defined:
[defns.access]
⟨execution-time action⟩ read or modify the value of an object
Note 1: Only objects of scalar type can be accessed.
Reads of scalar objects are described in [conv.lval] and modifications of scalar objects are described in [expr.ass], [expr.post.incr], and [expr.pre.incr].
Attempts to read or modify an object of class type typically invoke a constructor or assignment operator; such invocations do not themselves constitute accesses, although they may involve accesses of scalar subobjects.
— end note]
According to this note, accessing x.a is not an access of the class object named by x. Thus, x.a having been initialised should be sufficient, and the example is well-defined.
Minor problem: Notes are not normative.
Edit: I removed quotes to rules that apply to objects "under/during construction" with the assumption that those do not apply to aggregates being initialised.
P.S. Perhaps for clarity, or just to spare readers of the code from worrying about its legality, consider using an intermediate variable:
int temp = 42;
A x = {.a = temp, .b = temp};
P.P.S. Designated initialisers were first introduced to standard C++ in C++20. They are not in C++17.
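If you'd rather not leave a named temporary in the enclosing scope, one alternative, sketched here with a hypothetical make_x function, is an immediately-invoked lambda that computes the value once and returns the fully built aggregate. It never reads x during its own initialisation and does not need designated initialisers, so it also works before C++20.

```cpp
struct A { int a, b; };

A make_x() {
    // The lambda computes the shared value once; x is only initialized
    // from the finished aggregate it returns.
    A x = [] {
        int value = 42;            // stand-in for a large expression
        return A{value, value};
    }();
    return x;
}
```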

What is the significance of special language in standard for lvalue-to-rvalue conversions for unsigned character types of indeterminate value

In the C++14 standard (n3797), the section on lvalue to rvalue conversions reads as follows (emphasis mine):
4.1 Lvalue-to-rvalue-conversion [conv.lval]
A glvalue (3.10) of a non-function, non-array type T can be converted to a prvalue. If T is an incomplete type, a program that necessitates this conversion is ill-formed. If T is a non-class type, the type of the prvalue is the cv-unqualified version of T. Otherwise the type of the prvalue is T.
When an lvalue-to-rvalue conversion occurs in an unevaluated operand
or a subexpression thereof (Clause 5) the value contained in the
referenced object is not accessed. In all other cases, the result of the
conversion is determined according to the following rules:
If T is a (possibly cv-qualified) std::nullptr_t then the result is a null pointer constant.
Otherwise, if T has class type, the conversion copy-initializes a temporary of type T from the glvalue and the result of the conversion is a prvalue for the temporary.
Otherwise, if the object to which the glvalue refers contains an invalid pointer value, the behavior is implementation-defined.
Otherwise, if T is a (possibly cv-qualified) unsigned character type, and the object to which the glvalue refers contains an indeterminate value, and that object does not have automatic storage duration or the glvalue was the operand of a unary & operator or it was bound to a reference, the result is an unspecified value.
Otherwise, if the object to which the glvalue refers has an indeterminate value, the behavior is undefined.
Otherwise, the object indicated by the glvalue is the prvalue result.
[Note: See also 3.10]
What's the significance of the paragraph about unsigned character types?
If this paragraph were not here, then the situations in which it applies would lead to undefined behavior. Normally, I would expect that accessing an unsigned char value while it has an indeterminate value leads to undefined behavior. But, with this paragraph it means that
If I'm not actually accessing the character value, i.e. I'm immediately passing it to & or binding it to a reference, or
If the unsigned char does not have automatic storage duration,
then the conversion yields an unspecified value, and not undefined behavior.
Am I correct to conclude that this program:
#include <new>
#include <iostream>
// using T = int;
using T = unsigned char;
int main() {
T * array = new T[500];
for (int i = 0; i < 500; ++i) {
std::cout << static_cast<int>(array[i]) << std::endl;
}
delete[] array;
}
is well-defined by the standard, and must output a sequence of 500 unspecified ints, while the same program where T = int, would have undefined behavior?
IIUC, one of the reasons to make it UB to read things with indeterminate values, is to allow aggressive dead store elimination by the optimizer. So, this paragraph may mean that a conforming compiler can't do as much optimization when working with unsigned char or arrays of unsigned char.
Assuming I understand correctly, what is the rationale for this rule? When is it useful to be able to read unsigned char that have indeterminate values, and get unspecified results instead of UB? I have this feeling that if they put this much effort into crafting this part of the rule, they had some motivation to help certain code examples that they cared about, or to be consistent with some other part of the standard, or simplify some other issue. But I have no idea what that might be.
In many situations, code will write some parts of a POD struct or array without writing everything, and then use functions like memcpy or fwrite to copy or write the entire thing without regard for which parts had assigned values and which did not. Although it is not terribly common for C++ code to use byte-based operations to copy or write out the contents of aggregates, the ability to do so is a fundamental part of the language. Requiring that a program write definite values to all portions of an object, including those nothing will ever "care" about, would needlessly impair efficiency.
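A minimal sketch of that pattern, assuming a hypothetical Record struct: only part of the object is assigned, the whole object is copied byte-wise with std::memcpy, and afterwards only the assigned parts are read.

```cpp
#include <cstring>

struct Record {
    int id;
    char name[16];
};

bool copy_preserves_assigned_parts() {
    Record r;                       // all members start indeterminate
    r.id = 7;
    std::strcpy(r.name, "abc");     // writes name[0..3]; name[4..15] stay unassigned
    Record out;
    std::memcpy(&out, &r, sizeof r);  // copies every byte, assigned or not
    // Only the parts that were assigned are read back.
    return out.id == 7 && std::strcmp(out.name, "abc") == 0;
}
```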

initialize array, placement new, read variables, defined behavior?

Given a class whose only member is a char[10], that has no inheritance nor virtual members, and that has a constructor that does not mention the array in any way (so that the array gets default-initialization, i.e. no initialization), like so:
class in_place_string {
public:
char data[10];
static struct pre_initialized_type {} pre_initialized;
in_place_string(pre_initialized_type) {} //This is the constructor in question
in_place_string() :data() {} //this is so you don't yell at me, not relevant
};
in_place_string::pre_initialized_type in_place_string::pre_initialized; // definition of the static member
Is it defined behavior to placement-new this class into a buffer that already has data, and then read from the array member?
int main() {
char buffer[sizeof(in_place_string)] = "HI!";
in_place_string* str = new(buffer) in_place_string(in_place_string::pre_initialized);
std::cout << str->data; //undefined behavior?
}
I'm pretty sure it's not well defined, so I'm asking if this is implementation defined or undefined behavior.
You're not performing a reinterpret_cast (which wouldn't be safe, since the class has non-trivial initialization); you're creating a new object whose member is uninitialized.
Performing lvalue->rvalue conversion on an uninitialized object gives an indeterminate value and undefined behavior. So is the object uninitialized?
According to 5.3.4, all objects created by a new-expression have dynamic storage duration. There's no exception for placement new:
Entities created by a new-expression have dynamic storage duration
And then 8.5 says
If no initializer is specified for an object, the object is default-initialized. When storage for an object with automatic or dynamic storage duration is obtained, the object has an indeterminate value, and if no initialization is performed for the object, that object retains an indeterminate value until that value is replaced (5.17). [ Note: Objects with static or thread storage duration are zero-initialized, see 3.6.2. —end note ] If an indeterminate value is produced by an evaluation, the behavior is undefined except in the following cases:
and the following cases permit only unsigned narrow character types, and even then the resulting value is not useful.
In your case the new object has dynamic storage duration (!) and its members for which no initialization is performed have indeterminate value. Reading them gives undefined behavior.
I think the relevant clause is 8.5 [dcl.init] paragraph 12:
If no initializer is specified for an object, the object is default-initialized. When storage for an object with automatic or dynamic storage duration is obtained, the object has an indeterminate value, and if no initialization is performed for the object, that object retains an indeterminate value until that value is replaced (5.17). [ Note: Objects with static or thread storage duration are zero-initialized, see 3.6.2. —end note ] If an indeterminate value is produced by an evaluation, the behavior is undefined except in the following cases:
If an indeterminate value of unsigned narrow character type (3.9.1) is produced by the evaluation of:
the second or third operand of a conditional expression (5.16),
the right operand of a comma expression (5.18),
the operand of a cast or conversion to an unsigned narrow character type (4.7, 5.2.3, 5.2.9, 5.4), or
a discarded-value expression (Clause 5), then the result of the operation is an indeterminate value.
If an indeterminate value of unsigned narrow character type is produced by the evaluation of the right operand of a simple assignment operator (5.17) whose first operand is an lvalue of unsigned narrow character type, an indeterminate value replaces the value of the object referred to by the left operand.
If an indeterminate value of unsigned narrow character type is produced by the evaluation of the initialization expression when initializing an object of unsigned narrow character type, that object is initialized to an indeterminate value.
I don't think any of the exceptions apply. Since the array member is read after the object is constructed but before it has been given a value, I think the code results in undefined behavior.
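If the goal is for the new object to start out with particular bytes, one well-defined alternative is to pass them to the constructor explicitly rather than relying on whatever was already in the storage. This is a sketch with a hypothetical copied_string class and function name; the source bytes live in a separate array so the copy never overlaps the object being constructed.

```cpp
#include <cstring>
#include <new>

// Hypothetical variant of in_place_string: the constructor copies the bytes
// the object should start with, so data is never read while indeterminate.
class copied_string {
public:
    explicit copied_string(const char* src) {
        std::memcpy(data, src, sizeof data);
    }
    const char* c_str() const { return data; }
private:
    char data[10];
};

bool placement_new_with_copy() {
    char initial[sizeof(copied_string)] = "HI!";  // remaining bytes are zero-filled
    alignas(copied_string) unsigned char storage[sizeof(copied_string)];
    copied_string* str = new (storage) copied_string(initial);
    bool ok = std::strcmp(str->c_str(), "HI!") == 0;
    str->~copied_string();  // trivial here, but pairs with the placement new
    return ok;
}
```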

Lvalues which do not designate objects in C++14

I'm using N3936 as a reference here (please correct this question if any of the C++14 text differs).
Under 3.10 Lvalues and rvalues we have:
Every expression belongs to exactly one of the fundamental classifications in this taxonomy: lvalue, xvalue, or prvalue.
However the definition of lvalue reads:
An lvalue [...] designates a function or an object.
In 4.1 Lvalue-to-rvalue conversion the text appears:
[...] In all other cases, the result of the conversion is determined according to the following rules:
[...]
Otherwise, the value contained in the object indicated by the glvalue is the prvalue result.
My question is: what happens in code where the lvalue does not designate an object? There are two canonical examples:
Example 1:
int *p = nullptr;
*p;
int &q = *p;
int a = *p;
Example 2:
int arr[4];
int *p = arr + 4;
*p;
int &q = *p;
std::sort(arr, &q);
Which lines (if any) are ill-formed and/or cause undefined behaviour?
Referring to Example 1: is *p an lvalue? According to my first quote it must be. However, my second quote excludes it since *p does not designate an object. (It's certainly not an xvalue or a prvalue either).
But if you interpret my second quote to mean that *p is actually an lvalue, then it is not covered at all by the lvalue-to-rvalue conversion rules. You may take the catch-all rule that "anything not defined by the Standard is undefined behaviour" but then you must permit null references to exist, so long as there is no lvalue-to-rvalue conversion performed.
History: This issue was raised in DR 232. In C++11 the resolution from DR 232 did in fact appear. Quoting from N3337 Lvalue-to-rvalue conversion:
If the object to which the glvalue refers is not an object of type T and is not an object of a type derived from T, or if the object is uninitialized, a program that necessitates this conversion has undefined behavior.
which still appears to permit null references to exist - it only clears up the issue of performing lvalue-to-rvalue conversion on one. Also discussed on this SO thread
The resolution from DR232 no longer appears in N3797 or N3936 though.
It isn't possible to create a reference to null or a reference to the off-the-end element of an array, because section 8.3.2 says (reading from draft n3936) that
A reference shall be initialized to refer to a valid object or function.
However, it is not clear that forming an expression with a value category of lvalue constitutes "initialization of a reference". Quite the contrary, in fact, temporary objects are objects, and references are not objects, so it cannot be said that *(a+n) initializes a temporary object of reference type.
I think the answer to this, although probably not the answer you really want, is that this is under-specified or ill-specified, and therefore we cannot really say whether the examples you have provided are ill-formed or invoke undefined behavior according to the current draft standard.
We can see this by looking at DR 232 and DR 453.
DR 232 tells us that the standard conflicts on whether dereferencing a null pointer is undefined behavior:
At least a couple of places in the IS state that indirection through a
null pointer produces undefined behavior: 1.9 [intro.execution]
paragraph 4 gives "dereferencing the null pointer" as an example of
undefined behavior, and 8.3.2 [dcl.ref] paragraph 4 (in a note) uses
this supposedly undefined behavior as justification for the
nonexistence of "null references."
However, 5.3.1 [expr.unary.op] paragraph 1, which describes the unary
"*" operator, does not say that the behavior is undefined if the
operand is a null pointer, as one might expect. Furthermore, at least
one passage gives dereferencing a null pointer well-defined behavior:
5.2.8 [expr.typeid] paragraph 2 says
and introduces the concept of an empty lvalue, which is the result of indirection through a null pointer or through a pointer one past the end of an array:
if any. If the pointer is a null pointer value (4.10 [conv.ptr]) or
points one past the last element of an array object (5.7 [expr.add]),
the result is an empty lvalue and does not refer to any object or
function.
and proposes that the lvalue-to-rvalue conversion of such an lvalue is undefined behavior.
and DR 453 tells us that we don't know what a valid object is:
What is a "valid" object? In particular the expression "valid object"
seems to exclude uninitialized objects, but the response to Core Issue
363 clearly says that's not the intent.
and suggests that binding a reference to an empty lvalue is undefined behavior:
If an lvalue to which a reference is directly bound designates neither
an existing object or function of an appropriate type (8.5.3
[dcl.init.ref]), nor a region of memory of suitable size and alignment
to contain an object of the reference's type (1.8 [intro.object], 3.8
[basic.life], 3.9 [basic.types]), the behavior is undefined.
and includes the following examples in the proposal:
int& f(int&);
int& g();
extern int& ir3;
int* ip = 0;
int& ir1 = *ip; // undefined behavior: null pointer
int& ir2 = f(ir3); // undefined behavior: ir3 not yet initialized
int& ir3 = g();
int& ir4 = f(ir4); // ill-formed: ir4 used in its own initializer
So if we restrict ourselves to the intent, I feel that DR 232 and DR 453 provide the information we need to say that the intention is that lvalue-to-rvalue conversion of an empty lvalue (such as the result of indirection through a null pointer) is undefined behavior, and that binding a reference to such an lvalue, or to a reference that has not yet been initialized, is also undefined behavior.
Although it has taken a while for both of these defect reports to be resolved, they are both active with relatively recent updates, and the committee apparently does not so far disagree with the main premise that the defects reported are actual defects. It follows that, without the resolutions of these two reports, it is not possible to answer your question using the current draft standard.
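For Example 2 specifically, the question can be sidestepped: a one-past-the-end pointer may be formed, compared and passed as an end iterator without ever being dereferenced, so no reference to a non-object is needed. A minimal sketch (the function name is hypothetical):

```cpp
#include <algorithm>

bool sort_without_end_reference() {
    int arr[4] = {3, 1, 4, 2};
    int* end = arr + 4;        // valid one-past-the-end pointer, never dereferenced
    std::sort(arr, end);       // used only as an iterator bound
    return std::is_sorted(arr, end) && arr[0] == 1 && arr[3] == 4;
}
```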

Has C++ standard changed with respect to the use of indeterminate values and undefined behavior in C++14?

As covered in Does initialization entail lvalue-to-rvalue conversion? Is int x = x; UB? the C++ standard has a surprising example in section 3.3.2 Point of declaration in which an int is initialized with its own indeterminate value:
int x = 12;
{ int x = x; }
Here the second x is initialized with its own (indeterminate) value.
— end example ]
Johannes' answer to that question indicates this is undefined behavior, since it requires an lvalue-to-rvalue conversion.
In the latest C++14 draft standard N3936 which can be found here this example has changed to:
unsigned char x = 12;
{ unsigned char x = x; }
Here the second x is initialized with its own (indeterminate) value.
— end example ]
Has something changed in C++14 with respect to indeterminate values and undefined behavior that has driven this change in the example?
Yes, this change was driven by changes in the language which makes it undefined behavior if an indeterminate value is produced by an evaluation but with some exceptions for unsigned narrow characters.
Defect report 1787, whose proposed text can be found in N3914 (see footnote 1), was accepted in 2014 and is incorporated in the latest working draft N3936:
The most interesting change with respect to indeterminate values would be to section 8.5 paragraph 12 which goes from:
If no initializer is specified for an object, the object is default-initialized; if no initialization is performed, an object with automatic or dynamic storage duration has indeterminate value. [ Note: Objects with static or thread storage duration are zero-initialized, see 3.6.2. — end note ]
to (emphasis mine):
If no initializer is specified for an object, the object is
default-initialized. When storage for an object with automatic or
dynamic storage duration is obtained, the object has an indeterminate
value, and if no initialization is performed for the object, that
object retains an indeterminate value until that value is replaced
(5.17 [expr.ass]). [Note: Objects with static or thread storage
duration are zero-initialized, see 3.6.2 [basic.start.init]. —end
note] If an indeterminate value is produced by an evaluation, the
behavior is undefined except in the following cases:
If an indeterminate value of unsigned narrow character type (3.9.1 [basic.fundamental]) is produced by the evaluation of:
the second or third operand of a conditional expression (5.16 [expr.cond]),
the right operand of a comma (5.18 [expr.comma]),
the operand of a cast or conversion to an unsigned narrow character type (4.7 [conv.integral], 5.2.3 [expr.type.conv], 5.2.9
[expr.static.cast], 5.4 [expr.cast]), or
a discarded-value expression (Clause 5 [expr]),
then the result of the operation is an indeterminate value.
If an indeterminate value of unsigned narrow character type (3.9.1 [basic.fundamental]) is produced by the evaluation of the right
operand of a simple assignment operator (5.17 [expr.ass]) whose first
operand is an lvalue of unsigned narrow character type, an
indeterminate value replaces the value of the object referred to by
the left operand.
If an indeterminate value of unsigned narrow character type (3.9.1 [basic.fundamental]) is produced by the evaluation of the
initialization expression when initializing an object of unsigned
narrow character type, that object is initialized to an indeterminate
value.
and included the following example:
[ Example:
int f(bool b) {
unsigned char c;
unsigned char d = c; // OK, d has an indeterminate value
int e = d; // undefined behavior
return b ? d : 0; // undefined behavior if b is true
}
— end example ]
We can find this text in N3936 which is the current working draft and N3937 is the C++14 DIS.
Prior to C++1y
It is interesting to note that prior to this draft, unlike C, which has always had a well-specified notion of which uses of indeterminate values are undefined, C++ used the term indeterminate value without even defining it (assuming we cannot borrow the definition from C99); see also defect report 616. We had to rely on the underspecified lvalue-to-rvalue conversion, which in the draft C++11 standard is covered in section 4.1 Lvalue-to-rvalue conversion paragraph 1, which says:
[...]if the object is uninitialized, a program that necessitates this conversion has undefined behavior.[...]
Footnotes:
1. Defect report 1787 is a revision of defect report 616; we can find that information in N3903.