[basic.scope.pdecl]/1 of the C++20 standard draft had the following (non-normative) example in a note (partial quote from before the merge of pull request 3580, see answer to this question):
unsigned char x = x;
[...] x is initialized with its own (indeterminate) value.
Does this actually have well-defined behavior in C++20?
Generally, self-initialization of the form T x = x; has undefined behavior, because x's value is indeterminate before the initialization is completed. Evaluating an indeterminate value generally causes undefined behavior ([basic.indet]/2), but there is a specific exception in [basic.indet]/2.3 that allows directly initializing an unsigned char variable from an lvalue unsigned char with indeterminate value (causing initialization with an indeterminate value).
Therefore this alone does not cause undefined behavior, but it would for other types T that are not unsigned narrow character types or std::byte, e.g. int x = x;. These considerations applied in C++17 and before as well; see also the linked questions at the bottom.
However, even for unsigned char x = x;, the current draft's [basic.life]/7 says:
Similarly, before the lifetime of an object has started [...] using the properties of the glvalue that do not depend on its value is well-defined. The program has undefined behavior if:
the glvalue is used to access the object, or
[...]
This seems to imply that x's value in the example can only be used during its lifetime.
[basic.life]/1 says:
[...]
The lifetime of an object of type T begins when:
[...] and
its initialization (if any) is complete (including vacuous initialization) ([dcl.init]),
[...]
Thus x's lifetime begins only after initialization is completed. But in the quoted example x's value is used before x's initialization is complete. Therefore the use has undefined behavior.
Is my analysis correct and if it is, does it affect similar cases of use-before-initialization such as
int x = (x = 1);
which, as far as I can tell, were well-defined in C++17 and before as well?
Note that in C++17 (final draft) the second requirement for lifetime to begin was different:
if the object has non-vacuous initialization, its initialization is complete,
Since x would have vacuous initialization by C++17's definition (but not by the one in the current draft), its lifetime would already have begun when it is accessed in the initializer. So in both of the examples given above, there was no undefined behavior due to the lifetime of x in C++17.
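To summarize the two readings side by side, here is a minimal annotated sketch (the comments restate the analysis above; the second variable is renamed so that both examples fit in one function):

void f() {
    // Indeterminate-value rules: OK (unsigned narrow character type).
    // C++17 lifetime rules: lifetime already began (vacuous initialization).
    // C++20 lifetime rules: lifetime has not yet begun, so the access is UB.
    unsigned char x = x;

    // The assignment gives y a determinate value before it is read, so the
    // indeterminate-value rules are satisfied; the lifetime question under
    // C++20 is the same as above.
    int y = (y = 1);
}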
The wording before C++17 is again different, but with the same result.
The question is not about undefined behavior when using indeterminate values, which was covered in e.g. the following questions:
Has C++ standard changed with respect to the use of indeterminate values and undefined behavior in C++14?
Does initialization entail lvalue-to-rvalue conversion? Is int x = x; UB?
This was opened as an editorial issue. It was forwarded to CWG for (internal) discussion. Approximately 24 hours later, the person who forwarded the issue created a pull request which modifies the example to make it clear that this is UB:
Here, the initialization of the second \tcode{x} has undefined behavior, because the initializer accesses the second \tcode{x} outside its lifetime\iref{basic.life}.
That PR has since been merged and the issue closed. So it seems clear that the obvious interpretation (UB due to accessing an object whose lifetime has not started) is the intended one: the committee's intent is for these constructs to be undefined, and the standard's non-normative text has been updated to reflect this.
Related
I am reading through the cppreference page on default initialization and I noticed a section that states something along these lines:
// UB
int x;
int y = x;

// Defined and OK
unsigned char c;
unsigned char d = c;
The same rule that applies to unsigned char applies to std::byte as well.
My question is: why does every other non-class variable (int, bool, char, etc.) result in UB if you try to use its value before assigning it (as in the example above), but not unsigned char? Why is unsigned char special?
The page I am reading for reference
The difference is not in initialisation behaviour. The value of uninitialised int is indeterminate and default initialisation leaves it indeterminate. The value of uninitialised unsigned char is indeterminate and default initialisation leaves it indeterminate. There is no difference there.
The difference is that behaviour of producing an indeterminate value of type int - or any other type besides the exceptional unsigned char or std::byte - is undefined (unless the value is discarded).
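One practical consequence of the exception is that byte-wise copying of objects whose bytes (e.g. padding) may be uninitialized stays well-defined. A minimal sketch; the struct and function name here are illustrative:

#include <cstddef>

struct S { char c; int i; };  // typically contains uninitialized padding bytes

// Copies sizeof(S) bytes, padding included. Each read produces an
// indeterminate value of type unsigned char, which the exception permits;
// done through any non-character type, such reads could be UB.
void copy_bytes(const S& from, S& to) {
    const unsigned char* src = reinterpret_cast<const unsigned char*>(&from);
    unsigned char* dst = reinterpret_cast<unsigned char*>(&to);
    for (std::size_t n = 0; n != sizeof(S); ++n)
        dst[n] = src[n];  // indeterminate values may be copied, just not inspected
}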
The exception for unsigned char (and later std::byte) was added to the language in C++14 when indeterminate value was properly defined (although since the change was a defect resolution, to my understanding it applies to the official standard at the time, C++11).
I could not find a documented rationale for that design choice. Here is a timeline of the definitions (all standard quotes are from drafts):
C89 - 1.6 DEFINITIONS OF TERMS
Undefined behavior --- behavior, upon use of ... indeterminately-valued objects
C89 - 3.5.7 Initialization - Semantics
... If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate.
There are no exceptions for any type. You'll see why the C standard is relevant when reading the C++98 standard.
C++98 - [dcl.init]
... Otherwise, if no initializer is specified for an object, the object and its subobjects, if any, have an indeterminate initial value
There was no definition of what an indeterminate value means or of what happens when you use one. The intended meaning was presumably the same as in C89, but it is underspecified.
C99 - 3. Terms, definitions, and symbols - 3.17.2
3.17.2 indeterminate value
either an unspecified value or a trap representation
3.17.3 unspecified value
valid value of the relevant type where this International Standard imposes no requirements on which value is chosen in any instance
NOTE An unspecified value cannot be a trap representation.
C99 - 6.2.6 Representations of types - 6.2.6.1 General
Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined. 41) Such a representation is called a trap representation.
C99 - J.2 Undefined behavior
The behavior is undefined in the following circumstances:
...
The value of an object with automatic storage duration is used while it is indeterminate
A trap representation is read by an lvalue expression that does not have character type
A trap representation is produced by a side effect that modifies any part of the object using an lvalue expression that does not have character type
...
C99 introduced the term trap representation; using one is also UB, just like using an indeterminate value. Character types (char, unsigned char and signed char) don't have trap representations, and may be used to operate on the trap representations of other types without UB.
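A short sketch of what that carve-out buys in C99 terms (the choice of long double is illustrative; whether a given type actually has trap representations is implementation-specific):

void inspect() {
    long double v;                          // bits never written: indeterminate
    unsigned char* p = (unsigned char*)&v;
    // long double w = v;   // UB in C99 if the stored bits form a trap representation
    unsigned char byte0 = p[0];  // OK: every bit pattern is a valid unsigned char
    (void)byte0;
}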
C++ core language issue - 616. Definition of “indeterminate value”
The C++ Standard uses the phrase “indeterminate value” without defining it. C99 defines it as “either an unspecified value or a trap representation.” Should C++ follow suit?
Proposed resolution (October, 2012):
Change [dcl.init] paragraph 12 as follows:
If no initializer is specified for an object, the object is default-initialized. When storage for an object with automatic or dynamic storage duration is obtained, the object has an indeterminate value, and if no initialization is performed for the object, that object retains an indeterminate value until that value is replaced (5.17 [expr.ass]). [Note: Objects with static or thread storage duration are zero-initialized, see 3.6.2 [basic.start.init]. —end note] If an indeterminate value is produced by an evaluation, the behavior is undefined except in the following cases:
If an indeterminate value of unsigned narrow character type (3.9.1 [basic.fundamental]) is produced by the evaluation of:
the second or third operand of a conditional expression (5.16 [expr.cond]),
the right operand of a comma (5.18 [expr.comma]),
the operand of a cast or conversion to an unsigned narrow character type (4.7 [conv.integral], 5.2.3 [expr.type.conv], 5.2.9 [expr.static.cast], 5.4 [expr.cast]), or
a discarded-value expression (Clause 5 [expr]),
then the result of the operation is an indeterminate value.
If an indeterminate value of unsigned narrow character type (3.9.1 [basic.fundamental]) is produced by the evaluation of the right operand of a simple assignment operator (5.17 [expr.ass]) whose first operand is an lvalue of unsigned narrow character type, an indeterminate value replaces the value of the object referred to by the left operand.
If an indeterminate value of unsigned narrow character type (3.9.1 [basic.fundamental]) is produced by the evaluation of the initialization expression when initializing an object of unsigned narrow character type, that object is initialized to an indeterminate value.
The proposed change was accepted as a defect resolution with some further changes (issue 1213) but has remained mostly the same (similar enough for purposes of this question). This is where the exception for unsigned char seems to have been introduced into C++. The core language issue has no public comments or notes about the rationale for the exception as far as I could find.
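To make several of the enumerated cases concrete, here is a sketch mapping bullets of the resolution onto code (the annotations are mine, not standard text):

void g() {
    unsigned char a;     // indeterminate
    unsigned char b = a; // initializing an unsigned narrow char: OK
    b = a;               // right operand of simple assignment to unsigned narrow char: OK
    unsigned char c = static_cast<unsigned char>(a);  // cast to unsigned narrow char: OK
    (void)a;             // discarded-value expression: OK
    // int n = a + 1;    // UB: arithmetic needs the value, no exception applies
    (void)b; (void)c;
}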
Under C89 and C99, uninitialized values could have any bit pattern. If addressable locations have n bits, then unsigned char was guaranteed to have 2ⁿ possible values, so every possible bit pattern would be a valid value. Other types, however, would on some platforms be stored in ways where not all bit patterns were valid. The Standard imposed no requirements on what might happen if code attempted to read an object when the stored bit pattern didn't represent a valid value, so the question of whether reading an object of a type other than unsigned char would yield an Unspecified Value, or could trigger arbitrary behavior, would depend upon whether the implementation's specified representation of the type assigned valid values to all possible bit patterns.
The C11 Standard added a further proviso: even an implementation which specifies that all objects, whether or not their address is taken, are always stored in ways where every bit pattern represents a valid value may opt to behave in completely arbitrary fashion if an attempt is made to access an uninitialized object that isn't an unsigned char whose address is taken. Although no rationale document was published for C11 (unlike earlier versions), I think such changes stem from a lack of consensus about whether the Standard is supposed to describe only the behavior of 100% portable programs, or of a wider variety of practical programs. If a program is going to be run on a completely unspecified implementation, it is impossible to know what the effect of reading an uninitialized object would be except in the case specified by the C11 Standard. If it is going to be run on a known implementation, it will be processed however that implementation decides to process it, whether or not the Standard mandates the behavior, so there should be no need to mandate anything in particular. Unfortunately, the authors of Gratuitously "Clever" Compilers believe that when the Standard characterizes an action as "non-portable or erroneous" it really means "non-portable, and therefore erroneous", excluding the possibility of "non-portable but correct on the intended target", even though that notion directly contradicts the published Rationale documents for earlier versions of the Standard.
For example, in the following code:
int myarray[3];
int x = myarray[1];
Is the code guaranteed to execute successfully in constant time, with x having some integral value? Or can the compiler skip emitting code for this entirely / emit code to launch GNU Chess and still comply with the C++ standard?
This is useful in a data structure that's like an array, but can be initialized in constant time. (Sorry, don't have my copy of Aho, Hopcroft and Ullman handy so can't look up the name.)
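A rough sketch of the kind of structure meant, under the assumption that it is the constant-time-initializable array (the names and layout here are my reconstruction for illustration). Its membership test deliberately reads never-written entries, which is exactly the kind of access this question asks about:

#include <cstddef>

template <std::size_t N>
class InitFreeArray {
    int value[N];           // payload, deliberately left uninitialized
    std::size_t sparse[N];  // sparse[i] = position of i in dense[], uninitialized
    std::size_t dense[N];   // stack of indices that have been written
    std::size_t count = 0;

public:
    // May read garbage from sparse[]/dense[]; the two-way handshake makes the
    // result correct anyway IF reading the garbage is assumed to be harmless.
    bool has(std::size_t i) const {
        return sparse[i] < count && dense[sparse[i]] == i;
    }
    void set(std::size_t i, int v) {
        if (!has(i)) { sparse[i] = count; dense[count++] = i; }
        value[i] = v;
    }
    int get(std::size_t i) const { return has(i) ? value[i] : 0; }
};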
It's undefined behavior.
According to the standard ([dcl.init] paragraph 12),
If no initializer is specified for an object, the object is default-initialized. When storage for an object with automatic or dynamic storage duration is obtained, the object has an indeterminate value, and if no initialization is performed for the object, that object retains an indeterminate value until that value is replaced ... If an indeterminate value is produced by an evaluation, the behavior is undefined except in the following cases
with all of the following cases addressing accesses through unsigned narrow character types or std::byte, which yield an indeterminate value rather than undefined behavior.
Accessing any uninitialized data is undefined behavior.
So far I can't find how to deduce that the following:
int* ptr;
*ptr = 0;
is undefined behavior.
First of all, there's 5.3.1/1, which states that * means indirection and converts T* to T. But it doesn't say anything about UB.
Then there's the often quoted 3.7.3.2/4, which says that using a deallocation function on a non-null pointer renders the pointer invalid, and that later use of the invalid pointer is UB. But in the code above there's nothing about deallocation.
How can UB be deduced in the code above?
Section 4.1 looks like a candidate (emphasis mine):
An lvalue (3.10) of a non-function, non-array type T can be converted to an rvalue. If T is an incomplete type, a program that necessitates this conversion is ill-formed. If the object to which the lvalue refers is not an object of type T and is not an object of a type derived from T, or if the object is uninitialized, a program that necessitates this conversion has undefined behavior. If T is a non-class type, the type of the rvalue is the cv-unqualified version of T. Otherwise, the type of the rvalue is T.
I'm sure just searching on "uninitial" in the spec can find you more candidates.
I found the answer to this question in an unexpected corner of the C++ draft standard: section 24.2 Iterator requirements, specifically section 24.2.1 In general, paragraphs 5 and 10, which respectively say (emphasis mine):
[...][ Example: After the declaration of an uninitialized pointer x (as with int* x;), x must always be assumed to have a singular value of a pointer. —end example ] [...] Dereferenceable values are always non-singular.
and:
An invalid iterator is an iterator that may be singular.268
and footnote 268 says:
This definition applies to pointers, since pointers are iterators. The effect of dereferencing an iterator that has been invalidated is undefined.
That said, there does seem to be some controversy over whether a null pointer is singular or not, and it looks like the term singular value needs to be properly defined in a more general manner.
The intent of singular seems to be summed up well in defect report 278, What does iterator validity mean?, under the rationale section, which says:
Why do we say "may be singular", instead of "is singular"? That's because a valid iterator is one that is known to be nonsingular. Invalidating an iterator means changing it in such a way that it's no longer known to be nonsingular. An example: inserting an element into the middle of a vector is correctly said to invalidate all iterators pointing into the vector. That doesn't necessarily mean they all become singular.
So invalidation and being uninitialized may create a value that is singular, but since we cannot prove such values are nonsingular, we must assume they are singular.
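Putting the terms together, a small sketch (the comments are my reading of the terminology):

#include <vector>

void h() {
    int* p;                     // uninitialized: must be assumed singular
    std::vector<int> v = {1, 2, 3};
    auto it = v.begin();        // valid: known to be nonsingular
    v.insert(v.begin(), 0);     // invalidates `it`: it is no longer *known*
                                // to be nonsingular, i.e. it may be singular
    // *it;                     // UB: dereferencing a possibly singular iterator
    // *p;                      // UB: dereferencing a singular pointer
    (void)p;
}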
Update
An alternative common-sense approach would be to note that the draft standard's section 5.3.1 Unary operators paragraph 1 says (emphasis mine):
The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points.[...]
and if we then go to section 3.10 Lvalues and rvalues, paragraph 1 says (emphasis mine):
An lvalue (so called, historically, because lvalues could appear on the left-hand side of an assignment expression) designates a function or an object. [...]
but ptr will not, except by chance, point to a valid object.
The OP's question is nonsense. There is no requirement that the Standard say certain behaviours are undefined, and indeed I would argue that all such wording be removed from the Standard because it confuses people and makes the Standard more verbose than necessary.
The Standard defines certain behaviour. The question is, does it specify any behaviour in this case? If it does not, the behaviour is undefined whether or not it says so explicitly.
In fact the specification that some things are undefined is left in the Standard primarily as a debugging aid for the Standards writers, the idea being to generate a contradiction if there is a requirement in one place which conflicts with an explicit statement of undefined behaviour in another: that's a way to prove a defect in the Standard. Without the explicit statement of undefined behaviour, the other clause prescribing behaviour would be normative and unchallenged.
Evaluating an uninitialized pointer causes undefined behaviour. Since dereferencing the pointer first requires evaluating it, this implies that dereferencing also causes undefined behaviour.
This was true in both C++11 and C++14, although the wording changed.
In C++14 it is fully covered by [dcl.init]/12:
When storage for an object with automatic or dynamic storage duration is obtained, the object has an indeterminate value, and if no initialization is performed for the object, that object retains an indeterminate value until that value is replaced.
If an indeterminate value is produced by an evaluation, the behavior is undefined except in the following cases:
where the "following cases" are particular operations on unsigned char.
In C++11, [conv.lval]/2 covered this under the lvalue-to-rvalue conversion procedure (i.e. retrieving the pointer value from the storage area denoted by ptr):
A glvalue of a non-function, non-array type T can be converted to a prvalue. If T is an incomplete type, a program that necessitates this conversion is ill-formed. If the object to which the glvalue refers is not an object of type T and is not an object of a type derived from T, or if the object is uninitialized, a program that necessitates this conversion has undefined behavior.
The emphasized part ("or if the object is uninitialized") was removed for C++14 and replaced with the extra text in [dcl.init]/12.
I'm not going to pretend I know a lot about this, but some compilers will initialize the pointer to NULL, and dereferencing a NULL pointer is UB.
Also, considering that an uninitialized pointer could point to anything (including NULL), you could conclude that dereferencing it is UB.
A note in section 8.3.2 [dcl.ref]
[Note: in particular, a null reference cannot exist in a well-defined program, because the only way to create such a reference would be to bind it to the “object” obtained by dereferencing a null pointer, which causes undefined behavior. As described in 9.6, a reference cannot be bound directly to a bit-field. ]
—ISO/IEC 14882:1998(E), the ISO C++ standard, in section 8.3.2 [dcl.ref]
I think I should have written this as a comment instead; I'm not really that sure.
To dereference the pointer, you first need to read the pointer variable itself (we are not yet talking about the object it points to). Reading from an uninitialized variable is undefined behaviour.
What you do with the pointer's value after you have read it no longer matters at that point, be it writing to (as in your example) or reading from the object it points to.
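A sketch separating the two steps (the commented-out lines are the ones that would trigger the UB):

void k() {
    int* ptr;          // ptr itself holds an indeterminate value
    int** pp = &ptr;   // fine: taking the address does not read that value
    // int* q = ptr;   // UB already here: this reads the indeterminate pointer value
    // *ptr = 0;       // UB for the same reason, before any dereference happens
    (void)pp;
}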
Even if the normal storage of something in memory would have no "room" for any trap bits or trap representations, implementations are not required to store automatic variables the same way as static-duration variables except when there is a possibility that user code might hold a pointer to them somewhere. This behavior is most visible with integer types. On a typical 32-bit system, given the code:
#include <stdint.h>

uint16_t foo(void);
uint16_t bar(void);

uint16_t blah(uint32_t q)
{
    uint16_t a;
    if (q & 1) a = foo();
    if (q & 2) a = bar();
    return a;
}

unsigned short test(void)
{
    return blah(65540);
}
it would not be particularly surprising for test to yield 65540, even though that value is outside the representable range of uint16_t, a type which has no trap representations. If a local variable of type uint16_t holds an indeterminate value, there is no requirement that reading it yield a value within the range of uint16_t. Since unexpected behaviors can result when using even unsigned integers in such fashion, there is no reason to expect that pointers couldn't behave in even worse fashion.
Here's the sample code:
struct X { X(int) {} };  // minimal definition added so the sample compiles

X* makeX(int index) { return new X(index); }

struct Tmp {
    mutable int count;
    Tmp() : count(0) {}
    const X** getX() const {
        static const X* x[] = { makeX(count++), makeX(count++) };
        return x;
    }
};
This reports undefined behaviour on Clang build 500 in the static array construction.
For the sake of simplicity in this post, count is not static, but that does not change anything. The warning I am receiving is as follows:
test.cpp:8:44: warning: multiple unsequenced modifications to 'count' [-Wunsequenced]
In C++11, this is fine; each clause of an initialiser list is sequenced before the next one, so the evaluation is well-defined.
Historically, the clauses might have been unsequenced, so the two unsequenced modifications of count would give undefined behaviour.
(Although, as noted in the comments, it might have been well-defined even then - you can probably interpret the standard as implying that each clause is a full-expression, and there's a sequence point at the end of each full-expression. I'll leave it to historians to debate the finer points of obsolete languages.)
Update 2
So after some research I realized this was actually well defined, although the evaluation order is unspecified. Putting the pieces together was pretty interesting; although there is a more general question covering the C++11 case, there was no general question covering the pre-C++11 case, so I ended up creating a self-answered question, Are multiple mutations of the same variable within initializer lists undefined behavior pre C++11, that covers all the details.
Basically, the instinct when seeing makeX(count++), makeX(count++) is to treat the whole thing as one full-expression, but it is not; each initializer is a full-expression, and therefore each has a sequence point after it.
Update
As James points out, it may not be undefined pre-C++11. That conclusion would seem to rely on interpreting the initialization of each element as a full-expression, and it is not clear you can definitely make that claim.
Original
Pre-C++11, it is undefined behavior to modify a variable more than once between sequence points. We can see that by looking at the relevant section in an older draft standard, section 5 Expressions paragraph 4, which says (emphasis mine):
[...]Between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be accessed only to determine the value to be stored. The requirements of this paragraph shall be met for each allowable ordering of the subexpressions of a full expression; otherwise the behavior is undefined.
In the C++11 draft standard this wording changes; section 1.9 Program execution paragraph 15 says (emphasis mine):
Except where noted, evaluations of operands of individual operators and of subexpressions of individual expressions are unsequenced. [...] If a side effect on a scalar object is unsequenced relative to either another side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.
and for initializer lists, section 8.5.4 List-initialization paragraph 4 says:
Within the initializer-list of a braced-init-list, the initializer-clauses, including any that result from pack expansions (14.5.3), are evaluated in the order in which they appear. That is, every value computation and side effect associated with a given initializer-clause is sequenced before every value computation and side effect associated with any initializer-clause that follows it in the comma-separated list of the initializer-list.
In this case, the , is NOT the comma operator (and hence not a sequence point); it acts as a delimiter between the initializers of the elements of the array.
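For example, under the C++11 wording quoted above, the following analogous snippet is well-defined (a sketch):

int i = 0;
// Each initializer-clause is sequenced before the next, so this is
// well-defined in C++11 and later: a becomes {0, 1} and i ends up 2.
int a[2] = { i++, i++ };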
In other words, you're modifying the same variable twice in a statement without sequence points (between the modifications).
EDIT: thanks to @MikeSeymour: this is an issue in C++03 and before. It seems that in C++11, the order of evaluation is defined for this case.
This question might seem naive (hell, I think it is) but I am unable to find an answer that satisfies me.
Take this simple C++ program:
#include <iostream>
using namespace std;

int main()
{
    bool b;
    cout << b;
    return 0;
}
When compiled and executed, it always prints 0.
The problem is that this is not what I expected it to do: as far as I know, a local variable has no initialization value, and I would think a random byte has a better chance of being nonzero than of being exactly 0.
What am I missing?
That is undefined behavior, because you are using the value of an uninitialized variable. You cannot expect anything out of a program with undefined behavior.
In particular, your program necessitates a so-called lvalue-to-rvalue conversion when initializing the parameter of operator << from b. Paragraph 4.1/1 of the C++11 Standard specifies:
A glvalue (3.10) of a non-function, non-array type T can be converted to a prvalue. If T is an incomplete type, a program that necessitates this conversion is ill-formed. If the object to which the glvalue refers is not an object of type T and is not an object of a type derived from T, or if the object is uninitialized, a program that necessitates this conversion has undefined behavior. If T is a non-class type, the type of the prvalue is the cv-unqualified version of T. Otherwise, the type of the prvalue is T.
The behaviour is undefined; there is no requirement for it to be assigned a random value, and certainly not a uniformly-distributed one.
What is probably happening is that the memory allocated to the process is zero-initialised by the operating system, and this is the first time that that byte is used, so it still contains zero.
But, like all undefined behaviour, you can't rely on it and there's little point speculating about the details.
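A sketch of why the zero cannot be relied upon (everything here is undefined; with optimizations, the compiler may even remove the dirtying write entirely):

#include <iostream>

void dirty_the_stack() {
    int junk[64];
    for (int& j : junk) j = 42;  // leave some nonzero bytes behind on the stack
}

int main() {
    dirty_the_stack();
    bool b;             // may now occupy stack memory that is no longer zero
    std::cout << b;     // still UB; on many implementations this no longer prints 0
    return 0;
}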
As Andy said, it's undefined behaviour. The fact that you are so lucky as to always receive 0 is down to your implementation. Probably the stack is clean (initialized with zeros) when your program starts, so it happens that you read zero when a variable is allocated there.
This may be guaranteed to succeed in your current implementation (as others said, maybe it initializes the stack with zeros) but it's also guaranteed to fail in, say, a Visual C++ debug build (which initializes local variables to 0xCCCCCCCC).
C++ static and global variables are initialized by default, as C89 and C99 say; specifically, arithmetic variables are initialized to 0.
Auto variables are indeterminate, which per C99 3.17.2 means they hold either an unspecified value or a trap representation. In the context of a bool, the representation that happens to be there might read as 0; this is compiler-specific.