Why is this undefined behaviour? - c++

Here's the sample code:
X * makeX(int index) { return new X(index); }
struct Tmp {
mutable int count;
Tmp() : count(0) {}
const X ** getX() const {
static const X* x[] = { makeX(count++), makeX(count++) };
return x;
}
};
Clang build 500 reports undefined behavior in the static array construction.
For simplicity, count is not static in this post, but that does not change anything. The warning I receive is:
test.cpp:8:44: warning: multiple unsequenced modifications to 'count' [-Wunsequenced]

In C++11, this is fine; each clause of an initialiser list is sequenced before the next one, so the evaluation is well-defined.
Historically, the clauses might have been unsequenced, so the two unsequenced modifications of count would give undefined behaviour.
(Although, as noted in the comments, it might have been well-defined even then: you can probably interpret the standard as implying that each clause is a full-expression, and there's a sequence point at the end of each full-expression. I'll leave it to historians to debate the finer points of obsolete languages.)
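A minimal sketch of the C++11 rule described above, assuming a C++11 (or later) compiler. The name ordered_init is hypothetical, and plain int values stand in for the question's makeX calls:

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper: returns the two values produced by the initializer list.
std::pair<int, int> ordered_init() {
    int count = 0;
    // C++11 list-initialization: each initializer-clause is sequenced before
    // the next one, so the first count++ completes before the second starts.
    const int values[] = { count++, count++ };
    assert(count == 2);  // both increments have taken effect
    return std::make_pair(values[0], values[1]);
}
```

Under the C++11 wording the pair is (0, 1). Some compilers may still emit a sequencing warning here when building in a pre-C++11 mode.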

Update 2
So after some research I realized this was actually well defined, although the evaluation order is unspecified. It was pretty interesting putting the pieces together, and although there is a more general question covering the C++11 case, there was no general question covering the pre-C++11 case, so I ended up creating a self-answered question, "Are multiple mutations of the same variable within initializer lists undefined behavior pre C++11", that covers all the details.
Basically, the instinct when seeing makeX(count++), makeX(count++) is to treat the whole thing as one full-expression, but it is not; each initializer is its own full-expression and therefore ends with a sequence point.
Update
As James points out, it may not be undefined pre-C++11. That would rely on interpreting the initialization of each element as a full-expression, but it is not clear you can definitely make that claim.
Original
Pre-C++11 it is undefined behavior to modify a variable more than once between two sequence points. We can see that in the relevant section of an older draft standard, section 5 Expressions paragraph 4, which says (emphasis mine):
[...]Between the previous and next sequence point a scalar object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be accessed only to determine the value to be stored. The requirements of this paragraph shall be met for each allowable ordering of the subexpressions of a full expression; otherwise the behavior is undefined.
In the C++11 draft standard this changes to the following wording, from section 1.9 Program execution paragraph 15 (emphasis mine):
Except where noted, evaluations of operands of individual operators and of subexpressions of individual expressions are unsequenced. [...] If a side effect on a scalar object is unsequenced relative to either another side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined.
and for initializer lists, section 8.5.4 List-initialization paragraph 4 says:
Within the initializer-list of a braced-init-list, the initializer-clauses, including any that result from pack expansions (14.5.3), are evaluated in the order in which they appear. That is, every value computation and side effect associated with a given initializer-clause is sequenced before every value computation and side effect associated with any initializer-clause that follows it in the comma-separated list of the initializer-list.

Because in this case the , is NOT a sequence point; it acts more like a delimiter in the initialization of the elements of the array.
In other words, you're modifying the same variable twice in a statement without sequence points (between the modifications).
EDIT: thanks to @MikeSeymour: this is an issue in C++03 and before. It seems that in C++11 the order of evaluation is defined for this case.
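If the code has to behave the same under C++03 and C++11, one option is to give each increment its own declaration. This is only a sketch: the struct X below is a hypothetical stand-in, since the question does not show its definition.

```cpp
#include <cassert>

// Hypothetical stand-in for the question's X, which is not shown there.
struct X {
    int index;
    explicit X(int i) : index(i) {}
};

X* makeX(int index) { return new X(index); }

struct Tmp {
    mutable int count;
    Tmp() : count(0) {}
    const X** getX() const {
        // Each declaration is a separate full-expression, so the two
        // increments of count are ordered in every C++ standard.
        static const X* first = makeX(count++);
        static const X* second = makeX(count++);
        assert(first->index + 1 == second->index);
        static const X* x[] = { first, second };
        return x;
    }
};
```

As in the original, the static array is built once, on the first call, and the allocated objects are never freed.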

Related

Is it guaranteed that all forms of Undefined Behavior are caught when evaluating a constant expression

I came across the following claim:
Actually, all forms of UB in the language are required to be caught when evaluating a constant expression (though UB in the standard library is not required to be caught). It's only runtime UB where anything can happen.
(emphasis mine)
My question is: is the above statement technically correct?
When I asked the user how the standard imposes this, they cited expr.const#5.8, which states:
5. An expression E is a core constant expression unless the evaluation of E, following the rules of the abstract machine ([intro.execution]), would evaluate one of the following:
5.8. an operation that would have undefined behavior as specified in [intro] through [cpp];
But after reading [expr.const#5.8], I could not figure out how this implies that all forms of UB in the language are required to be caught when evaluating a constant expression. So can someone clarify how this citation supports (if it does) the claim made in the comment quoted above?
I also read this which says:
If the behavior is undefined, the compiler could accept it, reject it, issue a warning, and according to the standard, even crash, hang or install a virus on your computer.
So it seems to me (upon reading the very first comment) that there is a fundamental difference between UB during the evaluation of a constant expression and a runtime UB.
What is the truth?
I could not figure out how this implies that all forms of UB in the language are required to be caught when evaluating a constant expression.
Not necessarily for all forms of UB. As per the quoted rule, only if the operation that is evaluated would have undefined behavior as specified in [intro] through [cpp].
No other UB such as specified in other sections, or UB that isn't caused by evaluation of an operation, prevents an expression from being core constant. There is a clarifying rule:
[expr.const]
If E satisfies the constraints of a core constant expression, but evaluation of E would evaluate an operation that has undefined behavior as specified in [library] through [thread], or an invocation of the va_start macro ([cstdarg.syn]), it is unspecified whether E is a core constant expression.
This clarification (including the "as specified in ..." sentence from question) is a resolution to defect report 1952 and the wording is in C++17.
To clarify, the rule causes certain UB to prevent an expression from being core constant. Consider a case where a rule requires an expression to be constant. Here is an example of such a rule:
[dcl.array]
D1 [ constant-expression opt ] attribute-specifier-seq opt
... The constant-expression shall be a converted constant expression of type std::size_t ([expr.const]). Its value N specifies the array bound, i.e., the number of elements in the array; ...
If some context requires an expression to be constant, then the expression not being constant will violate that rule. In such case this applies:
[intro.compliance.general]
If a program contains a violation of any diagnosable rule or an occurrence of a construct described in this document as “conditionally-supported” when the implementation does not support that construct, a conforming implementation shall issue at least one diagnostic message.
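To make the chain of rules above concrete, here is a small sketch, assuming C++11 or later. The names quotient, buffer and runtime_quotient are made up for illustration. An array bound is used as the context that requires a constant expression, and the ill-formed declaration is left commented out so the snippet still compiles:

```cpp
#include <cassert>

constexpr int quotient(int a, int b) { return a / b; }

// Well-defined evaluation: usable where a constant expression is required.
static_assert(quotient(6, 2) == 3, "no UB, so this is a core constant expression");
char buffer[quotient(6, 2)];  // the array bound must be a converted constant expression

// Division by zero is core-language UB, so the call below is not a core
// constant expression and the declaration would have to be diagnosed:
// char bad[quotient(6, 0)];

// Outside a constant-expression context the same function is an ordinary call.
int runtime_quotient(int a, int b) {
    assert(b != 0);  // at run time the caller has to avoid the UB itself
    return quotient(a, b);
}
```

Uncommenting the bad declaration should produce a diagnostic, because the array-bound rule is violated once the expression fails to be constant.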

Use of variable in own initializer

[basic.scope.pdecl]/1 of the C++20 standard draft had the following (non-normative) example in a note (partial quote from before the merge of pull request 3580, see answer to this question):
unsigned char x = x;
[...] x is initialized with its own (indeterminate) value.
Does this actually have well-defined behavior in C++20?
Generally the self-initialization of the form T x = x; has undefined behavior by virtue of x's value being indeterminate before initialization is completed. Evaluating indeterminate values generally causes undefined behavior ([basic.indet]/2), but there is a specific exception in [basic.indet]/2.3 that allows directly initializing an unsigned char variable from an lvalue unsigned char with indeterminate value (causing initialization with an indeterminate value).
This alone does therefore not cause undefined behavior, but would for other types T that are not unsigned narrow character types or std::byte, e.g. int x = x;. These considerations applied in C++17 and before as well, see also linked questions at the bottom.
However, even for unsigned char x = x;, the current draft's [basic.lifetime]/7 says:
Similarly, before the lifetime of an object has started [...] using the properties of the glvalue that do not depend on its value is well-defined. The program has undefined behavior if:
the glvalue is used to access the object, or
[...]
This seems to imply that x's value in the example can only be used during its lifetime.
[basic.lifetime]/1 says:
[...]
The lifetime of an object of type T begins when:
[...] and
its initialization (if any) is complete (including vacuous initialization) ([dcl.init]),
[...]
Thus x's lifetime begins only after initialization is completed. But in the quoted example x's value is used before x's initialization is complete. Therefore the use has undefined behavior.
Is my analysis correct and if it is, does it affect similar cases of use-before-initialization such as
int x = (x = 1);
which, as far as I can tell, were well-defined in C++17 and before as well?
Note that in C++17 (final draft) the second requirement for lifetime to begin was different:
if the object has non-vacuous initialization, its initialization is complete,
Since x would have vacuous initialization by C++17's definition (but not the one in the current draft), its lifetime would have already begun when it is accessed in the initializer in the examples given above and so in both examples there was no undefined behavior due to lifetime of x in C++17.
The wording before C++17 is again different, but with the same result.
The question is not about undefined behavior when using indeterminate values, which was covered in e.g. the following questions:
Has C++ standard changed with respect to the use of indeterminate values and undefined behavior in C++14?
Does initialization entail lvalue-to-rvalue conversion? Is int x = x; UB?
This was opened as an editorial issue. It was forwarded to CWG for (internal) discussion. Approximately 24 hours later, the person who forwarded the issue created a pull request which modifies the example to make it clear that this is UB:
Here, the initialization of the second \tcode{x} has undefined behavior, because the initializer accesses the second \tcode{x} outside its lifetime\iref{basic.life}.
That PR has since been added and the issue closed. So it seems clear that the obvious interpretation (UB due to accessing an object whose lifetime has not started) is the intended interpretation. It appears that the intent of the committee is to make these constructs non-functional, and the standard's non-normative text has been updated to reflect this.
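For contrast, the quoted [basic.lifetime]/7 text says that uses of the glvalue which do not depend on its value are well-defined. As I read it, taking the variable's own address in its initializer falls into that category. A minimal sketch, with a hypothetical Node type and helper name:

```cpp
#include <cassert>

struct Node {
    Node* next;
};

// Hypothetical helper: builds a node that points at itself.
bool self_reference_is_consistent() {
    // &n uses only the address of n, never its value, so it does not
    // access the object before its lifetime has started.
    Node n = { &n };
    assert(n.next != nullptr);
    return n.next == &n;
}
```

Reading n.next inside the initializer, rather than forming &n, would be the kind of access the updated example describes as undefined.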

Is an Incremented Variable Reusable in a tie Call?

So I understand that re-usage of a variable that has been post incremented is undefined behavior in a function call. My understanding is this is not a problem in constructors. My question is about tie which is oddly halfway between each.
Given: pair<int, int> func() can I do:
tie(*it++, *it) = func();
Or is that undefined behavior?
Since C++17, this code has unspecified behavior. There are two possible outcomes:
the first argument is the result of dereferencing the original iterator, the second argument is the result of dereferencing the incremented iterator; or
the first argument and the second argument are both the result of dereferencing the original iterator.
Per [expr.call]/8:
[...] The initialization of a parameter, including every associated
value computation and side effect, is indeterminately sequenced with
respect to that of any other parameter. [...]
So the second argument to tie may be either the result of dereferencing the incremented iterator or the original iterator.
Prior to C++17, the situation was a bit complicated:
if both ++ and * invoke a function (e.g., when the type of it is a sophisticated class), then the behavior was unspecified, similar to the case since C++17;
otherwise, the behavior was undefined.
Per N4140 (C++14 draft) [expr.call]/8:
[ Note: The evaluations of the postfix expression and of the
arguments are all unsequenced relative to one another. All side
effects of argument evaluations are sequenced before the function is
entered (see [intro.execution]). — end note ]
Thus, the code was undefined behavior because the evaluation of one argument was unsequenced with the other. The evaluation of the two arguments may overlap, resulting in a data race. Unless it is specified otherwise ...
Per N4140 [intro.execution]/15:
When calling a function (whether or not the function is inline), every
value computation and side effect associated with any argument
expression, or with the postfix expression designating the called
function, is sequenced before execution of every expression or
statement in the body of the called function. [ Note: Value
computations and side effects associated with different argument
expressions are unsequenced. — end note ] Every evaluation
in the calling function (including other function calls) that is not
otherwise specifically sequenced before or after the execution of the
body of the called function is indeterminately sequenced with respect
to the execution of the called function. [footnote 9] Several
contexts in C++ cause evaluation of a function call, even though no
corresponding function call syntax appears in the translation unit.
[
Example: Evaluation of a new expression invokes one or more allocation and constructor functions; see [expr.new]. For another
example, invocation of a conversion function ([class.conv.fct]) can
arise in contexts in which no function call syntax appears. —
end example ] The sequencing constraints on the execution of the called function (as described above) are features of the function
calls as evaluated, whatever the syntax of the expression that calls
the function might be.
Footnote 9: In other words, function executions do not interleave with each
other.
Thus, if the operators are actually function calls, then the behavior is similarly unspecified.
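Whichever standard applies, the ambiguity disappears if the increment is moved into its own statement. A sketch, assuming a std::vector<int> iterator; func here is a hypothetical stand-in for the question's function, and assign_two and demo are made-up names:

```cpp
#include <cassert>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical stand-in for the question's func().
std::pair<int, int> func() { return std::make_pair(1, 2); }

// Writes the pair into *it and the following element, then returns the
// iterator past both written elements.
std::vector<int>::iterator assign_two(std::vector<int>::iterator it) {
    std::vector<int>::iterator first = it++;  // the increment completes here
    std::tie(*first, *it) = func();           // it is not modified in this expression
    return ++it;
}

std::vector<int> demo() {
    std::vector<int> v(2, 0);
    std::vector<int>::iterator end = assign_two(v.begin());
    assert(end == v.end());
    return v;
}
```

With the stand-in func above, demo() returns a vector holding 1 and 2.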

How can we predict the output of the following C++ Program [duplicate]

I am confused about the output of this code.
It depends on which compiler I use. Why is that?
#include <iostream>
using namespace std;
int f(int &n)
{
n--;
return n;
}
int main()
{
int n=10;
n=n-f(n);
cout<<n;
return 0;
}
Running it on the Ubuntu terminal with g++, the output is 1 whereas running it on Turbo C++ ( the compiler we used in school) gives output as 0.
In C++03, modifying a variable and also using its value in the same expression, without an intervening C++03 sequence point, was Undefined Behavior.
C++03 §5/4:
Between the previous
and next sequence point a scalar object shall have its stored value modified at most once by the evaluation
of an expression. Furthermore, the prior value shall be accessed only to determine the value to be stored.
The requirements of this paragraph shall be met for each allowable ordering of the subexpressions of a full
expression; otherwise the behavior is undefined.
Undefined Behavior, UB, provides the compiler with an opportunity to optimize, because it can assume that UB does not occur in a valid program.
However, with all the myriad UB rules of C++ it's difficult to reason about source code.
In C++11 sequence points were replaced with sequenced before, indeterminately sequenced and unsequenced relations:
C++11 §1.9/3
Given any two evaluations A and B, if
A is sequenced before B, then the execution of A shall precede the execution of B. If A is not sequenced before
B and B is not sequenced before A, then A and B are unsequenced. [Note: The execution of unsequenced
evaluations can overlap. —end note ] Evaluations A and B are indeterminately sequenced when either A
is sequenced before B or B is sequenced before A, but it is unspecified which.
And with the new C++11 sequence relationship rules the modification in the function in the code in question is indeterminately sequenced with respect to the use of the variable, and so the code has unspecified behavior rather than Undefined Behavior, as noted by Eric M Schmidt in a comment to (the first version of) this answer. Essentially that means that there is no danger of nasal daemons or other possible UB effects, and that the behavior is a reasonable one. The two possible behaviors here are that the modification via the function call is done before the use of the value, or that it's done after the use of the value.
Why it's unspecified behavior:
C++11 §1.9/15:
Every evaluation in the calling function (including other function calls) that is not otherwise specifically
sequenced before or after the execution of the body of the called function is indeterminately sequenced with
respect to the execution of the called function.
What “unspecified behavior” means:
C++11 §1.3.25:
unspecified behavior
Behavior, for a well-formed program construct and correct data, that depends on the implementation
[Note: The implementation is not required to document which behavior occurs. The range of possible
behaviors is usually delineated by this International Standard. —end note ]
Why the modification effected by the assignment is not problematic:
C++11 §5.17/1
In all cases, the assignment is sequenced after the value
computation of the right and left operands, and before the value computation of the assignment expression.
This is also quite different from C++03.
As the rather drastic edit of this answer shows, following Eric's comment, this kind of issue is not simple! The main advice I can give is to as much as possible just Say No™ to effects governed by subtle or very complex rules, the corners of the language. Simple code has a better chance of being correct, while so called clever code does not have a good chance of being significantly faster.
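In the spirit of that advice, the two possible orderings can be written out explicitly so that neither depends on the compiler. A sketch, reusing f from the question; read_then_call and call_then_read are made-up names:

```cpp
#include <cassert>

int f(int& n) {
    n--;
    return n;
}

// Reads n first, then calls f: for n == 10 this matches the g++ result.
int read_then_call(int n) {
    const int before = n;       // 10
    const int returned = f(n);  // n becomes 9, f returns 9
    assert(n == before - 1);
    return before - returned;   // 10 - 9
}

// Calls f first, then reads n: for n == 10 this matches the Turbo C++ result.
int call_then_read(int n) {
    const int returned = f(n);  // n becomes 9, f returns 9
    return n - returned;        // 9 - 9
}
```

For an input of 10, read_then_call gives 1 and call_then_read gives 0, the two outputs reported in the question.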

Why is "volatileQualifiedExpr + volatileQualifiedExpr" not necessarily UB in C but in C++?

When I today read the C Standard, it says about side effects
Accessing a volatile object, modifying an object, modifying a file, or calling a function
that does any of those operations are all side effects
and the C++ Standard says
Accessing an object designated by a volatile glvalue (3.10), modifying an object, calling a library I/O function, or calling a function that does any of those operations are all side effects
Hence because both forbid unsequenced side effects to occur on the same scalar object, C allows the following, but C++ makes it undefined behavior
int a = 0;
volatile int *pa = &a;
int b = *pa + *pa;
Am I reading the specifications correctly? And what is the reason for the discrepancy, if so?
I don't believe there is an effective difference between C and C++ in this regard. Though the wording on sequencing varies, the end result is the same: both result in undefined behaviour (though C seems to indicate the evaluation will succeed but with an undefined result).
In C99 (sorry, don't have C11 handy) paragraph 5.1.2.3.5 specifies:
— At sequence points, volatile objects are stable in the sense that previous accesses are
complete and subsequent accesses have not yet occurred.
Combined with your quote from 5.1.2.3.2, this would indicate that the value of *pa would not be in a stable state for at least one of the accesses through pa. This makes logical sense since the compiler would be allowed to evaluate them in any order, just once, or at the same time (if possible). It doesn't actually define what stable means, however.
In C++11 there is an explicit reference to unsequenced operations at 1.9/13. Paragraph 15 then indicates that such unsequenced operations on the same operand are undefined. Since undefined behaviour can mean anything happens, it is perhaps stronger than C's unstable behaviour. However, in both cases there is no guaranteed result for your expression.
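Whichever reading of the two standards is right, the question can be sidestepped by putting each volatile read in its own statement, so the two accesses are ordered. A minimal sketch; the function name is made up, and nullptr assumes C++11:

```cpp
#include <cassert>

// Reads the volatile object twice, in a fixed order, and adds the results.
int sum_of_two_reads(volatile int* pa) {
    assert(pa != nullptr);
    const int first = *pa;   // first volatile access, complete at the end of this statement
    const int second = *pa;  // second volatile access
    return first + second;
}
```

Both reads still happen, since they are accesses through a volatile glvalue, but they no longer sit unsequenced inside a single expression.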