Does this post-increment statement result in undefined behaviour? [duplicate] - c++

This question already has answers here:
Undefined behavior and sequence points
(5 answers)
Closed 4 years ago.
When building a program using a newer version of GCC, I found a problem in the code.
count[i] = count[i]++;
This code worked with an older version of GCC (2.95), but doesn't work with a newer version (4.8).
So I suspect this statement causes undefined behaviour, am I correct? Or is there a better term for this problem?

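This is undefined behavior in the language versions those compilers implement: the statement modifies count[i] twice, once through ++ and once through the assignment, and the two modifications are unsequenced relative to each other. The general evaluation-order rules are described at https://en.cppreference.com/w/cpp/language/eval_order: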
Order of evaluation of the operands of almost all C++ operators (including the order of evaluation of function arguments in a function-call expression and the order of evaluation of the subexpressions within any expression) is unspecified. The compiler can evaluate operands in any order, and may choose another order when the same expression is evaluated again.
There is also a warning on the increment/decrement page of cppreference: https://en.cppreference.com/w/cpp/language/operator_incdec
Because of the side-effects involved, built-in increment and decrement operators must be used with care to avoid undefined behavior due to violations of sequencing rules.

Indeed, this is undefined behavior.
int i = 2;
i = i++; // is i assigned to be 2 or 3?
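If the intent was simply to increment the element, one modification per statement sidesteps the question entirely. A minimal sketch, assuming that intent; the names bump_slot and demo_bump are hypothetical and not part of the original code:

```cpp
#include <cstddef>

// Hypothetical helper: increments one slot with exactly one side effect.
// Stands in for the problematic form: count[i] = count[i]++;
constexpr void bump_slot(int* count, std::size_t i) {
    ++count[i];
}

// Hypothetical demo: slot 1 starts at 5 and ends at 6.
constexpr int demo_bump() {
    int count[3] = {4, 5, 6};
    bump_slot(count, 1);
    return count[1];
}

static_assert(demo_bump() == 6, "slot 1 is incremented exactly once");
```

Because nothing else in the statement reads or writes count[i], the result does not depend on the compiler version or the language revision.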

Related

Why does the GCC compiler give a warning about undefined behaviour in C++17?

According to C++17, there is no guarantee for the order of evaluation in the following expression. It is called unspecified behaviour.
int i = 0;
std::cout<<i<<i++<<std::endl;
GCC in C++17 mode gives the following warning:
prog.cc: In function 'int main()':
prog.cc:6:20: warning: operation on 'i' may be undefined [-Wsequence-point]
std::cout<<i<<i++<<std::endl;
I don't understand: in C++17 the above expression is no longer undefined behaviour, so why does the compiler give a warning about it being undefined?
Seems like gcc gives a warning because this is a corner case, or at least very close to being one. Portability seems to be one concern.
From the page https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html
The C++17 standard will define the order of evaluation of operands in more cases: in particular it requires that the right-hand side of an assignment be evaluated before the left-hand side, so the above examples are no longer undefined. But this warning will still warn about them, to help people avoid writing code that is undefined in C and earlier revisions of C++.
The standard is worded confusingly, therefore there is some debate over the precise meaning of the sequence point rules in subtle cases. Links to discussions of the problem, including proposed formal definitions, may be found on the GCC readings page, at http://gcc.gnu.org/readings.html.
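One way to keep the code portable and silence the warning is to put the increment in its own statement. A minimal sketch, assuming the intent is to print i twice and then increment it (the order C++17 prescribes for the original expression); the helper names are hypothetical, and an std::ostringstream stands in for std::cout so the text can be inspected:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: writes i twice, then increments it. Each step is its
// own statement, so no expression both reads and modifies i.
inline std::string print_twice_then_increment(int& i) {
    std::ostringstream out;  // stands in for std::cout
    out << i;                // first read
    out << i;                // second read
    ++i;                     // the only modification, in a separate statement
    return out.str();
}

// Small self-check, assuming i starts at 0 as in the question.
inline void check_print_twice() {
    int i = 0;
    assert(print_twice_then_increment(i) == "00");
    assert(i == 1);
}
```

Written this way the behaviour is the same in C, C++03, C++11 and C++17, which is the portability concern the GCC documentation mentions.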

In the comma operator, is the left operand guaranteed not to be actually executed if it hasn't side effects?

To show the topic I'm going to use C, but the same macro can be used also in C++ (with or without struct), raising the same question.
I came up with this macro
#define STR_MEMBER(S,X) (((struct S*)NULL)->X, #X)
Its purpose is to have strings (const char*) of an existing member of a struct, so that if the member doesn't exist, the compilation fails. A minimal usage example:
#include <stdio.h>
struct a
{
int value;
};
int main(void)
{
printf("a.%s member really exists\n", STR_MEMBER(a, value));
return 0;
}
If value weren't a member of struct a, the code wouldn't compile, and this is what I wanted.
The comma operator should evaluate the left operand and then discard the result of the expression (if there is one), so my understanding is that this operator is usually used when the evaluation of the left operand has side effects.
In this case, however, there are no (intended) side effects, but of course it works only if the compiler doesn't actually produce code that evaluates the expression; otherwise it would access a struct located at NULL and a segmentation fault would occur.
Gcc/g++ 6.3 and 4.9.2 never produced that dangerous code, even with -O0, as if they were always able to “see” that the evaluation hasn't side effects and so it can be skipped.
Adding volatile in the macro (e.g. because accessing that memory address is the desired side effect) was so far the only way to trigger the segmentation fault.
So the question: is there anything in the C and C++ languages standard which guarantees that compilers will always avoid actual evaluation of the left operand of the comma operator when the compiler can be sure that the evaluation hasn't side effects?
Notes and fixing
I am not asking for a judgment about the macro as it is and the opportunity to use it or make it better. For the purpose of this question, the macro is bad if and only if it evokes undefined behaviour — i.e., if and only if it is risky because compilers are allowed to generate the “evaluation code” even when this hasn't side effects.
I have already two obvious fixes in mind: “reifying” the struct and using offsetof. The former needs an accessible memory area as big as the biggest struct we use as first argument of STR_MEMBER (e.g. maybe a static union could do…). The latter should work flawlessly: it gives an offset we aren't interested in and avoids the access problem — indeed I'm assuming gcc, because it's the compiler I use (hence the tag), and that its offsetof built-in behaves.
With the offsetof fix the macro becomes
#define STR_MEMBER(S,X) (offsetof(struct S,X), #X)
Writing volatile struct S instead of struct S doesn't cause the segfault.
Suggestions about other possible “fixes” are welcome, too.
Added note
Actually, the real use case was in C++ in a static storage struct. This seems to be fine in C++, but as soon as I tried C with code closer to the original instead of the version boiled down for this question, I realized that C isn't happy with it at all:
error: initializer element is not constant
C wants the struct to be initializable at compile time, whereas C++ is fine with that.
Is there anything in the C and C++ languages standard which guarantees that compilers will always avoid actual evaluation of the left operand of the comma operator ?
It's the opposite. The standard guarantees that the left operand IS evaluated (really it does, there aren't any exceptions). The result is discarded.
Note: for lvalue expressions, "evaluate" does not mean "access the stored value". Instead, it means to work out where the designated memory location is. The other code encompassing the lvalue expression may or may not then go on to access the memory location. The process of reading from the memory location is known as "lvalue conversion" in C, or "lvalue to rvalue conversion" in C++.
In C++ a discarded-value expression (such as the left operand of the comma operator) only has lvalue to rvalue conversion performed on it if it is volatile and also meets some other criteria (see C++14 [expr]/11 for detail). In C lvalue conversion does occur for expressions whose result is not used (C11 6.3.2.1/2).
In your example, it is moot whether or not lvalue conversion happens. In both languages X->Y, where X is a pointer, is defined as (*X).Y; in C the act of applying * to a null pointer already causes undefined behaviour (C11 6.5.3/3), and in C++ the . operator is only defined for the case when the left operand actually designates an object (C++14 [expr.ref]/4.2).
The comma operator (the C documentation says something very similar) has no such guarantee.
In a comma expression E1, E2, the expression E1 is evaluated, its result is discarded ..., and its side effects are completed before evaluation of the expression E2 begins
irrelevant information omitted
To put it simply, E1 will be evaluated, although the compiler might optimize it away by the as-if rule if it is able to determine that there are no side-effects.
Gcc/g++ 6.3 and 4.9.2 never produced that dangerous code, even with -O0, as if they were always able to “see” that the evaluation hasn't side effects and so it can be skipped.
clang will produce code which raises an error if you pass it the -fsanitize=undefined option. Which should answer your question: at least one major implementation's developers clearly consider the code as having undefined behaviour. And they are correct.
Suggestions about other possible “fixes” are welcome, too.
I would look for something which is guaranteed not to evaluate the expression. Your suggestion of offsetof does the job, but may occasionally cause code to be rejected that would otherwise be accepted, such as when X is a.b. If you want that to be accepted, my thought would be to use sizeof to force an expression to remain unevaluated.
You ask,
is there anything in the C and C++ languages standard which guarantees
that compilers will always avoid actual evaluation of the left operand
of the comma operator when the compiler can be sure that the
evaluation hasn't side effects?
As others have remarked, the answer is "no". On the contrary, the standards both unconditionally state that the left-hand operand of the comma operator is evaluated, and that the result is discarded.
This is of course a description of the execution model of an abstract machine; implementations are permitted to work differently, so long as the observable behavior is the same as the abstract machine behavior would produce. If indeed evaluation of the left-hand expression produces no side effects, then that would permit skipping it altogether, but there is nothing in either standard that provides for requiring that it be skipped.
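That the left operand really is evaluated becomes visible as soon as it has a side effect. A minimal sketch; comma_demo is a hypothetical name used only for this illustration:

```cpp
// Hypothetical demo: the left operand of the comma operator is evaluated
// (its side effect on 'calls' is visible) and only its value is discarded.
constexpr int comma_demo() {
    int calls = 0;
    int result = (++calls, 42);   // ++calls runs first, then the result is 42
    return calls * 100 + result;  // 1 * 100 + 42
}

static_assert(comma_demo() == 142, "left operand ran once; right operand is the result");
```

When the left operand has no side effect, an implementation may drop it under the as-if rule, but as noted above nothing requires it to.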
As for fixing it, you have various options, some of which apply only to one or the other of the two languages you have named. I tend to like your offsetof() alternative, but others have noted that in C++, there are types to which offsetof cannot be applied. In C, on the other hand, the standard specifically describes its application to structure types, but says nothing about union types. Its behavior on union types, though very likely to be consistent and natural, is technically undefined.
In C only, you could use a compound literal to avoid the undefined behavior in your approach:
#define HAS_MEMBER(T,X) (((T){0}).X, #X)
That works equally well on structure and union types (though you need to provide a full type name for this version, not just a tag). Its behavior is well defined when the given type does have such a member. The expansion violates a language constraint -- thus requiring a diagnostic to be emitted -- when the type does not have such a member, including when it is neither a structure type nor a union type.
You might also use sizeof, as @alain suggested, because although the sizeof expression will be evaluated, its operand will not be evaluated (except, in C, when its operand has variably-modified type, which will not apply to your use). I think this variation will work in both C and C++ without introducing any undefined behavior:
#define HAS_MEMBER(T,X) (sizeof(((T *)NULL)->X), #X)
I have again written it so that it works for both structs and unions.
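The sizeof-based macro can be checked at compile time. A minimal sketch of a C++11 adaptation; the type Sample and the helper same_text are hypothetical, and the cast to void is an addition that only silences unused-value warnings:

```cpp
// Hypothetical C++11 adaptation of the sizeof-based HAS_MEMBER macro.
// sizeof does not evaluate its operand, so the null pointer is never
// dereferenced. The cast to void only silences unused-value warnings.
#define HAS_MEMBER(T, X) \
    (static_cast<void>(sizeof(static_cast<T*>(nullptr)->X)), #X)

// Hypothetical type used only for this illustration.
struct Sample {
    int value;
};

// Minimal constexpr string comparison so the check can run at compile time.
constexpr bool same_text(const char* a, const char* b) {
    return *a == *b && (*a == '\0' || same_text(a + 1, b + 1));
}

static_assert(same_text(HAS_MEMBER(Sample, value), "value"),
              "the member name is stringified");
// HAS_MEMBER(Sample, missing) would fail to compile: no such member.
```

The macro still yields the member name as a string literal, and a misspelled member is rejected by the compiler rather than at run time.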
The left operand of the comma operator is a discarded-value expression
5 Expressions
11 In some contexts, an expression only appears for its side effects. Such an expression is called a discarded-value
expression. The expression is evaluated and its value is discarded.
[...]
There are also unevaluated operands which, as the name implies, are not evaluated.
8 In some contexts, unevaluated operands appear (5.2.8, 5.3.3, 5.3.7,
7.1.6.2). An unevaluated operand is not evaluated. An unevaluated operand is considered a full-expression. [...]
Using a discarded-value expression in your use case is undefined behavior, but using an unevaluated operand is not.
Using sizeof for example would not cause UB because it takes an unevaluated operand.
#define STR_MEMBER(S,X) (sizeof(S::X), #X)
sizeof is preferable to offsetof, because offsetof can't be used for static members and classes that are not standard-layout:
18 Language support library
4 The macro offsetof(type, member-designator) accepts a restricted
set of type arguments in this International Standard. If type is not a
standard-layout class (Clause 9), the results are undefined. [...] The result of applying the offsetof macro to a field that
is a static data member or a function member is undefined. [...]
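The difference between the two approaches can be illustrated with a type that is not standard-layout. A minimal sketch under the wording quoted above; the types Plain and WithVirtual and the helpers is_offsetof_friendly and same_text are hypothetical, and the cast to void is an addition that only silences unused-value warnings:

```cpp
#include <type_traits>

// sizeof-based macro, with a void cast added to silence unused-value warnings.
#define STR_MEMBER(S, X) (static_cast<void>(sizeof(S::X)), #X)

// Hypothetical types for illustration.
struct Plain {
    int value;
};
struct WithVirtual {
    virtual ~WithVirtual() = default;  // makes the class non-standard-layout
    int value;
};

// Hypothetical helper: does T fall inside the set of types that the quoted
// offsetof wording covers?
template <typename T>
constexpr bool is_offsetof_friendly() {
    return std::is_standard_layout<T>::value;
}

// Minimal constexpr string comparison for the compile-time checks below.
constexpr bool same_text(const char* a, const char* b) {
    return *a == *b && (*a == '\0' || same_text(a + 1, b + 1));
}

static_assert(is_offsetof_friendly<Plain>(), "plain struct is standard-layout");
static_assert(!is_offsetof_friendly<WithVirtual>(), "virtual member breaks it");
static_assert(same_text(STR_MEMBER(WithVirtual, value), "value"),
              "sizeof-based macro still works for the non-standard-layout type");
```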
The language doesn't need to say anything about "actual execution" because of the as-if rule. After all, with no side effects how could you tell whether the expression is evaluated? (Looking at the assembly or setting breakpoints doesn't count; that's not part of execution of the program, which is all the language describes.)
On the other hand, dereferencing a null pointer is undefined behavior, so the language says nothing at all about what happens. You can't expect as-if to save you: as-if is a relaxation of otherwise-plausible restrictions on the implementation, and undefined behavior is a relaxation of all restrictions on the implementation. There is therefore no "conflict" between "this doesn't have side effects, so we can ignore it" and "this is undefined behavior, so nasal demons"; they're on the same side!

C++ Preincrement Undefined Operation vs C

I have this line of code:
front = (++front) % size;
In C I get no warnings but in C++ I get the warning operation on front may be undefined [-Wsequence-point]. How does this preincrement usage cause undefined behavior? In my mind, this line is very unambiguous and will be interpreted as:
increment front
mod front with size
assign new value to front.
Is my compiler just throwing a blanket warning?
P.S. I understand the warning if I were doing something like front = front++; or Heaven forbid front = front++ + front++;.
EDIT: This warning was produced in CodeBlocks on Windows 64 using GCC (tdm-1) 4.6.1
You are changing front twice between sequence points: once through ++, and once through assignment.
This is undefined behaviour.
In C++11 this is well-defined; the structure is the same as that of:
i = ++i + 1;
which is given as an example of well-defined behaviour in the Standard itself. For a more detailed explanation see AndreyT's answer here.
In C++03, C89 and C99 this is undefined behaviour as they had looser sequencing rules for ++i.
The old sequencing rules had edge cases where order of operations was well defined but which were still technically undefined behavior. With C++11 and C11 this has been fixed by replacing the sequence point requirements with 'sequenced-before' and 'sequenced-after' relations.
Your example happens to be such a case. If you're getting warnings in a C11 or C++11 mode then the warning simply hasn't been updated for the new rules yet. In earlier C and C++ modes the warning is correct. If you're not getting warnings in earlier modes then they simply weren't implemented, and that's okay as 'no diagnostic is required'.
At the same time, this line can be written more clearly and also be correct under the old rules:
front = (front + 1) % size;
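The rewritten statement modifies front exactly once, which is easy to verify in isolation. A minimal sketch; next_index is a hypothetical wrapper, and it assumes size > 0 and 0 <= front < size:

```cpp
// Hypothetical helper wrapping the rewritten statement:
//     front = (front + 1) % size;
// Assumes size > 0 and 0 <= front < size.
constexpr int next_index(int front, int size) {
    return (front + 1) % size;  // front is only read here, never modified
}

static_assert(next_index(0, 4) == 1, "advances by one");
static_assert(next_index(3, 4) == 0, "wraps around at the end");
```

A caller would write front = next_index(front, size); so the assignment is the only side effect on front in that statement.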
Writing the incremented value back to front can happen at any time (before or after the assignment modifies front), so the warning is valid and the code is unsafe.

Differences in C and C++ with sequence points and UB

I used this post, Undefined Behavior and Sequence Points, to document undefined behavior (UB) in a C program, and it was pointed out to me that C and C++ have their own divergent rules for this [sequence points]. So what are the differences between C and C++ when it comes to sequence points and related UB? Can’t I use a post about C++ sequences to analyze what is happening in C code?
* Of Course I am not talking about features of C++ not applicable to C.
There are two parts to this question. We can tackle a comparison of the sequence point rules without much trouble, but this does not get us too far: C and C++ are different languages with different standards (the latest C++ standard is almost twice as large as the latest C standard), and even though C++ uses C as a normative reference, it would be incorrect to quote the C++ standard for C and vice versa, regardless of how similar certain sections may be. The C++ standard does explicitly reference the C standard, but only for small sections.
The second part is a comparison of undefined behavior between C and C++. There can be some big differences, and enumerating all of them may not be possible, but we can give some indicative examples.
Sequence Points
Since we are talking about sequence points, this covers pre-C++11 and pre-C11. As far as I can tell, the sequence point rules do not differ greatly between the C99 and pre-C++11 draft standards. As we will see, in some of the examples I give of differing undefined behavior the sequence point rules play no part.
The sequence points rules are covered in the closest draft C++ standard to C++03 section 1.9 Program execution which says:
There is a sequence point at the completion of evaluation of each full-expression12).
When calling a function (whether or not the function is inline), there is a sequence point after the evaluation of all
function arguments (if any) which takes place before execution of any expressions or statements in the function body.
There is also a sequence point after the copying of a returned value and before the execution of any expressions outside
the function13). Several contexts in C++ cause evaluation of a function call, even though no corresponding function call
syntax appears in the translation unit. [ Example: evaluation of a new expression invokes one or more allocation and
constructor functions; see 5.3.4. For another example, invocation of a conversion function (12.3.2) can arise in contexts
in which no function call syntax appears. —end example ] The sequence points at function-entry and function-exit
(as described above) are features of the function calls as evaluated, whatever the syntax of the expression that calls the
function might be.
In the evaluation of each of the expressions
a && b
a || b
a ? b : c
a , b
using the built-in meaning of the operators in these expressions (5.14, 5.15, 5.16, 5.18), there is a sequence point after
the evaluation of the first expression14).
I will use the sequence point list from Annex C of the draft C99 standard; although it is not normative, I can find no disagreement between it and the normative sections it references. It says:
The following are the sequence points described in 5.1.2.3:
The call to a function, after the arguments have been evaluated (6.5.2.2).
The end of the first operand of the following operators: logical AND && (6.5.13);
logical OR || (6.5.14); conditional ? (6.5.15); comma , (6.5.17).
The end of a full declarator: declarators (6.7.5);
The end of a full expression: an initializer (6.7.8); the expression in an expression
statement (6.8.3); the controlling expression of a selection statement (if or switch)
(6.8.4); the controlling expression of a while or do statement (6.8.5); each of the
expressions of a for statement (6.8.5.3); the expression in a return statement
(6.8.6.4).
The following entries do not seem to have equivalents in the draft C++ standard but these come from the C standard library which C++ incorporates by reference:
Immediately before a library function returns (7.1.4).
After the actions associated with each formatted input/output function conversion
specifier (7.19.6, 7.24.2).
Immediately before and immediately after each call to a comparison function, and
also between any call to a comparison function and any movement of the objects
passed as arguments to that call (7.20.5).
So there is not much of a difference between C and C++ here.
Undefined Behavior
When it comes to the typical examples of sequence points and undefined behavior, for example those covered in section 5 Expressions dealing with modifying a variable more than once between sequence points, I cannot come up with an example that is undefined in one but not the other. In C99 it says:
Between the previous and next sequence point an object shall have its
stored value modified at most once by the evaluation of an
expression.72) Furthermore, the prior value shall be read only to
determine the value to be stored.73)
and it provides these examples:
i = ++i + 1;
a[i++] = i;
and in C++ it says:
Except where noted, the order of evaluation of operands of individual
operators and subexpressions of individual expressions, and the order
in which side effects take place, is unspecified.57) Between the
previous and next sequence point a scalar object shall have its stored
value modified at most once by the evaluation of an expression.
Furthermore, the prior value shall be accessed only to determine the
value to be stored. The requirements of this paragraph shall be met
for each allowable ordering of the subexpressions of a full
expression; otherwise the behavior is undefined
and provides these examples:
i = v[i++]; // the behavior is undefined
i = ++i + 1; // the behavior is undefined
In C++11 and C11 we do have one major difference, covered in Assignment operator sequencing in C11 expressions, which concerns the following expression:
i = ++i + 1;
It is well defined in C++11 but undefined in C11. This is due to the result of pre-increment being an lvalue in C++11 but not in C11, even though the sequencing rules are the same.
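Whatever the language revision, the expressions quoted from the standards can be rewritten so that each statement modifies i at most once. A minimal sketch, assuming one plausible reading of each expression's intent; the function names and demo_values are hypothetical:

```cpp
// One reading of "i = ++i + 1;": increment, then add one, in separate statements.
constexpr int increment_then_add_one(int i) {
    ++i;        // first modification, its own full-expression
    i = i + 1;  // second modification, sequenced after the first
    return i;
}

// One reading of "i = v[i++];", assuming the intent was to load v[i] into i.
constexpr int load_indexed(const int* v, int i) {
    i = v[i];   // i is read for the index, then modified once by the assignment
    return i;
}

// Hypothetical data for the compile-time checks.
constexpr int demo_values[3] = {10, 20, 30};

static_assert(increment_then_add_one(0) == 2, "0 -> 1 -> 2");
static_assert(load_indexed(demo_values, 1) == 20, "loads the element at index 1");
```

These forms mean the same thing in C and in every C++ revision, so the question of which standard applies does not arise.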
We do have major difference in areas that have nothing to do with sequence points:
In C, which uses of an indeterminate value are undefined has always been well specified, while in C++ this was not well specified until the recent draft C++1y standard. This is covered in my answer to Has C++ standard changed with respect to the use of indeterminate values and undefined behavior in C++1y?
Type punning through a union has always been well defined in C but not in C++ or at least it is hotly debatable whether it is undefined behavior or not. I have several references to this in my answer to Why does optimisation kill this function?
In C++, simply falling off the end of a value-returning function is undefined behavior, while in C it is only undefined behavior if you use the value.
There are probably plenty more examples but these are ones I have written about before.

Does this expression invoke undefined behavior? [duplicate]

This question already has answers here:
Why are these constructs using pre and post-increment undefined behavior?
(14 answers)
Undefined behavior and sequence points
(5 answers)
Closed 9 years ago.
This started from a joke:
Interviewer: What is the difference between C and C++?
Candidate: ONE
My question is whether the expressions abs(C++ - C) and abs(C - C++) invoke undefined behavior or not?
It depends on the type of C, but at the best (a user defined
type, where ++ is a function), it is unspecified whether the
second C is evaluated before or after the evaluation of
C.operator++.
Of course, for a built-in type, the expression is undefined
behavior, and for a user defined type, the final results will
also depend on how the user defined operator++, as well as the
compiler dependent order of evaluation.
Yes, this is undefined behaviour. The compiler will not make any promises on when the increment will happen if you reuse the same variable in the statement.
Yes, this is UB. From C99, section 6.5:
An expression is a sequence of operators and operands that specifies
computation of a value
Except as specified later (for the function-call (), &&, ||, ?:, and
comma operators), the order of evaluation of subexpressions and the
order in which side effects take place are both unspecified
Therefore there is no guarantee in the expression C++ - C as to when the post-increment is executed.
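If the goal is just the punchline of the joke, the reads and the increment can be put into separate statements so the order is explicit. A minimal sketch for a built-in int; the names abs_int and difference_after_increment are hypothetical, and c is assumed to be smaller than INT_MAX so the increment cannot overflow:

```cpp
// Hypothetical constexpr absolute value, to keep the sketch self-contained.
constexpr int abs_int(int x) {
    return x < 0 ? -x : x;
}

// Hypothetical well-defined counterpart of abs(C++ - C):
// read the old value, increment in its own statement, then subtract.
// Assumes c < INT_MAX so that ++c does not overflow.
constexpr int difference_after_increment(int c) {
    const int before = c;        // the value "C" had
    ++c;                         // "C++" as a separate statement
    return abs_int(c - before);  // difference between the two values
}

static_assert(difference_after_increment(41) == 1, "the difference is ONE");
```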