Warn about UB in argument evaluation order - C++

I recently faced a bug in code like this:
class C
{
public:
    // foo's return value depends on C's state
    // AND each call to foo changes the state.
    int foo(int arg); // foo is not const-qualified
private:
    // Some mutable state
};
C c;
bar(c.foo(42), c.foo(43));
The last call behaved differently on different platforms (which is perfectly legal due to undefined order of argument evaluation), and I fixed the bug.
But the rest of the codebase is large, and I would like to spot all other UB of this type.
Is there a special compiler warning in GCC, Clang, or MSVS for such cases?
And what is an idiomatic and lightweight way to prevent such bugs?

Argument order evaluation is unspecified rather than undefined.
Order of evaluation of the operands of almost all C++ operators (including the order of evaluation of function arguments in a function-call expression and the order of evaluation of the subexpressions within any expression) is unspecified. The compiler can evaluate operands in any order, and may choose another order when the same expression is evaluated again.
Since it is unspecified rather than undefined behavior, compilers are not required to issue diagnostics for it.
GCC and Clang do not have any general compiler option to issue diagnostics for unspecified behavior.
In GCC there is the option -fstrong-eval-order, which does this:
Evaluate member access, array subscripting, and shift expressions in left-to-right order, and evaluate assignment in right-to-left order, as adopted for C++17. Enabled by default with -std=c++17. -fstrong-eval-order=some enables just the ordering of member access and shift expressions, and is the default without -std=c++17.
There is also the option -Wreorder (C++ and Objective-C++ only) which does this:
Warn when the order of member initializers given in the code does not match the order in which they must be executed
But I do not think these options will be helpful in your particular case.
In the below statement, if you want the first argument to be evaluated before the second:
bar(c.foo(42), c.foo(43))
The simple way is to store the results of c.foo(42) and c.foo(43) in intermediate variables first and then call bar(). The two initializations are separate full statements, so their relative order is guaranteed regardless of optimization level; there is no need to turn off optimizations.
auto var1 = c.foo(42);
auto var2 = c.foo(43);
bar(var1, var2);
I guess that is how you must have fixed the bug.

Related

Is `x^=y^=x^=y;` correct for swapping integers in C/C++?

x^=y^=x^=y; is a tricky/amusing implementation of the XOR swap algorithm in C and C++. It parses as x^=(y^=(x^=y)); and uses the fact that assignment operators return the assigned value. But is it correct? The GCC 10.3.0 C compiler gives me the warning operation on ‘x’ may be undefined [-Wsequence-point], and Clang 12.0.0 gives warning: unsequenced modification and access to 'x' [-Wunsequenced]. Compiling as C++, Clang continues to warn the same way, and GCC stops. So is this code correct in either language? It looks rather sequenced to me, but maybe it's illegal to modify a variable twice in the same statement?
As pointed out in this answer, clang++ -std=c++17 does not give the warning. With -std=c++11 the situation is as described above. So maybe my question should be further broken down into C/C++11/C++17.
Add -std=c++17 to your compiler flags and you will not get the warning anymore.
C++17 added a sequencing guarantee that removes the undefined behavior here, and you need that guarantee for this code:
In every simple assignment expression E1=E2 and every compound assignment expression E1#=E2, every value computation and side effect of E2 is sequenced before every value computation and side effect of E1
That said, I suggest that you never use this trick in your code anyway.

Must constant evaluator reject undefined behavior (union example) in C++?

As far as I know, undefined behavior is required to be diagnosed during constant evaluation.
But if one takes an example of undefined behavior from C++20 standard class.union#6.3 with minor modification to activate constant evaluation:
struct X { const int a; int b; };
union Y { X x; int k; };

constexpr bool g() {
    Y y = { { 1, 2 } }; // OK, y.x is active union member ([class.mem])
    int n = y.x.a;
    y.k = 4;   // OK: ends lifetime of y.x, y.k is active member of union
    y.x.b = n; // undefined behavior: y.x.b modified outside its lifetime,
               // S(y.x.b) is empty because X's default constructor is deleted,
               // so union member y.x's lifetime does not implicitly start
    return y.x.b > 0;
}

int main() {
    static_assert( g() );
}
then it is accepted by all compilers without any warnings. Demo: https://gcc.godbolt.org/z/W7o4n5KrG
Are all compilers wrong here, or is there no undefined behavior in the example, or is no diagnostic required?
In the original versions of the C and C++ Standards, the phrase "Undefined Behavior" was intended to mean nothing more nor less than "the Standard imposes no requirements". There was no perceived need for the Standard to ensure that every possible execution of every possible construct either had unambiguously defined behavior or was readily and unambiguously recognizable as invoking Undefined Behavior.
Both the C and C++ drafts explicitly state that in cases where the Standard imposes no requirements, implementations may behave "in a documented manner characteristic of the environment". If there were some execution environment where cache lines were twice as large as int, and where storing an int value into the first half of cache line and zeroing the rest would be faster than a read-modify-write sequence necessary to update just the first half of the cache line while leaving the remainder undisturbed, an implementation for that platform might process the act of writing to y.k in a manner which would disturb the storage associated with y.x.b. On the other hand, for most environments the "characteristic behavior" of writing y.k would be to modify an int-sized chunk of storage, while leaving the remainder of the storage associated with the union undisturbed.
Treating the act of writing to y.k and then reading y.x.b as UB was intended to allow implementations to process the write to y.k in the fastest fashion, without having to consider whether code might care about the contents of y.x.b. It was not intended to require that implementations make any effort to prevent code from accessing y.x.b after writing y.k. Although C++ mandates that integer constant expressions within a template expansion be viewed as substitution failures in cases where they invoke certain actions upon which the Standard would otherwise impose no requirements, requiring that all such actions be treated as substitution failures would create contradictions where the Standard could be interpreted both as requiring that a compiler make a particular template substitution, and as requiring that it refrain from doing so.
Huh, I guess it is the compiler being a bit lax - but there is technically nothing undefined about this at compile time, as there is no way for y.x.a to ever be accessed. Indeed, if you change your definition of g to return y.x.a; instead of y.x.b > 0 then it does spit out an error message ("expression did not evaluate to a constant" on my machine).
When a compiler evaluates a constexpr expression, instead of compiling the relevant parts of the code, evaluation is (universally, as far as I'm aware, but don't quote me on that) delegated to an interpreter, and the constant result is then handed back to the compiler to be compiled along with the rest of the non-constexpr code. Interpreters are generally far worse at catching what we would call "compile-time errors", so if nothing is actually undefined about the execution of the code, this is probably good enough for the interpreter. For instance, there is some documentation on the Clang interpreter which shows that its execution model is very different from how one would expect the compiled code to run.

In the comma operator, is the left operand guaranteed not to be actually executed if it hasn't side effects?

To present the topic I'm going to use C, but the same macro can also be used in C++ (with or without struct), raising the same question.
I came up with this macro
#define STR_MEMBER(S,X) (((struct S*)NULL)->X, #X)
Its purpose is to obtain the name of an existing member of a struct as a string (const char*), so that if the member doesn't exist, the compilation fails. A minimal usage example:
#include <stdio.h>

struct a
{
    int value;
};

int main(void)
{
    printf("a.%s member really exists\n", STR_MEMBER(a, value));
    return 0;
}
If value weren't a member of struct a, the code wouldn't compile, and this is what I wanted.
The comma operator should evaluate the left operand and then discard its result (if there is one), so my understanding is that this operator is usually used when the evaluation of the left operand has side effects.
In this case, however, there are no (intended) side effects, and of course it works iff the compiler doesn't actually produce the code that evaluates the expression, for otherwise it would access a struct located at NULL and a segmentation fault would occur.
GCC/g++ 6.3 and 4.9.2 never produced that dangerous code, even with -O0, as if they were always able to “see” that the evaluation has no side effects and so it can be skipped.
Adding volatile in the macro (e.g. because accessing that memory address is the desired side effect) was so far the only way to trigger the segmentation fault.
So the question: is there anything in the C and C++ language standards which guarantees that compilers will always avoid actual evaluation of the left operand of the comma operator when they can be sure that the evaluation has no side effects?
Notes and fixing
I am not asking for a judgment about the macro as it is and the opportunity to use it or make it better. For the purpose of this question, the macro is bad if and only if it invokes undefined behaviour — i.e., if and only if it is risky because compilers are allowed to generate the “evaluation code” even when the evaluation has no side effects.
I have already two obvious fixes in mind: “reifying” the struct and using offsetof. The former needs an accessible memory area as big as the biggest struct we use as the first argument of STR_MEMBER (maybe a static union could do…). The latter should work flawlessly: it gives an offset we aren't interested in and avoids the access problem — indeed I'm assuming gcc, because it's the compiler I use (hence the tag), and that its offsetof built-in behaves sensibly.
With the offsetof fix the macro becomes
#define STR_MEMBER(S,X) (offsetof(struct S,X), #X)
With this version, writing volatile struct S instead of struct S doesn't cause the segfault.
Suggestions about other possible “fixes” are welcome, too.
Added note
Actually, the real usage case was in C++, in a struct with static storage. This seems to be fine in C++, but as soon as I tried C with code closer to the original instead of the version boiled down for this question, I realized that C isn't happy at all with it:
error: initializer element is not constant
C wants the initializer to be a compile-time constant, whereas C++ is fine with it.
Is there anything in the C and C++ language standards which guarantees that compilers will always avoid actual evaluation of the left operand of the comma operator?
It's the opposite. The standard guarantees that the left operand IS evaluated (really it does, there aren't any exceptions). The result is discarded.
Note: for lvalue expressions, "evaluate" does not mean "access the stored value". Instead, it means to work out where the designated memory location is. The other code encompassing the lvalue expression may or may not then go on to access the memory location. The process of reading from the memory location is known as "lvalue conversion" in C, or "lvalue to rvalue conversion" in C++.
In C++ a discarded-value expression (such as the left operand of the comma operator) only has lvalue to rvalue conversion performed on it if it is volatile and also meets some other criteria (see C++14 [expr]/11 for detail). In C lvalue conversion does occur for expressions whose result is not used (C11 6.3.2.1/2).
In your example, it is moot whether or not lvalue conversion happens. In both languages X->Y, where X is a pointer, is defined as (*X).Y; in C the act of applying * to a null pointer already causes undefined behaviour (C11 6.5.3/3), and in C++ the . operator is only defined for the case when the left operand actually designates an object (C++14 [expr.ref]/4.2).
The comma operator (the C documentation says something very similar) has no such guarantee.
In a comma expression E1, E2, the expression E1 is evaluated, its result is discarded [...], and its side effects are completed before evaluation of the expression E2 begins
[irrelevant information omitted]
To put it simply, E1 will be evaluated, although the compiler might optimize it away by the as-if rule if it is able to determine that there are no side-effects.
GCC/g++ 6.3 and 4.9.2 never produced that dangerous code, even with -O0, as if they were always able to “see” that the evaluation has no side effects and so it can be skipped.
clang will produce code which raises an error if you pass it the -fsanitize=undefined option. Which should answer your question: at least one major implementation's developers clearly consider the code as having undefined behaviour. And they are correct.
Suggestions about other possible “fixes” are welcome, too.
I would look for something which is guaranteed not to evaluate the expression. Your suggestion of offsetof does the job, but may occasionally cause code to be rejected that would otherwise be accepted, such as when X is a.b. If you want that to be accepted, my thought would be to use sizeof to force an expression to remain unevaluated.
You ask,
is there anything in the C and C++ language standards which guarantees that compilers will always avoid actual evaluation of the left operand of the comma operator when the compiler can be sure that the evaluation has no side effects?
As others have remarked, the answer is "no". On the contrary, the standards both unconditionally state that the left-hand operand of the comma operator is evaluated, and that the result is discarded.
This is of course a description of the execution model of an abstract machine; implementations are permitted to work differently, so long as the observable behavior is the same as the abstract machine behavior would produce. If indeed evaluation of the left-hand expression produces no side effects, then that would permit skipping it altogether, but there is nothing in either standard that provides for requiring that it be skipped.
As for fixing it, you have various options, some of which apply only to one or the other of the two languages you have named. I tend to like your offsetof() alternative, but others have noted that in C++, there are types to which offsetof cannot be applied. In C, on the other hand, the standard specifically describes its application to structure types, but says nothing about union types. Its behavior on union types, though very likely to be consistent and natural, is technically undefined.
In C only, you could use a compound literal to avoid the undefined behavior in your approach:
#define HAS_MEMBER(T,X) (((T){0}).X, #X)
That works equally well on structure and union types (though you need to provide a full type name for this version, not just a tag). Its behavior is well defined when the given type does have such a member. The expansion violates a language constraint -- thus requiring a diagnostic to be emitted -- when the type does not have such a member, including when it is neither a structure type nor a union type.
You might also use sizeof, as #alain suggested, because although the sizeof expression will be evaluated, its operand will not be evaluated (except, in C, when its operand has variably-modified type, which will not apply to your use). I think this variation will work in both C and C++ without introducing any undefined behavior:
#define HAS_MEMBER(T,X) (sizeof(((T *)NULL)->X), #X)
I have again written it so that it works for both structs and unions.
The left operand of the comma operator is a discarded-value expression
5 Expressions
11 In some contexts, an expression only appears for its side effects. Such an expression is called a discarded-value expression. The expression is evaluated and its value is discarded. [...]
There are also unevaluated operands which, as the name implies, are not evaluated.
8 In some contexts, unevaluated operands appear (5.2.8, 5.3.3, 5.3.7, 7.1.6.2). An unevaluated operand is not evaluated. An unevaluated operand is considered a full-expression. [...]
Using a discarded-value expression in your use case is undefined behavior, but using an unevaluated operand is not.
Using sizeof for example would not cause UB because it takes an unevaluated operand.
#define STR_MEMBER(S,X) (sizeof(S::X), #X)
sizeof is preferable to offsetof, because offsetof can't be used for static members and classes that are not standard-layout:
18 Language support library
4 The macro offsetof(type, member-designator) accepts a restricted set of type arguments in this International Standard. If type is not a standard-layout class (Clause 9), the results are undefined. [...] The result of applying the offsetof macro to a field that is a static data member or a function member is undefined. [...]
The language doesn't need to say anything about "actual execution" because of the as-if rule. After all, with no side effects how could you tell whether the expression is evaluated? (Looking at the assembly or setting breakpoints doesn't count; that's not part of execution of the program, which is all the language describes.)
On the other hand, dereferencing a null pointer is undefined behavior, so the language says nothing at all about what happens. You can't expect as-if to save you: as-if is a relaxation of otherwise-plausible restrictions on the implementation, and undefined behavior is a relaxation of all restrictions on the implementation. There is therefore no "conflict" between "this doesn't have side effects, so we can ignore it" and "this is undefined behavior, so nasal demons"; they're on the same side!

Why do GCC and Clang both not emit any warning?

Suppose we have code like this:
int check() {
    int x = 5;
    ++x; /* line 1 */
    return 0;
}

int main() {
    return check();
}
If line 1 is commented out and the compiler is started with all warnings enabled, it emits:
warning: unused variable ‘x’ [-Wunused-variable]
However, if we un-comment line 1, i.e., increment x, then no warning is emitted.
Why is that? Incrementing the variable is not really using it.
This happens in both GCC and Clang, for both C and C++.
Yes, it is using it.
x++ is essentially x = x + 1;, an assignment. When you assign to something, you cannot possibly avoid using it. The result is not discarded.
Also, from the online gcc manual, regarding -Wunused-variable option
Warn whenever a local or static variable is unused aside from its declaration.
So, when you comment out the x++;, the condition to generate and emit the warning message is satisfied. When you uncomment it, the usage is visible to the compiler (the "usefulness" of this particular "usage" is questionable, but it's a usage nonetheless) and there is no warning.
With the preincrement you are incrementing the value and assigning it to the variable again. It is like:
x = x + 1
As the gcc documentation says:
-Wunused-variable:
Warn whenever a local or static variable is unused aside from its declaration.
If you comment out that line, you are not using the variable aside from the line in which you declare it.
increasing variable not really using it.
Sure this is using it. It's doing a read and a write access on the stored object. This operation doesn't have any effect in your simple toy code, and the optimizer might notice that and remove the variable altogether. But the logic behind the warning is much simpler: warn iff the variable is never used.
This actually has the benefit that you can silence the warning in cases where it makes sense:
void someCallback(void *data)
{
(void)data; // <- this "uses" data
// [...] handler code that doesn't need data
}
Why is that? increasing variable not really using it.
Yes, it is really using it. At least from the language point of view. I would hope that an optimizer removes all trace of the variable.
Sure, that particular use has no effect on the rest of the program, so the variable is indeed redundant. I would agree that warning in this case would be helpful. But that is not the purpose of the warning about being unused, that you mention.
However, consider that analyzing whether a particular variable has any effect on the execution of the program in general is quite difficult. There has to be a point where the compiler stops checking whether a variable is actually useful. It appears that the warning-generation stages of the compilers you tested only check whether the variable is used at least once. That once was the increment operation.
I think there is a misconception about the word 'using' and what the compiler means by it. When you have ++i you are not only accessing the variable but also modifying it, and AFAIK this counts as 'use'.
There are limitations to what the compiler can identify about 'how' variables are being used, and whether the statements make any sense. In fact both Clang and GCC will try to remove unnecessary statements, depending on the -O flag (sometimes too aggressively). But these optimizations happen without warnings.
Detecting a variable that is never ever accessed or used though (there is no further statement mentioning that variable) is rather easy.
I agree with you: it could generate a warning for this. I think it doesn't, because the developers of the compilers just haven't bothered to handle this case (yet). Maybe it is too complicated to do. But maybe they will do it in the future (hint: you can suggest this warning to them).
Compilers are gaining more and more warnings. For example, there is -Wunused-but-set-variable in GCC (a "new" warning, introduced in GCC 4.6 in 2011), which warns about this:
void fn() {
    int a;
    a = 2;
}
So it is completely reasonable to expect that this emits a warning too (nothing is different here; neither piece of code does anything useful):
void fn() {
    int a = 1;
    a++;
}
Maybe they could add a new warning, like -Wmeaningless-variable
As per the C standard ISO/IEC 9899:201x, expressions are always evaluated so that their side effects are produced, unless the compiler can be sufficiently sure that removing the evaluation does not alter the program's execution.
5.1.2.3 Program execution
In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).
When the line
++x;
is removed, the compiler can deduce that the local variable x is defined and initialized, but never used.
When you add it back, the expression itself is considered a void expression that must be evaluated for its side effects, as stated in:
6.8.3 Expression and null statements
The expression in an expression statement is evaluated as a void expression for its side effects.
On the other hand, to remove compiler warnings about an unused variable, it is very common to cast the expression to void. E.g., for an unused parameter in a function you can write:
int MyFunc(int unused)
{
    (void)unused;
    ...
    return a;
}
In this case we have a void expression that references the symbol unused.

Rationale for [dcl.constexpr]p5 in the c++ standard

What is the rationale for [dcl.constexpr]p5 (http://eel.is/c++draft/dcl.constexpr#5)?
For a non-template, non-defaulted constexpr function or a non-template, non-defaulted, non-inheriting constexpr constructor, if no argument values exist such that an invocation of the function or constructor could be an evaluated subexpression of a core constant expression ([expr.const]), or, for a constructor, a constant initializer for some object ([basic.start.init]), the program is ill-formed; no diagnostic required.
If a program violates this rule, declaring the offending function constexpr was useless. So what? Isn't it better to accept useless uses of the decl-specifier constexpr than to trigger undefined behaviour (via "no diagnostic required")? In addition to the problem with undefined behaviour, we also have the extra complexity of keeping the rule [dcl.constexpr]p5 in the standard.
An implementation can still provide useful diagnostic messages in some cases that it is able to detect (warnings by convention). Just like in the following case:
int main() { 0; }
The expression in main there is well-formed but useless. Some compilers issue a diagnostic message anyway (and they are allowed to) in the form of a warning.
I understand that [dcl.constexpr]p5 cannot require diagnostics, so I'm not asking about that. I'm just asking why this rule is even in the standard.
The reason it's ill-formed is because making it ill-formed allows implementations to reject constexpr function definitions that cannot possibly form constant expressions. Rejecting them early means getting more useful diagnostics.
The reason no diagnostic is required is because it may be unrealistic for an implementation to determine that for each and every possible combination of arguments, the result is not a constant expression.
The fact that ill-formed, no diagnostic required, effectively means the same thing as making the behaviour undefined seems unfortunate to me, but it was presumably picked for lack of a better option. I'd be highly surprised if the intent were actually to allow any arbitrary run-time behaviour, but there is no concept of "may be diagnosed as an error, but if not, must behave as specified" for any language feature in C++.