Could a compiler optimisation give an lvalue rather than rvalue? - c++

The following code:
int x = 0;
x+0 = 10;
unsurprisingly produces the compiler error
lvalue required as left operand of assignment
However, is it guaranteed that all standards-conforming compilers will produce a similar error, or could a compiler legitimately treat line 2 as
x = 10;
which then would compile?

Yes, it is guaranteed you get an error (or more precisely, some diagnostic). Compiler optimization never makes ill-formed code well formed.

Yes, compilers must reject this.
A good description of value categories can be found here:
https://medium.com/#barryrevzin/value-categories-in-c-17-f56ae54bccbe
One of the takeaways is, the value category of an expression can be determined by looking at the type decltype ((expr)).
Compiler optimizations do not change the types of expressions, and they generally happen after name resolution, determination of types, overload resolution etc.
GCC is known to perform some constant folding in the front end, but I would be shocked if any version of GCC compiled your example.
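As a rough illustration of the decltype((expr)) rule mentioned above, the sketch below classifies a few expressions at compile time. The names Category, CategoryOf and VALUE_CATEGORY are made up for this example; they do not come from the linked article or from any library.

```cpp
#include <utility>

// Hypothetical helper names, introduced only for this illustration.
enum class Category { prvalue, lvalue, xvalue };

// decltype((expr)) yields T& for an lvalue, T&& for an xvalue and plain T for a prvalue.
template <class T> struct CategoryOf      { static constexpr Category get() { return Category::prvalue; } };
template <class T> struct CategoryOf<T&>  { static constexpr Category get() { return Category::lvalue; } };
template <class T> struct CategoryOf<T&&> { static constexpr Category get() { return Category::xvalue; } };

#define VALUE_CATEGORY(expr) (CategoryOf<decltype((expr))>::get())

inline void value_category_demo() {
    int x = 0;
    static_assert(VALUE_CATEGORY(x) == Category::lvalue, "a named variable is an lvalue");
    static_assert(VALUE_CATEGORY(x + 0) == Category::prvalue, "x + 0 is a prvalue, so it cannot be assigned to");
    static_assert(VALUE_CATEGORY(std::move(x)) == Category::xvalue, "std::move(x) is an xvalue");
    (void)x;  // x is only inspected, never modified
}
```

Because the classification is based purely on the type of the expression, no optimization level can turn x + 0 into an lvalue.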

Related

Warn about UB in argument evaluation order

I recently faced a bug in a code like this
class C
{
public:
// foo return value depends on C's state
// AND each call to foo changes the state.
int foo(int arg) /*foo is not const-qualified.*/ {}
private:
// Some mutable state
};
C c;
bar(c.foo(42), c.foo(43))
The last call behaved differently on different platforms (which is perfectly legal due to undefined order of argument evaluation), and I fixed the bug.
But the rest of the codebase is large and I would like to spot all other UB of this type.
Is there a special compiler warning in GCC, Clang or MSVS for such cases?
And what is the idiomatic and lightweight way to prevent such bugs?
Argument order evaluation is unspecified rather than undefined.
Order of evaluation of the operands of almost all C++ operators (including the order of evaluation of function arguments in a function-call expression and the order of evaluation of the subexpressions within any expression) is unspecified. The compiler can evaluate operands in any order, and may choose another order when the same expression is evaluated again.
Since it is unspecified rather than undefined behavior, compilers are not required to issue diagnostics for it.
GCC and Clang do not have any general compiler option to issue diagnostics for unspecified behavior.
In GCC there is the option -fstrong-eval-order which does this:
Evaluate member access, array subscripting, and shift expressions in left-to-right order, and evaluate assignment in right-to-left order, as adopted for C++17. Enabled by default with -std=c++17. -fstrong-eval-order=some enables just the ordering of member access and shift expressions, and is the default without -std=c++17.
There is also the option -Wreorder (C++ and Objective-C++ only) which does this:
Warn when the order of member initializers given in the code does not match the order in which they must be executed
But I do not think these options will be helpful in your particular case.
In the below statement, if you want the first argument to be evaluated before the second:
bar(c.foo(42), c.foo(43))
The simple way is to store the results of c.foo(42) and c.foo(43) in intermediate variables first and then call bar(). Each statement is fully evaluated before the next one starts, so the two calls happen in the written order; optimization cannot change that observable behavior.
auto var1 = c.foo(42);
auto var2 = c.foo(43);
bar(var1, var2);
I guess that is how you must have fixed the bug.
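A minimal sketch of two ways to force the order, assuming a stand-in class and a stand-in bar (Counter, bar, Args and the function names are hypothetical, not the asker's code). Besides intermediate variables, the elements of a braced initializer list are also evaluated strictly left to right.

```cpp
// Hypothetical stand-in for the class in the question: the result of foo
// depends on how many times it has been called before.
class Counter {
public:
    int foo(int arg) { return arg + calls_++; }
private:
    int calls_ = 0;
};

// Hypothetical bar: encodes both arguments so the order is visible in the result.
inline int bar(int a, int b) { return a * 100 + b; }

// Option 1: separate statements are sequenced one after another.
inline int call_with_locals(Counter& c) {
    const int first = c.foo(42);   // runs first
    const int second = c.foo(43);  // runs second
    return bar(first, second);
}

// Option 2: initializers inside braces are evaluated left to right.
struct Args { int first; int second; };

inline int call_with_braces(Counter& c) {
    Args args{c.foo(42), c.foo(43)};
    return bar(args.first, args.second);
}
```

Both functions give the same result for a fresh Counter, whereas bar(c.foo(42), c.foo(43)) may legitimately differ between compilers.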

In the comma operator, is the left operand guaranteed not to be actually executed if it hasn't side effects?

To show the topic I'm going to use C, but the same macro can be used also in C++ (with or without struct), raising the same question.
I came up with this macro
#define STR_MEMBER(S,X) (((struct S*)NULL)->X, #X)
Its purpose is to have strings (const char*) of an existing member of a struct, so that if the member doesn't exist, the compilation fails. A minimal usage example:
#include <stdio.h>
struct a
{
int value;
};
int main(void)
{
printf("a.%s member really exists\n", STR_MEMBER(a, value));
return 0;
}
If value weren't a member of struct a, the code wouldn't compile, and this is what I wanted.
The comma operator should evaluate the left operand and then discard the result of the expression (if there is one), so that my understanding is that usually this operator is used when the evaluation of the left operand has side effects.
In this case, however, there aren't (intended) side effects, but of course it works iff the compiler doesn't actually produce the code which evaluates the expression, for otherwise it would access a struct located at NULL and a segmentation fault would occur.
Gcc/g++ 6.3 and 4.9.2 never produced that dangerous code, even with -O0, as if they were always able to “see” that the evaluation hasn't side effects and so it can be skipped.
Adding volatile in the macro (e.g. because accessing that memory address is the desired side effect) was so far the only way to trigger the segmentation fault.
So the question: is there anything in the C and C++ languages standard which guarantees that compilers will always avoid actual evaluation of the left operand of the comma operator when the compiler can be sure that the evaluation hasn't side effects?
Notes and fixing
I am not asking for a judgment about the macro as it is and the opportunity to use it or make it better. For the purpose of this question, the macro is bad if and only if it evokes undefined behaviour — i.e., if and only if it is risky because compilers are allowed to generate the “evaluation code” even when this hasn't side effects.
I have already two obvious fixes in mind: “reifying” the struct and using offsetof. The former needs an accessible memory area as big as the biggest struct we use as first argument of STR_MEMBER (e.g. maybe a static union could do…). The latter should work flawlessly: it gives an offset we aren't interested in and avoids the access problem — indeed I'm assuming gcc, because it's the compiler I use (hence the tag), and that its offsetof built-in behaves.
With the offsetof fix the macro becomes
#define STR_MEMBER(S,X) (offsetof(struct S,X), #X)
Writing volatile struct S instead of struct S doesn't cause the segfault.
Suggestions about other possible “fixes” are welcome, too.
Added note
Actually, the real use case was in C++ in a static storage struct. This seems to be fine in C++, but as soon as I tried C with code closer to the original instead of the one boiled down for this question, I realized that C isn't happy at all with that:
error: initializer element is not constant
C wants the struct to be initializable at compile time, whereas C++ is fine with that.
Is there anything in the C and C++ languages standard which guarantees that compilers will always avoid actual evaluation of the left operand of the comma operator ?
It's the opposite. The standard guarantees that the left operand IS evaluated (really it does, there aren't any exceptions). The result is discarded.
Note: for lvalue expressions, "evaluate" does not mean "access the stored value". Instead, it means to work out where the designated memory location is. The other code encompassing the lvalue expression may or may not then go on to access the memory location. The process of reading from the memory location is known as "lvalue conversion" in C, or "lvalue to rvalue conversion" in C++.
In C++ a discarded-value expression (such as the left operand of the comma operator) only has lvalue to rvalue conversion performed on it if it is volatile and also meets some other criteria (see C++14 [expr]/11 for detail). In C lvalue conversion does occur for expressions whose result is not used (C11 6.3.2.1/2).
In your example, it is moot whether or not lvalue conversion happens. In both languages X->Y, where X is a pointer, is defined as (*X).Y; in C the act of applying * to a null pointer already causes undefined behaviour (C11 6.5.3/3), and in C++ the . operator is only defined for the case when the left operand actually designates an object (C++14 [expr.ref]/4.2).
The comma operator (C documentation, says something very similar) has no such guarantees.
In a comma expression E1, E2, the expression E1 is evaluated, its result is discarded ..., and its side effects are completed before evaluation of the expression E2 begins
irrelevant information omitted
To put it simply, E1 will be evaluated, although the compiler might optimize it away by the as-if rule if it is able to determine that there are no side-effects.
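A small sketch of the quoted rule, using an invented function name: when the left operand does have a side effect, that effect is visible after the comma expression, which shows that the operand is evaluated and only its value is thrown away.

```cpp
// comma_demo is a hypothetical name used only for this illustration.
inline int comma_demo() {
    int calls = 0;
    // E1 is ++calls, E2 is 10: E1 is evaluated, its value is discarded,
    // and the value of the whole expression is that of E2.
    int result = (++calls, 10);
    return calls * 100 + result;  // encodes both values in one number
}
```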
Gcc/g++ 6.3 and 4.9.2 never produced that dangerous code, even with -O0, as if they were always able to “see” that the evaluation hasn't side effects and so it can be skipped.
clang will produce code which raises an error if you pass it the -fsanitize=undefined option. Which should answer your question: at least one major implementation's developers clearly consider the code as having undefined behaviour. And they are correct.
Suggestions about other possible “fixes” are welcome, too.
I would look for something which is guaranteed not to evaluate the expression. Your suggestion of offsetof does the job, but may occasionally cause code to be rejected that would otherwise be accepted, such as when X is a.b. If you want that to be accepted, my thought would be to use sizeof to force an expression to remain unevaluated.
You ask,
is there anything in the C and C++ languages standard which guarantees
that compilers will always avoid actual evaluation of the left operand
of the comma operator when the compiler can be sure that the
evaluation hasn't side effects?
As others have remarked, the answer is "no". On the contrary, the standards both unconditionally state that the left-hand operand of the comma operator is evaluated, and that the result is discarded.
This is of course a description of the execution model of an abstract machine; implementations are permitted to work differently, so long as the observable behavior is the same as the abstract machine behavior would produce. If indeed evaluation of the left-hand expression produces no side effects, then that would permit skipping it altogether, but there is nothing in either standard that provides for requiring that it be skipped.
As for fixing it, you have various options, some of which apply only to one or the other of the two languages you have named. I tend to like your offsetof() alternative, but others have noted that in C++, there are types to which offsetof cannot be applied. In C, on the other hand, the standard specifically describes its application to structure types, but says nothing about union types. Its behavior on union types, though very likely to be consistent and natural, is technically undefined.
In C only, you could use a compound literal to avoid the undefined behavior in your approach:
#define HAS_MEMBER(T,X) (((T){0}).X, #X)
That works equally well on structure and union types (though you need to provide a full type name for this version, not just a tag). Its behavior is well defined when the given type does have such a member. The expansion violates a language constraint -- thus requiring a diagnostic to be emitted -- when the type does not have such a member, including when it is neither a structure type nor a union type.
You might also use sizeof, as @alain suggested, because although the sizeof expression will be evaluated, its operand will not be evaluated (except, in C, when its operand has variably-modified type, which will not apply to your use). I think this variation will work in both C and C++ without introducing any undefined behavior:
#define HAS_MEMBER(T,X) (sizeof(((T *)NULL)->X), #X)
I have again written it so that it works for both structs and unions.
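For illustration, here is how the sizeof variant might be exercised from C++. Sample, Either and has_member_demo are hypothetical names, and the (void) cast is an addition of this sketch whose only purpose is to silence "unused value" warnings.

```cpp
#include <cstddef>
#include <cstring>

struct Sample { int value; };     // hypothetical struct
union Either { int i; float f; }; // hypothetical union

// The operand of sizeof is never evaluated, so the null pointer is not dereferenced.
#define HAS_MEMBER(T, X) ((void)sizeof(((T *)NULL)->X), #X)

inline bool has_member_demo() {
    return std::strcmp(HAS_MEMBER(Sample, value), "value") == 0
        && std::strcmp(HAS_MEMBER(Either, f), "f") == 0;
}
```

Writing HAS_MEMBER(Sample, missing) would be rejected at compile time, which is the behavior the original macro was after.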
The left operand of the comma operator is a discarded-value expression
5 Expressions
11 In some contexts, an expression only appears for its side effects. Such an expression is called a discarded-value
expression. The expression is evaluated and its value is discarded.
[...]
There are also unevaluated operands which, as the name implies, are not evaluated.
8 In some contexts, unevaluated operands appear (5.2.8, 5.3.3, 5.3.7,
7.1.6.2). An unevaluated operand is not evaluated. An unevaluated operand is considered a full-expression. [...]
Using a discarded-value expression in your use case is undefined behavior, but using an unevaluated operand is not.
Using sizeof for example would not cause UB because it takes an unevaluated operand.
#define STR_MEMBER(S,X) (sizeof(S::X), #X)
sizeof is preferable to offsetof, because offsetof can't be used for static members and classes that are not standard-layout:
18 Language support library
4 The macro offsetof(type, member-designator) accepts a restricted
set of type arguments in this International Standard. If type is not a
standard-layout class (Clause 9), the results are undefined. [...] The result of applying the offsetof macro to a field that
is a static data member or a function member is undefined. [...]
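A short sketch of that difference, assuming a made-up class Widget that is not standard-layout (it has a virtual destructor) and that has a static member. The (void) cast is an addition of this sketch to avoid "unused value" warnings.

```cpp
#include <cstring>
#include <string>

// Hypothetical class: the virtual destructor makes it non-standard-layout,
// so offsetof is not guaranteed to work on it.
struct Widget {
    virtual ~Widget() {}
    std::string label;
    static int instances;  // only declared; sizeof does not need a definition
};

#define STR_MEMBER(S, X) ((void)sizeof(S::X), #X)

inline bool str_member_demo() {
    return std::strcmp(STR_MEMBER(Widget, label), "label") == 0
        && std::strcmp(STR_MEMBER(Widget, instances), "instances") == 0;
}
```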
The language doesn't need to say anything about "actual execution" because of the as-if rule. After all, with no side effects how could you tell whether the expression is evaluated? (Looking at the assembly or setting breakpoints doesn't count; that's not part of execution of the program, which is all the language describes.)
On the other hand, dereferencing a null pointer is undefined behavior, so the language says nothing at all about what happens. You can't expect as-if to save you: as-if is a relaxation of otherwise-plausible restrictions on the implementation, and undefined behavior is a relaxation of all restrictions on the implementation. There is therefore no "conflict" between "this doesn't have side effects, so we can ignore it" and "this is undefined behavior, so nasal demons"; they're on the same side!

How can I know if C++ compiler evaluates the expression at compile time?

I have a code like this
const int Value = 123 * 2 + GetOffset();
GetOffset is a constexpr function returning int.
How can I make sure this expression is indeed evaluated at compile time?
Why don't you use constexpr for Value too? I think it will ask the compiler to evaluate it,
constexpr int Value = 123 * 2 + GetOffset();
if the function GetOffset() is simple and meets the requirements of constexpr.
In C++11, the requirements are:
the function must have a non-void return type.
the function body cannot declare variables or define new types.
the body may contain only declarations, null statements and a single return statement.
Since GetOffset() returns int, it meets the first one.
You can't ensure the compiler does this. You generally need to enable optimization, including some level of function inlining. What these options are depends on your compiler and its version.
You can check the generated assembly to see if it contains a call to GetOffset or just uses a constant determined by the compiler.
What if you declare 'Value' as constexpr too? Actually you can probably never be sure whether something is evaluated at compile time; however, in this case there is no reason why it could not be.
One possibility is to use std::ratio. From section 20.10.1 of the C++11 standard:
This subclause describes the ratio library. It provides a class template ratio which exactly represents any finite rational number with a numerator and denominator representable by compile-time constants of type intmax_t.
So according to the standard, this would only be valid for a compile-time constant:
const int value = std::ratio<123 * 2 + GetOffset()>::num;
So this would guarantee that the expression is evaluated at compile time. However, it doesn't also guarantee that the expression is not evaluated at run time.
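A compilable version of that line might look like the sketch below. The body of GetOffset is invented for the example, since the question does not show it.

```cpp
#include <ratio>

// Hypothetical body; the question only says that GetOffset is constexpr and returns int.
constexpr int GetOffset() { return 7; }

// A template argument must be a constant expression, so this line only compiles
// if 123 * 2 + GetOffset() can be computed during translation.
const int value = std::ratio<123 * 2 + GetOffset()>::num;

static_assert(std::ratio<123 * 2 + GetOffset()>::num == 253, "246 + 7");
```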
You can't be absolutely sure; the compiler is only required to generate code with the specified behaviour, and calculating it at compile- or run-time would not change the behaviour.
However, the compiler is required to be able to evaluate this at compile time, since it can be used where only compile-time constants are allowed such as array sizes and template arguments; so there's no reason why a sane compiler shouldn't perform that obvious optimisation. If the compiler doesn't (at least with optimisations enabled), throw it away and find a better one.
You can check the assembly produced by the compiler to see whether it calculates the value; but this in itself doesn't guarantee that future builds will do the same.
Considering that I have not used C++ in over half a decade now, the chances of the suggestion being way off the mark are quite high, but what about using inline for the function?
If the function returns a certain predefined value, available at the compile time, then the compiler should be able to make use of that value.
Create a separate source file with the expression. Evaluate printf("#define MyExpression %d.\n", expression);. When building your project, compile this source file for the native system and execute it. Include the resulting output as a header in your regular sources.
If you want to confirm that the initializer is a constant expression then you can use the constexpr specifier:
constexpr int Value = 123 * 2 + GetOffset();
It will fail to compile if it isn't a constant expression.
It is theoretically unspecified whether a constexpr variable Value is actually calculated during translation - but in practice you can be sure it is.
Just assert it: static_assert(Value == 123 * 2 + GetOffset(), "constexpr");
Doesn't get any simpler than that.
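Putting the constexpr and static_assert suggestions together, a minimal sketch could look like this. The body of GetOffset, the array table and the helper table_size are assumptions made for the example.

```cpp
// Hypothetical body; the question does not show what GetOffset returns.
constexpr int GetOffset() { return 7; }

// constexpr on the variable makes the compiler reject a non-constant initializer.
constexpr int Value = 123 * 2 + GetOffset();

// static_assert needs a constant expression, so it doubles as a check.
static_assert(Value == 123 * 2 + GetOffset(), "constexpr");

// Value can be used where only compile-time constants are allowed, e.g. an array bound.
int table[Value];

inline int table_size() {
    return static_cast<int>(sizeof(table) / sizeof(table[0]));
}
```

If GetOffset stopped being usable in constant expressions, all three declarations would fail to compile rather than silently falling back to run-time evaluation.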

Testing endianness at compile-time: is this constexpr function correct according to the standard?

After some searching for a way to check endianness at compile-time I've come up with the following solution:
static const int a{1};
constexpr bool is_big_endian()
{
return *((char*)&(a)) == 0; // the first byte of 1 is 0 on a big-endian machine (it is 1 on little-endian)
}
GCC accepts this code only in some contexts where constexpr is required:
int b[is_big_endian() ? 12 : 25]; //works
std::array<int, is_big_endian() ? 12 : 25> c; //fails
For the second case, GCC says error: accessing value of ‘a’ through a ‘char’ glvalue in a constant expression. I couldn't find anything in the standard that forbids such thing. Maybe someone could clarify in which case GCC is correct?
This is what I get from Clang 3.1 ToT:
error: constexpr function never produces a constant expression
§5.19 [expr.const]
p1 Certain contexts require expressions that satisfy additional requirements as detailed in this sub-clause; other contexts have different semantics depending on whether or not an expression satisfies these requirements. Expressions that satisfy these requirements are called constant expressions.
p2 A conditional-expression is a core constant expression unless it involves one of the following as a potentially evaluated subexpression:
[...]
a reinterpret_cast (5.2.10);
So, (char*)&(a) is performed as a reinterpret_cast; as such, the function is never a valid constexpr function.
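The same byte inspection is unproblematic at run time. The sketch below is an alternative under that assumption, not a way to make the constexpr version valid; the function names are invented, and the check only distinguishes plain little-endian from plain big-endian layouts.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copies the object representation of v into a byte array and returns byte i.
inline unsigned char byte_at(std::uint32_t v, std::size_t i) {
    unsigned char bytes[sizeof v];
    std::memcpy(bytes, &v, sizeof v);
    return bytes[i];
}

// Little-endian: the least significant byte is stored first.
inline bool is_little_endian_runtime() {
    return byte_at(1u, 0) == 1;
}

// Big-endian: the least significant byte is stored last.
inline bool is_big_endian_runtime() {
    return byte_at(1u, sizeof(std::uint32_t) - 1) == 1;
}
```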
You should look into Boost.Detail.Endian
It is a mapping of several architectures to their endianness (through the macros BOOST_BIG_ENDIAN, BOOST_LITTLE_ENDIAN, and BOOST_PDP_ENDIAN). As far as I know, there is no actual way to determine the endianness at compile time, other than a list like this.
For an example implementation that uses Boost.Detail.Endian, you can see the library I'm hoping to get reviewed for submission to Boost: https://bitbucket.org/davidstone/endian/ (the relevant file is byte_order.hpp, but unsigned.hpp is necessary as well if you want to just use my implementation).
If N3620 - Network Byte Order Conversion is implemented, you'll be able to use the constexpr ntoh to check for endianness, but remember there are rare architectures like middle-endian and you'll never be able to support all of them.

Taking the address of a temporary object

§5.3.1 Unary operators, Section 3
The result of the unary & operator is a pointer to its operand. The operand shall be an lvalue or a qualified-id.
What exactly does "shall be" mean in this context? Does it mean it's an error to take the address of a temporary? I was just wondering, because g++ only gives me a warning, whereas comeau refuses to compile the following program:
#include <string>
int main()
{
&std::string("test");
}
g++ warning: taking address of temporary
comeau error: expression must be an lvalue or a function designator
Does anyone have a Microsoft compiler or other compilers and can test this program, please?
The word "shall" in the standard language means a strict requirement. So, yes, your code is ill-formed (it is an error) because it attempts to apply address-of operator to a non-lvalue.
However, the problem here is not an attempt of taking address of a temporary. The problem is, again, taking address of a non-lvalue. Temporary object can be lvalue or non-lvalue depending on the expression that produces that temporary or provides access to that temporary. In your case you have std::string("test") - a functional style cast to a non-reference type, which by definition produces a non-lvalue. Hence the error.
If you wished to take address of a temporary object, you could have worked around the restriction by doing this, for example
const std::string &r = std::string("test");
&r; // this expression produces address of a temporary
with the resultant pointer remaining valid as long as the temporary exists. There are other ways to legally obtain the address of a temporary object. It is just that your specific method happens to be illegal.
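Wrapped in a function, the workaround above can be sketched as follows (address_of_bound_temporary is a hypothetical name). Binding the temporary to a const reference extends its lifetime to that of the reference, so the pointer is safe to use inside the function.

```cpp
#include <string>

inline bool address_of_bound_temporary() {
    // The temporary lives as long as r does.
    const std::string& r = std::string("test");
    // r is an lvalue, so applying & to it is well-formed.
    const std::string* p = &r;
    return *p == "test";
}
```

The pointer must not be used after r goes out of scope, because the temporary is destroyed at that point.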
When the word "shall" is used in the C++ Standard, it means "must on pain of death" - if an implementation does not obey this, it is faulty.
It is permitted in MSVC with the deprecated /Ze (extensions enabled) option. It was allowed in previous versions of MSVC. It generates a diagnostic with all warnings enabled:
warning C4238: nonstandard extension used : class rvalue used as lvalue.
Unless the /Za option is used (enforce ANSI compatibility), then:
error C2102: '&' requires l-value
&std::string("test"); is asking for the address of the return value of the function call (we'll ignore as irrelevant the fact that this function is a ctor). It didn't have an address until you assign it to something. Hence it's an error.
The C++ standard is actually a requirement on conformant C++ implementations. In places it is written to distinguish between code that conformant implementations must accept and code for which conformant implementations must give a diagnostic.
So, in this particular case, a conformant compiler must give a diagnostic if the address of an rvalue is taken. Both compilers do, so they are conformant in this respect.
The standard does not forbid the generation of an executable if a certain input causes a diagnostic, i.e. warnings are valid diagnostics.
I'm not a standards expert, but it certainly sounds like an error to me. g++ very often only gives a warning for things that are really errors.
user defined conversion
struct String {
std::string str;
operator std::string*() {
return &str;
}
};
std::string *my_str = String{"abc"};