Do literals in C++ really evaluate?

It was always my understanding that l-values have to evaluate, but for kind of obvious and easily explained reasons. An identifier represents a region of storage, and the value is in that storage and must be retrieved. That makes sense. But a program needing to evaluate a literal (for example, the integer 21) doesn't quite make sense to me. The value is right there, how much more explicit can you get? Well, besides adding U for unsigned, or some other suffix. This is why I'm curious about literals needing to be evaluated, as I've only seen this mentioned in one place. Most books also switch up terminology, like "Primary Expression," "operand," or "subexpression" and the like, to the point where the lines begin to blur. In all this time I have yet to see a clear explanation for this particular thing. It seems like a waste of processing power.

An ordinary literal only needs to be evaluated during compilation, by the compiler.
A user-defined literal may also be evaluated at run time. For example, after including the <string> header, and making its ...s literals available by the directive using namespace std::string_literals;, then "Blah"s is a user-defined literal of type std::string. The "Blah" part is evaluated by the compiler, at compile time. The conversion to std::string, which involves dynamic allocation, necessarily happens at run time.
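As a minimal sketch of that distinction (the helper names makeBlah and checkBlah are made up for illustration, and C++14 or later is assumed for the s suffix): the ordinary literal can be checked with static_assert, while the std::string produced by "Blah"s is only constructed when the function runs.

```cpp
#include <cassert>
#include <string>

// An ordinary literal is a constant expression: the compiler knows its value.
static_assert(21 == 0x15, "evaluated entirely at compile time");

// Hypothetical helper: the characters of "Blah" are fixed at compile time,
// but the std::string object is built when this function is called.
std::string makeBlah() {
    using namespace std::string_literals;
    return "Blah"s;
}

// Hypothetical run-time check of the resulting string.
void checkBlah() {
    assert(makeBlah() == "Blah");
    assert(makeBlah().size() == 4);
}
```

Calling checkBlah() from main exercises the run-time part; the static_assert is checked without running anything.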

But a program needing to evaluate a literal (for example, the integer
21) doesn't quite make sense to me. The value is right there, how much
more explicit can you get?
Things are a little more complicated for floating point types. Consider the number 0.1. In binary it cannot be represented exactly and the closest floating point representation must be selected for it. If you input that number during runtime, the conversion of 0.1 to the binary representation has to respect the rounding mode (to nearest, upward, downward, toward zero). Strict treatment of floating point arithmetic suggests that conversion of the 0.1 floating point literal to the binary representation should also be performed respecting the rounding mode (which only becomes known during runtime) and therefore cannot be done entirely by the compiler: most of the work can be done at compile time, but the final rounding has to be performed during runtime, taking the rounding mode into account.
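A small sketch of the "closest representation" point, assuming double is IEEE-754 binary64, the default round-to-nearest mode, and a C++17 compiler for the hexadecimal literals. The helper names are hypothetical.

```cpp
#include <cassert>

// Hypothetical helper: true if the decimal literal 0.1 was converted to the
// nearest binary64 value, spelled here exactly as a hexadecimal literal.
bool tenthIsNearestDouble() {
    return 0.1 == 0x1.999999999999ap-4;
}

// Hypothetical helper: 0.5 is a power of two, so no rounding is involved.
bool halfIsExact() {
    return 0.5 == 0x1p-1;
}

// Run-time check of both claims under the stated assumptions.
void checkTenth() {
    assert(tenthIsNearestDouble());
    assert(halfIsExact());
}
```

The hexadecimal form shows the repeating pattern 1.999...a that gets cut off and rounded, which is why 0.1 needs a rounding decision and 0.5 does not.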

Related

How to express float constants precisely in source code

I have some C++11 code generated via a code generator that contains a large array of floats, and I want to make sure that the compiled values are precisely the same as the compiled values in the generator (assuming that both depend on the same float ISO norm)
So I figured the best way to do it is to store the values as hex representations and interpret them as float in the code.
Edit for Clarification: The code generator takes the float values and converts them to their corresponding hex representations. The target code is supposed to convert back to float.
It looks something like this:
const unsigned int data[3] = { 0x3d13f407U, 0x3ea27884U, 0xbe072dddU};
float const* ptr = reinterpret_cast<float const*>(&data[0]);
This works and gives me access to all the data elements as floats, but I recently stumbled upon the fact that this is actually undefined behavior and only works because my compiler resolves it the way I intended:
https://gist.github.com/shafik/848ae25ee209f698763cffee272a58f8
https://en.cppreference.com/w/cpp/language/reinterpret_cast.
The standard basically says that reinterpret_cast is not defined between POD pointers of different type.
So basically I have three options:
1) Use memcpy and hope that the compiler will be able to optimize this.
2) Store the data not as hex values but in a different way.
3) Use std::bit_cast from C++20.
I cannot use 3) because I'm stuck with C++11.
I don't have the resources to store the data array twice, so I would have to rely on the compiler to optimize this. Due to this, I don't particularly like 1) because it could stop working if I changed compilers or compiler settings.
So that leaves me with 2):
Is there a standardized way to express float values in source code so that they map to the exact float value when compiled? Does the ISO float standard define this in a way that guarantees that any compiler will follow the interpretation? I imagine if I deviate from the way the compiler expects, I could run the risk that the float "neighbor" of the number I actually want is used.
I would also take alternative ideas if there is an option 4 I forgot.
How to express float constants precisely in source code
Use hexadecimal floating point literals. Assuming some endianness for the hexes you presented:
float floats[] = { 0x1.27e80ep-5, 0x1.44f108p-2, -0x1.0e5bbap-3 };
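One way to convince yourself that these literals match the original bit patterns is a small check like the following. It is only a sketch: it assumes a 32-bit IEEE-754 float, a C++17 compiler for the hexadecimal literals, and the helper names bitsOf and checkLiterals are made up. memcpy is used for the comparison because it is well defined, unlike the reinterpret_cast in the question.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// Hypothetical helper: copies the object representation of a float into an
// integer of the same size.
std::uint32_t bitsOf(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// The three values from the question; the f suffix keeps them of type float.
const float floats[] = { 0x1.27e80ep-5f, 0x1.44f108p-2f, -0x1.0e5bbap-3f };
const std::uint32_t expectedBits[] = { 0x3d13f407u, 0x3ea27884u, 0xbe072dddu };

// Compares every literal against the bit pattern it was generated from.
void checkLiterals() {
    for (int i = 0; i < 3; ++i) {
        assert(bitsOf(floats[i]) == expectedBits[i]);
    }
}
```

Since the data is generated anyway, a check of this shape could be generated alongside it.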
If you have the generated code produce the full representation of the floating-point value—all of the decimal digits needed to show its exact value—then a C++ 11 compiler is required to parse the number exactly.
C++ 11 draft N3092 2.14.4 1 says, of a floating literal:
… The exponent, if present, indicates the power of 10 by which the significant [likely typo, should be “significand”] part is to be scaled. If the scaled value is in the range of representable values for its type, the result is the scaled value if representable, else the larger or smaller representable value nearest the scaled value, chosen in an implementation-defined manner…
Thus, if the floating literal does not have all the digits needed to show the exact value, the implementation may round it either upward or downward, as the implementation defines. But if it does have all the digits, then the value represented by the floating literal is representable in the floating-point format, and so its value must be the result of the parsing.
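To illustrate with one value (a sketch assuming a 32-bit IEEE-754 float; the helper names are hypothetical): the float nearest to 0.1 has the bit pattern 0x3dcccccd, and its exact decimal expansion is 0.100000001490116119384765625. Writing all of those digits leaves the compiler nothing to round.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// Hypothetical helper: returns the bit pattern of a float via memcpy.
std::uint32_t floatBits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Every decimal digit of the stored value is written out, so the literal is
// exactly representable and must be parsed to exactly this float.
const float exactDecimal = 0.100000001490116119384765625f;

// Checks the fully written literal against the expected bit pattern.
void checkExactDecimal() {
    assert(floatBits(exactDecimal) == 0x3dcccccdu);
}
```

This works in C++11 as well, since only ordinary decimal literals are involved.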
I have read some very valuable information here and would like to throw in an option that does not strictly answer the question, but could be a solution.
It might be problematic, but if so, I would like to discuss it.
The simple solution would be: Leave it as it is.
A short rundown of why I am hesitant about the suggested options:
memcpy relies on the compiler to optimize away the actual copy and understand that I only want to read the values. Since I have large arrays of data, I want to avoid a surprise in which a changed compiler setting suddenly increases the runtime and requires a fix on short notice.
bit_cast is only available from C++20. There are reference implementations but they basically use memcpy under the hood (see above).
hex float literals are only available from C++17
Directly writing the floats precisely... I don't know, it seems to be somewhat dangerous, because if I make a slight mistake I may end up with a data block that is slightly off and could have an impact on my classification results. A mistake like that would be a nightmare to spot.
So why do I think I can get away with an implementation that is strictly speaking undefined? The rationale is that the standard may not define it, but compiler manufacturers likely do, at least the ones I have worked with so far gave me exact results. The code has been running without major problems for a fairly long time, across dozens of code generator runs, and I would expect that a failed reinterpret_cast would break the conversion so severely that I would spot the result in my classification results right away.
Still not robust enough though. So my idea was to write a unit test that contains a significant number of hex-floats, do the reinterpret_cast and compare to reference float values for exact correspondence to tell me if a setting or compiler failed in this regard.
I have one doubt though: Is the assumption somewhat reasonable that a failed reinterpret_cast would break things spectacularly, or are the bets totally off when it comes to undefined behavior?
I am a bit worried that the compiler implementation might define the undefined behavior in a way that picks a float that is close to the hex value instead of the precise one (although I would wonder why), and that this happens only sporadically so that my unit test misses the problems.
So the endgame would be to unit test every single data entry against the corresponding reference float. Since the code is generated, I can generate the test as well. I think that should put all my worries to rest and make sure that I can get this to work across all possible compilers and compiler settings or be notified if anything breaks.

In which of these examples is conversion necessary?

Here are three examples, where I get a number from a function with some general non-double type (could be some sort of int, some sort of size_t, etc), and need to store that in a double.
My question is, is the code fine as is in all three examples, or do I need to do some conversion?
double x = getNotDouble(); //Set x = some number.
//Set x equal to division between two non-doubles:
double x = getNotDouble() / getAnotherNotDouble();
//Take non-double in constructor
class myClass
{
    double x;
public:
    myClass(someType notDoublex) : x(notDoublex) {}
};
Strictly speaking, a conversion is used whenever you assign a value of one type to a variable of another type. In this respect, a conversion is needed in all three cases, since all three cases assign a non-double value to a double variable.
However, needing a conversion is not the same as needing to specify a conversion. Some conversions are provided automagically by the compiler. When this happens, you do not need to specify a conversion unless the automatic conversion is not the one you wanted. So whether or not a conversion needs to be specified depends on what you want to achieve.
Each of your three cases is correct in certain situations, but not necessarily in all situations. At the same time, each of your three cases could be enhanced with an explicit conversion, which would at least serve as a reminder to future programmers (including you!) that the conversion is intentional. This could be particularly useful when there are integers and division involved, since an explicit conversion could confirm that the intent is to convert to double after the integer division (dropping the fractional part).
In the end, what you need to do depends upon what you want to accomplish. One program's feature is another program's bug, simply because the programs seek to accomplish different goals.
Note that I have taken the following statement at face value:
I get a number from a function [...] and need to store that in a double.
For the second example, the value being stored in a double is getNotDouble() / getAnotherNotDouble(). To make this fit the statement, I needed to interpret "function" in the mathematical sense, not the programming sense. That is, the division is the "function" producing the value to store in a double. Otherwise I would have two numbers from two C++ functions, and that is inconsistent with "a number from a function". So I read the question as asking whether or not a conversion is needed after the division.
If the intent was to ask if a conversion is needed before the division, the answer still depends upon what you want to accomplish. The behavior of division depends on its operands, not on what is done with the result. So if the operands are integers, then integer division is performed, and the result is an integer even if that resulting integer is then assigned to a floating point variable. Sometimes this is desired. Often not.
If you are storing the result of the division in a double because you want to store the fractional part of the quotient, then you would need to make sure at least one of the operands is a floating point value before the division is performed. (There are floating point types other than double, so "not double" is not enough to know if an explicit conversion is needed.) However, this is really a separate topic from what this question is nominally about, since this is about the division operator, while the question is nominally about storing values.
Your first and third example result in no loss of data, so I assume they're fine.
Your second example is where some loss of data takes place (integer division discards the fractional part), which you could avoid with a cast:
double x = static_cast<double>(getNotDouble()) / getAnotherNotDouble();
At least one of the operands has to be a double for the division itself to be carried out in double.
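A small sketch of the difference. The two getters below are hypothetical stand-ins for the question's functions and are assumed to return int; the other function names are made up as well.

```cpp
#include <cassert>

// Hypothetical stand-ins for the functions in the question.
int getNotDouble() { return 7; }
int getAnotherNotDouble() { return 2; }

// Integer division happens first; only its result is converted to double.
double quotientAfterIntegerDivision() {
    return getNotDouble() / getAnotherNotDouble();
}

// One operand is converted first, so the division is done in double.
double quotientWithCast() {
    return static_cast<double>(getNotDouble()) / getAnotherNotDouble();
}

// Shows both outcomes for 7 and 2.
void checkQuotients() {
    assert(quotientAfterIntegerDivision() == 3.0);
    assert(quotientWithCast() == 3.5);
}
```

Which of the two is correct depends on the intent, as discussed above; the cast only documents and enforces the floating-point reading.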

Optimising a simple equation for execution speed in C++

I am working on a program where speed is really important since everything is in a loop. I wanted to know which one of these two equations is faster to execute.
The first one is:
smoothing / (1 + smoothing)
where smoothing is a const unsigned int.
The second one would be:
1-1/(1+smoothing)
Will the first one be faster since there are fewer operators involved in the equation? Will the second one be faster because smoothing is only used one time? Is there another option that is faster than these two?
As others have pointed out, the expressions as-is will produce a 0 or a 1, respectively, due to integer arithmetic (whatever floating point result you may have expected will be lost). This can be solved by using floating point literals in your expression (e.g. smoothing / (1.0f + smoothing)), which will produce a floating point result.
That aside, you shouldn't worry too much over manual optimization at this level. Your compiler is able to optimize equivalent expressions on its own; your focus should be on writing what is most readable to you as a programmer.
If you fix the floating point issue mentioned above, gcc 7.2 produces equivalent assembly for both expressions, and that's with optimization disabled. So there's nothing to worry about. They're both just as "fast".
As well, if smoothing is indeed constant, the result of your expression is also constant, and does not need to be recalculated with every iteration of the loop. You can simply declare another constant variable whose value is the result of the expression.
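A minimal sketch of that last suggestion, assuming a hypothetical smoothing value of 4 and a made-up helper applySmoothing. The integer versions are kept only to show the truncation mentioned earlier.

```cpp
#include <cassert>

constexpr unsigned int smoothing = 4;  // hypothetical value for illustration

// Integer arithmetic: the fractional part is lost in both forms.
constexpr unsigned int integerFirst = smoothing / (1 + smoothing);
constexpr unsigned int integerSecond = 1 - 1 / (1 + smoothing);
static_assert(integerFirst == 0, "integer division truncates");
static_assert(integerSecond == 1, "integer division truncates");

// Floating-point version, computed once instead of in every loop iteration.
constexpr float factor = smoothing / (1.0f + smoothing);

// Hypothetical use inside the hot loop: only a multiplication remains.
float applySmoothing(float value) {
    return value * factor;
}

// Loose run-time check that the factor is about 0.8.
void checkFactor() {
    assert(factor > 0.79f && factor < 0.81f);
}
```

With the factor hoisted into a constant, the question of which expression is faster no longer affects the loop at all.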

Warning for inexact floating-point constants

Questions like "Why isn't 0.1+0.1+0.1+0.1+0.1+0.1+0.1+0.1 = 0.8?" got me thinking that...
... It would probably be nice to have the compiler warn about the floating-point constants that it rounds to the nearest representable in the binary floating-point type (e.g. 0.1 and 0.8 are rounded in radix-2 floating-point, otherwise they'd need an infinite amount of space to store the infinite number of digits).
I've looked up gcc warnings and so far found none for this purpose (-Wall, -Wextra, -Wfloat-equal, -Wconversion, -Wcoercion (unsupported or C only?), -Wtraditional (C only) don't appear to be doing what I want).
I haven't found such a warning in Microsoft Visual C++ compiler either.
Am I missing a hidden or rarely-used option?
Is there any compiler at all that has this kind of warning?
EDIT: This warning could be useful for educational purposes and serve as a reminder to those new to floating-point.
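The effect behind the quoted question can be reproduced with a few lines. This is a sketch that assumes IEEE-754 binary64 doubles, round-to-nearest, and no extended-precision intermediates; the function names are made up.

```cpp
#include <cassert>

// Adds the (already rounded) constant 0.1 eight times.
double sumOfEightTenths() {
    double sum = 0.0;
    for (int i = 0; i < 8; ++i) {
        sum += 0.1;
    }
    return sum;
}

// Under the stated assumptions the sum ends up one unit in the last place
// below the double that the literal 0.8 is rounded to.
void checkSum() {
    assert(sumOfEightTenths() != 0.8);
    assert(sumOfEightTenths() < 0.8);
}
```

Both 0.1 and 0.8 are rounded when the compiler converts them, and the additions round again, so the two sides do not have to agree.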
There is no technical reason the compiler could not issue such warnings. However, they would be useful only for students (who ought to be taught how floating-point arithmetic works before they start doing any serious work with it) and people who do very fine work with floating-point. Unfortunately, most floating-point work is rough; people throw numbers at the computer without much regard for how the computer works, and they accept whatever results they get.
The warning would have to be off by default to support the bulk of existing floating-point code. Were it available, I would turn it on for my code in the Mac OS X math library. Certainly there are points in the library where we depend on every bit of the floating-point value, such as places where we use extended-precision arithmetic, and values are represented across more than one floating-point object (e.g., we would have one object with the high bits of 1/π, another object with 1/π minus the first object, and a third object with 1/π minus the first two objects, giving us about 150 bits of 1/π). Some such values are represented in hexadecimal floating-point in the source text, to avoid any issues with compiler conversion of decimal numerals, and we could readily convert any remaining numerals to avoid the new compiler warning.
However, I doubt we could convince the compiler developers that enough people would use this warning or that it would catch enough bugs to make it worth their time. Consider the case of libm. Suppose we generally wrote exact numerals for all constants but, on one occasion, wrote some other numeral. Would this warning catch a bug? Well, what bug is there? Most likely, the numeral is converted to exactly the value we wanted anyway. When writing code with this warning turned on, we are likely thinking about how the floating-point calculations will be performed, and the value we have written is one that is suitable for our purpose. E.g., it may be a coefficient of some minimax polynomial we calculated, and the coefficient is as good as it is going to get, whether represented approximately in decimal or converted to some exactly-representable hexadecimal floating-point numeral.
So, this warning will rarely catch bugs. Perhaps it would catch an occasion where we mistyped a numeral, accidentally inserting an extra digit into a hexadecimal floating-point numeral, causing it to extend beyond the representable significand. But that is rare. In most cases, the numerals we use are either simple and short or are copied and pasted from software that has calculated them. On some occasions, we will hand-type special values, such as 0x1.fffffffffffffp0. A warning when an extra “f” slips into that numeral might catch a bug during compilation, but that error would almost certainly be caught quickly in testing, since it drastically alters the special value.
So, such a compiler warning has little utility: Very few people will use it, and it will catch very few bugs for the people who do use it.
The warning is in the source: when you write float, double, or long double including any of their respective literals. Obviously, some literals are exact but even this doesn't help much: the sum of two exact values may be inexact, e.g., if they have rather different scales. Having the compiler warn about inexact floating point constants would generate a false sense of security. Also, what are you meant to do about rounded constants? Writing the exact closest value explicitly would be error prone and obfuscate the intent. Writing them differently, e.g., writing 1.0 / 10.0 instead of 0.1 also obfuscates the intent and could yield different values.
There will be no such compiler switch and the reason is obvious.
We are writing down the binary components in decimal:
First fractional bit is 0.5
Second fractional bit is 0.25
Third fractional bit is 0.125
....
Do you see it? Due to the odd endings with the number 5 every bit needs
another decimal to represent it exactly. One bit needs one decimal, two bits
needs two decimals and so on.
So for fractional floating points it would mean that for most decimal numbers
you need 24(!) decimal digits for single precision floats and
53(!!) decimal digits for double precision.
Worse, the exact digits carry no extra information, they are pure artifacts
caused by the base change.
No one is going to write down 3.141592653589793115997963468544185161590576171875
for pi to avoid a compiler warning.
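The "one more decimal per bit" pattern can be checked with a short sketch. It relies on the identity 2^-n = 5^n / 10^n, so the fractional digits of 2^-n are the digits of 5^n padded with leading zeros to n places. The function name is hypothetical and n is assumed to be at least 1.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: exact decimal expansion of 2^-n for n >= 1.
std::string negativePowerOfTwo(int n) {
    std::string digits = "1";  // decimal digits of 5^k, most significant first
    for (int k = 0; k < n; ++k) {
        int carry = 0;
        // Multiply the digit string by 5, right to left.
        for (int i = static_cast<int>(digits.size()) - 1; i >= 0; --i) {
            int v = (digits[i] - '0') * 5 + carry;
            digits[i] = static_cast<char>('0' + v % 10);
            carry = v / 10;
        }
        if (carry != 0) {
            digits.insert(digits.begin(), static_cast<char>('0' + carry));
        }
    }
    // Pad with leading zeros so there are exactly n fractional digits.
    std::string padding(static_cast<std::size_t>(n) - digits.size(), '0');
    return "0." + padding + digits;
}

// The first bits from the list above, plus one more.
void checkPowers() {
    assert(negativePowerOfTwo(1) == "0.5");
    assert(negativePowerOfTwo(2) == "0.25");
    assert(negativePowerOfTwo(3) == "0.125");
    assert(negativePowerOfTwo(4) == "0.0625");
}
```

For n = 24 the result has 24 fractional digits, which is where the digit count quoted above for single precision comes from.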
I don't see how a compiler would know or that the compiler can warn you about something like that. It is only a coincidence that a number can be exactly represented by something that is inherently inaccurate.

Integer vs floating division -> Who is responsible for providing the result?

I've been programming for a while in C++, but suddenly had a doubt and wanted to clarify with the Stack Overflow community.
When an integer is divided by another integer, we all know the result is an integer and likewise, a float divided by float is also a float.
But who is responsible for providing this result? Is it the compiler or DIV instruction?
That depends on whether or not your architecture has a DIV instruction. If your architecture has both integer and floating-point divide instructions, the compiler will emit the right instruction for the case specified by the code. The language standard specifies the rules for type promotion and whether integer or floating-point division should be used in each possible situation.
If you have only an integer divide instruction, or only a floating-point divide instruction, the compiler will inline some code or generate a call to a math support library to handle the division. Divide instructions are notoriously slow, so most compilers will try to optimize them out if at all possible (eg, replace with shift instructions, or precalculate the result for a division of compile-time constants).
Hardware divide instructions almost never include conversion between integer and floating point. If you get divide instructions at all (they are sometimes left out, because a divide circuit is large and complicated), they're practically certain to be "divide int by int, produce int" and "divide float by float, produce float". And it'll usually be that both inputs and the output are all the same size, too.
The compiler is responsible for building whatever operation was written in the source code, on top of these primitives. For instance, in C, if you divide a float by an int, the compiler will emit an int-to-float conversion and then a float divide.
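The type rules the compiler applies can be observed directly in C++. The following is a sketch with a made-up function name; the static_asserts only restate what the usual arithmetic conversions prescribe.

```cpp
#include <cassert>
#include <type_traits>

// The result type of a division follows from the operand types.
static_assert(std::is_same<decltype(7 / 2), int>::value,
              "int / int is int");
static_assert(std::is_same<decltype(7.0f / 2), float>::value,
              "float / int is float");
static_assert(std::is_same<decltype(7.0f / 2.0), double>::value,
              "float / double is double");

// Hypothetical helper: the int operand is converted to float by the
// compiler before the floating-point divide is performed.
float divideFloatByInt(float numerator, int denominator) {
    return numerator / denominator;
}

// 7.0f / 2 keeps its fractional part because the divide is done in float.
void checkMixedDivision() {
    assert(divideFloatByInt(7.0f, 2) == 3.5f);
    assert(7 / 2 == 3);
}
```

Which machine instructions end up implementing the conversion and the divide is up to the compiler and the target.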
(Wacky exceptions do exist. I don't know, but I wouldn't put it past the VAX to have had "divide float by int" type instructions. The Itanium didn't really have a divide instruction, but its "divide helper" was only for floating point, you had to fake integer divide on top of float divide!)
The compiler will decide at compile time what form of division is required based on the types of the variables being used - at the end of the day a DIV (or FDIV) instruction of one form or another will get involved.
Your question doesn't really make sense. The DIV instruction doesn't do anything by itself. No matter how loud you shout at it, even if you try to bribe it, it doesn't take responsibility for anything.
When you program in a programming language [X], it is the sole responsibility of the [X] compiler to make a program that does what you described in the source code.
If a division is requested, the compiler decides how to make a division happen. That might happen by generating the opcode for the DIV instruction, if the CPU you're targeting has one. It might be by precomputing the division at compile-time, and just inserting the result directly into the program (assuming both operands are known at compile-time), or it might be done by generating a sequence of instructions which together emulate a division.
But it is always up to the compiler. Your C++ program doesn't have any effect unless it is interpreted according to the C++ standard. If you interpret it as a plain text file, it doesn't do anything. If your compiler interprets it as a Java program, it is going to choke and reject it.
And the DIV instruction doesn't know anything about the C++ standard. A C++ compiler, on the other hand, is written with the sole purpose of understanding the C++ standard, and transforming code according to it.
The compiler is always responsible.
One of the most important rules in the C++ standard is the "as if" rule:
The semantic descriptions in this International Standard define a parameterized nondeterministic abstract machine. This International Standard places no requirement on the structure of conforming implementations. In particular, they need not copy or emulate the structure of the abstract machine. Rather, conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below.
Which in relation to your question means it doesn't matter what component does the division, as long as it gets done. It may be performed by a DIV machine code, it may be performed by more complicated code if there isn't an appropriate instruction for the processor in question.
It can also:
Replace the operation with a bit-shift operation if appropriate and likely to be faster.
Replace the operation with a literal if the result is computable at compile time, or with a plain assignment if, for example, when processing x / y it can be shown at compile time that y will always be 1.
Replace the operation with an exception throw if it can be shown at compile time that it will always be an integer division by zero.
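Two of these replacements can be sketched in source form. Whether a given compiler actually emits a shift or folds the constant depends on the compiler and its settings; the code below only shows that the results are interchangeable, which is what the "as if" rule requires. The function names are hypothetical.

```cpp
#include <cassert>

// Both operands are known at compile time, so the quotient can be folded.
constexpr int folded = 84 / 4;
static_assert(folded == 21, "computed by the compiler");

// Unsigned division by a power of two...
unsigned int divideBy8(unsigned int x) {
    return x / 8;
}

// ...gives the same result as a right shift by three bits.
unsigned int shiftBy3(unsigned int x) {
    return x >> 3;
}

// Checks the equivalence over a small range of inputs.
void checkEquivalence() {
    for (unsigned int x = 0; x < 1000; ++x) {
        assert(divideBy8(x) == shiftBy3(x));
    }
}
```

Because the observable behavior is identical, the compiler is free to pick whichever form it considers cheaper.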
Practically
The C99 standard defines: "When integers are divided, the result of the / operator is the algebraic quotient with any fractional part discarded." It adds in a footnote that "this is often called 'truncation toward zero.'"
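C++11 and later specify the same behavior for integer division, so the rule can be checked with a short sketch (the function names are made up):

```cpp
#include <cassert>

// Truncation toward zero: the fractional part is discarded for both signs.
static_assert(7 / 2 == 3, "3.5 truncates to 3");
static_assert(-7 / 2 == -3, "-3.5 truncates to -3, not -4");
static_assert(-7 % 2 == -1, "the remainder takes the sign of the dividend");

// Hypothetical helpers wrapping the two operators.
int quotientOf(int a, int b) { return a / b; }
int remainderOf(int a, int b) { return a % b; }

// Quotient and remainder always recombine to the dividend.
void checkTruncation() {
    assert(quotientOf(-7, 2) == -3);
    assert(quotientOf(-7, 2) * 2 + remainderOf(-7, 2) == -7);
}
```

Rounding down instead would give -4 for -7 / 2, which is the behavior of floor division in some other languages.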
History
Historically, the language specification is responsible.
Pascal defines its operators so that using / for division always returns a real (even if you use it to divide 2 integers), and if you want to divide integers and get an integer result, you use the div operator instead. (Visual Basic has a similar distinction and uses the \ operator for integer division that returns an integer result.)
In C, it was decided that the same distinction should be made by casting one of the integer operands to a float if you wanted a floating point result. It's become convention to treat integer versus floating point types the way you describe in many C-derived languages. I suspect this convention may have originated in Fortran.