Correctly converting floating point in C++

What would be the correct/recommended way of telling the C++ compiler "only warn me of floating point conversions that I'm not aware of"?
In C, I would enable the warnings related to floating point conversions, and then I would use explicit C-style casts to silence warnings related to the conversions that are under control.
For example, computing a*a*a - b*b is quite prone to overflow in single precision floating point, so you might wish to compute it in double precision and only go single precision later:
double a = 443620.52;
double b = 874003.01;
float c = (float)(a*a*a - b*b);
The above C-style explicit cast would silence the compiler warning about the conversion from double to float.
After reading the C++ documentation about casts, I came to the conclusion that the correct way of doing this in C++ would be as follows:
double a = 443620.52;
double b = 874003.01;
float c = static_cast<float>(a*a*a - b*b);
But, is this really the correct way of doing this in C++?
I understand the rationale behind the static_cast syntax being ugly on purpose, so that you avoid casts completely if possible.
Yes, I can omit the explicit cast to float. But then I need to disable compiler warnings telling me of precision loss (or otherwise I'd get a number of irrelevant warnings that would make it difficult to notice really relevant warnings). And if I disable fp-related compiler warnings, I'd lose the possibility of being warned when I'm mistakenly losing precision in other code places.
So, what's the correct approach for floating point conversions in C++?

Yes
float c = static_cast<float>(a*a*a - b*b);
is the correct way of explicitly casting to float in C++. You can also do:
float c = (float)(a*a*a - b*b);
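but a "C-style" cast like that is considered bad style in C++: it will silently perform conversions (for example, casting away const or reinterpreting a pointer) that static_cast rejects, so static_cast hides fewer errors.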
Alternatively, if you are doing this a lot, you can define a function:
inline float flt(double d){return static_cast<float>(d);}
and then you can write:
float c = flt(a*a*a - b*b);
which is even more compact than the original C (the call will normally be inlined, so the helper adds no overhead beyond the conversion itself).
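If some narrowing conversions deserve a sanity check as well as a name, the helper idea above can be extended. The following is only a sketch: to_float and fits_in_float are hypothetical names, not part of any library, and the range check assumes that a value beyond the largest finite float should be treated as a mistake rather than converted.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: a named, greppable spelling for an intentional
// double -> float narrowing. The static_cast is what silences the warning.
inline float to_float(double d) {
    return static_cast<float>(d);
}

// Hypothetical helper: true when d is finite and its magnitude does not
// exceed the largest finite float, so the narrowing stays in range.
inline bool fits_in_float(double d) {
    return std::isfinite(d) &&
           std::fabs(d) <= static_cast<double>(std::numeric_limits<float>::max());
}
```

A call site could then read float c = fits_in_float(x) ? to_float(x) : 0.0f; with the fallback value chosen by the caller.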

As far as I am aware, there are three different ways to avoid the warning:
C-style cast
static_cast
Constructor-style cast (e.g. float c = float(a*a*a-b*b))
In the code example below, c1, c2 and c3 avoid the warnings:
int main()
{
double a = 443620.52;
double b = 874003.01;
// These three versions avoid the conversion warnings:
float c1 = (float)(a*a*a - b*b);
float c2 = static_cast<float>(a*a*a - b*b);
float c3 = float(a*a*a - b*b);
// Only these two give conversion warnings:
float c4(a*a*a - b*b);
float c5 = a*a*a - b*b;
(void)c1; // Just to avoid unused-variable warnings
(void)c2;
(void)c3;
(void)c4;
(void)c5;
}
Only c4 and c5 trigger a warning. Check the live demo to see the results.

Related

Floating Point Representation in Debugger in C vs C++(CLI)

A little background: I was working on some data conversion from C to C# by using a C++/CLI midlayer, and I noticed a peculiarity with the way the debugger shows floats and doubles, depending on which dll the code is executing in (see code and images below). At first I thought it had something to do with managed/unmanaged differences, but then I realized that if I completely left the C# layer out of it and only used unmanaged data types, the same behaviour was exhibited.
Test Case: To further explore the issue, I created an isolated test case to clearly identify the strange behaviour. I am assuming that anyone who may be testing this code already has a working Solution and dllimport/dllexport macros set up. Mine is called DLL_EXPORT. If you need a minimal working header file, let me know. Here the main application is in C and calls a function from a C++/CLI dll. I am using Visual Studio 2015 and both assemblies are 32 bit.
I am a bit concerned, as I am not sure if this is something I need to worry about or it's just something the debugger is doing (I am leaning towards the latter). And to be quite honest, I am just outright curious as to what's happening here.
Question: Can anyone explain the observed behaviour or at least point me in the right direction?
C - Calling Function
void floatTest()
{
float floatValC = 42.42f;
double doubleValC = 42.42;
//even if passing the address, behaviour is same as all others.
float retFloat = 42.42f;
double retDouble = 42.42;
int sizeOfFloatC = sizeof(float);
int sizeOfDoubleC = sizeof(double);
floatTestCPP(floatValC, doubleValC, &retFloat, &retDouble);
//do some dummy math to make compiler happy (i.e. no unused variable warnings)
sizeOfFloatC = sizeOfFloatC + sizeOfDoubleC;//break point here
}
C++/CLI Header
DLL_EXPORT void floatTestCPP(float floatVal, double doubleVal,
float *floatRet, double *doubleRet);
C++/CLI Source
//as you can see, there are no managed types in this function
void floatTestCPP(float floatVal, double doubleVal, float *floatRet, double *doubleRet)
{
float floatLocal = floatVal;
double doubleLocal = doubleVal;
int sizeOfFloatCPP = sizeof(float);
int sizeOfDoubleCPP = sizeof(double);
*floatRet = 42.42f;
*doubleRet = 42.42;
//do some dummy math to make compiler happy (no warnings)
floatLocal = (float)doubleLocal;//break point here
sizeOfDoubleCPP = sizeOfFloatCPP;
}
Debugger in C - break point on last line of floatTest()
Debugger in C++/CLI - break point on the second to last line of floatTestCPP()
Consider that the debugger used for C++/CLI is itself not necessarily written in C, C#, or C++.
MS libraries support the "R" format: A string that can round-trip to an identical number. I suspect this or a g format was used.
Without MS source code, the following is only a good supposition:
The debug output only needs to be enough to distinguish the double from other nearby doubles. So the code need not print "42.420000000000002"; "42.42" is sufficient, whatever format is used.
42.42 as an IEEE double is about 42.4200000000000017053025658242404460906982... and the debugger certainly need not print the exact value.
Potentially similar C code:
#include <math.h>
#include <stdio.h>
int main(void) {
puts("12.34567890123456");
double d = 42.42;
printf("%.16g\n", nextafter(d,0));
printf("%.16g\n", d);
printf("%.17g\n", d);
printf("%.16g\n", nextafter(d,2*d));
d = 1 / 3.0f;
printf("%.9g\n", nextafterf(d,0));
printf("%.9g\n", d);
printf("%.9g\n", nextafterf(d,2*d));
d = 1 / 3.0f;
printf("%.16g\n", nextafter(d,0));
printf("%.16g\n", d);
printf("%.16g\n", nextafter(d,2*d));
}
output
12.34567890123456
42.41999999999999
42.42
42.420000000000002 // this level of precision not needed.
42.42000000000001
0.333333313
0.333333343
0.333333373
0.3333333432674407
0.3333333432674408
0.3333333432674409
For your code to convert a double to text with sufficient textual precision and back to double to "round-trip" the number, see Printf width specifier to maintain precision of floating-point value.
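The round-trip idea can be sketched in C++ as well. This is a minimal illustration, not what the debugger does: to_round_trip_text and round_trips are hypothetical names, and the sketch assumes an IEEE double, the default "C" locale, and a finite input (a NaN never compares equal to itself).

```cpp
#include <cstdio>
#include <cstdlib>
#include <limits>
#include <string>

// Formats d with max_digits10 significant digits (17 for an IEEE double),
// which is enough for the text to identify the value uniquely.
inline std::string to_round_trip_text(double d) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.*g",
                  std::numeric_limits<double>::max_digits10, d);
    return buf;
}

// Parses the text back and reports whether the original value was recovered.
// Assumes d is finite.
inline bool round_trips(double d) {
    return std::strtod(to_round_trip_text(d).c_str(), nullptr) == d;
}
```

For 42.42 this yields the same 17-digit text as the %.17g line in the output above, which is more digits than a human reader needs but enough to recover the exact double.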

Significance of -Werror=old-style-cast?

I'm using code that casts some ints to floats for division.
size_t a;
uint8_t b, c;
a = (float)b / (float)c;
I was compiling with warning flags enabled and I got one for 'old cast'. Is there a better or proper way I should be casting these things? If so, how?
Old style casts are "C-style" casts.
-Werror=old-style-cast turns the usage of C-style casts into errors.
You should use the C++ casts.
Here you can use a static_cast :
size_t a; uint8_t b, c;
a = static_cast<float>(b) / static_cast<float>(c);
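Note that the assignment to a still converts the float quotient to size_t implicitly. A sketch that spells out both conversions might look like this; ratio and truncated_ratio are hypothetical names, and the code assumes c is non-zero.

```cpp
#include <cstddef>
#include <cstdint>

// Divides in floating point; static_cast replaces the C-style casts that
// -Wold-style-cast complains about. Assumes c != 0.
inline float ratio(std::uint8_t b, std::uint8_t c) {
    return static_cast<float>(b) / static_cast<float>(c);
}

// Storing the quotient in an integer type discards the fractional part,
// so that conversion is written explicitly as well.
inline std::size_t truncated_ratio(std::uint8_t b, std::uint8_t c) {
    return static_cast<std::size_t>(ratio(b, c));
}
```

With b = 7 and c = 2, ratio gives 3.5f and truncated_ratio gives 3.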

How to convert a float to an int in modern C++

As strange as it may seem, I can't find how to cleanly convert a float to an int.
This technique
int int_value = (int)(float_value + 0.5);
triggers a
warning: use of old-style cast
in gcc.
So, what is the modern-style, simple way to convert a float to an int ? (I accept the loss of precision of course)
As Josh pointed out in the comments, + 0.5 is not very reliable. For extra security you could combine a static_cast with std::round like so:
int int_value = static_cast<int>(std::round(float_value));
For the casting part, see this excellent post for an explanation.
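To see why the + 0.5 idiom is fragile, compare it with a rounding function on a negative input. The sketch below uses std::lround from <cmath>; round_to_int and add_half_and_truncate are hypothetical names, and neither function checks that the result fits in int.

```cpp
#include <cmath>

// Rounds half away from zero via std::lround, then narrows to int.
// Assumes the rounded value fits in int; no range check is done here.
inline int round_to_int(float value) {
    return static_cast<int>(std::lround(value));
}

// The "+ 0.5" idiom from the question, kept for comparison. The cast
// truncates toward zero, so negative inputs round the wrong way.
inline int add_half_and_truncate(float value) {
    return static_cast<int>(value + 0.5);
}
```

For -1.6f the first function returns -2, while the second returns -1 because -1.1 is truncated toward zero.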
try:
int int_value = static_cast<int>(float_value + 0.5);
FYI: different casts in C++ gave a very good explanation about those 4 casts introduced in C++.
You could also consider
int int_value = boost::lexical_cast<int>(float_value);
lexical_cast has the benefit of working for all primitive types, STL strings, etc. It also means you don't have to do the (float_value + 0.5) stuff.

Should I always use the appropriate literals for number types?

I'm often using the wrong literals in expressions, e.g. dividing a float by an int, like this:
float f = read_f();
float g = f / 2;
I believe that the compiler will in this case first convert the int literal (2) to float, and then apply the division operator. GCC and Clang have always let stuff like that pass, but Visual C++ warns about an implicit conversion. So I have to write it like this:
float f = read_f();
float g = f / 2.0f;
That got me wondering: Should I always use the appropriate literals for float, double, long etc.? I normally use int literals whenever I can get away with it, but I'm not sure if that's actually a good idea.
Is this a likely cause of subtle errors?
Is this only an issue for expressions or also for function parameters?
Are there warning levels for GCC or Clang that warn about such implicit conversions?
How about unsigned int, long int etc?
You should always explicitly indicate the type of literal that you intend to use. This will prevent problems when for example this sort of code:
float foo = 9.0f;
float bar = foo / 2;
changes to the following, truncating the result:
int foo = 9;
float bar = foo / 2;
It's a concern with function parameters as well when you have overloading and templates involved.
I know gcc has -Wconversion but I can't recall everything that it covers.
For integer values that fit in int I usually don't qualify those for long or unsigned as there is usually much less chance there for subtle bugs.
There's pretty much never an absolutely correct answer to a "should" question. Who's going to use this code, and for what? That's relevant here. But also, particularly for anything to do with floats, it's good to get into the habit of specifying exactly the operations you require. float*float is done in single precision, and anything involving a double is done in double precision. An int literal such as 2 is converted to the type of the other operand, so f / 2 stays in single precision when f is a float, whereas f / 2.0 is a double-precision division: the literal you write selects the operation.
The best answer here is What Every Computer Scientist Should Know About Floating-Point Arithmetic. I'd say don't tl;dr it, there are no simple answers with floating point.
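The effect of the literal's type can be checked at compile time. This is a small sketch under the usual arithmetic conversions; half_with_int_literal and half_with_float_literal are hypothetical names used only for illustration.

```cpp
#include <type_traits>
#include <utility>

// The result type of each division follows the usual arithmetic conversions.
static_assert(std::is_same<decltype(std::declval<float>() / 2), float>::value,
              "float / int literal is computed as float");
static_assert(std::is_same<decltype(std::declval<float>() / 2.0f), float>::value,
              "float / float literal stays float");
static_assert(std::is_same<decltype(std::declval<float>() / 2.0), double>::value,
              "float / double literal is computed as double");

// With an int literal and an int operand, the division is integer division
// and the quotient is truncated before it is converted to float.
inline float half_with_int_literal(int foo) {
    return static_cast<float>(foo / 2);
}

// With a float literal, foo is converted to float first, so the division
// is a floating-point division even though foo is an int.
inline float half_with_float_literal(int foo) {
    return foo / 2.0f;
}
```

For foo = 9 the first function returns 4 and the second returns 4.5, which is the situation described in the answer above when a variable's type changes from float to int.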

Locating numerical errors due to Integer division

Is there a g++ warning or other tool that can identify integer division (truncation toward zero)? I have thousands of lines of code with calculations that inevitably will have numerical errors typically due to "float = int/int" that need to be located. I need a reasonable method for finding these.
Try -Wconversion.
From gcc's man page:
Warn for implicit conversions that may alter a value. This includes conversions between real and integer, like "abs (x)" when "x" is "double"; conversions between signed and unsigned, like "unsigned ui = -1"; and conversions to smaller types, like "sqrtf (M_PI)". Do not warn for explicit casts like "abs ((int) x)" and "ui = (unsigned) -1", or if the value is not changed by the conversion like in "abs (2.0)". Warnings about conversions between signed and unsigned integers can be disabled by using -Wno-sign-conversion.
For C++, also warn for conversions between "NULL" and non-pointer types; confusing overload resolution for user-defined conversions; and conversions that will never use a type conversion operator: conversions to "void", the same type, a base class or a reference to them. Warnings about conversions between signed and unsigned integers are disabled by default in C++ unless -Wsign-conversion is explicitly enabled.
For the following sample program (test.cpp), I get this warning:
test.cpp: In function ‘int main()’:
test.cpp:7: warning: conversion to ‘float’ from ‘int’ may alter its value
#include <iostream>
int main()
{
int a = 2;
int b = 3;
float f = a / b;
std::cout << f;
return 0;
}
I have a hard time calling these numerical errors. You asked for integer calculations, and got the correct numbers for integer calculations. If those numbers aren't acceptable, then ask for floating point calculations:
int x = 3;
int y = 10;
int z = x / y;
// "1." is the same thing as "1.0", you may want to read up on
// "the usual arithmetic conversions." You could add some
// parentheses here, but they aren't needed for this specific
// statement.
double zz = 1. * x / y;
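The same point can be made with an explicit cast instead of multiplying by 1.0. A minimal sketch, with hypothetical function names and the assumption that b is non-zero:

```cpp
// Integer division: the quotient is truncated before it becomes a double.
// Assumes b != 0.
inline double truncated_quotient(int a, int b) {
    return a / b;
}

// Converting one operand first makes the division itself floating point;
// the other operand is then converted to double automatically.
inline double real_quotient(int a, int b) {
    return static_cast<double>(a) / b;
}
```

For a = 2 and b = 3, truncated_quotient returns 0 and real_quotient returns roughly 0.667.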
This page contains info about g++ warnings. If you've already tried -Wall then the only thing left could be the warnings in this link. On second look -Wconversion might do the trick.
Remark on -Wconversion of gcc:
Changing the type of the floating point variable from float to double makes the warning vanish:
$ cat 'file.cpp'
#include <iostream>
int main()
{
int a = 2;
int b = 3;
double f = a / b;
std::cout << f;
}
Compiling with $ g++-4.7 -Wconversion 'file.cpp' returns no warnings (and neither does $ clang++ -Weverything 'file.cpp').
Explanation:
The warning issued for float is not about the integer arithmetic, which is perfectly valid, but about the fact that float cannot represent all possible values of int exactly (large values that do not fit in float's 24-bit significand do fit in double). So assigning the RHS to f might change the value in the case of float, but not in the case of double. To be clear: the warning is caused not by int/int but by the assignment float = int.
For this see following questions: what the difference between the float and integer data type when the size is same in java, Storing ints as floats and Rounding to use for int -> float -> int round trip conversion
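The limit is easy to demonstrate. The sketch below assumes a 32-bit int and IEEE single and double precision; exact_in_float is a hypothetical name.

```cpp
// True when i survives the conversion to float unchanged. The comparison is
// done in double, which (under the assumptions above) represents every
// 32-bit int and every float exactly, so no out-of-range cast back to int
// is needed.
inline bool exact_in_float(int i) {
    const float f = static_cast<float>(i);
    return static_cast<double>(f) == static_cast<double>(i);
}
```

Under those assumptions, 16777216 (2^24) is preserved, while 16777217 is rounded to 16777216 and therefore changes value.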
However, with float, -Wconversion can still help to identify lines that may be affected, but it is not comprehensive and is not actually intended for that purpose. For the purpose of -Wconversion, see docs/gcc/Warning-Options.html and gcc.gnu.org/wiki/NewWconversion.
The following discussion may also be of interest: 'Implicit casting Integer calculation to float in C++'.
The best way to find such errors is to have really good unit tests. All alternatives are not good enough.
Have a look at this clang-tidy detection.
It catches cases like this:
d = 32 * 8 / (2 + i);
d = 8 * floatFunc(1 + 7 / 2);
d = i / (1 << 4);