Is 'float a = 3.0;' a correct statement? - c++

If I have the following declaration:
float a = 3.0 ;
is that an error? I read in a book that 3.0 is a double value and that I have to specify it as float a = 3.0f. Is it so?

It is not an error to declare float a = 3.0 : if you do, the compiler will convert the double literal 3.0 to a float for you.
However, you should use the float literal notation in specific scenarios.
For performance reasons:
Specifically, consider:
float foo(float x) { return x * 0.42; }
Here the compiler will emit conversions that you pay for at run time on every call: x is converted to double, the multiplication is done in double, and the result is converted back to float. To avoid this, write:
float foo(float x) { return x * 0.42f; } // OK, no conversion required
To avoid bugs when comparing results:
For example, the following comparison fails, because x holds 4.2 rounded to float while the literal 4.2 is a double:
float x = 4.2;
if (x == 4.2)
std::cout << "oops"; // Not executed!
We can fix it with the float literal notation :
if (x == 4.2f)
std::cout << "ok !"; // Executed!
(Note: of course, this is not how you should compare float or double numbers for equality in general)
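A common alternative to == is a comparison within a tolerance. The following is only a minimal sketch: the helper name nearly_equal and its default tolerance are assumptions made for illustration, and a suitable tolerance depends on the computation at hand.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: true if a and b differ by at most rel_tol
// relative to the larger magnitude. The default tolerance is an
// illustrative assumption, not a recommended constant.
bool nearly_equal(float a, float b, float rel_tol = 1e-5f) {
    assert(rel_tol >= 0.0f);
    const float scale = std::max(std::fabs(a), std::fabs(b));
    return std::fabs(a - b) <= rel_tol * scale;
}

// Reproduces the failing comparison: x holds 4.2 rounded to float,
// and is promoted back to double before being compared with 4.2.
bool exact_compare_with_double_literal() {
    float x = 4.2;
    return x == 4.2;
}
```

With these definitions, exact_compare_with_double_literal() returns false on IEEE-754 platforms, while nearly_equal(x, 4.2f) accepts the value regardless of which literal form was used to initialize x.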
To call the correct overloaded function (for the same reason):
Example:
void foo(float f) { std::cout << "\nfloat"; }
void foo(double d) { std::cout << "\ndouble"; }
int main()
{
foo(42.0); // calls double overload
foo(42.0f); // calls float overload
return 0;
}
As noted by Cyber, in a type deduction context, it is necessary to help the compiler deduce a float :
In case of auto :
auto d = 3; // int
auto e = 3.0; // double
auto f = 3.0f; // float
And similarly, in case of template type deduction :
void foo(float f) { std::cout << "\nfloat"; }
void foo(double d) { std::cout << "\ndouble"; }
template<typename T>
void bar(T t)
{
foo(t);
}
int main()
{
bar(42.0); // Deduce double
bar(42.0f); // Deduce float
return 0;
}

The compiler will turn any of the following literals into floats, because you declared the variable as a float.
float a = 3; // converted to float
float b = 3.0; // converted to float
float c = 3.0f; // float
It would matter if you used auto (or other type deduction methods), for example:
auto d = 3; // int
auto e = 3.0; // double
auto f = 3.0f; // float
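The types claimed in the comments can be checked by the compiler itself. This is a small sketch using std::is_same; the helper is_float and the function check_declarations are names invented for the example.

```cpp
#include <cassert>
#include <type_traits>

// The suffix (or its absence) fixes the type of the literal.
static_assert(std::is_same<decltype(3), int>::value, "3 is an int");
static_assert(std::is_same<decltype(3.0), double>::value, "3.0 is a double");
static_assert(std::is_same<decltype(3.0f), float>::value, "3.0f is a float");

// Hypothetical helper: reports whether the deduced argument type is float.
template <typename T>
constexpr bool is_float(T) { return std::is_same<T, float>::value; }

void check_declarations() {
    float b = 3.0;   // declared type wins: the double literal is converted
    auto e = 3.0;    // deduced from the literal: double
    auto f = 3.0f;   // deduced from the literal: float
    assert(is_float(b));
    assert(!is_float(e));
    assert(is_float(f));
}
```

Because the static_assert lines are evaluated at compile time, a mismatch would stop the build rather than surface at run time.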

Floating-point literals without a suffix are of type double. This is covered in the draft C++ standard, section 2.14.4 Floating literals:
[...]The type of a floating literal is double unless explicitly specified by a suffix.[...]
So is it an error to assign the double literal 3.0 to a float?
float a = 3.0;
No, it is not. The value is converted, which is covered in section 4.8 Floating point conversions:
A prvalue of floating point type can be converted to a prvalue of
another floating point type. If the source value can be exactly
represented in the destination type, the result of the conversion is
that exact representation. If the source value is between two adjacent
destination values, the result of the conversion is an
implementation-defined choice of either of those values. Otherwise,
the behavior is undefined.
We can read more details on the implications of this in GotW #67: double or nothing which says:
This means that a double constant can be implicitly (i.e., silently)
converted to a float constant, even if doing so loses precision (i.e.,
data). This was allowed to remain for C compatibility and usability
reasons, but it's worth keeping in mind when you do floating-point
work.
A quality compiler will warn you if you try to do something that's
undefined behavior, namely put a double quantity into a float that's
less than the minimum, or greater than the maximum, value that a float
is able to represent. A really good compiler will provide an optional
warning if you try to do something that may be defined but could lose
information, namely put a double quantity into a float that is between
the minimum and maximum values representable by a float, but which
can't be represented exactly as a float.
So there are caveats for the general case that you should be aware of.
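Whether a particular double value is one of the lossy cases can be tested with a round trip. The sketch below assumes float is narrower than double (as with IEEE-754 binary32 and binary64); the helper name is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper: true if d is finite, lies within float's finite
// range, and survives the round trip double -> float -> double unchanged.
bool exactly_representable_as_float(double d) {
    if (!std::isfinite(d)) return false;
    // Outside float's range the conversion is undefined, so reject first.
    if (std::fabs(d) > std::numeric_limits<float>::max()) return false;
    return static_cast<double>(static_cast<float>(d)) == d;
}

void demo_representability() {
    assert(exactly_representable_as_float(3.0));     // exact in both types
    assert(!exactly_representable_as_float(0.1));    // loses bits as a float
    assert(!exactly_representable_as_float(1e300));  // exceeds float's range
}
```

The range check comes before the cast on purpose: it keeps the helper away from the undefined case described in the quoted standard text.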
From a practical perspective, in this case the results will most likely be the same even though technically there is a conversion. We can see this by trying out the following code on godbolt:
#include <iostream>
float func1()
{
return 3.0; // a double literal
}
float func2()
{
return 3.0f ; // a float literal
}
int main()
{
std::cout << func1() << ":" << func2() << std::endl ;
return 0;
}
and we see that the code generated for func1 and func2 is identical with both clang and gcc:
func1():
movss xmm0, DWORD PTR .LC0[rip]
ret
func2():
movss xmm0, DWORD PTR .LC0[rip]
ret
As Pascal points out in this comment you won't always be able to count on this. Using 0.1 and 0.1f respectively causes the assembly generated to differ since the conversion must now be done explicitly. The following code:
float func1(float x )
{
return x*0.1; // a double literal
}
float func2(float x)
{
return x*0.1f ; // a float literal
}
results in the following assembly:
func1(float):
cvtss2sd %xmm0, %xmm0 # x, D.31147
mulsd .LC0(%rip), %xmm0 #, D.31147
cvtsd2ss %xmm0, %xmm0 # D.31147, D.31148
ret
func2(float):
mulss .LC2(%rip), %xmm0 #, D.31155
ret
Regardless of whether you can determine if the conversion will have a performance impact, using the correct type better documents your intention. Using an explicit conversion, for example static_cast, also helps to clarify that the conversion was intended rather than accidental; an accidental conversion may signify a bug or potential bug.
Note
As supercat points out, multiplication by e.g. 0.1 and 0.1f is not equivalent. I am just going to quote the comment because it was excellent and a summary probably would not do it justice:
For example, if f was equal to 100000224 (which is exactly
representable as a float), multiplying it by one tenth should yield a
result which rounds down to 10000022, but multiplying by 0.1f will
instead yield a result which erroneously rounds up to 10000023. If the
intention is to divide by ten, multiplication by double constant 0.1
will likely be faster than division by 10f, and more precise than
multiplication by 0.1f.
My original point was to demonstrate a false example given in another question, but this nicely demonstrates that subtle issues can exist even in toy examples.
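The numbers in the quoted comment can be reproduced directly. This sketch assumes IEEE-754 binary32 and binary64 arithmetic with the default round-to-nearest mode; the function names are made up for the example.

```cpp
#include <cassert>

// Multiply in double precision, then round once to float.
float tenth_via_double(float f) { return static_cast<float>(f * 0.1); }

// Multiply using the float nearest to 0.1, which is slightly above 0.1.
float tenth_via_float(float f) { return f * 0.1f; }

void demo_tenth() {
    const float f = 100000224.0f;  // exactly representable as a float
    // The exact tenth is 10000022.4, which should round down.
    assert(tenth_via_double(f) == 10000022.0f);
    // The error in 0.1f pushes the product past the halfway point.
    assert(tenth_via_float(f) == 10000023.0f);
}
```

The explicit static_cast in tenth_via_double also marks the narrowing as intentional, in line with the advice above.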

It's not an error in the sense that the compiler will reject it, but it is an error in the sense that it may not be what you want.
As your book correctly states, 3.0 is a value of type double. There is an implicit conversion from double to float, so float a = 3.0; is a valid definition of a variable.
However, at least conceptually, this performs a needless conversion. Depending on the compiler, the conversion may be performed at compile time, or it may be saved for run time. A valid reason for saving it for run time is that floating-point conversions are difficult and may have unexpected side effects if the value cannot be represented exactly, and it's not always easy to verify whether the value can be represented exactly.
3.0f avoids that problem: although technically, the compiler is still allowed to calculate the constant at run time (it always is), here, there is absolutely no reason why any compiler might possibly do that.

While not an error, per se, it is a little sloppy. You know you want a float, so initialize it with a float. Another programmer may come along and not be sure which part of the declaration is correct, the type or the initializer. Why not have them both be correct?
float Answer = 42.0f;

When you define a variable, it is initialized with the provided initializer. This may require converting the value of the initializer to the type of the variable that's being initialized. That's what's happening when you say float a = 3.0;: The value of the initializer is converted to float, and the result of the conversion becomes the initial value of a.
That's generally fine, but it doesn't hurt to write 3.0f to show that you're aware of what you're doing, and especially if you want to write auto a = 3.0f.

If you try out the following:
std::cout << sizeof(3.2f) <<":" << sizeof(3.2) << std::endl;
you will get output as:
4:8
This shows that 3.2f is a float occupying 4 bytes, whereas 3.2 is interpreted as a double occupying 8 bytes. These sizes are typical but implementation-defined, and they do not depend on whether the machine is 32-bit.
This should provide the answer that you are looking for.

The compiler deduces the type from the literal itself. For an unsuffixed floating-point literal that type is double, i.e. it would rather lose efficiency than precision.
If in doubt, use brace initializers to make the type explicit:
auto d = double{3}; // make a double
auto f = float{3}; // make a float
auto i = int{3}; // make an int
The story gets more interesting if you initialize from another variable, where type-conversion rules apply: while it is legal to construct a double from a literal, it can't be constructed from an int variable without possible narrowing:
auto xxx = double{i}; // warning! narrowing conversion of 'i' from 'int' to 'double'

Related

Data type float

While declaring a variable of type float, is it necessary to write f towards the end of the value? For example, float amount = .01 and float amount = 0.01f, here what does the f mean and how does it make any difference? Also, what is the role of the #include library file here?
It's not necessary: the compiler will make an appropriate numerical conversion for you.
0.01f is a literal of float type, whereas 0.01 is a double type.
Occasionally you need to discriminate explicitly, especially when working with templates or overloaded functions:
void foo(const float&){
// Pay me a bonus
}
void foo(const double&){
// Reformat my disk
}
int main(){
foo(1.f);
}
Finally, if you're leaning towards using a float over a double, then do read through this: Is using double faster than float?
It depends on how you define your variable. When specifying the type float in the definition, adding a trailing f is not necessary:
float amount = 0.1; /* This is fine, compiler knows the type of amount. */
Adding a superfluous suffix here (float amount = 0.1f;) might even be considered bad practice, as you repeat the type information, resulting in more edits when the type is changed.
In the context of type deduction, though, you have to give the f suffix:
auto amount = 0.1f; /* Without the suffix, the compiler deduces double. */
There are more subtle contexts in which type deduction occurs, e.g.
std::vector<float> vecOfFloats;
/* ... */
std::accumulate(vecOfFloats.cbegin(), vecOfFloats.cend(), 0.1f);
Here, the third argument is used to deduce the type in which std::accumulate accumulates. If you just call it like std::accumulate(..., 0.1);, the sum is accumulated in double: every element of vecOfFloats is converted from float to double before it is added, and the result is a double.
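The effect of the initial value's type is easiest to see with an int. In this sketch (function names are invented for illustration), the same vector sums to 0 or to 1 depending only on the literal passed as the third argument.

```cpp
#include <cassert>
#include <numeric>
#include <type_traits>
#include <vector>

// int initial value: every partial sum is truncated back to int.
int sum_with_int_init(const std::vector<float>& v) {
    return std::accumulate(v.cbegin(), v.cend(), 0);
}

// float initial value: the accumulation stays in float.
float sum_with_float_init(const std::vector<float>& v) {
    return std::accumulate(v.cbegin(), v.cend(), 0.0f);
}

void demo_accumulate() {
    const std::vector<float> v{0.5f, 0.25f, 0.25f};
    assert(sum_with_int_init(v) == 0);       // 0.5, 0.25, 0.25 each truncate to 0
    assert(sum_with_float_init(v) == 1.0f);  // exact in binary floating point
    // A double initial value makes the whole accumulation a double.
    static_assert(std::is_same<decltype(std::accumulate(v.cbegin(), v.cend(), 0.0)),
                               double>::value,
                  "result type follows the initial value");
}
```

The values 0.5 and 0.25 were picked because they are exactly representable, so the float sum compares equal to 1.0f without any tolerance.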
.01 is a double literal. There is an implicit conversion to float in the initialisation
float amount = .01;
.01f is a float literal. There is no conversion in the initialisation
float amount = .01f;
That depends...
You can do for example:
1)
float f = 3.14f;
In this case the literal 3.14 is explicitly given as a float... so everything is ok
2)
float f = 3.14;
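In this case 3.14 is actually a double, but the variable f is declared as a float, so when compiling, the number will be converted to a float, with the resulting loss of precision.
Since C++11 you could also write
auto f = 3.14;
or auto f{3.14};
In both cases the compiler takes exactly the type of the literal (both are doubles). Note that compilers implementing the original C++11 rule deduce std::initializer_list<double> for the braced form; the rule was later changed (N3922) so that it deduces double.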

Is it necessary to cast literal constants in templated functions?

I am writing a code that has to compile in single and double precision. The original version was only in double precision, but I am trying to enable single precision now by using templates.
My question is: is it necessary to cast the 1. and the 7.8 to the specified type with static_cast<TF>(1.) for instance, or will the compiler take care of it? I find the casting not remarkably pretty and prefer to stay away from it. (I have other functions that are much longer and that contain many more literal constants).
template<typename TF>
inline TF phih_stable(const TF zeta)
{
// Hogstrom, 1988
return 1. + 7.8*zeta;
}
Casting and implicit conversions are two different things. For this example, you could treat the template function as if it were two overloaded functions, but with the same code within. At the interface level (parameters, return value) the compiler will generate implicit conversions.
Now, the question you will have to ask yourself is this: Do those implicit conversions do what I want? If they do, just leave it as it is. If they don't, you could try to add explicit conversions (maybe use the function-style casting like TF(1.)) or, you could specialize this function for double and float.
Another option, less general but maybe it works here, is that you switch the code around, i.e. that you write the code for single-precision float and then let the compiler apply its implicit conversions. Since the conversions usually only go to the bigger type, it should fit for both double and float without incurring any overhead for float.
When you do:
return 1. + 7.8*zeta;
The literals 1. and 7.8 are double, so if zeta is a float, it will first be converted to double, then the whole computation will be done in double precision, and the result will be cast back to float. This is equivalent to:
return (float)(1. + 7.8 * (double)zeta);
Otherwise said, this is equivalent to calling phih_stable(double) and storing the result in a float, so your template would be useless for float.
If you want the computation to be made in single precision, you need the casts [1]:
return TF(1.) + TF(7.8) * zeta;
What about using 1.f and 7.8f? The problem is that (double)7.8f != 7.8 due to floating-point precision. The difference is around 1e-7; the actual stored value for 7.8f (assuming a 32-bit float) is:
7.80000019073486328125
While the actual stored value for 7.8 (assuming a 64-bit double) is:
7.79999999999999982236431605997
So you have to ask yourself if you accept this loss of precision.
You can compare the two following implementations:
template <class T>
constexpr T phih_stable_cast(T t) {
return T(1l) + T(7.8l) * t;
}
template <class T>
constexpr T phih_stable_float(T t) {
return 1.f + 7.8f * t;
}
And the following assertions:
static_assert(phih_stable_cast(3.4f) == 1. + 7.8f * 3.4f, "");
static_assert(phih_stable_cast(3.4) == 1. + 7.8 * 3.4, "");
static_assert(phih_stable_cast(3.4l) == 1. + 7.8l * 3.4l, "");
static_assert(phih_stable_float(3.4f) == 1.f + 7.8f * 3.4f, "");
static_assert(phih_stable_float(3.4) == 1. + 7.8 * 3.4, "");
static_assert(phih_stable_float(3.4l) == 1.l + 7.8l * 3.4l, "");
The last two assertions fail due to the loss of precision when the constants are written as float literals.
[1] To avoid losing precision when the function is used with long double, you should even write the constants as long double literals and convert them down: return TF(1.l) + TF(7.8l) * zeta;
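The type in which each form of the expression is evaluated can be inspected with decltype. This is a sketch; the alias names literal_expr_t and converted_expr_t are invented here and are not part of the question's code.

```cpp
#include <type_traits>
#include <utility>

// Type of the expression as written in the question, for a given TF.
template <typename TF>
using literal_expr_t = decltype(1. + 7.8 * std::declval<TF>());

// Type of the expression once each constant is converted to TF first.
template <typename TF>
using converted_expr_t = decltype(TF(1.) + TF(7.8) * std::declval<TF>());

static_assert(std::is_same<literal_expr_t<float>, double>::value,
              "double literals promote a float computation to double");
static_assert(std::is_same<converted_expr_t<float>, float>::value,
              "converted constants keep the computation in float");
static_assert(std::is_same<converted_expr_t<double>, double>::value,
              "the double instantiation is unaffected");
```

These checks only describe the arithmetic type of the expression; they say nothing about the numerical difference between 7.8 and 7.8f discussed above.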

Type of '?:' if the first operand is a constant expression

Consider the following code:
void f(float x)
{
x * (true ? 1.f : 0.0);
}
The type of declval(bool) ? declval(float) : declval(double) is double according to the C++ standard [expr.cond].
Does this mean that the above code has to be equivalent to:
void f(float x)
{
double(x) * 1.0;
}
Or is there a statement that allows an optimization in case the first operand of ?: is a compile time constant expression ?
Yes, it does mean that the above codes are equivalent.
Using RTTI we can check that at least both clang and g++ conform to the standard and give d (i.e. double) as the output of this program:
#include <iostream>
#include <typeinfo>
int main() {
float x = 3.;
auto val = x * (true ? 1.f : 0.0);
std::cout << typeid(val).name() << std::endl;
}
And the alternative way using C++11 type traits
#include <iostream>
#include <type_traits>
int main() {
float x = 3.;
auto val = x * (true ? 1.f : 0.0);
std::cout << std::boolalpha <<
std::is_same<decltype(val), double>::value << std::endl;
}
Outputs true.
A C++ compiler can optimize as it sees fit, provided that it does not alter the "observable behaviour" of a conforming program (§1.9p1, the so-called "as if" rule).
For example, if on a given platform it is known that multiplying by 1.0 is an identity transformation without the potential to trap, then the multiplication does not actually need to be performed. (This may or may not be true for a given architecture, since it is possible that multiplying a NaN value by 1.0 could trap. However, the compiler could also replace the multiplication by any other operation which would produce the same trap under the same circumstances.)
In the absence of traps and assuming that multiplication by 1.0 is an identity transform, the entire body of your function f can be eliminated, because the standard requires that the set of float values is a subset of the set of double values (possibly the same set). Consequently, the float->double->float round trip must return to the original value or trap. (§3.9.1p8: "The set of values of the type float is a subset of the set of values of the type double". §4.8p1: "A prvalue of floating point type can be converted to a prvalue of another floating point type. If the source value can be exactly represented in the destination type, the result of the conversion is that exact representation.")
So, yes, optimizations may be possible. But that does not affect the type of the ?: expression, in the case that the type is observable (for example, if the expression were to be used for template deduction or as the operand of decltype).

Is a float guaranteed to be preserved when transported through a double in C/C++?

Assuming IEEE-754 conformance, is a float guaranteed to be preserved when transported through a double?
In other words, will the following assert always be satisfied?
int main()
{
float f = some_random_float();
assert(f == (float)(double)f);
}
Assume that f could acquire any of the special values defined by IEEE, such as NaN and Infinity.
According to IEEE, is there a case where the assert will be satisfied, but the exact bit-level representation is not preserved after the transportation through double?
The code snippet is valid in both C and C++.
You don't even need to assume IEEE. C89 says in 3.1.2.5:
The set of values of the type float is a subset of the set of values
of the type double
And every other C and C++ standard says equivalent things. As far as I know, NaNs and infinities are "values of the type float", albeit values with some special-case rules when used as operands.
The fact that the float -> double -> float conversion restores the original value of the float follows (in general) from the fact that numeric conversions all preserve the value if it's representable in the destination type.
Bit-level representations are a slightly different matter. Imagine that there's a value of float that has two distinct bitwise representations. Then nothing in the C standard prevents the float -> double -> float conversion from switching one to the other. In IEEE that won't happen for "actual values" unless there are padding bits, but I don't know whether IEEE rules out a single NaN having distinct bitwise representations. NaNs don't compare equal to themselves anyway, so there's also no standard way to tell whether two NaNs are "the same NaN" or "different NaNs" other than maybe converting them to strings. The issue may be moot.
One thing to watch out for is non-conforming modes of compilers, in which they keep super-precise values "under the covers", for example intermediate results left in floating-point registers and reused without rounding. I don't think that would cause your example code to fail, but as soon as you're doing floating-point == it's the kind of thing you start worrying about.
From C99:
6.3.1.5 Real floating types
1 When a float is promoted to double or long double, or a double is promoted to long double, its value is unchanged.
2 When a double is demoted to float, a long double is demoted to double or float, or a value being represented in greater precision and range than required by its semantic type (see 6.3.1.8) is explicitly converted to its semantic type, if the value being converted can be represented exactly in the new type, it is unchanged...
I think this guarantees that a float->double->float conversion preserves the original float value.
The standard also defines the macros INFINITY and NAN in 7.12 Mathematics <math.h>:
4 The macro INFINITY expands to a constant expression of type float representing positive or unsigned infinity, if available; else to a positive constant of type float that overflows at translation time.
5 The macro NAN is defined if and only if the implementation supports quiet NaNs for the float type. It expands to a constant expression of type float representing a quiet NaN.
So, there's provision for such special values and conversions may just work for them as well (including for the minus infinity and negative zero).
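The guarantee can be spot-checked for ordinary and special values alike. The sketch below assumes an IEEE-754 implementation running in its default floating-point environment (no flush-to-zero); the helper name is hypothetical. NaN needs separate handling because it never compares equal to itself.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper: true if f keeps its value (and the sign of zero)
// after being widened to double and narrowed back to float.
bool survives_double_round_trip(float f) {
    const float back = static_cast<float>(static_cast<double>(f));
    if (std::isnan(f)) return std::isnan(back);  // NaN != NaN, so test the class
    return back == f && std::signbit(back) == std::signbit(f);
}

void demo_round_trip() {
    using lim = std::numeric_limits<float>;
    assert(survives_double_round_trip(0.1f));
    assert(survives_double_round_trip(-0.0f));
    assert(survives_double_round_trip(lim::max()));
    assert(survives_double_round_trip(lim::infinity()));
    assert(survives_double_round_trip(-lim::infinity()));
    assert(survives_double_round_trip(lim::quiet_NaN()));
}
```

This compares values, not bit patterns, so it does not settle the question of whether a NaN payload is preserved.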
The assertion will fail in flush-to-zero and/or denormalized-is-zero mode (e.g. code compiled with -mfpmath=sse, -ffast-math, etc., but also by default on heaps of compilers and architectures, such as Intel's C++ compiler) if f is denormalized.
You cannot produce a denormalized float in that mode though, but the scenario is still possible:
a) Denormalized float comes from external source.
b) Some libraries tamper with FPU modes but forget (or intentionally avoid) setting them back after each function call to it, making it possible for caller to mismatch normalization.
A practical example, which prints the following:
f = 5.87747e-39
f2 = 5.87747e-39
f = 5.87747e-39
f2 = 0
error, f != f2!
The example works both for VC2010 and GCC 4.3 but assumes that VC uses SSE for math as default and GCC uses FPU for math as default. The example may fail to illustrate the problem otherwise.
#include <limits>
#include <iostream>
#include <cmath>
#ifdef _MSC_VER
#include <xmmintrin.h>
#endif
// Helper (not used below): true if t is zero or a normalized value.
template <class T> bool normal(T t)
{
return (t == 0 || std::fabs(t) >= std::numeric_limits<T>::min());
}
void csr_flush_to_zero()
{
#ifdef _MSC_VER
_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
#else
unsigned csr = __builtin_ia32_stmxcsr();
csr |= (1 << 15);
__builtin_ia32_ldmxcsr(csr);
#endif
}
void test_cast(float f)
{
std::cout << "f = " << f << "\n";
double d = double(f);
float f2 = float(d);
std::cout << "f2 = " << f2 << "\n";
if(f != f2)
std::cout << "error, f != f2!\n";
std::cout << "\n";
}
int main()
{
float f = std::numeric_limits<float>::min() / 2.0;
test_cast(f);
csr_flush_to_zero();
test_cast(f);
}

Locating numerical errors due to Integer division

Is there a g++ warning or other tool that can identify integer division (truncation toward zero)? I have thousands of lines of code with calculations that inevitably will have numerical errors typically due to "float = int/int" that need to be located. I need a reasonable method for finding these.
Try -Wconversion.
From gcc's man page:
Warn for implicit conversions that may
alter a value. This includes
conversions between real and integer,
like "abs (x)" when "x" is "double";
conversions between signed and
unsigned, like "unsigned ui = -1"; and
conversions to smaller types, like
"sqrtf (M_PI)". Do not warn for
explicit casts like "abs ((int) x)"
and "ui = (unsigned) -1", or if the
value is not changed by the conversion
like in "abs (2.0)". Warnings about
conversions between signed and
unsigned integers can be disabled by
using -Wno-sign-conversion.
For C++, also warn for conversions
between "NULL" and non-pointer types;
confusing overload resolution for
user-defined conversions; and
conversions that will never use a type
conversion operator: conversions to
"void", the same type, a base class or
a reference to them. Warnings about
conversions between signed and
unsigned integers are disabled by
default in C++ unless
-Wsign-conversion is explicitly enabled.
For the following sample program (test.cpp), I get this warning:
test.cpp: In function ‘int main()’:
test.cpp:7: warning: conversion to ‘float’ from ‘int’ may alter its value
#include <iostream>
int main()
{
int a = 2;
int b = 3;
float f = a / b;
std::cout << f;
return 0;
}
I have a hard time calling these numerical errors. You asked for integer calculations, and got the correct numbers for integer calculations. If those numbers aren't acceptable, then ask for floating point calculations:
int x = 3;
int y = 10;
int z = x / y;
// "1." is the same thing as "1.0", you may want to read up on
// "the usual arithmetic conversions." You could add some
// parentheses here, but they aren't needed for this specific
// statement.
double zz = 1. * x / y;
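The same idea can be written with an explicit cast, which makes the intent visible at the point of division. This is a sketch with invented function names; both functions assume y is non-zero.

```cpp
#include <cassert>

// Integer division: the fractional part is discarded.
int truncating_quotient(int x, int y) { return x / y; }

// Converting one operand first makes the division floating-point.
double real_quotient(int x, int y) { return static_cast<double>(x) / y; }

void demo_quotients() {
    assert(truncating_quotient(3, 10) == 0);
    const double q = real_quotient(3, 10);
    assert(q > 0.29 && q < 0.31);
    // Casting the finished quotient is too late: 3 / 10 is already 0.
    const double late = static_cast<double>(3 / 10);
    assert(late == 0.0);
}
```

The last case is the pattern worth searching for: the cast or assignment target is floating-point, but both operands of the division are still integers.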
This page contains info about g++ warnings. If you've already tried -Wall then the only thing left could be the warnings in this link. On second look -Wconversion might do the trick.
Note: Completely edited the response.
Remark on -Wconversion of gcc:
Changing the type of the floating point variable from float to double makes the warning vanish:
$ cat 'file.cpp'
#include <iostream>
int main()
{
int a = 2;
int b = 3;
double f = a / b;
std::cout << f;
}
Compiling with $ g++-4.7 -Wconversion 'file.cpp' returns no warnings (nor does $ clang++ -Weverything 'file.cpp').
Explanation:
With float, the warning is not caused by the perfectly valid integer arithmetic. It is issued because float cannot represent all possible values of int exactly (large values fit in a double but not in a float). So the value might change when assigning the RHS to f in the case of float, but not in the case of double. To make it clear: the warning is triggered not by int/int but by the assignment float = int.
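The first int that a typical float cannot hold is 2^24 + 1. The sketch below assumes an IEEE-754 binary32 float and a double wide enough to hold every int exactly (checked by the static_assert); the helper name is hypothetical.

```cpp
#include <cassert>
#include <limits>

static_assert(std::numeric_limits<double>::digits >= std::numeric_limits<int>::digits,
              "sketch assumes double can hold every int exactly");

// Hypothetical helper: true if n is unchanged by a conversion to float.
// The comparison is done in double to avoid converting an out-of-range
// float back to int.
bool int_survives_float(int n) {
    return static_cast<double>(static_cast<float>(n)) == static_cast<double>(n);
}

void demo_int_to_float() {
    assert(int_survives_float(16777216));    // 2^24 is exact
    assert(!int_survives_float(16777217));   // 2^24 + 1 is rounded
    assert(int_survives_float(2 / 3));       // the truncated quotient 0 is exact
}
```

The last line mirrors the sample program: the quotient itself converts exactly, which is why the warning says the value may be altered rather than that it is.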
For this see following questions: what the difference between the float and integer data type when the size is same in java, Storing ints as floats and Rounding to use for int -> float -> int round trip conversion
However, when using float, -Wconversion could still be useful for identifying lines that may be affected, but it is not comprehensive and is not actually intended for that. For the purpose of -Wconversion see docs/gcc/Warning-Options.html and here gcc.gnu.org/wiki/NewWconversion
Possibly of interest is also following discussion 'Implicit casting Integer calculation to float in C++'
The best way to find such errors is to have really good unit tests. All alternatives are not good enough.
Have a look at this clang-tidy detection.
It catches cases like this:
d = 32 * 8 / (2 + i);
d = 8 * floatFunc(1 + 7 / 2);
d = i / (1 << 4);