What is a correct way to convert double to float in c++. Is the conversion implicit?
Question 1: Consider double d = 5.0; and float f;
Which one is correct?
f = d;
f = (float)d;
f = static_cast<float>(d);
Question 2: Now consider we have
char *buffer = readAllBuffer();
double *d = (double*)(buffer + offset);
float f;
Which one is now correct?
f = d[0];
f = (float)d[0];
f = static_cast<float>(d[0]);
Thanks in advance!
They all boil down to the same thing, and the use of arrays is a red herring. You can indeed write
float f = d;
Some folk argue that a static_cast makes code more readable as it sticks out so clearly. It can also defeat warnings that some compilers might issue if a less long-winded form is used.
Naturally of course since a double is a superset of float, you might lose precision. Finally, note that for
float f1 = whatever;
double d1 = f1;
float f2 = d1;
, the C++ standard insists that f1 and f2 must be the same value.
You do have one major issue. This is not allowed:
double *d = (double*)(buffer + offset);
It violates strict aliasing and quite possibly alignment requirements. Instead you need to use memcpy:
double d;
memcpy(&d, buffer + offset, sizeof d);
float f = d;
Either of the cast alternative can be substituted for the last line, the important change is from dereferencing an pointer with incorrect type and alignment to making a bytewise copy.
Related
While declaring a variable of type float, is it necessary to write f towards the end of the value? For example, float amount = .01 and float amount = 0.01f, here what does the f mean and how does it make any difference?Also, what is the role of #include library file here.
It's not necessary: the compiler will make an appropriate numerical conversion for you.
0.01f is a literal of float type, whereas 0.01 is a double type.
Occasionally you need to descriminate explicitly especially when working with templates or overloaded functions:
void foo(const float&){
// Pay me a bonus
}
void foo(const double&){
// Reformat my disk
}
int main(){
foo(1.f);
}
Finally, if you're leading towards using a float over a double, then do read through this: Is using double faster than float?
It depends on how you define your variable. When specifying the type float in the definition, adding a trailing f is not necessary:
float amount = 0.1; /* This is fine, compiler knows the type of amount. */
Adding a superfluous literal here (float amount = 0.1f;) might even be considered bad practice, as you repeat the type information, resulting in more edits when the type is changed.
In the context of type deduction though, you have to give the f literal:
auto amount = 0.1f; /* Without the literal, compiler deduces double. */
There are more subtle contexts in which type deduction occurs, e.g.
std::vector<float> vecOfFloats;
/* ... */
std::accumulate(vecOfFloats.cbegin(), vecOfFloats.cend(), 0.1f);
Here, the third argument is used to deduce the type on which std::accumulate operates. If you just call it like std::accumulate(..., 0.1);, a double to float conversion takes place for every element in vecOfFloats.
.01 is a double literal. There is an implicit conversion to float in the initialisation
float amount = .01;
.01f is a float literal. There is no conversion in the initialisation
float amount = .01f;
That depends...
You can do for example:
1)
float f = 3.14f;
In this case the literal 3.14 is explicitly given as a float... so everything is ok
2)
float f = 3.14;
In this case 3.14 is actually a double, but the variable f is declared as a float...so when compiling, the number will be casted to a float with the loss precision consequences of that case...
You could since c++11
auto f = 3.14;
Or auto f{3,14};
In both cases the compiler takes exactly the type of the literal...(both are doubles)
This question has probably being asked, but I searched and could not find the answer.
I'm implementing a toy virtual machine, where the OpCodes take the form:
std::tuple<int8_t, int64_t, int64_t> // instruction op1, op2
I'm trying to pack a double into one of the operands and read it back again when processing it. This doesn't work reliably.
double d = ...
auto a = static_cast<int64_t>(d);
auto b = static_cast<double>(a)
// sometimes, b != d
Is there a way to pack the bit representation of the double into an int64_t, and then read that bit pattern back get the same exact double as before?
static_cast performs a value conversion - the fractionary part is always lost. memcpy is what you are after.
double d = ...
int64_t a;
memcpy(&a, &d, sizeof(a));
double d2;
memcpy(&d2, &a, sizeof(d2));
Still, I would probably instead make the operands a union with a double and an int64_t (plus possibly other types that are interesting for your VM).
One way to make it work is to reinterpret the block of memory as int64_t/double, i.e. to do pointer casts:
double d = ...
auto *a = (int64_t*)&d;
auto *d2 = (double*)a;
auto b = *d2;
assert(d == b);
Note that we both assume here that double and int64_t are of the same size (64 bit). I don't remember now if it is a part of the standard.
Is there a way i could initialize a float type variable with hexadecimal number? what i want to do is say i have single precision representation for 4 which is 0x40800000 in hex. I want to do something like float a = 0x40800000 in which case it takes the hex value as integer. What can i do to make it treat as floating point number?
One option is to use type punning via a union. This is defined behaviour in C since C99 (previously this was implementation defined).
union {
float f;
uint32_t u;
} un;
un.u = 0x40800000;
float a = un.f;
As you tagged this C++, you could also use reinterpret_cast.
uint32_t u = 0x40800000;
float a = *reinterpret_cast<float*>(&u);
Before doing either of these, you should also confirm that they're the same size:
assert(sizeof(float) == sizeof(uint32_t));
You can do this if you introduce a temporary integer type variable, cast it to a floating point type and dereference it. You must be careful about the sizes of the types involved, and know that they may change. With my compiler, for example, this works:
unsigned i = 0x40800000;
float a = *(float*)&i;
printf("%f\n", a);
// output 4.00000
I'm not sure how you're getting your the value "0x40800000".
If that's coming in as an int you can just do:
const auto foo = 0x40800000;
auto a = *(float*)&foo;
If that's coming in as a string you can do:
float a;
sscanf("0x40800000", "0x%x", &a);
If I have the following declaration:
float a = 3.0 ;
is that an error? I read in a book that 3.0 is a double value and that I have to specify it as float a = 3.0f. Is it so?
It is not an error to declare float a = 3.0 : if you do, the compiler will convert the double literal 3.0 to a float for you.
However, you should use the float literals notation in specific scenarios.
For performance reasons:
Specifically, consider:
float foo(float x) { return x * 0.42; }
Here the compiler will emit a conversion (that you will pay at runtime) for each returned value. To avoid it you should declare:
float foo(float x) { return x * 0.42f; } // OK, no conversion required
To avoid bugs when comparing results:
e.g. the following comparison fails :
float x = 4.2;
if (x == 4.2)
std::cout << "oops"; // Not executed!
We can fix it with the float literal notation :
if (x == 4.2f)
std::cout << "ok !"; // Executed!
(Note: of course, this is not how you should compare float or double numbers for equality in general)
To call the correct overloaded function (for the same reason):
Example:
void foo(float f) { std::cout << "\nfloat"; }
void foo(double d) { std::cout << "\ndouble"; }
int main()
{
foo(42.0); // calls double overload
foo(42.0f); // calls float overload
return 0;
}
As noted by Cyber, in a type deduction context, it is necessary to help the compiler deduce a float :
In case of auto :
auto d = 3; // int
auto e = 3.0; // double
auto f = 3.0f; // float
And similarly, in case of template type deduction :
void foo(float f) { std::cout << "\nfloat"; }
void foo(double d) { std::cout << "\ndouble"; }
template<typename T>
void bar(T t)
{
foo(t);
}
int main()
{
bar(42.0); // Deduce double
bar(42.0f); // Deduce float
return 0;
}
Live demo
The compiler will turn any of the following literals into floats, because you declared the variable as a float.
float a = 3; // converted to float
float b = 3.0; // converted to float
float c = 3.0f; // float
It would matter is if you used auto (or other type deducting methods), for example:
auto d = 3; // int
auto e = 3.0; // double
auto f = 3.0f; // float
Floating point literals without a suffix are of type double, this is covered in the draft C++ standard section 2.14.4 Floating literals:
[...]The type of a floating literal is double unless explicitly specified by a suffix.[...]
so is it an error to assign 3.0 a double literal to a float?:
float a = 3.0
No, it is not, it will be converted, which is covered in section 4.8 Floating point conversions:
A prvalue of floating point type can be converted to a prvalue of
another floating point type. If the source value can be exactly
represented in the destination type, the result of the conversion is
that exact representation. If the source value is between two adjacent
destination values, the result of the conversion is an
implementation-defined choice of either of those values. Otherwise,
the behavior is undefined.
We can read more details on the implications of this in GotW #67: double or nothing which says:
This means that a double constant can be implicitly (i.e., silently)
converted to a float constant, even if doing so loses precision (i.e.,
data). This was allowed to remain for C compatibility and usability
reasons, but it's worth keeping in mind when you do floating-point
work.
A quality compiler will warn you if you try to do something that's
undefined behavior, namely put a double quantity into a float that's
less than the minimum, or greater than the maximum, value that a float
is able to represent. A really good compiler will provide an optional
warning if you try to do something that may be defined but could lose
information, namely put a double quantity into a float that is between
the minimum and maximum values representable by a float, but which
can't be represented exactly as a float.
So there are caveats for the general case that you should be aware of.
From a practical perspective, in this case the results will most likely be the same even though technically there is a conversion, we can see this by trying out the following code on godbolt:
#include <iostream>
float func1()
{
return 3.0; // a double literal
}
float func2()
{
return 3.0f ; // a float literal
}
int main()
{
std::cout << func1() << ":" << func2() << std::endl ;
return 0;
}
and we see that the results for func1 and func2 are identical, using both clang and gcc:
func1():
movss xmm0, DWORD PTR .LC0[rip]
ret
func2():
movss xmm0, DWORD PTR .LC0[rip]
ret
As Pascal points out in this comment you won't always be able to count on this. Using 0.1 and 0.1f respectively causes the assembly generated to differ since the conversion must now be done explicitly. The following code:
float func1(float x )
{
return x*0.1; // a double literal
}
float func2(float x)
{
return x*0.1f ; // a float literal
}
results in the following assembly:
func1(float):
cvtss2sd %xmm0, %xmm0 # x, D.31147
mulsd .LC0(%rip), %xmm0 #, D.31147
cvtsd2ss %xmm0, %xmm0 # D.31147, D.31148
ret
func2(float):
mulss .LC2(%rip), %xmm0 #, D.31155
ret
Regardless whether you can determine if the conversion will have a performance impact or not, using the correct type better documents your intention. Using an explicit conversions for example static_cast also helps to clarify the conversion was intended as opposed to accidental, which may signify a bug or potential bug.
Note
As supercat points out, multiplication by e.g. 0.1 and 0.1f is not equivalent. I am just going to quote the comment because it was excellent and a summary probably would not do it justice:
For example, if f was equal to 100000224 (which is exactly
representable as a float), multiplying it by one tenth should yield a
result which rounds down to 10000022, but multiplying by 0.1f will
instead yield a result which erroneously rounds up to 10000023. If the
intention is to divide by ten, multiplication by double constant 0.1
will likely be faster than division by 10f, and more precise than
multiplication by 0.1f.
My original point was to demonstrate a false example given in another question but this finely demonstrates subtle issues can exist in toy examples.
It's not an error in the sense that the compiler will reject it, but it is an error in the sense that it may not be what you want.
As your book correctly states, 3.0 is a value of type double. There is an implicit conversion from double to float, so float a = 3.0; is a valid definition of a variable.
However, at least conceptually, this performs a needless conversion. Depending on the compiler, the conversion may be performed at compile time, or it may be saved for run time. A valid reason for saving it for run time is that floating-point conversions are difficult and may have unexpected side effects if the value cannot be represented exactly, and it's not always easy to verify whether the value can be represented exactly.
3.0f avoids that problem: although technically, the compiler is still allowed to calculate the constant at run time (it always is), here, there is absolutely no reason why any compiler might possibly do that.
While not an error, per se, it is a little sloppy. You know you want a float, so initialize it with a float.Another programmer may come along and not be sure which part of the declaration is correct, the type or the initializer. Why not have them both be correct?
float Answer = 42.0f;
When you define a variable, it is initialized with the provided initializer. This may require converting the value of the initializer to the type of the variable that's being initialized. That's what's happening when you say float a = 3.0;: The value of the initializer is converted to float, and the result of the conversion becomes the initial value of a.
That's generally fine, but it doesn't hurt to write 3.0f to show that you're aware of what you're doing, and especially if you want to write auto a = 3.0f.
If you try out the following:
std::cout << sizeof(3.2f) <<":" << sizeof(3.2) << std::endl;
you will get output as:
4:8
that shows, size of 3.2f is taken as 4 bytes on 32-bit machine wheres 3.2 is interpreted as double value taking 8 bytes on 32-bit machine.
This should provide the answer that you are looking for.
The compiler deduces the best-fitting type from literals, or at leas what it thinks is best-fitting. That is rather lose efficiency over precision, i.e. use a double instead of float.
If in doubt, use brace-intializers to make it explicit:
auto d = double{3}; // make a double
auto f = float{3}; // make a float
auto i = int{3}; // make a int
The story gets more interesting if you initialize from another variable where type-conversion rules apply: While it is legal to constuct a double form a literal, it cant be contructed from an int without possible narrowing:
auto xxx = double{i} // warning ! narrowing conversion of 'i' from 'int' to 'double'
I see a lot of C++ code that has lines like:
float a = 2;
float x = a + 1.0f;
float b = 3.0f * a;
float c = 2.0f * (1.0f - a);
Are these .0f after these literals really necessary? Would you lose numeric accuracy if you omit these?
I thought you only need them if you have a line like this:
float a = 10;
float x = 1 / a;
where you should use 1.0f, right?
You would need to use it in the following case:
float x = 1/3;
either 1 or 3 needs to have a .0 or else x will always be 0.
If a is an int, these two lines are definitely not equivalent:
float b = 3.0f * a;
float b = 3 * a;
The second will silently overflow if a is too large, because the right-hand side is evaluated using integer arithmetic. But the first is perfectly safe.
But if a is a float, as in your examples, then the two expressions are equivalent. Which one you use is a question of personal preference; the first is probably more hygeinic.
It somewhat depends on what you are doing with the numbers. The type of a floating point literal with a f or F are of type float. The type of a floating point literal without a suffix is of type double. As a result, there may be subtle differences when using a f suffix compared to not using it.
As long as a subexpression involves at least one object of floating point type it probably doesn't matter much. It is more important to use suffixes with integers to be interpreted as floating points: If there is no floating point value involved in a subexpression integer arithmetic is used. This can have major effects because the result will be an integer.
float b = 3.0f * a;
Sometimes this is done because you want to make sure 3.0 is created as a float and not as double.