Data type float - c++

While declaring a variable of type float, is it necessary to write f towards the end of the value? For example, float amount = .01 and float amount = 0.01f, here what does the f mean and how does it make any difference?Also, what is the role of #include library file here.

It's not necessary: the compiler will make an appropriate numerical conversion for you.
0.01f is a literal of float type, whereas 0.01 is a double type.
Occasionally you need to descriminate explicitly especially when working with templates or overloaded functions:
void foo(const float&){
// Pay me a bonus
}
void foo(const double&){
// Reformat my disk
}
int main(){
foo(1.f);
}
Finally, if you're leading towards using a float over a double, then do read through this: Is using double faster than float?

It depends on how you define your variable. When specifying the type float in the definition, adding a trailing f is not necessary:
float amount = 0.1; /* This is fine, compiler knows the type of amount. */
Adding a superfluous literal here (float amount = 0.1f;) might even be considered bad practice, as you repeat the type information, resulting in more edits when the type is changed.
In the context of type deduction though, you have to give the f literal:
auto amount = 0.1f; /* Without the literal, compiler deduces double. */
There are more subtle contexts in which type deduction occurs, e.g.
std::vector<float> vecOfFloats;
/* ... */
std::accumulate(vecOfFloats.cbegin(), vecOfFloats.cend(), 0.1f);
Here, the third argument is used to deduce the type on which std::accumulate operates. If you just call it like std::accumulate(..., 0.1);, a double to float conversion takes place for every element in vecOfFloats.

.01 is a double literal. There is an implicit conversion to float in the initialisation
float amount = .01;
.01f is a float literal. There is no conversion in the initialisation
float amount = .01f;

That depends...
You can do for example:
1)
float f = 3.14f;
In this case the literal 3.14 is explicitly given as a float... so everything is ok
2)
float f = 3.14;
In this case 3.14 is actually a double, but the variable f is declared as a float...so when compiling, the number will be casted to a float with the loss precision consequences of that case...
You could since c++11
auto f = 3.14;
Or auto f{3,14};
In both cases the compiler takes exactly the type of the literal...(both are doubles)

Related

How can I make this expression involving floating-point functions a compile-time constant?

I have a constant integer, steps, which is calculated using the floor function of the quotient of two other constant variables. However, when I attempt to use this as the length of an array, visual studio tells me it must be a constant value and the current value cannot be used as a constant. How do I make this a "true" constant that can be used as an array length? Is the floor function the problem, and is there an alternative I could use?
const int simlength = 3.154*pow(10,7);
const float timestep = 100;
const int steps = floor(simlength / timestep);
struct body bodies[bcount];
struct body {
string name;
double mass;
double position[2];
double velocity[2];
double radius;
double trace[2][steps];
};
It is not possible with the standard library's std::pow and std::floor function, because they are not constexpr-qualified.
You can probably replace std::pow with a hand-written implementation my_pow that is marked constexpr. Since you are just trying to take the power of integers, that shouldn't be too hard. If you are only using powers of 10, floating point literals may be written in the scientific notation as well, e.g. 1e7, which makes the pow call unnecessary.
The floor call is not needed since float/double to int conversion already does flooring implicitly. Or more correctly it truncates, which for positive non-negative values is equivalent to flooring.
Then you should also replace the const with constexpr in the variable declarations to make sure that the variables are usable in constant expressions:
constexpr int simlength = 3.154*my_pow(10,7); // or `3.154e7`
constexpr float timestep = 100;
constexpr int steps = simlength / timestep;
Theoretically only float requires this change, since there is a special exception for const integral types, but it seems more consistent this way.
Also, I have a feeling that there is something wrong with the types of your variables. A length and steps should not be determined by floating-point operations and types, but by integer types and operations alone. Floating-point operations are not exact and introduce errors relative to the mathematical precise calculations on the real numbers. It is easy to get unexpected off-by-one or worse errors this way.
You cannot define an array of a class type before defining the class.
Solution: Define body before defining bodies.
Furthermore, you cannot use undefined names.
Solution: Define bcount before using it as the size of the array.
Is the floor function the problem, and is there an alternative I could use?
std::floor is one problem. There's an easy solution: Don't use it. Converting a floating point number to integer performs similar operation implicitly (the behaviour is different in case of negative numbers).
std::pow is another problem. It cannot be replaced as trivially in general, but in this case we can use a floating point literal in scientific notation instead.
Lastly, non-constexpr floating point variable isn't compile time constant. Solution: Use constexpr.
Here is a working solution:
constexpr int simlength = 3.154e7;
constexpr float timestep = 100;
constexpr int steps = simlength / timestep;
P.S. trace is a very large array. I would recommend against using so large member variables, because it's easy for the user of the class to not notice such detail, and they are likely to create instances of the class in automatic storage. This is a problem because so large objects in automatic storage are prone to cause stack overflow errors. Using std::vector instead of an array is an easy solution. If you do use std::vector, then as a side effect the requirement of compile time constant size disappear and you will no longer have trouble using std::pow etc.
Because simlength is 3.154*10-to-the-7th, and because timestep is 10-squared, then the steps variable's value can be written as:
3.154e7 / 1e2 == 3.154e5
And, adding a type-cast, you should be able to write the array as:
double trace[2][(int)(3.154e5)];
Note that this is HIGHLY IRREGULAR, and should have extensive comments describing why you did this.
Try switching to constexpr:
constexpr int simlength = 3.154e7;
constexpr float timestep = 1e2;
constexpr int steps = simlength / timestep;
struct body {
string name;
double mass;
double position[2];
double velocity[2];
double radius;
double trace[2][steps];
};

Treating a hexadecimal value as single precision or double precision value

Is there a way i could initialize a float type variable with hexadecimal number? what i want to do is say i have single precision representation for 4 which is 0x40800000 in hex. I want to do something like float a = 0x40800000 in which case it takes the hex value as integer. What can i do to make it treat as floating point number?
One option is to use type punning via a union. This is defined behaviour in C since C99 (previously this was implementation defined).
union {
float f;
uint32_t u;
} un;
un.u = 0x40800000;
float a = un.f;
As you tagged this C++, you could also use reinterpret_cast.
uint32_t u = 0x40800000;
float a = *reinterpret_cast<float*>(&u);
Before doing either of these, you should also confirm that they're the same size:
assert(sizeof(float) == sizeof(uint32_t));
You can do this if you introduce a temporary integer type variable, cast it to a floating point type and dereference it. You must be careful about the sizes of the types involved, and know that they may change. With my compiler, for example, this works:
unsigned i = 0x40800000;
float a = *(float*)&i;
printf("%f\n", a);
// output 4.00000
I'm not sure how you're getting your the value "0x40800000".
If that's coming in as an int you can just do:
const auto foo = 0x40800000;
auto a = *(float*)&foo;
If that's coming in as a string you can do:
float a;
sscanf("0x40800000", "0x%x", &a);

Is 'float a = 3.0;' a correct statement?

If I have the following declaration:
float a = 3.0 ;
is that an error? I read in a book that 3.0 is a double value and that I have to specify it as float a = 3.0f. Is it so?
It is not an error to declare float a = 3.0 : if you do, the compiler will convert the double literal 3.0 to a float for you.
However, you should use the float literals notation in specific scenarios.
For performance reasons:
Specifically, consider:
float foo(float x) { return x * 0.42; }
Here the compiler will emit a conversion (that you will pay at runtime) for each returned value. To avoid it you should declare:
float foo(float x) { return x * 0.42f; } // OK, no conversion required
To avoid bugs when comparing results:
e.g. the following comparison fails :
float x = 4.2;
if (x == 4.2)
std::cout << "oops"; // Not executed!
We can fix it with the float literal notation :
if (x == 4.2f)
std::cout << "ok !"; // Executed!
(Note: of course, this is not how you should compare float or double numbers for equality in general)
To call the correct overloaded function (for the same reason):
Example:
void foo(float f) { std::cout << "\nfloat"; }
void foo(double d) { std::cout << "\ndouble"; }
int main()
{
foo(42.0); // calls double overload
foo(42.0f); // calls float overload
return 0;
}
As noted by Cyber, in a type deduction context, it is necessary to help the compiler deduce a float :
In case of auto :
auto d = 3; // int
auto e = 3.0; // double
auto f = 3.0f; // float
And similarly, in case of template type deduction :
void foo(float f) { std::cout << "\nfloat"; }
void foo(double d) { std::cout << "\ndouble"; }
template<typename T>
void bar(T t)
{
foo(t);
}
int main()
{
bar(42.0); // Deduce double
bar(42.0f); // Deduce float
return 0;
}
Live demo
The compiler will turn any of the following literals into floats, because you declared the variable as a float.
float a = 3; // converted to float
float b = 3.0; // converted to float
float c = 3.0f; // float
It would matter is if you used auto (or other type deducting methods), for example:
auto d = 3; // int
auto e = 3.0; // double
auto f = 3.0f; // float
Floating point literals without a suffix are of type double, this is covered in the draft C++ standard section 2.14.4 Floating literals:
[...]The type of a floating literal is double unless explicitly specified by a suffix.[...]
so is it an error to assign 3.0 a double literal to a float?:
float a = 3.0
No, it is not, it will be converted, which is covered in section 4.8 Floating point conversions:
A prvalue of floating point type can be converted to a prvalue of
another floating point type. If the source value can be exactly
represented in the destination type, the result of the conversion is
that exact representation. If the source value is between two adjacent
destination values, the result of the conversion is an
implementation-defined choice of either of those values. Otherwise,
the behavior is undefined.
We can read more details on the implications of this in GotW #67: double or nothing which says:
This means that a double constant can be implicitly (i.e., silently)
converted to a float constant, even if doing so loses precision (i.e.,
data). This was allowed to remain for C compatibility and usability
reasons, but it's worth keeping in mind when you do floating-point
work.
A quality compiler will warn you if you try to do something that's
undefined behavior, namely put a double quantity into a float that's
less than the minimum, or greater than the maximum, value that a float
is able to represent. A really good compiler will provide an optional
warning if you try to do something that may be defined but could lose
information, namely put a double quantity into a float that is between
the minimum and maximum values representable by a float, but which
can't be represented exactly as a float.
So there are caveats for the general case that you should be aware of.
From a practical perspective, in this case the results will most likely be the same even though technically there is a conversion, we can see this by trying out the following code on godbolt:
#include <iostream>
float func1()
{
return 3.0; // a double literal
}
float func2()
{
return 3.0f ; // a float literal
}
int main()
{
std::cout << func1() << ":" << func2() << std::endl ;
return 0;
}
and we see that the results for func1 and func2 are identical, using both clang and gcc:
func1():
movss xmm0, DWORD PTR .LC0[rip]
ret
func2():
movss xmm0, DWORD PTR .LC0[rip]
ret
As Pascal points out in this comment you won't always be able to count on this. Using 0.1 and 0.1f respectively causes the assembly generated to differ since the conversion must now be done explicitly. The following code:
float func1(float x )
{
return x*0.1; // a double literal
}
float func2(float x)
{
return x*0.1f ; // a float literal
}
results in the following assembly:
func1(float):
cvtss2sd %xmm0, %xmm0 # x, D.31147
mulsd .LC0(%rip), %xmm0 #, D.31147
cvtsd2ss %xmm0, %xmm0 # D.31147, D.31148
ret
func2(float):
mulss .LC2(%rip), %xmm0 #, D.31155
ret
Regardless whether you can determine if the conversion will have a performance impact or not, using the correct type better documents your intention. Using an explicit conversions for example static_cast also helps to clarify the conversion was intended as opposed to accidental, which may signify a bug or potential bug.
Note
As supercat points out, multiplication by e.g. 0.1 and 0.1f is not equivalent. I am just going to quote the comment because it was excellent and a summary probably would not do it justice:
For example, if f was equal to 100000224 (which is exactly
representable as a float), multiplying it by one tenth should yield a
result which rounds down to 10000022, but multiplying by 0.1f will
instead yield a result which erroneously rounds up to 10000023. If the
intention is to divide by ten, multiplication by double constant 0.1
will likely be faster than division by 10f, and more precise than
multiplication by 0.1f.
My original point was to demonstrate a false example given in another question but this finely demonstrates subtle issues can exist in toy examples.
It's not an error in the sense that the compiler will reject it, but it is an error in the sense that it may not be what you want.
As your book correctly states, 3.0 is a value of type double. There is an implicit conversion from double to float, so float a = 3.0; is a valid definition of a variable.
However, at least conceptually, this performs a needless conversion. Depending on the compiler, the conversion may be performed at compile time, or it may be saved for run time. A valid reason for saving it for run time is that floating-point conversions are difficult and may have unexpected side effects if the value cannot be represented exactly, and it's not always easy to verify whether the value can be represented exactly.
3.0f avoids that problem: although technically, the compiler is still allowed to calculate the constant at run time (it always is), here, there is absolutely no reason why any compiler might possibly do that.
While not an error, per se, it is a little sloppy. You know you want a float, so initialize it with a float.Another programmer may come along and not be sure which part of the declaration is correct, the type or the initializer. Why not have them both be correct?
float Answer = 42.0f;
When you define a variable, it is initialized with the provided initializer. This may require converting the value of the initializer to the type of the variable that's being initialized. That's what's happening when you say float a = 3.0;: The value of the initializer is converted to float, and the result of the conversion becomes the initial value of a.
That's generally fine, but it doesn't hurt to write 3.0f to show that you're aware of what you're doing, and especially if you want to write auto a = 3.0f.
If you try out the following:
std::cout << sizeof(3.2f) <<":" << sizeof(3.2) << std::endl;
you will get output as:
4:8
that shows, size of 3.2f is taken as 4 bytes on 32-bit machine wheres 3.2 is interpreted as double value taking 8 bytes on 32-bit machine.
This should provide the answer that you are looking for.
The compiler deduces the best-fitting type from literals, or at leas what it thinks is best-fitting. That is rather lose efficiency over precision, i.e. use a double instead of float.
If in doubt, use brace-intializers to make it explicit:
auto d = double{3}; // make a double
auto f = float{3}; // make a float
auto i = int{3}; // make a int
The story gets more interesting if you initialize from another variable where type-conversion rules apply: While it is legal to constuct a double form a literal, it cant be contructed from an int without possible narrowing:
auto xxx = double{i} // warning ! narrowing conversion of 'i' from 'int' to 'double'

Is there a type which can switch between Float and Double?

Is there any predefined type which can switch between float and double in some specific condition.
For example, some type, I would like this type be float; sometimes I need this type becomes double.
Use always double, it is recommended everywhere, if you don't have huge arrays in your program. This is a thing that I learned in programming competitions and because of which I failed many times previously because of precision problems.
It is not clear what you want to do. As #thecoder suggested the simplest option is to just use double. To convert it to float you just assign it:
double d = 0.1;
float f = d;
However, if you need to write code that must work both for float and double, use a template:
template<NumberT>
NumberT sum(NumberT first, NumberT second) {
return first + second;
}

Is const double cast at compile time?

It makes sense to define mathematical constants as double values but what happens when one requires float values instead of doubles? Does the compiler automatically interpret the doubles as floats at compile-time (so they are actually treated as they were const floats) or is this conversion made at runtime?
If by "defining", you mean using #define, here's what happens:
Say you have:
#define CONST1 1.5
#define CONST2 1.12312455431461363145134614 // Assume some number too
// precise for float
Now if you have:
float x = CONST1;
float y = CONST2;
you don't get any warning for x because the compiler automatically makes CONST1 a float. For y, you get a warning because CONST2 doesn't fit in a float, but the compiler casts it to float anyway.
If by "defining", you mean using const variables, here's what happens:
Say you have
const double CONST1=1.5;
const double CONST2=1.12312455431461363145134614; // Assume some number too
// precise for float
Now if you have:
float x = CONST1;
float y = CONST2;
there is no way for the compiler to know the values of CONST1 and CONST2(*) and therefore cannot interpret the values as float at compile them. You will be given two warnings about possible loss of data and the conversion will be done at runtime.
(*) Actually there is a way. Since the values are const, the optimizer may decide not to take a variable for them, but replace the values throughout the code. This could get complicated though, as you may pass the address to these variables around, so the optimizer may decide not to do that. That is, don't count on it.
Note that, this whole thing is true for any basic type conversions. If you have
#define CONST3 1
then you think CONST3 is int, but if you put it in a float, it would become float at compile-time, or if you put it in a char, it would become char at compiler-time.