How come some people append f to the end of variables? - c++

In the tutorial I'm reading for OGRE3D, the programmer constantly adds f at the end of any variable he initializes, like 200.00f or 0.00f. I erased the f to see if it compiles, and it compiles just fine. What is the point of adding f at the end of the variable?
EDIT: So you're saying that if I initialize a variable with 200.03 it won't initialize it as a floating point, but if I do so with 200.03f it will? If not, where does the f become useful?

It's a way to specify that the number has to be interpreted as a float, not a double (double is the default type for decimal literals in C++ and typically uses twice the memory).
A floating-point constant without an f, F, l, or L suffix has type double. If the letter f or F is the suffix, the constant has type float. If suffixed by the letter l or L, it has type long double.

200.00f is not a variable. It can't vary.
It's a compile-time constant, with float representation. The f signifies that it's a float.
By comparison, 200.00 would be interpreted as a double.

The C standard states that floating constants without a suffix are doubles, which promotes the operation to double.
float a, b = 1.0f, c;
a = b + 7.1;  // double-precision operation
a = b + 7.1f; // single-precision operation
c = 7.1;      // double constant, converted to float on assignment
a = b + c;    // single precision all the way
The double-precision version requires more storage for the constant, plus a conversion from single to double for the variable operand, then a conversion from double back to single to assign the result. With all these conversions going on, if you are not in tune with how floating point works (rounding and so on), you might not get the result you expected. The compiler may optimize some of this behavior away, which can make the real problem harder to understand, and the FPU hardware might accept mixed-mode operands, also hiding what is really going on.
It is not just a speed problem but also an accuracy problem. There was a recent SO question about pretty much the same problem: why does this comparison work with one number and not another? Take the fraction 5/11 as an example, 0.454545... Let's say, hypothetically, you had a base-10 FPU with single precision of 3 significant digits and double precision of 6 significant digits.
float a = 0.45454545454;
if(a>0.4545454545) b=1;
Well, in our hypothetical system we can only store three digits in a, so a = 0.455, because we are using a round-up rounding mode by default. But our comparison will be done in double precision because we didn't put the f at the end of the number. The double version of the constant is 0.454545, and a is converted to double, which results in 0.455000, so:
if(0.455000>0.454545) b = 1;
0.455 is greater than 0.454545, so b would be 1.
float a = 0.45454545454;
if(a>0.4545454545f) b=1;
Now the comparison is single precision, so we are comparing 0.455 to 0.455, which is not greater, so b=1 does not happen.
The floating-point constants you write are base-10 decimal, but the floating-point numbers in the computer are base 2, and they don't always convert cleanly, just as 5/11 works fine in base 11 but gives an infinitely repeating digit in base 10. 0.1 in decimal, for example, becomes a repeating pattern in binary. Depending on where the mantissa cuts off, rounding may or may not push the least significant bit of the mantissa up (this also depends on the rounding mode in use, if the floating-point format even has rounding). That by itself creates problems depending on how you use the variable, as the comparison above shows.
For integer types the compiler usually saves you, but sometimes it doesn't:
unsigned long long a;
a = ~3;
a = ~(3ULL);
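Depending on the types involved, the two assignments can give you different results. With a 32-bit two's-complement int, these two actually agree: ~3 is the int -4, and converting -4 to unsigned long long sets all the upper bits, giving 0xFFFFFFFFFFFFFFFC. But if the operand is an unsigned int, as in ~3u, the result is 0x00000000FFFFFFFC, whereas ~(3ULL) still gives 0xFFFFFFFFFFFFFFFC.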
If you want something specific, be explicit when you tell the compiler what you want; otherwise the compiler takes a guess, and it doesn't always guess what you wanted.

It means that the value is to be interpreted as a single-precision floating-point value (type float). Without the f suffix, it is interpreted as a double-precision floating-point value (type double).
This is usually done to silence compiler warnings about possible loss of precision when assigning a double value to a float variable. If you didn't receive such a warning, your compiler's warning level may not include it, or you may have switched warnings off in your compiler settings (which is bad!).
But it can also have a subtle semantic meaning. As you know, C++ allows functions which have the same name but differ in the types of their parameters. In that case the f suffix can determine which function is called.


Is it safe to use Int(round(x))?

Say you have a double value, and want to round it to an integer...
Many round() functions return a double instead of an integer:
C# - round(double) -> double
C++ - round(double) -> double
Darwin - round(double) -> double
Swift - double.rounded() -> double
Java - Math.round(double) -> long
Ruby - float.round() -> int
(This is most likely because doubles have a much wider range of possible values.)
Given this "default" behavior, it probably explains why you'll commonly see the following recommended:
Int(round(x))
(Here we assume that Int() removes everything after the decimal: 4.9 -> 4.)
So far so good, until you realize how complex floating points really are. For example, 55 might actually be stored as 54.9999999999999999.
Because of this, it sounds like the following might happen:
Int(round(55.4)) // we ask it to round 55.4, expecting 55
Int(54.9999999999999) // it rounded it to "55.0"
54 // the Int() function removed all remaining digits
We were expecting 55.4 rounded to be 55, but it ended up evaluating to 54.
Can something like the above really happen if we use Int(round(x))?
If so, what should we use instead of Int(round())?
Related: Many languages define floor(double) -> double. Is Int(floor(double)) safe?
Floating point models are constructed on these foundations:
a base b
a significand with a limited number of digits in that base (p the precision)
an exponent e for shifting the floating point, also limited in a certain range
a sign
So the floating point values are made like this: (-1)^signBit * significand * b^e
The significand can be represented in a normalized form x.xxxxxxxx with one nonzero digit to the left of the radix point (except for zero, or values near zero that lose precision and gradually underflow), and p-1 digits after the radix point.
But by shifting the exponent appropriately (to e+1-p), it can equally be considered an integer with p digits, xxxxxxxxx.0.
With a reasonable exponent range, we see that every integer up to b^p can be represented exactly in such a floating-point model. Because of the limited precision, only the last base-b digits are lost, so an integer too large to fit in the significand will necessarily have a zero fractional part. Thus, there is no reason for round to return anything but an integral value (with zero fractional part).
The only unsafe part, as you noted, is that the range of Int might be much smaller than the range of floating-point values. Thus converting a large floating-point value to Int could result in an overflow exception or, worse, silent overflow with undefined behavior...
The conversion to Int is therefore not needed to eliminate the fractional part. It must be for other purposes (like feeding another part of the program that only accepts an Int).

Is it ok to compare floating points to 0.0 without epsilon?

I am aware that, to compare two floating-point values, one needs to use some epsilon precision, as they are not exact. However, I wonder if there are edge cases where I don't need that epsilon.
In particular, I would like to know if it is always safe to do something like this:
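#include <iostream>
double somethingelse(double x) { return x + 1.0; } // hypothetical stand-in; nonzero for x >= 0
double foo(double x){
    if (x < 0.0) return 0.0;
    else return somethingelse(x); // somethingelse(x) != 0.0
}
int main(){
    double x = -3.0;
    if (foo(x) == 0.0) {
        std::cout << "^- is this comparison ok?" << std::endl;
    }
}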
I know that there are better ways to write foo (e.g. returning a flag in addition), but I wonder whether, in general, it is OK to assign 0.0 to a floating-point variable and later compare it to 0.0.
Or, more generally, does the following comparison always yield true?
double x = 3.3;
double y = 3.3;
if (x == y) { std::cout << "is an epsilon required here?" << std::endl; }
When I tried it, it seemed to work, but it might be that one should not rely on that.
Yes, in this example it is perfectly fine to check for == 0.0. This is not because 0.0 is special in any way, but because you only assign a value and compare it afterwards. You could also set it to 3.3 and compare for == 3.3, this would be fine too. You're storing a bit pattern, and comparing for that exact same bit pattern, as long as the values are not promoted to another type for doing the comparison.
However, calculation results that would mathematically equal zero would not always equal 0.0.
This Q/A has evolved to also include cases where different parts of the program are compiled by different compilers. The question does not mention this; my answer applies only when the same compiler is used for all relevant parts.
C++ 11 Standard,
§5.10 Equality operators
6 If both operands are of arithmetic or enumeration type, the usual
arithmetic conversions are performed on both operands; each of the
operators shall yield true if the specified relationship is true and
false if it is false.
The relationship is not defined further, so we have to use the common meaning of "equal".
§2.13.4 Floating literals
1 [...] If the scaled value is in the range of representable values
for its type, the result is the scaled value if representable, else
the larger or smaller representable value nearest the scaled value,
chosen in an implementation-defined manner. [...]
When a literal's value is not representable, the compiler has to choose between exactly two values when converting it. If the same value is chosen consistently for the same literal, you are safe to compare values such as 3.3, because == means "equal".
Yes, if you return 0.0 you can compare it to 0.0; 0 is exactly representable as a floating-point value. If you return 3.3 you have to be much more careful, since 3.3 is not exactly representable, so a conversion from double to float, for example, will produce a different value.
Correction: 0 as a floating-point value is not unique (there are +0.0 and -0.0), but IEEE 754 defines the comparison 0.0 == -0.0 to be true (for any zero, for that matter).
So with 0.0 this works; for every other number it does not. The literal 3.3 in one compilation unit (e.g. a library) might differ from the same literal in another (e.g. your application). The standard only requires the compiler to use the same rounding it would use at runtime, but different compilers or compiler settings might use different rounding.
It will work most of the time (for 0), but is very bad practice.
As long as you are using the same compiler with the same settings (e.g. one compilation unit), it will work, because the literal 0.0 or 0.0f will translate to the same bit pattern every time. The representation of zero is not unique, though, so if foo is defined in a library and called from some application, the same comparison might fail.
You can rescue this particular case by using std::fpclassify to check whether the returned value represents a zero. For every finite non-zero value you will have to use an epsilon comparison, though, unless you stay within one compilation unit and perform no operations on the values.
As written, in both cases you are using identical constants in the same file fed to the same compiler. The string-to-float conversion the compiler uses should return the same bit pattern, so these should be equal not only in the sense that +0 and -0 compare equal, but bit for bit.
If you had a constant whose bit pattern is generated at run time by the operating system's C library (with strtod, strtof, or something similar), that conversion might use a different C library if the binary is moved to a computer other than the one it was compiled on, and you might have a problem.
Certainly, if you compute 3.3 at run time for one of the terms and have the other 3.3 computed at compile time, you can and will get failures in equality comparisons. Some constants are obviously more likely to work than others.
Of course, as written, your 3.3 comparison can be evaluated at compile time, and the compiler just removes it if optimizations are enabled.
You didn't specify which floating-point format you were interested in, nor which standard (if any) that format follows. Some formats have the +/- zero problem and some don't, for example.
It is a common misconception that floating-point values are "not exact". In fact, each of them is perfectly exact (except, maybe, some special cases such as -0.0 or Inf) and equal to $s \cdot 2^{e-(p-1)}$, where s, e, and p are the significand, exponent, and precision respectively, each of them an integer. E.g. in the IEEE 754-2008 binary32 format (aka float32), p = 24 and 1 is represented as $\mathtt{0x800000} \cdot 2^{0-23}$. There are two things that are really not exact when you deal with floating-point values:
Representation of a real value using a FP one. Obviously, not all real numbers can be represented using a given FP format, so they have to be somehow rounded. There are several rounding modes, but the most commonly used is the "Round to nearest, ties to even". If you always use the same rounding mode, which is almost certainly the case, the same real value is always represented with the same FP one. So you can be sure that if two real values are equal, their FP counterparts are exactly equal too (but not the reverse, obviously).
Operations with FP numbers are (mostly) inexact. So if you have some real-valued function φ(ξ) implemented in the computer as a function of an FP argument f(x), and you want to compare its result with some "true" value y, you need to use some ε in the comparison, because it is very hard (sometimes even impossible) to write a function giving exactly y. And the value of ε strongly depends on the nature of the FP operations involved, so each particular case may have a different optimal value.
For more details see D. Goldberg, What Every Computer Scientist Should Know About Floating-Point Arithmetic, and J.-M. Muller et al., Handbook of Floating-Point Arithmetic. Both texts can be found on the Internet.

C++ gcc large number errors

My C++ project needs to work with numbers for planet masses, up to over 24 digits. They are floats. The same variable may also hold a relatively small number (100). I have tried using double and long, but when compiling on Linux with g++ I receive the warning:
warning: integer constant is too large for its type [enabled by default].
Also, my calculations do not work because of this. I am wondering what type of variable this kind of number requires.
I have done research, but it's come up dry. Still, my apologies if this question is frequent. Thank you!
If you have a piece of code like:
double mass = 31415926535892718281828459;
then you need to understand that the constant is an integer. The whole statement would turn it into a double before putting it into mass, but your scheme is failing before that point.
You need to tell the compiler it's a double straight away with something like:
double mass = 31415926535892718281828459.0;
Section 2.14 of C++11 details the literals and how they're defined. A group of digits, where the first isn't 0, is captured by the following rule of section 2.14.2 Integer literals:
decimal-literal:
    nonzero-digit
    decimal-literal digit
(a group of digits starting with 0 is still an integer, just one made out of octal digits rather than decimal ones).
Section 2.14.4 Floating literals shows how to instruct the compiler that you want a double such as, for example:
including a fractional component as in 1.414 or 15.; or
using the exponent notation as in 12e2.
Or, for the language lawyers out there:
A floating literal consists of an integer part, a decimal point, a fraction part, an e or E, an optionally signed integer exponent, and an optional type suffix. The integer and fraction parts both consist of a sequence of decimal (base ten) digits. Either the integer part or the fraction part (not both) can be omitted; either the decimal point or the letter e (or E) and the exponent (not both) can be omitted.
The type of a floating literal is double unless explicitly specified by a suffix. The suffixes f and F specify float, the suffixes l and L specify long double.
You need to make sure it is double:
123456789012345678 // integer, give warning
123456789012345678.0 // double (floating point)
If you need extra precision, you should consider using a large number library. See also C++ library for big float numbers
Here's a simple test case that produces this warning:
float foo() {
    return 1000000000000000000000000;
}
The problem is that the number written there is actually an integer literal. This code is basically saying "take this value as an integer, convert it to float, and return that." But the number is too big to fit in any of the standard integer types (int, long, long long).
Solution: add ".0" or ".0f" to the end of the number to make it a double or float literal instead of an int.

Differences in rounded result when calling pow()

OK, I know that there have been many questions about the pow function and casting its result to int, but I couldn't find an answer to this somewhat specific question.
OK, this is the C code:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
int main(void)
{
    int i = 5;
    int j = 2;
    double d1 = pow(i,j);
    double d2 = pow(5,2);
    int i1 = (int)d1;
    int i2 = (int)d2;
    int i3 = (int)pow(i,j);
    int i4 = (int)pow(5,2);
    printf("%d %d %d %d\n",i1,i2,i3,i4);
    return 0;
}
And this is the output: "25 25 24 25". Notice that only in the third case, where the arguments to pow are not literals, do we get the wrong result, probably caused by rounding errors. The same thing happens without explicit casting. Could somebody explain what happens in these four cases?
I'm using Code::Blocks on Windows 7, with the MinGW gcc compiler that came with it.
The result of the pow operation is 25.0000 plus or minus some bit of rounding error. If the rounding error is positive or zero, 25 will result from the conversion to an integer. If the rounding error is negative, 24 will result. Both answers are correct.
What is most likely happening internally is that in one case a higher-precision, 80-bit FPU value is being used directly, and in the other case the result is being written from the FPU to memory (as a 64-bit double) and then read back in (converting it to a slightly different 80-bit value). This can make a microscopic difference in the final result, which is all it takes to change a 25.0000000001 to a 24.999999997.
Another possibility is that your compiler recognizes the constants passed to pow and does the calculation itself, substituting the result for the call to pow. Your compiler may use an internal arbitrary-precision math library or it may just use one that's different.
This is caused by a combination of two problems:
The implementation of pow you are using is not high quality. Floating-point arithmetic is necessarily approximate in many cases, but good implementations take care to ensure that simple cases such as pow(5, 2) return exact results. The pow you are using is returning a result that is less than 25 by an amount greater than 0 but less than or equal to $2^{-49}$. For example, it might be returning $25 - 2^{-50}$.
The C implementation you are using sometimes uses a 64-bit floating-point format and sometimes uses an 80-bit floating-point format. As long as the number is kept in the 80-bit format, it retains the complete value that pow returned. If you convert this value to an integer, it produces 24, because the value is less than 25 and conversion to integer truncates; it does not round. When the number is converted to the 64-bit format, it is rounded. Converting between floating-point formats rounds, so the result is rounded to the nearest representable value, 25. After that, conversion to integer produces 25.
The compiler may switch formats whenever it is “convenient” in some sense. For example, there are a limited number of registers with the 80-bit format. When they are full, the compiler may convert some values to the 64-bit format and store them in memory. The compiler may also rearrange expressions or perform parts of them at compile-time instead of run-time, and these can affect the arithmetic performed and the format used.
It is troublesome when a C implementation mixes floating-point formats, because users generally cannot predict or control when the conversions between formats occur. This leads to results that are not easily reproducible and interferes with deriving or controlling numerical properties of software. C implementations can be designed to use a single format throughout and avoid some of these problems, but your C implementation is apparently not so designed.
To add to the other answers here: just generally be very careful when working with floating point values.
I highly recommend reading this paper (even though it is a long read):
Skip to section 3 for practical examples, but don't neglect the previous chapters!
I'm fairly sure this can be explained by "intermediate rounding" and the fact that pow is not simply looping j times multiplying by i, but calculating exp(log(i)*j) as a floating-point calculation. Intermediate rounding may well convert 24.999999999996 into 25.000000000; even arbitrary storing and reloading of the value may cause differences in this sort of behavior, so depending on how the code is generated, it may make a difference to the exact result.
And of course, in some cases, the compiler may even "know" what pow actually achieves, and replace the calculation with a constant result.

What does the compiler do when it converts a float variable to an integer variable?

What does the compiler do? The aim is to get the digit after the decimal point as an integer. I did it like this:
float a = 0;
cin >> a;
int b = (a - (int)a)*10;
Now my problem is this: when I enter, for example, 3.2, I get 2, which is what I want. It also works with .4, .5 and .7. But when I enter, for example, 2.3, I get 2, and for 2.7 I get 6, and so on. But when I do it without variables, for example:
(2.3 - (int)2.3)*10;
I get the correct result.
I couldn't figure out what the compiler does. I always thought that when I cast a float to an integer, it simply cuts off at the point. This is what the compiler actually does when I use constant numbers. However, when I use variables, I get the right digit for some of them, but not all.
You are most likely not having problems with the compiler, but with the fact that floating point numbers cannot be represented exactly on a binary computer.
So, when you do:
float f = 2.7f;
..what might actually be stored in the computer is:
This is a very well-known characteristic of floating points on binary computers. There are many posts on SO that discuss this.
Basically, the problem comes from the fact that binary has different "infinitely repeating" values than base 10 does. For instance, 1/10 in decimal is 0.1; in binary it's 0.000110011001100110011001100... The problem arises because floating point cannot hold 2.3 exactly, since that would take an infinite number of binary digits, but it approximates it closely, probably as 2.2999999. For most math, that's close enough. But be wary of truncation.
One solution is to round before you truncate.
int b = (int)((a - (int)a)*10 + .05); // nudge up before truncating; assumes a >= 0
Also note that floating-point values can have different sizes in memory than in registers (as with the x87 FPU's 80-bit registers), which means you have to be careful about rounding when comparing two floating-point values for equality as well.
The reason for the discrepancy is that, by default, floating-point literals are doubles, which have higher precision and can represent the value you're looking for more closely.
Why don't you do it like this?
b = (int)(a*10) % 10;
I find it a lot easier.