My c++ project needs to work with numbers of planet masses... up to over 24 digits. They are floats. The same variable may also be a relatively small number (100) I have tried using double, and long, but compiling in linux with G++ I am receiving the warning: warning:
integer constant is too large for its type [enabled by default].
Also my calculations do not work because of this. I am wondering what type variable this kind of number will require.
I have done research, but it's turned up dry.. still, my apologies if this question is frequent. Thank you!
If you have a piece of code like:
double mass = 31415926535892718281828459;
then you need to understand that the constant is an integer. The whole statement would turn it into a double before putting it into mass but your scheme is failing before that point.
You need to tell the compiler it's a double straight away with something like:
double mass = 31415926535892718281828459.0;
Section 2.14 of C++11 details the literals and how they're defined. A group of digits, where the first isn't 0, is captured by the following rule of section 2.14.2 Integer literals:
decimal-literal:
nonzero-digit
decimal-literal digit
(a group of digits starting with 0 is still an integer, just one made out of octal digits rather than decimal ones).
Section 2.14.4 Floating literals shows how to instruct the compiler that you want a double such as, for example:
including a fractional component as in 1.414 or 15.; or
using the exponent notation as in 12e2.
Or, for the language lawyers out there:
A floating literal consists of an integer part, a decimal point, a fraction part, an e or E, an optionally signed integer exponent, and an optional type suffix. The integer and fraction parts both consist of a sequence of decimal (base ten) digits. Either the integer part or the fraction part (not both) can be omitted; either the decimal point or the letter e (or E) and the exponent (not both) can be omitted.
The type of a floating literal is double unless explicitly specified by a suffix. The suffixes f and F specify float, the suffixes l and L specify long double.
You need to make sure it is double:
123456789012345678 // integer, give warning
123456789012345678.0 // double (floating point)
If you need extra precision, you should consider using a large number library. See also C++ library for big float numbers
Here's a simple test case that produces this warning:
float foo() {
return 1000000000000000000000000;
}
The problem is that the number written there is actually an integer literal. This code is basically saying "take this value as an int, convert it to float, and return that." But the number is too big to fit in an int.
Solution: add ".0" or ".0f" to the end of the number to make it a double or float literal instead of an int.
Related
I was looking at the C++17 spec for floating point literals and found a problem. How do you tell the difference between the digit F and the suffix F for single precision?
For instance, does the literal 0x1p0F translate to a double precision 32768.0L or a single precision 1.0F?
The spec says that the suffix is optional, and that no suffix indicates double precision, so, as written, there is a definite ambiguity.
A hex-float literal must use a p exponent. The exponent is defined using non-hexadecimal digits (a decimal integer that represents the exponent to be applied to 2). Therefore, it cannot contain "A-F" characters. So there is no ambiguity. 0x1p0F has an exponent of "0", and is of type float.
Say you have a double value, and want to round it to an integer...
Many round() functions return a double instead of an integer:
C# - round(double) -> double
C++ - round(double) -> double
Darwin - round(double) -> double
Swift - double.rounded() -> double
Java - round(double) -> int
Ruby: float.round() -> int
(This is most likely because doubles have a much wider range of possible values.)
Given this "default" behavior, it probably explains why you'll commonly see the following recommended:
Int(round(myDouble))
(Here we assume that Int() removes everything after the decimal: 4.9 -> 4.)
So far so good, until you realize how complex floating points really are. E.g. 55 might actually be stored as 54.9999999999999999, for example.
Because of this, it sounds like the following might happen:
Int(round(55.4)) // we ask it to round 55.4, expecting 55
Int(54.9999999999999) // it rounded it to "55.0"
54 // the Int() function removed all remaining digits
We were expecting 55.4 rounded to be 55, but it ended up evaluating to 54.
Can something like the above really happen if we use Int(round(x))?
If so, what should we use instead of Int(round())?
Related: Many languages define floor(double) -> double. Is Int(floor(double)) safe?
Floating point models are constructed on these foundations:
a base b
a significand with a limited number of digits in that base (p the precision)
an exponent e for shifting the floating point, also limited in a certain range
a sign
So the floating point values are made like this: (-1)^signBit * significand * b^e
The significand can be represented in a normalized form x.xxxxxxxx with 1 non null digit left of floating point (except for zero, or eventually values near zero that lose precision and gradually underflow), and p-1 digit after floating point.
But by shifting appropriately the exponent (e+1-p), it can as well be considered as an integer with p digits, xxxxxxxxx.0.
With a reasonnable range for exponent, we see that every integer up to b^p can be represented exactly by such floating point model. With the limited precision, only the last digits in base b are lost, so if we have an integer too large to fit in significand, it will necessarily have a null fraction part. Thus, there is no reason for round to answer anything else but an integral value (with null fraction part).
The only unsafe part as you noted is that Int range might be much smaller than range of floating point values. Thus converting large floating point to Int could result in overflow exception, or worse, silent overflow with undefined behavior...
The conversion to Int is thus not necessary for the sake of eliminating the fraction part. It must be for other purposes (like feeding another part of the program that would only accept an Int).
In the tutorial I'm reading for OGRE3d here the programmer is constantly adding f at the end of any variable he initializes, like 200.00f or 0.00f so I decided to erase f and see if it compiles and it compiles just fine, what is the point of adding f at the end of the variable?
EDIT: So you're saying if I initialize a variable with 200.03 it won't initialize it as a floating point but if I were to do so with 200.03f it would? If not where does the f become useful then?
It's a way to specify that number has to be interpreted as a "float", not a "double" (which is the standard for C++ decimal numbers and uses up twice the memory).
This discussion could be of help:
http://www.cplusplus.com/forum/beginner/24483/
Quoted from http://msdn.microsoft.com/en-us/library/w9bk1wcy.aspx
A floating-point constant without an f, F, l, or L suffix has type
double. If the letter f or F is the suffix, the constant has type
float. If suffixed by the letter l or L, it has type long double. For
example:
200.00f is not a variable. It can't vary.
It's a compile-time constant, with float representation. The f signifies that it's a float.
By comparison, 200.00 would be interpreted as a double.
The C standard states that constant floats are doubles which promotes the operation to a double.
float a,b,c;
...
a = b+7.1; this is a double precision operation
...
a = b+7.1f; this is a single precision operation
...
c = 7.1; //double
a = b + c; //single all the way
The double precision requires more storage for the constant, plus a conversion from single to double for the variable operand, then a conversion from double to single to assign the result. With all the conversions going on if you are not in tune with how floating point works, rounding and such you might not get the result you were thinking you were going to get. The compiler may at some point in the path optimize some of this behavior out, making it either harder to understand the real problems and the fpu in the hardware might accept mixed mode operands, also hiding what is really going on.
It is not just a speed problem but also accuracy. There was a recent SO question, pretty much the same problem, why does this comparison work with one number and not another. Take the fraction 5/11ths for example 0.454545.... Lets say, hypothetically, you had base 10 fpu with single precision of 3 significant digits and a double of 6 significant digits.
float a = 0.45454545454;
...
if(a>0.4545454545) b=1;
...
well in our hypothetical system we can only store three digits into a, so a = .455 because we are using by default a round up rounding mode. but our comparision will be considered double because we didnt put the f at the end of the number. the double version is 0.454545. a is converted to a double which results in 0.455000, so:
if(0.455000>0.454545) b = 1;
0.455 is greater than 0.454545 so b would be a 1.
float a = 0.45454545454;
...
if(a>0.4545454545f) b=1;
...
so now the comparison is single precision so we are comparing 0.455 to 0.455 which is not greater, so b=1 does not happen.
When you write floating point constants that is base 10 decimal, the floating point numbers in the computer are base 2 and they dont always convert smoothly just like 5/11 would work just fine in base 11 but in base 10 you get an infinite repeating digit. 0.1 in decimal for example creates a repeating pattern in binary. Depending on where the mantissa cuts off the rounding can make that lsbit of the mantissa round up or not (also depends on the rounding mode you are using if the floating point format you are using even has rounding). Which of itself creates problems depending on how you use the variable as the comparison above shows.
For non-floating point the compiler usually saves you, but sometimes doesnt:
unsigned long long a;
...
a = ~3;
a = ~(3ULL);
...
Depending on the compiler and computer the two assignments can give you different results one MIGHT give you 0x00000000FFFFFFFC another MIGHT give 0xFFFFFFFFFFFFFFFC.
If you want something specific you should be quite clear when you tell the compiler what you want otherwise the compiler takes a guess and doesnt always make the guess that you wanted.
It means that the value is to be interpreted as a single-precision floating point variable (type float). Without the f-suffix, it is interpreted as a double-precision floationg point variable (type double).
This is usually done to shut up compiler warnings about possible loss of precision by assigning a double value to a float variable. When you didn't receive such a warning you maybe have switched off warnings in your compiler settings (which is bad!).
But it can also have subtile syntactical meaning. As you know C++ allows functions which have the same name but differ by the types of their parameters. In that case the f suffix can determine which function is called.
When I debug my software in VS C++ by stepping the code I notice that some float calculations show up as a number with a trailing dot, i.e.:
1232432.
One operation that lead up to this result is this:
float result = pow(10, a * 0.1f) / b
where a is a large negative number around -50 to -100 and b is most often around 1. I read some articles about problem with precision when it comes to floating-points. My question is just if the trailing dot is a Visual-Studio-way of telling me that the precision is very low on this number, i.e. in the variable result. If not, what does it mean?
This came up at work today and I remember that there was a problem for larger numbers so this did to occur every time (and by "this" I mean that trailing dot). But I do remember that it happened when there was seven digits in the number. Here they wright that the precision of floats are seven digits:
C++ Float Division and Precision
Can this be the thing and Visual Studio tells me this by putting a dot in the end?
I THINK I FOUND IT! It says "The mantissa is specified as a sequence of digits followed by a period". What does the mantissa mean? Can this be different on a PC and when running the code on a DSP? Because the thing is that I get different results and the only thing that looks strange to me is this period-thing, since I don't know what it means.
http://msdn.microsoft.com/en-us/library/tfh6f0w2(v=vs.71).aspx
If you're referring to the "sig figs" convention where "4.0" means 4±0.1 and "4.00" means 4±0.01, then no, there's no such concept in float or double. Numbers are always* stored with 24 or 53 significant bits (7.22 or 15.95 decimal digits) regardless of how many are actually "significant".
The trailing dot is just a decimal point without any digits after it (which is a legal C literal). It either means that
The value is 1232432.0 and they trimed the unnecessary trailing zero, OR
Everything is being rounded to 7 significant digits (in which case the true value might also be 1232431.5, 1232431.625, 1232431.75, 1232431.875, 1232432.125, 1232432.25, 1232432.375, or 1232432.5.)
The real question is, why are you using float? double is the "normal" floating-point type in C(++), and float a memory-saving optimization.
* Pedants will be quick to point out denormals, x87 80-bit intermediate values, etc.
The precision is not variable, that is simply how VS is formatting it for display. The precision (or lackof) is always constant for a given floating point number.
The MSDN page you linked to talks about the syntax of a floating-point literal in source code. It doesn't define how the number will be displayed by whatever tool you're using. If you print a floating-point number using either printf or std:cout << ..., the language standard specifies how it will be printed.
If you print it in the debugger (which seems to be what you're doing), it will be formatted in whatever way the developers of the debugger decided on.
There are a number of different ways that a given floating-point number can be displayed: 1.0, 1., 10.0E-001, and .1e+1 all mean exactly the same thing. A trailing . does not typically tell you anything about precision. My guess is that the developers of the debugger just used 1232432. rather than 1232432.0 to save space.
If you're seeing the trailing . for some values, and a decimal number with no . at all for others, that sounds like an odd glitch (possibly a bug) in the debugger.
If you're wondering what the actual precision is, for IEEE 32-bit float (the format most computers use these days), the next representable numbers before and after 1232432.0 are 1232431.875 and 1232432.125. (You'll get much better precision using double rather than float.)
This question already has answers here:
Why do I see a double variable initialized to some value like 21.4 as 21.399999618530273?
(14 answers)
Closed 6 years ago.
I am facing a problem and unable to resolve it. Need help from gurus. Here is sample code:-
float f=0.01f;
printf("%f",f);
if we check value in variable during debugging f contains '0.0099999998' value and output of printf is 0.010000.
a. Is there any way that we may force the compiler to assign same values to variable of float type?
b. I want to convert float to string/character array. How is it possible that only and only exactly same value be converted to string/character array. I want to make sure that no zeros are padded, no unwanted values are padded, no changes in digits as in above example.
It is impossible to accurately represent a base 10 decimal number using base 2 values, except for a very small number of values (such as 0.25). To get what you need, you have to switch from the float/double built-in types to some kind of decimal number package.
You could use boost::lexical_cast in this way:
float blah = 0.01;
string w = boost::lexical_cast<string>( blah );
The variable w will contain the text value 0.00999999978. But I can't see when you really need it.
It is preferred to use boost::format to accurately format a float as an string. The following code shows how to do it:
float blah = 0.01;
string w = str( boost::format("%d") % blah ); // w contains exactly "0.01" now
Have a look at this C++ reference. Specifically the section on precision:
float blah = 0.01;
printf ("%.2f\n", blah);
There are uncountably many real numbers.
There are only a finite number of values which the data types float, double, and long double can take.
That is, there will be uncountably many real numbers that cannot be represented exactly using those data types.
The reason that your debugger is giving you a different value is well explained in Mark Ransom's post.
Regarding printing a float without roundup, truncation and with fuller precision, you are missing the precision specifier - default precision for printf is typically 6 fractional digits.
try the following to get a precision of 10 digits:
float amount = 0.0099999998;
printf("%.10f", amount);
As a side note, a more C++ way (vs. C-style) to do things is with cout:
float amount = 0.0099999998;
cout.precision(10);
cout << amount << endl;
For (b), you could do
std::ostringstream os;
os << f;
std::string s = os.str();
In truth using the floating point processor or co-processor or section of the chip itself (most are now intergrated into the CPU), will never result in accurate mathematical results, but they do give a fairly rough accuracy, for more accurate results, you could consider defining a class "DecimalString", which uses nybbles as decimal characters and symbols... and attempt to mimic base 10 mathematics using strings... in that case, depending on how long you want to make the strings, you could even do away with the exponent part altogether a string 256 can represent 1x10^-254 upto 1^+255 in straight decimal using actual ASCII, shorter if you want a sign, but this may prove significantly slower. You could speed this by reversing the digit order, so from left to right they read
units,tens,hundreds,thousands....
Simple example
eg. "0021" becomes 1200
This would need "shifting" left and right to make the decimal points line up before routines as well, the best bet is to start with the ADD and SUB functions, as you will then build on them in the MUL and DIV functions. If you are on a large machine, you could make them theoretically as long as your heart desired!
Equally, you could use the stdlib.h, in there are the sprintf, ecvt and fcvt functions (or at least, there should be!).
int sprintf(char* dst,const char* fmt,...);
char *ecvt(double value, int ndig, int *dec, int *sign);
char *fcvt(double value, int ndig, int *dec, int *sign);
sprintf returns the number of characters it wrote to the string, for example
float f=12.00;
char buffer[32];
sprintf(buffer,"%4.2f",f) // will return 5, if it is an error it will return -1
ecvt and fcvt return characters to static char* locations containing the null terminated decimal representations of the numbers, with no decimal point, most significant number first, the offset of the decimal point is stored in dec, the sign in "sign" (1=-,0=+) ndig is the number of significant digits to store. If dec<0 then you have to pad with -dec zeros pror to the decimal point. I fyou are unsure, and you are not working on a Windows7 system (which will not run old DOS3 programs sometimes) look for TurboC version 2 for Dos 3, there are still one or two downloads available, it's a relatively small program from Borland which is a small Dos C/C++ edito/compiler and even comes with TASM, the 16 bit machine code 386/486 compile, it is covered in the help files as are many other useful nuggets of information.
All three routines are in "stdlib.h", or should be, though I have found that on VisualStudio2010 they are anything but standard, often overloaded with function dealing with WORD sized characters and asking you to use its own specific functions instead... "so much for standard library," I mutter to myself almost each and every time, "Maybe they out to get a better dictionary!"
You would need to consult your platform standards to determine how to best determine the correct format, you would need to display it as a*b^C, where 'a' is the integral component that holds the sign, 'b' is implementation defined (Likely fixed by a standard), and 'C' is the exponent used for that number.
Alternatively, you could just display it in hex, it'd mean nothing to a human, though, and it would still be binary for all practical purposes. (And just as portable!)
To answer your second question:
it IS possible to exactly and unambiguously represent floats as strings. However, this requires a hexadecimal representation. For instance, 1/16 = 0.1 and 10/16 is 0.A.
With hex floats, you can define a canonical representation. I'd personally use a fixed number of digits representing the underlying number of bits, but you could also decide to strip trailing zeroes. There's no confusion possible on which trailing digits are zero.
Since the representation is exact, the conversions are reversible: f==hexstring2float(float2hexstring(f))