exponential value restriction in Float variable in C++


I want my result stored with a fixed precision, not just formatted that way for input/output purposes.
float a = cos ( 90*(PI/180)) gives 1.794897E-09
whereas I want the value in my variable to be correct to 8 decimal places, which would give
0.00000000
setprecision and other methods do not help with storing the value in a variable. How can it be stored? It need not even be 8 or 9 digits; all I want is to avoid the exponential form in my answer.

You are limited by the type you are using. A single precision float can only represent between 6 and 9 significant decimal digits.
https://en.wikipedia.org/wiki/Single-precision_floating-point_format
Remember, a float is not a decimal value. So what you're seeing is the decimal representation. If you want more digits in the decimal representation, use a double.
https://en.wikipedia.org/wiki/Double-precision_floating-point_format
Double precision floats can represent values to 15-17 digits of decimal precision. This should guarantee the minimum of 8 that you require.

The precision and encoding of floating-point values is concretely defined by IEEE 754. Unless you define your own implementation of how floating-point values should be stored in memory, you can't really change the internal precision with which they are encoded and stored.
If you want better precision you can use doubles. All the math functions work well with doubles.

Related

strtof() function misplacing decimal place

I have a string "1613894376.500012077" and I want to use strtof to convert it to the floating-point value 1613894376.500012077. The problem is that when I use strtof I get the following result, with the decimal point misplaced: 1.61389e+09. Please help me determine how to use strtof properly.

A typical float is 32-bit and can only represent exactly about 2^32 different values. "1613894376.500012077" is not one of those.
"1.61389e+09" is the same value as "1613890000.0" and represents a close value that float can represent.
The 2 closest floats are:
1613894272.0
1613894400.0 // slightly closer to 1613894376.500012077
Print with more precision to see more digits.

The decimal point is not misplaced. The notation “1.61389e+09” means 1.61389 × 10^9, which is 1,613,890,000., which has the decimal point in the correct place.
The actual result of strtof in your computer is probably 1,613,894,400. This is the closest value to 1613894376.500012077 that the IEEE-754 binary32 (“single”) format can represent, and that is the format commonly used for float. When you print it with %g, the default is to use just six significant digits. To see it with more precision, print it with %.999g.

The number 1613894376.500012077 is the same number as 1.61389e+09, up to the precision shown. The e+09 suffix means that the decimal point belongs nine decimal digits to the right of where it is written (that is, the number is multiplied by 10 to the ninth power). This is a common notation in computer science called scientific notation.

Double value change during assignment

I know double has some precision issues and it can truncate values during conversion to integer.
In my case I am assigning 690000000000123455 to a double and it gets changed to 690000000000123392 during the assignment.
Why is the number being changed so drastically? After all, there is no fractional part in it. It doesn't seem like a precision issue, as the value changes not by 1 but by 63.

Presumably you store 690000000000123455 as a 64 bit integer and assign this to a double.
double d = 690000000000123455;
The closest representable double precision value to 690000000000123455 can be checked here: http://pages.cs.wisc.edu/~rkennedy/exact-float?number=690000000000123455 and is seen to be 690000000000123392.
In other words, everything is as expected. Your number cannot be represented exactly as a double precision value and so the closest representable value is chosen.
For more discussion of floating point data types see: Is floating point math broken?

IEEE-754 double precision floats have about 53 bits of precision which equates to about 16 decimal digits (give or take). You'll notice that's about where your two numbers start to diverge.

A double occupies 8 bytes. Its positive normalized values range from about 2.2E-308 to 1.8E+308, and its precision is about 15 significant decimal digits. Your number contains 18 digits. That's the reason.
You could use long double, which on some platforms (for example the x87 extended format on x86) has a precision of up to about 19 decimal digits.

The other answers are already pretty complete, but I want to suggest a website I find very helpful in understanding how floating-point numbers work: IEEE 754 Converter (only 32-bit float here, but the interaction is still very good).
As we can see, 690000000000123455 is between 2^59 and 2^60, and the finest step of the mantissa is 2^-52 for double precision, which means that the spacing between representable values near the given number is 2^59 * 2^-52 = 2^7 = 128. The error of 63 that you observed is within half of that spacing, so it is exactly what rounding to the nearest value produces.
As a side suggestion, it is better to use a 64-bit integer type such as long long or int64_t for storing big integers (long is only 32 bits on some platforms), as it holds the value exactly and does not overflow in your case.

Errors multiplying large doubles

I've made a BOMDAS calculator in C++ that uses doubles. Whenever I input an expression like
1000000000000000000000*1000000000000000000000
I get a result like 1000000000000000000004341624882808674582528.000000. I suspect it has something to do with floating-point numbers.

Floating-point numbers represent values with a fixed-size representation. A double can represent 16 decimal digits in a form where the decimal digits can be restored (internally, it normally stores the value using base 2, which means that it cannot accurately represent most fractional decimal values). If the number of digits is exceeded, the value will be rounded appropriately. Of course, the upshot is that you won't necessarily get back the digits you're hoping for: if you ask for more than 16 decimal digits, either explicitly or implicitly (e.g. by setting the format to std::ios_base::fixed with numbers which are bigger than 1e16), the formatting will conjure up more digits: it will accurately represent the internally held binary value, which may produce up to, I think, 54 non-zero digits.
If you want to compute with large values accurately, you'll need some variable sized representation. Since your values are integers a big integer representation might work. These will typically be a lot slower to compute with than double.

A double stores 53 bits of precision. This is about 15 decimal digits. Your problem is that a double cannot store the number of digits you are trying to store. Digits after the 15th decimal digit will not be accurate.

That's not an error. It's exactly because of how floating-point types are represented, as the result is precise to double precision.
Floating-point types in computers are written in the form (-1)^sign * mantissa * 2^exp, so they only have broader ranges, not infinite precision. They're only accurate to the mantissa precision, and the result of every operation is rounded accordingly. The double type is most commonly implemented as IEEE-754 64-bit double precision with 53 bits of mantissa, so it can be correct to log10(2^53) ≈ 15.955 decimal digits. Doing 1e21*1e21 produces 1e42, which when rounded to the closest value in double precision gives the value that you saw. If you round that to 16 digits it's exactly the same as 1e42.
If you need more range, use double or long double. If you only work with integers, then int64_t (or __int128 with gcc and many other compilers on 64-bit platforms) has a much larger precision (64/128 bits compared to 53 bits). If you need even more precision, use an arbitrary-precision arithmetic library such as GMP instead.

Want to calculate 18 digits but will only calculate to 6? Using long double but still won't work

I've written a program to estimate pi using the Gregory Leibniz formula, however, it will not calculate to 18 decimal points. It will only calculate up to 5 decimal points. Any suggestions?

Use
cout.precision(50);
to increase the precision of the printed output. Here 50 is the number of decimal digits in your output.

The default printing precision for printf is 6:
"Precision specifies the exact number of digits to appear after the decimal point character. The default precision is 6."
Similarly, when std::cout was introduced into C++, the same default value was used:
"Manages the precision (i.e. how many digits are generated) of floating point output performed by std::num_put::do_put.
Returns the current precision.
Sets the precision to the given one. Returns the previous precision.
The default precision, as established by std::basic_ios::init, is 6."
https://en.cppreference.com/w/cpp/io/ios_base/precision
Therefore, regardless of how precise the type is, only 6 digits will be printed by default. To get more digits you'll need to use std::setprecision or std::cout.precision.
However, calling std::cout.precision only affects the number of decimal digits in the output, not the number's real precision. Any digits beyond the type's precision are just garbage.
Most modern systems use IEEE-754, where float is single precision with 23 bits of mantissa and double is double precision with 52 bits of mantissa. As a result they're accurate to ~6-7 and ~15-16 decimal digits respectively. That means they can't represent numbers to 18 decimal places as you expected.
On some platforms there may be extended-precision types, so you'll be able to store numbers more precisely. For example, long double on most compilers on x86 has 64 bits of precision and can represent ~18 significant digits, but that's not 18 digits after the decimal point. Higher precision can be obtained with quadruple precision on some compilers. To achieve even more precision, the only way is to use a big number library or write one of your own.

double and accuracy

Using long double I get 18/19 = 0.947368421052631578..., and 947368421052631578 is the repeating decimal. Using double I get 0.947368421052631526... However, the former is correct. Why such an incorrect result?
Thanks for help.

A double typically provides 16(±1) decimal digits. Your example shows this:
     4   8  12  16
     v   v   v   v
0.947368421052631578 long double
0.947368421052631526 double
The answers agree to 16 digits. This is what should be expected. Also note that there's no guarantee in the C Standard that a long double has more precision than a double.

You're trying to represent every decimal number with a finite number of bits. Some things just aren't expressible exactly in floating point. Expecting exact answers with floats is your first problem. Take a look at What Every Computer Scientist Should Know About Floating-Point Arithmetic.
Here's a summary from some lecture notes:
As mentioned earlier, computers cannot represent real numbers precisely since there are only a finite number of bits for storing a real number. Therefore, any number that has infinite number of digits such as 1/3, the square root of 2 and PI cannot be represented completely. Moreover, even a number of finite number of digits cannot be represented precisely because of the way of encoding real numbers.

A double which is usually implemented with IEEE 754 will be accurate to between 15 and 17 decimal digits. Anything past that can't be trusted, even if you can make the compiler display it.
