Converting float to int in C++ [duplicate] - c++

This question already has answers here:
Is floating point math broken?
(31 answers)
Closed 5 years ago.
I am talking about converting floats that already store integer values (like 10.0)
I need to do this because I want to create a simple function that reverses a given integer, and it uses a line of code that looks something like this: ans += x % (int)pow(10, i); (the % operator needs both arguments to be integer, and pow returns a double)
C++ is not very happy doing this conversion properly. For pow(10, 4), I get a value of 9999, which is very irritating. Why this error occurs is obvious, 10^4 must have been stored as 9999.999999... or something like that. I could possibly fix this error by using the lround() function, but that would be a less than optimal solution because what if I overlook a similar error like this?
Is there a way to do mathematics in C++ without worrying about such trivialities? If not, what language should I choose to do computation like this? I have briefly used the bigInt library in Python. Any suggestions regarding which language and library to use to tackle this issue would be very helpful.

Other languages have types for exact decimal arithmetic, for example decimal in C# and BigDecimal in Java. With such types, representation artifacts like
printf (" %.20f \n", 3.6);
--> 3.60000000000000008882
or
0.1 + 0.2
--> 0.30000000000000004
do not occur.
In C++ you can use, for example, the GNU Multiple Precision Arithmetic Library (GMP), Boost.Multiprecision, or DecimalForCPP (header-only). There are certainly more libraries available.
If you use such types, remember that they are slower than the inaccurate floating point types. But if you work with money for example, it's a must!
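For the digit-reversal task in the question itself, one option is to skip pow() altogether and stay in integer arithmetic, so no rounding can creep in. A minimal sketch: the name reverse_digits is made up for illustration, and it assumes a non-negative input whose reversal fits in 64 bits.

```cpp
#include <cstdint>

// Reverse the decimal digits of a non-negative integer using only
// integer division and remainder (no pow(), no floating point).
// Hypothetical helper; assumes the reversed value fits in 64 bits.
std::uint64_t reverse_digits(std::uint64_t x) {
    std::uint64_t reversed = 0;
    while (x != 0) {
        reversed = reversed * 10 + x % 10;  // append the lowest digit
        x /= 10;                            // drop the lowest digit
    }
    return reversed;
}
```

Note that trailing zeros of the input are lost by construction: reversing 1200 gives 21.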

Related

float variable does not store decimal [duplicate]

This question already has answers here:
Why does integer division in C# return an integer and not a float?
(8 answers)
Closed 6 years ago.
The code I used in C++ is:
float y;
y=360/100;
cout<<y;
the output is 3.
Even, if I don't output y, and instead use it for a function like left(y), the value 3 is taken instead of 3.6. But if I define y=360.0/100, it works fine.
By the way, left() is a function included in package made by our CS prof. Left(x) changes the direction by an angle of x degrees towards left. A logo based package.
This is the way the language is defined. You divide an int by an int, so the calculation is performed resulting in an int, giving 3 (truncating, rather than rounding).
You then store 3 into a float, giving 3.0.
If you want the division performed using floats, make (at least) one of the arguments a float, e.g. 360.0f / 100 (in C++ the float literal needs the decimal point; 360f does not compile). In this way, the other argument will be converted to a float before the division is performed.
360 / 100.0f will work equally well, but I think it probably makes sense to make it clear as early as possible that this is a floating point calculation, rather than an integral one; that's a human consideration, rather than a technical one.
(Note that 360.0 is actually a double, although using that will work as well. The division would be performed as a double, then the result converted to a float for the assignment).
360/100 is computed in integer arithmetic before it's assigned to the float.
Reworking to 360.0f / 100 is my favourite way of fixing this.
Finally, on modern hardware there is rarely a good reason to prefer float. Use a double type instead, which will probably be at least as fast, and write 360.0 / 100.
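The three variants discussed in these answers can be put side by side. This is only an illustrative sketch; the function names are invented for the comparison.

```cpp
// Both operands are int: the division is done in int and truncates
// to 3, which is then converted to 3.0f on return.
float int_division() { return 360 / 100; }

// One operand is a float literal: the other is converted to float
// and the division is done in float, giving roughly 3.6f.
float float_division() { return 360.0f / 100; }

// 360.0 is a double literal: the division is done in double.
double double_division() { return 360.0 / 100; }
```

Only the first function loses the fractional part, because the truncation happens before any floating-point type is involved.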

clean up value without rounding up or down [duplicate]

This question already has answers here:
Is floating point math broken?
(31 answers)
Closed 6 years ago.
There is an input to my software for processing: float totalPurchased. I am coding with C++11/GCC/GDB/Linux.
The totalPurchase price informed is 14.92 as it is read from a file.
However, when the program runs, it shows 14.920001 out of nowhere. I don't want to round the value 14.92 up to 15.00 or down to 14.00; the only thing I really need is to have the input right, without the compiler adding things that do not exist in the input.
The problem is that this 0.000001 breaks the whole software calculation in the long run.
How can I get rid of this 0.000001 and make sure that my float variable holds the actual value that was read from the file, 14.92?
All comments and suggestions are highly appreciated.
Unfortunately, floating point can't represent your number exactly: 14.92 is a repeating fraction in binary.
The question you want to ask yourself is: why does such a small offset break your calculation? If you really need to compare values so exactly, then perhaps floating point is not the appropriate datatype.
If you need something like, say, an exact percentage, or an exact number of cents, you can store 100* the number. This is a persistent problem in accounting, which is why accountants don't use floating point to add their money.
Use a fixed point library such as this.
For myself, in this domain, I would store price as integer pennies and do something along the line of
char buf[32]; snprintf(buf, sizeof buf, "%i.%02i", price/100, price%100);
Caveats for negative numbers, and in some applications you might want sub-penny precision, but you get the idea. A full fixed-point library is overkill for things like this, I think.
Actually, because I like to Do The Right Thing I would do something like this
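#include <string>

class CashValue
{
public:
    // Build a value from text such as "14.92".
    static CashValue parse (const std::string &);
    // Lossy conversion, for display or approximate math only.
    float to_float () const;
    // Adding pennies is exact, so it is supported directly.
    CashValue operator + (const CashValue &) const;
    // etc
private:
    int m_pennies; // amount in whole pennies
};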
Thus making the significance of the unit explicit. Only support operations for which exact solutions exist. If you want to do something like this
price *= pow (1.0 + (percent_interest/100.0), years)
then my interface would force you to verbosely convert it first, making you think about the issues when they become relevant, but still supporting safe operations such as addition transparently and accurately.
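To make the idea concrete, here is one way such a class could be filled in. It is a sketch under stated assumptions rather than the answerer's actual implementation: the input format (non-negative, exactly two digits after the dot), the pennies() accessor and to_string() are all hypothetical additions.

```cpp
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical sketch: money held as an integer number of pennies.
// Assumes non-negative input of the form "D.PP" (digits, a dot, exactly
// two digits); anything else is rejected. No overflow checks.
class CashValue {
public:
    static CashValue parse(const std::string &text) {
        const std::size_t dot = text.find('.');
        if (dot == std::string::npos || dot == 0 || text.size() != dot + 3)
            throw std::invalid_argument("expected D.PP");
        long pennies = 0;
        for (std::size_t i = 0; i < text.size(); ++i) {
            if (i == dot) continue;                     // skip the separator
            if (text[i] < '0' || text[i] > '9')
                throw std::invalid_argument("expected digits");
            pennies = pennies * 10 + (text[i] - '0');   // exact integer math
        }
        return CashValue(pennies);
    }

    // Addition is exact, so it is offered without any conversion.
    CashValue operator+(const CashValue &other) const {
        return CashValue(m_pennies + other.m_pennies);
    }

    long pennies() const { return m_pennies; }

    // Format as pounds/dollars and pennies, e.g. 1492 -> "14.92".
    std::string to_string() const {
        char buf[32];
        std::snprintf(buf, sizeof buf, "%ld.%02ld", m_pennies / 100, m_pennies % 100);
        return buf;
    }

private:
    explicit CashValue(long pennies) : m_pennies(pennies) {}
    long m_pennies;
};
```

Because the text "14.92" never passes through a float, the stored value is exactly 1492 pennies and the stray 0.000001 from the question cannot appear.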

Handling large integers in c++ [duplicate]

This question already has answers here:
Using extremely large integer holding 3001 digits
(2 answers)
Closed 9 years ago.
How can I handle very large integers like 2^100000000 in C++?
I found no solution for this on the internet that gives an exact answer.
Is there any mechanism that gives the correct value in C++ for such large integers?
What you are looking for is called arbitrary-precision arithmetic; you will find numerous libraries and educational resources with some googling.
You can represent the given number as a string and convert it to an array of integer digits. But the simplest way is to google for keywords like "long arithmetic c++ library".
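The digit-array approach can be sketched as follows. The function name power_of_two is made up for illustration, and the method (repeated doubling of a decimal digit array) is chosen for clarity only: it needs on the order of n^2 digit operations, so it is far too slow for an exponent like 100000000, where a real arbitrary-precision library is the practical choice.

```cpp
#include <string>
#include <vector>

// Compute 2^n exactly as a decimal string by repeatedly doubling an
// array of decimal digits (least significant digit first).
// Illustrative only; hypothetical helper, quadratic in n.
std::string power_of_two(unsigned n) {
    std::vector<int> digits{1};
    for (unsigned step = 0; step < n; ++step) {
        int carry = 0;
        for (int &d : digits) {
            const int doubled = d * 2 + carry;
            d = doubled % 10;        // digit that stays in this position
            carry = doubled / 10;    // carry into the next position
        }
        if (carry != 0) digits.push_back(carry);
    }
    // Emit the most significant digit first.
    std::string text;
    for (auto it = digits.rbegin(); it != digits.rend(); ++it)
        text.push_back(static_cast<char>('0' + *it));
    return text;
}
```

For example, power_of_two(64) produces a 20-digit result that no built-in 64-bit integer type can hold.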
Maybe you want to use a Computer Algebra System (CAS), which would represent your expression like this:
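class Pow : public Expr {
public:
    Pow(Number b, Number e) : base(b), exp(e) {}
private:
    Number base;
    Number exp;
};
// Expr and Number stand for the CAS's own types; nothing is evaluated here.
Pow expr(2, 100*1000*1000);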
A CAS then allows you to manipulate these expressions structurally instead of the concrete values.

Precision problems of real numbers in Fortran [duplicate]

This question already has answers here:
Why does floating-point arithmetic not give exact results when adding decimal fractions?
(31 answers)
Closed 6 years ago.
I've been trying to use Fortran for my research project, with the GNU Fortran compiler (gfortran), latest version,
but I've been encountering some problems in the way it processes real numbers. If you have for example the code:
program test
implicit none
real :: y = 23.234, z
z = y * 100000
write(*,*) y, z
end program
You'll get as output:
23.23999 2323400.0
I find this really strange.
Can someone tell me what's exactly happening here? Looking at z I can see that y does retain its precision, so for calculations that shouldn't be a problem I suppose. But why is the output of y not exactly the same as the value that I've specified, and what can I do to make it exactly the same?
This is not a problem - all you see is floating-point representation of the number in the computer. The computer cannot handle real numbers exactly, but only approximations of them. A good read about this can be found here: What Every Computer Scientist Should Know About Floating-Point Arithmetic.
Simply by replacing real with double precision, you can increase the number of significant decimal digits from about six to about 15 on most platforms.
The general issue is not limited to Fortran; it comes from representing base-10 real numbers in another base with finite precision. This computer science question has been asked many times here.
For the specifically Fortran aspects, the declaration "real" will likely give you a single precision floating point. As will expressing a constant as "23.234" without a type qualifier. The constant "100000" without a decimal point is an integer so the expression "y * 100000" is causing an implicit conversion of an integer to a real because "y" is a real variable.
For some previous discussions of these issues, see Extended double precision, Fortran: integer*4 vs integer(4) vs integer(kind=4), and Is There a Better Double-Precision Assignment in Fortran 90?
The problem here is not with Fortran, in fact it is not a problem at all. This is just a feature of floating-point arithmetic. If you think about how you would represent 23.234 as a 'single float' in binary, you would see that the number has to be saved to only so many decimals of precision.
The thing to remember about floating-point numbers is: numbers that look round and even in base 10 probably won't in binary.
For a brief overview of floating-point topics, check the Wikipedia article. And for a VERY thorough explanation, check out the canonical paper by Goldberg (PDF).
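The same effect is easy to reproduce outside Fortran. The sketch below is in C++ rather than Fortran, and the helper names show_float and show_double are invented for illustration; they print a value with enough significant digits (9 for single precision, 17 for double precision) to expose what was actually stored.

```cpp
#include <cstdio>
#include <string>

// Format a single-precision value with 9 significant digits, enough
// to distinguish any two different float values.
std::string show_float(float v) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.9g", v);
    return buf;
}

// Format a double-precision value with 17 significant digits, enough
// to distinguish any two different double values.
std::string show_double(double v) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.17g", v);
    return buf;
}
```

Calling show_float(23.234f) shows a value close to, but not exactly, 23.234, while a value such as 0.5, which is exactly representable in binary, comes back unchanged. Switching to double moves the discrepancy much further out, in line with the advice above, but does not remove it.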

Is there a library class to represent floating point numbers?

I am writing an application which does a lot of manipulation with decimal numbers (e.g. 57.65). As multiplications and divisions quickly erode their accuracy, I would like to store the numbers in a class which preserves their accuracy after manipulation, rather than rely on float and double.
I am talking about something like this:
class FloatingPointNumber {
private:
long m_mantissa;
int m_dps; // decimal points
// so for example 57.65 would be represented as m_mantissa=5765, m_dps=2
public:
// Overloaded function for addition
FloatingPointNumber operator+(FloatingPointNumber n);
// Other operator overloads follow
};
While it is possible for me to write such a class, it feels a bit like reinventing the wheel and I am sure that there must be some library class somewhere which does this (although this does not seem to exist in STL).
Does anybody know of such a library? Many thanks.
Do you mean something like this ?
#include "ttmath/ttmath.h"
#include <iostream>
int main()
{
// ttmath::Big takes <exponent words, mantissa words>, so bigdouble has
// 1024 words for the exponent and 256 words for the mantissa.
// A word is 4 bytes here because my machine is 32-bit!
typedef ttmath::Big<1024, 256> bigdouble; // <Exponent, Mantissa>
bigdouble x = 5.05544;
bigdouble y = "54145454.15484854120248541841854158417";
bigdouble z = x * y * 0.01;
std::cout << z;
return 0;
}
You can specify the number of machine words in the mantissa and the exponent as you like.
I have used TTMath to solve Project Euler puzzles, and I am really pleased with it. I think it is relatively stable and the author is very kind if you have questions.
EDIT: I have also used MAPM in the past. It represents big floats in base 100, so decimal numbers convert to its representation without error, unlike base 2, which is what TTMath uses for its big floats. According to its library page, MAPM has been stable since 2000 and has been used in many applications. It is a C library with a nice C++ wrapper.
MAPM nextPrime(){
static MAPM prime = 3;
MAPM retPrime = prime;
prime += 2;
while( isPrime( prime ) == false )
prime += 2;
return retPrime;
}
BTW, if you are interested in GMP and you are using VS, you can check out MPIR, which is a GMP port for Windows ;) For me, TTMath is more than pleasing and easier/faster than everything else I tried, because the library does stack allocations without touching the heap in any way. Strictly speaking it is not an arbitrary-precision library: you specify the precision at compile time, as shown above.
There is a list of libraries here.
I have never tried any of them so I can't recommend a single one, however this one is part of the GNU Project so it can't be half bad.
If you want to roll your own, Binary Coded Decimal is probably your best bet.
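As a starting point for rolling your own, here is a sketch of the mantissa plus decimal-places representation from the question. Note that this is a scaled-integer decimal, not Binary Coded Decimal; all names are hypothetical, and there are no overflow checks.

```cpp
#include <cstdint>

// Hypothetical scaled-decimal value: the number is mantissa / 10^places.
// 57.65 is stored as mantissa = 5765, places = 2. No overflow checks.
struct ScaledDecimal {
    std::int64_t mantissa;
    int places;
};

// Rescale to a larger number of decimal places. This is exact, since
// it only multiplies the mantissa by powers of ten.
ScaledDecimal rescale(ScaledDecimal v, int places) {
    while (v.places < places) {
        v.mantissa *= 10;
        ++v.places;
    }
    return v;
}

// Addition: bring both operands to the same scale, then add mantissas.
ScaledDecimal add(ScaledDecimal a, ScaledDecimal b) {
    const int places = a.places > b.places ? a.places : b.places;
    return {rescale(a, places).mantissa + rescale(b, places).mantissa, places};
}

// Multiplication: mantissas multiply and decimal places add.
ScaledDecimal multiply(ScaledDecimal a, ScaledDecimal b) {
    return {a.mantissa * b.mantissa, a.places + b.places};
}
```

Addition and multiplication stay exact in this scheme (until the mantissa overflows). Division is the hard part: a quotient such as 1/3 has no finite decimal expansion, so any such type has to pick a rounding rule, which is one reason to prefer an existing library.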
A list of decimal arithmetic packages, including Robert Klarer’s decNumber++, which implements the interfaces specified in the forthcoming ISO Technical Report on decimal arithmetic types in C++: ISO/IEC TR 24733: C++ Decimal Floating-Point Arithmetic Extensions
The Multiple Precision Floating point with correct Rounding library, but if I remember correctly, it is binary floating point
I have no experience with these libraries, but just as a matter of awareness, there have been 2 major developments that I think are relevant to this question in the last few years...
"posits" - a new floating-point format that is more efficient and less "messy" than IEEE754.
C# 11 has introduced "static abstract interface members" which enables (for our purposes) implementing new numeric types while getting all the same benefits of the built-in numeric types in terms of operator overloading, etc... i.e. truly generic numeric types in C#.
I know of no implementations of "posits" in C#, nor is C# 11 released yet. But again -- these are salient developments related to the question.