I'm using the asin and acos functions from the <cmath> library to find angles from their corresponding sine and cosine values, but they sometimes return inaccurate values. For example, the result of the code below is not zero:
cout << acos(-1) / 6 - asin(0.5) << endl;
So what can I do? I use acos(-1) as the value of pi, and then somewhere in my code I want to see, for example, how many asin(0.5) fit in pi, but the current method doesn't work.
Inexactness lurks in various places.
double is very commonly encoded using base 2, such as with binary64. As a floating point number, it can be expected to be a dyadic rational; the key is that a finite floating-point number is always a rational number.
Mathematically, arccos(-1) is π. π is an irrational number. Even with many digits of precision, a double cannot represent π exactly. Instead, a nearby value is used for acos(-1), sometimes called machine pi; this is what the constant M_PI provides.
Mathematically, arcsin(0.5) is π/6. As above, asin(0.5) is a nearby value to π/6.
Division by 6 introduces potential inexactness: as with division by 3, roughly two out of every three double values produce a quotient that must be rounded.
Of course it is not zero.
What is curious about this is that acos(-1) / 3 - asin(0.5) is about 0.52 (roughly π/6), not 0.0.
I suspect OP was researching acos(-1) / 6 - asin(0.5).
Here subtraction of nearby values incurs catastrophic cancellation, and a small but non-zero difference is a likely outcome.
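A quick way to see the size of the residue (a sketch; the exact digits depend on the platform's math library):

#include <cmath>
#include <cstdio>

int main() {
    // both terms are within an ulp or so of pi/6, but need not be identical
    double diff = std::acos(-1.0) / 6 - std::asin(0.5);
    std::printf("%.17g\n", diff);   // typically on the order of 1e-16
}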
So what can I do?
I want to see for example how many asin(0.5) are there in Pi
Tolerate marginal error in the calculation and employ the desired rounding.
#include <cmath>   // acos, asin, lround

double my_pi = acos(-1);
double target = asin(0.5);
// lround rounds the slightly-off quotient (5.999... or 6.000...01) to exactly 6
long close_integer_quotient = lround(my_pi / target);
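Printing the rounded quotient then gives the intended whole number (continuing the snippet above):

std::cout << close_integer_quotient << '\n';   // 6: asin(0.5) fits six times in pi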
I came across something rather interesting while playing around with the math module for trigonometric calculations using tan, sin, and cos.
As stated in all math textbooks, online sources, and courses, the following is true:
tan(x) = sin(x) / cos(x)
However, I came across some precision errors while using the three trig functions, as follows:
from math import tan, sin, cos
theta = -30
alpha = tan(theta)
omega = sin(theta) / cos(theta)
print(alpha, omega)
print(alpha == omega)
>>> (6.405331196646276, 6.4053311966462765)
>>> (False)
I have tried a couple of different values for theta and the last digit of the results has been off by a tiny bit.
Is there something that I am missing?
This issue arises because of finite floating point precision (not all real numbers can be represented exactly, and not all calculations with them are precise). An accessible guide is in the Python docs.
Using the default "double precision" floating point representation, you can never hope for better than about 15 decimal places of precision, and calculations involving such numbers will tend to degrade this precision (the rounding error referred to above). In the same way, you get False from the following:
In [1]: 0.01 == (0.1)**2
Out[1]: False
because Python isn't squaring 0.1 but the "nearest representable number" to 0.1, which is neither 0.01 nor the nearest representable number to 0.01.
D Stanley has given the correct way to test for "equality" within some absolute tolerance: (abs(a-b) < tol) where tol is some small number you choose to fit your expected precision.
As you have discovered, there is a level of imprecision when comparing floating point numbers. A common way to test for "equality" is to determine a reasonable amount of difference you want to accept (commonly called "epsilon") and compare the difference between the two numbers against that maximum error:
epsilon = 1E-14
print(alpha, omega)
print(alpha == omega)
print(abs(alpha - omega) < epsilon)
First you should notice that the arguments of trigonometric functions are given in radians (arc length on the unit circle), not in degrees. Thus theta = -30 refers to an angle of -30*180/pi in degrees.
Second, the processor, and thus the calling math library, has separate internal procedures for the computation of tan and (sin, cos). The extra division operation loses 1/2 to 1 bit of precision, which explains the difference in results.
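The same experiment can be reproduced in C++ (a sketch; the exact low-order digits depend on the compiler and math library):

#include <cmath>
#include <cstdio>

int main() {
    double theta = -30.0;    // -30 radians, not degrees
    double alpha = std::tan(theta);
    double omega = std::sin(theta) / std::cos(theta);
    std::printf("%.17g\n%.17g\n", alpha, omega);
    std::printf("%d\n", alpha == omega);   // often 0: the values differ in the last bit
}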
I read a double value using std::cin from the keyboard; let the value be 1.15. When I place a breakpoint after reading the value, Visual Studio shows the value as 1.14999999, but if I print it, it prints 1.15 on my console. Later I wrote the following code and it did not work well:
#include <iostream>

int main()
{
    long double valueA;
    int required;
    std::cin >> valueA;
    // intended: extract the first four fractional digits (1.015 -> 150)
    required = (valueA * 10000) - (((int)valueA) * 10000);
    std::cout << required;
}
When the input is 1.015 the output is 149 but the expected output is 150. Why is my compiler considering 1.015 as 1.014999999? How can I correct that error?
What you are describing is floating point error. This happens because of the way floating point is represented at the hardware level (see here). Basically a floating point number is kept as three fields, s, m, and e, and is reconstructed as s * m * 2^e, where ^ is "to the power of" and s is 1 if the s bit is 0 and -1 if the s bit is 1.
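You can inspect the m and e fields with std::frexp, which splits a double into a significand in [0.5, 1) and a power-of-two exponent (a quick sketch):

#include <cmath>
#include <cstdio>

int main() {
    int e;
    double m = std::frexp(1.015, &e);          // 1.015 == m * 2^e
    std::printf("m = %.17g, e = %d\n", m, e);  // m is approximately 0.5075, e is 1
}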
If you need that sort of accuracy, you can use a decimal arithmetic library. It is more or less the same thing, but instead of using powers of 2 these libraries use powers of 10, and because they are implemented in software they can have arbitrary precision (more on that here).
Here's a list of libraries that implement decimal arithmetic that you can use:
gmp - https://gmplib.org/
qdecimal - https://code.google.com/p/qdecimal/
intel decimal - https://software.intel.com/en-us/articles/intel-decimal-floating-point-math-library/
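If all you need are the first few fractional digits, you can also stay with binary floating point and round instead of truncating. A minimal sketch of that idea (not a general fix, since the stored value is still inexact):

#include <cmath>
#include <iostream>

int main()
{
    long double valueA;
    std::cin >> valueA;
    // std::llround rounds 10149.999... up to 10150 instead of truncating to 10149
    long long required = std::llround(valueA * 10000) - (long long)valueA * 10000;
    std::cout << required;   // 150 for an input of 1.015
}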
I am trying to get the decimal (fractional) part of a double, and this is my code:
double decimalvalue = 23423.1234-23423.0;
0.12340000000040163
But after the subtraction I expect decimalvalue to be 0.1234, yet I get 0.12340000000040163. Please help me understand this behavior, and whether there is any workaround for it.
I suggest you have a look at
What Every Computer Scientist Should Know About Floating-Point Arithmetic
Wikipedia: IEEE 754
A floating point type can represent only a finite set of values, but there are infinitely many real numbers in the range it covers.
Most real numbers therefore cannot be represented exactly in any float/double style data type.
The typical way to handle your specific problem is to avoid a direct equality comparison, but rather do an epsilon test: See if the expected and computed values are within some small number (compared to the values being subtracted), called epsilon, of each other.
Indirectly related is the concept of machine epsilon, worth having a look at for a complete understanding.
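A common shape for such a test (a sketch; nearly_equal is a name chosen here for illustration, and the tolerance must be picked to suit your data):

#include <algorithm>
#include <cmath>

// true if a and b agree to within rel_tol of the larger magnitude
bool nearly_equal(double a, double b, double rel_tol = 1e-12) {
    return std::fabs(a - b) <= rel_tol * std::max(std::fabs(a), std::fabs(b));
}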
This is a rounding error. In base ten you cannot perfectly represent 1/3 in a given number of digits (say 15). In base 2 there are a lot more things you cannot represent, and 0.1234 happens to be one of them. The precision depends on the scale, but it's about 15 decimal digits for a double. I would suggest taking a look at http://en.wikipedia.org/wiki/IEEE_floating_point for more details on floating point numbers.
If you are trying to make a base-10 system (like a hand-held calculator, for instance) and you need exact results, you should use BCD.
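If you know how many decimal digits are meaningful, a common workaround is to round to that many digits after the arithmetic (a sketch; this hides the binary representation error rather than eliminating it):

#include <cmath>
#include <cstdio>

int main() {
    double decimalvalue = 23423.1234 - 23423.0;
    // round to 4 decimal places: 0.12340000000040163 becomes the double nearest 0.1234
    double rounded = std::round(decimalvalue * 1e4) / 1e4;
    std::printf("%.17g\n%.4f\n", decimalvalue, rounded);
}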
I'm trying to represent a simple set of 3 probabilities in C++. For example:
a = 0.1
b = 0.2
c = 0.7
(As far as I know probabilities must add up to 1)
My problem is that when I try to represent 0.7 in C++ as a float I end up with 0.69999999, which won't help when I am doing my calculations later. The same for 0.8, 0.80000001.
Is there a better way of representing numbers between 0.0 and 1.0 in C++?
Bear in mind that this relates to how the numbers are stored in memory, so that when it comes to doing tests on the values they are correct; I'm not concerned with how they are displayed/printed out.
This has nothing to do with C++ and everything to do with how floating point numbers are represented in memory. You should never use the equality operator to compare floating point values, see here for better methods: http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm
My problem is that when I try to represent 0.7 in C++ as a float I end up with 0.69999999, which won't help when I am doing my calculations later. The same for 0.8, 0.80000001.
Is it really a problem? If you just need more precision, use a double instead of a float. That should get you about 15 digits of precision, more than enough for most work.
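For instance (a sketch; the printed digits are the nearest representable values on a typical IEEE 754 platform):

#include <cstdio>

int main() {
    float  f = 0.7f;
    double d = 0.7;
    std::printf("%.9f\n", f);    // 0.699999988 (float: ~7 significant digits)
    std::printf("%.17f\n", d);   // 0.69999999999999996 (double: ~16)
}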
Consider your source data. Is 0.7 really significantly more correct than 0.69999999?
If so, you could use a rational number library such as:
http://www.boost.org/doc/libs/1_40_0/libs/rational/index.html
If the problem is that probabilities add up to 1 by definition, then store them as a collection of numbers, omitting the last one. Infer the last value by subtracting the sum of the others from 1.
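A minimal sketch of that idea, assuming the three probabilities from the question:

#include <cstdio>

int main() {
    // store only the first n-1 probabilities; infer the last as the remainder
    double a = 0.1, b = 0.2;
    double c = 1.0 - (a + b);    // stands in for 0.7 without being stored
    std::printf("%.17g\n", c);   // close to, but not exactly, 0.7
}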
How much precision do you need? You might consider scaling the values and quantizing them in a fixed-point representation.
The tests you want to do with your numbers will be incorrect.
There is no exact floating point representation in a base-2 number system for a number like 0.1, because it is an infinite periodic number in base 2. Consider one third, which is exactly representable as 0.1 in a base-3 system, but is 0.333... in the base-10 system.
So any test you do with the number 0.1 in floating point will be prone to error.
A solution would be using rational numbers (Boost has a rational lib), which will always be exact for, ermm, rationals, or to use a self-made base-10 system by multiplying the numbers with a power of ten.
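A sketch with Boost's rational library (assuming Boost is available; boost::rational stores an exact numerator/denominator pair):

#include <boost/rational.hpp>
#include <iostream>

int main() {
    boost::rational<int> a(1, 10), b(2, 10), c(7, 10);
    std::cout << a + b + c << '\n';   // prints 1/1: the sum is exactly one
}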
If you really need the precision, and are sticking with rational numbers, I suppose you could go with fixed point arithmetic. I've not done this before so I can't recommend any libraries.
Alternatively, you can set a threshold when comparing fp numbers, but you'd have to err on one side or another -- say
bool fp_cmp(float a, float b) {
    const float epsilon = 1e-6f;   // tolerance chosen to match your expected precision
    return a < b + epsilon;        // "a is not greater than b", give or take epsilon
}
Note that excess precision is automatically rounded away in each calculation, so you should take care when operating at many different orders of magnitude in your algorithm. A contrived example to illustrate:
a = 15434355e10 + 22543634e10
b = a / 1e20 + 1.1534634
c = b * 1e20
versus
c = a + 1.1534634e20
The two results will differ. Using the first method, a lot of the precision of the first two numbers will be lost in the divide by 1e20. Assuming that the final value you want is on the order of 1e20, the second method will give you more precision.
If you only need a few digits of precision then just use an integer. If you need better precision then you'll have to look to different libraries that provide guarantees on precision.
The issue here is that floating point numbers are stored in base 2. You cannot, in general, exactly represent a base-10 decimal with a base-2 floating point number.
Let's step back a second. What does .1 mean? Or .7? They mean 1×10^-1 and 7×10^-1. If you're using binary for your number, instead of base 10 as we normally do, .1 means 1×2^-1, or 1/2. .11 means 1×2^-1 + 1×2^-2, or 1/2 + 1/4, or 3/4.
Note how in this system, the denominator is always a power of 2. You cannot represent a number whose denominator is not a power of 2 in a finite number of binary digits. For instance, .1 (in decimal) means 1/10, but in binary that is an infinite repeating fraction, 0.000110011... (with the 0011 pattern repeating forever). This is similar to how in base 10, 1/3 is the infinite fraction 0.3333...; base 10 can only represent exactly those numbers whose denominator is a product of powers of 2 and 5. (As an aside, base 12 and base 60 are actually really convenient bases, since 12 is divisible by 2, 3, and 4, and 60 is divisible by 2, 3, 4, and 5; but for some reason we use decimal anyhow, and we use binary in computers.)
Since floating point numbers (or fixed point numbers) always have a finite number of digits, they cannot represent these infinite repeating fractions exactly. So, they either truncate or round the values to be as close as possible to the real value, but are not equal to the real value exactly. Once you start adding up these rounded values, you start getting more error. In decimal, if your representation of 1/3 is .333, then three copies of that will add up to .999, not 1.
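To see the accumulation concretely (a sketch):

#include <cstdio>

int main() {
    double sum = 0.0;
    for (int i = 0; i < 10; ++i)
        sum += 0.1;                  // each 0.1 is already slightly off
    std::printf("%.17g\n", sum);     // 0.99999999999999989, not 1
}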
There are four possible solutions. If all you care about is exactly representing decimal fractions like .1 and .7 (as in, you don't care that 1/3 will have the same problem you mention), then you can represent your numbers in decimal, for instance using binary coded decimal, and manipulate those. This is a common solution in finance, where many operations are defined in terms of decimal. This has the downside that you will need to implement all of the arithmetic operations yourself, without the benefit of the computer's FPU, or find a decimal arithmetic library. This also, as mentioned, does not help with fractions that can't be represented exactly in decimal.
Another solution is to use fractions to represent your numbers. If you use fractions, with bignums (arbitrarily large numbers) for your numerators and denominators, you can represent any rational number that will fit in the memory of your computer. Again, the downside is that arithmetic will be slower, and you'll need to implement arithmetic yourself or use an existing library. This will solve your problem for all rational numbers, but if you wind up with a probability that is computed based on π or √2, you will still have the same issues with not being able to represent them exactly, and need to also use one of the later solutions.
A third solution, if all you care about is getting your numbers to add up to 1 exactly, is for events where you have n possibilities, to only store the values of n-1 of those probabilities, and compute the probability of the last as 1 minus the sum of the rest of the probabilities.
And a fourth solution is to do what you always need to remember when working with floating point numbers (or any inexact numbers, such as fractions being used to represent irrational numbers), and never compare two numbers for equality. Again in base 10, if you add up 3 copies of 1/3, you will wind up with .999. When you want to compare that number to 1, you have to instead compare to see if it is close enough to 1; check that the absolute value of the difference, 1-.999, is less than a threshold, such as .01.
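In code, that last check looks something like this (a sketch with an arbitrary threshold):

#include <cmath>
#include <cstdio>

int main() {
    double third = 1.0 / 3.0;
    double sum = third + third + third;        // analogous to .333 + .333 + .333
    bool close = std::fabs(sum - 1.0) < 0.01;  // compare against a threshold, not with ==
    std::printf("%d\n", close);                // 1: close enough to 1
}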
Binary machines always round decimal fractions (except .0, .5, .25, .75, and other sums of powers of 1/2), because such values don't have an exact representation in floating point. This has nothing to do with the language C++. There is no real way around it except to deal with it from a numerical perspective within your code.
As for actually producing the probabilities you seek:
#include <cstdlib>   // rand, RAND_MAX

// returns 0, 1, or 2 with probabilities 0.1, 0.2, 0.7
// (wrapped in a function here so the final return compiles; the name is illustrative)
int pick_event() {
    float pr[3] = {0.1f, 0.2f, 0.7f};
    float accPr[3];                 // cumulative sums: roughly 0.1, 0.3, 1.0
    float prev = 0.0f;
    int i = 0;
    for (i = 0; i < 3; i++) {
        accPr[i] = prev + pr[i];
        prev = accPr[i];
    }
    // divide by RAND_MAX + 1.0 (as a double) to avoid the integer overflow and
    // integer division that the original rand() / (1 + RAND_MAX) runs into
    float frand = rand() / (RAND_MAX + 1.0);
    for (i = 0; i < 2; i++) {
        if (frand < accPr[i]) break;
    }
    return i;
}
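Hypothetical usage, continuing from the function above; counting many draws should approximate the target distribution:

#include <cstdio>
#include <ctime>

int main() {
    srand((unsigned)time(nullptr));
    int counts[3] = {0, 0, 0};
    for (int n = 0; n < 100000; ++n)
        counts[pick_event()]++;
    std::printf("%d %d %d\n", counts[0], counts[1], counts[2]);   // roughly 10%, 20%, 70%
}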
I'm sorry to say there's not really an easy answer to your problem.
It falls into a field of study called "Numerical Analysis" that deals with these types of problems (which goes far beyond just making sure you don't check for equality between 2 floating point values). And by field of study, I mean there are a slew of books, journal articles, courses etc. dealing with it. There are people who do their PhD thesis on it.
All I can say is that I'm thankful I don't have to deal with these issues very much, because the problems and the solutions are often very non-intuitive.
What you might need to do to deal with representing the numbers and calculations you're working on is very dependent on exactly what operations you're doing, the order of those operations and the range of values that you expect to deal with in those operations.
Depending on the requirements of your applications any one of several solutions could be best:
1. You live with the inherent lack of precision and use floats or doubles. You cannot test either for equality, and this implies that you cannot test the sum of your probabilities for equality with 1.0.
2. As proposed before, you can use integers if you require a fixed precision. You represent 0.7 as 7, 0.1 as 1, 0.2 as 2, and they will add up perfectly to 10, i.e., 1.0 (see the sketch after this list). If you have to calculate with your probabilities, especially if you do division and multiplication, you need to round the results correctly; this will introduce imprecision again.
3. Represent your numbers as fractions with a pair of integers: (1,2) = 1/2 = 0.5. Precise and more flexible than 2), but you don't want to implement the arithmetic on those yourself.
4. You can go all the way and use a library that implements rational numbers (e.g. GMP). Precise, with arbitrary precision, and you can calculate with it, but slow.
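A minimal sketch of option 2, assuming three fixed-precision probabilities scaled by 1000:

#include <cstdio>

int main() {
    // probabilities stored in thousandths: 0.1, 0.2, 0.7
    int a = 100, b = 200, c = 700;
    int total = a + b + c;
    std::printf("%d\n", total == 1000);   // 1: the sum is exactly 1.0 by construction
}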
Yeah, I'd scale the numbers (0-100, 0-1000, or whatever fixed size you need) if you're worried about such things. It also makes for faster math computation in most cases. Back in the bad-old-days, we'd define entire cos/sine tables and other such bleh in integer form to reduce floating fuzz and increase computation speed.
I do find it a bit interesting that a "0.7" fuzzes like that on storage.