I came across something rather interesting while I was playing around with the math module for trigonometric calculations using tan, sin, and cos.
As stated in all math textbooks, online sources, and courses, the following is true:
tan(x) = sin(x) / cos(x)
However, I came across some precision errors while using the three trig functions as follows:
from math import tan, sin, cos
theta = -30
alpha = tan(theta)
omega = sin(theta) / cos(theta)
print(alpha, omega)
print(alpha == omega)
>>> (6.405331196646276, 6.4053311966462765)
>>> (False)
I have tried a couple of different values for theta and the last digit of the results has been off by a tiny bit.
Is there something that I am missing?

This issue is because of the finite floating point precision (not all real numbers can be represented exactly and not all calculations with them are precise). An accessible guide is in the Python docs.
Using the default, "double precision" floating point representation, you can never hope for better than about 15 decimal places of precision, and calculations involving such numbers will tend to degrade this precision (the rounding error referred to in the above comment). In the same way, you get False from the following:
In [1]: 0.01 == (0.1)**2
Out[1]: False
because Python isn't squaring 0.1 but the "nearest representable number" to 0.1, and the result is neither 0.01 nor the nearest representable number to 0.01.
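To see what is actually stored, Python's decimal module can print the exact value of a float. The following is only a minimal sketch; the variable names are illustrative and not part of the original answer.

```python
from decimal import Decimal

# Decimal(float) converts the stored binary value exactly, with no extra rounding.
stored_tenth = Decimal(0.1)        # the double nearest to 0.1
stored_hundredth = Decimal(0.01)   # the double nearest to 0.01
squared = Decimal(0.1 ** 2)        # the rounded square of the stored 0.1

print(stored_tenth)
print(stored_hundredth)
print(squared)
print(squared == stored_hundredth)  # the two doubles are different numbers
```

Printing the three values side by side shows that none of them is the short decimal you typed.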
D Stanley has given the correct way to test for "equality" within some absolute tolerance: (abs(a-b) < tol) where tol is some small number you choose to fit your expected precision.

As you have discovered, there is a level of imprecision when comparing floating point numbers. A common way to test for "equality" is to determine a reasonable amount of difference you want to accept (commonly called "epsilon") and compare the difference between the two numbers against that maximum error:
epsilon = 1E-14
print(alpha, omega)
print(alpha == omega)
print(abs(alpha - omega) < epsilon)
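For values whose magnitude varies a lot, a relative tolerance is often easier to choose than a fixed epsilon. A possible sketch using the standard library's math.isclose follows; nearly_equal and its default tolerances are hypothetical choices, not something from the answers above.

```python
import math

theta = -30.0  # radians, as in the question
alpha = math.tan(theta)
omega = math.sin(theta) / math.cos(theta)

def nearly_equal(a, b, rel_tol=1e-12, abs_tol=1e-14):
    """Hypothetical helper: True when a and b agree within a relative or absolute tolerance."""
    return math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)

print(alpha == omega)              # exact comparison, sensitive to the last bit
print(nearly_equal(alpha, omega))  # tolerant comparison
```

The abs_tol argument matters when the values being compared are close to zero, where a purely relative test becomes too strict.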

First, you should notice that the arguments of trigonometric functions are given in radians (arc length on the unit circle), not in degrees. Thus theta = -30 refers to an angle of -30*180/pi degrees, which is about -1718.9 degrees.
Second, the processor, and thus the calling math library, has separate internal procedures for the computation of tan and (sin, cos). The extra division operation loses 1/2 to 1 bit of precision, which explains the difference in results.
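If an angle of -30 degrees was intended, it has to be converted first. A small sketch, assuming the question's variable names and using math.radians for the conversion:

```python
import math

theta_deg = -30.0
theta = math.radians(theta_deg)  # -pi/6 in radians

alpha = math.tan(theta)
omega = math.sin(theta) / math.cos(theta)

print(alpha, omega)
# Both should agree with the textbook value -1/sqrt(3) to within rounding error.
print(math.isclose(alpha, omega, rel_tol=1e-12))
print(math.isclose(alpha, -1 / math.sqrt(3), rel_tol=1e-12))
```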


The asin() function from cmath library returns inaccurate values

I'm using the asin and acos functions from cmath library to find the angles from their corresponding sine and cosine values but it sometimes returns inaccurate values. For example the result of the code below is not zero:
cout << acos(-1) / 6 - asin(0.5) << endl;
So what can I do? I use acos(-1) as the value of Pi, then somewhere in my code I want to see for example how many asin(0.5) are there in Pi but current method doesn't work.
Inexactness lurks in various places.
double is very commonly encoded using base 2, such as with binary64. As a floating point number, it can be expected to be a Dyadic rational. The key is that a finite floating-point number is a rational number.
Mathematically, arccos(-1) is π. π is an irrational number. Even with many digits of precision, a double cannot represent exactly π. Instead a nearby value for acos(-1) is used called machine pi or perhaps M_PI.
Mathematically, arcsin(0.5) is π/6. As above, asin(0.5) is a nearby value to π/6.
Division by 3 introduces potential inexactness. About 2 of every 3 double/3 operations result in an inexact, yet rounded, quotient.
Of course it is not zero.
What is curious about this is that acos(-1) / 3 - asin(0.5) is about 0.52... (roughly π/6), not 0.0.
I suspect OP was researching acos(-1) / 6 - asin(0.5).
Here subtraction of nearby values incurs catastrophic cancellation and a small, but non-zero difference, is a likely outcome.
So what can I do?
I want to see for example how many asin(0.5) are there in Pi
Tolerate marginal error in the calculation and employ the desired rounding.
double my_pi = acos(-1);
double target = asin(0.5);
long close_integer_quotient = lround(my_pi/target);
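The same idea can be sketched in Python, whose math module wraps the C library functions. The variable names below are illustrative only.

```python
import math

my_pi = math.acos(-1.0)
target = math.asin(0.5)

difference = my_pi / 6 - target  # may be exactly zero or a tiny nonzero value
count = round(my_pi / target)    # nearest integer to the quotient

print(difference)
print(count)
```

Rounding the quotient absorbs an error of a few units in the last place, which a test such as difference == 0 cannot do.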

Should I combine multiplication and division steps when working with floating point values?

I am aware of the precision problems in floats and doubles, which is why I am asking this:
If I have a formula such as: (a/PI)*180.0 (where PI is a constant)
Should I combine the division and multiplication, so I can use only one division: a/0.017453292519943295769236, in order to avoid loss of precision ?
Does this make it more precise when it has less steps to calculate the result?
Short answer
Yes, you should in general combine as many multiplications and divisions by constants as possible into one operation. It is (in general(*)) faster and more accurate at the same time.
Neither π nor π/180 nor their inverses are representable exactly as floating-point. For this reason, the computation will involve at least one approximate constant (in addition to the approximation of each of the operations involved).
Because two operations introduce one approximation each, it can be expected to be more accurate to do the whole computation in one operation.
In the case at hand, is division or multiplication better?
Apart from that, it is a question of “luck” whether the relative accuracy to which π/180 can be represented in the floating-point format is better or worse than that of 180/π.
My compiler provides additional precision with the long double type, so I am able to use it as a reference for answering this question for double:
~ $ cat t.c
#define PIL 3.141592653589793238462643383279502884197L
#include <stdio.h>
int main() {
    long double heop = 180.L / PIL;
    long double pohe = PIL / 180.L;
    printf("relative acc. of π/180: %Le\n", (pohe - (double) pohe) / pohe);
    printf("relative acc. of 180/π: %Le\n", (heop - (double) heop) / heop);
    return 0;
}
~ $ gcc t.c && ./a.out
relative acc. of π/180: 1.688893e-17
relative acc. of 180/π: -3.469703e-17
In usual programming practice, one wouldn't bother and simply multiply by (the floating-point representation of) 180/π, because multiplication is so much faster than division.
As it turns out, in the case of the binary64 floating-point type double almost always maps to, π/180 can be represented with better relative accuracy than 180/π, so π/180 is the constant one should use to optimize accuracy: a / ((double) (π / 180)). With this formula, the total relative error would be approximately the sum of the relative error of the constant (1.688893e-17) and of the relative error of the division (which will depend on the value of a but never be more than 2^-53).
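Where long double offers no extra precision, the same measurement can be approximated with exact rational arithmetic. This is only a sketch: PI is a hypothetical constant holding 50 decimal digits of π, and relative_rounding_error is an illustrative helper, not part of the original answer.

```python
from fractions import Fraction

# pi to 50 decimal places, far more digits than a double can hold.
PI = Fraction("3.14159265358979323846264338327950288419716939937510")

def relative_rounding_error(exact):
    """Relative error made when the exact rational is rounded to a double."""
    as_double = Fraction(float(exact))  # float() rounds to the nearest double
    return float((exact - as_double) / exact)

err_pi_over_180 = relative_rounding_error(PI / 180)
err_180_over_pi = relative_rounding_error(Fraction(180) / PI)

print(err_pi_over_180)
print(err_180_over_pi)
```

The two printed values should be comparable to the figures printed by the C program above, since both measure how far the rounded constant lies from the real one.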
Alternative methods for faster and more accurate results
Note that division is so expensive that you could get an even more accurate result faster by using one multiplication and one fma: let heop1 be the best double approximation of 180/π, and heop2 the best double approximation of 180/π - heop1. Then the best value for the result can be computed as:
double r = fma(a, heop1, a * heop2);
The fact that the above is the absolute best possible double approximation to the real computation is a theorem (in fact, it is a theorem with exceptions. The details can be found in the “Handbook of Floating-Point Arithmetic”). But even when the real constant you want to multiply a double by in order to get a double result is one of the exceptions to the theorem, the above computation is still clearly very accurate and only differs from the best double approximation for a few exceptional values of a.
If, like mine, your compiler provides more precision for long double than for double, you can also use one long double multiplication:
// this is more accurate than double division:
double r = (double)((long double) a * 57.295779513082320876798L);
This is not as good as the solution based on fma, but it is good enough that for most values of a, it produces the optimal double approximation to the real computation.
A counter-example to the general claim that operations should be grouped as one
(*) The claim that it is better to group constants is only statistically true for most constants.
If you happened to wish to multiply a by, say, the real constant 0.0000001 * DBL_MIN, you would be better off multiplying first by 0.0000001, then by DBL_MIN, and the end result (which can be a normalized number if a is larger than 1000000 or so) would be more precise than if you had multiplied by the best double representation of 0.0000001 * DBL_MIN. This is because the relative accuracy when representing 0.0000001 * DBL_MIN as a single double value is much worse than the accuracy for representing 0.0000001.

Stable Cotangent

Is there a more stable implementation for the cotangent function than return 1.0/tan(x);?
cot(x) = cos(x)/sin(x) should be more numerically stable close to π/2 than cot(x) = 1/tan(x). You can implement that efficiently using sincos on platforms that have it.
Another possibility is cot(x) = tan(M_PI_2 - x). This should be faster than the above (even if sincos is available), but it may also be less accurate, because M_PI_2 is of course only an approximation of the transcendental number π/2, so the difference M_PI_2 - x will not be accurate to the full width of a double mantissa -- in fact, if you get unlucky, it may have only a few meaningful bits.
As a rule of thumb, when looking for sources of inaccuracy one should be concerned first and foremost about additions and subtractions, which can lead to the issue of subtractive cancellation. Multiplications and divisions are typically harmless for accuracy other than adding additional rounding error, but may cause problems through overflow and underflow in intermediate computations.
No machine number x can get close enough to multiples of π/2 to cause tan(x) to overflow, therefore tan(x) is well-defined and finite for all floating-point encodings for any of the IEEE-754 floating-point formats, and by extension, so is cot(x) = 1.0 / tan(x).
This is easily demonstrated by performing an exhaustive test with all numerical float encodings, as an exhaustive test using double is not feasible, except perhaps with the largest supercomputers in existence today.
Using a math library with an accurate implementation of tan() with a maximum error of ~= 0.5 ulp, we find that computing cot(x) = 1.0 / tan(x) incurs a maximum error of less than 1.5 ulp, where the additional error compared to tan() itself is contributed by the rounding error of the division.
Repeating this exhaustive test over all float values with cot(x) = cos(x) / sin(x), where sin() and cos() are computed with a maximum error of ~= 0.5 ulp, we find that the maximum error in cot() is less than 2.0 ulps, so slightly larger. This is easily explained by having three sources of error instead of two in the previous formula.
Lastly, cot(x) = tan (M_PI_2 - x) suffers from the issue of subtractive cancellation mentioned earlier when x is near M_PI_2, and from the issue that in finite-precision floating-point arithmetic, M_PI_2 - x == M_PI_2 when x is sufficiently small in magnitude. This can lead to very large errors that leave us with no valid bits in the result.
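The failure of the shifted formula for small arguments is easy to reproduce. Here is a rough sketch comparing the three formulas at one tiny x; the function names are made up for illustration, and math.pi / 2 stands in for M_PI_2.

```python
import math

def cot_reciprocal(x):
    return 1.0 / math.tan(x)

def cot_ratio(x):
    return math.cos(x) / math.sin(x)

def cot_shifted(x):
    # math.pi / 2 is only the double nearest to pi/2.
    return math.tan(math.pi / 2 - x)

x = 1e-20  # for such a small x, cot(x) equals 1/x to well beyond double precision
print(math.pi / 2 - x == math.pi / 2)  # x is absorbed by the subtraction
for f in (cot_reciprocal, cot_ratio, cot_shifted):
    print(f.__name__, f(x))
```

The first two formulas return a value near 1e20, while the shifted one returns tan of the rounded π/2, which is unrelated to x.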
If you consider the angle between two vectors (v and w), you can also obtain the cotangent as follows (using Eigen::Vector3d):
inline double cot(Eigen::Vector3d v, Eigen::Vector3d w) {
    // dot product over the norm of the cross product
    return v.dot(w) / v.cross(w).norm();
}
With theta the angle between v and w, the above function is correct because:
|v x w| = |v|.|w|.sin(theta)
v . w = |v|.|w|.cos(theta)
cot(theta) = cos(theta) / sin(theta) = (v . w) / |v x w|
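The same identity can be checked without Eigen. Below is a small sketch with plain tuples; dot, cross, and cot_between are hypothetical helper names.

```python
import math

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def cross(v, w):
    return (v[1] * w[2] - v[2] * w[1],
            v[2] * w[0] - v[0] * w[2],
            v[0] * w[1] - v[1] * w[0])

def cot_between(v, w):
    """Cotangent of the angle between 3D vectors v and w (undefined if they are parallel)."""
    c = cross(v, w)
    return dot(v, w) / math.sqrt(dot(c, c))

# These two vectors are 45 degrees apart, so the cotangent should be 1.
print(cot_between((1.0, 0.0, 0.0), (1.0, 1.0, 0.0)))
```

Note that the division fails for parallel vectors, where the cross product is zero and the cotangent is undefined.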

Representing probability in C++

I'm trying to represent a simple set of 3 probabilities in C++. For example:
a = 0.1
b = 0.2
c = 0.7
(As far as I know probabilities must add up to 1)
My problem is that when I try to represent 0.7 in C++ as a float I end up with 0.69999999, which won't help when I am doing my calculations later. The same for 0.8, 0.80000001.
Is there a better way of representing numbers between 0.0 and 1.0 in C++?
Bear in mind that this relates to how the numbers are stored in memory so that when it comes to doing tests on the values they are correct; I'm not concerned with how they are displayed or printed out.
This has nothing to do with C++ and everything to do with how floating point numbers are represented in memory. You should never use the equality operator to compare floating point values, see here for better methods:
My problem is that when I try to represent 0.7 in C++ as a float I end up with 0.69999999, which won't help when I am doing my calculations later. The same for 0.8, 0.80000001.
Is it really a problem? If you just need more precision, use a double instead of a float. That should get you about 15 digits precision, more than enough for most work.
Consider your source data. Is 0.7 really significantly more correct than 0.69999999?
If so, you could use a rational number library such as:
If the problem is that probabilities add up to 1 by definition, then store them as a collection of numbers, omitting the last one. Infer the last value by subtracting the sum of the others from 1.
How much precision do you need? You might consider scaling the values and quantizing them in a fixed-point representation.
The tests you want to do with your numbers will be incorrect.
There is no exact floating point representation in a base-2 number system for a number like 0.1, because in base 2 it is an infinite periodic number. Consider one third, which is exactly representable as 0.1 in a base-3 system, but is 0.333... in the base-10 system.
So any test you do with a number 0.1 in floating point will be prone to be flawed.
A solution would be using rational numbers (boost has a rational lib), which will always be exact for, ermm, rationals, or use a self-made base-10 system by multiplying the numbers by a power of ten.
If you really need the precision, and are sticking with rational numbers, I suppose you could go with fixed point arithmetic. I've not done this before so I can't recommend any libraries.
Alternatively, you can set a threshold when comparing fp numbers, but you'd have to err on one side or another -- say
const float epsilon = 1e-6f;  // assumed tolerance; choose one that suits your data
bool fp_cmp(float a, float b) {
    return (a < b + epsilon);
}
Note that excess precision is automatically truncated in each calculation, so you should take care when operating at many different orders of magnitude in your algorithm. A contrived example to illustrate:
a = 15434355e10 + 22543634e10
b = a / 1e20 + 1.1534634
c = b * 1e20
c = b + 1.1534634e20
The two results will be very different. Using the first method a lot of the precision of the first two numbers will be lost in the divide by 1e20. Assuming that the final value you want is on the order of 1e20, the second method will give you more precision.
If you only need a few digits of precision then just use an integer. If you need better precision then you'll have to look to different libraries that provide guarantees on precision.
The issue here is that floating point numbers are stored in base 2. Most decimal fractions written in base 10 cannot be represented exactly by a floating point number in base 2.
Let's step back a second. What does .1 mean? Or .7? They mean 1x10^-1 and 7x10^-1. If you're using binary for your number, instead of base 10 as we normally do, .1 means 1x2^-1, or 1/2. .11 means 1x2^-1 + 1x2^-2, or 1/2+1/4, or 3/4.
Note how in this system, the denominator is always a power of 2. You cannot represent a number whose denominator is not a power of 2 in a finite number of digits. For instance, .1 (in decimal) means 1/10, but in binary that is an infinite repeating fraction, 0.000110011... (with the 0011 pattern repeating forever). This is similar to how in base 10, 1/3 is an infinite fraction, 0.3333...; base 10 can only represent numbers exactly if their denominator is a product of powers of 2 and 5. (As an aside, base 12 and base 60 are actually really convenient bases, since 12 is divisible by 2, 3, and 4, and 60 is divisible by 2, 3, 4, and 5; but for some reason we use decimal anyhow, and we use binary in computers).
Since floating point numbers (or fixed point numbers) always have a finite number of digits, they cannot represent these infinite repeating fractions exactly. So, they either truncate or round the values to be as close as possible to the real value, but are not equal to the real value exactly. Once you start adding up these rounded values, you start getting more error. In decimal, if your representation of 1/3 is .333, then three copies of that will add up to .999, not 1.
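The power-of-two denominator can be observed directly. As an illustration only, Python's fractions module converts a float to the exact rational it stores:

```python
from fractions import Fraction

stored = Fraction(0.1)  # exact value of the double nearest to 0.1
denominator = stored.denominator

print(stored)
print(denominator & (denominator - 1) == 0)  # True when the denominator is a power of two
print(stored == Fraction(1, 10))             # the stored value is not exactly 1/10
print(Fraction(0.5) == Fraction(1, 2))       # 1/2 has a power-of-two denominator, so it is exact
```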
There are four possible solutions. If all you care about is exactly representing decimal fractions like .1 and .7 (as in, you don't care that 1/3 will have the same problem you mention), then you can represent your numbers as decimal, for instance using binary coded decimal, and manipulate those. This is a common solution in finance, where many operations are defined in terms of decimal. This has the downside that you will need to implement all of your own arithmetic operations yourself, without the benefits of the computer's FPU, or find a decimal arithmetic library. This also, as mentioned, does not help with fractions that can't be represented exactly in decimal.
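As a rough illustration of the decimal approach, here is a sketch using Python's decimal module rather than a C++ library (a substitution made purely for brevity):

```python
from decimal import Decimal

# Constructing from strings keeps the decimal digits exactly as written.
probabilities = [Decimal("0.1"), Decimal("0.2"), Decimal("0.7")]
total = sum(probabilities)

print(total)
print(total == Decimal("1"))

# Fractions that are not finite in base 10 are still rounded.
third = Decimal(1) / Decimal(3)
print(third * 3 == Decimal(1))
```

The last comparison shows the limitation mentioned above: 1/3 is rounded even in a decimal representation.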
Another solution is to use fractions to represent your numbers. If you use fractions, with bignums (arbitrarily large numbers) for your numerators and denominators, you can represent any rational number that will fit in the memory of your computer. Again, the downside is that arithmetic will be slower, and you'll need to implement arithmetic yourself or use an existing library. This will solve your problem for all rational numbers, but if you wind up with a probability that is computed based on π or √2, you will still have the same issues with not being able to represent them exactly, and need to also use one of the later solutions.
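A sketch of the fraction approach, again in Python for brevity, using its built-in arbitrary-precision rationals:

```python
from fractions import Fraction

probabilities = [Fraction(1, 10), Fraction(2, 10), Fraction(7, 10)]

print(sum(probabilities) == 1)  # exact: the sum is the rational number 1
print(Fraction(1, 3) * 3 == 1)  # thirds are exact as well
```

Numerators and denominators can grow quickly over long computations, which is where the speed cost mentioned above comes from.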
A third solution, if all you care about is getting your numbers to add up to 1 exactly, is for events where you have n possibilities, to only store the values of n-1 of those probabilities, and compute the probability of the last as 1 minus the sum of the rest of the probabilities.
And a fourth solution is to do what you always need to remember when working with floating point numbers (or any inexact numbers, such as fractions being used to represent irrational numbers), and never compare two numbers for equality. Again in base 10, if you add up 3 copies of 1/3, you will wind up with .999. When you want to compare that number to 1, you have to instead compare to see if it is close enough to 1; check that the absolute value of the difference, 1-.999, is less than a threshold, such as .01.
Binary machines always round decimal fractions that have no exact representation in floating point (that is, everything except values such as .0, .5, .25, .75, etc.) to the nearest value that does. This has nothing to do with the language C++. There is no real way around it except to deal with it from a numerical perspective within your code.
As for actually producing the probabilities you seek:
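float pr[3] = {0.1f, 0.2f, 0.7f};
float accPr[3];
float prev = 0.0f;
int i = 0;
// Build the cumulative probabilities.
for (i = 0; i < 3; i++) {
    accPr[i] = prev + pr[i];
    prev = accPr[i];
}
// Uniform sample in [0, 1); the floating-point divisor avoids integer division.
double frand = rand() / (1.0 + RAND_MAX);
// The last outcome is the fallback, so rounding in accPr[2] does not matter.
for (i = 0; i < 2; i++) {
    if (frand < accPr[i]) break;
}
return i;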
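The cumulative-sum selection can be sketched as a self-contained function. pick_index is a hypothetical name, and passing the random number u as a parameter is a choice made here so the behaviour is easy to check.

```python
import random
from itertools import accumulate

def pick_index(probabilities, u):
    """Return the first index whose cumulative probability exceeds u, where 0 <= u < 1."""
    cumulative = list(accumulate(probabilities))
    for i, bound in enumerate(cumulative[:-1]):
        if u < bound:
            return i
    # Fall through to the last outcome so rounding in the final sum cannot leave a gap.
    return len(probabilities) - 1

pr = [0.1, 0.2, 0.7]
print(pick_index(pr, random.random()))
```

Treating the last outcome as the fallback means the code never needs the cumulative sum to equal 1 exactly.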
I'm sorry to say there's not really an easy answer to your problem.
It falls into a field of study called "Numerical Analysis" that deals with these types of problems (which goes far beyond just making sure you don't check for equality between 2 floating point values). And by field of study, I mean there are a slew of books, journal articles, courses etc. dealing with it. There are people who do their PhD thesis on it.
All I can say is that I'm thankful I don't have to deal with these issues very much, because the problems and the solutions are often very non-intuitive.
What you might need to do to deal with representing the numbers and calculations you're working on is very dependent on exactly what operations you're doing, the order of those operations and the range of values that you expect to deal with in those operations.
Depending on the requirements of your applications any one of several solutions could be best:
1) You live with the inherent lack of precision and use floats or doubles. You cannot test either for equality, and this implies that you cannot test the sum of your probabilities for equality with 1.0.
2) As proposed before, you can use integers if you require a fixed precision. You represent 0.7 as 7, 0.1 as 1, 0.2 as 2 and they will add up perfectly to 10, i.e., 1.0. If you have to calculate with your probabilities, especially if you do division and multiplication, you need to round the results correctly. This will introduce an imprecision again.
3) Represent your numbers as fractions with a pair of integers (1,2) = 1/2 = 0.5. Precise, more flexible than 2) but you don't want to calculate with those.
4) You can go all the way and use a library that implements rational numbers (e.g. gmp). Precise, with arbitrary precision, you can calculate with it, but slow.
yeah, I'd scale the numbers (0-100)(0-1000) or whatever fixed size you need if you're worried about such things. It also makes for faster math computation in most cases. Back in the bad-old-days, we'd define entire cos/sine tables and other such bleh in integer form to reduce floating fuzz and increase computation speed.
I do find it a bit interesting that a "0.7" fuzzes like that on storage.

C++ Cosine Problem

I have the following code using C++:
double value = .3;
double result = cos(value);
When I look at the values in the locals window for "value" it shows 0.2999999999
Then, when I get the value of "result" I get: 0.95533648912560598
However, when I run cos(.3) on the computer's calculator I get: .9999862922474
So clearly there is something that I am doing wrong.
Any thoughts on what might be causing the difference in results?
I am running Win XP on an Intel processor.
The difference in results is because:
Your computer's calculator is returning the cosine of an angle specified in degrees.
The C++ cos() function is returning cosine of an angle specified in radians.
The .2999999999 is due to the way floating point numbers are handled in computers. .3 cannot be represented exactly in a double. For details, I recommend reading What Every Computer Scientist Should Know about Floating Point Arithmetic.
cos(.3 radians) = 0.95533...
cos(.3 degrees) = 0.99998...
cos(0.3) = 0.99998629224742679269138848004408 using degrees
cos(0.3) = 0.95533648912560601964231022756805 using radians
When I look at the values in the locals window for "value" it shows 0.2999999999
Long story short, your calculator uses decimal arithmetic, while your C++ code uses binary arithmetic (double is a binary floating-point number). Decimal number 0.3 cannot be represented exactly as a binary floating-point number. Read What Every Computer Scientist Should Know About Floating-Point Arithmetic, that will explain all implications in more detail.
Your calculator is using degrees. For example:
>>> import math
>>> math.cos (.3)
>>> math.cos (.3 * math.pi / 180) # convert degrees to radians
Floating point types in C++ cannot represent every real number exactly, because that would require an insane amount of storage to get the infinite precision necessary. For a demonstration of this, try the following:
double ninth = 1.0/9.0;
double result = 9.0 * ninth;
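Round trips like this can yield a value such as .99999999999 in result rather than exactly 1. With IEEE 754 doubles this particular product happens to round back to exactly 1.0, whereas other divisors (49.0, for example) leave a result just below 1.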
So, in essence, you need to compare floating point values within a small epsilon (I tend to use 1e-7). You can do a strict bit-by-bit comparison, but this consists of converting the memory used by the floating point to an array of characters of length sizeof(float), then comparing the characters.
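Both kinds of comparison can be sketched briefly. The example uses Python's struct module to expose the raw bytes of a double; same_bits is a hypothetical helper, and the 1e-7 tolerance is the one suggested above.

```python
import math
import struct

def same_bits(a, b):
    """Hypothetical helper: compare the 8-byte IEEE 754 encodings of two floats."""
    return struct.pack("<d", a) == struct.pack("<d", b)

a = 0.1 + 0.2
b = 0.3

print(same_bits(a, b))                   # strict bit-by-bit comparison
print(math.isclose(a, b, rel_tol=1e-7))  # comparison within a small epsilon
```

A bitwise test is even stricter than ==, since it also distinguishes 0.0 from -0.0.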
Another thing to check would be whether or not you are using degrees. The computer's calculator uses degrees for its cosine calculation (notice how the result from the calculator is .99999..., which is very close to 1. The cosine of zero is 1 exactly), whereas the cosine function offered in <cmath> is in radians. Try multiplying your value by PI/180.0 and seeing if the result is more in line with your expectations.