I'm porting a script from MATLAB to a C++ function. However, I consistently get different results for the exp function. For example, the following snippet:
std::complex<double> final_b = std::exp(std::complex<double>(0, 1 * pi));
should be equivalent to the MATLAB code
final_b = exp(1i * pi);
But it isn't. For MATLAB, I receive -1 + 0i (which is correct) and for C++, I get -1 + -2.068231e-013*i.
Now I thought at the beginning this is just a rounding error of sorts, but for the actual script I'm using, which has bigger complex exponentials, I get completely different numbers. What is the cause of this? How do I fix this?
Edit: I've manually tried calculating the exponential with Euler's formula
exp(x+iy) = exp(x) * (cos(y) + i*sin(y))
and get the same wonky results in C++.
That is called floating point approximation (or imprecision):
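If you include the header <cfloat>, there are some useful definitions. In particular, DBL_EPSILON is the difference between 1.0 and the next representable double; for IEEE 754 doubles it is about 2.2e-16. The imaginary part you see, -2.068231e-013, is larger than that, which suggests that your pi constant is only accurate to about 12 digits, but it is still negligible next to a real part of magnitude 1. To check whether such a value is zero for practical purposes, compare it against a tolerance suited to your computation (1e-9 here):
// The complete formula is std::abs(a - b), but since b is zero, I am omitting it
if (std::abs(number.imag()) < 1e-9) {
// The number is either zero or very close to zero
}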
For example, you can see the working code here: http://ideone.com/2OzNZm
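As a minimal sketch of the tolerance check described above: the helper name is_negligible, the function exp_i_pi, and the default tolerance of 1e-9 are illustrative assumptions, not part of the original answer. Taking pi from std::acos(-1.0) instead of a hand-typed literal should keep the leftover imaginary part near 1e-16 on typical implementations.

```cpp
#include <cmath>
#include <complex>

// Hypothetical helper: treats |value| below `tolerance` as zero.
// The default tolerance is an assumption; choose one that fits your data.
bool is_negligible(double value, double tolerance = 1e-9) {
    return std::abs(value) < tolerance;
}

// Computes exp(i*pi), with pi obtained from acos(-1.0) at double precision
// rather than from a truncated literal.
std::complex<double> exp_i_pi() {
    const double pi = std::acos(-1.0);
    return std::exp(std::complex<double>(0.0, pi));
}
```

Calling is_negligible(exp_i_pi().imag()) then tells you whether the imaginary part can be treated as zero for your purposes.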
OK, so I am writing a program where I am trying to get the result of the right side to be equivalent to the left side with 0.0001% accuracy
sin x = x - (x^3)/3! + (x^5)/5! - (x^7)/7! + ....
#include<iostream>
#include<iomanip>
#include<math.h>
using namespace std;
long int fact(long int n)
{
if(n == 1 || n == 0)
return 1;
else
return n*fact(n-1);
}
int main()
{
int n = 1, counts=0; //for sin
cout << "Enter value for sin" << endl;
long double x,value,next = 0,accuracy = 0.0001;
cin >> x;
value = sin(x);
do
{
if(counts%2 == 0)
next = next + (pow(x,n)/fact(n));
else
next = next - (pow(x,n)/fact(n));
counts++;
n = n+2;
} while((fabs(next - value))> 0);
cout << "The value of sin " << x << " is " << next << endl;
}
and let's say I enter 45 for x
I get the result
The value of sin 45 is nan
Can anyone help me find out what I did wrong?
First your while condition should be
while((fabs(next - value))> accuracy) and fact should return long double.
When you change that, it still won't work for a value of 45. The reason is that this Taylor series converges too slowly for large values.
Here is the error term in the formula (the Lagrange form of the remainder):
R_k(x) = f^(k+1)(xi) * (x - a)^(k+1) / (k+1)!, for some xi between a and x
Here k is the number of iterations, a = 0, and the function is sin. In order for the condition to become false, 45^(k+1)/(k+1)! times the absolute value of sin or cos (depending on what the (k+1)-th derivative is; it is between 0 and 1) should be less than 0.0001.
With this formula, for k = 50 the number is still very large (we should expect an error of around 1.3*10^18), which means we will do more than 50 iterations for sure.
If the loop keeps going, 45^n and n! eventually both overflow to infinity, and dividing them gives infinity/infinity = NaN.
In your original version the value of fact doesn't fit in the integer type (it overflows to 0), and the division by 0 then gives you infinity, which after subtracting another infinity gives you NaN.
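To get a feel for the sizes involved, here is a small sketch (the function name term_magnitude is an assumption for illustration) that evaluates x^n / n! as a running product of x/i, so neither x^n nor n! is ever formed on its own and nothing overflows for the values discussed here.

```cpp
// Magnitude of the n-th Taylor term x^n / n!, built up as a running product
// of x/i so that neither x^n nor n! is computed separately.
double term_magnitude(double x, int n) {
    double term = 1.0;
    for (int i = 1; i <= n; ++i) {
        term *= x / i;
    }
    return term;
}
```

For x = 45 the value at n = 51 is still on the order of 10^18, in line with the estimate above, so the terms are nowhere near the 0.0001 threshold after 50 iterations.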
I quote from here in regard to pow:
Return value
If no errors occur, base raised to the power of exp (or iexp) (base^exp), is returned.
If a domain error occurs, an implementation-defined value is returned (NaN where supported).
If a pole error or a range error due to overflow occurs, ±HUGE_VAL, ±HUGE_VALF, or ±HUGE_VALL is returned.
If a range error occurs due to underflow, the correct result (after rounding) is returned.
Reading further:
Error handling
...
except where specified above, if any argument is NaN, NaN is returned
So basically, since n is increasing and you have many loops, pow returns NaN (the compiler you use obviously supports that). The rest is arithmetic: you calculate with overflowing values.
I believe you are trying to approximate sin(x) by using its Taylor series. I am not sure if that is the way to go.
Maybe you can try to stop the loop as soon as you hit NaN and not update the variable next and simply output that. That's the closest you can get I believe with your algorithm.
If the choice of 45 implies you think the input is in degrees, you should rethink that and likely should reduce mod 2 Pi.
First fix two bugs:
long double fact(long int n)
...
}while((fabs(next - value))> accuracy);
The return value of fact will overflow quickly if it is long int. The return value of fact will overflow eventually even for long double. When you compare to 0 instead of accuracy, the answer is never correct enough, so only NaN can stop the while loop.
Because of rounding error, you still never converge (while pow is giving values bigger than fact you are computing differences between big numbers, which accumulates significant rounding error, which is then never removed). So you might instead stop by computing long double m=pow(x,n)/fact(n); before increasing n in each step of the loop and use:
}while(m > accuracy*.5);
At that point, either the answer has the specified accuracy or the remaining error is dominated by rounding error and iterating further won't help.
If you had compiled your system with any reasonable level of warnings enabled you would have immediately seen that you are not using the variable accuracy. This and the fact that your fact function returns a long int are but a small part of your problem. You will never get a good result for sin(45) using your algorithm even if you correct those issues.
The problem is that with x=45, the terms in the Taylor expansion of sin(x) won't start decreasing until n=45. This is a big problem because 45^45/45! is a very large number, 2428380447472097974305091567498407675884664058685302734375 / 1171023117375434566685446533210657783808, or roughly 2*10^18. Your algorithm initially adds and subtracts huge numbers that only start decreasing after 20+ additions/subtractions, with the eventual hope that the result will be somewhere between -1 and +1. That is an unrealizable hope given an input value of 45 and using a native floating point type.
You could use some BigNum type (the internet is chock-full of them) with your algorithm, but that's extreme overkill when you only want four place accuracy. Alternatively, you could take advantage of the cyclical nature of sin(x), sin(x+2*pi)=sin(x). An input value of 45 is equivalent to 1.017702849742894661522992634... (modulo 2*pi). Your algorithm works quite nicely for an input of 1.017702849742894661522992634.
You can do much better than that, but taking the input value modulo 2*pi is the first step toward a reasonable algorithm for computing sine and cosine. Even better, you can use the fact that sin(x+pi)=-sin(x). This lets you reduce the range from -infinity to +infinity to 0 to pi. Even better, you can use the fact that between 0 and pi, sin(x) is symmetric about pi/2. You can do even better than that. The implementations of the trigonometric functions take extreme advantage of these behaviors, but they typically do not use Taylor approximations.
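The first step described above, reducing the argument modulo 2*pi before summing the series, might look like the following sketch. The function name taylor_sin and the default tolerance are assumptions for illustration; each term is derived from the previous one, so no separate pow or factorial call is needed. This is only a demonstration of the idea, not how library implementations compute sine.

```cpp
#include <cmath>

// Sketch: sine via its Taylor series after reducing the argument modulo 2*pi.
// `accuracy` (an assumed default) bounds the size of the last term added.
long double taylor_sin(long double x, long double accuracy = 1e-6L) {
    const long double two_pi = 2.0L * std::acos(-1.0L);
    // std::remainder maps x into [-pi, pi], where the terms shrink quickly.
    x = std::remainder(x, two_pi);

    long double term = x;  // current term, starting with x^1 / 1!
    long double sum = x;
    for (int n = 3; std::fabs(term) > accuracy; n += 2) {
        // Next odd term from the previous one: multiply by -x^2 / ((n-1)*n).
        term *= -x * x / ((n - 1) * n);
        sum += term;
    }
    return sum;
}
```

With this reduction, taylor_sin(45.0L) sums the series at roughly 1.0177 instead of 45, so the terms never grow large and the loop ends after a handful of iterations.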
Currently I have a function in an application which takes a float as a parameter and should perform a simple multiplication and division on the value passed in. Before the value is passed to the function, it is cast to a float, because the main application handles its numerical data as ints. Unfortunately, when I pass 0.0 to the function, it does not produce 1.0 (which the calculation should give) but outputs 0.0 instead. Why does the calculation fail to produce the correct output? The program compiles, and the calculation is correct as far as I'm aware.
Here is the code:
void CarPositionClass::centre(float inputPos)
{
if ((inputPos <= 0) && (inputPos >= -125))
{
membershipC = ((inputPos + 125)*(1 / 125));
}
}
It should also be noted that membershipC is a float variable that is a member of the CarPositionClass.
Change 1 / 125 to, say, 1.0 / 125. 1 / 125 uses integer division, so the result is 0.
Or change this expression
((inputPos + 125)*(1 / 125))
to
(inputPos + 125) / 125
Since inputPos is floating point, so is inputPos + 125, and then dividing a float by an integer is a float.
P.S. This is surely a duplicate question. I expect the C++ gurus to lower the dup hammer any second now. :)
The division between two integers results in an integer. At least one operand has to be a floating point type for it not to truncate the result:
membershipC = ((inputPos + 125)*(1.0 / 125));
// ^^^
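A small sketch of the difference, using free functions instead of the class member from the question (the names integer_quotient and centre_membership are assumptions for illustration):

```cpp
// Integer division truncates: 1 / 125 is evaluated entirely in int.
int integer_quotient() {
    return 1 / 125;
}

// Sketch of the corrected membership calculation from the question:
// inputPos is a float, so the sum and the division are done in floating point.
float centre_membership(float inputPos) {
    return (inputPos + 125) / 125;
}
```

Here integer_quotient() yields 0, which is why the original product was always 0, while centre_membership(0.0f) yields 1.0f.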
Given a non-negative integer c, I need an efficient algorithm to find the largest integer x such that
x*(x-1)/2 <= c
Equivalently, I need an efficient and reliably accurate algorithm to compute:
x = floor((1 + sqrt(1 + 8*c))/2) (1)
For the sake of definiteness I tagged this question C++, so the answer should be a function written in that language. You can assume that c is an unsigned 32 bit int.
Also, if you can prove that (1) (or an equivalent expression involving floating-point arithmetic) always gives the right result, that's a valid answer too, since floating-point on modern processors can be faster than integer algorithms.
If you're willing to assume IEEE doubles with correct rounding for all operations including square root, then the expression that you wrote (plus a cast to double) gives the right answer on all inputs.
Here's an informal proof. Since c is a 32-bit unsigned integer being converted to a floating-point type with a 53-bit significand, 1 + 8*(double)c is exact, and sqrt(1 + 8*(double)c) is correctly rounded. 1 + sqrt(1 + 8*(double)c) is accurate to within one ulp, since the last term being less than 2**((32 + 3)/2) = 2**17.5 implies that the unit in the last place of the latter term is less than 1, and thus (1 + sqrt(1 + 8*(double)c))/2 is accurate to within one ulp, since division by 2 is exact.
The last piece of business is the floor. The problem cases here are when (1 + sqrt(1 + 8*(double)c))/2 is rounded up to an integer. This happens if and only if sqrt(...) rounds up to an odd integer. Since the argument of sqrt is an integer, the worst cases look like sqrt(z**2 - 1) for positive odd integers z, and we bound
z - sqrt(z**2 - 1) = z * (1 - sqrt(1 - 1/z**2)) >= 1/(2*z)
by Taylor expansion. Since z is less than 2**17.5, the gap to the nearest integer is at least 1/2**18.5 on a result of magnitude less than 2**17.5, which means that this error cannot result from a correctly rounded sqrt.
Adopting Yakk's simplification, we can write
(uint32_t)(0.5 + sqrt(0.25 + 2.0*c))
without further checking.
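As a sketch of how the closed form could be wrapped and spot-checked, assuming IEEE 754 doubles with a correctly rounded sqrt as in the proof above (the names largest_x and is_largest_x are hypothetical):

```cpp
#include <cmath>
#include <cstdint>

// Largest x with x*(x-1)/2 <= c, using the closed form discussed above.
// Assumes IEEE 754 doubles with a correctly rounded sqrt.
std::uint32_t largest_x(std::uint32_t c) {
    return static_cast<std::uint32_t>(0.5 + std::sqrt(0.25 + 2.0 * c));
}

// Integer-only check that x satisfies the inequality and x + 1 does not.
bool is_largest_x(std::uint32_t c, std::uint32_t x) {
    const std::uint64_t wide = x;  // 64-bit to avoid overflow in the product
    return wide * (wide - 1) / 2 <= c && (wide + 1) * wide / 2 > c;
}
```

The integer check is independent of floating point, so it can be used to confirm the formula on whatever inputs matter to you.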
If we start with the quadratic formula, we quickly reach sqrt(1/4 + 2c), round up at 1/2 or higher.
Now, if you do that calculation in floating point, there can be inaccuracies.
There are two approaches to deal with these inaccuracies. The first would be to carefully determine how big they are, determine if the calculated value is close enough to a half for them to be important. If they aren't important, simply return the value. If they are, we can still bound the answer to being one of two values. Test those two values in integer math, and return.
However, we can do away with that careful bit, and note that sqrt(1/4 + 2c) is going to have an error less than 0.5 if the values are 32 bits, and we use doubles. (We cannot make this guarantee with floats, as by 2^31 the float cannot handle +0.5 without rounding).
In essence, we use the quadratic formula to reduce it to two possibilities, and then test those two.
#include <cassert>
#include <cmath>
#include <cstdint>
uint64_t eval(uint64_t x) {
    return x*(x-1)/2;
}
unsigned solve(unsigned c) {
    double test = sqrt( 0.25 + 2.*c );
    // Try the larger candidate first; the conversion to uint64_t truncates.
    if ( eval(test+1.) <= c )
        return test+1.;
    assert( eval(test) <= c );
    return test;
}
Note that converting a positive double to an integral type rounds towards 0. You can insert floors if you want.
This may be a bit tangential to your question. But, what caught my attention is the specific formula. You are trying to find the triangular root of T_(n-1) (where T_n is the nth triangular number).
I.e.:
T_n = n * (n + 1) / 2
and
T_n - n = T_(n-1) = n * (n - 1) / 2
From the nifty trick described here, for T_n we have:
n = int(sqrt(2 * c))
Looking for n such that T_(n-1) ≤ c in this case doesn't change the definition of n, for the same reason as in the original question.
Computationally, this saves a few operations, so it's theoretically faster than the exact solution (1). In reality, it's probably about the same.
Neither this solution nor the one presented by David is as "exact" as your (1), though.
[Figure: floor((1 + sqrt(1 + 8*c))/2) (blue) vs int(sqrt(2 * c)) (red) vs Exact (white line)]
[Figure: floor((1 + sqrt(1 + 8*c))/2) (blue) vs int(sqrt(0.25 + 2 * c) + 0.5) (red) vs Exact (white line)]
My real point is that triangular numbers are a fun set of numbers that are connected to squares, Pascal's triangle, Fibonacci numbers, et al.
As such there are loads of identities around them which might be used to rearrange the problem in a way that didn't require a square root.
Of particular interest may be that T_n + T_(n-1) = n^2
I'm assuming you know that you're working with a triangular number, but if you didn't realize that, searching for triangular roots yields a few questions such as this one which are along the same topic.
Given 2 numbers where A <= B, say for example A = 9 and B = 10, I am trying to get the percentage by which A is smaller than B. I need to have the percentage as an int, e.g. if the result is 10.00%, the int should be 1000.
Here is my code:
int A = 9;
int B = 10;
int percentage = (((1 - (double)A/B) / 0.01)) * 100;
My code returns 999 instead of 1000. Some precision related to the usage of double is lost.
Is there a way to avoid losing precision in my case?
Seems the formula you're looking for is
int result = 10000 - (A*10000+B/2)/B;
The idea is to do all computations in integers and to delay the division.
To do the rounding, half of the denominator is added before performing the division (otherwise the division truncates, and because the result is subtracted from 100%, the final value is rounded up).
For example, with A=9 and B=11 the percentage is 18.18181818..., which rounds to 18.18; the computation without the rounding would give 1819 instead of the expected result 1818.
Note that the computation is done all in integers so there is a risk of overflow for large values of A and B. For example if int is 32 bit then A can be up to around 200000 before risking an overflow when computing A*10000.
Using A*10000LL instead of A*10000 in the formula will trade in some speed to raise the limit to a much bigger value.
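Putting the formula and the wider multiplication together, a minimal sketch could look like this (the function name percent_smaller_x100 is an assumption for illustration, and the code assumes 0 <= a <= b with b > 0, as in the question):

```cpp
// Percentage by which a is smaller than b, in hundredths of a percent
// (10.00% -> 1000), rounded to nearest. Assumes 0 <= a <= b and b > 0.
int percent_smaller_x100(int a, int b) {
    const long long scaled = a * 10000LL;  // widen before multiplying
    // Adding b / 2 before dividing rounds the quotient to nearest.
    return static_cast<int>(10000 - (scaled + b / 2) / b);
}
```

For the values discussed above, percent_smaller_x100(9, 10) gives 1000 and percent_smaller_x100(9, 11) gives 1818.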
Of course there may be precision loss in floating-point numbers. Either use fixed-point arithmetic as 6502 answered, or add a bias to the result to get the intended answer.
You would do better with:
assert(B != 0);
int percentage = ((A<0) == (B<0) ? 0.5 : -0.5) + (((1 - (double)A/B) / 0.01)) * 100;
Because of precision loss, the result of (((1 - (double)A/B) / 0.01)) * 100 may be slightly less or more than intended. If you add an extra 0.5, it is guaranteed to be slightly more than intended. Now when you cast this value to an integer, you get the intended answer (the floor or ceil value, depending on whether the fractional part of the result was above or below 0.5).
I tried
float floatpercent = (((1 - (double)A/B) / 0.01)) * 100;
int percentage = (int) floatpercent;
cout<< percentage;
displays 1000
I suspect a precision loss on automatic casting to int as the root problem to your code.
[I alluded to this in a comment to the original question, but I thought I'd post it as an answer.]
The core problem is that the form of expression you're using amplifies the unavoidable floating point loss of precision when representing simple fractions of 10.
Your expression (with casts stripped out for now, using standard precedence to also avoid some parens)
((1 - A/B) / 0.01) * 100
is quite a complicated way of representing what you want, although it's algebraically correct. Unfortunately, floating point numbers can only precisely represent numbers like 1/2, 1/4, 1/8, etc, their multiples, and sums of those. In particular, none of 9/10, 1/10, or 1/100 has a precise representation.
The above expression introduces these errors twice: first in the calculation of A/B, and then in the division by 0.01. These two imprecise values are then divided, which further amplifies the inherent error.
The most direct way to write what you meant (again without needed casts) is
((B-A) / B) * 10000
This produces the correct answer and is considerably easier to read, I would suggest, than the original. The fully correct C form is
((B - A) / (double)B) * 10000
I've tested this and it works reliably. As others have noted, it's generally better to work with doubles instead of floats, as their extra precision makes them less prone (but not immune) to this sort of difficulty.
Given this example C++ code snippet:
void floatSurprise()
{
// these come from some sort of calculation
int a = 18680, b = 3323524, c = 121;
float m = float(a) / c;
// variant 1: calculate result from single expression
float r1 = b - (2.0f * m * a) + (m * m * c);
cout << "r1 = " << r1 << endl;
// variant 2: break up the expression into intermediate parts,
/// then calculate
float
r2_p1 = 2.0f * m * a,
r2_p2 = m * m * c,
r2 = b - r2_p1 + r2_p2;
cout << "r2 = " << r2 << endl;
}
The output is:
r1 = 439703
r2 = 439702
When viewed in the debugger, the values are actually 439702.50 and 439702.25, respectively, which is interesting in itself - not sure why iostream prints floats without the fractional part by default. EDIT: The reason for this was that the default precision setting for cout was too low, needed cout << setprecision(7) at least to see the decimal point for numbers of this magnitude.
But I'm even more interested in why am I getting different results. I suppose it has to do with rounding and some subtle interplay of ints with the required float output type, but I can't put my finger on it. Which value is the correct one?
I was amazed that it was so easy to shoot myself in the foot with such a simple piece of code. Any insight will be greatly appreciated! The compiler was VC++2010.
EDIT2: I did some more investigating using a spreadsheet to generate "correct" values for the intermediate variables and found (via tracing) that indeed they were being trimmed, contributing to the precision loss in the ultimate result. I also found a problem with the single expression, because I actually used a handy function for calculating squares instead of m * m there:
template<typename T> inline T sqr(const T &arg) { return arg*arg; }
Even though I asked nicely, the compiler apparently didn't inline this, and calculated the value separately, trimming the result before returning the value to the expression, skewing the result yet again. Ouch.
You should read my long, long answer about why the same thing happens in C#:
(.1f+.2f==.3f) != (.1f+.2f).Equals(.3f) Why?
Summing up: first of all, you only get about seven decimal places of accuracy with float. The correct answer, were you to do it with exact arithmetic throughout the entire calculation, is about 439702.51239669... so you are getting darn close to the correct answer considering the limitations of a float, in either case.
But that doesn't explain why you are getting different results with what looks like exactly the same calculations. The answer is: the compiler is permitted wide latitude to make your math more accurate, and apparently you have hit upon two cases where the optimizer takes what is logically the same expression and does not optimize them down to the same code.
Anyway, read my answer regarding C# carefully; everything in there applies to C++ just as well.
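One way to judge how far each float variant is from the exact value is to evaluate the same expression with every intermediate held in double. This is only a sketch; the function name reference_result is an assumption for illustration.

```cpp
// Same expression as in the question, but with every intermediate in double,
// which carries about 15-16 significant decimal digits instead of about 7.
double reference_result(int a, int b, int c) {
    const double m = static_cast<double>(a) / c;
    return b - (2.0 * m * a) + (m * m * c);
}
```

With the inputs from the question (a = 18680, b = 3323524, c = 121) this comes out close to the exact 439702.51239669..., so both float results are within the roughly seven-digit precision a float can offer.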