Here's the code:
#include <iostream>
#include <math.h>
const double ln2per12 = log(2.0) / 12.0;
int main() {
    std::cout.precision(100);
    double target = 9.800000000000000710542735760100185871124267578125;
    double unnormalizatedValue = 9.79999999999063220457173883914947509765625;
    double ln2per12edValue = unnormalizatedValue * ln2per12;
    double errorLn2per12 = fabs(target - ln2per12edValue / ln2per12);
    std::cout << unnormalizatedValue << std::endl;
    std::cout << ln2per12 << std::endl;
    std::cout << errorLn2per12 << " <<<<< its different" << std::endl;
}
If I try on my machine (MSVC), or here (GCC):
errorLn2per12 = 9.3702823278363212011754512786865234375e-12
Instead, here (GCC):
errorLn2per12 = 9.368505970996920950710773468017578125e-12
which is different. Is this due to machine epsilon? Or compiler precision flags? Or a different IEEE evaluation?
What's the cause of this drift? The problem seems to be in the fabs() call (since the other printed values seem to be the same).
Even without -Ofast, the C++ standard does not require implementations to be exact with log (or sin, or exp, etc.), only that they be within a few ulp (i.e. there may be some inaccuracies in the last binary places). This allows faster hardware (or software) approximations, which each platform/compiler may do differently.
(The only floating point math function that you will always get perfect results from on all platforms is sqrt.)
More annoyingly, you may even get different results between compilation (the compiler may use some internal library to be as precise as float/double allows for constant expressions) and runtime (e.g. hardware-supported approximations).
If you want log to give the exact same result across platforms and compilers, you will have to implement it yourself using only +, -, *, / and sqrt (or find a library with this guarantee). And avoid a whole host of pitfalls along the way.
If you need floating point determinism in general, I strongly recommend reading this article to understand how big of a problem you have ahead of you: https://randomascii.wordpress.com/2013/07/16/floating-point-determinism/
Related
I'm learning C++ from a tutorial, and there I was told that comparing floats in C++ can be very confusing. For example, in the below code:
#include <iostream>
using namespace std;
int main()
{
    float decimalNumber = 1.2;
    if (decimalNumber == 1.2) {
        cout << "Equal" << endl;
    } else {
        cout << "Not equal" << endl;
    }
    return 0;
}
I would get "Not equal". I agree on this point. The tutor said that if we need to compare floats, we can compare against the nearest number with > (which would not be very precise). I searched for different ways to compare floats on Google and found many complex ways of doing it.
Then I created a program myself:
#include <iostream>
using namespace std;
int main()
{
    float decimalNumber = 1.2;
    if (decimalNumber == (float)1.2) {
        cout << "Equal" << endl;
    } else {
        cout << "Not equal" << endl;
    }
    return 0;
}
After type-casting like above, I got "Equal".
The thing I want to know is that should I use the above way to compare floats in all of my programs? Does this have some cons?
Note: I know how a number is represented exactly in memory and why 0.1 + 0.2 != 0.3, as described in another SO question. I just want to know whether I can check the equality of two floats in the above way.
The thing I want to know is that should I use the above way to compare floats in all of my programs?
Depends on context. Equality comparison is very rarely useful with floats.
However yes, whether you compare with equality or relationality, you should compare floating point objects of same type, instead of mixing float and double.
Does this have some cons?
Floating point calculations potentially have error. The result might not be what you expect. When there is error, equality comparison is meaningless.
The reason that second example works is, in a way, pure chance.
Some factors to consider:
When both sides of the comparison are of type float, the compiler is probably more likely to "optimise out" the comparison so that it just happens during the compilation process. It can look at the literals and realise that the two numbers are logically the same, even though at runtime they may differ at lower levels of precision.
When both sides of the comparison are of type float, they have the same precision, so if you've created the values in the same way (here, by a literal), any error in the lower bits can be identical. When one of them is a double, you have additional error bits at the end that throw off the comparison. And if you'd created one via 0.6 + 0.6, the result could also differ, as the errors propagate differently.
In general, do not rely on this. You can't really predict how "accurate" your floats will be when they contain numbers not representable exactly in binary. You should stick to epsilon-range compares (where appropriate) if you need loose value comparison, even if direct comparison appears to "work" occasionally without it.
A good approach to take, if you don't actually need floating point, is to use fixed point instead. There are no built-in fixed-point types in the language, but that's okay because you can simulate them trivially with a little arithmetic. It's hard to know what your use case is here, but if you only need one decimal place, instead you can store int decimalNumber = 12 (i.e. shift the decimal point by one) and just divide by ten whenever you need to display it. Two ints of value 12 always compare nicely.
Money is a great example of this: count in pennies (or tenths of pennies), not in pounds, to avoid errors creeping in that scam your customers out of their cash. 😉
I'm writing an audio plugin and I would like to convert a series of real numbers representing sample values into a complex array representing frequency and phase. Then I want to be able to do the opposite, turning a complex array of frequency and phases to a series of real numbers, reconstructing the original data.
I'm using Intel MKL and I see only the possibility to perform real->real conversions or complex->complex conversions. Here's the reference I'm using: Intel MKL FFT Functions.
In that reference, there are two overloaded functions: DftiComputeForward and DftiComputeBackward. So, I would like to use these to do the conversions. In the reference for DftiCreateDescriptor, the only options available for DFTI_FORWARD_DOMAIN are DFTI_COMPLEX and DFTI_REAL, but no option for mixed conversions.
Edit
I found that the phase is actually equal to atan(imaginary/real). I don't want to mislead anyone getting information from questions.
Edit
I just learned that it's best to use atan2(imaginary,real). More information is in the comments.
Every real number is a complex number: ℝ ⊂ ℂ. So going forward from float or double in the time domain to complex is trivial. The language does that for you!
#include <complex>
#include <iostream>
int main() {
    double d = 42.0;
    std::complex<double> z = d;
    std::cout << d << " = " << z << '\n';
}
Output: 42 = (42,0)
And the C++ standard library also does everything else. It's quite simple, in fact. For once, the library does pretty much what it says on the box.
Even better: std::complex offers array access. You can reinterpret-cast std::complex<T> to T[2], whether through a reference or a pointer. And thus, std::complex can be "stripped" of its identity and passed into any lower-level API that requires pairs of floats or pairs of doubles.
The complex frequency domain data can be converted to magnitude and phase, and back, as follows:
#include <cmath>   // for M_PI (on MSVC you may need _USE_MATH_DEFINES)
#include <complex>
#include <iostream>
int main() {
    std::complex<double> z{0.7071, 0.7071};
    double magnitude = abs(z);
    double phase = arg(z); // in radians
    std::cout << z << " ≈ (" << magnitude << "∠" << phase * 180.0 / M_PI << "°)\n";
    std::complex<double> z2 = std::polar(magnitude, phase);
    std::cout << " ≈ " << z2 << '\n';
}
Output:
(0.7071,0.7071) ≈ (0.99999∠45°)
≈ (0.7071,0.7071)
Once you get the "real" data back, it's not likely that the imaginary part of the time domain data will be zero - it depends on what processing you'll do with the frequency domain data. What you want to convert back is the magnitude of each complex time sample, using the abs function.
There's stuff in the C++ library that's mind-bogglingly overcomplicated, to the point where you have to have the reference open or you won't remember how to use it. See e.g. the mess known as the random number support. Ugh. But complex number support is relatively sane and even follows the notation used in teaching complex number arithmetic :)
I have a function doing some mathematical computation and returning a double. It ends up with different results under Windows and Android due to the std::exp implementation being different (Why do I get platform-specific result for std::exp?). The e-17 rounding difference gets propagated, and in the end it's not just a rounding difference that I get (results can change from 2.36 to 2.47 in the end). As I compare the result to some expected values, I want this function to return the same result on all platforms.
So I need to round my result. The simplest solution to do this is apparently (as far as I could find on the web) to do std::ceil(d*std::pow<double>(10,precision))/std::pow<double>(10,precision). However, I feel like this could still end up with different results depending on the platform (and moreover, it's hard to decide what the precision should be).
I was wondering if hard-coding the least significant byte of the double could be a good rounding strategy.
This quick test seems to show that "yes":
#include <iostream>
#include <iomanip>
#include <cmath>   // for std::abs(double)

double roundByCast( double d )
{
    double rounded = d;
    unsigned char* temp = (unsigned char*) &rounded;
    // changing least significant byte to be always the same
    temp[0] = 128;
    return rounded;
}

void showRoundInfo( double d, double rounded )
{
    double diff = std::abs(d - rounded);
    std::cout << "cast: " << d << " rounded to " << rounded << " (diff=" << diff << ")" << std::endl;
}

void roundIt( double d )
{
    showRoundInfo( d, roundByCast(d) );
}

int main( int argc, char* argv[] )
{
    roundIt( 7.87234042553191493141184764681 );
    roundIt( 0.000000000000000000000184764681 );
    roundIt( 78723404.2553191493141184764681 );
}
This outputs:
cast: 7.87234 rounded to 7.87234 (diff=2.66454e-14)
cast: 1.84765e-22 rounded to 1.84765e-22 (diff=9.87415e-37)
cast: 7.87234e+07 rounded to 7.87234e+07 (diff=4.47035e-07)
My question is:
Is unsigned char* temp = (unsigned char*) &rounded safe or is there an undefined behaviour here, and why?
If there is no UB (or if there is a better way to do this without UB), is such a round function safe and accurate for all input?
Note: I know floating point numbers are inaccurate. Please don't mark as duplicate of Is floating point math broken? or Why Are Floating Point Numbers Inaccurate?. I understand why results are different, I'm just looking for a way to make them be identical on all targetted platforms.
Edit, I may reformulate my question as people are asking why I have different values and why I want them to be the same.
Let's say you get a double from a computation that could end up with a different value due to platform-specific implementations (like std::exp). If you want to fix those differing doubles so they end up having the exact same memory representation (1) on all platforms, and you want to lose as little precision as possible, then is fixing the least significant byte a good approach? (Because I feel that rounding to an arbitrary given precision is likely to lose more information than this trick.)
(1) By "same representation", I mean that if you transform it to a std::bitset, you want to see the same bits sequence for all platform.
No, rounding is not a strategy for removing small errors, or guaranteeing agreement with calculations performed with errors.
For any slicing of the number line into ranges, you will successfully eliminate most slight deviations (by placing them in the same bucket and clamping them to the same value), but you greatly increase the deviation if your original pair of values straddles a boundary.
In your particular case of hardcoding the least significant byte, the very near values
0x1.mmmmmmm100
and
0x1.mmmmmmm0ff
have a deviation of only one ULP... but after your rounding, they differ by 256 ULP. Oops!
Is unsigned char* temp = (unsigned char*) &rounded safe or is there an undefined behaviour here, and why?
It is well defined, as aliasing through unsigned char is allowed.
is such a round function safe and accurate for all input?
No. You cannot perfectly fix this problem with truncating/rounding. Consider, that one implementation gives 0x.....0ff, and the other 0x.....100. Setting the lsb to 0x00 will make the original 1 ulp difference to 256 ulps.
No rounding algorithm can fix this.
You have two options:
don't use floating point, use some other way (for example, fixed point)
embed a floating point library into your application, which only uses basic floating point arithmetic (+, -, *, /, sqrt), and don't use -ffast-math, or any equivalent option. This way, if you're on a IEEE-754 compatible platform, floating point results should be the same, as IEEE-754 mandates that basic operations should be calculated "perfectly". It means as if the operation calculated at infinite precision, and then rounded to the resulting representation.
Btw, if an input 1e-17 difference means a huge output difference, then your problem/algorithm is ill-conditioned, which generally should be avoided, as it usually doesn't give you meaningful results.
What you are doing is totally, totally misguided.
Your problem is not that you are getting different results (2.36 vs. 2.47). Your problem is that at least one of these results, and likely both, have massive errors. Your Windows and Android results are not just different, they are WRONG. (At least one of them, and you have no idea which one).
Find out why you get these massive errors and change your algorithms to not increase tiny rounding errors massively. Or you have a problem that is inherently chaotic, in which case the difference between results is actually very useful information.
What you are trying just makes the rounding errors 256 times bigger, and if two different results end in ....1ff and ....200 hexadecimal, then you change these to ....180 and ....280, so even the difference between slightly different numbers can grow by a factor 256.
And on a big-endian machine, temp[0] is the most significant byte (the sign and exponent bits), so your code will just go kaboom!!!
Your function won't work because of aliasing.
double roundByCast( double d )
{
    double rounded = d;
    unsigned char* temp = (unsigned char*) &rounded;
    // changing least significant byte to be always the same
    temp[0] = 128;
    return rounded;
}
Casting to unsigned char* for temp is allowed, because char* casts are the exception to the aliasing rules. That's necessary for functions like read, write, memcpy, etc, so that they can copy values to and from byte representations.
However, you aren't allowed to write to temp[0] and then assume that rounded changed. You must create a new double variable (on the stack is fine) and memcpy temp back to it.
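A sketch of the memcpy-based variant this answer recommends (the function name is mine; it still assumes a little-endian byte order and still has the 256-ULP problem discussed in the other answers):

```cpp
#include <cstring>

// Copy the double into a byte buffer, patch the byte there, then
// memcpy back into a fresh double object.
double roundByCastMemcpy(double d) {
    unsigned char bytes[sizeof(double)];
    std::memcpy(bytes, &d, sizeof bytes);
    bytes[0] = 128;  // on little-endian, the lowest-order mantissa byte
    double rounded;
    std::memcpy(&rounded, bytes, sizeof rounded);
    return rounded;
}
```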
When I wrote the following code, instead of causing a runtime error, it outputs inf. Now, is there any way to assign this value (inf) to a variable, like regular numeric values? And how can I check whether a division yields inf?
#include<iostream>
int main() {
    double a = 1, b = 0;
    std::cout << a / b << std::endl;
    return 0;
}
C++ does not require implementations to support infinity or division by zero. Many implementations will, as many implementations use IEEE 754 formats even if they do not fully support IEEE 754 semantics.
When you want to use infinity as a value (that is, you want to refer to infinity in source code), you should not generate it by dividing by zero. Instead, include <limits> and use std::numeric_limits<T>::infinity() with T specified as double.
Returns the special value "positive infinity", as represented by the floating-point type T. Only meaningful if std::numeric_limits<T>::has_infinity == true.
(You may also see code that includes <cmath> and uses INFINITY, which is inherited from C.)
When you want to check if a number is finite, include <cmath> and use std::isfinite. Note that computations with infinite values tend to ultimately produce NaNs, and std::isfinite(x) is generally more convenient than !std::isinf(x) && !std::isnan(x).
A final warning in case you are using unsafe compiler flags: In case you use, e.g., GCC's -ffinite-math-only (included in -ffast-math) then std::isfinite does not work.
It appears to be I can:
#include<iostream>
int main() {
    double a = 1, b = 0, c = 1 / 0.0;
    std::cout << a / b << std::endl;
    if (a / b == c) std::cout << "Yes you can.\n";
    return 0;
}
What is the reason for the catastrophic performance of pow() for NaN values? As far as I can work out, NaNs should not have an impact on performance if the floating-point math is done with SSE instead of the x87 FPU.
This seems to be true for elementary operations, but not for pow(). I compared multiplication and division of a double to squaring and then taking the square root. If I compile the piece of code below with g++ -lrt, I get the following result:
multTime(3.14159): 20.1328ms
multTime(nan): 244.173ms
powTime(3.14159): 92.0235ms
powTime(nan): 1322.33ms
As expected, calculations involving NaN take considerably longer. Compiling with g++ -lrt -msse2 -mfpmath=sse however results in the following times:
multTime(3.14159): 22.0213ms
multTime(nan): 13.066ms
powTime(3.14159): 97.7823ms
powTime(nan): 1211.27ms
The multiplication / division of NaN is now much faster (actually faster than with a real number), but the squaring and taking the square root still takes a very long time.
Test code (compiled with gcc 4.1.2 on 32-bit openSUSE 10.2 in VMware; the CPU is a Core i7-2620M):
#include <iostream>
#include <sys/time.h>
#include <cmath>
void multTime( double d )
{
    struct timespec startTime, endTime;
    double durationNanoseconds;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &startTime);
    for (int i = 0; i < 1000000; i++)
    {
        d = 2 * d;
        d = 0.5 * d;
    }
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &endTime);
    durationNanoseconds = 1e9*(endTime.tv_sec - startTime.tv_sec) + (endTime.tv_nsec - startTime.tv_nsec);
    std::cout << "multTime(" << d << "): " << durationNanoseconds/1e6 << "ms" << std::endl;
}

void powTime( double d )
{
    struct timespec startTime, endTime;
    double durationNanoseconds;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &startTime);
    for (int i = 0; i < 1000000; i++)
    {
        d = pow(d, 2);
        d = pow(d, 0.5);
    }
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &endTime);
    durationNanoseconds = 1e9*(endTime.tv_sec - startTime.tv_sec) + (endTime.tv_nsec - startTime.tv_nsec);
    std::cout << "powTime(" << d << "): " << durationNanoseconds/1e6 << "ms" << std::endl;
}

int main()
{
    multTime(3.14159);
    multTime(NAN);
    powTime(3.14159);
    powTime(NAN);
}
Edit:
Unfortunately, my knowledge on this topic is extremely limited, but I guess that the glibc pow() never uses SSE on a 32bit system, but rather some assembly in sysdeps/i386/fpu/e_pow.S. There is a function __ieee754_pow_sse2 in more recent glibc versions, but it's in sysdeps/x86_64/fpu/multiarch/e_pow.c and therefore probably only works on x64. However, all of this might be irrelevant here, because pow() is also a gcc built-in function. For an easy fix, see Z boson's answer.
"NaNs should not have an impact on performance if the floating-point math is done with SSE instead of the x87 FPU."
I'm not sure this follows from the resource you quote. In any case, pow is a C library function. It is not implemented as an instruction, even on x87. So there are 2 separate issues here - how SSE handles NaN values, and how a pow function implementation handles NaN values.
If the pow function implementation uses a different path for special values like +/-Inf, or NaN, you might expect a NaN value for the base, or exponent, to return a value quickly. On the other hand, the implementation might not handle this as a separate case, and simply relies on floating-point operations to propagate intermediate results as NaN values.
Starting with 'Sandy Bridge', many of the performance penalties associated with denormals were reduced or eliminated. Not all though, as the author describes a penalty for mulps. Therefore, it would be reasonable to expect that not all arithmetic operations involving NaNs are 'fast'. Some architectures might even revert to microcode to handle NaNs in different contexts.
Your math library is too old. Either find another math library which implements pow with NAN better or implement a fix like this:
inline double pow_fix(double x, double y)
{
    if (x != x) return x;  // x != x is true only when x is NaN
    if (y != y) return y;  // likewise for y
    return pow(x, y);
}
Compile with g++ -O3 -msse2 -mfpmath=sse foo.cpp.
If you want to do squaring or taking the square root, use d*d or sqrt(d). The pow(d,2) and pow(d,0.5) will be slower and possibly less accurate, unless your compiler optimizes them based on the constant second argument 2 and 0.5; note that such an optimization may not always be possible for pow(d,0.5) since it returns 0.0 if d is a negative zero, while sqrt(d) returns -0.0.
For those doing timings, please make sure that you test the same thing.
With a complex function like pow() there are lots of ways that NaN could trigger slowness. It could be that the operations on NaNs are slow, or it could be that the pow() implementation checks for all sorts of special values that it can handle efficiently, and the NaN values fail all of those tests, leading to a more expensive path being taken. You'd have to step through the code to find out for sure.
A more recent implementation of pow() might include additional checks to handle NaN more efficiently, but this is always a tradeoff -- it would be a shame to have pow() handle 'normal' cases more slowly in order to accelerate NaN handling.
My blog post only applied to individual instructions, not complex functions like pow().