I have completed a port from Fortran to C++ but have discovered some differences in the COMPLEX type. Consider the following codes:
PROGRAM CMPLX
COMPLEX*16 c
REAL*8 a
c = (1.23456789, 3.45678901)
a = AIMAG(1.0 / c)
WRITE (*, *) a
END
And the C++:
#include <complex>
#include <iostream>
#include <iomanip>
int main()
{
    std::complex<double> c(1.23456789, 3.45678901);
    double a = (1.0 / c).imag();
    std::cout << std::setprecision(15) << " " << a << std::endl;
}
Compiling the C++ version with clang++ or g++, I get the output: -0.256561150444368
Compiling the Fortran version however gives me: -0.25656115049876993
I mean, don't both languages follow IEEE 754? If I run the following in Octave (Matlab):
octave:1> c=1.23456789+ 3.45678901i
c = 1.2346 + 3.4568i
octave:2> c
c = 1.2346 + 3.4568i
octave:3> output_precision(15)
octave:4> c
c = 1.23456789000000e+00 + 3.45678901000000e+00i
octave:5> 1 / c
ans = 9.16290109820952e-02 - 2.56561150444368e-01i
I get the same as the C++ version. What is up with the Fortran COMPLEX type? Am I missing some compiler flags? -ffast-math doesn't change anything. I want to produce the exact same 15 decimals in C++ and Fortran, so I can more easily spot porting differences.
Any Fortran gurus around? Thanks!
In the Fortran code replace
c = (1.23456789, 3.45678901)
with
c = (1.23456789d0, 3.45678901d0)
Without a kind, the real literals you use on the rhs are, most likely, 32-bit reals, and you probably want 64-bit reals. The suffix d0 causes the compiler to create the 64-bit reals closest to the values you provide. I've glossed over some details here, and there are other (possibly better) ways of specifying the kind of a real literal, but this approach should work fine on any current Fortran compiler.
I don't know C++ very well, so I'm not sure whether the C++ code has the same problem.
If I read your question correctly, the two codes produce the same answer to 8 significant figures, the limit of single precision.
As for IEEE-754 compliance, that standard does not, so far as I am aware, cover complex arithmetic. I expect the floating-point arithmetic used behind the scenes produces results on complex numbers within the expected error bounds in most cases, but I'm not aware that those bounds are guaranteed in the way that error bounds on elementary floating-point operations are.
I would propose changing all Fortran constants to double precision:
1.23456789_8 (or 1.23456789D00) etc
and use DIMAG instead of AIMAG
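To see the same effect from the C++ side, you can deliberately round the literals to single precision before widening them to double, which mimics what the kind-less Fortran constants do. A minimal sketch (the only change from the original C++ program is the two float casts; the exact digits are platform-dependent, but the result should land near the Fortran output quoted above):
#include <complex>
#include <iostream>
#include <iomanip>

int main()
{
    // Round the constants to 32-bit floats first, then widen to double,
    // mimicking what the kind-less Fortran literals do.
    std::complex<double> c(static_cast<float>(1.23456789),
                           static_cast<float>(3.45678901));
    double a = (1.0 / c).imag();
    // Should print something close to the Fortran value -0.25656115049876993.
    std::cout << std::setprecision(17) << a << std::endl;
}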
I was trying to handle integer division by zero (please don't judge; I was told I had to use the <csignal> library and couldn't just use an if statement), but I needed to make sure the program would keep running (even though it's very bad practice) instead of crashing or closing. The weird part is that the program should only handle division by zero, and should exit for every other type of SIGFPE.
Side note: I have no idea why they use names like FPU or FE or FPE when referring to integer "exceptions" (or should I say interrupts), since the standard clearly says that dividing a floating-point number by zero should return inf, or nan for 0 / 0 (tell me if I'm wrong).
Anyway, I wrote this test code so I could better understand what I needed to do before the actual implementation. I know it's weird to have x as a global variable, but if I don't reset it, the handler keeps getting called over and over, seemingly forever...
#include <iostream>
#include <csignal>
#include <cstdlib>
using namespace std;

int x = 0;

void handle(int s);

int main(int argc, char * argv[]) {
    signal(SIGFPE, handle);
    cout << "Insert 0: ";
    cin >> x; // here I would input 0, since writing 5 / 0 directly wouldn't compile
    x = 5 / x;
    cout << "X: " << x << endl;
    return 0;
}

void handle(int s) {
    if (s != FPE_INTDIV) exit(1);
    cout << "sig: " << s << endl;
    x = 1;
}
As you can see, I used FPE_INTDIV to rule out every other type of exception, but it doesn't work.
Eventually I discovered that FPE_INTDIV is a symbolic constant for 7 (that's what VS Code's IntelliSense tells me), and if I print the value of s, it is 8. Strangely enough, 8 is the value of FPE_INTOVF, which the documentation says is specifically for integer overflows.
Why on earth is the symbolic value for overflows used for integer division if there is a symbolic for integer division? What am I missing? Did someone mess up the values in the library? Am I using the wrong macros?
I should also mention that this code compiles fine with clang++ and g++, but when compiled on a Windows computer with cl, it tells me there is no macro named FPE_INTDIV.
How can I be sure of what I'm doing and write a cross-platform solution that works?
I already feel like an idiot.
The SIGFPE signal is documented as follows:
The SIGFPE signal reports a fatal arithmetic error. Although the name is derived from “floating-point exception”, this signal actually covers all arithmetic errors, including division by zero and overflow. If a program stores integer data in a location which is then used in a floating-point operation, this often causes an “invalid operation” exception, because the processor cannot recognize the data as a floating-point number.
There's no reason for it to be labelled specifically FPE but these sorts of labels can evolve in unpredictable ways. I wouldn't read too much into it.
These signals are part of the POSIX standard and may not be fully supported or implemented on Windows. The Windows implementation of these facilities is lacking in a number of areas; for example, fork() is unsupported.
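If you do want to distinguish integer division by zero from other causes of SIGFPE on POSIX systems, note that the FPE_* constants are delivered in siginfo_t's si_code field, not in the int argument a handler installed with signal() receives — that argument is just the signal number itself (SIGFPE, which is 8 on Linux, hence the value you printed). A minimal POSIX-only sketch using sigaction with SA_SIGINFO (this will not build with cl on Windows, which lacks these facilities):
#include <signal.h>
#include <unistd.h>
#include <cstdio>

void handle(int /*sig*/, siginfo_t *info, void * /*ctx*/)
{
    // si_code carries the FPE_* reason; the first argument is always SIGFPE here.
    if (info->si_code != FPE_INTDIV)
        _exit(1);                               // exit for every other arithmetic error
    const char msg[] = "caught integer division by zero\n";
    write(STDOUT_FILENO, msg, sizeof msg - 1);  // async-signal-safe output
    // Returning from a SIGFPE handler would re-execute the faulting
    // instruction, so this sketch simply exits instead.
    _exit(2);
}

int main()
{
    struct sigaction sa = {};
    sa.sa_sigaction = handle;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGFPE, &sa, nullptr);

    volatile int zero = 0;
    int r = 5 / zero;            // raises SIGFPE with si_code == FPE_INTDIV on x86 Linux
    std::printf("r = %d\n", r);  // not reached
}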
Solution
Thanks to @Michael Veksler's answer, I was put on the right track to search for a solution. @Christoph, in this post, suggests trying different compiler flags to set the precision of the floating-point operations.
For me, the -mpc32 flag solved the problem.
I have to translate C++ code to C code as the new target won't have a C++ compiler.
I am running into a strange thing, where a mathematical equation gives different results when run in a C program compared to when run in a C++ program.
Equation:
float result = (float)(a+b-c+d+e);
The elements of the equation are all floats. I check the contents of the memory of each element by using
printf("a : 0x%02X%02X%02X%02X\n",((unsigned char*)&a)[0],((unsigned char*)&a)[1],((unsigned char*)&a)[2],((unsigned char*)&a)[3]);
Both in C and in C++, a, b, c, d, and e are equal, but the results are different.
Sample of a calculation in C:
a : 0x1D9969BB
b : 0x6CEDC83E
c : 0xAC89452F
d : 0xD2DC92B3
e : 0x4FE9F23C
result : 0xCD48D63E
And a sample in C++:
a : 0x1D9969BB
b : 0x6CEDC83E
c : 0xAC89452F
d : 0xD2DC92B3
e : 0x4FE9F23C
result : 0xCC48D63E
When I separate the equation into smaller parts, as in r = a + b, then r = r - c, and so on, the results are equal.
I have a 64-bit Windows machine.
Can someone explain why this happens?
I am sorry for this noob question, I am just starting out.
EDIT
I use the latest version of MinGW with options
-O0 -g3 -Wall -c -fmessage-length=0
EDIT 2
Sorry for the long delay...
Here are the values corresponding to the above hex ones in C:
a : -0.003564424114301801
b : 0.392436385154724120
c : 0.000000000179659565
d : -0.000000068388217755
e : 0.029652265831828117
r : 0.418524175882339480
And here are for C++:
a : -0.003564424114301801
b : 0.392436385154724120
c : 0.000000000179659565
d : -0.000000068388217755
e : 0.029652265831828117
r : 0.418524146080017090
They are printed like printf("a : %.18f\n",a);
The values are not known at compile time, the equation is in a function called multiple times throughout the execution. The elements of the equation are computed inside the function.
Also, I observed a strange thing: I ran the exact same equation in a new "pure" project (for both C and C++), i.e. only the main itself. The values of the elements are the same as the ones above (in float). The result is r : 0xD148D63E for both, the same as in @geza's comment.
Introduction: Given that the question is not detailed enough, I am left to speculate that this is gcc's infamous bug 323. As the low bug ID suggests, this bug has been there forever: the report has existed since June 2000, currently has 94 (!) duplicates, and the last one was reported only half a year ago (on 2018-08-28). The bug affects only 32-bit executables on Intel machines (such as under cygwin). I assume that the OP's code uses x87 floating-point instructions, which are the default for 32-bit executables, while SSE instructions are only optional. Since 64-bit executables are more prevalent than 32-bit ones, and no longer depend on x87 instructions, this bug has zero chance of ever being fixed.
Bug description: The x87 architecture has 80-bit floating-point registers. A float requires only 32 bits. The bug is that x87 floating-point operations are always done with 80-bit accuracy (subject to a hardware configuration flag). This extra accuracy makes precision very flaky, because it depends on when the registers are spilled (written) to memory.
If an 80-bit register is spilled into a 32-bit variable in memory, the extra precision is lost. This would be the correct behavior if it happened after each floating-point operation (since float is supposed to be 32 bits). However, spilling to memory slows things down, and no compiler writer wants the executable to run slowly. So by default the values are not spilled to memory.
Now, sometimes the value is spilled to memory and sometimes it is not. It depends on the optimization level, on compiler heuristics, and on other seemingly random factors. Even with -O0 there can be slightly different strategies for dealing with spilling the x87 registers to memory, resulting in slightly different results. The difference in spilling strategy is probably what you are seeing between your C and C++ compilers.
Workaround:
For ways to handle this, please read c handling of excess precision. Try running your compiler with -fexcess-precision=standard and compare it with -fexcess-precision=fast. You can also try playing with -mfpmath=sse.
NOTE: According to the C++ standard this is not really a bug. However, it is a bug according to the documentation of GCC, which claims to follow the IEEE-754 FP standard on Intel architectures (as it does on many other architectures). Obviously bug 323 violates the IEEE-754 standard.
NOTE 2: At some optimization levels (such as -Ofast) -ffast-math is implied, and then all bets are off regarding extra precision and evaluation order.
EDIT: I have simulated the described behavior on an Intel 64-bit system and got the same results as the OP. Here is the code:
#include <cstdio>
#include <cstdint>
#include <cstring>

// Forward declarations of the support functions defined below.
float hex2float(uint32_t num);
void print(const char* label, float val);
float flush(float x);

int main()
{
    float a = hex2float(0x1D9969BB);
    float b = hex2float(0x6CEDC83E);
    float c = hex2float(0xAC89452F);
    float d = hex2float(0xD2DC92B3);
    float e = hex2float(0x4FE9F23C);
    float result = (float)((double)a+b-c+d+e);
    print("result", result);
    result = flush(flush(flush(flush(a+b)-c)+d)+e);
    print("result2", result);
}
The implementations of the support functions:
float hex2float(uint32_t num)
{
    // The question prints the bytes in memory (little-endian) order,
    // so reverse them before reinterpreting the word as a float.
    uint32_t rev = (num >> 24) | ((num >> 8) & 0xff00) | ((num << 8) & 0xff0000) | (num << 24);
    float f;
    memcpy(&f, &rev, 4);
    return f;
}

void print(const char* label, float val)
{
    printf("%10s (%13.10f) : 0x%02X%02X%02X%02X\n", label, val,
           ((unsigned char*)&val)[0], ((unsigned char*)&val)[1],
           ((unsigned char*)&val)[2], ((unsigned char*)&val)[3]);
}

float flush(float x)
{
    // Forcing the value through a volatile variable rounds it to float,
    // discarding any excess precision kept in registers.
    volatile float buf = x;
    return buf;
}
After running this I have got exactly the same difference between the results:
result ( 0.4185241461) : 0xCC48D63E
result2 ( 0.4185241759) : 0xCD48D63E
For some reason this is different from the "pure" version described in the question. At one point I was also getting the same results as the "pure" version, but since then the question has changed. The original values in the original question were different. They were:
float a = hex2float(0x1D9969BB);
float b = hex2float(0x6CEDC83E);
float c = hex2float(0xD2DC92B3);
float d = hex2float(0xA61FD930);
float e = hex2float(0x4FE9F23C);
and with these values the resulting output is:
result ( 0.4185242951) : 0xD148D63E
result2 ( 0.4185242951) : 0xD148D63E
The C and C++ standards both permit floating-point expressions to be evaluated with more precision than the nominal type. Thus, a+b-c+d+e may be evaluated using double even though the types are float, and the compiler may optimize the expression in other ways. In particular, using exact mathematics is essentially using an infinite amount of precision, so the compiler is free to optimize or otherwise rearrange the expression based on mathematical properties rather than floating-point arithmetic properties.
It appears, for whatever reason, your compiler is choosing to use this liberty to evaluate the expression differently in different circumstances (which may be related to the language being compiled or due to other variations between your C and C++ code). One may be evaluating (((a+b)-c)+d)+e while the other does (((a+b)+d)+e)-c, or other variations.
In both languages, the compiler is required to “discard” the excess precision when a cast or assignment is performed. So you can compel a certain evaluation by inserting casts or assignments. Casts would make a mess of the expression, so assignments may be easier to read:
float t0 = a+b;
float t1 = t0-c;
float t2 = t1+d;
float result = t2+e;
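For completeness, the cast form mentioned above would look roughly like the hypothetical helper sum5 below; note that gcc targeting x87 may only honour these intermediate roundings with -fexcess-precision=standard:
// Sketch: each intermediate result is explicitly rounded back to float
// with a cast, mirroring the assignment version above.
float sum5(float a, float b, float c, float d, float e)
{
    return (float)((float)((float)((float)(a + b) - c) + d) + e);
}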
Here's the code:
#include <iostream>
#include <math.h>
const double ln2per12 = log(2.0) / 12.0;
int main() {
    std::cout.precision(100);
    double target = 9.800000000000000710542735760100185871124267578125;
    double unnormalizatedValue = 9.79999999999063220457173883914947509765625;
    double ln2per12edValue = unnormalizatedValue * ln2per12;
    double errorLn2per12 = fabs(target - ln2per12edValue / ln2per12);
    std::cout << unnormalizatedValue << std::endl;
    std::cout << ln2per12 << std::endl;
    std::cout << errorLn2per12 << " <<<<< its different" << std::endl;
}
If I try on my machine (MSVC), or here (GCC):
errorLn2per12 = 9.3702823278363212011754512786865234375e-12
Instead, here (GCC):
errorLn2per12 = 9.368505970996920950710773468017578125e-12
which is different. Is it due to machine epsilon? Or compiler precision flags? Or a different IEEE evaluation?
What's the cause of this drift? The problem seems to be in the fabs() function (since the other printed values seem the same).
Even without -Ofast, the C++ standard does not require implementations to be exact with log (or sin, or exp, etc.), only that they be within a few ulp (i.e. there may be some inaccuracies in the last binary places). This allows faster hardware (or software) approximations, which each platform/compiler may do differently.
(The only floating point math function that you will always get perfect results from on all platforms is sqrt.)
More annoyingly, you may even get different results between compilation (the compiler may use some internal library to be as precise as float/double allows for constant expressions) and runtime (e.g. hardware-supported approximations).
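Here is a small sketch of that compile-time versus run-time difference (whether the two printed values actually differ depends on your compiler, flags, and C library; the volatile variable merely prevents the second call from being constant-folded):
#include <cmath>
#include <cstdio>

int main()
{
    // The compiler is allowed to fold this constant expression using its own
    // internal math routines at compile time.
    double folded = std::log(2.0);

    // volatile forces the argument to be read at run time, so this value comes
    // from the platform's runtime log implementation instead.
    volatile double x = 2.0;
    double runtime = std::log(x);

    std::printf("%.20g\n%.20g\n", folded, runtime);
}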
If you want log to give the exact same result across platforms and compilers, you will have to implement it yourself using only +, -, *, / and sqrt (or find a library with this guarantee). And avoid a whole host of pitfalls along the way.
If you need floating point determinism in general, I strongly recommend reading this article to understand how big of a problem you have ahead of you: https://randomascii.wordpress.com/2013/07/16/floating-point-determinism/
Trying to solve a computer vision problem, I have to minimize a nonlinear energy function, implementing it in C++. Although I didn't find a library to help me with the specific function, I have the math for it. So what's the best way to go from symbolic math to C++ code?
Example: given the functions g(x) := x^2 and f(x) := x + 2, imagine I am interested in converting f(g(x)) to C code; the obvious C code would be y = x*x + 2; however, for complicated math including Jacobians etc., it is not so easy, translating into pages and pages of operations.
I have already tried Matlab and its conversion module to C code, but the generated code is far from optimized (e.g. the same operations are repeated many times instead of reusing the result).
There is the NLopt library, callable from C++, C, Matlab, Fortran, (...), for nonlinear optimization. The implementation of a minimization procedure using this library might look like this:
#include <nlopt.hpp>
nlopt::opt opt(nlopt::LD_MMA, 2);
std::vector<double> lb(2);
lb[0] = -HUGE_VAL; lb[1] = 0;
opt.set_lower_bounds(lb);
opt.set_min_objective(myfunc, NULL);
my_constraint_data data[2] = { {2,0}, {-1,1} };
opt.add_inequality_constraint(myconstraint, &data[0], 1e-8);
opt.add_inequality_constraint(myconstraint, &data[1], 1e-8);
opt.set_xtol_rel(1e-4);
std::vector<double> x(2);
x[0] = 1.234; x[1] = 5.678;
double minf;
nlopt::result result = opt.optimize(x, minf);
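Note that the calls above belong inside a function such as main, and the fragment assumes that my_constraint_data, myfunc, and myconstraint are defined elsewhere. A sketch of those missing pieces, modelled on the example problem from the NLopt tutorial (minimizing sqrt(x2) subject to two cubic constraints), might look like this:
#include <cmath>
#include <vector>

// Hypothetical data passed to each constraint evaluation.
struct my_constraint_data { double a, b; };

// Objective f(x) = sqrt(x[1]); the gradient is filled in only when the
// chosen algorithm (here LD_MMA, a gradient-based method) asks for it.
double myfunc(const std::vector<double> &x, std::vector<double> &grad, void *)
{
    if (!grad.empty()) {
        grad[0] = 0.0;
        grad[1] = 0.5 / std::sqrt(x[1]);
    }
    return std::sqrt(x[1]);
}

// Constraint of the form (a*x0 + b)^3 - x1 <= 0.
double myconstraint(const std::vector<double> &x, std::vector<double> &grad, void *data)
{
    my_constraint_data *d = static_cast<my_constraint_data *>(data);
    double a = d->a, b = d->b;
    if (!grad.empty()) {
        grad[0] = 3 * a * (a * x[0] + b) * (a * x[0] + b);
        grad[1] = -1.0;
    }
    return (a * x[0] + b) * (a * x[0] + b) * (a * x[0] + b) - x[1];
}
With these definitions and the constraint data {{2,0}, {-1,1}} from the fragment above, the tutorial's example problem converges to roughly x = (1/3, 8/27).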
Please explain the output of the code below. I am getting different values of c in the two cases, i.e.,
Case 1 : Value of n is taken from the standard input.
Case 2 : Value of n is directly written in the code.
link : http://www.ideone.com/UjYFQd
#include <iostream>
#include <cstdio>
#include <math.h>
using namespace std;
int main()
{
    int c;
    int n;
    scanf("%d", &n); //n = 64
    c = (log(n) / log(2));
    cout << c << endl; //OUTPUT = 5
    n = 64;
    c = (log(n) / log(2));
    cout << c << endl; //OUTPUT = 6
    return 0;
}
You may see this because of how the floating point number is stored:
double result = log(n) / log(2); // where you input n as 64
int c = (int)result; // this will truncate result. If result is 5.99999999999999, you will get 5
When you hardcode the value, the compiler will optimize it for you:
double result = log(64) / log(2); // which is the same as 6 * log(2) / log(2)
int c = (int)result;
Will more than likely be replaced entirely with:
int c = 6;
Because the compiler will see that you are using a bunch of compile-time constants to store the value in a variable (it will go ahead and crunch the value at compile time).
If you want to get the integer result for the operation, you should use std::round instead of just casting to an int.
int c = std::round(log(n) / log(2));
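For instance, a quick sketch contrasting truncation with rounding (the volatile qualifier keeps the compiler from folding the whole expression at compile time, so you see the run-time behaviour):
#include <cmath>
#include <cstdio>

int main()
{
    volatile int n = 64;               // volatile: force a run-time computation
    double q = std::log((double)n) / std::log(2.0);

    int truncated = (int)q;            // may come out as 5 if q is 5.999...
    int rounded = (int)std::round(q);  // rounds to nearest first, giving 6

    std::printf("q = %.17g, truncated = %d, rounded = %d\n", q, truncated, rounded);
}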
The first time around, log(n)/log(2) is computed and the result is very close to 6 but slightly less. This is just how floating point computation works: neither log(64) nor log(2) has an infinitely precise representation in binary floating point, so you can expect the result of dividing one by the other to be slightly off from the true mathematical value. Depending on the implementation you can expect to get 5 or 6.
In the second computation:
n = 64;
c = (log(n) / log(2));
The value assigned to c can be inferred to be a compile-time constant and can be computed by the compiler. The compiler does the computation in a different environment than the program while it runs, so you can expect to get slightly different results from computations performed at compile-time and at runtime.
For example, a compiler generating code for x86 may choose to use x87 floating-point instructions that use 80-bit floating-point arithmetic, while the compiler itself uses standard 64-bit floating-point arithmetic to compute compile-time constants.
Check the assembler output from your compiler to confirm this. Using GCC 4.8 I get 6 from both computations.
The difference in output can be explained by the fact that gcc optimizes out the calls to log in the constant case. For example, in this case:
n = 64;
c = (log(n) / log(2));
both calls to log are done at compile time, and this compile-time evaluation can produce different results. This is documented in the gcc manual in the Other Built-in Functions Provided by GCC section, which says:
GCC includes built-in versions of many of the functions in the standard C library. The versions prefixed with __builtin_ are always treated as having the same meaning as the C library function even if you specify the -fno-builtin option. (See C Dialect Options.) Many of these functions are only optimized in certain cases; if they are not optimized in a particular case, a call to the library function is emitted.
and log is one of the many functions that have builtin versions. If I build with -fno-builtin, all four calls to log are made, but without it only one call to log is emitted. You can check this by building with the -S flag, which outputs the assembly gcc generates.