A little background: I was working on some data conversion from C to C# by using a C++/CLI midlayer, and I noticed a peculiarity with the way the debugger shows floats and doubles, depending on which dll the code is executing in (see code and images below). At first I thought it had something to do with managed/unmanaged differences, but then I realized that if I completely left the C# layer out of it and only used unmanaged data types, the same behaviour was exhibited.
Test Case: To further explore the issue, I created an isolated test case to clearly identify the strange behaviour. I am assuming that anyone who may be testing this code already has a working Solution and dllimport/dllexport macros set up. Mine is called DLL_EXPORT; if you need a minimal working header file, let me know (a rough sketch of the macro part is below). Here the main application is in C and calls a function from a C++/CLI DLL. I am using Visual Studio 2015 and both assemblies are 32 bit.
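The export-macro part of such a header looks roughly like this (only a sketch; names like MYDLL_EXPORTS are assumptions, not the exact file from my solution):
#pragma once

#ifdef __cplusplus
#define DLL_EXTERN_C extern "C"
#else
#define DLL_EXTERN_C
#endif

#ifdef MYDLL_EXPORTS   /* assumed to be defined only when building the C++/CLI DLL */
#define DLL_EXPORT DLL_EXTERN_C __declspec(dllexport)
#else
#define DLL_EXPORT DLL_EXTERN_C __declspec(dllimport)
#endif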
I am a bit concerned, as I am not sure if this is something I need to worry about or it's just something the debugger is doing (I am leaning towards the latter). And to be quite honest, I am just outright curious as to what's happening here.
Question: Can anyone explain the observed behaviour or at least point me in the right direction?
C - Calling Function
void floatTest()
{
    float floatValC = 42.42f;
    double doubleValC = 42.42;
    //even if passing the address, behaviour is the same as all others.
    float retFloat = 42.42f;
    double retDouble = 42.42;
    int sizeOfFloatC = sizeof(float);
    int sizeOfDoubleC = sizeof(double);
    floatTestCPP(floatValC, doubleValC, &retFloat, &retDouble);
    //do some dummy math to make the compiler happy (i.e. no unused variable warnings)
    sizeOfFloatC = sizeOfFloatC + sizeOfDoubleC; //break point here
}
C++/CLI Header
DLL_EXPORT void floatTestCPP(float floatVal, double doubleVal,
float *floatRet, double *doubleRet);
C++/CLI Source
//as you can see, there are no managed types in this function
void floatTestCPP(float floatVal, double doubleVal, float *floatRet, double *doubleRet)
{
    float floatLocal = floatVal;
    double doubleLocal = doubleVal;
    int sizeOfFloatCPP = sizeof(float);
    int sizeOfDoubleCPP = sizeof(double);
    *floatRet = 42.42f;
    *doubleRet = 42.42;
    //do some dummy math to make the compiler happy (no warnings)
    floatLocal = (float)doubleLocal; //break point here
    sizeOfDoubleCPP = sizeOfFloatCPP;
}
Debugger in C - break point on last line of floatTest()
Debugger in C++/CLI - break point on the second to last line of floatTestCPP()
Consider that the debugger used for C++/CLI is itself not necessarily coded in C, C# or C++.
MS libraries support the "R" format: a string that can round-trip to an identical number. I suspect this, or a %g-style format, was used.
Without MS source code, the following is only a good supposition:
The debug output is enough to distinguish the double from other nearby doubles. So the code need not print "42.420000000000002"; "42.42" is sufficient - whatever format is used.
42.42 as an IEEE double is about 42.4200000000000017053025658242404460906982... and the debugger certainly need not print the exact value.
Potentially similar C code:
#include <math.h>
#include <stdio.h>

int main(void) {
    puts("12.34567890123456");
    double d = 42.42;
    printf("%.16g\n", nextafter(d, 0));
    printf("%.16g\n", d);
    printf("%.17g\n", d);
    printf("%.16g\n", nextafter(d, 2 * d));

    d = 1 / 3.0f;
    printf("%.9g\n", nextafterf(d, 0));
    printf("%.9g\n", d);
    printf("%.9g\n", nextafterf(d, 2 * d));

    d = 1 / 3.0f;
    printf("%.16g\n", nextafter(d, 0));
    printf("%.16g\n", d);
    printf("%.16g\n", nextafter(d, 2 * d));
}
output
12.34567890123456
42.41999999999999
42.42
42.420000000000002 // this level of precision not needed.
42.42000000000001
0.333333313
0.333333343
0.333333373
0.3333333432674407
0.3333333432674408
0.3333333432674409
For your code to convert a double to text with sufficient textual precision and back to double to "round-trip" the number, see Printf width specifier to maintain precision of floating-point value.
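As a rough, self-contained sketch of that round-trip idea (DBL_DECIMAL_DIG is C11 and is 17 for IEEE double; with older compilers you can use 17 directly):
#include <float.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    double d = 42.42;
    char buf[64];
    // 17 significant decimal digits are enough to round-trip any IEEE double.
    snprintf(buf, sizeof buf, "%.*g", DBL_DECIMAL_DIG, d);
    double back = strtod(buf, NULL);
    printf("%s round-trips: %s\n", buf, back == d ? "yes" : "no");
    return 0;
}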
Solution
Thanks to @Michael Veksler's answer, I was put on the right track to search for a solution. @Christoph, in this post, suggests trying different compiler flags to set the precision of the floating-point operations.
For me, the -mpc32 flag solved the problem.
I have to translate C++ code to C code as the new target won't have a C++ compiler.
I am running into a strange thing, where a mathematical equation gives different results when run in a C program compared to when run in a C++ program.
Equation:
float result = (float)(a+b-c+d+e);
The elements of the equation are all floats. I check the contents of the memory of each element by using
printf("a : 0x%02X%02X%02X%02X\n",((unsigned char*)&a)[0],((unsigned char*)&a)[1],((unsigned char*)&a)[2],((unsigned char*)&a)[3]);
Both in C and in C++, a, b, c, d and e are equal, but the results are different.
Sample of a calculation in C:
a : 0x1D9969BB
b : 0x6CEDC83E
c : 0xAC89452F
d : 0xD2DC92B3
e : 0x4FE9F23C
result : 0xCD48D63E
And a sample in C++:
a : 0x1D9969BB
b : 0x6CEDC83E
c : 0xAC89452F
d : 0xD2DC92B3
e : 0x4FE9F23C
result : 0xCC48D63E
When I separate the equation in smaller parts, as in r = a + b then r = r - c and so on, the results are equal.
I have a 64-bit Windows machine.
Can someone explain why this happens?
I am sorry for this noob question, I am just starting out.
EDIT
I use the latest version of MinGW with options
-O0 -g3 -Wall -c -fmessage-length=0
EDIT 2
Sorry for taking so long...
Here are the values corresponding to the above hex ones in C:
a : -0.003564424114301801
b : 0.392436385154724120
c : 0.000000000179659565
d : -0.000000068388217755
e : 0.029652265831828117
r : 0.418524175882339480
And here are for C++:
a : -0.003564424114301801
b : 0.392436385154724120
c : 0.000000000179659565
d : -0.000000068388217755
e : 0.029652265831828117
r : 0.418524146080017090
They are printed like printf("a : %.18f\n",a);
The values are not known at compile time; the equation is in a function that is called multiple times throughout the execution. The elements of the equation are computed inside the function.
Also, I observed a strange thing: I ran the exact same equation in a new "pure" project (for both C and C++), i.e. only the main itself. The values of the elements are the same as the ones above (in float). The result is r : 0xD148D63E for both, the same as in @geza's comment.
Introduction: Given that the question is not detailed enough, I am left to speculate that this is gcc's infamous bug 323. As the low bug ID suggests, this bug has been there forever. The bug report has existed since June 2000, currently has 94 (!) duplicates, and the last one was reported only half a year ago (on 2018-08-28). The bug affects only 32-bit executables on Intel computers (like cygwin). I assume that OP's code uses x87 floating-point instructions, which are the default for 32-bit executables, while SSE instructions are only optional. Since 64-bit executables are more prevalent than 32-bit ones, and no longer depend on x87 instructions, this bug has zero chance of ever being fixed.
Bug description: The x87 architecture has 80-bit floating-point registers. The float requires only 32 bits. The bug is that x87 floating-point operations are always done with 80-bit accuracy (subject to a hardware configuration flag). This extra accuracy makes precision very flaky, because it depends on when the registers are spilled (written) to memory.
If an 80-bit register is spilled into a 32-bit variable in memory, the extra precision is lost. This would be the correct behavior if it happened after each floating-point operation (since float is supposed to be 32 bits), but spilling to memory slows things down and no compiler writer wants the executable to run slow. So by default the values are not spilled to memory.
Now, sometimes the value is spilled to memory and sometimes it is not. It depends on the optimization level, on compiler heuristics, and on other seemingly random factors. Even with -O0 there can be slightly different strategies for dealing with spilling the x87 registers to memory, resulting in slightly different results. The spilling strategy is probably the difference between the C and C++ compilers that you are experiencing.
Workaround:
For ways to handle this, please read c handling of excess precision. Try running your compiler with -fexcess-precision=standard and compare it with -fexcess-precision=fast. You can also try playing with -mfpmath=sse.
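One quick diagnostic (a sketch, not a fix) is to print FLT_EVAL_METHOD from <float.h>, which tells you whether intermediates are kept in a wider type under your current flags:
#include <float.h>
#include <stdio.h>

int main(void)
{
    // 0: operations are evaluated in the nominal type (typical with SSE)
    // 2: float/double operations are evaluated in long double precision (typical with x87)
    // -1: indeterminable
    printf("FLT_EVAL_METHOD = %d\n", (int)FLT_EVAL_METHOD);
    return 0;
}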
NOTE: According to the C++ standard this is not really a bug. However, it is a bug according to the documentation of GCC, which claims to follow the IEEE-754 FP standard on Intel architectures (like it does on many other architectures). Obviously bug 323 violates the IEEE-754 standard.
NOTE 2: On some optimization levels -ffast-math is invoked, and all bets are off regarding extra precision and evaluation order.
EDIT I have simulated the described behavior on an Intel 64-bit system, and got the same results as the OP. Here is the code:
#include <stdint.h>
#include <stdio.h>
#include <string.h>

// support functions, defined below
float hex2float(uint32_t num);
void print(const char* label, float val);
float flush(float x);

int main()
{
    float a = hex2float(0x1D9969BB);
    float b = hex2float(0x6CEDC83E);
    float c = hex2float(0xAC89452F);
    float d = hex2float(0xD2DC92B3);
    float e = hex2float(0x4FE9F23C);
    float result = (float)((double)a+b-c+d+e);
    print("result", result);
    result = flush(flush(flush(flush(a+b)-c)+d)+e);
    print("result2", result);
}
The implementations of the support functions:
// Reverse the byte order (the question printed the bytes in memory order on a
// little-endian machine), then reinterpret the 32-bit pattern as a float.
float hex2float(uint32_t num)
{
    uint32_t rev = (num >> 24) | ((num >> 8) & 0xff00) | ((num << 8) & 0xff0000) | (num << 24);
    float f;
    memcpy(&f, &rev, 4);
    return f;
}

void print(const char* label, float val)
{
    printf("%10s (%13.10f) : 0x%02X%02X%02X%02X\n", label, val,
           ((unsigned char*)&val)[0], ((unsigned char*)&val)[1],
           ((unsigned char*)&val)[2], ((unsigned char*)&val)[3]);
}

// Force the value through a 32-bit float in memory, discarding any extra precision.
float flush(float x)
{
    volatile float buf = x;
    return buf;
}
After running this I have got exactly the same difference between the results:
result ( 0.4185241461) : 0xCC48D63E
result2 ( 0.4185241759) : 0xCD48D63E
For some reason this is different from the "pure" version described in the question. At one point I was also getting the same results as the "pure" version, but since then the question has changed. The original values in the original question were different. They were:
float a = hex2float(0x1D9969BB);
float b = hex2float(0x6CEDC83E);
float c = hex2float(0xD2DC92B3);
float d = hex2float(0xA61FD930);
float e = hex2float(0x4FE9F23C);
and with these values the resulting output is:
result ( 0.4185242951) : 0xD148D63E
result2 ( 0.4185242951) : 0xD148D63E
The C and C++ standards both permit floating-point expressions to be evaluated with more precision than the nominal type. Thus, a+b-c+d+e may be evaluated using double even though the types are float, and the compiler may optimize the expression in other ways. In particular, using exact mathematics is essentially using an infinite amount of precision, so the compiler is free to optimize or otherwise rearrange the expression based on mathematical properties rather than floating-point arithmetic properties.
It appears, for whatever reason, your compiler is choosing to use this liberty to evaluate the expression differently in different circumstances (which may be related to the language being compiled or due to other variations between your C and C++ code). One may be evaluating (((a+b)-c)+d)+e while the other does (((a+b)+d)+e)-c, or other variations.
In both languages, the compiler is required to “discard” the excess precision when a cast or assignment is performed. So you can compel a certain evaluation by inserting casts or assignments. Casts would make a mess of the expression, so assignments may be easier to read:
float t0 = a+b;
float t1 = t0-c;
float t2 = t1+d;
float result = t2+e;
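For comparison, the fully cast version that forces the same left-to-right order would look like this (only a sketch, and noticeably messier than the assignments above):
float result = (float)((float)((float)((float)(a + b) - c) + d) + e);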
I scanned my code with the PVS-Studio analyzer and I am confused about why I get this error and how to fix it.
V550 An odd precise comparison: * dest == value. It's probably better to use a comparison with defined precision: fabs(A - B) < Epsilon.
bool PipelineCache::SetShadowRegister(float* dest, uint32_t register_name) {
    float value = register_file_->values[register_name].f32;
    if (*dest == value) {
        return false;
    }
    *dest = value;
    return true;
}
I am guessing I should change the code to something like this:
bool PipelineCache::SetShadowRegister(float* dest, float epsilon, uint32_t register_name) {
    float value = register_file_->values[register_name].f32;
    return fabs(*dest - value) < epsilon;
}
For whoever's wondering, we're talking about this code.
I'll try to explain what the PVS-Studio developers were trying to achieve with this message. Citing their reference about V550:
Consider this sample:
double a = 0.5;
if (a == 0.5) //OK
    x++;

double b = sin(M_PI / 6.0);
if (b == 0.5) //ERROR
    x++;
The first comparison 'a == 0.5' is true. The second comparison 'b == 0.5' may be both true and false. The result of the 'b == 0.5' expression depends upon the processor, compiler's version and settings being used. For instance, the 'b' variable's value was 0.49999999999999994 when we used the Visual C++ 2010 compiler.
What they are trying to say is that comparing floating-point numbers is tricky. If you just assign your floating-point number, store it, and move it around memory to later compare it with itself in this function, feel free to dismiss this error message.
If you are looking to perform some bit-representation check (which I honestly think you are doing), see below.
If you are performing some massive calculations on floating point numbers, and you are a game developer, calculating the coordinates of the enemy battlecruisers - this warning is one of your best friends.
Anyway, let's return to your case. As usually happens with PVS-Studio, they did not see the exact error, but they pointed you in the right direction. You actually want to compare two float values, but you are doing it wrong. The thing is, if both float numbers you are comparing are NaN (even with the same bit representation), you'll get *dest != value, and your code will not work the way you want.
In this scenario, you had better reinterpret the memory under the float* as uint32_t (or whatever integer type has the same size as float on your target) and compare the integers instead.
For example, in your particular case, register_file_->values[register_name] is of type xe::gpu::RegisterFile::RegisterValue, which already supports uint32_t representation.
As a side effect, this will draw the warning away :)
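For what it's worth, such a bit-pattern comparison can be sketched like this (the helper name is made up; memcpy avoids type-punning through pointers):
#include <cstdint>
#include <cstring>

// Hypothetical helper: true when both floats have exactly the same 32-bit pattern,
// so two NaNs with identical payloads also compare equal.
static bool SameBits(float a, float b) {
    std::uint32_t ua, ub;
    std::memcpy(&ua, &a, sizeof ua);
    std::memcpy(&ub, &b, sizeof ub);
    return ua == ub;
}
In SetShadowRegister that would read if (SameBits(*dest, value)) return false;, with the rest of the function unchanged.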
I have seen this piece of code in a C++ project in a Windows environment, and I am just wondering what %12.10lg means. Does anyone have an idea?
class Point
{
    double x, y;
public:
    Point(double x_cord, double y_cord)
    {
        x = x_cord;
        y = y_cord;
    }
};

void foo() {
    Point ptStart(12.5, 33.5678);
    TRACE("%12.10lg, %12.10lg, %12.10lg\n", ptStart);
}
TRACE probably uses the normal printf format specifiers, which means that %12.10lg should print a double value with up to 10 significant digits, right-aligned in a field at least 12 characters wide; since %g drops trailing zeros, 15.893 would come out as "      15.893".
To display messages from your program in the debugger Output window, you can use the ATLTRACE macro or the MFC TRACE macro. Like assertions, the trace macros are active only in the Debug version of your program and disappear when compiled in the Release version. Like printf, the TRACE macro can handle a number of arguments.
https://msdn.microsoft.com/en-us/library/4wyz8787(v=vs.80).aspx
In your particular case, "%12.10lg" is a string similar to what you'd see in printf.
printf uses this format:
%[flags][width][.precision][length]specifier
In your case:
flags = unused
width = 12
precision = 10
length = l (ignored for floating-point conversions; the argument is a double)
specifier = g (shortest of the %e and %f representations)
When you print this, the format is applied to the argument(s) that follow (here, ptStart).
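A small sketch of what those width and precision fields do with plain printf (the outputs in the comments are what a typical C runtime produces):
#include <stdio.h>

int main(void)
{
    printf("[%12.10lg]\n", 12.5);              // [        12.5]  - padded to a width of 12
    printf("[%12.10lg]\n", 33.5678);           // [     33.5678]
    printf("[%12.10lg]\n", 3.14159265358979);  // [ 3.141592654]  - at most 10 significant digits
    return 0;
}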
using g++ (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
I have tried different typecasts for scaledvalue2, but not until I stored the multiplication in a double variable and then converted that to an int could I get the desired result, and I can't explain why.
I know double precision (0.6999999999999999555910790149937383830547332763671875) is an issue, but I don't understand why one way is OK and the other is not.
I would expect both to fail if precision were the problem.
I DON'T NEED a solution to fix it, just a WHY.
(The problem IS fixed.)
#include <cstdio>
#include <sstream>

int main()
{
    double value = 0.7;
    int scaleFactor = 1000;
    double doubleScaled = (double)scaleFactor * value;
    int scaledvalue1 = doubleScaled; // = 700
    int scaledvalue2 = (double)((double)(scaleFactor) * value); // = 699 ??
    int scaledvalue3 = (double)(1000.0 * 0.7); // = 700
    std::ostringstream oss;
    oss << scaledvalue2;
    printf("convert FloatValue[%f] multi with %i to get %f = %i or %i or %i[%s]\r\n",
           value, scaleFactor, doubleScaled, scaledvalue1, scaledvalue2, scaledvalue3, oss.str().c_str());
}
or in short:
value = 0.6999999999999999555910790149937383830547332763671875;
int scaledvalue_a = (double)(1000 * value); // = 699??
int scaledvalue_b = (double)(1000 * 0.6999999999999999555910790149937383830547332763671875); // = 700
// scaledvalue_a = 699
// scaledvalue_b = 700
I can't figure out what is going wrong here.
Output :
convert FloatValue[0.700000] multi with 1000 to get 700.000000 = 700 or 699 or 700[699]
vendor_id : GenuineIntel
cpu family : 6
model : 54
model name : Intel(R) Atom(TM) CPU N2600 @ 1.60GHz
This is going to be a bit hand-wavy; I was up too late last night watching the Cubs win the World Series, so don't insist on precision.
The rules for evaluating floating-point expressions are somewhat flexible, and compilers typically treat floating-point expressions even more flexibly than the rules formally allow. This makes evaluation of floating-point expressions faster, at the expense of making the results somewhat less predictable. Speed is important for floating-point calculations. Java initially made the mistake of imposing exact requirements on floating-point expressions and the numerics community screamed with pain. Java had to give in to the real world and relax those requirements.
double f();
double g();
double d = f() + g(); // 1
double dd1 = 1.6 * d; // 2
double dd2 = 1.6 * (f() + g()); // 3
On x86 hardware (i.e., just about every desktop system in existence), floating-point calculations are in fact done with 80 bits of precision (unless you set some switches that kill performance, as Java required), even though double and float are 64 bits and 32 bits, respectively. So for arithmetic operations the operands are converted up to 80 bits and the results are converted back down to 64 or 32 bits. That's slow, so the generated code typically delays doing conversions as long as possible, doing all of the calculation with 80-bit precision.
But C and C++ both require that when a value is stored into a floating-point variable, the conversion has to be done. So, formally, in line //1, the compiler must convert the sum back to 64 bits to store it into the variable d. Then the value of dd1, calculated in line //2, must be computed using the value that was stored into d, i.e., a 64-bit value, while the value of dd2, calculated in line //3, can be calculated using f() + g(), i.e., a full 80-bit value. Those extra bits can make a difference, and the value of dd1 might be different from the value of dd2.
And often the compiler will hang on to the 80-bit value of f() + g() and use that instead of the value stored in d when it calculates the value of dd1. That's a non-conforming optimization, but as far as I know, every compiler does that sort of thing by default. They all have command-line switches to enforce the strictly-required behavior, so if you want slower code you can get it. <g>
For serious number crunching, speed is critical, so this flexibility is welcome, and number-crunching code is carefully written to avoid sensitivity to this kind of subtle difference. People get PhDs for figuring out how to make floating-point code fast and effective, so don't feel bad that the results you see don't seem to make sense. They don't, but they're close enough that, handled carefully, they give correct results without a speed penalty.
Since the x86 floating-point unit performs its computations in an extended-precision floating-point type (80 bits wide), the result might easily depend on whether the intermediate values were forcefully converted to double (the 64-bit floating-point type). In that respect, in non-optimized code it is not unusual to see compilers treat memory writes to double variables literally, but ignore "unnecessary" casts to double applied to temporary intermediate values.
In your example, the first part involves saving the intermediate result in a double variable
double doubleScaled = (double)scaleFactor * value;
int scaledvalue1 = doubleScaled; // = 700
The compiler takes it literally and does indeed store the product in a double variable doubleScaled, which unavoidably requires converting the 80-bit product to double. Later that double value is read from memory again and then converted to int type.
The second part
int scaledvalue2 = (double)((double)(scaleFactor) * value); // = 699 ??
involves conversions that the compiler might see as unnecessary (and they indeed are unnecessary from the point of view of abstract C++ machine). The compiler ignores them, which means that the final int value is generated directly from the 80-bit product.
The presence of that intermediate conversion to double in the first variant (and its absence in the second one) is what causes that difference.
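If that explanation is right, forcing the intermediate store in the second variant should make it agree with the first. A self-contained sketch (volatile is used here only to stop the compiler from keeping the wider product in a register):
#include <cstdio>

int main()
{
    double value = 0.7;
    int scaleFactor = 1000;
    // A real 64-bit store rounds the (possibly 80-bit) product before the int conversion.
    volatile double forced = (double)(scaleFactor) * value;
    int scaledvalue2 = (int)forced;
    std::printf("%d\n", scaledvalue2); // expected to match scaledvalue1, i.e. 700
    return 0;
}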
I converted mindriot's example assembly code to Intel syntax to test with Visual Studio. I could only reproduce the error by setting the floating point control word to use extended precision.
The issue is that rounding is performed when converting from extended precision to double precision when storing a double, versus truncation is performed when converting from extended precision to integer when storing an integer.
The extended precision multiply produces a product of 699.999..., but the product is rounded to 700.000... during the conversion from extended to double precision when the product is stored into doubleScaled.
double doubleScaled = (double)scaleFactor * value;
Since doubleScaled == 700.000..., when truncated to integer, it's still 700:
int scaledvalue1 = doubleScaled; // = 700
The product 699.999... is truncated when it's converted into an integer:
int scaledvalue2 = (double)((double)(scaleFactor) * value); // = 699 ??
My guess here is that the compiler generated a compile-time constant of 700.000... rather than doing the multiply at run time.
int scaledvalue3 = (double)(1000.0 * 0.7); // = 700
This truncation issue can be avoided by using the round() function from the C standard library.
int scaledvalue2 = (int)round(scaleFactor * value); // should == 700
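The two conversions can also be seen in isolation, independent of the x87 details (a small sketch):
#include <math.h>
#include <stdio.h>

int main(void)
{
    double almost = 699.99999999999;                          // stands in for the 699.999... product
    printf("(int)almost        = %d\n", (int)almost);         // 699: conversion to int truncates
    printf("(int)round(almost) = %d\n", (int)round(almost));  // 700: round to nearest integer first
    return 0;
}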
Depending on the compiler and optimization flags, scaledvalue_a, involving a variable, may be evaluated at runtime using your processor's floating-point instructions, whereas scaledvalue_b, involving constants only, may be evaluated at compile time using a math library (e.g. gcc uses GMP, the GNU Multiple Precision Arithmetic Library, for this). The difference you are seeing seems to be the difference between the precision and rounding of the runtime vs compile-time evaluation of that expression.
Due to rounding errors, most floating-point numbers end up being slightly imprecise.
For the double-to-int conversion below, use the std::ceil() API:
int scaledvalue2 = (double)((double)(scaleFactor) * value); // = 699 ??
There is a piece of code that confuses me, which runs on Windows!
Here is the code:
#include <stdio.h>

#define point_float2uint(x) *((unsigned int *)&x)

float divide_1000(float y)
{
    float v = y / 1000.0f;
    return v;
}

float divide_1000(int y)
{
    float v = float(y) / 1000.0f;
    return v;
}

void float_test(void)
{
    int num[5] = {67975500, 67251500, 67540620, 69435500, 70171500};
    for (int i = 0; i < 5; ++i)
    {
        int a = num[i];
        float af_f = divide_1000(float(a));
        float af_i = divide_1000((a));
        printf("src num:%d, af_f:%f, %x, af_i:%f, %x\n", num[i], af_f, point_float2uint(af_f), af_i, point_float2uint(af_i));
    }
}
Here is the output, compiled by vs2005:
src num:67975500, af_f:67975.507813, 4784c3c1, af_i:67975.500000, 4784c3c0
src num:67251500, af_f:67251.507813, 478359c1, af_i:67251.500000, 478359c0
src num:67540620, af_f:67540.625000, 4783ea50, af_i:67540.617188, 4783ea4f
src num:69435500, af_f:69435.507813, 47879dc1, af_i:69435.500000, 47879dc0
src num:70171500, af_f:70171.507813, 47890dc1, af_i:70171.500000, 47890dc0
The question is: why do I get different results from the two "divide_1000" functions on Windows? This is not what I want!
And I find that not all integers produce different results, only some, like the ones in the code above.
Here is the output, compiled by gcc 4.4.5 on Debian:
src num:67975500, af_f:67975.507812, 4784c3c1, af_i:67975.507812, 4784c3c1
src num:67251500, af_f:67251.507812, 478359c1, af_i:67251.507812, 478359c1
src num:67540620, af_f:67540.625000, 4783ea50, af_i:67540.625000, 4783ea50
src num:69435500, af_f:69435.507812, 47879dc1, af_i:69435.507812, 47879dc1
src num:70171500, af_f:70171.507812, 47890dc1, af_i:70171.507812, 47890dc1
I get the same result using the two different "divide_1000" functions. That's what I want.
There are quite a few code generation settings involved here that affect the outcome. The difference that you report is observable in non-optimized code under default floating point model (i.e. "precise" model) when using the "classic" FPU instructions for floating-point computations.
The compiler translates the first call literally: the original integer value is first converted to float - a 4-byte floating-point value - and stored in memory (as the function argument). This conversion rounds the value to +6.7975504e+7, which is already not precise. Later that float value is read from memory inside the first function and used for further computations.
The second call passes an int value to the function, which is directly loaded into high-precision FPU register and used for further computations. Even though you specified an explicit conversion from int to float inside the second function, the compiler decided to ignore your request. This value is never literally converted to float, meaning that the aforementioned loss of precision never occurs.
That is what is causing the difference you observed.
If you rewrite your second function as
float divide_1000(int y)
{
    float fy = y;
    float v = fy / 1000.0f;
    return v;
}
i.e. add an additional step that saves the float value to a named location in memory, the compiler will perform that step in non-optimized code. This will cause the results to become identical.
Again, the above applies to the code compiled without optimizations, when the compiler normally attempts to translate all statements very closely (but not always exactly). In optimized code the compiler eliminates the "unnecessary" intermediate conversions to float and all "unnecessary" intermediate memory stores in both cases, producing identical results.
You might also want to experiment with other floating-point models (i.e. "strict" and "fast") to see how it affects the results. These floating-point models exist specifically to deal with issues like the one you observed.
If you change code generation settings of the compiler and make it use SSE instructions for floating-point arithmetic, the results might also change (in my experiment the difference disappears when SSE2 instruction set is used instead of FPU instructions).