Casting float to int inconsistent across MinGW and Clang - C++

Using C++, I'm trying to cast a float value to an int with this code:
#include <iostream>

int main() {
    float NbrToCast = 1.8f;
    int TmpNbr = NbrToCast * 10;
    std::cout << TmpNbr << "\n";
}
I understand that the value 1.8 cannot be precisely represented as a float and is actually stored as 1.79999995.
Thus, I would expect that multiplying this value by ten would result in 17.9999995, and that casting it to an int would give 17.
When compiling and running this code with MinGW (v4.9.2, 32-bit) on Windows 7, I get the expected result (17).
When compiling and running this code with Clang (v600.0.57) on my Mac (OS X 10.11), I get 18 as a result, which is not what I was expecting but which seems more correct mathematically!
Why do I get this difference?
Is there a way to get consistent behavior regardless of the OS or the compiler?

Like Yuushi said in the comments, the rounding rules may differ for each compiler. Having a portable solution on such a topic probably means you need to write your own rounding method.
So in your case you probably need to check the first digit of the fractional part and decide whether to increment the integer part. Let's say something like:
#include <cmath>    // std::modf
#include <iostream>

int RoundingFloatToInt(const float &val)
{
    float intPart, fractPart;
    fractPart = std::modf(val, &intPart);
    int result = intPart;
    if (fractPart > 0.5)
    {
        result++;
    }
    return result;
}

int main() {
    float NbrToCast = 1.8f;
    float TmpNbr = NbrToCast * 10;
    std::cout << RoundingFloatToInt(TmpNbr) << "\n";
}
(untested, but you get the idea)
If you need performance this is probably not great, but I think it should be portable.

Related

compilers processing maths differently?

I wrote some code to find the derivative of a function at a given point. The code reads:
#include "stdafx.h"
#include <iostream>
using namespace std;

double function(double x) {
    return (3 * x * x);
}

int main() {
    double x, y, dy, dx;
    cin >> x;
    y = function(x);
    dx = 0.00000001;
    dy = function(x + dx) - y;
    cout << "Derivative of function at x = " << x << " is " << (double)dy / dx;
    cin >> x;
}
Now my college uses Turbo C++ as its IDE and compiler, while at home I have Visual Studio (because TC++ looks very bad on a 900p screen, but jokes apart). When I tried a similar program on the college PCs, the result was quite messed up and much less accurate than what I get at home. For example:
x = 3
#College result = 18.something
#Home result = 18 (precise without a decimal point)
x = 1
#College result = 6.000.....something
#Home result = 6 (precise without a decimal point)
The very big question:
Why are different compilers giving different results?
I’m 90% sure the result is the same in both cases, and the only reason you see a difference is different output formatting.
For 64-bit IEEE double math, the precise results of those computations are probably 17.9999997129698385833762586116790771484375 and 6.0000000079440951594733633100986480712890625, respectively.
If you want to verify that hypothesis, you can print your double values this way:
#include <cinttypes> // uint64_t, PRIx64
#include <cstdio>

void printDoubleAsHex( double val )
{
    const uint64_t* p = (const uint64_t*)( &val );
    printf( "%" PRIx64 "\n", *p );
}
And verify you have same output in both compilers.
However, there’s also a 10% chance that your two compilers indeed compiled your code in ways that produce different results. That’s not uncommon; it can even happen with the same compiler under different settings/flags/options.
The most likely reason is different instruction sets. By default, many modern compilers generate SSE instructions for code like yours, while older ones produce legacy x87 code (x87 operates on 80-bit floating-point values on its register stack, SSE on 32- or 64-bit values in vector registers, hence the difference in precision). Another reason is different rounding modes. Yet another is compiler-specific optimizations, such as the /fp switch in Visual C++.

Size of float and double

Calculating the dot product of two vectors (of float values) gives different results on different machines:
6.102435302 (Win7 x64, compiler VS12 version 17.00.50727.1)
6.140244007 (Win7 x64, MinGW, gcc version 5.3.0)
The code is:
#include <iostream>
#include <iterator>
#include <fstream>
#include <vector>
#include <iomanip>
#include <algorithm>

int main(int argc, char** argv){
    std::ifstream is("test.txt");
    std::istream_iterator<float> start(is), end;
    std::vector<float> numbers(start, end);
    std::cout << "Read " << numbers.size() << " numbers" << std::endl;

    float product = 0;
    for (std::size_t i = 0; i < numbers.size(); i++)
        product += numbers[i] * numbers[i]; // accumulate the squares
    std::cout << std::setprecision(10) << product << std::endl;
    std::cin.get();
}
test.txt is:
-0.082833
0.151422
-0.088526
-0.538506
0.646273
0.266993
0.200206
-0.149989
0.141407
0.158835
-0.119255
-0.039122
-0.045419
0.141848
-0.218912
-0.264521
0.032238
-0.055877
0.100393
-0.097075
-0.006268
-0.070172
-0.275793
0.103654
-0.075405
-0.117017
0.029951
-0.094158
-0.168427
0.381314
0.144073
-0.100971
-0.078645
0.013768
0.144876
0.005855
-0.018223
-0.090576
-0.071564
-0.029456
-0.098014
-0.149181
0.200667
-0.189492
0.264529
-0.061738
-0.097826
0.138872
-0.241878
0.019428
-0.087634
-0.058300
-0.009269
0.039241
-0.066350
0.059845
-0.048516
-0.070653
-0.116227
0.037203
-0.037091
-0.097324
0.043834
-0.340037
0.133938
0.087197
0.213261
-0.170708
-0.151203
0.052959
0.027145
-0.142675
-0.209020
0.001813
-0.022321
0.190862
-0.015501
-0.228589
-0.038538
-0.038480
-0.194482
0.087518
-0.257362
0.160805
-0.114158
0.176832
0.219573
-0.333160
-0.068385
-0.143289
-0.228401
0.214679
0.277186
-0.130965
0.142526
-0.166073
-0.035309
0.001260
-0.064977
0.020747
0.014043
-0.133625
-0.156975
-0.043092
0.154749
-0.181473
-0.288339
-0.144132
-0.004081
-0.071694
-0.094631
0.483994
-0.260140
0.020749
0.031850
0.041064
0.250101
-0.192338
-0.222687
0.114226
-0.227428
0.005388
-0.163509
-0.135427
-0.206788
-0.021093
0.279840
-0.055362
-0.016305
-0.279524
0.277402
0.198076
0.103796
-0.272994
0.306518
-0.024435
0.149532
-0.165079
-0.394348
-0.141590
-0.188541
0.002890
0.064264
-0.045430
-0.026021
0.096325
0.033765
0.111890
-0.012204
0.130457
-0.106022
-0.180052
-0.447620
0.051825
0.089245
-0.265819
-0.087720
0.180074
-0.259521
-0.356145
0.162247
0.282323
-0.096935
-0.040101
-0.214359
0.357032
0.195393
0.150603
-0.120796
0.204032
0.130334
0.115753
-0.123727
-0.107526
0.196002
-0.397541
0.320854
0.013272
-0.058865
0.018108
0.023616
-0.053654
-0.223593
-0.310052
0.109229
-0.107124
0.074454
-0.021471
-0.033081
0.108072
-0.067013
-0.084968
-0.171947
0.308421
-0.204827
-0.060015
0.092264
0.115863
0.131043
0.041844
I suppose it somehow depends on the sizes of float and double, which may not be the same on these machines. Is it possible to get the same output on both computers?
I have no access to the first machine (with the first result 6.102435302), but I can reproduce the same result with the following python code (with numpy):
test = np.loadtxt(test_file, dtype=np.float32)
result = test.dot(test)
The difference is too large to be explained by using float instead of double. Look for actual bugs in your code. Alternatively, your calculation may be highly unstable, in which case you really need to examine what's going on and can't trust any numbers until you understand it.
Getting the same output for both compilers is easy - just set the result to zero. But what you want is to get the correct result. You have one result that is badly wrong, and one that cannot be trusted, and you don't know which one is which. Making the results the same would only cover this up but not solve any problem.
It looks likely that VS12 auto-vectorized the loop (and that you then mistyped the result).
If you run the loop vectorised, like so:
#include <immintrin.h> // _mm_dp_ps, _mm_store_ss

float product = 0;
for (int i = 0; i < numbers.size(); i += 4)
{
    __m128 val = *(__m128*)(&numbers[i]);
    auto res = _mm_dp_ps(val, val, 255);
    float result;
    _mm_store_ss(&result, res);
    product += result;
}
Then the result you get out is:
6.14024353
This matches your first result of 6.102435302 if you dropped the "4" while transcribing.
At least that's the best explanation I can come up with. I've already spent way too long on this question :-)

How does the cout statement affect the O/P of the code written?

#include <iostream>
#include <iomanip>
#include <math.h>
using namespace std;

int main() {
    int t;
    double n;
    cin >> t;
    while (t--)
    {
        cin >> n;
        double x;
        for (int i = 1; i <= 10000; i++)
        {
            x = n * i;
            if (x == ceilf(x))
            {
                cout << i << endl;
                break;
            }
        }
    }
    return 0;
}
For I/P:
3
5
2.98
3.16
O/P:
1
If my code is:
#include <iostream>
#include <iomanip>
#include <math.h>
using namespace std;

int main() {
    int t;
    double n;
    cin >> t;
    while (t--)
    {
        cin >> n;
        double x;
        for (int i = 1; i <= 10000; i++)
        {
            x = n * i;
            cout << ""; // only this statement is added
            if (x == ceilf(x))
            {
                cout << i << endl;
                break;
            }
        }
    }
    return 0;
}
For the same input O/P is:
1
50
25
The only extra line added in 2nd code is: cout<<"";
Can anyone please help me find why there is such a difference in output just because of the cout statement added in the 2nd code?
Well, this is a veritable Heisenbug. I've tried to strip your code down to a minimal reproducing example, and ended up with this (http://ideone.com/mFgs0S):
#include <iostream>
#include <math.h>
using namespace std;

int main()
{
    float n;
    cin >> n;   // this input is needed to reproduce, but the value doesn't matter
    n = 2.98;   // overwrite the input value
    cout << ""; // comment this out => y = z = 149
    float x = n * 50;   // 149
    float y = ceilf(x); // 150
    cout << ""; // comment this out => y = z = 150
    float z = ceilf(x); // 149
    cout << "x:" << x << " y:" << y << " z:" << z << endl;
}
The behaviour of ceilf appears to depend on the particular sequence of iostream operations that occur around it. Unfortunately I don't have the means to debug in any more detail at the moment, but maybe this will help someone else to figure out what's going on. Regardless, it seems almost certain that it's a bug in gcc-4.9.2 and gcc-5.1. (You can check on ideone that you don't get this behaviour in gcc-4.3.2.)
You're probably getting an issue with floating point representations - which is to say that computers cannot perfectly represent all fractions. So while you see 50, the result is probably something closer to 50.00000000001. This is a pretty common problem you'll run across when dealing with doubles and floats.
A common way to deal with it is to define a very small constant (in mathematical terms this is Epsilon, a number which is simply "small enough")
const double EPSILON = 0.000000001;
And then your comparison will change from
if (x==ceilf(x))
to something like
double difference = fabs(x - ceilf(x));
if (difference < EPSILON)
This will smooth out those tiny inaccuracies in your doubles.
"Comparing for equality
Floating point math is not exact. Simple values like 0.2 cannot be precisely represented using binary floating point numbers, and the limited precision of floating point numbers means that slight changes in the order of operations can change the result. Different compilers and CPU architectures store temporary results at different precisions, so results will differ depending on the details of your environment. If you do a calculation and then compare the results against some expected value it is highly unlikely that you will get exactly the result you intended.
In other words, if you do a calculation and then do this comparison:
if (result == expectedResult)
then it is unlikely that the comparison will be true. If the comparison is true then it is probably unstable – tiny changes in the input values, compiler, or CPU may change the result and make the comparison be false."
From http://www.cygnus-software.com/papers/comparingfloats/Comparing%20floating%20point%20numbers.htm
Hope this answers your question.
Also, you had a problem with
if(x==ceilf(x))
ceilf() returns a float value, while you declared x as a double.
See the usual problems with floating-point comparison for why that won't work.
Change x to float and the program runs fine.
I did a plain test on my laptop and on several online compilers.
g++ (4.9.2-10) gave the desired output (3 values), as did the online compiler at geeksforgeeks.org. However, ideone and codechef did not give the right output.
All I can infer is that the online compilers that label their compiler "C++ (gcc)" give the wrong output, while geeksforgeeks.org, which labels its compiler "C++", runs perfectly, as does g++ (tested on Linux).
So we could arrive at the hypothesis that they use gcc to compile C++ code, as the method suggested at this link. :)

Brute-force equation solving

I'm writing a program that uses brute-force to solve an equation. Unfortunately, I seem to have an error in my code somewhere, as my program stops at search = 0.19999. Here is the code:
#include <iostream>
#include <cmath>
#include <vector>
#define min -4.0
#define max 6.5
using namespace std;

double fx(double x){
    long double result = cos(2*x) - 0.4*x;
    double scale = 0.00001;
    double value = (int)(result / scale) * scale;
    return value;
}

int sign(double a){
    if (a < 0) return -1;
    if (a == 0) return 0;
    else return 1;
}

int main(){
    vector<double> results;
    double step, interval, start, end, search;
    interval = (fabs(min) + fabs(max)) / 50;
    step = 0.00001;
    start = min;
    end = min + interval;
    search = start;
    while (end <= max){
        if (sign(start) != sign(end)){
            search = start;
            while (search < end){
                if (fx(search) == 0) results.push_back(search);
                search = search + step;
            }
        }
        start = end;
        end = start + interval;
    }
    for (int i = 0; i < results.size(); i++){
        cout << results[i] << endl;
    }
}
I've been looking at it for quite some time now and I still can't find the error in the code.
The program should check if there is a root in each given interval and, if yes, check every possibility in that interval. If it finds a root, it should push it into the results vector.
I know you already found the answer but I just spotted a problem while trying to find the bug. On line 37 you make the following comparison:
if(fx(search) == 0)
Since your fx function returns a double, it's generally not advisable to test with the equality operator when dealing with double-precision floating-point numbers. Your result will probably never be exactly 0, so this test may never return true. I think you should compare against a maximum error margin instead, like this:
double maximum_error = 0.005;
if (fabs(fx(search)) < maximum_error)
I think that would do the trick in your case. You may find more information on this link
Even if it's working right now, small changes in your input numbers, CPU architecture, or even compiler flags may break your program. It's dangerous to compare doubles for equality in C++ like that, even though it's legal to do so.
I've just made a run through the code again and found the error.
if(sign(start) != sign(end))
was the culprit. There is a root in an interval if the values of f(x) at start and end have different signs; instead, I wrote that there is a root if the signs of start and end themselves differ. Sorry for the fuss.

Exact binary representation of a double [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Float to binary in C++
I have a very small double var, and when I print it I get -0. (using C++).
Now in order to get better precision I tried using
cout.precision(18); // I think 18 is the max precision I can get
cout.setf(ios::fixed, ios::floatfield);
cout << var; // var is a double
but it just writes -0.00000000000...
I want to see the exact binary representation of the var.
In other words I want to see what binary number is written in the stack memory/register for this var.
union myUnion {
    double dValue;
    uint64_t iValue;
};

myUnion myValue;
myValue.dValue = 123.456;
cout << myValue.iValue;
Update:
The version above will work for most purposes, but it assumes 64 bit doubles. This version makes no assumptions and generates a binary representation:
#include <cstring> // memcpy

double someDouble = 123.456;
unsigned char rawBytes[sizeof(double)];
memcpy(rawBytes, &someDouble, sizeof(double));

// The C++ standard does not guarantee 8-bit bytes, so derive the
// high-bit mask instead of hard-coding 0x80.
unsigned char startMask = 1;
while (0 != static_cast<unsigned char>(startMask << 1)) {
    startMask <<= 1;
}

bool hasLeadBit = false; // set this to true if you want to see leading zeros
size_t byteIndex;
for (byteIndex = 0; byteIndex < sizeof(double); ++byteIndex) {
    unsigned char bitMask = startMask;
    while (0 != bitMask) {
        if (0 != (bitMask & rawBytes[byteIndex])) {
            std::cout << "1";
            hasLeadBit = true;
        } else if (hasLeadBit) {
            std::cout << "0";
        }
        bitMask >>= 1;
    }
}
if (!hasLeadBit) {
    std::cout << "0";
}
This way is guaranteed to work by the standard:
double d = -0.0;
uint64_t u;
memcpy(&u, &d, sizeof(d));
std::cout << std::hex << u;
Try:
printf("0x%08x\n", *(unsigned int*)&myFloat);
This displays a 32-bit variable in hex (passing the float itself would promote it to double in the varargs call, so take its bits through a pointer instead). I've never tried this technique on a 64-bit variable, but I think it's:
printf("%016llx\n", *(unsigned long long*)&myDouble);
EDIT: tested the 64-bit version and it definitely works on Win32 (I seem to recall the need for uppercase LL on GCC.. maybe)
EDIT2: if you really want binary, you are best off using one of the other answers to get a uint64_t version of your double, and then looping:
for (int i = 63; i >= 0; i--)
{
    printf("%d", (myUint64 >> i) & 1);
}