Discrepancy between the values computed by Fortran and C++

I would have expected the numeric values computed by Fortran and C++ to agree much more closely. In practice, the calculated numbers start to diverge after surprisingly few decimal digits. I came across this problem while porting some legacy code from Fortran to C++. The original Fortran 77 code...
M = 2
DENOMINATOR = 0.7714+0.2286*(ROUND**3.82)
... outputs 0.842201471328735, while its C++ equivalent...
int m = 2;
int round = 1;
long double numerator = 5.0 / pow((m-1)+pow(1.3, m), 1.8);
long double denominator = 0.7714 + 0.2286 * pow(round, 3.82);
std::cout << std::setiosflags(std::ios::fixed) << std::setprecision(15)
<< numerator/denominator << std::endl;
... returns 0.842201286195064. That is, the computed values agree only up to the sixth decimal. Although not particularly a Fortran advocate, I feel inclined to consider its results the 'correct' ones, given its well-earned reputation as a number cruncher. However, I am intrigued by the cause of this difference between the computed values. Does anyone know what the reason for this discrepancy could be?

In Fortran, by default, floating point literals are single precision, whereas in C/C++ they are double precision.
Thus, in your Fortran code, the expression for NUMERATOR is evaluated in single precision; it is only converted to double precision when the final result is assigned to the NUMERATOR variable.
The same applies to the expression whose value is assigned to the DENOMINATOR variable. Single precision carries only about 7 significant decimal digits, which matches the point at which the two results diverge.
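The effect can be reproduced from the C++ side. The sketch below evaluates the question's formula twice: once with every literal and intermediate in float, and once in double. The function names are made up for illustration, and the single-precision version is only an assumption about how the Fortran compiler evaluated the original expression.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: the question's formula with all literals and
// intermediates in single precision, roughly what default-kind Fortran 77
// literals imply. The exact Fortran evaluation order is an assumption.
double ratio_single(int m, int round) {
    assert(round >= 0);  // a negative base with a fractional exponent yields NaN
    float inner = static_cast<float>(m - 1) + std::pow(1.3f, static_cast<float>(m));
    float numerator = 5.0f / std::pow(inner, 1.8f);
    float denominator = 0.7714f + 0.2286f * std::pow(static_cast<float>(round), 3.82f);
    return static_cast<double>(numerator / denominator);
}

// The same formula with double literals, as in the C++ port.
double ratio_double(int m, int round) {
    assert(round >= 0);
    double numerator = 5.0 / std::pow((m - 1) + std::pow(1.3, m), 1.8);
    double denominator = 0.7714 + 0.2286 * std::pow(round, 3.82);
    return numerator / denominator;
}
```

Calling both with m = 2 and round = 1 and printing the results with std::setprecision(15) should show agreement only in the leading digits, which is the pattern described in the question.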


pow() function gives an error [duplicate]

Recently I wrote a block of code:
const int sections = 10;
for(int t= 0; t < 5; t++){
    int i = pow(sections, 5- t -1);
    cout << i << endl;
}
And the result is wrong:
If I use just this code:
for(int t = 0; t < 5; t++){
    cout << pow(sections,5-t-1) << endl;
}
The problem doesn't occur anymore:
Can anyone give me an explanation? Thank you very much!
Due to the representation of floating point values, pow(10.0, 5) could be 99999.9999999 or something like this. When you assign that to an integer, it gets truncated.
EDIT: In the case of cout << pow(10.0, 5); it looks like the output is rounded, but I don't have any supporting document right now confirming that.
EDIT 2: The comment made by BoBTFish and this question confirm that when pow(10.0, 5) is used directly in cout, the value is rounded.
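The rounding on output comes from the stream's default precision of 6 significant digits. A minimal sketch of the difference, using two hypothetical helpers (format_default and format_full are names invented here):

```cpp
#include <cassert>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Format a double the way `cout << value` does by default
// (6 significant digits).
std::string format_default(double value) {
    std::ostringstream out;
    out << value;
    return out.str();
}

// Format with max_digits10 significant digits (17 for an IEEE double),
// enough to expose the value that is actually stored.
std::string format_full(double value) {
    std::ostringstream out;
    out << std::setprecision(std::numeric_limits<double>::max_digits10) << value;
    return out.str();
}
```

With these, format_default(9999.999999999) produces "10000", while static_cast<int>(9999.999999999) is 9999. Printing with format_full before converting to int is a quick way to check whether pow returned an exact value.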
When used with fractional exponents, pow(x,y) is commonly evaluated as exp(log(x)*y); such a formula would be mathematically correct if evaluated with infinite precision, but may in practice result in rounding errors. As others have noted, a value of 9999.999999999 when cast to an integer will yield 9999. Some languages and libraries use such a formulation all the time when using an exponentiation operator with a floating-point exponent; others try to identify when the exponent is an integer and use iterated multiplication when appropriate. Looking up documentation for the pow function, it appears that it's supposed to work when x is negative and y has no fractional part (when x is negative and y is even, the result should be pow(-x,y); when y is odd, the result should be -pow(-x,y)). It would seem logical that, when y has no fractional part, a library which is going to go through the trouble of dealing with a negative x value should use iterated multiplication, but I don't know of any spec dictating that it must.
In any case, if you are trying to raise an integer to a power, it is almost certainly best to use integer maths for the computation or, if the integer to be raised is a constant or will always be small, simply use a lookup table (raising numbers from 0 to 15 by any power that would fit in a 64-bit integer would require only a 4,096-item table).
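For the constant-base case in the question (powers of 10), the lookup-table idea might look like the sketch below. The name pow10_exact and the choice of a 19-entry table (10^0 through 10^18, the powers that fit in a signed 64-bit integer) are assumptions made for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Exact powers of ten from 10^0 to 10^18, built once with integer
// multiplication only, so no floating-point rounding is involved.
std::int64_t pow10_exact(int exponent) {
    static const std::array<std::int64_t, 19> table = [] {
        std::array<std::int64_t, 19> t{};
        t[0] = 1;
        for (std::size_t i = 1; i < t.size(); ++i) {
            t[i] = t[i - 1] * 10;
        }
        return t;
    }();
    assert(exponent >= 0 && exponent <= 18);  // 10^19 overflows int64_t
    return table[static_cast<std::size_t>(exponent)];
}
```

In the question's loop, int i = pow10_exact(5 - t - 1); would replace the call to pow.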
From Here
Looking at the pow() function: double pow (double base, double exponent); we know the parameters and return value are all of type double. But the variables num, i and res in the code above are all of type int, and converting int to double or double to int may lose precision. For example (maybe not rigorous), the floating point unit (FPU) calculates pow(10, 4)=9999.99999999, and then int(9999.9999999)=9999 after the type conversion in C++.
How to solve it?
Change the code:
const int num = 10;
for(int i = 0; i < 5; ++i){
    double res = pow(num, i);
    cout << res << endl;
}
Alternatively, switch to a floating point unit (FPU) that evaluates double expressions more accurately. For example, use SSE on a Windows machine. In Code::Blocks 13.12, this can be done as follows: Settings -> Compiler settings -> GNU GCC Compiler -> Other options, add
-mfpmath=sse -msse3
What happens is that the pow function returns a double, so
when you do this
int i = pow(sections, 5- t -1);
the fractional part .99999 is cut off and you get 9999.
Printing the value directly or comparing it with 10000 is not a problem, because the value is rounded off in a sense.
If the code in your first example is the exact code you're running, then you have a buggy library. Regardless of whether you're picking up std::pow or C's pow which takes doubles, even if the double version is chosen, 10 is exactly representable as a double. As such the exponentiation is exactly representable as a double. No rounding or truncation or anything like that should occur.
With g++ 4.5 I couldn't reproduce your (strange) behavior even using -ffast-math and -O3.
Now what I suspect is happening is that sections is not being assigned the literal 10 directly but instead is being read or computed internally such that its value is something like 9.9999999999999, which when raised to the fourth power generates a number like 9999.9999999. This is then truncated to the integer 9999 which is displayed.
Depending on your needs you may want to round either the source number or the final number prior to assignment into an int. For example: int i = pow(sections, 5- t -1) + 0.5; // Add 0.5 and truncate to round to nearest.
There must be some broken pow function in the global namespace. Then std::pow is "automatically" used instead in your second example because of ADL.
Either that or t is actually a floating-point quantity in your first example, and you're running into rounding errors.
You're assigning the result to an int. That coerces it, truncating the number.
This should work fine:
for(int t= 0; t < 5; t++){
    double i = pow(sections, 5- t -1);
    cout << i << endl;
}
What happens is that your answer is actually 99.9999 and not exactly 100. This is because pow returns a double. So, you can fix this by using i = ceil(pow()). Note that ceil only helps when the error is on the low side; if pow returned 100.0000001, ceil would give 101.
Your code should be:
const int sections = 10;
for(int t= 0; t < 5; t++){
    int i = ceil(pow(sections, 5- t -1));
    cout << i << endl;
}

C++ int64 * double == off by one

Below is the code I've tested in a 64-bit environment and 32-bit. The result is off by one precisely each time. The expected result is: 1180000000 with the actual result being 1179999999. I'm not sure exactly why and I was hoping someone could educate me:
#include <stdint.h>
#include <iostream>
using namespace std;
int main() {
    double odds = 1.18;
    int64_t st = 1000000000;
    int64_t res = st * odds;
    cout << "result: " << res << endl;
    return 0;
}
I appreciate any feedback.
1.18, or 118 / 100, can't be represented exactly in binary; its binary expansion repeats forever, just as 1 / 3 does when written in decimal.
So let's go over a similar case in decimal and calculate (1 / 3) × 30000, which of course should be 10000:
odds = 1 / 3 and st = 30000
Since computers have only limited precision, we have to truncate this number to a limited number of decimals, let's say 6, so:
odds = 0.333333
0.333333 × 30000 = 9999.99. The cast (which in your program is implicit) will truncate this number to 9999.
There is no 100% reliable way to work around this. float and double simply have limited precision. Dealing with this is a hard problem.
Your program contains an implicit cast from double to an integer on the line int64_t res = st * odds;. Many compilers will warn you about this. It can be the source of bugs of the type you are describing. This cast, which can be explicitly written as (int64_t) some_double, rounds the number towards zero.
An alternative is rounding to the nearest integer with round(some_double). That will, in this case, give the expected result.
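A minimal sketch of the rounding approach, using std::llround from <cmath> so the result is already an integer type. The helper name scale_rounded is invented for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Multiply an integer amount by floating-point odds and round to the
// nearest integer instead of truncating toward zero.
std::int64_t scale_rounded(std::int64_t stake, double odds) {
    return static_cast<std::int64_t>(
        std::llround(static_cast<double>(stake) * odds));
}
```

For the values in the question, scale_rounded(1000000000, 1.18) returns 1180000000 whether the product is held in double or in a wider intermediate format, because the error is far smaller than 0.5.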
First of all - 1.18 is not exactly representable in double. Mathematically the result of:
double odds = 1.18;
is 1.17999999999999993782751062099 (according to an online calculator).
So, mathematically, odds * st is 1179999999.99999993782751062099.
But in C++, odds * st is an expression with type double. So your compiler has two options for implementing this:
Do the computation in double precision
Do the computation in higher precision and then round the result to double
Apparently, doing the computation in double precision in IEEE754 results in exactly 1180000000.
However, doing it in long double precision produces something more like 1179999999.99999993782751062099
Converting this to double is now implementation-defined as to whether it selects the next-highest or next-lowest value, but I believe it is typical for the next-lowest to be selected.
Then converting this next-lowest result to integer will truncate the fractional part.
There is an interesting blog post here where the author describes the behaviour of GCC:
It uses long double intermediate precision for x86 code (due to the x87 FPUs long double registers)
It uses actual types for x64 code (because the SSE/SSE2 FPU supports this more naturally)
According to the C++11 standard you should be able to inspect which intermediate precision is being used by outputting FLT_EVAL_METHOD from <cfloat>. 0 would mean actual values, 2 would mean long double is being used.
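A small sketch of that check follows. The helper name describe_eval_method is made up here; the macro itself and the meanings of the values 0, 1 and 2 come from the C and C++ standards.

```cpp
#include <cassert>
#include <cfloat>
#include <string>

// Describe the intermediate precision the compiler reports through
// FLT_EVAL_METHOD.
std::string describe_eval_method() {
    switch (FLT_EVAL_METHOD) {
        case 0:
            return "operations are evaluated in the range and precision of their own type";
        case 1:
            return "float and double operations are evaluated as double";
        case 2:
            return "all operations are evaluated as long double";
        default:
            return "indeterminate or implementation-defined";
    }
}
```

Printing the returned string once for a 32-bit x87 build and once for a 64-bit SSE build should show which of the two cases described above applies on a given toolchain.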

C++ Modulus returning wrong answer

Here is my code :
#include <iostream>
#include <cmath>
using namespace std;
int main()
int n, i, num, m, k = 0;
cout << "Enter a number :\n";
cin >> num;
n = log10(num);
while (n > 0) {
i = pow(10, n);
m = num / i;
k = k + pow(m, 3);
num = num % i;
cout << m << endl;
cout << num << endl;
k = k + pow(num, 3);
return 0;
When I input 111 it gives me this
I am using codeblocks. I don't know what is wrong.
Whenever I use pow expecting an integer result, I add .5 so I use (int)(pow(10,m)+.5) instead of letting the compiler automatically convert pow(10,m) to an int.
I have read many places telling me others have done exhaustive tests of some of the situations in which I add that .5 and found zero cases where it makes a difference. But accurately identifying the conditions in which it isn't needed can be quite hard. Using it when it isn't needed does no real harm.
If it makes a difference, it is a difference you want. If it doesn't make a difference, it had a tiny cost.
In the posted code, I would adjust every call to pow that way, not just the one I used as an example.
There is no equally easy fix for your use of log10, but it may be subject to the same problem. Since you expect a non integer answer and want that non integer answer truncated down to an integer, adding .5 would be very wrong. So you may need to find some more complicated work around for the fundamental problem of working with floating point. I'm not certain, but assuming 32-bit integers, I think adding 1e-10 to the result of log10 before converting to int is both never enough to change log10(10^n-1) into log10(10^n) but always enough to correct the error that might have done the reverse.
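One way to sidestep log10 entirely is to count digits with integer division. This is a sketch of an alternative, not the fix proposed above; the name count_digits is invented for illustration.

```cpp
#include <cassert>

// Number of decimal digits in a non-negative int, computed with integer
// arithmetic only.
int count_digits(int value) {
    assert(value >= 0);
    int digits = 1;
    while (value >= 10) {
        value /= 10;
        ++digits;
    }
    return digits;
}
```

For positive num, count_digits(num) - 1 is the value the posted code intends to get from n = log10(num), without any dependence on how log10 rounds.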
pow does floating-point exponentiation.
Floating point functions and operations are inexact. You cannot rely on them to give you the exact value that they would appear to compute, unless you are an expert on the fine details of IEEE floating point representations and the guarantees given by your library functions.
(Furthermore, floating-point numbers might even be incapable of representing the integers you want exactly.)
This is particularly problematic when you convert the result to an integer, because the result is truncated toward zero: int x = 0.999999; sets x == 0, not x == 1. Even the tiniest error in the wrong direction completely spoils the result.
You could round to the nearest integer, but that has problems too; e.g. with sufficiently large numbers, your floating point numbers might not have enough precision to be near the result you want. Or if you do enough operations (or unstable operations) with the floating point numbers, the errors can accumulate to the point you get the wrong nearest integer.
If you want to do exact, integer arithmetic, then you should use functions that do so. e.g. write your own ipow function that computes integer exponentiation without any floating-point operations at all.
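A minimal sketch of such an ipow, using exponentiation by squaring. It assumes the result fits in a 64-bit signed integer and performs no overflow checking.

```cpp
#include <cassert>
#include <cstdint>

// Integer exponentiation by squaring, with no floating-point operations.
// Assumption: the caller guarantees the result fits in int64_t.
std::int64_t ipow(std::int64_t base, unsigned exponent) {
    std::int64_t result = 1;
    while (exponent > 0) {
        if (exponent & 1u) {
            result *= base;
        }
        exponent >>= 1;
        if (exponent > 0) {
            base *= base;  // skip the last squaring, which could overflow needlessly
        }
    }
    return result;
}
```

In the posted program, i = ipow(10, n); and k = k + ipow(m, 3); would replace the two kinds of pow calls, so every intermediate value stays an exact integer.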

Weird Rounding Occurs in C++ Function

I am writing a function in c++ that is supposed to find the largest single digit in the number passed (inputValue). For example, the answer for .345 is 5. However, after a while, the program is changing the inputValue to something along the lines of .3449 (and the largest digit is then set to 9). I have no idea why this is happening. Any help to resolve this problem would be greatly appreciated.
This is the function in my .hpp file
void LargeInput(const double inputValue)
//Function to find the largest value of the input
{
    int tempMax = 0,//Value that the temporary max number is in loop
        digit = 0,//Value of numbers after the decimal place
        test = 0,
        powerOten = 10;//Number multiplied by so that the next digit can be checked
    double number = inputValue;//A variable that can be changed in the function
    cout << "The number is still " << number << endl;
    for (int k = 1; k <= 6; k++)
    {
        test = (number*powerOten);
        cout << "test: " << test << endl;
        digit = test % 10;
        cout << (static_cast<int>(number*powerOten)) << endl;
        if (tempMax < digit)
            tempMax = digit;
        powerOten *= 10;
    }
}
Most real numbers cannot be represented exactly as doubles in a computer; they have to be approximated. If you change your function to work on longs or ints, there won't be any inaccuracies. That seems natural enough for the context of your question: you're just looking at the digits and not the number, so .345 can be 345 and give the same result.
Try this:
int get_largest_digit(int n) {
    int largest = 0;
    while (n > 0) {
        int x = n % 10;
        if (x > largest) largest = x;
        n /= 10;
    }
    return largest;
}
This is because the fractional part of a binary floating-point number is a sum of terms of the form 1/2^n. As a result you can get values very close to what you want, but you can never represent values like 1/3 exactly.
It's common to use integers instead, with a fixed scale (like 1000 = 1), so if you had the number 1333 you would do printf("%d.%03d", 1333/1000, 1333 % 1000) to print out 1.333. The zero padding in %03d matters: without it, 1005 would print as 1.5.
By the way, the first sentence is a simplification of how floating point numbers are actually represented. For more information, check out: http://en.wikipedia.org/wiki/Floating_point#Representable_numbers.2C_conversion_and_rounding
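The scaled-integer idea can be wrapped in a small helper. This is a sketch under the assumption of a fixed scale of 1000 and non-negative values; format_thousandths is a name invented for this example.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Format a non-negative value stored in thousandths (1000 means 1.000).
// The fractional part is zero-padded so that 1005 prints as 1.005.
std::string format_thousandths(long scaled) {
    assert(scaled >= 0);
    char buffer[32];
    std::snprintf(buffer, sizeof buffer, "%ld.%03ld", scaled / 1000, scaled % 1000);
    return buffer;
}
```

For example, format_thousandths(345) yields "0.345", and the digits can then be inspected with plain integer arithmetic on 345.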
This is how floating point numbers work, unfortunately. The core of the problem is that there are infinitely many real numbers. More specifically, there are infinitely many values between 0.1 and 0.2, and infinitely many between 0.01 and 0.02. Computers, however, have a finite number of bits to represent a floating point number (64 bits for a double precision number). Therefore, most real numbers have to be approximated. After any floating point operation, the processor has to round the result to a value it can represent in 64 bits.
Another property of floating point numbers is that as numbers get bigger, their absolute precision gets worse. This is because the same 64 bits have to be able to represent very big numbers (1,000,000,000) and very small numbers (0.000,000,000,001). Therefore, the absolute rounding error gets larger when working with bigger numbers.
The other issue here is that you are converting from floating point to integer. The conversion truncates rather than rounds, so it introduces even more error. It appears that (0.345 * 10000) evaluates to a value just under 3450, which the conversion truncates to 3449.
I suggest you don't convert your numbers to integers. Write your program in terms of floating point numbers. You can't use the modulus (%) operator on floating point numbers to get a value for digit. Instead use the fmod function from the C math library (<cmath>).
As other answers have indicated, binary floating-point is incapable of representing most decimal numbers exactly. Therefore, you must reconsider your problem statement. Some alternatives are:
The number is passed as a double (specifically, a 64-bit IEEE-754 binary floating-point value), and you wish to find the largest digit in the decimal representation of the exact value passed. In this case, the solution suggested by user millimoose will work (provided the asprintf or snprintf function used is of good quality, so that it does not incur rounding errors that prevent it from producing correctly rounded output).
The number is passed as a double but is intended to represent a number that is exactly representable as a decimal numeral with a known number of digits. In this case, the solution suggested by user millimoose again works, with the format specification altered to convert the double to decimal with the desired number of digits (e.g., instead of “%.64f”, you could use “%.6f”).
The function is changed to pass the number in another way, such as with decimal floating-point, as a scaled integer, or as a string containing a decimal numeral.
Once you have clarified the problem statement, it may be interesting to consider how to solve it with floating-point arithmetic, rather than calling library functions for formatted output. This is likely to have pedagogical value (and incidentally might produce a solution that is computationally more efficient than calling a library function).
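The answer by user millimoose referred to above is not reproduced in this thread. As an assumption about what a formatted-output solution could look like, the sketch below prints the value with a chosen number of decimals via snprintf and scans the resulting digits. The name largest_digit and its parameters are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Largest decimal digit of `value` after it has been rounded to `decimals`
// places by snprintf. Relies on the C library producing correctly rounded
// output.
int largest_digit(double value, int decimals) {
    assert(decimals >= 0 && decimals <= 64);
    char buffer[512];  // large enough for any finite double plus 64 decimals
    int written = std::snprintf(buffer, sizeof buffer, "%.*f", decimals, value);
    assert(written > 0 && static_cast<std::size_t>(written) < sizeof buffer);
    (void)written;  // avoid an unused-variable warning when NDEBUG is defined
    int largest = 0;
    for (const char* p = buffer; *p != '\0'; ++p) {
        if (*p >= '0' && *p <= '9' && *p - '0' > largest) {
            largest = *p - '0';
        }
    }
    return largest;
}
```

With 6 decimals, 0.345 is formatted as 0.345000 and the function returns 5. Asking for many more decimals exposes the digits of the stored binary value instead, which is the first alternative in the list above.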
