I'm trying to check if a double variable p is approximately equal to an integer. At some point in my code I have
double ip;
cout << setprecision(15) << abs(p) << " " << modf(abs(p), &ip) << endl;
And for a given run I get the printout
1 1
This seems to say that the fractional part of 1 is 1, am I missing something here or could there be some roundoff problem etc?
Note: I'm not including the whole code since the origin of p is complicated and I'm just asking if this is a familiar issue
"could there be some roundoff problem etc?"
There certainly could. If the value is very slightly less than 1, then both its value and its fractional part could be rounded to 1 when displayed.
"the origin of p is complicated"
Then it's very likely not to be an exact round number.
You are testing a value very close to 1, so a precision of 15 digits is not enough to describe it unambiguously.
This code shows your problem clearly:
#include <iostream>
#include <iomanip>
#include <cmath>
#include <limits>
using namespace std;
int main() {
double ip, d = nextafter(1., .0); // Get a double just smaller than 1
const auto mp = std::numeric_limits<double>::max_digits10;
cout << 15 << ": " << setprecision(15)
<< abs(d) << " " << modf(abs(d), &ip) << '\n';
cout << mp << ": " << setprecision(mp)
<< abs(d) << " " << modf(abs(d), &ip) << '\n';
}
On coliru: http://coliru.stacked-crooked.com/a/e00ded79c1727299
15: 1 1
17: 0.99999999999999989 0.99999999999999989
After calling the pow function with the argument shown in the code below,
it produces a very large number, as if it were accessing some invalid memory location.
I have no idea why this happens and any help would be greatly appreciated.
#include <iostream>
#include <vector>
#include <math.h>
using namespace std;
int main() {
vector<vector<int>> G = {
{1, 2, 3},
{0, 4}
};
cout << pow(G[1].size() - G[0].size(), 2) << endl;
return 0;
}
This prints 1.84467e+019.
The type returned by .size() is unsigned, so you cannot simply subtract two sizes when the left operand is smaller than the right one: the result wraps around to a huge positive value instead of going negative.
Try this:
cout << pow((long) G[1].size() - (long)G[0].size(), 2) << endl;
However, this solution assumes that the result of .size() fits into a signed long.
If you want more defensive code, try this:
size_t size_diff(size_t s0, size_t s1)
{
return s0 < s1? (s1 - s0) : (s0 - s1);
}
int main() {
// ...
cout << pow(size_diff(G[1].size(), G[0].size()), 2) << endl;
}
In addition to the accepted answer, I'd like to note that C++20 provides the std::ssize() free function, which returns the size as a signed type. Then
std::pow(std::ssize(G[1]) - std::ssize(G[0]), 2)
will produce the correct result without explicit type casts.
Since pow takes a floating point value as its first argument, I'd suggest letting the compiler decide the right promotion by adding the difference to 0.0 (or 0.0L):
#include <iostream>
#include <cstdint>
#include <cmath>
using namespace std;
int main()
{
// n1 and n2 each use 52 of the 64 bits
uint64_t n1 = 0x000ffffffffffffd;
uint64_t n2 = 0x000fffffffffffff;
cout << "plain: " << n1 - n2 << endl;
cout << "float: " << (float)n1 - (float)n2 << endl;
cout << "double: " << (double)n1 - (double)n2 << endl;
cout << "long double: " << (long double)n1 - (long double)n2 << endl;
cout << "0.0+: " << 0.0 + n1 - n2 << endl;
cout << "0.0L+: " << 0.0L + n1 - n2 << endl;
cout << "pow(plain, 2): " << pow(n1-n2, 2) << endl;
cout << "pow(0.0+diff, 2): " << pow(0.0+n1-n2, 2) << endl;
cout << "pow(0.0L+diff, 2): " << pow(0.0L+n1-n2, 2) << endl;
}
The output
plain: 18446744073709551614
float: 0
double: -2
long double: -2
0.0+: -2
0.0L+: -2
pow(plain, 2): 3.40282e+38
pow(0.0+diff, 2): 4
pow(0.0L+diff, 2): 4
shows that plain subtraction goes wrong. Even casting to float doesn't suffice, because float provides only a 24-bit significand (23 stored bits), which is too few for these 52-bit values.
The decision whether to use 0.0 or 0.0L for differences of size_t values returned by real std::vector::size() calls is theoretical for processes with address spaces below 4.5 Petabytes.
So I think the following will do:
cout << pow(0.0 + G[1].size() - G[0].size(), 2) << endl;
It is easy to output a double value rounded to two decimal places.
The code snippet is below:
cout.setf(ios_base::showpoint);
cout.setf(ios_base::fixed, ios_base::floatfield);
cout.precision(2);
cout << 10000000.2 << endl; // output: 10000000.20
cout << 2.561452 << endl; // output: 2.56
cout << 24 << endl; // output: 24 but I want 24.00, how to change my code?
How can I output an integer with two decimal places? I want 24.00 as the output.
It depends on what your 24 is.
If it is a hard-coded value, you can just write:
std::cout << 24.00 << std::endl;
If it's an integer variable, write this:
std::cout << static_cast<double>(myIntegerVariable) << std::endl;
Don't use any of the suggested approaches like adding ".00" as this will break your code if you want to change the precision later.
For completeness, here is a full example you can try:
#include <iostream>
#include <iomanip>
int main()
{
int i = 24;
std::cout << std::fixed << std::setprecision(2) << double(i) << std::endl;
// Output: 24.00
}
It would seem fmod(x,1) where x is a double gives the wrong result, as output by the line:
std::cout << fmod(min, 1) << "|" << fmod(max, 1) << std::endl;
I forgot the name for what you call this, but this is the smallest amount of code necessary to illustrate my problem:
#include <stdio.h>
#include <stdlib.h>
#include <iostream>
#include <time.h>
#include <math.h>
const int deviation = 3;
void weightedRandomNumber(double min, double max);
int main() {
srand(time(nullptr));
std::cout.precision(16);
std::cout << 123.1 << "|" << 2789.3234 << std::endl;
weightedRandomNumber(123.1, 2789.3234);
system("pause");
return 0;
}
void weightedRandomNumber(double min, double max) {//inclusive
int multiplier = 1;
std::cout << min << "|" << max << std::endl;
while (fmod(min, 1) > 0 || fmod(max, 1) > 0) {
std::cout << min << "|" << max << std::endl;
std::cout << fmod(min, 1) << "|" << fmod(max, 1) << std::endl;
min *= 10;
max *= 10;
multiplier++;
}
std::cout << min << "|" << max << std::endl;
std::cout << multiplier << std::endl;
}
The outputs I get when I run the code are as such:
123.1|2789.3234
123.1|2789.3234
123.1|2789.3234
0.09999999999999432|0.3234000000002197
1231|27893.234
0|0.2340000000040163
12310|278932.34
0|0.3400000000256114
123100|2789323.4
0|0.400000000372529
1231000|27893234
0|3.725290298461914e-09
12310000|278932340.0000001
0|5.960464477539063e-08
123100000|2789323400
0|4.76837158203125e-07
1231000000|27893234000
0|3.814697265625e-06
12310000000|278932340000.0001
0|6.103515625e-05
123100000000|2789323400000
0|0.00048828125
1231000000000|27893234000000
0|0.00390625
12310000000000|278932340000000
0|0.03125
123100000000000|2789323400000001
0|0.5
1231000000000000|2.7893234e+16
14
Other than this I don't quite know what to say, if I have missed anything necessary please comment so I can amend my question.
The issue is not with fmod, which is giving the most precise results it can. The issue is that cout's precision setting does not behave the way you expect, combined with "rounding": a double cannot store 123.1 exactly, and the stored error becomes visible at what cout considers a precision of 16.
This code demonstrates the issue. The rounding actually occurs when you assign 123.1 to a double, but because three of the printed digits are taken up by the integer part, the error is not visible until the number becomes smaller.
#include <iostream>
int main() {
std::cout.precision(16);
// Plain double literals: 123.1 is already rounded when it is stored
std::cout << (123.1 - 123.0);
}
output:
0.09999999999999432
Actually, this illustrates the problem even more succinctly:
#include <iostream>
int main() {
std::cout.precision(20);
std::cout << 123.1;
}
123.09999999999999432
Further reading from the comments on your question:
Is floating point math broken?
Also, for the vast majority of scenarios, a double is more than fine. For accurate, recursive math, you'd want to consider a heavy-duty math library, or even a math-specialized language.
Further further reading:
http://www.boost.org/doc/libs/1_62_0/libs/math/doc/html/math_toolkit/high_precision/why_high_precision.html
I am having an issue with primitive types using built in operators. All of my operators work for all datatypes except for float and (un)signed long long int.
Why is the result wrong even when multiplying by one? Also, why do +10 and -10 give the same number as +1, -1, /1, and *1?
The number 461168601 was chosen because it fits within the max float and max signed long long int.
Ran the following code and got the following output:
fmax : 340282346638528859811704183484516925440
imax : 9223372036854775807
i : 461168601
f : 10
f2 : 1
461168601 / 10 = 46116860
461168601 + 10 = 461168608
461168601 - 10 = 461168608
461168601 * 1 = 461168608
461168601 / 1 = 461168608
461168601 + 1 = 461168608
461168601 - 1 = 461168608
The following code can be run here.
#include <iostream>
#include <sstream>
#include <iomanip>
#include <limits>
#define fmax std::numeric_limits<float>::max()
#define imax std::numeric_limits<signed long long int>::max()
int main()
{
signed long long int i = 461168601;
float f = 10;
float f2 = 1;
std::cout << std::setprecision(40);
std::cout <<"fmax : " << fmax << std::endl;
std::cout <<"imax : " << imax << std::endl;
std::cout <<"i : " << i << std::endl;
std::cout <<"f : " << f << std::endl;
std::cout <<"f2 : " << f2 << std::endl;
std::cout <<std::endl;
std::cout << i << " / " << f << " = " << i / f << std::endl;
std::cout << i << " + " << f << " = " << i + f << std::endl;
std::cout << i << " - " << f << " = " << i - f << std::endl;
std::cout <<std::endl;
std::cout << i << " * " << f2 << " = " <<i * f2 << std::endl;
std::cout << i << " / " << f2 << " = " << i / f2 << std::endl;
std::cout << i << " + " << f2 << " = " << i + f2 << std::endl;
std::cout << i << " - " << f2 << " = " << i - f2 << std::endl;
}
The error is caused by the large difference in magnitude between i (once converted to float) and 1 or 10. Adding numbers that differ this much loses the smaller one, because the actual gap between two adjacent floating point numbers grows with the exponent value.
When two floating point numbers are added, they are first aligned to the same exponent value (the bigger one). So before the operation you have, for example, 1e10 and 1e-10; after alignment you have 1e10 and 0e10, and the result is 1e10.
I dug around some and found this article, which says:
Casting opens up its own can of worms. You have to be careful, because your float might not have enough precision to preserve an entire integer. A 32-bit integer can represent any 9-digit decimal number, but a 32-bit float only offers about 7 digits of precision. So if you have large integers, making this conversion will clobber them. Thankfully, doubles have enough precision to preserve a whole 32-bit integer (notice, again, the analogy between floating point precision and integer dynamic range). Also, there is some overhead associated with converting between numeric types, going from float to int or between float and double.
So, essentially, once the integer part of a number grows beyond about seven digits, a float can no longer hold every digit: it keeps roughly the seven most significant digits and rounds away the rest. That rounding is the floating point inaccuracy seen here.
I've tried searching for information on long double, and so far I understand it is implemented differently by compilers.
When using GCC on Ubuntu (XUbuntu) Linux 12.10 I get this:
double PId = acos(-1);
long double PIl = acos(-1);
std::cout.precision(100);
std::cout << "PId " << sizeof(double) << " : " << PId << std::endl;
std::cout << "PIl " << sizeof(long double) << " : " << PIl << std::endl;
Output:
PId 8 : 3.141592653589793115997963468544185161590576171875
PIl 16 : 3.141592653589793115997963468544185161590576171875
Does anyone understand why they output (almost) the same thing?
According to the reference of acos, it will return a long double only if you pass a long double to it. You'll also have to use std::acos like baboon suggested. This works for me:
#include <cmath>
#include <iostream>
int main() {
double PId = acos((double)-1);
long double PIl = std::acos(-1.0l);
std::cout.precision(100);
std::cout << "PId " << sizeof(double) << " : " << PId << std::endl;
std::cout << "PIl " << sizeof(long double) << " : " << PIl << std::endl;
}
Output:
PId 8 : 3.141592653589793115997963468544185161590576171875
PIl 12 : 3.14159265358979323851280895940618620443274267017841339111328125
3.14159265358979323846264338327950288419716939937510582097494459
The last line is not part of the output and contains the correct digits for pi to this precision.
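To get the correct number of significant digits, use std::numeric_limits. Its digits10 member gives the number of decimal digits that are guaranteed to be significant (as opposed to digits, which gives the number of significand bits); C++11 added max_digits10 for the number of digits needed to round-trip a value.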
#include <cmath>
#include <iostream>
#include <limits>
int
main()
{
std::cout.precision(std::numeric_limits<float>::digits10);
float PIf = std::acos(-1.0F);
std::cout << "PIf " << sizeof(float) << " : " << PIf << std::endl;
std::cout.precision(std::numeric_limits<double>::digits10);
double PId = std::acos(-1.0);
std::cout << "PId " << sizeof(double) << " : " << PId << std::endl;
std::cout.precision(std::numeric_limits<long double>::digits10);
long double PIl = std::acos(-1.0L);
std::cout << "PIl " << sizeof(long double) << " : " << PIl << std::endl;
}
On x86_64 linux I get:
PIf 4 : 3.14159
PId 8 : 3.14159265358979
PIl 16 : 3.14159265358979324
Try:
long double PIl = std::acos(-1.0L);
This passes a long double rather than an int, which would be converted to double.
Note that most of the digits printed above are rubbish anyway.
With an 8-byte double you get about 15 significant decimal digits. If you compare your numbers with the real pi
3.1415926535897932384626433
you see that only about the first 15 digits match.
As noted in the comments, you probably won't get twice the precision, because the implementation might use only an 80-bit representation, and the precision then depends on how many of those bits are reserved for the mantissa.