Math Error with Primitive Operators - c++

I am having an issue with primitive types and their built-in operators. All of my operators work for every data type except when float is combined with (un)signed long long int.
Why is the result wrong even when multiplying by one? And why do +10 and -10 give the same number as +1, -1, /1, and *1?
The number 461168601 was chosen because it fits within both the maximum float and the maximum signed long long int.
I ran the following code and got this output:
fmax : 340282346638528859811704183484516925440
imax : 9223372036854775807
i : 461168601
f : 10
f2 : 1
461168601 / 10 = 46116860
461168601 + 10 = 461168608
461168601 - 10 = 461168608
461168601 * 1 = 461168608
461168601 / 1 = 461168608
461168601 + 1 = 461168608
461168601 - 1 = 461168608
The following code can be run to reproduce this:
#include <iostream>
#include <sstream>
#include <iomanip>
#include <limits>

#define fmax std::numeric_limits<float>::max()
#define imax std::numeric_limits<signed long long int>::max()

int main()
{
    signed long long int i = 461168601;
    float f = 10;
    float f2 = 1;

    std::cout << std::setprecision(40);
    std::cout << "fmax : " << fmax << std::endl;
    std::cout << "imax : " << imax << std::endl;
    std::cout << "i : " << i << std::endl;
    std::cout << "f : " << f << std::endl;
    std::cout << "f2 : " << f2 << std::endl;
    std::cout << std::endl;

    std::cout << i << " / " << f << " = " << i / f << std::endl;
    std::cout << i << " + " << f << " = " << i + f << std::endl;
    std::cout << i << " - " << f << " = " << i - f << std::endl;
    std::cout << std::endl;

    std::cout << i << " * " << f2 << " = " << i * f2 << std::endl;
    std::cout << i << " / " << f2 << " = " << i / f2 << std::endl;
    std::cout << i << " + " << f2 << " = " << i + f2 << std::endl;
    std::cout << i << " - " << f2 << " = " << i - f2 << std::endl;
}

The error is caused by the large difference in magnitude between 461168601 and 1 or 10. You should avoid adding numbers that differ by this much, because the actual gap between two adjacent representable floating point numbers grows with the exponent.
When two floating point numbers are added, they are first aligned to the same exponent value (the larger one). So before the operation you have e.g. 1e10 and 1e-10; after alignment you effectively have 1e10 and 0e10, and the result is 1e10.
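A minimal sketch of that alignment effect (assuming the usual 32-bit IEEE-754 float with a 24-bit significand; the constants are just illustrative):

#include <iostream>

int main()
{
    float big = 1e10f;
    // The spacing between adjacent floats near 1e10 is 1024, so adding 1
    // is rounded away entirely, while adding 1e4 survives.
    std::cout << (big + 1.0f == big) << '\n'; // 1 (true): the 1 was lost
    std::cout << (big + 1e4f == big) << '\n'; // 0 (false): 1e4 is large enough to register
}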

Dug around some and found this article.
Casting opens up its own can of worms. You have to be careful, because your float might not have enough precision to preserve an entire integer. A 32-bit integer can represent any 9-digit decimal number, but a 32-bit float only offers about 7 digits of precision. So if you have large integers, making this conversion will clobber them. Thankfully, doubles have enough precision to preserve a whole 32-bit integer (notice, again, the analogy between floating point precision and integer dynamic range). Also, there is some overhead associated with converting between numeric types, going from float to int or between float and double.
So, essentially, once the whole part of a number grows past about seven digits, the float starts dropping low-order digits to keep roughly seven significant digits. When that happens, the arithmetic runs into floating point inaccuracy.
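To tie that back to the question's number, here is a minimal sketch (assuming IEEE-754 single precision) showing that the rounding already happens in the long long → float conversion itself, before any arithmetic:

#include <iostream>
#include <iomanip>

int main()
{
    long long i = 461168601;          // 9 significant digits
    float f = static_cast<float>(i);  // nearest representable float is 461168608
    std::cout << std::setprecision(15) << f << '\n';    // prints 461168608
    std::cout << static_cast<long long>(f) - i << '\n'; // prints 7
}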

Related

Difference in evaluation of expression when using long long int vs double in c++ [duplicate]

I'll refer to the below code to explain my question.
typedef long long int ll;

void func(){
    ll lli_a = 603828039791327040;
    ll lli_b = 121645100408832000;
    double d_b = (double)lli_b;
    cout << "a " << lli_b - d_b << endl;                             // 0
    cout << "b " << (lli_a - 4*lli_b) - (lli_a - 4*d_b) << endl;     // 64
    cout << "c " << (lli_a - 4*lli_b) - (lli_a - (ll)4*d_b) << endl; // 64
    cout << "d " << (lli_a - 4*lli_b) - (lli_a - 4*(ll)d_b) << endl; // 0
    cout << "e " << 4*(ll)d_b - 4*d_b << endl;                       // 0
    cout << "f " << 4*(ll)d_b - (ll)4*d_b << endl;                   // 0
}
I'm unable to understand why statements b and c evaluate to 64, while d evaluates to 0, which happens to be the correct answer.
Both e and f evaluate to 0, so I assume the difference comes from the subtraction from lli_a. I don't think there is any overflow issue, as the individual values of each term come out correctly.
double is a floating point type. Floating point types have limited precision. They cannot represent all numbers - not even all rational numbers. Simply put, on your system 603828039791327040 is a number that cannot be represented by the double datatype. The closest representable value happens to be 64 away from the precise value.
You can (likely) get the expected result by using long double, which (typically) can represent all values of long long - or you could avoid using floating point in the first place.
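A quick, hedged way to check both claims on your own platform (the long double assertion holds on typical x86 Linux, where long double has a 64-bit significand, but not e.g. with MSVC, where long double is the same as double):

#include <cassert>

int main()
{
    long long lli_a = 603828039791327040;
    // The nearest double is 64 away, so the round trip through double changes the value.
    assert(static_cast<long long>(static_cast<double>(lli_a)) != lli_a);
    // A 64-bit significand can hold every long long value, so this round trip is exact.
    assert(static_cast<long long>(static_cast<long double>(lli_a)) == lli_a);
}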
Some code to walk you through it; bottom line: don't mix doubles with ints implicitly.

#include <cassert>
#include <iostream>
#include <type_traits>

// typedef long long int ll; // NO: use 'using', and never alias just to save a bit of typing.
//                           // Aliases are there to introduce meaning, not shortcuts.
// using namespace std;      // also NO

int main()
{
    long long int lli_a = 603828039791327040;
    long long int lli_b = 121645100408832000;

    // double d_b = (double)lli_b; // No, this is C++: don't use 'C' style casts
    double d_b = static_cast<double>(lli_b);
    assert(static_cast<long long int>(d_b) == lli_b); // you are in luck: this double represents your value exactly - NOT guaranteed

    std::cout << "a " << lli_b - d_b << "\n"; // 0 (don't use endl unless you have a good reason to flush)

    long long int lli_b4 = 4 * lli_b;

    // use auto to show you this expression evaluates to a double!
    auto lli_d_b4 = (lli_a - static_cast<long long int>(4) * d_b); // d_b is a double! If you want to use it as a long long int, cast it first
    static_assert(std::is_same_v<double, decltype(lli_d_b4)>);

    auto result_c = lli_b4 - lli_d_b4;
    // result_c is still a double!
    static_assert(std::is_same_v<double, decltype(result_c)>);
    std::cout << "c " << result_c << "\n";

    // Long story short: don't mix types implicitly, and use "C++" style casts explicitly to get the results you want.
    /*
    cout << "b " << (lli_a - 4 * lli_b) - (lli_a - 4 * d_b) << endl;     // 64
    cout << "c " << (lli_a - 4 * lli_b) - (lli_a - (ll)4 * d_b) << endl; // 64
    cout << "d " << (lli_a - 4 * lli_b) - (lli_a - 4 * (ll)d_b) << endl; // 0
    cout << "e " << 4 * (ll)d_b - 4 * d_b << endl;                       // 0
    cout << "f " << 4 * (ll)d_b - (ll)4 * d_b << endl;                   // 0
    */
    return 0;
}

C++ pow behaviour with negative vector size calculations

After calling the pow function with the arguments as in the code below,
it produces some huge number, as if it were accessing some invalid memory location.
I have no idea why this happens and any help would be greatly appreciated.
#include <iostream>
#include <vector>
#include <math.h>

using namespace std;

int main() {
    vector<vector<int>> G = {
        {1, 2, 3},
        {0, 4}
    };
    cout << pow(G[1].size() - G[0].size(), 2) << endl;
    return 0;
}
This prints 1.84467e+019.
The type returned by .size() is unsigned, so you cannot simply subtract the two values when the left operand is smaller than the right one: the result wraps around to a huge positive value.
Try this:
cout << pow((long) G[1].size() - (long) G[0].size(), 2) << endl;
//          ~~~~~~               ~~~~~~
However, this solution is based on the assumption that the values returned by .size() fit into a signed long.
If you want a more defensive code, try this one:
size_t size_diff(size_t s0, size_t s1)
{
    return s0 < s1 ? (s1 - s0) : (s0 - s1);
}

int main() {
    // ...
    cout << pow(size_diff(G[1].size(), G[0].size()), 2) << endl;
}
In addition to the accepted answer, I'd like to note that in C++20 we'll have the std::ssize() free function, which returns the size as a value of signed type. Then
std::pow(std::ssize(G[1]) - std::ssize(G[0]), 2)
will produce the correct result without explicit type casts.
Since pow takes a floating point value as its first argument, I'd suggest letting the compiler pick the right promotion by prepending 0.0 + (or 0.0L +) to the difference:
#include <iostream>
#include <cstdint>
#include <cmath>

using namespace std;

int main()
{
    // 52 of 64 bits used
    uint64_t n1 = 0x000ffffffffffffd;
    uint64_t n2 = 0x000fffffffffffff;
    cout << "plain: " << n1 - n2 << endl;
    cout << "float: " << (float)n1 - (float)n2 << endl;
    cout << "double: " << (double)n1 - (double)n2 << endl;
    cout << "long double: " << (long double)n1 - (long double)n2 << endl;
    cout << "0.0+: " << 0.0 + n1 - n2 << endl;
    cout << "0.0L+: " << 0.0L + n1 - n2 << endl;
    cout << "pow(plain, 2): " << pow(n1 - n2, 2) << endl;
    cout << "pow(0.0+diff, 2): " << pow(0.0 + n1 - n2, 2) << endl;
    cout << "pow(0.0L+diff, 2): " << pow(0.0L + n1 - n2, 2) << endl;
}
The output
plain: 18446744073709551614
float: 0
double: -2
long double: -2
0.0+: -2
0.0L+: -2
pow(plain, 2): 3.40282e+38
pow(0.0+diff, 2): 4
pow(0.0L+diff, 2): 4
shows that plain subtraction goes wrong. Even casting to float doesn't suffice because float provides only a 23-bit mantissa.
The decision whether to use 0.0 or 0.0L for differences of size_t values returned by real std::vector::size() calls is theoretical for processes with address spaces below 4.5 Petabytes.
So I think the following will do:
cout << pow(0.0 + G[1].size() - G[0].size(), 2) << endl;

Why am I getting blank output?

Why am I getting blank output? The pointers are able to modify the int, but I can't read it back. Why?
#include <iostream>
using namespace std;

int main(){
    int a = 0;
    char *x1, *x2, *x3, *x4;
    x1 = (char *)&a;
    x2 = x1; x2++;
    x3 = x2; x3++;
    x4 = x3; x4++;
    *x1 = 1;
    *x2 = 1;
    *x3 = 1;
    *x4 = 1;
    cout << "#" << *x1 << " " << *x2 << " " << *x3 << " " << *x4 << "#" << endl;
    cout << a << endl;
}
[Desktop]👉 g++ test_pointer.cpp
[Desktop]👉 ./a.out
# #
16843009
I want to read the value of the integer through pointers of type char,
so I can read it byte by byte.
You're streaming chars. These get automatically ASCII-ised for you by IOStreams*, so you're seeing (or rather, not seeing) unprintable characters (in fact, all 0x01 bytes).
You can cast to int to see the numerical value, and perhaps add std::hex for a conventional view.
Example:
#include <iostream>
#include <iomanip>

int main()
{
    int a = 0;

    // Alias the first four bytes of `a` using `char*`
    char* x1 = (char*)&a;
    char* x2 = x1 + 1;
    char* x3 = x1 + 2;
    char* x4 = x1 + 3;

    *x1 = 1;
    *x2 = 1;
    *x3 = 1;
    *x4 = 1;

    std::cout << std::hex << std::setfill('0');
    std::cout << '#' << "0x" << std::setw(2) << (int)*x1
              << ' ' << "0x" << std::setw(2) << (int)*x2
              << ' ' << "0x" << std::setw(2) << (int)*x3
              << ' ' << "0x" << std::setw(2) << (int)*x4
              << '#' << '\n';
    std::cout << "0x" << a << '\n';
}
// Output:
// #0x01 0x01 0x01 0x01#
// 0x1010101
Those saying that your program has undefined behaviour are incorrect (assuming your int has at least four bytes in it); aliasing objects via char* is specifically permitted.
The 16843009 output is correct; that's equal to 0x01010101 which you'd again see if you put your stream into hex mode.
N.B. Some people will recommend reinterpret_cast<char*>(&a) and static_cast<int>(*x1), instead of C-style casts, though personally I find them ugly and unnecessary in this particular case. For the output you can at least write +*x1 to get a "free" promotion to int (via the unary + operator), but that's not terribly self-documenting.
* Technically it's something like the opposite; IOStreams usually automatically converts your numbers and booleans and things into the right ASCII characters to appear correct on screen. For char it skips that step, assuming that you're already providing the ASCII value you want.
Assuming an int is at least 4 bytes long on your system, the program manipulates the 4 bytes of int a.
The result 16843009 is the decimal value of 0x01010101, so this is as you might expect.
You don't see anything in the first line of output because you write 4 characters with the binary value 1 (or 0x01), which is an invisible control character (ASCII SOH).
When you modify your program like this
*x1 = '1';
*x2 = '3';
*x3 = '5';
*x4 = '7';
you will see output with the expected characters
#1 3 5 7#
926233393
The value 926233393 is the decimal representation of 0x37353331 where 0x37 is the ASCII value of the character '7' etc.
(These results are valid for a little-endian architecture.)
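If you want to observe that endianness dependence directly, here is a minimal sketch using the same char-aliasing trick, just reading instead of writing:

#include <iostream>

int main()
{
    int a = 1;
    // On a little-endian machine the least significant byte comes first in memory.
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&a);
    std::cout << (p[0] == 1 ? "little-endian" : "big-endian") << '\n';
}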
You can use unary + to convert the character type (printed as a symbol) into an integer type (printed as a number):
cout << "#" << +*x1 << " " << +*x2 << " " << +*x3 << " " << +*x4 << "#" << endl;
See integral promotion for details.
Have a look at your declarations of the x's:
char *x1, *x2, *x3, *x4;
These are pointers to chars (characters).
In your stream output they are interpreted as printable characters.
A short look into the ASCII table shows that the low values are not printable.
After your writes, each byte of a holds the value 1, which is such an unprintable character.
One possibility to get readable output is to cast the characters to int, so that the stream prints the numerical representation instead of the ASCII character:
cout << "#" << int(*x1) << " " << int(*x2) << " " << int(*x3) << " " << int(*x4) << "#" << endl;
If I understood your problem correctly, this is the solution:

#include <stdio.h>
#include <iostream>
using namespace std;

int main(){
    int a = 0;
    char *x1, *x2, *x3, *x4;
    x1 = (char*)&a;
    x2 = x1; x2++;
    x3 = x2; x3++;
    x4 = x3; x4++;
    *x1 = 1;
    *x2 = 1;
    *x3 = 1;
    *x4 = 1;
    cout << "#" << (int)*x1 << " " << (int)*x2 << " " << (int)*x3 << " " << (int)*x4 << "#" << endl;
    cout << a << endl;
}

modf telling me fractional part of 1 is 1

I'm trying to check if a double variable p is approximately equal to an integer. At some point in my code I have
double ip;
cout << setprecision(15) << abs(p) << " " << modf(abs(p), &ip) << endl;
And for a given run I get the printout
1 1
This seems to say that the fractional part of 1 is 1. Am I missing something here, or could there be some roundoff problem, etc.?
Note: I'm not including the whole code, since the origin of p is complicated and I'm just asking whether this is a familiar issue.
could there be some roundoff problem etc?
There certainly could. If the value is very slightly less than 1, then both its value and its fractional part could be rounded to 1 when displayed.
the origin of p is complicated
Then it's very likely not to be an exact round number.
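If the underlying goal is just the approximate-integer test, here is a minimal sketch of one common approach (the helper name and the tolerance are assumptions; you have to pick an epsilon that suits your data):

#include <cmath>
#include <iostream>

// Compare p against the nearest integer within a chosen tolerance.
bool nearly_integer(double p, double eps = 1e-9)
{
    return std::fabs(p - std::round(p)) < eps;
}

int main()
{
    std::cout << std::boolalpha
              << nearly_integer(0.999999999999) << '\n' // true: within 1e-9 of 1
              << nearly_integer(1.0001) << '\n';        // false
}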
You are testing a value that is nearly 1, so a precision of 15 digits is not enough to display it unambiguously.
This code shows your problem clearly:
#include <iostream>
#include <iomanip>
#include <cmath>
#include <limits>

using namespace std;

int main() {
    double ip, d = nextafter(1., .0); // Get the double just below 1
    const auto mp = std::numeric_limits<double>::max_digits10;
    cout << 15 << ": " << setprecision(15)
         << abs(d) << " " << modf(abs(d), &ip) << '\n';
    cout << mp << ": " << setprecision(mp)
         << abs(d) << " " << modf(abs(d), &ip) << '\n';
}
On coliru: http://coliru.stacked-crooked.com/a/e00ded79c1727299
15: 1 1
17: 0.99999999999999989 0.99999999999999989

Implementation of type "long double" with GCC and C++11

I've tried searching for information on long double, and so far I understand that it is implemented differently by different compilers.
When using GCC on Ubuntu (Xubuntu) Linux 12.10, I get this:
double PId = acos(-1);
long double PIl = acos(-1);
std::cout.precision(100);
std::cout << "PId " << sizeof(double) << " : " << PId << std::endl;
std::cout << "PIl " << sizeof(long double) << " : " << PIl << std::endl;
Output:
PId 8 : 3.141592653589793115997963468544185161590576171875
PIl 16 : 3.141592653589793115997963468544185161590576171875
Anyone understand why they output (almost) the same thing?
According to the reference for acos, it will return a long double only if you pass it a long double. You'll also have to use std::acos, as baboon suggested. This works for me:
#include <cmath>
#include <iostream>

int main() {
    double PId = acos((double)-1);
    long double PIl = std::acos(-1.0l);
    std::cout.precision(100);
    std::cout << "PId " << sizeof(double) << " : " << PId << std::endl;
    std::cout << "PIl " << sizeof(long double) << " : " << PIl << std::endl;
}
Output:
PId 8 : 3.141592653589793115997963468544185161590576171875
PIl 12 : 3.14159265358979323851280895940618620443274267017841339111328125
3.14159265358979323846264338327950288419716939937510582097494459
The last line is not part of the output; it shows the correct digits of pi at this precision for comparison.
To get the correct number of significant digits use std::numeric_limits. In C++11 we have digits10 for decimal significant digits (as opposed to digits which gives significant bits).
#include <cmath>
#include <iostream>
#include <limits>

int main()
{
    std::cout.precision(std::numeric_limits<float>::digits10);
    double PIf = acos(-1.0F);
    std::cout << "PIf " << sizeof(float) << " : " << PIf << std::endl;

    std::cout.precision(std::numeric_limits<double>::digits10);
    double PId = acos(-1.0);
    std::cout << "PId " << sizeof(double) << " : " << PId << std::endl;

    std::cout.precision(std::numeric_limits<long double>::digits10);
    long double PIl = std::acos(-1.0L);
    std::cout << "PIl " << sizeof(long double) << " : " << PIl << std::endl;
}
On x86_64 linux I get:
PIf 4 : 3.14159
PId 8 : 3.14159265358979
PIl 16 : 3.14159265358979324
Try:
long double PIl = std::acos(-1.0L);
That makes you pass a long double and not just an int which gets converted.
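A small sketch to confirm the overload selection at compile time (C++11; an integral argument selects the double overload, a long double literal selects the long double one):

#include <cmath>
#include <type_traits>

static_assert(std::is_same<decltype(std::acos(-1)), double>::value,
              "an int argument uses the double overload");
static_assert(std::is_same<decltype(std::acos(-1.0L)), long double>::value,
              "a long double argument keeps long double precision");

int main() {}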
Note that almost all of these digits are rubbish anyway.
With an 8-byte double you get about 15 significant decimal digits. If you compare your numbers with the real pi,
3.1415926535897932384626433
you see that only the first 15 digits match.
As noted in the comments, you probably won't get double the precision either, as the implementation might use only an 80-bit representation, and then it depends on how many bits it reserves for the mantissa.
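You can check how many significand bits your long double actually has via std::numeric_limits (the values in the comments assume x86 GCC, where long double is the 80-bit extended format):

#include <iostream>
#include <limits>

int main()
{
    // digits reports the number of significand bits of each type.
    std::cout << "float:       " << std::numeric_limits<float>::digits  << " bits\n"       // 24
              << "double:      " << std::numeric_limits<double>::digits << " bits\n"       // 53
              << "long double: " << std::numeric_limits<long double>::digits << " bits\n"; // 64 on x86
}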