Sorting between signed and unsigned zeros in C++

I have to be able to sort negative zeros and zeros for an assignment I am doing at university using C++. Could someone tell me why the following code does produce a negative zero? I'm stuck and I am not sure why this works...
cout << "Enter the number of elements you want to add to the vector:\n";
cin >> x;
cout << "Enter the integers: \n" << endl;
for (int i = 0; i < x; i++)
{
cin >> y;
y = y - 0.0;
cout << y;
Array.push_back(y);
}
If there is a better way of producing a negative zero when sorting the above vector please advise.
Many thanks!

First of all, there need not be any negative zeroes in standard C++, so I assume you are talking about the negative zero from IEEE-754, which is what most (I never encountered an exception) C++ implementations base their floating point math on.
In this case, the expression
y = y - 0.0;
will yield -0.0 only if y already held -0.0 before that assignment, or if you set your rounding mode to "round towards -INFINITY", which you usually won't.
To actually produce a double with the value -0.0, you can just assign the desired value:
double d = -0.0;
Now d == -0.0 in IEEE floating point math.
However, as
Comparisons shall ignore the sign of zero
(IEEE 754-1985, 5.7. Comparison), -0.0 < 0.0 will yield false, so if you actually want to sort negative zero before positive zero, you will need to write your own comparator, possibly using std::signbit.
Appendix: Relevant standard quote:
When the sum of two operands with opposite signs (or the difference of two
operands with like signs) is exactly zero, the sign of that sum (or difference)
shall be + in all rounding modes except round toward –INFINITY, in which
mode that sign shall be –.
IEEE 754-1985, 6.3 (The Sign Bit)
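A minimal sketch of the custom comparator mentioned above, assuming IEEE-754 doubles and no NaNs in the input. The names lessWithSignedZero and sortWithSignedZero are made up for this illustration:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical comparator: orders by value, but places -0.0 before +0.0.
// Assumes IEEE-754 doubles and no NaNs in the input.
bool lessWithSignedZero(double a, double b)
{
    if (a == 0.0 && b == 0.0)  // both are zeros, of either sign
        return std::signbit(a) && !std::signbit(b);
    return a < b;
}

// Sorts in place so that every -0.0 ends up directly before every +0.0.
void sortWithSignedZero(std::vector<double>& values)
{
    std::sort(values.begin(), values.end(), lessWithSignedZero);
}
```

Since operator< treats both zeros as equal, only the zero-versus-zero case needs std::signbit; every other pair falls through to the ordinary comparison.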

Related

Issue related to double precision floating point division in C++

In C++, we know that we can find the minimum representable double precision value using std::numeric_limits<double>::min(). The value turns out to be 2.22507e-308 when printed.
Now if a given double value (say val) is subtracted from this minimum value and then a division is undertaken with the same previous double value (val - minval) / val, I was expecting the answer to be rounded to 0 if the operation floor((val - minval ) / val) was performed on the resulting divided value.
To my surprise, the answer is delivered as 1. Can someone please explain this anomalous behavior?
Consider the following code:
int main()
{
double minval = std::numeric_limits<double>::min(), wg = 8038,
ans = floor((wg - minval) / wg); // expecting the answer to round to 0
cout << ans; // but the answer actually resulted as 1!
}
A double typically has around 16 digits of precision.
You're starting with 8038. For simplicity, I'm going to call that 8.038e3. Since we have around 16 digits of precision, the smallest number we can subtract from that and still get a result different from 8038 is roughly 8.038e(3-16) = 8.038e-13.
8038 - 2.2e-308 is like reducing the mass of the universe by one electron, and expecting that to affect the mass of the universe by a significant amount.
Actually, relatively speaking, 8038-2.2e-308 is a much smaller change than removing a whole electron from the universe--more like removing a minuscule fraction of a single electron from the universe, if that were possible. Even if we were to assume that string theory were correct, even removing one string from the universe would still be a huge change compared to subtracting 2.2e-308 from 8038.
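To check this on a given machine, a small sketch like the following may help. It assumes 64-bit IEEE-754 doubles, and the helper names gapBelow and subtractionIsAbsorbed are invented for the example:

```cpp
#include <cmath>
#include <limits>

// Gap between x and the next representable double below it.
double gapBelow(double x)
{
    return x - std::nextafter(x, 0.0);
}

// True when subtracting delta from x leaves x unchanged after rounding.
bool subtractionIsAbsorbed(double x, double delta)
{
    return x - delta == x;
}
```

With these, subtractionIsAbsorbed(8038.0, std::numeric_limits<double>::min()) should be true, while gapBelow(8038.0) is 2^-40 (about 9.09e-13), the same order of magnitude as the estimate above.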
The comments and the previous answer correctly attribute the cause to floating point precision, but additional details are needed to explain the behavior fully. Even when the exact result of a subtraction cannot be represented in the finite precision of a floating point number, the subtraction is not simply discarded: the exact result is still rounded to a representable value.
As an example, consider the code below.
int main()
{
double b, c, d;
vector<double> a{0.07, 0.08, 0.09, 0.1, 0.2, 0.3, 0.4, 0.6, 0.7};
cout << "Subtraction Possible?" << "\t" << "Floor Result" << "\n";
for( int i = 0; i < 9; i++ ) {
b = std::nextafter( a[i], 0 );
c = a[i] - b;
d = 1e-17;
if( (bool)(d > c) )
cout << "True" << "\t";
else
cout << "False" << "\t";
cout << setprecision(52) << floor((a[i] - d)/a[i]) << "\n";
}
return 0;
}
The code takes different double precision values from the vector a and subtracts 1e-17 from each. Note that the gap between 0.07 and the next smaller representable value, found with std::nextafter, is 1.387778780781445675529539585113525390625e-17. This means that 1e-17 is smaller than the smallest difference that can be represented exactly next to any of these numbers. Hence, in theory, an exact subtraction should not be possible for any of the numbers listed in vector a. If we assumed that the subtraction were simply discarded, the answer would always stay 1, but it turns out that sometimes the answer is 0 and other times 1.
This can be observed from the output of the C++ program as shown below:
Subtraction Possible? Floor Result
False 0
False 0
False 0
False 0
False 1
False 1
False 1
False 1
False 1
The reasons lie buried within the floating point specification prescribed in the IEEE 754 document. In general, the standard states that even in cases where the result of an operation cannot be represented, rounding must be carried out. I quote Page 27, Section 4.3 of the IEEE 754-2019 document:
Except where stated otherwise, every operation shall be performed as if it first produced an
intermediate result correct to infinite precision and with unbounded range, and then rounded that result
according to one of the attributes in this clause
The statement is repeated in Section 5.1 on Page 29, as shown below:
Unless otherwise specified, each of the computational
operations specified by this standard that returns a numeric result shall be performed as if it first produced
an intermediate result correct to infinite precision and with unbounded range, and then rounded that
intermediate result, if necessary, to fit in the destination’s format (see Clause 4 and Clause 7).
The g++ compiler (which I have been testing) follows the standard precisely by implementing the round-to-nearest behavior described in Section 4.3.1 of the IEEE 754 document. This implies that even when a[i] - d is not representable, a numeric result is delivered as if the subtraction first produced an intermediate result correct to infinite precision and with unbounded range, and then rounded that intermediate result. Hence, it may or may not be the case that a[i] - d == a[i], which means that the answer may or may not be 1, depending on whether a[i] - d is closer to a[i] or to the next representable value below a[i].
It turns out that 8038 - 2.22507e-308 is closer to 8038, so the result is rounded (to nearest) to 8038 and the final answer is 1. The point is that this behavior follows from the rounding rules of the standard and is not arbitrary.
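The round-to-nearest rule can be turned into a simple prediction: the subtraction leaves the value unchanged when the amount subtracted is less than half the gap to the next smaller double. The sketch below assumes the default rounding mode and IEEE-754 doubles; halfGapBelow and predictedToRoundBack are hypothetical names:

```cpp
#include <cmath>

// Half of the gap between x and the next representable double below it.
double halfGapBelow(double x)
{
    return (x - std::nextafter(x, 0.0)) / 2.0;
}

// Predicts, from the half-gap alone, whether x - d rounds back to x.
// (An exact tie is decided by round-half-to-even and is not covered here.)
bool predictedToRoundBack(double x, double d)
{
    return d < halfGapBelow(x);
}
```

For d = 1e-17 the prediction is false for 0.07 through 0.1 and true for 0.2 through 0.7, which lines up with the 0 and 1 floor results in the table above.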
I found the references below on floating point numbers very useful. I would recommend reading Cleve Moler's (founder of MATLAB) article on floating point numbers before going through the IEEE specification, for a quick and easy understanding of their behavior.
"IEEE Standard for Floating-Point Arithmetic," in IEEE Std 754-2019 (Revision of IEEE 754-2008) , vol., no., pp.1-84, 22 July 2019, doi: 10.1109/IEEESTD.2019.8766229.
Moler, Cleve. “Floating Points.” MATLAB News and Notes. Fall, 1996.

In what situation do we get nan in c++?

I read some articles about NaN, but they didn't mention all the situations. For example, I compiled this code and received nan.
Why doesn't it give inf ?
#include <iostream>
using namespace std;
int main()
{
double input,counter,pow= 1, sum = 0, sign = 1.0;
cin >> input;
for (counter = 1; pow / counter >= 1e-4; counter++)
{
pow *= input;
sum += sign * pow / counter;
sign = -sign;
}
cout << sum << endl;
}
The result is :
nan
With input of “2”, your program adds two infinities of opposite signs, which generates a NaN. This occurs because repeatedly multiplying pow by two causes it to become infinity, and the alternating sign results in a positive infinity being added to a negative infinity in sum from the previous iteration or vice-versa.
However, it is not clear why you see any output, as counter++ becomes ineffective once counter reaches 2^53 (in typical C++ implementations) because then the double format lacks precision to represent 2^53+1, so the result of adding one to 2^53 is rounded to 2^53. So counter stops changing, and the loop continues forever.
One possibility is that your compiler is generating code that always terminates the loop, because this is allowed by the “Forward progress” clause (4.7.2 in draft n4659) of the C++ standard. It says the compiler can assume your loop will not continue forever without doing something useful (like writing output or calling exit), and that allows the compiler to generate code that exits the loop even though it would otherwise continue forever with input of “2”.
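The point about counter++ stalling can be checked with a short sketch, assuming the usual 64-bit IEEE-754 double. The helper name incrementChanges is made up here:

```cpp
#include <cmath>

// Returns true if adding 1.0 to x produces a different value.
bool incrementChanges(double x)
{
    return x + 1.0 != x;
}
```

With limit = std::ldexp(1.0, 53), that is 2^53, incrementChanges(limit - 1.0) should be true and incrementChanges(limit) false, because 2^53 + 1 lies exactly halfway between two representable values and rounds back to 2^53.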
Per the IEEE-754 standard, operations that produce NaN as a result include:
operations on a NaN,
multiplication of zero by an infinity,
subtraction of two infinities of the same sign or addition of two infinities of opposite signs,
division of zero by zero or an infinity by an infinity,
remainder when the divisor is zero or the dividend is infinity,
square root of a value less than zero,
various exceptions in some utility and mathematical routines (such as pow, see IEEE-754 9.2, 5.3.2, and 5.3.3).
C++ implementations do not always conform to IEEE-754, but these are generally good guidelines for sources of NaNs.
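Several entries from the list can be reproduced directly. This is only a sketch: it assumes an IEEE-754 conforming implementation compiled without options such as -ffast-math, and the function names nanExamples and allNaN are invented for the example:

```cpp
#include <cmath>
#include <limits>
#include <vector>

// Collects the results of operations that IEEE-754 defines to produce NaN.
std::vector<double> nanExamples()
{
    const double inf = std::numeric_limits<double>::infinity();
    const double zero = 0.0;
    return {
        inf - inf,             // subtraction of infinities of the same sign
        inf + (-inf),          // addition of infinities of opposite signs
        zero * inf,            // zero times infinity
        zero / zero,           // zero divided by zero
        inf / inf,             // infinity divided by infinity
        std::fmod(1.0, zero),  // remainder with a zero divisor
        std::fmod(inf, 1.0),   // remainder with an infinite dividend
        std::sqrt(-1.0),       // square root of a negative value
    };
}

// True if every element is NaN.
bool allNaN(const std::vector<double>& values)
{
    for (double v : values)
        if (!std::isnan(v))
            return false;
    return true;
}
```

Note that NaN never compares equal to anything, including itself, so std::isnan is the reliable way to test for it.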

Display of Double Precision Floating Points Vs Their Comparison

Preamble
I am looking into a system developed to be used by people who don't understand floating point arithmetic. For this reason the implementation of comparison for floating point numbers is not exposed to the people using the system. Currently comparisons of floating point numbers occur like this (And this cannot change due to legacy reasons):
// If either number is not finite, do default comparison
if (!IsFinite(num1) || !IsFinite(num2)) {
output = (num1 == num2);
} else {
// Get exponents of both numbers to determine epsilon for comparison
tmp = (OSINT32*)&num1+1;
exp1 = (((*tmp)>>20)& 0x07ff) - 1023;
tmp = (OSINT32*)&num2+1;
exp2 = (((*tmp)>>20)& 0x07ff) - 1023;
// Check if exponent is the same
if (exp1 != exp2) {
output = false;
} else {
// Calculate epsilon based on the magic number 47 (presumably calculated experimentally)?
epsilon = pow(2.0,exp1-47);
output = (fabs(num2-num1) <= epsilon);
}
}
The crux of it is that we calculate the epsilon based on the exponent of the number, to stop users of the interface from making floating point comparison mistakes. A BIG NOTE: This is for people who are not software programmers, so when they do pow(sqrt(2), 2) == 2 they don't get a big surprise. Maybe this is not the best idea, but like I said, it cannot be changed.
The Problem
We are having trouble figuring out how to display numbers to the user. In the past they simply displayed the number to 15 significant digits. But this results in problems of the following type:
>> SHOW 4.1 MOD 1
>> 0.099999999999999996
>> SHOW (4.1 MOD 1) == 0.1
>> TRUE
The comparison calls this correct because of the generated epsilon. But the printing of the number is confusing for people: how is 0.099999999999999996 = 0.1? We need a way to show the number such that it represents the shortest number of significant bits to which a number compared to it would be TRUE. So for 0.099999999999999996 this would be 0.1, for 0.569999999992724327 it would be 0.569999999992725.
Is this possible?
You could calculate (num - pow(2.0, exp - 47)) and (num + pow(2.0, exp - 47)), convert both to string and search the smallest decimal between the range.
The exact value of a (normal) double is mantissa * pow(2.0, exp - 52) with an integer mantissa between pow(2, 52) and pow(2, 53) - 1, so if you add/subtract pow(2.0, exp - 47) you change the mantissa by 2^5, which should be exactly representable without rounding errors (except in corner cases where the mantissa under/overflows, i.e. if it is < pow(2, 52) + pow(2, 5) or >= pow(2, 53) - pow(2, 5); you might want to check for these*).
Then you have two strings; search for the first position where the digits differ and cut it off there. There are a lot of rounding cases, though, especially when you want not just a correct number in the range but the number closest to the input number (that might not be needed). For example, if you get "1.23" and "1.24", you might even want to output "1.235".
This also shows that your example is wrong. epsilon for 0.569999999992724327 is (to maximal precision) 0.000000000000003552713678800500929355621337890625. The ranges are 0.569999999992720773889232077635824680328369140625 to 0.569999999992727879316589678637683391571044921875 and would be cut off at 0.569999999992725 (or 0.569999999992723 if you prefer that rounding)
An easier to implement sledgehammer method would be to output it to the maximal precision, cut one digit off, convert it back to double, check if it compares correctly. Then continue cutting, till the comparison fails. (could be improved with a binary search)
* They should still be exactly representable, but your comparison method will behave very oddly. Consider num1 == 1 and num2 == 1 - pow(2.0, -53) = 0.99999999999999988897769753748434595763683319091796875. Their difference 0.00000000000000011102230246251565404236316680908203125 is below your epsilon 0.000000000000003552713678800500929355621337890625, but the comparison will say they differ, because they have different exponents.
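A variant of the sledgehammer method, searching upward from one significant digit instead of cutting digits off, might look like the sketch below. legacyEqual is a hypothetical stand-in for the comparison in the question: it uses std::ilogb rather than the original bit manipulation, so it may differ for subnormal numbers, and it compares zeros with plain ==. The default "C" locale is assumed.

```cpp
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical stand-in for the legacy comparison: equal binary exponents
// and a difference of at most 2^(exponent - 47).
bool legacyEqual(double a, double b)
{
    if (!std::isfinite(a) || !std::isfinite(b) || a == 0.0 || b == 0.0)
        return a == b;
    const int expA = std::ilogb(a);
    if (expA != std::ilogb(b))
        return false;
    return std::fabs(b - a) <= std::ldexp(1.0, expA - 47);
}

// Tries 1..17 significant digits and returns the first decimal string
// that the comparison above treats as equal to num.
std::string shortestDisplay(double num)
{
    char buffer[64];
    for (int digits = 1; digits <= 17; ++digits) {
        std::snprintf(buffer, sizeof buffer, "%.*g", digits, num);
        if (legacyEqual(std::strtod(buffer, nullptr), num))
            break;
    }
    return buffer;
}
```

Because 17 significant digits always round-trip a double exactly, the loop ends with a usable string even when no shorter one passes the comparison.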
Yes, it's possible.
double a=fmod(4.1,1);
cerr<<std::setprecision(0)<<a<<"\n";
cerr<<std::setprecision(10)<<a<<"\n";
cerr<<std::setprecision(20)<<a<<"\n";
produces:
0.1
0.1
0.099999999999999644729
I think you just need to determine what level of display precision corresponds to your epsilon value.
We need a way to show the number such that it represents the shortest
number of significant bits to which a number compared to it would be
TRUE.
Can't you just do it the brute-force-ish way?
const int MAX_PRECISION = 20; // assumed upper bound on the decimal places to try
float num = 0.09999999;
for (int precision = 0; precision < MAX_PRECISION; ++precision) {
std::stringstream str;
float tmp = 0;
str << std::fixed << std::setprecision(precision) << num;
str >> tmp;
if (num == tmp) {
std::cout << std::fixed << std::setprecision(precision) << num;
break;
}
}
It is not possible to avoid confusing users given the constraints you've specified. For one thing, 0.0999999999999996447 compares equal to 0.1, and 0.1000000000000003664 compares equal to 0.1, but 0.0999999999999996447 does not compare equal to 0.1000000000000003664. For another, 2.00000000000001421 compares equal to 2.0, but 1.999999999999999778 does not compare equal to 2.0 even though it's much closer to 2.0 than 2.00000000000001421 is.
Enjoy.

comparison function in g++

What is wrong with my code? It converts inches and feet to meters and compares the results. If I enter 12 for inches and 1 for feet, it says that the numbers are not equal. Is this a known issue with g++? Can somebody explain this to me?
#include <iostream>
#include <cmath>
using namespace std;
int main()
{
double in, ft, m1, m2;
cin >> in >> ft;
m1 = in * 0.0254;
m2 = ft * 0.3048;
cout << m1 << '\t' << m2 << '\n' << endl;
// to show that both numbers are equal
if (m1 == m2) cout << "yay";
else cout << "boo";
}
Does anybody else have this issue?
@Josh, add this to your code and run it:
cout << m2-m1;
You will be surprised: the answer is not zero.
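For this particular code, changing the data type from double to float makes the comparison succeed, but only because both products then happen to round to the same float value; it is not a general fix.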
float in, ft, m1, m2;
The reason that the numbers don't match is that computers use a binary representation of numbers which leads to inaccuracies when trying to represent decimal numbers.
You think the number is 0.3048 (because that's what you coded) - but when compiled, the computer can only represent this as the nearest equivalent in binary format (see IEEE floating point for more info). So the number might be something extremely close to 0.3048, but not precisely that.
After you've done your calculations, you compare the numbers - but if the two are not absolutely identical in their binary representations, they won't match.
One simple way to solve it (but by no means the only solution) is to subtract the two operands and check how close to zero the result is. If:
fabs(a - b) < 0.00001
(an arbitrary amount), then you can presume the values are the same.
What you're seeing is a result of inexact floating point representation. Base 2^n floating point numbers cannot represent all base 10 decimal values exactly. Thus, when you do something simple like multiplying 12*0.0254 you get the very odd result of 0.3047999.......6, whereas if you compute 1*0.3048 you get the expected result of 0.3048. The problem is that 0.0254 isn't being stored exactly; instead, the closest approximate value (something like 0.0253999999....98) is used. The difference is small but can become noticeable when you use the inexact value in a calculation and then compare the result to a value that has not gone through the same calculation, such as 0.3048. A basic rule to keep in mind is that you should avoid comparing computed floating point values for exact equality; instead, compare them in a manner that allows for an acceptable error, e.g. instead of comparing values in the following manner:
if(val1 == val2)...
use something like
if(std::fabs(val1 - val2) < 0.0000001)...
so that the two variables will be considered equal if their values differ by less than 1/10,000,000 (which is pretty close :-).
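A fixed threshold works for values around 1 but becomes too loose for tiny numbers and too strict for huge ones. One common alternative is to scale the tolerance with the operands. The sketch below uses a hypothetical helper nearlyEqual, and its default tolerances are arbitrary assumptions rather than recommended values:

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper: true when a and b differ by at most absTol, or by at
// most relTol times the larger of their magnitudes.
bool nearlyEqual(double a, double b, double relTol = 1e-9, double absTol = 1e-12)
{
    const double diff = std::fabs(a - b);
    return diff <= absTol
        || diff <= relTol * std::max(std::fabs(a), std::fabs(b));
}
```

In the program from the question, replacing m1 == m2 with nearlyEqual(m1, m2) would print "yay" for 12 inches and 1 foot.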

Round a double number when printing to cout

I want to print only one digit after the decimal dot.
So I write:
double number1 = -0.049453;
double number2 = -0.05000;
cout.setf(ios::fixed,ios::floatfield);
cout.precision(1);
cout << number1 << endl;
cout << number2 << endl;
My output is:
-0.0
-0.1
I want that the first line will be 0.0 (cause -0.0 is 0.0). How should I change my code that in a case of -0.0, it will print 0.0?
And about the second line, why doesn't it print 0.0 (or -0.0)?
Any help appreciated!
When floating point numbers are printed, they are rounded up in magnitude if they are greater than or equal to the "middle" of two possible outputs, so -0.05 becomes -0.1 intentionally.
Negative zero is the result if a negative number is rounded up to zero, also intentionally. The internal floating point format (IEEE 754) distinguishes between negative and positive zero, and it seems like C++ output streams stick to this behavior.
You can overcome both problems by separating printing the sign and the absolute value of the floating point number.
cout << (number <= -.05 ? "-" : "") << abs(number);
Explanation of the comparison: For numbers equal to or smaller than -0.05, rounding abs(number) (which is then >= 0.05) will be non-zero, so only then you want the sign to appear.
Note that the threshold value in this comparison depends on the output format you chose. So you should keep this printing method as local as possible in your code, where you know the exact format in which you want to print floating point numbers.
Note: float and double behave exactly the same regarding these points.
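An alternative that does not depend on a format-specific threshold is to format the number first and drop the sign only when every printed digit is zero. This is a different technique from the sign/abs split above. The helper name formatWithoutNegativeZero is hypothetical, and the default "C" locale (with '.' as the decimal point) is assumed:

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: formats value in fixed notation with the given number
// of decimals, dropping a leading minus sign when the text is all zeros.
std::string formatWithoutNegativeZero(double value, int decimals)
{
    std::ostringstream out;
    out << std::fixed << std::setprecision(decimals) << value;
    std::string text = out.str();
    // Only '-', '0' and '.' present means the printed value is a negative zero.
    if (!text.empty() && text[0] == '-' &&
        text.find_first_not_of("-0.") == std::string::npos)
        text.erase(0, 1);
    return text;
}
```

With one decimal, -0.049453 is then shown as 0.0 while -0.05 is still shown as -0.1.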