I wrote this code to overload the unary operator- on a matrix class:
const RegMatrix RegMatrix::operator-() const {
    RegMatrix result(numRow, numCol);
    for (int i = 0; i < numRow; ++i) {
        for (int j = 0; j < numCol; ++j) {
            result.setElement(i, j, -_matrix[i][j]);
        }
    }
    return result;
}
When I ran my program under the debugger in Visual Studio, it showed that when the operation is applied to a double equal to zero, the value stored in the result matrix is -0.00000.
Is this just a quirk of the VS display, or is it something I should handle carefully?
Signed zero is zero with an associated
sign. In ordinary arithmetic, −0 = +0
= 0. However, in computing, some number representations allow for the
existence of two zeros, often denoted
by −0 (negative zero) and +0 (positive
zero). This occurs in some signed
number representations for integers,
and in most floating point number
representations. The number 0 is
usually encoded as +0, however it can
be represented by either +0 or −0.
The IEEE 754 standard for floating
point arithmetic (presently used by
most computers and programming
languages that support floating point
numbers) requires both +0 and −0. The
zeroes can be considered as a variant
of the extended real number line such
that 1/−0 = −∞ and 1/+0 = +∞, division
by zero is only undefined for ±0/±0.
Negatively signed zero echoes the
mathematical analysis concept of
approaching 0 from below as a
one-sided limit, which may be denoted
by x → 0⁻, x → 0₋, or x → ↑0. The
notation "−0" may be used informally
to denote a small negative number that
has been rounded to zero. The concept
of negative zero also has some
theoretical applications in
statistical mechanics and other
disciplines.
It is claimed that the inclusion of
signed zero in IEEE 754 makes it much
easier to achieve numerical accuracy
in some critical problems,[1] in
particular when computing with complex
elementary functions.[2] On the other
hand, the concept of signed zero runs
contrary to the general assumption
made in most mathematical fields (and
in most mathematics courses) that
negative zero is the same thing as
zero. Representations that allow
negative zero can be a source of
errors in programs, as software
developers do not realize (or may
forget) that, while the two zero
representations behave as equal under
numeric comparisons, they are
different bit patterns and yield
different results in some operations.
For more information, see the Signed zero Wikipedia page.
With double (IEEE 754), both a positive and a negative zero are defined.
Doubles actually have distinct representations for '0.0' and '-0.0', so I think this makes perfect sense.
What different result did you expect?
As ereOn said, you've got a negative zero:
#include <stdio.h>
int main()
{
printf("%f\n", -0.0);
}
-0 and 0 compare equal, and in most code this is nothing to worry about. Floating-point numbers can represent both a positive and a negative 0, for mathematical reasons, but -0 is treated the same way as 0 in ordinary C/C++ arithmetic and comparisons.
Due to the approximate nature of floating point, it's possible for two different pairs of operands to produce the same result.
Example:
#include <iostream>
int main() {
std::cout.precision(100);
double a = 0.5;
double b = 0.5;
double c = 0.49999999999999994;
std::cout << a + b << std::endl; // output "exact" 1.0
std::cout << a + c << std::endl; // output "exact" 1.0
}
But is this also possible with subtraction? That is, are there two different pairs of values (sharing one value) whose difference is 0.0?
i.e. a - b = 0.0 and a - c = 0.0, for some pairs a, b and a, c with b != c?
The IEEE-754 standard was deliberately designed so that subtracting two values produces zero if and only if the two values are equal, except that subtracting an infinity from itself produces NaN and/or an exception.
Unfortunately, C++ does not require conformance to IEEE-754, and many C++ implementations use some features of IEEE-754 but do not fully conform.
A not uncommon behavior is to “flush” subnormal results to zero. This is part of a hardware design to avoid the burden of handling subnormal results correctly. If this behavior is in effect, the subtraction of two very small but different numbers can yield zero. (The numbers would have to be near the bottom of the normal range, having some significand bits in the subnormal range.)
Sometimes systems with this behavior may offer a way of disabling it.
Another behavior to beware of is that C++ does not require floating-point operations to be carried out precisely as written. It allows “excess precision” to be used in intermediate operations and “contractions” of some expressions. For example, a*b - c*d may be computed by using one operation that multiplies a and b and then another that multiplies c and d and subtracts the result from the previously computed a*b. This latter operation acts as if c*d were computed with infinite precision rather than rounded to the nominal floating-point format. In this case, a*b - c*d may produce a non-zero result even though a*b == c*d evaluates to true.
Some C++ implementations offer ways to disable or limit such behavior.
The gradual underflow feature of the IEEE floating-point standard prevents this. Gradual underflow is achieved with subnormal (denormal) numbers, which are spaced evenly (as opposed to logarithmically, like normal floating-point numbers) and sit between the smallest negative and positive normal numbers, with the zeros in the middle. As they are evenly spaced, the addition of two subnormal numbers of opposite sign (i.e. subtraction towards zero) is exact and therefore won't produce what you ask. The smallest subnormal equals the smallest distance between adjacent normal numbers, so any subtraction between unequal normal numbers yields a result at least as large in magnitude as the smallest subnormal, and it cannot round to zero.
If you disable IEEE conformance using a special denormals-are-zero (DAZ) or flush-to-zero (FTZ) mode of the CPU, then indeed you could subtract two small, close numbers which would otherwise result in a subnormal number, which would be treated as zero due to the mode of the CPU. A working example (Linux):
_MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON); // system specific
double d = std::numeric_limits<double>::min(); // smallest normal
double n = std::nextafter(d, 10.0); // second smallest normal
double z = d - n; // a negative subnormal (flushed to zero)
std::cout << (z == 0) << '\n' << (d == n);
This should print
1
0
The first line, 1, indicates that the result of the subtraction is exactly zero, while the second line, 0, indicates that the operands are not equal.
Unfortunately the answer is dependent on your implementation and the way it is configured. C and C++ don't demand any specific floating point representation or behavior. Most implementations use the IEEE 754 representations, but they don't always precisely implement IEEE 754 arithmetic behaviour.
To understand the answer to this question we must first understand how floating point numbers work.
A naive floating point representation would have an exponent, a sign and a mantissa. Its value would be
$(-1)^s \cdot 2^{e - e_0} \cdot (m / 2^M)$
Where:
s is the sign bit, with a value of 0 or 1.
e is the exponent field.
$e_0$ is the exponent bias. It essentially sets the overall range of the floating point number.
M is the number of mantissa bits.
m is the mantissa, with a value between 0 and $2^M - 1$.
This is similar in concept to the scientific notation you were taught in school.
However, this format has many different representations of the same number; nearly a whole bit's worth of encoding space is wasted. To fix this we can add an "implicit 1" to the mantissa.
$(-1)^s \cdot 2^{e - e_0} \cdot (1 + m / 2^M)$
This format has exactly one representation of each number. However, there is a problem with it: it can't represent zero or numbers close to zero.
To fix this, IEEE floating point reserves a couple of exponent values for special cases. An exponent value of zero is reserved for representing small numbers known as subnormals. The highest possible exponent value is reserved for NaNs and infinities (which I will ignore in this post since they aren't relevant here). So the definition now becomes:
$(-1)^s \cdot 2^{1 - e_0} \cdot (m / 2^M)$ when $e = 0$
$(-1)^s \cdot 2^{e - e_0} \cdot (1 + m / 2^M)$ when $0 < e < 2^E - 1$ (E being the number of exponent bits)
With this representation smaller numbers always have a step size that is less than or equal to that for larger ones. So provided the result of the subtraction is smaller in magnitude than both operands it can be represented exactly. In particular results close to but not exactly zero can be represented exactly.
This does not apply if the result is larger in magnitude than one or both of the operands, for example subtracting a small value from a large value or subtracting two values of opposite signs. In those cases the result may be imprecise but it clearly can't be zero.
Unfortunately FPU designers cut corners. Rather than including the logic to handle subnormal numbers quickly and correctly they either did not support (non-zero) subnormals at all or provided slow support for subnormals and then gave the user the option to turn it on and off. If support for proper subnormal calculations is not present or is disabled and the number is too small to represent in normalized form then it will be "flushed to zero".
So in the real world under some systems and configurations subtracting two different very-small floating point numbers can result in a zero answer.
Excluding funny numbers like NAN, I don't think it's possible.
Let's say a and b are normal finite IEEE 754 floats, and |a - b| is less than or equal to both |a| and |b| (otherwise it's clearly not zero).
That means the exponent of the result is <= both a's and b's exponents, so the absolute precision is at least as high, which makes the result of the subtraction exactly representable. Consequently, if a - b == 0, the true difference is exactly zero, so a == b.
Consider the following C++ code:
double someZero = 0;
std::cout << 0 - someZero << '\n'; // prints 0
std::cout << -someZero << std::endl; // prints -0
The question arises: what is negative zero good for, and should it be defensively avoided (i.e. use subtraction instead of smacking a minus onto a variable)?
From Wikipedia:
It is claimed that the inclusion of signed zero in IEEE 754 makes it much easier to achieve numerical accuracy in some critical problems[1], in particular when computing with complex elementary functions[2].
The first reference is "Branch Cuts for Complex Elementary Functions or Much Ado About Nothing's Sign Bit" by W. Kahan, that is available for download here.
One example from that paper is 1/(+0) vs 1/(-0). Here, the sign of zero makes a huge difference, since the first expression equals +inf and the second, -inf.
In addition, signed zero is good for the following:
The zeroes can be considered as a variant of the extended real number line such that 1/−0 = −∞ and 1/+0 = +∞, division by zero is only undefined for ±0/±0.
Negatively signed zero echoes the mathematical analysis concept of approaching 0 from below as a one-sided limit, which may be denoted by x → 0⁻, x → 0₋, or x → ↑0. The notation "−0" may be used informally to denote a small negative number that has been rounded to zero. The concept of negative zero also has some theoretical applications in statistical mechanics and other disciplines.
There are only two real use-cases that I can see:
You want to show that a value is negative but very, very small (perhaps infinitesimal), i.e. too small to represent as a float or double.
You are working with math that only allows negatives, but still want to display zero. There are a few cases in physics, complex numbers and number theory where this can be useful.
For the most part, it's not useful and should be avoided.
You may also want to take a look at this question: Is there a negative zero? and the IEEE 754 spec for floating point.
I'm making a measuring app, and the -0 is very useful for mixed numbers (such as separating into feet and inches).
Imagine that we have a variable "length" that we're trying to separate into "feet" and "inches".
(This is Java code, but the same idea holds for C++.)
feet = Math.signum(length) * Math.floor(Math.abs(length / 12));
// could also do feet = length>0 ? Math.floor(length / 12) : Math.ceil(length / 12)
inches = Math.abs(length) % 12;
If the length is between -1 feet and 0 feet, we'd want it to say -0 for the feet so we know it's negative.
Negative zero has for example some use when handling complex numbers...
In everyday use one should mostly avoid the negative zero.
Some links with information regarding background/uses/pitfalls of "negative zero":
http://en.wikipedia.org/wiki/Signed_zero
http://en.wikipedia.org/wiki/Floating_point#Signed_zero
http://en.wikipedia.org/wiki/Branch_cut
http://connect.microsoft.com/VisualStudio/feedback/details/344366/negative-zero-behavior-between-c-and-c-code-is-different
http://connect.microsoft.com/VisualStudio/feedback/details/292276/in-vs2005-c-zero-reported-as-negative-zero-for-double-type
C++ ceil and negative zero
I know that the integer values 0 and -0 are essentially the same.
But, I am wondering if it is possible to differentiate between them.
For example, how do I know if a variable was assigned -0?
bool IsNegative(int num)
{
// How ?
}
int num = -0;
int addition = 5;
num += (IsNegative(num)) ? -addition : addition;
Is the value -0 saved in the memory the exact same way as 0?
It depends on the machine you're targeting.
On a machine that uses a 2's complement representation for integers there's no difference at bit level between 0 and -0 (they have the same representation).
If your machine used one's complement, you definitely could tell them apart:
0000 0000 -> signed 0
1111 1111 -> signed −0
Obviously we're talking about using native support, x86 series processors have native support for the two's complement representation of signed numbers. Using other representations is definitely possible but would probably be less efficient and require more instructions.
(As JerryCoffin also noted: even if one's complement has been considered mostly for historical reasons, signed magnitude representations are still fairly common and do have a separate representation for negative and positive zero)
For an int (in the almost-universal "2's complement" representation) the representations of 0 and -0 are the same. (They can be different for other number representations, eg. IEEE 754 floating point.)
Let's begin with representing 0 in 2's complement (of course there exist many other systems and representations; here I'm referring to this specific one). Assuming 8 bits, zero is:
0000 0000
Now let's flip all the bits and add 1 to get the 2's complement:
1111 1111 (flip)
0000 0001 (add one)
---------
0000 0000
We get 0000 0000 again, so that's the representation of -0 as well.
But note that in 1's complement, signed 0 is 0000 0000, but -0 is 1111 1111.
I've decided to leave this answer up since C and C++ implementations are usually closely related, but in fact it doesn't defer to the C standard as I thought it did. The point remains that the C++ standard does not specify what happens for cases like these. It's also relevant that non-twos-complement representations are exceedingly rare in the real world, and that even where they do exist they often hide the difference in many cases rather than exposing it as something someone could easily expect to discover.
The behavior of negative zeros in the integer representations in which they exist is not as rigorously defined in the C++ standard as it is in the C standard. It does, however, cite the C standard (ISO/IEC 9899:1999) as a normative reference at the top level [1.2].
In the C standard [6.2.6.2], a negative zero can only be the result of bitwise operations, or operations where a negative zero is already present (for example, multiplying or dividing negative zero by a value, or adding a negative zero to zero) - applying the unary minus operator to a value of a normal zero, as in your example, is therefore guaranteed to result in a normal zero.
Even in the cases that can generate a negative zero, there is no guarantee that they will, even on a system that does support negative zero:
It is unspecified whether these cases actually generate a negative zero or a normal zero, and whether a negative zero becomes a normal zero when stored in an object.
Therefore, we can conclude: no, there is no reliable way to detect this case, even setting aside the fact that non-twos-complement representations are very uncommon in modern computer systems.
The C++ standard, for its part, makes no mention of the term "negative zero", and has very little discussion of the details of signed magnitude and one's complement representations, except to note [3.9.1 para 7] that they are allowed.
If your machine has distinct representations for -0 and +0, then memcmp will be able to distinguish them.
If padding bits are present, there might actually be multiple representations for values other than zero as well.
In the C++ language specification, there is no such int as negative zero.
The only meaning those two words have is the unary operator - applied to 0, just as three plus five is just the binary operator + applied to 3 and 5.
If there were a distinct negative zero, two's complement (the most common representation of integers types) would be an insufficient representation for C++ implementations, as there is no way to represent two forms of zero.
In contrast, floating points (following IEEE) have separate positive and negative zeroes. They can be distinguished, for example, when dividing 1 by them. Positive zero produces positive infinity; negative zero produces negative infinity.
However, if there happen to be different memory representations of the int 0 (or any int, or any other value of any other type), you can use memcmp to discover that:
#include <cstring>  // declares std::memcmp
int main() {
    int a = ...
    int b = ...
    if (std::memcmp(&a, &b, sizeof(int)) != 0) {
        // a and b have different representations in memory
    }
}
Of course, if this did happen, outside of direct memory operations, the two values would still work in exactly the same way.
To simplify, I found it easier to visualize.
A 32-bit int is stored in 32 bits, which means 2^32 = 4,294,967,296 unique values. Thus:
unsigned int data range is 0 to 4,294,967,295
For signed values, the range depends on how negative numbers are stored:
Two's complement: –2,147,483,648 to 2,147,483,647
One's complement: –2,147,483,647 to 2,147,483,647
With one's complement, the value -0 exists.
When comparing doubles for equality, we need to give a tolerance level, because floating-point computation might introduce errors. For example:
double x;
double y;
x = f();
y = g();
if (fabs(x-y)<epsilon) {
// they are equal!
} else {
// they are not!
}
However, if I simply assign a constant value, without any computation, do I still need to check the epsilon?
double x = 1;
double y = 1;
if (x==y) {
// they are equal!
} else {
// no they are not!
}
Is == comparison good enough? Or do I need to do fabs(x-y)<epsilon again? Is it possible to introduce error in assignment? Am I too paranoid?
How about casting (double x = static_cast<double>(100))? Is that going to introduce floating-point error as well?
I am using C++ on Linux, but if it differs by language, I would like to understand that as well.
Actually, it depends on the value and the implementation. The C++ standard (draft n3126) has this to say in 2.14.4 Floating literals:
If the scaled value is in the range of representable values for its type, the result is the scaled value if representable, else the larger or smaller representable value nearest the scaled value, chosen in an implementation-defined manner.
In other words, if the value is exactly representable (and 1 is, in IEEE754, as is 100 in your static cast), you get the value. Otherwise (such as with 0.1) you get an implementation-defined close match (a). Now I'd be very worried about an implementation that chose a different close match based on the same input token but it is possible.
(a) Actually, that paragraph can be read in two ways, either the implementation is free to choose either the closest higher or closest lower value regardless of which is actually the closest, or it must choose the closest to the desired value.
If the latter, it doesn't change this answer, however, since all you have to do is hardcode a floating point value exactly at the midpoint of two representable values and the implementation is once again free to choose either.
For example, it might alternate between the next higher and next lower for the same reason banker's rounding is applied - to reduce the cumulative errors.
No, if you assign literals they should be the same :)
Also if you start with the same value and do the same operations, they should be the same.
Floating point values are non-exact, but the operations should produce consistent results :)
Both cases are ultimately subject to implementation defined representations.
Storage of floating point values and their representations take on many forms: loaded by address or as a constant? Optimized out by fast math? What is the register width? Is it stored in an SSE register? Many variations exist.
If you need precise behavior and portability, do not rely on this implementation defined behavior.
IEEE-754, the standard that common floating-point implementations abide by, requires floating-point operations to produce the representable value nearest to the infinitely precise result. Thus the only imprecision that you will face is rounding after each operation you perform, as well as propagation of rounding errors from the operations performed earlier in the chain. Floats are not per se inexact. And by the way, epsilon can and should be computed; you can consult any numerics book on that.
Floating point numbers can represent integers precisely up to the length of their mantissa. So for example if you cast from an int to a double, it will always be exact (assuming the usual 32-bit int, which fits in a double's 53-bit significand), but when casting into a float, it will no longer be exact for very large integers.
There is one major example of extensive use of floating point numbers as a substitute for integers: the Lua scripting language, which (prior to version 5.3) has no built-in integer type, so floating-point numbers are used extensively for logic, flow control, etc. The performance and storage penalty from using floating-point numbers turns out to be smaller than the penalty of resolving multiple types at run time, and it makes the implementation lighter. Lua has been used extensively not only on PC, but also on game consoles.
Now, many compilers have an optional switch that disables IEEE-754 compatibility. Then compromises are made. Denormalized numbers (very very small numbers where the exponent has reached smallest possible value) are often treated as zero, and approximations in implementation of power, logarithm, sqrt, and 1/(x^2) can be made, but addition/subtraction, comparison and multiplication should retain their properties for numbers which can be exactly represented.
The easy answer: For constants == is ok.
There are two exceptions which you should be aware of:
First exception:
0.0 == -0.0
There is a negative zero which compares equal to positive zero under the IEEE 754 standard. This means
1/INFINITY == 1/-INFINITY which breaks f(x) == f(y) => x == y
Second exception:
NaN != NaN
This is a special property of NaN (Not a Number) which makes it possible to find out whether a number is a NaN
on systems which do not have a test function available (yes, that happens).