Two's Complement disadvantage? - twos-complement

I was reading about two's complement, I understand this method is most efficient, but there might be some disadvantages too. I could not find any disadvantages, Is there any situation where the conversion to two's complement could fail to represent the number correctly?

Two's complement is awesome - that's why everyone uses it. The biggest disadvantage is that if you try to negate the lowest representable value, you get an overflow. With one's complement or sign and magnitude, that doesn't happen.

With "two complement"-notation you can't compare the size of two Integers with very simple logic operators (at lowest level hardware). Thats the reason why the exponent in IEEE Standard for Floating-Point Arithmetic (IEEE 754) is not represented in "two complement" but in "biased" notation.

Related

Numeric Limits of Float Lowest and Max

In what real case does c++ std::numeric_limits<float>::lowest() not equal the negative of std::numeric_limits<float>::max()?
std::numeric_limits<float>::lowest() will equal the negative of std::numeric_limits<float>::max() if the floating point representation uses a signed magnitude, as the max and lowest values will have identical significand (ignoring the sign bit) and exponent, with both containing their maximum values. The sign bit will then determine whether the number is positive or negative, and will not affect its magnitude. (Another concequence of this is that the underlying representation will be able to represent both positive and negative zero.) IEEE 754 uses signed magnitude representation, so this is what you are most likely to encounter in practice.
However, an alternative representation is to use a two's complement significand. A property of two's complement is that the maximal negative value has a greater magnitude than the maximal positive value, and so for a floating point representation using a two's complement significand, std::numeric_limits<float>::lowest() will not equal the negative of std::numeric_limits<float>::max(). An example of a real system using a two's complement significand floating point format is the IBM 1130, circa 1970.
Now, cppreference.com does state:
While it's not true for fundamental C++ floating-poing types, a third-party floating-point type T may exist such that std::numeric_limits<T>::lowest() != -std::numeric_limits<T>::max().
But I'm not sure what the basis of that statement is, as the C++11 standard, paragraph 18.3.2.4, merely specifies that lowest() is:
A finite value x such that there is no other finite value y where y < x.
Meaningful for all specializations in which is_bounded != false.
It also notes in a footnote:
lowest() is necessary because not all floating-point representations have a smallest (most negative) value that is the
negative of the largest (most positive) finite value.
So it would seem that a two's complement floating point representation could be used by a C++ implementation, and that would have std::numeric_limits<float>::lowest() not equal to the negative of std::numeric_limits<float>::max().

Signed type representation in c++

In the book I am reading it says that:
The standard does not define how signed types are represented, but does specify that range should be evenly divided between positive and negative values. Hence, an 8-bit signed char is guaranteed to be able to hold values from -127 through 127; most modern machines use representations that allow values from -128 through 127.
I presume that [-128;127] range arises from method called "twos-complement" in which negative number is !A+1 (e.g. 0111 is 7, and 1001 is then -7). But I cannot wrap my head around why in some older(?) machines the values range [-127;127]. Can anyone clarify this?
Both one's complement and signed magnitude are representations that provide the range [-127,127] with an 8 bit number. Both have a different representation for +0 and -0. Both have been used by (mostly) early computer systems.
The signed magnitude representation is perhaps the simplest for humans to imagine and was probably used for the same reason as why people first created decimal computers, rather than binary.
I would imagine that the only reason why one's complement was ever used, was because two's complement hadn't yet been considered by the creators of early computers. Then later on, because of backwards compatibility. Although, this is just my conjecture, so take it with a grain of salt.
Further information: https://en.wikipedia.org/wiki/Signed_number_representations
As a slightly related factoid: In the IEEE floating point representation, the signed exponent uses excess-K representation and the fractional part is represented by signed magnitude.
It's not actually -127 to 127. But -127 to -0 and 0 to 127.
Earlier processor used two methods:
Signed magnitude: In this a a negative answer is form by putting 1 at the most significant bit. So 10000000 and 00000000 both represent 0
One's complement: Just applying not to positive number. This cause two zero representation: 11111111 and 00000000.
Also two's complement is nearly as old as other two. https://www.linuxvoice.com/edsac-dennis-wheeler-and-the-cambridge-connection/

Does RISC-V mandate two's complement or one's complement signedness, or is it implementation-determined?

I have looked through the ISA spec and searched the internet for the answer to this, but I could not find it.
In the RISC-V ISA, should negative numbers be represented with one's complement or two's complement? Or, is this decision left to implementors?
The reason I ask is that I am writing an RV32I simulator, and this would affect how I store negative numbers in the simulated memory, for example.
The RISC-V architecture requires twos-complement integer arithmetic. This can be most directly seen from the fact that it specifies a single addition instruction, not a pair of signed and unsigned addition instructions. In twos-complement arithmetic, signed and unsigned addition are the same operation; in ones-complement (and sign-magnitude) they are not the same.
It appears to me, skimming the architecture manual, that the authors considered the choice of twos-complement integer arithmetic too obvious to bother mentioning. There hasn't been a CPU manufactured in at least 25 years that used anything else.
The user-level ISA manual (page 13) notes that bitwise NOT rd, rs1 may be performed by XORI rd, rs1, -1, which would imply two's complement, if I see things correctly: XORing with the one's complement of -1 would not invert the least significant bit, while it would work correctly in two's complement.
Yes, the RISC-V instruction set architecture (ISA) mandates two's complement:
The base integer instruction sets use a two’s-complement representation for signed integer values.
(Section 1.3 RISC-V ISA Overview, page 4, The RISC-V Instruction Set Manual. Volume I, 2019-06-08, ratified)
General purpose registers x1–x31 hold values that various instructions interpret as a collection of Boolean values, or as two’s complement signed binary integers or unsigned binary integers.
(Section 2.1 Programmers' Model for Base Integer ISA, page 13, The RISC-V Instruction Set Manual. Volume I, 2019-06-08, ratified)

Should I use bit manipulation on float point numbers

I'm writing an algorithm, to round a floating number. The input will be a 64bit IEEE754 double type number, very close to X.5, where X is a integer less than 32. The first solution came into my mind is to use a bit mask, to mask off those least significant bits as they represent very small fractions of 2^-n.(Given the exponent is not large).
But the problem is should I do that? Is there any other ways to accomplish the same thing? I feel using bit operation on float point is very controversy. Thanks!
The langugage I'm using is C++ by the way.
Edit:
Thanks guys, for your comments. I appreciate! Let's say I have a float number, can be 1.4999999... or 21.50000012.... I want to round it to 1.5 or 21.5. My goal is to round any number to its nearest to X.5 form, since it can be stored in a IEEE754 float point number.
If your compiler guarantees that you are using IEEE 754 floating-point, I would recommend that you round according to the method delineated in this blog post: add, and then immediately subtract a large constant so as to send the value in the binade of floating-point numbers where the ULP is 0.5. You won't find any faster method, and it does not involve any bit manipulation.
The appropriate constant to round a number between 0 and 32 to the nearest halt-unit for IEEE 754 double-precision is 2251799813685248.0.
Summary: use x = x + 2251799813685248.0 - 2251799813685248.0;.
You can use any of the functions round(), floor(), ceil(), rint(), nearbyint(), and trunc(). All do rounding in different modes, and all are standard C99. The only thing you need to do is to link against the standard math library by specifying -lm as a compiler flag.
As to trying to achieve rounding by bit manipulations, I would stay away from that: a) it will be much slower than using the functions above (they generally use hardware facilities where possible), b) it is reinventing the wheel with a lot of potential for bugs, and c) the newer C standards don't like you doing bit manipulations on floating point types: they use the so called strict aliasing rules that disallow you to just cast a double* to an uint64_t*. You would either need to do your bit manipulation by casting to a unsigned char* and manipulating the IEEE number byte by byte, or you would have to use memcpy() to copy the bit representation from a double variable into an uint64_t and back again. A lot of hassle for something already available in the form of standardized functions and hardware support.
You want to round x to the nearest value of the form d.5. For a generan number you write:
round(x+0.5)-0.5
For a number close to d.5, less than 0.25 away, you can use Pascal's offering:
round(2*x)*0.5
If you're looking for a bit trick and are guaranteed to have doubles in the ranges you describe, then you could do something like this (inline as you see fit):
void RoundNearestHalf(double &d) {
unsigned const maskshift = ((*(unsigned __int64*)&d >> 52) - 1023);
unsigned __int64 const setmask = 0x0008000000000000 >> maskshift;
unsigned __int64 const clearmask = ~0x0007FFFFFFFFFFFF >> maskshift;
*(unsigned __int64*)&d |= setmask;
*(unsigned __int64*)&d &= clearmask;
}
maskshift is the unbiased exponent. For the input range, we know this will be non-negative and no more than 4 (the trick will work for higher values too, but no more than 51). We use this value to make a setmask which sets the 2^-1 (one-half) place in the mantissa, and clearmask which clears all bits in the mantissa of lower value than 2^-1. The result is d rounded to the nearest half.
Note that it would be worth profiling this against other implementations, perhaps using the standard library to determine whether or not its actually faster.
I can't speak about C++ for sure, but in C99 the use of IEEE 754 standard for floating point will be purely normative (not required). In C99 if the __STDC_IEC_559__ macro is set then it declares that IEC 559 (which is more or less IEEE 754) is used for floating point.
I think it should be pointed out that there are functions to handle many types of rounding for you.

Exponent in IEEE 754

Why exponent in float is displaced by 127?
Well, the real question is : What is the advantage of such notation in comparison to 2's complement notation?
Since the exponent as stored is unsigned, it is possible to use integer instructions to compare floating point values. the the entire floating point value can be treated as a signed magnitude integer value for purposes of comparison (not twos-compliment).
Just to correct some misinformation: it is 2^n * 1.mantissa, the 1 infront of the fraction is implicitly stored.
Note that there is a slight difference in the representable range for the exponent, between biased and 2's complement. The IEEE standard supports exponents in the range of (-127 to +128), while if it was 2's complement, it would be (-128 to +127). I don't really know the reason why the standard chooses the bias form, but maybe the committee members thought it would be more useful to allow extremely large numbers, rather than extremely small numbers.
#Stephen Canon, in response to ysap's answer (sorry, this should have been a follow up comment to my answer, but the original answer was entered as an unregistered user, so I cannot really comment it yet).
Stephen, obviously you are right, the exponent range I mentioned is incorrect, but the spirit of the answer still applies. Assuming that if it was 2's complement instead of biased value, and assuming that the 0x00 and 0xFF values would still be special values, then the biased exponents allow for (2x) bigger numbers than the 2's complement exponents.
The exponent in a 32-bit float consists of 8 bits, but without a sign bit. So the range is effectively [0;255]. In order to represent numbers < 2^0, that range is shifted by 127, becoming [-127;128].
That way, very small numbers can be represented very precisely. With a [0;255] range, small numbers would have to be represented as 2^0 * 0.mantissa with lots of zeroes in the mantissa. But with a [-127;128] range, small numbers are more precise because they can be represented as 2^-126 * 0.mantissa (with less unnecessary zeroes in the mantissa). Hope you get the point.