Why do signed negative integers start at the lowest value?

I can't really explain this question in words alone (probably why I can't find an answer), so I'll try to give as much detail as I can. This isn't really a practical question, I'm just curious.
So let's say we have a signed 8-bit int.
sign | value bits | sign 0 | sign 1
?    | 0000000    | (+)0   | (-)128
?    | 1111111    | (+)127 | (-)1
I don't understand why this works this way, can someone explain? In my head, it makes more sense for the value to be the same and for the sign to just put a plus or minus in front, so to me it looks backwards.

There are a couple of systems for signed integers.
One of them, sign-magnitude, is exactly what you expect: a part that says how big the number is, and a bit that either leaves the number positive or negates it. That makes the sign bit really special, substantially different from the other bits. For example:
sign-magnitude representation
0_0000000 = 0
0_0000001 = 1
1_0000001 = -1
1_0000000 = -0
This has some uncomfortable side effects, mainly that it no longer corresponds to unsigned arithmetic in a useful way (if you add two sign-magnitude integers as if they were unsigned, weird things happen, e.g. -0 + 1 = -1), which has far-reaching consequences: addition, subtraction, equality and multiplication all need special signed versions; multiplication and division by powers of two in no way correspond to bit shifts (except accidentally); and since it has no clear correspondence to Z/2^k Z, it's not immediately clear how it behaves algebraically. Also, -0 exists as a separate thing from 0, which is weird and causes different kinds of trouble depending on your semantics for it, but always some trouble.
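To make that anomaly concrete, here is a minimal sketch (the helper name is mine, not standard) that decodes an 8-bit sign-magnitude pattern and shows -0 + 1 = -1 when the raw bits are added as if they were unsigned:
#include <cstdint>
#include <iostream>

// Decode an 8-bit sign-magnitude pattern: bit 7 is the sign, bits 0-6 the magnitude.
int decodeSignMagnitude(uint8_t bits) {
    int magnitude = bits & 0x7F;
    return (bits & 0x80) ? -magnitude : magnitude;
}

int main() {
    uint8_t negZero = 0x80;          // 1_0000000, i.e. -0
    uint8_t one     = 0x01;          // 0_0000001, i.e. +1
    uint8_t sum     = negZero + one; // plain unsigned addition of the raw bits
    std::cout << decodeSignMagnitude(sum) << '\n'; // prints -1, not 1
}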
The most common system by far is two's complement, where the sign bit does not mean "times 1 or times -1" but "add 0 or add -2^(k-1)" (for a k-bit number). As with one's complement, the sign bit otherwise behaves like a completely normal bit (except with respect to division and right shift). For example:
two's complement representation (8bit)
00000000 = 0 (no surprises there)
10000000 = -128
01111111 = 127
11111111 = -1 (= -128 + 127)
etc
Now note that 11111111 + 00000001 = 0 in unsigned 8bit arithmetic anyway, and -1+1=0 is clearly desirable (in fact it is the definition of -1). So what it comes down to, at least for addition/subtraction/multiplication/left shift, is plain old unsigned arithmetic - you just print the numbers differently. Of course some operators still need special signed versions. Since it corresponds to unsigned arithmetic so closely, you can reason about additions and multiplications as if you are in Z/2^k Z with total confidence. It does have a slight oddity comparable with the existence of negative zero, namely the existence of a negative number with no positive absolute value.
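A small sketch of that correspondence: do the arithmetic on unsigned 8-bit values and only reinterpret the result as signed at the end (the cast through int8_t assumes two's complement, which is what essentially all current hardware uses):
#include <cstdint>
#include <iostream>

int main() {
    uint8_t a = 0xFF;    // 11111111, i.e. -1 when read as two's complement
    uint8_t b = 0x01;    // 00000001, i.e.  1
    uint8_t sum = a + b; // unsigned addition wraps modulo 2^8, giving 0x00
    std::cout << static_cast<int>(static_cast<int8_t>(sum)) << '\n';     // 0
    uint8_t product = static_cast<uint8_t>(0xFE * 3); // -2 * 3, done as unsigned
    std::cout << static_cast<int>(static_cast<int8_t>(product)) << '\n'; // -6
}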

The idea of keeping the value the same and just putting a plus or minus in front is a known one, called sign-magnitude (or signed-magnitude) representation. The two major problems with sign-magnitude representation are that there are two zeros (plus and minus), and that integer arithmetic becomes more complicated in the computer's algorithms.
A popular alternative for computers is a two's complement representation, which is what you are asking about. This representation makes arithmetic algorithms simpler, but looks weird when you imagine plotting the binary values along a number line, as you are doing. Two's complement also has a single zero, which takes care of the first major problem.
The signed number representations article in Wikipedia has comparison tables illustrating signed magnitude, two's complement, and three other representation systems of values in a decimal number line from -11 to +16 and in a binary value chart from 0000 to 1111.

Related

Signed type representation in C++

In the book I am reading it says that:
The standard does not define how signed types are represented, but does specify that range should be evenly divided between positive and negative values. Hence, an 8-bit signed char is guaranteed to be able to hold values from -127 through 127; most modern machines use representations that allow values from -128 through 127.
I presume that the [-128, 127] range arises from the method called "two's complement", in which a negative number is ~A + 1 (e.g. 0111 is 7, and 1001 is then -7). But I cannot wrap my head around why on some older(?) machines the range of values is [-127, 127]. Can anyone clarify this?
Both one's complement and signed magnitude are representations that provide the range [-127, 127] with an 8-bit number. Both have a different representation for +0 and -0. Both were used by (mostly early) computer systems.
The signed magnitude representation is perhaps the simplest for humans to imagine, and was probably used for the same reason people first built decimal computers rather than binary ones.
I would imagine that the only reason why one's complement was ever used, was because two's complement hadn't yet been considered by the creators of early computers. Then later on, because of backwards compatibility. Although, this is just my conjecture, so take it with a grain of salt.
Further information: https://en.wikipedia.org/wiki/Signed_number_representations
As a slightly related factoid: In the IEEE floating point representation, the signed exponent uses excess-K representation and the fractional part is represented by signed magnitude.
Strictly speaking, the range isn't -127 to 127, but rather -127 to -0 and +0 to 127.
Earlier processors used two methods:
Signed magnitude: a negative number is formed by setting the most significant bit to 1. So 10000000 and 00000000 both represent 0.
One's complement: just apply NOT to the positive number. This also causes two representations of zero: 11111111 and 00000000.
Also, two's complement is nearly as old as the other two. https://www.linuxvoice.com/edsac-dennis-wheeler-and-the-cambridge-connection/

The NOT (~) bitwise operator on signed numbers

I'm having a little trouble understanding how the NOT (~) operator works for positive signed numbers.
As an example (negative numbers are stored in two's complement):
int a=-5;
cout<<bitset<32>(a)<<endl; // 11111111111111111111111111111011
cout<<bitset<32>(~a)<<endl; // 00000000000000000000000000000100
cout<<(~a)<<endl; // 4
4 is the expected output, but:
int a=5;
cout<<bitset<32>(a)<<endl; // 00000000000000000000000000000101
cout<<bitset<32>(~a)<<endl; // 11111111111111111111111111111010
cout<<(~a)<<endl; // -6
how come -6?
The bitwise NOT (~) operator can basically be summarized as follows:
~x == -(x+1)
The reason is that negation (unary -) generally uses two's complement.
Two's complement is defined as taking the bitwise NOT and then adding one, i.e. -x == (~x) + 1. Transposing the + 1 to the other side and then distributing the negative sign (i.e. multiplying through by -1), we get the equation on top.
In a more mathematical sense:
-x == (~x) + 1 // Due to way negative numbers are stored in memory (in 2's complement)
-x - 1 == ~x // Transposing the 1 to the other side
-(x+1) == ~x // Distributing the negative sign
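A quick sketch to check the identity on a two's complement machine:
#include <iostream>

int main() {
    for (int x : {-5, 0, 5, 127}) {
        // On two's complement hardware, ~x and -(x + 1) are the same value.
        std::cout << "~(" << x << ") = " << ~x
                  << ", -(x+1) = " << -(x + 1) << '\n';
    }
}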
The C++ standard only says that the binary NOT operator (~) flips all the bits that are used to represent the value. What this means for the integer value depends on what interpretation of the bits your machine uses. The C++ standard allows the following three representations of signed integers:
sign+magnitude: There is a sign bit followed by bits encoding the magnitude.
one's complement: The idea is to represent -x by flipping all bits of x. This is similar to sign+magnitude where in the negative cases the magnitude bits are all flipped. So, 1...1111 would be -0, 1...1110 would be -1, 1...1101 would be -2 and so on...
two's complement: This representation removes one of two possible encodings for zero and adds a new value at the lower end. In two's complement 1...1111 represents -1, 1...1110 represents -2 and so on. So, it's basically one shifted down compared to one's complement.
Two's complement is much more popular these days. x86, x64, ARM and MIPS all use this representation for signed numbers, as far as I know. Your machine apparently also uses two's complement, because you observed ~5 to be -6.
If you want to stay 100% portable you should not rely on the two's complement representation and only use these kinds of operators on unsigned types.
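For example, a sketch of the portable version: flipping bits on an unsigned type, where ~ is defined purely in terms of values modulo 2^32, with no sign involved:
#include <cstdint>
#include <iostream>

int main() {
    uint32_t u = 5;
    uint32_t flipped = ~u;        // well-defined: 2^32 - 1 - 5
    std::cout << flipped << '\n'; // prints 4294967290 regardless of signed representation
}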
It is due to how the computer understands signed numbers.
Take a byte: 8 bits.
1 bit (the most significant one) is for the sign,
7 bits are for the number.
As a whole, that gives the range -128 to +127:
for negative numbers, -128 to -1;
for positive numbers, 0 to 127.
There is no need to have two zeros, therefore there is one more negative number.

Is it possible to differentiate between 0 and -0?

I know that the integer values 0 and -0 are essentially the same.
But, I am wondering if it is possible to differentiate between them.
For example, how do I know if a variable was assigned -0?
bool IsNegative(int num)
{
    // How?
}

int num = -0;
int addition = 5;
num += (IsNegative(num)) ? -addition : addition;
Is the value -0 saved in the memory the exact same way as 0?
It depends on the machine you're targeting.
On a machine that uses a 2's complement representation for integers there's no difference at bit-level between 0 and -0 (they have the same representation)
If your machine used one's complement, you could definitely distinguish them:
0000 0000 -> signed  0 
1111 1111 -> signed −0
Obviously we're talking about native support here: x86-series processors natively support the two's complement representation of signed numbers. Using other representations is definitely possible, but would probably be less efficient and require more instructions.
(As JerryCoffin also noted: even if one's complement has been considered mostly for historical reasons, signed magnitude representations are still fairly common and do have a separate representation for negative and positive zero)
For an int (in the almost-universal "2's complement" representation) the representations of 0 and -0 are the same. (They can be different for other number representations, eg. IEEE 754 floating point.)
Let's begin with representing 0 in two's complement (of course many other systems and representations exist; here I'm referring to this specific one). Assuming 8 bits, zero is:
0000 0000
Now let's flip all the bits and add 1 to get the 2's complement:
1111 1111 (flip)
0000 0001 (add one)
---------
0000 0000
we got 0000 0000, and that's the representation of -0 as well.
But note that in 1's complement, signed 0 is 0000 0000, but -0 is 1111 1111.
I've decided to leave this answer up since C and C++ implementations are usually closely related, but in fact it doesn't defer to the C standard as I thought it did. The point remains that the C++ standard does not specify what happens for cases like these. It's also relevant that non-twos-complement representations are exceedingly rare in the real world, and that even where they do exist they often hide the difference in many cases rather than exposing it as something someone could easily expect to discover.
The behavior of negative zeros in the integer representations in which they exist is not as rigorously defined in the C++ standard as it is in the C standard. It does, however, cite the C standard (ISO/IEC 9899:1999) as a normative reference at the top level [1.2].
In the C standard [6.2.6.2], a negative zero can only be the result of bitwise operations, or operations where a negative zero is already present (for example, multiplying or dividing negative zero by a value, or adding a negative zero to zero) - applying the unary minus operator to a value of a normal zero, as in your example, is therefore guaranteed to result in a normal zero.
Even in the cases that can generate a negative zero, there is no guarantee that they will, even on a system that does support negative zero:
It is unspecified whether these cases actually generate a negative zero or a normal zero, and whether a negative zero becomes a normal zero when stored in an object.
Therefore, we can conclude: no, there is no reliable way to detect this case. Even if not for the fact that non-twos-complement representations are very uncommon in modern computer systems.
The C++ standard, for its part, makes no mention of the term "negative zero", and has very little discussion of the details of signed magnitude and one's complement representations, except to note [3.9.1 para 7] that they are allowed.
If your machine has distinct representations for -0 and +0, then memcmp will be able to distinguish them.
If padding bits are present, there might actually be multiple representations for values other than zero as well.
In the C++ language specification, there is no such int as negative zero.
The only meaning those two words have is the unary operator - applied to 0, just as three plus five is just the binary operator + applied to 3 and 5.
If there were a distinct negative zero, two's complement (the most common representation of integers types) would be an insufficient representation for C++ implementations, as there is no way to represent two forms of zero.
In contrast, floating points (following IEEE) have separate positive and negative zeroes. They can be distinguished, for example, when dividing 1 by them. Positive zero produces positive infinity; negative zero produces negative infinity.
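A sketch of those floating point checks (std::signbit inspects the sign bit directly; the infinities assume IEEE arithmetic):
#include <cmath>
#include <iostream>

int main() {
    double pz = 0.0, nz = -0.0;
    std::cout << (pz == nz) << '\n';       // 1: they compare equal
    std::cout << (1.0 / pz) << '\n';       // inf
    std::cout << (1.0 / nz) << '\n';       // -inf
    std::cout << std::signbit(nz) << '\n'; // 1: the sign bit is set
}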
However, if there happen to be different memory representations of the int 0 (or any int, or any other value of any other type), you can use memcmp to discover that:
#include <cstring>

int main() {
    int a = ...
    int b = ...
    if (memcmp(&a, &b, sizeof(int))) {
        // a and b have different representations in memory
    }
}
Of course, if this did happen, outside of direct memory operations, the two values would still work in exactly the same way.
To simplify, I found it easier to visualize.
The type int (int32) is stored with 32 bits. 32 bits give 2^32 = 4,294,967,296 unique values. Thus:
unsigned int data range: 0 to 4,294,967,295
For negative values, it depends on how they are stored:
two's complement: -2,147,483,648 to 2,147,483,647
one's complement: -2,147,483,647 to 2,147,483,647
In one's complement, the value -0 exists.
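Those limits can be checked directly on a given implementation; a minimal sketch:
#include <iostream>
#include <limits>

int main() {
    std::cout << std::numeric_limits<int>::min() << '\n';          // -2147483648 with a 32-bit two's complement int
    std::cout << std::numeric_limits<int>::max() << '\n';          //  2147483647
    std::cout << std::numeric_limits<unsigned int>::max() << '\n'; //  4294967295
}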

Are negative numbers that are left shifted *always* filled in with "1" instead of "0"?

I'm independently studying bit-shifting by porting some C++ functions to .NET's BigInteger. I noticed that when I shifted a negative BigInteger, the gap was filled in with ones.
I believe this has to do with the negative number being stored in two's complement form.
BigInteger num = -126;
uint compactBitsRepresentation = (uint)(int)(num << 16);
Here is what happened after the shift (most significant bit first)
10000010 (before the shift)
11111111100000100000000000000000 (after shifting left by 16)
Should I always expect a similar bit-shift operation to act this way? Is this consistent with different languages and implementations of "bigNumber" such as OpenSSL?
From the BigInteger.LeftShift operator docs:
Unlike the bitwise left-shift operation with integer primitives, the
LeftShift method preserves the sign of the original BigInteger value.
So .NET guarantees the behavior that you see.
I'm not that familiar with bignum libraries, but the docs for OpenSSL's BIGNUM BN_lshift() function say:
BN_lshift() shifts a left by n bits and places the result in r ("r=a*2^n"). BN_lshift1() shifts a left by one and places the result in r ("r=2*a").
Since the operation is defined in terms of multiplication by a power of two, if you convert the resulting BIGNUM to a two's complement number (I have no idea how BIGNUM represents numbers internally) then you'll see similar behavior to .NET.
I wouldn't be surprised if other bignum libraries behaved similarly, but you'd really need to check the docs if you want to depend on the behavior. However, since shifting is very similar to multiplication or division by powers of two, you can probably get 'portable' behavior by using an appropriate multiplication or division instead of a shift. Then all you'd need to ensure is that you can get the conversion to a two's complement representation (which is an issue that is really independent of the behavior of the shift operation).
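As a sketch of that shift/multiplication equivalence with ordinary C++ integers (C++20 guarantees two's complement and defines left shifts of negative values; earlier standards did not):
#include <cstdint>
#include <iostream>

int main() {
    int32_t num = -126;
    int32_t shifted    = num << 16;   // shift the two's complement bits
    int32_t multiplied = num * 65536; // the same value, as multiplication by 2^16
    std::cout << shifted << ' ' << multiplied << '\n'; // both print -8257536
    std::cout << std::hex << static_cast<uint32_t>(shifted) << '\n'; // ff820000
}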
Should I always expect a similar bit-shift operation to act this way?
You should if the number format in question uses two's complement to represent negative numbers (and many do). To form the two's complement of a number, you invert all the bits and add one, so for example:
23 is represented as 00010111
-23 is represented as 11101001 (that is, 11101000 + 1)
Furthermore, when you convert a type to a larger type, the value is usually sign-extended, i.e. the leftmost bit is extended into the extra bits in the larger type. That preserves the sign of the number.
So yes, it's quite common for numeric representations to be "filled" with 1s for negative values.
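Sign extension is easy to observe when widening a small signed type; a short sketch:
#include <cstdint>
#include <iostream>

int main() {
    int8_t narrow = -23;   // 11101001 in two's complement
    int32_t wide = narrow; // sign-extended: the top 24 bits are copies of the sign bit
    std::cout << wide << '\n'; // still -23
    std::cout << std::hex << static_cast<uint32_t>(wide) << '\n'; // ffffffe9
}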
10000010 is first changed to a bigger width. In this case, 4 bytes:
10000010 --> 11111111 11111111 11111111 10000010
You get 1s on the left since the number is negative.
Now the left shift simply inserts 0s on the right and throws bits away on the left:
11111111 11111111 11111111 10000010 << 16 -->
11111111 10000010 00000000 00000000

Two's complement binary form

In a TC++ compiler, the binary representation of 5 is (0000000000000101).
I know that negative numbers are stored as 2's complement, thus -5 in binary is (1111111111111011). The most significant bit (sign bit) is 1, which tells us that it is a negative number.
So how does the compiler know that it is -5? If we interpret the binary value given above (1111111111111011) as an unsigned number, it turns out completely different.
Also, why is the 1's complement of 5 equal to -6 (1111111111111010)?
The compiler doesn't know. If you cast -5 to unsigned int you'll get 65531 (assuming a 16-bit int, as in Turbo C++).
The compiler knows because this is the convention the CPU uses natively. Your computer has a CPU that stores negative numbers in two's complement notation, so the compiler follows suit. If your CPU supported one's complement notation, the compiler would use that instead (just as it follows the CPU's native format for IEEE floats, incidentally).
The Wikipedia article on the topic explains how two's complement notation works.
The processor implements signed and unsigned instructions, which will operate on the binary number representation differently. The compiler knows which of these instructions to emit based on the type of the operands involved (i.e. int vs. unsigned int).
The compiler doesn't need to know whether a number is negative or not; it simply emits the correct machine or intermediate language instructions for the types involved. The processor's or runtime's implementation of these instructions usually doesn't much care whether the number is negative either, because two's complement arithmetic is formulated so that it is the same for positive and negative numbers (in fact, this is the chief advantage of two's complement arithmetic). Something that would need to know whether a number is negative is printf(), and as Andrew Jaffe pointed out, the MSBit being set is indicative of a negative number in two's complement.
The first bit is set only for negative numbers (it's called the sign bit)
The cool part of two's complement is that the machine language add and subtract instructions can ignore all that and just do binary arithmetic, and it just works...
i.e., -3 + 4
in Binary 2's complement, is
1111 1111 1111 1101 (-3)
+ 0000 0000 0000 0100 ( 4)
-------------------
0000 0000 0000 0001 ( 1)
Let us give an example.
We have two one-byte numbers in binary:
A = 10010111
B = 00100110
(Note that the machine does not know the concept of signed or unsigned at this level.)
Now, when you say "add" these two, what does the machine do? It simply adds:
R = 10111101 (no carry out)
Now we, as the compiler, need to interpret the operation. We have two options: the numbers can be signed or unsigned.
1 - Unsigned case: in C, the numbers are of type unsigned char; the values are 151 and 38, and the result is 189. This is trivial.
2 - Signed case: we, the compiler, interpret the numbers according to their MSB; the first number is -105 and the second is still 38. So -105 + 38 = -67. But -67 is 10111101, which is exactly what we already have in the result (R)! The result is the same; the only difference is how the compiler interprets it.
The conclusion is that no matter how we consider the numbers, the machine performs the same operation on them. It is the compiler that interprets the result in its turn.
Note that it is not the machine that knows the concept of two's complement; it just adds two numbers without caring about their content. The compiler then looks at the sign bit and decides.
When it comes to subtraction, again the operation is uniform: take the two's complement of the second number and add the two.
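A sketch of that interpretation step with the same byte values:
#include <cstdint>
#include <iostream>

int main() {
    uint8_t a = 0b10010111; // 151 unsigned, -105 signed
    uint8_t b = 0b00100110; //  38 in both interpretations
    uint8_t r = a + b;      // the machine just adds the bits: 10111101
    std::cout << static_cast<unsigned>(r) << '\n';                 // 189
    std::cout << static_cast<int>(static_cast<int8_t>(r)) << '\n'; // -67
}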
If the number is declared as a signed data type (and not type cast to an unsigned type), then the compiler knows that when the sign bit is 1, it's a negative number. As for why 2's complement is used instead of 1's complement: you don't want to be able to have a value of -0, which 1's complement allows, so 2's complement was invented to fix that.
It's exactly that most significant bit: if you know a number is signed, then if the MSB is 1 the compiler (and the runtime!) knows to interpret it as negative. This is why C-like languages have both signed integers (positive and negative) and unsigned integers, where all values are interpreted as positive. Hence a signed byte goes from -128 to 127, but an unsigned byte from 0 to 255.