Counting Sign Bit in 2's Complement in Memory Address - c++

In C++, if the memory address of a variable is displayed as 0xFFF, for instance, I am not sure whether that means 2047 or 4095 (counting the "sign bit"). 0xFFF is 1111 1111 1111 in binary. I have recently learned about the "sign bit", which just indicates whether the number is positive or negative. My other guess is that it would be 0000 0000 0001 (i.e. 1) after converting the 2's complement value back to a positive number.
So my question is this: does a memory address of 0xFFF mean 4095 or 2047? Or, perhaps, 1?

0xFFF can be either. In hex, it's just bits - the rest is all in how you interpret it. It also depends on the width of the data type we're talking about.
Assuming you're not using a 12-bit datatype, the particular value you've shown is always 4095.
Now if we establish the width of our integer, and choose a different value, things get more interesting. Take a 16-bit value 0xFFFF, for example:
Interpreted as an unsigned integer (i.e. uint16_t), it is 65535.
As a signed integer (i.e. int16_t), it is -1.
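As a minimal sketch of that difference (assuming a two's complement machine, which covers x86 and virtually everything current; the signed conversion is implementation-defined before C++20):

#include <cstdint>
#include <cstdio>

int main() {
    uint16_t u = 0xFFFF;        // all 16 bits set
    int16_t  s = (int16_t)u;    // same bits viewed as signed
    printf("unsigned: %u\n", (unsigned)u);  // 65535
    printf("signed:   %d\n", (int)s);       // -1 on two's complement
}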

Memory addresses are typically considered as unsigned values, so two's complement really doesn't come into play in this respect.
For signed C++ primitive types, the standard doesn't care. Whether a value is stored as two's complement or not is determined by the CPU architecture; x86 and most other architectures use two's complement because of the ease of hardware implementation.

Related

What happens when you give a value bigger than CHAR_MAX (127) to a char [duplicate]

If we assign +128 to a char variable, it is converted to -128 because of its binary equivalent (10000000 - the first bit tells the sign).
The binary equivalent of 129 is 10000001. What value will it be converted to?
char c = 129;
There are actually several possibilities.
If char is a signed type, then
char c = 129;
implicitly converts the int value to char. The result is implementation-defined (or it can raise an implementation-defined signal, but I don't know of any implementations that do that). In most implementations, the conversion yields -127 because signed types are represented using two's-complement. In an unsigned 8-bit type, 10000001 represents the value 129; in a signed 8-bit type, the same bit pattern represents -127; the conversion isn't required to keep the same bit pattern, but it very commonly does.
If plain char is an unsigned type (as it is on some systems), then the value 129 is within the range of char, and the conversion simply yields the value 129.
But all this assumes that char is 8 bits. The C standard requires CHAR_BIT (the number of bits in a byte, or equivalently in a char object) to be at least 8, but it permits it to be larger. You're not likely to run into a system with CHAR_BIT > 8, but it's not uncommon in C implementations for DSPs.
Depending on all this is rarely a good idea. Whatever it is you're trying to accomplish by assigning 129 to a char object, there's very likely a better, clearer, and more portable way to do it.
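As a hedged illustration of the above (the exact result of the out-of-range conversion is implementation-defined; the values in the comments are what a typical two's complement implementation produces):

#include <cstdio>
#include <limits>

int main() {
    char c = 129;  // int -> char; implementation-defined result if char is signed
    printf("char is %s\n",
           std::numeric_limits<char>::is_signed ? "signed" : "unsigned");
    printf("c = %d\n", (int)c);  // typically -127 if char is signed, 129 if unsigned
}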
Negative numbers are stored as the 2's complement of the corresponding positive number.
For example:
-127 is stored as the 2's complement of 127.
127 is 01111111 in binary.
2's complement = (1's complement of a number) + 1
Therefore, the 2's complement of 127 (01111111) is 10000001 (-127).
And 10000001 is 129: therefore, when you give 129 to a char variable, the compiler takes it as the negative number -127.
Similarly, 130 will be assigned as -126.
An attempt to store +129 in a char results in storing -127 in it. When we attempt to store +128 in a char, the first number on the negative side, i.e. -128, gets stored. This is because from the 9-bit binary of +128, 010000000, only the right-most 8 bits get stored. But when 10000000 is stored, the left-most bit is 1 and it is treated as a sign bit; thus the value of the number becomes -128, since that pattern is indeed the (2's complement) binary of -128. In general, if we exceed the range from the positive side we end up on the negative side, and vice versa: if we exceed the range from the negative side we end up on the positive side.
When numbers are in the negative domain (i.e. the sign bit is 1), they assume their minimum value when the "actual" binary representation (i.e. ignoring the sign bit, only dealing with the part that stores the number) is 0.
E.g.
The signed binary 10000000 represents -128 in decimal.
This makes sense, as for every increment of the binary representation there is an increment of the decimal number as well (remember -127 is one more than -128). If you deal only in decimal and ignore any rules that prohibit overflow behavior, you'll see that as the number passes the maximum positive value it immediately assumes the most negative value and counts up toward 0.
Another point you can see is that the binary representation for -1 would be 11111111. If we increment this by one we get 1 00000000 where the leading one is discarded, giving us the number 0.
A quick way to determine the value of a signed negative number is to take the binary representation (ignoring the sign bit) and add it to the most negative number (-128 for 8 bits).
For example:
If we have the binary representation 10000001, where the unsigned value is 129, then ignoring the sign bit we have 1, which we add to -128, yielding the answer -127.
Please note: I am ignoring the fact that sometimes overflows are not permitted and undefined behavior may result. I'm also dealing with signed chars in this answer. Should you have a case where you're dealing with unsigned chars, the full binary expression is used to give the value (e.g. 10000001 = 129).
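A small sketch of that trick, assuming an 8-bit two's complement byte (the 0x7F mask and the -128 offset are spelled out explicitly):

#include <cstdint>
#include <cstdio>

int main() {
    uint8_t bits = 0b10000001;          // unsigned value 129, sign bit set
    int value = -128 + (bits & 0x7F);   // sign bit contributes -128, low 7 bits add 1
    printf("%d\n", value);              // prints -127
}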

Signed type representation in c++

In the book I am reading it says that:
The standard does not define how signed types are represented, but does specify that range should be evenly divided between positive and negative values. Hence, an 8-bit signed char is guaranteed to be able to hold values from -127 through 127; most modern machines use representations that allow values from -128 through 127.
I presume that the [-128, 127] range arises from the method called "two's complement", in which a negative number is ~A + 1 (e.g. 0111 is 7, and 1001 is then -7). But I cannot wrap my head around why on some older(?) machines the values range over [-127, 127]. Can anyone clarify this?
Both one's complement and signed magnitude are representations that provide the range [-127, 127] with an 8-bit number. Both have a different representation for +0 and -0. Both were used by (mostly) early computer systems.
The signed magnitude representation is perhaps the simplest for humans to imagine, and was probably used for the same reason people first built decimal computers rather than binary ones.
I would imagine that the only reason why one's complement was ever used, was because two's complement hadn't yet been considered by the creators of early computers. Then later on, because of backwards compatibility. Although, this is just my conjecture, so take it with a grain of salt.
Further information: https://en.wikipedia.org/wiki/Signed_number_representations
As a slightly related factoid: In the IEEE floating point representation, the signed exponent uses excess-K representation and the fractional part is represented by signed magnitude.
It's not actually -127 to 127, but rather -127 to -0 and +0 to 127.
Earlier processors used two methods:
Signed magnitude: here a negative number is formed by setting the most significant bit to 1, so 10000000 and 00000000 both represent 0.
One's complement: just apply NOT to the positive number. This causes two representations of zero: 11111111 and 00000000.
Also, two's complement is nearly as old as the other two: https://www.linuxvoice.com/edsac-dennis-wheeler-and-the-cambridge-connection/
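As a small sketch (simulating the historical encodings with plain unsigned bytes, since modern hardware won't produce them natively):

#include <cstdint>
#include <cstdio>

int main() {
    uint8_t sm_pos0 = 0b00000000;  // signed magnitude +0
    uint8_t sm_neg0 = 0b10000000;  // signed magnitude -0 (sign bit set, magnitude 0)
    uint8_t oc_pos0 = 0b00000000;  // one's complement +0
    uint8_t oc_neg0 = 0b11111111;  // one's complement -0 (bitwise NOT of +0)
    printf("signed magnitude: %02X vs %02X\n", (unsigned)sm_pos0, (unsigned)sm_neg0);
    printf("one's complement: %02X vs %02X\n", (unsigned)oc_pos0, (unsigned)oc_neg0);
}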

Is it possible to differentiate between 0 and -0?

I know that the integer values 0 and -0 are essentially the same.
But, I am wondering if it is possible to differentiate between them.
For example, how do I know if a variable was assigned -0?
bool IsNegative(int num)
{
    // How?
}

int num = -0;
int addition = 5;
num += (IsNegative(num)) ? -addition : addition;
Is the value -0 saved in the memory the exact same way as 0?
It depends on the machine you're targeting.
On a machine that uses a 2's complement representation for integers there's no difference at bit-level between 0 and -0 (they have the same representation)
If your machine used one's complement, you definitely could distinguish them:
0000 0000 -> signed  0 
1111 1111 -> signed −0
Obviously, we're talking about native support here: x86-series processors natively support the two's complement representation of signed numbers. Using other representations is definitely possible, but would probably be less efficient and require more instructions.
(As JerryCoffin also noted: even if one's complement has been considered mostly for historical reasons, signed magnitude representations are still fairly common and do have a separate representation for negative and positive zero)
For an int (in the almost-universal "2's complement" representation) the representations of 0 and -0 are the same. (They can be different for other number representations, eg. IEEE 754 floating point.)
Let's begin by representing 0 in 2's complement (of course many other systems and representations exist; here I'm referring to this specific one). Assuming 8 bits, zero is:
0000 0000
Now let's flip all the bits and add 1 to get the 2's complement:
1111 1111 (flip)
0000 0001 (add one)
---------
0000 0000
we got 0000 0000, and that's the representation of -0 as well.
But note that in 1's complement, signed 0 is 0000 0000, but -0 is 1111 1111.
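The same flip-and-add computation, sketched with an unsigned byte so the wraparound is visible:

#include <cstdint>
#include <cstdio>

int main() {
    uint8_t zero    = 0x00;
    uint8_t flipped = (uint8_t)~zero;           // 1111 1111
    uint8_t negated = (uint8_t)(flipped + 1);   // wraps around to 0000 0000
    printf("flip: %02X, add one: %02X\n", (unsigned)flipped, (unsigned)negated);
}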
I've decided to leave this answer up since C and C++ implementations are usually closely related, but in fact it doesn't defer to the C standard as I thought it did. The point remains that the C++ standard does not specify what happens for cases like these. It's also relevant that non-twos-complement representations are exceedingly rare in the real world, and that even where they do exist they often hide the difference in many cases rather than exposing it as something someone could easily expect to discover.
The behavior of negative zeros in the integer representations in which they exist is not as rigorously defined in the C++ standard as it is in the C standard. It does, however, cite the C standard (ISO/IEC 9899:1999) as a normative reference at the top level [1.2].
In the C standard [6.2.6.2], a negative zero can only be the result of bitwise operations, or operations where a negative zero is already present (for example, multiplying or dividing negative zero by a value, or adding a negative zero to zero) - applying the unary minus operator to a value of a normal zero, as in your example, is therefore guaranteed to result in a normal zero.
Even in the cases that can generate a negative zero, there is no guarantee that they will, even on a system that does support negative zero:
It is unspecified whether these cases actually generate a negative zero or a normal zero, and whether a negative zero becomes a normal zero when stored in an object.
Therefore, we can conclude: no, there is no reliable way to detect this case. Even if not for the fact that non-twos-complement representations are very uncommon in modern computer systems.
The C++ standard, for its part, makes no mention of the term "negative zero", and has very little discussion of the details of signed magnitude and one's complement representations, except to note [3.9.1 para 7] that they are allowed.
If your machine has distinct representations for -0 and +0, then memcmp will be able to distinguish them.
If padding bits are present, there might actually be multiple representations for values other than zero as well.
In the C++ language specification, there is no such int as negative zero.
The only meaning those two words have is the unary operator - applied to 0, just as three plus five is just the binary operator + applied to 3 and 5.
If there were a distinct negative zero, two's complement (the most common representation of integers types) would be an insufficient representation for C++ implementations, as there is no way to represent two forms of zero.
In contrast, floating points (following IEEE) have separate positive and negative zeroes. They can be distinguished, for example, when dividing 1 by them. Positive zero produces positive infinity; negative zero produces negative infinity.
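A short sketch of the floating point case (this behavior is well-defined for IEEE 754 doubles):

#include <cmath>
#include <cstdio>

int main() {
    double pz = 0.0, nz = -0.0;
    printf("equal: %d, signbit of -0.0: %d\n", pz == nz, (int)std::signbit(nz));
    printf("1/+0 = %g, 1/-0 = %g\n", 1.0 / pz, 1.0 / nz);  // inf and -inf
}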
However, if there happen to be different memory representations of the int 0 (or any int, or any other value of any other type), you can use memcmp to discover that:
#include <cstring>

int main() {
    int a = 0;   // example values; any two ints to compare will do
    int b = -0;
    if (std::memcmp(&a, &b, sizeof(int))) {
        // a and b have different representations in memory
    }
}
Of course, if this did happen, outside of direct memory operations, the two values would still work in exactly the same way.
To simplify, I found it easier to visualize.
A 32-bit int is stored with 32 bits. 32 bits means 2^32 = 4,294,967,296 unique values. Thus:
unsigned int data range is 0 to 4,294,967,295
For negative values, it depends on how they are stored:
Two's complement: -2,147,483,648 to 2,147,483,647
One's complement: -2,147,483,647 to 2,147,483,647
In the case of one's complement, the value -0 exists.

I don't seem to understand the output of this program regarding conversion of integer pointer to character pointer

In the program below I initialize i to 255.
Thus in binary we have:
0000 0000 1111 1111
That is, in hex:
0x00ff
So, according to a little-endian layout, the low-order byte, 0xff, is stored first.
#include <cstdio>

int main()
{
    int i = 0x00ff; // I know 0xff works; just tried to make it understandable
    char *cptr = (char *)&i;
    if (*cptr == -127)
        printf("Little-Endian");
    else
        printf("Big-Endian");
}
So, when I store the address of i in cptr, it should point to the lower byte (assuming little-endian, because that is what my system has).
Hence, *cptr contains 1111 1111. I expected this to come out as -127, since 1 bit is the sign bit.
But when I print *cptr's value I get -1. Why is that?
Please explain where I am going wrong.
Where did you get the idea that 1111 1111 is -127? Apparently, you are assuming that the "sign bit" should be interpreted independently from the rest of the bits, i.e. setting the sign bit in binary representation of +127 should turn it into -127, right? Well, a representation that works that way does indeed exist: it is called Signed Magnitude representation. However, it is virtually unused in practice. Most modern machines use 2's Complement representation for signed types, which is a completely different thing.
On a 2's-complement machine 1111 1111 has always been -1. -127 would be 1000 0001. And 1000 0000 is -128, BTW.
On top of that keep in mind that char type is not necessarily signed. If you specifically need a signed type, you have to spell it out: signed char.
1111 1111 is -1 because -1 is the largest negative integral number. Remember that -1 is more than -2 in math, so the binary representation should preserve the same ordering for convenience. So 1000 0001 will represent -127.
There are three ways to represent negative numbers in binary: Signed magnitude, ones-complement and twos-complement.
The signed magnitude is easiest to understand: One bit is used to represent the sign (0=+, 1=-), the other bits represent the magnitude of the value.
In ones-complement, a negative value is obtained by performing a bitwise inversion on the corresponding positive number (toggle all bits)
In twos-complement, the way to convert between positive and negative numbers is less straightforward (flip all bits, then add 1), but it has some characteristics that make it particularly useful in computers.
Because computers work very well with twos-complement representations, that is what gets used in most computers for representing integers.
What is going wrong is that you expected a signed magnitude representation (highest bit set, so negative; all other bits set as well, so value = -127), but the computer is using a two's complement representation (where all bits set == -1).
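A sketch of the test with the fix the answers imply - compare against the byte's actual value, and use unsigned char so char's signedness no longer matters:

#include <cstdio>

int main()
{
    int i = 0x00ff;
    unsigned char *cptr = (unsigned char *)&i;  // unsigned char sidesteps sign issues
    if (*cptr == 0xff)                          // is the low-order byte stored first?
        printf("Little-Endian\n");
    else
        printf("Big-Endian\n");
}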

Two's complement binary form

In a TC++ compiler, the binary representation of 5 is (0000000000000101).
I know that negative numbers are stored as 2's complement, thus -5 in binary is (1111111111111011). The most significant bit (sign bit) is 1, which tells that it is a negative number.
So how does the compiler know that it is -5? If we interpret the binary value given above (1111111111111011) as an unsigned number, it turns out completely different.
Also, why is the 1's complement of 5 equal to -6 (1111111111111010)?
The compiler doesn't know. If you cast -5 to a 16-bit unsigned int you'll get 65531.
The compiler knows because this is the convention the CPU uses natively. Your computer has a CPU that stores negative numbers in two's complement notation, so the compiler follows suit. If your CPU supported one's complement notation, the compiler would use that instead (just as compilers use the IEEE format for floats because that is what the hardware implements, incidentally).
The Wikipedia article on the topic explains how two's complement notation works.
The processor implements signed and unsigned instructions, which will operate on the binary number representation differently. The compiler knows which of these instructions to emit based on the type of the operands involved (i.e. int vs. unsigned int).
The compiler doesn't need to know if a number is negative or not, it simply emits the correct machine or intermediate language instructions for the types involved. The processor or runtime's implementation of these instructions usually doesn't much care if the number is negative or not either, as the formulation of two's complement arithmetic is such that it is the same for positive or negative numbers (in fact, this is the chief advantage of two's complement arithmetic). What would need to know if a number is negative would be something like printf(), and as Andrew Jaffe pointed out, the MSBit being set is indicative of a negative number in two's complement.
The first bit is set only for negative numbers (it's called the sign bit)
The cool part of two's complement is that the machine language add and subtract instructions can ignore all that, and just do binary arithmetic - and it just works...
i.e., -3 + 4
in Binary 2's complement, is
1111 1111 1111 1101 (-3)
+ 0000 0000 0000 0100 ( 4)
-------------------
0000 0000 0000 0001 ( 1)
Let us give an example. We have two one-byte numbers in binary:
A = 10010111
B = 00100110
(note that the machine does not know the concept of signed or unsigned at this level)
Now, when you say "add" these two, what does the machine do? It simply adds:
R = 10111101 (no carry out, since 151 + 38 fits in 8 bits)
Now we - as the compiler - need to interpret the operation. We have two options: the numbers can be signed or unsigned.
1 - Unsigned case: in C, the numbers are of type unsigned char, the values are 151 and 38, and the result is 189. This is trivial.
2 - Signed case: we, the compiler, interpret the numbers according to their MSB; the first number is -105 and the second is still 38. So -105 + 38 = -67. And -67 is 10111101 - which is exactly what we already have in the result (R)! The result is the same; the only difference is how the compiler interprets it.
The conclusion is that no matter how we consider the numbers, the machine performs the same operation on them; the compiler then interprets the result in its turn.
Note that it is not the machine that knows the concept of 2's complement; it just adds two numbers without caring about the content. The compiler then looks at the sign bit and decides.
When it comes to subtraction, the operation is again the same single one: take the 2's complement of the second number and add the two.
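A sketch of that example in C++ (using the A and B from above; the casts only change the interpretation, never the bits, and the signed conversion is implementation-defined before C++20):

#include <cstdint>
#include <cstdio>

int main() {
    uint8_t a = 0b10010111, b = 0b00100110;  // the A and B from the example
    uint8_t r = (uint8_t)(a + b);            // the machine just adds the bits
    printf("unsigned view: %u + %u = %u\n",
           (unsigned)a, (unsigned)b, (unsigned)r);           // 151 + 38 = 189
    printf("signed view:   %d + %d = %d\n",
           (int)(int8_t)a, (int)(int8_t)b, (int)(int8_t)r);  // -105 + 38 = -67
}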
If the number is declared as a signed data type (and not cast to an unsigned type), then the compiler knows that when the sign bit is 1, it's a negative number. As for why 2's complement is used instead of 1's complement: you don't want to be able to have a value of -0, which 1's complement allows, so 2's complement was invented to fix that.
It's exactly that most significant bit - if you know a number is signed, then if the MSB is 1 the compiler (and the runtime!) knows to interpret it as negative. This is why C-like languages have both signed integers (positive and negative) and unsigned integers - in the latter case you interpret them all as positive. Hence a signed byte goes from -128 to 127, but an unsigned byte from 0 to 255.
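A minimal check of those ranges via <limits> (using the fixed-width types from <cstdint>):

#include <cstdint>
#include <cstdio>
#include <limits>

int main() {
    printf("int8_t:  %d to %d\n",
           (int)std::numeric_limits<int8_t>::min(),
           (int)std::numeric_limits<int8_t>::max());        // -128 to 127
    printf("uint8_t: %u to %u\n",
           (unsigned)std::numeric_limits<uint8_t>::min(),
           (unsigned)std::numeric_limits<uint8_t>::max());  // 0 to 255
}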