How to represent a negative number with a fraction in 2's complement? - twos-complement

I want to represent the number -12.5. Positive 12.5 is:
001100.100
Without the fraction it is simple; -12 is:
110100
But what is -12.5? Is it 110100.100? How do I calculate this negative fraction?

In the decimal number system, each digit position (or column) represents a power of ten (reading from right to left): units (10^0), tens (10^1), hundreds (10^2), etc.
With unsigned binary numbers the base is 2, so the positions become (again reading from right to left): 1 (2^0), 2 (2^1), 4 (2^2), etc.
For example
2^2 (4), 2^1 (2), 2^0 (1).
In signed two's complement the weight of the most significant bit (MSB) becomes negative. It therefore also indicates the sign: '1' for a negative number and '0' for a non-negative number.
For a three-bit number the columns hold these values:
-4, 2, 1
0 0 1 => 1
1 0 0 => -4
1 0 1 => -4 + 1 = -3
In a fixed-point (fractional) system the weights of the existing bits are unchanged. The columns to the right of the point follow the same pattern as before, base 2 raised to a power, with the powers going negative:
2^2 (4), 2^1 (2), 2^0 (1) . 2^-1 (0.5), 2^-2 (0.25), 2^-3 (0.125)
-1 is always all ones in the integer part: 111.000
To get -0.5, add 0.5 to it: 111.100
In your case 110100.10 is equal to -32 + 16 + 4 + 0.5 = -11.5. What you did was create -12 and then add 0.5 rather than subtract 0.5.
What you actually want is -32 + 16 + 2 + 1 + 0.5 = -12.5, which is 110011.1
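The column weights described above can be checked mechanically. The following is only a minimal sketch; the function name `fixed_point_value` is made up for illustration, and it assumes the input is a plain string of 0s and 1s with at most one binary point.

```python
def fixed_point_value(bits: str) -> float:
    """Evaluate a two's-complement bit string with an optional binary point."""
    int_part, _, frac_part = bits.partition(".")
    digits = int_part + frac_part
    total = 0.0
    for i, bit in enumerate(digits):
        # Weights are powers of two that halve at each step to the right.
        weight = 2.0 ** (len(int_part) - 1 - i)
        if i == 0:
            weight = -weight  # the MSB carries a negative weight
        total += weight * int(bit)
    return total


print(fixed_point_value("110100.1"))  # -11.5 (the attempt from the question)
print(fixed_point_value("110011.1"))  # -12.5
print(fixed_point_value("111.100"))   # -0.5
```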

You can also double the number repeatedly until it is a (negative) integer or you reach a chosen limit on fractional bits, write that integer in two's complement, and then place the binary point accordingly.
-25 is 11100111, so -12.5 is 1110011.1
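A rough sketch of this scaling idea, assuming you pick the number of integer and fractional bits up front (the helper name `encode_fixed` and its parameters are invented for this example, and it expects at least one fractional bit):

```python
def encode_fixed(value: float, int_bits: int, frac_bits: int) -> str:
    """Scale by 2**frac_bits, encode the integer, then re-insert the point."""
    scaled = value * (1 << frac_bits)  # doubling frac_bits times
    if scaled != int(scaled):
        raise ValueError("value needs more fractional bits")
    total_bits = int_bits + frac_bits
    n = int(scaled)
    if not -(1 << (total_bits - 1)) <= n < (1 << (total_bits - 1)):
        raise ValueError("value does not fit in the chosen width")
    # Masking a negative Python int yields its two's-complement bit pattern.
    pattern = format(n & ((1 << total_bits) - 1), f"0{total_bits}b")
    return pattern[:int_bits] + "." + pattern[int_bits:]


print(encode_fixed(-12.5, 7, 1))  # 1110011.1
print(encode_fixed(-12.5, 6, 1))  # 110011.1
print(encode_fixed(12.5, 6, 3))   # 001100.100
```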

So you want to represent -12.5 in 2's complement.
12.5: 01100.1
2's complement of 01100.1: 10011.1
Verify the answer using the weighted-code property of 2's complement (the MSB has a negative weight): -16 + 3 + 0.5 = -12.5
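The "invert and add one" step works unchanged with a binary point, as long as the 1 is added at the last (least significant) place. A small sketch, with `negate_fixed` being a made-up name:

```python
def negate_fixed(bits: str) -> str:
    """Two's-complement negate a bit string that may contain a binary point."""
    int_part, _, frac_part = bits.partition(".")
    digits = int_part + frac_part
    width = len(digits)
    mask = (1 << width) - 1
    inverted = int(digits, 2) ^ mask   # flip every bit
    negated = (inverted + 1) & mask    # add 1 at the last place
    pattern = format(negated, f"0{width}b")
    if not frac_part:
        return pattern
    return pattern[:len(int_part)] + "." + pattern[len(int_part):]


print(negate_fixed("01100.1"))     # 10011.1
print(negate_fixed("001100.100"))  # 110011.100
```

Applying it twice gives back the original pattern, which is a quick sanity check.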

Related

double precision in C++, 308 digits or 15 digits?

If double can represent value up to 3.4E308 (308 zeros), then why do we say that double stores only 15 digits? What is point of saying "ten power of 308" ?
We don't say that "double stores only 15 digits". We say that "double has 15 digits of precision". It means that the computed value of double, when printed as a base-10 sequence of digits, is accurate only up to those 15 digits.
double can represent values up to about 1.8E308 (the 3.4E38 figure is the limit of float), and printing such a value exactly takes far more than 15 digits. Only some particular values in that range are exactly representable, though. For example, the number 1.7E308 - 1, which is inside double's range, cannot be represented accurately by a double.
If you want to be sure, just take the first 15 digits of double. Some values can be correctly represented with more than 15 digits, but some cannot. Every value in double's range will be correctly represented up to the 15th digit of its decimal representation.
To help perceive this in simple terms, consider a number type that can represent following numbers:
-100000
-100
-9
-8
-7
-6
-5
-4
-3
-2
-1
0
+1
+2
+3
+4
+5
+6
+7
+8
+9
+100
+100000
The type can represent numbers up to 100000. Does this type have 6 digits of precision, or 1?
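The same effect can be observed with real doubles. Python's `float` is an IEEE-754 double on the usual platforms, so this sketch uses it as a stand-in for C++ `double`; `math.ulp` needs Python 3.9 or later.

```python
import math
import sys

print(sys.float_info.dig)  # decimal digits guaranteed to survive a round trip
print(sys.float_info.max)  # largest finite double, about 1.8e308

# From 2**53 upward neighbouring doubles are at least 2 apart,
# so adding 1 has no effect.
big = 2.0 ** 53
print(big + 1.0 == big)    # True

# Near the top of the range the gap between neighbours is 2**971.
gap = math.ulp(sys.float_info.max)
print(gap == 2.0 ** 971)   # True
```

So the range reaches 1E308, but the spacing between representable values there is astronomically larger than 1.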

How is float_max + 1 defined in C++?

What is that actual value of f?
float f = std::numeric_limits<float>::max() + 1.0f;
For unsigned integral types it is well defined to overflow to 0, and for signed integral it is undefined/implementation specific if I'm not wrong.
But how is it specified in standard for float/double? Is it std::numeric_limits<float>::max() or does it become std::numeric_limits<float>::infinity()?
On cppreference I didn't find a specification so far, maybe I missed it.
Thanks for help!
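With an IEEE-754 single-precision float and the default rounding mode, max + 1 is simply max. The same holds when rounding toward zero or toward negative infinity; only rounding toward positive infinity yields infinity.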
Note that the maximum positive finite 32-bit float is:
           3  2          1         0
           1 09876543 21098765432109876543210
           S ---E8--- ----------F23----------
Binary:    0 11111110 11111111111111111111111
Hex:       7F7F FFFF
Precision: SP
Sign:      Positive
Exponent:  127 (Stored: 254, Bias: 127)
Hex-float: +0x1.fffffep127
Value:     +3.4028235e38 (NORMAL)
For this number to overflow and become infinity using the default rounding mode of round-nearest-ties-to-even, you have to add at least:
           3  2          1         0
           1 09876543 21098765432109876543210
           S ---E8--- ----------F23----------
Binary:    0 11100110 00000000000000000000000
Hex:       7300 0000
Precision: SP
Sign:      Positive
Exponent:  103 (Stored: 230, Bias: 127)
Hex-float: +0x1p103
Value:     +1.0141205e31 (NORMAL)
Adding anything smaller than this particular value rounds the result back to the max value itself. Different rounding modes might have slightly different results, but the order of magnitude of the number you're looking for is about 1e31, which is pretty darn large.
This is an excellent example of how IEEE floats get sparser and sparser as their magnitude increases.
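The threshold can be reproduced without a C++ compiler. This is only an emulation sketch: Python has no native float32, so the helpers below (names invented here) compute the sum in double precision, which is exact for these operands, and then round to float32 with `struct`. It assumes the platform uses round-to-nearest, ties-to-even for the conversion, which is the usual default.

```python
import math
import struct

# Largest finite float32, bit pattern 0x7F7FFFFF.
FLT_MAX = struct.unpack(">f", bytes.fromhex("7f7fffff"))[0]


def to_float32(x: float) -> float:
    """Round a Python float (an IEEE-754 double) to the nearest float32."""
    try:
        return struct.unpack(">f", struct.pack(">f", x))[0]
    except OverflowError:
        # struct raises instead of returning infinity when a finite value
        # rounds past FLT_MAX, so map that case to infinity by hand.
        return math.copysign(math.inf, x)


def add_float32(a: float, b: float) -> float:
    """Hypothetical helper: float32 addition emulated via a double sum."""
    return to_float32(a + b)


print(add_float32(FLT_MAX, 1.0) == FLT_MAX)         # True
print(add_float32(FLT_MAX, 2.0 ** 102) == FLT_MAX)  # True
print(add_float32(FLT_MAX, 2.0 ** 103))             # inf
```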

Why is absolute value of INT_MIN different from INT MAX? [duplicate]

I'm trying to understand why INT_MIN is equal to -2^31 and not -(2^31 - 1).
My understanding is that an int is 4 bytes = 32 bits. Of these 32 bits, I assume 1 bit is used for the +/- sign, leaving 31 bits for the actual value. As such, INT_MAX is equal to 2^31 - 1 = 2147483647. On the other hand, why is INT_MIN equal to -2^31 = -2147483648? Wouldn't this exceed the '4 bytes' allotted for int? Based on my logic, I would have expected INT_MIN to equal -(2^31 - 1) = -2147483647.
Most modern systems use two's complement to represent signed integer data types. In this representation, one of the bit patterns with the sign bit clear is used for zero, so there is one fewer positive value than there are negative values. In fact this is one of the main advantages this system has over the sign-magnitude system, where zero has two representations, +0 and -0. Since zero has only one representation in two's complement, the bit pattern that is freed up is used to represent one more number.
Let's take a small data type, say 4 bits wide, to understand this better. The number of possible states with this toy integer type would be 2⁴ = 16 states. When using two's complement to represent signed numbers, we would have 8 negative and 7 positive numbers and zero; in sign-magnitude system, we'd get two zeros, 7 positive and 7 negative numbers.
Bin Dec
0000 = 0
0001 = 1
0010 = 2
0011 = 3
0100 = 4
0101 = 5
0110 = 6
0111 = 7
1000 = -8
1001 = -7
1010 = -6
1011 = -5
1100 = -4
1101 = -3
1110 = -2
1111 = -1
I think you are confused because you are imagining that a sign-magnitude representation is used for signed numbers. Although the language standards also allowed this, it is very unlikely to be implemented, as two's complement is a significantly better representation.
As of C++20, only two's complement is allowed for signed integers.
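To see the counting argument from the 4-bit table in code, here is a small sketch that decodes every 4-bit pattern under both schemes. The two decoder functions are written just for this illustration.

```python
def twos_complement_value(pattern: int, width: int) -> int:
    """Decode an unsigned bit pattern as a two's-complement integer."""
    sign_bit = 1 << (width - 1)
    return (pattern & (sign_bit - 1)) - (pattern & sign_bit)


def sign_magnitude_value(pattern: int, width: int) -> int:
    """Decode an unsigned bit pattern as sign-magnitude."""
    sign_bit = 1 << (width - 1)
    magnitude = pattern & (sign_bit - 1)
    return -magnitude if pattern & sign_bit else magnitude


WIDTH = 4
twos = [twos_complement_value(p, WIDTH) for p in range(1 << WIDTH)]
sm = [sign_magnitude_value(p, WIDTH) for p in range(1 << WIDTH)]

print(min(twos), max(twos), len(set(twos)))  # -8 7 16
print(min(sm), max(sm), len(set(sm)))        # -7 7 15
```

Sign-magnitude yields only 15 distinct values from 16 patterns because 0000 and 1000 both decode to zero.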

In binary notation, what is the meaning of the digits after the radix point "."?

I have this example on how to convert from a base 10 number to IEEE 754 float representation
Number: 45.25 (base 10) = 101101.01 (base 2) Sign: 0
Normalized form N = 1.0110101 * 2^5
Exponent esp = 5 E = 5 + 127 = 132 (base 10) = 10000100 (base 2)
IEEE 754: 0 10000100 01101010000000000000000
This makes sense to me except one passage:
45.25 (base 10) = 101101.01 (base 2)
45 is 101101 in binary and that's okay.. but how did they obtain the 0.25 as .01 ?
Simple place value. In base 10, you have these places:
... 10^3 10^2 10^1 10^0 . 10^-1 10^-2 10^-3 ...
... thousands, hundreds, tens, ones . tenths, hundredths, thousandths ...
Similarly, in binary (base 2) you have:
... 2^3 2^2 2^1 2^0 . 2^-1 2^-2 2^-3 ...
... eights, fours, twos, ones . halves, quarters, eighths ...
So the second place after the . in binary is units of 2^-2, well known to you as units of 1/4 (or alternately, 0.25).
You can convert the part after the decimal point to another base by repeatedly multiplying by the new base (in this case the new base is 2), like this:
0.25 * 2 = 0.5
-> The first binary digit is 0 (take the integral part, i.e. the part before the decimal point).
Continue multiplying with the part after the decimal point:
0.5 * 2 = 1.0
-> The second binary digit is 1 (again, take the integral part).
This is also where we stop because the part after the decimal point is now zero, so there is nothing more to multiply.
Therefore the final binary representation of the fractional part is: 0.01 (base 2).
Edit:
It might also be worth noting that the binary representation is quite often infinite even when starting with a finite fractional part in base 10. Example: converting 0.2 (base 10) to binary:
0.2 * 2 = 0.4 -> 0
0.4 * 2 = 0.8 -> 0
0.8 * 2 = 1.6 -> 1
0.6 * 2 = 1.2 -> 1
0.2 * 2 = ...
So we end up with: 0.001100110011... (base 2).
Using this method you see quite easily if the binary representation ends up being infinite.
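The repeated-multiplication method translates almost line for line into code. A sketch, using `fractions.Fraction` so that decimal inputs such as "0.2" are held exactly; the function name and the `max_digits` cut-off are choices made for this example:

```python
from fractions import Fraction


def fraction_to_binary(value: str, max_digits: int = 12) -> str:
    """Convert a decimal fraction in [0, 1) to binary by repeated doubling."""
    frac = Fraction(value)  # exact, e.g. Fraction("0.2") == 1/5
    if not 0 <= frac < 1:
        raise ValueError("expected a value in [0, 1)")
    digits = []
    while frac and len(digits) < max_digits:
        frac *= 2
        digit = int(frac)   # the integral part is the next binary digit
        digits.append(str(digit))
        frac -= digit       # continue with the part after the point
    # A non-zero remainder means the expansion was cut off.
    return "0." + ("".join(digits) or "0") + ("..." if frac else "")


print(fraction_to_binary("0.25"))  # 0.01
print(fraction_to_binary("0.2"))   # 0.001100110011...
```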
"Decimals" (fractional bits) in other bases are surprisingly unintuitive considering they work in exactly the same way as integers.
base 10
power   10^2   10^1   10^0     10^-1  10^-2  10^-3
weight  100.0  10.0   1.0      0.1    0.01   0.001
value   0      4      5      . 2      5      0
base 2
power   2^6   2^5   2^4   2^3   2^2   2^1   2^0     2^-1  2^-2  2^-3
weight  64    32    16    8     4     2     1       .5    .25   .125
value   0     1     0     1     1     0     1     . 0     1     0
If we start with 45.25, that's greater than or equal to 32, so we write a binary 1 and subtract 32.
We're left with 13.25, which is smaller than 16, so we write a binary 0.
We're left with 13.25, which is greater than or equal to 8, so we write a binary 1 and subtract 8.
We're left with 5.25, which is greater than or equal to 4, so we write a binary 1 and subtract 4.
We're left with 1.25, which is smaller than 2, so we write a binary 0.
We're left with 1.25, which is greater than or equal to 1, so we write a binary 1 and subtract 1.
With integers, we'd have zero left, so we'd stop. But:
We're left with 0.25, which is smaller than 0.5, so we write a binary 0.
We're left with 0.25, which is greater than or equal to 0.25, so we write a binary 1 and subtract 0.25.
Now we have zero, so we stop (or not; you can keep going and calculating zeros forever if you want).
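The walk-through above is a greedy subtraction of weights, which can be sketched as follows. The function name and the default range of powers (2^6 down to 2^-3, matching the table) are assumptions for this example, and the input is assumed to be non-negative and smaller than twice the top weight.

```python
def to_binary_greedy(value: float, top_power: int = 6, bottom_power: int = -3) -> str:
    """Subtract binary weights from largest to smallest, recording the digits."""
    digits = []
    for power in range(top_power, bottom_power - 1, -1):
        weight = 2.0 ** power
        if power == -1:
            digits.append(".")  # the point sits between 2^0 and 2^-1
        if value >= weight:
            digits.append("1")
            value -= weight
        else:
            digits.append("0")
    return "".join(digits)


print(to_binary_greedy(45.25))  # 0101101.010
```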
Note that not all "easy" numbers in decimal always reach that zero stopping point. 0.1 (decimal) converted into base 2, is infinitely repeating: 0.0001100110011001100110011... However, all "easy" numbers in binary will always convert nicely into base 10.
You can also do this same process with fractional (2.5), irrational (pi), or even imaginary (2i) bases, except that the base cannot be between -1 and 1 inclusive.
2.000 (base 10) = 2^1  = 10.000 (base 2)
1.000 (base 10) = 2^0  = 01.000 (base 2)
0.500 (base 10) = 2^-1 = 00.100 (base 2)
0.250 (base 10) = 2^-2 = 00.010 (base 2)
0.125 (base 10) = 2^-3 = 00.001 (base 2)
The fractions base 2 are .1 = 1/2, .01 = 1/4. ...
Think of it this way
(dot) 2^-1 2^-2 2^-3 etc
so
. 0/2 + 1/4 + 0/8 + 0/16 etc
See http://floating-point-gui.de/formats/binary/
You can think of 0.25 as 1/4.
Dividing by 2 in (base 2) moves the decimal point one step left, the same way dividing by 10 in (base 10) moves the decimal point one step left. Generally dividing by M in (base M) moves the decimal point one step left.
so
base 10 base 2
--------------------------------------
1 => 1
1/2 = 0.5 => 0.1
0.5/2 = 1/4 = 0.25 => 0.01
0.25/2 = 1/8 = 0.125 => 0.001
.
.
.
etc.

Represent a negative number with the 2's complement technique?

I am using 2's complement to represent a negative number in binary form
Case 1: number -5
According to the 2's complement technique:
Convert 5 to the binary form:
00000101, then flip the bits
11111010, then add 1
00000001
=> result: 11111011
To make sure this is correct, I re-calculate to decimal:
-128 + 64 + 32 + 16 + 8 + 2 + 1 = -5
Case 2: number -240
The same steps are taken:
11110000
00001111
00000001
00010000 => recalculating this I get 16, not -240
Am I misunderstanding something?
The problem is that you are trying to represent 240 with only 8 bits. The range of an 8 bit signed number is -128 to 127.
If you instead represent it with 9 bits, you'll see you get the correct answer:
011110000 (240)
100001111 (flip the bits)
+
000000001 (1)
=
100010000
=
-256 + 16 = -240
Did you forget that -240 cannot be represented with 8 bits when the number is signed?
The lowest negative number you can express with 8 bits is -128, which is 10000000.
Using 2's complement:
128 = 10000000
(flip) = 01111111
(add 1) = 10000000
The lowest negative number you can express with N bits (with signed integers, of course) is always -2^(N - 1).
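A short sketch that makes the range limit explicit. The helper name `twos_complement` is invented here; it refuses values outside -2^(N-1) .. 2^(N-1) - 1 instead of silently wrapping.

```python
def twos_complement(value: int, width: int) -> str:
    """Return the width-bit two's-complement pattern of value, or raise."""
    low, high = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if not low <= value <= high:
        raise OverflowError(
            f"{value} does not fit in {width} signed bits ({low}..{high})"
        )
    # Masking a negative Python int yields its two's-complement bit pattern.
    return format(value & ((1 << width) - 1), f"0{width}b")


print(twos_complement(-5, 8))    # 11111011
print(twos_complement(-240, 9))  # 100010000
print(twos_complement(-128, 8))  # 10000000
```

Calling `twos_complement(-240, 8)` raises `OverflowError`, which is the situation in Case 2 of the question.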