I have some real-valued data, for example +2 and -3. They are represented in 4-bit two's complement fixed point, where the MSB is the sign bit and the number of fractional bits is zero.
So +2 = 0010
-3 = 1101
Adding these two numbers gives (+2) + (-3) = -1:
(0010)+(1101)=(1111)
But for the subtraction (+2) - (-3), what should I do?
Do I need to take the two's complement of 1101 (-3) again and add it to 0010?
You can evaluate -(-3) in binary and then simply add it to the other value.
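With two's complement, negating a number is simple: apply the NOT operation to every bit and then add 1, discarding any carry out of the top bit. Equivalently, copy the bits from the least significant bit up to and including the first 1, and invert every bit above it. The equation below uses the tilde to represent the NOT of a single bit and assumes integers represented by n bits (n = 4 in your example):

$$-(b_{n-1} \dots b_1 b_0) = (\tilde{b}_{n-1} \dots \tilde{b}_1 \tilde{b}_0) + 1 \pmod{2^n}$$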
In your example (with an informal notation): -(-3) = -(1101) = 0010 + 1 = 0011 = +3, so (+2) - (-3) = (0010) + (0011) = (0101) = +5.
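The invert-and-add-1 rule can be checked with a minimal C++ sketch, assuming each 4-bit value lives in the low nibble of a `uint8_t`; the helper names `neg4` and `sub4` are hypothetical:

```cpp
#include <cstdint>

// Two's complement negation of a 4-bit value stored in the low nibble:
// invert every bit, add 1, then keep only the low 4 bits (arithmetic mod 2^4).
std::uint8_t neg4(std::uint8_t x) {
    return static_cast<std::uint8_t>((~x + 1) & 0xF);
}

// a - b computed as a + (-b), again wrapping modulo 2^4.
std::uint8_t sub4(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>((a + neg4(b)) & 0xF);
}
```

With these, `sub4(0b0010, 0b1101)` gives `0b0101`, i.e. (+2) - (-3) = +5. The mask keeps the arithmetic modulo 2^4, so a true result outside -8..+7 (for example (+7) - (-3)) would silently wrap.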
Is -28.91 = 00100.0111?
28 -> 11100, then flip and add 1
-28 -> 00100
.91 -> 0111, to 4 places after the binary point
I have looked in a lot of places to verify my conversion, but I can't confirm whether it is correct, so I'd like to ask here.
For addition, subtraction, and other operations to work normally (using binary addition on the whole bit pattern), the whole bit pattern (integer and fractional parts combined), read as an integer, has to equal x * 2^4.
That is, the value actually represented by 0b00100.0111 is 0b001000111 / 16.
That means you have to do 2's complement negation (binary subtraction from 0, or the invert-and-add-1 identity) on the integer and fractional bits together.
Also, your value for 28 has its MSB set, so it's already negative, i.e. you've overflowed 5-bit signed 2's complement. Presumably you actually have a wider integer part.
For 16-bit 12.4 fixed-point, 28.91:
28.91 * 16 = 462.56, which rounds up to 463.
+463 = 0b0000000111001111
-463 = 0b1111111000110001
As 12.4 fixed-point, this 0b111111100011.0001 bit-pattern represents -463/16 = -28.9375, the nearest representable value to -28.91
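A small C++ sketch of the same 12.4 conversion, assuming a 16-bit `int16_t` container and round-to-nearest; `to_q12_4` and `from_q12_4` are hypothetical names, and there is no overflow check:

```cpp
#include <cmath>
#include <cstdint>

// Number of fractional bits in 12.4 fixed point.
constexpr int kFracBits = 4;

// Scale by 2^4 and round to the nearest integer; the int16_t then holds the
// two's complement bit pattern of the whole (integer + fraction) value.
// No range check: values outside roughly -2048..2047.9375 will not fit.
std::int16_t to_q12_4(double x) {
    return static_cast<std::int16_t>(std::lround(x * (1 << kFracBits)));
}

// Read the stored integer back as value / 2^4.
double from_q12_4(std::int16_t q) {
    return static_cast<double>(q) / (1 << kFracBits);
}
```

`to_q12_4(-28.91)` stores -463, whose bit pattern viewed as `uint16_t` is 0xFE31 (0b1111111000110001), and `from_q12_4` maps it back to -28.9375.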
As seen in the picture above, each of the variable types has a negative limit that is one larger in magnitude than its positive limit. I was wondering how it is able to add that extra one. I know that the first digit in the variable is used to tell if it is negative (1) or if it is not (0). I also know that binary is based on powers of 2. What confuses me is how there can be one extra when the positive side can't go any higher and the negative side only has one digit changed. For example, a short can go up to 32,767 (01111111 11111111), i.e. 16,384 plus the values of all the bits below it. Negative numbers are the same except for a 1 at the beginning, right? So how do the negative numbers have a larger limit? Thanks to anyone who answers!
The reason is a scheme called "2's complement" for representing signed integers.
You know that the most significant bit of a signed integer represents the sign. What you may not know is that it also represents a value: a negative one.
Take a 4-bit 2's complement signed integer as an example:
1 0 1 0
-2^3 2^2 2^1 2^0
This 4-bit integer is interpreted as:
1 * -2^3 + 0 * 2^2 + 1 * 2^1 + 0 * 2^0
= -8 + 0 + 2 + 0
= -6
With this scheme, the maximum value of a 4-bit 2's complement integer is 7.
0 1 1 1
-2^3 2^2 2^1 2^0
And the min is -8.
1 0 0 0
-2^3 2^2 2^1 2^0
Also, 0 is represented by 0000, 1 by 0001, and -1 by 1111. Comparing these three numbers, we can see that zero has its sign bit clear (0), and there is no "negative zero" in the 2's complement scheme. In other words, one half of the bit patterns consists only of negative numbers, while the other half includes zero as well as the positive numbers.
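The weighted-sum reading above translates directly into code. A minimal C++ sketch, assuming the 4-bit pattern is passed in the low nibble of a `uint8_t` (`decode4` is a hypothetical name):

```cpp
#include <cstdint>

// Value of a 4-bit two's complement pattern (low nibble of `bits`),
// using the weights -2^3, 2^2, 2^1, 2^0: the top bit counts as -8.
int decode4(std::uint8_t bits) {
    int b3 = (bits >> 3) & 1;
    int b2 = (bits >> 2) & 1;
    int b1 = (bits >> 1) & 1;
    int b0 = bits & 1;
    return b3 * -8 + b2 * 4 + b1 * 2 + b0;
}
```

`decode4(0b1010)` is -6, `decode4(0b0111)` is 7, `decode4(0b1000)` is -8 and `decode4(0b1111)` is -1, matching the values above.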
If integers are stored using two's complement, you get one extra negative value and a single zero. If they are stored using one's complement or sign-magnitude, you get two zeros and the same number of negative values as positive ones. Floating-point numbers have their own storage scheme, and IEEE formats use an explicit sign bit.
I know that the first digit in the variable is used to tell if it is negative (1) or if it is not (0).
Yes, the first binary digit (bit) does that, assuming two's complement representation, and that basically answers your question: there are 32,768 numbers < 0 (-32,768 .. -1) and 32,768 numbers >= 0 (0 .. +32,767).
Also note that in binary the total number of possible representations (bit patterns) is even: 2^n for n bits. You couldn't have min and max values with equal absolute values, since you'd end up with an odd number of possible values (counting 0). Thus, you'd have to waste, or declare illegal, at least one bit pattern.
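The counting argument can be checked exhaustively for a 16-bit short. This C++ sketch assumes `int16_t` is the 16-bit type in question (`count_int16_signs` is a hypothetical name); it reinterprets every bit pattern and tallies the signs:

```cpp
#include <cstdint>
#include <limits>

struct SignCounts { long negative; long non_negative; };

// Walk all 2^16 bit patterns, reinterpret each as a 16-bit two's complement
// value, and count how many are negative vs. zero-or-positive.
SignCounts count_int16_signs() {
    SignCounts c{0, 0};
    for (std::uint32_t p = 0; p <= 0xFFFF; ++p) {
        // Conversion of an out-of-range value to int16_t wraps modulo 2^16
        // (guaranteed since C++20, and what mainstream compilers did before).
        std::int16_t v = static_cast<std::int16_t>(p);
        if (v < 0) ++c.negative; else ++c.non_negative;
    }
    return c;
}

// The asymmetric limits themselves, as reported by the standard library.
static_assert(std::numeric_limits<std::int16_t>::min() == -32768, "min");
static_assert(std::numeric_limits<std::int16_t>::max() == 32767, "max");
```

It reports 32,768 negative and 32,768 non-negative patterns, the latter including zero.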
I'd like to know the science behind the following. A 32-bit value is shifted left by 32 bits in a 64-bit type, and then a division is performed. Somehow the fractional precision ends up in the low 32 bits, and to retrieve the value as a floating-point number I can multiply by 1 over 2^32 (one more than the maximum value of an unsigned 32-bit int).
uint64_t phase = ((uint64_t)44100 << 32) / 48000;
(phase & 0xffffffff) * (1.0f / 4294967296.0f);// == 0.918749988
the same as
(float)44100/48000;// == 0.918749988
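The trick is ordinary 32.32 fixed point: shifting the numerator left by 32 makes the integer quotient carry 32 fractional bits. A C++ sketch of the two steps, assuming the numerator fits in 32 bits; `div_q32_32` and `frac_part` are hypothetical names, and `double` is used instead of `float` so none of the 32 bits are rounded away:

```cpp
#include <cstdint>

// Divide num by den in 32.32 fixed point. Shifting num left by 32 first
// means the integer quotient carries 32 extra low-order bits, which hold
// the fraction scaled by 2^32. num is 32-bit, so the shift cannot overflow.
// den must be nonzero.
std::uint64_t div_q32_32(std::uint32_t num, std::uint32_t den) {
    return (static_cast<std::uint64_t>(num) << 32) / den;
}

// The low 32 bits are the fractional part; dividing by 2^32 turns that
// integer back into a value in [0, 1). double keeps all 32 bits exactly.
double frac_part(std::uint64_t q) {
    return static_cast<double>(q & 0xFFFFFFFFu) / 4294967296.0;
}
```

For 44100 / 48000 the quotient is 3946001203, i.e. 0.91875 * 2^32 truncated, and `frac_part` turns it into approximately 0.91875.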
(...)
If you lose precision when dividing two integer numbers, you should remember the remainder.
In C++, the remainder can be computed with 44100 % 48000 in your case.
Actually, these are constants, and it's clear that 44100/48000 == 0, so the remainder is all you have.
In fact, the remainder will be, guess what, 44100!
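The remainder carries the same information as the shifted division. A C++ sketch with hypothetical names `div_with_rem` and `frac_bits_from_rem`: the quotient is 0, the remainder is 44100, and scaling the remainder by 2^32 before dividing reproduces the low 32 bits of the shifted quotient.

```cpp
#include <cstdint>

struct QuotRem { std::uint32_t quot; std::uint32_t rem; };

// Integer division keeps the quotient; the remainder holds what was lost.
QuotRem div_with_rem(std::uint32_t num, std::uint32_t den) {
    return {num / den, num % den};
}

// 32 fractional bits of rem / den. Since rem < den, the result fits in
// 32 bits, and it equals the low 32 bits of ((uint64_t)num << 32) / den.
std::uint32_t frac_bits_from_rem(std::uint32_t rem, std::uint32_t den) {
    return static_cast<std::uint32_t>((static_cast<std::uint64_t>(rem) << 32) / den);
}
```

So `div_with_rem(44100, 48000)` yields quotient 0 and remainder 44100, and the fraction is simply 44100 / 48000 computed in floating point.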
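The float type (imposed by the explicit cast) has only a 24-bit significand, roughly 7 significant decimal digits. 4294967296.0f itself is exact because it is a power of two (2^32), but the 32-bit integer (phase & 0xffffffff) is rounded when it is converted to float, so its low-order bits are lost. That's why float isn't suitable for anything beyond rough results here.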
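The precision limit is easy to observe. A C++ sketch (the helper name `survives_float_roundtrip` is hypothetical) that tests whether an integer survives a round trip through `float`; with a 24-bit significand, integers above 2^24 come back unchanged only when their low-order bits happen to be zero:

```cpp
#include <cstdint>

// Convert to float and back; true if no bits were rounded away.
// float has a 24-bit significand, so large integers are rounded to the
// nearest representable value on the way in.
bool survives_float_roundtrip(std::uint64_t n) {
    return static_cast<std::uint64_t>(static_cast<float>(n)) == n;
}
```

For example, 16777216 (2^24) survives, while 16777217 (2^24 + 1) and the phase value 3946001203 both come back changed.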
A reliable way to get a value of a fixed-width integer type with all bits set, without miscounting the 'f's in a literal like 0xffffffff, is to apply the ~ operator to 0. In your case, ~uint32_t(0).
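A generic version of the `~` idiom as a C++ sketch; `all_ones` is a hypothetical helper, and `std::numeric_limits<U>::max()` is the standard-library way to get the same value for unsigned types:

```cpp
#include <cstdint>
#include <limits>

// All-ones value for a fixed-width unsigned type, without counting 'f's:
// ~ flips every bit of 0. The cast back is needed because ~ on a type
// narrower than int operates on the promoted int.
template <typename U>
constexpr U all_ones() {
    return static_cast<U>(~U(0));
}

static_assert(all_ones<std::uint32_t>() == 0xFFFFFFFFu, "32 ones");
static_assert(all_ones<std::uint32_t>() == std::numeric_limits<std::uint32_t>::max(), "same as max()");
static_assert(all_ones<std::uint8_t>() == 0xFF, "8 ones");
```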
Well, I should have said this at the beginning: 44100.0/48000 should give you the result you want. :P
This is the answer I was looking for:
Shifting left by n bits provides n extra low-order bits in which to store the fractional part of a division's result.
Dividing the integer value held in those bits by 2 to the power of the shift amount (2^n) recovers the fractional value.
E.g.:
0000 0001 * 2^8 = 1 0000 0000 = 256 (base 10)
1 0000 0000 / 2 = 1000 0000 = 128 (base 10)
128 / 2^8 = 0.5
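The same shift-then-divide idea for an arbitrary number of fractional bits, as a C++ sketch; `shifted_divide` is a hypothetical name, and it assumes n < 64 and that `a << n` does not overflow 64 bits:

```cpp
#include <cstdint>

// Divide a by b keeping n fractional bits: shifting a left by n first makes
// the integer quotient equal to (a / b) scaled by 2^n (truncated); dividing
// that by 2^n as a double recovers the fractional value. b must be nonzero.
double shifted_divide(std::uint64_t a, std::uint64_t b, unsigned n) {
    std::uint64_t scaled = (a << n) / b;  // e.g. (1 << 8) / 2 = 128
    return static_cast<double>(scaled) / static_cast<double>(std::uint64_t{1} << n);
}
```

`shifted_divide(1, 2, 8)` reproduces the example: (1 << 8) / 2 = 128, and 128 / 2^8 = 0.5. With n = 0 it degenerates to plain integer division and returns 0.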
Suppose int is 2 bytes and i = -5. The sign bit (the leftmost bit) is 1, which signifies a negative number.
When I do a right shift, shouldn't I expect the 1 at bit position 15 to move to position 14, giving me a large but positive value?
What I tried:
int i = 5;
i >> 1 // gives me 2 (I understand this)
int i = -5;
i >> 1 // gives me -3 (mind = blown)
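Right shifts of negative values are implementation-defined before C++20 ([expr.shift]/3); since C++20 the result is defined as E1 / 2^E2 rounded down, i.e. an arithmetic shift: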
The value of E1 >> E2 is E1 right-shifted E2 bit positions. [..]. If E1 has a signed type and a negative value, the resulting value is implementation-defined.
Most implementations use the so-called arithmetic shift though, which preserves and extends the sign-bit:
Shifting right by n bits on a two's complement signed binary number has the effect of dividing it by 2^n, but it always rounds down (towards negative infinity). This is different from the way rounding is usually done in signed integer division (which rounds towards 0). This discrepancy has led to bugs in more than one compiler.
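The difference between the two roundings can be seen with a tiny C++ sketch (the helper names are hypothetical); it assumes the common arithmetic-shift behaviour for negative operands, which C++20 guarantees and earlier standards leave implementation-defined:

```cpp
// Arithmetic right shift by one: rounds toward negative infinity.
int shift_half(int x) { return x >> 1; }

// Integer division by two: truncates toward zero.
int divide_half(int x) { return x / 2; }
```

For -5, `shift_half` gives -3 while `divide_half` gives -2; for +5 both give 2.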
Here is what happens, shortened to 8 bits. In two's complement, -5 would be
1111 1011
After the arithmetic right shift:
1111 1101
Now flip and add one to get the positive value for comparison:
0000 0011
Looks like a three to me.
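The walkthrough above can be reproduced by printing the 8-bit patterns with `std::bitset`. A C++ sketch with a hypothetical helper `bits8`, again assuming an arithmetic right shift:

```cpp
#include <bitset>
#include <cstdint>
#include <string>

// 8-bit two's complement pattern of v as a string of '0'/'1' characters.
std::string bits8(std::int8_t v) {
    return std::bitset<8>(static_cast<std::uint8_t>(v)).to_string();
}
```

`bits8(-5)` gives "11111011", `bits8(static_cast<std::int8_t>(-5 >> 1))` gives "11111101", and `bits8(3)` gives "00000011".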
Given any 8-bit negative integer (signed, so between -1 and -128), a right shift in HLA causes an overflow, and I don't understand why. Shifting once should basically divide the value by 2. This is true for positive numbers but obviously not for negative ones. Why? For example, if -10 is entered, the result is +123.
Program cpy;
#include ("stdlib.hhf")
#include ("hla.hhf")
static
i:int8;
begin cpy;
stdout.put("Enter value to divide by 2: ");
stdin.geti8();
mov(al,i);
shr(1,i); //shift bits one position right
if(#o)then // if overflow
stdout.put("overflow");
endif;
end cpy;
Signed numbers are represented with their 2's complement in binary, plus a sign bit "on the left".
The 2's complement of 10 coded on 7 bits is 1110110, and the sign bit value for negative numbers is 1.
-10: 1111 0110 (the leftmost bit is the sign bit)
Then you shift it to the right (shr is a logical shift, so zeros are shifted in from the left):
-10 >> 1: 0111 1011 (the leftmost bit is the sign bit)
Your sign bit is now 0 (positive), and 1111011 is 123 in decimal.
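The difference between what `shr` does and a sign-preserving shift can be modelled in C++. In this sketch (hypothetical helper names) the logical version reinterprets the byte as unsigned before shifting, and the arithmetic version relies on the sign-propagating `>>` of mainstream compilers (guaranteed since C++20). On x86, the arithmetic counterpart of `shr` is the `sar` instruction.

```cpp
#include <cstdint>

// Logical shift right, as shr does: treat the byte as unsigned, so a 0
// enters from the left and the sign bit is lost.
std::int8_t logical_shr1(std::int8_t v) {
    return static_cast<std::int8_t>(static_cast<std::uint8_t>(v) >> 1);
}

// Arithmetic shift right: the sign bit is copied in, so negative values
// stay negative (rounding toward negative infinity).
std::int8_t arithmetic_shr1(std::int8_t v) {
    return static_cast<std::int8_t>(v >> 1);
}
```

`logical_shr1(-10)` gives 123, as in the HLA program, while `arithmetic_shr1(-10)` gives -5.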