How do I reinterpret the exponents (2^31, ..., 2^0) of an integer? - bit-manipulation

I have a regular 32 bit Integer where the bits represent b31*2^31, b30*2^30, ..., b0*2^0 respectively.
Is there a way to decrease each exponent by 32, i.e. calculate b31*2^-1 + b30*2^-2 + ... + b0*2^-32 (similar to an IEEE Float mantissa) without having to extract the bits (e.g. using shifts or mod)?
E.g. for an 8 bit Integer, if input is 0010 1111 = 47 then output is 0 * 2^-1 + 0 * 2^-2 + 1 * 2^-3 + 0 * 2^-4 + 1 * 2^-5 + 1 * 2^-6 + 1 * 2^-7 + 1 * 2^-8 = 0.18359375.

Related

Why is the 4s-complement of 412 (in decimal) 321210?

I used an online converter to convert 412 from decimal to base 4 (which is 12130), and then applied the r's complement formula to get its 4s complement (which is 21210). However, in 6bits, 21210 becomes 321210.
When I try to convert it to decimal by doing
3 x - 4^5 + 2 x 4^4 + 1 x 4^3 + 2 x 4^2 + 1 x 4^1,
I get a number in decimal that is way larger than 412.
You have your "conversion to decimal" wrong -- it should be:
-1 x 4^5 + 2 x 4^4 + 1 x 4^3 + 2 x 4^2 + 1 x 4^1
note the difference in the first term, dealing with the sign -- the '3' digit has a value of -1 in a 4s complement sign digit.

Bitwise Operators NOT

I encountered a problem with bit arithmetic. It is bitwise NOT.
if A = 5; then ~A = ?
The binary of 5 is 101, the inverse is 010, and then converted to decimal is 0 * 2^2 + 1 * 2^1 + 0 * 2^0 = 2
But when I test in the IDE, the output is as follows:
System.out.println( ~5 );
Output:
-6
I don't know why. Thanks!!!
If you using a standard int, then after assignment your A to 5:
int A = 5;
Then your "A" would be not 101b, but 00000000000000000000000000000101b - all 32 bits.
After NEG operation, which inverse all bits, you will get:
A = 11111111111111111111111111111010
And this int-value is -6, in the 2-complement representation, used int the most of computers.

C/C++: 1.00000 <= 1.0f = False

Can someone explain why 1.000000 <= 1.0f is false?
The code:
#include <iostream>
#include <stdio.h>
using namespace std;
int main(int argc, char **argv)
{
float step = 1.0f / 10;
float t;
for(t = 0; t <= 1.0f; t += step)
{
printf("t = %f\n", t);
cout << "t = " << t << "\n";
cout << "(t <= 1.0f) = " << (t <= 1.0f) << "\n";
}
printf("t = %f\n", t );
cout << "t = " << t << "\n";
cout << "(t <= 1.0f) = " << (t <= 1.0f) << "\n";
cout << "\n(1.000000 <= 1.0f) = " << (1.000000 <= 1.0f) << "\n";
}
The result:
t = 0.000000
t = 0
(t <= 1.0f) = 1
t = 0.100000
t = 0.1
(t <= 1.0f) = 1
t = 0.200000
t = 0.2
(t <= 1.0f) = 1
t = 0.300000
t = 0.3
(t <= 1.0f) = 1
t = 0.400000
t = 0.4
(t <= 1.0f) = 1
t = 0.500000
t = 0.5
(t <= 1.0f) = 1
t = 0.600000
t = 0.6
(t <= 1.0f) = 1
t = 0.700000
t = 0.7
(t <= 1.0f) = 1
t = 0.800000
t = 0.8
(t <= 1.0f) = 1
t = 0.900000
t = 0.9
(t <= 1.0f) = 1
t = 1.000000
t = 1
(t <= 1.0f) = 0
(1.000000 <= 1.0f) = 1
As correctly pointed out in the comments, the value of t is not actually the same 1.00000 that you are defining in the line below.
Printing t with higher precision with std::setprecision(20) will reveal its actual value: 1.0000001192092895508.
The common way to avoid these kinds of issues is to compare not with 1, but with 1 + epsilon, where epsilon is a very small number, that is maybe one or two magnitudes greater than your floating point precision.
So you would write your for loop condition as
for(t = 0; t <= 1.000001f; t += step)
Note that in your case, epsilon should be atleast ten times greater than the maximum possible floating point error, as the float is added ten times.
As pointed out by Muepe and Alain, the reason for t != 1.0f is that 1/10 can not be precisely represented in binary floating point numbers.
Floating point types in C++ (and most other languages) are implemented using an approach that uses the available bytes (for example 4 or 8) for the following 3 components:
Sign
Exponent
Mantissa
Lets have a look at it for a 32 bit (4 byte) type which often is what you have in C++ for float.
The sign is just a simple bit beeing 1 or 0 where 0 could mean its positive and 1 that its negative. If you leave every standardization away that exists you could also say 0 -> negative, 1 -> positive.
The exponent could use 8 bits. Opposed to our daily life this exponent is not ment to be used to the base 10 but base 2. That means 1 as an exponent does not correspond to 10 but to 2, and the exponent 2 means 4 (=2^2) and not 100 (=10^2).
Another important part is, that for floating point variables we also might want to have negative exponents like 2^-1 beeing 0.5, 2^-2 for 0.25 and so on. Thus we define a bias value that gets subtracted from the exponent and yields the real value. In this case with 8 bits we'd choose 127 meaning that an exponent of 0 gives 2^-127 and an exponent of 255 means 2^128. But, there is an exception to this case. Usually two values of the exponent are used to mark NaN and infinity. Therefore the real exponent is from 0 to 253 giving a range from 2^-127 to 2^126.
The mantissa obviously now fills up the remaining 23 bits. If we see the mantissa as a series of 0 and 1 you can imagine its value to be like 1.m where m is the series of those bits, but not in powers of 10 but in powers of 2. So 1.1 would be 1 * 2^0 + 1 * 2^-1 = 1 + 0.5 = 1.5. As an example lets have a look at the following mantissa (a very short one):
m = 100101 -> 1.100101 to base 2 -> 1 * 2^0 + 1 * 2^-1 + 0 * 2^-2 + 0 * 2^-3 + 1 * 2^-4 + 0 * 2^-5 + 1 * 2^-6 = 1 * 1 + 1 * 0.5 + 1 * 1/16 + 1 * 1/64 = 1.578125
The final result of a float is then calculated using:
e * 1.m * (sign ? -1 : 1)
What exactly is going wrong in your loop: Your step is 0.1! 0.1 is a very bad number for floating point numbers to base 2, lets have a look why:
sign -> 0 (as its non-negative)
exponent -> The first value smaller than 0.1 is 2^-4. So the exponent should be -4 + 127 = 123
mantissa -> For this we check how many times the exponent is 0.1 and then try to convert the fraction to a mantissa. 0.1 / (2^-4) = 0.1/0.0625 = 1.6. Considering the mantissa gives 1.m our mantissa should be 0.6. So lets convert that to binary:
0.6 = 1 * 2^-1 + 0.1 -> m = 1
0.1 = 0 * 2^-2 + 0.1 -> m = 10
0.1 = 0 * 2^-3 + 0.1 -> m = 100
0.1 = 1 * 2^-4 + 0.0375 -> m = 1001
0.0375 = 1 * 2^-5 + 0.00625 -> m = 10011
0.00625 = 0 * 2^-6 + 0.00625 -> m = 100110
0.00625 = 0 * 2^-7 + 0.00625 -> m = 1001100
0.00625 = 1 * 2^-8 + 0.00234375 -> m = 10011001
We could continue like thiw until we have our 23 mantissa bits but i can tell you that you get:
m = 10011001100110011001...
Therefore 0.1 in a binary floating point environment is like 1/3 is in a base 10 system. Its a periodic infinite number. As the space in a float is limited there comes the 23rd bit where it just has to cut of, and therefore 0.1 is a tiny bit greater than 0.1 as there are not all infinite parts of the number in the float and after 23 bits there would be a 0 but it gets rounded up to a 1.
The reason is that 1.0/10.0 = 0.1 can not be represented exactly in binary, just as 1.0/3.0 = 0.333.. can not be represented exactly in decimals.
If we use
float step = 1.0f / 8;
for example, the result is as expected.
To avoid such problems, use a small offset as shown in the answer of mic_e.

Recursive function (help me understand it by pen and paper)

First of all I have to say that I can use recursive functions on easy examples like Fibonacci, but I can't understand how to dry run (solve with pen and paper) this recursion :
#include<iostream>
using namespace std;
int max(int a, int b)
{
if(a>b)return a;
return b;
}
int f(int a, int b)
{
if(a==0)return b;
return max( f(a-1,2*b), f(a-1,2*b+1) );
}
int main()
{
cout<<f(8,0);
}
How do I do this with pen and paper, with say, a = 5 and b = 6?
We have always a depth of a (8)
Each invocations calls itself 2 times, once 2b and once 2b+1 is passed
The greater result of both calls is returned
As 2b + 1 > 2b only the right site of the max call is meaningful (2b + 1)
Now lets do the first iterations mathematically:
2 * b + 1 = 2^1 * b + 2^0
2 * (2^1 * b + 2^0) + 1 = 2^2 * b + 2^1 + 2^0
2 * (2^2 * b + 2^1 + 2^0) + 1 = 2^3 * b + 2^2 + 2^1 + 2^0
2 * (2^3 * b + 2^2 + 2^1 + 2^0) + 1 = 2^4 * b + 2^3 + 2^2 + 2^1 + 2^0
As you can see there is a system behind it. Because b = 0 for the first iteration, we can ignore the left side. The final value is thus:
2^0 + 2^1 + 2^2 + 2^3 + 2^4 + 2^5 + 2^6 + 2^7
=
1 + 2 + 4 + 8 + 16 + 32 + 64 + 128
=
255
If we run the programm we get the exact same value
Just to give some information there are algorithms that use a little more complex parameters, one basic example would be mergesort
Merging is simple:
Take two elements one from each array A and B.
Compare them and place smaller of two (say from A) in sorted list.
Take next element from A and compare with element in hand (from B).
Repeat until one of the array is exhausted.
Now place all remaining elements of non-empty array one by one.
Maybe you can find this doc useful
Or maybe this one
Assuming you want to analyzse the funciton on paper, I'll paste the result for f(1, 2)
f(2, 1) =
max( f(1, 2), f(1, 3) ) =
max ( max(f(0, 4), f(0, 5) , max(f(0, 6), f(0, 7) ) =
max ( max(4, 5) , max(6, 7) ) =
max (5, 7) =
7
Is up to you to follow the computations
Note: I'm also assuming you didn't miss a parenthesis here: 2*b+1

In binary notation, what is the meaning of the digits after the radix point "."?

I have this example on how to convert from a base 10 number to IEEE 754 float representation
Number: 45.25 (base 10) = 101101.01 (base 2) Sign: 0
Normalized form N = 1.0110101 * 2^5
Exponent esp = 5 E = 5 + 127 = 132 (base 10) = 10000100 (base 2)
IEEE 754: 0 10000100 01101010000000000000000
This makes sense to me except one passage:
45.25 (base 10) = 101101.01 (base 2)
45 is 101101 in binary and that's okay.. but how did they obtain the 0.25 as .01 ?
Simple place value. In base 10, you have these places:
... 103 102 101 100 . 10-1 10-2 10-3 ...
... thousands, hundreds, tens, ones . tenths, hundredths, thousandths ...
Similarly, in binary (base 2) you have:
... 23 22 21 20 . 2-1 2-2 2-3 ...
... eights, fours, twos, ones . halves, quarters, eighths ...
So the second place after the . in binary is units of 2-2, well known to you as units of 1/4 (or alternately, 0.25).
You can convert the part after the decimal point to another base by repeatedly multiplying by the new base (in this case the new base is 2), like this:
0.25 * 2 = 0.5
-> The first binary digit is 0 (take the integral part, i.e. the part before the decimal point).
Continue multiplying with the part after the decimal point:
0.5 * 2 = 1.0
-> The second binary digit is 1 (again, take the integral part).
This is also where we stop because the part after the decimal point is now zero, so there is nothing more to multiply.
Therefore the final binary representation of the fractional part is: 0.012.
Edit:
Might also be worth noting that it's quite often that the binary representation is infinite even when starting with a finite fractional part in base 10. Example: converting 0.210 to binary:
0.2 * 2 = 0.4 -> 0
0.4 * 2 = 0.8 -> 0
0.8 * 2 = 1.6 -> 1
0.6 * 2 = 1.2 -> 1
0.2 * 2 = ...
So we end up with: 0.001100110011...2.
Using this method you see quite easily if the binary representation ends up being infinite.
"Decimals" (fractional bits) in other bases are surprisingly unintuitive considering they work in exactly the same way as integers.
base 10
scinot 10e2 10e1 10e0 10e-1 10e-2 10e-3
weight 100.0 10.0 1.0 0.1 0.01 0.001
value 0 4 5 .2 5 0
base 2
scinot 2e6 2e5 2e4 2e3 2e2 2e1 2e0 2e-1 2e-2 2e-3
weight 64 32 16 8 4 2 1 .5 .25 .125
value 0 1 0 1 1 0 1 .0 1 0
If we start with 45.25, that's bigger/equal than 32, so we add a binary 1, and subtract 32.
We're left with 13.25, which is smaller than 16, so we add a binary 0.
We're left with 13.25, which is bigger/equal than 8, so we add a binary 1, and subtract 8.
We're left with 05.25, which is bigger/equal than 4, so we add a binary 1, and subtract 4.
We're left with 01.25, which is smaller than 2, so we add a binary 0.
We're left with 01.25, which is bigger/equal than 1, so we add a binary 1, and subtract 1.
With integers, we'd have zero left, so we stop. But:
We're left with 00.25, which is smaller than 0.5, so we add a binary 0.
We're left with 00.25, which is bigger/equal to 0.25, so we add a binary 1, and subtract 0.25.
Now we have zero, so we stop (or not, you can keep going and calculating zeros forever if you want)
Note that not all "easy" numbers in decimal always reach that zero stopping point. 0.1 (decimal) converted into base 2, is infinitely repeating: 0.0001100110011001100110011... However, all "easy" numbers in binary will always convert nicely into base 10.
You can also do this same process with fractional (2.5), irrational (pi), or even imaginary(2i) bases, except the base cannot be between -1 and 1 inclusive .
2.00010 = 2+1 = 10.0002
1.00010 = 2+0 = 01.0002
0.50010 = 2-1 = 00.1002
0.25010 = 2-2 = 00.0102
0.12510 = 2-3 = 00.0012
The fractions base 2 are .1 = 1/2, .01 = 1/4. ...
Think of it this way
(dot) 2^-1 2^-2 2^-3 etc
so
. 0/2 + 1/4 + 0/8 + 0/16 etc
See http://floating-point-gui.de/formats/binary/
You can think of 0.25 as 1/4.
Dividing by 2 in (base 2) moves the decimal point one step left, the same way dividing by 10 in (base 10) moves the decimal point one step left. Generally dividing by M in (base M) moves the decimal point one step left.
so
base 10 base 2
--------------------------------------
1 => 1
1/2 = 0.5 => 0.1
0.5/2 = 1/4 = 0.25 => 0.01
0.25/2 = 1/8 = 0.125 => 0.001
.
.
.
etc.