calculation of bits needed - bit-manipulation

I need help with this.
I was asked how many bits are needed for an unsigned integer range of 1 to 1 billion.
How do we calculate this?
Thank you.
UPDATE:
This is what I wanted to know, because the interviewer said 17.

Take the log base 2 of 1 billion and round up.
Alternatively, you should know that a 32-bit integer has just over 4 billion values, so 2 billion values need 31 bits and 1 billion values need 30 bits.
Another handy thing to know is that every 10 bits increase the number of values you can represent by a factor just over 1000 (1024), so for 1000, you need 10 bits, 1 million needs 20 bits, and 1 billion needs 30 bits.
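As a quick check of the rules of thumb above, here is a minimal Python sketch. The helper name bits_for_values is made up for illustration; it uses int.bit_length(), which sidesteps floating-point rounding in the logarithm.

```python
def bits_for_values(count):
    """Bits needed to give each of `count` distinct values its own code."""
    if count < 1:
        raise ValueError("count must be at least 1")
    # The codes run from 0 to count - 1, so the largest code decides the width.
    return (count - 1).bit_length()


for count in (1_000, 1_000_000, 1_000_000_000, 2_000_000_000, 2**32):
    print(count, bits_for_values(count))
```

This counts how many distinct values must be told apart, which is the reading the answers above use for "1 to 1 billion".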

Calculate log2(1000000000) and round it up. It works out to 30 bits.
For example, in Python 3 you can calculate it like this (Python 2 prints 30.0 instead):
>>> import math
>>> math.ceil(math.log(1000000000, 2))
30

2^10 = 1024
2^10 * 2^10 = 2^20 = 1024*1024 = 1048576
2^10 * 2^10 * 2^10 = 2^30 = 1024*1024*1024 = 1,073,741,824 ~= 1,000,000,000
=> 30 Bits

Related

Optimal way to compress 60 bit string

Given 15 random hexadecimal digits (60 bits), where there is always at least 1 duplicate in every 20-bit run (5 hexadecimal digits).
What is the optimal way to compress the bytes?
Here are some examples:
01230 45647 789AA
D8D9F 8AAAF 21052
20D22 8CC56 AA53A
AECAB 3BB95 E1E6D
9993F C9F29 B3130
Initially I tried Huffman encoding on just 20 bits, because Huffman coding can take 20 bits down to ~10 bits, but storing the table takes more than 9 bits.
Here is the breakdown showing 20 bits -> 10 bits for 01230:
Character  Frequency  Assignment  Space Savings
---------------------------------------------------------
0          2          0           2×4 - 2×1 = 6 bits
2          1          10          1×4 - 1×2 = 2 bits
1          1          110         1×4 - 1×3 = 1 bit
3          1          111         1×4 - 1×3 = 1 bit
I then tried Huffman encoding on all 300 bits (five 60-bit runs), and here is the mapping given the above example:
Character  Frequency  Assignment  Space Savings
---------------------------------------------------------
a          10         101         10×4 - 10×3 = 10 bits
9          8          000         8×4 - 8×3 = 8 bits
2          7          1111        7×4 - 7×4 = 0 bits
3          6          1101        6×4 - 6×4 = 0 bits
0          5          1100        5×4 - 5×4 = 0 bits
5          5          1001        5×4 - 5×4 = 0 bits
1          4          0010        4×4 - 4×4 = 0 bits
8          4          0111        4×4 - 4×4 = 0 bits
d          4          0101        4×4 - 4×4 = 0 bits
f          4          0110        4×4 - 4×4 = 0 bits
c          4          1000        4×4 - 4×4 = 0 bits
b          4          0011        4×4 - 4×4 = 0 bits
6          3          11100       3×4 - 3×5 = -3 bits
e          3          11101       3×4 - 3×5 = -3 bits
4          2          01000       2×4 - 2×5 = -2 bits
7          2          01001       2×4 - 2×5 = -2 bits
This yields a savings of 8 bits overall, but 8 bits isn't enough to store the Huffman table. Because the data is random, it seems that the more bits you try to encode with Huffman, the less effective it is. Huffman encoding seemed to work best with 20 bits (50% reduction), but storing the table in 9 bits or fewer isn't possible AFAIK.
In the worst case, a 60-bit string still has at least 3 duplicates; in the average case there are more than 3 (my assumption). With at least 3 duplicates, a run of 60 bits can contain at most 12 distinct symbols.
Because of the duplicates and the fewer than 16 symbols, I can't help but feel that some type of compression can be used.
If I simply count the number of 20-bit values with at least two hexadecimal digits equal, there are 524,416 of them. That is a smidge more than 2^19 (524,288). So the most you could possibly save is a little less than one bit out of the 20.
Hardly seems worth it.
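The count above can be reproduced in a few lines of Python. This is only a sketch of the arithmetic (all 5-digit hex strings minus those whose digits are all distinct); the variable names are illustrative.

```python
import math

total = 16 ** 5                          # all 5-digit hex strings (20 bits)
all_distinct = 16 * 15 * 14 * 13 * 12    # strings with no repeated digit
with_duplicate = total - all_distinct    # strings with at least one repeat

print(with_duplicate)                    # 524416
print(math.log2(with_duplicate))         # just over 19 bits of information
```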
If I split your question in two parts:
How do I compress (perfectly) random data: You can't. Every bit is new entropy which can't be "guessed" by a compression algorithm.
How to compress "one duplicate in five characters": There are exactly 10 options where the duplicate can be (see table below). This is basically the entropy. Just store which option it is (maybe grouped for the whole line).
These are the options:
AAbcd = 1 AbAcd = 2 AbcAd = 3 AbcdA = 4 (<-- cases where first character is duplicated somewhere)
aBBcd = 5 aBcBd = 6 aBcdB = 7 (<-- cases where second character is duplicated somewhere)
abCCd = 8 abCdC = 9 (<-- cases where third character is duplicated somewhere)
abcDD = 0 (<-- cases where last characters are duplicated)
So for your first example:
01230 45647 789AA
The first one (01230) is option 4, the second 3 and the third option 0.
You can compress this by multiplying each consecutive by 10: (4*10 + 3)*10 + 0 = 430
And uncompress it by using divide and modulo: 430%10=0, (430/10)%10=3, (430/10/10)%10=4. So you could store your number like that:
1AE 0123 4567 789A
^^^ this is 430 in hex and requires only 10 bit
The three options combined give 1000 possible values (0 to 999), so 10 bits are enough.
Compared to storing these 3 characters normally (12 bits) you save 2 bits. As someone else already commented, this is probably not worth it. For the whole line it's even less: 2 bits / 60 bits = 3.3% saved.
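The scheme described above can be sketched in Python as follows. All function names here are hypothetical, and the choice to record the first matching pair when a group has several duplicates is an assumption; the answer does not specify it.

```python
from itertools import combinations

# Position pairs in the order of the table above: (0,1)=1, (0,2)=2, ..., (2,4)=9, (3,4)=0.
PAIRS = list(combinations(range(5), 2))
OPTION_OF_PAIR = {pair: (index + 1) % 10 for index, pair in enumerate(PAIRS)}
PAIR_OF_OPTION = {option: pair for pair, option in OPTION_OF_PAIR.items()}


def compress_group(group):
    """Return (option, remaining 4 digits) for a 5-digit group with a duplicate."""
    for i, j in PAIRS:
        if group[i] == group[j]:
            # Drop the second copy; the option number remembers where it was.
            return OPTION_OF_PAIR[(i, j)], group[:j] + group[j + 1:]
    raise ValueError("group has no duplicate digit")


def decompress_group(option, rest):
    """Rebuild the 5-digit group from its option number and 4 remaining digits."""
    i, j = PAIR_OF_OPTION[option]
    return rest[:j] + rest[i] + rest[j:]


def pack_options(options):
    """Combine option numbers base 10: [4, 3, 0] -> 430."""
    value = 0
    for option in options:
        value = value * 10 + option
    return value


def unpack_options(value, count):
    """Inverse of pack_options, using divide and modulo."""
    options = []
    for _ in range(count):
        value, option = divmod(value, 10)
        options.append(option)
    return options[::-1]


groups = ["01230", "45647", "789AA"]
compressed = [compress_group(g) for g in groups]
packed = pack_options([option for option, _ in compressed])
print(format(packed, "X"), [rest for _, rest in compressed])
```

For the first example line this prints the 1AE prefix followed by the three shortened groups, matching the layout shown above.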
If you want to get rid of the duplicates first, do this, then look at the links at the bottom of the page. If you don't want to get rid of the duplicates, then still look at the links at the bottom of the page:
// Returns true if the array holds a value strictly equal to v.
Array.prototype.contains = function(v) {
    for (var i = 0; i < this.length; i++) {
        if (this[i] === v) return true;
    }
    return false;
};
// Returns a new array with duplicates removed, keeping first occurrences.
Array.prototype.unique = function() {
    var arr = [];
    for (var i = 0; i < this.length; i++) {
        if (!arr.contains(this[i])) {
            arr.push(this[i]);
        }
    }
    return arr;
};
var duplicates = [1, 3, 4, 2, 1, 2, 3, 8];
var uniques = duplicates.unique(); // result = [1,3,4,2,8]
console.log(uniques);
That shortens the data you have to deal with. Then you might want to check out Smaz:
Smaz is a simple compression library suitable for compressing strings.
If that doesn't work, then you could take a look at this:
http://ed-von-schleck.github.io/shoco/
Shoco is a C library to compress and decompress short strings. It is very fast and easy to use. The default compression model is optimized for English words, but you can generate your own compression model based on your specific input data.
Let me know if it works!

c++, binary number calculations

I have a question that asks how values such as c are computed in terms of binary numbers. I'm researching it now, but figured I'd ask here in case anyone can point me somewhere or explain how this works.
int main()
{
    int a = 10, b = 12, c, d; // b and d are unused in this snippet
    c = a << 2;               // c == 40
}
Well, I'm not answering with C++ code, as the question is not really related to the language.
The integer ten is written 10 in base 10 as it's equal to 1 * 10^1 + 0 * 10^0.
Binary is base 2, so let's try to write ten as a sum of powers of 2.
10 = 8 + 2
That is 2^3 + 2^1.
Let's switch to binary (using only two digits : 0 and 1).
2^3 is written 1000
2^1 is written 10
Their sum is 1010 in binary.
"<<" is the operator that shifts the binary digits left by a given number of positions (beware of overflow).
So 1010 << 2 is 101000
That is in decimal 2^5 + 2^3 = 32 + 8 = 40
You can also think of "<< N" as being a multiplication by 2^N of an integer.
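The same steps can be checked in a short Python sketch (Python rather than C++, purely for a quick interactive check). Note that Python integers grow as needed, so the overflow caveat above applies to fixed-width C++ types, not to this snippet.

```python
a = 10
c = a << 2

print(bin(a))   # 0b1010
print(bin(c))   # 0b101000
print(c)        # 40

# Shifting left by N positions multiplies by 2**N.
for n in range(5):
    assert a << n == a * 2 ** n
```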

In Doom3's source code, why did they use bitshift to generate the number instead of hardcoding it?

Why did they do this:
Sys_SetPhysicalWorkMemory( 192 << 20, 1024 << 20 ); //Min = 201,326,592 Max = 1,073,741,824
Instead of this:
Sys_SetPhysicalWorkMemory( 201326592, 1073741824 );
The article I got the code from
A neat property is that shifting a value << 10 is the same as multiplying it by 1024 (1 KiB), and << 20 is 1024*1024, (1 MiB).
Shifting by successive multiples of 10 yields all of our standard units of computer storage:
1 << 10 = 1 KiB (Kibibyte)
1 << 20 = 1 MiB (Mebibyte)
1 << 30 = 1 GiB (Gibibyte)
...
So that call passes 192 MiB (min) and 1024 MiB (max) to Sys_SetPhysicalWorkMemory(int minBytes, int maxBytes).
Self commenting code:
192 << 20 means 192 * 2^20 = 192 * 2^10 * 2^10 = 192 * 1024 * 1024 = 192 MByte
1024 << 20 means 1024 * 2^20 = 1 GByte
Computations on constants are optimized away so nothing is lost.
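As a sanity check on the arithmetic, here is a small Python sketch that evaluates the same shifts. The constant names KIB, MIB and GIB are illustrative and do not come from the Doom 3 source.

```python
KIB, MIB, GIB = 1 << 10, 1 << 20, 1 << 30

min_bytes = 192 << 20     # 192 MiB
max_bytes = 1024 << 20    # 1024 MiB, i.e. 1 GiB

print(min_bytes, max_bytes)
print(min_bytes == 192 * MIB, max_bytes == GIB)
```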
I might be wrong (and I didn't study the source) , but I guess it's just for readability reasons.
I think the point (not mentioned yet) is that all but the most basic compilers will do the shift at compilation time. Whenever you use operators with constant expressions, the compiler will be able to do this before the code is even generated.
Note that before constexpr and C++11, this did not extend to functions.

represent negative number with 2's complement technique?

I am using 2's complement to represent a negative number in binary form.
Case 1: number -5
According to the 2's complement technique:
Convert 5 to the binary form:
00000101, then flip the bits
11111010, then add 1
00000001
=> result: 11111011
To make sure this is correct, I re-calculate to decimal:
-128 + 64 + 32 + 16 + 8 + 2 + 1 = -5
Case 2: number -240
The same steps are taken:
11110000
00001111
00000001
00010000 => converting this back, I get 16, not -240
Am I misunderstanding something?
The problem is that you are trying to represent 240 with only 8 bits. The range of an 8 bit signed number is -128 to 127.
If you instead represent it with 9 bits, you'll see you get the correct answer:
011110000 (240)
100001111 (flip the bits)
+
000000001 (1)
=
100010000
=
-256 + 16 = -240
Did you forget that -240 cannot be represented with 8 bits when it is signed?
The lowest negative number you can express with 8 bits is -128, which is 10000000.
Using 2's complement:
128 = 10000000
(flip) = 01111111
(add 1) = 10000000
The lowest negative number you can express with N bits (with signed integers of course) is always - 2 ^ (N - 1).
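To make the range limit concrete, here is a minimal Python sketch of the conversion in both directions. The function names are made up for illustration; the range check mirrors the -2^(N-1) to 2^(N-1)-1 rule stated above.

```python
def to_twos_complement(value, bits):
    """Return the two's complement bit pattern of value as a string of `bits` digits."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{value} does not fit in {bits} signed bits ({lo}..{hi})")
    # Masking with 2**bits - 1 maps negative values to their two's complement pattern.
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def from_twos_complement(pattern):
    """Interpret a string of 0s and 1s as a signed two's complement number."""
    raw = int(pattern, 2)
    # If the top bit is set, the pattern stands for raw - 2**bits.
    return raw - (1 << len(pattern)) if pattern[0] == "1" else raw


print(to_twos_complement(-5, 8))      # 11111011
print(to_twos_complement(-240, 9))    # 100010000
print(from_twos_complement("00010000"))
```

Calling to_twos_complement(-240, 8) raises ValueError, which is the 8-bit overflow the question ran into.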

Find rank of a number on basis of number of 1's

Let f(k) = y where k is the y-th number in the increasing sequence of non-negative integers with the same number of ones in its binary representation as k, e.g. f(0) = 1, f(1) = 1, f(2) = 2, f(3) = 1, f(4) = 3, f(5) = 2, f(6) = 3 and so on. Given k >= 0, compute f(k).
Many of us have seen this question.
One solution to this problem is to categorise numbers by their number of 1's and then find the rank. I did find some patterns going this way, but it would be a lengthy process. Can anyone suggest a better solution?
This is a counting problem. I think that if you approach it with this in mind, you can do much better than literally enumerating values and checking how many bits they have.
Consider the number 17. The binary representation is 10001. The number of 1s is 2. We can get smaller numbers with two 1s by (in this case) re-distributing the 1s to any of the four low-order bits. 4 choose 2 is 6, so 17 should be the 7th number with 2 ones in the binary representation. We can check this...
0 00000 -
1 00001 -
2 00010 -
3 00011 1
4 00100 -
5 00101 2
6 00110 3
7 00111 -
8 01000 -
9 01001 4
10 01010 5
11 01011 -
12 01100 6
13 01101 -
14 01110 -
15 01111 -
16 10000 -
17 10001 7
And we were right. Generalize that idea and you should get an efficient function for which you simply compute the rank of k.
EDIT: Hint for generalization
17 is special in that if you don't consider the high-order bit, the number has rank 1; that is, f(z) = 1 where z is everything except the higher order bit. For numbers where this is not the case, how can you account for the fact that you can get smaller numbers without moving the high-order bit?
f(k) is the number of non-negative integers less than or equal to k that have the same number of ones in their binary representation as k.
Suppose k needs m bits, that is k = 2^(m-1) + a, where a < 2^(m-1). The number of integers less than 2^(m-1) that have the same number of one bits as k is choose(m-1, bitcount(k)), since you can freely redistribute the ones among the m-1 least significant bits.
Integers that are greater than or equal to 2^(m-1) and at most k have the same most significant bit as k (which is 1), so there are f(k - 2^(m-1)) of them. This implies f(k) = choose(m-1, bitcount(k)) + f(k-2^(m-1)), with f(0) = 1 as the base case.
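The recurrence above translates almost directly into Python. This is a sketch, assuming f(0) = 1 as the base case (taken from the question's examples); the names rank and rank_brute_force are illustrative, and math.comb needs Python 3.8 or later.

```python
from math import comb  # Python 3.8+


def rank(k):
    """f(k): position of k among non-negative integers with the same number of one bits."""
    if k == 0:
        return 1  # base case: 0 is the only integer with no one bits
    m = k.bit_length()              # k = 2^(m-1) + a
    ones = bin(k).count("1")
    top = 1 << (m - 1)
    # comb returns 0 when ones > m - 1, which covers k = 2^m - 1.
    return comb(m - 1, ones) + rank(k - top)


def rank_brute_force(k):
    """Reference implementation: count matching integers from 0 to k."""
    ones = bin(k).count("1")
    return sum(1 for n in range(k + 1) if bin(n).count("1") == ones)


print([rank(k) for k in range(7)])
print(rank(17))
```

The recursion strips one set bit per call, so the depth is at most the number of one bits in k, whereas the brute-force version scans every integer up to k.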
See "Efficiently Enumerating the Subsets of a Set". Look at Table 3, the "Bankers sequence". This is a method to generate exactly the sequence you need (if you reverse the bit order). Just run K iterations for the word with K bits. There is code to generate it included in the paper.