Binary quicksort starting bit position - c++

I am reading about binary quicksort at the following location:
http://books.google.co.in/books?id=hyvdUQUmf2UC&pg=PA426&lpg=PA426&dq=robert+sedwick+binary+quick+sort&source=bl&ots=kAYK3_LkCg&sig=BjKk4g68h8xG87Vx2vS_TiUKDQY&hl=en&sa=X&ei=uuKzUq4-iY-tB7nZgdgL&ved=0CEYQ6AEwBA#v=onepage&q=robert%20sedwick%20binary%20quick%20sort&f=false
Text snippet:
For full-word keys consisting of random bits, the starting point in Program 10.1 should be the leftmost bit of the words, or bit 0. In general, the starting point that should be used depends in a straightforward way on the application, on the number of bits per word in the machine, and on the machine representation of integers and negative numbers. For the one-letter 5-bit keys in Figures 10.2 and 10.3, the starting point on a 32-bit machine would be bit 27.
My question on above text is:
Why does the author conclude that the starting point on a 32-bit machine should be bit 27 for 5-bit keys?

The text excerpt is confusing because it is incomplete.
It appears the text assumes big-endian bit numbering for bits within a machine word: in big-endian bit numbering, bit 0 is the leftmost bit of the word. The hint comes from the phrase "the leftmost bit of the words, or bit 0."
Therefore, for a 5-bit key held right-aligned in a 32-bit register with big-endian bit numbering, bit 0 of the key sits in bit 27 of the machine word (the key occupies bits 27 through 31), which is why the starting point is bit 27.
0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 machine word
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 bit numbers
+-----------------------------------------------------+---------+
|x x x x x x x x x x x x x x x x x x x x x x x x x x x|0 1 2 3 4| char to sort
+-----------------------------------------------------+---------+
Big-endian bit numbering is uncommon these days. IBM POWER / PowerPC still uses it, as did older big-endian architectures such as the TMS9900 / TMS99000 family.
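To make the indexing concrete, here is a minimal C++ sketch (my own illustration, not the book's Program 10.1) of reading "bit b" under big-endian numbering on a 32-bit word; it confirms that bit 27 is the leading bit of a right-aligned 5-bit key:

#include <cstdint>
#include <iostream>

// "Bit b" under big-endian bit numbering: bit 0 is the leftmost
// (most significant) bit of the 32-bit word.
unsigned bit_at(std::uint32_t word, int b) {
    return (word >> (31 - b)) & 1u;
}

int main() {
    std::uint32_t key = 0b10110;  // a 5-bit key, right-aligned in 32 bits
    // The key occupies big-endian bits 27..31, so the sort starts at bit 27.
    for (int b = 27; b <= 31; ++b)
        std::cout << bit_at(key, b);  // prints 10110
    std::cout << '\n';
}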


C++ how would you find a number with the most similar bits in N integers?

For example, say I have 3 integers: 18, 17, 21
those 3 integers in binary: 10010, 10001, 10101
Say there's a number x; I want x to take, in each position, the bit that most of the numbers share. For example, the first digit of each number is 1, so x starts off as "1....". The second digit of all three numbers is 0, so it becomes "10...". The third digit is a mix: we have a 0, a 0, and a 1, but since there are more 0s than 1s, x becomes "100..", and so on.
Is there any way to do this? I've been looking at bitwise operators and I'm just not sure how. Bitwise AND doesn't really work on three numbers like this, because if it sees even a single 0 in a position it returns 0 there.
I would simply add the bits if I were you. Imagine the numbers 17, 9, and 21, written in binary:
17 : 10001
 9 : 01001
21 : 10101
Put this in a "table" and sum your binary digits:
1 0 0 0 1
0 1 0 0 1
1 0 1 0 1
---------
2 1 1 0 3
... and then you say "when the column sum is 0 or 1, I put '0'; when it is 2 or 3, I put '1'", and you get:
1 0 0 0 1
=> your answer becomes "10001", which equals 17.
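A minimal C++ sketch of that column-summing idea (my own illustration, not code from the answer): count the set bits per position and keep the majority.

#include <cstdint>
#include <iostream>
#include <vector>

// Majority bit per position: for each bit, count how many inputs
// have it set, and keep it if more than half of them do.
std::uint32_t majority_bits(const std::vector<std::uint32_t>& nums) {
    std::uint32_t result = 0;
    for (int bit = 0; bit < 32; ++bit) {
        int ones = 0;
        for (std::uint32_t n : nums)
            ones += (n >> bit) & 1u;
        if (2 * ones > static_cast<int>(nums.size()))
            result |= (1u << bit);
    }
    return result;
}

int main() {
    std::vector<std::uint32_t> nums = {17, 9, 21};  // 10001, 01001, 10101
    std::cout << majority_bits(nums) << '\n';       // prints 17 (10001)
}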

Optimal way to compress 60 bit string

Given 15 random hexadecimal digits (60 bits), where every 20-bit run (5 hex digits) always contains at least one duplicated digit:
What is the optimal way to compress the bytes?
Here are some examples:
01230 45647 789AA
D8D9F 8AAAF 21052
20D22 8CC56 AA53A
AECAB 3BB95 E1E6D
9993F C9F29 B3130
Initially I tried Huffman coding on just 20 bits, because Huffman coding can take 20 bits down to ~10 bits, but storing the table takes more than 9 bits.
Here is the breakdown showing 20 bits -> 10 bits for 01230:
Character  Frequency  Assignment  Space Savings
0          2          0           2×4 - 2×1 = 6 bits
2          1          10          1×4 - 1×2 = 2 bits
1          1          110         1×4 - 1×3 = 1 bit
3          1          111         1×4 - 1×3 = 1 bit
Encoding 01230 with these codes gives 0 110 10 111 0, i.e. 10 bits.
I then tried Huffman coding on all 300 bits (five 60-bit strings); here is the mapping for the examples above:
Character Frequency Assignment Space Savings
---------------------------------------------------------
a 10 101 10×4 - 10×3 = 10 bits
9 8 000 8×4 - 8×3 = 8 bits
2 7 1111 7×4 - 7×4 = 0 bits
3 6 1101 6×4 - 6×4 = 0 bits
0 5 1100 5×4 - 5×4 = 0 bits
5 5 1001 5×4 - 5×4 = 0 bits
1 4 0010 4×4 - 4×4 = 0 bits
8 4 0111 4×4 - 4×4 = 0 bits
d 4 0101 4×4 - 4×4 = 0 bits
f 4 0110 4×4 - 4×4 = 0 bits
c 4 1000 4×4 - 4×4 = 0 bits
b 4 0011 4×4 - 4×4 = 0 bits
6 3 11100 3×4 - 3×5 = -3 bits
e 3 11101 3×4 - 3×5 = -3 bits
4 2 01000 2×4 - 2×5 = -2 bits
7 2 01001 2×4 - 2×5 = -2 bits
This yields a savings of 8 bits overall, but 8 bits isn't enough to store the Huffman table. It seems that, because of the randomness of the data, the more bits you try to encode with Huffman the less effective it becomes. Huffman coding seemed to work best on 20 bits (50% reduction), but storing the table in 9 or fewer bits isn't possible AFAIK.
In the worst case a 60-bit string still has at least 3 duplicates; in the average case there are more than 3 (my assumption). With at least 3 duplicates, the most distinct symbols you can have in a run of 60 bits is 12.
Because of the duplicates, plus having fewer than 16 symbols, I can't help but feel there is some type of compression that can be used.
If I simply count the number of 20-bit values with at least two hexadecimal digits equal, there are 524,416 of them (16^5 − 16·15·14·13·12 = 1,048,576 − 524,160), a smidge more than 2^19. So the most you could possibly save is a little less than one bit out of the 20.
Hardly seems worth it.
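For what it's worth, a quick brute-force check of that count (my own sketch, not part of the answer above):

#include <iostream>

int main() {
    // Count the 20-bit values (5 hex digits) that contain at least
    // one repeated digit, by checking every value exhaustively.
    int count = 0;
    for (int v = 0; v < (1 << 20); ++v) {
        bool seen[16] = {false};
        bool dup = false;
        for (int d = 0; d < 5; ++d) {
            int digit = (v >> (4 * d)) & 0xF;
            if (seen[digit]) { dup = true; break; }
            seen[digit] = true;
        }
        if (dup) ++count;
    }
    std::cout << count << '\n';  // prints 524416, just over 2^19 = 524288
}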
If I split your question in two parts:
How do I compress (perfectly) random data: you can't. Every bit is new entropy which can't be "guessed" by a compression algorithm.
How do I compress "one duplicate in five characters": there are exactly 10 options for where the duplicate can be (see the table below). This is basically the entropy. Just store which option it is (maybe grouped for the whole line).
These are the options:
AAbcd = 1 AbAcd = 2 AbcAd = 3 AbcdA = 4 (<-- cases where first character is duplicated somewhere)
aBBcd = 5 aBcBd = 6 aBcdB = 7 (<-- cases where second character is duplicated somewhere)
abCCd = 8 abCdC = 9 (<-- cases where third character is duplicated somewhere)
abcDD = 0 (<-- case where the last two characters are duplicated)
So for your first example:
01230 45647 789AA
The first group (01230) is option 4, the second is option 3, and the third is option 0.
You can compress these by multiplying each successive option by 10: (4×10 + 3)×10 + 0 = 430.
And decompress by using divide and modulo: 430 % 10 = 0, (430/10) % 10 = 3, (430/10/10) % 10 = 4. So you could store your number like this:
1AE 0123 4567 789A
^^^ this is 430 in hex and requires only 10 bits
There are 1000 possible combinations (000-999) for the three options, so 10 bits are enough.
Compared to storing those 3 duplicated characters directly you save 2 bits. As someone else already commented, this is probably not worth it. For the whole line it's even less: 2 bits / 60 bits = 3.3% saved.
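Here is a sketch of that scheme in C++ (my own illustration; the option numbering follows the table above): find the duplicated pair in each 5-digit group and pack the three option digits into one number.

#include <iostream>
#include <string>

// Option digit for each duplicate-position pair (i, j), i < j,
// following the table above: (0,1)->1 ... (2,4)->9, (3,4)->0.
int option_of(const std::string& group) {  // group: 5 hex digits
    static const int opt[5][5] = {
        {0, 1, 2, 3, 4},
        {0, 0, 5, 6, 7},
        {0, 0, 0, 8, 9},
        {0, 0, 0, 0, 0},  // (3,4) -> 0
    };
    for (int i = 0; i < 5; ++i)
        for (int j = i + 1; j < 5; ++j)
            if (group[i] == group[j]) return opt[i][j];
    return -1;  // no duplicate found (shouldn't happen per the premise)
}

int main() {
    // The three groups of the first example line.
    int packed = 0;
    for (const std::string& g : {"01230", "45647", "789AA"})
        packed = packed * 10 + option_of(g);
    std::cout << packed << '\n';  // prints 430 (0x1AE, fits in 10 bits)
}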
If you want to get rid of the duplicates first, do this, then look at the links at the bottom of the page. If you don't want to get rid of the duplicates, then still look at the links at the bottom of the page:
Array.prototype.contains = function(v) {
    for (var i = 0; i < this.length; i++) {
        if (this[i] === v) return true;
    }
    return false;
};

Array.prototype.unique = function() {
    var arr = [];
    for (var i = 0; i < this.length; i++) {
        if (!arr.contains(this[i])) {
            arr.push(this[i]);
        }
    }
    return arr;
};
var duplicates = [1, 3, 4, 2, 1, 2, 3, 8];
var uniques = duplicates.unique(); // result = [1,3,4,2,8]
console.log(uniques);
That would shorten the data you have to deal with. Then you might want to check out Smaz:
Smaz is a simple compression library suitable for compressing strings.
If that doesn't work, then you could take a look at this:
http://ed-von-schleck.github.io/shoco/
Shoco is a C library to compress and decompress short strings. It is very fast and easy to use. The default compression model is optimized for English words, but you can generate your own compression model based on your specific input data.
Let me know if it works!

Adding n bits to the first n bits of another number

I am doing a project on digital filters. I need to know how to add a 4-bit binary number to the most significant 4 bits of an 8-bit number. For example:
0 1 0 0 0 0 0 0 //x
+ 1 0 1 0 //y
= 1 1 1 0 0 0 0 0 //z
Can I add them with code somewhat like this?
z = x[7:4] + y
or do I have to concatenate the 4-bit number with four zeros and then add?
Assuming y is the 4 bit number and x the 8 bit number:
If you do
assign z = x[7:4] + y
Then you are doing a 4-bit addition and the most significant part of z is padded with 0's.
If you do
assign z = y[7:4] + x
You will get an error message from the synthesizer, as subscripts for y are wrong.
So do as this:
assign z = {y,4'b0} + x
This performs an 8-bit addition of x and the value of y shifted 4 bits to the left, which is what you wanted.
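For illustration only (this is not part of the Verilog answer), the same arithmetic in C++, checking the example from the question:

#include <cstdint>
#include <iostream>

int main() {
    std::uint8_t x = 0b01000000;  // the 8-bit number
    std::uint8_t y = 0b1010;      // the 4-bit number
    // Equivalent of {y, 4'b0} + x: shift y into the top nibble, then add.
    std::uint8_t z = static_cast<std::uint8_t>((y << 4) + x);
    std::cout << static_cast<int>(z) << '\n';  // prints 224, i.e. 0b11100000
}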

How can I count amount of sequentially set bits in a byte from left to right until the first 0?

My English isn't great and I can't phrase this better, but please see below:
if byte in binary is 1 0 0 0 0 0 0 0 then result is 1
if byte in binary is 1 1 0 0 0 0 0 0 then result is 2
if byte in binary is 1 1 1 0 0 0 0 0 then result is 3
if byte in binary is 1 1 1 1 0 0 0 0 then result is 4
if byte in binary is 1 1 1 1 1 0 0 0 then result is 5
if byte in binary is 1 1 1 1 1 1 0 0 then result is 6
if byte in binary is 1 1 1 1 1 1 1 0 then result is 7
if byte in binary is 1 1 1 1 1 1 1 1 then result is 8
But if, for example, the byte in binary is 1 1 1 0 * * * *, then the result is 3.
I would like to determine how many contiguous bits are set, from left to right, with one operation.
The results don't need to be the numbers 1-8; anything that distinguishes the cases will do.
I think it's possible in one or two operations, but I don't know how.
If you don't know a solution as short as 2 operations, please say so, and I'll stop trying.
Easiest non-branching solution I can think of:
y=~x
y|=y>>4
y|=y>>2
y|=y>>1
Invert x, and extend the leftmost 1-bit (which corresponds to the leftmost 0-bit in the non-inverted value) to the right. This gives distinct values for each case (not 1-8, though, but it's pretty easy to map them).
110* ****
turns into
001* ****
001* **1*
001* 1*1*
0011 1111
EDIT:
As pointed out in a different answer, using a precomputed lookup table is probably the fastest. Given only 8 bits of input, it's quite feasible in terms of memory consumption.
EDIT:
Heh, whoops, my bad. You can skip the invert and do ANDs instead:
x&=x>>4
x&=x>>2
x&=x>>1
Here
110* ****
gives
110* **0*
110* 0*0*
1100 0000
As you can see, all values beginning with 110 will result in the same output (1100 0000).
EDIT:
Actually, the 'and' version relies on implementation-defined behavior (right-shifting negative numbers), and will usually do the right thing when using a signed 8-bit type (i.e. char rather than unsigned char in C), but as I said, the behavior is implementation-defined and might not always work.
I'd second a lookup table... otherwise you can also do something like:
#include <intrin.h>  // MSVC header providing the _BitScanReverse intrinsic

unsigned long inverse_bitscan_reverse(unsigned long value)
{
    unsigned long bsr = 0;
    _BitScanReverse(&bsr, ~value); // x86 BSR instruction: index of highest set bit
    return bsr;
}
EDIT: Note that you have to be careful of the special case where "value" has no zeroed bits. See the documentation for _BitScanReverse.
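As an aside (assuming C++20 is available, which none of the answers above require), the standard library now has a one-liner for exactly this:

#include <bit>
#include <cstdint>
#include <iostream>

int main() {
    std::uint8_t b = 0b11100000;
    // std::countl_one counts consecutive 1-bits starting from the
    // most significant bit of the (8-bit) value.
    std::cout << std::countl_one(b) << '\n';  // prints 3
}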

ROUNDUP? what does it do? in C++

Can someone explain to me what this does?
#define ROUNDUP(n,width) (((n) + (width) - 1) & ~unsigned((width) - 1))
Provided width is a power of 2 (so 2, 4, 8, 16, 32, etc.), it returns the smallest multiple of width that is greater than or equal to n.
So with width = 16: 5->16, 7->16, 15->16, 16->16, 17->32, 18->32, etc.
EDIT: I started out providing an explanation of why this works, as I sense that's really what the OP wants, but it turned into a rather convoluted story. If the OP is still confused, I'd suggest working through a few simple examples, say width = 16 and n = 15, 16, 17. Remember that & is bitwise AND, ~ is bitwise complement, and use binary representation exclusively as you work through the examples.
It rounds n up to the next multiple of 'width' - but I think width needs to be a power of 2.
For example width == 8, n = 5:
(5 + 8 - 1) & ~(7)
= 12 & ~7
= 8
So 5 rounds to 8. Anything 1 - 8 rounds to 8. 9 to 16 rounds to 16. Etc. (0 rounds to 0)
It defines a macro called ROUNDUP which takes two parameters, n and width, and returns the value (n + width - 1) & ~unsigned(width - 1).
:)
Try this if you think you know what it does:
std::string s("WTF");
std::complex<double> c(-11,5);
ROUNDUP(s, c);
It won't work in C because of the unsigned cast. Here is what it does, as long as width is confined to powers of 2:
n    width    ROUNDUP(n,width)
------------------------------
0    4        0
1    4        4
2    4        4
3    4        4
4    4        4
5    4        8
6    4        8
7    4        8
8    4        8
9    4        12
10   4        12
11   4        12
12   4        12
13   4        16
14   4        16
15   4        16
16   4        16
17   4        20
18   4        20
19   4        20
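A quick compile-time check of that table (my own sketch, reusing the macro exactly as defined above):

#define ROUNDUP(n,width) (((n) + (width) - 1) & ~unsigned((width) - 1))

// Spot-check a few rows of the table at compile time.
static_assert(ROUNDUP(0, 4) == 0, "0 rounds to 0");
static_assert(ROUNDUP(5, 4) == 8, "5 rounds to 8");
static_assert(ROUNDUP(12, 4) == 12, "multiples are unchanged");
static_assert(ROUNDUP(17, 4) == 20, "17 rounds to 20");
static_assert(ROUNDUP(17, 16) == 32, "works for width 16 too");

int main() {}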