Comparing Bitfields of Different Sizes - c++

What happens if you use a bitwise operator (&, |, etc.) to compare two bitfields of different sizes?
For example, comparing 0 1 1 0 with 0 0 1 0 0 0 0 1:
0 1 1 0 0 0 0 0 The smaller one is extended with zeros and pushed to the
0 0 1 0 0 0 0 1 most-significant side.
Or...
0 0 0 0 0 1 1 0 The smaller one is extended with zeros and pushed to the
0 0 1 0 0 0 0 1 least-significant side.
Or...
0 1 1 0 The longer one is truncated from its least-significant side,
0 0 1 0 keeping its most significant side.
Or...
0 1 1 0 The longer one is truncated from its most-significant side,
0 0 0 1 keeping its least-significant side.

The bitwise operators always work on promoted operands. So exactly what might happen can depend on whether one (or both) bitfields are signed (as that may result in sign extension).
So, for your example values, the bit-field with the binary value 0 1 1 0 will be promoted to the int 6, and the bit-field with the binary value 0 0 1 0 0 0 0 1 will be promoted to the int 33, and those are the operands that will be used with whatever the operation is.

0 0 0 0 0 1 1 0 The smaller one is extended with zeros and pushed to the
0 0 1 0 0 0 0 1 least-significant side.
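To make the promotion concrete, here's a minimal sketch (the struct and field names are mine, purely for illustration): both unsigned bit-fields are promoted to int before the & is applied, so the operation is simply 6 & 33.

#include <iostream>

struct Fields {
    unsigned a : 4; // holds binary 0110 (6)
    unsigned b : 8; // holds binary 00100001 (33)
};

int main()
{
    Fields f{0b0110, 0b00100001};
    // Both bit-fields are promoted to int before the bitwise AND,
    // so this is just 6 & 33, which is 0.
    int result = f.a & f.b;
    std::cout << result << '\n'; // prints 0
}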

If you're actually using the values as bitfields, what's the meaning of comparing bitfields of different sizes? Would it generate a meaningful result for you?
That said, both operands will be promoted to a minimum size of int/unsigned with signedness depending on the signedness of the original operands. Then these promoted values will be compared with the bitwise operator.
This behaves as your second example: The smaller one is padded with zeroes on the MSB side (pushed to LSB side if you prefer).
If one operand is signed and negative while the other is unsigned, the negative one will be converted to the congruent unsigned number before the bit operation takes place.
If instead of integral numbers you mean std::bitset, you can't do bitwise operations on bitsets of differing sizes.

Distance span between two binary strings

Is there any way of efficiently finding (with bitwise operations) the distance (not the Hamming distance!) between two 8-bit binary strings?
Each byte is guaranteed to have only one bit set.
Like:
a=0 0 0 0 0 0 0 1
b=0 0 0 1 0 0 0 0
0 0 0 1 0 0 0 1 -> distance = 3
^^^^^
------
a=0 0 0 0 0 1 0 0
b=0 1 0 0 0 0 0 0
0 1 0 0 0 1 0 0 -> distance = 3
^^^^^
------
a=0 1 0 0 0 0 0 0
b=0 0 0 0 0 0 1 0
0 1 0 0 0 0 1 0 -> distance = 4
^^^^^^^
I could work with something like logarithms but that is not very efficient
"Efficient" can mean a different things here: e.g., asymptotic vs performance for a known range of inputs; time vs space; etc.
I'll assume you care about raw speed for the small bounded inputs you describe.
Baseline approach. Take the smaller value and left-shift it until it equals the larger one, counting the shifts. While this is O(n), that sort of analysis doesn't matter much here since n is bounded.
You might compare that baseline to either of the following approaches, which have better time complexity but may or may not be faster for your inputs.
Alternative 1. Put all the distances in a lookup matrix. O(1) time complexity, but O(n^2) space complexity.
Alternative 2. Have a lookup table for the logarithms, and return the difference log2(a) - log2(b), where a >= b. O(1) time complexity, O(n) space complexity. (Note that I'm assuming that dist(a, a) = 0, which is off by one from what you describe above.)
I don't know in practice which of those will be faster, but the main point is not to assume that O(n) means that the algorithm is slower in absolute terms for your inputs.
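Here is a minimal sketch of Alternative 2, assuming 8-bit inputs with exactly one bit set (the table and function names are mine): a 256-entry table maps each power of two to its bit index, so each query is just a subtraction. As noted, this gives dist(a, a) = 0; subtract 1 from the result if you want the convention from the examples above.

#include <array>
#include <cstdint>
#include <cstdlib>
#include <iostream>

// Bit index (log2) for every byte that has exactly one bit set;
// other entries are unused. Built once at startup.
static std::array<int, 256> make_log2_table()
{
    std::array<int, 256> t{};
    for (int i = 0; i < 8; ++i)
        t[1u << i] = i;
    return t;
}

static const std::array<int, 256> kLog2 = make_log2_table();

// Distance between two single-bit bytes as |log2(a) - log2(b)|.
int distance(std::uint8_t a, std::uint8_t b)
{
    return std::abs(kLog2[a] - kLog2[b]);
}

int main()
{
    std::cout << distance(0b00000001, 0b00010000) << '\n'; // 4 (question's convention: 3)
    std::cout << distance(0b01000000, 0b00000010) << '\n'; // 5 (question's convention: 4)
}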
You can use the OR operation (logical summing) and then find the longest run of consecutive zeros between the set bits. Hope I got your question right.

What does the & operator mean? [duplicate]

I am trying to understand the condition of an if-else statement in C++. Here is the snippet where this statement appears (note it's a shorthand version):
for (int i = 0; i < 8; ++i)
{
    Point newCenter = center;
    newCenter.x += oneEighth.x * (i&4 ? 0.5f : -0.5f);
}
I do understand that the 0.5f holds if the condition is true and -0.5f otherwise, but what does the i&4 mean?
This is using two things. First, the bitwise AND operator &: it takes the binary representations of the two integers (i and 4) and computes their bitwise AND, i.e. for each position in the result, the bit is 1 if and only if both arguments have a 1 in that position. Second, the implicit int-to-bool conversion, which yields true if the integer is not equal to 0.
For example, if we have i=7, then the internal bitwise representation of this in two's complement would be:
/*24 0s*/ 0 0 0 0 0 1 1 1
And the two's complement representation of 4 is /*24 0s*/ 0 0 0 0 0 1 0 0, so the bitwise AND is /*24 0s*/ 0 0 0 0 0 1 0 0; as this is not equal to zero it is implicitly converted to true and the condition is met.
Alternatively, if we consider i=2, then we have the internal representation:
/*24 0s*/ 0 0 0 0 0 0 1 0
And thus the bitwise AND gives /*24 0s*/ 0 0 0 0 0 0 0 0 and thus the condition is not met.
The operator is Bitwise AND.
Bitwise binary AND does the logical AND of the bits in each position of a number in its binary form.
So, in your code, i&4 is true when i is 4, 5, 6, 7, because the base-2 representation of 4 is 100. i&4 will be true whenever the base-2 representation of i has a 1 in the third position counting from the right (bit 2).
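As a quick illustration (this little test loop is mine, not from the original snippet), printing the result of i & 4 for i from 0 to 7 shows exactly which iterations take the 0.5f branch:

#include <iostream>

int main()
{
    for (int i = 0; i < 8; ++i) {
        // i & 4 isolates bit 2, so it is non-zero (true) only for i = 4, 5, 6, 7.
        std::cout << i << " -> " << (i & 4 ? 0.5f : -0.5f) << '\n';
    }
}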

How do I write encoded information (JPEG/JFIF) when it is not a multiple of eight?

I'm trying to write a baseline JPEG encoder. I already know how to handle the JFIF format (very good article, BTW). Right now I'm trying to compress an 8x8 grayscale image that is basically white. So, considering that a white pixel is basically 255, once you apply the JPEG algorithm (skipping the zig-zag step, because for this example it is basically unnecessary) you get this matrix:
B = [63 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0]
As we can see, there's only one DC component (63) and no AC components. If you check the Huffman tables you find that the category is 6 (1110) and because 63 in binary is 111111 the sequence of bits for this DC component is 1110111111 (10 bits). Now, according to the algorithm, when all AC components are 0 you need to send EOB, whose sequence is 1010 (four bits). So, in the end, the final sequence of bits is 11101111111010 (14 bits).
Now, we already know that I can only write (or append) bytes to a file. So I am trying to write something like this to a new .jpeg file:
0xFF 0xD8    ... JFIF metadata ...   11101111111010   0xFF 0xD9
SOI marker                           block            EOI marker
The question is, what should I do about those 14 bits? I guess I need to insert 2 filler bits (I don't know if there's a better term for them) to get 2 bytes, but I don't know where to insert them, let alone their values (00? 01? 10? 11?). I suppose this is a common problem in data encoding and/or low-level programming, so I guess it is widely solved :)
The JPEG format says that:
The only padding that occurs is at the end of the scan when the remaining bits in the last byte to be filled with 1’s if the byte is incomplete.
So you are supposed to fill with 1-s here. That means in fine you should have:
1110      111111           1010   11
DC code   DC value (= 63)  EOB    extra 1s
In other words 11101111 11101011 which gives the 0xEF 0xEB sequence in hexadecimal.
Pro-tip: you can refer to this code section from jpec - a tiny JPEG encoder written in C. Also, the jpec_huff_write_bits function includes relevant documentation that may help you understand how to write the bits at Huffman time.
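For illustration, here is a small bit-writer sketch in the same spirit (the names and structure are mine, not taken from jpec): bits are accumulated MSB-first and the incomplete final byte is padded with 1s on flush, which reproduces the 0xEF 0xEB pair for the 14-bit sequence above.

#include <cstdint>
#include <cstdio>
#include <vector>

struct BitWriter {
    std::vector<std::uint8_t> bytes;
    std::uint8_t current = 0; // partially filled byte
    int used = 0;             // bits already placed in 'current'

    // Append 'count' bits of 'value', most significant bit first.
    void write(unsigned value, int count)
    {
        for (int i = count - 1; i >= 0; --i) {
            current = static_cast<std::uint8_t>((current << 1) | ((value >> i) & 1u));
            if (++used == 8) { bytes.push_back(current); current = 0; used = 0; }
        }
    }

    // Pad the incomplete last byte with 1s, as the JPEG spec requires.
    void flush()
    {
        while (used != 0) write(1, 1);
    }
};

int main()
{
    BitWriter w;
    w.write(0b1110, 4); // DC category code
    w.write(63, 6);     // DC value
    w.write(0b1010, 4); // EOB
    w.flush();          // pad with 1s
    for (std::uint8_t b : w.bytes)
        std::printf("0x%02X ", static_cast<unsigned>(b)); // 0xEF 0xEB
    std::printf("\n");
}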

C++ Novice regarding Bitset operations with strings

I'm currently learning about bitset, and in one paragraph it says this about their interactions with strings:
"The numbering conventions of strings and bitsets are inversely related: the rightmost character in the string--the one with the highest subscript--is used to initialize the low order bit in the bitset--the bit with subscript 0."
however later on they give an example + diagram which shows something like this:
string str("1111111000000011001101");
bitset<32> bitvec5(str, 5, 4); // 4 bits starting at str[5], 1100
value of str:
1 1 1 1 1 (1 1 0 0) 0 0 0 ...
value of bitvec5:
...0 0 0 0 0 0 0 (1 1 0 0)
This example seems to show it taking the rightmost bit and placing it so that the last character from the string is the last one in the bitset, not the first.
Which is right?(or are both wrong?)
They are both right.
Traditionally the bits in a machine word are numbered from right to left, so the lowest bit (bit 0) is to the right, just like it is in the string.
The bitset looks like this
...1100 value
...3210 bit numbers
and the string that looks the same
"1100"
will have string[0] == '1' and string[3] == '0', the exact opposite!
string strval("1100"); //1100, so from rightmost to leftmost : 0 0 1 1
bitset<32> bitvec4(strval); //bitvec4 is 0 0 1 1
So whatever you are reading is correct (both the text and the example):
the rightmost character in the string--the one with the highest
subscript--is used to initialize the low order bit in the bitset--the
bit with subscript 0.
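To see that both statements agree, here is a small sketch (the output comments are mine): the character with the highest subscript in the string becomes bit 0 of the bitset, yet printing the bitset reproduces the string, because a bitset prints from its highest bit down to bit 0.

#include <bitset>
#include <iostream>
#include <string>

int main()
{
    std::string s("1100");
    std::bitset<4> b(s);

    std::cout << b << '\n';    // prints 1100, same left-to-right order as the string
    std::cout << b[0] << '\n'; // 0 -- comes from s[3], the rightmost character
    std::cout << b[3] << '\n'; // 1 -- comes from s[0], the leftmost character
}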

How can I count amount of sequentially set bits in a byte from left to right until the first 0?

My English isn't good, so I can't phrase this better, but please see below:
if byte in binary is 1 0 0 0 0 0 0 0 then result is 1
if byte in binary is 1 1 0 0 0 0 0 0 then result is 2
if byte in binary is 1 1 1 0 0 0 0 0 then result is 3
if byte in binary is 1 1 1 1 0 0 0 0 then result is 4
if byte in binary is 1 1 1 1 1 0 0 0 then result is 5
if byte in binary is 1 1 1 1 1 1 0 0 then result is 6
if byte in binary is 1 1 1 1 1 1 1 0 then result is 7
if byte in binary is 1 1 1 1 1 1 1 1 then result is 8
But if for example the byte in binary is 1 1 1 0 * * * * then result is 3.
I would like to determine how many bits are set contiguously from left to right, with one operation.
The results don't have to be the numbers 1-8, just something that distinguishes the cases.
I think it's possible in one or two operations, but I don't know how.
If you don't know a solution as short as two operations, please write that too, and I won't keep trying.
Easiest non-branching solution I can think of:
y = ~x;
y |= y >> 4;
y |= y >> 2;
y |= y >> 1;
Invert x, and extend the leftmost 1-bit (which corresponds to the leftmost 0-bit in the non-inverted value) to the right. This will give distinct values (not 1-8 though, but it's pretty easy to do a mapping).
110* ****
turns into
001* ****
001* **1*
001* 1*1*
0011 1111
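Here is a minimal C++ sketch of that idea for an 8-bit value (the function name is mine, and the mapping back to a count via a popcount is my addition, not part of the answer above):

#include <bitset>
#include <cstdint>
#include <iostream>

// Count how many 1-bits the byte starts with, from the MSB down.
int leading_ones(std::uint8_t x)
{
    unsigned y = static_cast<std::uint8_t>(~x); // the leftmost 0 of x becomes the leftmost 1 of y
    y |= y >> 4;                                // smear that 1 toward the LSB
    y |= y >> 2;
    y |= y >> 1;
    // y now has every bit set from the original leftmost 0 downwards,
    // so 8 minus its popcount is the length of the leading run of 1s.
    return 8 - static_cast<int>(std::bitset<8>(y).count());
}

int main()
{
    std::cout << leading_ones(0b11100000) << '\n'; // 3
    std::cout << leading_ones(0b11111111) << '\n'; // 8
    std::cout << leading_ones(0b10000000) << '\n'; // 1
}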
EDIT:
As pointed out in a different answer, using a precomputed lookup table is probably the fastest. Given only 8 bits, it's probably even feasible in terms of memory consumption.
EDIT:
Heh, whoops, my bad... you can skip the invert and do ANDs instead.
x &= x >> 4;
x &= x >> 2;
x &= x >> 1;
here
110* ****
gives
110* **0*
110* 0*0*
1100 0000
As you can see all values beginning with 110 will result in the same output (1100 0000).
EDIT:
Actually, the 'and' version relies on right-shifting negative numbers (the standard doesn't guarantee the result, though in practice it is usually an arithmetic shift), so it will usually do the right thing if you use a signed 8-bit type (i.e. char rather than unsigned char in C), but it might not always work.
I'd second a lookup table... otherwise you can also do something like:
#include <intrin.h> // MSVC intrinsic header for _BitScanReverse

unsigned long inverse_bitscan_reverse(unsigned long value)
{
    unsigned long bsr = 0;
    _BitScanReverse(&bsr, ~value); // x86 bsr instruction: index of the highest set bit of ~value
    return bsr;
}
EDIT: Note that you have to be careful of the special case where "value" has no zeroed bits. See the documentation for _BitScanReverse.
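If you go the lookup-table route instead, a sketch could look like this (the table construction and names are mine): precompute the leading-ones count for all 256 byte values once, then each query is a single array access.

#include <array>
#include <cstdint>
#include <iostream>

// Leading-ones count for every possible byte, computed once.
static std::array<std::uint8_t, 256> make_table()
{
    std::array<std::uint8_t, 256> t{};
    for (int v = 0; v < 256; ++v) {
        int n = 0;
        while (n < 8 && (v & (0x80 >> n))) ++n;
        t[v] = static_cast<std::uint8_t>(n);
    }
    return t;
}

static const std::array<std::uint8_t, 256> kLeadingOnes = make_table();

int main()
{
    std::cout << int(kLeadingOnes[0b11100000]) << '\n'; // 3
    std::cout << int(kLeadingOnes[0b11111111]) << '\n'; // 8
    std::cout << int(kLeadingOnes[0b01111111]) << '\n'; // 0
}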