How to invert shifting and addition - bit-manipulation

1E 1B 01 13 6 [0001 1110 0001 1011 0000 0001 0001 0011 0110] is converted to F6C336 by doing shifting and addition.
(0x1E<<19) + (0x1B<<14) + (0x01<<9) + (0x13<<4) + 6 = 0xF6C336 [1111 0110 1100 0011 0011 0110]
Now I am stuck on how to reverse this calculation, i.e. from F6C336 I want to get back 1E 1B 01 13 6.
Sorry for my poor knowledge in bit operations.

If these are four blocks of 5 bits each and one block of 4 bits, then the "conversion" is their concatenation, and the inverse of that is splitting it back up into those pieces. For example:
piece0 = x >> 19;        // top 5 bits (no mask needed if x fits in 24 bits)
piece1 = (x >> 14) & 31; // next 5 bits
piece2 = (x >> 9) & 31;  // next 5 bits
piece3 = (x >> 4) & 31;  // next 5 bits
piece4 = x & 15;         // bottom 4 bits
Shown in Java here but the logic would be similar in most languages.
If the input was not of that form, for example FF FF FF FF F, where the pieces do not fit into their 5- and 4-bit fields, then the pieces overlap when packed and the inverse is ambiguous.
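For reference, here is a minimal C sketch of the full round trip (the 5-5-5-5-4 field widths are taken from the question; the variable names are my own). Because the fields do not overlap, packing with + and packing with | give the same result:

#include <stdio.h>

int main(void) {
    unsigned pieces[5] = {0x1E, 0x1B, 0x01, 0x13, 0x6};

    /* pack: four 5-bit fields followed by one 4-bit field */
    unsigned packed = (pieces[0] << 19) | (pieces[1] << 14) |
                      (pieces[2] << 9)  | (pieces[3] << 4)  | pieces[4];
    printf("packed = %X\n", packed);   /* prints F6C336 */

    /* unpack: shift each field back down and mask off the rest */
    printf("%X %X %X %X %X\n",
           packed >> 19,          /* top 5 bits   */
           (packed >> 14) & 31,   /* next 5 bits  */
           (packed >> 9)  & 31,
           (packed >> 4)  & 31,
           packed & 15);          /* bottom 4 bits */
    return 0;
}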

Related

What does this piece of Tiled C++ code do?

I'm trying to figure out the purpose of this piece of code, from the Tiled utility's map format documentation.
const int gid = data[i] |
data[i + 1] << 8 |
data[i + 2] << 16 |
data[i + 3] << 24;
It looks like there is some "or-ing" and shifting of bits, but I have no clue what the aim of this is, in the context of using data from the tiled program.
Tiled stores its layer "Global Tile ID" (GID) data in an array of 32-bit integers, base64-encoded and (optionally) compressed in the XML file.
According to the documentation, these 32-bit integers are stored in little-endian format -- that is, the first byte of the integer contains the least significant byte of the number. As an analogy, in decimal, writing the number "1234" in little-endian would look like 4321 -- the 4 is the least significant digit in the number (representing a value of just 4), the 3 is the next-least-significant (representing 30), and so on. The only difference between this example and what Tiled is doing is that we're using decimal digits, while Tiled is using bytes, which are effectively digits that can each hold 256 different values instead of just 10.
If we think about the code in terms of decimal numbers, though, it's actually pretty easy to understand what it's doing. It's basically reconstructing the integer value from the digits by doing just this:
int digit[4] = { 4, 3, 2, 1 }; // our decimal digits in little-endian order
int gid = digit[0] +
digit[1] * 10 +
digit[2] * 100 +
digit[3] * 1000;
It's just moving each digit into position to create the full integer value. (In binary, bit shifting by multiples of 8 is like multiplying by powers of 10 in decimal; it moves a value into the next 'significant digit' slot)
More information on big-endian and little-endian and why the difference matters can be found in On Holy Wars And A Plea For Peace, an important (and entertainingly written) document from 1980 in which Danny Cohen argued for the need to standardise on a single byte ordering for network protocols. (Spoiler: big-endian eventually won that fight, and so the big-endian representation has been the standard way to represent integers in files and network transmissions for decades. Tiled's use of little-endian integers in its file format is somewhat unusual, and it results in needing code like the code you quoted in order to reliably convert the little-endian integers in the data file into the computer's native format. If the data had been stored in the standard big-endian format, every OS already provides standard utility functions for converting between big-endian and the native format, and you could simply have called ntohl() to assemble the native-format integer instead of writing and comprehending this sort of byte-manipulation code manually.)
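Purely as an illustration of that ntohl() route (this is not what Tiled's format requires, since its data is little-endian, and the helper name here is made up), decoding a big-endian 32-bit value on a POSIX system could look like this sketch:

#include <arpa/inet.h>   /* ntohl(); POSIX header, on Windows it lives in winsock2.h */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode a 32-bit big-endian (network order) value. */
uint32_t read_be32(const unsigned char *p) {
    uint32_t raw;
    memcpy(&raw, p, sizeof raw);  /* copy the four bytes unchanged */
    return ntohl(raw);            /* big-endian -> host byte order */
}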
As you noted, the << operator shifts bits to the left by the given number.
This block takes the data[] array, which has four (presumably one byte) elements, and "encodes" those four values into one integer.
Example Time!
data[0] = 0x3A; // 0x3A = 58 = 0011 1010 in binary
data[1] = 0x48; // 0x48 = 72 = 0100 1000 in binary
data[2] = 0xD2; // 0xD2 = 210 = 1101 0010 in binary
data[3] = 0x08; // 0x08 = 8 = 0000 1000 in binary
int tmp0 = data[0]; // 00 00 00 3A = 0000 0000 0000 0000 0000 0000 0011 1010
int tmp1 = data[1] << 8; // 00 00 48 00 = 0000 0000 0000 0000 0100 1000 0000 0000
int tmp2 = data[2] << 16; // 00 D2 00 00 = 0000 0000 1101 0010 0000 0000 0000 0000
int tmp3 = data[3] << 24; // 08 00 00 00 = 0000 1000 0000 0000 0000 0000 0000 0000
// "or-ing" these together will set each bit to 1 if any of the bits are 1
int gid = tmp0 | // 00 00 00 3A = 0000 0000 0000 0000 0000 0000 0011 1010
tmp1 | // 00 00 48 00 = 0000 0000 0000 0000 0100 1000 0000 0000
tmp2 | // 00 D2 00 00 = 0000 0000 1101 0010 0000 0000 0000 0000
tmp3; // 08 00 00 00 = 0000 1000 0000 0000 0000 0000 0000 0000
gid == 147998778;// 08 D2 48 3A = 0000 1000 1101 0010 0100 1000 0011 1010
Now, you've just encoded four one-byte values into a single four-byte integer.
If you're (rightfully) wondering why anyone would go through all that effort when you could just use a byte type and store the four single-byte pieces of data directly into four bytes, then you should check out this question:
int, short, byte performance in back-to-back for-loops
Bonus Example!
To get your encoded values back, we use the "and" operator along with the right-shift >>:
int gid = 147998778; // 08 D2 48 3A = 0000 1000 1101 0010 0100 1000 0011 1010
// "and-ing" will set each bit to 1 if BOTH bits are 1
int tmp0 = gid & // 08 D2 48 3A = 0000 1000 1101 0010 0100 1000 0011 1010
0x000000FF; // 00 00 00 FF = 0000 0000 0000 0000 0000 0000 1111 1111
int data0 = tmp0; // 00 00 00 3A = 0000 0000 0000 0000 0000 0000 0011 1010
int tmp1 = gid & // 08 D2 48 3A = 0000 1000 1101 0010 0100 1000 0011 1010
0x0000FF00; // 00 00 FF 00 = 0000 0000 0000 0000 1111 1111 0000 0000
// tmp1 == 00 00 48 00 = 0000 0000 0000 0000 0100 1000 0000 0000
int data1 = tmp1 >> 8; // 00 00 00 48 = 0000 0000 0000 0000 0000 0000 0100 1000
int tmp2 = gid & // 08 D2 48 3A = 0000 1000 1101 0010 0100 1000 0011 1010
0x00FF0000; // 00 FF 00 00 = 0000 0000 1111 1111 0000 0000 0000 0000
// tmp2 == 00 D2 00 00 = 0000 0000 1101 0010 0000 0000 0000 0000
int data2 = tmp2 >> 16; // 00 00 00 D2 = 0000 0000 0000 0000 0000 0000 1101 0010
int tmp3 = gid & // 08 D2 48 3A = 0000 1000 1101 0010 0100 1000 0011 1010
0xFF000000; // FF 00 00 00 = 1111 1111 0000 0000 0000 0000 0000 0000
// tmp3 == 08 00 00 00 = 0000 1000 0000 0000 0000 0000 0000 0000
int data3 = tmp3 >> 24; // 00 00 00 08 = 0000 0000 0000 0000 0000 0000 0000 1000
The last "and-ing" for tmp3 isn't needed, since the bits that "fall off" when shifting are just lost and the bits coming in are zero. So:
gid; // 08 D2 48 3A = 0000 1000 1101 0010 0100 1000 0011 1010
int data3 = gid >> 24; // 00 00 00 08 = 0000 0000 0000 0000 0000 0000 0000 1000
but I wanted to provide a complete example.
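As a final note, the decode above can be folded into a small helper that works for any 4-byte little-endian buffer. This is just a sketch of the general pattern (the function name is mine), not code from Tiled:

#include <stdint.h>

/* Assemble a 32-bit value from four little-endian bytes. */
uint32_t read_le32(const unsigned char *p) {
    uint32_t value = 0;
    for (int i = 0; i < 4; i++) {
        /* byte i supplies bits 8*i .. 8*i+7 of the result */
        value |= (uint32_t)p[i] << (8 * i);
    }
    return value;
}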

bit parity code needs explanation as to how it works?

Here is the code that reports the bit parity of a given integer:
01: bool parity(unsigned int x)
02: {
03: x ^= x >> 16;
04: x ^= x >> 8;
05: x ^= x >> 4;
06: x &= 0x0F;
07: return ((0x6996 >> x) & 1) != 0;
08: }
I found this here.. while there seems to be explanation in the link, I do not understand.
The explanation starts with "The code first "merges" bits 0-15 with bits 16-31 using a right shift and XOR (line 3)", and that is where I lose track of what is going on. I tried to play around with it, but that did not help. If some clarity can be given on how this works, it will be useful for beginners like me.
Thanks
EDIT: from post below:
value : 1101 1110 1010 1101 1011 1110 1110 1111
value >> 16: 0000 0000 0000 0000 1101 1110 1010 1101
----------------------------------------------------
xor : 1101 1110 1010 1101 0110 0000 0100 0010
now right shift this again by 8 bits:
value : 1101 1110 1010 1101 0110 0000 0100 0010
value >>8 : 0000 0000 1101 1110 1010 1101 0110 0000
----------------------------------------------------
xor : 1101 1110 0111 0011 1100 1101 0010 0010
so where is the merging of parity happening here?
Let's start first with a 2-bit example so you can see what's going on. The four possibilities are:
ab a^b
-- ---
00 0
01 1
10 1
11 0
You can see that a^b (xor) gives 0 for an even number of one-bits and 1 for an odd number. This works for 3-bit values as well:
abc a^b^c
--- -----
000 0
001 1
010 1
011 0
100 1
101 0
110 0
111 1
The same trick is being used in lines 3 through 6 to merge all 32 bits into a single 4-bit value. Line 3 merges b31-b16 with b15-b0 to give a 16-bit value, then line 4 merges the resultant b15-b8 with b7-b0, then line 5 merges the resultant b7-b4 with b3-b0. Since b31-b4 (the upper half of each xor operation) aren't cleared by those operations, line 6 takes care of that by clearing them out (anding with binary 0000...1111 to clear all but the lower 4 bits).
The merging here is achieved in a chunking mode. By "chunking", I mean that it treats the value in reducing chunks rather than as individual bits, which allows it to efficiently reduce the value to a 4-bit size (it can do this because the xor operation is both associative and commutative). The alternative would be to perform seven xor operations on the nybbles rather than three. Or, in complexity analysis terms, O(log n) instead of O(n).
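As a sketch of that alternative (my own illustration, not from the linked page), folding the eight nybbles together one at a time would look like this:

#include <stdbool.h>

/* O(n) alternative: xor the eight 4-bit nybbles together (seven xors in effect)
   instead of halving the width three times. Xor preserves parity, so the 4-bit
   result has the same parity as the original 32-bit value. */
bool parity_nibbles(unsigned int x) {
    unsigned int acc = 0;
    for (int i = 0; i < 32; i += 4)
        acc ^= (x >> i) & 0x0F;        /* fold in the next nybble */
    return ((0x6996 >> acc) & 1) != 0; /* same lookup as line 7 of the original */
}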
Say you have the value 0xdeadbeef, which is binary 1101 1110 1010 1101 1011 1110 1110 1111. The merging happens thus:
value : 1101 1110 1010 1101 1011 1110 1110 1111
>> 16: 0000 0000 0000 0000 1101 1110 1010 1101
----------------------------------------------------
xor : .... .... .... .... 0110 0000 0100 0010
(with the irrelevant bits, those which will not be used in future, left as . characters).
For the complete operation:
value : 1101 1110 1010 1101 1011 1110 1110 1111
>> 16: 0000 0000 0000 0000 1101 1110 1010 1101
----------------------------------------------------
xor : .... .... .... .... 0110 0000 0100 0010
>> 8: .... .... .... .... .... .... 0110 0000
----------------------------------------------------
xor : .... .... .... .... .... .... 0010 0010
>> 4: .... .... .... .... .... .... .... 0010
----------------------------------------------------
xor : .... .... .... .... .... .... .... 0000
And, looking up 0000 in the table below, we see that it gives even parity (there are 24 1-bits in the original value). Changing just one bit in that original value (any bit, I've chosen the rightmost bit) will result in the opposite case:
value : 1101 1110 1010 1101 1011 1110 1110 1110
>> 16: 0000 0000 0000 0000 1101 1110 1010 1101
----------------------------------------------------
xor : .... .... .... .... 0110 0000 0100 0011
>> 8: .... .... .... .... .... .... 0110 0000
----------------------------------------------------
xor : .... .... .... .... .... .... 0010 0011
>> 4: .... .... .... .... .... .... .... 0010
----------------------------------------------------
xor : .... .... .... .... .... .... .... 0001
And 0001 in the table below is odd parity.
The only "magic" there is the 0x6996 value which is shifted by the four-bit value to ensure the lower bit is set appropriately, then that bit is used to decide the parity. The reason 0x6996 (binary 0110 1001 1001 0110) is used is because of the nature of parity for binary values as shown in the lined page:
Val Bnry #1bits parity (1=odd)
--- ---- ------ --------------
0 0000 0 even (0)
1 0001 1 odd (1)
2 0010 1 odd (1)
3 0011 2 even (0)
4 0100 1 odd (1)
5 0101 2 even (0)
6 0110 2 even (0)
7 0111 3 odd (1)
8 1000 1 odd (1)
9 1001 2 even (0)
10 1010 2 even (0)
11 1011 3 odd (1)
12 1100 2 even (0)
13 1101 3 odd (1)
14 1110 3 odd (1)
15 1111 4 even (0)
Note that it's not necessary to do the final shift-of-a-constant. You could just as easily continue the merging operations until you get down to a single bit, then use that bit:
bool parity(unsigned int x) {
    x ^= x >> 16;
    x ^= x >> 8;
    x ^= x >> 4;
    x ^= x >> 2;
    x ^= x >> 1;
    return x & 1;
}
However, once you have the value 0...15, a shift of a constant by that value is likely to be faster than two extra shift-and-xor operations.
From the original page,
Bit parity tells whether a given input contains an odd number of 1's.
So you want to add up the number of 1's. The code uses the xor operator to add pairs of bits,
0^1 = 1 bits on
1^0 = 1 bits on
0^0 = 0 bits on
1^1 = 0 bits on (well, 2, but we cast off 2's)
So the first three lines count up the number of 1's (tossing pairs of 1's).
That should help...
And notice from the original page, the description of why 0x6996,
If we encode even by 0 and odd by 1 beginning with parity(15) then we
get 0110 1001 1001 0110 = 0x6996, which is the magic number found in
line 7. The shift moves the relevant bit to bit 0. Then everything
except for bit 0 is masked out. In the end, we get 0 for even and 1
for odd, exactly as desired.
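If you want to convince yourself that the trick works, a quick brute-force comparison against a naive bit-counting loop does the job (my own test sketch, not from the linked page):

#include <stdbool.h>
#include <stdio.h>

bool parity(unsigned int x) {
    x ^= x >> 16;
    x ^= x >> 8;
    x ^= x >> 4;
    x &= 0x0F;
    return ((0x6996 >> x) & 1) != 0;
}

/* Naive reference: literally count the 1-bits and test the low bit of the count. */
bool parity_naive(unsigned int x) {
    unsigned count = 0;
    for (; x != 0; x >>= 1)
        count += x & 1;
    return (count & 1) != 0;
}

int main(void) {
    for (unsigned int x = 0; x < (1u << 20); x++) {  /* a sample of the input space */
        if (parity(x) != parity_naive(x)) {
            printf("mismatch at %u\n", x);
            return 1;
        }
    }
    printf("all checked values agree\n");
    return 0;
}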

Adding digits in list in python

I have a sentence like "Q 000 1111 00001 0001 00 //SOME_STRING". I want to put everything except the Q and the //SOME_STRING into a list in Python, so that the resulting list contains only 000 1111 00001 0001 00.
How can I do this?
import re
data = "Q 000 1111 00001 0001 00 //SOME_STRING"
digits = re.findall(r"\b\d+\b",data)
Test
>>> re.findall(r"\b\d+\b","Q 000 1111 00001 0001 00 //SOME_STRING234zzzz")
['000', '1111', '00001', '0001', '00']
import re

def filter_digits(bar):
    return re.search(r"^\d+$", bar)

foo = "Q 000 1111 00001 0001 00 //SOME_STRING"
foo = foo.split(' ')
foo = filter(filter_digits, foo)  # in Python 3, wrap this in list() to materialise the result

Bit Operation in C / C++

When we talk about bit operations in C or C++, does the bit numbering start from bit 0 or bit 1? Which one makes more sense?
As I know, a bit can assume either of two values: 1 or 0.
Generally, bit identifiers start from 0 at the least significant end, such as with the following octet:
+----+----+----+----+----+----+----+----+
| b7 | b6 | b5 | b4 | b3 | b2 | b1 | b0 |
+----+----+----+----+----+----+----+----+
80 40 20 10 08 04 02 01 <-- hex value
While a bit can take either a 0 or 1 value, that doesn't limit their identifiers, which can range from zero up to the number of bits minus 1.
For an explanation of the bitwise operators, see here.
For example, if you wanted to know whether b3 was set in C:
b3 = value & 0x08; // non-zero if set.
Similarly, setting b0 and clearing b7 can be done with:
value = (value | 0x01) & 0x7f; // or with 0000-0001, and with 0111-1111.
We always start from bit 0, which is almost always the least-significant bit.
It's not "bit operations" but "bitwise operations".
A bitwise operation is performed on all the bits of a variable, e.g.
1 XOR 2
for 2-byte integers means
0000 0000 0000 0001
XOR
0000 0000 0000 0010
By convention bit indexing starts at 0, e.g. for an expression such as (x >> i) & 1.
Bit operations use all the bits in the operands.
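To make the 0-based convention concrete, here are a few small helpers (a sketch with names of my own choosing, not standard library functions):

/* Bit i is counted from the least significant end, starting at i = 0. */
unsigned get_bit   (unsigned value, unsigned i) { return (value >> i) & 1u;  }
unsigned set_bit   (unsigned value, unsigned i) { return value |  (1u << i); }
unsigned clear_bit (unsigned value, unsigned i) { return value & ~(1u << i); }
unsigned toggle_bit(unsigned value, unsigned i) { return value ^  (1u << i); }

With these, get_bit(value, 3) is the same test as value & 0x08 in the earlier example.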

Question about bitwise And and Shift operations

How exactly do the following lines work if pData = "abc"?
pDes[1] = ( pData[0] & 0x1c ) >> 2;
pDes[0] = ( pData[0] << 6 ) | ( pData[1] & 0x3f );
Okay, assuming ASCII which is by no means guaranteed, pData[0] is 'a' (0x61) and pData[1] is 'b' (0x62):
pDes[1]:
pData[0] 0110 0001
&0x1c 0001 1100
---- ----
0000 0000
>>2 0000 0000 0x00
pDes[0]:
pData[0] 0110 0001
<< 6 01 1000 0100 0000 (interim value *a)
pData[1] 0110 0010
&0x3f 0011 1111
-- ---- ---- ----
0010 0010
|(*a) 01 1000 0100 0000
-- ---- ---- ----
01 1000 0110 0010 0x1862
How it works:
<< N simply means shift the bits N spaces to the left, >> N is the same but shifting to the right.
The & (and) operation will set each bit of the result to 1 if and only if the corresponding bit in both inputs is 1.
The | (or) operation sets each bit of the result to 1 if at least one of the corresponding bits in the inputs is 1.
Note that the 0x1862 will be truncated to fit into pDes[0] if its type is not wide enough.
The following C program shows this in action:
#include <stdio.h>

int main(void) {
    char *pData = "abc";
    int pDes[2];
    pDes[1] = ( pData[0] & 0x1c ) >> 2;
    pDes[0] = ( pData[0] << 6 ) | ( pData[1] & 0x3f );
    printf ("%08x %08x\n", pDes[0], pDes[1]);
    return 0;
}
It outputs:
00001862 00000000
and, when you change pDes to a char array, you get:
00000062 00000000
& is not logical AND - it is bit-wise AND.
a is 0x61, thus pData[0] & 0x1c gives
0x61 0110 0001
0x1c 0001 1100
--------------
0000 0000
>> 2 shifts this to right by two positions - value doesn't change as all bits are zero.
pData[0] << 6 left shifts 0x61 by 6 bits to give 0001 1000 0100 0000 (0x1840); if the destination is only a char, just the low byte 0100 0000 (0x40) of this survives.
pData[1] & 0x3f
0x62 0110 0010
0x3f 0011 1111
--------------
0x22 0010 0010
Thus it comes down to 0x40 | 0x22 - again | is not logical OR, it is bit-wise.
0x40 0100 0000
0x22 0010 0010
--------------
0x62 0110 0010
The results will be different if pDes is not a char array. Left shifting 0x61 would give you 0001 1000 0100 0000 or 0x1840 - (in case pDes is a char array, the left parts are not in the picture).
0x1840 0001 1000 0100 0000
0x0022 0000 0000 0010 0010
--------------------------
0x1862 0001 1000 0110 0010
pDes[0] would end up as 0x1862 or decimal 6242.
C++ will treat a character as a number according to its encoding. So, assuming ASCII, 'a' is 97 (which has a bit pattern of 0110_0001) and 'b' is 98 (bit pattern 0110_0010).
Once you think of them as numbers, bit operations on characters should be a bit clearer.
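For example, this small C snippet (just an illustration) prints each character of "abc" both as a character and as its numeric value:

#include <stdio.h>

int main(void) {
    const char *pData = "abc";
    /* print each character as a char, in hex, and in decimal */
    for (int i = 0; pData[i] != '\0'; i++)
        printf("'%c' = 0x%02x = %d\n", pData[i], pData[i], pData[i]);
    return 0;
}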
In C, all characters are also integers. That means "abc" is equivalent to (char[]){0x61, 0x62, 0x63, 0}.
The & is not the logical AND operator (&&). It is the bitwise AND, which computes the AND at bit-level, e.g.
'k' = 0x6b -> 0 1 1 0 1 0 1 1
0x1c -> 0 0 0 1 1 1 0 0 (&
———————————————————
8 <- 0 0 0 0 1 0 0 0
The main purpose of & 0x1c here is to extract bits #2 ~ #4 from pData[0]. The >> 2 afterwards removes the extra zeros at the end.
Similarly, the & 0x3f is to extract bits #0 ~ #5 from pData[1].
The << 6 pushes 6 zeros at the least significant end of the bits. Assuming pDes[0] is also a char, the most significant 6 bits will be discarded:
'k' = 0x6b -> 0 1 1 0 1 0 1 1
<< 6 = 0 1 1 0 1 0 1 1 0 0 0 0 0 0
xxxxxxxxxxx—————————————————
0xc0 <- 1 1 0 0 0 0 0 0
In terms of bits, if
pData[1] pData[0]
pData -> b7 b6 b5 b4 b3 b2 b1 b0 a7 a6 a5 a4 a3 a2 a1 a0
then
pDes -> 0 0 0 0 0 a4 a3 a2 a1 a0 b5 b4 b3 b2 b1 b0
pDes[1] pDes[0]
This looks like an operation to pack three values into a 6-5-5 bit structure.
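If that guess about a 6-5-5 layout is right, a generic pack/unpack might look like this sketch (the field order, widths, and names are my assumptions, not something stated in the question):

#include <stdint.h>

/* Three fields in 16 bits: a in bits 15..11, b in bits 10..6, c in bits 5..0. */
uint16_t pack_655(unsigned a, unsigned b, unsigned c) {
    return (uint16_t)(((a & 0x1F) << 11) |  /* 5 bits */
                      ((b & 0x1F) << 6)  |  /* 5 bits */
                       (c & 0x3F));         /* 6 bits */
}

void unpack_655(uint16_t packed, unsigned *a, unsigned *b, unsigned *c) {
    *a = (packed >> 11) & 0x1F;
    *b = (packed >> 6)  & 0x1F;
    *c =  packed        & 0x3F;
}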