Isolating a string of 1's in a character - c++

I need to come up with a function which takes a char and index of a set bit in it and isolates a string of 1's containing that bit.
i.e.
char isolate(unsigned char arg, int i);
For example:
isolate(221,2) would return 28 (11011101 >>> 00011100)
isolate(221,6) would return 192 (11011101 >>> 11000000)
A lookup table seems a clumsy solution as it would require ~256*8=2048 entries.
I am thinking of examining each individual bit to the left and right of the index:
char isolate(char arg, int i)
{
char result=0;
char mask = 1<<i;
for(char mask = 1<<i; (arg & mask) != 0; mask>>=1) // parentheses needed: != binds tighter than &
result |= mask;
for(char mask = 1<<i; (arg & mask) != 0; mask<<=1)
result |= mask;
return result;
}
But it also seems a bit ugly. How can I do any better than this?

That's a funny operation. The code you've written expresses it fairly well, so would you mind elaborating on how it's ugly?
The details I can see: Given that i expresses a bit number in arg, there's absolutely no point in i being a wider type. There's never a point in writing != 0 in a condition. You probably don't want to be redeclaring mask everywhere you use it, nor initializing it twice in a row.
As for the actual spreading bit mask, I can't think of a way that's more expressive, cleaner or efficient right now.
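A tidied version of the loop from the question might look like the sketch below. It follows the suggestions above (no != 0, one mask per loop) and additionally assumes unsigned types throughout, because plain char may be signed and right-shifting a negative mask would smear the sign bit. Note also that != binds more tightly than &, so a condition written as arg & mask != 0 would need parentheses; testing the masked value directly sidesteps that. The parameter types and the precondition check are this sketch's choices, not the asker's.

```cpp
#include <cassert>

// Returns the run of consecutive 1 bits in arg that contains bit i.
// Precondition (assumed): i < 8 and bit i of arg is set.
unsigned char isolate(unsigned char arg, unsigned i)
{
    assert(i < 8 && ((arg >> i) & 1u));
    const unsigned value = arg;
    unsigned result = 0;
    // Extend the run to the right, starting at bit i itself.
    for (unsigned mask = 1u << i; value & mask; mask >>= 1)
        result |= mask;
    // Extend the run to the left, again starting at bit i.
    for (unsigned mask = 1u << i; value & mask; mask <<= 1)
        result |= mask;
    return static_cast<unsigned char>(result);
}
```

Both loops stop on their own: the right-moving mask eventually becomes 0, and the left-moving mask leaves the 8 bits that value can occupy.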

Warning: none of this was tested or even relevant*, but it may be interesting.
Isolating the rightmost run of 1s is easy, like this: x ^ (x & ((x|(x-1))+1)) (explanation below), so let's work with that.
First, x|(x-1) smears the rightmost 1 to the right; adding 1 turns all those bits to 0, including the rightmost run of 1's; ANDing x with that removes the rightmost run of 1's; and finally, XORing the result with x leaves just the rightmost run of 1's.
Then we just need to make sure that the range we're looking for is the rightmost one. That's less amenable to simple bitmath, but if there's Count Leading Zeros (clz), it's not too hard:
int shift = 32 - clz(~x & ((1 << i) - 1)); //replace 32 with word size
x = (x >> shift) << shift;
((1 << i) - 1) makes a mask of the part where the right-end of the run we're looking for could be in (it could also just miss the end, but that's ok), then clz looks for the first zero to the right of i in x, then the shifts remove the bits that we don't want to look at.
Apply the first formula, for isolating the rightmost run of 1s, to the result of that to get the run of 1s that contains i. i had better be in some run, or things go sideways (more accurately, it would return the first run of 1s that starts at an index higher than i).
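Putting the two steps together could look roughly like this. It is only a sketch: clz32 is a hypothetical portable stand-in for whatever count-leading-zeros instruction or intrinsic is available (it returns 32 for an input of 0, which is what makes the "no zero below i" case work), and the function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a hardware clz; returns 32 when v == 0.
inline int clz32(std::uint32_t v)
{
    int n = 32;
    while (v != 0) { v >>= 1; --n; }
    return n;
}

// Rightmost run of 1s in x (0 if x == 0): x ^ (x & ((x|(x-1))+1)).
inline std::uint32_t rightmost_run(std::uint32_t x)
{
    return x ^ (x & ((x | (x - 1)) + 1));
}

// Run of 1s in arg that contains bit i (assumed: i < 8 and bit i is set).
inline unsigned char isolate_with_clz(unsigned char arg, int i)
{
    assert(i >= 0 && i < 8 && ((arg >> i) & 1));
    std::uint32_t x = arg;
    // Position just above the highest 0 bit below i (0 if there is none).
    const int shift = 32 - clz32(~x & ((1u << i) - 1));
    x = (x >> shift) << shift;          // drop everything below the run
    return static_cast<unsigned char>(rightmost_run(x));
}
```

For example, with arg = 221 (11011101) and i = 2, the highest zero below bit 2 is bit 1, so shift is 2, x becomes 11011100, and its rightmost run is 00011100.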
*: For this question, none of this really matters. A 2KB table is not a clumsy solution unless you only have a tiny amount of memory available, and even if that's the case, the input is so short that the loops aren't all that bad.

Related

Use bit manipulation to convert a bit from each byte in an 8-byte number to a single byte

I have a 64-bit unsigned integer. I want to check the 6th bit of each byte and return a single byte representing those 6th bits.
The obvious, "brute force" solution is:
inline const unsigned char Get6thBits(unsigned long long num) {
unsigned char byte(0);
for (int i = 7; i >= 0; --i) {
byte <<= 1;
byte |= bool((0x20ULL << 8 * i) & num); // 0x20ULL: a plain int would overflow when shifted by up to 56 bits
}
return byte;
}
I could unroll the loop into a bunch of concatenated | statements to avoid the int allocation, but that's still pretty ugly.
Is there a faster, more clever way to do it? Maybe use a bitmask to get the 6th bits, 0x2020202020202020 and then somehow convert that to a byte?
If _pext_u64 is a possibility (this will work on Haswell and newer, it's very slow on Ryzen though), you could write this:
int extracted = _pext_u64(num, 0x2020202020202020);
This is a really literal way to implement it. pext takes a value (first argument) and a mask (second argument), at every position that the mask has a set bit it takes the corresponding bit from the value, and all bits are concatenated.
_mm_movemask_epi8 is more widely usable, you could use it like this:
__m128i n = _mm_set_epi64x(0, num);
int extracted = _mm_movemask_epi8(_mm_slli_epi64(n, 2));
pmovmskb takes the high bit of every byte in its input vector and concatenates them. The bits we want are not the high bit of every byte, so I move them up two positions with psllq (of course you could shift num directly). The _mm_set_epi64x is just some way to get num into a vector.
Don't forget to #include <intrin.h> (that is the MSVC header; GCC and Clang use <immintrin.h>), and none of this was tested.
Codegen seems reasonable enough
A weirder option is gathering the bits with a multiplication: (only slightly tested)
int extracted = (num & 0x2020202020202020) * 0x08102040810204 >> 56;
The idea here is that num & 0x2020202020202020 only has very few bits set, so we can arrange a product that never carries into bits that we need (or indeed at all). The multiplier is constructed to do this:
a0000000b0000000c0000000d0000000e0000000f0000000g0000000h0000000 +
0b0000000c0000000d0000000e0000000f0000000g0000000h00000000000000 +
00c0000000d0000000e0000000f0000000g0000000h000000000000000000000 etc..
Then the top byte will have all the bits "compacted" together. The lower bytes actually have something like that too, but they're missing bits that would have to come from "higher" (bits can only move to the left in a multiplication).
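To make the multiplication trick easier to check, here is a small sketch that places it next to a plain loop. The function names are invented for this example; the mask and multiplier are the ones from the answer (the multiplier is just written out to 16 hex digits). The bit taken from byte k sits at position 8k+5 and is shifted left by 51-7k, so it lands on bit 56+k, and no two partial products ever hit the same position.

```cpp
#include <cassert>
#include <cstdint>

// Reference version: collect bit 5 (mask 0x20) of each byte with a loop.
inline unsigned gather_loop(std::uint64_t num)
{
    unsigned out = 0;
    for (int i = 0; i < 8; ++i)
        out |= static_cast<unsigned>((num >> (8 * i + 5)) & 1u) << i;
    return out;
}

// Multiplication version: the bit from byte k ends up at bit 56 + k,
// and the final shift keeps only that top byte.
inline unsigned gather_multiply(std::uint64_t num)
{
    const std::uint64_t mask = 0x2020202020202020ULL;
    const std::uint64_t multiplier = 0x0008102040810204ULL;
    return static_cast<unsigned>(((num & mask) * multiplier) >> 56);
}

// Convenience check that both versions agree on a given input.
inline void check_gather(std::uint64_t num)
{
    assert(gather_loop(num) == gather_multiply(num));
}
```

Bits of the product that would land above bit 63 are simply discarded by the 64-bit multiplication, which is why the lower partial products do no harm.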

Bit operation used in a for loop

I found this loop in the source code of an algorithm. I think that details about the problems aren't relevant here, because this is a really small part of the solution.
void update(int i, int value, int array[], int n) {
for(; i < n; i += ~i & (i + 1)) {
array[i] += value;
}
}
I don't really understand what happens in that for loop, is it some sort of trick? I found something similar named Fenwick trees, but they look a bit different than what I have here.
Any ideas what this loop means?
Also, found this :
"Bit Hack #9. Isolate the rightmost 0-bit.
y = ~x & (x+1)
"
You are correct: the bit-hack ~i & (i + 1) should evaluate to an integer which is all binary 0's, except the one corresponding to the rightmost zero-bit of i, which is set to binary 1.
So at the end of each pass of the for loop, this value is added to i. Since the corresponding bit in i is zero, this has the effect of setting it, without affecting any other bits in i. This will strictly increase the value of i at each pass, until i overflows (or becomes -1, if you started with i<0). In context, you can probably expect that it is called with i>=0, and that the i < n test is there to terminate the loop before your index walks off the array.
The overall function should have the effect of iterating through the zero-bits of the original value of i from least- to most-significant, setting them one by one, and incrementing the corresponding elements of the array.
Fenwick trees are a clever way to accumulate and query statistics efficiently; as you say, their update loop looks a bit like this, and typically uses a comparable bit-hack. There are bound to be multiple ways to accomplish this kind of bit-fiddling, so it is certainly possible that your source code is updating a Fenwick tree, or something comparable.
Assume that from the right to the left, you have some number of 1 bits, a 0 bit, and then more bits in x.
If you add x + 1, then all the 1's at the right are changed to 0, the 0 is changed to 1, the rest is unchanged. For example xxxx011 + 1 = xxxx100.
In ~x, you have the same number of 0 bits, a 1 bit, and the inverses of the other bits. The bitwise and produces the 0 bits, one 1 bit, and since the remaining bits are and'ed with their negation, those bits are 0.
So the result of ~x & (x + 1) is a number with one 1 bit where x had its rightmost zero bit.
If you add this to x, you change the rightmost 0 to a 1. So if you do this repeatedly, you change the 0 bits in x to 1, from the right to the left.
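The argument above can be checked with a couple of tiny helpers. This is just a sketch with made-up function names, using unsigned so that x + 1 is well defined even when x is all ones.

```cpp
#include <cassert>

// Mask with a single 1 at the position of the rightmost 0 bit of x
// (0 if x has no 0 bit at all).
inline unsigned rightmost_zero(unsigned x)
{
    return ~x & (x + 1);
}

// Adding that mask sets the rightmost 0 bit and changes nothing else.
inline unsigned set_rightmost_zero(unsigned x)
{
    const unsigned bit = rightmost_zero(x);
    assert((x & bit) == 0); // the bit is clear in x, so the addition cannot carry
    return x + bit;
}
```

For x = 1011 the mask is 0100, and adding it gives 1111.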
The update function walks through the 0-bits of i from the rightmost zero to the leftmost, setting them one at a time, and adds value to array[i] at each step.
The for loop checks whether i is less than n. If so, array[i] += value runs, and then ~i & (i + 1), an integer that is all binary 0's except for a single 1 at the position of the rightmost 0-bit of i, is added to i.
Setting i to 8 and going through the iterations (8, 9, 11, 15, 31, ...) may make things clearer.
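To see which array elements the loop touches, one can record the indices instead of updating an array. The helper below is hypothetical and exists only for this trace; the loop header is copied from the question.

```cpp
#include <cassert>
#include <vector>

// Returns the indices that update(i, value, array, n) would touch.
// Assumes i >= 0, as the surrounding discussion does.
std::vector<int> visited_indices(int i, int n)
{
    assert(i >= 0);
    std::vector<int> visited;
    for (; i < n; i += ~i & (i + 1))
        visited.push_back(i);
    return visited;
}
```

Starting from i = 8 with n = 40 this visits 8, 9, 11, 15 and 31; the next candidate would be 63, which fails the i < n test.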

add and remove last bit

I am trying to determine the next and previous even number with bitwise operations.
So for example for the next function:
x nextEven(x)
1 2
2 2
3 4
4 4
and for the previous:
x previousEven(x)
1 0
2 2
3 2
4 4
I had the idea for the nextEven function something like: value = ((value+1)>>1)<<1;
And for the previousEven function something like: value = ((value)>>1)<<1
is there a better approach?, without comparing and seeing if the values are even or odd.
Thank you.
Doing a right shift followed by a left shift to clear the LSB isn't very efficient.
I'd use something like:
previous: value &= ~1;
next: value = (value +1) & ~1;
The ~1 can (and normally will) be pre-computed at compile time, so the previous will end up as a single bit-wise operation at run-time. The next will probably end up as two operations (increment, and), but should still be quite fast.
About the best you can hope for from the shifts is that the compiler will recognize that you're just clearing the LSB, and optimize it to about what you'd expect this to produce anyway.
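Wrapped into functions, the two expressions could look like the sketch below. The function names and the choice of unsigned are assumptions made for the example; the expected values in the usage note come from the tables in the question.

```cpp
#include <cassert>

// Largest even number <= x: clear the least significant bit.
inline unsigned previous_even(unsigned x)
{
    return x & ~1u;
}

// Smallest even number >= x: add one, then clear the least significant bit.
inline unsigned next_even(unsigned x)
{
    assert(x != ~0u); // x + 1 would wrap around to 0
    return (x + 1) & ~1u;
}
```

With these, next_even maps 1, 2, 3, 4 to 2, 2, 4, 4 and previous_even maps them to 0, 2, 2, 4.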
you could do something like this
for previous even
unsigned prevev(unsigned x)
{
return x-(x%2);//bitwise counterpart x-(x&1);
}
for next even
unsigned nxtev(unsigned x)
{
return (x%2)+x; //bitwise counterpart x+(x&1);
}
Say you're using unsigned ints, previous even (matching your values - we could argue about whether previous even of 2 should be 0 etc) is simply x & ~1u. Next even is previous even of x + 1.
Tricks like Duff's Device, or swapping two variables with XOR, or working out next and previous even number with bitwise operations seem clever, but they rarely are.
The best thing you can do as a developer is to optimise for readability first and only tackle performance once you've identified a specific bottleneck that is causing real problems.
The best code for getting the previous even number (by your definition where the previous even number of 2 is 2) is simply writing something like:
if ((num % 2) == 1) num--; // num++ for next.
or (slightly more advanced):
num -= num % 2; // += for next.
and letting the insane optimising compilers figure out the best underlying code.
Unless you need to do these operations billions of times per second, readability should always be your prime concern.
Previous even number:
For previous even number I prefer Jerry Coffin's answer
// Get previous even number
unsigned prevEven(unsigned no)
{
return (no & ~1);
}
Next even number:
I tried to use only bitwise operators, but I still use one unary minus (-) operator to get the next number.
// Get next even number
unsigned nextEven(unsigned no)
{
return (no & 1) ? (-(~no)) : no ;
}
How nextEven() works:
If the number is even, return the same number.
If no is even its LSB is 0, otherwise it is 1.
Get the LSB of the number => number & 1
If the number is odd, return the number + 1.
Add 1 to the number => -(~number), since ~number equals -number - 1
unsigned int previous(unsigned int x)
{
return x & 0xfffffffe;
}
unsigned int next(unsigned int x)
{
return previous(x + 1); // x + 1, not x + 2, so that next(2) == 2 as in the question's table
}

bitwise bitmanipulation puzzle

Hello, I have a question about a school assignment. I need to:
Read a whole number and work with its internal binary code, with bit 0 on the right and bit 7 on the left.
Now I need to swap:
bit 0 with bit 7
bit 1 with bit 6
bit 2 with bit 5
bit 3 with bit 4
For example:
if I use hex, F703 becomes F7C0
because 03 = 0000 0011 and C0 = 1100 0000
(only the right byte (8 bits) needs to be switched).
The lesson was about bit manipulation, but I can't find a way to make it correct for all the 16 hex numbers.
I have been puzzling over this for a while now.
I am thinking of using an array for this problem, or can someone tell me whether it can be done with only the bitwise ^, &, ~, <<, >> operators?
Study the following two functions:
bool GetBit(int value, int bit_position)
{
return value & (1 << bit_position);
}
void SetBit(int& value, int bit_position, bool new_bit_value)
{
if (new_bit_value)
value |= (1 << bit_position);
else
value &= ~(1 << bit_position);
}
So now we can read and write arbitrary bits just like an array.
1 << N
gives you:
000...0001000...000
Where the 1 is in the Nth position.
So
1 << 0 == 0000...0000001
1 << 1 == 0000...0000010
1 << 2 == 0000...0000100
1 << 3 == 0000...0001000
...
and so on.
Now what happens if I BINARY AND one of the above numbers with some other number Y?
X = 1 << N
Z = X & Y
What is Z going to look like? Well, every bit apart from the Nth is definitely going to be 0, isn't it? Because those bits are 0 in X.
What will the Nth bit of Z be? It depends on the value of the Nth bit of Y, doesn't it? So under what circumstances is Z zero? Precisely when the Nth bit of Y is 0. So by converting Z to a bool we can separate out the value of the Nth bit of Y. Take another look at the GetBit function above; this is exactly what it is doing.
Now that's reading bits; how do we set a bit? Well, if we want to set a bit on we can use BINARY OR with one of the (1 << N) numbers from above:
X = 1 << N
Z = Y | X
What is Z going to be here? Well every bit is going to be the same as Y except the Nth right? And the Nth bit is always going to be 1. So we have set the Nth bit on.
What about setting a bit to zero? What we want to do is take a number like 11111011111 where just the Nth bit is off and then use BINARY AND. To get such a number we just use BINARY NOT:
X = 1 << N // 000010000
W = ~X // 111101111
Z = W & Y
So all the bits in Z apart from the Nth will be copies of Y. The Nth will always be off. So we have effectively set the Nth bit to 0.
Using the above two techniques is how we have implemented SetBit.
So now we can read and write arbitrary bits. Now we can reverse the bits of the number just like it was an array:
int ReverseBits(int input)
{
    const int N = 8; // number of bits to reverse (assumed: 8, the byte from the question)
    int output = 0;
    for (int i = 0; i < N; i++)
    {
        bool bit = GetBit(input, i); // read ith bit
        SetBit(output, N-i-1, bit); // write (N-i-1)th bit
    }
    return output;
}
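The question also wants the upper byte left alone (F703 should become F7C0). One possible way to get that, sketched here with the same read-a-bit, write-a-bit loop, is to rebuild only the low byte and then merge it back. The function name is hypothetical and the bit reads and writes are written inline rather than through GetBit and SetBit.

```cpp
#include <cassert>

// Reverses bits 0..7 of value and leaves all higher bits untouched.
unsigned reverse_low_byte(unsigned value)
{
    unsigned low = 0;
    for (int i = 0; i < 8; ++i) {
        const bool bit = (value & (1u << i)) != 0; // read bit i
        if (bit)
            low |= 1u << (7 - i);                  // write it to bit 7 - i
    }
    const unsigned result = (value & ~0xFFu) | low;
    assert((result >> 8) == (value >> 8)); // upper bits are preserved
    return result;
}
```

Calling reverse_low_byte(0xF703) gives 0xF7C0, matching the example in the question.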
Please make sure you understand all this. Once you have understood it all, please close the page and implement and test these functions without looking at it.
If you enjoyed this then try some of these:
http://graphics.stanford.edu/~seander/bithacks.html
And/or get this book:
http://www.amazon.com/exec/obidos/ASIN/0201914654/qid%3D1033395248/sr%3D11-1/ref%3Dsr_11_1/104-7035682-9311161
This does one quarter of the job, but I'm not going to give you any more help than that; if you can work out why I said that, then you should be able to fill in the rest of the code.
if ((i ^ (i >> (5 - 2))) & (1 << 2))
i ^= (1 << 2) | (1 << 5);
Essentially you need to reverse the bit ordering.
We're not going to solve this for you.. but here's a hint:
What if you had a 2-bit value. How would you reverse these bits?
A simple swap would work, right? Think about how to code this swap with operators that are available to you.
Now let's say you had a 4-bit value. How would you reverse these bits?
Could you split it into two 2-bit values, reverse each one, and then swap them? Would that give you the right result? Now code this.
Generalizing that solution to the 8-bit value should be trivial now.
Good luck!

Some random C questions (ascii magic and bitwise operators)

I am trying to learn C programming, and I was studying some source code and there are some things I didn't understand, especially regarding bitwise operators. I read some sites on this, and I kind of got an idea of what they do, but when I went back to look at this code, I could not understand why and how they were used.
My first question is not related to bitwise operators but rather some ascii magic:
Can somebody explain to me how the following code works?
char a = 3;
int x = a - '0';
I understand this is done to convert a char into an int, however I don't understand the logic behind it. Why/How does it work?
Now, Regarding Bitwise operators, I feel really lost here.
What does this code do?
if (~pointer->intX & (1 << i)) { c++; n = i; }
I read somewhere that ~ inverts bits, but I fail to see what this statement is doing and why is it doing that.
Same with this line:
row.data = ~(1 << i);
Other question:
if (x != a)
{
ret |= ROW;
}
What exactly is the |= operator doing? From what I read, |= is OR, but I don't quite understand what this statement is doing.
Is there any way of rewriting this code to make it easier to understand, so that it doesn't use these bitwise operators? I find them very confusing, so hopefully somebody will point me in the right direction on understanding how they work better!
I have a much better understanding of bitwise operators now and the whole code makes much more sense now.
One last thing: apparently nobody responded on whether there would be a "cleaner" way of rewriting this code so that it's easier to understand and maybe not at "bit level". Any ideas?
This will produce junk:
char a = 3;
int x = a - '0';
This is different - note the quotes:
char a = '3';
int x = a - '0';
The char datatype stores a number that identifies a character. The characters for the digits 0 through 9 are all next to each other in the character code list, so if you subtract the code for '0' from the code for '9', you get the answer 9. So this will turn a digit character code into the integer value of the digit.
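As a small sketch of that conversion (the function name is made up for this example), with a check that the input really is a digit character, since the subtraction is meaningless otherwise:

```cpp
#include <cassert>

// Returns the numeric value of a decimal digit character, e.g. '3' -> 3.
int digit_value(char c)
{
    assert(c >= '0' && c <= '9'); // only meaningful for digit characters
    return c - '0';
}
```

Both the C and C++ standards guarantee that the characters '0' through '9' have consecutive codes, so this does not depend on ASCII specifically.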
(~pointer->intX & (1 << i))
That will be interpreted by the if statement as true if it's non-zero. There are three different bitwise operators being used.
The ~ operator flips all the bits in the number, so if pointer->intX was 01101010, then ~pointer->intX will be 10010101. (Note that throughout, I'm illustrating the contents of a byte. If it was a 32-bit integer, I'd have to write 32 digits of 1s and 0s).
The & operator combines two numbers into one number, by dealing with each bit separately. The resulting bit is only 1 if both the input bits are 1. So if the left side is 00101001 and the right side is 00001011, the result will be 00001001.
Finally, << means left shift. If you start with 00000001 and left shift it by three places, you'll have 00001000. So the expression (1 << i) produces a value where bit i is switched on, and the others are all switched off.
Putting it all together, it tests if bit i is switched off (zero) in pointer->intX.
So you may be able to figure out what ~(1 << i) does. If i is 4, the thing in brackets will be 00010000, and so the whole thing will be 11101111.
ret |= ROW;
That one is equivalent to:
ret = ret | ROW;
The | operator is like & except that the resulting bit is 1 if either of the input bits is 1. So if ret is 00100000 and ROW is 00000010, the result will be 00100010.
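On the question of a "cleaner" rewrite: the operators themselves are hard to avoid, but one option is to hide each idiom behind a small named function so the calling code reads as intent. The sketch below does that; all three function names are invented for this example, and unsigned is used instead of whatever type intX, row.data and ret really have.

```cpp
#include <cassert>
#include <limits>

// Number of value bits in unsigned, used to validate bit positions.
constexpr int kBits = std::numeric_limits<unsigned>::digits;

// True when bit i of value is 0; the same test as (~value & (1 << i)).
inline bool is_bit_clear(unsigned value, int i)
{
    assert(i >= 0 && i < kBits);
    return (~value & (1u << i)) != 0;
}

// All bits set except bit i; the same as ~(1 << i).
inline unsigned all_bits_except(int i)
{
    assert(i >= 0 && i < kBits);
    return ~(1u << i);
}

// flags with every bit of flag switched on; the same as flags |= flag.
inline unsigned with_flag(unsigned flags, unsigned flag)
{
    return flags | flag;
}
```

With these, the original lines could be read as if (is_bit_clear(pointer->intX, i)) ..., row.data = all_bits_except(i); and ret = with_flag(ret, ROW);.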
ret |= ROW;
is equivalent to
ret = ret | ROW;
For char a = 3; int x = a - '0'; I think you meant char a = '3'; int x = a - '0';. It's easy enough to understand if you realize that in ASCII the numbers come in order, like '0', '1', '2', ... So if '0' is 48 and '1' is 49, then '1' - '0' is 1.
For bitwise operations, they are hard to grasp until you start looking at bits. When you view these operations on binary numbers then you can see exactly how they work...
010 & 111 = 010
010 | 111 = 111
010 ^ 111 = 101
~010 = 101
I think you probably have a typo, and meant:
char a = '3';
The reason this works is that all the numbers come in order, and '0' is the first. Obviously, '0' - '0' = 0. '1' - '0' = 1, since the character value for '1' is one greater than the character value for '0'. Etc.
1) A char is really just an 8-bit integer. '0' == 48, and all that that implies.
2) (~(pointer->intX) & (1 << i)) evaluates whether the 'i'th bit (from the right) in the intX member of whatever pointer points to is not set. The ~ inverts the bits, so all the 0s become 1s and vice versa, then the 1 << i puts a single 1 in the desired location, & combines the two values so that only the desired bit is kept, and the whole thing evaluates to true if that bit was 0 to begin with.
3) | is bitwise or. It takes each bit in both operands and performs a logical OR, producing a result where each bit is set if either operand had that bit set. 0b11000000 | 0b00000011 == 0b11000011. |= is an assignment operator, in the same way that a+=b means a=a+b, a|=b means a=a|b.
Not using bitwise operators CAN make things easier to read in some cases, but it will usually also make your code significantly slower without strong compiler optimization.
The subtraction trick you reference works because ASCII numbers are arranged in ascending order, starting with zero. So if ASCII '0' is a value of 48 (and it is), then '1' is a value of 49, '2' is 50, etc. Therefore ASCII('1') - ASCII('0') = 49 - 48 = 1.
As far as bitwise operators go, they allow you to perform bit-level operations on variables.
Let's break down your example:
(1 << i) -- this is left-shifting the constant 1 by i bits. So if i=0, the result is decimal 1. If i = 1, it shifts the bit one to the left, backfilling with zeros, yielding binary 0010, or decimal 2. If i = 2, you shift the bit two to the left, backfilling with zeros, yielding binary 0100 or decimal 4, etc.
~pointer->intX -- this is taking the value of the intX member of pointer and inverting its bits, setting all zeros to ones and vice versa.
& -- the ampersand operator does a bitwise AND. Each result bit is 1 wherever both the left and right side of the expression have a 1, and 0 otherwise.
So the test will succeed if pointer->intX has a 0 bit at the ith position from the right.
Also, |= means to do a bitwise OR and assign the result to the left side of the expression. The result of a bitwise OR is 1 for every bit where the corresponding left or right side bit is 1.
Single quotes are used to indicate that a single char is used. '0' therefore is the char '0', which has the ASCII-Code 48.
3-'0'=3-48
'1<<i' shifts 1 i places to the left, therefore only the ith bit from the right is 1.
~pointer->intX inverts every bit of the field intX, so the bitwise AND yields a true value (not 0) exactly when the ith bit from the right of intX isn't set.
char a = '3';
int x = a - '0';
You had a typo here (notice the quotes around the 3). This assigns the ASCII value of the character 3 to the char variable; the next line computes '3' - '0' and assigns it to x. Because of the way ASCII values work, x will then be equal to 3 (integer value).
In the first comparison, ~ is not applied to the pointer itself: -> binds more tightly than ~, so the expression inverts the value of intX. If I were to read out the following code:
(~pointer->intX & (1 << i))
I would say "(the inverted bits of the value intX dereferenced from pointer) AND (1 left shifted i times)"
1 << i is a quick way of multiplying 1 by a power of 2, ie if i is 3, then 1 << 3 == 8
In this case, the inversion means the test is true when bit i of intX is 0.
In the 2nd comparison, x |= y is the same as x = x | y
I'm assuming you mean char a='3'; for the first line of code (otherwise you get a rather strange answer). The basic principle is that ASCII codes for digits are sequential, i.e. the code for '0'=48, the code for '1'=49, and so on. Subtracting '0' simply converts from the ASCII code to the actual digit, so e.g. '3' - '0' = 3, and so on. Note that this will only work if the character you're subtracting '0' from is an actual digit - otherwise the result will have little meaning.
a. Without context the "why" of this code is impossible to say. As for what it's doing, it appears that the if statement evaluates as true when bit i of pointer->intX is not set, i.e. that particular bit is a 0. The unary ~ operator has higher precedence than the binary & operator, so pointer->intX is inverted first and the result is then ANDed with (1 << i). The code could make better use of parentheses to make the intended order of operations clearer. The order does matter here: ~(pointer->intX & (1 << i)) would always be non-zero.
b. This is simply creating a number with all bits EXCEPT bit i set to 1. A convenient way of creating a mask for bit i is to use the expression (1<<i).
The bitwise OR operation in this case is used to set the bits specified by the ROW constant to 1. If these bits are not set, it sets them; if they're already set it has no effect.
1) Can somebody explain to me how the following code works? char a = 3; int x = a - '0';
I understand this is done to convert a char into an int, however I don't understand the logic behind it. Why/How does it work?
Sure. Variable a is of type char, and putting single quotes around 0 makes it a character constant, whose value is the character code for the digit 0. The subtraction is carried out on integers, and the result is stored in x, which is defined as an integer.
2) Now, Regarding Bitwise operators, I feel really lost here.
--- What does this code do? if (~pointer->intX & (1 << i)) { c++; n = i; } I read somewhere that ~ inverts bits, but I fail to see what this statement is doing and why is it doing that.
(~pointer->intX & (1 << i)) is saying:
negate intX, and AND it with a 1 shifted left by i bits
so, what you're getting, if intX = 1011, and i = 2, equates to
(0100 & 0100)
-negate 1011 = 0100
-(1 << 2) = 0100
0100 & 0100 = 0100, which is non-zero :)
then, if the AND operation returns a non-zero value (which, in my example, it does)
{ c++; n = i; }
so, increment c by 1, and set variable n to be i
Same with this line: row.data = ~(1 << i);
Same principle here.
Shift a 1 to the left by i places, and negate.
So, if i = 2 again
(1 << 2) = 0100
~(0100) = 1011
--- Other question:
if (x != a) { ret |= ROW; }
What exactly is the |= operator doing? From what I read, |= is OR but I don't quite understand what this statement is doing.
if (x != a) (hopefully this is apparent to you....if variable x does not equal variable a)
ret |= ROW;
equates to
ret = ret | ROW;
which means, binary OR ret with ROW
For examples of exactly what AND and OR operations accomplish, you should have a decent understanding of binary logic.
Check Wikipedia for truth tables, e.g. the article on bitwise operations.