How do I change low order bits to 1? - bit-manipulation

I'm trying to add a thing where it sets the low order (b) bits of x to 1, and the others to 0. This also includes edge cases where if b is 32 then x is -1, or if b is 0 then x is 0. I'm trying to do this without the use of formal code (just bitwise operations) but I'm not getting any luck. So far I have as follows:
x = (0xFFFFFFFF >> (b+1)) & (!b + (~1 + 1));
Any help is appreciated.
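One possible approach is sketched below, assuming x is a 32-bit unsigned value and 0 <= b <= 32. The helper names are made up for illustration. The first variant widens to 64 bits so that b == 32 is not an out-of-range shift. The second stays in 32 bits and splits the shift into two halves, each at most 16.

```c
#include <stdint.h>

/* Hypothetical helper: low b bits set to 1, all others 0.
 * Assumes 0 <= b <= 32. The shift is done in 64 bits because
 * shifting a 32-bit value by 32 is undefined behavior in C. */
uint32_t low_bits_mask(unsigned b)
{
    return (uint32_t)((UINT64_C(1) << b) - 1);
}

/* Same result using only 32-bit bitwise operations: the shift by b
 * is split into two shifts of at most 16 bits each. */
uint32_t low_bits_mask32(unsigned b)
{
    unsigned half = b >> 1;
    uint32_t high_ones = (UINT32_C(0xFFFFFFFF) << half) << (b - half);
    return ~high_ones;
}
```

With b = 0 both return 0, and with b = 32 both return 0xFFFFFFFF, which is -1 when reinterpreted as a signed 32-bit value.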

Related

Getting exponent value using bit shifts (C, C++) [duplicate]

Note - This is NOT a duplicate of this question - Count the consecutive zero bits (trailing) on the right in parallel: an explanation? . The linked question has a different context, it only asks the purpose of signed() being use. DO NOT mark this question as duplicate.
I've been looking for a way to get the number of trailing zeros in a number. I found a Stanford University bit-twiddling write-up here that gives the following explanation.
unsigned int v; // 32-bit word input to count zero bits on right
unsigned int c = 32; // c will be the number of zero bits on the right
v &= -signed(v);
if (v) c--;
if (v & 0x0000FFFF) c -= 16;
if (v & 0x00FF00FF) c -= 8;
if (v & 0x0F0F0F0F) c -= 4;
if (v & 0x33333333) c -= 2;
if (v & 0x55555555) c -= 1;
Why does this end up working ? I have an understanding of how Hex numbers are represented as binary and bitwise operators, but I am unable to figure out the intuition behind this working ? What is the working mechanism ?
The code is broken (undefined behavior is present). Here is a fixed version which is also slightly easier to understand (and probably faster):
uint32_t v; // 32-bit word input to count zero bits on right
unsigned c; // c will be the number of zero bits on the right
if (v) {
    v &= -v; // keep rightmost set bit (the one that determines the answer), clear all others
    c = 0;
    if (v & 0xAAAAAAAAu) c |= 1; // binary 10..1010
    if (v & 0xCCCCCCCCu) c |= 2; // binary 1100..11001100
    if (v & 0xF0F0F0F0u) c |= 4;
    if (v & 0xFF00FF00u) c |= 8;
    if (v & 0xFFFF0000u) c |= 16;
}
else c = 32;
Once we know only one bit is set, we determine one bit of the result at a time, by simultaneously testing all bits where the result is odd, then all bits where the result has the 2's-place set, etc.
The original code worked in reverse, starting with all bits of the result set (after the if (v) c--;) and then determining which needed to be zero and clearing them.
Since we are learning one bit of the output at a time, I think it's more clear to build the output using bit operations not arithmetic.
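To convince yourself that the mask-based version gives the right answer, it can be wrapped in a function and compared against a plain shift loop. The following is only a sketch: the function names are made up, and 0u - v is used in place of -v purely to avoid compiler warnings about negating an unsigned value.

```c
#include <stdint.h>

/* Mask-based count of trailing zeros; returns 32 when v is 0. */
unsigned ctz32(uint32_t v)
{
    unsigned c = 0;
    if (v == 0)
        return 32;
    v &= 0u - v;                      /* keep only the lowest set bit */
    if (v & 0xAAAAAAAAu) c |= 1;      /* bit positions with bit 0 of the index set */
    if (v & 0xCCCCCCCCu) c |= 2;      /* positions with bit 1 of the index set */
    if (v & 0xF0F0F0F0u) c |= 4;
    if (v & 0xFF00FF00u) c |= 8;
    if (v & 0xFFFF0000u) c |= 16;
    return c;
}

/* Reference version: shift right until the low bit is 1. */
unsigned ctz32_naive(uint32_t v)
{
    unsigned c = 0;
    if (v == 0)
        return 32;
    while ((v & 1u) == 0) {
        v >>= 1;
        c++;
    }
    return c;
}
```

For example, 40 is binary 101000, its lowest set bit is at position 3 (binary 011), so only the first two tests fire and c ends up as 1 | 2 = 3.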
This code (from the net) is mostly C, although v &= -signed(v); isn't correct C. The intent is for it to behave as v &= ~v + 1;
First, if v is zero, then it remains zero after the & operation, and all of the if statements are skipped, so you get 32.
Otherwise, the & operation (when corrected) clears all bits to the left of the rightmost 1, so at that point v contains a single 1 bit. Then c is decremented to 31, i.e. all 1 bits within the possible result range.
The if statements then determine its numeric position one bit at a time (one bit of the position number, not of v), clearing the bits that should be 0.
The code first transforms v in such a way that it is entirely zero, except for the rightmost one, which remains. Then it determines the position of that one.
First let's see how we suppress all ones but the rightmost one.
Assume that k is the position of the rightmost one in v: v=(vn-1,vn-2,..vk+1,1,0,..0).
-v is the number that added to v will give 0 (actually it gives 2^n, but bit 2^n is ignored if we only keep the n less significant bits).
What must the bits of -v be so that v + (-v) = 0?
Obviously bits k-1..0 of -v must be 0, so that added to the trailing zeros in v they give zero.
Bit k must be 1. Added to the one in vk, it gives a zero and a carry of one into position k+1.
bit k+1 of -v will be added to vk+1 and to the carry generated at step k. It must be the logical complement of vk+1. So whatever the value of vk+1, we will have 1+0+1 if vk+1=0 (or 1+1+0 if vk+1=1) and result will be 0 at order k+1 with a carry generated at order k+2.
This is similar for bits n-1..k+2 and they must all be the logical complement of the corresponding bit in v.
Hence, we get the well-known result that to get -v, one must
leave unchanged all trailing zeros of v
leave unchanged the rightmost one of v
complement all the other bits.
If we compute v&-v, we have
v      =  vn-1  vn-2 ...  vk+1 1 0 0 ... 0
-v     = ~vn-1 ~vn-2 ... ~vk+1 1 0 0 ... 0
v & -v =  0     0    ...  0    1 0 0 ... 0
So v&-v keeps only the rightmost one in v.
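The rule derived here can be checked on a concrete value. A minimal sketch, assuming 32-bit unsigned arithmetic (the function name is illustrative), with -v written out as ~v + 1:

```c
#include <stdint.h>

/* Returns v with every bit cleared except its lowest set bit.
 * ~v + 1u is the two's-complement negation of v. */
uint32_t lowest_set_bit(uint32_t v)
{
    return v & (~v + 1u);
}
```

For v = 0x58 (binary 0101 1000), -v ends in 1010 1000, and the AND leaves 0000 1000, that is 0x08.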
To find the location of this one, look at the code:
if (v) c--;                  // no 1 in v? -> 32 trailing zeros.
                             // Otherwise the position is in range c..0 = 31..0
if (v & 0x0000FFFF) c -= 16; // If the one is in the lower half of v, the range
                             // of possible positions is 15..0.
                             // Otherwise, the range must be 31..16.
                             // The remaining range is c..c-15.
if (v & 0x00FF00FF) c -= 8;  // If the one is in either byte 0 (c=15) or byte 2 (c=31),
                             // it is in the lower part of the range,
                             // so we subtract 8 from the bounds of the range.
                             // Otherwise, the one is in the upper part.
                             // The range of possible positions is now c..c-7.
if (v & 0x0F0F0F0F) c -= 4;  // Do the same for the remaining steps.
if (v & 0x33333333) c -= 2;
if (v & 0x55555555) c -= 1;

Can XorShift return zero?

I've been reading about the XorShift PRNG especially the paper here
A guy here states that
The number lies in the range [1, 2**64). Note that it will NEVER be 0.
Looking at the code that makes sense:
uint64_t x;
uint64_t next(void) {
    x ^= x >> 12; // a
    x ^= x << 25; // b
    x ^= x >> 27; // c
    return x * UINT64_C(2685821657736338717);
}
If x were zero, then every subsequent number would be zero too. But wouldn't that make it less useful? The usual usage pattern would be something like min + rand() % (max - min), or converting the 64 bits to 32 bits if you only need an int. But if 0 is never returned, then that might be a serious problem. Also, the bits are not 0 or 1 with the same probability, since 0 is missing, so zero bits are slightly less likely. I can't even find any mention of that on Wikipedia, so am I missing something?
So what is a good/appropriate way to generate random, equally distributed numbers from XorShift64* in a given range?
Short answer: No it cannot return zero.
According to Numerical Recipes, "it produces a full period of 2^64-1 [...] the missing value is zero".
The essence is that those shift values have been chosen carefully to produce the longest possible sequence (the full period, excluding zero), hence one can be sure that every nonzero number is produced. Zero is indeed a fixed point of this generator, so it has two cycles: one containing only zero, and one containing every other number.
So IMO for a sufficiently small range max-min it is enough to make a function (next() - 1) % (max - min) + min or even omitting the subtraction altogether as zero will be returned by the modulo.
If one wants better quality equal distribution one should use the 'usual' method by using next() as a base generator with a range of [1, 2^64)
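As an illustration of the "usual" method, here is a sketch of xorshift64* with rejection sampling on top. The seed value and the helper name next_in_range are assumptions made for this example, and the code assumes min < max. Because the multiplier is odd, a nonzero state always yields a nonzero output, so next() - 1 covers 0 .. 2^64 - 2.

```c
#include <stdint.h>

/* Generator state: any nonzero seed works, zero would stay zero forever. */
static uint64_t x = UINT64_C(88172645463325252);

uint64_t next(void)
{
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;
    return x * UINT64_C(2685821657736338717);
}

/* Hypothetical helper: uniform integer in [min, max), assuming min < max.
 * next() - 1 takes 2^64 - 1 distinct values, so values at or above the
 * largest multiple of n that fits are rejected to avoid modulo bias. */
uint64_t next_in_range(uint64_t min, uint64_t max)
{
    uint64_t n = max - min;
    uint64_t limit = UINT64_MAX - UINT64_MAX % n;
    uint64_t r;
    do {
        r = next() - 1;
    } while (r >= limit);
    return min + r % n;
}
```

For small ranges the rejection loop almost never repeats, so the simpler (next() - 1) % (max - min) + min behaves nearly the same in practice.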
I am nearly sure that there is an x, for which the xorshift operation returns 0.
Proof:
First, we have these equations:
a = x ^ (x >> 12);
b = a ^ (a << 25);
c = b ^ (b >> 27);
Substituting them:
b = (x ^ x >> 12) ^ ((x ^ x >> 12) << 25);
c = b ^ (b >> 27) = ((x ^ x >> 12) ^ ((x ^ x >> 12) << 25)) ^ (((x ^ x >> 12) ^ ((x ^ x >> 12) << 25)) >> 27);
As you can see, although c is a complex equation, it is perfectly abelian.
It means, you can express the bits of c as fully boolean expressions of the bits of x.
Thus, you can simply construct an equation system for the bits b0, b1, b2, ... so:
(Note: the coefficients are only examples, I didn't calculate them, but so would it look):
c0 = x1 ^ !x32 ^ x47 ...
c1 = x23 ^ x45 ^ !x61 ...
...
c63 = !x13 ^ ...
From that point, you have 64 equations and 64 unknowns. You can simply solve it with Gauss-elimination, you will always have a single unique solution.
Except some rare cases, i.e. if the determinant of the coefficients of the equation system is zero, but it is very unlikely in the size of such a big matrix.
Even if it happens, it would mean that you have an information loss in every iteration, i.e. you can't get all of the 2^64 possible values of x, only some of them.
Now consider the much more probable possibility, that the determinant of the coefficient matrix is non-zero. In this case, for all the possible 2^64 values of x, you have all possible 2^64 values of c, and these are all different.
Thus, you can get zero.
Extension: actually you get zero for zero... sorry, the proof is more useful for showing that it is not as simple as it seems at first glance. The important part is that you can express the bits of c as a boolean function of the bits of x.
There is another problem with this random number generator. And this is that even if you somehow modify the function to not have such problem (for example, by adding 1 in every iteration):
You still can't guarantee that it won't get into a short loop for any possible value of x. What if there is a 5 length loop for value 345234523452345? Can you prove for all possible initial values? I can't.
Actually, having a really pseudorandom iteration function, your system will likely loop after 2^32 iterations. It has a nearly trivial combinatoric reason, but "unfortunately this margin is small to contain it" ;-)
So:
If a 2^32 loop length is okay for your PRNG, then use a proven iteration function collected from somewhere on the net.
If it isn't, upgrade the state to at least 128 bits. It will result in a loop length of roughly 2^64, which is not so bad.
If you still want a 64-bit output, then use 128-bit numeric internally, but return (x>>64) ^ (x&(2^64-1)) (i.e. xor-ing the upper and lower half of the internal state x).

Why does this work for determining if a number is a power of 2?

int isPower2(int x) {
    int neg_one = ~0;
    return !(neg_one ^ (~x+1));
}
This code works, I have implemented it and it performs perfectly. However, I cannot wrap my head around why. When I do it by hand, it doesn't make any sense to me.
Say we are starting with a 4 bit number, 4:
0100
This is obviously a power of 2. When I follow the algorithm, though, ~x+1 =
1011 + 1 = 1100
XORing this with negative one (1111) gives 0011. !(0011) = 0. Where am I going wrong here? I know this has to be a flaw in the way I am doing this by hand.
To paraphrase Inigo Montoya, "I do not think this does what you think it does".
Let's break it down.
~x + 1
This flips the bits of 'x' and then adds one. This is the same as taking the 2's complement of 'x'. Or, to put it another way, this is the same as '-x'.
neg_one ^ (~x + 1)
Using what we noted in step 1, this simplifies to ...
neg_one ^ (-x)
or more simply ...
-1 ^ (-x)
But wait! XOR'ing something with -1 is the same as flipping the bits. That is ...
~(-x)
~(-x)
This can be simplified even more if we make use of the 2's complement.
~(-x) + 0
= ~(-x) + 1 - 1
= x - 1
If you are looking for an easy way to determine if a number is a power of 2, you can use the following instead for numbers greater than zero. It will return true if it is a power of two.
(x & (x - 1)) == 0
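The derivation above means the posted function reduces to !(x - 1), which is true only for x == 1. A small sketch that puts the two side by side follows. is_power_of_two is an illustrative name, and it takes an unsigned argument and rejects zero explicitly, which the one-line test alone does not.

```c
#include <stdbool.h>

/* The function from the question, kept as posted. */
int isPower2(int x)
{
    int neg_one = ~0;
    return !(neg_one ^ (~x + 1));
}

/* Clearing the lowest set bit with x & (x - 1) leaves zero exactly when
 * x had a single bit set. Zero is excluded separately. */
bool is_power_of_two(unsigned x)
{
    return x != 0 && (x & (x - 1)) == 0;
}
```

Calling isPower2(4) gives 0, matching the hand calculation in the question, while is_power_of_two(4) gives true.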

Logical operators and bit manipulation in C

I am trying to do some exercises, but I'm stuck at this point, where I can't understand what's happening and can't find anything related to this particular matter (I found other things about logical operators, but still not enough).
EDIT: Why the downvote? I was pretty explicit. There is no information regarding the type of x, but I assume it is int. The size is not described either; I thought I would discover that by doing the exercise.
a) At least one bit of x is '1';
b) At least one bit of x is '0';
c) At least one bit at the Least Significant Byte of x , is '1';
d) At least one bit at the Least Significant Byte of x , is '0';
I have the solutions, but would be great to understand them
a) !!x // What happens here? The '!' Usually is NOT in c
b) !!~x // Again, the '!' appears... The bitwise operand NOT is '~' and it makes the int one's complement, no further realization made unfortunately
c) !!(x & 0xFF) // I've read that this is a bit mask, i think they take in consideration 4 bytes in X, and this applies a mask at the least significant byte?
d) !!(~x & 0xFF) // Well, at this point i'm lost ...
I would love not having to skip classes at college, but i work full time in order to pay the fees :( .
You can add brackets around the separate operations and apply them in order. e.g.
!(!(~x))
i.e. !! is 2 NOT's
What happens to some value if you perform one NOT is:
If x == 0 then !x == 1, otherwise !x == 0
So, if you would perform another NOT, you invert the truth-value again. i.e.
If x == 0 then !!x == 0, otherwise !!x == 1
You could see it as getting your value between 0 and 1 in which 0 means: "no bit of x is '1'", and 1 means: "at least one bit of x is '1'".
Also, x & 0xFF takes the least significant byte of your variable. More thoroughly explained here:
What does least significant byte mean?
Assuming x is some unsigned int/short/long/... and you want conditions (if, while...):
a) You'll have to know that a bare value/variable used as a condition (without a==b or similar)
is false if it is 0 and true if it is not 0. So, if x is not 0 (true), one ! will switch it to 0 and the other ! back to 1 (not necessarily the old value, only nonzero). If x was 0, the two ! will finally result in 0 again (first 1, then 0 again).
The whole value of x is not 0 if at least 1 bit is 1...
What you're doing is transforming 0 to 0, or a value with 1-bits to 1. Not wrong, but... you can just write if(x) instead of if(!!x).
b) ~ switches every 0-bit to 1 and every 1-bit to 0. Now you can search for a 1 again, because you want a 0 in the original value. The same !! thing again...
c and d:
&0xFF sets all bits except for the lowest 8 ones (lowest byte) to 0.
The result of A&B is a value where each bit is 1 only if the bits of A and B at the same position are both 1. 0xff (decimal 255) is the number which has exactly the lowest 8 bits set to 1...
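The four expressions can be tried out directly. The sketch below wraps each one in a function, assuming x is an unsigned int. The function names are invented for this example.

```c
/* a) at least one bit of x is 1 */
int any_bit_one(unsigned x)       { return !!x; }

/* b) at least one bit of x is 0: invert, then look for a 1 */
int any_bit_zero(unsigned x)      { return !!~x; }

/* c) at least one bit in the least significant byte is 1 */
int low_byte_any_one(unsigned x)  { return !!(x & 0xFF); }

/* d) at least one bit in the least significant byte is 0 */
int low_byte_any_zero(unsigned x) { return !!(~x & 0xFF); }
```

For x = 0x100 the results are 1, 1, 0, 1: the value has a 1 bit, it has 0 bits, its low byte is all zeros, so c) is false and d) is true.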

Is there any alternative to using % (modulus) in C/C++?

I read somewhere once that the modulus operator is inefficient on small embedded devices like 8 bit micro-controllers that do not have an integer division instruction. Perhaps someone can confirm this, but I thought the difference is 5-10 times slower than with an integer division operation.
Is there another way to do this other than keeping a counter variable and manually overflowing to 0 at the mod point?
const int FIZZ = 6;
for(int x = 0; x < MAXCOUNT; x++)
{
    if(!(x % FIZZ)) print("Fizz\n"); // slow on some systems
}
vs:
The way I am currently doing it:
const int FIZZ = 6;
int fizzcount = 1;
for(int x = 1; x < MAXCOUNT; x++)
{
    if(fizzcount >= FIZZ)
    {
        print("Fizz\n");
        fizzcount = 0;
    }
}
Ah, the joys of bitwise arithmetic. A side effect of many division routines is the modulus - so in few cases should division actually be faster than modulus. I'm interested to see the source you got this information from. Processors with multipliers have interesting division routines using the multiplier, but you can get from division result to modulus with just another two steps (multiply and subtract) so it's still comparable. If the processor has a built in division routine you'll likely see it also provides the remainder.
Still, there is a small branch of number theory devoted to modular arithmetic, which requires study if you really want to understand how to optimize a modulus operation. Modular arithmetic, for instance, is very handy for generating magic squares.
So, in that vein, here's a very low level look at the math of modulus for an example of x, which should show you how simple it can be compared to division:
Maybe a better way to think about the problem is in terms of number
bases and modulo arithmetic. For example, your goal is to compute DOW
mod 7 where DOW is the 16-bit representation of the day of the
week. You can write this as:
DOW = DOW_HI*256 + DOW_LO
DOW%7 = (DOW_HI*256 + DOW_LO) % 7
= ((DOW_HI*256)%7 + (DOW_LO % 7)) %7
= ((DOW_HI%7 * 256%7) + (DOW_LO%7)) %7
= ((DOW_HI%7 * 4) + (DOW_LO%7)) %7
Expressed in this manner, you can separately compute the modulo 7
result for the high and low bytes. Multiply the result for the high by
4 and add it to the low and then finally compute result modulo 7.
Computing the mod 7 result of an 8-bit number can be performed in a
similar fashion. You can write an 8-bit number in octal like so:
X = a*64 + b*8 + c
Where a, b, and c are 3-bit numbers.
X%7 = ((a%7)*(64%7) + (b%7)*(8%7) + c%7) % 7
= (a%7 + b%7 + c%7) % 7
= (a + b + c) % 7
since 64%7 = 8%7 = 1
Of course, a, b, and c are
c = X & 7
b = (X>>3) & 7
a = (X>>6) & 7 // (actually, a is only 2-bits).
The largest possible value for a+b+c is 7+7+3 = 17. So, you'll need
one more octal step. The complete (untested) C version could be
written like:
unsigned char Mod7Byte(unsigned char X)
{
    X = (X&7) + ((X>>3)&7) + (X>>6); // sum of octal digits, 0..17
    X = (X&7) + (X>>3);              // second octal step, 0..8
    return X>=7 ? X-7 : X;           // 7 -> 0 and 8 -> 1 (e.g. for X = 127)
}
I spent a few moments writing a PIC version. The actual implementation
is slightly different than described above
Mod7Byte:
movwf temp1 ;
andlw 7 ;W=c
movwf temp2 ;temp2=c
rlncf temp1,F ;
swapf temp1,W ;W= a*8+b
andlw 0x1F
addwf temp2,W ;W= a*8+b+c
movwf temp2 ;temp2 is now a 6-bit number
andlw 0x38 ;get the high 3 bits == a'
xorwf temp2,F ;temp2 now has the 3 low bits == b'
rlncf WREG,F ;shift the high bits right 4
swapf WREG,F ;
addwf temp2,W ;W = a' + b'
; at this point, W is between 0 and 10
addlw -7
bc Mod7Byte_L2
Mod7Byte_L1:
addlw 7
Mod7Byte_L2:
return
Here's a little routine to test the algorithm
clrf x
clrf count
TestLoop:
movf x,W
RCALL Mod7Byte
cpfseq count
bra fail
incf count,W
xorlw 7
skpz
xorlw 7
movwf count
incfsz x,F
bra TestLoop
passed:
Finally, for the 16-bit result (which I have not tested), you could
write:
uint16 Mod7Word(uint16 X)
{
    return Mod7Byte(Mod7Byte(X & 0xff) + Mod7Byte(X>>8)*4);
}
Scott
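The octal-digit approach described in this answer can be written as portable C and checked against the % operator. This is a sketch using fixed-width types, with illustrative function names. Note that the second folding step can produce 8 (for example when the input is 127), so the final step subtracts 7 for any value of 7 or more.

```c
#include <stdint.h>

/* x mod 7 for an 8-bit value, using only masks, shifts and additions. */
uint8_t mod7_byte(uint8_t x)
{
    /* 8 % 7 == 1 and 64 % 7 == 1, so x is congruent to the sum of its
     * octal digits. The first sum is at most 7 + 7 + 3 = 17. */
    unsigned s = (x & 7u) + ((x >> 3) & 7u) + (unsigned)(x >> 6);
    s = (s & 7u) + (s >> 3);          /* second fold: now 0..8 */
    return (uint8_t)(s >= 7 ? s - 7 : s);
}

/* x mod 7 for a 16-bit value: 256 % 7 == 4, so the high byte's residue
 * is weighted by 4. The inner sum is at most 6 + 6 * 4 = 30. */
uint16_t mod7_word(uint16_t x)
{
    unsigned lo = mod7_byte((uint8_t)(x & 0xFF));
    unsigned hi = mod7_byte((uint8_t)(x >> 8));
    return mod7_byte((uint8_t)(lo + hi * 4));
}
```

Looping over all 256 byte values (or all 65536 word values) and comparing with x % 7 is a cheap way to confirm the result on a given target.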
If you are calculating a number mod some power of two, you can use the bit-wise and operator. Just subtract one from the second number. For example:
x % 8 == x & 7
x % 256 == x & 255
A few caveats:
This only works if the second number is a power of two.
It's only equivalent if the modulus is always positive. Older C and C++ standards don't specify the sign of the modulus when the first number is negative (C99 and C++11 do guarantee that it takes the sign of the first number, which is what most compilers were already doing). A bit-wise and gets rid of the sign bit, so it will always be positive (i.e. it's a true modulus, not a remainder). It sounds like that's what you want anyway though.
Your compiler probably already does this when it can, so in most cases it's not worth doing it manually.
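The power-of-two case can be sketched as a small helper. mod_pow2 is a hypothetical name, and the function assumes m is a nonzero power of two and does not check it.

```c
/* Hypothetical helper: x mod m using a mask instead of a division.
 * Only valid when m is a nonzero power of two. */
unsigned mod_pow2(unsigned x, unsigned m)
{
    return x & (m - 1u);
}
```

For unsigned operands this matches %, for example mod_pow2(13, 8) is 5. For a negative signed value the two differ: under C99 and later -13 % 8 is -5, while -13 & 7 is 3 on two's-complement machines.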
There is an overhead most of the time in using a modulus that is not a power of 2.
This is regardless of the processor as (AFAIK) even processors with modulus operators are a few cycles slower for divide as opposed to mask operations.
For most cases this is not an optimisation that is worth considering, and certainly not worth calculating your own shortcut operation (especially if it still involves divide or multiply).
However, one rule of thumb is to select array sizes etc. to be powers of 2.
so if calculating day of week, may as well use %7 regardless
if setting up a circular buffer of around 100 entries... why not make it 128. You can then write % 128 and most (all) compilers will make this & 0x7F
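The circular buffer suggestion could look something like the following sketch. RING_SIZE and ring_next are names chosen for this example, and the mask form is spelled out by hand here even though a compiler would typically derive it from % 128 on an unsigned index.

```c
#include <stddef.h>

#define RING_SIZE 128u  /* a power of two, so RING_SIZE - 1 is a valid mask */

/* Advances a buffer index by one slot, wrapping from 127 back to 0. */
size_t ring_next(size_t i)
{
    return (i + 1) & (RING_SIZE - 1);
}
```

With a size of 100 the same function would need a real modulus or a compare-and-reset.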
Unless you really need high performance on multiple embedded platforms, don't change how you code for performance reasons until you profile!
Code that's written awkwardly to optimize for performance is hard to debug and hard to maintain. Write a test case, and profile it on your target. Once you know the actual cost of modulus, then decide if the alternate solution is worth coding.
@Matthew is right. Try this:
#include <stdio.h>

int main(void) {
    int i;
    for (i = 0; i <= 1024; i++) {
        if (!(i & 0xFF)) printf("& i = %d\n", i);
        if (!(i % 0x100)) printf("mod i = %d\n", i);
    }
    return 0;
}
x%y == (x-(x/y)*y)
Hope this helps.
Do you have access to any programmable hardware on the embedded device? Like counters and such? If so, you might be able to write a hardware based mod unit, instead of using the simulated %. (I did that once in VHDL. Not sure if I still have the code though.)
Mind you, you did say that division was 5-10 times faster. Have you considered doing a division, multiplication, and subtraction to simulate the mod? (Edit: Misunderstood the original post. I did think it was odd that division was faster than mod, they are the same operation.)
In your specific case, though, you are checking for a mod of 6. 6 = 2*3. So you could MAYBE get some small gains if you first checked if the least significant bit was a 0. Something like:
if((!(x & 1)) && !(x % 3)) // even and divisible by 3, i.e. divisible by 6
{
    print("Fizz\n");
}
If you do that, though, I'd recommend confirming that you get any gains, yay for profilers. And doing some commenting. I'd feel bad for the next guy who has to look at the code otherwise.
You should really check the embedded device you need. All the assembly language I have seen (x86, 68000) implement the modulus using a division.
Actually, the division assembly operation returns the result of the division and the remainder in two different registers.
In the embedded world, the "modulus" operations you need to do are often the ones that break down nicely into bit operations that you can do with &, | and sometimes >>.
@Jeff V: I see a problem with it! (Beyond that your original code was looking for a mod 6 and now you are essentially looking for a mod 8). You keep doing an extra +1! Hopefully your compiler optimizes that away, but why not just start the test at 2 and go to MAXCOUNT inclusive? Finally, you are returning true every time that (x+1) is NOT divisible by 8. Is that what you want? (I assume it is, but just want to confirm.)
For modulo 6 you can change the Python code to C/C++:
def mod6(number):
    while number > 7:
        number = (number >> 3 << 1) + (number & 0x7)
    if number > 5:
        number -= 6
    return number
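A possible C translation of the Python function is sketched below, assuming a 32-bit unsigned input. It relies on 8 being congruent to 2 modulo 6, so a number 8q + r can be replaced by 2q + r without changing its residue.

```c
#include <stdint.h>

/* number mod 6 without a division: fold 8q + r into 2q + r until the
 * value is at most 7, then subtract 6 once if needed. */
uint32_t mod6(uint32_t number)
{
    while (number > 7)
        number = ((number >> 3) << 1) + (number & 0x7u);
    if (number > 5)
        number -= 6;
    return number;
}
```

Each pass shrinks the value by roughly a factor of four, so a 32-bit input needs only a handful of iterations. Whether that beats the platform's division routine is something to measure on the target.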
Not that this is necessarily better, but you could have an inner loop which always goes up to FIZZ, and an outer loop which repeats it all some certain number of times. You've then perhaps got to special case the final few steps if MAXCOUNT is not evenly divisible by FIZZ.
That said, I'd suggest doing some research and performance profiling on your intended platforms to get a clear idea of the performance constraints you're under. There may be much more productive places to spend your optimisation effort.
The print statement will take orders of magnitude longer than even the slowest implementation of the modulus operator. So basically the comment "slow on some systems" should be "slow on all systems".
Also, the two code snippets provided don't do the same thing. In the second one, the line
if(fizzcount >= FIZZ)
is always false so "FIZZ\n" is never printed.
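For comparison, here is a sketch of what the counter-based loop might look like with the counter actually incremented. The function name and the returned count are additions for this example, and printf stands in for the question's print. Unlike the modulus loop, which starts at x = 0 and prints for it, this version starts at 1.

```c
#include <stdio.h>

#define FIZZ 6

/* Prints "Fizz" for every multiple of FIZZ in [1, maxcount) without
 * using %, and returns how many times it printed. */
int fizz_with_counter(int maxcount)
{
    int fizzcount = 0;
    int printed = 0;
    for (int x = 1; x < maxcount; x++) {
        if (++fizzcount >= FIZZ) {   /* x is a multiple of FIZZ here */
            printf("Fizz\n");
            fizzcount = 0;
            printed++;
        }
    }
    return printed;
}
```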