I need to XOR all the bits of a variable with each other in C++.
Let's consider 4-bit values a and x where their bit-representation is a = a3a2a1a0 and x = x3x2x1x0.
We define the masking operation "." as a.x = a3x3 (xor) a2x2 (xor) a1x1 (xor) a0x0.
I computed a&x and got a3x3, a2x2, a1x1 and a0x0. Now I need to XOR them together, but how? Is there a special operator for that, like '&'? I searched but didn't find anything. Any help will be appreciated!
Based on your description, the final result is either 0 or 1. Since you have already done the AND, what remains is to count the 1 bits in the binary representation of the result, a&x.
Shift the bits out one by one and count the 1s: if the count is odd the final result is 1, and if it is even the final result is 0.
You'll need to shift "a and x" to do the xor of all bits.
Something like:
uint32_t a = 0xa;
uint32_t x = 0xb;
uint32_t tmp = a & x; // Bitwise AND of a and x
uint32_t res = 0;
for (int i = 0; i < 32; ++i)
{
res = res ^ (0x1 & tmp); // Only include LSB of tmp in the XOR
tmp = tmp >> 1; // Shift tmp to get a new LSB
}
cout << "Result: " << res << endl;
An alternative solution could be:
uint32_t a = 0xa;
uint32_t x = 0xb;
uint32_t tmp = a & x; // Bitwise AND of a and x
uint32_t res = 0;
while (tmp > 0)
{
if ((tmp % 2) == 1) res = (res + 1) & 0x1; // XOR operation
tmp = tmp/2; // Shift operation
}
cout << "Result: " << res << endl;
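For what it's worth, the same parity can also be computed without a loop by folding the word onto itself with XOR. This is only a sketch: the names parity32 and masked_dot are made up here (they are not from the posts above), and it assumes 32-bit operands like the snippets do.

```cpp
#include <cstdint>

// XOR of all 32 bits of v, computed by repeatedly folding the upper half
// onto the lower half. Returns 0 or 1.
inline uint32_t parity32(uint32_t v) {
    v ^= v >> 16;
    v ^= v >> 8;
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return v & 1u;
}

// The "a.x" operation from the question: XOR of the bits of (a & x).
// Hypothetical helper name.
inline uint32_t masked_dot(uint32_t a, uint32_t x) {
    return parity32(a & x);
}
```

With a = 0xa and x = 0xb as above, a & x is 0xa, which has two 1 bits, so masked_dot returns 0.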
I am in the quite fortunate position to say that my code for a simple SHA1 Hash generator seems to work well. Unfortunately I know that this Arduino Program runs with Little Endianness and the description on the method to generate a hash requires the original message length to be appended as Big Endian integer.
This means for the message char m[] = "Applecake" I would have 9*8 bits, expressed as a 64-bit unsigned integer that is 0x0000 0000 0000 0048. That means, stored with Little Endian, the memory would look like this: 0x0048 0000 0000 0000.
As described in Section 4 of RFC 3174 Step c) I have to
Obtain the 2-word representation of l, the number of bits in the original message. If l < 2^32 then the first word is all zeroes. Append these two words to the padded message.
So with my memory as described above, I would have to convert it to Big Endian first and then append the lower 32 bits to the padded message.
The problem is, that if I do convert the Endianness of the length, which I know is Little Endian, I get the wrong padding and therefore the wrong hash.
Why is my code working without conversion of the Endianness?
Which limitations does my code have concerning the compatibility across different Arduinos, microcontrollers and compilers?
// initialize variables
h0 = 0x67452301;
h1 = 0xEFCDAB89;
h2 = 0x98BADCFE;
h3 = 0x10325476;
h4 = 0xC3D2E1F0;
// calculate the number of required cycles and create a blocks array
uint32_t numCycles = ((ml+65)/512)+1;
uint32_t blocks[numCycles*16] = {};
// copy message
uint32_t messageBytes = ml/8 + (ml%8!=0 ? 1 : 0);
for (uint32_t i = 0; i < messageBytes; i++) {
blocks[i/4] |= ((uint32_t) message[i]) << (8*(3-(i%4)));
}
// append the 1 bit
blocks[ml/32] |= ((uint32_t) 0b1) << (31-(ml%32));
// append the 64-bit big endian ml at the end
if (ml < 0x80000000)
blocks[(numCycles*16)-1] = (uint32_t) ml;
else {
blocks[(numCycles*16)-2] = (uint32_t) ml;
blocks[(numCycles*16)-1] = (uint32_t) (ml >> 32);
}
for (uint32_t iCycle = 0; iCycle < numCycles; iCycle++) {
// initalize locals
uint32_t w[80] = {};
uint32_t a = h0, b = h1, c = h2, d = h3, e = h4;
for (uint8_t i = 0; i < 80; i++) {
// convert words to big-endian and copy to 80-elem array
if (i < 16)
w[i] = blocks[(iCycle*16)+i];
else
w[i] = rotL((w[i-3]^w[i-8]^w[i-14]^w[i-16]), 1);
// run defined formulas
uint32_t f, k, temp;
if (i < 20) {
f = (b & c) | ((~b) & d);
k = 0x5A827999;
}
else if (i < 40) {
f = b ^ c ^ d;
k = 0x6ED9EBA1;
}
else if (i < 60) {
f = (b & c) | (b & d) | (c & d);
k = 0x8F1BBCDC;
}
else {
f = b ^ c ^ d;
k = 0xCA62C1D6;
}
temp = rotL(a, 5) + f + e + k + w[i];
e = d; d = c; c = rotL(b, 30); b = a; a = temp;
}
// write back the results
h0 += a; h1 += b; h2 += c; h3 += d; h4 += e;
}
// append the 64-bit big endian ml at the end
if (ml < 0x80000000)
blocks[(numCycles*16)-1] = (uint32_t) ml;
else {
blocks[(numCycles*16)-2] = (uint32_t) ml;
blocks[(numCycles*16)-1] = (uint32_t) (ml >> 32);
}
This puts the most-significant 32-bit value first and the least-significant 32-bit value second. That's half the reason your code works.
The other half is that while the 32-bit values are stored in little-endian form, you are reading their values on a little-endian platform. That will always give you the correct value. You never try to access the individual bytes of the 32-bit values, so which byte goes where makes no difference.
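To illustrate the point about values versus bytes, here is a small sketch. The helper names store_be64 and split_length_words are hypothetical and not part of the code in the question. Both work purely on values with shifts, so they should behave the same on little-endian and big-endian hosts; byte order only starts to matter once the memory is reinterpreted byte by byte.

```cpp
#include <cstdint>

// Writes a 64-bit value into 8 bytes, most significant byte first.
// Only shifts on the value are used, so host endianness does not matter.
inline void store_be64(uint64_t value, uint8_t out[8]) {
    for (int i = 0; i < 8; ++i) {
        out[i] = static_cast<uint8_t>(value >> (8 * (7 - i)));
    }
}

// Splits a 64-bit message length into the two 32-bit words that get
// appended to the padded message: high word first, then low word.
inline void split_length_words(uint64_t ml, uint32_t& high, uint32_t& low) {
    high = static_cast<uint32_t>(ml >> 32);
    low = static_cast<uint32_t>(ml);
}
```

For the "Applecake" example, ml is 72, so the high word is 0 and the low word is 0x48, which matches what the word-based code stores in the last block entry.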
My input is:
a bit mask mask of width n and some offset k >=0
a bit pattern pattern with 1s in some (but not necessarily all) the positions where the bit mask has 1s.
an integer val
I want to find the next largest integer result such that:
result > val
result & mask == pattern
For example, suppose mask = 0xFF00 and pattern = 0x0100. Then we expect the following result:
NextLargest(mask, pattern, 0x00000) => 0x00100
NextLargest(mask, pattern, 0x000FF) => 0x00100
NextLargest(mask, pattern, 0x010FE) => 0x001FF
NextLargest(mask, pattern, 0x010FF) => 0x10100
Another example -- say mask = 0xF and pattern = 0xF. Then we expect:
NextLargest(mask, pattern, 0x20) => 0x2F.
I've tried something like "strip out the bits that mask cares about, increment it, OR back in pattern and return" but I keep hitting edge cases. The problem is something like a generalization of finding the next largest multiple of some integer.
Here's my attempt so far (runnable link: https://ideone.com/AhXG5M):
#include <iostream>
using namespace std;
using uint32 = unsigned long;
uint32 NextLargest(int width, int offset, uint32 mask, uint32 pattern, uint32 val) {
unsigned long long ret = (val + 1) & ~mask;
if ((ret & ((1 << (offset + 1)) - 1)) == 0) {
// "carry" across the mask
ret += 1 << (offset + width);
}
return ret | pattern;
}
int main() {
// your code goes here
int width = 12;
int offset = 4;
uint32 significant_bits = (1 << (width + 1) - 1) << offset;
uint32 wanted_bits = 0xFFF << offset;
cout << hex;
// want 0xFFF1 -- correct
cout << NextLargest(width, offset, significant_bits, wanted_bits, 0) << endl;
// want 0xFFF2 -- correct
cout << NextLargest(width, offset, significant_bits, wanted_bits, 1) << endl;
// want 0x1FFFF0 -- incorrect, get 0xFFF0
cout << NextLargest(width, offset, significant_bits, wanted_bits, 0xF) << endl;
return 0;
}
I didn't test this, but the following algorithm should work (pseudocode):
let mask, pattern, and val be inputs
let fls be function that finds last bit set in word
let ffs be function that finds first bit set in a word
let applied be (val & ~mask) | pattern
if applied is greater than val then
return applied
let low_order_mask be (1 << ffs(mask)) - 1
if applied == val then
let flipped_low be (~val & low_order_mask)
if flipped_low then
return applied + 1 // a low bit is still clear, so no need to carry
// need to carry
let set_low_zero be applied & ~low_order_mask
let carry be 1 << (fls(mask) + 1)
return set_low_zero + carry
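ffs is provided by POSIX and fls by the BSDs, but other systems might not provide them. Note that the POSIX ffs numbers bits starting from 1, whereas the shifts above assume bit positions starting from 0. There are answers on SO for how to implement these if you need to.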
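If neither function is available, a plain loop is enough for experimenting. The sketch below uses zero-based bit positions, which is what the shifts in the pseudocode appear to assume. The names lowest_set_bit and highest_set_bit are made up for this example and are not the POSIX or BSD functions.

```cpp
#include <cstdint>

// Zero-based index of the lowest set bit, or -1 if v is 0.
inline int lowest_set_bit(uint32_t v) {
    if (v == 0) return -1;
    int index = 0;
    while ((v & 1u) == 0) {
        v >>= 1;
        ++index;
    }
    return index;
}

// Zero-based index of the highest set bit, or -1 if v is 0.
inline int highest_set_bit(uint32_t v) {
    int index = -1;
    while (v != 0) {
        v >>= 1;
        ++index;
    }
    return index;
}
```

For mask = 0xFF00 these give 8 and 15, so low_order_mask becomes 0xFF and carry becomes 1 << 16.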
Think of the value broken into 3.
The bits above the mask, in the mask and below the mask.
H(value), M(value), L(value).
We know M(result)==pattern.
We have three candidates.
C1 is H(value)+pattern+0.
C2 is H(value)+pattern+L(value)+1
C3 is H(value)+pattern+X
X==(mask<<1)&~mask. That is the lowest bit above the mask.
If pattern>M(value) we can use C1.
Reducing the high-bits will get a number <value and setting any low bits will increase the number.
If pattern==M(value) then we can try C2 which is actually value+1.
That fails if adding one overflows to the pattern bits.
That means all the low bits are set and the next lowest place to add is the first bit above the mask.
unsigned next_masked(unsigned mask,unsigned pattern,unsigned value){
unsigned reduced_pattern=(mask&pattern);//May not be required...
unsigned over_add=(mask<<1)&~mask;
unsigned upper_mask=~(over_add-1);
unsigned cand=(value&upper_mask)|reduced_pattern;
if(cand>value){
return cand;
}
if((value&mask)==reduced_pattern){
unsigned scand=value+1;
if((scand&mask)==reduced_pattern){
return scand;
}
}
return cand + over_add;
}
Here it is again with some unit tests:
#include <iostream>
unsigned next_masked(unsigned mask,unsigned pattern,unsigned value){
unsigned reduced_pattern=(mask&pattern);//May not be required...
unsigned over_add=(mask<<1)&~mask;
unsigned upper_mask=~(over_add-1);
unsigned cand=(value&upper_mask)|reduced_pattern;
if(cand>value){
return cand;
}
if((value&mask)==reduced_pattern){
unsigned scand=value+1;
if((scand&mask)==reduced_pattern){
return scand;
}
}
return cand + over_add;
}
bool invariant_next_masked(unsigned mask,unsigned pattern,unsigned value,unsigned result){
if((result&mask)!=(pattern&mask)){
return false;
}
if(result<=value){
return false;
}
for(unsigned test=result-1;test>value;--test){
if((test&mask)==(pattern&mask)){
return false;
}
}
return true;
}
int check_next_masked(unsigned mask,unsigned pattern,unsigned value,unsigned expect){
unsigned result=next_masked(mask,pattern,value);
if(result!=expect){
std::cout << std::hex << mask << ' ' << std::hex << pattern << ' ' << std::hex <<value << "==" << std::hex <<result << "!=" << std::hex <<expect <<'\n';
return 1;
}
if(!invariant_next_masked(mask,pattern,value,result)){
return 1;
}
return 0;
}
int main() {
int errors=0;
errors+=check_next_masked(0xFF00,0x0100,0x0000,0x00100);
errors+=check_next_masked(0xFF00,0x0100,0x00FF,0x00100);
errors+=check_next_masked(0xFF00,0x0100,0x10FE,0x10100);
errors+=check_next_masked(0xFF00,0x0100,0x1067,0x10100);
errors+=check_next_masked(0xFF00,0x0100,0x10123,0x10124);
errors+=check_next_masked(0xFF00,0x0100,0x110FF,0x20100);
errors+=check_next_masked(0xFF00,0x0100,0x102FF,0x20100);
errors+=check_next_masked(0xFF00,0x0100,0x101FF,0x20100);
errors+=check_next_masked(0x000F,0x0007,0x10123,0x10127);
errors+=check_next_masked(0x000F,0x0007,0x10128,0x10137);
errors+=check_next_masked(0x0FF0,0x0230,0x10128,0x10230);
errors+=check_next_masked(0x0FFF0,0x01230,0x01231,0x01232);
errors+=check_next_masked(0x0FFF0,0x01230,0x41237,0x41238);
errors+=check_next_masked(0x0FFF0,0x01230,0x4123F,0x51230);
if(errors>0){
std::cout << "Errors "<< errors << '\n';
return 1;
}
std::cout << "Success\n";
return 0;
}
Is the problem to calculate the LARGEST or the SMALLEST next value? The largest value seems odd to me. If the requirement is to calculate the smallest value, I think this code should work (tested on gcc 7.1, assuming a 64-bit target, sizeof(void *) == sizeof(size_t) == sizeof(uint64_t)):
size_t next_smallest_value (size_t mask, size_t pattern, size_t x) {
assert((pattern & mask) == pattern); // parentheses needed: == binds tighter than &
// only change bits within mask range to meet the requirement
auto y = x & ~mask | pattern;
if (y > x) {
// if the operation increased the value
// mask off all the lower bits
auto lsb_mask = __builtin_ctzll(mask);
return y & ~ones(lsb_mask);
} else {
// otherwise, the operation decreased or didn't change the value
// need to increase the fraction higher than the mask
auto msb_mask = 64 - __builtin_clzll(mask); // index of the first bit above the mask
// higher part cannot be empty if the masked part decrease
assert(msb_mask < 64);
auto higher = ((y >> msb_mask) + 1) << msb_mask;
// also higher part cannot overflow
assert(higher != 0);
return y & mask | higher;
}
}
The idea is very simple: divide the bits into 3 parts: the higher part, the masked part, and the lower part. The masked part is determined entirely by the mask and pattern; it cannot take any other value.
After calculating the masked bits, if the value increased, just mask off all bits in the lower part. Otherwise, increase the higher part by 1 (and also mask off all the lower bits).
The above code doesn't handle ill-formed input; it will trigger the assertions, but the checks are not exhaustive.
Here are two functions for this slightly confusing question.
The first function gives the largest integer that fulfills the requirements for the result.
The second one gives the SMALLEST next value.
1: Get the LARGEST integer which satisfies result & mask == pattern and result > val:
unsigned NextLargest (unsigned mask, unsigned pattern, unsigned val) {
// zero "mask" bits and set "pattern" bits in largest (unsigned) int
unsigned const x = ~mask | pattern;
// if result is not greater than val, we can't satisfy requirements
if (x <= val) {
// ... report error, return error code or throw something
}
return x;
}
Obviously this just returns the highest (unsigned) integer value that meets the requirements result & mask == pattern and result > val. The if-clause checks whether the result would not be greater than val, in which case the function fails.
2: Get the SMALLEST next value after val that meets the requirements:
unsigned NextSmallest (unsigned mask, unsigned pattern, unsigned val) {
unsigned const x = (val + mask + 1) & ~mask | pattern;
if (x <= val) {
// ... increment wrapped, can't give greater value
}
return x;
}
edit: Changed (val|mask) to val+mask because the result must be still greater than val.
This function calculates val + 1 while carrying any overflowing bits across the masked bits.
Here are a few examples of what the function does if mask = 0x0ff00 and pattern = 0x00500:
val       +mask     +1        &~mask    |pattern == result
0x00000   0x0ff00   0x0ff01   0x00001   0x00501
0x00001   0x0ff01   0x0ff02   0x00002   0x00502
0x000fe   0x0fffe   0x0ffff   0x000ff   0x005ff
0x000ff   0x0ffff   0x10000   0x10000   0x10500
0x00100   0x10000   0x10001   0x10001   0x10501
0x0f000   0x1ef00   0x1ef01   0x10001   0x10501
0x0ff00   0x1fe00   0x1fe01   0x10001   0x10501
0x0ffff   0x1feff   0x1ff00   0x10000   0x10500
0x10000   0x1ff00   0x1ff01   0x10001   0x10501
0x10001   0x1ff01   0x1ff02   0x10002   0x10502
0x100ff   0x1ffff   0x20000   0x20000   0x20500
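The rows of the table can be reproduced with a few lines. This is only a sketch of the expression above with the parentheses written out. The function name increment_across_mask is made up here, and 32-bit unsigned arithmetic is assumed.

```cpp
#include <cstdint>

// (val + mask + 1) & ~mask | pattern, with explicit parentheses.
// Adding mask first makes the +1 carry ripple across the masked bits.
inline uint32_t increment_across_mask(uint32_t mask, uint32_t pattern, uint32_t val) {
    return ((val + mask + 1u) & ~mask) | pattern;
}
```

Note that, as the row for val = 0x00100 shows, the result (0x10501) is not always the smallest candidate, since 0x00500 would already satisfy both conditions there.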
After long editing and rewriting I still can't give a good enough answer to the question. Its examples have weird results. I'll still leave this here in case someone finds these functions or parts of them useful. Also, I have not actually tested the functions on a computer.
Is there a clever (i.e. branchless) way to "compact" a hex number, basically moving all the 0s to one side?
eg:
0x10302040 -> 0x13240000
or
0x10302040 -> 0x00001324
I looked on Bit Twiddling Hacks but didn't see anything.
It's for a SSE numerical pivoting algorithm. I need to remove any pivots that become 0. I can use _mm_cmpgt_ps to find good pivots, _mm_movemask_ps to convert that in to a mask, and then bit hacks to get something like the above. The hex value gets munged in to a mask for a _mm_shuffle_ps instruction to perform a permutation on the SSE 128 bit register.
To compute mask for _pext:
mask = arg;
mask |= (mask << 1) & 0xAAAAAAAA | (mask >> 1) & 0x55555555;
mask |= (mask << 2) & 0xCCCCCCCC | (mask >> 2) & 0x33333333;
First do bit-or on pairs of bits, then on quads. Masks prevent shifted values from overflowing to other digits.
After computing mask this way or harold's way (which is probably faster) you don't need the full power of _pext, so if targeted hardware doesn't support it you can replace it with this:
for(int i = 0; i < 7; i++) {
stay_mask = mask & (~mask - 1);
arg = arg & stay_mask | (arg >> 4) & ~stay_mask;
mask = stay_mask | (mask >> 4);
}
Each iteration moves all nibbles one digit to the right if there is some space. stay_mask marks bits that are in their final positions. This uses somewhat fewer operations than the Hacker's Delight solution, but might still benefit from branching.
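Putting the two fragments together gives something like the following sketch. The function names nibble_mask and compact_nibbles_right are invented for this example, the parentheses around the & terms are added for readability only, and a 32-bit input is assumed.

```cpp
#include <cstdint>

// Builds a mask with F in every nibble of arg that is nonzero:
// OR within pairs of bits first, then within quads.
inline uint32_t nibble_mask(uint32_t arg) {
    uint32_t mask = arg;
    mask |= ((mask << 1) & 0xAAAAAAAAu) | ((mask >> 1) & 0x55555555u);
    mask |= ((mask << 2) & 0xCCCCCCCCu) | ((mask >> 2) & 0x33333333u);
    return mask;
}

// Moves all nonzero nibbles to the right, keeping their order.
inline uint32_t compact_nibbles_right(uint32_t arg) {
    uint32_t mask = nibble_mask(arg);
    for (int i = 0; i < 7; i++) {
        // Lowest contiguous run of set bits: nibbles already in place.
        uint32_t stay_mask = mask & (~mask - 1u);
        arg = (arg & stay_mask) | ((arg >> 4) & ~stay_mask);
        mask = stay_mask | (mask >> 4);
    }
    return arg;
}
```

For the example in the question, compact_nibbles_right(0x10302040) gives 0x00001324.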
Supposing we can use _pext_u32, the issue then is computing a mask that has an F for every nibble that isn't zero. I'm not sure what the best approach is, but you can compute the OR of the 4 bits of the nibble and then "spread" it back out to F's like this:
// calculate horizontal OR of every nibble
x |= x >> 1;
x |= x >> 2;
// clean up junk
x &= 0x11111111;
// spread
x *= 0xF;
Then use that as the mask of _pext_u32.
_pext_u32 can be emulated by this (taken from Hacker's Delight, figure 7.6)
unsigned compress(unsigned x, unsigned m) {
unsigned mk, mp, mv, t;
int i;
x = x & m; // Clear irrelevant bits.
mk = ~m << 1; // We will count 0's to right.
for (i = 0; i < 5; i++) {
mp = mk ^ (mk << 1); // Parallel prefix.
mp = mp ^ (mp << 2);
mp = mp ^ (mp << 4);
mp = mp ^ (mp << 8);
mp = mp ^ (mp << 16);
mv = mp & m; // Bits to move.
m = m ^ mv | (mv >> (1 << i)); // Compress m.
t = x & mv;
x = x ^ t | (t >> (1 << i)); // Compress x.
mk = mk & ~mp;
}
return x;
}
But that's a bit of a disaster. It's probably better to just resort to branching code then.
uint32_t fun(uint32_t val) {
uint32_t retVal(0x00);
uint32_t sa(28);
for (int sb(28); sb >= 0; sb -= 4) {
if (val & (0x0Fu << sb)) {
retVal |= (val & (0x0Fu << sb)) << (sa - sb); // move this nonzero nibble up to slot sa
sa -= 4;
}
}
return retVal;
}
I think this (or something similar) is what you're looking for: eliminating the 0 nibbles within a number. I've not debugged it, and it only works on one side at the moment.
If your processor supports conditional instruction execution, you may get a benefit from this algorithm:
uint32_t compact(uint32_t orig_value)
{
uint32_t mask = 0xF0000000u; // Mask for isolating a hex digit.
uint32_t new_value = 0u;
for (unsigned int i = 0; i < 8; ++i) // 8 hex digits
{
if ((orig_value & mask) == 0u) // parentheses needed: == binds tighter than &
{
orig_value = orig_value << 4; // Shift the original value by 1 digit
}
new_value |= orig_value & mask;
mask = mask >> 4; // next digit
}
return new_value;
}
This looks like a good candidate for loop unrolling.
The algorithm assumes that when the original value is shifted left, zeros are shifted in, filling in the "empty" bits.
Edit 1:
On a processor that supports conditional execution of instructions, the shifting of the original value would be conditionally executed depending on the result of the ANDing of the original value and the mask. Thus no branching, only ignored instructions.
I came up with the following solution. Please take a look, maybe it will help you.
#include <iostream>
#include <sstream>
#include <algorithm>
using namespace std;
class IsZero
{
public:
bool operator ()(char c)
{
return '0' == c;
}
};
int main()
{
int a = 0x01020334; // INPUT
ostringstream my_sstream;
my_sstream << hex << a;
string str = my_sstream.str();
int base_str_length = str.size();
cout << "Input hex: " << str << endl;
str.insert(remove_if(begin(str), end(str), IsZero()), count_if(begin(str), end(str), IsZero()), '0');
str.replace(begin(str) + base_str_length, end(str), "");
cout << "Processed hex: " << str << endl;
return 0;
}
Output:
Input hex: 1020334
Processed hex: 1233400
I have a bit-mask of N chars in size, which is statically known (i.e. can be calculated at compile time, but it's not a single constant, so I can't just write it down), with bits set to 1 denoting the "wanted" bits. And I have a value of the same size, which is only known at runtime. I want to collect the "wanted" bits from that value, in order, into the beginning of a new value. For simplicity's sake let's assume the number of wanted bits is <= 32.
Completely unoptimized reference code which hopefully has the correct behaviour:
template<int N, const char mask[N]>
unsigned gather_bits(const char* val)
{
unsigned result = 0;
char* result_p = (char*)&result;
int pos = 0;
for (int i = 0; i < N * CHAR_BIT; i++)
{
if (mask[i/CHAR_BIT] & (1 << (i % CHAR_BIT)))
{
if (val[i/CHAR_BIT] & (1 << (i % CHAR_BIT)))
{
if (pos < sizeof(unsigned) * CHAR_BIT)
{
result_p[pos/CHAR_BIT] |= 1 << (pos % CHAR_BIT);
}
else
{
abort();
}
}
pos += 1;
}
}
return result;
}
Although I'm not sure whether that formulation actually allows access to the contents of the mask at compile time. But in any case, it's available for use, maybe a constexpr function or something would be a better idea. I'm not looking here for the necessary C++ wizardry (I'll figure that out), just the algorithm.
An example of input/output, with 16-bit values and imaginary binary notation for clarity:
mask = 0b0011011100100110
val = 0b0101000101110011
--
wanted = 0b__01_001__1__01_ // retain only those bits which are set in the mask
result = 0b0000000001001101 // bring them to the front
^ gathered bits begin here
My questions are:
What's the most performant way to do this? (Are there any hardware instructions that can help?)
What if both the mask and the value are restricted to be unsigned, so a single word, instead of an unbounded char array? Can it then be done with a fixed, short sequence of instructions?
There will be pext (parallel bit extract) in Intel Haswell, which does exactly what you want. I don't know what the performance of that instruction will be, though probably better than the alternatives. This operation is also known as "compress-right" or simply "compress"; the implementation from Hacker's Delight is this:
unsigned compress(unsigned x, unsigned m) {
unsigned mk, mp, mv, t;
int i;
x = x & m; // Clear irrelevant bits.
mk = ~m << 1; // We will count 0's to right.
for (i = 0; i < 5; i++) {
mp = mk ^ (mk << 1); // Parallel prefix.
mp = mp ^ (mp << 2);
mp = mp ^ (mp << 4);
mp = mp ^ (mp << 8);
mp = mp ^ (mp << 16);
mv = mp & m; // Bits to move.
m = m ^ mv | (mv >> (1 << i)); // Compress m.
t = x & mv;
x = x ^ t | (t >> (1 << i)); // Compress x.
mk = mk & ~mp;
}
return x;
}
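For the single-word case, one possible way to wire this up is sketched below. The function name gather_bits_word is hypothetical. The sketch uses the _pext_u32 intrinsic only when the compiler defines __BMI2__ (GCC and Clang do so with -mbmi2; other compilers may need a different check), and otherwise falls back to a simple bit-by-bit loop rather than the Hacker's Delight routine.

```cpp
#include <cstdint>
#if defined(__BMI2__)
#include <immintrin.h>
#endif

// Collects the bits of val selected by mask into the low bits of the result,
// preserving their order.
inline uint32_t gather_bits_word(uint32_t val, uint32_t mask) {
#if defined(__BMI2__)
    return _pext_u32(val, mask);
#else
    uint32_t result = 0;
    uint32_t out_bit = 1;  // next output position
    while (mask != 0) {
        uint32_t lowest = mask & (0u - mask);  // lowest wanted bit
        if (val & lowest) result |= out_bit;
        out_bit <<= 1;
        mask &= mask - 1u;  // clear that bit and move on
    }
    return result;
#endif
}
```

With the 16-bit example from the question (mask = 0x3726, val = 0x5173) this returns 0x4D, i.e. 0b01001101.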
I am trying to determine the index of the rightmost set bit:
if (value & (1 << 0)) { return 0; }
if (value & (1 << 1)) { return 1; }
if (value & (1 << 2)) { return 2; }
...
if (value & (1 << 63)) { return 63; }
The if comparison may need to be done 64 times. Is there any faster way?
If you're using GCC, use the __builtin_ctz or __builtin_ffs function. (http://gcc.gnu.org/onlinedocs/gcc-4.4.0/gcc/Other-Builtins.html#index-g_t_005f_005fbuiltin_005fffs-2894)
If you're using MSVC, use the _BitScanForward function. See How to use MSVC intrinsics to get the equivalent of this GCC code?.
In POSIX there's also a ffs function. (http://linux.die.net/man/3/ffs)
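A thin wrapper over these could look like the sketch below. The wrapper name lowest_set_bit_index is made up, the 64-bit variants (__builtin_ctzll, _BitScanForward64) are used because the question deals with 64 bits, and the last branch is a plain loop for compilers that offer neither. The input is assumed to be nonzero, since __builtin_ctzll is undefined for 0.

```cpp
#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>
#endif

// Zero-based index of the lowest set bit. Precondition: value != 0.
inline int lowest_set_bit_index(uint64_t value) {
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_ctzll(value);
#elif defined(_MSC_VER) && defined(_M_X64)
    unsigned long index;
    _BitScanForward64(&index, value);
    return static_cast<int>(index);
#else
    int index = 0;
    while ((value & 1u) == 0) {
        value >>= 1;
        ++index;
    }
    return index;
#endif
}
```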
There's a little trick for this:
value & -value
This uses the two's complement integer representation of negative numbers.
Edit: This doesn't quite give the result asked for in the question: it isolates the lowest set bit rather than returning its index. The rest can be done with a small lookup table.
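One common way to do the lookup part is a de Bruijn multiplication, sketched below. This is a substitute technique and not necessarily what the answer above had in mind. The function names are invented, the constant 0x077CB531 and its table are the widely used 32-bit de Bruijn pair, and the 64-bit wrapper simply picks the first nonzero half. Inputs are assumed to be nonzero.

```cpp
#include <cstdint>

// Index of the lowest set bit of a nonzero 32-bit word.
// v & -v isolates the bit; multiplying by a de Bruijn constant puts a unique
// 5-bit pattern in the top bits, which indexes the table.
inline int lowest_bit_index32(uint32_t v) {
    static const int table[32] = {
        0,  1,  28, 2,  29, 14, 24, 3,  30, 22, 20, 15, 25, 17, 4,  8,
        31, 27, 13, 23, 21, 19, 16, 7,  26, 12, 18, 6,  11, 5,  10, 9};
    uint32_t isolated = v & (0u - v);
    return table[(isolated * 0x077CB531u) >> 27];
}

// 64-bit version: use the low half if it has a set bit, else the high half.
inline int lowest_bit_index64(uint64_t v) {
    uint32_t low = static_cast<uint32_t>(v);
    if (low != 0) return lowest_bit_index32(low);
    return 32 + lowest_bit_index32(static_cast<uint32_t>(v >> 32));
}
```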
You could use a loop:
unsigned int value;
unsigned int temp_value;
const unsigned int BITS_IN_INT = sizeof(int) * CHAR_BIT; // bytes times bits per byte
unsigned int index = 0;
// Make a copy of the value, to alter.
temp_value = value;
for (index = 0; index < BITS_IN_INT; ++index)
{
if (temp_value & 1)
{
break;
}
temp_value >>= 1;
}
return index;
This takes up less code space than the if statement proposal, with similar functionality.
KennyTM's suggestions are good if your compiler supports them. Otherwise, you can speed it up using a binary search, something like:
int result = 0;
if (!(value & 0xffffffff)) {
result += 32;
value >>= 32;
}
if (!(value & 0xffff)) {
result += 16;
value >>= 16;
}
and so on. This will do 6 comparisons (in general, log(N) comparisons, versus N for a linear search).
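Spelled out in full for a 64-bit value, the "and so on" might look like this sketch. The function name is made up, and the input is assumed to be nonzero (for 0 the function would return 63, which is not meaningful).

```cpp
#include <cstdint>

// Zero-based index of the lowest set bit, found by halving the search range
// six times. Precondition: value != 0.
inline int lowest_bit_binary_search(uint64_t value) {
    int result = 0;
    if (!(value & 0xffffffffULL)) { result += 32; value >>= 32; }
    if (!(value & 0xffff)) { result += 16; value >>= 16; }
    if (!(value & 0xff)) { result += 8; value >>= 8; }
    if (!(value & 0xf)) { result += 4; value >>= 4; }
    if (!(value & 0x3)) { result += 2; value >>= 2; }
    if (!(value & 0x1)) { result += 1; }
    return result;
}
```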
b = n & (-n); // finds the bit
b -= 1; // this gives 1's to the right: just the trailing 1's that need counting
b = (b & 0x5555555555555555) + ((b>>1) & 0x5555555555555555); // 2 bit sums of 1 bit numbers
b = (b & 0x3333333333333333) + ((b>>2) & 0x3333333333333333); // 4 bit sums of 2 bit numbers
b = (b & 0x0f0f0f0f0f0f0f0f) + ((b>>4) & 0x0f0f0f0f0f0f0f0f); // 8 bit sums of 4 bit numbers
b = (b & 0x00ff00ff00ff00ff) + ((b>>8) & 0x00ff00ff00ff00ff); // 16 bit sums of 8 bit numbers
b = (b & 0x0000ffff0000ffff) + ((b>>16) & 0x0000ffff0000ffff); // 32 bit sums of 16 bit numbers
b = (b & 0x00000000ffffffff) + ((b>>32) & 0x00000000ffffffff); // sum of 32 bit numbers
b &= 63; // otherwise I think an input of 0 would produce 64 for a result.
This is in C of course.
Here's another method that takes advantage of short-circuit evaluation of logical operators and conditional instruction execution or the instruction pipeline.
unsigned int value;
unsigned int temp_value = value;
bool bit_found = false;
unsigned int index = 0;
// Once bit_found is true, || short-circuits and index stops incrementing.
bit_found = bit_found || (temp_value & (1u << index++)); // bit 0
bit_found = bit_found || (temp_value & (1u << index++)); // bit 1
bit_found = bit_found || (temp_value & (1u << index++)); // bit 2
bit_found = bit_found || (temp_value & (1u << index++)); // bit 3
//...
bit_found = bit_found || (temp_value & (1u << index++)); // last bit
return index - 1; // The -1 may not be necessary depending on the starting bit number.
The advantage to this method is that there are no branches and the instruction pipeline is not disturbed. This is very fast on processors that perform conditional execution of instructions.
Works for Visual C++ 6
int toErrorCodeBit(__int64 value) {
const int low_double_word = value;
int result = 0;
__asm
{
bsf eax, low_double_word
jz low_double_value_0
mov result, eax
}
return result;
low_double_value_0:
const int upper_double_word = value >> 32;
__asm
{
bsf eax, upper_double_word
mov result, eax
}
result += 32;
return result;
}