How to check if exactly one bit is set in an int? - c++

I have an std::uint32_t and want to check if exactly one bit is set. How can I do this without iterating over all bits like this? In other words, can the following function be simplified?
static inline bool isExactlyOneBitSet(std::uint32_t bits)
{
return ((bits & 1) == bits
|| (bits & 1 << 1) == bits
|| (bits & 1 << 2) == bits
// ...
|| (bits & 1 << 31) == bits
);
}
Bonus: It would be nice if the return value was the one found bit or else 0.
static inline std::uint32_t isExactlyOneBitSet(std::uint32_t bits)
{
if (bits & 1u) {return 1u;}
else if (bits & 1u << 1) {return 1u << 1;}
//...
else if (bits & 1u << 31) {return 1u << 31;}
return 0;
}

So you want to know whether a number is a power of 2? There is a well-known trick for that:
bool check_bit(std::uint32_t bits)
{
return bits && !(bits & (bits-1));
}
Subtracting 1 from any power of 2 gives all 1s below that bit, e.g.,
4 - 1 = 3 (011)
8 - 1 = 7 (0111)
The bitwise AND of a power of 2 and the number one less than it is 0, because the two share no set bits. So n & (n-1) == 0 tests whether n is a power of 2.
The test also passes for n = 0, which has no bits set at all, so we need the extra bits && condition.
To find the 1-based position of the bit (0 if there is not exactly one bit set), you can do the following (log2 needs <cmath>):
int findSetBit(std::uint32_t bits)
{
if (!(bits && !(bits & (bits-1))))
return 0;
return log2(bits) + 1;
}
Extras
In GCC, you can use __builtin_popcount() to find the number of set bits in a number.
#include <iostream>
int main()
{
std::cout << __builtin_popcount (4) << "\n";
std::cout << __builtin_popcount (3) << "\n";
return 0;
}
Then check whether the count is equal to 1.
For counting, there is another well-known method, Brian Kernighan's algorithm. It clears the lowest set bit on each iteration, so it loops once per set bit, which is at most log2(n) + 1 times.
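For reference, here is a minimal sketch of Kernighan's bit-counting loop, reused to answer the "exactly one bit" question. The function names popcountKernighan and hasExactlyOneBit are made up for this illustration and do not come from any library.

```cpp
#include <cstdint>

// Counts set bits by repeatedly clearing the lowest set bit.
// The loop body runs once per set bit, not once per bit position.
inline int popcountKernighan(std::uint32_t bits)
{
    int count = 0;
    while (bits != 0) {
        bits &= bits - 1; // clears the lowest set bit
        ++count;
    }
    return count;
}

// Exactly one bit is set when the population count is 1.
// (hypothetical helper name, for illustration only)
inline bool hasExactlyOneBit(std::uint32_t bits)
{
    return popcountKernighan(bits) == 1;
}
```

For the single-bit test alone, bits && !(bits & (bits-1)) does the same job without a loop; the counting version is only useful when the count itself is needed.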

Here's a solution for your bonus question (and of course, it is a solution for your original question as well):
std::uint32_t exactlyOneBitSet(std::uint32_t bits) {
return bits&(((bool)(bits&(bits-1)))-1);
}
This compiles down to only 4 instructions on x86_64 with clang:
0000000000000000 <exactlyOneBitSet(unsigned int)>:
0: 8d 4f ff lea -0x1(%rdi),%ecx
3: 31 c0 xor %eax,%eax
5: 85 f9 test %edi,%ecx
7: 0f 44 c7 cmove %edi,%eax
a: c3 retq


How to create a number with (f)16 repeating n times?

I need to create a number where (f)16 repeats n times. 0 < n <= 16.
I tried the following for example for n = 16
std::cout << "hi:" << std::hex << std::showbase << (1ULL << 64) - 1 << std::endl;
warning: shift count >= width of type [-Wshift-count-overflow]
std::cout << "hi:" << std::hex << std::showbase << (1ULL << 64) - 1 << std::endl;
^ ~~ 1 warning generated.
hi:0x200
How can I get all digits f without overflowing ULL ?
For n = 1 to 16, you could start with all Fs and then shift accordingly:
0xFFFFFFFFFFFFFFFFULL >> (4*(16-n));
(handle n=0 separately)
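Wrapped into a function, the right-shift approach could look like the sketch below. The name repeatF is invented for this example, and the code assumes 0 <= n <= 16; larger n is not checked.

```cpp
#include <cstdint>

// Returns a value whose low n hex digits are all f, assuming 0 <= n <= 16.
// n == 0 is handled separately because it would require a shift by 64,
// which is undefined for a 64-bit operand.
inline std::uint64_t repeatF(unsigned n)
{
    if (n == 0) {
        return 0;
    }
    return 0xFFFFFFFFFFFFFFFFULL >> (4 * (16 - n));
}
```

With this, repeatF(16) shifts by 0 and returns all sixteen f digits, which is the case the original (1ULL << 64) - 1 attempt could not express.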
where (f)16 repeats n times.
If I understood that correctly, I believe that's trivial. Add one f. Shift the number to the left by 4 bits. Add another f. Shift to the left 4 bits. Add another f. Repeat n times.
#include <stdio.h>
unsigned long long gen(unsigned n) {
unsigned long long r = 0;
while (n--) {
r <<= 4;
r |= 0xf;
}
return r;
}
int main() {
for (int i = 0; i < 16; ++i) {
printf("%d -> %llx\n", i, gen(i));
}
}
outputs:
0 -> 0
1 -> f
2 -> ff
3 -> fff
4 -> ffff
5 -> fffff
6 -> ffffff
7 -> fffffff
8 -> ffffffff
9 -> fffffffff
10 -> ffffffffff
11 -> fffffffffff
12 -> ffffffffffff
13 -> fffffffffffff
14 -> ffffffffffffff
15 -> fffffffffffffff
Since shifting by 4*n bits is problematic if n is 16 and unsigned long long is 64 bits, you can solve the problem by shifting by a smaller amount. If n is known to be positive, we can partition it into two shifts:
(1ull << 4 << 4*(n-1)) - 1u
And, since 1ull << 4 is a constant, we can replace it:
(0x10ull << 4*(n-1)) - 1u
If n can be zero, then, to support any value from 0 to 16, a plain shift expression is not enough, because 4*(n-1) would be negative for n = 0. A solution is to add a conditional:
n ? (0x10ull << 4*(n-1)) - 1u : 0
If you're only interested in hex format and the digit f, use the other answers.
The function below can generate the number for both hex and decimal formats and for any digit.
#include <cstdint>
#include <iostream>
uint64_t getNum(uint64_t digit, uint64_t times, uint64_t base)
{
if (base != 10 && base != 16) return 0;
if (digit >= base) return 0;
uint64_t res = 0;
uint64_t multiply = 1;
for(uint64_t i = 0; i < times; ++i)
{
res += digit * multiply;
multiply *= base;
}
return res;
}
int main() {
std::cout << getNum(3, 7, 10) << std::endl;
std::cout << std::hex << getNum(0xa, 14, 16) << std::dec << std::endl;
return 0;
}
Output:
3333333
aaaaaaaaaaaaaa
Note: the current code has no overflow detection.
You can write a separate function, for example like this.
#include <stdio.h>
unsigned long long create_hex( size_t n )
{
unsigned long long x = 0;
n %= 2 * sizeof( unsigned long long );
while ( n-- )
{
x = x << 4 | 0xf;
}
return x;
}
int main( void )
{
for ( size_t i = 0; i <= 16; i++ )
{
printf( "%zu -> %llx\n", i, create_hex( i ) );
}
}
The program output is
0 -> 0
1 -> f
2 -> ff
3 -> fff
4 -> ffff
5 -> fffff
6 -> ffffff
7 -> fffffff
8 -> ffffffff
9 -> fffffffff
10 -> ffffffffff
11 -> fffffffffff
12 -> ffffffffffff
13 -> fffffffffffff
14 -> ffffffffffffff
15 -> fffffffffffffff
16 -> 0
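Note that n %= 2 * sizeof( unsigned long long ) wraps n = 16 around to 0, which is why the last line prints 0. To get sixteen f digits for n = 16, clamp n to 16 instead of taking the remainder.
Since you initially used two language tags, C and C++: to run this program as a C++ program, replace the header <stdio.h> with <iostream> and use the operator << instead of the call to printf.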

Creating a mask around a subsection [i,j] for a number [duplicate]

I'm currently learning bit manipulation and bitwise operators and was working on a practice problem where you have to merge a subsection [i,j] of an int M into N at [i,j]. I created the mask in a linear fashion, but after googling I found that ~0 << j | ((1 << i) - 1) creates the mask I wanted. However, I am not sure why. If anyone could provide clarification that would be great, thanks.
void merge(int N, int M, int i, int j){
int mask = ~0 << j | ((1 << i) - 1);
N = N & mask; // clearing the bits [i,j] in N
mask = ~(mask); // inverting the mask so that we can isolate [i,j] in
//M
M = M & mask; // clearing the bits in M outside of [i,j]
// merging the subsection [i,j] in M into N at [i,j] by using OR
N = N | M;
}
~0 is the "all 1 bits" number. When you shift it up by j, you make the least significant j bits into 0:
1111111111111111 == ~0 == ~0 << 0
1111111111111110 == ~0 << 1
1111111111100000 == ~0 << 5
1111111110000000 == ~0 << 7
1 << i is just the i + 1th least significant bit turned on.
0000000000000001 == 1 << 0
0000000000000010 == 1 << 1
0000000000001000 == 1 << 3
0000000001000000 == 1 << 6
When you subtract 1 from this, the subtraction borrows from the single 1 bit, which becomes 0 while every bit below it becomes 1 (so you end up with the first i least significant bits turned on).
0000000000000000 == (1 << 0) - 1
0000000000000001 == (1 << 1) - 1
0000000000000111 == (1 << 3) - 1
0000000000111111 == (1 << 6) - 1
When you or them, you end up with a mask that has zeros only in a window from the (i + 1)th to the jth least significant bit (inclusive) and ones everywhere else.
1111111110000000 == ~0 << 7
0000000000000111 == (1 << 3) - 1
1111111110000111 == ~0 << 7 | ((1 << 3) - 1)
7 3
When you & a number with this mask, you clear the bits in the range (i, j] (The ith bit itself is not included).
When you ~ the mask, you get a new mask that will only give you the bits in the range (i, j].
1111111110000111 == ~0 << 7 | ((1 << 3) - 1)
0000000001111000 == ~(~0 << 7 | ((1 << 3) - 1))
Which could also be constructed with something like ((1 << j) - 1) & ~((1 << i) - 1).
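Putting the pieces together, a merge that actually returns its result might be sketched as follows. The names windowMask and mergeBits are hypothetical, the bit indices are 0-based (the window covers bits i through j-1), and the sketch uses unsigned 32-bit values with 0 <= i <= j < 32 so that none of the shifts are undefined.

```cpp
#include <cstdint>

// Mask with bits i..j-1 (0-indexed) set, assuming 0 <= i <= j < 32.
inline std::uint32_t windowMask(unsigned i, unsigned j)
{
    return ((1u << j) - 1u) & ~((1u << i) - 1u);
}

// Copies bits i..j-1 of m into n and returns the merged value.
inline std::uint32_t mergeBits(std::uint32_t n, std::uint32_t m,
                               unsigned i, unsigned j)
{
    const std::uint32_t window = windowMask(i, j);
    return (n & ~window)   // clear the window in n
         | (m & window);   // keep only the window of m
}
```

For i = 3 and j = 7 the window is 0000000001111000, the inverse of the mask built from ~0 << 7 | ((1 << 3) - 1).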

Convert a 74-bit integer to base 31

To generate a UFI number, I use a bitset of size 74. To perform step 2 of UFI generation, I need to convert this number:
9 444 732 987 799 592 368 290
(10000000000000000000000000000101000001000001010000011101011111100010100010)
into:
DFSTTM62QN6DTV1
by converting the first representation to base 31 and getting the equivalent chars from a table.
#define PAYLOAD_SIZE 74
// payload = binary of 9444732987799592368290
std::bitset<PAYLOAD_SIZE> bs_payload(payload);
/*
perform modulo 31 to obtain:
12(D), 14(F), 24(S), 25(T), 25, 19, 6, 2, 22, 20, 6, 12, 25, 27, 1
*/
Is there a way to perform the conversion on my bitset without using an external BigInteger library?
Edit: I ended up writing a BigInteger class, even though Cheers and hth. - Alf's solution works like a charm.
To get a number modulo 31 you just need to sum up its digits in base 32 (because 32 leaves remainder 1 when divided by 31), just like how you calculate modulo 3 and 9 of a decimal number.
unsigned mod31(std::bitset<74> b) {
unsigned mod = 0;
while (!b.none()) {
mod += (b & std::bitset<74>(0x1F)).to_ulong();
b >>= 5;
}
while (mod > 31)
mod = (mod >> 5) + (mod & 0x1F);
return mod == 31 ? 0 : mod; // a digit sum of 31 is congruent to 0
}
You can speed up the modulo calculation by running the additions in parallel, like how it's done here. A similar technique can be used to calculate modulo 3, 5, 7, 15, ... and 2^31 - 1.
C - Algorithm for Bitwise operation on Modulus for number of not a power of 2
Is there any easy way to do modulus of 2^32 - 1 operation?
Logic to check the number is divisible by 3 or not?
However, since the question is actually about base conversion and not about modulo as the title said, you need to do a real division for this purpose. Notice that 1/b is 0.(1) in base b + 1, so 1/31 is 0.(1) in base 32, which in binary is
1/31 = 0.000010000100001000010000100001..._2 = 0.(00001)_2
and then N/31 can be calculated like this
N/31 = N×2^-5 + N×2^-10 + N×2^-15 + ...
uint128_t result = 0;
while (x)
{
x >>= 5;
result += x;
}
Since both modulo and division use shift-by-5, you can also do both of them together in a single loop.
However, the tricky part here is how to round the quotient properly. The above method will work for most values except some between a multiple of 31 and the next power of 2. I've found a way to correct the result for values up to a few thousand but have yet to find a generic way for all values.
You can see the same shift-and-add method being used to divide by 10 and by 3. There are more examples in the famous Hacker's Delight, with proper rounding. I didn't have enough time to read through the book to understand how they implement the result correction part, so maybe I'll get back to this later. If anyone has an idea how to do that, I'd be grateful.
One suggestion is to do the division in fixed-point. Just shift the value left so that we have enough fractional part to round later
uint128_t result = 0;
const unsigned num_fraction = 125 - 75; // 125 and 75 are the nearest multiples of 5
// or maybe 128 - 74 will also work
uint128_t x = UFI_Number << num_fraction;
while (x)
{
x >>= 5;
result += x;
}
// shift the result back and add the fractional bit to round
result = (result >> num_fraction) + ((result >> (num_fraction - 1)) & 1);
Note that your result above is incorrect. I've confirmed the result is CEOPPJ62MK6CPR1 from both Yaniv Shaked's answer and Wolfram Alpha, unless you use different symbols for the digits.
This code seems to work. To guarantee the result I think you need to do additional testing. E.g. first with small numbers where you can compute the result directly.
Edit: Oh, now I noticed you posted the required result digits, and they match. Means it's generally good, but still not tested for corner cases.
#include <assert.h>
#include <algorithm> // std::reverse
#include <bitset>
#include <vector>
#include <iostream>
using namespace std;
template< class Type > using ref_ = Type&;
namespace base31
{
void mul2( ref_<vector<int>> digits )
{
int carry = 0;
for( ref_<int> d : digits )
{
const int local_sum = 2*d + carry;
d = local_sum % 31;
carry = local_sum / 31;
}
if( carry != 0 )
{
digits.push_back( carry );
}
}
void add1( ref_<vector<int>> digits )
{
int carry = 1;
for( ref_<int> d : digits )
{
const int local_sum = d + carry;
d = local_sum % 31;
carry = local_sum / 31;
}
if( carry != 0 )
{
digits.push_back( carry );
}
}
void divmod2( ref_<vector<int>> digits, ref_<int> mod )
{
int carry = 0;
for( int i = int( digits.size() ) - 1; i >= 0; --i )
{
ref_<int> d = digits[i];
const int divisor = d + 31*carry;
carry = divisor % 2;
d = divisor/2;
}
mod = carry;
if( digits.size() > 0 and digits.back() == 0 )
{
digits.resize( digits.size() - 1 );
}
}
}
int main() {
bitset<74> bits(
"10000000000000000000000000000101000001000001010000011101011111100010100010"
);
vector<int> reversed_binary;
for( const char ch : bits.to_string() ) { reversed_binary.push_back( ch - '0' ); }
vector<int> base31;
for( const int bit : reversed_binary )
{
base31::mul2( base31 );
if( bit != 0 )
{
base31::add1( base31 );
}
}
{ // Check the conversion to base31 by converting back to base 2, roundtrip:
vector<int> temp31 = base31;
int mod;
vector<int> base2;
while( temp31.size() > 0 )
{
base31::divmod2( temp31, mod );
base2.push_back( mod );
}
reverse( base2.begin(), base2.end() );
cout << "Original : " << bits.to_string() << endl;
cout << "Reconstituted: ";
string s;
for( const int bit : base2 ) { s += bit + '0'; cout << bit; }; cout << endl;
assert( s == bits.to_string() );
}
cout << "Base 31 digits (msd to lsd order): ";
for( int i = int( base31.size() ) - 1; i >= 0; --i )
{
cout << base31[i] << ' ';
}
cout << endl;
cout << "Mod 31 = " << base31[0] << endl;
}
Results with MinGW g++:
Original : 10000000000000000000000000000101000001000001010000011101011111100010100010
Reconstituted: 10000000000000000000000000000101000001000001010000011101011111100010100010
Base 31 digits (msd to lsd order): 12 14 24 25 25 19 6 2 22 20 6 12 25 27 1
Mod 31 = 1
I did not compile this pseudocode, but it should give you a general understanding of how to convert the number:
// Array for conversion of value to base-31 characters:
char base31Characters[] =
{
'0',
'1',
'2',
...
'X',
'Y'
};
void printUFINumber(__int128_t number)
{
string result = "";
while (number != 0)
{
int mod = (int)(number % 31);
result = base31Characters[mod] + result;
number = number / 31;
}
cout << result; // print the converted digits, not the (now zero) number
}
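A compilable variant of the same idea is sketched below. It assumes a compiler that provides unsigned __int128 (a GCC/Clang extension, not standard C++), and the helper names parseDecimal and toBase31 are invented for this example. It returns the numeric digits rather than characters, since the character table above is elided.

```cpp
#include <string>
#include <vector>

// unsigned __int128 is a GCC/Clang extension; 128 bits are enough for 74.
using u128 = unsigned __int128;

// Parses a non-negative decimal string. No validation or overflow checks.
inline u128 parseDecimal(const std::string& text)
{
    u128 value = 0;
    for (const char ch : text) {
        value = value * 10 + static_cast<unsigned>(ch - '0');
    }
    return value;
}

// Returns the base-31 digits of value, most significant digit first.
inline std::vector<int> toBase31(u128 value)
{
    std::vector<int> digits;
    do {
        digits.insert(digits.begin(), static_cast<int>(value % 31));
        value /= 31;
    } while (value != 0);
    return digits;
}
```

Calling toBase31(parseDecimal("9444732987799592368290")) should give the digit sequence 12 14 24 25 25 19 6 2 22 20 6 12 25 27 1 reported in the other answer; mapping those digits to characters is then a table lookup.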

C++ Extracting a character from an image using bit-wise operations

This is my first time asking a question here, so bear with me! I have a steganography lab that I have nearly completed. I have written a program that hides a message in the lower bits of an image, but the program to extract the message is where I am stuck. The image is in a file represented as a 2D matrix, in column-major order. Here is the code where I am stuck.
void image::reveal_message()
{
int bitcount = 0;
char c;
char *msg;
while(c != '\0' || bitcount < 1128)
{
for(int z = 0; z < cols; z++)
{
for(int k = 0; k < 8; k++)
{
int i = bitcount % rows ;
int j = bitcount / rows ;
int b = c & 1;
if(img[i][j] % 2 != 0 && b == 0)
{
c = c & (~1);
}
else if(img[i][j] % 2 == 0 && b == 1)
{
c = c | 1;
}
bitcount++;
c = c << 1;
}
reverse_bits(c);
cout << c << endl;
//strncat(msg, &c, 1);
}
}
int i = 0;
for(int i = 0; i < cols; i++)
{
if(!isprint(msg[i]))
{
cout << "There is no hidden message" << endl;
}
}
cout << "This is the hidden message" << endl;
cout << msg;
}
The code is able to loop through and grab all the right numbers for the bits. The bits are based on whether the number in the matrix is odd or even. Where I am having trouble is actually setting the bits of the char to the bits that I extracted from the matrix. I am not the best at bit-wise operations, and we are also not supposed to use any library for this. The reverse_bits function works as well, so it seems that just my shifting and bit-wise operations are messed up. I also commented out the strncat() line because it was producing a lot of errors due to the fact that char c is incorrect. The main error I keep receiving is a segmentation fault (core dump).
My understanding from your code is that you embedded your message as 1 bit per pixel, row by row. For example, if you have a 3x10 image, with pixels
01 02 03 04 05 06 07 08 09 10
11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30
the first character of your message resides in the pixels 01-08, the second from 09 to 16, etc. After your message, you embedded an extra null character, which you can use during extraction to know when to stop. With all that in mind, you're looking for something like this.
int bitcount = 0;
int i = 0;
int j = 0;
while(bitcount < 1128)
{
// this will serve as the ordinal value for the extracted char
int b = 0;
for(int k = 0; k < 8; k++)
{
b = (b << 1) | (img[i][j] & 1);
j++;
if(j == cols)
{
i++;
j = 0;
}
}
bitcount += 8;
// do whatever you want with this, print it, store it somewhere, etc
c = (char)b;
if(c == '\0')
{
break;
}
}
Here is how the bit shifting works. b starts with the value 0, or 00000000 if you would like to visualise it in binary. Every time, you shift it to the left by one to make room for the newly extracted bit, which you OR in. There is no need to check whether it's 1 or 0; it'll just work.
So, imagine you've extracted 5 bits so far, b is 00010011 and the least significant bit of the current image pixel is 1. What will happen is this
b = (b << 1) | 1 // b = 00100110 | 1 = 00100111
And thus you have extracted the 6th bit.
Now, let's say you embedded the character "a" (01100001) in the first 8 pixels.
01 02 03 04 05 06 07 08 \\ pixels
0 1 1 0 0 0 0 1 \\ least significant bit of each pixel
When you extract the bits with the above, b will equal 97 and c will give you "a". However, if you embedded your bits in the reverse order, i.e.,
01 02 03 04 05 06 07 08 \\ pixels
1 0 0 0 0 1 1 0 \\ least significant bit of each pixel
you should change the extracting algorithm to the following so you won't have to reverse the bits later on. Here the first pixel supplies bit 0, the second bit 1, and so on.
int b = 0;
for(int k = 0; k < 8; k++)
{
b = b | ((img[i][j] & 1) << k);
// etc
}
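To make the two bit orders concrete, here is a small self-contained sketch that works on a flat list of pixel values instead of the img matrix. The function names and the std::vector<int> pixel container are assumptions made for this illustration, not part of the original program.

```cpp
#include <cstddef>
#include <vector>

// Rebuilds one byte from the least significant bits of eight consecutive
// pixels starting at index start. The first pixel holds the most
// significant bit of the byte.
inline unsigned char extractByteMsbFirst(const std::vector<int>& pixels,
                                         std::size_t start)
{
    unsigned b = 0;
    for (std::size_t k = 0; k < 8; k++) {
        b = (b << 1) | (static_cast<unsigned>(pixels[start + k]) & 1u);
    }
    return static_cast<unsigned char>(b);
}

// Same, but the first pixel holds the least significant bit of the byte.
inline unsigned char extractByteLsbFirst(const std::vector<int>& pixels,
                                         std::size_t start)
{
    unsigned b = 0;
    for (std::size_t k = 0; k < 8; k++) {
        b |= (static_cast<unsigned>(pixels[start + k]) & 1u) << k;
    }
    return static_cast<unsigned char>(b);
}
```

Pixels whose low bits read 0 1 1 0 0 0 0 1 decode to 'a' with the first function, and pixels whose low bits read 1 0 0 0 0 1 1 0 decode to 'a' with the second.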
You start with undefined data in your char c.
You read from it here int b = c & 1;.
That is clearly nonsense.
c = c <<1; // shift before, not after
// if odd clear:
if(img[i][j] % 2)
{
c = c & (~1);
}
else // if even set:
{
c = c | 1;
}
The above may not read the data correctly, but at least it is not nonsense.
The bitwise operations look otherwise fine.
char *msg; should be std::string, and use += instead of strncat.

What is unoptimized about this code? [closed]

I wrote a solution for a question on interviewstreet, here is the problem description:
https://www.interviewstreet.com/challenges/dashboard/#problem/4e91289c38bfd
Here is the solution they have given:
https://gist.github.com/1285119
Here is the solution that I coded:
#include<iostream>
#include <string.h>
using namespace std;
#define LOOKUPTABLESIZE 10000000
int popCount[2*LOOKUPTABLESIZE];
int main()
{
int numberOfTests = 0;
cin >> numberOfTests;
for(int test = 0;test<numberOfTests;test++)
{
int startingNumber = 0;
int endingNumber = 0;
cin >> startingNumber >> endingNumber;
int numberOf1s = 0;
for(int number=startingNumber;number<=endingNumber;number++)
{
if(number >-LOOKUPTABLESIZE && number < LOOKUPTABLESIZE)
{
if(popCount[number+LOOKUPTABLESIZE] != 0)
{
numberOf1s += popCount[number+LOOKUPTABLESIZE];
}
else
{
popCount[number+LOOKUPTABLESIZE] =__builtin_popcount (number);
numberOf1s += popCount[number+LOOKUPTABLESIZE];
}
}
else
{
numberOf1s += __builtin_popcount (number);
}
}
cout << numberOf1s << endl;
}
}
Can you please point out what is wrong with my code? It passes only 3 of the 10 tests. The time limit is 3 seconds.
What is unoptimized about this code?
The algorithm. You are looping
for(int number=startingNumber;number<=endingNumber;number++)
computing or looking up the number of 1-bits in each. That can take a while.
A good algorithm counts the number of 1-bits in all numbers 0 <= k < n in O(log n) time using a bit of math.
Here is an implementation counting 0s in decimal expansions, the modification to make it count 1-bits shouldn't be hard.
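One way to do that bit of math is to count, for each bit position separately, how many numbers in the range have that bit set. The sketch below takes that route; it is not the linked implementation. It counts over the inclusive range 0..n, takes a 32-bit n so the total cannot overflow 64 bits, and the names onesUpTo and countRange are made up for the example.

```cpp
#include <cstdint>

// Total number of 1-bits in the binary representations of 0, 1, ..., n.
// For each bit position, the bit pattern repeats with a fixed period:
// "half" zeros followed by "half" ones.
inline std::uint64_t onesUpTo(std::uint32_t n)
{
    const std::uint64_t total = std::uint64_t{n} + 1; // numbers in [0, n]
    std::uint64_t count = 0;
    for (unsigned bit = 0; bit < 32; ++bit) {
        const std::uint64_t half = std::uint64_t{1} << bit;
        const std::uint64_t period = half << 1;
        count += (total / period) * half;           // complete periods
        const std::uint64_t rest = total % period;  // trailing partial period
        if (rest > half) {
            count += rest - half;                   // ones in the partial period
        }
    }
    return count;
}

// 1-bits in all numbers of the inclusive range [a, b], assuming a <= b.
inline std::uint64_t countRange(std::uint32_t a, std::uint32_t b)
{
    return onesUpTo(b) - (a ? onesUpTo(a - 1) : 0);
}
```

The loop runs once per bit position, so the cost is logarithmic in n rather than linear. Negative inputs, which the original problem allows, would need separate handling.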
When looking at such a question, you need to break it down in simple pieces.
For example, suppose that you know how many 1s there are in all numbers [0, N] (let's call this ones(N)), then we have:
size_t ones(size_t N) { /* magic ! */ }
size_t count(size_t A, size_t B) {
return ones(B) - (A ? ones(A - 1) : 0);
}
This approach has the advantage that ones is probably simpler to program than count, for example using recursion. As such, a first naive attempt would be:
// Naive
size_t naive_ones(size_t N) {
if (N == 0) { return 0; }
return __builtin_popcount(N) + naive_ones(N-1);
}
But this is likely to be too slow. Even when simply computing the value of count(A, B) we will be computing naive_ones(A-1) twice!
Fortunately, there is always memoization to assist here, and the transformation is quite trivial:
size_t memo_ones(size_t N) {
static std::deque<size_t> Memo(1, 0);
for (size_t i = Memo.size(); i <= N; ++i) {
Memo.push_back(Memo[i-1] + __builtin_popcount(i));
}
return Memo[N];
}
It's likely that this helps; however, the cost in terms of memory might be... crippling. Ugh. Imagine that for computing ones(1,000,000) we will occupy 8MB of memory on a 64-bit computer! A sparser memoization could help (for example, only memoizing every 8th or 16th count):
// count number of ones in (A, B]
static size_t unoptimized_count(size_t A, size_t B) {
size_t result = 0;
for (size_t i = A + 1; i <= B; ++i) {
result += __builtin_popcount(i);
}
return result;
}
// something like this... be wary it's not tested.
size_t memo16_ones(size_t N) {
static std::vector<size_t> Memo(1, 0);
size_t const n16 = N - (N % 16);
for (size_t i = Memo.size(); i*16 <= n16; ++i) {
Memo.push_back(Memo[i-1] + unoptimized_count(16*(i-1), 16*i));
}
return Memo[n16/16] + unoptimized_count(n16, N);
}
However, while it does reduce the memory cost, it does not solve the main speed issue: we must at least use __builtin_popcount B times! And for large values of B this is a killer.
The above solutions are mechanical; they did not require one ounce of thought. It turns out that interviews are not so much about writing code as they are about thinking.
Can we solve this problem more efficiently than dumbly enumerating all integers until B ?
Let's see what our brains (quite the amazing pattern machine) picks up when considering the first few entries:
N   bin   1s  ones(N)
0   0000  0   0
1   0001  1   1
2   0010  1   2
3   0011  2   4
4   0100  1   5
5   0101  2   7
6   0110  2   9
7   0111  3   12
8   1000  1   13
9   1001  2   15
10  1010  2   17
11  1011  3   20
12  1100  2   22
13  1101  3   25
14  1110  3   28
15  1111  4   32
Notice a pattern ? I do ;) The range 8-15 is built exactly like 0-7 but with one more 1 per line => it's like a transposition. And it's quite logical too, isn't it ?
Therefore, ones(15) - ones(7) = 8 + ones(7), ones(7) - ones(3) = 4 + ones(3) and ones(1) - ones(0) = 1 + ones(0).
Well, let's make this a formula:
Reminder: ones(N) = popcount(N) + ones(N-1) (almost) by definition
We now know that ones(2**n - 1) - ones(2**(n-1) - 1) = 2**(n-1) + ones(2**(n-1) - 1)
Let's isolate ones(2**n), since it's easier to deal with; note that popcount(2**n) = 1:
regroup: ones(2**n - 1) = 2**(n-1) + 2*ones(2**(n-1) - 1)
use the definition: ones(2**n) - 1 = 2**(n-1) + 2*ones(2**(n-1)) - 2
simplify: ones(2**n) = 2**(n-1) - 1 + 2*ones(2**(n-1)), with ones(1) = 1.
Quick sanity check:
1 = 2**0 => 1 (bottom)
2 = 2**1 => 2 = 2**0 - 1 + 2 * ones(1)
4 = 2**2 => 5 = 2**1 - 1 + 2 * ones(2)
8 = 2**3 => 13 = 2**2 - 1 + 2 * ones(4)
16 = 2**4 => 33 = 2**3 - 1 + 2 * ones(8)
Looks like it works!
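The same sanity check can be automated by comparing the recurrence against a brute-force count. This is only a verification sketch; onesBruteForce and onesPow2 are names invented here, and the brute-force loop is meant for small inputs only.

```cpp
#include <cstdint>

// Reference implementation of ones(N): counts the 1-bits of every number
// in 0..N one by one. Linear in N, so only suitable for small N.
inline std::uint64_t onesBruteForce(std::uint32_t N)
{
    std::uint64_t total = 0;
    for (std::uint64_t k = 0; k <= N; ++k) {
        for (std::uint64_t v = k; v != 0; v &= v - 1) {
            ++total; // one increment per set bit of k
        }
    }
    return total;
}

// ones(2**n) from the recurrence
//   ones(2**n) = 2**(n-1) - 1 + 2*ones(2**(n-1)),  ones(1) = 1.
// Assumes n is small enough (n <= 32 here) that nothing overflows.
inline std::uint64_t onesPow2(unsigned n)
{
    std::uint64_t value = 1; // ones(2**0)
    for (unsigned k = 1; k <= n; ++k) {
        value = (std::uint64_t{1} << (k - 1)) - 1 + 2 * value;
    }
    return value;
}
```

Checking onesPow2(n) against onesBruteForce(1u << n) for small n reproduces the table values 1, 2, 5, 13 and 33.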
We are not quite done though. A and B might not necessarily be powers of 2, and if we have to count all the way from 2**n to 2**n + 2**(n-1) that's still O(N)!
On the other hand, if we manage to express a number in base 2, then we should be able to leverage our newly acquired formula. The main advantage is that there are only about log2(N) bits in the representation.
Let's pick an example and understand how it works: 13 = 8 + 4 + 1
1 -> 0001
4 -> 0100
8 -> 1000
13 -> 1101
... however, the count is not just merely the sum:
ones(13) != ones(8) + ones(4) + ones(1)
Let's express it in terms of the "transposition" strategy instead:
ones(13) - ones(8) = ones(5) + (13 - 8)
ones(5) - ones(4) = ones(1) + (5 - 4)
Okay, easy to do with a bit of recursion.
#include <cmath>
#include <iostream>
static double const Log2 = log(2);
// store ones(2**n) at P2Count[n]
static size_t P2Count[64] = {};
// Unfortunately, the conversion to double might lose some precision
// static size_t log2(size_t n) { return log(double(n - 1))/Log2 + 1; }
// Ceiling of log2(n). __builtin_clzl returns the number of leading 0s
// and is undefined for 0, hence the n <= 1 guard.
static size_t log2(size_t n) {
if (n <= 1) { return 0; }
return 8 * sizeof(n) - __builtin_clzl(n - 1);
}
static size_t ones(size_t n) {
if (n == 0) { return 0; }
if (n == 1) { return 1; }
size_t const lg2 = log2(n);
size_t const np2 = 1ul << lg2; // "next" power of 2
if (np2 == n) { return P2Count[lg2]; }
size_t const pp2 = np2 / 2; // "previous" power of 2
return ones(pp2) + ones(n - pp2) + (n - pp2);
} // ones
// reminder: ones(2**n) = 2**(n-1) - 1 + 2*ones(2**(n-1))
void initP2Count() {
P2Count[0] = 1;
for (size_t i = 1; i != 64; ++i) {
P2Count[i] = (1ul << (i-1)) - 1 + 2 * P2Count[i-1];
}
} // initP2Count
size_t count(size_t const A, size_t const B) {
if (A == 0) { return ones(B); }
return ones(B) - ones(A - 1);
} // count
And a demonstration:
int main() {
// Init table
initP2Count();
std::cout << "0: " << P2Count[0] << ", 1: " << P2Count[1] << ", 2: " << P2Count[2] << ", 3: " << P2Count[3] << "\n";
for (size_t i = 0; i != 16; ++i) {
std::cout << i << ": " << ones(i) << "\n";
}
std::cout << "count(7, 14): " << count(7, 14) << "\n";
}
Victory!
Note: as Daniel Fisher noted, this fails to account for negative numbers (but assuming two's complement, their count can be inferred from the count for the corresponding positive numbers).