Bit counting in a contiguous memory chunk - c++

I was asked in an interview the following question.
int countSetBits(void *ptr, int start, int end);
Synopsis:
Assume that ptr points to a big chunk of memory. Viewing this memory as contiguous sequence of bits, start and end are bit positions. Assume start and end
have proper values and ptr is pointing to an initialized chunck of memory.
Question:
Write a C code to count number of bits set from start to end [inclusive] and return the count.
Just to make it more clear
ptr---->+-------------------------------+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
+-------------------------------+
| 8 | 9 | |15 |
+-------------------------------+
| |
+-------------------------------+
...
...
+-------------------------------+
| | S | |
+-------------------------------+
...
...
+-------------------------------+
| | E | |
+-------------------------------+
...
...
My solution:
int countSetBits(void *ptr, int start, int end )
{
int count = 0, idx;
char *ch;
for (idx = start; idx <= end; idx++)
{ ch = ptr + (idx/8);
if((128 >> (idx%8)) & (*ch))
{
count++;
}
}
return count;
}
I gave a very lengthy and somewhat inefficient code during the interview. I worked on it later and came up with above solution.
I am very sure SO community can provide more elegant solution. I am just curious to see their response.
PS: Above code is not compiled. It is more like a pseudo code and may contain errors.

The most quick and efficient way to my opinion is to use a table of 256 entries, where every element represents number of bits in the index. Index is a next byte from the memory location.
something like this:
int bit_table[256] = {0, 1, 1, 2, 1, ...};
char* p = ptr + start;
int count = 0;
for (p; p != ptr + end; p++)
count += bit_table[*(unsigned char*)p];

Boundary conditions, they get no respect...
Everyone here seems to be concentrating on the lookup table to count the bits. And that's OK, but I think that even more important when answering an interview question is to make sure you handle the boundary conditions.
The look up table is just an optimization. It's much more important to get the answer right than to get it fast. If this were my interview, going straight for the lookup table without even mentioning that there are some tricky details about handling the first few and last few bits that aren't on full-byte boundaries would be worse than coming up with a solution that counted each bit ploddingly, but got the boundary conditions right.
So I think Bhaskar's solution in his question is probably superior to the most of the answers mentioned here - it seems to handle the boundary conditions.
Here's a solution that uses a lookup table and tries to still handle the boundaries (it's only lightly tested, so I won't claim that it's 100% correct). It's also uglier than I'd like, but it's late:
typedef unsigned char uint8_t;
static
size_t bits_in_byte( uint8_t val)
{
static int const half_byte[] = { 0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4 };
int result1 = half_byte[val & 0x0f];
int result2 = half_byte[(val >> 4) & 0x0f];
return result1 + result2;
}
int countSetBits( void* ptr, int start, int end)
{
uint8_t* first;
uint8_t* last;
int bits_first;
int bits_last;
uint8_t mask_first;
uint8_t mask_last;
size_t count = 0;
// get bits from the first byte
first = ((uint8_t*) ptr) + (start / 8);
bits_first = 8 - start % 8;
mask_first = (1 << bits_first) - 1;
mask_first = mask_first << (8 - bits_first);
// get bits from last byte
last = ((uint8_t*) ptr) + (end / 8);
bits_last = 1 + (end % 8);
mask_last = (1 << bits_last) - 1;
if (first == last) {
// we only have a range of bits in the first byte
count = bits_in_byte( (*first) & mask_first & mask_last);
}
else {
// handle the bits from the first and last bytes specially
count += bits_in_byte((*first) & mask_first);
count += bits_in_byte((*last) & mask_last);
// now we've collected the odds and ends from the start and end of the bit range
// handle the full bytes in the interior of the range
for (first = first+1; first != last; ++first) {
count += bits_in_byte(*first);
}
}
return count;
}
Note that a detail that would have to be worked out as part of the interview is whether the bits within a byte are indexed starting at the least-significant-bit (lsb) or most-significant-bit (msb). In other words, if the start index were specified as 0, would a byte with the value 0x01 or a byte with the value 0x80 have the bit set in that index? Sort of like deciding whether the indexes consider the bit order within a byte as big-endian or little-endian.
There's no 'right' answer for this - the interviewer would have to specify what the behavior should be. I'll also note that my example solution handles this in the opposite way to the OP's example code (I was going by how I interpreted the diagram, with the indexes reading as 'bit numbers' as well). The OPs' solution considers the bit order as big-endian, my function treats them as little-endian. So even though both handle partial bytes at the star & end of the range, they'll give different answers. Which is the right answer depends on what the actual spec for the problem is.

The version of #dimitri is likely the fastest. But it is difficult to build the table of bit counts for all 128 8-bit chars in an interview. You can get a very fast version with a table for 16 hex numbers 0x0, 0x1, ..., 0xF, that you can build easily:
int countBits(void *ptr, int start, int end) {
// start, end are byte indexes
int hexCounts[16] = {0, 1, 1, 2, 1, 2, 2, 3,
1, 2, 3, 3, 2, 3, 3, 4};
unsigned char * pstart = (unsigned char *) ptr + start;
unsigned char * pend = (unsigned char *) ptr + end;
int count = 0;
for (unsigned char * p = pstart; p <= pend; ++p) {
unsigned char b = *p;
count += hexCounts[b & 0x0F] + hexCounts[(b >> 4) & 0x0F];
}
return count;
}
EDIT: If start and end are bit indexes then the bits in the first and last bytes would be counted first before the above function is called:
int countBits2(void *ptr, int start, int end) {
// start, end are bit indexes
if (start > end) return 0;
int count = 0;
unsigned char* pstart = (unsigned char *) ptr + start/8; // first byte
unsigned char* pend = (unsigned char *) ptr + end/8; // last byte
int istart = start % 8; // index in first byte
int iend = end % 8; // index in last byte
unsigned char b = *pstart; // byte
if (pstart == pend) { // count in 1 byte only
b = b << istart;
for (int i = istart; i <= iend; ++i) { // between istart, iend
if (b & 0x80) ++count;
b = b << 1;
}
}
else { // count in 2 bytes
for (int i = istart; i < 8; ++i) { // from istart to 7
if (b & 1) ++count;
b = b >> 1;
}
b = *pend;
for (int i = 0; i <= iend; ++i) { // from 0 to iend
if (b & 0x80) ++count;
b = b << 1;
}
}
return count + countBits(ptr, start/8 + 1, end/8 - 1);
}

An excellent recent study comparing several of the most modern techniques for counting the number of 'set' (1-valued) bits in a range of memory (aka Hamming Weight, bitset cardinality, sideways sum, population count or popcnt, etc.) can be found in Wojciech, Kurz, and Lemire (2017), Faster population counts using AVX2 instructions 1
The following is a complete, tested, and fully-working C# adaptation of the "Harley-Seal" algorithm from that paper, which the authors found to be the fastest method that uses general-purpose bitwise operations (that is, that doesn't require special hardware).
1. Managed array entry points(optional) Provides access to the block-optimized bit-counting for managed array ulong[].
/// <summary> Returns the total number of 1-valued bits in the array </summary>
[DebuggerStepThrough]
public static int OnesCount(ulong[] rg) => OnesCount(rg, 0, rg.Length);
/// <summary> Finds the total number of '1' bits in an array or its subset </summary>
/// <param name="rg"> Array of ulong values to scan </param>
/// <param name="index"> Starting index in the array </param>
/// <param name="count"> Number of ulong values to examine, starting at 'i' </param>
public static int OnesCount(ulong[] rg, int index, int count)
{
if ((index | count) < 0 || index > rg.Length - count)
throw new ArgumentException();
fixed (ulong* p = &rg[index])
return OnesCount(p, count);
}
2. Scalar APIUsed by the block-optimized counter to aggregate results from the carry-save adder, and also to finish up any remainder for block sizes not divisible by the optimized chunk size of 16 x 8 bytes/ulong = 128 bytes. Suitable for general-purpose use also.
/// <summary> Finds the Hamming Weight or ones-count of a ulong value </summary>
/// <returns> The number of 1-bits that are set in 'x' </returns>
public static int OnesCount(ulong x)
{
x -= (x >> 1) & 0x5555555555555555;
x = ((x >> 2) & 0x3333333333333333) + (x & 0x3333333333333333);
return (int)((((x + (x >> 4)) & 0x0F0F0F0F0F0F0F0F) * 0x0101010101010101) >> 56);
}
3. "Harley-Seal" block-optimized 1s-bit counterProcesses blocks of 128 bytes at a time, i.e., 16 ulong values per block. Uses the carry-save adder (shown below) to gang-add single bits across adjacent ulongs, and aggregates totals upwards as powers of two.
/// <summary> Count the number of 'set' (1-valued) bits in a range of memory. </summary>
/// <param name="p"> Pointer to an array of 64-bit ulong values to scan </param>
/// <param name="c"> Size of the memory block as a count of 64-bit ulongs </param>
/// <returns> The total number of 1-bits </returns>
public static int OnesCount(ulong* p, int c)
{
ulong z, y, x, w;
int c = 0;
for (w = x = y = z = 0UL; cq >= 16; cq -= 16)
c += OnesCount(CSA(ref w,
CSA(ref x,
CSA(ref y,
CSA(ref z, *p++, *p++),
CSA(ref z, *p++, *p++)),
CSA(ref y,
CSA(ref z, *p++, *p++),
CSA(ref z, *p++, *p++))),
CSA(ref x,
CSA(ref y,
CSA(ref z, *p++, *p++),
CSA(ref z, *p++, *p++)),
CSA(ref y,
CSA(ref z, *p++, *p++),
CSA(ref z, *p++, *p++)))));
c <<= 4;
c += (OnesCount(w) << 3) + (OnesCount(x) << 2) + (OnesCount(y) << 1) + OnesCount(z);
while (--cq >= 0)
c += OnesCount(*p++);
return c;
}
4. Carry-save adder (CSA)
/// <summary> carry-save adder </summary>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
static ulong CSA(ref ulong a, ulong b, ulong c)
{
ulong v = a & b | (a ^ b) & c;
a ^= b ^ c;
return v;
}
Remarks
Because the approach shown here counts the total number of 1-bits by proceeding 128-byte chunks at a time, it only becomes optimal with larger memory block sizes. For example, likely at least some (small) multiple of that sixteen-qword (16-ulong) chunk size. For counting 1-bits in smaller memory ranges, this code will work correctly, but drastically underperform more naïve methods. See the paper for details.
From the paper, this diagram summarizes how the Carry-Save Adder works:
References
[1.] Muła, Wojciech, Nathan Kurz, and Daniel Lemire. "Faster population counts using AVX2 instructions." The Computer Journal 61, no. 1 (2017): 111-120.

Disclaimer: No attempt to compile the following code has been made.
/*
* Table counting the number of set bits in a byte.
* The byte is the index to the table.
*/
uint8_t table[256] = {...};
/***************************************************************************
*
* countBits - count the number of set bits in a range
*
* The most significant bit in the byte is considered to be bit 0.
*
* RETURNS: 0 on success, -1 on failure
*/
int countBits (
uint8_t * buffer,
int startBit, /* starting bit */
int endBit, /* End-bit (inlcusive) */
unsigned * pTotal /* Output: number of consecutively set bits */
) {
int numBits; /* number of bits left to check */
int mask; /* mask to apply to byte from <buffer> */
int bits; /* # of bits to end of byte */
unsigned count = 0; /* total number of bits set */
uint8_t value; /* value read from the buffer */
/* Return -1 if parameters fail sanity check (skipped) */
numBits = (endBit - startBit) + 1;
index = startBit >> 3;
bits = 8 - (startBit & 7);
mask = (1 << bits) - 1;
value = buffer[index] & mask; /* mask-out any bits preceding <startBit> */
numBits -= bits;
while (numBits > 0) { /* Note: if <startBit> and <endBit> are in */
count += table[value]; /* same byte, this loop gets skipped. */
index++;
value = buffer[index];
numBits -= 8;
}
if (numBits < 0) { /* mask-out any bits following <endBit> */
bits = 8 - (endBit & 7);
mask = 0xff << bits;
value &= mask;
}
count += table[value];
*pTotal = count;
return 0;
}
Edit: Function header updated.

Depending on the industry you applied in, look-up tables might not be an acceptable means of optimization while platform / compiler specific optimizations are. Knowing that most compilers and CPU instruction sets have a pop count instruction, I'd go for this. It's a simplicity vs. performance trade-off though because right now I'm still iterating over a list of chars.
Also note that contrary to most answers I assume start and end are byte-offsets because it's not specified in the question that they're not and it's the default in most cases.
int countSetBits(void *ptr, int start, int end )
{
assert(start < end);
unsigned char *s = ((unsigned char*)ptr + start);
unsigned char *e = ((unsigned char*)ptr + end);
int r = 0;
while(s != e)
{
// __builtin_clz is not defined for 0 input.
if(*s) r += 32 - __builtin_clz(*s);
s++;
}
return r;
}

Related

Next higher number with same number of set bits and some bits fixed

I'm trying to find a way to, given a number of bits that need to be set, and a few indexes of bits that should remain fixed, generate the next higher number with the same number of set bits, that has all the fixed bits in place. This is closely related to https://www.chessprogramming.org/Traversing_Subsets_of_a_Set#Snoobing_the_Universe
The difference is that I want to keep some of the bits unchanged, and I'm trying to do this as efficiently as possible / something close to the snoob function, that given a number and the conditions, bithacks its way into the next one (So I'm trying to avoid iterating through all the smaller subsets and seeing which ones contain the required bits, for example).
For example, if I have the universe of numbers {1,2,...,20}, I'd like to, given a number with bits {2,5,6} set, generate the smallest number with 6 set bits that has bit {2,5,6} set, and then the number after that etc
Solution 1 (based on "Snoobing the Universe")
One solution is to define a value corresponding to the "fixed bits" and one corresponding to the "variable bits".
E.g. for bits {2, 5, 6}, the "fixed bits" value would be 0x64 (assuming bits are counted starting from 0).
The "variable bits" is initialized with the smallest value having the remaining number of bits. E.g. if we want a total of 6 bits and have 3 fixed bits, the remaining number of bits is 6-3=3, so the "variable bits" starting value is 0x7.
Now the resulting value is calculated by "blending" the two bit sets by inserting the "variable bits" into the places where the "fixed bits" are 0 (see the function blend() below).
To get the next value, the "variable bits" are modified using the linked "Snoobing the Universe" function (snoob() below) and the result is again obtained by "blending" the fixed and variable bits.
All in all, a solution is as follows (prints the first 10 numbers as an example):
#include<stdio.h>
#include<stdint.h>
uint64_t snoob (uint64_t x) {
uint64_t smallest, ripple, ones;
smallest = x & -x;
ripple = x + smallest;
ones = x ^ ripple;
ones = (ones >> 2) / smallest;
return ripple | ones;
}
uint64_t blend(uint64_t fixed, uint64_t var)
{
uint64_t result = fixed;
uint64_t maskResult = 1;
while(var != 0)
{
if((result & maskResult) == 0)
{
if((var & 1) != 0)
{
result |= maskResult;
}
var >>= 1;
}
maskResult <<= 1;
}
return result;
}
int main(void)
{
const uint64_t fixedBits = 0x64; // Bits 2, 5, 6 must be set
const int additionalBits = 3;
uint64_t varBits = ((uint64_t)1 << additionalBits) - 1;
uint64_t value;
for(unsigned i = 0; i < 10; i++)
{
value = blend(fixedBits, varBits);
printf("%u: decimal=%llu hex=0x%04llx\n", i, value, value);
varBits = snoob(varBits); // Get next value for variable bits
}
}
Solution 2 (based on "Snoobing any Sets")
Another solution based on the linked "Snoobing any Sets" (function snoobSubset()below) is to define the "variale set" as the bits which are not fixed and then initialize the "variable bits" as the n least significant of these bits (see function getLsbOnes() below). In the example case, n=3.
This solution is as follows:
#include<stdio.h>
#include<stdint.h>
uint64_t getLsbOnes(uint64_t value, unsigned count)
{
uint64_t mask = 1;
while(mask != 0)
{
if(count > 0)
{
if((mask & value) != 0)
{
count--;
}
}
else
{
value &= ~mask;
}
mask <<= 1;
}
return value;
}
// get next greater subset of set with same number of one bits
uint64_t snoobSubset (uint64_t sub, uint64_t set) {
uint64_t tmp = sub-1;
uint64_t rip = set & (tmp + (sub & (0-sub)) - set);
for(sub = (tmp & sub) ^ rip; sub &= sub-1; rip ^= tmp, set ^= tmp)
tmp = set & (0-set);
return rip;
}
int main(void)
{
const uint64_t fixedBits = 0x64;
const int additionalBits = 3;
const uint64_t varSet = ~fixedBits;
uint64_t varBits = getLsbOnes(varSet, additionalBits);
uint64_t value;
for(unsigned i = 0; i < 10; i++)
{
value = fixedBits | varBits;
printf("%u: decimal=%llu hex=0x%04llx\n", i, value, value);
varBits = snoobSubset(varBits, varSet);
}
}
Example output
The output for both solutions should be:
0: decimal=111 hex=0x006f
1: decimal=119 hex=0x0077
2: decimal=125 hex=0x007d
3: decimal=126 hex=0x007e
4: decimal=231 hex=0x00e7
5: decimal=237 hex=0x00ed
6: decimal=238 hex=0x00ee
7: decimal=245 hex=0x00f5
8: decimal=246 hex=0x00f6
9: decimal=252 hex=0x00fc

Remove nth bit from buffer, and shift the rest

Giving a uint8_t buffer of x length, I am trying to come up with a function or a macro that can remove nth bit (or n to n+i), then left-shift the remaining bits.
example #1:
for input 0b76543210 0b76543210 ... then output should be 0b76543217 0b654321 ...
example #2: if the input is:
uint8_t input[8] = {
0b00110011,
0b00110011,
...
};
the output without the first bit, should be
uint8_t output[8] = {
0b00110010,
0b01100100,
...
};
I have tried the following to remove the first bit, but it did not work for the second group of bits.
/* A macro to extract (a-b) range of bits without shifting */
#define BIT_RANGE(N,x,y) ((N) & ((0xff >> (7 - (y) + (x))) << ((x))))
void removeBit0(uint8_t *n) {
for (int i=0; i < 7; i++) {
n[i] = (BIT_RANGE(n[i], i + 1, 7)) << (i + 1) |
(BIT_RANGE(n[i + 1], 1, i + 1)) << (7 - i); /* This does not extract the next element bits */
}
n[7] = 0;
}
Update #1
In my case, the input will be uint64_t number, then I will use memmov to shift it one place to the left.
Update #2
The solution can be in C/C++, assembly(x86-64) or inline assembly.
This is really 2 subproblems: remove bits from each byte and pack the results. This is the flow of the code below. I wouldn't use a macro for this. Too much going on. Just inline the function if you're worried about performance at that level.
#include <stdio.h>
#include <stdint.h>
// Remove bits n to n+k-1 from x.
unsigned scrunch_1(unsigned x, int n, int k) {
unsigned hi_bits = ~0u << n;
return (x & ~hi_bits) | ((x >> k) & hi_bits);
}
// Remove bits n to n+k-1 from each byte in the buffer,
// then pack left. Return number of packed bytes.
size_t scrunch(uint8_t *buf, size_t size, int n, int k) {
size_t i_src = 0, i_dst = 0;
unsigned src_bits = 0; // Scrunched source bit buffer.
int n_src_bits = 0; // Initially it's empty.
for (;;) {
// Get scrunched bits until the buffer has at least 8.
while (n_src_bits < 8) {
if (i_src >= size) { // Done when source bytes exhausted.
// If there are left-over bits, add one more byte to output.
if (n_src_bits > 0) buf[i_dst++] = src_bits << (8 - n_src_bits);
return i_dst;
}
// Pack 'em in.
src_bits = (src_bits << (8 - k)) | scrunch_1(buf[i_src++], n, k);
n_src_bits += 8 - k;
}
// Write the highest 8 bits of the buffer to the destination byte.
n_src_bits -= 8;
buf[i_dst++] = src_bits >> n_src_bits;
}
}
int main(void) {
uint8_t x[] = { 0xaa, 0xaa, 0xaa, 0xaa };
size_t n = scrunch(x, 4, 2, 3);
for (size_t i = 0; i < n; i++) {
printf("%x ", x[i]);
}
printf("\n");
return 0;
}
This writes b5 ad 60, which by my reckoning is correct. A few other test cases work as well.
Oops I coded it the first time shifting the wrong way, but include that here in case it's useful to someone.
#include <stdio.h>
#include <stdint.h>
// Remove bits n to n+k-1 from x.
unsigned scrunch_1(unsigned x, int n, int k) {
unsigned hi_bits = 0xffu << n;
return (x & ~hi_bits) | ((x >> k) & hi_bits);
}
// Remove bits n to n+k-1 from each byte in the buffer,
// then pack right. Return number of packed bytes.
size_t scrunch(uint8_t *buf, size_t size, int n, int k) {
size_t i_src = 0, i_dst = 0;
unsigned src_bits = 0; // Scrunched source bit buffer.
int n_src_bits = 0; // Initially it's empty.
for (;;) {
// Get scrunched bits until the buffer has at least 8.
while (n_src_bits < 8) {
if (i_src >= size) { // Done when source bytes exhausted.
// If there are left-over bits, add one more byte to output.
if (n_src_bits > 0) buf[i_dst++] = src_bits;
return i_dst;
}
// Pack 'em in.
src_bits |= scrunch_1(buf[i_src++], n, k) << n_src_bits;
n_src_bits += 8 - k;
}
// Write the lower 8 bits of the buffer to the destination byte.
buf[i_dst++] = src_bits;
src_bits >>= 8;
n_src_bits -= 8;
}
}
int main(void) {
uint8_t x[] = { 0xaa, 0xaa, 0xaa, 0xaa };
size_t n = scrunch(x, 4, 2, 3);
for (size_t i = 0; i < n; i++) {
printf("%x ", x[i]);
}
printf("\n");
return 0;
}
This writes d6 5a b. A few other test cases work as well.
Something similar to this should work:
template<typename S> void removeBit(S* buffer, size_t length, size_t index)
{
const size_t BITS_PER_UNIT = sizeof(S)*8;
// first we find which data unit contains the desired bit
const size_t unit = index / BITS_PER_UNIT;
// and which index has the bit inside the specified unit, starting counting from most significant bit
const size_t relativeIndex = (BITS_PER_UNIT - 1) - index % BITS_PER_UNIT;
// then we unset that bit
buffer[unit] &= ~(1 << relativeIndex);
// now we have to shift what's on the right by 1 position
// we create a mask such that if 0b00100000 is the bit removed we use 0b00011111 as mask to shift the rest
const S partialShiftMask = (1 << relativeIndex) - 1;
// now we keep all bits left to the removed one and we shift left all the others
buffer[unit] = (buffer[unit] & ~partialShiftMask) | ((buffer[unit] & partialShiftMask) << 1);
for (int i = unit+1; i < length; ++i)
{
//we set rightmost bit of previous unit according to last bit of current unit
buffer[i-1] |= buffer[i] >> (BITS_PER_UNIT-1);
// then we shift current unit by one
buffer[i] <<= 1;
}
}
I just tested it on some basic cases so maybe something is not exactly correct but this should move you onto the right track.

char array of boolean numbers to 64 bit integer number

I have an character array (say char charr[5]) which contains 0/1 (char array of boolean number). Now, I want to convert the character array to 64 bit integer number (if array is {0, 0, 0, 1 , 0}, it will give 2 ). How to do that ? Is there any library functions ?
No, there's no standard function for that. But it's pretty trivial:
uint64_t pack(const uint8_t *bits, size_t n)
{
uint64_t x = 0, value = 1 << (n - 1);
while(n > 0)
{
x += value * *bits++;
n--;
value /= 2;
}
return x;
}
Unwind has the basic idea right, but a complex implementation. This also works:
uint64_t pack(const uint8_t *bits, size_t n)
{
uint64 x = 0;
for(;n > 0; n--) // For all input bits.
{
x <<= 1; // make room for next bit.
assert(*bits <= 1); // It better be a 0 or 1.
x += *bits++; // Add new bit on the end.
}
return x;
}
Try strtoll with base 2:
int val = strtoll(input, NULL, 2);

How to read individual bits from an array?

Lets say i have an array dynamically allocated.
int* array=new int[10]
That is 10*4=40 bytes or 10*32=320 bits. I want to read the 2nd bit of the 30th byte or 242nd bit. What is the easiest way to do so? I know I can access the 30th byte using array[30] but accessing individual bits is more tricky.
bool bitset(void const * data, int bitindex) {
int byte = bitindex / 8;
int bit = bitindex % 8;
unsigned char const * u = (unsigned char const *) data;
return (u[byte] & (1<<bit)) != 0;
}
this is working !
#define GET_BIT(p, n) ((((unsigned char *)p)[n/8] >> (n%8)) & 0x01)
int main()
{
int myArray[2] = { 0xaaaaaaaa, 0x00ff00ff };
for( int i =0 ; i < 2*32 ; i++ )
printf("%d", GET_BIT(myArray, i));
return 0;
}
ouput :
0101010101010101010101010101010111111111000000001111111100000000
Be carefull of the endiannes !
First, if you're doing bitwise operations, it's usually
preferable to make the elements an unsigned integral type
(although in this case, it really doesn't make that much
difference). As for accessing the bits: to access bit i in an
array of n int's:
static int const bitsPerWord = sizeof(int) * CHAR_BIT;
assert( i >= 0 && i < n * bitsPerWord );
int wordIndex = i / bitsPerWord;
int bitIndex = i % bitsPerWord;
then to read:
return (array[wordIndex] & (1 << bitIndex)) != 0;
to set:
array[wordIndex] |= 1 << bitIndex;
and to reset:
array[wordIndex] &= ~(1 << bitIndex);
Or you can use bitset, if n is constant, or vector<bool> or
boost::dynamic_bitset if it's not, and let someone else do the
work.
You can use something like this:
!((array[30] & 2) == 0)
array[30] is the integer.
& 2 is an and operation which masks the second bit (2 = 00000010)
== 0 will check if the mask result is 0
! will negate that result, because we're checking if it's 1 not zero....
You need bit operations here...
if(array[5] & 0x1)
{
//the first bit in array[5] is 1
}
else
{
//the first bit is 0
}
if(array[5] & 0x8)
{
//the 4th bit in array[5] is 1
}
else
{
//the 4th bit is 0
}
0x8 is 00001000 in binary. Doing the anding masks all other bits and allows you to see if the bit is 1 or 0.
int is typically 32 bits, so you would need to do some arithmetic to get a certain bit number in the entire array.
EDITED based on comment below - array contains int of 32 bits, not 8 bits uchar.
int pos = 241; // I start at index 0
bool bit242 = (array[pos/32] >> (pos%32)) & 1;

How to determine how many bytes an integer needs?

I'm looking for the most efficient way to calculate the minimum number of bytes needed to store an integer without losing precision.
e.g.
int: 10 = 1 byte
int: 257 = 2 bytes;
int: 18446744073709551615 (UINT64_MAX) = 8 bytes;
Thanks
P.S. This is for a hash functions which will be called many millions of times
Also the byte sizes don't have to be a power of two
The fastest solution seems to one based on tronics answer:
int bytes;
if (hash <= UINT32_MAX)
{
if (hash < 16777216U)
{
if (hash <= UINT16_MAX)
{
if (hash <= UINT8_MAX) bytes = 1;
else bytes = 2;
}
else bytes = 3;
}
else bytes = 4;
}
else if (hash <= UINT64_MAX)
{
if (hash < 72057594000000000ULL)
{
if (hash < 281474976710656ULL)
{
if (hash < 1099511627776ULL) bytes = 5;
else bytes = 6;
}
else bytes = 7;
}
else bytes = 8;
}
The speed difference using mostly 56 bit vals was minimal (but measurable) compared to Thomas Pornin answer. Also i didn't test the solution using __builtin_clzl which could be comparable.
Use this:
int n = 0;
while (x != 0) {
x >>= 8;
n ++;
}
This assumes that x contains your (positive) value.
Note that zero will be declared encodable as no byte at all. Also, most variable-size encodings need some length field or terminator to know where encoding stops in a file or stream (usually, when you encode an integer and mind about size, then there is more than one integer in your encoded object).
You need just two simple ifs if you are interested on the common sizes only. Consider this (assuming that you actually have unsigned values):
if (val < 0x10000) {
if (val < 0x100) // 8 bit
else // 16 bit
} else {
if (val < 0x100000000L) // 32 bit
else // 64 bit
}
Should you need to test for other sizes, choosing a middle point and then doing nested tests will keep the number of tests very low in any case. However, in that case making the testing a recursive function might be a better option, to keep the code simple. A decent compiler will optimize away the recursive calls so that the resulting code is still just as fast.
Assuming a byte is 8 bits, to represent an integer x you need [log2(x) / 8] + 1 bytes where [x] = floor(x).
Ok, I see now that the byte sizes aren't necessarily a power of two. Consider the byte sizes b. The formula is still [log2(x) / b] + 1.
Now, to calculate the log, either use lookup tables (best way speed-wise) or use binary search, which is also very fast for integers.
The function to find the position of the first '1' bit from the most significant side (clz or bsr) is usually a simple CPU instruction (no need to mess with log2), so you could divide that by 8 to get the number of bytes needed. In gcc, there's __builtin_clz for this task:
#include <limits.h>
int bytes_needed(unsigned long long x) {
int bits_needed = sizeof(x)*CHAR_BIT - __builtin_clzll(x);
if (bits_needed == 0)
return 1;
else
return (bits_needed + 7) / 8;
}
(On MSVC you would use the _BitScanReverse intrinsic.)
You may first get the highest bit set, which is the same as log2(N), and then get the bytes needed by ceil(log2(N) / 8).
Here are some bit hacks for getting the position of the highest bit set, which are copied from http://graphics.stanford.edu/~seander/bithacks.html#IntegerLogObvious, and you can click the URL for details of how these algorithms work.
Find the integer log base 2 of an integer with an 64-bit IEEE float
int v; // 32-bit integer to find the log base 2 of
int r; // result of log_2(v) goes here
union { unsigned int u[2]; double d; } t; // temp
t.u[__FLOAT_WORD_ORDER==LITTLE_ENDIAN] = 0x43300000;
t.u[__FLOAT_WORD_ORDER!=LITTLE_ENDIAN] = v;
t.d -= 4503599627370496.0;
r = (t.u[__FLOAT_WORD_ORDER==LITTLE_ENDIAN] >> 20) - 0x3FF;
Find the log base 2 of an integer with a lookup table
static const char LogTable256[256] =
{
#define LT(n) n, n, n, n, n, n, n, n, n, n, n, n, n, n, n, n
-1, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3,
LT(4), LT(5), LT(5), LT(6), LT(6), LT(6), LT(6),
LT(7), LT(7), LT(7), LT(7), LT(7), LT(7), LT(7), LT(7)
};
unsigned int v; // 32-bit word to find the log of
unsigned r; // r will be lg(v)
register unsigned int t, tt; // temporaries
if (tt = v >> 16)
{
r = (t = tt >> 8) ? 24 + LogTable256[t] : 16 + LogTable256[tt];
}
else
{
r = (t = v >> 8) ? 8 + LogTable256[t] : LogTable256[v];
}
Find the log base 2 of an N-bit integer in O(lg(N)) operations
unsigned int v; // 32-bit value to find the log2 of
const unsigned int b[] = {0x2, 0xC, 0xF0, 0xFF00, 0xFFFF0000};
const unsigned int S[] = {1, 2, 4, 8, 16};
int i;
register unsigned int r = 0; // result of log2(v) will go here
for (i = 4; i >= 0; i--) // unroll for speed...
{
if (v & b[i])
{
v >>= S[i];
r |= S[i];
}
}
// OR (IF YOUR CPU BRANCHES SLOWLY):
unsigned int v; // 32-bit value to find the log2 of
register unsigned int r; // result of log2(v) will go here
register unsigned int shift;
r = (v > 0xFFFF) << 4; v >>= r;
shift = (v > 0xFF ) << 3; v >>= shift; r |= shift;
shift = (v > 0xF ) << 2; v >>= shift; r |= shift;
shift = (v > 0x3 ) << 1; v >>= shift; r |= shift;
r |= (v >> 1);
// OR (IF YOU KNOW v IS A POWER OF 2):
unsigned int v; // 32-bit value to find the log2 of
static const unsigned int b[] = {0xAAAAAAAA, 0xCCCCCCCC, 0xF0F0F0F0,
0xFF00FF00, 0xFFFF0000};
register unsigned int r = (v & b[0]) != 0;
for (i = 4; i > 0; i--) // unroll for speed...
{
r |= ((v & b[i]) != 0) << i;
}
Find the number of bits by taking the log2 of the number, then divide that by 8 to get the number of bytes.
You can find logn of x by the formula:
logn(x) = log(x) / log(n)
Update:
Since you need to do this really quickly, Bit Twiddling Hacks has several methods for quickly calculating log2(x). The look-up table approach seems like it would suit your needs.
This will get you the number of bytes. It's not strictly the most efficient, but unless you're programming a nanobot powered by the energy contained in a red blood cell, it won't matter.
int count = 0;
while (numbertotest > 0)
{
numbertotest >>= 8;
count++;
}
You could write a little template meta-programming code to figure it out at compile time if you need it for array sizes:
template<unsigned long long N> struct NBytes
{ static const size_t value = NBytes<N/256>::value+1; };
template<> struct NBytes<0>
{ static const size_t value = 0; };
int main()
{
std::cout << "short = " << NBytes<SHRT_MAX>::value << " bytes\n";
std::cout << "int = " << NBytes<INT_MAX>::value << " bytes\n";
std::cout << "long long = " << NBytes<ULLONG_MAX>::value << " bytes\n";
std::cout << "10 = " << NBytes<10>::value << " bytes\n";
std::cout << "257 = " << NBytes<257>::value << " bytes\n";
return 0;
}
output:
short = 2 bytes
int = 4 bytes
long long = 8 bytes
10 = 1 bytes
257 = 2 bytes
Note: I know this isn't answering the original question, but it answers a related question that people will be searching for when they land on this page.
Floor((log2(N) / 8) + 1) bytes
You need exactly the log function
nb_bytes = floor(log(x)/log(256))+1
if you use log2, log2(256) == 8 so
floor(log2(x)/8)+1
You need to raise 256 to successive powers until the result is larger than your value.
For example: (Tested in C#)
long long limit = 1;
int byteCount;
for (byteCount = 1; byteCount < 8; byteCount++) {
limit *= 256;
if (limit > value)
break;
}
If you only want byte sizes to be powers of two (If you don't want 65,537 to return 3), replace byteCount++ with byteCount *= 2.
I think this is a portable implementation of the straightforward formula:
#include <limits.h>
#include <math.h>
#include <stdio.h>
int main(void) {
int i;
unsigned int values[] = {10, 257, 67898, 140000, INT_MAX, INT_MIN};
for ( i = 0; i < sizeof(values)/sizeof(values[0]); ++i) {
printf("%d needs %.0f bytes\n",
values[i],
1.0 + floor(log(values[i]) / (M_LN2 * CHAR_BIT))
);
}
return 0;
}
Output:
10 needs 1 bytes
257 needs 2 bytes
67898 needs 3 bytes
140000 needs 3 bytes
2147483647 needs 4 bytes
-2147483648 needs 4 bytes
Whether and how much the lack of speed and the need to link floating point libraries depends on your needs.
I know this question didn't ask for this type of answer but for those looking for a solution using the smallest number of characters, this does the assignment to a length variable in 17 characters, or 25 including the declaration of the length variable.
//Assuming v is the value that is being counted...
int l=0;
for(;v>>l*8;l++);
This is based on SoapBox's idea of creating a solution that contains no jumps, branches etc... Unfortunately his solution was not quite correct. I have adopted the spirit and here's a 32bit version, the 64bit checks can be applied easily if desired.
The function returns number of bytes required to store the given integer.
unsigned short getBytesNeeded(unsigned int value)
{
unsigned short c = 0; // 0 => size 1
c |= !!(value & 0xFF00); // 1 => size 2
c |= (!!(value & 0xFF0000)) << 1; // 2 => size 3
c |= (!!(value & 0xFF000000)) << 2; // 4 => size 4
static const int size_table[] = { 1, 2, 3, 3, 4, 4, 4, 4 };
return size_table[c];
}
For each of eight times, shift the int eight bits to the right and see if there are still 1-bits left. The number of times you shift before you stop is the number of bytes you need.
More succinctly, the minimum number of bytes you need is ceil(min_bits/8), where min_bits is the index (i+1) of the highest set bit.
There are a multitude of ways to do this.
Option #1.
int numBytes = 0;
do {
numBytes++;
} while (i >>= 8);
return (numBytes);
In the above example, is the number you are testing, and generally works for any processor, any size of integer.
However, it might not be the fastest. Alternatively, you can try a series of if statements ...
For a 32 bit integers
if ((upper = (value >> 16)) == 0) {
/* Bit in lower 16 bits may be set. */
if ((high = (value >> 8)) == 0) {
return (1);
}
return (2);
}
/* Bit in upper 16 bits is set */
if ((high = (upper >> 8)) == 0) {
return (3);
}
return (4);
For 64 bit integers, Another level of if statements would be required.
If the speed of this routine is as critical as you say, it might be worthwhile to do this in assembler if you want it as a function call. That could allow you to avoid creating and destroying the stack frame, saving a few extra clock cycles if it is that critical.
A bit basic, but since there will be a limited number of outputs, can you not pre-compute the breakpoints and use a case statement? No need for calculations at run-time, only a limited number of comparisons.
Why not just use a 32-bit hash?
That will work at near-top-speed everywhere.
I'm rather confused as to why a large hash would even be wanted. If a 4-byte hash works, why not just use it always? Excepting cryptographic uses, who has hash tables with more then 232 buckets anyway?
there are lots of great recipes for stuff like this over at Sean Anderson's "Bit Twiddling Hacks" page.
This code has 0 branches, which could be faster on some systems. Also on some systems (GPGPU) its important for threads in the same warp to execute the same instructions. This code is always the same number of instructions no matter what the input value.
inline int get_num_bytes(unsigned long long value) // where unsigned long long is the largest integer value on this platform
{
int size = 1; // starts at 1 sot that 0 will return 1 byte
size += !!(value & 0xFF00);
size += !!(value & 0xFFFF0000);
if (sizeof(unsigned long long) > 4) // every sane compiler will optimize this out
{
size += !!(value & 0xFFFFFFFF00000000ull);
if (sizeof(unsigned long long) > 8)
{
size += !!(value & 0xFFFFFFFFFFFFFFFF0000000000000000ull);
}
}
static const int size_table[] = { 1, 2, 4, 8, 16 };
return size_table[size];
}
g++ -O3 produces the following (verifying that the ifs are optimized out):
xor %edx,%edx
test $0xff00,%edi
setne %dl
xor %eax,%eax
test $0xffff0000,%edi
setne %al
lea 0x1(%rdx,%rax,1),%eax
movabs $0xffffffff00000000,%rdx
test %rdx,%rdi
setne %dl
lea (%rdx,%rax,1),%rax
and $0xf,%eax
mov _ZZ13get_num_bytesyE10size_table(,%rax,4),%eax
retq
Why so complicated? Here's what I came up with:
bytesNeeded = (numBits/8)+((numBits%8) != 0);
Basically numBits divided by eight + 1 if there is a remainder.
There are already a lot of answers here, but if you know the number ahead of time, in c++ you can use a template to make use of the preprocessor.
template <unsigned long long N>
struct RequiredBytes {
enum : int { value = 1 + (N > 255 ? RequiredBits<(N >> 8)>::value : 0) };
};
template <>
struct RequiredBytes<0> {
enum : int { value = 1 };
};
const int REQUIRED_BYTES_18446744073709551615 = RequiredBytes<18446744073709551615>::value; // 8
or for a bits version:
template <unsigned long long N>
struct RequiredBits {
enum : int { value = 1 + RequiredBits<(N >> 1)>::value };
};
template <>
struct RequiredBits<1> {
enum : int { value = 1 };
};
template <>
struct RequiredBits<0> {
enum : int { value = 1 };
};
const int REQUIRED_BITS_42 = RequiredBits<42>::value; // 6