Inserting bits into a byte - C++

I was looking at an example of reading bits from a byte, and the implementation looked simple and easy to understand. I was wondering if anyone has a similar example of how to insert bits into a byte or byte array that is as easy to understand and to implement as the example below.
Here is the example I found of reading bits from a byte:
static int GetBits3(byte b, int offset, int count)
{
return (b >> offset) & ((1 << count) - 1);
}
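For example, GetBits3(0xB3, 4, 3) returns 3: 1011 0011 shifted right by 4 is 0000 1011, and masking with (1 << 3) - 1 = 0b111 leaves 0b011.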
Here is what I'm trying to do. This is my current implementation; I'm just a little confused by the bit-masking and shifting, so I'm trying to find out whether there is an easier way to do what I'm doing:
BYTE Msg[2];
Msg_Id = 3;
Msg_Event = 1;
Msg_Ready = 2;
Msg[0] = ( ( Msg_Event << 4 ) & 0xF0 ) | ( Msg_Id & 0x0F ) ;
Msg[1] = Msg_Ready & 0x0F; //MsgReady & Unused

If you are using consecutive integer constant values like in the example above, you should shift bits by these constants when packing them into a byte. Otherwise they overlap: in your example, Msg_Id (3) equals Msg_Event | Msg_Ready (1 | 2). They can be used like
Msg[0] = ( 1 << Msg_Event ) | ( 1 << Msg_Id); // sets the 2nd and 4th bits
(Note that bits within a byte are indexed from 0.) The other approach would be using powers of 2 as constant values:
Msg_Id = 4; // equals 1 << 2
Msg_Event = 1; // equals 1 << 0
Msg_Ready = 2; // equals 1 << 1
Note that in your code above, masking with 0x0F or 0xF0 is not really needed: (Msg_Id & 0x0F) == Msg_Id and ((Msg_Event << 4) & 0xF0) == (Msg_Event << 4).
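For the write direction, a counterpart to GetBits3 could look like the following sketch (my own illustration, not from the original posts); it clears the count-bit-wide field at offset and then ORs in the new value:
static unsigned char SetBits(unsigned char b, int offset, int count, int value)
{
    unsigned char mask = (unsigned char)(((1 << count) - 1) << offset); // the field to overwrite
    return (unsigned char)((b & ~mask) | ((value << offset) & mask));
}
With it, the packing above becomes Msg[0] = SetBits(SetBits(0, 0, 4, Msg_Id), 4, 4, Msg_Event);.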

You could use a bit field. For instance:
struct Msg
{
unsigned MsgEvent : 1; // 1 bit
unsigned MsgReady : 1; // 1 bit
};
You could then use a union to manipulate either the bitfield or the byte, something like this:
struct MsgBitField {
unsigned MsgEvent : 1; // 1 bit
unsigned MsgReady : 1; // 1 bit
};
union ByteAsBitField {
unsigned char Byte;
MsgBitField Message;
};
int main() {
ByteAsBitField MyByte;
MyByte.Byte = 0;
MyByte.Message.MsgEvent = true;
}

Related

Bit counting in a contiguous memory chunk

I was asked in an interview the following question.
int countSetBits(void *ptr, int start, int end);
Synopsis:
Assume that ptr points to a big chunk of memory. Viewing this memory as a contiguous sequence of bits, start and end are bit positions. Assume start and end
have proper values and ptr points to an initialized chunk of memory.
Question:
Write C code to count the number of bits set from start to end [inclusive] and return the count.
Just to make it more clear
ptr---->+-------------------------------+
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
+-------------------------------+
| 8 | 9 | |15 |
+-------------------------------+
| |
+-------------------------------+
...
...
+-------------------------------+
| | S | |
+-------------------------------+
...
...
+-------------------------------+
| | E | |
+-------------------------------+
...
...
My solution:
int countSetBits(void *ptr, int start, int end)
{
    int count = 0, idx;
    char *ch;
    for (idx = start; idx <= end; idx++)
    {
        ch = (char *)ptr + (idx / 8);
        if ((128 >> (idx % 8)) & (*ch))
        {
            count++;
        }
    }
    return count;
}
I gave a very lengthy and somewhat inefficient answer during the interview. I worked on it later and came up with the above solution.
I am sure the SO community can provide a more elegant solution. I am just curious to see their responses.
PS: The above code is not compiled. It is more like pseudocode and may contain errors.
In my opinion, the quickest and most efficient way is to use a table of 256 entries, where each element holds the number of set bits in its index. The index is the next byte read from the memory location.
Something like this:
int bit_table[256] = {0, 1, 1, 2, 1, ...};
char* p = ptr + start;
int count = 0;
for (p; p != ptr + end; p++)
count += bit_table[*(unsigned char*)p];
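If you don't want to spell out all 256 entries, the table can also be filled once at startup with a simple recurrence (my own sketch; it uses bit_table[i] = bit_table[i/2] + (i & 1)):
int bit_table[256];

void init_bit_table(void)
{
    bit_table[0] = 0;
    for (int i = 1; i < 256; i++)
        bit_table[i] = bit_table[i / 2] + (i & 1); // popcount(i) = popcount(i >> 1) + lowest bit of i
}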
Boundary conditions, they get no respect...
Everyone here seems to be concentrating on the lookup table to count the bits. And that's OK, but I think that when answering an interview question it's even more important to make sure you handle the boundary conditions.
The lookup table is just an optimization. It's much more important to get the answer right than to get it fast. If this were my interview, going straight for the lookup table without even mentioning that there are some tricky details about handling the first few and last few bits that aren't on full-byte boundaries would be worse than coming up with a solution that counted each bit ploddingly but got the boundary conditions right.
So I think Bhaskar's solution in his question is probably superior to most of the answers mentioned here - it seems to handle the boundary conditions.
Here's a solution that uses a lookup table and tries to still handle the boundaries (it's only lightly tested, so I won't claim that it's 100% correct). It's also uglier than I'd like, but it's late:
#include <stdint.h>
static
size_t bits_in_byte( uint8_t val)
{
static int const half_byte[] = { 0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4 };
int result1 = half_byte[val & 0x0f];
int result2 = half_byte[(val >> 4) & 0x0f];
return result1 + result2;
}
int countSetBits( void* ptr, int start, int end)
{
uint8_t* first;
uint8_t* last;
int bits_first;
int bits_last;
uint8_t mask_first;
uint8_t mask_last;
size_t count = 0;
// get bits from the first byte
first = ((uint8_t*) ptr) + (start / 8);
bits_first = 8 - start % 8;
mask_first = (1 << bits_first) - 1;
mask_first = mask_first << (8 - bits_first);
// get bits from last byte
last = ((uint8_t*) ptr) + (end / 8);
bits_last = 1 + (end % 8);
mask_last = (1 << bits_last) - 1;
if (first == last) {
// we only have a range of bits in the first byte
count = bits_in_byte( (*first) & mask_first & mask_last);
}
else {
// handle the bits from the first and last bytes specially
count += bits_in_byte((*first) & mask_first);
count += bits_in_byte((*last) & mask_last);
// now we've collected the odds and ends from the start and end of the bit range
// handle the full bytes in the interior of the range
for (first = first+1; first != last; ++first) {
count += bits_in_byte(*first);
}
}
return count;
}
Note that a detail that would have to be worked out as part of the interview is whether the bits within a byte are indexed starting at the least-significant-bit (lsb) or most-significant-bit (msb). In other words, if the start index were specified as 0, would a byte with the value 0x01 or a byte with the value 0x80 have the bit set in that index? Sort of like deciding whether the indexes consider the bit order within a byte as big-endian or little-endian.
There's no 'right' answer for this - the interviewer would have to specify what the behavior should be. I'll also note that my example solution handles this in the opposite way to the OP's example code (I was going by how I interpreted the diagram, with the indexes reading as 'bit numbers' as well). The OP's solution considers the bit order as big-endian, my function treats them as little-endian. So even though both handle partial bytes at the start & end of the range, they'll give different answers. Which is the right answer depends on what the actual spec for the problem is.
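As a tiny illustration of the two conventions (my own example), here is how "bit index 0" of the byte 0x80 reads under each:
unsigned char b = 0x80;
int lsb_first = (b >> 0) & 1;       // 0: index 0 means the least significant bit
int msb_first = (b >> (7 - 0)) & 1; // 1: index 0 means the most significant bit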
The version by @dimitri is likely the fastest. But it is difficult to build the table of bit counts for all 256 8-bit values in an interview. You can get a very fast version with a table for the 16 hex digits 0x0, 0x1, ..., 0xF, which you can build easily:
int countBits(void *ptr, int start, int end) {
// start, end are byte indexes
int hexCounts[16] = {0, 1, 1, 2, 1, 2, 2, 3,
                     1, 2, 2, 3, 2, 3, 3, 4};
unsigned char * pstart = (unsigned char *) ptr + start;
unsigned char * pend = (unsigned char *) ptr + end;
int count = 0;
for (unsigned char * p = pstart; p <= pend; ++p) {
unsigned char b = *p;
count += hexCounts[b & 0x0F] + hexCounts[(b >> 4) & 0x0F];
}
return count;
}
EDIT: If start and end are bit indexes then the bits in the first and last bytes would be counted first before the above function is called:
int countBits2(void *ptr, int start, int end) {
// start, end are bit indexes
if (start > end) return 0;
int count = 0;
unsigned char* pstart = (unsigned char *) ptr + start/8; // first byte
unsigned char* pend = (unsigned char *) ptr + end/8; // last byte
int istart = start % 8; // index in first byte
int iend = end % 8; // index in last byte
unsigned char b = *pstart; // byte
if (pstart == pend) { // count in 1 byte only
b = b << istart;
for (int i = istart; i <= iend; ++i) { // between istart, iend
if (b & 0x80) ++count;
b = b << 1;
}
}
else { // count in 2 bytes
for (int i = istart; i < 8; ++i) { // from istart to 7
if (b & 1) ++count;
b = b >> 1;
}
b = *pend;
for (int i = 0; i <= iend; ++i) { // from 0 to iend
if (b & 0x80) ++count;
b = b << 1;
}
}
return count + countBits(ptr, start/8 + 1, end/8 - 1);
}
An excellent recent study comparing several of the most modern techniques for counting the number of 'set' (1-valued) bits in a range of memory (aka Hamming weight, bitset cardinality, sideways sum, population count or popcnt, etc.) can be found in Muła, Kurz, and Lemire (2017), Faster population counts using AVX2 instructions [1].
The following is a complete, tested, and fully-working C# adaptation of the "Harley-Seal" algorithm from that paper, which the authors found to be the fastest method that uses general-purpose bitwise operations (that is, that doesn't require special hardware).
1. Managed array entry points (optional)
Provides access to the block-optimized bit counting for a managed ulong[] array.
/// <summary> Returns the total number of 1-valued bits in the array </summary>
[DebuggerStepThrough]
public static int OnesCount(ulong[] rg) => OnesCount(rg, 0, rg.Length);
/// <summary> Finds the total number of '1' bits in an array or its subset </summary>
/// <param name="rg"> Array of ulong values to scan </param>
/// <param name="index"> Starting index in the array </param>
/// <param name="count"> Number of ulong values to examine, starting at 'i' </param>
public static unsafe int OnesCount(ulong[] rg, int index, int count)
{
if ((index | count) < 0 || index > rg.Length - count)
throw new ArgumentException();
fixed (ulong* p = &rg[index])
return OnesCount(p, count);
}
2. Scalar API
Used by the block-optimized counter to aggregate results from the carry-save adder, and also to finish up any remainder for block sizes not divisible by the optimized chunk size of 16 ulongs x 8 bytes/ulong = 128 bytes. Suitable for general-purpose use also.
/// <summary> Finds the Hamming Weight or ones-count of a ulong value </summary>
/// <returns> The number of 1-bits that are set in 'x' </returns>
public static int OnesCount(ulong x)
{
x -= (x >> 1) & 0x5555555555555555;
x = ((x >> 2) & 0x3333333333333333) + (x & 0x3333333333333333);
return (int)((((x + (x >> 4)) & 0x0F0F0F0F0F0F0F0F) * 0x0101010101010101) >> 56);
}
3. "Harley-Seal" block-optimized 1s-bit counterProcesses blocks of 128 bytes at a time, i.e., 16 ulong values per block. Uses the carry-save adder (shown below) to gang-add single bits across adjacent ulongs, and aggregates totals upwards as powers of two.
/// <summary> Count the number of 'set' (1-valued) bits in a range of memory. </summary>
/// <param name="p"> Pointer to an array of 64-bit ulong values to scan </param>
/// <param name="c"> Size of the memory block as a count of 64-bit ulongs </param>
/// <returns> The total number of 1-bits </returns>
public static unsafe int OnesCount(ulong* p, int cq)
{
ulong z, y, x, w;
int c = 0;
for (w = x = y = z = 0UL; cq >= 16; cq -= 16)
c += OnesCount(CSA(ref w,
CSA(ref x,
CSA(ref y,
CSA(ref z, *p++, *p++),
CSA(ref z, *p++, *p++)),
CSA(ref y,
CSA(ref z, *p++, *p++),
CSA(ref z, *p++, *p++))),
CSA(ref x,
CSA(ref y,
CSA(ref z, *p++, *p++),
CSA(ref z, *p++, *p++)),
CSA(ref y,
CSA(ref z, *p++, *p++),
CSA(ref z, *p++, *p++)))));
c <<= 4;
c += (OnesCount(w) << 3) + (OnesCount(x) << 2) + (OnesCount(y) << 1) + OnesCount(z);
while (--cq >= 0)
c += OnesCount(*p++);
return c;
}
4. Carry-save adder (CSA)
/// <summary> carry-save adder </summary>
[MethodImpl(MethodImplOptions.AggressiveInlining)]
static ulong CSA(ref ulong a, ulong b, ulong c)
{
ulong v = a & b | (a ^ b) & c;
a ^= b ^ c;
return v;
}
Remarks
Because the approach shown here counts the total number of 1-bits by proceeding in 128-byte chunks, it only becomes optimal with larger memory block sizes - likely at least some small multiple of that sixteen-qword (16-ulong) chunk size. For counting 1-bits in smaller memory ranges, this code will work correctly but drastically underperform more naïve methods. See the paper for details.
A diagram in the paper summarizes how the carry-save adder works.
References
[1] Muła, Wojciech, Nathan Kurz, and Daniel Lemire. "Faster population counts using AVX2 instructions." The Computer Journal 61, no. 1 (2017): 111-120.
Disclaimer: No attempt to compile the following code has been made.
/*
* Table counting the number of set bits in a byte.
* The byte is the index to the table.
*/
uint8_t table[256] = {...};
/***************************************************************************
*
* countBits - count the number of set bits in a range
*
* The most significant bit in the byte is considered to be bit 0.
*
* RETURNS: 0 on success, -1 on failure
*/
int countBits (
uint8_t * buffer,
int startBit, /* starting bit */
int endBit, /* End-bit (inclusive) */
unsigned * pTotal /* Output: number of set bits in the range */
) {
int numBits; /* number of bits left to check */
int index; /* byte index into <buffer> */
int mask; /* mask to apply to byte from <buffer> */
int bits; /* # of bits to end of byte */
unsigned count = 0; /* total number of bits set */
uint8_t value; /* value read from the buffer */
/* Return -1 if parameters fail sanity check (skipped) */
numBits = (endBit - startBit) + 1;
index = startBit >> 3;
bits = 8 - (startBit & 7);
mask = (1 << bits) - 1;
value = buffer[index] & mask; /* mask-out any bits preceding <startBit> */
numBits -= bits;
while (numBits > 0) { /* Note: if <startBit> and <endBit> are in */
count += table[value]; /* same byte, this loop gets skipped. */
index++;
value = buffer[index];
numBits -= 8;
}
if (numBits < 0) { /* mask-out any bits following <endBit> */
bits = 7 - (endBit & 7); /* number of excess bits past <endBit> */
mask = 0xff << bits;
value &= mask;
}
count += table[value];
*pTotal = count;
return 0;
}
Edit: Function header updated.
Depending on the industry you're applying in, lookup tables might not be an acceptable means of optimization, while platform/compiler-specific optimizations are. Knowing that most compilers and CPU instruction sets have a population-count instruction, I'd go for this. It's a simplicity vs. performance trade-off though, because right now I'm still iterating over a list of chars.
Also note that, contrary to most answers, I assume start and end are byte offsets, because the question doesn't say they're not and it's the default in most cases.
int countSetBits(void *ptr, int start, int end)
{
    assert(start < end);
    unsigned char *s = ((unsigned char*)ptr + start);
    unsigned char *e = ((unsigned char*)ptr + end);
    int r = 0;
    while (s != e)
    {
        r += __builtin_popcount(*s); // add the number of 1-bits in this byte
        s++;
    }
    return r;
}

How to read individual bits from an array?

Let's say I have a dynamically allocated array:
int* array = new int[10];
That is 10*4 = 40 bytes, or 10*32 = 320 bits. I want to read the 2nd bit of the 30th byte, i.e. the 242nd bit. What is the easiest way to do so? I know I can access the 30th byte using array[30], but accessing individual bits is trickier.
bool bitset(void const * data, int bitindex) {
int byte = bitindex / 8;
int bit = bitindex % 8;
unsigned char const * u = (unsigned char const *) data;
return (u[byte] & (1<<bit)) != 0;
}
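For the 242nd bit from the question (bit index 241 when counting from 0), a call could look like this (my own usage sketch, assuming the bitset function above):
int* array = new int[10]();   // zero-initialized, 320 bits of storage
bool b = bitset(array, 241);  // reads the 242nd bit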
This works:
#define GET_BIT(p, n) ((((unsigned char *)p)[n/8] >> (n%8)) & 0x01)
int main()
{
int myArray[2] = { 0xaaaaaaaa, 0x00ff00ff };
for( int i =0 ; i < 2*32 ; i++ )
printf("%d", GET_BIT(myArray, i));
return 0;
}
Output:
0101010101010101010101010101010111111111000000001111111100000000
Be careful of the endianness!
First, if you're doing bitwise operations, it's usually preferable to make the elements an unsigned integral type (although in this case, it really doesn't make that much difference). As for accessing the bits: to access bit i in an array of n ints:
static int const bitsPerWord = sizeof(int) * CHAR_BIT;
assert( i >= 0 && i < n * bitsPerWord );
int wordIndex = i / bitsPerWord;
int bitIndex = i % bitsPerWord;
then to read:
return (array[wordIndex] & (1 << bitIndex)) != 0;
to set:
array[wordIndex] |= 1 << bitIndex;
and to reset:
array[wordIndex] &= ~(1 << bitIndex);
Or you can use bitset, if n is constant, or vector<bool> or boost::dynamic_bitset if it's not, and let someone else do the work.
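For instance, a minimal vector<bool> sketch (my own example, not from the answer):
#include <iostream>
#include <vector>

int main() {
    std::vector<bool> bits(320);    // room for 10 ints' worth of bits
    bits[241] = true;               // set the 242nd bit
    std::cout << bits[241] << '\n'; // read it back: prints 1
    return 0;
}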
You can use something like this:
!((array[30] & 2) == 0)
array[30] is the integer.
& 2 is an and operation which masks the second bit (2 = 00000010)
== 0 will check if the mask result is 0
! negates that result, because we're checking whether it's 1, not zero.
You need bit operations here...
if(array[5] & 0x1)
{
//the first bit in array[5] is 1
}
else
{
//the first bit is 0
}
if(array[5] & 0x8)
{
//the 4th bit in array[5] is 1
}
else
{
//the 4th bit is 0
}
0x8 is 00001000 in binary. ANDing masks off all the other bits and lets you see whether the bit is 1 or 0.
int is typically 32 bits, so you would need to do some arithmetic to get a certain bit number in the entire array.
EDITED based on comment below - the array contains 32-bit ints, not 8-bit uchars.
int pos = 241; // I start at index 0
bool bit242 = (array[pos/32] >> (pos%32)) & 1;

Fast way to determine the rightmost set bit in a 64-bit value

I'm trying to determine the rightmost set bit:
if (value & (1 << 0)) { return 0; }
if (value & (1 << 1)) { return 1; }
if (value & (1 << 2)) { return 2; }
...
if (value & (1 << 63)) { return 63; }
The if comparison may need to be done up to 64 times. Is there any faster way?
If you're using GCC, use the __builtin_ctz or __builtin_ffs function. (http://gcc.gnu.org/onlinedocs/gcc-4.4.0/gcc/Other-Builtins.html#index-g_t_005f_005fbuiltin_005fffs-2894)
If you're using MSVC, use the _BitScanForward function. See How to use MSVC intrinsics to get the equivalent of this GCC code?.
In POSIX there's also a ffs function. (http://linux.die.net/man/3/ffs)
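For a 64-bit value, a minimal sketch using those builtins could look like this (lowest_set_bit is my own name, not from the answers; note the zero input has to be handled):
int lowest_set_bit(unsigned long long value)
{
    // __builtin_ctzll counts trailing zeros but is undefined for 0, so guard it.
    return value ? __builtin_ctzll(value) : -1;
}

int lowest_set_bit_ffs(unsigned long long value)
{
    // __builtin_ffsll returns one plus the index of the least significant set bit, or 0 for 0.
    return __builtin_ffsll(value) - 1; // -1 when value == 0
}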
There's a little trick for this:
value & -value
This uses the two's complement integer representation of negative numbers.
Edit: This doesn't quite give the exact result as given in the question. The rest can be done with a small lookup table.
You could use a loop:
unsigned int value;
unsigned int temp_value;
const unsigned int BITS_IN_INT = sizeof(int) * CHAR_BIT;
unsigned int index = 0;
// Make a copy of the value, to alter.
temp_value = value;
for (index = 0; index < BITS_IN_INT; ++index)
{
if (temp_value & 1)
{
break;
}
temp_value >>= 1;
}
return index;
This takes up less code space than the if statement proposal, with similar functionality.
KennyTM's suggestions are good if your compiler supports them. Otherwise, you can speed it up using a binary search, something like:
int result = 0;
if (!(value & 0xffffffff)) {
result += 32;
value >>= 32;
}
if (!(value & 0xffff)) {
result += 16;
value >>= 16;
}
and so on. This will do 6 comparisons (in general, log(N) comparisons, versus N for a linear search).
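Spelled out for an unsigned 64-bit value, the full chain might look like this sketch (my own completion of the idea above; an input of 0 falls through and yields 63, so handle it separately if that matters):
int lowest_set_bit(unsigned long long value)
{
    int result = 0;
    if (!(value & 0xffffffffULL)) { result += 32; value >>= 32; }
    if (!(value & 0xffffULL))     { result += 16; value >>= 16; }
    if (!(value & 0xffULL))       { result += 8;  value >>= 8;  }
    if (!(value & 0xfULL))        { result += 4;  value >>= 4;  }
    if (!(value & 0x3ULL))        { result += 2;  value >>= 2;  }
    if (!(value & 0x1ULL))        { result += 1; }
    return result; // 6 comparisons, as noted above
}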
b = n & (-n); // isolates the rightmost set bit
b -= 1; // this gives 1's to the right of it - just the trailing 1's that need counting
b = (b & 0x5555555555555555) + ((b>>1) & 0x5555555555555555); // 2 bit sums of 1 bit numbers
b = (b & 0x3333333333333333) + ((b>>2) & 0x3333333333333333); // 4 bit sums of 2 bit numbers
b = (b & 0x0f0f0f0f0f0f0f0f) + ((b>>4) & 0x0f0f0f0f0f0f0f0f); // 8 bit sums of 4 bit numbers
b = (b & 0x00ff00ff00ff00ff) + ((b>>8) & 0x00ff00ff00ff00ff); // 16 bit sums of 8 bit numbers
b = (b & 0x0000ffff0000ffff) + ((b>>16) & 0x0000ffff0000ffff); // 32 bit sums of 16 bit numbers
b = (b & 0x00000000ffffffff) + ((b>>32) & 0x00000000ffffffff); // sum of 32 bit numbers
b &= 63; // otherwise I think an input of 0 would produce 64 for a result.
This is in C of course.
Here's another method that takes advantage of short-circuiting with logical operations and conditional instruction execution or the instruction pipeline.
unsigned long long value;
unsigned long long temp_value = value;
bool bit_found = false;
unsigned int index = 0;
bit_found = bit_found || (temp_value & (1ULL << index)); index += !bit_found; // bit 0
bit_found = bit_found || (temp_value & (1ULL << index)); index += !bit_found; // bit 1
bit_found = bit_found || (temp_value & (1ULL << index)); index += !bit_found; // bit 2
bit_found = bit_found || (temp_value & (1ULL << index)); index += !bit_found; // bit 3
//...
bit_found = bit_found || (temp_value & (1ULL << index)); index += !bit_found; // bit 63
return index; // index of the rightmost set bit, or 64 if value is 0
The advantage to this method is that there are no branches and the instruction pipeline is not disturbed. This is very fast on processors that perform conditional execution of instructions.
Works for Visual C++ 6
int toErrorCodeBit(__int64 value) {
const int low_double_word = value;
int result = 0;
__asm
{
bsf eax, low_double_word
jz low_double_value_0
mov result, eax
}
return result;
low_double_value_0:
const int upper_double_word = value >> 32;
__asm
{
bsf eax, upper_double_word
mov result, eax
}
result += 32;
return result;
}

Does anyone have an easy solution to parsing Exp-Golomb codes using C++?

Trying to decode the SDP sprop-parameter-sets values for an H.264 video stream, I have found that accessing some of the values will involve parsing Exp-Golomb encoded data. My method holds the base64-decoded sprop-parameter-sets data in a byte array which I am now bit-walking, but I have come up to the first piece of Exp-Golomb encoded data and am looking for a suitable code extract to parse these values.
Exp-Golomb codes of what order?
If you need to parse an H.264 bit stream (I mean the transport layer) you can write simple functions to access specified bits in the endless bit stream. Bits are indexed from left to right.
inline u_dword get_bit(const u_byte * const base, u_dword offset)
{
return ((*(base + (offset >> 0x3))) >> (0x7 - (offset & 0x7))) & 0x1;
}
This function implement decoding of exp-Golomb codes of zero range (used in H.264).
u_dword DecodeUGolomb(const u_byte * const base, u_dword * const offset)
{
u_dword zeros = 0;
// calculate zero bits. Will be optimized.
while (0 == get_bit(base, (*offset)++)) zeros++;
// insert first 1 bit
u_dword info = 1 << zeros;
for (s_dword i = zeros - 1; i >= 0; i--)
{
info |= get_bit(base, (*offset)++) << i;
}
return (info - 1);
}
u_dword means an unsigned 4-byte integer.
u_byte means an unsigned 1-byte integer.
Note that the first byte of each NAL unit is a specified structure with a forbidden bit, NAL reference, and NAL type.
The accepted answer is not a correct implementation; it gives wrong output. Here is a correct implementation, following the pseudocode from
"Sec 9.1 Parsing process for Exp-Golomb codes" of spec T-REC-H.264-201304:
int32_t getBitByPos(unsigned char *buffer, int32_t pos) {
    // pos is a 0-based bit index; bit 0 is the most significant bit of buffer[0]
    return (buffer[pos/8] >> (7 - pos%8)) & 0x01;
}
uint32_t decodeGolomb(unsigned char *byteStream, uint32_t *index) {
    uint32_t leadingZeroBits = -1;
    uint32_t codeNum = 0;
    uint32_t pos = *index;
    if (byteStream == NULL) {
        printf("Invalid input\n");
        return 0;
    }
    for (int32_t b = 0; !b; leadingZeroBits++)
        b = getBitByPos(byteStream, pos++);
    for (int32_t b = leadingZeroBits; b > 0; b--)
        codeNum = codeNum | (getBitByPos(byteStream, pos++) << (b - 1));
    *index = pos;
    return ((1 << leadingZeroBits) - 1 + codeNum);
}
I wrote a C++ JPEG-LS compression library that uses Golomb codes. I don't know if Exp-Golomb codes are exactly the same. The library is open source and can be found at http://charls.codeplex.com. I use a lookup table to decode Golomb codes <= 8 bits in length. Let me know if you have problems finding your way around.
Revised with a function to get N bits from the stream; works for parsing H.264 NALs.
inline uint32_t get_bit(const uint8_t * const base, uint32_t offset)
{
return ((*(base + (offset >> 0x3))) >> (0x7 - (offset & 0x7))) & 0x1;
}
inline uint32_t get_bits(const uint8_t * const base, uint32_t * const offset, uint8_t bits)
{
uint32_t value = 0;
for (int i = 0; i < bits; i++)
{
value = (value << 1) | (get_bit(base, (*offset)++) ? 1 : 0);
}
return value;
}
// This function implement decoding of exp-Golomb codes of zero range (used in H.264).
uint32_t DecodeUGolomb(const uint8_t * const base, uint32_t * const offset)
{
uint32_t zeros = 0;
// calculate zero bits. Will be optimized.
while (0 == get_bit(base, (*offset)++)) zeros++;
// insert first 1 bit
uint32_t info = 1 << zeros;
for (int32_t i = zeros - 1; i >= 0; i--)
{
info |= get_bit(base, (*offset)++) << i;
}
return (info - 1);
}
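As a quick sanity check of the functions above (my own example): the single byte 0xA6, i.e. the bits 1 010 011 0, contains the zero-order Exp-Golomb codewords "1", "010" and "011":
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    const uint8_t data[] = { 0xA6 };
    uint32_t offset = 0;
    printf("%u\n", DecodeUGolomb(data, &offset)); // "1"   -> 0
    printf("%u\n", DecodeUGolomb(data, &offset)); // "010" -> 1
    printf("%u\n", DecodeUGolomb(data, &offset)); // "011" -> 2
    return 0;
}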

How to determine how many bytes an integer needs?

I'm looking for the most efficient way to calculate the minimum number of bytes needed to store an integer without losing precision.
e.g.
int: 10 = 1 byte
int: 257 = 2 bytes
int: 18446744073709551615 (UINT64_MAX) = 8 bytes
Thanks
P.S. This is for a hash function which will be called many millions of times.
Also, the byte sizes don't have to be a power of two.
The fastest solution seems to be one based on tronics' answer:
int bytes;
if (hash <= UINT32_MAX)
{
if (hash < 16777216U)
{
if (hash <= UINT16_MAX)
{
if (hash <= UINT8_MAX) bytes = 1;
else bytes = 2;
}
else bytes = 3;
}
else bytes = 4;
}
else if (hash <= UINT64_MAX)
{
if (hash < 72057594037927936ULL) // 2^56
{
if (hash < 281474976710656ULL)
{
if (hash < 1099511627776ULL) bytes = 5;
else bytes = 6;
}
else bytes = 7;
}
else bytes = 8;
}
The speed difference using mostly 56-bit values was minimal (but measurable) compared to Thomas Pornin's answer. Also, I didn't test the solution using __builtin_clzl, which could be comparable.
Use this:
int n = 0;
while (x != 0) {
x >>= 8;
n ++;
}
This assumes that x contains your (positive) value.
Note that zero will be declared encodable as no byte at all. Also, most variable-size encodings need some length field or terminator to know where the encoding stops in a file or stream (usually, when you encode an integer and care about its size, there is more than one integer in your encoded object).
You need just two simple ifs if you are interested in the common sizes only. Consider this (assuming that you actually have unsigned values):
if (val < 0x10000) {
if (val < 0x100) // 8 bit
else // 16 bit
} else {
if (val < 0x100000000ULL) // 32 bit
else // 64 bit
}
Should you need to test for other sizes, choosing a middle point and then doing nested tests will keep the number of tests very low in any case. However, in that case making the testing a recursive function might be a better option, to keep the code simple. A decent compiler will optimize away the recursive calls so that the resulting code is still just as fast.
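A sketch of those nested middle-point tests for an unsigned 64-bit value (my own illustration, not from the answer):
int bytes_needed(unsigned long long val)
{
    if (val < 0x100000000ULL) {                   // fits in 4 bytes?
        if (val < 0x10000ULL)
            return (val < 0x100ULL) ? 1 : 2;
        return (val < 0x1000000ULL) ? 3 : 4;
    }
    if (val < 0x1000000000000ULL)                 // fits in 6 bytes?
        return (val < 0x10000000000ULL) ? 5 : 6;
    return (val < 0x100000000000000ULL) ? 7 : 8;
}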
Assuming a byte is 8 bits, to represent an integer x you need [log2(x) / 8] + 1 bytes where [x] = floor(x).
Ok, I see now that the byte sizes aren't necessarily a power of two. Consider the byte sizes b. The formula is still [log2(x) / b] + 1.
Now, to calculate the log, either use lookup tables (best way speed-wise) or use binary search, which is also very fast for integers.
The function to find the position of the first '1' bit from the most significant side (clz or bsr) is usually a simple CPU instruction (no need to mess with log2), so you could divide that by 8 to get the number of bytes needed. In gcc, there's __builtin_clz for this task:
#include <limits.h>
int bytes_needed(unsigned long long x) {
    if (x == 0) // __builtin_clzll is undefined for an argument of 0
        return 1;
    int bits_needed = sizeof(x)*CHAR_BIT - __builtin_clzll(x);
    return (bits_needed + 7) / 8;
}
(On MSVC you would use the _BitScanReverse intrinsic.)
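A quick check against the examples from the question (my own snippet, assuming the bytes_needed function above):
#include <assert.h>
#include <stdint.h>

int main(void) {
    assert(bytes_needed(10) == 1);
    assert(bytes_needed(257) == 2);
    assert(bytes_needed(UINT64_MAX) == 8);
    return 0;
}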
You may first get the position of the highest set bit, which is the same as floor(log2(N)), and then get the bytes needed as floor(log2(N) / 8) + 1.
Here are some bit hacks for getting the position of the highest bit set, which are copied from http://graphics.stanford.edu/~seander/bithacks.html#IntegerLogObvious, and you can click the URL for details of how these algorithms work.
Find the integer log base 2 of an integer with a 64-bit IEEE float
int v; // 32-bit integer to find the log base 2 of
int r; // result of log_2(v) goes here
union { unsigned int u[2]; double d; } t; // temp
t.u[__FLOAT_WORD_ORDER==LITTLE_ENDIAN] = 0x43300000;
t.u[__FLOAT_WORD_ORDER!=LITTLE_ENDIAN] = v;
t.d -= 4503599627370496.0;
r = (t.u[__FLOAT_WORD_ORDER==LITTLE_ENDIAN] >> 20) - 0x3FF;
Find the log base 2 of an integer with a lookup table
static const char LogTable256[256] =
{
#define LT(n) n, n, n, n, n, n, n, n, n, n, n, n, n, n, n, n
-1, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3,
LT(4), LT(5), LT(5), LT(6), LT(6), LT(6), LT(6),
LT(7), LT(7), LT(7), LT(7), LT(7), LT(7), LT(7), LT(7)
};
unsigned int v; // 32-bit word to find the log of
unsigned r; // r will be lg(v)
register unsigned int t, tt; // temporaries
if (tt = v >> 16)
{
r = (t = tt >> 8) ? 24 + LogTable256[t] : 16 + LogTable256[tt];
}
else
{
r = (t = v >> 8) ? 8 + LogTable256[t] : LogTable256[v];
}
Find the log base 2 of an N-bit integer in O(lg(N)) operations
unsigned int v; // 32-bit value to find the log2 of
const unsigned int b[] = {0x2, 0xC, 0xF0, 0xFF00, 0xFFFF0000};
const unsigned int S[] = {1, 2, 4, 8, 16};
int i;
register unsigned int r = 0; // result of log2(v) will go here
for (i = 4; i >= 0; i--) // unroll for speed...
{
if (v & b[i])
{
v >>= S[i];
r |= S[i];
}
}
// OR (IF YOUR CPU BRANCHES SLOWLY):
unsigned int v; // 32-bit value to find the log2 of
register unsigned int r; // result of log2(v) will go here
register unsigned int shift;
r = (v > 0xFFFF) << 4; v >>= r;
shift = (v > 0xFF ) << 3; v >>= shift; r |= shift;
shift = (v > 0xF ) << 2; v >>= shift; r |= shift;
shift = (v > 0x3 ) << 1; v >>= shift; r |= shift;
r |= (v >> 1);
// OR (IF YOU KNOW v IS A POWER OF 2):
unsigned int v; // 32-bit value to find the log2 of
static const unsigned int b[] = {0xAAAAAAAA, 0xCCCCCCCC, 0xF0F0F0F0,
0xFF00FF00, 0xFFFF0000};
register unsigned int r = (v & b[0]) != 0;
for (i = 4; i > 0; i--) // unroll for speed...
{
r |= ((v & b[i]) != 0) << i;
}
Find the number of bits by taking the log2 of the number, then divide that by 8 to get the number of bytes.
You can find logn of x by the formula:
logn(x) = log(x) / log(n)
Update:
Since you need to do this really quickly, Bit Twiddling Hacks has several methods for quickly calculating log2(x). The look-up table approach seems like it would suit your needs.
This will get you the number of bytes. It's not strictly the most efficient, but unless you're programming a nanobot powered by the energy contained in a red blood cell, it won't matter.
int count = 0;
while (numbertotest > 0)
{
numbertotest >>= 8;
count++;
}
You could write a little template meta-programming code to figure it out at compile time if you need it for array sizes:
#include <climits>
#include <iostream>

template<unsigned long long N> struct NBytes
{ static const size_t value = NBytes<N/256>::value+1; };
template<> struct NBytes<0>
{ static const size_t value = 0; };
int main()
{
std::cout << "short = " << NBytes<SHRT_MAX>::value << " bytes\n";
std::cout << "int = " << NBytes<INT_MAX>::value << " bytes\n";
std::cout << "long long = " << NBytes<ULLONG_MAX>::value << " bytes\n";
std::cout << "10 = " << NBytes<10>::value << " bytes\n";
std::cout << "257 = " << NBytes<257>::value << " bytes\n";
return 0;
}
output:
short = 2 bytes
int = 4 bytes
long long = 8 bytes
10 = 1 bytes
257 = 2 bytes
Note: I know this isn't answering the original question, but it answers a related question that people will be searching for when they land on this page.
Floor((log2(N) / 8) + 1) bytes
You need exactly the log function
nb_bytes = floor(log(x)/log(256))+1
if you use log2, log2(256) == 8 so
floor(log2(x)/8)+1
You need to raise 256 to successive powers until the result is larger than your value.
For example (tested in C#):
long limit = 1;
int byteCount;
for (byteCount = 1; byteCount < 8; byteCount++) {
limit *= 256;
if (limit > value)
break;
}
If you only want byte sizes to be powers of two (If you don't want 65,537 to return 3), replace byteCount++ with byteCount *= 2.
I think this is a portable implementation of the straightforward formula:
#include <limits.h>
#include <math.h>
#include <stdio.h>
int main(void) {
int i;
unsigned int values[] = {10, 257, 67898, 140000, INT_MAX, INT_MIN};
for ( i = 0; i < sizeof(values)/sizeof(values[0]); ++i) {
printf("%d needs %.0f bytes\n",
values[i],
1.0 + floor(log(values[i]) / (M_LN2 * CHAR_BIT))
);
}
return 0;
}
Output:
10 needs 1 bytes
257 needs 2 bytes
67898 needs 3 bytes
140000 needs 3 bytes
2147483647 needs 4 bytes
-2147483648 needs 4 bytes
Whether the lack of speed and the need to link the floating-point libraries matter depends on your needs.
I know this question didn't ask for this type of answer but for those looking for a solution using the smallest number of characters, this does the assignment to a length variable in 17 characters, or 25 including the declaration of the length variable.
//Assuming v is the value that is being counted...
int l=0;
for(;v>>l*8;l++);
This is based on SoapBox's idea of creating a solution that contains no jumps, branches, etc. Unfortunately his solution was not quite correct. I have adopted the spirit of it, and here's a 32-bit version; the 64-bit checks can be applied easily if desired.
The function returns number of bytes required to store the given integer.
unsigned short getBytesNeeded(unsigned int value)
{
unsigned short c = 0; // 0 => size 1
c |= !!(value & 0xFF00); // 1 => size 2
c |= (!!(value & 0xFF0000)) << 1; // 2 => size 3
c |= (!!(value & 0xFF000000)) << 2; // 4 => size 4
static const int size_table[] = { 1, 2, 3, 3, 4, 4, 4, 4 };
return size_table[c];
}
For each of eight times, shift the int eight bits to the right and see if there are still 1-bits left. The number of times you shift before you stop is the number of bytes you need.
More succinctly, the minimum number of bytes you need is ceil(min_bits/8), where min_bits is the index (i+1) of the highest set bit. For example, 257 has its highest set bit at index 8, so min_bits = 9 and ceil(9/8) = 2 bytes.
There are a multitude of ways to do this.
Option #1.
int numBytes = 0;
do {
numBytes++;
} while (i >>= 8);
return (numBytes);
In the above example, i is the number you are testing, and it generally works for any processor and any size of integer.
However, it might not be the fastest. Alternatively, you can try a series of if statements ...
For a 32-bit integer:
if ((upper = (value >> 16)) == 0) {
/* Bit in lower 16 bits may be set. */
if ((high = (value >> 8)) == 0) {
return (1);
}
return (2);
}
/* Bit in upper 16 bits is set */
if ((high = (upper >> 8)) == 0) {
return (3);
}
return (4);
For 64-bit integers, another level of if statements would be required.
If the speed of this routine is as critical as you say, it might be worthwhile to do this in assembler if you want it as a function call. That could allow you to avoid creating and destroying the stack frame, saving a few extra clock cycles if it is that critical.
A bit basic, but since there will be a limited number of outputs, can you not pre-compute the breakpoints and use a case statement? No need for calculations at run-time, only a limited number of comparisons.
Why not just use a 32-bit hash?
That will work at near-top-speed everywhere.
I'm rather confused as to why a large hash would even be wanted. If a 4-byte hash works, why not just use it always? Excepting cryptographic uses, who has hash tables with more than 2^32 buckets anyway?
There are lots of great recipes for stuff like this over at Sean Anderson's "Bit Twiddling Hacks" page.
This code has 0 branches, which could be faster on some systems. Also on some systems (GPGPU) its important for threads in the same warp to execute the same instructions. This code is always the same number of instructions no matter what the input value.
inline int get_num_bytes(unsigned long long value) // where unsigned long long is the largest integer value on this platform
{
int size = 1; // starts at 1 so that 0 will return 1 byte
size += !!(value & 0xFF00);
size += !!(value & 0xFFFF0000);
if (sizeof(unsigned long long) > 4) // every sane compiler will optimize this out
{
size += !!(value & 0xFFFFFFFF00000000ull);
if (sizeof(unsigned long long) > 8)
{
size += !!(value & 0xFFFFFFFFFFFFFFFF0000000000000000ull);
}
}
static const int size_table[] = { 1, 2, 4, 8, 16 };
return size_table[size];
}
g++ -O3 produces the following (verifying that the ifs are optimized out):
xor %edx,%edx
test $0xff00,%edi
setne %dl
xor %eax,%eax
test $0xffff0000,%edi
setne %al
lea 0x1(%rdx,%rax,1),%eax
movabs $0xffffffff00000000,%rdx
test %rdx,%rdi
setne %dl
lea (%rdx,%rax,1),%rax
and $0xf,%eax
mov _ZZ13get_num_bytesyE10size_table(,%rax,4),%eax
retq
Why so complicated? Here's what I came up with:
bytesNeeded = (numBits/8)+((numBits%8) != 0);
Basically numBits divided by eight + 1 if there is a remainder.
There are already a lot of answers here, but if you know the number ahead of time, in C++ you can use a template to compute it at compile time.
template <unsigned long long N>
struct RequiredBytes {
enum : int { value = 1 + (N > 255 ? RequiredBytes<(N >> 8)>::value : 0) };
};
template <>
struct RequiredBytes<0> {
enum : int { value = 1 };
};
const int REQUIRED_BYTES_18446744073709551615 = RequiredBytes<18446744073709551615ULL>::value; // 8
or for a bits version:
template <unsigned long long N>
struct RequiredBits {
enum : int { value = 1 + RequiredBits<(N >> 1)>::value };
};
template <>
struct RequiredBits<1> {
enum : int { value = 1 };
};
template <>
struct RequiredBits<0> {
enum : int { value = 1 };
};
const int REQUIRED_BITS_42 = RequiredBits<42>::value; // 6