Getting and setting bits in a byte in C/C++ - c++

I've created a function that enables you to get a range of bits in a byte, counting bit 0 for the least significant bit (LSb) on the right and 7 for the most significant bit (MSb) on the left for code in C or C++. However, it's not so straightforward to set bits. This can be generalized for short, int, long etc., but for now I'm sticking with bytes.
Given the following:
#include <stdio.h>
#typedef unsigned char BYTE;
BYTE getByteBits(BYTE n, BYTE b) {
return (n < 8) ? b & ((0x01 << (n + 1)) - 1) : 0;
}
where the bits we want to extract are between bits 0 and n, and b is the full byte. If n is 0 only 0 or 1 is returned for this LSb, if n is 1, values between 0 and 3 can be returned, etc., up to n is 7 for the full byte. Of course in the latter case the code is redundant.
This can be called from main() by using something like:
BYTE num = getByteBits(4, myByte);
which will return the numerical value from the 4 lowest bits. However, I've discovered that this can be generalized to:
BYTE num = getByteBits(n - m, myByte >> m);
which will extract the value returned by the bits from m to n, such that 0 <= m <= n <= 7. This is done by shifting myByte by m bits to the right, which is equivalent to shifting the mask to the left.
However, so far I've been unable to do something similar to set bits using a function with three arguments. The best I can do is to create the function:
BYTE setByteBits(BYTE n, BYTE m, BYTE b, BYTE c) {
BYTE mask = ((0x01 << (n + 1)) - 1) << m;
return (b & ~mask) | (c & mask);
}
where n and m are the same as before, b is the value of the original byte, and c is the value of some of the bits we want to change. The update byte is returned. This would be called as:
BYTE num = setByteBits(n - m, m, MyByte, c << m);
but 4 rather than 3 arguments are needed, and rather than shifting myByte by m bits to the right, the mask is shifted to the left together with its complement in the function, as given be the 2nd argument m. Although as far as I can see this works correctly, so far all attempts to do something similar as getByteBits() with 3 arguments have failed.
Does anybody have any idea about this? Also I would appreciate if any bugs can be found in the code. Incidentally, for setByteBits() I made use of the link:
How do you set only certain bits of a byte in C without affecting the rest?
Many thanks.

so far I've been unable to do something similar to set bits using a function with three arguments
I do not understand your code, because you use so many m n c b one letter variables. Maybe try to be more descriptive? You don't have to write short code, there is no performance gain in that.
When n is the stopping bit position inside the byte, 1 << (n + 1) will give you too big mask. You have to shift 1 of the length of the range of bits, and then that mask shift to the left by the start position. I mixed n with m so many times in your code, I do not know which is which.
The following works, at least for tests I tried:
#include <stdio.h>
#include <stdint.h>
#include <assert.h>
#include <limits.h>
typedef unsigned char byte;
#define BYTE_BITS CHAR_BIT
typedef uint_fast8_t bitpos;
/**
* Set bits in the byte `thebyte`
* between the range of bit `rangestart` inclusive to `rangestop` exclusive
* counting from 0 from LSB
* the the range of these bits inside `masktoset`.
*/
byte setByteBits(bitpos rangestart,
bitpos rangestop,
byte thebyte,
byte masktoset) {
assert(rangestop <= BYTE_BITS);
assert(rangestart < rangestop);
const bitpos rangelen = rangestop - rangestart;
const byte rangemask = (1u << rangelen) - 1u;
const byte mask = rangemask << rangestart;
return (thebyte & ~mask) | (masktoset & mask);
}
int main() {
const int tests[][5] = {
{ 4,6,0,0xff,0b00110000 },
{ 4,6,0,0b01010101,0b00010000 },
{ 4,6,0,0b01100101,0b00100000 },
{ 4,6,0xff,0b01000101,0b11001111 },
{ 0,8,0,0xff,0xff },
{ 0,8,0,0xab,0xab },
{ 0,4,0xfa,0xab,0xfb },
{ 0,4,0xab,0xcd,0xad },
{ 4,8,0xab,0xcd,0xcb },
{ 4,8,0xef,0xab,0xaf },
};
for (size_t i = 0; i <sizeof(tests)/sizeof(*tests); ++i) {
const int *const t = tests[i];
const byte r = setByteBits(t[0],t[1],t[2],t[3]);
fprintf(stderr, "%d (%#02x,%#02x,%#02x,%#02x)->%#02x ?= %#02x\n",
i,t[0],t[1],t[2],t[3],r,t[4]);
assert(r == t[4]);
}
}

Related

How shifting operator works in finding number of different bit in two integer?

i was trying to find out number of different bit in two number. i find a solution here but couldn't understand how it works.it right shifting with i and and doing and with 1. actually what is happening behind it? and why do loop through 32?
void solve(int A, int B)
{
int count = 0;
// since, the numbers are less than 2^31
// run the loop from '0' to '31' only
for (int i = 0; i < 32; i++) {
// right shift both the numbers by 'i' and
// check if the bit at the 0th position is different
if (((A >> i) & 1) != ((B >> i) & 1)) {
count++;
}
}
cout << "Number of different bits : " << count << endl;
}
The loop runs from 0 up to and including 31 (not through 32) because these are all of the possible bits that comprise a 32-bit integer and we need to check them all.
Inside the loop, the code
if (((A >> i) & 1) != ((B >> i) & 1)) {
count++;
}
works by shifting each of the two integers rightward by i (cutting off bits if i > 0), extracting the rightmost bit after the shift (& 1) and checking that they're the same (i.e. both 0 or both 1).
Let's walk through an example: solve(243, 2182). In binary:
243 = 11110011
2182 = 100010000110
diff bits = ^ ^^^ ^ ^
int bits = 00000000000000000000000000000000
i = 31 0
<-- loop direction
The indices of i that yield differences are 0, 2, 4, 5, 6 and 11 (we check from the right to the left--in the first iteration, i = 0 and nothing gets shifted, so & 1 gives us the rightmost bit, etc). The padding to the left of each number is all 0s in the above example.
Also, note that there are better ways to do this without a loop: take the XOR of the two numbers and run a popcount on them (count the bits that are set):
__builtin_popcount(243 ^ 2182); // => 6
Or, more portably:
std::bitset<CHAR_BIT * sizeof(int)>(243 ^ 2182).count()
Another note: best to avoid using namespace std;, return a value instead of producing a print side effect and give the method a clearer name than solve, for example bit_diff (I realize this is from geeksforgeeks).

How to random flip binary bit of char in C/C++

If I have a char array A, I use it to store hex
A = "0A F5 6D 02" size=11
The binary representation of this char array is:
00001010 11110101 01101101 00000010
I want to ask is there any function can random flip the bit?
That is:
if the parameter is 5
00001010 11110101 01101101 00000010
-->
10001110 11110001 01101001 00100010
it will random choose 5 bit to flip.
I am trying make this hex data to binary data and use bitmask method to achieve my requirement. Then turn it back to hex. I am curious is there any method to do this job more quickly?
Sorry, my question description is not clear enough. In simply, I have some hex data, and I want to simulate bit error in these data. For example, if I have 5 byte hex data:
"FF00FF00FF"
binary representation is
"1111111100000000111111110000000011111111"
If the bit error rate is 10%. Then I want to make these 40 bits have 4 bits error. One extreme random result: error happened in the first 4 bit:
"0000111100000000111111110000000011111111"
First of all, find out which char the bit represents:
param is your bit to flip...
char *byteToWrite = &A[sizeof(A) - (param / 8) - 1];
So that will give you a pointer to the char at that array offset (-1 for 0 array offset vs size)
Then get modulus (or more bit shifting if you're feeling adventurous) to find out which bit in here to flip:
*byteToWrite ^= (1u << param % 8);
So that should result for a param of 5 for the byte at A[10] to have its 5th bit toggled.
store the values of 2^n in an array
generate a random number seed
loop through x times (in this case 5) and go data ^= stored_values[random_num]
Alternatively to storing the 2^n values in an array, you could do some bit shifting to a random power of 2 like:
data ^= (1<<random%7)
Reflecting the first comment, you really could just write out that line 5 times in your function and avoid the overhead of a for loop entirely.
You have 32 bit number. You can treate the bits as parts of hte number and just xor this number with some random 5-bits-on number.
int count_1s(int )
{
int m = 0x55555555;
int r = (foo&m) + ((foo>>>1)&m);
m = 0x33333333;
r = (r&m) + ((r>>>2)&m);
m = 0x0F0F0F0F;
r = (r&m) + ((r>>>4)&m);
m = 0x00FF00FF;
r = (r&m) + ((r>>>8)&m);
m = 0x0000FFFF;
return r = (r&m) + ((r>>>16)&m);
}
void main()
{
char input[] = "0A F5 6D 02";
char data[4] = {};
scanf("%2x %2x %2x %2x", &data[0], &data[1], &data[2], &data[3]);
int *x = reinterpret_cast<int*>(data);
int y = rand();
while(count_1s(y) != 5)
{
y = rand(); // let's have this more random
}
*x ^= y;
printf("%2x %2x %2x %2x" data[0], data[1], data[2], data[3]);
return 0;
}
I see no reason to convert the entire string back and forth from and to hex notation. Just pick a random character out of the hex string, convert this to a digit, change it a bit, convert back to hex character.
In plain C:
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
int main (void)
{
char *hexToDec_lookup = "0123456789ABCDEF";
char hexstr[] = "0A F5 6D 02";
/* 0. make sure we're fairly random */
srand(time(0));
/* 1. loop 5 times .. */
int i;
for (i=0; i<5; i++)
{
/* 2. pick a random hex digit
we know it's one out of 8, grouped per 2 */
int hexdigit = rand() & 7;
hexdigit += (hexdigit>>1);
/* 3. convert the digit to binary */
int hexvalue = hexstr[hexdigit] > '9' ? hexstr[hexdigit] - 'A'+10 : hexstr[hexdigit]-'0';
/* 4. flip a random bit */
hexvalue ^= 1 << (rand() & 3);
/* 5. write it back into position */
hexstr[hexdigit] = hexToDec_lookup[hexvalue];
printf ("[%s]\n", hexstr);
}
return 0;
}
It might even be possible to omit the convert-to-and-from-ASCII steps -- flip a bit in the character string, check if it's still a valid hex digit and if necessary, adjust.
First randomly chose x positions (each position consist of array index and the bit position).
Now if you want to flip ith bit from right for a number n. Find the remainder of n by 2n as :
code:
int divisor = (2,i);
int remainder = n % divisor;
int quotient = n / divisor;
remainder = (remainder == 0) ? 1 : 0; // flip the remainder or the i th bit from right.
n = divisor * quotient + remainder;
Take mod 8 of input(5%8)
Shift 0x80 to right by input value (e.g 5)
XOR this value with (input/8)th element of your character array.
code:
void flip_bit(int bit)
{
Array[bit/8] ^= (0x80>>(bit%8));
}

Get Integer From Bits Inside `std::vector<char>`

I have a vector<char> and I want to be able to get an unsigned integer from a range of bits within the vector. E.g.
And I can't seem to be able to write the correct operations to get the desired output. My intended algorithm goes like this:
& the first byte with (0xff >> unused bits in byte on the left)
<< the result left the number of output bytes * number of bits in a byte
| this with the final output
For each subsequent byte:
<< left by the (byte width - index) * bits per byte
| this byte with the final output
| the final byte (not shifted) with the final output
>> the final output by the number of unused bits in the byte on the right
And here is my attempt at coding it, which does not give the correct result:
#include <vector>
#include <iostream>
#include <cstdint>
#include <bitset>
template<class byte_type = char>
class BitValues {
private:
std::vector<byte_type> bytes;
public:
static const auto bits_per_byte = 8;
BitValues(std::vector<byte_type> bytes) : bytes(bytes) {
}
template<class return_type>
return_type get_bits(int start, int end) {
auto byte_start = (start - (start % bits_per_byte)) / bits_per_byte;
auto byte_end = (end - (end % bits_per_byte)) / bits_per_byte;
auto byte_width = byte_end - byte_start;
return_type value = 0;
unsigned char first = bytes[byte_start];
first &= (0xff >> start % 8);
return_type first_wide = first;
first_wide <<= byte_width;
value |= first_wide;
for(auto byte_i = byte_start + 1; byte_i <= byte_end; byte_i++) {
auto byte_offset = (byte_width - byte_i) * bits_per_byte;
unsigned char next_thin = bytes[byte_i];
return_type next_byte = next_thin;
next_byte <<= byte_offset;
value |= next_byte;
}
value >>= (((byte_end + 1) * bits_per_byte) - end) % bits_per_byte;
return value;
}
};
int main() {
BitValues<char> bits(std::vector<char>({'\x78', '\xDA', '\x05', '\x5F', '\x8A', '\xF1', '\x0F', '\xA0'}));
std::cout << bits.get_bits<unsigned>(15, 29) << "\n";
return 0;
}
(In action: http://coliru.stacked-crooked.com/a/261d32875fcf2dc0)
I just can't seem to wrap my head around these bit manipulations, and I find debugging very difficult! If anyone can correct the above code, or help me in any way, it would be much appreciated!
Edit:
My bytes are 8 bits long
The integer to return could be 8,16,32 or 64 bits wside
The integer is stored in big endian
You made two primary mistakes. The first is here:
first_wide <<= byte_width;
You should be shifting by a bit count, not a byte count. Corrected code is:
first_wide <<= byte_width * bits_per_byte;
The second mistake is here:
auto byte_offset = (byte_width - byte_i) * bits_per_byte;
It should be
auto byte_offset = (byte_end - byte_i) * bits_per_byte;
The value in parenthesis needs to be the number of bytes to shift right by, which is also the number of bytes byte_i is away from the end. The value byte_width - byte_i has no semantic meaning (one is a delta, the other is an index)
The rest of the code is fine. Though, this algorithm has two issues with it.
First, when using your result type to accumulate bits, you assume you have room on the left to spare. This isn't the case if there are set bits near the right boundry and the choice of range causes the bits to be shifted out. For example, try running
bits.get_bits<uint16_t>(11, 27);
You'll get the result 42 which corresponds to the bit string 00000000 00101010 The correct result is 53290 with the bit string 11010000 00101010. Notice how the rightmost 4 bits got zeroed out. This is because you start off by overshifting your value variable, causing those four bits to be shifted out of the variable. When shifting back at the end, this results in the bits being zeroed out.
The second problem has to do with the right shift at the end. If the rightmost bit of the value variable happens to be a 1 before the right shift at the end, and the template parameter is a signed type, then the right shift that is done is an 'arithmetic' right shift, which causes bits on the right to be 1-filled, leaving you with an incorrect negative value.
Example, try running:
bits.get_bits<int16_t>(5, 21);
The expected result should be 6976 with the bit string 00011011 01000000, but the current implementation returns -1216 with the bit string 11111011 01000000.
I've put my implementation of this below which builds the bit string from the right to the left, placing bits in their correct positions to start with so that the above two problems are avoided:
template<class ReturnType>
ReturnType get_bits(int start, int end) {
int max_bits = kBitsPerByte * sizeof(ReturnType);
if (end - start > max_bits) {
start = end - max_bits;
}
int inclusive_end = end - 1;
int byte_start = start / kBitsPerByte;
int byte_end = inclusive_end / kBitsPerByte;
// Put in the partial-byte on the right
uint8_t first = bytes_[byte_end];
int bit_offset = (inclusive_end % kBitsPerByte);
first >>= 7 - bit_offset;
bit_offset += 1;
ReturnType ret = 0 | first;
// Add the rest of the bytes
for (int i = byte_end - 1; i >= byte_start; i--) {
ReturnType tmp = (uint8_t) bytes_[i];
tmp <<= bit_offset;
ret |= tmp;
bit_offset += kBitsPerByte;
}
// Mask out the partial byte on the left
int shift_amt = (end - start);
if (shift_amt < max_bits) {
ReturnType mask = (1 << shift_amt) - 1;
ret &= mask;
}
}
There is one thing you certainly missed I think: the way you index the bits in the vector is different from what you have been given in the problem. I.e. with algorithm you outlined, the order of the bits will be like 7 6 5 4 3 2 1 0 | 15 14 13 12 11 10 9 8 | 23 22 21 .... Frankly, I didn't read through your whole algorithm, but this one was missed in the very first step.
Interesting problem. I've done similar, for some systems work.
Your char is 8 bits wide? Or 16? How big is your integer? 32 or 64?
Ignore the vector complexity for a minute.
Think about it as just an array of bits.
How many bits do you have? You have 8*number of chars
You need to calculate a starting char, number of bits to extract, ending char, number of bits there, and number of chars in the middle.
You will need bitwise-and & for the first partial char
you will need bitwise-and & for the last partial char
you will need left-shift << (or right-shift >>), depending upon which order you start from
what is the endian-ness of your Integer?
At some point you will calculate an index into your array that is bitindex/char_bit_width, you gave the value 171 as your bitindex, and 8 as your char_bit_width, so you will end up with these useful values calculated:
171/8 = 23 //location of first byte
171%8 = 3 //bits in first char/byte
8 - 171%8 = 5 //bits in last char/byte
sizeof(integer) = 4
sizeof(integer) + ( (171%8)>0?1:0 ) // how many array positions to examine
Some assembly required...

How to read/write arbitrary bits in C/C++

Assuming I have a byte b with the binary value of 11111111
How do I for example read a 3 bit integer value starting at the second bit or write a four bit integer value starting at the fifth bit?
Some 2+ years after I asked this question I'd like to explain it the way I'd want it explained back when I was still a complete newb and would be most beneficial to people who want to understand the process.
First of all, forget the "11111111" example value, which is not really all that suited for the visual explanation of the process. So let the initial value be 10111011 (187 decimal) which will be a little more illustrative of the process.
1 - how to read a 3 bit value starting from the second bit:
___ <- those 3 bits
10111011
The value is 101, or 5 in decimal, there are 2 possible ways to get it:
mask and shift
In this approach, the needed bits are first masked with the value 00001110 (14 decimal) after which it is shifted in place:
___
10111011 AND
00001110 =
00001010 >> 1 =
___
00000101
The expression for this would be: (value & 14) >> 1
shift and mask
This approach is similar, but the order of operations is reversed, meaning the original value is shifted and then masked with 00000111 (7) to only leave the last 3 bits:
___
10111011 >> 1
___
01011101 AND
00000111
00000101
The expression for this would be: (value >> 1) & 7
Both approaches involve the same amount of complexity, and therefore will not differ in performance.
2 - how to write a 3 bit value starting from the second bit:
In this case, the initial value is known, and when this is the case in code, you may be able to come up with a way to set the known value to another known value which uses less operations, but in reality this is rarely the case, most of the time the code will know neither the initial value, nor the one which is to be written.
This means that in order for the new value to be successfully "spliced" into byte, the target bits must be set to zero, after which the shifted value is "spliced" in place, which is the first step:
___
10111011 AND
11110001 (241) =
10110001 (masked original value)
The second step is to shift the value we want to write in the 3 bits, say we want to change that from 101 (5) to 110 (6)
___
00000110 << 1 =
___
00001100 (shifted "splice" value)
The third and final step is to splice the masked original value with the shifted "splice" value:
10110001 OR
00001100 =
___
10111101
The expression for the whole process would be: (value & 241) | (6 << 1)
Bonus - how to generate the read and write masks:
Naturally, using a binary to decimal converter is far from elegant, especially in the case of 32 and 64 bit containers - decimal values get crazy big. It is possible to easily generate the masks with expressions, which the compiler can efficiently resolve during compilation:
read mask for "mask and shift": ((1 << fieldLength) - 1) << (fieldIndex - 1), assuming that the index at the first bit is 1 (not zero)
read mask for "shift and mask": (1 << fieldLength) - 1 (index does not play a role here since it is always shifted to the first bit
write mask : just invert the "mask and shift" mask expression with the ~ operator
How does it work (with the 3bit field beginning at the second bit from the examples above)?
00000001 << 3
00001000 - 1
00000111 << 1
00001110 ~ (read mask)
11110001 (write mask)
The same examples apply to wider integers and arbitrary bit width and position of the fields, with the shift and mask values varying accordingly.
Also note that the examples assume unsigned integer, which is what you want to use in order to use integers as portable bit-field alternative (regular bit-fields are in no way guaranteed by the standard to be portable), both left and right shift insert a padding 0, which is not the case with right shifting a signed integer.
Even easier:
Using this set of macros (but only in C++ since it relies on the generation of member functions):
#define GETMASK(index, size) ((((size_t)1 << (size)) - 1) << (index))
#define READFROM(data, index, size) (((data) & GETMASK((index), (size))) >> (index))
#define WRITETO(data, index, size, value) ((data) = (((data) & (~GETMASK((index), (size)))) | (((value) << (index)) & (GETMASK((index), (size))))))
#define FIELD(data, name, index, size) \
inline decltype(data) name() const { return READFROM(data, index, size); } \
inline void set_##name(decltype(data) value) { WRITETO(data, index, size, value); }
You could go for something as simple as:
struct A {
uint bitData;
FIELD(bitData, one, 0, 1)
FIELD(bitData, two, 1, 2)
};
And have the bit fields implemented as properties you can easily access:
A a;
a.set_two(3);
cout << a.two();
Replace decltype with gcc's typeof pre-C++11.
You need to shift and mask the value, so for example...
If you want to read the first two bits, you just need to mask them off like so:
int value = input & 0x3;
If you want to offset it you need to shift right N bits and then mask off the bits you want:
int value = (intput >> 1) & 0x3;
To read three bits like you asked in your question.
int value = (input >> 1) & 0x7;
just use this and feelfree:
#define BitVal(data,y) ( (data>>y) & 1) /** Return Data.Y value **/
#define SetBit(data,y) data |= (1 << y) /** Set Data.Y to 1 **/
#define ClearBit(data,y) data &= ~(1 << y) /** Clear Data.Y to 0 **/
#define TogleBit(data,y) (data ^=BitVal(y)) /** Togle Data.Y value **/
#define Togle(data) (data =~data ) /** Togle Data value **/
for example:
uint8_t number = 0x05; //0b00000101
uint8_t bit_2 = BitVal(number,2); // bit_2 = 1
uint8_t bit_1 = BitVal(number,1); // bit_1 = 0
SetBit(number,1); // number = 0x07 => 0b00000111
ClearBit(number,2); // number =0x03 => 0b0000011
You have to do a shift and mask (AND) operation.
Let b be any byte and p be the index (>= 0) of the bit from which you want to take n bits (>= 1).
First you have to shift right b by p times:
x = b >> p;
Second you have to mask the result with n ones:
mask = (1 << n) - 1;
y = x & mask;
You can put everything in a macro:
#define TAKE_N_BITS_FROM(b, p, n) ((b) >> (p)) & ((1 << (n)) - 1)
"How do I for example read a 3 bit integer value starting at the second bit?"
int number = // whatever;
uint8_t val; // uint8_t is the smallest data type capable of holding 3 bits
val = (number & (1 << 2 | 1 << 3 | 1 << 4)) >> 2;
(I assumed that "second bit" is bit #2, i. e. the third bit really.)
To read bytes use std::bitset
const int bits_in_byte = 8;
char myChar = 's';
cout << bitset<sizeof(myChar) * bits_in_byte>(myChar);
To write you need to use bit-wise operators such as & ^ | & << >>. make sure to learn what they do.
For example to have 00100100 you need to set the first bit to 1, and shift it with the << >> operators 5 times. if you want to continue writing you just continue to set the first bit and shift it. it's very much like an old typewriter: you write, and shift the paper.
For 00100100: set the first bit to 1, shift 5 times, set the first bit to 1, and shift 2 times:
const int bits_in_byte = 8;
char myChar = 0;
myChar = myChar | (0x1 << 5 | 0x1 << 2);
cout << bitset<sizeof(myChar) * bits_in_byte>(myChar);
int x = 0xFF; //your number - 11111111
How do I for example read a 3 bit integer value starting at the second bit
int y = x & ( 0x7 << 2 ) // 0x7 is 111
// and you shift it 2 to the left
If you keep grabbing bits from your data, you might want to use a bitfield. You'll just have to set up a struct and load it with only ones and zeroes:
struct bitfield{
unsigned int bit : 1
}
struct bitfield *bitstream;
then later on load it like this (replacing char with int or whatever data you are loading):
long int i;
int j, k;
unsigned char c, d;
bitstream=malloc(sizeof(struct bitfield)*charstreamlength*sizeof(char));
for (i=0; i<charstreamlength; i++){
c=charstream[i];
for(j=0; j < sizeof(char)*8; j++){
d=c;
d=d>>(sizeof(char)*8-j-1);
d=d<<(sizeof(char)*8-1);
k=d;
if(k==0){
bitstream[sizeof(char)*8*i + j].bit=0;
}else{
bitstream[sizeof(char)*8*i + j].bit=1;
}
}
}
Then access elements:
bitstream[bitpointer].bit=...
or
...=bitstream[bitpointer].bit
All of this is assuming are working on i86/64, not arm, since arm can be big or little endian.

algorithm to figure out how many bytes are required to hold an int

sorry for the stupid question, but how would I go about figuring out, mathematically or using c++, how many bytes it would take to store an integer.
If you mean from an information theory point of view, then the easy answer is:
log(number) / log(2)
(It doesn't matter if those are natural, binary, or common logarithms, because of the division by log(2), which calculates the logarithm with base 2.)
This reports the number of bits necessary to store your number.
If you're interested in how much memory is required for the efficient or usual encoding of your number in a specific language or environment, you'll need to do some research. :)
The typical C and C++ ranges for integers are:
char 1 byte
short 2 bytes
int 4 bytes
long 8 bytes
If you're interested in arbitrary-sized integers, special libraries are available, and every library will have its own internal storage mechanism, but they'll typically store numbers via 4- or 8- byte chunks up to the size of the number.
You could find the first power of 2 that's larger than your number, and divide that power by 8, then round the number up to the nearest integer. So for 1000, the power of 2 is 1024 or 2^10; divide 10 by 8 to get 1.25, and round up to 2. You need two bytes to hold 1000!
If you mean "how large is an int" then sizeof(int) is the answer.
If you mean "how small a type can I use to store values of this magnitude" then that's a bit more complex. If you already have the value in integer form, then presumably it fits in 4, 3, 2, or 1 bytes. For unsigned values, if it's 16777216 or over you need 4 bytes, 65536-16777216 requires 3 bytes, 256-65535 needs 2, and 0-255 fits in 1 byte. The formula for this comes from the fact that each byte can hold 8 bits, and each bit holds 2 digits, so 1 byte holds 2^8 values, ie. 256 (but starting at 0, so 0-255). 2 bytes therefore holds 2^16 values, ie. 65536, and so on.
You can generalise that beyond the normal 4 bytes used for a typical int if you like. If you need to accommodate signed integers as well as unsigned, bear in mind that 1 bit is effectively used to store whether it is positive or negative, so the magnitude is 1 power of 2 less.
You can calculate the number of bits you need iteratively from an integer by dividing it by two and discarding the remainder. Each division you can make and still have a non-zero value means you have one more bit of data in use - and every 8 bits you're using means 1 byte.
A quick way of calculating this is to use the shift right function and compare the result against zero.
int value = 23534; // or whatever
int bits = 0;
while (value)
{
value >> 1;
++bits;
}
std::cout << "Bits used = " << bits << std::endl;
std::cout << "Bytes used = " << (bits / 8) + 1 << std::endl;
This is basically the same question as "how many binary digits would it take to store a number x?" All you need is the logarithm.
A n-bit integer can store numbers up to 2n-1. So, given a number x, ceil(log2 x) gets you the number of digits you need.
It's exactly the same thing as figuring out how many decimal digits you need to write a number by hand. For example, log10 123456 = 5.09151220... , so ceil( log10(123456) ) = 6, six digits.
Since nobody put up the simplest code that works yet, I mind as well do it:
unsigned int get_number_of_bytes_needed(unsigned int N) {
unsigned int bytes = 0;
while(N) {
N >>= 8;
++bytes;
};
return bytes;
};
assuming sizeof(long int) = 4.
int nbytes( long int x )
{
unsigned long int n = (unsigned long int) x;
if (n <= 0xFFFF)
{
if (n <= 0xFF) return 1;
else return 2;
}
else
{
if (n <= 0xFFFFFF) return 3;
else return 4;
}
}
The shortest code way to do this is as follows:
int bytes = (int)Math.Log(num, 256) + 1;
The code is small enough to be inlined, which helps offset the "slow" FP code. Also, there are no branches, which can be expensive.
Try this code:
// works for num >= 0
int numberOfBytesForNumber(int num) {
if (num < 0)
return 0;
else if (num == 0)
return 1;
else if (num > 0) {
int n = 0;
while (num != 0) {
num >>= 8;
n++;
}
return n;
}
}
/**
* assumes i is non-negative.
* note that this returns 0 for 0, when perhaps it should be special cased?
*/
int numberOfBytesForNumber(int i) {
int bytes = 0;
int div = 1;
while(i / div) {
bytes++;
div *= 256;
}
if(i % 8 == 0) return bytes;
return bytes + 1;
}
This code runs at 447 million tests / sec on my laptop where i = 1 to 1E9. i is a signed int:
n = (i > 0xffffff || i < 0) ? 4 : (i < 0xffff) ? (i < 0xff) ? 1 : 2 : 3;
Python example: no logs or exponents, just bit shift.
Note: 0 counts as 0 bits and only positive ints are valid.
def bits(num):
"""Return the number of bits required to hold a int value."""
if not isinstance(num, int):
raise TypeError("Argument must be of type int.")
if num < 0:
raise ValueError("Argument cannot be less than 0.")
for i in count(start=0):
if num == 0:
return i
num = num >> 1