Convert Byte Array into Bitset

Convert Byte Array into Bitset - c++

I have a byte array generated by a random number generator. I want to put this into the STL bitset.
Unfortunately, it looks like Bitset only supports the following constructors:
A string of 1's and 0's like "10101011"
An unsigned long. (my byte array will be longer)
The only solution I can think of now is to read the byte array bit by bit and make a string of 1's and 0's. Does anyone have a more efficient solution?

Something like this?
#include <bitset>
#include <climits>
template<size_t numBytes>
std::bitset<numBytes * CHAR_BIT> bytesToBitset(uint8_t *data)
{
std::bitset<numBytes * CHAR_BIT> b;
for(int i = 0; i < numBytes; ++i)
{
uint8_t cur = data[i];
int offset = i * CHAR_BIT;
for(int bit = 0; bit < CHAR_BIT; ++bit)
{
b[offset] = cur & 1;
++offset; // Move to next bit in b
cur >>= 1; // Move to next bit in array
}
}
return b;
}
And an example usage:
int main()
{
std::array<uint8_t, 4> bytes = { 0xDE, 0xAD, 0xBE, 0xEF };
auto bits = bytesToBitset<bytes.size()>(bytes.data());
std::cout << bits << std::endl;
}

There's a 3rd constructor for bitset<> - it takes no parameters and sets all the bits to 0. I think you'll need to use that then walk through the array calling set() for each bit in the byte array that's a 1.
A bit brute-force, but it'll work. There will be a bit of complexity to convert the byte-index and bit offset within each byte to a bitset index, but it's nothing a little bit of thought (and maybe a run through under the debugger) won't solve. I think it's most likely simpler and more efficient than trying to run the array through a string conversion or a stream.

I have spent a lot of time by writing a reverse function (bitset -> byte/char array). There it is:
bitset<SIZE> data = ...
// bitset to char array
char current = 0;
int offset = 0;
for (int i = 0; i < SIZE; ++i) {
if (data[i]) { // if bit is true
current |= (char)(int)pow(2, i - offset * CHAR_BIT); // set that bit to true in current masked value
} // otherwise let it to be false
if ((i + 1) % CHAR_BIT == 0) { // every 8 bits
buf[offset++] = current; // save masked value to buffer & raise offset of buffer
current = 0; // clear masked value
}
}
// now we have the result in "buf" (final size of contents in buffer is "offset")

Here is my implementation using template meta-programming.
Loops are done in the compile-time.
I took #strager version, modified it in order to prepare for TMP:
changed order of iteration (so that I could make recursion from it);
reduced number of used variables.
Modified version with loops in a run-time:
template <size_t nOfBytes>
void bytesToBitsetRunTimeOptimized(uint8_t* arr, std::bitset<nOfBytes * CHAR_BIT>& result) {
for(int i = nOfBytes - 1; i >= 0; --i) {
for(int bit = 0; bit < CHAR_BIT; ++bit) {
result[i * CHAR_BIT + bit] = ((arr[i] >> bit) & 1);
}
}
}
TMP version based on it:
template<size_t nOfBytes, int I, int BIT> struct LoopOnBIT {
static inline void bytesToBitset(uint8_t* arr, std::bitset<nOfBytes * CHAR_BIT>& result) {
result[I * CHAR_BIT + BIT] = ((arr[I] >> BIT) & 1);
LoopOnBIT<nOfBytes, I, BIT+1>::bytesToBitset(arr, result);
}
};
// stop case for LoopOnBIT
template<size_t nOfBytes, int I> struct LoopOnBIT<nOfBytes, I, CHAR_BIT> {
static inline void bytesToBitset(uint8_t* arr, std::bitset<nOfBytes * CHAR_BIT>& result) { }
};
template<size_t nOfBytes, int I> struct LoopOnI {
static inline void bytesToBitset(uint8_t* arr, std::bitset<nOfBytes * CHAR_BIT>& result) {
LoopOnBIT<nOfBytes, I, 0>::bytesToBitset(arr, result);
LoopOnI<nOfBytes, I-1>::bytesToBitset(arr, result);
}
};
// stop case for LoopOnI
template<size_t nOfBytes> struct LoopOnI<nOfBytes, -1> {
static inline void bytesToBitset(uint8_t* arr, std::bitset<nOfBytes * CHAR_BIT>& result) { }
};
template <size_t nOfBytes>
void bytesToBitset(uint8_t* arr, std::bitset<nOfBytes * CHAR_BIT>& result) {
LoopOnI<nOfBytes, nOfBytes - 1>::bytesToBitset(arr, result);
}
client code:
uint8_t arr[]={0x6A};
std::bitset<8> b;
bytesToBitset<1>(arr,b);

Well, let's be honest, I was bored and started to think there had to be a slightly faster way than setting each bit.
template<int numBytes>
std::bitset<numBytes * CHARBIT bytesToBitset(byte *data)
{
std::bitset<numBytes * CHAR_BIT> b = *data;
for(int i = 1; i < numBytes; ++i)
{
b <<= CHAR_BIT; // Move to next bit in array
b |= data[i]; // Set the lowest CHAR_BIT bits
}
return b;
}
This is indeed slightly faster, at least as long as the byte array is smaller than 30 elements (depending on your optimization-flags passed to compiler). Larger array than that and the time used by shifting the bitset makes setting each bit faster.

you can initialize the bitset from a stream. I can't remember how to wrangle a byte[] into a stream, but...
from http://www.sgi.com/tech/stl/bitset.html
bitset<12> x;
cout << "Enter a 12-bit bitset in binary: " << flush;
if (cin >> x) {
cout << "x = " << x << endl;
cout << "As ulong: " << x.to_ulong() << endl;
cout << "And with mask: " << (x & mask) << endl;
cout << "Or with mask: " << (x | mask) << endl;
}

Related

Accessing 8-bit data as 7-bit

I have an array of 100 uint8_t's, which is to be treated as a stream of 800 bits, and dealt with 7 bits at a time. So in other words, if the first element of the 8-bit array holds 0b11001100 and the second holds ob11110000 then when I come to read it in 7-bit format, the first element of the 7-bit array would be 0b1100110 and the second would be 0b0111100 with the remaining 2 bits being held in the 3rd.
The first thing I tried was a union...
struct uint7_t {
uint8_t i1:7;
};
union uint7_8_t {
uint8_t u8[100];
uint7_t u7[115];
};
but of course everything's byte aligned and I essentially end up simply loosing the 8th bit of each element.
Does anyone have any idea's on how I can go about doing this?
Just to be clear, this is something of a visual representation of the result of the union:
xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx 32 bits of 8 bit data
0xxxxxxx 0xxxxxxx 0xxxxxxx 0xxxxxxx 32 bits of 7-bit data.
And this represents what it is that I want to do instead:
xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx 32 bits of 8 bit data
xxxxxxx xxxxxxx xxxxxxx xxxxxxx xxxx 32 bits of 7-bit data.
I'm aware the last bits may be padded but that's fine, I just want someway of accessing each byte 7 bits at a time without losing any of the 800 bits. So far the only way I can think of is lots of bit shifting, which of course would work but I'm sure there's a cleaner way of going about it(?)
Thanks in advance for any answers.

Not sure what you mean by "cleaner". Generally people who work on this sort of problem regularly consider shifting and masking to be the right primitive tool to use. One can do something like defining a bitstream abstraction with a method to read an arbitrary number of bits off the stream. This abstraction sometimes shows up in compression applications. The internals of the method of course do use shifting and masking.
One fairly clean approach is to write a function which extracts a 7-bit number at any bit index in an array of unsigned char's. Use a division to convert the bit index to a byte index, and modulus to get the bit index within the byte. Then shift and mask. The input bits can span two bytes, so you either have to glue together a 16-bit value before extraction, or do two smaller extractions and or them together to construct the result.
If I were aiming for something moderately performant, I'd likely take one of two approaches:
The first has two state variables saying how many bits to take from the current and next byte. It would use shifting, masking, and bitwise or, to produce the current output (a number between 0 and 127 as an int for example), then the loop would update both state variables via adding and modulus, and would increment the current byte pointers if all bits in the first byte were consumed.
The second approach is to load 56-bits (8 outputs worth of input) into a 64-bit integer and use a fully unrolled structure to extract each of the 8 outputs. Doing this without using unaligned memory reads requires constructing the 64-bit integer piecemeal. (56-bits is special because the starting bit position is byte aligned.)
To go real fast, I might try writing SIMD code in Halide. That's beyond scope here I believe. (And not clear it is going to win much actually.)
Designs which read more than one byte into a integer at a time will likely have to consider processor byte ordering.

Process them in groups of 8 (since 8x7 nicely rounds to something 8bit aligned). Bitwise operators are the order of the day here. Faffing around with the last (upto) 7 numbers is a little faffy, but not impossible. (This code assumes these are unsigned 7 bit integers! Signed conversion would require you to do consider flipping the top bit if bit[6] is 1)
// convert 8 x 7bit ints in one go
void extract8(const uint8_t input[7], uint8_t output[8])
{
output[0] = input[0] & 0x7F;
output[1] = (input[0] >> 7) | ((input[1] << 1) & 0x7F);
output[2] = (input[1] >> 6) | ((input[2] << 2) & 0x7F);
output[3] = (input[2] >> 5) | ((input[3] << 3) & 0x7F);
output[4] = (input[3] >> 4) | ((input[4] << 4) & 0x7F);
output[5] = (input[4] >> 3) | ((input[5] << 5) & 0x7F);
output[6] = (input[5] >> 2) | ((input[6] << 6) & 0x7F);
output[7] = input[6] >> 1;
}
// convert array of 7bit ints to 8bit
void seven_bit_to_8bit(const uint8_t* const input, uint8_t* const output, const size_t count)
{
size_t count8 = count >> 3;
for(size_t i = 0; i < count8; ++i)
{
extract8(input + 7 * i, output + 8 * i);
}
// handle remaining (upto) 7 bytes
const size_t countr = (count % 8);
if(countr)
{
// how many bytes do we need to copy from the input?
size_t remaining_bits = 7 * countr;
if(remaining_bits % 8)
{
// round to next nearest multiple of 8
remaining_bits += (8 - remaining_bits % 8);
}
remaining_bits /= 8;
{
uint8_t in[7] = {0}, out[8] = {0};
for(size_t i = 0; i < remaining_bits; ++i)
{
in[i] = input[count8 * 7 + i];
}
extract8(in, out);
for(size_t i = 0; i < countr; ++i)
{
output[count8 * 8 + i] = in[i];
}
}
}
}

Here is a solution that uses the vector bool specialization. It also uses a similar mechanism to allow access to the seven-bit elements via reference objects.
The member functions allow for the following operations:
uint7_t x{5}; // simple value
Arr<uint7_t> arr(10); // array of size 10
arr[0] = x; // set element
uint7_t y = arr[0]; // get element
arr.push_back(uint7_t{9}); // add element
arr.push_back(x); //
std::cout << "Array size is "
<< arr.size() << '\n'; // get size
for(auto&& i : arr)
std::cout << i << '\n'; // range-for to read values
int z{50};
for(auto&& i : arr)
i = z++; // range-for to change values
auto&& v = arr[1]; // get reference to second element
v = 99; // change second element via reference
Full program:
#include <vector>
#include <iterator>
#include <iostream>
struct uint7_t {
unsigned int i : 7;
};
struct seven_bit_ref {
size_t begin;
size_t end;
std::vector<bool>& bits;
seven_bit_ref& operator=(const uint7_t& right)
{
auto it{bits.begin()+begin};
for(int mask{1}; mask != 1 << 7; mask <<= 1)
*it++ = right.i & mask;
return *this;
}
operator uint7_t() const
{
uint7_t r{};
auto it{bits.begin() + begin};
for(int i{}; i < 7; ++i)
r.i += *it++ << i;
return r;
}
seven_bit_ref operator*()
{
return *this;
}
void operator++()
{
begin += 7;
end += 7;
}
bool operator!=(const seven_bit_ref& right)
{
return !(begin == right.begin && end == right.end);
}
seven_bit_ref operator=(int val)
{
uint7_t temp{};
temp.i = val;
operator=(temp);
return *this;
}
};
template<typename T>
class Arr;
template<>
class Arr<uint7_t> {
public:
Arr(size_t size) : bits(size * 7, false) {}
seven_bit_ref operator[](size_t index)
{
return {index * 7, index * 7 + 7, bits};
}
size_t size()
{
return bits.size() / 7;
}
void push_back(uint7_t val)
{
for(int mask{1}; mask != 1 << 7; mask <<= 1){
bits.push_back(val.i & mask);
}
}
seven_bit_ref begin()
{
return {0, 7, bits};
}
seven_bit_ref end()
{
return {size() * 7, size() * 7 + 7, bits};
}
std::vector<bool> bits;
};
std::ostream& operator<<(std::ostream& os, uint7_t val)
{
os << val.i;
return os;
}
int main()
{
uint7_t x{5}; // simple value
Arr<uint7_t> arr(10); // array of size 10
arr[0] = x; // set element
uint7_t y = arr[0]; // get element
arr.push_back(uint7_t{9}); // add element
arr.push_back(x); //
std::cout << "Array size is "
<< arr.size() << '\n'; // get size
for(auto&& i : arr)
std::cout << i << '\n'; // range-for to read values
int z{50};
for(auto&& i : arr)
i = z++; // range-for to change values
auto&& v = arr[1]; // get reference
v = 99; // change via reference
std::cout << "\nAfter changes:\n";
for(auto&& i : arr)
std::cout << i << '\n';
}

The following code works as you have asked for it, but first the output and live example on ideone.
Output:
Before changing values...:
7 bit representation: 1111111 0000000 0000000 0000000 0000000 0000000 0000000 0000000
8 bit representation: 11111110 00000000 00000000 00000000 00000000 00000000 00000000
After changing values...:
7 bit representation: 1000000 1001100 1110010 1011010 1010100 0000111 1111110 0000000
8 bit representation: 10000001 00110011 10010101 10101010 10000001 11111111 00000000
8 Bits: 11111111 to ulong: 255
7 Bits: 1111110 to ulong: 126
After changing values...:
7 bit representation: 0010000 0101010 0100000 0000000 0000000 0000000 0000000 0000000
8 bit representation: 00100000 10101001 00000000 00000000 00000000 00000000 00000000
It is very straight forward using a std::bitset in a class called BitVector. I implement one getter and setter. The getter returns also a std::bitset at the given index selIdx with a given template argument size M. The given idx will be multiplied by the given size M to get the right position. The returned bitset can also be converted to numerical or string values.
The setter uses an uint8_t value as input and again the index selIdx. The bits will be shifted to the right position into the bitset.
Further you can use the getter and setter with different sizes because of the template argument M, which means you can work with either 7 or 8 bit representation but also 3 or what ever you like.
I'm sure this code is not the best concerning speed, but I think it is a very clear and clean solution. Also it is not complete at all as there are just one getter, one setter and two constructors. Remember to implement error checking concerning indexes and sizes.
Code:
#include <iostream>
#include <bitset>
template <size_t N> class BitVector
{
private:
std::bitset<N> _data;
public:
BitVector (unsigned long num) : _data (num) { };
BitVector (const std::string& str) : _data (str) { };
template <size_t M>
std::bitset<M> getBits (size_t selIdx)
{
std::bitset<M> retBitset;
for (size_t idx = 0; idx < M; ++idx)
{
retBitset |= (_data[M * selIdx + idx] << (M - 1 - idx));
}
return retBitset;
}
template <size_t M>
void setBits (size_t selIdx, uint8_t num)
{
const unsigned char* curByte = reinterpret_cast<const unsigned char*> (&num);
for (size_t bitIdx = 0; bitIdx < 8; ++bitIdx)
{
bool bitSet = (1 == ((*curByte & (1 << (8 - 1 - bitIdx))) >> (8 - 1 - bitIdx)));
_data.set(M * selIdx + bitIdx, bitSet);
}
}
void print_7_8()
{
std:: cout << "\n7 bit representation: ";
for (size_t idx = 0; idx < (N / 7); ++idx)
{
std::cout << getBits<7>(idx) << " ";
}
std:: cout << "\n8 bit representation: ";
for (size_t idx = 0; idx < N / 8; ++idx)
{
std::cout << getBits<8>(idx) << " ";
}
}
};
int main ()
{
BitVector<56> num = 127;
std::cout << "Before changing values...:";
num.print_7_8();
num.setBits<8>(0, 0x81);
num.setBits<8>(1, 0b00110011);
num.setBits<8>(2, 0b10010101);
num.setBits<8>(3, 0xAA);
num.setBits<8>(4, 0x81);
num.setBits<8>(5, 0xFF);
num.setBits<8>(6, 0x00);
std::cout << "\n\nAfter changing values...:";
num.print_7_8();
std::cout << "\n\n8 Bits: " << num.getBits<8>(5) << " to ulong: " << num.getBits<8>(5).to_ulong();
std::cout << "\n7 Bits: " << num.getBits<7>(6) << " to ulong: " << num.getBits<7>(6).to_ulong();
num = BitVector<56>(std::string("1001010100000100"));
std::cout << "\n\nAfter changing values...:";
num.print_7_8();
return 0;
}

Here is one approach without the manual shifting. This is just a crude POC, but hopefully you will be able to get something out of it. I don't know if you are able to easily transform your input into bitset, but i think it should be possible.
int bytes = 0x01234567;
bitset<32> bs(bytes);
cout << "Input: " << bs << endl;
for(int i = 0; i < 5; i++)
{
bitset<7> slice(bs.to_string().substr(i*7, 7));
cout << slice << endl;
}
Also this is probably much less performant then the bitshifting version, so i wouldn't recommend it for heavy lifting.

You can use this to get the index'th 7-bit element from in (note that it doesn't have proper end of array handling). Simple, fast.
int get7(const uint8_t *in, int index) {
int fidx = index*7;
int idx = fidx>>3;
int sidx = fidx&7;
return (in[idx]>>sidx|in[idx+1]<<(8-sidx))&0x7f;
}

You can use direct access or bulk bit packing/unpacking as in TurboPFor:Integer Compression
// Direct read access
// b : bit width 0-16 (7 in your case)
#define bzhi32(u,b) ((u) & ((1u <<(b))-1))
static inline unsigned bitgetx16(unsigned char *in,
unsigned idx,
unsigned b) {
unsigned bidx = b*idx;
return bzhi32( *(unsigned *)((uint16_t *)in+(bidx>>4)) >> (bidx& 0xf), b );
}

Remove nth bit from buffer, and shift the rest

Giving a uint8_t buffer of x length, I am trying to come up with a function or a macro that can remove nth bit (or n to n+i), then left-shift the remaining bits.
example #1:
for input 0b76543210 0b76543210 ... then output should be 0b76543217 0b654321 ...
example #2: if the input is:
uint8_t input[8] = {
0b00110011,
0b00110011,
...
};
the output without the first bit, should be
uint8_t output[8] = {
0b00110010,
0b01100100,
...
};
I have tried the following to remove the first bit, but it did not work for the second group of bits.
/* A macro to extract (a-b) range of bits without shifting */
#define BIT_RANGE(N,x,y) ((N) & ((0xff >> (7 - (y) + (x))) << ((x))))
void removeBit0(uint8_t *n) {
for (int i=0; i < 7; i++) {
n[i] = (BIT_RANGE(n[i], i + 1, 7)) << (i + 1) |
(BIT_RANGE(n[i + 1], 1, i + 1)) << (7 - i); /* This does not extract the next element bits */
}
n[7] = 0;
}
Update #1
In my case, the input will be uint64_t number, then I will use memmov to shift it one place to the left.
Update #2
The solution can be in C/C++, assembly(x86-64) or inline assembly.

This is really 2 subproblems: remove bits from each byte and pack the results. This is the flow of the code below. I wouldn't use a macro for this. Too much going on. Just inline the function if you're worried about performance at that level.
#include <stdio.h>
#include <stdint.h>
// Remove bits n to n+k-1 from x.
unsigned scrunch_1(unsigned x, int n, int k) {
unsigned hi_bits = ~0u << n;
return (x & ~hi_bits) | ((x >> k) & hi_bits);
}
// Remove bits n to n+k-1 from each byte in the buffer,
// then pack left. Return number of packed bytes.
size_t scrunch(uint8_t *buf, size_t size, int n, int k) {
size_t i_src = 0, i_dst = 0;
unsigned src_bits = 0; // Scrunched source bit buffer.
int n_src_bits = 0; // Initially it's empty.
for (;;) {
// Get scrunched bits until the buffer has at least 8.
while (n_src_bits < 8) {
if (i_src >= size) { // Done when source bytes exhausted.
// If there are left-over bits, add one more byte to output.
if (n_src_bits > 0) buf[i_dst++] = src_bits << (8 - n_src_bits);
return i_dst;
}
// Pack 'em in.
src_bits = (src_bits << (8 - k)) | scrunch_1(buf[i_src++], n, k);
n_src_bits += 8 - k;
}
// Write the highest 8 bits of the buffer to the destination byte.
n_src_bits -= 8;
buf[i_dst++] = src_bits >> n_src_bits;
}
}
int main(void) {
uint8_t x[] = { 0xaa, 0xaa, 0xaa, 0xaa };
size_t n = scrunch(x, 4, 2, 3);
for (size_t i = 0; i < n; i++) {
printf("%x ", x[i]);
}
printf("\n");
return 0;
}
This writes b5 ad 60, which by my reckoning is correct. A few other test cases work as well.
Oops I coded it the first time shifting the wrong way, but include that here in case it's useful to someone.
#include <stdio.h>
#include <stdint.h>
// Remove bits n to n+k-1 from x.
unsigned scrunch_1(unsigned x, int n, int k) {
unsigned hi_bits = 0xffu << n;
return (x & ~hi_bits) | ((x >> k) & hi_bits);
}
// Remove bits n to n+k-1 from each byte in the buffer,
// then pack right. Return number of packed bytes.
size_t scrunch(uint8_t *buf, size_t size, int n, int k) {
size_t i_src = 0, i_dst = 0;
unsigned src_bits = 0; // Scrunched source bit buffer.
int n_src_bits = 0; // Initially it's empty.
for (;;) {
// Get scrunched bits until the buffer has at least 8.
while (n_src_bits < 8) {
if (i_src >= size) { // Done when source bytes exhausted.
// If there are left-over bits, add one more byte to output.
if (n_src_bits > 0) buf[i_dst++] = src_bits;
return i_dst;
}
// Pack 'em in.
src_bits |= scrunch_1(buf[i_src++], n, k) << n_src_bits;
n_src_bits += 8 - k;
}
// Write the lower 8 bits of the buffer to the destination byte.
buf[i_dst++] = src_bits;
src_bits >>= 8;
n_src_bits -= 8;
}
}
int main(void) {
uint8_t x[] = { 0xaa, 0xaa, 0xaa, 0xaa };
size_t n = scrunch(x, 4, 2, 3);
for (size_t i = 0; i < n; i++) {
printf("%x ", x[i]);
}
printf("\n");
return 0;
}
This writes d6 5a b. A few other test cases work as well.

Something similar to this should work:
template<typename S> void removeBit(S* buffer, size_t length, size_t index)
{
const size_t BITS_PER_UNIT = sizeof(S)*8;
// first we find which data unit contains the desired bit
const size_t unit = index / BITS_PER_UNIT;
// and which index has the bit inside the specified unit, starting counting from most significant bit
const size_t relativeIndex = (BITS_PER_UNIT - 1) - index % BITS_PER_UNIT;
// then we unset that bit
buffer[unit] &= ~(1 << relativeIndex);
// now we have to shift what's on the right by 1 position
// we create a mask such that if 0b00100000 is the bit removed we use 0b00011111 as mask to shift the rest
const S partialShiftMask = (1 << relativeIndex) - 1;
// now we keep all bits left to the removed one and we shift left all the others
buffer[unit] = (buffer[unit] & ~partialShiftMask) | ((buffer[unit] & partialShiftMask) << 1);
for (int i = unit+1; i < length; ++i)
{
//we set rightmost bit of previous unit according to last bit of current unit
buffer[i-1] |= buffer[i] >> (BITS_PER_UNIT-1);
// then we shift current unit by one
buffer[i] <<= 1;
}
}
I just tested it on some basic cases so maybe something is not exactly correct but this should move you onto the right track.

Shifting arrays of bytes and skipping bits

I'm trying to make a function that would return N number of bits of a given memory chunk, and optionally skipping M bits.
Example:
unsigned char *data = malloc(3);
data[0] = 'A'; data[1] = 'B'; data[2] = 'C';
read(data, 8, 4);
would skip 12 bits and then read 8 bits from the data chunk "ABC".
"Skipping" bits means it would actually bitshift the entire array, carrying bits from the right to the left.
In this example ABC is
01000001 01000010 01000011
and the function would need to return
0001 0100
This question is a follow up of my previous question
Minimal compilable code
#include <ios>
#include <cmath>
#include <bitset>
#include <cstdio>
#include <cstring>
#include <cstdlib>
#include <iostream>
using namespace std;
typedef unsigned char byte;
typedef struct bit_data {
byte *data;
size_t length;
} bit_data;
/*
Asume skip_n_bits will be 0 >= skip_n_bits <= 8
*/
bit_data *read(size_t n_bits, size_t skip_n_bits) {
bit_data *bits = (bit_data *) malloc(sizeof(struct bit_data));
size_t bytes_to_read = ceil(n_bits / 8.0);
size_t bytes_to_read_with_skip = ceil(n_bits / 8.0) + ceil(skip_n_bits / 8.0);
bits->data = (byte *) calloc(1, bytes_to_read);
bits->length = n_bits;
/* Hardcoded for the sake of this example*/
byte *tmp = (byte *) malloc(3);
tmp[0] = 'A'; tmp[1] = 'B'; tmp[2] = 'C';
/*not working*/
if(skip_n_bits > 0){
unsigned char *tmp2 = (unsigned char *) calloc(1, bytes_to_read_with_skip);
size_t i;
for(i = bytes_to_read_with_skip - 1; i > 0; i--) {
tmp2[i] = tmp[i] << skip_n_bits;
tmp2[i - 1] = (tmp[i - 1] << skip_n_bits) | (tmp[i] >> (8 - skip_n_bits));
}
memcpy(bits->data, tmp2, bytes_to_read);
free(tmp2);
}else{
memcpy(bits->data, tmp, bytes_to_read);
}
free(tmp);
return bits;
}
int main(void) {
//Reading "ABC"
//01000001 01000010 01000011
bit_data *res = read(8, 4);
cout << bitset<8>(*res->data);
cout << " -> Should be '00010100'";
return 0;
}
The current code returns 00000000 instead of 00010100.
I feel like the error is something small, but I'm missing it. Where is the problem?

Your code is tagged as C++, and indeed you're already using C++ constructs like bitset, however it's very C-like. The first thing to do I think would be to use more C++.
Turns out bitset is pretty flexible already. My approach would be to create one to store all the bits in our input data, and then grab a subset of that based on the number you wish to skip, and return the subset:
template<size_t N, size_t M, typename T = unsigned char>
std::bitset<N> read(size_t skip_n_bits, const std::array<T, M>& data)
{
const size_t numBits = sizeof(T) * 8;
std::bitset<N> toReturn; // initially all zeros
// if we want to skip all bits, return all zeros
if (M*numBits <= skip_n_bits)
return toReturn;
// create a bitset to store all the bits represented in our data array
std::bitset<M*numBits> tmp;
// set bits in tmp based on data
// convert T into bit representations
size_t pos = M*numBits-1;
for (const T& element : data)
{
for (size_t i=0; i < numBits; ++i)
{
tmp.set(pos-i, (1 << (numBits - i-1)) & element);
}
pos -= numBits;
}
// grab just the bits we need
size_t startBit = tmp.size()-skip_n_bits-1;
for (size_t i = 0; i < N; ++i)
{
toReturn[N-i-1] = tmp[startBit];
tmp <<= 1;
}
return toReturn;
}
Full working demo
And now we can call it like so:
// return 8-bit bitset, skip 12 bits
std::array<unsigned char, 3> data{{'A', 'B', 'C'}};
auto&& returned = read<8>(12, data);
std::cout << returned << std::endl;
Prints
00100100
which is precisely our input 01000001 01000010 01000011 skipping the first twelve bits (from the left towards the right), and only grabbing the next 8 available.
I'd argue this is a bit easier to read than what you've got, esp. from a C++ programmer's point of view.

Convert integer to binary and store it in an integer array of specified size:c++

I want to convert an integer to binary string and then store each bit of the integer string to an element of a integer array of a given size. I am sure that the input integer's binary expression won't exceed the size of the array specified. How to do this in c++?

Pseudo code:
int value = ???? // assuming a 32 bit int
int i;
for (i = 0; i < 32; ++i) {
array[i] = (value >> i) & 1;
}

template<class output_iterator>
void convert_number_to_array_of_digits(const unsigned number,
output_iterator first, output_iterator last)
{
const unsigned number_bits = CHAR_BIT*sizeof(int);
//extract bits one at a time
for(unsigned i=0; i<number_bits && first!=last; ++i) {
const unsigned shift_amount = number_bits-i-1;
const unsigned this_bit = (number>>shift_amount)&1;
*first = this_bit;
++first;
}
//pad the rest with zeros
while(first != last) {
*first = 0;
++first;
}
}
int main() {
int number = 413523152;
int array[32];
convert_number_to_array_of_digits(number, std::begin(array), std::end(array));
for(int i=0; i<32; ++i)
std::cout << array[i] << ' ';
}
Proof of compilation here

You could use C++'s bitset library, as follows.
#include<iostream>
#include<bitset>
int main()
{
int N;//input number in base 10
cin>>N;
int O[32];//The output array
bitset<32> A=N;//A will hold the binary representation of N
for(int i=0,j=31;i<32;i++,j--)
{
//Assigning the bits one by one.
O[i]=A[j];
}
return 0;
}
A couple of points to note here:
First, 32 in the bitset declaration statement tells the compiler that you want 32 bits to represent your number, so even if your number takes fewer bits to represent, the bitset variable will have 32 bits, possibly with many leading zeroes.
Second, bitset is a really flexible way of handling binary, you can give a string as its input or a number, and again you can use the bitset as an array or as a string.It's a really handy library.
You can print out the bitset variable A as
cout<<A;
and see how it works.

You can do like this:
while (input != 0) {
if (input & 1)
result[index] = 1;
else
result[index] =0;
input >>= 1;// dividing by two
index++;
}

As Mat mentioned above, an int is already a bit-vector (using bitwise operations, you can check each bit). So, you can simply try something like this:
// Note: This depends on the endianess of your machine
int x = 0xdeadbeef; // Your integer?
int arr[sizeof(int)*CHAR_BIT];
for(int i = 0 ; i < sizeof(int)*CHAR_BIT ; ++i) {
arr[i] = (x & (0x01 << i)) ? 1 : 0; // Take the i-th bit
}

Decimal to Binary: Size independent
Two ways: both stores binary represent into a dynamic allocated array bits (in msh to lsh).
First Method:
#include<limits.h> // include for CHAR_BIT
int* binary(int dec){
int* bits = calloc(sizeof(int) * CHAR_BIT, sizeof(int));
if(bits == NULL) return NULL;
int i = 0;
// conversion
int left = sizeof(int) * CHAR_BIT - 1;
for(i = 0; left >= 0; left--, i++){
bits[i] = !!(dec & ( 1u << left ));
}
return bits;
}
Second Method:
#include<limits.h> // include for CHAR_BIT
int* binary(unsigned int num)
{
unsigned int mask = 1u << ((sizeof(int) * CHAR_BIT) - 1);
//mask = 1000 0000 0000 0000
int* bits = calloc(sizeof(int) * CHAR_BIT, sizeof(int));
if(bits == NULL) return NULL;
int i = 0;
//conversion
while(mask > 0){
if((num & mask) == 0 )
bits[i] = 0;
else
bits[i] = 1;
mask = mask >> 1 ; // Right Shift
i++;
}
return bits;
}

I know it doesn't add as many Zero's as you wish for positive numbers. But for negative binary numbers, it works pretty well.. I just wanted to post a solution for once :)
int BinToDec(int Value, int Padding = 8)
{
int Bin = 0;
for (int I = 1, Pos = 1; I < (Padding + 1); ++I, Pos *= 10)
{
Bin += ((Value >> I - 1) & 1) * Pos;
}
return Bin;
}

This is what I use, it also lets you give the number of bits that will be in the final vector, fills any unused bits with leading 0s.
std::vector<int> to_binary(int num_to_convert_to_binary, int num_bits_in_out_vec)
{
std::vector<int> r;
// make binary vec of minimum size backwards (LSB at .end() and MSB at .begin())
while (num_to_convert_to_binary > 0)
{
//cout << " top of loop" << endl;
if (num_to_convert_to_binary % 2 == 0)
r.push_back(0);
else
r.push_back(1);
num_to_convert_to_binary = num_to_convert_to_binary / 2;
}
while(r.size() < num_bits_in_out_vec)
r.push_back(0);
return r;
}

How to put bit sequence into bytes (C/C++)

I have a couple of integers, for example (in binary represetation):
00001000, 01111111, 10000000, 00000001
and I need to put them in sequence to array of bytes(chars), without the leading zeros, like so:
10001111 11110000 0001000
I understand that it is must be done by bit shifting with <<,>> and using binary or |. But I can't find the correct algorithm, can you suggest the best approach?
The integers I need to put there are unsigned long long ints, so the length of one can be anywhere from 1 bit to 8 bytes (64 bits).

You could use a std::bitset:
#include <bitset>
#include <iostream>
int main() {
unsigned i = 242122534;
std::bitset<sizeof(i) * 8> bits;
bits = i;
std::cout << bits.to_string() << "\n";
}

There are doubtless other ways of doing it, but I would probably go with the simplest:
std::vector<unsigned char> integers; // Has your list of bytes
integers.push_back(0x02);
integers.push_back(0xFF);
integers.push_back(0x00);
integers.push_back(0x10);
integers.push_back(0x01);
std::string str; // Will have your resulting string
for(unsigned int i=0; i < integers.size(); i++)
for(int j=0; j<8; j++)
str += ((integers[i]<<j) & 0x80 ? "1" : "0");
std::cout << str << "\n";
size_t begin = str.find("1");
if(begin > 0) str.erase(0,begin);
std::cout << str << "\n";
I wrote this up before you mentioned that you were using long ints or whatnot, but that doesn't actually change very much of this. The mask needs to change, and the j loop variable, but otherwise the above should work.

Convert them to strings, then erase all leading zeros:
#include <iostream>
#include <sstream>
#include <string>
#include <cstdint>
std::string to_bin(uint64_t v)
{
std::stringstream ss;
for(size_t x = 0; x < 64; ++x)
{
if(v & 0x8000000000000000)
ss << "1";
else
ss << "0";
v <<= 1;
}
return ss.str();
}
void trim_right(std::string& in)
{
size_t non_zero = in.find_first_not_of("0");
if(std::string::npos != non_zero)
in.erase(in.begin(), in.begin() + non_zero);
else
{
// no 1 in data set, what to do?
in = "<no data>";
}
}
int main()
{
uint64_t v1 = 437148234;
uint64_t v2 = 1;
uint64_t v3 = 0;
std::string v1s = to_bin(v1);
std::string v2s = to_bin(v2);
std::string v3s = to_bin(v3);
trim_right(v1s);
trim_right(v2s);
trim_right(v3s);
std::cout << v1s << "\n"
<< v2s << "\n"
<< v3s << "\n";
return 0;
}

A simple approach would be having the "current byte" (acc in the following), the associated number of used bits in it (bitcount) and a vector of fully processed bytes (output):
int acc = 0;
int bitcount = 0;
std::vector<unsigned char> output;
void writeBits(int size, unsigned long long x)
{
while (size > 0)
{
// sz = How many bit we're about to copy
int sz = size;
// max avail space in acc
if (sz > 8 - bitcount) sz = 8 - bitcount;
// get the bits
acc |= ((x >> (size - sz)) << (8 - bitcount - sz));
// zero them off in x
x &= (1 << (size - sz)) - 1;
// acc got bigger and x got smaller
bitcount += sz;
size -= sz;
if (bitcount == 8)
{
// got a full byte!
output.push_back(acc);
acc = bitcount = 0;
}
}
}
void writeNumber(unsigned long long x)
{
// How big is it?
int size = 0;
while (size < 64 && x >= (1ULL << size))
size++;
writeBits(size, x);
}
Note that at the end of the processing you should check if there is any bit still in the accumulator (bitcount > 0) and you should flush them in that case by doing a output.push_back(acc);.
Note also that if speed is an issue then probably using a bigger accumulator is a good idea (however the output will depend on machine endianness) and also that discovering how many bits are used in a number can be made much faster than a linear search in C++ (for example x86 has a special machine language instruction BSR dedicated to this).

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js