How do I convert bitset to array of bytes/uint8? - c++

I need to extract bytes from a bitset whose size may or may not be a multiple of CHAR_BIT. I know how many of the bits in the bitset I need to put into an array. For example,
the bitset is declared as std::bitset<40> id;
There is a separate variable nBits that says how many of the bits in id are usable. Now I want to extract those bits in multiples of CHAR_BIT. I also need to handle cases where nBits % CHAR_BIT != 0. I am okay with putting the result into an array of uint8.

You can use boost::dynamic_bitset, which can be converted to a range of "blocks" using boost::to_block_range.
#include <cstdlib>
#include <cstdint>
#include <iterator>
#include <vector>
#include <boost/dynamic_bitset.hpp>
int main()
{
typedef uint8_t Block; // Make the block size one byte
typedef boost::dynamic_bitset<Block> Bitset;
Bitset bitset(40); // 40 bits
// Assign random bits
for (int i=0; i<40; ++i)
{
bitset[i] = std::rand() % 2;
}
// Copy bytes to buffer
std::vector<Block> bytes;
boost::to_block_range(bitset, std::back_inserter(bytes));
}

Unfortunately there's no good way within the language if you need more bits than fit in an unsigned long (if they do fit, you can use to_ulong). You'll have to iterate over all the bits and generate the array of bytes yourself.
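Iterating over the bits by hand could look roughly like the sketch below. The helper name to_bytes and the least-significant-byte-first layout are assumptions made for illustration, not something the question specifies; it also assumes a platform where uint8_t exists (CHAR_BIT == 8). A trailing partial byte is zero-padded in its high-order bits.

```cpp
#include <bitset>
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: copies the lowest nBits bits of `bits` into bytes,
// least significant byte first. If nBits is not a multiple of CHAR_BIT,
// the unused high-order bits of the last byte stay zero.
template <std::size_t N>
std::vector<std::uint8_t> to_bytes(const std::bitset<N>& bits, std::size_t nBits)
{
    assert(nBits <= N);  // cannot use more bits than the bitset holds
    std::vector<std::uint8_t> bytes((nBits + CHAR_BIT - 1) / CHAR_BIT, 0);
    for (std::size_t i = 0; i < nBits; ++i) {
        if (bits[i]) {
            bytes[i / CHAR_BIT] |= static_cast<std::uint8_t>(1u << (i % CHAR_BIT));
        }
    }
    return bytes;
}
```

For a std::bitset<40> holding 0x1234567890 and nBits == 12, this yields two bytes, 0x90 and 0x08.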

With standard C++11, you can get the bytes out of your 40-bit bitset with shifting and masking. I didn't handle values other than 8 and 40, or the case where the second number is not a multiple of the first.
#include <bitset>
#include <cstdint>
#include <iomanip>   // std::setfill, std::setw
#include <iostream>
int main() {
    constexpr int numBits = 40;
    std::bitset<numBits> foo(0x1234567890);
    std::bitset<numBits> mask(0xff);
    // Extract one byte per iteration, least significant byte first.
    for (int i = 0; i < numBits / 8; ++i) {
        auto byte =
            static_cast<uint8_t>(((foo >> (8 * i)) & mask).to_ulong());
        std::cout << std::hex << std::setfill('0') << std::setw(2) << static_cast<int>(byte) << std::endl;
    }
}

Related

dynamic size of std::bitset initialization [duplicate]

I want to make a simple program that will take number of bits from the input and as an output show binary numbers, written on given bits (example: I type 3: it shows 000, 001, 010, 011, 100, 101, 110, 111).
The only problem I get is in the second for-loop, when I try to assign variable in bitset<bits>, but it wants constant number.
If you could help me find the solution I would be really grateful.
Here's the code:
#include <iostream>
#include <bitset>
#include <cmath>
using namespace std;
int main() {
int maximum_value = 0,x_temp=10;
//cin >> x_temp;
int const bits = x_temp;
for (int i = 1; i <= bits; i++) {
maximum_value += pow(2, bits - i);
}
for (int i = maximum_value; i >= 0; i--)
cout << bitset<bits>(maximum_value - i) << endl;
return 0;
}
A numeric ("non-type", as C++ calls it) template parameter must be a compile-time constant, so you cannot use a user-supplied number. Use a large constant number (e.g. 64) instead. You need another integer that will limit your output:
int x_temp = 10;
cin >> x_temp;
int const bits = 64;
...
Here 64 is some sort of a maximal value you can use, because bitset has a constructor with an unsigned long long argument, which has 64 bits (at least; may be more).
However, if you use int for your intermediate calculations, your code supports a maximum of 14 bits reliably (without overflow). If you want to support more than 14 bits (e.g. 64), use a larger type, like uint32_t or uint64_t.
A problem with holding more bits than needed is that the additional bits will be displayed. To cut them out, use substr:
cout << bitset<64>(...).to_string().substr(64 - x_temp);
Here to_string converts it to a string of 64 characters, and substr keeps only the last x_temp of them.
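The to_string/substr idea can be wrapped in a small function. This is only a sketch: the name low_bits and the constant kMaxBits are made up for illustration, and 64 is the same upper bound assumed in the answer above.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the lowest n bits of value as a string of
// '0' and '1' characters, most significant bit first.
std::string low_bits(unsigned long long value, std::size_t n)
{
    constexpr std::size_t kMaxBits = 64;  // fixed compile-time bitset size
    assert(n <= kMaxBits);
    return std::bitset<kMaxBits>(value).to_string().substr(kMaxBits - n);
}
```

With a run-time n smaller than 64, calling low_bits(v, n) for every v from 0 up to (1ULL << n) - 1 and printing each result produces the listing the question asks for.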
You have to define bits as a compile-time constant (initialized from a literal rather than from a variable), for example as a global constant:
#include <iostream>
#include <math.h>
#include <bitset>
using namespace std;
const unsigned bits=10;
int main() {
int maximum_value = 0,x_temp=10;
for (int i = 1; i <= bits; i++) {
maximum_value += pow(2, bits - i);
}
for (int i = maximum_value; i >= 0; i--)
cout << bitset<bits>(maximum_value - i) << endl;
return 0;
}

c++ bitstring to byte

For an assignment, I'm implementing Huffman compression/decompression in Visual Studio. After I get the 8 bits (10101010 for example) I want to convert them to a byte. This is the code I have:
unsigned byte = 0;
string stringof8 = "11100011";
for (unsigned b = 0; b != 8; b++){
if (b < stringof8.length())
byte |= (stringof8[b] & 1) << b;
}
outf.put(byte);
The first couple of bitstrings are output correctly as bytes, but if more than 3 bytes are pushed I get the same byte multiple times. I'm not familiar with bit manipulation and would like someone to walk me through this or through a working function.
Using std::bitset
#include <iostream>
#include <string>
#include <bitset>
int main() {
std::string bit_string = "10101010";
std::bitset<8> b(bit_string); // [1,0,1,0,1,0,1,0]
unsigned char c = ( b.to_ulong() & 0xFF);
std::cout << static_cast<int>(c); // prints 170
return 0;
}
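For a bitstring longer than 8 characters, the same std::bitset conversion can be applied to each group of eight. The sketch below is an illustration, not part of the answer: the name pack_bit_string and the choice to pad a short final group with '0' on the right are assumptions. The std::bitset string constructor throws std::invalid_argument if the string contains anything other than '0' or '1'.

```cpp
#include <bitset>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: packs a string of '0'/'1' characters into bytes,
// eight characters per byte, with the first character of each group as the
// most significant bit (the same convention std::bitset uses for strings).
std::vector<unsigned char> pack_bit_string(std::string bits)
{
    // Pad on the right so the length is a multiple of 8.
    bits.resize((bits.size() + 7) / 8 * 8, '0');

    std::vector<unsigned char> bytes;
    for (std::size_t pos = 0; pos < bits.size(); pos += 8) {
        std::bitset<8> group(bits, pos, 8);  // 8 characters starting at pos
        bytes.push_back(static_cast<unsigned char>(group.to_ulong()));
    }
    return bytes;
}
```

Each returned byte can then be written with outf.put, as in the question.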

Bitwise Operations on a 16 bit number

I am having trouble figuring out how to create a 16 bit int and set/manipulate all the individual bits. What would the code be if I want my int to start out with all 16 bits = 0?
If I declare my int as
int16_t bitNum = 0;
Is this the same as 0000000000000000? And how do I access the values of the individual bits? Thanks for your time.
Is this the same as 0000000000000000?
Yes.
And how do I access the values of the individual bits?
You cannot address an individual bit directly, because the smallest unit a computer can address and allocate is a char (a char variable is of the natural size to hold a character on a given machine). But you can manipulate each bit using bit masks and bitwise operations:
temp & (1 << N) // this will test N-th bit
or in C++ you can use std::bitset to represent a sequence of bits.
#include <bitset>
#include <iostream>
#include <stdint.h>
int main()
{
uint16_t temp = 0x0;
std::bitset< 16> bits( temp);
// 0 -> bit 1
// 2 -> bit 3
std::cout << bits[2] << std::endl;
}
This is what Bjarne Stroustrup says about operations on bits in "C++ Prog... 3d edition" 17.5.3 Bitset:
C++ supports the notion of small sets of flags efficiently through
bitwise operations on integers (§6.2.4). These operations include &
(and), | (or), ^ (exclusive or), << (shift left), and >> (shift
right).
Well, also ~, bitwise complement operator, the tilde, that flips every bit.
Class bitset generalizes this notion and offers greater
convenience by providing operations on a set of N bits indexed from 0
through N-1, where N is known at compile time. For sets of bits that
don’t fit into long int using a bitset is much more convenient than
using integers directly. For smaller sets, there may be an efficiency
tradeoff. If you want to name the bits, rather than numbering them,
using a set (§17.4.3), an enumeration (§4.8), or a bitfield (§C.8.1)
are alternatives. (...) A key idea in the design of bitset is that an optimized implementation can be provided for bitsets that fit in a single word. The interface reflects this assumption.
So there are alternatives; another option is to use bit-fields. They are integer members of a struct declared with an explicit width in bits. You can then access each individual "bit" using the member access operators: . for objects and references, or -> for pointers.
struct BitPack {
bool b1 : 1; // width 1: a named bit-field cannot have width 0
bool b2 : 1;
//...
bool b15 : 1;
};
void f( BitPack& b)
{
if( b.b1) // if b1 is set
g();
}
links:
http://en.cppreference.com/w/cpp/utility/bitset
http://en.cppreference.com/w/cpp/language/bit_field
Setting an object of an integral type to zero means setting all its used bits to zero.
You could write two functions: one sets a specified bit (counting from 0) and the other resets it. For example
#include <iostream>
#include <cstdint>
inline uint16_t & set( uint16_t &bitNum, size_t n )
{
return ( bitNum |= 1 << n );
}
inline uint16_t & reset( uint16_t &bitNum, size_t n )
{
return ( bitNum &= ~( 1 << n ) );
}
int main()
{
uint16_t bitNum = 0;
for ( size_t i = 0; i < 16; i++ )
{
std::cout << set( bitNum, i ) << std::endl;
reset( bitNum, i );
}
return 0;
}
The output is
1
2
4
8
16
32
64
128
256
512
1024
2048
4096
8192
16384
32768
The other way is to use the standard class std::bitset, declared in header <bitset>. It already has the corresponding functions.
For example
#include <iostream>
#include <bitset>
int main()
{
std::bitset<16> bitNum;
for ( size_t i = 0; i < 16; i++ )
{
std::cout << bitNum.set( i ) << std::endl;
bitNum.reset( i );
}
return 0;
}
The output is
0000000000000001
0000000000000010
0000000000000100
0000000000001000
0000000000010000
0000000000100000
0000000001000000
0000000010000000
0000000100000000
0000001000000000
0000010000000000
0000100000000000
0001000000000000
0010000000000000
0100000000000000
1000000000000000
Enjoy!:)

How to get array of bits in a structure?

I was pondering (and therefore am looking for a way to learn this, and not a better solution) if it is possible to get an array of bits in a structure.
Let me demonstrate by an example. Imagine such a code:
#include <stdio.h>
struct A
{
unsigned int bit0:1;
unsigned int bit1:1;
unsigned int bit2:1;
unsigned int bit3:1;
};
int main()
{
struct A a = {1, 0, 1, 1};
printf("%u\n", a.bit0);
printf("%u\n", a.bit1);
printf("%u\n", a.bit2);
printf("%u\n", a.bit3);
return 0;
}
In this code, we have 4 individual bits packed in a struct. They can be accessed individually, leaving the job of bit manipulation to the compiler. What I was wondering is if such a thing is possible:
#include <stdio.h>
typedef unsigned int bit:1;
struct B
{
bit bits[4];
};
int main()
{
struct B b = {{1, 0, 1, 1}};
for (i = 0; i < 4; ++i)
printf("%u\n", b.bits[i]);
return 0;
}
I tried declaring bits in struct B as unsigned int bits[4]:1 or unsigned int bits:1[4] or similar things to no avail. My best guess was to typedef unsigned int bit:1; and use bit as the type, yet still doesn't work.
My question is, is such a thing possible? If yes, how? If not, why not? The 1 bit unsigned int is a valid type, so why shouldn't you be able to get an array of it?
Again, I don't want a replacement for this, I am just wondering how such a thing is possible.
P.S. I am tagging this as C++, although the code is written in C, because I assume the method would be existent in both languages. If there is a C++ specific way to do it (by using the language constructs, not the libraries) I would also be interested to know.
UPDATE: I am completely aware that I can do the bit operations myself. I have done it a thousand times in the past. I am NOT interested in an answer that says use an array/vector instead and do bit manipulation. I am only thinking if THIS CONSTRUCT is possible or not, NOT an alternative.
Update: Answer for the impatient (thanks to neagoegab):
Instead of
typedef unsigned int bit:1;
I could use
typedef struct
{
unsigned int value:1;
} bit;
properly using #pragma pack
NOT POSSIBLE - A construct like that IS NOT possible(here) - NOT POSSIBLE
One could try to do this, but the result will be that one bit is stored in one byte
#include <cstdint>
#include <iostream>
using namespace std;
#pragma pack(push, 1)
struct Bit
{
//one bit is stored in one BYTE
uint8_t a_:1;
};
#pragma pack(pop, 1)
typedef Bit bit;
struct B
{
bit bits[4];
};
int main()
{
struct B b = {{0, 0, 1, 1}};
for (int i = 0; i < 4; ++i)
cout << (int)b.bits[i].a_ <<endl; // Bit has no operator<<, so print the member
cout<< sizeof(Bit) << endl;
cout<< sizeof(B) << endl;
return 0;
}
output:
0 //bit[0] value
0 //bit[1] value
1 //bit[2] value
1 //bit[3] value
1 //sizeof(Bit), **one bit is stored in one byte!!!**
4 //sizeof(B), ** 4 bytes, each bit is stored in one BYTE**
In order to access individual bits from a byte here is an example (Please note that the layout of the bitfields is implementation dependent)
#include <iostream>
#include <cstdint>
using namespace std;
#pragma pack(push, 1)
struct Byte
{
Byte(uint8_t value):
_value(value)
{
}
union
{
uint8_t _value;
struct {
uint8_t _bit0:1;
uint8_t _bit1:1;
uint8_t _bit2:1;
uint8_t _bit3:1;
uint8_t _bit4:1;
uint8_t _bit5:1;
uint8_t _bit6:1;
uint8_t _bit7:1;
};
};
};
#pragma pack(pop, 1)
int main()
{
Byte myByte(8);
cout << "Bit 0: " << (int)myByte._bit0 <<endl;
cout << "Bit 1: " << (int)myByte._bit1 <<endl;
cout << "Bit 2: " << (int)myByte._bit2 <<endl;
cout << "Bit 3: " << (int)myByte._bit3 <<endl;
cout << "Bit 4: " << (int)myByte._bit4 <<endl;
cout << "Bit 5: " << (int)myByte._bit5 <<endl;
cout << "Bit 6: " << (int)myByte._bit6 <<endl;
cout << "Bit 7: " << (int)myByte._bit7 <<endl;
if(myByte._bit3)
{
cout << "Bit 3 is on" << endl;
}
}
In C++ you use std::bitset<4>. This will use a minimal number of words for storage and hide all the masking from you. It's really hard to separate the C++ library from the language because so much of the language is implemented in the standard library. In C there's no direct way to create an array of single bits like this, instead you'd create one element of four bits or do the manipulation manually.
EDIT:
The 1 bit unsigned int is a valid type, so why shouldn't you be able
to get an array of it?
Actually you can't use a 1 bit unsigned type anywhere other than the context of creating a struct/class member. At that point it's so different from other types it doesn't automatically follow that you could create an array of them.
C++ would use std::vector<bool> or std::bitset<N>.
In C, to emulate std::vector<bool> semantics, you use a struct like this:
struct Bits {
    Word *word;          /* dynamically allocated array of words */
    size_t word_count;   /* number of elements in word */
};
where Word is an unsigned integer type equal in width to the data bus of the CPU; wordsize, as used later on, is that width in bits.
E.g. Word is uint_fast32_t for 32-bit machines, uint_fast64_t for 64-bit machines;
wordsize is 32 for 32-bit machines, and 64 for 64-bit machines.
You use functions/macros to set/clear bits.
To extract a bit, use GET_BIT(bits, bit) ((bits)->word[(bit)/wordsize] & ((Word)1 << ((bit) % wordsize))).
To set a bit, use SET_BIT(bits, bit) ((bits)->word[(bit)/wordsize] |= ((Word)1 << ((bit) % wordsize))).
To clear a bit, use CLEAR_BIT(bits, bit) ((bits)->word[(bit)/wordsize] &= ~((Word)1 << ((bit) % wordsize))).
To flip a bit, use FLIP_BIT(bits, bit) ((bits)->word[(bit)/wordsize] ^= ((Word)1 << ((bit) % wordsize))).
To add resizeability as per std::vector<bool>, make a resize function which calls realloc on Bits.word and changes Bits.word_count accordingly. The exact details of this are left as an exercise.
The same applies for proper range-checking of bit indices.
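One possible shape for the resize function and the range checks is sketched below, written in C-style C++. Everything here is an assumption for illustration: the function names, the choice of uint_fast32_t as Word, a Bits struct that holds a pointer to its words, and the policy that out-of-range indices read as 0 and are ignored on write. The range check works at word granularity, so it guards against out-of-bounds memory access rather than tracking an exact bit count.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

typedef std::uint_fast32_t Word;                       // assumed word type
const std::size_t wordsize = sizeof(Word) * CHAR_BIT;  // bits per Word

struct Bits {
    Word*       word;        // heap storage managed with realloc/free
    std::size_t word_count;  // number of Words in `word`
};

// Resizes the storage to hold at least bit_count bits; newly added words are
// zeroed. Returns false and leaves *bits unchanged if allocation fails.
bool bits_resize(Bits* bits, std::size_t bit_count)
{
    std::size_t new_count = (bit_count + wordsize - 1) / wordsize;
    if (new_count == 0) {
        std::free(bits->word);
        bits->word = nullptr;
        bits->word_count = 0;
        return true;
    }
    void* p = std::realloc(bits->word, new_count * sizeof(Word));
    if (p == nullptr) return false;
    bits->word = static_cast<Word*>(p);
    if (new_count > bits->word_count) {
        std::memset(bits->word + bits->word_count, 0,
                    (new_count - bits->word_count) * sizeof(Word));
    }
    bits->word_count = new_count;
    return true;
}

// Range-checked read: an index beyond the allocated words reads as false.
bool bits_get(const Bits* bits, std::size_t bit)
{
    if (bit / wordsize >= bits->word_count) return false;
    return ((bits->word[bit / wordsize] >> (bit % wordsize)) & 1u) != 0;
}

// Range-checked write: returns false if the index is beyond the storage.
bool bits_set(Bits* bits, std::size_t bit)
{
    if (bit / wordsize >= bits->word_count) return false;
    bits->word[bit / wordsize] |= static_cast<Word>(1) << (bit % wordsize);
    return true;
}
```

A Bits object should start out as {nullptr, 0}, and calling bits_resize with 0 releases the storage.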
This is abusive and relies on an extension, but it worked for me:
#include <stdio.h>
struct __attribute__ ((__packed__)) A
{
unsigned int bit0:1;
unsigned int bit1:1;
unsigned int bit2:1;
unsigned int bit3:1;
};
union U
{
struct A structVal;
int intVal;
};
int main()
{
struct A a = {1, 0, 1, 1};
union U u;
u.structVal = a;
for (int i =0 ; i<4; i++)
{
int mask = 1 << i;
printf("%d\n", (u.intVal & mask) >> i);
}
return 0;
}
You can also use an array of integers (ints or longs) to build an arbitrarily large bit mask. The select() system call uses this approach for its fd_set type; each bit corresponds to the numbered file descriptor (0..N). Macros are defined: FD_CLR to clear a bit, FD_SET to set a bit, FD_ISSET to test a bit, and FD_SETSIZE is the total number of bits. The macros automatically figure out which integer in the array to access and which bit in the integer. On Unix, see "sys/select.h"; under Windows, I think it is in "winsock.h". You can use the FD technique to make your own definitions for a bit mask. In C++, I suppose you could create a bit-mask object and overload the [] operator to access individual bits.
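A bit-mask object with an overloaded [] operator might look something like the following sketch. The class name BitMask, the proxy class Ref, and the use of unsigned long as the word type are all assumptions for illustration; this is not how fd_set itself is implemented. No bounds checking is done.

```cpp
#include <climits>
#include <cstddef>
#include <vector>

// Hypothetical bit-mask class: an array of unsigned longs, where operator[]
// works out which word and which bit inside that word to touch.
class BitMask {
public:
    // Proxy standing in for a single bit, since a bit has no address.
    class Ref {
    public:
        Ref(unsigned long& word, unsigned long mask) : word_(word), mask_(mask) {}
        Ref& operator=(bool value) {
            if (value) word_ |= mask_; else word_ &= ~mask_;
            return *this;
        }
        Ref& operator=(const Ref& other) { return *this = static_cast<bool>(other); }
        operator bool() const { return (word_ & mask_) != 0; }
    private:
        unsigned long& word_;
        unsigned long  mask_;
    };

    explicit BitMask(std::size_t bit_count)
        : words_((bit_count + kBitsPerWord - 1) / kBitsPerWord, 0UL) {}

    Ref operator[](std::size_t i) {
        return Ref(words_[i / kBitsPerWord], 1UL << (i % kBitsPerWord));
    }
    bool operator[](std::size_t i) const {
        return ((words_[i / kBitsPerWord] >> (i % kBitsPerWord)) & 1UL) != 0;
    }

private:
    static constexpr std::size_t kBitsPerWord = sizeof(unsigned long) * CHAR_BIT;
    std::vector<unsigned long> words_;
};
```

With this, mask[130] = true sets a bit and if (mask[130]) tests it, much as FD_SET and FD_ISSET do for file descriptors.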
You can create a bit list by using a struct pointer. This will use much more than a bit of space per bit written though, since each struct element occupies at least one byte (typically sizeof(unsigned int)):
struct bitfield{
unsigned int bit : 1;
};
struct bitfield *bitstream;
Then after this:
bitstream=malloc( sizeof(struct bitfield) * numberofbitswewant );
You can access them like so:
bitstream[bitpointer].bit=...

How does one store a vector<bool> or a bitset into a file, but bit-wise?

How to write bitset data to a file?
The first answer doesn't answer the question correctly, since it takes 8 times more space than it should.
How would you do it ? I really need it to save a lot of true/false values.
Simplest approach: take 8 consecutive boolean values, represent them as a single byte, and write that byte to your file. That saves a lot of space.
At the beginning of the file, you can write the number of boolean values stored; that number helps when reading the bytes back from the file and converting them into boolean values.
If you want the bitset class that best supports converting to binary, and your bitset is more than the size of unsigned long, then the best option to use is boost::dynamic_bitset. (I presume it is more than 32 and even 64 bits if you are that concerned about saving space).
From dynamic_bitset you can use to_block_range to write the bits into the underlying integral type. You can construct the dynamic_bitset back from the blocks by using from_block_range or its constructor from BlockInputIterator or by making append() calls.
Now you have the bytes in their native format (Block) you still have the issue of writing it to a stream and reading it back.
You will need to store a bit of "header" information first: the number of blocks you have and potentially the endianness. Or you might use a macro to convert to a standard endianness (eg ntohl but you will ideally use a macro that is no-op for your most common platform so if that is little-endian you probably want to store that way and convert only for big-endian systems).
(Note: I am assuming that boost::dynamic_bitset standardly converts integral types the same way regardless of underlying endianness. Their documentation does not say).
To write the blocks in binary to a stream, use os.write( reinterpret_cast<const char*>(&data[0]), sizeof(Block) * nBlocks ) and to read them use is.read( reinterpret_cast<char*>(&data[0]), sizeof(Block) * nBlocks ), where data is assumed to be a vector<Block>; before reading you must call data.resize(nBlocks) (not reserve()). (You can also do weird stuff with istream_iterator or istreambuf_iterator but resize() is probably better).
Here is a try with two functions that will use a minimal number of bytes, without compressing the bitset.
template<std::size_t I> // std::bitset's size parameter is a std::size_t, so int would not be deduced
void bitset_dump(const std::bitset<I> &in, std::ostream &out)
{
// export a bitset consisting of I bits to an output stream.
// Eight bits are stored to a single stream byte.
unsigned int i = 0; // the current bit index
unsigned char c = 0; // the current byte
short bits = 0; // to process next byte
while(i < in.size())
{
c = c << 1; //
if(in.test(i)) ++c; // adding 1 if bit is true (std::bitset has no at(); test() is range-checked)
++bits;
if(bits == 8)
{
out.put((char)c);
c = 0;
bits = 0;
}
++i;
}
// dump remaining
if(bits != 0) {
// pad the byte so that first bits are in the most significant positions.
while(bits != 8)
{
c = c << 1;
++bits;
}
out.put((char)c);
}
return;
}
template<std::size_t I>
void bitset_restore(std::istream &in, std::bitset<I> &out)
{
// read bytes from the input stream to a bitset of size I.
/* for debug */ //for(std::size_t n = 0; n < I; ++n) out[n] = false;
unsigned int i = 0; // current bit index
unsigned char mask = 0x80; // current byte mask
unsigned char c = 0; // current byte in stream
while(in.good() && (i < I))
{
if((i%8) == 0) // retrieve next character
{ c = in.get();
mask = 0x80;
}
else mask = mask >> 1; // shift mask
out[i] = (c & mask) != 0; // std::bitset has no at(); use operator[]
++i;
}
}
Note that a reinterpret_cast of the memory used by the bitset to an array of chars could probably also work, but it may not be portable across systems because you don't know what the internal representation of the bitset is (endianness?).
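How about this (note that it relies on libstdc++ internals such as std::_Bit_type, std::_S_word_bit and the iterator member _M_p, so it is not portable to other standard libraries):
#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <ctime>
#include <fstream>
#include <iostream>
#include <vector>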
...
{
std::srand(std::time(nullptr));
std::vector<bool> vct1, vct2;
vct1.resize(20000000, false);
vct2.resize(20000000, false);
// insert some data
for (size_t i = 0; i < 1000000; i++) {
vct1[std::rand() % 20000000] = true;
}
// serialize to file
std::ofstream ofs("bitset", std::ios::out | std::ios::trunc | std::ios::binary);
for (uint32_t i = 0; i < vct1.size(); i += std::_S_word_bit) {
auto vct1_iter = vct1.begin();
vct1_iter += i;
uint32_t block_num = i / std::_S_word_bit;
std::_Bit_type block_val = *(vct1_iter._M_p);
if (block_val != 0) {
// only write not-zero block
ofs.write(reinterpret_cast<char*>(&block_num), sizeof(uint32_t));
ofs.write(reinterpret_cast<char*>(&block_val), sizeof(std::_Bit_type));
}
}
ofs.close();
// deserialize
std::ifstream ifs("bitset", std::ios::in | std::ios::binary);
ifs.seekg(0, std::ios::end);
uint64_t file_size = ifs.tellg();
ifs.seekg(0);
uint64_t load_size = 0;
while (load_size < file_size) {
uint32_t block_num;
ifs.read(reinterpret_cast<char*>(&block_num), sizeof(uint32_t));
std::_Bit_type block_value;
ifs.read(reinterpret_cast<char*>(&block_value), sizeof(std::_Bit_type));
load_size += sizeof(uint32_t) + sizeof(std::_Bit_type);
auto offset = block_num * std::_S_word_bit;
if (offset >= vct2.size()) {
std::cout << "error! already touch end" << std::endl;
break;
}
auto iter = vct2.begin();
iter += offset;
*(iter._M_p) = block_value;
}
ifs.close();
// check result
int count_true1 = std::count(vct1.begin(), vct1.end(), true);
int count_true2 = std::count(vct2.begin(), vct2.end(), true);
std::cout << "count_true1: " << count_true1 << " count_true2: " << count_true2 << std::endl;
}
One way might be:
std::vector<bool> data = /* obtain bits somehow */;
// Reserve an appropriate number of byte-sized buckets.
// (CHAR_BIT comes from <climits>, std::ceil from <cmath>.)
std::vector<char> bytes((int)std::ceil((double)data.size() / CHAR_BIT));
for(std::size_t byteIndex = 0; byteIndex < bytes.size(); ++byteIndex) {
for(int bitIndex = 0; bitIndex < CHAR_BIT; ++bitIndex) {
std::size_t index = byteIndex * CHAR_BIT + bitIndex;
if(index >= data.size()) break; // the last byte may be only partly used
int bit = data[index];
bytes[byteIndex] |= bit << bitIndex;
}
}
Note that this assumes you don't care what the bit layout ends up being in memory, because it makes no adjustments for anything. But as long as you also serialize out the number of bits that were actually stored (to cover cases where you have a bit count that isn't a multiple of CHAR_BIT) you can deserialize exactly the same bitset or vector as you had originally.
(I'm not happy with that bucket size computation but it's 1am and I'm having trouble thinking of something more elegant).
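Going the other way, the stored bit count and the packed bytes are enough to rebuild the vector. The following is a sketch under the same layout as the packing loop above (bit i lives in bit i % CHAR_BIT of byte i / CHAR_BIT); the name unpack_bits is made up for illustration.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

// Hypothetical inverse of the packing loop: rebuilds bitCount bits from
// bytes, reading bit i from bit (i % CHAR_BIT) of byte (i / CHAR_BIT).
std::vector<bool> unpack_bits(const std::vector<char>& bytes, std::size_t bitCount)
{
    assert(bitCount <= bytes.size() * CHAR_BIT);  // enough bytes were supplied
    std::vector<bool> data(bitCount);
    for (std::size_t i = 0; i < bitCount; ++i) {
        // Go through unsigned char so a negative char does not sign-extend.
        unsigned char byte = static_cast<unsigned char>(bytes[i / CHAR_BIT]);
        data[i] = ((byte >> (i % CHAR_BIT)) & 1u) != 0;
    }
    return data;
}
```

Passing the original bit count (not bytes.size() * CHAR_BIT) is what drops the padding bits in the last byte.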
#include <cstdio>
#include <bitset>
...
FILE* pFile;
pFile = fopen("output.dat", "wb");
...
const unsigned int size = 1024;
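std::bitset<size> bitbuffer;
...
// Dumps the object's raw memory; this relies on the implementation's internal layout of std::bitset and is not portable.
fwrite (&bitbuffer, 1, size/8, pFile);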
fclose(pFile);
Two options:
Spend the extra pounds (or pence, more likely) for a bigger disk.
Write a routine to extract 8 bits from the bitset at a time, compose them into bytes, and write them to your output stream.