Outputting a Binary String to a Binary File in C++ - c++

Let's say I have a string that contains a binary like this one "0110110101011110110010010000010". Is there a easy way to output that string into a binary file so that the file contains 0110110101011110110010010000010? I understand that the computer writes one byte at a time but I am having trouble coming up with a way to write the contents of the string as a binary to a binary file.

Use a bitset:
//Added extra leading zero to make 32-bit.
std::bitset<32> b("00110110101011110110010010000010");
auto ull = b.to_ullong();
std::ofstream f;
f.open("test_file.dat", std::ios_base::out | std::ios_base::binary);
f.write(reinterpret_cast<char*>(&ull), sizeof(ull));
f.close();

I am not sure if that's what you need but here you go:
#include<iostream>
#include<fstream>
#include<string>
using namespace std;
int main() {
string tmp = "0110110101011110110010010000010";
ofstream out;
out.open("file.txt");
out << tmp;
out.close();
}

Make sure your output stream is in binary mode. This handles the case where the string size is not a multiple of the number of bits in a byte. Extra bits are set to 0.
const unsigned int BitsPerByte = CHAR_BIT;
unsigned char byte;
for (size_t i = 0; i < data.size(); ++i)
{
if ((i % BitsPerByte) == 0)
{
// first bit of a byte
byte = 0;
}
if (data[i] == '1')
{
// set a bit to 1
byte |= (1 << (i % BitsPerByte));
}
if (((i % BitsPerByte) == BitsPerByte - 1) || i + 1 == data.size())
{
// last bit of the byte
file << byte;
}
}

Related

Converting string hexadecmials to unsigned char (BYTE) in C

I want to convert the hexadecimal string value 0x1B6 to unsigned char - where it will store the value in the format 0x1B, 0x60 We had achieved the scenarios in C++, but C doesn't support std::stringstream.
The following code is C++, how do I achieve similar behavior in C?
char byte[2];
std::string hexa;
std::string str = "0x1B6" // directly assigned the char* value in to string here
int index =0;
unsigned int i;
for(i = 2; i < str.length(); i++) {
hexa = "0x"
if(str[i + 1] !NULL) {
hexa = hexa + str[i] + str[i + 1];
short temp;
std::istringstream(hexa) >> std::hex >> temp;
byte[index] = static_cast<BYTE>(temp);
} else {
hexa = hexa+ str[i] + "0";
short temp;
std::istringstream(hexa) >> std::hex >> temp;
byte[index] = static_cast<BYTE>(temp);
}
}
output:
byte[0] --> 0x1B
byte[1]--> 0x60
I don't think your solution is very efficient. But disregarding that, with C you would use strtol. This is an example of how to achieve something similar:
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
int main(void) {
const char *hex_string = "0x1B60";
long hex_as_long = strtol(hex_string, NULL, 16);
printf("%lx\n", hex_as_long);
// From right to left
for(int i = 0; i < strlen(&hex_string[2]); i += 2) {
printf("%x\n", (hex_as_long >> (i * 4)) & 0xff);
}
printf("---\n");
// From left to right
for(int i = strlen(&hex_string[2]) - 2; i >= 0; i -= 2) {
printf("%x\n", (hex_as_long >> (i * 4)) & 0xff);
}
}
So here we get the full value as a long inside hex_as_long. We then print the whole long with the first print and the individual bytes inside the second for loop. We are shifting multiples of 4 bits because one hex digit (0xf) covers exactly 4 bits of data.
To get the bytes or the long printed to a string rather than to stdout (if that is what you want to achieve), you can use strprintf or strnprintf in a similar way to how printf is used, but with a variable or array as destination.
This solution scans whole bytes (0xff) at a time. If you need to handle one hex digit (0xf) at a time you can divide all the operations by two and mask with 0xf instead of 0xff.

C++ Any faster method to write a large binary file?

Goal
My goal is to quickly create a file from a large binary string (a string that contains only 1 and 0).
Straight to the point
I need a function that can achieve my goal. If I am not clear enough, please read on.
Example
Test.exe is running...
.
Inputted binary string:
1111111110101010
Writing to: c:\users\admin\desktop\Test.txt
Done!
File(Test.txt) In Byte(s):
0xFF, 0xAA
.
Test.exe executed successfully!
Explanation
First, Test.exe requested the user to input a binary string.
Then, it converted the inputted binary string to hexadecimal.
Finally, it wrote the converted value to a file called Test.txt.
I've tried
As an fail attempt to achieve my goal, I've created this simple (and possibly horrible) function (hey, at least I tried):
void BinaryStrToFile( __in const char* Destination,
__in std::string &BinaryStr )
{
std::ofstream OutputFile( Destination, std::ofstream::binary );
for( ::UINT Index1 = 0, Dec = 0;
// 8-Bit binary.
Index1 != BinaryStr.length( )/8;
// Get the next set of binary value.
// Write the decimal value as unsigned char to file.
// Reset decimal value to 0.
++ Index1, OutputFile << ( ::BYTE )Dec, Dec = 0 )
{
// Convert the 8-bit binary to hexadecimal using the
// positional notation method - this is how its done:
// http://www.wikihow.com/Convert-from-Binary-to-Decimal
for( ::UINT Index2 = 7, Inc = 1; Index2 + 1 != 0; -- Index2, Inc += Inc )
if( BinaryStr.substr( Index1 * 8, 8 )[ Index2 ] == '1' ) Dec += Inc;
}
OutputFile.close( );
};
Example of usage
#include "Global.h"
void BinaryStrToFile( __in const char* Destination,
__in std::string &BinaryStr );
int main( void )
{
std::string Bin = "";
// Create a binary string that is a size of 9.53674 mb
// Note: The creation of this string will take awhile.
// However, I only start to calculate the speed of writing
// and converting after it is done generating the string.
// This string is just created for an example.
std::cout << "Generating...\n";
while( Bin.length( ) != 80000000 )
Bin += "10101010";
std::cout << "Writing...\n";
BinaryStrToFile( "c:\\users\\admin\\desktop\\Test.txt", Bin );
std::cout << "Done!\n";
#ifdef IS_DEBUGGING
std::cout << "Paused...\n";
::getchar( );
#endif
return( 0 );
};
Problem
Again, that was my fail attempt to achieve my goal. The problem is the speed. It is too slow. It took more than 7 minutes. Are there any method to quickly create a file from a large binary string?
Thanks in advance,
CLearner
I'd suggest removing the substr call in the inner loop. You are allocating a new string and then destroying it for each character that you process. Replace this code:
for(::UINT Index2 = 7, Inc = 1; Index2 + 1 != 0; -- Index2, Inc += Inc )
if( BinaryStr.substr( Index1 * 8, 8 )[ Index2 ] == '1' )
Dec += Inc;
by something like:
for(::UINT Index2 = 7, Inc = 1; Index2 + 1 != 0; -- Index2, Inc += Inc )
if( BinaryStr[Index1 * 8 + Index2 ] == '1' )
Dec += Inc;
The majority of your time is spent here:
for( ::UINT Index2 = 7, Inc = 1; Index2 + 1 != 0; -- Index2, Inc += Inc )
if( BinaryStr.substr( Index1 * 8, 8 )[ Index2 ] == '1' ) Dec += Inc;
When I comment that out the file is written in seconds. I think you need to finetune your conversion.
I think I'd consider something like this as a starting point:
#include <bitset>
#include <fstream>
#include <algorithm>
int main() {
std::ifstream in("junk.txt", std::ios::binary | std::ios::in);
std::ofstream out("junk.bin", std::ios::binary | std::ios::out);
std::transform(std::istream_iterator<std::bitset<8> >(in),
std::istream_iterator<std::bitset<8> >(),
std::ostream_iterator<unsigned char>(out),
[](std::bitset<8> const &b) { return b.to_ulong();});
return 0;
}
Doing a quick test, this processes an input file of 80 million bytes in about 6 seconds on my machine. Unless your files are much larger than what you've mentioned in your question, my guess is this is adequate speed, and the simplicity is going to be hard to beat.
Something not entirely unlike this should be significantly faster:
void
text_to_binary_file(const std::string& text, const char *fname)
{
unsigned char wbuf[4096]; // 4k is a good size of "chunk to write to file"
unsigned int i = 0, j = 0;
std::filebuf fp; // dropping down to filebufs may well be faster
// for this problem
fp.open(fname, std::ios::out|std::ios::trunc);
memset(wbuf, 0, 4096);
for (std::string::iterator p = text.begin(); p != text.end(); p++) {
wbuf[i] |= (1u << (CHAR_BIT - (j+1)));
j++;
if (j == CHAR_BIT) {
j = 0;
i++;
}
if (i == 4096) {
if (fp.sputn(wbuf, 4096) != 4096)
abort();
memset(wbuf, 0, 4096);
i = 0;
j = 0;
}
}
if (fp.sputn(wbuf, i+1) != i+1)
abort();
fp.close();
}
Proper error handling left as an exercise.
So instead of converting back and forth between std::strings, why not use a bunch of machine word-sized integers for fast access?
const size_t bufsz = 1000000;
uint32_t *buf = new uint32_t[bufsz];
memset(buf, 0xFA, sizeof(*buf) * bufsz);
std::ofstream ofile("foo.bin", std::ofstream::binary);
int i;
for (i = 0; i < bufsz; i++) {
ofile << hex << setw(8) << setfill('0') << buf[i];
// or if you want raw binary data instead of formatted hex:
ofile.write(reinterpret_cast<char *>(&buf[i]), sizeof(buf[i]));
}
delete[] buf;
For me, this runs in a fraction of a second.
Even though late, I want to place my example for handling such strings.
Architecture specific optimizations may use unaligned loads of chars into multiple registers for 'squeezing' out the bits in parallel. This untested example code does not check the chars and avoids alignment and endianness requirements. It assumes the characters of that binary string to represent contiguous octets (bytes) with the most significant bit first, not words and double words, etc., where their specific representation in memory (and in that string) would require special treatment for portability.
//THIS CODE HAS NEVER BEEN TESTED! But I hope you get the idea.
//set up an ofstream with a 64KiB buffer
std::vector<char> buffer(65536);
std::ofstream ofs("out.bin", std::ofstream::binary|std::ofstream::out|std::ofstream::trunc);
ofs.rdbuf()->pubsetbuf(&buffer[0],buffer.size());
std::string::size_type bits = Bin.length();
std::string::const_iterator cIt = Bin.begin();
//You may treat cases, where (bits % 8 != 0) as error
//Initialize with the first iteration
uint8_t byte = uint8_t(*cIt++) - uint8_t('0');
byte <<= 1;
for(std::string::size_type i = 1;i < (bits & (~std::string::size_type(0x7)));++i,++cIt)
{
if(i & 0x7) //bit 7 ... 1
{
byte |= uint8_t(*cIt) - uint8_t('0');
byte <<= 1;
}
else //bit 0: write and advance to the the next most significant bit of an octet
{
byte |= uint8_t(*cIt) - uint8_t('0');
ofs.put(byte);
//advance
++i;
++cIt;
byte = uint8_t(*cIt) - uint8_t('0');
byte <<= 1;
}
}
ofs.flush();
This make a 76.2 MB (80,000,000 bytes) file of 1010101010101......
#include <stdio.h>
#include <iostream>
#include <fstream>
using namespace std;
int main( void )
{
char Bin=0;
ofstream myfile;
myfile.open (".\\example.bin", ios::out | ios::app | ios::binary);
int c=0;
Bin = 0xAA;
while( c!= 80000000 ){
myfile.write(&Bin,1);
c++;
}
myfile.close();
cout << "Done!\n";
return( 0 );
};

Base 64 Encoding Losing data

This is my fourth attempt at doing base64 encoding. My first tries work but it isn't standard. It's also extremely slow!!! I used vectors and push_back and erase a lot.
So I decided to re-write it and this is much much faster! Except that it loses data. -__-
I need as much speed as I can possibly get because I'm compressing a pixel buffer and base64 encoding the compressed string. I'm using ZLib. The images are 1366 x 768 so yeah.
I do not want to copy any code I find online because... Well, I like to write things myself and I don't like worrying about copyright stuff or having to put a ton of credits from different sources all over my code..
Anyway, my code is as follows below. It's very short and simple.
const static std::string Base64Chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
inline bool IsBase64(std::uint8_t C)
{
return (isalnum(C) || (C == '+') || (C == '/'));
}
std::string Copy(std::string Str, int FirstChar, int Count)
{
if (FirstChar <= 0)
FirstChar = 0;
else
FirstChar -= 1;
return Str.substr(FirstChar, Count);
}
std::string DecToBinStr(int Num, int Padding)
{
int Bin = 0, Pos = 1;
std::stringstream SS;
while (Num > 0)
{
Bin += (Num % 2) * Pos;
Num /= 2;
Pos *= 10;
}
SS.fill('0');
SS.width(Padding);
SS << Bin;
return SS.str();
}
int DecToBinStr(std::string DecNumber)
{
int Bin = 0, Pos = 1;
int Dec = strtol(DecNumber.c_str(), NULL, 10);
while (Dec > 0)
{
Bin += (Dec % 2) * Pos;
Dec /= 2;
Pos *= 10;
}
return Bin;
}
int BinToDecStr(std::string BinNumber)
{
int Dec = 0;
int Bin = strtol(BinNumber.c_str(), NULL, 10);
for (int I = 0; Bin > 0; ++I)
{
if(Bin % 10 == 1)
{
Dec += (1 << I);
}
Bin /= 10;
}
return Dec;
}
std::string EncodeBase64(std::string Data)
{
std::string Binary = std::string();
std::string Result = std::string();
for (std::size_t I = 0; I < Data.size(); ++I)
{
Binary += DecToBinStr(Data[I], 8);
}
for (std::size_t I = 0; I < Binary.size(); I += 6)
{
Result += Base64Chars[BinToDecStr(Copy(Binary, I, 6))];
if (I == 0) ++I;
}
int PaddingAmount = ((-Result.size() * 3) & 3);
for (int I = 0; I < PaddingAmount; ++I)
Result += '=';
return Result;
}
std::string DecodeBase64(std::string Data)
{
std::string Binary = std::string();
std::string Result = std::string();
for (std::size_t I = Data.size(); I > 0; --I)
{
if (Data[I - 1] != '=')
{
std::string Characters = Copy(Data, 0, I);
for (std::size_t J = 0; J < Characters.size(); ++J)
Binary += DecToBinStr(Base64Chars.find(Characters[J]), 6);
break;
}
}
for (std::size_t I = 0; I < Binary.size(); I += 8)
{
Result += (char)BinToDecStr(Copy(Binary, I, 8));
if (I == 0) ++I;
}
return Result;
}
I've been using the above like this:
int main()
{
std::string Data = EncodeBase64("IMG." + ::ToString(677) + "*" + ::ToString(604)); //IMG.677*604
std::cout<<DecodeBase64(Data); //Prints IMG.677*601
}
As you can see in the above, it prints the wrong string. It's fairly close but for some reason, the 4 is turned into a 1!
Now if I do:
int main()
{
std::string Data = EncodeBase64("IMG." + ::ToString(1366) + "*" + ::ToString(768)); //IMG.1366*768
std::cout<<DecodeBase64(Data); //Prints IMG.1366*768
}
It prints correctly.. I'm not sure what is going on at all or where to begin looking.
Just in-case anyone is curious and want to see my other attempts (the slow ones): http://pastebin.com/Xcv03KwE
I'm really hoping someone could shed some light on speeding things up or at least figuring out what's wrong with my code :l
The main encoding issue is that you are not accounting for data that is not a multiple of 6 bits. In this case, the final 4 you have is being converted into 0100 instead of 010000 because there are no more bits to read. You are supposed to pad with 0s.
After changing your Copy like this, the final encoded character is Q, instead of the original E.
std::string data = Str.substr(FirstChar, Count);
while(data.size() < Count) data += '0';
return data;
Also, it appears that your logic for adding padding = is off because it is adding one too many = in this case.
As far as comments on speed, I'd focus primarily on trying to reduce your usage of std::string. The way you are currently converting the data into a string with 0 and 1 is pretty inefficent considering that the source could be read directly with bitwise operators.
I'm not sure whether I could easily come up with a slower method of doing Base-64 conversions.
The code requires 4 headers (on Mac OS X 10.7.5 with G++ 4.7.1) and the compiler option -std=c++11 to make the #include <cstdint> acceptable:
#include <string>
#include <iostream>
#include <sstream>
#include <cstdint>
It also requires a function ToString() that was not defined; I created:
std::string ToString(int value)
{
std::stringstream ss;
ss << value;
return ss.str();
}
The code in your main() — which is what uses the ToString() function — is a little odd: why do you need to build a string from pieces instead of simply using "IMG.677*604"?
Also, it is worth printing out the intermediate result:
int main()
{
std::string Data = EncodeBase64("IMG." + ::ToString(677) + "*" + ::ToString(604));
std::cout << Data << std::endl;
std::cout << DecodeBase64(Data) << std::endl; //Prints IMG.677*601
}
This yields:
SU1HLjY3Nyo2MDE===
IMG.677*601
The output string (SU1HLjY3Nyo2MDE===) is 18 bytes long; that has to be wrong as a valid Base-64 encoded string has to be a multiple of 4 bytes long (as three 8-bit bytes are encoded into four bytes each containing 6 bits of the original data). This immediately tells us there are problems. You should only get zero, one or two pad (=) characters; never three. This also confirms that there are problems.
Removing two of the pad characters leaves a valid Base-64 string. When I use my own home-brew Base-64 encoding and decoding functions to decode your (truncated) output, it gives me:
Base64:
0x0000: SU1HLjY3Nyo2MDE=
Binary:
0x0000: 49 4D 47 2E 36 37 37 2A 36 30 31 00 IMG.677*601.
Thus it appears you have encode the null terminating the string. When I encode IMG.677*604, the output I get is:
Binary:
0x0000: 49 4D 47 2E 36 37 37 2A 36 30 34 IMG.677*604
Base64: SU1HLjY3Nyo2MDQ=
You say you want to speed up your code. Quite apart from fixing it so that it encodes correctly (I've not really studied the decoding), you will want to avoid all the string manipulation you do. It should be a bit manipulation exercise, not a string manipulation exercise.
I have 3 small encoding routines in my code, to encode triplets, doublets and singlets:
/* Encode 3 bytes of data into 4 */
static void encode_triplet(const char *triplet, char *quad)
{
quad[0] = base_64_map[(triplet[0] >> 2) & 0x3F];
quad[1] = base_64_map[((triplet[0] & 0x03) << 4) | ((triplet[1] >> 4) & 0x0F)];
quad[2] = base_64_map[((triplet[1] & 0x0F) << 2) | ((triplet[2] >> 6) & 0x03)];
quad[3] = base_64_map[triplet[2] & 0x3F];
}
/* Encode 2 bytes of data into 4 */
static void encode_doublet(const char *doublet, char *quad, char pad)
{
quad[0] = base_64_map[(doublet[0] >> 2) & 0x3F];
quad[1] = base_64_map[((doublet[0] & 0x03) << 4) | ((doublet[1] >> 4) & 0x0F)];
quad[2] = base_64_map[((doublet[1] & 0x0F) << 2)];
quad[3] = pad;
}
/* Encode 1 byte of data into 4 */
static void encode_singlet(const char *singlet, char *quad, char pad)
{
quad[0] = base_64_map[(singlet[0] >> 2) & 0x3F];
quad[1] = base_64_map[((singlet[0] & 0x03) << 4)];
quad[2] = pad;
quad[3] = pad;
}
This is written as C code rather than using native C++ idioms, but the code shown should compile with C++ (unlike the C99 initializers elsewhere in the source). The base_64_map[] array corresponds to your Base64Chars string. The pad character passed in is normally '=', but can be '\0' since the system I work with has eccentric ideas about not needing padding (pre-dating my involvement in the code, and it uses a non-standard alphabet to boot) and the code handles both the non-standard and the RFC 3548 standard.
The driving code is:
/* Encode input data as Base-64 string. Output length returned, or negative error */
static int base64_encode_internal(const char *data, size_t datalen, char *buffer, size_t buflen, char pad)
{
size_t outlen = BASE64_ENCLENGTH(datalen);
const char *bin_data = (const void *)data;
char *b64_data = (void *)buffer;
if (outlen > buflen)
return(B64_ERR_OUTPUT_BUFFER_TOO_SMALL);
while (datalen >= 3)
{
encode_triplet(bin_data, b64_data);
bin_data += 3;
b64_data += 4;
datalen -= 3;
}
b64_data[0] = '\0';
if (datalen == 2)
encode_doublet(bin_data, b64_data, pad);
else if (datalen == 1)
encode_singlet(bin_data, b64_data, pad);
b64_data[4] = '\0';
return((b64_data - buffer) + strlen(b64_data));
}
/* Encode input data as Base-64 string. Output length returned, or negative error */
int base64_encode(const char *data, size_t datalen, char *buffer, size_t buflen)
{
return(base64_encode_internal(data, datalen, buffer, buflen, base64_pad));
}
The base64_pad constant is the '='; there's also a base64_encode_nopad() function that supplies '\0' instead. The errors are somewhat arbitrary but relevant to the code.
The main point to take away from this is that you should be doing bit manipulation and building up a string that is an exact multiple of 4 bytes for a given input.
std::string EncodeBase64(std::string Data)
{
std::string Binary = std::string();
std::string Result = std::string();
for (std::size_t I = 0; I < Data.size(); ++I)
{
Binary += DecToBinStr(Data[I], 8);
}
if (Binary.size() % 6)
{
Binary.resize(Binary.size() + 6 - Binary.size() % 6, '0');
}
for (std::size_t I = 0; I < Binary.size(); I += 6)
{
Result += Base64Chars[BinToDecStr(Copy(Binary, I, 6))];
if (I == 0) ++I;
}
if (Result.size() % 4)
{
Result.resize(Result.size() + 4 - Result.size() % 4, '=');
}
return Result;
}

How to Write a binary file in c++

I'm trying to implement the Huffman's encoding algorithm in c++.
my question is : after i got the equivalent binary string for each character , how can i write those zeros and ones as binary on a file not as string 0 or string 1 ?
thanks in advance ...
I hope this code can help you.
You start from a sequence of bytes (1s and 0s) representing the continuous encoding of every character of the input file.
You take every byte of the sequence and add a bit into a temporary byte (char byte)
Every time you fill a byte, you write it to file (you could also wait, for efficiency, to have a bigger data)
At the end, you write the remaining bits to file, filled with trailing zeros, for example
As akappa correctly pointed out, the else branch can be removed if byte is set to 0 after each file writing operation (or, more generically, every time it has been totally filled and flushed somewhere else), so only 1s must be written.
void writeBinary(char *huffmanEncoding, int sequenceLength)
{
char byte = 0;
// For each bit of the sequence
for (int i = 0; i < sequenceLength; i++) {
char bit = huffmanEncoding[i];
// Add a single bit to byte
if (bit == 1) {
// MSB of the sequence to msb of the file
byte |= (1 << (7 - (i % 8)));
// equivalent form: byte |= (1 << (-(i + 1) % 8);
}
else {
// MSB of the sequence to msb of the file
byte &= ~(1 << (7 - (i % 8)));
// equivalent form: byte &= ~(1 << (-(i + 1) % 8);
}
if ((i % 8) == 0 && i > 0) {
//writeByteToFile(byte);
}
}
// Fill the last incomplete byte, if any, and write to file
}
Obtaining individually the encoding of each character in a different data structure is a broken solution, because you need to juxtapose the encoding of each character in the resulting binary file: storing them individually makes that as hard as directly storing them contiguously in a vector of bits.
This consideration suggests using a std::vector<bool> to perform your task, but it is a broken solution because it can't be treated as a c-style array, and you really need that at output time.
This question asks precisely which are the valid alternatives to std::vector<bool>, so I think answers to that question fits perfectly your question.
BTW, what I would do is to just wrap a std::vector<uint8_t> under a class which suits yout needs, like the code attached:
#include <iostream>
#include <vector>
#include <cstdint>
#include <algorithm>
class bitstream {
private:
std::vector<std::uint8_t> storage;
unsigned int bits_used:3;
void alloc_space();
public:
bitstream() : bits_used(0) { }
void push_bit(bool bit);
template <typename T>
void push(T t);
std::uint8_t *get_array();
size_t size() const;
// beware: no reference!
bool operator[](size_t pos) const;
};
void bitstream::alloc_space()
{
if (bits_used == 0) {
std::uint8_t push = 0;
storage.push_back(push);
}
}
void bitstream::push_bit(bool bit)
{
alloc_space();
storage.back() |= bit << 7 - bits_used++;
}
template <typename T>
void bitstream::push(T t)
{
std::uint8_t *t_byte = reinterpret_cast<std::uint8_t*>(&t);
for (size_t i = 0; i < sizeof(t); i++) {
uint8_t byte = t_byte[i];
if (bits_used > 0) {
storage.back() |= byte >> bits_used;
std::uint8_t to_push = (byte & ((1 << (8 - bits_used)) - 1)) << bits_used;
storage.push_back(to_push);
} else {
storage.push_back(byte);
}
}
}
std::uint8_t *bitstream::get_array()
{
return &storage.front();
}
size_t bitstream::size() const
{
const unsigned int m = 0;
return std::max(m, (storage.size() - 1) * 8 + bits_used);
}
bool bitstream::operator[](size_t size) const
{
// No range checking
return static_cast<bool>((storage[size / 8] >> 7 - (size % 8)) & 0x1);
}
int main(int argc, char **argv)
{
bitstream bs;
bs.push_bit(true);
std::cout << bs[0] << std::endl;
bs.push_bit(false);
std::cout << bs[0] << "," << bs[1] << std::endl;
bs.push_bit(true);
bs.push_bit(true);
std::uint8_t to_push = 0xF0;
bs.push_byte(to_push);
for (size_t i = 0; i < bs.size(); i++)
std::cout << bs[i] << ",";
std::cout << std::endl;
}
You cant write to a binary file with only bits; the smallest size of data written is one byte (thus 8 bits).
So what you should do is create a buffer (any size).
char BitBuffer;
Writing to a buffer:
int Location;
bool Value;
if (Value)
BitBuffer |= (1 << Location);
else
BitBuffer &= ~(1 << Location)
The code (1 << Location) generates a number with all 0's except the position specified by Location. Then, if Value is set to true, it sets corresponding bit in Buffer to 1, and to 0 in other case. The binary operations used are fairly simple, if you don't understand them, it should be in any good C++ book/tutorial.
Location should be number in range <0, sizeof(Buffer)-1>, so <0,7> in this case.
Writing buffer to a file is relatively simple when using fstream. Just remember to open it as binary.
ofstream File;
File.open("file.txt", ios::out | ios::binary);
File.write(BitBuffer, sizeof(char))
EDIT: Noticed a bug and fixed it.
EDIT2: You can't use << operators in binary mode, i forgot about it.
Alternative solution : Use std::vector<bool> or std::bitset as a buffer.
This should be even simpler, but I thought I could help you a little bit more.
void WriteData (std::vector<bool> const& data, std::ofstream& str)
{
char Buffer;
for (unsigned int i = 0; i < data.size(); ++i)
{
if (i % 8 == 0 && i != 0)
str.write(Buffer, 1);
else
// Paste buffer setting code here
// Location = i/8;
// Value = data[i];
}
// It might happen that data.size() % 8 != 0. You should fill the buffer
// with trailing zeros and write it individually.
}

How does one store a vector<bool> or a bitset into a file, but bit-wise?

How to write bitset data to a file?
The first answer doesn't answer the question correctly, since it takes 8 times more space than it should.
How would you do it ? I really need it to save a lot of true/false values.
Simplest approach : take consecutive 8 boolean values, represent them as a single byte, write that byte to your file. That would save lot of space.
In the beginning of file, you can write the number of boolean values you want to write to the file; that number will help while reading the bytes from file, and converting them back into boolean values!
If you want the bitset class that best supports converting to binary, and your bitset is more than the size of unsigned long, then the best option to use is boost::dynamic_bitset. (I presume it is more than 32 and even 64 bits if you are that concerned about saving space).
From dynamic_bitset you can use to_block_range to write the bits into the underlying integral type. You can construct the dynamic_bitset back from the blocks by using from_block_range or its constructor from BlockInputIterator or by making append() calls.
Now you have the bytes in their native format (Block) you still have the issue of writing it to a stream and reading it back.
You will need to store a bit of "header" information first: the number of blocks you have and potentially the endianness. Or you might use a macro to convert to a standard endianness (eg ntohl but you will ideally use a macro that is no-op for your most common platform so if that is little-endian you probably want to store that way and convert only for big-endian systems).
(Note: I am assuming that boost::dynamic_bitset standardly converts integral types the same way regardless of underlying endianness. Their documentation does not say).
To write numbers binary to a stream use os.write( &data[0], sizeof(Block) * nBlocks ) and to read use is.read( &data[0], sizeof(Block) * nBlocks ) where data is assumed to be vector<Block> and before read you must do data.resize(nBlocks) (not reserve()). (You can also do weird stuff with istream_iterator or istreambuf_iterator but resize() is probably better).
Here is a try with two functions that will use a minimal number of bytes, without compressing the bitset.
template<int I>
void bitset_dump(const std::bitset<I> &in, std::ostream &out)
{
// export a bitset consisting of I bits to an output stream.
// Eight bits are stored to a single stream byte.
unsigned int i = 0; // the current bit index
unsigned char c = 0; // the current byte
short bits = 0; // to process next byte
while(i < in.size())
{
c = c << 1; //
if(in.at(i)) ++c; // adding 1 if bit is true
++bits;
if(bits == 8)
{
out.put((char)c);
c = 0;
bits = 0;
}
++i;
}
// dump remaining
if(bits != 0) {
// pad the byte so that first bits are in the most significant positions.
while(bits != 8)
{
c = c << 1;
++bits;
}
out.put((char)c);
}
return;
}
template<int I>
void bitset_restore(std::istream &in, std::bitset<I> &out)
{
// read bytes from the input stream to a bitset of size I.
/* for debug */ //for(int n = 0; n < I; ++n) out.at(n) = false;
unsigned int i = 0; // current bit index
unsigned char mask = 0x80; // current byte mask
unsigned char c = 0; // current byte in stream
while(in.good() && (i < I))
{
if((i%8) == 0) // retrieve next character
{ c = in.get();
mask = 0x80;
}
else mask = mask >> 1; // shift mask
out.at(i) = (c & mask);
++i;
}
}
Note that probably using a reinterpret_cast of the portion of memory used by the bitset as an array of chars could also work, but it is maybe not portable accross systems because you don't know what the representation of the bitset is (endianness?)
How about this
#include <sys/time.h>
#include <unistd.h>
#include <algorithm>
#include <fstream>
#include <vector>
...
{
std::srand(std::time(nullptr));
std::vector<bool> vct1, vct2;
vct1.resize(20000000, false);
vct2.resize(20000000, false);
// insert some data
for (size_t i = 0; i < 1000000; i++) {
vct1[std::rand() % 20000000] = true;
}
// serialize to file
std::ofstream ofs("bitset", std::ios::out | std::ios::trunc);
for (uint32_t i = 0; i < vct1.size(); i += std::_S_word_bit) {
auto vct1_iter = vct1.begin();
vct1_iter += i;
uint32_t block_num = i / std::_S_word_bit;
std::_Bit_type block_val = *(vct1_iter._M_p);
if (block_val != 0) {
// only write not-zero block
ofs.write(reinterpret_cast<char*>(&block_num), sizeof(uint32_t));
ofs.write(reinterpret_cast<char*>(&block_val), sizeof(std::_Bit_type));
}
}
ofs.close();
// deserialize
std::ifstream ifs("bitset", std::ios::in);
ifs.seekg(0, std::ios::end);
uint64_t file_size = ifs.tellg();
ifs.seekg(0);
uint64_t load_size = 0;
while (load_size < file_size) {
uint32_t block_num;
ifs.read(reinterpret_cast<char*>(&block_num), sizeof(uint32_t));
std::_Bit_type block_value;
ifs.read(reinterpret_cast<char*>(&block_value), sizeof(std::_Bit_type));
load_size += sizeof(uint32_t) + sizeof(std::_Bit_type);
auto offset = block_num * std::_S_word_bit;
if (offset >= vct2.size()) {
std::cout << "error! already touch end" << std::endl;
break;
}
auto iter = vct2.begin();
iter += offset;
*(iter._M_p) = block_value;
}
ifs.close();
// check result
int count_true1 = std::count(vct1.begin(), vct1.end(), true);
int count_true2 = std::count(vct2.begin(), vct2.end(), true);
std::cout << "count_true1: " << count_true1 << " count_true2: " << count_true2 << std::endl;
}
One way might be:
std::vector<bool> data = /* obtain bits somehow */
// Reserve an appropriate number of byte-sized buckets.
std::vector<char> bytes((int)std::ceil((float)data.size() / CHAR_BITS));
for(int byteIndex = 0; byteIndex < bytes.size(); ++byteIndex) {
for(int bitIndex = 0; bitIndex < CHAR_BITS; ++bitIndex) {
int bit = data[byteIndex * CHAR_BITS + bitIndex];
bytes[byteIndex] |= bit << bitIndex;
}
}
Note that this assumes you don't care what the bit layout ends up being in memory, because it makes no adjustments for anything. But as long as you also serialize out the number of bits that were actually stored (to cover cases where you have a bit count that isn't a multiple of CHAR_BITS) you can deserialize exactly the same bitset or vector as you had originally like this.
(I'm not happy with that bucket size computation but it's 1am and I'm having trouble thinking of something more elegant).
#include "stdio"
#include "bitset"
...
FILE* pFile;
pFile = fopen("output.dat", "wb");
...
const unsigned int size = 1024;
bitset<size> bitbuffer;
...
fwrite (&bitbuffer, 1, size/8, pFile);
fclose(pFile);
Two options:
Spend the extra pounds (or pence, more likely) for a bigger disk.
Write a routine to extract 8 bits from the bitset at a time, compose them into bytes, and write them to your output stream.