Binary Output in C++

I'm working on a C++ text file compression program for a class.
I have everything working except being able to output to a file in binary mode.
I am using:
FILE* pFile;
pFile = fopen(c, "wb");
to open the file for writing in binary mode. I'm using a buffer of eight bools (initialized to all 0's) to hold the 1's and 0's of each byte of data until I call fwrite:
bool Buffer[8] = { 0, 0, 0, 0, 0, 0, 0, 0 };
vector<bool> temp2 = bitstring[temp]; // bitstring refers to a vector in a map at key [temp]
for (vector<bool>::iterator v = temp2.begin(); v != temp2.end(); ++v)
{
    if (j < 8) // to create an 8 bit/byte buffer
    {
        if (*v == 1) // checks the vector at that position to know what the bit is
            Buffer[j] = 1; // sets the array at 'j' to 1
        else
            Buffer[j] = 0; // sets the array at 'j' to 0
        j++;
    }
    else // once the buffer hits 8 it will print the buffer to the file
    {
        fwrite(Buffer, 1, sizeof(Buffer), pFile);
        clearBuffer(Buffer); // clears the buffer to restart the process
        j = 0;
    }
}
The vector iterator is walking through a vector of bools assigned to a specific character, essentially a unique binary string that represents that character. My problem is that instead of outputting packed bits, the buffer ends up writing essentially ASCII characters in binary mode, rather than the digits as actual bits, which makes the file WAY bigger than it needs to be. How could I change the buffer to output just bits? I was told to use bitwise operators, but I can't find much documentation on implementing this in C++. Any help is appreciated.

I would use std::bitset in the first place; it is flexible enough for this purpose:
std::bitset<8> BufferBits;
vector<bool> temp2 = bitstring[temp]; // bitstring refers to a vector in a map at key [temp]
for (vector<bool>::iterator v = temp2.begin(); v != temp2.end(); ++v)
{
    if (j < 8) // to create an 8 bit/byte buffer
    {
        if (*v == 1) // checks the vector at that position to know what the bit is
            BufferBits[j] = 1; // sets the bit at 'j' to 1
        else
            BufferBits[j] = 0; // sets the bit at 'j' to 0
        j++;
    }
    else // once the buffer hits 8 bits, write it to the file
    {
        unsigned long i = BufferBits.to_ulong();
        unsigned char c = static_cast<unsigned char>(i);
        fwrite(&c, sizeof(char), 1, pFile);
        BufferBits.reset(); // clears the buffer to restart the process
        j = 0;
    }
}
Note: I only addressed the issues regarding your bit buffer; the rest of the loop is kept as in your code.
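One more detail worth noting: std::bitset indexes from the least significant bit, so BufferBits[0] above ends up as the low-order bit of the byte that gets written. If you want the first bit of each code to land in the most significant position instead, index from the other end:
BufferBits[7 - j] = *v; // MSB-first: the first code bit lands in bit 7 of the byte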

To set a single bit in a byte, use a shift and an or. This code starts with the highest order bit in a byte when j is 0, which is the usual convention.
char data = 0;
// ...
data |= 0x80 >> j;
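Put together, a minimal sketch of that packing (writeBits is my name, not from the question; it assumes a vector<bool> of code bits and a FILE* already opened in "wb" mode):
#include <cstdio>
#include <vector>

void writeBits(const std::vector<bool>& bits, FILE* pFile)
{
    unsigned char data = 0;
    int j = 0;
    for (bool bit : bits)
    {
        if (bit)
            data |= 0x80 >> j; // set bit j, counting from the most significant end
        if (++j == 8)          // a full byte is ready
        {
            fwrite(&data, 1, 1, pFile);
            data = 0;
            j = 0;
        }
    }
    if (j != 0)                // flush a final partial byte, zero-padded
        fwrite(&data, 1, 1, pFile);
}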

To set individual bits in a byte you can use a union and bitfields:
#include <iostream>

typedef union
{
    struct
    {
        unsigned char value;
    } byte;
    struct
    {
        unsigned char b0:1;
        unsigned char b1:1;
        unsigned char b2:1;
        unsigned char b3:1;
        unsigned char b4:1;
        unsigned char b5:1;
        unsigned char b6:1;
        unsigned char b7:1;
    } bits;
} u;

int main()
{
    u buffer = {{0}};
    buffer.bits.b0 = 1;
    buffer.bits.b1 = 0;
    buffer.bits.b2 = 1;
    std::cout << static_cast<int>(buffer.byte.value) << std::endl;
    return 0;
}
which would print out 5 (bit-field layout is implementation-defined, so the exact value depends on your compiler and platform)

Related

Save not leading zero in an integer variable

I'm implementing the Huffman algorithm in C++, so I need to write a file in binary mode, bit by bit: I have to save the encoding of every character in a buffer and then write that binary data out once the buffer reaches the length of a byte.
Here's the code:
char temp[1] = {toPrint[0]};
unsigned long long int binary_buffer = atoi(temp);
int bitscount = 1;
char buf[1];
for (unsigned int i = 1; i < strlen(toPrint); i++)
{
    if (bitscount == 8)
    {
        buf[0] = (char)binary_buffer;
        fileout.write(buf, 1);
        bitscount = 0;
        binary_buffer = 0;
        buf[0] = 0;
    }
    else
    {
        temp[0] = toPrint[i];
        binary_buffer = (binary_buffer << 1) | atoi(temp);
        bitscount++;
    }
}
So my problem is that the variable binary_buffer loses the leading zeros, because it is an integer value, so I lose a large amount of data. How can I keep those leading zeros?
Obviously I can't use a char buffer, because every char weighs 1 byte.
Example:
Text to encode: cccccvvv
Encoding table: c=0 v=1
Text to print out: 00000111
Text printed: 111
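A sketch of the usual fix: count the bits explicitly instead of relying on the integer's numeric value, so a leading zero is just a shift that happens to add nothing. writeBitString is a hypothetical name; it assumes toPrint is a '0'/'1' string as in the example.
#include <cstring>
#include <fstream>

void writeBitString(const char* toPrint, std::ofstream& fileout)
{
    unsigned char binary_buffer = 0;
    int bitscount = 0; // bitscount, not the buffer's value, decides when a byte is full
    for (std::size_t i = 0; i < std::strlen(toPrint); i++)
    {
        binary_buffer = (binary_buffer << 1) | (toPrint[i] - '0');
        if (++bitscount == 8)
        {
            fileout.write(reinterpret_cast<char*>(&binary_buffer), 1);
            binary_buffer = 0;
            bitscount = 0;
        }
    }
    if (bitscount != 0) // flush remaining bits, zero-padded on the right
    {
        binary_buffer <<= (8 - bitscount);
        fileout.write(reinterpret_cast<char*>(&binary_buffer), 1);
    }
}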

Convert unsigned char array of characters to int C++

How can I convert an unsigned char array that contains letters into an integer? I have tried this so far, but it only converts up to four bytes. I also need a way to convert the integer back into the unsigned char array.
int buffToInteger(char* buffer)
{
    int a = static_cast<int>(static_cast<unsigned char>(buffer[0]) << 24 |
                             static_cast<unsigned char>(buffer[1]) << 16 |
                             static_cast<unsigned char>(buffer[2]) << 8 |
                             static_cast<unsigned char>(buffer[3]));
    return a;
}
It looks like you're trying to use a for loop, i.e. repeating a task over and over, for an indeterminate number of steps:
unsigned int buffToInteger(char* buffer, unsigned int size)
{
    // assert(size <= sizeof(int));
    unsigned int ret = 0;
    int shift = 0;
    for (int i = size - 1; i >= 0; i--) { // note ';' before i--, not ','
        // cast through unsigned char first so negative chars don't sign-extend
        ret |= static_cast<unsigned int>(static_cast<unsigned char>(buffer[i])) << shift;
        shift += 8;
    }
    return ret;
}
What I think you are going for is called a hash: converting an object to a unique integer. The problem is that a hash IS NOT REVERSIBLE. A hash should produce different results for hash("WXYZABCD", 8) and hash("ABCD", 4); the answer by @Nicholas Pipitone DOES NOT produce different outputs for these different inputs.
Once you compute this hash, there is no way to get the original string back. If you want to keep knowledge of the original string, you MUST keep the original string as a variable.
int hash(char* buffer, size_t size) {
    int res = 0;
    for (size_t i = 0; i < size; ++i) {
        res += buffer[i];
        res *= 31;
    }
    return res;
}
Here's how to convert the first sizeof(int) bytes of the char array to an int:
int val = *(unsigned int *)buffer;
and to convert it back:
*(unsigned int *)buffer = val;
Note that your buffer must be at least sizeof(int) bytes long; you should check for this.
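As a side note, the pointer casts above can run into alignment and strict-aliasing trouble; a sketch of a safer equivalent using std::memcpy (same effect, well defined by the standard):
#include <cstring>

unsigned int val;
std::memcpy(&val, buffer, sizeof val); // bytes -> int
std::memcpy(buffer, &val, sizeof val); // int -> bytes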

Using c++ is it possible to convert an Ascii character to Hex?

I have written a program that sets up a client/server TCP socket over which the user sends an integer value to the server through the use of a terminal interface. On the server side I am executing byte commands for which I need hex values stored in my array.
sprintf(mychararray, "%X", myintvalue);
This code takes my integer and prints it as a hex value into a char array. The only problem is that when I use that array to set my commands, each digit registers as an ASCII character. So for example, if I send an integer equal to 3000, it is converted to 0x0BB8 and then stored as 'B' 'B' '8', which corresponds to 42 42 38 in hex. I have looked all over for a solution and have not been able to come up with one.
I finally came up with a solution to my problem. First I created an array and stored all byte values from 0x00 to 0xFF in it.
char m_list[256]; // array defined in class
m_list[0] = 0x00; // set first array index to zero
int count = 1;    // count variable to step through the array and set members
while (count < 256)
{
    m_list[count] = m_list[count - 1] + 0x01; // populate array with hex from 0x00 - 0xFF
    count++;
}
Next I created a function that lets me group the hex digits into individual bytes and store them in the array that will process my command.
void parse_input(char hex_array[], int i, char ans_array[])
{
    int n = 0;
    int j = 0;
    int idx = 0;
    string hex_values;
    while (n < i - 1)
    {
        if (hex_array[n] == '\0') // note '==', not '=', which would assign
        {
            hex_values = '0';
        }
        else
        {
            hex_values = hex_array[n];
        }
        if (hex_array[n + 1] == '\0')
        {
            hex_values += '0';
        }
        else
        {
            hex_values += hex_array[n + 1];
        }
        cout << "This is the string being used in stoi: " << hex_values; // statement for testing
        idx = stoul(hex_values, nullptr, 16);
        ans_array[j] = m_list[idx];
        n = n + 2;
        j++;
    }
}
This function will be called right after my previous code:
sprintf(mychararray, "%X", myintvalue);
parse_input(arrayA, sizeof(arrayA), arrayB);
Example: arrayA is an 8-byte char array and arrayB is a 4-byte char array. arrayA should be double the size of arrayB, since you are taking two ASCII values and making a byte pair, e.g. 'A' 'B' = 0xAB.
While I was trying to understand your question, I realized you needed more than a single variable: you needed a class. You want both a string that represents the hex code for printing and the number itself in the form of an unsigned 16-bit integer, which I took to be unsigned short int. So I created a class that does all this for you, named hexset (I got the idea from bitset):
#include <iostream>
#include <string>

class hexset {
public:
    hexset(int num) {
        this->hexnum = (unsigned short int) num;
        this->hexstring = hexset::to_string(num);
    }
    unsigned short int get_hexnum() { return this->hexnum; }
    std::string get_hexstring() { return this->hexstring; }
private:
    static std::string to_string(int decimal) {
        int length = int_length(decimal);
        std::string ret = "";
        for (int i = (length > 1 ? int_length(decimal) - 1 : length); i >= 0; i--) {
            ret = hex_arr[decimal % 16] + ret;
            decimal /= 16;
        }
        if (ret[0] == '0') {
            ret = ret.substr(1, ret.length() - 1);
        }
        return "0x" + ret;
    }
    static int int_length(int num) {
        int ret = 1;
        while (num > 10) {
            num /= 10;
            ++ret;
        }
        return ret;
    }
    static constexpr char hex_arr[16] = {'0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F'};
    unsigned short int hexnum;
    std::string hexstring;
};

constexpr char hexset::hex_arr[16];

int main() {
    int number_from_file = 3000; // this number is in all forms technically; hex is just another way to represent it
    hexset hex(number_from_file);
    std::cout << hex.get_hexstring() << ' ' << hex.get_hexnum() << std::endl;
    return 0;
}
I assume you'll probably want to do some operator overloading so you can add and subtract from this number, assign new numbers, or do any kind of mathematical or bit-shift operation.
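For illustration, a rough sketch of what such overloads might look like (hypothetical, building on the hexset class above via its public getters):
hexset operator+(hexset a, hexset b) {
    return hexset(a.get_hexnum() + b.get_hexnum()); // add numerically, rebuild the string
}
hexset operator<<(hexset a, int shift) {
    return hexset(a.get_hexnum() << shift); // shift the value, rebuild the string
}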

Convert a decoded Base64 byte string to a vector of bools

I have a base64 string containing bits; I have already decoded it with the code linked here. But I'm unable to transform the resulting string into bits I can work with. Is there a way to convert the bytes contained in the decoded string to a vector of bools containing the bits of the string?
I have tried converting the chars with this code, but it fails to convert them properly:
void DecodedStringToBit(std::string const& decodedString, std::vector<bool>& bits) {
    int it = 0;
    for (int i = 0; i < decodedString.size(); ++i) {
        unsigned char c = decodedString[i];
        for (unsigned char j = 128; j > 0; j <<= 1) {
            if (c & j) bits[++it] = true;
            else bits[++it] = false;
        }
    }
}
Your inner for loop is botched: it's shifting j the wrong way. And honestly, if you want to work with 8-bit values, you should use the proper <stdint.h> types instead of unsigned char:
for (uint8_t j = 128; j; j >>= 1)
bits.push_back(c & j);
Also, remember to call bits.reserve(decodedString.size() * 8); so your program doesn't waste a bunch of time on resizing.
I'm assuming the bit order is MSB first. If you want LSB first, the loop becomes:
for (uint8_t j = 1; j; j <<= 1)
In OP's code, it is not clear whether the vector bits has sufficient size, for example whether it is resized by the caller (it should not be!). If not, the vector has no space allocated, and bits[++it] may not work; the appropriate thing would be to push_back. (Moreover, I think the code would need a post-increment of it, i.e. bits[it++], to start from bits[0].)
Furthermore, in OP's code, the purpose of unsigned char j = 128 with j <<= 1 is not clear: wouldn't j be all zeros after the first iteration? If so, the inner loop would always run for only one iteration.
I would try something like this (not compiled; CHAR_BIT comes from <climits>):
void DecodedStringToBit(std::string const& decodedString,
                        std::vector<bool>& bits) {
    for (std::size_t charIndex = 0; charIndex != decodedString.size(); ++charIndex) {
        const unsigned char c = decodedString[charIndex];
        for (int bitIndex = 0; bitIndex != CHAR_BIT; ++bitIndex) {
            // CHAR_BIT = bits in a char = 8
            const bool bit = c & (1 << bitIndex); // bitwise-AND with mask
            bits.push_back(bit);
        }
    }
}

How does one store a vector<bool> or a bitset into a file, but bit-wise?

How to write bitset data to a file?
The first answer doesn't answer the question correctly, since it takes 8 times more space than it should.
How would you do it ? I really need it to save a lot of true/false values.
Simplest approach: take 8 consecutive boolean values, represent them as a single byte, and write that byte to your file. That saves a lot of space.
At the beginning of the file you can write the number of boolean values you are storing; that number will help when reading the bytes back from the file and converting them into boolean values!
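A minimal sketch of that approach (saveBools is my name, not from the answer; it packs MSB-first and writes the count as a header):
#include <cstdint>
#include <fstream>
#include <vector>

void saveBools(const std::vector<bool>& vct, std::ofstream& ofs)
{
    std::uint64_t n = vct.size();
    ofs.write(reinterpret_cast<const char*>(&n), sizeof n); // header: number of bits
    unsigned char byte = 0;
    int used = 0;
    for (std::size_t i = 0; i < vct.size(); ++i)
    {
        byte = (byte << 1) | (vct[i] ? 1 : 0);
        if (++used == 8)
        {
            ofs.put(static_cast<char>(byte));
            byte = 0;
            used = 0;
        }
    }
    if (used != 0) // pad the last byte with zeros
        ofs.put(static_cast<char>(byte << (8 - used)));
}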
If you want the bitset class that best supports conversion to binary, and your bitset is larger than an unsigned long, then the best option is boost::dynamic_bitset. (I presume it is more than 32, and even 64, bits if you are that concerned about saving space.)
From dynamic_bitset you can use to_block_range to write the bits into the underlying integral type. You can construct the dynamic_bitset back from the blocks by using from_block_range or its constructor from BlockInputIterator or by making append() calls.
Now you have the bytes in their native format (Block) you still have the issue of writing it to a stream and reading it back.
You will need to store a bit of "header" information first: the number of blocks you have and potentially the endianness. Or you might use a macro to convert to a standard endianness (eg ntohl but you will ideally use a macro that is no-op for your most common platform so if that is little-endian you probably want to store that way and convert only for big-endian systems).
(Note: I am assuming that boost::dynamic_bitset standardly converts integral types the same way regardless of underlying endianness. Their documentation does not say).
To write the numbers to a stream in binary, use os.write(reinterpret_cast<const char*>(&data[0]), sizeof(Block) * nBlocks), and to read them back use the matching is.read call, where data is assumed to be a vector<Block>; before the read you must call data.resize(nBlocks) (not reserve()). (You can also do weird stuff with istream_iterator or istreambuf_iterator, but resize() is probably better.)
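A sketch of that round trip under the assumptions above (block-count header, native endianness, default Block type; save/load are my names):
#include <boost/dynamic_bitset.hpp>
#include <cstdint>
#include <fstream>
#include <vector>

typedef boost::dynamic_bitset<>::block_type Block;

void save(const boost::dynamic_bitset<>& b, std::ofstream& os)
{
    std::vector<Block> data(b.num_blocks());
    boost::to_block_range(b, data.begin()); // copy the bits out as blocks
    std::uint64_t nBits = b.size();
    os.write(reinterpret_cast<const char*>(&nBits), sizeof nBits);
    os.write(reinterpret_cast<const char*>(&data[0]), sizeof(Block) * data.size());
}

boost::dynamic_bitset<> load(std::ifstream& is)
{
    std::uint64_t nBits = 0;
    is.read(reinterpret_cast<char*>(&nBits), sizeof nBits);
    boost::dynamic_bitset<> b(nBits);
    std::vector<Block> data(b.num_blocks());
    is.read(reinterpret_cast<char*>(&data[0]), sizeof(Block) * data.size());
    boost::from_block_range(data.begin(), data.end(), b); // copy the blocks back in
    return b;
}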
Here is an attempt with two functions that use a minimal number of bytes, without compressing the bitset.
template<int I>
void bitset_dump(const std::bitset<I>& in, std::ostream& out)
{
    // export a bitset consisting of I bits to an output stream.
    // Eight bits are stored to a single stream byte.
    unsigned int i = 0;  // the current bit index
    unsigned char c = 0; // the current byte
    short bits = 0;      // bits gathered so far toward the next byte
    while (i < in.size())
    {
        c = c << 1;
        if (in.test(i)) ++c; // adding 1 if bit is true (std::bitset has test(), not at())
        ++bits;
        if (bits == 8)
        {
            out.put((char)c);
            c = 0;
            bits = 0;
        }
        ++i;
    }
    // dump remaining
    if (bits != 0) {
        // pad the byte so that first bits are in the most significant positions.
        while (bits != 8)
        {
            c = c << 1;
            ++bits;
        }
        out.put((char)c);
    }
    return;
}
template<int I>
void bitset_restore(std::istream& in, std::bitset<I>& out)
{
    // read bytes from the input stream to a bitset of size I.
    /* for debug */ // for (int n = 0; n < I; ++n) out[n] = false;
    unsigned int i = 0;        // current bit index
    unsigned char mask = 0x80; // current byte mask
    unsigned char c = 0;       // current byte in stream
    while (in.good() && (i < I))
    {
        if ((i % 8) == 0) // retrieve next character
        {
            c = in.get();
            mask = 0x80;
        }
        else mask = mask >> 1; // shift mask
        out[i] = (c & mask);   // operator[], since std::bitset has no at()
        ++i;
    }
}
Note that a reinterpret_cast of the portion of memory used by the bitset as an array of chars could probably also work, but it may not be portable across systems, because you don't know what the internal representation of the bitset is (endianness?).
How about this (note that it pokes at libstdc++ internals such as std::_S_word_bit and _M_p, so it is not portable):
#include <sys/time.h>
#include <unistd.h>
#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <ctime>
#include <fstream>
#include <iostream>
#include <vector>
...
{
    std::srand(std::time(nullptr));
    std::vector<bool> vct1, vct2;
    vct1.resize(20000000, false);
    vct2.resize(20000000, false);
    // insert some data
    for (size_t i = 0; i < 1000000; i++) {
        vct1[std::rand() % 20000000] = true;
    }
    // serialize to file
    std::ofstream ofs("bitset", std::ios::out | std::ios::trunc);
    for (uint32_t i = 0; i < vct1.size(); i += std::_S_word_bit) {
        auto vct1_iter = vct1.begin();
        vct1_iter += i;
        uint32_t block_num = i / std::_S_word_bit;
        std::_Bit_type block_val = *(vct1_iter._M_p);
        if (block_val != 0) {
            // only write non-zero blocks
            ofs.write(reinterpret_cast<char*>(&block_num), sizeof(uint32_t));
            ofs.write(reinterpret_cast<char*>(&block_val), sizeof(std::_Bit_type));
        }
    }
    ofs.close();
    // deserialize
    std::ifstream ifs("bitset", std::ios::in);
    ifs.seekg(0, std::ios::end);
    uint64_t file_size = ifs.tellg();
    ifs.seekg(0);
    uint64_t load_size = 0;
    while (load_size < file_size) {
        uint32_t block_num;
        ifs.read(reinterpret_cast<char*>(&block_num), sizeof(uint32_t));
        std::_Bit_type block_value;
        ifs.read(reinterpret_cast<char*>(&block_value), sizeof(std::_Bit_type));
        load_size += sizeof(uint32_t) + sizeof(std::_Bit_type);
        auto offset = block_num * std::_S_word_bit;
        if (offset >= vct2.size()) {
            std::cout << "error! already touch end" << std::endl;
            break;
        }
        auto iter = vct2.begin();
        iter += offset;
        *(iter._M_p) = block_value;
    }
    ifs.close();
    // check result
    int count_true1 = std::count(vct1.begin(), vct1.end(), true);
    int count_true2 = std::count(vct2.begin(), vct2.end(), true);
    std::cout << "count_true1: " << count_true1 << " count_true2: " << count_true2 << std::endl;
}
One way might be:
#include <climits> // CHAR_BIT: the number of bits in a char
#include <cmath>
#include <vector>

std::vector<bool> data = /* obtain bits somehow */;
// Reserve an appropriate number of byte-sized buckets.
std::vector<char> bytes((int)std::ceil((float)data.size() / CHAR_BIT));
for (std::size_t byteIndex = 0; byteIndex < bytes.size(); ++byteIndex) {
    for (int bitIndex = 0; bitIndex < CHAR_BIT; ++bitIndex) {
        std::size_t bitPos = byteIndex * CHAR_BIT + bitIndex;
        if (bitPos >= data.size()) break; // don't read past the last stored bit
        int bit = data[bitPos];
        bytes[byteIndex] |= bit << bitIndex;
    }
}
Note that this assumes you don't care what the bit layout ends up being in memory, because it makes no adjustments for anything. But as long as you also serialize out the number of bits that were actually stored (to cover cases where the bit count isn't a multiple of CHAR_BIT), you can deserialize exactly the same bitset or vector you had originally; a sketch follows below.
(I'm not happy with that bucket-size computation, but it's 1am and I'm having trouble thinking of something more elegant.)
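For the reverse direction, a sketch under the same assumptions (LSB-first within each byte, with the original bit count saved alongside the bytes; unpack is a hypothetical name):
#include <climits>
#include <vector>

std::vector<bool> unpack(const std::vector<char>& bytes, std::size_t bitCount)
{
    std::vector<bool> data;
    data.reserve(bitCount);
    for (std::size_t bitPos = 0; bitPos < bitCount; ++bitPos) {
        unsigned char byte = static_cast<unsigned char>(bytes[bitPos / CHAR_BIT]);
        data.push_back((byte >> (bitPos % CHAR_BIT)) & 1); // extract the bit at its position within the byte
    }
    return data;
}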
#include "stdio"
#include "bitset"
...
FILE* pFile;
pFile = fopen("output.dat", "wb");
...
const unsigned int size = 1024;
bitset<size> bitbuffer;
...
fwrite (&bitbuffer, 1, size/8, pFile);
fclose(pFile);
Two options:
Spend the extra pounds (or pence, more likely) for a bigger disk.
Write a routine to extract 8 bits from the bitset at a time, compose them into bytes, and write them to your output stream.