how to combine binary nth bit of char vector - c++

Used to working in python and trying abstract how to access the binary elements of a single char vector item. The issue is that python was very slow and i'm translating this to c++. I have a binary file and i read the file into a std::vector<char> buffer(1024) and the data is organized so that there are 32 channels and each channel is 32 bytes(256 bits) long. A sample consists of 1 bit from each of the 32 channels. So there are 256 samples in a set. What is the best way to read combine the nth bit of each 32 byte channel into a sample? Python has a bitstring module, anything related to that in c++?
I'm not asking how to read raw binary data into a bitset vector. I am asking how to read the nth bit of a char vector.

Don't read the file into std::vector<unsigned char>, instead use std::vector<std::bitset<32>>.
First, read the file into an std::vector of std::bitset-s. Since it only have a constructor from an unsigned long, we have to do it like this:
std::vector<std::bitset<32>> batches;
std::ifstream fin(<path>, std::ios::binary);
uint32_t x;
while (fin >> x)
batches.emplace_back(x);
Now, we have a vector that contains 256 batches of 32 bits each. Let's create the samples from them:
std::vector<std::bitset<256>> samples;
for (unsigned i = 0; i < 32; i++) {
std::bitset<256> sample;
for (unsigned j = 0; j < 256; j++)
sample[j] = batches[j][i];
samples.push_back(sample);
}
Now your samples vector contains 32 samples of 256 bits each.
samples[i] // <-- Sample i
samples[i][j] // <-- Bit j in sample i

Related

How to store Huffman Codes in a binary file c++?

I was working on a Huffman project to compress text files. I was able to generate the required codes. I read the whole file and accordingly stored the codes in a "vector char" variable. I also padded the encoded vector.
vector<char> padding(vector<char> text)
{
int num = text.size();
unsigned int pad_value = 32-(num%32);
for(int i=0;i<pad_value;i++){
text.push_back('0');
}
string pad_info = bitset<32>(pad_value).to_string();
for(int i=pad_info.length()-1;i>=0;i--){
text.insert(text.begin(),pad_info[i]);
}
return text;
}
I padded on the base of 32 bits, as I was thinking if using an array of "unsigned int" to directly store the integers in a binary file so that they occupy 4 bytes for every 32 characters. I used this function for that:
vector<unsigned int> build_byte_array(vector<char> padded_text)
{
vector<unsigned int> byte_arr;
for(int i=0;i<padded_text.size();i+=32)
{
string byte="";
for(int j=i;j<i+32;j++){
byte += padded_text[j];
}
unsigned int b = stoul(byte,nullptr,2);
//cout<<b<<":"<<byte<<endl;
byte_arr.push_back(b);
}
return byte_arr;
}
Now the problem is when I write this byte array to binary file using
ofstream output("compressed.bin",ios::binary);
for(int i=0;i<byte_array.size();i++){
unsigned int a = byte_array[i];
output.write((char*)(&a),sizeof(a));
}
I get a binary file which is bigger than the original text file. How do I solve that or what error am I making.
Edit : I tried to compress a file of about 2,493 KB (for testing purposes) and it generated a compressed.bin file of 3,431 KB. So, I don't think padding is the issue here.
I also tried with 15KB file but the size of always increases after using this algo.
I tried using:
for(int i=0;i<byte_array.size();i++){
unsigned int a = byte_array[i];
char b = (char)a;
output.write((char*)(&a),sizeof(b));
}
but after using this I am unable to recover the original byte array when decompressing the file.
unsigned int a = byte_array[i];
output.write((char*)(&a),sizeof(a));
The size of the write is sizeof(a) which is usually 4 bytes.
An unsigned int is not a byte. A more suitable type for a byte would be std::byte, uint8_t, or unsigned char.
You are expanding your data with padding, so if you're not getting much compression or there's not much data to begin with, the output could easily be larger.
You don't need to pad nearly as much as you do. First off, you are adding 32 bits when the data already ends on a word boundary (when num is a multiple of 32). Pad zero bits in that case. Second, you are inserting 32 bits at the start to record how many bits you padded, where five bits would suffice to encode 0..31. Third, you could write bytes instead of ints, so the padding on the end could be 0..7 bits, and you could prepend three bits instead of five. The padding overall could be reduced from your current 33..64 bits to 3..10 bits.

How can you read different sized bit values from a file?

I'm reading a bunch of bit values from a text file which are in binary from because I stored them using fwrite. The problem is that the first value in the file is 5 bytes in size and the next 4800 values are 2 bytes in size. So when I try to cycle through the file and read the values it will give me the wrong results because my program does not know that it should take 5 bytes the first time and then 2 bytes the remaining 4800 times.
Here is how I'm cycling through the file:
long lSize;
unsigned short * buffer;
size_t result;
pFile = fOpen("dataValues.txt", "rb");
lSize = ftell(pFile);
buffer = (unsigned short *) malloc (sizeof(unsigned short)*lSize);
size_t count = lSize/sizeof(short);
for(size_t i = 0; i < count; ++i)
{
result = fread(buffer+i, sizeof(unsigned short), 1, pFile);
print("%u\n", buffer[i]);
}
I'm pretty sure I'm going to need to change my fread statement because the first value is of type time_t so I'll probably need a statement that looks like this:
result = fread(buffer+i, sizeof(time_t), 1, pFile);
However, this did not work work when I tried it and I think it's because I am not changing the starting position properly. I think that while I do read 5 bytes worth of data, I don't move the starting position enough.
Does anyone here have a good understanding of fread? Can you please let me know what I can change to make my program accomplish what I need.
EDIT:
This is how I'm writing to the file.
fwrite(&timer, sizeof(timer), 1, pFile);
fwrite(ptr, sizeof(unsigned short), rawData.size(), pFile);
EDIT2:
I tried to read the file using ifstream
int main()
{
time_t x;
ifstream infile;
infile.open("binaryValues.txt", ios::binary | ios::in);
infile.read((char *) &x, sizeof(x));
return 0;
}
However, now it doesn't compile and just give me a bunch of undefined reference to errors to code that I don't even have written.
I don't see the problem:
uint8_t five_byte_buffer[5];
uint8_t two_byte_buffer[2];
//...
ifstream my_file(/*...*/);
my_file.read(&five_byte_buffer[0], 5);
my_file.read(&two_byte_buffer[0], 2);
So, what is your specific issue?
Edit 1: Reading in a loop
while (my_file.read(&five_byte_buffer[0], 5))
{
my_file.read(&two_byte_buffer[0], 5);
Process_Data();
}
You can't. Streams are byte, almost always octet (8 bit byte) oriented.
You can easily enough build a bit-oriented stream on top of that. You just keep a few bytes in a buffer and keep track of which bit is current. Watch out for getting the last few bits, and attempts to mix byte access with bit access.
Untested but this is the general idea.
struct bitstream
{
unsigned long long rack; // 64 bits rack
FILE *fp; // file opened for reading
int rackpos; // 0 - 63, poisition of bits read.
}
int getbits(struct bitstream *bs, int Nbits)
{
unsigned long long mask = 0x8000 0000 0000 0000;
int answer = 0;
while(bs->rackpos > 8)
{
bs->rack <<= 8;
bs->rack |= fgetc(bs->fp);
bs->rackpos -= 8;
}
mask >>= bs->rackpos;
for(i=0;i<Nbits;i++)
{
answer <<= 1;
answer |= bs->rack & mask;
mask >>= 1;
}
bs->rackpos += Nbits;
return answer;
}
You need to decide how you know when the stream is terminated. As is you'll corrupt the last few bits with the EOF read by fgetc().

Serialize/deserialize unsigned char

I'm working on an API for an embedded device, and need to display an image generated (by the API). The screen attached to the device allows me to render bitmaps, with data stored as unsigned char image[] = { 0B00000000, 0B00001111, 0B11111110... }.
What is the easiest way to deserialize a string in whatever format needed?
My approach was to create a stringstream, separate by comma and push to vector<char>. However, the function to render bitmaps will only accept char, and from what I can find online it seems to be quite difficult to convert it. Ideally, I'd rather not use a vector at all, as including it adds several kbs to the project, which is limited in size by both the download speed of the embedded device (firmware is transferred by EDGE) and the onboard storage.
From the comments, it sounds like you want to convert a string composed of a series of "0b00000000" style literals, comma separated, into an array of their actual values. The way I would do this is to:
Get the number of bytes in the image (I assume this is known from the string length?).
Create a std::vector of unsigned char to hold the results.
For each byte in the input, construct a std::bitset from the string value, and then get its actual value.
Here's a code example. Since you have said you'd rather not use vector I have used C-style arrays and strings:
#include <bitset>
#include <cstring>
#include <iostream>
#include <memory>
int main() {
auto input = "0b00000000,0b00001111,0b11111111";
auto length = strlen(input);
// Get the number of bytes from the string length. Each byte takes 10 chars
// plus a comma separator.
int size = (length + 1) / 11;
// Allocate memory to hold the result.
std::unique_ptr<unsigned char[]> bytes(new unsigned char[size]);
// Populate each byte individually.
for (int i = 0; i < size; ++i) {
// Create the bitset. The stride is 11, and skip the first 2 characters
// to skip the 0b prefix.
std::bitset<8> bitset(input + 2 + i * 11, 8);
// Store the resulting byte.
bytes[i] = bitset.to_ulong();
}
// Now loop back over each byte, and output it to confirm the result.
for (int i = 0; i < size; ++i) {
std::cout << "0b" << std::bitset<8>(bytes[i]) << std::endl;
}
}

Read binary data on C++ where different columns have different data types

We have a binary file that represents data arranged in columns.
Each column has a different data format, for example:
column 1: 8 bytes (unsigned long int)
column 2: 4 bytes (int)
column 3: 4 bytes (float)
What would be the best way to read these files in C++, I can do it in matlab, but I don't really have much clue of how to do it in C++
Assuming that these values are in order:
unsigned long int dataMember0 = 0;
int dataMember1 = 0;
float dataMember2 = 0.0;
std::ifstream fileStream("file.bin", std::ios::in | std::ios::binary);
fileStream.read((char*)&dataMember0, sizeof(unsigned long int));
fileStream.read((char*)&dataMember1, sizeof(int));
fileStream.read((char*)&dataMember2, sizeof(float));
You cast a char pointer because it is being read as an array of bytes (char is one byte). If you want to loop this process: while(fileStream) {...} will excute until there is no more to read

How can I print a bit instead of byte in a file?

I am using huffman algorithm to develop a file compressor and right now I am facing a problem which is:
By using the algorithm to the word:
stackoverflow, i get the following result:
a,c,e,f,k,l,r,s,t,v,w = 1 time repeated
o = 2 times repeated
a,c,e,f,k,l,r,s,t,v,w = 7.69231%
and
o = 15.3846%
So I start inserting then into a Binary Tree, which will get me the results:
o=00
a=010
e=0110
c=0111
t=1000
s=1001
w=1010
v=1011
k=1100
f=1101
r=1110
l=1111
which means the path for the character in the tree, considering 0 to be left and 1 to right.
then the word "stackoverflow" will be:
100110000100111010011111000010110110111011011111001010
and well, I want to put that whole value into a binary file to be in bits, which will result in 47bits, which would happen to be 6bytes, but instead I can only make it 47bytes because the minimun to put into a file with fwrite or fprintf is 1byte, by using sizeof(something).
Than my question is: how can I print in my file only a single bit?
Just write the "header" to the file: the number of bits and then "pack" the bits into bytes padding the last one. Here's a sample.
#include <stdio.h>
FILE* f;
/* how many bits in current byte */
int bit_counter;
/* current byte */
unsigned char cur_byte;
/* write 1 or 0 bit */
void write_bit(unsigned char bit)
{
if(++bit_counter == 8)
{
fwrite(&cur_byte,1,1,f);
bit_counter = 0;
cur_byte = 0;
}
cur_byte <<= 1;
cur_byte |= bit;
}
int main()
{
f = fopen("test.bits", "w");
cur_byte = 0;
bit_counter = 0;
/* write the number of bits here to decode the bitstream later (47 in your case) */
/* int num = 47; */
/* fwrite(num, 1, 4, f); */
write_bit(1);
write_bit(0);
write_bit(0);
/* etc... - do this in a loop for each encoded character */
/* 100110000100111010011111000010110110111011011111001010 */
if(bit_counter > 0)
{
// pad the last byte with zeroes
cur_byte <<= 8 - bit_counter;
fwrite(&cur_byte, 1, 1, f);
}
fclose(f);
return 0;
}
To do the full Huffman encoder you'll have to write the bit codes at the beginning, of course.
This is sort of an encoding issue. The problem is that files can only contain bytes - so 1 and 0 can only be '1' and '0' in a file - the characters for 1 and 0, which are bytes.
What you'll have to do is to pack the bits into bytes, creating a file that contains the bits in a set of bytes. You won't be able to open the file in a text editor - it doesn't know you want to display each bit as a 1 or 0 char, it will display whatever each packed byte turns out to be. You could open it with an editor that understands how to work with binary files, though. For instance, vim can do this.
As far as extra trailing bytes or an end-of-file marker, you're going to have to create some sort of encoding convention. For example, you can pack and pad with extra zeros, like you mention in your comments, but then by convention have the first N bytes be metadata - e.g. the data length, how many bits are interesting in your file. This sort of thing is very common.
You'll need to manage this yourself, by buffering the bits to write and only actually writing data when you have a complete byte. Something like...
void writeBit(bool b)
{
static char buffer=0;
static int bitcount=0;
buffer = (buffer << 1) | (b ? 1:0);
if (++bitcount == 8)
{
fputc(buffer); // write out the byte
bitcount = 0;
buffer = 0;
}
}
The above isn't reentrant (and is likely to be pretty inefficient) - and you need to make sure you somehow flush any half-written byte at the end, (write an extra 7 zero bits, maybe) but you should get the general idea.