I'm working on Huffman coding and have built the character frequency table with a
std::map<char,int> frequencyTable;
Then I built the Huffman tree, and from it the code table, like this:
std::map<char,std::vector<bool> > codes;
Now I want to read the input file character by character and encode each character through the code table, but I don't know how to write bits to a binary output file.
Any advice?
UPDATE:
Now I'm trying with these functions:
void Encoder::makeFile()
{
char c,ch;
unsigned char ch2;
while(inFile.get(c))
{
ch=c;
//send the Huffman string to output file bit by bit
for(unsigned int i=0;i < codes[ch].size();++i)
{
if(codes[ch].at(i)==false){
ch2=0;
}else{
ch2=1;
}
encode(ch2, outFile);
}
}
ch2=2; // send EOF
encode(ch2, outFile);
inFile.close();
outFile.close();
}
and this:
void Encoder::encode(unsigned char i, std::ofstream & outFile)
{
int bit_pos=0; //0 to 7 (left to right) on the byte block
unsigned char c; //byte block to write
if(i<2) //if not EOF
{
if(i==1)
c |= (i<<(7-bit_pos)); //add a 1 to the byte
else //i==0
c=c & static_cast<unsigned char>(255-(1<<(7-bit_pos))); //add a 0
++bit_pos;
bit_pos%=8;
if(bit_pos==0)
{
outFile.put(c);
c='\0';
}
}
else
{
outFile.put(c);
}
}
but I don't know why it doesn't work: the loop is never executed and the encode function is never called. Why?
You can't write a single bit directly to a file. The I/O unit of reading/writing is a byte (8-bits). So you need to pack your bools into chunks of 8 bits and then write the bytes. See Writing files in bit form to a file in C or How to write single bits to a file in C for example.
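The packing step described above can be sketched as follows. This is a minimal illustration, not the asker's code: the helper names packBools, bitAt and writePacked are hypothetical, bits are stored most significant bit first, and the final partial byte is padded with zeros (a real Huffman file would also need to record how many bits are valid).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <ostream>
#include <vector>

// Hypothetical helper: packs bits into bytes, first bit in the most
// significant position. A final partial byte is padded with zeros.
std::vector<std::uint8_t> packBools(const std::vector<bool>& bits)
{
    std::vector<std::uint8_t> bytes((bits.size() + 7) / 8, 0);
    for (std::size_t i = 0; i < bits.size(); ++i) {
        if (bits[i]) {
            bytes[i / 8] |= static_cast<std::uint8_t>(1u << (7 - i % 8));
        }
    }
    return bytes;
}

// Hypothetical helper: reads bit number i back out of the packed bytes.
bool bitAt(const std::vector<std::uint8_t>& bytes, std::size_t i)
{
    assert(i / 8 < bytes.size());  // the bit must lie inside the buffer
    return ((bytes[i / 8] >> (7 - i % 8)) & 1u) != 0;
}

// Writes the packed bytes to a stream that was opened in binary mode.
void writePacked(std::ostream& out, const std::vector<bool>& bits)
{
    const std::vector<std::uint8_t> bytes = packBools(bits);
    if (!bytes.empty()) {
        out.write(reinterpret_cast<const char*>(bytes.data()),
                  static_cast<std::streamsize>(bytes.size()));
    }
}
```

With the codes table from the question, one would append each character's std::vector<bool> to a single bit vector and pass that to writePacked once at the end.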
The C++ standard streams read and write in units of the smallest addressable type, which is a byte (a char).
There are implementations of a bit stream class in C++, such as the Stanford Bitstream Class.
Another approach could use the std::bitset class.
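As a rough sketch of the std::bitset idea, the function below collects up to eight bits into a std::bitset<8> and converts it to one byte. The name byteFromBits and the most-significant-bit-first ordering are assumptions made for illustration.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: packs up to 8 bits, starting at `offset`, into one
// byte. Missing bits at the end of the vector are left as 0.
unsigned char byteFromBits(const std::vector<bool>& bits, std::size_t offset)
{
    assert(offset <= bits.size());  // offset must not point past the data
    std::bitset<8> chunk;           // all eight bits start out as 0
    for (std::size_t i = 0; i < 8 && offset + i < bits.size(); ++i) {
        chunk[7 - i] = bits[offset + i];  // first bit goes to the top position
    }
    return static_cast<unsigned char>(chunk.to_ulong());
}
```

Calling it with offsets 0, 8, 16, ... and writing each result with outFile.put() would emit the whole bit sequence one byte at a time.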
I want to store multiple arrays whose entries are all either 0 or 1.
The file would be quite large if I keep doing it the way I do now.
Here is a minimal version of what I currently do.
#include <iostream>
#include <fstream>
using namespace std;
int main(){
ofstream File;
File.open("test.csv");
int array[4]={1,0,0,1};
for(int i = 0; i < 4; ++i){
File << array[i] << endl;
}
File.close();
return 0;
}
So basically, is there a way of storing this in a binary file or something, since my data is 0 or 1 in the first place anyway?
If yes, how do I do this? Can I also still have line breaks and maybe even commas in that file? If either of the latter does not work, that's also fine. More importantly, how do I store this as a binary file that contains only 0s and 1s so my file is smaller?
Thank you very much!
The obvious solution is to take 64 characters, say A-Z, a-z, 0-9, and + and /, and have each character code for six entries in your table. There is, in fact, a standard for this called Base64. In Base64, A encodes 0,0,0,0,0,0 while / encodes 1,1,1,1,1,1. Each combination of six zeroes or ones has a corresponding character.
This still leaves commas, spaces, and newlines free for your use as separators.
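A small sketch of the six-entries-per-character idea, assuming the standard Base64 alphabet. The function name encodeSixBitGroups is made up for this example, and a final group shorter than six entries is padded with zeros, so a reader would need to know the original array length separately.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// The standard Base64 alphabet: index 0 is 'A', index 63 is '/'.
const char kBase64Alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical helper: turns each group of six 0/1 entries into one
// character. A short final group is padded with zeros on the right.
std::string encodeSixBitGroups(const std::vector<int>& entries)
{
    std::string text;
    for (std::size_t start = 0; start < entries.size(); start += 6) {
        unsigned value = 0;
        for (std::size_t i = 0; i < 6; ++i) {
            const std::size_t pos = start + i;
            const unsigned bit =
                (pos < entries.size() && entries[pos] != 0) ? 1u : 0u;
            value = (value << 1) | bit;  // earlier entries end up in higher bits
        }
        assert(value < 64);  // six bits always index into the alphabet
        text += kBase64Alphabet[value];
    }
    return text;
}
```

Each array can then be written as one line of text, with commas or newlines as separators, at one character per six entries instead of two characters per entry.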
If you want to store the data as compactly as possible, I'd recommend storing it as binary data, where each bit in the binary file represents one boolean value. This will allow you to store 8 boolean values for each byte of disk space you use up.
If you want to store arrays whose lengths are not multiples of 8, it gets a little bit more complicated since you can't store a partial byte, but you can solve that problem by storing an extra byte of meta-data at the end of the file that specifies how many bits of the final data-byte are valid and how many are just padding.
Something like this:
#include <iostream>
#include <fstream>
#include <cstdint>
#include <vector>
#include <sys/types.h> // for ssize_t, which is POSIX rather than standard C++
using namespace std;
// Given an array of ints that are either 1 or 0, returns a packed-array
// of uint8_t's containing those bits as compactly as possible.
vector<uint8_t> packBits(const int * array, size_t arraySize)
{
const size_t vectorSize = ((arraySize+7)/8)+1; // round up, then +1 for the metadata byte
vector<uint8_t> packedBits;
packedBits.resize(vectorSize, 0);
// Store 8 boolean-bits into each byte of (packedBits)
for (size_t i=0; i<arraySize; i++)
{
if (array[i] != 0) packedBits[i/8] |= (1<<(i%8));
}
// The last byte in the array is special; it holds the number of
// valid bits that we stored to the byte just before it.
// That way if the number of bits we saved isn't an even multiple of 8,
// we can use this value later on to calculate exactly how many bits we should restore
packedBits[vectorSize-1] = arraySize%8;
return packedBits;
}
// Given a packed-bits vector (i.e. as previously returned by packBits()),
// returns the vector-of-integers that was passed to the packBits() call.
vector<int> unpackBits(const vector<uint8_t> & packedBits)
{
vector<int> ret;
if (packedBits.size() < 2) return ret;
const size_t validBitsInLastByte = packedBits[packedBits.size()-1]%8;
const size_t numValidBits = 8*(packedBits.size()-((validBitsInLastByte>0)?2:1)) + validBitsInLastByte;
ret.resize(numValidBits);
for (size_t i=0; i<numValidBits; i++)
{
ret[i] = (packedBits[i/8] & (1<<(i%8))) ? 1 : 0;
}
return ret;
}
// Returns the size of the specified file in bytes, or -1 on failure
static ssize_t getFileSize(ifstream & inFile)
{
if (inFile.is_open() == false) return -1;
const streampos origPos = inFile.tellg(); // record current seek-position
inFile.seekg(0, ios::end); // seek to the end of the file
const ssize_t fileSize = inFile.tellg(); // the position at the end equals the file size in bytes
inFile.seekg(origPos); // so we won't change the file's read-position as a side effect
return fileSize;
}
int main(){
// Example of packing an array-of-ints into packed-bits form and saving it
// to a binary file
{
const int array[]={0,0,1,1,1,1,1,0,1,0};
// Pack the int-array into packed-bits format
const vector<uint8_t> packedBits = packBits(array, sizeof(array)/sizeof(array[0]));
// Write the packed-bits to a binary file
ofstream outFile;
outFile.open("test.bin", ios::binary);
outFile.write(reinterpret_cast<const char *>(&packedBits[0]), packedBits.size());
outFile.close();
}
// Now we'll read the binary file back in, unpack the bits to a vector<int>,
// and print out the contents of the vector.
{
// open the file for reading
ifstream inFile;
inFile.open("test.bin", ios::binary);
const ssize_t fileSizeBytes = getFileSize(inFile);
if (fileSizeBytes < 0)
{
cerr << "Couldn't read test.bin, aborting" << endl;
return 10;
}
// Read in the packed-binary data
vector<uint8_t> packedBits;
packedBits.resize(fileSizeBytes);
inFile.read(reinterpret_cast<char *>(&packedBits[0]), fileSizeBytes);
// Expand the packed-binary data back out to one-int-per-boolean
vector<int> unpackedInts = unpackBits(packedBits);
// Print out the int-array's contents
cout << "Loaded-from-disk unpackedInts vector is " << unpackedInts.size() << " items long:" << endl;
for (size_t i=0; i<unpackedInts.size(); i++) cout << unpackedInts[i] << " ";
cout << endl;
}
return 0;
}
(You could probably make the file even more compact than that by running zip or gzip on the file after you write it out :) )
You can indeed write and read binary data. However, keeping line breaks and commas would be difficult. Imagine you save your data as booleans, so only ones and zeros. A comma would then need a special character, but you have only ones and zeros! The next best thing would be to store pairs of bits (C++ would then read the data two bits at a time), one carrying the actual value and the other saying whether a comma follows, but I doubt this is what you need. If you want something like a CSV, it is easier to fix the size of each column (an int would be 4 bytes, a string no more than 32 chars, for example) and then read and write accordingly. Suppose you have your binary
To initially save an array of objects, say pets, you would use
FILE *apFile;
apFile = fopen(FILENAME, "wb"); // "b": binary mode, so no newline translation is applied
fwrite(ARRAY_OF_PETS, sizeof(Pet), SIZE_OF_ARRAY, apFile);
fclose(apFile);
To access the pet at index idx, you would use
Pet m;
ifstream input_file (FILENAME, ios::in|ios::binary|ios::ate);
input_file.seekg (sizeof(Pet) * idx, ios::beg);
input_file.read((char*) &m,sizeof(Pet));
input_file.close();
You can also add data at the end, change data in the middle, and so on.
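To make the fixed-size record idea concrete, here is a minimal sketch. The Pet layout (a 32-char name plus a 32-bit age) and the helpers writePet, readPet and makePet are assumptions for illustration, since the answer does not show them. Writing the raw struct like this is only safe for trivially copyable types and ties the file to one platform's layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <iostream>
#include <sstream>  // only needed for the in-memory usage mentioned below

// Hypothetical fixed-size record; the original answer does not show Pet.
struct Pet {
    char name[32];
    std::int32_t age;
};

// Builds a record whose unused name bytes are all zero.
Pet makePet(const char* name, std::int32_t age)
{
    Pet pet = {};
    assert(std::strlen(name) < sizeof(pet.name));  // name must fit with its '\0'
    std::strncpy(pet.name, name, sizeof(pet.name) - 1);
    pet.age = age;
    return pet;
}

// Overwrites the record at position idx (or appends when idx is the count).
void writePet(std::ostream& out, std::size_t idx, const Pet& pet)
{
    out.seekp(static_cast<std::streamoff>(idx * sizeof(Pet)), std::ios::beg);
    out.write(reinterpret_cast<const char*>(&pet), sizeof(Pet));
}

// Reads the record at idx; returns false if it is not completely present.
bool readPet(std::istream& in, std::size_t idx, Pet& pet)
{
    in.clear();  // drop an EOF flag left over from an earlier read
    in.seekg(static_cast<std::streamoff>(idx * sizeof(Pet)), std::ios::beg);
    in.read(reinterpret_cast<char*>(&pet), sizeof(Pet));
    return in.gcount() == static_cast<std::streamsize>(sizeof(Pet));
}
```

The same functions work on a std::fstream opened with ios::in | ios::out | ios::binary, or on a std::stringstream when experimenting without touching the disk.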
I am trying to write a compression program that, for example, writes only 1 or 2 bits instead of the regular 8 bits (a char), depending on the char being written. I tried writing with:
//I don't know what the function should return
char getBytes(char c)
{
return 0xff;
}
ofstream fout;
fout.open("file.bin", ios::binary | ios::out);
fout << getBytes(c);
but so far I have only succeeded in writing whole chars.
So how can I write, for example, '01', or only '1'? Which function should I use to write individual bits to the file? Thanks.
Streams are sequences of bytes. There is no standard interface to write individual bits. If you want to write individual bits you'll need to create your own abstraction of a bit stream built on top of streams. It would aggregate multiple bits into a byte, which is then written to the underlying stream. If you want reasonably efficient writing you'll probably need to aggregate multiple bytes before writing them to the stream.
A naïve implementation could look something like this:
class bitstream {
std::streambuf* sbuf;
int count = 0;
unsigned char byte = 0;
public:
bitstream(std::ostream& out): sbuf(out.rdbuf()) {}
~bitstream() {
if (this->count) {
this->sbuf->sputc(this->byte);
}
}
bitstream& operator<< (bool b) {
this->byte = (this->byte << 1) | b;
if (++this->count == 8) {
this->sbuf->sputc(this->byte);
this->count = 0;
}
return *this;
}
};
Note that this implementation has rather basic handling for the last byte: if a byte was started, it is written as is, so its bits sit in the low positions. Whether that is the intended behavior, or whether the byte needs to be shifted so that the written bits end up in its top bits, depends on how things are being used. Also, there is no error handling for the case that the underlying stream couldn't be written. (And the code is untested: I don't have an easy way to compile on my mobile phone.)
You’d use the class something like this:
int main() {
std::ofstream file("bits.txt", std::ios_base::binary);
bitstream out(file);
out << false << true << false << false
<< false << false << false << true;
}
Unless I messed things up, the above code should write A into the file bits.txt (the ASCII code for A is 65).
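As a hedged variation on the last-byte handling discussed earlier, the sketch below shifts a partial final byte so that the written bits occupy its top positions and the padding zeros sit at the bottom. BitWriter and packToString are hypothetical names, and this is only one possible convention, not the answer's class.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical variant of a bit writer: flush() left-aligns a partial byte.
class BitWriter {
public:
    explicit BitWriter(std::ostream& out) : out_(out) {}
    ~BitWriter() { flush(); }

    // Appends one bit; a full byte is handed to the stream immediately.
    void put(bool bit)
    {
        byte_ = static_cast<unsigned char>((byte_ << 1) | (bit ? 1 : 0));
        if (++count_ == 8) {
            out_.put(static_cast<char>(byte_));
            byte_ = 0;
            count_ = 0;
        }
    }

    // Pads pending bits with zeros on the right and writes them out.
    void flush()
    {
        if (count_ > 0) {
            assert(count_ < 8);  // a full byte would already have been written
            out_.put(static_cast<char>(byte_ << (8 - count_)));
            byte_ = 0;
            count_ = 0;
        }
    }

private:
    std::ostream& out_;
    unsigned char byte_ = 0;
    int count_ = 0;
};

// Convenience wrapper for experiments: returns the bytes produced for `bits`.
std::string packToString(const std::vector<bool>& bits)
{
    std::ostringstream buffer;
    {
        BitWriter writer(buffer);
        for (bool bit : bits) writer.put(bit);
    }  // the destructor flushes the last partial byte here
    return buffer.str();
}
```

A decoder still needs to know how many bits of the final byte are meaningful, for example from a bit count stored in a header.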
Just for context: files are actually organized into blocks of bytes, and the stream abstraction aggregates the individual bytes written into blocks. Although byte-oriented interfaces are provided by all popular operating systems, writing anything but blocks of data tends to be rather inefficient.
I'm trying to read data from a binary file using the following code:
fstream s;
s.open(L"E:\\test_bin.bin", ios::in | ios::binary);
int c = 0;
while (!s.eof())
{
s >> c;
cout << c;
}
c always keeps its initial value: it prints 0, and if I initialize c to 1 it prints 1. The file exists and contains non-zero data, so the file is not the problem. I can read it using fread and using s.get(), so why does the code above not work?
Using the ios::binary flag doesn't necessarily mean that you read and write binary data. Take a look at https://stackoverflow.com/a/2225612/2372604 . ios::binary means "data is read or written without translating..."
What you probably want to do is use s.read(...). In your case the stream operator attempts to read a complete integer written as text (something like "1234") rather than the raw bytes that make up an int.
For reading 4 bytes, something like the following might work (untested):
int n;
while (s.read((char*) &n, 4) && s.gcount() != 0 ) {}
What's wrong with:
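int c = 0;
char ch;
int shift = 32;
// test shift first, so that no fifth byte is consumed once the integer is complete
while ( shift != 0 && s.get( ch ) ) {
shift -= 8;
c |= static_cast<unsigned>(ch & 0xFF) << shift; // shift as unsigned to avoid signed overflow
}
if ( shift != 0 ) {
// Unexpected end of file...
}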
This is the (more or less) standard way of reading binary 32 bit
integers off the network. (This supposes that native int is
32 bits 2's complement, of course.)
Some protocols use different representation of 32 bit ints, and
so will require different code.
As for your original code: the test s.eof() is always wrong,
and >> is for inputting text; in particular, it will skip
leading whitespace (and binary data may contain codes which
correspond to whitespace).
I might also add that you should ensure that the stream is
imbued with the "C" locale, so that no code translation
occurs.
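The points about s.eof(), >> and the "C" locale can be put together in a short sketch. The function name readAllBytes is hypothetical; it uses the read itself as the loop condition instead of polling eof(), and unformatted get() instead of >> so that whitespace bytes are kept.

```cpp
#include <cassert>
#include <istream>
#include <locale>
#include <sstream>  // only needed for the in-memory usage mentioned below
#include <string>
#include <vector>

// Hypothetical helper: reads every remaining byte of a binary stream.
std::vector<unsigned char> readAllBytes(std::istream& in)
{
    in.imbue(std::locale::classic());  // the "C" locale: no code translation

    std::vector<unsigned char> bytes;
    char ch;
    while (in.get(ch)) {  // unformatted read: whitespace bytes are kept
        bytes.push_back(static_cast<unsigned char>(ch));
    }
    // The loop ends because a read failed (normally at end of file), not
    // because eof() was tested before reading.
    assert(in.eof() || in.bad());
    return bytes;
}
```

For a quick check without a file, a std::istringstream constructed from a std::string holding arbitrary bytes (including 0x20, 0x09 or 0x00) can be passed in.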
I am taking input from a file in binary mode using C++; I read the data into unsigned ints, process them, and write them to another file. The problem is that sometimes, at the end of the file, there might be a little bit of data left that isn't large enough to fit into an int; in this case, I want to pad the end of the file with 0s and record how much padding was needed, until the data is large enough to fill an unsigned int.
Here is how I am reading from the file:
std::ifstream fin;
fin.open("filename.whatever", std::ios::in | std::ios::binary);
if(fin) {
unsigned int m;
while(fin >> m) {
//processing the data and writing to another file here
}
//TODO: read the remaining data and pad it here prior to processing
} else {
//output to error stream and exit with failure condition
}
The TODO in the code is where I'm having trouble. After the file input finishes and the loop exits, I need to read in the remaining data at the end of the file that was too small to fill an unsigned int. I need to then pad the end of that data with 0's in binary, recording enough about how much padding was done to be able to un-pad the data in the future.
How is this done, and is this already done automatically by C++?
NOTE: I cannot read the data into anything but an unsigned int, as I am processing the data as if it were an unsigned integer for encryption purposes.
EDIT: It was suggested that I simply read what remains into an array of chars. Am I correct in assuming that this will read in ALL remaining data from the file? It is important to note that I want this to work on any file that C++ can open for input and/or output in binary mode. Thanks for pointing out that I failed to include the detail of opening the file in binary mode.
EDIT: The files my code operates on are not created by anything I have written; they could be audio, video, or text. My goal is to make my code format-agnostic, so I can make no assumptions about the amount of data within a file.
EDIT: ok, so based on constructive comments, this is something of the approach I am seeing, documented in comments where the operations would take place:
std::ifstream fin;
fin.open("filename.whatever", std::ios::in | std::ios::binary);
if(fin) {
unsigned int m;
while(fin >> m) {
//processing the data and writing to another file here
}
//1: declare Char array
//2: fill it with what remains in the file
//3: fill the rest of it until it's the same size as an unsigned int
} else {
//output to error stream and exit with failure condition
}
The question, at this point, is this: is this truly format-agnostic? In other words, are bytes used to measure file size as discrete units, or can a file be, say, 11.25 bytes in size? I should know this, I know, but I've got to ask it anyway.
Are bytes used to measure file size as discrete units, or can a file be, say, 11.25 bytes in size?
No. The byte is the smallest unit a file can be read or written in (no data type is smaller than a byte), and a file is effectively an array of char, so its size is always a whole number of bytes. A size such as 11.25 bytes cannot occur.
Here are steps one, two, and three from your post:
while (fin >> m)
{
// ...
}
std::ostringstream buffer;
buffer << fin.rdbuf();
std::string contents = buffer.str();
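// pad with zero bytes up to a multiple of sizeof(unsigned int); keep `padding` so the data can be un-padded later
const std::size_t padding = (sizeof(unsigned int) - contents.size() % sizeof(unsigned int)) % sizeof(unsigned int);
contents.append(padding, '\0');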
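An alternative sketch, under the assumption that the file should be consumed as raw bytes rather than parsed with >>: read one unsigned-int-sized chunk at a time with read(), use gcount() to detect a short final chunk, and pad that chunk with zero bytes. PaddedWords and readWordsPadded are hypothetical names, and the words are kept in the machine's native byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <istream>
#include <sstream>  // only needed for the in-memory usage mentioned below
#include <string>
#include <vector>

// Hypothetical result type: the words plus the amount of padding added.
struct PaddedWords {
    std::vector<unsigned int> words;
    std::size_t paddingBytes = 0;  // zero bytes appended to the last word
};

// Reads the stream in sizeof(unsigned int) chunks, zero-padding the last one.
PaddedWords readWordsPadded(std::istream& in)
{
    PaddedWords result;
    char chunk[sizeof(unsigned int)];
    for (;;) {
        in.read(chunk, sizeof chunk);
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        if (got == 0) break;          // nothing left to read
        assert(got <= sizeof chunk);  // read() never delivers more than asked
        if (got < sizeof chunk) {
            std::memset(chunk + got, 0, sizeof chunk - got);  // pad with zeros
            result.paddingBytes = sizeof chunk - got;
        }
        unsigned int word;
        std::memcpy(&word, chunk, sizeof word);  // native byte order
        result.words.push_back(word);
        if (got < sizeof chunk) break;  // a short chunk is always the last one
    }
    return result;
}
```

Storing paddingBytes alongside the output (for instance as a trailing byte) is what makes it possible to strip the padding again later.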
I've finally figured out how to write some specifically formatted information to a binary file, but now my problem is reading it back and building it back the way it originally was.
Here is my function to write the data:
void save_disk(disk aDisk)
{
ofstream myfile("disk01", ios::out | ios::binary);
int32_t entries;
entries = (int32_t) aDisk.current_file.size();
char buffer[10];
sprintf(buffer, "%d",entries);
myfile.write(buffer, sizeof(int32_t));
std::for_each(aDisk.current_file.begin(), aDisk.current_file.end(), [&] (const file_node& aFile)
{
myfile.write(aFile.name, MAX_FILE_NAME);
myfile.write(aFile.data, BLOCK_SIZE - MAX_FILE_NAME);
});
}
and my structure that it originally was created with and what I want to load it back into is composed as follows.
struct file_node
{
char name[MAX_FILE_NAME];
char data[BLOCK_SIZE - MAX_FILE_NAME];
file_node(){};
};
struct disk
{
vector<file_node> current_file;
};
I don't really know how to read it back in so that it is arranged the same way, but here is my pathetic attempt anyway (I just tried to reverse what I did for saving):
void load_disk(disk aDisk)
{
ifstream myFile("disk01", ios::in | ios::binary);
char buffer[10];
myFile.read(buffer, sizeof(int32_t));
std::for_each(aDisk.current_file.begin(), aDisk.current_file.end(), [&] (file_node& aFile)
{
myFile.read(aFile.name, MAX_FILE_NAME);
myFile.read(aFile.data, BLOCK_SIZE - MAX_FILE_NAME);
});
}
^^ This is absolutely wrong. ^^
I understand the basic operations of the ifstream, but really all I know how to do with it is read in a file of text, anything more complicated than that I'm kind of lost.
Any suggestions on how I can read this in?
You're very close. You need to write and read the length as binary.
This part of your length-write is wrong:
char buffer[10];
sprintf(buffer, "%d",entries);
myfile.write(buffer, sizeof(int32_t));
It only writes the first four bytes of whatever the length is, but the length is character data from a sprintf() call. You need to write this as a binary-value of entries (the integer):
// writing your entry count.
uint32_t entries = (uint32_t)aDisk.current_file.size();
entries = htonl(entries);
myfile.write((char*)&entries, sizeof(entries));
Then on the read:
// reading the entry count
uint32_t entries = 0;
myFile.read((char*)&entries, sizeof(entries));
entries = ntohl(entries);
// Use this to resize your vector; for_each has places to stuff data now.
aDisk.current_file.resize(entries);
std::for_each(aDisk.current_file.begin(), aDisk.current_file.end(), [&] (file_node& aFile)
{
myFile.read(aFile.name, MAX_FILE_NAME);
myFile.read(aFile.data, BLOCK_SIZE - MAX_FILE_NAME);
});
Or something like that.
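For comparison, here is a self-contained sketch of the same save/load pair with basic error checking added. It is not the answer's code: the values of MAX_FILE_NAME and BLOCK_SIZE are assumed (the question does not give them), the functions take streams instead of opening "disk01", and the count is written byte by byte in big-endian order instead of using htonl()/ntohl().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <istream>
#include <ostream>
#include <sstream>  // only needed for the in-memory usage mentioned below
#include <vector>

// Assumed sizes; the question does not state the real values.
const std::size_t MAX_FILE_NAME = 16;
const std::size_t BLOCK_SIZE = 64;

struct file_node {
    char name[MAX_FILE_NAME];
    char data[BLOCK_SIZE - MAX_FILE_NAME];
    file_node() : name(), data() {}  // zero-fills both arrays
};

struct disk {
    std::vector<file_node> current_file;
};

// Writes a 4-byte big-endian entry count followed by the fixed-size nodes.
bool save_disk(std::ostream& out, const disk& aDisk)
{
    assert(aDisk.current_file.size() <= 0xFFFFFFFFu);  // count must fit in 32 bits
    const std::uint32_t entries =
        static_cast<std::uint32_t>(aDisk.current_file.size());
    const char count[4] = {
        static_cast<char>((entries >> 24) & 0xFF),
        static_cast<char>((entries >> 16) & 0xFF),
        static_cast<char>((entries >> 8) & 0xFF),
        static_cast<char>(entries & 0xFF)};
    out.write(count, 4);
    for (const file_node& node : aDisk.current_file) {
        out.write(node.name, sizeof node.name);
        out.write(node.data, sizeof node.data);
    }
    return static_cast<bool>(out);
}

// Reads the count, then that many nodes; leaves aDisk untouched on failure.
bool load_disk(std::istream& in, disk& aDisk)
{
    unsigned char count[4];
    if (!in.read(reinterpret_cast<char*>(count), 4)) return false;
    const std::uint32_t entries =
        (std::uint32_t(count[0]) << 24) | (std::uint32_t(count[1]) << 16) |
        (std::uint32_t(count[2]) << 8) | std::uint32_t(count[3]);

    std::vector<file_node> nodes;
    for (std::uint32_t i = 0; i < entries; ++i) {
        file_node node;
        in.read(node.name, sizeof node.name);
        if (!in.read(node.data, sizeof node.data)) return false;  // truncated
        nodes.push_back(node);
    }
    aDisk.current_file.swap(nodes);
    return true;
}
```

Growing the vector one node at a time, rather than resizing to the stored count up front, avoids a huge allocation if the count bytes are corrupt. Passing a std::stringstream lets the pair be tried without creating a file.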
Note 1: this does NO error checking. The htonl()/ntohl() calls store the entry count in network (big-endian) byte order, so the count reads back correctly even when a big-endian machine writes the file and a little-endian machine reads it; without them the file would not be portable between such hosts. Skipping the error checks is probably OK for your needs, but you should at least be aware of it.
Note 2: Pass your input disk parameter to load_disk() by reference:
void load_disk(disk& aDisk)
EDIT Cleaning file_node content on construction
struct file_node
{
char name[MAX_FILE_NAME];
char data[BLOCK_SIZE - MAX_FILE_NAME];
file_node()
{
memset(name, 0, sizeof(name));
memset(data, 0, sizeof(data));
}
};
If you are using a compliant C++11 compiler:
struct file_node
{
char name[MAX_FILE_NAME];
char data[BLOCK_SIZE - MAX_FILE_NAME];
file_node() : name(), data() {}
};