C++ writing and reading objects to binary files

I'm trying to read an Array object (Array is a class I've made) using read and write functions that read from and write to binary files. So far the write function works, but the read function won't read from the file properly for some reason. This is the write function:
void writeToBinFile(const char* path) const
{
ofstream ofs(path, ios_base::out | ios_base::app | ios_base::binary);
if (ofs.is_open())
{
ostringstream oss;
for (unsigned int i = 0; i < m_size; i++)
{
oss << ' ';
oss << m_data[i];
}
ofs.write(oss.str().c_str(), oss.str().size());
}
}
This is the read function :
void readFromBinFile(const char* path)
{
ifstream ifs(path, ios_base::in | ios_base::binary || ios_base::ate);
if (ifs.is_open())
{
stringstream ss;
int charCount = 0, spaceCount = 0;
ifs.unget();
while (spaceCount != m_size)
{
charCount++;
if (ifs.peek() == ' ')
{
spaceCount++;
}
ifs.unget();
}
ifs.get();
char* ch = new char[sizeof(char) * charCount];
ifs.read(ch, sizeof(char) * charCount);
ss << ch;
delete[] ch;
for (unsigned int i = 0; i < m_size; i++)
{
ss >> m_data[i];
m_elementCount++;
}
}
}
These are the class fields:
T* m_data;
unsigned int m_size;
unsigned int m_elementCount;
I'm using the following code to write and then read (one execution for writing, another for reading):
Array<int> arr3(5);
//arr3[0] = 38;
//arr3[1] = 22;
//arr3[2] = 55;
//arr3[3] = 7;
//arr3[4] = 94;
//arr3.writeToBinFile("binfile.bin");
arr3.readFromBinFile("binfile.bin");
for (unsigned int i = 0; i < arr3.elementCount(); i++)
{
cout << "arr3[" << i << "] = " << arr3[i] << endl;
}
The problem is now in the readFromBinFile function: it gets stuck in an infinite loop, and peek() returns -1 for some reason. I can't figure out why.
Also note that I write a space before each element, so that I can tell the objects in the array apart when reading. The space at the very start also acts as a barrier between any binary data previously stored in the file and the array's data.

The major problem, in my mind, is that you write fixed-size binary data in variable-size textual form. It could be so much simpler if you just stick to pure binary form.
Instead of writing to a string stream and then writing that output to the actual file, just write the binary data directly to the file:
ofs.write(reinterpret_cast<char*>(m_data), sizeof(m_data[0]) * m_size);
Then do something similar when reading the data.
For this to work, you of course need to save the number of entries in the array/vector first before writing the actual data.
So the actual write function could be as simple as
void writeToBinFile(const char* path) const
{
ofstream ofs(path, ios_base::out | ios_base::binary);
if (ofs)
{
ofs.write(reinterpret_cast<const char*>(&m_size), sizeof(m_size));
ofs.write(reinterpret_cast<const char*>(&m_data[0]), sizeof(m_data[0]) * m_size);
}
}
And the read function
void readFromBinFile(const char* path)
{
ifstream ifs(path, ios_base::in | ios_base::binary);
if (ifs)
{
// Read the size
ifs.read(reinterpret_cast<char*>(&m_size), sizeof(m_size));
// Read all the data
ifs.read(reinterpret_cast<char*>(&m_data[0]), sizeof(m_data[0]) * m_size);
}
}
Depending on how you define m_data you might need to allocate memory for it before reading the actual data.
Oh, and if you want to append data at the end of the array (but why would you? In the code you show, you rewrite the whole array anyway), you update the size at the beginning, seek to the end, and then write the new data.
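The size-prefixed layout described above can be sketched as follows. This is only a minimal sketch under assumptions that are not in the original post: the elements are plain int values held in a std::vector, the count is stored as a fixed 32-bit integer, and the helper names writeArray and readArray are hypothetical. It works on generic streams, and the reader allocates its storage only after it has read the count.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>   // only needed for the in-memory usage mentioned below
#include <vector>

// Hypothetical helper: writes a 32-bit element count, then the raw element bytes.
void writeArray(std::ostream& out, const std::vector<int>& values)
{
    assert(values.size() <= 0xFFFFFFFFu);  // the count must fit in 32 bits
    const std::uint32_t count = static_cast<std::uint32_t>(values.size());
    out.write(reinterpret_cast<const char*>(&count), sizeof(count));
    if (count > 0)
        out.write(reinterpret_cast<const char*>(values.data()),
                  static_cast<std::streamsize>(sizeof(int) * values.size()));
}

// Hypothetical helper: reads the count first, allocates, then reads the data.
// Returns false (leaving `values` untouched) if the stream runs out of data.
bool readArray(std::istream& in, std::vector<int>& values)
{
    std::uint32_t count = 0;
    if (!in.read(reinterpret_cast<char*>(&count), sizeof(count)))
        return false;
    std::vector<int> tmp(count);  // allocate before reading the elements
    if (count > 0 &&
        !in.read(reinterpret_cast<char*>(tmp.data()),
                 static_cast<std::streamsize>(sizeof(int) * tmp.size())))
        return false;
    values.swap(tmp);
    return true;
}
```

With files, one would pass an ofstream or ifstream opened with ios_base::binary; a std::stringstream works the same way for a quick in-memory check. This raw layout depends on the machine's int size and byte order, so the file is only readable on compatible platforms.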

Related

C++ decoding LZ77-compressed data using std::fstream too slow

I have a function in my code that decodes a file compressed with the LZ77 algorithm. But on a 15 MB input file, decompression takes about 3 minutes, which is too slow. What is the reason for the poor performance? On every step of the loop I read two or three bytes and get the length, the offset and the next character. If the offset is not zero, I also have to move "offset" bytes back in the output stream and read "length" bytes. Then I append them to the end of the same stream before writing the next character there.
void uncompressData(long block_size, unsigned char* data, fstream &file_out)
{
unsigned char* append;
append = new unsigned char[buf_length];
link myLink;
long cur_position = 0;
file_out.seekg(0, ios::beg);
cout << file_out.tellg() << endl;
int i=0;
myLink.length=-1;
while(i<(block_size-1))
{
if(myLink.length!=-1) file_out << myLink.next;
myLink.length = (short)(data[i] >> 4);
//cout << myLink.length << endl;
if(myLink.length!=0)
{
myLink.offset = (short)(data[i] & 0xF);
myLink.offset = myLink.offset << 8;
myLink.offset = myLink.offset | (short)data[i+1];
myLink.next = (unsigned char)data[i+2];
cur_position=file_out.tellg();
file_out.seekg(-myLink.offset,ios_base::cur);
if(myLink.length<=myLink.offset)
{
file_out.read((char*)append, myLink.length);
}
else
{
file_out.read((char*)append, myLink.offset);
int k=myLink.offset,j=0;
while(k<myLink.length)
{
append[k]=append[j];
j++;
if(j==myLink.offset) j=0;
k++;
}
}
file_out.seekg(cur_position);
file_out.write((char*)append, myLink.length);
i++;
}
else {
myLink.offset = 0;
myLink.next = (unsigned char)data[i+1];
}
i=i+2;
}
unsigned char hasOddSymbol = data[block_size-1];
if(hasOddSymbol==0x0) { file_out << myLink.next; }
delete[] append;
}
You could try doing it on a std::stringstream in memory instead:
#include <sstream>
void uncompressData(long block_size, unsigned char* data, fstream& out)
{
std::stringstream file_out; // first line in the function
// the rest of your function goes here
out << file_out.rdbuf(); // last line in the function
}
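Going one step further than a stringstream, the back-references can be resolved directly in a memory buffer, with no seeking at all. The following is a rough sketch, not the asker's exact function: it assumes the token layout described in the question (high nibble of the first byte is the length, a 12-bit offset follows when the length is non-zero, then the next character), it ignores the trailing hasOddSymbol flag byte, and the name decodeTokens is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Decodes (length, offset, next) tokens into a memory buffer.
// Assumed layout: 2-byte token when length == 0, 3-byte token otherwise.
std::vector<unsigned char> decodeTokens(const unsigned char* data, std::size_t size)
{
    assert(data != nullptr || size == 0);
    std::vector<unsigned char> out;
    std::size_t i = 0;
    while (i < size) {
        const std::size_t length = data[i] >> 4;
        const std::size_t tokenSize = (length != 0) ? 3 : 2;
        if (i + tokenSize > size)
            throw std::runtime_error("truncated token");
        if (length != 0) {
            const std::size_t offset =
                (static_cast<std::size_t>(data[i] & 0x0F) << 8) | data[i + 1];
            if (offset == 0 || offset > out.size())
                throw std::runtime_error("back-reference outside decoded data");
            const std::size_t start = out.size() - offset;
            // Copy byte by byte so that overlapping references
            // (length > offset) repeat the pattern correctly.
            for (std::size_t k = 0; k < length; ++k) {
                const unsigned char b = out[start + k];
                out.push_back(b);
            }
        }
        out.push_back(data[i + tokenSize - 1]);  // the literal "next" character
        i += tokenSize;
    }
    return out;
}
```

The decoded buffer can then be written to the output file with a single write call, for example file_out.write(reinterpret_cast<const char*>(out.data()), out.size()).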

How to find a string in a binary file?

I want to find a specific string, "fileSize", in a binary file.
The purpose of finding that string is to get the 4 bytes that follow it, because those 4 bytes contain the size of the data that I want to read.
The string appears at several positions in the binary file (the images showing the file contents are not preserved here).
The following is the function that writes the data to a file:
void W_Data(char *readableFile, char *writableFile) {
ifstream RFile(readableFile, ios::binary);
ofstream WFile(writableFile, ios::binary | ios::app);
RFile.seekg(0, ios::end);
unsigned long size = (unsigned long)RFile.tellg();
RFile.seekg(0, ios::beg);
unsigned int bufferSize = 1024;
char *contentsBuffer = new char[bufferSize];
WFile.write("fileSize:", 9);
WFile.write((char*)&size, sizeof(unsigned long));
while (!RFile.eof()) {
RFile.read(contentsBuffer, bufferSize);
WFile.write(contentsBuffer, bufferSize);
}
RFile.close();
WFile.close();
delete contentsBuffer;
contentsBuffer = NULL;
}
Also, the function that searches for the string:
void R_Data(char *readableFile) {
ifstream RFile(readableFile, ios::binary);
const unsigned int bufferSize = 9;
char fileSize[bufferSize];
while (RFile.read(fileSize, bufferSize)) {
if (strcmp(fileSize, "fileSize:") == 0) {
cout << "Exists" << endl;
}
}
RFile.close();
}
How to find a specific string in a binary file?
I think using std::string::find() is an easy way to search for patterns.
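#include <fstream>
#include <iostream>
#include <string>
void R_Data(const std::string& filename, const std::string& pattern) {
    if (pattern.empty()) return;
    std::ifstream file(filename, std::ios::binary);
    char buffer[1024];
    std::string carry; // tail of the previous chunk, so a match split across two reads is still found
    // read() reports failure on the last, partial chunk, so also check gcount()
    while (file.read(buffer, sizeof buffer) || file.gcount() > 0) {
        std::string temp = carry + std::string(buffer, static_cast<std::size_t>(file.gcount()));
        std::size_t pos = temp.find(pattern);
        while (pos != std::string::npos) {
            std::cout << "Exists" << std::endl;
            pos = temp.find(pattern, pos + pattern.length());
        }
        // keep the last pattern.length() - 1 bytes for the next round
        const std::size_t keep = pattern.length() - 1;
        carry = temp.size() > keep ? temp.substr(temp.size() - keep) : temp;
    }
}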
How to find a specific string in a binary file?
If you don't know the location of the string in the file, I suggest the following:
Find the size of the file.
Allocate memory for being able to read everything in the file.
Read everything from the file to the memory allocated.
Iterate over the contents of the file and use std::strcmp/std::strncmp to find the string.
Deallocate the memory once you are done using it.
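The steps above can be sketched roughly as follows. Two things here are assumptions rather than part of the answer: the function name findAll is hypothetical, and std::search is swapped in for std::strcmp/std::strncmp, because binary data can contain null bytes that end a strcmp-style comparison early. A std::string owns the memory, so no manual deallocation is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>   // only needed for the in-memory usage mentioned below
#include <string>
#include <vector>

// Reads the whole stream into memory and returns the byte offset of
// every occurrence of `pattern` (overlapping occurrences included).
std::vector<std::size_t> findAll(std::istream& in, const std::string& pattern)
{
    assert(!pattern.empty());
    const std::string data((std::istreambuf_iterator<char>(in)),
                           std::istreambuf_iterator<char>());
    std::vector<std::size_t> offsets;
    auto it = data.begin();
    while ((it = std::search(it, data.end(), pattern.begin(), pattern.end())) != data.end()) {
        offsets.push_back(static_cast<std::size_t>(it - data.begin()));
        ++it;  // continue searching one byte after this match
    }
    return offsets;
}
```

For a file, pass an ifstream opened with ios::binary; the bytes that follow a match start at offset + pattern.size(). An istringstream can stand in for the file when trying this out.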
There are a couple of problems with using
const unsigned int bufferSize = 9;
char fileSize[bufferSize];
while (RFile.read(fileSize, bufferSize)) {
if (strcmp(fileSize, "fileSize:") == 0) {
cout << "Exists" << endl;
}
}
Problem 1
The strcmp line will lead to undefined behavior when fileSize actually contains the string "fileSize:", since the array has space for only 9 characters and strcmp then reads past its end looking for the terminating null character. The array needs an additional element to hold that null character. You could use
const unsigned int bufferSize = 9;
char fileSize[bufferSize+1] = {0};
while (RFile.read(fileSize, bufferSize)) {
if (strcmp(fileSize, "fileSize:") == 0) {
cout << "Exists" << endl;
}
}
to take care of that problem.
Problem 2
You are reading the contents of the file in blocks of 9.
The first call to RFile.read reads the first block of 9 characters.
The second call to RFile.read reads the second block of 9 characters.
The third call to RFile.read reads the third block of 9 characters, and so on.
Hence, unless the string "fileSize:" starts exactly at the boundary of one of those blocks (an offset that is a multiple of 9), the test
if (strcmp(fileSize, "fileSize:") == 0)
will never pass.

Read binary file and count specific number c++

Hey everyone, I've been looking everywhere for insight on how to do this particular assignment. I saw something similar, but it didn't have a clear explanation. I'm trying to read a bin file and count the number of times a specific number appears. I saw examples of this using a .txt file, and it seemed very straightforward using getline. I tried to replicate a similar structure using a binary file.
int main() {
int searching = 3;
int counter = 0;
unsigned char * memblock;
long long int size;
//open bin file
ifstream file;
file.open("threesData.bin", ios:: in | ios::binary | ios::ate);
//read bin file
if (file.is_open()) {
cout << "it opened\n";
size = file.tellg();
memblock = new unsigned char[size];
file.seekg(0, ios::beg);
file.read((char * ) memblock, size);
while (file.read((char * ) memblock, size)) {
for (int i = 0; i < size; i++) {
(int) memblock[i];
if (memblock[i] == searching) {
counter++;
}
}
}
}
file.close();
cout << "The number " << searching << " appears ";
cout << counter << " times!";
return 0;
}
When I run the program it's clear that it opens but it doesn't count the number I'm searching for. What am I doing wrong?
You seem to be thinking this through, but note that your first file.read() call already consumes the whole file, so the second read in the while condition fails immediately and the counting loop never runs. Here's how I would go about doing it:
Allocate a buffer with a sensible size.
View it through an int pointer, so you can index whole integers with array syntax.
Open the stream, and read while the stream is valid.
Convert the number of bytes read to the number of complete ints.
Increment the counter for each integer that matches the value you are looking for.
Code
#include <fstream>
#include <iostream>
bool check_character(int value)
{
return value == 3;
}
int main(void)
{
// choose the size, cast a pointer as an int type, and initialize
// our counter
static constexpr size_t size = 4096;
char* buffer = new char[size];
int* ints = (int*) buffer;
size_t counter = 0;
// create our stream,
std::ifstream stream("file.bin", std::ios_base::binary);
while (stream) {
// keep reading while the stream is valid
stream.read(buffer, size);
auto count = stream.gcount();
// we only want to go to the last valid integer
// if we expect the file to be only integers,
// we could do `assert(count % sizeof(int) == 0);`
// otherwise, we may have trailing characters
// if we have trailing characters, we may want to move them
// to the front of the buffer....
auto chars = count / sizeof(int); // floor division
for (size_t i = 0; i < chars; ++i) {
// false == 0, true == 1, so we can just add
// if the value is 3
counter += check_character(ints[i]);
}
}
std::cout << "Counter is: " << counter << std::endl;
delete[] buffer;
return 0;
}
As NeilButterworth points out, you could also use a vector. I don't really like this, but "meh".
#include <fstream>
#include <iostream>
#include <vector>
/* ellipsed lines */
int main(void)
{
/* ellipsed lines */
static constexpr size_t size = 4096;
std::vector<int> ints;
ints.resize(size / sizeof(int));
char* buffer = (char*) ints.data();
/* ellipsed lines */
/* ellipsed lines */
std::cout << "Counter is: " << counter << std::endl;
// no delete[]
return 0;
}
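For reference, a complete vector-based variant might look like the sketch below. It is only an illustration under assumptions: the function name countInts and its bufferInts parameter are hypothetical, the file is taken to hold raw int values in the machine's native layout, and any trailing bytes that do not form a whole int are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>   // only needed for the in-memory usage mentioned below
#include <vector>

// Counts how many native ints in the stream are equal to `target`.
std::size_t countInts(std::istream& in, int target, std::size_t bufferInts = 1024)
{
    assert(bufferInts > 0);
    std::vector<int> ints(bufferInts);  // the vector owns the buffer, no delete[]
    const std::streamsize bufferBytes =
        static_cast<std::streamsize>(ints.size() * sizeof(int));
    std::size_t counter = 0;
    while (in) {
        // read() may deliver fewer bytes than requested on the last chunk
        in.read(reinterpret_cast<char*>(ints.data()), bufferBytes);
        const std::size_t complete =
            static_cast<std::size_t>(in.gcount()) / sizeof(int);  // floor division
        for (std::size_t i = 0; i < complete; ++i) {
            if (ints[i] == target)
                ++counter;
        }
    }
    return counter;
}
```

With a file, this would be called on an ifstream opened with std::ios_base::binary, for example countInts(stream, 3). If the file actually stores one number per byte, as the unsigned char buffer in the question suggests, the element type would have to change accordingly.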

Writing and reading a file

I'm trying to use ifstream/ofstream to read/write, but for some reason the data gets corrupted along the way. Here are the read/write methods and the test:
void FileWrite(const char* FilePath, std::vector<char> &data) {
std::ofstream os (FilePath);
int len = data.size();
os.write(reinterpret_cast<char*>(&len), 4);
os.write(&(data[0]), len);
os.close();
}
std::vector<char> FileRead(const char* FilePath) {
std::ifstream is(FilePath);
int len;
is.read(reinterpret_cast<char*>(&len), 4);
std::vector<char> ret(len);
is.read(&(ret[0]), len);
is.close();
return ret;
}
void test() {
std::vector<char> sample(1024 * 1024);
for (int i = 0; i < 1024 * 1024; i++) {
sample[i] = rand() % 256;
}
FileWrite("C:\\test\\sample", sample);
auto sample2 = FileRead("C:\\test\\sample");
int err = 0;
for (int i = 0; i < sample.size(); i++) {
if (sample[i] != sample2[i])
err++;
}
std::cout << err << "\n";
int a;
std::cin >> a;
}
It writes the length correctly, reads it correctly and starts reading the data correctly, but at some point (depending on input, usually around the 1000th byte) it goes wrong, and everything that follows is wrong. Why is that?
For starters, you should open both file streams in binary mode:
std::ofstream os(FilePath, std::ios::binary);
std::ifstream is(FilePath, std::ios::binary);
(edit: assuming char is signed on your platform)
Do notice that a signed char can only hold values up to SCHAR_MAX, which is 127.
If the random number is bigger, the result wraps around to a negative value. That by itself round-trips fine. The corruption comes from text mode: on Windows, a text-mode stream translates newline bytes and treats the byte 0x1A (Ctrl-Z) as end of file when reading, so arbitrary byte values do not survive. Binary mode should fix this problem.
Also, you don't need to close the stream yourself here; the destructor does it for you.
Two more simple points:
1) &(data[0]) can be just &data[0]; the parentheses are redundant.
2) Try to keep a consistent naming convention. You use upper camel case for the FilePath variable, but lower camel case for all the other variables.

Decoding problems with Lempel-Ziv-Welch algorithm

I have to implement the LZW algorithm but I have found some trouble with the decoding part.
I think the code is right because it works with an example I found on the web: if I initialize my dictionary as follows
m_dictionary.push_back("a");
m_dictionary.push_back("b");
m_dictionary.push_back("d");
m_dictionary.push_back("n");
m_dictionary.push_back("_");
and my input file has the string banana_bandana, I get the following results:
compressed.txt: 1036045328
decompressed.txt:banana_bandana
But if I initialize the dictionary with all 256 single-byte character values, the decoding process fails miserably. I think the problem lies in the number of bits used for the codes: when decoding, I always read from the input file char by char (8 bits) instead of the correct number of bits, I guess.
Below is the code of my implementation of this algorithm:
template <class T>
size_t toUnsigned(T t) {
std::stringstream stream;
stream << t;
size_t x;
stream >> x;
return x;
}
bool LempelZivWelch::isInDictionary(const std::string& entry) {
return (std::find(m_dictionary.begin(), m_dictionary.end(), entry) != m_dictionary.end());
}
void LempelZivWelch::initializeDictionary() {
m_dictionary.clear();
for (int i = 0; i < 256; ++i)
m_dictionary.push_back(std::string(1, char(i)));
}
void LempelZivWelch::addEntry(std::string entry) {
m_dictionary.push_back(entry);
}
size_t LempelZivWelch::encode(char *data, size_t dataSize) {
initializeDictionary();
std::string s;
char c;
std::ofstream file;
file.open("compressed.txt", std::ios::out | std::ios::binary);
for (size_t i = 0; i < dataSize; ++i) {
c = data[i];
if(isInDictionary(s + c))
s = s + c;
else {
for (size_t j = 0; j < m_dictionary.size(); ++j)
if (m_dictionary[j] == s) {
file << j;
break;
}
addEntry(s + c);
s = c;
}
}
for (size_t j = 0; j < m_dictionary.size(); ++j)
if (m_dictionary[j] == s) {
file << j;
break;
}
file.close();
return dataSize;
}
size_t LempelZivWelch::decode(char *data, size_t dataSize) {
initializeDictionary();
std::string entry;
char c;
size_t previousCode, currentCode;
std::ofstream file;
file.open("decompressed.txt", std::ios::out | std::ios::binary);
previousCode = toUnsigned(data[0]);
file << m_dictionary[previousCode];
for (size_t i = 1; i < dataSize; ++i) {
currentCode = toUnsigned(data[i]);
entry = m_dictionary[currentCode];
file << entry;
c = entry[0];
addEntry(m_dictionary[previousCode] + c);
previousCode = currentCode;
}
file.close();
return dataSize;
}
And this is the function that reads the input files:
void Compression::readFile(std::string filename) {
std::ifstream file;
file.open(filename.c_str(), std::ios::in | std::ios::binary | std::ios::ate);
if (!file.is_open())
exit(EXIT_FAILURE);
m_dataSize = file.tellg();
m_data = new char [m_dataSize];
file.seekg(0, std::ios::beg);
file.read(m_data, m_dataSize);
file.close();
}
My guess is that the decoding problem lies in reading the input file as an array of chars and/or in writing the codes to the compressed file as size_t values.
Thanks in advance!
It looks like you are outputting the dictionary indices as ASCII-encoded numbers. How are you going to tell the sequence 1,2,3 from 12,3 or 1,23?
You need to encode the data in an unambiguous way, using either fixed-width codes of 9 bits (or 10, 11, or however many the dictionary size requires) or some sort of prefix-free code such as Huffman coding.
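The fixed-width option can be sketched as a small bit packer. This is an illustration only: the names packCodes and unpackCodes are hypothetical, the width is a caller-chosen constant (real LZW implementations often grow it as the dictionary fills), and the decoder is assumed to know how many codes were written.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Packs every code into exactly `width` bits, most significant bit first.
std::vector<std::uint8_t> packCodes(const std::vector<std::uint32_t>& codes, unsigned width)
{
    assert(width >= 1 && width <= 32);
    std::vector<std::uint8_t> bytes;
    std::uint64_t acc = 0;  // bit accumulator
    unsigned bits = 0;      // number of pending bits in acc
    for (std::uint32_t code : codes) {
        assert(width == 32 || code < (std::uint32_t(1) << width));
        acc = (acc << width) | code;
        bits += width;
        while (bits >= 8) {
            bits -= 8;
            bytes.push_back(static_cast<std::uint8_t>(acc >> bits));
        }
        acc &= (std::uint64_t(1) << bits) - 1;  // keep only the pending bits
    }
    if (bits > 0)  // pad the final byte with zero bits
        bytes.push_back(static_cast<std::uint8_t>(acc << (8 - bits)));
    return bytes;
}

// Reads back up to `count` codes of `width` bits each.
std::vector<std::uint32_t> unpackCodes(const std::vector<std::uint8_t>& bytes,
                                       unsigned width, std::size_t count)
{
    assert(width >= 1 && width <= 32);
    std::vector<std::uint32_t> codes;
    std::uint64_t acc = 0;
    unsigned bits = 0;
    std::size_t next = 0;
    while (codes.size() < count) {
        while (bits < width) {
            if (next == bytes.size())
                return codes;  // input exhausted
            acc = (acc << 8) | bytes[next++];
            bits += 8;
        }
        bits -= width;
        codes.push_back(static_cast<std::uint32_t>(acc >> bits));
        acc &= (std::uint64_t(1) << bits) - 1;
    }
    return codes;
}
```

Because every code occupies the same number of bits, the sequences 1,2,3 and 12,3 produce different byte streams and decode back unambiguously. The width has to be large enough for the biggest dictionary index, so 9 bits covers indices up to 511.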