I am writing binary I/O for storing data in my application.
For illustration, suppose I want to store a double array of size 10 to a file.
Since it is not guaranteed that double uses 8 bytes on all platforms, the reader of the file needs to be modified a bit.
Although I am using Qt, I think the problem is mainly in the way data read into a char * is translated into a double. The data read is almost zero.
For example, 1 is read as 2.08607954259741e-317.
Why is every double being read as (almost) zero even though it is not?
void FileString::SaveBinary()
{
    QFile *file = new QFile(fileName);
    if (!file->open(QFile::WriteOnly))
    {
        QString err = file->errorString();
        QString *msgText = new QString("Could not open the file from disk!\n");
        msgText->append(err);
        QString *msgTitle = new QString("ERROR: Could not open the file!");
        emit errMsg(msgTitle, msgText, "WARNING");
        delete file;
        return;
    }
    QDataStream out(file);
    QString line = "MyApp";
    out << line;
    line.setNum(size); // size = 10
    out << line;
    line.setNum(sizeof(double));
    out << line;
    for(int i = 0; i < size; i++)
    {
        out << array[i];
    }
    if(out.status() != QDataStream::Ok)
    {
        qCritical("error: " + QString::number(out.status()).toAscii());
    }
    file->close();
    delete file;
}
void FileString::ReadBinary()
{
    bool ok = false;
    QString line = "";
    QFile *file = new QFile(fileName);
    if (!file->open(QFile::ReadOnly))
    {
        QString err = file->errorString();
        QString *msgText = new QString("Could not open the file from disk!\n");
        msgText->append(err);
        QString *msgTitle = new QString("ERROR: Could not open the file!");
        emit errMsg(msgTitle, msgText, "WARNING");
        delete file;
        return;
    }
    QDataStream in(file);
    in >> line;
    if(line.simplified().contains("MyApp"))
    {
        in >> line;
        size = line.simplified().toInt();
        if(size == 10)
        {
            int mysize = 0;
            in >> line;
            mysize = line.simplified().toInt();
            if(1) // this block runs perfectly
            {
                for(int i = 0; i < size; i++)
                {
                    in >> array[i];
                }
                if(in.status() == QDataStream::Ok)
                    ok = true;
            }
            else if(1) // this block reads only zeros
            {
                char *reader = new char[mysize + 1];
                int read = 0;
                double *dptr = NULL;
                for(int i = 0; i < size; i++)
                {
                    read = in.readRawData(reader, mysize);
                    if(read != mysize)
                    {
                        break;
                    }
                    dptr = reinterpret_cast<double *>(reader); // garbage data stored in dptr, why?
                    if(dptr)
                    {
                        array[i] = *dptr;
                        dptr = NULL;
                    }
                    else
                    {
                        break;
                    }
                }
                if(in.status() == QDataStream::Ok)
                    ok = true;
                delete[] reader;
            }
        }
    }
    if(!ok || (in.status() != QDataStream::Ok))
    {
        qCritical("error : true status = " + QString::number((int) in.status()).toAscii());
    }
    file->close();
    delete file;
}
EDIT:
Contents of the generated file
& M y A p p 1 . 1 8 . 3 . 0 1 0 8?ð # # # # # # # #" #$
That is supposed to contain:
MyApp 1.18.3.010812345678910
"MyApp 1.18.3.0" "10" "8" "12345678910"
What do you expect to read if sizeof double on the reading platform differs from sizeof double on the writing platform?
Suppose sizeof double on your write platform was 10. Then you stored a sequence of 10 bytes in the file that represents a 10-byte double. If sizeof double on your read platform is 8, you would try to parse the bits of a 10-byte double as an 8-byte one, and that would obviously end up as garbage.
Here's a more intuitive example with ints:
If you have a 2-byte integer, say 5, and you store it in a binary file, you'll get a sequence of 2 bytes: 00000000 00000101. Then, if you try to read the same number back as a 1-byte int, you'll manage to read only the first byte, which is 00000000, and get just zero as a result.
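The same failure is easy to reproduce in code. A minimal sketch, using an 8-byte double and a 4-byte float to stand in for the two platforms' differing double sizes:
#include <cstdio>
#include <cstring>

int main()
{
    double d = 1.0;                   // the "writer" has an 8-byte double
    unsigned char bytes[sizeof d];
    std::memcpy(bytes, &d, sizeof d); // these 8 bytes go into the file

    float f;                          // the "reader" expects a 4-byte double
    std::memcpy(&f, bytes, sizeof f); // only the first 4 bytes are consumed
    std::printf("%g\n", f);           // prints 0 for this input, not 1
    return 0;
}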
Consider using strings to save doubles for portability: https://stackoverflow.com/a/6790009/817441
Note that in your original code sizeof(double) could work instead of the hard-coded string, but it would stop working as soon as you migrated to a different architecture with a different double size.
As a side note, if you are worried about the performance of the double-to-string conversion: it may matter more if you or your users later move to an embedded platform, but I have just run some conversions in a loop, and it is not that bad even on my old laptop. Here is my very rough benchmark result:
time ./main
real 0m1.244s
user 0m1.240s
sys 0m0.000s
I would like to point out again that this is an old laptop.
For the code:
#include <QString>

int main()
{
    for (int i = 0; i < 1000000; ++i)
        QString::number(5.123456789012345, 'g', 15);
    return 0;
}
So, instead of the non-portable direct write, I would suggest using the following method:
QString QString::number(double n, char format = 'g', int precision = 6) [static]
Returns a string equivalent of the number n, formatted according to the specified format and precision. See Argument Formats for details.
Unlike QLocale::toString(), this function does not honor the user's locale settings.
http://doc-snapshot.qt-project.org/qdoc/qstring.html#number-2
Having discussed all this theoretically, I would write something like this if I were you:
void FileString::SaveBinary()
{
    QFile *file = new QFile(fileName);
    if (!file->open(QFile::WriteOnly))
    {
        QString err = file->errorString();
        QString *msgText = new QString("Could not open the file from disk!\n");
        msgText->append(err);
        QString *msgTitle = new QString("ERROR: Could not open the file!");
        emit errMsg(msgTitle, msgText, "WARNING");
        delete file;
        return;
    }
    QDataStream out(file);
    for(int i = 0; i < size; i++)
    {
        // Write each double as its textual representation rather than raw bytes
        out << QString::number(array[i], 'g', 15);
    }
    if(out.status() != QDataStream::Ok)
    {
        qCritical("error: " + QString::number(out.status()).toAscii());
    }
    file->close();
    delete file;
}
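A matching reader would then parse the text back on whatever double size the reading platform has. A rough sketch, assuming the same fileName, size, and array members as above (error handling elided):
void FileString::ReadBinary()
{
    QFile file(fileName);
    if (!file.open(QFile::ReadOnly))
        return; // error handling as above omitted for brevity

    QDataStream in(&file);
    QString line;
    for (int i = 0; i < size; i++)
    {
        in >> line;                 // read the textual representation back
        array[i] = line.toDouble(); // parse it on the reader's own double size
    }
    file.close();
}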
Another option could be to use long double, but of course that would increase the computation cost at other places, so depending on the scenario it may or may not be an option.
Related
I have a program that produces a Huffman tree based on the ASCII character frequencies read from a text input file. The Huffman codes are stored in a string array of 256 elements, with an empty string if the character is not read. The program then encodes and compresses an output file, and currently has some functionality for decompression and decoding.
In summary, my program takes an input file, compresses and encodes it into an output file, closes that output file and reopens it as an input file, and writes to a new output file that is supposed to hold a decoded message identical to the original text input file.
My problem is that in my test run I notice 3 extra bytes after compressing, and in turn, when I decompress and decode the encoded file, these 3 extra bytes are decoded into my output file. Depending on the amount of text in the original input file, my other tests also output extra bytes.
My research has led me to a few suggestions, such as making the first 8 bytes of the encoded output file the 64 bits of an unsigned long long that gives the number of bytes in the file, or using a pseudo-EOF. However, I am stuck on how to handle either one, and on which of the two is the smarter choice given the code I have already written, or whether either is a smart way at all.
Any guidance or solution to this problem is appreciated.
(For the encodeOutput function, fileName is the input file parameter and fileName2 is the output file parameter.)
(For the decodeOutput function, fileName2 is the input file parameter and fileName3 is the output file parameter.)
code[256] is a parameter of both functions and holds the Huffman code for each unique character read from the original input file; for example, if the character 'H' is read in the input file, it may have a code of "111" stored in code[72] at the time the array is passed to the functions.
freq[256] holds the frequency of each ASCII character read, or 0 if it is not in the original input file.
void encodeOutput(const string & fileName, const string & fileName2, string code[256]) {
    ifstream ifile; // to read file
    ifile.open(fileName, ios::binary);
    if (!ifile) // check whether the file is open or not
    {
        die("Can't read again"); // function that exits the program if the file can't be opened
    }
    ofstream ofile;
    ofile.open(fileName2, ios::binary);
    if (!ofile) {
        die("Can't open encoding output file");
    }
    int read;
    read = ifile.get(); // read one char from the file and store it in an int
    char buffer = 0, bit_count = 0;
    while (read != -1) { // run this loop until the end of file (-1) is reached
        for (unsigned b = 0; b < code[read].size(); b++) { // loop through the bits of the Huffman code
            buffer <<= 1;
            buffer |= code[read][b] != '0';
            bit_count++;
            if (bit_count == 8) {
                ofile << buffer;
                buffer = 0;
                bit_count = 0;
            }
        }
        read = ifile.get();
    }
    if (bit_count != 0)
        ofile << (buffer << (8 - bit_count));
    ifile.close();
    ofile.close();
}
void decodeOutput(const string & fileName2, const string & fileName3, string code[256], const unsigned long long freq[256]) {
    ifstream ifile;
    ifile.open(fileName2, ios::binary);
    if (!ifile)
    {
        die("Can't read again");
    }
    ofstream ofile;
    ofile.open(fileName3, ios::binary);
    if (!ofile) {
        die("Can't open encoding output file");
    }
    priority_queue<node> q;
    for (unsigned i = 0; i < 256; i++) {
        if (freq[i] == 0) {
            code[i] = "";
        }
    }
    for (unsigned i = 0; i < 256; i++)
        if (freq[i])
            q.push(node(unsigned(i), freq[i]));
    if (q.size() < 1) {
        die("no data");
    }
    while (q.size() > 1) {
        node *child0 = new node(q.top());
        q.pop();
        node *child1 = new node(q.top());
        q.pop();
        q.push(node(child0, child1));
    } // created the tree
    string answer = "";
    const node * temp = &q.top(); // root
    for (int c; (c = ifile.get()) != EOF;) {
        for (unsigned p = 8; p--;) { // reading 8 bits at a time
            if ((c >> p & 1) == '0') { // if bit is a 0
                temp = temp->child0; // go left
            }
            else { // if bit is a 1
                temp = temp->child1; // go right
            }
            if (temp->child0 == NULL && temp->child1 == NULL) // leaf node
            {
                answer += temp->value;
                temp = &q.top();
            }
        }
    }
    ofile << answer;
}
Because of integral promotion rules, (buffer << (8 - bit_count)) is an int expression, so the stream writes its value as formatted decimal text (several characters) rather than as a single byte. To write only one byte, you need to cast the expression to char.
ofile << char(buffer << (8 - bit_count));
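A quick way to see the difference (hypothetical values; demo.bin is an arbitrary file name):
#include <fstream>

int main()
{
    std::ofstream ofile("demo.bin", std::ios::binary);
    char buffer = 0x5;
    int bit_count = 5;

    ofile << (buffer << (8 - bit_count));     // int expression: writes the text "40"
    ofile << char(buffer << (8 - bit_count)); // writes exactly one byte, 0x28
    return 0;
}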
I'm trying to use ifstream/ofstream to read and write, but for some reason the data gets corrupted along the way. Here are the read/write methods and the test:
void FileWrite(const char* FilePath, std::vector<char> &data) {
    std::ofstream os(FilePath);
    int len = data.size();
    os.write(reinterpret_cast<char*>(&len), 4);
    os.write(&(data[0]), len);
    os.close();
}

std::vector<char> FileRead(const char* FilePath) {
    std::ifstream is(FilePath);
    int len;
    is.read(reinterpret_cast<char*>(&len), 4);
    std::vector<char> ret(len);
    is.read(&(ret[0]), len);
    is.close();
    return ret;
}

void test() {
    std::vector<char> sample(1024 * 1024);
    for (int i = 0; i < 1024 * 1024; i++) {
        sample[i] = rand() % 256;
    }
    FileWrite("C:\\test\\sample", sample);
    auto sample2 = FileRead("C:\\test\\sample");
    int err = 0;
    for (int i = 0; i < sample.size(); i++) {
        if (sample[i] != sample2[i])
            err++;
    }
    std::cout << err << "\n";
    int a;
    std::cin >> a;
}
It writes the length correctly and reads it correctly, and it starts reading the data correctly, but at some point (depending on the input, usually around the 1000th byte) it goes wrong, and everything that follows is wrong. Why is that?
For starters, you should open both file streams in binary mode:
std::ofstream os (FilePath,std::ios::binary);
(Edit: this assumes char really means "signed char".)
Do notice that a plain signed char can hold values only up to CHAR_MAX, which is typically 127.
If the random number is bigger, the result will wrap around to a negative value. The stream will then try to write that character as a text character, which can produce an invalid value in the output; binary format should at least fix this problem.
Also, you shouldn't close the stream yourself here; the destructor does it for you.
Two more minor points:
1) &(data[0]) can be just &data[0]; the parentheses are redundant.
2) Try to keep one convention: you use upper camel case for the FilePath variable but lower camel case for all the other variables.
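Putting these points together, a corrected pair might look like this (a sketch keeping the original function names, with the variable casing adjusted as suggested):
#include <fstream>
#include <vector>

void FileWrite(const char* filePath, const std::vector<char>& data) {
    std::ofstream os(filePath, std::ios::binary); // binary: no newline translation
    int len = static_cast<int>(data.size());
    os.write(reinterpret_cast<const char*>(&len), sizeof len);
    os.write(data.data(), len);
} // the destructor closes the stream

std::vector<char> FileRead(const char* filePath) {
    std::ifstream is(filePath, std::ios::binary); // the read side needs binary mode too
    int len = 0;
    is.read(reinterpret_cast<char*>(&len), sizeof len);
    std::vector<char> ret(len);
    is.read(ret.data(), len);
    return ret;
}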
I have records coming in from fgets (web data using popen(), not a file) in const char * form.
const char * cstr = buff;
The 3rd item of each record is a string and needs to be either removed or changed to a zero.
How do I access the 3rd element of a const char * stream arriving in groups of 5 lines?
1
2
string
4
5
1
2
string
4
5
code:
while(fgets(buff, sizeof buff, fp) != NULL)
{
    const char * cstr2 = buff;
    for(int i = 0; i < 5; ++i){
        if(i == 3){
            if (!strcmp(cstr2, "Buy\n")) {
                printf("Found One!\n");
                cstr2[2] = 0;
            }
            if (!strcmp(cstr2, "Sell\n")) {
                printf("Found Two!\n");
                cstr2[2] = 0;
            }
        }
    }
}
expected output:
1
2
0
4
5
1
2
0
4
5
errors:
no match, and:
error: assignment of read-only location '*(cstr2 + 2u)'
How do you correctly access and modify the 3rd element in a streaming char array?
This solution was previously posted by anonymous:
char* getmyData()
{
    char buff[BUFSIZ];
    FILE *fp = popen("php getMyorders.php 155", "r");
    if (fp == NULL) perror("Error opening file");
    size_t size = 1000;
    char* data = (char*)malloc(size);
    char* ptr = data;
    std::string::size_type sz;
    int i = 0;
    while (fgets(buff, sizeof(buff), fp) != NULL)
    {
        const char * cstr2 = buff;
        const char* test = ptr;
        //for(int i = 0; i < 5; ++i)
        {
            if (i == 2) {
                if (!strcmp(cstr2, "Buy\n")) {
                    printf("Found One!\n");
                    strcpy(ptr, "0\n");
                    //ptr += 2;
                }
                else if (!strcmp(cstr2, "Sell\n")) {
                    printf("Found Two!\n");
                    strcpy(ptr, "0\n");
                    //ptr += 2;
                }
                else
                {
                    strcpy(ptr, cstr2);
                    ptr += strlen(cstr2);
                }
            }
            else
            {
                strcpy(ptr, cstr2);
                ptr += strlen(cstr2);
            }
            try
            {
                int nc = std::stod(test, &sz);
                std::cout << "Test: " << 1 + nc << "\n";
            }
            catch(...)
            {
            }
            i++;
            if (i == 5)
                i = 0;
        }
        if (ptr - data + 100 >= size)
        {
            int ofs = ptr - data;
            size *= 2;
            data = (char*)realloc(data, size);
            ptr = data + ofs;
        }
    }
    return data; // DON'T FORGET to call free() on it
}
From your sample code it is not clear what the expected output is. Is it an array of integers? A formatted text string? A byte array? We can't know.
Assuming you have text-formatted input and want text-formatted output, a simple solution is to write a new string with the correct values rather than trying to modify the input buffer.
If you know the exact format of your input records, you could use fscanf to do the parsing instead of doing it by hand, and you could use sprintf to format the output string.
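For instance, following the simpler write-a-new-string approach from the previous paragraph, a hypothetical filter (filterRecords is a made-up name) that copies each 5-line record and substitutes the 3rd line could look like:
#include <stdio.h>
#include <string.h>

void filterRecords(FILE* fp, FILE* out) {
    char buff[256];
    int line = 0;
    while (fgets(buff, sizeof buff, fp) != NULL) {
        if (line == 2 && (!strcmp(buff, "Buy\n") || !strcmp(buff, "Sell\n")))
            fputs("0\n", out); // write a replacement instead of editing buff in place
        else
            fputs(buff, out);
        line = (line + 1) % 5; // 5-line records
    }
}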
As others have pointed out, you'd have safer and easier options if you could use C++. Please comment on your willingness to use C++.
I'm trying to read an array object (Array is a class I've made) from a binary file, using read and write functions I wrote for it. So far the write function works, but the read doesn't work properly for some reason. This is the write function:
void writeToBinFile(const char* path) const
{
    ofstream ofs(path, ios_base::out | ios_base::app | ios_base::binary);
    if (ofs.is_open())
    {
        ostringstream oss;
        for (unsigned int i = 0; i < m_size; i++)
        {
            oss << ' ';
            oss << m_data[i];
        }
        ofs.write(oss.str().c_str(), oss.str().size());
    }
}
This is the read function:
void readFromBinFile(const char* path)
{
    ifstream ifs(path, ios_base::in | ios_base::binary || ios_base::ate);
    if (ifs.is_open())
    {
        stringstream ss;
        int charCount = 0, spaceCount = 0;
        ifs.unget();
        while (spaceCount != m_size)
        {
            charCount++;
            if (ifs.peek() == ' ')
            {
                spaceCount++;
            }
            ifs.unget();
        }
        ifs.get();
        char* ch = new char[sizeof(char) * charCount];
        ifs.read(ch, sizeof(char) * charCount);
        ss << ch;
        delete[] ch;
        for (unsigned int i = 0; i < m_size; i++)
        {
            ss >> m_data[i];
            m_elementCount++;
        }
    }
}
These are the class fields:
T* m_data;
unsigned int m_size;
unsigned int m_elementCount;
I'm using the following code to write and then read (one execution for writing, another for reading):
Array<int> arr3(5);
//arr3[0] = 38;
//arr3[1] = 22;
//arr3[2] = 55;
//arr3[3] = 7;
//arr3[4] = 94;
//arr3.writeToBinFile("binfile.bin");
arr3.readFromBinFile("binfile.bin");
for (unsigned int i = 0; i < arr3.elementCount(); i++)
{
    cout << "arr3[" << i << "] = " << arr3[i] << endl;
}
The problem is in the readFromBinFile function: it gets stuck in an infinite loop, and peek() returns -1 for some reason I can't figure out.
Also note that I'm writing to the binary file with a space between elements, as a barrier so I can differentiate between the objects in the array, and with a space at the start of the write, as a barrier between previously stored binary data in the file and the array's binary data.
The major problem, in my mind, is that you write fixed-size binary data in variable-size textual form. It could be so much simpler if you just stick to pure binary form.
Instead of writing to a string stream and then writing that output to the actual file, just write the binary data directly to the file:
ofs.write(reinterpret_cast<char*>(m_data), sizeof(m_data[0]) * m_size);
Then do something similar when reading the data.
For this to work, you of course need to save the number of entries in the array/vector first before writing the actual data.
So the actual write function could be as simple as
void writeToBinFile(const char* path) const
{
    ofstream ofs(path, ios_base::out | ios_base::binary);
    if (ofs)
    {
        ofs.write(reinterpret_cast<const char*>(&m_size), sizeof(m_size));
        ofs.write(reinterpret_cast<const char*>(&m_data[0]), sizeof(m_data[0]) * m_size);
    }
}
And the read function
void readFromBinFile(const char* path)
{
    ifstream ifs(path, ios_base::in | ios_base::binary);
    if (ifs)
    {
        // Read the size
        ifs.read(reinterpret_cast<char*>(&m_size), sizeof(m_size));
        // Read all the data
        ifs.read(reinterpret_cast<char*>(&m_data[0]), sizeof(m_data[0]) * m_size);
    }
}
Depending on how you define m_data you might need to allocate memory for it before reading the actual data.
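For example, if m_data is a raw pointer as the fields above suggest, the read function could allocate right after reading the size; a sketch inside the same class:
void readFromBinFile(const char* path)
{
    ifstream ifs(path, ios_base::in | ios_base::binary);
    if (ifs)
    {
        ifs.read(reinterpret_cast<char*>(&m_size), sizeof(m_size));
        delete[] m_data;        // drop any previous buffer
        m_data = new T[m_size]; // make room for the incoming elements
        ifs.read(reinterpret_cast<char*>(&m_data[0]), sizeof(m_data[0]) * m_size);
        m_elementCount = m_size;
    }
}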
Oh, and if you want to append data at the end of the array (though in the current code you show you rewrite the whole array anyway), you write the updated size at the beginning, seek to the end, and then write the new data.
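A hypothetical sketch of that appending scheme (extraData and extraCount are made-up names):
void appendToBinFile(const char* path, const T* extraData, unsigned int extraCount)
{
    fstream fs(path, ios_base::in | ios_base::out | ios_base::binary);
    unsigned int oldSize = 0;
    fs.read(reinterpret_cast<char*>(&oldSize), sizeof(oldSize)); // current element count
    unsigned int newSize = oldSize + extraCount;
    fs.seekp(0);
    fs.write(reinterpret_cast<const char*>(&newSize), sizeof(newSize)); // overwrite the stored size
    fs.seekp(0, ios_base::end); // jump past the existing elements
    fs.write(reinterpret_cast<const char*>(extraData), sizeof(extraData[0]) * extraCount);
}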
I'm using libzip to extract the content of each file in a zip into my own data structure, a C++ immutable POD.
The problem is that every time I extract the content of a file, I get some random data tacked on to the end. Here's my code:
void Parser::populateFileMetadata() {
    int error = 0;
    zip *zip = zip_open(this->file_path.c_str(), 0, &error);
    if (zip == nullptr) {
        LOG(DEBUG) << "Could not open zip file.";
        return;
    }
    const zip_int64_t n_entries = zip_get_num_entries(zip, ZIP_FL_UNCHANGED);
    for (zip_int64_t i = 0; i < n_entries; i++) {
        const char *file_name = zip_get_name(zip, i, ZIP_FL_ENC_GUESS);
        struct zip_stat st;
        zip_stat_init(&st);
        zip_stat(zip, file_name, (ZIP_FL_NOCASE | ZIP_FL_UNCHANGED), &st);
        char *content = new char[st.size];
        zip_file *file = zip_fopen(zip, file_name,
                                   (ZIP_FL_NOCASE | ZIP_FL_UNCHANGED));
        const zip_int64_t did_read = zip_fread(file, content, st.size);
        if (did_read <= 0) {
            LOG(WARNING) << "Could not read contents of " << file_name << ".";
            continue;
        }
        const FileMetadata metadata(string(file_name), -1, string(content));
        this->file_metadata.push_back(metadata);
        zip_fclose(file);
        delete[] content;
    }
    zip_close(zip);
}
You're constructing a std::string from content without telling the constructor how long it is, so the constructor reads from the start of the buffer until it finds a terminating NUL. But there's no guarantee that the file content contains one, so the constructor reads past the end of your buffer until it happens to find a NUL.
Fix: use the two-argument std::string constructor, string(const char* s, size_t size), and pass it the data length.
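Applied to the code above, that is a one-line change (st.size is the same size already used for the read):
const FileMetadata metadata(string(file_name), -1, string(content, st.size));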
zip_fread seems to increase the size of content, so I just truncate content: content[st.size] = '\0';
@ruipacheco's solution did not work for me. Doing content[st.size] = '\0'; fixed the problem but caused a "double free or corruption" error when calling zip_fclose() and/or delete[] content, so I did the below and it seems to work:
void ReadZip(std::string &data) {
    ....
    ....
    data.resize(st.size);
    for (uint i = 0; i < st.size; ++i)
        data[i] = std::move(content[i]);
}