Reading and writing binary files using structures - c++

I am attempting to read from a binary file and dump the information into a structure. Before I read from it, I write into the file from a vector of structures. Unfortunately, I am not able to get the new structure to receive the information from the file.
I have tried switching between vectors and individual structures. I have also tried moving the file pointer back and forth, and leaving it as is, to see whether that was the problem. I am using vectors because the program should accept any number of records; they also let me check what the output should look like when I look up a specific structure in the file.
struct Department{
string departmentName;
string departmentHead;
int departmentID;
double departmentSalary;
};
int main()
{
//...
vector<Employee> emp;
vector<Department> dept;
vector<int> empID;
vector<int> deptID;
if(response==1){
addDepartment(dept, deptID);
fstream output_file("departments.dat", ios::in|ios::out|ios::binary);
output_file.write(reinterpret_cast<char *>(&dept[counter-1]), sizeof(dept[counter-1]));
output_file.close();
}
else if(response==2){
addEmployee(emp, dept, empID);
}
else if(response==3){
Department master;
int size=dept.size();
int index;
cout << "Which record to EDIT:\n";
cout << "Please choose one of the following... 1"<< " to " << size << " : ";
cin >> index;
fstream input_file("departments.dat", ios::in|ios::out|ios::binary);
input_file.seekg((index-1) * sizeof(master), ios::beg);
input_file.read(reinterpret_cast<char *>(&master), sizeof(master));
input_file.close();
cout<< "\n" << master.departmentName;
}
else if(response==4){
}
//...

Files are streams of bytes. If you want to write something to a file and read it back reliably, you need to define the contents of the file at the byte level. Have a look at the specifications for some binary file formats (such as GIF) to see what such a specification looks like. Then write code to convert to and from your class instance and a chunk of bytes.
Otherwise, it will be hit or miss and, way too often, miss. Punch "serialization C++" into your favorite search engine for lots of ideas on how to do this.
Your code can't possibly work for an obvious reason. A string can contain a million bytes of data. But you're only writing sizeof(string) bytes to your file. So you're not writing anything that a reader can make sense out of.
Say sizeof(string) is 32 on your platform but the departmentHead is more than 32 bytes. How could the file's contents possibly be right? This code makes no attempt to serialize the data into a stream of bytes suitable for writing to a file which is ... a stream of bytes.
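To make the point above concrete, here is a minimal sketch of one possible byte-level layout for the Department struct from the question: each string is stored as a 32-bit length followed by its characters, and the numeric members are stored as raw bytes. The helper names (write_string, read_string, write_department, read_department) are hypothetical, and the sketch assumes the file is read back on the same platform that wrote it (same byte order and same sizes for int and double).

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <limits>
#include <ostream>
#include <sstream>
#include <string>

struct Department {
    std::string departmentName;
    std::string departmentHead;
    int departmentID;
    double departmentSalary;
};

// Write a string as a 32-bit length followed by its characters.
void write_string(std::ostream& out, const std::string& s) {
    assert(s.size() <= std::numeric_limits<std::uint32_t>::max());
    const std::uint32_t len = static_cast<std::uint32_t>(s.size());
    out.write(reinterpret_cast<const char*>(&len), sizeof len);
    out.write(s.data(), len);
}

// Read a string written by write_string. Returns false on a short read.
bool read_string(std::istream& in, std::string& s) {
    std::uint32_t len = 0;
    if (!in.read(reinterpret_cast<char*>(&len), sizeof len)) return false;
    s.resize(len);
    return len == 0 || static_cast<bool>(in.read(&s[0], len));
}

// Serialize one Department, member by member.
void write_department(std::ostream& out, const Department& d) {
    write_string(out, d.departmentName);
    write_string(out, d.departmentHead);
    out.write(reinterpret_cast<const char*>(&d.departmentID), sizeof d.departmentID);
    out.write(reinterpret_cast<const char*>(&d.departmentSalary), sizeof d.departmentSalary);
}

// Deserialize one Department. Returns false at end of file or on a short read.
bool read_department(std::istream& in, Department& d) {
    if (!read_string(in, d.departmentName)) return false;
    if (!read_string(in, d.departmentHead)) return false;
    if (!in.read(reinterpret_cast<char*>(&d.departmentID), sizeof d.departmentID)) return false;
    return static_cast<bool>(
        in.read(reinterpret_cast<char*>(&d.departmentSalary), sizeof d.departmentSalary));
}
```

Note that records written this way have variable length, so seeking to (index-1) * sizeof(master) no longer finds record number index. With this layout you would either read records sequentially until you reach the one you want, or switch to fixed-size char arrays for the names.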

Related

Save the variables of an object to then be able to initialise another object with those variables

What I am trying to achieve is this:
Let's say I have a class Score. This class has an int variable and a char* variable.
Now when I have an object Score score, I would like to be able to save the value of those variables (I guess to a file). So now this file has an int variable and a char* variable that I can then access later to create a new Score object.
So I create Score score(10, "Bert");. I either do something like score.SaveScore(); or the score gets saved when the game is over or the program exits, it doesn't matter.
Basically I am looking for the equivalent/correct way of doing this:
score.SaveScore(FILE file)
{
file.var1 = score.score;
file.var2 = score.name;
}
I realize this is probably very stupid and not done this way whatsoever! This is just me trying to explain what I am trying to achieve in the simplest way possible.
Anyway, when I run the program again, that original Score score(10, "Bert") does not exist any more. But I would like to be able to access the saved score(from file or wherever it may be) and create another Score object.
So it may look something like:
LoadScore(FILE file)
{
Score newScore(file.var1, file.var2);
}
Again, just trying to show what I am trying to achieve.
The reason why I want to be able to access the variables again is to eventually have a Scoreboard, the Scoreboard would load a bunch of scores from the file.
Then when a new score is created, it is added to the scoreboard, compared to the other scores currently in the scoreboard and inserted in the right position (like a score of 6 would go in between 9 and 4).
I feel like this was a bit long winded but I was trying to really explain myself well! Which I hope I did!
Anyway, I am not looking for someone to tell me how to do all of that.
All I am after is how to do the initial save to a file.
Thank you for any suggestions.
I would use the <fstream> library, like this;
//example values
int x=10;
float y=10.5;
const char* chars = "some random value";
string str(chars); //make string buffer for sizing
str.resize(20); //make sure its fixed size
//open a test.txt file, in the same dir for output
std::ofstream os("test.txt", std::ios::out | std::ios::binary); //make it output binary
//(char*) cast &x, sizeof(type) for values/write to file chars for x and y
os.write((char*)&x, sizeof(int)); //only sizeof(int) starting at &x
os.write((char*)&y, sizeof(float)); //cast as a char pointer
os.write(str.data(), sizeof(char)*str.size()); //write str data
os.close();
//the file test.txt will now have binary data in it
//to read it back in, just ifstream, and put that info in new containers, like this;
int in_x = 0; //new containers set to 0 for debug
float in_y = 0;
char inchar[20]; //buffer to write 20 chars to
ifstream is("test.txt", std::ios::in | std::ios::binary); //read in binary
is.read((char*)&in_x, sizeof(int)); //write to new containers
is.read((char*)&in_y, sizeof(float));
is.read((char*)&inchar, sizeof(char)*20); //write char assuming 20 size
is.close();
//outputting will show the values are correctly read into the new containers
cout << in_x << endl;
cout << in_y << endl;
cout << inchar << endl;
"I realize this is probably very stupid and not done this way whatsoever!"
The entire software industry was stupid enough to do it so many times that a special term was even invented for this operation: serialization. Nearly all C++ frameworks and libraries have implemented it in various ways.
Since the question is tagged C++, I would suggest looking at Boost.Serialization, but there are many other implementations.
Do you need the file to be readable by a human? If yes, then consider, for example, the XML or JSON formats.
You don't need it to be readable but want it to be as compact as possible? Consider Google protobuf.
Just start doing it and come back with more specific questions.
As was mentioned before, keep strings as std::string objects rather than char*.
About writing/reading to/from files in C++ read about fstream

ifstream / ofstream issue with c++?

I have been having a very hard time writing to a binary file and reading back. I am basically writing records of this format
1234|ABCD|efgh|IJKL|ABC
Before writing this record, I would write the length of this entire record ( using string.size()) and then I write the record to the binary file using ofstream as follows:
int size;
ofstream studentfile;
studentfile.open( filename.c_str(),ios::out|ios::binary );
studentfile.write((char*)&size,sizeof(int));
studentfile.write(data.c_str(),(data.size()*(sizeof(char))));
cout << "Added " << data << " to " << filename << endl;
studentfile.close();
And I read this data at some other place
ifstream ifile11;
int x;
std::string y;
ifile11.open("student.db", ios::in |ios::binary);
ifile11.read((char*)&x,sizeof(int));
ifile11.read((char*)&y,x);
cout << "X " << x << " Y " << y << endl;
first I read the length of the record into the variable x, and then read the record into string y. The problem is, the output shows x as being '0' and 'y' is empty.
I am not able to figure this out. Someone who can look into this problem and provide some insight will be thanked very much.
Thank you
You can't read a string that way, as a std::string is really only a pointer and a size member. (Try doing std::string s; sizeof(s), the size will be constant no matter what you set the string to.)
Instead read it into a temporary buffer, and then convert that buffer into a string:
int length;
ifile11.read(reinterpret_cast<char*>(&length), sizeof(length));
char* temp_buffer = new char[length];
ifile11.read(temp_buffer, length);
std::string str(temp_buffer, length);
delete [] temp_buffer;
I know I am answering my own question, but I strictly feel this information is going to help everyone. For most part, Joachim's answer is correct and works. However, there are two main issues behind my problem :
1. The Dev-C++ compiler was having a hard time reading binary files.
2. Not passing strings properly while writing to the binary file, and also reading from the file. For the reading part, Joachim's answer fixed it all.
The Dev-C++ IDE didn't help me. It wrongly read data from the binary file, and it did it without me even making use of a temp_buffer. Visual C++ 2010 Express has correctly identified this error, and threw run-time exceptions and kept me from being misled.
As soon as I took all my code into a new VC++ project, it appropriately provided me with error messages, so that I could fix it all.
So, please do not use Dev-C++ unless you want to run into real trouble like this. Also, when trying to read strings, Joachim's answer would be the ideal way.

C++: Reading and Sorting Binary Files

I've been scratching my head and putting this homework off for a couple days but now that I hunker down to try and do it I'm coming up empty. There's 4 things I need to do.
1) Read a binary file and place that data into arrays
2) Sort the list according to the test scores from lowest to highest
3) Average the scores and output it
4) Create a new binary file with the sorted data
This is what the unsorted binary data file looks like:
A. Smith 89
T. Phillip 95
S. Long 76
I can probably sort since I think I know how to use parallel arrays and index sorting to figure it out, but the reading of the binary file and placing that data into an array is confusing as hell to me as my book doesn't really explain very well.
So far this is my preliminary code which doesn't really do much:
#include "stdafx.h"
#include <iostream>
#include <fstream>
#include <Windows.h>
using namespace std;
int get_int(int default_value);
int average(int x, int y, int z);
int main()
{
char filename[MAX_PATH + 1];
int n = 0;
char name[3];
int grade[3];
int recsize = sizeof(name) + sizeof(int);
cout << "Enter directory and file name of the binary file you want to open: ";
cin.getline(filename, MAX_PATH);
// Open file for binary read.
fstream fbin(filename, ios::binary | ios::in);
if (!fbin) {
cout << "Could not open " << filename << endl;
system("PAUSE");
return -1;
}
}
Sorry for such a novice question.
edit: Sorry what the data file stated earlier is what it SHOULD look like, the binary file is a .dat that has this in it when opened with notepad:
A.Smith ÌÌÌÌÌÌÌÌÌÌÌY T. Phillip ÌÌÌÌÌÌÌÌ_ S. Long ip ÌÌÌÌÌÌÌÌL J. White p ÌÌÌÌÌÌÌÌd
Reading a file in C++ is simple:
Create a stream from the file so that you can read from it (there are file streams for input and output, string streams, and so on):
ifstream fin; //creates a file input stream
fin.open(fname.c_str(),ifstream::binary); // this opens the file in binary mode
void readFile(string fname)
{
ifstream fin;
fin.open(fname.c_str()); //opens that file;
if(!fin)
cout<<"err";
string line;
while(getline(fin,line)) //gets a line from stream and put it in line (string)
{
cout<<line<<endl;
//reading every line
//process for you need.
...
}
fin.close();
}
As you specify, the file is simply a text file, so you can process each line and do whatever you want.
Reading from a binary file may seem confusing, but it is really relatively simple. You have declared your fstream using your file name and set it to binary, which leaves little to do.
Create a pointer to a character array (typically called a buffer, since this data is typically extracted from this array after for other purposes). The size of the array is determined by the length of the file, which you can get by using:
fbin.seekg(0, fbin.end); //Tells fbin to seek to 0 entries from the end of the stream
int binaryLength = fbin.tellg(); //The position of the stream (i.e. its length) is stored in binaryLength
fbin.seekg(0, fbin.beg); //Returns fbin to the beginning of the stream
Then this is used to create a simple character array pointer:
char* buffer = new char[binaryLength];
The data is then read into the buffer:
fbin.read(buffer, binaryLength);
All the binary data that was in the file is now in the buffer. This data can be accessed very simply as in a normal array, and can be used for whatever you please.
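The steps above can also be sketched with a std::vector<char> as the buffer, so that nothing has to be released with delete[]. The function name read_all_bytes is hypothetical, and returning an empty vector on failure is just one possible convention.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Read an entire file into memory. Returns an empty vector if the file
// cannot be opened, is empty, or cannot be read completely.
std::vector<char> read_all_bytes(const std::string& path) {
    // std::ios::ate positions the stream at the end, so tellg() is the length.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) return {};
    const std::streamoff size = in.tellg();
    if (size <= 0) return {};
    std::vector<char> buffer(static_cast<std::size_t>(size));
    in.seekg(0, std::ios::beg);               // back to the beginning
    if (!in.read(buffer.data(), size)) return {};
    assert(in.gcount() == size);              // a successful read delivers every byte
    return buffer;
}
```

The bytes can then be interpreted record by record, for example by copying a fixed number of bytes per record out of the vector.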
The data you have, however, does not at all seem binary. It looks more like a regular text file. Perhaps, unless explicitly stated, you ought to consider a different method for reading your data.
You know, with that low range of sorting index you can avoid actual sorting (with comparing indices and moving data forth and back). All you have to do is to allocate a vector of vector of strings, resize it to 101. Then traverse the data, storing each: "A. Smith" in 89-th element; "T. Phillip" in 95-th; "S. Long" in 76-th and so on.
Then by iterating the vector elements from begin() to end() you would have all the data already sorted.
It's almost linear complexity (almost, because allocation/resizing of subvectors and strings can be costly) easy and transparent.
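A minimal sketch of that bucket idea, assuming every score is an integer from 0 to 100 and that the records have already been read into (name, score) pairs. The function name sort_by_score is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Returns the records ordered by score, lowest first.
// Precondition: every score is in the range 0..100.
std::vector<std::pair<std::string, int>>
sort_by_score(const std::vector<std::pair<std::string, int>>& records) {
    std::vector<std::vector<std::string>> buckets(101); // one bucket per possible score
    for (const auto& r : records) {
        assert(r.second >= 0 && r.second <= 100);
        buckets[r.second].push_back(r.first);
    }
    std::vector<std::pair<std::string, int>> sorted;
    sorted.reserve(records.size());
    // Walking the buckets in index order yields the names in score order.
    for (int score = 0; score <= 100; ++score)
        for (const auto& name : buckets[score])
            sorted.emplace_back(name, score);
    return sorted;
}
```

Names that share a score keep their original relative order, since each bucket is filled in input order.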

Reading Binary File into a Structure (C++)

So I'm having a bit of an issue of not being able to properly read a binary file into my structure. The structure is this:
struct Student
{
char name[25];
int quiz1;
int quiz2;
int quiz3;
};
It is 37 bytes (25 bytes from char array, and 4 bytes per integer). My .dat file is 185 bytes. It's 5 students with 3 integer grades. So each student takes up 37 bytes (37*5=185).
It looks something like this in plain text format:
Bart Simpson 75 65 70
Ralph Wiggum 35 60 44
Lisa Simpson 100 98 91
Martin Prince 99 98 99
Milhouse Van Houten 80 87 79
I'm able to read each of the records individually by using this code:
Student stud;
fstream file;
file.open("quizzes.dat", ios::in | ios::out | ios::binary);
if (file.fail())
{
cout << "ERROR: Cannot open the file..." << endl;
exit(0);
}
file.read(stud.name, sizeof(stud.name));
file.read(reinterpret_cast<char *>(&stud.quiz1), sizeof(stud.quiz1));
file.read(reinterpret_cast<char *>(&stud.quiz2), sizeof(stud.quiz2));
file.read(reinterpret_cast<char *>(&stud.quiz3), sizeof(stud.quiz3));
while(!file.eof())
{
cout << left
<< setw(25) << stud.name
<< setw(5) << stud.quiz1
<< setw(5) << stud.quiz2
<< setw(5) << stud.quiz3
<< endl;
// Reading the next record
file.read(stud.name, sizeof(stud.name));
file.read(reinterpret_cast<char *>(&stud.quiz1), sizeof(stud.quiz1));
file.read(reinterpret_cast<char *>(&stud.quiz2), sizeof(stud.quiz2));
file.read(reinterpret_cast<char *>(&stud.quiz3), sizeof(stud.quiz3));
}
And I get nice-looking output, but I want to be able to read in one whole structure at a time, not just individual members of each structure at a time. This code is what I believe is needed to accomplish the task, but... it doesn't work (I'll show output after it):
*not including the similar parts as far as opening of the file and structure declaration, etc.
file.read(reinterpret_cast<char *>(&stud), sizeof(stud));
while(!file.eof())
{
cout << left
<< setw(25) << stud.name
<< setw(5) << stud.quiz1
<< setw(5) << stud.quiz2
<< setw(5) << stud.quiz3
<< endl;
file.read(reinterpret_cast<char *>(&stud), sizeof(stud));
}
OUTPUT:
Bart Simpson 16640179201818317312
ph Wiggum 288358417665884161394631027
impson 129184563217692391371917853806
ince 175193530917020655191851872800
The only part it doesn't mess up is the first name; after that it all goes downhill. I've tried everything and I've no idea what is wrong. I've even searched through the books I have and I couldn't find anything. Things in there look like what I have and they work, but for some odd reason mine doesn't. I did the file.get(ch) (ch being a char) at byte 25 and it returned K, which is ASCII for 75, which is the 1st test score, so everything's where it should be. It's just not reading in my structures properly.
Any help would be greatly appreciated, I'm just stuck with this one.
EDIT: After receiving such a large amount of unexpected and awesome input from you guys, I've decided to take your advice and stick with reading in one member at a time. I made things cleaner and smaller by using functions. Thank you once again for providing such quick and enlightening input. It's much appreciated.
If you're interested in a workaround that's not recommended by most, scroll towards the bottom, to the 3rd answer by user1654209. That workaround works flawlessly, but read all the comments to see why it's not favored.
Your struct has almost certainly been padded to preserve the alignment of its content. This means that it will not be 37 bytes, and that mismatch causes the reading to go out of sync. Looking at the way each string is losing 3 characters, it seems that it has been padded to 40 bytes.
As the padding is likely to be between the string and the integers, not even the first record reads correctly.
In this case I would recommend not attempting to read your data as a binary blob, and sticking to reading individual fields. It's far more robust, especially if you ever want to alter your structure.
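One way to check the padding claim on your own compiler is to print the struct's size and the offset of its first int member. This is only a sketch; the helper name print_layout and the constant kPackedSize are hypothetical. On a common platform where int is 4 bytes and 4-byte aligned, one would expect it to report 37, 40 and 28, but the exact numbers are implementation-defined.

```cpp
#include <cassert>
#include <cstddef>   // offsetof, std::size_t
#include <iostream>

struct Student {
    char name[25];
    int quiz1;
    int quiz2;
    int quiz3;
};

// Sum of the member sizes, i.e. the record size the question assumes.
constexpr std::size_t kPackedSize = sizeof(Student::name) + 3 * sizeof(int);

void print_layout() {
    assert(sizeof(Student) >= kPackedSize); // padding can only add bytes
    std::cout << "sum of members:  " << kPackedSize << '\n'
              << "sizeof(Student): " << sizeof(Student) << '\n'
              << "offset of quiz1: " << offsetof(Student, quiz1) << '\n';
}
```

If sizeof(Student) is larger than the sum of the members, reading sizeof(Student) bytes per record from a file that stores only the members will drift out of step exactly as shown in the question's output.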
Without seeing the code that writes the data, I'm guessing that you write the data the way you read it in the first example, each element one by one. Then each record in the file will indeed be 37 bytes.
However, since the compiler pads structures to put members on nice boundaries for optimization reasons, your structure is 40 bytes. So when you read the complete structure in a single call, then you actually read 40 bytes at a time, which means that your reading will go out of phase with the actual records in the file.
You either have to re-implement the writing to write the complete structure in one go, or use the first method of reading where you're reading one member field at a time.
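Reading one member at a time can be wrapped in a small function so that the calling loop stays short. The following is a sketch, assuming the file stores the 25-byte name followed by the three ints with no padding in between; read_student and write_student are hypothetical names.

```cpp
#include <cassert>
#include <cstring>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

struct Student {
    char name[25];
    int quiz1;
    int quiz2;
    int quiz3;
};

// Read one record field by field. Returns false at end of file or on a
// short read, so it can be used directly as a loop condition.
bool read_student(std::istream& in, Student& s) {
    in.read(s.name, sizeof s.name);
    in.read(reinterpret_cast<char*>(&s.quiz1), sizeof s.quiz1);
    in.read(reinterpret_cast<char*>(&s.quiz2), sizeof s.quiz2);
    in.read(reinterpret_cast<char*>(&s.quiz3), sizeof s.quiz3);
    return static_cast<bool>(in);
}

// Write one record in the same field order, without any padding bytes.
bool write_student(std::ostream& out, const Student& s) {
    assert(std::memchr(s.name, '\0', sizeof s.name) != nullptr); // name must be NUL-terminated
    out.write(s.name, sizeof s.name);
    out.write(reinterpret_cast<const char*>(&s.quiz1), sizeof s.quiz1);
    out.write(reinterpret_cast<const char*>(&s.quiz2), sizeof s.quiz2);
    out.write(reinterpret_cast<const char*>(&s.quiz3), sizeof s.quiz3);
    return static_cast<bool>(out);
}
```

With these helpers the read loop becomes while (read_student(file, stud)) { ... }, which also avoids testing eof() separately from the read itself.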
A simple workaround is to pack your structure to 1 byte
using gcc
struct __attribute__((packed)) Student
{
char name[25];
int quiz1;
int quiz2;
int quiz3;
};
using msvc
#pragma pack(push, 1) //set padding to 1 byte, saves previous value
struct Student
{
char name[25];
int quiz1;
int quiz2;
int quiz3;
};
#pragma pack(pop) //restore previous pack value
EDIT: As user ahans states, pragma pack has been supported by gcc since version 2.7.2.3 (released in 1997), so it seems safe to use pragma pack as the only packed notation if you are targeting msvc and gcc.
As you've already found out, the padding is the issue here. Also, as others have suggested, the proper way of solving this is to read each member individually as you've done in your example. I don't expect this to cost much more than reading the whole thing in once performance-wise. However, if you still want to go ahead and read it as once, you can tell the compiler to do the padding differently:
#pragma pack(push, 1)
struct Student
{
char name[25];
int quiz1;
int quiz2;
int quiz3;
};
#pragma pack(pop)
With #pragma pack(push, 1) you tell the compiler to save the current pack value on an internal stack and use a pack value of 1 thereafter. This means you get an alignment of 1 byte, which means no padding at all in this case. With #pragma pack(pop) you tell the compiler to get the last value from the stack and use this thereafter, thereby restoring the behavior the compiler used before the definition of your struct.
While #pragma usually indicates non-portable, compiler-dependent features, this one works at least with GCC and Microsoft VC++.
There is more than one way to solve the problem of this thread. Here is a solution based on using union of a struct and a char buf:
#include <fstream>
#include <sstream>
#include <iomanip>
#include <string>
/*
This is the main idea of the technique: Put the struct
inside a union. And then put a char array that is the
number of chars needed for the array.
union causes sStudent and buf to be at the exact same
place in memory. They overlap each other!
*/
union uStudent
{
struct sStudent
{
char name[25];
int quiz1;
int quiz2;
int quiz3;
} field;
char buf[ sizeof(sStudent) ]; // sizeof calcs the number of chars needed
};
void create_data_file(std::fstream& file, uStudent* oStudent, int idx)
{
if (idx < 0)
{
// index went past the beginning of the oStudent array. Return to start processing.
return;
}
// have not yet reached idx = -1. Recurse first (head recursion) so that lower indices are written before this one
create_data_file(file, oStudent, idx - 1);
// write a record
file.write(oStudent[idx].buf, sizeof(uStudent));
// return to write another record or to finish
return;
}
std::string read_in_data_file(std::fstream& file, std::stringstream& strm_buf)
{
// allocate a buffer of the correct size
uStudent temp_student;
// read in to buffer
file.read( temp_student.buf, sizeof(uStudent) );
// at end of file?
if (file.eof())
{
// finished
return strm_buf.str();
}
// not at end of file. Stuff buf for display
strm_buf << std::setw(25) << std::left << temp_student.field.name;
strm_buf << std::setw(5) << std::right << temp_student.field.quiz1;
strm_buf << std::setw(5) << std::right << temp_student.field.quiz2;
strm_buf << std::setw(5) << std::right << temp_student.field.quiz3;
strm_buf << std::endl;
// tail recurse and see whether at end of file
return read_in_data_file(file, strm_buf);
}
std::string quiz(void)
{
/*
declare and initialize array of uStudent to facilitate
writing out the data file and then demonstrating
reading it back in.
*/
uStudent oStudent[] =
{
{"Bart Simpson", 75, 65, 70},
{"Ralph Wiggum", 35, 60, 44},
{"Lisa Simpson", 100, 98, 91},
{"Martin Prince", 99, 98, 99},
{"Milhouse Van Houten", 80, 87, 79}
};
std::fstream file;
// ios::trunc causes the file to be created if it does not already exist.
// ios::trunc also causes the file to be empty if it does already exist.
file.open("quizzes.dat", std::ios::in | std::ios::out | std::ios::binary | std::ios::trunc);
if ( ! file.is_open() )
{
ShowMessage( "File did not open" );
exit(1);
}
// create the data file
int num_elements = sizeof(oStudent) / sizeof(uStudent);
create_data_file(file, oStudent, num_elements - 1);
// Don't forget
file.flush();
/*
We wrote actual integers. So, you cannot check the file so
easily by just using a common text editor such as Windows Notepad.
You would need an editor that shows hex values or something similar.
An integrated development environment (IDE) is likely to have such
an editor. Of course, not always so.
*/
/*
Now, read the file back in for display. Reading into a string buffer
for display all at once. Can modify code to display the string buffer
wherever you want.
*/
// make sure at beginning of file
file.seekg(0, std::ios::beg);
std::stringstream strm_buf;
strm_buf.str( read_in_data_file(file, strm_buf) );
file.close();
return strm_buf.str();
}
Call quiz() and receive a string formatted for display to std::cout, writing to a file, or whatever.
The main idea is that all the items inside a union start at the same address in memory. So you can have a char or wchar_t buf that is the same size as the struct you want to write to or read from a file. Notice that no casts are needed: there is not one cast in the code.
I also did not have to worry about padding.
For those who do not like recursion, sorry. Working it out with recursion is easier and less error prone for me. Maybe not easier for others? The recursions can be converted to loops. And they would need to be converted to loops for very large files.
For those who like recursions, this is yet another instance of using recursion.
I don't claim that using union is the best solution or not. Seems that it is a solution. Maybe you like it?

How to efficiently write a vector of structs to file?

I have code that writes a vector of more than 10 million elements to a text file. I used clock() to time the writefile function and it's the slowest part of my program. Is there a better way to write to the file than the method below?
void writefile(vector<fields>& fieldsvec, ofstream& sigfile, ofstream& noisefile)
/* Writes clean and noise data to respective files
*
* fieldsvec: vector of clean data
* noisevec: vector of noise data
* sigfile: file to store clean data
* noisefile: file to store noise data
*/
{
for(unsigned int i=0; i<fieldsvec.size(); i++)
{
if(fieldsvec[i].nflag==false)
{
sigfile << fieldsvec[i].timestamp << ";" << fieldsvec[i].price << ";" << fieldsvec[i].units;
sigfile << endl;
}
else
{
noisefile << fieldsvec[i].timestamp << ";" << fieldsvec[i].price << ";" << fieldsvec[i].units;
noisefile << endl;
}
}
}
where my struct is:
struct fields
// Stores a parsed line of a file
{
public:
string timestamp;
float price;
float units;
bool nflag; //flag if noise (TRUE=NOISE)
};
I suggest getting rid of the endl. This effectively flushes the buffer every time and thus greatly increases the number of syscalls.
Writing '\n' instead of endl should be a very good improvement.
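A sketch of the question's writefile function with that change applied is shown below. It takes std::ostream references rather than ofstream so that it can be exercised with any stream, which is an assumption made for this example, and it flushes once at the end instead of once per line.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

struct fields {
    std::string timestamp;
    float price;
    float units;
    bool nflag; // true = noise
};

// Writes clean records to sigfile and noise records to noisefile.
void writefile(const std::vector<fields>& fieldsvec,
               std::ostream& sigfile, std::ostream& noisefile) {
    assert(sigfile.good() && noisefile.good()); // both streams should be open and error-free
    for (const fields& f : fieldsvec) {
        std::ostream& out = f.nflag ? noisefile : sigfile;
        // '\n' ends the line without forcing a flush, unlike std::endl.
        out << f.timestamp << ';' << f.price << ';' << f.units << '\n';
    }
    sigfile.flush();   // one flush per file once everything is written
    noisefile.flush();
}
```

How much this helps depends on the platform and the stream implementation, so it is worth timing before and after.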
And by the way, the code can be simplified:
ofstream* files[2] = { &sigfile, &noisefile }; // arrays of references are not allowed, so store pointers
for(unsigned int i=0; i<fieldsvec.size(); i++)
*files[fieldsvec[i].nflag] << fieldsvec[i].timestamp << ';' << fieldsvec[i].price << ';' << fieldsvec[i].units << '\n';
You could write your file in binary format instead of text format to increase the writing speed, as suggested in the first answer of this SO question:
file.open(filename.c_str(), ios_base::binary);
...
// The following writes a vector into a file in binary format
vector<double> v;
const char* pointer = reinterpret_cast<const char*>(&v[0]);
size_t bytes = v.size() * sizeof(v[0]);
file.write(pointer, bytes);
From the same link, the OP reported:
replacing std::endl with \n increased his code speed by 1%
concatenating all the content to be written in a stream and writing everything in the file at the end increased the code speed by 7%
the change of text format to binary format increased his code speed by 90%.
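The struct in the question contains a std::string, so a vector of it cannot be written as one raw block the way the vector<double> above is. A minimal sketch of how the binary approach might be adapted, assuming the timestamp fits in a fixed-size character array: Record, write_records and read_records are hypothetical names, and the resulting file is only meant to be read back by a program built for the same platform.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical fixed-size variant of the question's struct.
struct Record {
    char  timestamp[24]; // NUL-padded text, assumed to fit in 23 characters
    float price;
    float units;
};
static_assert(std::is_trivially_copyable<Record>::value,
              "Record must be safe to copy as raw bytes");

// Write the whole vector with a single call to write().
bool write_records(std::ostream& out, const std::vector<Record>& records) {
    if (!records.empty())
        out.write(reinterpret_cast<const char*>(records.data()),
                  static_cast<std::streamsize>(records.size() * sizeof(Record)));
    return static_cast<bool>(out);
}

// Read `count` records back with a single call to read().
bool read_records(std::istream& in, std::size_t count, std::vector<Record>& records) {
    assert(count <= records.max_size());
    records.resize(count);
    if (count != 0)
        in.read(reinterpret_cast<char*>(records.data()),
                static_cast<std::streamsize>(count * sizeof(Record)));
    return static_cast<bool>(in);
}
```

The reader needs to know the record count, for example from the file size divided by sizeof(Record) or from a count stored at the start of the file.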
A significant speed-killer is that you are converting your numbers to text.
As for the raw file output, the buffering on an ofstream is supposed to be pretty efficient by default.
You should pass your array as a const reference. That might not be a big deal, but it does allow certain compiler optimizations.
If you think the stream is messing things up because of repeated writes, you could try creating a string with sprintf or snprintf and writing it once. Only do this if your timestamp is a known size. Of course, that would mean extra copying, because the string must then be put in the output buffer. Experiment.
Otherwise, it's going to start getting dirty. When you need to tweak out the performance of files, you need to start tailoring the buffers to your application. That tends to get down to using no buffering or cache, sector-aligning your own buffer, and writing large chunks.