Retrieving File Data Stored in a Buffer - C++

I'm new to the forum, but not to this website. I've been searching for weeks for how to process a large data file quickly using C++11. I'm trying to write a function that takes the trace file name, then opens and processes the data. The trace file contains 2 million lines of data, and each line is structured as a read/write operation and a hex address:
r abcdef123456
However, with a file having that much data, I need to read in and parse those 2 values quickly. My first attempt to read the file was the following:
void getTraceData(string filename)
{
ifstream inputfile;
string file_str;
vector<string> op, addr;
// Open input file
inputfile.open(filename.c_str());
cout << "Opening file for reading: " << filename << endl;
// Determine if file opened successfully
if(inputfile.fail())
{
cout << "Text file failed to open." << endl;
cout << "Please check file name and path." << endl;
exit(1);
}
// Retrieve and store address values and operations
if(inputfile.is_open())
{
cout << "Text file opened successfully." << endl;
while(inputfile >> file_str)
{
if((file_str == "r") || (file_str == "w"))
{
op.push_back(file_str);
}
else
{
addr.push_back(file_str);
}
}
}
inputfile.close();
cout << "File closed." << endl;
}
It worked: it ran and read in the file. Unfortunately, it took the program 8 minutes to run and read the file. I modified the first program into the second one to try to read the file in faster. It did, reading the file into a buffer in a fraction of a second, versus 8 minutes using ifstream extraction:
void getTraceData()
{
// Setup variables
char* fbuffer;
ifstream ifs("text.txt");
long int length;
clock_t start, end;
// Start timer + get file length
start = clock();
ifs.seekg(0, ifs.end);
length = ifs.tellg();
ifs.seekg(0, ifs.beg);
// Setup buffer to read & store file data
fbuffer = new char[length];
ifs.read(fbuffer, length);
ifs.close();
end = clock();
float diff((float)end - (float)start);
float seconds = diff / CLOCKS_PER_SEC;
cout << "Run time: " << seconds << " seconds" << endl;
delete[] fbuffer;
}
But when I added the parsing portion of the code to get each line, parsing the buffer contents line-by-line to store the two values in two separate variables, the program silently exits at the while-loop calling getline on the buffer:
void getTraceData(string filename)
{
// Setup variables
char* fbuffer;
ifstream ifs("text.txt");
long int length;
string op, addr, line;
clock_t start, end;
// Start timer + get file length
start = clock();
ifs.seekg(0, ifs.end);
length = ifs.tellg();
ifs.seekg(0, ifs.beg);
// Setup buffer to read & store file data
fbuffer = new char[length];
ifs.read(fbuffer, length);
ifs.close();
// Setup stream buffer
const int maxline = 20;
char* lbuffer;
stringstream ss;
// Parse buffer data line-by-line
while(ss.getline(lbuffer, length))
{
while(getline(ss, line))
{
ss >> op >> addr;
}
ss.ignore( strlen(lbuffer));
}
end = clock();
float diff((float)end - (float)start);
float seconds = diff / CLOCKS_PER_SEC;
cout << "Run time: " << seconds << " seconds" << endl;
delete[] fbuffer;
delete[] lbuffer;
}
I was wondering, once my file is read into a buffer, how do I retrieve the data and store it into variables? For reference, my benchmark target is under 2 minutes to read and process the data file. But right now I'm focused only on the input file, not the rest of my program or the machine it runs on (the code is portable to other machines). The language is C++11 and the OS is Linux. Sorry for the long posting.

Your stringstream ss is not associated with fbuffer at all. You are trying to getline from an empty stringstream, so nothing happens. Try this:
string inputString(fbuffer, length); // fbuffer is not null-terminated, so pass the length
istringstream ss(inputString);
And before ss.getline(lbuffer, length), please allocate memory for lbuffer.
Actually, you can read your file directly into a string to avoid the extra copy; see "Reading directly from an std::istream into an std::string".
Last but not least, since your vectors are quite large, you'd better reserve enough space for them before push_back-ing the items one by one. When a vector reaches its capacity, pushing back another item triggers a reallocation and a copy of all previous items to keep the storage contiguous. Millions of items will make that happen quite a few times.

Related

C++ Save string line by line to a file as fast as possible

I was trying to write to a file, or save the string s.substr(space_pos) in a vector, as fast as possible. I tried writing it to a file with ofstream and printing it with cout, but both take a long time. The size of the text file is 130 MB.
This is the code:
fstream f(legitfiles.c_str(), fstream::in );
string s;
while(getline(f, s)){
size_t space_pos = s.rfind(" ") + 1;
cout << s.substr(space_pos) << endl;
ofstream results("results.c_str()");
results << s.substr(space_pos) << endl;
results.close();
}
cout << s << endl;
f.close();
Is there a way to write or print the string in a faster way?
Uncouple the C++ stream from the C stream:
std::ios_base::sync_with_stdio(false);
Remove the coupling between cin and cout
std::cin.tie(NULL);
Now, don't use std::endl; it needlessly flushes the stream buffer after every line, and flushing is expensive. Use the newline character \n instead and leave buffer flushing to the stream.
Also, don't build an extra string you don't need. Use a std::string_view (C++17), which avoids the copy:
s.substr(space_pos)
//replace with:
std::string_view view(s);
view.substr(space_pos);
If you don't have a C++17 compiler, just use the raw character data:
s.data() + space_pos
You are duplicating the substring, and reopening the results file on every iteration. I suggest opening it once and creating a temporary:
ofstream results("results.txt"); // note: the original "results.c_str()" creates a file literally named that
while(getline(f, s)){
size_t space_pos = s.rfind(" ") + 1;
const std::string sub_string(s.substr(space_pos));
cout << sub_string << "\n";
results << sub_string << "\n";
}
results.close();
You'll need to profile to see if the next code fragment is faster:
while(getline(f, s))
{
static const char newline[] = "\n";
size_t space_pos = s.rfind(" ") + 1;
const std::string sub_string(s.substr(space_pos));
const size_t length(sub_string.length());
cout.write(sub_string.c_str(), length);
cout.write(newline, 1);
results.write(sub_string.c_str(), length);
results.write(newline, 1);
}
The idea behind the 2nd fragment is that you bypass the formatting process and write the contents of the string directly to the output stream. You'll need to measure both fragments to see which is faster (start a clock, run at least 1e6 iterations, stop the clock, and take the average).
If you want to speed up the file writing, remove the writing to std::cout.
Edit 1: multiple threads
You may be able to get some more efficiency out of this by using multiple threads: "Read Thread", "Processing Thread" and "Writing Thread".
The "Read Thread" reads the lines and appends to a buffer. Start this one first.
After a delay, the "Processing Thread" performs the substr method on all the strings.
After about N strings have been processed, the "Writing Thread" starts and writes the substr strings to the file.
This technique uses double buffering. One thread reads and places data into the buffer. When the buffer is full, the Processing Thread should start processing and placing results into a second buffer. When the 2nd buffer is full, the Writing Thread starts and writes the buffer to the results file. There should be at least 2 "read" buffers and 2 "write" buffers. The amount and size of the buffers should be adjusted to get the best performance from your program.
Edit: please note that this answer solves a different problem than the one stated in the question. It copies each line, skipping everything from the beginning of the line up to and including the first whitespace.
It might be faster to read the first word of a line and throw it away before getline()ing the rest, instead of using string::rfind() and string::substr(). Also, you should avoid opening and closing the output file on every iteration.
#include <string>
#include <fstream>
int main()
{
std::ifstream is{ "input" };
std::ofstream os{ "output" };
std::string str;
str.reserve(1024); // change 1024 to your estimated line length.
while (is.peek() == ' ' || is >> str, std::getline(is, str)) {
str += '\n'; // save an additional call to operator<<(char)
os << str.data() + 1; // +1 ... skip the space
// os.write(str.data() + 1, str.length() - 1); // might be even faster
}
}

open/save to binary file in c++ fails intermittently

I've been staring at this too long...
I have made a program that logs weather data from different sensors. It handles the data in a doubly linked list and saves it to a binary file. To store different "compressions" of the data, different files are used, e.g. no compression, hours, days, etc.
The main program first loads the content of the correct file (determined by the weatherSeries constructor) and adds all content from the file into the linked list. It then adds the new element and saves it all. It adds to the list from oldest to newest, and in the file the newest record is likewise the last one written.
The error is that I seem to lose a couple of hours of recorded data. I have observed that the data existed, i.e. seen that there was data recorded between e.g. 9 PM and 10 PM, and then the next morning this data is gone.
The weird thing is the following:
The error occurs only intermittently, and only for the barometric sensor, which delivers values with 6 digits, compared to the humidity and temperature sensors, which deliver values with 4 digits.
It has only happened for "no compression" and never for any of the other compressions. This means that the part of the program that retrieves the data from the sensors works. It also means that the function that adds data to the doubly linked list works.
What's left are the functions that open and save the data to the files.
Can you please see if you can find any errors in my code?
void weatherSeries::saveSeries()
{
ostringstream s;
s << "WLData/" << mSensorNbr << "_" << mSensorType << "_" << mTimeBase << ".dat";
ofstream file(s.str().c_str(), ios::out | ios::trunc | ios::binary);
if (!file)
{
file.clear();
file.open(s.str().c_str(), ios::out | ios::trunc | ios::binary);
}
if(file.is_open())
{
for (current = tail; current != NULL; current = current->prev)
{
file.write((char*)&current->time_stamp, sizeof(time_t));
file.write((char*)&current->val_avg, sizeof(double));
file.write((char*)&current->min, sizeof(double));
file.write((char*)&current->max, sizeof(double));
file.write((char*)&current->nbrOfValues, sizeof(unsigned long int));
}
}
else
{
cerr << "Unable to open for saving to " << mSensorNbr << "_" << mSensorType << "_" << mTimeBase << ".dat";
}
file.close();
}
void weatherSeries::openSeries()
{
deleteAll();
ostringstream s;
s << "WLData/" << mSensorNbr << "_" << mSensorType << "_" << mTimeBase << ".dat";
ifstream file(s.str().c_str(), ios::in | ios::binary);
if (!file)
{
file.clear();
file.open(s.str().c_str(), ios::in | ios::binary);
}
if(file.is_open())
{
time_t tmp_TS = 0;
double tmp_val_avg = 0;
double tmp_min = 0;
double tmp_max = 0;
unsigned long int tmp_nbrOfValues = 0;
while (file.read((char*)&tmp_TS, sizeof(time_t)))
{
file.read((char*)&tmp_val_avg, sizeof(double));
file.read((char*)&tmp_min, sizeof(double));
file.read((char*)&tmp_max, sizeof(double));
file.read((char*)&tmp_nbrOfValues, sizeof(unsigned long int));
addToSeries(tmp_TS, tmp_val_avg, tmp_min, tmp_max, tmp_nbrOfValues, true);
}
}
else
{
cerr << "Unable to open for opening from " << mSensorNbr << "_" << mSensorType << "_" << mTimeBase << ".dat";
}
file.close();
}
Note: deleteAll() clears the doubly linked list.
You were correct: the error was in another part of the program, which I found when I started logging different things in the code.
Different mechanisms instantiate the program, and two instances happened to run at the same time, causing the file to be manipulated by two instances simultaneously.

edit: trouble checking if file is empty or not, what am I doing wrong?

Edit: changed my question to be more accurate of the situation
I'm trying to open up a text file (create it if it doesn't exist, open it if it does). The same file is used for both input and output.
ofstream oFile("goalsFile.txt");
fstream iFile("goalsFile.txt");
string goalsText;
string tempBuffer;
//int fileLength = 0;
bool empty = false;
if (oFile.is_open())
{
if (iFile.is_open())
{
iFile >> tempBuffer;
iFile.seekg(0, iFile.end);
size_t fileLength = iFile.tellg();
iFile.seekg(0, iFile.beg);
if (fileLength == 0)
{
cout << "Set a new goal\n" << "Goal Name:"; //if I end debugging here, the file ends up being empty
getline(cin, goalSet);
oFile << goalSet;
oFile << ";";
cout << endl;
cout << "Goal Cost:";
getline(cin, tempBuffer);
goalCost = stoi(tempBuffer);
oFile << goalCost;
cout << endl;
}
}
}
A couple of issues. For one, even if the file exists and has text in it, the code still enters the if block that asks me to set a new goal. I can't seem to figure out what's happening here.
The problem is simply that you are using buffered IO streams. Despite the fact that they reference the same file underneath, they have completely separate buffers.
// open the file for writing and erase existing contents.
std::ofstream out(filename);
// open the now empty file for reading.
std::ifstream in(filename);
// write to out's buffer
out << "hello";
At this point, "hello" may not have been written to disk, the only guarantee is that it's in the output buffer of out. To force it to be written to disk you could use
out << std::endl; // new line + flush
out << std::flush; // just a flush
Either of those means we've committed our output to disk, but the input buffer is still untouched at this point, so the file still appears to be empty.
In order for your input file to see what you've written to the output file, you'd need to use sync.
#include <iostream>
#include <fstream>
#include <string>
static const char* filename = "testfile.txt";
int main()
{
std::string hello;
{
std::ofstream out(filename);
std::ifstream in(filename);
out << "hello\n";
in >> hello;
std::cout << "unsync'd read got '" << hello << "'\n";
}
{
std::ofstream out(filename);
std::ifstream in(filename);
out << "hello\n";
out << std::flush;
in.sync();
in >> hello;
std::cout << "sync'd read got '" << hello << "'\n";
}
}
The next problem you'll run into trying to do this with buffered streams is the need to clear() the eof bit on the input stream every time more data is written to the file...
Try boost::filesystem::is_empty, which tests whether your file is empty. I've read that poking at fstreams is not a good way to test for empty files.

In C++ seekg seems to include CR chars, but read() drops them

I'm currently trying to read the contents of a file into a char array.
For instance, I have the following text in a char array. 42 bytes:
{
type: "Backup",
name: "BackupJob"
}
This file is created on Windows, and I'm using Visual Studio C++, so there are no OS compatibility issues.
However, when executing the following code, at the completion of the loop I get Index: 39, with no 13 displayed prior to the 10s.
// Create the file stream and open the file for reading
ifstream fs;
fs.open("task.txt", ifstream::in);
int index = 0;
int ch = fs.get();
while (fs.good()) {
cout << ch << endl;
ch = fs.get();
index++;
}
cout << "----------------------------";
cout << "Index: " << index << endl;
return;
However, when attempting to create a char array the length of the file, reading the file size as below includes the 3 additional CR characters in the total, so length equals 42, which ends up corrupting the end of the array with stray bytes.
// Create the file stream and open the file for reading
ifstream fs;
fs.open("task.txt", ifstream::in);
fs.seekg(0, std::ios::end);
length = fs.tellg();
fs.seekg(0, std::ios::beg);
// Create the buffer to read the file (+1 for the terminating NUL)
char* buffer = new char[length + 1];
fs.read(buffer, length);
buffer[length] = '\0';
// Close the stream
fs.close();
Using a hex viewer, I have confirmed that file does indeed contain the CRLF (13 10) bytes in the file.
There seems to be a disparity between the size reported by seeking to the end of the file and what the get() and read() methods actually return.
Could anyone please help with this?
Cheers,
Justin
You should open your file in binary mode. This stops read() from dropping the CR characters:
fs.open("task.txt", ifstream::in|ifstream::binary);

Write and read records to .dat file C++

I am quite new to C++ and am trying to work out how to write a record in the format of this structure below to a text file:
struct user {
int id;
char username [20];
char password [20];
char name [20];
char email [30];
int telephone;
char address [70];
int level;
};
So far I'm able to write to it fine, but without an incremented id number, as I don't know how to work out the number of records. The file looks something like this after I've written the data:
1 Nick pass Nick email tele address 1
1 user pass name email tele address 1
1 test test test test test test 1
1 user pass Nick email tele addy 1
1 nbao pass Nick email tele 207 1
Using the following code:
ofstream outFile;
outFile.open("users.dat", ios::app);
// User input of data here
outFile << "\n" << 1 << " " << username << " " << password << " " << name << " "
<< email << " " << telephone << " " << address << " " << 1;
cout << "\nUser added successfully\n\n";
outFile.close();
So, how can I increment the id value for each record on insertion, and how can I then target a specific record in the file?
EDIT: I've got as far as being able to display each line:
if (inFile.is_open())
{
while(getline(inFile, line)) // test the read itself, not eof()
{
cout << endl;
cout << line << endl;
}
inFile.close();
}
What you have so far is not bad, except that it cannot handle cases where there are spaces in your strings (for example in address!).
What you are trying to do is write a very basic data base. You require three operations that need to be implemented separately (although intertwining them may give you better performance in certain cases, but I'm sure that's not your concern here).
Insert: You already have this implemented. The only thing you might want to change is the " " to "\n". This way, every field of the struct is on its own line and your problem with spaces is resolved. When reading later, you need to read line by line.
Search: To search, you open the file, read struct by struct (which itself consists of reading the lines corresponding to your struct fields) and identify the entities of interest. What to do with them is another issue, but the simplest approach is to return the list of matching entities in an array (or vector).
Delete: This is similar to search, except you have to rewrite the file. You again read struct by struct and see which ones match your deletion criteria. You skip those that match, and write (like the insert part) the rest to another file. Afterwards, you replace the original file with the new file.
Here is a pseudo-code:
Write-entity(user &u, ofstream &fout)
fout << u.id << endl
<< u.username << endl
<< u.password << endl
<< ...
Read-entity(user &u, ifstream &fin)
char ignore_new_line
fin >> u.id >> ignore_new_line
fin.getline(u.username, 20);
fin.getline(u.password, 20);
...
if end of file
return fail
Insert(user &u)
ofstream fout("db.dat");
Write-entity(u, fout);
fout.close();
Search(char *username) /* for example */
ifstream fin("db.dat");
user u;
vector<user> results;
while (Read-entity(u))
if (strcmp(username, u.username) == 0)
results.push_back(u);
fin.close();
return results;
Delete(int level) /* for example */
ifstream fin("db.dat");
ofstream fout("db_temp.dat");
user u;
while (Read-entity(u))
if (level != u.level)
Write-entity(u, fout);
fin.close();
fout.close();
copy "db_temp.dat" to "db.dat"
Side note: it's a good idea to place the \n after the data has been written (so that your text file ends in a new line).
Using typical methods, you will need fixed-size records if you want random access when reading the file. Say you have 5 characters for the name; it will be stored as
bob\0\0
or whatever else you use to pad. This way you can index with record number * record size.
To increment the index the way you are doing it, you will need to read the file to find the highest existing index and increment it. Or you can load the file into memory, append the new record, and write the file back:
std::vector<user> users=read_dat("file.dat");
user user_=get_from_input();
users.push_back(user_);
then write the file back
std::ofstream file("file.dat");
for(size_t i=0; i!=users.size(); ++i) {
file << users.at(i);
//you will need to implement the stream inserter (operator<<) for user to do this easily
}
I suggest wrapping the file handling in a class and then overloading operator>> and operator<< for your struct; this way you control the input and output.
For instance
struct User{
...
};
typedef std::vector<User> UserConT;
struct MyDataFile
{
ofstream outFile;
UserConT User_container;
MyDataFile(std::string const&); //
MyDataFile& operator<< (User const& user); // Implement and/or process the record before to write
MyDataFile& operator>> (UserConT & user); // Implement the extraction/parse and insert into container
MyDataFile& operator<< (UserConT const & user); //Implement extraction/parse and insert into ofstream
};
MyDataFile& MyDataFile::operator<< (User const& user)
{
static unsigned myIdRecord=User_container.size();
myIdRecord++;
outFile << user.id+myIdRecord << ....;
return *this;
}
int main()
{
MyDataFile file("data.dat");
UserConT myUser;
User a;
//... you could manage a single record
a.name="pepe";
...
file<<a;
..//
}
A .dat file is normally just a simple text file that can be opened with Notepad. So you can simply read the last line of the file, extract the first character, and convert it to an integer. Then increment the value and you're done.
Some sample code here :
#include <iostream>
#include <fstream>
#include <cstdlib> // for atoi
using namespace std;
int main(int argc, char *argv[])
{
ifstream in("test.txt");
if(!in) {
cout << "Cannot open input file.\n";
return 1;
}
char str[255];
while(in) {
in.getline(str, 255); // delim defaults to '\n'
//if(in) cout << str << endl;
}
// Now str contains the last line
if ((str[0] >= '0') && (str[0] <= '9'))
{
int i = atoi(str); // parses the leading integer of the last line
i++;
// i contains the latest value, do your operation now
}
in.close();
return 0;
}
Assuming your file format doesn't need to be human readable, you can write the struct out to the file directly:
outFile.open("users.dat", ios::app | ios::binary);
user someValue = {};
outFile.write( (char*)&someValue, sizeof(user) );
int nIndex = 0;
user fetchValue = {};
ifstream inputFile("users.dat", ios::binary);
inputFile.seekg (0, ios::end);
int itemCount = inputFile.tellg() / sizeof(user);
inputFile.seekg (0, ios::beg);
if( nIndex > -1 && nIndex < itemCount){
inputFile.seekg ( sizeof(user) * nIndex , ios::beg);
inputFile.read( (char*)&fetchValue, sizeof(user) );
}
Is the code that writes to the file a member function of the user struct? Otherwise I see no connection between the output and the struct.
Possible things to do:
write the id member instead of 1
use a counter for id and increment it at each write
don't write the id and when reading use the line number as id