Write performance of a file in C++

I need to write a file in C++. The content is taken from a while loop, so right now I'm writing it line by line.
I'm wondering whether I can improve the writing time by saving all the content in a variable and then writing the file in one go.
Does anyone know which of the two approaches is better?
Each line is written by this function:
void writeFile(char* filename, string value){
    ofstream outFile(filename, ios::app);
    outFile << value;
    outFile.close();
}
while(/* Something */){
    /* something */
    writeFile(..);
}
The other way is:
void writeNewFile(char* filename, string value){
    ofstream outFile(filename);
    outFile << value;
    outFile.close();
}
string res = "";
while(/* Something */){
    /* something */
    res += mydata;
}
writeNewFile(filename, res);

Have you considered:
ofstream outFile(filename);
while(/* Something */){
    /*...*/
    outFile << mydata;
}
outFile.close();
Output streams are buffered, which means they accumulate data in an internal buffer (much like a string) before writing it to disk -- you are unlikely to be able to beat that unless you have very special requirements.
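If you do have special requirements, one knob worth knowing about is the stream's buffer itself. Whether a user-supplied buffer is honoured is implementation-defined, so treat this as a sketch; the 64 KiB size is an arbitrary choice, and pubsetbuf must be called before the file is opened to have a chance of taking effect:
#include <fstream>

int main() {
    char buf[1 << 16];                            // 64 KiB, arbitrary size
    std::ofstream outFile;
    outFile.rdbuf()->pubsetbuf(buf, sizeof(buf)); // must precede open()
    outFile.open("out.txt");
    for (int i = 0; i < 100000; ++i) {
        outFile << "some data\n";                 // accumulates in buf, flushed in large chunks
    }
}                                                 // destructor flushes and closes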

In the first case you are closing the file stream and opening a new one on every loop iteration; it'd be better to do something like:
void writeFile(const std::string& value, std::ostream* os = nullptr) {
    if (os != nullptr) {
        *os << value; // note the dereference: os is a pointer
    }
}
Or:
void writeFile(const std::string& value, std::ostream& os) {
    os << value;
}
And then call it with the fstream object (or its address, in the first case) that you created in another function/main/whatever.
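For example, a minimal sketch using the reference overload (filename and mydata stand in for whatever the loop produces):
std::ofstream outFile(filename); // opened once, outside the loop
while (/* Something */) {
    /* something */
    writeFile(mydata, outFile);  // reuses the same open stream
}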
As for whether it's quicker to write continuously or at the end, it really depends on the kind of computations you're performing in the while loop and how much that'll slow down the whole process. However, for reliability reasons, sometimes it's best to write continuously to avoid losing all the data if the program crashes during the while loop execution for whatever reason.
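If crash resilience matters, a middle ground is to keep the stream open and flush it at intervals instead of reopening the file each time; a sketch, with an arbitrary every-100-writes interval:
std::ofstream outFile(filename);
long count = 0;
while (/* Something */) {
    /* something */
    outFile << mydata << '\n';
    if (++count % 100 == 0) {
        outFile.flush(); // hand buffered data to the OS periodically
    }
}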

fstream objects are already buffered, therefore adding to a string doesn't help you much; in fact, you may even lose performance as a result of reallocation of the string's contents when it exceeds the previously allocated size. One would have to test to know.
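If you do accumulate into a string first, reserving capacity up front avoids that reallocation cost; a sketch, where the 1 MB estimate is a placeholder:
std::string res;
res.reserve(1024 * 1024); // hypothetical estimate of the final size
while (/* Something */) {
    /* something */
    res += mydata;        // no reallocation until the reserved capacity is exhausted
}
writeNewFile(filename, res);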

Related

Why is stream::ignore not working as intended?

As far as I know, stream.ignore(n, '\n') should ignore n characters, or stop if '\n' is reached and skip over to the next line; however, when I run the following code:
// include...
void insertInfo(int info) {
    std::fstream stream("infoFile.txt"); // Open the file
    while (!stream.eof()) {
        std::string a{};
        // getline(stream, a); <--- Tried this, didn't work either
        stream.ignore(99, '\n');
    } // Skip to the last line without any number, in theory
    std::cout << info << std::endl; // Check if the output is correct (Which it is)
    stream << info; // Insert the info
    stream.close(); // Close the file
}
void main() //Main
{
    std::cout << "Enter your name, followed by the info you want to add to infoFile:" << std::endl;
    std::string info, temp = "";
    std::getline(std::cin, temp); // Get the info input
    std::stringstream sstream;
    sstream << temp;
    sstream >> temp >> info; // Remove the name keeping only the info
    temp = ""; // ^
    std::string::size_type sz;
    insertInfo(stoi(info, &sz)); // Convert info string into an integer and insert it in infoFile
}
The console prints out the correct "info" value; however, when I check infoFile.txt, in which I had previously written a '0', there is no change.
I tried removing the "ignore" function and it overwrites the 0, which is exactly what I was trying to prevent.
I also tried using "getline" function but the same thing happens.
What is the error here?
Problem
Cannot write to file.
Why
void insertInfo(int info) {
    std::fstream stream("infoFile.txt"); // Open the file
Opens file with default permissions, which includes reading. The C++ Standard says I should expect "r+" behaviour and the C Standard says a file opened with "r+" behaviour must exist in order to be read (Someone please add a link if you have one). You cannot create a new file. This is problem 1. The Asker has dealt with this problem by providing a file.
Note: take care when working with files via relative paths. The program's working directory may not be where you think it is. This is problem 1a. It appears that the Asker has this taken care of for the moment.
    while (!stream.eof()) {
Common bug. For more details see Why is iostream::eof inside a loop condition considered wrong? In this case since all you're looking for is the end of the file, the fact that the file hasn't been opened at all or has encountered any read errors is missed. Since a file in an error state can never reach the end of the file this quickly becomes an infinite loop. This is problem 2.
        std::string a{};
        // getline(stream, a); <--- Tried this, didn't work either
        stream.ignore(99, '\n');
Always test IO transactions for success. This call can fail unchecked.
    } // Skip to the last line without any number, in theory
Assuming nothing has gone wrong, and since we're not checking the error state assuming's all we can do, the file has reached the end and is now in the EOF error state. We can't read from or write to the stream until we clear this error. This is problem number 3 and likely the problem the Asker is struggling with.
    std::cout << info << std::endl; // Check if the output is correct (Which it is)
    stream << info; // Insert the info
This can fail unchecked.
    stream.close(); // Close the file
This is not necessary. The file will be closed when it goes out of scope.
}
Solution
void insertInfo(int info) {
    std::fstream stream("infoFile.txt"); // Open the file
    while (!stream.eof()) {
        stream.ignore(99, '\n');
    } // Skip to the last line without any number, in theory
    std::cout << info << std::endl; // Check if the output is correct (Which it is)
    stream.clear(); // Added a call to clear the error flags.
    stream << info; // Insert the info
    stream.close(); // Close the file
}
Now we can write to the file. But let's improve this shall we?
void insertInfo(int info) {
    std::fstream stream("infoFile.txt");
    while (stream.ignore(99, '\n')) // moved ignore here. now we ignore, then test the result
    {
    }
    stream.clear();
    stream << info << '\n'; // added a line ending. Without some delimiter the file
                            // turns into one big number
}
Note that this isn't exactly kosher. If any ignore fails for any reason, we bail out and possibly write over data because the code blindly clears and writes. I'm not spending much time here trying to patch this up because we can get really, really simple and solve the problem of creating a non-existent file at the same time.
void insertInfo(int info) {
    std::fstream stream("infoFile.txt", std::ios::app);
    stream << info << '\n';
}
Two lines and pretty much done. With app we append to the file. We do not need to find the end of the file, the stream automatically points at it. If the file does not exist, it is created.
Next improvement: Let people know if the write failed.
bool insertInfo(int info) {
    std::fstream stream("infoFile.txt", std::ios::app);
    return static_cast<bool>(stream << info << '\n');
}
If the file was not written for any reason, the function returns false and the caller can figure out what to do. The only thing left is to tighten up the stream. Since all we do is write to it, we don't need the permissiveness of an fstream. Always start with the most restrictive and move to the least. This helps prevent some potential errors by making them impossible.
bool insertInfo(int info) {
    std::ofstream stream("infoFile.txt", std::ios::app);
    return static_cast<bool>(stream << info << '\n');
}
Now we use an ofstream and eliminate all the extra overhead and risk brought in by the ability to read the stream when we don't read the stream.
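A caller might use it like this (42 is just a placeholder value):
if (!insertInfo(42)) {
    std::cerr << "could not append to infoFile.txt\n";
}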

C++: buffer the cin istream

The problem is:
I have code that operates on a fully functional istream. It uses methods like:
istream is;
is.seekg(...); // <--- going backwards at times
is.tellg();    // <--- to save the position before looking forward
// etc.
These methods are only available for istreams from, say, a file. However, if I use cin in this fashion, it will not work--cin does not have the option of saving a position, reading forward, then returning to the saved position.
// So, I can't cat the file into the program
cat file | ./program
// I can only read the file from inside the program
./program -f input.txt
// Which is the problem with a very, very large zipped file
// ... that cannot coexist on the same raid-10 drive system
// ... with the resulting output
zcat really_big_file.zip | ./program //<--- Doesn't work due to cin problem
./program -f really_big_file.zip //<--- not possible without unzipping
I can read cin into a deque and process the deque. A 1 MB deque buffer would be more than enough. However, this is problematic in three senses:
I have to rewrite everything to do this with a deque
It won't be as bulletproof as just using an istream, for which the code has already been debugged
It seems like, if I implement it as a deque with some difficulty, someone is going to come along and say, why didn't you just do it like ___
What is the proper/most efficient way to create a usable istream object, in the sense that all members are active, with a cin istream?
(Bearing in mind that performance is important)
You could create a filtering stream buffer reading from std::cin when getting new data but buffering all received characters. You'd be able to implement seeking within the buffered range of the input. Seeking beyond the end of the already buffered input would imply reading corresponding amounts of data. Here is an example of a corresponding implementation:
#include <iostream>
#include <string>
#include <vector>

class bufferbuf
    : public std::streambuf {
private:
    std::streambuf*   d_sbuf;
    std::vector<char> d_buffer;

    int_type underflow() {
        char buffer[1024];
        std::streamsize size = this->d_sbuf->sgetn(buffer, sizeof(buffer));
        if (size == 0) {
            return std::char_traits<char>::eof();
        }
        // append the new data and expose the whole accumulated buffer as
        // the get area, positioned at the start of the newly read chunk
        this->d_buffer.insert(this->d_buffer.end(), buffer, buffer + size);
        this->setg(this->d_buffer.data(),
                   this->d_buffer.data() + this->d_buffer.size() - size,
                   this->d_buffer.data() + this->d_buffer.size());
        return std::char_traits<char>::to_int_type(*this->gptr());
    }
    pos_type seekoff(off_type off, std::ios_base::seekdir whence, std::ios_base::openmode) {
        switch (whence) {
        case std::ios_base::beg:
            this->setg(this->eback(), this->eback() + off, this->egptr());
            break;
        case std::ios_base::cur:
            this->setg(this->eback(), this->gptr() + off, this->egptr());
            break;
        case std::ios_base::end:
            this->setg(this->eback(), this->egptr() + off, this->egptr());
            break;
        default:
            return pos_type(off_type(-1));
        }
        return pos_type(off_type(this->gptr() - this->eback()));
    }
    pos_type seekpos(pos_type pos, std::ios_base::openmode) {
        this->setg(this->eback(), this->eback() + pos, this->egptr());
        return pos_type(off_type(this->gptr() - this->eback()));
    }

public:
    bufferbuf(std::streambuf* sbuf)
        : d_sbuf(sbuf)
        , d_buffer() {
        this->setg(0, 0, 0); // actually the default setting
    }
};

int main() {
    bufferbuf      sbuf(std::cin.rdbuf());
    std::istream   in(&sbuf);
    std::streampos pos(in.tellg());
    std::string    line;
    while (std::getline(in, line)) {
        std::cout << "pass1: '" << line << "'\n";
    }
    in.clear(); // clear the EOF state before seeking back
    in.seekg(pos);
    while (std::getline(in, line)) {
        std::cout << "pass2: '" << line << "'\n";
    }
}
This implementation buffers input before passing it on to the reading step. You can read individual characters (e.g. change char buffer[1024]; to become char buffer[1]; or replace the use of sgetn() appropriately using sbumpc()) to provide a more direct response: there is a trade-off between immediate response and performance for batch processing.
cin is user input and should be treated as unpredictable. If you want to use the functionality mentioned and you are sure about your input, you can read the whole input into an istringstream and then operate on that.
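A minimal sketch of that approach (it buffers the entire input in memory, so it only suits inputs that fit in RAM):
#include <iostream>
#include <sstream>
#include <string>

int main() {
    std::ostringstream buffer;
    buffer << std::cin.rdbuf();          // slurp all of stdin
    std::istringstream in(buffer.str()); // a fully seekable istream
    std::streampos pos = in.tellg();     // save a position...
    std::string line;
    std::getline(in, line);              // ...read forward...
    in.seekg(pos);                       // ...and seek back, which std::cin can't do
}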

Data not written with ofstream, even though success is returned

I'm writing a program which fetches a large number of email files using libcurl and then writes the file to disk, and then generates a receipt.
My problem is that, whilst most of the receipts seem to get written, the majority of the emails aren't written to disk. Worse, even though the file doesn't get written, ofstream returns success - so the receipt gets written even if the file write didn't complete successfully.
My guess is that, because ofstream is asynchronous, if a write doesn't complete in time then it'll get dropped on the floor - only a certain number of writes being possible concurrently. I am just guessing here.
Perhaps I need to refactor my code to write synchronously - but I can't believe that that's necessary. Does anyone have any idea how I can make this work?
The email sizes range from a few KBytes to a couple of MBytes.
int write_file(string filename, string mail_item) {
    ofstream out(filename.c_str());
    out << mail_item;
    out.close();
    out.flush();
    if (!out) {
        return FUNCTION_FAILED;
    }
    return FUNCTION_SUCCESS;
}
This is part of another function, and has been cut out so that only the salient code for this question is shown.
vector<string> directory = curl_listroot(curl);
for (int i=0; i<directory.size(); i++) {
    vector<int> mail_list = curl_search(curl,directory[i],make_vector<string>() << "SEEN" << "RECENT" << "NEW" << "ANSWERED" << "FLAGGED");
    for (int j=0; j<mail_list.size(); j++) {
        curl_reset(curl, imap.username, imap.password);
        string mail_item = curl_fetch(curl,directory[i],mail_list[j]);
        if (mail_item.compare("") != 0) {
            string m_id = getMessageID(mail_item);
            string filename = save_path+"/"+RECEIPTNAME+"/"+clean_filename(m_id) + ".eml";
            if (!file_exists(filename)) {
                string real_filename;
                real_filename = save_path+"/"+INBOXNAME+"/"+clean_filename(m_id) + ".eml";
                int success = write_file(real_filename, mail_item);
                if (success == FUNCTION_SUCCESS) {
                    write_file(filename, ""); //write empty receipt
                }
            }
        }
    }
}
All suggestions gratefully received! Thank you!
Okay. I've found an answer - there may be better answers - but this one works for me. The problem seems to be in the OS (Linux, in this case) - ofstream completes, having handed responsibility for writing the file to the OS, but the file hasn't actually been written yet (so whilst ofstream may be synchronous, the end-to-end write of the file, from data to file safely written to disk, isn't). Given that I'm banging away with a huge number of writes in quick succession (potentially thousands), this won't necessarily work. The OS may throw its hands in the air and drop a significant number of the file writes on the floor (hence my original request for a synchronous way of writing the files - end to end).
My solution is to pause after each write to give the OS time to catch up. It's inelegant, though, and not as performant as it should be - it doesn't take half a second to write an empty file. Additionally, on slow storage, half a second might not be enough time. I'd welcome any clever suggestions for how to improve my code.
int write_file(string filename, string mail_item) {
    ofstream out(filename.c_str());
    if (!out) {
        return FUNCTION_FAILED;
    }
    out << mail_item << endl;
    out.flush();
    usleep(500000); //wait for half a second to give the OS time to output the file
    if (!out) {
        return FUNCTION_FAILED;
    }
    out.close();
    if (!out) {
        return FUNCTION_FAILED;
    }
    return FUNCTION_SUCCESS;
}
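If the goal is an end-to-end guarantee rather than a fixed delay, one alternative on POSIX systems is to force the data to the device with fsync before reporting success. A sketch under that assumption (FUNCTION_FAILED and FUNCTION_SUCCESS are the constants from the code above; error handling kept minimal):
#include <cstdio>
#include <string>
#include <unistd.h> // fsync and fileno are POSIX

int write_file_sync(const std::string& filename, const std::string& mail_item) {
    std::FILE* f = std::fopen(filename.c_str(), "w");
    if (!f) return FUNCTION_FAILED;
    bool ok = std::fwrite(mail_item.data(), 1, mail_item.size(), f) == mail_item.size();
    ok = ok && std::fflush(f) == 0;   // C library buffer -> OS
    ok = ok && fsync(fileno(f)) == 0; // OS cache -> disk, blocks until done
    ok = (std::fclose(f) == 0) && ok;
    return ok ? FUNCTION_SUCCESS : FUNCTION_FAILED;
}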

Is my fstream bad or not good()?

So I have a .cpp file with a function which receives a filename and should return a string with the contents of the file (actually modified contents; I modified the code to make it more understandable, but that doesn't have any effect on my problem). The problem is that f.good() is returning false and the loop which reads the file is not working.
CODE:
#include "StdAfx.h"
#include "Form21.h"
#include <string>
#include <fstream>
#include <iostream>
string ReadAndWrite(char* a){
char filename[8];
strcpy_s(filename,a);
string output;
char c;
ifstream f(filename,ios::in);
output+= "Example text"; // <-- this writes and returns just fine!
c = f.get();
while (f.good())
{
output+= c;
c= f.get();
}
return output;
}
Does anyone have an idea why this is happening?
Does it have something to do with this being a separate .cpp file (it doesn't even throw an error when I remove #include <fstream>)?
Maybe there is a different way to write the loop?
I'll be very happy to hear any suggestions on how to fix this or maybe a different method on how to achieve my goal.
First, there's really no reason to copy the file name you receive -- you can just use it as-is. Second, almost any loop of the form while (stream.good()), while (!stream.bad()), while (stream), etc., is nearly certain to be buggy. What you normally want to do is check whether reading some data worked.
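For instance, a minimal loop whose condition is the read itself (using the f, c, and output from the question's code) never processes a failed read:
char c;
while (f.get(c)) { // the body runs only if the read succeeded
    output += c;
}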
Alternatively, you can skip using a loop at all. There are a couple of ways to do this. One that works nicely for shorter files looks like this:
std::string readfile(std::string const &filename) {
    std::ifstream f(filename.c_str());
    std::ostringstream retval; // a string stream (needs <sstream>), not a string:
    retval << f.rdbuf();       // operator<< can't write into a plain std::string
    return retval.str();
}
That works nicely up to a few tens of kilobytes (or so) of data, but starts to slow down on larger files. In such a case, you usually want to use ifstream::read to get the data, something along this general line:
std::string readfile(std::string const &filename) {
    std::ifstream f(filename.c_str());
    f.seekg(0, std::ios_base::end);
    size_t size = f.tellg();
    std::string retval(size, ' ');
    f.seekg(0);
    f.read(&retval[0], size);
    return retval;
}
Edit: If you need to process the individual characters (not just read them) you have a couple of choices. One is to separate it into phases, where you read all the data in one phase, and do the processing in a separate phase. Another possibility (if you just need to look at individual characters during processing) is to use something like std::transform to read data, do the processing, and put the output into a string:
struct character_processor {
    char operator()(char input) {
        // do some sort of processing on each character:
        return ~input;
    }
};
// istreambuf_iterator rather than istream_iterator, so that whitespace
// is passed through to the processor instead of being skipped
std::transform(std::istreambuf_iterator<char>(f),
               std::istreambuf_iterator<char>(),
               std::back_inserter(result),
               character_processor());
I would check that strlen(a) is not greater than 7...
You might overrun filename and get a file name that doesn't exist.
Unrelated to the problem, I would rewrite the function:
string ReadAndWrite(string a) { // string here, if you are into C++ already
    string filename; // also here
    filename = a; // simpler
    string output;
    char c;
    ifstream f(filename.c_str()); // no need for ios::in (but it needs a char*, not a string)
    output += "Example text"; // <-- this writes and returns just fine!
    f >> noskipws; // keep whitespace, so the result matches what get() produced
    f >> c; // instead of c = f.get();
    while (f) // no need for f.good()
    {
        output += c;
        f >> c; // again, instead of c = f.get();
    }
    return output;
}
Might I suggest using fopen? http://www.cplusplus.com/reference/clibrary/cstdio/fopen/ It takes in a filename and returns a file pointer. With that you can use fgets to read the file line by line http://www.cplusplus.com/reference/clibrary/cstdio/fgets/
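A minimal sketch of that C-style approach (the 1024-byte line buffer is an arbitrary size):
#include <cstdio>
#include <string>

std::string readfile_c(const char* filename) {
    std::string output;
    std::FILE* f = std::fopen(filename, "r");
    if (!f) return output;          // could not open
    char line[1024];
    while (std::fgets(line, sizeof(line), f)) {
        output += line;             // fgets keeps the trailing '\n'
    }
    std::fclose(f);
    return output;
}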

Reading multiple files

I want to alternate between reading multiple files. Below is a simplified version of my code.
ifstream* In_file1 = new ifstream("a.dat", ios::binary);
ifstream* In_file2 = new ifstream("b.dat", ios::binary);
ifstream* In_file;
In_file = In_file1;
int ID = 0;
//LOOPING PORTION
if (In_file->eof())
{
    In_file->seekg(0, ios_base::beg);
    In_file->close();
    switch (ID)
    {
    case 0:
        In_file = In_file2; ID = 1; break;
    case 1:
        In_file = In_file1; ID = 0; break;
    }
}
//some codes
:
:
In_file->read(data, sizeof(double));
//LOOPING PORTION
The code works well if I am reading the files one time, and I thought that everything was cool. However, if the part termed 'looping portion' is within a loop, the behaviour becomes weird and I start getting a single repeating output. Please can someone tell me what is wrong and how I can fix it? If you have a better method of tackling the problem, please suggest it. I appreciate it.
//SOLVED
Thank you everybody for your comments, I appreciate it. Here is what I simply did:
Instead of the original
switch (ID)
{
case 0:
    In_file = In_file2; ID = 1; break;
case 1:
    In_file = In_file1; ID = 0; break;
}
I simply did
switch (ID)
{
case 0:
    In_file = new ifstream("a.dat", ios::binary); ID = 1; break;
case 1:
    In_file = new ifstream("b.dat", ios::binary); ID = 0; break;
}
Now it works like a charm and I can loop as much as I want :-). I appreciate your comments; great to know big brother still helps.
Let's see: the code you posted works fine, and you want us to tell you what's wrong with the code you didn't post. That's rather difficult. Still, the code you posted probably doesn't work correctly either. std::istream::eof can only be used reliably after an input (or some other operation) has failed; in the code you've posted, it will almost certainly be false, regardless.
In addition: there's no need to dynamically allocate ifstream; in fact, there are almost no cases where dynamic allocation of ifstream is appropriate. And you don't check that the opens have succeeded.
If you want to read two files, one after the other, the simplest way is to use two loops, one after the other (calling a common function for processing the data). If for some reason that's not appropriate, I'd use a custom streambuf, which takes a list of filenames in the constructor, and advances to the next whenever it reaches end of file on one, only returning EOF when it has reached the end of all of the files. (The only complication in doing this is what to do if one of the opens fails. I do this often enough that it's part of my tool kit, and I use a callback to handle failure. For a one-time use, however, you can just hard code in whatever is appropriate.)
As a quick example:
#include <cstdio>   // for EOF
#include <fstream>
#include <streambuf>
#include <string>
#include <vector>

// We define our own streambuf, deriving from std::streambuf
// (All istream and ostream delegate to a streambuf for the
// actual data transfer; we'll use an instance of this to
// initialize the istream we're going to read from.)
class MultiFileInputStreambuf : public std::streambuf
{
    // The list of files we will process
    std::vector<std::string> m_filenames;
    // And our current position in the list (actually
    // one past the current position, since we increment
    // it when we open the file).
    std::vector<std::string>::const_iterator m_current;
    // Rather than create a new filebuf for each file, we'll
    // reuse this one, closing any previously open file, and
    // opening a new file, as needed.
    std::filebuf m_streambuf;

protected:
    // This is part of the protocol for streambuf. The base
    // class will call this function anytime it needs to
    // get a character, and there aren't any in the buffer.
    // This function can set up a buffer, if it wants, but
    // in this case, the buffering is handled by the filebuf,
    // so it's likely not worth the bother. (But this depends
    // on the cost of virtual functions---without a buffer,
    // each character read will require a virtual function call
    // to get here.)
    //
    // The protocol is to return the next character, or EOF if
    // there isn't one.
    virtual int underflow()
    {
        // Get one character from the current streambuf.
        int result = m_streambuf.sgetc();
        // As long as 1) the current streambuf is at end of file,
        // and 2) there are more files to read, open the next file
        // and try to get a character from it.
        while ( result == EOF && m_current != m_filenames.end() ) {
            m_streambuf.close();
            m_streambuf.open( m_current->c_str(), std::ios::in );
            if ( !m_streambuf.is_open() ) {
                // Error handling here...
            }
            ++ m_current;
            result = m_streambuf.sgetc();
        }
        // We've either gotten a character from the (now) current
        // streambuf, or there are no more files, and we'll return
        // the EOF from our last attempt at reading.
        return result;
    }

    // Since we never set up a get area of our own, the base
    // class's default uflow() can't be used (it assumes a
    // buffer exists); consume the peeked character from the
    // filebuf ourselves instead.
    virtual int uflow()
    {
        int result = underflow();
        if ( result != EOF ) {
            m_streambuf.sbumpc();
        }
        return result;
    }

public:
    // Use a template and two iterators to initialize the list
    // of files from any STL sequence whose elements can be
    // implicitly converted to std::string.
    template<typename ForwardIterator>
    MultiFileInputStreambuf(ForwardIterator begin, ForwardIterator end)
        : m_filenames(begin, end)
        , m_current(m_filenames.begin())
    {
    }
};
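A usage sketch for the class above (the file names are placeholders):
#include <iostream>

int main()
{
    std::vector<std::string> files;
    files.push_back("a.dat");
    files.push_back("b.dat");
    MultiFileInputStreambuf sb(files.begin(), files.end());
    std::istream in(&sb); // reads a.dat, then b.dat, as one stream
    std::string line;
    while (std::getline(in, line)) {
        std::cout << line << '\n';
    }
}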
#include <iostream>
#include <fstream>
#include <string>

#define NO_OF_FILES 2

int main ()
{
    std::ifstream in;
    std::string line;
    std::string files[NO_OF_FILES] =
    {
        "file1.txt",
        "file2.txt",
    };

    // start our engine!
    for (int i = 0; i < NO_OF_FILES; i++)
    {
        in.open(files[i].c_str(), std::fstream::in);
        if (in.is_open())
        {
            std::cout << "reading... " << files[i] << std::endl;
            while (std::getline(in, line)) // loop on the read itself, not on good()
            {
                std::cout << line << std::endl;
            }
            in.close();
            in.clear(); // reset eof/fail state so the next file can be read
            std::cout << "SUCCESS" << std::endl;
        }
        else
            std::cout << "Error: unable to open " + files[i] << std::endl;
    }
    return 0;
}