Efficient way to write results to file during a computational experiment - C++

I have a piece of software that performs a set of experiments (C++).
Without storing the outcomes, all experiments take a little over a minute.
The total amount of data generated is about 2.5 GB, which is too large to keep in memory until the end of the experiment and write to a file afterwards.
Therefore I write the results in chunks:
for (int i = 0; i < chunkSize; i++) {
    outfile << results_experiments[i] << endl;
}
where
ofstream outfile("data");
and outfile is only closed at the end.
However, when I write them in chunks of 4700 kB (actually 4700 / chunkSize = the size of a results_experiments element), the experiments take about 50 times longer (over an hour...). This is unacceptable and makes my prior optimization attempts look rather silly, especially since these experiments will need to be performed again with many different parameter settings etc. (at least 100 times, but preferably more).
Concretely, my questions are:
What would be the ideal chunksize to write at?
Is there a more efficient way than (or something very inefficient in) the way I write data currently?
Basically: help me make the file I/O overhead as small as possible.
I think it should be possible to do this a lot faster, as copying (writing & reading!) the resulting file (same size) takes me under a minute.
The code should be fairly platform independent and not use any (non-standard) libraries (I can provide separate versions for separate platforms & more complicated install instructions, but it is a hassle...).
If it is not feasible to get the total experiment time under 5 minutes without platform/library dependencies (and possibly even with them), I will seriously consider introducing these. (The platform is Windows, but a trivial Linux port should at least be possible.)
Thank you for your effort.

For starters, not flushing the buffer on every write seems like a good idea. It also seems possible to do the I/O asynchronously, as it is completely independent of the computation. You could also use mmap to improve the performance of the file I/O.
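A minimal sketch of the no-flush idea, assuming the results are doubles and that results_experiments/chunkSize come from the surrounding experiment code: replacing endl with '\n' avoids a flush per line, and a larger stream buffer cuts down the number of underlying write calls.
#include <fstream>
#include <vector>

// Sketch only: double is an assumed element type; adjust to whatever the
// experiments actually produce.
void write_chunk(std::ofstream& outfile, const std::vector<double>& results_experiments,
                 std::size_t chunkSize)
{
    for (std::size_t i = 0; i < chunkSize; ++i) {
        outfile << results_experiments[i] << '\n';   // '\n' does not flush; std::endl does
    }
}

int main()
{
    std::vector<char> buf(1 << 20);                  // 1 MiB stream buffer
    std::ofstream outfile;
    outfile.rdbuf()->pubsetbuf(buf.data(), static_cast<std::streamsize>(buf.size())); // usually must precede open()
    outfile.open("data");

    std::vector<double> results_experiments(100000, 0.5);   // placeholder results
    write_chunk(outfile, results_experiments, results_experiments.size());
}   // outfile is flushed and closed once, when it goes out of scope
The '\n' change alone is usually the big win here, since endl forces a flush on every line.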

If the output doesn't have to be human-readable, then you could investigate a binary format. Storing data in binary format occupies less space than text and therefore needs less disk I/O. But there will be little difference if the data is all strings, so the big gain comes from writing out as much as possible as raw numbers rather than formatted text.
However, I'm not sure if/how this is done with STL iostreams. The C-style way is using fopen(..., "wb") and fwrite(&object, ...).
I think Boost.Serialization can do binary output using the << operator.
Also, can you reduce the amount you write? E.g. no formatting or redundant text, just the bare minimum.
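A minimal sketch of the C-style binary route, assuming the results are plain doubles (the file name and chunk contents here are made up):
#include <cstdio>
#include <vector>

// Hedged sketch: one unformatted block write per chunk; a separate dump
// utility would be needed to turn the file back into readable text.
void write_chunk_binary(std::FILE* f, const std::vector<double>& chunk)
{
    std::fwrite(chunk.data(), sizeof(double), chunk.size(), f);
}

int main()
{
    std::FILE* f = std::fopen("data.bin", "wb");
    if (!f) return 1;
    std::vector<double> chunk(100000, 3.14);         // placeholder results
    write_chunk_binary(f, chunk);
    std::fclose(f);
}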

Note that std::endl always flushes the stream; if you don't need a flush on every line, write '\n' instead.
You might also try increasing the buffer size of your ofstream:
std::vector<char> biggerbuffer(512000);
outfile.rdbuf()->pubsetbuf(biggerbuffer.data(), biggerbuffer.size()); // best called before the file is opened
The availability (and effect) of pubsetbuf may vary depending on your iostream implementation.

Related

Fastest output to file in C and C++

I was helping someone with a question about output in C, and I was unable to answer this seemingly simple question, whose answer I wanted to use in my own answer, namely:
What's the fastest way to output to a file in C / C++?
I've done a lot of work with prime number generation and mathematical algorithm optimization, using C++ and Java, and this was sometimes the biggest holdup for me - I sometimes need to move a lot of data to a file, and fast.
Forgive me if this has been answered, but I've been looking on Google and SO for some time to no avail.
I'm not expecting someone to do the work of benchmarking - but there are several ways to write to a file and I doubt I know them all.
So to summarize,
What ways are there to output to a file in C and C++?
And which of these is/are the faster ones?
Obviously redirecting from the console is terrible.
Any brief comparison of printf, cout, fputc, etc. would help.
Edit:
From the comments,
There's a great baseline test of cout and printf in:
mixing cout and printf for faster output
This is a great start, but not the best answer to what I'm asking.
For example, it doesn't handle std::ostreambuf_iterator<> mentioned in the comments, if that's a possibility, nor does it cover fputc or mention how console redirection compares (not that it needs to).
Edit 2:
Also, for the sake of arguing my historical case, you can assume a near-infinite amount of data being output (programs literally running for days on a newer Intel i7, producing gigabytes of text).
Temporary storage is only so helpful here - as far as I'm aware, you can't easily buffer gigabytes of data.
Functions such as fwrite, fprintf, etc. are in fact doing a write syscall underneath. The only difference from write is that these functions use a buffer to reduce the number of syscalls.
So, if I need to choose between fwrite, fprintf and write, I would avoid fprintf because it's a nice but complicated function that does a lot of things. If I really needed something fast, I would reimplement the formatting part myself down to the bare minimum required. And between fwrite and write, I would pick fwrite if I need to write a lot of small pieces of data; otherwise write could be faster because it doesn't require the whole buffering system.
As far as I'm aware, the biggest bottleneck would be to write a character at a time (for example, using fputc). This is compared to building up a buffer in memory and dumping the whole lot (using fwrite). Experience has shown me that using fputc and writing individual characters is considerably slower.
This is probably because of hardware factors, rather than any one function being faster.
The bottleneck in performance of output is formatting the characters.
In embedded systems, I improved performance by formatting text into a buffer (array of characters), then sending the entire buffer to output using block write commands, such as cout.write or fwrite. The functions bypass formatting and pass the data almost straight through.
You may encounter buffering by the OS along the way.
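A minimal sketch of that format-into-a-buffer-then-block-write pattern; the values, the "%.6f" layout and the output file name are made-up examples:
#include <cstdio>
#include <vector>

int main()
{
    std::vector<double> values(100000, 1.2345);      // placeholder data
    std::vector<char> buf;
    buf.reserve(values.size() * 32);                 // rough upper bound per formatted value

    char line[64];
    for (double v : values) {
        int n = std::snprintf(line, sizeof(line), "%.6f\n", v);
        buf.insert(buf.end(), line, line + n);       // format in memory, no I/O yet
    }

    std::FILE* f = std::fopen("out.txt", "wb");
    if (!f) return 1;
    std::fwrite(buf.data(), 1, buf.size(), f);       // single block write of the whole buffer
    std::fclose(f);
}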
The bottleneck isn't due to the process of formatting the characters, but the multiple calls to the function.
If the text is constant, don't call the formatted output functions, write it direct:
static const char Message[] = "Hello there\n";
cout.write(&Message[0], sizeof(Message) - 1); // -1 because the '\0' doesn't need to be written
cout is arguably slightly faster than printf because operator<< is resolved at compile time for the type being written, although the difference in speed is negligible. I think your real bottleneck isn't the call the language is making, but your hard drive's write rate. If you really want to go all the way with this, you could create a multi-threaded or networked solution that stores the data in a buffer and then writes it out to a hard drive separate from the one used by the processing of the data.

Writing similar contents to many files at once in C++

I am working on a C++ program that needs to write several hundreds of ASCII files. These files will be almost identical. In particular, the size of the files is always exactly the same, with only few characters different between them.
For this I am currently opening up N files with a for-loop over fopen and then calling fputc/fwrite on each of them for every chunk of data (every few characters). This seems to work, but it feels like there should be some more efficient way.
Is there something I can do to decrease the load on the file system and/or improve the speed of this? For example, how taxing is it on the file system to keep hundreds of files open and write to all of them bit by bit? Would it be better to open one file, write that one entirely, close it and only then move on to the next?
If you consider the cost of the context switches usually involved in any of those syscalls, then yes, you should piggyback as much data as possible, taking into account the writing time and the length of your buffers.
Given also that this is primarily an I/O-driven problem, a pub/sub architecture, where the publisher buffers the data and hands it to a subscriber that does the I/O work (and that also waits for the underlying storage to be ready), could be a good choice.
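A minimal sketch of that publisher/subscriber split, assuming a single writer thread is enough to keep up; the file names, chunk size and queue layout here are invented for illustration:
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

int main()
{
    std::queue<std::pair<int, std::string>> q;   // (file index, buffered chunk)
    std::mutex m;
    std::condition_variable cv;
    bool done = false;

    std::thread subscriber([&] {                 // does all the I/O
        for (;;) {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [&] { return !q.empty() || done; });
            if (q.empty() && done) break;
            auto item = std::move(q.front());
            q.pop();
            lock.unlock();
            std::string name = "out_" + std::to_string(item.first) + ".txt";
            std::FILE* f = std::fopen(name.c_str(), "ab");   // append this chunk
            if (f) {
                std::fwrite(item.second.data(), 1, item.second.size(), f);
                std::fclose(f);
            }
        }
    });

    for (int file = 0; file < 100; ++file) {     // publisher side
        {
            std::lock_guard<std::mutex> lock(m);
            q.emplace(file, std::string(4096, 'x'));   // one buffered chunk per file
        }
        cv.notify_one();
    }
    {
        std::lock_guard<std::mutex> lock(m);
        done = true;
    }
    cv.notify_all();
    subscriber.join();
}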
You can write just once to one file and then make copies of that file. You can read about how to make copies here.
This is the sample code from the link above showing how to do it:
int main() {
    String* path = S"c:\\temp\\MyTest.txt";
    String* path2 = String::Concat(path, S"temp");
    // Ensure that the target does not exist.
    File::Delete(path2);
    // Copy the file.
    File::Copy(path, path2);
    Console::WriteLine(S"{0} copied to {1}", path, path2);
    return 0;
}
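Note that the sample above uses the old Managed C++ syntax. In standard C++17 the same copy can be done with std::filesystem::copy_file; a minimal sketch, with made-up paths:
#include <filesystem>
#include <iostream>

int main()
{
    namespace fs = std::filesystem;
    const fs::path src = "c:/temp/MyTest.txt";       // assumed source file
    const fs::path dst = "c:/temp/MyTest.txt.copy";  // assumed target path

    std::error_code ec;
    fs::copy_file(src, dst, fs::copy_options::overwrite_existing, ec);
    if (ec)
        std::cerr << "copy failed: " << ec.message() << '\n';
    else
        std::cout << src << " copied to " << dst << '\n';
}
The few characters that differ between files would then be patched into each copy afterwards.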
Without benchmarking your particular system, I would guess - and that is probably the best you can get - that writing one file at a time is better than opening lots of files and writing the data to several of them in parallel. After all, preparing the data in memory is a minor detail; writing to the file is the long process.
I have done some testing now and it seems like, at least on my system, writing all files in parallel is about 60% slower than writing them one after the other (263 s vs. 165 s for 100 files of 100,000,000 characters each).
I also tried to use ofstream instead of fputc, but fputc seems to be about twice as fast.
In the end, I will probably keep doing what I am doing at the moment, since the complexity of rewriting my code to write one file at a time is not worth the performance improvement.

Several ifstreams vs. ifstream + constant seeking

I'm writing an external merge sort. It works like this: read k chunks from a big file, sort them in memory, perform a k-way merge, done. So I need to read sequentially from different portions of the file during the k-way merge phase. What's the best way to do that: several ifstreams, or one ifstream and seeking? Also, is there a library for easy async I/O?
Use one ifstream at a time on the same file. More than one wastes resources, and you'd have to seek anyway (because by default the ifstream's file pointer starts at the beginning of the file).
As for a C++ async IO library, check out this question.
EDIT: I originally misunderstood what you are trying to do (this Wikipedia article filled me in). I don't know how much ifstream buffers by default, but you can turn off buffering by using the pubsetbuf(0, 0); method described here, and then do your own buffering. This may be slower, however, than using multiple ifstreams with automatic buffering. Some benchmarking is in order.
Definitely try the multiple streams. Seeking probably throws away internally buffered data (at least within the process, even if the OS retains it in cache), and if the items you're sorting are small that could be very costly indeed.
Anyway, it shouldn't be too hard to compare the performance of your two fstream strategies. Do a simple experiment with k = 2.
Note that there may be a limit on the number of files one process can have open simultaneously (ulimit -n). If you reach that, you might want to consider using a single stream, but buffering data from each of your k chunks manually.
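A minimal sketch of that single-stream approach, where each of the k chunks keeps its own in-memory buffer and refills it with a seekg/read pair; the byte ranges and buffer size are assumptions for illustration:
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <vector>

// One ChunkReader per sorted run, all sharing a single ifstream.
class ChunkReader {
public:
    ChunkReader(std::ifstream& file, std::streamoff begin, std::streamoff end,
                std::size_t bufSize = 1 << 20)
        : file_(file), pos_(begin), end_(end), buf_(bufSize) {}

    // Refill the private buffer from this chunk's region; returns bytes read.
    std::size_t refill()
    {
        if (pos_ >= end_) return 0;                  // chunk exhausted
        std::streamoff want = std::min<std::streamoff>(end_ - pos_,
                                  static_cast<std::streamoff>(buf_.size()));
        file_.seekg(pos_);                           // one seek per refill, not per record
        file_.read(buf_.data(), static_cast<std::streamsize>(want));
        std::streamoff got = file_.gcount();
        pos_ += got;
        return static_cast<std::size_t>(got);
    }

    const char* data() const { return buf_.data(); }

private:
    std::ifstream& file_;
    std::streamoff pos_, end_;
    std::vector<char> buf_;
};
During the merge you would repeatedly take the smallest record across the readers and call refill() on a reader whenever its buffer runs dry.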
It might be worth mmapping the file and using multiple pointers, if the file is small enough (equivalently: your address space is large enough).

All things equal what is the fastest way to output data to disk in C++?

I am running simulation code that is largely bound by CPU speed. I am not interested in pushing data in/out to a user interface, simply saving it to disk as it is computed.
What would be the fastest solution that would reduce overhead? iostreams? printf? I have previously read that printf is faster. Will this depend on my code and is it impossible to get an answer without profiling?
This will be running in Windows and the output data needs to be in text format, tab/comma separated, with formatting/precision options for mostly floating point values.
Construct (large-ish) blocks of data which can be sequentially written and use asynchronous IO.
Accurate profiling will be painful; read some papers on the subject: scholar.google.com.
I haven't used them myself, but I've heard memory mapped files offer the best optimisation opportunities to the OS.
Edit: related question, and Wikipedia article on memory mapped files — both mention performance benefits.
My thought is that you are tackling the wrong problem. Why are you writing out vast quantities of text-formatted data? If it is because you want it to be human readable, write a quick browser program that reads the binary data and formats it on the fly - this way the simulation application can quickly write out binary data and the browser can do the grunt work of formatting the data as and when needed. If it is because you are using some stats package to read and analyse text data, then write one that takes binary data as input.
Scott Meyers' More Effective C++, Item 23 ("Consider alternative libraries"), suggests using stdio over iostreams if you prefer speed over safety and extensibility. It's worth checking.
The fastest way is whatever is fastest for your particular application running on its typical target OS and hardware. The only sensible thing to do is to try several approaches and time them. You probably don't need a complete profile, and the exercise should only take a few hours. I would test, in this order:
normal C++ stream I/O
normal stream I/O using ostream::write()
use of the C I/O library
use of system calls such as write()
asynchronous I/O
And I would stop when I found a solution that was fast enough.
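A rough sketch of what such a timing run could look like, comparing formatted stream output against a single ostream::write of the same values; the data size, layout and file names are made up, and the numbers only mean anything on your own machine:
#include <chrono>
#include <fstream>
#include <iostream>
#include <vector>

template <typename F>
double time_seconds(F&& f)
{
    auto t0 = std::chrono::steady_clock::now();
    f();
    return std::chrono::duration<double>(std::chrono::steady_clock::now() - t0).count();
}

int main()
{
    std::vector<double> data(1'000'000, 3.14159);    // placeholder simulation output

    double t_fmt = time_seconds([&] {
        std::ofstream out("fmt.txt");
        for (double d : data) out << d << '\n';      // formatted, one insertion per value
    });

    double t_raw = time_seconds([&] {
        std::ofstream out("raw.bin", std::ios::binary);
        out.write(reinterpret_cast<const char*>(data.data()),
                  static_cast<std::streamsize>(data.size() * sizeof(double)));  // one block write
    });

    std::cout << "formatted: " << t_fmt << " s, unformatted: " << t_raw << " s\n";
}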
Text format means it's for human consumption. The speed at which humans can read is far, far lower than the speed of any reasonable output method, so there's a contradiction somewhere. I suspect it is in the "output must be text format" requirement.
Therefore, I believe the correct approach is to output binary and provide a separate viewer to convert individual entries to readable text. Formatting in the viewer only needs to be as fast as people can read.
Mapping the file into memory (i.e. using a memory-mapped file) and then just memcpy-ing data there is a really fast way of reading/writing.
You can use several threads/cores to write the data, and the OS/kernel will sync the pages to disk using the same kind of routines it uses for virtual memory, which you can expect to be optimized to hell and back, more or less.
Chiefly, there should be few extra copies/buffers in memory when doing this; once a page has been written, it is picked up by the kernel and added to the disk queue.
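A minimal POSIX sketch of that approach (Windows would use CreateFileMapping/MapViewOfFile instead; the data size and file name are made up):
#include <cstddef>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <vector>

int main()
{
    std::vector<double> results(1'000'000, 2.71828); // placeholder results
    const std::size_t bytes = results.size() * sizeof(double);

    int fd = ::open("results.bin", O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return 1;
    if (::ftruncate(fd, static_cast<off_t>(bytes)) != 0) { ::close(fd); return 1; }

    void* map = ::mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) { ::close(fd); return 1; }

    std::memcpy(map, results.data(), bytes);         // "writing" is just a memcpy
    ::msync(map, bytes, MS_ASYNC);                   // let the kernel flush pages when it likes
    ::munmap(map, bytes);
    ::close(fd);
}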
Open the file in binary mode, and write "unformatted" data to the disc.
fstream myFile;
...
myFile.open("mydata.bin", ios::in | ios::out | ios::binary);
...
struct Data {
    int key;
    double value;
    char desc[10];
};
Data x;
myFile.seekp(location1);                  // location1: wherever this record belongs in the file
myFile.write((char*)&x, sizeof(Data));
EDIT: The OP added the "Output data needs to be in text format, whether tab or comma separated." constraint.
If your application is CPU bound, the formatting of output is an overhead that you do not need. Binary data is much faster to write and read than ascii, is smaller on the disc (e.g. there are fewer total bytes written with binary than with ascii), and because it is smaller it is faster to move around a network (including a network mounted file system). All indicators point to binary as a good overall optimization.
Viewing the binary data can be done after the run with a simple utility that will dump the data to ascii in whatever format is needed. I would encourage some version information be added to the resulting binary data to ensure that changes in the format of the data can be handled in the dump utility.
Moving from binary to ascii, and then quibbling over the relative performance of printf versus iostreams is likely not the best use of your time.
The fastest way is completion-based asynchronous IO.
By giving the OS a set of data to write, which it hasn't actually written when the call returns, the OS can reorder it to optimise write performance.
The API for doing this is OS-specific: on Linux it's called AIO; on Windows it's called I/O completion ports.
A fast method is to use double buffering and multiple threads (at least two).
One thread is in charge of writing data to the hard drive. This task checks the buffer and, if it is not empty (or by some other rule, perhaps), begins writing to the hard drive.
The other thread writes formatted text to the buffer.
One performance issue with hard drives is the amount of time required to get up to speed and position the head to the correct location. To keep this from happening, the objective is to continually write to the hard drive so that it doesn't stop. This is tricky and may involve factors outside of your program's scope (such as other programs running at the same time). The larger the chunk of data written to the hard drive, the better.
Another thorn is finding empty slots on the hard drive to put the data in. A fragmented hard drive would be slower than a freshly formatted or defragmented one.
If portability is not an issue, you can check your OS for APIs that perform block writes to the hard drive. Or you can go down lower and use the API that writes directly to the drive.
You may also want your program to change its priority so that it is one of the most important tasks running.
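A minimal sketch of the double-buffering scheme with two threads; the simulated data, the 1 MiB hand-off threshold and the file name are invented for illustration:
#include <condition_variable>
#include <fstream>
#include <mutex>
#include <string>
#include <thread>

int main()
{
    std::string front, back;                         // front: being filled, back: being written
    std::mutex m;
    std::condition_variable cv;
    bool ready = false, done = false;

    std::thread writer([&] {
        std::ofstream out("sim.txt");
        for (;;) {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [&] { return ready || done; });
            if (ready) { back.swap(front); front.clear(); ready = false; }
            bool finished = done && back.empty();
            lock.unlock();
            if (!back.empty()) {                     // write outside the lock
                out.write(back.data(), static_cast<std::streamsize>(back.size()));
                back.clear();
            }
            if (finished) break;
        }
    });

    for (int step = 0; step < 1000000; ++step) {     // the "simulation" thread
        bool handoff = false;
        {
            std::lock_guard<std::mutex> lock(m);
            front += std::to_string(step * 0.001) + '\t' + std::to_string(step) + '\n';
            if (front.size() > (1 << 20)) { ready = true; handoff = true; }   // ~1 MiB per hand-off
        }
        if (handoff) cv.notify_one();
    }
    {
        std::lock_guard<std::mutex> lock(m);
        ready = !front.empty();                      // flush whatever is left
        done = true;
    }
    cv.notify_one();
    writer.join();
}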

std::istream::get efficiency

A C++ question:
for (int i = 1; i < 10000; i++) {
    cout << myfile.get();
}
Will the program make 10,000 I/O operations on the file on the HDD (given that the file is larger than that)?
If so, maybe it is better to read, let's say, 512 bytes into some buffer, take the characters from there one by one, then copy the next 512 bytes, and so on?
As others have said - try it. Tests I've done show that reading a large block in one go (using streams) can be up to twice as fast as relying solely on the stream's own buffering. However, this depends on things like buffer size and (I would expect) the stream library implementation - I use g++.
Your OS will cache the file, so you shouldn't need to optimize this for common use.
ifstream is buffered, so, no.
Try it.
However, in many cases, the fastest operation will be to read the whole file at once, and then work on in-memory data.
But really, try out each strategy, and see what works best.
Keep in mind, though, that regardless of the underlying file buffering mechanism, reading one byte at a time is slow. If nothing else, it calls into the fairly slow iostreams library 10,000 times, when you could have made just a couple of calls.
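For comparison, a minimal sketch of the block-read alternative using istream::read; the 64 KiB buffer size and the input file name are arbitrary choices:
#include <fstream>
#include <iostream>
#include <vector>

int main()
{
    std::ifstream myfile("input.txt", std::ios::binary);
    std::vector<char> buf(64 * 1024);

    while (myfile) {
        myfile.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        std::streamsize got = myfile.gcount();
        for (std::streamsize i = 0; i < got; ++i) {
            std::cout << buf[i];                     // process one character from memory
        }
    }
}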