std::istream::get efficiency - c++

c++ question.
char c;
for (int i = 0; i < 10000 && myfile.get(c); i++) {
    cout << c; // get(char&) yields a char; cout << myfile.get() would print int codes
}
Will the program make 10000 I/O operations on the file on the HDD (given that the file is larger than that)?
If so, would it be better to read, say, 512 bytes into a buffer, take characters one by one from there, then read the next 512 bytes, and so on?

As others have said - try it. Tests I've done show that reading a large block in one go (using streams) can be up to twice as fast as depending solely on the stream's own buffering. However, this is dependent on things like buffer size and (I would expect) stream library implementation - I use g++.
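The sketch below shows the block-reading approach the question proposes. It pulls fixed-size blocks with std::istream::read and uses gcount() to keep the final partial block. The name copy_in_blocks and the 512-byte default are assumptions for illustration, not measured optimums.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>  // std::istringstream/ostringstream for in-memory testing
#include <vector>

// Copies `in` to `out` in fixed-size blocks using read() instead of calling
// get() once per character. The 512-byte default mirrors the question and is
// an assumption, not a tuned value. Returns the number of bytes copied.
std::size_t copy_in_blocks(std::istream& in, std::ostream& out,
                           std::size_t block_size = 512) {
    assert(block_size > 0);
    std::vector<char> buffer(block_size);
    std::size_t total = 0;
    // read() stops short at end of file and sets failbit; gcount() still
    // reports how many characters arrived, so the last partial block is kept.
    while (in.read(buffer.data(), static_cast<std::streamsize>(buffer.size())) ||
           in.gcount() > 0) {
        std::streamsize got = in.gcount();
        out.write(buffer.data(), got);
        total += static_cast<std::size_t>(got);
    }
    return total;
}
```

With a real file you would pass a std::ifstream opened with std::ios::binary. Timing this against the get() loop on your own files is the "try it" step.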

Your OS will cache the file, so you shouldn't need to optimize this for common use.

ifstream is buffered, so no: each get() call is served from the stream's in-memory buffer, not read directly from the disk.

Try it.
However, in many cases, the fastest operation will be to read the whole file at once, and then work on in-memory data.
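Here is a sketch of reading the whole file at once, assuming it fits in memory. slurp and slurp_file are hypothetical helper names. The bulk copy relies on the standard behaviour of inserting a streambuf pointer into an output stream.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Reads everything from `in` with one bulk transfer: inserting a streambuf
// pointer into a stream copies characters until end of input.
std::string slurp(std::istream& in) {
    std::ostringstream contents;
    contents << in.rdbuf();
    return contents.str();
}

// File wrapper; binary mode avoids newline translation on Windows.
std::string slurp_file(const std::string& path) {
    std::ifstream file(path, std::ios::binary);
    if (!file) throw std::runtime_error("cannot open " + path);
    return slurp(file);
}
```

After that, the loop from the question becomes a plain walk over the returned string, with no stream calls per character.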
But really, try out each strategy, and see what works best.
Keep in mind though, that regardless of the underlying file buffering mechanism, reading one byte at a time is slow. If nothing else, it calls the fairly slow IOStreams library 10000 times, when you could have done just a couple of calls.

Related

How do I do stdin.byLine, but with a buffer?

I'm reading multi-gigabyte files and processing them from stdin. I'm reading from stdin like this.
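import std.conv : to;
import std.stdio;

void main() {
    string line;
    foreach (line1; stdin.byLine) {
        line = to!string(line1); // byLine reuses its buffer, so copy the line to keep it
        // ... process line ...
    }
}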
Is there a faster way to do this? I tried a threading approach with
auto childTid = spawn(&fn, thisTid);
string line;
foreach(line1; stdin.byLine){
line = to!string(line1);
receiveOnly!(int);
send(childTid, line);
}
int x= 0;
send(childTid, x);
That lets it load at least one more line from disk while my process is working, at the cost of a copy operation, but this is still silly. What I need is fgets, or a way to combine stdin.byChunk(4096) with readln. I tried fgets:
char[] buf = new char[4096];
fgets(buf.ptr, 4096, stdio)
but it always fails, saying stdio is a file and not a stream. I'm not sure how to make it a stream. Any help with the approach you think is best would be appreciated. I'm not very good at D, so apologies for any noob mistakes.
There are actually already two layers of buffering under the hood (excluding the hardware itself): the C runtime library and the kernel both do a layer of buffering to minimize I/O costs.
First, the kernel keeps data from disk in its own buffer and will look ahead, loading beyond what you request in a single call if you are following a predictable pattern. This mitigates the low-level costs of seeking on the device, and the cache is shared across processes: if you read a file with one program and then again with a second, the second will probably get it from the kernel's memory cache instead of the physical disk and may be noticeably faster.
Second, the C library, on which D's std.stdio is built, also keeps a buffer. readln ultimately calls C file I/O functions, which read a chunk at a time from the kernel. (Fun fact: writes are also buffered by the C library, by default line by line if the output is interactive and by chunk otherwise. Writing is quite slow, and doing it by chunk makes a big difference, but sometimes the C library thinks a pipe isn't interactive when it actually is, which leads to a FAQ: Simple D program Output order is wrong.)
These C lib buffers also mitigate the costs of many small reads and writes by batching them up before even sending to the kernel. In the case of readln, it will likely read several kilobytes at once, even if you ask for just one line or one byte, and the rest stays in the buffer for next time.
So your readln loop is already going to be automatically buffered and should get decent I/O performance.
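For comparison, here is roughly what that C library layer looks like when used directly from C++. The question is about D, but std.stdio sits on the same C functions. This sketch uses std::setvbuf to install a larger, fully buffered stream buffer and std::fgets to read lines. count_lines is a hypothetical helper, and the buffer and its size are left to the caller as an assumption.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Counts lines through the C stdio layer, the same layer D's std.stdio is
// built on. setvbuf() must be called before the first read on the stream,
// and `buffer` must stay valid for as long as `f` is used. A caller-supplied
// buffer is used because some C libraries ignore the size argument when no
// buffer is given.
long count_lines(std::FILE* f, char* buffer, std::size_t buffer_size) {
    // _IOFBF = fully buffered: fgets is served from `buffer`, refilled in big reads.
    if (std::setvbuf(f, buffer, _IOFBF, buffer_size) != 0) return -1;
    char piece[4096];
    long lines = 0;
    bool ended_with_newline = true;  // an empty stream has zero lines
    // fgets stops after '\n' or when `piece` is full, so a long line can
    // arrive in several pieces; only a piece ending in '\n' closes a line.
    while (std::fgets(piece, sizeof piece, f) != nullptr) {
        std::size_t len = std::strlen(piece);
        ended_with_newline = len > 0 && piece[len - 1] == '\n';
        if (ended_with_newline) ++lines;
    }
    if (!ended_with_newline) ++lines;  // last line had no trailing '\n'
    return lines;
}
```

For standard input this would be called once at program start, for example with a static char buf[1 << 20] passed as count_lines(stdin, buf, sizeof buf), before anything else reads from stdin.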
You might be able to do better yourself with a few techniques, though. For example, you could try using std.mmfile for a memory-mapped file and reading it as if it were an array, but on 32-bit systems your files are too big to map into the address space. It might work on 64-bit, though. (Note that a memory-mapped file is NOT loaded all at once; it is just mapped to a memory address. When you actually touch part of it, the operating system loads or saves that part on demand.)
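Here is the memory-mapped approach, sketched in C++ with POSIX calls rather than std.mmfile. It assumes a POSIX system and a 64-bit address space for very large files. count_newlines_mmap is a hypothetical example that only counts '\n' bytes to show the access pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// POSIX-only sketch: maps a whole file read-only and counts '\n' bytes.
// Pages are loaded lazily by the OS as the loop touches them.
// Returns -1 on error.
long long count_newlines_mmap(const char* path) {
    int fd = ::open(path, O_RDONLY);
    if (fd < 0) { std::perror("open"); return -1; }
    struct stat st;
    if (::fstat(fd, &st) != 0) { std::perror("fstat"); ::close(fd); return -1; }
    if (st.st_size == 0) { ::close(fd); return 0; }  // mmap rejects length 0
    std::size_t size = static_cast<std::size_t>(st.st_size);
    void* addr = ::mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    ::close(fd);  // the mapping stays valid after the descriptor is closed
    if (addr == MAP_FAILED) { std::perror("mmap"); return -1; }
    const char* data = static_cast<const char*>(addr);
    long long count = 0;
    for (std::size_t i = 0; i < size; ++i)
        if (data[i] == '\n') ++count;
    ::munmap(addr, size);
    return count;
}
```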
Or, of course, you can use the lower-level operating system functions, like read (or write) from import core.sys.posix.unistd or ReadFile (or WriteFile) from import core.sys.windows.windows. These bypass the C library's layer but, of course, keep the kernel's layer, which you want; don't try to bypass it.
Any Win32 or POSIX system call tutorial written for C will help if you want to know more about these functions. They are used the same way in D as in C, with minor differences such as import instead of #include.
Once you load a chunk, you will most likely want to scan it for newlines and slice it into lines to form the range you pass to the loop or other algorithms. The std.range and std.algorithm modules also have searching, splitting, and chunking functions that might help, but you need to be careful with lines that span the edges of your buffers to keep both correctness and efficiency.
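Here is a sketch of that chunk-and-split loop, written in C++ with the POSIX read call (D can reach the same function through core.sys.posix.unistd). The hypothetical LineSplitter keeps the incomplete tail of each chunk and joins it with the start of the next one, which keeps lines that span buffer edges correct. The 64 KiB chunk size is an assumption.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <functional>
#include <string>
#include <string_view>
#include <unistd.h>
#include <vector>

// Splits bytes that arrive in arbitrary chunks into lines. A line that
// straddles two chunks is carried over in `pending_` until its '\n' arrives.
class LineSplitter {
public:
    // The view passed to the callback is only valid during the call.
    using Callback = std::function<void(std::string_view)>;

    void feed(const char* data, std::size_t n, const Callback& on_line) {
        std::string_view chunk(data, n);
        std::size_t start = 0;
        std::size_t nl;
        while ((nl = chunk.find('\n', start)) != std::string_view::npos) {
            std::string_view piece = chunk.substr(start, nl - start);
            if (pending_.empty()) {
                on_line(piece);  // fast path: slice straight out of the buffer
            } else {
                pending_.append(piece);  // finish the line begun in an earlier chunk
                on_line(pending_);
                pending_.clear();
            }
            start = nl + 1;
        }
        pending_.append(chunk.substr(start));  // incomplete tail, keep for later
    }

    // Emits a final line that had no trailing '\n'.
    void finish(const Callback& on_line) {
        if (!pending_.empty()) { on_line(pending_); pending_.clear(); }
    }

private:
    std::string pending_;
};

// Reads a file descriptor (e.g. 0 for stdin) with read(2) in 64 KiB chunks,
// bypassing the C stdio buffer but not the kernel cache.
bool for_each_line_fd(int fd, const LineSplitter::Callback& on_line) {
    std::vector<char> buffer(64 * 1024);
    LineSplitter splitter;
    while (true) {
        ssize_t got = ::read(fd, buffer.data(), buffer.size());
        if (got < 0) {
            if (errno == EINTR) continue;  // interrupted by a signal: retry
            return false;                  // real read error
        }
        if (got == 0) break;               // end of file
        splitter.feed(buffer.data(), static_cast<std::size_t>(got), on_line);
    }
    splitter.finish(on_line);
    return true;
}
```

Lines that fit inside one chunk are passed as views into the read buffer without copying. A callback that wants to keep a line must copy it, much like to!string in the D code.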
But if your performance is good enough as it is, I'd say just leave it - the C lib+kernel's buffering do a pretty good job in most cases.

Fastest output to file in c and c++

I was helping someone with a question about output in C, and I was unable to answer a seemingly simple question whose answer I wanted to use in my own answer:
What's the fastest way to output to a file in C / C++?
I've done a lot of work with prime number generation and mathematical algorithm optimization, using C++ and Java, and this was sometimes the biggest holdup for me: I sometimes need to write a lot of data to a file, fast.
Forgive me if this has been answered, but I've been looking on Google and SO for some time to no avail.
I'm not expecting someone to do the work of benchmarking, but there are several ways to write to a file and I doubt I know them all.
So to summarize,
What ways are there to output to a file in C and C++?
And which of these is/are the faster ones?
Obviously redirecting from the console is terrible.
Any brief comparison of printf, cout, fputc, etc. would help.
Edit:
From the comments,
There's a great baseline test of cout and printf in:
mixing cout and printf for faster output
This is a great start, but not the best answer to what I'm asking.
For example, it doesn't cover std::ostreambuf_iterator<>, mentioned in the comments, if that's a possibility. Nor does it cover fputc or mention console redirection (and how bad it is in comparison), though it doesn't need to.
Edit 2:
Also, for the sake of arguing my historical case, you can assume a near infinite amount of data being output (programs literally running for days on a newer Intel i7, producing gigabytes of text)
Temporary storage is only so helpful here: as far as I'm aware, you can't easily buffer gigabytes of data.
Functions such as fwrite, fprintf, etc. do in fact end up making a write syscall. The only difference from calling write directly is that these functions use a buffer to reduce the number of syscalls.
So, if I had to choose between fwrite, fprintf and write, I would avoid fprintf, because it's a nice but complicated function that does a lot of things. If I really needed something fast, I would reimplement the formatting part myself, keeping it to the bare minimum required. And between fwrite and write, I would pick fwrite if I need to write a lot of small pieces of data; otherwise write could be faster, because it doesn't go through the whole buffering system.
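The sketch below illustrates "reimplementing the formatting part to the bare minimum". It hand-formats unsigned integers into its own buffer and passes full buffers to fwrite. BufferedNumberWriter and the 64 KiB buffer size are assumptions. It handles only one value type, which is the trade-off being described.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Writes unsigned integers one per line, formatting them by hand instead of
// going through fprintf, and handing full buffers to fwrite.
class BufferedNumberWriter {
public:
    explicit BufferedNumberWriter(std::FILE* out) : out_(out) {}
    BufferedNumberWriter(const BufferedNumberWriter&) = delete;
    BufferedNumberWriter& operator=(const BufferedNumberWriter&) = delete;
    ~BufferedNumberWriter() { flush(); }

    void write(std::uint64_t value) {
        if (sizeof buf_ - used_ < 21) flush();  // at most 20 digits + '\n'
        char digits[20];
        int n = 0;
        do {  // produce digits least significant first
            digits[n++] = static_cast<char>('0' + value % 10);
            value /= 10;
        } while (value != 0);
        while (n > 0) buf_[used_++] = digits[--n];  // copy them back in order
        buf_[used_++] = '\n';
    }

    // Hands the buffered bytes to fwrite; returns false on a short write.
    bool flush() {
        bool ok = std::fwrite(buf_, 1, used_, out_) == used_;
        used_ = 0;
        return ok;
    }

private:
    std::FILE* out_;
    char buf_[1 << 16];
    std::size_t used_ = 0;
};
```

The class already batches its output, so the FILE*'s own stdio buffer adds a second copy. Whether dropping it with setvbuf(out, nullptr, _IONBF, 0) helps is something to measure.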
As far as I'm aware, the biggest bottleneck would be to write a character at a time (for example, using fputc). This is compared to building up a buffer in memory and dumping the whole lot (using fwrite). Experience has shown me that using fputc and writing individual characters is considerably slower.
This is probably because of hardware factors, rather than any one function being faster.
The bottleneck in performance of output is formatting the characters.
In embedded systems, I improved performance by formatting text into a buffer (array of characters), then sending the entire buffer to output using block write commands, such as cout.write or fwrite. The functions bypass formatting and pass the data almost straight through.
You may encounter buffering by the OS along the way.
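Here is a C++ sketch of the format-then-block-write idea, assuming integer records of the form "index value". write_batch is a hypothetical name. snprintf does the formatting into a scratch array, and the stream sees only one ostream::write call per batch.

```cpp
#include <cassert>
#include <cstdio>
#include <sstream>  // std::ostringstream for in-memory testing
#include <string>
#include <vector>

// Formats all records into one std::string first, then passes the whole
// buffer to the stream in a single ostream::write call, so the iostream
// machinery runs once per batch rather than once per value.
void write_batch(std::ostream& out, const std::vector<long long>& values) {
    std::string buffer;
    buffer.reserve(values.size() * 16);  // rough guess at bytes per record
    char record[64];  // "%zu %lld\n" needs at most 20 + 1 + 20 + 1 characters
    for (std::size_t i = 0; i < values.size(); ++i) {
        int n = std::snprintf(record, sizeof record, "%zu %lld\n", i, values[i]);
        if (n > 0) buffer.append(record, static_cast<std::size_t>(n));
    }
    out.write(buffer.data(), static_cast<std::streamsize>(buffer.size()));
}
```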
The bottleneck isn't due to the process of formatting the characters, but the multiple calls to the function.
If the text is constant, don't call the formatted output functions; write it directly:
#include <iostream>
static const char Message[] = "Hello there\n";
// -1 because the terminating '\0' doesn't need to be written
int main() { std::cout.write(Message, sizeof(Message) - 1); }
cout is actually slightly faster than printf because it is a template function, so the assembly is pre-compiled for the type used, although the difference in speed is negligible. I think your real bottleneck isn't the call the language is making, but your hard drive's write rate. If you really want to go all the way with this, you could create a multi-threaded or networked solution that stores the data in a buffer and then writes it to the hard drive separately from the processing of the data.
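One way to build the "write separately from processing" idea with standard threads is a background writer fed by a queue. BackgroundWriter is a hypothetical class. It assumes the producer can hand over whole buffers (for example a few megabytes of formatted text) and, as written, does not limit how far the producer may run ahead of the disk.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <ostream>
#include <sstream>  // std::ostringstream for in-memory testing
#include <string>
#include <thread>
#include <utility>

// Producers call submit() with finished buffers; a dedicated thread writes
// them to the stream in submission order. The queue is unbounded.
class BackgroundWriter {
public:
    explicit BackgroundWriter(std::ostream& out)
        : out_(out), worker_([this] { run(); }) {}

    ~BackgroundWriter() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        ready_.notify_one();
        worker_.join();  // the worker drains the queue before it exits
    }

    void submit(std::string buffer) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(std::move(buffer));
        }
        ready_.notify_one();
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (true) {
            ready_.wait(lock, [this] { return done_ || !queue_.empty(); });
            if (queue_.empty()) return;  // done_ is set and nothing is left
            std::string buffer = std::move(queue_.front());
            queue_.pop_front();
            lock.unlock();  // do the slow write without holding the lock
            out_.write(buffer.data(), static_cast<std::streamsize>(buffer.size()));
            lock.lock();
        }
    }

    std::ostream& out_;
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<std::string> queue_;
    bool done_ = false;
    std::thread worker_;  // declared last so all members above exist first
};
```

A bounded queue, where submit waits while too many buffers are pending, would cap memory use when the disk is slower than the computation.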

Efficient way to write results to file during the computational experiment

I have a piece of software that performs a set of experiments (C++).
Without storing the outcomes, all experiments take a little over a minute.
The total amount of data generated is 2.5 GB, which is too large to keep in memory until the end of the experiment and write to file afterwards.
Therefore I write them in chunks.
for(int i = 0; i < chunkSize;i++){
outfile << results_experiments[i] << endl;
}
where
ofstream outfile("data");
and outfile is only closed at the end.
However, when I write them in chunks of 4700 kB (actually 4700 kB / chunkSize = size of a results_experiments element), the experiments take about 50 times longer (over an hour). This is unacceptable and makes my earlier optimization attempts look rather silly, especially since these experiments need to be performed again with many different parameter settings, etc. (at least 100 times, but preferably more).
Concrete my question is:
What would be the ideal chunksize to write at?
Is there a more efficient way than (or something very inefficient in) the way I write data currently?
Basically: help me get the file I/O overhead as small as possible.
I think it should be possible to do this a lot faster, since copying (writing and reading!) the resulting file (same size) takes me under a minute.
The code should be fairly platform-independent and not use any non-standard libraries. (I can provide separate versions for separate platforms and more complicated install instructions, but it is a hassle.)
If it is not feasible to get the total experiment time under 5 minutes without platform or library dependencies (and possibly with them), I will seriously consider introducing them. (The platform is Windows, but a trivial Linux port should at least be possible.)
Thank you for your effort.
For starters not flushing the buffer for every chunk seems like a good idea. It also seems possible to do the IO asynchronously, as it is completely independent of the computation. You can also use mmap to improve the performance of File I/O.
If the output doesn't have to be human-readable, you could investigate a binary format. Storing data in binary format takes less space than text format and therefore needs less disk I/O. But there will be little difference if the data is all strings. So if you write out as much as possible as numbers rather than formatted text, you could get a big gain.
However I'm not sure if/how this is done with STL iostreams. The C-style way is using fopen(..., "wb") and fwrite(&object, ...).
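With iostreams, the equivalent of fopen(..., "wb") plus fwrite is a stream opened with std::ios::binary and the unformatted write member. The sketch below assumes the results are stored in a std::vector of a trivially copyable type such as double. write_binary and read_binary are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>  // std::stringstream for in-memory testing
#include <type_traits>
#include <vector>

// Writes the raw bytes of `values`: the iostream counterpart of fwrite.
template <typename T>
void write_binary(std::ostream& out, const std::vector<T>& values) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "raw byte output only makes sense for trivially copyable types");
    out.write(reinterpret_cast<const char*>(values.data()),
              static_cast<std::streamsize>(values.size() * sizeof(T)));
}

// Reads back up to `count` values; stops early at end of input.
template <typename T>
std::vector<T> read_binary(std::istream& in, std::size_t count) {
    std::vector<T> values(count);
    in.read(reinterpret_cast<char*>(values.data()),
            static_cast<std::streamsize>(count * sizeof(T)));
    values.resize(static_cast<std::size_t>(in.gcount()) / sizeof(T));  // keep complete records only
    return values;
}
```

For a file, std::ofstream outfile("data", std::ios::binary); followed by write_binary(outfile, results); would write one chunk. The output is only readable by programs using the same type sizes and byte order.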
I think Boost.Serialization can do binary output using the << operator (with a binary_oarchive).
Also, can you reduce the amount you write? e.g. no formatting or redundant text, just the bare minimum.
Note that std::endl writes '\n' and then flushes the stream every time, for an ofstream just as for any other stream, so writing '\n' instead avoids one flush per line.
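Applied to the loop from the question, the change is just replacing std::endl with '\n'. The sketch below assumes the results are doubles, and write_results is a hypothetical name.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>  // std::ostringstream for in-memory testing
#include <vector>

// Writes one result per line. '\n' only appends a character to the stream's
// buffer; std::endl would also call flush(), pushing the buffer to the OS on
// every single line.
void write_results(std::ostream& outfile, const std::vector<double>& results) {
    for (double r : results) {
        outfile << r << '\n';
    }
}
```

The stream then writes to the file only when its buffer fills up and when it is flushed or closed.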
You might also try increasing the buffer size of your ofstream
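std::vector<char> biggerbuffer(512000); // declared first so it outlives outfile
std::ofstream outfile; // some implementations (e.g. libstdc++) honor pubsetbuf only before open()
outfile.rdbuf()->pubsetbuf(biggerbuffer.data(), biggerbuffer.size());
outfile.open("data");
Whether pubsetbuf actually installs the buffer, and when it has to be called, depends on your iostream implementation.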

Performance of copying a file with fread/fwrite to USB

I'm in front of a piece of code, which copies a file to a usb-device.
Following part is the important one:
while((bytesRead = fread(buf, 1, 16*1024, m_hSource)) && !bAbort) {
// write to target
long bytesWritten = fwrite(buf, 1, bytesRead, m_hTarget);
m_lBytesCopied += bytesWritten;
}
The customer says it's pretty slow compared to normal PC-to-USB speed. I didn't write this code, but it's my job to optimize it.
So I was wondering whether it's a better approach to first read the complete file and then write it in one step, but I don't know how error-prone this would be.
The code also checks after each copy step whether all bytes were written correctly, so that might also slow down the process.
I'm not much of a C++ or hardware guru, so I'm asking you how I could speed things up and keep the copying reliable.
Try to read and write in big chunks: 16 MB or 32 MB is not bad for copying a file.
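Here is the same fread/fwrite loop with a much larger, heap-allocated buffer and a short-write check in place of per-step verification. copy_stream and the 16 MiB default, taken from the suggestion above, are assumptions to tune by measurement.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Copies `src` to `dst` in large chunks and stops on the first short write,
// which signals a full device or an I/O error. Returns bytes copied, or -1.
long long copy_stream(std::FILE* src, std::FILE* dst,
                      std::size_t chunk = 16u * 1024 * 1024) {
    assert(chunk > 0);
    std::vector<char> buf(chunk);  // heap allocation: too big for the stack
    long long copied = 0;
    std::size_t got;
    while ((got = std::fread(buf.data(), 1, buf.size(), src)) > 0) {
        if (std::fwrite(buf.data(), 1, got, dst) != got) return -1;
        copied += static_cast<long long>(got);
    }
    if (std::ferror(src)) return -1;
    return std::fflush(dst) == 0 ? copied : -1;  // surface delayed write errors
}
```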
If you just want to copy the file, you can always invoke system() to run the OS's copy command; it'll be faster.
"The code also checks after each copy step whether all bytes were written correctly, so that might also slow down the process."
You can check this by hashing bigger chunks, for example by splitting the file into 64 MB chunks and comparing the hashes of those chunks. The BitTorrent protocol has this feature.
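Chunk-wise verification could work like this: hash the source and the copy in equal-sized chunks and compare the two lists. std::hash<std::string_view> is used here only as a stand-in. It is not cryptographic and its values are not portable between implementations, so a real check would use a proper hash such as SHA-256 from a library. chunk_hashes is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <functional>
#include <string_view>
#include <vector>

// Returns one hash per `chunk` bytes of `f` (the last chunk may be shorter).
// fread on a regular file fills every chunk except the last, so source and
// copy are split at the same offsets.
std::vector<std::size_t> chunk_hashes(std::FILE* f, std::size_t chunk) {
    std::vector<char> buf(chunk);
    std::vector<std::size_t> hashes;
    std::size_t got;
    while ((got = std::fread(buf.data(), 1, buf.size(), f)) > 0) {
        // Stand-in hash only: replace with a cryptographic hash in practice.
        hashes.push_back(std::hash<std::string_view>{}(std::string_view(buf.data(), got)));
    }
    return hashes;
}
```

Reading the copy back from the USB stick may be served from the OS cache rather than the device, so this mainly checks the copy logic rather than the physical medium.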
If you have mmap or MapViewOfFile available, map the file first, then write it to the USB device. This way the read operation will be handled by the kernel.
Kerrek just commented about using memcpy on mmapped memory; memcpy between two mmapped files seems like a great approach.
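The memcpy-between-two-mappings idea might look like this on a POSIX system; MapViewOfFile would be the Windows counterpart. copy_fd_mmap is a hypothetical helper. It assumes the destination descriptor is open for both reading and writing, because a shared writable mapping requires that.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// POSIX-only: maps the source read-only, sizes the destination with
// ftruncate, maps it shared and writable, and memcpy's between the two.
// The kernel pages data in and writes dirty pages back to the file.
bool copy_fd_mmap(int src_fd, int dst_fd) {
    struct stat st;
    if (::fstat(src_fd, &st) != 0) return false;
    std::size_t size = static_cast<std::size_t>(st.st_size);
    if (::ftruncate(dst_fd, st.st_size) != 0) return false;
    if (size == 0) return true;  // nothing to map; mmap rejects length 0
    void* src = ::mmap(nullptr, size, PROT_READ, MAP_PRIVATE, src_fd, 0);
    if (src == MAP_FAILED) return false;
    void* dst = ::mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, dst_fd, 0);
    if (dst == MAP_FAILED) { ::munmap(src, size); return false; }
    std::memcpy(dst, src, size);
    bool ok = ::msync(dst, size, MS_SYNC) == 0;  // wait until pages reach the file
    ::munmap(dst, size);
    ::munmap(src, size);
    return ok;
}
```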
Also note that most recent operating systems only write to a USB stick when it is being removed; before removal, the data just sits in a cache. So a copy done by the OS may appear faster.
What about overlapping reads and writes?
In the current code, the total time is time(read original) + time(write copy). If you read the first block, then start reading the second block while writing the first, and so on, your total time would be max(time(read original), time(write copy)), plus the time for reading the first block and writing the last one, which can't be pipelined.
It could be almost half the time if reading and writing takes more or less the same time.
You can do it with two threads or with asynchronous IO. Unfortunately, threads and async IO are platform dependent, so you'll have to check your system manual or choose appropriate portable libraries.
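With standard C++ (C++11 or later), std::async can provide the second thread portably. The sketch below double-buffers: it starts reading the next chunk in the background, writes the current chunk, then swaps buffers. copy_overlapped and its 1 MiB default chunk are assumptions.

```cpp
#include <cassert>
#include <cstdio>
#include <future>
#include <vector>

// Overlaps reading chunk N+1 with writing chunk N. Only the background task
// touches `src` while the main thread writes `dst`, and get() orders the
// hand-over. Returns bytes copied, or -1 on error.
long long copy_overlapped(std::FILE* src, std::FILE* dst,
                          std::size_t chunk = 1u << 20) {
    std::vector<char> current(chunk), next(chunk);
    std::size_t got = std::fread(current.data(), 1, chunk, src);
    long long copied = 0;
    while (got > 0) {
        // Start reading the next chunk before writing the current one.
        auto pending = std::async(std::launch::async, [&] {
            return std::fread(next.data(), 1, chunk, src);
        });
        bool ok = std::fwrite(current.data(), 1, got, dst) == got;
        std::size_t next_got = pending.get();  // always join before leaving
        if (!ok) return -1;
        copied += static_cast<long long>(got);
        current.swap(next);  // O(1): the freshly read data becomes current
        got = next_got;
    }
    if (std::ferror(src)) return -1;
    return std::fflush(dst) == 0 ? copied : -1;
}
```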
I would just go with some OS-specific functions that will surely do this faster than anything written only with standard C/C++ functions.
For Linux this could be sendfile function. For Windows CopyFile will do the job.
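On Linux, a sendfile-based copy could look like the sketch below. copy_with_sendfile is a hypothetical wrapper. It assumes Linux 2.6.33 or later, where the destination may be a regular file, and it loops because one call may copy fewer bytes than requested.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

// Linux-only: the kernel copies from in_fd to out_fd without passing the
// data through a user-space buffer. Returns true if the whole file was copied.
bool copy_with_sendfile(int in_fd, int out_fd) {
    struct stat st;
    if (::fstat(in_fd, &st) != 0) { std::perror("fstat"); return false; }
    off_t offset = 0;  // read position in in_fd; sendfile advances it
    while (offset < st.st_size) {
        ssize_t sent = ::sendfile(out_fd, in_fd, &offset,
                                  static_cast<size_t>(st.st_size - offset));
        if (sent < 0) {
            if (errno == EINTR) continue;  // interrupted by a signal: retry
            std::perror("sendfile");
            return false;
        }
        if (sent == 0) break;  // source got shorter while copying
    }
    return offset == st.st_size;
}
```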

Several ifstreams vs. ifstream + constant seeking

I'm writing an external merge sort. It works like this: read k chunks from the big file, sort them in memory, then perform a k-way merge. So during the k-way merge phase I need to read sequentially from different portions of the file. What's the best way to do that: several ifstreams, or one ifstream and seeking? Also, is there a library for easy async IO?
Use one ifstream at a time on the same file. More than one wastes resources, and you'd have to seek anyway (because by default the ifstream's file pointer starts at the beginning of the file).
As for a C++ async IO library, check out this question.
EDIT: I originally misunderstood what you are trying to do (this Wikipedia article filled me in). I don't know how much ifstream buffers by default, but you can turn off buffering by using the pubsetbuf(0, 0); method described here, and then do your own buffering. This may be slower, however, than using multiple ifstreams with automatic buffering. Some benchmarking is in order.
Definitely try the multiple streams. Seeking probably throws away internally buffered data (at least within the process, even if the OS retains it in cache), and if the items you're sorting are small that could be very costly indeed.
Anyway, it shouldn't be too hard to compare the performance of your two fstream strategies. Do a simple experiment with k = 2.
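Here is the several-streams variant, assuming the runs are stored as raw int32_t records. Each run gets its own stream positioned at its start; for a real file, that would be a separate std::ifstream opened in binary mode and moved there with seekg. Run and kway_merge are hypothetical names. A min-heap picks the smallest current head among the k runs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <istream>
#include <ostream>
#include <queue>
#include <sstream>  // std::istringstream/ostringstream for in-memory testing
#include <utility>
#include <vector>

// One sorted run: a stream already positioned at the run's first record,
// plus the number of records left in the run.
struct Run {
    std::istream* in;
    std::size_t remaining;
};

// Reads the next raw int32_t record of `run`, if it has one left.
static bool next_record(Run& run, std::int32_t& value) {
    if (run.remaining == 0) return false;
    if (!run.in->read(reinterpret_cast<char*>(&value), sizeof value)) return false;
    --run.remaining;
    return true;
}

// Merges all runs into `out`, always emitting the smallest current head.
void kway_merge(std::vector<Run>& runs, std::ostream& out) {
    using Entry = std::pair<std::int32_t, std::size_t>;  // (value, run index)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;  // min-heap
    std::int32_t v;
    for (std::size_t i = 0; i < runs.size(); ++i)
        if (next_record(runs[i], v)) heap.emplace(v, i);
    while (!heap.empty()) {
        Entry top = heap.top();
        heap.pop();
        out.write(reinterpret_cast<const char*>(&top.first), sizeof top.first);
        if (next_record(runs[top.second], v)) heap.emplace(v, top.second);
    }
}
```

With k = 2, as suggested, the test is just two streams opened on the same file, which makes it easy to time against the single-stream approach.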
Note that there may be a limit on the number of files one process can have open simultaneously (ulimit -n). If you reach that, you might want to consider using a single stream, but buffering data from each of your k chunks manually.
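The single-stream alternative with manual per-run buffering could work as follows, using the same raw int32_t record assumption. Each hypothetical BufferedRun remembers its next byte offset and refills a small buffer with one seekg plus read. Seeking therefore happens once per buffer rather than once per record. The buffer capacity is a parameter to tune.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <istream>
#include <ostream>
#include <queue>
#include <sstream>  // std::istringstream/ostringstream for in-memory testing
#include <utility>
#include <vector>

// One sorted run inside the shared stream, with its own small buffer.
struct BufferedRun {
    std::streamoff next_offset = 0;  // byte offset of the first record not yet buffered
    std::size_t remaining = 0;       // records of this run not yet buffered
    std::vector<std::int32_t> buffer;
    std::size_t pos = 0;             // next unread index in `buffer`
};

// Loads up to `capacity` records of `run` with one seekg + read.
static bool refill(std::istream& in, BufferedRun& run, std::size_t capacity) {
    std::size_t n = std::min(run.remaining, capacity);
    if (n == 0) return false;
    run.buffer.resize(n);
    run.pos = 0;
    in.clear();  // an earlier read may have set eofbit
    in.seekg(run.next_offset);
    if (!in.read(reinterpret_cast<char*>(run.buffer.data()),
                 static_cast<std::streamsize>(n * sizeof(std::int32_t)))) {
        run.remaining = 0;  // treat a failed read as the end of this run
        run.buffer.clear();
        return false;
    }
    run.next_offset += static_cast<std::streamoff>(n * sizeof(std::int32_t));
    run.remaining -= n;
    return true;
}

static bool next_value(std::istream& in, BufferedRun& run, std::size_t capacity,
                       std::int32_t& value) {
    if (run.pos == run.buffer.size() && !refill(in, run, capacity)) return false;
    value = run.buffer[run.pos++];
    return true;
}

// Same min-heap merge as the multi-stream version, but all runs share `in`.
void kway_merge_one_stream(std::istream& in, std::vector<BufferedRun>& runs,
                           std::ostream& out, std::size_t capacity) {
    using Entry = std::pair<std::int32_t, std::size_t>;  // (value, run index)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
    std::int32_t v;
    for (std::size_t i = 0; i < runs.size(); ++i)
        if (next_value(in, runs[i], capacity, v)) heap.emplace(v, i);
    while (!heap.empty()) {
        Entry top = heap.top();
        heap.pop();
        out.write(reinterpret_cast<const char*>(&top.first), sizeof top.first);
        if (next_value(in, runs[top.second], capacity, v)) heap.emplace(v, top.second);
    }
}
```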
It might be worth mmapping the file and using multiple pointers, if the file is small enough (equivalently: your address space is large enough).