How to optimize input/output in C++

How to optimize input/output in C++ - c++

I'm solving a problem which requires very fast input/output. More precisely, the input data file will be up to 15MB. Is there a fast, way to read/print integer values.
Note: I don't know if it helps, but the input file has the following form:
line 1: a number n
line 2..n+1: three numbers a,b,c;
line n+2: a number r
line n+3..n+4+r: four numbers a,b,c,d
Note 2: The input file will be stdin.
Edit: Something like the following isn't fast enough:
void fast_scan(int &n) {
char buffer[10];
gets(buffer);
n=atoi(buffer);
}
void fast_scan_three(int &a,int &b,int &c) {
char buffval[3][20],buffer[60];
gets(buffer);
int n=strlen(buffer);
int buffindex=0, curindex=0;
for(int i=0; i<n; ++i) {
if(!isdigit(buffer[i]) && !isspace(buffer[i]))break;
if(isspace(buffer[i])) {
buffindex++;
curindex=0;
} else {
buffval[buffindex][curindex++]=buffer[i];
}
}
a=atoi(buffval[0]);
b=atoi(buffval[1]);
c=atoi(buffval[2]);
}

General input/output optimization principle is to perform as less I/O operations as possible reading/writing as much data as possible.
So performance-aware solution typically looks like this:
Read all data from device into some buffer (using the principle mentioned above)
Process the data generating resulting data to some buffer (on place or another one)
Output results from buffer to device (using the principle mentioned above)
E.g. you could use std::basic_istream::read to input data by big chunks instead of doing it line by line. The similar idea with output - generate single string as result adding line feed symbols manually and output it at once.

If you want to minimize the physical I/O operation overhead, load the whole file into memory by a technique called memory mapped files. I doubt you'll get a noticable performance gain though. Parsing will most likely be a lot costlier.

Consider using threads. Threading is useful for lots of things, but this is exactly the kind of problem that motivated the invention of threads.
The underlying idea is to separate the input, processing, and output, so these different operations can run in parallel. Do it right and you will see a significant speedup.
Have one thread doing close to pure input. It reads lines into a buffer of lines. Have a second thread do a quick pre-parse and organize the raw input into blocks. You have two things that need to be parsed, the line that contain the number of lines that contain triples and the line that contains the number of lines that contain quads. This thread forms the raw input into blocks that are still mostly text. A third thread parses the triples and quads, re-forming the the input into fully parsed structures. Since the data are now organized into independent blocks, you can have multiple instances of this third operation so as to better take advantage of the multiple processors on your computer. Finally, other threads will operate on these fully-parsed structures. Note: It might be better to combine some of these operations, for example combining the input and pre-parsing operations into one thread.

Put several input lines in a buffer, split them, and then parse them simultaneously in different threads.

It's only 15MB. I would just slurp the whole thing into a memory buffer, then parse it.
The parsing looks something like this, approximately:
#define DIGIT(c)((c) >= '0' && (c) <= '9')
while(*p == ' ') p++;
if (DIGIT(*p)){
a = 0;
while(DIGIT(*p){
a *= 10; a += (*p++ - '0');
}
}
// and so on...
You should be able to write this kind of code in your sleep.
I don't know if that's any faster than atoi, but it's not fussing around figuring out where numbers begin and end. I would stay away from scanf because it goes through nine yards of figuring out its format string.
If you run this whole thing in a loop 1000 times and grab some stack samples, you should see that it is spending nearly 100% of its time reading in the file and generating output (which you didn't mention).
You just can't beat that.
If you do see that noticeable time is spent in the actual parsing, it might be possible to do overlapped I/O, but the machine would have to be really slow, or the I/O really fast (like from a solid state drive) before that would make sense.

Related

c++ read small portions of big number of files

I have a relatively simple question to ask, there has been an ongoing discussion regarding many programming languages about which method provides the fastest file read. Mostly debated on read() or mmap(). As a person who also participated in these debates, I failed to find an answer to my current problem, because most answers help in the situation where the file to read is huge (example, how to read a 10 TB text file...).
But my problem is a bit different, I have lots of files, lets say a 100 million. I want to read the first 1-2 lines from these files. Whether the file is 10 kb or 100 TB is irrelevant. I just want the first one or two lines from every file. So I want to avoid reading or buffering the unnecessary parts of the files. My knowledge was not enough to thoroughly test which method is faster, or to discover what are all my options in the first place.
What I am doing right know: (I am doing this multithreaded for the moment)
for(const auto& p: std::filesystem::recursive_directory_iterator(path)) {
if (!std::filesystem::is_directory(p)) {
std::ifstream read_file(p.path().string());
if (read_file.is_open()) {
while (getline(read_file, line)) {
// Get two lines here.
}
}
}
}
What does C++, or the linux environment provide me in this situation ? Is there a faster or more efficient way to read small portions of millions of files ?
Thank you for your time.
Info: I have access to C++20 and Ubuntu 18.04

You can save one underlying call to fstat by not testing if the path is a directory, and then rely on is_open test
#include <iostream>
#include <fstream>
#include <filesystem>
#include <string>
int main()
{
std::string line,path=".";
for(const auto& p: std::filesystem::recursive_directory_iterator(path)) {
{
std::ifstream read_file(p.path().string());
if (read_file.is_open()) {
std::cout << "opened: " << p.path().string() << '\n';
while (getline(read_file, line)) {
// Get two lines here.
}
}
}
}
}
At least on Windows this code skips the directories. And as suggested in comments is_open test can even be skipped since getline doesn't read anything from a directory either.
Not the cleanest, but if it can save time it's worth it.

Any function in a program accessing a file under Linux will result in calling some "system calls" (for example read()).
All other available functions in some programming language (like fread(), fgets(), std::filesystem ...) call functions or methods which in turn call some system calls.
For this reason you can't be faster than calling the system calls directly.
I'm not 100% sure, but I think in most cases, the combination open(), read(), close() will be the fastest method for reading data from the start of a file.
(If the data is not located at the start of the file, pread() might be faster than read(); I'm not sure.)
Note that read() does not read a certain number of lines but a certain number of bytes (e.g. into an array of char), so you have to find the end(s) of the line(s) "manually" by searching the '\n' character(s) and/or the end of the file in the array of char.
Unfortunately, a line may be much longer than you expect, so reading the first N bytes from the file does not contain the first M lines and you have to call read() again.
In this case it depends on your system (e.g. file system or even hard disks) how many bytes you should read in each call to read() to get the maximum performance.
Example: Let's say in 75% of all files, the first N lines are found in the first 512 bytes of the file; in the other 25% of all files, first N lines are longer than 512 bytes in sum.
On some computers, reading 1024 bytes at once might require nearly the same time as reading 512 bytes, but reading 512 bytes twice will be much slower than reading 1024 bytes at once; on such computers it makes sense to read() 1024 bytes at once: You save a lot of time for 25% of the files and you lose only very little time for the other 75%.
On other computers, reading 512 bytes is significantly faster than reading 1024 bytes; on such computers it would be better to read() 512 bytes: Reading 1024 bytes would save you only little time when processing the 25% of files but cost you very much time when processing the other 75%.
I think in the most cases this "optimal value" will be a multiple of 512 bytes because most modern file systems organize files in units that have a multiple of 512 bytes.

I was just typing something similar to Martin Rosenau answer (when his popped up): unstructured read of the max length of two lines. But I would go further: queue that text buffer with corresponding file name and let another thread parse / analyze that. If parsing takes about the same time as reading, you can save half of the time. If it takes longer (unlikely) - you can use multiple threads and save even more.
Side note - you should not parallelize reading (tried that).
It may be worth experimenting: can you open one file, read it asynchronously while proceeding to open the next one? I don't know if any OS can overlap those things.

Efficient way to read file multiple line one time?

I am now trying to handle a large file (several GB), so I am thinking to use multi-thread. The file is multiple lines of data like:
data1 attr1.1 attr1.2 attr1.3
data2 attr2.1 attr2.2 attr2.3
data3 attr3.1 attr3.2 attr3.3
I am thinking to use one thread read multiple lines first to a buffer1, and then one other thread to handle the data in buffer1 line by line, while the reading thread start to read file to buffer2. Then the handling thread continues when buffer2 is ready, and the reading thread read to buffer1 again.
Now I finished the handler part by using freads for small file (several KB), but I am not sure how to make the buffer contains the complete line instead of splitting part of line at end of the buffer, which is like this:
data1 attr1.1 attr1.2 attr1.3
data2 attr2.1 att
Also, I find that the fgets or ifstream getline can read file line by line, but would it be very costly since it has many IOs?
Now I am struggling to figure out what it the best way to do that? Is there any efficient way to read multiple lines at one time? Any advice is appreciated.

C stdio and C++ iostream functions use buffered I/O. Small reads only have function-call and locking overhead, not read(2) system call overhead.
Without knowing the line length ahead of time, fgets has to either use a buffer or read one byte at a time. Luckily, the C/C++ I/O semantics allow it to use buffering, so every mainstream implementation does. (According to the docs, mixing stdio and I/O on the underlying file descriptors gives undefined results. This is what allows buffering.)
You're right that it would be a problem if every fgets required a system call.
You might find it useful for one thread to read lines and put the lines into some kind of data structure that's useful for the processing thread.
If you don't have to do much processing on each line, doing the I/O in the same thread as the processing will keep everything in the L1 cache of that CPU, though. Otherwise data will end up in L1 of the I/O thread, and then have to make it to L1 of the core running the processing thread.
Depending on what you want to do with your data, you can minimize copying by memory-mapping the file in-place. Or read with fread, or skip the stdio layer entirely and just use POSIX open / read, if you don't need your code to be as portable. Scanning a buffer for newlines migh have less overhead than what the stdio functions do.
You can handle the leftover line at the end of the buffer by copying it to the front of the buffer, and calling the next fread with a reduced buffer size. (Or, make your buffer ~1k bigger than the size of your fread calls, so you can always read multiples of the memory and filesystem page size (typically 4kiB), unless the trailing part of the line is > 1k.)
Or use a circular buffer, but reading from a circular buffer means checking for wraparound every time you touch it.

It all depends what you want to do as processing afterwards : do you need to keep a copy of the lines ? Do you intend to process input as std::strings ? etc...
Here some general remarks that could help you further:
istream::getline() and fgets() are buffered operations. So I/O is already reduced and you could assume the performance is already correct.
std::getline() is also buffered. Nevertheless, if you don't need to process std::strings the function would cost you a considerable number of memory allocation/deallocation, which might impact performance
Bloc operations like read() or fread() can achieve economies of scale if you can afford large buffers. This can be especially efficient, if you use the data in a throw-away fashion (because you can avoid copying the data and work directly in the buffer), but at the cost of an extra complexity.
But all these considerations shall not forget that the erformance is very much affected by the library implementation that you use.
I've done a little informal benchmark reading a milion of lines in the format you've shown:
* With MSVC2015 on my PC the read() is twice as fast as fgets(), and almost 4 times faster than std::string.
* With GCC on CodingGround, compiling with O3, fgets(), and both getline() are approximately the same, and the read() is slower.
Here the full code if you want to play around.
Here the the code that show you how to move the buffer arround.
int nr=0; // number of bytes read
bool last=false; // last (incomplete) read
while (!last)
{
// here nr conains the number of bytes kept from incomplete line
last = !ifs.read(buffer+nr, szb-nr);
nr = nr+ifs.gcount();
char *s, *p = buffer, *pe = p + nr;
do { // process complete lines in buffer
for (s = p; p != pe && *p != '\n'; p++)
;
if (p != pe || (p == pe && last)) {
if (p != pe)
*p++ = '\0';
lines++; // TO DO: here s is a null terminated line to process
sln += strlen(s); // (dummy operatio for the example)
}
} while (p != pe); // until eand of buffer is reached
std::copy(s, pe, buffer); // copy last (incoplete) line to begin of buffer
nr = pe - s; // and prepare the info for the next iteration
}

Logic behind Fast input

Can somebody explain the logic behind this fast input?I know its faster than scanf.
int scan()
{
int ip=getchar_unlocked(),ret=0;
for(;ip<'0'||ip>'9';ip=getchar_unlocked());
for(;ip>='0'&&ip<='9';ip=getchar_unlocked())
ret=ret*10+ip-'0';
return ret;
}

The unlocked part here is to avoid locking the input file (thus potentially causing problems if multiple threads are reading from the same input).
This is probably where 90% of the gains are, compared to other using getchar, and that in turn is probably only marginally better than scanf. Obviously, scanf also has overhead in parsing the format string, which may be a bit of an overhead.
The rest of the code is simply "skip anything that isn't a digit", then read a decimal number into ret, stopping when the digit is a non-digit.
For reading vast number of inputs, I would suggest using fread (or mmap or MapViewoOfFile if the system is known to support one of those calls) to load up a large amount of input data in a buffer, and then use a pointer-based method to "skip over non-digits" (assuming this is a "safe" thing to do). Highly likely that this is faster again than the code above.

Proper, efficient file reading

I'd like to read and process (e.g. print) entries from the first row of a CSV file one at a time. I assume Unix-style \n newlines, that no entry is longer than 255 chars and (for now) that there's a newline before EOF. This is meant to be a more efficient alternative to fgets() followed by strtok().
#include <stdio.h>
#include <string.h>
int main() {
int i;
char ch, buf[256];
FILE *fp = fopen("test.csv", "r");
for (;;) {
for (i = 0; ; i++) {
ch = fgetc(fp);
if (ch == ',') {
buf[i] = '\0';
puts(buf);
break;
} else if (ch == '\n') {
buf[i] = '\0';
puts(buf);
fclose(fp);
return 0;
} else buf[i] = ch;
}
}
}
Is this method as efficient and correct as possible?
What is the best way to test for EOF and file reading errors using this method? (Possibilities: testing against the character macro EOF, feof(), ferror(), etc.).
Can I perform the same task using C++ file I/O with no loss of efficiency?

What is most efficient is going to depend a lot on the operating system, standard libraries (e.g. libc), and even the hardware you are running on. This makes it nearly impossible to tell you what's "most efficient".
That having been said, there are a few things you could try:
Use mmap() or a local operating system equivalent (Windows has CreateFileMapping / OpenFileMapping / MapViewOfFile, and probably others). Then you don't do explicit file reads: you simply access the file as if it were already in memory, and anything that isn't there will be faulted in by the page fault mechanism.
Read the entire file into a buffer manually, then work on that buffer. The fewer times you call into file read functions, the fewer function-call overheads you take, and likely also fewer application/OS domain switches. Obviously this uses more memory, but may very well be worth it.
Use a more optimal string scanner for your problem and platform. Going character-by-character yourself is almost never as fast as relying on something existing that's close to your problem domain. For example, you can bet that strchr and memchr are probably better-optimized than most code you can roll yourself, doing things like reading entire cachelines or words at once, scanning using better algorithms for this kind of search, etc. For more complicated cases, you might consider a full regular expression engine that could compile your regex to something fast for your complicated case.
Avoid copying your string around. It may be helpful to think in terms of "find delimiters" and then "output between delimiters". You could for example use strchr to find the next character of interest, and then fwrite or something to write to stdout directly from your input buffer. Then you're keeping most of your work in a few local registers, rather than using a stack or heap buf.
When in doubt, though, try a few possibilities and profile, profile, profile.
Also for this kind of problem, be very aware of differences between runs that are caused by OS and hardware caches: profile a bunch of runs rather than just one after each change -- and if possible, use tests that will either likely always hit caches (if you're trying to measure best-case performance) or tests that will likely miss (if you're trying to measure worst-case performance).
Regarding C++ file IO (fstream and such), just be aware that they're larger, more complicated beasts. They tend to include things such as locale management, automatic buffering, and the like -- as well as being less prone to particular types of coding mistakes.
If you're doing something pretty simple (like what you describe here), I tend to find C++ library stuff gets in the way. (Use a debugger and "step instruction" through a stringstream method versus some C string functions some time, you'll get a good feel for this quickly.)
It all depends on whether you're going to want or need that additional functionality or safety in the future.
Finally, the obligatory "don't sweat the small stuff". Only spend time on optimizing here if it's really important. Otherwise trust the libraries and OS to do the right thing for you most of the time -- if you get too far into micro-optimizations you'll find you're shooting yourself in the foot later. This is not to discourage you from thinking in terms of "should I read the whole file in ahead of time, will that break future use cases" -- because that's macro, rather than micro.
But generally speaking if you're not doing this kind of "make it faster" investigation for a good reason -- i.e. "need this app to perform better now that I've written it, and this code shows up as slow in profiler", or "doing this for fun so I can better understand the system" -- well, spend your time elsewhere first. =)

One method, provided you are going to scan through the file serially, is to use 2 buffers of a decent enough size (16K is the optimal size for SSDs and 4K for HDDs IIRC. But 16K should suffice for both). You start off by performing an asynchronous load (In windows look up Overlapped I/O and on Unix/OSX use O_NONBLOCK) of the first 16K into buffer 0 and then start another load into buffer 1 of bytes 16K to 32K. When your read position hits 16K, swap the buffers (so you are now reading from buffer 1 instead) wait for any further loads to complete into buffer 1 and perform an asynchronous load of bytes 32K to 48K into buffer 0 and so on. This way, you have far less chance of ever having to wait for a load to complete as it should be happening while you are processing the previous 16K.
I moved over to a scheme like this in my XML parser having been using fopen and fgetc previously and the speedup was huge. Loading in a 15 meg XML file and processing it reduced from minutes to seconds. Of course, Your milage may vary.

use fgets to read one line at a time. C++ file I/O are basically wrapper code with some compiler optimization tucked inside ( and many unwanted functionality ). Unless you are reading millions of lines of code and measuring time, it does not matter.

More efficient way than scanf and printf for scaning and printing integers

I just want to know that how can we read integers(from stdin) and write integers(on stdout) without using scanf/printf/cin/cout because they are too slow.
Can anyone tell me how fread and fwrite can be used for this purpose? I know only a little about buffering etc.
I want the substitute for following two codes->
[1]
long i , a[1000000];
for(i=0;i<=999999;i++)
scanf("%ld",&a[i]);
and
[2]
for(i=0;i<=999999;i++)
printf("%ld\n",a[i]);
Any other efficient method is appreciated. Thanks in advance!!

I suggest you're looking in the wrong place for speed improvements. Let's look at just printf( ):
printf( )is limited by the huge time it takes to physically (electronically?) put characters on the terminal. You could speed this up a lot by using sprintf( ) to first write the output chars into an array; and then using printf( ) to send the array to the tty. printf( ) buffers lines anyway, but using a large, multi-line output array can overcome the setup delay that happens for every line.
printf( ) formatting is a tiny part of its overhead. You can be sure that the folks who wrote this library function did their best to make it as fast as it could be. And, over the forty years that printf( ) has been around, many others have worked it over and rewritten it a few zillion times to speed it up. No matter how hard you work to do the formatting that printf( ) takes care of, it's unlikely you can improve very much on their efforts.
scanf( ) delays and overhead are analogous.
Yours is a noble effort, put I don't believe it can pay off.

boost::karma (part of boost::spirit) promises quite good generator performance. You might try it.

If it is crucial that the numbers are read fast, and written fast, but not necessarily important that the output is presented to the user fast, or read from the file fast. You might consider making a buffer between the input/output streams and the processing.
Read the file into the buffer, it can be done in a separate thread, and then extract the number from this buffer instead. And for generating the output write to a memory buffer first, and then write the buffer to a file afterwards, again this can be done in a separate thread.
Most IO routines are relatively slow, since accessing information on the disk is slow (slower than the memory or cache). Of course this only makes sense if it is not about optimising the entire output/input phase. In which case there is no point in going through a separate (own implement) buffer.
By separating the parsing of the numbers and the IO part, you will increase the perceived speed of the parsing tremendously.
If you are looking for alternatives faster than scanf / printf you might consider implementing your own method that is not depending on a format string. Specialised implementations are often faster than the generalised ones. However do consider it twice before you start reinventing the wheel.

printf is an operation on a FILE * and it buffers, puts operates on a FD and does not. Build output in buffer and then use puts. printf also has to parse the format string and takes variable type args; if you know the value is an integer, you can avoid all that by doing some math and add value of each digit to '0' and formatting the number yourself.

We Keep Coding

c++ django amazon-web-services regex python-2.7 google-cloud-platform list unit-testing opengl ember.js