C++ Input Performance

I was trying to solve a problem on InterviewStreet. After some time I determined that I was actually spending the bulk of my time reading the input. This particular question had a lot of input, so that makes some amount of sense. What doesn't make sense is why the various methods of input performed so differently:
Initially I had:
std::string command;
std::cin >> command;
Replacing it with the following made it noticeably faster:
char command[5];
cin.ignore();
cin.read(command, 5);
Rewriting everything to use scanf made it even faster:
char command;
scanf("get_%c", &command);
All told, I cut the time spent reading the input by about a third.
I'm wondering why there is such a variation in performance between these different methods. Additionally, I'm wondering why using gprof didn't highlight the time I was spending in I/O, instead seeming to point the blame at my algorithm.

There is a big variation in these routines because console input speed almost never matters.
And where it does (the Unix shell), the code is written in C, reads directly from the stdin device, and is efficient.

At the risk of being downvoted, I/O streams are, in general, slower and bulkier than their C counterparts. That's not a reason to avoid using them though in many purposes as they are safer (ever run into a scanf or printf bug? Not very pleasant) and more general (ex: overloaded insertion operator allowing you to output user-defined types). But I'd also say that's not a reason to use them dogmatically in very performance-critical code.
I do find the results a bit surprising though. Out of the three you listed, I would have suspected this to be fastest:
char command[5];
cin.ignore();
cin.read(command, 5);
Reason: no memory allocations are needed, just a straightforward read into a character buffer. That's also true of your C example below, but calling scanf repeatedly to read a single character isn't anywhere close to optimal, even at the conceptual level, since scanf must parse the format string you pass in on every call. I'd be interested in the details of your I/O code, as there's a reasonable chance something is off if scanf calls reading a single character turn out to be the fastest. I have to ask, without meaning to offend: is the code truly compiled and linked with optimizations on?
Now as to your first example:
std::string command;
std::cin >> command;
We can expect this to be quite a bit slower than optimal, because you're working with a variable-sized container (std::string) which may have to perform heap allocations to read in the desired buffer. When it comes to stack vs. heap, the stack is significantly faster, so if you can anticipate the maximum buffer size needed in a particular case, a simple character buffer on the stack will beat std::string for input (even if you use reserve). The same is true of an array on the stack as opposed to std::vector, but these containers are best used for cases where you cannot anticipate the size in advance. Where std::string can be faster is in cases where people would otherwise be tempted to call strlen repeatedly, when storing and maintaining a size variable would be better.
As to the details of gprof, it should be highlighting those issues. Are you looking at the full call graph as opposed to a flat profile? Naturally the flat profile could be misleading in this case. I'd have to know some further details on how you are using gprof to give a better answer.

gprof only samples during CPU time, not during blocked time.
So, a program may spend an hour doing I/O, and a microsecond doing computation, and gprof will only see the microsecond.
For some reason, this isn't well known.
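If you suspect this is happening, a simple wall-clock measurement around the I/O-heavy section will catch blocked time that gprof misses. A minimal sketch using std::chrono (the callable you pass in stands for whatever section you want timed):

```cpp
#include <chrono>

// Time an arbitrary callable with a wall clock, which, unlike gprof's
// CPU-time sampling, also counts time spent blocked on I/O.
template <typename F>
long long wall_time_ms(F&& work) {
    auto start = std::chrono::steady_clock::now();
    work();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}
```

Wrapping the input-reading loop in wall_time_ms and comparing against the total runtime gives a quick answer without a profiler.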

By default, the standard iostreams are configured to work together and with the C stdio library; in practice, this means that using cin and cout for anything other than interactive input and output tends to be slow.
To get good performance using cin and cout, you need to disable the synchronization with stdio. For high performance input, you might even want to untie the streams.
See the following stackoverflow question for more details.
How to get IOStream to perform better?
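In practice that means something like the following, called once before any I/O is performed (a standard sketch; note that after this you must not mix cin/cout with the C stdio functions such as scanf/printf):

```cpp
#include <iostream>

// Call once, before any input or output, to speed up cin/cout.
void fast_io() {
    std::ios_base::sync_with_stdio(false); // drop synchronization with C stdio
    std::cin.tie(nullptr);                 // don't flush cout before each cin read
}
```

The untie step matters for interleaved read/write loops, where the implicit flush of cout before every cin operation would otherwise dominate.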

Related

Fortran Print Statements Performance Effects

I just inherited some old Fortran code that has print statements everywhere (when it runs, the matrix streams by). I know these print statements are useless because I cannot tell what the program is printing as it is going by so fast. But is there a significant performance impact to having a lot of print statements in a Fortran program (i.e. does an overly verbose program take longer to execute)? It seems like it would as it is another line to execute, but I don't know if it is significant.
In general, yes, I/O is "relatively costly" to execute since you have to do things like formatting numbers - especially floating point numbers, even if those procedures are highly optimized. However, one of the biggest costs (the system call to actually perform the I/O after the buffer to write has been prepared) is amortized in good compilers/runtimes since the I/O statements are usually buffered by default. This helps cut down the number of system calls significantly, thus reducing delays caused by frequent context switching between your app and the OS.
That said, if you are worried about the performance hit, why don't you try to comment every PRINT or WRITE statement and see how the runtime changes? Or even better, profile your application and see the amount of time spent on I/O and related routines.

Fastest output to file in c and c++

I was helping someone with a question about output in C, and I was unable to answer this seemingly simple question, whose answer I wanted to use in my own answer:
What's the fastest way to output to a file in C / C++?
I've done a lot of work with prime number generation and mathematical algorithm optimization, using C++ and Java, and this was sometimes the biggest holdup for me - I sometimes need to move a lot of data to a file, and fast.
Forgive me if this has been answered, but I've been looking on google and SO for some time to no avail.
I'm not expecting someone to do the work of benchmarking - but there are several ways to write to a file, and I doubt I know them all.
So to summarize,
What ways are there to output to a file in C and C++?
And which of these is/are the faster ones?
Obviously redirecting from the console is terrible.
Any brief comparison of printf, cout, fputc, etc. would help.
Edit:
From the comments,
There's a great baseline test of cout and printf in:
mixing cout and printf for faster output
This is a great start, but not the best answer to what I'm asking.
For example, it doesn't handle std::ostreambuf_iterator<>, mentioned in the comments, if that's a possibility. Nor does it cover fputc, or say how console redirection compares (not that it needs to).
Edit 2:
Also, for the sake of arguing my historical case, you can assume a near-infinite amount of data being output (programs literally running for days on a newer Intel i7, producing gigabytes of text).
Temporary storage is only so helpful here; as far as I'm aware, you can't easily buffer gigabytes of data.
Functions such as fwrite, fprintf, etc. are in fact doing a write syscall under the hood. The only difference from write is that these functions use a buffer to reduce the number of syscalls.
So, if I needed to choose between fwrite, fprintf, and write, I would avoid fprintf because it's a nice but complicated function that does a lot of things. If I really needed something fast, I would reimplement the formatting part myself, down to the bare minimum required. And between fwrite and write, I would pick fwrite if I needed to write a lot of small pieces of data; otherwise write could be faster because it doesn't require the whole buffering system.
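To make the distinction concrete, here is a sketch under those assumptions: many small records go through fwrite so stdio can batch them into few syscalls, while one big block can go out in a single call, where the extra buffering layer buys nothing:

```cpp
#include <cstddef>
#include <cstdio>

// Many small writes: let stdio's internal buffer batch them,
// so a syscall happens only when the buffer fills.
void write_small_records(std::FILE* f, const char* rec, std::size_t len, int count) {
    for (int i = 0; i < count; ++i)
        std::fwrite(rec, 1, len, f);
}

// One large block: a single fwrite (or a raw write() on POSIX)
// passes it through in essentially one go.
void write_one_block(std::FILE* f, const char* data, std::size_t len) {
    std::fwrite(data, 1, len, f);
}
```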
As far as I'm aware, the biggest bottleneck would be to write a character at a time (for example, using fputc). This is compared to building up a buffer in memory and dumping the whole lot (using fwrite). Experience has shown me that using fputc and writing individual characters is considerably slower.
This is probably because of hardware factors, rather than any one function being faster.
The bottleneck in performance of output is formatting the characters.
In embedded systems, I improved performance by formatting text into a buffer (an array of characters), then sending the entire buffer to output using block write commands such as cout.write or fwrite. These functions bypass formatting and pass the data almost straight through.
You may encounter buffering by the OS along the way.
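A sketch of that pattern, with snprintf doing the formatting into a caller-supplied buffer (truncation handling is omitted for brevity):

```cpp
#include <cstddef>
#include <cstdio>

// Format a batch of values into one buffer, then emit the whole thing
// with a single block write instead of one formatted call per value.
std::size_t format_batch(char* buf, std::size_t cap, const int* vals, int count) {
    std::size_t used = 0;
    for (int i = 0; i < count && used < cap; ++i) {
        int n = std::snprintf(buf + used, cap - used, "%d\n", vals[i]);
        if (n < 0) break;                      // encoding error: stop formatting
        used += static_cast<std::size_t>(n);   // note: truncation not handled here
    }
    return used;                               // characters placed in buf
}
```

The whole batch then goes out in one call, e.g. std::fwrite(buf, 1, used, stdout).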
The bottleneck isn't due to the process of formatting the characters, but the multiple calls to the function.
If the text is constant, don't call the formatted output functions, write it direct:
static const char Message[] = "Hello there\n";
cout.write(&Message[0], sizeof(Message) - 1); // -1 because the '\0' doesn't need to be written
cout can be slightly faster than printf because operator<< is overloaded per type, so the code for the type being printed is resolved at compile time rather than by parsing a format string at runtime, although the difference in speed is negligible. I think your real bottleneck isn't the call the language is making, but your hard drive's write rate. If you really want to go all the way with this, you could create a multi-threaded or networked solution that stores the data in a buffer, and then slowly writes it out to the hard drive, separate from the processing of the data.

How negligible is the time for ofstream

I have a C++ program where I do various experiments and during these experiments I output some values to a file using ofstream. The structure is basically:
Start timer
output to a file using ofstream (the output is, at most, a few words)
do some experimental work
Stop timer
My question, which is a bit broad, is: can I ignore the time that the ofstream takes, or is it not negligible? Or, I guess, it depends?
First of all, from your pseudo code, you could just start the timer after the file output :-) But I'm guessing it's not like that in the real app.
Beyond that, it's obviously a matter of "it depends". If you aren't outputting all that much, and the code you're interested in runs for minutes, then the output obviously won't make much of a difference. If, on the other hand, you are trying to catch runtimes measured in microseconds, you'll probably be mostly measuring the ofstream.
You could try doing various magics, like running the actual output on a thread, or just adding your messages to a previously allocated char array and outputting that at the end. However, everything incurs some runtime penalty; nothing is ever free.
Since you're not interested in measuring the actual output time, you could compile a version without the output to do measurements, and a version with the output to debug the code. EDIT: or make that a runtime option. Nothing is ever free, but an "if (OutputEnabled)" is pretty close to "free" :-)
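A minimal sketch of that runtime switch (the flag and function names are made up for illustration):

```cpp
#include <iostream>
#include <string>

bool OutputEnabled = true; // set once, e.g. from a command-line flag

// Returns true if the message was actually written. When the flag is off,
// both the formatting work and the stream call are skipped entirely,
// costing only a predictable branch.
bool log_line(std::ostream& out, const std::string& msg) {
    if (!OutputEnabled)
        return false;
    out << msg << '\n';
    return true;
}
```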
It mostly depends on what the ofstream does... as long as it just stores data in its internal buffer, it will be fast; but once its buffer fills and it actually calls the OS API to perform the write, the time spent can be much larger.
But obviously everything depends on how long does the "experimental work" take in comparison to the IO you perform, both in the case where it just writes the data to the internal buffer and when the stream is flushed; as suggested in the comment, you should time the two things independently, to see how one time compares to the other.
Something is negligible compared to something else. You compare to nothing else in your question.
I had already asked a question here to check the validity of my statement below, and the conclusion was that I should not keep such a classification, though it still gives a very rough draft evaluation (that may be false in some cases):
Stack operations are ~10x faster than heap allocations, which are ~10x faster than graphical device operations, which are ~10x faster than I/O operations (writing to a file on a hard drive, for example), which are ~10x faster than network communication operations...
It is only a rough estimate. Everything has to be re-evaluated each time you code.
If the time spent writing to the ofstream doesn't impact the whole mechanism, then it can be considered negligible.
If it does impact your whole program's mechanism, then it cannot be considered negligible. Obviously.

Is there a way to read a string faster than getchar() (C/C++)?

I am participating in some programming competitions, and on many problems there's the need to read strings from an input file. Obviously performance is a big issue on those competitions, and strings might be huge, so I am trying to understand the most efficient way to read those strings.
My guess is that reading the strings char by char, with getchar(), is the fastest you can go. That's because even if you use other functions, say fgets() or getline(), those functions will still need to read every char anyway.
Update: I know that I/O won't be a bottleneck on most algorithmic problems. That being said I would still very much like to know what's the fastest way you can use to read strings, should this become an issue on any future problem.
You can use the std::istream::read() function to read a chunk of unformatted data. It is comparatively fast precisely because the data is unformatted: all overloads of operator>> read formatted data, which makes reading from a stream slower than read().
Similarly, you can use std::ostream::write() function to write a chunk of data to output stream at once.
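A sketch of what chunked unformatted input might look like, using read() and gcount() to handle the short final chunk (the default chunk size is an arbitrary assumption; an istringstream stands in for a file in the usage example):

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Pull an entire stream into a string in fixed-size unformatted chunks,
// avoiding the per-token parsing that operator>> performs.
std::string read_all(std::istream& in, std::size_t chunk = 1 << 16) {
    std::string out;
    std::string buf(chunk, '\0');
    for (;;) {
        in.read(&buf[0], static_cast<std::streamsize>(chunk)); // unformatted read
        out.append(buf.data(), static_cast<std::size_t>(in.gcount()));
        if (!in) break; // short read: end of stream (or error)
    }
    return out;
}
```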
The reverse is true: reading larger chunks of data into memory in one go is far faster than reading one character at a time. The OS and/or hard drive will likely cache the data in any case, but the function-call overhead alone of repeatedly cycling through the standard library, OS, file system, and device driver for each character is significant for large data sets.
When handling strings there are some more important performance issues you might consider: Back to Basics by Joel Spolsky
Either way, the most convincing way to answer the question for yourself is to write test code that investigates the difference between different I/O methods.

Will reading a file be faster with a FILE* or an std::ifstream?

I was thinking about this when I ran into a problem using std::ofstream.
My thinking was that since std::ifstream is a stream, it wouldn't support random access; rather, it would just start at the beginning and stream by until you get to the part you want. Is this just quick enough that we don't notice?
And I'm pretty sure FILE* supports random access so this would be fast as well?
ifstream supports random access with seekg. FILE* might be faster but you should measure it.
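For reference, random access on a stream might look like this sketch (an istringstream stands in for an ifstream in the usage; both expose the same seekg interface):

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Jump to an absolute offset and read a fixed-size field from there,
// much like fseek + fread would on a FILE*.
std::string read_at(std::istream& in, std::streamoff pos, std::size_t len) {
    in.clear();                     // reset eof/fail bits from earlier reads
    in.seekg(pos, std::ios::beg);   // absolute seek: no streaming past data
    std::string buf(len, '\0');
    in.read(&buf[0], static_cast<std::streamsize>(len));
    buf.resize(static_cast<std::size_t>(in.gcount())); // trim near end of stream
    return buf;
}
```

For example, read_at(file, 2, 3) pulls three bytes starting at offset 2.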
Since both of them imply system calls, and those are going to be some orders of magnitude more time-consuming than the rest of the operation, the performance of the two should be very similar.
Let's assume that FILE* was faster. Now can you give me just one good reason why std::ifstream shouldn't be implemented in terms of that? Which means that performance becomes similar.
I'll leave the opposite case (if std::ifstream was faster) as an exercise to the reader (hint, the same is the case there).
Before worrying about performance, there is one rule of thumb you should always keep in mind:
The people who wrote your language's standard library have at least 4 working brain cells. They are not stupid.
This implies that if feature X can be trivially implemented in terms of feature Y, then X will not be noticeably slower than Y.
If you run speed comparisons on standard input or output, remember to call std::ios_base::sync_with_stdio(false) before. When this setting is true, then all operations are done so that reads from std::cin pull data from the same buffers as fgets(stdin). Setting it to false gives iostreams more freedom and less bookkeeping.
Remember that random I/O will defeat any caching done in the underlying APIs. And it is NOT going to be faster to read until you reach a particular location than to seek, regardless of which mechanism you use (assuming that your files are of any significant size).
I'm with stribika here: Measure then make a decision.
std::ifstream is a wrapper around FILE, so the former is in no way faster than the latter. How much they differ depends on the compiler and on whether it can inline the wrapper's function calls and perform other optimizations. Besides, reading formatted data from C++ streams is slower because they work with locales and the like.
However, if you do a lot of random access, that will be the bottleneck, as the other answers state. In any case, the best thing is to use a profiler and measure your app's performance.