Is using scanf() in C++ programs faster than using cin?

I don't know if this is true, but when I was reading the FAQ on one of the problem-providing sites, I found something that caught my attention:
Check your input/output methods. In C++, using cin and cout is too slow. Use these, and you will guarantee not being able to solve any problem with a decent amount of input or output. Use printf and scanf instead.
Can someone please clarify this? Is using scanf() in C++ programs really faster than using cin >> something? If yes, is it good practice to use it in C++ programs? I thought scanf was C-specific, though I am just learning C++...

Here's a quick test of a simple case: a program to read a list of numbers from standard input and XOR all of the numbers.
iostream version:
#include <iostream>

int main(int argc, char **argv) {
    int parity = 0;
    int x;
    while (std::cin >> x)
        parity ^= x;
    std::cout << parity << std::endl;
    return 0;
}
scanf version:
#include <stdio.h>

int main(int argc, char **argv) {
    int parity = 0;
    int x;
    while (1 == scanf("%d", &x))
        parity ^= x;
    printf("%d\n", parity);
    return 0;
}
Results
Using a third program, I generated a text file containing 33,280,276 random numbers. The execution times are:
iostream version: 24.3 seconds
scanf version: 6.4 seconds
Changing the compiler's optimization settings didn't seem to change the results much at all.
Thus: there really is a speed difference.
EDIT: User clyfish points out below that the speed difference is largely due to the iostream I/O functions maintaining synchronization with the C I/O functions. We can turn this off with a call to std::ios::sync_with_stdio(false);:
#include <iostream>

int main(int argc, char **argv) {
    int parity = 0;
    int x;
    std::ios::sync_with_stdio(false);
    while (std::cin >> x)
        parity ^= x;
    std::cout << parity << std::endl;
    return 0;
}
New results:
iostream version: 21.9 seconds
scanf version: 6.8 seconds
iostream with sync_with_stdio(false): 5.5 seconds
C++ iostream wins! It turns out that this internal syncing/flushing is what normally slows down iostream I/O. If we're not mixing stdio and iostream, we can turn it off, and then iostream is the fastest.
The code: https://gist.github.com/3845568

http://www.quora.com/Is-cin-cout-slower-than-scanf-printf/answer/Aditya-Vishwakarma
Performance of cin/cout can be slow because they need to keep themselves in sync with the underlying C library. This is essential if both C I/O and C++ I/O are going to be used.
However, if you are only going to use C++ I/O, then simply use the line below before any I/O operations.
std::ios::sync_with_stdio(false);
For more info on this, look at the corresponding libstdc++ docs.
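As a sketch of the typical setup (std::cin.tie(nullptr) is a common companion that isn't mentioned above; it stops cin from flushing cout before every read):

#include <iostream>

int main() {
    // Call these before doing any I/O:
    std::ios::sync_with_stdio(false); // drop synchronization with C stdio
    std::cin.tie(nullptr);            // don't flush cout before each read from cin

    // From here on, use only C++ I/O; mixing in scanf/printf is no longer safe.
    long long sum = 0;
    int x;
    while (std::cin >> x)
        sum += x;
    std::cout << sum << '\n';
    return 0;
}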

scanf is probably somewhat faster than using streams. Although streams provide a lot of type safety and do not have to parse format strings at runtime, scanf usually has the advantage of not requiring excessive memory allocations (this depends on your compiler and runtime). That said, unless performance is your only end goal and you are on the critical path, you should really favour the safer (slower) methods.
There is a very good article by Herb Sutter, "The String Formatters of Manor Farm", which goes into a lot of detail on the performance of string formatters like sscanf and lexical_cast and what kinds of things make them run slowly or quickly. This is probably analogous to the kinds of things that affect performance between C-style I/O and C++-style I/O. The main differences with the formatters tended to be type safety and the number of memory allocations.
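As a concrete (purely illustrative) sketch of the two styles, with hypothetical function names of our own:

#include <cstdio>
#include <sstream>
#include <string>

// C style: the "%d" format string is parsed at runtime on every call.
int parse_c(const std::string& s) {
    int v = 0;
    std::sscanf(s.c_str(), "%d", &v);
    return v;
}

// C++ style: the overload of >> is chosen at compile time (type safe),
// but constructing the stream may allocate memory.
int parse_cpp(const std::string& s) {
    int v = 0;
    std::istringstream in(s);
    in >> v;
    return v;
}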

I just spent an evening working on a problem on UVa Online (Factovisors, a very interesting problem, check it out):
http://uva.onlinejudge.org/index.php?option=com_onlinejudge&Itemid=8&category=35&page=show_problem&problem=1080
I was getting TLE (time limit exceeded) on my submissions. On these problem solving online judge sites, you have about a 2-3 second time limit to handle potentially thousands of test cases used to evaluate your solution. For computationally intensive problems like this one, every microsecond counts.
I was using the suggested algorithm (which I read about in the site's discussion forums), but was still getting TLEs.
I changed just "cin >> n >> m" to "scanf("%d %d", &n, &m)" and the few tiny "cout"s to "printf"s, and my TLE turned into "Accepted"!
So, yes, it can make a big difference, especially when time limits are short.

If you care about both performance and string formatting, do take a look at Matthew Wilson's FastFormat library.
edit -- link to accu publication on that library: http://accu.org/index.php/journals/1539

In general use, cin and cout seem to be slower than scanf and printf in C++, but they can actually be FASTER!
The thing is: in C++, whenever you use cin and cout, a synchronization process takes place by default to make sure that, if you use both scanf and cin in your program, they work in sync with each other. This sync process takes time. Hence cin and cout APPEAR to be slower.
However, if the synchronization process is disabled, cin is faster than scanf.
To skip the sync process, include the following code snippet in your program right in the beginning of main():
std::ios::sync_with_stdio(false);

There are stdio implementations (libio) that implement FILE* as a C++ streambuf and fprintf as a runtime format parser. IOStreams don't need runtime format parsing; that's all done at compile time. So, with the backends shared, it's reasonable to expect iostreams to be faster at runtime.

Yes, iostream is slower than cstdio.
Yes, you probably shouldn't use cstdio if you're developing in C++.
Having said that, there are even faster ways to get I/O than scanf if you don't care about formatting, type safety, blah, blah, blah...
For instance this is a custom routine to get a number from STDIN:
#include <cstdio>  // getchar_unlocked() is POSIX, not standard C/C++

inline int get_number()
{
    int c;
    int n = 0;
    // Accumulate digits until the first non-digit (or EOF).
    while ((c = getchar_unlocked()) >= '0' && c <= '9')
    {
        // (n << 3) + (n << 1) == 10 * n, avoiding a multiply:
        // n = 10 * n + (c - '0');
        n = (n << 3) + (n << 1) + c - '0';
    }
    return n;
}

The problem is that cin has a lot of overhead involved because it gives you an abstraction layer above the underlying C I/O. You shouldn't use scanf() over cin if you are writing C++ software, because that is what cin is for. If you want performance, you probably wouldn't be writing I/O in C++ anyway.

Of course it's ridiculous to use cstdio over iostream, at least when you are developing software (if you are already using C++ over C, then go all the way and use its benefits instead of only suffering from its disadvantages).
But in an online judge you are not developing software, you are creating a program that should be able to do in 3 seconds things Microsoft software takes 60 seconds to achieve!!!
So, in this case, the golden rule goes like this (of course, assuming you don't get into even more trouble by using Java):
Use C++ and all of its power (and heaviness/slowness) to solve the problem.
If you get time-limited, then change the cins and couts to printfs and scanfs.
(If you get tripped up by the string class, print it like this: printf("%s", mystr.c_str());)
If you still get time-limited, then try to make some obvious optimizations (like avoiding too many nested for/while/do-whiles or recursive functions). Also make sure to pass large objects by reference.
If you still get time-limited, then try changing std::vectors and sets to C arrays.
If you still get time-limited, then go on to the next problem...

#include <stdio.h>
#include <unistd.h>

#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Reads one unsigned integer from stdin.
   Returns 0 on success, 1 if EOF is reached before any digit. */
static int scanuint(unsigned int* x)
{
    int c;  /* int, not char: getchar_unlocked() returns an int, and EOF
               cannot be reliably distinguished from a valid byte in a char */
    *x = 0;
    do
    {
        c = getchar_unlocked();
        if (unlikely(c == EOF)) return 1;
    } while (c < '0' || c > '9');
    do
    {
        /* *x = (*x<<3)+(*x<<1) + c - '0'; */
        *x = 10 * (*x) + c - '0';
        c = getchar_unlocked();
    } while (c >= '0' && c <= '9');  /* EOF simply terminates the number */
    return 0;
}

int main(int argc, char **argv) {
    int parity = 0;
    unsigned int x;
    while (scanuint(&x) == 0) {
        parity ^= x;
    }
    printf("%d\n", parity);
    return 0;
}
The original version of this code had a bug at the end of the file: c was declared char, which cannot reliably hold EOF, and the last number needed a special-cased extra XOR after the loop. The version above fixes this by using int and treating EOF inside a number as a delimiter. Either way, this C code is dramatically faster than the fastest C++ version.
paradox#scorpion 3845568-78602a3f95902f3f3ac63b6beecaa9719e28a6d6 ▶ make test
time ./xor-c < rand.txt
360589110
real 0m11,336s
user 0m11,157s
sys 0m0,179s
time ./xor2-c < rand.txt
360589110
real 0m2,104s
user 0m1,959s
sys 0m0,144s
time ./xor-cpp < rand.txt
360589110
real 0m29,948s
user 0m29,809s
sys 0m0,140s
time ./xor-cpp-noflush < rand.txt
360589110
real 0m7,604s
user 0m7,480s
sys 0m0,123s
The original C++ version took 30 seconds; the C code took 2 seconds.

Even if scanf were faster than cin, it wouldn't matter. The vast majority of the time, you will be reading from the hard drive or the keyboard. Getting the raw data into your application takes orders of magnitude more time than it takes scanf or cin to process it.

Related

Why is this variable returning 32766?

I wrote a very basic evolution algorithm. The way it's supposed to work is that the user types in the desired value and the number of generations to try to reach it. Then the program runs through, taking the value in an array nearest to the goal and mutating it four times (while also keeping the original, in case it's right) to try to get closer to the goal. In theory, it should take roughly |n|/2 generations to reach the value, as mutations happen in either one or two points.
Here's the code to demonstrate what I mean:
#include <iostream>
using namespace std;

int gen [5] = {0, 0, 0, 0, 0}; int goal; int gens; int best; int i = 0; int fit;

int dif(int in) {
    return abs(gen[in] - goal);
}

void nextgen() {
    int fit [5] = {dif(1), dif(2), dif(3), dif(4), dif(5)};
    best = *max_element(fit, fit + 6);
    int gen [5] = {best - 2, best - 1, best, best + 1, best + 2};
}

int main() {
    cout << "Goal: "; cin >> goal; cout << "Gens: "; cin >> gens;
    while(i < gens) {
        nextgen(); cout << "Generation " << i + 1 << ": " << best << "\n";
        i = i + 1;
    }
}
It's pretty simple code. However, it seems that the int best bit of the output returns 32766 every time, no matter what I do. Do you know what I've done wrong?
I've tried outputting the entire generation (which is even worse: a jumbled mess of non-user-friendly data that appears meaningless), I've reworked the code, I've added variables and functions to try to pin down exactly where the error is, and I watched the entire Code Aesthetic YouTube channel to make sure this looked good for you guys.
Looks like you're driving C++ without a license or safety belt. Joke aside, please keep trying and learning. But with C/C++ you should always enable compiler warnings. The godbolt link in the comment from #user4581301 is really good, the compiler flags -Wall -Wextra -pedantic -O2 -fsanitize=address,undefined are all best practice. (I would add -Werror.)
Why you got exactly 32766 could be analyzed with a debugger, but it's not meaningful. A number close to 32768 (= 2^15) should trigger all the warning bells (it could be an integer overflow). Your code is accessing uninitialized memory (among other issues), leading to what is called undefined behaviour. This means it may produce different output depending on your machine, compiler, optimization flags, OS, standard libraries, etc.; even adding a debug print could change what it does.
For optimization algorithms (like GAs) it's also super easy to fool yourself into thinking that your implementation is correct, because the optimization will find a way to avoid (or exploit) any bugs. I had a bug in my neural-network implementation that accessed some data from the previous example by accident, and it took several days until I even noticed there was a problem.
If you want to focus on the algorithms, I suggest starting with a different language (anything except C/C++/Assembly). My advice would be either Python (though it can be 50x slower, it's much easier to learn and write) or Rust (just as fast as C++ and just as complicated, but with no undefined behaviour). With Rust, every mistake in your code above would have given you either a warning by default, a compiler error, or a runtime error instead of wrong output. Though C++ with the flags mentioned above does the same for your specific code.
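For reference, here is a minimal corrected sketch of what the code above presumably intends (this is our guess at the intent, not the asker's code): indices stay within 0..4, the required headers are included, the candidate closest to the goal (smallest distance) is selected, and the global array is updated rather than shadowed.

#include <algorithm>  // std::min_element, std::copy
#include <cstdlib>    // std::abs
#include <iostream>

int gen[5] = {0, 0, 0, 0, 0};
int goal = 0;

int dif(int i) { return std::abs(gen[i] - goal); }

void nextgen() {
    int fit[5] = {dif(0), dif(1), dif(2), dif(3), dif(4)};
    // Pick the value closest to the goal: smallest distance, not max_element.
    int best = gen[std::min_element(fit, fit + 5) - fit];
    int next[5] = {best - 2, best - 1, best, best + 1, best + 2};
    std::copy(next, next + 5, gen);  // overwrite the global; don't shadow it
}

int main() {
    int gens = 0;
    std::cout << "Goal: "; std::cin >> goal;
    std::cout << "Gens: "; std::cin >> gens;
    for (int i = 0; i < gens; ++i) {
        nextgen();
        // gen[2] holds the unmutated best candidate of this generation
        std::cout << "Generation " << i + 1 << ": " << gen[2] << "\n";
    }
}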

Handling division by zero with <csignal> results in unexpected behaviour

I was trying to handle the integer division by zero (please don't judge, I was told I had to use <csignal> lib and I couldn't just use an if statement), but I needed to make sure the program would keep running (even though it's a very bad practice), instead of crashing or closing. The weird part is that the program should only handle division by zero but should exit for every other type of SIGFPE.
Side note: I have no idea why they use names like FPU or FE or FPE when referring to integer "exceptions" (or I should say interrupts), since the standard clearly says that floating-point division by zero should return either inf or nan (for 0/0) (tell me if I'm wrong).
Anyway, I wrote this test code so I could better understand what I needed to do before the actual implementation. I know it's weird to have x as a global variable, but if I don't reset it, it will keep calling handle forever...
#include <iostream>
#include <csignal>
using namespace std;

int x = 0;
void handle(int s);

int main(int argc, char * argv[]) {
    signal(SIGFPE, handle);
    cout << "Insert 0: ";
    cin >> x; // here I would input 0, so the program can compile
    x = 5 / x;
    cout << "X: " << x << endl;
    return 0;
}

void handle(int s) {
    if (s != FPE_INTDIV) exit(1);
    cout << "sig: " << s << endl;
    x = 1;
}
As you can see, I used FPE_INTDIV to rule out every other type of exception, but it doesn't work.
Eventually I discovered that FPE_INTDIV is a symbolic constant for 7 (that's what VS Code's IntelliSense tells me), and if I print the value of s, it is 8. Strangely enough, 8 is the value of FPE_INTOVF, which the documentation states is specifically for integer overflows.
Why on earth is the symbolic value for overflows used for integer division if there is a symbolic constant for integer division? What am I missing? Did someone mess up the values in the library? Am I using the wrong macros?
I should also mention, this code compiles fine with clang++ and g++ but when compiled on a Windows computer with cl, it tells me there's no macro for FPE_INTDIV.
How can I be sure of what I'm doing and write a cross platform solution that works?
I already feel like an idiot.
It's defined as:
The SIGFPE signal reports a fatal arithmetic error. Although the name is derived from “floating-point exception”, this signal actually covers all arithmetic errors, including division by zero and overflow. If a program stores integer data in a location which is then used in a floating-point operation, this often causes an “invalid operation” exception, because the processor cannot recognize the data as a floating-point number.
There's no reason for it to be labelled specifically FPE but these sorts of labels can evolve in unpredictable ways. I wouldn't read too much into it.
These signals are part of the POSIX standard and may not be fully supported or implemented in Windows. The Windows implementation of these support facilities is lacking in a number of areas, like how fork() is unsupported.
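To actually distinguish FPE_INTDIV, you need the siginfo_t sub-code, which a plain signal() handler never receives; its int parameter is the signal number itself (SIGFPE is 8 on common platforms, which is why s printed 8; it merely coincides with FPE_INTOVF's value in some headers). A minimal POSIX sketch using sigaction() with SA_SIGINFO (this will not build with MSVC's cl):

#include <csignal>   // sigaction, SA_SIGINFO, FPE_INTDIV (POSIX)
#include <unistd.h>  // write, _exit (async-signal-safe)

static void handler(int /*sig*/, siginfo_t *info, void * /*ctx*/)
{
    if (info->si_code == FPE_INTDIV) {
        const char msg[] = "caught integer division by zero\n";
        write(STDERR_FILENO, msg, sizeof msg - 1); // async-signal-safe, unlike cout
    }
    _exit(1); // returning would re-execute the faulting division forever
}

int main()
{
    struct sigaction sa {};
    sa.sa_sigaction = handler;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGFPE, &sa, nullptr);

    volatile int zero = 0;
    return 5 / zero; // raises SIGFPE with si_code == FPE_INTDIV
}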

Execution time and checking stream state in c++

I am trying to understand streams in C++. I have the following code, where I print a message a number of times, and I'm trying to find out if there is a difference in execution time when checking for a good state or not. I measured with time, of course, but I couldn't find a definitive answer, since sometimes checking was faster and sometimes it wasn't. My intuition says that since the check is an additional operation, it should always take (slightly) longer. Is there any actual difference, or is it just random?
#include <iostream>
using namespace std;

int main(int argc, char **argv)
{
    ostream &out = cout;         //initialize ostream object
    size_t arg = stoul(argv[1]); //convert char to size_t
    for (size_t cnt = 0; cnt != arg; ++cnt)
    {
        // if (out.good()) //check goodbit
        out << "Nr. of command line argument " << argc << '\n';
    }
}
The real answer to your question is that it is extremely hard in practice to measure the difference. It's just one comparison (the if) versus handing execution over to the OS for I/O and communicating with hardware.
There are multiple layers of abstraction when it comes to printing, from buffering to branch prediction. The actual impact depends on multiple factors. Even multiple runs of exactly the same program will exhibit execution time variation.
You would need to devise a careful and clever experiment to measure the effect of the check reliably.
The takeaway for your problem here is that, most certainly, the difference is below your testing accuracy and probably below the execution noise. On top of that, the CPU architecture can actually eliminate the difference; keywords are prefetching, branch prediction, and the (in)famous speculative execution.
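If you do want to try measuring it, a rough sketch of such an experiment (redirect stdout to /dev/null so terminal rendering doesn't dominate, repeat many runs, and still expect noise):

#include <chrono>
#include <iostream>
#include <string>

int main(int argc, char **argv)
{
    std::size_t n = (argc > 1) ? std::stoul(argv[1]) : 1000000;

    auto start = std::chrono::steady_clock::now();
    for (std::size_t cnt = 0; cnt != n; ++cnt)
    {
        if (std::cout.good()) // the check under test; remove it for the baseline run
            std::cout << "Nr. of command line argument " << argc << '\n';
    }
    auto stop = std::chrono::steady_clock::now();

    std::chrono::duration<double> elapsed = stop - start;
    std::cerr << elapsed.count() << " s\n"; // report on stderr so it isn't redirected
}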

Why is getchar_unlocked() faster than alternatives?

I know how this code works, but I could not find out why this code is faster than other I/O methods.
int read_int() {
    char c = getchar_unlocked();
    while (c < '0' || c > '9') c = getchar_unlocked();
    int ret = 0;
    while (c >= '0' && c <= '9') {
        ret = 10 * ret + c - 48;
        c = getchar_unlocked();
    }
    return ret;
}
scanf("%d\n", &x) has to parse the format string and lock the stream before and after the reading.
std::cin >> x might do locking too, and it might have to sync with stdin, and it might need to go through some abstraction layers.
With the above, you only do one type of input parsing (so no need to parse a format string and decide what to do based on that) and most importantly, you don't lock the stream.
Locking streams is mandated by POSIX, and glibc uses recursive mutexes to prevent multiple threads (even in a single-threaded environment) from accessing the stdin FILE simultaneously (which would corrupt it).
These mutexes are quite expensive (your read_int should be several (fivish?) times faster than scanf("%d",&x)).
Regarding your implementation: apart from fixing the magic-number issue (the 48 should be written '0'), you should probably also detect failures in getchar_unlocked and report them through a separate channel, e.g., by returning the parsed integer through a passed-in pointer and using the return value for error reporting.
If you want thread safety, you can still use getchar_unlocked to get a speedup compared to getchar, but you have to call flockfile(stdin); and funlockfile(stdin); at the beginning and end (respectively) of your read_int function.
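A sketch of that thread-safe variant (flockfile/funlockfile are POSIX; the magic 48 is also replaced by '0' here):

#include <cstdio> // flockfile, funlockfile, getchar_unlocked (POSIX)

int read_int_locked() {
    flockfile(stdin); // take the FILE lock once per number instead of once per character
    int c = getchar_unlocked();
    while (c < '0' || c > '9') c = getchar_unlocked();
    int ret = 0;
    while (c >= '0' && c <= '9') {
        ret = 10 * ret + (c - '0');
        c = getchar_unlocked();
    }
    funlockfile(stdin);
    return ret;
}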
Locking between threads is expensive. This is a non-locking I/O call.
https://discuss.codechef.com/questions/2667/getchar_unlocked

Get the remaining available memory in standard C++11?

Is it possible to get the remaining available memory on a system (x86, x64, PowerPC / Windows, Linux or macOS) in standard C++11 without crashing?
A naive way would be to try allocating very large arrays, starting with too large a size, catching the exception every time the allocation fails, and decreasing the size until no exception is thrown. But maybe there is a more efficient/clever method...
EDIT 1: In fact, I do not need the exact amount of memory. I would like to know approximately (error bar of 100 MB) how much my code could use when I start it.
EDIT 2:
What do you think of this code? Is it safe to run it at the start of my program, or could it corrupt memory?
#include <iostream>
#include <array>
#include <list>
#include <initializer_list>
#include <stdexcept>

int main(int argc, char* argv[])
{
    static const long long int megabyte = 1024 * 1024;
    std::array<char, megabyte> content({{'a'}});
    std::list<decltype(content)> list1;
    std::list<decltype(content)> list2;
    const long long int n1 = list1.max_size();
    const long long int n2 = list2.max_size();
    long long int i1 = 0;
    long long int i2 = 0;
    long long int result = 0;
    for (i1 = 0; i1 < n1; ++i1) {
        try {
            list1.push_back(content);
        }
        catch (const std::exception&) {
            break;
        }
    }
    for (i2 = 0; i2 < n2; ++i2) {
        try {
            list2.push_back(content);
        }
        catch (const std::exception&) {
            break;
        }
    }
    list1.clear();
    list2.clear();
    result = (i1 + i2) * sizeof(content);
    std::cout << "Memory available for program execution = " << result/megabyte << " MB" << std::endl;
    return 0;
}
This is highly dependent on the OS/platform. The approach you suggest need not even work in real life. On some platforms the OS will grant you all your memory requests, but not really give you the memory until you use it, at which point you get a SEGFAULT...
The standard does not have anything related to this.
It seems to me that the answer is no, you cannot do it in standard C++.
What you could do instead is discussed under "How to get available memory C++/g++?" and the content linked there. Those are all platform-specific approaches. It's not standard, but at least it helps you solve the problem you are dealing with.
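As one concrete example of that platform-specific approach, on Linux the sysinfo(2) call reports free RAM (a sketch; not standard C++ and not portable):

#include <sys/sysinfo.h> // Linux-specific
#include <iostream>

int main()
{
    struct sysinfo si {};
    if (sysinfo(&si) == 0) {
        // freeram is in units of mem_unit bytes
        long long free_bytes = static_cast<long long>(si.freeram) * si.mem_unit;
        std::cout << "Free RAM: " << free_bytes / (1024 * 1024) << " MB\n";
    }
    return 0;
}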
As others have mentioned, the problem is hard to precisely define, much less solve. Does virtual memory on the hard disk count as "available"? What about if the system implements a prompt to delete files to obtain more hard disk space, meanwhile suspending your program? (This is exactly what happens on OS X.)
The system probably implements a memory hierarchy which gets slower as you use more. You might try detecting the performance cliff between RAM and disk by allocating and initializing chunks of memory while using the C alarm interrupt facility or clock or localtime/mktime, or the C++11 clock facilities. Wall-clock time should appear to pass quicker as the machine slows down under the stress of obtaining memory from less efficient resources. (But this makes the assumption that it's not stressed by anything else such as another process.) You would want to tell the user what the program is attempting, and save the results to an editable configuration file.
I would advise using a configurable maximum amount of memory instead. Since some platforms overcommit memory, it's not easy to tell how much memory you will actually have access to. It's also not polite to assume that you have exclusive access to 100% of the memory available, many systems will have other programs running.