c++ Inline function for array multiplications of 10000 - c++

I am tasked with two programs and this is the second one. The first program had no calculation() function; the task was to time the program from start to finish. My computer reports anything from .523 seconds to .601 seconds.
The second task was to create an inline function for the calculation and I believe that I have done it wrong because it is not faster. I am not sure if I made the calculation function right because it includes display information, or if the inline function should focus only on the multiplication. Either way pulling the arrays out of main and into a function is not faster.
Is the compiler just ignoring it?
#include <ctime>
#include <iostream>
using namespace std;
inline int calculation(){
int i;
double result[10000];
double user[10000];
for(i=0; i<10000; i++){
user[i]=i+100;
}
double second[10000];
for(i=0; i<10000; i++){
second[i]=10099-i;
}
for (i = 0; i < 10000; i++){
result[i] = user[i] * second[i];
}
for (i = 0; i < 10000; i++){
cout << user[i] << " * " << second[i] << " = " << result[i] << '\n';
}
}
int main() {
time_t t1 = time(0); // get time now
struct tm * now = localtime( & t1 );
cout << "The time now is: ";
cout << now->tm_hour << ":" << now->tm_min << ":" << now->tm_sec << endl;
clock_t t; // get ticks
t = clock();
cout << " Also calculating ticks...\n"<<endl;
calculation(); // inline function
time_t t2 = time(0); // get time now
struct tm * now2 = localtime( & t2 );
cout << "The time now is: ";
cout << now2->tm_hour << ":" << now2->tm_min << ":" << now2->tm_sec << endl;
time_t t3= t2-t1;
cout << "This took me "<< t3 << " second(s)" << endl; // ticks
t = clock() - t;
float p;
p = (float)t/CLOCKS_PER_SEC;
cout << "Or more accuratley, this took " << t << " clicks"
<< " or " << p << " seconds"<<endl;
}

Is the compiler just ignoring it?
Most probably, yes. It could be doing that for two reasons:
You're compiling in debug mode. With optimizations turned off, compilers typically do not inline functions at all, which makes debugging easier.
It's ignoring it because the function is far too long for an inline function, and uses far too much stack space to safely inline, and is only invoked once. The inline keyword is a compiler HINT, not a mandatory requirement. It's the programmer's way of recommending the compiler to inline the function, just like a compiler in release mode will frequently inline functions on its own to increase performance. If it only sees negative value it won't comply.
Also, given the single invocation, it's highly unlikely that you'll even see differences no matter if it works or not. A single native function call is much easier on the CPU than a single task switch at the OS level.
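The question also asks whether the inline function should focus only on the multiplication. A minimal sketch of that variant, assuming you want to time the arithmetic separately from the printing; the names multiply_arrays and time_multiply_us are hypothetical and not part of the original program:

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Only the arithmetic lives here; there is no printing, so the timing
// below measures the multiplication alone. `inline` remains just a hint.
inline std::vector<double> multiply_arrays(std::size_t n) {
    std::vector<double> result(n);
    for (std::size_t i = 0; i < n; ++i) {
        const double user = static_cast<double>(i) + 100.0;      // user[i] in the question
        const double second = 10099.0 - static_cast<double>(i);  // second[i] in the question
        result[i] = user * second;
    }
    return result;
}

// Returns the elapsed time of one call in microseconds.
long long time_multiply_us(std::size_t n) {
    const auto start = std::chrono::steady_clock::now();
    const std::vector<double> result = multiply_arrays(n);
    const auto stop = std::chrono::steady_clock::now();
    // Read one element so the call is less likely to be optimized away.
    volatile double sink = result.empty() ? 0.0 : result[n - 1];
    (void)sink;
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Calling time_multiply_us(10000) a few times, with and without the inline keyword, shows whether the keyword makes any measurable difference for the arithmetic on your machine.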

You should disable optimization to verify if what you do has any effect, because there are good chances that the compiler is already inlining the function by itself.
Also, if you want to know exactly what your code does, you should compile with the -S flag in g++ (capital S; lowercase -s strips symbols instead), and look at the assembly generated by the compiler for your program. This will remove all uncertainty about what the compiler is doing to your program.

I would not make the function inline, and I would define the arrays as static so that they are not allocated on the stack. For example
void calculation(){
int i;
static double result[10000];
static double user[10000];
for(i=0; i<10000; i++){
user[i]=i+100;
}
static double second[10000];
for(i=0; i<10000; i++){
second[i]=10099-i;
}
for (i = 0; i < 10000; i++){
result[i] = user[i] * second[i];
}
for (i = 0; i < 10000; i++){
cout << user[i] << " * " << second[i] << " = " << result[i] << '\n';
}
}

Related

pthread execution time worse than sequential

I was learning to use pthreads in the hope that they would help some of the slowest pieces of my code go a bit faster. As a warm-up example, I tried to write a Monte Carlo integrator using threads. I wrote code that compares three approaches:
Single-thread pthread evaluation of the integral with NEVALS integrand evaluations.
Multiple-thread evaluation of the integral, NTHREADS times, each with NEVALS integrand evaluations.
Multiple threads committed to different cores in my CPU, again totalling NEVALS*NTHREADS integrand evaluations.
Upon running, the fastest per integrand evaluation is the single core, between 2 and 3 times faster than the others. The other two seem to be roughly equivalent, except that the CPU usage is very different: the second one spreads the threads across all (8) cores in my CPU, while the third (unsurprisingly) concentrates the job in NTHREADS cores and leaves the rest unoccupied.
Here is the source:
#include <iostream>
#define __USE_GNU
#include <sched.h>
#include <pthread.h>
#include <thread>
#include <stdlib.h>
#include <math.h>
#include <time.h>
#include <unistd.h>
using namespace std;
double aleatorio(double a, double b){
double r = double(rand())/RAND_MAX;
return a + r * (b - a);
}
double funct(double* a){
return pow(a[0],6);
}
void EstimateBounds(int ndim, double (*f)(double*), double* bounds){
double x[ndim];
for(int i=1;i<=1000;i++){
for(int j=0;j<ndim;j++) x[j] = aleatorio(0,1);
if ( f(x) > bounds[1]) bounds[1] = f(x);
if ( f(x) < bounds[0]) bounds[0] = f(x);
}
}
void Integrate(double (*f)(double*), int ndim, double* integral, int verbose, int seed){
int nbatch = 5000000;
const int maxeval = 25*nbatch;
double x[ndim];
srand(seed);
/// Algorithm to estimate the maxima and minima ///
for(int j=0;j<ndim;j++) x[j] = 0.5;
double bounds[2] = {f(x),f(x)};
EstimateBounds(ndim,f,bounds);
/// Integral initialization ///
int niter = int(maxeval/nbatch);
for(int k=1;k<=niter;k++)
{
double loc_min = bounds[0];
double loc_max = bounds[1];
int count = 0;
for (int i=1; i<=nbatch; i++)
{
for(int j=0;j<ndim;j++) x[j] = aleatorio(0,1);
double y = aleatorio(bounds[0],bounds[1]);
if ( f(x) > loc_max ) loc_max = f(x);
if ( f(x) < loc_min ) loc_min = f(x);
if ( f(x) > y && y > 0 ) count++;
if ( f(x) < y && y < 0 ) count--;
}
double delta = (bounds[1]-bounds[0])*double(count)/nbatch;
integral[0] += delta;
integral[1] += pow(delta,2);
bounds[0] = loc_min;
bounds[1] = loc_max;
if(verbose>0){
cout << "Iteration["<<k<<"]: " << k*nbatch;
cout << " integrand evaluations so far" <<endl;
if(verbose>1){
cout << "The bounds for this iteration were = ["<<bounds[0]<<","<<bounds[1]<<"]"<<endl;}
cout << "Integral = ";
cout << integral[0]/k << " +- ";
cout << sqrt((integral[1]/k - pow(integral[0]/k,2)))/(k) << endl;
cout << endl;
}
}
integral[0] /= niter;
integral[1] = sqrt((integral[1]/niter - pow(integral[0],2)))/niter;
}
struct IntegratorArguments{
double (*Integrand)(double*);
int NumberOfVariables;
double* Integral;
int VerboseLevel;
int Seed;
};
void LayeredIntegrate(IntegratorArguments IA){
Integrate(IA.Integrand,IA.NumberOfVariables,IA.Integral,IA.VerboseLevel,IA.Seed);
}
void ThreadIntegrate(void * IntArgs){
IntegratorArguments *IA = (IntegratorArguments*)IntArgs;
LayeredIntegrate(*IA);
pthread_exit(NULL);
}
#define NTHREADS 5
int main(void)
{
cout.precision(16);
bool execute_single_core = true;
bool execute_multi_core = true;
bool execute_multi_core_2 = true;
///////////////////////////////////////////////////////////////////////////
///
/// Single Thread Execution
///
///////////////////////////////////////////////////////////////////////////
if(execute_single_core){
pthread_t thr0;
double integral_value0[2] = {0,0};
IntegratorArguments IntArg0;
IntArg0.Integrand = funct;
IntArg0.NumberOfVariables = 2;
IntArg0.VerboseLevel = 0;
IntArg0.Seed = 1;
IntArg0.Integral = integral_value0;
int t = time(NULL);
cout << "Now Attempting to create thread "<<0<<endl;
int rc0 = 0;
rc0 = pthread_create(&thr0, NULL, ThreadIntegrate,&IntArg0);
if (rc0) {
cout << "Error:unable to create thread," << rc0 << endl;
exit(-1);
}
else cout << "Thread "<<0<<" has been succesfuly created" << endl;
pthread_join(thr0,NULL);
cout << "Thread 0 has finished, it took " << time(NULL)-t <<" secs to finish" << endl;
cout << "Integral Value = "<< integral_value0[0] << "+/-" << integral_value0[1] <<endl;
}
////////////////////////////////////////////////////////////////////////////////
///
/// Multiple Threads Creation
///
///////////////////////////////////////////////////////////////////////////////
if(execute_multi_core){
pthread_t threads[NTHREADS];
double integral_value[NTHREADS][2];
IntegratorArguments IntArgs[NTHREADS];
int rc[NTHREADS];
for(int i=0;i<NTHREADS;i++){
integral_value[i][0]=0;
integral_value[i][1]=0;
IntArgs[i].Integrand = funct;
IntArgs[i].NumberOfVariables = 2;
IntArgs[i].VerboseLevel = 0;
IntArgs[i].Seed = i;
IntArgs[i].Integral = integral_value[i];
}
int t = time(NULL);
for(int i=0;i<NTHREADS;i++){
cout << "Now Attempting to create thread "<<i<<endl;
rc[i] = pthread_create(&threads[i], NULL, ThreadIntegrate,&IntArgs[i]);
if (rc[i]) {
cout << "Error:unable to create thread," << rc[i] << endl;
exit(-1);
}
else cout << "Thread "<<i<<" has been succesfuly created" << endl;
}
/// Thread Waiting Phase ///
for(int i=0;i<NTHREADS;i++) pthread_join(threads[i],NULL);
cout << "All threads have now finished" <<endl;
cout << "This took " << time(NULL)-t << " secs to finish" <<endl;
cout << "Or " << (time(NULL)-t)/NTHREADS << " secs per core" <<endl;
for(int i = 0; i < NTHREADS; i++ ) {
cout << "Thread " << i << " has as the value for the integral" << endl;
cout << "Integral = ";
cout << integral_value[i][0] << " +- ";
cout << integral_value[i][1] << endl;
}
}
////////////////////////////////////////////////////////////////////////
///
/// Multiple Cores Execution
///
///////////////////////////////////////////////////////////////////////
if(execute_multi_core_2){
cpu_set_t cpuset;
CPU_ZERO(&cpuset);
pthread_t threads[NTHREADS];
double integral_value[NTHREADS][2];
IntegratorArguments IntArgs[NTHREADS];
int rc[NTHREADS];
for(int i=0;i<NTHREADS;i++){
integral_value[i][0]=0;
integral_value[i][1]=0;
IntArgs[i].Integrand = funct;
IntArgs[i].NumberOfVariables = 2;
IntArgs[i].VerboseLevel = 0;
IntArgs[i].Seed = i;
IntArgs[i].Integral = integral_value[i];
}
int t = time(NULL);
for(int i=0;i<NTHREADS;i++){
cout << "Now Attempting to create thread "<<i<<endl;
rc[i] = pthread_create(&threads[i], NULL, ThreadIntegrate,&IntArgs[i]);
if (rc[i]) {
cout << "Error:unable to create thread," << rc[i] << endl;
exit(-1);
}
else cout << "Thread "<<i<<" has been succesfuly created" << endl;
CPU_SET(i, &cpuset);
}
cout << "Now attempting to commit different threads to different cores" << endl;
for(int i=0;i<NTHREADS;i++){
const int set_result = pthread_setaffinity_np(threads[i], sizeof(cpu_set_t), &cpuset);
if(set_result) cout << "Error: Thread "<<i<<" could not be commited to a new core"<<endl;
else cout << "Thread reassignment succesful" << endl;
}
/// Thread Waiting Phase ///
for(int i=0;i<NTHREADS;i++) pthread_join(threads[i],NULL);
cout << "All threads have now finished" <<endl;
cout << "This took " << time(NULL)-t << " secs to finish" <<endl;
cout << "Or " << (time(NULL)-t)/NTHREADS << " secs per core" <<endl;
for(int i = 0; i < NTHREADS; i++ ) {
cout << "Thread " << i << " has as the value for the integral" << endl;
cout << "Integral = ";
cout << integral_value[i][0] << " +- ";
cout << integral_value[i][1] << endl;
}
}
pthread_exit(NULL);
}
I compile with
g++ -std=c++11 -w -fpermissive -O3 SOURCE.cpp -lpthread
It seems to me that my threads are actually being executed sequentially, because the time seems to grow with NTHREADS, and it actually takes roughly NTHREADS times longer than a single thread.
Does anyone have an idea of where the bottleneck is?
You are using rand(), which is a global random number generator. First of all it is not thread-safe, so using it in multiple threads, potentially in parallel, causes undefined behavior.
Even if we set that aside, rand() is using one global instance, shared by all threads. If one thread wants to call it, the processor core needs to check whether the other cores modified its state and needs to refetch that state from the main memory or other caches each time it is used. This is why you observe the drop in performance.
Use the <random> facilities for pseudo-random number generators instead. They offer much better quality random number generators, random number distributions, and the ability to create multiple independent random number generator instances. Make these thread_local, so the threads do not interfere with one another:
double aleatorio(double a, double b){
thread_local std::mt19937 rng{/*seed*/};
return std::uniform_real_distribution<double>{a, b}(rng);
}
Please note though that this is not using proper seeding for std::mt19937, see this question for details and that uniform_real_distribution<double>{a, b} will return a uniformly distributed number between a inclusive and b exclusive. Your original code gave a number between a and b inclusive (potential rounding errors aside). I assume that neither is particularly relevant to you.
Also note my unrelated comments under your question for other things you should improve.
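To illustrate the point about independent generators, here is a minimal sketch in which every thread owns its own std::mt19937, so no random-number state is shared between cores. It uses std::thread and a plain mean-value estimate of the integral of x^6 over [0, 1) rather than the question's pthread and hit-or-miss code; the function names and the seed value 1234 are assumptions made for the example.

```cpp
#include <cstddef>
#include <random>
#include <thread>
#include <vector>

// Monte Carlo mean of x^6 over [0, 1) using a generator owned by the caller.
double mean_pow6(std::mt19937& rng, std::size_t samples) {
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    double sum = 0.0;
    for (std::size_t i = 0; i < samples; ++i) {
        const double x = dist(rng);
        sum += x * x * x * x * x * x;
    }
    return sum / static_cast<double>(samples);
}

// Runs one independent estimate per thread and averages the results.
double parallel_mean_pow6(unsigned num_threads, std::size_t samples_per_thread) {
    std::vector<double> partial(num_threads, 0.0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&partial, t, samples_per_thread] {
            std::mt19937 rng(1234u + t);  // per-thread generator (seed is an arbitrary assumption)
            partial[t] = mean_pow6(rng, samples_per_thread);
        });
    }
    for (std::thread& w : workers) w.join();
    double total = 0.0;
    for (double p : partial) total += p;
    return total / static_cast<double>(num_threads);
}
```

The exact value of the integral is 1/7, so parallel_mean_pow6(4, 1000000) should land close to 0.1429. Whether the wall-clock time then scales with the number of threads is something to measure on your own machine.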

Code has function specific variables changing values outside of scope

This is FCFS cpu scheduling algorithm.
void findTurnAroundTime(int processes[], int n, int bt[], int wt[], int tat[])
{
// Calculating turnaround time by adding bt[i] + wt[i]
for (int i = 0; i < n; i++)
tat[i] = bt[i] + wt[i];
}
// Function to calculate average waiting and turn-around
// times.
void findavgTime(int processes[], int n, int bt[], int at[])
{
int wt[n], tat[n];
// Function to find waiting time of all processes
findWaitingTime(processes, n, bt, wt, at);
// Function to find turn around time for all processes
findTurnAroundTime(processes, n, bt, wt, tat);
// Display processes along with all details
cout << "Processes " << " Burst Time " << " Arrival Time "
<< " Waiting Time " << " Turn-Around Time "
<< " Completion Time \n";
int total_wt = 0, total_tat = 0;
for (int i = 0; i < n; i++)
{
total_wt = total_wt + wt[i];
total_tat = total_tat + tat[i];
int compl_time = tat[i] + at[i];
cout << " " << i + 1 << "\t\t" << bt[i] << "\t\t" << at[i] << "\t\t"
<< wt[i] << "\t\t " << tat[i] << "\t\t " << compl_time << endl;
}
cout << "Average waiting time = " << (float) total_wt / (float) n;
cout << "\nAverage turn around time = " << (float) total_tat / (float) n;
}
How are variables like wt and tat connected if they are declared inside each function? (This is the main question.)
The full code is working.
How are variables like wt and tat connected if they are declared inside each function?
wt and tat are defined in findavgTime. (They are defined as variable-length arrays, which are a non-standard extension in C++, but that's a separate issue.)
When findavgTime calls findWaitingTime and findTurnAroundTime, it passes those variables to the functions. The functions don't define them in their function body -- they are defined in the functions by way of function arguments. Since wt and tat are arrays, they decay to a pointer to the first elements of the respective arrays when findWaitingTime and findTurnAroundTime are called. Because of that, any changes made to the variables inside those functions are visible in findavgTime too.
You don't have to use the same variable names in the function arguments. You could use
void findTurnAroundTime(int processes[], int n, int bt[], int wt_here[], int tat_here[])
{
for (int i = 0; i < n; i++)
tat_here[i] = bt[i] + wt_here[i];
}
That won't change the behavior of the program.
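The decay described above can be seen in isolation with a small sketch. The function names here are hypothetical; the first one receives an array argument, which is really a pointer to the caller's first element, and the second receives a plain int, which is copied.

```cpp
#include <cstddef>

// The parameter is written as an array but is really an int*,
// so writes through it land in the caller's array.
void fill_with_squares(int values[], std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        values[i] = static_cast<int>(i * i);
}

// A plain int is passed by value: only the local copy changes.
void try_to_change(int value) {
    value = 99;
    (void)value;  // silence unused-value warnings
}
```

After `int data[4] = {0, 0, 0, 0}; fill_with_squares(data, 4);` the caller's data holds 0, 1, 4, 9, whereas `int x = 1; try_to_change(x);` leaves x at 1.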

C++: array index is not increasing

I want to make a simple game for 3 players. Each player moves 1 to 6 blocks per turn, depending on the random function. When the first player has moved, the second player starts, and then the third player. To do that I increase the index of an array each time a player finishes a move.
My problem is that the index does not seem to be increased: it gets stuck on player 1 even though I increase it. I have exactly the same code in C# and it works well!
Here is the code in C++.
int main ()
{
string namesofplayers[] = {"one","two","three"};
int movementofplayers[] = {0,0,0}; // start position of players is
int gamesize = 32; //32 blocks-steps of game
int random;
int y = 0;
a:
y++;
if (y >= 3)
{
y = 0;
}
cout << "it's" << namesofplayers[y] << "turn to play";
int R = (rand() % 6 + 1);
cout << "player " << namesofplayers[y] << " moves to block" << R << endl;
movementofplayers[y] += random;
cout << movementofplayers[y];
if (movementofplayers[y] < gamesize)
{
goto a;
}
else
{
cout << "Player " << namesofplayers[y] << " wins the game" << endl;
}
}
I took the liberty of writing up an alternative implementation that fixes some of the problems in your code and also produces more readable output. I also threw out the one-liners because they drive me crazy, but that's personal preference. Also, I tend to explicitly qualify symbols from the standard library using the appropriate scope.
Get rid of goto. You can browse SO and the web for multiple reasons why not to use an explicit jump like that. Just use a loop.
Fix the missing initial seed for the pseudo-random number generator. If you don't seed it with a varying value (e.g. time(nullptr)), you'll always get the same succession of "random" values with each program invocation.
Fix the use of the variable random. You added the uninitialized variable random to movementofplayers[y], which is undefined behavior. Interestingly, with g++-4.7 the variable happened to hold 1 when it was used in the arithmetic op, but you cannot rely on that. The correct variable is R.
Return a well defined value from main().
I hope the code still does what you intended it to do:
#include <cstdlib>
#include <ctime>
#include <iostream>
#include <string>
int main ()
{
    std::srand(static_cast<unsigned int>(std::time(NULL)));
    std::string namesofplayers[] = {"one","two","three"};
    int movementofplayers[] = {0,0,0}; // start position of players is 0
    int gamesize = 32; // 32 blocks-steps of game
    int y = 0; // index of the player whose turn it is
    while (true)
    {
        std::cout << "it's " << namesofplayers[y] << " turn to play" << std::endl;
        int R = (std::rand() % 6 + 1);
        std::cout << "player " << namesofplayers[y] << " moves to block " << R << std::endl;
        movementofplayers[y] += R;
        std::cout << "movements of player " << namesofplayers[y] <<": " << movementofplayers[y] << std::endl;
        if (movementofplayers[y] >= gamesize)
        {
            break; // y still identifies the winner here
        }
        y = (y + 1) % 3; // next player, wrapping back to the first
    }
    std::cout << "Player " << namesofplayers[y] << " wins the game" << std::endl;
    return 0;
}
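The seeding point is easy to check in isolation. The sketch below swaps rand() for the <random> facilities so the seed is explicit; roll_dice is a hypothetical helper, not part of the game code. The same seed always reproduces the same rolls, which is what an unseeded program does on every run.

```cpp
#include <random>
#include <vector>

// Rolls a six-sided die `count` times, starting from the given seed.
std::vector<int> roll_dice(unsigned int seed, int count) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> die(1, 6);  // inclusive range 1..6
    std::vector<int> rolls;
    for (int i = 0; i < count; ++i)
        rolls.push_back(die(rng));
    return rolls;
}
```

Two calls such as roll_dice(7, 20) return identical sequences. Passing a value that changes between runs, for example one derived from the current time, gives a different game each time.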
Here is how I would do it.
Added seeding the random number generator so you don't get the same game every time.
Added a constant for number of players to get rid of the magic number and also make it easier to expand the number of players if desired.
Got rid of the goto. Although it is possible to use goto in a reasonable way it is prone to accidental misuse, makes the code harder to follow, and makes people angry. :)
I tweaked the output and names a bit just to make it a little easier for me to test. In doing so I corrected an issue where it said the player moved to block R which was their roll for that turn, not their actual position in the game.
#include <string>
#include <iostream>
#include <cstdlib>
#include <ctime>
int main()
{
std::srand(static_cast<unsigned int>(std::time(0)));
const int gamesize = 32;
const int num_players = 3;
const std::string namesofplayers[num_players] = {"1", "2", "3"};
int movementofplayers[num_players] = {0, 0, 0};
int current_player = 0;
for(;;) //Loop forever, the game logic will exit the loop when a winner is found
{
const int roll = rand() % 6 + 1;
movementofplayers[current_player] += roll;
std::cout << "Player " << namesofplayers[current_player] << " rolls a " << roll << " and moves to block " << movementofplayers[current_player] << std::endl;
//Check if they won and if so, end the game
if(movementofplayers[current_player] >= gamesize)
{
std::cout << "Player " << namesofplayers[current_player] << " wins the game!" << std::endl;
break;
}
current_player = (current_player + 1) % num_players;
}
return 0;
}

Printing a pointer to a pointer

This may be very simple but I am confused!
I am getting segmentation fault when extracting information from a pointer to a pointer. See the cout section in main(). Any help will be appreciated.
Thanks..
Sen
#include <stdlib.h>
#include <iostream>
typedef struct{
int hour;
int minute;
} Time;
Time* GetNextTime(void)
{
Time *p_time = new Time;
return p_time;
}
void GetTime( Time **sometime )
{
int length = 10;
sometime = new Time*[length];
for(int i=0; i<length; i++)
{
sometime[i] = GetNextTime();
sometime[i]->hour = rand()%24 ;
sometime[i]->minute = rand()%60;
std::cout << "Entered times " << sometime[i]->hour << " hour " << sometime[i]->minute << " minutes " << std::endl;
}
}
int main()
{
Time** _time;
GetTime( _time );
//here is the question
// I cant print them from original _time
for( int i=0; i<10; i++)
std::cout << " Print times " << (*_time)[i].hour << " hour " << (*_time)[i].minute << " minutes " << std::endl;
}
You're passing sometime by value, not by reference so it remains uninitialized. Change GetTime to the following:
void GetTime( Time ** &sometime ) //& means pass by reference
Because you're creating an array of pointers, you can use array notation to access them during printing as well.
std::cout << " Print times " << _time[i]->hour << " hour "
<< _time[i]->minute << " minutes " << std::endl;
Unless an argument is explicitly labelled as using a reference it is passed by value in C++. Thus, assigning to sometime in GetTime() has no effect on _time in main().
My strong advice is not to use explicit memory allocation but to use containers, e.g. std::vector<T>, instead. You'd still need to pass the container by reference, however.
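A minimal sketch of the container approach, assuming a helper named GetTimes (hypothetical) that takes the vector by reference and fills it with the same random values as the question's code:

```cpp
#include <cstdlib>
#include <vector>

struct Time {
    int hour;
    int minute;
};

// The & makes `times` an alias for the caller's vector,
// so the elements added here are visible after the call.
void GetTimes(std::vector<Time>& times, int length) {
    times.clear();
    for (int i = 0; i < length; ++i) {
        Time t;
        t.hour = std::rand() % 24;
        t.minute = std::rand() % 60;
        times.push_back(t);
    }
}
```

The caller declares `std::vector<Time> times;`, calls `GetTimes(times, 10);` and then reads times[i].hour and times[i].minute. There is nothing to delete afterwards.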
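In main it should be
Time *_time;
GetTime(&_time);
GetTime then has to assign through its parameter (*sometime = new Time[length];) rather than to its local copy, and the cout should be done with _time instead of *_time, e.g. _time[i].hour.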
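For the pointer-to-pointer variant to work, GetTime has to write through the pointer it receives. A minimal sketch under that assumption; the length parameter and the name out are additions for the example, and the array holds Time objects directly instead of pointers to them:

```cpp
#include <cstdlib>

struct Time {
    int hour;
    int minute;
};

// `out` points at the caller's pointer; assigning to *out
// changes the caller's variable, not a local copy.
void GetTime(Time** out, int length) {
    *out = new Time[length];
    for (int i = 0; i < length; ++i) {
        (*out)[i].hour = std::rand() % 24;
        (*out)[i].minute = std::rand() % 60;
    }
}
```

The caller writes `Time* _time = nullptr; GetTime(&_time, 10);`, reads _time[i].hour, and is responsible for `delete[] _time;` afterwards.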

C++ cout printing slowly

I noticed if I print out a long string(char*) using cout it seems to print 1 character at a time to the screen in Windows 7, Vista, and Linux(using putty) using Visual C++ 2008 on Windows and G++ on Linux. Printf is so much faster I actually switched from cout to printf for most printing in a project of mine. This is confusing me because this question makes it seem like I'm the only one having this issue.
I even wrote a cout replacement that looks like it beats the pants off of cout on my comp -
class rcout
{
public:
char buff[4096];
unsigned int size;
unsigned int length;
rcout()
{
size = 4096;
length = 0;
buff[0] = '\0';
}
~rcout()
{
printf("%s", buff);
}
rcout &operator<<(char *b)
{
strncpy(buff+length, b, size-length);
unsigned int i = strlen(b);
if(i+length >= size)
{
buff[size-1] = '\0';
printf("%s", buff);
b += (size-length) -1;
length = 0;
return (*this) << b;
}
else
length += i;
return (*this);
}
rcout &operator<<(int i)
{
char b[32];
_itoa_s(i, b, 10);
return (*this)<<b;
}
rcout &operator<<(float f)
{
char b[32];
sprintf_s(b, 32, "%f", f);
return (*this)<<b;
}
};
int main()
{
char buff[65536];
memset(buff, 0, 65536);
for(int i=0;i<3000;i++)
buff[i] = rand()%26 + 'A';
rcout() << buff << buff <<"\n---"<< 121 <<"---" << 1.21f <<"---\n";
Sleep(1000);
cout << "\n\nOk, now cout....\n\n";
cout << buff << buff <<"\n---"<< 121 <<"---" << 1.21f <<"---\n";
Sleep(1000);
cout << "\n\nOk, now me again....\n\n";
rcout() << buff << buff <<"\n---"<< 121 <<"---" << 1.21f <<"---\n";
Sleep(1000);
return 0;
}
Any ideas why cout is printing so slowly for me?
NOTE: This experimental result is valid for MSVC. In some other implementation of library, the result will vary.
printf could be (much) faster than cout. Although printf parses the format string at runtime, it requires far fewer function calls and needs fewer instructions to do the same job, compared to cout. Here is a summary of my experimentation:
The number of static instructions
In general, cout generates a lot more code than printf. Say that we have the following cout code to print out with some formats.
os << setw(width) << dec << "0x" << hex << addr << ": " << rtnname <<
": " << srccode << "(" << dec << lineno << ")" << endl;
On a VC++ compiler with optimizations, it generates around 188 bytes of code. But when you replace it with printf-based code, only 42 bytes are required.
The number of dynamically executed instructions
The number of static instructions only tells you the difference in static binary code. What is more important is the actual number of instructions that are dynamically executed at runtime. I also did a simple experiment:
Test code:
int a = 1999;
char b = 'a';
unsigned int c = 4200000000;
long long int d = 987654321098765;
long long unsigned int e = 1234567890123456789;
float f = 3123.4578f;
double g = 3.141592654;
void Test1()
{
cout
<< "a:" << a << "\n"
<< "a:" << setfill('0') << setw(8) << a << "\n"
<< "b:" << b << "\n"
<< "c:" << c << "\n"
<< "d:" << d << "\n"
<< "e:" << e << "\n"
<< "f:" << setprecision(6) << f << "\n"
<< "g:" << setprecision(10) << g << endl;
}
void Test2()
{
fprintf(stdout,
"a:%d\n"
"a:%08d\n"
"b:%c\n"
"c:%u\n"
"d:%I64d\n"
"e:%I64u\n"
"f:%.2f\n"
"g:%.9lf\n",
a, a, b, c, d, e, f, g);
fflush(stdout);
}
int main()
{
DWORD A, B;
DWORD start = GetTickCount();
for (int i = 0; i < 10000; ++i)
Test1();
A = GetTickCount() - start;
start = GetTickCount();
for (int i = 0; i < 10000; ++i)
Test2();
B = GetTickCount() - start;
cerr << A << endl;
cerr << B << endl;
return 0;
}
Here is the result of Test1 (cout):
# of executed instruction: 423,234,439
# of memory loads/stores: approx. 320,000 and 980,000
Elapsed time: 52 seconds
Then, what about printf? This is the result of Test2:
# of executed instruction: 164,800,800
# of memory loads/stores: approx. 70,000 and 180,000
Elapsed time: 13 seconds
On this machine and compiler, printf was much faster than cout. Both the number of executed instructions and the number of loads/stores (which indicates the number of cache misses) differ by a factor of roughly 3 to 4.
I know this is an extreme case. Also, I should note that cout is much easier when you're handling 32/64-bit data and require 32/64-platform independence. There is always a trade-off. I use cout when checking types is very tricky.
Okay, cout in MSVS just sucks :)
I would suggest you try this same test on a different computer. I don't have a good answer for why this might be happening; all I can say is I have never noticed a speed difference between cout and printf. I also tested your code using gcc 4.3.2 on Linux and there was no difference whatsoever.
That being said, you can't easily replace cout with your own implementation. The fact is, cout is an instance of std::ostream which has a lot of functionality built into it which is necessary for interoperability with other classes that overload the iostream operators.
Edit:
Anyone that says printf is always faster than std::cout is simply wrong. I just ran the test code posted by minjang, with gcc 4.3.2 and the -O2 flag on a 64-bit AMD Athlon X2, and cout was actually faster.
I got the following results:
printf: 00:00:12.024
cout: 00:00:04.144
Is cout always faster than printf? Probably not. Especially not with older implementations. But on newer implementations iostreams are likely to be faster than stdio because instead of parsing a format string at runtime, the compiler knows at compile time what functions it needs to call in order to convert integers/floats/objects to strings.
But more importantly, the speed of printf versus cout depends on the implementation, and so the problem described by the OP is not easily explicable.
Try calling ios::sync_with_stdio(false); before using std::cout/cin, unless of course you mix stdio and iostream in your program, which is a bad thing to do.
Based on my experience in programming competitions, printf IS faster than cout.
I remember many times when my solution didn't make it before the Time limit just because of cin/cout, while printf/scanf did work.
Besides that, it seems normal (at least for me) that cout is slower than printf, because it does more operations.
Try using some endls or flushes as they will flush cout's buffer, in case the OS is caching your program's output for whatever reason. But, as Charles says, there's no good explanation for this behavior, so if that doesn't help then it's likely a problem specific to your machine.
You should try to write all your data to an ostringstream first, and then use cout on the ostringstream's str(). I am on 64-bit Windows 7 and Test1 was already significantly faster than Test2 (your mileage may vary). Using an ostringstream to build a single string first and then using cout on that further decreased Test1's execution time by a factor of about 3 to 4. Be sure to #include <sstream>.
I.e., replace
void Test1()
{
cout
<< "a:" << a << "\n"
<< "a:" << setfill('0') << setw(8) << a << "\n"
<< "b:" << b << "\n"
<< "c:" << c << "\n"
<< "d:" << d << "\n"
<< "e:" << e << "\n"
<< "f:" << setprecision(6) << f << "\n"
<< "g:" << setprecision(10) << g << endl;
}
with:
void Test1()
{
ostringstream oss;
oss
<< "a:" << a << "\n"
<< "a:" << setfill('0') << setw(8) << a << "\n"
<< "b:" << b << "\n"
<< "c:" << c << "\n"
<< "d:" << d << "\n"
<< "e:" << e << "\n"
<< "f:" << setprecision(6) << f << "\n"
<< "g:" << setprecision(10) << g << endl;
cout << oss.str();
}
I suspect ostringstream makes this so much faster as a result of not trying to write to the screen each time you call operator<< on cout. I've also noticed through experience that reducing the number of times you write to the screen (by writing more at once) increases performance (again, your mileage may vary).
E.g.,
void Foo1()
{
for(int i = 0; i < 10000; ++i) {
cout << "Foo1\n";
}
}
void Foo2()
{
std::string s;
for(int i = 0; i < 10000; ++i) {
s += "Foo2\n";
}
cout << s;
}
void Foo3()
{
std::ostringstream oss;
for(int i = 0; i < 10000; ++i) {
oss << "Foo3\n";
}
cout << oss.str();
}
In my case, Foo1 took 1,092ms, Foo2 took 234ms, and Foo3 took 218ms. ostringstreams are your friend. Obviously Foo2 and Foo3 require (trivially) more memory. To compare this against a C-style function, try sprintf into a buffer and then write that buffer using fprintf and you should see still more efficiency over Test2 (though for me this only improved performance of Test2 by about 10% or so; cout and printf are indeed different beasts under the hood).
Compiler: MinGW64 (TDM and its bundled libraries).
Try using ios::sync_with_stdio(false);. Call it before doing any I/O with std::cin/cout. By default the iostream standard streams are synchronized with their corresponding standard C streams; passing false turns that synchronization off, so after the call you should not mix stdio and iostream on the same stream.
For example, std::cin/wcin of iostream is synchronized with stdin of the C streams by default.
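A minimal sketch of how the call is typically used, combined with the build-then-write-once idea from the earlier answer. Both helper names are hypothetical, and whether this speeds anything up depends on the standard library implementation.

```cpp
#include <iostream>
#include <ostream>
#include <sstream>
#include <string>

// Call once at the top of main(), before any other I/O.
// After this, avoid mixing printf/scanf with std::cout/std::cin.
void use_unsynced_iostreams() {
    std::ios::sync_with_stdio(false);
}

// Formats everything in memory and hands it to the stream in one write.
void write_lines(std::ostream& out, int count) {
    std::ostringstream buffer;
    for (int i = 0; i < count; ++i)
        buffer << "line " << i << '\n';
    out << buffer.str();
}
```

In main you would call use_unsynced_iostreams() first and then write_lines(std::cout, 10000). Taking a std::ostream& also lets the same function write to a file or a string stream.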
Here is hax that should make c++ streams as fast as c printf. I never tested it but I believe it works.
ios_base::sync_with_stdio(0);