Timing the Thrash - c++

I feel confident that my nested for loops will thrash memory, but I would like to know how long it takes. I'm assuming time.h can help, but I don't know which functions to use or how to display the result. Can someone help?
I have updated my code with the suggestions made and I believe it worked. I got a slow output of 4 (thrashTime). Is this in seconds? Also, perhaps my method could be refactored. I take a time reading before and after the for loops.
// Updated
#include <iostream>
#include <time.h>
using namespace std;
int array[1 << 14][1 << 14];
int main() {
time_t beforeThrash = 0;
time_t afterThrash = 0;
time_t thrashTime;
int i, j;
beforeThrash = time(NULL);
for (i = 0; i<16384; i++)
for (j = 0; j<16384; j++)
array[i][j] = i*j;
afterThrash = time(NULL);
thrashTime = afterThrash - beforeThrash;
cout << thrashTime << endl;
system("pause");
return 0;
}

You can just follow the documentation for time and clock, as Joe Z mentioned.
A quick demo for printing current time:
#include <ctime>
time_t start = time(0);
const char* tstart = ctime(&start);
// std::cout << tstart; will give you local time Fri Dec 06 11:53:46 2013
For time difference:
#include <ctime>
clock_t t = clock();
do_something();
t = clock() - t;
// std::cout << (float)t / CLOCKS_PER_SEC; will give you the elapsed time in seconds (processor time on most platforms)
You can simply replace do_something(); with the operations to be measured.
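On the "is this in seconds?" part of the question: a difference of two time() readings is best taken with difftime, which returns seconds, while a difference of clock() ticks has to be divided by CLOCKS_PER_SEC. A minimal sketch of both conversions follows. The helper names ticks_to_seconds and wall_seconds are made up for illustration and do not come from the posts above.

```cpp
#include <ctime>

// Converts a difference of two clock() readings (ticks) to seconds.
// Hypothetical helper name, used only for this illustration.
double ticks_to_seconds(std::clock_t ticks) {
    return static_cast<double>(ticks) / CLOCKS_PER_SEC;
}

// Difference between two time() readings, in seconds.
// time() typically has one-second granularity, so this is coarse.
double wall_seconds(std::time_t before, std::time_t after) {
    return std::difftime(after, before);
}
```

With these, the thrashTime from the question would be wall_seconds(beforeThrash, afterThrash), and a clock()-based measurement would be ticks_to_seconds(clock() - t).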

Related

Execution time of a function in C++

I want to use several functions that declare the same array in different ways (statically, on the stack and on the heap) and display the execution time of each function. Finally, I want to call those functions several times.
I think I've managed to do everything, but for the execution time of the functions I'm constantly getting 0 and I don't know if that is normal. Could somebody confirm it for me? Thanks.
Here's my code
#include "stdafx.h"
#include <iostream>
#include <time.h>
#include <stdio.h>
#include <chrono>
#define size 100000
using namespace std;
void prem(){
auto start = std::chrono::high_resolution_clock::now();
static int array[size];
auto finish = std::chrono::high_resolution_clock::now();
std::chrono::duration<double> elapsed = finish - start;
std::cout << "Elapsed timefor static: " << elapsed.count() << " s\n";
}
void first(){
auto start = std::chrono::high_resolution_clock::now();
int array[size];
auto finish = std::chrono::high_resolution_clock::now();
std::chrono::duration<double> elapsed = finish - start;
std::cout << "Elapsed time on the stack: " << elapsed.count() << " s\n";
}
void secon(){
auto start = std::chrono::high_resolution_clock::now();
int *array = new int[size];
auto finish = std::chrono::high_resolution_clock::now();
std::chrono::duration<double> elapsed = finish - start;
std::cout << "Elapsed time dynamic: " << elapsed.count() << " s\n";
delete[] array;
}
int main()
{
for (int i = 0; i <= 1000; i++){
prem();
first();
secon();
}
return 0;
}
prem() - the static array's storage is reserved outside of the function call (when the program is loaded), so there is nothing to time
first() - the stack array is set up as part of the function's stack frame on entry, before your code gets to the first now() call
You are looping over all 3 functions in a single loop. Why? Didn't you mean to loop 1000 times over each one separately, so that they (hopefully) don't affect each other? In practice they can still affect each other, though.
My suggestions:
Loop over each function separately
Time the entire 1000 loops rather than each call: make the now() calls before you enter the loop and after you exit it, then take the difference and divide it by the number of iterations (1000)
Dynamic allocation can be (trivially) reduced to just grabbing a block of memory in the vast available address space (I assume you are running on 64-bit platform) and unless you actually use that memory the OS doesn't even need to make sure it is in RAM. That would certainly skew your results significantly
Write a "driver" function that gets function pointer to "test"
Possible implementation of that driver() function:
void driver( void(*_f)(), int _iter, std::string _name){
auto start = std::chrono::high_resolution_clock::now();
for(int i = 0; i < _iter; ++i){
(*_f)();
}
auto finish = std::chrono::high_resolution_clock::now();
std::chrono::duration<double> elapsed = finish - start;
std::cout << "Elapsed time " << _name << ": " << elapsed.count() / _iter << " s" << std::endl;
}
That way your main() looks like that:
int main(){
const int iterations = 1000;
driver(prem, iterations, "static allocation");
driver(first, iterations, "stack allocation");
driver(secon, iterations, "dynamic allocation");
}
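The same driver idea can be sketched as a template, so it accepts lambdas and function objects as well as plain function pointers, and returns the average instead of printing it. This is only a sketch under some assumptions: the name average_seconds is invented here, and it uses std::chrono::steady_clock in place of high_resolution_clock because steady_clock is guaranteed to be monotonic.

```cpp
#include <chrono>

// Calls f() `iterations` times and returns the average wall-clock
// time per call in seconds. Hypothetical name, illustration only.
template <typename F>
double average_seconds(F&& f, int iterations) {
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        f();
    }
    const auto finish = std::chrono::steady_clock::now();
    const std::chrono::duration<double> elapsed = finish - start;
    return elapsed.count() / iterations;
}
```

It could be called as average_seconds(prem, 1000) or with a lambda. The caveat about the compiler removing unused work still applies to whatever is passed in.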
Do not do such synthetic tests because the compiler will optimize out everything that is not used.
As another answer suggests, you need to measure the time for the entire 1000 loops. Even then, I do not think you will get reasonable results.
Let's run 1000000 iterations instead of 1000. And let's add another case, where we just do two subsequent calls to chrono::high_resolution_clock::now() as a baseline:
#include <iostream>
#include <time.h>
#include <stdio.h>
#include <chrono>
#include <string>
#include <functional>
#define size 100000
using namespace std;
void prem() {
static int array[size];
}
void first() {
int array[size];
}
void second() {
int *array = new int[size];
delete[] array;
}
void PrintTime(std::chrono::duration<double> elapsed, int count, std::string msg)
{
std::cout << msg << elapsed.count() / count << " s\n";
}
int main()
{
int iterations = 1000000;
{
auto start = std::chrono::high_resolution_clock::now();
auto finish = std::chrono::high_resolution_clock::now();
PrintTime(finish - start, iterations, "Elapsed time for nothing: ");
}
{
auto start = std::chrono::high_resolution_clock::now();
for (int i = 0; i < iterations; i++)
{
prem();
}
auto finish = std::chrono::high_resolution_clock::now();
PrintTime(finish - start, iterations, "Elapsed timefor static: ");
}
{
auto start = std::chrono::high_resolution_clock::now();
for (int i = 0; i < iterations; i++)
{
first();
}
auto finish = std::chrono::high_resolution_clock::now();
PrintTime(finish - start, iterations, "Elapsed time on the stack: ");
}
{
auto start = std::chrono::high_resolution_clock::now();
for (int i = 0; i < iterations; i++)
{
second();
}
auto finish = std::chrono::high_resolution_clock::now();
PrintTime(finish - start, iterations, "Elapsed time dynamic: ");
}
return 0;
}
With all optimisations on, I get this result:
Elapsed time for nothing: 3.11e-13 s
Elapsed timefor static: 3.11e-13 s
Elapsed time on the stack: 3.11e-13 s
Elapsed time dynamic: 1.88703e-07 s
That basically means that the compiler optimized out prem() and first(): not just the calls but the entire loops, because they have no side effects.
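One way to keep the optimizer from discarding such an allocation test is to actually write to the memory through a volatile pointer, which also forces the OS to back the pages with real memory. A rough sketch follows. The function name touch_pages is invented for illustration, and the stride of 1024 ints assumes 4-byte int and 4 KiB pages, which is common but not guaranteed.

```cpp
#include <cstddef>

// Writes one int per assumed page so the buffer is really used.
// Returns how many locations were written.
// Assumption: 4-byte int and 4 KiB pages, hence a stride of 1024 ints.
std::size_t touch_pages(int* data, std::size_t count) {
    volatile int* p = data;              // volatile: the writes must happen
    const std::size_t stride = 1024;
    std::size_t touched = 0;
    for (std::size_t i = 0; i < count; i += stride) {
        p[i] = 1;
        ++touched;
    }
    return touched;
}
```

Calling touch_pages(array, size) inside each of the three functions gives them an observable side effect, so the timings start to reflect the cost of using the memory rather than the cost of nothing.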

Why is an if statement and a variable declaration faster than an addition in a loop?

If we have if statements with declarations of variables like this:
#include <iostream>
#include <ctime>
using namespace std;
int main() {
int res = 0;
clock_t begin = clock();
for(int i=0; i<500500000; i++) {
if(i%2 == 0) {int fooa; fooa = i;}
if(i%2 == 0) {int foob; foob = i;}
if(i%2 == 0) {int fooc; fooc = i;}
}
clock_t end = clock();
double elapsed_secs = double(end - begin) / CLOCKS_PER_SEC;
cout << elapsed_secs << endl;
return 0;
}
the result is:
1.44
Process returned 0 (0x0) execution time : 1.463 s
Press any key to continue.
but, if it is:
#include <iostream>
#include <ctime>
using namespace std;
int main() {
int res = 0;
clock_t begin = clock();
for(int i=0; i<500500000; i++) {
res++;
res--;
res++;
}
clock_t end = clock();
double elapsed_secs = double(end - begin) / CLOCKS_PER_SEC;
cout << elapsed_secs << endl;
return 0;
}
the result is:
3.098
Process returned 0 (0x0) execution time : 3.115 s
Press any key to continue.
Why does adding or subtracting take more time to run than an if statement with a variable declaration?
The difference is almost certainly due to compiler optimization. You’ll have to look at the assembly to make sure, but here is my take on what happens:
In the first example it’s trivial for the optimizer to realize that the bodies of the ifs have no effect. In each a variable local to the if is declared, assigned to and immediately destroyed. So the ifs get optimized away, leaving an empty for loop that gets optimized away as well.
The situation in the second example is not that trivial on the whole. What is trivial is that the body of the loop boils down to a single res++, which most likely will be further optimized to ++res. But because res is not local to the loop, the optimizer has to consider the whole main() function to realize that the loop has no effect. Most likely it fails to do so.
Conclusion: In their current form the measurements are meaningless. Disabling optimization also won’t help because you’ll never do that for a production build. If you really want to dive into this I suggest watching CppCon 2015: Chandler Carruth "Tuning C++: Benchmarks, and CPUs, and Compilers! Oh My!" for great advice on how to handle the optimizer in these kinds of situations.
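As a blunt illustration (not the technique from the talk), making the accumulator volatile forces every update to be performed, so the loop cannot be deleted. It also blocks optimizations a production build would apply, so the number it produces is an upper bound at best. The names Timed and timed_sum are made up for this sketch.

```cpp
#include <ctime>

// Result of one timed run: the computed sum and the processor time used.
struct Timed {
    long long value;
    double cpu_seconds;
};

// Sums 0 .. n-1 with a volatile accumulator so the loop must execute.
Timed timed_sum(int n) {
    const std::clock_t begin = std::clock();
    volatile long long res = 0;          // volatile: each update is observable
    for (int i = 0; i < n; ++i) {
        res = res + i;
    }
    const std::clock_t end = std::clock();
    const long long value = res;
    const double secs = static_cast<double>(end - begin) / CLOCKS_PER_SEC;
    return Timed{value, secs};
}
```

Returning the sum alongside the time also gives the caller something to print, which is another way of making the result count as used.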

Why is threaded code taking more time than sequential code in C++?

I want to multiply matrix[1000][1000] by vector[1000] using sequential code and threaded code, and then compare the time each one takes.
//use this command to run the program:-
//g++ -fopenmp -std=c++11 -O3 -o OMPmonti OMPmonti.cpp -lpthread
#include<iostream>
#include<fstream>
#include <ctime>
using namespace std;
#define SIZE 1000
int main()
{
std::clock_t start;
start = std::clock();
int MATRIX[SIZE][SIZE]={0};
int VECTOR[SIZE]={0};
int RESULT[SIZE]={0};
for(int i=0;i<SIZE;i++)
{
int x=i;
for(int j=0;j<SIZE;j++)
{
MATRIX[i][j]=x;
x=x+1;
}
VECTOR[i]=i;
}
for(int i=0;i<SIZE;i++)
for(int j=0;j<SIZE;j++)
RESULT[i]+=MATRIX[i][j]*VECTOR[j];
ofstream output("result.txt");
for(int i=0;i<SIZE;i++)
output<<RESULT[i]<<"\n";
output.close();
std::cout << "Time: " << (std::clock() - start) / (double)(CLOCKS_PER_SEC / 1000) << " ms" << std::endl;
return 0;
}
the thread code is :-
//use this command to run the program:-
//g++ -fopenmp -std=c++11 -O3 -o OMPmonti OMPmonti.cpp -lpthread
#include<iostream>
#include<pthread.h>
#include<fstream>
#include <ctime>
using namespace std;
#define SIZE 1000
int NUM_THREADS;
int MATRIX[SIZE][SIZE]={0};
int VECTOR[SIZE]={0};
int RESULT[SIZE]={0};
struct BOUNDARIES{
int START;
int END;
};
void *MUL_ROUTINE(void *PARAM)
{
BOUNDARIES *info= ( BOUNDARIES *) PARAM;
for(int i=info->START;i<=info->END;i++)
for(int j=0;j<SIZE;j++)
RESULT[i]+=MATRIX[i][j]*VECTOR[j];
pthread_exit(NULL);
}
int main()
{
std::clock_t start;
start = std::clock();
for(int i=0;i<SIZE;i++)
{
int x=i;
for(int j=0;j<SIZE;j++)
{
MATRIX[i][j]=x;
x=x+1;
}
VECTOR[i]=i;
}
NUM_THREADS=4;
pthread_t THREADS[NUM_THREADS];
BOUNDARIES info[NUM_THREADS];
int ret;
for(int i=0;i<NUM_THREADS;i++)
{
if(i==0)
info[i].START=0;
else
info[i].START=info[i-1].END+1;
info[i].END=info[i].START+(SIZE/NUM_THREADS-1);
if(i<(SIZE%NUM_THREADS))
info[i].END++;
ret=pthread_create(&THREADS[i],NULL,&MUL_ROUTINE,&info[i]);
if(ret)
{
cout<<"Error Creating Thread "<<i<<endl;
cout<<"Terminating The Program......"<<endl;
return 0;
}
}
for(int i=0;i<NUM_THREADS;i++)
pthread_join(THREADS[i],0);
ofstream output("result1.txt");
for(int i=0;i<SIZE;i++)
output<<RESULT[i]<<"\n";
output.close();
std::cout << "Time: " << (std::clock() - start) / (double)(CLOCKS_PER_SEC / 1000) << " ms" << std::endl;
return 0;
}
You are also measuring the time of the thread creation. Let's assume it takes 1.9 ms to create those 4 threads. After that, each thread performs its calculation, and after 0.625 ms every thread is done. That is 2.525 ms in total, so if the sequential version needs about 2.5 ms for the same calculation (4 times 0.625 ms), the threaded run comes out no faster.
To conclude the above example: those time measurements are unfair! You should not compare two pieces of code that do not do the same work.
You can try to start your clock after the loop that creates the threads. But you may realize that this is also unfair, because the first threads may already have started. So in either case you won't get the time it took to perform the calculation.
To see a benefit you should use a more expensive calculation that takes several seconds in the single-core case. Then the 2 ms that it takes to create those threads are insignificant.
From the man page of clock(3):
DESCRIPTION
The clock() function returns an approximation of processor time used by the
program.
You are measuring the CPU time that the execution of your two programs takes. Since there is always overhead related to communication and synchronization in threaded programming, the CPU time will never be lower than that of a sequential program.
But that mostly doesn't matter because the real time is lower - you're just not measuring it. To do that you could
#include <omp.h> and use the double omp_get_wtime() function
use the POSIX function clock_gettime (declared in <time.h>) with CLOCK_REALTIME, or with CLOCK_MONOTONIC when you only need an interval
use any of the clocks available in #include <chrono>
You have to know what you're measuring to be able to draw conclusions.
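The difference between the two kinds of time can be seen by taking both readings around the same piece of work. The sketch below is an illustration under assumptions: Timings, measure_both and nap_50ms are invented names, and the behaviour described applies where clock() reports processor time (as the man page quoted above says), which is not the case on every platform.

```cpp
#include <chrono>
#include <ctime>
#include <thread>

// Wall-clock and processor time for one run of some work.
struct Timings {
    double wall_seconds;
    double cpu_seconds;
};

// Runs work() once and measures it with both kinds of clock.
template <typename F>
Timings measure_both(F&& work) {
    const auto wall_start = std::chrono::steady_clock::now();
    const std::clock_t cpu_start = std::clock();
    work();
    const std::clock_t cpu_end = std::clock();
    const auto wall_end = std::chrono::steady_clock::now();
    const std::chrono::duration<double> wall = wall_end - wall_start;
    return Timings{wall.count(),
                   static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC};
}

// Example workload that uses almost no CPU but takes real time.
inline void nap_50ms() {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
}
```

For a sleeping workload like nap_50ms the wall time should be at least the sleep duration while the processor time stays small. For a multi-threaded computation the relation tends to go the other way: processor time accumulates across threads while wall time shrinks.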

clock() at two locations returning same number of ticks

I am trying to learn how to use clock(). Here is a piece of code that i have
int main()
{
srand(time(NULL));
clock_t t;
int num[100000];
int total=0;
t=clock();
cout<<"tick:"<<t<<endl;
for (int i=0;i<100000;i++)
{
num[i]=rand();
//cout<<num[i]<<endl;
}
for(int j=0;j<100000;j++)
{
total+=num[j];
}
t=clock();
cout<<"total:"<<total<<endl;
cout<<"ticks after loop:"<<t<<endl;
//std::cout<<"The number of ticks for the loop to caluclate total:"<<t<<"\t time is seconds:"<<((float)t)/CLOCKS_PER_SEC<<endl;
cin.get();
}
The two tick counts printed are the same. I don't understand why they are the same even though there are two big loops in between.
The clock() function has a finite resolution. On VC2013 it is once per millisecond. (Your system may vary.) If you call clock() twice in the same millisecond (or whatever the tick length is) you get the same value.
In <ctime> there is a constant CLOCKS_PER_SEC which tells you how many ticks there are per second. For VC2012 that is 1000.
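To see the effective resolution on a given system, one can spin until the value returned by clock() changes and look at the size of that step. This is a small sketch with an invented function name. It assumes clock() is available and keeps advancing while the loop runs.

```cpp
#include <ctime>

// Busy-waits until clock() returns a new value and reports the size
// of that step in seconds. Hypothetical helper for illustration.
double observed_clock_step_seconds() {
    const std::clock_t first = std::clock();
    std::clock_t next = first;
    while (next == first) {
        next = std::clock();             // spin until the tick count changes
    }
    return static_cast<double>(next - first) / CLOCKS_PER_SEC;
}
```

Anything that finishes in less time than the value this returns can show up as a difference of 0 ticks.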
** Update 1 **
You said you're in Windows. Here's some Win-specific code that gets higher resolution time. If I get time I'll try to do something portable.
#include <iostream>
#include <vector>
#include <ctime>
#include <Windows.h>
int main()
{
::srand(::time(NULL));
FILETIME ftStart, ftEnd;
const int nMax = 1000*1000;
std::vector<unsigned> vBuff(nMax);
int nTotal=0;
::GetSystemTimeAsFileTime(&ftStart);
for (int i=0;i<nMax;i++)
{
vBuff[i]=rand();
}
for(int j=0;j<nMax;j++)
{
nTotal+=vBuff[j];
}
::GetSystemTimeAsFileTime(&ftEnd);
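// Combine both 32-bit halves of each FILETIME; the low half alone is only
// correct for intervals shorter than about 7 minutes.
ULARGE_INTEGER uStart, uEnd;
uStart.LowPart = ftStart.dwLowDateTime;
uStart.HighPart = ftStart.dwHighDateTime;
uEnd.LowPart = ftEnd.dwLowDateTime;
uEnd.HighPart = ftEnd.dwHighDateTime;
// FILETIME counts 100 ns units, so 10000 units make one millisecond
double dElapsed = (uEnd.QuadPart - uStart.QuadPart) / 10000.0;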
std::cout << "Elapsed time = " << dElapsed << " millisec\n";
return 0;
}
** Update 2 **
Ok, here's the portable version.
#include <iostream>
#include <vector>
#include <cstdint> // uint64_t
#include <cstdlib> // rand, srand
#include <ctime>
#include <chrono>
// abbreviations to avoid long lines
typedef std::chrono::high_resolution_clock Clock_t;
typedef std::chrono::time_point<Clock_t> TimePoint_t;
typedef std::chrono::microseconds usec;
uint64_t ToUsec(Clock_t::duration t)
{
return std::chrono::duration_cast<usec>(t).count();
}
int main()
{
::srand(static_cast<unsigned>(::time(nullptr)));
const int nMax = 1000*1000;
std::vector<unsigned> vBuff(nMax);
int nTotal=0;
TimePoint_t tStart(Clock_t::now());
for (int i=0;i<nMax;i++)
{
vBuff[i]=rand();
}
for(int j=0;j<nMax;j++)
{
nTotal+=vBuff[j];
}
TimePoint_t tEnd(Clock_t::now());
uint64_t nMicroSec = ToUsec(tEnd - tStart);
std::cout << "Elapsed time = "
<< nMicroSec / 1000.0
<< " millisec\n";
return 0;
}
Strong suggestion:
Run the same benchmark, but try multiple, alternative methods. For example:
clock_gettime
/proc/pid/stat
GetProcessTimes
getrusage
Etc.
The problem with (POSIX-compliant) clock() is that it isn't necessarily accurate enough for meaningful benchmarks, depending on your compiler library and platform.
Timing has limited accuracy (perhaps only several milliseconds), and on Linux clock has been slightly improved in very recent libc. Finally, your loop is too small (a typical elementary C operation runs in less than a few nanoseconds). Make it bigger, e.g. do it a billion times. But then you should declare static int num[1000000000]; to avoid eating too much stack space.

Timing a for loop with clock

Hi so I am trying to do a program that sums 20 consecutive numbers and calculates the time that it took to do so... the problem is that when I run the program the time is always 0... any ideas?
this is what I have so far... thanks!
#include <iostream>
#include <time.h>
using namespace std;
int main()
{
int finish = 20;
int start = 1;
int result = 0;
double msecs;
clock_t init, end;
init = clock();
for (int i = start; i <= finish; i++)
{
result += i;
}
end = clock();
cout << ((float)(end - init)) *1000 / (CLOCKS_PER_SEC);
system ("PAUSE");
return 0;
}
No matter what technique you use for timing, they all have limited precision. This loop simply executes so fast that your timer isn't registering any time as having passed.
Aside #1: Use high_resolution_clock - maybe that will register something non-zero, probably not.
Aside #2: Don't name your variable null, in C++ that implies 0 or a null pointer
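A common workaround when the work is too fast for the timer is to repeat it until a measurable amount of time has passed, then divide by the number of repetitions. A minimal sketch follows. The name seconds_per_call, the batch size of 1000 and the default minimum of 0.01 seconds are arbitrary choices for illustration.

```cpp
#include <chrono>

// Repeats f() in batches until at least min_total_seconds have elapsed,
// then returns the average wall-clock time per call in seconds.
template <typename F>
double seconds_per_call(F&& f, double min_total_seconds = 0.01) {
    using Clock = std::chrono::steady_clock;
    long long calls = 0;
    const auto start = Clock::now();
    std::chrono::duration<double> elapsed(0.0);
    do {
        for (int i = 0; i < 1000; ++i) {
            f();
        }
        calls += 1000;
        elapsed = Clock::now() - start;
    } while (elapsed.count() < min_total_seconds);
    return elapsed.count() / calls;
}
```

The callable still needs a visible side effect (for example writing to a volatile variable), otherwise an optimizing compiler may remove the work being repeated.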
You can try this, but you need C++11 or later for <chrono>.
It can resolve times down to about 0.000001 seconds, depending on the platform.
#include <iostream>
#include <cstdlib> // system
#include <ratio>
#include <chrono>
//using namespace std;
int main()
{
using namespace std::chrono;
high_resolution_clock::time_point t1 = high_resolution_clock::now();
int finish = 20;
int start = 1;
int result = 0; // accumulator, must be declared before the loop
for (int i = start; i <= finish; i++)
{
result += i;
}
high_resolution_clock::time_point t2 = high_resolution_clock::now();
duration<double> time_span = duration_cast<duration<double>>(t2 - t1);
// cout and endl are qualified because "using namespace std" is commented out
std::cout << time_span.count() << " seconds" << std::endl;
system ("PAUSE");
return 0;
}