Timing a for loop with clock - c++

Hi, I am trying to write a program that sums 20 consecutive numbers and measures the time it took to do so. The problem is that when I run the program the reported time is always 0. Any ideas?
This is what I have so far... thanks!
#include <iostream>
#include <time.h>
using namespace std;

int main()
{
    int finish = 20;
    int start = 1;
    int result = 0;
    double msecs;
    clock_t init, end;

    init = clock();
    for (int i = start; i <= finish; i++)
    {
        result += i;
    }
    end = clock();

    cout << ((float)(end - init)) * 1000 / (CLOCKS_PER_SEC);

    system("PAUSE");
    return 0;
}

No matter what technique you use for timing, every clock has finite resolution. This loop simply executes so fast that your timer isn't registering any time as having passed.
Aside #1: Use high_resolution_clock - maybe that will register something non-zero, but probably not.
Aside #2: Don't name your variable null; in C++ that implies 0 or a null pointer.
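A common workaround (a minimal sketch, not from the original answers) is to repeat the summation many times and divide the total elapsed time by the repetition count. The repetition count below is arbitrary, and an aggressive optimizer may still fold the inner loop into a constant, so print the result and treat the numbers with care:
#include <iostream>
#include <chrono>

int main()
{
    const int repetitions = 1000000;   // arbitrary; raise it until the total is well above the clock resolution
    long long result = 0;

    auto t1 = std::chrono::high_resolution_clock::now();
    for (int r = 0; r < repetitions; ++r)
        for (int i = 1; i <= 20; ++i)
            result += i;
    auto t2 = std::chrono::high_resolution_clock::now();

    std::chrono::duration<double> total = t2 - t1;
    // Printing result keeps the compiler from discarding the loops entirely.
    std::cout << "result = " << result
              << ", average per pass: " << total.count() / repetitions << " s\n";
    return 0;
}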

You can try this, but you need C++11.
This can resolve times down to about 0.000001 seconds.
#include <iostream>
#include <chrono>
#include <cstdlib>

int main()
{
    using namespace std::chrono;

    high_resolution_clock::time_point t1 = high_resolution_clock::now();

    int finish = 20;
    int start = 1;
    int result = 0;
    for (int i = start; i <= finish; i++)
    {
        result += i;
    }

    high_resolution_clock::time_point t2 = high_resolution_clock::now();
    duration<double> time_span = duration_cast<duration<double>>(t2 - t1);
    std::cout << time_span.count() << " seconds" << std::endl;

    system("PAUSE");
    return 0;
}

Related

Does armadillo library slow down the execution of a matrix operations?

I've converted MATLAB code to C++ to speed it up, using the Armadillo library to handle the matrix operations, but surprisingly it is 10 times slower than the MATLAB code!
So I tested the Armadillo library to see if it's the cause. The code below is a simple test that initializes two matrices, adds them together and saves the result in a new matrix. One section of the code uses the Armadillo library and the other one doesn't. The section using Armadillo is much slower (notice the elapsed times).
Does Armadillo really slow down execution (even though it is supposed to speed it up), or am I missing something?
#include <iostream>
#include <math.h>
#include <chrono>
#include <armadillo>
using namespace std;
using namespace arma;

int main()
{
    auto start = std::chrono::high_resolution_clock::now();
    double a[100][100];
    double b[100][100];
    double c[100][100];
    for (int i = 0; i < 100; i++)
    {
        for (int j = 0; j < 100; j++)
        {
            a[i][j] = 1;
            b[i][j] = 1;
            c[i][j] = a[i][j] + b[i][j];
        }
    }
    auto finish = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> elapsed = finish - start;
    std::cout << "Elapsed time: " << elapsed.count() << " s\n";

    auto start1 = std::chrono::high_resolution_clock::now();
    mat a1 = ones(100,100);
    mat b1 = ones(100,100);
    mat c1(100,100);
    c1 = a1 + b1;
    auto finish1 = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> elapsed1 = finish1 - start1;
    std::cout << "Elapsed time: " << elapsed1.count() << " s\n";
    return 0;
}
Here is the answer I get:
Elapsed time: 5.1729e-05 s
Elapsed time: 0.00025536 s
As you see, Armadillo is significantly slower! Is it better not to use the Armadillo library?
First of all, make sure that the BLAS and LAPACK libraries are enabled; there are instructions in the Armadillo documentation.
The second thing is that it might be the more extensive memory allocation done by Armadillo. If you restructure your code to do the memory initialisation first, as in
#include <iostream>
#include <math.h>
#include <chrono>
#include <armadillo>
using namespace std;
using namespace arma;

int main()
{
    double a[100][100];
    double b[100][100];
    double c[100][100];
    mat a1 = ones(100,100);
    mat b1 = ones(100,100);
    mat c1(100,100);

    auto start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < 100; i++)
    {
        for (int j = 0; j < 100; j++)
        {
            a[i][j] = 1;
            b[i][j] = 1;
            c[i][j] = a[i][j] + b[i][j];
        }
    }
    auto finish = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> elapsed = finish - start;
    std::cout << "Elapsed time: " << elapsed.count() << " s\n";

    auto start1 = std::chrono::high_resolution_clock::now();
    c1 = a1 + b1;
    auto finish1 = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> elapsed1 = finish1 - start1;
    std::cout << "Elapsed time: " << elapsed1.count() << " s\n";
    return 0;
}
With this I got the result:
Elapsed time: 0.000647521 s
Elapsed time: 0.000353198 s
I compiled it with (in Ubuntu 17.10):
g++ prog.cpp -larmadillo
I believe the problem comes from not really using Armadillo at all. You only used it to create variables that are a bit more complex than plain C++ 2D arrays, but nothing more. What Armadillo can do for you is provide very fast matrix operations, as in c1 = a1 + b1;, without loops.
But if you instead write the operation element-wise, you are simply not using Armadillo. It's the same as using MATLAB for matrix multiplication but writing the multiplication loops yourself: you are not using MATLAB's libraries then!
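To make the distinction concrete (a minimal sketch, not part of the original answer), compare driving Armadillo matrices element by element with letting Armadillo evaluate the whole-matrix expression; only the second form goes through Armadillo's optimized routines:
#include <armadillo>

int main()
{
    const arma::uword n = 100;

    // Element-wise: arma::mat used like a plain 2D array -- no benefit from Armadillo.
    arma::mat a(n, n), b(n, n), c(n, n);
    for (arma::uword i = 0; i < n; ++i)
        for (arma::uword j = 0; j < n; ++j)
        {
            a(i, j) = 1.0;
            b(i, j) = 1.0;
            c(i, j) = a(i, j) + b(i, j);
        }

    // Whole-matrix expression: this is what Armadillo is designed for.
    arma::mat a1 = arma::ones(n, n);
    arma::mat b1 = arma::ones(n, n);
    arma::mat c1 = a1 + b1;

    // Use the results so the compiler cannot discard either computation.
    return (c(0, 0) + c1(0, 0) > 0.0) ? 0 : 1;
}
As in the answer above, compile with g++ prog.cpp -larmadillo.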

Execution time of a function in C++

I want to use several functions that declare the same array in different ways (statically, on the stack and on the heap) and display the execution time of each function. Finally I want to call those functions several times.
I think I've managed to do everything, but for the execution time of the functions I'm constantly getting 0, and I don't know if that is supposed to be normal. If somebody could confirm it for me, thanks.
Here's my code
#include "stdafx.h"
#include <iostream>
#include <time.h>
#include <stdio.h>
#include <chrono>
#define size 100000
using namespace std;
void prem(){
auto start = std::chrono::high_resolution_clock::now();
static int array[size];
auto finish = std::chrono::high_resolution_clock::now();
std::chrono::duration<double> elapsed = finish - start;
std::cout << "Elapsed timefor static: " << elapsed.count() << " s\n";
}
void first(){
auto start = std::chrono::high_resolution_clock::now();
int array[size];
auto finish = std::chrono::high_resolution_clock::now();
std::chrono::duration<double> elapsed = finish - start;
std::cout << "Elapsed time on the stack: " << elapsed.count() << " s\n";
}
void secon(){
auto start = std::chrono::high_resolution_clock::now();
int *array = new int[size];
auto finish = std::chrono::high_resolution_clock::now();
std::chrono::duration<double> elapsed = finish - start;
std::cout << "Elapsed time dynamic: " << elapsed.count() << " s\n";
delete[] array;
}
int main()
{
for (int i = 0; i <= 1000; i++){
prem();
first();
secon();
}
return 0;
}
prem() - the static array is allocated before the function ever runs, not inside it
first() - the array is allocated on the stack before your timing code gets to it
You are looping over all 3 functions in a single loop. Why? Didn't you mean to loop 1000 times over each one separately, so that they (hopefully) don't affect each other? In practice they may still affect each other, though.
My suggestions:
Loop over each function separately
Do the now() calls for the entire 1000 iterations: call now() before you enter the loop and after you exit it, then take the difference and divide it by the number of iterations (1000)
Dynamic allocation can be (trivially) reduced to just grabbing a block of address space from the vast range available (I assume you are running on a 64-bit platform), and unless you actually use that memory the OS doesn't even need to back it with RAM. That can skew your results significantly; see the sketch after the driver example below
Write a "driver" function that takes a pointer to the function to test
Possible implementation of that driver() function:
void driver( void(*_f)(), int _iter, std::string _name){
    auto start = std::chrono::high_resolution_clock::now();
    for(int i = 0; i < _iter; ++i){
        _f();
    }
    auto finish = std::chrono::high_resolution_clock::now();
    std::chrono::duration<double> elapsed = finish - start;
    std::cout << "Elapsed time " << _name << ": " << elapsed.count() / _iter << " s" << std::endl;
}
That way your main() looks like this:
int main(){
    const int iterations = 1000;
    driver(prem,  iterations, "static allocation");
    driver(first, iterations, "stack allocation");
    driver(secon, iterations, "dynamic allocation");
    return 0;
}
Do not do such synthetic tests, because the compiler will optimize out everything that is not used.
As another answer suggests, you need to measure the time for the entire 1000 loops. Even then, I do not think you will get meaningful results.
Let's do not 1000 iterations but 1,000,000, and let's add another case where we just do two subsequent calls to chrono::high_resolution_clock::now() as a baseline:
#include <iostream>
#include <time.h>
#include <stdio.h>
#include <chrono>
#include <string>
#include <functional>

#define size 100000

using namespace std;

void prem() {
    static int array[size];
}

void first() {
    int array[size];
}

void second() {
    int *array = new int[size];
    delete[] array;
}

void PrintTime(std::chrono::duration<double> elapsed, int count, std::string msg)
{
    std::cout << msg << elapsed.count() / count << " s\n";
}

int main()
{
    int iterations = 1000000;
    {
        auto start = std::chrono::high_resolution_clock::now();
        auto finish = std::chrono::high_resolution_clock::now();
        PrintTime(finish - start, iterations, "Elapsed time for nothing: ");
    }
    {
        auto start = std::chrono::high_resolution_clock::now();
        for (int i = 0; i <= iterations; i++)
        {
            prem();
        }
        auto finish = std::chrono::high_resolution_clock::now();
        PrintTime(finish - start, iterations, "Elapsed timefor static: ");
    }
    {
        auto start = std::chrono::high_resolution_clock::now();
        for (int i = 0; i <= iterations; i++)
        {
            first();
        }
        auto finish = std::chrono::high_resolution_clock::now();
        PrintTime(finish - start, iterations, "Elapsed time on the stack: ");
    }
    {
        auto start = std::chrono::high_resolution_clock::now();
        for (int i = 0; i <= iterations; i++)
        {
            second();
        }
        auto finish = std::chrono::high_resolution_clock::now();
        PrintTime(finish - start, iterations, "Elapsed time dynamic: ");
    }
    return 0;
}
With all optimisations on, I get this result:
Elapsed time for nothing: 3.11e-13 s
Elapsed timefor static: 3.11e-13 s
Elapsed time on the stack: 3.11e-13 s
Elapsed time dynamic: 1.88703e-07 s
That basically means that the compiler actually optimized out prem() and first(): not just the calls, but the entire loops, because they have no side effects.
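If you want such a micro-benchmark to survive optimization, you have to give the work an observable effect. A minimal sketch (not part of the original answer, and GCC/Clang-specific): pass the array's address through an empty asm block with a memory clobber, which is the same trick benchmark libraries use in their DoNotOptimize helpers:
#include <cstddef>

constexpr std::size_t kSize = 100000;

// GCC/Clang-specific escape hatch: the empty asm block with a "memory"
// clobber tells the optimizer the pointed-to memory may be inspected,
// so the stores feeding it cannot be removed.
inline void clobber(void* p) {
    asm volatile("" : : "g"(p) : "memory");
}

void first_used() {
    int array[kSize];
    for (std::size_t i = 0; i < kSize; ++i)
        array[i] = static_cast<int>(i);
    clobber(array);   // the loop now has an observable effect and is kept
}

int main() {
    first_used();
    return 0;
}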

C++ Unhandled exception for large vector/array

I keep getting an unhandled exception in my code and it has me stumped.
I am sure it is in the way I have my variables declared.
Basically I am attempting to create 3 arrays, M rows, N columns of random variables.
If I set my N = 1,000 and M = 10,000, not a problem.
If I then change M = 100,000 I get an Unhandled exception memory allocation error.
Can someone please help me understand why this is happening.
Parts of the code were written in VS2010. I have now moved on to VS2013, so any additional advice on the usage of newer functions would also be appreciated.
cheers,
#include <cmath>
#include <iostream>
#include <random>
#include <vector>
#include <ctime>
#include <ratio>
#include <chrono>

int main()
{
    using namespace std::chrono;
    steady_clock::time_point Start_Time = steady_clock::now();

    unsigned int N;      // Number of time steps in a simulation
    unsigned long int M; // Number of simulations (paths)
    N = 1000;
    M = 10000;

    // Random number generation setup
    double RANDOM;
    srand((unsigned int)time(NULL));                          // Generator loop reset
    std::default_random_engine generator(rand());             // Seed with rand()
    std::normal_distribution<double> distribution(0.0, 1.0);  // Mean = 0.0, Variance = 1.0, i.e. standard normal

    std::vector<std::vector<double>> RandomVar_A(M, std::vector<double>(N)); // dw
    std::vector<std::vector<double>> RandomVar_B(M, std::vector<double>(N)); // uncorrelated dz
    std::vector<std::vector<double>> RandomVar_C(M, std::vector<double>(N)); // dz

    // Generate random variables for dw
    for (unsigned long int i = 0; i < M; i++)
    {
        for (unsigned int j = 0; j < N; j++)
        {
            RANDOM = distribution(generator);
            RandomVar_A[i][j] = RANDOM;
        }
    }

    // Generate random variables for uncorrelated dz
    for (unsigned long int i = 0; i < M; i++)
    {
        for (unsigned int j = 0; j < N; j++)
        {
            RANDOM = distribution(generator);
            RandomVar_B[i][j] = RANDOM;
        }
    }

    // Generate random variables for dz
    for (unsigned long int i = 0; i < M; i++)
    {
        for (unsigned int j = 0; j < N; j++)
        {
            RANDOM = distribution(generator);
            RandomVar_C[i][j] = RANDOM;
        }
    }

    steady_clock::time_point End_Time = steady_clock::now();
    duration<double> time_span = duration_cast<duration<double>>(End_Time - Start_Time);

    // Clear matrices
    RandomVar_A.clear();
    RandomVar_B.clear();
    RandomVar_C.clear();

    std::cout << std::endl;
    std::cout << "its done";
    std::cout << std::endl << std::endl;
    std::cout << "Time taken : " << time_span.count() << " Seconds" << std::endl << std::endl;
    std::cout << "End Of Program" << std::endl << std::endl;

    system("pause");
    return 0;
}
// *************** END OF PROGRAM ***************
Three 100,000 x 1,000 arrays of doubles represent 300 million doubles. With 8-byte doubles, that's 2.4 GB (about 2.2 GiB) of memory. Most likely your process is limited to 2 GB by default on Windows (even if you have much more RAM installed on the machine). However, there are ways to allow your process to access a larger address space: Memory Limits for Windows.
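To make that arithmetic concrete (a sketch, not part of the original answer), you can compute the footprint up front and catch the allocation failure instead of letting it propagate as an unhandled exception:
#include <cstddef>
#include <iostream>
#include <new>
#include <vector>

int main()
{
    const std::size_t M = 100000, N = 1000, arrays = 3;
    const double bytes = static_cast<double>(M) * N * arrays * sizeof(double);
    std::cout << "Approximate footprint: "
              << bytes / (1024.0 * 1024.0 * 1024.0) << " GiB\n";   // roughly 2.2 GiB

    try {
        std::vector<std::vector<double>> RandomVar_A(M, std::vector<double>(N));
    } catch (const std::bad_alloc& e) {
        std::cerr << "Allocation failed: " << e.what() << "\n";
        return 1;
    }
    return 0;
}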
I experienced something similar when my 32-bit application allocated more than 2 GB of memory.
Your vectors require over 2 GB of memory, so it might be the same problem.
Try changing the platform of your application to x64. This may solve the problem.

Timing the Thrash

I feel confident that my nested for loops will thrash memory, but I would like to know how long it takes. I'm assuming time.h can help, but I don't know which functions to use or how to display the result. Can someone help?
I have updated my code with the suggestions made and I believe it worked. I got a slow output of 4 (thrashTime). Is this in seconds? Also, perhaps my method could be refactored. I take a time before and after the nested loops.
// Updated
#include <iostream>
#include <time.h>
using namespace std;

int array[1 << 14][1 << 14];

int main() {
    time_t beforeThrash = 0;
    time_t afterThrash = 0;
    time_t thrashTime;
    int i, j;

    beforeThrash = time(NULL);
    for (i = 0; i < 16384; i++)
        for (j = 0; j < 16384; j++)
            array[i][j] = i * j;
    afterThrash = time(NULL);

    thrashTime = afterThrash - beforeThrash;
    cout << thrashTime << endl;

    system("pause");
    return 0;
}
You can just follow the instructions in time and clock, as Joe Z mentioned.
A quick demo for printing current time:
#include <ctime>
time_t start = time(0);
const char* tstart = ctime(&start);
// std::cout << tstart; will give you local time Fri Dec 06 11:53:46 2013
For time difference:
#include <ctime>
clock_t t = clock();
do_something();
t = clock() - t;
// std::cout << (float)t / CLOCKS_PER_SEC; will give you elapsed time in seconds
You can simply replace do_something(); with the operations to be measured.
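Applied to the code in the question (a sketch, not part of the original answer): time() only has one-second resolution, so the output of 4 really does mean about four seconds. clock() gives finer granularity, though note that it reports CPU time on POSIX systems and wall-clock time on Windows:
#include <ctime>
#include <iostream>

int array[1 << 14][1 << 14];   // same 16384 x 16384 global array as the question

int main()
{
    std::clock_t t = std::clock();
    for (int i = 0; i < 16384; ++i)
        for (int j = 0; j < 16384; ++j)
            array[i][j] = i * j;
    t = std::clock() - t;

    // clock() counts ticks; divide by CLOCKS_PER_SEC to get seconds.
    std::cout << static_cast<double>(t) / CLOCKS_PER_SEC << " seconds\n";
    return 0;
}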

Why is accumulate faster than a simple for cycle?

I was testing algorithms and ran into this weird behavior, where std::accumulate is faster than a simple for cycle.
Looking at the generated assembler I'm not much wiser :-) It seems that the for cycle is optimized into MMX instructions, while accumulate expands into a loop.
This is the code. The behavior manifests at the -O3 optimization level with gcc 4.7.1:
#include <vector>
#include <chrono>
#include <iostream>
#include <random>
#include <algorithm>
#include <numeric>
using namespace std;

int main()
{
    const size_t vsize = 100*1000*1000;
    vector<int> x;
    x.reserve(vsize);

    mt19937 rng;
    rng.seed(chrono::system_clock::to_time_t(chrono::system_clock::now()));
    uniform_int_distribution<uint32_t> dist(0,10);
    for (size_t i = 0; i < vsize; i++)
    {
        x.push_back(dist(rng));
    }

    long long tmp = 0;
    for (size_t i = 0; i < vsize; i++)
    {
        tmp += x[i];
    }
    cout << "dry run " << tmp << endl;

    auto start = chrono::high_resolution_clock::now();
    long long suma = accumulate(x.begin(),x.end(),0);
    auto end = chrono::high_resolution_clock::now();
    cout << "Accumulate runtime " << chrono::duration_cast<chrono::nanoseconds>(end-start).count() << " - " << suma << endl;

    start = chrono::high_resolution_clock::now();
    suma = 0;
    for (size_t i = 0; i < vsize; i++)
    {
        suma += x[i];
    }
    end = chrono::high_resolution_clock::now();
    cout << "Manual sum runtime " << chrono::duration_cast<chrono::nanoseconds>(end-start).count() << " - " << suma << endl;
    return 0;
}
When you pass the 0 to accumulate, you are making it accumulate using an int instead of a long long.
If you code your manual loop like this, it will be equivalent:
int sumb = 0;
for (size_t i = 0; i < vsize; i++)
{
    sumb += x[i];
}
suma = sumb;
or you can call accumulate like this:
long long suma = accumulate(x.begin(),x.end(),0LL);
I have some different results using Visual Studio 2012
// original code
Accumulate runtime 93600 ms
Manual sum runtime 140400 ms
Note that the original std::accumulate code isn't equivalent to the for loop because the third parameter to std::accumulate is an int 0 value. It performs the summation using an int and only at the end stores the result in a long long. Changing the third parameter to 0LL forces the algorithm to use a long long accumulator and results in the following times.
// change std::accumulate initial value -> 0LL
Accumulate runtime 265200 ms
Manual sum runtime 140400 ms
Since the final result fits in an int I changed suma and std::accumulate back to using only int values. After this change the MSVC 2012 compiler was able to auto-vectorize the for loop and resulted in the following times.
// change suma from long long to int
Accumulate runtime 93600 ms
Manual sum runtime 46800 ms
After fixing the accumulate issue others noted I tested with both Visual Studio 2008 & 2010 and accumulate was indeed faster than the manual loop.
Looking at the disassembly I saw some additional iterator checking being done in the manual loop so I switched to just a raw array to eliminate it.
Here's what I ended up testing with:
#include <Windows.h>
#include <iostream>
#include <numeric>
#include <stdlib.h>

int main()
{
    const size_t vsize = 100*1000*1000;
    int* x = new int[vsize];
    for (size_t i = 0; i < vsize; i++) x[i] = rand() % 1000;

    LARGE_INTEGER start, stop;
    long long suma = 0, sumb = 0, timea = 0, timeb = 0;

    QueryPerformanceCounter( &start );
    suma = std::accumulate(x, x + vsize, 0LL);
    QueryPerformanceCounter( &stop );
    timea = stop.QuadPart - start.QuadPart;

    QueryPerformanceCounter( &start );
    for (size_t i = 0; i < vsize; ++i) sumb += x[i];
    QueryPerformanceCounter( &stop );
    timeb = stop.QuadPart - start.QuadPart;

    std::cout << "Accumulate: " << timea << " - " << suma << std::endl;
    std::cout << "      Loop: " << timeb << " - " << sumb << std::endl;

    delete [] x;
    return 0;
}
Accumulate: 633942 - 49678806711
Loop: 292642 - 49678806711
Using this code, the manual loop easily beats accumulate. The big difference is that the compiler unrolled the manual loop 4 times; otherwise the generated code is almost identical.
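For illustration (a hand-written sketch, not the compiler's actual output), a 4-way unrolled version of the manual loop looks roughly like this; the four independent partial sums shorten the loop-carried dependency chain:
#include <cstddef>

// Hypothetical 4-way unrolled summation over a raw int array of length n,
// assuming n is divisible by 4 for brevity.
long long sum_unrolled4(const int* x, std::size_t n)
{
    long long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    for (std::size_t i = 0; i < n; i += 4) {
        s0 += x[i];
        s1 += x[i + 1];
        s2 += x[i + 2];
        s3 += x[i + 3];
    }
    return s0 + s1 + s2 + s3;
}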