Avoiding CPU Contention - C++

I have a program whose execution time I want to measure:
#include <iostream>
#include <boost/chrono.hpp>

using namespace std;

int main(int argc, char* const argv[])
{
    boost::chrono::system_clock::time_point start = boost::chrono::system_clock::now();

    // Instructions to burn time

    boost::chrono::duration<double> sec = boost::chrono::system_clock::now() - start;
    cout << "---- time execution is " << sec.count() << ";";
    return 0;
}
For example the result after one run:
---- time execution is 0.0223588
This result isn't very meaningful because it includes time lost to CPU contention (other processes competing for the CPU).
My idea to reduce the effect of CPU contention is to run the test many times and take the average.
The problem is:
How can I store the time value of the previous run?
Can we do that via a file?
How to incrementally calculate the average after each run?
Your suggestions / pseudocode are welcome.

You may pass the running average via the command line using 2 args: the current average value and the number of iterations performed.
Then:
NewAverage = ((CurrentAverage*N) + CurrentValue) / (N+1);
where N is the number of iterations.
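A minimal sketch of that scheme, using std::chrono rather than Boost. The two extra command-line arguments and their format are an assumption; a file would work the same way (read the two numbers at startup, write the updated pair before exit):

#include <chrono>
#include <cstdlib>
#include <iostream>

int main(int argc, char* argv[])
{
    // Previous state passed on the command line: current average and run count.
    double currentAverage = (argc > 1) ? std::atof(argv[1]) : 0.0;
    long   n              = (argc > 2) ? std::atol(argv[2]) : 0;

    auto start = std::chrono::steady_clock::now();
    // ... instructions to burn time ...
    std::chrono::duration<double> sec = std::chrono::steady_clock::now() - start;

    // Incremental average: no need to store every previous value.
    double newAverage = (currentAverage * n + sec.count()) / (n + 1);

    // Print the new state; the caller feeds it back in on the next run.
    std::cout << newAverage << " " << (n + 1) << std::endl;
    return 0;
}

You would run it once with no arguments, then feed its printed average and count back in as the two arguments on each subsequent run, for example from a shell loop.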

Related

Chrono high_resolution_clock giving inconsistent times?

So I have a program that evaluates a polynomial in two different ways: Horner's method and a naive method. I'm trying to see their run times respectively, but depending on which order I place the function calls in, their times change. For example, if I place the Horner method first, it takes longer. I then tried with the naive method first, and then it takes longer. The Horner method should be much, much faster since it only has one loop, where the naive method has a nested loop. So I figured it must be the way I'm using the clocks from the chrono library. I tried both the high_resolution_clock and system_clock, but the same thing happens. Any help/comments are welcomed.
#include <cstdlib>
#include <iostream>
#include <chrono>
#include "Polynomial.h"

int main(int argc, char** argv) {
    double c[5] = {5, 0, -3, 1, -8};
    int degree = 4;
    Polynomial obj(c, degree);

    auto start = std::chrono::high_resolution_clock::now();
    std::cout << "Horner Evaluation: " << obj.hornerEval(-2) << ", ";
    auto elapsed = std::chrono::high_resolution_clock::now() - start;
    auto duration = std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed).count();
    std::cout << duration << " nanoseconds " << std::endl;

    auto start2 = std::chrono::high_resolution_clock::now();
    std::cout << "Naive Evaluation: " << obj.naiveEval(-2) << ", ";
    auto elapsed2 = std::chrono::high_resolution_clock::now() - start2;
    auto duration2 = std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed2).count();
    std::cout << duration2 << " nanoseconds " << std::endl;
}
You didn't post all the code, but from the description it looks like a caching effect.
When the first method runs, the CPU cache is cold (the data from memory has not yet been loaded into the CPU cache), so it takes more time to process (memory is slow compared to cache).
When the second method is called, it has all (or most, depending on data size) of the data already available in the cache - the cache is hot.
Solution - call both methods outside the timed part first to warm up the cache, then do the measurements.
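For example, a rough sketch of that warm-up inside the main() from the question (assuming the Polynomial class declares hornerEval and naiveEval as used there):

// Warm-up: call both methods once, untimed, so the first timed call
// does not pay the cold-cache cost.
obj.hornerEval(-2);
obj.naiveEval(-2);

// Now measure each call on a warm cache.
auto start = std::chrono::high_resolution_clock::now();
auto horner = obj.hornerEval(-2);
auto hornerNs = std::chrono::duration_cast<std::chrono::nanoseconds>(
                    std::chrono::high_resolution_clock::now() - start).count();

start = std::chrono::high_resolution_clock::now();
auto naive = obj.naiveEval(-2);
auto naiveNs = std::chrono::duration_cast<std::chrono::nanoseconds>(
                    std::chrono::high_resolution_clock::now() - start).count();

// Print the results after timing so the stream I/O is not inside the measured region.
std::cout << "Horner: " << horner << " in " << hornerNs << " ns, "
          << "naive: " << naive << " in " << naiveNs << " ns" << std::endl;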
Like one of the previous responses already said, it's probably something with the cache; the prefetcher can maybe better determine which memory to load into the cache in the naiveEval method. Here is a talk about benchmarking C++ code for further information, for example on the effect of cold starts on benchmarking: https://www.youtube.com/watch?v=zWxSZcpeS8Q

<time.h> / <ctime> are not counting ticks

EDIT: It appears to be functioning now. The code has been updated to show my revisions. Thank you all for your help.
I imagine I'm just stupid, but I'm attempting to use ctime to count CPU ticks through my entire program. I'm writing an encryption algorithm for a school project and I'm trying to include a timer so that I can add noise processes, equalizing the amount of time among different key/plaintext combinations.
Here is a little test for ctime:
#include <iostream>
#include <string>
#include <ctime>

int main (int argc, char **argv)
{
    double elapsedTime;
    const clock_t start = clock();

    int uselessInt = 0;
    for (int i = 0; i <= 200; i++)
    {
        uselessInt = uselessInt * 2 / 3 + i;
        std::cout << uselessInt << std::endl;
    }

    clock_t end = clock();
    elapsedTime = static_cast<double>(end - start);
    std::cout << elapsedTime << " CPU ticks have elapsed since this application's initiation." << std::endl;
    return 0;
}
which prints:
0
1
2
4
/* ... long list of numbers ... */
591
594
0 CPU ticks have elapsed since this application's initiation.
[smalltock#localhost Desktop]$
I am using GCC (G++) and it appears that ctime/time.h simply isn't counting ticks like I want it to. Can anybody identify the problem? I'm a relative amateur in this language.
My two cents. When you do cin.get(), it waits for you to input something on the console; did you type anything or simply press enter?
I ran your code without typing any text, simply pressing enter, and it gave the following output:
Test Text
It's a stone, Luigi... you didn't make it.
0 CPU ticks have elapsed since this application's initiation.
Real 0m0.700s
User 0m0.000s
Sys 0m0.061s
It may be because the resolution of clock() (governed by CLOCKS_PER_SEC) is rather coarse compared to the CPU time actually used by your program.
Meanwhile, there is a syntax error in the duration line: you either missed a closing ) or should delete the first (.
BTW:
Real is wall clock time - time from start to finish of the call.
User is the amount of CPU time spent in user-mode code (outside the kernel) within the process. This is only actual CPU time used in executing the process.
Sys is the amount of CPU time spent in the kernel within the process.
So you basically have 0 CPU time since you keep waiting for I/O; there is no CPU computation.
elapsedTime in your program is a measure of time in seconds, not a count of clock ticks. If you want ticks, use duration.
Since your program (presumably) spends the vast majority of its time blocked on I/O, not very many seconds are going to have gone by.
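For reference, a small sketch of the pattern these answers point at: keep the timed region CPU-bound (no console I/O inside it), give it enough work to exceed clock()'s resolution, and divide by CLOCKS_PER_SEC only when seconds are wanted. The loop bound here is an arbitrary choice:

#include <ctime>
#include <iostream>

int main()
{
    const clock_t start = clock();

    // CPU-bound busy work, with no I/O inside the timed region.
    volatile long long sum = 0;
    for (long long i = 0; i < 50000000; ++i)
        sum = sum + i % 7;

    const clock_t end = clock();

    std::cout << (end - start) << " CPU ticks ("
              << static_cast<double>(end - start) / CLOCKS_PER_SEC
              << " seconds of CPU time)" << std::endl;
    return 0;
}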

C++ Trouble with ctime's clock() method in VS2010 under VMWare Fusion/Boot Camp

I'm having trouble getting anything useful from the clock() method in the ctime library in particular situations on my Mac. Specifically, if I'm trying to run VS2010 in Windows 7 under either VMWare Fusion or on Boot Camp, it always seems to return the same value. Some test code to test the issue:
#include <time.h>
#include <iostream>

using namespace std;

// Calculate the factorial of n recursively.
unsigned long long recursiveFactorial(int n) {
    // Define the base case.
    if (n == 1) {
        return n;
    }
    // To handle other cases, call self recursively.
    else {
        return (n * recursiveFactorial(n - 1));
    }
}

int main() {
    int n = 60;
    unsigned long long result;
    clock_t start, stop;

    // Mark the start time.
    start = clock();

    // Calculate the factorial of n.
    result = recursiveFactorial(n);

    // Mark the end time.
    stop = clock();

    // Output the result of the factorial and the elapsed time.
    cout << "The factorial of " << n << " is " << result << endl;
    cout << "The calculation took " << ((double) (stop - start) / CLOCKS_PER_SEC) << " seconds." << endl;
    return 0;
}
Under Xcode 4.3.3, the function executes in about 2 μs.
Under Visual Studio 2010 in a Windows 7 virtual machine (under VMWare Fusion 4.1.3), the same code gives an execution time of 0; this machine is given 2 of the Mac’s 4 cores and 2GB RAM.
Under Boot Camp running Windows 7, again I get an execution time of 0.
Is this a question of being "too far from the metal"?
It could be that the resolution of the timer is not as high under the virtual machine. Also, the compiler can easily convert the tail recursion into a loop, and 60 multiplications don't tend to take a terribly long time. Try computing something significantly more costly, like Fibonacci numbers (recursively, of course), and you should see the timer register a nonzero time.
From time.h included with MSVC,
#define CLOCKS_PER_SEC 1000
which means clock() only has a resolution of 1 millisecond when using the Visual C++ runtime libraries, so any set of operations that takes less than that will almost always be measured as having zero time elapsed.
For higher-resolution timing on Windows, check out QueryPerformanceCounter and this sample code.
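A minimal sketch of the QueryPerformanceCounter approach (Windows-only), wrapping the recursiveFactorial call from the question:

#include <windows.h>
#include <iostream>

// recursiveFactorial as defined in the question.
unsigned long long recursiveFactorial(int n) {
    return (n == 1) ? 1 : n * recursiveFactorial(n - 1);
}

int main() {
    LARGE_INTEGER frequency, begin, end;
    QueryPerformanceFrequency(&frequency);   // counter ticks per second

    QueryPerformanceCounter(&begin);
    unsigned long long result = recursiveFactorial(60);
    QueryPerformanceCounter(&end);

    double seconds = static_cast<double>(end.QuadPart - begin.QuadPart) / frequency.QuadPart;
    std::cout << "Result " << result << " computed in " << seconds << " seconds" << std::endl;
    return 0;
}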

Timing woes with clock_gettime() CUDA

I wanted to write a CUDA program where I could see firsthand the benefits that CUDA offers for speeding up applications.
Here is a CUDA code I have written using Thrust ( http://code.google.com/p/thrust/ ).
Briefly, all that the code does is create two 2^23-element integer vectors, one on the host and one on the device, identical to each other, and sort them. It also (attempts to) measure the time taken for each.
On the host vector I use std::sort. On the device vector I use thrust::sort.
For compilation I used
nvcc sortcompare.cu -lrt
The output of the program at the terminal is
Desktop: ./a.out
Host Time taken is: 19 . 224622882 seconds
Device Time taken is: 19 . 321644143 seconds
Desktop:
The first std::cout statement is produced after 19.224 seconds, as stated. Yet the second std::cout statement (even though it says 19.32 seconds) is produced immediately after the first one. Note that I have used different timespec structs for the measurements in clock_gettime(), namely ts_host and ts_device.
I am using CUDA 4.0 and an NVIDIA GTX 570 (compute capability 2.0).
#include <iostream>
#include <vector>
#include <algorithm>
#include <stdlib.h>

// For timings
#include <time.h>

// Necessary Thrust headers
#include <thrust/sort.h>
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/copy.h>

int main(int argc, char *argv[])
{
    int N = 23;
    thrust::host_vector<int> H(1 << N);     // create a vector of 2^N elements on host
    thrust::device_vector<int> D(1 << N);   // the same on the device
    thrust::host_vector<int> dummy(1 << N); // copy D to dummy from the GPU after sorting

    // Set the host_vector elements.
    for (int i = 0; i < H.size(); ++i) {
        H[i] = rand(); // set the host vector element to a pseudo-random number
    }

    // Sort the host_vector. Measure time.
    // Reset the clock
    timespec ts_host;
    ts_host.tv_sec = 0;
    ts_host.tv_nsec = 0;
    clock_settime(CLOCK_PROCESS_CPUTIME_ID, &ts_host); // Start clock
    thrust::sort(H.begin(), H.end());
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts_host); // Stop clock
    std::cout << "\nHost Time taken is: " << ts_host.tv_sec << " . " << ts_host.tv_nsec << " seconds" << std::endl;

    D = H; // Set the device vector elements equal to the host_vector

    // Sort the device vector. Measure time.
    timespec ts_device;
    ts_device.tv_sec = 0;
    ts_device.tv_nsec = 0;
    clock_settime(CLOCK_PROCESS_CPUTIME_ID, &ts_device); // Start clock
    thrust::sort(D.begin(), D.end());
    thrust::copy(D.begin(), D.end(), dummy.begin());
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts_device); // Stop clock
    std::cout << "\nDevice Time taken is: " << ts_device.tv_sec << " . " << ts_device.tv_nsec << " seconds" << std::endl;

    return 0;
}
You are not checking the return value of clock_settime. I would guess it is failing, probably with errno set to EPERM or EINVAL. Read the documentation and always check your return values!
If I'm right, you are not resetting the clock as you think you are, hence the second timing is cumulative with the first, plus some extra stuff you don't intend to count at all.
The right way to do this is to call clock_gettime only, storing the result first, doing the computation, then subtracting the original time from the end time.
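A sketch of that pattern applied to the host sort from the question (the same wrapping works around the device sort plus the copy back):

// Take a timestamp before and after, and subtract - no clock_settime needed.
timespec t0, t1;
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &t0);

thrust::sort(H.begin(), H.end());

clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &t1);

// Combine the seconds and nanoseconds fields into one floating-point value.
double host_seconds = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
std::cout << "Host sort took " << host_seconds << " seconds" << std::endl;

CLOCK_MONOTONIC in place of CLOCK_PROCESS_CPUTIME_ID would measure wall-clock time instead, if that is what you are after.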

C++ Keeping track of how many seconds have passed since start of program

I am writing a program that will be used on a Solaris machine. I need a way of keeping track of how many seconds have passed since the start of the program. I'm talking very simple here. For example, I would have an int seconds = 0; but how would I go about updating the seconds variable as each second passes?
It seems that some of the various time functions that I've looked at only work on Windows machines, so I'm just not sure.
Any suggestions would be appreciated.
Thanks for your time.
A very simple method:
#include <time.h>
time_t start = time(0);
double seconds_since_start = difftime( time(0), start);
The main drawback to this is that you have to poll for the updates. You'll need platform support or some other lib/framework to do this on an event basis.
Use std::chrono.
#include <chrono>
#include <iostream>

int main(int argc, char *argv[])
{
    auto start_time = std::chrono::high_resolution_clock::now();

    auto current_time = std::chrono::high_resolution_clock::now();

    std::cout << "Program has been running for "
              << std::chrono::duration_cast<std::chrono::seconds>(current_time - start_time).count()
              << " seconds" << std::endl;
    return 0;
}
If you only need a resolution of seconds, then std::steady_clock should be sufficient.
You are approaching it backwards. Instead of having a variable you have to worry about updating every second, just initialize a variable on program start with the current time, and then whenever you need to know how many seconds have elapsed, you subtract the now current time from that initial time. Much less overhead that way, and no need to nurse some timing related variable update.
#include <stdio.h>
#include <time.h>
#include <windows.h>

using namespace std;

void wait(int seconds);

int main()
{
    time_t start, end;
    double diff;

    time(&start); // useful call
    for (int i = 0; i < 10; i++) // this loop is useless, just to pass some time
    {
        printf("%s\n", ctime(&start));
        wait(1);
    }
    time(&end); // useful call

    diff = difftime(end, start); // this will give you the time spent between those two calls
    printf("difference in seconds=%f", diff); // convert secs as you like

    system("pause");
    return 0;
}

void wait(int seconds)
{
    clock_t endwait;
    endwait = clock() + seconds * CLOCKS_PER_SEC;
    while (clock() < endwait) {}
}
This should work fine on Solaris/Unix also; just remove the Windows references (the <windows.h> include and the system("pause") call).
You just need to store the date/time when the application started. Whenever you need to display how long your program has been running, get the current date/time and subtract the time when the application started.
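A small sketch of that approach with std::chrono::steady_clock (mentioned in an earlier answer), which does not jump if the system clock is adjusted; the sleep is only a stand-in for real work:

#include <chrono>
#include <iostream>
#include <thread>

int main()
{
    // Record the start time once, at program start.
    const auto program_start = std::chrono::steady_clock::now();

    // ... the rest of the program runs here; simulate some work:
    std::this_thread::sleep_for(std::chrono::seconds(2));

    // Whenever the elapsed time is needed, subtract the stored start time.
    auto elapsed = std::chrono::duration_cast<std::chrono::seconds>(
            std::chrono::steady_clock::now() - program_start);
    std::cout << "Program has been running for " << elapsed.count() << " seconds" << std::endl;
    return 0;
}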