C++, Timer, Milliseconds

#include <iostream>
#include <conio.h>
#include <ctime>
using namespace std;

double diffclock(clock_t clock1, clock_t clock2)
{
    double diffticks = clock1 - clock2;
    double diffms = diffticks / (CLOCKS_PER_SEC / 1000);
    return diffms;
}

int main()
{
    clock_t start = clock();
    for (int i = 0; ; i++)
    {
        if (i == 10000) break;
    }
    clock_t end = clock();
    cout << diffclock(start, end) << endl;
    getch();
    return 0;
}
So my problem comes down to the fact that it returns 0; to be straightforward, I want to check how much time my program takes to run...
I found tons of suggestions on the internet, but most of them come down to the same point of getting 0 because the start and the end are the same.
This problem is about C++, remember. :<

There are a few problems here. The first is that you obviously switched the start and end times when passing them to the diffclock() function. The second problem is optimization. Any reasonably smart compiler with optimizations enabled would simply throw the entire loop away, as it does not have any side effects. But even if you fix the above problems, the program would most likely still print 0. If you consider that a CPU can do billions of operations per second, and throw in sophisticated out-of-order execution, branch prediction, and tons of other technologies employed by modern CPUs, even the CPU itself may effectively optimize your loop away. But even if it doesn't, you'd need a lot more than 10K iterations to make it run for a measurable time. You'd probably need your program to run for a second or two in order to get clock() to reflect anything.
But the most important problem is clock() itself. That function is not suitable for any kind of performance measurement whatsoever. What it does is give you an approximation of the processor time used by the program. Aside from the vague nature of the approximation method that might be used by any given implementation (since the standard doesn't require anything specific of it), the POSIX standard also requires CLOCKS_PER_SEC to be equal to 1000000 independent of the actual resolution. In other words, it doesn't matter how precise the clock is, and it doesn't matter at what frequency your CPU is running. To put it simply, it is a totally useless number and therefore a totally useless function for this purpose. The only reason it still exists is probably historical. So, please do not use it.
To achieve what you are looking for, people used to read the CPU time stamp counter, also known as "RDTSC" after the name of the corresponding CPU instruction used to read it. These days, however, this is also mostly useless because:
Modern operating systems can easily migrate the program from one CPU to another. You can imagine that reading the time stamp on one CPU after running for a second on another doesn't make a lot of sense. It is only in the latest Intel CPUs that the counter is synchronized across CPU cores. All in all, it is still possible to do this, but a lot of extra care must be taken (e.g. one can set up the affinity for the process, etc.).
Measuring CPU instructions of the program oftentimes does not give an accurate picture of how much time it is actually using. This is because in real programs there could be some system calls where the work is performed by the OS kernel on behalf of the process. In that case, that time is not included.
It could also happen that the OS suspends execution of the process for a long time. And even though it took only a few instructions to execute, to the user it seemed like a second. So such a performance measurement may be useless.
So what to do?
When it comes to profiling, a tool like perf should be used. It can track the number of CPU cycles, cache misses, branches taken, branches missed, the number of times the process was moved from one CPU to another, and so on. It can be used as a standalone tool, or the counters can be read from within your application (with something like PAPI).
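For example, a typical invocation looks like this (assuming perf is installed, e.g. via the linux-tools package, and the binary is called test as in the example below):
$ perf stat ./test
By default perf stat reports task-clock, context switches, CPU migrations, page faults, cycles, instructions, branches, and branch misses for the run.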
And if the question is about actual time spent, people use a wall clock. Preferably a high-precision one that is also not subject to NTP adjustments (i.e. monotonic). That shows exactly how much time elapsed, no matter what else was going on. For that purpose clock_gettime() can be used; it is part of SUSv2 and POSIX.1-2001. Given that you use getch() to keep the terminal open, I'd assume you are using Windows. There, unfortunately, you don't have clock_gettime(), and the closest thing is the performance counters API:
BOOL QueryPerformanceFrequency(LARGE_INTEGER *lpFrequency);
BOOL QueryPerformanceCounter(LARGE_INTEGER *lpPerformanceCount);
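A minimal sketch of how those two calls fit together (assuming <windows.h> and a Windows toolchain; treat it as an illustration rather than production code):
#include <windows.h>
#include <iostream>

int main()
{
    LARGE_INTEGER frequency, start, end;
    QueryPerformanceFrequency(&frequency); // counts per second, fixed at boot
    QueryPerformanceCounter(&start);

    volatile int i = 0;                    // same busy loop as in the question
    while (i < 10000) {
        ++i;
    }

    QueryPerformanceCounter(&end);
    double elapsed_ms = 1000.0 * (end.QuadPart - start.QuadPart) / frequency.QuadPart;
    std::cout << "Elapsed: " << elapsed_ms << " ms" << std::endl;
}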
For a portable solution, the best bet is std::chrono::high_resolution_clock. It was introduced in C++11, but it is supported by most industrial-grade compilers (GCC, Clang, MSVC).
Below is an example of how to use it. Please note that since I know my CPU will do 10000 increments of an integer much faster than a millisecond, I have changed the output to microseconds. I've also declared the counter as volatile in the hope that the compiler won't optimize it away. (The example uses std::chrono::steady_clock, which is guaranteed to be monotonic and is just as suitable here.)
#include <chrono>
#include <iostream>

int main()
{
    volatile int i = 0; // "volatile" is to ask the compiler not to optimize the loop away.
    auto start = std::chrono::steady_clock::now();
    while (i < 10000) {
        ++i;
    }
    auto end = std::chrono::steady_clock::now();
    auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
    std::cout << "It took me " << elapsed.count() << " microseconds." << std::endl;
}
When I compile and run it, it prints:
$ g++ -std=c++11 -Wall -o test ./test.cpp && ./test
It took me 23 microseconds.
Hope it helps. Good Luck!

At a glance, it seems like you are subtracting the larger value from the smaller value. You call:
diffclock( start, end );
But then diffclock is defined as:
double diffclock( clock_t clock1, clock_t clock2 ) {
    double diffticks = clock1 - clock2;
    double diffms = diffticks / ( CLOCKS_PER_SEC / 1000 );
    return diffms;
}
Apart from that, it may have something to do with the way you are converting units. Note that the conversion to milliseconds on this page multiplies by the floating-point constant 1000.0 instead of dividing by CLOCKS_PER_SEC / 1000, which avoids integer-division truncation:
http://en.cppreference.com/w/cpp/chrono/c/clock

The problem appears to be that the loop is just too short. I tried it on my system and it gave 0 ticks. I checked what diffticks was and it was 0. Increasing the loop size to 100000000 gave a noticeable time lag, and I got -290 as output (a bug: I think diffticks should be clock2 - clock1, so we should get 290 and not -290). I also tried changing "1000" to "1000.0" in the division and that didn't make a difference.
Compiling with optimization does remove the loop, so you either have to not use optimization or make the loop "do something", e.g. increment a counter other than the loop counter in the loop body. At least that's what GCC does.
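For illustration, here is a sketch of the "make the loop do something" idea using a GCC/Clang extension: an empty extended-asm statement tells the optimizer the counter is read and modified, so the loop survives -O2 (this is not standard C++, just a common trick):
#include <ctime>
#include <iostream>

int main()
{
    clock_t start = clock();
    int counter = 0;
    for (int i = 0; i < 100000000; i++) {
        ++counter;
        asm volatile("" : "+r"(counter)); // opaque to the optimizer: the loop cannot be removed or folded
    }
    clock_t end = clock();
    std::cout << 1000.0 * (end - start) / CLOCKS_PER_SEC
              << " ms, counter = " << counter << std::endl;
}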

Note: this is available since C++11.
You can use the std::chrono library.
std::chrono has two distinct kinds of objects: time points and durations. A time point represents a point in time, and a duration, as the term suggests, represents an interval or span of time.
This C++ library allows us to subtract two time points to get the duration of time passed in that interval. So you can set a starting point and a stopping point. Using the provided functions you can also convert durations into appropriate units.
Example using high_resolution_clock (which is one of the three clocks this library provides):
#include <chrono>
using namespace std::chrono;

auto start = high_resolution_clock::now(); // time point taken right before calling the function
// ... call the function being timed here ...
auto stop = high_resolution_clock::now();  // time point taken right after the call returns
Subtract stop and start timepoints and cast it into required units using the duration_cast() function. Predefined units are nanoseconds, microseconds, milliseconds, seconds, minutes, and hours.
auto duration = duration_cast<microseconds>(stop - start);
cout << duration.count() << endl;

First of all, you should subtract end - start, not vice versa.
The documentation says that if the processor time is not available, clock() returns (clock_t)(-1); did you check for that?
What optimization level do you use when compiling your program? If optimization is enabled, the compiler can eliminate your loop entirely.

Related

Measuring CPU clock speed

I am trying to measure the speed of the CPU. I am not sure how accurate my method is. Basically, I tried an empty for loop with values like UINT_MAX, but the program terminated quickly, so I tried UINT_MAX * 3 and so on...
Then I realized that the compiler was optimizing away the loop, so I added a volatile variable to prevent optimization. The following program takes approximately 1.5 seconds to finish. I want to know how accurate this algorithm is for measuring the clock speed. Also, how do I know how many cores are involved in the process?
#include <iostream>
#include <limits.h>
#include <stdint.h> // UINT32_MAX
#include <time.h>
using namespace std;

int main(void)
{
    volatile int v_obj = 0;
    unsigned long A, B = 0, C = UINT32_MAX;
    clock_t t1, t2;
    t1 = clock();
    for (A = 0; A < C; A++) {
        (void)v_obj;
    }
    t2 = clock();
    std::cout << (double)(t2 - t1) / CLOCKS_PER_SEC << std::endl;
    double t = (double)(t2 - t1) / CLOCKS_PER_SEC;
    unsigned long clock_speed = (unsigned long)(C / t);
    std::cout << "Clock speed : " << clock_speed << std::endl;
    return 0;
}
This doesn't measure clock speed at all; it measures how many loop iterations can be done per second. There's no rule that says one iteration will run per clock cycle. It may be the case, and you may have actually found it to be the case - certainly with optimized code and a reasonable CPU, a useless loop shouldn't run much slower than that. It could run at half speed though; some processors are not able to retire more than 1 taken branch every 2 cycles. And on esoteric targets, all bets are off.
So no, this doesn't measure clock cycles, except accidentally. In general it's extremely hard to get an empirical clock speed (you can ask your OS what it thinks the maximum and current clock speeds are, see below), because:
If you measure how much wall clock time a loop takes, you must know (at least approximately) the number of cycles per iteration. That's a bad enough problem in assembly, requiring fairly detailed knowledge of the expected microarchitectures (maybe a long chain of dependent instructions that each can only reasonably take 1 cycle, like add eax, 1? a chain long enough that differences in the test/branch throughput become small enough to ignore), so obviously anything you do there is not portable and will have assumptions built into it that may become false (there is actually another answer on SO that does this and assumes that addps has a latency of 3, which it doesn't anymore on Skylake, and didn't have on old AMDs). In C? Give up now. The compiler might be rolling some random code generator, and relying on it to be reasonable is like doing the same with a bear. Guessing the number of cycles per iteration of code you neither control nor even know is just folly. If it's just on your own machine you can check the generated code, but then you could just check the clock speed manually too, so there is little point.
If you measure the number of clock cycles elapsed in a given amount of wall clock time... but this is tricky, because rdtsc doesn't measure clock cycles (not anymore), and nothing else gets any closer. You can measure something, but with frequency scaling and turbo, it generally won't be actual clock cycles. You can get actual clock cycles from a performance counter, but you can't do that from user mode. And obviously any way you try to do this is not portable, because you can't portably ask for the number of elapsed clock cycles.
So if you're doing this for actual information and not just to mess around, you should probably just ask the OS. For Windows, query WMI for CurrentClockSpeed or MaxClockSpeed, whichever one you want. On Linux there's stuff in /proc/cpuinfo. Still not portable, but then, no solution is.
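For example, on Linux a quick sketch that pulls the reported frequency out of /proc/cpuinfo might look like this (the "cpu MHz" field name is what typical x86 systems expose; treat that as an assumption):
#include <fstream>
#include <iostream>
#include <string>

int main()
{
    std::ifstream cpuinfo("/proc/cpuinfo");
    std::string line;
    while (std::getline(cpuinfo, line)) {
        if (line.compare(0, 7, "cpu MHz") == 0) { // e.g. "cpu MHz : 3400.000"
            std::cout << line << std::endl;       // printed once per logical CPU
        }
    }
}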
As for "how do I know how many cores are being involved in the process?": one. Of course your thread may migrate between cores, but since you only have one thread, it's on only one core at any time.
A good optimizer may remove the loop, since
for (A = 0; A < C; A++) {
    (void)v_obj;
}
has the same effect on the program state as:
A = C;
So the optimizer is entirely free to replace your loop with that single assignment.
So you cannot measure CPU speed this way, as it depends on the compiler as much as it does on the computer (not to mention the variable clock speed and multicore architecture already mentioned).

Why clock() does not work on the cluster machine

I want to get the running time of part of my code.
My C++ code looks like this:
...
clock_t t1 = clock();
/*
 Here is my core code.
*/
clock_t t2 = clock();
cout << "Running time: " << (1000.0 * (t2 - t1)) / CLOCKS_PER_SEC << "ms" << endl;
...
This code works well on my laptop (openSUSE, g++ and clang++, Core i5).
But it does not work well on the cluster in the department
(Ubuntu, g++, AMD Opteron and Intel Xeon).
I always get coarse, rounded running times,
like 0ms, 10ms, or 20ms.
What causes that? Why? Thanks!
Clocks are not guaranteed to be exact down to ~10^-44 seconds (Planck time); they often have a minimum resolution. The Linux man page implies this with:
The clock() function returns an approximation of processor time used by the program.
and so does the ISO standard C11 7.27.2.1 The clock function /3:
The clock function returns the implementation’s best approximation of ...
and in 7.27.1 Components of time /4:
The range and precision of times representable in clock_t and time_t are implementation-defined.
From your (admittedly limited) sample data, it looks like the minimum resolution of your cluster machines is on the order of 10ms.
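If you want to see that granularity directly, a small probe like this (nothing cluster-specific assumed) spins until clock() changes and prints the size of one observable step:
#include <ctime>
#include <iostream>

int main()
{
    clock_t t0 = clock();
    clock_t t1;
    do {
        t1 = clock(); // busy-wait until the reported value actually changes
    } while (t1 == t0);
    std::cout << "One clock() step ~= "
              << (1000.0 * (t1 - t0)) / CLOCKS_PER_SEC << " ms" << std::endl;
}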
In any case, you have several possibilities if you need a finer resolution.
First, find a (probably implementation-specific) means of timing things more accurately.
Second, don't do it once. Do it a thousand times in a tight loop and then just divide the time taken by 1000. That should roughly increase your resolution a thousand-fold.
Thirdly, think about the implication that your code only takes 50ms at the outside. Unless you have a pressing need to execute it more than twenty times a second (assuming you have no other code to run), it may not be an issue.
On that last point, think of things like "What's the longest a user will have to wait before they get annoyed?". The answer to that would vary but half a second might be fine in most situations.
Since 50ms code could run ten times over during that time, you may want to ignore it. You'd be better off concentrating on code that has a clearly larger impact.

C++ precise measure time in decimal ms

I am measuring the running time of sorting algorithms like bubble, insertion, selection, and quick sort.
I am using this for that purpose:
long int before = GetTickCount();
QuickSort(pole,0,dlzka-1);
long int after = GetTickCount();
double dif = double((after - before));
cout << "Quick Sort with time "<< dif << " ms " << endl;
I am sorting an array with 30,000 integers, and it works fine for the other sorts, except for quick sort, which is probably so fast that it sorts 30k integers in less than 1ms; my timer then says 0ms, which looks like a mistake.
I want it to print, for example, 0.01ms, just to show that it ran correctly.
Thank you.
When you benchmark, you never benchmark just one run. Your timer is not precise/accurate enough to give meaningful results across that tiny amount of time.
For example, the documentation for GetTickCount says:
The resolution of the GetTickCount function is limited to the resolution of the system timer, which is typically in the range of 10 milliseconds to 16 milliseconds.
So, it is plainly obvious that obtaining a value of 0.01ms is folly.
Instead, benchmark many runs, then divide by the number of times you ran it.
Put your code into a loop that you run 1000 times, with the clock started and stopped outside of that loop. Then divide the result by 1000. Or, if you like, the result of the clock will now be in µs instead of in ms.
If your loop is very fast, you may need more than 1000 repetitions to get a meaningful measurement. You could run 10,000, 100,000, ... etc times until you get a "reasonable number of milliseconds".
When the piece of code you are testing is very fast, the overhead of the loop may become significant; in that case, you might run an "empty loop" and subtract the two results to give you the "net" timing of the inner part of the loop only.
It is rare, however, that this is something you need to do - most often you are trying to compare different algorithms, and as long as the overhead of the loop is the same it doesn't matter that it exists - the faster algorithm will still be faster.
One more thought - and this is pretty important: if you sort things in the first pass through the loop, and your algorithm speed depends on whether the data is sorted or not, you will get a different answer for multiple passes than you get for a single pass. Thus you need to make sure that you are using the same inputs for every pass through the algorithm. This might mean that you cannot use in-place sorting, or that you copy the unsorted data back into the "to be sorted" array at every pass of the algorithm.
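Putting the above together, here is a sketch for the quick sort case (Windows-only because of GetTickCount; std::sort stands in for the QuickSort from the question, and pole/dlzka are the question's names, so adapt as needed):
#include <windows.h>   // GetTickCount
#include <algorithm>   // std::sort, used here as a stand-in for QuickSort(pole, 0, dlzka - 1)
#include <cstring>     // std::memcpy
#include <cstdlib>     // std::rand
#include <iostream>

int main()
{
    const int dlzka = 30000;   // array size from the question
    const int RUNS  = 1000;

    int *unsorted = new int[dlzka];
    int *pole     = new int[dlzka];
    for (int i = 0; i < dlzka; ++i) unsorted[i] = std::rand();

    long int before = GetTickCount();
    for (int run = 0; run < RUNS; ++run) {
        std::memcpy(pole, unsorted, dlzka * sizeof(int)); // identical input every pass
        std::sort(pole, pole + dlzka);                    // replace with QuickSort(pole, 0, dlzka - 1)
    }
    long int after = GetTickCount();

    std::cout << "Average per run: " << double(after - before) / RUNS << " ms" << std::endl;

    delete[] unsorted;
    delete[] pole;
    return 0;
}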
Another option: there is a good article on high-precision timing that explains the use of the clock_gettime() function, with its various options and flavors. On some systems this will allow you to use higher-resolution measurements. It is still always a good idea to do multiple runs, or even multiple runs of multiple runs, so you can compute statistics and thus come up with a confidence interval.
If you are using C++11:
std::chrono::high_resolution_clock represents the clock with the smallest tick period provided by the implementation.
If your compiler supports C++11 with std::chrono, this is the best way to measure time with high accuracy; it is cross-platform and part of the standard library.
#include <chrono>
#include <iostream>
#include <iomanip>

::std::chrono::steady_clock::time_point startTime = ::std::chrono::steady_clock::now();
doWork();
::std::chrono::steady_clock::duration elapsedTime = ::std::chrono::steady_clock::now() - startTime;

double duration = ::std::chrono::duration_cast< ::std::chrono::duration<double> >(elapsedTime).count();
std::cout << std::fixed << std::setprecision(9)
          << "Milliseconds: " << duration * 1000 << std::endl;
To use C++11 in GCC, you run g++ -std=c++11 -o app main.cpp. For the Visual Studio compiler, you need 2012 or higher to use chrono.

Is clock() reliable for a timer?

I'm using clock(), and I'm wondering whether it ever resets or maxes out. All I'm using it for is to subtract it from a previous function call and find the difference.
Thanks for the help so far, but I'm not really able to get the chrono thing working in VS '12. That's fine, because I think it's a little more than I need anyway. I was thinking about using time(), but I have no idea how to convert the time_t into an int that contains just the current seconds (0-60). Any help?
As far as the standard is concerned,
The range and precision of times representable in clock_t and time_t are implementation-defined.
(C99, §7.23.1 ¶4)
so there are no guarantees of range; the definition of clock() does not say anything about wrapping around, although it says that
If the processor time used is not available or its value cannot be represented, the function returns the value (clock_t)(-1)
So we may say that exceeding the range of clock_t may be seen as "its value cannot be represented"; on the other hand, this interpretation would mean that, after some time, clock() becomes completely useless.
In fact, if we look at a specific implementation (glibc), we see:
matteo#teokubuntu:~$ man 3 clock
Note that the time can wrap around. On a 32-bit system where
CLOCKS_PER_SEC equals 1000000 this function will return the same value
approximately every 72 minutes.
Depends on what system you are on. It may use a 32- or a 64-bit clock_t. It will definitely roll over, but if it's 64-bit, it will be OK for quite some time before it does: 2^64 microseconds is still an awfully long time (approx 2^44 seconds, and there are around 2^16 seconds per day, so 2^28 days, which is about 2^20, or a million, years... ;)
Of course, on a 32-bit system, we have about 2^12 = 4096 seconds at microsecond resolution. Since an hour is 3600 s, that's about 1 h 10 min.
However, another problem on some systems is that clock() returns CPU time used, so if you sleep, that won't count as time in clock().
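A quick way to see that difference is to time a sleep with both clock() and a wall clock (a sketch; sleep() is POSIX, so on Windows use Sleep(1000) from <windows.h>). On systems where clock() reports CPU time, the first number stays near zero while the second shows roughly a second:
#include <ctime>
#include <chrono>
#include <iostream>
#include <unistd.h> // sleep()

int main()
{
    clock_t c0 = clock();
    auto    w0 = std::chrono::steady_clock::now();

    sleep(1); // the process is asleep: wall time passes, CPU time (mostly) does not

    clock_t c1 = clock();
    auto    w1 = std::chrono::steady_clock::now();

    std::cout << "clock():      " << 1000.0 * (c1 - c0) / CLOCKS_PER_SEC << " ms of CPU time\n";
    std::cout << "steady_clock: "
              << std::chrono::duration_cast<std::chrono::milliseconds>(w1 - w0).count()
              << " ms of wall time\n";
}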
And of course, even though CLOCKS_PER_SEC may be 1000000, it doesn't mean that you get microsecond resolution - on many systems, it "jumps" 10000 units at a time.
In summary, "probably a bad idea".
If you have C++11 on the system, use std::chrono, which has several options for timekeeping that are sufficiently good for most purposes (but do study the std::chrono docs).
Example code:
#include <iostream>
#include <chrono>
#include <unistd.h> // replace with "windows.h" if needed.

int main()
{
    std::chrono::time_point<std::chrono::system_clock> start, end;
    start = std::chrono::system_clock::now();
    // 10 seconds on a unix system. Sleep(10000) on windows will be the same thing.
    sleep(10);
    end = std::chrono::system_clock::now();
    int elapsed_seconds = std::chrono::duration_cast<std::chrono::seconds>(end - start).count();
    std::cout << "elapsed time: " << elapsed_seconds << "s\n";
}
The simple answer is that if you're just using it to time a function, it will probably not wrap around. It may also be too coarse, and chances are you might see a function duration of zero. If you want accurate timing for a function that executes fast, you're probably better off using an OS-level call on Windows (the performance counter API mentioned earlier).

How to calculate and print clock_t time roughly

I am timing how long it takes to do three different types of searches, sequential, recursive binary, and iterative binary. I have those in place, and it does iterate through and finish the search. My problem is that when I time them all, I get 0 for all of them every time, even if I make an array of 100,000, and I have it search for something not in the array. If I set a break point in the search it obviously makes the time longer, and it gives me a reasonable time that I can work with. But otherwise it is always 0. Here is my code, it is similar for all three search timers.
clock_t recStart = clock();
mySearch.recursiveSearch(SEARCH_INT);
clock_t recEnd = clock();
clock_t recDiff = recEnd - recStart;
double recClockTime = (double)recDiff/(double)CLOCKS_PER_SEC;
cout << recClockTime << endl;
cout << CLOCKS_PER_SEC << endl;
cout << recClockTime << endl;
For the last two I get 1000 and 0.
Am I doing something wrong here? Or is it in my search Object?
clock() is not an accurate timer, and it just doesn't work well for timing short intervals.
C says clock returns the implementation’s best approximation to the processor time used by the program since the beginning of an implementation-defined era related only to the program invocation.
If, between two successive clock calls, your program takes less time than one unit of the clock function, you could get 0. POSIX defines that unit via CLOCKS_PER_SEC as 1000000 (one unit is then 1 microsecond).
(http://pubs.opengroup.org/onlinepubs/009604499/functions/clock.html)
To measure clock cycles on x86/x64 you can read the CPU's Time Stamp Counter register with the rdtsc instruction (which can be done with inline assembly or a compiler intrinsic). Note that it returns a raw time stamp, not the number of seconds elapsed, so you need to retrieve the CPU frequency as well.
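For what it's worth, with GCC, Clang and MSVC you don't even need hand-written assembly: there is a __rdtsc() intrinsic (<x86intrin.h> for GCC/Clang, <intrin.h> for MSVC). A sketch, with all the caveats about frequency scaling and core migration still applying:
#include <x86intrin.h> // __rdtsc(); use <intrin.h> with MSVC
#include <iostream>

int main()
{
    unsigned long long tsc0 = __rdtsc();

    volatile int i = 0;
    while (i < 10000) {
        ++i;
    }

    unsigned long long tsc1 = __rdtsc();
    // This is a raw tick count, not seconds: you would still need the TSC frequency
    // (which is not the same as the current core frequency) to convert it.
    std::cout << "TSC delta: " << (tsc1 - tsc0) << " ticks" << std::endl;
}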
However, the best way to get accurate time in seconds depends on your platform.
To sum up, it's virtually impossible to calculate and print clock_t time in seconds accurately. You might want to look at this on Stack Overflow to find a better approach (if accuracy is a top priority).
clock() just doesn't have enough resolution - here is one good discussion/blog on that topic
http://www.guyrutenberg.com/2007/09/10/resolution-problems-in-clock/
I think there are two options: either use clock_gettime, or, even better, have you considered using OProfile or CodeAnalyst?
I personally prefer to use tools - OProfile is good. I have not used CodeAnalyst before - and then there is Valgrind and gprof.
If you insist on using clock_gettime - please check this out
http://www.guyrutenberg.com/2007/09/22/profiling-code-using-clock_gettime/
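For reference, here is a minimal sketch of timing with clock_gettime() and CLOCK_MONOTONIC (POSIX only; on older glibc you may need to link with -lrt):
#include <time.h>   // clock_gettime, CLOCK_MONOTONIC
#include <iostream>

int main()
{
    timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);

    // code to be measured goes here

    clock_gettime(CLOCK_MONOTONIC, &end);
    double elapsed_ms = (end.tv_sec - start.tv_sec) * 1000.0
                      + (end.tv_nsec - start.tv_nsec) / 1000000.0;
    std::cout << "Elapsed: " << elapsed_ms << " ms" << std::endl;
}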