I wanted to test a way to measure the precise execution time of a piece of code in nanoseconds (accuracy up to 100 nanoseconds is OK) in C++.
I tried using chrono::high_resolution_clock for this purpose. To test whether it is working properly, I do the following:
Get current time in nanoseconds using high_resolution_clock, call it "start"
sleep for "x" nanoseconds using nanosleep(x)
Get current time in nanoseconds using high_resolution_clock, call it "end"
Now "end" - "start" should be roughly same as "x". Lets call this difference "diff"
I ran the above test for x varying from 10 to 1000000. I get diff to be around 100000, i.e. 100 microseconds, whereas it shouldn't be more than, say, 100 nanoseconds. Please help me fix this.
#include <ctime>
#include <unistd.h>
#include <iostream>
#include <chrono>
using namespace std;
int main() {
    int sleep_ns[] = {10, 50, 100, 500, 1000, 2000, 5000, 10000, 20000, 50000, 100000, 200000, 500000, 1000000};
    int n = sizeof(sleep_ns)/sizeof(int);
    for (int i = 0; i < n; i++) {
        auto start = std::chrono::high_resolution_clock::now();
        timespec tspec = {0, sleep_ns[i]};
        nanosleep(&tspec, NULL);
        auto end = std::chrono::high_resolution_clock::now();
        chrono::duration<int64_t, nano> dur_ns = (end - start);
        int64_t measured_ns = dur_ns.count();
        int64_t diff = measured_ns - sleep_ns[i];
        cout << "diff: " << diff
             << " sleep_ns: " << sleep_ns[i]
             << " measured_ns: " << measured_ns << endl;
    }
    return 0;
}
Following is the output of this code on my machine. It's running "Ubuntu 16.04.4 LTS".
diff: 172747 sleep_ns: 10 measured_ns: 172757
diff: 165078 sleep_ns: 50 measured_ns: 165128
diff: 164669 sleep_ns: 100 measured_ns: 164769
diff: 163855 sleep_ns: 500 measured_ns: 164355
diff: 163647 sleep_ns: 1000 measured_ns: 164647
diff: 162207 sleep_ns: 2000 measured_ns: 164207
diff: 160904 sleep_ns: 5000 measured_ns: 165904
diff: 155709 sleep_ns: 10000 measured_ns: 165709
diff: 145306 sleep_ns: 20000 measured_ns: 165306
diff: 115915 sleep_ns: 50000 measured_ns: 165915
diff: 125983 sleep_ns: 100000 measured_ns: 225983
diff: 115470 sleep_ns: 200000 measured_ns: 315470
diff: 115774 sleep_ns: 500000 measured_ns: 615774
diff: 116473 sleep_ns: 1000000 measured_ns: 1116473
What you're trying to do is not going to work on every platform, or even most platforms. There are a couple of reasons why.
The first, and biggest, reason is that measuring the precise time at which code executes is, by its very nature, imprecise. It requires a black-box OS call to determine, and if you've ever looked at how those calls are implemented in the first place, it's quickly apparent that there's inherent imprecision in the technique. On Windows, this is done by reading both the current "tick" of the processor and its reported frequency, and multiplying one by the other to determine how many nanoseconds have passed between two successive calls. But Windows only reports with microsecond accuracy to begin with, and if the CPU changes its frequency, even only modestly (which is common in modern CPUs, which lower the frequency when the CPU isn't being maxed out to save power), that can skew results.
Linux has similar quirks, and every OS is at the mercy of the CPU's ability to accurately report its own tick counter/tick rate.
The second reason you're going to get results like the ones you've observed is that, for similar reasons, "sleep"-ing a thread is usually very imprecise. A thread usually can't be slept with better than microsecond precision, and it's often not possible to sleep for less than about half a millisecond at a time. Your particular environment seems to be capable of at least a few hundred microseconds of precision, but it's clearly not more precise than that. Some environments will even drop nanosecond resolution altogether.
Altogether, it's probably a mistake to presume that, without programming for an explicitly real-time OS and using that OS's specific API, you can get the kind of precision you're expecting. If you want reliable information about the timing of individual snippets of code, you'll need to run said code over and over, sample the entire execution, and then take the average to get a broad idea of the timing for each run. It'll still be imprecise, but it'll help get around these limitations.
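For example, a minimal sketch of that averaging approach (code_under_test is a hypothetical stand-in for whatever snippet you actually want to time):

#include <chrono>
#include <cstdint>
#include <iostream>

// Hypothetical stand-in for the snippet being measured.
static void code_under_test() {
    volatile int x = 0;
    for (int i = 0; i < 100; ++i) x += i;
}

int main() {
    const int iterations = 1000000;
    auto start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < iterations; ++i)
        code_under_test();
    auto end = std::chrono::high_resolution_clock::now();
    // One clock-read pair for the whole batch, so the clock's own overhead and
    // granularity are amortized across all iterations.
    int64_t total_ns = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
    std::cout << "average per call: " << static_cast<double>(total_ns) / iterations << " ns\n";
    return 0;
}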
Here's part of the description of nanosleep:
If the interval specified in req is not an exact multiple of the granularity underlying clock (see time(7)), then the interval will be rounded up to the next multiple. Furthermore, after the sleep completes, there may still be a delay before the CPU becomes free to once again execute the calling thread.
The behavior you're getting seems to fit pretty well with the description.
For extremely short pauses, you're probably going to have to do some (most?) of the work on your own. The system clock source will often have a granularity of a microsecond or so.
One possible way to pause for less than the system clock time would be to measure how often you can execute a loop before the clock changes. During startup (for example) do that a few times, to get a good idea of how many loops you can execute per microsecond.
Then to pause for some fraction of that time, you can do linear interpolation to guess at a number of times to execute a loop to get about the same length of pause.
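Here's a minimal sketch of that idea. It assumes it's acceptable to calibrate by timing a fixed batch of empty iterations rather than watching for the clock to tick over; the function names are mine, not from any library.

#include <chrono>
#include <cstdint>

// Calibration: time a fixed number of empty loop iterations and derive
// iterations-per-microsecond. 'volatile' discourages the optimizer from
// deleting the loop entirely.
static double calibrate_iterations_per_us() {
    using Clock = std::chrono::steady_clock;
    const std::uint64_t probe_iters = 10000000;
    volatile std::uint64_t sink = 0;
    const auto start = Clock::now();
    for (std::uint64_t i = 0; i < probe_iters; ++i)
        ++sink;
    const auto end = Clock::now();
    const double us = std::chrono::duration<double, std::micro>(end - start).count();
    return probe_iters / us;
}

// Busy-wait for roughly ns nanoseconds, linearly interpolating from the calibration.
static void spin_pause_ns(double ns, double iters_per_us) {
    volatile std::uint64_t sink = 0;
    const std::uint64_t target = static_cast<std::uint64_t>(iters_per_us * ns / 1000.0);
    for (std::uint64_t i = 0; i < target; ++i)
        ++sink;
}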
Note: this will generally run the CPU at 100% for the duration of the pause, so you only want to do it for really short pauses--up to a microsecond or two is fine, but if you want much more than that, you probably want to fall back to nanosleep.
Even with that, however, you need to be aware that a pause could end up substantially longer than you planned. The OS does time slicing. If your process' time slice expires in the middle of your pause loop, it could easily be tens of milliseconds (or more) before it's scheduled to run again.
If you really need an assurance of response times on this order, you'll probably need to consider another OS (but even that's not a panacea--what you're asking for isn't trivial, regardless of how you approach it).
Reference
nanosleep man page
Related
When running some benchmarks of my C++ software, I obtained the following picture:
The plot shows the execution time in nanoseconds of one tick of the software.
The exact same tick is run each time (one data point is one tick).
When testing in the simulated environment of Valgrind, there is zero difference between ticks, and I don't have any syscalls other than what clock_gettime may do.
I would like to understand what can cause the two "speeds" at which the tick seems to run, and what could cause the outlier points. Disabling the Intel CPU sleep states helped greatly (before that I had 4 lines like this). The scheduler used is Linux's FIFO scheduler.
An interesting observation is that the tick times alternate between roughly 9000 and 6700 ns; here are some data points:
9022
6605
9170
6756
9126
6594
9102
6744
9016
6643
8950
6638
9047
6662
edit:
just doing this in a loop in my thread:
auto t0 = std::chrono::high_resolution_clock::now();
auto t1 = std::chrono::high_resolution_clock::now();
measure(std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count());
is enough for me to see the alternating effect, though between 100 and 150 ns.
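For reference, here is a self-contained version of that loop (with measure() replaced by a plain print), which should be enough to look for the alternating pattern on a given machine:

#include <chrono>
#include <iostream>

int main() {
    // Time nothing but two back-to-back clock reads; any structure in the printed
    // values reflects the clock source and the machine, not the code in between.
    for (int i = 0; i < 20; ++i) {
        auto t0 = std::chrono::high_resolution_clock::now();
        auto t1 = std::chrono::high_resolution_clock::now();
        std::cout << std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count() << " ns\n";
    }
    return 0;
}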
I used the following function to find the time taken by my code.
#include <sys/time.h>
struct timeval start, end;
gettimeofday(&start,NULL);
//mycode
gettimeofday(&end,NULL);
cout<<" time taken by my code: "<<((end.tv_sec - start.tv_sec) * 1000000 + end.tv_usec - start.tv_usec ) / 1000.0<<" msec"<<endl;
I observed that even though my code runs for 2 hours, the time reported by the above function is 1213 milliseconds. I am not able to understand why this happens. Also, is there a way to record the time taken by my code in hours correctly?
My best guess is that time_t (the type of tv_sec) on your system is signed 32 bits and that (end.tv_sec - start.tv_sec) * 1000000 overflows.
You could test that theory by making sure that you don't use 32 bit arithmetic for this computation:
(end.tv_sec - start.tv_sec) * 1000000LL
That being said, I advise use of the C++11 <chrono> library instead:
#include <chrono>
#include <iostream>
auto t0 = std::chrono::system_clock::now();
//mycode
auto t1 = std::chrono::system_clock::now();
using milliseconds = std::chrono::duration<double, std::milli>;
milliseconds ms = t1 - t0;
std::cout << " time taken by my code: " << ms.count() << '\n';
The <chrono> library has an invariant that none of the "predefined" durations will overflow in less than +/- 292 years. In practice, only nanoseconds will overflow that quickly, and the other durations will have a much larger range. Each duration has static ::min() and ::max() functions you can use to query the range for each.
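For example:

#include <chrono>
#include <iostream>

int main() {
    using std::chrono::nanoseconds;
    using std::chrono::hours;
    // Smallest and largest representable counts for two of the predefined durations.
    std::cout << "nanoseconds: " << nanoseconds::min().count() << " .. " << nanoseconds::max().count() << '\n';
    std::cout << "hours:       " << hours::min().count() << " .. " << hours::max().count() << '\n';
    return 0;
}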
The original proposal for <chrono> has a decent tutorial section that might be a helpful introduction. It is only slightly dated. What it calls monotonic_clock is now called steady_clock. I believe that is the only significant update it lacks.
On which platform are you doing this? If it's Linux/Unix-like, your easiest non-intrusive bet is simply using the time command from the command line. Is the code you're running single-threaded or not? Some of the functions in time.h (like clock(), e.g.) return the number of ticks against each core, which may or may not be what you want. And the newer stuff in <chrono> may not be as exact as you'd like (a while back I tried to measure time intervals in nanoseconds with <chrono>, but the lowest time interval I got back then was 300 ns, which was much less exact than I'd hoped).
This part of the benchmarking process may help your purpose:
#include<time.h>
#include<cstdlib>
#include<iostream>
...
...
float begin = (float)clock()/CLOCKS_PER_SEC;
...
//do your bench-marking stuffs
...
float end = (float)clock()/CLOCKS_PER_SEC;
float totalTime = end - begin;
cout<<"Time Req in the stuffs: "<<totalTime<<endl;
NOTE: This process is a simple alternative to the <chrono> library.
If you are on Linux and the code that you want to time is largely the program itself, then you can time your program by passing it as an argument to the time command and looking at the 'Elapsed (wall clock) time' row.
/usr/bin/time -v <your program's executable>
For example:
/usr/bin/time -v sleep 3
Command being timed: "sleep 3"
User time (seconds): 0.00
System time (seconds): 0.00
Percent of CPU this job got: 0%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:03.00
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 2176
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 165
Voluntary context switches: 2
Involuntary context switches: 0
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
I am profiling CPU usage in a simple program I am writing. I have different algorithms I want to try, and I also want to know what the impact on the total system performance is.
Currently, I am using ualarm() to execute some instructions at 30 Hz; every 15 of those interruptions (every 0.5 s) I record the CPU time with getrusage() (in microseconds), so I have an estimate of the total CPU consumption at that point in time. But to get context, I also need to know the total time elapsed in the system over that period, so I can compute the percentage used by my program.
/* Main Loop */
while(1)
{
    alarm = 0;
    /* Waiting Loop: */
    for(i=0; !alarm; i++){
    }
    count++;
    /* Do my things */
    /* Check if it's time to store cpu log: */
    if ((count%count_max) == 0)
    {
        getrusage(RUSAGE_SELF, &ru);
        store_cpulog(f,
                     (int64_t) ru.ru_utime.tv_sec,
                     (int64_t) ru.ru_utime.tv_usec,
                     (int64_t) ru.ru_stime.tv_sec,
                     (int64_t) ru.ru_stime.tv_usec);
    }
}
I have different options, but I don't know which one will provide the most accurate result:
Use ualarm for the timing. Currently it's programmed to signal every 0.5 seconds, so I can take those 0.5 seconds as the CPU time. It seems quite obvious to use, but is it the best option?
Use clock_gettime(CLOCK_MONOTONIC): it provides readings with nanosecond resolution.
Use gettimeofday(): it provides readings with microsecond resolution. I've found opinions against using it.
Any recommendation? Thanks.
A possible solution is to use the system time command and to avoid the busy loop (as #Hasturkun says) in your program. Call in the console:
time /path/to/my/program
and after it finishes executing you get something like:
real 0m1.465s
user 0m0.000s
sys 0m1.210s
I'm not sure about the precision, or whether it is enough for you.
Callgrind is possibly the best application for profiling C/C++ code under linux. Use it with pride:)
I want to calculate time intervals (in 1/10ths of a second) between some events happening in my program, so I use the clock function as follows:
clock_t begin;
clock_t now;
clock_t diff;
begin = clock();
while ( 1 )
{
    now = clock();
    diff = now - begin;
    cout << diff / CLOCKS_PER_SEC << "\n";
    //usleep ( 1000000 );
};
I expect the program to print 0 for 1 second, then 1 for 1 second, then 2 for 1 second, and so on... In fact it prints 0 for about 8 seconds, then 1 for about 8 seconds, and so on...
By the way, if I add usleep so that the program prints only once per second, it prints only 0 the whole way through...
Great thanks for help!
The clock() function returns the amount of CPU time charged to your program. When you are blocked inside a usleep() call, no time is being charged to you, making it very clear why your time never seems to increase. As to why you seem to be taking 8 seconds to be charged one second -- there are other things going on within your system, consuming CPU time that you would like to be consuming but you must share the processor. clock() cannot be used to measure the passage of real time.
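Here's a small demonstration of that difference (it assumes a POSIX system for usleep): the CPU time reported by clock() stays near zero across the sleep, while the wall-clock time advances by about half a second.

#include <chrono>
#include <ctime>
#include <iostream>
#include <unistd.h>

int main() {
    const clock_t cpu_begin = clock();
    const auto wall_begin = std::chrono::steady_clock::now();

    usleep(500000);  // sleep 0.5 s: wall time passes, but almost no CPU time is charged

    const clock_t cpu_end = clock();
    const auto wall_end = std::chrono::steady_clock::now();

    std::cout << "CPU seconds:  " << static_cast<double>(cpu_end - cpu_begin) / CLOCKS_PER_SEC << '\n';
    std::cout << "Wall seconds: " << std::chrono::duration<double>(wall_end - wall_begin).count() << '\n';
    return 0;
}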
I bet you're printing so much to stdout that old prints are getting buffered. The buffer is growing and the output to the console can't keep up with your tight loop. By adding the sleep you're allowing the buffer some time to flush and catch up. So even though it's 8 seconds into your program, you're printing stuff from 8 seconds ago.
I'd suggest putting the actual timestamp into the print statement. See if the timestamp is lagging significantly from the actual time.
If you're able to use boost, checkout the Boost Timers library.
Maybe you have to typecast it to double before dividing.
cout << (double)diff / CLOCKS_PER_SEC << "\n";
Integer division truncates, probably to 0 in your case.
Read about the time() function.
I am trying to get total CPU usage in %. First I should start by saying that "top" simply will not do, as there is a delay between CPU dumps; it requires 2 dumps and several seconds, which hangs my program (I do not want to give it its own thread).
The next thing I tried is "ps", which is instant but always gives a very high number in total (20+), and when I actually got my CPU to do something it stayed at about 20...
Is there any other way I could get total CPU usage? It does not matter if it is over one second or longer periods of time... longer periods would be more useful, though.
cat /proc/stat
http://www.linuxhowtos.org/System/procstat.htm
I agree with the answer above. The cpu line in this file gives the total number of "jiffies" your system has spent doing different types of processing.
What you need to do is take 2 readings of this file, separated by whatever interval of time you require. The numbers are increasing values (subject to integer rollover), so to get the %cpu you need to calculate how many jiffies have elapsed over your interval, versus how many jiffies were spent doing work.
e.g.
Suppose at 14:00:00 you have
cpu 4698 591 262 8953 916 449 531
total_jiffies_1 = (sum of all values) = 16400
work_jiffies_1 = (sum of user,nice,system = the first 3 values) = 5551
and at 14:00:05 you have
cpu 4739 591 289 9961 936 449 541
total_jiffies_2 = 17506
work_jiffies_2 = 5619
So the %cpu usage over this period is:
work_over_period = work_jiffies_2 - work_jiffies_1 = 68
total_over_period = total_jiffies_2 - total_jiffies_1 = 1106
%cpu = work_over_period / total_over_period * 100 = 6.1%
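A minimal sketch of that calculation in C++ (it only reads the aggregate "cpu" line and treats the first three fields as "work"; newer kernels append extra columns, which simply get added to the total):

#include <chrono>
#include <fstream>
#include <iostream>
#include <string>
#include <thread>

struct CpuTimes {
    unsigned long long work = 0;   // user + nice + system
    unsigned long long total = 0;  // sum of all fields on the "cpu" line
};

// Read the aggregate "cpu" line from /proc/stat.
static CpuTimes read_cpu_times() {
    std::ifstream statfile("/proc/stat");
    std::string label;
    statfile >> label;  // "cpu"
    CpuTimes t;
    unsigned long long value;
    int field = 0;
    while (statfile >> value) {
        if (field < 3) t.work += value;      // user, nice, system
        t.total += value;
        ++field;
        if (statfile.peek() == '\n') break;  // stop at the end of the first line
    }
    return t;
}

int main() {
    const CpuTimes a = read_cpu_times();
    std::this_thread::sleep_for(std::chrono::seconds(1));
    const CpuTimes b = read_cpu_times();
    const double cpu_percent = 100.0 * (b.work - a.work) / static_cast<double>(b.total - a.total);
    std::cout << "cpu usage over the interval: " << cpu_percent << " %\n";
    return 0;
}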
Try reading /proc/loadavg. The first three numbers are the number of processes actually running (i.e., using a CPU), averaged over the last 1, 5, and 15 minutes, respectively.
http://www.linuxinsight.com/proc_loadavg.html
Read /proc/cpuinfo to find the number of CPUs/cores available to the system.
Call getloadavg() (or alternatively read /proc/loadavg), take the first value, multiply it by 100 (to convert to a percentage), and divide by the number of CPUs/cores. If the value is greater than 100, truncate it to 100. Done.
Relevant documentation: man getloadavg and man 5 proc
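A sketch of those steps, assuming a glibc/BSD system for getloadavg() and using std::thread::hardware_concurrency() as a stand-in for parsing /proc/cpuinfo:

#include <algorithm>
#include <cstdlib>   // getloadavg() on glibc/BSD
#include <iostream>
#include <thread>

int main() {
    double loads[3];
    if (getloadavg(loads, 3) == -1) {
        std::cerr << "getloadavg failed\n";
        return 1;
    }
    // Number of online CPUs/cores (stand-in for reading /proc/cpuinfo).
    const unsigned cpus = std::max(1u, std::thread::hardware_concurrency());
    // 1-minute load average, expressed as a percentage per core, capped at 100.
    const double percent = std::min(loads[0] * 100.0 / cpus, 100.0);
    std::cout << "approximate cpu usage: " << percent << " %\n";
    return 0;
}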
N.B. Load average, as is usual on *NIX systems, can be more than 100% (per CPU/core) because it actually measures the number of processes ready to be run by the scheduler. With a Windows-like CPU metric, when load is at 100% you do not really know whether it is optimal use of CPU resources or the system is overloaded. Under *NIX, optimal use of the CPU would give a loadavg of ~1.0 (or 2.0 for a dual-core system). If the value is much greater than the number of CPUs/cores, then you might want to plug extra CPUs into the box.
Otherwise, dig the /proc file system.
cpu-stat is a C++ project that lets you read the Linux CPU counters from /proc/stat.
Get the CPUData.* and CPUSnapshot.* files from cpu-stat's src directory.
A quick implementation to get overall CPU usage:
#include "CPUSnapshot.h"
#include <chrono>
#include <thread>
#include <iostream>
int main()
{
CPUSnapshot previousSnap;
std::this_thread::sleep_for(std::chrono::milliseconds(1000));
CPUSnapshot curSnap;
const float ACTIVE_TIME = curSnap.GetActiveTimeTotal() - previousSnap.GetActiveTimeTotal();
const float IDLE_TIME = curSnap.GetIdleTimeTotal() - previousSnap.GetIdleTimeTotal();
const float TOTAL_TIME = ACTIVE_TIME + IDLE_TIME;
int usage = 100.f * ACTIVE_TIME / TOTAL_TIME;
std::cout << "total cpu usage: " << usage << " %" << std::endl;
}
Compile it:
g++ -std=c++11 -o CPUUsage main.cpp CPUSnapshot.cpp CPUData.cpp
I suggest two files to start with...
/proc/stat and /proc/cpuinfo.
http://www.mjmwired.net/kernel/Documentation/filesystems/proc.txt
Have a look at this C++ lib.
The information is parsed from /proc/stat. It also parses memory usage from /proc/meminfo and Ethernet load from /proc/net/dev.
----------------------------------------------
current CPULoad:5.09119
average CPULoad 10.0671
Max CPULoad 10.0822
Min CPULoad 1.74111
CPU: : Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz
----------------------------------------------
network load: wlp0s20f3 : 1.9kBit/s : 920Bit/s : 1.0kBit/s : RX Bytes Startup: 15.8mByte TX Bytes Startup: 833.5mByte
----------------------------------------------
memory load: 28.4% maxmemory: 16133792 Kb used: 4581564 Kb Memload of this Process 170408 KB
----------------------------------------------
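If you'd rather not pull in a whole library, here is a standalone sketch of just the memory part (this is not the library's API, only the same idea): read MemTotal and MemAvailable from /proc/meminfo (MemAvailable requires kernel 3.14+) and derive a memory-load percentage.

#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

int main() {
    std::ifstream meminfo("/proc/meminfo");
    std::string line;
    long long mem_total = 0, mem_available = 0;  // values are in kB

    while (std::getline(meminfo, line)) {
        std::istringstream iss(line);
        std::string key;
        long long value_kb;
        if (iss >> key >> value_kb) {
            if (key == "MemTotal:") mem_total = value_kb;
            else if (key == "MemAvailable:") mem_available = value_kb;
        }
    }

    if (mem_total > 0) {
        const double load = 100.0 * (mem_total - mem_available) / mem_total;
        std::cout << "memory load: " << load << " %  maxmemory: " << mem_total
                  << " Kb  used: " << (mem_total - mem_available) << " Kb\n";
    }
    return 0;
}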