I used the following function to find the time taken by my code.
#include <sys/time.h>
struct timeval start, end;
gettimeofday(&start,NULL);
//mycode
gettimeofday(&end,NULL);
cout<<" time taken by my code: "<<((end.tv_sec - start.tv_sec) * 1000000 + end.tv_usec - start.tv_usec ) / 1000.0<<" msec"<<endl;
I observed that even though my code runs for 2 hours, the time reported by the function above is 1213 milliseconds. I am not able to understand why this happens. Also, is there a way to correctly record the time taken by my code in hours?
My best guess is that time_t (the type of tv_sec) on your system is a signed 32-bit type and that (end.tv_sec - start.tv_sec) * 1000000 overflows.
You could test that theory by making sure that you don't use 32 bit arithmetic for this computation:
(end.tv_sec - start.tv_sec) * 1000000LL
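For instance, the full computation done in 64-bit arithmetic looks something like this (a sketch of the original code with only the arithmetic changed):
#include <sys/time.h>
#include <iostream>

int main() {
    timeval start{}, end{};
    gettimeofday(&start, NULL);
    // ... the code being timed ...
    gettimeofday(&end, NULL);

    // 1000000LL forces the multiplication into 64 bits, so two hours' worth
    // of microseconds cannot overflow a 32-bit intermediate.
    long long elapsed_us = (end.tv_sec - start.tv_sec) * 1000000LL
                         + (end.tv_usec - start.tv_usec);
    std::cout << "time taken by my code: " << elapsed_us / 1000.0 << " msec\n";
}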
That being said, I advise use of the C++11 <chrono> library instead:
#include <chrono>
auto t0 = std::chrono::system_clock::now();
//mycode
auto t1 = std::chrono::system_clock::now();
using milliseconds = std::chrono::duration<double, std::milli>;
milliseconds ms = t1 - t0;
std::cout << " time taken by my code: " << ms.count() << '\n';
The <chrono> library has an invariant that none of the "predefined" durations will overflow in less than +/- 292 years. In practice, only nanoseconds will overflow that quickly, and the other durations will have a much larger range. Each duration has static ::min() and ::max() functions you can use to query the range for each.
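For example, you can print those limits directly:
#include <chrono>
#include <iostream>

int main() {
    using namespace std::chrono;
    // ::max() returns the largest representable duration of each predefined type.
    std::cout << "nanoseconds max: " << nanoseconds::max().count() << " ns\n";
    std::cout << "milliseconds max: " << milliseconds::max().count() << " ms\n";
    std::cout << "hours max: " << hours::max().count() << " h\n";
}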
The original proposal for <chrono> has a decent tutorial section that might be a helpful introduction. It is only slightly dated. What it calls monotonic_clock is now called steady_clock. I believe that is the only significant update it lacks.
On which platform are you doing this? If it's Linux/Unix-like, your easiest non-intrusive bet is simply using the time command from the command line. Is the code you're running single-threaded or not? Some of the functions in time.h (like clock(), for example) return the number of ticks charged against each core, which may or may not be what you want. And the newer facilities in <chrono> may not be as precise as you'd like (a while back I tried to measure time intervals in nanoseconds with <chrono>, but the smallest interval I got back then was 300 ns, which was much less precise than I'd hoped).
This simple benchmarking approach may serve your purpose:
#include <ctime>
#include <iostream>
using namespace std;
...
float begin = (float)clock() / CLOCKS_PER_SEC;
...
// do your benchmarking work here
...
float end = (float)clock() / CLOCKS_PER_SEC;
float totalTime = end - begin;
cout << "Time required: " << totalTime << " sec" << endl;
NOTE: This approach is a simple alternative to the <chrono> library.
If you are on Linux and the code that you want to time is largely the program itself, then you can time your program by passing it as an argument to the time command and looking at the 'Elapsed (wall clock) time' row.
/usr/bin/time -v <your program's executable>
For example:
/usr/bin/time -v sleep 3
Command being timed: "sleep 3"
User time (seconds): 0.00
System time (seconds): 0.00
Percent of CPU this job got: 0%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:03.00
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 2176
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 165
Voluntary context switches: 2
Involuntary context switches: 0
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
I wanted to test a way to measure the precise execution time of a piece of code in nanoseconds (accuracy up to 100 nanoseconds is OK) in C++.
I tried using chrono::high_resolution_clock for this purpose. In order to test whether it is working properly or not, I do the following:
Get current time in nanoseconds using high_resolution_clock, call it "start"
sleep for "x" nanoseconds using nanosleep(x)
Get current time in nanoseconds using high_resolution_clock, call it "end"
Now "end" - "start" should be roughly same as "x". Lets call this difference "diff"
I ran the above test for x varying from 10 to 1000000. I get the diff to be around 100000 i.e (100 microseconds)
Where as this shouldn't be more than say 100 nanoseconds. Please help me fix this.
#include <ctime>
#include <unistd.h>
#include <iostream>
#include <chrono>
using namespace std;
int main() {
    int sleep_ns[] = {10, 50, 100, 500, 1000, 2000, 5000, 10000, 20000, 50000, 100000, 200000, 500000, 1000000};
    int n = sizeof(sleep_ns) / sizeof(int);
    for (int i = 0; i < n; i++) {
        auto start = std::chrono::high_resolution_clock::now();
        timespec tspec = {0, sleep_ns[i]};
        nanosleep(&tspec, NULL);
        auto end = std::chrono::high_resolution_clock::now();
        chrono::duration<int64_t, nano> dur_ns = (end - start);
        int64_t measured_ns = dur_ns.count();
        int64_t diff = measured_ns - sleep_ns[i];
        cout << "diff: " << diff
             << " sleep_ns: " << sleep_ns[i]
             << " measured_ns: " << measured_ns << endl;
    }
    return 0;
}
The following was the output of this code on my machine. It's running "Ubuntu 16.04.4 LTS".
diff: 172747 sleep_ns: 10 measured_ns: 172757
diff: 165078 sleep_ns: 50 measured_ns: 165128
diff: 164669 sleep_ns: 100 measured_ns: 164769
diff: 163855 sleep_ns: 500 measured_ns: 164355
diff: 163647 sleep_ns: 1000 measured_ns: 164647
diff: 162207 sleep_ns: 2000 measured_ns: 164207
diff: 160904 sleep_ns: 5000 measured_ns: 165904
diff: 155709 sleep_ns: 10000 measured_ns: 165709
diff: 145306 sleep_ns: 20000 measured_ns: 165306
diff: 115915 sleep_ns: 50000 measured_ns: 165915
diff: 125983 sleep_ns: 100000 measured_ns: 225983
diff: 115470 sleep_ns: 200000 measured_ns: 315470
diff: 115774 sleep_ns: 500000 measured_ns: 615774
diff: 116473 sleep_ns: 1000000 measured_ns: 1116473
What you're trying to do is not going to work on every platform, or even most platforms. There are a couple of reasons why.
The first, and biggest, reason is that measuring the precise time at which code executes is, by its very nature, imprecise. It requires a black-box OS call to determine, and if you've ever looked at how those calls are implemented, it's quickly apparent that there's inherent imprecision in the technique. On Windows, this is done by reading both the current "tick" of the processor and its reported frequency, and multiplying one by the other to determine how many nanoseconds have passed between two successive calls. But Windows only reports with microsecond accuracy to begin with, and if the CPU changes its frequency, even modestly (which is common in modern CPUs, which lower the frequency when they aren't being maxed out in order to save power), that can skew results.
Linux also has similar quirks, and every OS is at the mercy of the CPU's ability to accurately report its own tick counter/tick rate.
The second reason you're going to get results like what you've observed is that, for similar reasons, "sleep"-ing a thread is usually very imprecise. CPUs usually can't sleep with better than microsecond precision, and it's often not possible to sleep for less than half a millisecond at a time. Your particular environment seems to be capable of at least a few hundred microseconds of precision, but it's clearly not more precise than that. Some environments will even drop nanosecond resolution altogether.
Altogether, it's probably a mistake to presume that, without programming for an explicitly real-time OS and using that OS's specific API, you can get the kind of precision you're expecting. If you want reliable information about the timing of individual snippets of code, you'll need to run said code over and over, sample the entire execution, and then take the average to get a broad idea of the timing of each run. It'll still be imprecise, but it'll help get around these limitations.
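A rough sketch of that approach (the work() function here is just a stand-in for the code being measured):
#include <chrono>
#include <iostream>

// Placeholder for the snippet being measured.
void work() {
    volatile long sink = 0;
    for (int i = 0; i < 1000; ++i) sink += i;
}

int main() {
    using clock = std::chrono::steady_clock;
    const int runs = 100000;

    auto t0 = clock::now();
    for (int i = 0; i < runs; ++i) work();
    auto t1 = clock::now();

    std::chrono::duration<double, std::nano> total = t1 - t0;
    // The per-run average smooths out clock granularity and scheduling noise.
    std::cout << "average per run: " << total.count() / runs << " ns\n";
}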
Here's part of the description of nanosleep:
If the interval specified in req is not an exact multiple of the granularity underlying clock (see time(7)), then the interval will be rounded up to the next multiple. Furthermore, after the sleep completes, there may still be a delay before the CPU becomes free to once again execute the calling thread.
The behavior you're getting seems to fit pretty well with the description.
For extremely short pauses, you're probably going to have to do some (most?) of the work on your own. The system clock source will often have a granularity of a microsecond or so.
One possible way to pause for less than the system clock time would be to measure how often you can execute a loop before the clock changes. During startup (for example) do that a few times, to get a good idea of how many loops you can execute per microsecond.
Then to pause for some fraction of that time, you can do linear interpolation to guess at a number of times to execute a loop to get about the same length of pause.
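A rough sketch of that idea (the names and constants here are illustrative, not a tuned implementation):
#include <chrono>

// Busy-wait for a given number of iterations; the volatile write keeps the
// loop from being optimized away.
void spin(long iterations) {
    volatile long sink = 0;
    for (long i = 0; i < iterations; ++i) ++sink;
}

// Run once at startup: estimate how many loop iterations fit in a microsecond.
double calibrate_loops_per_us() {
    using clock = std::chrono::steady_clock;
    const long probe = 1000000;            // enough iterations to span many clock ticks
    auto t0 = clock::now();
    spin(probe);
    auto t1 = clock::now();
    std::chrono::duration<double, std::micro> us = t1 - t0;
    return probe / us.count();
}

// Linear interpolation: scale the calibrated count to the requested pause.
void short_pause_ns(double ns, double loops_per_us) {
    spin(static_cast<long>(loops_per_us * ns / 1000.0));
}

int main() {
    double loops_per_us = calibrate_loops_per_us();
    short_pause_ns(500.0, loops_per_us);   // pause for roughly half a microsecond
}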
Note: this will generally run the CPU at 100% for the duration of the pause, so you only want to do it for really short pauses--up to a microsecond or two is fine, but if you want much more than that, you probably want to fall back to nanosleep.
Even with that, however, you need to be aware that a pause could end up substantially longer than you planned. The OS does time slicing. If your process' time slice expires in the middle of your pause loop, it could easily be tens of milliseconds (or more) before it's scheduled to run again.
If you really need an assurance of response times on this order, you'll probably need to consider another OS (but even that's not a panacea--what you're asking for isn't trivial, regardless of how you approach it).
Reference: the nanosleep man page
I have a C++ binary and I am trying to measure its worst-case performance.
I executed it with
/usr/bin/time -v <command>
and the result was:
User time (seconds): 161.07
System time (seconds): 16.64
Percent of CPU this job got: 7%
Elapsed (wall clock) time (h:mm:ss or m:ss): 39:44.46
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 19889808
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 1272786
Voluntary context switches: 233597
Involuntary context switches: 138
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
How do I interpret this result, and what is causing this application to take this much time?
There is no waiting for user input; it basically deals with a large text file and a database.
I am looking at it from a Linux (OS) perspective. Is it the large number of context switches (round-robin scheduling in Linux) that has caused this?
The best thing you can do is to run it under a profiler like gprof, gperftools, callgrind (part of valgrind) or (the best in my opinion) Intel VTune. They can show you what is going on behind the code. And you'd better have debug symbols (which is not the same as compiling without optimization) to get a clear picture. Otherwise you can only make "best guesses" about what is going on under the hood...
As I said, I'm biased towards VTune as it is fast and it displays a lot of useful info. Take a look at an example here:
Vtune example
The documentation for usleep states that calling usleep(0) has no effect. However, on my system (RHEL 5.2) running the small snippets of C++ code below, I find that it actually appears to have the same effect as usleep(1). Is this to be expected, and if so, why is there the discrepancy between the documentation and what I see in real life?
Exhibit A
Code:
#include <unistd.h>
int main()
{
    for( int i = 0; i < 10000; i++ )
    {
        usleep(1);
    }
}
Output:
$ time ./test
real 0m10.124s
user 0m0.001s
sys 0m0.000s
Exhibit B
Code:
#include <unistd.h>
int main()
{
    for( int i = 0; i < 10000; i++ )
    {
        usleep(1);
        usleep(0);
    }
}
Output:
$ time ./test
real 0m20.770s
user 0m0.002s
sys 0m0.001s
Technically it should have no effect. But you must remember that the value passed is used as a minimum, not an absolute, so the system is free to use the smallest possible interval instead.
I just wanted to point out something about the time command used here. You should use /usr/bin/time instead of the plain time command if you want to check your program's memory, CPU, and time statistics. When you call time without the full path, the shell's built-in time command is called instead. Look at the difference.
without full path:
# time -v ./a.out
-bash: -v: command not found
real 0m0.001s
user 0m0.000s
sys 0m0.001s
with full path:
# /usr/bin/time -v ./a.out
Command being timed: "./a.out"
User time (seconds): 0.00
System time (seconds): 0.00
Percent of CPU this job got: 0%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:10.87
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 0
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 220
Voluntary context switches: 10001
Involuntary context switches: 1
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
Use man time for the /usr/bin/time manual and help time for information about the built-in time command.
I would have to look at the source to make sure, but my guess is that it's not quite "no effect". It's probably still less than usleep(1): there's still the function-call overhead, which can be measurable in a tight loop, even if the library call simply checks its arguments and returns immediately, avoiding the more usual process of setting up a timer/callback and calling the scheduler.
usleep() and sleep() are translated to nanosleep() system calls. Try running strace on your program and you'll see it. From the nanosleep() manual:
nanosleep() suspends the execution of the calling thread until either
at least the time specified in *req has elapsed, or the delivery of a
signal that triggers the invocation of a handler in the calling
thread or that terminates the process.
So I think usleep(0) will generate an interrupt and a context switch.
That documentation is from back in 1997; I'm not sure it applies to current RHEL 5. The man page for usleep on my Red Hat dev system does not indicate that a sleep time of 0 has no effect.
The parameter you pass is a minimum time for sleeping. There's no guarantee that the thread will wake up after exactly the time specified. Given the specific dynamics of the scheduler, it may result in longer than expected delays.
It also depends on whether udelay is implemented as a busy loop for short durations.
In my experience it has one effect: it triggers an interrupt.
This is useful for releasing the processor for the smallest possible amount of time in multithreaded programming.
I want to calculate time intervals (in tenths of a second) between some events happening in my program, so I use the clock function for this as follows:
clock_t begin;
clock_t now;
clock_t diff;
begin = clock();
while ( 1 )
{
    now = clock();
    diff = now - begin;
    cout << diff / CLOCKS_PER_SEC << "\n";
    //usleep ( 1000000 );
}
I expect the program to print 0 for 1 second, then 1 for 1 sec., then 2 for 1 sec. and so on... In fact it prints 0 for about 8 seconds, then 1 for about 8 seconds and so on...
By the way, if I add the usleep so that the program prints only once per second, it prints only 0 the whole way through...
Many thanks for any help!
The clock() function returns the amount of CPU time charged to your program. When you are blocked inside a usleep() call, no time is being charged to you, making it very clear why your time never seems to increase. As to why you seem to be taking 8 seconds to be charged one second -- there are other things going on within your system, consuming CPU time that you would like to be consuming but you must share the processor. clock() cannot be used to measure the passage of real time.
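For example, the difference is easy to see across a sleep: clock() barely advances while a wall clock reports the full second:
#include <ctime>
#include <chrono>
#include <iostream>
#include <unistd.h>

int main() {
    std::clock_t c0 = std::clock();
    auto w0 = std::chrono::steady_clock::now();

    usleep(1000000);                       // blocked for ~1 s, consuming almost no CPU

    std::clock_t c1 = std::clock();
    auto w1 = std::chrono::steady_clock::now();

    std::cout << "CPU time:  " << double(c1 - c0) / CLOCKS_PER_SEC << " s\n";               // ~0
    std::cout << "wall time: " << std::chrono::duration<double>(w1 - w0).count() << " s\n"; // ~1
}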
I bet you're printing so much to stdout that old prints are getting buffered. The buffer is growing, and the output to the console can't keep up with your tight loop. By adding the sleep you're allowing the buffer some time to flush and catch up. So even though it's 8 seconds into your program, you're printing stuff from 8 seconds ago.
I'd suggest putting the actual timestamp into the print statement. See if the timestamp is lagging significantly from the actual time.
If you're able to use boost, checkout the Boost Timers library.
Maybe you have to typecast it to double.
cout << (double)diff / CLOCKS_PER_SEC << "\n";
Integer division truncates, probably to 0 in your case, so cast diff before dividing.
Read about the time() function.
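For reference, time() only has one-second resolution, but it does measure wall-clock time:
#include <ctime>
#include <iostream>

int main() {
    std::time_t begin = std::time(nullptr);   // wall-clock seconds since the epoch
    // ... code being timed ...
    std::time_t end = std::time(nullptr);
    std::cout << std::difftime(end, begin) << " seconds elapsed\n";
}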
I am trying to get the total CPU usage in %. First I should start by saying that "top" simply will not do: there is a delay between CPU dumps, and it requires 2 dumps and several seconds, which hangs my program (I do not want to give it its own thread).
The next thing I tried is "ps", which is instant but always gives a very high number in total (20+), and when I actually got my CPU to do something it stayed at about 20...
Is there any other way that I could get total cpu usage? It does not matter if it is over one second or longer periods of time... Longer periods would be more useful, though.
cat /proc/stat
http://www.linuxhowtos.org/System/procstat.htm
I agree with the answer above. The cpu line in this file gives the total number of "jiffies" your system has spent doing different types of processing.
What you need to do is take 2 readings of this file, separated by whatever interval of time you require. The numbers are increasing values (subject to integer rollover), so to get the %cpu you need to calculate how many jiffies have elapsed over your interval versus how many jiffies were spent doing work.
e.g.
Suppose at 14:00:00 you have
cpu 4698 591 262 8953 916 449 531
total_jiffies_1 = (sum of all values) = 16400
work_jiffies_1 = (sum of user,nice,system = the first 3 values) = 5551
and at 14:00:05 you have
cpu 4739 591 289 9961 936 449 541
total_jiffies_2 = 17506
work_jiffies_2 = 5619
So the %cpu usage over this period is:
work_over_period = work_jiffies_2 - work_jiffies_1 = 68
total_over_period = total_jiffies_2 - total_jiffies_1 = 1106
%cpu = work_over_period / total_over_period * 100 = 6.1%
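A small C++ sketch of that calculation (it reads only the aggregate cpu line and, as above, treats the first three columns as work):
#include <fstream>
#include <sstream>
#include <string>
#include <thread>
#include <chrono>
#include <iostream>

struct CpuTimes {
    long long work = 0;   // user + nice + system (the first three columns)
    long long total = 0;  // sum of all columns
};

CpuTimes readCpuTimes() {
    std::ifstream stat("/proc/stat");
    std::string line;
    std::getline(stat, line);             // the first line is the aggregate "cpu" line
    std::istringstream in(line);
    std::string label;
    in >> label;                          // skip the "cpu" label
    CpuTimes t;
    long long value;
    int column = 0;
    while (in >> value) {
        if (column < 3) t.work += value;
        t.total += value;
        ++column;
    }
    return t;
}

int main() {
    CpuTimes a = readCpuTimes();
    std::this_thread::sleep_for(std::chrono::seconds(5));
    CpuTimes b = readCpuTimes();

    double cpu = 100.0 * (b.work - a.work) / (b.total - a.total);
    std::cout << "cpu usage over the interval: " << cpu << " %\n";
}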
Try reading /proc/loadavg. The first three numbers are the number of processes actually running (i.e., using a CPU), averaged over the last 1, 5, and 15 minutes, respectively.
http://www.linuxinsight.com/proc_loadavg.html
Read /proc/cpuinfo to find the number of CPUs/cores available to the system.
Call getloadavg() (or alternatively read /proc/loadavg), take the first value, multiply it by 100 (to convert to a percentage), and divide by the number of CPUs/cores. If the value is greater than 100, truncate it to 100. Done.
Relevant documentation: man getloadavg and man 5 proc
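A minimal sketch of that recipe (getloadavg() is declared in <stdlib.h> on glibc; std::thread::hardware_concurrency() stands in here for parsing /proc/cpuinfo):
#include <cstdlib>      // getloadavg() on glibc
#include <thread>
#include <algorithm>
#include <iostream>

int main() {
    double load[1];
    if (getloadavg(load, 1) != 1) return 1;     // 1-minute load average

    unsigned cores = std::thread::hardware_concurrency();
    if (cores == 0) cores = 1;

    // Convert to a percentage per core and truncate at 100, as described above.
    double percent = std::min(100.0, load[0] * 100.0 / cores);
    std::cout << "approximate cpu usage: " << percent << " %\n";
}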
N.B. The load average, as is usual on *NIX systems, can be more than 100% (per CPU/core) because it actually measures the number of processes ready to be run by the scheduler. With a Windows-like CPU metric, when load is at 100% you do not really know whether it is optimal use of CPU resources or the system is overloaded. Under *NIX, optimal use of the CPU would give you a loadavg of ~1.0 (or 2.0 for a dual-CPU system). If the value is much greater than the number of CPUs/cores, then you might want to plug extra CPUs into the box.
Otherwise, dig the /proc file system.
cpu-stat is a C++ project that lets you read Linux CPU counters from /proc/stat.
Get the CPUData.* and CPUSnapshot.* files from cpu-stat's src directory.
Quick implementation to get overall cpu usage:
#include "CPUSnapshot.h"
#include <chrono>
#include <thread>
#include <iostream>
int main()
{
    CPUSnapshot previousSnap;
    std::this_thread::sleep_for(std::chrono::milliseconds(1000));
    CPUSnapshot curSnap;
    const float ACTIVE_TIME = curSnap.GetActiveTimeTotal() - previousSnap.GetActiveTimeTotal();
    const float IDLE_TIME = curSnap.GetIdleTimeTotal() - previousSnap.GetIdleTimeTotal();
    const float TOTAL_TIME = ACTIVE_TIME + IDLE_TIME;
    int usage = 100.f * ACTIVE_TIME / TOTAL_TIME;
    std::cout << "total cpu usage: " << usage << " %" << std::endl;
}
Compile it:
g++ -std=c++11 -o CPUUsage main.cpp CPUSnapshot.cpp CPUData.cpp
I suggest two files to start with:
/proc/stat and /proc/cpuinfo.
http://www.mjmwired.net/kernel/Documentation/filesystems/proc.txt
Have a look at this C++ lib.
The information is parsed from /proc/stat. It also parses memory usage from /proc/meminfo and Ethernet load from /proc/net/dev.
----------------------------------------------
current CPULoad:5.09119
average CPULoad 10.0671
Max CPULoad 10.0822
Min CPULoad 1.74111
CPU: : Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz
----------------------------------------------
network load: wlp0s20f3 : 1.9kBit/s : 920Bit/s : 1.0kBit/s : RX Bytes Startup: 15.8mByte TX Bytes Startup: 833.5mByte
----------------------------------------------
memory load: 28.4% maxmemory: 16133792 Kb used: 4581564 Kb Memload of this Process 170408 KB
----------------------------------------------