The documentation for usleep states that calling usleep(0) has no effect. However, on my system (RHEL 5.2) running the small snippets of C++ code below, I find that it actually appears to have the same effect as usleep(1). Is this to be expected, and if so, why is there the discrepancy between the documentation and what I see in real life?
Exhibit A
Code:
#include <unistd.h>
int main()
{
for( int i = 0; i < 10000; i++ )
{
usleep(1);
}
}
Output:
$ time ./test
real 0m10.124s
user 0m0.001s
sys 0m0.000s
Exhibit B
Code:
#include <unistd.h>
int main()
{
for( int i = 0; i < 10000; i++ )
{
usleep(1);
usleep(0);
}
}
Output:
$ time ./test
real 0m20.770s
user 0m0.002s
sys 0m0.001s
Technically it should have no effect. But you must remember that the value passed is used as a minimum, and not an absolute, therefore the system is free to use the smallest possible interval instead.
I just wanted to point out about the time command used here. You should use /usr/bin/time instead of only time command if you want to check your program memory,cpu,time stat. When you call time without full path then built-in time command is called. Look at the difference.
without full path:
# time -v ./a.out
-bash: -v: command not found
real 0m0.001s
user 0m0.000s
sys 0m0.001s
with full path:
# /usr/bin/time -v ./a.out
Command being timed: "./a.out"
User time (seconds): 0.00
System time (seconds): 0.00
Percent of CPU this job got: 0%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:10.87
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 0
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 220
Voluntary context switches: 10001
Involuntary context switches: 1
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
use man time for /usr/bin/time manual and use help time for built in time information.
I would have to look at the source to make sure, but my guess is that it's not quite "no effect", but it's probably still less than usleep(1) - there's still the function call overhead, which can be measurable in a tight loop, even if the library call simply checks its arguments and returns immediately, avoiding the more usual process of setting up a timer/callback and calling the scheduler.
usleep() and sleep() are translated to nanosleep() system calls. Try strace your program and you'll see it. From nanosleep() manual:
nanosleep() suspends the execution of the calling thread until either
at least the time specified in *req has elapsed, or the delivery of a
signal that triggers the invocation of a handler in the calling
thread or that terminates the process.
So I think ulseep(0) will generate an interrupt and a context switch.
That documentation is back from 1997, not sure if it applies to current RHEL5, my Redhat dev systems man page for usleep does not indicate that a sleep time of 0 has no effect.
The parameter you pass is a minimum time for sleeping. There's no guarantee that the thread will wake up after exactly the time specified. Given the specific dynamics of the scheduler, it may result in longer than expected delays.
It also depends on if udelay is implemented as a busy loop for short durations.
As of my experience it has one effect: it's calling an interrupt.
This is good to release the processor for the smallest amount of time in multithreading programming.
Related
On this sample code:
#include <iostream>
using namespace std;
int main()
{
cout<<"Hello World!"<<endl;
return 0;
}
I ran the following command 3 times:
perf stat -e cpu-cycles ./sample
Following are the 3 outputs on consecutive executions:
1)
Hello World!
Performance counter stats for './try':
22,71,970 cpu-cycles
0.003634105 seconds time elapsed
2)
Hello World!
Performance counter stats for './try':
18,51,044 cpu-cycles
0.001045616 seconds time elapsed
3)
Hello World!
Performance counter stats for './try':
18,21,834 cpu-cycles
0.001153489 seconds time elapsed
Why would the same program take different number of cpu-cycles on multiple runs?
I am using "Intel(R) Core(TM) i5-5250U CPU # 1.60GHz", "Ubuntu 14.04.3 LTS" and "g++ 4.8.4".
During start-up of the program the binary code is memory-mapped but loaded lazily (as needed). Hence, during the first invocation of the program some CPU cycles are spent by the kernel on (transparently) loading the binary code as execution hits instruction pages that are not yet in the RAM. Subsequent invocations take less time since they reuse the cached code, unless the underlying file has changed or the program hasn't been executed for a long time and its pages were recycled.
I have a C++ binary and I am trying to measure it's worst case performance.
I executed it with
/usr/bin/time -v < command >
And result was as
User time (seconds): 161.07
System time (seconds): 16.64
Percent of CPU this job got: 7%
Elapsed (wall clock) time (h:mm:ss or m:ss): 39:44.46
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 19889808
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 1272786
Voluntary context switches: 233597
Involuntary context switches: 138
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
How do I interpret this result, what is causing this application to take this much time?
There is no waiting for user input, it basically deals with large text file and database.
I am looking at it from Linux(OS) perspective.Is it too many context switches(Round robin Scheduling in Linux) that has caused this?
The best thing you can do is to run it with a profiler like gprof, gperftools, callgrind (part of valgrind) or (the best in my opinion) Intel VTune. They can show you what is going one behind the code. And you'd better have the debug symbols (!= than compiling without optimization) to get a clear picture about that. Otherwise you can just have "best guesses" of what is going one under the hood...
As I said, I'm biased towards VTune as it is fast and it displays a lot of useful info. Take a look here at an example:
Vtune example
I used the following function to find the time taken by my code.
#include <sys/time.h>
struct timeval start, end;
gettimeofday(&start,NULL);
//mycode
gettimeofday(&end,NULL);
cout<<" time taken by my code: "<<((end.tv_sec - start.tv_sec) * 1000000 + end.tv_usec - start.tv_usec ) / 1000.0<<" msec"<<endl;
I observed that even though my code runs for 2 hours, yet the time reported by the above function is 1213 milliseconds. I am not able to understand as to why is it happened. Also is there a way by which I may record the time taken by my code in hours correctly
My best guess is that time_t (the type of tv_sec) on your system is signed 32 bits and that (end.tv_sec - start.tv_sec) * 1000000 overflows.
You could test that theory by making sure that you don't use 32 bit arithmetic for this computation:
(end.tv_sec - start.tv_sec) * 1000000LL
That being said, I advise use of the C++11 <chrono> library instead:
#include <chrono>
auto t0 = std::chrono::system_clock::now();
//mycode
auto t1 = std::chrono::system_clock::now();
using milliseconds = std::chrono::duration<double, std::milli>;
milliseconds ms = t1 - t0;
std::cout << " time taken by my code: " << ms.count() << '\n';
The <chrono> library has an invariant that none of the "predefined" durations will overflow in less than +/- 292 years. In practice, only nanoseconds will overflow that quickly, and the other durations will have a much larger range. Each duration has static ::min() and ::max() functions you can use to query the range for each.
The original proposal for <chrono> has a decent tutorial section that might be a helpful introduction. It is only slightly dated. What it calls monotonic_clock is now called steady_clock. I believe that is the only significant update it lacks.
On which platform are you doing this? If it's Linux/Unix-like your easiest non-intrusive bet is simply using the time command from the command-line. Is the code you're running single-threaded or not? Some of the functions in time.h (like clock() e.g. ) return the number of ticks against each core, which may or may not be what you want. And the newer stuff in the chrono may not be as exact as you like (a while back I tried to measure time intervals in nanoseconds with chrono, but the lowest time interval I got back back then was 300ns, which was much less exact than I'd hoped).
This part of the bench marking process may help your purpose:
#include<time.h>
#include<cstdlib>
...
...
float begin = (float)clock()/CLOCKS_PER_SEC;
...
//do your bench-marking stuffs
...
float end = (float)clock()/CLOCKS_PER_SEC;
float totalTime = end - begin;
cout<<"Time Req in the stuffs: "<<totalTime<<endl;
NOTE: This process is a simple alternative to the chrono library
If you are on linux and the code that you want to time is largely the program itself, then you can time your program by passing it as an argument to the time command and look at the 'elapsed time' row.
/usr/bin/time -v <your program's executable>
For example:
/usr/bin/time -v sleep 3 .../home/aakashah/workspace/head/src/GroverStorageCommon
Command being timed: "sleep 3"
User time (seconds): 0.00
System time (seconds): 0.00
Percent of CPU this job got: 0%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:03.00
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 2176
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 165
Voluntary context switches: 2
Involuntary context switches: 0
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
I am profiling CPU usage on a simple program I am writing. I have different algorithms I want to try, and I also want to know what's the impact on the total system performance.
Currently, I am using ualarm() to execute some instructions at 30Hz; every 15 of those interruptions (every 0.5s) I record the CPU time with getrusage() (in useconds), so I have an estimation on the total cpu time of cpu consumption on that point in time. But to get context, I also need to know the total time elapsed in the system in that time period, so I can have the % of which is used by my program.
/* Main Loop */
while(1)
{
alarm = 0;
/* Waiting Loop: */
for(i=0; !alarm; i++){
}
count++;
/* Do my things */
/* Check if it's time to store cpu log: */
if ((count%count_max) == 0)
{
getrusage(RUSAGE_SELF, &ru);
store_cpulog(f,
(int64_t) ru.ru_utime.tv_sec,
(int64_t) ru.ru_utime.tv_usec,
(int64_t) ru.ru_stime.tv_sec,
(int64_t) ru.ru_stime.tv_usec);
}
}
I have different options, but I don't know which one will provide the most exact result:
Use ualarm for the timing. Currently it's programmed to signal every 0.5 seconds, so I can take those 0.5 seconds as the CPU time. Seems quite obvious to use, but it's the best option?
Use clock_gettime(CLOCK_MONOTONIC): it provides readings with a nanosec resolution.
Use gettimeofday(): provides readings with a usec resolution. I've found opinions against using it.
Any recommendation? Thanks.
Possible solution is to use system function time and don't using busy loop (like #Hasturkun say) in your program. Call in console:
time /path/to/my/program
and after execution of it you get something like:
real 0m1.465s
user 0m0.000s
sys 0m1.210s
Not sure about precision, if it is enough for you.
Callgrind is possibly the best application for profiling C/C++ code under linux. Use it with pride:)
I am trying to get total cpu usage in %. First I should start by saying that "top" will simply not do, as there is a delay between cpu dumps, it requires 2 dumps and several seconds, which hangs my program (I do not want to give it its own thread)
next thing what I tried is "ps" which is instant but always gives very high number in total (20+) and when I actually got my cpu to do something it stayed at about 20...
Is there any other way that I could get total cpu usage? It does not matter if it is over one second or longer periods of time... Longer periods would be more useful, though.
cat /proc/stat
http://www.linuxhowtos.org/System/procstat.htm
I agree with this answer above. The cpu line in this file gives the total number of "jiffies" your system has spent doing different types of processing.
What you need to do is take 2 readings of this file, seperated by whatever interval of time you require. The numbers are increasing values (subject to integer rollover) so to get the %cpu you need to calculate how many jiffies have elapsed over your interval, versus how many jiffies were spend doing work.
e.g.
Suppose at 14:00:00 you have
cpu 4698 591 262 8953 916 449 531
total_jiffies_1 = (sum of all values) = 16400
work_jiffies_1 = (sum of user,nice,system = the first 3 values) = 5551
and at 14:00:05 you have
cpu 4739 591 289 9961 936 449 541
total_jiffies_2 = 17506
work_jiffies_2 = 5619
So the %cpu usage over this period is:
work_over_period = work_jiffies_2 - work_jiffies_1 = 68
total_over_period = total_jiffies_2 - total_jiffies_1 = 1106
%cpu = work_over_period / total_over_period * 100 = 6.1%
Try reading /proc/loadavg. The first three numbers are the number of processes actually running (i.e., using a CPU), averaged over the last 1, 5, and 15 minutes, respectively.
http://www.linuxinsight.com/proc_loadavg.html
Read /proc/cpuinfo to find the number of CPU/cores available to the systems.
Call the getloadavg() (or alternatively read the /proc/loadavg), take the first value, multiply it by 100 (to convert to percents), divide by number of CPU/cores. If the value is greater than 100, truncate it to 100. Done.
Relevant documentation: man getloadavg and man 5 proc
N.B. Load average, usual to *NIX systems, can be more than 100% (per CPU/core) because it actually measures number of processes ready to be run by scheduler. With Windows-like CPU metric, when load is at 100% you do not really know whether it is optimal use of CPU resources or system is overloaded. Under *NIX, optimal use of CPU loadavg would give you value ~1.0 (or 2.0 for dual system). If the value is much greater than number CPU/cores, then you might want to plug extra CPUs into the box.
Otherwise, dig the /proc file system.
cpu-stat is a C++ project that permits to read Linux CPU counter from /proc/stat .
Get CPUData.* and CPUSnaphot.* files from cpu-stat's src directory.
Quick implementation to get overall cpu usage:
#include "CPUSnapshot.h"
#include <chrono>
#include <thread>
#include <iostream>
int main()
{
CPUSnapshot previousSnap;
std::this_thread::sleep_for(std::chrono::milliseconds(1000));
CPUSnapshot curSnap;
const float ACTIVE_TIME = curSnap.GetActiveTimeTotal() - previousSnap.GetActiveTimeTotal();
const float IDLE_TIME = curSnap.GetIdleTimeTotal() - previousSnap.GetIdleTimeTotal();
const float TOTAL_TIME = ACTIVE_TIME + IDLE_TIME;
int usage = 100.f * ACTIVE_TIME / TOTAL_TIME;
std::cout << "total cpu usage: " << usage << " %" << std::endl;
}
Compile it:
g++ -std=c++11 -o CPUUsage main.cpp CPUSnapshot.cpp CPUData.cpp
cat /proc/stat
http://www.linuxhowtos.org/System/procstat.htm
I suggest two files to starting...
/proc/stat and /proc/cpuinfo.
http://www.mjmwired.net/kernel/Documentation/filesystems/proc.txt
have a look at this C++ Lib.
The information is parsed from /proc/stat. it also parses memory usage from /proc/meminfo and ethernet load from /proc/net/dev
----------------------------------------------
current CPULoad:5.09119
average CPULoad 10.0671
Max CPULoad 10.0822
Min CPULoad 1.74111
CPU: : Intel(R) Core(TM) i7-10750H CPU # 2.60GHz
----------------------------------------------
network load: wlp0s20f3 : 1.9kBit/s : 920Bit/s : 1.0kBit/s : RX Bytes Startup: 15.8mByte TX Bytes Startup: 833.5mByte
----------------------------------------------
memory load: 28.4% maxmemory: 16133792 Kb used: 4581564 Kb Memload of this Process 170408 KB
----------------------------------------------