I am trying to measure the execution time of FIO benchmark. I am, currently, doing so wrapping the FIO call between gettimeofday():
gettimeofday(&startFioFix, NULL);
FILE* process = popen("fio --name=randwrite --ioengine=posixaio rw=randwrite --size=100M --direct=1 --thread=1 --bs=4K", "r");
gettimeofday(&doneFioFix, NULL);
and calculate the elapsed time as:
double tstart = startFioFix.tv_sec + startFioFix.tv_usec / 1000000.;
double tend = doneFioFix.tv_sec + doneFioFix.tv_usec / 1000000.;
double telapsed = (tend - tstart);
Now, the question(s) is
telapsed time is different (larger) than the runt by FIO output. Can you please help me in understanding Why? as the fact can be seen in FIO output:
randwrite: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=posixaio, iodepth=1
fio-2.2.8
Starting 1 thread
randwrite: (groupid=0, jobs=1): err= 0: pid=3862: Tue Nov 1 18:07:50 2016
write: io=102400KB, bw=91674KB/s, iops=22918, runt= 1117msec
...
and the telapsed is:
telapsed: 1.76088 seconds
what is the actual time taken by FIO execution:
a) runt given by FIO, or
b) the elapsed time by getttimeofday()
How does FIO measure its runt? (probably, this question linked to 1.)
PS: I have tried to replace the gettimeofday(with std::chrono::high_resolution_clock::now()), but it also behaves the same (by same, I mean it also gives larger elapsed time than runt)
Thank you in advance, for your time and assistance.
A quick point:gettimeofday() on Linux uses a clock that doesn't necessarily tick at a constant interval and can even move backwards (see http://man7.org/linux/man-pages/man2/gettimeofday.2.html and https://stackoverflow.com/a/3527632/4513656 ) - this may make telapsed unreliable (or even negative).
Your gettimeofday/popen/gettimeofday measurement (telapsed) is going to be: the fio process start up (i.e. fork+exec on Linux) elapsed + fio initialisation (e.g. thread creation because I see --thread, ioengine initialisation) + fio job elapsed (runt) + fio stopping elapsed + process stop elapsed). You are comparing this to just runt which is a sub component of telapsed. It is unlikely all the non-runt components are going to happen instantly (i.e. take up 0 usecs) so the expectation is that runt will be smaller than telapsed. Try running fio with --debug=all just to see all the things it does in addition to actually submitting I/O for the job.
This is difficult to answer because it depends on what you want you mean when you say "fio execution" and why (i.e. the question is hard to interpret in an unambiguous way). Are you interested in how long fio actually spent trying to submit I/O for a given job (runt)? Are you interested in how long it takes your system to start/stop a new process that just so happens to try and submit I/O for a given period (telapsed)? Are you interested in how much CPU time was spent submitting I/O (none of the above)? So because I'm confused I'll ask you some questions instead: what are you going to use the result for and why?
Why not look at the source code? https://github.com/axboe/fio/blob/7a3b2fc3434985fa519db55e8f81734c24af274d/stat.c#L405 shows runt comes from ts->runtime[ddir]. You can see it is initialised by a call to set_epoch_time() (https://github.com/axboe/fio/blob/6be06c46544c19e513ff80e7b841b1de688ffc66/backend.c#L1664 ), is updated by update_runtime() ( https://github.com/axboe/fio/blob/6be06c46544c19e513ff80e7b841b1de688ffc66/backend.c#L371 ) which is called from thread_main().
Related
I use Fortran to do some scientific computation. I use HPC. As we know, when we submit jobs in a HPC job scheduler, we also specify the wall clock time limit for our jobs. However, when the time is up, if the job is still writing output data, it will be terminated and it will cause some 'NUL' values in the data, causing trouble for the post-processing:
So, could we set an internal mechanism that our job can stop itself peacefully some time before the end of HPC allowance time?
Related Question: How to skip reading "NUL" value in MATLAB's textscan function?
After realizing what you are asking I found out that I implemented similar functionality in my program very recently (commit https://bitbucket.org/LadaF/elmm/commits/f10a1b3421a3dd14fdcbe165aa70bf5c5001413f). But I still have to set the time limit manually.
The most important part:
time_stepping%clock_time_limit is the time limit in seconds. Count the number of system clock ticks corresponding to that:
call system_clock(count_rate = timer_rate)
call system_clock(count_max = timer_max_count)
timer_count_time_limit = int( min(time_stepping%clock_time_limit &
* real(timer_rate, knd), &
real(timer_max_count, knd) * 0.999_dbl) &
, dbl)
Start the timer
call system_clock(count = time_steps_timer_count_start)
Check the timer and exit the main loop with error_exit set to .true. if the time is up
if (mod(time_step,time_stepping%check_period)==0) then
if (master) then
error_exit = time_steps_timer_count_2 - time_steps_timer_count_start > timer_count_time_limit
if (error_exit) write(*,*) "Maximum clock time exceeded."
end if
MPI_Bcast the error exit to other processes
if (error_exit) exit
end if
Now, you may want to get the time limit from your scheduler automatically. That will vary between different job scheduling softwares. There will be an environment variable like $PBS_WALLTIME. See Get walltime in a PBS job script but check your scheduler's manual.
You can read this variable using GET_ENVIRONMENT_VARIABLE()
Normally in an IDE, when you run a program the IDE will tell you the total amount of time that it took to run the program. Is there a way to get the total amount of time that it takes to run a program when using the terminal in Unix/Linux to compile and run?
I'm aware of ctime which allows for getting the total time since 1970, however I want to get just the time that it takes for the program to run.
You can start programs with time:
[:~/tmp] $ time sleep 1
real 0m1.007s
user 0m0.001s
sys 0m0.003s
You are on the right track! You can get the current time and subtract it from the end time of your program. The code below illustrated:
time_t begin = time(0); // get current time
// Do Stuff //
time_t end = time(0); // get current time
// Show number of seconds that have passed since program began //
std::cout << end - begin << std::endl;
NOTE: The time granularity is only a single second. If you need higher granularity, I suggest looking into precision timers such as QueryPerformanceCounter() on windows or clock_gettime() on linux. In both cases, the code will likely work very similarly.
As an addendum to mdsl's answer, if you want to get something close to that measurement in the program itself, you can get the time at the start of the program and get the time at the end of the program (as you said, in time since 1970) - then subtract the start time from the end time.
I am profiling CPU usage on a simple program I am writing. I have different algorithms I want to try, and I also want to know what's the impact on the total system performance.
Currently, I am using ualarm() to execute some instructions at 30Hz; every 15 of those interruptions (every 0.5s) I record the CPU time with getrusage() (in useconds), so I have an estimation on the total cpu time of cpu consumption on that point in time. But to get context, I also need to know the total time elapsed in the system in that time period, so I can have the % of which is used by my program.
/* Main Loop */
while(1)
{
alarm = 0;
/* Waiting Loop: */
for(i=0; !alarm; i++){
}
count++;
/* Do my things */
/* Check if it's time to store cpu log: */
if ((count%count_max) == 0)
{
getrusage(RUSAGE_SELF, &ru);
store_cpulog(f,
(int64_t) ru.ru_utime.tv_sec,
(int64_t) ru.ru_utime.tv_usec,
(int64_t) ru.ru_stime.tv_sec,
(int64_t) ru.ru_stime.tv_usec);
}
}
I have different options, but I don't know which one will provide the most exact result:
Use ualarm for the timing. Currently it's programmed to signal every 0.5 seconds, so I can take those 0.5 seconds as the CPU time. Seems quite obvious to use, but it's the best option?
Use clock_gettime(CLOCK_MONOTONIC): it provides readings with a nanosec resolution.
Use gettimeofday(): provides readings with a usec resolution. I've found opinions against using it.
Any recommendation? Thanks.
Possible solution is to use system function time and don't using busy loop (like #Hasturkun say) in your program. Call in console:
time /path/to/my/program
and after execution of it you get something like:
real 0m1.465s
user 0m0.000s
sys 0m1.210s
Not sure about precision, if it is enough for you.
Callgrind is possibly the best application for profiling C/C++ code under linux. Use it with pride:)
I want to run a function for example func() exactly 1 time per second. However the running time of func() is about 500 ms. How Can I do that? I know if the running time of the function is low, I can write a while loop in func() and sleep() for 1 second after each execution. But now, the running time is high. What should I do to ensure the func() run exactly 1 time per second? Thanks.
Yo do:
Take the current time in start_time.
Perform your job
Take the current time in end_time
Wait for (1 second + start_time - end_time)
That way, you can perform your tasks every seconds reliably. If the task takes less time, you will wait longer and vice versa. Note however that this assumes that your task takes always less than 1 sec. to execute. In the real code, you want to check for that before the sleep statement.
Implementation details depend on the platform.
Note that using this method still results in a small drift due to the time it takes to compute step 4. A more accurate alternative would be to synchronize on integer multiple of one second. That way, over 1000s of cycles you would not drift.
It depends on the level of accuracy you need.
If you want a brute, easy to code solution, you can get the time before first run of the function and save it in some variable (start_time). Create repeat index count variable (repeat_number) that stores next repeat number. Then you can do kinda this:
1) next_run_time = ++repeat_number*1sec + start_time;
2) func();
3) wait_time = next_run_time - current_time;
4) sleep(wait_time)
5) goto 1;
This approach disables accumulation of time error on each iteration.
But for the real application you should find some event framework or library.
I'm trying to make my program self timed and I know two methods
1) using getrusage
truct rusage startu; struct rusage endu;
getrusage(RUSAGE_SELF, &startu);
//Do computation here
getrusage(RUSAGE_SELF, &endu);
double start_sec = start.ru_utime.tv_sec + startu.ru_utime.tv_usec/1000000.0;
double end_sec = endu.ru_utime.tv_sec + endu.ru_utime.tv_usec/1000000.0;
double duration = end_sec - start_sec;
This fetches the user time of a program segment.
2) using clock(), which gets the processor's executing time
double start_sec = (double)clock()/CLOCKS_PER_SEC;
//Do computation here
double end_sec = (double)clock()/CLOCKS_PER_SEC;
double duration = end_sec - start_sec;
This fetches the real time of a program segment.
However, I get really long sys time for both methods. The user time is also longer than without these timings. System time sometimes even doubles the user time.
For example, I'm doing Traveling Salesman Problem, for a input that runs around 3 seconds for both user and real time normally, these two timings both make the user time to be over 5 seconds and real time over 15 secs, which means sys time is around 10 seconds long.
I hope to know if there are ways of improvements or other libraries that are capable of shortening the sys time and user time if possible. If I have to user other libraries, I want libraries for both user time timing and real time timing.
Thanks for any advice!
I suggest to carefully read the time(7) man page and to consider also the clock_gettime(2) syscall.