I was doing a microbenchmark. My code appears something like this
while(some condition){
struct timespec tps, tpe;
clock_gettime(CLOCK_REALTIME, &tps);
encrypt_data(some_data)
clock_gettime(CLOCK_REALTIME, &tpe);
long time_diff = tpe.tv_nsec - tps.tv_nsec;
usleep(1000);
}
However, the sleep time that I put in usleep() actually affects the observed time_diff that I get. If I measure the execution of this code using the skeleton above the time I get varies from ~1.8us to ~7us for sleep time of 100us and 1000us respectively. Why would the measured time change with the change in sleep time, when sleep time is outside the instrumentation block?
The time results are average of multiple runs. I am using Ubuntu 14.04 to run this code. For encryption I am using aesgcm from openssl.
I know that this is not the best way to microbenchmark but that is not the problem here.
Did you disable the CPU scaling?
sudo cpupower frequency-set --governor performance
See here and here.
Related
Currently I am coding a project that requires precise delay times over a number of computers. Currently this is the code I am using I found it on a forum. This is the code below.
{
LONGLONG timerResolution;
LONGLONG wantedTime;
LONGLONG currentTime;
QueryPerformanceFrequency((LARGE_INTEGER*)&timerResolution);
timerResolution /= 1000;
QueryPerformanceCounter((LARGE_INTEGER*)¤tTime);
wantedTime = currentTime / timerResolution + ms;
currentTime = 0;
while (currentTime < wantedTime)
{
QueryPerformanceCounter((LARGE_INTEGER*)¤tTime);
currentTime /= timerResolution;
}
}
Basically the issue I am having is this uses alot of CPU around 16-20% when I start to call on the function. The usual Sleep(); uses Zero CPU but it is extremely inaccurate from what I have read from multiple forums is that's the trade-off when you trade accuracy for CPU usage but I thought I better raise the question before I set for this sleep method.
The reason why it's using 15-20% CPU is likely because it's using 100% on one core as there is nothing in this to slow it down.
In general, this is a "hard" problem to solve as PCs (more specifically, the OSes running on those PCs) are in general not made for running real time applications. If that is absolutely desirable, you should look into real time kernels and OSes.
For this reason, the guarantee that is usually made around sleep times is that the system will sleep for atleast the specified amount of time.
If you are running Linux you could try using the nanosleep method (http://man7.org/linux/man-pages/man2/nanosleep.2.html) Though I don't have any experience with it.
Alternatively you could go with a hybrid approach where you use sleeps for long delays, but switch to polling when it's almost time:
#include <thread>
#include <chrono>
using namespace std::chrono_literals;
...
wantedtime = currentTime / timerResolution + ms;
currentTime = 0;
while(currentTime < wantedTime)
{
QueryPerformanceCounter((LARGE_INTEGER*)¤tTime);
currentTime /= timerResolution;
if(currentTime-wantedTime > 100) // if waiting for more than 100 ms
{
//Sleep for value significantly lower than the 100 ms, to ensure that we don't "oversleep"
std::this_thread::sleep_for(50ms);
}
}
Now this is a bit race condition prone, as it assumes that the OS will hand back control of the program within 50ms after the sleep_for is done. To further combat this you could turn it down (to say, sleep 1ms).
You can set the Windows timer resolution to minimum (usually 1 ms), to make Sleep() accurate up to 1 ms. By default it would be accurate up to about 15 ms. Sleep() documentation.
Note that your execution can be delayed if other programs are consuming CPU time, but this could also happen if you were waiting with a timer.
#include <timeapi.h>
// Sleep() takes 15 ms (or whatever the default is)
Sleep(1);
TIMECAPS caps_;
timeGetDevCaps(&caps_, sizeof(caps_));
timeBeginPeriod(caps_.wPeriodMin);
// Sleep() now takes 1 ms
Sleep(1);
timeEndPeriod(caps_.wPeriodMin);
I use clock_gettime() in Linux and QueryPerformanceCounter() in Windows to measure time. When measuring time, I encountered an interesting case.
Firstly, I'm calculating DeltaTime in infinite while loop. This loop calls some update functions. To calculating DeltaTime, the program's waiting in 40 milliseconds in an Update function because update functions is empty yet.
Then, in the program compiled as Win64-Debug i measure DeltaTime. It's approximately 0.040f. And this continues as long as the program is running (Win64-Release works like that too). It runs correctly.
But in the program compiled as Linux64-Debug or Linux64-Release, there is a problem.
When the program starts running. Everything is normal. DeltaTime is approximately 0.040f. But after a while, deltatime is calculated 0.12XXf or 0.132XX, immediately after it's 0.040f. And so on.
I thought I was using QueryPerformanceCounter correctly and using clock_gettime() incorrectly. Then I decided to try it with the standard library std::chrono::high_resolution_clock, but it's the same. No change.
#define MICROSECONDS (1000*1000)
auto prev_time = std::chrono::high_resolution_clock::now();
decltype(prev_time) current_time;
while(1)
{
current_time = std::chrono::high_resolution_clock::now();
int64_t deltaTime = std::chrono::duration_cast<std::chrono::microseconds>(current_time - previous_time).count();
printf("DeltaTime: %f", deltaTime/(float)MICROSECONDS);
NetworkManager::instance().Update();
prev_time = current_time;
}
void NetworkManager::Update()
{
auto start = std::chrono::high_resolution_clock::now();
decltype(start) end;
while(1)
{
end = std::chrono::high_resolution_clock::now();
int64_t y = std::chrono::duration_cast<std::chrono::microseconds>(end-start).count();
if(y/(float)MICROSECONDS >= 0.040f)
break;
}
return;
}
Normal
Problem
Possible causes:
Your clock_gettime is not using VDSO and is a system call instead - will be visible if run under strace, can be configured on modern kernel versions.
Your thread gets preempted (taken out of CPU by the scheduler). To run a clean experiment run your app with real time priority and pinned to a specific CPU core.
Also, I would disable CPU frequency scaling when experimenting.
I am trying to measure time take by processes in C++ program with linux and Vxworks. I have noticed that clock_gettime(CLOCK_REALTIME, timespec ) is accurate enough (resolution about 1 ns) to do the job on many Oses. For a portability matter I am using this function and running it on both Vxworks 6.2 and linux 3.7.
I ve tried to measure the time taken by a simple print:
#define <timers.h<
#define <iostream>
#define BILLION 1000000000L
int main(){
struct timespec start, end; uint32_t diff;
for(int i=0; i<1000; i++){
clock_gettime(CLOCK_REALTME, &start);
std::cout<<"Do stuff"<<std::endl;
clock_gettime(CLOCK_REALTME, &end);
diff = BILLION*(end.tv_sec-start.tv_sec)+(end.tv_nsec-start.tv_nsec);
std::cout<<diff<<std::endl;
}
return 0;
}
I compiled this on linux and vxworks. For linux results seemed logic (average 20 µs). But for Vxworks, I ve got a lot of zeros , then 5000000 ns , then a lot of zeros...
PS , for vxwroks, I runned this app on ARM-cortex A8, and results seemed random
have anyone seen the same bug before,
In vxworks, the clock resolution is defined by the system scheduler frequency. By default, this is typically 60Hz, however may be different dependant on BSP, kernel configuration, or runtime configuration.
The VxWorks kernel configuration parameters SYS_CLK_RATE_MAX and SYS_CLK_RATE_MIN define the maximum and minimum values supported, and SYS_CLK_RATE defines the default rate, applied at boot.
The actual clock rate can be modified at runtime using sysClkRateSet, either within your code, or from the shell.
You can check the current rate by using sysClkRateGet.
Given that you are seeing either 0 or 5000000ns - which is 5ms, I would expect that your system clock rate is ~200Hz.
To get greater resolution, you can increase the system clock rate. However, this may have undesired side effects, as this will increase the frequency of certain system operations.
A better method of timing code may be to use sysTimestamp which is typically driven from a high frequency timer, and can be used to perform high-res timing of short-lived activities.
I think in vxworks by default the clock resolution is 16.66ms which you can get by calling clock_getres() function. You can change the resolution by calling sysclkrateset() function(max resolution supported is 200us i guess by passing 5000 as argument to sysclkrateset function). You can then calculate the difference between two timestamps using difftime() function
I'm using boost::timer to time a section of my code. If I run the code with one thread:
$ time ./runfoo 1
Took 2.08s
real 0m2.086s
user 0m1.611s
sys 0m0.475s
2.08 is the output of t.elapsed().
Yet if I run the code with 4 threads:
$ time ./runfoo 4
Took 2.47s
real 0m1.022s
user 0m1.834s
sys 0m0.628s
It seems to be tracking user+sys time, not real time.
How do I make it track real time? I'm using Boost 1.46. To be more specific, the timer is in the main thread, which just calls a function that ends up using the multiple threads.
EDIT: The relevant source looks something like this:
boost::asio::io_service ios;
boost::thread_group pool;
boost::asio::io_service::work work(ios);
pool.create_thread(boost::bind(&boost::asio::io_service::run, &ioService));
pool.create_thread(boost::bind(&boost::asio::io_service::run, &ioService));
pool.create_thread(boost::bind(&boost::asio::io_service::run, &ioService));
pool.create_thread(boost::bind(&boost::asio::io_service::run, &ioService));
{
boost::timer t;
function_which_posts_to_ios(ios);
printf("Took %.2fs\n", t.elapsed();
}
As to what output I expect, I'd like the program in the 2nd run to print "Took 1.02s" instead of "Took 2.47s", as 1.02s was the actual amount of seconds that elapsed.
It appears that you are using the deprecated Version 1 timers where the semantics of elapsed() was not consistent across platforms, wall-clock time on some and CPU time on others. In the Version 2 cpu_timer, the elapsed() method returns a struct which has distinct members for real, user, and system time.
If you cannot use the Version 2 API, you can use boost::posix_time instead to measure wall clock time. See c++ boost get current time in milliseconds.
I am profiling CPU usage on a simple program I am writing. I have different algorithms I want to try, and I also want to know what's the impact on the total system performance.
Currently, I am using ualarm() to execute some instructions at 30Hz; every 15 of those interruptions (every 0.5s) I record the CPU time with getrusage() (in useconds), so I have an estimation on the total cpu time of cpu consumption on that point in time. But to get context, I also need to know the total time elapsed in the system in that time period, so I can have the % of which is used by my program.
/* Main Loop */
while(1)
{
alarm = 0;
/* Waiting Loop: */
for(i=0; !alarm; i++){
}
count++;
/* Do my things */
/* Check if it's time to store cpu log: */
if ((count%count_max) == 0)
{
getrusage(RUSAGE_SELF, &ru);
store_cpulog(f,
(int64_t) ru.ru_utime.tv_sec,
(int64_t) ru.ru_utime.tv_usec,
(int64_t) ru.ru_stime.tv_sec,
(int64_t) ru.ru_stime.tv_usec);
}
}
I have different options, but I don't know which one will provide the most exact result:
Use ualarm for the timing. Currently it's programmed to signal every 0.5 seconds, so I can take those 0.5 seconds as the CPU time. Seems quite obvious to use, but it's the best option?
Use clock_gettime(CLOCK_MONOTONIC): it provides readings with a nanosec resolution.
Use gettimeofday(): provides readings with a usec resolution. I've found opinions against using it.
Any recommendation? Thanks.
Possible solution is to use system function time and don't using busy loop (like #Hasturkun say) in your program. Call in console:
time /path/to/my/program
and after execution of it you get something like:
real 0m1.465s
user 0m0.000s
sys 0m1.210s
Not sure about precision, if it is enough for you.
Callgrind is possibly the best application for profiling C/C++ code under linux. Use it with pride:)