Implementing a low-overhead interval timer with C++ in Linux

I am trying to implement a background user-space program that performs various tasks and calculations every 100 ms in Linux. I can do this using the alarm-signal mechanism in Linux; the following is how I implemented my interval timer:
#include <stdio.h>
#include <string.h>
#include <signal.h>
#include <sys/time.h>

void timer_handler (int signum){
    printf("In timer handler!\n");
}

int main (){
    struct sigaction s_action;
    struct itimerval timer;

    /* Set up a timer. */
    /* Install timer_handler as the signal handler for SIGVTALRM. */
    memset (&s_action, 0, sizeof (s_action));
    s_action.sa_handler = &timer_handler;
    sigaction (SIGVTALRM, &s_action, NULL);

    /* Timer configuration for 100 ms. */
    timer.it_value.tv_sec = 0;
    timer.it_value.tv_usec = 100000;
    timer.it_interval.tv_sec = 0;
    timer.it_interval.tv_usec = 100000;

    /* Arm the timer. */
    setitimer (ITIMER_VIRTUAL, &timer, NULL);

    while(1);
}
This approach, however, does not seem to be optimal in terms of performance, because of the infinite loop: it significantly increases CPU utilization. Once this program is running in the background, the performance of my other programs degrades by as much as 15%.
What I would ideally want is a program that keeps sleeping unless a timer interrupt occurs. Multi-threading seems to be an option, but I am not very experienced with that subject. I would appreciate any suggestions or pointers on how to implement such a program with minimal overhead.

Read time(7), signal(7), timerfd_create(2), poll(2), nanosleep(2) and Advanced Linux Programming.
Your signal handler is incorrect (it should not call printf, which is not async-signal-safe; it could call write instead).
You could have
while(1) poll(NULL, 0, 1);
but a real event loop using poll on a file descriptor initialized with timerfd_create would probably be better.
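For illustration, a minimal sketch of such an event loop (assuming the 100 ms period from the question; error handling omitted):

#include <sys/timerfd.h>
#include <poll.h>
#include <unistd.h>
#include <cstdint>
#include <cstdio>
#include <cstring>

int main()
{
    /* The timer is delivered through a file descriptor instead of a signal. */
    int tfd = timerfd_create(CLOCK_MONOTONIC, 0);

    /* First expiration after 100 ms, then every 100 ms. */
    struct itimerspec spec;
    std::memset(&spec, 0, sizeof spec);
    spec.it_value.tv_nsec    = 100 * 1000 * 1000;
    spec.it_interval.tv_nsec = 100 * 1000 * 1000;
    timerfd_settime(tfd, 0, &spec, NULL);

    struct pollfd pfd;
    pfd.fd = tfd;
    pfd.events = POLLIN;

    for (;;)
    {
        /* Blocks in the kernel until the timer expires: no busy-waiting. */
        if (poll(&pfd, 1, -1) > 0)
        {
            uint64_t expirations;
            read(tfd, &expirations, sizeof expirations);   /* must drain the fd */
            std::printf("timer fired\n");
            /* ... perform the periodic work here ... */
        }
    }
}

Because poll blocks in the kernel until the timer fd becomes readable, the process consumes essentially no CPU between expirations.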
I assume, of course, that you are fairly confident that every periodic task lasts much less than the period (e.g. that each task needs no more than 50 milliseconds but has a 100 millisecond period).

Related

Safest way to implement multiple timers inside a thread in C++

As the title says, I'm looking for the best way to implement multiple timers in C++ (not C++11).
My idea is to have a single pthread (POSIX) handle all the timers.
I need at least 4 timers, 3 periodic and 1 single shot.
The shortest timer has a period of 1 second (so 1-second resolution is enough), and the longest one runs for 15 hours.
All the timers should be running at the same time.
These are the different implementations that come to my mind (I don't know whether they are the safest in a threaded environment, or the easiest):
1) Using the itimerspec, sigaction and sigevent structures, like this:
static int Tcreate( char *name, timer_t *timerID, int expireMS, int intervalMS )
{
    struct sigevent te;
    struct itimerspec its;
    struct sigaction sa;
    int sigNo = SIGRTMIN;

    /* Install 'app' (an SA_SIGINFO-style handler defined elsewhere)
       for the real-time signal. */
    sa.sa_flags = SA_SIGINFO;
    sa.sa_sigaction = app;
    sigemptyset(&sa.sa_mask);
    if (sigaction(sigNo, &sa, NULL) == -1)
    {
        perror("sigaction");
    }

    /* Set and enable alarm. Note that tv_nsec must stay below one second,
       so the millisecond values are split into seconds + nanoseconds. */
    te.sigev_notify = SIGEV_SIGNAL;
    te.sigev_signo = sigNo;
    te.sigev_value.sival_ptr = timerID;
    timer_create(CLOCK_REALTIME, &te, timerID);

    its.it_interval.tv_sec = intervalMS / 1000;
    its.it_interval.tv_nsec = (intervalMS % 1000) * 1000000;
    its.it_value.tv_sec = expireMS / 1000;
    its.it_value.tv_nsec = (expireMS % 1000) * 1000000;
    timer_settime(*timerID, 0, &its, NULL);

    return 1;
}
2) Using clock() and checking the time difference, like this:
std::clock_t start;
double duration;
start = std::clock();
duration = ( std::clock() - start ) / (double) CLOCKS_PER_SEC;
3) Using chrono like this:
// Days is not a standard duration; define it as:
typedef chrono::duration<int, std::ratio<24 * 3600> > Days;

auto diff = tp - chrono::system_clock::time_point();
cout << "diff:" << chrono::duration_cast<chrono::minutes>(diff).count()
     << " minute(s)" << endl;
Days days = chrono::duration_cast<Days>(diff);
cout << "diff:" << days.count() << " day(s)" << endl;
Please, consider these as ideas, not actual working code.
What is your opinion about them?
If your timer thread is responsible only for the timers, and the minimum resolution is 1 second, and the timing doesn't need to be that precise (i.e. if +/- 0.1 second is good enough), then a simple implementation for the timer thread is to just sleep for 1 second, check for any timers that need to fire, and repeat, as in the following pseudocode:
repeat:
    sleep 1
    t = t + 1
    for timer in timers where timer(t) = true:
        fire(timer)
The hard part will be populating the structure that stores the timers - presumably timers will be set by other threads, possibly by multiple threads that could try to set timers simultaneously. It would be advisable to use some standard data structure like a thread-safe queue to pass messages to the timer thread, which on each cycle would then update the collection of timers itself:
repeat:
    sleep 1
    t = t + 1
    while new_timer_spec = pop(timer_queue):
        add_timer(new_timer_spec)
    for timer in timers where timer(t) = true:
        fire(timer)
Another thing to consider is the nature of fire(timer) - what to do here really depends on the needs of the threads that use the timers. Perhaps just setting a variable that they could read would be sufficient, or maybe this could fire a signal that threads could listen for.
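To make the pseudocode concrete, here is a rough C++ sketch (pre-C++11, matching the question's constraint; the Timer structure, add_timer and the callback signature are illustrative assumptions, not working code from the question):

#include <pthread.h>
#include <unistd.h>
#include <deque>
#include <vector>

/* Illustrative timer record: counts down in 1-second ticks;
   a period of 0 marks a one-shot timer. */
struct Timer
{
    long remaining;          /* seconds until the next firing */
    long period;             /* reload value; 0 for one-shot */
    void (*fire)(void *);    /* action to run on expiry */
    void *arg;               /* passed back to fire() */
};

static pthread_mutex_t g_queue_mutex = PTHREAD_MUTEX_INITIALIZER;
static std::deque<Timer> g_new_timers;   /* other threads push here */

/* Called from any thread to schedule a timer. */
void add_timer(const Timer &t)
{
    pthread_mutex_lock(&g_queue_mutex);
    g_new_timers.push_back(t);
    pthread_mutex_unlock(&g_queue_mutex);
}

void *timer_thread(void *)
{
    std::vector<Timer> timers;   /* private to this thread */
    for (;;)
    {
        sleep(1);   /* 1-second resolution, +/- scheduling jitter */

        /* Drain newly requested timers into the private collection. */
        pthread_mutex_lock(&g_queue_mutex);
        while (!g_new_timers.empty())
        {
            timers.push_back(g_new_timers.front());
            g_new_timers.pop_front();
        }
        pthread_mutex_unlock(&g_queue_mutex);

        /* Tick every timer; fire, then rearm or drop the expired ones. */
        for (size_t i = 0; i < timers.size(); )
        {
            if (--timers[i].remaining <= 0)
            {
                timers[i].fire(timers[i].arg);
                if (timers[i].period > 0)
                {
                    timers[i].remaining = timers[i].period;   /* periodic */
                    ++i;
                }
                else
                {
                    timers.erase(timers.begin() + i);         /* one-shot */
                }
            }
            else
            {
                ++i;
            }
        }
    }
    return NULL;
}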
Since all your timer creation apparently goes through a single API (i.e., the controlling code has visibility into all timers), you can avoid signals or busy-looping entirely and keep a sorted list of timers (like a std::map keyed by deadline), and simply wait on a condition variable using (for example) pthread_cond_timedwait. The condition variable mutex protects the list of timers.
If you schedule a new timer whose deadline is earlier than the current "next" timer, you'll need to wake the sleeping thread and schedule an adjusted sleep (if it wasn't for this requirement you could use plain usleep or whatever). This all happens inside the mutex associated with the condition variable.
You don't have to use condition variables, but they seem the cleanest, since the associated mutex is naturally used to protect the list of timers. You could probably also build this on top of a semaphore with sem_timedwait, or on top of select on an internal socket or pipe or something like that, but then you're stuck separately controlling multi-threaded access to the timer queue.
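For instance, a sketch of this design (illustrative names; a multimap keyed by absolute deadline stands in for the sorted list, and plain function pointers stand in for the timer actions):

#include <pthread.h>
#include <time.h>
#include <map>

typedef void (*timer_cb)(void *);

static pthread_mutex_t g_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  g_cond  = PTHREAD_COND_INITIALIZER;
static std::multimap<time_t, timer_cb> g_timers;   /* deadline -> action */

/* Schedule 'cb' to run 'seconds' from now. Signalling the condition
   wakes the timer thread so it can re-evaluate the earliest deadline. */
void schedule(timer_cb cb, long seconds)
{
    pthread_mutex_lock(&g_mutex);
    g_timers.insert(std::make_pair(time(NULL) + seconds, cb));
    pthread_cond_signal(&g_cond);
    pthread_mutex_unlock(&g_mutex);
}

void *timer_thread(void *)
{
    pthread_mutex_lock(&g_mutex);
    for (;;)
    {
        if (g_timers.empty())
        {
            /* Nothing scheduled: sleep until schedule() signals us. */
            pthread_cond_wait(&g_cond, &g_mutex);
        }
        else
        {
            /* Sleep until the earliest deadline, or until woken early
               because an even earlier timer was scheduled. */
            struct timespec ts;
            ts.tv_sec  = g_timers.begin()->first;
            ts.tv_nsec = 0;
            pthread_cond_timedwait(&g_cond, &g_mutex, &ts);
        }

        /* Fire everything whose deadline has passed. */
        time_t now = time(NULL);
        while (!g_timers.empty() && g_timers.begin()->first <= now)
        {
            timer_cb cb = g_timers.begin()->second;
            g_timers.erase(g_timers.begin());
            pthread_mutex_unlock(&g_mutex);   /* don't hold the lock in callbacks */
            cb(NULL);
            pthread_mutex_lock(&g_mutex);
        }
    }
    return NULL;
}

Note that pthread_cond_timedwait takes an absolute deadline, which is exactly what the map key stores, so when an earlier timer arrives the wait is simply restarted with the new head of the map.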

How can I measure CPU time in C++ on windows and include calls of system()?

I want to run some benchmarks on a C++ algorithm and want to get the CPU time it takes, depending on inputs. I use Visual Studio 2012 on Windows 7. I already discovered one way to calculate the CPU time in Windows: How can I measure CPU time and wall clock time on both Linux/Windows?
However, I use the system() command in my algorithm, which is not measured that way. So, how can I measure CPU time and include the times of my script calls via system()?
I should add a small example. This is my get_cpu_time function (from the link above):
double get_cpu_time(){
    FILETIME a,b,c,d;
    if (GetProcessTimes(GetCurrentProcess(),&a,&b,&c,&d) != 0){
        // Returns total user time.
        // Can be tweaked to include kernel times as well.
        return
            (double)(d.dwLowDateTime |
            ((unsigned long long)d.dwHighDateTime << 32)) * 0.0000001;
    }else{
        // Handle error
        return 0;
    }
}
That works fine so far, and when I wrote a program that sorts an array (or does some other work that takes a while), it worked fine. However, when I use the system() command, as in this case, it doesn't:
int main( int argc, const char* argv[] )
{
    double start = get_cpu_time();
    double end;
    system("Bla.exe");
    end = get_cpu_time();
    printf("Everything took %f seconds of CPU time", end - start);
    std::cin.get();
}
Executing the given exe file on its own, measured the same way, takes about 5 seconds. When I run it via system(), the whole thing takes 0 seconds of CPU time, which obviously does not include the execution of the exe file.
One possibility would be to get a HANDLE on the process launched by the system() call; is that possible somehow?
Linux:
For the wall clock time, use gettimeofday() or clock_gettime()
For the CPU time, use getrusage() or times()
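For example, a minimal getrusage reading (note that RUSAGE_CHILDREN accumulates the times of waited-for child processes, which is the Linux counterpart of the system() accounting problem in the question):

#include <sys/resource.h>

/* CPU seconds (user + system) consumed by this process, or by its
   waited-for children when 'who' is RUSAGE_CHILDREN. */
static double cpu_seconds(int who)
{
    struct rusage ru;
    getrusage(who, &ru);
    return (ru.ru_utime.tv_sec + ru.ru_stime.tv_sec)
         + (ru.ru_utime.tv_usec + ru.ru_stime.tv_usec) / 1e6;
}

/* cpu_seconds(RUSAGE_SELF) + cpu_seconds(RUSAGE_CHILDREN) covers the
   process itself plus anything it ran via system() and waited for. */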
It will actually print the CPU time that your program takes. But if you use threads in your program, it will not work properly: you should wait for each thread to finish its job before taking the final CPU time. So basically you should write this:
WaitForSingleObject(threadhandle, INFINITE);
If you don't know exactly what your program uses (whether it's multithreaded or not), you can create a thread to do the job, wait for the thread to terminate, and measure the time:
DWORD WINAPI MyThreadFunction( LPVOID lpParam );

int main()
{
    DWORD dwThreadId;
    HANDLE hThread;
    double startcputime, endcputime;

    startcputime = get_cpu_time();   /* e.g. the helper shown above */
    hThread = CreateThread(
        NULL,               // default security attributes
        0,                  // use default stack size
        MyThreadFunction,   // thread function name
        NULL,               // argument to thread function
        0,                  // use default creation flags
        &dwThreadId);       // receives the thread identifier
    WaitForSingleObject(hThread, INFINITE);
    endcputime = get_cpu_time();

    std::cout << "it took " << endcputime - startcputime << " s of CPU to execute this\n";
    return 0;
}

DWORD WINAPI MyThreadFunction( LPVOID lpParam )
{
    // do your job here
    return 0;
}
If you're using C++11 (or have access to it), std::chrono has all of the functions you need to calculate how long a program has run.
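For example (bear in mind that std::chrono clocks measure elapsed wall time rather than CPU time, so a system() child shows up only in the wall-clock figure):

#include <chrono>
#include <cstdio>
#include <cstdlib>

int main()
{
    std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();

    std::system("Bla.exe");   /* the child's run is included in wall time */

    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    std::printf("Everything took %f seconds of wall time\n", elapsed.count());
    return 0;
}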
You'll need to add your process to a Job object before creating any child processes. Child processes will then automatically run in the same job, and the information you want can be found in the TotalUserTime and TotalKernelTime members of the JOBOBJECT_BASIC_ACCOUNTING_INFORMATION structure, available through the QueryInformationJobObject function.
Further information:
Resource Accounting for Jobs
JOBOBJECT_BASIC_ACCOUNTING_INFORMATION structure
Beginning with Windows 8, nested jobs are supported, so you can use this method even if some of the programs already rely on job objects.
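A sketch of the job-object approach (error handling omitted; "Bla.exe" is the example child from the question):

#include <windows.h>
#include <cstdio>
#include <cstdlib>

int main()
{
    /* Create an anonymous job and put the current process in it; child
       processes started afterwards (e.g. via system()) inherit the job. */
    HANDLE hJob = CreateJobObject(NULL, NULL);
    AssignProcessToJobObject(hJob, GetCurrentProcess());

    std::system("Bla.exe");

    /* Accumulated user/kernel time of every process that ever ran in
       the job, in 100-nanosecond ticks. */
    JOBOBJECT_BASIC_ACCOUNTING_INFORMATION info;
    QueryInformationJobObject(hJob, JobObjectBasicAccountingInformation,
                              &info, sizeof(info), NULL);
    double cpu = (info.TotalUserTime.QuadPart +
                  info.TotalKernelTime.QuadPart) * 1e-7;
    std::printf("Job CPU time: %f seconds\n", cpu);
    return 0;
}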
I don't think there is a cross-platform mechanism. Using CreateProcess to launch the application, with a WaitForSingleObject to wait for it to finish, would let you get a direct descendant's times. After that you would need job objects for complete accounting (if you need to time grandchildren).
You might also give external sampling profilers a shot. I've used the freebie "Sleepy" [http://sleepy.sourceforge.net/] and the even better "Very Sleepy" [http://www.codersnotes.com/sleepy/] profilers under Windows and been very happy with the results -- nicely formatted info in a few minutes with virtually no effort.
There is a similar project called "Shiny" [http://sourceforge.net/projects/shinyprofiler/] that is supposed to work on both Windows and *nix.
You can try using the Boost timer. It is cross-platform capable. Sample code from the Boost website:
#include <boost/timer/timer.hpp>
#include <cmath>
int main() {
    boost::timer::auto_cpu_timer t;
    for (long i = 0; i < 100000000; ++i)
        std::sqrt(123.456L); // burn some time
    return 0;
}

Why main thread is slower than worker thread in pthread-win32?

#include <pthread.h>
#include <cstdio>
#include <cmath>
#include <ctime>

void* worker(void*)
{
    int clk = clock();
    float val = 0;
    for(int i = 0; i != 100000000; ++i)
    {
        val += sin(i);
    }
    printf("val: %f\n", val);
    printf("worker: %d ms\n", int(clock() - clk));
    return 0;
}

int main()
{
    pthread_t tid;
    pthread_create(&tid, NULL, worker, NULL);

    int clk = clock();
    float val = 0;
    for(int i = 0; i != 100000000; ++i)
    {
        val += sin(i);
    }
    printf("val: %f\n", val);
    printf("main: %d ms\n", int(clock() - clk));

    pthread_join(tid, 0);
    return 0;
}
Main thread and the worker thread are supposed to run equally fast, but the result is:
val: 0.782206
worker: 5017 ms
val: 0.782206
main: 8252 ms
The main thread is much slower; I don't know why....
Problem solved. It's a compiler problem: GCC (MinGW) behaves weirdly on Windows.
I compiled the code in Visual Studio 2012, and there's no speed difference.
Main thread and the worker thread are supposed to run equally fast, but the result is:
I have never seen a threading system outside a realtime OS that provided such guarantees. With Windows threads and every other threading system I have used (POSIX threads, whatever the lightweight threading on Mac OS X is, and C# threads) on desktop systems, my understanding is that there are no performance guarantees about how fast one thread will run in relation to another.
A possible explanation (speculation) could be that since you are using a modern quad-core, it could be raising the clock rate on the main core. When the workload is mostly single-threaded, modern i5/i7/AMD FX systems raise the clock rate of one core to a pre-rated level whose heat stock cooling can dissipate. On more parallel workloads all the cores get a smaller bump in clock speed, again pre-rated based on heat dissipation, and when idle all of the cores are throttled down to minimize power usage. It is possible that the background work is mostly performed on a single core, and the amount of time the second thread spends on the second core is not enough to justify switching to the mode where all the cores' speeds are boosted.
I would try again with 4 threads and 10x the workload. If you have a tool that monitors CPU load and clock speeds, I would check that. Using that information you can infer whether I am right or wrong.
Another option might be profiling to see what part of the work is taking the time. It could be that the OS calls are taking more time than your workload.
You could also test your software on another machine with different performance characteristics, such as a steady clock speed or a single core. This would provide more information.
What could be happening is that the worker thread's execution is being interleaved with main's execution, so that some of the worker thread's execution time is counted against main's time. You could try putting a sleep(10) (some time larger than the combined run time of the worker and main) at the very beginning of the worker and running again.

Concurrent server using pthread API

I am writing a simple client-server application using the pthreads API, which in pseudocode
looks something like this:
static volatile sig_atomic_t g_running = 1;
static volatile sig_atomic_t g_threads = 0;
static pthread_mutex_t g_threads_mutex;

static void signalHandler(int signal)
{
    g_running = 0;
}

static void *threadServe(void *params)
{
    /* Increment the number of currently running threads. */
    pthread_mutex_lock(&g_threads_mutex);
    g_threads++;
    pthread_mutex_unlock(&g_threads_mutex);

    /* handle client's request */

    /* Decrement the number of running threads. */
    pthread_mutex_lock(&g_threads_mutex);
    g_threads--;
    pthread_mutex_unlock(&g_threads_mutex);
    return NULL;
}

int main(int argc, char *argv[])
{
    /* do all the initialisation
       (set up signal handlers, listening socket, ... ) */

    /* run the server loop */
    while (g_running)
    {
        int comm_sock = accept(listen_socket, NULL, 0);
        pthread_t thread_id;
        pthread_create(&thread_id, NULL, &threadServe, (void *)(intptr_t)comm_sock);
        pthread_detach(thread_id);
    }

    /* wait for all threads that are still busy processing client requests */
    while (1)
    {
        std::cerr << "Waiting for all threads to finish" << std::endl;
        pthread_mutex_lock(&g_threads_mutex);
        if (g_threads <= 0)
        {
            pthread_mutex_unlock(&g_threads_mutex);
            break;
        }
        pthread_mutex_unlock(&g_threads_mutex);
    }

    /* clean up */
}
So the server runs in an infinite loop until a signal (SIGINT or SIGTERM) is received. The purpose of the second while loop is to give all the threads that were processing client requests when the signal arrived a chance to finish the work they had already started.
However, I don't like this design very much, because that second while loop is basically a busy loop wasting CPU resources.
I tried to search Google for good examples of a threaded concurrent server, but I had no luck. An idea that came to my mind was to use pthread_cond_wait() instead of that loop, but I am not sure whether it would bring further problems.
So the question is: how can I improve my design, or can you point me to a nice, simple example that deals with a problem similar to mine?
EDIT:
I was considering pthread_join(), but I didn't know how to join with a worker thread
while the main server loop (with the accept() call in it) was still running.
If I called pthread_join() somewhere after pthread_create()
(instead of pthread_detach()), then the while loop would block until the worker
thread was done, and the whole threading would not make sense.
I could use pthread_join() if I spawned all the threads at program start,
but then I would have them around for the entire life of my server,
which I thought might be a little inefficient.
Also, after reading the man page I understood that pthread_detach() is exactly
suitable for this purpose.
The busy loop slurping CPU can easily be fixed by adding a usleep(10000); or something like that outside your mutex lock.
It would be more lightweight to use a std::atomic<int> g_threads; that way, you could get rid of the mutex altogether.
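A sketch of that variant, with a short sleep replacing the busy loop (C++11 for std::atomic; the names follow the question's code):

#include <atomic>
#include <unistd.h>

static std::atomic<int> g_threads(0);   /* no mutex needed */

static void *threadServe(void *params)
{
    ++g_threads;               /* atomic increment */
    /* handle the client's request */
    --g_threads;               /* atomic decrement */
    return NULL;
}

/* In main(), after the accept loop exits: */
static void wait_for_workers()
{
    while (g_threads.load() > 0)
        usleep(10000);         /* sleep 10 ms between checks instead of spinning */
}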
If you have an array of (active) thread IDs, you could instead just join them in a loop:

for(i = 0; i < num_active_threads; i++)
    pthread_join(arr[i], NULL);

pthread sleep function, cpu consumption

Apologies in advance for my far-from-perfect English.
I recently wrote a daemon for Linux (an OpenWRT router, to be exact) in C++, and I ran into a problem.
There are a few threads: one for each open TCP connection, a main thread waiting for new TCP connections, and, as I call it, a commander thread that checks status.
Everything works fine, but my CPU is always at 100%. I know that it's because of the commander code:
void *CommanderThread(void* arg)
{
    Commander* commander = (Commander*)arg;
    pthread_detach(pthread_self());
    clock_t endwait;

    while(true)
    {
        uint8_t temp;
        endwait = clock () + (int)(1 * CLOCKS_PER_SEC);
        for(int i = 0; i < commander->GetCount(); i++)
        {
            ptrRelayBoard rb = commander->GetBoard(i);
            if (rb != NULL)
                rb->Get(0x01, &temp);
        }
        /* Busy-wait until one second has elapsed. */
        while (clock() < endwait);
    }
    return NULL;
}
As you can see, the program does its work every 1 s. Timing is not critical here. I know that the CPU is constantly checking whether the time has passed. I tried to do something like this:
while (clock() < endwait)
    usleep(200);
But usleep (and sleep as well) seems to freeze the clock() increment (it always holds a constant value after the usleep).
Is there any solution, a ready-made function (like pthread_sleep(20ms)), or a workaround for my problem? Maybe I should access the main clock somehow?
Here it's not so critical: I can easily check how long the status check took (latch clock() before, compare after) and pass the computed value as the argument to usleep. But in another thread I would like to use this same form.
Does usleep put the whole process to sleep?
I'm currently debugging it on Cygwin, but I don't think the problem lies there.
Thanks for any answers and suggestions; they are much appreciated.
J.L.
If it doesn't need to be exactly 1 s, then just usleep for a second. usleep and sleep put the current thread into an efficient wait state that lasts at least the amount of time you requested (after which the thread becomes eligible for scheduling again).
If you aren't trying to get near-exact timing, there's no need to check clock() at all. (Note also that clock() measures CPU time consumed by the process, not wall time, which is why it appears frozen across a usleep.)
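That is, with the spin removed, the commander loop reduces to something like (a sketch; the board-polling details are omitted):

while (true)
{
    /* ... poll the relay boards ... */
    sleep(1);   /* the thread blocks here and consumes no CPU while waiting */
}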
I resolved it another way:
#include <sys/time.h>

#define CLOCK_US_IN_SECOND 1000000LL

/* long long, so the microsecond count does not overflow a 32-bit long */
static long long myclock()
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (tv.tv_sec * CLOCK_US_IN_SECOND) + tv.tv_usec;
}
void *MainThread(void* arg)
{
    Commander* commander = (Commander*)arg;
    pthread_detach(pthread_self());
    long long endwait;

    while(true)
    {
        uint8_t temp;
        endwait = myclock() + 1 * CLOCK_US_IN_SECOND;
        for(int i = 0; i < commander->GetCount(); i++)
        {
            ptrRelayBoard rb = commander->GetBoard(i);
            if (rb != NULL)
                rb->Get(0x01, &temp);
        }
        /* Sleep in 50 ms slices until the second has elapsed.
           Note: (int)0.05 * CLOCK_US_IN_SECOND would truncate to 0,
           so the multiplication must happen before the cast. */
        while (myclock() < endwait)
            usleep((int)(0.05 * CLOCK_US_IN_SECOND));
    }
    return NULL;
}
Bear in mind that this code is vulnerable to system time changes during execution (gettimeofday reflects clock adjustments). I have no idea how to avoid that, but in my case it's not really important.
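One way to avoid that (a suggestion, not part of the original answer) is to base myclock() on CLOCK_MONOTONIC, which is immune to settimeofday/NTP jumps:

#include <time.h>

/* Microseconds from a clock that only moves forward and is not
   affected by changes to the system time. */
static long long myclock()
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (ts.tv_sec * 1000000LL) + (ts.tv_nsec / 1000);
}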