A deterministic execution time measure - c++

Some algorithms depend on a time measure. E.g., 10% of the time, follow approach A. If that does not work, follow B for 20% of the time. If that does not work, do C.
Measuring execution time in seconds is non-deterministic. Cache state, interleaving non-user tasks on a core, or even simply the dynamic boost of a modern processor's clock speed are external influences that alter the execution time of otherwise deterministic code. Hence, the algorithm might behave non-deterministically if classic execution time measures are used.
To keep the algorithm behaving deterministically, I'm looking for a deterministic way to measure execution time. This is possible: the CPLEX solver, for example, has a deterministic time measure called ticks.
I know this simple question does not have a simple answer. So let me narrow it down a little:
The determinism property is a hard constraint. I'd rather have a measure that only very weakly correlates with measured execution time, as long as it is deterministic.
Ideally, the deterministic time measure measures the whole program execution, including statically compiled libraries. But if this is not possible, then measuring the execution time of the source code I can modify is fine.
I'm willing to take a 100% performance hit, but not more. Less of a performance hit would be better though :)
It's ok if the compiled binary is no longer portable among different CPU models.
Some approaches I have considered, but don't know how hard they are to implement or how well they will work:
modifying a compiler to insert an instruction that increments a global counter between every pair of instructions in the compiled code. This seems like the most principled approach, and may in theory even work for statically compiled libraries.
counting the number of memory accesses. No idea how to do this in practice. Probably also by modifying a compiler?
counting the number of if-statements and loop condition checks using a global counter in the source code. This can be done easily by, e.g., macros, but it will overlook many library calls (e.g., a simple call to sort a vector will not increase the counter), and hence may not correlate much with the actual execution time.
accessing hardware performance counters to, e.g., count the number of instructions of a process, perhaps through a library such as PAPI. The problem here is that I think these counters are non-deterministic as well?
So, how to deterministically measure the execution time of a program?
Edit: measuring CPU time (e.g. with the clock() function) is definitely better than my naive wall clock time examples. However, measuring CPU time is by no means deterministic: runs of the same deterministic program will yield different CPU times. I'm really looking for a deterministic measure (or a measure of "work done", as @mevets calls it).

You can measure process time (the processor time consumed by the process) instead of wall clock time (time elapsed, including any other processes that were context-switched in between) by calling the C standard library function clock(). It counts in clock ticks, and there are CLOCKS_PER_SEC clock ticks in one second. Note that this may run faster than wall clock time if your program is multithreaded, because it measures the processor time consumed by the program over all processor cores. CLOCKS_PER_SEC clock ticks therefore correspond to one second of compute time on one processor core. To implement the switching between methods, you could use cooperative asynchronous code (such as newfangled C++20 coroutines, or Boost coroutines) that checks process time occasionally, or you could use timed software interrupts that set a flag which is picked up by the main thread of execution, which then switches to a new method.
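To make the difference between processor time and wall clock time concrete, here is a minimal sketch. It assumes a POSIX-like system where clock() reports CPU time consumed by the process; the names burn_cpu, idle_for, Elapsed and measure are hypothetical helpers made up for this illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <ctime>
#include <thread>

// Hypothetical busy loop: consumes CPU time without sleeping.
std::uint64_t burn_cpu(std::uint64_t iterations) {
    volatile std::uint64_t acc = 0;
    for (std::uint64_t i = 0; i < iterations; ++i) acc = acc + i;
    return acc;
}

// Hypothetical idle period: wall time passes, almost no CPU time is consumed.
void idle_for(std::chrono::milliseconds d) { std::this_thread::sleep_for(d); }

struct Elapsed {
    double cpu_seconds;   // processor time, from clock()
    double wall_seconds;  // wall clock time, from steady_clock
};

// Runs `work` once and reports both kinds of elapsed time.
template <typename F>
Elapsed measure(F&& work) {
    const std::clock_t c0 = std::clock();
    // clock() reports -1 when processor time is unavailable
    assert(c0 != static_cast<std::clock_t>(-1));
    const auto w0 = std::chrono::steady_clock::now();
    work();
    const auto w1 = std::chrono::steady_clock::now();
    const std::clock_t c1 = std::clock();
    return {static_cast<double>(c1 - c0) / CLOCKS_PER_SEC,
            std::chrono::duration<double>(w1 - w0).count()};
}
```

Calling measure([] { idle_for(std::chrono::milliseconds(200)); }) should report roughly 0.2 s of wall time but close to zero CPU time, whereas a call wrapping burn_cpu should report similar values for both. Neither number is reproducible from run to run.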
You probably don't want to increment a counter after each instruction. That creates enormous compute overhead: every other instruction becomes a counter increment that depends on the increment two instructions before it, which gums up your processor pipeline, and the extra instructions also put pressure on your instruction cache.
Code example (POSIX):
#include <atomic>
#include <cstddef>
#include <signal.h>  // signal, sigevent, SIGEV_SIGNAL
#include <time.h>    // timer_create, timer_settime, itimerspec

static /* possibly thread_local */ std::atomic<int> method{0};

void interrupt_handler(int signal_code) {
    method.fetch_add(1);  // a timer fired: advance to the next method
}

void calculation(/* input */) {
    auto prev_signal_handler = signal(SIGINT, &interrupt_handler);
    try {
        int prev_method = 0;
        // schedule timer interrupts
        for (size_t num_ns : /* list of times, in ns */) {
            timer_t t_id;
            sigevent ev{};
            ev.sigev_notify = SIGEV_SIGNAL;  // deliver a signal on expiry
            ev.sigev_signo = SIGINT;
            ev.sigev_value.sival_ptr = &t_id;
            // the timer counts CPU time of the calling thread, not wall time
            timer_create(CLOCK_THREAD_CPUTIME_ID, &ev, &t_id);
            itimerspec t_spec;
            t_spec.it_interval.tv_sec = t_spec.it_value.tv_sec = num_ns / 1000000000;
            t_spec.it_interval.tv_nsec = t_spec.it_value.tv_nsec = num_ns % 1000000000;
            timer_settime(t_id, 0, &t_spec, nullptr);
        }
        bool done = false;
        while (!done) {
            int current_method = method.load();
            if (current_method != prev_method) {
                // switch method
                prev_method = current_method;
            } else {
                // continue using current method
            }
        }
    } catch (...) {
        // restore the previous handler before propagating the exception
        signal(SIGINT, prev_signal_handler);
        throw;
    }
    signal(SIGINT, prev_signal_handler);
}

You're mired in some detailed solutions that potentially change the code extensively, probably because those are the only approaches you're familiar with, but this is IMHO short-sighted. You cannot know for sure at this point that instrumenting the generated code in such an invasive way has merit. Let's step back for a minute.
"Some algorithms depend on a time measure. E.g., 10% of the time, follow approach A. If that does not work, follow B for 20% of the time. If that does not work, do C."
I don't think that's true. It's an arbitrary constraint that's not general at all. The algorithms depend on the "effort", and real time is often a very poor substitute for effort. As you have stated well, any sort of "time" is mired in architectural specifics.
Another problem is the assumption that the algorithms are the units of change. They generally are not, i.e., you don't have as much control here as you think you do, unless you code all the numerical parts in assembly or thoroughly audit the generated code. Each algorithm, if it succeeds, may produce slightly different results depending on numerical error stack-ups caused by architecture-dependent selections that the generated code makes at runtime. It's a thing: compilers and/or their runtime libraries do plenty of that! So the idea that running the same compiled floating-point code on various PCs will produce bit-identical results may appear to hold in whatever tests you run now, but in reality it will prove incorrect at some later time, when you'll be too deep into it to realistically implement the huge changes needed for a fix.
But inside the algorithm you should have plenty of arbitrary points where you can increment a counter (not too often) and use the value of the counter as a measure of the effort your algorithm has expended. It doesn't matter much that such a measure has a different scaling factor to "real time" for each algorithm, because real time is not the true requirement here. All you really want is some deterministic way to carry out a decision to switch algorithms. You can roughly calibrate these arbitrary switchover points against real time once and keep this calibration frozen: the exact values don't really matter, only that you can clearly decide when to switch.
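A minimal sketch of such an effort counter follows. Everything in it is an assumption made for illustration: EffortMeter, run_method and solve are hypothetical names, and run_method is only a stand-in for a real iterative method that ticks the meter at a few chosen points.

```cpp
#include <cassert>
#include <cstdint>

// Deterministic "effort" meter: counts units of work instead of time.
class EffortMeter {
public:
    explicit EffortMeter(std::uint64_t budget) : budget_(budget) {
        assert(budget > 0);
    }
    void tick(std::uint64_t units = 1) { spent_ += units; }
    bool exhausted() const { return spent_ >= budget_; }
    std::uint64_t spent() const { return spent_; }

private:
    std::uint64_t budget_;
    std::uint64_t spent_ = 0;
};

enum class Outcome { SolvedByA, SolvedByB, SolvedByC };

// Hypothetical stand-in for an iterative method: it "succeeds" once it has
// performed `needed` steps, ticking the meter once per step.
bool run_method(std::uint64_t needed, EffortMeter& meter) {
    for (std::uint64_t step = 0; step < needed; ++step) {
        if (meter.exhausted()) return false;  // budget used up: give up
        meter.tick();
    }
    return true;
}

// Try A within budget_a units, then B within budget_b units, then fall back to C.
Outcome solve(std::uint64_t need_a, std::uint64_t need_b,
              std::uint64_t budget_a, std::uint64_t budget_b) {
    EffortMeter a(budget_a);
    if (run_method(need_a, a)) return Outcome::SolvedByA;
    EffortMeter b(budget_b);
    if (run_method(need_b, b)) return Outcome::SolvedByB;
    return Outcome::SolvedByC;
}
```

Because the switch depends only on the counter, solve(150, 201, 100, 200) picks the same method on every run and every machine, regardless of how long each step takes in real time. The budgets are the calibration constants you would freeze.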
Furthermore, some caution is needed when an algorithm produces a result ("converges") very close to the effort threshold. Due to architectural differences, the exact effort required to achieve "convergence" in terms of a fixed floating-point threshold may vary slightly between CPU generations. So instead of a hard cutoff, you need some way of expressing hysteresis, so that if convergence happens close to the effort cutoff, some alternative criterion is used for either the threshold or the convergence test. You'd need to do proper statistical modeling to show that the alternates are sufficiently reliable.

A counter can handle units of work, but is each unit of equal value (i.e., time)? The clock(3) service provides an approximate virtual time of execution, that is, the time elapsed while your process is actually running, as opposed to real-world (wall) time.
Similarly, timer_create may accept clock IDs such as CLOCK_PROCESS_CPUTIME_ID, which lets you raise a signal after a certain amount of CPU time has passed. Provided your app can be arbitrarily interrupted without entering an undefined state, you could use this to switch from method 1 -> 2 -> 3.
Although this is better than counting blocks of work, you will need to accept a certain inaccuracy around the exact time to account for system overhead, cache contention, etc.


Should I make a large function atomic in order to benchmark it accurately?

I would like to know how long it takes to execute some code. The code I am executing deals with OpenCV matrices and operations. The code will be run in a ROS environment on Linux. I don't want the code to be interrupted by system functions during my benchmarking.
Looking at this post about benchmarking, the answerer said the granularity of the result is 15 ms. I would like to do much better than that, so I was considering making the function atomic (just for benchmarking purposes). I'm not sure whether it is a good idea, for a few reasons, primarily because I don't have a deep understanding of processor architecture.
void atomic_wrapper_function(const object& A, const object& B) {
    static unsigned long running_sum = 0;
    unsigned long before, after;
    before = GetTimeMs64();
    function_to_benchmark(A, B);
    after = GetTimeMs64();
    running_sum += (after - before);
}
The function I am trying to benchmark is not a short function.
Will the result be accurate? For marking the time, I'm considering using this function by Andreas Bonini.
Will it do something horrible to my computer? Call me superstitious but I think it's good to ask this question.
I'm using C++11 on the Linux kernel.

C++11 atomics are not atomic in the RTOS sense; they just provide guarantees when writing multithreaded code. Linux is not an RTOS. Your code can and will always be interrupted. There are some ways to lessen the effects, but not without diving very deeply into Linux.
You can, for example, configure the niceness so that your program gets interrupted less by other userspace programs. You can tell the kernel on which CPU core to process interrupts, then pin your program to a different CPU. You can increase the timer precision, etc., but:
There are many other things that might change the runtime of your algorithm, like several layers of CPU caches, power-saving features of your CPU, etc. If you are really only interested in benchmarking the execution time of your function for non-hard-realtime problems, it is easier to just run the algorithm many, many times and get a statistical estimate of the execution time.
Call the function a billion times during the benchmark and average. OR
Benchmark the function from 1 time to a billion times. The execution time you measure should scale linearly with the number of calls. Then do some kind of linear regression to estimate the time per call.
OR: You say that you want to know what influence the algorithm has on your total program runtime? Use profiling tools like callgrind (which can be integrated into Qt Creator).
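The regression variant could look roughly like the sketch below. It times batches of different sizes with std::chrono::steady_clock and fits a line by ordinary least squares; the slope estimates the time per call, while the intercept absorbs fixed overhead such as the timer calls themselves. The names fit_line, time_calls and seconds_per_call are hypothetical, and the batch sizes are whatever you choose to pass in.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

struct Line { double slope; double intercept; };

// Ordinary least-squares fit of y = slope * x + intercept.
Line fit_line(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() >= 2);
    const double n = static_cast<double>(x.size());
    double sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sx += x[i];
        sy += y[i];
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }
    const double denom = n * sxx - sx * sx;
    assert(denom != 0.0);  // needs at least two distinct x values
    const double slope = (n * sxy - sx * sy) / denom;
    return {slope, (sy - slope * sx) / n};
}

// Times `reps` back-to-back calls of `f` and returns the elapsed seconds.
template <typename F>
double time_calls(F&& f, std::size_t reps) {
    const auto t0 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < reps; ++i) f();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}

// Estimates seconds per call as the slope of elapsed time over call count.
template <typename F>
double seconds_per_call(F&& f, const std::vector<std::size_t>& counts) {
    std::vector<double> x, y;
    for (std::size_t reps : counts) {
        x.push_back(static_cast<double>(reps));
        y.push_back(time_calls(f, reps));
    }
    return fit_line(x, y).slope;
}
```

For example, seconds_per_call([&] { function_to_benchmark(A, B); }, {100, 200, 400, 800}) would give one estimate. If the compiler can see that the call has no observable effect, it may optimize the loop away, so make sure the result is used somewhere.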

Multitasking and measuring time difference

I understand that a preemptive multitasking OS can interrupt a process at any "code position".
Given the following code:
int main() {
    while( true ) {
        doSthImportant(); // needs to be executed at least every 20 msec
        // start of critical section
        int start_usec = getTime_usec();
        doSthElse();
        int timeDiff_usec = getTime_usec() - start_usec;
        // end of critical section
        evalUsedTime( timeDiff_usec );
        sleep_msec( 10 );
    }
}
I would expect this code to usually produce proper results for timeDiff_usec, especially if doSthElse() and getTime_usec() don't take much time, so that they are rarely interrupted by the OS scheduler.
But the program would get interrupted from time to time somewhere in the "critical section". The context switch will do what it is supposed to do, and still in such a case the program would produce wrong results for the timeDiff_usec.
This is the only example I have in mind right now but I'm sure there would be other scenarios where multitasking might get a program(mer) into trouble (as time is not the only state that might be changed at re-entry).
Is there a way to ensure that measuring the time for a certain action works fine?
Which other common issues are critical with multitasking and need to be considered? (I'm not thinking of thread safety - but there might be common issues).
I changed the sample code to make it more precise.
I want to check the time being spent to make sure that doSthElse() doesn't take like 50 msec or so, and if it does I would look for a better solution.

"Is there a way to ensure that measuring the time for a certain action works fine?"
That depends on your operating system and your privilege level. On some systems, for some privilege levels, you can set a process or thread to have a priority that prevents it from being preempted by anything at lower priority. For example, on Linux, you might use sched_setscheduler to give a thread real-time priority. (If you're really serious, you can also set the thread affinity and SMP affinities to prevent any interrupts from being handled on the CPU that's running your thread.)
Your system may also provide time tracking that accounts for time spent preempted. For example, POSIX defines the getrusage function, which returns a struct containing ru_utime (the amount of time spent in “user mode” by the process) and ru_stime (the amount of time spent in “kernel mode” by the process). These should sum to the total time the CPU spent on the process, excluding intervals during which the process was suspended. Note that if the kernel needs to, for example, spend time paging on behalf of your process, it's not defined how much (if any) of that time is charged to your process.
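As a rough sketch of the getrusage approach on a POSIX system: the wrapper names CpuTimes, process_cpu_times and cpu_seconds_of are hypothetical, while getrusage, RUSAGE_SELF, ru_utime and ru_stime are the POSIX names mentioned above.

```cpp
#include <cassert>
#include <sys/resource.h>
#include <sys/time.h>

// CPU time charged to this process so far, split the way getrusage reports it.
struct CpuTimes {
    double user_seconds;
    double system_seconds;
};

double to_seconds(const timeval& tv) {
    return static_cast<double>(tv.tv_sec) +
           static_cast<double>(tv.tv_usec) / 1e6;
}

CpuTimes process_cpu_times() {
    rusage usage{};
    const int rc = getrusage(RUSAGE_SELF, &usage);
    assert(rc == 0);  // not expected to fail for RUSAGE_SELF and a valid pointer
    (void)rc;
    return {to_seconds(usage.ru_utime), to_seconds(usage.ru_stime)};
}

// CPU seconds (user + kernel) consumed by the process while `work` ran.
// Intervals during which the process was suspended are not counted.
template <typename F>
double cpu_seconds_of(F&& work) {
    const CpuTimes before = process_cpu_times();
    work();
    const CpuTimes after = process_cpu_times();
    return (after.user_seconds + after.system_seconds) -
           (before.user_seconds + before.system_seconds);
}
```

With the question's code, cpu_seconds_of([] { doSthElse(); }) would report the CPU time of that call. The resolution of these counters depends on the kernel and may be too coarse for very short actions, in which case timing many repetitions is the safer route.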
Anyway, the common way to measure time spent on some critical action is to time it (essentially the way your question presents) repeatedly, on an otherwise idle system, and then either discard outlier measurements and take the mean, or take the median or 95th percentile of the measurements, depending on why you need the measurement.
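The summary statistics could be computed along these lines. This is only a sketch: the function names are hypothetical, the percentile uses the nearest-rank definition, and the outlier rule (discard anything above three times the median) is an arbitrary choice for illustration, not a recommended value.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Nearest-rank percentile (p in (0, 100]) of a set of samples.
double percentile(std::vector<double> samples, double p) {
    assert(!samples.empty() && p > 0.0 && p <= 100.0);
    std::sort(samples.begin(), samples.end());
    const auto rank = static_cast<std::size_t>(
        std::ceil(p / 100.0 * static_cast<double>(samples.size())));
    return samples[rank - 1];
}

double median(std::vector<double> samples) {
    assert(!samples.empty());
    std::sort(samples.begin(), samples.end());
    const std::size_t n = samples.size();
    return n % 2 == 1 ? samples[n / 2]
                      : (samples[n / 2 - 1] + samples[n / 2]) / 2.0;
}

// Mean after discarding samples above `cutoff_factor` times the median.
// Assumes non-negative samples (durations); the factor is illustrative only.
double mean_without_outliers(const std::vector<double>& samples,
                             double cutoff_factor = 3.0) {
    const double limit = cutoff_factor * median(samples);
    double sum = 0.0;
    std::size_t kept = 0;
    for (double s : samples) {
        if (s <= limit) {
            sum += s;
            ++kept;
        }
    }
    assert(kept > 0);
    return sum / static_cast<double>(kept);
}
```

For the measurements {10, 11, 12, 9, 500} (in microseconds, say), the median is 11 and the trimmed mean is 10.5, because the single preempted run of 500 is discarded. If the question is "does it ever exceed 50 msec", the high percentile or the maximum is the more relevant number.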
"Which other common issues are critical with multitasking and need to be considered? (I'm not thinking of thread safety - but there might be common issues)."
Too broad. There are whole books written about this subject.