Multitasking and measuring time difference - c++

I understand that a preemptive multitasking OS can interrupt a process at any "code position".
Given the following code:
int main() {
while( true ) {
doSthImportant(); // needs to be executed at least each 20 msec
// start of critical section
int start_usec = getTime_usec();
doSthElse();
int timeDiff_usec = getTime_usec() - start_usec;
// end of critical section
evalUsedTime( timeDiff_usec );
sleep_msec( 10 );
}
}
I would expect this code to usually produce proper results for timeDiff_usec, especially in case that doSthElse() and getTime_usec() don't take much time so they get interrupted rarely by the OS scheduler.
But the program would get interrupted from time to time somewhere in the "critical section". The context switch will do what it is supposed to do, and still in such a case the program would produce wrong results for the timeDiff_usec.
This is the only example I have in mind right now but I'm sure there would be other scenarios where multitasking might get a program(mer) into trouble (as time is not the only state that might be changed at re-entry).
Is there a way to ensure that measuring the time for a certain action works fine?
Which other common issues are critical with multitasking and need to be considered? (I'm not thinking of thread safety - but there might be common issues).
Edit:
I changed the sample code to make it more precise.
I want to check the time being spent to make sure that doSthElse() doesn't take like 50 msec or so, and if it does I would look for a better solution.

Is there a way to ensure that measuring the time for a certain action works fine?
That depends on your operating system and your privilege level. On some systems, for some privilege levels, you can set a process or thread to have a priority that prevents it from being preempted by anything at lower priority. For example, on Linux, you might use sched_setscheduler to give a thread real-time priority. (If you're really serious, you can also set the thread affinity and SMP affinities to prevent any interrupts from being handled on the CPU that's running your thread.)
Your system may also provide time tracking that accounts for time spent preempted. For example, POSIX defines the getrusage function, which returns a struct containing ru_utime (the amount of time spent in “user mode” by the process) and ru_stime (the amount of time spent in “kernel mode” by the process). These should sum to the total time the CPU spent on the process, excluding intervals during which the process was suspended. Note that if the kernel needs to, for example, spend time paging on behalf of your process, it's not defined how much (if any) of that time is charged to your process.
Anyway, the common way to measure time spent on some critical action is to time it (essentially the way your question presents) repeatedly, on an otherwise idle system, throw out outlier measurements, and take the mean (after eliminating outliers), or take the median or 95th percentile of the measurements, depending on why you need the measurement.
Which other common issues are critical with multitasking and need to be considered? (I'm not thinking of thread safety - but there might be common issues).
Too broad. There are whole books written about this subject.

Related

C++11 Most accurate way to pause execution for a certain amount of time? [duplicate]

This question already has answers here:
How to guarantee exact thread sleep interval?
(4 answers)
accurate sampling in c++
(2 answers)
Closed 4 years ago.
I'm currently working on some C++ code that reads from a video file, parses the video/audio streams into its constituent units (such as an FLV tag) and sends it back out in order to "restream" it.
Because my input comes from file but I want to simulate a proper framerate when restreaming this data, I am considering the ways that I might sleep the thread that is performing the read on the file in order to attempt to extract a frame at the given rate that one might expect out of typical 30 or 60 FPS.
One solution is to use an obvious std::this_thread::sleep_for call and pass in the amount of milliseconds depending on what my FPS is. Another solution I'm considering is using a condition variable, and using std::condition_variable::wait_for with the same idea.
I'm a little stuck, because I know that the first solution doesn't guarantee exact precision -- the sleep will last around as long as the argument I pass in but may in theory be longer. And I know that the std::condition_variable::wait_for call will require lock reacquisition which will take some time too. Is there a better solution than what I'm considering? Otherwise, what's the best methodology to attempt to pause execution for as precise a granularity as possible?
C++11 Most accurate way to pause execution for a certain amount of time?
This:
auto start = now();
while(now() < start + wait_for);
now() is a placeholder for the most accurate time measuring method available for the system.
This is the analogue to sleep as what spinlock is to a mutex. Like a spinlock, it will consume all the CPU cycles while pausing, but it is what you asked for: The most accurate way to pause execution. There is trade-off between accuracy and CPU-usage-efficiency: You must choose which is more important to have for your program.
why is it more accurate than std::this_thread::sleep_for?
Because sleep_for yields execution of the thread. As a consequence, it can never have better granularity than the process scheduler of the operating system has (assuming there are other processes competing for time).
The live loop shown above which doesn't voluntarily give up its time slice will achieve the highest granularity provided by the clock that is used for measurement.
Of course, the time slice granted by the scheduler will eventually run out, and that might happen near the time we should be resuming. Only way to reduce that effect is to increase the priority of our thread. There is no standard way of affecting the priority of a thread in C++. The only way to get completely rid of that effect is to run on a non-multi-tasking system.
On multi-CPU systems, one trick that you might want to do is to set the thread affinity so that the OS thread won't be migrated to other hard ware threads which would introduce latency. Likewise, you might want to set thread affinity of your other threads to stay off the time measuring thread. There is no standard tool to set thread affinity.
Let T be the time you wish to sleep for and let G be the maximum time that sleep_for could possibly overshoot.
If T is greater than G, then it will be more efficient to use sleep_for for T - G time units, and only use the live loop for the final G - O time units (where O is the time that sleep_for was observed to overshoot).
Figuring out what G is for your target system can be quite tricky however. There is no standard tool for that. If you over-estimate, you'll waste more cycles than necessary. If you under-estimate, your sleep may overshoot the target.
In case you're wondering what is a good choice for now(), the most appropriate tool provided by the standard library is std::chrono::steady_clock. However, that is not necessarily the most accurate tool available on your system. What tool is the most accurate depends on what system you're targeting.

How to guarantee exact thread sleep interval?

Usually if I want to simulate some work or wait exact time interval I use condition_variable::wait_for or at the worst thread::this_thread::sleep_for. But condition_variable documentation states that wait_for or wait_until methods may block longer than was requested.
This function may block for longer than timeout_duration due to scheduling or resource contention delays.
How exact wait intervals can be guaranteed?
UPDATE
Can I reach it without condition_variable?
You cannot do this.
In order to have exact guarantees like this, you need a real time operating system.
C++ does not guarantee you are on a real time operating system.
So it provides the guarantees that a typical, non-RTOS provides.
Note that there are other complications to programming on a RTOS that go far beyond the scope of this question.
In practice, one thing people when they really want fine-grained timing control (say, they are twiddling around with per-frame or per-scanline buffers or the like, or audio buffers, or whatever) do is check if the time is short, and if so spin-wait. If the time is longer, they wait for a bit less than the amount of time they want to wait for, then wake up and spin.
This is also not guaranteed to work, but works well enough for almost all cases.
On a RTOS, the platform may provide primitives like you want. These lie outside the scope of standard C++. No typical desktop OS is an RTOS that I am aware of. If you are programming for a fighter jet's control hardware or similar, you may be programming on an RTOS.
I hope you aren't writing fighter jet control software and asking this question on stack overflow.
If you did hypothetically sleep for precisely some exact duration, and then performed some action in response (such as getting the current time, or printing a message to the screen), then that action might be delayed for some unknown period of time e.g. due to processor load. This is equivalent to the action happening (almost) immediately but the timer taking longer than expected. Even in the best case scenario, where the timer completes at precisely the time you request, and the operating system allows your action to complete without preempting your process, it will take a few clock cycles to perform that action.
So in other words, on a standard operating system, it is impossible or maybe even meaningless for a timer to complete at precisely the time requested.
How can this be overcome? An academic answer is that you can used specialized software and hardware such as a real-time operating system, but this is vastly more complicated to develop software for than regular programming. What you probably really want to know is, in the common case, the delay that that documentation refers to is not substantial i.e. it is commonly less that 1/100th second.
With a brute force loop... for example:
chrono::microseconds sleep_duration{1000};
auto now = chrono::high_resolution_clock::now()
while(true)
{
auto elapsed = chrono::duration_cast<hrono::microseconds>(chrono::high_resolution_clock::now() - now);
if (elapsed > sleep_duration)
break;
}
That's bit ugly but desktop operating system are not real time so you cannot have such precision.
In order to relax the cpu you look the following snippet:
void little_sleep(std::chrono::microseconds us)
{
auto start = std::chrono::high_resolution_clock::now();
auto end = start + us;
do {
std::this_thread::yield();
} while (std::chrono::high_resolution_clock::now() < end);
}
That depends on what accuracy you can expect. Generally as others have said regular OS (Linux, Windows) cannot guaranty that.
Why?
Your OS have probably has concept of threads. If so, then there is a scheduler which interrupts threads and switch execution to other threads waiting in the queue. And this can spoil accuracy of timers.
What can I do about it?
If you are using embedded system - go for bare metal, i.e. don't use
OS or use so called hard real time operating system.
If you are using Linux, look for Linux RT Preempt Patch in Google. You have to recompile your kernel to include the path (not so complicated though) and then you can create threads with priority above 50 - which means priority above kernel's thread - which in the end means that you can have a thread that can interrupt scheduler and kernel in general, providing quite good time accuracy. In my case it what three orders of magnitude (from few ms of latency to few us).
If you are using Windows, I don't know about such patch, but you can search for for High Precisions timers on Microsoft site. Maybe provided accuracy will be enough for your needs.

C++ main loop stutters occasionally

I encountered a problem where my game-loop stuttered approximating once a second (variable intervals). A single frame then takes over 60ms whereas all others require less than 1ms.
After simplifying a lot I ended with the following program which reproduces the bug. It only measures the frame time and reports it.
#include <iostream>
#include "windows.h"
int main()
{
unsigned long long frequency, tic, toc;
QueryPerformanceFrequency((LARGE_INTEGER*)&frequency);
QueryPerformanceCounter((LARGE_INTEGER*)&tic);
double deltaTime = 0.0;
while( true )
{
//if(deltaTime > 0.01)
std::cerr << deltaTime << std::endl;
QueryPerformanceCounter((LARGE_INTEGER*)&toc);
deltaTime = (toc - tic) / double(frequency);
tic = toc;
if(deltaTime < 0.01) deltaTime = 0.01;
}
}
Again one frame in many is much slower than the others. Adding the if let the error vanish (cerr is never called then). My original problem didn't contain any cerr/cout. However, I consider this as a reproduction of the same error.
cerr is flushed in every iteration, so this is not what happens to create single slow frames. I know from a profiler (Very Sleepy) that the stream internally uses a lock/critical section, but this shouldn't change anything because the program is singlethreaded.
What causes single iterations to stall that much?
Edit: I did some more tests:
Adding std::this_thread::sleep_for( std::chrono::milliseconds(7) ); and therefore reducing the process CPU utilization does not change anything.
With printf("%f\n", deltaTime); the problem vanishes (maybe because it doesn't use a mutex and memory allocation in contrast to the stream)
The design of windows does not guarantee an upper limit on any execution time, since it dynamically allocates runtime resources to all programs using some logic - for example, the scheduler will allocate resources to a process with high priority, and starve out lower priority processes in some circumstances. Programs are statistically more likely to - eventually - be affected by such things if they run tight loops and consume a lot of CPU resources. Because - again eventually - the scheduler will temporarily boost the priority of programs that are being starved and/or reduce the priority of programs that are starving others (in your case, by running a tight loop).
Making the output to std::cerr conditional doesn't change the fact of this happening - it just changes the likelihood that it will happen in a specified time interval, because it changes how the program uses system resources in the loop, and therefore changes how it interacts with the system scheduler, policies, etc.
This sort of thing affects programs running in all non-realtime operating systems, although the precise impact depends on how each OS is implemented (e.g. scheduling strategies, other policies controlling access by programs to resources, etc). There is always a non-zero probability (even if it is small) of such stalls occurring.
If you want absolute guarantees of no stalls on such things, you will need a realtime operating system. These systems are designed to do things more predictably in a timing sense, but that comes with trade-offs, since it also requires your programs to be designed with the knowledge that they MUST complete execution of specified functions within specified time intervals. Realtime operating systems use different strategies, but their enforcing timing of constraints can cause the program to malfunction if the program is not designed with such things in mind.
I'm not sure about it, but it could be that the system is interrupting your main thread to let others run, and since it takes some time (I remember on my Windows XP pc the quantum was 10ms), it will stall a frame.
This is very visible because it is a single-threaded application, if you use several thread they are usually dispatched on several cores of the processor (if available), and the stalls will still be here but less important (if you implemented your application logic right).
Edit: here you can have more information about windows and linux schedulers. Basically, windows use quantums (varying from a handful of milliseconds to 120 ms on Windows Server).
Edit 2: you can see a more detailed explanation on the windows scheduler here.

Sleeping for an exact duration

My understanding of the Sleep function is that it follows "at least semantics" i.e. sleep(5) will guarantee that the thread sleeps for 5 seconds, but it may remain blocked for more than 5 seconds depending on other factors. Is there a way to sleep for exactly a specified time period (without busy waiting).
As others have said, you really need to use a real-time OS to try and achieve this. Precise software timing is quite tricky.
However... although not perfect, you can get a LOT better results than "normal" by simply boosting the priority of the process that needs better timing. In Windows you can achieve this with the SetPriorityClass function. If you set the priority to the highest level (REALTIME_PRIORITY_CLASS: 0x00000100) you'll get much better timing results. Again - this will not be perfect like you are asking for, though.
This is also likely possible on other platforms than Windows, but I've never had reason to do it so haven't tested it.
EDIT: As per the comment by Andy T, if your app is multi-threaded you also need to watch out for the priority assigned to the threads. For Windows this is documented here.
Some background...
A while back I used SetPriorityClass to boost the priority on an application where I was doing real-time analysis of high-speed video and I could NOT miss a frame. Frames were arriving to the pc at a very regular (driven by external framegrabber HW) frequency of 300 frames per second (fps), which fired a HW interrupt on every frame which I then serviced. Since timing was very important, I collected a lot of stats on the interrupt timing (using QueryPerformanceCounter stuff) to see how bad the situation really was, and was appalled at the resulting distributions. I don't have the stats handy, but basically Windows was servicing the interrupt whenever it felt like it when run at normal priority. The histograms were very messy, with the stdev being wider than my ~3ms period. Frequently I would have gigantic gaps of 200 ms or greater in the interrupt servicing (recall that the interrupt fired roughly every 3 ms)!! ie: HW interrupts are FAR from exact! You're stuck with what the OS decides to do for you.
However - when I discovered the REALTIME_PRIORITY_CLASS setting and benchmarked with that priority, it was significantly better and the service interval distribution was extremely tight. I could run 10 minutes of 300 fps and not miss a single frame. Measured interrupt servicing periods were pretty much exactly 1/300 s with a tight distribution.
Also - try and minimize the other things the OS is doing to help improve the odds of your timing working better in the app where it matters. eg: no background video transcoding or disk de-fragging or anything while your trying to get precision timing with other code!!
In summary:
If you really need this, go with a real time OS
If you can't use a real-time OS (impossible or impractical), boosting your process priority will likely improve your timing by a lot, as it did for me
HW interrupts won't do it... the OS still needs to decide to service them!
Make sure that you don't have a lot of other processes running that are competing for OS attention
If timing is really important to you, do some testing. Although getting code to run exactly when you want it to is not very easy, measuring this deviation is quite easy. The high performance counters in PCs (what you get with QueryPerformanceCounter) are extremely good.
Since it may be helpful (although a bit off topic), here's a small class I wrote a long time ago for using the high performance counters on a Windows machine. It may be useful for your testing:
CHiResTimer.h
#pragma once
#include "stdafx.h"
#include <windows.h>
class CHiResTimer
{
private:
LARGE_INTEGER frequency;
LARGE_INTEGER startCounts;
double ConvertCountsToSeconds(LONGLONG Counts);
public:
CHiResTimer(); // constructor
void ResetTimer(void);
double GetElapsedTime_s(void);
};
CHiResTimer.cpp
#include "stdafx.h"
#include "CHiResTimer.h"
double CHiResTimer::ConvertCountsToSeconds(LONGLONG Counts)
{
return ((double)Counts / (double)frequency.QuadPart) ;
}
CHiResTimer::CHiResTimer()
{
QueryPerformanceFrequency(&frequency);
QueryPerformanceCounter(&startCounts); // starts the timer right away
}
void CHiResTimer::ResetTimer()
{
QueryPerformanceCounter(&startCounts); // reset the reference counter
}
double CHiResTimer::GetElapsedTime_s()
{
LARGE_INTEGER countsNow;
QueryPerformanceCounter(&countsNow);
return ConvertCountsToSeconds(countsNow.QuadPart - startCounts.QuadPart);
}
No.
The reason it's "at least semantics" is because that after those 5 seconds some other thread may be busy.
Every thread gets a time slice from the Operating System. The Operating System controls the order in which the threads are run.
When you put a thread to sleep, the OS puts the thread in a waiting list, and when the timer is over the operating system "Wakes" the thread.
This means that the thread is added back to the active threads list, but it isn't guaranteed that t will be added in first place. (What if 100 threads need to be awaken in that specific second ? Who will go first ?)
While standard Linux is not a realtime operating system, the kernel developers pay close attention to how long a high priority process would remain starved while kernel locks are held. Thus, a stock Linux kernel is usually good enough for many soft-realtime applications.
You can schedule your process as a realtime task with the sched_setscheduler(2) call, using either SCHED_FIFO or SCHED_RR. The two have slight differences in semantics, but it may be enough to know that a SCHED_RR task will eventually relinquish the processor to another task of the same priority due to time slices, while a SCHED_FIFO task will only relinquish the CPU to another task of the same priority due to blocking I/O or an explicit call to sched_yield(2).
Be careful when using realtime scheduled tasks; as they always take priority over standard tasks, you can easily find yourself coding an infinite loop that never relinquishes the CPU and blocks admins from using ssh to kill the process. So it might not hurt to run an sshd at a higher realtime priority, at least until you're sure you've fixed the worst bugs.
There are variants of Linux available that have been worked on to provide hard-realtime guarantees. RTLinux has commercial support; Xenomai and RTAI are competing implementations of realtime extensions for Linux, but I know nothing else about them.
As previous answerers said: there is no way to be exact (some suggested realtime-os or hardware interrupts and even those are not exact). I think what you are looking for is something that is just more precise than the sleep() function and you find that depending on your OS in e.g. the Windows Sleep() function or under GNU the nanosleep() function.
http://msdn.microsoft.com/en-us/library/ms686298%28VS.85%29.aspx
http://www.delorie.com/gnu/docs/glibc/libc_445.html
Both will give you precision within a few milliseconds.
Well, you try to tackle a difficult problem, and achieving exact timing is not feasible: the best you can do is to use hardware interrupts, and the implementation will depend on both your underlying hardware, and your operating system (namely, you will need a real-time operating system, which most regular desktop OS are not). What is your exact target platform?
No. Because you're always depending on the OS to handle waking up threads at the right time.
There is no way to sleep for a specified time period using standard C. You will need, at minimum, a 3rd party library which provides greater granularity, and you might also need a special operating system kernel such as the real-time Linux kernels.
For instance, here is a discussion of how close you can come on Win32 systems.
This is not a C question.

CPU throttling in C++

I was just wondering if there is an elegant way to set the maximum CPU load for a particular thread doing intensive calculations.
Right now I have located the most time consuming loop in the thread (it does only compression) and use GetTickCount() and Sleep() with hardcoded values. It makes sure that the loop continues for a certain period and then sleeps for a certain minimum time. It more or less does the job, i.e. guarantees that the thread will not use more than 50% of CPU. However, behavior is dependent on the number of CPU cores (huge disadvantage) and simply ugly (smaller disadvantage :)). Any ideas?
I am not aware of any API to do get the OS's scheduler to do what you want (even if your thread is idle-priority, if there are no higher-priority ready threads, yours will run). However, I think you can improvise a fairly elegant throttling function based on what you are already doing. Essentially (I don't have a Windows dev machine handy):
Pick a default amount of time the thread will sleep each iteration. Then, on each iteration (or on every nth iteration, such that the throttling function doesn't itself become a significant CPU load),
Compute the amount of CPU time your thread used since the last time your throttling function was called (I'll call this dCPU). You can use the GetThreadTimes() API to get the amount of time your thread has been executing.
Compute the amount of real time elapsed since the last time your throttling function was called (I'll call this dClock).
dCPU / dClock is the percent CPU usage (of one CPU). If it is higher than you want, increase your sleep time, if lower, decrease the sleep time.
Have your thread sleep for the computed time.
Depending on how your watchdog computes CPU usage, you might want to use GetProcessAffinityMask() to find out how many CPUs the system has. dCPU / (dClock * CPUs) is the percentage of total CPU time available.
You will still have to pick some magic numbers for the initial sleep time and the increment/decrement amount, but I think this algorithm could be tuned to keep a thread running at fairly close to a determined percent of CPU.
On linux, you can change the scheduling priority of a thread with nice().
I can't think of any cross platform way of what you want (or any guaranteed way full stop) but as you are using GetTickCount perhaps you aren't interested in cross platform :)
I'd use interprocess communications and set the intensive processes nice levels to get what you require but I'm not sure that's appropriate for your situation.
EDIT:
I agree with Bernard which is why I think a process rather than a thread might be more appropriate but it just might not suit your purposes.
The problem is it's not normal to want to leave the CPU idle while you have work to do. Normally you set a background task to IDLE priority, and let the OS handle scheduling it all the CPU time that isn't used by interactive tasks.
It sound to me like the problem is the watchdog process.
If your background task is CPU-bound then you want it to take all the unused CPU time for its task.
Maybe you should look at fixing the watchdog program?
You may be able to change the priority of a thread, but changing the maximum utilization would either require polling and hacks to limit how many things are occurring, or using OS tools that can set the maximum utilization of a process.
However, I don't see any circumstance where you would want to do this.