std::this_thread::sleep_for sleeps for too long - c++

Can anyone tell me what the problem with the following example is?
It produces 65 instead of 300 frames per second.
#define WIN32_LEAN_AND_MEAN
#include <Windows.h>
#include <thread>
#include <chrono>
#include <string>
int main(int argc, const char* argv[]) {
    using namespace std::chrono_literals;
    constexpr unsigned short FPS_Limit = 300;
    std::chrono::duration<double, std::ratio<1, FPS_Limit>> FrameDelay = std::chrono::duration<double, std::ratio<1, FPS_Limit>>(1.0f);
    unsigned int FPS = 0;
    std::chrono::steady_clock SecondTimer;
    std::chrono::steady_clock ProcessTimer;
    std::chrono::steady_clock::time_point TpS = SecondTimer.now();
    std::chrono::steady_clock::time_point TpP = ProcessTimer.now();
    while (true) {
        // ...
        // Count FPS
        FPS++;
        if ((TpS + (SecondTimer.now() - TpS)) > (TpS + 1s)) {
            OutputDebugString(std::to_string(FPS).c_str()); OutputDebugString("\n");
            FPS = 0;
            TpS = SecondTimer.now();
        }
        // Sleep
        std::this_thread::sleep_for(FrameDelay - (ProcessTimer.now() - TpP)); // FrameDelay minus time needed to execute other things
        TpP = ProcessTimer.now();
    }
    return 0;
}
I guess it has something to do with std::chrono::duration<double, std::ratio<1, FPS_Limit>>, but when it is multiplied by FPS_Limit, the correct 1 frame per second is produced.
Note that the limit of 300 frames per second is just an example.
It can be replaced by any other number and the program would still sleep for way too long.

In short, the problem is that you use std::this_thread::sleep_for at all. Or, any kind of "sleep" for that matter. Sleeping to limit the frame rate is just utterly wrong.
The purpose of sleep functionality is, well... I don't know to be honest. There are very few good uses for it at all, and in practically every situation, a different mechanism is better.
What std::this_thread::sleep_for does (give or take a few lines of sanity tests and error checking) is, it calls the Win32 Sleep function (or, on a different OS, a different, similar function such as nanosleep).
Now, what does Sleep do? It makes a note somewhere in the operating system's little red book that your thread needs to be made ready again at some future time, and then renders your thread not-ready. Being not-ready means simply that your thread is not on the list of candidates to be scheduled for getting CPU time.
At some point, a hardware timer will fire an interrupt. That can be a periodic timer (pre-Windows 8) with an embarrassingly bad default resolution, or a programmable one-shot interrupt, whatever. You can even adjust that timer's resolution, but doing so is a global thing which greatly increases the number of context switches. Plus, it doesn't solve the actual problem. When the OS handles the interrupt, it looks in its book to see which threads need to be made ready, and it does that.
That, however, is not the same as running your thread. It is merely a candidate for being run again (maybe, some time).
So, there's timer granularity, inaccuracy in your measurement, plus scheduling... which altogether is very, very unsuitable for short, periodic intervals. Also, different Windows versions are known to round differently to the scheduler's granularity.
Solution: Do not sleep. Enable vertical sync, or leave it to the user to enable it.

Related

sleep_until() and steady_clock loop drifting from real time in macOS

Good evening everyone,
I'm trying to learn concurrency using the C++ Concurrency Book by Anthony Williams. Having read the first 2 chapters I thought about coding a simple metronome working in its own thread:
#include <iostream>
#include <thread>
#include <chrono>
#include <vector>
class Metro
{
public:
    // beats per minute
    Metro(int bpm_in);
    void start();
private:
    // In milliseconds
    int mPeriod;
    std::vector<std::thread> mThreads;
private:
    void loop();
};
Metro::Metro(int bpm_in):
    mPeriod(60000/bpm_in)
{}
void Metro::start()
{
    mThreads.push_back(std::thread(&Metro::loop, this));
    mThreads.back().detach();
}
void Metro::loop()
{
    auto x = std::chrono::steady_clock::now();
    while(true)
    {
        x += std::chrono::milliseconds(mPeriod);
        std::cout << "\a" << std::flush;
        std::this_thread::sleep_until(x);
    }
}
Now, this code seems to work properly, except for the time interval: the period (assuming bpm = 60 => mPeriod = 1000ms) is more than 1100ms. I read that sleep_until is not guaranteed to wake the process up exactly at the correct time (cppreference.com), but the lack of precision should not change the average period time, only delay the single "tic" inside the time grid, am I understanding it correctly? I assumed that storing the steady_clock::now() time only the first time and then using only the increment would be the correct way not to add drifting time at every cycle. Nevertheless, I also tried to change the x var update in the while loop to
x = std::chrono::steady_clock::now() + std::chrono::milliseconds(mPeriod);
but the period increases even more. I also tried using std::chrono::system_clock and high_resolution_clock, but the period didn't improve. Also, I think the properties I'm interested in for this application are monotonicity and steadiness, which steady_clock has. My question is: is there anything completely wrong I did in my code? Am I missing something concerning how to use std::chrono clocks and sleep_until? Or is this kind of method inherently not precise?
I started analyzing the period by simply comparing it against some known metronomes (Logic Pro, Ableton Live, some mobile apps) and then recorded the output sound to get a better measurement. Maybe the sound buffer has some delay of its own, but the same problem happens when making the program output a char. Also, the problem I'm concerned about is the drifting, not a single tic being a bit out of time.
I'm compiling from the macOS 10.15 terminal with g++ --std=c++11 -pthread and running it on an Intel i7-4770HQ.

Generating interrupt each 100 microsecond on windows

I want to generate an interrupt every 100 microseconds on Windows. I couldn't do this directly, because Windows does not guarantee interrupts at intervals shorter than 500 microseconds. So I created two threads. One is a timer counter (using QueryPerformanceCounter), and the other does the actual work. When the timer counter reaches 100 microseconds, it changes the state of the other thread (the actual work). But I have a problem with race conditions, because I don't want the threads to wait for each other; they must always run. So what I actually need is interrupts. How do I write such a fast interrupt on Windows with C++?
To avoid having two threads communicating when you have these short time windows, I'd put both the work and the timer in a loop in one thread.
Take a sample of the clock when the thread starts and add 100μs to that each loop.
Sleep until the calculated time occurs. Normally, one would use std::this_thread::sleep_until to do such a sleep, but in this case, when the naps are so short, it often becomes a little too inaccurate, so I suggest busy-waiting in a tight loop that just checks the time.
Do your work.
In this example a worker thread runs for 10s without doing any real work. On my machine I could add work consisting of ~3000 additions in the slot where you are supposed to do your work before the whole loop started taking more than 100μs, so you'd better do what you aim to do really fast.
Example:
#include <atomic>
#include <chrono>
#include <iostream>
#include <thread>
using namespace std::chrono_literals;
static std::atomic<bool> running{true}; // brace-init also compiles before C++17
using myclock = std::chrono::steady_clock;
void worker() {
    int loops = 0;
    auto sleeper = myclock::now();
    while(running) {
        ++loops; // count loops to check that it's good enough afterwards
        // add 100us to the sleeper time_point
        sleeper += 100us;
        // busy-wait until it's time to do some work
        while(myclock::now() < sleeper);
        // do your work here
    }
    std::cout << loops << " (should be ~100000)\n";
}
int main() {
    auto th = std::thread(worker);
    // let the thread work for 10 seconds
    std::this_thread::sleep_for(10s);
    running = false;
    th.join();
}
Possible output:
99996 (should be ~100000)
It takes a few clock cycles to get the thread started so don't worry about the number of loops not being exactly on target. Double the time the thread runs and you should still stay close to the target number of loops. What matters is that it's pretty good (but not realtime-good) once it's started running.

How to maintain certain frame rate in different threads

I have two different computational tasks that have to execute at certain frequencies. One has to be performed every 1ms and the other every 13.3ms. The tasks share some data.
I am having a hard time figuring out how to schedule these tasks and how to share data between them. One way that I thought might work is to create two threads, one for each task.
The first task is relatively simple and can be handled within 1ms. But when the second task (which is relatively more time-consuming) is about to launch, it will make a copy of the data that was just used by task 1 and continue to work on that copy.
Do you think this would work? How can it be done in c++?
There are multiple ways to do that in C++.
One simple way is to have 2 threads, as you described. Each thread does its action and then sleeps till the next period start. A working example:
#include <functional>
#include <iostream>
#include <chrono>
#include <thread>
#include <atomic>
#include <mutex>
std::mutex mutex;
std::atomic<bool> stop = {false};
unsigned last_result = 0; // Whatever thread_1ms produces.
void thread_1ms_action() {
    // Do the work.
    // Update the last result.
    {
        std::unique_lock<std::mutex> lock(mutex);
        ++last_result;
    }
}
void thread_13333us_action() {
    // Copy thread_1ms result.
    unsigned last_result_copy;
    {
        std::unique_lock<std::mutex> lock(mutex);
        last_result_copy = last_result;
    }
    // Do the work.
    std::cout << last_result_copy << '\n';
}
void periodic_action_thread(std::chrono::microseconds period, std::function<void()> const& action) {
    auto const start = std::chrono::steady_clock::now();
    while(!stop.load(std::memory_order_relaxed)) {
        // Do the work.
        action();
        // Wait till the next period start.
        auto now = std::chrono::steady_clock::now();
        auto iterations = (now - start) / period;
        auto next_start = start + (iterations + 1) * period;
        std::this_thread::sleep_until(next_start);
    }
}
int main() {
    std::thread a(periodic_action_thread, std::chrono::milliseconds(1), thread_1ms_action);
    std::thread b(periodic_action_thread, std::chrono::microseconds(13333), thread_13333us_action);
    std::this_thread::sleep_for(std::chrono::seconds(1));
    stop = true;
    a.join();
    b.join();
}
If executing an action takes longer than one period, the thread sleeps till the next period start (skipping one or more periods). That is, each action is scheduled at start_time + N * period for some integer N, so there is no time drift regardless of how long it takes to perform the action.
All access to the shared data is protected by the mutex.
So I'm thinking that task1 needs to make the copy, because it knows when it is safe to do so. Here is one simplistic model:
Shared:
    atomic<Result*> latestResult = {nullptr};
Task1:
    Perform calculation
    Result* pNewResult = new Result
    Copy result to pNewResult
    Result* pOld = latestResult.exchange(pNewResult)
    if (pOld)
        delete pOld; // Task2 didn't take it!
Task2:
    Result* pResult = latestResult.exchange(nullptr);
    if (pResult) process result
    delete pResult;
In this model task1 and task2 only ever contend when swapping a simple atomic pointer, which is quite painless.
Note that this makes many assumptions about your calculation. Could your task1 usefully calculate the result straight into the buffer, for example? Also note that at the start Task2 may find the pointer is still null.
Also it inefficiently new()s the buffers. You need 3 buffers to ensure there is never any significant contention between the tasks, but you could just manage three buffer pointers under mutexes, such that Task 1 will have a set of data ready, and be writing another set of data, while task 2 is reading from a third set.
Note that even if you have task 2 copy the buffer, Task 1 still needs 2 buffers to avoid stalls.
You can use C++ threads and facilities such as std::thread and clocks like steady_clock, as described in the previous answer, but whether this solution works depends strongly on the platform your code is running on.
1ms and 13.3ms are pretty short time intervals, and if your code is running on a non-real-time OS like Windows or non-RTOS Linux, there is no guarantee that the OS scheduler will wake up your threads at exact times.
C++11 has the class high_resolution_clock, which should use a high-resolution timer if your platform supports one, but that still depends on the implementation of this class. A bigger problem than the timer is the C++ wait functions: neither sleep_until nor sleep_for guarantees that your thread will be woken up at the specified time. Here is the quote from the C++ documentation for sleep_for:
sleep_for - blocks the execution of the current thread for at least the specified sleep_duration.
Fortunately, most operating systems have special facilities, like Windows Multimedia Timers, that you can use if your threads are not woken up at the expected times.
For more details, see "Precise thread sleep needed. Max 1ms error".

How much time it takes for a thread waiting with pthread_cond_wait to wake after being signaled? how can I estimate this time?

I'm writing a C++ ThreadPool implementation and using pthread_cond_wait in my worker's main function. I was wondering how much time passes between signaling the condition variable and the waiting thread or threads waking up.
Do you have any idea how I can estimate or calculate this time?
Thank you very much.
It depends on the cost of a context switch, which in turn depends on:
the OS
the CPU
whether it is a thread or a different process
the load of the machine
whether the switch is to the same core it last ran on
the working set size
the time since it last ran
Linux best case, i7, 1100ns, thread in same process, same core as it ran in last, ran as the last thread, no load, working set 1 byte.
Bad case, flushed from cache, different core, different process, just expect 30µs of CPU overhead.
Where does the cost go:
Save last process context 70-400 cycles,
load new context 100-400 cycles
if different process, flush TLB, reload 3 to 5 page walks, which potentially could be from memory taking ~300 cycles each. Plus a few page walks if more than one page is touched, including instructions and data.
OS overhead, we all like the nice statistics, for example add 1 to context switch counter.
Scheduling overhead, which task to run next
potential cache misses on new core ~12 cycles per cache line on own L2 cache, and downhill from there the farther away the data is and the more there is of it.
As mentioned, the time for a condition variable to react depends on many factors. One option is to actually measure it: start a thread that waits on a condition variable. Another thread takes a timestamp right before signaling the variable, and the waiting thread takes a timestamp the moment it wakes up. This gives a rough approximation of the time it takes for the thread to notice the signaled condition.
#include <mutex>
#include <condition_variable>
#include <thread>
#include <chrono>
#include <cstdio>
typedef std::chrono::time_point<std::chrono::high_resolution_clock> timep;
int main()
{
    std::mutex mx;
    std::condition_variable cv;
    timep t0, t1;
    bool done = false;
    std::thread th([&]() {
        // Hold the lock whenever done or t1 is touched; wait() releases it while blocked.
        std::unique_lock<std::mutex> lock(mx);
        while (!done)
        {
            cv.wait(lock);
            t1 = std::chrono::high_resolution_clock::now();
        }
    });
    for (int i = 0; i < 25; ++i) // measure 25 times
    {
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
        t0 = std::chrono::high_resolution_clock::now();
        cv.notify_one();
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
        std::unique_lock<std::mutex> lock(mx);
        std::printf("test#%-2d: cv reaction time: %6.3f micro\n", i,
            1000000 * std::chrono::duration<double>(t1 - t0).count());
    }
    {
        std::unique_lock<std::mutex> lock(mx);
        done = true;
    }
    cv.notify_one();
    th.join();
}
Try it on coliru, it produced this output:
test#0 : cv reaction time: 50.488 micro
test#1 : cv reaction time: 55.057 micro
test#2 : cv reaction time: 53.765 micro
test#3 : cv reaction time: 50.973 micro
test#4 : cv reaction time: 51.015 micro
test#5 : cv reaction time: 57.166 micro
and so on...
On my Windows 11 laptop I got values roughly 5-10x faster (5-10 microseconds).

slowing the speed of for loop

for(;;) {
    int rand_number = rand() % 2;
    cout << rand_number;
}
This loop generates 1's and 0's across the screen like in the Matrix movie (LOL), but the code executes really fast. Is there any way to make the numbers appear slowly?
Use Sleep(3000); to wait for 3000 milliseconds
for example
#include <iostream>
#include <stdlib.h>
#include <Windows.h>
using namespace std;
int main(int argc,char**argv){
    cout<<"a"<<endl;
    Sleep(3000);
    cout<<"b"<<endl;
    return 0;
}
Check out usleep. You could use sleep as well, but I imagine that will be too slow.
USLEEP(3) BSD Library Functions Manual USLEEP(3)
NAME
usleep -- suspend thread execution for an interval measured in microseconds
LIBRARY
Standard C Library (libc, -lc)
SYNOPSIS
#include <unistd.h>
int
usleep(useconds_t useconds);
DESCRIPTION
The usleep() function suspends execution of the calling thread until either useconds microseconds have elapsed or a signal is delivered to the thread whose action is to invoke a signal-catching function or to terminate the thread or process. The actual time slept may be longer, due to system latencies and possible limitations in the timer resolution of the hardware.
This function is implemented, using nanosleep(2), by pausing for useconds microseconds or until a signal occurs. Consequently, in this implementation, sleeping has no effect on the state of process timers and there is no special handling for SIGALRM.
RETURN VALUES
The usleep() function returns the value 0 if successful; otherwise the value -1 is returned and the global variable errno is set to indicate the error.
ERRORS
The usleep() function will fail if:
[EINTR] A signal was delivered to the process and its action was to invoke a signal-catching function.
SEE ALSO
nanosleep(2), sleep(3)
HISTORY
The usleep() function appeared in 4.3BSD.
BSD February 13, 1998 BSD
If you just want it to stop until you are ready you can put in a random pointless cin. It'll just wait there for input until you press return.