Generating interrupt each 100 microsecond on windows - c++

i want to generate interrupt every 100 microseconds on windows. Actually i couldnt do this on windows,because windows does not guarantee the interrupts less then 500 microseconds. So, i generate 2 threads. One of them is for timer counter(query performance counter), the other thread is the actual work. When timer counter is 100 microseconds, it change the state of the other thread(actual work) . But i have problem with race condition, because i dont want the threads wait each others, they must always run. So actually i need interrupts. How do i write such fast interrupt on windows with c++?

To avoid having two threads communicating when you have these short time windows, I'd put both the work and the timer in a loop in one thread.
Take a sample of the clock when the thread starts and add 100μs to that each loop.
Sleep until the calculated time occurs. Normally, one would use std::this_thread::sleep_until to do such a sleep, but in this case, when the naps are so short, it often becomes a little too inaccurate, so I suggest busy-waiting in a tight loop that just checks the time.
Do your work.
In this example a worker thread runs for 10s without doing any real work. On my machine I could add work consisting of ~3000 additions in the slot where you are supposed to do your work before the whole loop started taking more than 100μs, so you'd better do what you aim to do really fast.
Example:
#include <atomic>
#include <chrono>
#include <iostream>
#include <thread>
using namespace std::chrono_literals;
static std::atomic<bool> running = true;
using myclock = std::chrono::steady_clock;
void worker() {
int loops = 0;
auto sleeper = myclock::now();
while(running) {
++loops; // count loops to check that it's good enough afterwards
// add 100us to the sleeper time_point
sleeper += 100us;
// busy-wait until it's time to do some work
while(myclock::now() < sleeper);
// do your work here
}
std::cout << loops << " (should be ~100000)\n";
}
int main() {
auto th = std::thread(worker);
// let the thread work for 10 seconds
std::this_thread::sleep_for(10s);
running = false;
th.join();
}
Possible output:
99996 (should be ~100000)
It takes a few clock cycles to get the thread started so don't worry about the number of loops not being exactly on target. Double the time the thread runs and you should still stay close to the target number of loops. What matters is that it's pretty good (but not realtime-good) once it's started running.

Related

sleep_until() and steady_clock loop drifting from real time in macOS

Good evening everyone,
I'm trying to learn concurrency using the C++ Concurrency Book by Anthony Williams. Having read the first 2 chapters I thought about coding a simple metronome working in its own thread:
#include <iostream>
#include <thread>
#include <chrono>
#include <vector>
class Metro
{
public:
// beats per minute
Metro(int bpm_in);
void start();
private:
// In milliseconds
int mPeriod;
std::vector<std::thread> mThreads;
private:
void loop();
};
Metro::Metro(int bpm_in):
mPeriod(60000/bpm_in)
{}
void Metro::start()
{
mThreads.push_back(std::thread(&Metro::loop, this));
mThreads.back().detach();
}
void Metro::loop()
{
auto x = std::chrono::steady_clock::now();
while(true)
{
x += std::chrono::milliseconds(mPeriod);
std::cout << "\a" << std::flush;
std::this_thread::sleep_until(x);
}
}
Now, this code seems to work properly, except for the time interval: the period (assuming bpm = 60 => mPeriod = 1000ms) is more than 1100ms. I read that sleep_until is not guaranteed to wake the process up exactly at the correct time (cppreference.com), but the lack of precision should not change the average period time, only delay the single "tic" inside the time grid, am I understanding it correctly? I assumed that storing the steady_clock::now() time only the first time and then using only the increment would be the correct way not to add drifting time at every cycle. Nevertheless, I also tried to change the x var update in the while loop to
x = std::chrono::steady_clock::now() + std::chrono::milliseconds(mPeriod);
but the period increases even more. I also tried using std::chrono::system_clock and high_resolution_clock, but the period didn't improve. Also, I think the properties I'm interested in for this application are monotonicity and steadiness, which steady_clock has. My question is: is there anything completely wrong I did in my code? Am I missing something concerning how to use std::chrono clocks and sleep_until? Or is this kind of method inherently not precise?
I've started analyzing the period by simply comparing it against some known metronomes (Logic Pro, Ableton Live, some mobile apps) and then recorded the output sound to have a better measurement. Maybe the sound buffer has some delay on itself, but same problem happens when making the program output a char. Also, the problem I'm concerned about is the drifting, not the single tic being a bit out of time.
I'm compiling from macos 10.15 terminal with g++ --std=c++11 -pthread and running it on Intel i7 4770hq.

std::this_thread::sleep_for sleeps for too long

Can anyone tell what the problem with following example is?
It produces 65 instead of 300 frames per second.
#define WIN32_LEAN_AND_MEAN
#include <Windows.h>
#include <Thread>
#include <Chrono>
#include <String>
int main(int argc, const char* argv[]) {
using namespace std::chrono_literals;
constexpr unsigned short FPS_Limit = 300;
std::chrono::duration<double, std::ratio<1, FPS_Limit>> FrameDelay = std::chrono::duration<double, std::ratio<1, FPS_Limit>>(1.0f);
unsigned int FPS = 0;
std::chrono::steady_clock SecondTimer;
std::chrono::steady_clock ProcessTimer;
std::chrono::steady_clock::time_point TpS = SecondTimer.now();
std::chrono::steady_clock::time_point TpP = ProcessTimer.now();
while (true) {
// ...
// Count FPS
FPS++;
if ((TpS + (SecondTimer.now() - TpS)) > (TpS + 1s)) {
OutputDebugString(std::to_string(FPS).c_str()); OutputDebugString("\n");
FPS = 0;
TpS = SecondTimer.now();
}
// Sleep
std::this_thread::sleep_for(FrameDelay - (ProcessTimer.now() - TpP)); // FrameDelay minus time needed to execute other things
TpP = ProcessTimer.now();
}
return 0;
}
I guess it has something to do with std::chrono::duration<double, std::ratio<1, FPS_Limit>>, but when it is multiplied by FPS_Limit the correct 1 frames per second are produced.
Note that the limit of 300 frames per second is just an example.
It can be replaced by any other number and the program would still sleep for way too long.
In short, the problem is that you use std::this_thread::sleep_for at all. Or, any kind of "sleep" for that matter. Sleeping to limit the frame rate is just utterly wrong.
The purpose of sleep functionality is, well... I don't know to be honest. There are very few good uses for it at all, and in practically every situation, a different mechanism is better.
What std::this_thread::sleep_for does (give or take a few lines of sanity tests and error checking) is, it calls the Win32 Sleep function (or, on a different OS, a different, similar function such as nanosleep).
Now, what does Sleep do? It makes a note somewhere in the operating system's little red book that your thread needs to be made ready again at some future time, and then renders your thread not-ready. Being not-ready means simply that your thread is not on the list of candidates to be scheduled for getting CPU time.
Sometimes, eventually, a hardware timer will fire an interrupt. That can be a periodic timer (pre Windows 8) with an embarrassingly bad default resolution, or programmable one-shot interrupt, whatever. You can even adjust that timer's resolution, but doing so is a global thing which greatly increases the number of context switches. Plus, it doesn't solve the actual problem. When the OS handles the interrupt, it looks in its book to see which threads need to be made ready, and it does that.
That, however, is not the same as running your thread. It is merely a candidate for being run again (maybe, some time).
So, there's timer granularity, inaccuracy in your measurement, plus scheduling... which altogether is very, very unsuitable for short, periodic intervals. Also, different Windows versions are known to round differently to the scheduler's granularity.
Solution: Do not sleep. Enable vertical sync, or leave it to the user to enable it.

How to maintain certain frame rate in different threads

I have two different computational tasks that have to execute at certain frequencies. One has to be performed every 1ms and the other every 13.3ms. The tasks share some data.
I am having a hard time how to schedule these tasks and how to share data between them. One way that I thought might work is to create two threads, one for each task.
The first task is relatively simpler and can be handled in 1ms itself. But, when the second task (that is relatively more time-consuming) is going to launch, it will make a copy of the data that was just used by task 1, and continue to work on them.
Do you think this would work? How can it be done in c++?
There are multiple ways to do that in C++.
One simple way is to have 2 threads, as you described. Each thread does its action and then sleeps till the next period start. A working example:
#include <functional>
#include <iostream>
#include <chrono>
#include <thread>
#include <atomic>
#include <mutex>
std::mutex mutex;
std::atomic<bool> stop = {false};
unsigned last_result = 0; // Whatever thread_1ms produces.
void thread_1ms_action() {
// Do the work.
// Update the last result.
{
std::unique_lock<std::mutex> lock(mutex);
++last_result;
}
}
void thread_1333us_action() {
// Copy thread_1ms result.
unsigned last_result_copy;
{
std::unique_lock<std::mutex> lock(mutex);
last_result_copy = last_result;
}
// Do the work.
std::cout << last_result_copy << '\n';
}
void periodic_action_thread(std::chrono::microseconds period, std::function<void()> const& action) {
auto const start = std::chrono::steady_clock::now();
while(!stop.load(std::memory_order_relaxed)) {
// Do the work.
action();
// Wait till the next period start.
auto now = std::chrono::steady_clock::now();
auto iterations = (now - start) / period;
auto next_start = start + (iterations + 1) * period;
std::this_thread::sleep_until(next_start);
}
}
int main() {
std::thread a(periodic_action_thread, std::chrono::milliseconds(1), thread_1ms_action);
std::thread b(periodic_action_thread, std::chrono::microseconds(13333), thread_1333us_action);
std::this_thread::sleep_for(std::chrono::seconds(1));
stop = true;
a.join();
b.join();
}
If executing an action takes longer than one period to execute, then it sleeps till the next period start (skips one or more periods). I.e. each Nth action happens exactly at start_time + N * period, so that there is no time drift regardless of how long it takes to perform the action.
All access to the shared data is protected by the mutex.
So I'm thinking that task1 needs to make the copy, because it knows when it is safe to do so. Here is one simplistic model:
Shared:
atomic<Result*> latestResult = {0};
Task1:
Perform calculation
Result* pNewResult = new ResultBuffer
Copy result to pNewResult
latestResult.swap(pNewResult)
if (pNewResult)
delete pNewResult; // Task2 didn't take it!
Task2:
Result* pNewResult;
latestResult.swap(pNewResult);
process result
delete pNewResult;
In this model task1 and task2 only ever naggle when swapping a simple atomic pointer, which is quite painless.
Note that this makes many assumptions about your calculation. Could your task1 usefully calculate the result straight into the buffer, for example? Also note that at the start Task2 may find the pointer is still null.
Also it inefficiently new()s the buffers. You need 3 buffers to ensure there is never any significant naggling between the tasks, but you could just manage three buffer pointers under mutexes, such that Task 1 will have a set of data ready, and be writing another set of data, while task 2 is reading from a third set.
Note that even if you have task 2 copy the buffer, Task 1 still needs 2 buffers to avoid stalls.
You can use C++ threads and thread facilities like class thread and timer classes like steady_clock like it has been described in previous answer but if this solution works strongly depends on the platform your code is running on.
1ms and 13.3ms are pretty short time intervals and if your code is running on non-real time OS like Windows or non-RTOS Linux, there is no guarantee that OS scheduler will wake up your threads at exact times.
C++ 11 has the class high_resolution_clock that should use high resolution timer if your platform supports one but it still depends on the implementation of this class. And the bigger problem than the timer is using C++ wait functions. Neither C++ sleep_until nor sleep_for guarantees that they will wake up your thread at specified times. Here is the quote from C++ documentation.
sleep_for - blocks the execution of the current thread for at least the specified sleep_duration. sleep_for
Fortunately, most OS have some special facilities like Windows Multimedia Timers you can use if your threads are not woken up at expected times.
Here are more details. Precise thread sleep needed. Max 1ms error

How much time it takes for a thread waiting with pthread_cond_wait to wake after being signaled? how can I estimate this time?

I'm writing a C++ ThreadPool implantation and using pthread_cond_wait in my worker's main function. I was wondering how much time will pass from signaling the condition variable until the thread/threads waiting on it will wake up.
do you have any idea of how can I estimate/calculate this time?
Thank you very much.
It depends, on the cost of a context switch
on the OS,
The CPU
is it thread or a different process
the load of the machine
Is the switch to same core as it last ran on
what is the working set size
time since it last ran
Linux best case, i7, 1100ns, thread in same process, same core as it ran in last, ran as the last thread, no load, working set 1 byte.
Bad case, flushed from cache, different core, different process, just expect 30µs of CPU overhead.
Where does the cost go:
Save last process context 70-400 cycles,
load new context 100-400 cycles
if different process, flush TLB, reload 3 to 5 page walks, which potentially could be from memory taking ~300 cycles each. Plus a few page walks if more than one page is touched, including instructions and data.
OS overhead, we all like the nice statistics, for example add 1 to context switch counter.
Scheduling overhead, which task to run next
potential cache misses on new core ~12 cycles per cache line on own L2 cache, and downhill from there the farther away the data is and the more there is of it.
As mentioned time for condition variable to react depends on many factors. One option is to actually measure it: you may start a thread that waits on a condition variable. Then, another thread that signals the condition variable takes timestamp right before signaling the variable. The thread that waits on the variable also takes timestamp the moment it wakes up. Simple as that. This way you may have rough approximation about time it takes for the thread to notice the signaled condition.
#include <mutex>
#include <condition_variable>
#include <thread>
#include <chrono>
#include <stdio.h>
typedef std::chrono::time_point<std::chrono::high_resolution_clock> timep;
int main()
{
std::mutex mx;
std::condition_variable cv;
timep t0, t1;
bool done = false;
std::thread th([&]() {
while (!done)
{
std::unique_lock lock(mx);
cv.wait(lock);
t1 = std::chrono::high_resolution_clock::now();
}
});
for (int i = 0; i < 25; ++i) // measure 25 times
{
std::this_thread::sleep_for(std::chrono::milliseconds(10));
t0 = std::chrono::high_resolution_clock::now();
cv.notify_one();
std::this_thread::sleep_for(std::chrono::milliseconds(10));
std::unique_lock lock(mx);
printf("test#%-2d: cv reaction time: %6.3f micro\n", i,
1000000 * std::chrono::duration<double>(t1 - t0).count());
}
{
std::unique_lock lock(mx);
done = true;
}
cv.notify_one();
th.join();
}
Try it on coliru, it produced this output:
test#0 : cv reaction time: 50.488 micro
test#1 : cv reaction time: 55.057 micro
test#2 : cv reaction time: 53.765 micro
test#3 : cv reaction time: 50.973 micro
test#4 : cv reaction time: 51.015 micro
test#5 : cv reaction time: 57.166 micro
and so on...
On my windows 11 laptop I got values roughly 5-10x faster (5-10 microseconds).

Synchronizing very fast threads

In the following example (an idealized "game") there are two threads. The main thread which updates data and RenderThread which "renders" it to the screen. What I need it those two to be synchronized. I cannot afford to run several update iteration without running a render for every single one of them.
I use a condition_variable to sync those two, so ideally the faster thread will spend some time waiting for the slower. However condition variables don't seem to do the job if one of the threads completes an iteration for a very small amount of time. It seems to quickly reacquire the lock of the mutex before wait in the other thread is able to acquire it. Even though notify_one is called
#include <iostream>
#include <thread>
#include <chrono>
#include <atomic>
#include <functional>
#include <mutex>
#include <condition_variable>
using namespace std;
bool isMultiThreaded = true;
struct RenderThread
{
RenderThread()
{
end = false;
drawing = false;
readyToDraw = false;
}
void Run()
{
while (!end)
{
DoJob();
}
}
void DoJob()
{
unique_lock<mutex> lk(renderReadyMutex);
renderReady.wait(lk, [this](){ return readyToDraw; });
drawing = true;
// RENDER DATA
this_thread::sleep_for(chrono::milliseconds(15)); // simulated render time
cout << "frame " << count << ": " << frame << endl;
++count;
drawing = false;
readyToDraw = false;
lk.unlock();
renderReady.notify_one();
}
atomic<bool> end;
mutex renderReadyMutex;
condition_variable renderReady;
//mutex frame_mutex;
int frame = -10;
int count = 0;
bool readyToDraw;
bool drawing;
};
struct UpdateThread
{
UpdateThread(RenderThread& rt)
: m_rt(rt)
{}
void Run()
{
this_thread::sleep_for(chrono::milliseconds(500));
for (int i = 0; i < 20; ++i)
{
// DO GAME UPDATE
// when this is uncommented everything is fine
// this_thread::sleep_for(chrono::milliseconds(10)); // simulated update time
// PREPARE RENDER THREAD
unique_lock<mutex> lk(m_rt.renderReadyMutex);
m_rt.renderReady.wait(lk, [this](){ return !m_rt.drawing; });
m_rt.readyToDraw = true;
// SUPPLY RENDER THREAD WITH DATA TO RENDER
m_rt.frame = i;
lk.unlock();
m_rt.renderReady.notify_one();
if (!isMultiThreaded)
m_rt.DoJob();
}
m_rt.end = true;
}
RenderThread& m_rt;
};
int main()
{
auto start = chrono::high_resolution_clock::now();
RenderThread rt;
UpdateThread u(rt);
thread* rendering = nullptr;
if (isMultiThreaded)
rendering = new thread(bind(&RenderThread::Run, &rt));
u.Run();
if (rendering)
rendering->join();
auto duration = chrono::high_resolution_clock::now() - start;
cout << "Duration: " << double(chrono::duration_cast<chrono::microseconds>(duration).count())/1000 << endl;
return 0;
}
Here is the source of this small example code, and as you can see even on ideone's run the output is frame 0: 19 (this means that the render thread has completed a single iteration, while the update thread has completed all 20 of its).
If we uncomment line 75 (ie simulate some time for the update loop) everything runs fine. Every update iteration has an associated render iteration.
Is there a way to really truly sync those threads, even if one of them completes an iteration in mere nanoseconds, but also without having a performance penalty if they both take some reasonable amount of milliseconds to complete?
If I understand correctly, you want the 2 threads to work alternately: updater wait until the renderer finish before to iterate again, and the renderer wait until the updater finish before to iterate again. Part of the computation could be parallel, but the number of iteration shall be similar between both.
You need 2 locks:
one for the updating
one for the rendering
Updater:
wait (renderingLk)
update
signal(updaterLk)
Renderer:
wait (updaterLk)
render
signal(renderingLk)
EDITED:
Even if it look simple, there are several problems to solve:
Allowing part of the calculations to be made in parallel: As in the above snippet, update and render will not be parallel but sequential, so there is no benefit to have multi-thread. To a real solution, some the calculation should be made before the wait, and only the copy of the new values need to be between the wait and the signal. Same for rendering: all the render need to be made after the signal, and only getting the value between the wait and the signal.
The implementation need to care also about the initial state: so no rendering is performed before the first update.
The termination of both thread: so no one will stay locked or loop infinitely after the other terminate.
I think a mutex (alone) is not the right tool for the job. You might want to consider using a semaphore (or something similar) instead. What you describe sound a lot like a producer/consumer problem, i.e., one process is allowed to run once everytime another process has finnished a task. Therefore you might also have a look at producer/consumer patterns. For example this series might get you some ideas:
A multi-threaded Producer Consumer with C++11
There a std::mutex is combined with a std::condition_variable to mimic the behavior of a semaphore. An approach that appears quite reasonable. You would probably not count up and down but rather toggle true and false a variable with needs redraw semantics.
For reference:
http://en.cppreference.com/w/cpp/thread/condition_variable
C++0x has no semaphores? How to synchronize threads?
This is because you use a separate drawing variable that is only set when the rendering thread reacquires the mutex after a wait, which may be too late. The problem disappears when the drawing variable is removed and the check for wait in the update thread is replaced with ! m_rt.readyToDraw (which is already set by the update thread and hence not susceptible to the logical race.
Modified code and results
That said, since the threads do not work in parallel, I don't really get the point of having two threads. Unless you should choose to implement double (or even triple) buffering later.
A technique often used in computer graphics is to use a double-buffer. Instead of having the renderer and the producer operate on the same data in memory, each one has its own buffer. This is implemented by using two independent buffers, and switch them when needed. The producer updates one buffer, and when it is done, it switches the buffer and fills the second buffer with the next data. Now, while the producer is processing the second buffer, the renderer works with the first one and displays it.
You could use this technique by letting the renderer lock the swap operation such that the producer may have to wait until rendering is finished.