Usually when I want to simulate some work or wait for an exact time interval, I use condition_variable::wait_for or, at worst, std::this_thread::sleep_for. But the condition_variable documentation states that wait_for and wait_until may block longer than requested:
This function may block for longer than timeout_duration due to scheduling or resource contention delays.
How can exact wait intervals be guaranteed?
UPDATE
Can I achieve this without condition_variable?
You cannot do this.
In order to have exact guarantees like this, you need a real time operating system.
C++ does not guarantee you are on a real time operating system.
So it provides the guarantees that a typical, non-RTOS provides.
Note that there are other complications to programming on a RTOS that go far beyond the scope of this question.
In practice, when people really want fine-grained timing control (say, they are twiddling with per-frame or per-scanline buffers, or audio buffers, or the like), one thing they do is check whether the remaining time is short, and if so, spin-wait. If the time is longer, they sleep for a bit less than the amount of time they want to wait, then wake up and spin.
This is also not guaranteed to work, but works well enough for almost all cases.
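A rough illustration of that hybrid approach is below. The 1 ms cushion is an arbitrary guess, not a measured value; the right amount depends on your OS and has to be tuned on the target system.

```cpp
#include <chrono>
#include <thread>

// Sleep for most of the interval, then spin for the remainder.
// The 1 ms cushion is an assumption; measure your system's actual
// sleep overshoot to pick a sensible value.
void precise_wait(std::chrono::microseconds total)
{
    using clock = std::chrono::steady_clock;
    const auto deadline = clock::now() + total;
    const auto cushion = std::chrono::milliseconds(1);

    if (total > cushion)
        std::this_thread::sleep_for(total - cushion); // coarse, may overshoot
    while (clock::now() < deadline)
        ;                                             // fine-grained spin
}
```

This is still not guaranteed to hit the deadline exactly (the final spin can itself be preempted), but it wastes far fewer cycles than spinning for the whole interval.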
On a RTOS, the platform may provide primitives like you want. These lie outside the scope of standard C++. No typical desktop OS is an RTOS that I am aware of. If you are programming for a fighter jet's control hardware or similar, you may be programming on an RTOS.
I hope you aren't writing fighter jet control software and asking this question on stack overflow.
If you did hypothetically sleep for precisely some exact duration, and then performed some action in response (such as getting the current time, or printing a message to the screen), then that action might be delayed for some unknown period of time e.g. due to processor load. This is equivalent to the action happening (almost) immediately but the timer taking longer than expected. Even in the best case scenario, where the timer completes at precisely the time you request, and the operating system allows your action to complete without preempting your process, it will take a few clock cycles to perform that action.
So in other words, on a standard operating system, it is impossible or maybe even meaningless for a timer to complete at precisely the time requested.
How can this be overcome? An academic answer is that you can use specialized software and hardware such as a real-time operating system, but developing for such a system is vastly more complicated than regular programming. What you probably really want to know is that, in the common case, the delay the documentation refers to is not substantial, i.e. it is commonly less than 1/100th of a second.
With a brute force loop... for example:
std::chrono::microseconds sleep_duration{1000};
auto now = std::chrono::high_resolution_clock::now();
while (true)
{
    auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(
        std::chrono::high_resolution_clock::now() - now);
    if (elapsed > sleep_duration)
        break;
}
That's a bit ugly, but desktop operating systems are not real-time, so you cannot have such precision. To relax the CPU, look at the following snippet:
void little_sleep(std::chrono::microseconds us)
{
    auto start = std::chrono::high_resolution_clock::now();
    auto end = start + us;
    do {
        std::this_thread::yield();
    } while (std::chrono::high_resolution_clock::now() < end);
}
That depends on what accuracy you can expect. Generally, as others have said, a regular OS (Linux, Windows) cannot guarantee that.
Why?
Your OS probably has a concept of threads. If so, there is a scheduler which interrupts threads and switches execution to other threads waiting in the queue. This can spoil the accuracy of timers.
What can I do about it?
If you are using an embedded system, go for bare metal, i.e. don't use an OS at all, or use a so-called hard real-time operating system.
If you are using Linux, look for the Linux RT Preempt patch. You have to recompile your kernel to include the patch (not so complicated, though), and then you can create threads with priority above 50, i.e. priority above the kernel's threads, which in the end means you can have a thread that preempts the scheduler and the kernel in general, providing quite good timing accuracy. In my case the improvement was three orders of magnitude (from a few ms of latency to a few us).
If you are using Windows, I don't know of such a patch, but you can search for high-precision timers on the Microsoft site. Maybe the accuracy they provide will be enough for your needs.
I'm currently working on some C++ code that reads from a video file, parses the video/audio streams into its constituent units (such as an FLV tag) and sends it back out in order to "restream" it.
Because my input comes from a file but I want to simulate a proper framerate when restreaming this data, I am considering ways to sleep the thread that performs the read, so that frames are extracted at the rate one would expect from typical 30 or 60 FPS video.
One solution is to use an obvious std::this_thread::sleep_for call and pass in the amount of milliseconds depending on what my FPS is. Another solution I'm considering is using a condition variable, and using std::condition_variable::wait_for with the same idea.
I'm a little stuck, because I know that the first solution doesn't guarantee exact precision -- the sleep will last around as long as the argument I pass in but may in theory be longer. And I know that the std::condition_variable::wait_for call will require lock reacquisition which will take some time too. Is there a better solution than what I'm considering? Otherwise, what's the best methodology to attempt to pause execution for as precise a granularity as possible?
C++11 Most accurate way to pause execution for a certain amount of time?
This:
auto start = now();
while(now() < start + wait_for);
now() is a placeholder for the most accurate time measuring method available for the system.
This is the analogue to sleep as what spinlock is to a mutex. Like a spinlock, it will consume all the CPU cycles while pausing, but it is what you asked for: The most accurate way to pause execution. There is trade-off between accuracy and CPU-usage-efficiency: You must choose which is more important to have for your program.
why is it more accurate than std::this_thread::sleep_for?
Because sleep_for yields execution of the thread. As a consequence, it can never have better granularity than the process scheduler of the operating system has (assuming there are other processes competing for time).
The live loop shown above which doesn't voluntarily give up its time slice will achieve the highest granularity provided by the clock that is used for measurement.
Of course, the time slice granted by the scheduler will eventually run out, and that might happen near the time we should be resuming. The only way to reduce that effect is to increase the priority of our thread. There is no standard way of affecting the priority of a thread in C++. The only way to get rid of the effect completely is to run on a non-multitasking system.
On multi-CPU systems, one trick you might want to use is to set the thread affinity so that the OS thread won't be migrated to other hardware threads, which would introduce latency. Likewise, you might want to set the affinity of your other threads to stay off the time-measuring thread's CPU. There is no standard tool for setting thread affinity.
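On Linux specifically, a non-standard sketch of that affinity trick might use glibc's pthread_setaffinity_np. Pinning to the CPU the thread currently runs on is just an illustrative choice:

```cpp
#include <pthread.h>
#include <sched.h>

// Linux-specific sketch (not standard C++): pin the calling thread to
// the CPU it is currently running on, so the scheduler cannot migrate it.
bool pin_to_current_cpu()
{
    int cpu = sched_getcpu();
    if (cpu < 0)
        return false;

    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set) == 0;
}
```

Other platforms have their own equivalents (e.g. SetThreadAffinityMask on Windows); none of this is portable C++.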
Let T be the time you wish to sleep for and let G be the maximum time that sleep_for could possibly overshoot.
If T is greater than G, then it will be more efficient to use sleep_for for T - G time units, and only use the live loop for the final G - O time units (where O is the time that sleep_for was observed to overshoot).
Figuring out what G is for your target system can be quite tricky however. There is no standard tool for that. If you over-estimate, you'll waste more cycles than necessary. If you under-estimate, your sleep may overshoot the target.
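One rough, non-authoritative way to estimate G is simply to measure sleep_for's overshoot on the target machine. The 1 ms request and the sample count here are arbitrary choices:

```cpp
#include <algorithm>
#include <chrono>
#include <thread>

// Empirically estimate the worst observed overshoot of a 1 ms sleep_for.
// Run on an otherwise idle system; results vary between runs and machines.
std::chrono::microseconds estimate_overshoot(int samples)
{
    using namespace std::chrono;
    microseconds worst{0};
    for (int i = 0; i < samples; ++i) {
        auto t0 = steady_clock::now();
        std::this_thread::sleep_for(milliseconds(1));
        auto over = duration_cast<microseconds>(steady_clock::now() - t0)
                    - milliseconds(1);
        worst = std::max(worst, over);
    }
    return worst;
}
```

Remember that a worst case observed in testing is not an upper bound; a loaded system can still overshoot by more.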
In case you're wondering what is a good choice for now(), the most appropriate tool provided by the standard library is std::chrono::steady_clock. However, that is not necessarily the most accurate tool available on your system. What tool is the most accurate depends on what system you're targeting.
I understand that a preemptive multitasking OS can interrupt a process at any "code position".
Given the following code:
int main() {
    while (true) {
        doSthImportant(); // needs to be executed at least every 20 msec

        // start of critical section
        int start_usec = getTime_usec();
        doSthElse();
        int timeDiff_usec = getTime_usec() - start_usec;
        // end of critical section

        evalUsedTime(timeDiff_usec);
        sleep_msec(10);
    }
}
I would expect this code to usually produce proper results for timeDiff_usec, especially if doSthElse() and getTime_usec() don't take much time, so that they are rarely interrupted by the OS scheduler.
But the program would get interrupted from time to time somewhere in the "critical section". The context switch will do what it is supposed to do, and still in such a case the program would produce wrong results for the timeDiff_usec.
This is the only example I have in mind right now but I'm sure there would be other scenarios where multitasking might get a program(mer) into trouble (as time is not the only state that might be changed at re-entry).
Is there a way to ensure that measuring the time for a certain action works fine?
Which other common issues are critical with multitasking and need to be considered? (I'm not thinking of thread safety - but there might be common issues).
Edit:
I changed the sample code to make it more precise.
I want to check the time being spent to make sure that doSthElse() doesn't take like 50 msec or so, and if it does I would look for a better solution.
Is there a way to ensure that measuring the time for a certain action works fine?
That depends on your operating system and your privilege level. On some systems, for some privilege levels, you can set a process or thread to have a priority that prevents it from being preempted by anything at lower priority. For example, on Linux, you might use sched_setscheduler to give a thread real-time priority. (If you're really serious, you can also set the thread affinity and SMP affinities to prevent any interrupts from being handled on the CPU that's running your thread.)
Your system may also provide time tracking that accounts for time spent preempted. For example, POSIX defines the getrusage function, which returns a struct containing ru_utime (the amount of time spent in “user mode” by the process) and ru_stime (the amount of time spent in “kernel mode” by the process). These should sum to the total time the CPU spent on the process, excluding intervals during which the process was suspended. Note that if the kernel needs to, for example, spend time paging on behalf of your process, it's not defined how much (if any) of that time is charged to your process.
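A minimal POSIX sketch of reading those getrusage fields (not standard C++; the timeval arithmetic follows the POSIX description of ru_utime and ru_stime):

```cpp
#include <sys/resource.h>

// POSIX sketch: CPU time actually charged to this process, user plus
// kernel mode, excluding intervals during which it was preempted.
double cpu_seconds_used()
{
    rusage ru{};
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1.0;
    return (ru.ru_utime.tv_sec + ru.ru_stime.tv_sec) +
           (ru.ru_utime.tv_usec + ru.ru_stime.tv_usec) / 1000000.0;
}
```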
Anyway, the common way to measure time spent on some critical action is to time it (essentially the way your question presents) repeatedly, on an otherwise idle system, throw out outlier measurements, and take the mean (after eliminating outliers), or take the median or 95th percentile of the measurements, depending on why you need the measurement.
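A small measurement harness along those lines might time the action repeatedly and report the median, which is robust against the occasional preemption-induced outlier (the run count here is an arbitrary choice):

```cpp
#include <algorithm>
#include <chrono>
#include <vector>

// Time an action `runs` times and return the median duration.
// The median discards outliers caused by preemption automatically.
template <typename F>
std::chrono::nanoseconds median_runtime(F action, int runs)
{
    using namespace std::chrono;
    std::vector<nanoseconds> samples;
    samples.reserve(runs);
    for (int i = 0; i < runs; ++i) {
        auto t0 = steady_clock::now();
        action();
        samples.push_back(duration_cast<nanoseconds>(steady_clock::now() - t0));
    }
    // Partial sort is enough to find the median element.
    std::nth_element(samples.begin(), samples.begin() + runs / 2, samples.end());
    return samples[runs / 2];
}
```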
Which other common issues are critical with multitasking and need to be considered? (I'm not thinking of thread safety - but there might be common issues).
Too broad. There are whole books written about this subject.
By task scheduler I mean any implementation of worker threads pool which distribute work to the threads according to whatever algorithm they are designed with. (like Intel TBB)
I know that "real-time" constraints imply that work gets done in predictable time (I'm not talking about speed). So my guess is that using a task scheduler, which, as far as I know, can't guarantee that some task will be executed before a given time, makes the application impossible to use in these constraints.
Or am I missing something? Is there a way to have both? Maybe by forcing assumptions on the quantity of data that can be processed? Or maybe there are predictable task schedulers?
I'm talking about "hard" real time constraints, not soft real time (like video games).
To clarify:
It is known that there are features of C++ that cannot be used in this kind of context: new, delete, throw, dynamic_cast. They are not predictable (you don't know how much time may be spent on one of these operations; it depends on too many parameters that are not even known before execution).
You can't really use them in real time contexts.
What I'm asking is: do task schedulers have the same unpredictability that would make them unusable in real-time applications?
Yes, it can be done, but no it's not trivial, and yes there are limits.
You can write the scheduler to guarantee (for example) that an interrupt handler, exception handler, etc. is invoked within a fixed period of time from when the event occurs. You can guarantee that any given thread will (for example) get at least X milliseconds of CPU time out of any given second (or a suitable fraction of a second).
To enforce the latter, you generally need admittance criteria: an ability for the scheduler to say "sorry, but I can't schedule this as a real-time thread, because the CPU is already under too much load."
In other cases, all it does is guarantee that at least (say) 99% of CPU time will be given to the real-time tasks (if any exist), and it's up to whoever designs the system on top of that to schedule few enough real-time tasks that they will all finish quickly enough.
I feel obliged to add that the "hardness" of real-time requirements is almost entirely orthogonal to the response speed needed. Rather, it's almost entirely about the seriousness of the consequences of being late.
Just for example, consider a nuclear power plant. For a lot of what happens, you're dealing with time periods on the order of minutes, or in some cases even hours. Filling a particular chamber with, say, half a million gallons of water just isn't going to happen in microseconds or milliseconds.
At the same time, the consequences of a late answer can be huge: quite possibly causing not just a few deaths like hospital equipment could, but potentially hundreds or even thousands of deaths, hundreds of millions in damage, etc. As such, it's about as "hard" as real-time requirements get, even though the deadlines are unusually "loose" by most typical standards.
In the other direction, digital audio playback has much tighter limits. Delays or dropouts can be quite audible down to a fraction of a millisecond in some cases. At the same time, unless you're providing sound processing for a large concert (or something on that order) the consequences of a dropout will generally be a moment's minor annoyance on the part of a user.
Of course, it's also possible to combine the two -- for an obvious example, in high-frequency trading, deadlines may well be in the order of microseconds (or so) and the loss from missing a deadline could easily be millions or tens of millions of (dollars|euros|pounds|etc.)
The term real-time is quite flexible. "Hard real-time" tends to mean things where a few tens of microseconds make the difference between "works right" and "doesn't work right". Not all "real-time" systems require that sort of real-time-ness.
I once worked on a radio-base-station for mobile phones. One of the devices on the board had an interrupt that fired every 2-something milliseconds. For correct operation (not losing calls), we had to deal with the interrupt, that is, do the work inside the interrupt and write the hardware registers with the new values, within 100 microseconds - if we missed, there would be dropped calls. If the interrupt wasn't taken after 160 microseconds, the system would reboot. That is "hard real-time", especially as the processor was just running at a few tens of MHz.
If you produce a video player, it requires real-time in the few-milliseconds range.
A "display stock prices" application can probably live within the 100 ms range.
For a webserver it is probably acceptable to respond within 1-2 seconds without any big problems.
Also, there is a difference between systems where a worst case beyond X means failure (like the case above with the 100 microseconds and dropped calls: that's bad if it happens more than once every few weeks, and even a few times a year is really something that should be fixed) and systems where missing your deadline just means "oh well, we have to do that over again" or "a frame of video flickered a bit", which is probably OK as long as it doesn't happen very often. The former is called "hard real-time", the latter "soft real-time".
A lot of modern hardware makes "hard real-time" (in the tens-to-hundreds-of-microseconds range) difficult, because the graphics processor will simply stop the processor from accessing memory, or, if the processor gets hot, the stopclk pin is pulled for 100 microseconds...
Most modern OSes, such as Linux and Windows, aren't really meant to be "hard real-time". There are sections of code in parts of these OSes that disable interrupts for longer than 100 microseconds.
You can almost certainly get some good "soft real-time" (that is, missing the deadline isn't a failure, just a minor annoyance) out of a mainstream modern OS with modern hardware. It'll probably require either modifications to the OS or a dedicated real-time OS (and perhaps suitable special hardware) to make the system do hard real-time.
But only a few things in the world requires that sort of hard real-time. Often the hard real-time requirements are dealt with by hardware - for example, the next generation of radio-base-stations that I described above, had more clever hardware, so you just needed to give it the new values within the next 2-something milliseconds, and you didn't have the "mad rush to get it done in a few tens of microseconds". In a modern mobile phone, the GSM or UMTS protocol is largely dealt with by a dedicated DSP (digital signal processor).
A "hard real-time" requirement is where the system is really failing if a particular deadline (or set of deadlines) can't be met, even if the failure to meet deadlines happens only once. However, different systems have different sensitivity to the actual time the deadline is at (as Jerry Coffin mentions). It is almost certainly possible to find cases where a commercially available general-purpose OS is perfectly adequate for the real-time requirements of a hard real-time system. It is also absolutely sure that there are other cases where such hard real-time requirements are NOT possible to meet without a specialized system.
I would say that if you want sub-millisecond guarantees from the OS, then desktop Windows or Linux are not for you. This really comes down to the overall philosophy of the OS and scheduler design; building a hard real-time OS requires a lot of thought about locking and the potential for one thread to block another from running, etc.
I don't think there is ONE answer that applies to your question. Yes, you can certainly use thread-pools in a system that has hard real-time requirements. You probably can't do it on a sub-millisecond basis unless there is specific support for this in the OS. And you may need to have dedicated threads and processes to deal with the highest priority real-time behaviour, which is not part of the thread-pool itself.
Sorry if this isn't saying "yes" or "no" to your question, but I think you will need to do some research into the actual behaviour of the OS and see what sort of guarantees it can give (worst case). You will also have to decide what the worst-case scenario is and what happens if you miss a deadline: are (lots of) people dying (a plane falling out of the sky), is some banker going to lose millions, are the green lights going to come on at the same time in two directions at a road crossing, or is it just some bad sound coming out of a speaker?
"Real time" doesn't just mean "fast", it means that the system can respond to meet deadlines in the real world. Those deadlines depend on what you're dealing with in the real world.
Whether or not a task finishes in a particular timeframe is a characteristic of the task, not the scheduler. The scheduler might decide which task gets resources, and if a task hasn't finished by a deadline it might be stopped or have its resource usage constrained so that other tasks can meet their deadlines.
So, the answer to your question is that you need to consider the workload, deadlines and the scheduler together, and construct your system to meet your requirements. There is no magic scheduler that will take arbitrary tasks and make them complete in a predictable time.
Update:
A task scheduler can be used in real-time systems if it provides the guarantees you need. As others have said, there are task schedulers that provide those guarantees.
On the comments: The issue is the upper bound on time taken.
You can use new and delete if you overload them to have the performance characteristics you are after; the problem isn't new and delete, it is dynamic memory allocation. There is no requirement that new and delete use a general-purpose dynamic allocator, you can use them to allocate out of a statically allocated pool sized appropriately for your workload with deterministic behaviour.
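A minimal sketch of that idea is below, assuming a hypothetical Message class and arbitrarily chosen pool dimensions. Exhaustion handling is simplified to a throw; a real hard-real-time system would size the pool so exhaustion cannot happen.

```cpp
#include <cstddef>
#include <new>

// Sketch: class-specific operator new/delete drawing from a fixed,
// statically allocated free list, so allocation cost is deterministic.
// `Message`, the 64-byte slot size and the 256-slot pool are all
// illustrative assumptions, not from any real codebase.
class Message {
public:
    static void* operator new(std::size_t) {
        if (!free_list)
            throw std::bad_alloc{};   // pool exhausted; size pool to avoid this
        Node* n = free_list;
        free_list = n->next;          // pop from the free list: O(1)
        return n;
    }
    static void operator delete(void* p) noexcept {
        Node* n = static_cast<Node*>(p);
        n->next = free_list;          // push back onto the free list: O(1)
        free_list = n;
    }
private:
    union Node { Node* next; unsigned char storage[64]; };
    static Node pool[256];
    static Node* free_list;
    static Node* init() {
        for (std::size_t i = 0; i + 1 < 256; ++i)
            pool[i].next = &pool[i + 1];
        pool[255].next = nullptr;
        return pool;
    }
    int payload[8]{};   // example data; must fit in Node::storage
};
Message::Node Message::pool[256];
Message::Node* Message::free_list = Message::init();
```

Because the free list is LIFO, a delete immediately followed by a new hands back the same slot, and both operations take a fixed number of instructions.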
On dynamic_cast: I tend not to use it, but I don't think its performance is non-deterministic to the extent that it should be banned in real-time code. This is an example of the same issue: understanding worst-case performance is important.
My understanding of the Sleep function is that it follows "at least semantics" i.e. sleep(5) will guarantee that the thread sleeps for 5 seconds, but it may remain blocked for more than 5 seconds depending on other factors. Is there a way to sleep for exactly a specified time period (without busy waiting).
As others have said, you really need to use a real-time OS to try and achieve this. Precise software timing is quite tricky.
However... although not perfect, you can get a LOT better results than "normal" by simply boosting the priority of the process that needs better timing. In Windows you can achieve this with the SetPriorityClass function. If you set the priority to the highest level (REALTIME_PRIORITY_CLASS: 0x00000100) you'll get much better timing results. Again - this will not be perfect like you are asking for, though.
This is also likely possible on other platforms than Windows, but I've never had reason to do it so haven't tested it.
EDIT: As per the comment by Andy T, if your app is multi-threaded you also need to watch out for the priority assigned to the threads. For Windows this is documented here.
Some background...
A while back I used SetPriorityClass to boost the priority on an application where I was doing real-time analysis of high-speed video and I could NOT miss a frame. Frames were arriving to the pc at a very regular (driven by external framegrabber HW) frequency of 300 frames per second (fps), which fired a HW interrupt on every frame which I then serviced. Since timing was very important, I collected a lot of stats on the interrupt timing (using QueryPerformanceCounter stuff) to see how bad the situation really was, and was appalled at the resulting distributions. I don't have the stats handy, but basically Windows was servicing the interrupt whenever it felt like it when run at normal priority. The histograms were very messy, with the stdev being wider than my ~3ms period. Frequently I would have gigantic gaps of 200 ms or greater in the interrupt servicing (recall that the interrupt fired roughly every 3 ms)!! ie: HW interrupts are FAR from exact! You're stuck with what the OS decides to do for you.
However - when I discovered the REALTIME_PRIORITY_CLASS setting and benchmarked with that priority, it was significantly better and the service interval distribution was extremely tight. I could run 10 minutes of 300 fps and not miss a single frame. Measured interrupt servicing periods were pretty much exactly 1/300 s with a tight distribution.
Also - try to minimize the other things the OS is doing, to help improve the odds of your timing working better in the app where it matters. e.g.: no background video transcoding or disk defragging or anything while you're trying to get precision timing with other code!
In summary:
If you really need this, go with a real time OS
If you can't use a real-time OS (impossible or impractical), boosting your process priority will likely improve your timing by a lot, as it did for me
HW interrupts won't do it... the OS still needs to decide to service them!
Make sure that you don't have a lot of other processes running that are competing for OS attention
If timing is really important to you, do some testing. Although getting code to run exactly when you want it to is not very easy, measuring this deviation is quite easy. The high performance counters in PCs (what you get with QueryPerformanceCounter) are extremely good.
Since it may be helpful (although a bit off topic), here's a small class I wrote a long time ago for using the high performance counters on a Windows machine. It may be useful for your testing:
CHiResTimer.h
#pragma once
#include "stdafx.h"
#include <windows.h>
class CHiResTimer
{
private:
    LARGE_INTEGER frequency;
    LARGE_INTEGER startCounts;
    double ConvertCountsToSeconds(LONGLONG Counts);
public:
    CHiResTimer(); // constructor
    void ResetTimer(void);
    double GetElapsedTime_s(void);
};
CHiResTimer.cpp
#include "stdafx.h"
#include "CHiResTimer.h"
double CHiResTimer::ConvertCountsToSeconds(LONGLONG Counts)
{
    return ((double)Counts / (double)frequency.QuadPart);
}

CHiResTimer::CHiResTimer()
{
    QueryPerformanceFrequency(&frequency);
    QueryPerformanceCounter(&startCounts); // starts the timer right away
}

void CHiResTimer::ResetTimer()
{
    QueryPerformanceCounter(&startCounts); // reset the reference counter
}

double CHiResTimer::GetElapsedTime_s()
{
    LARGE_INTEGER countsNow;
    QueryPerformanceCounter(&countsNow);
    return ConvertCountsToSeconds(countsNow.QuadPart - startCounts.QuadPart);
}
No.
The reason it's "at least semantics" is that after those 5 seconds some other thread may be busy.
Every thread gets a time slice from the Operating System. The Operating System controls the order in which the threads are run.
When you put a thread to sleep, the OS puts the thread in a waiting list, and when the timer is over the operating system "Wakes" the thread.
This means that the thread is added back to the active threads list, but it isn't guaranteed that it will be added in first place. (What if 100 threads need to be woken in that specific second? Who will go first?)
While standard Linux is not a realtime operating system, the kernel developers pay close attention to how long a high priority process would remain starved while kernel locks are held. Thus, a stock Linux kernel is usually good enough for many soft-realtime applications.
You can schedule your process as a realtime task with the sched_setscheduler(2) call, using either SCHED_FIFO or SCHED_RR. The two have slight differences in semantics, but it may be enough to know that a SCHED_RR task will eventually relinquish the processor to another task of the same priority due to time slices, while a SCHED_FIFO task will only relinquish the CPU to another task of the same priority due to blocking I/O or an explicit call to sched_yield(2).
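A minimal Linux sketch of that call is below. The mid-range priority is an arbitrary choice, and the call fails (with errno set to EPERM) unless the process has root privileges or CAP_SYS_NICE:

```cpp
#include <sched.h>

// Linux/POSIX sketch: request SCHED_FIFO for the calling process at a
// mid-range realtime priority. Returns false if the kernel refuses,
// typically for lack of privileges.
bool make_realtime()
{
    sched_param p{};
    p.sched_priority = (sched_get_priority_min(SCHED_FIFO) +
                        sched_get_priority_max(SCHED_FIFO)) / 2;
    return sched_setscheduler(0, SCHED_FIFO, &p) == 0;
}
```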
Be careful when using realtime scheduled tasks; as they always take priority over standard tasks, you can easily find yourself coding an infinite loop that never relinquishes the CPU and blocks admins from using ssh to kill the process. So it might not hurt to run an sshd at a higher realtime priority, at least until you're sure you've fixed the worst bugs.
There are variants of Linux available that have been worked on to provide hard-realtime guarantees. RTLinux has commercial support; Xenomai and RTAI are competing implementations of realtime extensions for Linux, but I know nothing else about them.
As previous answerers said, there is no way to be exact (some suggested a realtime OS or hardware interrupts, and even those are not exact). I think what you are looking for is something that is just more precise than the sleep() function, and you will find that, depending on your OS, in e.g. the Windows Sleep() function or, under GNU, the nanosleep() function.
http://msdn.microsoft.com/en-us/library/ms686298%28VS.85%29.aspx
http://www.delorie.com/gnu/docs/glibc/libc_445.html
Both will give you precision within a few milliseconds.
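For example, a hedged POSIX sketch of feeding a fractional-millisecond request into nanosleep() (not standard C++; the kernel still only promises to sleep "at least this long", and the achievable resolution depends on the system's timer configuration):

```cpp
#include <time.h>

// Convert a fractional-millisecond value into the timespec nanosleep wants.
timespec to_timespec(double ms)
{
    timespec ts;
    ts.tv_sec  = static_cast<time_t>(ms / 1000.0);
    ts.tv_nsec = static_cast<long>((ms - ts.tv_sec * 1000.0) * 1000000.0);
    return ts;
}

// Sleep for (at least) the requested fractional number of milliseconds.
void sleep_fractional_ms(double ms)
{
    timespec ts = to_timespec(ms);
    nanosleep(&ts, nullptr);
}
```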
Well, you are trying to tackle a difficult problem, and achieving exact timing is not feasible: the best you can do is to use hardware interrupts, and the implementation will depend on both your underlying hardware and your operating system (namely, you will need a real-time operating system, which most regular desktop OSes are not). What is your exact target platform?
No. Because you're always depending on the OS to handle waking up threads at the right time.
There is no way to sleep for a specified time period using standard C. You will need, at minimum, a 3rd party library which provides greater granularity, and you might also need a special operating system kernel such as the real-time Linux kernels.
For instance, here is a discussion of how close you can come on Win32 systems.
This is not a C question.
Is there any method to sleep a thread for 100.8564 milliseconds under the Windows OS? I am using a multimedia timer, but its minimum resolution is 1 second. Kindly guide me so that I can handle the fractional part of the millisecond.
Yes you can do it. See QueryPerformanceCounter() to read accurate time, and make a busy loop.
This will enable you to make waits with up to 10-nanosecond resolution; however, if the thread scheduler decides to take control from you at the moment the cycle ends, it will, and there's nothing you can do about it except assigning your process realtime priority.
You may also have a look at this: http://msdn.microsoft.com/en-us/library/ms838340(WinEmbedded.5).aspx
Several frameworks were developed to do hard realtime on windows.
Otherwise, your question probably implies that you might be doing something wrong. There are numerous mechanisms for avoiding the need for precise delays altogether, such as using proper bus drivers (in the case of hardware/IO), or the respective DMAs if you are designing a driver, and more.
Please tell us what exactly are you building.
I do not know your use case, but even a high-end realtime operating system would be hard pressed to achieve less than 100 ns jitter on timings.
In most cases I have found that you do not need that precision in reproducibility, only the absence of long-term drift. In that respect it is relatively straightforward to keep a timeline and calculate each event on it at the desired precision, then use that timeline to synchronize the events, which may individually be off by even tens of ms. As long as these errors do not add up, I found I got adequate performance.
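That timeline idea can be sketched with std::chrono and sleep_until: each deadline is computed from the start time, so per-iteration sleep errors do not accumulate as drift (the frame count and interval here are example values):

```cpp
#include <chrono>
#include <thread>

// Pace a loop against an absolute timeline. Each wake-up may be late,
// but lateness never compounds, because the next deadline is derived
// from the original start time rather than from "now".
void run_frames(int frames, std::chrono::milliseconds interval)
{
    using clock = std::chrono::steady_clock;
    auto next = clock::now();
    for (int i = 0; i < frames; ++i) {
        // ... produce/send one frame here ...
        next += interval;
        std::this_thread::sleep_until(next);  // absolute deadline, no drift
    }
}
```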
If you need guaranteed latency, you cannot get it with MS Windows. It's not a realtime operating system. It might swap in another thread or process at an inopportune instant. You might get a cache miss. When I did a robot controller a while back, I used an OS called On Time RTOS 32. It has an MS Windows API emulation layer. You can use it with Visual Studio. You'll need something like that.
The resolution of a multimedia timer is much better than one second. It can go down to 1 millisecond when you call timeBeginPeriod(1) first. The timer will automatically adjust its interval for the next call when the callback is delivered late. Which is inevitable on a multi-tasking operating system, there is always some kind of kernel thread with a higher priority than yours that will delay the callback.
While it will work pretty well on average, worst case latency is in the order of hundreds of milliseconds. Clearly, your requirements cannot be met by Windows by a long shot. You'll need some kind of microcontroller to supply that kind of execution guarantee.