Windows Sleep Function Extremely Slow - c++

I am making a program using the Sleep command via Windows.h, and am experiencing a frustrating difference between running my program on Windows 10 instead of Windows 7. I simplified my program to the program below which exhibits the same behavior as my more complicated program.
On Windows 7 this 5000 count loop runs with the Sleep function at 1ms. This takes 5 seconds to complete.
On Windows 10 when I run the exact same program (exact same binary executable file), this program takes almost a minute to complete.
For my application this is completely unacceptable as I need to have the 1ms timing delay in order to interact with hardware I am using.
I also tried a suggestion from another post to use the select() command (via winsock2), but that command did not work to delay 1ms either. I have tried this program on multiple Windows 7 and Windows 10 PC's and the root cause of the issue always points to using Windows 10 instead of Windows 7. The program always runs within ~5 seconds on numerous Windows 7 PC's, and on the multiple Windows 10 PC's that I have tested the duration has been much longer ~60 seconds.
I have been using Microsoft Visual Studio Express 2010 (C/C++) as well as Microsoft Visual Studio Express 2017 (C/C++) to compile the programs. The version of visual studio does not influence the results.
I have also changed the compile options from 'Debug' to 'Release' and tried to optimize the compiler but this will not help either.
Any suggestions would be greatly appreciated.
#include <stdio.h>
#include <Windows.h>
#define LOOP_COUNT 5000
int main()
{
int i = 0;
for (i; i < LOOP_COUNT; i++){
Sleep(1);
}
return 0;
}

I need to have the 1ms timing delay in order to interact with hardware I am using
Windows is the wrong tool for this job.
If you insist on using this wrong tool, you are going to have to make compromises (such as using a busy-wait and accepting the corresponding poor battery life).
You can make Sleep() more accurate using timeBeginPeriod(1) but depending on your hardware peripheral's limits on the "one millisecond" delay -- is that a minimum, maximum, or the middle of some range? -- it still will fail to meet your timing requirement with some non-zero probability.
The timeBeginPeriod function requests a minimum resolution for periodic timers.
The right solution for talking to hardware with tight timing tolerances is an embedded microcontroller which talks to the Windows PC through some very flexible interface such as UART or Ethernet, buffers data, and uses hardware timers to generate signals with very well-defined timing.
In some cases, you might be able to use embedded circuitry already existing within your Windows PC, such as "sound card" functionality.

#BenVoigt & #mzimmers thank you for your responses and suggestions. I did find a unique solution to this question and the solution was inspired by the post I have linked directly below.
Units of QueryPerformanceFrequency
In this post BrianP007 writes a function to see how fast the Sleep(1000) command takes. However, while I was playing around I realized that Sleep() accepts 0. Therefore I used a similar structure to the linked post to find the time that it takes to loop until reaching a delta t of 1ms.
For my purposes I increased i by 100, however it can be increased by 10 or by 1 in order to get a more accurate estimate as to what i should be.
Once you get a value for i, you can use that value to get an approximate delay for 1ms on your machine. If you run this function in a loop (I ran it 100 times) I was able to get anywhere from i = 3000 to i = 6000. However, my machine averages out around 5500. This spread is probably due to jitter/clock frequency changes through time in the processor.
The processor_check() function below only finds out what value should be returned for the for loop argument; the actual 'timer' needs to just have the for loop with Sleep(0) inside of it to run a timer with ~1ms resolution on the machine.
While this method is not perfect, it is much closer and works a ton better than using Sleep(1). I have to test this more thoroughly, but please let me know if this works for you as well. Please feel free to use the code below if you need it for your own applications. This code should be able to be copy and pasted into an empty command prompt C program in Visual Studio directly without modification.
/*ZKR Sleep_ZR()*/
#include "stdio.h"
#include <windows.h>
/*Gets for loop value*/
int processor_check()
{
double delta_time = 0;
int i = 0;
int n = 0;
while(delta_time < 0.001){
LARGE_INTEGER sklick, eklick, cpu_khz;
QueryPerformanceFrequency(&cpu_khz);
QueryPerformanceCounter(&sklick);
for(n = 0; n < i; n++){
Sleep(0);
}
QueryPerformanceCounter(&eklick);
delta_time = (eklick.QuadPart-sklick.QuadPart) / (double)cpu_khz.QuadPart;
i = i + 100;
}
return i;
}
/*Timer*/
void Sleep_ZR(int cnt)
{
int i = 0;
for(i; i < cnt; i++){
Sleep(0);
}
}
/*Main*/
int main(int argc, char** argv)
{
double average = 0;
int i = 0;
/*Single use*/
int loop_count = processor_check();
Sleep_ZR(loop_count);
/*Average based on processor to get more accurate Sleep_ZR*/
for(i = 0; i < 100; i++){
loop_count = processor_check();
average = average + loop_count;
}
average = average / 100;
printf("Average: %f\n", average);
/*10 second test*/
for (i = 0; i < 10000; i++){
Sleep_ZR((int)average);
}
return 0;
}

Related

lowest latency/high resolution, highest timing guarantee, timing/timer on windows? [duplicate]

I am making a program using the Sleep command via Windows.h, and am experiencing a frustrating difference between running my program on Windows 10 instead of Windows 7. I simplified my program to the program below which exhibits the same behavior as my more complicated program.
On Windows 7 this 5000 count loop runs with the Sleep function at 1ms. This takes 5 seconds to complete.
On Windows 10 when I run the exact same program (exact same binary executable file), this program takes almost a minute to complete.
For my application this is completely unacceptable as I need to have the 1ms timing delay in order to interact with hardware I am using.
I also tried a suggestion from another post to use the select() command (via winsock2), but that command did not work to delay 1ms either. I have tried this program on multiple Windows 7 and Windows 10 PC's and the root cause of the issue always points to using Windows 10 instead of Windows 7. The program always runs within ~5 seconds on numerous Windows 7 PC's, and on the multiple Windows 10 PC's that I have tested the duration has been much longer ~60 seconds.
I have been using Microsoft Visual Studio Express 2010 (C/C++) as well as Microsoft Visual Studio Express 2017 (C/C++) to compile the programs. The version of visual studio does not influence the results.
I have also changed the compile options from 'Debug' to 'Release' and tried to optimize the compiler but this will not help either.
Any suggestions would be greatly appreciated.
#include <stdio.h>
#include <Windows.h>
#define LOOP_COUNT 5000
int main()
{
int i = 0;
for (i; i < LOOP_COUNT; i++){
Sleep(1);
}
return 0;
}
I need to have the 1ms timing delay in order to interact with hardware I am using
Windows is the wrong tool for this job.
If you insist on using this wrong tool, you are going to have to make compromises (such as using a busy-wait and accepting the corresponding poor battery life).
You can make Sleep() more accurate using timeBeginPeriod(1) but depending on your hardware peripheral's limits on the "one millisecond" delay -- is that a minimum, maximum, or the middle of some range? -- it still will fail to meet your timing requirement with some non-zero probability.
The timeBeginPeriod function requests a minimum resolution for periodic timers.
The right solution for talking to hardware with tight timing tolerances is an embedded microcontroller which talks to the Windows PC through some very flexible interface such as UART or Ethernet, buffers data, and uses hardware timers to generate signals with very well-defined timing.
In some cases, you might be able to use embedded circuitry already existing within your Windows PC, such as "sound card" functionality.
#BenVoigt & #mzimmers thank you for your responses and suggestions. I did find a unique solution to this question and the solution was inspired by the post I have linked directly below.
Units of QueryPerformanceFrequency
In this post BrianP007 writes a function to see how fast the Sleep(1000) command takes. However, while I was playing around I realized that Sleep() accepts 0. Therefore I used a similar structure to the linked post to find the time that it takes to loop until reaching a delta t of 1ms.
For my purposes I increased i by 100, however it can be increased by 10 or by 1 in order to get a more accurate estimate as to what i should be.
Once you get a value for i, you can use that value to get an approximate delay for 1ms on your machine. If you run this function in a loop (I ran it 100 times) I was able to get anywhere from i = 3000 to i = 6000. However, my machine averages out around 5500. This spread is probably due to jitter/clock frequency changes through time in the processor.
The processor_check() function below only finds out what value should be returned for the for loop argument; the actual 'timer' needs to just have the for loop with Sleep(0) inside of it to run a timer with ~1ms resolution on the machine.
While this method is not perfect, it is much closer and works a ton better than using Sleep(1). I have to test this more thoroughly, but please let me know if this works for you as well. Please feel free to use the code below if you need it for your own applications. This code should be able to be copy and pasted into an empty command prompt C program in Visual Studio directly without modification.
/*ZKR Sleep_ZR()*/
#include "stdio.h"
#include <windows.h>
/*Gets for loop value*/
int processor_check()
{
double delta_time = 0;
int i = 0;
int n = 0;
while(delta_time < 0.001){
LARGE_INTEGER sklick, eklick, cpu_khz;
QueryPerformanceFrequency(&cpu_khz);
QueryPerformanceCounter(&sklick);
for(n = 0; n < i; n++){
Sleep(0);
}
QueryPerformanceCounter(&eklick);
delta_time = (eklick.QuadPart-sklick.QuadPart) / (double)cpu_khz.QuadPart;
i = i + 100;
}
return i;
}
/*Timer*/
void Sleep_ZR(int cnt)
{
int i = 0;
for(i; i < cnt; i++){
Sleep(0);
}
}
/*Main*/
int main(int argc, char** argv)
{
double average = 0;
int i = 0;
/*Single use*/
int loop_count = processor_check();
Sleep_ZR(loop_count);
/*Average based on processor to get more accurate Sleep_ZR*/
for(i = 0; i < 100; i++){
loop_count = processor_check();
average = average + loop_count;
}
average = average / 100;
printf("Average: %f\n", average);
/*10 second test*/
for (i = 0; i < 10000; i++){
Sleep_ZR((int)average);
}
return 0;
}

WinAPI calling code at specific timestamp

Using functions available in the WinAPI, is it possible to ensure that a specific function is called according to a milisecond precise timestamp? And if so what would be the correct implementation?
I'm trying to write tool assisted speedrun software. This type of software sends user input commands at very exact moments after the script is launched to perform humanly impossible inputs that allow faster completion of videogames. A typical sequence looks something like this:
At 0 miliseconds send right key down event
At 5450 miliseconds send right key up, and up key down event
At 5460 miliseconds send left key down event
etc..
What I've tried so far is listed below. As I'm not experienced in the low level nuances of high precision timers I have some results, but no understanding of why they are this way:
Using Sleep in combination with timeBeginPeriod set to 1 between inputs gave the worst results. Out of 20 executions 0 have met the timing requirement. I believe this is well explained in the documentation for sleep Note that a ready thread is not guaranteed to run immediately. Consequently, the thread may not run until some time after the sleep interval elapses. My understanding is that Sleep isn't up for this task.
Using a busy wait loop checking GetTickCount64 with timeBeginPeriod set to 1 produced slightly better results. Out of 20 executions 2 have met the timing requirement, but apparently that was just a fortunate circumstance. I've looked up some info on this timing function and my suspicion is that it doesn't update often enough to allow 1 milisecond accuracy.
Replacing the GetTickCount64 with the QueryPerformanceCounter improved the situation slightly. Out of 20 executions 8 succeded. I wrote a logger that would store the QPC timestamps right before each input is sent and dump the values in a file after the sequence is finished. I even went as far as to preallocate space for all variables in my code to make sure that time isn't wasted on needless explicit memory allocations. The log values diverge from the timestamps I supply the program by anything from 1 to 40 miliseconds. General purpose programming can live with that, but in my case a single frame of the game is 16.7 ms, so in the worst possible case with delays like these I can be 3 frames late, which defeats the purpose of the whole experiment.
Setting the process priority to high didn't make any difference.
At this point I'm not sure where to look next. My two guesses are that maybe the time that it takes to iterate the busy loop and check the time using (QPCNow - QPCStart) / QPF is itself somehow long enough to introduce the mentioned delay, or that the process is interrupted by the OS scheduler somwhere along the execution of the loop and control returns too late.
The game is 100% deterministic and locked at 60 fps. I am convinced that if I manage to make the input be timed accurately the result will always be 20 out of 20, but at this point I'm begining to suspect that this may not be possible.
EDIT: As per request here is a stripped down testing version. Breakpoint after the second call to ExecuteAtTime and view the TimeBeforeInput variables. For me it reads 1029 and 6017(I've omitted the decimals) meaning that the code executed 29 and 17 miliseconds after it should have.
Disclaimer: the code is not written to demonstrate good programming practices.
#include "stdafx.h"
#include <windows.h>
__int64 g_TimeStart = 0;
double g_Frequency = 0.0;
double g_TimeBeforeFirstInput = 0.0;
double g_TimeBeforeSecondInput = 0.0;
double GetMSSinceStart(double& debugOutput)
{
LARGE_INTEGER now;
QueryPerformanceCounter(&now);
debugOutput = double(now.QuadPart - g_TimeStart) / g_Frequency;
return debugOutput;
}
void ExecuteAtTime(double ms, INPUT* keys, double& debugOutput)
{
while(GetMSSinceStart(debugOutput) < ms)
{
}
SendInput(2, keys, sizeof(INPUT));
}
INPUT* InitKeys()
{
INPUT* result = new INPUT[2];
ZeroMemory(result, 2*sizeof(INPUT));
INPUT winKey;
winKey.type = INPUT_KEYBOARD;
winKey.ki.wScan = 0;
winKey.ki.time = 0;
winKey.ki.dwExtraInfo = 0;
winKey.ki.wVk = VK_LWIN;
winKey.ki.dwFlags = 0;
result[0] = winKey;
winKey.ki.dwFlags = KEYEVENTF_KEYUP;
result[1] = winKey;
return result;
}
int _tmain(int argc, _TCHAR* argv[])
{
INPUT* keys = InitKeys();
LARGE_INTEGER qpf;
QueryPerformanceFrequency(&qpf);
g_Frequency = double(qpf.QuadPart) / 1000.0;
LARGE_INTEGER qpcStart;
QueryPerformanceCounter(&qpcStart);
g_TimeStart = qpcStart.QuadPart;
//Opens windows start panel one second after launch
ExecuteAtTime(1000.0, keys, g_TimeBeforeFirstInput);
//Closes windows start panel 5 seconds later
ExecuteAtTime(6000.0, keys, g_TimeBeforeSecondInput);
delete[] keys;
Sleep(1000);
return 0;
}

how to run Clock-gettime correctly in Vxworks to get accurate time

I am trying to measure time take by processes in C++ program with linux and Vxworks. I have noticed that clock_gettime(CLOCK_REALTIME, timespec ) is accurate enough (resolution about 1 ns) to do the job on many Oses. For a portability matter I am using this function and running it on both Vxworks 6.2 and linux 3.7.
I ve tried to measure the time taken by a simple print:
#define <timers.h<
#define <iostream>
#define BILLION 1000000000L
int main(){
struct timespec start, end; uint32_t diff;
for(int i=0; i<1000; i++){
clock_gettime(CLOCK_REALTME, &start);
std::cout<<"Do stuff"<<std::endl;
clock_gettime(CLOCK_REALTME, &end);
diff = BILLION*(end.tv_sec-start.tv_sec)+(end.tv_nsec-start.tv_nsec);
std::cout<<diff<<std::endl;
}
return 0;
}
I compiled this on linux and vxworks. For linux results seemed logic (average 20 µs). But for Vxworks, I ve got a lot of zeros , then 5000000 ns , then a lot of zeros...
PS , for vxwroks, I runned this app on ARM-cortex A8, and results seemed random
have anyone seen the same bug before,
In vxworks, the clock resolution is defined by the system scheduler frequency. By default, this is typically 60Hz, however may be different dependant on BSP, kernel configuration, or runtime configuration.
The VxWorks kernel configuration parameters SYS_CLK_RATE_MAX and SYS_CLK_RATE_MIN define the maximum and minimum values supported, and SYS_CLK_RATE defines the default rate, applied at boot.
The actual clock rate can be modified at runtime using sysClkRateSet, either within your code, or from the shell.
You can check the current rate by using sysClkRateGet.
Given that you are seeing either 0 or 5000000ns - which is 5ms, I would expect that your system clock rate is ~200Hz.
To get greater resolution, you can increase the system clock rate. However, this may have undesired side effects, as this will increase the frequency of certain system operations.
A better method of timing code may be to use sysTimestamp which is typically driven from a high frequency timer, and can be used to perform high-res timing of short-lived activities.
I think in vxworks by default the clock resolution is 16.66ms which you can get by calling clock_getres() function. You can change the resolution by calling sysclkrateset() function(max resolution supported is 200us i guess by passing 5000 as argument to sysclkrateset function). You can then calculate the difference between two timestamps using difftime() function

Switching an image at specific frequencies c++

I am currently developing a stimuli provider for the brain's visual cortex as a part of a university project. The program is to (preferably) be written in c++, utilising visual studio and OpenCV. The way it is supposed to work is that the program creates a number of threads, accordingly to the amount of different frequencies, each running a timer for their respective frequency.
The code looks like this so far:
void timerThread(void *param) {
t *args = (t*)param;
int id = args->data1;
float freq = args->data2;
unsigned long period = round((double)1000 / (double)freq)-1;
while (true) {
Sleep(period);
show[id] = 1;
Sleep(period);
show[id] = 0;
}
}
It seems to work okay for some of the frequencies, but others vary quite a lot in frame rate. I have tried to look into creating my own timing function, similar to what is done in Arduino's "blinkWithoutDelay" function, though this worked very badly. Also, I have tried with the waitKey() function, this worked quite like the Sleep() function used now.
Any help would be greatly appreciated!
You should use timers instead of "sleep" to fix this, as sometimes the loop may take more or less time to complete.
Restart the timer at the start of the loop and take its value right before the reset- this'll give you the time it took for the loop to complete.
If this time is greater than the "period" value, then it means you're late, and you need to execute right away (and even lower the period for the next loop).
Otherwise, if it's lower, then it means you need to wait until it is greater.
I personally dislike sleep, and instead constantly restart the timer until it's greater.
I suggest looking into "fixed timestep" code, such as the one below. You'll need to put this snippet of code on every thread with varying values for the period (ns) and put your code where "doUpdates()" is.
If you need a "timer" library, since I don't know OpenCV, I recommend SFML (SFML's timer docs).
The following code is from here:
long int start = 0, end = 0;
double delta = 0;
double ns = 1000000.0 / 60.0; // Syncs updates at 60 per second (59 - 61)
while (!quit) {
start = timeAsMicro();
delta+=(double)(start - end) / ns; // You can skip dividing by ns here and do "delta >= ns" below instead //
end = start;
while (delta >= 1.0) {
doUpdates();
delta-=1.0;
}
}
Please mind the fact that in this code, the timer is never reset.
(This may not be completely accurate but is the best assumption I can make to fix your problem given the code you've presented)

Why does Sleep() slow down subsequent code for 40ms?

I originally asked about this at coderanch.com, so if you've tried to assist me there, thanks, and don't feel obliged to repeat the effort. coderanch.com is mostly a Java community, though, and this appears (after some research) to really be a Windows question, so my colleagues there and I thought this might be a more appropriate place to look for help.
I have written a short program that either spins on the Windows performance counter until 33ms have passed, or else calls Sleep(33). The former exhibits no unexpected effects, but the latter appears to (inconsistently) slow subsequent processing for about 40ms (either that, or it has some effect on the values returned from the performance counter for that long). After the spin or Sleep(), the program calls a routine, runInPlace(), that spins for 2ms, counting the number of times it queries the performance counter, and returning that number.
When the initial 33ms delay is done by spinning, the number of iterations of runInPlace() tends to be (on my Windows 10, XPS-8700) about 250,000. It varies, probably due to other system overhead, but it varies smoothing around 250,000.
Now, when the initial delay is done by calling Sleep(), something strange happens. A lot of the calls to runInPlace() return a number near 250,000, but quite a few of them return a number near 50,000. Again, the range varies around 50,000, fairly smoothly. But, it is clearly averaging one or the other, with nearly no returns anywhere between 80,000 and 150,000. If I call runInPlace() 100 times after each delay, instead of just once, it never returns a number of iterations in the smaller range after the 20th call. As runInPlace() runs for 2ms, this means the behavior I'm observing disappears after 40ms. If I have runInPlace() run for 4ms instead of 2ms, it never returns a number of iterations in the smaller range after the 10th call, so, again, the behavior disappears after 40ms (likewise if have runInPlace() run for only 1ms; the behavior disappears after the 40th call).
Here's my code:
#include "stdafx.h"
#include "Windows.h"
int runInPlace(int msDelay)
{
LARGE_INTEGER t0, t1;
int n = 0;
QueryPerformanceCounter(&t0);
do
{
QueryPerformanceCounter(&t1);
n++;
} while (t1.QuadPart - t0.QuadPart < msDelay);
return n;
}
int _tmain(int argc, _TCHAR* argv[])
{
LARGE_INTEGER t0, t1;
LARGE_INTEGER frequency;
int n;
QueryPerformanceFrequency(&frequency);
int msDelay = 2 * frequency.QuadPart / 1000;
int spinDelay = 33 * frequency.QuadPart / 1000;
for (int i = 0; i < 100; i++)
{
if (argc > 1)
Sleep(33);
else
{
QueryPerformanceCounter(&t0);
do
{
QueryPerformanceCounter(&t1);
} while (t1.QuadPart - t0.QuadPart < spinDelay);
}
n = runInPlace(msDelay);
printf("%d \n", n);
}
getchar();
return 0;
}
Here's some output typical of what I get when using Sleep() for the delay:
56116
248936
53659
34311
233488
54921
47904
45765
31454
55633
55870
55607
32363
219810
211400
216358
274039
244635
152282
151779
43057
37442
251658
53813
56237
259858
252275
251099
And here's some output typical of what I get when I spin to create the delay:
276461
280869
276215
280850
188066
280666
281139
280904
277886
279250
244671
240599
279697
280844
159246
271938
263632
260892
238902
255570
265652
274005
273604
150640
279153
281146
280845
248277
Can anyone help me understand this behavior? (Note, I have tried this program, compiled with Visual C++ 2010 Express, on five computers. It only shows this behavior on the two fastest machines I have.)
This sounds like it is due to the reduced clock speed that the CPU will run at when the computer is not busy (SpeedStep). When the computer is idle (like in a sleep) the clock speed will drop to reduce power consumption. On newer CPUs this can be 35% or less of the listed clock speed. Once the computer gets busy again there is a small delay before the CPU will speed up again.
You can turn off this feature (either in the BIOS or by changing the "Minimum processor state" setting under "Processor power management" in the advanced settings of your power plan to 100%.
Besides what #1201ProgramAlarm said (which may very well be, modern processors are extremely fond of downclocking whenever they can), it may also be a cache warming up problem.
When you ask to sleep for a while the scheduler typically schedules another thread/process for the next CPU time quantum, which means that the caches (instruction cache, data cache, TLB, branch predictor data, ...) relative to your process are going to be "cold" again when your code regains the CPU.