Using functions available in the WinAPI, is it possible to ensure that a specific function is called according to a milisecond precise timestamp? And if so what would be the correct implementation?
I'm trying to write tool assisted speedrun software. This type of software sends user input commands at very exact moments after the script is launched to perform humanly impossible inputs that allow faster completion of videogames. A typical sequence looks something like this:
At 0 miliseconds send right key down event
At 5450 miliseconds send right key up, and up key down event
At 5460 miliseconds send left key down event
etc..
What I've tried so far is listed below. As I'm not experienced in the low level nuances of high precision timers I have some results, but no understanding of why they are this way:
Using Sleep in combination with timeBeginPeriod set to 1 between inputs gave the worst results. Out of 20 executions 0 have met the timing requirement. I believe this is well explained in the documentation for sleep Note that a ready thread is not guaranteed to run immediately. Consequently, the thread may not run until some time after the sleep interval elapses. My understanding is that Sleep isn't up for this task.
Using a busy wait loop checking GetTickCount64 with timeBeginPeriod set to 1 produced slightly better results. Out of 20 executions 2 have met the timing requirement, but apparently that was just a fortunate circumstance. I've looked up some info on this timing function and my suspicion is that it doesn't update often enough to allow 1 milisecond accuracy.
Replacing the GetTickCount64 with the QueryPerformanceCounter improved the situation slightly. Out of 20 executions 8 succeded. I wrote a logger that would store the QPC timestamps right before each input is sent and dump the values in a file after the sequence is finished. I even went as far as to preallocate space for all variables in my code to make sure that time isn't wasted on needless explicit memory allocations. The log values diverge from the timestamps I supply the program by anything from 1 to 40 miliseconds. General purpose programming can live with that, but in my case a single frame of the game is 16.7 ms, so in the worst possible case with delays like these I can be 3 frames late, which defeats the purpose of the whole experiment.
Setting the process priority to high didn't make any difference.
At this point I'm not sure where to look next. My two guesses are that maybe the time that it takes to iterate the busy loop and check the time using (QPCNow - QPCStart) / QPF is itself somehow long enough to introduce the mentioned delay, or that the process is interrupted by the OS scheduler somwhere along the execution of the loop and control returns too late.
The game is 100% deterministic and locked at 60 fps. I am convinced that if I manage to make the input be timed accurately the result will always be 20 out of 20, but at this point I'm begining to suspect that this may not be possible.
EDIT: As per request here is a stripped down testing version. Breakpoint after the second call to ExecuteAtTime and view the TimeBeforeInput variables. For me it reads 1029 and 6017(I've omitted the decimals) meaning that the code executed 29 and 17 miliseconds after it should have.
Disclaimer: the code is not written to demonstrate good programming practices.
#include "stdafx.h"
#include <windows.h>
__int64 g_TimeStart = 0;
double g_Frequency = 0.0;
double g_TimeBeforeFirstInput = 0.0;
double g_TimeBeforeSecondInput = 0.0;
double GetMSSinceStart(double& debugOutput)
{
LARGE_INTEGER now;
QueryPerformanceCounter(&now);
debugOutput = double(now.QuadPart - g_TimeStart) / g_Frequency;
return debugOutput;
}
void ExecuteAtTime(double ms, INPUT* keys, double& debugOutput)
{
while(GetMSSinceStart(debugOutput) < ms)
{
}
SendInput(2, keys, sizeof(INPUT));
}
INPUT* InitKeys()
{
INPUT* result = new INPUT[2];
ZeroMemory(result, 2*sizeof(INPUT));
INPUT winKey;
winKey.type = INPUT_KEYBOARD;
winKey.ki.wScan = 0;
winKey.ki.time = 0;
winKey.ki.dwExtraInfo = 0;
winKey.ki.wVk = VK_LWIN;
winKey.ki.dwFlags = 0;
result[0] = winKey;
winKey.ki.dwFlags = KEYEVENTF_KEYUP;
result[1] = winKey;
return result;
}
int _tmain(int argc, _TCHAR* argv[])
{
INPUT* keys = InitKeys();
LARGE_INTEGER qpf;
QueryPerformanceFrequency(&qpf);
g_Frequency = double(qpf.QuadPart) / 1000.0;
LARGE_INTEGER qpcStart;
QueryPerformanceCounter(&qpcStart);
g_TimeStart = qpcStart.QuadPart;
//Opens windows start panel one second after launch
ExecuteAtTime(1000.0, keys, g_TimeBeforeFirstInput);
//Closes windows start panel 5 seconds later
ExecuteAtTime(6000.0, keys, g_TimeBeforeSecondInput);
delete[] keys;
Sleep(1000);
return 0;
}
Related
I am currently developing a stimuli provider for the brain's visual cortex as a part of a university project. The program is to (preferably) be written in c++, utilising visual studio and OpenCV. The way it is supposed to work is that the program creates a number of threads, accordingly to the amount of different frequencies, each running a timer for their respective frequency.
The code looks like this so far:
void timerThread(void *param) {
t *args = (t*)param;
int id = args->data1;
float freq = args->data2;
unsigned long period = round((double)1000 / (double)freq)-1;
while (true) {
Sleep(period);
show[id] = 1;
Sleep(period);
show[id] = 0;
}
}
It seems to work okay for some of the frequencies, but others vary quite a lot in frame rate. I have tried to look into creating my own timing function, similar to what is done in Arduino's "blinkWithoutDelay" function, though this worked very badly. Also, I have tried with the waitKey() function, this worked quite like the Sleep() function used now.
Any help would be greatly appreciated!
You should use timers instead of "sleep" to fix this, as sometimes the loop may take more or less time to complete.
Restart the timer at the start of the loop and take its value right before the reset- this'll give you the time it took for the loop to complete.
If this time is greater than the "period" value, then it means you're late, and you need to execute right away (and even lower the period for the next loop).
Otherwise, if it's lower, then it means you need to wait until it is greater.
I personally dislike sleep, and instead constantly restart the timer until it's greater.
I suggest looking into "fixed timestep" code, such as the one below. You'll need to put this snippet of code on every thread with varying values for the period (ns) and put your code where "doUpdates()" is.
If you need a "timer" library, since I don't know OpenCV, I recommend SFML (SFML's timer docs).
The following code is from here:
long int start = 0, end = 0;
double delta = 0;
double ns = 1000000.0 / 60.0; // Syncs updates at 60 per second (59 - 61)
while (!quit) {
start = timeAsMicro();
delta+=(double)(start - end) / ns; // You can skip dividing by ns here and do "delta >= ns" below instead //
end = start;
while (delta >= 1.0) {
doUpdates();
delta-=1.0;
}
}
Please mind the fact that in this code, the timer is never reset.
(This may not be completely accurate but is the best assumption I can make to fix your problem given the code you've presented)
I originally asked about this at coderanch.com, so if you've tried to assist me there, thanks, and don't feel obliged to repeat the effort. coderanch.com is mostly a Java community, though, and this appears (after some research) to really be a Windows question, so my colleagues there and I thought this might be a more appropriate place to look for help.
I have written a short program that either spins on the Windows performance counter until 33ms have passed, or else calls Sleep(33). The former exhibits no unexpected effects, but the latter appears to (inconsistently) slow subsequent processing for about 40ms (either that, or it has some effect on the values returned from the performance counter for that long). After the spin or Sleep(), the program calls a routine, runInPlace(), that spins for 2ms, counting the number of times it queries the performance counter, and returning that number.
When the initial 33ms delay is done by spinning, the number of iterations of runInPlace() tends to be (on my Windows 10, XPS-8700) about 250,000. It varies, probably due to other system overhead, but it varies smoothing around 250,000.
Now, when the initial delay is done by calling Sleep(), something strange happens. A lot of the calls to runInPlace() return a number near 250,000, but quite a few of them return a number near 50,000. Again, the range varies around 50,000, fairly smoothly. But, it is clearly averaging one or the other, with nearly no returns anywhere between 80,000 and 150,000. If I call runInPlace() 100 times after each delay, instead of just once, it never returns a number of iterations in the smaller range after the 20th call. As runInPlace() runs for 2ms, this means the behavior I'm observing disappears after 40ms. If I have runInPlace() run for 4ms instead of 2ms, it never returns a number of iterations in the smaller range after the 10th call, so, again, the behavior disappears after 40ms (likewise if have runInPlace() run for only 1ms; the behavior disappears after the 40th call).
Here's my code:
#include "stdafx.h"
#include "Windows.h"
int runInPlace(int msDelay)
{
LARGE_INTEGER t0, t1;
int n = 0;
QueryPerformanceCounter(&t0);
do
{
QueryPerformanceCounter(&t1);
n++;
} while (t1.QuadPart - t0.QuadPart < msDelay);
return n;
}
int _tmain(int argc, _TCHAR* argv[])
{
LARGE_INTEGER t0, t1;
LARGE_INTEGER frequency;
int n;
QueryPerformanceFrequency(&frequency);
int msDelay = 2 * frequency.QuadPart / 1000;
int spinDelay = 33 * frequency.QuadPart / 1000;
for (int i = 0; i < 100; i++)
{
if (argc > 1)
Sleep(33);
else
{
QueryPerformanceCounter(&t0);
do
{
QueryPerformanceCounter(&t1);
} while (t1.QuadPart - t0.QuadPart < spinDelay);
}
n = runInPlace(msDelay);
printf("%d \n", n);
}
getchar();
return 0;
}
Here's some output typical of what I get when using Sleep() for the delay:
56116
248936
53659
34311
233488
54921
47904
45765
31454
55633
55870
55607
32363
219810
211400
216358
274039
244635
152282
151779
43057
37442
251658
53813
56237
259858
252275
251099
And here's some output typical of what I get when I spin to create the delay:
276461
280869
276215
280850
188066
280666
281139
280904
277886
279250
244671
240599
279697
280844
159246
271938
263632
260892
238902
255570
265652
274005
273604
150640
279153
281146
280845
248277
Can anyone help me understand this behavior? (Note, I have tried this program, compiled with Visual C++ 2010 Express, on five computers. It only shows this behavior on the two fastest machines I have.)
This sounds like it is due to the reduced clock speed that the CPU will run at when the computer is not busy (SpeedStep). When the computer is idle (like in a sleep) the clock speed will drop to reduce power consumption. On newer CPUs this can be 35% or less of the listed clock speed. Once the computer gets busy again there is a small delay before the CPU will speed up again.
You can turn off this feature (either in the BIOS or by changing the "Minimum processor state" setting under "Processor power management" in the advanced settings of your power plan to 100%.
Besides what #1201ProgramAlarm said (which may very well be, modern processors are extremely fond of downclocking whenever they can), it may also be a cache warming up problem.
When you ask to sleep for a while the scheduler typically schedules another thread/process for the next CPU time quantum, which means that the caches (instruction cache, data cache, TLB, branch predictor data, ...) relative to your process are going to be "cold" again when your code regains the CPU.
I am trying to get my arduino mega to run a function in the background while it is also running a bunch of other functions.
The function that I am trying to run in the background is a function to determine wind speed from an anemometer. The way it processes the data is similar to that of an odometer in that it reads the number of turns that the anemometer makes during a set time period and then takes that number of turns over the time to determine the wind speed. The longer time period that i have it run over the more accurate data i receive as there is more data to average.
The problem that i have is there is a bunch of other data that i am also reading in to the arduino which i would like to be reading in once a second. This one second time interval is too short for me to get accurate wind readings as not enough revolutions are being completed by the anemometer to give high accuracy wind data.
Is there a way to have the wind sensor function run in the background and update a global variable once every 5 seconds or so while the rest of my program is running simultaneously and updating the other data every second.
Here is the code that i have for reading the data from the wind sensor. Every time the wind sensor makes a revolution there is a portion where the signal reads in as 0, otherwise the sensor reads in as a integer larger than 0.
void windmeterturns(){
startime = millis();
endtime = startime + 5000;
windturncounter = 0;
turned = false;
int terminate = startime;
while(terminate <= endtime){
terminate = millis();
windreading = analogRead(windvelocityPin);
if(windreading == 0){
if(turned == true){
windturncounter = windturncounter + 1;
turned = false;
}
}
else if(windreading >= 1){
turned = true;
}
delay(5);
}
}
The rest of the processing of takes place in another function but this is the one that I am currently struggling with. Posting the whole code would not really be reasonable here as it is close to a 1000 lines.
The rest of the functions run with a 1 second delay in the loop but as i have found through trial and error the delay along with the processing of the other functions make it so that the delay is actually longer than a second and it varies based off of what kind of data i am reading in from the other sensors so a 5 loop counter for timing i do not think will work here
Let Interrupts do the work for you.
In short, I recommend using a Timer Interrupt to generate a periodic interrupt that measures the analog reading in the background. Subsequently this can update a static volatile variable.
See my answer here as it is a similar scenario, detailing how to use the timer interrupt. Where you can replace the callback() with your above analogread and increment.
Without seeing how the rest of your code is set up, I would try having windturncounter as a global variable, and add another integer that is iterated every second your main program loops. Then:
// in the main loop
if(iteratorVariable >= 5){
iteratorVariable = 0;
// take your windreading and implement logic here
} else {
iteratorVariable++;
}
I'm not sure how your anemometer stores data or what other challenges you might be facing, so this may not be a 100% solution, but it would allow you to run the logic from your original post every five seconds.
So I made a game loop that uses SDL_Delay function to cap the frames per second, it look like this:
//While the user hasn't qui
while( stateID != STATE_EXIT )
{
//Start the frame timer
fps.start();
//Do state event handling
currentState->handle_events();
//Do state logic
currentState->logic();
//Change state if needed
change_state();
//Do state rendering
currentState->render();
//Update the screen
if( SDL_Flip( screen ) == -1 )
{
return 1;
}
//Cap the frame rate
if( fps.get_ticks() < 1000 / FRAMES_PER_SECOND )
{
SDL_Delay( ( 1000 / FRAMES_PER_SECOND ) - fps.get_ticks() );
}
}
So when I run my games on 60 frames per second (which is the "eye cap" I assume) I can still see laggy type of motion, meaning i see the frames appearing independently causing unsmooth motion.
This is because apparently SDL_Delay function is not too accurate, causing +,- 15 milliseconds or something difference between frames greater than whatever I want it to be.
(all these are just my assumptions)
so I am just searching fo a good and accurate timer that will help me with this problem.
any suggestions?
I think there is a similar question in How to make thread sleep less than a millisecond on Windows
But as a game programmer myself, I don't rely on sleep functions to manage frame-rate (the parameter they take is just a minimum). I just draw stuff on screen as fast as I can. I have a bunch of function calls in my game loop, and then I keep track of how often I'm calling them. For instance, I check input quite often (1000x/second) to make the game more responsive, but I don't check the network inbox more than 100x/second.
For example:
#define NW_CHECK_INTERVAL 10
#define INPUT_CHECK_INTERVAL 1
uint32_t last_nw_check = 0, last_input_check = 0;
while (game_running) {
uint32_t now = SDL_GetTicks();
if (now - last_nw_check > NW_CHECK_INTERVAL) {
check_network();
last_nw_check = now;
}
if (now - last_input_check > INPUT_CHECK_INTERVAL) {
check_input();
last_input_check = now;
}
check_video();
// and so on...
}
Use the QueryPerformanceCounter / Frequency for that.
LARGE_INTEGER start, end, tps; //tps = ticks per second
QueryPerformanceFrequency( &tps );
QueryPerformanceCounter( &start );
QueryPerformanceCounter( &end );
int usPassed = (end.QuadPart - start.QuadPart) * 1000000 / tps.QuadPart;
Here's a small wait function I had created for timing midi sequences using QueryPerformanceCounter:
void wait(int waitTime) {
LARGE_INTEGER time1, time2, freq;
if(waitTime == 0)
return;
QueryPerformanceCounter(&time1);
QueryPerformanceFrequency(&freq);
do {
QueryPerformanceCounter(&time2);
} while((time2.QuadPart - time1.QuadPart) * 1000000ll / freq.QuadPart < waitTime);
}
To convert ticks to microseconds, calculate the difference in ticks, multiply by 1,000,000 (microseconds/second) and divide by the frequency of ticks per second.
Note that some things may throw this off, for instance the precision of the high-resolution counter is not likely to be down to a single microsecond. For example, if you want to wait 10 microseconds and the precision/frequency is one tick every 6 microseconds, your 10 microsecond wait will actually be no less than 12 microseconds. Again, this frequency is system dependent and will vary from system to system.
Also, Windows is not a real-time operating system. A process may be preempted at any time and it is up to Windows to decide when the process is rescheduled. The application may be preempted in the middle of this function and not restarted again until long after the expected wait time has elapsed. There really isn't much you can do about it but you'll probably never notice it if it happens.
60 fame per second is just the frequency of power in US (50 in Europe, Africa and Asia are somehow mixed) and is the frequency of video refreshing for hardware comfortable reasons (It can be an integer multiple on more sophisticated monitors). It was a mandatory constrains for CRT dispaly, and it is still a comfortable reference for LCD (that's how frequently the frame buffer is uploaded to the display)
The eye-cap is no more than 20-25 fps - not to be confused with retina persistency, that's about one-half - and that's why TV interlace two squares upon every refresh.
independently on the timing accuracy, whatever hardware device cannot be updated during its buffer-scan (otherwise the image changes while it is shown, resulting in half-drawn broken frames), hence, if you go faster than one half of the device refresh you are queued behind it and forced to wait for it.
60 fps in a game loop serves only to help CPU manufacturers to sell new faster CPUs. Slow down under 25 and everything will look more fluid.
SDL_Delay:
This function waits a specified number of milliseconds before returning. It waits at least the specified time, but possible longer due to OS scheduling. The delay granularity is at least 10 ms. Some platforms have shorter clock ticks but this is the most common.
The actual delays observed with this function depend on OS settings. I'd suggest to look into the
Mutimedia Timer API, particulary into the timeBeginPeriod function, to adapt the interrupt frequency to your requirements.
Obtaining and Setting Timer Resolution shows an example how to change the interrupt period to about 1ms. This way you don't have the 15ms hickup anymore. BTW: Eye-catch period is about 40ms.
Obtaining fixed period timing can also be addressed by Waitable Timer Objects. But the use of mutimedia timers is mandatory to obtain decent resolution, no matter what.
Using other tools to improve the timing capabilities is discussed here.
Is there a way to limit iterations per time unit? For example, I have a loop like this:
for (int i = 0; i < 100000; i++)
{
// do stuff
}
I want to limit the loop above so there will be maximum of 30 iterations per second.
I would also like the iterations to be evenly positioned in the timeline so not something like 30 iterations in first 0.4s and then wait 0.6s.
Is that possible? It does not have to be completely precise (though the more precise it will be the better).
#FredOverflow My program is running
very fast. It is sending data over
wifi to another program which is not
fast enough to handle them at the
current rate. – Richard Knop
Then you should probably have the program you're sending data to send an acknowledgment when it's finished receiving the last chunk of data you sent then send the next chunk. Anything else will just cause you frustrations down the line as circumstances change.
Suppose you have a good Now() function (GetTickCount() is bad example, it's OS specific and has bad precision):
for (int i = 0; i < 1000; i++){
DWORD have_to_sleep_until = GetTickCount() + EXPECTED_ITERATION_TIME_MS;
// do stuff
Sleep(max(0, have_to_sleep_until - GetTickCount()));
};
You can check elapsed time inside the loop, but it may be not an usual solution. Because computation time is totally up to the performance of the machine and algorithm, people optimize it during their development time(ex. many game programmer requires at least 25-30 frames per second for properly smooth animation).
easiest way (for windows) is to use QueryPerformanceCounter(). Some pseudo-code below.
QueryPerformanceFrequency(&freq)
timeWanted = 1.0/30.0 //time per iteration if 30 iterations / sec
for i
QueryPerf(count1)
do stuff
queryPerf(count2)
timeElapsed = (double)(c2 - c1) * (double)(1e3) / double(freq) //time in milliseconds
timeDiff = timeWanted - timeElapsed
if (timeDiff > 0)
QueryPerf(c3)
QueryPerf(c4)
while ((double)(c4 - c3) * (double)(1e3) / double(freq) < timeDiff)
queryPerf(c4)
end for
EDIT: You must make sure that the 'do stuff' area takes less time than your framerate or else it doesn't matter. Also instead of 1e3 for milliseconds, you can go all the way to nanoseconds if you do 1e9 (if you want that much accuracy)
WARNING... this will eat your CPU but give you good 'software' timing... Do it in a separate thread (and only if you have more than 1 processor) so that any guis wont lock. You can put a conditional in there to stop the loop if this is a multi-threaded app too.
#FredOverflow My program is running very fast. It is sending data over wifi to another program which is not fast enough to handle them at the current rate. – Richard Knop
What you might need a buffer or queue at the receiver side. The thread that receives the messages from the client (like through a socket) get the message and put it in the queue. The actual consumer of the messages reads/pops from the queue. Of course you need concurrency control for your queue.
Besides the flow control methods mentioned, if you also have the need to maintain an accurate specific data sending rate in your sender part. Usually it can be done like this.
E.x. if you want to send at 10Mbps, create a timer of interval 1ms so it will call a predefined function every 1ms. Then in the timer handler function, by keep tracking of 2 static variables 1)Time elapsed since beginning of sending data 2)How much data in bytes have been sent up to last call, you can easily calculate how much data is needed to be sent in the current call (or just sleep and wait for next call).
By this way, you can do "streaming" of data in a very stable way with very little jitterness, and this is usually adopted in streaming of videos. Of course it also depends on how accurate the timer is.