Precision timing callback in Qt C++ doesn't give expected results

I'm trying to implement high-resolution timing for a Game Boy emulator. A 16-17 millisecond timer is sufficient to get the emulation running at roughly the right speed, but it eventually loses sync with highly accurate emulators like BGB.
I originally used QElapsedTimer in a while loop. This gave the expected results and kept sync with BGB, but it feels really sloppy and eats up as much CPU time as it possibly can because the while loop never stops running. It also keeps the program resident in Task Manager after closing. I then tried a one-millisecond QTimer that tests the QElapsedTimer before executing the next frame. Despite the reduced resolution, I figured the timing would average out to the correct speed because of the QElapsedTimer check. This is what I currently have:
void Platform::start() {
    nanoSecondsPerFrame = 1000000000 / system->getRefreshRate();
    speedRegulationTimer->start();
    emulationUpdateTimer->start(1);
}

void Platform::executionLoop() {
    qint64 timeDelay;
    if (frameLocked)
        timeDelay = nanoSecondsPerFrame;
    else
        timeDelay = 0;

    if (speedRegulationTimer->nsecsElapsed() >= timeDelay) {
        speedRegulationTimer->restart();
        // Execute the cycles of the emulated system for one frame.
        system->setControllerInputs(buttonInputs);
        system->executeCycles();
        if (!system->getIsRunning()) {
            this->stop();
            errorMessage = QString::fromStdString(system->getSystemError());
        }
        //timeDelay = speedRegulationTimer->nsecsElapsed();
        FPS++;
    }
}
nanoSecondsPerFrame calculates to 16742005 for the 59.73 Hz refresh rate. speedRegulationTimer is the QElapsedTimer. emulationUpdateTimer is a QTimer set to Qt::PreciseTimer and connected to executionLoop. The emulation does run, but at about 50-51 FPS instead of the expected 59-60 FPS. This is definitely due to the timing, because running without timing constraints results in a far higher frame rate. Either there's an obvious oversight in my code or the timers aren't working the way I expect. If anyone sees an obvious problem or can offer some advice, I'd appreciate it.

I'd suggest using QElapsedTimer to keep track of when your next frame should (ideally) execute, and then dynamically computing the msec argument of a QTimer::singleShot() call from that, so that your timing loop automatically compensates for the time it takes the GameBoy code to run; that way you avoid the "drifts away from sync" problem you mentioned. Something like this:
// Warning: uncompiled/untested code, may contain errors
#include <QObject>
#include <QTimer>
#include <QElapsedTimer>

class Platform : public QObject
{
    Q_OBJECT

public:
    Platform() {/* empty */}

    void Start()
    {
        _nanosecondsPerFrame = 1000000000 / system->getRefreshRate();
        _clock.start();
        _nextSignalTime = _clock.nsecsElapsed();   // work in nanoseconds throughout
        ScheduleNextSignal();
    }

private slots:
    void ExecuteFrame()
    {
        // called 59.73 times per second, on average
        [... do GameBoy calls here...]
        ScheduleNextSignal();
    }

private:
    void ScheduleNextSignal()
    {
        _nextSignalTime += _nanosecondsPerFrame;
        // If we're running behind, the computed delay can go negative; clamp to 0
        const int msec = qMax(0, NanosToMillis(_nextSignalTime - _clock.nsecsElapsed()));
        QTimer::singleShot(msec, Qt::PreciseTimer, this, SLOT(ExecuteFrame()));
    }

    int NanosToMillis(qint64 nanos) const
    {
        // add half a millisecond so that we round to the nearest
        // millisecond rather than always rounding down
        const qint64 halfAMillisecondInNanos = 500 * 1000;
        return (int) ((nanos + halfAMillisecondInNanos) / (1000 * 1000));
    }

    QElapsedTimer _clock;
    qint64 _nextSignalTime;      // nanoseconds since _clock.start(); signed to keep subtraction safe
    qint64 _nanosecondsPerFrame;
};

I'm adding my own answer based on Jeremy Friesner's suggestion. The 50 FPS issue was caused by another QTimer with a similar interval overlapping the one used to regulate the emulation updates. I didn't realize that QTimers with nearly identical timeouts could throw timing off by that much, but apparently they can. This is my variation on Jeremy's suggestion, if anyone is interested:
void Platform::start() {
    nanoSecondsPerFrame = 1000000000 / system->getRefreshRate();
    milliSecondsPerFrame = (double)nanoSecondsPerFrame / 1000000;
    speedRegulationTimer->start();
    executionLoop();
}

void Platform::executionLoop() {
    int timeDelay;   // int rather than qint8, so larger delays can't overflow
    if (frameLocked)
        // nsecsElapsed()/nanoSecondsPerFrame is the elapsed time measured in
        // frames; a late call (more than one frame) shortens the next delay
        timeDelay = qRound(milliSecondsPerFrame - (speedRegulationTimer->nsecsElapsed() / (double)nanoSecondsPerFrame));
    else
        timeDelay = 1;
    if (timeDelay <= 0)
        timeDelay = 1;
    speedRegulationTimer->restart();
    QTimer::singleShot(timeDelay, Qt::PreciseTimer, this, SLOT(executionLoop()));

    system->setControllerInputs(buttonInputs);
    system->executeCycles();
    if (!system->getIsRunning()) {
        this->stop();
        errorMessage = QString::fromStdString(system->getSystemError());
    }
    emit screenUpdate();
    FPS++;
}
If the function takes longer than it should to be called, the number of milliseconds until the next call is reduced. With this implementation, the difference in speed from BGB is practically imperceptible, and very little CPU time is wasted.

You can use a QTimer with the type Qt::PreciseTimer.
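A minimal sketch of that (assuming the executionLoop() slot from the question; the timer setup would live in Platform's initialization):

QTimer *frameTimer = new QTimer(this);
frameTimer->setTimerType(Qt::PreciseTimer);   // ask Qt to keep 1 ms accuracy
connect(frameTimer, SIGNAL(timeout()), this, SLOT(executionLoop()));
frameTimer->start(17);   // ~59 Hz; on its own this still drifts, so keep the QElapsedTimer check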

Related

Is Increment Speed Affected By Clock Rate

Consider the loop below. This is a simplified example of a problem I am trying to solve. I want to limit the number of times the doSomething function is called each second. Since the loop runs very fast, I thought I could use a rate limiter. Let's assume that I have found an appropriate value for x by running the loop with different numbers.
unsigned int incrementionRate = x;
unsigned int counter = 0;

while (true) {
    double seconds = getElapsedSeconds();
    print(seconds);

    counter = (counter + 1) % incrementionRate;
    if (counter == 0) {
        doSomething();
    }
}
I wonder whether the doSomething function would be called fewer times if I were running at a lower clock rate. In that case, I would like to limit the calls to doSomething to once per second. The second loop I have written is below.
float epsilon = 0.0001;

while (true) {
    double seconds = getElapsedSeconds();
    print(seconds);

    if (std::abs(seconds - std::floor(seconds)) <= epsilon) {
        doSomething();
    }
}
Would that do the trick across different clock rates, or are there still problems? Also, I would like to know if there is a better way of doing this. I have never worked with clock rates before and am trying to understand how the concerns differ when working with limited resources.
Note: Using sleep is not an option.
If I understand the issue properly, you could use a std::chrono::steady_clock time point that you just add a second to every time a second has passed.
Example:
#include <chrono>

auto end_time = std::chrono::steady_clock::now();

while (true) {
    // only call doSomething once a second
    if (end_time < std::chrono::steady_clock::now()) {
        doSomething();
        // set a new end time a second after the previous one
        end_time += std::chrono::seconds(1);
    }
    // do something else
}
Ted's answer is fine if you are really doing something else in the loop; if not, though, this results in a busy wait, which just consumes your CPU for nothing.
In such a case you should rather prefer letting your thread sleep:
std::chrono::milliseconds offset(200);
auto next = std::chrono::steady_clock::now();

for (;;)
{
    doSomething();
    next += offset;
    std::this_thread::sleep_until(next);
}
You'll need to include the <chrono> and <thread> headers for this.
I decided to go with a much simpler approach in the end: an adjustable time interval and the latest update time, without introducing any new mechanism. Honestly, now I don't know why I couldn't think of it at first. Overthinking is a problem. :)
double lastUpdateTimestamp = 0;
const double updateInterval = 1.0;

while (true) {
    double seconds = getElapsedSeconds();
    print(seconds);

    if ((seconds - lastUpdateTimestamp) >= updateInterval) {
        doSomething();
        lastUpdateTimestamp = seconds;
    }
}

Inconsistent chrono::high_resolution_clock delay

I'm trying to implement a MIDI-like clocked sample player.
There is a timer which increments a pulse counter; every 480 pulses make a quarter note, so the pulse period is 1041667 ns at 120 beats per minute.
The timer is not sleep-based and runs in a separate thread, but the delay time seems to be inconsistent: the period between samples played in a test file fluctuates by +-20 ms (on some occasions the period is OK and steady; I can't figure out what this effect depends on).
Audio backend influence is excluded: I've tried OpenAL as well as SDL_mixer.
void Timer_class::sleep_ns(uint64_t ns) {
    auto start = std::chrono::high_resolution_clock::now();
    bool sleep = true;
    while (sleep)
    {
        auto now = std::chrono::high_resolution_clock::now();
        auto elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(now - start);
        if (elapsed.count() >= ns) {
            TestTime = elapsed.count();
            sleep = false;
        }
    }
}

void Timer_class::Runner(void) {
    // this runs as a thread
    while (1) {
        sleep_ns(BPMns);
        if (Run) Transport.IncPlaybackMarker(); // marker increment
        if (Transport.GetPlaybackMarker() == Transport.GetPlaybackEnd()) { // check if the timer has reached the end, which is 480 pulses
            Transport.SetPlaybackMarker(Transport.GetPlaybackStart());
            Player.PlayFile(1); // period of this event fluctuates severely
        }
    }
}

void Player_class::PlayFile(int FileNumber) {
#ifdef AUDIO_SDL_MIXER
    if (Mix_PlayChannel(-1, WaveData[FileNumber], 0) == -1) {
        printf("Mix_PlayChannel: %s\n", Mix_GetError());
    }
#endif // AUDIO_SDL_MIXER
}
Am I doing something wrong in terms of approach? Is there a better way to implement a timer of this kind?
Deviation higher than 4-5 ms is too much in the case of audio.
I see a large error and a small error. The large error is that your code assumes that the main processing in Runner consistently takes zero time:
if (Run) Transport.IncPlaybackMarker(); // marker increment
if (Transport.GetPlaybackMarker() == Transport.GetPlaybackEnd()){ // check if timer have reached end, which is 480 pulses
    Transport.SetPlaybackMarker(Transport.GetPlaybackStart());
    Player.PlayFile(1); // period of this event fluctuates severely
}
That is, you're "sleeping" for the time you want your loop iteration to take, and then you're doing processing on top of that.
The small error is presuming that you can represent your ideal loop iteration time with an integral number of nanoseconds. This error is so small that it doesn't really matter. However I amuse myself by showing people how they can get rid of this error too. :-)
First, let's correct the small error by exactly representing the idealized loop iteration time:
using quarterPeriod = std::ratio<1, 2>;
using iterationPeriod = std::ratio_divide<quarterPeriod, std::ratio<480>>;
using iteration_time = std::chrono::duration<std::int64_t, iterationPeriod>;
I know nothing of music, but I'm guessing the above code is right because if you convert iteration_time{1} to nanoseconds, you get approximately 1041667ns. iteration_time{1} is intended to be the precise amount of time you want each iteration of your loop in Timer_class::Runner to take.
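As a quick, self-contained sanity check of that conversion (not part of the answer's original code, just an illustration):

#include <chrono>
#include <cstdint>
#include <iostream>
#include <ratio>

int main()
{
    using quarterPeriod = std::ratio<1, 2>;
    using iterationPeriod = std::ratio_divide<quarterPeriod, std::ratio<480>>;
    using iteration_time = std::chrono::duration<std::int64_t, iterationPeriod>;

    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(iteration_time{1});
    std::cout << ns.count() << "ns\n";   // prints 1041666 (1/960 s, truncated to whole ns)
}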
To correct the large error, you need to sleep until a time_point, as opposed to sleeping for a duration. Here's a generic utility to help you do that:
template <class Clock, class Duration>
void
delay_until(std::chrono::time_point<Clock, Duration> tp)
{
while (Clock::now() < tp)
;
}
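(Note that delay_until deliberately busy-waits, trading CPU for precision; if a few milliseconds of jitter are acceptable, std::this_thread::sleep_until(tp) from <thread> is the CPU-friendly alternative.)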
Now if you code Timer_class::Runner to use delay_until instead of sleep_ns, I think you'll get better results:
void
Timer_class::Runner()
{
    auto next_start = std::chrono::steady_clock::now() + iteration_time{1};
    while (true)
    {
        if (Run) Transport.IncPlaybackMarker(); // marker increment
        if (Transport.GetPlaybackMarker() == Transport.GetPlaybackEnd()) { // check if the timer has reached the end, which is 480 pulses
            Transport.SetPlaybackMarker(Transport.GetPlaybackStart());
            Player.PlayFile(1);
        }
        delay_until(next_start);
        next_start += iteration_time{1};
    }
}
I ended up using Howard Hinnant's version of the delay and reducing the buffer size in openal-soft; that's what made a huge difference. Fluctuation is now about +-5 ms for 1/16th notes at 120 BPM (125 ms period) and +-1 ms for quarter notes. Leaves a lot to be desired, but I guess it's okay.

Uniformly Regulating Program Execution Rate [Windows C++]

First off, I found a lot of information on this topic, but no solutions that solved the issue unfortunately.
I'm simply trying to regulate my C++ program to run at 60 iterations per second. I've tried everything from GetClockTicks() to GetLocalTime() to help with the regulation, but every single time I run the program on my Windows Server 2008 machine, it runs slower than on my local machine, and I have no clue why!
I understand that "clock"-based function calls return CPU time spent on execution, so I moved to GetLocalTime and tried to differentiate between the start time and the stop time, then call Sleep((FPS / 1000) - millisecondExecutionTime).
My local machine is quite a bit faster than the server's CPU, so obviously the thought was that it was going off of CPU ticks, but that doesn't explain why GetLocalTime doesn't work either. I've been basing this method off of http://www.lazyfoo.net/SDL_tutorials/lesson14/index.php, swapping get_ticks() for every time-returning function I could find on the web.
For example take this code:
#include <Windows.h>
#include <time.h>
#include <string>
#include <iostream>

using namespace std;

int main() {
    int tFps = 60;
    int counter = 0;
    SYSTEMTIME gStart, gEnd, start_time, end_time;
    GetLocalTime(&gStart);

    bool done = false;
    while (!done) {
        GetLocalTime(&start_time);
        Sleep(10);
        counter++;
        GetLocalTime(&end_time);

        int startTimeMilli = (start_time.wSecond * 1000 + start_time.wMilliseconds);
        int endTimeMilli = (end_time.wSecond * 1000 + end_time.wMilliseconds);
        int time_to_sleep = (1000 / tFps) - (endTimeMilli - startTimeMilli);

        if (counter > 240)
            done = true;
        if (time_to_sleep > 0)
            Sleep(time_to_sleep);
    }
    GetLocalTime(&gEnd);

    cout << "Total Time: " << (gEnd.wSecond * 1000 + gEnd.wMilliseconds) - (gStart.wSecond * 1000 + gStart.wMilliseconds) << endl;
    cin.get();
}
For this code snippet, run on my computer (3.06 GHz), I get a total time (ms) of 3856, whereas on my server (2.53 GHz) I get 6256. So it could potentially be the speed of the processor, though the ratio 2.53/3.06 is only 0.827 versus 3856/6271 = 0.615.
I can't tell whether the Sleep function is doing something drastically different than expected (though I don't see why it would), or whether it's my method of getting the time (even though it should be wall-clock time in ms, not clock-cycle time). Any help would be greatly appreciated, thanks.
For one thing, Sleep's default resolution is the computer's quota length - usually either 10ms or 15ms, depending on the Windows edition. To get a resolution of, say, 1ms, you have to issue a timeBeginPeriod(1), which reprograms the timer hardware to fire (roughly) once every millisecond.
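A minimal sketch of that (timeBeginPeriod/timeEndPeriod are declared in mmsystem.h and require linking winmm.lib):

#include <windows.h>
#include <mmsystem.h>   // link with winmm.lib

int main() {
    timeBeginPeriod(1);   // request 1 ms timer resolution, system-wide
    // ... timing-sensitive loop that relies on Sleep() accuracy ...
    Sleep(16);            // now actually close to 16 ms rather than 20-30 ms
    timeEndPeriod(1);     // always pair with timeBeginPeriod when done
    return 0;
}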
In your main loop you can do something like this:
int main()
{
    // Timers
    LONGLONG curTime = 0;
    LONGLONG nextTime = 0;
    int loops = 0;   // frame-skip counter (was undeclared in the original)

    Timers::SWGameClock::GetInstance()->GetTime(&nextTime);
    while (true) {
        Timers::SWGameClock::GetInstance()->GetTime(&curTime);
        if (curTime > nextTime && loops <= MAX_FRAMESKIP) {
            nextTime += Timers::SWGameClock::GetInstance()->timeCount;
            // Business logic goes here and occurs based on the specified framerate
        }
    }
}
using this time library:
#include "stdafx.h"

LONGLONG cacheTime;
Timers::SWGameClock* Timers::SWGameClock::pInstance = NULL;

Timers::SWGameClock* Timers::SWGameClock::GetInstance() {
    if (pInstance == NULL) {
        pInstance = new SWGameClock();
    }
    return pInstance;
}

Timers::SWGameClock::SWGameClock(void) {
    this->Initialize();
}

void Timers::SWGameClock::GetTime(LONGLONG* t) {
    // Use timeGetTime() if QueryPerformanceCounter is not supported
    if (!QueryPerformanceCounter((LARGE_INTEGER*)t)) {
        *t = timeGetTime();
    }
    cacheTime = *t;
}

LONGLONG Timers::SWGameClock::GetTimeElapsed(void) {
    LONGLONG t;
    // Use timeGetTime() if QueryPerformanceCounter is not supported
    if (!QueryPerformanceCounter((LARGE_INTEGER*)&t)) {
        t = timeGetTime();
    }
    return (t - cacheTime);
}

void Timers::SWGameClock::Initialize(void) {
    if (!QueryPerformanceFrequency((LARGE_INTEGER*)&this->frequency)) {
        this->frequency = 1000; // 1000 ms to one second
    }
    this->timeCount = DWORD(this->frequency / TICKS_PER_SECOND);
}

Timers::SWGameClock::~SWGameClock(void)
{
}
with a header file that contains the following:
// Required for rendering stuff on time
#pragma once

#define TICKS_PER_SECOND 60
#define MAX_FRAMESKIP 5

namespace Timers {
    class SWGameClock
    {
    public:
        static SWGameClock* GetInstance();
        void Initialize(void);
        DWORD timeCount;
        void GetTime(LONGLONG* t);
        LONGLONG GetTimeElapsed(void);
        LONGLONG frequency;
        ~SWGameClock(void);

    protected:
        SWGameClock(void);

    private:
        static SWGameClock* pInstance;
    }; // SWGameClock
} // Timers
This will ensure that your code runs at 60 FPS (or whatever you put in), though you can probably dump MAX_FRAMESKIP, as that's not truly implemented in this example!
You could try a WinMain function and use the SetTimer function with a regular message loop (you can also take advantage of the filter mechanism of GetMessage(...)), in which you test for the WM_TIMER message with the requested time, and when your counter reaches the limit, do a PostQuitMessage(0) to terminate the message loop.
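A hedged sketch of that approach (the 60 Hz period and 240-iteration limit are taken from the question's snippet; everything else is illustrative):

#include <windows.h>

static int counter = 0;

VOID CALLBACK TimerProc(HWND hwnd, UINT msg, UINT_PTR id, DWORD time)
{
    // one iteration of the business logic goes here
    if (++counter > 240)
        PostQuitMessage(0);
}

int WINAPI WinMain(HINSTANCE, HINSTANCE, LPSTR, int)
{
    SetTimer(NULL, 0, 1000 / 60, TimerProc);   // ~16 ms period; WM_TIMER is low-resolution
    MSG msg;
    while (GetMessage(&msg, NULL, 0, 0) > 0) {
        TranslateMessage(&msg);
        DispatchMessage(&msg);
    }
    return 0;
}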
For a duty cycle that fast, you can use a high-accuracy timer (like QueryPerformanceCounter) and a busy-wait loop.
If you had a much lower duty cycle, but still wanted precision, then you could Sleep for part of the time and then eat up the leftover time with a busy-wait loop.
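That hybrid can look something like this (a hedged sketch: PreciseWaitUntil is a made-up helper, and the 2 ms margin is a guess at typical scheduler jitter):

#include <windows.h>

// Wait until the QPC reading reaches 'target' (in counts), sleeping while
// plenty of time remains and busy-waiting the final stretch for precision.
void PreciseWaitUntil(LONGLONG target, LONGLONG freq)
{
    LARGE_INTEGER now;
    QueryPerformanceCounter(&now);
    while (target - now.QuadPart > (freq * 2) / 1000) {   // > 2 ms left: sleep
        Sleep(1);
        QueryPerformanceCounter(&now);
    }
    while (now.QuadPart < target)                         // eat up the remainder
        QueryPerformanceCounter(&now);
}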
Another option is to use something like DirectX to sync yourself to the VSync interrupt (which is almost always 60 Hz). This can make a lot of sense if you're coding a game or a/v presentation.
Windows is not a real-time OS, so there will never be a perfect way to do something like this, as there's no guarantee your thread will be scheduled to run exactly when you need it to.
Note that, per the remarks for Sleep, the actual amount of time will be at least one "tick" and possibly one whole "tick" longer than the delay you requested before the thread is scheduled to run again (and then we have to assume the thread is scheduled). The "tick" can vary a lot depending on the hardware and the version of Windows. It is commonly in the 10-15 ms range, and I've seen it as bad as 19 ms. For 60 Hz, you need 16.666 ms per iteration, so this is obviously not nearly precise enough to give you what you need.
What about rendering (iterating) based on the time elapsed between the rendering of each frame? Consider creating a void render(double timePassed) function and rendering depending on the timePassed parameter instead of putting the program to sleep.
Imagine, for example, you want to render a ball falling or bouncing. You would know its speed, acceleration and all the other physics you need. Calculate the position of the ball based on timePassed and all the other physics parameters (speed, acceleration, etc.).
Or if you prefer, you could just skip the render() function execution if the time passed is too small a value, instead of putting the program to sleep.
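A tiny sketch of the idea (the falling ball is the example from above; the numbers are arbitrary):

double y = 100.0;               // height in metres
double velocity = 0.0;          // m/s, negative is downward
const double gravity = -9.81;   // m/s^2

void render(double timePassed)  // seconds since the previous frame
{
    velocity += gravity * timePassed;   // integrate acceleration over the elapsed time
    y += velocity * timePassed;         // integrate velocity
    // ... draw the ball at height y ...
}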

pthread sleep function, cpu consumption

Apologies in advance for my far-from-perfect English.
I recently wrote myself a daemon for Linux (to be exact, an OpenWRT router) in C++, and I ran into a problem.
There are a few threads: one for each opened TCP connection, a main thread waiting for new TCP connections and, as I call it, a commander thread to check status.
Everything works fine, but my CPU is always at 100%. I know that it's because of the commander code:
void *CommanderThread(void* arg)
{
    Commander* commander = (Commander*)arg;
    pthread_detach(pthread_self());
    clock_t endwait;

    while (true)
    {
        uint8_t temp;
        endwait = clock() + (int)(1 * CLOCKS_PER_SEC);
        for (int i = 0; i < commander->GetCount(); i++)
        {
            ptrRelayBoard rb = commander->GetBoard(i);
            if (rb != NULL)
                rb->Get(0x01, &temp);
        }
        while (clock() < endwait);
    }
    return NULL;
}
As you can see, the program does its work every 1 s. Timing is not critical here. I know that the CPU is constantly checking whether the time has passed. I've tried to do something like this:
while (clock() < endwait)
    usleep(200);
But the usleep function (and sleep too) seems to freeze the clock increment (it's always a constant value after the usleep).
Is there any solution, ready-made function (like a pthread_sleep(20ms)), or workaround for my problem? Maybe I should access the main clock somehow?
Here it's not so critical; I can pretty much check how long the status check took (latch clock() before, compare with after) and compute the value to pass to usleep. But in another thread I would like to use this form.
Does usleep put the whole process to sleep?
I'm currently debugging it on Cygwin, but I don't think the problem lies there.
Thanks for any answers and suggestions; they're much appreciated.
J.L.
If it doesn't need to be exactly 1 s, then just usleep a second. usleep and sleep put the current thread into an efficient wait state that lasts at least the amount of time you requested (after which the thread becomes eligible to be scheduled again).
If you aren't trying to get near-exact timing, there's no need to check clock().
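(As an aside, clock() measures CPU time consumed by the process rather than wall time, which is why it appears frozen across a usleep.) A minimal sketch of the suggestion, applied to the loop from the question:

while (true)
{
    uint8_t temp;
    for (int i = 0; i < commander->GetCount(); i++)
    {
        ptrRelayBoard rb = commander->GetBoard(i);
        if (rb != NULL)
            rb->Get(0x01, &temp);
    }
    sleep(1);   // from <unistd.h>; the thread consumes no CPU for ~1 s
}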
I have resolved it another way.
#include <sys/time.h>

#define CLOCK_US_IN_SECOND 1000000

// long long so the microsecond count can't overflow on 32-bit targets
static long long myclock()
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return ((long long)tv.tv_sec * CLOCK_US_IN_SECOND) + tv.tv_usec;
}

void *MainThread(void* arg)
{
    Commander* commander = (Commander*)arg;
    pthread_detach(pthread_self());
    long long endwait;

    while (true)
    {
        uint8_t temp;
        endwait = myclock() + (1 * CLOCK_US_IN_SECOND);
        for (int i = 0; i < commander->GetCount(); i++)
        {
            ptrRelayBoard rb = commander->GetBoard(i);
            if (rb != NULL)
                rb->Get(0x01, &temp);
        }
        while (myclock() < endwait)
            usleep((int)(0.05 * CLOCK_US_IN_SECOND)); // note the parentheses: (int)0.05 * ... would sleep for 0
    }
    return NULL;
}
Bear in mind that this code is vulnerable to system time changes during execution, since gettimeofday follows the wall clock. I don't have an idea how to avoid that, but in my case it's not really important.
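(One possible improvement, assuming POSIX clock_gettime is available on the target: base the helper on the monotonic clock, which is immune to wall-clock adjustments.)

#include <time.h>

static long long myclock_monotonic(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);   // unaffected by settimeofday()/NTP steps
    return ((long long)ts.tv_sec * CLOCK_US_IN_SECOND) + (ts.tv_nsec / 1000);
}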

Win32 API timers

I was using the system timer (the clock() function, see time.h) to time some serial and USB comms. All I needed was approx 1 ms accuracy. The first thing I noticed is that individual times can be out (plus or minus) 10 ms. Timing a number of smaller events led to less accurate timing as the events went by; aggregate timing was slightly better. After a bit of a root around on MSDN etc., I stumbled across the timer in the Windows multimedia library (timeGetTime(), see MMSystem.h). This was much better, with decent accuracy down to the 1 ms level.
Weirdness then ensued: after initially working flawlessly for days (lovely logs with useful timings), it all went pear-shaped as this API also started showing the odd granularity (instead of a bunch of small comms messages taking 3 ms, 2 ms, 3 ms, 2 ms, 3 ms etc., it came out as 0 ms, 0 ms, 0 ms, 0 ms, 15 ms etc.). Rebooting the PC restored normal accuracy, but at some indeterminate time (after an hour or so) the anomaly returned.
Anyone got any ideas or suggestions on how to get this level of timing accuracy on Windows XP (32-bit Pro, using Visual Studio 2008)?
My little timing class:
class TMMTimer
{
public:
    TMMTimer(unsigned long msec);
    TMMTimer();

    void Clear() { is_set = false; }
    void Set(unsigned long msec = 0);
    bool Expired();
    unsigned long Elapsed();

private:
    unsigned long when;
    int roll_over;
    bool is_set;
};

/** Default constructor.
 */
TMMTimer::TMMTimer()
{
    is_set = false;
}

/** Construct and set the timer in one step.
 */
TMMTimer::TMMTimer(unsigned long msec)
{
    Set(msec);
}

/** Set the timer.
 *
 * @note This sets the timer to some point in the future.
 *       Because the timer can wrap around, the function sets a
 *       rollover flag for this condition, which is checked by the
 *       Expired member function.
 */
void TMMTimer::Set(unsigned long msec /*=0*/)
{
    unsigned long now = timeGetTime(); // System millisecond counter.
    when = now + msec;
    if (when < now)
        roll_over = 1;
    else
        roll_over = 0;
    is_set = true;
}

/** Check if the timer expired.
 *
 * @return Returns true if expired, else false.
 *
 * @note Also returns true if the timer was never set. Note that this
 *       function can handle the situation when the system timer
 *       rolls over (approx. every 49.7 days).
 */
bool TMMTimer::Expired()
{
    if (!is_set)
        return true;
    unsigned long now = timeGetTime(); // System millisecond counter.
    if (now > when)
    {
        if (!roll_over)
        {
            is_set = false;
            return true;
        }
    }
    else
    {
        if (roll_over)
            roll_over = 0;
    }
    return false;
}

/** Returns the time elapsed since the timer expired.
 *
 * @return Time in milliseconds, 0 if the timer was never set.
 */
unsigned long TMMTimer::Elapsed()
{
    if (!is_set)
        return 0;
    return timeGetTime() - when;
}
Did you call timeBeginPeriod(1) to set the multimedia timer resolution to 1 millisecond? The multimedia timer resolution is system-global, so if you didn't set it yourself, chances are you started while something else had set it; then, when that something else called timeEndPeriod(), the resolution went back to the system default (which is normally 10 ms, if memory serves).
Others have advised using QueryPerformanceCounter(). This does have much higher resolution, but you still need to be careful. Depending on the kernel involved, it can/will use the x86 RDTSC function, which is a 64-bit counter of instruction cycles. For better or worse, on a CPU whose clock rate varies (which started on laptops, but is now common almost everywhere) the relationship between the clock count and wall time varies right along with the clock speed. If memory serves, if you force Windows to install with the assumption that there are multiple physical processors (not just multiple cores), you'll get a kernel in which QueryPerformanceCounter() will read the motherboard's 1.024 MHz clock instead. This reduces resolution compared to the CPU clock, but at least the speed is constant (and if you only need 1 ms resolution, it should be more than adequate anyway).
If you want high resolution timing on Windows, you should consider using QueryPerformanceFrequency and QueryPerformanceCounter.
These will provide the most accurate timings available on Windows, and should be much more "stable" over time. QPF gives you the number of counts/second, and QPC gives you the current count. You can use this to do very high resolution timings on most systems (fraction of ms).
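A minimal usage sketch (error handling omitted; QueryPerformanceFrequency is assumed to succeed, which it does on anything reasonably modern):

#include <windows.h>
#include <iostream>

int main()
{
    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq);   // counts per second
    QueryPerformanceCounter(&t0);

    Sleep(16);                          // ... the work being timed ...

    QueryPerformanceCounter(&t1);
    double elapsedMs = (t1.QuadPart - t0.QuadPart) * 1000.0 / freq.QuadPart;
    std::cout << "elapsed: " << elapsedMs << " ms\n";
}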
Check out the high resolution timers from the Win32 API.
http://msdn.microsoft.com/en-us/library/ms644904(VS.85).aspx
You can usually use it to get microsecond-resolution timers.