Time calculation in the main game loop - c++

There is the code in the Quake 2 main game loop implementation:
if (!initialized)
{ // let base retain 16 bits of effectively random data
base = timeGetTime() & 0xffff0000;
initialized = true;
}
curtime = timeGetTime() - base;
I'm wondering about the line base = timeGetTime() & 0xffff0000. Why are they applying the 0xffff0000 mask on the retrieved time? Why not to use just:
if (!initialized)
{ // let base retain 16 bits of effectively random data
initialTime = timeGetTime();
initialized = true;
}
curtime = timeGetTime() - initialTime;
???
What is the role of that mask?

The if (!initialized) check will only pass once. Therefore curtime will become larger with each gameloop, which wouldn't be the case with suggested rewrite, since the upper word may increase after sufficiently many gameloops.
Likely this is in preparation of an int-to-float conversion where smaller numbers result in higher accuracy. (Such as adjusting for the time between two frames and numerically integrating game states over time; but also rendering smooth animations)

The way it is implemented, base, and hence curtime, will assume different values depending on initialized being true or false.

According to Microsoft Docs, the description of timeGetTime() is:
The timeGetTime function retrieves the system time, in milliseconds.
The system time is the time elapsed since Windows was started.
Remarks
The only difference between this function and the timeGetSystemTime
function is that timeGetSystemTime uses the MMTIME structure to return
the system time. The timeGetTime function has less overhead than
timeGetSystemTime.
Note that the value returned by the timeGetTime function is a DWORD
value. The return value wraps around to 0 every 2^32 milliseconds,
which is about 49.71 days. This can cause problems in code that
directly uses the timeGetTime return value in computations,
particularly where the value is used to control code execution. You
should always use the difference between two timeGetTime return values
in computations.
The default precision of the timeGetTime function can be five
milliseconds or more, depending on the machine. You can use the
timeBeginPeriod and timeEndPeriod functions to increase the precision
of timeGetTime. If you do so, the minimum difference between
successive values returned by timeGetTime can be as large as the
minimum period value set using timeBeginPeriod and timeEndPeriod. Use
the QueryPerformanceCounter and QueryPerformanceFrequency functions to
measure short time intervals at a high resolution.
In my opinion, this will help improve the accuracy of time calculations.

Related

How can I measure sub-second durations?

In my calculator-like program, the user selects what and how many to compute (eg. how many digits of pi, how many prime numbers etc.). I use time(0) to check for the computation time elapsed in order to trigger a timeout condition. If the computation completes without timeout, I will also print the computation time taken, the value of which is stored in a double, the return type of difftime().
I just found out that the time values calculated are in seconds only. I don't want a user input of 100 and 10000 results to both print a computation duration of 0e0 seconds. I want them to print, for example, durations of 1.23e-6 and 4.56e-3 seconds respectively (as accurate as the machine can measure - I am more acquainted to the accuracy provided in Java and with the accuracies in scientific measurements so it's a personal preference).
I have seen the answers to other questions, but they don't help because 1) I will not be multi-threading (not preferred in my work environment). 2) I cannot use C++11 or later.
How can I obtain time duration values more accurate than seconds as integral values given the stated constraints?
Edit: Platform & machine-independent solutions preferred, otherwise Windows will do, thanks!
Edit 2: My notebook is also not connected to the Internet, so no downloading of external libraries like Boost (is that what Boost is?). I'll have to code anything myself.
You can use QueryPerformanceCounter (QPC) which is part of the Windows API to do high-resolution time measurements.
LARGE_INTEGER StartingTime, EndingTime, ElapsedMicroseconds;
LARGE_INTEGER Frequency;
QueryPerformanceFrequency(&Frequency);
QueryPerformanceCounter(&StartingTime);
// Activity to be timed
QueryPerformanceCounter(&EndingTime);
ElapsedMicroseconds.QuadPart = EndingTime.QuadPart - StartingTime.QuadPart;
//
// We now have the elapsed number of ticks, along with the
// number of ticks-per-second. We use these values
// to convert to the number of elapsed microseconds.
// To guard against loss-of-precision, we convert
// to microseconds *before* dividing by ticks-per-second.
//
ElapsedMicroseconds.QuadPart *= 1000000;
ElapsedMicroseconds.QuadPart /= Frequency.QuadPart;
On Windows, the simplest solution is to use GetTickCount, which returns the number of milliseconds since the computer was started.
#include <windows.h>
...
DWORD before = GetTickCount();
...
DWORD duration = GetTickCount() - before;
std::cout<<"It took "<<duration<<"ms\n";
Caveats:
it works only on Windows;
the resolution (milliseconds) is not stellar;
given that the result is a 32 bit integer, it wraps around after one month or something; thus, you cannot measure stuff longer than that; a possible solution is to use GetTickCount64, which however is available only from Vista onwards;
since systems with the uptime of more than one month are actually quite common, you may indeed have to deal with results bigger than 231; thus, make sure to always keep such values in a DWORD (or an uint32_t), without casting them to int, or you are risking signed integer overflow. Another option is to just store them in a 64 bit signed integer (or a double) and forget the difficulties of dealing with unsigned integers.
I realize the compiler you're using doesn't support it, but for reference purposes the C++11 solution is simple...
auto start = std::chrono::high_resolution_clock::now();
auto end = std::chrono::high_resolution_clock::now();
long ts = std::chrono::duration<long, std::chrono::nano>(end - start).count();

extending the std::chrono functionality to deal with run-time (non compile-time) constant periods

I have been experimenting with all kind of timers on Linux and OSX, and would like to try and wrap some of them with the same interface used by std::chrono.
That's easy to do for timers that have a well-defined "period" at compile time, e.g. the POSIX clock_gettime() familiy, the clock_get_time() family on OSX, or gettimeofday().
However, there are some useful timers for which the "period" - while constant - is only known at runtime.
For example:
- POSIX states the period of clock(), CLOCKS_PER_SEC, may be a variable on non-XSI systems
- on Linux, the period of times() is given at runtime by sysconf(_SC_CLK_TCK)
- on OSX, the period of mach_absolute_time() is given at runtime by mach_timebase_info()
- on recent Intel processors, the DST register ticks at a constant rate, but of course that can only be determined at runtime
To wrap these timers in the std::chrono interface, one possibility would be to use a period of std::chrono::nanosecond , and convert the value of each timer to nanoseconds. An other approach could be to use a floating point representation. However, both approaches would introduce a (very small) overhead to the now() function, and a (probably small) loss in precision.
The solution I'm trying to pursue is to define a set of classes to represent such "run-time constant" periods, built along the same lines as the std::ratio class.
However I expect that will require rewriting all the related template classes and functions (as they assume constexpr values).
How do I wrap these kind of timers a la std:chrono ?
Or use non-constexpr values for the time period of a clock ?
Does anyone have any experience with wrapping these kind of timers a
la std:chrono ?
Actually I do. And on OSX, one of your platforms of interest. :-)
You mention:
on OSX, the period of mach_absolute_time() is given at runtime by
mach_timebase_info()
Absolutely correct. Also on OSX, the libc++ implementation of high_resolution_clock and steady_clock is actually based on mach_absolute_time. I'm the author of this code, which is open source with a generous license (do anything you want with it as long as you retain the copyright).
Here is the source for libc++'s steady_clock::now(). It is built pretty much the way you surmised. The run time period is converted to nanoseconds prior to returning. On OS X the conversion factor is very often 1, and the code takes advantage of that fact with an optimization. However the code is general enough to handle non-1 conversion factors.
On the first call to now() there's a small cost of querying the run time conversion factor to nanoseconds. In the general case a floating point conversion factor is computed. In the common case (conversion factor == 1) the subsequent cost is calling through a function pointer. I've found that the overhead is really quite reasonable.
On OS X the conversion factor, although not determined until run time, is still a constant (i.e. does not vary as the program executes), so it only needs to be computed once.
If you're in a situation where your period is actually varying dynamically, you'll need more infrastructure to handle this. Essentially you would need to integrate (calculus) the period vs time curve and then compute an average period between two points in time. That would require a constant monitoring of the period as it changes with time, and <chrono> isn't the right tool for that. Such tools are typically handled at the OS level.
[Does anyone have any experience] Or with using non-constexpr values for the time period of a clock ?
After reading through the standard (20.11.5, Class template duration), "period" is expected to be "a specialization of ratio":
Remarks: If Period is not a specialization of ratio, the program is ill-formed.
and all chrono templates rely heavily on constexpr functionality.
Does anyone have any experience with wrapping these kind of timers a la std:chrono ?
I've found here a suggestion to use a duration with period = 1, boost::rational as rep , though without any concrete examples.
I have done a similar thing for my purposes, only for Linux though. You find the code here; feel free to use the code in whatever way you want.
The challenges my implementation addresses overlap partially with the ones mentioned in your question. Specifically:
The tick factor (required to convert from clock ticks to a time unit based on seconds) is retrieved at run time, but only the first time now() is used&ddagger;. If you are concerned about the small overhead this causes, you may call the now() function once at start-up before you measure any actual intervals. The tick factor is stored in a static variable, which means there is still some overhead as – on the lowest level – each call of the now() function implies checking whether the static variable has been initialized. However, this overhead will be the same in each call of now(), so it shouldn't impact measuring time intervals.
I do not convert to nanoseconds by default, because when measuring relatively long periods of time (e.g. a few seconds) this causes overflows very quickly. This is in fact the main reason why I don't use the boost implementation. Instead of converting to nanoseconds, I implement the base unit as a template parameter (called Precision in the code). I use std::ratio from C++11 as template arguments. So I can choose, for example, a clock<micro>, which implies that calling the now() function will internally convert to microseconds rather than nanoseconds, which means I can measure periods of many seconds or minutes without overflows and still with good precision. (This is independent of the unit used to produce output. You can have a clock<micro> and display the result in seconds, etc.)
My clock type, which is called combined_clock combines user time, system time and wall-clock time. There is a boost clock type for this, too, but it's not compatible with the ratio types and units from std, whereas mine is.
&ddagger;The tick factor is retrieved using the ::sysconf() call you suggest, and that is guaranteed to return one and the same value throughout the life time of the process.
So the way you use it is as follows:
#include "util/proctime.hpp"
#include <ratio>
#include <chrono>
#include <thread>
#include <utility>
#include <iostream>
int main()
{
using std::chrono::duration_cast;
using millisec = std::chrono::milliseconds;
using clock_type = rlxutil::combined_clock<std::micro>;
auto tp1 = clock_type::now();
/* Perform some random calculations. */
unsigned long step1 = 1;
unsigned long step2 = 1;
for (int i = 0 ; i < 50000000 ; ++i) {
unsigned long step3 = step1 + step2;
std::swap(step1,step2);
std::swap(step2,step3);
}
/* Sleep for a while (this adds to real time, but not CPU time). */
std::this_thread::sleep_for(millisec(1000));
auto tp2 = clock_type::now();
std::cout << "Elapsed time: "
<< duration_cast<millisec>(tp2 - tp1)
<< std::endl;
return 0;
}
The usage above involves a pretty-print function that generates output like this:
Elapsed time: [user 40, system 0, real 1070 millisec]

C++ fine granular time

The following piece of code gives 0 as runtime of the function. Can anybody point out the error?
struct timeval start,end;
long seconds,useconds;
gettimeofday(&start, NULL);
int optimalpfs=optimal(n,ref,count);
gettimeofday(&end, NULL);
seconds = end.tv_sec - start.tv_sec;
useconds = end.tv_usec - start.tv_usec;
long opt_runtime = ((seconds) * 1000 + useconds/1000.0) + 0.5;
cout<<"\nOptimal Runtime is "<<opt_runtime<<"\n";
I get both start and end time as the same.I get the following output
Optimal Runtime is 0
Tell me the error please.
POSIX 1003.1b-1993 specifies interfaces for clock_gettime() (and clock_getres()), and offers that with the MON option there can be a type of clock with a clockid_t value of CLOCK_MONOTONIC (so that your timer isn't affected by system time adjustments). If available on your system then these functions return a structure which has potential resolution down to one nanosecond, though the latter function will tell you exactly what resolution the clock has.
struct timespec {
time_t tv_sec; /* seconds */
long tv_nsec; /* and nanoseconds */
};
You may still need to run your test function in a loop many times for the clock to register any time elapsed beyond its resolution, and perhaps you'll want to run your loop enough times to last at least an order of magnitude more time than the clock's resolution.
Note though that apparently the Linux folks mis-read the POSIX.1b specifications and/or didn't understand the definition of a monotonically increasing time clock, and their CLOCK_MONOTONIC clock is affected by system time adjustments, so you have to use their invented non-standard CLOCK_MONOTONIC_RAW clock to get a real monotonic time clock.
Alternately one could use the related POSIX.1 timer_settime() call to set a timer running, a signal handler to catch the signal delivered by the timer, and timer_getoverrun() to find out how much time elapsed between the queuing of the signal and its final delivery, and then set your loop to run until the timer goes off, counting the number of iterations in the time interval that was set, plus the overrun.
Of course on a preemptive multi-tasking system these clocks and timers will run even while your process is not running, so they are not really very useful for benchmarking.
Slightly more rare is the optional POSIX.1-1999 clockid_t value of CLOCK_PROCESS_CPUTIME_ID, indicated by the presence of the _POSIX_CPUTIME from <time.h>, which represents the CPU-time clock of the calling process, giving values representing the amount of execution time of the invoking process. (Even more rare is the TCT option of clockid_t of CLOCK_THREAD_CPUTIME_ID, indicated by the _POSIX_THREAD_CPUTIME macro, which represents the CPU time clock, giving values representing the amount of execution time of the invoking thread.)
Unfortunately POSIX makes no mention of whether these so-called CPUTIME clocks count just user time, or both user and system (and interrupt) time, accumulated by the process or thread, so if your code under profiling makes any system calls then the amount of time spent in kernel mode may, or may not, be represented.
Even worse, on multi-processor systems, the values of the CPUTIME clocks may be completely bogus if your process happens to migrate from one CPU to another during its execution. The timers implementing these CPUTIME clocks may also run at different speeds on different CPU cores, and at different times, further complicating what they mean. I.e. they may not mean anything related to real wall-clock time, but only be an indication of the number of CPU cycles (which may still be useful for benchmarking so long as relative times are always used and the user is aware that execution time may vary depending on external factors). Even worse it has been reported that on Linux CPU TimeStampCounter-based CPUTIME clocks may even report the time that a process has slept.
If your system has a good working getrusage() system call then it will hopefully be able to give you a struct timeval for each of the the actual user and system times separately consumed by your process while it was running. However since this puts you back to a microsecond clock at best then you'll need to run your test code enough times repeatedly to get a more accurate timing, calling getrusage() once before the loop, and again afterwards, and the calculating the differences between the times given. For simple algorithms this might mean running them millions of times, or more. Note also that on many systems the division between user time and system time is done somewhat arbitrarily and if examined separately in a repeated loop one or the other can even appear to run backwards. However if your algorithm makes no system calls then summing the time deltas should still be a fair total time for your code execution.
BTW, take care when comparing time values such that you don't overflow or end up with a negative value in a field, either as #Nim suggests, or perhaps like this (from NetBSD's <sys/time.h>):
#define timersub(tvp, uvp, vvp) \
do { \
(vvp)->tv_sec = (tvp)->tv_sec - (uvp)->tv_sec; \
(vvp)->tv_usec = (tvp)->tv_usec - (uvp)->tv_usec; \
if ((vvp)->tv_usec < 0) { \
(vvp)->tv_sec--; \
(vvp)->tv_usec += 1000000; \
} \
} while (0)
(you might even want to be more paranoid that tv_usec is in range)
One more important note about benchmarking: make sure your function is actually being called, ideally by examining the assembly output from your compiler. Compiling your function in a separate source module from the driver loop usually convinces the optimizer to keep the call. Another trick is to have it return a value that you assign inside the loop to a variable defined as volatile.
You've got weird mix of floats and ints here:
long opt_runtime = ((seconds) * 1000 + useconds/1000.0) + 0.5;
Try using:
long opt_runtime = (long)(seconds * 1000 + (float)useconds/1000);
This way you'll get your results in milliseconds.
The execution time of optimal(...) is less than the granularity of gettimeofday(...). This likely happes on Windows. On Windows the typical granularity is up to 20 ms. I've answered a related gettimeofday(...) question here.
For Linux I asked How is the microsecond time of linux gettimeofday() obtained and what is its accuracy? and got a good result.
More information on how to obtain accurate timing is described in this SO answer.
I normally do such a calculation as:
long long ss = start.tv_sec * 1000000LL + start.tv_usec;
long long es = end.tv_sec * 1000000LL + end.tv_usec;
Then do a difference
long long microsec_diff = es - ss;
Now convert as required:
double seconds = microsec_diff / 1000000.;
Normally, I don't bother with the last step, do all timings in microseconds.

timers, threads and compiler misbehaviour

I'm having trouble with something and couldn't find any answers about it, as I don't even know what to search for. I have a done a timer class using QueryPerformanceCounter, from my application, I launch a second thread object that has its own instanced timer and I just have an infinite loop getting delta time from the timer and using it to output the number of loop iterations per second.
I've noticed that it was giving me weird values so I started printing delta time and found out it was coming as 0 sometimes, so I went inside the method that returns delta time and did some testing. This is my deltaTime() method:
double MyTimer2::deltaTime()
{
LARGE_INTEGER timenow;
QueryPerformanceCounter(&timenow);
//std::cout << "timenow=" << (double)timenow.QuadPart << " currentticks=" << (double)m_currentTicks.QuadPart << std::endl;
double m_deltaTime = (double)(timenow.QuadPart - m_currentTicks.QuadPart) /* 1000.0*/ / (double)m_frequency.QuadPart;
m_currentTicks = timenow;
if(m_deltaTime < 0.000001)
return 0.0;
return m_deltaTime;
}
So, I put a breakpoint on "return 0.0;" and what happens is that it gets there most of the time, which is not correct. However, if I uncomment the printing code and run, I will never stop on the breakpoint. So in theory, my printing code is making it work correctly, whereas if I remove it, things stop working as they should! How is this possible, why is it happening and how can I fix it? I've tried _ReadWriteBarrier() unsuccessfully.
Thanks in advance!
EDIT: I need a high-resolution timer for physics simulation!
A couple processor generations ago, QueryPerformanceCounter() would read the CPU's cycle counter (e.g. rdtsc). Using this method, the number of ticks from successive reads would never be zero. The resolution was equal to the CPU clock rate, e.g. 3 GHz.
Modern processors have two characteristics which make the cycle counter useless for timing. First, you have multiple cores, which each have their own cycle counter. Threads can migrate between cores, and if you read the cycle counter from two different cores, the difference would not be related to elapsed time. It could even be negative. Secondly, you have dynamic clocking based on load (both underclocking to save power and overclocking for performance). Intel calls these "SpeedStep" and "Turbo Boost", respectively. When the cycle rate isn't fixed, there's no way to convert from ticks to time.
So, QueryPerformanceCounter now uses a dedicated piece of hardware called a High-Performance Event Counter (HPET), with a resolution of several MHz. Importantly, there's only one regardless of how many cores you have, and it doesn't change speed dynamically. But, since the resolution is lower, it is now possible to read it twice between ticks, in which case you'll get an elapsed time reported as zero.
In practice, this isn't a problem. If you need timing more precise than what the HPET can provide, then a general purpose computer is not suitable for you. Timing in the nanosecond range will be severely affected by interrupts.
What could possibly be the purpose of this block?
if(m_deltaTime < 0.000001)
return 0.0;
It has no value, it simply screws with the results, telling you the time was zero when it actually wasn't.
First of all, your timer is wrong: it consumes your CPU intensively. On the single core machine it will slow down all the system. If you want to create a timer and target Windows, you can use timer functions.
Then, every not negative value, returned by your deltaTime() function is valid. While you hosted not in real-time operating system, every operation can take arbitrary amount of time. One iteration can take about tens cycles of processor ticks, or tens years. No one guarantee.
Third, about experimental results. It seems that if context will be switched once between two consecutive time measurement, you get value about 0.016s, if not, you get value bellow 0.000001s that is floored to 0s.
As it was said, printing to console is relatively heavy operation and you actually always get context switched when you enable it.
EDIT
While QueryPerformanceCounter seems to offer great resolution, it traps you. You will never get actually high resolution timer, unless you work in real-time OS.

QueryPerformanceCounter and overflows

I'm using QueryPerformanceCounter to do some timing in my application. However, after running it for a few days the application seems to stop functioning properly. If I simply restart the application it starts working again. This makes me a believe I have an overflow problem in my timing code.
// Author: Ryan M. Geiss
// http://www.geisswerks.com/ryan/FAQS/timing.html
class timer
{
public:
timer()
{
QueryPerformanceFrequency(&freq_);
QueryPerformanceCounter(&time_);
}
void tick(double interval)
{
LARGE_INTEGER t;
QueryPerformanceCounter(&t);
if (time_.QuadPart != 0)
{
int ticks_to_wait = static_cast<int>(static_cast<double>(freq_.QuadPart) * interval);
int done = 0;
do
{
QueryPerformanceCounter(&t);
int ticks_passed = static_cast<int>(static_cast<__int64>(t.QuadPart) - static_cast<__int64>(time_.QuadPart));
int ticks_left = ticks_to_wait - ticks_passed;
if (t.QuadPart < time_.QuadPart) // time wrap
done = 1;
if (ticks_passed >= ticks_to_wait)
done = 1;
if (!done)
{
// if > 0.002s left, do Sleep(1), which will actually sleep some
// steady amount, probably 1-2 ms,
// and do so in a nice way (cpu meter drops; laptop battery spared).
// otherwise, do a few Sleep(0)'s, which just give up the timeslice,
// but don't really save cpu or battery, but do pass a tiny
// amount of time.
if (ticks_left > static_cast<int>((freq_.QuadPart*2)/1000))
Sleep(1);
else
for (int i = 0; i < 10; ++i)
Sleep(0); // causes thread to give up its timeslice
}
}
while (!done);
}
time_ = t;
}
private:
LARGE_INTEGER freq_;
LARGE_INTEGER time_;
};
My question is whether the code above should work deterministically for weeks of running continuously?
And if not where the problem is? I thought the overflow was handled by
if (t.QuadPart < time_.QuadPart) // time wrap
done = 1;
But maybe thats not enough?
EDIT: Please observe that I did not write the original code, Ryan M. Geiss did, the link to the original source of the code is in the code.
QueryPerformanceCounter is notorious for its unreliability. It's fine to use for individual short-interval timing, if you're prepared to handle abnormal results. It is not exact - It's typically based on the PCI bus frequency, and a heavily loaded bus can lead to lost ticks.
GetTickCount is actually more stable, and can give you 1ms resolution if you've called timeBeginPeriod. It will eventually wrap, so you need to handle that.
__rdtsc should not be used, unless you're profiling and have control of which core you're running on and are prepared to handle variable CPU frequency.
GetSystemTime is decent for longer periods of measurements, but will jump when the system time is adjusted.
Also, Sleep(0) does not do what you think it does. It will yield the cpu if another context wants it - otherwise it'll return immediately.
In short, timing on windows is a mess. One would think that today it'd be possible to get accurate long-term timing from a computer without going through hoops - but this isn't the case. In our game framework we're using several time sources and corrections from the server to ensure all connected clients have the same game time, and there's a lot of bad clocks out there.
Your best bet would likely be to just use GetTickCount or GetSystemTime, wrap it into something that adjusts for time jumps/wrap arounds.
Also, you should convert your double interval to an int64 milliseconds and then use only integer math - this avoids problems due to floating point types' varying accuracy based on their contents.
Based on your comment, you probably should be using Waitable Timers instead.
See the following examples:
Using Waitable Timer Objects
Using Waitable Timers with an Asynchronous Procedure Call
Performance counters are 64-bit, so they are large enough for years of running continuously. For example, if you assume the performance counter increments 2 billion times each second (some imaginary 2 GHz processor) it will overflow in about 290 years.
Using a nanosecond-scale timer to control something like Sleep() that at best is precise to several milliseconds (and usually, several dozen milliseconds) is somewhat controversary anyway.
A different approach you might consider would be to use WaitForSingleObject or a similar function. This burns less CPU cycles, causes a trillion fewer context switches over the day, and is more reliable than Sleep(0), too.
You could for example create a semapore and never touch it in normal operation. The semaphore exists only so you can wait on something, if you don't have anything better to wait on. Then you can specify a timeout in milliseconds up to 49 days long with a single syscall. And, it will not only be less work, it will be much more accurate too.
The advantage is that if "something happens", so you want to break up earlier than that, you only need to signal the semaphore. The wait call will return instantly, and you will know from the WAIT_OBJECT_0 return value that it was due to being signaled, not due to time running out. And all that without complicated logic and counting cycles.
The problem you asked about most directly:
if (t.QuadPart < time_.QuadPart)
should instead be this:
if (t.QuadPart - time_.QuadPart < 0)
The reason for that is that you want to look for wrapping in relative time, not absolute time. Relative time will wrap (1ull<<63) time units after the reference call to QPC. Absolute time might wrap (1ull<<63) time units after reboot, but it could wrap at any other time it felt like it, that's undefined.
QPC is a little bugged on some systems (older RDTSC-based QPCs on early multicore CPUs, for instance) so it may be desirable to allow small negative time deltas like so:
if (t.QuadPart - time_.QuadPart < -1000000) //time wrap
An actual wrap will produce a very large negative time deltas, so that's safe. It shouldn't be necessary on modern systems, but trusting microsoft is rarely a good idea.
...
However, the bigger problem there with time wrapping is in the fact that ticks_to_wait, ticks_passed, and ticks_left are all int, not LARGE_INT or long long like they should be. This makes most of that code wrap if any significant time periods are involved - and "significant" in this context is platform dependent, it can be on the order of 1 second in a few (rare these days) cases, or even less on some hypothetical future system.
Other issues:
if (time_.QuadPart != 0)
Zero is not a special value there, and should not be treated as such. My guess is that the code is conflating QPC returning a time of zero with QPCs return value being zero. The return value is not the 64 bit time passed by pointer, it's the BOOL that QPC actually returns.
Also, that loop of Sleep(0) is foolish - it appears to be tuned to behave correctly only on a particular level of contention and a particular per-thread CPU performance. If you need resolution that's a horrible idea, and if you don't need resolution then that entire function should have just been a single call to Sleep.