setting/pausing clock time for std::chrono clocks in C++

I wrote something to measure how long my code takes to run, and to print it out. The way I have it now supports nesting of these measurements.
The thing is that getting the time interval, converting it into a number, getting the time format, converting everything into a string, and then printing it takes a while (2-3 milliseconds); the I/O especially seems expensive. I want the clock to "skip over" this process in a sense, since the things I'm measuring are in the microseconds. (And I think it'd be a good feature regardless, in case there are other things I want to skip.)
std::chrono::high_resolution_clock clock;
std::chrono::time_point<std::chrono::high_resolution_clock> first, second;
first = clock.now();
std::chrono::time_point<std::chrono::high_resolution_clock> paused_tp = clock.now();
std::cout << "Printing things \n";
clock.setTime(paused_tp); // Something like this is what I want
second = clock.now();
The idea is to make sure that first and second have minimal differences, ideally identical.
From what I see, the high_resolution_clock class (and all the other chrono clocks) keeps its time_point private, and you can only access it through clock.now().
I know there might be benchmarking libraries out there that do this, but I'd like to figure out how to do it myself (if only for the sake of knowing how to do it). Any information on how other libraries do it or insights on how chrono works would be appreciated as well. I might be misunderstanding how chrono internally keeps track.
(Is std::chrono::high_resolution_clock even accurate enough for something like this?)
(While I'm here any resources on making C++ programs more efficient would be great)
Edit: I actually do the printing after the section of code I'm trying to time; the problem only arises when, say, I want to time the entire program as well as individual functions. Then printing each function's time adds delay to the overall program time.
Edit 2: I figured I should have more of an example of what I'm doing.
I have a class that handles everything; let's say it's called tracker, and it takes care of all that clock nonsense.
tracker loop = TRACK(
    for (int i = 0; i != 100; ++i) {
        tracker b = TRACK(function_call());
        b.display();
    }
);
loop.display();
The macro is optional; it's just a quick thing that makes the code less cluttered and lets me display the name of the function being called.
Explicitly the macro expands to
tracker loop = "for(int i = 0; i != 100; ++i){ tracker b = TRACK(function_call()); b.display(); }"
loop.start()
for(int i = 0; i != 100; ++i){
tracker b = "function_call()"
b.start();
function_call();
b.end();
b.display();
}
loop.end();
loop.display();
In most situations the printing isn't an issue, since each tracker only keeps track of what happens between its start() and end(), but here the b.display() calls end up interfering with the tracker loop.
A goal of mine with this was for the tracker to be as non-intrusive as possible, so I'd like most/all of it to be handled in the tracker class. But then I run into the problem of b.display() being a method of a different instance than the tracker loop. I've tried a few things with the static keyword but ran into a few issues with that (still trying a little). I might've coded myself into a dead end here, but there are still a lot of things left to try.

Just time the two intervals separately and add them, i.e. save 4 timestamps in total. For nested intervals, you might just save timestamps into an array and sort everything out at the end (or inside an outer loop, before the timestamps get overwritten). Storing to an array is quite cheap.
Or better: defer printing until later.
If the timed interval is only milliseconds, just save what you were going to print and do it outside the timed interval.
If you have nested timed intervals, at least sink the printing out of the inner-most intervals to minimize the amount of stop/restart you have to do.
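As a rough illustration of the "record now, print later" idea (this tracker-style helper is hypothetical, not the asker's actual class), something like the following could work: each timed region appends a name and a duration to a vector, and nothing is formatted or written until all timing is done.
#include <chrono>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical sketch: record name + elapsed time during the timed region,
// and only print after the outermost region has ended.
struct Record {
    std::string name;
    std::chrono::nanoseconds elapsed;
};

std::vector<Record> g_records;   // appending is cheap; printing is deferred

template <class F>
void track(const std::string& name, F&& f)
{
    auto t0 = std::chrono::steady_clock::now();
    f();
    auto t1 = std::chrono::steady_clock::now();
    g_records.push_back({name, t1 - t0});   // no I/O inside the timed region
}

void flush_records()
{
    for (const auto& r : g_records)
        std::printf("%s: %lld ns\n", r.name.c_str(), (long long)r.elapsed.count());
    g_records.clear();
}
Usage would be something like track("function_call()", [&]{ function_call(); }); inside the loop, with flush_records() called once after the outer region ends.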
If you're manually instrumenting your code all over the place, maybe look at profiling tools like flamegraphs instead, especially if your timed intervals fall on function boundaries (see, for example, "linux perf: how to interpret and find hotspots").
Not only does I/O take time itself, it will also make the code that follows run slower for a few hundred to a few thousand cycles. Making a system call touches a lot of code, so when you return to user space you're likely to get instruction-cache and data-cache misses. System calls that modify your page tables will also result in TLB misses.
(See "The (Real) Costs of System Calls" section in the FlexSC paper (Soares, Stumm), timed on an i7 Nehalem running Linux. (First-gen i7 from ~2008/9). The paper proposes a batched system-call mechanism for high-throughput web-servers and similar, but their baseline results for plain Linux are interesting and relevant outside of that.)
On a modern Intel CPU with Meltdown mitigation enabled, you'll usually get TLB misses. With Spectre mitigation enabled on recent x86, branch-prediction history will probably be wiped out, depending on the mitigation strategy. (Intel added a way for the kernel to request that higher-privileged branches after this point won't be affected by prediction history for lower-privileged branches. On current CPUs, I think this just flushes the branch-prediction cache.)
You can avoid the system-call overhead by letting iostream buffer for you. It's still significant work formatting and copying data around, but much cheaper than writing to a terminal. Redirecting your program's stdout to a file will make cout full-buffered by default, instead of line-buffered. i.e. run it like this:
./my_program > time_log.txt
The final output will match what you would have got on the terminal, but (as long as you don't do anything silly like using std::endl to force a flush) it will just be buffered up. The default buffer size is probably something like 4kiB. Use strace ./my_program or a similar tool to trace system calls, and make sure you're getting one big write() at the end, instead of lots of small write()s.
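If you'd rather not rely on shell redirection, you can also request a big, fully buffered stdout yourself. A sketch (the 64 KiB size is an arbitrary choice; on common implementations a synchronized std::cout writes through stdout, so the same buffer applies to it):
#include <cstdio>
#include <iostream>

int main()
{
    // Must run before anything is written to stdout; _IOFBF requests full buffering.
    static char buf[1 << 16];                      // 64 KiB, arbitrary
    std::setvbuf(stdout, buf, _IOFBF, sizeof buf);

    // Output accumulates in the buffer and goes out as one big write()
    // when the buffer fills, on fflush, or at normal program exit.
    std::cout << "timed output goes here\n";
}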
It would be nice to avoid buffered I/O inside (outer) timed regions, but it's very important to avoid system calls in places your "real" (non-instrumented) code wouldn't have them, if you're timing down to nanoseconds. And this is true even for code that runs before a timed interval, not just inside one.
A cout << foo that doesn't make a system call isn't "special" in terms of slowing down later code.

To overcome the overhead, the time lapse printing can be done by another thread. The main thread saves a start and end time into shared global variables, and notifies the condition variable the print thread is waiting on.
#include <iostream>
#include <thread>
#include <chrono>
#include <mutex>
#include <condition_variable>
#include <atomic>

std::condition_variable cv;
std::mutex mu;
std::atomic<bool> running {true};
std::atomic<bool> printnow {false};

// Shared but non-atomic: protected by the mutex.
// Note: if print_lapse() is called again before the print thread has woken up
// and read these, the earlier values are silently overwritten.
std::chrono::high_resolution_clock::time_point start;
std::chrono::high_resolution_clock::time_point now;

void print_thread() {
    std::thread([]() {
        while (running) {
            std::unique_lock<std::mutex> lock(mu);
            cv.wait(lock, []() { return !running || printnow; });
            if (!running) return;
            std::chrono::milliseconds lapse_ms =
                std::chrono::duration_cast<std::chrono::milliseconds>(now - start);
            printnow = false;
            std::cout << " lapse time " << lapse_ms.count() << " ms\n";
        }
    }).detach();
}

void print_lapse(std::chrono::high_resolution_clock::time_point start1,
                 std::chrono::high_resolution_clock::time_point now1) {
    {
        std::lock_guard<std::mutex> lock(mu);  // protect the shared time_points
        start = start1;
        now = now1;
        printnow = true;
    }
    cv.notify_one();
}

int main()
{
    // launch the print thread
    print_thread();

    // lapse 1
    std::chrono::high_resolution_clock::time_point start1 = std::chrono::high_resolution_clock::now();
    std::this_thread::sleep_for(std::chrono::milliseconds(200));
    std::chrono::high_resolution_clock::time_point now1 = std::chrono::high_resolution_clock::now();
    print_lapse(start1, now1);

    // lapse 2
    start1 = std::chrono::high_resolution_clock::now();
    std::this_thread::sleep_for(std::chrono::milliseconds(500));
    now1 = std::chrono::high_resolution_clock::now();
    print_lapse(start1, now1);

    // wind up
    std::this_thread::sleep_for(std::chrono::milliseconds(300));
    running = false;
    cv.notify_one();
}

Related

I created basic high-precision multitasking C++ code; what is this algorithm/implementation called?

So I always wanted to implement basic multitasking code, specifically asynchronous code (not concurrent code), without using interrupts, Boost, or complex threading/multitasking implementations or algorithms.
I've done some programming on MCUs such as the ATmega328. In most cases, to get the most efficient use out of an MCU, multitasking is required, in which functions run at the same time ("perceived" to run at the same time) without halting the MCU to run other functions.
For example, "function_a" requires a delay, but it should not halt the MCU during that delay, so that other functions like "function_b" can also run asynchronously.
To do this on a microcontroller with only one CPU/thread, an algorithm that uses timers and keeps track of the time is used to implement multitasking. It's really simple and always works. I have taken the concept from MCUs and applied it to desktop PCs in C++ using high-precision timers; the code is given below.
I am really surprised that no one uses this form of asynchronous algorithm for C++, and I haven't seen any examples of it on the internet for C++.
My question now is: what exactly is this algorithm and implementation called in computer science or computer engineering? I read that this implementation is called a "state machine", but I googled it and did not see any code similar to mine that works only with the help of timers directly in C++.
The code below does the following: it runs function 1 but at the same time also runs function 2, without needing to halt the application.
Both functions also need to run periodically at a specified interval rather than back-to-back continuously (function_1 runs every 1 s and function_2 every 3 s).
Finding a similar implementation for the requirements above on the internet for C++ is really hard. The code below is simple in nature and works as intended:
// Asynchronous state machine using one CPU, C++ example:
// Tested working multitasking code:
#include <iostream>
#include <ctime>
#include <ratio>
#include <chrono>

using namespace std::chrono;

// At the first execution of the program, capture the time as zero reference
// and store it in "t2" (and "t3" for the second task).
auto t2 = high_resolution_clock::now();
auto t3 = high_resolution_clock::now();

int main()
{
    while (1)
    {
        // Always update the time reference variable "t1" to the current time:
        auto t1 = high_resolution_clock::now();

        // Check the difference between the zero reference time and the current time
        // and see if it is greater than the time specified in the "if" condition:
        duration<double> time_span_1 = duration_cast<duration<double>>(t1 - t2);
        duration<double> time_span_2 = duration_cast<duration<double>>(t1 - t3);

        if (time_span_1.count() >= 1)
        {
            printf("This is function_1:\n\n");
            std::cout << time_span_1.count() << " Secs (t1-t2)\n\n";
            // Set t2 to capture the current time again as zero reference.
            t2 = high_resolution_clock::now();
            std::cout << "------------------------------------------\n\n";
        }
        else if (time_span_2.count() >= 3)
        {
            printf("This is function_2:\n\n");
            std::cout << time_span_2.count() << " Secs (t1-t3)\n\n";
            // Set t3 to capture the current time again as zero reference.
            t3 = high_resolution_clock::now();
            std::cout << "------------------------------------------\n\n";
        }
    }
    return 0;
}
What is the algorithm...called?
Some people call it "super loop." I usually write it like this:
while (1) {
    if ( itsTimeToPerformTheHighestPriorityTask() ) {
        performTheHighestPriorityTask();
        continue;
    }
    if ( itsTimeToPerformTheNextHighestPriorityTask() ) {
        performTheNextHighestPriorityTask();
        continue;
    }
    ...
    if ( itsTimeToPerformTheLowestPriorityTask() ) {
        performTheLowestPriorityTask();
        continue;
    }
    waitForInterrupt();
}
The waitForInterrupt() call at the bottom is optional. Most processors have an op-code that puts the processor into a low-power state (basically, it halts the processor for some definition of "halt") until an interrupt occurs.
Halting the CPU when there's no work to be done can greatly improve battery life if the device is battery powered, and it can help with thermal management if that's an issue. But, the price you pay for using it is, your timers and all of your I/O must be interrupt driven.
I would describe the posted code as "microcontroller code", because it is assuming that it is the only program that will be running on the CPU and that it can therefore burn as many CPU-cycles as it wants to without any adverse consequence. That assumption is often valid for programs running on microcontrollers (since usually a microcontroller doesn't have any OS or other programs installed on it), but "spinning the CPU" is not generally considered acceptable behavior in the context of a modern PC/desktop OS where programs are expected to be efficient and share the computer's resources with each other.
In particular, "spinning" the CPU on a modern PC (or Mac) introduces the following problems:
It uses up 100% of the CPU cycles on a CPU core, which means those CPU cycles are unavailable to any other programs that might otherwise be able to make productive use of them
It prevents the CPU from ever going to sleep, which wastes power -- that's bad on a desktop or server because it generates unwanted/unnecessary heat, and it's worse on a laptop because it quickly drains the battery.
Modern OS schedulers keep track of how much CPU time each program uses, and if the scheduler notices that a program is continuously spinning the CPU, it will likely respond by drastically reducing that program's scheduling-priority, in order to allow other, less CPU-hungry programs to remain responsive. Having a reduced CPU priority means that the program is less likely to be scheduled to run at the moment when it wants to do something useful, making its timing less accurate than it otherwise might be.
Users who run system-monitoring utilities like Task Manager (in Windows) or Activity Monitor (under MacOS/X) or top (in Linux) will see the program continuously taking 100% of a CPU core and will likely assume the program is buggy and kill it. (and unless the program actually needs 100% of a CPU core to do its job, they'll be correct!)
In any case, it's not difficult to rewrite the program to use almost no CPU cycles instead. Here's a version of the posted program that uses approximately 0% of a CPU core, but still calls the desired functions at the desired intervals (and also prints out how close it came to the ideal timing -- which is usually within a few milliseconds on my machine, but if you need better timing accuracy than that, you can get it by running the program at higher/real-time priority instead of as a normal-priority task):
#include <iostream>
#include <ctime>
#include <chrono>
#include <thread>
#include <algorithm>   // for std::min

using namespace std::chrono;

int main(int argc, char ** argv)
{
    // These variables will contain the times at which we next want to execute each task.
    // Initialize them to the current time so that each task will run immediately on startup.
    auto nextT1Time = high_resolution_clock::now();
    auto nextT3Time = high_resolution_clock::now();

    while (1)
    {
        // Compute the next time at which we need to wake up and execute one of our tasks
        auto nextWakeupTime = std::min(nextT1Time, nextT3Time);

        // Sleep until the desired time
        std::this_thread::sleep_until(nextWakeupTime);

        bool t1Executed = false, t3Executed = false;
        high_resolution_clock::duration t1LateBy, t3LateBy;

        auto now = high_resolution_clock::now();
        if (now >= nextT1Time)
        {
            t1Executed = true;
            t1LateBy = now - nextT1Time;

            // schedule our next execution to be 1 second later
            nextT1Time = nextT1Time + seconds(1);
        }
        if (now >= nextT3Time)
        {
            t3Executed = true;
            t3LateBy = now - nextT3Time;

            // schedule our next execution to be 3 seconds later
            nextT3Time = nextT3Time + seconds(3);
        }

        // Since the calls to std::cout can be slow, we execute them down here, after the
        // functions have been called but before (nextWakeupTime) is recalculated on the
        // next go-around of the loop. That way the time spent printing to stdout during
        // the T1 task won't potentially hold off execution of the T3 task.
        if (t1Executed) std::cout << "function T1 was called (it executed " << duration_cast<microseconds>(t1LateBy).count() << " microseconds after the expected time)" << std::endl;
        if (t3Executed) std::cout << "function T3 was called (it executed " << duration_cast<microseconds>(t3LateBy).count() << " microseconds after the expected time)" << std::endl;
    }
    return 0;
}

C++ chrono timers go 1000x slower in a thread

My setup
iOS13.2
Xcode 11.2
I'm implementing a simple timer to count seconds by calendar/wall time.
From many SO tips, I tried various std::chrono:: timers, such as
system_clock
steady_clock
They seem to behave correctly under a single-threaded program.
But in production code, when I use the timer in a background thread, it falls apart: the duration readings from the timer calls are way off.
m_thread = std::thread([&, this]() {
    auto start = std::chrono::system_clock::now(); // a persistent state initialized at the beginning of the run
    while (true) {
        auto end = std::chrono::system_clock::now();
        auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
        auto isOutdated = duration > 1000;
        if (isOutdated) {
            print("we are outdated.");   // print() is the logging helper mentioned below
        }
    }
});
duration seems to be almost always 0.
gettimeofday() works slightly better, i.e. it actually moves, but at a rate 1000x slower than wall time.
I had thought that the chrono clocks count all kinds of time, but it seems I had the wrong expectation of how they work.
What am I missing?
UPDATE
Forgot to say, I have 2 more threads going at the same time. Could thread preemption affect this?
UPDATE 2
I tried a few things and now the program behaves as expected, but it still drives me mad how this happened in the first place.
Things I did:
Gradually increase the timeout threshold from 1 to 3000, recompiling the whole program each time. I found that when I lowered the threshold, the program actually got the duration right.
Try gettimeofday() first, which consistently showed numbers 1000x slower, then switch back to system_clock.
Disable some logging to avoid a performance hit. I use a third-party thread-safe logging lib, which writes to a log file and outputs to the device syslog at the same time.
Right now I can finally see the correct duration, with NO change in the code logic. What a bizarre experience!

sleep without system or IO calls

I need a sleep that does not issue any system or IO calls for a scenario with Hardware Transactional Memory (these calls would lead to an abort). Sleeping for 1 microsecond as in usleep(1) would be just fine.
This question suggests implementing nested loops to keep the program busy and delay it for some time. However, I want to be able to compile with optimization, which would delete these loops.
An idea could be to calculate some sophisticated math expression. Are there approaches to this? The actual time waited does not have to be precise - it should be roughly the same across runs, however.
Try a nop loop with a volatile asm directive:
for (int i = 0; i < 1000; i++) {
    asm volatile ("nop");
}
The volatile should prevent the optimizer from getting rid of it. If that doesn't do it, then try __volatile__.
The tricky part here is the timing. Querying any sort of timer may well count as an I/O function, depending on the OS.
But if you just want a delay loop, when timing isn't that important, you should look to platform-specific code. For example, there is an Intel-specific intrinsic called _mm_pause that translates to a CPU pause instruction, which basically halts the pipeline until the next memory bus sync comes through. It was designed to be put into a spinlock loop (no point in spinning and requerying an atomic variable until there is a possibility of new information), but it might (might - read the documentation) inhibit the compiler from removing your delay loop as empty.
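For illustration, a pause-based delay might look like the sketch below (x86-only; the iteration count is an arbitrary placeholder you would calibrate for your machine, and the resulting wall time is only roughly repeatable):
#include <immintrin.h>   // _mm_pause on GCC/Clang; MSVC also provides it in <intrin.h>

// Busy-waits without touching the OS. Each _mm_pause() stalls the core for a
// handful of cycles (more on recent microarchitectures), so the total delay
// scales with the iteration count but is not calibrated to wall time.
inline void spin_pause(unsigned iterations)
{
    for (unsigned i = 0; i < iterations; ++i)
        _mm_pause();
}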
You can use this code:
#include <time.h>

void delay(int n)
{
    n *= CLOCKS_PER_SEC / 1000;
    clock_t t1 = clock();
    while (clock() <= t1 + n && clock() >= t1);
}
Sometimes (not very often) this function will cause less delay than specified due to clock counter overflow.
Update
Another option is to use a loop like this with volatile counters, as sketched below.
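A minimal sketch of that idea (the iteration count is arbitrary and has to be calibrated per machine):
// The volatile counter forces the compiler to perform every increment, so the
// loop survives optimization; the delay is only roughly repeatable across runs.
void delay_spin(unsigned long iterations)
{
    volatile unsigned long counter = 0;
    while (counter < iterations)
        ++counter;
}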

C++ , Timer, Milliseconds

#include <iostream>
#include <conio.h>
#include <ctime>

using namespace std;

double diffclock(clock_t clock1, clock_t clock2)
{
    double diffticks = clock1 - clock2;
    double diffms = (diffticks) / (CLOCKS_PER_SEC / 1000);
    return diffms;
}

int main()
{
    clock_t start = clock();
    for (int i = 0;; i++)
    {
        if (i == 10000) break;
    }
    clock_t end = clock();
    cout << diffclock(start, end) << endl;
    getch();
    return 0;
}
So my problem comes down to it returning 0; to be straightforward, I want to check how much time my program takes to run.
I found tons of material on the internet, but mostly it comes down to the same point of getting a 0 because the start and the end are the same.
This problem is about C++, remember :<
There are a few problems here. The first is that you obviously switched the start and stop times when passing them to the diffclock() function. The second problem is optimization: any reasonably smart compiler with optimizations enabled will simply throw the entire loop away, as it does not have any side effects. But even if you fix the above problems, the program will most likely still print 0. Modern CPUs do billions of operations per second, with sophisticated out-of-order execution, branch prediction, and tons of other technologies, so even if the compiler keeps the loop, you'd need a lot more than 10K iterations to make it run measurably long. You'd probably need your program to run for a second or two for clock() to reflect anything.
But the most important problem is clock() itself. That function is not suitable for any kind of performance measurement whatsoever. What it gives you is an approximation of the processor time used by the program. Aside from the vague nature of the approximation method that might be used by any given implementation (since the standard doesn't require anything specific), the POSIX standard also requires CLOCKS_PER_SEC to be equal to 1000000, independent of the actual resolution. In other words, it doesn't matter how precise the clock is, and it doesn't matter at what frequency your CPU is running. To put it simply, it is a totally useless number and therefore a totally useless function. The only reason it still exists is probably historical. So, please do not use it.
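To see the difference for yourself, here is a small self-contained sketch: a half-second sleep consumes essentially no CPU time, so clock() reports (nearly) zero while a wall clock reports the full 500 ms.
#include <chrono>
#include <cstdio>
#include <ctime>
#include <thread>

int main()
{
    std::clock_t c0 = std::clock();               // approximate CPU time
    auto w0 = std::chrono::steady_clock::now();   // wall (monotonic) time

    std::this_thread::sleep_for(std::chrono::milliseconds(500));

    std::clock_t c1 = std::clock();
    auto w1 = std::chrono::steady_clock::now();

    std::printf("clock(): %.1f ms, wall clock: %lld ms\n",
                1000.0 * (c1 - c0) / CLOCKS_PER_SEC,
                (long long)std::chrono::duration_cast<std::chrono::milliseconds>(w1 - w0).count());
}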
To achieve what you are looking for, people used to read the CPU time-stamp counter, also known as "RDTSC" after the name of the corresponding CPU instruction used to read it (reading the counter itself is shown in the sketch after this list). These days, however, this is also mostly useless because:
Modern operating systems can easily migrate the program from one CPU to another. You can imagine that reading the time stamp on one CPU after running for a second on another doesn't make a lot of sense. Only on the latest Intel CPUs is the counter synchronized across CPU cores. All in all, it is still possible to do this, but a lot of extra care must be taken (i.e. one can set up the affinity for the process, etc.).
Measuring CPU instructions of the program oftentimes does not give an accurate picture of how much time it is actually using. This is because in real programs there can be system calls where the work is performed by the OS kernel on behalf of the process. In that case, that time is not included.
It can also happen that the OS suspends the execution of the process for a long time. Even though it took only a few instructions to execute, to the user it seemed like a second. So such a performance measurement may be useless.
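For reference, reading the counter itself is a one-liner with a compiler intrinsic; this sketch uses GCC/Clang's header (MSVC offers the same intrinsic in <intrin.h>), and all the caveats above still apply:
#include <x86intrin.h>   // __rdtsc on GCC/Clang

// Returns the TSC delta around a region of code. The result is in TSC ticks,
// not seconds, and is distorted by core migration, frequency scaling and
// out-of-order execution, as discussed above.
unsigned long long time_region_in_ticks()
{
    unsigned long long t0 = __rdtsc();
    // ... code under test ...
    unsigned long long t1 = __rdtsc();
    return t1 - t0;
}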
So what to do?
When it comes to profiling, a tool like perf must be used. It can track a number of CPU clocks, cache misses, branches taken, branches missed, the number of times the process was moved from one CPU to another, and so on. It can be used as a standalone tool, or its counters can be embedded into your application (with something like PAPI).
And if the question is about actual time spent, people use a wall clock. Preferably a high-precision one that is also not subject to NTP adjustments (i.e. monotonic). That shows exactly how much time elapsed, no matter what was going on. For that purpose clock_gettime() can be used; it is part of SUSv2 and POSIX.1-2001. Given that you use getch() to keep the terminal open, I'd assume you are using Windows. There, unfortunately, you don't have clock_gettime(), and the closest thing would be the performance counters API:
BOOL QueryPerformanceFrequency(LARGE_INTEGER *lpFrequency);
BOOL QueryPerformanceCounter(LARGE_INTEGER *lpPerformanceCount);
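(For completeness, on POSIX systems where clock_gettime() is available, a minimal monotonic-clock sketch looks like this:)
#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);   // monotonic: not affected by NTP adjustments
    /* ... code to measure ... */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    long long ns = (t1.tv_sec - t0.tv_sec) * 1000000000LL + (t1.tv_nsec - t0.tv_nsec);
    printf("elapsed: %lld ns\n", ns);
    return 0;
}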
For a portable solution, your best bet is std::chrono::high_resolution_clock. It was introduced in C++11 and is supported by most industrial-grade compilers (GCC, Clang, MSVC).
Below is an example of how to use it. Please note that since I know my CPU will do 10000 increments of an integer much faster than a millisecond, I have changed the output to microseconds. I've also declared the counter as volatile in the hope that the compiler won't optimize it away.
#include <ctime>
#include <chrono>
#include <iostream>

int main()
{
    volatile int i = 0; // "volatile" is to ask the compiler not to optimize the loop away.
    auto start = std::chrono::steady_clock::now();
    while (i < 10000) {
        ++i;
    }
    auto end = std::chrono::steady_clock::now();
    auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
    std::cout << "It took me " << elapsed.count() << " microseconds." << std::endl;
}
When I compile and run it, it prints:
$ g++ -std=c++11 -Wall -o test ./test.cpp && ./test
It took me 23 microseconds.
Hope it helps. Good Luck!
At a glance, it seems like you are subtracting the larger value from the smaller value. You call:
diffclock( start, end );
But then diffclock is defined as:
double diffclock( clock_t clock1, clock_t clock2 ) {
double diffticks = clock1 - clock2;
double diffms = diffticks / ( CLOCKS_PER_SEC / 1000 );
return diffms;
}
Apart from that, it may have something to do with the way you are converting units. The use of 1000 to convert to milliseconds is different on this page:
http://en.cppreference.com/w/cpp/chrono/c/clock
The problem appears to be that the loop is just too short. I tried it on my system and it gave 0 ticks. I checked what diffticks was and it was 0. After increasing the loop size to 100000000 so there was a noticeable time lag, I got -290 as output (a bug -- I think diffticks should be clock2 - clock1, so we should get 290 and not -290). I also tried changing "1000" to "1000.0" in the division, and that didn't help.
Compiling with optimization does remove the loop, so you either have to not use optimization or make the loop "do something", e.g. increment a counter other than the loop counter in the loop body (see the sketch below). At least that's what GCC does.
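A sketch of the "make the loop do something" idea, using a volatile sink so the loop survives optimization:
#include <ctime>
#include <iostream>

int main()
{
    volatile long sink = 0;            // observable side effect: the loop can't be removed
    std::clock_t start = std::clock();
    for (long i = 0; i < 100000000L; ++i)
        sink += i;
    std::clock_t end = std::clock();
    std::cout << double(end - start) * 1000.0 / CLOCKS_PER_SEC << " ms\n";
}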
Note: this is available since C++11.
You can use the std::chrono library.
std::chrono has two distinct concepts: time points and durations. A time point represents a point in time, and a duration, as the term suggests, represents an interval or span of time.
This C++ library allows us to subtract two time points to get the duration of the interval between them. So you can set a starting point and a stopping point, and with the library's functions you can also convert the result into appropriate units.
Example using high_resolution_clock (which is one of the three clocks this library provides):
#include <chrono>
using namespace std::chrono;
//before running function
auto start = high_resolution_clock::now();
//after calling function
auto stop = high_resolution_clock::now();
Subtract stop and start timepoints and cast it into required units using the duration_cast() function. Predefined units are nanoseconds, microseconds, milliseconds, seconds, minutes, and hours.
auto duration = duration_cast<microseconds>(stop - start);
cout << duration.count() << endl;
First of all, you should subtract end - start, not vice versa.
The documentation says that clock() returns -1 if the value is not available; did you check that?
What optimization level do you use when compiling your program? If optimization is enabled, the compiler can effectively eliminate your loop entirely.

QueryPerformanceCounter and overflows

I'm using QueryPerformanceCounter to do some timing in my application. However, after running it for a few days the application seems to stop functioning properly. If I simply restart the application it starts working again. This makes me believe I have an overflow problem in my timing code.
// Author: Ryan M. Geiss
// http://www.geisswerks.com/ryan/FAQS/timing.html
class timer
{
public:
    timer()
    {
        QueryPerformanceFrequency(&freq_);
        QueryPerformanceCounter(&time_);
    }

    void tick(double interval)
    {
        LARGE_INTEGER t;
        QueryPerformanceCounter(&t);

        if (time_.QuadPart != 0)
        {
            int ticks_to_wait = static_cast<int>(static_cast<double>(freq_.QuadPart) * interval);
            int done = 0;
            do
            {
                QueryPerformanceCounter(&t);

                int ticks_passed = static_cast<int>(static_cast<__int64>(t.QuadPart) - static_cast<__int64>(time_.QuadPart));
                int ticks_left = ticks_to_wait - ticks_passed;

                if (t.QuadPart < time_.QuadPart)    // time wrap
                    done = 1;
                if (ticks_passed >= ticks_to_wait)
                    done = 1;

                if (!done)
                {
                    // if > 0.002s left, do Sleep(1), which will actually sleep some
                    // steady amount, probably 1-2 ms,
                    // and do so in a nice way (cpu meter drops; laptop battery spared).
                    // otherwise, do a few Sleep(0)'s, which just give up the timeslice,
                    // but don't really save cpu or battery, but do pass a tiny
                    // amount of time.
                    if (ticks_left > static_cast<int>((freq_.QuadPart * 2) / 1000))
                        Sleep(1);
                    else
                        for (int i = 0; i < 10; ++i)
                            Sleep(0);  // causes thread to give up its timeslice
                }
            }
            while (!done);
        }

        time_ = t;
    }

private:
    LARGE_INTEGER freq_;
    LARGE_INTEGER time_;
};
My question is whether the code above should work deterministically for weeks of running continuously.
And if not, where is the problem? I thought the overflow was handled by
if (t.QuadPart < time_.QuadPart) // time wrap
done = 1;
But maybe that's not enough?
EDIT: Please note that I did not write the original code; Ryan M. Geiss did. The link to the original source of the code is in the code.
QueryPerformanceCounter is notorious for its unreliability. It's fine to use for individual short-interval timing, if you're prepared to handle abnormal results. It is not exact - it's typically based on the PCI bus frequency, and a heavily loaded bus can lead to lost ticks.
GetTickCount is actually more stable, and can give you 1ms resolution if you've called timeBeginPeriod. It will eventually wrap, so you need to handle that.
__rdtsc should not be used, unless you're profiling and have control of which core you're running on and are prepared to handle variable CPU frequency.
GetSystemTime is decent for longer periods of measurements, but will jump when the system time is adjusted.
Also, Sleep(0) does not do what you think it does. It will yield the cpu if another context wants it - otherwise it'll return immediately.
In short, timing on windows is a mess. One would think that today it'd be possible to get accurate long-term timing from a computer without going through hoops - but this isn't the case. In our game framework we're using several time sources and corrections from the server to ensure all connected clients have the same game time, and there's a lot of bad clocks out there.
Your best bet would likely be to just use GetTickCount or GetSystemTime, and wrap it in something that adjusts for time jumps/wraparounds (see the sketch below).
Also, you should convert your double interval to int64 milliseconds and then use only integer math - this avoids problems due to floating-point types' varying accuracy based on their contents.
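As a sketch of the wraparound handling: GetTickCount() returns an unsigned DWORD, so modular subtraction already yields the correct elapsed value across the ~49.7-day rollover, as long as the interval you measure is itself shorter than that (and GetTickCount64 sidesteps the issue entirely on Vista and later):
#include <windows.h>

// Wrap-safe elapsed milliseconds since startTick. Unsigned subtraction wraps
// modulo 2^32, so the result stays correct across the DWORD rollover provided
// the measured interval is under ~49.7 days.
DWORD elapsed_ms(DWORD startTick)
{
    return GetTickCount() - startTick;
}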
Based on your comment, you probably should be using Waitable Timers instead.
See the following examples:
Using Waitable Timer Objects
Using Waitable Timers with an Asynchronous Procedure Call
Performance counters are 64-bit, so they are large enough for years of running continuously. For example, if you assume the performance counter increments 2 billion times each second (some imaginary 2 GHz processor) it will overflow in about 290 years.
Using a nanosecond-scale timer to control something like Sleep(), which at best is precise to several milliseconds (and usually several dozen milliseconds), is somewhat questionable anyway.
A different approach you might consider would be to use WaitForSingleObject or a similar function. This burns less CPU cycles, causes a trillion fewer context switches over the day, and is more reliable than Sleep(0), too.
You could, for example, create a semaphore and never touch it in normal operation. The semaphore exists only so you can wait on something, if you don't have anything better to wait on. Then you can specify a timeout in milliseconds, up to 49 days long, with a single syscall. And it will not only be less work, it will be much more accurate too.
The advantage is that if "something happens", so you want to wake up earlier than that, you only need to signal the semaphore. The wait call will return instantly, and you will know from the WAIT_OBJECT_0 return value that it was due to being signaled, not due to time running out. And all that without complicated logic and counting cycles.
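A minimal sketch of that pattern (names are made up and error handling is omitted):
#include <windows.h>

// The semaphore exists only so there is something to wait on. Normal operation
// simply times out; another thread calls ReleaseSemaphore() to wake us early.
HANDLE g_wakeup = CreateSemaphoreA(NULL, 0, 1, NULL);   // starts unsignaled

// Returns true if we were signaled, false if the timeout simply elapsed.
bool wait_ms_or_signal(DWORD timeoutMs)
{
    return WaitForSingleObject(g_wakeup, timeoutMs) == WAIT_OBJECT_0;
}

// Elsewhere, to interrupt the wait early:
//     ReleaseSemaphore(g_wakeup, 1, NULL);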
The problem you asked about most directly:
if (t.QuadPart < time_.QuadPart)
should instead be this:
if (t.QuadPart - time_.QuadPart < 0)
The reason for that is that you want to look for wrapping in relative time, not absolute time. Relative time will wrap (1ull<<63) time units after the reference call to QPC. Absolute time might wrap (1ull<<63) time units after reboot, but it could wrap at any other time it felt like it, that's undefined.
QPC is a little bugged on some systems (older RDTSC-based QPCs on early multicore CPUs, for instance) so it may be desirable to allow small negative time deltas like so:
if (t.QuadPart - time_.QuadPart < -1000000) //time wrap
An actual wrap will produce a very large negative time delta, so that's safe. It shouldn't be necessary on modern systems, but trusting Microsoft is rarely a good idea.
...
However, the bigger problem with time wrapping is that ticks_to_wait, ticks_passed, and ticks_left are all int, not LARGE_INTEGER (or long long) like they should be. This makes most of that code wrap if any significant time periods are involved - and "significant" in this context is platform dependent; it can be on the order of 1 second in a few (rare these days) cases, or even less on some hypothetical future system.
Other issues:
if (time_.QuadPart != 0)
Zero is not a special value there and should not be treated as such. My guess is that the code is conflating QPC returning a time of zero with QPC's return value being zero. The return value is not the 64-bit time passed by pointer, it's the BOOL that QPC actually returns.
Also, that loop of Sleep(0) is foolish - it appears to be tuned to behave correctly only for a particular level of contention and a particular per-thread CPU performance. If you need resolution, that's a horrible idea; and if you don't need resolution, then that entire function should have just been a single call to Sleep.